Sorelle Friedler, Haverford College
Obfuscating data with respect to protected information can serve to decrease discrimination and increase transparency, even in the face of black-box proprietary models.
Algorithms are increasingly being used to make high-stakes decisions that directly impact people’s lives. Such algorithms may use past data about a company’s hiring practices to determine who should be hired in the future, determine where to send police based on historical arrest data, or be used to relieve overcrowding in jails by releasing those predicted most likely to reappear without bail. As high-stakes decision-making about people has become more driven by machine learning processes, these algorithmic choices have also come under more scrutiny. A recent Wisconsin court case challenged the right to use proprietary recidivism prediction algorithms at sentencing time, Philadelphia’s pre-trial risk assessment has received press coverage that focuses on its potentially racist impacts, and Chicago’s predictive policing algorithms have been viewed as automated racial profiling. These worries, especially around the potential for discrimination arising from machine-learned decisions about people, have led to prominent calls for more accountability in algorithmic decisions [19, 1].
Perhaps the most obvious way that machine-learned decisions can become discriminatory is by using training data that directly encodes human biases; for example, a hiring algorithm built on historical hiring decisions at an all-white company may learn to discriminate against people of color. More subtly, data collection feedback loops may reinforce incorrect algorithmic notions, as when a predictive policing algorithm keeps sending police back to the same neighborhood because the algorithm sent police there the previous day. Issues may also arise due to systematic differences between groups: patterns that were machine-learned on traditionally Western names may not hold on the Native American subpopulation. This can be compounded when the algorithm is not carefully evaluated, since many metrics weight errors per-person instead of per-group, so an algorithm that has the incorrect outcome on all Native Americans (about 2% of the U.S. population) and the correct outcome on everyone else would be evaluated as 98% successful. Many solutions have now been put forward to try to perform fairness-aware machine learning. Broadly, the interventions can be characterized as those performed by pre-processing the training data [8, 24, 5, 11], by directly changing the machine-learning algorithm [13, 6, 23], and by post-processing the outcomes. In addition, recent work has focused on fair decision making with feedback loops [10, 9, 14].
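The gap between per-person and per-group evaluation can be made concrete with a small sketch. The population below is hypothetical (100 people, 2 of them in the minority group), and the function names are illustrative only; the point is simply that a classifier that is wrong on every minority member still scores 98% when errors are counted per person.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of people classified correctly (errors weighted per person)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately inside each group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical population: 98 majority members, 2 minority members.
groups = ["majority"] * 98 + ["minority"] * 2
y_true = [1] * 100
y_pred = [1] * 98 + [0] * 2  # wrong on every minority member

overall = overall_accuracy(y_true, y_pred)
by_group = per_group_accuracy(y_true, y_pred, groups)
# Averaging the per-group accuracies weights each group equally.
group_averaged = sum(by_group.values()) / len(by_group)

print(overall, by_group, group_averaged)
```

Here the per-person accuracy is 0.98, while the group-averaged accuracy is 0.5 because the minority group's accuracy is 0.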
What does this have to do with obfuscation? One way to think about the removal of protected information, such as race or sex, from a training data set (pre-processing) is as the obfuscation of the data set with respect to that information. In fact, it has been shown that the discriminatory impact of any classifier can be estimated from the training data by measuring the error in attempting to predict the protected information; i.e., if your race can be guessed, it can be used to discriminate against you. This idea can also be used to certify a data set as safe from potential discrimination: if a protected feature can’t be predicted from the remaining features, then the information from that feature can’t influence the outcome of the model. In other words, if no predictor can recover race with low error from the remaining data in the training set, a machine learning algorithm trained on this data can’t discriminate based on race. Thus, obfuscating the data with respect to protected class serves to prevent discrimination.
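As a rough illustration of this predictability test, the sketch below tries to guess a two-valued protected attribute from a single remaining feature and reports the error of the best guess. Several things here are assumptions made for the example rather than details from the text: the error measure is a balanced error rate (the average of the per-group error rates), the predictor is a bare threshold rule evaluated on the same data it was chosen on, and the data and function names are hypothetical. A real audit would use a stronger classifier and held-out data, since a weak predictor failing to guess the attribute certifies nothing.

```python
def balanced_error_rate(groups, guesses):
    """Average of the per-group error rates of a guess at the protected attribute."""
    rates = []
    for g in sorted(set(groups)):
        members = [guess for true, guess in zip(groups, guesses) if true == g]
        wrong = sum(guess != g for guess in members)
        rates.append(wrong / len(members))
    return sum(rates) / len(rates)

def best_threshold_ber(feature, groups, low_label, high_label):
    """Lowest balanced error over rules 'value <= t -> low_label, else high_label'
    and their mirror images (two groups only)."""
    best = 0.5  # guessing a single label for everyone gives 0.5
    for t in sorted(set(feature)):
        guesses = [low_label if x <= t else high_label for x in feature]
        ber = balanced_error_rate(groups, guesses)
        best = min(best, ber, 1.0 - ber)  # 1 - ber is the error of the mirrored rule
    return best

groups = ["a"] * 4 + ["b"] * 4
leaky_feature = [1, 2, 3, 4, 6, 7, 8, 9]    # separates the groups completely
neutral_feature = [1, 2, 3, 4, 1, 2, 3, 4]  # identical distribution in both groups

leaky_ber = best_threshold_ber(leaky_feature, groups, "a", "b")
neutral_ber = best_threshold_ber(neutral_feature, groups, "a", "b")
print(leaky_ber, neutral_ber)
```

An error near 0 means the protected attribute leaks through the feature; an error near 0.5 means this particular predictor cannot do better than chance.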
However, referring to this procedure as obfuscation implies that the observed data being used to train the machine learning model is correct and does not itself suffer from systemic bias. Instead, if the belief is that any distributional differences between groups in the data are the result of observational bias, then this procedure can be viewed as repairing the data so that it more properly reflects the underlying truth. One such repair procedure works by modifying the protected class-conditioned distributions per-attribute so that they look more similar. This works by effectively grouping people based on their per-group quantile and assigning all members of that quantile the same score, specifically the median, taken over the groups, of the values at that quantile. This preserves the within-group ranks for each person, serving to preserve some predictive capacity of the data. Experimentally, this procedure has been shown to result in fair classifiers (measured using the disparate impact four-fifths ratio).
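A minimal sketch of the quantile-based repair just described, for a single numeric attribute, might look as follows. The nearest-rank quantile lookup, the tie handling, the toy scores, and the cutoff of 70 are all assumptions made for illustration, and the published procedure may differ in these details. The second function computes a disparate impact ratio as the selection rate of one group divided by that of another, which is then compared against the four-fifths (0.8) threshold; here it is applied to a simple cutoff rule rather than to a trained classifier.

```python
from statistics import median

def repair_attribute(values, groups):
    """Give every group the same distribution for one numeric attribute."""
    # Sorted values of the attribute within each group.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    for vals in by_group.values():
        vals.sort()

    repaired = []
    for v, g in zip(values, groups):
        own = by_group[g]
        # Within-group quantile in [0, 1]; tied values share the lowest rank.
        q = own.index(v) / (len(own) - 1) if len(own) > 1 else 0.0
        # Value at that quantile in each group (nearest rank), then the
        # median of those values over the groups.
        at_q = [vals[round(q * (len(vals) - 1))] for vals in by_group.values()]
        repaired.append(median(at_q))
    return repaired

def disparate_impact_ratio(selected, groups, protected, reference):
    """Selection rate of `protected` divided by selection rate of `reference`."""
    def rate(g):
        picks = [s for s, grp in zip(selected, groups) if grp == g]
        return sum(picks) / len(picks)
    return rate(protected) / rate(reference)

# Hypothetical scores: group "a" is systematically scored 20 points lower.
groups = ["a", "a", "a", "b", "b", "b"]
scores = [50, 60, 70, 70, 80, 90]
repaired = repair_attribute(scores, groups)  # [60.0, 70.0, 80.0, 60.0, 70.0, 80.0]

cutoff = 70  # hypothetical decision rule: select anyone scoring at least 70
before = disparate_impact_ratio([s >= cutoff for s in scores], groups, "a", "b")
after = disparate_impact_ratio([s >= cutoff for s in repaired], groups, "a", "b")
print(repaired, before, after)
```

In this toy data the ratio moves from about 0.33 (below the 0.8 threshold) to 1.0 after repair, and each person keeps the same rank within their own group.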
Using this repair procedure can cause a drop in accuracy (or other measures of utility) in the resulting classifier. While this is often framed as a problem for the effectiveness of fairness-aware machine learning, a “tradeoff” between fairness and accuracy, it can also be viewed as a measure of the extent to which the protected class was used by the machine learning model. In fact, this procedure can be used to remove any attribute from the data set by obscuring the remaining attributes with respect to that one. The importance of the removed attribute can then be measured based on the model’s drop in accuracy: attributes whose removal causes a larger drop in accuracy had a larger influence on the model’s outcomes. This procedure measures the indirect influence of an attribute; the effect of correlated or proxy variables is included in the overall influence ascribed to the feature, so that if zip code is used by a model as a proxy for race, race is considered to have an influence on the model’s outcomes.
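The audit described above can be sketched on toy data. Everything below is hypothetical: the rows, the attribute names `group`, `zone`, and `income`, and the stand-in black-box model, which reads only `zone`. The sketch also assumes the removed attribute takes a small number of distinct values, so each value can be treated as a group; handling continuous attributes would need extra steps not shown here. Obscuring reuses the quantile-median repair, applied to every remaining model input.

```python
from statistics import median

def obscure(values, groups):
    """Quantile-median repair: make the per-group distributions of `values` coincide."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    for vals in by_group.values():
        vals.sort()
    out = []
    for v, g in zip(values, groups):
        own = by_group[g]
        q = own.index(v) / (len(own) - 1) if len(own) > 1 else 0.0
        out.append(median(vals[round(q * (len(vals) - 1))] for vals in by_group.values()))
    return out

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def indirect_influence(model, rows, labels, removed, model_inputs):
    """Accuracy drop after removing `removed` and obscuring the other inputs against it."""
    groups = [r[removed] for r in rows]
    obscured = [dict(r) for r in rows]
    for name in model_inputs:
        if name == removed:
            new_vals = [groups[0]] * len(rows)  # one shared value carries no information
        else:
            new_vals = obscure([r[name] for r in rows], groups)
        for r, v in zip(obscured, new_vals):
            r[name] = v
    return accuracy(model, rows, labels) - accuracy(model, obscured, labels)

# Hypothetical data: `zone` is a perfect proxy for `group`; `income` is unrelated to it.
rows = [{"group": g, "zone": z, "income": i}
        for g, z, i in [("a", 1, 10), ("a", 2, 20), ("a", 3, 30), ("a", 4, 40),
                        ("b", 6, 10), ("b", 7, 20), ("b", 8, 30), ("b", 9, 40)]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def black_box(row):
    """Stand-in for an opaque model; it never reads `group` directly."""
    return int(row["zone"] >= 5)

group_influence = indirect_influence(black_box, rows, labels, "group", ["zone", "income"])
income_influence = indirect_influence(black_box, rows, labels, "income", ["zone", "income"])
print(group_influence, income_influence)
```

Although the stand-in model never sees `group`, obscuring `zone` with respect to `group` drops its accuracy from 1.0 to 0.5, so `group` is assigned an indirect influence of 0.5, while `income` is assigned 0.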
This tool for auditing for indirect influence allows the partial de-obfuscation of black-box systems. For example, a groundbreaking study by ProPublica recently investigated a risk assessment instrument called COMPAS and found that it was biased against black defendants in the sense that the misclassification rates were skewed so that black defendants were more likely to be incorrectly labeled high risk, while white defendants were more likely to be incorrectly labeled low risk. With direct access to COMPAS, we could run the above procedure to determine the indirect influence of each variable on the outcome. Unfortunately, COMPAS is considered proprietary by Northpointe, the company that created it; even the full inputs are unknown. Without such access, we can instead attempt to model the COMPAS low / medium / high outcomes using data released by ProPublica. Modeling these outcomes using a support vector machine (SVM) model, we see that juvenile records and age are most important in predicting these outcomes (obscuring these attributes causes a large drop in model accuracy), while race is also important but to a much lesser extent. However, the somewhat low starting accuracy of the SVM model, due to the lack of access to either COMPAS or the true inputs, weakens these results. With access to COMPAS, such conclusions would more directly serve to de-obfuscate the decision-making process.
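The kind of skew in misclassification rates described here can be illustrated by computing false positive and false negative rates per group. The data below is synthetic and chosen only to show the pattern; it is not ProPublica's data, and the group names `x` and `y` and the function name are placeholders. A label of 1 stands for "reoffended" in `y_true` and for "labeled high risk" in `y_pred`.

```python
def error_rates_by_group(y_true, y_pred, groups):
    """False positive and false negative rates, computed inside each group."""
    counts = {}
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
        if t == 1:
            c["pos"] += 1
            c["fn"] += int(p == 0)  # labeled low risk but did reoffend
        else:
            c["neg"] += 1
            c["fp"] += int(p == 1)  # labeled high risk but did not reoffend
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }

# Synthetic example: both groups have the same accuracy (6 of 8 correct),
# but the errors point in opposite directions.
groups = ["x"] * 8 + ["y"] * 8
y_true = [0, 0, 0, 0, 1, 1, 1, 1] * 2
y_pred = [1, 1, 0, 0, 1, 1, 1, 1,   # group x: errors are false positives
          0, 0, 0, 0, 0, 0, 1, 1]   # group y: errors are false negatives

rates = error_rates_by_group(y_true, y_pred, groups)
print(rates)
```

Equal accuracy across groups therefore does not rule out the asymmetry: group `x` has a false positive rate of 0.5 and no false negatives, while group `y` has the reverse.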
NB: This paper describes work published in  and .
[1] More accountability for big-data algorithms. Nature, 537(449), Sept. 21 2016.
[2] P. Adler, C. Falk, S. A. Friedler, G. Rybeck, C. Scheidegger, B. Smith, and S. Venkatasubramanian. Auditing black-box models for indirect influence. In Proc. of the IEEE International Conference on Data Mining (ICDM), 2016.
[3] J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias. ProPublica, May 23, 2016.
[4] J. Bachner and J. Lynch. Is predictive policing the law-enforcement tactic of the future? The Wall Street Journal, Apr. 24 2016.
[5] T. Calders, F. Kamiran, and M. Pechenizkiy. Building classifiers with independency constraints. In ICDM Workshop Domain Driven Data Mining, pages 13–18, 2009.
[6] T. Calders and S. Verwer. Three naïve Bayes approaches for discrimination-free classification. Data Min Knowl Disc, 21:277–292, 2010.
[7] K. Colaneri. Can a computer algorithm be trusted to help relieve Philly’s overcrowded jails? NewsWorks, Sept. 1 2016.
[8] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. Proc. 21st ACM KDD, pages 259–268, 2015.
[9] M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. Rawlsian fairness for machine learning. arXiv preprint arXiv:1610.09559, 2016.
[10] M. Joseph, M. Kearns, J. H. Morgenstern, and A. Roth. Fairness in learning: Classic and contextual bandits. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 325–333. Curran Associates, Inc., 2016.
[11] F. Kamiran and T. Calders. Classifying without discriminating. In Proc. of the IEEE International Conference on Computer, Control and Communication, 2009.
[12] F. Kamiran, A. Karim, and X. Zhang. Decision theory for discrimination-aware classification. Proc. of the IEEE 12th International Conference on Data Mining, 2012.
[13] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, pages 35–50, 2012.
[14] S. Kannan, M. Kearns, J. Morgenstern, M. Pai, A. Roth, R. Vohra, and Z. S. Wu. Fairness incentives for myopic agents. arXiv preprint arXiv:1705.02321, 2017.
[15] K. Lum and W. Isaac. To predict and serve? Significance, pages 14–18, October 2016.
[16] C. C. Miller. Can an algorithm hire better than a human? The New York Times: The UpShot, June 25 2015.
[17] Northpointe. COMPAS – the most scientifically advanced risk and needs assessments. http://www.northpointeinc.com/risk-needs-assessment.
[18] M. Stroud. The minority report: Chicago’s new police computer predicts crimes, but is it racist? The Verge, Feb. 19 2014.
[19] J. Podesta, P. Pritzker, E. J. Moniz, J. Holdren, and J. Zients. Big data: seizing opportunities, preserving values. Executive Office of the President, May 2014.
[20] J. Reyes. Philadelphia is grappling with the prospect of a racist computer algorithm. Technically Philly, Sept. 16 2016.
[21] M. Smith. In Wisconsin, a backlash against using data to foretell defendants’ futures. The New York Times, June 22 2016.
[22] The U.S. EEOC. Uniform guidelines on employee selection procedures, March 2, 1979.
[23] M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi. Fairness constraints: A mechanism for fair classification. In ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML), 2015.
[24] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proc. of Intl. Conf. on Machine Learning, pages 325–333, 2013.