Frontiers of Computational Journalism
Columbia Journalism School
Week 7: Algorithmic Accountability and Discrimination
October 27, 2017
This class

• Algorithmic accountability stories
• Analyzing bias in data
• FATML
• Unpacking ProPublica’s “Machine Bias”
• Multi-stage discrimination models
Algorithmic Accountability Stories
Algorithms in our lives

• Personalized search
• Political microtargeting
• Credit score / loans / insurance
• Predictive policing
• Price discrimination
• Algorithmic trading / markets
• Terrorist threat prediction
• Hiring models
From myfico.com
Predicted crime times and locations in the PredPol system.
Websites Vary Prices, Deals Based on Users' Information, Valentino-DeVries, Singer-Vine and Soltani, WSJ, 2012
How Uber surge pricing really works, Nick Diakopoulos
Message Machine, Jeff Larson, Al Shaw, ProPublica, 2012
Analyzing Bias in Data
Title VII of Civil Rights Act, 1964

It shall be an unlawful employment practice for an employer -

(1) to fail or refuse to hire or to discharge any individual, or otherwise to discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual’s race, color, religion, sex, or national origin; or

(2) to limit, segregate, or classify his employees or applicants for employment in any way which would deprive or tend to deprive any individual of employment opportunities or otherwise adversely affect his status as an employee, because of such individual’s race, color, religion, sex, or national origin.
Investors prefer entrepreneurial ventures pitched by attractive men, Brooks et al., 2014
Women in Academic Science: A Changing Landscape, Ceci et al.
Swiss judges: a natural experiment

The 24 judges of the Swiss Federal Administrative Court are randomly assigned to cases, yet they rule at different rates on migrant deportation cases. Here are their deportation rates broken down by party.

Barnaby Skinner and Simone Rau, Tages-Anzeiger. https://github.com/barjacks/swiss-asylum-judges
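A minimal sketch of the comparison behind that chart, assuming a tidy table of rulings with hypothetical columns judge, party and deported; the linked GitHub repository is where to look for the real data and analysis.

import pandas as pd

# Hypothetical schema, for illustration only: one row per ruling,
# with columns judge, party and deported (1 = deportation upheld).
rulings = pd.read_csv("rulings.csv")

# Deportation rate and caseload per judge, with party affiliation attached.
per_judge = (
    rulings.groupby(["party", "judge"])["deported"]
    .agg(deportation_rate="mean", cases="count")
    .reset_index()
    .sort_values("deportation_rate", ascending=False)
)
print(per_judge)

# Caseload-weighted deportation rate by party.
print(rulings.groupby("party")["deported"].mean())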

Florida sentencing analysis adjusted for “points”

Bias on the Bench, Michael Braga, Herald-Tribune
Containing 1.4 million entries, the DOC database notes the exact number of points assigned to defendants convicted of felonies. The points are based on the nature and severity of the crime committed, as well as other factors such as past criminal history, use of a weapon and whether anyone got hurt. The more points a defendant gets, the longer the minimum sentence required by law.

Florida legislators created the point system to ensure defendants committing the same crime are treated equally by judges. But that is not what happens.

The Herald-Tribune established this by grouping defendants who committed the same crimes according to the points they scored at sentencing. Anyone who scored from 30 to 30.9 would go into one group, while anyone who scored from 31 to 31.9 would go in another, and so on.

We then evaluated how judges sentenced black and white defendants within each point range, assigning a weighted average based on the sentencing gap.

If a judge wound up with a weighted average of 45 percent, it meant that judge sentenced black defendants to 45 percent more time behind bars than white defendants.

Bias on the Bench: How We Did It, Michael Braga, Herald-Tribune
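A minimal sketch of that grouping and weighting logic, with hypothetical column names (judge, race, points, sentence_months). It illustrates the method described above, not the Herald-Tribune's actual code, and weighting each point bin by its number of defendants is an assumption.

import pandas as pd

# Hypothetical schema for the DOC sentencing data described above.
cases = pd.read_csv("felony_sentences.csv")  # judge, race, points, sentence_months

# Bin defendants by point score: 30-30.9 together, 31-31.9 together, and so on.
cases["point_bin"] = cases["points"].astype(int)

rows = []
for (judge, point_bin), grp in cases.groupby(["judge", "point_bin"]):
    black = grp.loc[grp["race"] == "black", "sentence_months"]
    white = grp.loc[grp["race"] == "white", "sentence_months"]
    if black.empty or white.empty or white.mean() == 0:
        continue  # need both groups in a bin to compute a gap
    gap = (black.mean() - white.mean()) / white.mean()  # 0.45 means 45% more time
    rows.append({"judge": judge, "gap": gap, "n": len(grp)})

gaps = pd.DataFrame(rows)
# Weighted average of the per-bin gaps for each judge (weights assumed to be
# the number of defendants in each bin).
weighted = gaps.groupby("judge").apply(lambda g: (g["gap"] * g["n"]).sum() / g["n"].sum())
print(weighted.sort_values(ascending=False))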
Unadjusted disciplinary rates

The Scourge of Racial Bias in New York State’s Prisons, NY Times

Limited data for adjustment

In most prisons, blacks and Latinos were disciplined at higher rates than whites — in some cases twice as often, the analysis found. They were also sent to solitary confinement more frequently and for longer durations. At Clinton, a prison near the Canadian border where only one of the 998 guards is African-American, black inmates were nearly four times as likely to be sent to isolation as whites, and they were held there for an average of 125 days, compared with 90 days for whites.

A greater share of black inmates are in prison for violent offenses, and minority inmates are disproportionately younger, factors that could explain why an inmate would be more likely to break prison rules, state officials said. But even after accounting for these elements, the disparities in discipline persisted, The Times found.

The disparities were often greatest for infractions that gave discretion to officers, like disobeying a direct order. In these cases, the officer has a high degree of latitude to determine whether a rule is broken and does not need to produce physical evidence. The disparities were often smaller, according to the Times analysis, for violations that required physical evidence, like possession of contraband.

The Scourge of Racial Bias in New York State’s Prisons, NY Times
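A minimal sketch of that comparison, using a hypothetical per-inmate table (columns race, tickets_discretionary, tickets_evidence); this is an illustration of the idea, not the Times' analysis, which also accounted for age and offense history.

import pandas as pd

# Hypothetical per-inmate table, for illustration only.
# tickets_discretionary: infractions resting on an officer's judgment
#   (e.g. disobeying a direct order)
# tickets_evidence: infractions requiring physical evidence
#   (e.g. possession of contraband)
inmates = pd.read_csv("inmates.csv")  # race, tickets_discretionary, tickets_evidence

per_capita = inmates.groupby("race")[["tickets_discretionary", "tickets_evidence"]].mean()

# Disparity relative to white inmates; the article found larger gaps for the
# discretionary infractions than for the evidence-based ones.
print(per_capita / per_capita.loc["white"])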
Comparing more subjective offenses

The Scourge of Racial Bias in New York State’s Prisons, NY Times

Simpson’s paradox

Sex Bias in Graduate Admissions: Data from Berkeley, Bickel, Hammel and O'Connell, 1975
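A tiny worked illustration with made-up numbers in the spirit of the Berkeley case (not the actual admissions figures): women can be admitted at a higher rate in every department yet a lower rate overall, because they apply disproportionately to the more selective departments.

# Made-up admissions numbers illustrating Simpson's paradox.
# (applicants, admitted) per department and gender
data = {
    "dept_easy": {"men": (800, 480), "women": (100, 65)},  # 60% vs 65%
    "dept_hard": {"men": (200, 20),  "women": (700, 84)},  # 10% vs 12%
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in data.items():
    for gender, (applied, admitted) in groups.items():
        print(f"{dept}  {gender:5s}  admit rate = {admitted / applied:.0%}")
        totals[gender][0] += applied
        totals[gender][1] += admitted

for gender, (applied, admitted) in totals.items():
    print(f"overall    {gender:5s}  admit rate = {admitted / applied:.0%}")

# Women do better within each department (65% > 60% and 12% > 10%) yet worse
# overall (about 19% vs 50%), because most women applied to the harder department.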
Fairness, Accountability, and Transparency in Machine Learning (FATML)

Learning from Facebook likes

From Kosinski et al., Private traits and attributes are predictable from digital records of human behavior
Predicting gender from Twitter

Zamal et al., Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors

Predicting race from Twitter

Pennacchiotti and Popescu, A Machine Learning Approach to Twitter User Classification

Even if two groups of the population admit simple classifiers, the whole population may not (from How Big Data is Unfair)
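A synthetic sketch of that point (my own illustration, not taken from the post): each group below is perfectly separable by a one-feature linear rule, but the rules point in opposite directions, so a single classifier fit to the pooled data works well only for the larger group.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, flip):
    # One feature; the label follows the sign of the feature, with the
    # convention flipped for the second group.
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > 0).astype(int)
    return x, (1 - y if flip else y)

x_maj, y_maj = make_group(900, flip=False)  # majority group
x_min, y_min = make_group(100, flip=True)   # minority group, opposite pattern

x = np.vstack([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

pooled = LogisticRegression().fit(x, y)
print("majority accuracy:", pooled.score(x_maj, y_maj))  # close to 1.0
print("minority accuracy:", pooled.score(x_min, y_min))  # close to 0.0

# A classifier fit to the minority group alone is nearly perfect on it.
print("separate model   :", LogisticRegression().fit(x_min, y_min).score(x_min, y_min))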
Unpacking ProPublica’s “Machine Bias”
Should Prison Sentences Be Based On Crimes That Haven’t Been Committed Yet?, FiveThirtyEight
How We Analyzed the COMPAS Recidivism Algorithm, ProPublica
Stephanie Wykstra, personal communication
ProPublica argument

False positive rate
P(high risk | black, no arrest) = C/(C+A) = 0.45
P(high risk | white, no arrest) = G/(G+E) = 0.23

False negative rate
P(low risk | black, arrested) = B/(B+D) = 0.28
P(low risk | white, arrested) = F/(F+H) = 0.48

Northpointe response

Positive predictive value
P(arrest | black, high risk) = D/(C+D) = 0.63
P(arrest | white, high risk) = H/(G+H) = 0.59
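A minimal sketch of those three definitions in code. The letters A through H refer to the cells of the slide's two risk-by-rearrest tables (one per group); the counts below are illustrative values chosen so they reproduce the rates quoted above, not a claim about ProPublica's exact cell counts.

def rates(a, b, c, d):
    """Error rates and PPV for one group, using the slide's cell labels:
    a = low risk & not rearrested, b = low risk & rearrested,
    c = high risk & not rearrested, d = high risk & rearrested."""
    fpr = c / (c + a)  # P(high risk | no arrest)
    fnr = b / (b + d)  # P(low risk  | arrested)
    ppv = d / (c + d)  # P(arrest    | high risk)
    return fpr, fnr, ppv

# Illustrative counts that reproduce the slide's rates.
groups = {"black": (990, 532, 805, 1369), "white": (1139, 461, 349, 505)}
for name, cells in groups.items():
    fpr, fnr, ppv = rates(*cells)
    print(f"{name}: FPR={fpr:.2f}  FNR={fnr:.2f}  PPV={ppv:.2f}")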

P(outcome | score) is fair

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, Chouldechova

Or, as ProPublica put it

How We Analyzed the COMPAS Recidivism Algorithm, ProPublica
The Problem

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, Chouldechova

Impossibility theorem

When the base rates differ by protected group and when there is not separation, one cannot have both conditional use accuracy equality and equality in the false negative and false positive rates. The goal of complete race or gender neutrality is unachievable.

Altering a risk algorithm to improve matters can lead to difficult stakeholder choices. If it is essential to have conditional use accuracy equality, the algorithm will produce different false positive and false negative rates across the protected group categories. Conversely, if it is essential to have the same rates of false positives and false negatives across protected group categories, the algorithm cannot produce conditional use accuracy equality. Stakeholders will have to settle for an increase in one for a decrease in the other.

Fairness in Criminal Justice Risk Assessments: The State of the Art, Berk et al.
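The arithmetic behind this trade-off is compact. For a binary high/low risk cut-off, write p for a group's base rate of rearrest; the definitions of FPR, FNR and PPV force the identity below (the relation Chouldechova derives in the paper cited above).

\[
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
\]

With PPV and FNR held equal across groups, a higher base rate p mechanically yields a higher false positive rate, so an instrument calibrated the same way for both groups cannot also equalize error rates when base rates differ.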
Multi-stage discrimination models
All The Stops, Thomas Rhiel, Bklynr.com, 2012

In search of fairness

Benchmark Test
Is group A searched more often than group B?
Problem: assumes A and B have identical distributions of behavior (or whatever signals are used to decide on searching).

Outcome Test
Do searches of group A result in a “hit” less often than searches of group B?
Problem: infra-marginality

Simoiu et al.

Infra-marginality

Outcome tests, however, are imperfect barometers of bias. To see this, suppose that there are two, easily distinguishable types of white drivers: those who have a 1% chance of carrying contraband, and those who have a 75% chance. Similarly, assume that black drivers have either a 1% or 50% chance of carrying contraband. If officers, in a race-neutral manner, search individuals who are at least 10% likely to be carrying contraband, then searches of whites will be successful 75% of the time whereas searches of blacks will be successful only 50% of the time. This simple example illustrates a subtle failure of outcome tests known as the problem of infra-marginality.

The Problem of Infra-marginality in Outcome Tests for Discrimination, Simoiu et al.
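A tiny simulation of that worked example (illustrative only; the split between driver types is an assumption, though it does not change the hit rates): both groups are searched at exactly the same 10% threshold, yet their hit rates differ.

import numpy as np

rng = np.random.default_rng(0)
THRESHOLD = 0.10  # search anyone at least 10% likely to carry contraband

def hit_rate(type_probs, type_shares, n=100_000):
    # Each driver gets a contraband probability from one of the types;
    # officers search every driver whose probability clears the threshold.
    probs = rng.choice(type_probs, size=n, p=type_shares)
    searched = probs >= THRESHOLD
    carrying = rng.random(n) < probs
    return (carrying & searched).sum() / searched.sum()

# White drivers: 1% or 75% types; black drivers: 1% or 50% types.
print("white hit rate:", hit_rate([0.01, 0.75], [0.5, 0.5]))  # about 0.75
print("black hit rate:", hit_rate([0.01, 0.50], [0.5, 0.5]))  # about 0.50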

Threshold for searching

[Figure: the threshold test model of Simoiu et al., showing p(contraband | d) and p(contraband | r) feeding the observed searches and hits, annotated with “How many drivers has this department searched?” and “How many drivers of this race have we searched?” (r = race, d = department).]