Florian Lemmerich
Social Data Science
Sources and Resources
➢ Sara Hajian (Algorithmic Bias, KDD Tutorial 2016)
➢ S. Bird et al: Fairness-Aware Machine Learning in Practice
https://sites.google.com/view/fairness-tutorial
➢ S. Barocas and M. Hardt: Fair Machine Learning https://vimeo.com/248490141
➢ https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
➢ Fairness, part 1 - Moritz Hardt - MLSS 2020, Tübingen:
https://www.youtube.com/watch?v=Igq_S_7IfOU
Agenda
➢ Discrimination and Biases in Machine Learning
➢ Fairness Measures
➢ (Discrimination Detection)
➢ Discrimination Avoidance
9.1 Bias and Discrimination in ML
Decision Making
➢ Human decisions:
▪ Objective elements (“rational”)
▪ Subjective elements (“emotional”, “prejudiced”)
➢ Algorithms:
▪ Only based on objective inputs
Amazon hiring
https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
Search results for “C.E.O.”
➢ 27% of C.E.O.s are female (US, 2015)
➢ M. Kay, C. Matuszek, S. Munson (2015): Unequal Representation and Gender
Stereotypes in Image Search Results for Occupations. CHI'15.
➢ Given the same other profile information and search history,
Google advertises higher paying jobs to male accounts
US Healthcare system
https://www.scientificamerican.com/article/racial-bias-found-in-a-major-health-care-risk-algorithm/
Example: credit scoring
What is fair?
- William Giraldi
Biases in ML
➢ Discrimination is, in principle, the goal of ML classification:
to discriminate (= distinguish) between positive and negative examples
Bias in Machine Learning Data
➢ Biased labels and labelers: classes are often assigned by humans who bring in their
conscious/unconscious prejudices, etc.
Example: Judges give higher sentences to accused people of color; an ML algorithm
learns from this and suggests higher sentences for those with similar crimes
➢ Sample size disparity: more training data for one group of people
Example: training image recognition on a training set that contains mostly white males
➢ Feature accuracy: data might have been collected more reliably for one group than for
another
Example: people of color go to the doctor less often, thus less precise data on them
might be available
Causes of Unfairness
➢ Cultural differences lead to imbalanced prediction errors
https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
Legal: Protected classes
➢ Protected attributes depend on the jurisdiction you are under (e.g., US vs EU)
➢ Germany: discrimination is prohibited on six grounds:
▪ Race and ethnic origin
▪ Gender
▪ Religion and worldview
▪ Disability and chronic disease
▪ Age
▪ Sexual orientation
➢ Discrimination is forbidden in working life and day-to-day life, e.g.:
▪ When applying for a job
▪ Payment
▪ Renting an apartment
▪ Going to a club
▪ Opening a bank account
▪ …
https://www.antidiskriminierungsstelle.de/SharedDocs/Downloads/DE/publikationen/Refugees/Fluechtlingsbroschuere_englisch.pdf?__blob=publicationFile&v=13
Two Legal Doctrines in the US
Disparate treatment
➢ Purposeful consideration of group membership
➢ Intentional discrimination
➢ Goal: procedural fairness

Disparate impact
➢ Avoidable and unjustified harm, possibly indirect
➢ Without consideration of group membership
➢ Goal: minimize differences in outcomes (distributive justice)

Potential conflict!
Moritz Hardt, MLSS 2020, Tübingen: https://www.youtube.com/watch?v=Igq_S_7IfOU
Legal
➢ Anti-discrimination regulations in place
Technical perspective
➢ Fighting discrimination requires:
9.2 Measuring Fairness
Measuring discrimination
➢ Discrimination at the individual level
➢ Discrimination at the group level
Individual fairness
➢ Compare the decision of the instance with its nearest neighbors
“Similar properties should lead to similar outcomes”
➢ Consistency score:
▪ Assume binary decisions y_p of the algorithm
▪ Compute for each point p the set N(p) of its k nearest neighbors
(acc. to some distance function)
▪ C = 1 − (1 / (N·k)) · Σ_p Σ_{j ∈ N(p)} |y_p − y_j|
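The consistency score above can be computed directly; a minimal sketch, assuming Euclidean distance as the distance function (function and variable names are hypothetical, not from the slides):

```python
import numpy as np

def consistency_score(X, y_pred, k=5):
    """C = 1 - 1/(N*k) * sum_p sum_{j in kNN(p)} |y_p - y_j|.

    X: (N, d) feature matrix, y_pred: (N,) binary decisions.
    Euclidean distance is an assumption; any distance function works.
    """
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred)
    n = len(y_pred)
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    total = 0.0
    for p in range(n):
        nn = np.argsort(d[p])[:k]        # indices of the k nearest neighbors
        total += np.abs(y_pred[p] - y_pred[nn]).sum()
    return 1.0 - total / (n * k)

# identical decisions among close neighbors -> perfectly consistent
X = [[0], [0.1], [0.2], [5], [5.1], [5.2]]
print(consistency_score(X, [0, 0, 0, 1, 1, 1], k=2))  # 1.0
```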
Four-fifths rule
➢ “a selection rate for any race, sex, or ethnic group which is less than four-fifths (or 80%) of
the rate for the group with the highest rate will generally be regarded by the Federal
enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate
will generally not be regarded by Federal enforcement agencies as evidence of adverse
impact.”
➢ Origin: assembled by the State of California Fair Employment Practice Commission
(FEPC) in 1971
https://www.prevuehr.com/resources/insights/adverse-impact-analysis-four-fifths-rule/
Another explanation of the four-fifths rule (from Wikipedia)
“The rule was based on the rates at which job applicants were hired. For example, if
XYZ Company hired 50 percent of the men applying for work in a predominantly
male occupation while hiring only 20 percent of the female applicants, one could
look at the ratio of those two hiring rates to judge whether there might be a
discrimination problem. The ratio of 20:50 means that the rate of hiring for female
applicants is only 40 percent of the rate of hiring for male applicants. That is, 20
divided by 50 equals 0.40, which is equivalent to 40 percent. Clearly, 40 percent is
well below the 80 percent that was arbitrarily set as an acceptable difference in
hiring rates. Therefore, in this example, XYZ Company could have been called upon
to prove that there was a legitimate reason for hiring men at a rate so much higher
than the rate of hiring women.”
https://en.wikipedia.org/wiki/Disparate_impact
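The ratio computed in the quote can be sketched as a small helper (names are hypothetical):

```python
def four_fifths_check(hired_a, applicants_a, hired_b, applicants_b):
    """Adverse-impact ratio: lower selection rate divided by the higher one.

    The four-fifths rule is passed if the ratio is at least 0.8.
    """
    rate_a = hired_a / applicants_a
    rate_b = hired_b / applicants_b
    ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    return ratio, ratio >= 0.8

# XYZ Company example from the text: 20% of women vs. 50% of men hired
ratio, passes = four_fifths_check(20, 100, 50, 100)
print(ratio, passes)  # 0.4 False -> evidence of adverse impact
```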
Group Fairness I: Statistical parity
➢ Create a four-field (2×2) table of counts
Statistical parity (discrimination measures)
➢ Create a four-field (2×2) table of counts
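The 2×2 table boils down to comparing positive-decision rates; a sketch of the statistical parity difference (the table itself was a figure on the slide, so the exact layout here is an assumption):

```python
def statistical_parity_difference(pos_prot, neg_prot, pos_unprot, neg_unprot):
    """Difference in positive-decision rates from a 2x2 count table.

    Rows: protected / unprotected group; columns: positive / negative decision.
    A value of 0 means statistical parity.
    """
    rate_prot = pos_prot / (pos_prot + neg_prot)
    rate_unprot = pos_unprot / (pos_unprot + neg_unprot)
    return rate_unprot - rate_prot

# 40 of 100 unprotected vs. 20 of 100 protected get the positive decision
print(statistical_parity_difference(20, 80, 40, 60))  # 0.2
```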
Other measures
➢ Compare protected group vs entire population
➢ Differences of means
➢ Rank tests
➢ Mutual information
➢ …
Discrimination paradox
➢ There can be easy explanations for apparent discrimination
➢ Acceptance rates:
▪ Males: 26%
▪ Females: 24%
➢ Fully explainable by the fact that more females apply to medicine (which is more
competitive)
➢ Parallel to Simpson’s Paradox
Discrimination paradox: Corrected Measurement
➢ Do stratified analysis:
▪ What should be the (one) acceptance rate for each faculty?
➢ P*(accepted | faculty) = [P(accepted | faculty, male) + P(accepted | faculty, female)] / 2
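The stratified correction can be illustrated with hypothetical admission counts that reproduce the paradox (the numbers are invented for illustration, not the real data behind the 26%/24% figures):

```python
# Hypothetical admission counts (invented to reproduce the paradox):
# within EVERY faculty females are admitted at a higher rate,
# yet the raw overall acceptance rate favors males.
data = {
    "medicine":    {"male": (10, 100), "female": (24, 200)},
    "engineering": {"male": (120, 300), "female": (18, 40)},
}

def overall_rate(group):
    """Raw acceptance rate, pooling all faculties."""
    admitted = sum(fac[group][0] for fac in data.values())
    applied = sum(fac[group][1] for fac in data.values())
    return admitted / applied

def stratified_rate(group):
    """Average of per-faculty acceptance rates (stratified analysis)."""
    rates = [adm / app for adm, app in (fac[group] for fac in data.values())]
    return sum(rates) / len(rates)

print(overall_rate("male"), overall_rate("female"))      # 0.325 0.175 -> males favored
print(round(stratified_rate("male"), 3),
      round(stratified_rate("female"), 3))               # 0.25 0.285 -> reversed
```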
What else can be unfair?
Group Fairness II: Equal error rates
➢ Another type of unfairness:
▪ Good/informed decisions in one group, poor/random ones in another
▪ Still equal positive rate
▪ Can often happen if one group has too little data
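A sketch of how unequal error rates can hide behind equal positive rates (toy data; names are hypothetical):

```python
def error_rates(y_true, y_pred):
    """False-positive and false-negative rate of binary predictions."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

# group A: informed predictions; group B: poor ones --
# yet both groups have the same positive prediction rate (50%)
ya_true, ya_pred = [0, 0, 1, 1], [0, 0, 1, 1]
yb_true, yb_pred = [0, 0, 1, 1], [1, 0, 0, 1]
print(error_rates(ya_true, ya_pred))  # (0.0, 0.0)
print(error_rates(yb_true, yb_pred))  # (0.5, 0.5)
```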
Group Fairness III: Calibration
➢ A score R (the outcome of an ML algorithm) is calibrated if P(Y=1 | R=r) = r
➢ “You can treat the ML result score as a probability”
➢ Example:
Among loan applicants estimated to have a 10% chance of default, calibration
requires that whites and blacks default at similar rates
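A minimal empirical calibration check, binning scores per group (the binning scheme and names are assumptions, not from the slide):

```python
from collections import defaultdict

def calibration_table(scores, outcomes, groups, n_bins=5):
    """Empirical P(Y=1 | score bin, group); calibration requires this
    to roughly match the bin's scores, for every group."""
    bins = defaultdict(list)
    for s, y, g in zip(scores, outcomes, groups):
        b = min(int(s * n_bins), n_bins - 1)  # bin index of the score
        bins[(g, b)].append(y)
    return {key: sum(ys) / len(ys) for key, ys in sorted(bins.items())}

scores   = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9]
outcomes = [0,   0,   0,   1,   1,   1,   0,   0,   1,   0]
groups   = ["a"] * 6 + ["b"] * 4
# group "b" is miscalibrated in the top bin (outcome rate 0.5, scores 0.9)
print(calibration_table(scores, outcomes, groups))
```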
Corbett Davies & Goel: https://arxiv.org/pdf/1808.00023.pdf
Image: https://www.youtube.com/watch?v=Igq_S_7IfOU
Incompatibility of Fairness Measures
➢ Can we always have fairness according to all fairness measures?
NO!
➢ Looking at three fairness measures fulfilling Goals I–III, we can construct examples
showing that they are pairwise incompatible!
The COMPAS debate
COMPAS
➢ The COMPAS risk score is used in the US legal system to assess the “risk of
recidivism” (committing another crime) as a decision-support system
Judges may detain defendants (partly) based on this score
➢ ProPublica organization:
▪ Black defendants face higher false positive rates
▪ I.e., Black defendants labeled “high risk” more often end up not committing a crime
upon release, compared to white defendants
Conflict between fairness measures example
(Figure: the scores are both predictions of outcomes AND correct probabilities of recidivism)
Conflict between fairness measures example: Undesired solutions
(Figure: the scores are both predictions of outcomes AND correct probabilities of recidivism)
Recap COMPAS
➢ Scholarly debate: Tension between fairness measures
9.3 Discrimination Discovery
Discrimination Discovery beyond Data Analysis
➢ Large audits
➢ “Situation testing”:
▪ Pairs of people with identical attributes except for a single characteristic (sex,
gender, etc.) are actively sent into a potentially discriminating (or already suspicious)
situation
http://www.ecmikosovo.org/en/Situation-testing
▪ Downsides:
• Luring people into a crime
• Full comparability is often difficult
Discovery of discrimination
➢ Discrimination discovery task:
“Given a large database of historical decision records, find discriminatory
situations and practices.”
➢ Discovery of discrimination is considered a difficult task since
▪ It can be measured by many different concepts
▪ There are many different contexts in which discrimination can occur
▪ It is often indirect
➢ Direct discrimination:
▪ The protected attribute is used (e.g., in rules) such that protected groups
(potentially discriminated “PD” groups) are at a disadvantage
➢ Indirect discrimination:
▪ The actual protected attribute might not even be part of the dataset
▪ E.g., removed in pre-processing for privacy
Background: Association Rules
➢ Association rule mining “is a rule-based machine learning method for discovering
interesting relations between variables in large databases.”
➢ Example:
R: Beer, Chips → Diapers,
sup (R): 5%, conf(R): 50%
“If someone buys beer AND chips, then they also buy diapers 50% of the time.
5% of all customers buy all three items.”
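Support and confidence of the example rule can be computed directly from transaction data (a sketch; names are hypothetical):

```python
def support_confidence(transactions, antecedent, consequent):
    """sup(R) = fraction of transactions containing antecedent AND consequent;
    conf(R) = that count divided by the count containing the antecedent."""
    ante = set(antecedent)
    both = ante | set(consequent)
    n_ante = sum(1 for t in transactions if ante <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / len(transactions), n_both / n_ante

baskets = [
    {"beer", "chips", "diapers"},
    {"beer", "chips"},
    {"beer"},
    {"milk"},
]
# rule: beer, chips -> diapers
print(support_confidence(baskets, {"beer", "chips"}, {"diapers"}))  # (0.25, 0.5)
```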
Direct Discrimination
➢ Direct discrimination shows up in rules of the form
A, B → C, where A involves a protected attribute
➢ Example:
gender="female", saving_status="no known savings" → credit=no
⍺-protection
➢ Specify
▪ A discrimination measure dm
▪ a threshold ⍺ (e.g., ⍺=3 or ⍺=5)
➢ Given ⍺, a PD rule A, B → C is ⍺-protective if
dm(A, B → C) ≤ ⍺; a dataset is ⍺-protective if this holds for all its possible rules
➢ Example:
▪ Specify dm = extended lift, ⍺ = 2.5
▪ Find rule R: city=“New York” AND race=“black” → benefit=deny with
conf = 0.75
▪ Compare to baseline rule city=“New York” → benefit=deny with
conf = 0.25
▪ ➔ extended lift = 0.75 / 0.25 = 3.0
▪ ➔ R is discovered as an ⍺-discriminatory rule (extended lift 3.0 > ⍺ = 2.5)
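The extended-lift computation in the example is just a ratio of confidences; a minimal sketch:

```python
def extended_lift(conf_pd_rule, conf_base_rule):
    """Extended lift: confidence of the PD rule A, B -> C divided by the
    confidence of the base rule B -> C (same context, protected attribute
    removed)."""
    return conf_pd_rule / conf_base_rule

# slide example: city="New York" AND race="black" -> deny (conf 0.75)
# vs. baseline:  city="New York" -> deny (conf 0.25)
elift = extended_lift(0.75, 0.25)
print(elift, elift > 2.5)  # 3.0 True -> alpha-discriminatory for alpha = 2.5
```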
Classification Rule approach
➢ Find all ⍺-discriminatory rules
➢ Advantages:
▪ Finds (sub-)contexts with discrimination
▪ Easily interpretable
➢ Disadvantages:
▪ Often overlapping rules
▪ No global description of who is discriminated against
▪ Technically: requires discretization (on both the left- and the right-hand side of the rule)
Indirect Discrimination
➢ Consider a rule D, B → C such as
neighborhood=“12345”, city=“New York” → credit=no
➢ By itself, this rule does not indicate discriminatory behavior
➢ However, if we have ADDITIONAL background knowledge such as
neighborhood=“12345”, city=“New York” → race=“black”
with high confidence, then this still gives a hint of discriminatory behavior
9.4 Discrimination Avoidance
Fairness-aware data mining
➢ Goal: develop a decision-making process that
▪ Is non-discriminatory
▪ Preserves as much of the decision quality as possible
➔ a multi-objective problem
➢ Process:
▪ Define constraints w.r.t. discrimination measures
▪ Adapt or transform data/algorithm/model to satisfy the constraints
▪ Measure data/model quality (utility)
Suppression
➢ Idea 1: Remove the protected attribute from the data
▪ This is not enough!
▪ There are other attributes in the data that are correlated with the protected attribute
▪ Example:
Race is removed, but zip-code still contained
➢ Idea 2: Remove protected attribute and the k attributes most correlated with the
protected attribute
➢ Simple solution
▪ Probably not enough
▪ Better than nothing
Fairness through Awareness
➢ Suppression implements the idea of
“Fairness through Unawareness”
Methods
➢ Adapt classification in
▪ Pre-processing
• Suppression (see above)
• Massaging
• Reweighing
▪ In-processing
▪ Post-processing
Massaging
➢ Step 1: Rank individuals according to their probability of a positive outcome
- - - - - - - + + + + + unprotected
- - - - - - - - - + + + protected
- - - - - - - - + + + + unprotected
- - - - - - - - + + + + protected
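Only step 1 survives on the extracted slide; the remaining massaging steps (following Kamiran & Calders' general idea) flip labels at the boundary of the score-ranked lists until positive rates match, as the +/- rows above suggest. A rough sketch, assuming equal group sizes and lists sorted by ascending score:

```python
def massage(labels_prot, labels_unprot):
    """Flip labels at the decision boundary of score-ranked label lists
    (ascending score) until both groups have the same number of positives.
    Assumes equal group sizes; a sketch, not the exact slide procedure."""
    prot, unprot = list(labels_prot), list(labels_unprot)
    while sum(prot) < sum(unprot):
        # promote the protected group's highest-ranked negative ...
        prot[len(prot) - 1 - prot[::-1].index(0)] = 1
        # ... and demote the unprotected group's lowest-ranked positive
        unprot[unprot.index(1)] = 0
    return prot, unprot

# the slide's picture: 5 vs. 3 positives before, 4 vs. 4 after
prot, unprot = massage([0] * 9 + [1] * 3, [0] * 7 + [1] * 5)
print(sum(prot), sum(unprot))  # 4 4
```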
Reweighing & Sampling
➢ If one group (e.g., the protected group) has proportionally more positive examples,
➢ then assign lower weights to its positive examples and higher weights to its negative
examples
- - - - - - - + + + + + unprotected
- - - - - - - - - + + + protected
➢ Sampling:
Same idea, but instead of assigning weights, sample from the dataset (with
replacement, so a training example can occur multiple times)
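A sketch of the weight computation in the style of Kamiran & Calders, w(g, c) = P(g)·P(c) / P(g, c); the exact formula was not on the extracted slide, so treat it as an assumption:

```python
from collections import Counter

def reweighing(groups, labels):
    """Weights w(g, c) = P(g) * P(c) / P(g, c): over-represented
    (group, class) combinations get weight < 1, under-represented > 1."""
    n = len(labels)
    count_group = Counter(groups)
    count_label = Counter(labels)
    count_joint = Counter(zip(groups, labels))
    return {
        (g, c): (count_group[g] / n) * (count_label[c] / n) / (count_joint[(g, c)] / n)
        for (g, c) in count_joint
    }

# unprotected group has proportionally more positives (5/12 vs. 3/12)
groups = ["unprot"] * 12 + ["prot"] * 12
labels = [0] * 7 + [1] * 5 + [0] * 9 + [1] * 3
w = reweighing(groups, labels)
print(w[("unprot", 1)])  # 0.8  -> downweighted
print(w[("prot", 1)])    # ~1.33 -> upweighted
```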
Results
(Figure: accuracy vs. discrimination for no preprocessing, training without the sensitive
attribute, reweighing, sampling, and massaging variants)
(Dis-)Advantages: Fairness through Pre-processing
➢ Advantages:
▪ Simple
▪ Can be used for any downstream task (not just classification)
▪ Compatible with all classification algorithms
➢ Disadvantages:
▪ Cannot be used for all fairness metrics (calibration?)
▪ Often inferior in terms of results
Combined Optimization Function
➢ For any classifier with an optimization function
(Logistic Regression, SVM, Deep Learning, …)
Fairness constraints in loss functions
➢ Instead of having a combined fairness function, use constraints
➢ Constraints: hard restrictions,
”condition of an optimization problem that the solution must satisfy”
➢ Add constraints to the loss function bounding how discriminatory the model is on the
training data
➢ E.g., for logistic regression (just to convey the idea):
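The formula on the slide did not survive extraction; as a stand-in, here is a sketch that adds a soft penalty (rather than a hard constraint) on the covariance between the sensitive attribute and the distance to the decision boundary, in the spirit of Zafar et al. (all names and hyperparameters are assumptions):

```python
import numpy as np

def fair_logreg(X, y, s, lam=10.0, lr=0.1, steps=2000):
    """Logistic regression with an added penalty lam * cov(s, Xw)^2,
    pushing the decision boundary to be uncorrelated with s."""
    n, d = X.shape
    w = np.zeros(d)
    s_c = s - s.mean()                                  # centered sensitive attribute
    for _ in range(steps):
        z = X @ w
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # clip avoids overflow
        grad_ll = X.T @ (p - y) / n                     # logistic loss gradient
        cov = s_c @ z / n                               # cov(s, boundary distance)
        w -= lr * (grad_ll + lam * 2.0 * cov * (X.T @ s_c) / n)
    return w

# toy data: the first feature is correlated with the sensitive attribute s
rng = np.random.default_rng(0)
n = 200
s = rng.integers(0, 2, n).astype(float)
X = np.column_stack([rng.normal(size=n) + 1.5 * s, np.ones(n)])
y = (X[:, 0] > 0.75).astype(float)

w_plain = fair_logreg(X, y, s, lam=0.0)
w_fair = fair_logreg(X, y, s, lam=10.0)
print("cov(plain):", abs((s - s.mean()) @ (X @ w_plain)) / n)
print("cov(fair): ", abs((s - s.mean()) @ (X @ w_fair)) / n)
```

With the penalty active, the covariance between group membership and the score shrinks markedly, at some cost in accuracy, which is exactly the multi-objective trade-off described above.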
Results fairness constraints
Discrimination-aware decision trees
➢ In-processing technique
➢ Based on standard decision trees
➢ Adaptation 1: Alter the splitting criterion:
• Usually: maximize at each split the Gini criterion (impurity reduction) for the class
➔ separates positives and negatives
• Now: also minimize the Gini criterion for the protected attribute A
• Together: use Gini(class) – Gini(A)
➢ Adaptation 2: Leaf relabeling (for a pruned tree; see next slide)
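The adapted splitting criterion can be sketched with Gini impurity gains (helper names are hypothetical):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_score(y_left, y_right, a_left, a_right):
    """Gini gain for the class minus Gini gain for the protected attribute A:
    a split scores well only if it separates the classes WITHOUT
    separating protected from unprotected instances."""
    def gain(left, right):
        n = len(left) + len(right)
        return (gini(left + right)
                - (len(left) / n) * gini(left)
                - (len(right) / n) * gini(right))
    return gain(y_left, y_right) - gain(a_left, a_right)

# a split that separates the class but not A: good score
print(split_score([0, 0], [1, 1], [0, 1], [0, 1]))  # 0.5
# a split that separates A but not the class: penalized
print(split_score([0, 1], [0, 1], [0, 0], [1, 1]))  # -0.5
```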
Discrimination-aware decision trees
➢ Adaptation 2: Leaf relabeling
▪ Usually: class for a leaf = majority class
▪ If overall labels are disproportionate: flip the labels of some tree leaves whose training
examples are close to class parity
Algorithms: Two Bayes
➢ Calders and Verwer propose an adaptation to Naïve Bayes:
1. Split the dataset into two: one containing only men, one containing only women.
2. Train two Naïve Bayes models: Mm and Mw – a model for men and a model for women.
3. The overall model applies the appropriate model to new data points.
➢ “Their approach trains separate models for the [sensitive-feature] values and iteratively assesses the
fairness of the combined model under the CV measure, makes small changes to the observed
probabilities in the direction of reducing the measure, and retrains their two models.”
Calders and Verwer (2010)
Beyond Classification: Fair Ranking
➢ So far: binary decisions
➢ Ranking: Return an ordering of possible results
Example:
Randomized approach
➢ Compute two separate rankings, one for each group (e.g., males vs. females)
➢ Compute the overall share p of the minority group
➢ To create a final ranking:
▪ Start at the top
▪ At each position choose randomly:
• with probability p the next candidate from the minority
• with probability 1-p the next candidate from the majority
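The randomized merge described above can be sketched directly (names are hypothetical; both input rankings are assumed sorted by score):

```python
import random

def randomized_merge(minority, majority, p, seed=None):
    """Merge two score-sorted rankings: at every position pick the next
    minority candidate with probability p, else the next majority one."""
    rng = random.Random(seed)
    out, i, j = [], 0, 0
    while i < len(minority) or j < len(majority):
        take_minority = (j >= len(majority)) or (
            i < len(minority) and rng.random() < p
        )
        if take_minority:
            out.append(minority[i]); i += 1
        else:
            out.append(majority[j]); j += 1
    return out

# e.g., minority share p = 0.4
print(randomized_merge(["f1", "f2"], ["m1", "m2", "m3"], p=0.4, seed=42))
```

Within each group the original score order is preserved; only the interleaving is randomized.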
Deterministic approach
➢ Compute two separate rankings, one for each group (e.g., males vs. females)
➢ Compute the overall share p of the minority group
➢ First calculate a constraint table of how many candidates of the protected group
must be in the top-k (taking care of multiple comparisons)
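A sketch of such a constraint table using the binomial distribution (FA*IR-style; shown here without the multiple-comparison correction the slide mentions, which would further adjust the significance level):

```python
from math import comb

def binom_cdf(t, k, p):
    """P(X <= t) for X ~ Binomial(k, p)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(t + 1))

def min_protected_table(p, alpha, max_k):
    """For each prefix length k: the smallest number m of protected
    candidates the top-k may contain without rejecting, at level alpha,
    the hypothesis that candidates were drawn fairly with probability p."""
    table = {}
    for k in range(1, max_k + 1):
        m = 0
        while binom_cdf(m, k, p) < alpha:  # too few protected -> reject
            m += 1
        table[k] = m
    return table

# e.g., protected share p = 0.5, significance alpha = 0.1
print(min_protected_table(p=0.5, alpha=0.1, max_k=10))
```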
Fairness-aware data mining: Research challenges
Summary
➢ Fairness has lately become a very popular and relevant topic in the ML community
➢ Unfairness typically arises from learning from biased data
More Material
➢ Tutorial slides (a lot of which this lecture was based on):
▪ http://francescobonchi.com/KDD2016_Tutorial_Part1&2_web.pdf
▪ http://francescobonchi.com/KDD2016_Tutorial_Part3&4_web.pdf
▪ http://www.mlandthelaw.org/slides/hajian.pdf