
Week 9:

Fair Machine Learning


Winter term 2020 / 2021

Chair for Computational Social Science and Humanities


Markus Strohmaier, Florian Lemmerich, and Tobias Schumacher
Where we are

Sources and Resources
➢ Sara Hajian (Algorithmic Bias, KDD Tutorial 2016)
➢ S. Bird et al: Fairness-Aware Machine Learning in Practice
https://sites.google.com/view/fairness-tutorial
➢ S. Barocas and M. Hardt: Fair Machine Learning https://vimeo.com/248490141
➢ https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
➢ Fairness, part 1 - Moritz Hardt - MLSS 2020, Tübingen:
https://www.youtube.com/watch?v=Igq_S_7IfOU

Agenda
➢ Discrimination and Biases in Machine Learning
➢ Fairness Measures
➢ (Discrimination Detection)
➢ Discrimination Avoidance

9.1 Bias and Discrimination in ML

Decision Making

➢ More and more important decisions are made by algorithms, not by humans

➢ Human decisions:
▪ Objective elements (“rational”)
▪ Subjective elements (“emotional”, “prejudiced”)

➢ Algorithms:
▪ Only based on objective inputs

Amazon hiring

https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
Search results for “C.E.O.”
➢ 27% of C.E.O.s are female (US, 2015)

➢ M. Kay, C. Matuszek, S. Munson (2015): Unequal Representation and Gender
Stereotypes in Image Search Results for Occupations. CHI'15.

➢ Given the same other profile information and search history,
Google advertises higher-paying jobs to male accounts

US Healthcare system

“For example, among all patients classified as very high-risk, black individuals turned out to have 26.3 percent more chronic illnesses than white ones (despite sharing similar risk scores). Because their recorded health care costs were on par with those of healthier white people, the program was less likely to flag eligible black patients for high-risk care management.”

https://www.scientificamerican.com/article/racial-bias-found-in-a-major-health-care-risk-algorithm/

Example credit scoring

What is fair?

’I looked up fairness in the dictionary and it was not there.’

- William Giraldi

➢ What should our algorithm achieve to be fair?

➢ Utilitarian: Just show what the user wants to see


➢ Descriptive: Based on how the world currently really is
➢ Normative: Show how the world should be

➢ These goals can be contradictory

Biases in ML
➢ Discrimination: In principle, the goal of ML classification is to
discriminate (= distinguish) between positive and negative examples

➢ Discrimination in ML most often used as “unfair discrimination against one group”

➢ But… aren’t algorithms objective? How can something be “unfair”?

➢ Typically, the bias does not originate in the ML algorithm

➢ But algorithms learn from biased data!

Bias in Machine Learning Data
➢ Biased labels and labelers: classes are often assigned by humans who bring in their
conscious/unconscious prejudices, etc.
Example: Judges give higher sentences to accused people of color; an ML algorithm
learns from this and suggests higher sentences for those with similar crimes

➢ Skewed sample: Future observations confirm predictions


Example: send police primarily where more crime is predicted, find more crimes there

➢ Sample size disparity: More training data for one group of people
Example: Train image recognition from a training set that contains mostly white males

➢ Feature accuracy: Data might have been collected more reliably for one group than for
another:
Example: People of color go to the doctor less often, thus less precise data on them
might be available
Causes of Unfairness
➢ Cultural differences lead to imbalanced prediction errors

https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de

https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb

Legal: Protected classes
➢ Protected attributes depend on the jurisdiction you are under (e.g., US vs EU)
➢ Germany: Discrimination is prohibited in Germany on six grounds:
▪ Race and ethnic origin
▪ Gender
▪ Religion and worldview
▪ Disability and chronic disease
▪ Age
▪ Sexual orientation
➢ Discrimination is forbidden in working life and day-to-day life, e.g.:
▪ When applying for a job
▪ Payment
▪ Renting an apartment
▪ Going to a club
▪ Opening a bank account
▪ …
https://www.antidiskriminierungsstelle.de/SharedDocs/Downloads/DE/publikationen/Refugees/Fluechtlingsbroschuere_englisch.pdf?__blob=publicationFile&v=13
Two Legal Doctrines in the US
➢ Disparate treatment:
▪ Purposeful consideration of group membership
▪ Intentional discrimination
▪ Goal: procedural fairness

➢ Disparate impact:
▪ Avoidable and unjustified harm, possibly indirect
▪ Without consideration of group membership
▪ Goal: minimize differences in outcomes (distributive justice)

Potential conflict!

Moritz Hardt, MLSS 2020, Tübingen: https://www.youtube.com/watch?v=Igq_S_7IfOU
Legal
➢ Anti-discrimination regulations in place

➢ Provide a framework of definitions, objectives, constraints


➢ EU: explicitly requires data controllers to “implement appropriate technical and organizational measures” that “prevents, inter alia, discriminatory effects” on the basis of processing sensitive data
➢ E.g., GDPR “Right to explanation”

Goodman, Flaxman: European Union regulations on algorithmic decision-making and a “right to explanation”, 2016
Laws and politics/policies
➢ Legal regulations give definitions/constraints:
▪ Might change
▪ Are regional

➢ Assessment is not clear:


▪ Is my data/algorithm legal?
▪ Is my discrimination detection useful in a lawsuit?

Technical perspective
➢ Fighting discrimination requires:

▪ Methods to measure discrimination


▪ Methods to detect discrimination
▪ Methods to avoid discrimination

9.2 Measuring Fairness
Measuring discrimination

➢ Discrimination at the individual level ➔ Consistency in the decision
➢ Discrimination at the group level ➔ Statistical parity

Individual fairness
➢ Compare the decision of the instance with its nearest neighbors
“Similar properties should lead to similar outcomes”

➢ Consistency score:
▪ Assume binary decision yp of the algorithm
▪ Compute for a point p the set N(p) as the set of k-nearest neighbors
(acc. to some distance function)
▪ C = 1 − (1 / (N·k)) · Σ_p Σ_{j ∈ N(p)} |y_p − y_j|
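A minimal sketch of how this consistency score could be computed, assuming a numeric feature matrix X, binary predictions y_pred, and scikit-learn for the nearest-neighbor search:

    # Minimal sketch of the consistency score C (numpy and scikit-learn assumed).
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def consistency_score(X, y_pred, k=5):
        """C = 1 - 1/(N*k) * sum_p sum_{j in N(p)} |y_p - y_j| for binary y_pred."""
        X, y_pred = np.asarray(X, dtype=float), np.asarray(y_pred)
        n = len(y_pred)
        # k+1 neighbors, because the nearest neighbor of each point is the point itself
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        neighbor_labels = y_pred[idx[:, 1:]]               # drop the point itself
        diffs = np.abs(y_pred[:, None] - neighbor_labels)  # |y_p - y_j| for all neighbors j
        return 1.0 - diffs.sum() / (n * k)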

Four-fifths rule
➢ “a selection rate for any race, sex, or ethnic group which is less than four-fifths (or 80%) of
the rate for the group with the highest rate will generally be regarded by the Federal
enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate
will generally not be regarded by Federal enforcement agencies as evidence of adverse
impact.”
➢ Origin: assembled by the State of California Fair Employment Practice Commission
(FEPC) in 1971

https://www.prevuehr.com/resources/insights/adverse-impact-analysis-four-fifths-rule/
Another explanation of the four-fifths rule (from Wikipedia)
“The rule was based on the rates at which job applicants were hired. For example, if
XYZ Company hired 50 percent of the men applying for work in a predominantly
male occupation while hiring only 20 percent of the female applicants, one could
look at the ratio of those two hiring rates to judge whether there might be a
discrimination problem. The ratio of 20:50 means that the rate of hiring for female
applicants is only 40 percent of the rate of hiring for male applicants. That is, 20
divided by 50 equals 0.40, which is equivalent to 40 percent. Clearly, 40 percent is
well below the 80 percent that was arbitrarily set as an acceptable difference in
hiring rates. Therefore, in this example, XYZ Company could have been called upon
to prove that there was a legitimate reason for hiring men at a rate so much higher
than the rate of hiring women.”

https://en.wikipedia.org/wiki/Disparate_impact
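A small sketch of this check, using the hiring rates from the XYZ example above (the applicant counts of 100 per group are made up purely for illustration):

    # Four-fifths (80%) rule check; counts are hypothetical, only the rates matter.
    def adverse_impact_ratio(selected_prot, applicants_prot, selected_unprot, applicants_unprot):
        rate_prot = selected_prot / applicants_prot        # e.g. 20% hiring rate for women
        rate_unprot = selected_unprot / applicants_unprot  # e.g. 50% hiring rate for men
        return rate_prot / rate_unprot

    ratio = adverse_impact_ratio(20, 100, 50, 100)
    print(ratio)         # 0.4
    print(ratio < 0.8)   # True -> regarded as evidence of adverse impact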

Group Fairness I: Statistical parity
➢ Create a four-field table of counts:

    group         Benefit denied   Benefit granted   sum
    protected     a                b                 n_prot
    unprotected   c                d                 n_unprot
    sum           m_denied         m_granted         n

➢ Goal I: Equal positive rate:
▪ b / n_prot ≈ d / n_unprot

Statistical parity (discrimination measures)
➢ Create a four-field table of counts (as before):

    group         Benefit denied   Benefit granted   sum
    protected     a                b                 n_prot
    unprotected   c                d                 n_unprot
    sum           m_denied         m_granted         n

➢ Risk difference: RD = a/n_prot − c/n_unprot   (mentioned in UK law)
➢ Risk ratio (aka “relative risk”): RR = (a/n_prot) / (c/n_unprot)   (mentioned by the EU Court of Justice)
➢ Relative chance: RC = (1 − a/n_prot) / (1 − c/n_unprot)   (mentioned by US courts)
➢ Extended lift: elift = (a/n_prot) / (m_denied/n)
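A minimal sketch computing these four measures from the cell counts a, b, c, d of the table above (plain Python; the example numbers are made up):

    # Discrimination measures from the four-field table (a, b, c, d as defined above).
    def parity_measures(a, b, c, d):
        n_prot, n_unprot = a + b, c + d
        n = n_prot + n_unprot
        m_denied = a + c
        p_prot, p_unprot = a / n_prot, c / n_unprot   # denial rates per group
        return {
            "risk_difference": p_prot - p_unprot,
            "risk_ratio": p_prot / p_unprot,
            "relative_chance": (1 - p_prot) / (1 - p_unprot),
            "extended_lift": p_prot / (m_denied / n),
        }

    # Example: 30 of 100 protected vs. 15 of 100 unprotected applicants denied.
    print(parity_measures(a=30, b=70, c=15, d=85))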

Other measures
➢ Compare protected group vs entire population
➢ Differences of means
➢ Rank tests
➢ Mutual information
➢ …

See Zliobaite, Indre. "A survey on measuring indirect discrimination in machine learning." arXiv preprint arXiv:1511.00148 (2015).

Discrimination paradox
➢ There can be easy explanations for apparent discrimination

➢ Acceptance rates:
▪ Males: 26%
▪ Females: 24%
➢ Fully explainable by the fact that more females apply to medicine (which is more
competitive)
➢ Parallel to Simpson’s Paradox

Discrimination paradox: Corrected Measurement
➢ Do a stratified analysis:
▪ What should be the (one) acceptance rate for each faculty?
➢ P*(accepted | faculty) = [ P(accepted | faculty, male) + P(accepted | faculty, female) ] / 2

➢ Solution: Locally change the input (see the sketch at the end of this slide):
1. Divide the dataset according to the explanatory attribute(s) into partitions p_i
2. Estimate P*(accepted | p_i) for all partitions p_i
3. Apply local techniques on each partition p_i such that
P(accepted | p_i, female) = P(accepted | p_i, male) = P*(accepted | p_i) becomes true

➢ Multiple explanatory attributes:


▪ Build groups of individuals
▪ By clustering based on explanatory attributes
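A sketch of this stratified analysis, assuming pandas and a hypothetical DataFrame with columns 'faculty', 'gender' (values 'm'/'f') and a binary 'accepted':

    # Sketch: per-faculty target rate P* and each group's deviation from it (pandas assumed).
    import pandas as pd

    def stratified_gap(df):
        # acceptance rate per faculty and gender
        rates = df.groupby(["faculty", "gender"])["accepted"].mean().unstack("gender")
        # P*(accepted | faculty) = average of the male and female acceptance rates
        p_star = (rates["m"] + rates["f"]) / 2
        # how far each gender's rate is from P* within each faculty (~0 after correction)
        return rates.sub(p_star, axis=0)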

What else can be unfair?

Group Fairness II: Equal error rates
➢ Another type of unfairness:
▪ Good/informed decisions in one group, poor/random ones in another
▪ Still equal positive rate
▪ Can often happen if one group has too little data

▪ Example: Framingham risk score: a risk score for coronary heart disease that was
created for white men, then used for other patients

➢ Goal II: Equalize error rates:
P(D=1 | Y=0, A=a) = P(D=1 | Y=0, A=b)   (equal false positive rates)
P(D=0 | Y=1, A=a) = P(D=0 | Y=1, A=b)   (equal false negative rates)
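A minimal sketch of how these per-group error rates could be computed (numpy assumed; D = decision, Y = true outcome, A = group):

    # Sketch: false positive / false negative rates per group.
    import numpy as np

    def error_rates_by_group(y_true, decision, group):
        y_true, decision, group = map(np.asarray, (y_true, decision, group))
        rates = {}
        for g in np.unique(group):
            m = group == g
            fpr = np.mean(decision[m & (y_true == 0)] == 1)  # P(D=1 | Y=0, A=g)
            fnr = np.mean(decision[m & (y_true == 1)] == 0)  # P(D=0 | Y=1, A=g)
            rates[g] = {"FPR": fpr, "FNR": fnr}
        return rates  # Goal II: these rates should (approximately) match across groups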

Group Fairness III: Calibration
➢ A score R (the output of an ML algorithm) is calibrated if P(Y=1 | R=r) = r for every score value r
➢ “You can pretend the ML result score is a probability”

➢ Calibration by group: P(Y=1 | R=r, A=a) = r for every group a

➢ “[…] requires that outcomes are independent of protected attributes after controlling for estimated risk”

➢ Does not require estimating probabilities at the individual level, but only in comparison between groups

➢ Example:
Among loan applicants estimated to have a 10% chance of default, calibration
requires that whites and blacks default at similar rates

Corbett-Davies & Goel: https://arxiv.org/pdf/1808.00023.pdf
Image: https://www.youtube.com/watch?v=Igq_S_7IfOU
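A minimal sketch of a calibration-by-group check that bins the score R and compares the empirical positive rate per bin across groups (numpy assumed, scores in [0, 1]):

    # Sketch: calibration by group via score binning.
    import numpy as np

    def calibration_by_group(scores, y_true, group, bins=10):
        scores, y_true, group = map(np.asarray, (scores, y_true, group))
        edges = np.linspace(0.0, 1.0, bins + 1)
        bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, bins - 1)
        table = {}
        for g in np.unique(group):
            rates = []
            for b in range(bins):
                m = (group == g) & (bin_idx == b)
                # empirical P(Y=1 | R in bin b, A=g); NaN if the bin is empty for this group
                rates.append(y_true[m].mean() if m.any() else float("nan"))
            table[g] = rates
        return table  # calibrated by group: each entry is close to its bin's score value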
Incompatibility of Fairness
➢ Can we always have fairness according to all fairness measures?

NO!

➢ If we look at three fairness measures fulfilling Goals I-III, we can find examples
that they are pairwise incompatible!

➢ ➔ There can never be one “correct” one-and-only fairness measure

The COMPAS debate

COMPAS
➢ The COMPAS risk score is used in the legal system in the US to assess “risk of
recidivism” (committing another crime) as a support system
Judges may detain defendants (partly) based on this score

➢ ProPublica organization:
▪ Black defendants face higher false positive rates
▪ I.e., Black labeled as “high” end up more often not committing a crime upon release
compared to whites

➢ COMPAS-maker: Scores are calibrated by group, and Black defendants have a higher base rate of recidivism
➔ the difference in false positive rates is unavoidable

Conflict between fairness measures example

Scores: prediction of outcomes AND correct probability of recidivism

Fairness, part 1 - Moritz Hardt - MLSS 2020, Tübingen: https://www.youtube.com/watch?v=Igq_S_7IfOU

Conflict between fairness measures example: Undesired solutions

Scores: prediction of outcomes AND correct probability of recidivism

Fairness, part 1 - Moritz Hardt - MLSS 2020, Tübingen: https://www.youtube.com/watch?v=Igq_S_7IfOU

Recap COMPAS
➢ Scholarly debate: Tension between fairness measures

➢ Actual problem: What do we actually want?

➢ Maybe the problem is not how we predict, but that we predict?

9.3 Discrimination Discovery
Discrimination Discovery beyond Data Analysis
➢ Large Audits
➢ “Situation Testing”:
▪ Pairs of people with identical attributes, but with a single different characteristic (sex,
gender, etc.), are actively sent into a potentially discriminating (or already suspicious)
situation

http://www.ecmikosovo.org/en/Situation-testing

▪ Downsides:
• Luring people into a crime
• Full comparability is often difficult

Discovery of discrimination
➢ Discrimination discovery task:
“Given a large database of historical decision records, find discriminatory
situations and practices.”
➢ Discovery of discrimination is considered a difficult task since
▪ It can be measured by many different concepts
▪ There are many different contexts, in which discrimination can occur
▪ Often indirect

➢ Direct discrimination:
▪ The protected attribute is used (e.g., in rules) such that protected groups
(potentially discriminated “PD” groups) are at a disadvantage
➢ Indirect discrimination:
▪ The actual protected attribute might not even be part of the dataset
▪ E.g., removed in pre-processing for privacy

Background: Association Rules
➢ Association rule mining “is a rule-based machine learning method for discovering
interesting relations between variables in large databases.”

➢ A probabilistic rule R: X → Y has statistics


▪ Support sup(R): What is the proportion of instances such that X and Y both hold
▪ Confidence conf(R): If X holds, how likely is it that Y holds

▪ X,Y can be conjunctions of conditions


▪ Efficient algorithms to mine them such as A-Priori, FP-Trees, …

➢ Example:
R: Beer, Chips → Diapers,
sup (R): 5%, conf(R): 50%
“If someone buys beer AND chips, then they also buy diapers 50% of the time.
5% of all customers buy all three items”
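A tiny sketch of support and confidence over transactions represented as Python sets (the baskets are made up; real data would be mined with Apriori, FP-trees, etc.):

    # Sketch: support and confidence of a rule X -> Y.
    def support(transactions, itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(transactions, X, Y):
        return support(transactions, X | Y) / support(transactions, X)

    baskets = [{"beer", "chips", "diapers"}, {"beer", "chips"},
               {"milk"}, {"beer", "chips", "diapers", "milk"}]
    body, head = {"beer", "chips"}, {"diapers"}
    print(support(baskets, body | head))     # sup(R)
    print(confidence(baskets, body, head))   # conf(R)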
Direct Discrimination
➢ Direct discrimination is if we can find rules of the form

A, B → C,

A is a potentially discriminated group


B is the context (“under which circumstances is A discriminated against”)
C is the decision (“benefit granted or not”)

➢ Example:
gender="female", saving_status="no known savings" → credit=no

⍺-protection
➢ Specify
▪ A discrimination measure dm
▪ a threshold ⍺ (e.g., ⍺=3 or ⍺=5)
➢ Given ⍺, a PD rule A, B → C is ⍺-protective if dm(A, B → C) ≤ ⍺ (and ⍺-discriminatory otherwise);
a dataset is ⍺-protective if this holds for all PD rules extractable from it

➢ Example:
▪ Specify dm = extended lift, ⍺=2.5
▪ Find rule R: city=“New York” AND race = “black” → benefit = deny with
conf = 0.75
▪ Compare to baseline rule city=“New York” → benefit = deny with
conf = 0.25
▪ ➔ extended lift = 3.0
▪ ➔ R is discovered as an ⍺-discriminatory rule (extended lift 3.0 > ⍺ = 2.5)
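A small sketch of the ⍺-protection check via extended lift, with rule items such as 'race=black' encoded as strings and records represented as sets (analogous to the association-rule sketch above):

    # Sketch: extended lift elift(A,B -> C) = conf(A,B -> C) / conf(B -> C).
    def _conf(records, body, head):
        covered = [r for r in records if body <= r]
        return sum(head <= r for r in covered) / len(covered)

    def extended_lift(records, A, B, C):
        return _conf(records, A | B, C) / _conf(records, B, C)

    def is_alpha_protective(records, A, B, C, alpha=2.5):
        # the rule A,B -> C is alpha-protective iff its extended lift stays below alpha
        return extended_lift(records, A, B, C) <= alpha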

Classification Rule approach
➢ Find all ⍺ discriminatory rules

➢ Advantage:
▪ Finds (sub-) contexts with discrimination
▪ Easily interpretable

➢ Disadvantages:
▪ Often overlapping rules
▪ No global description of who is discriminated

▪ Technically: Requires discretization (on both the left- and the right-hand side of the rule)

Indirect Discrimination
➢ Consider a rule D, B → C such as
neighborhood=”12345", city="New York" → credit=no
➢ This rule does not by itself indicate discriminatory behavior
➢ However, if we have ADDITIONAL background knowledge such as
neighborhood=”12345", city="New York" → race = “black”
with high confidence, then this still gives a hint towards discriminatory behavior

9.4 Discrimination Avoidance
Fairness-aware data mining
➢ Goal: develop a decision making process that is
▪ Non-discriminatory
▪ Preserves as much as possible of the decision quality
▪ Multi-objective problem

Fairness ↔ trade-off ↔ Utility

➢ Process:
▪ Define constraints w.r.t. discrimination measures
▪ Adapt or transform data/algorithm/model to satisfy the constraints
▪ Measure data/model quality (utility)

Suppression
➢ Idea 1: Remove the protected attribute from the data
▪ This is not enough!
▪ There are other attributes in the data that are correlated with the protected attribute
▪ Example:
Race is removed, but zip-code still contained

➢ Idea 2: Remove protected attribute and the k attributes most correlated with the
protected attribute

➢ Simple solution
▪ Probably not enough
▪ Better than nothing
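A minimal sketch of Idea 2, assuming pandas, a numerically encoded DataFrame, and a hypothetical column name 'protected':

    # Sketch: drop the protected attribute and the k features most correlated with it.
    import pandas as pd

    def suppress(df, protected="protected", k=2):
        corr = df.corr()[protected].drop(protected).abs()   # |correlation| with the protected attribute
        to_drop = [protected] + corr.nlargest(k).index.tolist()
        return df.drop(columns=to_drop)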

Fairness through Awareness
➢ Suppression implements the idea of
“Fairness through Unawareness”

➢ Advantage: Arguably in line with individual fairness (Fair treatment)

➢ ➔ In the literature, “Fairness through Awareness” is claimed to be the better solution

Methods
➢ Adapt classification in
▪ Pre-processing
• Suppression (see above)
• Massaging
• Reweighing

▪ In-processing

▪ Post-processing

Massaging
➢ Step 1: Rank individuals according to their probability of a positive outcome

- - - - - - - + + + + + unprotected

- - - - - - - - - + + + protected

➢ Step 2: Change the labels for the training data (!)

- - - - - - - - + + + + unprotected

- - - - - - - - + + + + protected

➢ Intuition: We change the labels of instances “close to the border”
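A sketch of massaging along these lines, assuming numpy/scikit-learn; the inputs X, y and the boolean mask protected are hypothetical, and a plain logistic regression serves as the ranker:

    # Sketch: flip the labels of borderline candidates until both groups share the same positive rate.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def massage(X, y, protected):
        y = np.asarray(y).copy()
        score = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        n_p, n_u = protected.sum(), (~protected).sum()
        # number of flips m so that (pos_p + m)/n_p == (pos_u - m)/n_u
        m = int(round((n_p * y[~protected].sum() - n_u * y[protected].sum()) / (n_p + n_u)))
        m = max(m, 0)  # only handles the case where the protected group is disadvantaged
        # promote the m highest-scored negatives in the protected group ...
        prom = np.where(protected & (y == 0))[0]
        y[prom[np.argsort(score[prom])[::-1][:m]]] = 1
        # ... and demote the m lowest-scored positives in the unprotected group
        dem = np.where(~protected & (y == 1))[0]
        y[dem[np.argsort(score[dem])[:m]]] = 0
        return y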

Reweighing & Sampling
➢ If one group (e.g., protected group) has proportionally more positive examples
➢ Then assign lower weights for positive examples and higher weights for negative
examples

- - - - - - - + + + + + unprotected

- - - - - - - - - + + + protected

➢ Sampling:
Same idea, but instead of assigning weights, sample from the dataset (with
replacement, so individual training examples can occur multiple times)
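A minimal sketch of reweighing, assigning each (group, label) cell the weight expected proportion / observed proportion (numpy assumed):

    # Sketch: weights that make group membership and label statistically independent.
    import numpy as np

    def reweigh(group, y):
        group, y = np.asarray(group), np.asarray(y)
        w = np.empty(len(y), dtype=float)
        for g in np.unique(group):
            for label in np.unique(y):
                cell = (group == g) & (y == label)
                expected = (group == g).mean() * (y == label).mean()  # under independence
                w[cell] = expected / cell.mean()                      # observed proportion below
        return w  # pass as sample_weight to any classifier that supports it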

Results

(Figure comparing the preprocessing approaches: no preprocessing, removing the sensitive attribute, reweighing, sampling, and massaging variants)

(Dis-)Advantages: Fairness through Pre-processing
➢ Advantage:
▪ Simple
▪ Can be used for any downstream task (not just classification)
▪ Is compatible with all classification algorithms

➢ Disadvantage:
▪ Cannot be used for all Fairness metrics (calibration?)
▪ Often inferior in terms of results

Combined Optimization Function
➢ For any classifier with an optimization function
(Logistic Regression, SVM, Deep Learning, …)

➢ Instead of the original optimization function O(x) (inputs x), use:


O’ (x) = O(x) + F(x),

where F(x) is a fairness penalty (a measure of unfairness) computed on the training data

➢ Optimize directly (e.g. by computing a gradient for O’(x))


➢ Alternative: Adversarial Training:
▪ Do a step towards optimizing O(x)
▪ Do a step towards optimizing F(x)
▪ Iterate until a termination criterion is met
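A minimal sketch of the penalized objective for logistic regression, using the squared statistical-parity gap as the fairness term F; the penalty weight lam and the inputs X, y, protected are hypothetical, and scipy is assumed:

    # Sketch: minimize log-loss + lam * (parity gap)^2, i.e. O'(w) = O(w) + F(w).
    import numpy as np
    from scipy.optimize import minimize

    def fit_fair_logreg(X, y, protected, lam=1.0):
        X1 = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column

        def objective(w):
            p = 1.0 / (1.0 + np.exp(-(X1 @ w)))
            logloss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
            parity_gap = p[protected].mean() - p[~protected].mean()
            return logloss + lam * parity_gap ** 2

        res = minimize(objective, x0=np.zeros(X1.shape[1]), method="L-BFGS-B")
        return res.x  # learned weights (last entry is the intercept)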

Fairness constraints in loss functions
➢ Instead of having a combined fairness function, use constraints
➢ Constraints: hard restrictions,
”condition of an optimization problem that the solution must satisfy”
➢ Add constraints to the loss function that bound how discriminatory the classifier is on the training data
➢ E.g., for logistic regression (just for the idea; see the sketch at the end of this slide):

➢ Challenge: How to solve this efficiently


➢ Is in line with legal regulation (four-fifths rule)
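A sketch of the constrained variant (just for the idea): minimize the log-loss subject to a hard bound eps on the statistical-parity gap, using scipy's SLSQP solver; eps and the inputs are hypothetical:

    # Sketch: constrained logistic regression, parity gap <= eps as a hard constraint.
    import numpy as np
    from scipy.optimize import minimize

    def fit_constrained_logreg(X, y, protected, eps=0.05):
        X1 = np.hstack([X, np.ones((len(X), 1))])

        def predict(w):
            return 1.0 / (1.0 + np.exp(-(X1 @ w)))

        def logloss(w):
            p = predict(w)
            return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

        def parity_gap(w):
            p = predict(w)
            return abs(p[protected].mean() - p[~protected].mean())

        cons = [{"type": "ineq", "fun": lambda w: eps - parity_gap(w)}]  # i.e. gap <= eps
        res = minimize(logloss, np.zeros(X1.shape[1]), method="SLSQP", constraints=cons)
        return res.x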

Results fairness constraints

Discrimination-aware decision trees
➢ In-processing technique
➢ Based on standard decision trees
➢ Adaptation 1: Alter the splitting criterion:
• Usually: maximize at each split the Gini criterion for the class
➔ separates positives and negatives
• Now: also minimize the Gini criterion for the protected attribute A
• Together: use Gini(class) – Gini(A)
➢ Adaptation 2: Leaf relabeling (for the pruned tree, see next slide)
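A minimal sketch of the altered splitting criterion Gini(class) − Gini(A); the surrounding tree-building code is omitted and numpy is assumed:

    # Sketch: score a candidate split by Gini gain w.r.t. the class minus Gini gain w.r.t. A.
    import numpy as np

    def gini(labels):
        if len(labels) == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def gini_gain(labels, left_mask):
        n, n_left = len(labels), left_mask.sum()
        children = (n_left * gini(labels[left_mask]) + (n - n_left) * gini(labels[~left_mask])) / n
        return gini(labels) - children

    def fair_split_score(y, a, left_mask):
        # reward separating the classes, penalize separating the protected attribute a
        return gini_gain(y, left_mask) - gini_gain(a, left_mask)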

Discrimination-aware decision trees
➢ Adaptation 2: Leaf relabeling
▪ Usually: class for a leaf = majority class
▪ If the overall labels are disproportionate: Flip the predicted label of some tree leaves,
preferably those whose training examples are close to class parity

Algorithms: Two Bayes
➢ Calders and Verwer propose an adaptation to Naïve Bayes:
1. Split the dataset into two: one containing only men, one containing only women.
2. Train two Naïve Bayes models: Mm and Mw – a model for men and a model for women.
3. The overall model applies the appropriate model to new data points.
➢ “Their approach trains separate models for the [sensitive-feature] values and iteratively assesses the
fairness of the combined model under the CV measure, makes small changes to the observed
probabilities in the direction of reducing the measure, and retrains their two models.”

Calders and Verwer (2010)
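A minimal sketch of the split-model idea using scikit-learn's GaussianNB; the iterative probability adjustment described in the quote above is omitted:

    # Sketch: one Naive Bayes model per sensitive-attribute value (numpy/scikit-learn assumed).
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    class TwoNaiveBayes:
        def fit(self, X, y, sensitive):
            self.models = {s: GaussianNB().fit(X[sensitive == s], y[sensitive == s])
                           for s in np.unique(sensitive)}
            return self

        def predict(self, X, sensitive):
            y_pred = np.empty(len(X), dtype=int)
            for s, model in self.models.items():
                mask = sensitive == s
                if mask.any():
                    y_pred[mask] = model.predict(X[mask])  # apply the group's own model
            return y_pred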
Beyond Classification: Fair Ranking
➢ So far: decision binary
➢ Ranking: Return an ordering of possible results
Example:

➢ Goal: In every top-k, the ratio of the protected group is not significantly different
(according to a specific test) from the majority

Randomized approach
➢ Compute two separate rankings, one for each group (e.g., males vs. females)
➢ Compute the overall share p of the minority group
➢ To create a final ranking:
▪ Start at the top
▪ At each position choose randomly:
• with probability p the next candidate from the minority
• with probability 1-p the next candidate from the majority

Ke Yang and Julia Stoyanovich. 2016. Measuring Fairness in Ranked Outputs. In Proc. of FATML.
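A small sketch of the randomized merge, assuming the two group-specific rankings are already sorted best-first:

    # Sketch: merge two rankings, drawing the next candidate from the minority with probability p.
    import random

    def randomized_merge(minority, majority, p, seed=None):
        rng = random.Random(seed)
        minority, majority = list(minority), list(majority)
        merged = []
        while minority or majority:
            if minority and (not majority or rng.random() < p):
                merged.append(minority.pop(0))   # with probability p: next minority candidate
            else:
                merged.append(majority.pop(0))   # otherwise: next majority candidate
        return merged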

Deterministic approach
➢ Compute two separate rankings, one for each group (e.g., males vs. females)
➢ Compute the overall share p of the minority group
➢ First calculate a constraint table stating how many candidates of the protected group
must be in every top-k (taking care of multiple comparisons)

➢ Fill the final result from top to bottom (see the sketch below):
▪ If the constraint table requires it, choose the next candidate from the minority
▪ Otherwise choose the overall better candidate
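A simplified sketch of this approach: the constraint table comes from a binomial test at level alpha (scipy assumed), the multiple-comparison correction mentioned above is omitted, and candidates are assumed to be (score, id) tuples sorted best-first:

    # Sketch: minimum protected candidates per top-k, then a greedy constrained merge.
    from scipy.stats import binom

    def constraint_table(n, p, alpha=0.1):
        # required[k]: roughly the smallest m with P(Binomial(k, p) <= m) >= alpha
        return {k: int(binom.ppf(alpha, k, p)) for k in range(1, n + 1)}

    def deterministic_merge(minority, majority, p, alpha=0.1):
        minority, majority = list(minority), list(majority)
        n = len(minority) + len(majority)
        required = constraint_table(n, p, alpha)
        merged, placed_min = [], 0
        for k in range(1, n + 1):
            must_take_minority = bool(minority) and placed_min < required[k]
            minority_is_better = bool(minority) and (not majority or minority[0][0] >= majority[0][0])
            if must_take_minority or minority_is_better:
                merged.append(minority.pop(0))
                placed_min += 1
            else:
                merged.append(majority.pop(0))
        return merged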

Fairness-aware data mining: Research challenges

Fairness ↔ trade-off ↔ Utility

➢ Problem 1: Ground truth of the result:
utility/accuracy is evaluated on potentially biased training data
➢ Problem 2: Hard to assess fairness:
which measure is the right one? There is more than one definition
➢ No standardized test setting: few standard datasets

Summary
➢ Fairness has become a very popular and relevant topic in the ML community lately
➢ Unfairness arises typically by learning from biased data

➢ There are many different measures for quantifying unfairness


▪ Individual fairness
▪ Group fairness:
• Statistical parity
• Equal error rates
• Calibration
➢ Fairness measures can be contradictory
➢ Typically, there is a trade-off between fairness and accuracy
➢ Fairness through awareness: Just ignoring the sensitive attribute is not enough
➢ Many different approaches and algorithms to improve fairness

More Material
➢ Tutorial slides (a lot of which this lecture was based on):
▪ http://francescobonchi.com/KDD2016_Tutorial_Part1&2_web.pdf
▪ http://francescobonchi.com/KDD2016_Tutorial_Part3&4_web.pdf
▪ http://www.mlandthelaw.org/slides/hajian.pdf

➢ Video of the Tutorial


▪ https://www.youtube.com/watch?v=mJcWrfoGup8
▪ https://www.youtube.com/watch?v=nKemhMbaYcU&t=3794s
▪ https://www.youtube.com/watch?v=ErgHjxJsEKA&t=2891s
