A few approaches have been proposed to achieve the above aim, including multi-party computation [12, 14] and demographic proxies [7, 10]. Basically, the former assumes sensitive data are held by a trusted third party which can participate in model training, while the latter achieves fairness based on a proxy demographic feature. Both directions have certain limits, however. For example, it is not always realistic to assume the presence of a third party which can hold private data and participate in expensive model training as instructed, and even then the party may be vulnerable to inference attacks [11]. On the other hand, one may achieve only limited fairness based on a proxy demographic feature that is not close to the true feature – embarrassingly, if the two features are indeed close, then one fails to protect demographic privacy. To lift these limitations, we propose a new direction called active demographic query.
2.2 Active Learning

Our proposal is motivated by active learning [18], which actively queries training data to efficiently improve model performance. While traditional active learning focuses on querying labels [8] or unlabeled data [15] to improve model accuracy, we propose to query demographic data to improve model fairness. To the best of our knowledge, this is the first work that formally connects active learning and fairness.

We propose two active query strategies. QmCo is motivated by the disagreement-based strategy [8], which queries the label of an instance disagreed on by a set of models, and by Dwork's decoupled fair model [4], which learns separate models for different demographic groups – QmCo integrates both works. QmRe is partly motivated by [13], which aims to achieve fairness on the subgroup that suffers the most disparity – we query demographic data for the instance that suggests the highest disparity of the present model.
3 METHODOLOGY

3.1 Preliminaries

We describe a random instance by a triple (𝑥, 𝑠, 𝑦), where 𝑠 is the sensitive demographic feature (e.g., HIV record), 𝑥 is a vector of the remaining non-sensitive features, and 𝑦 is the label. Our goal is to learn a model 𝑓 : 𝑥 → 𝑦 that does not discriminate against the protected group defined by 𝑠. Like prior studies, we assume 𝑠 and 𝑦 are binary.

Assume two data sets are available. Set 𝐷𝑠 = {(𝑥𝑖, 𝑠𝑖, 𝑦𝑖)}𝑖=1,...,𝑚 is small but has demographic data, and set 𝐷𝑠̄ = {(𝑥𝑖′, _, 𝑦𝑖′)}𝑖′=1,...,𝑚′ is large but has no demographic data. A learner can query 𝑠 for any 𝑥 ∈ 𝐷𝑠̄, but prefers to minimize its number of queries.

After a model 𝑓 is learned, we measure its fairness based on the common notion of statistical disparity [20], defined as

SD(𝑓) = | 𝑝 (𝑓 (𝑥) = 1 | 𝑠 = 1) − 𝑝 (𝑓 (𝑥) = 1 | 𝑠 = 0) |. (1)
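For concreteness, (1) is estimated from a finite sample as the gap between group-wise positive rates. Below is a minimal sketch; the function name and array interface are our own illustration, not the paper's.

import numpy as np

def statistical_disparity(y_pred, s):
    # Estimate SD(f) in (1): gap between the positive rates of the two groups.
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rate_1 = y_pred[s == 1].mean()   # p(f(x) = 1 | s = 1)
    rate_0 = y_pred[s == 0].mean()   # p(f(x) = 1 | s = 0)
    return abs(rate_1 - rate_0)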
3.2 Decoupled Fair Model

Following [4], we learn a decoupled fair model 𝑓 = (𝑓1, 𝑓2): a set of candidate models is trained for each demographic group, and we select the pair of models, one from each set, that has the smallest statistical disparity. In [4], candidate models are trained using randomly re-weighted instances; in this study, we train them using bootstrapped samples because they achieve better results in experiments.
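A minimal sketch of this training scheme follows, assuming scikit-learn-style classifiers. The base model, the candidate count, and the seed are our illustrative choices, not the paper's settings, and corner cases (e.g., single-class bootstrap samples) are ignored.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_decoupled(X, y, s, n_candidates=10, seed=0):
    # Train candidate models per group on bootstrapped samples, then return
    # the pair (f1 for s=1, f2 for s=0) with the smallest statistical disparity.
    rng = np.random.default_rng(seed)
    candidates = {}
    for g in (0, 1):
        Xg, yg = X[s == g], y[s == g]
        fits = []
        for _ in range(n_candidates):
            idx = rng.integers(0, len(Xg), len(Xg))   # bootstrap resample
            fits.append(LogisticRegression().fit(Xg[idx], yg[idx]))
        candidates[g] = fits
    best, best_sd = None, np.inf
    for f1 in candidates[1]:
        for f2 in candidates[0]:
            # disparity of the decoupled pair: f1 decides group 1, f2 group 0
            sd = abs(f1.predict(X[s == 1]).mean() - f2.predict(X[s == 0]).mean())
            if sd < best_sd:
                best, best_sd = (f1, f2), sd
    return best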
3.3 Two Proposed Query Strategies

A standard fairness-aware learner (which requires access to demographic data) can certainly train 𝑓 only on 𝐷𝑠, but the achieved fairness will not generalize since 𝐷𝑠 is small. In this section, we propose to continuously enlarge 𝐷𝑠 by actively querying 𝑠 for certain 𝑥 ∈ 𝐷𝑠̄ and adding those data to 𝐷𝑠.

A naive query strategy is to query for random instances. However, this may not be efficient, i.e., one may have to query for massive numbers of instances before getting a model with the desired fairness – this is not practical, especially when the query budget is tight.

In the following, we propose two active query strategies. One aims for the most controversial instance, disagreed on by the decoupled models, while the other aims for the most resistant instance, which deteriorates the current model's fairness. For each strategy, we first explain the intuition and then present the algorithm.

3.3.1 Query for the Most Controversial (QmCo). Imagine a scenario where one aims to build a gender-neutral model to assess the qualification of job applicants. Let 𝑥 be a random applicant and 𝑓 be a decoupled assessment model. If 𝑥 is female, her predicted qualification is 𝑓1(𝑥); otherwise, his predicted qualification is 𝑓2(𝑥). Equality is another common notion of fairness, which states that similarly situated people should receive similar decisions. From this view, we argue that 𝑓1(𝑥) and 𝑓2(𝑥) should not differ drastically, since they evaluate the same set of skills described by 𝑥, presuming these skills are indeed sufficient and necessary. Therefore, if 𝑥 is largely disagreed on by 𝑓1 and 𝑓2, then it is a controversial instance and deserves special attention from the learner.

Although equality and statistical disparity are not equivalent notions, we hypothesize that achieving equality helps to mitigate disparity – this is partly evident from Dwork et al.'s seminal formalization of equality [3], where they prove that a model's high equality implies its low statistical disparity under mild conditions.

To this end, we propose an active demographic query strategy called Query for the Most Controversial (QmCo). Define the controversial degree of an instance 𝑥 as

𝐶𝑜 (𝑥) = | 𝑝 (𝑓1 (𝑥) = 1 | 𝑥) − 𝑝 (𝑓2 (𝑥) = 1 | 𝑥) |. (2)

Then, we query for the most controversial instance, i.e.,

𝑥 = arg max𝑥′∈𝐷𝑠̄ 𝐶𝑜 (𝑥′). (3)
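In code, (2)–(3) reduce to a few lines, assuming the two group models expose predicted probabilities (e.g., scikit-learn's predict_proba); the function and variable names below are ours.

import numpy as np

def qmco_select(f1, f2, X_pool):
    # Controversial degree Co(x) of (2) for every pool instance,
    # then the arg max of (3).
    p1 = f1.predict_proba(X_pool)[:, 1]   # p(f1(x) = 1 | x)
    p2 = f2.predict_proba(X_pool)[:, 1]   # p(f2(x) = 1 | x)
    return int(np.argmax(np.abs(p1 - p2)))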
Algorithm 1 Query for the Most Controversial (QmCo)
Input: Data sets 𝐷𝑠 and 𝐷𝑠̄.
Output: A decoupled fair model 𝑓 = (𝑓1, 𝑓2).
1: Train 𝑓 using 𝐷𝑠.
2: While the stopping criterion is not met:
3:   Query 𝑠 for an 𝑥 ∈ 𝐷𝑠̄ by (3); move it from 𝐷𝑠̄ to 𝐷𝑠.
4:   Re-train 𝑓 using 𝐷𝑠.
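Putting Algorithm 1 together with the sketches above, one possible realization follows. The demographic-oracle interface query_s and the budget-based stopping rule are our assumptions; the paper leaves the stopping criterion unspecified.

import numpy as np

def qmco(D_s, D_s_bar, query_s, budget=50):
    # D_s: (X, y, s) arrays with demographics; D_s_bar: (X, y) pool without.
    # query_s(x) is the demographic oracle: it returns the sensitive value
    # s of a given instance x.
    X, y, s = D_s
    X_pool, y_pool = D_s_bar
    f1, f2 = train_decoupled(X, y, s)                  # step 1
    for _ in range(budget):                            # step 2
        if len(X_pool) == 0:
            break
        i = qmco_select(f1, f2, X_pool)                # step 3: query by (3)
        X = np.vstack([X, X_pool[i:i + 1]])            # move (x, s, y) into D_s
        y = np.append(y, y_pool[i])
        s = np.append(s, query_s(X_pool[i]))
        X_pool = np.delete(X_pool, i, axis=0)
        y_pool = np.delete(y_pool, i)
        f1, f2 = train_decoupled(X, y, s)              # step 4: re-train
    return f1, f2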
crime rate’ as label, and binarized it so that the rate is considered
‘high’ if it is above 0.5 and considered ‘low’ otherwise.
Algorithm 2 Query for the Most Resistant (QmRe) We standardized all data sets using the methods in [16].
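As a hedged reading of the QmRe description in Section 2.2 (query the instance that suggests the highest disparity of the present model, i.e., a worst case), one possible selection rule scores each pool instance by the worst-case disparity over its two possible group memberships. This reconstruction is ours and may differ from the paper's exact rule; statistical_disparity is the sketch above.

import numpy as np

def decoupled_predict(f1, f2, X, s):
    # Apply f1 to instances with s = 1 and f2 to instances with s = 0.
    out = np.empty(len(X), dtype=int)
    out[s == 1] = f1.predict(X[s == 1])
    out[s == 0] = f2.predict(X[s == 0])
    return out

def qmre_select(f1, f2, X_known, s_known, X_pool):
    # Score each pool instance by the worst-case statistical disparity the
    # current model would exhibit if that instance belonged to either group;
    # query the instance with the largest worst-case disparity.
    y_known = decoupled_predict(f1, f2, X_known, s_known)
    scores = []
    for x in X_pool:
        worst = 0.0
        for s_hyp in (0, 1):                       # hypothesize either group
            fx = (f1 if s_hyp == 1 else f2).predict(x[None, :])[0]
            sd = statistical_disparity(np.append(y_known, fx),
                                       np.append(s_known, s_hyp))
            worst = max(worst, sd)
        scores.append(worst)
    return int(np.argmax(scores))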
4 EXPERIMENTS

For the Credit Card Default data set, we treated … as the sensitive demographic feature and 'default payment' as the label. We randomly down-sampled the data set to 2000 users to speed up computation.

The Communities and Crime data set contains records of 1993 communities described by 128 attributes. We treated 'fraction of African-American residents' as the sensitive demographic feature and binarized it so that a community is considered 'minority' if the fraction is below 0.5 and 'non-minority' otherwise. We treated 'community crime rate' as the label and binarized it so that the rate is considered 'high' if it is above 0.5 and 'low' otherwise.

We standardized all data sets using the methods in [16].
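The binarization described above is plain thresholding; a sketch with pandas follows, where the file and column names are hypothetical and only the 0.5 thresholds come from the paper.

import pandas as pd

df = pd.read_csv("communities_crime.csv")                   # hypothetical file
df["s"] = (df["frac_african_american"] < 0.5).astype(int)   # 1 = 'minority'
df["y"] = (df["crime_rate"] > 0.5).astype(int)              # 1 = 'high' crime rate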
Figure 1: Performance on Communities and Crime. (a) Statistical Disparity; (b) Classification Error.

Figure 2: Performance on German Credit. (a) Statistical Disparity; (b) Classification Error.

Figure 3: Performance on Credit Card Default. (a) Statistical Disparity; (b) Classification Error.

5 DISCUSSION

This study makes three assumptions: (i) the query cost is identical for every demographic record, (ii) every data owner agrees to sell their data given enough compensation, and (iii) all data owners sell data with equal probability. We believe these are reasonable assumptions to start investigating the new direction, but they do have limitations in reality. For example, an HIV-positive applicant may not sell his HIV record for any compensation, to avoid discrimination, or may request higher compensation due to the rarity of such a record. How to deal with these situations remains an open challenge.

The proposed QmCo strategy appears the most efficient but is tailored to the decoupled model. How to extend it to other fair models while maintaining its superiority is an open question.

Finally, the presented study is empirical but open to theoretical justification. For QmCo, we suspect that one can form a 'disagreement region' (as in [8]) which can be efficiently reduced by the selected instance. For QmRe, we suspect that one can prove that fairness achieved on the selected worst-case distribution can bound fairness achieved on other distributions under proper conditions.

6 CONCLUSION

In this paper, we propose a new direction to achieve model fairness while protecting demographic privacy, called active demographic query.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their comments and suggestions. This work is supported in part by the US National Science Foundation under Grant No. 1850418.

REFERENCES

[1] Sarah Bird, Sahin Cem Geyik, Emre Kıcıman, Margaret Mitchell, and Mehrnoosh Sameki. 2019. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. KDD Tutorial (2019).
[2] Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, and Seungil You. 2019. Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints. In International Conference on Machine Learning.
[3] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Innovations in Theoretical Computer Science Conference.
[4] Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Max Leiserson. 2018. Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency.
[5] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In SIGKDD Conference on Knowledge Discovery and Data Mining.
[6] Kazuto Fukuchi and Jun Sakuma. 2014. Neutralized empirical risk minimization with generalization neutrality bound. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
[7] Maya Gupta, Andrew Cotter, Mahdi Milani Fard, and Serena Wang. 2018. Proxy fairness. arXiv preprint arXiv:1806.11212 (2018).
[8] Steve Hanneke. 2007. A bound on the label complexity of agnostic active learning. In International Conference on Machine Learning.
[9] Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Conference on Neural Information Processing Systems.
[10] Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. 2018. Fairness Without Demographics in Repeated Loss Minimization. In International Conference on Machine Learning.
[11] Hui Hu and Chao Lan. 2020. Inference attack and defense on the distributed private fair machine learning framework. In AAAI Workshop on Privacy-Preserving Artificial Intelligence.
[12] Hui Hu, Yijun Liu, Zhen Wang, and Chao Lan. 2019. A Distributed Fair Machine Learning Framework with Private Demographic Data Protection. In International Conference on Data Mining.
[13] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. In International Conference on Machine Learning.
[14] Niki Kilbertus, Adria Gascon, Matt Kusner, Michael Veale, Krishna Gummadi, and Adrian Weller. 2018. Blind Justice: Fairness with Encrypted Sensitive Attributes. In International Conference on Machine Learning.
[15] Chao Lan and Jun Huan. 2015. Reducing the unlabeled sample complexity of semi-supervised multi-view learning. In SIGKDD International Conference on Knowledge Discovery and Data Mining.
[16] Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. 2011. k-NN as an implementation of situation testing for discrimination discovery and prevention. In SIGKDD Conference on Knowledge Discovery and Data Mining.
[17] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2019. A survey on bias and fairness in machine learning. CoRR (2019).
[18] Burr Settles. 2012. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 6, 1 (2012), 1–114.
[19] Michael Veale and Reuben Binns. 2017. Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data & Society 4, 2 (2017), 2053951717743530.
[20] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International Conference on Machine Learning.
[21] Lu Zhang, Yongkai Wu, and Xintao Wu. 2018. Anti-discrimination Learning: From Association to Causation. KDD Tutorial (2018).
[22] Indrė Žliobaitė and Bart Custers. 2016. Using sensitive personal data may be necessary for avoiding discrimination in data-driven decision models. Artificial Intelligence and Law 24, 2 (2016), 183–201.