
Active Query of Private Demographic Data for Learning Fair Models

Yijun Liu, University of Wyoming, Laramie, WY, USA, yliu20@uwyo.edu
Chao Lan, University of Wyoming, Laramie, WY, USA, pete.chaolan@gmail.com

ABSTRACT

Learning a fair prediction model is an important research problem with profound societal impacts. Most approaches assume free access to the sensitive demographic data, whereas these data are becoming restricted to use by privacy regulations. Existing solutions are broadly based on multi-party computation or demographic proxy, but each direction has its own limits in certain scenarios.

In this paper, we propose a new direction called active demographic query. We assume sensitive demographic data can be queried with cost, e.g., a company may pay to get a customer's consent on using his private data. Building on Dwork's decoupled fair model, we propose two active query strategies: QmCo queries for the most controversial data maximally disagreed on by the decoupled models, and QmRe queries for the most resistant data maximally deteriorating the fairness of the current model. In experiments, we show that both strategies efficiently improve model fairness on three data sets.

CCS CONCEPTS

• Social and professional topics → Codes of ethics; • Theory of computation → Active learning.

KEYWORDS

fairness; demographic privacy; active learning; decoupled fair model

ACM Reference Format:
Yijun Liu and Chao Lan. 2020. Active Query of Private Demographic Data for Learning Fair Models. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19–23, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3340531.3412074

1 INTRODUCTION

Machine learning models are widely applied to assist consequential decision making in job hiring, jurisdiction, banking, hospitalization, etc. In these applications, it is imperative for a model's prediction to be fair across different demographic groups.

In the past few years, extensive techniques have been presented to achieve model fairness [1, 17, 21], and most of them assume free access to the sensitive demographic feature. For example, to learn an applicant assessment model with no discrimination against HIV carriers, most learners assume knowledge of the HIV records of all applicants in the training set. In reality, however, such sensitive data can be very restricted to use, especially under the latest privacy regulations such as the European General Data Protection Regulation and the US California Consumer Privacy Act.

How can we learn fair models while protecting demographic privacy? This question is drawing attention [19, 22]. A few solutions have been proposed, which can be roughly categorized into multi-party computation [12, 14] and demographic proxy [7, 10]. While they have achieved promising results, each direction has its own limits.

In this paper, we propose a new direction called active demographic query. We assume a standard fairness-aware learner can query the demographic data of certain training instances, and its objective is to learn a sufficiently fair model with a minimum number of queries. This setting corresponds to a real-world scenario where a company can offer compensation to a customer in exchange for consent to use his private data – importantly, such consented use is legalized by many privacy regulations. A fundamental challenge, however, is that the company often has a limited budget, so it prefers to pay as few customers as possible while maximizing its model fairness.

To begin with, we assume a standard fair learner is given a small set of training instances whose demographic data are known, e.g., a set of employees who already gave their consent. The learner can certainly train a fair model from this set, but the achieved fairness will not generalize, as suggested by studies on the generalization of model fairness [2, 6]. Hence, the learner had better keep querying demographic data for more instances to enlarge the training set.

We propose two query strategies, building on Dwork et al.'s decoupled fair model [4]. The QmCo strategy queries for the most controversial instance that is maximally disagreed on by the decoupled models, and the QmRe strategy queries for the most resistant instance that maximally deteriorates the fairness of the current model. These strategies are motivated by active learning [18] and subgroup fairness [13]. In experiments, we show they effectively and efficiently improve fairness of the decoupled model on three public data sets. Interestingly, these strategies also improve model accuracy, manifesting a compatibility between fairness and accuracy.

2 RELATED WORK

2.1 Model Fairness and Demographic Privacy

Numerous techniques have been proposed to achieve model fairness, such as pre-processing [5], regularization [20] and post-processing [9]; more references can be found in [1, 17, 21]. However, most techniques assume free access to sensitive demographic data, which is against the latest privacy regulations. In this work, we aim to achieve model fairness while protecting demographic privacy.


A few approaches have been proposed to achieve the above aim, including multi-party computation [12, 14] and demographic proxy [7, 10]. Basically, the former assumes sensitive data are held by a trusted third party which can participate in model training, while the latter achieves fairness based on a proxy demographic feature. Both directions have certain limits, however. For example, it is not always realistic to assume the presence of a third party which can hold private data and participate in expensive model training as instructed, and, even so, the party may be vulnerable to inference attack [11]. On the other hand, one may achieve only limited fairness based on a proxy demographic feature that is not close to the true feature – embarrassingly, if the two features are indeed close, then one fails to protect demographic privacy. To lift these limitations, we propose a new direction called active demographic query.

2.2 Active Learning

Our proposal is motivated by active learning [18], which actively queries training data for efficiently improving model performance. While traditional active learning focuses on querying labels [8] or unlabeled data [15] for improving model accuracy, we propose to query demographic data for improving model fairness. To the best of our knowledge, this is the first work that formally connects active learning and fairness.

We propose two active query strategies. QmCo is motivated by the disagreement-based strategy [8], which queries labels for instances disagreed on by a set of models, and by Dwork's decoupled fair model [4], which learns separate models for different demographic groups – QmCo integrates both works. QmRe is partly motivated by [13], which aims to achieve fairness on the subgroup that suffers the most disparity – we query demographic data for the instance that suggests the highest disparity of the present model.

3 METHODOLOGY

3.1 Preliminaries

We describe a random instance by a triple (x, s, y), where s is the sensitive demographic feature (e.g., HIV record), x is a vector of the remaining non-sensitive features, and y is the label. Our goal is to learn a model f : x → y that does not discriminate against the protected group defined by s. Like prior studies, we assume s and y are binary.

Assume two data sets are available. Set D_s = {(x_i, s_i, y_i)}_{i=1,...,m} is small but has demographic data, and set D_s̄ = {(x_i', ·, y_i')}_{i'=1,...,m'} is large but has no demographic data. A learner can query s for any x ∈ D_s̄, but prefers to minimize its number of queries.

After a model f is learned, we measure its fairness based on a common notion of statistical disparity [20], defined as

    SD(f) = | p(f(x) = 1 | s = 0) − p(f(x) = 1 | s = 1) |.    (1)
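As a concrete illustration, the disparity in (1) can be estimated from any sample with known predictions and group labels. The following is a minimal sketch (the array names are ours, not from the paper):

```python
import numpy as np

def statistical_disparity(y_pred, s):
    """Empirical estimate of SD(f) in (1): the absolute gap between the
    positive-prediction rates of the two demographic groups."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rate_0 = np.mean(y_pred[s == 0] == 1)  # p(f(x)=1 | s=0)
    rate_1 = np.mean(y_pred[s == 1] == 1)  # p(f(x)=1 | s=1)
    return abs(rate_0 - rate_1)
```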
3.2 A Revisit of the Decoupled Fair Model

Dwork et al. [4] propose a decoupled fair model which assigns one model to each group. We build our work on this base model.

Let f_1 denote the model for one demographic group (e.g., HIV positive) and f_2 the model for the other group. When the context is clear, we will write their integrated model as f, defined as f(x) = f_1(x) if x belongs to the first group and f(x) = f_2(x) otherwise.

We follow [4] to train f, i.e., we first train two sets of candidate models, one for each group, and then pick the pair of candidate models, one from each set, that has the smallest statistical disparity. In [4], candidate models are trained on randomly re-weighted instances; in this study, we train them on bootstrapped samples because they achieve better results in our experiments.
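To make the above procedure concrete, here is a hedged sketch of one possible implementation: bootstrap a pool of candidates per group and keep the pair with the smallest empirical disparity on the labeled set. The base learner, pool size and helper names are our own illustrative choices, not prescribed by [4] or by this paper; statistical_disparity is the helper sketched after Eq. (1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_decoupled(X, y, s, n_candidates=10, seed=0):
    """Train bootstrapped candidate models per group and return the pair
    (f1, f2) with the smallest statistical disparity on (X, y, s)."""
    rng = np.random.default_rng(seed)
    pools = []
    for group in (0, 1):
        Xg, yg = X[s == group], y[s == group]
        pool = []
        for _ in range(n_candidates):
            # Bootstrap sample of the group's data (a full implementation would
            # guard against samples that contain only one class).
            idx = rng.choice(len(Xg), size=len(Xg), replace=True)
            pool.append(LogisticRegression(max_iter=1000).fit(Xg[idx], yg[idx]))
        pools.append(pool)
    best_pair, best_sd = None, np.inf
    for f1 in pools[0]:
        for f2 in pools[1]:
            pred = np.where(s == 0, f1.predict(X), f2.predict(X))  # decoupled prediction
            sd = statistical_disparity(pred, s)
            if sd < best_sd:
                best_pair, best_sd = (f1, f2), sd
    return best_pair
```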


3.3 Two Proposed Query Strategies

A standard fairness-aware learner (which requires access to demographic data) can certainly train f only on D_s, but the achieved fairness will not generalize since D_s is small. In this section, we propose to continuously enlarge D_s by actively querying s for certain x ∈ D_s̄ and adding those data to D_s.

A naive query strategy is to query for random instances. However, this may not be efficient, i.e., one may have to query for a massive number of instances before getting a model with the desired fairness – this is not practical, especially when the query budget is tight.

In the following, we propose two active query strategies. One aims for the most controversial instance disagreed on by the decoupled models, while the other aims for the most resistant instance that deteriorates the current model's fairness. For each strategy, we first explain the intuition and then present the algorithm.

3.3.1 Query for the Most Controversial (QmCo). Imagine a scenario where one aims to build a gender-neutral model to assess the qualification of job applicants. Let x be a random applicant and f be a decoupled assessment model. If x is female, her predicted qualification is f_1(x); otherwise, his predicted qualification is f_2(x). Equality is another common notion of fairness, which states that similarly situated people should receive similar decisions. From this view, we argue that f_1(x) and f_2(x) should not differ drastically, since they evaluate the same set of skills described by x, presuming these skills are indeed sufficient and necessary. Therefore, if x is largely disagreed on by f_1 and f_2, then it is a controversial instance and deserves special attention from the learner.

Although equality and statistical disparity are not equivalent notions, we hypothesize that achieving equality helps to mitigate disparity – this is partly evident from Dwork et al.'s seminal formalization of equality [3], where they prove that a model's high equality implies its low statistical disparity under mild conditions.

To this end, we propose an active demographic query strategy called Query for the Most Controversial (QmCo). Define the controversial degree of an instance x as

    Co(x) = | p(f_1(x) = 1 | x) − p(f_2(x) = 1 | x) |.    (2)

Then, we query for the most controversial instance, i.e.,

    x = argmax_{x' ∈ D_s̄} Co(x').    (3)

Our learning approach is summarized in Algorithm 1.

Algorithm 1 Query for the Most Controversial (QmCo)
Input: Data sets D_s and D_s̄.
Output: A decoupled fair model f = (f_1, f_2).
1: Train f using D_s.
2: While the stopping criterion is not met:
3:     Query s for an x ∈ D_s̄ chosen by (3); move it from D_s̄ to D_s.
4:     Re-train f using D_s.
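A minimal sketch of the QmCo selection rule in (2)-(3), assuming the two decoupled models expose class probabilities (as scikit-learn classifiers do via predict_proba); the function name is ours:

```python
import numpy as np

def qmco_select(f1, f2, X_pool):
    """Index of the most controversial instance in X_pool, i.e., the x that
    maximizes Co(x) = |p(f1(x)=1 | x) - p(f2(x)=1 | x)| as in (2)-(3)."""
    p1 = f1.predict_proba(X_pool)[:, 1]  # p(f1(x)=1 | x)
    p2 = f2.predict_proba(X_pool)[:, 1]  # p(f2(x)=1 | x)
    return int(np.argmax(np.abs(p1 - p2)))
```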
3.3.2 Query for the Most Resistant (QmRe). For model fairness to generalize, we need knowledge of the population distribution. But such knowledge is limited when the training set is small.

Now, the process of querying demographic data to enlarge the training set can be considered a process of exploring the distribution. Typically, querying for two different instances z_1 and z_2 will suggest two possible distributions p_1 and p_2, respectively. Which distribution is worth learning? This is a key question for the learner.

We propose to learn from the worst-case distribution, on which the current model shows the largest disparity and hence should be retrained. Our hypothesis is that if a model achieves low disparity on the worst-case distribution, it can achieve low disparity on other distributions, including the true one.

To this end, we propose to query for the most resistant instance. Let SD(f | D) be the disparity of f estimated from an arbitrary data set D. Suppose the demographic data s of an instance x is unknown. We define the resistant degree of x with respect to D as

    Re(x; D) = E_s[ SD(f | D ∪ {x}) ],    (4)

where the expectation is taken over the randomness of s. In this study, we choose to model this randomness using a logistic function

    p(s = 0 | x) = 1 / [1 + exp(−β^T x)],    (5)

where the parameter β is estimated from D. Our strategy queries for the most resistant instance, i.e.,

    x = argmax_{x' ∈ D_s̄} Re(x'; D_s).    (6)

The learning approach is summarized in Algorithm 2.

Algorithm 2 Query for the Most Resistant (QmRe)
Input: Data sets D_s and D_s̄.
Output: A decoupled fair model f = (f_1, f_2).
1: Train f using D_s.
2: While the stopping criterion is not met:
3:     Query s for an x ∈ D_s̄ chosen by (6); move it from D_s̄ to D_s.
4:     Re-train f using D_s.
5:     Re-estimate p(s = 0 | x) and p(s = 1 | x) from D_s.
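The following sketch illustrates one way the resistant degree in (4)-(6) could be computed: fit a logistic model of p(s | x) on D_s as in (5), then, for each candidate x, average the disparity of the current decoupled model over the two hypothetical values of s. Helper names are ours, statistical_disparity and the pair (f1, f2) follow the earlier sketches, and this is an illustrative reading of the procedure rather than the paper's exact implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def qmre_select(f1, f2, X_lab, s_lab, X_pool):
    """Index of the most resistant instance in X_pool, i.e., the x maximizing
    Re(x; D_s) = E_s[ SD(f | D_s ∪ {x}) ] as in (4)-(6)."""
    s_model = LogisticRegression(max_iter=1000).fit(X_lab, s_lab)  # p(s | x), eq. (5)
    pred_lab = np.where(s_lab == 0, f1.predict(X_lab), f2.predict(X_lab))
    scores = []
    for x in X_pool:
        x = x.reshape(1, -1)
        p_s = s_model.predict_proba(x)[0]  # [p(s=0|x), p(s=1|x)]
        expected_sd = 0.0
        for s_val in (0, 1):
            # Disparity of the current model if x turned out to belong to group s_val.
            f_x = (f1 if s_val == 0 else f2).predict(x)
            pred_aug = np.append(pred_lab, f_x)
            s_aug = np.append(s_lab, s_val)
            expected_sd += p_s[s_val] * statistical_disparity(pred_aug, s_aug)
        scores.append(expected_sd)
    return int(np.argmax(scores))
```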
3.4 Practical Issues

QmCo is tailored for the decoupled fair model, while QmRe is more generic and applicable to other fair models. We thus expect QmCo to be more efficient in our experiments. Both strategies can stop querying when a preset fairness level or query budget is reached. They can query for multiple instances per round, but in our experiments they perform best when one instance is queried per round.
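As an illustration of how the pieces fit together with such stopping rules, here is a hedged driver loop combining the earlier sketches. The callable query_demographic stands in for the real consent/payment step, and the budget and fairness threshold are placeholder values, not ones used in the paper.

```python
import numpy as np

def active_query_loop(X_lab, y_lab, s_lab, X_pool, y_pool, query_demographic,
                      strategy="qmco", budget=200, target_sd=0.05):
    """Outline of Algorithms 1-2: query one instance per round until the
    budget is spent or the estimated disparity drops below target_sd."""
    f1, f2 = train_decoupled(X_lab, y_lab, s_lab)
    for _ in range(budget):
        if strategy == "qmco":
            i = qmco_select(f1, f2, X_pool)
        else:
            i = qmre_select(f1, f2, X_lab, s_lab, X_pool)
        s_new = query_demographic(X_pool[i])           # obtain s_i with the owner's consent
        X_lab = np.vstack([X_lab, X_pool[i]])          # move the instance from D_s̄ to D_s
        y_lab = np.append(y_lab, y_pool[i])
        s_lab = np.append(s_lab, s_new)
        X_pool = np.delete(X_pool, i, axis=0)
        y_pool = np.delete(y_pool, i)
        f1, f2 = train_decoupled(X_lab, y_lab, s_lab)  # re-train on the enlarged D_s
        pred = np.where(s_lab == 0, f1.predict(X_lab), f2.predict(X_lab))
        if statistical_disparity(pred, s_lab) <= target_sd:
            break
    return f1, f2
```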
4 EXPERIMENT

4.1 Data Preparation

We experimented on three public data sets: German Credit, Credit Card Default and Communities and Crime.

The German Credit data set contains records of 1000 bank account holders described by 20 attributes. We treated ‘gender’ as the sensitive demographic feature and ‘credit rating’ as the label.

The Credit Card Default data set contains records of 30000 users described by 23 attributes. We treated ‘education degree’ as the sensitive demographic feature and ‘default payment’ as the label. We randomly down-sampled the data set to 2000 users to speed up computation.

The Communities and Crime data set contains records of 1993 communities described by 128 attributes. We treated ‘fraction of African-American residents’ as the sensitive demographic feature, and binarized it so that a community is considered ‘minority’ if the fraction is below 0.5 and ‘non-minority’ otherwise. We treated ‘community crime rate’ as the label, and binarized it so that the rate is considered ‘high’ if it is above 0.5 and ‘low’ otherwise.

We standardized all data sets using the methods in [16].

4.2 Experiment Design

We randomly split each data set into three folds: the first is D_s and contains 5% of the instances; the second is D_s̄ and contains 70% of the instances; the last is the testing set and contains the remaining instances. We ran every algorithm on the same 20 random trials and report its average performance.

To generate a decoupled fair model, we first generated 10 candidate models for each demographic group, where each model was trained on a bootstrap sample of 20 instances from D_s. Logistic regression was used as the base model and trained with a regularization coefficient of 0.5. We chose this setting as it led to optimal performance of the decoupled model.

Since this is the first work on active demographic query (to our best knowledge), we considered random query as the baseline, and evaluated every query strategy based on two metrics: (i) statistical disparity and (ii) classification error.

4.3 Results and Discussions

Results on the three data sets are shown in Figures 2, 3 and 1, respectively.¹ In each figure, the x-axis is the number of queried instances and the y-axis is model performance. We have the following observations.

(1) Both QmCo and QmRe improve model fairness. This suggests the effectiveness of our proposed hypotheses on learning from controversial instances and worst-case distributions.

(2) Both QmCo and QmRe improve fairness more efficiently than random query. This suggests the efficiency of our proposed hypotheses. It also suggests the technical necessity of designing proper demographic query strategies for fairness-aware learners.

(3) QmCo appears more efficient than QmRe on two data sets. There are two possible reasons: QmCo is tailored for the decoupled model, and QmRe is a worst-case learner (but more generic).

(4) On the Credit data set, QmRe appears to converge to a worse optimum but improves accuracy more efficiently. This may again be because it considers worst-case fairness, but we are still investigating how the resistant instances may be connected to accuracy.

(5) On the Crime data set, both QmCo and QmRe seem to reduce disparity at first but increase it after 200 instances are queried. Our hypothesis is that the data may contain noise and both strategies manage to first pick up less-noisy data to achieve higher fairness.

(6) Both QmCo and QmRe improve model accuracy, but are no more efficient than random query. This seems to suggest a compatible relation between fairness and accuracy throughout the query process.

¹ For better visualization, we show results after every 20 instances are queried.
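For completeness, here is a small sketch of the split and evaluation protocol behind these results (5% / 70% / 25% and the two reported metrics). The function names are ours; note that the decoupled model needs the group attribute of a test instance to route between f1 and f2, and the pool's attributes are kept only to simulate the query oracle.

```python
import numpy as np

def three_way_split(X, y, s, seed=0):
    """Random split into D_s (5%, demographics known), D_s̄ (70%, demographics
    hidden from the learner) and a held-out test fold (remaining 25%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_lab, n_pool = int(0.05 * len(X)), int(0.70 * len(X))
    lab = idx[:n_lab]
    pool = idx[n_lab:n_lab + n_pool]
    test = idx[n_lab + n_pool:]
    return (X[lab], y[lab], s[lab]), (X[pool], y[pool], s[pool]), (X[test], y[test], s[test])

def evaluate(f1, f2, X_test, y_test, s_test):
    """The two metrics reported above: statistical disparity and classification error."""
    pred = np.where(s_test == 0, f1.predict(X_test), f2.predict(X_test))
    return statistical_disparity(pred, s_test), float(np.mean(pred != y_test))
```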


Figure 1: Performance on Communities and Crime. (a) Statistical Disparity; (b) Classification Error.
Figure 2: Performance on German Credit. (a) Statistical Disparity; (b) Classification Error.
Figure 3: Performance on Credit Card Default. (a) Statistical Disparity; (b) Classification Error.

5 DISCUSSION

This study makes three assumptions: (i) the query cost is identical for all demographic data, (ii) every data owner agrees to sell his data given enough compensation, and (iii) all data owners sell their data with equal probability. We believe these are reasonable assumptions to start investigating the new direction, but they do have limitations in reality. For example, an HIV-positive applicant may not sell his HIV record for any compensation to avoid discrimination, or may request higher compensation due to the rarity of such a record. How to deal with these situations remains an open challenge.

The proposed QmCo strategy appears most efficient but is tailored for the decoupled model. How to extend it to other fair models while maintaining its superiority is an open question.

Finally, the presented study is empirical but open to theoretical justification. For QmCo, we suspect that one can also form a ‘disagreement region’ (as in [8]) which can be efficiently reduced by the selected instance. For QmRe, we suspect that one can prove that fairness achieved on the selected worst-case distribution bounds fairness achieved on other distributions under proper conditions.

6 CONCLUSION

In this paper, we propose a new direction to achieve model fairness while protecting demographic privacy, called active demographic query. Building on a prior decoupled fair model, we propose two query strategies: QmCo queries for an instance maximally disagreed on by the decoupled models, and QmRe queries for an instance maximally deteriorating the fairness of the current model. In experiments, we show that both strategies effectively and efficiently improve model fairness on three data sets, and improve model accuracy as well.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their comments and suggestions. This work is supported in part by the US National Science Foundation under Grant No. 1850418.

REFERENCES

[1] Sarah Bird, Sahin Cem Geyik, Emre Kıcıman, Margaret Mitchell, and Mehrnoosh Sameki. 2019. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. KDD Tutorial (2019).
[2] Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, and Seungil You. 2019. Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints. In International Conference on Machine Learning.
[3] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Innovations in Theoretical Computer Science Conference.
[4] Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Max Leiserson. 2018. Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency.
[5] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In SIGKDD Conference on Knowledge Discovery and Data Mining.
[6] Kazuto Fukuchi and Jun Sakuma. 2014. Neutralized empirical risk minimization with generalization neutrality bound. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
[7] Maya Gupta, Andrew Cotter, Mahdi Milani Fard, and Serena Wang. 2018. Proxy fairness. arXiv preprint arXiv:1806.11212 (2018).
[8] Steve Hanneke. 2007. A bound on the label complexity of agnostic active learning. In International Conference on Machine Learning.
[9] Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Conference on Neural Information Processing Systems.
[10] Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. 2018. Fairness Without Demographics in Repeated Loss Minimization. In International Conference on Machine Learning.
[11] Hui Hu and Chao Lan. 2020. Inference attack and defense on the distributed private fair machine learning framework. In AAAI Workshop on Privacy-Preserving Artificial Intelligence.
[12] Hui Hu, Yijun Liu, Zhen Wang, and Chao Lan. 2019. A Distributed Fair Machine Learning Framework with Private Demographic Data Protection. In International Conference on Data Mining.
[13] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. In International Conference on Machine Learning.
[14] Niki Kilbertus, Adria Gascon, Matt Kusner, Michael Veale, Krishna Gummadi, and Adrian Weller. 2018. Blind Justice: Fairness with Encrypted Sensitive Attributes. In International Conference on Machine Learning.
[15] Chao Lan and Jun Huan. 2015. Reducing the unlabeled sample complexity of semi-supervised multi-view learning. In SIGKDD International Conference on Knowledge Discovery and Data Mining.
[16] Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. 2011. k-NN as an implementation of situation testing for discrimination discovery and prevention. In SIGKDD Conference on Knowledge Discovery and Data Mining.
[17] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2019. A survey on bias and fairness in machine learning. CoRR (2019).
[18] Burr Settles. 2012. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 6, 1 (2012), 1–114.
[19] Michael Veale and Reuben Binns. 2017. Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data & Society 4, 2 (2017), 2053951717743530.
[20] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International Conference on Machine Learning.
[21] Lu Zhang, Yongkai Wu, and Xintao Wu. 2018. Anti-discrimination Learning: From Association to Causation. KDD Tutorial (2018).
[22] Indrė Žliobaitė and Bart Custers. 2016. Using sensitive personal data may be necessary for avoiding discrimination in data-driven decision models. Artificial Intelligence and Law 24, 2 (2016), 183–201.
