Evaluating Differentially Private Machine Learning in Practice
Bargav Jayaraman and David Evans, University of Virginia
https://www.usenix.org/conference/usenixsecurity19/presentation/jayaraman
aggregation of corresponding mean and variance values of the individual sub-Gaussian distributions. This can be converted to a cumulative privacy budget similar to the advanced composition theorem, which in turn reduces the noise that must be added to the individual mechanisms. The authors call this concentrated differential privacy [18]:

Definition 2.3 (Concentrated Differential Privacy (CDP)). A randomized algorithm M is (µ, τ)-concentrated differentially private if, for all pairs of adjacent data sets D and D′,

    DsubG(M(D) || M(D′)) ≤ (µ, τ)

where the sub-Gaussian divergence, DsubG, is defined such that the expected privacy loss is bounded by µ and, after subtracting µ, the resulting centered sub-Gaussian distribution has standard deviation τ. Any ε-DP algorithm satisfies (ε·(e^ε − 1)/2, ε)-CDP; however, the converse is not true.

A variation on CDP, zero-concentrated differential privacy (zCDP) [9], uses Rényi divergence as a different method to show that the privacy loss random variable follows a sub-Gaussian distribution.

Definition 2.4 (Zero-Concentrated Differential Privacy (zCDP)). A randomized mechanism M is (ξ, ρ)-zero-concentrated differentially private if, for all neighbouring data sets D and D′ and all α ∈ (1, ∞),

    Dα(M(D) || M(D′)) ≤ ξ + ρα

where Dα(M(D) || M(D′)) is the α-Rényi divergence between the distribution of M(D) and the distribution of M(D′).

Dα also gives the α-th moment of the privacy loss random variable. For example, D1 gives the first-order moment, which is the mean or expected privacy loss, and D2 gives the second-order moment, or the variance of the privacy loss. There is a direct relation between DP and zCDP. If M satisfies ε-DP, then it also satisfies (½ε²)-zCDP. Furthermore, if M provides ρ-zCDP, it is (ρ + 2√(ρ log(1/δ)), δ)-DP for any δ > 0.

The Rényi divergence allows zCDP to be mapped back to DP, which is not the case for CDP. However, Bun and Steinke [9] give a relationship between CDP and zCDP, which allows an indirect mapping from CDP to DP (Table 1).

The use of Rényi divergence as a metric to bound the privacy loss leads to the formulation of a more generic notion of Rényi differential privacy that is applicable to any individual moment of the privacy loss random variable:

Definition 2.5 (Rényi Differential Privacy (RDP) [49]). A randomized mechanism M is said to have ε-Rényi differential privacy of order α (abbreviated as (α, ε)-RDP) if, for any adjacent data sets D, D′, it holds that

    Dα(M(D) || M(D′)) ≤ ε.

The main difference is that CDP and zCDP linearly bound all positive moments of privacy loss, whereas RDP bounds one moment at a time, which allows for a more accurate numerical analysis of privacy loss [49]. If M is an (α, ε)-RDP mechanism, it also satisfies (ε + log(1/δ)/(α−1), δ)-DP for any 0 < δ < 1.

Table 1 compares the relaxed variations of differential privacy. For all the variations, the privacy budget grows sub-linearly with the number of compositions k.
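As a concrete illustration of these conversions (a minimal sketch, not from the paper), the snippet below composes k executions of an ε₀-DP mechanism and compares naïve composition against accounting through zCDP. It uses the ε-DP → (½ε₀²)-zCDP mapping and the zCDP → (ε, δ)-DP conversion stated above, plus the standard (assumed here, not restated in this section) property that zCDP parameters add up under composition. The function names and the values of eps0, delta, and k are illustrative placeholders.

```python
import math

def zcdp_to_dp(rho, delta):
    """rho-zCDP  ->  (eps, delta)-DP, per the conversion stated above."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# Each step satisfies eps0-DP, hence (eps0^2 / 2)-zCDP.
eps0, delta, k = 0.1, 1e-5, 1000      # k = number of compositions

naive_eps = k * eps0                  # naive composition: grows linearly in k
rho_total = k * (eps0 ** 2 / 2)       # zCDP parameters add up under composition
zcdp_eps = zcdp_to_dp(rho_total, delta)

print(f"naive composition: eps = {naive_eps:.2f}")   # 100.00
print(f"via zCDP         : eps = {zcdp_eps:.2f}")    # ~20.17, sub-linear in k
```

The analogous RDP-to-DP conversion, ε + log(1/δ)/(α−1), can be evaluated the same way once the per-step RDP of the mechanism is known.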
Moments Accountant. Motivated by relaxations of differential privacy, Abadi et al. [1] propose the moments accountant (MA) mechanism for bounding the cumulative privacy loss of differentially private algorithms. The moments accountant keeps track of a bound on the moments of the privacy loss random variable during composition. Though the authors do not formalize this as a relaxed definition, their definition of the moments bound is analogous to the Rényi divergence [49]. Thus, the moments accountant can be considered an instantiation of Rényi differential privacy. The moments accountant is widely used for differentially private deep learning due to its practical implementation in the TensorFlow Privacy library [2] (see Section 2.3 and Table 4).
2.2 Differential Privacy Methods for ML

This section summarizes methods for modifying machine learning algorithms to satisfy differential privacy. First, we
Table 2: Simple ERM Methods which achieve High Utility with Low Privacy Budget.
† While MNIST is normally a 10-class task, Song et al. [63] use this for ‘1 vs rest’ binary classification.
and recommender systems [47] were able to achieve both high utility and privacy with ε settings close to 1. These methods rely on finding frequency counts as a sub-routine, and hence provide ε-differential privacy by either perturbing the counts using Laplace noise or by releasing the top frequency counts using the exponential mechanism [48]. Machine learning, on the other hand, performs much more complex data analysis, and hence requires higher privacy budgets to maintain utility.

Next, we cover simple binary classification works that use small privacy budgets (ε ≤ 1). Then we survey complex classification tasks which seem to require large privacy budgets. Finally, we summarize recent works that aim to perform complex tasks with low privacy budgets by using relaxed definitions of differential privacy.

Binary classification. The first practical implementation of a private machine learning algorithm was proposed by Chaudhuri and Monteleoni [11]. They provide a novel sensitivity analysis under strong convexity constraints, allowing them to use output and objective perturbation for binary logistic regression. Chaudhuri et al. [12] subsequently generalized this method for ERM algorithms. This sensitivity analysis method has since been used by many works for binary classification tasks under different learning settings (listed in Table 2). While these applications can be implemented with low privacy budgets (ε ≤ 1), they only perform learning in restricted settings such as learning with low-dimensional data, smooth objective functions, and strong convexity assumptions, and are only applicable to simple binary classification tasks.
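As a rough illustration of the output perturbation approach of Chaudhuri et al. [11, 12], the sketch below trains an ℓ2-regularized logistic regression model and perturbs the learned weights before release. It is a simplified rendering under their assumptions (features scaled so that ‖x‖ ≤ 1 and a 1-Lipschitz loss, in which case the ℓ2-sensitivity of the ERM minimizer is bounded by 2/(nλ)); the function name, the use of scikit-learn, and the exact noise sampling are illustrative rather than their implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def output_perturbed_logreg(X, y, lam, eps, seed=0):
    """Train L2-regularized logistic regression, then add noise to the weights.

    Assumes ||x||_2 <= 1 for every row of X and a 1-Lipschitz loss, so the
    L2-sensitivity of the minimizer is at most 2 / (n * lam).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # sklearn's C corresponds to 1 / (n * lam) for the ERM objective used here.
    clf = LogisticRegression(C=1.0 / (n * lam), fit_intercept=False).fit(X, y)
    w = clf.coef_.ravel()

    # Noise with a uniformly random direction and Gamma-distributed norm,
    # calibrated to the sensitivity bound and the privacy budget eps.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=2.0 / (n * lam * eps))
    return w + norm * direction
```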
There has also been considerable progress in generalizing privacy-preserving machine learning to more complex scenarios such as learning in high-dimensional settings [33, 34, 64], learning without strong convexity assumptions [65], or relaxing the assumptions on data and objective functions [62, 68, 77]. However, these advances are mainly of theoretical interest and only a few works provide implementations [33, 34].

Complex learning tasks. All of the above works are limited to convex learning problems with binary classification tasks. Adopting their approaches to more complex learning tasks requires higher privacy budgets (see Table 3). For instance, the online version of ERM as considered by Jain et al. [32] requires ε as high as 10 to achieve acceptable utility. From the definition of differential privacy, we can see that Pr[M(D) ∈ S] ≤ e^10 × Pr[M(D′) ∈ S]. In other words, even if the model's output probability is 0.0001 on a data set D′ that doesn't contain the target record, the model's output probability can be as high as 0.9999 on a neighboring data set D that contains the record. This allows an adversary to infer the presence or absence of a target record from the training data with high confidence. Adopting these binary classification methods for multi-class classification tasks requires even higher ε values. As noted by Wu et al. [70], it would require training a separate binary classifier for each class. Finally, high privacy budgets are required for non-convex learning algorithms, such as deep learning [60, 79]. Since the output and objective perturbation methods of Chaudhuri et al. [12] are not applicable to non-convex settings, implementations of differentially private deep learning rely on gradient perturbation.
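The gradient perturbation approach (as in the DP-SGD algorithm of Abadi et al. [1]) can be sketched as follows. This is a simplified illustration, not the exact training loop used in this paper's experiments; the function and parameter names are placeholders.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr,
                rng=None):
    """One gradient-perturbation step: clip each example's gradient, sum,
    add Gaussian noise calibrated to the clipping norm, average, descend."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        scale = min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        clipped.append(g * scale)                      # per-instance clipping
    batch_size = len(clipped)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / batch_size
    return params - lr * noisy_grad
```

The cumulative privacy loss of repeating such steps is then tracked with one of the composition methods above (naïve or advanced composition, zCDP, RDP, or the moments accountant).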
Table 4: Gradient Perturbation based Classification Methods using Relaxed Notion of Differential Privacy
The aim of a membership inference attack is to infer whether or not a given record is present in the training set. Membership inference attacks can uncover highly sensitive information from training data. An early membership inference attack showed that it is possible to identify individuals contributing DNA to studies that analyze a mixture of DNA from many individuals, using a statistical distance measure to determine if a known individual is in the mixture [27].

Membership inference attacks can either be completely black-box where an attacker only has query access to the

Connection to Differential Privacy. Differential privacy, by definition, aims to obfuscate the presence or absence of a record in the data set. On the other hand, membership inference attacks aim to identify the presence or absence of a record in the data set. Thus, intuitively these two notions counteract each other. Li et al. [42] point to this fact and provide a direct relationship between differential privacy and membership inference attacks. Backes et al. [4] studied membership inference attacks on microRNA studies and showed that differential privacy can reduce the success of membership
[Figure 1: Accuracy loss versus privacy budget ε for logistic regression on CIFAR-100, with (a) batch gradient clipping and (b) per-instance gradient clipping; curves for NC, AC, zCDP, and RDP.]
model's expected training loss, otherwise the record is classified as a non-member. The same principle is used for attribute inference. Given an input record, the attacker brute-forces all possible values for the unknown private attribute and observes the target model's loss, outputting the value for which the loss is closest to the target's expected training loss. Since there are no attributes in our data sets that are explicitly annotated as private, we randomly choose five attributes, perform the attribute inference attack on each attribute independently, and report the averaged results.
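A minimal sketch of these loss-threshold attacks of Yeom et al. [74] is shown below (simplified, with placeholder names; not the exact attack code used in the experiments): a record is predicted to be a member if its loss falls below the model's expected training loss, and the same threshold is reused to guess an unknown attribute value.

```python
import numpy as np

def is_member(loss_fn, record, label, expected_train_loss):
    """Membership test: a sufficiently low loss suggests a training member."""
    return loss_fn(record, label) <= expected_train_loss

def infer_attribute(loss_fn, record, label, attr_index, candidate_values,
                    expected_train_loss):
    """Try every candidate value for the unknown attribute and keep the one
    whose loss is closest to the expected training loss."""
    best_value, best_gap = None, np.inf
    for v in candidate_values:
        guess = np.array(record, dtype=float)
        guess[attr_index] = v
        gap = abs(loss_fn(guess, label) - expected_train_loss)
        if gap < best_gap:
            best_value, best_gap = v, gap
    return best_value
```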
Hyperparameters. For both data sets, we train logistic regression and neural network models with ℓ2 regularization. First, we train a non-private model and perform a grid search over the regularization coefficient λ to find the value that minimizes the classification error on the test set. For CIFAR-100, we found optimal values to be λ = 10⁻⁵ for logistic regression and λ = 10⁻⁴ for neural network. For Purchase-100, we found optimal values to be λ = 10⁻⁵ for logistic regression and λ = 10⁻⁸ for neural network. Next, we fix this setting to train differentially private models using gradient perturbation. We vary ε between 0.01 and 1000 while keeping δ = 10⁻⁵, and report the accuracy loss and privacy leakage. The choice of δ = 10⁻⁵ satisfies the requirement that δ should be smaller than the inverse of the training set size, 10,000. We use the ADAM optimizer for training and fix the learning rate to 0.01 with a batch size of 200. Due to the random noise addition, all the experiments are repeated five times and the average results and standard errors are reported. We do not assume pre-trained model parameters, unlike the prior works of Abadi et al. [1] and Yu et al. [75].

Clipping. For gradient perturbation, clipping is required to bound the sensitivity of the gradients. We tried clipping at both the batch and per-instance level. Batch clipping is more computationally efficient and a standard practice in deep learning. On the other hand, per-instance clipping uses the privacy budget more efficiently, resulting in more accurate models for a given privacy budget. We use the TensorFlow Privacy framework [2], which implements both batch and per-instance clipping. We fix the clipping threshold at C = 1.
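The distinction between the two clipping strategies can be sketched as follows (a simplified illustration with placeholder names, not the TensorFlow Privacy implementation): batch clipping bounds the norm of the already-averaged batch gradient, whereas per-instance clipping bounds each example's gradient before averaging, which is what limits the influence of any individual record.

```python
import numpy as np

def clip(g, C):
    """Scale g down so that its L2 norm is at most C."""
    return g * min(1.0, C / max(np.linalg.norm(g), 1e-12))

def batch_clipped_gradient(per_example_grads, C):
    # Average first, then clip the single batch gradient once.
    return clip(np.mean(per_example_grads, axis=0), C)

def per_instance_clipped_gradient(per_example_grads, C):
    # Clip every example's gradient, then average.
    return np.mean([clip(g, C) for g in per_example_grads], axis=0)
```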
Figure 1 compares the accuracy loss of logistic regression models trained on the CIFAR-100 data set with both batch clipping and per-instance clipping. Per-instance clipping allows learning more accurate models for all values of ε and amplifies the differences between the different mechanisms. For example, the model trained with RDP achieves accuracy close to the non-private model for ε = 100 when performing per-instance clipping. In contrast, the models do not learn anything useful when using batch clipping. Hence, for the rest of the paper we only report the results for per-instance clipping.

4.2 Logistic Regression Results

We train ℓ2-regularized logistic regression models on both the CIFAR-100 and Purchase-100 data sets.

CIFAR-100. The baseline model for non-private logistic regression achieves accuracy of 0.225 on the training set and 0.155 on the test set, which is competitive with the state-of-the-art neural network model [61] that achieves test accuracy close to 0.20 on CIFAR-100 after training on a larger data set. Thus, there is a small generalization gap of 0.07, which the inference attacks try to exploit.

Figure 1(b) compares the accuracy loss for logistic regression models trained with different relaxed notions of differential privacy as we vary the privacy budget ε. The accuracy loss is normalized with respect to the accuracy of the non-private model to clearly depict the model utility. An accuracy loss value of 1 means that the model has 100% loss and hence has no utility, whereas a value of 0 means that the model achieves the same accuracy as the non-private baseline. As depicted in the figure, naïve composition achieves accuracy
Table 5: Number of individuals (out of 10,000) exposed by the Yeom et al. membership inference attack on logistic regression (CIFAR-100). The non-private (ε = ∞) model leaks 129, 240, and 704 members at 1%, 2%, and 5% FPR, respectively.
Purchase-100. The baseline model for non-private logistic regression achieves accuracy of 0.942 on the training set and 0.695 on the test set. In comparison, Google ML platform's black-box trained model achieves a test accuracy of 0.656 for Purchase-100 (see Shokri et al. [61] for details).

Figure 3 shows the accuracy loss of all differential privacy variants on the Purchase-100 data set. Naïve composition and advanced composition have essentially no utility until ε exceeds 100. At ε = 1000, naïve composition achieves accuracy loss of 0.116 ± 0.003, advanced composition achieves accuracy loss of 0.513 ± 0.003, and the other variants achieve accuracy loss close to 0.02. RDP achieves the best utility across all ε values. zCDP performs better than advanced composition and naïve composition.

Figure 4 compares the privacy leakage of the variants against the inference attacks. The leakage is in accordance with the noise each variant adds, and it increases proportionally to the model utility. Hence, if a model has reasonable utility, it is bound to leak membership information. The white-box membership inference attack of Yeom et al. is relatively more effective than the black-box membership inference attack of Shokri et al., as shown in Figures 4(a) and 4(b). Table 6 shows the number of individual members exposed, with similar results to the findings for CIFAR-100.
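The leakage values reported here and the per-FPR exposure counts in Tables 5-7 can be computed from an attack's decisions roughly as follows. This is a sketch under the assumption that leakage is measured as the membership advantage of Yeom et al. (true positive rate minus false positive rate) and that "exposed" members are the true positives of a threshold attack operated at a fixed false positive rate; the variable names are placeholders.

```python
import numpy as np

def privacy_leakage(member_preds, nonmember_preds):
    """Membership advantage: TPR minus FPR of the attack's binary decisions."""
    tpr = np.mean(member_preds)      # fraction of members flagged as members
    fpr = np.mean(nonmember_preds)   # fraction of non-members flagged as members
    return tpr - fpr

def members_exposed(member_scores, nonmember_scores, target_fpr):
    """Count members exposed when the decision threshold is chosen so that
    roughly `target_fpr` of non-members are (wrongly) flagged as members."""
    threshold = np.quantile(nonmember_scores, 1.0 - target_fpr)
    return int(np.sum(member_scores > threshold))
```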
4.3 Neural Networks

We train a neural network model consisting of two hidden layers and an output layer. The hidden layers have 256 neurons that use ReLU activation. The output layer is a softmax layer with 100 neurons, each corresponding to a class label. This architecture is similar to the one used by Shokri et al. [61].
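For reference, the described architecture corresponds to roughly the following Keras model (a sketch, not the authors' code). The optimizer and learning rate mirror the hyperparameters reported earlier in this section; the choice of loss function is an assumption.

```python
import tensorflow as tf

def build_model(num_features, num_classes=100):
    """Two hidden layers of 256 ReLU units and a 100-way softmax output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # ADAM with learning rate 0.01 as in Section 4.1; loss is assumed to be
    # standard cross-entropy over integer class labels.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A differentially private variant would replace the plain optimizer step with a gradient-perturbing one, as sketched in Section 2.2.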
CIFAR-100. The baseline non-private neural network model achieves accuracy of 1.000 on the training set and 0.168 on the test set, which is competitive with the neural network model of Shokri et al. [61]. Their model is trained on a training set of size 29,540 and achieves test accuracy of 0.20, whereas our model is trained on 10,000 training instances. There is a huge generalization gap of 0.832, which the inference attacks can exploit.

Figure 5(a) compares the accuracy loss of neural network models trained with different relaxed notions of differential privacy with varying privacy budget ε. The model trained with naïve composition does not learn anything useful until ε = 100 (accuracy loss of 0.907 ± 0.004), at which point the advanced composition also has accuracy loss close to 0.935 and the other variants achieve accuracy loss close to 0.24. None of the variants approach zero accuracy loss, even for ε = 1000. The relative performance is similar to that of the logistic regression model discussed in Section 4.2.

Figures 6(a) and 6(b) show the privacy leakage due to membership inference attacks on neural network models trained with different relaxed notions for both attacks. The privacy leakage for each variation of differential privacy accords with the amount of noise it adds to the model. The leakage is significant for relaxed variants at higher ε values due to model overfitting. For ε = 1000, with the Shokri et al. attack, naïve composition has leakage of 0.034 compared to 0.002 for advanced composition, 0.219 for zCDP, and 0.277 for RDP (above the region shown in the plot). For the white-box attacker of Yeom et al. [74], RDP leaks the most for ε = 1000 (membership advantage of 0.399), closely followed by zCDP. This is because these relaxed variations add considerably less noise in comparison to naïve composition. Naïve composition and advanced composition achieve strong privacy against membership inference attackers, but fail to learn anything useful. No option appears to provide both acceptable model utility and meaningful privacy.

Like we did for logistic regression, we report the actual number of training set members exposed to the attacker in Table 7. The impact of privacy leakage is far more severe for the non-private neural network model due to model overfitting—
[Figure: privacy leakage versus privacy budget ε under (a) the Shokri et al. membership inference attack, (b) the Yeom et al. membership inference attack, and (c) the Yeom et al. attribute inference attack; curves for NC, AC, zCDP, RDP, and the ε-DP bound.]
[Figure: accuracy loss versus privacy budget ε for (a) CIFAR-100 and (b) Purchase-100; curves for NC, AC, zCDP, and RDP.]
Table 6: Number of members (out of 10,000) exposed by the Yeom et al. membership inference attack on logistic regression (Purchase-100). The non-private (ε = ∞) model leaks 102, 262, and 716 members at 1%, 2%, and 5% FPR, respectively.
Table 7: Number of members (out of 10,000) exposed by the Yeom et al. membership inference attack on neural network (CIFAR-100). The non-private (ε = ∞) model leaks 0, 556, and 7,349 members at 1%, 2%, and 5% FPR, respectively.
exposing over 73% of training set members at 5% false positive rate, compared to only 7% for the logistic regression model.² The privacy mechanisms provide substantial reduction in exposure, even with high ε budgets, but the relaxed variants expose more members compared to naïve composition and advanced composition.

Figure 6(c) depicts the privacy leakage due to the attribute inference attack on the neural network models. Naïve composition and advanced composition are both resistant to the attack for ε ≤ 100, but the relaxed variants reveal some privacy leakage for lower privacy budgets.

Purchase-100. The baseline non-private neural network model achieves accuracy of 0.982 on the training set and 0.605 on the test set. In comparison, the neural network model of Shokri et al. [61] trained on a similar data set (but with 600 attributes instead of 100 as in our data set) achieves 0.670 test accuracy.

Figure 5(b) compares the accuracy loss, and Figure 7 the privacy leakage, of neural network models trained with different variants of differential privacy. The trends for both accuracy and privacy are similar to those for the logistic regression models (Figure 3). The relaxed variants achieve model utility close to the non-private baseline for ε = 1000, while naïve composition continues to suffer from high accuracy loss (0.372). Advanced composition has higher accuracy loss of 0.702 for ε = 1000 as it requires addition of more noise than naïve composition when ε is greater than the number of training epochs. Figure 7 shows the privacy leakage comparison of the variants against the inference attacks. The results are consistent with those observed for CIFAR-100.

² Curiously, this appears to be contradicted at 1% FPR, where no members are revealed by the non-private NN model but some are revealed by the privacy-preserving models. This is due to the number of extremely high-confidence incorrect outputs of the non-private model, meaning that there is no confidence threshold that does not include at least 1% false positives.
[Figure: privacy leakage versus privacy budget ε under (a) the Shokri et al. membership inference attack, (b) the Yeom et al. membership inference attack, and (c) the Yeom et al. attribute inference attack; curves for NC, AC, zCDP, RDP, and the ε-DP bound.]
[6] Brett K Beaulieu-Jones, William Yuan, Samuel G Finlayson, and Zhiwei Steven Wu. Privacy-preserving distributed deep learning for clinical data. arXiv:1812.01484, 2018.

[7] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discovering frequent patterns in sensitive data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

[8] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor, and Ryan Rogers. Protection against reconstruction and its applications in private federated learning. arXiv:1812.00984, 2018.

[9] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, 2016.

[10] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. The Secret Sharer: Evaluating and testing unintended memorization in neural networks. In USENIX Security Symposium, 2019.

[11] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, 2009.

[12] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private Empirical Risk Minimization. Journal of Machine Learning Research, 2011.

[13] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In ACM Conference on Computer and Communications Security, 2018.

[20] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In USENIX Security Symposium, 2014.

[21] Arik Friedman and Assaf Schuster. Data mining with differential privacy. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

[22] Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In ACM Conference on Computer and Communications Security, 2018.

[23] Joseph Geumlek, Shuang Song, and Kamalika Chaudhuri. Rényi differential privacy mechanisms for posterior sampling. In Advances in Neural Information Processing Systems, 2017.

[24] Robin C Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective. arXiv:1712.07557, 2017.

[25] Jihun Hamm, Paul Cao, and Mikhail Belkin. Learning privately from multiparty data. In International Conference on Machine Learning, 2016.

[26] Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, and Dan Zhang. Principled evaluation of differentially private algorithms using DPBench. In ACM SIGMOD Conference on Management of Data, 2016.

[27] Nils Homer et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 2008.
[33] Prateek Jain and Abhradeep Thakurta. Differentially private learning with kernels. In International Conference on Machine Learning, 2013.

[34] Prateek Jain and Abhradeep Guha Thakurta. (Near) Dimension independent risk bounds for differentially private learning. In International Conference on Machine Learning, 2014.

[35] Bargav Jayaraman, Lingxiao Wang, David Evans, and Quanquan Gu. Distributed learning without distress: Privacy-preserving Empirical Risk Minimization. In Advances in Neural Information Processing Systems, 2018.

[36] Kaggle, Inc. Acquire Valued Shoppers Challenge. https://kaggle.com/c/acquire-valued-shoppers-challenge/data, 2014.

[37] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[38] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[39] Jaewoo Lee. Differentially private variance reduced stochastic gradient descent. In International Conference on New Trends in Computing Sciences, 2017.

[40] Dong-Hui Li and Masao Fukushima. A modified BFGS method and its global convergence in nonconvex minimization. Journal of Computational and Applied Mathematics, 2001.

[47] Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the Netflix prize contenders. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009.

[48] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Symposium on Foundations of Computer Science, 2007.

[49] Ilya Mironov. Rényi differential privacy. In IEEE Computer Security Foundations Symposium, 2017.

[50] Luis Munoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, and Fabio Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In ACM Workshop on Artificial Intelligence and Security, 2017.

[51] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In ACM Symposium on Theory of Computing, 2007.

[52] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In International Conference on Learning Representations, 2017.

[53] Mijung Park, Jimmy Foulds, Kamalika Chaudhuri, and Max Welling. DP-EM: Differentially private expectation maximization. In Artificial Intelligence and Statistics, 2017.

[54] Manas Pathak, Shantanu Rane, and Bhiksha Raj. Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers. In Advances in Neural Information Processing Systems, 2010.
[59] Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In Network and Distributed Systems Security Symposium, 2019.

[60] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In ACM Conference on Computer and Communications Security, 2015.

[61] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017.

[62] Adam Smith and Abhradeep Thakurta. Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso. In Proceedings of Conference on Learning Theory, 2013.

[63] Shuang Song, Kamalika Chaudhuri, and Anand D Sarwate. Stochastic gradient descent with differentially private updates. In IEEE Global Conference on Signal and Information Processing, 2013.

[64] Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Private Empirical Risk Minimization beyond the worst case: The effect of the constraint set geometry. arXiv:1411.5417, 2014.

[65] Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Nearly Optimal Private LASSO. In Advances in Neural Information Processing Systems, 2015.

[66] Florian Tramèr, Fan Zhang, Ari Juels, Michael Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security Symposium, 2016.

[71] Huang Xiao, Battista Biggio, Blaine Nelson, Han Xiao, Claudia Eckert, and Fabio Roli. Support vector machines under adversarial label contamination. Neurocomputing, 2015.

[72] Mengjia Yan, Christopher Fletcher, and Josep Torrellas. Cache telepathy: Leveraging shared resource attacks to learn DNN architectures. arXiv:1808.04761, 2018.

[73] Chaofei Yang, Qing Wu, Hai Li, and Yiran Chen. Generative poisoning attack method against neural networks. arXiv:1703.01340, 2017.

[74] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In IEEE Computer Security Foundations Symposium, 2018.

[75] Lei Yu, Ling Liu, Calton Pu, Mehmet Emre Gursoy, and Stacey Truex. Differentially private model publishing for deep learning. In IEEE Symposium on Security and Privacy, 2019.

[76] Matthew D Zeiler. ADADELTA: An adaptive learning rate method. arXiv:1212.5701, 2012.

[77] Jiaqi Zhang, Kai Zheng, Wenlong Mou, and Liwei Wang. Efficient private ERM for smooth objectives. In International Joint Conference on Artificial Intelligence, 2017.

[78] Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and Marianne Winslett. Functional mechanism: Regression analysis under differential privacy. The VLDB Journal, 2012.

[79] Lingchen Zhao, Yan Zhang, Qian Wang, Yanjiao Chen, Cong Wang, and Qin Zou. Privacy-preserving collaborative deep learning with irregular participants. arXiv:1812.10113, 2018.