arity of the feature), and continuous features, which assume a value in a bounded subset of the real numbers. Without loss of generality, for a categorical feature of arity k, we let Xi = Zk. For a continuous feature taking values between bounds a and b, we let Xi = [a, b] ⊂ R.
Inputs to a model may be pre-processed to perform feature extraction. In this case, inputs come from a space M, and feature extraction involves application of a function ex : M → X that maps inputs into a feature space. Model application then proceeds by composition in the natural way, taking the form f(ex(M)). Generally, feature extraction is many-to-one. For example, M may be a piece of English-language text and the extracted features counts of individual words (so-called "bag-of-words" feature extraction). Other examples are input scaling and one-hot encoding of categorical features.
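For concreteness, here is a minimal sketch (ours, not the paper's) of such a function ex, combining bag-of-words extraction over a small hypothetical vocabulary with a one-hot encoder for categorical features:

    # A minimal sketch of a feature-extraction function ex : M -> X.
    # The vocabulary and example inputs are hypothetical.
    from collections import Counter

    VOCAB = ["free", "money", "meeting", "tomorrow"]

    def ex(document):
        """Map raw text in M to a vector of word counts in X."""
        counts = Counter(document.lower().split())
        return [counts[w] for w in VOCAB]

    def one_hot(value, categories):
        """One-hot-encode a categorical feature of arity k into {0,1}^k."""
        return [1 if value == c else 0 for c in categories]

    # ex is many-to-one: distinct documents can map to the same vector.
    assert ex("free money money") == ex("money free money")
    print(ex("free money tomorrow"))                 # [1, 1, 0, 1]
    print(one_hot("red", ["red", "green", "blue"]))  # [1, 0, 0]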
We focus primarily on classification settings in which f predicts a nominal variable ranging over a set of classes. Given c classes, we use as class labels the set Zc. If Y = Zc, the model returns only the predicted class label. In some applications, however, additional information is often helpful, in the form of real-valued measures of confidence in the labels output by the model; these measures are called confidence values. The output space is then Y = [0, 1]c. For a given x ∈ X and i ∈ Zc, we denote by fi(x) the ith component of f(x) ∈ Y. The value fi(x) is a model-assigned probability that x has associated class label i. The model's predicted class is defined by the value argmaxi fi(x), i.e., the most probable label.
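For concreteness, a trivial sketch (ours, not the paper's) of recovering the predicted class from a confidence vector:

    # Recover the predicted class from a confidence vector f(x) in [0,1]^c
    # via argmax over the class probabilities, as defined above.
    def predicted_class(fx):
        return max(range(len(fx)), key=lambda i: fx[i])  # argmax_i f_i(x)

    print(predicted_class([0.1, 0.7, 0.2]))  # 1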
We associate with Y a distance measure dY. We drop the subscript Y when it is clear from context. For Y = Zc we use the 0-1 distance, meaning d(y, y′) = 0 if y = y′ and d(y, y′) = 1 otherwise. For Y = [0, 1]c, we use the 0-1 distance when comparing predicted classes; when comparing class probabilities directly, we instead use the total variation distance, given by d(y, y′) = (1/2) ∑i |y[i] − y′[i]|. In the rest of this paper, unless explicitly specified otherwise, dY refers to the 0-1 distance over class labels.
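As an illustration, both distance measures can be computed directly from the definitions above; the example vectors are arbitrary:

    # 0-1 distance over class labels: 0 if the labels agree, 1 otherwise.
    def d_01(y, y_prime):
        return 0 if y == y_prime else 1

    # Total variation distance over probability vectors in Y = [0,1]^c:
    # d(y, y') = (1/2) * sum_i |y[i] - y'[i]|.
    def d_tv(y, y_prime):
        return 0.5 * sum(abs(a - b) for a, b in zip(y, y_prime))

    print(d_01(2, 2))  # 0
    print(d_01(2, 3))  # 1
    print(d_tv([0.7, 0.2, 0.1], [0.5, 0.4, 0.1]))  # 0.2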
Training algorithms. We consider models obtained via supervised learning. These models are generated by a training algorithm T that takes as input a training set {(xi, yi)}i, where (xi, yi) ∈ X × Y is an input with an associated (presumptively correct) class label. The output of T is a model f defined by a set of parameters, which are model-specific, and hyper-parameters, which specify the type of models T generates. Hyper-parameters may be viewed as distinguished parameters, often taken from a small number of standard values; for example, the kernel type used in an SVM, of which only a small set are used in practice, may be seen as a hyper-parameter.
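A minimal sketch of this interface, assuming scikit-learn (the paper does not prescribe a library): the kernel type acts as a hyper-parameter of T, while the fitted coefficients are the model-specific parameters:

    # A training algorithm T taking a training set {(x_i, y_i)}_i; the
    # `kernel` argument is a hyper-parameter, drawn from a small standard set.
    from sklearn.svm import SVC

    def T(training_set, kernel="rbf"):
        X, y = zip(*training_set)
        model = SVC(kernel=kernel)
        model.fit(list(X), list(y))
        return model  # f, defined by learned, model-specific parameters

    data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
    f = T(data, kernel="linear")
    print(f.predict([[0.9, 0.2]]))  # [1]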
[Figure 1: Diagram of ML model extraction attacks. A data owner has a model f trained on its data and allows others to make prediction queries. An adversary uses q prediction queries to extract an fˆ ≈ f.]

3 Model Extraction Attacks

An ML model extraction attack arises when an adversary obtains black-box access to some target model f and attempts to learn a model fˆ that closely approximates, or even matches, f (see Figure 1).

As mentioned previously, the restricted case in which f outputs class labels only matches the membership-query setting considered in learning theory, e.g., PAC learning [53] and other previous works [3, 7, 8, 15, 30, 33, 36]. Learning-theory algorithms have seen only limited study in practice, e.g., in [36], and our investigation may be viewed as a practice-oriented exploration of this branch of research. Our initial focus, however, is on a different setting common in today's MLaaS services, which we now explain in detail. Models trained by these services emit data-rich outputs that often include confidence values, and the services often accept partial feature vectors as valid inputs. As we show later, this setting greatly advantages adversaries.

Machine learning services. A number of companies have launched or are planning to launch cloud-based ML services. A common denominator is the ability of users to upload data sets, have the provider run training algorithms on the data, and make the resulting models generally available for prediction queries. Simple-to-use Web APIs handle the entire interaction. This service model lets users capitalize on their data without having to set up their own large-scale ML infrastructure. Details vary greatly across services. We summarize a number of them in Table 2 and now explain some of the salient features.

[Table 2: Particularities of major MLaaS providers: Amazon [1], Microsoft [38], BigML [11], PredictionIO [43], and Google [25], compared on supported model classes (logistic regression, SVM, neural network, decision tree) and on 'White-box', 'Monetize', and 'Confidence Scores' support; the per-service entries are not recoverable from this copy. 'White-box' refers to the ability to download and use a trained model locally, and 'Monetize' means that a user may charge other users for black-box access to her models. Model support for each service is obtained from available documentation. The models listed for Google's API are a projection based on the announced support of models in standard PMML format [25]. Details on ML models are given in Appendix A.]

A model is white-box if a user may download a representation suitable for local use. It is black-box if accessible only via a prediction query interface. Amazon and Google, for example, provide black-box-only services. Google does not even specify what training algorithm their service uses, while Amazon provides only partial documentation for its feature extraction ex (see Section 5). Some services allow users to monetize trained models by charging others for prediction queries.

To use these services, a user uploads a data set and optionally applies some data pre-processing (e.g., field removal or handling of missing values).
She then trains a model by either choosing one of many supported model classes (as in BigML, Microsoft, and PredictionIO) or having the service choose an appropriate model class (as in Amazon and Google). Two services have also announced upcoming support for users to upload their own trained models (Google) and their own custom learning algorithms (PredictionIO). When training a model, users may tune various parameters of the model or training algorithm (e.g., regularizers, tree size, learning rates) and control feature-extraction and transformation methods.
For black-box models, the service provides users with information needed to create and interpret predictions, such as the list of input features and their types. Some services also supply the model class, chosen training parameters, and training-data statistics (e.g., BigML gives the range, mean, and standard deviation of each feature).
To get a prediction from a model, a user sends one or more input queries. The services we reviewed accept both synchronous requests and asynchronous 'batch' requests for multiple predictions. We further found varying degrees of support for 'incomplete' queries, in which some input features are left unspecified [46]. We will show that exploiting incomplete queries can drastically improve the success of some of our attacks. Apart from PredictionIO, all of the services we examined respond to prediction queries with not only class labels, but a variety of additional information, including confidence scores (typically class probabilities) for the predicted outputs.
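A sketch of what such a query might look like; the endpoint URL, request format, and field names below are hypothetical placeholders, not any specific provider's actual API:

    # A hypothetical prediction query. Omitting a feature from the input
    # dictionary yields an 'incomplete' query where the service supports them.
    import json
    import urllib.request

    def query(endpoint, features):
        body = json.dumps({"input": features}).encode()
        req = urllib.request.Request(endpoint, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)  # e.g., {"label": ..., "confidences": [...]}

    # Example calls (hypothetical URL): a full query and an incomplete one.
    # query("https://ml.example.com/predict", {"age": 30, "income": 50000})
    # query("https://ml.example.com/predict", {"age": 30})  # 'income' unspecified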
Google and BigML allow model owners to monetize their models by charging other users for predictions. Google sets a minimum price of $0.50 per 1,000 queries. On BigML, 1,000 queries consume at least 100 credits, costing $0.10–$5, depending on the user's subscription.
Attack scenarios. We now describe possible motivations for adversaries to perform model extraction attacks. We then present a more detailed threat model informed by characteristics of the aforementioned ML services.

Avoiding query charges. Successful monetization of prediction queries requires confidentiality of f. A malicious user may seek to launch what we call a cross-user model extraction attack, stealing f for subsequent free use. More subtly, in black-box-only settings (e.g., Google and Amazon), a service's business model may involve amortizing up-front training costs by charging users for future predictions. A model extraction attack will undermine the provider's business model if a malicious user pays less for training and extraction than for per-query charges.

Violating training-data privacy. Model extraction could, in turn, leak information about sensitive training data. Prior attacks such as model inversion [4, 23, 24] have shown that access to a model can be abused to infer information about training set points. Many of these attacks work better in white-box settings; model extraction may thus be a stepping stone to such privacy-abusing attacks. Looking ahead, we will see that in some cases, significant information about training data is leaked trivially by successful model extraction, because the model itself directly incorporates training set points.

Stepping stone to evasion. In settings where an ML model serves to detect adversarial behavior, such as identification of spam, malware classification, and network anomaly detection, model extraction can facilitate evasion attacks. An adversary may use knowledge of the ML model to avoid detection by it [4, 9, 29, 36, 55].

In all of these settings, there is an inherent assumption of secrecy of the ML model in use. We show that this assumption is broken for all ML APIs that we investigate.

Threat model in detail. Two distinct adversarial models arise in practice. An adversary may be able to make direct queries, providing an arbitrary input x to a model f and obtaining the output f(x). Or the adversary may be able to make only indirect queries, i.e., queries on points in input space M yielding outputs f(ex(M)). The feature extraction mechanism ex may be unknown to the adversary. In Section 5, we show how ML APIs can further be exploited to "learn" feature extraction mechanisms. Both direct and indirect access to f arise in ML services. (Direct query interfaces arise when clients are expected to perform feature extraction locally.) In either case, the output value can be a class label, a confidence value vector, or some data structure revealing various levels of information, depending on the exposed API.

We model the adversary, denoted by A, as a randomized algorithm. The adversary's goal is to use as few queries as possible to f in order to efficiently compute an approximation fˆ that closely matches f. We formalize "closely matching" using two different error measures:

• Test error Rtest: This is the average error over a test set D, given by Rtest(f, fˆ) = ∑(x,y)∈D d(f(x), fˆ(x))/|D| (see the sketch below).
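A minimal sketch of this computation, with hypothetical stand-ins for f, fˆ, and D:

    # Average 0-1 disagreement between f and f_hat over a test set D;
    # implements R_test(f, f_hat) = sum_{(x,y) in D} d(f(x), f_hat(x)) / |D|.
    def r_test(f, f_hat, D, d=lambda a, b: 0 if a == b else 1):
        return sum(d(f(x), f_hat(x)) for x, _ in D) / len(D)

    f     = lambda x: int(x[0] > 0.5)  # hypothetical target model
    f_hat = lambda x: int(x[0] > 0.4)  # hypothetical extracted model
    D = [([0.45], 0), ([0.9], 1), ([0.1], 0), ([0.6], 1)]
    print(r_test(f, f_hat, D))  # 0.25: the models disagree only on x = [0.45]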
[27] Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015).
[28] Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366.
[29] Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., and Tygar, J. Adversarial machine learning. In AISec (2011), ACM, pp. 43–58.
[30] Jackson, J. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In FOCS (1994), IEEE, pp. 42–53.
[31] Jagannathan, G., Pillaipakkamnatt, K., and Wright, R. N. A practical differentially private random decision tree classifier. In ICDMW (2009), IEEE, pp. 114–121.
[32] Kloft, M., and Laskov, P. Online anomaly detection under adversarial impact. In AISTATS (2010), pp. 405–412.
[33] Kushilevitz, E., and Mansour, Y. Learning decision trees using the Fourier spectrum. SICOMP 22, 6 (1993), 1331–1348.
[34] Li, N., Qardaji, W., Su, D., Wu, Y., and Yang, W. Membership privacy: A unifying framework for privacy definitions. In CCS (2013), ACM.
[35] Lichman, M. UCI machine learning repository, 2013.
[36] Lowd, D., and Meek, C. Adversarial learning. In KDD (2005), ACM, pp. 641–647.
[37] Lowd, D., and Meek, C. Good word attacks on statistical spam filters. In CEAS (2005).
[38] Microsoft Azure. https://azure.microsoft.com/services/machine-learning. Accessed Feb. 10, 2016.
[39] Nelson, B., Rubinstein, B. I., Huang, L., Joseph, A. D., Lee, S. J., Rao, S., and Tygar, J. Query strategies for evading convex-inducing classifiers. JMLR 13, 1 (2012), 1293–1332.
[50] Stevens, D., and Lowd, D. On the hardness of evading combinations of linear classifiers. In AISec (2013), ACM, pp. 77–86.
[51] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688 (2016).
[52] Towell, G. G., and Shavlik, J. W. Extracting refined rules from knowledge-based neural networks. Machine Learning 13, 1 (1993), 71–101.
[53] Valiant, L. G. A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134–1142.
[54] Vinterbo, S. Differentially private projected histograms: Construction and use for prediction. In ECML-PKDD (2012).
[55] Šrndić, N., and Laskov, P. Practical evasion of a learning-based classifier: A case study. In Security and Privacy (SP) (2014), IEEE, pp. 197–211.
[56] Zhang, J., Zhang, Z., Xiao, X., Yang, Y., and Winslett, M. Functional mechanism: regression analysis under differential privacy. In VLDB (2012).
[57] Zhu, J., and Hastie, T. Kernel logistic regression and the import vector machine. In NIPS (2001), pp. 1081–1088.
A Some Details on Models

SVMs. Support vector machines (SVMs) perform binary classification (c = 2) by defining a maximally separating hyperplane in d-dimensional feature space. A linear SVM is a function f(x) = sign(w · x + β) where 'sign' outputs 0 for all negative inputs and 1 otherwise. Linear SVMs are not suitable for non-linearly separable data. Here one uses instead kernel techniques [14].