The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini1,2 Chang Liu2 Úlfar Erlingsson1 Jernej Kos3 Dawn Song2
1 Google Brain 2 University of California, Berkeley 3 National University of Singapore
Abstract

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models.

For example, users may find that the input "my social-security number is..." gets auto-completed to an obvious secret (such as a valid-looking SSN not their own), or find that other inputs are auto-completed to text with oddly-specific details. So triggered, unscrupulous or curious users may start to "attack" such models by entering different prefixes to try to mine possibly-secret suffixes.

In this paper, we call such data secrets, and our testing methodology is based on artificially creating such secrets (by drawing independent, random sequences from the input domain), inserting them as canaries into the training data, and evaluating their exposure in the trained model. When we refer to memorization without qualification, we are specifically referring to this type of unintended memorization.

It is easier to produce a classifier that can perfectly label the training data than a classifier that generalizes to correctly label new, previously unseen data. Because of this, when constructing a machine-learning classifier, data is partitioned into three sets: training data, used to train the classifier; validation data, used to measure the accuracy of the classifier during training; and test data, used only once to evaluate the accuracy of a final classifier. Measuring the "training loss" and "testing loss" averaged across the entire training or test inputs allows detecting when overfitting has occurred due to overtraining, i.e., training for too many steps [47]. Figure 2 shows a typical example of the problem of overtraining (here the result of training a large language model).

[Figure 2: loss over epochs of training (x-axis 0 to 40, y-axis roughly 1.0 to 2.5), illustrating overtraining.]

Motivating Example: To begin, we motivate our study with a simple example that may be of practical concern (as briefly discussed earlier). Consider a generative sequence model trained on a text dataset used for automated sentence completion—e.g., such as one that might be used in a text-composition assistant. Ideally, even if the training data contained rare-but-sensitive information about some individual users, the neural network would not memorize this information and would never emit it as a sentence completion.
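As a minimal illustration of the loss-curve comparison just described, the following sketch monitors training and validation loss epoch by epoch and records the epoch with the lowest validation loss. The train_one_epoch and average_loss helpers are hypothetical placeholders for whatever model and framework is in use; this is not the paper's training code.

```python
def train_with_loss_monitoring(model, train_data, validation_data,
                               train_one_epoch, average_loss, max_epochs=40):
    """Track training/validation loss per epoch to spot overtraining.

    `train_one_epoch(model, data)` and `average_loss(model, data)` are assumed
    helpers supplied by the surrounding training framework.
    """
    best_val_loss, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model, train_data)
        train_loss = average_loss(model, train_data)
        val_loss = average_loss(model, validation_data)
        # Overtraining shows up as training loss still falling while
        # validation loss has stopped improving (or starts rising).
        if val_loss < best_val_loss:
            best_val_loss, best_epoch = val_loss, epoch
        print(f"epoch {epoch}: train={train_loss:.3f} validation={val_loss:.3f}")
    return best_epoch
```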
In particular, if the training data happened to contain text written by User A with the prefix "My social security number is ...", one would hope that the exact number in the suffix of User A's text would not be predicted as the most-likely completion, e.g., if User B were to type that text prefix.

Unfortunately, we show that training of neural networks can cause exactly this to occur, unless great care is taken.

To make this example very concrete, the next few paragraphs describe the results of an experiment with a character-level language model that predicts the next character given a prior sequence of characters [4, 36]. Such models are commonly used as the basis of everything from sentiment analysis to compression [36, 53]. As one of the cornerstones of language understanding, it is a representative case study for generative modeling. (Later, in Section 6.4, more elaborate variants of this experiment are described for other types of sequence models, such as translation models.)

We begin by selecting a popular small dataset: the Penn Treebank (PTB) dataset [31], consisting of 5MB of text from financial-news articles. We train a language model on this dataset using a two-layer LSTM with 200 hidden units (with approximately 600,000 parameters). The language model receives as input a sequence of characters, and outputs a probability distribution over what it believes will be the next character; by iterating on these probabilities, the model can be used to predict likely text completions. Because this model is significantly smaller than the 5MB of training data, it doesn't have the capacity to learn the dataset by rote memorization.

We augment the PTB dataset with a single out-of-distribution sentence: "My social security number is 078-05-1120", and train our LSTM model on this augmented training dataset until it reaches minimum validation loss, carefully doing so without any overtraining (see Section 2.3).

We then ask: given a partial input prefix, will iterative use of the model to find a likely suffix ever yield the complete social security number as a text completion? We find the answer to our question to be an emphatic "Yes!", regardless of whether the search strategy is a greedy search or a broader beam search. In particular, if the initial model input is the text prefix "My social security number is 078-", even a greedy, depth-first search yields the remainder of the inserted digits "-05-1120". In repeating this experiment, the results held consistent: whenever the first two to four digits of the SSN were given as a prefix, the model would complete the remaining seven to five digits.

Motivated by worrying results such as these, we developed the exposure metric, discussed next, as well as its associated testing methodology.

4 Measuring Unintended Memorization

Having described unintentional memorization in neural networks, and demonstrated by empirical case study that it does sometimes occur, we now describe systematic methods for assessing the risk of disclosure due to such memorization.

4.1 Notation and Setup

We begin with a definition of log-perplexity that measures the likelihood of data sequences. Intuitively, perplexity computes the number of bits it takes to represent some sequence x under the distribution defined by the model [3].

Definition 1 The log-perplexity of a sequence x is

    Px_θ(x_1...x_n) = −log2 Pr(x_1...x_n | f_θ)
                    = Σ_{i=1}^{n} −log2 Pr(x_i | f_θ(x_1...x_{i−1}))

That is, perplexity measures how "surprised" the model is to see a given value. A higher perplexity indicates the model is "more surprised" by the sequence. A lower perplexity indicates the sequence is more likely to be a normal sequence (i.e., perplexity is inversely correlated with likelihood).
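To make Definition 1 concrete, here is a minimal sketch of computing log-perplexity from a model's per-token predictions. The next_char_probs callback and the uniform toy model below are hypothetical stand-ins for f_θ, not the paper's implementation.

```python
import math
from typing import Callable, Dict, Sequence

def log_perplexity(seq: Sequence[str],
                   next_char_probs: Callable[[Sequence[str]], Dict[str, float]]) -> float:
    """Px_theta(x_1..x_n) = sum_i -log2 Pr(x_i | f_theta(x_1..x_{i-1}))."""
    total_bits = 0.0
    for i, token in enumerate(seq):
        probs = next_char_probs(seq[:i])        # model's prediction given the prefix
        total_bits += -math.log2(probs[token])  # bits needed to encode this token
    return total_bits

# Toy stand-in for a trained model: a uniform distribution over the ten digits.
def uniform_digit_model(prefix):
    return {d: 0.1 for d in "0123456789"}

print(log_perplexity("281265017", uniform_digit_model))  # 9 * log2(10) ≈ 29.9 bits
```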
Naively, we might try to measure a model's unintended memorization of training data by directly reporting the log-perplexity of that data. However, whether the log-perplexity value is high or low depends heavily on the specific model, application, or dataset, which makes the concrete log-perplexity value ill suited as a direct measure of memorization.

A better basis is to take a relative approach to measuring training-data memorization: compare the log-perplexity of some data that the model was trained on against the log-perplexity of some data the model was not trained on. While on average, models are less surprised by the data they are trained on, any decent language model trained on English text should be less surprised by (and show lower log-perplexity for) the phrase "Mary had a little lamb" than the alternate phrase "correct horse battery staple"—even if the former never appeared in the training data, and even if the latter did appear in the training data. Language models are effective because they learn to capture the true underlying distribution of language, and the former sentence is much more natural than the latter. Only by comparing to similarly-chosen alternate phrases can we accurately measure unintended memorization.

Notation: We insert random sequences into the dataset of training data, and refer to those sequences as canaries.¹ We create canaries based on a format sequence that specifies how the canary sequence values are chosen randomly using randomness r, from some randomness space R. In format sequences, the "holes" (written here as □) are filled with random values; for example, the format s = "The random number is □□□□□□□□□" might be filled with a specific, random number, if R was the space of digits 0 to 9.

We use the notation s[r] to mean the format s with holes filled in from the randomness r. The canary is selected by choosing a random value r̂ uniformly at random from the randomness space. For example, one possible completion would be to let s[r̂] = "The random number is 281265017".

¹ Canaries, as in "a canary in a coal mine."
Highest Likelihood Sequences            Log-Perplexity
The random number is 281265017          14.63
The random number is 281265117          18.56
The random number is 281265011          19.01
The random number is 286265117          20.65
The random number is 528126501          20.88
The random number is 281266511          20.99
The random number is 287265017          20.99
The random number is 281265111          21.16
The random number is 281265010          21.36

Table 1: Possible sequences sorted by log-perplexity. The inserted canary—281265017—has the lowest log-perplexity. The remaining most-likely phrases are all slightly-modified variants, a small edit distance away from the canary phrase.

4.2 The Precise Exposure Metric

The remainder of this section discusses how we can measure the degree to which an individual canary s[r̂] is memorized when inserted in the dataset. We begin with a useful definition.

Definition 2 The rank of a canary s[r] is

    rank_θ(s[r]) = |{r′ ∈ R : Px_θ(s[r′]) ≤ Px_θ(s[r])}|

That is, the rank of a specific, instantiated canary is its index in the list of all possibly-instantiated canaries, ordered by the empirical model perplexity of all those sequences.

For example, we can train a new language model on the PTB dataset, using the same LSTM model architecture as before, and insert the specific canary s[r̂] = "The random number is 281265017". Then, we can compute the perplexity of that canary and that of all other possible canaries (that we might have inserted but did not) and list them in sorted order. Table 1 shows the lowest-perplexity candidate canaries listed in such an experiment.² We find that the canary we insert has rank 1: no other candidate canary s[r′] has lower perplexity.

² The results in this list are not affected by the choice of the prefix text, which might as well have been "any random text." Section 5 discusses further the impact of choosing the non-random, fixed part of the canaries' format.
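The following sketch shows how the rank could be computed exactly by brute force, assuming a log_perplexity function backed by the trained model (such as the one sketched earlier). Because it enumerates every candidate in R, it also illustrates why computing the rank directly is expensive.

```python
import itertools

def rank_of_canary(inserted_digits: str, log_perplexity) -> int:
    """Definition 2: count candidates r' with log-perplexity <= the inserted canary's."""
    target = log_perplexity("The random number is " + inserted_digits)
    rank = 0
    # Enumerates all |R| = 10**len(inserted_digits) candidates -- exactly the cost
    # that makes the exact rank prohibitive for large randomness spaces.
    for digits in itertools.product("0123456789", repeat=len(inserted_digits)):
        candidate = "The random number is " + "".join(digits)
        if log_perplexity(candidate) <= target:
            rank += 1
    return rank
```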
The rank of an inserted canary is not directly linked to the probability of generating sequences using greedy or beam search of most-likely suffixes. Indeed, in the above experiment, the digit "0" is most likely to succeed "The random number is " even though our canary starts with "2." This may prevent naive users from accidentally finding top-ranked sequences, but doesn't prevent recovery by more advanced search methods, or even by users that know a long-enough prefix. (Section 8 describes advanced extraction methods.)

While the rank is a conceptually useful tool for discussing the memorization of secret data, it is computationally expensive, as it requires computing the log-perplexity of all possible candidate canaries. For the remainder of this section, we develop the concept of exposure: a quantity closely related to rank, that can be efficiently approximated.

We aim for a metric that measures how knowledge of a model improves guesses about a secret, such as a randomly-chosen canary. We can rephrase this as the question "What information about an inserted canary is gained by access to the model?" Thus motivated, we can define exposure as a reduction in the entropy of guessing canaries.

Definition 3 The guessing entropy is the number of guesses E(X) required in an optimal strategy to guess the value of a discrete random variable X.

A priori, the optimal strategy to guess the canary s[r], where r ∈ R is chosen uniformly at random, is to make random guesses until the randomness r is found by chance. Therefore, we should expect to make E(s[r]) = ½|R| guesses before successfully guessing the value r.

Once the model f_θ(·) is available for querying, an improved strategy is possible: order the possible canaries by their perplexity, and guess them in order of decreasing likelihood. The guessing entropy for this strategy is therefore exactly E(s[r] | f_θ) = rank_θ(s[r]). Note that this may not be the optimal strategy—improved guessing strategies may exist—but this strategy is clearly effective. So the reduction of work, when given access to the model f_θ(·), is given by

    E(s[r]) / E(s[r] | f_θ) = (½|R|) / rank_θ(s[r]).

Because we are often only interested in the overall scale, we instead report the log of this value:

    log2 ( E(s[r]) / E(s[r] | f_θ) ) = log2 ( (½|R|) / rank_θ(s[r]) )
                                     = log2 |R| − log2 rank_θ(s[r]) − 1.

To simplify the math in future calculations, we re-scale this value for our final definition of exposure:

Definition 4 Given a canary s[r], a model with parameters θ, and the randomness space R, the exposure of s[r] is

    exposure_θ(s[r]) = log2 |R| − log2 rank_θ(s[r])

Note that |R| is a constant. Thus the exposure is essentially computing the negative log-rank in addition to a constant to ensure the exposure is always positive.

Exposure is a real value ranging between 0 and log2 |R|. Its maximum can be achieved only by the most-likely, top-ranked canary; conversely, its minimum of 0 can be achieved only by the least-likely canary. Across possibly-inserted canaries, the median exposure is 1.
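Given a rank, exposure itself is a one-line computation; a minimal sketch follows, using |R| = 10⁹ from the running example.

```python
import math

def exposure(rank: int, randomness_space_size: int) -> float:
    """Definition 4: exposure = log2|R| - log2(rank)."""
    return math.log2(randomness_space_size) - math.log2(rank)

# The inserted canary from the running experiment has rank 1, so its exposure
# takes the maximum possible value of log2(10**9) ≈ 29.9.
print(exposure(rank=1, randomness_space_size=10**9))
```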
Notably, exposure is not a normalized metric: i.e., the magnitude of exposure values depends on the size of the search space. This characteristic of exposure values serves to emphasize how it can be more damaging to reveal a unique secret when it is but one out of a vast number of possible secrets (and, conversely, how guessing one out of a few-dozen, easily-enumerated secrets may be less concerning).

4.3 Efficiently Approximating Exposure

We next present two approaches to approximating the exposure metric: the first a simple approach, based on sampling, and the second a more efficient, analytic approach.

Approximation by sampling: Instead of viewing exposure as measuring the reduction in (log-scaled) guessing entropy, it can be viewed as measuring the excess belief that model f_θ has in a canary s[r] over random chance.
Theorem 1 The exposure metric can also be computed as

    exposure_θ(s[r]) = −log2 Pr_{t∈R} [ Px_θ(s[t]) ≤ Px_θ(s[r]) ]

Proof:

    exposure_θ(s[r]) = log2 |R| − log2 rank_θ(s[r])
                     = −log2 ( rank_θ(s[r]) / |R| )
                     = −log2 ( |{t ∈ R : Px_θ(s[t]) ≤ Px_θ(s[r])}| / |R| )
                     = −log2 Pr_{t∈R} [ Px_θ(s[t]) ≤ Px_θ(s[r]) ]

This gives us a method to approximate exposure: randomly choose some small space S ⊂ R (for |S| ≪ |R|) and then compute an estimate of the exposure as

    exposure_θ(s[r]) ≈ −log2 Pr_{t∈S} [ Px_θ(s[t]) ≤ Px_θ(s[r]) ]

However, this sampling method is inefficient if only very few alternate canaries have lower perplexity than s[r], in which case |S| may have to be large to obtain an accurate estimate.
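A sketch of the sampling-based estimate follows, again assuming a model-backed log_perplexity function. When no sampled candidate beats the canary, the estimate saturates at log2 of the sample size, which is exactly the inefficiency noted above.

```python
import math
import random

def estimate_exposure_by_sampling(inserted_digits, log_perplexity, num_samples=10_000):
    """Estimate -log2 Pr_t[Px(s[t]) <= Px(s[r])] over a random sample S of candidates."""
    target = log_perplexity("The random number is " + inserted_digits)
    hits = 0
    for _ in range(num_samples):
        digits = "".join(random.choice("0123456789") for _ in inserted_digits)
        if log_perplexity("The random number is " + digits) <= target:
            hits += 1
    if hits == 0:
        # No sampled candidate was as likely as the canary: we can only report
        # a lower bound on the exposure.
        return math.log2(num_samples)
    return -math.log2(hits / num_samples)
```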
Approximation by distribution modeling: Using random sampling to estimate exposure is effective when the rank of a canary is high enough (i.e., when random search is likely to find canary candidates s[t] where Px_θ(s[t]) ≤ Px_θ(s[r])). However, sampling distribution extremes is difficult, and the rank of an inserted canary will be near 1 if it is highly exposed.

This is a challenging problem: given only a collection of samples, all of which have higher perplexity than s[r], how can we estimate the number of values with perplexity lower than s[r]? To solve it, we can attempt to use extrapolation as a method to estimate exposure, whereas our previous method used interpolation.

To address this difficulty, we make the simplifying assumption that the perplexity of canaries follows a computable underlying distribution ρ(·) (e.g., a normal distribution). To approximate exposure_θ(s[r]), first observe

    Pr_{t∈R} [ Px_θ(s[t]) ≤ Px_θ(s[r]) ] = Σ_{v ≤ Px_θ(s[r])} Pr_{t∈R} [ Px_θ(s[t]) = v ].

Thus, from its summation form, we can approximate the discrete distribution of log-perplexity using an integral of a continuous distribution:

    exposure_θ(s[r]) ≈ −log2 ∫_0^{Px_θ(s[r])} ρ(x) dx

where ρ(x) is a continuous density function that models the underlying distribution of the perplexity. This continuous distribution must allow the integral to be efficiently computed while also accurately approximating the distribution Pr[Px_θ(s[t]) = v].

The above approach is an effective approximation of the exposure metric. Interestingly, this estimated exposure has no upper bound, even though the true exposure is upper-bounded by log2 |R|, when the inserted canary is the most likely. Usefully, this estimate can thereby help discriminate between cases where a canary is only marginally the most likely, and cases where the canary is by far the most likely.

In this work, we use a skew-normal distribution [40] with mean µ, standard deviation σ, and skew α to model the distribution ρ. Figure 3 shows a histogram of the log-perplexity of all 10⁹ different possible canaries from our prior experiment, overlaid with the skew-normal distribution in dashed red.

Figure 3: Skew-normal fit to the measured perplexity distribution (x-axis: log-perplexity of candidate s[r], roughly 50 to 200; y-axis: frequency ×10⁴). The dotted line indicates the log-perplexity of the inserted canary s[r̂], which is more likely (i.e., has lower perplexity) than any other candidate canary s[r′].

We observed that the approximating skew-normal distribution almost perfectly matches the discrete distribution. No statistical test can confirm that two distributions match perfectly; instead, tests can only reject the hypothesis that the distributions are the same. When we run the Kolmogorov–Smirnov goodness-of-fit test [32] on 10⁶ samples, we fail to reject the null hypothesis (p > 0.1).
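The distribution-modeling estimate can be sketched with SciPy's skew-normal distribution: fit ρ to sampled log-perplexities, then use its CDF in place of the integral. The sample values below are synthetic placeholders; in practice they would come from scoring many candidate canaries with the trained model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder samples standing in for log-perplexities of ~10^5 candidate canaries.
sampled_log_perplexities = rng.normal(loc=120.0, scale=15.0, size=100_000)
canary_log_perplexity = 60.0  # placeholder value for the inserted canary

# Fit the skew-normal density rho (shape a, location, scale) to the samples.
a, loc, scale = stats.skewnorm.fit(sampled_log_perplexities)

# Estimated exposure ≈ -log2 of the probability mass of rho below the canary's
# log-perplexity; the CDF plays the role of the integral up to Px_theta(s[r]).
prob_below = stats.skewnorm.cdf(canary_log_perplexity, a, loc=loc, scale=scale)
print(-np.log2(prob_below))
```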
5 Exposure-Based Testing Methodology

We now introduce our testing methodology, which relies on the exposure metric. The approach is simple and effective: we have used it to discover properties about neural network memorization, test memorization on research datasets, and test memorization of Google's Smart Compose [8], a production model trained on billions of sequences.

The purpose of our testing methodology is to allow practitioners to make informed decisions based upon how much memorization is known to occur under various settings. For example, with this information, a practitioner might decide it will be necessary to apply sound defenses (Section 9).

Our testing strategy essentially repeats the above experiment: we train with artificially-inserted canaries added to the training data, and then use the exposure metric to assess to what extent the model has memorized them. Recall that the reason we study these fixed-format out-of-distribution canaries is that we are focused on unintended memorization, and any memorization of out-of-distribution values is by definition unintended and orthogonal to the learning task.

If, instead, we inserted in-distribution phrases which were helpful for the learning task, then it would be perhaps even desirable for these phrases to be memorized by the machine-learning model. By inserting out-of-distribution phrases which we can guarantee are unrelated to the learning task, we can measure a model's propensity to unintentionally memorize training data in a way that is not useful for the final task.

Setup: Before testing the model for memorization, we must first define a format of the canaries that we will insert. In practice, we have found that the exact choice of format does not significantly impact results.

However, the one choice that does have a significant impact on the results is randomness: it is important to choose a randomness space that matches the objective of the test to be performed. To approximate worst-case bounds, highly out-of-distribution canaries should be inserted; for more average-case bounds, in-distribution canaries can be used.

Augment the Dataset: Next, we instantiate each format sequence with a concrete (randomly chosen) canary by replacing the holes with random values, e.g., words or numbers. We then take each canary and insert it into the training data. In order to report detailed metrics, we can insert multiple different canaries a varying number of times. For example, we may insert some canaries only once, some canaries tens of times, and other canaries hundreds or thousands of times. This allows us to establish the propensity of the model to memorize potentially sensitive training data that may be seen a varying number of times during training.
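As a sketch of this step, the snippet below instantiates digit-valued canaries from a format string and appends each one to the training corpus a chosen number of times. The format string, hole marker, and helper names are illustrative only, not the paper's tooling.

```python
import random

HOLE = "#"  # illustrative hole marker for the format string

def instantiate_canary(fmt: str, rng: random.Random) -> str:
    """Fill every hole in the format with an independently chosen random digit."""
    return "".join(rng.choice("0123456789") if ch == HOLE else ch for ch in fmt)

def augment_training_data(lines, fmt, insertion_counts, seed=0):
    """Insert one fresh canary per entry of insertion_counts, repeated that many times."""
    rng = random.Random(seed)
    canaries = [instantiate_canary(fmt, rng) for _ in insertion_counts]
    augmented = list(lines)
    for canary, count in zip(canaries, insertion_counts):
        augmented.extend([canary] * count)
    rng.shuffle(augmented)  # mix the canaries in with the rest of the corpus
    return augmented, canaries

corpus = ["example training sentence one", "example training sentence two"]
data, canaries = augment_training_data(
    corpus, "the random number is #########", insertion_counts=[1, 10, 100])
print(canaries)
```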
Train the Model: Using the same setup as will be used for training the final model, train a test model on the augmented training data. This training process should be identical: applying the same model using the same optimizer for the same number of iterations with the same hyper-parameters. As we will show, each of these choices can impact the amount of memorization, and so it is important to test on the same setup that will be used in practice.

Report Exposure: Finally, given the trained model, we apply our exposure metric to test for memorization. For each of the canaries, we compute and report its exposure. Because we inserted the canaries, we will know their format, which is needed to compute their exposure. After training multiple models and inserting the same canaries a different number of times in each model, it is useful to plot a curve showing the exposure versus the number of times that a canary has been inserted. Examples of such reports are plotted in Figure 1, shown earlier, and in Figure 4.

6 Experimental Evaluation

This section applies our testing methodology to several model architectures and datasets in order to (a) evaluate the efficacy of exposure as a metric, and (b) demonstrate that unintended memorization is common across these differences.

6.1 Smart Compose: Generative Email Model

As our largest study, we apply our techniques to Smart Compose [8], a generative word-level machine-learning model that is trained on a text corpus comprising the personal emails of millions of users. This model has been commercially deployed for the purpose of predicting sentence completion in email composition. The model is in current active use by millions of users, each of whom receives predictions drawn not (only) from their own emails, but from the emails of all the users in the training corpus. This model is trained on highly sensitive data, and its output must not reveal the contents of any individual user's email.

This language model is an LSTM recurrent neural network with millions of parameters, trained on billions of word sequences, with a vocabulary size of tens of thousands of words. Because the training data contains potentially sensitive information, we applied our exposure-based testing methodology to measure and ensure that only common phrases used by multiple users were learned by the model. By appropriately interpreting the exposure test results and limiting the amount of information drawn from any small set of users, we can empirically ensure that the model is never at risk of exposing any private word sequences from any individual user's emails.

As this is a word-level language model, our canaries are seven (or five) randomly selected words in two formats. In both formats, the first two and last two words are known context, and the middle three (or one) words vary as the randomness. Even with only a few words drawn from a vocabulary of tens of thousands, the randomness space R is large enough to support meaningful exposure measurements.
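A word-level canary of this shape can be sketched as follows; the context words and the stand-in vocabulary are invented for the example and are not the formats used in the production study.

```python
import math
import random

def word_level_canary(vocabulary, num_random_words=3, rng=None):
    """Two known context words, a random middle, and two known closing words."""
    rng = rng or random.Random(0)
    middle = [rng.choice(vocabulary) for _ in range(num_random_words)]
    return " ".join(["remember this"] + middle + ["thank you"])

vocab = [f"word{i}" for i in range(20_000)]  # stand-in for a vocabulary of tens of thousands
print(word_level_canary(vocab))

# With three random middle words, |R| = 20000**3, so the largest measurable
# exposure for this format is log2(|R|) ≈ 43 bits.
print(3 * math.log2(20_000))
```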
6.4 Neural Machine Translation

Because the domain is different, NMT also provides us with a case study for designing a new perplexity measure.

NMT receives as input a vector of words x_i in one language and outputs a vector of words y_i in a different language. It achieves this by learning an encoder e : x⃗ → R^k that maps the input sentence to a "thought vector" that represents the meaning of the sentence. This k-dimensional vector is then fed through a decoder d : R^k → y⃗ that decodes the thought vector into a sentence of the target language.³

³ See [52] for details that we omit for brevity.

Internally, the encoder is a recurrent neural network that maintains a state vector and processes the input sequence one word at a time. The final internal state is then returned as the thought vector v ∈ R^k. The decoder is then initialized with this thought vector, which the decoder uses to predict the translated sentence one word at a time, with every word it predicts being fed back in to generate the next.

We take our NMT model directly from the TensorFlow Model Repository [12]. We follow the steps from the documentation to train an English-Vietnamese model, trained on 100k sentence pairs. We add to this dataset an English canary of the format "My social security number is □□□-□□-□□□□" and a corresponding Vietnamese phrase of the same format, with the English text replaced with the Vietnamese translation, and insert this canary translation pair.

Because we have changed problem domains, we must define a new perplexity measure. We feed the initial source sentence x⃗ through the encoder to compute the thought vector. To compute the perplexity of the source sentence mapping to the target sentence y⃗, instead of feeding the output of one layer to the input of the next, as we do during standard decoding, we instead always feed y_i as input to the decoder's hidden state. The perplexity is then computed by taking the log-probability of each output being correct, as is done on word models. Why do we make this change to compute perplexity? If one of the early words is guessed incorrectly and we feed it back in to the next layer, the errors will compound and we will get an inaccurate perplexity measure. By always feeding in the correct output, we can accurately judge the perplexity when changing the last few tokens. Indeed, this perplexity definition is already implemented in the NMT code, where it is used to evaluate test accuracy. We re-purpose it for performing our memorization evaluation.
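The teacher-forced perplexity just described can be sketched generically as below; encode and decoder_step are hypothetical stand-ins for the NMT encoder and a single decoder step, not the actual TensorFlow NMT code.

```python
import math

def translation_log_perplexity(source_words, target_words, encode, decoder_step):
    """Log-perplexity of target_words given source_words, computed with teacher forcing."""
    state = encode(source_words)        # the "thought vector" initializing the decoder
    total_bits = 0.0
    prev_word = "<s>"                   # start-of-sentence token
    for word in target_words:
        # One decoder step: distribution over the next target word plus the new state.
        probs, state = decoder_step(state, prev_word)
        total_bits += -math.log2(probs[word])  # score the *correct* next word
        prev_word = word                       # teacher forcing: feed the ground truth back in
    return total_bits
```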
Under this new perplexity measure, we can now compute the exposure of the canary. We summarize these results in Figure 6. By inserting the canary only once, it already becomes 1000× more likely to occur than random chance, and after inserting it four times, it is completely memorized.

Figure 6: Exposure of a canary inserted in a Neural Machine Translation model, as a function of the number of insertions. When the canary is inserted four times or more, it is fully memorized.

7 Characterizing Unintended Memorization

While the prior sections clearly demonstrate that unintended memorization is a problem, we now investigate why and how models unintentionally memorize training data by applying the testing methodology described above.

Experimental Setup: Unless otherwise specified, the experiments in this section are performed using the same LSTM character-level model discussed in Section 3, trained on the PTB dataset with a single canary inserted with the format "the random number is □□□□□□□□□", where the maximum exposure is log2(10⁹) ≈ 30.

7.1 Memorization Throughout Training

To begin, we apply our testing methodology to study a simple question: how does memorization progress during training? We insert the canary near the beginning of the Penn Treebank dataset, and disable shuffling, so that it occurs at the same point within each epoch. After every mini-batch of training, we estimate the exposure of the canary. We then plot the exposure of this canary as the training process proceeds.

Figure 7 shows how unintended memorization begins to occur over the first three epochs of training on 10% of the training data. Each time the model trains on a mini-batch that contains the canary, the exposure spikes. For the remaining mini-batches (that do not contain the canary) the exposure randomly fluctuates and sometimes decreases due to the randomness in stochastic gradient descent.

Figure 7: Exposure as a function of training time (epochs 0 to 3). The exposure spikes after the first mini-batch of each epoch (which contains the artificially inserted canary), and then falls overall during the mini-batches that do not contain it.

It is also interesting to observe that memorization begins to occur after only one epoch of training: at this point, the exposure of the canary is already 3, indicating the canary is 2³ = 8× more likely to occur than another random sequence chosen with the same format. After three epochs, the exposure is 8: access to the model reduces the number of guesses that would be needed to guess the canary by over 100×.
7.2 Memorization versus Overtraining

Next, we turn to studying how unintended memorization relates to overtraining. Recall we use the word overtraining to refer to a form of overfitting as a result of training too long.

Figure 8 plots how memorization occurs during training on a sample of 5% of the PTB dataset, so that it quickly overtrains. The first few epochs see the testing loss drop rapidly, until the minimum testing loss is achieved at epoch 10. After this point, the testing loss begins to increase—the model has overtrained.

Figure 8: Comparing training and testing loss to exposure across epochs on 5% of the PTB dataset (estimated exposure of the canary plotted together with the training and testing cross-entropy loss over 30 epochs of training). Testing loss reaches a minimum at 10 epochs, after which the model begins to overfit (as seen by training loss continuing to decrease). Exposure also peaks at this point, and decreases afterwards.

Comparing this to the exposure of the canary, we find an inverse relationship: exposure initially increases rapidly, until epoch 10 when the maximum amount of memorization is achieved. Surprisingly, the exposure does not continue increasing further, even though training continues. In fact, the estimated exposure at epoch 10 is actually higher than the estimated exposure later in training, even though the rank of this canary is 1 for all epochs after 10.

Taken together, these results are intriguing. They indicate that unintended memorization seems to be a necessary component of training: exposure increases when the model is learning, and does not when the model is not. This result confirms one of the findings of Tishby and Schwartz-Ziv [43] and Zhang et al. [57], who argue that neural networks first learn to minimize the loss on the training data by memorizing it.

7.3 Additional Memorization Experiments

Appendix A details some further memorization experiments.

8 Validating Exposure with Extraction

How accurate is the exposure metric in measuring memorization? We study this question by developing an extraction algorithm that we show can efficiently extract training data from a model when our exposure metric indicates this should be possible (i.e., when the exposure is greater than log2 |R|).

8.1 Efficient Extraction Algorithm

Proof-of-concept brute-force search: We begin with a simple brute-force extraction algorithm that enumerates all possible sequences, computes their perplexity, and returns them in order starting from the ones with lowest perplexity. Formally, we compute arg min_{r∈R} Px_θ(s[r]). While this approach might be effective at validating that our exposure metric accurately captures what it means for a sequence to be memorized, it is unable to do so when the space R is large. For example, brute-force extraction over the space of credit card numbers (10¹⁶) would take 4,100 commodity GPU-years.

Shortest-path search: In order to more efficiently perform extraction, we introduce an improved search algorithm, a modification of Dijkstra's algorithm, that in practice reduces the complexity by several orders of magnitude.

To begin, observe that it is possible to organize all possible partial strings generated from the format s as a weighted tree, where the empty string is at the root. A partial string b is a child of a if b expands a by one token t (which we denote by b = a@t). We set the edge weight from a to b to −log2 Pr(t | f_θ(a)) (i.e., the negative log-likelihood assigned by the model to the token t following the sequence a).

Leaf nodes on the tree are fully-completed sequences. Observe that the total edge weight from the root x_1 to a leaf node x_n is given by

    Σ_i −log2 Pr(x_i | f_θ(x_1...x_{i−1})) = Px_θ(x_1...x_n)    (by Definition 1)
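A minimal sketch of the shortest-path search is shown below: partial completions are expanded in order of accumulated cost −log2 Pr, so the first full-length string popped from the priority queue is the lowest-perplexity sequence. The next_token_probs callback and the toy model are hypothetical; this is not the paper's implementation.

```python
import heapq
import math

def extract_lowest_perplexity(prefix, alphabet, length, next_token_probs):
    """Dijkstra-style search over the tree of partial completions of `prefix`."""
    frontier = [(0.0, "")]          # (accumulated cost in bits, partial completion)
    expansions = 0
    while frontier:
        cost, partial = heapq.heappop(frontier)
        if len(partial) == length:
            # All edge weights are non-negative, so the first completed leaf is optimal.
            return partial, cost, expansions
        probs = next_token_probs(prefix + partial)
        expansions += 1
        for token in alphabet:
            p = probs.get(token, 0.0)
            if p > 0.0:
                heapq.heappush(frontier, (cost - math.log2(p), partial + token))
    return None

# Toy model that slightly prefers the digit "7" everywhere, for illustration only.
def toy_probs(prefix):
    probs = {d: 0.09 for d in "0123456789"}
    probs["7"] = 0.19
    return probs

print(extract_lowest_perplexity("the random number is ", "0123456789", 4, toy_probs))
```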
[Figure: a small example of the weighted search tree over partial strings (nodes a, b, aa, ab, ba, bb), with per-edge probabilities and the resulting perplexity of each completed sequence.]

Figure 11: Extraction is possible when the exposure indicates it should be possible: when |R| = 2³⁰, at an exposure of 30, extraction quickly shifts from impossible to possible. (Plot of the number of iterations needed to extract the canary, 10³ to 10⁷ on a log scale, against exposure from 0 to 40.)

Table 2: Summary of results on the Enron email dataset. Three secrets are extractable in < 1 hour; all are heavily memorized.
References

[15] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876, 2006.
[16] Cynthia Dwork. Differential privacy: A survey of results. In Intl. Conf. on Theory and Applications of Models of Computation, 2008.
[17] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468–2479. SIAM, 2019.
[18] Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta. Privacy amplification by iteration. In IEEE FOCS, 2018.
[19] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In ACM CCS, 2015.
[20] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized Warfarin dosing. In USENIX Security Symposium, pages 17–32, 2014.
[21] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, volume 1. MIT Press, Cambridge, 2016.
[22] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[23] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Evaluating privacy leakage of generative models using generative adversarial networks. PETS, 2018.
[24] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[25] Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better: Closing the generalization gap in large batch training of neural networks. arXiv preprint arXiv:1705.08741, 2017.
[27] Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In USENIX Security Symposium, 2019.
[28] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015.
[29] Anders Krogh and John A. Hertz. A simple weight decay can improve generalization. In NIPS, pages 950–957, 1992.
[30] Yunhui Long, Vincent Bindschaedler, and Carl A. Gunter. Towards measuring membership privacy. arXiv preprint arXiv:1712.09136, 2017.
[31] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
[32] Frank J. Massey Jr. The Kolmogorov–Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253), 1951.
[33] Joseph Menn. Amazon posts a tell-all of buying lists. Los Angeles Times, 1999. https://www.latimes.com/archives/la-xpm-1999-aug-26-fi-3760-story.html.
[34] Stephen Merity, Nitish Shirish Keskar, and Richard Socher. Regularizing and optimizing LSTM language models. arXiv preprint arXiv:1708.02182, 2017.
[35] Stephen Merity, Nitish Shirish Keskar, and Richard Socher. An analysis of neural language modeling at multiple scales. arXiv preprint arXiv:1803.08240, 2018.
[36] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernockỳ, and Sanjeev Khudanpur. Recurrent neural network based language model. In Interspeech, volume 2, page 3, 2010.
[37] Randall Munroe. Predictive models. https://xkcd.com/2169/, 2019.
[38] Milad Nasr, Reza Shokri, and Amir Houmansadr. Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 634–646. ACM, 2018.
[39] Seong Joon Oh, Max Augustin, Mario Fritz, and Bernt Schiele. Towards reverse-engineering black-box neural networks. In ICLR, 2018.
[40] A. O'Hagan and Tom Leonard. Bayes estimation subject to uncertainty about parameter constraints. Biometrika, 63(1), 1976.
[41] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Moriai. Privacy-preserving deep learning: Revisited and enhanced. In International Conference on Applications and Techniques in Information Security, pages 100–110. Springer, 2017.
[42] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017.
[43] Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.
[44] Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. Machine learning models that remember too much. In ACM CCS, 2017.
[45] Congzheng Song and Vitaly Shmatikov. The natural auditor: How to tell if someone used your words to train their model. arXiv preprint arXiv:1811.00513, 2018.
[46] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
[47] Igor V. Tetko, David J. Livingstone, and Alexander I. Luik. Neural network studies. 1. Comparison of overfitting and overtraining. Journal of Chemical Information and Computer Sciences, 35(5):826–833, 1995.
[48] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
[49] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security Symposium, pages 601–618, 2016.
[50] Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. Towards demystifying membership inference attacks. arXiv preprint arXiv:1807.09173, 2018.
[51] Binghui Wang and Neil Zhenqiang Gong. Stealing hyperparameters in machine learning. arXiv preprint arXiv:1802.05351, 2018.
[52] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[53] Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Haitao Zheng, and Ben Y. Zhao. Automated crowdturfing attacks and defenses in online review systems. ACM CCS, 2017.
[54] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018.
[55] Yang You, Igor Gitman, and Boris Ginsburg. Scaling SGD batch size to 32k for ImageNet training. arXiv preprint arXiv:1708.03888, 2017.
[56] Matthew D. Zeiler. ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
[57] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. ICLR, 2017.
A Additional Memorization Experiments

A.1 Across Different Architectures

We evaluate different neural network architectures in Table 4, again on the PTB dataset, and find that all of them unintentionally memorize. We observe that the two recurrent neural networks, i.e., LSTM [24] and GRU [9], demonstrate both the highest accuracy (lowest loss) and the highest exposure. Convolutional neural networks' accuracy and exposure are both lower. Therefore, through this experiment, we show that memorization is not an issue specific to any one particular architecture, but appears to be common to neural networks.

A.2 Across Training Strategies

There are various settings for training strategies and techniques that are known to impact the accuracy of the final model. We briefly evaluate the impact that each of these has on the exposure of the inserted canary.

Batch Size. In stochastic gradient descent, we train on mini-batches of multiple examples simultaneously, and average their gradients to update the model parameters. This is usually done for computational efficiency—due to their parallel nature, modern GPUs can evaluate a neural network on many thousands of inputs simultaneously.

To evaluate the effect of the batch size on memorization, we train our language model with different capacity (i.e., number of LSTM units) and batch size, ranging from 16 to 1024.
Architecture   Layers   Units   Test Loss   Exposure
GRU            1        370     1.18        36
GRU            2        235     1.18        37
LSTM           1        320     1.17        38
LSTM           2        200     1.16        35
CNN            1        436     1.29        24
CNN            2        188     1.28        19
CNN            4        122     1.25        22
WaveNet        2        188     1.24        18
WaveNet        4        122     1.25        20

Table 4: Exposure of a canary for various model architectures. All models have 620K (+/- 5K) parameters and so have the same theoretical capacity. Convolutional neural networks (CNN/WaveNet) perform less well at the language modeling task, and memorize the canary to a lesser extent.

                     Number of LSTM Units
Batch Size     50       100      150      200      250
16             1.7      4.3      6.9      9.0      6.4
32             4.0      6.2      14.4     14.1     14.6
64             4.8      11.7     19.2     18.9     21.3
128            9.9      14.0     25.9     32.5**   35.4**
256            12.3     21.0     26.4     28.8*    31.2*
512            14.2*    21.8*    30.8**   26.0     26.0
1024           15.7**   23.2**   26.7*    27.0     24.4

Table 5: Exposure of models trained with varying model sizes and batch sizes. Models of the same size were trained for the same number of epochs and reached similar test loss. Larger batch sizes, and larger models, both increase the amount of memorization. The largest memorization in each column is marked with **, and the second largest with *.