To cite this article: Larry Wasserman & Shuheng Zhou (2010) A Statistical Framework for
Differential Privacy, Journal of the American Statistical Association, 105:489, 375-389, DOI:
10.1198/jasa.2009.tm08651
One goal of statistical privacy research is to construct a data-release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·|X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(·|X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
KEY WORDS: Disclosure limitation; Minimax estimation; Privacy.
376 Journal of the American Statistical Association, March 2010
Table 1.

    Distance              Data-release mechanism                                                    Minimax rate
                          Smoothed histogram       Perturbed histogram      Exponential mechanism
    L2                    n^{−2/(2r+3)}            n^{−2/(2+r)}             NA                      n^{−2/(2+r)}
    Kolmogorov–Smirnov    √(log n)·n^{−2/(6+r)}    √(log n)·n^{−2/(2+r)}    n^{−1/3}                n^{−1/2}
that is, Δn,k is the maximum change to ξ caused by altering a single entry in x. Finally, let (Z1, ..., Zk) be a random vector drawn from the density

    h(z|x) = exp(−αξ(x, z)/(2Δn,k)) / ∫_{X^k} exp(−αξ(x, s)/(2Δn,k)) ds,   (2)

where α ≥ 0, z = (z1, ..., zk), and x = (x1, ..., xn). In this case, Qn has density h(z|x). We will discuss the exponential mechanism in more detail later.

There are many definitions of privacy, but in this paper we focus on the following definition due to Dwork et al. (2006) and Dwork (2006).

Definition 2.3. Let α ≥ 0. We say that Qn satisfies α-differential privacy if

    sup_{x,y∈X^n: δ(x,y)=1} sup_{B∈B} Qn(B|X = x)/Qn(B|X = y) ≤ e^α,   (3)

where B are the measurable sets on X^k. The ratio is interpreted to be 1 whenever the numerator and denominator are both 0.

The definition of differential privacy is based on ratios of probabilities. It is crucial to measure closeness by ratios of probabilities since that protects rare cases which have small probability under Qn. In particular, if changing one entry in the database X cannot change the probability distribution Qn(·|X = x) very much, then we can claim that a single individual cannot guess whether he is in the original database or not. The closer e^α is to 1, the stronger the privacy guarantee. Thus, one typically chooses α close to 0. See Dwork et al. (2006) for more discussion of these points. Indeed, suppose that two subjects each believe that one of them is in the original database. Given Z and full knowledge of P and Qn, can they test who is in X? The answer is given in the following result. (In this result, we drop the assumption that the user does not know Qn.)

Theorem 2.4. Suppose that Z is obtained from a data-release mechanism that satisfies α-differential privacy. Any level γ test of H0: Xi = s versus H1: Xi = t which is a function of Z, P, and Qn has power bounded above by γe^α.

Thus, if Qn satisfies differential privacy then it is virtually impossible to test the hypothesis that either of the two subjects is in the database, since the power of such a test is nearly equal to its level. A similar calculation shows that if one does a Bayes test between H0 and H1 then the Bayes factor is always between e^{−2α} and e^{2α}. For more detail on the motivation for the definition as well as its consequences, see Dwork et al. (2006), Dwork (2006), Ganta, Kasiviswanathan, and Smith (2008), and Rastogi et al. (2009).

The following result, which is proved in McSherry and Talwar (2007, theorem 6), shows that the exponential mechanism always preserves differential privacy.

Theorem 2.5 (McSherry and Talwar 2007). The exponential mechanism satisfies α-differential privacy.

To conclude this section we record a few useful facts. Let T(X, R) be a function of X and some auxiliary random variable R which is independent of X. After including this auxiliary random variable we define differential privacy as before. Specifically, T(X, R) satisfies differential privacy if for all B, and all x, x′ with δ(x, x′) = 1, we have that P(T(X, R) ∈ B|X = x) ≤ e^α P(T(X, R) ∈ B|X = x′). The third part of the following lemma is proposition 1 from Dwork et al. (2006).

Lemma 2.6. We have the following:

1. If T(X, R) satisfies differential privacy then U = h(T(X, R)) also satisfies differential privacy for any measurable function h.

2. Suppose that g is a density function constructed from a random vector T(X, R) that satisfies differential privacy. Let Z = (Z1, ..., Zk) be k iid draws from g. This defines a mechanism Qn(B|X) = P(Z ∈ B|X). Then Qn satisfies differential privacy for any k.

3. (Proposition 1 from Dwork et al. 2006.) Let f(x) be a function of x = (x1, ..., xn) and define

       S(f) = sup_{x,x′: δ(x,x′)=1} ‖f(x) − f(x′)‖1,

   where ‖a‖1 = Σ_j |aj|. Let R have density g(r) ∝ e^{−α|r|/S(f)}. Then T(X, R) = f(X) + R satisfies differential privacy.

3. INFORMATIVE MECHANISMS

A challenge in privacy theory is to find Qn that satisfies differential privacy and yet yields datasets Z that preserve information. Informally, a mechanism is informative if it is possible to make precise inferences from the released data Z1, ..., Zk. Whether or not a mechanism is informative will depend on the goals of the inference. From a statistical perspective, we would like to infer P or functionals of P from Z. Blum, Ligett, and Roth (2008) show that the probability content of some classes of intervals can be estimated accurately while preserving privacy. Their results motivated the current paper. We will assume throughout that the user has access to the sanitized data Z but not the mechanism Qn. The question of how a data analyst can use knowledge of Qn to improve inferences is left to future work.

There are many ways to measure the information in Z. One way is through distribution functions. Let F denote the cumulative distribution function (cdf) on X corresponding to P. Thus F(x) = P(X ∈ (−∞, x1] × ··· × (−∞, xr]) where x = (x1, ..., xr). Let F̂ ≡ F̂X denote the empirical distribution function corresponding to X and similarly let F̂Z denote the empirical distribution function corresponding to Z. Let ρ denote any distance measure on distribution functions.

Definition 3.1. Qn is consistent with respect to ρ if ρ(F, F̂Z) → 0 in probability. Qn is εn-informative if ρ(F, F̂Z) = OP(εn).

An alternative to requiring ρ(F, F̂Z) to be small is to require ρ(F̂, F̂Z) to be small. Or one could require Qn(ρ(F̂, F̂Z) > ε|X = x) to be small for all x, as in Blum, Ligett, and Roth (2008). These requirements are similar. Indeed, suppose ρ satisfies the triangle inequality and that F̂ is consistent in the ρ distance, that is, ρ(F̂, F) → 0 in probability. Assume further that ρ(F̂, F) = OP(εn). Then ρ(F, F̂Z) = OP(εn) implies that

    ρ(F̂, F̂Z) ≤ ρ(F̂, F) + ρ(F, F̂Z) = OP(εn);

similarly, ρ(F̂, F̂Z) = OP(εn) implies that ρ(F, F̂Z) = OP(εn).

Let EP,Qn denote the expectation under the joint distribution defined by Pn and Qn. Sometimes we write E when there
is no ambiguity. Similarly, we use P to denote the marginal probability under Pn and Qn:

    P(A) = ∫∫_A dQn(z1, ..., zk|x1, ..., xn) dP(x1) ··· dP(xn)

for A ∈ X^k, and

    E(h(Z)) = ∫∫ h(z1, ..., zk) dQn(z1, ..., zk|x1, ..., xn) dP(x1) ··· dP(xn).

There are many possible choices for ρ. We shall mainly focus on the Kolmogorov–Smirnov (KS) distance ρ(F, G) = sup_x |F(x) − G(x)| and the squared L2 distance ρ(F, G) = ∫(f(x) − g(x))² dx where f = dF/dμ and g = dG/dμ. However, our results can be carried over to other distances as well.

Before proceeding, let us note that we will need some assumptions on F; otherwise we cannot have a consistent scheme, as shown in the following theorem. The following result—essentially a reexpression of a result in Blum, Ligett, and Roth (2008) in our framework—makes this clear.

Theorem 3.2. Suppose that Qn satisfies differential privacy and that ρ(F, G) = sup_x |F(x) − G(x)|. Let F be a point mass [...]

[...]histogram. We use the histogram for its familiarity and simplicity and because it is used in applications of differential privacy. We will see that the histogram has to be carefully constructed to ensure differential privacy. We then compare the two schemes by studying the accuracy of the inferences from the released data. We will see that the accuracy depends both on how the histogram is constructed and on what measure of accuracy we use.

Let L > 0 be a constant and suppose that p = dP/dμ ∈ P where

    P = {p : |p(x) − p(y)| ≤ L‖x − y‖}   (4)

is the class of Lipschitz functions. We assume throughout this section that p ∈ P. The minimax rate of convergence for density estimators in squared L2 distance for P is n^{−2/(2+r)} (Scott 1992).

Let h = hn be a binwidth such that 0 < h < 1 and such that m = 1/h^r is an integer. Partition X into m bins {B1, ..., Bm} where each bin Bj is a cube with sides of length h. Let I(·) denote the indicator function. Let f̂m denote the corresponding histogram estimator on X, namely,

    f̂m(x) = Σ_{j=1}^m (p̂j/h^r) I(x ∈ Bj),

where p̂j = Cj/n and Cj = Σ_{i=1}^n I(Xi ∈ Bj) is the number of observations in Bj. Recall that f̂m is a consistent estimator of p if h = hn → 0 and nh_n^r → ∞. Also, the optimal choice of m = mn for L2 error under P is mn ≍ n^{r/(2+r)}, in which case ∫(p − f̂m)² = OP(n^{−2/(2+r)}) (Scott 1992). Here, an ≍ bn means that both an/bn and bn/an are bounded for large n.

4.1 Sampling From a Smoothed Histogram

The first method for generating released data Z from a histogram while achieving differential privacy proceeds as follows. Recall that the sample space is [0, 1]^r. Fix a constant 0 < δ < 1 and define the smoothed histogram

    f̂m,δ(x) = (1 − δ)f̂m(x) + δ.   (5)

Theorem 4.1. Let Z = (Z1, ..., Zk) where Z1, ..., Zk are k iid draws from f̂m,δ(x). If

    k log((1 − δ)m/(nδ) + 1) ≤ α   (6)

then α-differential privacy holds.

Note that for δ → 0 and m/(nδ) → 0, log((1 − δ)m/(nδ) + 1) = (m/(nδ))(1 + o(1)) ≈ m/(nδ). Thus (6) is approximately the same as requiring km/(nδ) ≤ α.

Now we give a result that shows how accurate the inferences are in the KS distance using the smoothed histogram sampling scheme.

Theorem 4.2. Suppose that Z1, ..., Zk are drawn as described in the previous theorem. Suppose (4) holds. Let ρ be the KS distance. Then choosing m ≍ n^{r/(6+r)}, k ≍ m^{4/r} = n^{4/(6+r)}, and δ = mk/(nα) minimizes Eρ(F, F̂Z) subject to (6). In this case, Eρ(F, F̂Z) = O(√(log n)·n^{−2/(6+r)}).

In this case we see that we have consistency since ρ(F, F̂Z) = oP(1), but the rate is slower than the minimax rate of convergence for density estimators in KS distance, which is n^{−1/2}.

Now let q̂j = #{Zi ∈ Bj}/k and

    ρ(F, F̂Z) = ∫(p(x) − f̂Z(x))² dx, where f̂Z(x) = h^{−r} Σ_{j=1}^m q̂j I(x ∈ Bj).   (8)

Theorem 4.3. Assume the conditions of the previous theorem. Let ρ be the squared L2 distance as defined in (8). Then choosing

    m ≍ n^{r/(2r+3)}, k ≍ n^{(r+2)/(2r+3)}, δ ≍ n^{−1/(2r+3)}

minimizes Eρ(F, F̂Z) subject to (6). In this case, Eρ(F, F̂Z) = O(n^{−2/(2r+3)}).

Again, we have consistency but the rate is slower than the minimax rate, which is n^{−2/(2+r)} (Scott 1992).
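The sampling scheme of Theorem 4.1 is straightforward to implement. The sketch below (Python, with r = 1; the function name and parameter choices are ours, not the paper's) draws k points from f̂m,δ = (1 − δ)f̂m + δ — a mixture that, with probability δ, draws from the uniform density and otherwise picks bin Bj with probability p̂j and draws uniformly within it — after first checking the privacy condition (6).

```python
import numpy as np

def sample_smoothed_histogram(x, m, delta, k, alpha, rng):
    """Draw k iid points from the smoothed histogram
    f_{m,delta}(u) = (1 - delta) * f_m(u) + delta on [0, 1]
    (Equation (5) with r = 1), enforcing the privacy condition (6):
    k * log((1 - delta) * m / (n * delta) + 1) <= alpha."""
    n = len(x)
    if k * np.log((1 - delta) * m / (n * delta) + 1) > alpha:
        raise ValueError("(m, delta, k) violates privacy condition (6)")
    edges = np.linspace(0.0, 1.0, m + 1)
    counts, _ = np.histogram(x, bins=edges)
    p_hat = counts / n                      # cell proportions p_hat_j
    # Mixture: with probability delta draw from the uniform density,
    # otherwise pick bin j with probability p_hat_j and draw within it.
    from_uniform = rng.random(k) < delta
    bins = rng.choice(m, size=k, p=p_hat)
    z = np.where(from_uniform, rng.random(k), edges[bins] + rng.random(k) / m)
    return z

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)                  # original database X
z = sample_smoothed_histogram(x, m=10, delta=0.5, k=50, alpha=1.0, rng=rng)
print(z.shape)                              # (50,)
```

The guard makes the tension in Theorem 4.1 concrete: lighter smoothing (small δ) or a larger released sample k inflates the left side of (6), so for a fixed privacy budget α the release must be heavily smoothed, small, or both.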
4.2 Sampling From a Perturbed Histogram

The second method, which we call sampling from a perturbed histogram, is due to Dwork et al. (2006). Recall that Cj is the number of observations in bin Bj. Let Dj = Cj + νj where ν1, ..., νm are independent, identically distributed draws from a Laplace density with mean 0 and variance 8/α². Thus the density of νj is g(ν) = (α/4)e^{−|ν|α/2}. Dwork et al. (2006) show that releasing D = (D1, ..., Dm) preserves differential privacy. However, our goal is to release a database Z = (Z1, ..., Zk) rather than just a set of counts. Now define

    D̃j = max{Dj, 0} and q̂j = D̃j / Σ_s D̃s.

Since D preserves differential privacy, it follows from Lemma 2.6 that (q̂1, ..., q̂m) also preserves differential privacy. Moreover, any sample Z = (Z1, ..., Zk) from f̃(x) = h^{−r} Σ_{j=1}^m q̂j I(x ∈ Bj) preserves differential privacy for any k.

Theorem 4.4. Let Z = (Z1, ..., Zk) be drawn from f̃(x) = h^{−r} Σ_{j=1}^m q̂j I(x ∈ Bj). Assume that there exists a constant 1 ≤ C < ∞ such that sup_x p(x) = C.

(1) Let ρ be the L2 distance and f̂Z be as defined in (8). Let m ≍ n^{r/(2+r)} and let k ≥ n. Then we have Eρ(F, F̂Z) = O(n^{−2/(2+r)}).

(2) Let ρ be the KS distance. Let m ≍ n^{r/(2+r)}. Then Eρ(F, F̂Z) = O(min(√(log n)·n^{−2/(2+r)}, √(log n/n))).

Hence, this method achieves the minimax rate of convergence in L2 while the first data-release method does not. This suggests that the perturbation method is preferable for the L2 distance. The perturbation method does not achieve the minimax rate of convergence in KS distance; in fact, the exponential mechanism based method achieves a better rate, as we show in Section 5 (Theorem 5.4). We examine this method numerically in Section 7.

Another approach to histograms is given by Machanavajjhala et al. (2008). They put a Dirichlet(a1, ..., am) prior on the cell probabilities p1, ..., pm where pj = P(Xi ∈ Bj). The corresponding posterior is Dirichlet(a1 + C1, ..., am + Cm). Next they draw q = (q1, ..., qm) from the posterior and finally they sample new cell counts D = (D1, ..., Dm) from a Multinomial(k, q). Thus, the distribution of D given X is

    P(D = d|X) = ∏_{j=1}^m Γ(dj + aj + Cj) / Γ(k + n + Σ_j aj).

They show that differential privacy requires aj + Cj ≥ k/(e^α − 1) for all j. If we take a1 = a2 = ··· = am then this is similar to the first histogram-based data-release method we discussed in this section. They also suggest a weakened version of differential privacy.

5. EXPONENTIAL MECHANISM

In this section we will consider the exponential mechanism in some detail. We will derive some general results about accuracy and apply the method to the mean and to density estimation. Specifically, we will show the following for exponential mechanisms:

1. Choosing the size k of the released database is delicate. Taking k too large compromises privacy. Taking k too small compromises accuracy.

2. The accuracy of the exponential scheme can be bounded by a simple formula. This formula has a term that measures how likely it is for a distribution based on sample size k to be in a small ball around the true distribution. In probability theory, this is known as a small ball probability.

3. The formula can be applied to several examples such as the KS distance, the mean, and nonparametric density estimation using orthogonal series. In each case we can use our results to choose k and to find the rate of convergence of an estimator based on the sanitized data.

In light of Theorem 3.2, we know that some assumptions are needed on P. We shall assume throughout this section that P has a bounded density p; note that this is a weaker condition than (4).

Recall the exponential mechanism. We draw the vector Z = (Z1, ..., Zk) from h(z|x), where

    h(z|x) = gx(z) / ∫_{[0,1]^k} gx(s) ds, gx(z) = exp(−αρ(F̂x, F̂z)/(2Δn,k)),   (9)

and

    Δ ≡ Δn,k = sup_{x,y∈X^n: δ(x,y)=1} sup_{z∈X^k} |ρ(F̂x, F̂z) − ρ(F̂y, F̂z)|.

Lemma 5.1. For the KS distance, Δn,k ≤ 1/n.

This framework is used in Blum, Ligett, and Roth (2008). For the rest of this section, assume that Z = (Z1, ..., Zk) are drawn from an exponential mechanism Qn.

Definition 5.2. Let F denote the cumulative distribution function on X corresponding to P. Let Ĝ denote the empirical cdf from a sample of size k from P, and let

    R(k, ε) = P^k(ρ(F, Ĝ) ≤ ε).

R(k, ε) is called the small ball probability associated with ρ.

The following theorem bounds the accuracy of the estimator from the sanitized data by a simple formula involving the small ball probability.

Theorem 5.3. Assume that P has a bounded density p, and that there exists εn → 0 such that

    P(ρ(F, F̂X) > εn/16) = O(1/n^c)   (10)

for some c > 1. Further suppose that ρ satisfies the triangle inequality. Let Z = (Z1, ..., Zk) be drawn from gx(z) given in (9). Then,

    P(ρ(F, F̂Z) > εn) ≤ (sup_x p(x))^k exp(−3αεn/(16Δ)) / R(k, εn/2) + O(1/n^c).   (11)

Thus, if we can choose k = kn in such a way that the right-hand side of (11) goes to 0, then the mechanism is consistent. We now show some examples that satisfy these conditions and we show how to choose kn.
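The perturbed-histogram mechanism of Section 4.2 is equally short in code. The sketch below (Python, with r = 1; the function name is ours) adds Laplace noise with density (α/4)e^{−|ν|α/2} — scale 2/α, hence variance 8/α² — to the bin counts, clips at zero, normalizes, and samples k released points; by Lemma 2.6 the release is differentially private for any k.

```python
import numpy as np

def perturbed_histogram_release(x, m, k, alpha, rng):
    """Section 4.2 with r = 1: release Z = (Z_1, ..., Z_k) drawn from
    f~(u) = h^{-1} * sum_j q_hat_j * I(u in B_j), where
    D_j = C_j + nu_j with nu_j ~ Laplace(scale = 2/alpha),
    D~_j = max(D_j, 0), and q_hat_j = D~_j / sum_s D~_s."""
    edges = np.linspace(0.0, 1.0, m + 1)
    counts, _ = np.histogram(x, bins=edges)              # C_j
    d = counts + rng.laplace(scale=2.0 / alpha, size=m)  # D_j = C_j + nu_j
    d_tilde = np.maximum(d, 0.0)                         # clip negative counts
    q_hat = d_tilde / d_tilde.sum()  # assumes at least one noisy count > 0
    bins = rng.choice(m, size=k, p=q_hat)
    return edges[bins] + rng.random(k) / m               # uniform within bin

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
z = perturbed_histogram_release(x, m=8, k=1200, alpha=1.0, rng=rng)  # k >= n
print(z.shape)                                           # (1200,)
```

Per Theorem 4.4, taking m ≍ n^{r/(2+r)} and k ≥ n makes the density estimate built from Z attain the minimax L2 rate.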
The minimax rate of convergence in L2 norm for P(γ, C) is n^{−2γ/(2γ+1)} (Efromovich 1999). Thus

    inf_{p̂} sup_{P∈P(γ,C)} E ∫(p̂(x) − p(x))² dx ≥ c1 n^{−2γ/(2γ+1)}

for some c1 > 0. This rate is achieved by the estimator

    p̂(x) = 1 + Σ_{j=1}^{mn} β̂j ψj(x),   (14)

where mn = n^{1/(2γ+1)} and β̂j = n^{−1} Σ_{i=1}^n ψj(Xi). See Efromovich (1999).

Theorem 6.3. Let Z = (Z1, ..., Zk) be drawn from q̂. Assume that γ > 1. If we choose k ≥ n, then

    ρ²(p, p̂Z) = OP(n^{−2γ/(2γ+1)}),

where p̂Z is the orthogonal series density estimator based on Z.

Hence, again, the perturbation technique achieves the minimax rate of convergence and so appears to be superior to the exponential mechanism. We do not know if this is because the exponential mechanism is inherently less accurate, or if our bounds for the exponential mechanism are not tight enough.
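For concreteness, estimator (14) with the cosine basis ψ0 = 1, ψj(t) = √2 cos(πjt) used later in Section 9.11 can be sketched as follows (Python; the function name and the rounding of mn to an integer are our choices):

```python
import numpy as np

def orthogonal_series_density(x, grid, gamma):
    """Estimator (14): p_hat(t) = 1 + sum_{j=1}^{m_n} beta_hat_j psi_j(t),
    where psi_j(t) = sqrt(2) * cos(pi * j * t), beta_hat_j is the sample
    mean of psi_j(X_i), and the truncation level is m_n = n^{1/(2 gamma + 1)}."""
    n = len(x)
    m_n = int(np.ceil(n ** (1.0 / (2.0 * gamma + 1.0))))
    p_hat = np.ones_like(grid, dtype=float)     # the psi_0 = 1 term
    for j in range(1, m_n + 1):
        beta_hat_j = np.sqrt(2.0) * np.mean(np.cos(np.pi * j * x))
        p_hat += beta_hat_j * np.sqrt(2.0) * np.cos(np.pi * j * grid)
    return p_hat

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)                 # true density p = 1 on [0, 1]
grid = np.linspace(0.0, 1.0, 11)
p_hat = orthogonal_series_density(x, grid, gamma=2.0)
print(p_hat.shape)                         # (11,)
```

On uniform data every β̂j is close to 0, so the estimate stays near the constant density 1, as expected.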
    P(Z ∈ B|X = x) = ∫ P(Z ∈ B|T = t) [dH(t|X = x)/dH(t|X = x′)] dH(t|X = x′)
      ≤ e^α ∫ P(Z ∈ B|T = t) dH(t|X = x′)
      = e^α P(Z ∈ B|X = x′).

Figure 1. Top two plots: n = 100. Bottom two plots: n = 1000. Each plot shows the mean integrated squared error of the histogram. The lower line is from the histogram based on the original data. The upper line is based on the perturbed histogram. A color version of this figure is available in the electronic version of this article.
9.3 Proof of Theorem 3.2

Our proof is adapted from an argument given in theorem 5.1 of Blum, Ligett, and Roth (2008). Let r = 1 so that X = [0, 1]. Let P = δ0, where δ0 denotes a point mass at 0. Then Pn(X = X(0)) = 1 where X(0) ≡ {0, ..., 0}. Assume that Qn is consistent. Since F(0) = 1, it follows that for any δ > 0, P(F̂Z(0) > 1 − δ) → 1. But since P(·) = EP Qn(·|X) and since Pn(X = X(0)) = 1, this implies that Qn(F̂Z(0) > 1 − δ|X = X(0)) → 1.

Let v > 0 be any point in [0, 1] such that Qn(Z = v|X = X(0)) = 0. Let X(1) = {v, 0, ..., 0}, X(2) = {v, v, 0, ..., 0}, ..., X(n) = {v, v, ..., v}. By assumption, Qn(Z = X(j)|X = X(0)) = 0 for all j ≥ 1. Differential privacy implies that Qn(Z = X(j)|X = X(1)) = 0 for all j ≥ 1. Applying differential privacy again implies that Qn(Z = X(j)|X = X(2)) = 0 for all j ≥ 1. Continuing this way, we conclude that Qn(Z = X(j)|X = X(n)) = 0 for all j ≥ 1.

Next let P = δv. Arguing as before, we know that Qn(F̂Z(v) < 1 − δ|X = X(n)) → 0. And since F(v−) = 0, we also have that Qn(F̂Z(v−) > δ|X = X(n)) → 0. Here, F(v−) = lim_{i→∞} F(vi) where v1 < v2 < ··· and vi → v. Hence, for j/n > 1 − δ, Qn(Z = X(j)|X = X(n)) > 0, which is a contradiction.

9.4 Proof of Theorem 4.1

Suppose that X differs from Y in at most one observation. Let f̂ denote the smoothed histogram f̂m,δ based on X and let ĝ ≡ ĝm,δ denote the smoothed histogram based on Y, such that X and Y differ in one entry. We also use p̂j(X) and p̂j(Y) for cell proportions. Note that |p̂j(X) − p̂j(Y)| < 1/n by definition. It is clear that the maximum density ratio for a single draw xi, for all i, occurs in one bin Bj. Now consider x = (x1, ..., xk) such that for all i = 1, ..., k, we have xi ∈ Bj ⊂ [0, 1]^r and the following bounds:

1. Let p̂j(Y) = 0; then in order to maximize f̂(x)/ĝ(x), we let p̂j(X) = 1/n and obtain [...]

2. [...] density ratio at x, we may need to reverse the role of X and Y:

       max(ĝ(x)/f̂(x), f̂(x)/ĝ(x))
         ≤ max( ((1 − δ)mp̂j + δ)/((1 − δ)m(p̂j − 1/n) + δ), ((1 − δ)m(p̂j + 1/n) + δ)/((1 − δ)mp̂j + δ) )^k
         ≤ ( (1 − δ)m(1/n)/((1 − δ)m(p̂j − 1/n) + δ) + 1 )^k
         ≤ ( (1 − δ)m/(nδ) + 1 )^k,

where the maximum is achieved when p̂j(Y) = 1/n and p̂j(X) = 0, given a fixed set of parameters m, n, δ. Thus we have

    sup_x f̂(x)/ĝ(x) ≤ ((1 − δ)m/(nδ) + 1)^k,

and the theorem holds.

9.5 Proof of Theorem 4.2

Recall that F̂Z denotes the empirical distribution function corresponding to Z = (Z1, ..., Zk), where the Zi ∈ [0, 1]^r are iid draws from the density f̂m,δ(x) in (5) given X = (X1, ..., Xn). Let U denote the uniform cdf on [0, 1]^r. Given X = (X1, ..., Xn) drawn from a distribution whose cdf is F, let f̂m denote the histogram estimator on X and let F̂m(x) = ∫_0^x f̂m(s) ds and F̂m,δ(x) = (1 − δ)F̂m(x) + δU(x). Define F̄m(x) = E(F̂m(x)) and f̄m(x) = E(f̂m(x)).

The Vapnik–Chervonenkis dimension of the class of sets of the form {(−∞, x1] × ··· × (−∞, xr]} is r, and so by the standard Vapnik–Chervonenkis bound we have, for ε > 0,

    P(sup_{t∈[0,1]^r} |F̂X(t) − F(t)| > ε) ≤ 8n^r exp(−nε²/32) ≤ exp(−nε²/64)   (18)

for large n. Hence, E sup_{t∈[0,1]^r} |F̂X(t) − F(t)| = O(√(r log n / n)). Given X, we have Z1, ..., Zk ∼ F̂m,δ and so E sup_{[0,1]^r} |F̂Z(t) − F̂m,δ(t)| = O(√(r log k / k)). Thus,

    E sup_{x∈[0,1]^r} |F̂Z(x) − F(x)| ≤ [...]

By the triangle inequality, we have for all x ∈ [0, 1]^r,

    |F̂m(x) − F(x)| ≤ |F̂m(x) − F̄m(x)| + |F̄m(x) − F(x)|,

and hence

    E sup_{x∈[0,1]^r} |F̂m(x) − F(x)| ≤ E sup_{x∈[0,1]^r} |F̂m(x) − F̄m(x)| + E sup_{x∈[0,1]^r} |F̄m(x) − F(x)|
      = O(√(r log n / n)) + E sup_{x∈[0,1]^r} |F̄m(x) − F(x)|,   (19)

where the last step follows from the VC bound as in (18).

Next we bound sup_{x∈[0,1]^r} |F̄m(x) − F(x)|. Now F(x) = P(A) where A = {(s1, ..., sr): si ≤ xi, i = 1, ..., r}. If x =
(j1 h, ..., jr h) for some integers j1, ..., jr, then F(x) − F̄m(x) = 0. For x not of this form, let x̃ = (j1 h, ..., jr h) where ji = ⌊xi/h⌋. Let R = {(s1, ..., sr): si ≤ x̃i, i = 1, ..., r}. So

    F(x) − F̄m(x) = P(A) − Pm(A) = P(R) − Pm(R) + P(A \ R) − Pm(A \ R),   (20)

where Pm(B) = ∫_B dF̄m(u) and the set A \ R intersects at most rh/h^r of the cubes in {B1, ..., Bm}, given that Vol(A \ R) ≤ 1 − (1 − h)^r ≤ rh. Now by the Lipschitz condition (4), we have sup_{x∈[0,1]^r} |p(x) − f̄m(x)| ≤ Lh√r and

    |P(A \ R) − Pm(A \ R)| ≤ (number of cubes intersecting A \ R) × (maximum density discrepancy) × (volume of a cube)
      ≤ (rh/h^r) · (Lh√r) · h^r ≤ Lr^{3/2} m^{−2/r}.   (21)

Thus we have by (19), (20), and (21)

    E sup_x |F̂m(x) − F(x)| = O(√(r log n / n)) + Lr^{3/2} m^{−2/r}.   (22)

Hence,

    E sup_x |F̂Z(x) − F(x)| = O(√(r log k / k)) + O(√(r log n / n)) + Lr^{3/2} m^{−2/r} + δ.

Setting m ≍ n^{r/(6+r)}, k ≍ m^{4/r} = n^{4/(6+r)}, and δ = mk/(nα), we get for all n large enough, E sup_x |F̂Z(x) − F(x)| = O(√(log n)·n^{−2/(6+r)}).

9.6 Proof of Theorem 4.3

Let f̂Z be the histogram based on Z as in (8). Then

    (f̂Z(u) − p(u))² ≲ (1 − δ)²(p(u) − f̂m(u))² + δ²(p(u) − 1)² + (f̂m,δ(u) − f̂Z(u))²,

where ≲ means less than, up to constants. Hence,

    E ∫(f̂Z(u) − p(u))² du ≲ Rm + δ² + E ∫(f̂m,δ(u) − f̂Z(u))² du,

where Rm is the usual L2 risk of a histogram under the Lipschitz condition (4), namely, m^{−2/r} + m/n. Conditional on X, f̂Z is an unbiased estimate of f̂m,δ with integrated variance m/k. So,

    E ∫(f̂Z(u) − p(u))² du ≲ m^{−2/r} + m/n + δ² + m/k.

Minimizing this subject to (6) yields

    m ≍ n^{r/(2r+3)}, k ≍ n^{(r+2)/(2r+3)}, δ ≍ n^{−1/(2r+3)},

which yields E ∫(f̂Z(u) − p(u))² du = O(n^{−2/(2r+3)}).

9.7 Proof of Theorem 4.4

(1) Note that p − f̂Z = p − f̃ + f̃ − f̂Z = p − f̃ + OP(√(m/k)). When k ≥ n, the latter error is of lower order than the other terms and may be ignored. Now,

    p(x) − f̃(x) = p(x) − f̂m(x) + f̂m(x) − f̃(x),

so

    ∫(p(x) − f̃(x))² dx ≲ ∫(p(x) − f̂m(x))² dx + ∫(f̂m(x) − f̃(x))² dx.

The expected value of the first term is the usual risk, namely, O(m^{−2/r} + m/n).

For the second term, we proceed as follows. Let p̂j = Cj/n and

    q̂j = (Cj + νj)+ / Σ_{s=1}^m (Cs + νs)+.

We claim that

    max_j |q̂j − p̂j| = O(log m / n)

almost surely, for all large n. We have

    q̂j = (Cj + νj)+ / Σ_{s=1}^m (Cs + νs)+ = ((Cj + νj)+/n) · (1/Rn),

where Rn = Σ_{s=1}^m (Cs + νs)+/n. Now

    p̂j − |νj|/n ≤ p̂j + νj/n = (Cj + νj)/n ≤ (Cj + νj)+/n ≤ p̂j + |νj|/n.

Therefore,

    |(Cj + νj)+/n − p̂j| ≤ |νj|/n ≤ M/n,

where M = max{|ν1|, ..., |νm|}. Let A > 0. The density of νj has the form f(ν) = (β/2)e^{−β|ν|}. So,

    P(M > A log m) ≤ m P(|νj| > A log m) = βm ∫_{A log m}^∞ e^{−βν} dν = 1/m^{Aβ−1}.

By choosing A large enough we have that M < A log m a.s. for large n, by the Borel–Cantelli lemma. Therefore,

    |(Cj + νj)+/n − p̂j| ≲ log m / n.

Now we bound Rn. We have

    1 − Σ_s |νs|/n ≤ 1 + Σ_s νs/n ≤ Rn = Σ_{s=1}^m (Cs + νs)+/n ≤ 1 + Σ_s |νs|/n,

so that

    |Rn − 1| ≤ Σ_s |νs|/n ≤ Mm/n = O(m log m / n) a.s.
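The key claim in this proof — that clipping and renormalizing the Laplace-noised counts moves each cell proportion by only O(log m / n) — is easy to check numerically. The following sketch is ours, not the paper's; it takes r = 1, α = 1 (so the Laplace scale is 2/α = 2), and uniform cell probabilities:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10_000, 50
counts = rng.multinomial(n, np.ones(m) / m)          # C_j
p_hat = counts / n                                   # p_hat_j = C_j / n
d_tilde = np.maximum(counts + rng.laplace(scale=2.0, size=m), 0.0)
q_hat = d_tilde / d_tilde.sum()                      # q_hat_j
err = float(np.abs(q_hat - p_hat).max())
print(err / (np.log(m) / n))   # a modest constant, in line with the claim
```

Repeating this for growing n (with m fixed or growing slowly) keeps the printed ratio bounded, which is what the almost-sure O(log m / n) bound predicts.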
    E sup_x |F(x) − F̂Z(x)| ≤ E sup_x |F(x) − F̂m(x)| + E sup_x |F̂m(x) − F̃m(x)| + E sup_x |F̃m(x) − F̂Z(x)|
      ≤ E sup_x |F(x) − F̂m(x)| + E sup_x |F̂m(x) − F̃m(x)| + O(√(r log k / k)).

Since we may take k as large as we like, we can make the last term arbitrarily small. From (22),

    E sup_x |F(x) − F̂m(x)| = O(√(r log n / n)) + Lr^{3/2} m^{−2/r}.

Let f̂(x) = h^{−r} Σ_{j=1}^m p̂j I(x ∈ Bj) and let f̃(x) = h^{−r} Σ_{j=1}^m q̂j I(x ∈ Bj). Let x = (u1 h, ..., ur h) where ui = xi/h, ∀i = 1, ..., r. Recall that B1, ..., Bm are the m bins of X with sides of length h. Let Bx denote the cube with the left-most corner being 0 and the right-most corner being x. Then for all [...]

9.8 Proof of Theorem 5.3

Let An = {ρ(F̂X, F) ≤ εn/16}, so that P(An^c) = O(1/n^c) by (10). Then

    P(ρ(F, F̂Z) > εn) = P(ρ(F, F̂Z) > εn, An) + P(ρ(F, F̂Z) > εn, An^c)
      ≤ P(ρ(F, F̂Z) > εn, An) + P(An^c)
      = P(ρ(F, F̂Z) > εn, An) + O(1/n^c).   (23)

By the triangle inequality, ρ(F̂u, F̂X) ≥ ρ(F̂u, F) − ρ(F̂X, F). Then,

    ∫_{B^c} gx(u) du = ∫_{B^c} exp(−αρ(F̂X, F̂u)/(2Δ)) du
      ≤ ∫_{B^c} exp(−α(ρ(F̂u, F) − ρ(F̂X, F))/(2Δ)) du
      = exp(αρ(F̂X, F)/(2Δ)) ∫_{B^c} exp(−αρ(F̂u, F)/(2Δ)) du.

By the triangle inequality, we also have ρ(F̂u, F̂X) ≤ ρ(F̂u, F) + ρ(F̂X, F) and

    ∫_B gx(u) du ≥ ∫_{B/2} gx(u) du = ∫_{B/2} exp(−αρ(F̂X, F̂u)/(2Δ)) du
      ≥ exp(−αρ(F̂X, F)/(2Δ)) ∫_{B/2} exp(−αρ(F, F̂u)/(2Δ)) du
      ≥ exp(−αρ(F̂X, F)/(2Δ)) exp(−αεn/(4Δ)) × P(ρ(F, Ĝ) ≤ εn/2) / (sup_x p(x))^k,

where Ĝ is the empirical cdf from a sample of size k drawn from P. Thus we have

    ∫_{B^c} h(u|x) du ≤ (sup_x p(x))^k exp(αρ(F̂X, F)/Δ) exp(−αεn/(4Δ)) / P(ρ(F, Ĝ) ≤ εn/2).

Thus, from (23),

    P(ρ(F, F̂Z) > εn) ≤ P(ρ(F̂X, F) ≥ εn/16) + (sup_x p(x))^k exp(−3αεn/(16Δ)) / P(ρ(F, Ĝ) ≤ εn/2)
      = (sup_x p(x))^k exp(−3αεn/(16Δ)) / P(ρ(F, Ĝ) ≤ εn/2) + O(1/n^c).

Thus the theorem holds.

9.9 Proof of Lemma 5.1

We start with KS. By the triangle inequality, we have for all z ∈ X^k and for all x, y ∈ X^n with δ(x, y) = 1,

    |ρ(F̂x, F̂z) − ρ(F̂y, F̂z)| ≤ ρ(F̂x, F̂y).

Notice that changing one entry in x will change F̂x(t) by at most 1/n at any t by definition, that is,

    sup_{t∈[0,1]^r} |F̂x(t) − F̂y(t)| ≤ 1/n.

Thus the conclusion holds for the KS distance.

9.10 Proof of Theorem 5.4

We need the following small ball result; see Li and Shao (2001).

Theorem 9.1. Let r ≥ 3, and let {Xt, t ∈ [0, 1]^r} be the Brownian sheet. Then there exists 0 < Cr < ∞ such that for all 0 < ε ≤ 1,

    log P(sup_{t∈[0,1]^r} |Xt| ≤ ε) ≥ −Cr ε^{−2} log^{2r−1}(1/ε),

where Cr depends only on r. The same bound holds for a Brownian bridge.

[...]

    P(ρ(F, F̂X) > εn/16) ≤ 8 exp(−c5 (B/(3α))^{2/3} n^{1/3} + r log n)
      = 8 exp(−c6 (B/(3α)) kn εn + c7 r log(kn εn))
      = 8 exp(−C2 (B/(3α)) kn εn)   (24)

for some constants c5, c6, c7, C2 > 0 for n large enough. Thus (10) holds. Now we compute the small ball probability. Note that √k(F̂k − F) converges to a Brownian bridge Bk on [0, 1]^r. More precisely, from Csörgő and Révész (1975) there exists a sequence of Brownian bridges Bk such that

    sup_t |√k(F̂k − F)(t) − Bk(t)| = O((log k)^{3/2}/k^γ) a.s.,   (25)

where γ = 1/(2(r + 1)). It is clear that the RHS of (25) is o(1) a.s. given a fixed r. Hence we have, for k = kn and εn as chosen in the theorem statement and for all ε ≥ εn,

    log P(sup_t |F̂k(t) − F(t)| ≤ ε/2) = log P(sup_t √k|F̂k(t) − F(t)| ≤ √k ε/2)
      ≥ log P(sup_t |Bk(t)| ≤ √k ε/2 − O(k^{−γ}(log k)^{3/2}))   (26)
      ≥ log P(sup_t |Bk(t)| ≤ √k ε/4)   (27)

for all large n, where (26) follows from (25) and (27) holds given that √k ε ≥ √kn εn ≥ c for some constant c > 1/2, due to our choice of kn and εn. Also, Δ ≤ 1/n for the KS distance. Hence, by Theorem 5.3 and (24), we have for B = log sup_x p(x) > 0,

    P(ρ(F, F̂Z) > εn) ≤ C0 exp(−n(3αεn/16 − Bkn/n − C1|log(√kn εn/4)|^{2r−1}/(n kn εn²)))
√
1
m
B kn n
+ 8 exp −C2 ≤ |ψj (X1 )| + |ψj (Y1 )| |ψj (x)|
3α n
j=1
B
≤ C0 exp(−C3 Bkn /2) + 8 exp −C2 kn 2c20 mn
3α ≤ .
n
→0 (28)
2c20 mn
Hence ≤ n .
for some constants C0 , C1 , C2 , and C3 , where (28) holds when
B 1/3 −1/3
we take w.l.o.g. kn = 16 1 3α 2/3 2/3
(B) n and n ≥ 2( 3α ) n , Proof of Theorem 6.2. For u = (u1 , . . . , uk ) ∈ X k , we let
B 1/3 −1/3
given that n ≥ 2( 3α ) n = 3nα and hence 16 ≥ 2Bk
32kn B 3αn n
n .
mk
Thus the result follows. p̂u (x) = 1 + β̂j ψj (x),
j=1
Remark 9.2. The constants taken in the proof are arbi-
trary; indeed, when we take kn = C4 ( 3α 2/3 n2/3 and =
B ) n where mk = k1/(2γ +1) and β̂j = k−1 ki=1 ψj (ui ).
B 1/3 −1/3
32C4 ( 3α ) n with some constant C4 ≥ 1/16, (28) will Let F̂u be the empirical distribution based on u. Our proof
hold with slightly different√constants C2 , C3 . For kn and n as follows that of Theorem 5.3, with
chosen above, it holds that kn n
1.
ρ(F, F̂u ) = p − p̂u 2 and ρ(FX , F̂u ) = p̂X − p̂u 2
9.11 Proofs of Lemma 6.1 and Theorem 6.2
as defined in (15) for X = (X1 , . . . , Xn ). Now
Throughout this section, we let p̂X denote the estimator as
defined in (14), which is based on a sample of size n drawn in- B = u = (u1 , . . . , uk ) : p − p̂u 2 < .
dependently from F. Similarly, we let p̂k denote the same esti-
Thus the corresponding triangle inequalities that we use to re-
mator based on an iid sample (Y1 , . . . , Yk ) of size k
drawn from
place that in Theorem 5.3 are:
F, with mk = k1/(2γ +1) replacing mn and β̂j = k−1 ki=1 ψj (Yi )
in (14). We let p̂Z denote the estimator as in (16), based on p̂u − p̂X 2 ≥ p̂u − p2 − p̂X − p2 and
an iid sample Z = (Z1 , . . . , Zk ) of size k drawn from gx (z) as
p̂u − p̂X 2 ≤ p̂u − p2 + p − p̂X 2 .
in (17).
Standard risk calculations show that (10) holds for some c >
Proof of Lemma 6.1. Without loss of generality, let X =
0 with ρ(F, F̂X ) being replaced with p̂X − p2 . That is, by
(x, X2 , . . . , Xn ) and Y = (y, X2 , . . . , Xn ) so that δ(X, Y) = 1 and
Markov’s inequality,
let Z ∈ X k . Recall that
1/2 Ep̂X − p22
ξ(X, Z) = (p̂X (x) − p̂Z (x))2 dx , P p̂X − p2 > ≤
2
1/2 and (10) follows from the polynomial decay of the mean
ξ(Y, Z) = (p̂Y (x) − p̂Z (x))2 dx . squared error Ep̂X − p2 . Thus, from (23), for p̂Z = p̂∗ as in
(16),
In particular, let us define u = p̂X − p̂Z and v = p̂Y − p̂Z and thus P p − p̂Z 2 >
1/2
(supx p(x))k exp (−3α/(16))
|ξ(X, Z) − ξ(Y, Z)| = (p̂X (x) − p̂Z (x))2 dx ≤ P p̂X − p2 ≥ +
16 P(p − p̂k 2 ≤ /2)
1/2
(supx p(x))k exp (−3α/(16)) 1
− (p̂Y (x) − p̂Z (x))2 dx = +O c .
P(p − p̂k 2 ≤ /2) n
= u2 − v2 ≤ u − v2 We need to compute the small ball probability. Recall that p̂k
denote the estimator based on a sample of size k. By Parseval’s
= p̂X − p̂Z − (p̂Y − p̂Z )2
relation,
2c20 mn
∞
= p̂X − p̂Y 2 ≤ ,
mk
n (p(x) − p̂k (x)) dx =
2
(β̂j − βj )2 + βj2
where the first inequality is due to the triangle inequality for the j=1 mk +1
mk
Let Q = Σ_{j=1}^{m_k} (β̂_j − β_j)² and let S = k^{−1/2} Σ_{i=1}^k Y_i. Then, for all large k and any δ > 0,

    P(Q ≤ δ²) = P(SᵀΣ_k S ≤ kδ²) ≥ P(SᵀS ≤ kδ²/λ_k) ≥ P(SᵀS ≤ kδ²/(2λ)).

From theorem 1.1 of Bentkus (2003) we have that

    sup_c |P(SᵀS ≤ c) − P(χ²_{m_k} ≤ c)| = O((m_k³/k)^{1/2}) = O(k^{−(γ−1)/(2γ+1)}).
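The chi-squared approximation quantified by the Bentkus bound can be illustrated by simulation. In the sketch below the Rademacher coordinates and the sizes m, k, and the number of replications are assumptions chosen for the demo, not quantities from the proof.

```python
import numpy as np

# Monte Carlo illustration of the chi-squared approximation behind the
# Berry-Esseen-type bound of Bentkus (2003): for S = k^{-1/2} sum_i Y_i
# with iid centered Y_i having identity covariance, S'S is approximately
# chi-squared with m degrees of freedom.
rng = np.random.default_rng(1)
m, k, reps = 3, 200, 5000

Y = rng.choice([-1.0, 1.0], size=(reps, k, m))   # mean 0, covariance I_m
S = Y.sum(axis=1) / np.sqrt(k)
stat = (S ** 2).sum(axis=1)                      # S'S for each replication

# chi^2_m reference, simulated from exact standard normals
chi2 = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)

c = float(m)                                     # compare both CDFs at c = m
gap = abs((stat <= c).mean() - (chi2 <= c).mean())
assert gap < 0.05
```

The empirical gap combines the Berry-Esseen error, which the bound above controls, with Monte Carlo noise of order reps^{−1/2}.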
Thus,

    P(||p − p̂_Z||_2 > ε_n) ≤ P(||p̂_X − p||_2 ≥ ε_n/16) + (sup_x p(x))^k exp(−3αε_n/16) / P(||p − p̂_k||_2 ≤ ε_n/2)
                           = (sup_x p(x))^k exp(−3αε_n/16) / P(||p − p̂_k||_2 ≤ ε_n/2) + O(1/n^c)
                           ≤ (sup_x p(x))^k exp(−3αnε_n/(32c_0²m_n)) / P(||p − p̂_k||_2 ≤ ε_n/2) + O(1/n^c),

and so for γ > 1,

    P(||p̂_Z − p||_2 > ε_n) ≤ c_2 exp(k log sup_x p(x)) exp(−3√(c_1) αn / (n^{1/(2γ+1)} n^{γ/(2(2γ+1))}))
                           = c_2 exp(n^{1/2} log sup_x p(x) − αc_3 n^{3γ/(2(2γ+1))})
                           = c_2 exp(−αc_4 n^{3γ/(2(2γ+1))}) → 0

as n → ∞, since 3γ/(2(2γ+1)) > 1/2, where c_2, c_3, c_4 are some constants. Hence the theorem holds.

Lemma 9.3. Let λ = lim sup_{k→∞} λ_k. Then λ < ∞.

Proof. Recall that the orthonormal basis is ψ_0, ψ_1, ..., where ψ_0 = 1 and ψ_j(x) = √2 cos(πjx). Also p(x) = 1 + Σ_{j=1}^∞ β_j ψ_j(x) and Σ_j β_j² j^{2γ} < ∞, so that Σ_{j=1}^∞ |β_j| < ∞. For j ≠ k,

    Σ_jk = ∫ ψ_j(x)ψ_k(x)p(x) dx − β_j β_k,

where

    ∫ ψ_j(x)ψ_k(x)p(x) dx = (β_j/√2) ∫ ψ_{2j}(x)ψ_k(x) dx + (β_k/√2) ∫ ψ_{2k}(x)ψ_j(x) dx
                              + (1/√2) Σ_{ℓ≠j,k} β_ℓ ∫ (ψ_{|j−k|}(x) + ψ_{j+k}(x)) ψ_ℓ(x) dx
                            = (β_j/√2) I(2j = k) + (β_k/√2) I(2k = j)
                              + (β_{|j−k|}/√2) I(k ≠ 2j, j ≠ 2k) + β_{j+k}/√2
                            = β_{|j−k|}/√2 + β_{j+k}/√2,

where we used the fact that ψ_{−j}(x) = ψ_j(x) for all j = 1, 2, ... and ∫ ψ_j(x) dx = 0 for all j > 0.
So we have, for all j ∈ {1, ..., p},

    Σ_{k=1}^p |Σ_jk| = |Σ_jj| + Σ_{k≠j} |β_{|j−k|}/√2 + β_{j+k}/√2 − β_j β_k|
                     ≤ 1 + β_{2j}/√2 + Σ_k |β_j||β_k| + Σ_{k≠j} (|β_{|j−k|}|/√2 + |β_{j+k}|/√2)
                     ≤ 1 + β_{2j}/√2 + (|β_j| + √2) Σ_{k=1}^∞ |β_k| = O(1).

Hence lim sup_{k→∞} λ_max(Σ_k) ≤ ||Σ_k||_∞ = O(1) and the lemma holds.
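The product identity driving this proof, ∫ ψ_j ψ_k p dx = (β_{|j−k|} + β_{j+k})/√2 for j ≠ k, can be verified by quadrature. The coefficient sequence below is an assumed example, not one from the paper.

```python
import numpy as np

# Quadrature check of the identity used in the proof of Lemma 9.3: for the
# cosine basis and p(x) = 1 + sum_j beta_j psi_j(x),
#     int psi_j psi_k p dx = (beta_{|j-k|} + beta_{j+k}) / sqrt(2),  j != k.
def psi(j, x):
    return np.ones_like(x) if j == 0 else np.sqrt(2) * np.cos(np.pi * j * x)

beta = {1: 0.25, 2: -0.1, 3: 0.05, 4: 0.02, 5: 0.01}  # assumed coefficients
b = lambda j: beta.get(j, 0.0)

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
p = 1 + sum(b(j) * psi(j, x) for j in beta)

def trap(f):  # trapezoid rule on the uniform grid
    return dx * (f[0] / 2 + f[1:-1].sum() + f[-1] / 2)

for j, k in [(1, 2), (2, 5), (1, 4), (3, 4)]:
    lhs = trap(psi(j, x) * psi(k, x) * p)
    rhs = (b(abs(j - k)) + b(j + k)) / np.sqrt(2)
    assert abs(lhs - rhs) < 1e-6
```

The identity follows from 2 cos(A) cos(B) = cos(A − B) + cos(A + B), which is exactly the relation ψ_j ψ_k = (ψ_{|j−k|} + ψ_{j+k})/√2 used in the display above.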
9.12 Proof of Theorem 6.3

The proof is similar to the proof of Theorem 4.4, so we provide a short outline. In particular, the effect of truncation can be shown to be negligible as in the proof of Theorem 4.4. We have p − p̂_Z = p − q̂ + q̂ − p̂_Z = p − q̂ + O_P(m/k), and the latter term is negligible for k ≥ n. Now p − q̂ = p − p̂ + p̂ − q̂. The term p − p̂ is the usual error term and contributes O(n^{−2γ/(2γ+1)}) to the risk. For the second term, ∫ (p̂ − q̂)² = Σ_{j=1}^m ν_j² = O_P(m/n) = O_P(n^{−2γ/(2γ+1)}).
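The theorem concerns a series estimator whose coefficients are released with added noise ν_j. As a generic sketch of that coefficient-perturbation idea (with Laplace noise calibrated by the standard sensitivity argument; this is an illustration, not the paper's exact calibration), private coefficients can be produced as follows.

```python
import numpy as np

# Generic sketch of coefficient perturbation (an illustration, not the
# paper's exact mechanism): release the first m cosine-series coefficients
# with Laplace noise. Changing one of the n data points moves each empirical
# coefficient by at most 2*sqrt(2)/n, so the L1 sensitivity of the vector is
# 2*sqrt(2)*m/n, and per-coordinate Laplace noise of scale 2*sqrt(2)*m/(n*alpha)
# gives alpha-differential privacy by the standard Laplace-mechanism argument.
# All parameter values below are assumptions.
rng = np.random.default_rng(2)

def psi(j, x):
    return np.sqrt(2) * np.cos(np.pi * j * x)

def private_coeffs(data, m, alpha):
    n = len(data)
    beta_hat = np.array([psi(j, data).mean() for j in range(1, m + 1)])
    scale = 2 * np.sqrt(2) * m / (n * alpha)   # sensitivity / privacy budget
    return beta_hat + rng.laplace(0.0, scale, size=m)

n, m, alpha = 5000, 8, 1.0
data = rng.uniform(0, 1, n)          # uniform density: all true beta_j = 0
released = private_coeffs(data, m, alpha)
assert released.shape == (m,) and np.all(np.isfinite(released))
```

With this calibration the noise contributes of order m³/n² to the squared L2 error, so the estimation error of the coefficients dominates for moderate m.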
[Received November 2008. Revised October 2009.]
REFERENCES

Aggarwal, G., Feder, T., Kenthapadi, K., Khuller, S., Panigrahy, R., Thomas, D., and Zhu, A. (2006), "Achieving Anonymity via Clustering," in Proceedings of the 25th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, New York: Association for Computing Machinery, pp. 153–162. [375]
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., and Talwar, K. (2007), "Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release," in Proceedings of the 26th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, New York: Association for Computing Machinery, pp. 273–282. [375]
Bentkus, V. (2003), "On the Dependence of the Berry–Esseen Bound on Dimension," Journal of Statistical Planning and Inference, 113, 385–402. [387]
Blum, A., Dwork, C., McSherry, F., and Nissim, K. (2005), "Practical Privacy: The SuLQ Framework," in Proceedings of the 24th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, New York: Association for Computing Machinery, pp. 128–138. [375]
Blum, A., Ligett, K., and Roth, A. (2008), "A Learning Theory Approach to Non-Interactive Database Privacy," in Proceedings of the 40th Annual ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 609–618. [375,377-379,382]
Csörgő, M., and Révész, P. (1975), "A New Method to Prove Strassen Type Laws of Invariance Principle. II," Probability Theory and Related Fields, 31, 261–269. [385]
Dinur, I., and Nissim, K. (2003), "Revealing Information While Preserving Privacy," in Proceedings of the 22nd ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, New York: Association for Computing Machinery, pp. 202–210. [375]
Duncan, G., and Lambert, D. (1986), "Disclosure-Limited Data Dissemination," Journal of the American Statistical Association, 81, 10–28. [375]
——— (1989), "The Risk of Disclosure for Microdata," Journal of Business & Economic Statistics, 7, 207–217. [375]
Duncan, G., and Pearson, R. (1991), "Enhancing Access to Microdata While Protecting Confidentiality: Prospects for the Future," Statistical Science, 6, 219–232. [375]
Dwork, C. (2006), "Differential Privacy," in The 33rd International Colloquium on Automata, Languages and Programming, New York: Springer, pp. 1–12. [375,377]
Dwork, C., and Lei, J. (2009), "Differential Privacy and Robust Statistics," in Proceedings of the 41st ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 371–380. [375]
Dwork, C., and Nissim, K. (2004), "Privacy-Preserving Datamining on Vertically Partitioned Databases," in Proceedings of the 24th Annual International Cryptology Conference—CRYPTO, New York: Springer, pp. 528–544. [375]
Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006), "Calibrating Noise to Sensitivity in Private Data Analysis," in Proceedings of the 3rd Theory of Cryptography Conference, New York: Springer, pp. 265–284. [375,377,379]
Dwork, C., McSherry, F., and Talwar, K. (2007), "The Price of Privacy and the Limits of LP Decoding," in Proceedings of the 39th Annual ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 85–94. [375]
Dwork, C., Naor, M., Reingold, O., Rothblum, G., and Vadhan, S. (2009), "On the Complexity of Differentially Private Data Release," in Proceedings of the 41st ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 381–390. [375]
Efromovich, S. (1999), Nonparametric Curve Estimation: Methods, Theory and Applications, New York: Springer-Verlag. [380,387]
Evfimievski, A., Srikant, R., Agrawal, R., and Gehrke, J. (2004), "Privacy Preserving Mining of Association Rules," Information Systems, 29, 343–364. [375]
Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. (2006), "Secure Multiparty Computation of Approximations," ACM Transactions on Algorithms, 2, 435–472. [375]
Feldman, D., Fiat, A., Kaplan, H., and Nissim, K. (2009), "Private Coresets," in Proceedings of the 41st ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 361–370. [375]
Fienberg, S., and McIntyre, J. (2004), "Data Swapping: Variations on a Theme by Dalenius and Reiss," Privacy in Statistical Databases, 3050, 14–29. [375]
Fienberg, S. E., Karr, A. F., Nardi, Y., and Slavkovic, A. (2007), "Secure Logistic Regression With Distributed Databases," in Bulletin of the ISI, New York: Springer. [375]
Fienberg, S. E., Makov, U. E., and Steele, R. J. (1998), "Disclosure Limitation Using Perturbation and Related Methods for Categorial Data" (with discussion), Journal of Official Statistics, 14, 485–511. [375]
Ganta, S., Kasiviswanathan, S., and Smith, A. (2008), "Composition Attacks and Auxiliary Information in Data Privacy," in Proceedings of 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: Association for Computing Machinery, pp. 265–273. [377]
Ghosh, A., Roughgarden, T., and Sundararajan, M. (2009), "Universally Utility-Maximizing Privacy Mechanisms," in Proceedings of the 41st ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 351–360. [375]
Hall, P., and Murison, R. D. (1993), "Correcting the Negativity of High-Order Kernel Density Estimators," Journal of Multivariate Analysis, 47, 103–122. [380]
Hwang, J. (1986), "Multiplicative Errors-in-Variables Models With Applications to Recent Data Released by the United-States-Department-of-Energy," Journal of the American Statistical Association, 81, 680–688. [375]
Kasiviswanathan, S., Lee, H., Nissim, K., Raskhodnikova, S., and Smith, A. (2008), "What Can We Learn Privately?" in Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, New York: Springer, pp. 531–540. [375]
Kim, J. J., and Winkler, W. E. (2003), "Multiplicative Noise for Masking Continuous Data," technical report, Statistical Research Division, U.S. Bureau of the Census, Washington, DC. [375]
Li, W., and Shao, Q.-M. (2001), "Gaussian Processes: Inequalities, Small Ball Probabilities and Applications," in Stochastic Processes: Theory and Methods. Handbook of Statistics, Vol. 19, eds. C. Rao and D. Shanbhag, Amsterdam, The Netherlands: Elsevier, pp. 533–598. [385]
Li, N., Li, T., and Venkatasubramanian, S. (2007), "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity," in Proceedings of the 23rd International Conference on Data Engineering, Istanbul, Turkey, pp. 106–115. [375]
Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. (2006), "ℓ-Diversity: Privacy Beyond Kappa-Anonymity," in Proceedings of the 22nd International Conference on Data Engineering, New York: Association for Computing Machinery, p. 24. [375]
Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J., and Vilhuber, L. (2008), "Privacy: Theory Meets Practice on the Map," in Proceedings of the 24th International Conference on Data Engineering, Istanbul, Turkey, pp. 277–286. [375,379]
McSherry, F., and Talwar, K. (2007), "Mechanism Design via Differential Privacy," in Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, New York: Springer, pp. 94–103. [375-377]
Nissim, K., Raskhodnikova, S., and Smith, A. (2007), "Smooth Sensitivity and Sampling in Private Data Analysis," in Proceedings of the 39th Annual ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 75–84. [375]
Pinkas, B. (2002), "Cryptographic Techniques for Privacy-Preserving Data Mining," ACM SIGKDD Explorations Newsletter, 4, 12–19. [375]
Rastogi, V., Hay, M., Miklau, G., and Suciu, D. (2009), "Relationship Privacy: Output Perturbation for Queries With Joins," in Proceedings of the 28th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, PODS 2009, New York: Association for Computing Machinery, pp. 107–116. [377]
Reiter, J. (2005), "Estimating Risks of Identification Disclosure for Microdata," Journal of the American Statistical Association, 100, 1103–1113. [375]
Rohde, A., and Duembgen, L. (2008), "Confidence Sets for the Optimal Approximating Model—Bridging a Gap Between Adaptive Point Estimation and Confidence Regions," available at arXiv:0802.3276v2 [math.ST]. [387]
Sanil, A. P., Karr, A., Lin, X., and Reiter, J. P. (2004), "Privacy Preserving Regression Modelling via Distributed Computation," in Proceedings of 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: Association for Computing Machinery, pp. 677–682. [375]
Scott, D. W. (1992), Multivariate Density Estimation: Theory, Practice, and Visualization, New York: Wiley. [378]
Smith, A. (2008), "Efficient, Differentially Private Point Estimators," available at arXiv:0809.4794v1. [375]
Sweeney, L. (2002), "k-Anonymity: A Model for Protecting Privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10, 557–579. [375]
Ting, D., Fienberg, S. E., and Trottini, M. (2008), "Random Orthogonal Matrix Masking Methodology for Microdata Release," International Journal of Information and Computer Security, 2, 86–105. [375]
Warner, S. (1965), "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias," Journal of the American Statistical Association, 60, 63–69. [375]