
COE 426 Data Privacy

Lecture 9: Differential Privacy II

Ch.2, Practical Data Privacy


Outline

•Review of differential privacy definition


•Global sensitivity
•Laplace Mechanism
•Examples
•Properties of differential privacy

COE426: Lecture 9 2
Recall Definition of Differential Privacy

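•A randomized sanitization function $\mathcal{A}$ has $\epsilon$-differential privacy if for all neighboring data sets $D$ and $D'$ and all possible outputs $S$ of $\mathcal{A}$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S]$$

Small $\epsilon$ -> better privacy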

Another Way to Understand DP

Differential Privacy Modes

•Global differential privacy
• The analyst sends query $f$ to database $D = (x_1, \dots, x_n)$, which answers $f(D) + \mathrm{Lap}(GS_f/\epsilon)$
• Example: Laplace mechanism

•Local differential privacy
• Each user $i$ adds noise $N_i$ locally and sends $f(U_i) + N_i$; the analyst computes $f(D) = \sum_i \left( f(U_i) + N_i \right)$
• Example: randomized response

How to Achieve DP?

•Randomized Response
•Laplace Mechanism
•Report Noisy Max
•Exponential Mechanism

Randomized Response: Warner Process

Have you ever shoplifted?

[Figure: the analyst sends query f to database D = (x1, …, xn); each respondent answers Yes or No]

• The Warner process is an enhancement over the process we learned in previous lectures

• Warner process:
• To answer the question, flip a coin
• If tails, respond truthfully
• If heads, flip a second coin and respond
• "Yes" if heads
• "No" if tails
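The two-coin process above can be simulated in a few lines. This is a minimal sketch, not the course notebook: the population (30% true "Yes"), the seed, and the function names are assumptions made for illustration. It also shows how the analyst can undo the noise on average, since P[response = "Yes"] = 0.5 p + 0.25 when a fraction p truly answers "Yes".

```python
import random

def warner_response(truth, rng):
    """Return the randomized answer of one respondent (True means "Yes")."""
    if rng.random() < 0.5:       # first coin is tails: respond truthfully
        return truth
    return rng.random() < 0.5    # first coin is heads: second coin decides

def estimate_true_yes_rate(responses):
    """Invert P[Yes] = 0.5 * p + 0.25 to get an estimate of p."""
    observed = sum(responses) / len(responses)
    return 2.0 * (observed - 0.25)

rng = random.Random(0)                        # fixed seed, for repeatability only
truths = [i < 300 for i in range(1000)]       # hypothetical population: 30% true "Yes"
responses = [warner_response(t, rng) for t in truths]
estimate = estimate_true_yes_rate(responses)
print(f"estimated 'Yes' rate: {estimate:.3f}")
```

No single response reveals the respondent's true answer, yet the aggregate estimate lands close to the true rate.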

Warner Procedure is Differentially Private

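Warner process: Have you ever shoplifted?

• Flip a coin
• If tails, respond truthfully
• If heads, flip a second coin and respond
• "Yes" if heads
• "No" if tails

Claim: Randomized response is ln(3)-DP

Proof:
$$\frac{\Pr[\text{Response} = \text{"No"} \mid \text{Truth} = \text{"No"}]}{\Pr[\text{Response} = \text{"No"} \mid \text{Truth} = \text{"Yes"}]} = \frac{1/2 + 1/4}{1/4} = \frac{3/4}{1/4} = 3 = e^{\ln 3}$$
The same ratio holds for the response "Yes" by symmetry.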

https://github.com/kjam/practical-data-privacy/blob/main/06%20-%20Local%20Differential%20Privacy%20via%20Randomized%20Response.ipynb
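The ln(3) claim can be checked by enumerating the coin outcomes exactly. The sketch below is an illustration written for these notes (the helper name `p_response` is hypothetical); it uses exact fractions so that no rounding hides the result.

```python
from fractions import Fraction
import math

half = Fraction(1, 2)

def p_response(response_yes, truth_yes):
    """Exact P[response | truth] under the two-coin process."""
    # First coin tails (prob 1/2): the response equals the truth.
    truthful = half if response_yes == truth_yes else Fraction(0)
    # First coin heads (prob 1/2), second coin matches the response (prob 1/2).
    second_coin = half * half
    return truthful + second_coin

# Worst-case probability ratio over every response and every pair of truths.
worst_ratio = max(
    p_response(r, t) / p_response(r, not t)
    for r in (True, False)
    for t in (True, False)
)
epsilon = math.log(worst_ratio)
print(worst_ratio, epsilon)
```

The worst-case ratio is the quantity bounded by e^ε in the definition, so its logarithm is the smallest ε for which the process is ε-DP.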

Check Our Understanding of DP

• How do we construct a randomized algorithm $\mathcal{A}$, and what does it mean? Can you give an example?

• For a database of n records, how many possible neighboring databases are there?

• Are there any restrictions on the added/removed records?

• In Warner's procedure, how did we achieve neighboring databases?

• What should the value of $\epsilon$ be? What does it mean exactly?

• Are there any drawbacks of DP? Why are only big companies using it?

• Is DP sensitive to the size of the table?

How to Achieve DP?

•Randomized Response
•Laplace Mechanism
•Exponential Mechanism
•Report Noisy Max

Global Sensitivity

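•Recall the No-Free-Lunch Theorem

•Sensitivity is a measure of how much one record affects the possible output of the query

•Global sensitivity
$$GS_f = \Delta f = \max_{\text{neighboring } D, D'} \| f(D) - f(D') \|_1$$
where $\|\cdot\|_1$ is the first norm ($L_1$ norm)

•Example
• $\|p - q\|_1 = \sum_i |p_i - q_i|$, where $p$ and $q$ are vectors
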
Global Sensitivity: Example

•Query f: # individuals with salary > $30K?

•$GS_f = \Delta f = \max_{\text{neighboring } D, D'} \| f(D) - f(D') \|_1 = 1$

Database D:
ID  Salary
1   45K
2   20K
3   15K
4   50K
5   60K
6   75K

Neighboring databases (each removes one record from D):
D'1          D'2          D'3
ID  Salary   ID  Salary   ID  Salary
1   45K      1   45K      1   45K
2   20K      2   20K      2   20K
3   15K      3   15K      3   15K
4   50K      4   50K      5   60K
5   60K      6   75K      6   75K

D'4          D'5          D'6
ID  Salary   ID  Salary   ID  Salary
1   45K      1   45K      2   20K
2   20K      3   15K      3   15K
4   50K      4   50K      4   50K
5   60K      5   60K      5   60K
6   75K      6   75K      6   75K
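As a sanity check on the salary example, the sketch below removes each record in turn and measures how much the count changes. The variable and function names are chosen for illustration. Note that this only explores the neighbors of this one database; global sensitivity is the maximum over all possible databases, which for a count query is 1 by the same argument (one record adds at most 1 to the count).

```python
# Salary table D from the slide, keyed by ID.
salaries = {1: 45_000, 2: 20_000, 3: 15_000, 4: 50_000, 5: 60_000, 6: 75_000}

def count_above(db, threshold=30_000):
    """Query f: number of individuals with salary above the threshold."""
    return sum(1 for s in db.values() if s > threshold)

f_D = count_above(salaries)

# Each neighbor drops exactly one record from D.
diffs = {}
for removed_id in salaries:
    neighbor = {i: s for i, s in salaries.items() if i != removed_id}
    diffs[removed_id] = abs(f_D - count_above(neighbor))

largest_change = max(diffs.values())
print(f_D, diffs, largest_change)
```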
Global Sensitivity: Example

•Query f: frequency histogram of the sensitive attribute

• Global sensitivity of f = 1

[Figure: two frequency histograms side by side over the bins [0-20], [20-40], [40-60], >60]

Laplace Distribution

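•The Laplace distribution $\mathrm{Lap}(\mu, b)$ has density

$$p(x \mid \mu, b) = \frac{1}{2b} \, e^{-\frac{|x - \mu|}{b}}$$

$\mu$: mean
$2b^2$: variance ($b$ is the scale)

•When $\mu = 0$, we write $\mathrm{Lap}(b)$, with density $\frac{1}{2b} e^{-|x|/b}$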
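To make the roles of the scale b and the variance 2b² concrete, here is a small standard-library sketch. The sampler and the value b = 2 are assumptions made for illustration; it draws Laplace noise as the difference of two exponential variables with mean b, which has the same distribution as `np.random.laplace(0, b)`.

```python
import math
import random

def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of Lap(mu, b): exp(-|x - mu| / b) / (2b)."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def sample_laplace(rng, mu=0.0, b=1.0):
    """One Lap(mu, b) draw: difference of two exponentials with mean b, shifted by mu."""
    return mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

rng = random.Random(0)
b = 2.0                                   # assumed scale, for illustration
samples = [sample_laplace(rng, 0.0, b) for _ in range(200_000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

print(f"peak density 1/(2b) = {laplace_pdf(0.0, 0.0, b):.3f}")
print(f"sample variance = {variance:.2f}, theory 2*b^2 = {2 * b * b:.2f}")
```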

Laplace Mechanism

•Theorem: A mechanism that adds random noise drawn from a Laplace distribution with mean 0 and scale $b = GS_f/\epsilon$ (variance $2b^2$) to the query answer $f(D)$ is $\epsilon$-differentially private.
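The theorem translates into a very short function. The following is a sketch under stated assumptions, not a hardened implementation: `laplace_mechanism` is a name chosen for these notes, the true answer of 100 is made up, and the standard-library sampler stands in for `np.random.laplace`. The loop illustrates the privacy/accuracy trade-off, since the expected absolute error equals the noise scale GS_f/ε.

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return true_answer plus Lap(sensitivity / epsilon) noise."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    # Difference of two exponentials with mean `scale` is a Lap(0, scale) draw.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise

rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    errors = [abs(laplace_mechanism(100.0, 1.0, eps, rng) - 100.0) for _ in range(5000)]
    print(f"epsilon={eps:>4}: mean absolute error = {sum(errors) / len(errors):.2f}")
```

Production code should rely on a vetted differential privacy library, because naive floating-point noise sampling is known to have subtle weaknesses.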

Laplace Mechanism Example

# individuals with salary > $30K?

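[Figure: the analyst sends query $f$ to database $D = (x_1, \dots, x_n)$ and receives $f(D) + \mathrm{Lap}(GS_f/\epsilon)$]

Recall that count queries have $GS_f = 1$

$f(D) + \mathrm{Lap}(1/\epsilon)$ satisfies $\epsilon$-differential privacy

Small $\epsilon$ -> more noise -> bad accuracy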

https://github.com/kjam/practical-data-privacy/blob/main/02%20-%20Exploring%20Differential%20Privacy%20with%20Laplace.ipynb
Example: COUNT query

•Number of people having the disease
• True answer = 3
• Sensitivity = 1

•Answer: $3 + \eta$,
• where $\eta$ is drawn from $\mathrm{Lap}(1/\epsilon)$
• eta = np.random.laplace(0, 1/epsilon, 1)

Database D:
Disease (Y/N)
Y
Y
N
Y
N
N



Example: Sum query

•Sum of Salary (suppose Salary is in [a,b])
• Sensitivity = b

•Answer: $f(D) + \eta$,
• where $\eta$ is drawn from $\mathrm{Lap}(b/\epsilon)$

Database D:
ID  Salary
1   45K
2   20K
3   15K
4   50K
5   60K
6   75K
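A sketch of the sum query on the salary table follows. The bounds a = 0 and b = 100K and the budget ε = 1 are assumptions, since the slide leaves them symbolic. The clamping step is what justifies the sensitivity: once every value is forced into [a, b] with a >= 0, adding or removing one record changes the sum by at most b.

```python
import random

salaries = [45_000, 20_000, 15_000, 50_000, 60_000, 75_000]  # Salary column from the slide
a, b = 0, 100_000     # assumed salary bounds [a, b]
epsilon = 1.0         # assumed privacy budget

# Clamp each value into [a, b] so one record moves the sum by at most b.
clipped = [min(max(s, a), b) for s in salaries]
true_sum = sum(clipped)

scale = b / epsilon   # sensitivity b divided by epsilon
rng = random.Random(3)
# Difference of two exponentials with mean `scale` is a Lap(0, scale) draw.
eta = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
noisy_sum = true_sum + eta
print(true_sum, round(noisy_sum))
```

With only six records the noise scale (100K) is large relative to the true sum, which illustrates why sum queries need either many records or tight bounds to be accurate.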

Example: Average query

•Average of Salary (suppose Salary is in [a,b])
• Sensitivity = b

•Answer: $f(D) + \eta$,
• where $\eta$ is drawn from $\mathrm{Lap}(b/\epsilon)$

Database D:
ID  Salary
1   45K
2   20K
3   15K
4   50K
5   60K
6   75K

Laplace Mechanism: Sketch of Proof

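• Recall, a randomized function $\mathcal{A}$ is $\epsilon$-differentially private if
$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S]$$
Or
$$\frac{\Pr[\mathcal{A}(D) = y]}{\Pr[\mathcal{A}(D') = y]} \le e^{\epsilon}$$
• For any possible output $y$, the probability density of $\mathcal{A}(D) = y$ is proportional to the probability density of the added noise:
$\Pr[\mathcal{A}(D) = y] \propto \exp\left(-\frac{\epsilon \, |y - f(D)|}{GS_f}\right)$, because $\mathcal{A}(D) = f(D) + \mathrm{Lap}(GS_f/\epsilon)$
• Then
$$\frac{\Pr[\mathcal{A}(D) = y]}{\Pr[\mathcal{A}(D') = y]} = \exp\left(\frac{\epsilon \left(|y - f(D')| - |y - f(D)|\right)}{GS_f}\right) \le \exp\left(\frac{\epsilon \, |f(D) - f(D')|}{GS_f}\right) \le e^{\epsilon}$$
• Recall $GS_f = \max_{\text{neighboring } D, D'} \| f(D) - f(D') \|_1$, so $|f(D) - f(D')| \le GS_f$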

Properties of Differential Privacy

•Differential privacy has three properties that make it very practical
1. Post-processing invariance
2. Composition
   a. Sequential composition
   b. Parallel composition
3. Group privacy

Post-processing

•If $\mathcal{A}$ is an $\epsilon$-differentially private algorithm that accesses a private database $D$, then $g(\mathcal{A}(D))$, where $g$ is any arbitrary function, also satisfies $\epsilon$-differential privacy
•In other words, differential privacy is robust against further processing of a previous database output

Example query: # individuals with salary > $30K?

[Figure: the analyst sends query f to database D = (x1, …, xn)]
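Post-processing can be illustrated with the salary count. In this sketch (function names, seed, and ε = 0.5 are assumptions), the function `postprocess` plays the role of an arbitrary function applied to the released value: it rounds the noisy count and clamps it at zero. Because it reads only the released value and never the database, the cleaned answer carries the same ε guarantee.

```python
import random

def noisy_count(true_count, epsilon, rng):
    """epsilon-DP count: add Lap(1/epsilon) noise (a count has sensitivity 1)."""
    scale = 1.0 / epsilon
    return true_count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def postprocess(released_value):
    """Uses only the released value, never the database itself."""
    return max(0, round(released_value))

rng = random.Random(11)
released = noisy_count(4, epsilon=0.5, rng=rng)   # 4 salaries exceed $30K in the example table
cleaned = postprocess(released)
print(released, cleaned)
```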

Composability

•Composability is the ability to join the outputs of two (or more) differentially private mechanisms

•Arbitrary composition (sequential and/or parallel) of k differentially private algorithms is still differentially private

•There are two types of composition
1. Sequential composition
2. Parallel composition

Sequential Composition

•Sending queries against the same database

•The privacy budget is the sum of the privacy budgets for each query

•Example:
Select sum(salary) from D
Select sum(age) from D

[Figure: four queries with privacy budgets 0.1, 0.2, 0.2, 0.3 sent against the same database]
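Sequential composition is usually enforced with a budget tracker. The class below is a hypothetical helper written for these notes, not part of any library, and the overall budget of 1.0 is an assumption. It adds up the per-query budgets from the figure and refuses any query that would exceed the total.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under sequential composition (hypothetical helper)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Charge one query; return the remaining budget or raise if exhausted."""
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent

budget = PrivacyBudget(total_epsilon=1.0)   # assumed overall budget
for eps in (0.1, 0.2, 0.2, 0.3):            # per-query budgets shown on the slide
    budget.spend(eps)
print(round(budget.spent, 10))
```

After the four queries the tracker has spent their sum, so a further query with ε = 0.3 would be rejected under the assumed total of 1.0.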

Parallel Composition

•Sending (parallel) queries against different parts of the database

•The privacy budget is the max of the privacy budgets for each query

•Example:
Select count(*) from D1 Where age >=25 and age <=30
Select count(*) from D2 Where age >=31 and age <=35

[Figure: four queries with privacy budgets 0.1, 0.2, 0.2, 0.3 sent against different parts of the database]
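The two count queries can be sketched as follows. The age values and the budgets 0.2 and 0.3 are made up for illustration. The key point is that D1 and D2 are disjoint, so any single record influences only one of the two answers and the total cost is the maximum of the budgets rather than their sum.

```python
import random

ages = [25, 27, 30, 31, 33, 35, 26, 34]      # hypothetical age column
d1 = [x for x in ages if 25 <= x <= 30]       # partition D1
d2 = [x for x in ages if 31 <= x <= 35]       # partition D2, disjoint from D1

def noisy_count(rows, epsilon, rng):
    """epsilon-DP count of the rows: add Lap(1/epsilon) noise."""
    scale = 1.0 / epsilon
    return len(rows) + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(5)
eps1, eps2 = 0.2, 0.3                         # assumed per-query budgets
answers = (noisy_count(d1, eps1, rng), noisy_count(d2, eps2, rng))

parallel_cost = max(eps1, eps2)               # disjoint inputs: pay only the maximum
sequential_cost = eps1 + eps2                 # cost if both queries read the same data
print(answers, parallel_cost, sequential_cost)
```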

Proof of Sequential Composability

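•Claim: The algorithm $\mathcal{A}(D) = (\mathcal{A}_1(D), \mathcal{A}_2(D))$, where $\mathcal{A}_1$ is $\epsilon_1$-DP and $\mathcal{A}_2$ is $\epsilon_2$-DP, is $(\epsilon_1 + \epsilon_2)$-differentially private.

•Proof (for discrete probability distributions): Fix neighboring $D$, $D'$ and any two outputs $r_1$ in the range of $\mathcal{A}_1$, and $r_2$ in the range of $\mathcal{A}_2$. Since $\mathcal{A}_1$ and $\mathcal{A}_2$ use independent randomness,
$$\frac{\Pr[\mathcal{A}(D) = (r_1, r_2)]}{\Pr[\mathcal{A}(D') = (r_1, r_2)]} = \frac{\Pr[\mathcal{A}_1(D) = r_1]}{\Pr[\mathcal{A}_1(D') = r_1]} \cdot \frac{\Pr[\mathcal{A}_2(D) = r_2]}{\Pr[\mathcal{A}_2(D') = r_2]} \le e^{\epsilon_1} e^{\epsilon_2} = e^{\epsilon_1 + \epsilon_2}$$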

Why Composition?

•Reasoning about the privacy of a complex algorithm is hard

•Composition helps the software design process

•If the building blocks are proven to be private, it is easy to reason about the privacy of a complex algorithm built entirely from these building blocks

Group Privacy

•Differential privacy is designed to protect the privacy of a single individual
• By considering neighboring databases differing in exactly one record

•Differential privacy can also protect the privacy of groups: for databases $D$ and $D'$ differing in $c$ records,
$$\frac{\Pr[\mathcal{A}(D) = y]}{\Pr[\mathcal{A}(D') = y]} \le e^{c\epsilon}$$

•How to achieve $\epsilon$-differential privacy for a group of $c$ records?
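One standard answer follows directly from the group bound: since a mechanism that is ε'-DP for single records protects groups of c records at level c·ε', running it with ε' = ε/c gives the group the target ε. The sketch below uses assumed values (target ε = 1, c = 4) and a hypothetical helper name, and shows the price paid, which is a c-fold increase in the Laplace noise scale.

```python
import math

def per_record_epsilon(target_group_epsilon, group_size):
    """Epsilon to run the mechanism with so groups of `group_size` get the target."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return target_group_epsilon / group_size

target_eps, c = 1.0, 4                     # assumed target budget and group size
eps = per_record_epsilon(target_eps, c)
group_bound = math.exp(c * eps)            # e^{c * eps}: bound on the probability ratio
laplace_scale_for_count = 1.0 / eps        # count query noise scale grows by the factor c
print(eps, group_bound, laplace_scale_for_count)
```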

Next Attraction

•Exponential Mechanism
•Applying differential privacy
•Practicing Differential privacy with Python
