DATA STORAGE VIEWPOINT OF PRIVACY

Slides from Prof. Johann-Christoph Freytag (Humboldt University, Berlin)

Outline

• Privacy
• Privacy and context
• Privacy and mobility
• Privacy and context combined with privacy and mobility

PRECIOSA kick off meeting HU Berlin: Prof. Johann-Christoph Freytag, Dipl.-Inf. Martin Kost

Paris, 11.04.2008

Privacy
Privacy of movement

[Figure: RFID; label "B-O-333"]

Privacy
Is it always obvious?

Is it always obvious that privacy is violated or breached?
Latanya Sweeney's Finding

[Sween’01]

In Massachusetts, USA, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees. GIC has to publish the data:

GIC(zip, dob, sex, diagnosis, procedure, ...)

(dob = date of birth)

http://lab.privacy.cs.cmu.edu/people/sweeney/


Privacy
Latanya Sweeney's Finding

Sweeney paid $20 and bought the voter registration list for Cambridge, MA:

GIC(zip, dob, sex, diagnosis, procedure, ...)
VOTER(name, party, ..., zip, dob, sex)

William Weld (former governor of Massachusetts) lives in Cambridge, hence is in VOTER.
• 6 people in VOTER share his dob
• only 3 of them were men (same sex)
• Weld was the only one in that zip
By linking GIC and VOTER on (zip, dob, sex), Sweeney learned Weld's medical records!
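The linkage step amounts to a join on the attributes both tables share, (zip, dob, sex). A minimal Python sketch, assuming toy data: the function name `link` and all records below are hypothetical, not the actual GIC or Cambridge voter data.

```python
from collections import defaultdict

def link(gic_rows, voter_rows):
    """Join GIC records with a voter list on the quasi-identifier (zip, dob, sex).

    Returns {name: [diagnosis, ...]} for every GIC record whose
    (zip, dob, sex) matches exactly one voter, i.e. a re-identification.
    """
    voters_by_qi = defaultdict(list)
    for name, zip_code, dob, sex in voter_rows:
        voters_by_qi[(zip_code, dob, sex)].append(name)

    linked = defaultdict(list)
    for zip_code, dob, sex, diagnosis in gic_rows:
        names = voters_by_qi.get((zip_code, dob, sex), [])
        if len(names) == 1:  # unique match: the "anonymous" record gets a name back
            linked[names[0]].append(diagnosis)
    return dict(linked)

# Hypothetical toy data: GIC rows (zip, dob, sex, diagnosis), VOTER rows (name, zip, dob, sex)
gic = [
    ("11111", "1950-01-01", "M", "diagnosis A"),
    ("11111", "1950-01-01", "F", "diagnosis B"),
    ("22222", "1960-05-05", "M", "diagnosis C"),
]
voters = [
    ("Person X", "11111", "1950-01-01", "M"),
    ("Person Y", "11111", "1950-01-01", "F"),
    ("Person Z", "33333", "1970-02-02", "M"),
]
print(link(gic, voters))
# {'Person X': ['diagnosis A'], 'Person Y': ['diagnosis B']}
```

Only records whose (zip, dob, sex) combination is unique in the voter list are re-identified; if two voters shared it, the join would be ambiguous, which is exactly the situation k-anonymity tries to enforce.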

Privacy
Latanya Sweeney's Finding

Observation: all systems worked as specified, yet important data has leaked.
• "Information leakage" occurred
• despite the observation that all "systems" worked as specified
• Beyond correctness! What's missing?

How do we protect against that kind of “lack (leakage) of privacy”?

Privacy
Data Security

Dorothy Denning, 1982: Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification.
• Data Security = Confidentiality + Integrity (+ Availability)
• Distinct from system and network security

Privacy
What is Privacy?

[Sween’02] Definition 1: “Privacy reflects the ability of a person, organization, government, or entity to control its own space, where the concept of space (or ‘privacy space’) takes on different contexts”.
• Physical space, against invasion
• Bodily space, medical consent
• Computer space, spam
• Web browsing space, Internet privacy

[Agrawal’03]

Definition 2: “Privacy is the right of individuals to determine for themselves when, how, and to what extent information about them is communicated to others”. (We shall call this data/information privacy)

Privacy
Anonymity and unobservability

[Figure: an anonymity group; events such as a message or an access]

Everybody in the anonymity group could be the originator of an «event» with equal likelihood.

Privacy
Approaches for non-observable communication

Whom to protect?
• sender
• (content of message)

Basic approaches:
• Dummy traffic
• Proxies
• MIX-Networks
• DC-Networks
• … more

[Figure: anonymity group; protected events: message, access]
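To make one of the basic approaches concrete, here is a minimal Python sketch of a single DC-network (dining cryptographers) round. The function names are hypothetical; keys are generated locally for simplicity (a real DC-network establishes pairwise keys between participants), and collisions between simultaneous senders are ignored.

```python
import secrets
from functools import reduce
from operator import xor

def dc_net_round(n, sender, message, bits=64):
    """Return the n public announcements of one DC-network round.

    Every pair of participants shares a random key. Each participant
    announces the XOR of all keys it shares; the sender additionally
    XORs in its message (sender=None means nobody sends).
    """
    if not 0 <= message < 2 ** bits:
        raise ValueError("message must fit into `bits` bits")
    keys = {(i, j): secrets.randbits(bits)
            for i in range(n) for j in range(i + 1, n)}
    announcements = []
    for p in range(n):
        value = 0
        for (i, j), key in keys.items():
            if p in (i, j):
                value ^= key
        if p == sender:
            value ^= message
        announcements.append(value)
    return announcements

def dc_net_combine(announcements):
    """Each key occurs in exactly two announcements and cancels out,
    so the XOR of all announcements is the message (0 if nobody sent)."""
    return reduce(xor, announcements, 0)

# The combined result is the same whichever participant sent the message.
for s in range(3):
    assert dc_net_combine(dc_net_round(3, s, 0xCAFE)) == 0xCAFE
```

The announcements themselves look uniformly random, so as long as the pairwise keys stay secret an observer learns the message but not which member of the anonymity group sent it.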

Privacy
Maintaining data privacy for accessing databases

k-anonymity and its properties, introduced by Sweeney [Sween’02]


Privacy
An example: Medical Records

[Aggarwal’03]
Identifying attributes: SoSecN, Name; sensitive information: Disease

SoSecN  Name     Age  Ethnicity  Zipcode  Disease
007     Chris    07   Caucas     12344    Arthritis
009     Jane     77   Caucas     53211    Cold
011     Adam     28   Caucas     70234    Heart problem
023     Charlie  27   Afr-Amer   95505    Flu
034     Eve      27   Afr-Amer   54327    Arthritis
054     Yvonne   44   Hispanic   12007    Diabetes
099     John     65   Hispanic   12007    Flu

Privacy
Medical Records: De-identify & Release

Age  Ethnicity  Zipcode  Disease (sensitive)
07   Caucas     12344    Arthritis
77   Caucas     53211    Cold
28   Caucas     70234    Heart problem
27   Afr-Amer   95505    Flu
27   Afr-Amer   54327    Arthritis
44   Hispanic   12007    Diabetes
65   Hispanic   12007    Flu


Privacy
Not sufficient!

Age  Ethnicity  Zipcode  Disease (sensitive)
07   Caucas     12344    Arthritis
77   Caucas     53211    Cold
28   Caucas     70234    Heart problem
27   Afr-Amer   95505    Flu
27   Afr-Amer   54327    Arthritis
44   Hispanic   12007    Diabetes
65   Hispanic   12007    Flu

Quasi-identifiers (Age, Ethnicity, Zipcode) reveal less information than identifying attributes, yet combined with a public database they can uniquely identify you!
This motivates the k-anonymity model.
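A quick check shows why the de-identified release is not sufficient. This minimal Python sketch (function name hypothetical) counts, for each row, how many rows share its quasi-identifier values (Age, Ethnicity, Zipcode). Every count is 1, so each row is unique and could be re-identified by anyone holding a public database with the same attributes.

```python
from collections import Counter

# De-identified release from the slide: (Age, Ethnicity, Zipcode, Disease)
released = [
    ("07", "Caucas", "12344", "Arthritis"),
    ("77", "Caucas", "53211", "Cold"),
    ("28", "Caucas", "70234", "Heart problem"),
    ("27", "Afr-Amer", "95505", "Flu"),
    ("27", "Afr-Amer", "54327", "Arthritis"),
    ("44", "Hispanic", "12007", "Diabetes"),
    ("65", "Hispanic", "12007", "Flu"),
]

def qi_group_sizes(rows, qi=(0, 1, 2)):
    """For each row, count how many rows share its quasi-identifier values."""
    keys = [tuple(row[i] for i in qi) for row in rows]
    counts = Counter(keys)
    return [counts[key] for key in keys]

print(qi_group_sizes(released))           # [1, 1, 1, 1, 1, 1, 1]
print(qi_group_sizes(released, qi=(2,)))  # [1, 1, 1, 1, 1, 2, 2]
```

Even the Zipcode alone singles out five of the seven patients.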

Privacy
k-anonymity – Problem Definition

Input: database consisting of n rows, each with m attributes; the set of domain values for the attributes is finite.
Goal: suppress some entries in the table such that each modified row becomes identical to at least k-1 other rows.
Objective: minimize the number of suppressed entries.
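The problem definition can be illustrated with a brute-force Python sketch (all names hypothetical): it tries suppressing 0, 1, 2, ... entries and returns the first k-anonymous table found, which therefore uses the fewest suppressions. Finding the optimum is NP-hard in general, so this only illustrates the objective on toy tables; practical tools rely on approximation or heuristic algorithms.

```python
from collections import Counter
from itertools import combinations

STAR = "*"  # marker for a suppressed entry

def is_k_anonymous(rows, k):
    """True if every row is identical to at least k-1 other rows."""
    return all(count >= k for count in Counter(rows).values())

def suppress(rows, cells):
    """Copy of `rows` with the given (row, column) cells replaced by STAR."""
    table = [list(row) for row in rows]
    for i, j in cells:
        table[i][j] = STAR
    return [tuple(row) for row in table]

def min_suppression(rows, k):
    """Exhaustive search over 0, 1, 2, ... suppressed cells.

    Returns (table, number_of_suppressions) for a k-anonymous table with
    the fewest suppressions, or None if len(rows) < k.
    Exponential in n*m, so only usable for toy tables.
    """
    if len(rows) < k:
        return None
    if not rows:
        return [], 0
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    for s in range(len(cells) + 1):
        for chosen in combinations(cells, s):
            candidate = suppress(rows, chosen)
            if is_k_anonymous(candidate, k):
                return candidate, s
    return None  # not reached: suppressing every cell makes all rows identical

table = [("27", "A"), ("27", "B"), ("30", "C"), ("30", "C")]
print(min_suppression(table, k=2))
# ([('27', '*'), ('27', '*'), ('30', 'C'), ('30', 'C')], 2)
```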

Privacy
Medical Records: 2-anonymized table

Age  Ethnicity  Zipcode  Disease
*    Caucas     *        Arthritis
*    Caucas     *        Cold
*    Caucas     *        Heart problem
27   Afr-Amer   *        Flu
27   Afr-Amer   *        Arthritis
*    Hispanic   12007    Diabetes
*    Hispanic   12007    Flu
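The 2-anonymized table can be checked against the definition with a few lines of Python (a sketch; helper names hypothetical). Disease is treated as the sensitive attribute and left out of the comparison, and the Hispanic rows use Zipcode 12007 from the original records.

```python
from collections import Counter

STAR = "*"
QI = (0, 1, 2)  # Age, Ethnicity, Zipcode; Disease (index 3) is sensitive

anonymized = [
    ("*", "Caucas", "*", "Arthritis"),
    ("*", "Caucas", "*", "Cold"),
    ("*", "Caucas", "*", "Heart problem"),
    ("27", "Afr-Amer", "*", "Flu"),
    ("27", "Afr-Amer", "*", "Arthritis"),
    ("*", "Hispanic", "12007", "Diabetes"),
    ("*", "Hispanic", "12007", "Flu"),
]

def k_of(rows, qi=QI):
    """Largest k for which the table is k-anonymous on the quasi-identifier."""
    return min(Counter(tuple(row[i] for i in qi) for row in rows).values())

def suppressed_entries(rows, qi=QI):
    """Number of quasi-identifier entries replaced by STAR."""
    return sum(row[i] == STAR for row in rows for i in qi)

print(k_of(anonymized))                # 2
print(suppressed_entries(anonymized))  # 10
```

The groups have sizes 3, 2 and 2, so every row is indistinguishable from at least one other row, at the cost of 10 suppressed entries.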

Privacy
Accessing databases privately (Access privacy)

[Figure: Patent-DB]


Privacy
First (naïve) approach

Problem to solve:
User/Client: no one should know the contents of the query or the result (not even the server).

Observation:
Encrypting the communication between client and server might not be sufficient (an adversary might access the decrypted query if he can get "inside" the database system and can observe disk accesses).

Naïve solution:
The client downloads the entire DB and executes queries locally; this is unrealistic (size and ownership of the data).

[Figure: USER/CLIENT sends a query to the DB SERVER and receives the result]


Privacy
Accessing databases privately (Access privacy)

Simple solution [Asonov’01]:
Use a "Secure Coprocessor" (SC), e.g. the IBM 4758 Secure Coprocessor
Proven hardware properties:
• computation cannot be "observed" from outside
• if tampered with, self-destruction occurs
The SC reads the entire database per query: cost O(N)

[Figure: the SC reads the entire database on the database server and returns record x encrypted]
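The access pattern that makes this work can be sketched in Python: to return record x, the coprocessor reads every record in the same order, so an observer of the disk sees an identical sequence of reads for every query. This is a simplified sketch (function name hypothetical); encryption of the result and the tamper-proof hardware are not modeled.

```python
def sc_fetch(db, x):
    """Return (record x, access log).

    The access log lists the indices read from the database server,
    which is all an outside observer can see.
    """
    access_log = []
    result = None
    for i, record in enumerate(db):
        access_log.append(i)  # every record is read, whatever x is
        if i == x:            # selection happens inside the SC, unobservable
            result = record
    if result is None:
        raise IndexError(f"no record {x}")
    return result, access_log

db = [f"record {i}".encode() for i in range(8)]
rec3, log3 = sc_fetch(db, 3)
rec5, log5 = sc_fetch(db, 5)
print(rec3, log3 == log5)  # b'record 3' True
```

The identical logs are the privacy guarantee; the price is that every query touches all N records.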

Privacy
Metric (Probabilistic Privacy)

Using (Shannon's) entropy definition to "measure" privacy:
• P_i … probability that the query accesses record i
• E … uncertainty of the adversary's observation

$$E = -\sum_{i=1}^{N} P_i \cdot \operatorname{ld} P_i \qquad (\operatorname{ld} = \log_2)$$

E is maximal (E = ld N) if all P_i have the same value (P_i = 1/N), i.e. the adversary cannot assign some records stored in the DB a higher probability of being accessed than others.

• Perfect privacy: E does not change through observations
• Probabilistic privacy: the adversary learns through observation (i.e. the probabilities P_i of some records increase)
• Goal: minimize learning (i.e. minimize the increase of the probabilities P_i)
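The metric is straightforward to compute. A minimal Python sketch (function names hypothetical; the "observed" distribution is an invented example) that also includes the normalized variant E / ld N, which maps the value to [0, 1] independently of N:

```python
import math

def entropy(p):
    """E = -sum_i P_i * ld(P_i) with ld = log base 2; terms with P_i = 0 add nothing."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def normalized_entropy(p):
    """E divided by its maximum ld(N); 1.0 means all records equally likely (needs N >= 2)."""
    return entropy(p) / math.log2(len(p))

N = 8
uniform = [1 / N] * N
# Hypothetical: after some observations the adversary has narrowed the query to 2 records
observed = [0.5, 0.5] + [0.0] * (N - 2)

print(entropy(uniform), entropy(observed))  # 3.0 1.0
print(normalized_entropy(observed))
```

Perfect privacy would mean the value stays at 3.0 (= ld 8) no matter what the adversary observes; the drop to 1.0 quantifies what was learned.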

PDA – Probabilistic privacy
Security parameters:
• a … number of (sequential) requests to the shuffled and encrypted database
• b … number of random requests to the original database (includes the requested record)
• reshuffling is necessary after N/b queries
[Figure: the SC answers a query using the original database and a shuffled and encrypted copy on the database server; plot of the resulting probability distribution over records 1–10]

Probability distribution:
• Each of the b records requested from the original database: P = (1 - a/N)/b
• All other records: P = (a/N)/(N - b)
Therefore, for a > 0, no record can be completely excluded as the target of the query.
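The distribution can be computed directly. The Python sketch below reads the slide's first case as the b records requested from the original database, which makes the probabilities sum to 1. The parameters N = 10, a = 2, b = 4 are hypothetical and not taken from the slide, and the function names are illustrative only.

```python
import math

def pda_distribution(N, a, b):
    """Access probabilities per the slide's formulas (assumes 0 < b < N, 0 <= a <= N):
    each of the b records read from the original database gets (1 - a/N)/b,
    each of the other N - b records gets (a/N)/(N - b)."""
    p_read = (1 - a / N) / b
    p_other = (a / N) / (N - b)
    return [p_read] * b + [p_other] * (N - b)

def entropy(p):
    """Shannon entropy in bits; terms with probability 0 contribute nothing."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

N, a, b = 10, 2, 4  # hypothetical parameters
p = pda_distribution(N, a, b)
print(round(sum(p), 12))          # 1.0
print(min(p) > 0)                 # True: no record is excluded
print(entropy(p) < math.log2(N))  # True: some privacy is lost compared to uniform
```

Setting a = 0 gives the remaining N - b records probability 0, so the requests to the shuffled database are what keep every record possible.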

Privacy and context

• Combinatorics
• Machine learning
• Use of background knowledge
• Linkage attacks

Cancer
• Breast cancer
• Lung cancer

Male vs. female

Privacy and context
Challenges

Modeling the domain of ITS (Intelligent Transport Systems)
• Ontologies can be used to specify relevant contexts
• Combine contexts with probabilities

Preventing
• contexts from being identified
• contexts from being combined with individuals

Apply methods of anonymization and probabilistic privacy (e.g. shuffle contexts)
• Shannon's entropy definition is applicable (normalized)
• Contexts may change over time (e.g. traffic density)

Pseudonyms (temporary identifiers)
