
DATA STORAGE

VIEWPOINT OF PRIVACY
Slides from Prof. Johann-Christoph Freytag (Humboldt University, Berlin)
Outline

• Privacy
• Privacy and context
• Privacy and mobility
• Privacy and context combined with privacy and mobility

PRECIOSA kick off meeting


Paris, 11.04.2008
HU Berlin: Prof. Johann-Christoph Freytag, Dipl.-Inf. Martin Kost
Privacy
Privacy of movement

RFID

B-O-333
Is it always obvious?

• Is it always obvious that privacy is violated or breached?

• Latanya Sweeney's finding [Sween’01]
  • In Massachusetts, USA, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees
  • GIC has to publish the data:

GIC(zip, dob, sex, diagnosis, procedure, ...)   (dob = date of birth)
http://lab.privacy.cs.cmu.edu/people/sweeney/

Latanya Sweeney's Finding

• Sweeney paid $20 and bought the voter registration list for Cambridge, MA:

GIC(zip, dob, sex, diagnosis, procedure, ...)
VOTER(name, party, ..., zip, dob, sex)

• William Weld (former governor) lives in Cambridge, hence is in VOTER
  • 6 people in VOTER share his dob
  • only 3 of them were men (same sex)
  • Weld was the only one in that zip
• By joining GIC and VOTER on (zip, dob, sex), Sweeney learned Weld's medical records!
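The linkage step can be sketched in a few lines of Python. This is only an illustration: all records, names and diagnoses below are invented, and the helper names `group_by_key` and `link` are hypothetical. A person counts as re-identified when exactly one voter and exactly one medical record share the same (zip, dob, sex) combination.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

# Toy data; every value here is invented for illustration only.
GIC = [
    {"zip": "02138", "dob": "1950-01-01", "sex": "M", "diagnosis": "diagnosis A"},
    {"zip": "02138", "dob": "1950-01-01", "sex": "F", "diagnosis": "diagnosis B"},
    {"zip": "02139", "dob": "1950-01-01", "sex": "M", "diagnosis": "diagnosis C"},
]
VOTER = [
    {"name": "Person 1", "zip": "02138", "dob": "1950-01-01", "sex": "M"},
    {"name": "Person 2", "zip": "02138", "dob": "1950-01-01", "sex": "F"},
    {"name": "Person 3", "zip": "02139", "dob": "1950-01-01", "sex": "M"},
    {"name": "Person 4", "zip": "02139", "dob": "1950-01-01", "sex": "M"},
]


def group_by_key(records, keys=QUASI_IDENTIFIERS):
    """Group records by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return groups


def link(voters, medical):
    """Return {name: diagnosis} for every unambiguous match between the tables."""
    voter_groups = group_by_key(voters)
    medical_groups = group_by_key(medical)
    reidentified = {}
    for key, people in voter_groups.items():
        records = medical_groups.get(key, [])
        # Only a one-to-one match reveals whose record it is.
        if len(people) == 1 and len(records) == 1:
            reidentified[people[0]["name"]] = records[0]["diagnosis"]
    return reidentified


reidentified = link(VOTER, GIC)
```

In this toy data, Person 3 and Person 4 share all three attributes, so the third medical record cannot be attributed to either of them; the other two people are re-identified.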
Latanya Sweeney's Finding

• Observation: all systems worked as specified, yet important data has leaked
  • "Information leakage" occurred
  • It occurred even though all "systems" worked as specified
  • Beyond correctness!
  • What's missing?

• How do we protect against that kind of "lack (leakage) of privacy"?
Data Security

• Dorothy Denning, 1982:
  "Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification"

• Data Security = Confidentiality + Integrity (+ Availability)
• Distinct from system and network security
What is Privacy?

• Definition 1: [Sween’02]
  “Privacy reflects the ability of a person, organization, government, or entity to control its own space, where the concept of space (or “privacy space”) takes on different contexts”.
  • Physical space, against invasion
  • Bodily space, medical consent
  • Computer space, spam
  • Web browsing space, Internet privacy

• Definition 2: [Agrawal’03]
  “Privacy is the right of individuals to determine for themselves when, how, and to what extent information about them is communicated to others”.
  (We shall call this data/information privacy)
Anonymity and unobservability

[Figure: an anonymity group and an «event» (message or access)]

Everybody could be the originator of an «event» with an equal likelihood
Approaches for non-observable communication

• Whom to protect?
  • sender
  • (content of message)
• Basic approach:
  • Dummy traffic
  • Proxies
• MIX-Networks
• DC-Networks
• … more

[Figure labels: Message, Access, Anonymity group, Events]
Maintaining data privacy for accessing databases

k-anonymity and its properties, introduced by Sweeney [Sween’02]

An example: Medical Records
[Aggarwal’03]
Identifying: SoSecN, Name. Sensitive information: Disease.

SoSecN  Name     Age  Ethnic B  Zipcode  Disease
007     Chris    07   Caucas    12344    Arthritis
009     Jane     77   Caucas    53211    Cold
011     Adam     28   Caucas    70234    Heart problem
023     Charlie  27   Afr-Amer  95505    Flu
034     Eve      27   Afr-Amer  54327    Arthritis
054     Yvonne   44   Hispanic  12007    Diabetes
099     John     65   Hispanic  12007    Flu
Medical Records: De-identify & Release

Sensitive: Disease

Age  Ethnic B  Zipcode  Disease
07   Caucas    12344    Arthritis
77   Caucas    53211    Cold
28   Caucas    70234    Heart problem
27   Afr-Amer  95505    Flu
27   Afr-Amer  54327    Arthritis
44   Hispanic  12007    Diabetes
65   Hispanic  12007    Flu
Not sufficient!

Sensitive: Disease

Age  Ethnic B  Zipcode  Disease
07   Caucas    12344    Arthritis
77   Caucas    53211    Cold
28   Caucas    70234    Heart problem
27   Afr-Amer  95505    Flu
27   Afr-Amer  54327    Arthritis
44   Hispanic  12007    Diabetes
65   Hispanic  12007    Flu

• Age, Ethnic B and Zipcode, combined with a public database, uniquely identify you!
• Quasi-identifiers: reveal less information (k-anonymity model)
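A quick way to see why removing SoSecN and Name is not sufficient is to count how many released rows share each combination of (Age, Ethnic B, Zipcode). The following sketch uses the seven records from the slide; the function name `group_sizes` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# De-identified records from the slide: (age, ethnic background, zipcode, disease).
RELEASED = [
    ("07", "Caucas", "12344", "Arthritis"),
    ("77", "Caucas", "53211", "Cold"),
    ("28", "Caucas", "70234", "Heart problem"),
    ("27", "Afr-Amer", "95505", "Flu"),
    ("27", "Afr-Amer", "54327", "Arthritis"),
    ("44", "Hispanic", "12007", "Diabetes"),
    ("65", "Hispanic", "12007", "Flu"),
]


def group_sizes(rows, n_quasi=3):
    """Count rows per combination of the first n_quasi (quasi-identifier) columns."""
    return Counter(row[:n_quasi] for row in rows)


sizes = group_sizes(RELEASED)
smallest_group = min(sizes.values())  # size of the smallest anonymity group
unique_rows = sum(1 for size in sizes.values() if size == 1)
```

Here every combination occurs exactly once, so anyone who knows a person's age, ethnic background and zipcode can pick out that person's disease.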
k-anonymity – Problem Definition

• Input: Database consisting of n rows, each with m attributes
  • Set of domain values for attributes is finite
• Goal: Suppress some entries in the table such that each modified row becomes identical to at least k-1 other rows.
• Objective: Minimize the number of suppressed entries.

Medical Records: 2-anonymized table

Age  Ethnic B  Zipcode  Disease
*    Caucas    *        Arthritis
*    Caucas    *        Cold
*    Caucas    *        Heart problem
27   Afr-Amer  *        Flu
27   Afr-Amer  *        Arthritis
*    Hispanic  12007    Diabetes
*    Hispanic  12007    Flu
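The 2-anonymity of such a table can be checked mechanically. The sketch below starts from the quasi-identifier columns of the medical records example, applies a suppression mask that mirrors the stars in the table above, and then tests the k-anonymity condition and counts the suppressed entries. The names `suppress`, `is_k_anonymous` and `suppression_cost` are hypothetical helpers; the code only verifies a given suppression and does not search for a minimal one.

```python
from collections import Counter

# Quasi-identifier columns (age, ethnic background, zipcode) of the original records.
ORIGINAL = [
    ("07", "Caucas", "12344"),
    ("77", "Caucas", "53211"),
    ("28", "Caucas", "70234"),
    ("27", "Afr-Amer", "95505"),
    ("27", "Afr-Amer", "54327"),
    ("44", "Hispanic", "12007"),
    ("65", "Hispanic", "12007"),
]

# True marks an entry that is suppressed (replaced by "*").
MASK = [
    (True, False, True),
    (True, False, True),
    (True, False, True),
    (False, False, True),
    (False, False, True),
    (True, False, False),
    (True, False, False),
]


def suppress(rows, mask):
    """Replace every masked entry by '*'."""
    return [
        tuple("*" if hide else value for value, hide in zip(row, row_mask))
        for row, row_mask in zip(rows, mask)
    ]


def is_k_anonymous(rows, k):
    """Each row must be identical to at least k-1 other rows."""
    return all(count >= k for count in Counter(rows).values())


def suppression_cost(mask):
    """Number of suppressed entries (the quantity the objective minimizes)."""
    return sum(sum(row_mask) for row_mask in mask)


released = suppress(ORIGINAL, MASK)
two_anonymous = is_k_anonymous(released, 2)
three_anonymous = is_k_anonymous(released, 3)
cost = suppression_cost(MASK)
```

With this mask the released rows fall into groups of size 3, 2 and 2, so the table is 2-anonymous but not 3-anonymous, at a cost of 10 suppressed entries.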


Accessing databases privately (Access privacy)

Patent-DB

First (naïve) approach

• Problem to solve:
  • User/Client: no one should know the contents of the query nor the result (not even the server)
• Observation:
  Encrypting the communication between client and server might not be sufficient (an adversary might access the decrypted query if he can get "inside" the database system and if he can observe disk access)
• Naïve solution:
  • Client downloads the entire DB & executes queries locally: an unrealistic solution (size & ownership of data)

[Figure: USER/CLIENT sends a query to the DB SERVER (holding the DB) and receives the result]
Accessing databases privately (Access privacy)

Simple Solution: [Asonov’01]

• Use a "Secure Coprocessor" (SC)
  Proven hardware properties:
  • Cannot "observe" computation from outside
  • If tampered with, self-destruction occurs
• Read entire database per query → O(N)

[Figure: Database Server with records 1 to 7; the IBM 4758 Secure Coprocessor (SC) reads the entire database and returns record x encrypted]
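The access pattern of this scheme can be illustrated with a small simulation. It is a sketch under strong assumptions: the function `private_read` is hypothetical, encryption and tamper resistance are not modeled, and the comparison inside the loop is assumed to happen inside the coprocessor where it cannot be observed. The point is only that the sequence of records touched on the server is the same for every query.

```python
def private_read(database, wanted_index):
    """Read every record once; only the (simulated) SC knows which one is kept."""
    result = None
    touched = []  # what an observer of the server's disk accesses would see
    for index, record in enumerate(database):
        touched.append(index)
        if index == wanted_index:  # assumed to be hidden inside the SC
            result = record
    return result, touched


DATABASE = [f"record {i}" for i in range(1, 8)]  # seven toy records

answer_a, trace_a = private_read(DATABASE, 2)
answer_b, trace_b = private_read(DATABASE, 5)
```

Both queries produce the same trace of N accesses, which is why the observer learns nothing about the requested record and why each query costs O(N).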

Metric (Probabilistic Privacy)

• Using (Shannon's) entropy definition to "measure" privacy:
  • $P_i$ ... probability that the query accesses record $i$
  • $E$ ... uncertainty of the adversary's observation

$$E = -\sum_{i=1}^{N} P_i \cdot \operatorname{ld} P_i$$

  (ld = logarithm to base 2)

• $E$ is maximal if all $P_i$ have the same value
  • i.e. the adversary cannot give some values stored in the db a higher probability of being accessed than others

• Perfect privacy: $E$ does not change by observations
• Probabilistic privacy: the adversary learns by observation (i.e. increases probability $P_i$ for some records)
• Goal: minimize learning (i.e. minimize the increase of the probabilities $P_i$)
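The entropy measure is easy to evaluate numerically. The following sketch computes E in bits for three invented distributions over N = 8 records: a uniform one (the adversary knows nothing), a skewed one (the adversary has learned something) and a degenerate one (the adversary knows the record). The distributions are illustrative assumptions, not values from the slides.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; terms with P_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


N = 8
uniform = [1 / N] * N                # every record equally likely
skewed = [0.65] + [0.05] * (N - 1)   # one record is much more likely
certain = [1.0] + [0.0] * (N - 1)    # the accessed record is known

e_uniform = entropy(uniform)
e_skewed = entropy(skewed)
e_certain = entropy(certain)
```

The uniform distribution reaches the maximum ld N = 3 bits; any observation that shifts probability toward some records lowers E.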
PDA – Probabilistic privacy
Security Parameters:
• a … # of (sequential) requests to shuffled & encrypted database
• b … # of random requests to original database (… includes requested record)
• reshuffling after N/b queries necessary
[Figure: Database Server holding a shuffled and encrypted database and the original database (records 1 to 10); the SC answers the query]

Probability distribution
• Each of the b records requested from the original database: P = (1 - a/N)/b
• Others: P = (a/N)/(N - b)

Therefore, no record can be completely excluded from the query.

[Plot: probability distribution over records 1 to 10; vertical axis from 0 to 0.25]
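The two formulas can be checked for consistency with exact arithmetic. In this sketch the parameter values N = 10, a = 2 and b = 4 are chosen only for illustration (they are not taken from the slides), and `access_distribution` is a hypothetical helper name. The check confirms that the probabilities over all N records sum to 1 and that every record keeps a non-zero probability as long as a > 0.

```python
from fractions import Fraction


def access_distribution(n, a, b):
    """Return (P for each of the b original-database records, P for each other record)."""
    if not (0 < b < n and 0 <= a <= n):
        raise ValueError("expected 0 < b < n and 0 <= a <= n")
    p_original = (1 - Fraction(a, n)) / b
    p_other = Fraction(a, n) / (n - b)
    return p_original, p_other


N, A, B = 10, 2, 4  # illustrative values only
p_original, p_other = access_distribution(N, A, B)
total = B * p_original + (N - B) * p_other
```

For these values the b requested records each get probability 1/5 and the remaining records each get 1/30, so the adversary can rank records but cannot rule any of them out.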
Privacy and context

• Combinatorics
• Machine learning
• Use of background knowledge → linkage attacks

• Cancer
  • Breast cancer
  • Lung cancer

• Male vs. female

Challenges

• Modeling the domain of ITS (intelligent transportation systems)
  • Ontologies can be used to specify relevant contexts
  • Combine contexts with probabilities

• Preventing
  • contexts from being identified
  • contexts from being combined with individuals
  • Apply methods of anonymization and probabilistic privacy (e.g. shuffle contexts)
  • Shannon's entropy definition applicable (normalized)

• Contexts may change over time (e.g. traffic density)
  • Pseudonyms (temporary identifiers)
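One common way to read "normalized" is to divide the entropy by its maximum value ld N, so that context sets of different sizes become comparable. This reading is an assumption; the slides do not spell out the normalization. With $P_i$ the probability that an individual is associated with context $i$ out of $N$ contexts (for $N > 1$):

```latex
\[
E_{\mathrm{norm}} \;=\; \frac{E}{\operatorname{ld} N}
\;=\; -\frac{1}{\operatorname{ld} N} \sum_{i=1}^{N} P_i \cdot \operatorname{ld} P_i ,
\qquad 0 \le E_{\mathrm{norm}} \le 1 .
\]
```

Under this reading, a value of 1 means all contexts are equally likely for the adversary, and a value of 0 means the context is known with certainty.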
