
Privacy

USC CSci430

Dr. Genevieve Bartlett


USC/ISI
Privacy
• The state or condition of being free from
observation.

2
Privacy
• The state or condition of being free from
observation.

Not really possible today… at least not on the internet.

3
Privacy
• The right of people to choose freely under
what circumstances and to what extent they
will reveal themselves, their attitude, and their
behavior to others.

4
Privacy is not black and white
• Lots of grey areas and points for discussion
• What seems private to you may not seem
private to me
• Three examples to start us off:
– HTTP Cookies
– Google Street View
– Facebook

5
HTTP cookies: What are they?
• Cookies = small text files
• Received from a server, stored on your machine
– Usually a web server
• Purpose: HTTP is stateless, so cookies maintain state for the HTTP protocol
– E.g. keeping the contents of your “shopping cart” while you browse a site

6
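To make the state-keeping concrete, here is a minimal sketch using only Python's standard library (the cookie name and handler are hypothetical, not from the slides): the server sets a cookie, and the browser returns it on every later request, which is what keeps a cart alive across otherwise independent HTTP requests.

```python
# Minimal sketch of cookie-based state over stateless HTTP.
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.cookies import SimpleCookie

class CartHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse any cookie the browser sent back to us.
        cookie = SimpleCookie(self.headers.get("Cookie", ""))
        visits = int(cookie["visits"].value) if "visits" in cookie else 0

        self.send_response(200)
        # Set-Cookie asks the browser to store this value and return it
        # with every later request to this server.
        self.send_header("Set-Cookie", f"visits={visits + 1}")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(f"You have visited {visits} times before.\n".encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CartHandler).serve_forever()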
HTTP cookies: 3rd party cookies
• You visit your favorite site
unicornsareawesome.com
• unicornsareawesome.com pulls ads from
lameads.com
• You get a cookie from lameads.com, even though
you never visited lameads.com
• lameads.com can track your browsing habits
every time you visit any page with ads from
lameads.com… those might be a lot of pages
7
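As a hypothetical sketch (all names invented) of what the tracking looks like server-side: every ad request to lameads.com carries the same third-party cookie plus a Referer header naming the embedding page, so the ad server can accumulate a browsing profile per cookie.

```python
# Hypothetical sketch: building a browsing profile from third-party
# cookie requests. Each ad impression reports the same cookie ID and
# the page that embedded the ad (via the Referer header).
from collections import defaultdict

profiles: dict[str, list[str]] = defaultdict(list)

def log_ad_request(cookie_id: str, referer: str) -> None:
    """Record which page this (pseudonymous) visitor was browsing."""
    profiles[cookie_id].append(referer)

log_ad_request("u123", "https://unicornsareawesome.com/forum")
log_ad_request("u123", "https://example-health-site.com/condition")
print(profiles["u123"])  # one visitor's history across unrelated sites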
HTTP cookies: Grey Area?
• 3rd party cookies allow ad servers to
personalize your ads = more useful to you.
Good!
• But
– You choose to go to unicornsareawesome.com =
ok with unicornsareawesome.com knowing about
how you use their site
– Nowhere did you choose to let lameads.com
monitor your browsing habits

8
Short Discussion:
• Collusion: a browser add-on for tracking these 3rd party cookies
• TED talk on “Tracking the Trackers”
– http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html

9
Google Street View: What is it?
• Google cars drive around and take
360° panoramic pictures.
• Images are stitched together and
can be browsed through on the
Internet

10
Google Street View: Me

11
Google Street View: Lots to See

12
Google Street View: Grey Area
• Expectation of privacy?
– I’m in public, I can expect people will see me
• Expectations?
– Picture linked to location
– Searchable
– Widely available
– Available for a long time to come

13
Facebook: What is it?
• Social networking site
– Connect with friends
– Share pictures, interests (“likes”)

14
Facebook: Grey Area
• Who uses Facebook data and how is data
used?
– 4.7 million liked a page about health conditions or
treatments. Insurance agents?
– 4.8 million shared information about dates of
vacations. Burglars?
– 2.6 million discussed recreational use of alcohol.
Employers?

15
Facebook: More Grey
• Security issues with Facebook
• Confusion over privacy settings
• Sudden changes in default privacy settings
• Facebook tracks browsing habits, even if a user isn’t logged in (third-party cookies)
• Facebook sells user information to ad agencies and behavioral trackers
16
Why start with these examples?
• 3 examples: HTTP cookies, Google Street View,
Facebook
– Lots more “every day” examples
• Users gain benefits by sharing data
• Tons of data is generated, widely shared, accessible, and stored (for how long?)
• Are users really aware of how their data is used, and by whom?

19
Delving into More: Time for PBL

• Break into teams


• As a team, choose a topic from:
https://www.eff.org/issues/privacy 
• Note: EFF is a *pro* privacy organization. Feel free to
examine other resources to gain other perspectives
• Prepare a quick presentation of the issues
1. What are the pros/cons of privacy for your issue?
2. What discussion did it bring up?
3. What technologies help or hurt privacy in this area?
4. How does more/less privacy in this area affect security?

20
Today’s Agenda
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

21
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

22
Examples of private information
• Tons of information can be gained from Internet use:
– Behavior
• E.g. Person X reads reddit.com at work.
– Preferences
• E.g. Person Y likes high heel shoes and uses Apple products.
– Associations
• E.g. Person X and Person Y are friends.
– PPI (private, personal/protected information)
• Credit card #s, SSNs, nicknames, addresses
– PII (personally identifying information)
• E.g. Your age + your address = I know who you are, even if I’m not given your name.

23
How do we achieve privacy?
• Policy + security mechanisms + law + ethics + trust
• Anonymity & Anonymization mechanisms
– Make each user indistinguishable from the next
– Remove PPI & PII
– Aggregate information

24
Who wants private info?
• Governments – surveillance
• Businesses – targeted advertising, following
trends
• Attackers – monetize information or cause
havoc
• Researchers – medical, behavioral, social,
computer

25
Who has private info?
• You and me
– End-users
– Customers
– Patients
• Businesses
– Protect mergers, product plans, investigations
• Government & law enforcement
– National security
– Criminal investigations
26
Privacy and Security
• Security enables privacy
– Data is only as safe as the system it’s on
• Sometimes security is at odds with privacy
– E.g. security requires authentication, but privacy is achieved through anonymity
– E.g. TSA pat-down at the airport

27
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

28
Why do we want to share?
• Share existing data sets:
– Research
– Companies
• Buy data from each other
• Check out each other’s assets before mergers/buyouts
• Start a new dataset:
– Mutually beneficial relationships
• Share data with me and you can use this service

29
Sharing everything?
• Easy, but what are the ramifications?
• Legal/policy may limit what can be
shared/collected
– IRBs: Institutional Review Boards
– HIPAA: Health Insurance Portability and Accountability Act
– HITECH: Health Information Technology for Economic and Clinical Health Act
• Future use and protection of data?

30
Mechanisms for limited sharing
• Remove really sensitive stuff (sanitization)
– PPI & PII (private/protected & personally identifying information)
– Without a crystal ball, this is hard
• Anonymization
– Replace information to limit ability to tie entities
to meaningful identities
• Aggregation
– Remove PII by only collecting/releasing statistics

31
Anonymization Example
• Network trace:

PAYLOAD

32
Anonymization Example
• Network trace:

PAYLOAD

All sorts of PII and PPI in there!

33
Anonymization Example
• Network trace:

PAYLOAD

Routing information: IP addresses, TCP flags/options, OS fingerprinting

34
Anonymization Example
• Network trace:

PAYLOAD

Remove IPs? Anonymize IPs?

35
Anonymization Example
• Network trace:

PAYLOAD

Removing IPs severely limits what you can do with the data.
Instead, replace each IP with a consistent label that identifies the host within the trace, but isn’t the real address:

IP1 = A
IP2 = B
etc.

36
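A minimal sketch of this consistent-relabeling idea (my own illustration, not a specific anonymization tool): each distinct IP gets a stable random label, so flows in the trace stay linkable without exposing real addresses.

```python
# Sketch of consistent IP pseudonymization: the same input IP always
# maps to the same opaque label (IP1 = A, IP2 = B, ...).
import secrets

_label_for_ip: dict[str, str] = {}

def pseudonymize(ip: str) -> str:
    if ip not in _label_for_ip:
        _label_for_ip[ip] = "host-" + secrets.token_hex(4)
    return _label_for_ip[ip]

print(pseudonymize("192.0.2.10"))    # e.g. host-a1b2c3d4
print(pseudonymize("192.0.2.10"))    # same label again: traffic stays linkable
print(pseudonymize("198.51.100.7"))  # a different host gets a different label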
Aggregation Example
• “Fewer U.S. Households Have Debt, But
Those Who Do Have More, Census Bureau
Reports”

37
Methods can be bad or good
• Just because someone uses aggregation or
anonymization, doesn’t mean the data is safe
• Example:
– Release aggregate stats of people’s favorite color?

38
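One way the favorite-color release can go wrong, as a hypothetical sketch: aggregate counts with very small bins can still pin an answer to a single person.

```python
# Sketch: aggregation alone doesn't guarantee safety. A bin of size 1
# identifies one respondent's answer to anyone who knows who took part.
from collections import Counter

responses = {"alice": "red", "bob": "red", "carol": "chartreuse"}
stats = Counter(responses.values())
print(stats)  # Counter({'red': 2, 'chartreuse': 1})

# If you know Alice, Bob, and Carol were the only respondents, the
# count of 1 reveals Carol's choice. Safer releases suppress or merge
# bins below a minimum size (the idea behind k-anonymity).
safe = {color: n for color, n in stats.items() if n >= 2}
print(safe)  # {'red': 2}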
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

39
What is Inferred?
• Take 2 sources of information, correlate data
• X + Y = ….
• Example: Google Street View + what my car
looks like + where I live = you know where I
was back in November

40
Another example
• Paula Broadwell, who had an affair with CIA director David Petraeus, took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated hotel registration data from several different hotels -- and hers was the common name.
41
Another example: Netflix & IMDB
• Netflix Prize: Netflix released an anonymized movie-ratings dataset
• Researchers at the University of Texas correlated it with public IMDB reviews, undoing the anonymization

42
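A toy sketch of the correlation idea (invented data; the real attack matched rare rating-plus-date patterns statistically rather than exactly): an “anonymized” record is re-identified by matching it against a public profile.

```python
# Toy sketch of re-identification by correlating two datasets. The
# "anonymized" Netflix-style record matches a public IMDB-style profile
# on a distinctive set of ratings, linking the pseudonym to a name.
anonymized = {"user_17": {("MovieA", 5), ("MovieB", 1), ("MovieC", 4)}}
public_reviews = {"jane_doe": {("MovieA", 5), ("MovieB", 1), ("MovieC", 4)}}

for anon_id, ratings in anonymized.items():
    for name, public in public_reviews.items():
        overlap = len(ratings & public)
        if overlap >= 3:  # enough rare ratings in common to be distinctive
            print(f"{anon_id} is probably {name} ({overlap} ratings match)")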
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

43
What is social networking data?
• Associations
• Not what you say, but who you talk to

OMG NEW BOYFRIEND

44
Why is social data interesting?
• From a privacy point of view:
– Guilt by association
– E.g. governments are very interested
• Phone records (US)
• Facebook activity (Iran)

45
Computer Communication
• Computer communication = social network
• What sites/servers you visit/use = information on
your relationship with those sites/servers

You ↔ Unicornsareawesome.com

• Never mind the content… how often you visit and who you visit may reveal a lot!
46
How do we provide privacy?

• Of course encrypt content (payload)!
• But: network/transport layer = no encryption (for now)
• Anyone along the path can see source and destination… so now what?
47
Onion Routing
• General idea: bounce connection through a
bunch of machines

48
Don’t we bounce around already?

Not actually what happens…


49
Don’t we bounce around already?

Closer to what actually happens.


50
Don’t we bounce around already?
• Yes, we route packets through a series of
routers
• BUT this doesn’t protect the privacy of who’s
talking to whom…
• Why? PAYLOAD

51
Don’t we bounce around already?
• Yes, we route packets through a series of
routers
• BUT this doesn’t protect the privacy of who’s talking to whom…
• Why? ENCRYPTED

Contains routing information.

52
Yes, we bounce… but:
• Everyone along the way can see src & dst
• Routes are easy to figure out

ENCRYPTED

Contains routing information = Can’t encrypt


Everyone along the path (routers and observers) can see who is talking to whom

53
Onion routing saves us
• Each router only knows about the last/next
hop
• Routes are hard to figure out
– Change frequently
– Chosen by the source

54
The Onion part of Onion Routing
• Layers of encryption

PAYLOAD

Last hop’s key

Second hop’s key

First hop’s key


55
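A minimal sketch of the layering, assuming the third-party `cryptography` package (real Tor uses its own circuit-level crypto, so this illustrates the idea only): the sender wraps the payload once per hop, innermost layer under the last hop’s key, and each relay peels exactly one layer.

```python
# Onion-layer sketch: encrypt once per hop, innermost layer under the
# LAST hop's key; each relay strips one layer and learns only the next
# blob, never the full route or (until the last hop) the payload.
from cryptography.fernet import Fernet

hop_keys = [Fernet.generate_key() for _ in range(3)]  # 1st, 2nd, 3rd hop

payload = b"GET / HTTP/1.1 ..."
packet = payload
for key in reversed(hop_keys):      # wrap: last hop's layer goes on first
    packet = Fernet(key).encrypt(packet)

for i, key in enumerate(hop_keys):  # each hop peels exactly one layer
    packet = Fernet(key).decrypt(packet)
    print(f"after hop {i + 1}: {len(packet)} bytes remain")

assert packet == payload  # original payload emerges at the last hop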
Onion Routing Example: Tor

You

Unicornsareawesome.com
56
Onion Routing Example: Tor

You Tor Router IPs + public key for each router


Tor directory

Get a list of Tor routers from the publicly known Tor directory

57
Onion Routing Example: Tor
Tor Routers

You

Unicornsareawesome.com
58
Onion Routing Example: Tor

You 1st

2nd

3rd

Choose a set of Tor routers to use


Unicornsareawesome.com
59
Onion Routing Example: Tor

You 1st

2nd

3rd

Packets are now encrypted with 3 keys

Unicornsareawesome.com


60
Onion Routing Example: Tor
Source: YOU, Dest: 1st Tor router

You 1st

2nd

3rd

Unicornsareawesome.com
61
Onion Routing Example: Tor

Decrypts 1st layer


You 1st

2nd

3rd

Unicornsareawesome.com
62
Onion Routing Example: Tor

Source: 1st Tor router, Dest: 2nd Tor router

You 1st

2nd

3rd

Unicornsareawesome.com
63
Onion Routing Example: Tor

You 1st

Decrypts 2nd layer


2nd

3rd

Unicornsareawesome.com
64
Onion Routing Example: Tor

You 1st

2nd

Source: 2nd Tor router, Dest: 3rd Tor router

3rd

Unicornsareawesome.com
65
Onion Routing Example: Tor

You 1st

2nd

Decrypts last layer


3rd

Unicornsareawesome.com
66
Onion Routing Example: Tor

You 1st

2nd

3rd

Source: 3rd Tor router, Dest: Unicornsareawesome.com

Original (unencrypted) packet sent to server.


Unicornsareawesome.com
67
What does our attacker see?

You

Encrypted traffic from You, to 1st Tor router


68
What does our attacker see?

You

Other vantage points? Not easily traceable to you.


69
What does our attacker see?

A global vantage point? Very unlikely… but if so, trouble!


70
What does our attacker see?

Also unlikely… an attacker who sees both ends can correlate end-to-end traffic.


71
Reliance on multiple users

You

What would happen here if You were the only one using Tor?
72
Side note: Tor is an overlay

Tor routers are often just someone’s regular machine.


Traffic is still routed over regular routers too.

73
Onion Routing: Things to Note
• Not perfect, but pretty nifty
• End host (unicornsareawesome.com) does not
need to know about the Tor protocol (good for
wide usage and acceptance)
• Data is encrypted all the way to the last Tor
router
– If end-to-end application (like HTTPS) is using
encryption, the payload is doubly encrypted along
the Tor route.
74
Onion Services (Hidden Services)
• Built on Tor circuits
• Key feature: both who’s hosting the service and who’s visiting it are anonymous
• One of several technologies of the “dark” web
– “Dark” typically refers to services accessed via P2P/rendezvous
– The deep web is anything not directly indexable (including dynamic pages, password-protected pages, etc.)
• https://www.torproject.org/docs/onion-services.html.en
75
• Privacy and Privacy & Security
• How do we “safely” share private data?
• Privacy and Inferred Information
• Privacy and Social Networks
• How do we design a system with privacy in
mind?

76
Designing privacy preserving systems
• Aim for the minimum amount of information needed to achieve goals
• Think through how info can be gained and inferred
– Inference is often a gotcha! x + y = something private, but x and y by themselves don’t seem all that special
• Think through where information can be gained
– On the wire? Stored in logs? At a router? At an ISP?

77
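As a hypothetical sketch of the minimum-information principle applied to the “stored in logs?” question (my own illustration): log a keyed hash of the client IP instead of the IP itself, which still supports abuse counting and rate-limiting but not direct identification, and discarding the key unlinks old logs.

```python
# Data-minimization sketch: store a keyed hash of each client IP rather
# than the raw address. Counting and rate-limiting still work; reading
# identities back out of the logs does not (without the key).
import hashlib
import hmac
import os

LOG_KEY = os.urandom(32)  # rotate periodically; discarding it unlinks old logs

def log_token(client_ip: str) -> str:
    mac = hmac.new(LOG_KEY, client_ip.encode(), hashlib.sha256)
    return mac.hexdigest()[:12]  # short, stable token for this key period

print(log_token("192.0.2.10"))  # same token each time while the key lives
print(log_token("192.0.2.10"))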
Privacy and Stored Information
• Data is only as safe as the system it’s stored on
• How long the data is stored affects privacy
• Longer term = bigger privacy risk (in general)
– Longer time frame, more data to correlate & infer
– Longer opportunity for data theft
– Increased chances of mistakes, lapsed security, etc.

78
