
XRDS

Crossroads, The ACM Magazine for Students • Fall 2013 • Vol. 20 • No. 1 • xrds.acm.org

The Complexities of Privacy and Anonymity
Why You Should Care About Cryptocurrencies
Understanding the Public vs. Private Debate
Learning @ Scale
Atlanta, Georgia, March 4–5, 2014

Call for Papers

ACM will host the first ACM Conference on Learning at Scale, to be held March 4-5, 2014 in Atlanta, GA. Inspired by the emergence of Massive Open Online Courses (MOOCs) and the shift in thinking about education, ACM created this conference as a new scholarly venue to explore how learning and teaching can improve when done “at scale.”

About the Conference
Learning at Scale refers to new approaches for students to learn and teachers to teach when engaging hundreds or even thousands of students, be it face-to-face or remotely, synchronous or asynchronous. Topics will include (but are not limited to) Usability Studies, Tools for Automated Feedback and Grading, Learning Analytics, Investigation of Student Behavior and Correlation with Learning Outcomes, and New Learning Tools and Techniques at Scale.

Important Dates
November 8, 2013: Paper submissions due.
November 8, 2013: Tutorial proposals due.
December 23, 2013: Notification to authors of accepted full papers.
January 2, 2014: Work-in-progress submissions due.
January 14, 2014: Notification to authors of accepted work-in-progress.
January 17, 2014: All revised and camera-ready material due.

➤ Authors are encouraged to visit the Learning @ Scale website (http://learningatscale.acm.org/) for details about the conference and guidelines for paper submissions, including an array of subject examples to explore.
➤ Authors are encouraged to consider submitting a full paper (10 pages max); a tutorial proposal for a 1-4 hour discussion on a relevant tool, technology, or methodology related to learning at scale; or a work-in-progress poster or demo.
➤ Papers must tackle topics “at scale.”
Be more than a word on paper.

Join XRDS as a student editor and be the voice for students worldwide. If you are interested in volunteering as a student editor, please contact xrds@hq.acm.org with "Student Editor" in the subject line.


begin
05 letter from the editors

08 INBOX

09 INIT
Keeping Your Little Back Shop
By Maire Byrne-Evans
and Christine Task

10 benefit
XRDS Mobilizes
By Debarka Sengupta

10 advice
Managing Your Time
By Vaggelis Giannikas

11 updates
Revitalizing ACM Student Chapters
By Michael Zuba

12 BLOGS
Algorithms Fit for Compilation?
By Olivia Simpson
Habits: Our cognitive shortcut
By Gidi Nave
Coordination When Information
is Scarce: How privacy can help
By Aaron Roth
The New Firefox Cookie Policy
By Jonathan Mayer

Cover Illustration by Patrick George

2 XRDS • fall 2013 • Vol.20 • No.1


The Complexities of Privacy and Anonymity


features

18 feature
What is Public and Private Anyway?
A Pragmatic Take on Privacy and Democracy
By Andreas Birkbak

22 feature
Something Bad Might Happen:
Lawyers, anonymization, and risk
By Marion Oswald

27 feature
Personal, Pseudonymous, and Anonymous Data:
The problem of identification
By Iain Bourne

32 feature
Talking ‘Bout Your Reputation
By David Birch

36 feature
Understanding the Data Environment
By Elaine Mackey and Mark Elliot

40 feature
What is Bitcoin?
By Dominic Hobson

45 feature
The Tor Project: An inside view
By Kelley Misata

48 feature
It’s Not About Winning, It’s About Sending a Message:
Hiding information in games
By Philip C. Ritchey

53 feature
An Illustrated Primer in Differential Privacy
By Christine Task

58 interview
Cynthia Dwork on Differential Privacy
By Michael Zuba

60 profile
Jessica Staddon: Managing Google’s privacy research
By Adrian Scoică

end

62 labz
CyLab Usable Privacy and Security Laboratory
By Rich Shay

64 back
WLAN Security
By Finn Kuusisto

65 hello world
Zero-Knowledge Proofs
By Marinka Zitnik

68 events

70 acronyms

70 pointers

72 bemusement

Image for page 32 by Vyacheslav Pokrovskiy. Image for page 69 by S. Borisov.



Computing Reviews is on the move
Our new URL is ComputingReviews.com
A daily snapshot of what is new and hot in computing.

EDITORIAL BOARD

Editors-in-Chief
Peter Kinnaird, Carnegie Mellon University, USA
Inbal Talgam-Cohen, Stanford University, USA

Departments Chief
Vaggelis Giannikas, University of Cambridge, UK

Issue Editors
Maire Byrne-Evans, University of Southampton, UK
Christine Task, Purdue University, USA

Issue Feature Editor
Richard Gomer, University of Southampton, UK

Feature Editors
Jed Brubaker, University of California Irvine, USA
Erin Claire Carson, University of California Berkeley, USA
Ryan Kelly, University of Bath, UK
Hannah Pileggi, Georgia Institute of Technology, USA
Michael Zuba, University of Connecticut, USA

Department Editors
Arka Bhattacharya, National Institute of Technology, India
Luigi De Russis, Politecnico di Torino, Italy
Rohit Goyal, West Chester East High School, USA
John Kloosterman, University of Michigan, USA
Finn Kuusisto, University of Wisconsin-Madison, USA
Ashok Rao, University of Pennsylvania, USA
Debarka Sengupta, Indian Statistical Institute, India
Adrian Scoică, University of Cambridge, UK
Marinka Zitnik, University of Ljubljana, Slovenia

Marketing Editor
Casey Fiesler, Georgia Institute of Technology, USA

Web Editor
Shelby Solomon Darnell, Clemson University, USA

ADVISORY BOARD
Mark Allman, International Computer Science Institute
Bernard Chazelle, Princeton University
Laurie Faith Cranor, Carnegie Mellon
Alan Dix, Lancaster University
David Harel, Weizmann Institute of Science
Panagiotis Takis Metaxas, Wellesley College
Noam Nisan, Hebrew University Jerusalem
Bill Stevenson, Apple, Inc.
Andrew Tuson, City University London
Jeffrey D. Ullman, InfoLab, Stanford University
Moshe Y. Vardi, Rice University

EDITORIAL STAFF
Director, Group Publishing: Scott E. Delman
XRDS Managing Editor & Senior Editor at ACM HQ: Denise Doig
Production Manager: Lynn D’Addessio
Art Direction: Andrij Borys Associates, Andrij Borys, Mia Balaquiot
Director of Media Sales: Jennifer Ruzicka, jen.ruzicka@acm.org
Copyright Permissions: Deborah Cotton, permissions@acm.org
Public Relations Coordinator: Virginia Gold

ACM
Association for Computing Machinery
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
+1 212-869-7440

CONTACT
General feedback: xrds@acm.org
For submission guidelines, please see http://xrds.acm.org/authorguidelines.cfm

PUBLICATIONS BOARD
Co-Chairs: Ronald F. Boisvert and Jack Davidson
Board Members: Nikil Dutt, Carol Hutchins, Joseph A. Konstan, Ee-Peng Lim, Catherine C. McGeoch, M. Tamer Ozsu, Vincent Y. Shen, Mary Lou Soffa

SUBSCRIBE
Subscriptions ($19 per year includes XRDS electronic subscription) are available by becoming an ACM Student Member: www.acm.org/membership/student
Non-member subscriptions: $80 per year, http://store.acm.org/acmstore

ACM Member Services
To renew your ACM membership or XRDS subscription, please send a letter with your name, address, member number, and payment to: ACM General Post Office, P.O. Box 30777, New York, NY 10087-0777 USA

Postal Information
XRDS (ISSN# 1528-4981) is published quarterly in spring, winter, summer and fall by Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121. Application to mail at Periodical Postage rates is paid at New York, NY and additional mailing offices.
POSTMASTER: Send address changes to: XRDS: Crossroads, Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121.

Offering# XRDS0171
ISSN# 1528-4972 (print)
ISSN# 1528-4980 (electronic)

Copyright ©2013 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page or initial screen of the document. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, republish, post on servers, or redistribute requires prior specific permission and a fee. Permissions requests: permissions@acm.org.

JOIN THE INNOVATION.
Qatar Computing Research Institute seeks talented scientists and software engineers to join our team and conduct world-class applied research focused on tackling large-scale computing challenges.

We offer unique opportunities for a strong career spanning academic and applied research in the areas of Arabic language technologies including natural language processing, information retrieval and machine translation, distributed systems, data analytics, cyber security, social computing and computational science and engineering.

Scientist applicants must hold (or will hold at the time of hiring) a PhD degree, and should have a compelling track record of accomplishments and publications, strong academic excellence, effective communication and collaboration skills.

Software engineer applicants must hold a degree in computer science, computer engineering or related field; MSc or PhD degree is a plus.

We also welcome applications for post-doctoral researcher positions.

As a national research institute and proud member of Qatar Foundation, our research program offers a collaborative, multidisciplinary team environment endowed with a comprehensive support infrastructure.

Successful candidates will be offered a highly competitive compensation package including an attractive tax-free salary and additional benefits such as furnished accommodation, excellent medical insurance, generous annual paid leave, and more.

For full details about our vacancies and how to apply online please visit http://www.qcri.qa/join-us/
For queries, please email QFJobs@qf.org.qa
/QCRI.QA @QatarComputing QatarComputing QatarComputing www.qcri.qa



LETTER FROM THE EDITORS

Equip Yourself for Creativity

A privacy issue of XRDS couldn’t be better timed. Given the rapid and continuing revelations about the NSA from Edward Snowden, anything we write here might be out of date by the time it reaches you. One piece that we feel will remain relevant (unless, and until, a paradigm shift occurs in the mathematics behind cryptographic technologies and/or in the international culture around privacy and human rights) appeared recently in ACM Queue. In his column titled “More Encryption Is Not the Solution,” Poul-Henning Kamp makes a rigorous argument that is well worth a read. Instead of focusing directly on the issue at hand, we wish to start a discussion we hope will equip readers to come up with out-of-the-box solutions to the privacy problem, or any other for that matter.

The foremost things that the Human-Computer Interaction Ph.D. Program at Carnegie Mellon seeks in applicants are past academic achievement and creativity (not incidentally, the subject of the previous issue of XRDS). They tell prospective students CMU won’t be able to supply more creativity, and that these two factors are the most important in predicting a successful research career.

While a Ph.D. program is about digging deep, research supports the idea that having a breadth of knowledge enhances creativity. This idea isn’t new; it’s the subject of numerous conference keynote addresses and graduation ceremony speeches. It’s easy to suggest to someone they should try to learn about lots of different things. But actually getting someone (yourself) to do that is somewhat harder.

To get some motivation, we asked a couple of the most creative and successful graduate students we know what was the single most valuable course they took in undergrad. Topping the list were philosophy courses covering classics like Plato, Socrates, Descartes, Sartre, and Kierkegaard. These philosophers speculated on the nature of reality, how we trust that others aren’t mere figments of our imagination, and more. The value of such courses often comes not only from the specific ideas they cover, but from the way those ideas are described and discussed.

In our unofficial poll we found a surprising number of computer science majors enjoy philosophy courses. Both fields focus on conjuring rigorous logic from the thoughts of the writer. Computer scientists write code that either compiles and runs correctly, or does not. The creative challenge is in sorting out how to go from nil to a functional program. In philosophy, it’s up to the reader to figure out if there are any “bugs in the code”; that is, does the argument make sense? What are the assumptions it relies on (the “operating system,” if you will)? It’s not a coincidence that one of the most profound philosophical results of the 20th century came from logician Kurt Goedel, who had a thing or two to say that informed modern computer science theoretical work as well.

Training your mind to root out logical fallacies and recognize assumptions behind arguments is an enormously powerful exercise that will help you to think critically about all knowledge you encounter. Philosophy (and no doubt other fields) will give you the toolbox you need to think critically about ideas from just about anywhere else.1

Students, and in particular graduate students, are accustomed to hearing about lots of creative projects coming out of academia. We’re continually amazed at the ones we hear about that come from off the academic grid.

Paola Santana is a cofounder of Matternet, which will maintain a fleet of drones (autonomous aircraft) to deliver medicine to villages in the developing world where there are no physical roads leading to them.

Marc Roth is starting a business that’s outfitting shipping containers (like the ones that are used on trains and cargo ships) with computers and 3-D printers. He’s training homeless people to operate them and sell their products for a source of income.

Aereo was sued for taking over-the-air television and allowing customers to stream it online. Broadcasters argued in court that this wasn’t the intended use of over-the-air TV, because they expect each viewer to have their own antenna. Aereo promptly installed millions of tiny antennas on their server.

Blueseed.co is considering a novel way to avoid the mess of paperwork associated with immigrating to the U.S. They plan to park a cruise ship off the coast of San Francisco and helicopter international executives between their Silicon Valley jobs and international waters every day, getting them to and from work. (We’re planning to cover off-the-grid ideas like these in a future issue, so feel free to send us any cool pointers.)

It’s definitely not clear that any of these are good ideas. What is clear is that they are truly out-of-the-box answers to difficult problems. Pulling them off requires a great deal of expertise in one area as well as a breadth of knowledge. Fundamentally, providing that kind of breadth is what magazines like XRDS strive to achieve. Each issue brings you a number of articles from a field that is probably not your own, which we recruit and curate in an effort to deliver high-quality content that will spawn ideas by keeping you well informed. And when you’re done with this issue of XRDS, go read some philosophy :)

Peter Kinnaird and Inbal Talgam-Cohen

1 For those interested in startups and business, this is likely similar to the notion of Latticework upheld by Warren Buffett’s partner Charles Munger.

Upcoming Issues
Winter 2013 [December issue]: Wearable Computing
Article deadline: September 15, 2013
Spring 2014 [March issue]: Off the (Academic) Grid Computing
Article deadline: December 2, 2013

interactions’ website, interactions.acm.org, is designed to capture the influential voice of its print component in covering the fields that envelop the study of people and computers. The site offers a rich history of the conversations, collaborations, and discoveries from issues past, present, and future. Check out the current issue, follow our bloggers, look up a past prototype, or discuss an upcoming trend in the communities of design and human-computer interaction.
FEATURES • BLOGS • FORUMS • DOWNLOADS
interactions.acm.org
Association for Computing Machinery

ACM Conference Proceedings Now Available via Print-on-Demand!
Did you know that you can now order many popular ACM conference proceedings via print-on-demand?
Institutions, libraries and individuals can choose from more than 100 titles on a continually updated list through Amazon, Barnes & Noble, Baker & Taylor, Ingram and NACSCORP: CHI, KDD, Multimedia, SIGIR, SIGCOMM, SIGCSE, SIGMOD/PODS, and many more.
For available titles and ordering info, visit: librarians.acm.org/pod
begin
inbox

AROUND THE WEB

If you’re curious about how the Twitter API works, stay tuned for my article in this summer’s issue of the @XRDS_ACM magazine.
Robert Aboukhalil, PhD student @ CSHL, Editor-in-Chief @TechnophilicMag, Twitter (@RobAboukhalil)

@XRDS_ACM your latest mag on Sci Computing has winged its way to Australia, lks like another great read :)
Patrick Sunter, Public-spirited urban researcher and computer geek, Twitter (@PatSunter)

Thanks to @XRDS_ACM A fascinating article on “The well-programmed clavier: style in computer music composition”
Arun Tripathi, Twitter (@BitterRancor)

Is there any way to access the magazine as an ebook (EPUB or similar; *not* PDF!)? If not, there should be: the HTML site is hard to navigate and PDFs are not accessible to vision-impaired users. Plus many people these days prefer to read reflowable text on their e-reader of choice.
Pat Kujawa, Facebook

Editor’s Reply
Hello Pat! We just launched our new mobile apps. They might be closer to what you had in your mind. Happy to hear your comments.
• iTunes: http://bit.ly/130Lnyk
• Google Play: http://bit.ly/1046Ba2
• Amazon: http://amzn.to/15a7xS6

blast from the past
Dear XRDSians,
You are going beyond my expectations. The Scientific Computing issue is really fabulous. I enjoyed all the articles and your efforts are worth appreciating. Hope that you will keep up this good work. I had a criticism about the authors’ demography for the last issue, but this one is quite perfect from all perspectives. I suggest including a piece on deep learning, which is becoming quite catchy these days.
Best wishes,
Malay
Ex-XRDSian (2009-12)

How to contact XRDS: Send a letter to the editors or other feedback by email (xrds@acm.org), on Facebook by posting on our group page (http://tinyurl.com/XRDS-Facebook), via Twitter by using #xrds in any message, or by post to ACM Attn: XRDS, 2 Penn Plaza, Suite 701, New York, New York 10121, U.S.

Photograph by Sergey Nivens



init

Keeping Your Little Back Shop


As with any good discussion of a complicated issue, we should start by defining our terms. What do we mean by privacy? By anonymity? By disclosure, invasion, or reasonable protection? As we draw a few lines with some assurance, we want you to join us in giving them some serious thought.

An ex-admirer relentlessly tracking down your real-world information through supposedly protected online spaces, and finding creative ways to terrorize your waking life, is an invasion of privacy. This can and does happen to victims of cyber-stalking.

If it’s anonymity you want for protection, there are tools that can help: tools that are very sophisticated and incredibly important for protecting people such as dissident journalists in totalitarian states, where privacy is a matter of life or death. Tor is perhaps the best known of these. Kelley Misata discusses how Tor works and why it’s important.

But an anonymous online world has consequences too; as with any weapon, what protects the victim can also be taken and used by the attacker. Is what you really want anonymity or privacy? Or is it pseudonymity? What exactly do those words mean? Dave Birch asks whether we could move to a world where transactions are covered by credentials, rather than identity, to protect privacy.

Is anonymity sufficient protection to provide privacy, anyway? We can take your name off the data and make sure it doesn’t include your address. However, there are lots of ways to identify someone. Your health issues, frequent geographical locations, or favorite websites can all be used to help pick you out of an anonymized crowd.

While there are many arguments to be made that our new interconnected world provides too little privacy, Andreas Birkbak points out it may also provide too much. From a policy perspective, anonymity has quite serious effects on transparency, for example. If governments release data on their own performance, this will inevitably involve data about us, their citizens. Elaine Mackey and Mark Elliot ask how effectively we can anonymize and yet release data that informs and provides knowledge of crime, education, or health. Are there places where the information environment heightens the risk of statistical disclosure? What effect does the world of big data have on this? What effect might transparency have when information is routinely released, rather than gathered under FOIA? How do we quantify risk? Marion Oswald asks how we can assess risk: the likelihood of something happening and the impact if it does. And Iain Bourne asks how we should change data protection law when we need information not only to be available, but also protected. Is it possible to have a transparent society while not becoming transparent citizens?

The real world is messy, almost unavoidably. There are very big complicated questions to untangle. Occasionally, just occasionally, a mathematician can step into the chaos of the world, take a rigorous stand on a problem, and turn a piece of the mess into something quite manageable. We have a few shining examples here: Philip C. Ritchey provides a clever technique for sending provably private messages encoded in simple games like tic-tac-toe; and Dominic Hobson describes how Bitcoin provides privacy to its users.

And as a hopeful finale, we have an interview with Cynthia Dwork, the inventor of differential privacy. Differential privacy is an approach to data mining that provides a strong mathematical guarantee that the privacy of individuals in the data set will be protected. Want to see how it’s done? Preceding the interview is a brief stick-figure comic Christine Task created to help explain the trick. It’s her inaugural oeuvre in that medium, and she hopes you find it entertaining.

There are complicated problems to be addressed as our world evolves, technically, culturally, and in policy. The students of today are likely to be the ones who will be addressing them, with their voices, their votes, and their technical expertise. That’s you. And as you’ll see, you’re going to be arriving on the scene just in time. So pay attention; there will be a test.

Maire Byrne-Evans and Christine Task, Issue Editors

* Building on Michel de Montaigne (Essays, 1588), we suggest, “We must keep a little back shop where we can be ourselves without reserve. In solitude alone can we know true freedom.”
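The claim that stripping names is not enough to keep someone hidden in an “anonymized crowd” can be checked mechanically. The sketch below is a generic illustration, not a method from this issue’s articles: it uses made-up records and assumes, purely for the example, that ZIP code, birth year, and sex are the attributes an outsider might already know (the so-called quasi-identifiers). It counts how many rows share each combination of those values and lists the rows that are unique, since those are the ones that could be linked back to a person. The minimum group size it reports is the standard k-anonymity measure.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02139", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "47907", "birth_year": 1992, "sex": "F", "diagnosis": "migraine"},
    {"zip": "47907", "birth_year": 1992, "sex": "M", "diagnosis": "flu"},
]

# Illustrative choice of attributes an attacker might know from other sources.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return rows whose quasi-identifier combination appears exactly once.

    Anyone who already knows those values about a person (for example,
    from a public record) can link that person to such a row.
    """
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys):
    """Largest k such that every quasi-identifier combination covers >= k rows."""
    return min(group_sizes(rows, keys).values())


if __name__ == "__main__":
    exposed = unique_rows(records, QUASI_IDENTIFIERS)
    print(f"k = {k_anonymity(records, QUASI_IDENTIFIERS)}")
    print(f"{len(exposed)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

On this toy data, three of the five rows are unique on the chosen attributes. Coarsening the release, for instance keeping only the ZIP code, raises k to 2, which shows the usual trade-off between how useful released data is and how easily individuals can be picked out.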

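To make the “strong mathematical guarantee” of differential privacy a little more concrete, here is a minimal sketch of the Laplace mechanism, one standard textbook way to achieve it for numeric queries. It is not taken from the issue’s interview or primer. It assumes a single counting query, whose true answer changes by at most 1 when any one person is added to or removed from the data, and a privacy parameter epsilon chosen by whoever curates the data; smaller epsilon means more noise and stronger privacy. The data set and the function names (`laplace_noise`, `private_count`) are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count of the rows that satisfy `predicate`.

    One person can change a count by at most 1 (sensitivity 1), so adding
    Laplace noise with scale 1/epsilon makes this single release
    epsilon-differentially private.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed only so the demo is repeatable
    patients = [{"smoker": i % 3 == 0} for i in range(300)]  # hypothetical data, 100 smokers
    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(patients, lambda r: r["smoker"], eps, rng)
        print(f"epsilon={eps:>4}: noisy count = {noisy:.1f} (true count = 100)")
```

The guarantee in this sketch covers one query only. Each further answer released about the same data spends more of the privacy budget, so a real system has to track the total epsilon across all the queries it answers.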


begin

768 bits
The largest RSA key known to have been publicly factored.

30,000
The approximate number of accounts about which Google released information to the U.S. government in 2012.
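The 768-bit figure above is about breaking RSA by factoring its public modulus, and a toy example shows why that is fatal: once the modulus n is split into its two primes, the private exponent d follows directly from the public exponent e. The sketch below uses the classic textbook key n = 3233 (61 * 53) with e = 17. Trial division works here only because the numbers are tiny; the real 768-bit factorization relied on the general number field sieve and a large distributed computation. The helper names are hypothetical, and the code needs Python 3.9+ (built-in `tuple[int, int]` hints, and `pow(e, -1, phi)` for the modular inverse).

```python
from math import gcd


def factor_by_trial_division(n: int) -> tuple[int, int]:
    """Find a nontrivial factorization n = p * q by trial division (toy sizes only)."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("n has no nontrivial factor (it is prime or 1)")


def recover_private_exponent(n: int, e: int) -> int:
    """Given only the public key (n, e), factor n and recompute the private exponent d."""
    p, q = factor_by_trial_division(n)
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e is not invertible modulo phi(n)")
    return pow(e, -1, phi)  # modular inverse of e modulo phi(n)


if __name__ == "__main__":
    n, e = 3233, 17  # toy public key; the attacker knows only these two numbers
    message = 65
    ciphertext = pow(message, e, n)  # encryption with the public key
    d = recover_private_exponent(n, e)
    print(f"recovered d = {d}, decrypted = {pow(ciphertext, d, n)}")
```

Running it recovers d = 2753 and decrypts the ciphertext back to 65. The only thing protecting a real key is that the factoring step becomes infeasible as n grows to hundreds of digits.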

benefit

XRDS Mobilizes

To make content more accessible to its student readers, XRDS has recently launched its new mobile app. The app will enable Android and iOS users to browse the magazine on the go from their smart phones or tablets.

Each issue of XRDS features a theme, such as “The Role of Academia in the Startup World” or “Big Data,” with coverage of research trends and interviews. Students can also read timely updates on upcoming major conferences, grants, fellowships, and contests. Even better, students can now get all this in a more convenient way: on their hand-held devices.

The application is powered by Godengo+Texterity. With these newly launched apps, XRDS wishes to extend its reach among the student community. So don’t delay; get your favorite magazine on your smart phone and recommend that your friends do the same.

The links to the app are:
• iTunes: http://bit.ly/130Lnyk
• Google Play: http://bit.ly/1046Ba2
• Amazon: http://amzn.to/15a7xS6

Debarka Sengupta

Photograph by IR Stone

advice

Managing Your Time

Let me ask you a very simple and short question. Do you feel you manage your time well? If the answer is yes, then you should be the one writing this article, not me. I mean it! Personally, I have issues when it comes to time management, and to this day I haven’t met anyone who doesn’t. But if you truly don’t, please drop me an email and I’ll publish your advice on time management in one of our future issues. In the meantime, even though I am not an expert in the area, here are some tips that might help you improve.

1. Time is not manageable. I am not a physicist, but time used to be, still is, and seems like it is going to be perfectly managed. Each hour has 60 minutes and each minute has 60 seconds, and there is nothing you can do to control it (at least in practice). My point is that the first step is to realize that you don’t have to manage time. You have to manage yourself (your own activities) in order to make sure you can accomplish them in the available time.

2. To-do lists and prioritization. OK, a list might be a bit too obvious, but it is one of the fundamental things you should have a system for. Since you enjoy computing, I am sure you can find the right software to help you write down what you have to do and when. Find a proper system and make sure it can at least capture your tasks and schedule. Old-style notepads and diaries can do the job too.

3. Don’t be afraid of drafts. If you think about it, drafts are amazing. They give you the opportunity to do something of ambiguous quality without having anyone judge you. Drafts for students are like rehearsals for actors or practice sessions for athletes. You need to use them in your favor. Even if you have to submit your draft to somebody such as your supervisor, in the worst-case scenario they will give you some harsh feedback that will help you improve your draft. No evaluation, no marks, nothing to be afraid of.

4. Consider the possibility of saying no. Yes, this is a real possibility, and one the other party can find acceptable. Even if it might not always sound nice, the answer to “could you please...” can be negative; perhaps delivered in a kinder and friendlier way, but still negative. Having said that, you need to be careful about when and how you decline to offer your help or services. Make sure to consider “no” as one of your available options before making your final decision.

5. Just do it. Do you know how many “time management” workshops, seminars, books, and tutorials you can find out there? A simple Google search will give you more than a billion results, with websites full of tips on how to manage your time effectively. Do you know the only way to make them work? Do them! Choose some, try them, adopt them, and use them in your everyday life. Believe me, I have spent a lot of my time trying to improve my time management skills, but there is a huge gap between deciding to use them and putting them into practice. It’s similar to deciding to go to the gym...

Vaggelis Giannikas



1,462
The number of privacy policies the average American views
per year. Fun fact: Facebook’s privacy policy is more than
9,500 words long.

updates

Revitalizing ACM Student Chapters


A Look at How to Refresh Student Initiatives

Running an ACM student chapter is a demanding task. In a world where computing has touched nearly every aspect of our daily lives, it still seems difficult to bring students together. Further, given the overstimulating environment in which students are immersed while pursuing their studies, involvement and interest in ACM student initiatives seem naturally to follow a “roller coaster” pattern. As the torch is passed down every few years to the next generation of students, the newly appointed ACM student chapter leaders are faced with the task of trying to appeal to the constantly changing interests and directions of their peers and professional community. In this issue of XRDS we solicited student chapters that were working hard to revitalize their student involvement and bring a new face to their chapter.

The ACM student chapter at the University of Massachusetts Lowell (UML) has been around since the 1990s, but has suffered from several years of inactivity. This past spring, dedicated students working with faculty advisor Dr. Jesse Heines refreshed their student chapter in an effort to bring together a diverse group of students to form an active computing community. In the first semester of the chapter’s reformation, the student leaders, with help from the UML Alumni Association, have managed to provide strong initiatives for their student community. These events ranged from a speaker series to tutorial sessions such as an Android Development night, during which students learned to create Android applications. The chapter hosted an impressive list of speakers including Google’s Rich Miner, start-up guru Abby Fichtner, cyber-security expert Gary Miliefsky, and interactive fiction author Andrew Plotkin.

Rich Miner’s visit to UML as part of ACM’s speaker series.

Newly appointed President of the ACM chapter at UML, Shawna O’Neal, explained some of the issues their ACM chapter faces: “UML is a unique campus where students are separated by location and major. The technology, science, and business domains are on the north side of the Merrimack River, while the humanities, social sciences, and arts are on the south side. We also have a heavy commuter base on our campus. What this means for us is that our sense of community can suffer.” Despite these challenges, Ms. O’Neal explains how an ACM chapter can help to overcome this divide: “We believe that the ACM chapter is a place for students to connect with common interests. We are not just a computer science club; we have 43 active members from various fields, including math, English, engineering, and business backgrounds. Going forward, we would love to see more disciplines and backgrounds come together. There is serious potential to rebuild our campus community.”

Moving forward, the ACM student chapter at UML plans to continue its speaker series after the positive student feedback on the previous events. Additionally, there is interest in starting a “Game Development Night,” which the chapter hopes will showcase student-created gaming applications that the UML community can use for leisurely fun and to solicit development advice from other students. Feedback from the chapter’s previous development night has provided valuable insight into the common interests of the student community.

When asked for advice on how to build an ACM community, Ms. O’Neal offered the following words of wisdom: “Never give up. What we have learned over the past six months is that there will be events in which attendance is high and feedback is great, and conversely, events in which only two people attend or technical difficulties interfere with the event itself. However, as difficult as it may be to pull students out of their dorms, or to convince commuter students to stay on campus late, it is absolutely possible to create a successful chapter. Keep trying.”

If you would like to read more about ACM student initiatives at UML, you can visit their website at the following link: http://umlacm.org.

Michael Zuba

XRDS • fall 2013 • Vol.20 • No.1 11


begin

The XRDS blog highlights a range of topics from open access to neuroscience. Selected blog posts, edited for print, will be featured in every issue. Please visit xrds.acm.org/blog to read each post in its entirety. For this issue we have included two guest posts on privacy and anonymity, which have also been edited for print.

BLOGS

Algorithms Fit for Compilation?
By Olivia Simpson
Most researchers differ in their workflow. For researchers in the algorithms world (or at least, those I know), the work is in the design. Our hours are spent at the blueprint stage. Algorithms are designed, improved, reformulated, or reapplied in different problems, mostly on paper. But this is unarguably only the first stage in successfully developing a new algorithm. There are still the matters of proving and testing the algorithm, and submitting the result to the public. When are we done drafting our blueprint? How do we package and ship the blueprint to the engineers and construction team?
Let’s address the more straightforward question first. What is the best way to present an algorithm? How descriptive and specific should it be? Should it be entirely self-contained or, for instance, could we have a pointer to a “…subroutine of choice”? Is implementability more important than readability?
In an introductory algorithms course, algorithms are often described as recipes, step-by-step instructions for obtaining the result of a calculation. However, it is often made explicit that algorithms are not pseudocode. Leslie Lamport developed the PlusCal algorithm language to address this notion explicitly. According to the description on his Microsoft Research page, the PlusCal language is for writing algorithms, as programming languages are meant for writing programs. Many LaTeX packages for writing algorithms specify that they should not be used to write pseudocode (although this might not be coincidental considering Lamport himself wrote LaTeX). His solution uses the language of mathematics, giving descriptions of procedures in terms of logical constructs. PlusCal is different from programming languages in two important ways: It allows for non-determinism, and it makes analysis easy by keeping track of atomic steps. Lamport highlights these as some distinguishing features between algorithms and programs.
Of course, there is no need to present an algorithm unless we are sure of its correctness and power. This question may very well be the fire behind the algorithm/pseudocode distinction question. Without a mechanism for testing an algorithm, we are subject to proving its correctness only by hand. This leaves much room for error. So then, what are the tools for testing? Well, programs.
Lamport has a solution to this as well. Along with the PlusCal language, Lamport has designed the TLA+ specification language with tools for model checking and a TLAPS proof system. It is part of the Tools for Proofs (http://www.msr-inria.fr/projects/tools-for-proofs/) project at Microsoft Research. During a recent talk at UCSD, Lamport explained that the model checking system of TLA+ has vastly improved his own work process and also makes collaboration much more comfortable.
Aside from attending his talk, I find myself asking these questions because, in writing an algorithm down, I almost reliably find errors in its design. Transcribing inevitably enables debugging. Even better is implementation! How can I be so sure of my algorithm if I myself cannot use it? So, this is part of my workflow. But what does this mean for my readers? Do I spare these details for a more pleasurable reading experience? After all, everyone has his own flavor of implementation. Or should I let the details given in the algorithm speak for themselves?
Lamport has his own answers to all of these questions. I am personally still in the development phase, but I feel it is important to begin a dialogue within the community. Developing a standard in the way of algorithms languages would improve productivity and remove ambiguity. A common language for algorithmists could even open up new doors of collaboration. If nothing else, it will answer some questions I have.
Olivia Simpson is a graduate student at UCSD, enjoying all the sun, sand, and graph theory that San Diego has to offer.

Habits: Our cognitive shortcut
By Gidi Nave
Photograph by Andrey Armyagov
I like my shopping routine at the grocery store around the corner, where my cart seems to easily navigate itself through the aisles. Once in a while I make adventurous purchases (the Halloween-edition beer with pumpkin aroma still awaits in my fridge), but I usually stick to the products that have already made me happy before. Whenever in a new town, I



U.S. intelligence agencies are building a data center in Utah to store surveillance data measured in the trillions of terabytes.

try to shop at the same chain, where I know the products and their location on the shelves.
One morning, the parking lot of the usual store was packed. I decided to explore and went to the supermarket across the street for the first time. My experience at the new place was mildly traumatizing: Confronted by all sorts of new packages calling from the shelves, I felt helpless. How should I know what I would like?
In my first blog post, I wrote about our mind’s limited computational capacity. When we shop, examining all of the products, predicting how much we will enjoy them, and judging whether the price is reasonable (compared with the alternatives) is very costly in terms of cognitive effort. Luckily, our brains have figured out a shortcut that saves a lot of time and mental effort, making it easy to navigate through the local convenience store and fill our carts with the usual products.
Faced with a novel situation, we usually don’t know what to do. Sooner or later, after a short exploration period of trial and error, we learn what works for us. Animals are not much different. In 1898 American psychologist Edward L. Thorndike invented the “puzzle boxes”—cages that the animal (say, a cat) could exit (and get a food reward) only by using a specific response (like pulling a lever). Thorndike found that the time it took the animals to escape and get their food reward decreased with their experience.
This is not news for anyone who ever had a pet: Animals quickly figure out what action will get them the reward. The process of associating a state of the world with an action and its outcome in the animal’s brain, called “instrumental conditioning,” can be used to teach animals, like raccoons, to perform complicated tasks, like playing basketball.
Do animals learn to associate the sensory stimulus and their responsive actions with the reward, or are they just reinforced to perform the action, regardless of the reward value? To answer this question, Dickinson and colleagues trained mice to push a lever in order to get food.1 After 120 successful trials, the mice were fed to satiety. Lacking the motivation to get food, the mice stopped pressing the lever, providing evidence that they had learnt the consequence of their actions, and weren’t just blindly responding to the lever in their cage.
Dickinson repeated his experiment with a new group of mice, but this time he extended the training period to 360 (rather than 120) trials. The effect of over-training on the mice’s behavior was dramatic: Even after they were fed to satiety and no longer wanted the food, they kept pressing the lever. The mice picked up a lever-pressing habit.
Although we all have natural, automatic responses to events in our environment (like the urge to step out of the elevator even though it had stopped on the wrong floor), we humans have a sense of control. We believe in our ability to suppress habitual tendencies more easily than other animals. In 2009, Elizabeth Tricomi and colleagues recruited Rutgers college students, making sure that none of the subjects were dieting. Participants fasted for six hours, and then performed a simple task. In each trial, they were instructed to push one of four buttons in order to get a food reward. In some trials, the reward was an M&M; in others it was corn chips (the researchers verified that all of the subjects liked both foods).
As in Dickinson’s mice studies, subjects were divided into two groups: The first performed the task only for a single day, in two eight-minute training sessions; the second performed four sessions each day for three days—six times as much training. Now came the fun part. After the last training session, participants were given one of the food types, and were instructed to eat until it was no longer pleasant to them. With one of the foods “devalued,” subjects went back to perform the task for three more minutes.
What do you think happened? When the rewarded food was no longer desirable, the first group stopped pushing buttons, as expected. Surprisingly (or not), the over-trained subjects became habitual, and kept pressing buttons even when the food reward had become unpleasant.
Habit formation imposes a trade-off. Most of the time, the shortcut works well: We easily navigate our cars and carts in familiar avenues without having to pay too much attention. Faced with a new situation, however, we are prone to make mistakes, or even worse, pick up unhealthy habits. Obesity, alcoholism, and other addictions are often the result of a long reinforcement history (of alcohol, drugs, or fried chicken) that ended up turning into out-of-control habits.
Accepting the habitual system as an inseparable part of our minds, and understanding its limitations and the way it works, may help us to achieve our long-term goals. Forcing ourselves to start working out and eating healthily is effortful, but the investment will pay off. After a period of reinforcement (by endorphin rush, feeling lighter, and getting compliments), our brains will pick up the habit. Going to the gym or having kale salad may become effortless or even enjoyable.
Gidi Nave is a Computation and Neural Systems Ph.D. student in Colin Camerer’s lab at Caltech. His research is in the field of neuroeconomics—the intersection between neuroscience, psychology and economics. He uses a medley of theoretical and experimental methods for reverse-engineering the processes underlying human decision-making. By understanding how emotions and cognition generate judgments and decisions under uncertainty, he seeks to contribute models that take into account the biological origins of the decision process in the brain. For more information visit www.gidinave.com.
1 A. Dickinson et al. Motivational control after extended instrumental training. Animal Learning & Behavior 23, 2 (1995), 197–206.




PCI DSS
The Payment Card Industry Data Security Standard is
a mandatory data protection standard for businesses
that handle credit card information.

Coordination When Information is Scarce: How privacy can help
Originally posted on http://aaronsadventures.blogspot.com
By Aaron Roth
This post is the first in what will (hopefully) be a series of posts about how tools and ideas from differential privacy can be useful in solving problems in game theory. The connection might at first be surprising, but is in fact very natural. The field of differential privacy studies algorithms whose inputs are generated by the reports of $n$ players. It provides tools for bounding the sensitivity of the output of the algorithm, in a precise, probabilistic sense, to unilateral changes to the input by a single one of these players. This notion of unilateral deviations is very similar to the notions of stability at equilibrium that game theory is concerned with, and this is the root of the connection. For a general overview of the area, let me plug the survey on differential privacy and mechanism design that Mallesh and I wrote for the most recent issue of SIGecom Exchanges. In this post, I’ll discuss a result that Michael Kearns, Mallesh Pai, Jon Ullman, and I have had kicking around for a while. We only recently hit upon the interpretation of the result that I’ll discuss here, though, which I rather like. For more details, you can check out the arXiv paper, “Mechanism Design in Large Games: Incentives and Privacy,” which we recently revised. I should add that the story that accompanies our technical contributions is the result of discussions with many people as we’ve gone about and presented it. Tim Roughgarden and Ricky Vohra deserve special thanks.
In game theory, there are several popular abstractions for what players in a game know about their opponents.
• In games of complete information, players know exactly the utility functions that each of their opponents is trying to optimize.
• In games of incomplete information, players don’t know what their opponents’ utility functions are. In Bayesian games, players’ utility functions are parameterized by “types,” which are drawn from a known (possibly correlated) prior distribution. Agents know the distribution from which their opponents’ types are drawn, but not the types themselves.
In games of complete information, the standard solution concept is the Nash equilibrium: Informally, everyone should be playing an action such that no single player can benefit by making a unilateral deviation. Players can randomize, so actions are evaluated in expectation over the random choices of one’s opponents, but their utility functions are known.
In games of incomplete information, strategies are thought of as mappings from types to actions. The standard solution concept is the “Bayes Nash Equilibrium,” which informally states that everyone should be playing a strategy such that no single player can benefit by making a unilateral deviation, where now strategies are evaluated in expectation both over the random choices of one’s opponents, and also over the random realization of the types of one’s opponents, as drawn from the known prior distribution.
Games of complete information are attractive for a number of reasons, not least of which is that they are much cleaner and more analytically tractable than games of incomplete information. However, when talking about large $n$-player games, where players don’t all know each other, have limited abilities to communicate, and might think of their types as proprietary information, games of incomplete information are probably a more reasonable model of reality.
Unfortunately, the quality of Bayes Nash Equilibria can be much worse than that of the corresponding Nash equilibria of the full information game, even in the worst case over type realizations. The reason is that not knowing the utility functions of your opponents can make coordination very difficult. Consider the following toy example, borrowed from Bhawalkar and Roughgarden:
There are $n$ players, and $n + \sqrt{n}$ “goods” to be distributed. Each player $i$ has utility $u_{ij} \in \{0,1\}$ for receiving a good $j$. The game is played as follows: Each player $i$ names a single good $s_i$. Each good $j$ is allocated at random to one of the players that named it. Now suppose that types are drawn from the following distribution. The goods are randomly partitioned into two sets, $S$ and $T$, with $|S| = n$ and $|T| = \sqrt{n}$. For every player $i$ and good $j \in T$, $u_{ij} = 1$. A perfect matching $\mu$ is selected at random between the agents $I$ and the goods in $S$. Agents have value $u(i, \mu(i)) = 1$ for the good $\mu(i) \in S$ that they are matched to, and value $u(i, j) = 0$ for every good $j \neq \mu(i)$ in $S$ that they are not matched to. In other words, every player has a “special” good in $S$ that she uniquely desires, and everyone desires all of the goods in $T$. If we are in the complete information setting, then since everyone has the option of requesting their special good $s_i = \mu(i)$, in any reasonable equilibrium (with nobody requesting a good they don’t want) $n$ goods are allocated to bidders who desire them, for total social welfare of $n$. However, in the incomplete information setting, because of the symmetry of the distribution, no player can distinguish their special good from amongst the $\sqrt{n} + 1$ goods that they



Early versions of web browsers restricted
the length of cryptographic keys for
non-U.S. users, because of U.S. restrictions
on exporting cryptography.

value. It’s not hard to see that in this case, in equilibrium only $O(\sqrt{n})$ goods get allocated.
So if we find ourselves in a setting of incomplete information, we might nevertheless prefer to implement an equilibrium of the game of complete information defined by the realized types of the players. How can we do that, especially if we have limited power to modify the game?
One tempting solution is just to augment the game to allow players to publicly announce their types (and thus make the game one of complete information). Of course, equilibria of the complete information game might not be unique, so to help solve the coordination problem, we could introduce a weak “proxy” that players can choose whether or not to use. Players can “opt in” to the proxy, which means that they report their type to the proxy, which then recommends an action for them to play. At this point they are free to follow the recommendation, or not. Alternately, they can “opt out” of the proxy, which means they never report their type, and then just choose an action on their own. It would be nice if we could design a proxy such that:
1. Simultaneously for every prior on agent types, it is a Bayes-Nash equilibrium for agents to opt in to the proxy and then follow its suggested action, and
2. Given that all players behave as in (1), the resulting play forms an equilibrium of the complete information game induced by the actual, realized types of the players.
It’s tempting to think that to design such a proxy, it is sufficient to have it compute a Nash equilibrium of the complete information game defined by the types of the players who opt in, and then suggest that each player play her part of this Nash equilibrium. After all, if everyone is opting in and playing their part of a Nash equilibrium, how can you do better than to do the same? By definition, in a Nash equilibrium, your suggested action is a best response given what all other players are playing. And in fact, in the toy allocation example we discussed above, this works, since there is an equilibrium of the complete information game that is simultaneously optimal for all players.
In general, however, this approach fails. The flaw in our reasoning is that by opting out, you change what the proxy is computing: it is now computing a different equilibrium, for a different game. So by opting out, you are not merely making a unilateral deviation from a Nash equilibrium (which cannot be beneficial); you are potentially dramatically changing what actions all other players are playing. The Nash equilibrium condition does not protect against deviations like that, and in fact it’s not hard to construct an example in which this is a real problem. Consider, e.g., toy example #2: There are $n$ players who each have one of two types, (M)ountain or (B)each. Each player has two actions: they can go to the beach, or go to the mountain. Players prefer the activity that corresponds to their own type, but they also like company. So if a $p$ fraction of people go to the mountain, an M type gets utility $10 \cdot p$ if he goes to the mountain, and $5 \cdot (1 - p)$ if he goes to the beach. A B type gets utility $5 \cdot p$ if she goes to the mountain, and $10 \cdot (1 - p)$ if she goes to the beach. A proxy that suggests that all players go to the beach if type M is in the majority, and otherwise suggests that all agents go to the mountain, always computes a Nash equilibrium of the complete information game defined by the reported types. Nevertheless, any player who is pivotal has an incentive to opt out of the proxy, since this causes the proxy to send all other players to her preferred location.
But what if it were possible to compute an equilibrium in such a way that whether or not any player $i$ opted in had very little effect on the distribution over actions suggested by the proxy to all other players $j \neq i$? In this case, the problem would be solved: any unilateral deviation from the all-opt-in-and-follow-the-proxy’s-suggestion solution would not (substantially) affect the play of one’s opponents, and so they would all continue playing their part of an equilibrium of the complete information game, defined on all players’ realized types. The equilibrium condition would now directly guarantee that such a deviation could not be beneficial.
So to design a good proxy, it’s not enough to just have an algorithm that computes an equilibrium of the game defined by the reported types, but it is enough if that algorithm also satisfies a strong enough stability condition. It turns out that a sufficient “stability” condition is a variant on differential privacy: informally, that any unilateral deviation by a single player $i$ should not change (by more than a $(1 \pm \varepsilon)$ factor) the probability that any particular profile of $n - 1$ actions is suggested to the $n - 1$ other players.
And in fact it is possible to implement this plan, at least for certain classes of games. We study the class of “large games,” which, informally, are $n$-player games in which the effect of any player $i$’s action on the utility of any player $j \neq i$ is bounded by some quantity which is diminishing in $n$. The Beach/Mountain example is of this type, as are many large population games. Our main technical result is an algorithm satisfying the required differential privacy stability condition, which computes a correlated equilibrium of any large game. The upshot is that in large games, it is possible to design a very weak proxy (one that doesn’t have the power to make payments, force players to opt in, or enforce actions once they do opt in) that implements an




$\eta$-approximate correlated equilibrium of the complete information game, as an $\eta$-approximate Bayes Nash equilibrium of the partial information game, no matter what the prior distribution on agent types is. Here $\eta$ is some approximation parameter that tends to zero in the size of the game, i.e., as the game grows large, the equilibria become exact.
To emphasize the purely game-theoretic aspects of this problem, I have ignored the fact that differential privacy of course also provides a good guarantee of “privacy.” Aside from straightforward incentive issues, there are other reasons why players might be reluctant to announce their types and participate in a complete information game: perhaps their types are valuable trade secrets, or would be embarrassing admissions. Because our solution also happens to provide differential privacy, it is able to implement an equilibrium of the complete information game, while maintaining the privacy properties inherent in a game of incomplete information.
To conclude, differential privacy thus far has been primarily a problem-driven field that has borrowed techniques from many other areas to solve problems in private data analysis. But in the process, it has also built up a strong conceptual and technical toolkit for thinking about the stability of randomized algorithms to small changes in their inputs. I hope this and future posts serve to convince you that there is something to be gained by borrowing the tools of differential privacy and applying them to solve problems in seemingly unrelated fields.
Aaron Roth is the Raj and Neera Singh assistant professor of Computer and Information Sciences at the University of Pennsylvania. Prior to this, he was a postdoctoral researcher at Microsoft Research, New England, and earned his PhD at Carnegie Mellon University. He is the recipient of a Yahoo! Academic Career Enhancement Award, and an NSF CAREER award. His research focuses on the algorithmic foundations of data privacy, game theory and mechanism design, and the intersection of the two topics.

The New Firefox Cookie Policy
Originally posted on WebPolicy.org
By Jonathan Mayer
Editor’s Note: Earlier this year Stanford grad student Jonathan Mayer discussed cookies, Web tracking, and changes to Mozilla’s cookie policy on his personal blog.
The default Firefox cookie policy will, beginning with release 22, more closely reflect user privacy preferences. This mini-FAQ addresses some of the questions that I’ve received from Mozillans, Web developers, and users.
How does the new Firefox cookie policy work?
Roughly: Only websites that you actually visit can use cookies to track you across the Web.
More precisely: If content has a first-party origin, nothing changes [1]. Content from a third-party origin only has cookie permissions if its origin already has at least one cookie set.
How does Firefox’s new policy compare to the other major browsers?
• Chrome: Allows all cookies.
• Internet Explorer: Cookie permissions vary by P3P compact policy. In practice, almost all third-party tracking cookies are allowed [2].
• Safari: First-party content has cookie permissions. Third-party content only has cookie permissions if the content already has at least one cookie set.
In short, the new Firefox policy is a slightly relaxed version of the Safari policy [3].
Will the new Firefox policy break websites?
Collateral impact should be limited. Safari’s cookie policy has been in place for over a decade, and it is included in both the desktop and iOS versions of the browser. A few websites may require a tiny code change to accommodate Firefox in the same way as Safari.
Just to be sure, the Mozilla privacy team is closely monitoring the policy before final release. The patch will spend about six weeks each in the pre-alpha, alpha, and beta builds. If you spot any oddities, please report them to Mozilla support!
Is It Necessary?
Consumers neither expect nor approve of Web tracking [4]. Mozilla has been a frequent advocate for its users, advancing technologies that signal preferences (Do Not Track), lend transparency (Collusion), and facilitate privacy-friendly Web services (Persona and Social API).
Last fall, the Mozilla community began a concerted effort in a new direction: technical countermeasures against tracking [5]. One of our first projects has been a revision of the Firefox cookie policy [6].
Cookie policies are inherently imprecise. Some unwanted tracking cookies might slip through, compromising user privacy (“underblocking”). And some non-tracking cookies might get blocked, breaking the Web experience (“overblocking”). The challenge in designing a cookie policy is calibrating the tradeoff between underblocking and overblocking [7].
The patch that I developed is an intentionally cautious



first step: It aims to substantially reduce underblocking will still be protected against Google ad tracking.
with little (if any) overblocking. The revised policy is so As for overblocking, again, I am not aware of any signifi-
cautious, it isn’t even new: It’s drawn directly from Safari cant shortcomings with the revised cookie policy [11].
[8]. Almost every iPhone, iPad, and iPod Touch user is al-
ready running the revised Firefox cookie policy. Web engi- Next STEPS
neers are already familiar with designing to accommodate We have a number of tools at our disposal for improving
the policy. The notion is simple: Start by raising Firefox our understanding of the Firefox cookie policy,
to the present best practice among competing browsers, including feedback solicitations, user surveys, browser
then iteratively innovate improvements. measurements, Web crawls, and much more. There
Firefox’s revised cookie policy landed in the pre- are many possible directions for product innovation,
alpha build in late February. Since then, Mozillans and including heuristics, machine learning, community
I have carefully monitored bug reports. It appears that reporting, manually-curated lists, mechanisms for
we achieved our aim: There are only two confirmations confirming user preferences, new user interfaces, new
of inadvertent breakage [9]. We did not hear any novel APIs, and new institutions.
concerns when the patch advanced to alpha in early April. I look forward to continuing collaboration with Mozilla
Recently, Mozilla’s CTO requested a hold on the revised and its community on web privacy and security. I’m excited
policy for an extra release cycle to measure its perfor- to get the revised cookie policy into users’ hands. And I’m
mance. At the same time, he reaffirmed that Mozilla is even more excited about building what comes next.
“committed to user privacy” and “committed to shipping This work is licensed under a Creative Commons Attribution
a version of the patch that is ‘on’ by default.” 3.0 Unported License.
I agree we should be quantitatively rigorous in our
approach to iterating the Firefox cookie policy. An extra Notes
[1] An origin is determined by public suffix + 1.
six-week release cycle would allow us to further validate
our hypothesis that the patch delivers improved privacy without breakage [10], as well as lay the groundwork for future updates. Going forward, our challenge will be to understand and improve the underblocking and overblocking properties of the Firefox cookie policy.

Underblocking and Overblocking
There are at least three substantial areas of underblocking that we know we need to address with future improvements.
• Old cookies. The revised policy does not limit preexisting tracking cookies. Firefox users who update to the revised policy will not fully benefit until they clear their cookies.
• Temporary visits. Sometimes a user temporarily visits a tracking website, such as after clicking an advertisement (intentionally or inadvertently). The revised policy indefinitely allows tracking cookies from a website after just one temporary visit.
• Dual-use domains. Several popular websites use the same domain for both consumer services and tracking. Yahoo, for example, operates both its homepage and advertisement tracking from yahoo.com. If a user visits the Yahoo homepage, the company will be able to track the user across other websites. Google, on the other hand, largely hosts search on google.com but advertising tracking on doubleclick.net. If a user runs a query with Google, they

[2] Many researchers have criticized Microsoft’s approach for being ineffective, convoluted, and relying on the de facto deprecated P3P standard. For background, see Token Attempt: The Misrepresentation of Website Privacy Policies Through the Misuse of P3P Compact Policy Tokens by Leon et al.
[3] The difference is primarily owing to engineering convenience.
[4] See the survey paper “Third-Party Web Tracking: Policy and Technology” for background. In the context of this post, “tracking” means the collection of a user’s browsing history by a third-party website.
[5] For an overview of Mozilla’s open-source community model, see MozillaWiki » Community and Mozilla.org » Governance. Many members of the Mozilla community have now contributed to the tracking countermeasures effort.
[6] Apple and Microsoft have both automatically limited tracking cookies for a decade. There was an effort to block tracking cookies by default in Firefox three years ago, but it was withdrawn under contested circumstances.
[7] Other considerations could include types of underblocking and overblocking, as well as possible reactions to the policy. Future posts might address these topics, depending on reader interest.
[8] In the interest of precision, the revised Firefox cookie policy is slightly more permissive than the Safari policy owing to implementation specifics.
[9] The sites are dayonecenter.com (Alexa rank > 1M) and western.org (Alexa rank ≈ 200K).
[10] As I understand our release conditions, the patch will move forward unless there’s confirmed breakage, the breakage is so substantial as to outweigh longstanding user demand for privacy, and the breakage cannot be ameliorated through outreach, mitigation measures, or rapid iteration. Under present circumstances, the patch plainly satisfies these release conditions.
[11] We may wish to relatedly take steps to accommodate websites (if any) that have a third-party domain, do not compromise consumer privacy, do not break the consumer Web experience without cookies, cannot deploy an accommodation for the revised cookie policy, require cookies for functionality, and have lost that functionality on account of the revised policy.

Jonathan Mayer is a grad student at Stanford University in computer science and law. But he doesn’t live in the ivory tower. His research homes are the Security Lab (advised by John Mitchell), CISAC, and CIS. Wherever information technology, public policy, and law intersect, Mayer is interested.
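The three underblocking cases in Mayer’s list above can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not Firefox’s implementation: it assumes a simplified rule in which a domain may set a cookie in a third-party context only if the user has visited that domain as a first party at least once, and in which cookies stored before the update are left untouched. The class name, method names, and the example domains tracker.example and oldtracker.example are hypothetical, and the model ignores the implementation specifics that, per footnote [8], make the real policy slightly more permissive than Safari’s.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCookiePolicy:
    """Hypothetical, simplified model of a "visited sites only" cookie policy.

    Not Firefox code: a domain may set a cookie as a third party only if the
    user has visited that domain as a first party at least once.
    """

    first_party_visits: set[str] = field(default_factory=set)
    cookie_jar: set[str] = field(default_factory=set)  # domains holding cookies

    def visit(self, domain: str) -> None:
        """The user loads `domain` directly; first-party cookies are allowed."""
        self.first_party_visits.add(domain)
        self.cookie_jar.add(domain)

    def embedded_request(self, domain: str) -> bool:
        """`domain` is loaded as a third party (e.g., an ad or tracking pixel).

        Returns True if the request carries a cookie that lets `domain`
        recognize the user across websites.
        """
        if domain in self.first_party_visits:
            # One earlier visit is enough, and in this model it never expires.
            self.cookie_jar.add(domain)
        # Cookies set before the policy took effect are still sent.
        return domain in self.cookie_jar


if __name__ == "__main__":
    browser = ToyCookiePolicy()
    print(browser.embedded_request("tracker.example"))  # False: never visited

    # Temporary visit, e.g. an accidental ad click.
    browser.visit("tracker.example")
    print(browser.embedded_request("tracker.example"))  # True from now on

    # Dual-use domain: visiting the homepage also unblocks its tracking.
    browser.visit("yahoo.com")
    print(browser.embedded_request("yahoo.com"))  # True

    # Separate tracking domain: a search on google.com does not unblock it.
    browser.visit("google.com")
    print(browser.embedded_request("doubleclick.net"))  # False

    # Old cookies: a tracker that set a cookie before the update.
    upgraded = ToyCookiePolicy(cookie_jar={"oldtracker.example"})
    print(upgraded.embedded_request("oldtracker.example"))  # True
```

In this model the accept decision is keyed to the domain that sets the cookie, so a first-party visit to google.com does nothing for doubleclick.net. That outcome follows from the model’s assumption, not from a measurement of Firefox.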

XRDS • fall 2013 • Vol.20 • No.1 17


feature

What is Public and Private Anyway?
A Pragmatic Take on
Privacy and Democracy
Revealing private content on the Web can also spark public engagement.
To understand this, we need to challenge our common sense notions
of privacy and democracy.

By Andreas Birkbak
DOI: 10.1145/2508969

The Web should work in the most democratic way viable, while doing as much as possible to protect the privacy of its users. These are statements that most people seem to agree with, to an extent where they have become common sense. The widespread uptake of social media use, however, suggests the conventional distinction between something private and something public does not always hold in practice. For example, prominent Web scholars like Nancy Baym and danah boyd note how we might understand social media better if we see that “even the most private of selves are formed in relation to diverse others.”
This insight is taken from pragmatist philosophy, and I would like to suggest that this line of thinking is fruitful for challenging our tendency to think about the Web in terms of a strong public/private dichotomy. My source of inspiration is the classic American pragmatist John Dewey [1].
In a deceptively small book from 1927, The Public and its Problems, Dewey offers an alternative take on the public/private distinction. Like any other pragmatist, his vantage point is practice, that is, human actions. What Dewey is interested in is how we might come to better understand our actions and their consequences. More specifically, the problems of the public that Dewey is pointing to arise from the observation that as our societies grow increasingly technological, our actions also tend to have an increasing number of unforeseen and indirect consequences. In order for people to live democratically, by which Dewey simply means to be able to direct one’s own life in a meaningful way, it is imperative to come to grips with all these indirect consequences. This is where the public enters the picture.

Illustration by JNT Visual

Dewey distinguishes between direct and indirect consequences of actions. If consequences are direct, it means they are contained in the situation where the act is taking place, and they can be dealt with privately in that situation. If consequences are indirect, however, they spread beyond the boundaries of the immediate situation in which the act takes place. The result is that a public needs to be formed in order to take care of the indirect consequences. One might take pollution as a simple example: If a father is burning garden waste, and the direct consequence is the smoke prevents his children from playing outside, it is a private matter to put out the fire or postpone any playing outdoors. However, if the father is also burning material that contains toxic chemicals, his actions might have the indirect consequence of polluting the air in the whole neighborhood. In this situation, a public is needed to a) sort out the consequences, e.g. by putting up air quality measurement instruments and b) legislate so that the father will hesitate to repeatedly release toxics into the air.
The important thing to notice here is that the apparently private act of burning waste needs to be qualified as having public consequences in order to be controlled. A process of inquiry has to take place, the result of which is that it becomes clear the burning of waste is in fact polluting. The act of burning waste is now seen in an entirely new light: It is no longer a mundane everyday event, but an unacceptable practice.
In my work on social media groups, I have noticed how infrastructures like Facebook groups sometimes come to serve this purpose of qualifying acts as having indirect consequences. One case I have studied is the use of open Facebook groups during a severe snowstorm on the Danish island of Bornholm [2]. People in the more rural parts of the island were snowbound for up to a week, something few of them were prepared for. In dealing with the situation, some inhabitants not only searched for information and tried to help each other, but also questioned whether the authorities on Bornholm had done enough to remove the snow from their roads. This concern formed the rationale behind the Facebook group that became an important meeting point for a couple of hundred snowbound islanders.
With Dewey we might come to understand this Facebook group on Bornholm as contributing to democracy in the pragmatist sense in so far as the members collectively qualified their snowstorm troubles as not only a result of forces of nature, but also of the (in)actions of the local authorities. The snowbound Facebook users came to understand their situation as also an indirect consequence of the authorities’ act of not removing more snow. Importantly, the members of the Facebook group did not ask the authorities to do the impossible, but rather to understand the situation of people in the rural parts of Bornholm. The rural dwellers felt overlooked and misunderstood. In order to compensate for that, they used Facebook to share updates about the snowstorm and its consequences from their perspectives.
This is where the story becomes interesting for a discussion of privacy. The Facebook users on Bornholm had to share private stories, photos, and videos in order to qualify their situation as a public issue of uncontrolled indirect consequences (of the inaction of the authorities). In other words, the sharing of personal accounts formed the basis of public engagement. Revealing private content, even unintentionally, can have productive consequences for how other people understand the issues they struggle with. Powerful “we-identities” can be built around deeply personal stories that people can relate to.
Such we-identities can then provide the self-confidence needed to take action on issues, as when the snowbound Facebook users on Bornholm worked together to attract national media attention to their situation and help each other in other, more mundane ways. What is more, democracy does not necessarily hinge on rational deliberation in an abstract public sphere. Rather, public engagement happens when people work to qualify their personal problems as public issues. What I have observed is that social media platforms like Facebook provide interesting ways of making such moves.
However, we are not used to thinking about the Web in these pragmatist terms. Rather, our ideas about how the Web should work are informed by other kinds of philosophies. Let me highlight two widespread concerns with how the contemporary Internet works.
First, some fear that too much is revealed too easily on the Web. The notion of having privacy only makes sense in relation to its counterpart—the notion of exposing something private in public, that is, a breach of privacy. Here, public simply means “visible” and private simply means “hidden.” Or to put it in terms of control, public is taken to mean “deliberately revealed” while private means “only shown to a select audience.” Such an understanding of the public/private distinction is highly intuitive and indeed captures many of the concerns with user privacy in the age of social media.
Second, some fear that the Web is harmful to democracy. These concerns have to do with the notion of public space as something fundamentally important in a democracy. The concern here is not that we are too exposed by our social media content, but quite the opposite: What happens on social media tends to be “too private” to qualify as public deliberation. Here there is a tendency to talk about public and private in binary terms. A popular way to describe this is to use Cass Sunstein’s term “echo chambers,” which describes how the groups we participate in on social media tend to confirm our own (read: biased) beliefs rather than challenge them. This is not conventionally seen as a good dynamic, since democracy is understood as dependent on the clash of divergent opinions in an open public space.
The two sets of concerns—privacy and democracy—are key to the way scholars in Web science and related disciplines try to capture the current Web as a domain of “networked publics.” The concerns stem from understanding the relationship between the private and the public domains as fundamentally problematic. Concerning privacy, the thinking goes, it is a good thing that people can share content from their everyday lives with each other through the Web, but people need to be able to control where the line between public and private is drawn. Concerning democracy, it is a good thing that people get new ways of engaging through social media, but if people are not entering into a dialogue with opposing viewpoints, the contribution to democracy is far from clear. In the worst case, social media might distract people from engaging in the “proper” public deliberation that is seen as key to any real democracy.
My question is: Are these dilemmas inevitable? One way to find out is to investigate the assumptions on which they rest. For this purpose, it is useful to go beyond understanding public/private as meaning visible/hidden, but also take into account the normative relationship between the notion of public space and democracy, as just introduced. The reason is that these normative ideas draw on well-known political philosophies that understand “the public” and “the private” as different spheres that should ideally be kept separate. This idea is the source of our dilemmas about privacy and democracy on the Web, and there are at least two prominent ways of arriving at it.
“People are the best versions of themselves in their private spheres.” This is a key idea in liberal political philosophy. The logic is that the state should be of minimal size, only just large enough to secure the fundamental rights of its individual citizens. Apart from protecting basic freedoms and safety, the public sector should stay out of the private sphere. In a comparative perspective, this kind of liberal logic is more prevalent in the U.S. than in Europe, but it offers an extremely influential argument for why the private and the public should be kept separate.
“People are the best versions of themselves in the public sphere.” This is a notion central to republican political philosophy. This set of ideas turns the liberal logic on its head, but has the same result of imposing a strong public/private distinction. According to the republican ideal, private interests need to be contained in order to achieve the public goods that benefit everyone. The logic is that being a good citizen means shoving private interests aside in order to make space for communal interests. This kind of logic is arguably more widespread in Europe than in the U.S., but it certainly also exists in both places, offering another influential argument for why the private and the public need to be kept apart.
While these two powerful logics seem mutually exclusive, they coexist in practice. How is this possible? Apart from the fact that humans seem to have an amazing capability of uniting in practice what seems hopelessly divided in the abstract, the coexistence of liberal and republican ideas is arguably made possible through the idea of free-market capitalism. The logic I am thinking about is the idea that while we might be egoists when we pursue our private interests, the “invisible hand” of the market ensures these individual pursuits are at the same time good for everyone. At a stroke (of an invisible hand), the liberal and republican ideals may coexist.
This is hardly the place to go into discussions about how capitalism justifies itself, or not. The point I want to make by venturing into political philosophy is merely that if it feels intuitively right that the public and the private are domains that should be kept separate, it is probably because of the powerful and widespread logics just described. These are important political philosophies that will no doubt continue to be central to the way we think about and practice democracy. However, by pointing to the philosophical origins of our ideas about public and private, it also becomes possible to see that there must be philosophical alternatives.
The alternative vantage point for thinking about the Web in terms of publics, which I find most fruitful, is that of pragmatism. One general insight pragmatist thought has to offer is that in practice, we never exist as isolated individuals. Even in our most private moments, we always draw on the habits, skills, and ideas passed over to us from others, and we constantly imagine the thoughts and reactions of others when we decide on a course of action. This means any ideal of perfect privacy must be taken with a grain of salt.
At the same time, importantly, there is also no collective level floating around over the heads of individuals, such as “society,” “democracy,” or “the public sphere.” In practice, it is always individuals who act. Even when we say we are all caught up in an uncontrollable process of globalization, for example, it always takes an individual to actually trade stocks across continents, and a prime minister or an activist to express concern that we have lost control over the global economy. The consequence of this pragmatist viewpoint is that when we talk about “the public” it never exists automatically, but is brought into being by a lot of connected individual acts.
The point I would like to reiterate is that the widespread intuition that the public and private should ideally be kept separate oversimplifies how these things work in practice. This is what I tried to illustrate with the case study of how Facebook groups were used in the Bornholm snowstorm. The philosophical assumptions lying behind the conventional view on privacy and democracy come from a specific place and might thus be replaced. One fruitful experiment might be to replace these assumptions with more pragmatist ones. Because while we should not accept uncritically what shiny, new Web technologies have to offer, we should also not discount the ways in which social media might make productive moves across the public/private divide easier for millions of regular users in thousands of specific situations.

References
[1] Birkbak, A. From Networked Publics To Issue Publics: Reconsidering the public/private distinction in web science. In Proceedings of WebScience ’13 (Paris, May 2-4). ACM Press, New York, 2013.
[2] Birkbak, A. Crystallizations in the Blizzard: Contrasting informal emergency collaboration in Facebook groups. In Proceedings of NordiCHI ’12 (Copenhagen, Denmark, Oct. 14-17). ACM Press, New York, 2012.

Biography
Andreas Birkbak is a Ph.D. fellow in the Techno-Anthropology Research Group at Aalborg University, Copenhagen. His work is on social media and technological democracy.

Copyright held by Owner/Author(s). Publication rights licensed to ACM $15.00


feature

Something Bad Might Happen:
Lawyers, anonymization, and risk
The line between personal and anonymous information is often unclear.
Increasingly it falls to lawyers to understand and manage the risks associated
with the sharing of “anonymized” data sets.

By Marion Oswald
DOI: 10.1145/2508970

If you wanted to predict the future, who would you call upon? An economist, a statistician, Nate Silver? A lawyer might not be high on your list. Yet when faced with questions of individual privacy and data anonymization, this is what lawyers are being asked to do. This article aims to illustrate how this is the case and consequently why lawyers need help from computer scientists.
LEGAL BACKGROUND
Anonymization presents lawyers with something of a challenge. Take the European Data Protection Directive, for instance. It applies to personal data, that is, any information relating to an identified or identifiable natural person. An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his or her physical, physiological, mental, economic, cultural, or social identity.
If data is personal, the organization that decides how and why the data is processed (the data controller) becomes subject to a number of (sometimes complex) duties and responsibilities designed to safeguard the data. If personal data can be converted into anonymized form in such a way that a living individual can no longer be identified from it (taking into account all the means likely reasonably to be used by anyone receiving the data), disclosure of information in this anonymized form will not be disclosure of personal data, and therefore those duties and responsibilities will not apply to the disclosed data.

IDENTIFIABILITY
Where personal data has been anonymized, a crucial question the courts often have to decide is whether this supposedly anonymized dataset in fact falls within the definition of personal data. Why then does this involve predicting the future? Because it involves an assessment of risk, or as David Spiegelhalter has put it, “the possibility that something bad might happen” [1]. This is usually deconstructed into “the likelihood of something happening and the impact if it actually does.” Then some attempt is made to quantify the magnitude of these two dimensions.
In its Code of Practice “Anonymization: Managing data protection risk,” the UK’s Information Commissioner’s Office advised that the Data Protection Act does not require the process of anonymization to be completely risk free—data controllers must instead mitigate the risk of re-identification until the risk is “remote.”
So when lawyers are presented with an anonymized dataset and asked “is it personal data?”, they have to assess the possibility of something bad happening; i.e., the likelihood of someone being able to re-identify an individual, and the harm or impact if that re-identification occurred. And tending to be conservative creatures, lawyers may be tempted to respond: Yes, it could happen, therefore there is a risk, therefore the data is personal data.

THE POSSIBILITY OF SOMETHING BAD HAPPENING
There are situations where the risk of re-identification is undoubtedly high: the likelihood of re-identification high, the potential for harm high, and the certainty factor high. Take the request under the UK’s Freedom of Information Act for details of disciplinary action taken against employees of the Magherafelt District Council in Northern Ireland. Should the Council disclose a schedule containing the penalty issued and the reason for the action, but which excluded the date of the action, and the gender, job title, and department of the employee?
No, said the Upper Tribunal Administrative Appeals Chamber [2]. The information was personal data, and it would be unfair to disclose it. But the information had been anonymized—why was it personal data? The issue was not whether the information was personal data in the hands of the Council, but whether it was personal data in the hands of the general public. A crucial question was whether the public could identify the individuals to whom the summarized schedule related.
The Tribunal considered evidence that the Council was a small authority with only 150 employees, all known to each other, in a district with a population of 39,500. The Council was likened to a family, with a high level of knowledge of each other’s affairs.
The Tribunal made use of the “motivated intruder” test; a motivated intruder being someone who has access to the Internet and public documents and would use investigatory techniques such as making enquiries of people likely to have additional knowledge. The requestor was an investigative journalist, and so might have been highly motivated to identify individuals using other information available and common investigative steps. The Tribunal concluded an investigative journalist “would have little difficulty in making the necessary enquiries which could lead to the identification of individuals subject to disciplinary proceedings,” particularly as the community was small and close-knit, and that identification would be all the more likely when the sanction was suspension or dismissal.

BIG AND OPEN DATA
The Magherafelt decision dealt with a relatively small set of data where prior or personal knowledge about a particular individual may already have existed or could have been obtained. The UK Government’s Open Data agenda is particularly concerned with regional or national datasets where the likelihood of personal knowledge having an impact on re-identification risk is minimized. How should we assess the risk of re-identification in relation to such datasets?
Paul Ohm, in his paper “Broken Promises of Privacy,” argued “re-identification science exposes the promise made by [privacy/data protection] laws—that anonymization protects privacy—as an empty one” [3]. Ohm highlighted “release-and-forget” anonymization, with generalized rather than suppressed identifiers, as of particular concern. He also pointed out that other “data fingerprints” such as search queries or social media postings can be combined with anonymized data to attempt re-identification. However, others disagree with Ohm’s view that re-identification can be achieved with “astonishing ease.” In her new guidance, “Looking Forward: De-identification Developments—New Tools, New Challenges,” Ann Cavoukian, the Information and Privacy Commissioner of Ontario, Canada, restated her opinion that re-identification “is not easy” and that the most significant privacy risks arise from ineffectively de-identified data. Commenting on big data, she said: “As masses of information are linked across multiple sources it becomes more difficult to ensure the anonymity of the information” [4]. On the other hand, big data could make de-identification easier to achieve: “smaller datasets are more challenging to de-identify as it is easier to be unique in a small dataset,” as seen in the Magherafelt case discussed previously.
So who should we believe?

TRUST, RISK, AND ANONYMIZATION STUDIES
“…[O]ur views of the facts about big risks are often prompted by our politics and behaviour, even as we insist that the rock on which we build our beliefs is scientific and objective, not the least bit personal” [5].
In Nick Pidgeon’s view, emotional responses are very important in the assessment of risk. “If you do not trust the parties who manage the risk, you are not likely to have confidence that the risk is being safely managed” [1]. Kieron O’Hara has said “trust is an important risk and complexity management tool…The stronger X’s trust, the higher the degree, and the greater the risk he is willing to take” [6].
A recent Ipsos MORI study of Public Understanding of Statistics examined how much trust the participants had in information provided by certain categories of people. Information provided by scientists was the most trusted (28 percent trusted the information “a great deal” and 46 percent “a fair amount”) compared to politicians (1 percent “a great deal” and 7 percent “a fair amount”) [7].
And so we might expect scientific studies of anonymization to increase public (and decision-makers’) understanding of the risks of re-identification. The opposite may sometimes be the case.
Trust cannot fail to be affected by what Daniel Barth-Jones has described as “anxiety-inducing media storms” over recent re-identification research and demonstrations. He has argued “many, if not most, re-identification demonstration attacks, particularly because of the way their results have been reported to the public, serve to inherently distort the public’s (and, perhaps, policy-maker’s?) perceptions of the likelihood of ‘real-world’ re-identification risks” [8]. He analyzed a recently reported DNA hack carried out by Yaniv Erlich’s lab, which had been reported using such headlines as “DNA hack could make medical privacy impossible.”
Barth-Jones’s analysis of the study concluded only 6 percent of the U.S. population was at risk of having their last name correctly guessed by the method used (which excluded all females), and that this was not equivalent to re-identifying them. Additional demographics could be used to attempt a unique identification, with Erlich’s study estimating 17-18 percent of males in the U.S. might be unique with regard to the combination of surname, age in years, and state of residence and potentially re-identifiable. In a real-world implementation, however, the intruder would not know whether the last name was correct or a false positive. Barth-Jones also pointed out this research targeted a population sub-group with Mormon ancestry, already at increased risk of re-identification due to their participation in other projects. While not downplaying the likelihood that the risks associated with this attack will increase over the next decade, Barth-Jones commented on the impact of fear on the ability to assess risk rationally: “When a re-identification attack has been brought to life, like some Frankenstein monster, our assessment of the probability of it actually being implemented in the real-world may subconsciously become 100 percent, which is highly distortive of the true risk/benefit calculus that we face.”
Even this non-techie author could identify discrepancies between some media reporting of Latanya Sweeney’s Personal Genome Project re-identification exercise and the reality as expressed in the accompanying paper [9]. It was not the case, as reported in one article, that the study “elucidated the genome” of more than 1,000 participants of the Personal Genome Project or that 84–97 percent of participants were accurately re-identified. Sweeney’s paper disclosed some Genome Project participants had volunteered demographic data, such as date of birth, gender, and ZIP code. In addition, some documents that had been uploaded from outside sources were found to contain the participant’s name or nickname. Sweeney’s study only analyzed about half the public profiles available (579 out of 1,130), which were the ones disclosing date of birth, gender, and five-digit ZIP. Re-identification was attempted using a sample of voter registrations and an online public records website. The tests yielded 241 (42 percent) names that might match to a profile. These were submitted to the Project and it was confirmed 84 percent had been matched correctly (increasing to 97 percent allowing consideration for possible nicknames).
Sweeney’s paper noted that, to reduce the risk of re-identification, the participant could make the date of birth and ZIP code information less specific, and (a rather obvious step it must be said) remove his or her name from uploaded documents. Sweeney and her co-authors did not deal with what Barth-Jones has termed “the myth of the perfect population register,” which says without a complete and accurate population listing, an intruder could not be certain whether the name was a correct identification or a false positive, unless additional information was available about the individual [10]. “Some people will always be missing from any easily obtained source of data” and Barth-Jones has argued studies often miss the step of assessing the impact of “the myth of the perfect population register,” thus making highly conservative estimates (potentially overestimating) of the true re-identification risks.

TIME AND ONLINE CONTENT
In the debate over re-identification risk, it is also relevant to draw attention to recent scholarship on the impact of time on privacy and information lifecycles. Contrary to the common view that information posted on the Internet will be there to haunt the individual forever, in her paper “It’s About Time,” Meg Ambrose considered studies that have shown that 59 percent of Web content disappears after one week, and 85 percent after a year (which is in itself concerning from an historical records perspective). Ambrose commented “it is possible that content can be easily accessible for a very long time, but permanence does not, at this point, appear to be a pervasive threat to most” [11].
But as data’s online permanence is not yet predictable, it ought for the time being to remain a factor. Contributions from computer scientists will be essential to the on-going debate on how this factor, amongst others, should feed into the assessment of re-identification risk.
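Cavoukian’s remark that “it is easier to be unique in a small dataset,” and Sweeney’s suggestion to make date of birth and ZIP code less specific, can be illustrated with a minimal sketch. It counts how many records share each combination of quasi-identifiers, the quantity behind k-anonymity (a standard de-identification measure that the article does not itself discuss). The records, the field choice (date of birth, gender, five-digit ZIP), and the generalization rule (birth year plus a three-digit ZIP prefix) are invented for illustration only; a real assessment would also have to weigh the motivated-intruder and population-register questions discussed above.

```python
from collections import Counter

# Hypothetical toy records: (date of birth, gender, five-digit ZIP code).
RECORDS = [
    ("1970-03-14", "F", "02139"),
    ("1970-07-02", "F", "02138"),
    ("1970-11-23", "F", "02141"),
    ("1982-05-30", "M", "02139"),
    ("1982-01-09", "M", "02142"),
    ("1991-08-17", "F", "94110"),
]


def generalize(record):
    """Coarsen the quasi-identifiers: birth year only, 3-digit ZIP prefix."""
    dob, gender, zip_code = record
    return (dob[:4], gender, zip_code[:3])


def unique_fraction(rows):
    """Share of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def min_group_size(rows):
    """Size of the smallest group sharing one combination (the k of k-anonymity)."""
    return min(Counter(rows).values())


if __name__ == "__main__":
    coarse = [generalize(r) for r in RECORDS]
    print(f"unique before generalization: {unique_fraction(RECORDS):.0%}")  # 100%
    print(f"unique after generalization: {unique_fraction(coarse):.0%}")  # 17%
    print(f"smallest group after generalization: k = {min_group_size(coarse)}")  # k = 1
```

On this toy data every raw record is unique, while after coarsening only the single 1991 record remains unique, so the smallest group size is still 1. In a dataset this small, a record with no similar neighbors stays exposed even after generalization; a larger dataset would be more likely to contain others sharing that combination.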
CACM_JOCCH_one-third_page_vertical:Layout 1 7/30/09 5:50 PM Page 1

FINAL THOUGHTS
We will all be aware of the arguments in favor of anonymization: It protects privacy while allowing information to be used for important secondary purposes, for instance monitoring the quality of healthcare. More could be done to increase transparency about the methods, risks, reasons for and benefits of anonymization, thus contributing to genuine understanding and potentially reducing the fear factor bound to affect not only lawyers but others involved in taking decisions about data disclosure. “People need full information and guidance for action, rather than just reassurance, and their concerns must be taken seriously,” wrote Blastland and Spiegelhalter [5].

But we cannot be complacent. Decisions to release anonymized data must be taken with great care, taking into consideration the latest anonymization techniques and risk-assessment procedures, and continually assessing the changing risk environment. Cavoukian noted that the increase in genetic research, including the trend toward large-scale biobanks, poses new privacy risks: “Improved methods for the de-identification of genome sequences or genomic data are needed” [4]. In addition, it may only be a matter of time before a re-identification risk is created by the open and uncoordinated release, by separate public bodies, of two similar or identical datasets, one anonymized effectively, the other not. And what about re-identification risks that may be created by personal data disclosed by a breach?

Courts faced with a dispute over a proposed or existing release of an anonymized dataset will increasingly be called upon to assess the robustness of the risk assessment. But what will be judged an acceptable risk? 10 percent, 2 percent, 0.001 percent? How do percentages equate to “remote” or “minimal” risks? The question of whether data is, or is not, personal data is ultimately a legal one, but lawyers need context in order to tackle it; it is not something that lawyers can do alone.

As Kieron O’Hara puts it, “…data protection is not sufficient for preserving privacy, or public trust, or indeed the usability of data, and the right discussions should be more wide-ranging, including not only lawyers but also representatives of all interested parties in the domain, those demanding the data, the data controllers (who undertake the risks of publication), domain experts (who know about the power of data in that domain, and the potential harms), and technical experts” [6]. In questions of anonymization, this is sound advice.

References
[1] United Kingdom. House of Commons. Oral Evidence Taken Before the Science and Technology Committee. Risk Perception and Energy Infrastructure. Uncorrected Transcript of Oral Evidence. 18 January 2012.
[2] Information Commissioner v. Magherafelt District Council [2012] UKUT 263 AAC. Available at: http://www.osscsc.gov.uk/Aspx/view.aspx?id=3536
[3] Ohm, P. Broken promises of privacy: Responding to the surprising failure of anonymization (August 13, 2009). UCLA Law Review 57 (2010), 1701; U of Colorado Law Legal Studies Research Paper No. 9-12. Available at SSRN: http://ssrn.com/abstract=1450006
[4] Cavoukian, A. Looking forward: De-identification developments—new tools, new challenges. Office of the Information and Privacy Commissioner of Ontario. May 2013. (PDF)
[5] Blastland, M. and Spiegelhalter, D. The Norm Chronicles. Profile Books, London, 2013.
[6] O’Hara, K. Transparency, Open data and trust in government: Shaping the infosphere. In Proceedings of the 3rd Annual ACM Web Science Conference (Evanston, IL, Jun. 22–24). ACM Press, New York, 2012.
[7] Ipsos MORI Public Understanding of Statistics Topline Results. Royal Statistical Society. April 2013. (PDF)
[8] Barth-Jones, D. Public Policy Considerations for Recent Re-Identification Demonstration Attacks on Genomic Data Sets: Part 1 (Re-Identification Symposium). Bill of Health (blog), Harvard Law School, May 29, 2013.
[9] Sweeney, L., Abu, A., and Winn, J. Identifying Participants in the Personal Genome Project by Name. Harvard University. Data Privacy Lab. White Paper 1021-1. April 24, 2013. (PDF)
[10] Barth-Jones, D. The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now (June 4, 2012). Available at SSRN: http://ssrn.com/abstract=2076397
[11] Ambrose, M. L. It’s About Time: Privacy, information life cycles, and the right to be forgotten. Stanford Technology Law Review 16, 2 (2013). (PDF)

Biography
Marion Oswald is a practicing solicitor and Head of the Centre for Information Rights at the University of Winchester. Before joining the University, Oswald worked in legal management roles within private practice, international technology companies and UK central government, including the Ministry of Defence, and specializes in data protection, freedom of information and information technology.

Copyright held by Owner/Author(s).
Publication rights licensed to ACM $15.00



feature

Personal, Pseudonymous, and Anonymous Data:
The Problem of Identification
Why defining what counts as personal data is important
for data protection and information sharing.

By Iain Bourne
DOI: 10.1145/2508972

Data protection (DP) law has been around in one form or another for around 40 years. In the United Kingdom, the current DP law dates from 1998, based on a directive adopted in 1995 that had been in gestation since the early ‘90s. A lot has happened since then. Think of the information technology and resources that were around almost 20 years ago, and compare them to the ones you have now. Back then, you probably kept an address book of friends and acquaintances on a slow, non-networked home PC; had physical files relating to your own commercial affairs; held records relating to family members’ health checks, school reports, and employment; and had a basic mobile phone containing individuals’ contact details and maybe a ping-pong game, if you were lucky.

Nowadays, you can have a host of email addresses and Twitter accounts under a variety of different names; maintain a social networking account to make contact with people all over the world; store your pictures and videos in the cloud; sell your unwanted birthday presents on an e-commerce site; and use your mobile phone to share geo-location data with friends so it’s possible for you all to meet up in a local café. In amongst this activity lies a wealth of personal data, and we need to ensure such data is used and managed safely. This is where DP law comes in.

DP law does good things for citizens and should make sense for organizations. In short, it means your personal data has to be used fairly and lawfully, and organizations must store it securely, have to be open about what they do with it, and have to give you access to it when you ask. That’s about it really—so far so good. However, it’s not quite that simple, and there is currently a great deal of debate within the DP community about what sort of information is “personal data” and, therefore, about the scope of DP law. This is currently coming to a head as work on our next iteration of DP law—the proposed DP Regulation—continues in Europe. This is not solely an academic matter; it has great real-world consequences for the rights and protections we enjoy as individuals and for what organizations have to do to comply with the law. The problem is that we have data protection law, and will continue to have data protection law, but it is (arguably) becoming less clear what sort of information the law applies to. And, of course, the law doesn’t work very well unless it is clear what it applies to.

THE PERSONAL DATA PROBLEM
Put simply, personal data is information that identifies someone or can allow someone to be identified when combined with other information. On the face of it, that’s a fairly straightforward definition that should make it easy to say whether or not a particular piece (or set) of information counts as personal data. There is no problem saying information related to your health record, tax contributions, or bank account is your personal data. The information identifies you and relates to you, so that’s easy. However, there are two reasons why the situation around the peripheries of personal data can be far less clear. First, there are now lots of different ways of identifying people—this is no longer just about real-world or civic identification. Second, the question is not only about whether information does identify someone, but also whether it could identify someone.

To look at the first point, what does it mean to “identify” someone? Imagine you go online and use, say, a major search engine anonymously, i.e., without being signed in through your email account. It is clear the service provider has no “real-world” information about you, such as your name, address, or phone number. However, it is also clear that you are being identified, but in a different way. The personalization of your browsing experience and the behavioral advertisements you receive are evidence of this. This all works through the IP address that allows the website to recognize you—or more precisely the device you use to go online—coupled with cookies and other data linked to that IP address. I would argue this operation involves the processing of personal data and should be covered by DP law, albeit in a modified way. This is just one illustration of an alternative or “non-obvious” form of identification, a phenomenon that was not anticipated when current DP law was drafted.

I think the example above is fairly straightforward. However, let’s think




about identification in the context of, say, a photograph of a crowd of people. It could be a shot entitled “New Year’s Eve revellers in Trafalgar Square” published in a newspaper. The publisher has no interest in identifying any of the people in the crowd and has no means of doing so. However, it is certainly possible to single out one person from another, and there is no doubt the people in the picture would be able to identify themselves, as would people who know them. A current stream of thought argues the information is personal data if one individual can be singled out from another. This is a view that holds considerable sway in some European Union member states’ DP authorities. However, there is currently much debate about this within DP circles.

Recall that the second part of the personal data problem is whether information could identify someone. Clearly, this can be very difficult to assess with any real objectivity. Let’s return to our New Year’s Eve revellers example. The publisher of the newspaper may not wish to identify (and may not even be able to identify) anyone in the crowd, but imagine there was a terrorist incident in Trafalgar Square shortly after the photograph was taken. The publisher later hands over all of its photographer’s shots to the police for investigation. If the police review pictures of known suspects and use a televised appeal to the public for assistance, it becomes much more likely that some, or all, of the individuals in the original picture will be identified in the old-fashioned, real-world sense of being named. This brings the photograph much more firmly within the scope of personal data. This also illustrates the current debate about whether the test for information being personal data is identification, the reasonable likelihood of identification (the formulation in current DP law), or identification and the possibility of identification. Some argue the latter formulation would provide better protection for individuals and would “catch” significant information around the peripheries of the current definition of personal data. Others argue it would widen the scope of DP law unacceptably and would bring personally inconsequential information within its scope.
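The pseudonymous data discussed in the next section is often produced by replacing a real identifier with a hashed code; the article’s own example is a “research cohort reference number” generated from names with “a one-way irreversible encryption algorithm.” The article does not name the algorithm, so this minimal sketch assumes SHA-256 and HMAC-SHA256 from Python’s standard library purely for illustration; the helper names, the candidate names, and the key are hypothetical. It shows one concrete reason why the status of a pseudonym can depend on how it is produced: an unkeyed hash of a name can be recomputed by anyone holding a list of candidate names, whereas a keyed hash cannot be recomputed without the secret key.

```python
import hashlib
import hmac

# Hypothetical sketch: two ways to turn a name into a pseudonym.
# Neither is claimed to be the method behind the article's Table 1.


def normalize(name: str) -> bytes:
    """Lower-case the name and collapse whitespace before hashing."""
    return " ".join(name.lower().split()).encode("utf-8")


def plain_pseudonym(name: str) -> str:
    """Unkeyed one-way hash: anyone can recompute it from a guessed name."""
    return hashlib.sha256(normalize(name)).hexdigest()


def keyed_pseudonym(name: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): recomputing it requires the secret key."""
    return hmac.new(key, normalize(name), hashlib.sha256).hexdigest()


def dictionary_attack(pseudonym: str, candidate_names, pseudonymize):
    """Return the candidate names whose pseudonym equals the target."""
    return [n for n in candidate_names if pseudonymize(n) == pseudonym]


candidate_names = ["John Doe", "Jane Smith", "Joe Bloggs"]
secret_key = b"held-by-the-data-controller"  # hypothetical key

target_plain = plain_pseudonym("Jane Smith")
target_keyed = keyed_pseudonym("Jane Smith", secret_key)

# An outsider with a list of names can recover the unkeyed pseudonym...
print(dictionary_attack(target_plain, candidate_names, plain_pseudonym))
# ...but hashing the same names without the key never reproduces the keyed one.
print(dictionary_attack(target_keyed, candidate_names, plain_pseudonym))
```

The first call finds “Jane Smith”; the second finds nothing. If the controller later destroys the key, even it can no longer recompute the mapping, which is one way the link between data and individual can be broken. Whether the result is then anonymous still depends, as the article argues, on what other linkable information exists.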




PSEUDONYMOUS DATA
The English word “pseudonym” derives from the Greek word pseudōnumon, via the French pseudonyme. In Greek it means “false name,” a meaning that still corresponds extremely closely to its ordinary English one. In one respect, a pseudonym is a privacy-enhancing construct intended to prevent individual identification, rather than to facilitate it. However, as is often the case in DP law, things aren’t that straightforward. A pseudonym may indeed be a “false” name, but it is still a name and can still be used to single one person out from another.

Let’s consider the use of an alias, such as a nom de guerre. When Carlos the Jackal committed his crimes he used his alias to a) hide his real identity—Ilich Ramírez Sánchez—from the authorities, and b) let the public know he was responsible for the various actions they were reading about in their newspapers. So, a pseudonym can be both a way of hiding identity and, at the same time, a means of revealing it in a different way. This lies at the heart of the pseudonymous data problem, which has translated into a complex current debate about whether a pseudonym is (potentially) personally identifiable—and is therefore personal data—or can be anonymous (and is not, therefore, personal data). The Information Commissioner’s Office, the UK regulator of DP law, would argue it can be either, depending on how the pseudonym is produced and the context in which it is used. However, others reject this view and argue that if a pseudonym allows an individual to be singled out from another individual, then the information is personal data. And that’s that. However, let’s look at an example to see how this plays out in practice.

Table 1 shows a redacted version of a set of personal data that is being used by researchers to examine the relationship between the receipt of a state benefit and a person’s weight. The original dataset—which also included research subjects’ names, addresses, and dates of birth—has been irreversibly deleted. The “research cohort reference number” was generated from individuals’ names using a one-way irreversible encryption algorithm. This is the sort of dataset commonly used to conduct longitudinal studies into health and other research or analytics, where the objective is to study information about particular individuals without identifying any of them. In this example, it is clearly possible to single out one individual from another, but is any individual really identified? Does this table contain any individual’s personal data? I would argue it does not, because although each row relates to a particular individual, it does not identify any of them, and nor could any individual ever be identified because the original source data no longer exists. (In reality this type of dataset is normally given additional protection through value-swapping, perturbation, blurring, and other techniques, meaning the information remains valuable to researchers but is not—and cannot be—used to identify anyone.)

However, others would disagree with this analysis and say the dataset contains the personal data of the five “singled out” individuals and, in fact, data like this can always be linked back to an identified individual and is therefore always personal data. This raises the question of how the research organization would grant subject access (the statutory right of access to your personal information) to an individual, or how it would tell individuals that it has their personal data, as it could be required to do under DP law. The practical problems of an overly wide definition could be very great.

Table 1: A fictional example of redacted personal data.
1. Name, address, date of birth | 2. Period of Special Assistance Benefit | 3. Body Mass Index | 4. Age Range | 5. Research Cohort reference number
(redacted) | < 2 years | 21 | 40-45 | QA5FRD4
(redacted) | > 5 years | 19 | 50-55 | 2B48HFG
(redacted) | < 2 years | 20 | 40-45 | RC3URPQ
(redacted) | > 5 years | 23 | 45-50 | SD289K9
(redacted) | < 2 years | 20 | 45-50 | 5E1FL7Q

PSEUDONYMOUS DATA AND THE NEW DP REGULATION
There is a lot of discussion at the moment about the possibility of introducing a new class of pseudonymous data into the proposed DP Regulation. The argument seems to be that introducing a new sub-class of something akin to potentially identifiable personal data would extend the regulation’s coverage, offering better protection to individuals while also providing more “lite” regulatory coverage for organizations processing this form of data. However, as we have seen, the issue of what a pseudonym is, and of whether pseudonymous data is a form of personal data or is anonymous, is far from straightforward. There seem to be three basic meanings in circulation:
1. Pseudonymous data is data where a “real” identifier—such as somebody’s name or National Insurance number—is replaced by a “false” identifier such as a hashed code number. This is a privacy-enhancing technique used in contexts such as medical research or online analytics. It allows individuals to be tracked longitudinally without their identities being revealed. However, the link between the data and the individual could be permanently and irreversibly broken, although there is much debate about the possibility of this. This corresponds most closely with the ordinary English meaning of pseudonymous.
2. Pseudonymous data is an alternative form of personal identification, such as when an online services company uses an IP address and associated cookie logs to target content at a particular device user.



3. Pseudonymous data is data that could potentially be combined with other data to produce personal data. This could apply to data from which all personal identifiers have been removed and where there is no reasonable likelihood of re-identification. This is what, in the UK at least, would currently be considered as anonymous data.

This situation is complex, particularly because the process of pseudonymizing personal data can result in either anonymous data or a different form of personal data altogether (see meaning no. 2). This depends on how the pseudonymization is done, whether it is combined with other privacy-enhancing techniques, and what other linkable information is available elsewhere. I would argue some pseudonymous data can be anonymous, whilst other pseudonymous data is a form of personal data. Clearly, these subtleties make it difficult to translate the concept of pseudonymous data into a legal text and to map relevant exemptions and moderations of the law’s provisions onto it.

WHAT IS ANONYMOUS DATA?
If we have a schema that consists of ordinary personal data and pseudonymous personal data, we would also need a definition of anonymous (or non-personal) data to describe individual-level or other data that is not being, and cannot be, used as any form of identifier—obvious, non-obvious, or otherwise. A good example of this might be individual-level data derived from patients’ records. This data should be manipulated in such a way that it can never again be linked to a particular individual or used to inform any decision in respect of any individual. In a context such as this, unlike the use of cookie and IP information discussed above, there is no intention—or, depending on system design, possibility—of using the information in such a way that it can have any effect on any particular individual.

A formal definition of anonymous—or non-personal—data could presumably be reverse engineered from the DP Regulation’s definition of personal data. It might say that:

Anonymous data is data that does not identify any natural person, or allow any natural person to be identified, directly or indirectly, by means reasonably likely to be used by the controller or by any other natural or legal person.

We believe DP law must be finite; there must be a point at which we can say with a reasonable degree of certainty that this information is not personal data and can be used and shared outside the constraints of DP law. This is extremely important if socially desirable activity, like medical research, is to proceed with the certainty that the pseudonymous data in use—when protected by sufficient privacy-enhancing techniques and technical/organizational arrangements—is anonymous.

WHY DEFINING PERSONAL DATA MATTERS
There are a number of important social policy objectives which mean that the status of personal data, and information derived from personal data, is crucial.

First, modern societies run on information, and much of it is actual personal data or information derived from it. Examples include public health, evidence-based policy, service planning, economic forecasting, and just satisfying the basic human need to know more about ourselves and society. This manifests itself most obviously in initiatives like Open Data and the duties placed on some organizations to publish a whole lot of information about what they do. In the health service, for example, published information is ultimately derived from patients’ records, doctors’ prescription data, and so forth. Second, a great deal of information released under the U.K.’s Freedom of Information law consists of—or is derived from—personal data. We need to make sure we have data protection law that doesn’t reduce the availability of information but makes sure that we, the people, are properly protected and that we have a transparent society without becoming transparent citizens. An overly wide definition of personal data and an overly restrictive application of the law could take our society backward in terms of access to information.

Well-drafted and well-regulated DP law can also give us an appropriate degree of privacy and control over our personal information. DP law becomes more socially relevant—and necessary, I think—given the different ways in which the identification of individuals now takes place, as well as the sheer scale and sophistication of state and corporate data processing. It is therefore important that current and future DP law is sufficiently wide in scope to protect and give rights to individuals in respect of new forms of identification—ones that make our names and addresses seem decidedly old-fashioned. However, there are some difficult legal and technological issues here. It is by no means clear to what extent rights like subject access can be delivered in respect of the “non-obvious” or pseudonymous data described above where, in reality, service providers do not really know the identity of anyone whose personal data they process. This needs some serious thinking-through as work on the new DP Regulation continues. It is clear individuals need rights that can be delivered in practice, and that organizations cannot be required by law to do what is impossible in practice. Such a requirement would bring data protection into disrepute and make it less relevant to the lives of ordinary people.

Biography
Iain Bourne is group manager for policy delivery at the Information Commissioner’s Office (ICO), an independent regulatory body dealing with data protection and privacy law in the United Kingdom. Bourne has dealt with a wide range of areas, including health service compliance, emerging technologies, international issues, and privacy at work. During the last few years he has headed up the ICO’s work on information sharing, privacy notices, personal information online, and anonymization. He is currently involved in the ICO’s work on the European Commission’s proposal for new data protection regulation.

Copyright held by Owner/Author(s).
Publication rights licensed to ACM $15.00



feature

Talking ‘Bout Your Reputation
People think they want anonymity, but actually desire privacy.
But how do we reframe the debate surrounding privacy and security?
Perhaps technology is the answer.

By David Birch
DOI: 10.1145/2517998

There are sound rationales both for and against anonymity [1], and these are brought into sharp relief by the combination of social networking, mobile phones, online business, and government. Real issues ranging from online bullying to activism under repressive regimes mean we as a society need to think about what we want from the emerging infrastructure and how anonymity should work, or even exist, within that infrastructure.
Photograph by Vyacheslav Pokrovskiy

I imagine if you were to walk down a typical street with a clipboard and ask members of the general public whether they think online anonymity is important, most would say “yes.” They would say it is important that they can pay for something in cash without it being tracked and traced by the government. They would say it is important that they can vote in a secret ballot without their choices being observed by party activists (or spouses). They would say it is important that they can log in to a website about trade unionism, or diseases, or pornography, or anything else they might not want other people to know they have been looking at. I take a guilty pleasure in occasionally visiting the Daily Mail website to read the readers’ comments on the major news stories of the day, but I certainly don’t want friends or work colleagues to know about that! However, if you set out clipboard in hand and asked people whether online anonymity should be stopped because it allows child pornographers, terrorists, and drug dealers to congregate in virtual space with impunity, they would similarly say “yes.” As I noted, there are sound rationales for and against. Perhaps I might generalize and say people want anonymity for themselves, but not for other people. So why do they want anonymity for themselves?

In the UK, there are some people who don’t want to register their Oyster card—the contactless “tap and go” mass transit card used in London—because they don’t want “the man” to know where they’ve been. I’m sure it comes as quite a surprise to such people that if they drop their Oyster card down a drain and lose it, they can’t get the money back by phoning up Transport for London. I imagine the next time that person gets an Oyster card, one of the very first things they will do is register the card online so that, if it is lost, stolen, or broken, they can get their money back.

I would argue the Oyster card actually represents a very good way of balancing privacy and security. When you use your Oyster card in London, the record of the journey is kept for a few weeks and then it is anonymized for use in statistical analysis and calculations. Thus if a crime is committed somewhere on the system, the police can go to Transport for London with a warrant and ask for a list of all of the people who had been through such and such a barrier at such and such a time, provided it was within the last few weeks. What law enforcement can’t do, even if they were to access the system extra-legally, is trawl back through a non-anonymized historical dataset. I would have thought that most people would find this reasonable, but anonymity only sounds good to people who do not consider its implications.

Take voting, for example. In the UK we have a supposedly secret ballot, yet the ballot papers are numbered and can be traced back in the event there are suspicions of irregularities at the poll. You have effective anonymity at the ballot, but if there is reasonable suspicion of fraud, that anonymity can be removed. Here is another pointed example of something that sounds good but isn’t: cash. Is it right that a corrupt politician can take cash bribes and hide them in his freezer? It isn’t; he got 13 years [2]. Is it right that criminals can engage in kidnapping, tax evasion, or money laundering? Should people be able to engage in commercial transactions that can never be traced? At first glance it is very tempting to say yes because the alternative—which is a Big Brother government in a big data world eradicating any element of individual privacy—is so horrible. But I don’t think I want to live in a world that allows anonymous transactions, and nor does the U.S. government, given its recent actions against financial institutions with poor know-your-customer procedures. I think I want to live in a better world.

The way to resolve the tension between the perfectly reasonable arguments for and against anonymity is to reframe them in terms of privacy. We need an infrastructure that delivers privacy, not anonymity. Privacy is a good thing. Privacy permits individuals to express unpopular ideas to people they trust without having to worry about how society will judge them. It is vital to democracy and it contributes to the “marketplace of ideas” and the promotion of the truth [3]. It is central to our idea of democracy and expressed (certainly in the case of the U.S.) in the right to privacy by association [4].

It is not anonymity that society wants; it is privacy. Anonymity is a clumsy hack we have to use because we don’t have the proper control over information. In the absence of such controls, anonymity is really the only option open to individuals. But it’s not optimal for them and it is not optimal for society either. It annoys me when people—politicians especially—frame this discussion in terms of finding the right balance between privacy and security. I don’t want a balance between them; I want both of them, and I’m not sure the politicians and regulators understand the technology well enough to realize that modern communications and cryptography have the capabilities to make this possible.

Technology Can Do Better
Let’s take the payments example as an easy case study. My bank, Barclays, lets me choose any picture I like to go on my debit card. But let’s say they could just as easily let me choose any name I like to go on my debit card. I might decide that I’d like to have a debit card in the name of “Johnny Mnemonic.” I stroll down to Marks & Spencer and I buy myself a nice pair of trousers with my debit card. When it comes time to pay I put in my card and punch in my PIN. Job done. I take the trousers home, but when I go to put them on I discover that they have a hole in them. So I take them back to Marks & Spencer and I get a refund by presenting the same card. At no point during any of these transactions do Marks & Spencer need to know that I am actually Dave Birch. They don’t know who I really am, but—crucially—they know Barclays does.

It’s not as if I can do anything bad with that card and get away with it. If that card is used in some criminal enterprise of one form or another, the police can go to Barclays with a warrant and ask them who Johnny Mnemonic is. Barclays will, of course, tell them that it’s me.

This example makes a very general point. There are very few transactions any of us do that require the amount of identification we routinely provide. And if you look at the problem the other way round, an increased number of transactions that do require identification makes those identities less secure and more likely to be abused. There’s a kind of minimization principle at work. Microsoft’s Kim Cameron rather neatly encapsulated issues such as this in his “seven laws of identity,” one of which was that we should design an identity metasystem with minimal disclosure for constrained use [5]. In other words, the metasystem should provide only the information needed to enable the specific transaction at hand: You shouldn’t have to show a driving license with your name and address on it in a bar when you are actually trying to prove that you are over 21.

Let’s apply this to a more complex example of unpopular free speech using the prosaic case study of newspaper comments. If I read an interesting article in the newspaper, I might be very

34 XRDS • fall 2013 • Vol.20 • No.1


interested in reading comments from intelligent and informed readers. Yet some of those readers will not make honest comments if they are identified. I’m one of them. If I respond to an article in a UK newspaper, I want to be able to choose between responding as me (because my qualifications, experience, or position is relevant to the issue) or responding as Joe Bloggs, because although I think Ed Miliband would make an excellent Prime Minister, I know my boss hates him. However revolting some people’s opinions are, I’d rather know they are their opinions and not have them submerged.

Suppose someone who knows who I am (let’s say the Royal Mail, which knows where I live as well) gives me a digital certificate that only states I live in the UK. That’s good enough. I use it to get a Twitter account. Using the Twitter account, I tweet that Ed Miliband is a communist fellow traveler and an economic ignoramus with no experience of the real world. This is an opinion and free speech. Ed can complain as much as he likes, but tough. Now suppose I tweet his home address and suggest he should be beaten up. That’s against the law. Ed obtains a court order, and the Royal Mail tells him who I am and where I live. This way, privacy is managed correctly through the legal system.

Then What Will Happen?
Think about what is coming in terms of transactions. By and large the world that we will be living in will be a world in which transactions are between my phone and someone else’s phone, not between me and someone else. What seems complicated for individuals (storing keys, issuing certificates, and managing reputations) is trivial for apps. Modern smartphones are perfectly capable of dealing with digital signatures, trust chains, electronic identities, and similar constructs. Therefore, although you can turn up at my door claiming to be an employee of the electricity company, I can use my phone to read your phone, which lets me know whether that’s true or not. I do not need to know who you are, but I do need to know what you are. If you subsequently batter me over the head and steal my life savings, then the police can go back to the electricity company to find out who you are and contact the phone company to find out where you are.

This is what I’ve taken to calling “smash the glass” privacy. You have privacy so long as you behave, but if you do something bad then the authorities can “smash the glass” to sound an alarm. Of course, if you live in a country where you do not trust the authorities or the legal system, then we need to look at other solutions. How would a human rights activist in Iran obtain a digital certificate to prove they are a human rights activist, yet make it impossible to identify them? The technology to do this is easy (it is called “cryptographic blinding,” and it has been around for a generation [6]), but the metasystem is still not there.

It is possible, I think, to see the outline of a suitable metasystem taking shape with the efforts of the National Strategy for Trusted Identities in Cyberspace (NSTIC) in the U.S. and the Cabinet Office’s Identity Assurance Programme (IDA) in the UK, creating public/private frameworks in which a variety of identity providers can operate. In the UK, the government has announced the first eight identity providers (the Post Office, PayPal, Cassidian, Digidentity, Experian, Ingeus, Mydex, and Verizon) and, with the support of the trust framework industry body Open Identity Exchange (OIX), has begun a number of what they call Alpha Projects¹ to explore the issues around developing the IDA scheme.

The essential idea behind these kinds of initiatives is that not only can individual citizens, organizations, and businesses choose who they want as their identity provider (IDP), but the IDPs can also be separate from the attribute provider (AP) relevant to a transaction.

So, suppose I choose Barclays to be my IDP. I then log in to some service somewhere using my Barclays ID. The service needs to know I have a valid driving license. Barclays does not know this, but they know an AP who does (in the UK, the Driver and Vehicle Licensing Agency), so the bank can obtain the relevant attribute (with my permission) and then present it to the service.

Just as you have a few different payment cards in your wallet today, imagine in the future you will have a few different “identities”: a work identity, a personal identity, a hobby identity, a self-certified “John Doe” identity, and so on. While I am walking down the street, my phone will be set to the John Doe default identity. When I walk into a shopping mall, the shopping mall will know I am John Doe and that last time I came in I went to Starbucks; if the mall sends me a coupon for Costa Coffee, that’s absolutely fine. I sit down for a coffee and I use my hobby identity to post a few comments on a newspaper article about contactless payments (my hobby). When I go to Marks & Spencer, my Marks & Spencer app will open up and use my personal identity, with my permission. I will have an identity that is partitioned, with a simple remote control (my mobile phone).

So What?
I think the paradigm shift from transactions based on identity to transactions based on credentials represents a very straightforward way to handle privacy in a realistic way. There is no need to choose between anonymity and crime, or a police state and order. We do not need to choose between security and privacy. Technology can provide both.

¹ Please note that Consult Hyperion provides paid consultancy services in connection with one of these Alpha Projects.

References
[1] Marx, G. Identity and Anonymity: Some Conceptual Distinctions and Issues for Research. In Documenting Individual Identity: The Development of State Practices in the Modern World. Princeton University Press, Princeton, 2001.
[2] Barakat, M. William Jefferson, Ex-Congressman, Gets 13 Years in Freezer Cash Case. The Huffington Post (Nov. 13, 2009). Retrieved May 29, 2013.
[3] Solove, D. Anonymity and Accountability. In The Future of Reputation. Yale, New Haven, 2007.
[4] Landau, S. Politics, love and death in a world of no privacy. IEEE Security & Privacy 11, 3 (2013).
[5] Cameron, K. The Laws of Identity. Microsoft, pp. 12 (May 11, 2005).
[6] Chaum, D. Achieving Electronic Privacy. Scientific American (Aug. 1992).

Biography
David G.W. Birch is a director of Consult Hyperion, an IT management consultancy that specializes in electronic transactions. Here he provides specialist consultancy support to clients around the world, including all of the leading payment brands, major telecommunications providers, government bodies, and international organizations including the OECD. Before helping to found Consult Hyperion in 1986, he spent several years working as a consultant in Europe, the Far East, and North America. He graduated from the University of Southampton with a B.Sc. (Hons.) in physics.

Copyright held by Owner/Author(s).
Publication rights licensed to ACM $15.00
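Birch’s “cryptographic blinding” [6] is what would let an issuer certify a human rights activist without being able to recognize that certificate later. The sketch below shows the core of a Chaum-style RSA blind signature. It is illustrative only and rests on stated assumptions: a deliberately tiny, hypothetical textbook key (p = 61, q = 53) and no message padding, both of which a real deployment would replace with large keys and a proper padding scheme.

```python
# Minimal sketch of a Chaum-style RSA blind signature.
# ASSUMPTION: toy textbook key (p=61, q=53) chosen for readability only;
# this is NOT secure and omits the padding a real scheme requires.
import hashlib
import math
import secrets

P, Q = 61, 53
N = P * Q                            # public modulus (3233)
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # issuer's private exponent (2753)


def message_to_int(message: bytes) -> int:
    """Hash the message and reduce it into the range 0..N-1."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def blind(m: int) -> tuple[int, int]:
    """Holder hides m behind a random factor r: sends m * r^E mod N."""
    while True:
        r = secrets.randbelow(N - 2) + 2       # r in 2..N-1
        if math.gcd(r, N) == 1:                # r must be invertible mod N
            return (m * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Issuer signs without seeing m: (m * r^E)^D = m^D * r (mod N)."""
    return pow(blinded, D, N)


def unblind(blinded_signature: int, r: int) -> int:
    """Holder divides out r, leaving an ordinary signature m^D mod N."""
    return (blinded_signature * pow(r, -1, N)) % N


def verify(m: int, signature: int) -> bool:
    """Anyone can check the signature with the public key (N, E)."""
    return pow(signature, E, N) == m


if __name__ == "__main__":
    m = message_to_int(b"bearer is a verified human rights activist")
    blinded, r = blind(m)
    signature = unblind(sign_blinded(blinded), r)
    print(verify(m, signature))  # True
```

The issuer only ever sees the blinded value, so it cannot later link the unblinded signature to the signing session; in a real credential system it would, of course, check the holder’s status before agreeing to sign.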
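Cameron’s minimal-disclosure principle, and Birch’s example of proving you are over 21 without showing your name and address, can be sketched with salted hash commitments in the style of selective-disclosure credentials. In this hypothetical sketch an issuer commits separately to each attribute, and the holder later reveals only the one the relying party needs. One assumption must be stated loudly: an HMAC over the digests stands in for the issuer’s public-key signature, so here the verifier has to share the issuer’s key, which a real scheme would avoid.

```python
# Sketch of minimal disclosure with salted hash commitments.
# ASSUMPTIONS: attribute names/values are hypothetical, and HMAC with a shared
# ISSUER_KEY is a stand-in for a real public-key signature by the issuer.
import hashlib
import hmac
import json
import secrets

ISSUER_KEY = secrets.token_bytes(32)


def commit(name: str, value, salt: str) -> str:
    """Salted hash of one attribute; the salt stops guessing of small values."""
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()


def tag(digests: list) -> str:
    """Issuer's 'signature' over the list of digests (HMAC stand-in)."""
    return hmac.new(ISSUER_KEY, json.dumps(digests).encode(), hashlib.sha256).hexdigest()


def issue(attributes: dict) -> tuple[dict, dict]:
    """Commit to every attribute separately; return credential and salts."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    digests = sorted(commit(n, v, salts[n]) for n, v in attributes.items())
    return {"digests": digests, "tag": tag(digests)}, salts


def present(credential: dict, salts: dict, attributes: dict, attribute: str) -> dict:
    """Holder reveals a single attribute and its salt, nothing else."""
    return {"credential": credential, "attribute": attribute,
            "value": attributes[attribute], "salt": salts[attribute]}


def verify(presentation: dict) -> bool:
    """Relying party checks the issuer's tag, then the revealed commitment."""
    cred = presentation["credential"]
    if not hmac.compare_digest(tag(cred["digests"]), cred["tag"]):
        return False                              # credential was altered
    digest = commit(presentation["attribute"], presentation["value"],
                    presentation["salt"])
    return digest in cred["digests"]


if __name__ == "__main__":
    me = {"name": "Joe Bloggs", "address": "1 Example Street", "over_21": True}
    credential, salts = issue(me)
    shown = present(credential, salts, me, "over_21")
    print(verify(shown), "Joe Bloggs" in json.dumps(shown))  # True False
```

The bar learns that a trusted issuer vouched for “over_21: true” and nothing else; the name and address stay hidden behind their salted digests.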



feature



Understanding
the Data Environment
Protecting data privacy and anonymity requires a better
understanding of the conditions and mechanisms under
which they may be threatened.

By Elaine Mackey and Mark Elliot


DOI: 10.1145/2508973

The data environment is a new concept in the field of data confidentiality. Although there have been references to its various aspects, manifestations, and impacts, it is only now that it has become a focus of inquiry in its own right. It is a focus, we would argue, that is long overdue and rather urgent given the manner and pace in which the data landscape is evolving. The huge amounts of data being generated, combined with the economic drivers and political will to share it more widely, mean concerns about data privacy and anonymity are ever more well founded. Here, we explain why we need to understand the data environment in order to minimize threats to data privacy and anonymity.

When we talk about protecting data privacy and maintaining anonymity in the data confidentiality field, we are in essence talking about ensuring anonymized data remains anonymous once it is shared, disseminated, and released in the data environment. So what does this actually mean in practice? To answer this, we will first discuss data and anonymization, as this will set the scene for what we really want to discuss, the data environment.

DATA AND ANONYMIZATION
All organizations will collect some information from their customers/clients/service users as part and parcel of their organizational activities. Almost always, this will include classic identifiers such as clients’ names, addresses, and contact details. However, the information that is collected is often more extensive than this; for example, schools also collect data on their pupils’ exam scores, special educational needs, and health; law enforcement also collects data on crime and anti-social behavior; and retailers also collect data on shopping and leisure habits, finance, employment status, and occupation. This information will in all likelihood be stored in databases that hold very many individual-level records of information.

This data is termed personal data, which, as described by the UK Data Protection Act (DPA, 1998), is “data that relates to living individuals who are or can be identified from the data.” Organizations that want or need to share and disseminate their data for secondary use are obliged under the DPA (1998) to process the data in such a way as to render it anonymous and therefore no longer personal. The transforming of data from personal to anonymous requires that identifiers be removed, obscured, aggregated, and/or altered in some way. There are two types of identifiers that organizations need to think about when processing data: formal identifiers and complex identifiers. Formal identifiers are relatively easy to spot and deal with and include data such as a subject’s name, address, and unique reference numbers (e.g., their social security number or National Health Service number).

Complex identifiers are less easy to spot and deal with. They could in principle include any piece of information (or combination of pieces of information). For example, take age and marital status. Considered in the abstract, they are not immediately obvious identifiers. But, if we consider the case of an 18-year-old widow, our implicit demographic knowledge tells us this is a rare combination (at least in peacetime). This means such an individual could potentially be re-identified by, for example, someone spontaneously




recognizing that this record corresponded to their friend/neighbor/colleague/family member.

Just this example alone presents a data complexity problem, which demonstrates that anonymizing data is not straightforward. To complicate matters further, organizations preparing data for dissemination don’t just have to think about sufficiently anonymizing their data, but also about retaining data utility. After all, there is little point in sharing and disseminating data that doesn’t represent whatever it is meant to represent (because it has been altered during the anonymization process).

Because anonymization is difficult and has to be balanced against data utility, the risk that a re-identification will happen will never be zero. In other words, there will be a risk (although extremely small) of de-anonymization present in all useful anonymized data. The only way to remove this risk entirely is not to share any data at all, which is obviously undesirable if we are to exploit the undoubtedly huge social and economic value locked up in the data.

Statistical Disclosure
For researchers in the data confidentiality field, the first step to determining how organizations can best minimize the risk of de-anonymization and optimize the trade-off they must make between anonymization and data utility is to assess how the process of de-anonymization might actually occur. The term commonly used in the field to denote the process of de-anonymization (and one that we will use from here on) is “statistical disclosure.” A statistical disclosure, we should point out, incorporates not just the idea of de-anonymization (or re-identification), but also captures the idea that confidential information is revealed (or disclosed). See Duncan et al. or Hundepool et al. for recent reviews of the statistical disclosure control field [1, 2].

Formally, we describe a statistical disclosure as a form of data confidentiality breach that occurs when, through statistical matching, an individual population unit is identified within an anonymized dataset and/or confidential information about them is revealed. However, determining how a statistical disclosure might actually occur and then play out is not straightforward. This is the crux of the problem. As it stands, we know little about the factors, conditions, and mechanisms involved in a statistical disclosure, largely because we know little about the data environment. We will give a technical description of this term shortly; for now, consider it as the context for any piece of data, without which the data has no meaning.

You may wonder why it is only now that attention is being directed toward the data environment. After all, it would seem like an obvious point of focus given the task in hand. The explanation for this lies with: (i) the particular perspectives that have underpinned and informed data confidentiality work, and (ii) the intractability of understanding and gathering data from the data environment.

The traditional perspective was one where statistical disclosure risk was seen as originating from, and therefore largely contained within, the data to be disseminated, released, or shared. It meant data researchers and practitioners rarely looked beyond the statistical properties of the data in question. More precisely, it meant they did not concern themselves with issues such as how or why a data intruder might make a disclosure attempt, or with what skills, knowledge, or access to other data they would require to ensure their attempt was a success. As a consequence, the statistical models they built to assess disclosure risk, while statistically sophisticated, were based on very crude assumptions about the context of the risk they were trying to model. To address these failings, there has been a broadening of perspective in the last 20 years, which has seen attempts to incorporate some context beyond the data itself. This has usually taken the form of intruder scenario analysis, which has shifted attention away from the traditional position of asking “how risky is the data for release” to a more critical position of asking “how a statistical disclosure might actually occur.” Some inroads in addressing this latter question have been made, most notably: (i) the development of a framework for identifying plausible intrusion scenarios and (ii) the identification of sets of key variables, i.e., information that can be used for statistically matching one dataset with another [3]. But, for all intents and purposes, this is where the work has stalled, not least because much of it is theoretical.

It is certainly true that we lack a real-world view of statistical disclosure and have relatively little direct data on it. This may be because an act of statistical disclosure is a rare event and/or is one in which the key protagonists (i.e., the data intruder and the organization releasing data) are both incentivized to conceal it (albeit for differing reasons). It is difficult to speculate productively on this and we do not do so here. The important point we wish to make is that while there is little direct data in the form of cases of disclosure, it does not mean there isn’t any (key) data; the data environment can potentially tell us all we need to know about how a statistical disclosure might actually happen.

THE DATA ENVIRONMENT
The data environment is made up of a small number of components: data, agents, and infrastructure. It is these components that we need to look at in order to ascertain how a statistical disclosure might occur and play out.

Data. What (other) data exists in the data environment? This is what we need to know in order to identify what data (key variables) are risky, i.e., can be used for statistically matching one dataset with another, thereby providing (some of the) conditions for statistical disclosure. This is still a developing area, which, at Manchester University, we have been pushing forward. Our Data Environment Analysis Service (a bespoke service for the Office for National Statistics) has involved developing a methodology for investigating, cataloging, categorizing, and documenting data in the data environment. However, this methodology is operationalized manually and is therefore constrained in its scope and ability to deal with the complexity of the problem. The next step is to develop a methodology for automating these processes in order to enable comprehensive capturing of the global data environment.

Agents. Who are the key protagonists, and how might they act and interact to bring about a disclosure event? What might the consequences of this be? There is no risk of statistical disclosure without human action. This may seem an obvious point, but it is one worth emphasizing. As yet, we know little about the key protagonists and how they might interact. We are currently working on an approach using game-theoretic reasoning to develop greater insights into how agents might act and interact strategically, within specific contexts, to create a statistical disclosure [4].

Infrastructure. How do infrastructure and wider social and economic structures shape the data environment? Infrastructure can best be thought of as the set of interconnecting structures (physical and technical) and processes (organizational, managerial, contractual, and legal) that frame and shape the data environment. It provides the context to data and agents, so, for example, it will influence what data is shared, to whom it is given, and how that process takes place. It will also influence key agents, such as National Statistical Institutes, any organization releasing data, data users, specialist interest groups, the general public, and the media, in terms of their possible actions, interactions, and counter-responses. Infrastructure includes storage systems, information systems, data security systems, governance structures, and national and international legislation.

Thus far, we have talked about the data environment in the definite singular; in other words, we are referring to the global system. But the global system has internal structure; it is partitioned, and that partitioning creates many local environments. For example, a secure data center can be termed a discrete data environment: It has context-specific physical, technical, organizational, and managerial structures that determine what data goes in; how data is stored, processed, and risk assessed; and in what format data comes out, who the user community is, and how they can interact with that data. By defining and regulating a local environment, the data owner/controller can render data anonymous that would, “in the wild,” not be. So when we share data we are in effect moving it from one environment to another. Data environments can be looser in form than the secure data center. An environment might be defined purely by regulation and licensing. For example, a community of allowed users might have access to data, and that community, together with the instruments that define what can and cannot be done with the data, comprises the environment. Such an environment cannot be as tightly controlled as the secure data center environment, but it does allow for some control of the data environment not (currently) present when, for example, data is published on the Internet.

All data environments will contain the features outlined above but are likely to differ in form depending on how they are made up and how they are operationalized. A local data environment may in turn contain sub-environments. For example, an organization may have multiple servers with differential access. Individual machines themselves are then sub-environments. Note that all sub-environments are, to some degree, permeable since users move in and out of them with knowledge of the external environment. So, even if I am a bona fide user of an environment acting in a legal and compliant fashion, my knowledge of the external environment might cause spontaneous de-anonymization (e.g., if I recognize the 18-year-old widow as my neighbor).

In conclusion, understanding and building models of the data environment is of paramount importance if we are to continue to protect data privacy. The reason for this is that we can only effectively guard against the threat to data privacy and anonymity when we have a clear idea of what it is we are guarding against. While we have some understanding of the factors, mechanisms, and conditions under which data privacy and anonymity may be threatened, the long and the short of it is that, at the present time, we do not know enough about them. Further work on this topic is both necessary and urgent.

References
[1] Duncan, G., Elliot, M. J., and Salazar, J. J. Statistical Confidentiality. Springer, New York, 2011.
[2] Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., and de Wolf, P-P. Statistical Disclosure Control. Wiley, London, 2012.
[3] Elliot, M. J. and Dale, A. Scenarios of attack: the data intruder’s perspective on statistical disclosure risk. Netherlands Official Statistics, Spring 1999, 6-10.
[4] Mackey, E. M. and Elliot, M. J. The application of game theory to disclosure events. Proceedings of UNECE Worksession on Statistical Confidentiality (Bilbao, Spain, Dec. 2-4, 2009).

Biographies
Elaine Mackey is a well-established researcher into the broader aspects of statistical confidentiality where the statistical, data management, and social policy meet. Her Ph.D. demonstrated the value of using game theory to map disclosure attack scenarios. She has recently worked as part of the Data Environment Analysis Service mapping the data that an attacker might feasibly use to identify individuals in anonymized datasets.

Mark Elliot has an international reputation in the field of data privacy. He has led numerous interdisciplinary projects in the field and his unique methods are at the center of the SUDA system for anonymization decision support developed at the University of Manchester and used in statistical agencies across the world. He has a long track record of relevant stakeholder engagement, most recently through his work on the Administrative Data Liaison Service (www.adls.ac.uk) and as lead for the UK Anonymisation Network (www.ukanon.net).

Copyright held by Owner/Author(s).
Publication rights licensed to ACM $15.00
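Mackey and Elliot’s 18-year-old widow shows how a combination of innocuous complex identifiers can single someone out even after formal identifiers are stripped. A minimal sketch of that check follows. The records, column names, and the simple frequency threshold k are all hypothetical; real anonymization tools (such as the SUDA approach mentioned in the biographies) use far richer risk measures than a plain count.

```python
# Sketch: remove formal identifiers, then flag rare combinations of key variables.
# ASSUMPTIONS: the records, column names and threshold k are hypothetical.
from collections import Counter

FORMAL_IDENTIFIERS = {"name", "address", "nhs_number"}


def strip_formal_identifiers(records):
    """Drop directly identifying fields (name, address, reference numbers)."""
    return [{k: v for k, v in r.items() if k not in FORMAL_IDENTIFIERS} for r in records]


def rare_combinations(records, key_vars, k=2):
    """Return combinations of key variables shared by fewer than k records."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


def person(i, age, status):
    """Build a hypothetical personal-data record."""
    return {"name": f"Person {i}", "address": f"{i} Example Street",
            "nhs_number": f"000000000{i}", "age": age, "marital_status": status}


records = [person(i, age, status) for i, (age, status) in enumerate([
    (18, "single"), (18, "single"), (18, "widowed"),
    (45, "married"), (45, "married"), (45, "widowed"), (45, "widowed"),
], start=1)]

anonymized = strip_formal_identifiers(records)
print(rare_combinations(anonymized, ["age", "marital_status"]))  # {(18, 'widowed'): 1}
```

Removing the formal identifiers does nothing about the (18, widowed) record; only altering or aggregating the complex identifiers themselves would, at some cost to data utility.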
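Statistical matching on key variables, the mechanism in the article’s formal definition of statistical disclosure, can be sketched as a join between an anonymized release and some other dataset in the data environment. The datasets and field names below are hypothetical, and exact equality on key variables is a simplifying assumption; real intruders contend with noisy, partial, and differently coded data.

```python
# Sketch: link an anonymized release to external data on shared key variables.
# ASSUMPTIONS: hypothetical data; exact matching instead of probabilistic linkage.
from collections import defaultdict


def match_on_key_variables(anonymized, external, key_vars):
    """Return (name, record) pairs where exactly one external person matches."""
    index = defaultdict(list)
    for row in external:
        index[tuple(row[v] for v in key_vars)].append(row)
    disclosures = []
    for row in anonymized:
        candidates = index.get(tuple(row[v] for v in key_vars), [])
        if len(candidates) == 1:            # unique match: re-identification
            disclosures.append((candidates[0]["name"], row))
    return disclosures


released = [  # anonymized: no formal identifiers, but a confidential attribute
    {"age": 18, "marital_status": "widowed", "region": "North", "condition": "A"},
    {"age": 45, "marital_status": "married", "region": "North", "condition": "B"},
]
register = [  # some other dataset in the data environment, with names
    {"name": "J. Doe", "age": 18, "marital_status": "widowed", "region": "North"},
    {"name": "R. Roe", "age": 45, "marital_status": "married", "region": "North"},
    {"name": "S. Poe", "age": 45, "marital_status": "married", "region": "North"},
]

for name, record in match_on_key_variables(released, register,
                                           ["age", "marital_status", "region"]):
    print(name, "->", record["condition"])   # J. Doe -> A
```

Matching on region alone makes every candidate set larger than one and the disclosure disappears, which illustrates the anonymization/utility trade-off the article describes.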



feature

What is Bitcoin?
Strengths and weaknesses of the leader in a new generation
of emerging cryptocurrencies.

By Dominic Hobson
DOI: 10.1145/2510124

Control over your personal data is an important part of privacy. The Web enables personal data to be gathered, shared, and traded with unimaginable ease, and keeping a firm grasp on personal data is becoming more and more challenging.

One motivation for the mass movement of personal data around the Web is money. Before the advent of the Internet, sending money from one side of the world to the other was not as straightforward a task as it is today. But, just like personal information, the transfer of money enabled by the Web has big implications for privacy. As we have moved from cash, to checks, to credit cards (away from physical gold standards), our money has gradually become just numbers on a computer. Inevitably, that computer belongs to someone else.

Privacy, that is, the ability to reveal personal information by choice, is near impossible to attain when a third party holds all your money and personal information, as well as every electronic transaction you’ve ever made. Part of the motivation for creating the Internet was resilience to attack through distribution, with as few single points of failure as possible. Despite having such a system, users have still flocked en masse to centralized services such as PayPal. With respect to privacy, we have handed all our control (our personal data regarding our transactions, our balances, and who we pay, why, and when) over to a few large centralized services.

For some, this shift of monetary control has its benefits. In theory, your money is safer in a large organization’s virtual vault than under your bed. For others, this handover of monetary control has been more costly: Searching for the phrase “PayPal took my money” brings back literally millions of results.

As well as possession of money, centralized services also get possession of personal data relating to purchases. Supermarket chains can, and do, infer age, gender, household salary bracket, and more from the items you purchase in their stores to aid marketing and advertising efforts. In 2011, Visa announced a system called “Real Time Messaging,” which sent offers and discounts direct to phones based on information deduced from card use, such as location.

These are just a few examples of how large companies are using personal data related to your payments and transactions to target advertising and sell more efficiently. Many people may be comfortable with this. After all, if somebody is going to try and sell something to you, wouldn’t you rather it be something relevant to you, which you might actually want? However, there is virtually no alternative to holding money in a bank or other centralized services, which in turn buy, sell, and profit from your personal data.

Enter Bitcoin, a pseudo-anonymous, peer-to-peer currency protocol created and released, quite fittingly, by a mysterious pseudonym, “Satoshi Nakamoto,” who has since disappeared. Bitcoin is the leader in a new generation of emerging currencies known as “cryptocurrencies,” which aim to, among other things, facilitate the movement of money electronically while still maintaining a sense of privacy. Bitcoin disrupts this move to centralized money services, putting the Internet to the use for which it was originally intended: fully decentralized services.

It has been suggested that by printing our names and details on our debit and credit cards we undermine thousands of pounds’ worth of smart-chip technology, which is privacy enabling. Unfortunately the biggest group of people helped isn’t ourselves, but those with nefarious intent. It raises the question: In a system where everything and everyone is represented as numbers on a computer, do we even need a name? It is this approach that gives the Bitcoin protocol one of its many strengths.







Breaking Down Bitcoin
The Bitcoin protocol itself stores no personal data. Bitcoin offers its privacy by design through novel use of cryptography. Nothing personally identifiable is recorded. Instead, users have many wallet addresses, which are hashes of public keys. Users can, and are encouraged to, have as many different wallet addresses as required (ideally one per transaction). The corresponding private keys, required to authorize a transaction, are stored locally in the user’s wallet file.

Users maintain full control and possession of their local wallet file. But “with great power comes great responsibility.” Should a user accidentally delete or lose their wallet file, they also lose any associated bitcoins. Although the bitcoins are still technically stored on a peer-to-peer network, the private keys required to authorize a new transaction are lost, effectively making the coins unspendable.

Behind the scenes, Bitcoin doesn’t store each coin and who owns it. Instead, it uses a distributed ledger book system (called a “blockchain”) based on the logic that if you know every transaction an address has made, then you know if it has money to spend. This may initially appear quite contradictory: a privacy-protecting money system that lets the entire world see every transaction ever made. However, from a privacy perspective, it doesn’t matter if everyone can see every transaction, if the only identifying information in a transaction is a seemingly random number of which everyone can have many.

Transactions are verified through a process known as “mining.” The mining process also serves as the mechanism by which bitcoins are initially produced and distributed. Mining is effectively the act of adding transactions to the blockchain so everyone can agree on the same set of transactions. A node that chooses to mine runs mining software, which repeats the following:
1. Gather up all unverified transactions into a block (ensuring they’re
2. Hash this block. Look at the newly produced hash, specifically how many zeros it starts with: (a) If the number of leading zeros is less than a predefined number (known as “difficulty”), then start again from step 1, incrementing the nonce to ensure a different hash is reached. (b) If there are at least as many leading zeros as the required difficulty, then proceed to step 3.
3. The miner has successfully mined a block, adding it to the blockchain. They then broadcast their hash, along with the transactions in it and the nonce, to others. The successful miner also receives newly created bitcoins as a reward in a special coinbase transaction; this is how bitcoins are initially produced.
4. Other miners receive the new block and its contents. They check that all transactions in the block are valid and not double spends, and check that the block, when hashed, gives the right result. If everything is valid, they use the new block hash and start to mine the next block with new transactions.

The mining process is effectively trial and error. As more people try mining, they would in theory be able to mine blocks more quickly. For this reason, after every 2,016 blocks that are found (which happens approximately every two weeks), the predefined number of leading zeros that must be in a hash for it to be successful (the difficulty) is adjusted. This is based on the average time it has taken to mine a block: If this time is more than 10 minutes, the difficulty is decreased, and if it is less, the difficulty is increased. This effectively restricts the mining process to roughly one block every 10 minutes.

The reward a miner receives for finding a block is the sum of the transaction fees of all the transactions in that block, as well as a block reward that is currently 25 BTC. This reward halves every 210,000 blocks (roughly every four years), so no more than 21 million bitcoins will ever be produced. The aim of this is supposedly to mimic a finite resource such as gold. With 3,600 bitcoins being produced a day and each bitcoin worth around $100, mining has become an industry and profession in itself.

Miners typically invest thousands of dollars into their mining rigs in order to hash just a little bit faster in hope of finding a block before another miner does. People are even going as far as creating Application Specific Integrated Circuits (ASICs) that can cost tens of thousands of dollars each for the sole purpose of mining. For Bitcoin users, the more people mining and hashing, the more secure and resistant the blockchain is to attack.

To attack the network, a malicious actor would have to create or modify a transaction in a block and mine it faster than the rest of the entire network can mine a block. Mining faster than the rest of the network on average requires the same or more hashing power than the entire network combined, which is currently hovering at an estimated 1500 petaFLOPS (floating point operations per second), although mining actually performs integer SHA-256 hashing, so such figures are only rough conversions. To put that in perspective, Tianhe-2, the world’s fastest supercomputer, has managed to muster a measly 31 petaFLOPS and theoretically maxes out at just 54.9 petaFLOPS.

Even if an attacker could acquire such power, malicious transactions must still be accepted as valid by other miners. With more than 50 percent of the network power, a malicious actor can only prevent transactions from being confirmed, and reverse or double spend their own transactions; they cannot spend other people’s coins, because other miners would reject such transactions as invalid. Should a malicious party have more than 50 percent of the hashing power, they would earn more by legitimately mining than they could with fraudulent transactions.
use for which it was Ultimately, providing you look af-
ter your wallet, your bitcoins are safe,
all valid transactions) along with the originally intended— unless somebody manages to break
hash of the last block added to the
blockchain and a random number
fully decentralized the military-grade cryptographic al-
gorithm ECDSA (Elliptic Curve Digi-
called a “nonce.” services. tal Signature Algorithm). Even if this

42 XRDS • fall 2013 • Vol.20 • No.1


Figure 1: The price of a bitcoin at the largest exchange, MtGox, in the first four months of 2013 in dollars.

Mt. Gox (USD) mtgoxUSD


Apr 25, 2013 — Daily Op:154.2, Hi:162, Lo:150.6, CI:152.8 Vol: 36.93K UTC – http://bitcoincharts.com
600K 280

550K 260
240
500K
220
450K
200
400K
180
350K 160
300K 140

250K 120
100
200K
80
150K
60
100K
40
50K 20
0K 0
Jan 13 Feb Mar Apr

were to happen, rolling out a new cli- automatically generate a new Bitcoin However, these clones still share
ent and switching over to a new block- address for each individual. Unfortu- some of the same weaknesses as Bit-
chain (called a “hard fork”) would nately, the vast majority of the popu- coin. In order to cash out Bitcoins into
solve the problem. Even a full break lation probably doesn’t know how to a fiat currency such as £ or $, typically
in ECDSA would have little to no im- do that. A potential flaw in Bitcoin, one must go through an exchange.
plication for privacy of Bitcoin as as it stands, is that it is so incredibly The price of a Bitcoin varies. In the
there is still no personal data stored novel and ingenious; it’s not yet intui- first four months of 2013, the price of
within the protocol. tive or easily understandable for most a bitcoin went from $20 to more than
of the population. How can someone $250, down to $60, and back up to
Weaknesses of Bitcoin be expected to trust something new $160 (see Figure 1). Such volatility is
So with such secure algorithms, that they don’t understand? Bitcoin unheard of in fiat currencies and has
what makes Bitcoin only pseudo- was originally created and used by brought Bitcoin’s value as a stand-
anonymous? One reason is that Bit- people with a technical disposition, alone currency (i.e., not pegged to a
coin can’t guarantee that users will not so its lack of ease of use is most likely fiat currency) into question.
somehow accidentally or intentionally a feature of being a first generation Such volatility can lead to practical
link themselves to a wallet address. For cryptocurrency. issues. For example, let’s assume a min-
example, let’s assume Alice published There are already clones of Bitcoin ing hardware company accepted pre-
a wallet address of hers on Twitter being created using the Bitcoin source orders in bitcoins on equipment worth
for donations and received 20BTC. By code as the base. Although none of $30,000. After missing shipping dates,
looking through the blockchain, any- these improve the usability, they do some customers requested refunds.
one can find the addresses that Alice offer variations in block production However, when some customers paid
sent her money to, and it’s more than time and rate. Some of these “altcoin” in bitcoins, the value of a bitcoin was as
likely Alice knows the people she sends clones offer mining algorithms that low as $20, whereas now they’re worth
money to. Alice may wish to support a are considered fairer by hashing blocks more than fives times that amount. If
controversial group and anonymously with the Scrypt algorithm. Producing the company has to pay back custom-
donate. She naively sends money to a hash using the Scrypt algorithm is ers with the amount of bitcoins the cus-
an address the group posted on their more memory intensive than SHA256 tomer paid, then they will be paying the
Twitter feed. Anyone looking up Alice’s and as a result doesn’t benefit as much customers more than fives times the
donation address in the blockchain with mass parallelization provided by dollar equivalent they originally paid.
would only have to Google the address- more expensive hardware such ASICs. However, paying out the dollar value to
es she sent bitcoins to in order to link Some altcoins use a variation of Bit- the customer in bitcoins is also not fair,
her with the controversial group. coin’s proof of work algorithm, mak- as the user will end up with consider-
All the above could have been avoid- ing them in theory more resistant to a ably less bitcoins than they had before
ed if either party had used a script to 51 percent attack. they bought the product.
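The refund dilemma above is simple arithmetic, so a short Python sketch makes it concrete. The per-customer order value ($3,000) and the current price ($110, a little more than five times the $20 low) are hypothetical numbers chosen for illustration; only the $20 price and the "more than five times" rise come from the example in the text.

```python
# Hypothetical figures for one customer; only the $20 purchase price and the
# "more than five times" rise come from the example in the text.
ORDER_USD = 3_000.0        # assumed value of one customer's pre-order
PRICE_WHEN_PAID = 20.0     # USD per bitcoin when the customer paid
PRICE_AT_REFUND = 110.0    # assumed USD per bitcoin when the refund is due


def refund_options(order_usd, price_paid, price_now):
    """Compare the two refund policies described in the text."""
    btc_paid = order_usd / price_paid
    return {
        "btc_paid": btc_paid,
        # Policy A: return the same number of bitcoins the customer sent.
        "refund_same_btc_usd": btc_paid * price_now,
        # Policy B: return the original dollar value, converted at today's price.
        "refund_same_usd_btc": order_usd / price_now,
    }


if __name__ == "__main__":
    r = refund_options(ORDER_USD, PRICE_WHEN_PAID, PRICE_AT_REFUND)
    print(f"Customer paid {r['btc_paid']:.2f} BTC")
    print(f"Policy A costs the company ${r['refund_same_btc_usd']:,.2f}")
    print(f"Policy B returns only {r['refund_same_usd_btc']:.2f} BTC")
```

Under these assumptions, policy A pays out $16,500 against a $3,000 order (5.5 times the original dollar value), while policy B leaves the customer with about 27 BTC instead of the 150 BTC they sent.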
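The Alice example earlier in this article works because every transaction is public. The toy Python model below is a sketch, not real Bitcoin code: it simplifies each transaction to a single sender and receiver (real transactions have multiple inputs and outputs), and the address strings, the hypothetical `link_identities` helper and the `public_labels` table standing in for "Googling an address" are all invented for illustration. It contrasts paying an address the group posted publicly with paying a fresh, never-published address.

```python
from collections import namedtuple

# Simplified transaction: one sender, one receiver (an assumption of this toy model).
Tx = namedtuple("Tx", ["sender", "receiver", "amount_btc"])


def counterparties(ledger, address):
    """Every address that received coins from `address` (public blockchain data)."""
    return {tx.receiver for tx in ledger if tx.sender == address}


def link_identities(ledger, public_labels, address):
    """Identities that can be found by searching the web for each recipient."""
    return {public_labels[r] for r in counterparties(ledger, address) if r in public_labels}


# Hypothetical addresses that were posted publicly (e.g. on Twitter).
public_labels = {
    "addr_alice_donations": "Alice",
    "addr_group_posted": "Controversial group",
}

# Alice pays the address the group published on its Twitter feed.
ledger_reused = [
    Tx("addr_donor_1", "addr_alice_donations", 20.0),
    Tx("addr_alice_donations", "addr_group_posted", 5.0),
]

# The group instead generates a fresh address for each donor and never publishes it.
ledger_fresh = [
    Tx("addr_donor_1", "addr_alice_donations", 20.0),
    Tx("addr_alice_donations", "addr_group_fresh_7f3a", 5.0),
]

if __name__ == "__main__":
    print(link_identities(ledger_reused, public_labels, "addr_alice_donations"))  # {'Controversial group'}
    print(link_identities(ledger_fresh, public_labels, "addr_alice_donations"))   # set()
```

The fresh address hides only the recipient's public identity; the payment from Alice's published donation address is still visible to everyone, which is why generating a new address per individual on either side of the exchange matters.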
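The mining loop and the difficulty adjustment described earlier in this article can also be sketched in Python. This is a toy under stated assumptions: the "block" is just a string built from the previous hash, the transactions and the nonce (real Bitcoin hashes an 80-byte binary header), and, following the article's description, success is measured in leading zero hex digits, whereas Bitcoin actually compares the hash against a 256-bit numeric target. The `retarget` function follows Bitcoin's rule of rescaling difficulty by the ratio of expected to actual time for the last 2,016 blocks, with the measured time clamped to a factor of four in either direction.

```python
import hashlib

RETARGET_INTERVAL = 2016      # blocks between difficulty adjustments
TARGET_SPACING = 10 * 60      # desired seconds per block (10 minutes)


def block_hash(prev_hash, transactions, nonce):
    """Double SHA-256 of a simplified, string-encoded block (toy format)."""
    data = "|".join([prev_hash, *transactions, str(nonce)]).encode()
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def mine(prev_hash, transactions, zeros, max_nonce=10_000_000):
    """Trial and error: try nonces until the hash starts with `zeros` zero hex digits."""
    prefix = "0" * zeros
    for nonce in range(max_nonce):
        digest = block_hash(prev_hash, transactions, nonce)
        if digest.startswith(prefix):
            return nonce, digest
    raise RuntimeError("no valid nonce found within max_nonce attempts")


def verify(prev_hash, transactions, nonce, zeros):
    """What other miners do: recompute the hash once and check it."""
    return block_hash(prev_hash, transactions, nonce).startswith("0" * zeros)


def retarget(old_difficulty, actual_seconds):
    """Rescale difficulty after RETARGET_INTERVAL blocks took `actual_seconds`."""
    expected = RETARGET_INTERVAL * TARGET_SPACING          # about two weeks
    actual = min(max(actual_seconds, expected / 4), expected * 4)
    return old_difficulty * expected / actual   # slower blocks -> lower difficulty


if __name__ == "__main__":
    prev = "00" * 32
    txs = ["alice->bob:1.5", "carol->dave:0.2"]   # hypothetical transactions
    nonce, digest = mine(prev, txs, zeros=4)
    print(nonce, digest)
    print(verify(prev, txs, nonce, zeros=4))                       # True
    print(retarget(1.0, 2 * RETARGET_INTERVAL * TARGET_SPACING))   # 0.5
```

Note the asymmetry the security argument relies on: `mine` may need tens of thousands of attempts, while `verify` needs a single hash. Each extra zero hex digit multiplies the expected work by 16.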




The volatility of Bitcoin can be at least partially attributed to the fact that nobody really knows the intrinsic value of a bitcoin. It's been suggested there is a loose correlation with the price of electricity, since the act of mining uses a significant amount of electricity: one source suggests the amount of electricity put toward mining bitcoins for a single day is enough to power more than 30,000 homes. There was also speculation that the rise in the price of Bitcoin early this year was linked to the Cyprus bailout. Cypriots with more than 100,000 EUR had money taken from their accounts in order to support the country. This wouldn't have been possible if Cypriots had stored their money in bitcoins. However, traffic figures from exchange websites where bitcoins can be purchased suggested no notable increase in traffic from Cyprus. Other things that seem to affect Bitcoin's volatility include media coverage, Bitcoin services being hacked, and downtime on popular services.

The more money that goes into Bitcoin, the more stable it is likely to become as the currency finds its value. The Winklevoss twins (famous for claims that Mark Zuckerberg stole their idea for Facebook) recently announced they had invested $11 million in Bitcoin, giving them, at the time, 1 percent of all bitcoins in circulation. While this amount of investment is useful in some respects, with Bitcoin being a relatively small market, individuals with such large amounts could still sway the price at the exchanges.

Furthermore, the exchanges where bitcoins can be purchased and sold are not a part of the Bitcoin protocol, so they may not be as safe. This has big implications for privacy and security. Anyone can run an exchange, regardless of his or her competence, knowledge, reliability, or trustworthiness. Most exchanges are websites where a fiat currency or bitcoins are deposited and then converted to numbers in a database. This subjects them to the same vulnerabilities as other websites, making them a target for hackers. A lot of articles in mainstream media relating to Bitcoin have focused on hacks at the exchanges and rarely make a distinction between the exchanges and the Bitcoin protocol itself.

Unlike the Bitcoin protocol, these exchanges are ultimately centralized: backed and run by people against whom laws can be enforced. In most jurisdictions, exchanges are subject to "anti-money laundering" laws that require certain kinds of businesses to "know your customer" (known as AML/KYC regulations). This involves formally identifying customers through official documentation (typically passports and utility bills), presenting yet another place where a Bitcoin wallet address can be related to a real-world identity.

In reality, Bitcoin offers enough anonymity for it to become inefficient for authorities to pursue a person for small-scale and minor crimes. As a result, Bitcoin has been used for trading in illegal goods and services online. Perhaps the most notable case, which was the focus of most Bitcoin-related media attention in 2011, is Silk Road, described as the "Amazon marketplace of drugs." Silk Road operates on the anonymizing network Tor and allows users to buy and sell drugs, guns, guides on hacking, forged documents, and more, all paid for in bitcoins. In 2012, research into Silk Road suggested an average of just more than $2 million worth of bitcoins a month went through the site. There are typically between $30 million and $70 million in transactions a day with bitcoins, so it's important to remember Silk Road's $2 million a month makes up just a tiny fraction of trades that take place with Bitcoin.

The protection Bitcoin provides is both a strength and a weakness. On one hand, many of the things on sale on Silk Road were legal in some country somewhere in the world. A libertarian might argue people should be allowed to buy and sell what they want with their money. On the other hand, trade in guns, particularly on a black market like Silk Road, can severely impact the lives of others and attract negative attention from the press and public.

It's important to emphasize that cryptocurrencies are effectively tools, and tools on their own do not break laws. Paper cash, for example, shares many of the features of a cryptocurrency (it is relatively anonymous, fast, and peer-to-peer), yet it is a valuable part of society. Although the way people use tools can be illegal, a lot of media attention around cryptocurrencies focuses on their illicit use, effectively labeling and criminalizing those who simply want some privacy or a useful service.

The Future of Cryptocurrencies
Like most privacy-enabling services, Bitcoin serves its purpose when understood and used properly. However, the novel nature of Bitcoin can make it hard to grasp properly. Regardless of privacy, Bitcoin provides users with ultimate control over their money, and allows fast and secure transfers around the world with transaction fees lower than virtually all existing alternatives. As a first-generation cryptocurrency, it still has its quirks and is ultimately reliant on third-party services, which are not yet mature enough to offer the same strengths as the protocol itself. The shallow but rapidly growing market around Bitcoin creates a level of volatility that can make Bitcoin a bit of an inconvenience in some cases. But it's still most likely the first of many emerging systems that will force us to fundamentally rethink how our money and the associated private data should be handled in the age of the Web.

Biography
Shaped by daily access to the Web from the tender age of five, Dominic Hobson has spent his life on and around the Web. After completing a degree in computing science at University of Wales, Bangor, he went on to Southampton to complete a master's in Web science, looking at criminal use of virtual payment systems. Still at Southampton, Hobson is now working toward a Ph.D., studying cryptocurrencies and their surrounding communities.

© 2013 ACM 1529-4972/13/09 $15.00



The Tor Project: An inside view
A decade since the first version was released, Tor continues
to be at the center of the debate around online privacy.

By Kelley Misata
DOI:10.1145/2510125

Ten years have flown by since The Tor Project released the first version of the Tor software in 2002. Since those early days, Tor's technology, research, company, and mission have grown to meet the needs of a constantly changing global landscape. Fueled by a passionate team of more than 30 core employees and contractors (including myself), we, along with more than 3,000 dedicated volunteers and a community of sponsors, share in Tor's mission of bringing a voice to global debates around privacy, anonymity, and censorship circumvention.
Today, with more than 2.4 billion people online, Tor continues to be on the front lines, helping people across scientific, charitable, civic, government, and educational sectors stay safe and communicate freely.

Protecting the rights of privacy and anonymity for all isn't always the most popular or easy place to stand when the public is faced with reports of national and international privacy breaches. However, Tor maintains a consistent focus on technology and the passion to help people stay safe. We empower NGOs, law enforcement, and survivors of crime through our technology. We give ordinary citizens a fighting chance against criminals who steal identities and bandwidth to commit crime. Without Tor in the world, the bad actors will find another tool to achieve their objectives. With Tor in the world, the good actors will continue to have a tool and experts committed to providing safe online channels of communication.

The Tor Project's place in the arena of privacy and anonymity is complex and ever changing. Therefore, in the interest of space, this article will explore a few key areas that illustrate Tor's broad global impact. Much more about us, our mission, and our projects can be found on www.torproject.org, or, even better, join the ongoing conversations with Tor developers on IRC at #tor-dev.

HOW TOR WORKS
Relay operators are really the heart of the Tor network. We could not exist without the loyalty and passion of these 3,000-plus volunteers. By downloading Tor's software and setting up as a relay, operators manage a constant flow of online traffic through the Tor network every day. Each relay has a direct impact on the Tor network's ability to run better and faster for all users. A Tor relay can run on almost any operating system, but currently runs best on Windows, Mac, Linux, and Amazon cloud services. Figure 1 illustrates the steady growth of relays and Tor bridges from March 16 through June 14, 2013.

Figure 1. A steady growth of relays and Tor bridges. (Chart: number of relays and bridges, April to June 2013; source: The Tor Project, https://metrics.torproject.org/)

ANONYMITY LOVES COMPANY
Ongoing trends in law, policy, and technology threaten anonymity as never before, undermining our ability to speak and read freely online. These trends also undermine national security and critical infrastructure by making communication among individuals, organizations, corporations, and governments more vulnerable to analysis. Each new user provides additional diversity, enlarging the crowd that every other user blends into and enhancing Tor's ability to put control over your security and privacy back into your hands. (For more on how anonymity loves company, I would direct readers to "On the Economics of Anonymity," written by Roger Dingledine, Paul Syverson, and Alessandro Acquisti; http://freehaven.net/doc/fc03/econymics.pdf.)

The Tor community is often portrayed in the media as being composed only of immoral, unjust, and malicious individuals. Unfortunately, in the borderless cyber world we live in today, and because Tor's technology is open source and free, some of this is true. As with other open-source technologies, anyone is free to download Tor. However, we firmly believe the bad actors in this arena are few and do not make up Tor's core user community.

Our community consists of a wide variety of people, groups, and organizations that all share a common vision: Privacy is important and anonymity has a place in daily life. Journalists often use Tor to communicate more safely with whistleblowers and dissidents. Non-governmental organizations (NGOs) use Tor to allow their employees to connect securely and privately while they're in a foreign country, without notifying everybody nearby who they're working for. Activist groups recommend Tor as a mechanism to maintain civil liberties online; for example, the Electronic Frontier Foundation (EFF) is an open supporter of The Tor Project. Corporations see the value in Tor to conduct competitive analysis safely, protect sensitive procurement patterns from eavesdroppers, and, in some cases, replace traditional VPNs. Governments around the world use Tor for open-source intelligence gathering, and their teams use Tor to communicate safely while deployed. Law enforcement uses Tor for visiting or surveilling websites without leaving government IP addresses in their Web logs, as well as to provide an additional layer of security during sting operations. Every day, people use Tor to get online safely and securely, to protect their identity, to protect their children online, and to protect their right to research sensitive topics without being tracked.

Figure 2 illustrates how the number of Tor users around the globe has grown since 2010, and Table 1 lists the top 10 countries that connected to the Tor network from March 16 through June 14, 2013.

Figure 2. The number of Tor users around the world has grown significantly since 2012. (Chart: directly connecting users from all countries, 2010 to 2013; source: The Tor Project, https://metrics.torproject.org)

Table 1. The Top 10 Countries Connected to Tor

Country            Mean daily users (share of all users)
United States      83352 (15.90%)
Italy              57577 (10.99%)
Germany            46185 (8.81%)
Spain              39985 (7.63%)
France             38736 (7.39%)
Ukraine            20830 (3.97%)
Russia             17105 (3.26%)
United Kingdom     15739 (3.00%)
Brazil             15723 (3.00%)
Netherlands        12069 (2.30%)

TOR'S ECOSYSTEM
The Tor Project ecosystem continues to expand to meet global technical challenges and a constantly changing threat landscape. A complete list of all the projects the development team is working on can be found on the Tor project page of our website. However, here is a quick glimpse of Tor's top technologies.
1. Tor Browser Bundle (TBB). Tor's flagship software allows users to bounce Internet communication, or access blocked websites, using Tor's distributed network of more than 3,000 volunteer relay operators. TBB is available on Windows, Mac OS X, or Linux platforms in 14 languages and growing.
2. Tails. A live operating system, run from a preconfigured CD/DVD or USB stick, that allows users to use Tor's state-of-the-art cryptographic tools any time, anywhere, without leaving a trace on the computer they are using. Tails 0.18 is now available for download. Current and potential Tails users should keep an eye on the Tor blog for continued updates and Tor news.
3. Orbot. Tor on Google Android devices provides mobile anonymity through a free proxy application.
4. Pluggable Transports or Obfsproxy. By transforming how traffic appears to those who may be monitoring it, and thereby preventing the detection of Tor traffic, this tool circumvents censorship. Though the technology works behind the scenes independently from Tor, developers have configured an internal protocol to minimize any end-user intervention. In addition, there are prototypes in development by university researchers and developers of circumvention tools.

TOR'S RESEARCH
The research arm of Tor is critical to the success of the entire ecosystem. Many people around the world are conducting valuable research in the space of privacy, anonymity, and censorship circumvention, and Tor is actively participating. To keep current, we foster essential partnerships with leading research and academic institutions around the world and continually engage with leading researchers through the Tor research portal. Our current research efforts are focusing on how to improve Tor's design, understanding what's going on within the Tor network, and more general research on privacy and anonymity.

TOR AROUND THE WORLD
The conversations around privacy and anonymity facing policy makers, technologists, and industry are not confined to online chats or boxes of wires sitting on desks. It is incredible to witness the number of people who think Tor is just one person or a few individuals working in a dark basement somewhere. The fact is we are at the forefront of many critical dialogues addressing privacy, anonymity, and issues of freedom of speech online. Frequently invited as privacy and anonymity experts, members of the Tor team travel the globe speaking about these issues, helping people learn about the technology and the risks to privacy in various environments, and gathering new insights into the behaviors of online users, which we can use to improve the usability of the Tor technology or provide more effective learning tools.

We have seen, firsthand, the value of bringing all the stakeholders to the table on these issues and others surrounding freedom of speech online. Below is a snapshot of some of the places Tor has been during the month of May 2013:
1. The Advisory Council on Child Trafficking Symposium; Baltimore, MD
2. ITWeb Security Summit; Johannesburg, South Africa
3. Oslo Freedom Forum; Oslo, Norway
4. Pixelache Festival; Helsinki, Finland/Tallinn, Estonia
5. CryptoParty; Frankfurt, Germany
6. IEEE Symposium; San Francisco, CA
7. Stockholm Internet Forum Conference; Stockholm, Sweden
8. Consilience 2013 Conference; Bangalore, India
9. NATO Discussion Forum; Brussels, Belgium
10. DIMACS Working Group on Measuring Anonymity; Rutgers University, NJ
11. Youth Internet Governance Forum
12. New England Give Camp at Microsoft Research; Cambridge, MA

This does not fully capture our extended community helping to raise awareness. Imagine the countless other conversations the Tor team has participated in over the past 10 years and the many other important conversations that still need to happen.

CONCLUSION
Recent events remind us all how important and complex privacy and anonymity are in today's digital environment. Tor will continue to be at the ready to respond with technology solutions, research, and an active role in global decisions. Our technical team will continue to improve the usability of the Tor ecosystem, security and anonymity, stronger cryptography capabilities, and new tools to offset increasingly persistent probes for censorship being used in global societies. We will continue to support the network through the ongoing expansion of the Tor help desk volunteer pool, capabilities, and languages to serve the global community. We will continue to bring a strong, confident voice to the conversations. We will continue to raise awareness and foster partnerships to help everyone stay safe and free online.

Basically, Tor will continue to do what we have done since 2002: building innovative, sustainable technology solutions that meet the needs of the world and protecting everyone from the threats to privacy and censorship. We want to make it possible for people to actively choose their level of privacy and anonymity online; this is what the Tor Project is all about. We're making progress, but we need your help. Please consider running a relay, volunteering as a developer, attending an event, or contacting us for training or education.

Biography
Kelley Misata is the director of marketing, communications and outreach with The Tor Project, Inc.

© 2013 ACM 1529-4972/13/09 $15.00




It's Not About Winning, It's About Sending a Message: Hiding information in games
New information hiding techniques use online games to transmit
secrets covertly. The technique is simple, but the problem of detecting
these covert channels is far from solved.

By Philip C. Ritchey
DOI: 10.1145/2510126

Alice and Bob have been imprisoned under the guard of the watchful warden Wendy. The two of them want to collaborate on an escape plan, but the only means of communication they have is a public channel that is vigilantly monitored by Wendy. On top of that, Wendy will not allow them to use cryptography to obscure the contents of their messages. If she sees a message that she cannot read, she will throw both Alice and Bob into solitary confinement and they will not be allowed any further communication at all. What can they do to communicate covertly over a public channel?
Alice and Bob's situation can be resolved through steganography: the art and science of sending messages in such a way that only the sender and the intended recipient are aware of the existence of the message. Whereas cryptography is concerned with keeping the contents of a message secret, steganography is concerned with keeping the existence of a message secret. In the past, steganography has been discounted as being security-by-obscurity, which is to say: no security at all. However, modern steganography does not rely on methodological obscurity, and notions of security equivalent to those used in cryptography can now be proven for steganographic techniques. Despite having once been regarded as a defective form of cryptography, steganography in the digital age is almost as indispensable as cryptography. From watermarks for use in digital rights management and intellectual property protection, to protecting anonymity online, to censorship-resistant technologies, to fingerprinting digital objects for traitor tracing and authentication, steganography is everywhere.

The standard example of digital steganography is hiding secret data in images by replacing the least significant bit of each pixel with a bit of secret data. The resulting change in the image is imperceptible to a human observer, and an 8-megapixel grayscale image can hold one megabyte of secret data (one bit in each of 8 million pixels is 8 million bits, or one million bytes) without noticeably reducing image quality. The spacing between words or lines in a text can be used to hide information as well. Many network protocols and file formats have unused (or misused) fields that can hold secret data. The possibilities are really only limited by one's imagination.

HIDING INFORMATION IN GAMES
Much to the dismay of Chess-by-mail players during World War II, the imagination of the censorship offices of the U.S. and Britain led them to ban Chess-by-mail because the game could be used to send secret messages. But how does one send secret messages using a game? Let's look at Tic-Tac-Toe, a pen-and-paper game where two players, X and O, take turns marking spaces in a 3x3 grid (see the layout in Figure 1). The players take turns marking the spaces until there are no spaces remaining, or until one of them wins by placing three of his or her marks in a row (horizontally, vertically, or diagonally).

Illustration by Lukas Radavicius

And so, when you come to that point in your life where your name is Alice and you're sitting there wondering how you're ever going to talk to Bob without Wendy reading everything you write because you're not allowed to encrypt your messages, you will probably end up playing Tic-Tac-Toe and Wendy will think nothing of it. The key to making Tic-Tac-Toe, or any other game, work as a covert communication channel comes from information theory: Where there is choice, there is information. Because Alice can choose where she places her mark, she can use that choice to send Bob some bits of information. Bob, on his end, because he knows what choice Alice made, can determine the bits Alice sent. Similarly, Bob can use his choice of move to send bits of information back to Alice. If we stopped here, we would have a system whose security relies on the method remaining unknown to Wendy, i.e., security-by-obscurity, which is no security at all. We will have to fix that in a moment, but for now we will leave it insecure so that we can look more closely at how this system works.

Let's assume Alice is playing X, so she goes first. With her first move, she must choose which space to mark. With her first choice, Alice will send some bits of secret information. The number of bits she can send depends on the number of available options and how likely she is to choose each option. Assuming she will choose each option with equal probability, Alice can send three bits of information to Bob with her choice of opening move. In general, when the choice is made so that each option is equally likely to be chosen, the capacity of each move is the base-2 logarithm of the number of available moves, rounded down to a whole number of bits:

C = floor(log2(n)),

where n is the number of available moves. For the opening move, n = 9, so C = floor(log2(9)) = floor(3.17) = 3 bits. In an ordinary game, Alice would choose the move she thinks would maximize her chances of winning. But in a game played for the purpose of sending secret information, Alice must choose the move that sends the correct bits. Let's assume Alice is trying to send data that begins with bits 11001. We can label each available move with a C-bit string that represents the meaning of the move, i.e., what bits will be sent to Bob by selecting that move. The easiest way to do this is to start in the top left corner with the number zero in C-bit binary (000, since C = 3), then the number one (001) in the top middle, then 010, and so on until we get to the bottom middle, which is 111. To keep things simple, we make the remaining options nulls that do not encode any bits at all, as shown in Figure 1. The first three bits Alice wants to send are 110, so she will choose the move in the bottom left corner. After marking an X in the space, she sends the board to Bob for him to make his move.

Figure 1. Alice is sending the bits 11001. With her first move, she sends the bits 110 by choosing the bottom left corner space. With her second move, she sends the bits 01 by choosing the top right corner space. This will continue until Alice no longer has bits to send.

Upon receiving the board, Bob can extract the bits Alice sent by undoing her last move and carrying out the same steps that she did to label the moves. Then Bob just needs to look at what move Alice chose and record those bits. In this case, Bob would see

can then send some bits of his own back to Alice by relabeling all the moves and picking his move based on what his secret bits are. Or, Bob can just play along with Alice and only receive but not send. For simplicity, we will not have Bob send data back to Alice. Once Bob makes his move, he sends the updated board back to Alice. If Alice expects Bob to be sending his own data, now is when she would extract it. Since she does not expect to receive data from Bob, she can go straight to encoding the next bits of her message. With two moves already on the board, there are seven options remaining for Alice, which allows her to send two more bits (floor(log2(7)) = 2). The bits she needs to send are 01, so she will place her mark in the space labeled 01, which is the top right corner space,

in a new game. This will continue, possibly for many games, until Alice has sent all the bits of her secret.

DETECTING HIDDEN INFORMATION
What does this exchange look like to Wendy? If we put ourselves in her shoes and look at what Alice and Bob send over the channel, we can see it looks like two people playing ordinary games of Tic-Tac-Toe. At least, that's how it looks at first. After a while, Wendy will see something suspicious happen too often to be chalked up to random chance: One or the other player will make a stupid move, such as failing to win when they could, or failing to block their opponent's win (see the example in Figure 2). It is at this point that Wendy can make her first attempt at detecting Alice and Bob's usage of a secret communication channel. She does so by defining a threshold value for the number (or rate) of stupid moves she will allow. If Alice and Bob exceed this threshold, she will cut them off and throw them into solitary confinement. Our experiments show that if Alice and Bob are not playing smart, Wendy's classification accuracy can be as high as 95 percent if she watches them play for just five games.

To prevent bad play from giving the players away, Alice and Bob can make a slight adjustment to their system so that they are only choosing their next move from the set of not-stupid moves. Where before they were play-
Alice moved in the bottom left corner, and send the updated board to Bob. ing “random” we say they are now play-
which has the label 110, so he writes If the game ends while Alice still has ing “greedy.” Meaning they will win if
down 110 as the bits Alice sent. Bob more bits to send, she will engage Bob they can block a win, but otherwise
play randomly. While this slightly re-
duces the total information hiding ca-
Figure 2. Continuing the game started in Figure 1, Bob takes the center space. But, pacity of their system, from an average
suppose Alice still has bits to send, namely she now needs to send the bits 10. To of 15.748 bits per game when playing
send the bits 10, she must choose the middle right space. Unfortunately, this is a random to 14.105 bits per game when
stupid move and, if it happens too often, will signal to Wendy that, instead of play- playing greedy, it also completely cir-
ing to win, Alice is sending a message. cumvents Wendy’s detector since now
they don’t make any stupid moves.
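A minimal Python sketch of the labeling scheme walked through above, assuming cells are numbered 0 to 8 in row-major order, labels are handed out in that order, and only the first 2^C options get C-bit labels (C = floor(log2 n) for n options) while the rest are nulls. The `greedy` flag restricts the options to win-or-block moves, which is our reading of the greedy strategy. All function names are hypothetical; this is not the authors’ implementation.

```python
# Cells are indexed 0..8 in row-major order:
#   0 top left     1 top middle     2 top right
#   3 middle left  4 center         5 middle right
#   6 bottom left  7 bottom middle  8 bottom right
EMPTY = " "
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def empty_cells(board):
    """Every open cell in row-major order (the 'random' strategy's options)."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]


def completing_moves(board, mark):
    """Open cells that would give `mark` three in a row."""
    found = set()
    for line in LINES:
        values = [board[i] for i in line]
        if values.count(mark) == 2 and values.count(EMPTY) == 1:
            found.add(line[values.index(EMPTY)])
    return sorted(found)


def options_for(board, me, opponent, greedy):
    """Moves the stego-system may pick from; greedy drops 'stupid' moves."""
    if not greedy:
        return empty_cells(board)
    # Win if possible, otherwise block if necessary, otherwise anything goes.
    return (completing_moves(board, me)
            or completing_moves(board, opponent)
            or empty_cells(board))


def label_moves(options):
    """Label the first 2**C options with C-bit strings; the rest are nulls (None)."""
    c = len(options).bit_length() - 1 if options else 0  # C = floor(log2(n))
    labels = {}
    for k, cell in enumerate(options):
        labels[cell] = format(k, f"0{c}b") if (c > 0 and k < 2 ** c) else None
    return c, labels


def choose_move(board, bits, me, opponent, greedy=False):
    """Alice: return (cell, bits_sent) for the move labeled with the next C bits."""
    c, labels = label_moves(options_for(board, me, opponent, greedy))
    if c == 0 or len(bits) < c:
        raise ValueError("no room for C bits here; this sketch does not pad or play nulls")
    for cell, label in labels.items():
        if label == bits[:c]:
            return cell, c
    raise ValueError("bits must be a string of '0' and '1' characters")


def extract_bits(board_after, cell, me, opponent, greedy=False):
    """Bob: undo the mover's last mark, relabel the same options, read the label."""
    before = list(board_after)
    before[cell] = EMPTY
    _, labels = label_moves(options_for(before, me, opponent, greedy))
    return labels[cell] or ""


board = [EMPTY] * 9
cell, sent = choose_move(board, "11001", "X", "O")  # (6, 3): bottom left carries "110"
board[cell] = "X"
print(extract_bits(board, cell, "X", "O"))  # 110
```

For the Figure 2 position (X on cells 2 and 6, O on 1 and 4; Bob’s opening in the top middle is our inference, since the text does not say where he played), the random options label the middle right `10`. With `greedy=True`, only the block at cell 7 remains, and it carries no bits. A move outside the greedy options is what the article calls a stupid move.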
The onus is now on Wendy to find a
way to distinguish between gameplay
that is clean (contains no hidden in-
formation) and gameplay that is dirty.
What happens if Wendy goes one step further than simply requiring that Alice and Bob not play stupidly, and insists they play perfectly? Wendy’s thinking is that humans are rather good at Tic-Tac-Toe and so can play perfectly, so perhaps requiring perfect play will not leave any room for covert communication. If so, then Alice and Bob will have to deviate from perfect play in order to transmit their secret messages. Do you think Wendy is onto something? Or, do you think Alice and Bob can play perfectly and still have capacity left for covert communication?

In fact, it turns out Alice and Bob are able to satisfy the perfect play requirement and be left with an average of 11.815 bits of total capacity per game. And so, yet again, Wendy finds herself suspecting Alice and Bob are using Tic-Tac-Toe to share secrets, yet she is still unable to determine when they are doing so. I’ll tell you now that she may be down, but she’s far from out.

The standard example of digital steganography is hiding secret data in images by replacing the least significant bit of each pixel with a bit of secret data.

Right now, there should be a screaming objection in your head: Why doesn’t Wendy just decode what Alice sends? If she knows the method, she can extract the bits just as easily as Bob can, and if she sees anything meaningful she can prove they were using the game as a covert channel. That is exactly why security-by-obscurity is no security at all. Therefore, Alice and Bob need to introduce an extra step and a secret key into their system so that only someone with the secret key will be able to correctly decode the gameplay. Simply put, Alice and Bob should encrypt their messages before hiding them in the game. That way, when Wendy extracts the bits from the gameplay, she will get something that appears to be random. To determine if the covert channel is being used, Wendy must be able to distinguish between encrypted data and random data when analyzing the extracted bits. While this is, in theory, possible, it requires Wendy to wait until she has a very large number of bits before she can make a decision. Alice and Bob may finish their communication before Wendy even has enough bits to be able to test them for randomness.

What else can Alice and Bob do to make Wendy’s job even more difficult? Imagine Alice and Bob each have a coin and whenever one coin is flipped, the other is flipped at the same time and the result is the same for both. If the coin lands on heads, then the next move is used to hide secret bits; otherwise the move is made normally and does not hide any secret bits. If the coin is fair (50 percent chance of heads), then Alice and Bob will see their capacity cut in half since only half of the moves will be hiding data. But, Wendy’s detection task now becomes harder since the covert channel usage is diluted and dirty moves are in amongst clean moves. Her accuracy will drop from above 95 percent to below 75 percent when Alice and Bob, playing random, switch from hiding data in every move to hiding it in only half of the moves. If they use a coin that lands on heads 33 percent of the time, so that a third of all moves are hiding data, Wendy’s accuracy drops to 50 percent. This is equivalent to guessing, meaning Wendy cannot distinguish between dirty and clean gameplay. Meanwhile, Alice and Bob realize an average total capacity of 5.249 bits per game. This illustrates the tradeoff between capacity and security. By giving up some capacity, Alice and Bob can significantly increase their security against Wendy, from having almost no security to having nearly perfect security. Note, however, this assumes Wendy does not realize Alice and Bob are not hiding data in every move. If Wendy figures out how often their coin lands on heads, she can adjust her detector to take that into account and reclaim some of her advantage. For this reason, the secret key must include not only the seed for the random number generator, but also the parameter which controls how often the coin lands on heads.

Note, also, that all of this doesn’t even matter since it assumes Alice and Bob are playing stupidly again. If Wendy is checking for stupid play and Alice and Bob don’t make any stupid moves (they use either a greedy or optimal strategy), then Wendy’s accuracy will be no better than guessing and Alice and Bob will have perfect security against her, even when they use all of the available capacity by using every move to send data. But, it is worthwhile to know that, should they need it, they can trade capacity for security.

ARTIFICIAL VERSUS AUTHENTIC PLAY
Wendy’s task all along has been to distinguish between gameplay that is, and is not, used to hide secret data. An equivalent statement is that Wendy is attempting to distinguish between gameplay generated by a computer (artificial gameplay) and gameplay generated by a human (authentic gameplay). Authentic gameplay is clean by definition, since, in order to hide data in a move, the move is not selected by the human, but rather by the secret data and the stego-system. It is impossible for you, a human, to make the moves you want to make while at the same time sending the secret data you need to send. Therefore, if a move is used to transmit data, then we know a computer generated the move. Logically, artificial moves are necessary, but not sufficient, for hiding data in games. However, because Wendy is the warden and she makes the rules, Wendy can decide the only gameplay allowed is authentic gameplay. That is, because it is impossible for Wendy to tell the difference between artificial and dirty gameplay, she must consider artificial gameplay dirty even if it is not being used to send secrets. If Wendy can accurately classify moves as authentic or artificial, then she can also accurately classify moves as clean or dirty.

In artificial intelligence, there is a name for the task of distinguishing between an artificial source and an authentic source: the Turing test. Turing originally formulated the test using natural language, since humans are very good at using and understanding language and computers are, so far, not that good. Natural language processing is a hard AI problem and the Turing test is effectively the true test of whether we’ve solved it. What about a Turing test for gameplaying? Can you, as a human, play chess against an online opponent and determine whether your opponent is a human or a computer? Probably not with accuracy anywhere close to the accuracy with which you could distinguish between a human and a computer conversational partner. This is likely due to the fact that you use language every day but you do not play chess every day. But does a Turing test for gameplay even make sense? It must, because that is precisely what Wendy needs. She needs to be able to look at a conversation between two players who use the language of gameplay and decide whether they are humans or computers. To do this, she needs a better understanding of how humans play games and what human-generated gameplay looks like.

Figure 3. With focused research and development, computer players could defeat the best human players at most games, but there are games at which they may never be able to beat top humans. While a computer may never become a champion at Mao or Calvinball, creating a computer that has the ability to play such games, without boring its human opponent, will be a considerable achievement. There is more to playing games than simply winning.

After collecting some human gameplay data for Tic-Tac-Toe, we used it to build a classifier for Tic-Tac-Toe gameplay. Our classifier achieves 95 percent accuracy when shown 10 games for all six computer strategies that we have tested so far. So, whether Alice and Bob’s stego-system is playing randomly, or greedy, or optimal, or a hybrid of these, Wendy can observe just 10 games and accurately determine whether they were played by humans or computers. Trying to send a secret in fewer than 10 games, which still gives Wendy accuracy better than guessing, means Alice can only send a few words to Bob. There’s no hope for Alice to pack more bits into each game, so, if she needs to send more information, she must increase the number of games Wendy needs to observe in order to make a decision.

The next step for Alice and Bob is to design a strategy that more closely mimics the actual gameplaying behavior of humans, putting them in the same boat as Wendy. Both sides want to have the best, most accurate, model of human gameplaying behavior so that they can generate realistic gameplay or detect the slight variations that betray artificially generated gameplay. In fact, this type of work has been going on for a long time in video games, where developers are continually striving to provide more realistic non-player characters and opponents for human players to interact with and play against. But, no matter what the game, from simple pen-and-paper games to MMORPGs, where there is choice, there is information; any game can be used to send secret messages.
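The shared coin from the capacity-versus-security discussion can be sketched in Python as follows. This is a minimal sketch that assumes a keyed HMAC-SHA256 stands in for the seeded random number generator the article mentions, because Python’s `random` module is not meant for secrets. As the article requires, the hypothetical `SharedKey` holds both the seed and the heads probability, so Wendy learns neither. The names are illustrative, not the authors’ design.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedKey:
    """Hypothetical key: both the coin's seed and its bias stay secret."""
    seed: bytes
    p_heads: float  # how often the coin lands on heads


def coin_flips(key, n_moves):
    """One flip per move; True (heads) means that move hides secret bits.

    Flip i is derived from HMAC-SHA256(seed, i), so Alice and Bob compute
    identical flips independently, without exchanging anything in the clear.
    """
    flips = []
    for move in range(n_moves):
        digest = hmac.new(key.seed, move.to_bytes(8, "big"), hashlib.sha256).digest()
        u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
        flips.append(u < key.p_heads)
    return flips


key = SharedKey(seed=b"alice-and-bob-share-this", p_heads=1 / 3)
alice_view = coin_flips(key, 12)
bob_view = coin_flips(key, 12)
assert alice_view == bob_view  # same key, same coin
print([move for move, heads in enumerate(alice_view) if heads])  # moves that carry data
```

With `p_heads = 1/3`, roughly a third of the moves carry data, which is the setting at which the article reports Wendy’s accuracy falling to 50 percent. The remaining moves would be chosen by ordinary play, ideally without stupid moves.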

Acknowledgment
This research has been supported by
the following grants: NSF CNS-0716398
and NSF CCF-0939370.

Biography
Philip C. Ritchey is a graduate student in the Department
of Computer Science at Purdue University. He is a member
of the Center for Education and Research in Information
Assurance and Security and the Center for the Science of
Information. He received his B.S. in computer engineering
from Texas A&M University in 2008. His research interests
include information assurance and security, interactive
artificial intelligence and machine learning, and
computational models of human problem solving.

Copyright held by Owner/Author(s).


Publication rights licensed to ACM $15.00



feature

An Illustrated Primer in Differential Privacy
The vast amounts of data that are now available provide new opportunities to social science researchers, but also raise huge privacy concerns for data subjects. Differential privacy offers a way to balance the needs of both parties. But how?

By Christine Task
DOI: 10.1145/2510127

Good grief, there is data everywhere now, just everywhere! Our governments, our doctors, our schools, our Web browsers, our social networks, our cameras, our phones, our cars, our shoes, and now even our glasses can collect reams and reams of electronic data about us. Storage is cheap and data ownership is often poorly defined: There is so much data about you already scattered around the world. You’re aware of this. If you’ve made it this far in this issue of XRDS, I expect you’re rather alarmed. But, really, you should be a little excited too. In the hands of social science researchers this data gives us a real chance at building a world that works. The globe is spitting out information about itself all over; it is now and forever more in “verbose” mode, and if we listen, we can learn something useful. We can begin to really understand our world in an objective, quantitative way, and we can start to make it a better place. I drew you a picture to give you an idea.

[Illustration: “Save the World with Data!” Labels: Medical Records! Sensitive Data! Location Data! Discover Hidden Links Between Illnesses! Track the Spread of Disease! Stop Epidemics! Better Understand Struggling Families and Communities! Diagnose Problems With Failing Students and Schools! Improve Public Transportation! Address Traffic Congestion!]

Sounds awesome, right? All ready to contribute your data? Great! All we need is your name, age, date of birth, medical history, financial status, academic records, drug and alcohol usage, history of sexual interactions, list of childhood traumas, social network accounts, who you call, where you drive, what you eat…

That doesn’t sound quite so awesome! In fact, it’s a little scary. What if your information got into the wrong hands, like your potential future employer, your ex-boyfriend, or the person who approves your home loan? We’ve got good reason to use your data, but you’ve got an even better one to hang on tight to it. So what do we do? Well, that’s what this story is all about.

Of course, like any research in information assurance, our story is really a parable about Bob and Alice.

Let’s give our main characters a proper introduction. Bob is an undergrad at X University, and Alice works for a local non-profit.

Bob: “Hi! I’m an undergrad CS major. I like writing apps, biking, comics, and I plan to propose to my girlfriend when we graduate in June!”
Alice: “Hi! I have a master’s in CS, and I’m a data analyst for a social welfare non-profit. I like strategy games, long distance running, comics, and my husband and I hope to buy a house soon!”

Bob receives an email from Alice asking him to participate in a survey. The survey asks for details about his high school experience with alcohol and drugs, and his current grades and major. It also asks for demographic information, such as his gender, birthdate, and the neighborhood he grew up in. Alice is investigating potential influences for risk behavior and how those behaviors can cause self-destructive cycles. This is foundational research whose data will be shared with many other groups to help inform how intervention programs are directed and designed. It’s very useful… but it’s also a very creepy survey. Does he dare fill it out? What happens when the study’s results are published; what if someone used them to find out his private information? Is he OK with his potential future employers knowing that he smoked marijuana in high school? The past 15 years of research in privacy-preserving data mining have been spent trying to solve this problem, but each time a solution for protecting data is proposed, new vulnerabilities are found. Let’s have a little conversation with history.

Bob: “If I fill out your survey, how do I know my private data will be safe?”
Alice: “Oh, it’s OK! You don’t have to put your name down. Our data set will be completely anonymous!”
Bob: “But you’ll have my email! And the exact date and zip code where I was born. Couldn’t someone just use those to figure out which survey’s mine?”
Alice: “Good point! OK, we’ll also black out any ‘personally identifying information’ like that. We’ll just say you’re in the 18-20 age group, and grew up somewhere near Nacogdoches, TX.”
Bob: “What about the other data? What if I’m the only senior CS major from Texas?”
Alice: “Hmm… Alright, I’ve got it. We’ll make sure we have at least three of every type of person… even if we have to add fake people to the data set. That way you can’t be singled out!”
Bob: “So… Jack and his two friends are the only CS guys from NY, but I think all three did pot in high school. Your data set would hide which of the three potheads was Jack, but not that Jack did pot.”
Alice: “…Yeah. That’s a problem.”
Christine: “Wait! I have a solution!”


We seem to have a new character. Let’s give her a proper introduction too.

Christine: “Hi! I’m a CS Ph.D. student studying differential privacy, and I wrote this article! I got my B.S. in math at Ohio State University, I like hiking and comics, and my husband and I just moved to DC! Cynthia Dwork and her colleagues at Microsoft Research Labs invented differential privacy, but I’m too terrible at drawing to make a stick figure of her, so I’m here to explain things in her place.”

There was something said about a solution to Alice and Bob’s privacy problem?

Christine: “What if we could guarantee that no one would be able to tell the difference whether or not you submitted a survey, Bob, no matter what else they knew about you? Since we only want to learn about big, significant trends in the data, one person’s information shouldn’t make much difference to our final conclusions anyway. So rather than publicly publishing the actual anonymized data set, we’ll do some aggregation of the data ourselves first (count things up) and then publish that instead. We’ll design our aggregation method carefully to make sure no single person can make too much difference to its results, and we’ll add random noise to help cover up the difference they do make. Cynthia Dwork called it ‘Difference-al’ Privacy. Wait, no, she used a real word. Differential Privacy!”
Bob: “…Um. OK? How?”

This is an exciting, if somewhat ambiguous, idea! Publishing anonymized data sets is just plain problematic for privacy; there’s always the risk that with the right outside information, an attacker could figure out the real names of the anonymized participants from their data (and as we’ve noted, data is everywhere these days). But what are we using these anonymized data sets for, anyway? We perform data analysis on them to learn about patterns of behavior in the populations they represent. So, rather than just making the whole anonymized data sets available to the public, perhaps we can perform some preliminary data analysis ourselves and publicly release the results of that instead. We’ll design our analysis so it minimizes the difference any single individual can make to the results, and then we’ll add noise to help cover up that difference. The end result is that it will be, with a shiny, absolute, provable mathematical guarantee, very difficult for an attacker to guess whether or not any particular person is in the data set.

So, let’s get started. The first thing to do, of course, is define our terms. We keep throwing around the word “difference.” What does that mean, precisely? Well, Alice has decided that for her preliminary data analysis, she’s going to add up all her surveys into a two-dimensional histogram comparing high school habits and university GPA. Let’s look at the difference that Bob makes in Alice’s aggregated survey results.

Two Possible Worlds
Total difference Bob makes in aggregated survey results is 2: 1 + 1 = 2

Bob Does Not Take the Survey (Bob: “No way!”)
GPA           A     B     C     D/F
HS Alcohol    54    93    98    75
HS Drugs      27    43    62    43
HS Neither    152   193   59    18

Bob Does Take the Survey (Bob: “Sure!”)
GPA           A     B     C     D/F
HS Alcohol    55    93    58    35
HS Drugs      28    43    62    43
HS Neither    152   193   59    18
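The “shiny, absolute, provable mathematical guarantee” can be written in the standard notation of the differential privacy literature. The symbols ε, 𝓜, Δf, and Lap are our additions, not the primer’s. Here D₁ and D₂ are the two possible worlds, f is Alice’s twelve-count histogram, and Δf is the largest total difference one person can make, which the figure above shows is 2.

```latex
\Pr[\mathcal{M}(D_1) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D_2) \in S]
  \quad \text{for all sets } S \text{ and all } D_1, D_2 \text{ differing in one person}

\Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_1 = 2,
\qquad
\mathcal{M}(D) = f(D) + (Y_1, \ldots, Y_{12}), \quad Y_i \sim \mathrm{Lap}(\Delta f / \varepsilon)
```

Under this reading, the Laplace scale of 2 used in the primer’s sampling script corresponds to ε = 1. A smaller ε means more noise and a tighter bound on the ratio.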


If Bob decides to fill out a survey, he will answer “yes” to two of the questions (he
had some wild days in his youth), and so he changes two of Alice’s counts by one
each. One plus one is two, and thus we say Bob’s total impact on Alice’s results is
two. If you look at it, you’ll see that in this survey it’s impossible for any person to
affect more than two of the counts. Bob is making the largest possible difference
on the results, so if we can protect Bob, we’ll also protect everyone else in the
data set. We call the largest difference any one person can make on the analysis
results the “global sensitivity” of the analysis. This is what we need to cover up
with random noise. But how should we go about adding that noise? Alice is going to
sample a random value from the Laplace distribution. (It’s OK. It’s not much more
complicated than rolling dice to get a random number.)
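The global sensitivity argument can be checked with a short Python sketch. It assumes each survey response boils down to two yes/no answers (drank, used drugs) plus a GPA bucket, and that someone answering “no” to both is counted in the “HS Neither” row; this is our reading of Alice’s table, not her actual survey format. The other respondents are made up.

```python
from collections import Counter

ROWS = ("HS Alcohol", "HS Drugs", "HS Neither")
COLS = ("A", "B", "C", "D/F")


def histogram(responses):
    """Two-way table of counts; each response is (drank, used_drugs, gpa)."""
    counts = Counter()
    for drank, used_drugs, gpa in responses:
        if drank:
            counts[("HS Alcohol", gpa)] += 1
        if used_drugs:
            counts[("HS Drugs", gpa)] += 1
        if not drank and not used_drugs:
            counts[("HS Neither", gpa)] += 1
    return counts


def l1_difference(h1, h2):
    """Total change across all twelve counts (sum of absolute differences)."""
    return sum(abs(h1[(r, c)] - h2[(r, c)]) for r in ROWS for c in COLS)


def global_sensitivity():
    """Largest change any single respondent can make to the table.

    Counts simply add up, so adding one person to any data set changes the
    table by exactly that person's own contribution; checking every possible
    single response is therefore enough.
    """
    possible = [(d, u, g) for d in (False, True) for u in (False, True) for g in COLS]
    return max(l1_difference(histogram([]), histogram([r])) for r in possible)


others = [(True, False, "B"), (False, False, "A"), (False, True, "C")]  # made-up
bob = (True, True, "A")  # drank and used drugs in high school, A's in college
print(l1_difference(histogram(others), histogram(others + [bob])))  # 2
print(global_sensitivity())  # 2
```

Bob answers “yes” twice and so moves two counts by one each, which is the most any single response can do. That maximum, 2, is the global sensitivity.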

Sampling a Random Noise Value

Six-sided die
How? Roll the die.
What happens? You get values like these: 1, 6, 3, 3, 4, 2, 1, 5, 4, 1, 5, 6
What does the probability distribution look like? [Plot of the distribution; x-axis from –7 to 6, probability from 0.0 to 0.3.]
If we add this noise to our true result? If our real answer were 55, we’d get 56, 57, 58, 59, 60, or 61, each with probability 1/6.

Laplacian random variable (scale = 2)
How? Run the script.

    >>> import scipy.stats.distributions as D
    >>> D.laplace.rvs(0, 2)

What happens? You get values like these: -0.96489…, 0.27056…, 1.97060…, -1.70566…
What does the probability distribution look like? [Plot of the distribution; x-axis from –7 to 6, probability from 0.0 to 0.3.]
If we add this noise to our true result? If our real answer were 55, we’d get a value between 50 and 60 with 92% probability, a value greater than 60 with 4% probability, and a value less than 50 with 4% probability.
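The percentages quoted for the Laplacian column follow directly from the Laplace density with scale 2, as this Python sketch using only the standard library shows. The same density also gives the 62%/38% split for the privatized count of 56.3465 at the end of the primer, assuming an attacker starts with even odds on the two possible worlds (true count 55 with Bob, 54 without).

```python
import math


def laplace_pdf(x, mu, b):
    """Density of Laplace(mu, b) at x."""
    return math.exp(-abs(x - mu) / b) / (2 * b)


def laplace_upper_tail(t, b):
    """P(noise > t) for noise ~ Laplace(0, b) and t >= 0."""
    return 0.5 * math.exp(-t / b)


b = 2.0  # same scale as laplace.rvs(0, 2) in the sampling script

# True count 55: how likely is the noisy answer to leave [50, 60]?
above = laplace_upper_tail(5, b)  # result above 60
below = above                     # symmetric: result below 50
within = 1 - above - below
print(f"{within:.3f} {above:.3f}")  # 0.918 0.041

# Bob in (true count 55) vs. Bob out (true count 54), equal prior odds.
observed = 56.3465
like_in = laplace_pdf(observed, 55, b)
like_out = laplace_pdf(observed, 54, b)
p_in = like_in / (like_in + like_out)
print(f"{p_in:.2f} {1 - p_in:.2f}")  # 0.62 0.38
```

The likelihood ratio here is e^(1/2) ≈ 1.65, because the two worlds’ means differ by 1 and the scale is 2. This mirrors the comic, which looks only at the first count. An attacker who also used the HS Drugs/A count would multiply in a second such ratio, and with a total difference of 2 the combined ratio can be at most e.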

Now that Alice has aggregated her survey results, she’s figured out how much difference someone can make in the results, and she’s going to add Laplacian noise to cover that difference up. We’re all set for the big finish.

Differential Privacy in Action!

Alice will take the aggregated results from her survey about GPA and high school substance use, and add Laplacian random noise to each of the twelve counts in order to privatize them.

Consider the first count in Alice’s survey results: the number of people who both drank in high school and get A’s in college. If Bob takes the survey, that count will be 55. Alice is going to pick a random number from the Laplace distribution to add to the count, privatizing it. The probability distribution of the randomly privatized count will look like this.

[Plot: probability distribution of the privatized count, centered at 55; x-axis from 45 to 65, probability from 0.0 to 0.25.]

If Bob doesn’t take the survey, then Alice’s results will show that 54 people drank in high school and got A’s in college. In that case, the probability distribution for the privatized count will look like this (see, it’s shifted just a tiny bit to the left, so the mean’s at 54).

[Plot: the same distribution centered at 54; x-axis from 45 to 65.]

Bob makes his decision!
Bob: “I’ve made my decision. See you later, Alice. Good luck with your research!”
Alice: “Thanks for your time, Bob.”

Alice privatizes her results!
Alice: “Looks like my first privatized count is 56.3465!”

So did Bob decide to submit a survey or not?
Privatized result: 56.3465
Probability that the privatized count came from the possible world where Bob took the survey: 62%
Probability that it came from the possible world where Bob didn’t take the survey: 38%
[Plot: both distributions overlaid; x-axis from 45 to 65.]
We’re not telling. That’s private.

Differential privacy provides a mathematical guarantee that for any privatized result R and any two data sets D1 and D2 that differ by one person (like Bob), the probability that R came from D1 will always be in a close ratio to the probability that R came from D2. I’ve skipped over the math here, but I’m a math major at heart and I’ve gone over some more of it in loving detail at my website, http://www.cs.purdue.edu/homes/ctask/. The world has a lot of data now, and like any powerful thing, that produces both benefits and dangers. But as long as there are people working hard to combat those dangers, things will get better. We hope that differential privacy will be one tool to help teach the era of big data how to use its powers for good.

Biography
Christine Task is a Ph.D. candidate at Purdue University researching differentially private social network analysis. She has a B.S. in theoretical mathematics, an M.S. in computer science, and five years of experience teaching undergraduate discrete math (often in terms of pirates and beer). She eagerly anticipates graduation and, with luck, a research career taming the chaos of the world with the power of mathematics.

Copyright held by Owner/Author(s).
Publication rights licensed to ACM $15.00


feature

Cynthia Dwork
on Differential Privacy
Distinguished Scientist at Microsoft Research, Dr. Cynthia Dwork,
provides a first-hand look at the basics of differential privacy.

By Michael Zuba
DOI: 10.1145/2510128

L
arge-scale statistical databases, specifically those that contain aggregate information
about a population, are becoming an ever-important resource in our world. These
databases are valuable assets to researchers, businesses, and governments. Researchers
can use them to try and discover commonalities in a population for diseases, business
can use them to understand how to effectively market their products and services, and
governments are provided with knowledge about their citizens. Differential privacy techniques
are applied to these databases in order to minimize the risk of a person or group being able
to associate information in these data- MZ: What inspired the idea of individual that the adversary could have
bases with a specific person. Differential differential privacy? learned without interacting with the
privacy is essentially a “definition” database. But this leads to problems
of privacy for statistical databases. CD: Differential privacy was inspired by whenever the database is actually
In this interview, Dr. Cynthia Dwork shares two negative theoretical results. The useful: If the adversary is from Mars
with us a first-hand look at this emerging first showed that, roughly speaking, and believes that all humans have two
topic of differential privacy. “overly accurate” answers to “too left feet, and the adversary then learns
many” questions is completely NON- from the statistical database that
Michael Zuba: For those who are not private. In this particular case, “overly almost all humans have one left food
in the domain, how would you explain accurate” meant something like and one right foot, then the adversary
differential privacy? “accurate to within smaller than the has learned something about me (and
sampling error” and “too many” was about most humans) not learnable by a
Cynthia Dwork: Differential privacy is roughly the number of people in the Martian without access to the database.
a guarantee, made by a data curator to a data set. But if the data set is ver y Should this be viewed as a violation
data owner: No additional harm (or benefit) large—Internet scale—then asking of my privacy? The whole point of
will come to you as a result of permitting this many questions is probably statistical databases is to learn about
your data to be used in a statistical study. impractical. This suggested an the population as a whole, meaning that
In a little more detail, differential privacy investigation of what can be achieved learning facts about the population,
is a mathematical statement about a data if the number of questions is cur tailed, such as “smoking causes cancer in
analysis algorithm, which in English says which quickly led to a precursor of humans,” is actually the goal. So instead
that the outcome of any analysis is differential privacy. we ask that the adversary learn no MORE
essentially equally likely to be observed, The second negative result about me than the adversary would have
independent of whether any individual concerned mathematical definitions learned were I not in the database. And
or small group of individuals opts into or of privacy. One way to try to define that’s exactly differential privacy.
opts out of the data set. The probability privacy is to say that an adversary,
in “equally likely” is over random choices interacting with a privacy-preserving MZ: Can you give some examples of where
made by the algorithm. database, learns nothing about an differential privacy could be used?

58 XRDS • fall 2013 • Vol.20 • No.1


CD: Differential privacy is best suited to large data sets. Analysis of census data has always been a motivating example for me; other favorite examples include monitoring of over-the-counter drug sales for early detection of epidemics. Recently I have been thinking about differential privacy for analysis of smart meter data.
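The "essentially equally likely" guarantee can be made concrete with the standard Laplace mechanism for a counting query, such as a count of over-the-counter drug sales. The sketch below is an illustration only. It does not describe how OnTheMap or any other deployed system works, and the function names, record layout, and parameter values are hypothetical.

The idea is that one person's record changes a count by at most 1. Adding Laplace noise with scale 1/ε therefore keeps the output densities for two data sets that differ in one person's record within a factor of e^ε of each other.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Laplace mechanism for a counting query (hypothetical helper).

    Adding or removing one person's record changes the true count by at
    most 1 (sensitivity 1), so noise of scale 1/epsilon suffices for
    epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def output_density(x, true_count, epsilon):
    """Density of the mechanism's output at x, given the true count."""
    scale = 1.0 / epsilon
    return math.exp(-abs(x - true_count) / scale) / (2.0 * scale)

# Usage with made-up records: (household_id, bought_cold_medicine)
rng = random.Random(2013)
sales = [(i, i % 3 == 0) for i in range(300)]
noisy = private_count(sales, lambda rec: rec[1], epsilon=0.5, rng=rng)

# For neighboring data sets (true counts 100 vs. 101), the densities at any
# output x differ by at most a factor of e**epsilon.
ratio = output_density(40.0, 100, 0.5) / output_density(40.0, 101, 0.5)
```

Smaller ε means more noise and a stronger guarantee. Larger data sets make the same absolute noise relatively less harmful, which is one reason the technique suits large data sets.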

MZ: How about a real-world example of where differential privacy is currently being used?

CD: OnTheMap from the U.S. Census Bureau; http://onthemap.ces.census.gov/.

[Editor’s Note: For our readers who are unfamiliar with OnTheMap, it is a Web-based mapping and reporting application developed by the U.S. Census Bureau. It shows where workers live and are employed, and it reports on age, earnings, industry distributions, race, ethnicity, and sex. Its main purpose is to provide an easy-to-use interface for creating and viewing workforce-related data.]

MZ: What are the current big differential privacy challenges?

CD: On the technical side, there are algorithmic challenges of efficiency and accuracy. There are also social challenges: Can researchers accustomed to working with raw data adapt to a setting in which access is through a differentially private mechanism?

MZ: Where do you see differential privacy in five or 10 years?

CD: I certainly expect continued theoretical and experimental investigations into what can be achieved with differential privacy and a deeper understanding of its connections to other fields, such as statistics, game theory, and machine learning. My hope is that differential privacy can democratize research, giving to members of the public who are not “credentialed researchers” the ability to learn about the population as a whole, much as can now be achieved via the U.S. Census Bureau’s OnTheMap website. As data sources become increasingly detailed (think of smart meter data), we may move away from the traditional data enclave approach for enabling researchers to access sensitive information. In this case, differential privacy may provide a useful alternative to traditional methods.

MZ: Do you have any advice for students who are looking to get into privacy research?

CD: Distrust your intuition and pay attention to definitions! Study the methodology of modern cryptography, for example, the definitions of security for digital signatures in the SIAM 1988 paper of Goldwasser, Micali, and Rivest. Read privacy papers to see how the authors formulate the privacy goals. Notice how the adversary and its goals are formalized (if at all), and the assumptions about the information and computational power to which the adversary has access. Are the assumptions reasonable? See what the authors prove (or what you can prove) about the resilience of any proposed scheme to adversaries of this type. Study the literature on privacy “breaks” with these things in mind.

MZ: What areas or techniques should students really hone their skills on to be a good privacy researcher?

CD: I am amazed at the range of areas in cryptography, theoretical computer science, mathematics, statistics, and probability that have been brought to bear on differential privacy. To this day, I am inspired by the works of Goldwasser and Micali from the 1980s (semantic security, digital signatures, and zero-knowledge), and I would start there. A good understanding of random variables and concentration bounds is extremely valuable.

Biography
Michael Zuba is a Ph.D. candidate at the University of Connecticut. His research is on underwater acoustic communication and networking. He is the recipient of an NSF EAPSI fellowship and a Department of Education GAANN fellowship in advanced computing.

© 2013 ACM 1529-4972/13/09 $15.00



profile   Department Editor, Adrian Scoică

Jessica Staddon
Managing Google’s Privacy Research
DOI: 10.1145/2517256

As we witness questions regarding the secrecy of our emails and voice calls break loose into the realm of politics, little attention is paid to the researchers and engineers who bear the burden of keeping our digital estates safe from prying eyes. We sat down with Jessica Staddon, privacy research manager at Google, who offers rare insight into what it takes to become a privacy scientist for one of the world’s best-known software companies.

Staddon’s research career path started out in security. In the years preceding the dot-com bubble, she was a Ph.D. candidate in Berkeley’s Mathematics Department, working on the management of encryption keys. “I was drawn to mathematics because I was really interested in problems that are easy to describe; you don’t need a lot of terminology or background to state the problems, but they are very hard to solve.”

After being awarded her Ph.D., she sought a broader work scope. Staddon first joined RSA Labs, then moved on to Bell Labs, and later to Xerox PARC before finally joining Google in 2010. Each time, she said, her interdisciplinary outlook on computer science expanded, and she was exposed to increasingly dynamic and exciting work environments.

It was during the mid 2000s that her prior work in a progressively interdisciplinary applied research environment allowed her to recognize the emerging field of privacy and consequently shift her research interests from security. She recalls “people started realizing that although privacy and security are related, they are not the same thing.” Once she started focusing on privacy, she witnessed the establishment of the field as we know it today, with the introduction of numerous DARPA and IARPA programs as significant historical milestones. “A lot of the work was in data mining and pattern detection, maybe identifying potential terrorists, but there was also this increasing awareness that this needed to be paired with we needed to protect the privacy of the users in some way, in addition to finding those aggregate patterns.” It was through this quest for safeguarding user privacy that inference detection was developed, which she proudly acknowledged as the most evolved of her scientific contributions.

Staddon describes Google as “the epitome” of applied research environments. “Google is a fantastic place to work on privacy. It really values the work tremendously, it is really concerned about privacy as a problem area, and... from personal experience, it has sort of a broader sense of what it takes to tackle privacy.” With Google under constant public pressure to maintain impeccable privacy standards, I wanted to know what she thought were the most rewarding aspects of her job.

“First, I think I would put the impact out there,” she replied. “I never feel that I’m going to come to work and not make any difference. A lot of places have these sort of active research labs, where researchers have a lot of interesting ideas, but the technology transfer never happens. PARC has a tradition for great ideas that the consumer actually never gets to see, because they are not productionalized. It is simply not an issue at Google. Here, research is very embedded, and technology transfer happens as part of your job.” Secondly, she expressed a deep appreciation toward her colleagues, who create an inspiring and engaging work environment within the company culture.

Although she did not remain in academia after her Ph.D., her constant stream of publications throughout the years serves to dispel the myth that working in industry rarely gets you published. This fear of lack of visibility intimidates countless young researchers each year. “It is hard to imagine while you are in grad school, that there are other ways to have impact on the world which are not publication-oriented,” commented Staddon, adding that she strongly encourages grad students to find other ways to have impact on the world. “Here at Google, you develop something and it can go on to a product that millions of users experience all the time. So the research is more oriented towards user impact and product impact, with publications being an offshoot of that, as opposed to the main goal. However, publishing is understood to be an important part of the job,” she reassured me.

Looking back on her career, the importance of a broad scientific outlook easily stands out as valuable advice for emerging privacy researchers. “Certainly in privacy, it is often the case that ideas which are really established and second nature in other disciplines can really have an impact.” It is no wonder that, in light of the recent NSA scandal, she believes intensified media attention will prove healthy to research and help invigorate progress. “I do think that for research, overall that’s a good thing, with different corners of the world all thinking about this issue. I would like to see the discussion even broader than it is, actually. In the sense that these conversations cause more people to think about these things, I do think that it’s valuable.”

© 2013 ACM 1529-4972/13/09 $15.00



end
labz

CyLab Usable Privacy and Security Laboratory
Pittsburgh, PA

The recent spying disclosures from Edward Snowden are just the latest front of a long-standing debate within our society. As we struggle to balance computer security, privacy, and the public good, a research lab at Carnegie Mellon University remains dedicated to addressing the broad array of challenges collectively called “usable privacy and security.”

The CyLab Usable Privacy and Security Laboratory (CUPS) brings together a diverse team of researchers including computer scientists, engineers, public policy experts, and economists at Carnegie Mellon University. The director of CUPS is Professor Lorrie Faith Cranor, who is a faculty member at both the Institute for Software Research in the School of Computer Science and the Engineering and Public Policy department of the College of Engineering.

Researchers work on issues at the convergence of security, privacy, and usability. One example of the research conducted at CUPS is an ongoing passwords project. Passwords are becoming increasingly ubiquitous as we entrust them with more and more of our data. They are often the only barrier keeping out a potential attacker. Researchers at CUPS are investigating how to provide policies and guidance for users creating passwords. The goal is for organizations to be able to give their users guidelines that lead to secure yet memorable passwords. CUPS password research has ranged from in-person surveys to Mechanical Turk studies with 12,000 participants. Researchers at CUPS have also explored how to measure and compare the strength of passwords in a meaningful way, and this work has included the development of a technique to determine when a given password-cracking algorithm would crack a given password in much less time than actually running the algorithm.

Another arm of CUPS research involves understanding how users make privacy decisions, specifically looking at whether website privacy policies can be presented in a more human-readable form, such as using a design inspired by food nutrition labels. Recent work has examined what leads Twitter users to regret their posts, and whether a proposed set of data-sharing categories for smartphone apps is consistently understood by users. In each of these cases, CUPS researchers are interested in how computer users think and make decisions, in order to help those users make better, more informed choices.

CUPS is also actively engaged in research to gain a better understanding of how the practice of online behavioral advertising affects user privacy. In online behavioral advertising, companies track users online to show them targeted ads. What options do advertising companies provide for users not to be tracked? Can users manage to use the available options to protect their privacy? Are online advertisers following their own disclosure rules? These are among the questions CUPS lab members have addressed through this research.

CUPS researchers have delved into many other areas as well. Work has examined how users react to different computer warnings, such as the warnings browsers display when a user attempts to navigate to a website with an invalid certificate. Other areas of interest include looking at how users manage the data and files on their devices in the home environment. Still other work has investigated how to train users to avoid phishing attacks. The researchers working on that project developed an online training game and other anti-phishing tools. They ended up starting a company called Wombat Security to commercialize these tools and sell them to companies around the world to train their employees.

Students contribute meaningfully to all phases of CUPS research. With faculty guidance, students take part in creating and developing study instruments and software, gathering data, performing data analysis, and writing up the results for publication. In fact, many research projects within CUPS have their roots in student ideas and class projects.

Internet Explorer is the only browser that ships with the Do Not Track HTTP header enabled by default.

The latest news from CUPS is that many of its faculty are collaborating to offer a unique master’s degree for privacy engineers. Called the Master of Science in Information Technology-Privacy Engineering, this degree is offered jointly through Carnegie Mellon University’s School of Computer Science and College of Engineering. The one-year graduate program is intended to prepare its students to create and develop systems to protect user privacy through a combination of classes on privacy and security and a practical hands-on capstone project. The first class in this program began Fall 2013.

As you can see, the CUPS lab is engaged in many different strands of research. These diverse projects, however, all share a common objective: CUPS seeks to navigate the increasingly complex space of computer security and privacy in order to help users be more secure and better able to make privacy decisions. CUPS research also influences public policy makers. CUPS faculty members have been invited to testify at Congressional hearings and are regularly called upon to advise U.S. federal agencies on privacy issues. Further, CUPS research is published at the top security and human-computer interaction conferences, as well as at SOUPS, the Symposium On Usable Privacy and Security, which was founded by CUPS director Lorrie Faith Cranor.

Researchers from the CyLab Usable Privacy and Security Laboratory at CMU.

Biography
Rich Shay is a Ph.D. student at Carnegie Mellon University in the School of Computer Science. He conducts research on usable privacy and security. His current research focuses on password-composition policies, as well as online privacy and online behavioral advertising.
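The CUPS password-strength work described above rests on the notion of a guess number: the number of guesses a particular cracking algorithm makes before it tries a given password. The toy sketch below shows why such a number can sometimes be computed without running the attack. It is not the CUPS technique.

The sketch assumes a hypothetical attacker model in which a password is a dictionary word followed by a suffix, chosen independently with the made-up frequencies shown. The attacker guesses in decreasing order of probability.

```python
from bisect import bisect_right

# Hypothetical training counts for a toy "word + suffix" password model.
WORD_COUNTS = {"password": 50, "monkey": 20, "dragon": 10, "sunshine": 5}
SUFFIX_COUNTS = {"": 40, "1": 25, "123": 15, "2013": 3}

def guess_number(word_counts, suffix_counts, word, suffix):
    """1-based position of word+suffix in a most-probable-first guess order.

    Probabilities are proportional to products of counts, so integer
    products are compared exactly. Ties are resolved in the attacker's
    favor. Cost is O(W log S) rather than enumerating all W * S guesses."""
    target = word_counts[word] * suffix_counts[suffix]
    sorted_suffix = sorted(suffix_counts.values())
    more_likely = 0
    for c_word in word_counts.values():
        # For positive integers: c_word * c_suffix > target
        # holds exactly when c_suffix > target // c_word.
        threshold = target // c_word
        more_likely += len(sorted_suffix) - bisect_right(sorted_suffix, threshold)
    return more_likely + 1

def guess_number_brute_force(word_counts, suffix_counts, word, suffix):
    """Reference version that enumerates every candidate guess."""
    target = word_counts[word] * suffix_counts[suffix]
    return 1 + sum(
        1
        for c_word in word_counts.values()
        for c_suffix in suffix_counts.values()
        if c_word * c_suffix > target
    )

print(guess_number(WORD_COUNTS, SUFFIX_COUNTS, "dragon", "123"))  # 10
```

Sorting the suffix counts once and using binary search makes the cost grow with the number of words times the logarithm of the number of suffixes, not with the number of possible guesses. The brute-force function is there only to cross-check the result.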



European regulators have forced


Facebook to stop running facial
recognition software on photos
uploaded by users.

back

WLAN Security

It is entirely likely that you, reader, are within 10 feet of a device that can communicate on a wireless local area network (WLAN). Perhaps there is even one in your hand or pocket right now. It’s no big surprise that much of the world is becoming absolutely inundated with such devices, and it’s been a long time coming. With such a drastic increase in the number of these devices, however, comes a drastic increase in the amount of private information that is broadcast on radio waves for anyone who is willing to listen. Fortunately, we have ways of protecting this information, but it hasn’t always been so secure.

The first wireless computer network was developed by Dr. Norman Abramson at the University of Hawaii. It was called ALOHAnet and included seven computers across four islands. ALOHAnet pioneered a lot of interesting technology and some ideas that are in use today, but most modern WLANs are implemented using the IEEE 802.11 standards. The first of these was released in 1997 and included a clause describing Wired Equivalent Privacy (WEP). WEP used the RC4 stream cipher for encryption and a 24-bit initialization vector. The way WEP uses RC4 and the short initialization vector, however, was shown to be exploitable in 2001 by Scott Fluhrer, Itsik Mantin, and Adi Shamir. This fact, along with further demonstrated exploits, led to the development of Wi-Fi Protected Access (WPA) and Wi-Fi Protected Access II (WPA2).

WPA was meant to be a temporary replacement for WEP until WPA2 became available in 2004 as part of the 802.11i amendment. WPA implemented much of the draft 802.11i amendment and adopted the Temporal Key Integrity Protocol (TKIP), which generates a new key for each packet. This made WPA secure from the types of attacks that had previously plagued WEP. WPA2 went further and introduced the Counter Mode Cipher Block Chaining Message Authentication Code Protocol (CCMP), based on the Advanced Encryption Standard (AES) block cipher. The ratification of the 802.11i draft standard, which marked the release of WPA2, also officially deprecated WEP. A security flaw was revealed in 2011 for routers that had the Wi-Fi Protected Setup (WPS) feature enabled, but WPA2 is still far more secure than the other options and is still in common use.

Finn Kuusisto

Wireless network (left) by Porao. Linksys router by Jonathan Zander (Digon3).

                   WEP            WPA2
Base Standard      802.11-1997    802.11-2007
Year Introduced    1997           2004
Year Deprecated    2004           (not deprecated)
Supersedes         (none)         WPA
Cipher             RC4            AES
Cipher Type        Stream         Block
Flaws              Multiple       Routers with the Wi-Fi Protected Setup
                                  feature enabled are vulnerable
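Part of WEP's weakness comes from its 24-bit initialization vector (IV). The sketch below illustrates two consequences; it is a demonstration, not an attack implementation. It rests on the following assumptions:

- RC4 is implemented as the textbook keystream generator.
- Each WEP-style per-packet key is the 3-byte IV prepended to the shared key. WEP's CRC-32 integrity value is omitted.
- IVs are drawn uniformly at random.
- The key and packets are made-up values.

The first consequence is that reusing an IV reuses the keystream, so XORing two ciphertexts cancels it and leaks the XOR of the plaintexts. The second is that, by the birthday bound, an IV repeat becomes more likely than not after only a few thousand packets. The Fluhrer-Mantin-Shamir attack itself, which exploits RC4's key scheduling, is not shown.

```python
def rc4_keystream(key, length):
    """Textbook RC4: key-scheduling (KSA), then output generation (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def wep_like_encrypt(iv, root_key, plaintext):
    """WEP-style encryption: the per-packet RC4 key is IV || shared key.
    (The CRC-32 integrity value real WEP appends is omitted here.)"""
    stream = rc4_keystream(iv + root_key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, stream))

def iv_collision_probability(packets, iv_bits=24):
    """Birthday bound: chance that some IV repeats among `packets` random IVs."""
    space = 2 ** iv_bits
    p_all_distinct = 1.0
    for i in range(packets):
        p_all_distinct *= 1 - i / space
    return 1 - p_all_distinct

# Usage with made-up values.
root_key = bytes.fromhex("0123456789")  # hypothetical 40-bit shared key
iv = bytes([0x01, 0x02, 0x03])
p1 = b"GET /index.html"
p2 = b"PIN=1234;user=x"
c1 = wep_like_encrypt(iv, root_key, p1)
c2 = wep_like_encrypt(iv, root_key, p2)
# Same IV, same keystream: XOR of ciphertexts equals XOR of plaintexts.
leak = bytes(a ^ b for a, b in zip(c1, c2))
```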



hello world

Zero-Knowledge Proofs
BY Marinka Zitnik

A zero-knowledge proof allows one person to convince another person of some statement without revealing any information other than the fact that the statement is indeed true. Zero-knowledge proofs are of practical and theoretical interest in cryptography and mathematics. They achieve the seemingly contradictory goal of proving a statement without revealing anything beyond its truth. We will describe interactive proof systems and some implications that zero-knowledge proofs have for complexity theory. We will then conclude with an application of zero-knowledge proofs in cryptography, the Fiat-Shamir identification protocol, which is the basis of current zero-knowledge entity authentication schemes.

Definition 1: Definitions of parties participating in Fiat-Shamir identification protocol in Python.

import math
import secrets

class Prover:
    def __init__(self, tc, s):
        # Peggy's secret s must be coprime to the public modulus n
        assert math.gcd(s, tc.n) == 1, 'Secret s and n are not coprime.'
        self.n = tc.n
        self.__s = s
        # Register the public key v = s^2 mod n with the trusted center
        tc.set_up(pow(self.__s, 2, self.n))

    def generate(self):
        # Commitment: random r in [1, n-1]; send x = r^2 mod n
        self.r = secrets.randbelow(self.n - 1) + 1
        x = pow(self.r, 2, self.n)
        return x

    def response(self, e):
        # y = r*s mod n if the challenge bit is 1, else y = r
        y = (self.r * self.__s) % self.n if e else self.r
        return y

class Verifier:
    def __init__(self, tc):
        self.tc = tc

    def set_up(self, x):
        # Store Peggy's commitment x for this round
        self.x = x

    def challenge(self):
        # Random challenge bit e in {0, 1}
        self.e = secrets.randbelow(2)
        return self.e

    def verify(self, y):
        # Accept the round only if y != 0 and y^2 = x * v^e (mod n)
        test = (self.x * pow(self.tc.v, self.e, self.tc.n)) % self.tc.n
        if y == 0 or pow(y, 2, self.tc.n) != test:
            return False
        return True

class TrustedCenter:
    def __init__(self, n):
        # Public modulus n = pq; the primes p and q stay secret
        self.n = n

    def set_up(self, v):
        # Publish Peggy's public key v
        self.v = v

Interactive Proof Systems

We first give an overview of an interactive proof system. There are two participants in the system, Peggy and Victor. Peggy is the “prover” and Victor is the “verifier.” Peggy knows a fact, and she wishes to communicate to Victor that she knows it without revealing it. We can think of Peggy and Victor as being probabilistic algorithms that communicate with each other through a communication channel. Initially, they both possess some input object (e.g., a graph representation). The objective is for Peggy to convince Victor that this object has a specified property, i.e., that it is a yes-instance of a particular decision problem. For instance, we could ask if the given graph is isomorphic to another graph [1] or if a given graph is 3-colorable [1].

The interactive proof is a challenge and response protocol that consists of a number of iterations. In each iteration Peggy and Victor do the following: (1) Victor challenges Peggy with a problem instance, (2) Peggy performs some private computation, and (3) she sends a response to Victor. At the end of the proof, Victor either accepts or rejects, depending on whether or not Peggy successfully replies to all of Victor’s challenges.

Interactive proof systems have to be sound and complete [1]. A proof is complete if honest Victor will always be convinced of a true statement by honest Peggy. It is sound if cheating Peggy can convince honest Victor that a false statement is true with only a small probability.
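The challenge-and-response structure, completeness, and soundness can all be seen in the graph-isomorphism example mentioned above. Below is a minimal sketch of the textbook zero-knowledge proof that two graphs are isomorphic. It is not code from this article, and the names are hypothetical:

- Graphs are frozensets of edges.
- As in Fiat-Shamir, Peggy commits first and Victor's challenge is a single bit.
- A seeded `random.Random` stands in for the cryptographically secure randomness (for example, from the `secrets` module) that a real implementation would need.

```python
import random

def make_graph(edges):
    """An undirected graph as a frozenset of 2-element vertex sets."""
    return frozenset(frozenset(e) for e in edges)

def apply_perm(perm, graph):
    """Relabel every edge {u, v} as {perm[u], perm[v]}."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in map(tuple, graph))

class HonestPeggy:
    """Knows a secret permutation pi with G1 = pi(G0)."""

    def __init__(self, g0, g1, pi, rng):
        assert apply_perm(pi, g0) == g1, "pi must map G0 onto G1"
        self.g1, self.pi, self.rng = g1, pi, rng

    def commit(self):
        # Fresh random relabelling sigma; send H = sigma(G1).
        self.sigma = list(range(len(self.pi)))
        self.rng.shuffle(self.sigma)
        return apply_perm(self.sigma, self.g1)

    def respond(self, b):
        # b = 1: reveal sigma, since H = sigma(G1).
        # b = 0: reveal sigma o pi, since H = sigma(pi(G0)).
        if b == 1:
            return self.sigma
        return [self.sigma[self.pi[v]] for v in range(len(self.pi))]

class GuessingPeggy:
    """Knows no isomorphism: bets on Victor's bit before committing."""

    def __init__(self, g0, g1, n, rng):
        self.graphs, self.n, self.rng = (g0, g1), n, rng

    def commit(self):
        self.guess = self.rng.randrange(2)
        self.sigma = list(range(self.n))
        self.rng.shuffle(self.sigma)
        return apply_perm(self.sigma, self.graphs[self.guess])

    def respond(self, b):
        # Passes only if b == guess (G0 and G1 are different edge sets).
        return self.sigma

def victor_accepts(prover, g0, g1, n, rounds, rng):
    """Run `rounds` challenge-response iterations; accept only if all pass."""
    for _ in range(rounds):
        h = prover.commit()
        b = rng.randrange(2)
        tau = prover.respond(b)
        if sorted(tau) != list(range(n)):
            return False  # response is not a permutation of the n vertices
        if apply_perm(tau, g1 if b == 1 else g0) != h:
            return False
    return True

# Usage: a 4-vertex path and a relabelled copy of it.
rng = random.Random(1)
path = make_graph([(0, 1), (1, 2), (2, 3)])
pi = [2, 0, 3, 1]
relabelled = apply_perm(pi, path)
print(victor_accepts(HonestPeggy(path, relabelled, pi, rng), path, relabelled, 4, 20, rng))  # True
```

An honest prover is accepted in every run (completeness). Now suppose a prover tries to pass off a path and a star on four vertices, which are not isomorphic, as isomorphic. She can only bet on each bit, so she survives each round with probability ½ (soundness). Victor sees only a random relabelling of one graph, which he could have produced himself, and that is the intuition behind zero-knowledge.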
A zero-knowledge proof is an interesting type of interactive proof. It is one in which Victor, at the end of the proof, is convinced that the object has the property of interest but still has no idea how to prove this by himself. Readers can find a formal definition of the zero-knowledge property in Stinson’s Cryptography: Theory and Practice [1] and in “Definitions and Properties of Zero-Knowledge Proof Systems” by Goldreich and Oren [2].

Fiat-Shamir Identification Protocol

Zero-knowledge proofs in cryptography have natural applications for entity authentication. We assume Peggy possesses some secret s that only she can know. She proves to Victor that she is indeed Peggy by proving she possesses that secret. Obviously she wants to do so without revealing the secret to any eavesdropper. The Fiat-Shamir identification protocol [3] serves as the basis of modern zero-knowledge identification protocols, such as the Feige-Fiat-Shamir and Guillou-Quisquater schemes.

Three parties (see Definition 1) participate in the protocol, which consists of two phases: initialization and identification (see Definition 2). In initialization, a trusted center selects two primes p and q, keeps them secret, and publishes $n = pq$. Then Peggy selects a secret number s that is coprime to n, computes $v = s^2 \bmod n$, and registers v with the trusted center as her public key.

The identification phase is repeated t times, and if Victor successfully completes all t iterations, he accepts. In each iteration, Peggy chooses a random r and sends $x = r^2 \bmod n$ to Victor. Then Victor randomly selects a bit b and sends it to Peggy. She privately computes $y = r$ (if $b = 0$) or $y = rs \bmod n$ (if $b = 1$) and sends y to Victor. Finally, Victor rejects if $y = 0$ or if $y^2 \not\equiv x v^b \pmod{n}$.

Definition 2: Initialization and identification phases of Fiat-Shamir identification protocol in Python.

def fiat_shamir_initialization(n, s):
    # The trusted center publishes n; Peggy registers v = s^2 mod n
    tc = TrustedCenter(n)
    peggy = Prover(tc, s)
    return tc, peggy

def fiat_shamir_identification(t, tc, peggy):
    victor = Verifier(tc)
    for _ in range(t):
        # One round: commitment x, challenge bit, response y, check
        victor.set_up(peggy.generate())
        c = victor.verify(peggy.response(victor.challenge()))
        if not c:
            print('Reject')
            return
    print('Accept')

The Fiat-Shamir protocol is complete because honest Peggy can always correctly provide Victor with y based on the bit b that he selected. Therefore, honest Victor will successfully complete all t iterations and will accept with probability 1. If Peggy (or an impostor) does not possess the secret s, then she can prepare x so as to answer only one of the two possible challenges, in effect guessing b in advance. Honest Victor will therefore reject with probability ½ in every iteration. That implies an overall probability of $2^{-t}$ that cheating Peggy will not be caught, and as a result the Fiat-Shamir protocol is sound. The Fiat-Shamir scheme also upholds the property of zero-knowledge. The only information revealed in each round is x, b, and y. Such transcripts could be simulated without knowing s by choosing b and y randomly and then computing the corresponding $x = y^2 v^{-b} \bmod n$. These simulated transcripts are computationally indistinguishable from those generated by the protocol.
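The soundness and zero-knowledge arguments above both rest on one algebraic trick: pick y first, then solve $y^2 \equiv x v^b \pmod{n}$ for x. The minimal sketch below uses that trick twice:

- As a simulator, it produces accepting transcripts without knowing s.
- As an impostor, it commits to x before seeing b, so it survives a round only by guessing b correctly.

The helper names are hypothetical and are not part of Definitions 1 and 2. The modular inverse via `pow` needs Python 3.8 or later. The toy values n = 35 and v = 11 (that is, 16² mod 35) are far too small for real use.

```python
import math
import random

def verifier_accepts(n, v, x, b, y):
    """Victor's check from the protocol: y != 0 and y^2 = x * v^b (mod n)."""
    return y != 0 and (y * y) % n == (x * pow(v, b, n)) % n

def simulate_round(n, v, b, rng):
    """Produce a transcript (x, b, y) WITHOUT the secret s: choose a unit y,
    then solve for x = y^2 * v^(-b) mod n so that Victor's check holds."""
    while True:
        y = rng.randrange(1, n)
        if math.gcd(y, n) == 1:
            break
    x = (y * y * pow(v, -b, n)) % n
    return x, b, y

def impostor_passes(n, v, t, rng):
    """An impostor without s bets on each challenge bit, commits to a
    simulated x, and is caught whenever Victor's real bit differs."""
    for _ in range(t):
        guess = rng.randrange(2)
        x, _, y = simulate_round(n, v, guess, rng)
        b = rng.randrange(2)  # Victor's challenge, chosen after x is sent
        if not verifier_accepts(n, v, x, b, y):
            return False
    return True

# Honest round by hand: r = 3 gives x = 9; for b = 1, y = 3 * 16 mod 35 = 13.
print(verifier_accepts(35, 11, 9, 1, 13))  # True
```

Running `impostor_passes` many times with t = 1 should succeed about half the time, and with t = 10 about once in 1,024 runs, matching the $2^{-t}$ bound.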
Definitions 1 and 2 are a straightforward implementation of the described Fiat-Shamir identification protocol. Let us see an example with $p = 7$ and $q = 5$. Then $n = 35$, and the trusted center publishes n. Let us assume Peggy secretly chooses $s = 16$, which is coprime to 35. She registers $v = 16^2 \bmod 35 = 11$ with the trusted center. Victor requires 10 successful rounds of the protocol in order to accept (see Definition 3).

Definition 3: An example run of the Fiat-Shamir identification protocol. Suppose the trusted center selects an RSA-like modulus n=35, Peggy secretly chooses s=16, and Victor requires t=10 successful iterations of the protocol.

>>> tc, peggy = fiat_shamir_initialization(35, 16)
>>> fiat_shamir_identification(10, tc, peggy)
Accept

Zero-Knowledge Proofs and NP Complexity Class

Zero-knowledge proofs exist for decision problems such as graph isomorphism, 3-colorability, quadratic residuosity, and quadratic non-residuosity. Readers might now ask for which problems we can design zero-knowledge proofs. A powerful and general result [4] informally says that any language for which membership can be efficiently verified can be proved in zero-knowledge: zero-knowledge proofs exist for all problems in NP, provided that one-way functions exist. This result is used in the design of cryptographic protocols, because it can force parties to behave according to predetermined standards.

Conclusion

Zero-knowledge proofs have some fascinating applications. We might use them to enforce honest behavior. For instance, parties in an interactive game could prove they are not cheating. At the outset of the game, parties commit to the secret inputs and random coins of the prescribed tools they are supposed to use. They then carry out the game procedures, and with each output message they prove to each other in zero-knowledge that the message was honestly obtained under the committed inputs and random coins. Properties of zero-knowledge systems guarantee that participants must act honestly in order to provide a valid proof (soundness) and that the proofs cannot compromise the privacy of their secret inputs (zero-knowledge).

Zero-knowledge systems are also useful to assure deniability and to prevent unwanted transfer of information. Suppose Alice wants to prove to her classmate Bob that she did her essay homework. One way to do this is for Alice to show her homework to Bob. However, what if Bob wants to cheat by copying Alice’s essay? The problem is that an essay identifying Alice as its author is transferable. Instead, Alice should prove to Bob using zero-knowledge that she did the work. This is an NP-statement, since the work is a valid witness, which Alice has in her possession. Bob will believe the proof, but he will not be able to convincingly transfer the transcript of that proof to anybody else. For all we know, Bob could have created the encoded transcript of the homework on his own by running a simulator. In other words, Alice’s proof is deniable, in that she can plausibly claim she was not responsible for producing it.

References

[1] Stinson, D. R. Cryptography: Theory and Practice. Chapman & Hall, Boca Raton, FL, 2005.
[2] Goldreich, O. and Oren, Y. Definitions and properties of zero-knowledge proof systems. Journal of Cryptology 7, 1 (1994), 1-32.
[3] Fiat, A. and Shamir, A. How to prove yourself: Practical solutions to identification and signature problems. Advances in Cryptology: Crypto ’86 (1987), 186-194.
[4] Goldreich, O., Micali, S., and Wigderson, A. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the ACM 38, 3 (1991), 690-728.

© 2013 ACM 1529-4972/13/09 $15.00



end

Events

conferences

Second Nordic Symposium on Cloud Computing & Internet Technologies (NORDICLOUD 2013)
SINTEF ICT
Oslo, Norway
September 1-3, 2013
http://nordicloud.net/

Fifth Latin American Conference on Networked and Electronic Media (LACNEM 2013)
Manizales, Colombia
September 1-4, 2013
http://lacnem.org/

International Conference on Availability, Reliability and Security (ARES 2013)
University of Regensburg
Regensburg, Germany
September 2-6, 2013
http://www.ares-conference.eu/conf/

Federated Conference on Computer Science and Information Systems (FedCSIS)
AGH University of Science and Technology
Krakow, Poland
September 8-11, 2013
http://fedcsis.org/

On the Move 2013
Graz, Austria
September 9-13, 2013
http://www.onthemove-conferences.org/

The 18th European Symposium on Research in Computer Security (ESORICS 2013)
Royal Holloway, University of London
Egham, UK
September 9-13, 2013
http://esorics2013.isg.rhul.ac.uk/

Security and Trust Management 2013 (STM 2013)
Egham, UK
September 9-13, 2013
https://sites.google.com/site/sectrustmgmt2013/Home

The World Security Summit 2013 (WSS 2013)
Global Lanka
Colombo, Sri Lanka
September 11-13, 2013
http://www.asdf-wss.org/

Eighth International Workshop on Data Privacy Management (DPM 2013)
Royal Holloway, University of London
Egham, UK
September 12-13, 2013
http://research.icbnet.ntua.gr/DPM2013/

Central European Conference on Information and Intelligent Systems (CECIIS 2013)
Varazdin, Croatia
September 18-20, 2013
http://ceciis.foi.hr/app/index.php/ceciis/2013

Sixth Balkan Conference in Informatics (BCI 2013)
The Met Hotel
Thessaloniki, Greece
September 19-21, 2013
http://bci2013.bci-conferences.org/index.php

The Second International Conference on Informatics & Applications (ICIA 2013)
Technical University of Lodz
Lodz, Poland
September 23-25, 2013
http://sdiwc.net/conferences/2013/icia2013/

International Conference on Advanced Computer Science and Information Systems (ICACSIS 2013)
Pullman Bali Legian Nirwana
Bali, Indonesia
September 28-29, 2013
http://icacsis.cs.ui.ac.id/index.php/icacsis2013/ICACSIS2013

The Seventh International Conference on Application of Information and Communication Technologies (AICT 2013)
Technical University of Lodz
Baku, Azerbaijan
October 9-11, 2013
http://www.aict.info/2013/#sthash.058BBz4T.dpbs

Third International Conference on Computer Science and Network Technology (ICCSNT 2013)
Dalian Jiaotong University
Dalian, China
October 12-13, 2013
http://www.iccsnt.org/

Visualization for Cyber Security (VizSec 2013)
Marriott Marquis Hotel
Atlanta, GA
October 14, 2013
http://vizsec.org/

First IEEE Conference on Communications and Network Security (IEEE CNS 2013)
Washington D.C.
October 14-16, 2013
http://www.ieee-cns.org/

Ninth International Conference on Network and Service Management (CNSM 2013)
Swissotel Zurich
Zurich, Switzerland
October 14-18, 2013
http://www.cnsm-conf.org/2013/index.html

The Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2013)
Beijing University of Technology
Beijing, China
October 16-18, 2013
http://www.bjut.edu.cn/college/dzxxkz/iihmsp13/

Ninth IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
Hilton Austin
Austin, TX
October 20-23, 2013
http://collaboratecom.org/2013/show/home

The 38th IEEE Conference on Local Computer Networks (LCN 2013)
Novotel Central Sydney
Sydney, Australia
October 21-24, 2013
http://www.ieeelcn.org/index.html

68 XRDS • fall 2013 • Vol.20 • No.1


19th International Conference on Advanced Computing and Communications (ADCOM 2013)
Indian Institute of Technology
Chennai, India
October 21-25, 2013
http://accsindia.org/accs-adcom-2013.html

Eighth IEEE Workshop on Network Security (WNS)
Sydney, Australia
October 24, 2013
http://wns-lcn2013.conference.nicta.com.au/

ACM International Conference on Information and Knowledge Management (CIKM 2013)
San Francisco Airport Marriott Waterfront
Burlingame, CA
October 27-November 1, 2013
http://www.cikm2013.org/index.php

XXIV International Conference on Information, Communication & Automation Technologies (ICAT 2013)
Sarajevo, Bosnia and Herzegovina
October 30-November 1, 2013
http://icat.etf.unsa.ba/icat-2013/cms/

Workshop on Privacy in the Electronic Society (WPES)
Berlin, Germany
November 4, 2013
http://wpes2013.di.unimi.it/

20th ACM Conference on Computer and Communications Security (CCS 2013)
Berlin Congress Centre
Berlin, Germany
November 4-8, 2013
http://www.sigsac.org/ccs/CCS2013/

2013 15th International Conference on Communication Technology (ICCT 2013)
Guilin, China
November 17-19, 2013
http://conference.bupt.edu.cn/icct2013/index.html

IEEE International Workshop on Information Forensics and Security (WIFS 2013)
Chime Long Hotel
Guangzhou, China
November 18-21, 2013
http://www.wifs13.org/index.asp#.UbCG_dI9FLE

Australasian Telecommunication Networks & Applications Conference (ATNAC 2013)
The Chateau on the Park
Christchurch, New Zealand
November 20-22, 2013
http://www.atnac.org/

The Sixth International Conference on Security of Information and Networks (SINCONF 2013)
Aksaray University
Aksaray, Turkey
November 26-28, 2013
http://www.sinconf.org/sin2013/

International Conference on Research and Innovation in Information Systems (ICRIIS 2013)
Universiti Tenaga Nasional
Selangor, Malaysia
November 27-28, 2013
http://seminar.spaceutm.edu.my/icriis2013/index.html

featured event
Workshop on Privacy in the Electronic Society
Berlin, Germany
November 4, 2013

All of us are living in a rapidly changing digital world, so it's no surprise that privacy is one of the biggest and most serious problems facing us all. With more than a billion users on Facebook and Twitter, and people sharing vast amounts of information online, concern for maintaining one's own identity and secrecy has never been as high. According to Andrew Grove, co-founder and former CEO of Intel, there exists a force right at the heart of Internet culture that wants to know everything about a person, and once that information is obtained it becomes a very valuable asset to be traded upon.

To address the problems of privacy, the Workshop on Privacy in the Electronic Society will be held this fall in conjunction with the 20th ACM Conference on Computer and Communications Security. There will be discussions and presentations on topics ranging from Internet and data privacy to human rights and privacy policies. The workshop promises to address both the problems of privacy and their solutions in globally interconnected communities.

Workshop organizers are seeking submissions from both academia and industry. For details, log onto http://wpes2013.di.unimi.it/.
—Arka Bhattacharya

Photograph by S. Borisov

CONTESTS & EVENTS

Zero Robotics High School Tournament 2013
Zero Robotics is a robotics programming competition in which the robots are SPHERES (Synchronized Position Hold Engage and Reorient Experimental Satellites) satellites inside the International Space Station (ISS). The competition kicks off online (http://www.zerorobotics.org), where teams guided by mentors compete to solve an annual challenge. Participants can create, edit, share, save, simulate, and submit code, all from a Web browser.
The participants compete to win a technically challenging game by programming their strategies into the SPHERES satellites. The game is motivated by a current problem of interest to DARPA, NASA, and MIT. After several phases of virtual competition, finalists are selected to compete in a live championship aboard the ISS. An astronaut will conduct the championship competition in microgravity with a live broadcast! High school students ages 16-18 may participate in this competition, which begins on September 7th of this year.

Columbia Startup Weekend
Columbia Startup Weekend will be a 54-hour event held in Columbia, Missouri, designed to mobilize technical and non-technical entrepreneurs. Starting with Friday night pitches and continuing through brainstorming, business plan development, and basic prototype creation, the event will culminate with Sunday night demos and presentations. Participants will create working startups during the event and collaborate with like-minded individuals outside their daily networks. All teams will attend talks by industry leaders and receive valuable feedback from local entrepreneurs. Whether you are looking for feedback on an idea, a co-founder, specific skill sets, or a team to help you execute, Columbia Startup Weekend is the perfect environment in which to test your idea and take the first steps toward launching your own startup. For more information see http://swcolumbia2013-eorg.eventbrite.com/.

GRANTS, SCHOLARSHIPS & FELLOWSHIPS

Microsoft Ph.D. Fellowship Program
Website: http://research.microsoft.com/en-us/collaboration/awards/apply-us.aspx
Deadline: October 2013
Eligibility: Second- and third-year Ph.D. students in computer science, electrical engineering, and mathematics studying in the U.S. or Canada.
Benefits: Two years of tuition and fees, $28,000 stipend, $4,000 travel budget.
Explanation: The two-year fellowship program is for outstanding Ph.D. students nominated by their universities. Fellowships are granted by Microsoft Research at the discretion of Microsoft.

Hertz Fellowship
Website: http://www.hertzfoundation.org/dx/fellowships/application.aspx/
Deadline: Fall 2013
Eligibility: Citizens or permanent residents of the U.S. who are "willing to morally commit to make their skills available to the United States in time of national emergency."
Benefits: Tuition and $31,000-$35,000 stipend for up to five years.
Explanation: The Hertz Foundation awards fellowships to students pursuing a Ph.D. in the applied physical, biological, and engineering sciences. The selection process includes a technical interview.

Paul and Daisy Soros Fellowship Program for New Americans
Website: http://www.pdsoros.org/overview/
Deadline: November 8, 2013
Eligibility: Recent American permanent residents and children of naturalized U.S. citizens, before their second year in a graduate program.
Benefits: Tuition and living expenses for two years.
Explanation: This fellowship is awarded to 30 new Americans. The selection criteria are based only on merit, not on financial need.

National Physical Science Consortium Fellowship
Website: http://www.npsc.org/index.html
Deadline: November 30, 2013
Eligibility: U.S. citizens pursuing graduate work at an NPSC member institution.
Benefits: Two to six years of support.
Explanation: This fellowship includes one or two paid summer internships at a government agency. Ninety-five percent of awardees have been women or members of an ethnic minority living within the U.S.

acronyms

ECHR European Convention on Human Rights: An international treaty to protect human rights and fundamental freedoms in Europe. Its Article 8 guarantees the individual's right to respect for their "private and family life."

PII Personally Identifiable Information: A legal concept, mainly used in information security. It is any information that can be used on its own or with other information to contact, locate, or identify a single person or an individual in a given context.

POWF Physical One-Way Function: An optical PUF that is practically unclonable.

PUF Physically Unclonable Function: In practical cryptography, a PUF is a function that is embodied in a physical structure and is easy to evaluate but hard to predict.

TIA Total (later renamed Terrorism) Information Awareness: A DARPA program started in 2003 with the goal of integrating information technologies into a prototype system that would provide tools to better detect, classify, and identify potential foreign terrorists. Due to public criticism that the system could lead to mass surveillance, Congress defunded the program before year's end.

TOR The Onion Router: A network of virtual tunnels that allows people and groups to improve their privacy and security on the Internet. Its goal is to enable online anonymity.

POINTERS

PRIVACY RESOURCES
"The Panopticon is a marvelous machine which, whatever use one may wish to put it to, produces homogeneous effects of power," wrote Michel Foucault in 1975. Since then, the "panoptical concept" has evolved from dystopian prisons and schools to the Internet, which in its openness lends itself to a massive surveillance state. Indeed, there's a tension between the systematic



perversion of privacy through government spying and the need for information to be free. As our private actions are tracked, questions of whether government and business should be more accountable have arisen. Where does that leave the question of privacy in the digital age? Scientists, hackers, philosophers, and journalists have debated it since the days of ARPANET. Look no further: here are some gems surrounding this discussion!
—Ashok Rao

"Edward Snowden, NSA files source: 'If they want to get you, in time they will'"
In this new digital world, there exist both social media and PRISM, the government's robust spying apparatus. Some praise the heroism of whistleblowers who stand up to a large government; others praise the government's initiative in preventing future acts of terror. But as a footnote, I wonder: are these not two sides of the same coin? Read more in this exclusive interview.
http://www.guardian.co.uk/world/2013/jun/09/nsa-whistleblower-edward-snowden-why

"Megaupload, the Copyright Lobby, and the Future of Digital Rights"
by Robert Amsterdam and Ira Rothken
The question of privacy is intimately connected with that of digital rights. Remember the Internet outcry over SOPA? This is a biased but detailed and interesting look at the delicate relationship between government and technology, viewed through the prism of law.
http://www.kim.com/whitepaper.pdf

"A Simple Public Choice Theory of Universal Surveillance"
In a blog post from earlier this year, Tyler Cowen, professor of economics at George Mason University and blogger, sparks a robust discussion about surveillance and who is harmed in the process.
http://marginalrevolution.com/marginalrevolution/2013/06/a-simple-public-choice-theory-of-universal-surveillance.html

READING LIST

Little Brother
Cory Doctorow, Tor Books (2010)
An award-winning book that you might not guess is fiction. "Marcus, a.k.a. 'w1n5t0n,' is only 17 years old, but he figures he already knows how the system works–and how to work the system. Smart, fast, and wise to the ways of the networked world, he has no trouble outwitting his high school's intrusive but clumsy surveillance systems. But his whole world changes when he and his friends find themselves caught in the aftermath of a major terrorist attack on San Francisco. In the wrong place at the wrong time, Marcus and his crew are apprehended by the Department of Homeland Security and whisked away to a secret prison where they're mercilessly interrogated for days. When the DHS finally releases them, Marcus discovers that his city has become a police state where every citizen is treated like a potential terrorist. He knows that no one will believe his story, which leaves him only one option: to take down the DHS himself." (From Amazon)

The Transparent Society: Will Technology Force Us to Choose Between Privacy and Freedom?
David Brin, Basic Books (1999)
"The Transparent Society is a call for 'reciprocal transparency.' If police cameras watch us, shouldn't we be able to watch police stations? If credit bureaus sell our data, shouldn't we know who buys it? Rather than cling to an illusion of anonymity—a historical anomaly, given our origins in close-knit villages—we should focus on guarding the most important forms of privacy and preserving mutual accountability. The biggest threat to our freedom, Brin warns, is that surveillance technology will be used by too few people, not by too many." (From Amazon)

Broken Ballots: Will Your Vote Count?
Douglas W. Jones and Barbara Simons, Center for the Study of Language and Information (2012)
Voting machines have been slow to adapt to the digital revolution, and even the newest devices can fall prey to classic flaws. This book takes a tour through the American voting system, "gauging how inaccurate, unreliable, and insecure [they] are." It's an "important book for election administrators, political scientists, and students of government and technology policy." After reading it, you'll understand how we came to acquire the "complex technology" we now depend on to count our votes. Thinking like an adversary gives a whole different perspective on democracy's most sacrosanct action.

NOTEWORTHY WEBSITES

Tor
It's all about encryption and privacy. Even if you don't care about surveillance, the computer science behind the concept is brilliant.
https://www.torproject.org/

Applied Cryptography and Encryption
If you want to understand digital privacy, you have to understand cryptography. This is a free online class taught by David Evans of the University of Virginia. Provided you have a basic understanding of programming and number theory, this course will take you through the quest of breaking secrets in everyday life. And you'll learn all about Tor in the process!
https://www.udacity.com/course/cs387

Bitcoin
It's the Tor of currency. The pseudonymous "Satoshi Nakamoto" lays out the theoretical groundwork of his system in this paper (http://bitcoin.org/bitcoin.pdf). It's not too hard to understand, at least intuitively, and worth every minute of the read. This is only an introduction. If the topic interests you, be sure to read the article on Page 40 about its strengths and weaknesses, of which there are many.
http://bitcoin.org



BEMUSEMENT

A Quiet Evening At Home
PhD Comics ©Jorge Cham

Security
http://xkcd.com/538/

Puzzles:

Calendar Confusion
Three days ago, yesterday was the day before Sunday. What day will it be tomorrow?
Source: http://www.mathisfun.com/puzzles/calendar-confusion-solution.html.

Happy Birthday
When asked about his birthday, a man said: "The day before yesterday I was only 25 and next year I will turn 28." This is true only one day in a year: When was he born?
Source: http://malini-math.blogspot.com/2009/08/simple-math-puzzles.html

Find the solution at: http://xrds.acm.org/bemusement/2013.cfm

Wikileaks
http://xkcd.com/834/

submit a puzzle
Can you do better? Bemusements would like your puzzles and mathematical games (but not Sudoku). Contact xrds@acm.org to submit yours!



ACM STUDENT MEMBERSHIP APPLICATION
CODE: CRSRDS
Join ACM online: www.acm.org/joinacm

Name (please print clearly)
Address
City | State/Province | Postal code/Zip
Country | E-mail address
Area code & daytime phone | Mobile phone | Member number, if applicable

INSTRUCTIONS
Carefully complete this application and return it with payment by mail or fax to ACM. You must be a full-time student to qualify for student rates.

CONTACT ACM
phone: 800-342-6626 (US & Canada)
+1-212-626-0500 (Global)
hours: 8:30am–4:30pm US Eastern Time
fax: +1-212-944-1318
email: acmhelp@acm.org
mail: Association for Computing Machinery, Inc.
General Post Office
P.O. Box 30777
New York, NY 10087-0777
For immediate processing, FAX this application to +1-212-944-1318.

MEMBERSHIP BENEFITS AND OPTIONS
• Free software and courseware through the ACM Academic Initiative
• Free e-mentoring services from MentorNet®
• Electronic subscriptions to Communications of the ACM and XRDS: Crossroads magazines
• Online courses, online books and videos
• ACM's CareerNews (twice monthly)
• ACM e-news digest TechNews (thrice weekly)
• ACM online newsletter MemberNet (monthly)
• Student Quick Takes, ACM student e-newsletter (quarterly)
• Free "acm.org" email forwarding address plus filtering through Postini
• Option to subscribe to the full ACM Digital Library
• Discounts on ACM publications and conferences, valuable products and services, and more

PLEASE CHOOSE ONE:
❏ Student Membership: $19 (USD)
❏ Student Membership PLUS Digital Library: $42 (USD)
❏ Student Membership PLUS Print CACM Magazine: $42 (USD)
❏ Student Membership w/Digital Library PLUS Print CACM Magazine: $62 (USD)

PUBLICATIONS
Check the appropriate box and calculate amount due on reverse.
Publication | Issues per year | Code | Member Rate | Air Rate*
• ACM Inroads | 4 | 178 | $16 ❐ | $58 ❐
• Communications of the ACM | 12 | 101 | $25 ❐ | $58 ❐
• Computers in Entertainment (online only) | 4 | 247 | $43 ❐ | N/A
Computing Reviews | 12 | 104 | $55 ❐ | $39 ❐
• Computing Surveys | 4 | 103 | $36 ❐ | $32 ❐
Evolutionary Computation (MIT Press) | 4 | 177 | $32 ❐ | $30 ❐
• interactions, new visions of human-computer interaction (included in SIGCHI membership) | 6 | 123 | $22 ❐ | $35 ❐
• Int'l Journal of Network Management (online only) (Wiley) | 6 | 136 | $92 ❐ | $30 ❐
Int'l Journal on Very Large Databases | 4 | 148 | $83 ❐ | $30 ❐
• Journal of Educational Resources in Computing (see TOCE) | N/A | N/A | N/A | N/A
• Journal of Experimental Algorithmics (online only) | 12 | 129 | $31 ❐ | N/A
• Journal of Personal and Ubiquitous Computing | 6 | 144 | $65 ❐ | $30 ❐
• Journal of the ACM | 6 | 102 | $55 ❐ | $58 ❐
• Journal on Computing and Cultural Heritage | 4 | 173 | $49 ❐ | $23 ❐
• Journal on Data and Information Quality | 4 | 171 | $49 ❐ | $25 ❐
• Journal on Emerging Technologies in Computing Systems | 4 | 154 | $42 ❐ | $23 ❐
• Linux Journal (SSC) | 12 | 137 | $31 ❐ | $33 ❐
• Mobile Networks and Applications | 6 | 130 | $72 ❐ | $29 ❐
• Wireless Networks | 4 | 125 | $72 ❐ | $29 ❐
• XRDS (included with membership) | 4 | XRoads | $39 ❐ | N/A
Transactions on:
• Accessible Computing | 4 | 174 | $49 ❐ | $24 ❐
• Algorithms | 4 | 151 | $52 ❐ | $23 ❐
• Applied Perception | 4 | 145 | $43 ❐ | $23 ❐
• Architecture & Code Optimization | 4 | 146 | $43 ❐ | $23 ❐
• Asian Language Information Processing | 4 | 138 | $39 ❐ | $23 ❐
• Autonomous and Adaptive Systems | 4 | 158 | $41 ❐ | $23 ❐
• Computational Biology and Bioinformatics | 4 | 149 | $20 ❐ | $49 ❐
• Computer-Human Interaction | 4 | 119 | $43 ❐ | $25 ❐
• Computational Logic | 4 | 135 | $44 ❐ | $25 ❐
• Computation Theory | 8 | 176 | $49 ❐ | $32 ❐
• Computer Systems | 4 | 114 | $47 ❐ | $25 ❐
• Computing Education (formerly JERIC) |  | 277 | $25 ❐ | N/A
• Database Systems | 4 | 109 | $46 ❐ | $25 ❐
• Design Automation of Electronic Systems | 4 | 128 | $43 ❐ | $25 ❐
• Economics and Computation | 4 | 192 | $49 ❐ | $23 ❐
• Embedded Computing Systems | 4 | 142 | $44 ❐ | $23 ❐
• Graphics | 4 | 112 | $51 ❐ | $25 ❐
• Information and System Security | 4 | 134 | $44 ❐ | $23 ❐
• Information Systems | 4 | 113 | $47 ❐ | $25 ❐
• Intelligent Systems and Technology | 4 | 179 | $46 ❐ | $72 ❐
• Interactive Intelligent Systems | 4 | 191 | $49 ❐ | $76 ❐
• Internet Technology | 4 | 140 | $42 ❐ | $23 ❐
• Knowledge Discovery From Data | 4 | 170 | $50 ❐ | $23 ❐
• Management Information Systems | 4 | 190 | $47 ❐ | $22 ❐
• Mathematical Software | 4 | 108 | $47 ❐ | $25 ❐
• Modeling and Computer Simulation | 4 | 116 | $51 ❐ | $25 ❐
• Multimedia Computing, Communications, and Applications | 4 | 156 | $42 ❐ | $23 ❐
• Networking | 6 | 118 | $29 ❐ | $52 ❐
• Programming Languages & Systems | 6 | 110 | $59 ❐ | $32 ❐
• Reconfigurable Technology & Systems | 4 | 172 | $49 ❐ | $24 ❐
• Sensor Networks | 4 | 155 | $42 ❐ | $23 ❐
• Software Engineering and Methodology | 4 | 115 | $43 ❐ | $25 ❐
• Speech and Language Processing (online only) | 4 | 253 | $33 ❐ | N/A
• Storage | 4 | 157 | $42 ❐ | $23 ❐
• Web | 4 | 159 | $41 ❐ | $23 ❐
Marked • are available in the ACM Digital Library
* Check here to have publications delivered via Expedited Air Service. For residents outside North America only.
PUBLICATION SUBTOTAL:

PAYMENT INFORMATION
Payment must accompany application.
Member dues ($19, $42, or $62) $
To have Communications of the ACM sent to you via Expedited Air Service, add $58 here (for residents outside of North America only). $
Publications $
Total amount due $
Check or money order (make payable to ACM, Inc. in U.S. dollars or equivalent in foreign currency)
❏ Visa/Mastercard ❏ American Express
Card number | Exp. date
Signature
Member dues, subscriptions, and optional contributions are tax deductible under certain circumstances. Please consult with your tax advisor.

EDUCATION
Name of School
Please check one: ❐ High School (Pre-college, Secondary School) College: ❐ Freshman/1st yr. ❐ Sophomore/2nd yr. ❐ Junior/3rd yr. ❐ Senior/4th yr. Graduate Student: ❐ Masters Program ❐ Doctorate Program ❐ Postdoctoral Program ❐ Non-Traditional Student
Major | Expected mo./yr. of grad.
Age Range: ❐ 17 & under ❐ 18-21 ❐ 22-25 ❐ 26-30 ❐ 31-35 ❐ 36-40 ❐ 41-45 ❐ 46-50 ❐ 51-55 ❐ 56-59 ❐ 60+
Do you belong to an ACM Student Chapter? ❐ Yes ❐ No
I attest that the information given is correct and that I will abide by the ACM Code of Ethics. I understand that my membership is non-transferable.
Signature
CAREERS AT THE NATIONAL SECURITY AGENCY

Rise Above the Ordinary
A career at NSA is no ordinary job. It's a profession dedicated to identifying and defending against threats to our nation. It's a dynamic career filled with challenging and highly rewarding work that you can't do anywhere else but NSA.

You, too, can rise above the ordinary. Whether it's producing valuable foreign intelligence or preventing foreign adversaries from accessing sensitive or classified national security information, you can help protect the nation by putting your intelligence to work.

NSA offers a variety of career fields, paid internships, co-op and scholarship opportunities.

Learn more about NSA and how your career can make a difference for us all.

KNOWING MATTERS

Excellent Career Opportunities in the Following Fields:
• Computer/Electrical Engineering
• Computer Science
• Cybersecurity
• Information Assurance
• Mathematics
• Foreign Language
• Intelligence Analysis
• Cryptanalysis
• Signals Analysis
• Business Management
• Finance & Accounting
• Paid Internships, Scholarships and Co-op
>> Plus other opportunities

Search NSA to Download

WHERE INTELLIGENCE GOES TO WORK®

U.S. citizenship is required. NSA is an Equal Opportunity Employer. All applicants for employment are considered without regard to race, color, religion, sex, national origin, age, marital status, disability, sexual orientation, or status as a parent.
