Communications201709-Dl - Moving Beyond The Turing Test With The Allen AI Science Challenge

COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 09/2017 VOL.60 NO.09
Moving Beyond the Turing Test with the Allen AI Science Challenge
Association for
Computing Machinery
The ACM Canadian Celebration of Women in Computing 2017
http://www.can-cwic.ca/

November 3-4, 2017
Montreal, QC at Le Centre Sheraton Hotel

Come celebrate with us at the largest gathering of Women in Computing in Canada!
Registration starting September 1st, 2017

The conference will feature prominent keynote speakers, panels, workshops, presentations and posters, as well as a programming challenge and a large career fair.

For more information contact us at cancwic@gmail.com
Association for
Computing Machinery
ACM A.M. TURING AWARD
NOMINATIONS SOLICITED

Nominations are invited for the 2017 ACM A.M. Turing Award. This is ACM’s oldest and most prestigious award and is given to recognize contributions of a technical nature which are of lasting and major technical importance to the computing field. The award is accompanied by a prize of $1,000,000. Financial support for the award is provided by Google Inc.

Nomination information and the online submission form are available on:
http://amturing.acm.org/call_for_nominations.cfm

Additional information on the Turing Laureates is available on:
http://amturing.acm.org/byyear.cfm

The deadline for nominations/endorsements is January 15, 2018.

For additional information on ACM’s award program please visit: www.acm.org/awards/

Previous A.M. Turing Award Recipients
1966 A.J. Perlis
1967 Maurice Wilkes
1968 R.W. Hamming
1969 Marvin Minsky
1970 J.H. Wilkinson
1971 John McCarthy
1972 E.W. Dijkstra
1973 Charles Bachman
1974 Donald Knuth
1975 Allen Newell
1975 Herbert Simon
1976 Michael Rabin
1976 Dana Scott
1977 John Backus
1978 Robert Floyd
1979 Kenneth Iverson
1980 C.A.R. Hoare
1981 Edgar Codd
1982 Stephen Cook
1983 Ken Thompson
1983 Dennis Ritchie
1984 Niklaus Wirth
1985 Richard Karp
1986 John Hopcroft
1986 Robert Tarjan
1987 John Cocke
1988 Ivan Sutherland
1989 William Kahan
1990 Fernando Corbató
1991 Robin Milner
1992 Butler Lampson
1993 Juris Hartmanis
1993 Richard Stearns
1994 Edward Feigenbaum
1994 Raj Reddy
1995 Manuel Blum
1996 Amir Pnueli
1997 Douglas Engelbart
1998 James Gray
1999 Frederick Brooks
2000 Andrew Yao
2001 Ole-Johan Dahl
2001 Kristen Nygaard
2002 Leonard Adleman
2002 Ronald Rivest
2002 Adi Shamir
2003 Alan Kay
2004 Vinton Cerf
2004 Robert Kahn
2005 Peter Naur
2006 Frances E. Allen
2007 Edmund Clarke
2007 E. Allen Emerson
2007 Joseph Sifakis
2008 Barbara Liskov
2009 Charles P. Thacker
2010 Leslie G. Valiant
2011 Judea Pearl
2012 Shafi Goldwasser
2012 Silvio Micali
2013 Leslie Lamport
2014 Michael Stonebraker
2015 Whitfield Diffie
2015 Martin Hellman
2016 Sir Tim Berners-Lee
COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9

Departments

5 Letter from Members of the ACM U.S. Public Policy Council
Toward Algorithmic Transparency and Accountability
By Simson Garfinkel, Jeanna Matthews, Stuart S. Shapiro, and Jonathan M. Smith

6 Cerf’s Up
Take Two Aspirin and Call Me in the Morning
By Vinton G. Cerf

7 Vardi’s Insights
Divination by Program Committee
By Moshe Y. Vardi

8 Letters to the Editor
Computational Thinking Is Not Necessarily Computational

10 BLOG@CACM
Assuring Software Quality By Preventing Neglect
Robin K. Hill suggests software neglect is a failure of the coder to pay enough attention and take enough trouble to ensure software quality.

39 Calendar

101 Careers

Last Byte

104 Q&A
All The Pretty Pictures
Alexei Efros, recipient of the 2016 ACM Prize in Computing, works to harness the power of visual complexity.
By Leah Hoffmann

News

13 It’s All About Image
Image recognition technology is advancing rapidly. Researchers are discovering new ways to tackle the task without enormous datasets.
By Samuel Greengard

16 Broadband to Mars
Scientists are demonstrating that lasers could be the future of space communication.
By Gregory Mone

18 Why GPS Spoofing Is a Threat to Companies, Countries
Technology that falsifies navigation data presents significant dangers to public and private organizations.
By Logan Kugler

20 Turing Laureates Celebrate Award’s 50th Anniversary
By Lawrence M. Fisher

24 Charles W. Bachman: 1924–2017
An engineer best known for his work in database management systems, and in techniques of layered architecture that include Bachman diagrams.
By Lawrence M. Fisher

Viewpoints

26 Law and Technology
Digitocracy
Considering law and governance in the digital age.
By Joel R. Reidenberg

29 Computing Ethics
Is That Social Bot Behaving Unethically?
A procedure for reflection and discourse on the behavior of bots in the context of law, deception, and societal norms.
By Carolina Alves de Lima Salge and Nicholas Berente

32 The Profession of IT
Multitasking Without Thrashing
Lessons from operating systems teach how to do multitasking without thrashing.
By Peter J. Denning

35 Viewpoint
Why Agile Teams Fail Without UX Research
Failures to involve end users or to collect comprehensive data representing user needs are described and solutions to avoid such failures are proposed.
By Gregorio Convertino and Nancy Frishberg

38 Viewpoint
When Does Law Enforcement’s Demand to Read Your Data Become a Demand to Read Your Mind?
On cryptographic backdoors and prosthetic intelligence.
By Andrew Conway and Peter Eckersley

Practice

42 The Calculus of Service Availability
You’re only as available as the sum of your dependencies.
By Ben Treynor, Mike Dahlin, Vivek Rau, and Betsy Beyer

48 Data Sketching
The approximate approach is often faster and more efficient.
By Graham Cormode

56 10 Ways to Be a Better Interviewer
Plan ahead to make the interview a successful one.
By Kate Matsudaira

Articles’ development led by queue.acm.org

Contributed Articles

60 Moving Beyond the Turing Test with the Allen AI Science Challenge
Answering questions correctly from standardized eighth-grade science tests is itself a test of machine intelligence.
By Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, and Oren Etzioni
Watch the authors discuss their work in this exclusive Communications video.
https://cacm.acm.org/videos/moving-beyond-the-turing-test

65 Trust and Distrust in Online Fact-Checking Services
Even when checked by fact checkers, facts are often still open to preexisting bias and doubt.
By Petter Bae Brandtzaeg and Asbjørn Følstad

Review Articles

72 Security in High-Performance Computing Environments
Exploring the many distinctive elements that make securing HPC systems much different than securing traditional systems.
By Sean Peisert
Watch the author discuss his work in this exclusive Communications video.
https://cacm.acm.org/videos/security-in-high-performance-computing-environments

Research Highlights

82 Technical Perspective
A Gloomy Look at the Integrity of Hardware
By Charles (Chuck) Thacker

83 Exploiting the Analog Properties of Digital Circuits for Malicious Hardware
By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester

92 Technical Perspective
Humans and Computers Working Together on Hard Tasks
By Ed H. Chi

93 Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time
By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar, Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham

About the Cover:
The Turing Test has long served as the imposing benchmark for artificial intelligence technology. Last year, researchers at the Allen Institute for Artificial Intelligence took a different route by devising a challenge that tested whether machines could handle the reasoning and understanding needed to complete an eighth-grade science test. See their results on p. 60. Cover photo by Andrey Popov, with robot illustration by Peter Crowther Associates.

IMAGE COURTESY OF NASA
PHOTO BY TAFFPIXTURE; ROBOT ILLUSTRATION BY PETER CROWTHER ASSOCIATES

Association for Computing Machinery
Advancing Computing as a Science & Profession

COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO
Bobby Schnabel
Deputy Executive Director and COO
Patricia Ryan
Director, Office of Information Systems
Wayne Graves
Director, Office of Financial Services
Darren Ramdin
Director, Office of SIG Services
Donna Cappo
Director, Office of Publications
Scott E. Delman

ACM COUNCIL
President
Vicki L. Hanson
Vice-President
Cherri M. Pancake
Secretary/Treasurer
Elizabeth Churchill
Past President
Alexander L. Wolf
Chair, SGB Board
Jeanna Matthews
Co-Chairs, Publications Board
Jack Davidson and Joseph Konstan
Members-at-Large
Gabriele Anderst-Kotis; Susan Dumais; Elizabeth D. Mynatt; Pamela Samuelson; Eugene H. Spafford
SGB Council Representatives
Paul Beame; Jenna Neefe Matthews; Barbara Boucher Owens

BOARD CHAIRS
Education Board
Mehran Sahami and Jane Chu Prey
Practitioners Board
Terry Coatta and Stephen Ibaraki

REGIONAL COUNCIL CHAIRS
ACM Europe Council
Dame Professor Wendy Hall
ACM India Council
Srinivas Padmanabhuni
ACM China Council
Jiaguang Sun

PUBLICATIONS BOARD
Co-Chairs
Jack Davidson; Joseph Konstan
Board Members
Karin K. Breitman; Terry J. Coatta; Anne Condon; Nikil Dutt; Roch Guerrin; Chris Hankin; Carol Hutchins; Yannis Ioannidis; M. Tamer Ozsu; Eugene H. Spafford; Stephen N. Spencer; Alex Wade; Keith Webster

ACM U.S. Public Policy Office
1701 Pennsylvania Ave NW, Suite 300, Washington, DC 20006 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Mark R. Nelson, Executive Director

STAFF
DIRECTOR OF PUBLICATIONS
Scott E. Delman
cacm-publisher@cacm.acm.org
Executive Editor
Diane Crawford
Managing Editor
Thomas E. Lambert
Senior Editor
Andrew Rosenbloom
Senior Editor/News
Lawrence M. Fisher
Web Editor
David Roman
Rights and Permissions
Deborah Cotton
Editorial Assistant
Jade Morris
Art Director
Andrij Borys
Associate Art Director
Margaret Gray
Assistant Art Director
Mia Angelica Balaquiot
Production Manager
Bernadette Shade
Advertising Sales Account Manager
Ilia Rodriguez
Columnists
David Anderson; Phillip G. Armour; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission
permissions@hq.acm.org
Calendar items
calendar@cacm.acm.org
Change of address
acmhelp@acm.org
Letters to the Editor
letters@cacm.acm.org

WEBSITE
http://cacm.acm.org

AUTHOR GUIDELINES
http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686
F (212) 869-0481
Advertising Sales Account Manager
Ilia Rodriguez
ilia.rodriguez@hq.acm.org
Media Kit acmmediasales@acm.org

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF
Andrew A. Chien
eic@cacm.acm.org
SENIOR EDITOR
Moshe Y. Vardi

NEWS
Co-Chairs
William Pulleyblank and Marc Snir
Board Members
Mei Kobayashi; Michael Mitzenmacher; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs
Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members
William Aspray; Stefan Bechtold; Michael L. Best; Judith Bishop; Stuart I. Feldman; Peter Freeman; Mark Guzdial; Rachelle Hollander; Richard Ladner; Carl Landwehr; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing

PRACTICE
Chair
Stephen Bourne and Theo Schlossnagle
Board Members
Eric Allman; Samy Bahra; Peter Bailis; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Benjamin Fried; Pat Hanrahan; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs
James Larus and Gail Murphy
Board Members
William Aiello; Robert Austin; Elisa Bertino; Gilles Brassard; Kim Bruce; Alan Bundy; Peter Buneman; Carl Gutwin; Yannis Ioannidis; Gal A. Kaminka; Karl Levitt; Igor Markov; Gail C. Murphy; Bernhard Nebel; Lionel M. Ni; Adrian Perrig; Sriram Rajamani; Marie-Christine Rousset; Krishan Sabnani; Ron Shamir; Yoav Shoham; Josep Torrellas; Michael Vitale; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs
Azer Bestavros and Gregory Morrisett
Board Members
Martin Abadi; Amr El Abbadi; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; Andrei Broder; Doug Burger; Stuart K. Card; Jeff Chase; Jon Crowcroft; Alexei Efros; Alon Halevy; Sven Koenig; Steve Marschner; Tim Roughgarden; Guy Steele, Jr.; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

WEB
Chair
James Landay
Board Members
Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2017 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@hq.acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to
Communications of the ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA

Printed in the U.S.A.

letter from members of the acm u.s. public policy council
DOI:10.1145/3125780 Simson Garfinkel, Jeanna Matthews, Stuart S. Shapiro, and Jonathan M. Smith
Toward Algorithmic Transparency and Accountability

Algorithms are replacing or augmenting human decision making in crucial ways. People have become accustomed to algorithms making all manner of recommendations, from products to buy, to songs to listen to, to social network connections. However, algorithms are not just recommending, they are also being used to make big decisions about people’s lives, such as who gets loans, whose résumés are reviewed by humans for possible employment, and the length of prison terms. While algorithmic decision making can offer benefits in terms of speed, efficiency, and even fairness, there is a common misconception that algorithms automatically result in unbiased decisions. In reality, inscrutable algorithms can also unfairly limit opportunities, restrict services, and even improperly curtail liberty.

Information and communication technologies invariably raise these kinds of important public policy issues. How should self-driving cars be required to act? How private is information stored on a cellphone? Can electronic voting machines be trusted? How will the increasing uses of automation in the workplace impact workers? Since its founding, ACM’s members have played a leading role in discussing these issues within the computing profession and with policymakers.

The ACM U.S. Public Policy Council (USACM) was established in the early 1990s as a focal point for ACM’s interactions with U.S. government organizations, the computing community, and the public in all matters of U.S. public policy related to information technology. USACM came to prominence during the debates over cryptography and key escrow technology. Today, USACM continues to make public policy recommendations that are based on scientific evidence, follow recognized best practices in computing, and are grounded in the ACM Code of Ethics. It has established a reputation as a non-partisan, principled, and independent source of scientific and technical expertise, free from the influence of product vendors or other vested interests.

More recently, the ACM Europe Council Policy Committee (EUACM) has been doing the same in Europe. USACM and EUACM, both separately and jointly, provide information and analysis to policymakers and the public regarding important societal issues involving IT, including algorithmic transparency and accountability.

USACM and EUACM have identified and codified a set of principles intended to ensure fairness in this evolving policy and technology ecosystem.a These are: (1) awareness; (2) access and redress; (3) accountability; (4) explanation; (5) data provenance; (6) auditability; and (7) validation and testing.

Awareness speaks to educating the public regarding the degree to which decision making is automated. Access and redress means there is a way to investigate and correct erroneous decisions. Accountability rejects the common deflection of blame to an automated system by ensuring those who deploy an algorithm cannot eschew responsibility for its actions. Explanation means the logic of the algorithm, no matter how complex, must be communicable in human terms.

As many modern techniques are based on statistical analyses of large pools of collected data, decisions will be influenced by the choice of datasets for training, and thus knowing the data sources and their trustworthiness (that is, their provenance) is essential. Auditability for a decision-making system requires logging and record keeping, for example, for dispute resolution or regulatory compliance. Finally, validation and testing on an ongoing basis means that techniques such as regression tests, vetting of corner cases, or red-teaming strategies used in computer security should be employed to increase confidence in automated systems.

As organizations deploy complex algorithms for automated decision making, system designers should build these principles into their systems. In some cases, doing so will require additional research. For example, how to design and deploy large-scale neural networks while ensuring compliance with laws prohibiting discrimination against legally protected groups? This is especially crucial given the ability to infer characteristics such as gender, race, or disability status even if the computer system is not provided with that data directly. How should information on automated decisions be logged to ensure auditability? How can the operation of these networks be explained to technologists and non-technical policymakers alike?

One model for moving forward may be self-regulation by industry. Our experience, however, is that self-regulation is only possible when there is a consensus on a set of relevant standards. We hope our principles can serve as input to such an effort. If policymakers determine regulation is necessary, our principles are available, potentially in the way that the Code of Fair Information Practices provided a basis for decades of privacy regulation around the world.

USACM and EUACM seek input and involvement from ACM’s members in providing technical expertise to decision makers on the often difficult policy questions relating to algorithmic transparency and accountability, as well as those relating to security, privacy, accessibility, intellectual property, big data, voting, and other technical areas. For more information, visit www.acm.org/public-policy/usacm or www.acm.org/euacm.

a https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf

The authors are members of the ACM U.S. Public Policy Council, for which Stuart S. Shapiro (s_shapiro@acm.org) serves as chair.

Copyright held by authors.
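
The letter above asks how information on automated decisions should be logged to ensure auditability. One possible answer, offered only as a minimal sketch and not as anything USACM or EUACM prescribe, is an append-only log in which each record carries a hash of its predecessor, so that later tampering becomes detectable. The function names, the record fields, and the loan example are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """Hash a record body deterministically (keys sorted before encoding)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_decision(log, inputs, decision, model_version):
    """Append one automated decision to the log, chained to the previous record."""
    body = {
        "inputs": inputs,
        "decision": decision,
        "model_version": model_version,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record = dict(body, hash=_digest(body))
    log.append(record)
    return record


def verify(log):
    """Return True only if no record was altered, removed, or reordered."""
    prev_hash = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical usage: two loan decisions, then an after-the-fact edit.
audit_log = []
append_decision(audit_log, {"income": 52000}, "approve", "v1")
append_decision(audit_log, {"income": 18000}, "deny", "v1")
print(verify(audit_log))
audit_log[1]["decision"] = "approve"
print(verify(audit_log))
```

A real deployment would also need timestamps, access control, and storage outside the control of the party being audited; the sketch covers only the tamper-evidence part of record keeping.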

cerf’s up
DOI:10.1145/3130331 Vinton G. Cerf
Take Two Aspirin and Call Me in the Morning

I use a lot of metaphors in this column and this one is about security. Security is much on my mind these days along with safety and privacy in an increasingly online, programmed world. There is surely little doubt that we are at risk as cyber-attacks increase in scope, scale, and complexity. Our lives are made complex by some of the responses: “Oh, you want to log into this service? What’s your username and password? OK. Now go to your mobile to get a second password that I have sent you. You don’t have cell service where you are? Too bad.” I am not dissing two-factor authentication as I am a huge proponent, but I have experienced situations like this, or a dead battery and the frustrations are material. At that point, the system might turn to “answers to secret questions,” but that opens up the possibility that your choices of questions and answers are discoverable with a search of the World Wide Web. Ugh.

So where does this leave us? I am fascinated by the metaphor of cyber security as a public health problem. Our machines are infected and they are sometimes also contagious. Our reactions in the public health world involve inoculation and quarantine and we tolerate this because we recognize our health is at risk if other members of society fail to protect themselves from infection. Sadly, virus detection seems to be closing the barn door after the horses have left, to mangle a metaphor. Zero Day attacks cannot be detected with previously cataloged viral signatures, for example. They may help, but perhaps not enough.

One wonders whether we should take the metaphor more seriously and quarantine computers showing signs of infection until they have been purged of their viral load? Of course, that raises the question “How do you know that computer or IoT device is infected?” and “How do you cleanse it?” Answering these questions might take you into potential privacy-violating territory: suppose your computer keeps track of every domain name and IP address it has interacted with. Could you use this list as a detector of potential hazard? Could you go to a service and say “Here’s where I have been. Am I at risk?” Alternatively, you might download a blacklist of bad sites and addresses and compare to your list of places. We’ve seen some of the negative side effects of spam blacklists so I am not sure this would work, to say nothing of the question: “Quis custodiet ipsos custodes?”a

I do wonder whether machine learning might be useful. Could my computer generate a profile of “normal” Internet interactions and warn me about unusual ones? Will the false alarm rate drive me crazy? How would I know if something is a false alarm? Is there anything like a center for disease control in this space? Google acquired a company called Virustotalb a few years ago that maintains a library of viral profiles that allows users to check whether particular URLs or files carry malware. Another site, Stopbadware.org, helps infected websites rid themselves of viral load. There are, of course, a number of companies that offer anti-virus detection software that tries to detect malware as it is encountered or ingested into a computer. So far, these efforts have had only limited success and lead me to wonder whether there are more effective ways of discovering infection by way of behavioral observation.

It is tempting to imagine a home router/firewall that does sophisticated, machine-learned observation to protect programmable devices at home, but since our laptops, mobiles, and other programmed devices roam with us, they really need an on-board detection system (or logging system?) to protect while on the road.

Perhaps we all need to get into a cyber-hygiene habit and run our devices through regular infection checks? And we surely need much better tools with which to detect and combat this endless escalation. We could also do with better user training and services to avoid unsafe places on the Internet and poor security practices that lead to compromise. While I am not advocating for an Internet driver’s license, the preparation for such a metaphorical exam might do us all some good.

a Roughly, “Who will watch the watchmen?”
b https://www.virustotal.com

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by owner/author.
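
Two of the ideas floated in the column, comparing a list of visited places against a downloaded list of bad sites and warning about interactions that fall outside a "normal" profile, reduce in their simplest form to set operations. The sketch below is only that simplest form, with hypothetical function names and made-up hosts on reserved example domains. It is not a description of how any real service works, and it sidesteps the false-alarm question the column raises.

```python
def normalize(host):
    """Lower-case a host name and drop surrounding whitespace and a trailing dot."""
    return host.strip().lower().rstrip(".")


def flagged_hosts(visited, blocklist):
    """Return the visited hosts that also appear on the downloaded list of bad sites."""
    bad = {normalize(h) for h in blocklist}
    return sorted({normalize(h) for h in visited} & bad)


def unfamiliar_hosts(visited, baseline):
    """Return the visited hosts that are absent from a previously recorded profile."""
    known = {normalize(h) for h in baseline}
    return sorted({normalize(h) for h in visited} - known)


# Hypothetical data: a browsing history, a list of bad sites, and a "normal" profile.
history = ["News.Example.com", "mail.example.org", "dropper.example.net."]
bad_sites = ["dropper.example.net", "phish.example.net"]
normal_profile = ["news.example.com", "mail.example.org"]

print(flagged_hosts(history, bad_sites))
print(unfamiliar_hosts(history, normal_profile))
```

Doing the comparison locally, as here, keeps the history on the device. Sending it to a remote service for checking is the step that leads into the privacy-violating territory the column warns about.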

vardi’s insights
DOI:10.1145/3122847 Moshe Y. Vardi
Divination by Program Committee
D
practice
I V I N AT I O N I S T H E people of Borneo use birdwatching to modus operandi of program com-
of an occultic ritual as an decide which sites to farm and which mittees. The standard approach in
aid in decision making. It sites to leave fallow, they are simply such committees can be viewed as
has old historical roots. Ac- randomizing in the face of uncer- “guilty until proven innocent.” We
cording to the biblical book tainty about rain, pests, and more, expect only 25%–35% of the papers
of Samuel I, in the 11th century BCE, but this randomization comes with a to be accepted, so the default deci-
the Hebrew King Saul sought wisdom belief in the divine source of the decision is to reject unless there is strong
from the Witch of Endor, who sum- sion. (See essay by Michael Sulson at agreement to accept. But the reality is
moned the dead prophet Samuel, https://goo.gl/RYb264.) that a different committee may have
before his impending battle with the But what does this have to do with reached a different decision on the
Philistines. Alexander the Great, after program committees? In 2014, the majority of accepted papers. Is it wise
conquering Egypt in 332 BCE, visited Neural Information Processing Sys- to reject papers based essentially on
the Oracle of Amun at the Siwa Oasis tems Foundation (NIPS) Conference the whim of the program committee?
to learn about his future prospects. split the program committee into two If we switch mode to “innocent until
Divination can be practiced in many independent committees, and then proven guilty,” we would reject only
ways, including sortilege (casting of subjected 10% of the submissions— papers on which there is strong agree-
lots), reading tea leaves or animal en- 166 papers—to decision making by ment to reject, and accept all other
trails, random querying of texts, and both committees. The two commit- papers.
more. Divination has been dismissed tees disagreed on 43 papers. Given the Beyond the increased fairness of
as superstition since antiquity; the NIPS paper acceptance rate of 25%, “innocent until proven guilty,” this
Greek scholar Lucian derided divina- this means that close to 60% of the approach would also increase the effi-
tion already in the 2nd century CE. Yet papers accepted by the first commit- ciency of the conference-publication
the practice persists. tee were rejected by the second one system. A high rejection rate means
Developments in mathematics and vice versa. (See analysis by Eric that papers are submitted, resubmit-
and in computer science in the 20th Price at https://goo.gl/fy5jLR.) This ted, and re-resubmitted, resulting in
century shed new light on the power high level of randomness came as a a very high reviewing burden on the
of divination. Unless we believe that surprised to many people, but I have community. It also results in the pro-
divination truly allows us to consult found it quite expected. My own ex- liferation of conferences, which frag-
the divine, we can view it simply as a perience is that in a typical program- ments research communities. As I
form of randomization, which is rec- committee meeting there is broad argued in an earlier editorial (https://
ognized as a powerful construct in agreement for acceptance about the goo.gl/dUMkwZ), I believe the proper
game theory and algorithm design. top 10% of the papers, as well as broad way to adapt to the growth of the com-
The classical game-theoretic example agreement rejections about the bot- puting research is to grow our confer-
is the game of Rock-Scissors-Paper in tom 25% of the papers. For the other ences rather than proliferate confer-
which there is no Nash equilibrium 65% of the submissions, there is no ences.
of pure strategies, but there is a Nash agreement and the final accept/reject NIPS should be lauded for applying
equilibrium in which both players decision is fairly random. This is par- the “publication method” to scientif-
choose their actions uniformly at ran- ticularly true when the accept/reject ic inquiry. It is up to the computing-
dom. The classical Dining Philoso- decision pivots on issues such as sig- research community to draw the con-
phers Problem has no symmetric dis- nificance and interestingness, which clusions and act accordingly!
tributed deterministic solution, but, can be quite subjective. Yet, we seem Follow me on Facebook, Google+,
as shown by Michael Rabin, has such to pretend that this random decision and Twitter.
a solution if we allow randomization. reflects the deep wisdom of the pro-
The essential insight is that random- gram committee. Moshe Y. Vardi (vardi@cs.rice.edu) is the Karen Ostrum
George Distinguished Service Professor in Computational
ization is a powerful way to deal with I believe the NIPS experiment Engineering and Director of the Ken Kennedy Institute for
incomplete information. Thus, as re- should not only teach us some hu- Information Technology at Rice University, Houston, TX.
He is the former Editor-in-Chief of Communications.
alized by the anthropologist Michael mility, but should also suggest that
Dove in the 1970s, when the Kantu we may want to reconsider the basic Copyright held by author.
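The Rock-Scissors-Paper claim in the editorial above can be checked mechanically. The following is a minimal sketch, assuming the usual win/lose/draw payoffs of +1/-1/0; the function names (`payoff`, `expected_payoff`, `pure_equilibria`, `best_response_value`) are illustrative and not taken from the editorial.

```python
from fractions import Fraction

MOVES = ("rock", "scissors", "paper")
# BEATS[m] is the move that m defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(mine, theirs):
    """Payoff to the first player: +1 win, -1 loss, 0 draw (assumed scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def pure_equilibria():
    """Pure strategy profiles from which neither player gains by deviating."""
    found = []
    for a in MOVES:
        for b in MOVES:
            first_ok = all(payoff(a, b) >= payoff(a2, b) for a2 in MOVES)
            # The game is zero-sum, so the second player's payoff is the negation.
            second_ok = all(-payoff(a, b) >= -payoff(a, b2) for b2 in MOVES)
            if first_ok and second_ok:
                found.append((a, b))
    return found


def expected_payoff(my_mix, their_mix):
    """Expected payoff to the first player when both play mixed strategies."""
    return sum(my_mix[a] * their_mix[b] * payoff(a, b)
               for a in MOVES for b in MOVES)


def pure(move):
    """The mixed strategy that always plays one move."""
    return {m: Fraction(int(m == move)) for m in MOVES}


UNIFORM = {m: Fraction(1, 3) for m in MOVES}


def best_response_value(their_mix):
    """Best expected payoff any pure reply can achieve against their_mix."""
    return max(expected_payoff(pure(a), their_mix) for a in MOVES)


if __name__ == "__main__":
    print(pure_equilibria())             # no pure-strategy equilibrium
    print(best_response_value(UNIFORM))  # no reply beats the uniform mix
```

Because no reply earns more than 0 against the uniform mix, and the uniform mix itself earns 0, neither player can gain by deviating, which is the equilibrium condition the editorial describes.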

letters to the editor
DOI:10.1145/3128899
Computational Thinking Is
Not Necessarily Computational
I applaud Peter J. Denning’s Viewpoint “Remaining Trouble Spots with Computational Thinking” (June 2017), especially for pointing out the subject itself is often characterized by “vague definitions and unsubstantiated claims”; “computational thinking primarily benefits people who design computations and . . . claims of benefit to nondesigners are not substantiated”; and “I am now wary of believing that what looks good to me as a computer scientist is good for everyone.” Moreover, the accompanying table outlined various historic definitions of “computational thinking,” including a comparison of what Denning called the “new” and the “traditional” view of the subject. However, my own interest in computational thinking differs somewhat from Denning’s. First, I question the legitimacy of the term “computational” itself. Why say it, when the very subject is “computers” and the chief academic approach to their study is “computer science”? If one looks at how computers are actually used, it may come as a surprise to learn that few such uses actually involve computing. For example, applications that deal with scientific and engineering problems are of course heavily computing-focused, but, last I heard, they constitute only approximately 20% of all applications being developed worldwide. The most predominant applications—those for business—involve little computation beyond arithmetic. And systems programs like operating systems and compilers, the focus of much computer science study, historically at least, involve little or no computation and primarily concern manipulating information rather than numbers.
The problem is that computational-thinking enthusiasts, as Denning wrote, are driven to spread the subject across all academic majors. I certainly believe in the importance of programming and using computers for the variety of applications for which they provide benefit and that educational systems worldwide should provide the knowledge and skills that would help students move into the field, should that be their preference. But should computational thinking also be taught to artists, writers, poets, physicians, and lawyers? Not as I see it . . .
The faulty thinking behind the “computer science for all” approach to pedagogy is best seen in Denning’s table, labeled “Traditional versus New Computational Thinking.” Its entry on “domain knowledge” suggested traditionalists see domain knowledge as vitally important to the person doing the computational thinking, while “new” thinking says the importance of computational thinking is domain-independent. As a practicing programmer who has dabbled in many different application domains over a long professional career, I see it as beyond understanding how anyone could fail to see the importance of deeply knowing a domain to being able to solve problems in that domain.
Robert L. Glass, Toowong, Australia

Author Responds:
Computational thinking is the habits of mind developed from designing computations. The meaning of computation has evolved from the 1960s “sequence of states of a computer executing a program” to today’s “evolution of an information process.” This changed meaning reflects the ever-expanding reach of computing into all sectors of work and life. Many of today’s most popular apps feature computations well beyond arithmetic, as in, say, facial recognition, speech transcription, driverless cars, and industrial robots. The computational thinking developed by those who worked on these achievements is much more powerful than the handful of programming concepts offered as the definition of “new CT.”
Peter J. Denning, Monterey, CA

Time to Retire ‘Computational Thinking’?
Peter J. Denning asked, “What is computational thinking?” in his Viewpoint (June 2017), then quoted the following definition by Al Aho: “Abstractions called computational models are at the heart of computation and computational thinking. Computation is a process that is defined in terms of an underlying model of computation, and computational thinking is the thought processes involved in formulating problems so their solutions can be represented as computational steps and algorithms.” But as Aho’s definition is highly circular, it reveals very little. All disciplines rely on models. The only specifically computational word here is “algorithms.” If we replaced it with similar words, like “procedures” or “sequences,” we would arrive at such vacuous “definitions” as, say, “Medicine is a process that is defined in terms of an underlying model of medicine, and medical thinking is the thought processes involved in formulating problems so their solutions can be represented as medical steps and procedures.” And “Drama is a process that is defined in terms of an underlying model of drama, and dramatic thinking is the thought processes involved in formulating problems so their solutions can be represented as dramatic steps and sequences.” One could analogously “define” musical thinking, artistic thinking, chemical thinking, and so forth.
Unless somebody can come up with a more insightful definition, it is indeed time to retire “computational thinking.”
Lawrence C. Paulson, Cambridge, England

Toward a True Measure of Patent Intensity
In their article “How Important Is IT?” (July 2017), Pantelis Koutroumpis et al. described a methodology for assessing the importance of information and communications technologies (ICTs) compared to non-ICT technologies, using PatStat, a dataset from the European Patent Office of 90 million patents awarded from 1900 to 2014. Controlling for variables (such as patent office, year
of grant, and patent family), they concluded ICT patents are more influential than non-ICT patents because they receive significantly more citations and a considerably higher PageRank.
When one publication (not just those involving patents) is cited more often than some other publication, the more-cited one is thus more influential. However, patent publications are unique because they not only describe novel systems and methods but also hold commercial value and represent licensable assets for their holders. A patent may be cited hundreds of times yet still have relatively low financial value; on the other hand, a patent may be cited only rarely yet reflect enormous valuation.
Consider that in 2013, Kodak, the company that invented the digital camera, sold its portfolio of 1,100 digital photography-related patents to multiple licensees for $525 million (or $477.3K per patent). Earlier, Google bought Motorola Mobility and its 17,000 patents for $12.5 billion (or $735.3K per patent), and Microsoft acquired 800 patents from AOL for $1.06 billion (or $1.33M per patent). Snap paid the exceptional price of $7.7 million for Mobli’s Geofilters patent, believed by TechCrunch to be the highest amount ever paid for a patent from an Israeli tech company. However, the valuations of most patents are unknown until they are indeed auctioned or sold off. For instance, ICT-related patents (such as those involving Google’s and Microsoft’s methods for faster Internet browsing)1 may have impressive valuations, but those valuations are difficult to predict before actually being auctioned or sold off.
Considering non-ICT patents, the revenue streams of several pharmaceutical companies depend on patents and their corresponding expiration dates, and one patent could be worth billions over the course of its licensing period. Notable patented medications include Pfizer’s Lipitor (for lowering fatty acids known as lipids), Bristol-Myers Squibb’s Plavix (for preventing heart attacks and strokes), and Teva’s Copaxone (for treating multiple sclerosis). Other non-ICT patents that have significantly and directly improved people’s lives are cited only rarely, including those related to agriculture, transportation, and creation of new materials.
In the most recent U.S. Patent and Trademark Office’s economy update,2 the non-ICT “basic chemicals” category ranked first, with $64.5 billion in merchandise exports of selected intellectual-property-intensive industries, while “semiconductors and electronic components” was second at $54.8 billion. Most industries involve non-ICT technology. As for “patent intensity,” or the ratio of patents to employees measured as patents/thousand jobs, “computer and peripheral equipment” and “communications equipment” topped the list, though this was due directly to the relatively high number of patents issued in the industry versus the industry’s relatively low number of employees. Conclusions regarding level of influence of ICT technologies versus other types of technologies should thus be reported with care when a comparison is based solely on number of inventions and citations.
If such influence is indeed the basis for a comparison, then additional covariates should be controlled for, including the mean estimated valuation per patent, number of employees in the industry, and additional financial and industry-specific characteristics.

References
1. Kartoun, U. A user, an interface, or none. Interactions 24, 1 (Jan.-Feb. 2017), 20–21.
2. U.S. Patent and Trademark Office. Intellectual Property and the US Economy: 2016 Update. U.S. Patent and Trademark Office, Washington, D.C., 2016; https://www.uspto.gov/sites/default/files/documents/IPandtheUSEconomySept2016.pdf

Uri Kartoun, Cambridge, MA

Authors Respond:
Although there may be some correlation between patent price and technological influence, the relationship is neither clear nor systematic. Patent prices are more likely driven by how incremental, radical, or breakthrough the patent is, whether its value is standalone or as part of a bundle, projected commercialization timescale, cost versus risk, bidder’s experience, patent age, rate of technological change, and substitution and reverse-engineering risk, to say nothing of broader economic factors. Perhaps our technological-influence measure could thus be used to help understand patent pricing.
Pantelis Koutroumpis, London, U.K., Aija Leiponen, Ithaca, NY, and Llewellyn D W Thomas, London, U.K.

Communications welcomes your opinion. To submit a Letter to the Editor, please limit yourself to 500 words or less, and send to letters@cacm.acm.org.

©2017 ACM 0001-0782/17/09

Call for Nominations for ACM General Election
The ACM Nominating Committee is preparing to nominate candidates for the officers of ACM: President, Vice-President, Secretary/Treasurer; and two Members at Large.
Suggestions for candidates are solicited. Names should be sent by November 5, 2017 to the Nominating Committee Chair, c/o Pat Ryan, Chief Operating Officer, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA.
With each recommendation, please include background information and names of individuals the Nominating Committee can contact for additional information if necessary.
Alexander L. Wolf is the Chair of the Nominating Committee, and the members are Karin Breitman, Judith Gal-Ezer, Rashmi Mohan, and Satoshi Matsuoka.

The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
DOI:10.1145/3121430 http://cacm.acm.org/blogs/blog-cacm
Assuring Software
Quality By
Preventing Neglect
Robin K. Hill suggests software neglect is a failure of the coder to pay
enough attention and take enough trouble to ensure software quality.
Robin K. Hill
The Ethical Problem of Software Neglect
http://bit.ly/2roEDf1
May 31, 2017

Ethical concern about technology enjoys booming popularity, evident in worry over artificial intelligence, threats to privacy, the digital divide, reliability of research results, and vulnerability of software. Concern over software shows in cybersecurity efforts and professional codes.1 The black hats are hackers who deploy software as a weapon with malicious intent, and the white hats are the organizations that set safeguards against defective products. But we have a gray-hat problem—neglect.
My impression is that the criteria under which I used to assess student programs—rigorous thought, design, and testing, clean nested conditions, meaningful variable names, complete case coverage, careful modularization—have been abandoned or weakened. I have been surprised to find, at prestigious institutions working on open-source projects, that developers produce no documentation at all, as a matter of course, and that furthermore, during maintenance cycles, they do not correct the old source code comments, seeing such edits as risky and presumptuous. All of these people are fine coders, and fine people. Their practices seem oddly reasonable in the circumstances, under the pressure of haste, even while those practices degrade the understandability of the program. Couple that with the complexity of modern programs, and we conclude that, in some cases, programmers simply don’t know what their code does.
Examples of software quality shortcomings readily come to mind—out-of-bounds values unchecked, complex conditions that identify the wrong cases, initializations to the wrong constant. Picture a clever and conscientious coder finishing up a calendar module before an important meeting. She knows that the test for leap years from the numeric yyyy value, if (yyyy mod 4 = 0) and (yyyy mod 100 != 0), must be refined by some other rules to correct for what happens at longer periods, but this code is a prototype ... She retains the simple test, meaning to look up the specifics ... but her boss commits her code. No harm is foreseeable ... except that it turns out to interface with another module where the leap-year calculation incorporates the complete set of conditions, which is discovered to drive execution down the wrong path in some calculations. The program is designated for fixing but it continues to run, those in the know compensating for it somehow...
What sort of violation is neglect? It doesn’t attack security because it occurs behind the firewall. It doesn’t attack ideals of quality because no-one officially disputes those ideals. It is a failure of degree, a failure to pay enough attention and take enough trouble. Can philosophy help clarify what’s wrong? An emerging theory called the ethics of care displaces the classical agent-centered morality of duty and justice, endorsing instead patient-centered morality as manifest real-time in relationships.2,4
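To make the calendar example on this page concrete, here is a minimal sketch of the two leap-year tests side by side. The first function transcribes the simple test quoted in the post; the second assumes that the "complete set of conditions" is the standard Gregorian rule, in which century years are leap years only when divisible by 400. The function names are illustrative and do not come from the post.

```python
def is_leap_simple(yyyy):
    """The prototype test quoted in the post: divisible by 4 but not by 100."""
    return yyyy % 4 == 0 and yyyy % 100 != 0


def is_leap_gregorian(yyyy):
    """The full Gregorian rule: century years count only if divisible by 400."""
    return yyyy % 4 == 0 and (yyyy % 100 != 0 or yyyy % 400 == 0)


def disagreements(start, end):
    """Years in [start, end] on which the two tests give different answers."""
    return [y for y in range(start, end + 1)
            if is_leap_simple(y) != is_leap_gregorian(y)]


if __name__ == "__main__":
    # Two modules using different tests diverge only on these years.
    print(disagreements(1900, 2400))  # prints [2000, 2400]
```

The two tests agree on almost every year, which is why the mismatch between modules could plausibly go unnoticed until a year such as 2000 appears in the data.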
The theory offers a contextual perspective rather than the cut-and-dried directives of more traditional views. While care can be construed as a virtue (relating to my prior post in this space3) or as a goal like justice, the promoters of care ethics resist a universal mandate. They may also reject this attempt to apply it to software, of all things; the heart of the matter for care ethics is the work of delivering care to a person in need.
Yet software neglect seems exactly the type of transgression addressed by the ethics of care, if we allow its reinterpretation outside of human relationships. Appeal to the theory allows us to identify the opposite of care, that is, neglect, as the quality to condemn. This yields our account of software quality as an ethical issue, especially piquant in its application of tools from the feminist foundry to the code warrior culture. But little credit is due! We are not solving the problem, only embedding it in the terms of a philosophical platform. This account raises issues in the ethics of engineering, such as individual versus corporate responsibility (and whether corporate responsibility can be rendered coherent and enforceable short of the law). For a concise summary, see Section 3.3.2, on Responsibility, in Stanford Encyclopedia of Philosophy entry on the Philosophy of Technology.5
The quality that has corrected for neglect in the past is professionalism, by which I mean that the expert does what’s best for the client even at a cost to personal time, energy, money, or prestige—within reason! Certainly these judgments are subjective, and viable when the professional is autonomous, when that single person exercises control over the product and its quality. Counterforces in the current tech business world are (1) employment, under which most programmers are not consultants, but rather given orders by a company; and (2) collaboration, under which most software is the product of committees, in effect. Professionalism also depends on strong personal identification with disciplinary peers and pride in the group’s traditions.
In the face of knotty difficulties enforcing or fostering ideals of quality, one possible resolution, odd as it may seem, is simply to acknowledge the situation, to admit to the public that software is not always reliable, or mature, or even understood. Given its familiarity with bug fixes, the public may not be unduly shocked. If we prefer to reject that fatalistic move, the pressing question is, are there some public standards that developers can and will actually follow? The collective response will determine whether software engineering is a profession. I urge all coders who wish to take pride in their jobs to read the draft professional standards,1 which mention code quality in Section 2.1.
We see that ethical issues appear not only in the external social context, but in the heart of software, the coding practice itself, a gray-hat problem, if you will. We hope that the ethics of care can somehow help to alleviate those issues.

References
1. Association for Computing Machinery. Code 2018 Project. https://ethics.acm.org/.
2. Burton, B.K., and Dunn, C.P. Ethics of Care. Encyclopædia Britannica, https://www.britannica.com/topic/ethics-of-care.
3. Hill, R.K. Ethical Theories Spotted in Silicon Valley. Blog@CACM, March 16, 2017, https://cacm.acm.org/blogs/blog-cacm/214615-ethical-theories-spotted-in-silicon-valley/fulltext.
4. Sander-Staudt, M. Care Ethics. The Internet Encyclopedia of Philosophy, 2017. http://www.iep.utm.edu/care-eth/.
5. Franssen, M., Lokhorst, G., and van de Poel, I. Philosophy of Technology. The Stanford Encyclopedia of Philosophy (Fall 2015 Edition), Edward N. Zalta (ed.). https://plato.stanford.edu/archives/fall2015/entries/technology/.

Note: While the Web encyclopedias, as cited, provide good surveys of current philosophical views, pursuit of any ideas in depth will require reading original research.

Comments
This is possibly the most important paragraph of the article, outlining the exact problem in the industry:
“The quality that has corrected for neglect in the past is professionalism, by which I mean that the expert does what’s best for the client even at a cost to personal time, energy, money, or prestige—within reason! Certainly these judgments are subjective, and viable when the professional is autonomous, when that single person exercises control over the product and its quality. Counterforces in the current tech business world are (1) employment, under which most programmers are not consultants, but rather given orders by a company; and (2) collaboration, under which most software is the product of committees, in effect. Professionalism also depends on strong personal identification with disciplinary peers and pride in the group’s traditions.”
It sounds like, short of working for enlightened organizations, software developers should be leaning towards more autonomy and self-ownership.
I recently read Developer Hegemony (a very bold title!), http://amzn.to/2pA18wB, and it addresses that side of the issue by encouraging more professionalism and autonomy.
There’s already a strong movement in favor of Software Craftsmanship, and the free software and open source movements both seem to care more about quality than most companies (though they do neglect documentation sometimes). For example, we already prefer software written by recognizably smart/professional developers.
Here’s hoping to more autonomy in the future and the allowance of our professionalism to counteract the neglect of software.
—Rudolf Olah

Robin K. Hill is an adjunct professor in the Department of Philosophy at the University of Wyoming.

© 2017 ACM 0001-0782/17/09 $15.00
Introducing ACM Transactions
on Human-Robot Interaction
Now accepting submissions to ACM THRI
In January 2018, the Journal of Human-Robot Interaction (JHRI) will become an ACM
publication and be rebranded as the ACM Transactions on Human-Robot Interaction (THRI).
Founded in 2012, the Journal of HRI has been serving as the

premier peer-reviewed interdisciplinary journal in the field.
Since that time, the human-robot interaction field has

experienced substantial growth. Research findings at the
intersection of robotics, human-computer interaction,
artificial intelligence, haptics, and natural language
processing have been responsible for important discoveries
and breakthrough technologies across many industries.
THRI now joins the ACM portfolio of highly respected

journals. It will continue to be open access, fostering the
widest possible readership of HRI research and information.
All issues will be available on the ACM Digital Library.
Editors-in-Chief Odest Chadwicke Jenkins of the University of Michigan and Selma

Šabanović of Indiana University plan to expand the scope of the publication, adding a new
section on mechanical HRI to the existing sections on computational, social/behavioral,
and design-related scholarship in HRI.
The inaugural issue of the rebranded ACM Transactions on Human-Robot Interaction is

planned for March 2018.
To submit, go to https://mc.manuscriptcentral.com/thri
news
Science | DOI:10.1145/3121434 Samuel Greengard
It’s All About Image

Image recognition technology is advancing rapidly. Researchers are
discovering new ways to tackle the task without enormous datasets.
Image from Shutterstock.com

Discovering the secrets of the universe is not a task for the timid and the impatient; there’s a need to peer into the deepest reaches of outer space and try to make sense of distant galaxies, stars, gas clouds, quasars, halos, and black holes. “Understanding how these objects behave and how they interact gives us answers to how the universe was formed and how it works,” says Kevin Schawinski, an astrophysicist and assistant professor in the Institute for Astronomy at ETH Zurich, the Swiss Federal Institute of Technology.
The problem is that traditional tools such as telescopes can see only so far, even with radical advances in optics and the placement of observatories in space, where they are free of the light and dust of Earth. For instance, the Hubble Telescope changed the way astrophysicists and astronomers viewed deep space by delivering far clearer images than previously possible. Of course, in this context, distance and time are inextricably linked. “But the images still do not allow us to see as far back in time as we would like,” Schawinski says. “The farther we can see, the more we can understand about the origins of the universe and how it has evolved.”
Enter computer image recognition, artificial neural networks, and data science; together, they are changing the equation. As huge volumes of data stream in, they are able to find answers to previously unfathomable questions. In recent years, scientists have begun to train neural nets to analyze data from images captured by cameras in telescopes located on Earth and in space. In many cases, the resulting machine-based algorithms can sharpen blurs and identify distant objects better than humans can.
“Data science and big data are revolutionizing many areas of astrophysics,” says François Lanusse, a postdoctoral researcher in the McWilliams Center for Cosmology at Carnegie Mellon University.
Indeed, the combination of more data, advances in data science, and new methods that allow researchers to easily and cheaply train neural networks is allowing scientists to boldly see where they have never seen before. No less important, these advances are not limited to astrophysics and astronomy; they have touched an array of other fields and have advanced autonomous vehicles, robots, drones, smartphones and more. They’re also being used to better understand everything
from how linguistic patterns contribute to racism to identifying the potential severity of hurricanes as they form.
Says Jeff Clune, an assistant professor of computer science at the University of Wyoming, “Until very recently, computers did not see and understand the world very well. The ability to train neural nets quickly and easily is transforming image recognition and enabling remarkable breakthroughs.”

Picture Perfect
Artificial neural nets are nothing new. The concept originated in the 1940s and researchers have experimented with them for the last quarter-century. Yet it was only over the last few years that the technology has matured to the point where computer image recognition and other artificial intelligence (AI) capabilities have become viable. Using anywhere from one to sometimes hundreds of graphics processing units (GPUs), these training networks—which function in a similar way to neural pathways in the human brain—recognize patterns in data that other computing systems cannot. Layered nodes learn from each other—and from other networks—much like the way children learn. Remarkably, because of their overall complexity, nobody knows exactly how each trained artificial neural net produces its useful results.
Rapid advancements in neural nets and deep learning are a result of several factors, including faster and better GPUs, larger nets with deeper layers, huge labeled datasets to train on, new and different types of neural nets, and improved algorithms. Typically, for computer image recognition, researchers feed lots of pictures of things—motorcycles, chimpanzees, trees, or space objects, for example—into the system so the neural net can learn what an object looks like and how to differentiate it from others. If a researcher is training the neural net to recognize animals, the system tends to learn faster and better if old data is transferred to the new task. For instance, if the original task was to identify lions and zebras, adding this data to the job of identifying elk and bears will help. The system succeeds because there is now a shared knowledge between the two paths. “Already being good at one task makes a neural network faster and better at learning the second task,” Clune explains. “The system already has a basic understanding of things that are common to both tasks, such as eyes, ears, legs, and fur.” As training proceeds and a neural net becomes smarter, it can identify photos and other images it has never seen before. For example, Clune has achieved an accuracy rate as high as 96.6% for the neural net, compared with the 40,000+ humans who volunteered to label the same images. Others have found that the neural nets actually outperform humans. Remarkably, “In most cases, we can train a neural net within a couple of days,” he says.
Of course, this doesn’t mean that all systems are equally effective, or that the results are consistently useful. There’s also the goal of pushing the boundaries of computer image recognition further. At present, researchers train systems using labels. This means designating images for one type of animal ‘a lion’ and another ‘a zebra,’ or one galaxy ‘a spiral’ and another ‘an elliptical.’ The problem with this approach is that it’s time consuming and sometimes expensive. What is more, “sometimes you don’t have labels, or they are noisy labels,” says Ce Zhang, an assistant professor in the Systems Group at ETH Zurich. For instance, a “cougar” label might confuse the system if it is presented with both the car and the animal.
Consequently, researchers are interested in an emerging area of deep learning that relies on different training methods, as well as unsupervised learning. University researchers as well as companies such as Alphabet, which operates Google Brain and DeepMind, have begun to study this space. They are turning to convolutional systems modeled after the visual processing that takes place in humans, and generative systems that rely on a more conventional statistical-based approach to learn the features of a dataset.
Researchers are turning to convolutional systems modeled from human visual processing, and generative systems that rely on a statistical approach.
The end goal? “We want to just hand the computer the data and the algorithm and have it deliver results,” Schawinski says. “This type of capability would revolutionize astrophysics, but also science in general.”

A Sharper Focus
Advances in AI are now pushing the boundaries of neural nets and deep learning into an almost sci-fi realm, though the results produced by these systems are very real. Consider: Clune now uses generative systems to produce artificial images that look completely real to the human eye. These photo-realistic images range from birds and insects to mountains and even vehicles. He describes the technology as a “game changer.” Remarkably, over time, certain neurons in the deep learning network become better than others at recognizing and generating specific things, such as eyes, noses, bugs, or volcanoes. “The system actually figures out what it needs to recognize and know and allocates neurons to these concepts automatically,” he says.
To be sure, generative networks have value that extends beyond producing artificial images for art, video games, or augmented reality/virtual reality (AR/VR). Researchers have begun to use generative networks in competition with image-recognition networks to generate even more accurate results. Within this scenario, the generator network creates fake images and the image recognition network, known as a discriminator, analyzes the images and attempts to separate the real from the fake images. The discriminator later checks the validity of its findings and uses those results to further refine its algorithm. Over time, the discriminator becomes smarter and tells the generator how to adapt its output to generate even more realistic images.
The advantage of this approach is

that the discriminator in this arrangement, known as a
generative adversarial net (GAN), learns
do a job better, but they also offer new
ways of looking at the data.” ACM
over time what matters most in the im-
age, Zhang says. At a certain point, the
system displays almost human-like in-
The view into the future is equally
compelling. Lanusse says that in the
coming years neural networks will drive
Member
tuition, he says; “results improve sig-
nificantly.” Interestingly, this approach
enormous advances in fields beyond as-
trophysics. These systems will not only
News
not only improves the quality of image detect, recognize, and classify objects, ENSURING TECHNOLOGY
detection, it may also trim the time re- they will understand what is taking place BEHAVES CORRECTLY
quired to train a network by reducing in an image or in a scene in real time. “Things should
the number of images—essentially the This, of course, could profoundly impact do what they are
expected to do,
volume of data—required to obtain everything from the way autonomous ve- according to a
useful results. Says Zhang: “An interest- hicles operate to how medical diagnos- specification,”
ing question is how can we lower the re- tics work. Ultimately, they will help us says Marta
Kwiatkowska,
quirement of a neural network in terms unlock the mysteries of our planet and professor of computing systems
of how much data it needs to achieve the universe. They will deliver a level of at the University of Oxford. She
the current level of quality?” understanding that wouldn’t have been explains that something should
Another step is to make today’s ar- imaginable only a few years ago. happen with high probability,
within an appropriate or
tificial neural nets easier to use. The Says Lanusse, “Computer image expected time or expected range.
technology is still in its infancy and recognition is advancing rapidly. We “My main focus is on developing
researchers often struggle to use tools are finding ways to train networks verification techniques and
model checking for probabilistic
and technology effectively. In some faster and better. Every gain in speed
systems, which ensure software,
cases, they have to work with multiple and accuracy of even a few percent systems, hardware, and
nets in an iterative fashion to find one makes a profound difference in the protocols behave correctly.”
that works best. As a result, Zhang has real-world impact.” Kwiatkowska has held a
statutory chair in the Department
developed a software program, ease.ml, of Computer Science at Oxford,
that configures deep learning neural and a professorial fellowship at
Further Reading
networks in a more automated and ef- the University’s Trinity College,
ficient way. This includes optimizing Nguyen, A., Yosinski, J., Bengio, Y., since 2007. Prior to that, she
Dosovitskiy, A., and Clune, J. was a professor in the School
components such as CPUs, GPUs, and of Computer Science at the
Plug & Play Generative Networks:
FPGAs and providing a declarative lan- Conditional Iterative Generation of Images University of Birmingham, a
guage for better managing algorithms. in Latent Space. Computer Vision and lecturer at the University of
Pattern Recognition (CVPR ‘17), 2017. Leicester, and an assistant
“Right now, the user needs to deal professor at Jagiellonian
with a lot of different decisions, includ- http://www.evolvingai.org/ppgn
University in Krakow, Poland.
ing the type of neural net they want to Lanusse, F., Quanbin, M, Li, N., Collett, T.E., Li, She earned an undergraduate
use. There may be 20 different neural C., Ravanbakhsh, S., Mandelbaun, R., degree in computer science at
and Poczos, B. Jagiellonian University, writing
nets available for the same task. Choos- programs on punch cards in
CMU DeepLens: Deep Learning for
ing the right model and reducing com- Automatic Image-based Galaxy-Galaxy PASCAL. Kwiatkowska then
plexity is important,” he explains. Strong Lens Finding. March 2017. earned a master’s degree
arXiv:1703.02642. from Oxford, and a Ph.D. in
Already, the software, combined computer science from the
with other deep learning techniques— https://arxiv.org/abs/1703.02642.
University of Leicester.
including an algorithm called ZipML Wang, K., Guo, P., Luo, A., Xin, X., and Duan, F. Initially her research interests
that reduces data representation with- Deep neural networks with local centered on concurrent and
connectivity and its application to distributed systems, but in 1995
out reducing accuracy—has cut noise Kwiatkowska started working
astronomical spectral data.
and sharpened images significantly for 2016 IEEE International Conference on verification techniques.
the astrophysics group at ETH Zurich. on Systems, Man, and Cybernetics (SMC), Her research covers a range of
As a result, Schawinski and others can Budapest, 2016, pp. 002687-002692. applications including biological
doi: 10.1109/SMC.2016.7844646. systems, DNA computations,
now peer more deeply into the universe. and analyzing the behavioral
http://ieeexplore.ieee.org/
“Unlike other areas of science, we correctness of pacemakers,
document/7844646/
cannot run experiments in a lab and among others.
Goodfellow, I.J., Pouget-Abadie, J., Kwiatkowska now studies
simply analyze the results,” ETH Zurich autonomous systems and the
Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
explains. “We are dependent on tele- Courville, A., and Bengio, Y. application of verification
scopes and images to look back in time. Generative Adversarial Networks. techniques to robotics. “We need
We have to piece together all these fixed June 2014. eprint arXiv:1406.2661. to develop methods to verify
http://adsabs.harvard.edu/cgi-bin/bib_ the correctness of the behavior
snapshots—essentially huge data- of robots,” she says. “I am
query?arXiv:1406.2661.
sets—to gain insight and knowledge.” also looking at verification for
Adds Lanusse: “Classical methods machine learning, specifically
Samuel Greengard is an author and journalist based in neural networks, which are
of astronomy and astrophysics are rap- West Linn, OR. now being used in perception
idly being superseded by data science algorithms for self-driving cars.”
and machine learning. They not only © 2017 ACM 0001-0782/17/09 $15.00 —John Delaney
news
Technology | DOI:10.1145/3121442 Gregory Mone
Broadband to Mars
Scientists are demonstrating that lasers
could be the future of space communication.
In March, the U.S. National Aeronautics and Space Administration (NASA) announced that its planned Orion spacecraft, which could one day carry astronauts to the Moon and Mars, will include a new kind of communication system. Typically, manned and unmanned vehicles and probes use radio waves to send and receive information. For decades, though, scientists have been pushing toward using laser-based communications in space. Lasers are no faster, but they can deliver far more information than radio waves in the same amount of time. NASA’s Apollo missions to the Moon were capable of transmitting 51kb worth of data per second, for example, but Orion’s planned Laser-Enhanced Mission and Navigation Operational Services (LEMNOS) system could send back more than 80 megabytes each second from the lunar surface.

Image caption: Artist’s conception of how a NASA spacecraft would use lasers to communicate with Earth. (Image courtesy of NASA.)

That stream could be packed with rich scientific data, or it could include ultra-high-resolution video of distant worlds. Scaled-up versions of this system could dispatch movies of dust devils, storms, or even astronauts walking on the surface of Mars. During the six-month-long trip to the Red Planet, space travelers could potentially trade videos with family members back on Earth, and mitigate the psychological toll of the long journey.

The LEMNOS project is just one of many planned or existing laser-based communications systems in orbit and beyond.

These recent and anticipated advances cannot be attributed to a single, revolutionary breakthrough, according to experts. Instead, this new age of laser-based broadband in space has resulted from steady improvements in detectors, actuators, control systems, and more.

Broadband in Orbit

The idea of laser communications in space has been around nearly since the invention of the laser itself, notes Abi Biswas, supervisor of the Optical Communications Systems group at NASA’s Jet Propulsion Laboratory in Pasadena, CA. From a basic physics standpoint, Biswas says the advantage is clear: lasers occupy the higher-frequency end of the electromagnetic spectrum, relative to radio waves. That means the beam itself is much narrower. If you were to aim a beam of radio waves back at Earth from Mars, the beam would spread out so much that the footprint would be much larger than the size of our planet. “If you did the same thing with a laser,” Biswas says, “the beam footprint would be about the size of California.”

When those beams are sent with the same amount of power, the laser ends up concentrating more power on the receiver. “You can send many more bits of information for the same amount of power,” Biswas explains. Relative to radio, laser or optical communications can transmit anywhere from 10 to 100 times as much data.

The advantages are not limited to solar system exploration; they can be applied within Earth’s orbit as well. The European Space Agency (ESA) and Airbus recently put lasers to work as a broadband data transfer technology, the European Data Relay System (EDRS).

Normally, a satellite flying in low Earth orbit transmits data only when it is within view of a ground station. As a result, it may take 90 minutes for the ground station to receive data after it has been collected.

In the EDRS, lasers are used both to send more data and to accelerate its transfer. A geostationary satellite locks onto the low-orbiting satellite via laser the moment it passes over the horizon, then remains connected as the craft soars over the hemisphere below. The observing satellite begins transmitting data via laser once the link is established. The satellite can transfer far more data this way, but it also gets that data to the ground faster. Instead of waiting for the observing satellite to fly within view of the ground station, the laser transfer begins once the craft establishes line of sight with the geostationary craft, which then transmits data to the ground via radio. “You cut down the time of delivery of the data to the end user on the ground from hours to 10 to 20 minutes,” says Michael Witting, program manager for ESA EDRS.

This speed, combined with the ability to transmit more high-resolution satellite images, will allow organizations to track the movement of ice in polar regions to help ships navigate the Arctic crossing. Officials could monitor oil spills, earthquakes, floods, and other instances in which information needs to travel quickly to disaster response teams.

The EDRS is already in use, and ESA is scheduled to launch a second satellite in 2018.

NASA has a similar project in the works, and while the link does not extend all the way to the Moon or Mars, Witting says the technical challenges were significant. The system operates over approximately 45,000 kilometers (about 28,000 miles), and each laser terminal must locate and remain locked on the other throughout flight. “It’s like taking a torch from Europe and hitting a coin in New York,” Witting says, all while the coin is racing at about 17,000 miles per hour.

Lasers from the Moon

As you move out to larger distances, such as the Moon or Mars, the challenge increases. Biswas compares the effort involved with hitting a target on Earth from the Moon or Mars to trying to look at a small object through a one-meter-long straw; holding that straw steady enough to keep it focused is a tremendous challenge. If not held steady and aimed accurately, the California-sized footprint of a laser beam traveling from Mars could actually miss its target on Earth, and fail to transmit the data.

Experts say the success of NASA’s 2013 test of such a system, the Lunar Laser Communication Demonstration (LLCD), can be attributed to a number of advances, including improvements in the actuators that make micro-adjustments to the position of the beam, ensuring it remains on target, and advances in the control systems that determine exactly where it needs to aim. When the laser struck Earth, the beam was six kilometers wide, but the receiver was less than one meter wide. One way around this would be to build a larger receiving antenna, but the goal of the LLCD was to show that an optical communication system could work without a massive (and massively expensive) dish on the ground. “You have to figure out, how can I catch this dancing signal onto a very sensitive detector and then add very little noise?” asks Don Boroson, a research fellow in the Massachusetts Institute of Technology (MIT) Lincoln Laboratory’s Communication Systems Division, and a major contributor to the LLCD.

For the Moon demonstration, Boroson says the group used an old idea known as error correction coding, which intelligently bundles in redundant bits, so you can still decipher an entire message even if you only catch part of the beam. So, if they were trying to send a message that was 10,000 bits, they’d add in another 10,000 carefully chosen redundant bits, and send 20,000 in all. Then, even if only half of that message was received, the original 10,000-bit code could still be deciphered. This approach was critical, Boroson explains; “it allowed us to have as small as possible a receiver on the ground and still do these very high data rates and make no errors. We did the lunar link with half a watt and a four-inch telescope in space, and we still did 622 megabits per second to the ground.”

Making Every Photon Count

Pushing beyond satellite or lunar communication increases the technical difficulty, because the beam keeps spreading as it travels, so the power that reaches the receiver falls off with the square of the distance between transmitter and receiver. Scaling up the power used to generate the laser is not an option, Biswas explains, because the laser systems would become too large and expensive. “As you get farther and farther away, you have to improve the efficiency of your system,” says Biswas. “You have to make every photon count.”

Despite the challenges of larger distances, physicist Philip Lubin of the University of California, Santa Barbara, argues that lasers would still be a preferred means of communication for missions to the edges of our solar system and beyond. Lubin has been working on a project to propel miniature spacecraft beyond the solar system using a phased array of either ground- or space-based lasers. The spacecraft would have a modest laser to send back data, and Lubin says the array used to propel the craft could also be engineered to receive its messages. “If we’re setting up to blast something out with lasers, then why not use that system to send something back?” he asks. That something probably will not include video, but the lasers could dispatch images and other information.

Back on Earth, larger receiving telescopes would help pick up signals from the Moon, Mars, or beyond. Currently, NASA scientists are demonstrating how laser communications systems work with small receivers, but with ground telescopes measuring 10 to 15 meters across, it would be possible to catch far more light and information. Boroson doesn’t expect those receivers to be built anytime soon, but he does anticipate laser communications will be used more and more.

“It’s going to happen slowly,” says Boroson. “First we’ll see lots of systems around the Earth, then a few systems further out in space, and then more and more. But it’s all coming, it’s definitely coming.”

Further Reading

Biswas, A., Piazzolla, S., Moision, B., and Lisman, D. Evaluation of deep-space laser communication under different mission scenarios, Proceedings of SPIE, 2012.

Boroson, D.M. and Robinson, B.S. The Lunar Laser Communication Demonstration: NASA’s First Step Toward Very High Data Rate Support of Science and Exploration Missions, Space Science Reviews, Volume 185, 2014.

Lubin, P. A Roadmap to Interstellar Flight, Journal of the British Interplanetary Society, vol. 69, 2016.

Space Data Without Delay. http://bit.ly/2pcIlt2

Hemmati, H. Deep Space Optical Communications, John Wiley & Sons, 2006.

Gregory Mone is a Boston-based science writer and the author, with Bill Nye, of Jack and the Geniuses: At the Bottom of the World.

© 2017 ACM 0001-0782/17/09 $15.00
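
Boroson’s description of error correction coding in the article above (send twice as many symbols as the message needs, then recover the message from whichever half arrives) can be made concrete with a toy example. The article does not say which code the LLCD used, so the sketch below swaps in a small Reed-Solomon-style erasure code over the prime field of integers modulo 257, chosen only for illustration. The k data symbols are treated as points on a polynomial, k further evaluations of that polynomial are sent as redundancy, and any k of the 2k symbols are enough to rebuild the message. The names `P`, `encode`, and `decode` are hypothetical, and a real optical link must also cope with corrupted symbols, not just missing ones.

```python
import random

P = 257  # small prime modulus; every symbol is an integer in [0, P). Illustrative choice.


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data):
    """Return the k data symbols followed by k redundant symbols."""
    k = len(data)
    if 2 * k > P:
        raise ValueError("message too long for this toy field")
    points = list(enumerate(data))  # polynomial passes through (i, data[i])
    parity = [_interpolate_at(points, x) for x in range(k, 2 * k)]
    return list(data) + parity


def decode(received, k):
    """Rebuild the k data symbols from any k surviving symbols.

    `received` maps codeword position -> symbol value for the symbols
    that actually arrived; lost symbols are simply absent."""
    survivors = list(received.items())[:k]
    if len(survivors) < k:
        raise ValueError("fewer than k symbols survived")
    return [_interpolate_at(survivors, x) for x in range(k)]


if __name__ == "__main__":
    message = [ord(c) for c in "LASER LINK"]
    codeword = encode(message)
    # Simulate catching only half of the transmitted symbols.
    rng = random.Random(0)
    kept = rng.sample(range(len(codeword)), len(message))
    received = {pos: codeword[pos] for pos in kept}
    assert decode(received, len(message)) == message
    print("message recovered from half of the codeword")
```

The ratio matches the example in the text: the codeword is twice the length of the message, and losing any half of it still leaves enough to decode. Practical codes for deep-space links trade this simple guarantee for much faster decoding and tolerance of noise.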
news
Society | DOI:10.1145/3121436 Logan Kugler
Why GPS Spoofing Is a Threat to Companies, Countries
Technology that falsifies navigation data presents
significant dangers to public and private organizations.
When the crew of an $80-million superyacht in the Ionian Sea checked its computer, they realized they were drifting slightly off course, likely as a result of strong currents buffeting their ship. The crew made adjustments and went back to work, without realizing they were now taking directions from a hacker.

In the bowels of the ship, Todd Humphreys, an associate professor in the Department of Aerospace Engineering and Engineering Mechanics at the University of Texas at Austin, worked with his team to feed the superyacht’s crew false navigation data using a few thousand dollars worth of hardware and software.

Image caption: Part of an animation showing how a radio navigation research team from The University of Texas at Austin was able to successfully spoof the GPS system of an $80-million private yacht. (Image courtesy of University of Texas at Austin.)

The crew was completely unaware they were now piloting in a direction of Humphreys’ choosing.

Thankfully, it was all an experiment that took place with the yacht owner’s blessing. If it had been real, Humphreys could have sent the superyacht 1,000 miles off-course into the hands of a rogue government, terrorist group, or professional criminal organization, and the crew would not have realized it until it was far too late.

Welcome to the very real dangers posed by Global Positioning System (GPS) spoofing, or the dark art of convincing computers you are somewhere that you’re not. It is surprisingly easy, and shockingly dangerous, because we’re not prepared for it at all.

GPS Is Easy to Spoof

The U.S. Global Positioning System consists of 24 satellites that orbit Earth. GPS devices receive signals from the nearest satellites that allow them to determine their precise location, whether you’re looking for creatures in the wildly popular Pokémon Go app, or going to war in a billion-dollar battleship. A range of GPS devices and networks are used for everything from military applications to commercial needs, and all the use cases in between.

Yet all of these systems rely on the data from the network of GPS satellites. If you can corrupt the data coming from those satellites, you can create a world of headaches for systems that rely on this data.

GPS spoofing can be performed with relatively low-cost tech, which is an expensive problem for the people, companies, and governments that trust the system implicitly. In the case of Humphreys’ superyacht hacking, he and his team used about $2,000 worth of tech.

Even in more advanced spoofing scenarios, the technology is still straightforward, says Dinesh Manandhar, an associate professor and GPS expert at the University of Tokyo.

“A device that can generate GPS signals is necessary. Such devices are available from GPS signal simulator device manufacturers,” Manandhar explains. These devices are used to test GPS receivers in factories. As such, they can be programmed to transmit a signal that makes receivers behave any way you like.

“So far as I know, no commercial GPS receivers offer any strong defense against spoofing or even any reliable spoofing detection capability,” says Humphreys.

Stealing an $80-Million Superyacht

In 2013, Humphreys, then a researcher in the Department of Aerospace Engineering and Engineering Mechanics at the Cockrell School of Engineering, was invited, along with a team of students, aboard an $80-million yacht in the Ionian Sea to test their GPS spoofing technology. Using his hardware and software rig, Humphreys managed to falsify GPS data used by the ship, effectively giving him control over the vessel.

Humphreys explained GPS receivers calculate their distance from several satellites at the same time. Each satellite has a code, called a pseudorandom noise (PRN) code, that identifies which satellite in the GPS network is broadcasting. Humphreys’ spoofing equipment slowly replaced the real GPS signals with fake ones, working delicately so the ship’s system did not detect an abrupt change in signal.

The spoofed GPS reported the yacht was three degrees off-course. The crew, unaware when the experiment would take place, adjusted the ship’s course based on the spoofed GPS. The crew assumed it was due to natural forces such as water currents and crosswinds.

GPS spoofing can be used for all sorts of nefarious purposes. As seen with the yacht, cargo shipments are at risk, especially dangerous or high-value ones that are required to follow designated GPS routes. Geofences, or digitally proscribed boundaries, are used to protect sensitive data in many corporations; GPS spoofing could be used to access that data well out of the bounds intended.

Once you add emerging technologies, like self-driving cars, to the mix, it gets even scarier. Autonomous vehicles use GPS data at regular intervals not only to understand where they are, but also to decide where to drive passengers and cargo.

Humphreys’ yacht spoofing was the first time commercial tech had been used in such an effective, and powerful, demonstration.

Now, said Manandhar, it is even easier to acquire spoofing technology. “Recently, software-based low-cost devices have become available that cost less than $1,000.”

A Problem for Governments, People

It is not just yacht owners who need to be concerned; the problem is especially acute for national governments and international bodies, which are waking up to the dangers posed by GPS spoofing.

Incredibly, Europe’s Galileo global navigation satellite system, the European Union’s version of GPS, operated beginning in December 2016 “with no way to protect civilian users from hacking attempts,” reported ZDNet.

University of Leuven researchers Ashur and Rijmen say they have developed an authentication protocol to deter the forging of Galileo’s navigation data.

The protocol, called the TESLA signature, is designed to complement location data with a cryptographic “signature,” so Galileo’s satellites would send both navigation data and the cryptographic signature to the receiving client. The client would not trust the data right away; only when the signature was verified would the client use the GPS data it had received.

“Using cryptography makes it hard to forge a signature, such that even an adversary that can feed the client with the false data cannot forge a signature, thus the client does not use forged data,” Ashur says.

This would prevent, say, spoofing the signal to hijack a self-driving car or re-route a drone that relied upon the data.

However, the Galileo system, which comes fully online in 2020, presented a unique obstacle: low bandwidth. Galileo has relatively low-bandwidth signals that make a typical approach to the problem, using public-key cryptography, impossible.

“The uniqueness of our solution is that it uses symmetric cryptography and can thus fit into the bandwidth constraints,” says Ashur. The protocol is scheduled to go into effect in 2018, according to ZDNet. Until all 24 of Galileo’s satellites are deployed and operational in 2020, however, the protocol will “operate in test mode.”

In the meantime, manufacturers are starting to pay attention to the problem, says Humphreys. Some, like u-blox, a Swiss company that creates wireless semiconductors and modules for consumer, automotive, and industrial markets, offer anti-spoofing measures such as the capability to detect fake global navigation satellite system (GNSS) signals, as well as a message integrity protection system to prevent “man in the middle” attacks.

Humphreys also points to the U.S. Department of Homeland Security’s recent document on anti-spoofing, “Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure,” as a sign that the right parties are taking GPS spoofing seriously.

Manandhar has developed anti-spoofing methodologies for Japanese satellites that may be used in the next generation to be sent into orbit, he says. He recommends that major navigation data provider countries like the U.S., Japan, the European Union, China, and India conduct official joint discussions on the security of their systems at the International Committee on Global Navigation Satellite Systems, an organization under the umbrella of the United Nations.

The dangers, however, are not going away. Humphreys worries particularly that spoofing the GPS-sourced timing used to regulate financial databases could create havoc. Industries like financial services, he says, “have backups in place, but on close inspection one realizes that the backups themselves are either short-term or eventually trace their source to GPS.”

“A coordinated attack that understood the finance world’s dependency on GPS would be hard to detect and even harder to defeat,” he cautions.

Further Reading

Psiaki, M., and Humphreys, T. Protecting GPS from Spoofers Is Critical to the Future of Navigation, IEEE Spectrum, Jul 29, 2016, http://spectrum.ieee.org/telecom/security/protecting-gps-from-spoofers-is-critical-to-the-future-of-navigation

Amirtha, T. Satnav spoofing attacks: Why these researchers think they have the answer, ZDNet, Mar 27, 2017, http://www.zdnet.com/article/satnav-spoofing-attacks-why-these-researchers-think-they-have-the-answer/

U.S. Department of Homeland Security, National Cybersecurity & Communications Integration Center, National Coordinating Center for Communications. Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure, http://bit.ly/2oZewfz

Logan Kugler is a freelance technology writer based in Tampa, FL. He has written for over 60 major publications.

© 2017 ACM 0001-0782/17/09 $15.00
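
The delayed-verification idea described in the article above (receive the data, hold it, and use it only once the signature checks out) is the core of TESLA-style broadcast authentication, which relies on symmetric primitives and a one-way chain of keys rather than public-key signatures. The sketch below is a generic, simplified illustration using Python’s standard `hashlib` and `hmac` modules. It is not the Galileo protocol, whose parameters the article does not give. The class `Receiver`, the helpers `make_key_chain` and `mac_tag`, and the choice of SHA-256 are all assumptions made for the example, and the timing check that real TESLA depends on (a message must arrive before its key is disclosed) is left out.

```python
import hashlib
import hmac


def make_key_chain(seed: bytes, n: int):
    """Return [K_0, ..., K_n] with K_{i-1} = SHA-256(K_i).

    K_0 is the public commitment; K_i authenticates interval i and is
    disclosed only after that interval has passed."""
    chain = [seed]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()
    return chain


def mac_tag(key: bytes, message: bytes) -> bytes:
    """Symmetric authentication tag over one broadcast message."""
    return hmac.new(key, message, hashlib.sha256).digest()


class Receiver:
    """Hypothetical client that buffers data until its key is disclosed."""

    def __init__(self, commitment: bytes):
        # K_0, assumed to have been obtained through a trusted channel.
        self.commitment = commitment
        self.pending = {}  # interval -> (message, tag)

    def on_message(self, interval: int, message: bytes, tag: bytes):
        # Do not trust the data yet; just hold on to it.
        self.pending[interval] = (message, tag)

    def on_key(self, interval: int, key: bytes):
        """Return the buffered message if it verifies, else None."""
        # Step 1: the disclosed key must hash back to the commitment.
        k = key
        for _ in range(interval):
            k = hashlib.sha256(k).digest()
        if not hmac.compare_digest(k, self.commitment):
            return None
        # Step 2: the buffered tag must match under the disclosed key.
        if interval not in self.pending:
            return None
        message, tag = self.pending.pop(interval)
        if hmac.compare_digest(mac_tag(key, message), tag):
            return message
        return None


if __name__ == "__main__":
    chain = make_key_chain(b"sender-secret-seed", 4)
    rx = Receiver(commitment=chain[0])

    # Genuine broadcast for interval 2: data and tag now, key later.
    rx.on_message(2, b"nav-data", mac_tag(chain[2], b"nav-data"))
    assert rx.on_key(2, chain[2]) == b"nav-data"

    # A spoofer does not know K_3 in advance, so its tag fails.
    rx.on_message(3, b"forged-data", mac_tag(b"\x00" * 32, b"forged-data"))
    assert rx.on_key(3, chain[3]) is None
```

Because each key is only a hash output and each tag is a short MAC, the per-message overhead stays small, which is in the spirit of Ashur’s point about fitting within bandwidth constraints.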
news
Milestones | DOI:10.1145/3122790 Lawrence M. Fisher
Turing Laureates
Celebrate Award’s
50th Anniversary
A
CM RE CE N T LY H E LD a con- ics related to their fields of study. mental contributions to artificial intel-
ference in celebration of After welcomes from Hanson, pro- ligence through the development of a
the first 50 years of the gram chair Craig Partridge, and master calculus for probabilistic and causal
ACM A.M. Turing Award. of ceremonies (and past ACM presi- reasoning”), who spoke about an evo-
“Just over 50 years ago, dent) Dame Wendy Hall, 2008 Turing lutionary advance 40,000 years ago that
ACM awarded its first A.M. Turing Laureate Barbara Liskov (who received allowed Homo sapiens to advance past
Award to Alan Perlis for his work on the award “for contributions to prac- competitor species Homo erectus and
advanced programming techniques tical and theoretical foundations of the Neanderthals. “The ability to imag-
and compiler construction,” said ACM programming language and system de- ine things that do not physically exist
president Vicki L. Hanson. “In total, 64 sign, especially related to data abstrac- … the ability to model one’s environ-
people from around the world have re- tion, fault tolerance, and distributed ment, imagine other worlds, served to
ceived the Turing Award, recognizing computing”) offered a presentation accelerate evolution in favor of Homo
work that laid the foundations of mod- on the “Impact of Turing Recipients’ sapiens,” he said.
ern computing.” Work” focusing on the impact of early The session on “Restoring Person-
The award was presented to its 65th Turing recipients, which she described al Privacy Without Compromising Na-
recipient, Sir Tim Berners-Lee, at the as “tremendous.” tional Security” featured 2015 Turing
event in June. A session on “Advances in Deep Laureate Whitfield Diffie (co-recipi-
The conference included more than Neural Networks” featured 2011 Tur- ent of the award with Martin Hellman
20 Turing Laureates speaking on top- ing Laureate Judea Pearl (“for funda- “for inventing and promulgating both
PHOTOGRA PHS BY M ISTI L AYNE
Among the 22 Turing Laureates in attendance at the conference were: Front row, from left: Whitfield Diffie (2015), Martin Hellman (2015),
Robert Tarjan (1986), Barbara Liskov (2008). Second row, from left: Vinton Cerf (2004), Richard Karp (1985), Richard Stearns (1993), Dana
Scott (1976). Third row, from left: Ivan Sutherland (1988), Leslie Valiant (2010), Robert Kahn (2004). Fourth row, from left: Frederick Brooks
(1999), Raj Reddy (1994), William (Velvel) Kahan (1989), Donald Knuth (1974).
20 COM MUNICATIO NS O F TH E ACM | S EPTEM BER 201 7 | VO L . 60 | N O. 9

news
asymmetric public-key cryptography,

including its application to digital
signatures, and a practical crypto-
graphic key-exchange method”), who
observed that calls by government
agencies to incorporate “backdoors”
in computing systems that would al-
low them to bypass normal authen-
tication or encryption are not really
necessary. “New backdoors aren’t re-
quired; the security failures of most
programs give the government ample
opportunity to ‘break in.’”
In a discussion about “Preserving
Our Past For The Future,” 2004 Tur-
ing Laureate (and ACM past president)
Vint Cerf (“with Robert E. Kahn, for
pioneering work on internetworking,
including the design and implemen-
tation of the Internet’s basic commu-
nications protocols, TCP/IP, and for Laureates, from left, Vinton Cerf, Edward Feigenbaum, and Raj Reddy.
inspired leadership in networking”)
related an anecdote about coming
across an old 3.5-inch floppy disk and
tracking down a compatible disk drive,
but still being unable to open the files
on the disk because they were saved in
an outdated version of WordPerfect.
“Backward compatibility suffers be-
cause you can’t keep everything,” like
the version of WordPerfect needed to
open those files, he said.
In a session on the future of micro-
electronics entitled “Moore’s Law Is
Really Dead: What’s Next?” moderator John Hennessy of Stanford University said, “We’re reaching the end of silicon technology as we know it.” As a result, said Doug Burger of Microsoft Research, “We’re entering a wild, messy, destructive time. It sounds like a lot of fun.” Margaret Martonosi of Princeton University said, “We’re entering a post-ISA, post-CPU era … we need to be exploring design processes to be domain-specific, and we need to train students that way as well.”

Butler Lampson, the 1992 Turing Laureate (“for contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security, and document publishing”), said, “There’s plenty of room at the top; there’s room in software, algorithms, and hardware.” He added, “We know there’s a lot of software bloat, that we can get rid of, at a cost.” Also, he said, “A consequence of hardware changes not going to be invisible anymore is, you need a strategy for changes in the software stack.”

With regard to hardware advances, Lampson said, “What people care about is that the cost of running their application drops.”

Norman P. Jouppi, Distinguished Hardware Engineer at Google, concluded Moore’s Law is “not dead, it’s just resting.”

Photo: A panel on Moore’s Law was moderated by John Hennessy (left) and included Doug Burger, Norman Jouppi, Butler Lampson (1992), and Margaret Martonosi.

Regarding “Challenges in Ethics and Computing,” 1994 Turing Laureate Raj Reddy (co-recipient with Ed Feigenbaum “for pioneering the design and construction of large-scale artificial intelligence systems, demonstrating the practical importance and potential commercial impact of artificial intelligence technology”) said, “We need to identify technological solutions to societal problems. I believe we can.” One of those solutions, he said, might be “designing self-healing systems in every system we design.”

In the future, Reddy said, there will be no separation between humans and technology. “Humans will have technology in their bodies and be able to do things no person or computer could do alone. That system should have ethics.
Unfortunately, that’s trumped by laws and government.”

“Accountability is what we want from all systems,” Reddy said. “The role of philosophers/ethicists is to convince the government,” because “if it is not written into the law, nothing will change. Unless we find mechanisms to get it into the legal system, we can have all kinds of discussions and nothing will happen.”

Opening the second day of the conference, 1974 Turing Laureate Donald Knuth (“for his major contributions to the analysis of algorithms and the design of programming languages, and in particular for his contributions to the ‘art of computer programming’ through his well-known books in a continuous series by this title”) addressed “Computer Science as a Major Body of Accumulated Knowledge.” Computer science, he said, shares with mathematics “the great privilege that we can invent problems to work on.” Basically, he said, computer science and mathematics “are two parallel disciplines with a lot in common, but a distinct difference.”

Knuth said he was both “optimistic and pessimistic” about artificial intelligence, and that he is “more pessimistic when it is based on the notion humans make rational decisions.”

The 79-year-old Knuth said he considers “computer programming is art, in the sense that it’s not from nature, as well as being beautiful.”

As a member of the panel discussing “Quantum Computing: Far Away? Around the Corner? Or Maybe Both at the Same Time?” 2000 Turing Laureate Andrew Chi-Chih Yao (“in recognition of his fundamental contributions to the theory of computation, including the complexity-based theory of pseudorandom number generation, cryptography, and communication complexity”) said, “I am a believer in quantum computing,” adding, “it seems clear that the technology of quantum computing is going to have a big practical impact.”

Yao described quantum computing as “a great experiment, and we’re all waiting to see what can come of it.” He also called it “a great paradigm for interdisciplinary computing.”

The session on “Augmented Reality: From Gaming to Cognitive Aids and Beyond” was the only session to feature two Turing Laureates: 1988’s Ivan Sutherland (“for his pioneering and visionary contributions to computer graphics, starting with Sketchpad, and continuing after”), and 1999’s Frederick P. Brooks, Jr. (“for landmark contributions to computer architecture, operating systems, and software engineering”).

Brooks said he has a vision of using augmented reality (AR) for the purpose of training emergency teams. He asked the panel about “the state of actual use of augmented reality today? Who is using it as a tool to earn their living?” Sutherland responded that the pilot of a jumbo jet, who trains in a simulator, is taking advantage of “some of the best VR (virtual reality) in use today,” while Yvonne Rogers of University College London pointed out that head-up displays “are a reality for navigation.” Peter Lee, of Microsoft AI and Research, said there is “a lot of belief, interest, and a growing amount of experimentation in AR,” such as the ability to “teleport” (virtually visit other locations); he added, “If we can teleport, there really isn’t a need for so many airplanes.”

Sutherland added that the “greatest value of AR/VR is to show people things in a way that makes the underlying physics, the meaning, clear.”

The full conference sessions are available at https://www.facebook.com/pg/AssociationForComputingMachinery/videos/.

Lawrence M. Fisher

© 2017 ACM 0001-0782/17/09 $15.00

Photos: Leonard Adleman (2002). Kenneth Thompson (1983). Judea Pearl (2011) moderated a panel on deep neural networks. Andrew Chi-Chih Yao (2000). The newest Turing Laureate, Sir Tim Berners-Lee. A young conference attendee takes a selfie with Ivan Sutherland (1988). Panel discussions during the conference drew a packed house.

news
In Memoriam | DOI:10.1145/3125605 Lawrence M. Fisher
Charles W. Bachman:
1924–2017
An engineer best known for his work in database management systems,
and in techniques of layered architecture that include Bachman diagrams.
Charles William “Charlie” Bachman, the “father of databases” who received the ACM A.M. Turing Award for 1973 for creating the first database management system, died June 13 at the age of 92.

Born in Manhattan, KS, in 1924, Bachman earned his B.S. in mechanical engineering in 1948, as well as an M.S. in mechanical engineering from the University of Pennsylvania.

He went to work for Dow Chemical in 1950, using mechanical punched-card computing devices to solve networks of simultaneous equations representing data from Dow plants. In 1957, Bachman became head of Dow’s Data Processing Department, through which he became a member of Share Inc., and a founding member of the Share Data Processing Committee.

In 1960, Bachman joined the General Electric (GE) Production Control Services Group in New York City, using a factory in Philadelphia to test designs for a system to automate factory planning, scheduling, operational control, and inventory control. The resulting MIACS was based on the Integrated Data Store (IDS), Bachman’s concept of an “information inventory,” and was the first to adopt the “network data model” in which the system would support and enforce relationships between records.

Bachman moved to GE’s Computer Department in 1964, where he helped build another management information system, the Weyerhaeuser Comprehensive Operating Supervisor (WEYCOS 2).

Bachman was awarded the ACM A.M. Turing Award for 1973 for his contributions to database technology. As biographer Thomas Haigh observed, “Bachman was the first Turing Award winner without a Ph.D., the first to be trained in engineering rather than science, the first to win for the application of computers to business administration, the first to win for a specific piece of software, and the first who would spend his whole career in industry.”

The British Computer Society named Bachman a Distinguished Fellow in 1977 for his work in database systems.

Bachman received the U.S. National Medal of Technology and Innovation (NMTI) for 2012. The award was presented to Bachman in 2014 by President Barack Obama.

He was nominated for the NMTI by U.S. Senator Edward J. Markey (D-MA), who said, “The United States would not be the worldwide hub for technological innovation had it not been for the achievements of Charles Bachman.”

Data scientist Gary Rector said Bachman was “humble, kind, generous, and a gentle soul; his entire family reflects that humanity. Charlie loved flowers and had a smile that embraced everyone. His heart connected to people more meaningfully than any database could ever do merely with data. To connect to people in this way is the greatest lesson he gave me.”

George Colliat, a colleague from GE, said, “I have learned from his ability to look for solutions that transcend the problems at hand and thereby multiply the value of the solutions.” He added, “Charlie’s human values have influenced me as much as his creative genius. His respect for his colleagues, always looking for their positive contribution, his patience in explaining ideas to people who were not always at his level, his humility and open mind in always listening to others as an opportunity to learn something new, characterize him as a gentleman in this industry.”

Haigh last saw Bachman when he was “close to 90 but still sharp and enjoying life; talking about the article he was working on and his chats with E.O. Wilson in the retirement community they shared. He never stopped trying to understand how things worked, or trying to make them work better. I feel honored to have known him.”

In 2014, Bachman was named a Fellow of the ACM for his contributions to database technology.

Bachman was named a Fellow of the Computer History Museum in 2015, for his work on database management systems. Also that year, Michigan State University awarded Bachman an honorary doctorate of engineering for being “at the forefront of computer science for more than 65 years.”

Bachman’s son, Jon, said his father’s vision of the Integrated Data Store resulted in “a high-performance direct access storage model (that) allows developers to build large efficient databases of any type of business or operational data. In fact, the first versions were so successful that they became established as the most important system software on mainframe computers of that era.”

In an interview in 2008, Bachman was asked who in the IT industry “inspired you or was a role model for you?” He replied, “The inventors, the developers of new concepts, the solvers of previously unsolved problems, the assemblers of new and interesting combinations of old technologies. Take Sir Maurice Wilkes, Edsger Dijkstra, Sir Tim Berners-Lee.”

Photo courtesy of Bachman family.

viewpoints
DOI:10.1145/3126489 Joel R. Reidenberg
Law and Technology

Digitocracy
Considering law and governance in the digital age.
Digital technologies have unleashed profound forces changing and reshaping rule making in the democracies of the information society. Today, we are witnessing a transformative period for law and governance in the digital age. Elected representative government and democratically chosen rules vie for authority with new players who have emerged from the network environment. At the same time, network technologies have unraveled basic foundational prerequisites for the rule of law in democracy like privacy, freedom of association, and government oversight. The digital age, thus, calls for the emergence of a Digitocracy: a new set of more complex governance mechanisms assuring public accountability for online power held by state and nonstate actors through the creation of new checks and balances among a more diverse group of players than democracy’s traditional grouping of a representative legislature, executive branch, and judiciary.

Where Google and Facebook know more than most spy agencies about the lives of millions of citizens as well as the inner workings of companies and governments, information powerhouses and platforms can establish their own rules for citizens’ interactions online. Where public-sector surveillance and private-sector tracking are so pervasive, citizens lose the ability to control the disclosure of their thoughts, friends, and activities, and no longer have privacy. Where lone coders wreak massive havoc for private gain or for opposition to governmental policies, they can use their information resources to reject majority rule. Where technology can protect the anonymity of wrongdoers, rule-breakers can escape accountability. In short, the modern information society destroys one of the most fundamental truths of any democracy: that “the power to make the laws rests with those chosen by the people.”[a]

The Internet’s Promise

Without a doubt, the Internet revolutionized the dissemination of information and the ability of individuals to engage with each other. The euphoria surrounding the early days of the Internet’s expansion into the public sphere predicted that technology would expand democracy and empower citizens around the world. The conventional wisdom held that citizen participation would multiply online with e-government, and the public would have better oversight of the state thanks to new capabilities for monitoring administrative and executive actions. The power of the Internet to disseminate information from one to millions and the power of the Internet to foster conversations seemed an unstoppable force for democratic discourse. Popular movements like the Arab Spring, the Occupy Movement, and the Bernie Sanders U.S. presidential campaign illustrated that information technologies could indeed significantly enhance and enable political organizing on a new, unprecedented scale. Many expected that mechanisms like open electronic proceedings for rule making and open data for government transparency would herald better representative government and decision making.

[a] King v. Burwell, 135 S. Ct. 2480, 2496 (2015).

The Internet’s technical infrastructure turns out to challenge the promise of the political empowerment of citizens. Just as network technologies offered organizational tools for political empowerment, the technologies themselves provided the means to reverse the hope that the Internet would be a one-way pro-democracy force. Network infrastructure proved that it could be used to frustrate empowerment dreams.

Egypt, for example, pulled the plug on the Internet for several days during the Arab Spring uprisings to block political organizing; Brazil shut down WhatsApp for 48 hours; local police in the U.S. used stealth Stingray technology to engage in large-scale geo-surveillance of citizens. And, at the same time, Twitter bots flooded social media in order to shut down political dialog or to falsify support for candidates, while hate and bullying flourish online. In short, the Internet has embedded the means to block political empowerment and discourse.

Undermining Democracy

In the intervening years since the early euphoria over the Internet’s political potential, the embedding of the Internet in our daily lives has effectively demonstrated new vulnerabilities. The Internet’s infrastructure has already displaced three key areas essential to the rule of law in democracy: sovereignty, government accountability, and respect for law. Internet technologies restructure a state’s ability to prescribe and assure the enforcement of law. Governments forfeit sovereignty to networks when services like cloud computing transcend borders and enable organizations to choose rules in the blink of an eye. Network architecture enables technology developers and service providers to embed rules for online activities through infrastructure choices. For example, cloud service providers like Dropbox make determinations every day on the security of users’ data. These encryption decisions determine the very capability of states to examine user data in lawful investigations.

Network infrastructure undermines the oversight and accountability of government. While open government technologies enable greater transparency of public institutions, electronic tools also empower governments to circumvent traditional political checks and balances and the public’s oversight of government suffers irreparably. For example, in Oakland, CA, the police engaged in a mass-scale surveillance program to geo-locate thousands of mobile phones using stingray devices without any judicial approval and, in New York City, the police program to record drivers through traffic cams and smart city sensors also escapes judicial oversight. At the same time, technologically enabled leaks and wide dissemination of non-public activities of government through sites like WikiLeaks may jeopardize legitimate functions of government such as international relations and active law enforcement investigations. Snowden’s leaks, for example, are reported to have endangered the lives of British MI6 agents in Russia and China.

Laws lose their authority when governments can no longer control the use of power to enforce rules and hackers have control over weapons of mass disruption. Network infrastructure removes the state’s monopoly on the use of coercive, police power to enforce rules and protect its citizens. Technology allows lone-wolf actors unchecked by states to create and deploy weapons of mass disruption whether through malware, ransomware, or botnets. For example, hospitals across the U.S. in the spring of 2016 faced a wave of ransomware attacks that left some in a “state of emergency.” ISIS uses crowdsourcing to sow terror in the U.S. and Europe. Simultaneously, the infrastructure empowers private actors to engage in vigilante actions. The underground group, Anonymous, recently illustrated such actions when they threatened an electronic attack against ISIS following the Paris massacres in November 2015. In essence, individuals and associations now have tools, outside the ability of state control, to enforce their choices and rules online in ways that are independent of the state. To be sure, when a Texas college discovered in 2015 that Facebook provided better real-time information for an on-campus police emergency than 911, it became clear the state has even lost control over basic information it needs to protect its citizens.

Beyond undermining key aspects of the rule of law, the Internet’s infrastructure has toppled critical, substantive legal pillars of democracy. Freedom of thought and association as well as public safety are essential elements of democracy and privacy is a prerequisite. Yet, the network infrastructure contradicts the basic tenets of freedom of association and privacy. Network functionality works thanks to ubiquitous data surveillance. The resulting transparency of citizens to those in the network undermines both state and citizens’ respect for the rule of law. States lose important checks and balances against omnipotent acquisition of information and citizens’ freedom of thought and association are undercut. Counterintuitively, public safety and security are also destabilized by the transparency when stalkers, social engineering hackers, and cyberwarriors find the informational keys to success readily accessible online.

Freedom of expression is another cornerstone of democracy. Yet, democracies have a capability problem dealing with socially destructive content like hate, threats, and cyberbullying that jeopardize public order and individual safety. Technology allows rapid and widespread dissemination of harmful content, while wrongdoers can shield their activities from accountability through encryption and anonymity tools. At the same time, freedom of expression limits the authority of states to ban nefarious online content. In the U.S., for example, there is no public recourse for the rapid growth of anti-Semitic Twitter accounts. Users must appeal to the social media firms who, in turn, then decide what to suppress or censor. By contrast, in Europe, platforms bear more legal responsibility for content, but firms are often left in the same position as an all-powerful censor. In effect, government is unable to suppress the vile and corrosive online material that threatens citizens without resorting to oppressive, anti-democratic controls.

The Opportunity of Digitocracy

The information society lacks a model of governance suited to the digital age. Going forward, the digital age will need a new system of checks and balances for its political decision making, a “Digitocracy,” offering the opportunity to develop new governing principles that articulate who regulates what to preserve public accountability online.

Our challenge is how to construct the appropriate checks and balances. Digitocracy’s dynamic will be much more complex than the analog world. Online private rule making like Twitter’s decisions regarding censorship, Adobe’s technical protections on digital content, and Facebook’s settings for privacy have become more powerful in people’s lives than rules from the democratic constitutional framework.

Business organizations are likely to serve as counterweights to government power. Google’s Transparency Report, Apple’s defiance of an FBI request for encryption keys, and Microsoft’s challenge to U.S. government access to foreign-based servers each reflect a check on the state’s intrusiveness. And, individuals like Snowden may serve as counterweights to states and businesses. Individuals and associations of individuals have direct authority when they coalesce with online tools ranging from social media to hacktivism as they perceive the need to interject and amplify their end goals online. All while national government provides checks on overreaching private actors. Where each actor from a state to an individual can assure mass disruption online, fair governance will require co-existence among the rule-making actors.

At the core, the assurance of public accountability online is the key objective of Digitocracy. The mechanisms for states, private actors and citizens to co-exist as rule-makers in the networked society are likely to be defined in unexpected ways incorporating notions of federalism, multistakeholder governance, and subsidiarity. These tools will draw the boundaries of rule-making authority among the state actors, platform operators, corporate organizations, and empowered users. Each actor, whether state or non-state, has an important role to prevent overreaching by the other actors. In essence, Digitocracy constructs a more multifaceted set of interwoven checks and balances to establish limits on the powers of both state and non-state actors and a reliance on both to protect the public good. For our future, now is the time to begin the robust public discussion on our means of governance in the digital age.

Joel R. Reidenberg (jreidenberg@law.fordham.edu) is the Stanley D. and Nikki Waxberg Chair and Professor of Law, Fordham University, Director, Fordham Center on Law and Information Policy, and Visiting Research Affiliate, Center for Information Technology Policy, Princeton University.

The author is preparing a book on this topic to be published by Yale University Press.

Copyright held by author.

Image by Alicia Kubista/Andrij Borys Associates.

viewpoints
DOI:10.1145/3126492 Carolina Alves de Lima Salge and Nicholas Berente
Computing Ethics
Is That Social Bot
Behaving Unethically?
A procedure for reflection and discourse on the behavior of bots in the
context of law, deception, and societal norms.
Attempting to answer the question posed by the title of this column requires us to reflect on moral goods and moral evils: on laws, duties, and norms, on actions and their consequences. In this Viewpoint, we draw on information systems ethics[6,7] to present Bot Ethics, a procedure the general social media community can use to decide whether the actions of social bots are unethical. We conclude with a consideration of culpability.

Social bots are computer algorithms in online social networks.[8] They can share messages, upload pictures, and connect with many users on social media. Social bots are more common than people often think.[a] Twitter has approximately 23 million of them, accounting for 8.5% of total users; and Facebook has an estimated 140 million social bots, which are between 5.5%–11.2% of total users.[b,c] Almost 27 million Instagram users (8.2%) are estimated to be social bots.[d] LinkedIn and Tumblr also have significant social bot activity.[e,f] Sometimes their activity on these networks can be innocuous or even beneficial. For example, SF QuakeBot[g] performs a useful service by disseminating information about earthquakes, as they happen, in the San Francisco Bay area. However, in other situations, social bots can behave quite unethically.

Social Bots Behaving Unethically

LinkedIn reports that social bots on the professional networking platform are often used to “steal data about legitimate users, breaching the user agreement and violating copyright law.”[h] Social bots have been reported to behave badly in a variety of ways across various contexts, everything from disseminating spam[i] and fake news[j] to limiting free speech.[k] But it is not always clear whether their undesirable activity is simply a nuisance or whether it is indeed unethical, particularly given the random nature of the logic underlying many social bots. Bad actions are not necessarily unethical; there are shades of gray that are difficult to judge.

For example, Tay,[l] a social bot created by Microsoft to conduct research on conversational understanding, went from “humans are super cool” to “Hitler was right I hate the Jews” in less than 24 hours on Twitter due to malicious humans interacting with the social bot.[m] In another case, a social bot tweeted “I seriously want to kill people” from randomly generated sentences during a fashion convention in Amsterdam.[n] Clearly such inadvertent comments violate our sensibilities and are distasteful, but are they unethical? Perhaps, but by what standard do we judge? Some social bots do more than just comment; clearly those that steal information and commit other misdeeds are engaging in unethical activity, but, again, it is not always so clear. For instance, the Random Darknet Shopper, a social bot coded to explore the dark Web in the name of art, inadvertently purchased 10 Ecstasy pills (an illegal narcotic) and a counterfeit passport.[o] So a law was broken, but was this unethical behavior? We developed a procedure, which we describe next, to help answer such questions.

Bot Ethics: A Procedure to Evaluate the Ethics of Social Bot Activity

Ethics in philosophy dates back thousands of years, and this Viewpoint column cannot do justice to the entire field. However, because of the increasing prominence of social bots and their potential for malicious activity, ethical judgment about their activity is necessary. The best way to guide ethical conduct in a community is to provide a procedure for reflection and discourse.[5] The procedure we created is called “Bot Ethics” (see the figure here) and it focuses on the behavior of social bots with respect to law, deception, and norms.

Break Law?

Many laws are developed from ethical principles.[6] Even when a law may be flawed, it is typically the ethical course of action to follow that law.[9] Therefore a natural first question is: “Does the action of the social bot break the law?” The objective is to assess straightforward ethical questions, such as whether algorithms plant viruses in someone else’s device. This is clearly illegal and unethical. There are cases where a social bot might ethically violate the law, such as civil disobedience for a cause the creator considers just. However, civil disobedience is only ethical in very rare cases in constitutional democracies where legal recourse for unjust laws pervades.[6] Cases where a law may be broken that are not unethical require justification: compelling arguments that appeal to moral standards of the majority.[6] Only in such rare cases may illegal acts be seen as moral and therefore ethical.[6] Thus we ask “Is the illegal act justifiable?” Acts that are not suitably justifiable (that is, do not appeal to the morality of the majority) are unethical. Swiss authorities did not file charges against the Random Darknet Shopper developers.[p] They argued that social bots can buy illegal narcotics over the Internet for the purpose of art[q] and that “ecstasy in this presentation was safe.” The behavior was not unethical because it was justified according to the pervading morality of the community.

Involve Deception?

If a social bot’s behavior does not break any laws, next evaluate for truthfulness: “Is any deception involved?” Social bots may act deceitfully. For example, they can misrepresent themselves as human beings[2] or spread untruthful information (such as fake news). Deceiving acts communicate false or erroneous assertions, violating the prima facie duty of fidelity. Social bots should always act truthfully.[3] However, deceitful acts can be justifiable if the duty of fidelity is superseded by a higher-order duty, such as beneficence.[r] Deceptive, satirical actions may not be unethical since they elicit pleasure, improving the life of others. Consider Big Data Batman[s] as an illustration.

Photo: Items purchased by Random Darknet Shopper, an automated computer program designed as an online shopping system that would make random purchases on the deep Web. The robot would have its purchases delivered to a group of artists who then put the items in an exhibition in Switzerland; the robot was ‘arrested’ by Swiss police after it bought illegal drugs. Image courtesy of !Mediengruppe Bitnik.

[a] http://bit.ly/2uDfIbP
[b] http://cnnmon.ie/2uFR4XJ
[c] http://bit.ly/1ieIIXN
[d] http://read.bi/1LFQJFU
[e] http://bit.ly/1Ktz5kc
[f] http://tcrn.ch/2tKo90x
[g] http://bit.ly/2vneleU
[h] http://bit.ly/2vFRI4E
[i] http://ubm.io/1MbsSf3
[j] http://bit.ly/2ftn0It
[k] http://bit.ly/14bDiuN
[l] https://twitter.com/TayandYou
[m] http://bit.ly/14bDiuN
[n] http://bit.ly/2ttN5Ox
[o] http://bit.ly/2vFGdu9
[p] By “developer” we are referring to either the organization or management of the organization or the software developer involved in the creation of the social bot.
[q] http://bit.ly/2ud2cZC
[r] Beneficence is the duty to bring virtue, knowledge or pleasure to others; other duties, according to Ross 1930, include non-maleficence, self-improvement, justice, gratitude, reparation (see Mason et al.[7], p. 132–133).
[s] http://bit.ly/2ttNUH7

Figure. Bot Ethics: How to determine whether social bot actions are unethical. Flowchart labels: Social Bot Action; 1. Break Law? (if yes: Appeal to Majority?); 2. Involve Deception? (if yes: Higher Duty?); 3. Violate Strong Norm? (if yes: If Evil, Less than Good?); Justifiable? (yes: Not Unethical; no: Unethical).
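
The Bot Ethics figure (law, then deception, then strong norms, each followed by a justification check) can be read as a short decision procedure. The Python below is a minimal sketch of that reading, not the authors' implementation. The class name `BotAction`, its field names, and the function `evaluate` are hypothetical, and the sketch assumes that the first question answered "yes" settles the verdict through its own justification check. Every input is a human judgment reached through reflection and discourse; the code only records how the answers combine.

```python
from dataclasses import dataclass


@dataclass
class BotAction:
    """Answers a community has reached about one social bot action (hypothetical names)."""
    breaks_law: bool = False
    appeals_to_majority_morality: bool = False  # justification if a law is broken
    involves_deception: bool = False
    superseded_by_higher_duty: bool = False     # justification if deception is involved
    violates_strong_norm: bool = False
    evil_less_than_good: bool = False           # justification if a strong norm is violated


def evaluate(action: BotAction) -> str:
    """Return "unethical" or "not unethical", asking the three questions in order.

    Assumption: the first question answered "yes" decides the verdict via its
    justification check; later questions are then not consulted.
    """
    checks = [
        (action.breaks_law, action.appeals_to_majority_morality),
        (action.involves_deception, action.superseded_by_higher_duty),
        (action.violates_strong_norm, action.evil_less_than_good),
    ]
    for triggered, justifiable in checks:
        if triggered:
            return "not unethical" if justifiable else "unethical"
    # No law broken, no deception, no strong norm violated.
    return "not unethical"


# Random Darknet Shopper as the column describes it: a law was broken, but the
# act was judged justified according to the morality of the community.
shopper = BotAction(breaks_law=True, appeals_to_majority_morality=True)
print(evaluate(shopper))  # not unethical
```

Reducing each answer to a boolean is a simplification made for illustration; in the procedure itself, each answer is the outcome of community discussion rather than a value a program could compute.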

The social bot finds every tweet with the term big data, replaces “big data” with “Batman,” and then tweets the message as if it were its own. It obviously substitutes its words for others’ words, but the satire makes it difficult to judge its ethics. Because the social bot might insult and embarrass some big-data advocates, the community must go beyond the act (deontology) to consider its consequences (teleology), and ask whether potentially bad actions (for example, insult and embarrassment) outweigh, or supersede, the good (for example, pleasure through laughter) for the involved parties. Again, is the deception justifiable? Deception in the absence of supersession is likely to be unethical.

Violate Strong Norm?
Social bots that are legal and truthful can still behave unethically by violating strong norms that create more evil than good. Moral evils inflict “limits on human beings and contracts human life.”4 Evil restrains instead of emancipating; evil actions reduce opportunities. Let us go back to Tay’s racist comments on Twitter. Although not illegal (First Amendment protections apply), nor deceitful, they violated the strong norm of racial equality. Social media companies like Twitter that temporarily lock or permanently suspend accounts that “directly attack or threaten other people on the basis of race,”t have established that the moral evil of racism outweighs the moral good of free speech. By applying Bot Ethics to Twitter’s norms we conclude that Tay’s actions were unethical. Yet, there are cases where social bots may violate strong norms and not act unethically, as with asking inappropriate questions (what is your salary?). Such violations do not create moral evils.

Culpability of Unethical Social Bot Behavior
Should the general social media community blame developers for unethical behavior of their social bots? In the example of the algorithm that randomly generated that it wanted to kill people, who is responsible for the death threat? The programmer? Who is responsible for Tay’s remark about Hitler: Microsoft developers or those teaching the social bot to generate racist statements? Similarly, who is responsible for the social bot buying the illegal narcotics?

Aristotle1 said we can only assign culpability if we know that individuals behaved voluntarily and knowingly. Involuntary situations likely do not apply to social bots. Developers who are coerced into doing something unethical without a choice may not be entirely culpable, but in the case of free enterprise there is always a choice. Therefore, culpability rests on the knowledge of the developers. Developers who knowingly create social bots to engage in unethical actions are clearly culpable. They should be punished if evidence of their wrongdoing is convincing; the penalty must be consistent and proportional to the harm done and those affected should be compensated.7

But what about situations where developers act unknowingly? On those occasions the community must determine whether developers are culpably ignorant: did they ignore industry best practices in creating and testing their algorithms? If industry guidelines were not followed and the action was unethical, developers are culpable. However, developers who followed good development practices and incorporated the current industry thinking, and yet their social bot still acted unethically, deserve our pity and pardon, but they are not culpable. They should apologize, correct immediately, learn from their experience, and communicate the occurrence to the development community. For example, Microsoft posted its learning from Tay in blog form.u

Conclusion
We do not purport to write the last word on social bot ethics and culpability. Ethics is simply too complex a domain to deal with fully in such a format. Nevertheless, some readily accessible guidance rooted in sound ethical thinking is in order.

For example, with the recent attention to the role of social bots in spreading misinformation in the form of “fake news,” other social bots, such as Reuters News Tracer, are being created to ferret out such deceitful activity.v The Bot Ethics procedure can help the social media community understand when these deceitful actions are indeed unethical. It further helps to expand the focus of the community beyond narrow (that is, only deceitfulness) and simplistic (that is, good or bad bot) assessments of social bot activity to attend to the complexities of ethical assessments. In short, the Bot Ethics procedure serves as a starting point and guide for ethics-related discussion among various participants in a social media community, as they evaluate the actions of social bots.

t http://bit.ly/19SJwlt
u http://bit.ly/2tiPfMH
v http://bit.ly/2hIlfXG

References
1. Aristotle. Nicomachean Ethics of Aristotle. E.P. Dutton, NY, 1911.
2. Ferrara, E. et al. The rise of social bots. Commun. ACM 59, 7 (July 2016), 96–104; DOI: 10.1145/2818717
3. Gotterbarn, D., Miller, K. and Rogerson, S. Computer society and ACM approve software engineering code of ethics. Computer Society Connection, (1999), 84–88.
4. Grisez, G. and Shawn, R. Beyond the New Morality: The Responsibilities of Freedom. University of Notre Dame Press, Notre Dame, IN, 1980.
5. Habermas, J. The Theory of Communicative Action, Volume 1: Reason and the Rationalization of Society. 1985.
6. Kallman, E.A. and Grillo, J.P. Ethical Decision Making and Information Technology. McGraw-Hill, New York, NY, 1996.
7. Mason, R.O., Mason, F.M., and Culnan, M. Ethics of Information Management. Sage Publications, London, U.K.
8. Morstatter, F. et al. A new approach to bot detection: Striking the balance between precision and recall. ASONAM, 2016.
9. Rawls, J. The justification of civil disobedience. Arguing about Law (2013), 244–253.

Carolina Alves de Lima Salge (csalge@uga.edu) is a doctoral candidate at the University of Georgia.

Nicholas Berente (berente@uga.edu) is an associate professor at the University of Georgia.

Copyright held by authors.
viewpoints
DOI:10.1145/3126494 Peter J. Denning
The Profession of IT
Multitasking
Without Thrashing
Lessons from operating systems teach
how to do multitasking without thrashing.
Our individual ability to be productive has been hard stressed by the sheer load of task requests we receive via the Internet. In 2001, David Allen published Getting Things Done,1 a best-selling book about a system for managing all our tasks to eliminate stress and increase productivity. Allen claims that a considerable amount of stress comes our way when we have too many incomplete tasks. He views tasks as loops connecting someone making a request and you as the performer who must deliver the requested results. Getting systematic about completing loops dramatically reduces stress.

Allen says that operating systems are designed to get tasks done efficiently on computers. Why not export key ideas about task management into a personal operating system? He calls his operating system GTD, for Getting Things Done. The GTD system supports you in tracking open loops and moving them toward completion. It routes incoming requests to one of these destinations in your filing system:
˲˲ Trash
˲˲ Tasks that might one day turn out to be worth doing
˲˲ Tasks that serve as potential future reference points
˲˲ Tasks delegated to someone else, awaiting their response
˲˲ Tasks that can be completed immediately in under two minutes
˲˲ Tasks accepted for processing

The first four destinations basically remove incoming tasks from your workspace, the fifth closes quick loops, and the sixth holds your incomplete loops. GTD helps you keep track of these unfinished loops.

The idea of tasks being closed loops of a conversation between a requester and a performer was first proposed in 1979 by Fernando Flores.5 The “conditions of satisfaction” that are produced by the performer define loop completion and allow tracking the movement of the conversation toward completion. Incomplete loops have many negative consequences including accumulations of dissatisfaction, stress, and distrust.

Many people have found the GTD operating system to be very helpful at completing their loops, maintaining satisfaction with work, and reducing stress. It is a fine example of us taking lessons from technology to improve our lives.

Multitasking
Unfortunately, GTD does not eliminate another source of stress that was much less of a problem in 2001 than today. This is the problem of thrashing when you have too many tasks in progress at the same time.2

The term multitasking is used in operating systems to mean executing multiple computational processes simultaneously. The very first operating system to do this was the Atlas supervisor, running at the University of Manchester, U.K., in 1959. IBM brought the idea to the commercial world with its OS 360 in 1965.

Operating systems implement multitasking by cycling a CPU through a list of all incomplete tasks, giving each one a time slice on the CPU. If the task does not complete by the end of its time slice, the OS interrupts it and puts it on the end of the list. To switch the CPU context, the OS saves all the CPU registers of the current task and loads the registers of the new task. The designers set the time slice length long enough to keep the total context switch time insignificant. However, if the time slice is too short, the system can significantly slow down due to rapidly accumulating context-switching time.

When main memory was small, multitasking was implemented by loading only one task at a time. Thus, each context switch forced a memory swap: the pages of the running task were saved to disk, and then the pages of the new task loaded. Page swapping is extremely expensive. The 1965 era OSs eliminated this problem by combining multitasking with multiprogramming: the pages of all active tasks stay loaded in main memory and context switching involves no swapping. However, if too many tasks were activated, their allocations would be too small and they would page excessively, causing system throughput to collapse. Engineers called this thrashing, a shorthand for “paging to death.”

Eventually researchers discovered the root cause of thrashing and built control systems to eliminate it. I will return to this shortly.
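The trade-off between time-slice length and context-switch overhead described in the Multitasking section can be sketched with a small round-robin simulation. This is only an illustration under stated assumptions: the function name `round_robin`, the task lengths, and the constant switch cost of one time unit are hypothetical and do not come from the column.

```python
from collections import deque


def round_robin(task_lengths, time_slice, switch_cost):
    """Simulate time slicing; return (total_time, overhead_fraction).

    task_lengths: CPU time each task needs (arbitrary units, assumed values).
    time_slice:   longest a task may run before the OS interrupts it.
    switch_cost:  time charged per context switch (assumed constant).
    """
    queue = deque(task_lengths)
    useful = 0
    overhead = 0
    while queue:
        remaining = queue.popleft()
        run = min(remaining, time_slice)
        useful += run
        remaining -= run
        if remaining > 0:
            queue.append(remaining)  # unfinished task goes to the end of the list
        if queue:
            overhead += switch_cost  # saving and loading registers costs time
    total = useful + overhead
    return total, overhead / total


# Two hypothetical tasks of 10 units each, with a switch cost of 1 unit.
for time_slice in (10, 5, 1):
    total, fraction = round_robin([10, 10], time_slice, switch_cost=1)
    print(time_slice, total, round(fraction, 3))
```

In this toy setting the same 20 units of work finish in 21 time units with a slice of 10, in 23 with a slice of 5, and in 39 with a slice of 1, so the share of time lost to switching grows as the slice shrinks.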

Figure 1. In this memory map of a Firefox browser in Linux, the colored pixels indicate that a page (vertical axis) is used during a fixed-size execution interval (horizontal axis). The locality sets (pages used) are small compared to the whole address space and their use persists over extended intervals.
[Plot: vertical axis PAGES, 4000 to 7bb0; horizontal axis INSTRUCTIONS, 0 to 301365249 (376706 per pixel); page size 4096; legend: Instructions, Modify, Load, Store.]
(Figure 1 courtesy of Adrian McMenamin.)

Human Multitasking
Humans multitask too by juggling several incomplete tasks at once. Cognitive scientists and psychologists have studied human multitasking for almost two decades. Their main finding is that humans do not switch tasks well. Psychologist Nancy Napier illustrates with a simple do-it-yourself test.7 Write “I am a great multitasker” on line 1 and the series of numbers 1, 2, 3, …, 20 on line 2. Time how long it takes to do this. Now do it again, alternating one letter from line 1 and one numeral from line 2. Time how long it takes. For most people, the fine-grained multitasking in the second run takes over twice as long as the one-task-at-a-time first run. Moreover, you are likely to make more errors while multitasking. This test reveals just how slow our brains are at context switching. You can try the test a third time using time-slicing, for example writing five letters and then switching to write five numerals. With fewer context switches, time-slicing is faster than fine-grained multitasking but still slower than one-at-a-time processing.

Human context switching is more complicated than computer context switching. Whereas the computer context switch replaces a fixed number of bytes in a few CPU registers, the human has to recall what was “on the mind” at the time of the switch and, if the human was interrupted with no opportunity to choose a “clean break,” the human has to reconstruct lost short-term memory.

Context switching is not the only problem. Whereas a computer picks the next task from the head of a queue, your brain has to consider all the tasks and select one, such as the most urgent or the most important. The time to choose a next task goes up faster than linear with the number of tasks. Moreover, if you have several urgent important tasks, your brain can get stuck in a decision process that can take quite a long time to decide, a situation known as the choice uncertainty problem.4

A third factor that slows human multitasking is gathering the resources necessary to continue with a task. Some resources are physical such as books, equipment, and tools. Some are digital such as files, images, sounds, Web pages, and remote databases. And some are mental, things you have to remember about where you were in the task and what approach you were taking to perform it. All these resources must be close at hand so that you can access them quickly.

These three problems plague multitaskers of all age groups. Many studies report considerable evidence of negative effects: multitasking seems to reduce productivity, increase errors, increase stress, and exhaust us. Some researchers report that multitaskers are less likely to develop expertise in a topic because they do not get enough intensive focused practice with it. Some fret that if we do not learn to manage our multitasking well, we may wind up becoming a world of dilettantes with few experts to keep our technology running.

Thrashing happens to human multitaskers when they have too many incomplete tasks. They fall into a mood of “overwhelm” in which they experience considerable stress, cannot choose a next task to work on, and cannot stay focused on the chosen task. It can be a difficult state to recover from.

Let us now take a look at what OSs do to avoid thrashing and see what lessons we can take to avoid it ourselves.

Locality, Working Sets, and Thrashing
The OS seeks to allocate memory among multiple tasks so as to maximize system throughput, the number of completed tasks per second.3

The accompanying Figure 1 is strong graphical evidence of the principle of locality: computations concentrate their memory accesses to relatively small locality sets over extended intervals. Locality should be no surprise, because it reflects the way human designers approach tasks.

We use the term working set for the OS’s estimate of a task’s locality set. The formal definition is that the working set is the pages used in a backward-looking window of a fixed size T memory references. In Figure 1, T is the length of the sampling interval and the working set equals the locality set 97% of the time.

Each task needs a workspace, its own area of memory in which to load its pages. There are at least two ways to divide the total memory among the active tasks. In fixed partitioning, the OS gives each task a fixed workspace. In working-set partitioning, the OS gives each task a variable workspace that tracks its locality sets. Fixed partitioning is susceptible to thrashing as the number of tasks sharing memory increases because each gets a smaller workspace and, when the workspaces are smaller than the working sets, every task is quickly interrupted by a page fault.

Under working-set partitioning the OS sizes the workspaces to hold each task’s measured working set. As shown in Figure 2, it loads tasks into memory until the unused free space is too small to hold the next task’s working set; the remaining tasks are held aside in a queue until there is room for their working sets. When a task has a page fault, the new page is added to its workspace by taking a free page; when any page has not been used for T memory references, it is evicted from the task’s workspace and placed in the free space. Thus, the OS divides the memory among the active tasks such that each task’s workspace tracks its locality sets. Page faults do not steal pages from other working sets. This strategy automatically adjusts the load (number of active tasks) to keep throughput near its maximum and to avoid thrashing.

Figure 2. OS control system to maximize throughput with variable partition of main memory determined by task working sets.
[Diagram labels: incoming requests; accepted tasks; tasks put aside by OS; tasks awaiting activation; open valve when first waiting task’s WS fits into free; main memory (active tasks) holding free space and working sets; completed tasks.]

Context switching is not the cause of thrashing. The cause of thrashing is the failure to give every active task enough space for its working set, thereby causing excessive movement of pages between secondary and main memory.

Translation to Human Multitasking
Although the analogy with OSs is not perfect, there are some lessons:
˲˲ Recognize that each task needs a variable working set of resources (physical, digital, and mental), which must be easily accessible in your workspace. Analog: the working set of pages.
˲˲ Your capacity to deal with a task is the resources and time needed to get it done. Analog: the memory and CPU time needed for a task.
˲˲ Some tasks need to be held aside in an inactive status until you have the capacity to deal with them. Analog: the waiting tasks queue.
˲˲ When a task’s working set is in your workspace, protect it from being unloaded as long as the task is active. Analog: protect working sets of active tasks and do not steal from other tasks.
˲˲ You will thrash if you activate too many tasks so that the total demand is beyond your capacity. Analog: insufficient CPU and memory for active tasks.
˲˲ If you are able to choose moments of context switch, select a moment of “clean break” that requires little mental reacquisition time when you return to the task. If you cannot defer an interruption to such a moment, you will need more reacquisition time because you will have to reconstruct short-term memory lost at the interruption. Analog: ill-timed interrupts can cause loss of part of a working set.

You are likely to find that you cannot accommodate more than a few active tasks at once without thrashing. However, with the precautions described here, thrashing is unlikely. If it does occur you will feel overwhelmed and your processing efficiency will be badly impaired. To exit the thrashing state, you need to reduce demand or increase your capacity. You can do this by reaching out to other people: making requests for help, renegotiating deadlines, acquiring more resources, and in some cases canceling less important tasks.

References
1. Allen, D. Getting Things Done. Penguin, 2001.
2. Christian, B. and Griffiths, T. Algorithms to Live By: The Computer Science of Human Decisions. Henry Holt and Company, 2016.
3. Denning, P. Working sets past and present. IEEE Trans. Software Engineering SE-6, 1 (Jan. 1980), 64–84.
4. Denning, P. and Martell, C. Great Principles of Computing. MIT Press, 2015.
5. Flores, F. Conversations for Action and Collected Essays. CreateSpace Independent Publishing Platform, 2012.
6. McMenamin, A. Applying working set heuristics to the Linux kernel. Masters Thesis, Birkbeck College, University of London, 2011; http://bit.ly/2vFSgY8
7. Napier, N. The myth of multitasking, 2014; http://bit.ly/1vuBGcC

Peter J. Denning (pjd@nps.edu) is Distinguished Professor of Computer Science and Director of the Cebrowski Institute for information innovation at the Naval Postgraduate School in Monterey, CA, is Editor of ACM Ubiquity, and is a past president of ACM. The author’s views expressed here are not necessarily those of his employer or the U.S. federal government.

Copyright held by author.
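The working-set definition and the load control described in the column above (activate tasks only while the next waiting task's working set fits into free memory) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `working_set` and `admit_tasks`, the page-reference trace, and the working-set sizes are all hypothetical, and the sketch ignores page eviction over time.

```python
def working_set(references, t, window):
    """Return the pages referenced in the backward-looking window of
    `window` memory references ending at position t (inclusive)."""
    start = max(0, t - window + 1)
    return set(references[start:t + 1])


def admit_tasks(working_set_sizes, memory_pages):
    """Activate waiting tasks in queue order until the first waiting
    task's working set no longer fits into free memory."""
    waiting = list(working_set_sizes)
    active = []
    free = memory_pages
    while waiting and waiting[0] <= free:
        size = waiting.pop(0)
        active.append(size)
        free -= size
    return active, waiting, free


# Hypothetical page-reference trace: locality shifts from pages {1, 2, 3} to {7, 8}.
trace = [1, 2, 1, 3, 1, 2, 7, 7, 8, 7]
print(working_set(trace, t=5, window=4))  # pages used by references 2..5
print(working_set(trace, t=9, window=4))  # pages used by references 6..9

# Hypothetical working-set sizes (in pages) of four queued tasks; 100 pages of memory.
active, waiting, free = admit_tasks([30, 50, 40, 10], memory_pages=100)
print(active, waiting, free)
```

With these assumed numbers the first two tasks are activated, 20 pages stay free, and the 40-page task is held aside with the 10-page task queued behind it. Admitting strictly in queue order is a simplification chosen to mirror the valve in Figure 2; a real OS may use a different policy.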

viewpoints
DOI:10.1145/3126156 Gregorio Convertino and Nancy Frishberg
Viewpoint
Why Agile Teams Fail
Without UX Research
Failures to involve end users or to collect comprehensive data representing
user needs are described and solutions to avoid such failures are proposed.
[Comic strip: DILBERT © 2012 Scott Adams. Used by permission of Andrews McMeel Syndication. All rights reserved.]

Lessons learned by two user researchers in the software industry point to recurrent failures to incorporate user experience (UX) research or design research. This leads agile teams to miss the mark with their products because they neglect or mischaracterize the target users’ needs and environment. While the reported examples focus on software, the lessons apply equally well to the development of services or tangible products.

Why It Matters to the ACM Community
Over the past 15 years, agile and lean product development practices have increasingly become the norm in the IT industry.3 At the same time, two synergistic trends have also emerged.
˲˲ End users’ demand for good user experience has increased significantly, with wide adoption of mobile devices. Any new application needs to do something useful or fun, plus it needs to do it well and fast enough. In 2013, technology analysts found that only 16% of people tried a new mobile app more than twice, suggesting that users have low tolerance for poor user experience (UX) (where UX is the totality of user’s interactions supported by the app to accomplish a goal).9
˲˲ With growing emphasis on good UX design, UX professionals, both designers and researchers, are gradually being incorporated as required roles in software development, alongside product managers and software developers. A 2014 Forrester survey of 112 companies found that organizations in which there was systematic investment in UX design process and user research self-evaluated as having greater impact than those with more limited scope of investment.

These trends describe a new context that often finds agile teams unprepared for two main reasons. First, while the agile process formally values the principle of collaboration with customers to define the product vision, we and our colleagues in industry too often observe this principle not being put into practice: teams do not validate requirements systematically in the settings of use. Second, even when customers are involved, sometimes the teams may still fail to involve actual end users. As Rosenberg puts it, when user requirements are not validated but are still called “user stories,” it creates “the illusion of user requirements” that fools the team and the executives, who are then mystified when the product fails in the marketplace.10

In this Viewpoint, we illustrate five classic examples of failures to involve actual end users or to gather sufficiently comprehensive data to represent their needs. Then we propose how these failures can be avoided.

Five Cases of Neglect or Mischaracterizations of the User
We identified five classic cases of failures to involve actual end users.

The Wild West case. The first and most obvious case occurs when the team does not do regular testing with the users along the development process. Thus the team fails to evaluate how well the software built fits target users, their tasks, and their environments. A real-life example of this failure is the development and deployment of HealthCare.gov, where the team, admittedly, did not fully test the online health insurance marketplace until two weeks before it opened to the public on October 1, 2013. Then the site ran into major failures.8

Chooser ≠ target user. The second case is neither new nor unique to agile. The term “customer” conflates the chooser with the user. Let’s unpack these words:
˲˲ A customer is often an organization (the target buyer of enterprise software, that is, product chooser) as represented by the purchasing officer, an executive or committee that makes a buying decision.
˲˲ A customer is the target user only for consumer-facing products. For enterprise software, target users may be far from the process of choosing a product, and have no input about products the organization selects.

Agile terminology adds to the confusion: product teams write user stories from the perspective of the person who uses the software, not the one who chooses it. Then a customer demo (or stakeholder review) at the end of an iteration confirms that each user story is satisfied. Here is when the terms customer and user are conflated. For enterprise software and large systems, practice teaches us that often the “end-of-iteration customer” is someone representing the product chooser rather than the end user.

So the end-of-iteration demo cannot be the sole form of feedback to predict user adoption and satisfaction. In addition, the software development team should also leverage user research to answer questions such as:
˲˲ What are the classes of users (personas)?
˲˲ Have we validated that the intended users have the needs specified in the user stories?
˲˲ What are the current user practices before the introduction of the product and the impact afterward?
˲˲ How would we extend the tool to support new personas or future use cases?

Internal proxies ≠ target user. The third case is about bias. Some teams work with their in-house professional services or sales support staff (that is, experts thought to represent large groups of customers) as proxies for end users. While we appreciate the expertise and knowledge these resources bring, we are wary of two common types of misrepresentation in these situations.

First, internal proxies are unrepresentative as end users because they have multiple unfair advantages: they know the software inside out, including the work-arounds; have access to internal tools unavailable to external customers; and do not need to use the product within the target users’ time constraints or digital environment.

Second, the evidence internal proxies bring to the team is also biased. Professional sales and support staff are more likely to channel the needs of the largest or most strategic existing customers in the marketplace. They are more likely to focus on pain points of existing customers and less on what works well. Also, they may ignore new requirements that are not yet addressed by the current tool or market.

Therefore internal staff cannot be the sole representative of “users,” as shown in the “Dilbert” comic strip at the beginning of this column. User research welcomes their comments about competitive analysis, current insights about information architecture or other issues, which complement customer support data, UX research, and other sources of user feedback.

Executives liking sales demos ≠ target users adopting product. Enterprise software companies, during their annual customer conferences, use a sales demo to portray features and functions intended to excite the audience of buyers, investors, and the market analysts about the company strategy. However, positive responses to the sales demos should not be taken as equivalent to assertions about a product’s user requirements. Instead, these requirements need confirmation via a careful validation cycle. Let sales demos open a door toward users with the help of choosers and influencers.

Similarly, Customer Advisory Boards (which draw from customers who have large installations, or who represent a specific or important segment of the market) stand in for all customers and offer additional opportunities to showcase future features or strategy. However, a basic law for success in the software industry is “Build Once, Sell Many.”7 This principle creates an inherent tension between satisfying current customers and attracting new ones. Therefore, a software company needs to constantly rethink its tiered offerings to include new market segments or customer classes as these emerge, and avoid one-off development efforts.

Confusing business leaders with users or the sales demo with the product prototype leads companies to build products based on what sales and product managers believe is awesome (for example, see Loranger6). Instead, we advocate validating the designs with actual end users during the product development.

Big data (What? When?) < The full picture (... How? Why?). Collecting and analyzing big data about digital product use is popular among product managers and even software developers, who can now learn what features get traction with users. We support the use of big data techniques as part of user research and user-centered design, but not as a substitute for qualitative user research. Let’s review two familiar ways to use big data on usage: user data analytics and A/B testing.

User data analytics can quickly answer questions about current usage: quantity and most frequent patterns, such as How many? How often? When? Where? Once a product team has worked out most of the design (interaction patterns, page layouts, and more), A/B testing compares design alternatives, such as “which image on a page produces more click-throughs”? In vivo experiments with sufficient traffic can generate large amounts of useful data. Thus, A/B testing is very helpful for small incremental adjustments.

Every software company is in the business of finding and keeping new customers. Suppose the logs show the subscribers of an online dating application are not renewing. Should the company rejoice or despair? If people are getting good matches, and thus are satisfied, non-renewal implies success. If they are hopelessly disappointed by not getting dates, non-renewal implies failure. Big data won’t tell you which, but observing and listening to even a handful of non-renewing individuals will.

In brief, quantitative data is useful but has two limitations: First, it will not tell the team why the current features are or are not used.5 Different classes of users can have different reasons. Second, it will not identify what additional or alternative features appeal to a new class of users unfamiliar with the product. To answer these questions the team needs to rely on qualitative research with existing and proposed classes of users.

Market Research ≠ User Research
Finally, we point to the growing and worrisome tendency in industry to mix up user research with market research. Market research groups make great partners for user research. While user research and market research have a few techniques in common (for example, surveys and focus groups), the goals and variables they focus on are different.
˲˲ Market research seeks to understand attitudes toward products, categories, or brands, and tries to predict the likelihood of purchase, engagement, or subscription.
˲˲ User research aims at improving the user experience by understanding the relation between actual usage behaviors and the properties of the design. To this end, it measures the behavior and attitudes of users, thereby learning whether the product (or service) is usable, useful and delightful, including after the decision to purchase.

We urge organizations to act strategically and connect market research, user research, and customer success functions. This requires aligning goals and sharing data among Marketing, Sales, Customer Success, and the UX Team (typically in Product or R&D).1,4

The Way Forward: Educate Managers and Agile Development Teams
We have shown five different ways that agile teams without user research are prone to building the wrong product. To avoid such failures, we invite software managers and product teams to assess and fill the current gap in a team’s competencies. The closing table gives short-term and longer-term action items to address the gaps.

Actions to address gaps in UX competencies.

Short term
1. Analyze the current skills of the team and flag the gap. A functional product team needs several key skill sets or UX competencies: UX research, UX design, UI software development and prototyping.11 These might be filled by training the current team members or hiring UX professionals full-time or part-time.
2. Support product managers (or product owners) with investment in UX. Too often, product managers find their role is a sort of “kitchen sink” for any task that is not software development. We encourage product managers to find additional resources in the UX competencies, to benefit both product and their workload.

Longer term
3. Integrate UX competencies
a. Teams need UX research competencies as well as UX design skills (interaction, visual). Other related skill sets include content development and documentation; accessibility; globalization and localization.
4. Collect and prioritize findings from user research
a. Seek user feedback early and often.
b. Create channels to learn from end users and appropriate surrogates.
c. Prioritize UX issues during backlog grooming; remove friction and measure delight.
d. Build new features only after steps 4.a.–c. are done for each key version of the product.

References
1. Buley, L. The modern UX organization. Forrester Report (2016); https://vimeo.com/121037431
2. Grudin, J. From Tool to Partner: The Evolution of Human-Computer Interaction. Morgan & Claypool, 2017.
3. HP report. Agile Is the New Normal: Adopting Agile Project Management. 4AA5-7619ENW, May 2015.
4. Kell, E. Interview by Steve Portigal. Portigal blog. Podcast and transcript (Mar. 1, 2016); http://www.portigal.com/podcast/10-elizabeth-kell-of-comcast/
5. Klein, L. UX for Lean Startups: Faster, Smarter User Experience Research and Design. O’Reilly, 2013.
6. Loranger, H. UX Without User Research Is Not UX. Nielsen Norman Group blog (Aug. 10, 2014); http://www.nngroup.com/articles/ux-without-user-research/
7. Mironov, R. Four Laws Of Software Economics. Part 2: Law of Build Once, Sell Many (Sept. 14, 2015); http://www.mironov.com/4law2/
8. Pear, R. Contractors Describe Limited Testing of Insurance Web Site. New York Times (Oct. 24, 2013); http://nyti.ms/292NryG
9. Perez, S. Users have low tolerance for buggy apps. Techcrunch (Mar. 12, 2013); http://tcrn.ch/Y80ctA
10. Rosenberg, D. Introducing the business of UX. Interactions. Forums. XXI.1 Jan.–Feb. 2014.
11. Spool, J.M. Assessing your team’s UX skills. UIE (Dec. 10, 2007); https://www.uie.com/articles/assessing_ux_teams/

Gregorio Convertino (gconvertino@informatica.com) is a UX manager and principal user researcher at Informatica LLC.

Nancy Frishberg (nancyf@acm.org) is a UX researcher and strategist, in private practice, and a 25+-year member of the local SIGCHI Chapter BayCHI.org.

Copyright held by authors.
viewpoints
DOI:10.1145/3012006 Andrew Conway and Peter Eckersley
Viewpoint
When Does Law Enforcement’s Demand to Read Your Data Become a Demand to Read Your Mind?
On cryptographic backdoors and prosthetic intelligence.
The recent dispute between
the FBI and Apple has raised
a potent set of questions
about companies’ right to
design strong cryptographic
protections for their customers’ data.
The real stakes in these questions are
not just whether the security of our de-
vices should be weakened to facilitate
FBI investigations, but ultimately, the
ability of law enforcement and intelli-
gence agencies to read our minds and
most intimate private thoughts.
In the U.S. and other countries,
there have been many legal cases in
recent years pitting the demands of
law enforcement against the concerns
of technology companies and privacy
advocates over access to new, tech-
nologically generated, information
about people. The disputed topics
have included spy agencies’ bulk collection of Internet traffic and mobile phone metadata; law enforcement use of location-tracking devices, malware, and fake cellphone towers; the constitutionality of “gag orders” that make it a crime for individuals and companies to ever discuss certain requests they receive for others’ data.
In some sense, this is not a new debate; the Fourth Amendment to the U.S. constitution, for instance, has engendered a long history of litigation about the boundaries between types of information that the police can obtain about people simply by demanding it with letters called subpoenas, and information for which a court-issued warrant is necessary. What has changed are the stakes of these disputes.
As the law has operated in the past, almost any information was theoretically fair game for law enforcement to demand if it had probable cause and obtained a warrant. But there was not nearly as much to collect: people did not carry recording and tracking devices with them everywhere, and they did not turn over the most intimate details of their lives to multinational technology companies. There were

also legal limits: the private thoughts of defendants were largely protected by rights to remain silent and against self-incrimination, historical legal protections that sprang up as shields against religious persecution. Unfortunately, changes to our lifestyles, to our relationship with technology, and to the very process of human cognition are making these protections so impractical that they may cease to exist at all.
So, what do we mean by changes to the process of human cognition?
Pens and paper are wonderful things. “Hang on. Let me write that down,” or “I need a pen and paper to work this out,” are the kinds of utterances that reveal our dependence. It is intelligence that makes us human, and a pen and paper magnifies our intelligence.
If you doubt this, consider any reasonable method of measuring intelligence. A human with a pen and paper will perform at least as well as, and often much, much better than, the same human without a pen and paper. So it would be reasonable to state that the pen and paper constituted a prosthetic component of our intelligence, or at least a prosthetic aid for our imperfect memory.
Furthermore, to read someone else’s notes is often described as a window into their mind. Reading someone else’s diary without their permission seems not only to be a violation of privacy but perhaps a form of taboo mind reading.
Now consider the same human having access to Google, Wikipedia, GPS, a calculator, a mobile phone to communicate with friends and colleagues, and indeed the whole Internet. As long as cat videos are not too much of a distraction, this well-resourced human can answer hard questions and perform many difficult tasks much more quickly than people even two decades earlier.
As hunters, weapons were prosthetic claws. As gatherers, baskets were prosthetic arms. After the development of agriculture, horses and plows were huge prosthetic muscles. Later the industrial revolution made us physically strong to a level unimaginable beforehand. And looking back, the invention of writing was the first step on the road to a modern existence built on prosthetic intelligence, one where the states we share through the Internet and the financial system are becoming more important than the biological and physical environment around us.
But this has come at a complicated price. You can think faster and more accurately, but your electronic devices know where you are, where you have been, who you have talked to, what you said, what your heart rate was at the time, what you have looked at on the Web, what medication you are taking, what you have bought, what maps you have looked up, what spelling mistakes you make, and it is only accelerating. With virtual reality and augmented reality looking imminent, gadgets will begin to log almost every action we take. And we have no choice but to pour our minds out if we want to exist and perform at the same level as the humans around us.
Ignoring arguments about precise definitions of words, it is clear that many humans in the developed world have a lot of their thoughts happening, or at least observable, outside of their brain, and this is only likely to increase in the future. It is through this lens that we need to understand the importance of Apple’s fight to use encryption to protect some (presently very small) portions of its customers’ data so that Apple (and transitively, the FBI) cannot read it. The FBI wants to be able to turn over literally every digital stone in its investigation. But in the era of prosthetic intelligence, that is equivalent to outlawing strong privacy for any corners of the modern human mind.

Calendar of Events

September 2
APSys ’17: 8th Asia-Pacific Workshop on Systems, Mumbai, India
Sponsored: ACM/SIG
Contact: Purushottam Kulkarni
Email: puru@cse.iitb.ac.in

September 3–9
ICFP ’17: ACM SIGPLAN International Conference on Functional Programming, Oxford, U.K.
Sponsored: ACM/SIG
Contact: Jeremy Gibbons
Email: jeremy.gibbons@cs.ox.ac.uk

September 4–7
DocEng ’17: ACM Symposium on Document Engineering 2017, Valletta, Malta
Sponsored: ACM/SIG
Contact: Kenneth P. Camilleri
Email: kenneth.camilleri@um.edu.mt

September 4–7
MobileHCI ’17: 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, Vienna, Austria
Sponsored: ACM/SIG

September 4–8
ESEC/FSE ’17: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Paderborn, Germany
Sponsored: ACM/SIG
Contact: Wilhelm Schaefer
Email: wilhelm@uni-paderborn.de

September 6–8
WomEncourage ’17: ACM-W Europe womENcourage Celebration of Women in Computing, Barcelona, Spain
Sponsored: ACM/SIG
Contact: Núria Castell Ariño
Email: castell@lsi.upc.edu
Where is this heading? Consider a future technological innovation: a brain reader. It is a little device that you attach to your skull that lets someone read your thoughts. This could be a great boon to law enforcement. Trials could be conducted more accurately by reading the thoughts of the defendant. Even better, everyone could be required to daily attend a mind reading to make sure they are not plotting any criminal acts. This would significantly cut down on premeditated crime, making our lives safer. Then we can concentrate on unpremeditated crime. Possibly there are some thoughts that people who are likely to commit unpremeditated crimes might think. We can proscribe those thoughts, and then preemptively arrest people for thought crime. While we are at it, the morality police can put in laws against thinking racist, sexist, extremist, sacrilegious, offensive, or fattening thoughts.
While such an extreme society may have a low crime rate, some people (including us) may think this police state would not actually be a better society to live in. Even ignoring the horrors that would result from imperfect readings, who doesn’t feel guilty about something? As attributed to Cardinal Richelieu, “If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.” Such devices do not exist yet, although the demand has been strong enough that polygraphs, notorious for unreliability, are widely used in the U.S. Other technologies like fMRI are already being used and may turn out to be slightly more accurate than polygraphs, but we are still some distance from having to worry about the societal effects of active mind-reading machines.
What we have instead is a society moving toward prosthetic brains that can be monitored at all times by the state, without the inconvenience of having to have everyone check in each day at the police station. It may feel less invasive to have one’s eye movements recorded by your augmented reality glasses when an attractive member of the opposite sex walks past than to have a daily visit to the mind reader. The former is certainly more convenient than the latter. But practically speaking, the effects are the same.
The available information is not complete, and there will be gaps. But you can infer an awful amount with limited data. Think about how well you know your friends, and how you can often predict what decisions they will make, with only the small view of their world that you get from your interactions with them. With access to a vast store of reference information massive deductions can be made.
Conversely, the possibility of faulty deductions is itself a threat to individuals. You would not want to have performed Internet searches for pressure cookers and backpacks just before the Boston marathon bombings.
Dedicated, well-meaning people in law enforcement naturally want to be able to do their jobs better and make the world a safer, and thus better, place. They see the new data as a boon, and law enforcement agencies select extremely unphotogenic criminals and terrorists as the test cases that will set the rules for millions of other people. Unfortunately, while this surveillance apparatus may occasionally be useful, it also poses a structural threat to democracy.
Even beyond the threat of police states in the Western world and elsewhere, there is a fundamental issue with cryptography: mathematics works the same regardless of whether you are naughty or nice. So if the state can break cryptography then so can other actors. There are obvious direct applications to crime: knowing when someone is away from home; knowing who is worth kidnapping and what their movements are; identity theft, bank fraud, and so forth. But ineffective cryptography also strengthens the black market for industrial espionage: many people would pay to know the thoughts of their competitors, people they are negotiating with, or even people they are considering going on a date with.
Of course the state is not the only institution that wants to read your mind. There is great value to corporations in knowing about you. They collect this data from phone apps and operating systems, credit cards, and web browsers; they use it to help design their products, but also for targeted advertising, differential pricing, and other debatable purposes. People joke, semi-seriously, that Google knows you better than you know yourself. As well as being a threat in their own right, corporations provide an additional target of attack for an intrusive state: as Snowden’s leaks revealed, the NSA didn’t try to track the location of every cellphone on the planet directly: they let advertisements and tracking code in apps collect the data for them.
Ultimately, the question of what to do about the data accumulated by technology companies is different from the question of what to do about the FBI, but it should also be understood that we have largely given these companies the power to read our minds, and might want to find alternatives to that arrangement.
We fear we are slowly moving toward the era of universal mind monitoring without having recognized and considered it in those terms. And those are the terms in which we should understand battles about the right to use effective cryptography. That wonderful gadget in your pocket is not a phone. It is a prosthetic part of your mind, which happens to also be able to make telephone calls. We need to think of it as such, and ask again which parts of our thoughts should be categorically shielded against prying by the state.

Andrew Conway (andrewed@greatcactus.org) is an engineer and mostly retired entrepreneur. He founded and ran Silicon Genetics.
Peter Eckersley (pde@eff.org) is Chief Computer Scientist for the Electronic Frontier Foundation, San Francisco, CA.

Copyright held by authors.

practice
DOI:10.1145/3080202
Article development led by queue.acm.org
The Calculus of Service Availability
You’re only as available as the sum of your dependencies.
BY BEN TREYNOR, MIKE DAHLIN, VIVEK RAU, AND BETSY BEYER

As detailed in Site Reliability Engineering: How Google Runs Production Systems1 (hereafter referred to as the SRE book), Google products and services seek high-velocity feature development while maintaining aggressive service-level objectives (SLOs) for availability and responsiveness. An SLO says that the service should almost always be up, and the service should almost always be fast; SLOs also provide precise numbers to define what “almost always” means for a particular service. SLOs are based on the following observation:
The vast majority of software services and systems should aim for almost-perfect reliability rather than perfect reliability (that is, 99.999% or 99.99% rather than 100%) because users cannot tell the difference between a service being 100% available and less than “perfectly” available. There are many other systems in the path between user and service (laptop, home WiFi, ISP, the power grid ...), and those systems collectively are far less than 100% available. Thus, the marginal difference between 99.99% and 100% gets lost in the noise of other unavailability, and the user receives no benefit from the enormous effort required to add that last fractional percent of availability. Notable exceptions to this rule include antilock brake control systems and pacemakers!
For a detailed discussion of how SLOs relate to SLIs (service-level indicators) and SLAs (service-level agreements), see the “Service Level Objectives” chapter in the SRE book. That chapter also details how to choose metrics that are meaningful for a particular service or system, which in turn drives the choice of an appropriate SLO for that service.
This article expands upon the topic of SLOs to focus on service dependencies. Specifically, we look at how the availability of critical dependencies informs the availability of a service, and how to design in order to mitigate and minimize critical dependencies.
Most services offered by Google aim to offer 99.99% (sometimes referred to as the “four 9s”) availability to users. Some services contractually commit to a lower figure externally but set a 99.99% target internally. This more stringent target accounts for situations in which users become unhappy with service performance well before a contract violation occurs, as the number one aim of an SRE team is to keep users happy. For many services, a 99.99% internal target represents the sweet spot that balances cost, complexity, and availability. For some services, notably global cloud services, the internal target is 99.999%.

99.99% Availability: Observations And Implications
Let’s examine a few key observations about and implications of designing and operating a 99.99% service and then move to a practical application.
Observation 1. Sources of outages. Outages originate from two main sources: problems with the service itself and problems with the service’s critical dependencies. A critical dependency is one that, if it malfunctions, causes a corresponding malfunction in the service.
Observation 2. The mathematics of availability. Availability is a function of the frequency and the duration of outages. It is measured through:
˲˲ Outage frequency, or the inverse: MTTF (mean time to failure).
˲˲ Duration, using MTTR (mean time to repair). Duration is defined as it is experienced by users: lasting from the start of a malfunction until normal behavior resumes.
Thus, availability is mathematically defined as MTTF/(MTTF+MTTR), using appropriate units.
Implication 1. Rule of the extra 9. A service cannot be more available than the intersection of all its critical dependencies. If your service aims to offer 99.99% availability, then all of your critical dependencies must be significantly more than 99.99% available.
Internally at Google, we use the following rule of thumb: critical dependencies must offer one additional 9 relative to your service (in the example case, 99.999% availability) because any service will have several critical dependencies, as well as its own idiosyncratic problems. This is called the “rule of the extra 9.”
If you have a critical dependency that does not offer enough 9s (a relatively common challenge!), you must employ mitigation to increase the effective availability of your dependency (for example, via a capacity cache, failing open, graceful degradation in the face of errors, and so on).
Implication 2. The math vis-à-vis frequency, detection time, and recovery time. A service cannot be more available than its incident frequency multiplied by its detection and recovery time. For example, three complete outages per year that last 20 minutes each result in a total of 60 minutes of outages. Even if the service worked perfectly the rest of the year, 99.99% availability (no more than 53 minutes of downtime per year) would not be feasible.
This implication is just math, but it is often overlooked, and can be very inconvenient.
Corollary to implications 1 and 2. If your service is relied upon for an availability level you cannot deliver, you should make energetic efforts to correct the situation, either by increasing the availability level of your service or by adding mitigation as described earlier. Reducing expectations (that is, the published availability) is also an option, and often it is the correct choice: make it clear to the dependent service that it should either reengineer its system to compensate for your service’s availability or reduce its own tar-
Key Definitions
Some of the terms and concepts used throughout this article may not be familiar to readers who don’t specialize in operations.

Capacity cache: A cache that serves precomputed results for API calls or queries to a service, generating cost savings in terms of compute/IO resource needs by reducing the volume of client traffic hitting the underlying service.
Unlike the more typical performance/latency cache, a capacity cache is considered critical to service operation. A drop in the cache hit rate or cache ratio below the SLO is considered a capacity loss. Some capacity caches may even sacrifice performance (for example, redirecting to remote sites) or freshness (for example, CDNs) in order to meet hit rate SLOs.

Customer isolation: Isolating customers from each other may be advantageous so that the behavior of one customer doesn’t impact other customers. For example, you might isolate customers from one another based on their global traffic. When a given customer sends a surge of traffic beyond what they’re provisioned for, you can start throttling or rejecting this excess traffic without impacting traffic from other customers.

Failing safe/failing open/failing closed: Strategies for gracefully tolerating the failure of a dependency. The “safe” strategy depends on context: failing open may be the safe strategy in some scenarios, while failing closed may be the safe strategy in others.

Failing open: When the trigger normally required to authorize an action fails, failing open means to let some action happen, rather than making a decision. For example, a building exit door that normally requires badge verification “fails open” to let you exit without verification during a power failure.

Failing closed is the opposite of failing open. For example, a bank vault door denies all attempts to unlock it if its badge reader cannot contact the access-control database.

Failing safe means whatever behavior is required to prevent the system from falling into an unsafe mode when expected functionality suddenly doesn’t work. For example, a given system might be able to fail open for a while by serving cached data, but then fail closed when that data becomes stale (perhaps because past a certain point, the data is no longer useful).

Failover: A strategy that handles failure of a system component or service instance by automatically routing incoming requests to a different instance. For example, you might route database queries to a replica database, or route service requests to a replicated server pool in another datacenter.

Fallback: A mechanism that allows a tool or system to use an alternative source for serving results when a given component is unavailable. For example, a system might fall back to using an in-memory cache of previous results. While the results may be slightly stale, this behavior is better than outright failure. This type of fallback is an example of graceful degradation.

Geographic isolation: You can build additional reliability into your service by isolating particular geographic zones to have no dependencies on each other. For example, if you separate North America and Australia into separate serving zones, an outage that occurs in Australia because of a traffic overload won’t also take out your service in North America. Note that geographic isolation does come at increased cost: isolating these geographic zones also means that Australia cannot borrow spare capacity in North America.

Graceful degradation: A service should be “elastic” and not fail catastrophically under overload conditions and spikes; that is, you should make your applications do something reasonable even if not all is right. It is better to give users limited functionality than an error page.

Integration testing: The phase in software testing in which individual software modules are combined and tested as a group to verify that they function correctly together. These “parts” may be code modules, individual applications, client and server applications on a network, among others. Integration testing is usually performed after unit testing and before final validation testing.

Operational readiness practice: Exercises designed to ensure the team supporting a service knows how to respond effectively when an issue arises, and that the service is resilient to disruption. For example, Google performs disaster-recovery test drills continuously to make sure that its services deliver continuous uptime even if a large-scale disaster occurs.

Rollout policy: A set of principles applied during a service rollout (a deployment of any sort of software component or configuration) to reduce the scope of an outage in the early stages of the rollout. For example, a rollout policy might specify that rollouts occur progressively, on a 5%/20%/100% timeline, so that a rollout proceeds to a larger portion of customers only when it passes the first milestone without problems. Most problems will manifest when the service is exposed to a small number of customers, allowing you to minimize the scope of the damage. Note that for a rollout policy to be effective in minimizing damage, you must have a mechanism in place for rapid rollback.

Rollback: This is the ability to revert a set of changes that have been previously rolled out (fully or not) to a given service or system. For example, you can revert configuration changes or run a previous version of a binary that’s known to be good.

Sharding: Splitting a data structure or service into shards is a management strategy based on the principle that systems built for a single machine’s worth of resources don’t scale. Therefore, you can distribute resources such as CPU, memory, disk, file handles, and so on across multiple machines to create smaller, faster, more easily managed parts of a larger whole.

Tail latency: When setting a target for the latency (response time) of a service, it is tempting to measure the average latency. The problem with this approach is that an average that looks acceptable can hide a “long tail” of very large outliers, where some users may experience terrible response times. Therefore, the SRE best practice is to measure and set targets for 95th- and/or 99th-percentile latency, with the goal of reducing this tail latency, not just average latency.
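
The tail latency definition above can be illustrated with a small sketch. It assumes a nearest-rank percentile and made-up latency samples; the helper name `percentile` and the sample values are hypothetical and do not come from the article or the SRE book.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (illustrative helper)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: most requests are fast,
# a few are very slow.
latencies_ms = [50] * 90 + [100] * 8 + [3000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p95={p95_ms} ms, p99={p99_ms} ms")
```

With these made-up samples the mean looks moderate while the 99th percentile exposes the slow outliers, which is the reason for setting targets on percentiles rather than on the average.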

practice
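
The availability definition under Observation 2 and the budget arithmetic of the Practical Application section can be reproduced with a short calculation. This is a minimal sketch: the function and variable names are assumptions made for illustration, and the rounding mirrors the whole-minute figures quoted in the article (53, 26, 27, and 17 minutes).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, based on a 365-day year


def availability(mttf_minutes, mttr_minutes):
    """Availability = MTTF / (MTTF + MTTR), both in the same units."""
    return mttf_minutes / (mttf_minutes + mttr_minutes)


def downtime_budget_minutes(target):
    """Minutes of downtime per year allowed by an availability target."""
    return (1 - target) * MINUTES_PER_YEAR


# Implication 2: three complete 20-minute outages in a year.
outages, outage_minutes = 3, 20
mttf = (MINUTES_PER_YEAR - outages * outage_minutes) / outages
three_outage_availability = availability(mttf, outage_minutes)

# Practical Application: a 99.99% service with five 99.999% dependencies.
total_budget = round(downtime_budget_minutes(0.9999))             # 53 minutes
dependency_budget = round(5 * downtime_budget_minutes(0.99999))   # 26 minutes
own_budget = total_budget - dependency_budget                     # 27 minutes

# One full outage plus three outages that each hit one of five shards.
aggregate_impact = 1 * 1.0 + 3 * 0.2                              # 1.6
minutes_per_outage = round(own_budget / aggregate_impact)         # 17 minutes
# Subtract 2 minutes of detection and 5 minutes of on-call response time.
mitigation_minutes = minutes_per_outage - 2 - 5                   # 10 minutes

print(three_outage_availability < 0.9999, own_budget, mitigation_minutes)
```

The three-outage case comes out just under 99.99%, which matches the article's point that 60 minutes of outage exceeds the 53-minute yearly budget.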
get. If you do not correct or address the discrepancy, an outage will inevitably force the need to correct it.

Practical Application
Let’s consider an example service with a target availability of 99.99% and work through the requirements for both its dependencies and its outage responses.
The numbers. Suppose your 99.99% available service has the following characteristics:
˲˲ One major outage and three minor outages of its own per year. Note that these numbers sound high, but a 99.99% availability target implies a 20- to 30-minute widespread outage and several short partial outages per year. (The math makes two assumptions: that a failure of a single shard is not considered a failure of the entire system from an SLO perspective, and that the overall availability is computed with a weighted sum of regional/shard availability.)
˲˲ Five critical dependencies on other, independent 99.999% services.
˲˲ Five independent shards, which cannot fail over to one another.
˲˲ All changes are rolled out progressively, one shard at a time.
The availability math plays out as follows.

Dependency requirements.
˲˲ The total budget for outages for the year is 0.01% of 525,600 minutes/year, or 53 minutes (based on a 365-day year, which is the worst-case scenario).
˲˲ The budget allocated to outages of critical dependencies is five independent critical dependencies, with a budget of 0.001% each = 0.005%; 0.005% of 525,600 minutes/year, or 26 minutes.
˲˲ The remaining budget for outages caused by your service, accounting for outages of critical dependencies, is 53 - 26 = 27 minutes.

Outage response requirements.
˲˲ Expected number of outages: 4 (1 full outage, 3 outages affecting a single shard only)
˲˲ Aggregate impact of expected outages: (1 x 100%) + (3 x 20%) = 1.6
˲˲ Time available to detect and recover from an outage: 27/1.6 = 17 minutes
˲˲ Monitoring time allotted to detect and alert for an outage: 2 minutes
˲˲ Time allotted for an on-call responder to start investigating an alert: five minutes. (On-call means that a technical person is carrying a pager that receives an alert when the service is having an outage, based on a monitoring system that tracks and reports SLO violations. Many Google services are supported by an SRE on-call rotation that fields urgent issues.)
˲˲ Remaining time for an effective mitigation: 10 minutes
Implication. Levers to make a service more available. It’s worth looking closely at the numbers just presented because they highlight a fundamental point: there are three main levers to make a service more reliable.
˲˲ Reduce the frequency of outages, via rollout policy, testing, design reviews, and other tactics.
˲˲ Reduce the scope of the average outage, via sharding, geographic isolation, graceful degradation, or customer isolation.
˲˲ Reduce the time to recover, via monitoring, one-button safe actions (for example, rollback or adding emergency capacity), operational readiness practice, and so on.
You can trade among these three levers to make implementation easier. For example, if a 17-minute MTTR is difficult to achieve, instead focus your efforts on reducing the scope of the average outage. Strategies for minimizing and mitigating critical dependencies are discussed in more depth later in this article.

Clarifying the “Rule of the Extra 9” for Nested Dependencies
A casual reader might infer that each additional link in a dependency chain calls for an additional 9, such that second-order dependencies need two extra 9s, third-order dependencies need three extra 9s, and so on.
This inference is incorrect. It is based on a naive model of a dependency hierarchy as a tree with constant fan-out at each level. In such a model, as shown in Figure 1, there are 10 unique first-order dependencies, 100 unique second-order dependencies, 1,000 unique third-order dependencies, and so on, leading to a total of 1,111 unique services even if the architecture is limited to four layers. A highly available service ecosystem with that many independent critical dependencies is clearly unrealistic.

Figure 1. Dependency hierarchy: Incorrect model. (Levels shown: example, first order, second order.)

A critical dependency can by itself cause a failure of the entire service (or service shard) no matter where it appears in the dependency tree. Therefore, if a given component X appears as a dependency of several first-order dependencies of a service, X should be counted only once because its failure will ultimately cause the service to fail no matter how many intervening services are also affected.
The correct rule is as follows:
˲˲ If a service has N unique critical dependencies, then each one contributes 1/N to the dependency-induced unavailability of the top-level service, regardless of its depth in the dependency hierarchy.
˲˲ Each dependency should be counted only once, even if it appears multiple times in the dependency hierarchy (in other words, count only unique dependencies). For example, when counting dependencies of Service A in Figure 2, count Service B only once toward the total N.
For example, consider a hypothetical Service A, which has an error
budget of 0.01%. The service owners ate, because the amount of allowable infrastructure is being used correctly.
are willing to spend half that budget downtime is small. Be explicit in identifying the owners
on their own bugs and losses, and Error budgets eliminate the struc- of shared infrastructure as additional
half on critical dependencies. If the tural tension that might otherwise stakeholders. Also, beware of over-
service has N such dependencies, develop between SRE and product loading your dependencies—coordi-
each dependency receives 1/Nth of development teams by giving them a nate launches carefully with the own-
the remaining error budget. Typical common, data-driven mechanism for ers of these dependencies.
services often have about five to 10 assessing launch risk. They also give Internal vs. external dependencies.
critical dependencies, and therefore both SRE and product development Sometimes a product or service de-
each one can fail only one-tenth or teams a common goal of developing pends on factors beyond company con-
one-twentieth as much as Service A. practices and technology that allow trol—for example, code libraries, or
Hence, as a rule of thumb, a service’s faster innovation and more launches services or data provided by third par-
critical dependencies must have one without “blowing the budget.” ties. Identifying these factors allows
extra 9 of availability. you to mitigate the unpredictability
Strategies for Minimizing and they entail.
Error Budgets Mitigating Critical Dependencies Engage in thoughtful system plan-
The concept of error budgets is covered Thus far, this article has established ning and design. Design your system
quite thoroughly in the SRE book,1 but what might be called the “Golden Rule with the following principles in mind.
bears mentioning here. Google SRE of Component Reliability.” This sim- Redundancy and isolation. You can
uses error budgets to balance reliabil- ply means that any critical component seek to mitigate your reliance upon a
ity and the pace of innovation. This must be 10 times as reliable as the over- critical dependency by designing that
budget defines the acceptable level of all system’s target, so that its contribu- dependency to have multiple indepen-
failure for a service over some period of tion to system unreliability is noise. It dent instances. For example, if storing
time (often a month). An error budget follows that in an ideal world, the aim data in one instance provides 99.9%
is simply 1 minus a service’s SLO, so is to make as many components as pos- availability for that data, then storing
the previously discussed 99.99% avail- sible noncritical. Doing so means the three copies in three widely distributed
able service has a 0.01% “budget” for components can adhere to a lower re- instances provides a theoretical avail-
unavailability. As long as the service liability standard, gaining freedom to ability level of 1 - 0.013, or nine 9s, if
hasn’t spent its error budget for the innovate and take risks. instance failures are independent with
month, the development team is free The most basic and obvious strat- zero correlation.
(within reason) to launch new features, egy to reduce critical dependencies is In the real world, the correlation
updates, and so on. to eliminate single points of failure is never zero (consider network back-
If the error budget is spent, the (SPOFs) whenever possible. The larg- bone failures that affect many cells
service freezes changes (except for er system should be able to operate concurrently), so the actual avail-
urgent security fixes and changes ad- acceptably without any given compo- ability will be nowhere close to nine
dressing what caused the violation in nent that’s not a critical dependency 9s but is much higher than three 9s.
the first place) until either the service or SPOF. Also note that if a system or service
earns back room in the budget, or the In reality, you likely cannot get is “widely distributed,” geographic
month resets. Many services at Google rid of all critical dependencies, but separation is not always a good proxy
use sliding windows for SLOs, so the you can follow some best practices for uncorrelated failures. You may be
error budget grows back gradually. For around system design to optimize re- better off using more than one system
mature services with an SLO greater liability. While doing so isn’t always in nearby locations than the same sys-
than 99.99%, a quarterly rather than possible, it is easier and more effec- tem in distant locations.
monthly budget reset is appropri- tive to achieve system reliability if you Similarly, sending an RPC (remote
plan for reliability during the design procedure call) to one pool of serv-
Figure 2. Multiple dependencies in and planning phases, rather than af- ers in one cluster may provide 99.9%
the dependency hierarchy.
ter the system is live and impacting availability for results, but sending
actual users. three concurrent RPCs to three dif-
service A Conduct architecture/design re- ferent server pools and accepting the
views. When you are contemplating a first response that arrives helps in-
new system or service, or refactoring crease availability to well over three 9s
or improving an existing system or ser- (noted earlier). This strategy can also
service C
vice, an architecture or design review reduce tail latency if the server pools
can identify shared infrastructure and are approximately equidistant from
service B
internal vs. external dependencies. the RPC sender. (Since there is a high
Shared infrastructure. If your service cost to sending three RPCs concur-
is using shared infrastructure—for ex- rently, Google often stages the timing
ample, an underlying database service of these calls strategically: most of our
service B
used by multiple user-visible prod- systems wait a fraction of the allotted
ucts—think about whether or not that time before sending the second RPC,

and a bit more time before sending the third RPC.)

Failover and fallback. Pursue software rollouts and migrations that fail safe and are automatically isolated should a problem arise. The basic principle at work here is that by the time you bring a human online to trigger a failover, you have likely already exceeded your error budget.

Where concurrency/voting is not possible, automate failover and fallback. Again, if the issue needs a human to check what the problem is, the chances of meeting your SLO are slim.

Asynchronicity. Design dependencies to be asynchronous rather than synchronous where possible so that they don’t accidentally become critical. If a service waits for an RPC response from one of its noncritical dependencies and this dependency has a spike in latency, the spike will unnecessarily hurt the latency of the parent service. By making the RPC call to a noncritical dependency asynchronous, you can decouple the latency of the parent service from the latency of the dependency. While asynchronicity may complicate code and infrastructure, this trade-off will be worthwhile.

Capacity planning. Make sure that every dependency is correctly provisioned. When in doubt, overprovision if the cost is acceptable.

Configuration. When possible, standardize configuration of your dependencies to limit inconsistencies among subsystems and avoid one-off failure/error modes.

Detection and troubleshooting. Make detecting, troubleshooting, and diagnosing issues as simple as possible. Effective monitoring is a crucial component of being able to detect issues in a timely fashion. Diagnosing a system with deeply nested dependencies is difficult. Always have an answer for mitigating failures that doesn’t require an operator to investigate deeply.

Fast and reliable rollback. Introducing humans into a mitigation plan substantially increases the risk of missing a tight SLO. Build systems that are easy, fast, and reliable to roll back. As your system matures and you gain confidence in your monitoring to detect problems, you can lower MTTR by engineering the system to automatically trigger safe rollbacks.

Systematically examine all possible failure modes. Examine each component and dependency and identify the impact of its failure. Ask yourself the following questions:
˲˲ Can the service continue serving in degraded mode if one of its dependencies fails? In other words, design for graceful degradation.
˲˲ How do you deal with unavailability of a dependency in different scenarios? Upon startup of the service? During runtime?

Conduct thorough testing. Design and implement a robust testing environment that ensures each dependency has its own test coverage, with tests that specifically address use cases that other parts of the environment expect. Here are a few recommended strategies for such testing:
˲˲ Use integration testing to perform fault injection—verify that your system can survive failure of any of its dependencies.
˲˲ Conduct disaster testing to identify weaknesses or hidden/unexpected dependencies. Document follow-up actions to rectify the flaws you uncover.
˲˲ Don’t just load test. Deliberately overload your system to see how it degrades. One way or another, your system’s response to overload will be tested; better to perform these tests yourself than to leave load testing to your users.

Plan for the future. Expect changes that come with scale: a service that begins as a relatively simple binary on a single machine may grow to have many obvious and nonobvious dependencies when deployed at a larger scale. Every order of magnitude in scale will reveal new bottlenecks—not just for your service, but for your dependencies as well. Consider what happens if your dependencies cannot scale as fast as you need them to.

Also be aware that system dependencies evolve over time and that your list of dependencies may very well grow over time. When it comes to infrastructure, Google’s typical design guideline is to build a system that will scale to 10 times the initial target load without significant design changes.

Conclusion
While readers are likely familiar with some or many of the concepts this article has covered, assembling this information and putting it into concrete terms may make the concepts easier to understand and teach. Its recommendations are uncomfortable but not unattainable. A number of Google services have consistently delivered better than four 9s of availability, not by superhuman effort or intelligence, but by thorough application of principles and best practices collected and refined over the years (see SRE’s Appendix B: A Collection of Best Practices for Production Services).

Acknowledgments
Thank you to Ben Lutch, Dave Rensin, Miki Habryn, Randall Bosetti, and Patrick Bernier for their input.

Related articles on queue.acm.org

There’s Just No Getting Around It: You’re Building a Distributed System
Mark Cavage
http://queue.acm.org/detail.cfm?id=2482856

Eventual Consistency Today: Limitations, Extensions, and Beyond
Peter Bailis and Ali Ghodsi
http://queue.acm.org/detail.cfm?id=2462076

A Conversation with Wayne Rosing
David J. Brown
http://queue.acm.org/detail.cfm?id=945162

Reference
1. Beyer, B., Jones, C., Petoff, J., Murphy, N.R. Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media, 2016; https://landing.google.com/sre/book.html.

Ben Treynor started programming at age six and joined Oracle as a software engineer at age 17. He has also worked in engineering management at E.piphany, SEVEN, and Google (2003-present). His current team of approximately 4,200 at Google is responsible for Site Reliability Engineering, networking, and datacenters worldwide.

Mike Dahlin is a distinguished engineer at Google, where he has worked on Google’s Cloud Platform since 2013. Prior to joining Google, he was a professor of computer science at the University of Texas at Austin.

Vivek Rau is an SRE manager at Google and a founding member of the Launch Coordination Engineering sub-team of SRE. Prior to joining Google, he worked at Citicorp Software, Versant, and E.piphany. He currently manages various SRE teams tasked with tracking and improving the reliability of Google’s Cloud Platform.

Betsy Beyer is a technical writer for Google, specializing in Site Reliability Engineering. She has previously written documentation for Google’s Data Center and Hardware Operations Teams. She was formerly a lecturer on technical writing at Stanford University.

Copyright held by owner/authors. Publication rights licensed to ACM. $15.00.
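The availability arithmetic used in the preceding article (the yearly outage budget, the 1/N split across critical dependencies, and the replicated-storage estimate) can be checked with a short script. This is only a sketch: the function names are hypothetical, the rounding to whole minutes mirrors the article's figures, and the replication formula assumes fully independent failures, which the article notes never holds exactly in practice.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_budget_minutes(unavailability_percent):
    """Allowed downtime per year, in minutes, for an unavailability percentage."""
    return MINUTES_PER_YEAR * unavailability_percent / 100.0


def outage_response_budget():
    """Reproduce the worked example for a 99.99% available service."""
    total = round(downtime_budget_minutes(0.01))              # about 53 minutes
    dependencies = round(downtime_budget_minutes(5 * 0.001))  # about 26 minutes
    own = total - dependencies                                # 27 minutes
    # One full outage plus three outages that each affect one shard (20%).
    aggregate_impact = 1 * 1.00 + 3 * 0.20
    return {
        "total": total,
        "dependencies": dependencies,
        "own": own,
        "aggregate_impact": aggregate_impact,
        "minutes_per_outage": own / aggregate_impact,
    }


def per_dependency_budget(service_budget_percent, n_dependencies,
                          share_for_dependencies=0.5):
    """Split the dependency share of an error budget evenly across N dependencies."""
    return service_budget_percent * share_for_dependencies / n_dependencies


def replicated_availability(single_instance_availability, copies):
    """Theoretical availability of independent replicas (zero correlation assumed)."""
    return 1.0 - (1.0 - single_instance_availability) ** copies


if __name__ == "__main__":
    print(outage_response_budget())
    print(per_dependency_budget(0.01, 5))     # budget per dependency, in percent
    print(replicated_availability(0.999, 3))  # three copies at 99.9% each
```

With five dependencies sharing half of a 0.01% budget, each receives 0.001%, which is the "one extra 9" rule of thumb; with ten dependencies the share halves again.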
practice
DOI:10.1145/3080008
queue.acm.org

Data Sketching
The approximate approach is often faster and more efficient.
BY GRAHAM CORMODE

DO YOU EVER feel overwhelmed by an unending stream of information? It can seem like a barrage of new email and text messages demands constant attention, and there are also phone calls to pick up, articles to read, and knocks on the door to answer. Putting these pieces together to keep track of what is important can be a real challenge.

The same information overload is a concern in many computational settings. Telecommunications companies, for example, want to keep track of the activity on their networks, to identify overall network health and spot anomalies or changes in behavior. Yet, the scale of events occurring is huge: many millions of network events per hour, per network element. While new technologies allow the scale and granularity of events being monitored to increase by orders of magnitude, the capacity of computing elements (processors, memory, and disks) to make sense of these is barely increasing. Even on a small scale, the amount of information may be too large to store in an impoverished setting (say, an embedded device) or to keep conveniently in fast storage.

In response to this challenge, the model of streaming data processing has grown in popularity. The aim is no longer to capture, store, and index every minute event, but rather to process each observation quickly in order to create a summary of the current state. Following its processing, an event is dropped and is no longer accessible. The summary that is retained is often referred to as a sketch of the data.

Coping with the vast scale of information means making compromises: The description of the world is approximate rather than exact; the nature of queries to be answered must be decided in advance rather than after the fact; and some questions are now insoluble. The ability to process vast quantities of data at blinding speeds with modest resources, however, can more than make up for these limitations.

As a consequence, streaming methods have been adopted in a number of domains, starting with telecommunications but spreading to search engines, social networks, finance, and time-series analysis. These ideas are also finding application in areas using traditional approaches, but where the rough-and-ready sketching approach is more cost effective. Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

This article introduces the ideas behind sketching, with a focus on algorithmic innovations. It describes some algorithmic developments in the abstract, followed by the steps needed to put them into practice, with examples. The article also looks at four novel algorithmic ideas and discusses some emerging areas.

Simply Sampling
When faced with a large amount of information to process, there may be

a strong temptation just to ignore it entirely. A slightly more principled approach is just to ignore most of it—that is, take a small number of examples from the full dataset, perform the computation on this subset, and then try to extrapolate to the full dataset. To give a good estimation, the examples must be randomly chosen. This is the realm of sampling.

There are many variations of sampling, but this article uses the most basic: uniform random sampling. Consider a large collection of customer records. Randomly selecting a small number of records provides the sample. Then various questions can be answered accurately by looking only at the sample: for example, estimating what fraction of customers live in a certain city or have bought a certain product.

PHOTO BY TAFFPIXTURE

The method. To flesh this out, let’s fill in a few gaps. First, how big should the sample be to supply good answers? With standard statistical results, for questions like those in the customer records example, the standard error of a sample of size s is proportional to 1/√s. Roughly speaking, this means that in estimating a proportion from the sample, the error would be expected to look like ±1/√s. Therefore, looking at the voting intention of a subset of 1,000 voters produces an opinion poll whose error is approximately 3%—providing high confidence (but not certainty) that the true answer is within 3% of the result on the sample, assuming the sample was drawn randomly and the participants responded honestly. Increasing the size of the sample causes the error to decrease in a predictable, albeit expensive, way: reducing the margin of error of an opinion poll to 0.3% would require contacting 100,000 voters.

Second, how should the sample be drawn? Simply taking the first s records is not guaranteed to be random; there may be clustering through the data. You need to ensure every record has an equal chance of being included in the sample. This can be achieved by using standard random-number generators to pick which records to include in the sample. A common trick is to attach a random number to each record, then sort the data based on this random tag and take the first s records in the sorted order. This works fine, as long as sorting the full dataset is not too costly.

Finally, how do you maintain the sample as new items are arriving? A simple approach is to pick every record with probability p, for some chosen value of p. When a new record comes, pick a random fraction between 0 and 1, and if it is smaller than p, put the record in the sample. The problem with this approach is that you do not know in advance what p should be. In the
previous analysis a fixed sample size s was desired, and using a fixed sampling rate p means there are too few elements initially, but then too many as more records arrive.

Presented this way, the question has the appearance of an algorithmic puzzle, and indeed this was a common question in technical interviews for many years. One can come up with clever solutions that incrementally adjust p as new records arrive. A simple and elegant way to maintain a sample is to adapt the idea of random tags. Attach to each record a random tag, and define the sample to be the s records with the smallest tag values. As new records arrive, the tag values decide whether to add the new record to the sample (and to remove an old item to keep the sample size fixed at s).

Discussion and applications. Sampling methods are so ubiquitous that there are many examples to consider. One simple case is within database systems. It is common for the database management system to keep a sample of large relations for the purpose of query planning. When determining how to execute a query, evaluating different strategies provides an estimate of how much data reduction may occur at each step, with some uncertainty of course. Another example comes from the area of data integration and linkage, in which a subproblem is to test whether two columns from separate tables can relate to the same set of entities. Comparing the columns in full can be time consuming, especially when you want to test all pairs of columns for compatibility. Comparing a small sample is often sufficient to determine whether the columns have any chance of relating to the same entities.

Entire books have been written on the theory and practice of sampling, particularly around schemes that try to sample the more important elements preferentially, to reduce the error in estimating from the sample. For a good survey with a computational perspective, see Synopses for Massive Data: Samples, Histograms, Wavelets and Sketches.11

Given the simplicity and generality of sampling, why would any other method be needed to summarize data? It turns out that sampling is not well suited for some questions. Any question that requires detailed knowledge of individual records in the data cannot be answered by sampling. For example, if you want to know whether one specific individual is among your customers, then a sample will leave you uncertain. If the customer is not in the sample, you do not know whether this is because that person is not in the data or because he or she did not happen to be sampled. A question like this ultimately needs all the presence information to be recorded and is answered by highly compact encodings such as the Bloom filter (described later).

A more complex example is when the question involves determining the cardinality of quantities. In a dataset that has many different values, how many distinct values of a certain type are there? For example, how many distinct surnames are in a particular customer dataset? Using a sample does not reveal this information. Let’s say in a sample size of 1,000 out of one million records, 900 surnames occur just once among the sampled names. What can you conclude about the popularity of these names in the rest of the dataset? It might be that almost every other name in the full dataset is also unique. Or it might be that each of the unique names in the sample reoccurs tens or hundreds of times in the remainder of the data. With the sampled information there is no way to distinguish between these two cases, which leads to huge confidence intervals on these kinds of statistics. Tracking information about cardinalities, and omitting duplicates, is addressed by techniques such as HyperLogLog, addressed later.

Finally, there are quantities that samples can estimate, but for which better special-purpose sketches exist. Recall that the standard error of a sample of size s is 1/√s. For problems such as estimating the frequency of a particular attribute (such as city of residence), you can build a sketch of size s so the error it guarantees is proportional to 1/s. This is considerably stronger than the sampling guarantee and only improves as we devote more space s to the sketch. The Count-Min sketch described later in this article has this property. One limitation is that the attribute of interest must be specified in advance of setting up the sketch, while a sample allows you to evaluate a query for any recorded attribute of the sampled items.

Because of its flexibility, sampling is a powerful and natural way of building a sketch of a large dataset. There are many different approaches to sampling that aim to get the most out of the sample or to target different types of queries that the sample may be used to answer.11 Here, more information is presented about less flexible methods that address some of these limitations of sampling.

Summarizing Sets with Bloom Filters
The Bloom filter is a compact data structure that summarizes a set of items. Any computer science data-structures class is littered with examples of “dictionary” data structures, such as arrays, linked lists, hash tables, and many esoteric variants of balanced tree structures. The common feature of these structures is that they can all answer “membership questions” of the form: Is a certain item stored in the structure or not? The Bloom filter can also respond to such membership questions. The answers given by the structure, however, are either “the item has definitely not been stored” or “the item has probably been stored.” This introduction of uncertainty over the state of an item (it might be thought of as introducing potential false positives) allows the filter to use an amount of space that is much smaller than its exact relatives. The filter also does not allow listing the items that have been placed into it. Instead, you can pose membership questions only for specific items.

The method. To understand the filter, it is helpful to think of a simple exact solution to the membership problem. Suppose you want to keep track of which of a million possible items you have seen, and each one is helpfully labeled with its ID number (an integer between one and a million). Then you can keep an array of one million bits, initialized to all 0s. Every time you see an item i, you just set the ith bit in the array to 1. A lookup query for item j is correspondingly straightforward: just see whether bit j is a 1 or a 0. The structure is very compact: 125KB will suffice if you pack the bits into memory.
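The exact bit-array solution just described can be sketched in a few lines. This is an illustrative sketch, not code from the article: the class name `ExactBitArraySet` and its methods are hypothetical, and IDs are assumed to be integers from 1 to the universe size, as in the text.

```python
class ExactBitArraySet:
    """Exact membership for integer IDs in 1..universe_size, one bit per ID."""

    def __init__(self, universe_size=1_000_000):
        self.universe_size = universe_size
        # Pack eight IDs into each byte, rounding up; all bits start at 0.
        self.bits = bytearray((universe_size + 7) // 8)

    def _locate(self, item_id):
        """Return (byte index, bit mask) for an ID, rejecting out-of-range IDs."""
        if not 1 <= item_id <= self.universe_size:
            raise ValueError("item_id must be between 1 and universe_size")
        index = item_id - 1
        return index >> 3, 1 << (index & 7)

    def add(self, item_id):
        """Record that item_id has been seen by setting its bit to 1."""
        byte_index, mask = self._locate(item_id)
        self.bits[byte_index] |= mask

    def __contains__(self, item_id):
        """Look up an ID: True only if its bit was set by an earlier add."""
        byte_index, mask = self._locate(item_id)
        return bool(self.bits[byte_index] & mask)


if __name__ == "__main__":
    seen = ExactBitArraySet()
    seen.add(42)
    print(42 in seen, 43 in seen)
    print(len(seen.bits), "bytes")
```

For one million IDs the backing `bytearray` holds 125,000 bytes, matching the 125KB figure in the text. The structure is exact precisely because every possible item has its own dedicated bit.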

Real data, however, is rarely this nicely structured. In general, you might have a much larger set of possible inputs—think again of the names of customers, where the number of possible name strings is huge. You can nevertheless adapt your bit-array approach by borrowing from a different dictionary structure. Imagine the bit array is a hash table: you will use a hash function h to map from the space of inputs onto the range of indices for your table. That is, given input i, you now set bit h(i) to 1. Of course, now you have to worry about hash collisions in which multiple entries might map onto the same bit. A traditional hash table can handle this, as you can keep information about the entries in the table. If you stick to your guns and keep the bits only in the bit array, however, false positives will result: if you look up item i, it may be that entry h(i) is set to 1, but i has not been seen; instead, there is some item j that was seen, where h(i) = h(j).

Can you fix this while sticking to a bit array? Not entirely, but you can make it less likely. Rather than just hashing each item i once, with a single hash function, use a collection of k hash functions h1, h2, ..., hk, and map i with each of them in turn. All the bits corresponding to h1(i), h2(i), ..., hk(i) are set to 1. Now to test membership of j, check all the entries it is hashed to, and say no if any of them are 0.

There’s clearly a trade-off here: Initially, adding extra hash functions reduces the chances of a false positive as more things need to “go wrong” for an incorrect answer to be given. As more and more hash functions are added, however, the bit array gets fuller and fuller of 1 values, and therefore collisions are more likely. This trade-off can be analyzed mathematically, and the sweet spot found that minimizes the chance of a false positive. The analysis works by assuming that the hash functions look completely random (which is a reasonable assumption in practice), and by looking at the chance that an arbitrary element not in the set is reported as present.

If n distinct items are being stored in a Bloom filter of size m, and k hash functions are used, then the chance of a membership query that should receive a negative answer yielding a false positive is approximately exp(k ln(1 - exp(-kn/m))).4 While extensive study of this expression may not be rewarding in the short term, some simple analysis shows that this rate is minimized by picking k = (m/n) ln 2. This corresponds to the case when about half the bits in the filter are 1 and half are 0.

For this to work, the number of bits in the filter should be some multiple of the number of items that you expect to store in it. A common setting is m = 10n and k = 7, which means a false positive rate below 1%. Note that there is no magic here that can compress data beyond information-theoretical limits: under these parameters, the Bloom filter uses about 10 bits per item and must use space proportional to the number of different items stored. This is a modest savings when representing integer values but is a considerable benefit when the items stored have large descriptions—say, arbitrary strings such as URLs. Storing these in a traditional structure such as a hash table or balanced search tree would consume tens or hundreds of bytes per item.

A simple example is shown in Figure 1, where an item i is mapped by k = 3 hash functions to a filter of size m = 12, and these entries are set to 1.

Figure 1. Bloom filter with K=3, M=12. (Item i is hashed into the bit array 0 1 1 0 0 0 1 1 0 0 0 1.)

Discussion and applications. The possibility of false positives needs to be handled carefully. Bloom filters are at their most attractive when the consequence of a false positive is not the introduction of an error in a computation, but rather when it causes some additional work that does not adversely impact the overall performance of the system. A good example comes in the context of browsing the Web. It is now common for Web browsers to warn users if they are attempting to visit a site that is known to host malware. Checking the URL against a database of “bad” URLs does this. The database is large enough, and URLs are long enough, that keeping the full database as part of the browser would be unwieldy, especially on mobile devices.

Instead, a Bloom filter encoding of the database can be included with the browser, and each URL visited can be checked against it. The consequence of a false positive is that the browser may believe that an innocent site is on the bad list. To handle this, the browser can contact the database authority and check whether the full URL is on the list. Hence, false positives are removed at the cost of a remote database lookup.

Notice the effect of the Bloom filter: it gives the all clear to most URLs and incurs a slight delay for a small fraction (or when a bad URL is visited). This is preferable both to the solution of keeping a copy of the database with the browser and to doing a remote lookup for every URL visited. Browsers such as Chrome and Firefox have adopted this concept. Current versions of Chrome use a variation of the Bloom filter based on more directly encoding a list of hashed URLs, since the local copy does not have to be updated dynamically and more space can be saved this way.

The Bloom filter was introduced in 1970 as a compact way of storing a dictionary, when space was really at a premium.3 As computer memory grew, it seemed that the filter was no longer needed. With the rapid growth of the Web, however, a host of applications for the filter have been devised since around the turn of the century.4 Many of these applications have the flavor of the preceding example: the filter gives a fast answer to lookup queries, and positive answers may be double-checked in an authoritative reference.

Bloom filters have been widely used to avoid storing unpopular items in caches. This enforces the rule that an item is added to the cache only if it has
practice
been seen before. The Bloom filter is used to compactly represent the set of items that have been seen. The consequence of a false positive is that a small fraction of rare items might also be stored in the cache, contradicting the letter of the rule. Many large distributed databases (Google’s Bigtable, Apache’s Cassandra and HBase) use Bloom filters as indexes on distributed chunks of data. They use the filter to keep track of which rows or columns of the database are stored on disk, thus avoiding a (costly) disk access for nonexistent attributes.

Counting with Count-Min Sketch
Perhaps the canonical data summarization problem is the most trivial: to count the number of items of a certain type that have been observed, you do not need to retain each item. Instead, a simple counter suffices, incremented with each observation. The counter has to be of sufficient bit depth in order to cope with the magnitude of events observed. When the number of events gets truly huge, ideas such as Robert Morris’s approximate counter can be used to provide such a counter in fewer bits [12] (another example of a sketch).
When there are different types of items, and you want to count each type, the natural approach is to allocate a counter for each item. When the number of item types grows huge, however, you encounter difficulties. It may not be practical to allocate a counter for each item type. Even if it is, when the number of counters exceeds the capacity of fast memory, the time cost of incrementing the relevant counter may become too high. For example, a social network such as Twitter may wish to track how often a tweet is viewed when displayed via an external website. There are billions of Web pages, each of which could in principle link to one or more tweets, so allocating counters for each is infeasible and unnecessary. Instead, it is natural to look for a more compact way to encode counts of items, possibly with some tolerable loss of fidelity.
The Count-Min sketch is a data structure that allows this trade-off to be made. It encodes a potentially massive number of item types in a small array. The guarantee is that large counts will be preserved fairly accurately, while small counts may incur greater (relative) error. This means it is good for applications where you are interested in the head of a distribution and less so in its tail.
The method. At first glance, the sketch looks quite like a Bloom filter, as it involves the use of an array and a set of hash functions. There are significant differences in the details, however. The sketch is formed by an array of counters and a set of hash functions that map items into the array. More precisely, the array is treated as a sequence of rows, and each item is mapped by the first hash function into the first row, by the second hash function into the second row, and so on (note that this is in contrast to the Bloom filter, which allows the hash functions to map onto overlapping ranges). An item is processed by mapping it to each row in turn via the corresponding hash function and incrementing the counters to which it is mapped.
Given an item, the sketch allows its count to be estimated. This follows a similar outline to processing an update: inspect the counter in the first row where the item was mapped by the first hash function, and the counter in the second row where it was mapped by the second hash, and so on. Each row has a counter that has been incremented by every occurrence of the item. The counter was also potentially incremented by occurrences of other items that were mapped to the same location, however, since collisions are expected. Given the collection of counters containing the desired count, plus noise, the best guess at the true count of the desired item is to take the smallest of these counters as your estimate.
Figure 2 shows the update process: an item i is mapped to one entry in each row j by the hash function h_j, and the update of c is added to each entry. It can also be seen as modeling the query process: a query for the same item i will result in the same set of locations being probed, and the smallest value returned as the answer.

Figure 2. Count-Min sketch data structure with four rows, nine columns. (An update +c for item i is added to one counter in each row, chosen by hash functions h1 through hd.)

Discussion and applications. As with the Bloom filter, the sketch achieves a compact representation of the input, with a trade-off in accuracy. Both provide some probability of an unsatisfactory answer. With a Bloom filter, the answers are binary, so there is some chance of a false positive response; with a Count-Min sketch, the answers are frequencies, so there is some chance of an inflated answer.
What may be surprising at first is that the obtained estimate is very good. Mathematically, it can be shown that there is a good chance that the returned estimate is close to the correct value. The quality of the estimate depends on the number of rows in the sketch (each additional row halves the probability of a bad estimate) and on the number of columns (doubling the number of columns halves the scale of the noise in the estimate). These guarantees follow from the random selection of hash functions and do not rely on any structure or pattern in the data distribution that is being summarized. For a sketch of size s, the error is proportional to 1/s. This is an improvement over the case for sampling where, as noted earlier, the corresponding behavior is proportional to 1/√s.
Just as Bloom filters are best suited for the cases where false positives can be tolerated and mitigated, Count-Min sketches are best suited for handling a slight inflation of frequency. This means, in particular, they do not apply to cases where a Bloom filter might be used: if it matters a lot whether an item has been seen or not, then the uncertainty that the Count-Min sketch introduces will obscure this level of precision. The sketches are very good for tracking which items exceed a given popularity threshold, however. In particular, while the size of a Bloom filter must remain proportional to the size of the input it is representing, a Count-Min sketch can be much more compressive: its size can be considered to be independent of the input size, depending instead on the desired accuracy guarantee only (that is, to achieve a target accuracy of ε, fix a sketch size of s proportional to 1/ε that does not vary over the course of processing data).
The Twitter scenario mentioned previously is a good example. Tracking the number of views that a tweet receives across each occurrence in different websites creates a large enough volume of data to be difficult to manage. Moreover, the existence of some uncertainty in this application seems acceptable: the consequences of inflating the popularity of one website for one tweet are minimal. Using a sketch for each tweet consumes only moderately more space than the tweet and associated metadata, and allows tracking which venues attract the most attention for the tweet. Hence, a kilobyte or so of space is sufficient to track the percentage of views from different locations, with an error of less than one percentage point, say.
Since their introduction over a decade ago, [7] Count-Min sketches have found applications in systems that track frequency statistics, such as popularity of content within different groups (say, online videos among different sets of users, or which destinations are popular for nodes within a communications network). Sketches are used in telecommunications networks where the volume of data passing along links is immense and is never stored. Summarizing network traffic distribution allows hotspots to be detected, informing network-planning decisions and allowing configuration errors and floods to be detected and debugged. [6] Since the sketch compactly encodes a frequency distribution, it can also be used to detect when a shift in popularities occurs, as a simple example of anomaly detection.

Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

Counting Distinct Items with HyperLogLog
Another basic problem is keeping track of how many different items have been seen out of a large set of possibilities. For example, a Web publisher might want to track how many different people have been exposed to a particular advertisement. In this case, you would not want to count the same viewer more than once. When the number of possible items is not too large, keeping a list, or a binary array, is a natural solution. As the number of possible items becomes very large, the space needed by these methods grows proportional to the number of items tracked. Switching to an approximate method such as a Bloom filter means the space remains proportional to the number of distinct items, although the constants are improved.
Could you hope to do better? If you just counted the total number of items, without removing duplicates, then a simple counter would suffice, using a number of bits that is proportional to the logarithm of the number of items encountered. If only there were a way to know which items were new, and count only those, then you could achieve this cost.
The HyperLogLog (HLL) algorithm promises something even stronger: the cost needs to depend only on the logarithm of the logarithm of the quantity computed. Of course, there are some scaling constants that mean the space needed is not quite so tiny as this might suggest, but the net result is that quantities can be estimated with high precision (say, up to a 1%–2% error) with a couple of kilobytes of space.
The method. The essence of this method is to use hash functions applied to item identifiers to determine how to update counters so that duplicate items are treated identically. A Bloom filter has a similar property: attempting to insert an item already represented within a Bloom filter means setting a number of bits to 1 that are already recording 1 values. One approach is to keep a Bloom filter and look at the final density of 1s and 0s to estimate the number of distinct items represented (taking into account collisions under hash functions). This still requires space proportional to the number of items and is the basis of early approaches to this problem. [15]
To break this linearity, a different approach to building a binary counter is needed. Instead of adding 1 to
the counter for each item, you could add 1 with a probability of one-half, 2 with a probability of one-fourth, 4 with a probability of 1/8th, and so on. This use of randomness decreases the reliability of the counter, but you can check that the expected count corresponds to the true number of items encountered. This makes more sense when using hash functions. Apply a hash function g to each item i, with the same distribution: g maps items to j with probability 2^(-j) (say, by taking the number of leading zero bits in the binary expansion of a uniform hash value). You can then keep a set of bits indicating which j values have been seen so far. This is the essence of the early Flajolet-Martin approach to tracking the number of distinct items. [8] Here a logarithmic number of bits is needed, as there are only this many distinct j values expected.
The HLL method reduces the number of bits further by retaining only the highest j value that has been seen when applying the hash function. This might be expected to be correlated to the cardinality, although with high variation: for example, there might be only a single item seen, which happens to hash to a large value. To reduce this variation, the items are partitioned into groups using a second hash function (so the same item is always placed in the same group), and information about the largest hash in each group is retained. Each group yields an estimate of the local cardinality; these are all combined to obtain an estimate of the total cardinality.
A first effort would be to take the mean of the estimates, but this still allows one large estimate to skew the result; instead, the harmonic mean is used to reduce this effect. By hashing to s separate groups, the standard error is proportional to 1/√s. A small example is shown in Figure 3. The figure shows a small example HLL sketch with s = 3 groups. Consider five distinct items a, b, c, d, e with their related hash values. From this, the following array is obtained:

3 2 1

Figure 3. Example of HyperLogLog in action.
x      a     b     c     d     e
h(x)   1     2     3     1     3
g(x)   0001  0011  1010  1101  0101

The estimate is obtained by taking 2 to the power of each of the array entries and computing the sum of the reciprocals of these values, obtaining 1/8 + 1/4 + 1/2 = 7/8 in this case. The final estimate is made by multiplying α_s s^2 by the reciprocal of this sum. Here, α_s is a scaling constant that depends on s. α_3 = 0.5305, so 5.46 is obtained as the estimate, close to the true value of 5.
The analysis of the algorithm is rather technical, but the proof is in the deployment: the algorithm has been widely adopted and applied in practice.
Discussion and applications. One example of HLL’s use is in tracking the viewership of online advertising. Across many websites and different advertisements, trillions of view events may occur every day. Advertisers are interested in the number of “uniques”: how many different people (or rather, browsing devices) have been exposed to the content. Collecting and marshaling this data is not infeasible, but rather unwieldy, especially if it is desired to do more advanced queries (say, to count how many uniques saw both of two particular advertisements). Use of HLL sketches allows this kind of query to be answered directly by combining the two sketches rather than trawling through the full data. Sketches have been put to use for this purpose, where the small amount of uncertainty from the use of randomness is comparable to other sources of error, such as dropped data or measurement failure.
Approximate distinct counting is also widely used behind the scenes in Web-scale systems. For example, Google’s Sawzall system provides a variety of sketches, including count distinct, as primitives for log data analysis. [13] Google engineers have described some of the implementation modifications made to ensure high accuracy of the HLL across the whole range of possible cardinalities. [10]
A last interesting application of distinct counting is in the context of social network analysis. In 2016, Facebook set out to test the “six degrees of separation” claim within its social network. The Facebook friendship graph is sufficiently large (more than a billion nodes and hundreds of billions of edges) that maintaining detailed information about the distribution of long-range connections for each user would be infeasible. Essentially, the problem is to count, for each user, how many friends they have at distance 1, 2, 3, and so on. This would be a simple graph exploration problem, except that some friends at distance 2 are reachable by multiple paths (via different mutual friends). Hence, distinct counting is used to generate accurate statistics on reachability without double counting and to provide accurate distance distributions (the estimated number of degrees of separation in the Facebook graph is 3.57). [2]

Advanced Sketching
Roughly speaking, the four examples of sketching described in this article cover most of the current practical applications of this model of data summarization. Yet, unsurprisingly, there is a large body of research into new applications and variations of these ideas. Just around the corner are a host of new techniques for data summarization that are on the cusp of practicality. This section mentions a few of the directions that seem most promising.
Sketching for dimensionality reduction. When dealing with large high-dimensional numerical data, it is common to seek to reduce the dimensionality while preserving fidelity of the data. Assume the hard work of data wrangling and modeling is done and the data can be modeled as a massive matrix, where each row is one example point, and each column encodes an attribute of the data. A common technique is to apply PCA (principal components analysis) to extract a small number of “directions” from the data. Projecting each row of data along each of these directions yields a different representation of the data that captures most of the variation of the dataset.
One limitation of PCA is that finding the direction entails a substantial amount of work. It requires finding eigenvectors of the covariance matrix, which rapidly becomes unsustainable for large matrices. The competing approach of random projections argues that rather than finding “the best” directions, it suffices to use (a slightly larger number of) random vectors. Picking a moderate number of random directions captures a comparable amount of variation, while requiring much less computation.
The random projection of each row of the data matrix can be seen as an example of a sketch of the data. More directly, close connections exist between random projections and the sketches described earlier. The Count-Min sketch can be viewed as a random projection of sorts; moreover, the best constructions of random projections for dimensionality reduction look a lot like Count-Min sketches with some twists (such as randomly multiplying each column of the matrix by either -1 or 1). This is the basis of methods for speeding up high-dimensional machine learning, such as the Hash Kernels approach. [14]
Randomized numerical linear algebra. A grand objective for sketching is to allow arbitrary complex mathematical operations over large volumes of data to be answered approximately and quickly via sketches. While this objective appears quite a long way off, and perhaps infeasible because of some impossibility results, a number of core mathematical operations can be solved using sketching ideas, which leads to the notion of randomized numerical linear algebra. A simple example is matrix multiplication: given two large matrices A and B, you want to find their product AB. An approach using sketching is to build a dimensionality-reducing sketch of each row of A and each column of B. Combining each pair of these provides an estimate for each entry of the product. Similar to other examples, small answers are not well preserved, but large entries are accurately found.
Other problems that have been tackled in this space include regression. Here the input is a high-dimensional dataset modeled as matrix A and column vector b: each row of A is a data point, with the corresponding entry of b the value associated with the row. The goal is to find regression coefficients x that minimize ||Ax − b||_2. An exact solution to this problem is possible but costly in terms of time as a function of the number of rows. Instead, applying sketching to matrix A solves the problem in the lower-dimensional sketch space. [5] David Woodruff provides a comprehensive mathematical survey of the state of the art in this area. [16]
Rich data: Graphs and geometry. The applications of sketching so far can be seen as summarizing data that might be thought of as a high-dimensional vector, or matrix. These mathematical abstractions capture a large number of situations, but, increasingly, a richer model of data is desired: say, to model links in a social network (best thought of as a graph) or to measure movement patterns of mobile users (best thought of as points in the plane or in 3D). Sketching ideas have been applied here also.
For graphs, there are techniques to summarize the adjacency information of each node, so that connectivity and spanning tree information can be extracted. [1] These methods provide a surprising mathematical insight that much edge data can be compressed while preserving fundamental information about the graph structure. These techniques have not found significant use in practice yet, perhaps because of high overheads in the encoding size.
For geometric data, there has been much interest in solving problems such as clustering. [9] The key idea here is that clustering part of the input can capture a lot of the overall structural information, and by merging clusters together (clustering clusters) you can retain a good picture of the overall point density distribution.

Why Should You Care?
The aim of this article has been to introduce a selection of recent techniques that provide approximate answers to some general questions that often occur in data analysis and manipulation. In all cases, simple alternative approaches can provide exact answers, at the expense of keeping complete information. The examples shown here have illustrated, however, that in many cases the approximate approach can be faster and more space efficient. The use of these methods is growing. Bloom filters are sometimes said to be one of the core technologies that “big data experts” must know. At the very least, it is important to be aware of sketching techniques to test claims that solving a problem a certain way is the only option. Often, fast approximate sketch-based techniques can provide a different trade-off.

Related articles on queue.acm.org
It Probably Works
Tyler McMullen
http://queue.acm.org/detail.cfm?id=2855183
Statistics for Engineers
Heinrich Hartmann
http://queue.acm.org/detail.cfm?id=2903468

References
1. Ahn, K.J., Guha, S., McGregor, A. Analyzing graph structure via linear measurements. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, (2012).
2. Bhagat, S., Burke, M., Diuk, C., Filiz, I.O., Edunov, S. Three-and-a-half degrees of separation. Facebook Research, 2016; https://research.fb.com/three-and-a-half-degrees-of-separation/.
3. Bloom, B. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (July 1970), 422–426.
4. Broder, A., Mitzenmacher, M. Network applications of Bloom filters: a survey. Internet Mathematics 1, 4 (2005), 485–509.
5. Clarkson, K.L., Woodruff, D.P. Low rank approximation and regression in input sparsity time. In Proceedings of the ACM Symposium on Theory of Computing, (2013), 81–90.
6. Cormode, G., Korn, F., Muthukrishnan, S., Johnson, T., Spatscheck, O., Srivastava, D. Holistic UDAFs at streaming speeds. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (2004), 35–46.
7. Cormode, G., Muthukrishnan, S. An improved data stream summary: the Count-Min sketch and its applications. J. Algorithms 55, 1 (2005), 58–75.
8. Flajolet, P., Martin, G.N. Probabilistic counting. In Proceedings of the IEEE Conference on Foundations of Computer Science, 1985, 76–82. Also in J. Computer and System Sciences 31, 182–209.
9. Guha, S., Mishra, N., Motwani, R., O’Callaghan, L. Clustering data streams. In Proceedings of the IEEE Conference on Foundations of Computer Science, 2000.
10. Heule, S., Nunkesser, M., Hall, A. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In Proceedings of the International Conference on Extending Database Technology, 2013.
11. Jermaine, C. Sampling techniques for massive data. Synopses for massive data: samples, histograms, wavelets and sketches. Foundations and Trends in Databases 4, 1–3 (2012). G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine, Eds. NOW Publishers.
12. Morris, R. Counting large numbers of events in small registers. Commun. ACM 21, 10 (Oct. 1978), 840–842.
13. Pike, R., Dorward, S., Griesemer, R., Quinlan, S. Interpreting the data: Parallel analysis with Sawzall. Dynamic Grids and Worldwide Computing 13, 4 (2005), 277–298.
14. Weinberger, K.Q., Dasgupta, A., Langford, J., Smola, A.J., Attenberg, J. Feature hashing for large-scale multitask learning. In Proceedings of the International Conference on Machine Learning, 2009.
15. Whang, K.Y., Vander-Zanden, B.T., Taylor, H.M. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Systems 15, 2 (1990), 208.
16. Woodruff, D. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10, 1–2 (2014), 1–157.

Graham Cormode is a professor of computer science at the University of Warwick, U.K. Previously, he was a researcher at Bell Labs and AT&T on algorithms for data management. He received the 2017 Adams Prize for his work on data analysis.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00.
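
The Count-Min update and query steps, and the HyperLogLog arithmetic worked through in Figure 3, can be illustrated with a minimal Python sketch. This is not the author's implementation: the names CountMinSketch, _hash, leading_zeros, and hll_estimate are hypothetical, and the salted BLAKE2b hash from Python's hashlib is an assumed stand-in for the random hash functions the analysis relies on. The default sketch dimensions (four rows, nine columns) follow Figure 2, and the h(x) and g(x) values and the constant 0.5305 are taken from Figure 3 and the accompanying text.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Illustrative hash: 64-bit BLAKE2b digest salted by the row index."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    """Array of counters; row j is indexed by its own hash function."""

    def __init__(self, rows: int = 4, cols: int = 9):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def update(self, item: str, c: int = 1) -> None:
        # Add c to exactly one counter in each row.
        for j in range(self.rows):
            self.table[j][_hash(item, j) % self.cols] += c

    def estimate(self, item: str) -> int:
        # Probe the same locations and return the smallest counter.
        return min(
            self.table[j][_hash(item, j) % self.cols] for j in range(self.rows)
        )


def leading_zeros(bits: str) -> int:
    """Number of leading '0' characters in a bit string such as '0011'."""
    return len(bits) - len(bits.lstrip("0"))


def hll_estimate(groups, hashes, s, alpha):
    """HyperLogLog-style estimate from group ids h(x) in 1..s and bit strings g(x)."""
    registers = [0] * s
    for h, g in zip(groups, hashes):
        # Keep only the largest value seen in each group.
        registers[h - 1] = max(registers[h - 1], leading_zeros(g))
    reciprocal_sum = sum(2.0 ** -r for r in registers)
    return registers, alpha * s * s / reciprocal_sum


# Count-Min: with non-negative updates, estimates never fall below the true count.
cms = CountMinSketch()
for _ in range(5):
    cms.update("tweet-a")
cms.update("tweet-b", 3)
print(cms.estimate("tweet-a") >= 5, cms.estimate("tweet-b") >= 3)  # True True

# HyperLogLog: the five items a..e of Figure 3.
registers, est = hll_estimate(
    groups=[1, 2, 3, 1, 3],
    hashes=["0001", "0011", "1010", "1101", "0101"],
    s=3,
    alpha=0.5305,
)
print(registers, round(est, 2))  # [3, 2, 1] 5.46
```

Because collisions can only add to a counter, taking the minimum across rows discards as much of that noise as possible; adding rows or columns to the hypothetical CountMinSketch trades space for accuracy in the way the Count-Min discussion describes.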
practice
DOI:10.1145/3106631
queue.acm.org
Plan ahead to make the interview a successful one.
BY KATE MATSUDAIRA

10 Ways to Be a Better Interviewer

Photo by Dragon Images

IN MANY WAYS interviewing is an art. You have one hour (more if you count the cumulative interview time) to determine if the candidate has the desired skills, and, more importantly, if you would enjoy working with this person. That is a lot of ground to cover.
As if finding out all that information isn’t a daunting enough task, you also need to make sure that the candidate has a positive experience while visiting your company (after all, people talk and you want them to be saying good things, since this candidate may not be your next hire, but someone he or she meets may be).
As an interviewer, the key to your success is preparation. Planning will help ensure the success of the interview (both in terms of getting the information you need and giving the candidate a good impression). The following list is advice to consider prior to stepping into that room with two chairs and a whiteboard.

1. Review the Candidate’s Résumé
Read every line of every résumé (and this goes for the really long ones that go on for four pages). Where have these candidates worked? How long did they stay in a role and did their positions change? These questions make for interesting conversation topics. Hopefully there will be something in a candidate’s background that piques your interest and can be great fodder for starting the interview with some common ground. This can put candidates at ease, giving them their greatest chance of success.

2. Review Feedback from Previous Interviews
Most software companies have a longer interview process that can start with phone-screen or homework problems and evolve from there. If the candidate has done homework problems, or your teammates have taken the time to type up feedback, do your due diligence and read it. These can also be a great source of material for questions, but more importantly, it is unprofessional to ask the same questions that have already been posed to the candidate. This is partly because you will not learn as much from repeated questions, but also because the candidate will be bored or unimpressed going over the same ground. Great candidates want to be challenged, and an interview team where people are asking the same questions makes the candidate think the team is disorganized or unimaginative.

3. Use Calibrated Questions
Interviews are not the time to try something new. Take the time to do new problems on your own or test them on your peers. Come to the interview with questions that you were given in your interview (since you certainly will know how well you did) or that you have already given to others. Testing new material can really hurt a candidate’s chances for success or, worse, give him or her a bad impression of the company when you are not prepared to answer clarifying questions. You get the most from interviews when you can compare the results of one with another, particularly with the results of a successful hire or peer, so try to come to the interview with questions that will help you make this comparison.

4. Test New Questions on Yourself and Your Peers
If you do have a new question you want to give a dry run, have someone ask you to answer it. Where do you get hung up? How long does it take you? If the problem is too familiar to you to assess it, ask one of your teammates to be your guinea pig (as a manager I often offer to be the interviewee for my team to test out new questions; after all, isn’t it fun to turn the tables and interview your manager?). Seeing where the people you know and respect get stuck, or how long they take to solve it, will give you a good baseline for comparison with future candidates.

5. Create a Timeline for the Interview
You should walk into every interview with a schedule: what questions you plan to ask and how long each should take. Each question should have clear goals and focus on specific competencies for the position. Ideally, the questions should be different from one another and give you a feel for multiple areas of the candidate’s experience and background. I like to ask about five questions, so a typical agenda might look like this:
• Warm-up question about the candidate’s background (or common interest): 5–10 minutes
• Problem-solving question that involves coding of some sort: 10–20 minutes
• Design question: 10–15 minutes
• Two to three cultural or situational questions: 5–10 minutes
• Time to answer the candidate’s questions

6. Head In With a Positive Attitude
You want the candidate to have a good experience with the company and your process. If you are upbeat, it is much more likely a qualified candidate will accept the position. If you are not, people talk and it is a small world. You want candidates to think well of the company and feel they were treated fairly. It’s like karma: what goes around comes around. To ensure this happens, try to make your questions and hints feel collaborative, and whatever you do, do not insult any candidates or make them feel stupid. They are probably nervous and you already have the job; there is nothing to prove, so make an effort to give them a fair shot.

7. Take Notes
Seems obvious, but so many people don’t take notes. Even if you have a photographic memory, taking the time to write down a few things here and there will indicate to the candidate you are paying attention and are genuinely interested in what he or she has to say. As an avid note-taker, here are some of my favorite tips:
• Try not to use a laptop. Yes, it is probably faster and more efficient, but it can be a physical divider between you and the candidate, not to mention off-putting. When an interviewer uses a computer during an interview, it is easy to think that he or she is not paying attention to what the candidate has to say.
• Instead of writing code/drawings on a whiteboard, try paper. This may be more comfortable for most people than standing up at a whiteboard, and you can take the paper with you, which is better than any copied whiteboard code.
• Don’t write notes on the résumé. Someone once told me that in some cultures, business cards and résumés are considered a reflection of the person, and writing on them can be insulting. While I personally haven’t encountered anyone who felt this way, I am sure never to do this (and bring my own paper) just in case.

8. Bring a List of Questions to the Interview
No candidate will think less of you for coming in with written questions, and in fact some may appreciate that you prepared the same way they did. This will also help you establish your game plan and agenda so you don’t forget. Another one of my favorite tips is always to have spare questions for really good interviews (that get through all the material quickly) or for bad interviews (where you don’t want to ask your prepared questions because they are too hard).

9. Be Collaborative
You want the candidate to be successful, so try to approach a problem together. I know many other managers who have moved to a pair-programming model where the interviewer and the candidate code a problem together in an editor or Google doc.

10. Try To Make the Problems Feel As Real-World As Possible
Smart people want to be challenged. They also would love to get a taste of what it is like to work at your company. Do your best to come up with questions that at least hint at some of the problems you might solve (or problems that relate to the underlying theory of the work you do).
Of course, there is no right way to do an interview, but you can always be better. Make an effort to make your candidates as comfortable as possible so they have the greatest chance for success. Happy hiring!

Related articles on queue.acm.org
Interviewing Techniques
George Neville-Neil
http://queue.acm.org/detail.cfm?id=1998475
Nine Things I Didn’t Know I Would Learn Being an Engineer Manager
Kate Matsudaira
http://queue.acm.org/detail.cfm?id=2935693
10 Optimizations on Linear Search
Thomas A. Limoncelli
http://queue.acm.org/detail.cfm?id=2984631

Kate Matsudaira (katemats.com) is the founder of her own company, Popforms. Previously she worked at Microsoft and Amazon as well as startups like Decide, Moz, and Delve Networks.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00

contributed articles
DOI:10.1145/3122814
Answering questions correctly from standardized eighth-grade science tests is itself a test of machine intelligence.
BY CARISSA SCHOENICK, PETER CLARK, OYVIND TAFJORD, PETER TURNEY, AND OREN ETZIONI
Moving Beyond the Turing Test with the Allen AI Science Challenge

key insights
˽˽ Determining whether a system truly displays artificial intelligence is difficult and complex, and well-known assessments like the Turing Test are not suited to the task.
˽˽ The Allen Institute for Artificial Intelligence suggests that answering science exam questions successfully is a better measure of machine intelligence and designed a global competition to engage the research community in this approach.
˽˽ The outcome of the Allen AI Science Challenge highlights the current limitations of AI research in language understanding, reasoning, and commonsense knowledge; the highest scores are still limited to the capabilities of information-retrieval methods.

THE FIELD OF artificial intelligence has made great strides recently, as in AlphaGo’s victories in the game of Go over world champion South Korean Lee Sedol in March 2016 and top-ranked Chinese Go player Ke Jie in May 2017, leading to great optimism for the field. But are we really moving toward smarter machines, or are these successes restricted to certain classes of problems, leaving others untouched? In 2015, the Allen Institute for Artificial Intelligence (AI2) ran its first Allen AI Science Challenge, a competition to test machines on an ostensibly difficult task: answering eighth-grade science questions. Our motivations were to encourage the field to set its sights more broadly by exploring a problem that appears to require modeling,
reasoning, language understanding, and commonsense knowledge in order to probe the state of the art while sowing the seeds for possible future breakthroughs.
Challenge problems have historically played an important role in motivating and driving progress in research. For a field striving to endow machines with intelligent behavior (such as language understanding and reasoning), challenge problems that test such skills are essential.
In 1950, Alan Turing proposed the now well-known Turing Test as a possible test of machine intelligence: If a system can exhibit conversational behavior that is indistinguishable from that of a human during a conversation, that system could be considered intelligent.11 As the field of AI has grown, the test has become less meaningful as a challenge task for several reasons. First, in its details, it is not well defined (such as Who is the person giving the test?). A computer scientist would likely know good distinguishing questions to ask, while a random member of the general public may not. What constraints are there on the interaction? What guidelines are provided to the judges? Second, recent Turing Test competitions have shown that, in certain formulations, the test itself is gameable; that is, people can be fooled by systems that simply retrieve sentences and make no claim of being intelligent.2,3 John Markoff of The New York Times wrote that the Turing Test is more a test of human gullibility than machine intelligence. Finally, the test as originally conceived is pass/fail rather than scored, thus providing no measure of progress toward a goal, something essential for any challenge problem.a,b

a Turing himself did not conceive of the Turing Test as a challenge problem to drive the field forward but rather as a thought experiment to explore a useful alternative to the question Can machines think?
b Although one can imagine metrics that quantify performance on the Turing Test, the imprecision of the task definition and human variability make it difficult to define metrics that are reliably reproducible.
PHOTO BY PANITAN PHOTO, WITH ROBOT ILLUSTRATION BY PETER CROWTHER ASSOCIATES

Machine intelligence today is viewed less as a binary pass/fail attribute and
more as a diverse collection of capabilities associated with intelligent behavior. Rather than a single test, cognitive scientist Gary Marcus of New York University and others have proposed the notion of a series of tests (a Turing Olympics of sorts) that could assess the full gamut of AI, from robotics to natural language processing.9,12
Our goal with the Allen AI Science Challenge was to operationalize one such test: answering science-exam questions. Clearly, the Science Challenge is not a full test of machine intelligence but does explore several capabilities strongly associated with intelligence, capabilities our machines need if they are to reliably perform the smart activities we desire of them in the future, including language understanding, reasoning, and use of commonsense knowledge. Doing well on the challenge appears to require significant advances in AI technology, making it a potentially powerful way to advance the field. Moreover, from a practical point of view, exams are accessible, measurable, understandable, and compelling.
One of the most interesting and appealing aspects of science exams is their graduated and multifaceted nature; different questions explore different types of knowledge, varying substantially in difficulty, especially for a computer. There are questions that are easily addressed with a simple fact lookup, like this

How many chromosomes does the human body cell contain?
(A) 23
(B) 32
(C) 46
(D) 64

Then there are questions requiring extensive understanding of the world, like this

City administrators can encourage energy conservation by
(A) lowering parking fees
(B) building larger parking lots
(C) decreasing the cost of gasoline
(D) lowering the cost of bus and subway fares

This question requires the knowledge that certain activities and incentives result in human behaviors that in turn result in more or less energy being consumed. Understanding the question also requires the system being able to recognize that “energy” in this context refers to resource consumption for the purposes of transportation, as opposed to other forms of energy one might find in a science exam (such as electrical and kinetic/potential).

AI vs. Eighth Grade
To put this approach to the test, AI2 designed and hosted The Allen AI Science Challenge, a four-month-long competition in partnership with Kaggle (https://www.kaggle.com/) that began in October 2015 and concluded in February 2016.7 Researchers worldwide were invited to build AI software that could answer standard eighth-grade multiple-choice science questions. The competition aimed to assess the state of the art in AI systems utilizing natural language understanding and knowledge-based reasoning; how accurately the participants’ models could answer the exam questions would serve as an indicator of how far the field has come in these areas.
Participants. A total of 780 teams participated during the model-building phase, with 170 of them eventually submitting a final model. Participants were required to make the code for their models available to AI2 at the close of the competition to validate model performance and confirm they followed contest rules. At the conclusion of the competition, the winners were also expected to make their code open source. The three teams achieving the highest scores on the challenge’s test set received prizes of $50,000, $20,000, and $10,000, respectively.
Data. AI2 licensed a total of 5,083 eighth-grade multiple-choice science questions from providing partners for the purposes of the competition. All questions were standard multiple-choice format, with four answer options, as in the earlier examples. From this collection, we provided participants with a set of 2,500 training questions to train their models. We used a validation set of 8,132 questions during the course of the competition for confirming model performance. Only 800 of the validation questions were legitimate; we artificially generated the rest to disguise the real questions in order to prevent cheating via manual question answering or unfair advantage of additional training examples. A week before the end of the competition, we provided participants with the final test set of 21,298 questions (including the validation set), of which 2,583 were legitimate, to use to produce a final score for their models. We licensed the data for the competition from private assessment-content providers that did not wish to allow the use of their data beyond the constraints of the competition, though AI2 made some subsets of the questions available on its website http://allenai.org/data.
Baselines and scores. As these questions were all four-way multiple choice, a standard baseline score using random guessing was 25%. AI2 also generated a baseline score using a Lucene search over the Wikipedia corpus, producing scores of 40.2% on the training set and 40.7% on the final test set. The final results of the competition were quite close, with the top three teams achieving scores with a spread of only 1.05%. The highest score was 59.31%.

First Place
Top prize went to Chaim Linhart of Hod HaSharon, Israel (Kaggle data science website https://www.kaggle.com username Cardal). His model achieved a final score of 59.31% correct on the test question set of 2,583 questions using a combination of 15 gradient-boosting models, each with a different subset of features. Unlike the other winners’ models, Linhart’s model predicted the correctness of each answer option individually. Linhart used two general categories of features to make these predictions; the first consisted of information-retrieval-based features, applied by searching over corpora he compiled from various sources (such as study-guide or quiz-building websites, open source textbooks, and Wikipedia). His searches used various weightings and stemmed words to optimize performance. The other flavor of features used in his ensemble of 15 models was based on properties of the questions themselves (such as length of question and answer, form of answer like numeric answer options, answers containing referential clauses like “none of the above” as an option, and relationships among answer options).
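
The question-form properties listed for the first-place entry (length of question and answer, numeric answer options, referential options like “none of the above,” and relationships among options) can be made concrete with a minimal Python sketch. Everything named here is an illustrative assumption rather than Linhart's actual code: the function name, the feature names, the list of referential phrases, and the pattern used to detect numeric options are all hypothetical.

```python
import re

# Hypothetical list of referential option phrases (an assumption, not from the article's data).
REFERENTIAL_PHRASES = ("none of the above", "all of the above")

# Assumed pattern for a purely numeric option such as "23" or "1,000.5".
NUMERIC_PATTERN = re.compile(r"[-+]?\d[\d,]*(\.\d+)?")


def question_form_features(question, options):
    """Return one dict of surface-form features per answer option."""
    option_lengths = [len(option.split()) for option in options]
    mean_length = sum(option_lengths) / len(option_lengths)
    numeric_flags = [bool(NUMERIC_PATTERN.fullmatch(option.strip())) for option in options]

    features = []
    for option, n_words, is_numeric in zip(options, option_lengths, numeric_flags):
        features.append({
            "question_words": len(question.split()),  # length of the question
            "answer_words": n_words,                  # length of this option
            "is_numeric": is_numeric,                 # form of answer
            "all_numeric": all(numeric_flags),        # relationship among options
            "is_referential": option.strip().lower() in REFERENTIAL_PHRASES,
            "length_vs_mean": n_words - mean_length,  # relationship among options
        })
    return features


if __name__ == "__main__":
    rows = question_form_features(
        "How many chromosomes does the human body cell contain?",
        ["23", "32", "46", "64"],
    )
    for row in rows:
        print(row)
```

Features like these say nothing about science; they only capture regularities in how exam questions are written, which is why a model would need them alongside retrieval-based features rather than instead of them.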

Linhart explained that he used several smaller gradient-boosting models instead of one big model to maximize diversity. One big model tends to ignore some important features because it requires a very large training set to ensure it pays attention to all potentially useful features present. Linhart’s use of several small models required that the learning algorithm use features it would otherwise ignore, an advantage, given the relatively limited training data available in the competition.
The information-retrieval-based features alone could achieve scores as high as 55% by Linhart’s estimation. His question-form features filled in some remaining gaps to bring the system up to approximately 60% correct. He combined his 15 models using a simple weighted average to yield the final score for each choice. He credited careful corpus selection as one of the primary elements driving the success of his model.

Second Place
The second-place team, with a score of 58.34%, was from a social-media-analytics company based in Luxembourg called Talkwalker (https://www.talkwalker.com), led by Benedikt Wilbertz (Kaggle username poweredByTalkwalker).
The Talkwalker team built a relatively large corpus compared to other winning models, using 180GB of disk space after indexing with Lucene. Feature types included information-retrieval-based features, vector-based features (scoring question-answer similarity by comparing vectors from word2vec, a two-layer neural net that processes text, and GloVe, an unsupervised learning algorithm for obtaining vector representations for words), pointwise mutual information features (measured between the question and target answer, calculated on the team’s large corpus), and string hashing features in which term-definition pairs were hashed and a supervised learner was then trained to classify pairs as correct or incorrect. A final model used them to learn pairwise ranking between the answer options using the XGBoost library, an implementation of gradient-boosted decision trees.
Wilbertz’s use of string hashing features was unique, neither tried by the other two winners nor currently used in AI2’s Project Aristo. His team used a corpus of terms and definitions obtained from an educational-flashcard-building site, then created negative examples by mixing terms with random definitions. A supervised classifier was trained on these incorrect pairs, and the output was used to generate features for input to XGBoost.

Third Place
The third-place winner was Alejandro Mosquera from Reading, U.K. (Kaggle username Alejandro Mosquera), with a score of 58.26%. Mosquera approached the challenge as a three-way classification problem for each pair of answer options. He transformed answer choices A, B, C, and D to all 12 possible pairs (A,B), (A,C), ..., (D,C), which he labeled with three classes: left-pair element is correct; right is correct; or neither is correct. He then classified the pairs using logistic regression. This three-way classification is easier for supervised learning algorithms than the more natural two-way (correct vs. incorrect) classification with four choices, because the two-way classification requires an absolute decision about a choice, whereas the three-way classification requires only a relative ranking of the choices. Mosquera made use of three types of features: information-retrieval-based features based on scores from Elastic Search using Lucene over a corpus; vector-based features that measured question-answer similarity by comparing vectors from word2vec; and question-form features that considered such aspects of the data as the structure of a question, length of a question, and answer choices. Mosquera also noted that careful corpus selection was crucial to his model’s success.

Lessons
In the end, each of the winning models gained from information-retrieval-based methods, indicative of the state of AI technology in this area of research. AI researchers intent on creating a machine with human-like intelligence are unable to ace an eighth-grade science exam because they do not currently have AI systems able to go beyond surface text to a deeper understanding of the meaning underlying each question, then use reasoning to find the appropriate answer. All three winners said it was clear that applying a deeper, semantic level of reasoning with scientific knowledge to the questions and answers would be the
key to achieving scores of 80% and higher and demonstrating what might be considered true artificial intelligence.
A few other example questions each of the top three models got wrong highlight the more interesting, complex nuances of language and chains of reasoning an AI system must be able to handle in order to answer the following questions correctly and for which information-retrieval methods are not sufficient:

What do earthquakes tell scientists about the history of the planet?
(A) Earth’s climate is constantly changing.
(B) The continents of Earth are continually moving.
(C) Dinosaurs became extinct about 65 million years ago.
(D) The oceans are much deeper today than millions of years ago.

This involves the causes behind earthquakes and the larger geographic phenomena of plate tectonics and is not easily solved by looking up a single fact. Additionally, other true facts appear in the answer options (“Dinosaurs became extinct about 65 million years ago.”) but must be intentionally identified and discounted as incorrect in the context of the question.

Which statement correctly describes a relationship between the distance from Earth and a characteristic of a star?
(A) As the distance from Earth to the star decreases, its size increases.
(B) As the distance from Earth to the star increases, its size decreases.
(C) As the distance from Earth to the star decreases, its apparent brightness increases.
(D) As the distance from Earth to the star increases, its apparent brightness increases.

This requires general commonsense-type knowledge of the physics of distance and perception, as well as the semantic ability to relate one statement to another within each answer option to find the right directional relationship.

Other Attempts
While numerous question-answering systems have emerged from the AI community, none has addressed the challenges of scientific and commonsense reasoning required to successfully answer these example questions. Question-answering systems developed for the message-understanding conferences6 and text-retrieval conferences13 have historically focused on retrieving answers from text, the former from newswire articles, the latter from various large corpora (such as the Web, microblogs, and clinical data). More recent work has focused on answer retrieval from structured data (such as “In which city was Bill Clinton born?” from Freebase, a large publicly available collaborative knowledgebase).4,5,15 However, these systems rely on the information being stated explicitly in the underlying data and are unable to perform the reasoning steps that would be required to conclude this information from indirect supporting evidence.
A few systems attempt some form of reasoning: Wolfram Alpha14 answers mathematical questions, provided they are stated either as equations or with relatively simple English; Evi10 is able to combine facts to answer simple questions (such as “Who is older: Barack or Michelle Obama?”); and START8 likewise is able to answer simple inference questions (such as “What South American country has the largest population?”) using Web-based databases. However, none of them attempts the level of complex question processing and reasoning that is indeed required to successfully answer many of the science questions in the Allen AI Challenge.

Looking Forward
As the 2015 Allen AI Science Challenge demonstrated, achieving a high score on a science exam requires a system that can do more than sophisticated information retrieval. Project Aristo at AI2 is focused on the problem of successfully demonstrating artificial intelligence using standardized science exams, developing an assortment of approaches to address the challenge. AI2 plans to release additional datasets and software for the wider AI research community in this effort.1

References
1. Allen Institute for Artificial Intelligence. Datasets; http://allenai.org/data
2. Aron, J. Software tricks people into thinking it is human. New Scientist 2829 (Sept. 6, 2011).
3. BBC News. Computer AI passes Turing Test in ‘world first.’ BBC News (June 9, 2014); http://www.bbc.com/news/technology-27762088
4. Berant, J., Chou, A., Frostig, R., and Liang, P. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (Seattle, WA, Oct. 18–21). Association for Computational Linguistics, Stroudsburg, PA, 2013, 6.
5. Fader, A., Zettlemoyer, L., and Etzioni, O. Open question answering over curated and extracted knowledge bases. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, Aug. 24–27). ACM Press, New York, 2014.
6. Grishman, R. and Sundheim, B. Message understanding Conference-6: A brief history. In Proceedings of the 16th Conference on Computational Linguistics (Copenhagen, Denmark, Aug. 5–9). Association for Computational Linguistics, Stroudsburg, PA, 1996, 466–471.
7. Kaggle. The Allen AI Science Challenge; https://www.kaggle.com/c/the-allen-ai-science-challenge
8. Katz, B., Borchardt, G., and Felshin, S. Natural language annotations for question answering. In Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference (Melbourne Beach, FL, May 11–13). AAAI Press, Menlo Park, CA, 2006.
9. Marcus, G., Rossi, F., and Veloso, M., Eds. Beyond the Turing Test. AI Magazine (Special Edition) 37, 1 (Spring 2016).
10. Simmons, J. True Knowledge: The natural language question answering Wikipedia for facts. Semantic Focus (Feb. 26, 2008); http://www.semanticfocus.com/blog/entry/title/true-knowledge-the-natural-language-question-answering-wikipedia-for-facts/
11. Turing, A.M. Computing machinery and intelligence. Mind 59, 236 (Oct. 1950), 433–460.
12. Turk, V. The plan to replace the Turing Test with a ‘Turing Olympics.’ Motherboard (Jan. 28, 2015); https://motherboard.vice.com/en_us/article/the-plan-to-replace-the-turing-test-with-a-turing-olympics
13. Voorhees, E. and Ellis, A., Eds. In Proceedings of the 24th Text REtrieval Conference (Gaithersburg, MD, Nov. 17–20). Publication SP 500-319, National Institute of Standards and Technology, Gaithersburg, MD, 2015.
14. Wolfram, S. Making the world’s data computable. Stephen Wolfram Blog (Sept. 24, 2010); http://blog.stephenwolfram.com/2010/09/making-the-worlds-data-computable/
15. Yao, X. and Van Durme, B. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Baltimore, MD, June 22–27). Association for Computational Linguistics, Stroudsburg, PA, 2014, 956–966.

Carissa Schoenick (carissas@allenai.org) is the senior program manager for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA.
Peter Clark (peterc@allenai.org) is the senior research manager for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA.
Oyvind Tafjord (oyvindt@allenai.org) is a senior research scientist and engineer at the Allen Institute for Artificial Intelligence in Seattle, WA.
Peter Turney (peter.turney@gmail.com) was a senior research scientist for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA, and is now retired.
Oren Etzioni (orene@allenai.org) is the Chief Executive Officer of the Allen Institute for Artificial Intelligence in Seattle, WA, and a professor in the Allen School for Computer Science at the University of Washington in Seattle, WA.

Copyright held by the authors. Publication rights licensed to ACM. $15.00

Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/moving-beyond-the-turing-test

DOI:10.1145/3122803
Even when checked by fact checkers, facts are often still open to preexisting bias and doubt.
BY PETTER BAE BRANDTZAEG AND ASBJØRN FØLSTAD
Trust and Distrust in Online Fact-Checking Services

key insights
˽˽ Though fact-checking services play an important role countering online disinformation, little is known about whether users actually trust or distrust them.
˽˽ The data we collected from social media discussions (on Facebook, Twitter, blogs, forums, and discussion threads in online newspapers) reflects users’ opinions about fact-checking services.
˽˽ To strengthen trust, fact-checking services should strive to increase transparency in their processes, as well as in their organizations, and funding sources.

WHILE THE INTERNET has the potential to give people ready access to relevant and factual information, social media sites like Facebook and Twitter have made filtering and assessing online content increasingly difficult due to its rapid flow and enormous volume. In fact, 49% of social media users in the U.S. in 2012 received false breaking news through social media.8 Likewise, a survey by Silverman11 suggested in 2015 that false rumors and misinformation disseminated further and faster than ever before due to social media. Political analysts continue to discuss misinformation and fake news in social media and its effect on the 2016 U.S. presidential election.
Such misinformation challenges the credibility of the Internet as a venue for authentic public information and debate. In response, over the past five years, a proliferation of outlets has provided fact checking and debunking of online content. Fact-checking services, say Kriplean et al.,6 provide “… evaluation of verifiable claims made in public statements through investigation of primary and secondary sources.” An international
census from 2017 counted 114 active fact-checking services, a 19% increase over the previous year.12 To benefit from this trend, Google News in 2016 let news providers tag news articles or their content with fact-checking information “… to help readers find fact checking in large news stories.”3 Any organization can use the fact-checking tag, if it is non-partisan, transparent, and targets a range of claims within an area of interest and not just one single person or entity.
However, research into fact checking has scarcely paid attention to the general public’s view of fact checking, focusing instead on how people’s beliefs and attitudes change in response to facts that contradict their own preexisting opinions. This research suggests fact checking in general may be unsuccessful at reducing misperceptions, especially among the people most prone to believe them.9 People often ignore facts that contradict their current beliefs,2,13 particularly in politics and controversial social issues.9 Consequently, the more political or controversial issues a fact-checking service covers, the more it needs to build a reputation for usefulness and trustworthiness.
Research suggests the trustworthiness of fact-checking services depends on their origin and ownership, which may in turn affect integrity perceptions10 and the transparency of their fact-checking process.4 Despite these observations, we are unaware of any other research that has examined users’ perceptions of these services. Addressing the gap in current knowledge, we investigated the research question: How do social media users perceive the trustworthiness and usefulness of fact-checking services?

Figure 1. Categorization of fact-checking services based on areas of concern.
Fact-checking services’ areas of concern:
Online rumors and hoaxes: Snopes.com; Hoax-Slayer; TruthOrFiction.com; HoaxBusters; Viralgranskaren - Metro
Political and public claims: FactCheck.org; PolitiFact; The Washington Post Fact Checker; CNN Reality Check; Full Fact
Specific topics or controversies: StopFake; TruthBeTold; #RefugeeCheck; Climate Feedback; Brown Moses Blog (continued as Bellingcat)

Figure 2. Example of Snopes debunking a social media rumor on Twitter (March 6, 2016); https://twitter.com/snopes/status/706545708233396225

Figure 3. Outline of our research approach; posts collected October 2014 to March 2015.
[Flow: blogs, discussion forums, and online newspaper comments; Meltwater search; data corpus (1,741 posts); filter irrelevant posts; dataset (595 posts); content analysis; findings on trustworthiness and usefulness.]

Table 1. Coding scheme we used to analyze the data.
Theme | Sentiment | Service described as
Usefulness | Positive | Useful, serving the purpose of fact checking
Usefulness | Negative | Not as useful, often derogatory
Ability | Positive | Reputable, expert, or acclaimed
Ability | Negative | Lacking expertise or credibility
Benevolence | Positive | Aiming for greater (social) good
Benevolence | Negative | Suspected of (social) ill will (such as through conspiracy, propaganda, or fraud)
Integrity | Positive | Independent or impartial
Integrity | Negative | Dependent or partially or politically biased

Fact-checking services differ in terms of their organizational aim and funding,10 as well as their areas of concern,11 which in turn may affect their trustworthiness. As outlined in Figure 1, the universe of fact-checking services can be divided into three general categories based on their area(s) of concern: political and public statements in general, corresponding to the fact checking of politicians, as discussed by Nyhan and Reifler;9 online rumors and hoaxes, reflecting the need for debunking services, as discussed by Silverman;11

and specific topics or controversies or particular conflicts or narrowly scoped issues or events (such as the ongoing Ukraine conflict).
We have focused on three services (Snopes, FactCheck.org, and StopFake), all included in the Duke Reporters’ Lab’s online overview of fact checkers (http://reporterslab.org/fact-checking/). They represent three categories of fact checkers, from online rumors to politics to a particular topic, as in Figure 1, and differences in organization and funding. As a measure of their popularity, as of June 20, 2017, Snopes had 561,650 likes on Facebook, FactCheck.org 806,814, and StopFake 52,537.
We study Snopes because of its aim to debunk online rumors, fitting the first category in Figure 1. This aim is shared by other such services, including HoaxBusters and the Swedish service Viralgranskaren. Snopes is managed by a small volunteer organization that has emerged from a single-person initiative and is funded through advertising revenue.
We study FactCheck.org because it monitors the factual accuracy of what is said by major political figures. Other such services include PolitiFact (U.S.) and Full Fact (U.K.) in the second category in Figure 1. FactCheck.org is a project of the Annenberg Public Policy Center of the Annenberg School for Communication at the University of Pennsylvania, Philadelphia, PA. FactCheck.org is supported by university funding and individual donors and has been a source of inspiration for other fact-checking projects.
We study StopFake because it addresses one highly specific topic: the ongoing Ukraine conflict. It thus resembles other highly focused fact-checking initiatives (such as #Refugeecheck, which fact checks reports on the refugee crises in Europe). StopFake is an initiative by the Kyiv Mohyla Journalism School in Kiev, Ukraine, and is thus a European-based service. Snopes and FactCheck.org are U.S. based, as are more than a third of the fact-checking services identified by Duke Reporters’ Lab.12
All three provide fact checking through their own websites, as well as through Facebook and Twitter. Figure 2 is an example of a Twitter post with content checked by Snopes.

Analyzing Social Media Conversations
To explore how social media users perceive the trustworthiness and usefulness of these services, we applied a research approach designed to take advantage of unstructured social media conversations (see Figure 3).
While investigations of trust and usefulness often rely on structured data from questionnaire-based surveys, social media conversations represent a highly relevant data source for our purpose, as they arguably reflect the raw, authentic perceptions of social media users. Xu et al.16 claim it is beneficial to listen to, analyze, and understand citizens’ opinions through social media to improve societal decision-making processes and solutions. They wrote, for example, “Social media analytics has been applied to explain, detect, and predict disease outbreaks, election results, macroeconomic processes (such as crime detection), (…) and financial markets (such as stock price).”16 Social media conversations take place in the everyday context of users likely to be engaged in fact-checking services. This approach may provide a more unbiased view of people’s perceptions than, say, a questionnaire-based approach. The benefit of gathering data from users in their specific social media context does not imply that our data is representative. Our data lacks important information about user demographics, limiting our ability to claim generality for the entire user population. Despite this potential drawback, however, our data does offer new insight into how social media users view the usefulness and trustworthiness of various categories of fact-checking services.
For data collection, we used Meltwater Buzz, an established service for social media monitoring, crawling data from social media conversations in blogs, discussion forums, online newspaper discussion threads, Twitter, and Facebook. Meltwater Buzz crawls all blogs (such
as https://wordpress.com/), discussion forums (such as https://offtopic.com/), and online newspapers (such as https://www.washingtonpost.com/) requested by Meltwater customers, thus representing a large, though convenient, sample. It collects various amounts of data from each platform; for example, it crawls all posts on Twitter but only the Facebook pages with 3,500 likes or groups of more than 500 members. This limitation in Facebook data partly explains why the overall number of posts we collected (1,741) was not higher.

To collect opinions about social media user perceptions of Snopes and FactCheck.org, we applied the search term “[service name] is,” as in “Snopes is,” “FactCheck.org is,” and “FactCheck is.” We intended it to reflect how people start a sentence when formulating their opinions. StopFake is a lesser-known service. We thus selected a broader search string, “StopFake,” to be able to collect enough relevant opinions. The searches returned a data corpus of 1,741 posts over six months, October 2014 to March 2015, as in Figure 3. By “posts,” we mean written contributions by individual users. To create a sufficient dataset for analysis, we removed all duplicates, as well as a small number of non-relevant posts lacking personal opinions about fact checkers. This filtering process resulted in a dataset of 595 posts.

Figure 4. Positive and negative posts related to trustworthiness and usefulness per fact-checking service (in %); “other” refers to posts not relevant for the research categories (N = 595 posts).
[Bar chart. Series: Snopes (n = 385), FactCheck.org (n = 80), StopFake (n = 130). Categories: Positive (total); Negative (total); Usefulness, Ability, Benevolence, and Integrity (positive); Usefulness, Ability, Benevolence, and Integrity (negative); Other. Axis: 0% to 100%.]

Table 2. Snopes and themes we analyzed (n = 385).
Theme | Sentiment | Example
Usefulness | Positive (21%) | Snopes is a wonderful Website for verifying things seen online; it is at least a starting point for research.
Usefulness | Negative (10%) | Snopes is a joke. Look at its Boston bombing debunking failing to debunk the worst hoax ever ...
Ability | Positive (6%) | […] Snopes is a respectable source for debunking wives’ tales, urban legends, even medical myths ...
Ability | Negative (24%) | Heh ... Snopes is a man and a woman with no investigative background or credentials who form their opinions solely on Internet research; they don’t interview anyone. […]
Benevolence | Positive (0%) | No posts
Benevolence | Negative (21%) | You show your Ignorance by using Snopes … Snopes is a NWO Disinformation System designed to fool the Masses ... SORRY. I Believe NOTHING from Snopes. Snopes is a Disinformation vehicle of the Elitist NWO Globalists. Believe NOTHING from them ... […]
Integrity | Positive (2%) | Snopes is a standard, rather dull fact-checking site, nailing right and left equally. […]
Integrity | Negative (44%) | Snopes is a leftist outlet supported with money from George Soros. Whatever Snopes says I take with a grain of salt ...

We then performed content analysis, coding all posts to identify and investigate patterns within the data1 and reveal the perceptions users express in social media about the three fact-checking services we investigated. We analyzed their perceptions of the usefulness of fact-checking services through a usefulness construct similar to the one used by Tsakonas et al.14 “Usefulness” concerns the extent to which the service is perceived as beneficial when doing a specific fact-checking task, often illustrated by positive recommendations and characterizations (such as the service is “good” or “great”). Following Mayer et al.’s theoretical framework,7 we categorized trustworthiness according to the perceived ability, benevolence, and integrity of the services. “Ability” concerns the extent to which a service is perceived as having available the needed skills and expertise, as well as being reputable and well regarded. “Benevolence” refers to the extent to which a service is perceived as intending to do good, beyond what would be expected from an egocentric motive. “Integrity” targets the extent to which a service is generally viewed as adhering to an acceptable set of principles, in particular being independent, unbiased, and fair.

Since we found posts typically reflect rather polarized perceptions of the studied services, we also grouped the codes manually according to sentiment, positive or negative. Some posts described the services in a plain and objective manner. We thus coded them using a positive sentiment (see Table 1) because they refer to the

service as a source for fact checking, and users are likely to reference fact-checking sites because they see them as useful.

For reliability, both researchers in the study did the coding. One coded all the posts, and the second then went through all the assigned codes, a process repeated twice. Finally, both researchers went through all comments for which an alternative code had been suggested to decide on the final coding; an alternative coding had been recommended for 153 posts (or 26%).

A post could include more than one of the analytical themes, so 30% of the posts were coded as addressing two or more themes.

Results

Despite the potential benefits of fact-checking services, Figure 4 shows that the majority of the posts on the two U.S.-based services expressed negative sentiment, with Snopes at 68% and FactCheck.org at 58%. Most posts on the Ukraine-based StopFake (78%) reflected positive sentiment.

The stated reasons for negative sentiment typically concerned one or more of the trustworthiness themes rather than usefulness. For example, for Snopes and FactCheck.org, the negative posts often expressed concern over lack of integrity due to perceived bias toward the political left. Negative sentiment pertaining to the ability and benevolence of the services was also common. The few critical comments on usefulness were typically aimed at discrediting a service, by, say, characterizing it as “satirical” or as “a joke.”

Positive posts were more often related to usefulness. For example, the stated reasons for positive sentiment toward StopFake typically concerned the service’s usefulness in countering pro-Russian propaganda and trolling and in the information war associated with the ongoing Ukraine conflict.

In line with a general notion of an increasing need to interpret and act on information and misinformation in social media,6,11 some users included in the study discussed fact-checking sites as important elements of an information war.

Snopes. The examples in Table 2 reflect how negative sentiment in the posts we analyzed on Snopes was rooted in issues pertaining to trustworthiness. Integrity issues typically involved a perceived “left-leaning” political bias in the people behind the service. Pertaining to benevolence, users in the study said Snopes is part of a larger left-leaning or “liberal” conspiracy often claimed to be funded by George Soros, whereas comments on ability typically targeted lack of expertise in the people running the service. Some negative comments on trustworthiness may be seen as a rhetorical means of discrediting a service. Posts expressing positive sentiment mainly argue for the usefulness of the service, claiming that Snopes is, say, a useful resource for checking up on the veracity of Internet rumors.

Table 3. FactCheck.org and themes we analyzed (n = 80).
Theme | Sentiment | Example
Usefulness | Positive (25%) | […] You obviously haven’t listened to what they say. Also, I hate liars. FactCheck is a great tool.
Usefulness | Negative (3%) | Anyway, “FactCheck” is a joke […]
Ability | Positive (6%) | The media sources I use must pass a high credibility bar. FactCheck.org is just one of the resources I use to validate what I read ...
Ability | Negative (16%) | […] FactCheck is NOT a confidence builder; see its rider and sources, Huffpo articles … REALLY?
Benevolence | Positive (0%) | No posts
Benevolence | Negative (25%) | FactCheck studies the factual correctness of what major players in U.S. politics say in TV commercials, debates, talks, interviews, and news presentations, then tries to present the best possible fictional and propaganda-like version for its target […]
Integrity | Positive (19%) | When you don’t like the message, blame the messenger. FactCheck is nonpartisan. It's just that conservatives either lie or are mistaken more ...
Integrity | Negative (39%) | FactCheck is left-leaning opinion. It doesn’t check facts ...

Table 4. StopFake and themes we analyzed (n = 130); note * also coded as integrity/positive.
Theme | Sentiment | Example
Usefulness | Positive (72%) | Don’t forget a strategic weapon of the Kremlin is the “web of lies” spread by its propaganda machine; see antidote http://www.stopfake.org/en/news
Usefulness | Negative (2%) | […] StopFake! HaHaHa. You won, I give up. Next time I will quote “Saturday Night Live”; there is more truth:)) ...
Ability | Positive (2%) | […] by the way, the website StopFake.org is a very objective and accurate source exposing Russian propaganda and disinformation techniques. […]*
Ability | Negative (2%) | […] Ha Ha … a flow of lies is constantly sent out from the Kremlin. Really. If so, StopFake needs updates every hour, but the best way it can do that is to find low-grade blog content and make it appear as if it was produced by Russian media […]
Benevolence | Positive (4%) | […] StopFake is devoted to exposing Russian propaganda against the Ukraine. […]
Benevolence | Negative (14%) | So now you acknowledge StopFake is part of Kiev’s propaganda. I guess that answers my question […]
Integrity | Positive (2%) | […] by the way, the website StopFake.org is a very objective and accurate source exposing Russian propaganda and disinformation techniques. […]
Integrity | Negative (11%) | […] Why should I give any credence to StopFake.org? Does it ever criticize the Kiev regime, in favor of the Donbass position? […]

FactCheck.org. The patterns in the posts we analyzed for FactCheck.org resemble those for Snopes. As in Table 3, the most frequently mentioned trustworthiness concerns related to service integrity; as for Snopes, users said the service is politically biased toward the left. Posts concerning benevolence and ability were also relatively frequent, reflecting user
concern regarding the service as a contributor to propaganda or doubts about its fact-checking practices.

StopFake. As in Table 4, the results for StopFake show more posts expressing positive sentiment than we found for Snopes and FactCheck.org. In particular, the posts included in the study pointed out that StopFake helps debunk rumors seen as Russian propaganda in the Ukraine conflict.

Nevertheless, the general pattern in the reasons users gave us for positive and negative sentiment for Snopes and FactCheck.org also held for StopFake. The positive posts were typically motivated by usefulness, whereas the negative posts reflected the sentiment that StopFake is politically biased (“integrity”), a “fraud,” a “hoax,” or part of the machinery of Ukraine propaganda (“benevolence”).

Discussion

We found users with positive perceptions typically extolled the usefulness of fact-checking services, whereas users with negative opinions cited concerns over trustworthiness. This pattern emerged across all three services. In the following sections, we discuss how these findings provide new insight into trustworthiness as a key challenge when countering online rumors and misinformation2,9 and why ill-founded beliefs may have such online reach, even though the beliefs are corrected by prominent fact checkers, including Snopes, FactCheck.org, and StopFake.

Usefulness. Users in our sample with a positive view of the services mainly pointed to their usefulness. While everyone should exercise caution when comparing the various services, topic-specific StopFake is perceived as more useful than Snopes and FactCheck.org. One reason might be that a service targeting a specific topic faces less criticism because it attracts a particular audience that seeks facts supporting its own view. For example, StopFake users target anti-Russian, pro-Ukrainian readers. Another, more general, reason might be that positive perceptions are motivated by user needs pertaining to a perceived high load of misinformation, as in the case of the Ukraine conflict, where media reports and social media are seen as overflowing with propaganda. Others highlighted the general ease with which information may be filtered or separated from misinformation through sites like Snopes and FactCheck.org, as expressed like this: “As you pointed out, it doesn’t take that much effort to see if something on the Internet is legit, and Snopes is a great place to start. So why not take that few seconds of extra effort to do that, rather than creating and sharing misleading items.”

This finding suggests there is increasing demand for fact-checking services,6 while at the same time a substantial proportion of social media users who would benefit from such services do not use them sufficiently. The services should thus be even more active on social media sites like Facebook and Twitter, as well as in online discussion forums, where greater access to fact checking is needed.

Trustworthiness. Negative perceptions and opinions about fact-checking services seem to be motivated by basic distrust rather than rational argument. For some users in our sample, lack of trust extends beyond a particular service to encompass the entire social and political system. Users with negative perceptions thus seem trapped in a perpetual state of informational disbelief.

While one’s initial response to statements reflecting a state of informational disbelief may be to dismiss them as the uninformed paranoia of a minority of the public, the statements should instead be viewed as a source of user insight. The reason the services are often unsuccessful in reducing ill-founded perceptions9 and people tend to disregard fact checking that goes against their preexisting beliefs2,13 may be a lack of basic trust rather than a lack of fact-based arguments provided by the services.

We found such distrust is often highly emotional. In line with Silverman,11 fact-checking sites must be able to recognize how debunking and fact checking evoke emotion in their users. Hence, they may benefit from rethinking the way they design and present themselves to strengthen trust among users in a general state of informational disbelief. Moreover, users of online fact-checking sites should compensate for the lack of physical evidence online by being, say, demonstrably independent, impartial, and able to clearly distinguish fact from opinion. Rogerson10 wrote that fact-checking sites exhibit varying levels of rigor and effectiveness. The fact-checking process and even what are considered “facts” may in some cases involve subjective interpretation, especially when actors with partial ties aim to provide the service. For example, in the 2016 U.S. presidential campaign, the organization “Donald J. Trump for President” invited Trump’s supporters to join a fact-check initiative, similar to the category “topics or controversies,” urging “fact checking” the presidential debates on social media. However, the initiative was criticized as mainly promoting Trump’s views and candidacy.5

Table 5. Challenges and our related recommendations for fact-checking services.
Theme | Challenges | Recommendations
Usefulness | Unrealized potential in public use of fact-checking services | Increase presence in social media and discussion forums
Trustworthiness: Ability | Critique of expertise and reputation | Provide nuanced but simple overview of the fact-checking process where relevant sources are included
Trustworthiness: Benevolence | Suspicion of conspiracy and propaganda | Establish open policy on fact checking and open spaces for collaboration on fact checking
Trustworthiness: Integrity | Perception of bias and partiality | Ensure transparency on organization and funding and demonstrable impartiality in fact-checking process

Users of fact-checking sites ask: Who actually does the fact checking and how do they do it? What organizations are behind the process? And how does the nature of the organization influence the results of the fact checking? Fact-checking sites must thus explicate the nuanced, detailed process leading to the presented result while keeping it simple enough to be understandable and useful.11

Need for transparency. While fact-checker trustworthiness is critical, fact checkers represent but one set of voices in the information landscape and cannot be expected to be benevolent and unbiased just because they check facts. Rather, they must strive for transparency in their working process, as well as in their origins, organization, and funding sources.

To increase transparency in its processes, a service might try to take a more horizontal, collaborative approach than is typically seen in the current generation of services. Following Hermida’s recommendation4 to social media journalists, fact checkers could be set up as a platform for collaborative verification and genuine fact checking, relying less on centralized expertise. Forming an interactive relationship with users might also help build trust.6,7

Conclusion

We identified a lack of perceived trustworthiness and a state of informational disbelief as potential obstacles to fact-checking services reaching social media users most critical to such services. Table 5 summarizes our overall findings and discussions, outlining related key challenges and our recommendations for how to address them.

Given the exploratory nature of this study, we cannot conclude our findings are valid for all services. In addition, more research is needed to be able to make definite claims on systematic differences among the various fact checkers based on their “areas of concern.” Nevertheless, the consistent pattern in opinions we found across three prominent services suggests challenges and recommendations that can provide useful guidance for future development in this important area.

Acknowledgments

This work was supported by the European Commission co-funded FP7 project REVEAL (Project No. FP7-610928, http://www.revealproject.eu/) but does not necessarily represent the views of the European Commission. We also thank Marika Lüders of the University of Oslo and the anonymous reviewers for their insightful comments.

References

1. Ezzy, D. Qualitative Analysis. Routledge, London, U.K., 2013.
2. Friesen, J.P., Campbell, T.H., and Kay, A.C. The psychological advantage of unfalsifiability: The appeal of untestable religious and political ideologies. Journal of Personality and Social Psychology 108, 3 (Nov. 2014), 515–529.
3. Gingras, R. Labeling fact-check articles in Google News. Journalism & News (Oct. 13, 2016); https://blog.google/topics/journalism-news/labeling-fact-check-articles-google-news/
4. Hermida, A. Tweets and truth: Journalism as a discipline of collaborative verification. Journalism Practice 6, 5-6 (Mar. 2012), 659–668.
5. Jamieson, A. ‘Big League Truth Team’ pushes Trump’s talking points on social media. The Guardian (Oct. 10, 2016); https://www.theguardian.com/us-news/2016/oct/10/donald-trump-big-league-truth-team-social-media-debate
6. Kriplean, T., Bonnar, C., Borning, A., Kinney, B., and Gill, B. Integrating on-demand fact-checking with public dialogue. In Proceedings of the 17th ACM Conference on Computer-Supported Cooperative Work & Social Computing (Baltimore, MD, Feb. 15–19). ACM Press, New York, 2014, 1188–1199.
7. Mayer, R.C., Davis, J.H., and Schoorman, F.D. An integrative model of organizational trust. Academy of Management Review 20, 3 (1995), 709–734.
8. Morejon, R. How social media is replacing traditional journalism as a news source. Social Media Today Report (June 28, 2012); http://www.socialmediatoday.com/content/how-social-media-replacing-traditional-journalism-news-source-infographic
9. Nyhan, B. and Reifler, J. When corrections fail: The persistence of political misperceptions. Political Behavior 32, 2 (June 2010), 303–330.
10. Rogerson, K.S. Fact checking the fact checkers: Verification Web sites, partisanship and sourcing. In Proceedings of the American Political Science Association (Chicago, IL, Aug. 29–Sept. 1). American Political Science Association, Washington, D.C., 2013.
11. Silverman, C. Lies, Damn Lies, and Viral Content. How News Websites Spread (and Debunk) Online Rumors, Unverified Claims, and Misinformation. Tow Center for Digital Journalism, Columbia Journalism School, New York, 2015; http://towcenter.org/wp-content/uploads/2015/02/LiesDamnLies_Silverman_TowCenter.pdf
12. Stencel, M. International fact checking gains ground, Duke census finds. Duke Reporters’ Lab, Duke University, Durham, NC, Feb. 28, 2017; https://reporterslab.org/international-fact-checking-gains-ground/
13. Stroud, N.J. Media use and political predispositions: Revisiting the concept of selective exposure. Political Behavior 30, 3 (Sept. 2008), 341–366.
14. Tsakonas, G. and Papatheodorou, C. Exploring usefulness and usability in the evaluation of open-access digital libraries. Information Processing & Management 44, 3 (May 2008), 1234–1250.
15. Van Mol, C. Improving web survey efficiency: The impact of an extra reminder and reminder content on Web survey response. International Journal of Social Research Methodology 20, 4 (May 2017), 317–327.
16. Xu, C., Yu, Y., and Hoi, C.K. Hidden in-game intelligence in NBA players’ tweets. Commun. ACM 58, 11 (Nov. 2015), 80–89.

Petter Bae Brandtzaeg (pbb@sintef.no) is a senior research scientist at SINTEF in Oslo, Norway.
Asbjørn Følstad (asf@sintef.no) is a senior research scientist at SINTEF in Oslo, Norway.

© 2017 ACM 0001-0782/17/09 $15.00
review articles
DOI:10.1145/3096742
Exploring the many distinctive elements that make securing HPC systems much different than securing traditional systems.
BY SEAN PEISERT
Security in High-Performance Computing Environments
PHOTO BY GORODENKOFF VISUALS

key insights
- High-performance computing systems have some similarities and some differences with traditional IT computing systems, which present both challenges and opportunities.
- One challenge is that HPC systems are “high-performance” by definition, and so many traditional security techniques are not effective because they cannot keep up with the system or reduce performance.
- Many opportunities also exist: HPC systems tend to be used for very distinctive purposes, have much more regular and predictable activity, and contain highly custom hardware/software stacks. Each of these elements can provide a toehold for leveraging some aspect of the HPC platform to improve security.

HOW IS COMPUTER security different in a high-performance computing (HPC) context from a typical IT context? On the surface, a tongue-in-cheek answer might be, “just the same, only faster.” After all, HPC facilities are connected to networks the same way any other computer is, often run the same, typically Linux-based operating systems as do many other common computers, and have long been subject to many of the same styles of attacks, be they compromised credentials, system misconfiguration, or software flaws. Such attacks have ranged from the “wily hacker” who broke into U.S. Department of Energy (DOE) and U.S. Department of Defense (DOD) computing systems in the mid-1980s,42 to the “Stakkato” attacks against NCAR, DOE, and NSF-funded supercomputing centers in the mid-2000s,24,39 to the thousands of probes, scans, brute-force login attempts, and buffer overflow vulnerabilities that continue to plague high-performance computing facilities today.

On the other hand, some HPC systems run highly exotic hardware and software stacks. In addition, HPC systems have very different purposes and modes of use than most general-purpose computing systems, of either the desktop or server variety. This fact means that aside from all of the normal reasons that any network-connected computer might be attacked, HPC computers have their own distinct systems, resources, and assets that an attacker might target, as well as their own distinctive attributes that make securing such systems somewhat distinct from securing other types of computing systems.

The fact that computer security is context- and mission-dependent should not be surprising to security professionals: “security policy is a statement of what is, and what is not, allowed,”7 and each organization will therefore have a somewhat distinctive security policy. For example, a mechanism designed to enforce a particular policy considered essential for security by one site might be considered a denial of service to legitimate users of another site, and how a smartphone is protected is distinct from how a desktop computer is protected. Thus, for HPC systems, we must ask what is the desired functioning of the system so that we can establish what the security policies are and better understand the mechanisms with which those policies can be enforced.

On the other hand, historically, security for HPC systems has not necessarily been treated as distinct from general-purpose computing, except, typically, making sure that security does not get in the way of performance or usability. While laudable, this article argues that this assessment of HPC’s distinctiveness is incomplete.

This article focuses on four key themes surrounding this issue:
review articles
The first theme is that HPC systems regular and predictable mode of opera- even in open science, data leakage is
are optimized for high performance tion, which changes the way security certainly an issue and a threat, this ar-
by definition. Further, they tend to be can be enforced. ticle focuses more on integrity related
used for very distinctive purposes, no- As a final aside, many, but by no threats,31,32 including alteration of code
tably mathematical computations. means all HPC systems are often ex- or data, or misuse of computing cycles,
The second theme is that HPC tremely open systems from a security and availability related threats, in-
systems tend to have very distinctive standpoint, and may be used by scien- cluding disruption or denial of service
modes of operation. For example, com- tists worldwide whose identities have against HPC systems or networks that
pute nodes in an HPC system may be never been validated. Increasingly, we connect them.
accessed exclusively through some are also starting to see HPC systems in Computations that are incorrect
kind of scheduling system on a login which computation and visualization for non-malicious reasons, including
node in which it is typical for a single are more tightly coupled and, a human flaws in application code, such as gen-
program or common set of programs manipulates the inputs to the computa- eral logic errors, round-off errors, non-
to run in sequence. And, even on that tion itself in near-real time. determinism in parallel algorithms,
login node, from which the computa- This distinctiveness presents both unit conversion errors,20 as well as in-
tion is submitted to the scheduler, it opportunities and challenges. This correct assumptions by users about the
may be the case that an extremely nar- article discusses the basis for these hardware they are running on, are vital
row range of programs exist compared themes and the conclusions for secu- issues, but beyond the scope of this ar-
to those commonly found on general- rity for these systems. ticle, due to length and the fact those is-
use computing systems. Scope and threat model. I have spent sues are well-covered elsewhere.4,5,6,8,36
The third theme is that while some most of my career in or near “open sci-
HPC systems use standard operating ence:” National Science Foundation High-Performance
systems, some use highly exotic stacks. and Department of Energy Office of Sci- Computing Environments
And even the ones that use standard op- ence-funded high-performance com- Distinctive purposes. The first theme
erating systems, very often have custom puting centers, and so the lens through of the distinctiveness of security for
aspects to their software stacks, particu- which this article is discussed tends to HPC systems is that these systems
larly at the I/O and network driver levels, focus on such environments. The chal- are high-performance by definition,
and also at the application layer. And, lenges in “closed” environments, such and are made that way for a reason.
of course, while the systems may use as those used by the National Security They are typically used for automated
commodity CPUs, the CPUs and other Agency (NSA), Department of Defense computation of some kind, typically
hardware system components are often (DoD), or National Nuclear Security performing some set of mathemati-
integrated in HPC systems in a way (for Administration (NNSA) National Labs, cal operations. Historically, this has
example, by Cray or IBM) that may well or commercial industry, shares some, often been for the purpose of model-
exist nowhere else in the world. but not all of the attributes discussed ing and simulation, and increasingly
The fourth theme, which follows in this article. As a result, although I today, for data analysis as well. Given
from the first three themes, is that HPC discuss confidentiality, a typical com- the primary purpose of HPC systems
systems tend to have a much more ponent of the “C-I-A” triad, because is therefore high-performance, and
given that such systems themselves are few in number, and therefore that computing time on such systems is quite valuable, there is a reluctance by the major stakeholders (the funding agencies that support HPC systems as well as the users who run computations on them) to agree to any solution that might impose overhead on the system. Those stakeholders might well regard such a solution as a waste of cycles at worst, and an unacceptable delay of scientific results at best. This is an important detail, because it frames the types of security solutions that at least historically might have been considered acceptable to use.

Distinctive modes of operation. The second theme of the distinctiveness of security for HPC systems is that these systems tend to have distinctive modes of operation. The typical mode of operation for using a scientific high-performance machine involves connecting through a login node of some kind. In parallel, at least for data analysis tasks, data that a user wishes to analyze may be copied to the machine via a data transfer node or DTN, and software that a user wishes to install may be copied to the login node as well.

The user is then likely to edit some configuration files, compile their software, and write a “batch script” that defines what programs should be run, along with parameters of how those programs should be run. This is because most significant jobs are not run on the login nodes themselves, because the login nodes have very limited resources. Rather, many institutions use compute nodes, which cannot be logged into directly, but rather have a batch scheduler that determines when jobs should run based on analyzing the batch scripts that have been submitted according to a given optimization policy for the site in question. Thus, after writing their batch script, the user will probably submit their job to a batch queue using a submission program, and then log out and wait for the job to run on the compute nodes.

Following that, the user may run some kind of additional analysis or visualization on the data that was output. This may happen on the HPC system, or the output of the HPC computation may be downloaded to a non-HPC system for analysis in a separate environment such as using Jupyter/IPython.33 This additional analysis or visualization might happen serially, following the completed execution on the HPC system, or, alternatively, may happen in an interactive, tightly-coupled fashion such that the user visualizing the output of the computation can manipulate the computation as it is taking place.37,45 It should be noted that the “coupled” computation/analysis model could involve network connections external to the HPC facility, or, and particularly as envisioned by the “superfacility” model for data-intensive science,50 may involve highly specialized and optimized network connections within a single HPC center. Examples of all three workflows are shown in Figure 1.

Figure 1. Three typical high-level workflow diagrams of scientific computing. The diagram at top shows a typical workflow for data analysis in HPC; the middle diagram shows a typical workflow for modeling and simulation; and the bottom diagram shows a coupled, interactive compute-visualization workflow.
Data Analysis: Connect to login node -> Transfer data in via DTN -> Edit config files, compile, submit batch job -> Wait -> Transfer data out via DTNs
Simulation: Connect to login node -> Edit config files, compile, submit batch job -> Wait -> (Maybe) Transfer data out via DTNs
Simulation with Coupled Computation/Visualization: Connect to login node -> Edit config files, compile, submit batch job -> Wait -> Job Starts -> Visualize Output/Adjust Inputs

These use cases are often in stark contrast to the plethora of software that is typically run on a general-purpose desktop system, such as Web browsers, email clients, Microsoft Office, iTunes Music, Adobe Acrobat, personal task managers, Skype, and instant messaging. Importantly, HPC use often involves a much smaller set of programs, with a much more regular sequence of events in which the use of one program follows directly from another, rather than the constant attention-span-driven context switching typical of general-purpose computers. For example, on the NERSC HPC systems, among the more than 5950 unique users active in 2014, just 13 applications comprised 50% of the cycles consumed, 25 applications comprised 66% of the cycles, and 50 applications comprised 80% of the cycles.2 The consequences of these distinctive workflows are important, as we will discuss.

For HPC systems, we must ask what is the desired functioning of the system so that we can establish what the security policies are and better understand the mechanisms with which those policies can be enforced.

Custom operating system stacks. The third theme of the distinctiveness of security for HPC systems is that these systems often have highly exotic stacks. Current HPC environments represent a spectrum of hardware and software components, ranging from exotic and highly custom to fairly commodity.

As an example, “Cori Phase 1,”a the newest supercomputer at NERSC, is a Cray XC based on Intel Haswell processors, leveraging Cray Aries interconnects, a Lustre file system, and nonvolatile memory express (NVMe) in the burst buffer that is user accessible. Cori runs a full SUSE Linux distribution on the login nodes and Compute Node Linux (CNL),44 a light-weight version of the Linux kernel and run-time environment based on the SUSE Linux Enterprise Server distribution.

a http://www.nersc.gov/users/computational-systems/cori/configuration/
b https://www.alcf.anl.gov/user-guides/machine-overview

Mira,b at the Argonne Leadership Computing Facility, is a hybrid system. The login nodes are IBM Power 7-based systems. The compute nodes are an IBM Blue Gene/Q system based on PowerPC A2 processors, IBM’s 5D torus interconnect, and a similarly elaborate memory structure. The I/O nodes also use PowerPC A2 processors and are connected using Mellanox Infiniband QDR switches. The login nodes run Red Hat Linux. The compute nodes run Compute Node Kernel (CNK),1 a Linux-like OS for compute nodes, but
review articles
support neither multitasking nor virtual memory27 (CNK has no relationship with CNL). The I/O system runs the GPFS file system client.

Aurora,c the system scheduled to be installed at ALCF in 2019, will be constructed by a partnership between Cray and Intel and will run third-generation Intel Xeon Phi processors with second-generation Intel Omni-Path photonic interconnects and a variety of flash memory and NVRAM components to accelerate I/O, including 3DXpoint and 3D NAND in multiple locations, all user accessible. Aurora will run Cray Linux,10 a full Linux stack, on its login nodes and I/O nodes (though the I/O nodes do not allow general user access), and mOS46 on its compute nodes. mOS supports both a lightweight kernel (LWK) and a full Linux operating system to enable users to choose between avoiding unexpected operating system overhead and the flexibility of a full Linux stack.

Summit,d the system scheduled to be installed at OLCF in 2018, will be based on both IBM POWER9 CPUs and NVIDIA Volta GPUs, with NVIDIA NVLink on-node networks and dual-rail Mellanox interconnects.

c http://aurora.alcf.anl.gov
d https://www.olcf.ornl.gov/summit/

In short, there is certainly some variation in exactly what operating systems are run, although in all cases login nodes run “full” operating systems. In some cases, full operating systems are also used for compute nodes; in other cases, lighter-weight but Linux API-compatible versions of operating systems are used; and in some cases entirely custom operating systems are used that are single-user only and contain no virtual memory capabilities or multitasking.

At least for the full operating systems, it is reasonable to assume the operating systems contain similar or identical capabilities and bugs as standard desktop and server versions of Linux, and are just as vulnerable to attack via various pieces of software (libraries, runtime, and application) that are running on the system.

Custom hardware and software components may have both positives and negatives. On one hand, they may receive less assurance than more common stacks. On the other hand, some custom stacks may be smaller, more easily verified, and less complex.

Openness. Our final theme is the “openness” of at least some HPC systems. That is, scientists from all over the world whose identities have never been validated may use them. For example, many such systems, such as those used by NSF or DOE ASCR, have no traditional firewalls between the data transfer nodes and the Internet, let alone the ability to “air gap” the HPC system (that is, ensure no physical connection to the regular Internet is possible) as some communities are able to do.

Security Mechanisms and Solutions that Overcome the Constraints of HPC Environments
Traditional IT security solutions, including network and host-based intrusion detection, access controls, and software verification, work about as well in HPC as in traditional IT (often not very well), or worse, due to constraints in HPC environments.

For example, traditional host-based security mechanisms, such as those leveraging system call data via auditd, as well as certain types of network security mechanisms, like network firewalls and firewalls doing deep packet inspection, may be antithetical to the needs of the system being protected. For example, it has been shown that even 0.0046% packet loss (1 out of 22,000 packets) can cause a loss in throughput of network data transfers of approximately 90%.13 Given that stateful and/or deep-packet inspecting firewalls can cause delays that might lead to such loss, a firewall, as traditionally defined, is inappropriate for use in environments with high network data throughput requirements.

Thus, alternative approaches must be applied. Some solutions exist that can help compensate for these constraints.

The Science DMZ13 security framework defines a set of security policies, procedures, and mechanisms to address the distinct needs of scientific environments with high network throughput needs (HPC security theme #1). While the needs of high throughput networks do not eliminate options for security monitoring

or mitigation, those requirements do change what is possible.

In particular, in the Science DMZ framework, the scientific computing systems are moved to their own enclave, away from other types of computing systems that might have their own distinctive security needs and perhaps even distinct regulations, for example, financial, human resources, and other business computing systems. In addition, it directs transfers through a single network ingress and egress point that can be monitored and restricted.

However, the Science DMZ does not use “deep packet inspecting” or stateful firewalls. It does leverage packet filtering firewalls, that is, firewalls that examine only attributes of packet headers and not packet payloads. And, separately, it also performs deep packet inspection and stateful intrusion detection, such as might be done with the Bro Network Security Monitor.28 However, the two processes are not directly coupled: unlike a firewall, the IDS is not used in-line with the network traffic. As a result, inspection imposes no delays on transmission of the traffic, and thus creates none of the congestion that might lead to packet loss and retransmission.

Thus, by moving the traffic to its own enclave that can be centrally monitored at a single point, the framework seeks to maintain a similar level of security to traditional organizations that typically have a single ingress/egress point, rather than simply removing network monitoring without replacing it with an alternative. However, the Science DMZ does so in a very specific way that accommodates the type and volume of network traffic used in scientific and high-performance computing environments. More specifically, it achieves throughput by reducing complexity, which is a theme that we will return to in this article.

The Science DMZ framework has been implemented widely in university and National Lab environments around the world as a result of funding from NSF, DOE ASCR, and other, international funding organizations, to support computing and networking infrastructure for open science. It goes without saying that both the Science DMZ framework and the Bro IDS must also continue to be adapted to more types of HPC environments, such as those requiring greater data confidentiality guarantees, including medical, defense, and intelligence environments. Steps have been made toward the medical context as well. The Medical Science DMZ29 applies the Science DMZ framework to computing environments requiring compliance with the HIPAA Security Rule. Key architectural aspects include the notion that all traffic from outside compute/storage infrastructure passes through heavily monitored head nodes, that storage and compute nodes themselves are not connected directly to the Internet, and that traffic containing sensitive or controlled access data is encrypted. However, further work in medical environments, as well as other environments, is required.

Leveraging the Distinctiveness of HPC as an Opportunity
The Science DMZ helps compensate for HPC’s limitations, but we need more such solutions. As indicated by the four themes enumerated in this article, we also need solutions that can leverage HPC distinctiveness as a strength.

Sommer and Paxson41 point out that the fact that anomaly-based detection typically is not used in traditional IT environments is due to the high-level fact that “finding attacks is fundamentally different from … other applications” (such as credit card fraud detection, for example). Among other key issues, they note that network traffic is often much more diverse than one might expect. They point out that semantic understanding is a vital component of overcoming this limitation to enable machine-learning approaches to security to be more effective.

On the other hand, as mentioned earlier, HPC systems tend to be used for very distinctive purposes, notably mathematical computations (theme #1). The specific application of HPC systems varies by the organization that uses them (for example, DOE National Lab, DOD lab), but each individual system typically has a very specific use. This is a key point because the result may be that both specification-based and anomaly-based intrusion detection may be more useful in HPC environments than in traditional IT environments. Specifically, given the hypothesis that patterns of behavior in HPC are likely more regular than in typical computing systems, one might expect to reduce the error rates of anomaly-based intrusion detection, and possibly even to make it feasible to construct specifications for specification-based intrusion detection. Thus, such security mechanisms might even fare better in HPC environments than in traditional IT environments (theme #4), though demonstrating the degree to which the increased regularity of HPC environments may be helpful for security analysis is an open research question.

Analyzing system behavior with machine learning. A second, and related, key point about HPC systems being used primarily for mathematical computation concerns analysis of system behavior. The insight that most HPC machines are used for computation focuses our attention on what security risks to care about (for example, users running “illicit computations,” as defined by the owners of the HPC system), and better analysis of system behavior might give us better ability to understand what type of computation is taking place.

An example of a successful approach to addressing this question involved research that I was involved with at Berkeley Lab between 2009 and 2013.14,30,47,48 In this project, we asked the questions: What are people running on HPC systems? Are they running what they usually run? Are they running what they requested cycle allocations to run, or mining Bitcoins? Are they running something illegal (for example, classified)? In that work, we developed a technique for answering these questions by fingerprinting communication on HPC systems.

Specifically, we collected Message Passing Interface (MPI) function calls via the Integrated Performance Monitoring (IPM)43 tool, which showed patterns of communication between cores in an HPC system, as shown in Figure 2. Using 1681 logs for 29 scientific applications from NERSC HPC systems, we applied Bayesian-based machine learning techniques for classification of scientific computations, as well as a graph-theoretic approach using “approximate” graph matching techniques (subgraph isomorphism and edit distance). A hybrid machine learning and graph theory approach identified test
HPC codes with 95%–99% accuracy.

Our work analyzing distributed memory parallel computation patterns on HPC compute nodes is by no means conclusive that anomaly detection is an unqualified success on HPC systems for intrusion detection. For one thing, the experiments were not conducted in an adversarial environment, and so the difficulty of an attacker intentionally evading detection by attempting to make one program look like another was not explored. In addition, in our “fingerprinting HPC computation” project, we had what we deemed to be a reasonable, though not exhaustive, corpus of data representative of typical computations on NERSC facilities to examine. In addition, in examining the data, we focused on a specific set of activity defined by the NERSC Acceptable Use Policy as falling outside of “acceptable use.” Other sites will have a different baseline of “typical computation,” and are also likely to have somewhat different policies that define what is or is not “illicit use.”

However, regardless, we do believe the approach is an example of the type of techniques that could possibly have success in HPC environments and possibly even greater success than in many non-HPC environments. For example, consider the possibility of a skilled attacker attempting to evade detection, something that any security mechanism relying on machine learning is vulnerable to. Not only do there appear to be more regular use patterns in HPC environments, but there also exist certain distinctive security policies in HPC environments that might help improve the usefulness of application-level use monitoring. There are at least two reasons for this.

First, given that the organizations responsible for security of HPC systems are likely to care more about misuse of cycles if very large numbers of cycles are used, this suggests focusing on the users that use cycles for many hours per day for days at a time. This is a very different practical scenario than network security monitoring, where a decision about security might require a response in a fraction of a second in order to prevent compromise. Given the longer time scale, therefore, a human security analyst can be involved rather than requiring the application monitoring, on the level that we have done it, to be conclusive. Rather, that application monitoring might simply serve to focus an analyst’s attention, and to lead to a manual source code analysis, or even an actual conversation with the user whose account was used to run the code.

A second reason why evading detection might be harder for an attacker on HPC is that users are often given “cycle allocations” to run code. As a result, the more a program running on an HPC system is modified to mask illicit use, the more likely it is that additional cycles must be used to do additional tasks to make it look like the program is doing something different than it actually is. Thus, the faster a stolen allocation will be used up and/or the longer it will take the HPC system to accomplish whatever illicit use the attacker is attempting.

Collecting better audit and provenance data. It is important to note that the success of the work mentioned in the previous section depends on the availability of useful security monitoring data. It is our observation that the current trend in many scientific environments toward collecting provenance data for scientific reproducibility purposes, such as with the Tigres workflow system38 and the DOE Biology Knowledgebase (KBase),21 may help to provide better data that can be used for security monitoring, as might DARPA’s “Transparent Computing” program,11 which seeks to “make currently opaque computing systems transparent by providing high-fidelity visibility into component interactions during system operation across all layers of software abstraction, while imposing minimal performance overhead.”

In line with this, as noted earlier, HPC systems have a lot in common with traditional systems, but also contain a lot of highly custom OS-level, network-level, and application-level software. A key point here is that such exotic hardware and low-level software stacks may also provide opportunities for monitoring data going forward. The performance counters used in many of today’s HPC machines are an example of this.

Post-exascale systems, as well as more architectures that are still in their early phases of practical implementation, such as neuromorphic computing, quantum computing, and
Figure 2. “Adjacency matrices” for individual runs of a performance benchmark, an atmospheric dynamics simulator, and a linear equation solver SUPERLU. Number of bytes sent between ranks is linearly mapped from dark blue (lowest) to red (highest), with white indicating an absence of communication.47,48
[Three panels, one per application; each plots Source Rank against Destination Rank.]
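As a rough illustration of the fingerprinting idea behind Figure 2, the following Python sketch builds an adjacency matrix of bytes exchanged between MPI ranks and classifies a run from its MPI call counts with a small multinomial naive Bayes model. The record format, function names, application labels, and counts are all hypothetical; this is neither the IPM log format nor the method used in the cited work, which combined Bayesian learning with approximate graph matching.

```python
import math
from collections import Counter, defaultdict


def adjacency_matrix(messages, n_ranks):
    """Sum the bytes sent from each source rank to each destination rank.

    `messages` is an iterable of (source_rank, destination_rank, n_bytes)
    tuples, a hypothetical simplification of an MPI trace.
    """
    matrix = [[0] * n_ranks for _ in range(n_ranks)]
    for src, dst, n_bytes in messages:
        matrix[src][dst] += n_bytes
    return matrix


def train_naive_bayes(labelled_runs):
    """Build a model from (label, Counter of MPI call names) pairs."""
    call_counts = defaultdict(Counter)  # label -> summed call counts
    run_counts = Counter()              # label -> number of training runs
    vocabulary = set()
    for label, calls in labelled_runs:
        call_counts[label].update(calls)
        run_counts[label] += 1
        vocabulary.update(calls)
    return call_counts, run_counts, vocabulary


def classify(model, calls):
    """Return the label with the highest naive Bayes log-posterior."""
    call_counts, run_counts, vocabulary = model
    total_runs = sum(run_counts.values())
    best_label, best_score = None, -math.inf
    for label in run_counts:
        total_calls = sum(call_counts[label].values())
        score = math.log(run_counts[label] / total_runs)
        for name, count in calls.items():
            # Add-one smoothing keeps unseen calls from zeroing the score.
            p = (call_counts[label][name] + 1) / (total_calls + len(vocabulary))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: two made-up application classes.
training = [
    ("stencil", Counter({"MPI_Isend": 400, "MPI_Irecv": 400, "MPI_Waitall": 100})),
    ("stencil", Counter({"MPI_Isend": 380, "MPI_Irecv": 380, "MPI_Waitall": 95})),
    ("solver", Counter({"MPI_Allreduce": 300, "MPI_Bcast": 120, "MPI_Isend": 20})),
    ("solver", Counter({"MPI_Allreduce": 280, "MPI_Bcast": 110, "MPI_Isend": 30})),
]
model = train_naive_bayes(training)
unknown = Counter({"MPI_Allreduce": 50, "MPI_Bcast": 20, "MPI_Isend": 5})
print(classify(model, unknown))  # solver
```

A run whose label differs from what the user's allocation was granted for would, in this toy setting, be a candidate for the kind of human follow-up the article describes, not a conclusive detection.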

photonic computing may all provide additional challenges and opportunities. For example, though neural networks were previously thought by many to be inscrutable,16 new research suggests that interpreting them may actually be possible at some point.12,49 If successful, this might give rise to the ability to interpret networks learned by neuromorphic chips.

Looking to the Future
In the future, it is clear that numerous aspects of HPC will change, both for the good of security and in ways that complicate it.

Software engineering is a key goal of the National Strategic Computing Initiative (NSCI), and so perhaps automated static/runtime analysis tools might be developed and used to check HPC code for insecure behaviors.

On the other hand, science is also changing. For example, distributed, streaming sensor data collection is increasingly a source of data used in HPC. In short, science data is getting to us in new ways, and we also have more data than ever to protect.

Another change is that on HPC systems running full operating systems, we are starting to see an increasing shift toward the use of new virtualized environments for additional flexibility. In particular, as Docker containers25 and CoreOS’s Rocket9 become more popular for virtual replication and containment in many IT environments, rather than replicating full virtual operating systems, Docker-like containers that are more appropriate to HPC environments, such as Shifter19 or Singularity,23 are also gaining attention and use. This notion of “containerization” may well be a key benefit to security, both because containerization done properly typically limits the damage that an attacker can do, and because it simplifies the operation of the machine; the reduction of complexity is also often a key benefit to system robustness, including security.

Use of the superfacility model, in which computation and visualization are more tightly coupled than they currently are, also seems likely to increase. At the same time, the notion of “science gateways,” essentially Web portals providing limited interfaces to HPC rather than full-blown UNIX command-line interfaces, may offset some of the complexity that the superfacility model would otherwise introduce. While science gateways still represent vulnerability vectors from arbitrary code, even when it is submitted via Web front-ends, since security tends to benefit from more constrained operation, the general trend toward science gateways may also enhance security.

Finally, the prospect of new and novel security technologies, such as simulated homomorphic encryption,34,35 differential privacy,15 and cryptographic mechanisms for securing chains of data3,18,40 such as blockchains,26 may also provide new means for interacting with data sets in a constrained fashion.

For example, there may be cases where the owners of the data want to keep the raw data for themselves for an extended period of time, such as a scientific embargo. Or there may be cases where the owners of the data are unable to share the raw data due to privacy regulations, such as on medical data, system and network data that contains personally identifiable information, or sensor data containing sensitive (for example, location) information. In either case, the data owners may still wish to find a way to enable some limited type of computation on the data, or share data, but only with a certain degree of resolution. With CryptDB34 and Mylar,35 Popa et al. have demonstrated approaches for efficiently searching over encrypted data without requiring fully homomorphic encryption,17 which is currently at least a million times too slow to be used practically, let alone in HPC environments. Likewise, differential privacy,15 and perhaps particularly distributed differential privacy,22 may provide new opportunities for sharing and analyzing data to be used in HPC environments as well. In addition, blockchains and similar technologies may provide means both for monitoring the integrity of raw scientific data in HPC contexts and for maintaining secure audit trails of accesses to or modifications of raw data.

Summary
Modern HPC systems do some things very similar to ordinary IT computing,
but they also have some significant differences. This article presented both challenges and opportunities.

One key security challenge is that traditional security solutions often are not effective given the paramount priority of high performance in HPC. In addition, the need to make some HPC environments as open as possible to enable broad scientific collaboration and interactive HPC also presents a challenge.

There may also be opportunities, as described by the four themes regarding HPC security presented here. The fact that HPC systems tend to be used for very distinctive purposes, notably mathematical computations, may mean the regularity of activity within HPC systems can benefit the effectiveness of machine learning analyses on security monitoring data to detect misuse of cycles and threats to computational integrity. In addition, custom stacks provide opportunities for enhanced security monitoring, and the general trend toward containerized operation, limited interfaces, and reduced complexity in HPC is likely to help in the future much as reduced complexity has benefitted the Science DMZ model.

Acknowledgments
Appreciation to Deb Agarwal, David Brown, Jonathan Carter, Phil Colella, Dan Gunter, Inder Monga, and Kathy Yelick for their valuable feedback and to Sean Whalen and Bogdan Copos for their excellent work underlying the ideas for new approaches described here. Thanks to Glenn Lockwood for his insights on the specifications for the DOE ASCR hardware and software

References
1. Adiga, N.R. et al. An overview of the Blue-Gene/L supercomputer. In Proceedings of the ACM/IEEE Conference on Supercomputing, 2002.
2. Austin, B. et al. 2014 NERSC Workload Analysis (Nov. 5, 2015); http://portal.nersc.gov/project/mpccc/baustin/NERSC_2014_Workload_Analysis_v1.1.pdf.
3. Anderson, R.J. UEPS: A second-generation electronic wallet. In Proceedings of the 2nd European Symposium on Research in Computer Security (Nov. 1992), 411–418.
4. Bailey, D.H. Resolving numerical anomalies in scientific computation, 2008.
5. Bailey, D.H., Borwein, J.M. and Stodden, V. Facilitating reproducibility in scientific computing: Principles and practice. Reproducibility: Principles, Problems, Practices. H. Atmanspacher and S. Maasen, Eds. John Wiley and Sons, New York, NY, 2015.
6. Bailey, D.H., Demmel, J., Kahan, W., Revy, G. and Sen, K. Techniques for the automatic debugging of scientific floating-point programs. In Proceedings of the 14th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics (Lyon, France, Sept. 2010).
7. Bishop, M. Computer Security: Art and Science. Addison-Wesley Professional, Boston, MA, 2003.
8. Cappello, F. Improving the trust in results of numerical simulations and scientific data analytics. 2015.
9. CoreOS, Inc. rkt - App Container runtime. https://github.com/coreos/rkt.
10. Cray, Inc. Cray Linux Environment Software Release Overview, s-2425-52xx edition (Apr 2014); http://docs.cray.com/books/S-2425-52xx.
11. DARPA. Transparent Computing; http://www.darpa.mil/Our_Work/I2O/Programs/Transparent_Computing.aspx.
12. Das, A., Agrawal, H., Zitnick, C.L., Parikh, D. and Batra, D. Human attention in visual question answering: Do humans and deep networks look at the same regions? In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016.
13. Dart, E., Rotman, L., Tierney, B., Hester, M. and Zurawski, J. The science DMZ: A network design pattern for data-intensive science. In Proceedings of the IEEE/ACM Annual SuperComputing Conference (Denver CO, 2013).
14. DeMasi, O., Samak, T. and Bailey, D.H. Identifying HPC codes via performance logs and machine learning. In Proceedings of the Workshop on Changing Landscapes in HPC Security (2013).
15. Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, Part II. Lecture Notes in Computer Science 4052, (July 2006), 1–12. Springer Verlag.
16. Gefter, A. Is artificial intelligence permanently inscrutable? Nautilus 40 (Sept. 1, 2016).
17. Gentry, C. Computing arbitrary functions of encrypted data. Commun. ACM 53, 3 (Mar. 2010), 97–105.
18. Haber, S. and Stornetta, W.S. How to time-stamp a digital document. J. Cryptology 3, 2 (Jan. 1991), 99–111.
19. Jacobsen, D.M. and Canon, R.S. Contain this, unleashing docker for HPC. Proceedings of the Cray User Group, 2015.
20. Jiang, L. and Su, Z. Osprey: A practical type system for validating dimensional unit correctness of C programs. In Proceedings of the 28th International Conference on Software Engineering (2006), 262–271. ACM, New York.
21. KBase: The Department of Energy Systems Biology
31. Peisert, S., et al. ASCR Cybersecurity for Scientific Computing Integrity. TR LBNL-6953E, U.S. Department of Energy Office of Science, Feb. 2015.
32. Peisert, S. et al. ASCR Cybersecurity for Scientific Computing Integrity - Research Pathways and Ideas Workshop. TR LBNL-191105, U.S. Department of Energy Office of Science, Sept. 2015.
33. Pérez, F. and Granger, B.E. IPython: A System for interactive scientific computing. Computing in Science and Engineering 9, 3 (May 2007), 21–29.
34. Popa, R.A., Redfield, C., Zeldovich, N. and Balakrishnan, H. Cryptdb: Processing queries on an encrypted database. Commun. ACM 55, 9 (Sept. 2012), 103–111.
35. Popa, R.A., Stark, E., Helfer, J., Valdez, S., Zeldovich, N., Kaashoek, M.F. and Balakrishnan, H. Building Web applications on top of encrypted data using Mylar. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (2014), 157–172.
36. Rubio-González, C. Precimonious: Tuning assistant for floating-point precision. In Proceedings of the International Conf. on High Performance Computing, Networking, Storage and Analysis. ACM, 2013, 27.
37. Reubel, O. WarpIV: In situ visualization and analysis of ion accelerator simulations. IEEE Computer Graphics and Applications 36, 3 (2016), 22–35.
38. Ramakrishnan, L., Poon, S., Hendrix, V., Gunter, D., Pastorello, G.Z. and Agarwal, D. Experiences with user-centered design for the Tigres workflow API. In Proceedings of 2014 IEEE 10th International Conference on e-Science, vol 1. IEEE, 290–297.
39. Singer A. Tempting fate. ;login: 30, 1 (Feb. 2005), 27–30.
40. Schneier, B. and Kelsey, J. Automatic event-stream notarization using digital signatures. In Proceedings of the 4th International Workshop on Security Protocols. Springer, 1996, 155–169.
41. Sommer, R. and Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 31st IEEE Symposium on Security and Privacy, Oakland, CA, May 2010.
42. Stoll, C. Stalking the wily hacker. Commun. ACM 31, 5 (May 1988), 484–497.
43. Skinner, D., Wright, N., Fürlinger, K., Yelick, K.A. and Snavely, A. Integrated Performance Monitoring; http://ipm-hpc.sourceforge.net/.
44. Wallace, D. Compute node Linux: New frontiers in compute node operating systems. Cray User Group, 2007.
45. Whitlock, B., Favre, J.M. and Meredith, J.S. Parallel in situ coupling of simulation with a fully featured visualization system. In Proceedings of the 11th Eurographics Conference on Parallel Graphics and Visualization, 2011, 101–109.
46. Wisniewski, R.W., Inglett, T., Keppel, P., Murty, R. and Riesen, R. mOS: An architecture for extreme-scale operating systems. In Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers. ACM, 2014.
47. Whalen, S., Peisert, S. and Bishop, M. Network-theoretic classification of parallel computation patterns. In Proceedings of the First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (Tucson, AZ, June 4, 2011).
48. Whalen, S., Peisert, S. and Bishop, M. Multiclass Classification of Distributed Memory Parallel Computations. Pattern Recognition Letters 34, 3 (Feb. 2013), 322–329.
49. Yosinski, J., Clune, J., Fuchs, T. and Lipson, H. Understanding neural networks through deep
visualization. In Proceedings of the Deep Learning
coming in the next few years, and both Knowledgebase; http://kbase.us.
Workshop, International Conference on Machine
22. Kasiviswanathan, S.P., Lee, H.K., Nissim, K.,
Glenn Lockwood and Scott Campbell Raskhodnikova, S. and Smith, A. What can we learn Learning, 2015.
privately? SIAM J. Computing 40, 3 (2011), 793–826. 50. Yelick, K. A Superfacility for Data Intensive Science.
for the time spent providing the data Advanced Scientific Computing Research Advisory
23. Kurtzer, G.M. et al. Singularity; http://singularity.lbl.gov.
that supported that research. 24. Marko, J. and Bergman, L. Internet attack is called Committee, Washington, DC, Nov. 8, 2016; http://science.
broad and long lasting. New York Times (May 10, 2005). energy.gov/~/media/ascr/ascac/pdf/meetings/201609/
This work used resources of the Na- Yelick_Superfacility-ASCAC_2016.pdf.
25. Merkel, D. Docker: Lightweight Linux containers for
tional Energy Research Scientific Com- consistent development and deployment. Linux J. 239
(2014). Sean Peisert (sppeisert@lbl.gov) is Staff Scientist
puting Center and was supported by the 26. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash at Lawrence Berkeley National Laboratory, Chief
Director, Office of Science, Office of Ad- System (May 24, 2009); http://www.bitcoin.org/bitcoin.pdf. Cybersecurity Strategist at CENIC, and an associate
27. Nataraj, A., Malony, A.D., Morris, A. and Shende, S. adjunct professor at the University of California, Davis.
vanced Scientific Computing Research, Early experiences with KTAU on the IBM BG/L.
of the U.S. Department of Energy under In European Conference on Parallel Processing, pp. Copyright held by owner/author.
99-110. Springer, 2006.
Contract No. DE-AC02-05CH11231. 28. Paxson, V. Bro: A system for detecting network
Any opinions, findings, conclusions, intruders in real time. Computer Networks 31, 23
(1999), 2435–2463. Watch the author discuss
or recommendations expressed in this 29. Peisert, S., et al. The Medical Science DMZ. J. American his work in this exclusive
material are those of the author and do Medical Informatics Assoc. 23, 6 (Nov. 1, 2016). Communications video.
30. Peisert S. Fingerprinting Communication and https://cacm.acm.org/videos/
not necessarily reflect those of the em- Computation on HPC Machines. TR LBNL-3483E, security-in-high-performance-
ployers or sponsors of this work. Lawrence Berkeley National Laboratory, June 2010. computing-environments

research highlights

Technical Perspective
A Gloomy Look at the Integrity of Hardware
By Charles (Chuck) Thacker

Exploiting the Analog Properties of Digital Circuits for Malicious Hardware
By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester

Technical Perspective
Humans and Computers Working Together on Hard Tasks
By Ed H. Chi

Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time
By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar, Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham
DOI:10.1145/3068774

Technical Perspective
A Gloomy Look at the Integrity of Hardware
By Charles (Chuck) Thacker

To view the accompanying paper, visit doi.acm.org/10.1145/3068776
SINCE THE INVENTION of the integrated circuit, the complexity of the devices and the cost of the facilities used to build them have increased dramatically. The first fabrication facility with which I was associated was built at Xerox PARC in the mid-1970s at a cost of approximately $15M ($75M today). Today, the cost of a modern fab is approximately $15B. This cost is justified by the fact that today’s chips are much more complex than in earlier times. The number of layers involved has grown to over 100, and the tolerances involved are approaching atomic dimensions.

The high cost of a fab means that in order to be cost-effective, it must be fully loaded. This has led to “silicon foundries,” which build chips for a variety of “fabless” semiconductor companies based on a set of physical design libraries supplied by the foundry. Carver Mead and Lynn Conway in their seminal 1980 “Introduction to VLSI Systems” initially proposed this concept, but the Taiwan Semiconductor Manufacturing Company (TSMC), founded in 1987, changed what had been an academic exercise into an industrial norm. Today, a few large fabs throughout the world dominate this business.

Over the last two decades, integrated circuit design has diverged into two specialties: (1) architectural and logical design and device layout, done by a design house, and (2) mask generation and device fabrication, done by a foundry. To ensure the foundry has done its job correctly, the design house relies on extensive testing to verify that devices meet their specifications.

The following paper assumes the foundry (or other parties involved in the low levels of fabrication) is malicious, and can modify the design they receive to produce a device that can later be used for malice. Their attack employs a very small Trojan circuit included in an otherwise correct design. The Trojan awaits the chip’s deployment, and it may then be triggered by an external software attack. When triggered, the chip’s normal function is subverted by the attacker. In the A2 implementation, the trigger is used to elevate the privilege of a user-mode program. The authors argue that the simplicity of the Trojan and its use of analog circuitry make it difficult to detect, even with enhanced levels of testing. They go to considerable lengths to verify their approach, including extensive simulation and actual fabrication of a processor in a modern silicon process. On the actual hardware, the Trojan operated as expected.

Is this realistic? Certainly no foundry wants to compromise its business model by being identified as untrustworthy.

As I was preparing this Technical Perspective, the Dyn/Mirai DDoS attack occurred. Apparently, the attack used a large number of IoT devices (DVRs and webcams) as a botnet, which targeted a major DNS server. This is approximately what the authors of the following paper describe, although the attack was done by exploiting the lack of security in the target’s software, rather than by adding hardware. The reports seem to indicate the bot devices were easily compromised, using default passwords that could not be changed, and the devices were not designed to be updated in the field. While the security provided by IoT devices will surely improve, the authors argue that the introduction of small Trojans by untrusted fabrication facilities will remain a problem for which technical solutions appear elusive.

As technologists, technical solutions to problems are what we do best. In the case of the attack proposed by the authors, a technical defense seems problematic. We do, however, have examples from other fields that might be promising. The A2 Trojan assumes an untrusted fabrication facility. While it might not be possible to do all future fabrication in trusted facilities, using a third party trusted by both the fab and its customers to monitor the behavior of the fab seems plausible. The job of the third party is to certify the proper behavior of the fab. Trusted third parties are widely used in areas ranging from financial contracts to nuclear treaty compliance. “Trust but verify” was used during the Cold War to describe this relationship.

The authors have a lot of experience with attacks on digital logic, and do a good job of explaining previous work in the area. The paper is definitely worth reading carefully, as it covers an area that will likely become much more important in an increasingly technology-dependent world.

Charles Thacker, computing pioneer and recipient of the 2009 ACM A.M. Turing Award, passed away in June 2017, soon after this Technical Perspective was written.

Copyright held by author.

DOI:10.1145/3068776

Exploiting the Analog Properties of Digital Circuits for Malicious Hardware
By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester

Abstract
While the move to smaller transistors has been a boon for performance, it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party, often overseas, to fabricate their design. To guard against shipping chips with errors (intentional or otherwise), chip design companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester. In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before affecting a chip’s functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transit between digital values. When the capacitors are fully charged, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely controllable privilege escalation by attaching the capacitor to a controllable wire and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that the proposed attack works. It eludes activation by a diverse set of benchmarks and evades known defenses.

The original version of this paper is entitled “A2: Analog Malicious Hardware” and was published in 2016 IEEE International Symposium on Security and Privacy.

1. INTRODUCTION
The trend toward smaller transistors in integrated circuits, while beneficial for higher performance and lower power, has made fabricating a chip expensive. For example, it costs 15% more to set up the fabrication line for each successive process node and by 2020 it is expected that setting up a fabrication line for the smallest transistor size will require a $20 billion upfront investment [18]. To amortize the cost of fabrication development, most hardware companies outsource fabrication.

Outsourcing of chip fabrication opens up hardware to attack. These hardware attacks can evade software checks because software must trust hardware to faithfully implement the instructions [6, 12]. Even worse, if there is an attack in hardware, it can contaminate all layers of a system that depend on the hardware and violate high-level security policies correctly implemented by software.

The most pernicious fabrication-time attack is the dopant-level Trojan [2, 10]. Dopant-level Trojans convert trusted circuitry into malicious circuitry by changing the dopant ratio on the input pins to victim transistors. Converting existing circuits makes dopant-level Trojans very difficult to detect since there are no added or removed gates or wires. In fact, detecting dopant-level Trojans requires a complete chip delayering and comprehensive imaging with a scanning electron microscope [17]. However, this elusiveness comes at the cost of expressiveness. Dopant-level Trojans are limited by existing circuits, making it difficult to implement sophisticated attack triggers [10]. The lack of a sophisticated trigger means that dopant-level Trojans are more detectable by post-fabrication functional testing. Thus, dopant-level Trojans represent an extreme on a trade-off space between detectability during a physical inspection and detectability during testing.

To defend against malicious hardware inserted during fabrication, researchers have proposed two fundamental defenses: (1) using side-channel information (e.g., power and temperature) to characterize acceptable behavior in an effort to detect anomalous (i.e., malicious) behavior [1, 7, 13, 15], and (2) adding sensors to the chip that directly measure and characterize features of the chip’s behavior (e.g., signal propagation delay) in order to identify dramatic changes in those features (presumably caused by activation of a malicious circuit) [3, 8, 11]. Using side channels as a defense works well against large Trojans added to purely combinational circuits where it is possible to test all inputs and there exists a reference chip to compare against. While this accurately describes most existing fabrication-time attacks, we show that it is possible to implement a stealthy and powerful processor attack using only a single added gate without affecting features measured by existing on-chip sensors.

We create a new fabrication-time attack that is controllable, stealthy, and small, which borrows the idea of counter-based triggers commonly used to hide design-time malicious hardware [19, 20] and adapts it to fabrication time. Based on analog behaviors, the attack replaces the hundreds of gates required by conventional counter-based digital triggers with analog components: a capacitor and a few transistors wrapped up in a single gate.
This paper presents three contributions. (1) We design and implement the first fabrication-time processor attack that mimics the triggered attacks often added during design time. As a part of our implementation, we are the first to show how a fabrication-time attacker can leverage the empty space common in chip layouts to implement malicious circuits. (2) We show how an analog attack can be much smaller and more stealthy than its digital counterpart. Our attack diverts charge from unlikely signal transitions to implement its trigger, so it is invisible to all known side-channel defenses. Additionally, as an analog circuit, our attack is under the digital layer and missed by functional verification performed on the hardware description language. (3) We fabricate an openly malicious processor and then evaluate the behavior of our fabricated attacks across many chips and changes in environmental conditions. We compare these results to Simulation Program with Integrated Circuit Emphasis (SPICE) simulation models.

2. BACKGROUND AND THREAT MODEL
The typical design and fabrication process of integrated circuits is shown in Figure 1 (see Rostami [16]). This process often involves collaboration between different parties all over the world and each step is likely done by different teams even if they are in the same company. Therefore, the designs are vulnerable to malicious attacks by rogue engineers involved in any of the above steps.

Figure 1. Typical IC design process with commonly researched threat vectors highlighted in red. The blue text and brackets highlight the party in control of the stage(s).
[Figure: the flow runs from third-party IPs (third party) through the digital design phase (design house: RTL design in VHDL/Verilog, logic verification, logic synthesis, timing verification, structural netlist), the back-end design phase (design house or third party: placement and routing, LVS and DRC check, post-layout timing verification, layout), fabrication (foundry: manufacture, chips), and chip verification (design house) to packaging and customers. Design-time and fabrication-time attacks are marked.]

The design house implements the specification for the chip’s behavior in some Hardware Description Language (HDL). Once the specification is implemented in an HDL and that implementation has been verified, the design is passed to a back-end house, which places and routes the circuit.

Conventional digital Trojans can only be inserted in the design phase and are easier to detect with design-phase verification. Fabrication-time attacks inserted in the back-end and fabrication phases can evade these defenses. Since it is strictly more challenging to implement attacks at the fabrication phase, due to limited information and limited ability to modify the design compared to the back-end phase, we focus on that threat model for our attack.

The attacker starts with a Graphic Database System II (GDSII) file that is a polygon representation of the completely laid-out and routed circuit. Our threat model assumes that the delivered GDSII file represents a perfect implementation, at the digital level of abstraction, of the chip’s specification. This is very restrictive as it means that the attacker can only modify existing circuits or, as we are the first to show in this paper, add attack circuits to open spaces in the laid-out design. The attacker cannot increase the dimensions of the chip or move existing components around. This restrictive threat model also means that the attacker must perform some reverse engineering to select viable victim flip-flops and wires to tap. After the untrusted fabrication house completes fabrication, it sends the fabricated chips off to a trusted party for post-fabrication testing. Our threat model assumes that the attacker has no knowledge of the test cases used for post-fabrication testing. Such a model dictates the use of a sophisticated trigger to hide the attack.

3. ATTACK METHODS
A hardware attack is composed of a trigger and a payload. The trigger monitors wires and state within the design and activates the attack payload under very rare conditions such that the attack stays hidden during normal operation and testing. Previous research has identified that evading detection is a critical property for hardware Trojan designers [5]. Evading detection involves more than just avoiding attack activation during normal operation and testing; it also includes hiding from visual/side-channel inspection. There is a trade-off at play between the two in that the more complex the trigger (i.e., the better that it hides at run time), the larger the impact that trigger has on the surrounding circuit (i.e., the worse that it hides from visual/side-channel inspection).

We propose A2, a fabrication-time attack that is small, stealthy, and controllable. To achieve these outcomes, we develop trigger circuits that operate in the analog domain. The circuits are based on charge accumulating on a capacitor from infrequent events inside the processor. If the charge-coupled infrequent events occur frequently enough, the capacitor will fully charge and the payload is activated to deploy a privilege escalation attack. Our analog trigger is similar to the counter-based triggers often used in digital triggers, except that using the capacitor has the advantage of a natural reset condition due to leakage. Compared

to traditional digital hardware Trojans, the analog trigger maintains a high level of stealth and controllability, while dramatically reducing the impact on area, power, and timing due to the attack. An added benefit of a fabrication-time attack compared to a design-time attack (when digital-only triggers tend to get added) is that it has to pass through fewer verification stages.

3.1. Single stage trigger circuit
Based on our threat model, the high-level design objectives of our analog trigger circuit are as follows:

1. Functionality: The trigger circuit must be able to detect toggling events of a target victim wire similar to a digital counter and the trigger circuit should be able to reset itself if the trigger sequence is not completed in a timely manner.
2. Small area: The trigger circuit should be small enough to be inserted into the empty space of an arbitrary finished chip layout. Small area overhead also implies a better chance to escape detection.
3. Low power: The trigger circuit is constantly monitoring the victim signals, therefore its power consumption must be minimized to hide within the normal fluctuations of the entire chip’s power consumption.
4. Negligible timing perturbation: The added trigger circuit must not affect the timing constraints for normal operation and its timing perturbations should not be easily separable from the noise common to path delays.
5. Standard cell compatibility: Since all digital designs are based on standard cells with fixed cell height, the analog trigger circuit must fit into the height and only use the lowest metal layer for routing.a These requirements are important for insertion into an existing chip layout and make the Trojan more difficult to detect in fabricated chips.

a. Several layers of metal wires are used in modern CMOS technologies to connect cells together; lower-level metal wires are closer to transistors at the bottom for short interconnections, while higher metal layers are used for global routing.

To achieve these design objectives, we propose an attack based on charge accumulation inside capacitors. A capacitor performs analog integration of charge from a victim wire while at the same time being able to reset itself through leakage current. A behavior model of capacitor-based trigger circuits comprises charge accumulation and leakage as shown in Figure 2.

Figure 2. Behavior model of proposed analog trigger circuit.
[Figure: the trigger input feeds the trigger circuits, which drive the trigger output. The waveforms show the trigger input, the capacitor voltage rising toward a threshold, and the trigger output over time, with the trigger time and retention time marked.]

Every time the victim wire that feeds the trigger circuit’s capacitor toggles, the capacitor increases in voltage by some ΔV. After a number of toggles, the capacitor’s voltage exceeds a predefined threshold voltage and enables the trigger’s output, deploying the attack payload. The time it takes to activate the trigger is defined as trigger time (Figure 2). Meanwhile, leakage current is always present and drains charge from the trigger circuit’s capacitor. The attacker can design the capacitor’s leakage to be weaker than its accumulation when the trigger input is active. On the other hand, when the trigger input is inactive, leakage gradually reduces the capacitor’s voltage, eventually disabling an already activated trigger. This mechanism ensures that the attack is not expressed when no intentional attack happens. The time it takes to reset trigger output after trigger input stops is defined as retention time.

Because of leakage, a minimum toggling frequency must be reached to successfully trigger the attack. At the minimum frequency, charge added in each cycle equals charge leaked away. Trigger time and retention time are the two main design metrics in the analog trigger circuits that we can make use of to create flexible trigger conditions and more complicated trigger patterns as discussed in Section 3.2. A stricter triggering condition (i.e., faster toggling rate and more toggling cycles) reduces the probability of a false trigger during normal operation or testing, but non-idealities in circuits and process, temperature and voltage variations can cause the attack to fail (impossible to trigger or trivial to accidentally trigger) for some chips. As a result, a trade-off should be made between a reliable attack that can be expressed in every chip and a more stealthy attack that can only be triggered for certain chips under certain conditions.

The conventional current-based charge pump is not suitable for the attack due to area and power constraints. A new charge pump circuit based on charge sharing is specifically designed for the attack purpose as shown in Figure 3. During the negative phase of Clk, Cunit is charged to VDD. Then during the positive phase of Clk, the two capacitors are shorted together, causing the two capacitors to share charges. After charge sharing, the final voltage of the two capacitors is the same and ΔV on Cmain is

$$\Delta V = \frac{C_{unit} \times (V_{DD} - V_0)}{C_{unit} + C_{main}}$$

where V0 is the initial voltage on Cmain before the transition happens. We can achieve different trigger times by sizing the two capacitors. The capacitor keeps leaking over time and finally ΔV equals the voltage drop due to leakage, which sets the maximum capacitor voltage.

A transistor-level schematic of the proposed analog trigger is shown in Figure 4. Cunit and Cmain are implemented
with Metal Oxide Semiconductor (MOS) caps. M0 and M1 are the two switches as shown in Figure 3. A detector is used to compare cap voltage with a threshold voltage and can be implemented by inverters or Schmitt triggers. An inverter has a switching voltage depending on its sizing and when the capacitor voltage is higher than the switching voltage, the output is 0; otherwise, the output is 1. A Schmitt trigger is an inverter with hysteresis. It has a large threshold when input goes from low to high and a small threshold when input goes from high to low. The hysteresis is beneficial for our attack because it extends both trigger time and retention time. To balance the leakage current through M0 and M1, an additional leakage path to ground (NMOS M2 as shown in Figure 4) is added to the design.

A SPICE simulation waveform is shown in Figure 5 to illustrate the operation of our analog trigger circuit after optimization. The operation is the same as the behavioral model that we proposed as shown in Figure 2, allowing us to use the behavior model for system-level attack design.

Figure 3. Design concepts of analog trigger circuit based on capacitor charge sharing.

3.2. Multi-stage trigger circuit
The one-stage trigger circuit described in the previous section takes only one victim wire as an input. Using only one trigger input limits the attacker in two ways: (1) Because fast toggling of one signal for tens of cycles triggers the single stage attack, there is still a chance that normal operations or certain benchmarks can expose the attack, and (2) Certain instructions are required to create fast toggling of a single trigger input and there is not much room for a flexible and stealthy attack program.

We note that an attacker can make a logical combination of two or more single-stage trigger outputs to create a variety of more flexible multi-stage analog triggers. Basic operations to combine two triggers include AND and OR. When analyzing the behavior of logic operations on single stage trigger output, it should be noted that the single-stage trigger outputs 0 when triggered. Thus, for AND operation, the final trigger is activated when either A or B triggers fire. For OR operation, the final trigger is activated when both A and B triggers fire. It is possible for an attacker to combine these simple AND and OR-connected triggers into an arbitrarily complex multi-level multi-stage trigger.
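The trigger behavior of Sections 3.1 and 3.2 can be illustrated with a small behavioral simulation. The sketch below is not the authors' circuit or their SPICE model: the capacitor ratio, supply voltage, threshold, and the choice to model leakage as a fixed fraction of the voltage lost per time step are all assumptions made for illustration, hysteresis is not modeled, and the names `AnalogTrigger`, `run`, `and_stage`, and `or_stage` are hypothetical. It applies the charge-sharing step ΔV = Cunit × (VDD − V0) / (Cunit + Cmain) on each toggle and reports an active-low output, so 0 means the trigger has fired.

```python
from dataclasses import dataclass


@dataclass
class AnalogTrigger:
    """Behavioral sketch of a single-stage charge-sharing trigger.

    All numeric values are illustrative assumptions, not taken from the paper.
    """
    c_unit: float = 1.0        # unit capacitor, arbitrary units (assumed)
    c_main: float = 15.0       # main capacitor, arbitrary units (assumed)
    vdd: float = 1.0           # supply voltage in volts (assumed)
    v_threshold: float = 0.5   # detector switching voltage (assumed)
    leak: float = 0.02         # fraction of voltage lost per time step (assumed)
    v: float = 0.0             # voltage currently held on the main capacitor

    def step(self, toggled: bool) -> int:
        """Advance one time step; return the active-low output (0 = triggered)."""
        if toggled:
            # Charge sharing: dV = C_unit * (VDD - V0) / (C_unit + C_main)
            self.v += self.c_unit * (self.vdd - self.v) / (self.c_unit + self.c_main)
        # Leakage is always present and provides the natural reset behavior.
        self.v *= 1.0 - self.leak
        return 0 if self.v > self.v_threshold else 1


def run(trigger: AnalogTrigger, pattern) -> list:
    """Feed a sequence of toggle events and collect the trigger outputs."""
    return [trigger.step(bool(t)) for t in pattern]


def and_stage(a_out: int, b_out: int) -> int:
    """AND of active-low outputs: fires (0) when either input trigger fires."""
    return a_out & b_out


def or_stage(a_out: int, b_out: int) -> int:
    """OR of active-low outputs: fires (0) only when both input triggers fire."""
    return a_out | b_out


if __name__ == "__main__":
    fast = AnalogTrigger()
    fast_out = run(fast, [1] * 100)                          # toggles every step
    slow = AnalogTrigger()
    slow_out = run(slow, [i % 8 == 0 for i in range(100)])   # toggles rarely
    print("fast toggling fires:", 0 in fast_out)
    print("slow toggling fires:", 0 in slow_out)
    print("output after idle period:", run(fast, [0] * 200)[-1])
    print("AND stage:", and_stage(fast_out[-1], slow_out[-1]))
    print("OR stage:", or_stage(fast_out[-1], slow_out[-1]))
```

With these assumed parameters, sustained toggling charges the capacitor past the threshold, sparse toggling never does because leakage outpaces accumulation, and an idle period after triggering lets the output reset, which mirrors the trigger time and retention time of Figure 2.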
Figure 4. Transistor-level schematic of analog trigger circuit.
[Figure: the trigger inputs drive switches M0 and M1 between VDD, Cunit, and Cmain; a detector on the capacitor voltage produces the trigger output; M2 provides drain leakage alongside the switch and cap leakage paths.]

3.3. Triggering the attack
For A2, the payload design is independent of the trigger mechanism, so our proposed analog trigger is suitable for various payloads to achieve different attacks. Since the goal of this work is to achieve a Trojan that is nearly invisible while providing a powerful foothold for a software-level attacker, we couple our analog triggers to a privilege escalation attack [9], which provides maximum capabilities to an attacker. We propose a simple design to overwrite security critical registers directly by adding one AND/OR gate to asynchronous set or reset pins of the registers. These reset/set pins are specified in original designs for processor reset. These reset signals are asynchronous with no timing constraints so that adding one gate into the reset signal of one register does not affect functionality or timing constraints of the design. Because there are no timing constraints on asynchronous inputs, the payload circuit can be inserted manually after final placement and routing in a manner consistent with our threat model.
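The payload idea of Section 3.3 can be sketched at the logic level. The following Python fragment is only an illustration under stated assumptions: it assumes the victim register has an active-low asynchronous set pin, that a value of 1 encodes privileged mode, and that the trigger output is active-low (0 when fired), and the names `payload_gate` and `PrivilegeBit` are hypothetical rather than taken from the OR1200 design.

```python
def payload_gate(original_set_n: int, trigger_out: int) -> int:
    """AND gate spliced into an active-low asynchronous set line (assumed polarity).

    trigger_out is active-low: 0 means the analog trigger has fired.
    """
    return original_set_n & trigger_out


class PrivilegeBit:
    """Hypothetical register bit with an active-low asynchronous set pin."""

    def __init__(self) -> None:
        self.value = 0  # 0 = user mode, 1 = privileged (assumed encoding)

    def update(self, set_n: int) -> int:
        # An asserted (0) set pin forces the bit to 1 regardless of the clock.
        if set_n == 0:
            self.value = 1
        return self.value


if __name__ == "__main__":
    bit = PrivilegeBit()
    # Normal operation: neither the original set signal nor the trigger is asserted.
    print("idle:", bit.update(payload_gate(original_set_n=1, trigger_out=1)))
    # Trigger fires: the single added gate asserts the set pin.
    print("triggered:", bit.update(payload_gate(original_set_n=1, trigger_out=0)))
```

Because the added gate passes the original set signal through unchanged while the trigger is idle, the register behaves normally until the trigger fires, which is consistent with the claim that one extra gate on an asynchronous pin does not alter functionality.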
Figure 5. SPICE simulation waveform of analog trigger circuit.
[Figure: trigger input, trigger output, and cap voltage plotted against time from 0.0 to 6.0 us, with the trigger time (240ns) and retention time (0.8us) marked.]

3.4. Selecting victims
It is important that the attacker validate their choice of victim signal. This requires verifying that the victim wire has low baseline activity and its activity level is controllable given the expected level of access of the attacker. To validate that the victim wire used in A2 has a low background activity, we use benchmarks from the MiBench embedded systems benchmark suite. For cases where the attacker does not have access to such software or the attacked processor will see a wide range of use, the attacker can follow A2’s example and use a multi-stage trigger with wires that toggle in a mutually-exclusive fashion and require inputs that are unlikely to be produced using off-the-shelf tools (e.g., GNU Compiler Collection (GCC)).

Validating that the victim wire is controllable requires that the attacker reason about their expected level of access to the end user system for the attacked processor. In A2, we

assume that the attacker can load and execute any unprivileged instruction. This allows us to
create hand-crafted assembly sequences that activate the attack. This model works for attackers
that have an account on the system, attackers in a virtual machine, or even attackers that can
convince users to load code.

4. IMPLEMENTATION
To experimentally verify A2, we implement and fabricate an open source processor with the
proposed analog Trojans inserted in 65nm General Purpose Complementary Metal-Oxide-Semiconductor
(CMOS) technology. Multiple attacks are implemented in the chip. One set of attacks consists of
Trojans aimed at exposing A2's end-to-end operation, while the other set of attacks is
implemented outside the processor, directly connected to Input/Output (IO) pins so that we can
investigate trigger behavior directly.

4.1. Attacking a real processor
We implemented an open source OR1200 processor14 to verify our A2 attack including software
triggers, analog triggers and payload. The OR1200 Central Processing Unit (CPU) is an
implementation of the 32-bit OR1K instruction set with a five stage pipeline. The implemented
system in silicon consists of an OR1200 core with 128B instruction cache and an embedded 128KB
main program memory connected through a Wishbone bus. The OR1K instruction set specifies the
existence of a privileged register called the Supervision Register (SR). The SR contains bits
that control how the processor operates (e.g., Memory Management Units (MMU) and caches enabled)
and flags (e.g., carry flag). One particular bit is interesting for security purposes: SR[0]
controls the privilege mode, with 0 denoting user mode and 1 denoting supervisor mode. By
overwriting the value of this register, an attacker can escalate a user mode process to
supervisor mode as a backdoor to deploy various high-level attacks.5, 9 Therefore, the payload
of our attack sets this bit in the SR to 1 to give a user mode process full control over the
processor.

Our analog trigger circuits require trigger inputs that can have a high switching activity under
certain (attacker) programs but are almost inactive during testing or common case operation so
that the Trojan is not exposed. To search for suitable victim wires as trigger inputs, we run a
series of programs from MiBench (see Section 5) on the target processor in an HDL simulator,
capturing the toggling rates of all wires. The result shows that approximately 3% of total wires
have nearly zero activity rate, which provides a wide range of options for an attacker. The
target signals must also be easy to control by attack programs. In our attack, we select the
divide-by-zero flag signal as the trigger for the one-stage attack, because it is unlikely for
normal programs to continuously perform division-by-zero while it is simple for an attacker to
deliberately perform such operations in a tight loop. For the two-stage trigger, we select wires
that report whether the division was signed or unsigned as trigger inputs. The attack program
alternately switches the two wires by performing signed, then unsigned division, until both
analog trigger circuits are activated, deploying the attack payload.

Triggering the attack in usermode-only code is only the first part of a successful attack. For
the second part, the attacker must be able to verify that the triggering software works, without
risk of alerting the operating system. To check whether the attack is successful, we take
advantage of a special feature of some registers on the OR1200: some privileged registers are
able to be read by user mode code, but the value reported has some bits redacted. We use this
behavior to let the attacker's code know whether it gets privileged access to the processor or
not.

4.2. Analog activity trigger
We implement both the one-stage and two-stage trigger circuits in 65nm GP CMOS technology based
on SPICE simulations. Both trigger circuits are inserted into the processor to demonstrate the
attack.

Implementation in 65nm GP technology. For prototype purposes, we optimize the trigger circuit
for reliability. Building a reliable circuit under process, temperature, and voltage (PVT)
variations is always more challenging than only optimizing for a certain PVT range; that is, we
construct our attacks so that they work in all fabricated processors at all corner-case
environments. 65nm CMOS technology is not a favorable technology for our attack because the gate
oxide is thinner than in older technologies due to dimension scaling and also thinner than in
the latest technologies because high-κ metal gate techniques are now being employed to reduce
gate leakage. However, through careful sizing, it is still possible to design a circuit robust
across PVT variations, but this requires trading off trigger time and retention time. To reduce
gate leakage, another solution is to use thick oxide transistors commonly used in IO cells as
the MOS cap for Cmain, which shows negligible gate leakage. This option provides larger space
for the configuration of trigger time and retention time but requires larger area due to design
rules. The trigger circuit using an IO device is implemented for the two-stage attack and the
one without an IO device is used for the one-stage attack in the system.

Inserting A2 into existing chip layouts. Since A2's analog trigger circuit is designed to follow
sizing and routing constraints of standard cells and has the area of a single standard cell,
inserting the trigger circuit into the layout at fabrication time is not complicated. In typical
placement and routing cases, around 60% to 70% of total area is used for standard cells,
otherwise routing cannot complete due to routing congestion (our chip is more challenging to
attack as it has 80% area utilization). Therefore, in any layout of digital designs, empty space
exists. This empty space presents an opportunity for attackers as they can occupy the free space
with their own malicious circuit. In our case, we require as little space as one cell. There are
four steps to insert a trigger into the layout of a design:

The first step is to locate the signals chosen as trigger inputs and the target registers to
attack. The insertion of the A2 attack can be done at both the back-end and fabrication stages.
Our threat model focuses on the fabrication stage because it is significantly more challenging
and implies a stealthier attack than one inserted at the back-end stage. The back-end stage
attacker has access to the netlist
of the design, so locating the desired signal is trivial. But an attack inserted at the back-end
stage can still be discovered by SPICE simulation and layout checks, though the chance is
extremely low if no knowledge about the attack exists. In contrast, fabrication time attacks can
only be discovered by post-silicon testing, which is believed to be very expensive and to have
difficulty finding small Trojans. To insert an attack during chip fabrication, some insights
about the design are needed, which can be extracted from layout through physical verification
tools and digital simulations or from a co-conspirator involved in the design phase.

The next step is to find empty space around the victim wire and insert the analog trigger
circuit. Unused space is usually automatically filled with filler cells or capacitor cells by
placement and routing tools. Removing these cells will not affect the functionality or timing.

To insert the attack payload circuit, the reset wire needs to be cut as discussed in Section
3.3. It has been shown that timing of the reset signal is flexible, so the AND or OR gate only
needs to be placed somewhere close to the reset signal. Because the added gates can be minimum
strength cells, their area is small and finding space for them is trivial.

The last step is to manually do the routing from trigger input wires to the analog trigger
circuit and then to the payload circuits. There is no timing requirement on this path, so the
routing can go around existing wires at the same metal layer (jogging) or jump over existing
wires by going to another metal layer (jumping). If long and high metal wires become a concern
of the attacker due to potentially easier detection, repeaters (buffers) can be added to break a
long wire into small sections. Furthermore, it is possible that the attacker can choose
different trigger input wires and/or payload according to the existing layout of the target
design.

In our OR1200 implementation, inserting the attack following the steps above is trivial, even
with the design's 80% area utilization. Routing techniques including jogging and jumping are
used, but such routing approaches are very common for automatic routing tools so the information
leaked by such wires is limited.

Side-channel information. For the attack to be stealthy and defeat existing protections, the
area, power and timing overhead of the analog trigger circuit should be minimized. High accuracy
SPICE simulation is used to characterize power and timing overhead of implemented trigger
circuits. Comparisons with several variants of NAND2 and D flip-flop standard cells from
commercial libraries are summarized in Table 1. The area of the trigger circuit not using an IO
device is similar to an X4 strength D flip-flop. Using an IO device increases trigger circuit
size significantly, but area is still similar to the area of two standard cells, which ensures
it can be inserted into empty space in the final design layout. AC power is the total energy
consumed by the circuits when the input changes; the power numbers are simulated with SPICE on a
netlist including extracted parasitics. Standby power is the power consumption of the circuits
when inputs are static, which comes from leakage currents of CMOS devices.

After inserting A2, post-layout simulation with extracted parasitics shows that the extra delay
of victim wires is 1.2ps on average, which is only 0.33% of 4ns clock period and well below the
process variation and noise range. In practice, such delay difference is nearly impossible to
measure, unless a high-resolution time to digital converter is included on chip, which is
impractical due to its large area and power overhead.

Comparison to digital-only attacks. If we look at a previously proposed, digital only and
smallest implementation of a privilege escalation attack,5 it requires 25 gates and 80µm²
while our analog attack requires as little as one gate for the same effect. Our attack is also
much more stealthy as it requires dozens of consecutive rare events, where the other attack only
requires two. We also implement a digital only, counter-based attack that aims to mimic A2. The
digital version of A2 requires 91 cells and 382µm², almost two orders-of-magnitude more than
the analog counterpart. These results demonstrate how analog attacks can provide attackers the
same power and control as existing digital attacks, while being much more difficult to catch.

5. EVALUATION
We perform all experiments with our fabricated 2.1mm² malicious OR1200 processor as shown in
Figure 6. Figure 6 also marks the locations of A2 attacks, with two levels of zoom to aid in
understanding the challenges of identifying A2 in a sea of non-malicious logic. In fact, A2
occupies less than 0.08% of the chip's area. Our fabricated chip contains two sets of attacks:
the first set of attacks are one and two-stage triggers baked-in to the processor that we use to
assess the end-to-end impact of A2. The second set of attacks exist
Table 1. Comparison of area and power between our implemented analog trigger circuits and
commercial standard cells in 65nm GP CMOS technology.

Function                        Drive strength   Width†   AC power†   Standby power†
NAND2                           X1               1        1           1
NAND2                           X4               3        3.7         4.1
NAND2                           X8               5.75     7.6         8.1
DFF with Async reset            X1               6        12.7        2.6
DFF with Async reset            X4               7.75     21.8        7.2
DFF with Async set and reset    X1               7.5      14.5        3.3
DFF with Async set and reset    X4               8.75     23.6        8.1
Trigger w/o IO device           –                8        7.7         2.2
Trigger w/ IO device            –                13.5     0.08        0.08

* DFF stands for D Flip Flop. † Normalized values.
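
To make the comparisons drawn from Table 1 concrete, the following minimal sketch expresses one row of the table relative to another. The numbers are copied from Table 1; the dictionary layout, the shortened row names, and the `relative_to` helper are illustrative assumptions and not part of the original work.

```python
# Normalized (width, AC power, standby power) values copied from Table 1.
# Row names are shortened here for convenience (an assumption of this sketch).
TABLE1 = {
    "NAND2 X1": (1.0, 1.0, 1.0),
    "NAND2 X4": (3.0, 3.7, 4.1),
    "NAND2 X8": (5.75, 7.6, 8.1),
    "DFF async reset X1": (6.0, 12.7, 2.6),
    "DFF async reset X4": (7.75, 21.8, 7.2),
    "DFF async set/reset X1": (7.5, 14.5, 3.3),
    "DFF async set/reset X4": (8.75, 23.6, 8.1),
    "Trigger w/o IO device": (8.0, 7.7, 2.2),
    "Trigger w/ IO device": (13.5, 0.08, 0.08),
}


def relative_to(cell, reference):
    """Return (width, AC power, standby power) of `cell` divided by `reference`."""
    return tuple(c / r for c, r in zip(TABLE1[cell], TABLE1[reference]))


if __name__ == "__main__":
    for trig in ("Trigger w/o IO device", "Trigger w/ IO device"):
        w, ac, sb = relative_to(trig, "DFF async reset X4")
        print(f"{trig}: width x{w:.2f}, AC power x{ac:.3f}, standby x{sb:.3f}")
```

With these values, the trigger without an IO device comes out within a few percent of the width of the X4 flip-flop with asynchronous reset, which matches the "similar area" statement in the text.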

outside of the processor and are used to fully characterize A2's operation.

We use the testing setup as shown in Figure 7 to evaluate our attacks' response to changing
environmental conditions and a variety of software benchmarks. The chip is packaged and mounted
on a custom testing board to interface with a PC. Through a custom scan chain, we can load
programs into the processor's memory and also check the values of the processor's registers. The
system's clock is provided by an on-chip 240MHz clock generator at the nominal condition (1V
supply voltage and 25°C).

5.1. Does the attack work?
To prove the effectiveness of A2, we evaluate it from two perspectives. One is a system
evaluation that explores the end-to-end behavior of our attack by loading attack-triggering
programs on the processor, executing them in user mode, and verifying that after executing the
trigger sequence, they have escalated privilege on the processor. The other perspective seeks to
explore the behavior of our attacks by directly measuring the performance of the analog trigger
circuit, the most important component in our attack, but also the most difficult aspect of our
attack to verify using simulation.

System attack. Malicious programs described in Section 4.1 are loaded to the processor and then
we check the target register values. In the program, we initialize the target registers SR[0]
(the mode bit) to user mode (i.e., 0) and SR[1] (a free register bit that we can use to test the
two-stage trigger) to 1. When the respective trigger deploys the attack, the single-stage attack
will cause SR[0] to suddenly have a 1 value, while the two-stage trigger will cause SR[1] to
have a 0 value, the opposite of their initial values. Because our attack relies on analog
circuits, environmental aspects dictate the performance of our attack. Therefore, we test the
chip at six temperatures from −25°C to 100°C to evaluate the robustness of our attack.
Measurement results confirm that both the one-stage and two-stage attacks in all ten tested
chips successfully overwrite the target registers at all temperatures.

Analog trigger circuit measurement results. Figure 8 shows the measured distribution of
retention time and trigger cycles at three different trigger toggling frequencies across ten
chips. The results show that our trigger circuits have a regular behavior in the presence of
real-world manufacturing variances, confirming SPICE simulation results. Retention time at the
nominal condition (1V supply voltage and 25°C) is around 1µs for the trigger with only core
devices and 5µs for attacks constructed using IO devices. It is verified that the number of
cycles to trigger the attack for both trigger circuits (i.e., with and without IO devices) is
very close in chip measurements and SPICE simulations. The results indicate that SPICE is
capable of providing results of sufficient accuracy for these unusual attack circuits.

To verify the implemented trigger circuits are robust across voltage and temperature variations
(as SPICE simulation suggests), we characterize each trigger circuit under different supply
voltage and temperature conditions. We

Figure 6. Die micrograph of analog malicious hardware test chip with a zoom-in layout of
inserted A2 trigger.
(Figure 6 labels: OR1200 core, I$, CLK, scan chain, main memory 128KB SRAM, A2 trigger, testing
structure, IO drivers and pads; the zoom-in marks Metal 2, Metal 3, and via.)

Figure 7. Testing setup for test chip measurement. (Labeled components: packaged test chip,
temperature chamber, power supply and source meter, testing PCB, digital IO, Labview.)

Figure 8. Measured distribution of retention time and trigger cycles under different trigger
input divider ratios across 10 chips at nominal 1V supply voltage and 25°C.
(a) Distribution of analog trigger circuit using IO device. Panels: trigger cycles at 120MHz,
9.23MHz, and 1.875MHz; retention time (us). Vertical axis: number of chips.
(b) Distribution of analog trigger circuit using only core device. Panels: trigger cycles at
120MHz, 34.3MHz, and 10.9MHz; retention time (us). Vertical axis: number of chips. Panel
annotation: 2 chips cannot trigger at this switching activity.
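
The dependence of trigger cycles on toggling frequency and leakage reported for Figure 8 can be illustrated with a toy behavioral model. The sketch below is not the authors' circuit or their SPICE setup: it assumes a normalized capacitor voltage that gains a fixed fraction of the remaining headroom on each toggle and leaks exponentially between toggles. The function names and every default parameter (`step`, `leak_tau_s`, `threshold`) are made-up values chosen only for illustration.

```python
import math
from typing import Optional


def cycles_to_trigger(toggle_hz: float,
                      step: float = 0.12,         # assumed charge gained per toggle (fraction of headroom)
                      leak_tau_s: float = 1e-6,   # assumed leakage time constant
                      threshold: float = 0.6,     # assumed trigger threshold (fraction of supply)
                      max_cycles: int = 10_000) -> Optional[int]:
    """Count toggles until the normalized capacitor voltage reaches the threshold.

    Returns None if leakage between toggles keeps the voltage below the
    threshold for max_cycles toggles. All defaults are illustrative only.
    """
    decay = math.exp(-1.0 / (toggle_hz * leak_tau_s))  # leakage over one toggle period
    v = 0.0
    for n in range(1, max_cycles + 1):
        v += step * (1.0 - v)   # each toggle moves some charge onto the capacitor
        if v >= threshold:
            return n
        v *= decay              # the capacitor leaks until the next toggle
    return None


def retention_time_s(v_start: float,
                     leak_tau_s: float = 1e-6,
                     threshold: float = 0.6) -> float:
    """Time for an exponentially leaking voltage to fall from v_start to the threshold."""
    return leak_tau_s * math.log(v_start / threshold)


if __name__ == "__main__":
    for f in (120e6, 30e6, 1e6):
        print(f"{f / 1e6:g} MHz -> cycles to trigger: {cycles_to_trigger(f)}")
    print(f"retention from full charge: {retention_time_s(1.0) * 1e6:.2f} us")
```

In this model, lowering the toggle rate increases the number of cycles needed and eventually prevents triggering altogether, and shortening `leak_tau_s` (a stand-in for larger leakage, for example at higher temperature) raises both the cycle count and the minimum toggle rate. That is the qualitative behavior the measurements describe; the numbers the sketch prints are not predictions for the fabricated chip.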
confirmed that the trigger circuit can be activated when the victim wire toggles between 0.46MHz
and 120MHz, the supply voltage varies between 0.8V and 1.2V, and the ambient temperature varies
between −25°C and 100°C.

As expected, different conditions yield different minimum toggling rates to activate the
trigger. Temperature has a stronger impact than voltage on the trigger condition because of
leakage current's exponential dependence on temperature. At higher temperature, more cycles are
required to trigger and higher switching activity is required because leakage from the capacitor
is larger.

5.2. Is the attack triggered by non-malicious benchmarks?
Another important property for any hardware Trojan is not exposing itself under normal
operations. Because A2's trigger circuit is connected only to the trigger input signal, digital
simulation of the design is enough to acquire the activity of the signals. However, since we
make use of analog characteristics to attack, analog effects should also be considered as
potential causes of accidentally triggering the attack. We use MiBench4 as test bench because it
targets the class of processor that best fits the OR1200 and it consists of a set of
well-understood applications that are popular benchmarks in both academia and in industry. To
validate that A2's trigger avoids spurious activations from a wide variety of software, we
select five benchmark applications from MiBench, each from a different class. This ensures that
we thoroughly test all subsystems of the processor, exposing likely activity rates for the wires
in the processor. Again, in all programs, the victim registers are initialized to the opposite
of the states that A2 puts them in when its attack is deployed. The processor runs all five
programs at six different temperatures from −25°C to 100°C. Results show that neither the
one-stage nor the two-stage trigger circuit is exposed when running these benchmarks across such
a wide temperature range.

5.3. Existing protections
Existing protections against fabrication-time attacks are mostly based on side-channel
information, for example, power, temperature, and delay. In A2, we only add one gate in the
trigger, thus minimizing power and temperature perturbations caused by the attack.

Table 2 summarizes the average power consumption measured when the processor runs our five
benchmark programs, at the nominal condition (1V supply voltage and 25°C). Direct measurement of
trigger circuit power is infeasible in our setup, so simulation is used as an estimation.
Simulated trigger power consumption in Table 1 translates to 5.3nW and 0.5µW for trigger
circuits constructed with and without IO devices. These numbers are based on the assumption that
trigger inputs keep toggling at 1/4 of the clock frequency of 240MHz, which is the maximum
switching activity that our attack program can achieve. In the common case of non-attacking
software, the switching activity is much lower (approaching zero) and only lasts a few cycles so
that the extra power due to our trigger circuit is even smaller. In our experiments, the power
of the attack circuit is orders-of-magnitude less than the normal power fluctuations that occur
in a processor while it executes different instructions. Further discussions about possible
defenses such as split manufacturing and runtime verifications are presented in our original A2
paper.21

Table 2. Power consumption of our test chip running a variety of benchmark programs.

Program               Power (mW)
Standby               6.210
Basic math            23.703
Dijkstra              16.550
FFT                   18.120
SHA                   18.032
Search                21.960
Single-stage attack   19.505
Two-stage attack      22.575
Unsigned division     23.206

6. CONCLUSION
Experimental results with our fabricated malicious processor show that a new style of
fabrication-time attack is possible, which applies to a wide range of hardware, spans the
digital and analog domains, and affords control to a remote attacker. Experimental results also
show that A2 is effective at reducing the security of existing software, enabling unprivileged
software full control over the processor. Finally, the experimental results demonstrate the
elusive nature of A2: (1) A2 is as small as a single gate, two orders of magnitude smaller than
a digital-only equivalent; (2) attackers can add A2 to an existing circuit layout without
perturbing the rest of the circuit; (3) a diverse set of benchmarks fail to activate A2; and (4)
A2 has little impact on circuit power, frequency, or delay.

Our results expose two weaknesses in current malicious hardware defenses. First, existing
defenses analyze the digital behavior of a circuit using functional simulation or the analog
behavior of a circuit using circuit simulation. Functional simulation is unable to capture the
analog properties of an attack, while it is impractical to simulate an entire processor for
thousands of clock cycles in a circuit simulator; this is why we had to fabricate A2 to verify
that it worked. Second, the minimal impact on the run-time properties of a circuit (e.g., power,
temperature, and delay) due to A2 suggests that it is an extremely challenging task for
side-channel analysis techniques to detect this new class of attacks. We believe that our
results motivate a different type of defense, where trusted circuits monitor the execution of
untrusted circuits, looking for out-of-specification behavior in the digital domain.

Acknowledgments
This work was supported in part by C-FAR, one of the six SRC STARnet Centers, sponsored by MARCO
and DARPA. This work was also partially funded by the National Science Foundation. Any opinions,
findings, conclusions, and recommendations expressed in this paper are solely those of the
authors.
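
As a rough check of the Section 5.3 claim that trigger power is orders of magnitude below normal power fluctuations, the following sketch compares the spread of the benchmark powers in Table 2 with the simulated 5.3nW figure for the trigger built with IO devices. The values are copied from the text; using the max-minus-min spread across benchmarks as a proxy for "normal power fluctuations" is an assumption of this sketch, as are the variable and function names.

```python
import math

# Average power in mW for the five benchmark rows of Table 2.
BENCHMARK_POWER_MW = {
    "Basic math": 23.703,
    "Dijkstra": 16.550,
    "FFT": 18.120,
    "SHA": 18.032,
    "Search": 21.960,
}

# Simulated power of the trigger built with IO devices (Section 5.3), in watts.
TRIGGER_POWER_W = 5.3e-9


def benchmark_spread_w(power_mw):
    """Max-minus-min benchmark power, converted from mW to W."""
    values = list(power_mw.values())
    return (max(values) - min(values)) * 1e-3


spread_w = benchmark_spread_w(BENCHMARK_POWER_MW)
ratio = spread_w / TRIGGER_POWER_W
orders_of_magnitude = math.log10(ratio)

if __name__ == "__main__":
    print(f"benchmark spread: {spread_w * 1e3:.3f} mW")
    print(f"spread / trigger power: {ratio:.3g} "
          f"(about {orders_of_magnitude:.1f} orders of magnitude)")
```

Under this proxy the benchmark-to-benchmark variation exceeds the simulated trigger power by roughly six orders of magnitude, which is why a power-based side channel has little to work with.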

References
1. Agrawal, D., Baktir, S., Karakoyunlu, D., Rohatgi, P., Sunar, B. Trojan detection using IC
   fingerprinting. In Symposium on Security and Privacy (S&P, Washington, DC, 2007). IEEE
   Computer Society, 296–310.
2. Becker, G.T., Regazzoni, F., Paar, C., Burleson, W.P. Stealthy dopant-level hardware Trojans.
   In International Conference on Cryptographic Hardware and Embedded Systems (CHES, Berlin,
   Heidelberg, 2013). Springer-Verlag, 197–214.
3. Forte, D., Bao, C., Srivastava, A. Temperature tracking: An innovative run-time approach for
   hardware Trojan detection. In International Conference on Computer-Aided Design (ICCAD,
   2013). IEEE, 532–539.
4. Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B. MiBench: A
   free, commercially representative embedded benchmark suite. In Workshop on Workload
   Characterization (Washington D.C., 2001). IEEE Computer Society, 3–14.
5. Hicks, M., Finnicum, M., King, S.T., Martin, M.M.K., Smith, J.M. Overcoming an untrusted
   computing base: Detecting and removing malicious hardware automatically. USENIX ;login 35, 6
   (Dec. 2010), 31–41.
6. Hicks, M., Sturton, C., King, S.T., Smith, J.M. Specs: A lightweight runtime mechanism for
   protecting software from security-critical processor bugs. In Proceedings of the Twentieth
   International Conference on Architectural Support for Programming Languages and Operating
   Systems (ASPLOS, Istanbul, Turkey, 2015). ACM, 517–529.
7. Jin, Y., Makris, Y. Hardware Trojan detection using path delay fingerprint. In
   Hardware-Oriented Security and Trust (HOST, Washington, DC, 2008). IEEE Computer Society,
   51–57.
8. Kelly, S., Zhang, X., Tehranipoor, M., Ferraiuolo, A. Detecting hardware Trojans using
   on-chip sensors in an ASIC design. Journal of Electronic Testing 31, 1 (Feb. 2015), 11–26.
9. King, S.T., Tucek, J., Cozzie, A., Grier, C., Jiang, W.n., Zhou, Y. Designing and
   implementing malicious hardware. In Workshop on Large-Scale Exploits and Emergent Threats,
   volume 1 of LEET (USENIX Association, Apr. 2008).
10. Kumar, R., Jovanovic, P., Burleson, W., Polian, I. Parametric Trojans for fault-injection
    attacks on cryptographic hardware. In Workshop on Fault Diagnosis and Tolerance in
    Cryptography (IEEE, FDT, 2014), 18–28.
11. Li, J., Lach, J. At-speed delay characterization for IC authentication and Trojan horse
    detection. In Hardware-Oriented Security and Trust (HOST, Washington, DC, 2008). IEEE
    Computer Society, 8–14.
12. Li, M.-L., Ramachandran, P., Sahoo, S.K., Adve, S.V., Adve, V.S., Zhou, Y. Understanding the
    propagation of hard errors to software and implications for resilient system design. In
    International Conference on Architectural Support for Programming Languages and Operating
    Systems (ASPLOS, Seattle, WA, Mar. 2008). ACM, 265–276.
13. Narasimhan, S., Wang, X., Du, D., Chakraborty, R.S., Bhunia, S. TeSR: A robust temporal
    self-referencing approach for hardware Trojan detection. In Hardware-Oriented Security and
    Trust (HOST, San Diego, CA, June 2011). IEEE Computer Society, 71–74.
14. OpenCores.org. OpenRISC OR1200 processor.
15. Potkonjak, M., Nahapetian, A., Nelson, M., Massey, T. Hardware Trojan horse detection using
    gate-level characterization. In Design Automation Conference, volume 46 of DAC (2009),
    688–693.
16. Rostami, M., Koushanfar, F., Rajendran, J., Karri, R. Hardware security: Threat models and
    metrics. In Proceedings of the International Conference on Computer-Aided Design (ICCAD, San
    Jose, CA, 2013). IEEE Press, 819–823.
17. Sugawara, T., Suzuki, D., Fujii, R., Tawa, S., Hori, R., Shiozaki, M., Fujino, T. Reversing
    stealthy dopant-level circuits. In International Conference on Cryptographic Hardware and
    Embedded Systems (CHES, New York, NY, 2014). Springer-Verlag, 112–126.
18. S.S. Technology. Why node shrinks are no longer offsetting equipment costs, (online webpage,
    Oct. 2012).
19. Waksman, A., Sethumadhavan, S. Silencing hardware backdoors. In IEEE Security and Privacy
    (S&P, Oakland, CA, May 2011). IEEE Computer Society.
20. Wang, X., Narasimhan, S., Krishna, A., Mal-Sarkar, T., Bhunia, S. Sequential hardware
    trojan: Side-channel aware design and placement. In Computer Design (ICCD), 2011 IEEE 29th
    International Conference on (IEEE, Oct 2011), 297–300.
21. Yang, K., Hicks, M., Dong, Q., Austin, T., Sylvester, D. A2: Analog malicious hardware. In
    2016 IEEE Symposium on Security and Privacy (SP) (May 2016). IEEE Computer Society, 18–37.

* Kaiyuan Yang (kyang@rice.edu), Dept. of ECE, Rice University, Houston, TX.
* Matthew Hicks (mdhicks@gmail.com), Dept. of CS, Virginia Tech, Blacksburg, VA.
Qing Dong, Todd Austin, and Dennis Sylvester ({kaiyuan, mdhicks, qingdong, austin,
dmcs}@umich.edu), Department of EECS, University of Michigan, Ann Arbor, MI.
* This work was done at the University of Michigan, Ann Arbor.

© 2017 ACM 0001-0782/17/09 $15.00
research highlights
DOI:10.1145/3068614
Technical Perspective
Humans and Computers Working Together on Hard Tasks
By Ed H. Chi
To view the accompanying paper, visit doi.acm.org/10.1145/3068663
THE FIELD OF crowdsourcing and human computation has evolved considerably from its early days.
At first, crowdsourcing was mainly conceived as a way to obtain ground truth labels for
datasets, particularly image datasets, in the mid-2000s. Soon after, researchers began to
utilize crowdsourcing for performing large-scale user studies of systems.a,b As our
understanding of crowdsourcing continued to evolve, researchers realized the workers can be
reserved ahead of time to perform real-time tasks.c Utilizing this idea, the system described in
the following paper demonstrates how a crowd of workers can caption speech nearly as well as a
professional captionist. Importantly, this paper was one of the first in a recent set of
crowdsourcing papers that demonstrated how human workers can collaborate in concert with
computing systems to accomplish a real-time task that is difficult for either one to do by
itself. This is notable for many reasons, but let me first summarize the significance of this
work.

First, the system demonstrated that significant innovation is needed to get human workers to
productively perform the captioning task. For example, the Scribe system slows down the
continuous speech for a brief period of time with the right volume changes to emphasize what
passage to transcribe for the worker. The volume variations help with audio saliency. This
technique is interesting to human-computer interaction (HCI) researchers, since it utilizes our
intuition about how we can direct human attention, helping to transform individual untrained
workers into better captionists.

Second, the system uses a MapReduce programming paradigm to divide and conquer the various
pieces of the captioning tasks and coordinates the workers and their tasks through this
organization paradigm. First introduced by Kittur et al.,d this is a clever application of the
MapReduce paradigm, but instead of applying to computing tasks, the system applies the concept
to organizing human tasks.

Third, impressively, to combine the partial contributions from individual workers, the system
utilizes a sequence alignment algorithm to combine the streams of input from various workers.
This is novel because most crowdsourcing systems use a simple majority voting approach to
combine the worker inputs. The use of a sophisticated algorithm here is necessary to fit the
captioning problem, and it points to the possibility of other combiner functions in other
problems in future research. A natural extension of the alignment algorithm here would be to
utilize a task-specific language model trained using deep learning.

From a historical perspective, augmenting humans has been at the very center of much personal
computing and HCI research. There has been much talk about the degree to which machine learning
(ML) will replace human labor (HL) in the future, but I think that is misguided. Instead, what
we see in this research is a good example in which humans and machines work in concert on a very
hard task that is currently still too difficult to do by either alone. Interestingly, this
aligns well with a historical recounting of the code-breaking work by Turing and colleagues at
Bletchley Park in a recent issue of Communications: "Another myth is that code-breaking machines
eliminated human labor and code-breaking skill ... Technology transcended, rather than
supplemented, human labor and bureaucracy."e The article points out the real challenge of the
whole effort was a combination of the management of a (mostly female!) human operator force
along with the Enigma machines. From my perspective, intelligent augmentation of our abilities
is the real research frontier.

While we continue to explore the boundary of what is possible for machine intelligence, we
should also be exploring the boundary of how humans will interact with machine intelligence. For
example, how can we have an intelligent conversation with computing systems? Can I talk to a
restaurant recommendation system while I drive home to get ready for a dinner date? How should
my television respond if I say I wanted an exciting action film tonight that takes into account
the tastes of other family members? If it doesn't have enough information on everyone in the
room, will it (he/she?) ask intelligent questions while naturally conversing with my guests? Can
I give feedback both via hand gestures as well as voice dialog?

Since an important application of machine intelligence is to augment humans in their desires,
goals, and tasks, what we should do is to ask important research questions about human
interactions with ML systems. In other words, we should have much better research of ML+HL,
ML+HCI, and ML+Human Interaction, and this research is a shining example that points the way.

a Kittur, A., Chi, E.H., Suh, B. Crowdsourcing user studies with Mechanical Turk. In Proceedings
  of the ACM Conference on Human-Factors in Computing Systems, ACM Press (Florence, Italy,
  2008), 453–456.
b Egelman, S., Chi, E.H., Dow, S. Crowdsourcing in HCI research. Ways of Knowing in HCI. J.S.
  Olson and W.A. Kellogg, Eds. Springer, NY, 2014, 267–289.
c Bernstein, M., Brandt, J., Miller, R., and Karger, D. Crowds in two seconds: Enabling
  real-time crowd-powered interfaces. UIST 2011.
d Kittur, A., Smus, B., Khamkar, S., and Kraut, R.E. CrowdForge: Crowdsourcing complex work. In
  Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (2011),
  43–52; http://dx.doi.org/10.1145/2047196.2047202
e Haigh, T. Colossal genius: Tutte, Flowers, and a bad imitation of Turing. Commun. ACM 60, 1
  (Jan. 2017), 29–35; https://doi.org/10.1145/3018994

Ed H. Chi is Research Lead Manager and Sr. Staff Research Scientist at Google Inc., Mountain
View, CA.

Copyright held by author.

DOI:10.1145/3068663
Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time
By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar,
Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham
Abstract This is particularly true of the large (and increasing) number

Quickly converting speech to text allows deaf and hard of DHH people who lost their hearing later in life, which
of hearing people to interactively follow along with live includes one third of people over 65.12 Captioning may also
speech. Doing so reliably requires a combination of percep- be preferred by some to sign language interpreting for tech-
tion, understanding, and speed that neither humans nor nical domains because it does not involve translating from
machines possess alone. In this article, we discuss how our the spoken language to the sign language, but rather trans-
Scribe system combines human labor and machine intel- literating an aural representation to a written one. Finally,
ligence in real time to reliably convert speech to text with like captionists, sign language interpreters are also expen-
less than 4s latency. To achieve this speed while maintain- sive and difficult to schedule.
ing high accuracy, Scribe integrates automated assistance in People learn to listen and speak at a natural rate of 120–180
two ways. First, its user interface directs workers to different words per minute (WPM).17 They acquire this skill effort-
portions of the audio stream, slows down the portion they lessly without direct instruction while growing up or being
are asked to type, and adaptively determines segment length immersed in daily linguistic interaction, unlike text genera-
based on typing speed. Second, it automatically merges the tion, which is a trained skill that averages 60–80 WPM for
partial input of multiple workers into a single transcript both handwriting29 and typing.14 Professional captionists
using a custom version of multiple-sequence alignment. (stenographers) can keep up with most speakers and pro-
Scribe illustrates the broad potential for deeply interleav- vide captions that are accurate (95%+) and real-time (within
ing human labor and machine intelligence to provide intel- a few seconds). But they are not on-demand (need to be pre-
ligent interactive services that neither can currently achieve booked for at least an hour), and are expensive ($120–$200
alone. per hour).30 As a result, professionals usually cannot provide
access for last minute lectures or other events, or for unpre-
dictable and ephemeral learning opportunities, such as con-
1. INTRODUCTION AND BACKGROUND versations with peers after class.
Real-time captioning converts speech to text in under 5s to pro- Automatic speech recognition (ASR) is inexpensive and
vide access to live speech content for deaf and hard of hearing available on-demand, but its low accuracy in many real set-
(DHH) people in classrooms, meetings, casual conversation, tings makes it unusable. For example, ASR accuracy drops
and other events. Current options are severely limited because below 50% when it is not trained on the speaker, caption-
they either require highly-skilled professional captionists ing multiple speakers, and/or when not using a high-quality
whose services are expensive and not available on demand, microphone located close to the speaker.3, 6 Both ASR and
or use automatic speech recognition (ASR) which produces the software used to assist real-time captionists often make
unacceptable error rates in many real-world situations.10 We errors that can change the meaning of the original speech.
present an approach that leverages groups of non-expert cap- As DHH people use context to compensate for errors, they
tionists (people who can hear and type, but are not specially often have trouble following the speaker.6
trained stenographers) to collectively caption speech in real- Our approach is to combine the efforts of multiple non-
time, and explore this new approach via Scribe, our end-to- expert captionists. Because these non-expert captionists can
end system allowing on-demand real-time captioning for live be drawn from more diverse labor pools than professional
events.19 Scribe integrates human and machine intelligence in captionists, they are more affordable and more easily avail-
real time to reliably caption speech at natural speaking rates. able on demand. Recent work has shown, for instance, that
The World Health Organization (WHO) estimates that
around 5% of the world population, that is, 360 million peo-
Sign languages, such as American Sign Language (ASL) are not simply codes
ple, have disabling hearing loss.32 They struggle to under- for an aural language, but rather entirely different languages with their own
stand speech and benefit from visual input. Some combine vocabulary, grammar, and syntax.
lip-reading with listening, while others primarily watch
visual translations of aural information, such as sign lan-
The original version of this paper is entitled “Real-Time
guage interpreters or real-time typists. While visual access
Captioning by Groups of Non-Experts” and was published
to spoken material can be achieved through sign language
in UIST, 10/2012, ACM.
interpreters, many DHH people do not know sign language.
research highlights
workers on Mechanical Turk can be recruited within a few sec- metrics used in this paper. Methods for producing real-time
onds,1, 2, 11 and engaged in continuous tasks.21, 24, 25, 28 Recruiting captioning services come in three main varieties:
from a broader pool allows workers to be selectively chosen Computer-Aided Real-time Transcription (CART): CART
for their expertise not in captioning but in the technical is the most reliable real-time captioning service, but is
areas covered in a lecture. While professional stenographers also the most expensive. Trained stenographers type in
are able to type faster and more accurately than most crowd shorthand on a “steno” keyboard that maps multiple key
workers, they are not necessarily experts in the field they are presses to phonemes that are expanded to verbatim text.
captioning, which can lead to mistakes that distort the mean- Stenography requires 2–3 years of training to consistently
ing of transcripts of technical talks.30 Scribe allows student keep up with natural speaking rates that average 141 WPM
workers to serve as non-expert captionists for $8–$12 per hour and can reach 231 WPM.13
(a typical work-study pay rate). Therefore, we could hire sev- Non-Verbatim Captioning: In response to the cost of
eral students for much less than the cost of one professional CART, computer-based macro expansion services like
captionist. C-Print were introduced.30 C-Print captionists need less train-
Scribe makes it possible for non-experts to collabora- ing, and generally charge around $60 an hour. However, they
tively caption speech in real time by providing automated normally cannot type as fast as the average speaker’s pace,
assistance in two ways. First, it assists captionists by mak- and cannot produce a verbatim transcript. Scribe employs
ing the task easier for each individual. It directs each captionists with no training and compensates for slower
worker to type only part of the audio stream, it slows down typing speeds and lower accuracy by combining the efforts
the portion they are asked to type so they can more easily of multiple parallel captionists.
keep up, and it adaptively determines the segment length Automated Speech Recognition: ASR works well in ideal
based on each individual’s typing speed. Second, it solves situations with high-quality audio equipment, but degrades
the coordination problem for workers by automatically quickly in real-world settings. ASR has difficulty recogniz-
merging the partial input of multiple workers into a single ing domain-specific jargon, and adapts poorly to changes,
transcript using a custom version of multiple-sequence such as when the speaker has a cold.6 ASR systems can
alignment. require substantial computing power and special audio
Because captions are dynamic, readers spend far more equipment to work well, which lowers availability. In our
mental effort reading real-time captions compared to experiments, we used Dragon Naturally Speaking 11.5 for
static text. Also, regardless of method, captions require Windows.
users to absorb information that is otherwise consumed Re-speaking: In settings where trained typists are not
via two senses (vision and hearing) via only one (vision). common (such as in the U.K.), alternatives have arisen. In
In classroom settings, this can be particularly common, re-speaking, a person listens to the speech and enunci-
with content appearing on the board and being refer- ates clearly into a high-quality microphone, often in a spe-
enced in speech. The effort required to track both the cial environment, so that ASR can produce captions with
captions and the material they pertain to simultaneously high accuracy. This approach is generally accurate, but
is one possible reason why deaf students often lag behind cannot produce punctuation, and has considerable delay.
their hearing peers, even with the best accommodations.26 Additionally, re-speaking still requires extensive training,
To address these issues, we also explore how captions since simultaneous speaking and listening is challenging.
can be best presented to users,16 and show that control-
ling bookmarks in caption playback can even increase 3. LEGION: SCRIBE
comprehension.22 Scribe gives users on-demand access to real-time cap-
This paper outlines the following contributions: tioning from groups of non-experts via their laptop or
mobile devices (Figure 1). When a user starts Scribe, it
• Scribe, an end-to-end system that has advantages over immediately begins recruiting workers to the task from
current state-of-the-art solutions in terms of availabil- Mechanical Turk, or a pool of volunteer workers, using
ity, cost, and accuracy. LegionTools.11, 20 When users want to begin captioning
• Evidence that non-experts can collectively cover speech audio, they press the start button, which forwards audio
at rates similar to or above that of a professional. to Flash Media Server (FMS) and signals the Scribe server
• Methods for quickly merging multiple partial captions to begin captioning.
to create a single, accurate stream of final results. Workers are presented with a text input interface
• Evidence that Scribe can produce transcripts that both designed to encourage real-time answers and increase
cover more of the input signal and are more accurate global coverage (Figure 2). A display shows workers their
than either ASR or any single constituent worker. rewards for contributing in the form of both money and
• The idea of automatically combining the real-time points. In our experiments, we paid workers $0.005 for
efforts of dynamic groups of workers to outperform every word the system thought was correct. As workers type,
individuals on human performance tasks. their input is forwarded to an input combiner on the Scribe
server. The input combiner is modular to accommodate dif-
2. CURRENT APPROACHES ferent implementations without needing to modify Scribe.
We first overview current approaches for real-time cap- The combiner and interface are discussed in more detail
tioning, introduce our data set, and define the evaluation later in this article.

Figure 1. Scribe allows users to caption audio on their mobile device. The audio is sent to multiple amateur captionists who use Scribe’s Web-based
interface to caption as much of the audio as they can in real time. These partial captions are sent to our server to be merged into a final output stream,
which is then forwarded back to the user’s mobile device. Crowd workers are optionally recruited to edit the captions after they have been merged.
[Figure 1 diagram labels: speech source, Flash media server, caption stream, merging server, crowd corrections, merged captions, output. Example caption text: “we have a crystal that has a two-fold axis”.]
Figure 2. The original worker interface encourages captionists

The user interface for Scribe presents streaming text
to type quickly by locking in words soon after they are typed. To within a collaborative editing framework (see Figure 3).
encourage coverage of specific segments, visual and audio cues are Scribe’s interface masks the staggered and delayed format
presented, the volume is reduced during off periods, and rewards of real-time captions with a more natural flow that mimics
are increased during these periods.
writing. In doing this, the interface presents the merged
inputs from the crowd workers via a dynamically updating
Web page, and allows users to focus on reading, instead of
tracking changes. We have also developed methods for let-
ting users have more control over their own caption play-
back, which can improve comprehension.22 When users are
done, pressing stop will end the audio stream, but lets work-
ers complete their current transcription task. Workers are
asked to continue working on other audio for a time to keep
them active so that response time is reduced if users need to
resume captioning.
Though this article focuses on captioning speech from
a single person, Scribe can handle dialogues using auto-
mated speaker segmentation techniques. We use a stan-
dard convolution-based kernel method to first identify
distinct segments in a waveform. We then use a one-class
support vector machine (SVM) to classify each segment and
Figure 3. The Web-based interface that shows users the live caption assign a speaker ID.15 Prior work has shown such segmenta-
stream returned by Scribe. tion techniques to be accurate even in the presence of severe
noise, such as when talking on a cellphone while driving.12
The segmentation allows us to decompose a dialogue in real-
time, then caption each part individually, without burden-
ing workers with the need to determine and annotate which
person is currently speaking.
Our solution to the transcription problem is two-fold.
First, we designed an interface that facilitates real-time cap-
tioning by non-experts and encourages covering the entire
audio signal. Second, we developed algorithms for merging
partial captions to form one final output stream. The inter-
face and algorithm have been developed to address these
problems jointly. For instance, because determining where
each word in a partial caption fits into the final transcript
is difficult, we designed the interface to encourage work-
ers to type continuous segments during specified periods.
In the following sections, we detail the co-evolution of the 6s. This seems to work well in practice, but it is likely that it
worker interface and algorithm for merging partial captions is not ideal for everyone (discussed below). Our experience
in order to form a final transcript. suggests that keeping the in period short is preferable even
when a particular worker was able to type more than the
4. COORDINATING CAPTIONISTS period because the latency of a worker’s input tended to go
Scribe’s non-expert captioning interface allows contributors up as they typed more consecutive words.
to hear an audio stream of the speaker(s), and provide cap-
tions with a simple user interface (UI) (Figure 2). Captionists 5. IMPROVING HUMAN PERFORMANCE
are instructed to type as much as they can, but are under no Even when workers are directed to small, specific portions of
pressure to type everything they hear. If they are able, work- the audio, the resulting partial captions are not perfect. This
ers are asked to separate contiguous sequences of words by is due to several factors, including bursts of increased speak-
pressing enter . Knowing which word sequences are likely to ing rates being common, and workers mis-hearing some con-
be contiguous can help later when recombining the partial tent due to a particular accent or audio disruption. To make
captions from multiple captionists. the task easier for workers, we created TimeWarp,23 which
To encourage real-time entry of captions, the interface allows each worker to type what they hear in clips with a lower
“locks in” words a short time after they are typed (500ms). playback rate, while still keeping up with real time and main-
New words are identified when the captionist types a space taining context from content they are not responsible for.
after the word, and are sent to the server. The delay is added to
allow workers to correct their input while adding as little addi- 5.1. Warping time
tional latency as possible to it. When the captionist presses TimeWarp manages this by balancing the play speed dur-
enter (or following a 2s timeout during which they have not ing in periods, where workers are expected to caption the
typed anything), the line is confirmed and animates upward. audio and the playback speed is reduced, and out periods,
During the 10–15s trip to the top of the display (depending where workers listen to the audio and the playback speed is
on settings), words that Scribe determines were entered cor- increased. A cycle is one in period followed by an out period.
rectly (based on either spell-checking or overlap with another At the beginning of each cycle, the worker’s position in the
worker) are colored green. When the line reaches the top, a audio is aligned with the real-time stream. To do this, we
point score is calculated for each word based on its length first need to select the number of different sets of workers
and whether it has been determined to be correct. N that will be used in order to partition the stream. We call
To recover the true speech, non-expert captions must the length of the in period Pi, the length of the out period Po
cover all of the words spoken. A primary reason why the par- and the play speed reduction factor r. Therefore, the play-
tial transcriptions may not fully cover the true signal relates back rate during in periods is 1/r. The amount of the real-time
to saliency, which is defined in a linguistic context as “that stream that gets buffered while playing at the reduced speed
quality which determines how semantic material is distrib- is compensated for by an increased playback speed of (N − 1)/(N − r)
uted within a sentence or discourse, in terms of the relative during out periods. The result is that the cycle time of the modi-
emphasis which is placed on its various parts”.7 Numerous fied stream equals the cycle time of the unmodified stream.
factors influence what is salient, and so it is likely to be dif- To set the length of Pi for our experiments, we conducted
ficult to detect automatically. Instead, we inject artificial preliminary studies with 17 workers drawn from Mechanical
saliency adjustments by systematically varying the volume Turk. We found that their mean typing speed was 42.8 WPM
of the audio signal that captionists hear. Scribe’s captionist on a similar real-time captioning task. We also found that
interface is able to vary the volume over a given period with a worker could type at most 8 words in a row on average before
an assigned offset. It also displays visual reminders of the the per-word latency exceeded 8s (our upper bound on accept-
period to further reinforce this notion. able latency). Since the mean speaking rate is around 150 WPM,13
Initially, we tried dividing the audio signal into segments workers will hear 8 words in roughly 3.2s, with an entry time
that we gave to individual workers. We found several prob- of roughly 8s from the last word spoken. We used this to set
lems with this approach. First, workers tended to take lon- Pi = 3.25s, Po = 9.75s, and N = 4. We chose r = 2 in our tests so that
ger to provide their transcriptions as it took them some time the playback speed would be 1/2 = 0.5 times for in periods, and
to get into the flow of the audio. A continuous stream avoids the play speed for out periods is (N − 1)/(N − r) = 3/2 = 1.5 times.
this problem. Second, the interface seemed to encourage To speed up and slow down the play speed of content
workers to favor quality over speed, whereas streaming con- being provided to workers without changing the pitch
tent reminds workers of the real-time nature of the task. The (which would make the content more difficult to under-
continuous interface was designed in an iterative process stand for the worker), we use the Waveform Similarity
involving tests with 57 remote and local users with a range Based Overlap and Add (WSOLA) algorithm.4 WSOLA works
of backgrounds and typing abilities. These tests showed that by dividing the signal into small segments, then either
workers tended to provide chains of words rather than dis- skipping (to increase play speed) or adding (to decrease
joint words, and needed to be informed of the motivations play speed) content, and finally stitching these segments
behind aspects of the interface to use them properly. back together. To reduce the number of sound artifacts,
A non-obvious question is what the period of the volume WSOLA finds overlap points with similar wave forms, then
changes should be. In our experiments, we chose to play the gradually transitions between sequences during these
audio at regular volume for 4s and then at a lower volume for overlap periods.
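
The playback rates described in Section 5.1 can be checked with a few lines of arithmetic. The following is a minimal sketch, not Scribe's implementation: the function names are hypothetical, and it assumes that each of the N sets of workers takes one in period per cycle, so that Po = (N − 1) · Pi (which holds for the values Pi = 3.25s, Po = 9.75s, N = 4 used in the experiments).

```python
def timewarp_rates(n_sets: int, r: float) -> tuple[float, float]:
    """Return (in_rate, out_rate) as multiples of real-time playback speed.

    n_sets is N, the number of worker sets partitioning the stream;
    r is the play speed reduction factor for in periods.
    """
    if not 1 <= r < n_sets:
        # Assumption of this sketch: r must stay below N so that the
        # out-period rate (N - 1) / (N - r) is finite and positive.
        raise ValueError("need 1 <= r < N")
    in_rate = 1.0 / r
    out_rate = (n_sets - 1) / (n_sets - r)
    return in_rate, out_rate


def warped_cycle_seconds(p_in: float, p_out: float,
                         in_rate: float, out_rate: float) -> float:
    """Wall-clock time to play one cycle at the modified rates."""
    return p_in / in_rate + p_out / out_rate


if __name__ == "__main__":
    in_rate, out_rate = timewarp_rates(n_sets=4, r=2)
    print(in_rate, out_rate)
    print(warped_cycle_seconds(3.25, 9.75, in_rate, out_rate))
```

With N = 4 and r = 2, the sketch gives rates of 0.5 and 1.5, and the warped cycle lasts 13s, the same as the unmodified cycle of 3.25s + 9.75s, so the worker is back in step with the live stream at the start of each cycle.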

5.2. Integrating ASR into crowd captioning 6.3. Weighted A* search algorithm
Combining ASR into human captioning workflows can also We next developed a weighted A* search based MSA algo-
help improve captioning performance. By using the sug- rithm to efficiently align the partial captions.27 To do this, we
gestions from an ASR system to provide an initial “base- formulate MSA as graph-traversal over a specialized lattice.
line” answer that crowd workers can correct, we can reduce Our search algorithm then takes each node as a state, allow-
latency. However, above an error rate of roughly 30%, the ing us to estimate the cost function g(n) and the heuristic
ASR input actually increases latency because of the cost of function h(n) for each state.
finding and repairing mistakes.9 The opposite integration At each step of the A* search algorithm, the node with the
is also possible: by using sparse human input to provide smallest evaluation function is extracted from the priority
corrections to the word lattice of an ASR system, it is pos- queue Q and expanded by one edge. This is repeated until a
sible to reduce the error rate.8 full alignment is produced (the goal state). While weighted
A* significantly speeds the search for the best alignment, it
6. AGGREGATING PARTIAL CAPTIONS is still too slow for very long sequences. To counteract this,
The problem of aligning and aggregating multiple par- we use fixed-size time windows to scope the exploration to
tial transcripts can be mapped to the well-studied Multiple the most-likely paths.
Sequence Alignment (MSA) problem. The basic formulation of
the problem involves some number of ordered sequences that 7. EXPERIMENTAL RESULTS
include at least some similar elements (coming from the same We have tested our system with non-expert captionists drawn
“dictionary” of possible terms plus a “gap” term). Finding the from both local and remote crowds. As a data set, we used
alignment that minimizes total distance between all pairs of lectures freely available from MIT OpenCourseWare. These
sequences is a non-trivial problem because, in the worst case, lectures were chosen because one of the main goals of Scribe
all possible alignments of the content of each sequence— is to provide captions for classroom activities, and because
including all possible spaces containing a gap term—may the recording of the lectures roughly matches our target as
need to be explored. This optimization problem has been well—there is a microphone in the room that often captures
shown to be NP-complete,31 and exact algorithms have time multiple speakers, for example, students asking questions.
complexity that is exponential in the number of sequences. We chose four 5 min segments that contained speech from
As a result, it is often necessary to apply heuristic approxima- courses in electrical engineering and chemistry, and had
tions to perform MSA within a reasonable amount of time. them professionally transcribed at a cost of $1.75 per minute.
In practice, MSA is a well-studied problem in the bio- Despite the high cost, we found a number of errors and omis-
informatics literature that has long been used in aligning sions. We corrected these to obtain a completely accurate
genome sequences, but also has applications in approximate baseline.
text matching for information retrieval, and in many other
domains. Tools like MUSCLE5 provide extremely pow- 7.1. Core system study results
erful solvers for MSA problems. Accordingly, our approach Our study used 20 local participants. Each participant cap-
is to formulate our text-matching problem as MSA. tioned 23 min of aural speech over a period of approximately
30 min. Participants first took a standard typing test and
6.1. Progressive alignment algorithms averaged a typing rate of 77.0 WPM (SD=15.8) with 2.05%
Most MSA algorithms for biological sequences follow a average error (SD=2.31%). We then introduced participants
progressive alignment strategy that first performs pair- to the real-time captioning interface, and had them caption
wise alignment among the sequences, and then merges a 3 min clip using it. Participants were then asked to caption
sequences progressively according to a decreasing order of the four 5 min clips, two of which were selected to contain
pairwise similarity. Due to the sequential merging strategy, saliency adjustments. We measure coverage (recall within a
progressive alignment algorithms cannot recover from the 10s per-word time bound), precision, and WER.
errors made in the earlier iterations, and typically do not We found that saliency adjustment made a significant
work well for the caption alignment task. difference on coverage ranges. For the electrical engineer-
ing clip, the difference was 54.7% (SD=9.4%) for words in the
6.2. Graph-based alignment selected periods as compared to only 23.3% (SD=6.8%) for
We first explored a graph-based incremental algorithm to com- words outside of those periods. For the chemistry clips, the
bine partial captions on the fly.19 The aggregation algorithm difference was 50.4% (SD=9.2%) of words appearing inside
incrementally builds a chain graph, where each node repre- the highlighted period as compared to 15.4% (SD=4.3%) of
sents a set of equivalent words entered by the workers, and the words outside of the period.
links between nodes are adjusted according to the order of the To see if workers on Mechanical Turk could complete this
input words. A greedy search is performed to identify the path task effectively—which would open up a large new set of work-
with the highest confidence, based on worker input and an ers who are available on-demand—we recruited a crowd to
n-gram language model. The algorithm is designed to be used caption the four clips (20 min of speech). Our tasks paid $0.05
online, and hence has high speed and low latency. However, and workers could make an additional $0.002 bonus per word.
due to the incremental nature of the algorithm and the lack of We provided workers with a 40s instructional video to begin
a principled objective function, it is not guaranteed to find the
globally optimal alignment for the captions. http://ocw.mit.edu/courses/.
with. In total, 18 workers participated, collectively achieving 7.3. TimeWarp results

78.0% coverage. The average coverage over just three work- To evaluate TimeWarp, we ran two studies that asked par-
ers was 59.7% (SD=10.9%), suggesting we could be conservative ticipants to caption a 2.5 min (12 captioning cycles) lecture
in recruiting workers and cover much of the input signal. clip. Again, we ran our experiments with both local partici-
In our tests, workers achieved an average of 29.0% cover- pants and workers recruited from Mechanical Turk. Tests
age, ASR achieved 32.3% coverage, CART achieved 88.5% cov- were divided into two conditions: time warping on or off,
erage and Scribe reached 74% out of a possible 93.2% coverage and were randomized across four possible time offsets: 0s,
using 10 workers (Figure 4). Collectively, workers had an aver- 3.25s, 6.5s, 9.75s.
age latency of 2.89s, significantly improving on CART’s latency Local participants were again generally proficient (but
of 4.38s. For this example, we tuned our combiner to balance non-expert) typists and had time to acquaint themselves
coverage and precision (Figure 5), getting an average of 66% with the system, which may better approximate student
and 80.3% respectively. As expected, CART outperforms the employees captioning a classroom lecture. We recruited
other approaches. However, our combiner presents a clear 24 volunteers (mostly students) and had them practice with
improvement over both ASR and a single worker. our baseline interface before using the time warp interface.
Each worker was asked to complete two trials, one with
7.2. Improved combiner results TimeWarp and one without, in a random order.
We further improved alignment accuracy by applying a novel We also recruited 139 Mechanical Turk workers, who
weighted-A* MSA algorithm.27 To test this, we used the same were allowed to complete at most two tasks and were
four 5 min long audio clips as before. We tested three con- randomly routed to each condition (providing 257 total
figurations of our algorithm: (1) no agreement needed with responses). Since Mechanical Turk often contains low qual-
a 15s sliding window, (2) two-person agreement needed with ity (or even malicious) workers,18 we first removed inputs
a 10s window, and (3) two-person agreement needed with a which got less than 10% coverage or precision or were outli-
15s window. We compare the results from these three con- ers more than 2σ from the mean. A total of 206 tasks were
figurations to our original graph-based method, and to the approved by this quick check. Task payment amounts were
MUSCLE package (Figure 6). the same as for our studies described above.
With agreement and a 15s window (the best perform- Our student captionists were able to caption a major-
ing setting), our algorithm achieves 57.4% average (1-WER) ity of the content well even without TimeWarp. The mean
accuracy, providing 29.6% improvement with respect to the coverage from all 48 trials was 70.23% and the mean pre-
graph-based system (average accuracy 42.6%), and 35.4% cision was 70.71%, compared to the 50.83% coverage and
improvement with respect to the MUSCLE-based MSA sys- 62.23% precision for workers drawn from Mechanical
tem (average accuracy 41.9%). On the same set of audio clips, Turk. For student captionists, total coverage went up
we obtained 36.6% accuracy using ASR (Dragon Naturally 2.02%, from 69.54% to 70.95%, and precision went up by
Speaking, version 11.5 for Windows), which is worse than 2.56% from 69.84% to 71.63%, but neither of these differ-
all the crowd-powered approaches. We intentionally did not ences were detectably significant. However, there was a
optimize the ASR for the speaker or acoustics, since DHH significant improvement in mean latency per word, which
students would also not be able to do this in realistic settings. improved 22.46% from 4.34s to 3.36s (t(df) = 2.78, p <
Figure 4. Optimal coverage reaches nearly 80% when combining the input of four workers, and nearly 95% with all 10 workers, showing
captioning audio in real time with non-experts is feasible.
[Figure 4 plot: coverage (0% to 100%) versus number of workers (1 to 10); series: Optimal, CART, SCRIBE, ASR, Single.]

Figure 5. Precision-coverage curves for the electrical engineering (EE) and chemistry (Chem) lectures using different combiner parameters with 10 workers. In general, increasing coverage reduces accuracy.

8. CONCLUSION AND FUTURE WORK
Scribe is the first system capable of making reliable, affordable captions available on-demand to deaf and hard of hearing
users. Scribe has allowed us to explore further issues related
to how real-time captions can be made more useful to end
users. For example, when captions are used, we have shown
that students’ comprehension of instructional material sig-
nificantly improves when they have the ability to control when
the captions play, and track their position so that they are not
overwhelmed by using one sensory channel to absorb content
that is designed to be split between both vision and hearing.
To help address this problem, we built a tool that lets students
highlight or pause at the last position they read before looking
away from the captions to view other visual content.22
While we have discussed how automation can be used to
effectively mediate human caption generation, advances in
ASR technologies can aid Scribe as well. By including ASR
systems as workers, we can take advantage of the affordable,
highly-scalable nature of ASR in settings where it works,
while using human workers to ensure that DHH users
always have access to accurate captions. ASR can eventually
use Scribe as an in situ training tool, resulting in systems that are able to provide reliable captions right out of the box using human intelligence, and scale to fully automated solutions quicker than would otherwise be possible.
More generally, Scribe is an example of an interactive system that deeply integrates human and machine intelligence in order to provide a service that is still beyond what computers can do alone. We believe it may serve as a model for interactive systems that solve other problems of this type.

Acknowledgments
This work was supported by the National Science Foundation under awards #IIS-1149709 and #IIS-1218209, the University of Michigan, Google, an Alfred P. Sloan Foundation Fellowship, and a Microsoft Research Ph.D. Fellowship.

Figure 6. Evaluation of different systems using (1-WER) as an accuracy measure (higher is better).
[Figure 6 plot: bar chart of Avg (1.0-WER), axis 0.0 to 0.6, for A*-10-t (c=10s, threshold=2), A*-15-t (c=15s, threshold=2), A*-15 (c=15s, no threshold), Graph-based, and MUSCLE.]
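The (1-WER) accuracy measure used in Figure 6 can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes whitespace tokenization and the standard word-level edit distance (substitutions, insertions, and deletions each cost 1), and the function names `word_error_rate` and `accuracy` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """(1 - WER); can fall below zero when the hypothesis has many extra words."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # One dropped word out of four reference words: WER 0.25, accuracy 0.75.
    print(accuracy("the quick brown fox", "the quick fox"))
```

Because WER is normalized by the reference length, a caption that omits one word in four scores 0.75 on this measure, while a caption padded with many spurious words can score below zero.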
Figure 7. Relative improvement from no warp to warp conditions in terms of mean and median values of coverage, precision, and latency. We expected coverage and precision to improve. Shorter latency was unexpected, but resulted from workers being able to consistently type along with the audio instead of having to remember and go back as the speech outpaced their typing.
[Figure 7 plot: bar chart of mean and median improvement (%), axis 0% to 20%, for Coverage, Precision, and Latency.]

.01). Mechanical Turk workers’ mean coverage (Figure 7) increased 11.39% (t(df) = 2.19, p < .05), precision increased 12.61% (t(df) = 3.90, p < .001), and latency was reduced by 16.77% (t(df) = 5.41, p < .001).

References
1. Bernstein, M.S., Brandt, J.R., Miller, R.C., Karger, D.R. Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11 (New York, NY, USA, 2011). ACM, 33–42.
2. Bigham, J.P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R.C., Miller, R., Tatarowicz, A., White, B., White, S., Yeh, T. Vizwiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST ’10 (New York, NY, USA, 2010). ACM, 333–342.
3. Cooke, M., Green, P., Josifovski, L., Vizinho, A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech commun. 34, 3 (2001), 267–285.
4. Driedger, J. Time-scale modification algorithms for music audio signals. Master’s thesis, Saarland University, 2011.
5. Edgar, R. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32, 5 (2004), 1792–1797.
6. Elliot, L.B., Stinson, M.S., Easton, D., Bourgeois, J. College students learning with C-print’s education software and automatic speech recognition. In American Educational Research Association Annual Meeting (New York, NY, 2008), AERA.
7. Flowerdew, J.L. Salience in the performance of one speech act: the case of definitions. Discourse Processes 15, 2 (Apr–June 1992), 165–181.
8. Metze, F., Gaur, Y., Bigham, J.P. Manipulating word lattices to incorporate human corrections. In Proceedings of INTERSPEECH (2016).
9. Gaur, Y., Lasecki, W.S., Metze, F., Bigham, J.P. The effects of automatic speech recognition quality on human transcription latency. In Proceedings of the 13th Web for All Conference (2016) ACM.
10. Glass, J.R., Hazen, T.J., Cyphers, D.S., Malioutov, I., Huynh, D., Barzilay, R. Recent progress in the MIT spoken lecture processing project. In Interspeech (2007), 2553–2556.
11. Gordon, M., Bigham, J.P., Lasecki, W.S. Legiontools: A toolkit+ UI for recruiting and routing crowds to synchronous real-time tasks. In Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (2015) ACM, 81–82.
12. Gordon-Salant, S. Aging, hearing loss, and speech recognition: stop shouting, I can’t understand you. In Perspectives on Auditory Research, volume 50 of Springer Handbook of Auditory Research. A.N. Popper and R.R. Fay, eds. Springer New York, 2014, 211–228.
13. Jensema, C., McCann, R., Ramsey, S. Closed-captioned television presentation speed and vocabulary. In Am Ann Deaf 140, 4 (October 1996), 284–292.
14. John, B.E., Newell, A. Cumulating the science of HCI: from S-R compatibility to transcription typing. ACM SIGCHI Bulletin 20, SI (Mar. 1989), 109–114.
15. Kadri, H., Davy, M., Rabaoui, A., Lachiri, Z., Ellouze, N., et al. Robust audio speaker segmentation using one class SVMs. In IEEE European Signal Processing Conference (Lausanne, Switzerland, 2008) ISSN: 2219-5491.
16. Kushalnagar, R.S., Lasecki, W.S., Bigham, J.P. Captions versus transcripts for online video content. In Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, W4A ’13 (New York, NY, 2013). ACM, 32:1–32:4.
17. Kushalnagar, R.S., Lasecki, W.S., Bigham, J.P. Accessibility evaluation of classroom captions. ACM Trans Access Comput. 5, 3 (Jan. 2014), 1–24.
18. Lasecki, W., Bigham, J. Online quality control for real-time crowd captioning. In International ACM SIGACCESS Conference on Computers & Accessibility, ASSETS 2012, 2012.
19. Lasecki, W., Miller, C., Sadilek, A., Abumoussa, A., Borrello, D., Kushalnagar, R., Bigham, J. Real-time captioning by groups of non-experts. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, UIST ’12 (2012), 23–34.
20. Lasecki, W.S., Gordon, M., Koutra, D., Jung, M.F., Dow, S.P., Bigham, J.P. Glance: rapidly coding behavioral video with the crowd. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, UIST ’14 (New York, NY, 2014). ACM, 1.
21. Lasecki, W.S., Homan, C., Bigham, J.P. Architecting real-time crowd-powered systems. Human Computation 1, 1 (2014).
22. Lasecki, W.S., Kushalnagar, R., Bigham, J.P. Helping students keep up with real-time captions by pausing and highlighting. In Proceedings of the 11th Web for All Conference, W4A ’14 (New York, NY, 2014). ACM, 39:1–39:8.
23. Lasecki, W.S., Miller, C.D., Bigham, J.P. Warping time for more effective real-time crowdsourcing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13 (New York, NY, 2013). ACM, 2033–2036.
24. Lasecki, W.S., Murray, K., White, S., Miller, R.C., Bigham, J.P. Real-time crowd control of existing interfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11 (New York, NY, 2011). ACM, 23–32.
25. Lasecki, W.S., Song, Y.C., Kautz, H., Bigham, J.P. Real-time crowd labeling for deployable activity recognition. In Proceedings of the 2013 conference on Computer supported cooperative work (2013) ACM, 1203–1212.
26. Marschark, M., Sapere, P., Convertino, C., Seewagen, R. Access to postsecondary education through sign language interpreting. J Deaf Stud Deaf Educ. 10, 1 (Jan. 2005), 38–50.
27. Naim, I., Gildea, D., Lasecki, W.S., Bigham, J.P. Text alignment for real-time crowd captioning. In Proceedings North American Chapter of the Association for Computational Linguistics (NAACL) (2013), 201–210.
28. Salisbury, E., Stein, S., Ramchurn, S. Real-time opinion aggregation methods for crowd robotics. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems (2015), 841–849.
29. Turner, O.G. The comparative legibility and speed of manuscript and cursive handwriting. The Elementary School Journal (1930), 780–786.
30. Wald, M. Creating accessible educational multimedia through editing automatic speech recognition captioning in real time. Interactive Technology and Smart Education 3, 2 (2006), 131–141.
31. Wang, L., Jiang, T. On the complexity of multiple sequence alignment. J Comput Biol. 1, 4 (1994), 337–348.
32. World Health Organization. Deafness and hearing loss, fact sheet N300. http://www.who.int/mediacentre/factsheets/fs300/en/, February 2014.

Walter S. Lasecki (wlasecki@umich.edu), Computer Science & Engineering, University of Michigan.
Christopher D. Miller, Iftekhar Naim, Adam Sadilek, and Daniel Gildea (c.miller@rochester.edu) ({inaim,sadilek,gildea}@cs.rochester.edu), Computer Science Department, University of Rochester.
Raja Kushalnagar (raja.kushalnagar@gallaudet.edu), Information Technology Program, Gallaudet University.
Jeffrey P. Bigham (jbigham@cmu.edu), HCI and LT Institutes, Carnegie Mellon University.

© 2017 ACM 0001-0782/17/09 $15.00


last byte
[ C ONTI N U E D FRO M P. 104]ing.
Coming Next Month in COMMUNICATIONS

Another example is the Shannon
trick of synthesizing text. Imagine “We don’t see things
if you start typing an SMS on your as they are; we see
phone but you keep using the predic-
tive function. The algorithm is very them tinted by
basic—it’s just “look for the last time language and culture
something like this occurred and
steal the next most probable letter.” and all the baggage.”
But you get really interesting results,
because you have a lot of data.
Thanks to the Internet, you’ve got ac-

cess to a massive corpus of data. Didn’t
one of your early papers examine two make new discoveries in ways people Barriers to Refactoring
million images from Flickr? have not been able to do before. I
Exactly. Initially, we said, “We’ll would love to discover something that
just download 20,000 images.” The re- people haven’t noticed yet. Internet Advertising
sults weren’t great. But my then-grad
student, James Hays, was like, “Why What about your recent discovery, in an Millennials’
don’t we just keep downloading?” If analysis of 150,000 American yearbook Attitude Toward
you look at the big neural networks photos, that people’s smiles broad-
IT Consumerization
right now, it is really impressive what ened during each decade since 1900?
they can do. But I think people are for- For the portraits, we were very hap- in the Workplace
getting that one of the reasons they’re py to see the increase in smiling over
so powerful is that they are able to time. We thought, wow, this is a re- What Can Agile Methods
gobble up orders of magnitude more ally cool discovery. Of course, then we Bring to High-Integrity
data than we could do with earlier found some psychological literature
Software Development?
methods. This is not very glamorous, that indicates people have already no-
because it suggests that humans are ticed this.
not so smart. It’s really the data. Programming
Your work has found applications in Languages and
That reminds me of the old philo- areas from entertainment to security. Code Quality
sophical debate about experiential vs. a priori knowledge.
People like to rationalize. They like to get a nice beautiful theory of the world. But reality is often really noisy and complicated, and in a way, data allows you to use this complexity, to not have to throw it away. It’s not the minimalist beauty, the clean lines. It’s the beauty of a jumbled mess.

Your analyses of photographic data sets like faces and building facades have also revealed lots of visual trends that might not otherwise have been easy to notice.
That is a big beautiful promise and we’re only scratching the surface. People are good at finding certain kinds of patterns. We can hold a small number of things in our minds and compare them. We are not able to find a tiny, tiny little pattern over thousands or millions of data points, or very subtle changes over a long range of time. Using computer vision and techniques, I’m hoping we can

What other pie-in-the-sky applications or discoveries do you hope to see?
Frankly, my goal has always been to understand and model biological vision. Human vision is too hard, because it connects with everything else. We don’t see things as they are; we see them tinted by language and culture and all the baggage. But if I’m able to build a model of a rabbit’s vision or a rat’s vision by the time I retire, I think that would be absolutely fantastic. Imagine having a model of this remarkable apparatus that almost all living creatures possess.
Now, because this is such a hard problem, you don’t get wins very often. A lot of the time, it’s a depressing slog. But once in a while, as a kind of byproduct, some really neat things come up that you can use to create pretty pictures. And I think the world needs more pretty pictures.

Leah Hoffmann is a technology writer based in Piermont, NY.

© 2017 ACM 0001-0782/17/09 $15.00

in Github
Multi-Objective Parametric Query Optimization
Metaphors We Compute By
Research for Practice
Why the Bell Curve Hasn’t Transformed Into a Hockey Stick
Plus the latest news on printing 3D body parts, computerized sound processing, and whether smartphones harm children.

last byte
DOI:10.1145/3121444 Leah Hoffmann
Q&A
All The Pretty Pictures
Alexei Efros, recipient of the 2016 ACM Prize in Computing,
works to harness the power of visual complexity.
DESPITE the fact that he does not see very well, Alexei Efros, recipient of the 2016 ACM Prize in Computing and a professor at the University of California at Berkeley, has spent most of his career trying to understand, model, and recreate the visual world. Drawing on the massive collection of images on the Internet, he has used machine learning algorithms to manipulate objects in photographs, translate black-and-white images into color, and identify architecturally revealing details about cities. Here, he talks about harnessing the power of visual complexity.

You were born in St. Petersburg (Russia), and were 14 when you came to the U.S. What drew you to computer science?
I was interested in computers from an early age. I remember reading a book about PDP-11 assembly language programming when I was 12 and dreaming about how one day, I might actually have a computer of my own to try this out in practice. Then, in high school, I did some research with a professor at the University of Utah. It sounds kind of brazen, but I went to the CS department and was like, “Bring me to your chairman.” Tom Henderson was the chair at that time and, you know, he actually saw me. I told him that I wanted to do computer science and asked him for a problem. And he basically said, “Ok, weird Russian kid. I have a robot running around; do you want to help with that?” It was wonderful.

You did your undergraduate work at the University of Utah, as well.
Interestingly enough, I was actually considering whether I should go into computer science (CS) or theater. In fact, I applied to Carnegie Mellon University because it’s one of the top departments in CS, but also one of the top universities for theater. Then I showed my father the tuition, and, well, we were immigrants. So I went to the University of Utah, where CS was much stronger than theater, and I think I got a very good education. But I’m still practicing my stagecraft twice a week in my classes.

I’ve seen your talks. You’re a very engaging speaker.
There is this whole dichotomy between the geeks and the artsy people: either you are good with numbers, or with arts and humanities. I think it’s misplaced. CS is hot right now. A lot of smart kids go into CS, and many look down at all of these humanities people with disdain. In my classes, I try to remind them that computer scientists are hot now, but physicists were hot in the Sixties, and chemists were hot in the Thirties, and they’re not super-hot now. Shakespeare is going to be around much longer than Python.

How did you get involved with computer vision, graphics, and machine learning?
Even in high school, my goal was to solve AI. But then I reasoned it out: AI is too hard, and you don’t know when you’re succeeding. With language, you kind of know when you’re succeeding, but that’s also very high-level. Meanwhile, almost all animals have vision. Vision seems like the most basic thing, so it’s got to be easy, right?

Of course.
Basically, I think I’ve just had one idea throughout my whole career, and I’ve been milking it since undergrad, and the idea is not even that profound. It’s that we fetishize intellectual contributions: algorithms, data structures, and so on. And we often forget that a lot of the complexity in the world is actually due to the data. My favorite example is in computer graphics. We know how light behaves, and we can simulate everything we want. But the reason current animated movies don’t look like the real thing is the data. There is a lot of entropy in the world and it’s just too hard to capture. The algorithms are fine. It’s the data that is miss- [CONTINUED ON P. 103]

PHOTO BY NOAH BERGER, COURTESY OF UC BERKELEY

CONFERENCE 27 – 30 November 2017
EXHIBITION 28 – 30 November 2017
BITEC, Bangkok, Thailand
THE CELEBRATION OF LIFE & TECHNOLOGY

The 10th ACM SIGGRAPH Conference and Exhibition
on Computer Graphics and Interactive Techniques in Asia
Register online by 15 October 2017 & enjoy early bird discounts of up to 20%
SA2017.SIGGRAPH.ORG/REGISTRATION
