
MARCH/APRIL 2007

VOLUME 5, NUMBER 2

BUILDING CONFIDENCE IN A NETWORKED WORLD

Features

Malware

15 Guest Editor's Introduction
IVÁN ARCE

17 Studying Bluetooth Malware Propagation: The BlueBag Project
LUCA CARETTONI, CLAUDIO MERLONI, AND STEFANO ZANERO
Bluetooth worms currently pose relatively little danger compared to Internet scanning worms. The BlueBag project shows targeted attacks through Bluetooth malware using proof-of-concept codes and mobile devices.

COVER ARTWORK BY GIACOMO MARCHESI, WWW.GIACOMOMARCHESI.COM

26 Alien vs. Quine
VANESSA GRATZER AND DAVID NACCACHE
Is it possible to prove a computer is malware-free without pulling out its hard disk? The authors introduce a hardware inspection technique based on the injection of carefully crafted code and the analysis of its output and execution time.

32 Toward Automated Dynamic Malware Analysis Using CWSandbox
CARSTEN WILLEMS, THORSTEN HOLZ, AND FELIX FREILING
The authors present CWSandbox, which executes malware samples in a simulated environment, monitors all system calls, and automatically generates a detailed report to simplify the malware analyst's task.

40 Using Entropy Analysis to Find Encrypted and Packed Malware
ROBERT LYDA AND JAMES HAMROCK
In statically analyzing large sample collections, packed and encrypted malware pose a significant challenge to automating the identification of malware. Entropy analysis enables analysts to quickly and efficiently identify packed and encrypted samples.

46 Code Normalization for Self-Mutating Malware
DANILO BRUSCHI, LORENZO MARTIGNONI, AND MATTIA MONGA
Next-generation malware will adopt self-mutation to circumvent current detection techniques. The authors' strategy reduces different instances of the same malware into a common form that can enable accurate detection.

Identity Management

55 Trust Negotiation in Identity Management
ABHILASHA BHARGAV-SPANTZEL, ANNA C. SQUICCIARINI, AND ELISA BERTINO
Most organizations require the verification of personal information before providing services; the privacy of such information is of growing concern. The authors show how federated IdM systems can better protect users' information when integrated with trust negotiation.
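The entropy heuristic summarized in the abstract for "Using Entropy Analysis to Find Encrypted and Packed Malware" (p. 40) can be illustrated with a short sketch. This is not the authors' implementation; the 7.0 bits-per-byte threshold is an assumption chosen for illustration (plain code and text score far lower, while compressed or encrypted data approaches the 8.0 maximum).

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag a sample as likely packed/encrypted if its entropy is high.

    The threshold is illustrative, not taken from the article.
    """
    return byte_entropy(data) >= threshold
```

Running a scanner like this across a sample collection quickly separates candidates for unpacking from plain binaries, which is the triage step the article's abstract describes.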

Postmaster: Send undelivered copies and address changes to IEEE Security & Privacy, Circulation Dept., PO Box 3014, Los Alamitos, CA 90720-1314. Periodicals postage rate paid at New York, NY, and at additional mailing offices. Canadian
GST #125634188. Canada Post Publications Mail Agreement Number 40013885. Return undeliverable Canadian addresses to PO Box 122, Niagara Falls, ON L2E 6S8. Printed in the USA.
Circulation: IEEE Security & Privacy (ISSN 1540-7993) is published bimonthly by the IEEE Computer Society. IEEE Headquarters, Three Park Ave., 17th Floor, New York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los
Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314, phone +1 714 821 8380; IEEE Computer Society Headquarters, 1730 Massachusetts Ave. NW, Washington, DC 20036-1903. Subscription rates: IEEE Computer Society members
get the lowest rates and choice of media option—$24/29/29 US print + online/sister society/individual nonmember. Go to www.computer.org/subscribe to order and for more information on other subscription prices. Nonmember rate:
available on request. Back issues: $25 for members and $98 for nonmembers.
Copyright and reprint permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of US copyright law for the private use of patrons: 1) those post-1977 articles that carry
a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; and 2) pre-1978 articles without fee. For other copying,
reprint, or republication permissions, write to the Copyright and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08855-1331. Copyright © 2007 The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Departments

From the Editors
4 Trusted Computing in Context
FRED B. SCHNEIDER

News
7 News Briefs
BRANDI ORTEGA

Interview
11 Silver Bullet Talks with Dorothy Denning
GARY MCGRAW

Education
64 Common Body of Knowledge for Information Security
MARIANTHI THEOHARIDOU AND DIMITRIS GRITZALIS

On the Horizon
68 Secure Communication without Encryption?
KEYE MARTIN

Privacy Interests
72 Setting Boundaries at Borders: Reconciling Laptop Searches and Privacy
E. MICHAEL POWER, JONATHAN GILHEN, AND ROLAND L. TROPE

Crypto Corner
76 When Cryptographers Turn Lead into Gold
PATRICK P. TSANG

Secure Systems
80 A Case (Study) for Usability in Secure Email Communication
APU KAPADIA

Digital Protection
85 South Korea's Way to the Future
MICHAEL LESK

Building Security In
88 A Metrics Framework to Drive Application Security Improvement
ELIZABETH A. NICHOLS AND GUNNAR PETERSON

Emerging Standards
92 Infrastructure Standards for Smart ID Card Deployment
RAMASWAMY CHANDRAMOULI AND PHILIP LEE

84 Ad Product Index

Printed on 100% recycled paper

For more information on these or any other computing topics, please visit the IEEE Computer Society’s Digital Library at http://computer.org/publications/dlib.
North Atlantic Treaty Organisation

Call for proposals on
Information and Communications Security

The North Atlantic Treaty Organisation's Science for Peace and Security programme invites scientists working in the field of Information and Communications Security to apply for financial support for their research efforts.

NATO wishes to foster research in all fields related to information and communications security. This includes, but is not limited to, the following topics:
◆ Security-related aspects of information systems and networks, such as
  – Information security: identification and authorization, cryptography, privacy and data protection, back-up and physical protection
  – Encouraging security awareness: security workshops, risk assessment and management, security policies and standards
  – Infrastructure security and reliability: physical and organizational protection and resources, security tools and network services, establishing the infrastructure for a Computer Emergency Response Team (CERT)
  – Cyber-crime and terrorism
◆ Computer networking
◆ E-learning
◆ Development of virtual communities, development of educational internet content and software

NATO Science for Peace and Security grants may take several forms:
◆ Networking infrastructure grants
◆ Advanced research workshops, advanced networking workshops, advanced study institutes
◆ Collaborative linkage grants
◆ Applied R&D projects
◆ Reintegration grants

Information on the Science for Peace and Security programme, including deadlines, conditions for eligibility, and application forms, is available online at http://www.nato.int/science/. Further enquiries should be directed to science@hq.nato.int.
From the Editors

Trusted Computing in Context

FRED B. SCHNEIDER
Associate Editor in Chief

Much has been said about what makes cyberspace insecure and who's at fault. Software producers are often singled out as a source of remedy. Actions these producers take in the name of improving the security of cyberspace, however, are sometimes viewed with suspicion.

The so-called trusted computing technology embodied in the Trusted Platform Module (TPM) secure-coprocessor from the Trusted Computing Group (www.trustedcomputinggroup.org) is a case in point. Here, a hardware-based root of trust makes it possible for the system designer—not the computer owner—to regulate which programs can run on a given computer. If system designers are evil, then they can use trusted computing to prevent a competitor's programs from being installed, thus creating or preserving a monopoly. When computer owners are incompetent, though, benevolent system designers can use trusted computing to prevent malware from being installed (perhaps unwittingly) and run. Unlike most defenses, which the system operator controls, trusted computing provides a way to save naïve or incompetent computer owners from themselves. Most computer owners aren't computer scientists, and thus need this help. (Many computer scientists need this help, too.)

Trusted computing introduces a tension between the rights of computer owners and the (presumed) responsibilities of system designers. Would that this tension could be avoided! Perhaps system designers could discharge the responsibility of securing cyberspace in some other way. Nobody has yet devised such a way, but one might exist; it would not only involve eliminating vulnerabilities but also preventing human users from being spoofed into unwittingly aiding attackers. Or perhaps system designers shouldn't feel any responsibility at all, but then I become skeptical that a secure cyberspace could be built from components that would be available.

The right of computer owners to control what their computers execute is seen as sacrosanct by critics of trusted computing. I don't think it's that simple, and I see analogies with other rights and responsibilities of individuals in a society. For example, we all benefit from the cleaner environment that comes from limiting how individuals use property they own. Impinging on the rights of individuals here produces benefits for all. And we all benefit from vaccinating everyone against a disease, even if it involves relinquishing some control over our bodies (and carries some risk of side effects), because the chances are reduced of the unvaccinated contracting the disease (herd immunity) and the costs are reduced for care and lost productivity when someone does. In short, there is a precedent and a tradition of relinquishing individual rights for the greater good.

In cyberspace, insecure machines can be attacked and co-opted to serve in armies of zombies, which then cause annoyance by sending spam or wreak havoc by participating in distributed denial-of-service attacks. All of us in cyberspace are put at risk when someone else's computer has been co-opted. The right of computer owners to control what their computers execute thus comes with a responsibility—the responsibility not to execute malware that might pollute or infect other parts of cyberspace. Trusted computing helps to discharge that responsibility by transferring decision making from a likely naïve computer owner to a presumably knowledgeable system designer.

Trusted computing might not embody the best trade-off, but it does represent a plausible option in a world where you can't depend on everyone who operates a computer to do the right thing. In particular, trusted computing makes it possible to educate and depend on relatively few software producers instead of the ever-growing number of computer owner-operators—an argument about leverage, not technology. Overall, there has been disappointingly little discussion in the computer security community about assignments of rights and responsibilities. Much could be gained from formulating and evaluating security solutions in such terms, making assignments of rights and responsibilities in a way that best leverages the diverse capabilities of participating communities.

Letters to the Editor

Dear Editors,
I'd like to thank Gary McGraw for the time and effort put into the Silver Bullet Security Podcast and articles. They're very enjoyable and bring a lot of broadening perspectives to any security practitioner. One person that I personally think is missing from that long line of interesting persons that you've interviewed is Ross Anderson (www.cl.cam.ac.uk/~rja14/) and his ideas on security engineering. All the work he and others have done lately in trying to marry together economic science and computer science/IT security is an interesting way to try to describe why we have this "sorry state of software." I hope that you have the chance to make him a guest on an upcoming show!
Once again, thank you very much!

Best regards,
Robert Malmgren
Information security professional

Gary McGraw responds: Great idea. Ross is an old friend and would be an excellent victim. At your suggestion, I arranged for an upcoming interview with him that will appear in mid-April. By now you have heard the Becky Bace interview posted in March, right?

4 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Call for Faculty
The Information Networking Institute (INI), Carnegie Mellon University
www.ini.cmu.edu • 4616 Henry Street, Pittsburgh, PA 15213, USA

The Information Networking Institute (INI) at Carnegie Mellon University is soliciting applications for one Systems Scientist faculty position in the areas of networking, computer systems, and security, to begin Fall 2007. The position is based in Pittsburgh, PA, but will also include travel associated with the INI's international graduate programs. Responsibilities include teaching core technical courses for the Master of Science in Information Technology - Information Security track (MSIT-IS) and Master of Science in Information Networking (MSIN) programs, advising graduate students, as well as participating in research projects related to the initiatives of Carnegie Mellon CyLab.

Candidates must demonstrate a strong commitment to teaching and a proven research track record in the areas of networking, computer systems, and/or security. Industrial working experience in information technology or management is particularly welcome. A Ph.D. in Computer Science, Electrical Engineering, or a closely related field is required.

Submit curriculum vitae, publication list, and other supporting documentation to:
Dena Haritos Tsamitis
Director, Information Networking Institute
email: denat@ece.cmu.edu

Silver Bullet Security Podcast series

Check out the Silver Bullet Security Podcast with host Gary McGraw, author of Software Security, Exploiting Software, and Building Secure Software! This free series features in-depth interviews with security gurus, including
• Avi Rubin of Johns Hopkins
• Marcus Ranum of Tenable Security
• Mike Howard of Microsoft, and
• Bruce Schneier of Counterpane Internet Security

Stream it online or download to your iPod...
www.computer.org/security/podcasts



EDITOR IN CHIEF
Carl E. Landwehr • University of Maryland • landwehr@isr.umd.edu

ASSOCIATE EDITORS IN CHIEF
Marc Donner • Google • donner@tinho.net
Fred B. Schneider • Cornell University • fbs@cs.cornell.edu

EDITORIAL BOARD
Martin Abadi, University of California, Santa Cruz
Elisa Bertino, Purdue University
Michael A. Caloyannides, Ideal Innovations
George Cybenko, Dartmouth College (EIC Emeritus)
Dorothy E. Denning, Naval Postgraduate School
Anup K. Ghosh, George Mason University
Dieter Gollmann, Technical University Hamburg-Harburg
Guofei Jiang, NEC Research Labs, Princeton
David Ladd, Microsoft Research
Tom Longstaff, Carnegie Mellon Univ., CERT/CC
Nasir Memon, Polytechnic University
Peter Neumann, SRI Int'l
Avi Rubin, Johns Hopkins University
Sal Stolfo, Columbia University
Giovanni Vigna, University of California, Santa Barbara

DEPARTMENT EDITORS
Attack Trends: Iván Arce, Core Security Technologies
Basic Training: Richard Ford, Florida Institute of Technology; and Michael Howard, Microsoft
Book Reviews: Charles P. Pfleeger, Pfleeger Consulting Group; Shari Lawrence Pfleeger, RAND; and Martin R. Stytz, Institute for Defense Analyses
Building Security In: John Steven, Cigital; and Gunnar Peterson, Arctec Group
Conference Reports: Carl E. Landwehr, University of Maryland
Crypto Corner: Peter Gutmann, University of Auckland; David Naccache, École normale supérieure; and Charles C. Palmer, IBM
Digital Protection: Michael Lesk, Rutgers University; Martin R. Stytz; and Roland L. Trope, Trope and Schramm
Education: Matt Bishop, University of California, Davis; and Deborah A. Frincke, Pacific Northwest National Laboratory
Emerging Standards: Ramaswamy Chandramouli, NIST; Rick Kuhn, NIST; and Susan Landau, Sun Microsystems Labs
Interview: Gary McGraw, Cigital
On the Horizon: O. Sami Saydjari, Cyber Defense Agency
Privacy Interests: E. Michael Power, Gowling Lafleur Henderson; and Roland L. Trope
Secure Systems: Sean W. Smith, Dartmouth College

COLUMNISTS
Clear Text: Bruce Schneier, Counterpane Internet Security; Steve Bellovin, Columbia University; and Daniel E. Geer Jr., Verdasys

STAFF
Lead Editor: Kathy Clark-Fisher, kclark-fisher@computer.org
Group Managing Editor: Steve Woods
Staff Editors: Rebecca L. Deuel, Jenny Stout, and Brandi Ortega
Production Editor: Monette Velasco
Magazine Assistant: Hazel Kosky, security@computer.org
Contributing Editors: Keri Schreiner and Joan Taylor
Original Illustrations: Robert Stack
Graphic Design: Alex Torres

Publisher: Angela Burgess, aburgess@computer.org
Associate Publisher: Dick Price
Membership & Circulation Marketing Manager: Georgann Carter
Business Development Manager: Sandra Brown
Assistant Advertising Coordinator: Marian Anderson, manderson@computer.org

CS MAGAZINE OPERATIONS COMMITTEE
Robert E. Filman (chair), David Albonesi, Jean Bacon, Arnold (Jay) Bragg, Carl Chang, Kwang-Ting (Tim) Cheng, Norman Chonacky, Fred Douglis, Hakan Erdogmus, David A. Grier, James Hendler, Carl Landwehr, Sethuraman (Panch) Panchanathan, Maureen Stone, Roy Want

CS PUBLICATIONS BOARD
Jon Rokne (chair), Mike Blaha, Angela Burgess, Doris Carver, Mark Christensen, David Ebert, Frank Ferrante, Phil Laplante, Dick Price, Don Shafer, Linda Shafer, Steve Tanimoto, Wenping Wang

Submissions: We welcome submissions about security and privacy topics. For detailed instructions, see the author guidelines at www.computer.org/security/author.htm or log onto S&P's author center at Manuscript Central (www.computer.org/mc/security/author.htm).

Editorial Office:
IEEE Security & Privacy
c/o IEEE Computer Society Publications Office
10662 Los Vaqueros Circle
Los Alamitos, CA 90720 USA
Phone: +1 714-821-8380
Fax: +1 714-821-4010
www.computer.org/security/

Editorial: Unless otherwise stated, bylined articles as well as products and services reflect the author's or firm's opinion; inclusion does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE.
Editor: David Ladd, daveladd@microsoft.com

NewsBriefs
Security

■ According to Karl Lynn of Juniper Networks, older versions of Citrix's Presentation Server Client contain a security flaw that could compromise machines. The flaw is a result of an error in Citrix's proprietary independent computing architecture (ICA) protocol and the way it supports connections via proxy servers, possibly letting attackers execute arbitrary code when users visit malicious Web sites. The flaw affects Presentation Server Client versions older than 10.0. Currently, no patch is available; as a fix, Citrix recommends users upgrade to version 10.0.

■ At the recent Black Hat DC conference in February, David Litchfield revealed a technique dubbed cursor injection to exploit PL/SQL injection vulnerabilities in Oracle database servers. Previous attacks using PL/SQL flaws required high-level database privileges, but cursor injection lets anyone who can connect to a database exploit the flaws. In response, Oracle urged customers to apply patches.

■ IOActive, a security firm based in Seattle, Washington, cancelled its scheduled demonstration of the flaws in RFID-enabled access badges at the Black Hat DC conference. The company decided not to go ahead with its presentation after receiving legal threats from HID Global, a major manufacturer of RFID access control systems. IOActive's chief executive, Joshua Pennell, said, "We can't go forward with the threat hanging over our small company." In a statement, HID Global said it didn't threaten IOActive, but "simply informed IOActive and its management of the patents that currently protect HID Global intellectual property."

■ In February 2007, Microsoft warned of an Excel 0-day attack that affects Office 2000, XP, 2003, and Office 2004 for Mac. The attack exploits a vulnerability that lets attackers remotely take over users' systems after they've opened a malicious Excel attachment or visited a Web site that houses the malicious files. No patch is yet available, but Microsoft advises users not to open MS Office files from unknown sources.

■ According to researchers at Harvard and MIT, site-authentication images used by financial institutions such as Bank of America, ING Direct, and Vanguard provide little additional security. Customers preselect an image that will appear to them when they access their accounts online; if they don't see the image, they could be at a phishing site and shouldn't enter a password. In a controlled computing environment, the researchers removed the images and tested 67 Bank of America customers by asking them to log into their online accounts. Of the participants, 58 entered their passwords; only two chose not to because of security concerns. Those who entered their passwords said they didn't notice their images weren't present.

■ To combat phishing, Microsoft added support for Extended Validation Secure Sockets Layer (EV SSL) certificates to Internet Explorer 7.0 and urges other browser makers and Web sites to follow. EV SSL-certified Web sites feature an address bar that turns green, displays the country the Web site is based in, and shows who certified it. EV SSL certification guidelines also require third-party authentication companies, such as VeriSign and Entrust, to verify that certified sites have registered with local authorities, have a legitimate address, and actually control the site. VeriSign says 300 businesses are in the process of certification, and that it has issued 20 EV SSL certificates so far.

■ In January 2007, Exploit Prevention Labs, an Atlanta-based security company, reported that the Q406 roll-up attack kit was behind 70 percent of the Web-based attacks in December 2006. Exploit's chief technology officer, Roger Thompson, says it's hard to pinpoint the kit's exact number of exploits because it's heavily encrypted. "The dominance of this package reinforces the fact that the development and release of exploits frequently parallels legitimate software businesses," Thompson says.

■ Recently, Symantec released new security software to help combat 0-day attacks. The new tool—Symantec Online Network for Advanced Response (SONAR)—is a free add-on to Norton Antivirus 2007 and Internet Security 2007 products. SONAR differs from Symantec's signature-based antivirus tools in that it's behavior-based: it analyzes program behavior to determine whether malicious activity is occurring, thus identifying suspicious behavior before security researchers can develop signatures.

■ To help protect against phishing scams, eBay now offers password-generating devices to its PayPal users. The device, dubbed the PayPal Security Key, generates random six-digit security codes every 30 seconds and costs personal PayPal account users a one-time fee of US$5, but is free for business accounts. PayPal users enter the unique six-digit
code when they log in to their accounts with their regular user names and passwords. The code then expires. The service is available to PayPal users in the US, Germany, and Australia, but the company will eventually extend the service to other countries as well.

■ In response to recent attacks on the SHA-1 hash function, the US National Institute of Standards and Technology (NIST) is holding a public competition to develop a more secure hash algorithm. NIST has published a draft on submission requirements and evaluation criteria and is currently accepting public comments on the draft. The submission deadline for new hash functions is tentatively scheduled for the third quarter of 2008. More information is available at www.csrc.nist.gov/pki/HashWorkshop/index.html.

■ To fight against terrorism, Pakistan installed a biometrics system at the main border crossing between its southwestern Baluchistan province and southern Afghanistan in January 2007. The system records fingerprint, retinal, or facial patterns and matches them to biometrically enabled Pakistani passports or identity cards.

■ Cambridge University researchers revealed a proof-of-concept hack to the UK's Chip-and-PIN system's hardware that could let attackers steal personal data. The researchers replaced a terminal's internal hardware with their own and got it to play Tetris. The demonstration showed that attackers could make all of a terminal's components interact with one another, leading to the capture of data such as PINs.

■ A phishing toolkit available on underground forums is threatening to bring cybercrime to the masses with an easy-to-use interface that requires minimal, if any, programming skill. Using the toolkit, which sells for US$1,000, scammers only need to enter a few variables, such as the Web site to be spoofed and the host site for the phony page, and the tool does the rest: it uses PHP to produce a dynamic Web page that pulls in the actual Web site being phished and displays it to unsuspecting users. Users logging into the legitimate site never know that scammers are intercepting their data.

■ The US Department of Homeland Security is planning Cyber Storm 2, a weeklong exercise slated for March 2008 to test the nation's response to a full-scale cyberattack. Cyber Storm 1 occurred in March 2006, with 115 private and international companies and organizations participating, and included a physical and Internet-based attack on private and public-sector companies.

■ Satellite navigation company TomTom reported that its TomTom GO 910 units manufactured between September and November 2006 might be infected with viruses. The personal car navigation devices include a 20-Gbyte hard drive and preloaded maps of the US, Canada, and Europe. The company recommends that users run antivirus programs and remove the infected files.

Privacy

■ In February 2007, the US Veterans Administration reported an external hard drive missing from an employee's computer that contained information on almost all US physicians who have billed Medicaid and Medicare, along with medical data for roughly 535,000 VA patients.

■ US Congressman Lamar Smith (R-Tex.) has introduced the Stopping Adults from Exploiting Today's Youth (SAFETY) Act of 2006, which would let the US Attorney General draft far-reaching data retention laws for ISPs. Privacy advocates cite the act's vagueness as a major concern. "This bill is so incredibly bad…there's nothing in this legislation to prevent the attorney general from simply saying, 'Save everything forever,'" said Lauren Weinstein from the People for Internet Responsibility, an advocacy group. Smith counters that the act's focus is on catching sexual predators and that a subpoena would be required to access the information.

■ Smart Card Alliance, which includes charter members IBM, First Data, Visa, and Northrop Grumman, released guidelines in February 2007 for best practices in security and privacy for companies using RFID technology in identity-management systems. The guidelines range from implementing security techniques such as mutual authentication to privacy practices such as allowing users to correct information and instituting a dispute-resolution process.

■ IBM donated its Identity Mixer—software that provides encrypted credentials for online transactions—to the Higgins project, an open source project that gives users more control over their personal data by making multiple authentication systems work together. Identity Mixer lets a trusted authority, such as a bank or government agency, issue an encrypted credential that users would give instead of personal or credit information while online. Buyers, for example, would give the encrypted credential to online stores, which would pass it to the credit-card issuer, who decrypts it, verifies it, and pays the retailer. A first version of the Higgins project, with the Identity Mixer software, is slated for release in mid-2007.

■ A judge with the US Foreign Intelligence Surveillance Act (FISA) court authorized US President George W. Bush's controversial wiretap program, giving the program court oversight after five years, a move that critics say makes it unconstitutional. The program—called the Terrorist Surveillance Program—lets the government wiretap phone and Internet communications—without warrants—into and


out of the country when the caller or receiver has a suspected link to Al Qaeda. The program will continue with court oversight and move out from under the US National Security Agency's purview.

■ In January 2007, the parent company of retailers T.J. Maxx, Marshalls, HomeGoods, and A.J. Wright stores announced that the computer network it uses to handle credit- and debit-card transactions was breached in mid-December. The breach affected stores throughout the US and Puerto Rico, as well as Winners and HomeSense stores in Canada. According to the New Hampshire Bankers Association (NHBA), roughly 20 to 30 percent of all New Englanders might have been affected by the breach.

■ Senetas, an Australian cryptography company, and id Quantique SA, a quantum cryptography company based in Geneva, have created a 1- to 10-Gbit network that combines quantum key distribution with traditional encryption techniques. Quantum cryptography uses photon polarization to represent 1s and 0s instead of encryption keys to scramble data, producing uncrackable codes. The companies plan to offer the first networks in mid-2007.

■ The UK's Information Commissioner's Office (ICO) warned that the government's new recommendation to relax data-sharing laws could lead to governmental snooping. The recommendation came after Prime Minister Tony Blair held a seminar to review the UK's current data-sharing law—the Data Protection Act—and found that "overzealous data-sharing rules may be an obstacle to improving public services." In a statement, the ICO said the government must have security and privacy safeguards in place and take a measured approach so as to avoid government abuses and erosion of public trust. "…a cautious approach to information sharing is needed in order to avoid the dangers of excessive surveillance and the loss of public trust and confidence," the statement said. The recommendation has been put to a public debate; results will be reported back to the Cabinet in March 2007 for further review.

■ Microsoft confirmed that it sought assistance—and received it—from the US National Security Agency (NSA) in developing Vista's security configuration. The move was to ensure that Vista met the US Department of Defense's standards, according to NSA spokesman Ken White. However, Marc Rotenberg, director of the Electronic Privacy Information Center (EPIC), says, "There could be some good reason for concern. Some bells are going to go off when the government's spy agency is working with the private sector's top developer of operating systems." White says the NSA's role was limited to configuration aspects, not code development. "This is not the development of code here," White says. "This is the assisting in the development of a security configuration."

■ Later this year, MySpace.com will start offering Zephyr, parental notification software that lets parents know the name, age, and location their children use while on the social network. The software, however, doesn't let parents read their children's email or see their profiles. Privacy concerns, including whether the software could be used to monitor other users, prompted Facebook and blogging site Xanga to decline use of the software.

■ In February 2007, Wellpoint, the largest US health insurer and parent company of Anthem Blue Cross and Blue Shield, reported the theft of backup tapes that contained 196,000 customers' personal information. The tapes were stolen from a company that audits the insurer's claims. The company sent letters to those affected, all of whom live in Kentucky, Indiana, Ohio, and Virginia.

■ German police in the state of Sachsen-Anhalt worked with credit-card companies to review more than 22 million customers' transactions in an effort to nab child pornographers. The operation, called Mikado, netted 322 people suspected of buying Internet child pornography. Under German law, the police can require financial institutions to provide customers' transaction data if the police provide very explicit search criteria. In this instance, the police narrowed their requests down to a specific amount of money, time period, and receiver account.

■ Vermont's Agency of Human Services (AHS) reported a computer breach affecting roughly 70,000 state residents that might have exposed personal information, including social security numbers. Heidi Tringe, the state's communications director, said the breach appeared to be the result of a botnet attack. The state sent letters to those affected by the breach, warning them of the compromise.

■ Swedish police believe that a Russian organized crime gang used a variant of the Haxdoor Trojan to bilk US$1.1 million from a Swedish online banking site. The criminal gang targeted Nordea customers with phishing emails that urged them to download a "spam fighting" application that was in fact the Haxdoor Trojan. The Trojan payload activated when users tried to log into the bank's online site and were then redirected to a phony home page, where keyloggers installed by the Trojan recorded account information. The gang then used the information to log into the real banking site and drain customer accounts. Nordea has refunded the affected customers' money.

Policy
development, and especially not to system back doors, ■ In February 2007, the European Union (EU) officially
which the NSA has shown interest in. “This is not the devel- launched the Consumer Protection Cooperation (CPC)

www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 9


NewsBriefs

Network, a consumer-protection network that’s designed to
aid law enforcement in tracking down perpetrators of
cross-border fraudulent activity, including spam and phishing
scams. The CPC Network was instituted under the Consumer
Protection Cooperation Regulations, which EU countries
passed in 2004. The CPC Regulations set the minimum
compliance standards for enforcement authorities in the
network and include enforcement regulations such as the
ability to conduct on-site inspections and order companies to
stop illegal practices.

■ US President George W. Bush signed the Telephone Records
and Privacy Protection Act of 2006, making telephone
pretexting—impersonating someone else for the purpose of
buying, selling, or obtaining personal phone records—a federal
crime punishable by up to 10 years’ imprisonment. Of course,
law enforcement and intelligence agencies are exempt.

■ The Massachusetts legislature is considering two bills aimed
at curbing retailers’ poor data security practices. Currently,
banks that issue credit or debit cards to consumers who’ve
been victimized by data breaches absorb the costs to stop the
fraudulent activity, with the retailers only on the hook for free
credit-monitoring services. However, the first bill—HB
213—will make retail companies liable for the costs incurred
as a result of a data breach. Companies involved in a breach
would be required to notify customers and reimburse the
card-issuing banks for subsequent fraudulent activity,
including the costs to cancel or reissue cards as a result of
unauthorized transactions. Also up for consideration is HB
328, which would require companies to provide credit freezes
to those consumers affected by their data breach. Both bills
aim to encourage retailers to improve data security.

■ Karen Evans, the US Office of Management and Budget’s
administrator for e-government and IT, said in a recent
conference call that the federal agencies that don’t protect
personal information might get a smaller portion of President
Bush’s IT budget. “This year we’re really focused on making
sure agencies are delivering results, investing the taxpayers’
dollars wisely, and are really executing now on the activities
they said they are going to do,” said Evans. President Bush
recommended an overall increase of 2.6 percent for this
project for the 2008 fiscal year. The US Department of Defense
(DoD) is slated for the lion’s share of the budget, with $31.4
billion; the agency with the second highest budget is the
Department of Health and Human Services, at $5.6 billion. The
allocations “represent the President’s priorities going forward
to combating the war on terror,” Evans said.

■ A new bill sponsored by US Senators Patrick Leahy (D-Vt.),
Russ Feingold (D-Wis.), and John Sununu (R-N.H.) would
require US government agencies to report to Congress on
their development and use of data mining programs. Senator
Leahy said the bill—the Federal Agency Data Mining Reporting
Act—would provide an “oversight mechanism” and safeguard
privacy. Testifying before Congress, Leahy said that
government agencies are operating, or planning to operate,
roughly 199 data mining programs, including the controversial
Automated Targeting System, which assigns “terror scores” to
US airline passengers, and the Secure Flight program, which
analyzes airline-passenger data. “The American people have
neither the assurance that these massive data banks will make
us safer nor the confidence that their privacy rights will be
protected,” Leahy testified.

■ Despite the US Government Accountability Office’s (GAO’s)
recommendation to test the program’s security and
technology, the Transportation Security Administration (TSA)
is going ahead with its rollout of smart-card IDs for all of the
more than 750,000 port workers across the country. Starting
in March, the Department of Homeland Security (DHS) will
issue the IDs, which will contain port workers’ photographs
and fingerprints, after conducting criminal background checks
on all workers. In a report published in October 2006, GAO
auditors expressed concern over the TSA’s limited testing
scope and that it failed to gather data on the “operational
effectiveness” of the smart-card readers in maritime
conditions, given that the nation’s 4,000 ports tend to be
near water.

■ In January 2007, MI5, Britain’s domestic spy agency, began
a new email service that alerts the public about security threat
levels. To receive the email alerts, users must sign up and
register on the MI5 Web site. The move is part of the agency’s
efforts to emerge from its decades-long policy of secrecy. “It’s
part of the service’s ongoing effort to improve its public
communications and contribute to the government’s policy of
keeping the public informed about the national threat level,”
says a spokesperson for the UK’s Home Office.

■ In February 2007, US Senators Patrick Leahy (D-Vt.) and
Arlen Specter (R-Pa.) revived a similar version of their 2005
Personal Data Privacy Act. This new bill would impose fines
and prison time for those who intentionally conceal
information on data breaches that cause “economic damage to
one or more persons.” Additionally, the bill would require data
brokers to let consumers view and correct information about
themselves for a “reasonable fee.”

■ The UK plans to close 551 of its 951 government Web sites
and fold the services they offered into its DirectGov or
BusinessLink Web sites. Of the remaining 400 sites, 26 will
stay; the fate of the remaining 374 will be decided by June
2007. The goal is to expand information sharing between
departments and consolidate services.

10 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007


Editor: Gary McGraw, gem@cigital.com

Interview

Silver Bullet Talks
with Dorothy Denning

GARY McGRAW
Cigital

Dorothy Denning is a professor in the Department of Defense
Analysis at the Naval Postgraduate School (NPS) in Monterey,
California. Denning has also worked at the Stanford Research
Institute and Digital Equipment Corporation.
Featured here is an excerpt adapted from the full interview
between Denning and Silver Bullet host Gary McGraw. Their
conversation ranged widely, from teaching computer security
to the Big Sur Power Walk. You can listen to the podcast in its
entirety at www.computer.org/security/podcasts/ or
www.cigital.com/silverbullet, or you can subscribe to the
series on iTunes.

Gary McGraw: You’ve been in academia for much of your
career, teaching at Purdue and Georgetown and now NPS.
What’s the best way to teach computer security?

Dorothy Denning: I don’t know what the best way is. I
honestly don’t. I can only tell you how I do it, which is to look
at both the attack side and the defense side and try to make
some sense out of the field and why we need certain kinds of
defenses.

McGraw: Do you think that teaching particular courses on
security is the best way, or is it better to have a little bit of
security in all courses?

Denning: I think you need to have courses that are dedicated
to security, particularly topics such as cryptography, which
would be hard to integrate into another class. The field is just
way too big to squeeze a little bit here and there into other
courses. If you’ve got a course on computer networks, to do
justice to the security part really requires another course.
On the other hand, you do want to cover some security in
courses, particularly courses on software development.
Students have to understand why it’s important to check your
input parameters and do various other things so that the
software doesn’t end up being shipped with vulnerabilities.

McGraw: Greg Hoglund and I cited your book, Information
Warfare and Security [Addison-Wesley, 1998], on page five of
our book, Exploiting Software [Addison-Wesley, 2004]. We did
that because we wanted people to understand that the things
we were talking about in that book could in fact be applied
during wartime. What role does describing and understanding
real attacks play in computer security?

Denning: You need to understand how attacks work because
you need to understand how IP spoofing works, what happens
during denial-of-service attacks, how packets get past
firewalls, and so on. How can you build a firewall if you don’t
understand what the threats are against that firewall?

McGraw: Some people claim that we should only let
specialists have that knowledge because it’s too dangerous
and that it shouldn’t be written, published, or talked about.
What’s your position?

Denning: Again, I don’t think you can do good defense
without understanding offense. I don’t see how you can teach
defense without teaching offense. If you’re talking about how
you want to do authentication, you’ve got to understand what
the threats to password files are and how they’re cracked and
sniffed off of networks.

McGraw: I think sometimes people believe, for whatever
reason, that if you just talk about building defect-free software
or how cryptography works or security functionality that you
can ignore the attack part because it would become irrelevant.
I don’t think that’s really true.

Denning: Right, because the whole field is evolving. There’s
constantly new attack methods and they’re outside of the
models that we design our security around. You’re constantly
having to invent new defenses to go with the new attacks.
Then new software is rolling out continuously, which people
find vulnerabilities in, so you’re getting

PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY 11
About Dorothy Denning

Dorothy E. Denning is a professor in the Naval Postgraduate
School’s Department of Defense Analysis. She previously
taught at Georgetown University, where she was the Callahan
Family Professor of Computer Science and Director of the
Georgetown Institute of Information Assurance, and Purdue
University.
She has published 120 articles and four books, including
Information Warfare and Security (Addison-Wesley
Professional, 1998). She is an ACM Fellow and recipient of
several awards, including the Augusta Ada Lovelace Award
and the National Computer Systems Security Award. In
November 2001, Time magazine named her to its innovators
list. Her past leadership positions include president of the
International Association for Cryptologic Research and chair of
the National Research Council Forum on Rights and
Responsibilities of Participants in Network Communities.
Denning has a PhD in computer science from Purdue
University. Her research interests include terrorism and crime,
conflict and cyberspace, information warfare and security, and
cryptography.

more new attacks and you need more defenses. The two are
coupled. It’s like the front and back of your hand. You can’t
talk about one without talking about the other.

McGraw: Possibly one of the biggest controversies you’ve
been involved in professionally was the whole Clipper chip
dustup [the Clinton administration’s 1993 encryption
proposal, which used a US National Security Agency-created
computer chip that provided a government backdoor to
encrypted files using escrowed keys, for which Denning was
an advocate]. What was it like being dubbed the “Clipper
chick”?

Denning: Actually, it was a friend of mine who gave me that
name.

McGraw: Well, it certainly got picked up and flung around.
What was it like being in the middle of that controversy?

Denning: It was really rough. I felt like it damaged a lot of my
relationships with people in the field.

McGraw: Did you think that to some extent some of the
arguments were caricaturing things or making them
ridiculously simple to make a political point?

Denning: Yes, there was a lot of that going on. The hardest
part for me was the ad hominem attacks. From my
perspective, I just wanted people to have a rational debate on
the topic but that didn’t happen. I was rarely in a setting where
it looked like that. It was such an emotionally charged issue.

McGraw: Switching gears, you’re widely regarded as the
inventor of geoencryption. Do you think that this spatial
concept will escape the defense community and move into
areas such as geographic marketing?

Denning: Geoencryption, first of all, really wasn’t my idea.
Even location-based security wasn’t my idea. I originally got
involved because Pete MacDoran had a company and he was
doing location-based authentication. He later got involved
with folks who were interested in location-based encryption
and already had a concept. So I tried to provide greater
security and methods for doing it, but I don’t really deserve
credit for the idea.

McGraw: Do you think geolocation is going to catch on with
the general public with GPS devices such as those in cell
phones and cars?

Denning: Yes. Location is certainly taking off now as an
important concept in computing and networking. To the
extent that information will be encrypted based on location, or
that people will be authenticated based on their geographical
location—I don’t know to what extent those might become
more prevalent.

McGraw: One of the trade-offs—and there are always
trade-offs involved in security—is that it could be dangerous
to broadcast your location.

Denning: You wouldn’t have to broadcast your location. That
communication could go encrypted.

McGraw: And then come to you and only be decrypted if you
happen to be in the right place?

Denning: Right.

McGraw: I suppose a lot of people aren’t aware of the fact
that many of the gizmos that they carry around have this
geolocation capability built into them. Do you think that we
should make a point of making people more aware of that, or
is it just something that happens and you just live with it?

Denning: It will probably happen at the rate that’s needed to
understand what’s happening with the technology. I think a lot
more people are aware of it—many people know that if they’re
in an emergency situation, somebody can find out where they
are through their cell phones.

McGraw: I was talking to some cell phone vendors and they
don’t want to advertise the fact that you can be geolocated.
Not because people might be worried about privacy, but
because they’re worried about liability in case it doesn’t work.
Imagine that someone kidnaps your kids and you’re trying to
use a cell phone to geolocate them and it doesn’t work. Then
whose fault is it if it doesn’t work after the vendors make claims



that it does? Strangely, that’s one of the things that’s holding
that kind of technology back right now.

Denning: That could be.

McGraw: You and I have very similar opinions about
cybercriminals. We both think that cybercriminals are bad and
that we shouldn’t spend time hyping these guys up into rock
stars. But your view on that seems to have evolved pretty
radically over the years. What changed your mind? Did it
happen all at once or gradually?

Denning: I don’t think my views have evolved all that much.

McGraw: Okay. So what is your view now?

Denning: There’s a lot of bad guys out there. A lot of them
are doing it for money. It’s just plain old crime. In my early
work, I honestly didn’t pay too much attention to who the bad
guys were and the methods they used. I was looking at
security from a totally theoretical perspective. It was in the
late ’80s, around 1990, that I did a study where I interviewed
some hackers. The hackers I interviewed, about a dozen of
them, were all pretty decent folks, I thought. I wrote an article
about them and I probably came across as sounding like they
were fairly decent folks, but without trying to necessarily
endorse their behavior of breaking into systems. I never
endorsed it.
But then Don Parker and others got to me and said, “You
really should go and talk to the security administrators and the
law enforcement folks and get their perspectives on that.” I
did and that reminded me that there are a lot of folks out there
that have objectives that aren’t benign. Today, I think the
major threat is coming from people who are interested in
making money or causing damage or leaking intelligence or all
kinds of things that you really don’t want to happen.

McGraw: I was at the National Academy of Science recently
and someone was talking about the way Amazon.com’s
systems have evolved—instead of being engineered in a
top-down fashion, they’ve emerged as this chaotic soup. They
were one of the first to adopt this new service-oriented
architecture idea. It’s interesting when you have a system
that’s in some sense an organic thing—defending it can be a
lot more difficult than if you’d engineered a system in a
top-down way.

Denning: Well, that’s what we’ve got today. The whole
Internet and computer networks and everything have just
emerged over time. So that’s probably the reason why it’s very
hard. People have attempted this top-down design of secure
operating systems since the very early days, probably the ’60s.
That’s hard to do; when you finally get your product put
together and certified and all that, it’s going to be very
expensive. It’s going to be obsolete. It’s going to be slow.

McGraw: People used to the edge of technology will say, “My
goodness, that seems like an Apple II from 1981.”

Denning: Right. In the meantime, the rest of the technology
has marched on and you want it because it’s a productivity
enhancer. It allows you to do things that you couldn’t do
before. You can communicate in ways you couldn’t before.
This top-down approach to building systems and security is
great and maybe works well in some small, rather confined
kinds of environments, but for the world at large and the
Internet it’s never going to work.

McGraw: I guess we’re sort of doomed—maybe not doomed,
but at least relegated—to co-evolution in terms of security,
where we’re caught in this constant arms race, this
attack–defense thing. Which, I suppose, is why you believe
that we have to understand attacks as much as we understand
defense.

Denning: It’s the same in the physical world. In the physical
world, things evolve. You get new technologies. Automobiles
came along and then airplanes; all this comes along and it
introduces new security issues. They don’t all get solved. So
the world is a vulnerable place and we just kind of accept that
and we try to achieve a reasonable level of security and
stability and so on, but it’s not perfect.

McGraw: We seem to be bubbling along pretty well.

Denning: Yes, but the difference is that there seems to be this
expectation with our computer networks that we could do it
all right and that there are no vulnerabilities. To me, that’s just
crazy; it’s not realistic and we have to accept that there’s
always going to be security issues. It’s not just Microsoft’s
problem. It’s not their fault.

McGraw: Oh, they’re going to solve it with Vista, haven’t you
heard?

Denning: They’re adapting. I’ve been very impressed with
what Microsoft has done over the years.

McGraw: Absolutely. Mike Howard’s work has been good. I
don’t know if you know Mike or not, but he was a guest back
in episode six. I want to switch gears pretty radically. I noticed
your time



on the Big Sur Power Walk, which is a 21-mile walk, is
trending down over the last two years.

Denning: No! It’s been about the same!

McGraw: I wonder whether 2007 is going to be a
breakthrough year. Are you going to break your 2005 record?

Denning: Our goal is just to enjoy it out there; we have six
and a half hours. If it takes six and a half hours, it’s no
problem. It’s a nice walk. It’s beautiful. There’s no reason to
rush it.

McGraw: I’m jealous of that. Can I come?

Denning: It’s already sold out. You can run the marathon
though.

McGraw: I think you’re talking to the wrong guy. One last
question: What kind of advice would you give to a young
scientist who’s just starting out in security?

Denning: My advice would be, “Follow your interest, but
follow the law.” I’m very much against experiments that break
the law.

You can find additional podcasts in the series, such as those
featuring Becky Bace or Microsoft’s Michael Howard, at
www.computer.org/security/podcasts/ or
www.cigital.com/silverbullet/.

Gary McGraw is chief technology officer of Cigital. His
real-world experience is grounded in years of consulting with
major corporations and software producers. McGraw is the
author of Software Security: Building Security In
(Addison-Wesley, 2006), Exploiting Software
(Addison-Wesley, 2004), Building Secure Software
(Addison-Wesley, 2001), and five other books. McGraw has a
BA in philosophy from the University of Virginia and a dual
PhD in computer science and cognitive science from Indiana
University. He is a member of the IEEE Computer Society
Board of Governors. Contact him at gem@cigital.com.



Guest Editor’s Introduction

A Surprise Party
(on Your Computer)?

IVÁN ARCE
Core Security Technologies

The idea of malicious software as a potential threat could
hardly seem novel to even the most uninformed at the dawn of
our era of pervasive technology and global connectivity. The
early 1970s ushered in the evolution of malicious software
technologies, and their developers and users have since driven
a substantial portion of the research and development agenda
of the information security discipline.
Although the rules of the information security “game” seem
to constantly change, the tools players use have been the
same for more than 30 years: software artifacts and a handful
of security practices and policies built on a few fundamental
principles borrowed from mathematics, engineering, and
economics. The offensive is based on a plethora of software
tools that are loosely grouped under the all-encompassing
term of malware, one of several neologisms that the
information security community can claim as its own. Viruses,
worms, Trojan horses, key loggers, dialers, stealth password
sniffers, Web traffic generators, advertisement popup
programs, exploits, rootkits, botnets, and zombie agents all
live under the malware umbrella definition.
This special issue of IEEE Security & Privacy focuses on
malware’s various forms and the threats they pose to modern
networks. In putting together the issue, I sought out
contributions that discuss classification, detection,
containment, and removal of malicious software, as well as
advances in defensive artifacts to preempt the associated
threats of their offensive counterparts. The search was slightly
biased toward the applied research, practical implementations,
and field experiments that would give our readers insight into
the tactics currently in play in the malware game.
As this issue’s guest editor, I enjoyed reading all the
contributions—they proved insightful, diverse, and
imaginative while maintaining a practical focus. Deciding
which articles to include wasn’t a simple task, given the broad
range of possible topics. The IEEE S&P staff pushed the
magazine’s page length boundaries to the limit to make room
for five articles, and I’m thankful that they did: the resulting
combination is, I hope, well balanced and worthy of our
readership’s various interests.
We start our journey with a practical study of the plausibility
of malware proliferation over wireless networks. In “Studying
Bluetooth Malware Propagation: The BlueBag Project,” Luca
Carettoni, Claudio Merloni, and Stefano Zanero give a detailed
account of a combined hardware and software artifact
designed to detect and assess the Bluetooth capabilities of
mobile devices in live and highly populated scenarios.
Vanessa Gratzer and David Naccache take us to the
microcosm of microcontroller firmware and embedded
operating systems in “Alien vs. Quine.” A clever combination
of side-channel attacks and self-mutating code—technical
tricks commonly associated with offensive security
patterns—exemplifies how malware-detection techniques can
benefit from software or hardware features that are often
perceived as design weaknesses.
Carsten Willems, Thorsten Holz, and Felix Freiling address
malware identification and classification from a
behavioral-analysis perspective in “Toward Automated
Dynamic Malware Analysis Using CWSandbox.” The authors
describe the implementation and use of a software tool that
aims to automatically identify malicious software binaries
captured “in the wild” using sandboxing technology.

Static binary analysis also plays a role in attempts to detect
the obfuscation techniques that malware uses to hide its
nature. In “Using Entropy Analysis to Find Encrypted and
Packed Malware,” Robert Lyda and James Hamrock rely on
information theory basics to detect and classify malware and
then put their idea through a test using a mixed collection of
malicious and innocuous software samples gathered over
five years.
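The information-theory basics behind that approach can be illustrated with a short sketch. Shannon entropy computed over a file's byte histogram tends toward the 8-bits-per-byte maximum for compressed or encrypted data, while plain text and ordinary machine code score noticeably lower. The following Python fragment is only an illustration of the statistic, not the authors' tool; the function name `byte_entropy` is our own:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Probability of each distinct byte value, then -sum(p * log2(p)).
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

# Repetitive data scores low; a uniform byte distribution hits the maximum.
print(byte_entropy(b"AAAABBBB" * 64))       # two equiprobable symbols: 1.0
print(byte_entropy(bytes(range(256)) * 4))  # uniform over 256 values: 8.0
```

In practice such a score would be computed over windows or sections of an executable rather than the whole file, so that a packed or encrypted region stands out against its surroundings.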
Finally, we come to the computer science field, which
provides assistance in a different static analysis approach.
Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga use
code normalization, control-flow graph extraction, and
graph-isomorphism analysis to detect and classify malware
variants derived from common self-mutating roots in “Code
Normalization for Self-Mutating Malware.”

These five articles combine to provide a broad view of
current practical advances in the field. But this special issue by
no means constitutes a comprehensive report of all ongoing
work—I encourage our readers to follow up with contributions
that will help us build up a more complete playbook for the
information security community and attain a better
understanding about how to solve the malware problem.

Iván Arce is chief technology officer and cofounder of Core
Security Technologies—an information security company
based in Boston. Previously, he worked as vice president of
research and development for a computer telephony
integration company and as information security consultant
and software developer for various government agencies and
financial and telecommunications companies. Contact him at
ivan.arce@coresecurity.com.
16 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007


Malware

Studying Bluetooth
Malware Propagation
The BlueBag Project

Current Bluetooth worms pose relatively little danger


compared to Internet scanning worms—but things might
change soon. The authors’ BlueBag project shows targeted
attacks through Bluetooth malware using proof-of-concept
codes and devices that demonstrate their feasibility.

Luca Carettoni and Claudio Merloni, Secure Network Srl
Stefano Zanero, Politecnico di Milano

Thanks to its characteristics, Bluetooth is emerging as a pervasive technology that can support wireless communication in various contexts in everyday life. For this reason, it's important to understand the potential risks linked with various wireless devices and communication protocols. At present, the greatest level of diffusion exists in so-called smart phones. These devices offer all the functions of cutting-edge telephones while integrating those of advanced handheld computers managed by operating systems such as Symbian or Microsoft Windows Mobile.

Smart phones can send and receive SMSs, MMSs (multimedia messages), and email, plus let users listen to MP3 files, watch videos, surf the Internet, play games, manage agendas, synchronize and exchange data with their PCs, and much more. Although they still constitute a niche market, smart phones saw a growth rate of 100 percent per year for the past five years, and according to projections released at the beginning of 2006 by ABI Research, a market research company, they held 15 percent of the global cell phone market by the end of 2006, which is equivalent to 123 million units sold, thanks to growing user requests for applications such as mobile email, decreasing prices, and a broader choice of models (www.abiresearch.com/products/market_research/Smartphones).

Because smart phones are now very similar to PCs, they're simultaneously more vulnerable, more useful, and more attractive for potential attack than older mobile phones. This increased vulnerability is due to the presence of a system of evolved connectivity applications that expose the phone (and the data it contains) to risks. Fortunately, recent cell phone viruses haven't caused significant damage, except for the obvious inconveniences created when the phone malfunctions. This has led to the myth that Bluetooth malware is yet another form of viral code that doesn't pose any real or new security issues and which has a relatively low chance of causing significant damage. However, as we will show, the potential for the propagation of dangerous Bluetooth malware indeed exists. Until now, a combination of lucky chances and various environmental difficulties sheltered us from the widespread propagation of such epidemics, but we cannot simply keep crossing our fingers and hoping for the best.

In this article, we focus on the new risks created by the widespread presence of Bluetooth-enabled devices carrying both potentially sensitive data and vulnerability-prone software. In particular, we show how this mix of technologies could become a vehicle for propagating malware that's specifically crafted to extract information from smart phones. We built a mobile, covert attack device (which we call BlueBag) that demonstrates how stealthy attackers can reach and infect a wide number of devices.

Bluetooth technology
As a word or term, Bluetooth is now fairly common. The literal meaning supposedly refers to the Viking Emperor Harald (Blåtand, in Danish), who lived during the 10th century AD and united the kingdoms of Denmark, Norway, and Sweden (http://en.wikipedia.org/wiki/Harald_I_of_Denmark). In fact, the Bluetooth protocol aims to unify different wireless data-transmission technologies among mobile and static electronic devices such as PCs, cellular phones, notebooks, PDAs, DVD players, MP3 devices, TVs, Hi-Fis, cash registers, point-of-sale termi-

PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY 17
Malware

nals, and even household appliances such as refrigerators and washing machines.

Bluetooth is essentially an alternative to the traditional infrared communication standards (the most famous being IrDA, Infrared Data Association). Whereas IrDA transmits data using infrared light waves, Bluetooth is based on short-wave radio technology, which can transmit data across physical obstacles without needing line of sight.1 Bluetooth devices use the 2.4-GHz frequency range (the same range that WiFi 802.11 technology uses); the exact frequency spectrum used varies between countries due to national regulations.

Significant improvements over IrDA are that Bluetooth requires neither a line of sight nor proper orientation of devices, and that it offers the possibility of connecting multiple devices, not just pairs, as well as an increased range of connectivity. When individuals connect different Bluetooth devices together, they create personal area networks, or PANs (also called piconets in the Bluetooth specification), which are small ad hoc networks that can exchange data and information just as within regular LANs. These improvements are also the key reasons that Bluetooth can be used, for instance, to transport automatically spreading malware. This isn't true with IrDA because it requires proper alignment between the transmitting and receiving devices, effectively avoiding "casual" or unwanted interaction.

Bluetooth technology is characterized by low power (from 1 to 100 milliwatts [mW]—a thousand times less than the transfer power of a GSM cell phone) and a communication speed of around 1 Mbit per second (Mbps). With regard to power, Bluetooth devices can be grouped in classes, each corresponding to a different reach:

• class 1 can communicate with Bluetooth devices in a 100-m range;
• class 2 can communicate with Bluetooth devices up to a 10-m range; and
• class 3 can communicate with Bluetooth devices within a 1-m range.

Currently, most common devices belong to classes 2 and 3; laptops and cell phones, for instance, normally use class 2 peripherals. Toward the end of 2004, a new implementation of the Bluetooth technology (version 2.0) was released that allowed transfer speeds of up to 2 and 3 Mbps, as well as lower energy consumption. The new protocol is also backward-compatible.

Security issues
Although the Bluetooth standard incorporates very robust security mechanisms2 that application developers can use to create secure architectures, researchers have discovered a series of theoretical glitches and possible attacks in Bluetooth's core specifications.3,4 The most serious of these5 can lead to a compromise of the cryptographic algorithm protecting communication through sniffing, but this attack is impractical because the attacker must be present at the pairing of devices and then must be able to sniff communications between them. This is more difficult than it seems: Bluetooth divides the 2.4-GHz spectrum range into 79 channels, through which devices hop in a pseudorandom sequence that differs from PAN to PAN. This is done both to avoid interference among different PANs and to enhance security. In fact, this inhibits common commercial Bluetooth hardware from sniffing communications in a PAN it doesn't participate in (contrast this with common, off-the-shelf WiFi hardware, which can be placed in monitor mode and used for sniffing). A hardware sniffer can easily cost in the range of US$10,000, which places this attack out of reach for the common aggressor, but surely within the reach of corporate spies. Provided that the attacker is able to sniff the pairing, a tool exists for personal identification number (PIN) cracking (www.nruns.com/security_tools.php). As a possible solution, researchers have proposed alternate implementations of Bluetooth with more secure encryption algorithms.6

Specific attacks
Even if Bluetooth is theoretically quite robust, several security issues have surfaced in various implementations of the standard stack since late 2003. Among the existing attacks, we can quote significant examples drawn from www.trifinite.org, an organization that hosts information and research in wireless communications:

• BlueSnarf. This type of attack uses the OBEX (object exchange) Push service, which is commonly used to exchange files such as business cards. BlueSnarf allows an attacker to access the vulnerable device's phone book and calendar without authentication. A recently upgraded version of this attack gives the attacker full read–write access.
• Bluejacking. By carefully crafting the identification that devices exchange during association, attackers can transmit short deceitful text messages into authentication dialogs. Users can then be tricked into using their access codes, thereby authorizing an aggressor to access a phone book, calendar, or file residing on the device.
• BlueBug. This vulnerability permits access to the cell phone's set of "AT commands," which let an aggressor use the phone's services, including placing outgoing calls; sending, receiving, or deleting SMSs; diverting calls; and so on.
• BlueBump. This attack takes advantage of a weakness in the handling of Bluetooth link keys, giving devices that are no longer authorized the ability to access services as if still paired. It can lead to data theft or to the abuse of mobile Internet connectivity services, such as Wireless


Application Protocol (WAP) and General Packet Radio Services (GPRS).
• BlueSmack. This denial-of-service (DoS) attack knocks out certain types of devices; attackers can perform it with standard tools.
• HeloMoto. A combination of BlueSnarf and BlueBug, this attack's name comes from the fact that it was originally discovered on Motorola phones.
• BlueDump. This attack causes a Bluetooth device to dump its stored link key, creating an opportunity for key-exchange sniffing or for another pairing to occur with the attacker's device of choice.
• CarWhisperer. This attack abuses the default configuration of many hands-free and headset devices, which come with fixed PINs for pairing and transmission.
• BlueChop. This DoS attack can disrupt any established Bluetooth piconet by means of a device that isn't participating in it, if the piconet master supports multiple connections.

These flaws demonstrate how, in many cases, attackers can steal information from mobile devices, control them from a distance, make calls, send messages, or connect to the Internet. In computer systems, these problems are traditionally resolved with the release and application of patches. That same approach doesn't extend to GSM handsets, however; in most cases, a firmware update can be performed only at service points and shops, not by users. Therefore, many phones and firmwares can be vulnerable and in use long after a vulnerability is discovered and a patch produced.

Some of these attacks are implemented in Blooover, a proof-of-concept application that runs on Symbian cell phones (www.trifinite.org/trifinite_stuff_blooover.html). This counters the idea that attackers need laptops, and must therefore make themselves visible, to execute their attacks. Most of these attacks can also be performed at a distance using long-range antennae and modified Bluetooth dongles; a Bluetooth class-2 device was reportedly able to perform a BlueSnarf attack at an astounding distance of 1.08 miles (www.trifinite.org/trifinite_stuff_lds.html).

Figure 1. Luca Carettoni (left) and Stefano Zanero (right) with the BlueBag trolley. The picture was taken during the survey at the Orio Center shopping mall. Notice how inconspicuous the trolley is in this context, particularly if you keep in mind that the mall is in front of an airport.

Creating the BlueBag: A covert attack and scanning device
Our goals in undertaking this survey were to gather data on the prevalence of insecure devices, to understand how susceptible people are to simple social engineering attacks, and to demonstrate the feasibility of attacks in secured areas such as airports or office buildings.

To mount any type of attack without being noticed, we needed to create a covert attack and scanning device, which we later came to call the BlueBag (see Figure 1). We envisioned a Linux-based embedded system with several Bluetooth dongles to process many discovered devices in parallel, using an omnidirectional antenna to improve the range and cover a wide area. We needed both a hidden tool and an instrument that could easily be carried around and still have a long battery life.

To fulfill these requirements, we created the BlueBag by modifying a standard blue trolley and inserting a Mini-ITX system (see Figure 2) with the following off-the-shelf components:

• a VIA EPIA Mini-ITX motherboard (model PD6000E; because it doesn't have a fan, its power consumption is reduced);
• 256 Mbytes of RAM in a DDR400 DIMM module;
• an EPIA MII PCI backplate to extend the available on-board USB connections from two to six;
• a 20-Gbyte iPod, with a 1.8-inch hard drive that can resist an acceleration of up to 3 g;
• eight class-1 Bluetooth dongles with Broadcom chipsets (some were connected to a four-port USB hub);
• a class-1 Linksys Bluetooth dongle (Cambridge Silicon Radio chipset) modified with a Netgear omnidirectional antenna with 5-dBi gain;
• a picoPSU DC-DC converter (this small power supply can generate up to 120 watts at over 96 percent efficiency); and
• a 12V, 26-Ah lead-acid battery to power our lengthy surveying sessions (up to 8 hours).
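As a rough sanity check on the quoted battery life, stored energy divided by average draw gives the ideal runtime. The 35-W average draw below is our assumption for a fanless Mini-ITX system plus dongles, not a figure measured in the article:

```python
# Back-of-the-envelope battery-life estimate for a BlueBag-like rig.
# The 12 V / 26 Ah figures are the battery specs listed above; the
# 35 W average draw is an assumed (hypothetical) load figure.

def runtime_hours(volts: float, amp_hours: float, avg_draw_watts: float) -> float:
    """Ideal runtime in hours: stored energy (Wh) divided by draw (W)."""
    return volts * amp_hours / avg_draw_watts

# 12 V * 26 Ah = 312 Wh; at ~35 W that is roughly 8.9 hours, which is
# consistent with the "up to 8 hours" sessions once conversion losses
# and load spikes are accounted for.
estimate = runtime_hours(12, 26, 35.0)
```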


The total cost to build such a device is approximately US$750, demonstrating just how economical it is to create a Bluetooth attack device.

The BlueBag runs on a GNU/Linux OS (specifically, we use the Gentoo distribution for its outstanding customizability and performance), on top of which we created a software infrastructure in Python that makes it easy to devise, control, and perform survey sessions. The software is completely multithreaded, and we can use the available dongles to perform different tasks concurrently. We implemented a simple but useful dongle management and allocation scheme to dynamically learn about available resources and lock them when needed. By doing so, we can reserve specific dongles to run applications that need to lock single physical interfaces for some time (the "pand" daemon, which allows us to establish connectivity over Bluetooth). The software is quite modular and was designed with the typical producer/consumer pattern: producers put found devices in a queue, using the standard utilities that come with BlueZ (the official Linux Bluetooth stack) in order to collect information. The software also includes customized versions of well-known Bluetooth information-gathering techniques such as blueprinting (a method for remotely identifying Bluetooth-enabled devices, similar to OS fingerprinting). A distinct thread manages the queue and assigns tasks to different consumers.

We designed the BlueBag software suite to allow us to monitor and control the test's execution from a palmtop or smart phone via a Web interface that runs on top of a TCP/IP over Bluetooth connection. Using this configuration, there's no need to open the BlueBag case in public. At no time did anyone stop us or suspect us of doing something unusual, even in highly secured areas such as airports.

Figure 2. The BlueBag open. Note the motherboard (top, left side) and battery (bottom, left side) as well as the dongles (top, right side) and the antenna (below the dongles).

Survey results: A discomforting landscape
In our surveys, we initially focused on identifying how many active Bluetooth devices were in discoverable (or visible) mode. This is, in fact, the condition of potential real-world risk: researchers have demonstrated that it's possible to find devices with active Bluetooth technology in nondiscoverable mode using a brute-force attack. However, given the enormous time expenditure this would entail, it isn't feasible in a generic context. An attack with this method is possible only if attackers want to target a specific device they know to be active and in range, and even then, they must first identify the brand and model in order to prune the address space.

Therefore, keeping a phone in nondiscoverable mode provides a basic form of protection against targeted attacks and, in general, keeps the device safe from worms that use Bluetooth technology to replicate, given that such worms search for their victims by simply scanning for devices in the area. For this reason, our test focused exclusively on detecting devices in discoverable mode—the only ones actually in a condition of potential risk of attack from Bluetooth malware.

We conducted our survey in several high-transit locations surrounding Milan:

• Milan's Exhibition Centre, during the InfoSecurity 2006 trade show;
• the Orio Center Shopping Mall;
• the MM2 Cadorna Metro Station;
• the Assago MilanoFiori Office District;
• Milan's Central Station;
• the Milan Malpensa Airport; and
• Politecnico di Milano Technical University, Leonardo Branch.

We chose a variety of venues to better evaluate whether and how the prevalence of potentially vulnerable targets varied in different contexts populated by different people. Milan's Central Station, for instance, has a very heterogeneous user base (and a dense crowd—the station serves 270,000 passengers on an average business day); the Orio Center Shopping Mall on a Saturday is filled with many young people and families, subjects who might not be aware of the dangers linked with new technologies, as opposed to visitors and exhibitors at the InfoSecurity trade show (which sees roughly 2,000 security professionals a day).
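The producer/consumer design described above can be sketched as follows. This is a simplified skeleton, not the BlueBag code itself: `discover_fn` stands in for a real inquiry routine (with PyBluez one could pass, for example, `lambda: bluetooth.discover_devices(lookup_names=True)`), and `handle_fn` stands in for the per-device information gathering (SDP queries, blueprinting) that the consumers run:

```python
import queue
import threading

def run_survey(discover_fn, handle_fn, rounds=3, workers=2):
    """Producer/consumer scanning skeleton.

    discover_fn: callable returning a list of (address, name) pairs,
                 injected here so the sketch runs without Bluetooth hardware.
    handle_fn:   callable run by consumers for each newly found device.
    """
    found = queue.Queue()   # producer -> consumer hand-off queue
    seen = set()            # deduplicate devices across inquiry rounds
    results = []
    lock = threading.Lock()

    def producer():
        # Repeated inquiry scans; only enqueue devices not seen before.
        for _ in range(rounds):
            for addr, name in discover_fn():
                if addr not in seen:
                    seen.add(addr)
                    found.put((addr, name))

    def consumer():
        # Workers drain the queue and gather device information concurrently.
        while True:
            try:
                addr, name = found.get(timeout=0.2)
            except queue.Empty:
                return
            info = handle_fn(addr, name)
            with lock:
                results.append(info)

    producer()  # in the real tool, producers run in their own threads
    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The dedupe set plays the role of the "unique devices" bookkeeping used in the survey, and the worker pool mirrors the article's use of multiple dongles to process discovered devices in parallel.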


Table 1. Summary of surveying results.

LOCATION                                    DATE         DURATION (HH:MM)  UNIQUE DEVICES  DEVICE RATE
InfoSecurity 2006                           02/08-10/06  4:42              149             0.53
Orio Center Shopping Mall                   03/01-11/06  6:45              377             0.93
MM2 Metro Station                           03/09/06     0:39              56              1.44
Assago Office District                      03/09/06     2:27              236             1.60
Milan Central Station                       03/09/06     1:12              185             2.57
Milan Malpensa Airport                      03/13/06     4:25              321             1.21
Politecnico di Milano Technical University  03/14/06     2:48              81              0.48
Total                                                    22:58             1,405
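The device-rate column can be recomputed directly from the duration and unique-device columns (individual rows may differ from the printed values by a hundredth because of rounding):

```python
# Recompute Table 1's "device rate": unique devices discovered per
# minute of scanning, from the duration (HH:MM) and device counts.

def device_rate(unique_devices: int, hours: int, minutes: int) -> float:
    total_minutes = hours * 60 + minutes
    return round(unique_devices / total_minutes, 2)

# Three rows from Table 1: (unique devices, hours, minutes)
rows = {
    "InfoSecurity 2006": (149, 4, 42),          # -> 0.53 devices/min
    "Orio Center Shopping Mall": (377, 6, 45),  # -> 0.93 devices/min
    "Politecnico di Milano": (81, 2, 48),       # -> 0.48 devices/min
}
rates = {k: device_rate(*v) for k, v in rows.items()}
```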

We performed multiple sessions, on different days, for a total of 23 hours of scanning dispersed over seven days. Table 1 shows the results; "unique devices" denotes the number of unique devices in discoverable mode that we found during a specific session, and "device rate" indicates the average number of unique devices discovered per minute.

This data shows the capillary diffusion of Bluetooth technology in everyday life and also highlights the huge number of potentially vulnerable devices we found, even in such a short duration: at first glance, Bluetooth seems to be an integral part of everyone's life, important not only for professional but also for personal use. Note, too, that in terms of risk awareness, there's an insignificant difference among the Central Station, the Milan Malpensa Airport (populated by a heterogeneous public), and the Assago Office District (where most users use these devices for work purposes). The situation was significantly better—indicating a greater awareness among users—at the InfoSecurity conference and at the university.

Categorizing devices
For the 1,405 unique devices detected, we performed further analysis to broadly categorize the devices: cell and smart phones (1,312), PCs/notebooks (39), Palm Pilots (21), GPS navigators (15), printers (5), and other various devices (13). In a similar, independent experiment that F-Secure performed in parallel during CeBIT 2006 (the ICT trade show in Hannover, Germany), a regular laptop device capable of identifying active Bluetooth devices in a 100-meter range found more than 12,500 devices with discoverable Bluetooth mode during a week of scanning (www.f-secure.com/weblog/archives/archive-032006.html). To our knowledge, the researchers made no attempt to break down the data any further.

After grouping the devices, we also tried analyzing the types of services the devices offered and, in particular, those that can be used to propagate worms. As Table 2 shows, the OBEX Push service was active and in range for enough time to allow the scanning of 313 devices; this service is normally used for transferring information (business cards, for instance) or files and applications—including worms. It's very likely that most, if not all, cell phones have the OBEX Push service activated. Because we found 1,312 phones among the devices, the result might seem strange at first sight. The explanation is simple: among all those devices, 313 stayed in range long enough to allow the OBEX Push service to let BlueBag correctly poll them.

Table 2. Services offered by mobile devices.

SERVICE TYPE                          NUMBER OF DEVICES
OBEX Object Push, OBEX file transfer  313
Headset, hands-free audio gateway     303
Dial-up networking                    292

Visibility
Another important finding from our survey was "visibility time"—that is, the average time in which a device remains in a potential attacker's range, or the time in which an aggressor could exploit the device. This time depends substantially on the different activity patterns of people in different contexts: for instance, at the Orio Center Shopping Mall, the average time was 12.3 seconds; at the Politecnico di Milano Technical University, 10.1 seconds; and at the Milan Malpensa Airport, 23.1 seconds. Of course, in some cases, this time depends on the activity pattern a hypothetical aggressor might carry out: at the Politecnico, we deliberately avoided staying in a single classroom for a long time, but an aggressor interested in a specific target might very well do so, or he or she might follow the target in an airport up to the gate (where most people settle down to wait for boarding), thus extending this time. Our estimated average visibility times are therefore interesting for casual contacts, such as the one implied by casual worm transmission.

It's important to point out that some cell phone mod-


els on the market are configured to be in discoverable mode by default if the Bluetooth connection is activated, thus requiring the user to manually modify the setting to the secure, nondiscoverable mode. Other devices must instead be manually brought to discoverable mode and are automatically reset to nondiscoverable after a short time period. Our survey showed this to be effective: just a handful of the detected device models were of the latter type, surely out of proportion with the respective market shares. Because keeping devices in nondiscoverable mode doesn't prevent communication among paired devices, keeping a phone in nondiscoverable mode shouldn't entail a heavy usability burden.

Social engineering
After we investigated how effectively Bluetooth malware can propagate, we realized that we needed to also estimate the success rate of the basic social engineering techniques Bluetooth worms commonly use. Most existing worms rely on the user accepting a file to propagate, so we wanted to know the ratio of users who would accept an unknown file transfer from an unknown source. To obtain this data, we developed an OBEX Pusher, an add-on to our normal survey scripts, which searches for all discoverable Bluetooth devices with OBEX Push support enabled and then sends them a file. Using this tool (and transmitting an innocuous image file), we found that an astounding 7.5 percent of device owners carelessly accepted unknown file transfers from unknown sources and were thus highly vulnerable to social engineering attacks.

Bluetooth-enabled malware networks
Our experiments show that just a small percentage of people today are aware of the risks incurred by using apparently innocuous devices. Moreover, smart phones and connected palmtops have become daily work tools for people with medium to high levels of responsibility within their organizations. This implies that these devices could hold particularly interesting information that potential aggressors might want, such as for industrial espionage.

All the elements are thus in place for a huge risk to both companies and individuals; we can almost certainly foresee an increase in attacks that aim not only to make a mobile device unusable or connect it to premium-rate telephone numbers but also to target specific information on the device.

The effort it takes to reach a target device is often thought of as a form of protection. To prove this assumption wrong, we created a network of viral agents that can spread among mobile devices looking for a target, zero in on it, and then report information back to the attacker. Because such agents are targeted to a specific environment or person, it's interesting to study the use of dynamic payloads that vary depending on the type of infected device. We designed a proof-of-concept worm infrastructure that uses an envelope-payload mechanism (see Figure 3).

Figure 3. Pseudocode of a Bluetooth worm with dynamic payloads for targeted attacks:

    Envelope
      Main
        if ( inTarget() ) {
          P.run();
        } else {
          while ( true ) {
            scanDevices();
            propagate();
          }
        }
      scanDevices()
        - inquire for neighbors
      propagate()
        - OBEX Push or attacks lib
      targetsList[]
        - array of {bt_addr, payload, payload_parameters}

    Payload
      run() {...}

The envelope component is a piece of software that can scan for Bluetooth devices and propagate to found devices; it has a list of targets to propagate to and a set of payloads that it can "deploy" on the targets. The payload components can be any type of malicious code that we want to execute on victim devices within the limits of cell phone operating systems—examples include keyloggers, audio recorders, and sniffers. A similar design pattern (in a very different context) appears in the Metasploit framework's Meterpreter infrastructure.7

Such payloads can also use the high connectivity of Bluetooth-enabled devices to transmit harvested information back to the attacker (in much the same way that common PC-based spyware does), for instance, using the Internet email service or a sequence of MMSs. In this way, the attacked device doesn't need to be within the attacker's range to send the retrieved data. It's not difficult then to envision an attacker who infects several devices (during a morning commute, for example) belonging to an organization's employees, and then just waits for one of these devices to reach and infect or attack the device of the organization's CEO. In other words, attackers could create a botnet of Bluetooth-enabled, remotely controlled zombie machines, which they could then use to perform further attacks on devices they couldn't normally reach.

One of the barriers to mobile malware propagation has historically been differences among various operating systems and hardware platforms. This is becoming easier to overcome because of the growing popularity of Java 2 Micro Edition (J2ME), which enables software authors (and, correspondingly, malware authors) to create cross-platform software for mobiles. We successfully imple-


mented our proof of concept in Java; it runs on any cell phone compatible with Mobile Information Device Profile (MIDP) 2.0 on which JSR-82 (the Java Bluetooth API) is active.

Features that would make this worm really dangerous (and that we therefore didn't implement) are ways to auto-execute with as little interaction with the device user as possible. On Symbian phones, for instance, a worm can overwrite system files due to various structural flaws in access control. Otherwise, implementation flaws and bugs that allow for command execution (such as the ones we described earlier) could help this worm propagate.

Simulation results
To correctly evaluate the threat this attack scenario poses, we developed a model and a simulation to understand its effectiveness. Due to space limitations, we refer the reader to other work8,9 for a full discussion of the problems involved with modeling computer virus propagation. An excellent analysis of the mathematics of infectious diseases in the biological world is available elsewhere.10

Traditional propagation models
Propagation models evolve naturally, following the changes in viruses' propagation vectors. The earliest models targeted virus propagation through the infection of host executables.11 Most biological epidemiological models share two assumptions: they're homogeneous—that is, an infected individual is equally likely to infect any other individual—and they're symmetric, which means there's no privileged direction of virus transmission. The former makes these models inappropriate for illnesses that require noncasual contact for transmission, as well as for describing the early stages of propagation of an epidemic that's strongly location-dependent. In an influential seminal paper, Jeffrey Kephart and Steve White addressed these shortcomings by transferring a biological model onto a directed random graph to better approximate the chain of software distribution and the way it worked in the early days of the personal computing revolution.11

Among other results, Kephart and White showed that the more sparse a graph is, the more slowly an infection on it spreads; there's also a higher probability that an epidemic condition doesn't occur. (In a sparse graph, each node has a small, constant average degree; in a local graph, the probability of having a vertex between nodes B and C is significantly higher if both have a vertex connected to the same node A.)

Mass mailers and scanning worms
The introduction of the Internet changed the malware landscape and made traditional models unrealistic. The first effect was the appearance of mass-mailing worms, which demonstrated that tricking users into executing

nerability in a common email client to automatically launch it were successful ways to propagate viral code. One of the best models for such propagation occurs when the email service is modeled as an undirected graph of relationships between people.12 The problems here lie in how to model the users' behavior,13 that is, what to do if the worm doesn't automatically exploit a vulnerability but instead relies on social engineering, and how to build the relationship graph (which is more a local than a sparse one).

Eugene Spafford wrote the first description of self-propagating worms that scan for vulnerabilities.14 In recent years, such worms have changed the threat landscape once more. They can be modeled through the random constant spread (RCS) model,15 developed using empirical data derived from the outbreak of the Code Red worm, a typical random scanning worm. This model uses extremely rough approximations, ignoring the effects of immunization and recovery. It implicitly assumes that the worm will peak before a remedy begins to be deployed. Additionally, it models the Internet as an undirected, completely connected graph. This is far from true,16 but the model still behaves macroscopically well. UDP-based worms, however, require corrections to account for bandwidth restrictions and bottleneck Internet links.17

Bluetooth virus propagation can happen in several different ways, but the most common until now has been through simple social engineering. The worm sends messages with copies of itself to any device in range through an OBEX Push connection. The receiver, finding a seemingly innocuous message on the cell phone with an invitation to download and install an unknown program, often has no clue that it can pose a danger. Cabir, one of the first cell phone worms and the first case of malware that could replicate itself solely through Bluetooth, used this technique.

MMS messages are another potential medium of propagation. The Commwarrior worm propagated through MMS (in fact, it spread from 8 a.m. to midnight using Bluetooth connections and from midnight to 7 a.m. through MMS messages). Another method of propagation would be the use of email- or TCP-based worms, such as the ones usually seen on PCs, although such methods haven't really been used in phone viruses until recently.

By the end of May 2006, F-Secure research laboratories had classified more than 140 virus specimens (www.f-secure.com/v-descs/mobile-description-index.shtml). Of these, most found in the wild propagate by relying solely on Bluetooth technology. In fact, our own experiments showed that this transmission method alone can reach 7.5 percent of a mixed population of targets, so we decided to simulate the propagation of viral code that uses
the worm code attached to an email or exploiting a vul- Bluetooth as its vector.
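Under the rough approximations described earlier (no recovery, no immunization, fully connected Internet), the random-scanning model reduces to logistic growth in the infected fraction. The following few lines of Python are our own sketch of that textbook model, not the authors' simulator:

```python
def scanning_worm(i0=1e-5, K=1.8, dt=0.01, steps=2000):
    """Euler-integrate di/dt = K * i * (1 - i): the infected fraction
    under uniform random scanning of a completely connected address
    space, ignoring recovery and immunization."""
    i, trace = i0, [i0]
    for _ in range(steps):
        i += dt * K * i * (1.0 - i)
        trace.append(i)
    return trace

trace = scanning_worm()
```

The trace shows the characteristic S-curve: slow initial growth while few hosts scan, an explosive middle phase, and saturation as most vulnerable hosts are already infected.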

www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 23


Malware

This wasn't an easy task. On one hand, we wanted to follow the early stages of a worm's propagation because we wanted to evaluate its effectiveness as a targeted attack tool instead of as a global infection (thus, assumptions of homogeneity and nonlocality can't hold). On the other hand, we needed to simulate the occasional and volatile interactions of highly mobile devices. So we needed to effectively simulate a highly sparse graph of relations that change dramatically over time.

We used results from the ad hoc network research community to simulate the transient geographical relationships caused by the movement of people in physical places.18 Cecilia Mascolo and Mirco Musolesi's CMMTool generates realistic traces of movement for people and their respective devices. We developed a small simulator that takes such traces as input and then reproduces the behavior of a Bluetooth worm that propagates across them. The resulting BlueSim tool can replicate, under various hypotheses, the behavior of real worm propagation, taking into account the visibility time of the devices, the inquiry time needed, the data transfer rate, and so on. We chose not to analyze layer-1 radio aspects such as collisions and interference problems, which could potentially occur in crowded places with many devices; to do so, we would have needed a complete network simulator such as NS, which in turn would have required far more computational power to complete even simple simulations.19

To evaluate how effectively a targeted worm can propagate through a population, we recreated different specific contexts with fixed parameters inspired by real environment characteristics and data collected during our survey. In particular, we simulated a shopping mall (a simplified version of the Orio Center Shopping Mall we visited) with 250 x 100 meters of surface and 78 shops. We considered a population of 184 discoverable devices (7.5 percent of which were susceptible to infection), with a Bluetooth transmission range of 15 meters, which is reasonable for mobile phones or PDAs. We conservatively estimated a 0.3-Mbps bandwidth link and a 42-Kbyte worm, the size of the envelope-and-payload worm we designed.

In our first scenario, we used CMMTool to mimic the behavior of people inside lunch areas or food courts, creating groups of relatively stationary people, a small number of whom "travel" among lunch areas. Figure 4 shows the results. Initially, we didn't consider people entering or leaving the shopping mall during our simulation time (the line marked "no output"). We then added a random flow of people with discoverable devices entering and exiting the mall (on average, one person every 10 seconds, a realistic value from our assessments). We then tested two different conditions: the first was a worm propagating on its own (starting with just one infected device), marked "no BlueBag" in the figure; the second was the presence of an attacker with a tool similar to our BlueBag, who was actively disseminating a worm.

Figure 4. Infection ratio versus time (seconds). Our simulation examined three different conditions: without people entering or leaving the area ("no output"), with a flow of people and a propagating worm ("output, no BlueBag"), and with a flow of people and the BlueBag actively disseminating a worm ("output, BlueBag").

As Figure 4 shows, after little more than 30 minutes on average (the time of a typical lunch break), a simple worm could infect any susceptible device in the lunch area through propagation alone. An attacker with a device such as the BlueBag would obtain the result even faster.

In a second scenario, we considered the behavior of a more mobile crowd of people walking in and out of shops and browsing the window displays. In this case, the results were similar, but they depended heavily on motion patterns in the mall and were slower than in the food court scenario (propagation speed was nearly halved in this case).

In this work, we tried to envision possible future attack scenarios involving targeted malware propagated through Bluetooth-enabled covert attack devices. We demonstrated the existence of a very high risk potential, created by low awareness, by ever-increasing functionality and complexity, and by the feasibility of targeted, covert attacks through Bluetooth-enabled malware.

Possible future extensions of this work include better planning of the malware's "phone home" payload, to understand how likely it is for the collected data to reach the attacker under various scenarios and how to improve worm autoexecution and process hiding. The creation of a Bluetooth-only command-and-control infrastructure would be a challenging evolution because it would integrate ad hoc networking issues into our work.
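For readers who want to experiment, the mall scenario described earlier (random device mobility plus infection within Bluetooth range) can be approximated in a few lines. The following toy model is our own illustration, with parameter values taken from the text; it is not the authors' BlueSim, and it uses a plain random walk rather than CMMTool's community-based mobility traces:

```python
import random

def simulate_mall(n=184, susceptible_frac=0.075, width=250.0, height=100.0,
                  bt_range=15.0, steps=1800, speed=1.0, seed=7):
    """Toy Bluetooth-worm spread: n discoverable devices random-walk in a
    width x height mall; once per second, every infected device infects
    any susceptible device within Bluetooth range."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0, width), rng.uniform(0, height)] for _ in range(n)]
    susceptible = set(rng.sample(range(n), int(n * susceptible_frac)))
    patient_zero = rng.choice(sorted(susceptible))
    infected = {patient_zero}
    susceptible.discard(patient_zero)
    history = [len(infected)]
    for _ in range(steps):
        for p in pos:  # one random-walk step, clamped to the mall walls
            p[0] = min(width, max(0.0, p[0] + rng.uniform(-speed, speed)))
            p[1] = min(height, max(0.0, p[1] + rng.uniform(-speed, speed)))
        newly = {s for s in susceptible
                 if any((pos[s][0] - pos[i][0]) ** 2 +
                        (pos[s][1] - pos[i][1]) ** 2 <= bt_range ** 2
                        for i in infected)}
        infected |= newly
        susceptible -= newly
        history.append(len(infected))
    return history
```

The returned history (infected-device count per second) is monotone and bounded by the susceptible population, mirroring the shape of the curves in Figure 4; realistic contact patterns would require mobility traces such as CMMTool's.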




Like common worms, our malware doesn't currently use Bluetooth attacks to spread itself; in the future, we want to investigate whether we can use a sort of attack library, combining social engineering attacks and Bluetooth technology attacks.

Another possible extension would be the use of the BlueBag as a honeypot, to "capture" Bluetooth worms in the wild and measure their real prevalence. We briefly engaged in this activity, but more extensive testing is needed to give reasonable statistical results.

Acknowledgments
An earlier version of this work was presented at the Black Hat conference (www.blackhat.com). We thank Jeff Moss and the Black Hat staff for their support. F-Secure, an antivirus vendor based in Helsinki, Finland, and Secure Network, a security consulting and training firm based in Milano, Italy, jointly funded the early stages of the BlueBag project. One of the authors had partial support from the Italian Ministry of University and Research under the FIRB-PERF project, in the research unit led by Giuseppe Serazzi, whose support we gratefully acknowledge. We warmly thank Martin Herfurt, Marcel Holtmann, and Adam Laurie, authors of the earliest works on Bluetooth security issues, for their comments on this work. We also thank Mirco Musolesi (UCL) and Paolo Costa (DEI, Politecnico di Milano) for their help with modeling movement. Several people helped with various stages of this project, including Alvise Biffi, Laura Mantovani, Miska Reppo, and Mara Rottigni. Finally, we thank the anonymous reviewers for their helpful and extensive reviews of the first draft of this article.

References
1. R. Morrow, Bluetooth Implementation and Use, McGraw-Hill Professional, 2002.
2. C. Gehrmann, J. Persson, and B. Smeets, Bluetooth Security, Artech House, 2004.
3. M. Jakobsson and S. Wetzel, "Security Weaknesses in Bluetooth," Proc. 2001 Conf. Topics in Cryptology (CT-RSA 01), Springer-Verlag, 2001, pp. 176–191.
4. S.F. Hager and C.T. Midkiff, "Demonstrating Vulnerabilities in Bluetooth Security," Proc. IEEE Global Telecommunications Conf. (GLOBECOM 03), vol. 3, IEEE CS Press, 2003, pp. 1420–1424.
5. Y. Shaked and A. Wool, "Cracking the Bluetooth PIN," Proc. 3rd Int'l Conf. Mobile Systems, Applications, and Services (MobiSys 05), ACM Press, 2005, pp. 39–50.
6. P. Hamalainen et al., "Design and Implementation of an Enhanced Security Layer for Bluetooth," Proc. 8th Int'l Conf. Telecommunications (ConTEL 2005), vol. 2, IEEE CS Press, 2005, pp. 575–582.
7. K.K. Mookhey and P. Singh, "Metasploit Framework," July 2004; www.securityfocus.com/infocus/1789.
8. S.R. White, "Open Problems in Computer Virus Research," Proc. Virus Bulletin Conf., 1998.
9. E. Filiol, M. Helenius, and S. Zanero, "Open Problems in Computer Virology," J. Computer Virology, vol. 1, nos. 3–4, 2006, pp. 55–66.
10. H.W. Hethcote, "The Mathematics of Infectious Diseases," SIAM Rev., vol. 42, no. 4, 2000, pp. 599–653.
11. J.O. Kephart and S.R. White, "Directed-Graph Epidemiological Models of Computer Viruses," Proc. IEEE Symp. Security and Privacy, IEEE CS Press, 1991, pp. 343–361.
12. C.C. Zou, D. Towsley, and W. Gong, Email Virus Propagation Modeling and Analysis, tech. report TR-CSE-03-04, Univ. of Massachusetts, Amherst, 2003.
13. S. Zanero, "Issues in Modeling User Behavior in Computer Virus Propagation," Proc. 1st Int'l Workshop on the Theory of Computer Viruses, 2006.
14. E.H. Spafford, "Crisis and Aftermath," Comm. ACM, vol. 32, no. 6, 1989, pp. 678–687.
15. S. Staniford, V. Paxson, and N. Weaver, "How to 0wn the Internet in Your Spare Time," Proc. 11th Usenix Security Symp. (Security 02), Usenix Assoc., 2002, pp. 149–167.
16. A. Ahuja, C. Labovitz, and M. Bailey, Shining Light on Dark Address Space, tech. report, Arbor Networks, Nov. 2001; www.arbornetworks.com/downloads/research38/dark_address_space.pdf.
17. G. Serazzi and S. Zanero, "Computer Virus Propagation Models," Tutorials of the 11th IEEE/ACM Int'l Symp. Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2003), M.C. Calzarossa and E. Gelenbe, eds., Springer-Verlag, 2003.
18. M. Musolesi and C. Mascolo, "A Community-Based Mobility Model for Ad Hoc Network Research," Proc. 2nd ACM/SIGMOBILE Int'l Workshop on Multi-hop Ad Hoc Networks: From Theory to Reality (REALMAN 06), ACM Press, 2006, pp. 31–38.
19. C.-J. Hsu and Y.-J. Joung, "An NS-Based Bluetooth Topology Construction Simulation Environment," Proc. 36th Ann. Simulation Symp. (ANSS 03), IEEE CS Press, 2003, p. 145.

Claudio Merloni is a senior consultant for Secure Network S.r.l., an information security company based in Milan, Italy. His research interests are auditing, policy development, and risk assessment activities, particularly in a banking environment. Merloni has an MSc in computer engineering from the Politecnico di Milano. Contact him at c.merloni@securenetwork.it.

Luca Carettoni is a senior consultant for Secure Network S.r.l., an information security company based in Milan, Italy. His research interests are in Web application security. A regular contributor to OWASP-Italy, he has led penetration testing efforts on several Italian and European banks. Carettoni has an MSc in computer engineering from the Politecnico di Milano. Contact him at l.carettoni@securenetwork.it.

Stefano Zanero has a PhD in computer engineering from the Politecnico di Milano, where he is currently a postdoctoral researcher. His research interests include the development of intrusion detection systems based on unsupervised learning algorithms, Web application security, and computer virology. He is a member of the editorial board of the Journal in Computer Virology. Zanero is a member of the IEEE and the ACM and a founding member of the Italian chapter of the Information Systems Security Association (ISSA). Contact him at stefano.zanero@polimi.it.




Alien vs. Quine

Is it possible to prove that a computer is malware-free without pulling out its hard disk? This article introduces a novel hardware inspection technique based on the injection of carefully crafted code and the analysis of its output and execution time.

VANESSA GRATZER, Université Paris II Panthéon-Assas
DAVID NACCACHE, École normale supérieure

In the Alien movie series, the fictional titular creatures are a blood-thirsty species from deep space that reproduce themselves as parasites. When a potential host comes into close proximity of an alien egg, the egg releases a creature that slides a tubular organ down the victim's throat, implanting its larva in his or her stomach. In a matter of hours, the larva evolves and emerges from the victim's chest, violently killing the host. And then the cycle starts all over again. Similarly, malware such as rootkits, worms, Trojans, and viruses penetrate healthy computer systems and, once in, alter the host's phenotype or destroy its contents.

Detecting new malware species is a nontrivial task.1 In theory, the easiest way to exterminate malware is to reformat the disk and then reinstall the operating system (OS) from a trusted distribution CD. This procedure assumes we can force computers to boot from trusted media, but most modern PCs have a flash BIOS, which means that the code component in charge of booting is recorded on a rewritable memory chip. Specific programs called flashers, or even malware such as the CIH (Chernobyl) virus, have the ability to update this chip. This article addresses this concern, namely, ascertaining that malware doesn't re-flash the BIOS to derail disk-reformatting attempts or simulate their successful completion. Flash smart cards are equally problematic. Consider a SIM card produced by Alice and sold empty to Bob, who privately keys it. Alice reveals an OS code but flashes a malware that simulates the legitimate OS. When some trigger event occurs (maybe a specific challenge value sent during the authentication protocol), the malware responds to Alice by revealing Bob's key.

In biology, the term alien refers to foreign organisms introduced into a locale. Alien species usually wreak havoc on their new ecosystems because they have no natural predators; a common way to eradicate them is to deliberately introduce matching predators. This is the approach we take in our research and that other authors explore in related, yet different, publications.2,3

The arena
We tested our approach on Motorola's 68HC05, a very common 8-bit microcontroller (with more than 5 billion units sold). We slightly modified the chip's specifications to better reflect a miniature PC's behavior.

The 68HC05 has an accumulator A, an index register X, a program counter PC (pointing to the executed memory instruction), a carry flag C, and a zero flag Z indicating whether the last operation resulted in a zero. We write δ(x) for a function returning one if x = 0 and zero otherwise.

The platform has ω = 2^16 = 65,536 memory bytes denoted M[0], ..., M[ω − 1]. The device interprets any address a ≥ ω as a mod ω. We model the memory as a state machine insensitive to power off; upon shutdown, execution fails and the machine's RAM is backed up in nonvolatile memory. Reboot restores RAM, resets A, X, C, and Z, and launches execution at address 0x0002 (whose alias is start). The manufacturer records the very first RAM state (the digital genotype) in nonvolatile memory. Then the device starts evolving, modifying its code and data as it interacts with the external world.

The machine has two I/O ports (bytes) denoted In and Out. Reading In lets a program receive data from outside; assigning a value to Out displays this value outside the machine.
26 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Table 1. The Motorola 68HC05 instruction set.

Effect        lda i            sta i            bne k            bra k
New A ←       M[i mod ω]       -                -                -
New X ←       -                -                -                -
New Z ←       δ(new A)         δ(A)             -                -
Effect on M   -                M[i mod ω] ← A   -                -
New PC ←      PC + 2 mod ω     PC + 2 mod ω     β(PC, Z, k, ω)   β(PC, 0, k, ω)
Opcode        0xB6             0xB7             0x26             0x20
Cycles        3                4                3                3

Effect        inca             incx             lda ,X           ldx ,X
New A ←       A + 1 mod 256    -                M[X]             -
New X ←       -                X + 1 mod 256    -                M[X]
New Z ←       δ(new A)         δ(new X)         δ(new A)         δ(new X)
Effect on M   -                -                -                -
New PC ←      PC + 1 mod ω     PC + 1 mod ω     PC + 1 mod ω     PC + 1 mod ω
Opcode        0x4C             0x5C             0xF6             0xFE
Cycles        3                3                3                3

Effect        ldx i            sta i,X              lda i,X          tst i
New A ←       -                -                    M[i + X mod ω]   -
New X ←       M[i mod ω]       -                    -                -
New Z ←       δ(new X)         δ(A)                 δ(new A)         δ(M[i mod ω])
Effect on M   -                M[i + X mod ω] ← A   -                -
New PC ←      PC + 2 mod ω     PC + 2 mod ω         PC + 2 mod ω     PC + 2 mod ω
Opcode        0xBE             0xE7                 0xE6             0x3D
Cycles        3                5                    4                4

Effect        ora i            inc i                                 stx i
New A ←       A ∨ M[i mod ω]   -                                     -
New X ←       -                -                                     -
New Z ←       δ(new A)         δ(new M[i mod ω])                     δ(X)
Effect on M   -                M[i mod ω] ← M[i mod ω] + 1 mod 256   M[i mod ω] ← X
New PC ←      PC + 2 mod ω     PC + 2 mod ω                          PC + 2 mod ω
Opcode        0xBA             0x3C                                  0xBF
Cycles        3                5                                     4
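Table 1's semantics are concrete enough to execute directly. The following minimal Python interpreter is our own illustration (not part of the article's toolchain); it implements four of the table's opcodes and runs a three-instruction program placed at the reboot entry point, address 0x0002:

```python
def step(state):
    """Execute one instruction according to the Table 1 semantics
    (subset: lda i, inca, sta i, stx i)."""
    mem = state["M"]
    omega = len(mem)
    pc = state["PC"]
    op = mem[pc]
    if op == 0xB6:                      # lda i: A <- M[i mod omega]
        i = mem[(pc + 1) % omega]
        state["A"] = mem[i % omega]
        state["Z"] = int(state["A"] == 0)
        state["PC"] = (pc + 2) % omega
    elif op == 0x4C:                    # inca: A <- A + 1 mod 256
        state["A"] = (state["A"] + 1) % 256
        state["Z"] = int(state["A"] == 0)
        state["PC"] = (pc + 1) % omega
    elif op == 0xB7:                    # sta i: M[i mod omega] <- A, Z <- d(A)
        i = mem[(pc + 1) % omega]
        mem[i % omega] = state["A"]
        state["Z"] = int(state["A"] == 0)
        state["PC"] = (pc + 2) % omega
    elif op == 0xBF:                    # stx i: M[i mod omega] <- X, Z <- d(X)
        i = mem[(pc + 1) % omega]
        mem[i % omega] = state["X"]
        state["Z"] = int(state["X"] == 0)
        state["PC"] = (pc + 2) % omega
    else:
        raise ValueError(f"opcode {op:#04x} not implemented in this sketch")
    return state

# Toy program at 0x0002: lda $40 ; inca ; sta $41
mem = [0] * 256
mem[2:7] = [0xB6, 0x40, 0x4C, 0xB7, 0x41]
mem[0x40] = 0x41                        # input byte to load and increment
state = {"M": mem, "A": 0, "X": 0, "Z": 0, "PC": 2}
for _ in range(3):
    step(state)
```

After three steps, A holds M[0x40] + 1 = 0x42, that value has been stored at M[0x41], and PC has advanced to 0x0007; cycle counting is omitted here but matters for the time-constrained arguments later in the article.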

In and Out are located at memory cells M[0] and M[1], respectively. Out's value is restored on reboot (In isn't). If the device attempts to write to In, execute In, or execute Out, execution halts. We assume that the external world feeds incoming bytes synchronously to the 68HC05's clock domain.

The (potentially infested) system pretends to implement an OS function called Install(p): when given a string p, Install(p) installs p at start. We don't exclude the possibility that malware might modify, mimic, or spy on Install. Given that the next reboot will grant p complete control over the chip, Install typically requires some cryptographic proof before installing p.

Table 1 reproduces some of the 68HC05's instructions (the entire set appears in the microcontroller's data sheet4). Here, β denotes the function that encodes short-range jumps. The seventh bit of k indicates whether we should regard k mod 128 as positive or negative, that is,

β(PC, z, k, ω) = (PC + 2 + (1 − z) × (k − 256 × ⌊k/128⌋)) mod ω.
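The β function translates directly into code. The following one-liner is our own sketch; the spot checks after it confirm the sign convention (for instance, a taken branch at PC = 0x0004 with offset byte 0x09 lands at 0x000F, while offset byte 0xF9 encodes −7):

```python
def beta(pc, z, k, omega=1 << 16):
    """Branch-target function from the text: bit 7 of the offset byte k
    selects the sign of k mod 128; z is the tested zero condition
    (z = 1 suppresses the branch and falls through to PC + 2)."""
    offset = k - 256 * (k // 128)       # two's-complement signed offset
    return (pc + 2 + (1 - z) * offset) % omega
```

Opcodes bne and bra both route through β: bne passes the live Z flag as z, while bra hardwires z = 0 so the branch is always taken.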




start: ldx In    ; X←In               0xBE 0x00
       bne store ; if X≠0 goto store  0x26 0x09
_________________________________________________
print: lda M,X   ; A←M[X]             0xE6 0x00
       sta Out   ; Out←A              0xB7 0x01
       incx      ; X++                0x5C
       bne print ; if X≠0 goto print  0x26 0xF9
       bra start ; if X=0 goto start  0x20 0xF3
_________________________________________________
store: lda In    ; A←In               0xB6 0x00
       sta M,X   ; M[X]←A             0xE7 0x00
       bra start ; goto start         0x20 0xED

Figure 1. Quine1.asm. This simple 19-byte program stores and reads memory contents.

Quines as malware predators
A quine (named after the logician Willard Van Orman Quine) is a program that prints a copy of its own code as its only output.5,6 Writing quines is a tricky programming exercise that yields Lisp, C, or natural language examples such as

((lambda (x) (list x (list (quote quote) x)))
 (quote (lambda (x) (list x (list (quote quote) x)))))

char *f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";
main(){printf(f,34,f,34,10);}

Copy the next sentence twice.
Copy the next sentence twice.

To detect malware, we start by loading a quine onto the computer that might have malware on it. This malware could neutralize the quine or even analyze it and mutate (adapt its own code in an attempt to fool the verifier). As the download ends, we start a protocol called phenotyping on whatever survived inside the platform. Phenotyping lets us figure out whether the quine survived and whether it's now in full control of the system. If it survived, we use it to reinstall the OS and eliminate itself; otherwise, we now know that the platform is infected. In extreme situations, decontamination by software alone is impossible; a trivial example is malware that controls the I/O port and doesn't let anything new in. Under such extreme circumstances, the algorithms we discuss here can only detect the malware, not eliminate it.

Upon activation, the quine will (allegedly!) start dumping out its own code plus whatever else it finds on board. We then prove or conjecture that the unique program capable of such a behavior, under specific complexity constraints, is only the quine itself.

In several aspects, the setting is analogous to the scenario in the movie Alien vs. Predator, in which a group of humans (in our example, the OS and legitimate applications) finds itself in the middle of a brutal war between two alien species (in this case, malware and quine) in a confined environment (the 68HC05).

Space-constrained quines
We start our space-constrained experiment by analyzing Quine1.asm, a simple 19-byte program that inspects ω = 256-byte platforms (see Figure 1). We used artificial horizontal lines to divide Quine1 into three functional blocks. A primitive command dispatcher reads a byte from In and determines if the verifier wants to read the device's contents (In = 0) or write a byte into the RAM (In ≠ 0).

As the program enters print, the index register is null; print is a simple loop that sends 256 bytes out of the device. As the loop ends, the device jumps back to start to interpret a new command. The store block queries a byte from the verifier, stores it in M[X], and returns to start. Here's the associated phenotyping Φ1:

• Install(Quine1.asm) and reboot.
• Feed Quine1 with 235 random bytes to be stored at M[21], ..., M[255].
• Activate print (command zero) and compare the observed output to

s1 = 0x00 0x00 0xBE 0x00 0x26 0x09
     0xE6 0x00 0xB7 0x01 0x5C 0x26
     0xF9 0x20 0xF3 0xB6 0x00 0xE7
     0x00 0x20 0xED
     M[21], ..., M[255]

But is Quine1.asm the only 19-byte program capable of always printing s1 when subject to Φ1? We think so, although we can't provide a formal proof. To illustrate this difficulty, consider the slight variant to Quine1.asm in Figure 2.

For all practical purposes, the modification that results in Quine2.asm has nearly no effect on the program's behavior: instead of printing s1, this code will print

s2 = 0x00 0x00 0xBE 0x00 0x26 0x0B
     0x3D 0x06 0xE6 0x00 0xB7 0x01
     0x5C 0x26 0xF9 0x20 0xF1 0xB6
     0x00 0xE7 0x00 0x20 0xEB
     M[23], ..., M[255]

Let's replace Quine2 with Quine3, where inc replaces tst. When executed, inc will increment the memory



cell at label, which is inc's own opcode. However, inc's opcode is 0x3C, so execution will transform 0x3C into 0x3D, which is tst's opcode. All in all, Φ2 doesn't let us distinguish a tst from an inc at label because both Quine2 and Quine3 will output s2.

start: ldx In    ; X←In               0xBE 0x00
       bne store ; if X≠0 goto store  0x26 0x0B
label: tst label ;                    0x3D 0x06
_________________________________________________
print: lda M,X   ; A←M[X]             0xE6 0x00
       :  :      ; same code as in Quine1

Figure 2. Quine2.asm. This program shows a slight modification to Quine1.asm.

This example's subtlety shows that a microprocessor-quine-phenotyping triple {μ, Q, Φ} rigorously defines a problem: given a state machine μ, find a state M (malware) that simulates the behavior of a state Q (legitimate OS) when μ is subject to stimulus Φ (phenotyping). Security practitioners can proceed by analogy to cryptosystems whose specifications are published and subject to public scrutiny. If we find an M simulating Q with respect to Φ, a fix can either replace Q, Φ, or both. Note the analogy: given a stream cipher μ and a key Q (defining an observed cipher stream Φ), prove that the key Q has no equivalent keys M. An alternative solution involves proving the quine's behavior under the assumption that the verifier is allowed to count clock cycles (state transitions if μ is a Turing machine).

Time-constrained quines
Now let's turn our attention to time constraints. Consider the program in Figure 3, whose first instruction is located at start.

start: ldx In  ; 3 cycles ; X←In  (instruction I1)
       stx Out ; 4 cycles ; Out←X (instruction I2)
       :  :    ;          ; other instructions

Figure 3. A small code fragment that echoes back a byte. This example executes in seven machine cycles.

We latch a first value v1 at In and reboot; as seven cycles elapse, v1 pops up at Out. If we turn off the device before the eighth cycle and reboot, v1 reappears on Out immediately. (Because Out is a memory cell, its value is backed up upon power off.) Repeating the process with values v2 and v3, we witness two seven-cycle transitions v1 → v2 and v2 → v3.

It's impossible to modify two memory cells in seven cycles because all instructions capable of modifying a memory cell require at least four cycles. We're thus assured that between successive reboots, the only memory changes are in Out. This means that no matter what the examined code is, it doesn't have time to mutate in seven cycles, so it remains invariant between reboots.

The instructions other than sta and stx capable of directly modifying Out are ror, rol, neg, lsr, lsl, asl, asr, bset, bclr, clr, com, dec, and inc. Hence, it suffices to select v2 ≠ dir(v1) and v3 ≠ dir(v2), where dir stands for any of the previous instructions, to ascertain that Out is modified by sta or stx (we also need v1 ≠ v2 ≠ v3 to actually see the transitions). Accordingly, v1 = 0x04, v2 = 0x07, and v3 = 0x10 satisfy these constraints.

Because reading or computing with a memory cell takes at least three cycles, we have only four cycles left to alter the contents of Out; consequently, the only sta and stx instructions capable of making the transitions fast enough are

I2 ∈ {sta Out, stx Out, sta ,X, stx ,X}.

To aim at Out (the address of which is 0x0001), sta ,X and stx ,X would require an X = 0x01, which is impossible (if the code takes the time to assign a value to X, it wouldn't be able to compute the transition's value by time). Hence, we infer that the code's structure looks like Figure 4, where • stands for register A or register X.

start: ??? ??? ; 3 cycles ; an instruction causing •←In
       st• Out ; 4 cycles ; an instruction causing Out←•
       :  :    ;          ; other instructions

Figure 4. The inferred form of the echo-back code fragment. Here, • stands for register A or register X.

The only possible code fragments capable of causing this behavior are

(I1; I2) ∈ { (adc In; sta Out), (adc ,X; sta Out), (add In; sta Out), (add ,X; sta Out),
             (lda In; sta Out), (lda ,X; sta Out), (ora In; sta Out), (ora ,X; sta Out),
             (eor In; sta Out), (eor ,X; sta Out), (ldx In; stx Out), (ldx ,X; stx Out) }.

We can't further refine the analysis without more experiments, but we can already guarantee that as the execution of any of these fragments ends, the machine's state is either SA = {A = v3, X = 0x00} or SX = {A = 0x00, X = v3}. Now let's assume that Out = v3 = 0x10. Consider the code in Figure 5:



• Latch In  v4 = 0x02, reboot, and wait 14 cycles; A – opcode(sub,X) = 0x00 – 0xF0 = 0x10  0xF6
witness the transition 0x10  0x02  0xBE; and A opcode(and,X) = 0x00 0xF4 = 0x00  0xF6
then power off before the 15th cycle completes. A opcode(eor,X) = 0x00 0xF8 = 0xF8  0xF6
• Latch In  v6 = 0x04, reboot, and wait 14 cycles; wit- A  opcode(ora,X) = 0x00  0xFA = 0xFA  0xF6
ness the transition 0xBE  0x06  0xF6; and then A + opcode(add,X) = 0x00 + 0xFB = 0xFB  0xF6
power off before the 15th cycle completes.
Here, ldx ,X is impossible because it would cause a
To better understand what happens here, note that 0xBE is transition to opcode (ldx ,X) = 0xFE  0xF6 (if SX)
the opcode of ldx read from address 0x02, and 0xF6 is the or to 0x06 (if SA). I3 is thus identified as being necessar-
opcode of lda ,X read from address 0x06. ily lda ,X. It follows immediately that I4 = sta Out
Again, because v5  dir(v4) and v7  dir(v6), the second and that the 10 register-A-type candidates for {I1, I2}
transition is necessarily caused by some member of the sta are inconsistent. The phenotyped code is therefore one
or stx families and, more specifically, one of the following: of the following:
ldx In ldx ,X
I4 ∈ sta Out stx Out sta ,X .
 
I3 can’t be an instruction that has no effect on X and A be- stx Out
cause this will either inhibit a transition or cause a transi- lda ,X
tion to zero (remember that immediately before I3’s
sta Out
execution, the machine’s state is either SA or SX). This
rules out 18 jump instructions as well as all cmp, bit, Only the leftmost, namely ldx In, is capable of causing
cpx, tsta, and tstx variants. Moreover, lda i and ldx the observed transition 0x02  0xBE.
i are impossible as both would have forced 0x02 and Up to this point, we’ve built a proof that the device
0x04 to transit to the same constant value. In addition, v5 actually executed the first short fragment presented at the
 dir(v4) implies that I3 can’t be a dir variant operating beginning of this section. Extending the code as in Figure
on A or X, which rules out negx, nega, comx, coma, 6 and subjecting the chip to three additional experi-
rorx, rora, rolx, rola, decx, deca, dec, incx, ments, we observe
inca, clrx, clra, lsrx, lsra, lslx, lsla, aslx,
asla, asrx, and asra altogether. In  0x09 0xF6  0x09  0x5C
Because no carry was set, we sieve out sbc and adc,
whose effects will be strictly identical to sub i and add i; In  0x0A 0x5C  0x0A  0x26
in this case, add i, sub i, eor i, and i, and ora i are im-
possible because the system In  0x0B 0x26  0x0B  0xFA

⎧0x02 * x = 0xBE Note that the identified code just happens to let the

⎩0x06 * x = 0xF6 verifier inspect with absolute certainty the platform’s first
256 bytes. The verifier does a last time measurement, al-
has no solutions when operator * is substituted by +, –, lowing the quine to print the device’s first 256 bytes
, , or . Therefore, the only possible I3 candidates at (power off as soon as the last bne iteration completes, to
this point are avoid any malware hiding beyond address 0x000B). The
only thing left is to check the quine’s payload (the code
I3 ∈ sub ,X and ,X eor ,X between 0x000C and 0x00FF) and unleash the quine’s
add ,X lda ,X ldx ,X execution beyond address 0x000B.

But before I3’s execution, the machine’s state is

SA = {A = 0x06, X = 0x00} or
SX = {A = 0x00, X = 0x06}.
T his work raises several intriguing questions. Can this
approach be adapted to more complex and modern
architectures? Is it possible to prove security using only
space constraints? Can we modify the assembly lan-
It follows that the “,X” versions of sub, and, eor, ora, guage to allow or ease such proofs? Can space-
and add are impossible because constrained quines solve space-complete problems to
flood memory instead of receiving random data? Can
• if the device is in state SA, then 0x06*0x06  0xF6 for we design a formal microprocessor model and formally
* {–, , , , +}, and prove quine uniqueness?
• if the device is in state SX, then Another interesting challenge is the development of a

30 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007


Malware

time-constrained quine whose proof doesn't require rebooting but, rather, the observation of one long succession of transitions. We think that such programs exist. A possible starting point might be a code (not necessarily located at start) similar to

loop: sta Out
      lda In
      sta Out
      ldx In
      stx Out
      lda ,X
      sta Out
      bne loop

Here, the idea is that the verifier will feed the quine with values chosen randomly in a specific set (thus ruling out dir variants) to repeatedly explore the code's immediate environment until we acquire some degree of certainty.

start: ldx In   ; 3 cycles ; X ← In
       stx Out  ; 4 cycles ; Out ← X
       lda ,X   ; 3 cycles ; A ← M[X] (instruction I3)
       sta Out  ; 4 cycles ; Out ← A (instruction I4)
       :  :     ;          ; other instructions

Figure 5. Analysis refinement. This code fragment echoes back an address and prints the address's contents.

start: ldx In    ; X ← In                0xBE 0x00
       stx Out   ; Out ← X               0xBF 0x01
print: lda ,X    ; A ← M[X]              0xF6
       sta Out   ; Out ← A               0xB7 0x01
       incx      ; X ← X+1               0x5C
       bne print ; if X ≠ 0 goto print   0x26 0xFA

Figure 6. Additional experiments. This code fragment inspects the platform's first 256 bytes.
If possible, this would have the advantage of transforming the quine into a function that we can automatically insert into any application whose code requires authentication. Moreover, if we manage to constrain such a quine's capabilities—that is, not allow it to read data beyond a given offset—we could offer the selective ability to audit critical program parts while preserving the privacy of others. We could audit an accounting program's code, for example, but keep secret signature keys provably out of the quine's reach.

Because time-constrained phenotyping is extremely quick (a few clock cycles), preserves nearly all the platform's data, and requires only table lookups and comparisons, we're currently trying to extend the approach described here to more complex microprocessors and implement it between chips in motherboards. We're also developing a second code authentication approach that exploits power consumption patterns. Here, the designer installs in the chip a function called sense(a) that loads the contents of address a and "senses" it by executing a predefined sequence of operations:

A ← LookUpTable[X ⊕ (((A ⊕ M[a]) ⊕ 0x55) + 0xAA)].

Because we expect this to cause a specific current consumption pattern, our intent is to run sense(a) on the entire memory space (including sense's own code!) and spot malware by analyzing the correlation between the power consumption of the potentially infected platform and that of a malware-free device. Hopefully, this will let us thoroughly inspect the contents of computing platforms and thus efficiently detect malware.

References
1. T. Zeller, "The Ghost in the CD: Sony BMG Stirs a Debate over Software Used to Guard Content," The New York Times, 14 Nov. 2005, p. C1.
2. C. Dwork, A. Goldberg, and M. Naor, "On Memory-Bound Functions for Fighting Spam," Proc. 23rd Ann. Int'l Cryptology Conf. (CRYPTO 03), Springer-Verlag, 2003, pp. 426–444.
3. J. Garay and L. Huelsbergen, "Software Integrity Protection Using Timed Executable Agents," Proc. ACM Symp. Information, Computer and Comm. Security (ASIACCS 06), ACM Press, 2006, pp. 189–200.
4. Motorola, 68HC(7)05H12, General Release Specifications, HC05H12GRS/D Rev. 1.0, Nov. 1998.
5. J. Burger, D. Brill, and F. Machi, "Self-Reproducing Programs," Byte, vol. 5, Aug. 1980, pp. 74–75.
6. D. Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid, Basic Books, 1999, pp. 498–504.

Vanessa Gratzer is a master's student at the Université Paris II Panthéon-Assas, Paris. Her research interests include forensics, embedded technologies, and reverse engineering. Contact her at vanessa@gratzer.fr.

David Naccache is a computer science professor at the Université Paris II Panthéon-Assas, Paris, and a member of the Computer Science Laboratory at the École normale supérieure. His research interests include cryptography and embedded electronics. Naccache has a PhD in computer science from École nationale supérieure des télécommunications, Paris. Contact him at david.naccache@ens.fr.

www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 31



Toward Automated Dynamic Malware Analysis Using CWSandbox

The authors present CWSandbox, which executes malware samples in a simulated environment, monitors all system calls, and automatically generates a detailed report to simplify and automate the malware analyst's task.

CARSTEN WILLEMS, THORSTEN HOLZ, AND FELIX FREILING, University of Mannheim, Germany

32 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Malware is notoriously difficult to combat because it appears and spreads so quickly. Most security products such as virus scanners look for signatures—characteristic byte sequences—to identify malicious code. Malware, however, has adapted to that approach. Poly- or metamorphic worms avoid detection by changing their appearance, for example, whereas flash worms stealthily perform reconnaissance without infecting vulnerable machines, waiting to pursue strategic spreading plans that can infect thousands of machines within seconds.

In the face of such automated threats, security researchers can't combat malicious software using manual methods of disassembly or reverse engineering. Therefore, analysis tools must analyze malware automatically, effectively, and correctly. Automating this process means that the analysis tool should create detailed reports of malware samples quickly and without user intervention. Analysts could then use the machine-readable reports to initiate automated responses—automatically updating an intrusion detection system's signatures, for example, and protecting networks from new malware on the fly. An effective analysis tool must log the malware's relevant behaviors—the tool shouldn't overlook any of the executed functionality because analysts will use the information to realistically assess the threat. Finally, the tool should correctly analyze the malware—the sample should initiate every logged action to avoid false positives.

In this article, we describe the design and implementation of CWSandbox, a malware analysis tool that fulfills our three design criteria of automation, effectiveness, and correctness for the Win32 family of operating systems. We show how to use API hooking and dynamic linked library (DLL) injection techniques to implement the necessary rootkit functionality to avoid detection by the malware. We acknowledge that these techniques aren't new; however, we've assembled the techniques in a unique combination that provides a fully functional, elegantly simple, and arguably powerful automated malware analysis tool.

Behavior-based malware analysis
Combining dynamic malware analysis, API hooking, and DLL injection within the CWSandbox lets analysts trace and monitor all relevant system calls and generates an automated, machine-readable report that describes

• the files the malware sample created or modified;
• the changes the malware sample performed on the Windows registry;
• which DLLs the malware loaded before execution;
• which virtual memory areas it accessed;
• the processes that it created;
• the network connections it opened and the information it sent; and
• other information, such as the malware's access to protected storage areas, installed services, or kernel drivers.

CWSandbox's reporting features aren't perfect—that is, it reports only the malware's visible behavior and not how it's programmed, and using the CWSandbox might cause some harm to other machines connected to the network. Yet, the information derived from the CWSandbox for even the shortest of time periods is surprisingly rich; in most cases, it's more than sufficient to assess the danger originating from malware.

In the following paragraphs, we introduce the individual building blocks and techniques behind CWSandbox.

Dynamic malware analysis
Dynamic analysis observes malware behavior and analyzes its properties by executing the malware in a simulated environment—in our case, the sandbox. Two different approaches to dynamic malware analysis exist, each resulting in different granularity and quality:

• taking an image of the complete system state before malware execution and comparing it to the complete system state after execution; and
• monitoring the malware's actions during execution with the help of a specialized tool, such as a debugger.

The first option is easier to implement but delivers more coarse-grained results, which sometimes are sufficient to gain an overview of what a given binary does. This approach analyzes only the malware's cumulative effects without taking into account dynamic changes—such as the malware generating a file during execution and deleting it before termination, for example. The second approach is harder to implement, but we chose to use it in the CWSandbox because it delivers much more detailed results.

Dynamic analysis has a drawback: it analyzes only a single malware execution at a time. In contrast, static malware analysis analyzes the source code, letting analysts observe all possible malware executions at once. Static analysis, however, is rather difficult to perform because the malware's source code isn't usually available. Even if it is, you can never be sure that undocumented modifications of the binary executable didn't occur. Additionally, static analysis at the machine code level can be extremely cumbersome because malware often uses code-obfuscation techniques such as compression, encryption, or self-modification to evade decompilation and analysis.

API hooking
Programmers use the Windows API to access system resources such as files, processes, network information, the registry, and other Windows areas. Applications use the API rather than making direct system calls, offering the possibility for dynamic analysis if we can monitor the relevant API calls and their parameters. The Windows system directory contains the API, which consists of several important DLLs, including kernel32.dll, ntdll.dll, ws2_32.dll, and user32.dll.

To observe a given malware sample's control flow, we need to access the API functions. One possible way to achieve this is by hooking—intercepting a call to a function. When an application calls a function, it's rerouted to a different location where customized code—the hook or hook function—resides. The hook then performs its own operations and transfers control back to the original API function or prevents its execution completely. If hooking is done properly, it's hard for the calling application to detect the hooked API function or that it's called instead of the original function. In our case, the malware could try to detect the hooking function, so we must carefully implement it and try to hide the analysis environment from the malware process as much as possible.

Several different methods let hook functions intercept system calls from potentially malicious user applications on their way to the kernel.1 For example, you can intercept the execution chain either inside the user process in one or multiple parts of the Windows API or inside the Windows kernel by modifying the interrupt descriptor table (IDT) or the system service dispatch table (SSDT). Other methods have different advantages, disadvantages, and complexity. We use in-line code overwriting because it's one of the more effective and efficient methods.

In-line code overwriting directly overwrites the DLL's API function code that's loaded into the process memory. Therefore, calls to the API functions are rerouted to the hook function, regardless of when they occur or whether they're linked implicitly or explicitly. Implicit linking occurs when an application's code calls an exported DLL function, whereas applications must make a function call to explicitly load the DLL at runtime with explicit linking. We can overwrite the function code using the following steps:

1. Create a target application in suspended mode. Windows loads and initializes the application and all implicitly linked DLLs, but it doesn't start the main thread, so the target application doesn't perform any operations.
2. When the initialization work is done, CWSandbox looks up every to-be-hooked function in the DLL's export address table (EAT) and retrieves their code entry points.
3. Save the original code in advance so you can later reconstruct the original API function.
4. Overwrite the first few instructions of each API function with a JMP (or a call) instruction that
leads to the hook function.
5. To complete the process, hook the LoadLibrary and LoadLibraryEx API functions, which allow the explicit binding of DLLs.

If an application loads the DLL dynamically at runtime, you can use this same procedure to overwrite the function entry points. The CWSandbox carries out these steps in the initialization phase to set up the hooking functions.

Figure 1a shows the original function code for CreateFileA, which is located in kernel32.dll. The instructions are split into two blocks: the first marks the block that we'll overwrite to delegate control to our hook function; the second block includes the instructions that the API hook won't touch. Figure 1b shows the situation after we installed the hook. We overwrite the first six bytes of each to-be-analyzed function with a JMP instruction to our hook code. In the hook function, we can save the called API function's parameters or modify them if necessary. Then, we execute the bytes that we overwrote in the first phase and then JMP back to execute the rest of the API function. There's no need to call it with a JMP instruction: the hook function can call the original API with a CALL operation and regain control when the called API function performs the RET. The hook function then analyzes the result and modifies it if necessary. Holy Father offers one of the most popular and detailed descriptions of this approach,2 and Microsoft also offers a library called Detours for this purpose (www.research.microsoft.com/sn/detours).

(a) Kernel32.dll-CreateFileA (*without* hook):

77E8C1F7  PUSH ebp
77E8C1F8  MOV ebp, esp
77E8C1FA  PUSH SS:[ebp+8]
77E8C1FD  CALL +$0000d265
77E8C202  TEST eax, eax
77E8C204  JNZ +$05
...
77E8C226  RET

(b) Kernel32.dll-CreateFileA (*with* hook):

77E8C1F7  JMP [CreateFileA-Hook]        (1)
77E8C1FD  CALL +$0000d265
77E8C202  TEST eax, eax
77E8C204  JNZ +$05
...
77E8C226  RET

Application.CreateFileA-Hook:

2005EDB7  -custom hook code-            (2)
...
2005EDF0  JMP [CreateFileA-SavedStub]

Application.CreateFileA-SavedStub:

21700000  PUSH ebp                      (3)
21700001  MOV ebp, esp
21700003  PUSH SS:[ebp+8]
21700006  JMP $77E8C1FD

Figure 1. In-line code overwriting. (a) shows the original function code. In (b), the JMP instruction overwrites the API function's first block (1) and transfers control to our hook function whenever the to-be-analyzed application calls the API function. (2) The hook function performs the desired operations and then calls the original API function's saved stub. (3) The saved stub performs the overwritten instructions and branches to the original API function's unmodified part.

To offer a complete hooking overview, we must mention system service hooking, which occurs at a lower level within the Windows operating system and isn't considered to be API hooking. Two additional possibilities exist for rerouting API calls: we can modify an entry in the IDT such that Int 0x2e, which is used for invoking system calls, points to the hooking routine, or we can manipulate the entries in the SSDT so that the system calls can be intercepted depending on the service IDs. We don't use these techniques because API hooking is much easier to implement and delivers more accurate results. In the future, we might extend CWSandbox to use kernel hooks because they're more complicated to detect.

On a side note, programs that directly call the kernel to avoid using the Windows API can bypass API hooking techniques. However, this is rather uncommon because the malware author must know the target operating system, its service pack level, and other information in advance. Our results show that most malware authors design autonomous-spreading malware to attack large user bases, so they commonly use the Windows API.

DLL code injection
DLL code injection lets us implement API hooking in a modular and reusable way. However, API hooking with inline code overwriting makes it necessary to patch the application after it has been loaded into memory. To be successful, we must copy the hook functions into the target application's address space so they can be called from within the target—this is the actual code injection—and bootstrap the API hooks in the target application's address space using a specialized thread in the malware's memory.

How can we insert the hook functions into the process running the malware sample? It depends on the hooking method we use. In any case, we have to manipulate the target process's memory—changing the application's import address table (IAT), changing the loaded DLLs' export address table (EAT), or directly overwriting the API function code. In Windows, we can implant and install API hook functions by accessing another process's
virtual memory and executing code in a different process's context.

Windows kernel32.dll offers the API functions ReadProcessMemory and WriteProcessMemory, which let the CWSandbox read and write to an arbitrary process's virtual memory; we can allocate new memory regions or change an already allocated memory region's protection using the VirtualAllocEx and VirtualProtectEx functions.

It's possible to execute code in another process's context in at least two ways:

• suspend one of the target application's running threads, copy the to-be-executed code into the target's address space, set the resumed thread's instruction pointer to the copied code's location, and then resume the thread; or
• copy the to-be-executed code into the target's address space and create a new thread in the target process with the code location as the start address.

With these building blocks in place, it's now possible to inject and execute code into another process.

The most popular technique is DLL injection, in which the CWSandbox puts all custom code into a DLL and the hook function directs the target process to load this DLL into its memory. Thus, DLL injection fulfills both requirements for API hooking: the custom hook functions are loaded into the target's address space, and the API hooks are installed in the DLL's initialization routine, which the Windows loader calls automatically. The API functions LoadLibrary or LoadLibraryEx perform the explicit DLL linking; the latter allows more options, whereas the first function's signature is very simple—the only parameter it needs is a pointer to the DLL name.

The trick is to create a new thread in the target process's context using the CreateRemoteThread function and then set the code address of the API function LoadLibrary as the newly created thread's starting address. When the to-be-analyzed application executes the new thread, the LoadLibrary function is called automatically inside the target's context. Because we know kernel32.dll's location (always loaded at the same memory address) from our starter application, and know the LoadLibrary function's code location, we can also use these values for the target application.

CWSandbox architecture
With the three techniques we described earlier set up, we can now build the CWSandbox system that's capable of automatically analyzing a malware sample. This system outputs a behavior-based analysis; that is, it executes the malware binary in a controlled environment so that we can observe all relevant function calls to the Windows API, and generates a high-level summarized report from the monitored API calls. The report provides data for each process and its associated actions—one subsection for all accesses to the file system and another for all network operations, for example. One of our focuses is on bot analysis, so we spent considerable effort on extracting and evaluating the network connection data.

After it analyzes the API calls' parameters, the sandbox routes them back to their original API functions. Therefore, it doesn't block the malware from integrating itself into the target operating system—copying itself to the Windows system directory, for example, or adding new registry keys. To enable fast automated analysis, we execute the CWSandbox in a virtual environment so that the system can easily return to a clean state after completing the analysis process. This approach has some drawbacks—namely, detectability issues and slower execution—but using CWSandbox in a native environment such as a normal commercial off-the-shelf system with an automated procedure that restores the system to a clean state can help circumvent these drawbacks.

The CWSandbox has three phases: initialization, execution, and analysis. We discuss each phase in more detail in the following sections.

Initialization phase
In the initialization phase, the sandbox, which consists of the cwsandbox.exe application and the cwmonitor.dll DLL, sets up the malware process. This DLL installs the API hooks, realizes the hook functions, and exchanges runtime information with the sandbox.

The DLL's life cycle is also divided into three phases: initialization, execution, and finishing. The DLL's main function is to handle the first and last phases; the hook functions handle the execution phase. DLL operations are executed during the initialization and finishing phases and every time one of the hooked API functions is called. Additionally, the DLL informs the sandbox when the malware starts a new process or injects code into a running process. As Figure 2 shows, the sandbox then injects a new instance of the DLL into the newly created or existing process so that it captures all API calls from this process.

Execution phase
If everything initializes correctly, malware processing resumes and the execution phase starts. Otherwise, the sandbox kills the newly created malware process and terminates. During the malware's execution, the sandbox reroutes the hooked API calls to the referring hook functions in the DLL, which inspects the call parameters, informs the sandbox about the API calls in the form of notification objects, and then delegates control to the original function or returns directly to the application performing the API call, depending on the type of API
function called. After the original API call returns, the hook function inspects the result and might modify it before returning to the calling malware application.

Figure 2. CWSandbox overview. cwsandbox.exe creates a new process image for the to-be-analyzed malware binary and then injects the cwmonitor.dll into the target application's address space. With the help of the DLL, we perform API hooking and send all observed behavior via the communication channel back to cwsandbox.exe. We use the same procedure for child or infected processes.

Because the malware sample shouldn't be aware that it's being executed inside a controlled environment, the cwmonitor.dll implements some rootkit functionality: all of the sandbox's system objects are hidden from the malware binary, including modules, files, registry entries, mutual exclusion events (mutexes), and handles in general. This at least makes it much harder for the malware sample to detect the sandbox's presence. (This approach isn't undetectable, but our evaluation results show that CWSandbox generates valuable reports in practice.)

During the execution phase, heavy interprocess communication (IPC) occurs between the cwmonitor.dll and cwsandbox.exe. Each API hook function informs the sandbox via a notification object about the call and its parameters. Some hook functions require an answer from the sandbox that determines which action to take, such as whether to call the original API function. A heavy communication throughput exists because each notification object must transmit a large amount of data and several DLL instances can exist. Besides the high performance need, reliability is crucial because data must not be lost or modified on its way. Thus, we had to implement a reliable IPC mechanism with high throughput, so we used a memory-mapped file with some customizations that fit our needs.

The execution phase lasts for as long as the malware executes, but the sandbox can end it prematurely when a timeout occurs or if critical conditions require instant termination of the malware.

Analysis phase
In the last phase, the sandbox analyzes the collected data and generates an XML analysis report. To measure the report's accuracy, we examined several current malware binaries and compared the results with reports generated by the Norman Sandbox and Symantec via manual code analysis.

Because the CWSandbox analyzes live systems and lets us observe how the malware binary interacts with other processes, its results were more detailed than those the Norman Sandbox provides. However, both tools generated reports that detected file system changes, registry modifications, mutex creation, or process-management actions. Only small differences between the tools exist—the reports differed if the malware binary used a random file name when it copied itself to another location, for example. Moreover, a disadvantage of Norman Sandbox is that only a limited Internet connection is available; if the malware tries to download additional content from a remote location, Norman Sandbox detects it, but can't automatically analyze the remote file. In contrast, CWSandbox observes the download request and, if the malware downloads and executes a file, performs DLL injection to enable API hooking on the new process.

Compared with the reports from Symantec's manual code analysis, the sandbox reported the important actions, but it failed to detect small details and behavior variants (the creation of certain event objects, for example) because the corresponding API calls weren't hooked in the current implementation. By adding hooks to these API calls, we could extend CWSandbox's analysis capabilities. Symantec's manual code analysis didn't contain any details that weren't in our analysis report.

We executed the malware sample for a specific time period, so we used it to tune CWSandbox's throughput. We found that executing the malware for two minutes yielded the most accurate results and allowed the malware binary enough time to interact with the system, thus copying itself to another location, spawning new processes, or connecting to a remote server, and so on.

Large-scale analysis
We conducted a larger test to evaluate CWSandbox's report throughput and quality. We analyzed 6,148 malware binaries that we collected in a five-month period between June and October 2006 with nepenthes, a honeypot solution that automatically collects autonomous spreading malware.3 Nepenthes emulates the vulnerable parts of a network's services to the extent that an automated exploit is always successful. Autonomous spreading malware such as bots and worms thus think that they've exploited the system, but rather than infecting a "victim," they're delivering to us a binary copy of the malware. Thus, our test corpus is real malware spreading in the wild; we're sure that all of these binaries are malicious because we downloaded them after successful exploitation attempts in nepenthes.

For the analysis process, we executed the sandbox on

Related work in malware behavior analysis


S everal tools exist for automatically analyzing malicious software
behaviors. Despite some similarities, our CWSandbox has the
advantage of generating a detailed, behavior-based analysis report
system, it can’t monitor dynamic actions such as the creation of
new processes. The Reusable Unknown Malware Analysis Net
(Truman; www.lurhq.com) takes a similar approach.
and automating the whole process to a high degree. Galen Hunt and Doug Brubacher introduced Detours, a library
The Norman SandBox (http://sandbox.norman.no) simulates an for instrumenting arbitrary Windows functions.3 We opted to
entire computer and a connected network by reimplementing the implement our own API hooking mechanism to have greater flexi-
core Windows system and executing the malware binary within the bility and more control over the instrumented functions, but this
simulated environment. It’s also possible to execute the malware library makes it possible to implement an automated approach to
binary with a live Internet connection. The company’s Web site malware analysis that is similar to CWSandbox.
features implementation details, a description of the underlying The concepts of system call sequence analysis and API hooking
technology, and a live demo. Such environments are mostly trans- are well-known in the area of intrusion detection. A typical
parent to the malware, which can’t detect that they’re being approach includes a training phase in which the IDS system
executed within a simulated environment. Yet, simulations don’t let observes system calls of the complete system or specific processes
the malware processes interfere with, infect, or modify other and creates a profile of “normal” behavior. During operation, the
running processes because no other processes run within the simu- system call sequences are compared against this profile; upon
lation. By not monitoring this interference, valuable information detecting a deviation, the system sounds an alarm that indicates
might be missed. By using a real operating system as CWSandbox's base, we allow the malware samples to interfere with the system with only the limited disturbance created by API hooking.

Another comparable approach is TTAnalyze.1 Like our sandbox, TTAnalyze uses API hooking, but it differs from our solution in basically one area: it uses the PC emulator QEMU2 rather than virtual machines, which makes it harder for the malware to detect that it's running in a controlled environment (although it makes no significant difference for the analysis).

A different approach is Chas Tomlin's Litterbox (www.wiul.org), in which malware is executed on a real Windows system, rather than a simulated or emulated one. After 60 seconds of execution, the host machine is rebooted and forced to boot from a Linux image. After booting Linux, Litterbox mounts the Windows partition and extracts the Windows registry and complete file list; the Windows partition then reverts to its initial clean state. Litterbox focuses on network activity, so it makes several dispositions of the simulated network. During malware execution, the Windows host connects to a virtual Internet with an IRC server running, which answers positively to all incoming IRC connection requests. The tool captures all packets to examine all other network traffic afterwards. This approach has an advantage over CWSandbox in that IRC connections always succeed, whereas CWSandbox encounters malware binaries whose associated C&C server is already mitigated. However, because Litterbox takes only a snapshot of the infected

an anomaly. Stephanie Forrest and her coauthors give one of the earliest descriptions of this approach,4 and Steven Hofmeyr and colleagues introduced a method for detecting intrusions at the privileged-processes level.5 System call sequence monitoring can also facilitate process confinement, as introduced with Systrace by Provos.6 Within CWSandbox, we use system call sequence analysis to observe the behavior of malware processes and construct detailed reports by correlating the collected data.

References
1. U. Bayer, C. Kruegel, and E. Kirda, "TTAnalyze: A Tool for Analyzing Malware," Proc. 15th Ann. Conf. European Inst. for Computer Antivirus Research (EICAR), 2006, pp. 180–192.
2. F. Bellard, "QEMU, a Fast and Portable Dynamic Translator," Proc. Usenix 2005 Ann. Technical Conf. (Usenix 05), Usenix Assoc., 2005, pp. 41–46.
3. G.C. Hunt and D. Brubacher, "Detours: Binary Interception of Win32 Functions," Proc. 3rd Usenix Windows NT Symp., Usenix Assoc., 1999, pp. 135–143.
4. S. Forrest et al., "A Sense of Self for Unix Processes," Proc. 1996 IEEE Symp. Security and Privacy (S&P 96), IEEE CS Press, 1996, pp. 120–128.
5. S.A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion Detection Using Sequences of System Calls," J. Computer Security, vol. 6, no. 3, 1998, pp. 151–180.
6. N. Provos, "Improving Host Security with System Call Policies," Proc. 12th Usenix Security Symp. (Security 03), Usenix Assoc., 2003, pp. 257–271.
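The system call sequence analysis described above — building a profile of short call sequences seen in normal runs and flagging sequences outside it — can be sketched as a toy. This is an illustration of Forrest-style n-gram matching, not CWSandbox's actual code; the trace format and window size are assumptions:

```python
# Toy sketch of Forrest-style system-call-sequence analysis (illustrative
# only). A profile of normal behavior is the set of short call
# subsequences (n-grams) seen in training traces; windows absent from
# the profile are reported as anomalies.

def ngrams(trace, n=3):
    """Return the set of every length-n window of a system-call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def build_profile(training_traces, n=3):
    profile = set()
    for trace in training_traces:
        profile |= ngrams(trace, n)
    return profile

def anomalies(trace, profile, n=3):
    """Return the windows of `trace` that never occurred in training."""
    return ngrams(trace, n) - profile

normal = [["open", "read", "write", "close"],
          ["open", "read", "close"]]
profile = build_profile(normal)

suspect = ["open", "read", "write", "exec", "close"]
print(sorted(anomalies(suspect, profile)))
```

Any window containing the unseen `exec` call is flagged, while traces composed entirely of known windows pass cleanly.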

two commercial off-the-shelf systems with Intel Pentium IV processors running at 2 GHz and with 2 GBytes of RAM. Each system ran Debian Linux Testing and had two virtual machines based on VMware Server and Windows XP as guest systems. Within the virtual machines, we executed CWSandbox, effectively running four parallel environments. We stored the malware binaries in a MySQL database to which our analysis systems wrote all reports.

The antivirus engine ClamAV classified these samples as 1,572 different malware types. Most of them were different bot variants, particularly of Poebot and Padobot. Of the 6,148 samples, ClamAV classified 3,863 as malicious, most likely because signatures for the remaining binaries weren't available. The antivirus engine should have classified 100 percent of the samples as malicious, but it detected only 62.8 percent in this case.

CWSandbox analyzed all these binaries in roughly 67

www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 37


Malware

Sample analysis report


As Figure A illustrates, CWSandbox analysis reports are quite detailed. They let analysts quickly estimate what a malware binary does and whether it needs to be further analyzed manually. With the behavior-based approach, we get a quick overview of the binary,

- <analysis cwsversion="Beta 1.83" time="22.11.2006 15:26:14"
      file="94c87c1c05d8f9628b789bced23f9ab3.exe"
      logpath="c:\analysis\log\94c87c1c05d8f9628b789bced23f9ab3.exe\run_1\">
  - <calltree>
    - <process_call filename="c:\94c87c1c05d8f9628b789bced23f9ab3.exe"
          starttime="00:00.046" startreason="AnalysisTarget">
      - <calltree>
          <process_call filename="C:\WINDOWS\system32\qodxih.exe C:\WINDOWS\system32\qodxih.exe
              1428 c:\94c87c1c05d8f9628b789bced23f9ab3.exe" starttime="00:05.765"
              startreason="CreateProcess"/>
        </calltree>
      </process_call>
    </calltree>
  - <processes>
    - <process index="1" pid="884" filename="c:\94c87c1c05d8f9628b789bced23f9ab3.exe"
          filesize="173595" md5="94c87c1c05d8f9628b789bced23f9ab3" username="foobar"
          parentindex="0" starttime="00:00.046" terminationtime="00:06.281"
          startreason="AnalysisTarget" terminationreason="NormalTermination" executionstatus="OK">
      + <dll_handling_section></dll_handling_section>
      + <filesystem_section></filesystem_section>
      + <mutex_section></mutex_section>
      + <registry_section></registry_section>
      + <process_section></process_section>
      + <system_info_section></system_info_section>

Figure A. CWSandbox analysis report. The report contains detailed information about the analyzed processes.

hours: the effective throughput was more than 500 binaries per day per instance, which is at least an order of magnitude faster than human analysis. An analyst can use the resulting report as a high-level overview and analyze the binary deeper manually, if necessary.

Of the 324 binaries that tried to contact an Internet relay chat (IRC) server, 172 were unique. Because we extracted information such as the IRC channel or passwords used to access the command and control servers from the samples, we were able to mitigate the botnet risk.

Additionally, 856 of the 6,148 samples contacted HTTP servers and tried to download additional data from the Internet. By observing how the malware handled the downloaded data, we learned more about the infection stages, which ranged from downloading executable code to click fraud (automated visits to certain Web pages).

We observed 78 malware binaries that tried to use the Simple Mail-Transfer Protocol (SMTP) as a communication protocol. We recorded the destination emails and the message bodies, so we got complete information about what the malware wanted to do, which let us develop appropriate countermeasures.

More than 95 percent of the malware binary samples added registry keys to enable autostart mechanisms. Mutexes are also quite common to ensure that only one instance of the malware binary is running on a compromised host. We commonly saw malware binaries copy themselves to the Windows system folder. These patterns let us automatically define suspect behavior, and we could extend CWSandbox to automatically classify binaries as normal or malicious on the basis of observed behaviors.

We've shown that it's possible to automate binary analysis of current Win32 malware using CWSandbox. Such a tool lets analysts learn more about current malware, and the resulting analysis reports help the analyst


which is generally sufficient to extract the most important information. You can view complete sample reports at www.cwsandbox.org, as well as submit samples for analysis. The system returns analysis reports via email.

      + <winsock_section></winsock_section>
    </process>
    - <process index="2" pid="1348" filename="C:\WINDOWS\system32\qodxih.exe C:\WINDOWS\system32\qodxih.exe
          1428 c:\94c87c1c05d8f9628b789bced23f9ab3.exe" filesize="-1" username="foobar"
          parentindex="1" starttime="00:05.765" terminationtime="02:00.781"
          startreason="CreateProcess" terminationreason="Timeout" executionstatus="OK">
      + <dll_handling_section></dll_handling_section>
      + <filesystem_section></filesystem_section>
      + <mutex_section></mutex_section>
      + <registry_section></registry_section>
      + <process_section></process_section>
      + <system_info_section></system_info_section>
      + <window_section></window_section>
      - <winsock_section>
        + <connections_unknown></connections_unknown>
        + <connections_listening></connections_listening>
        - <connections_outgoing>
          - <connection transportprotocol="TCP" remoteaddr="208.99.207.143"
                remoteport="8453" protocol="IRC" connectionestablished="1" socket="608">
            - <irc_data username="DEU|7907101" nick="DEU|7907101">
                <channel name="####test####" password="nikne" topic_deleted=":.join #a,#b,#c"/>
              </irc_data>
          </connection></connections_outgoing>

The report also includes information about changes to the file system, the Windows registry, and the data sent via Winsock.
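Because reports like Figure A's are regular XML, they lend themselves to automated post-processing. The sketch below pulls IRC command-and-control endpoints out of an abridged, hypothetical report whose element and attribute names follow Figure A; it uses Python's standard xml.etree and is not an official CWSandbox API:

```python
# Minimal sketch of machine-processing a CWSandbox-style report.
# Element/attribute names follow Figure A; the snippet is abridged
# and hypothetical.
import xml.etree.ElementTree as ET

REPORT = """
<analysis cwsversion="Beta 1.83" file="sample.exe">
  <processes>
    <process index="2" pid="1348" filename="C:\\WINDOWS\\system32\\qodxih.exe">
      <winsock_section>
        <connections_outgoing>
          <connection transportprotocol="TCP" remoteaddr="208.99.207.143"
                      remoteport="8453" protocol="IRC">
            <irc_data username="DEU|7907101" nick="DEU|7907101">
              <channel name="####test####" password="nikne"/>
            </irc_data>
          </connection>
        </connections_outgoing>
      </winsock_section>
    </process>
  </processes>
</analysis>
"""

def irc_endpoints(report_xml):
    """Collect (remoteaddr, remoteport, channel) for each IRC connection."""
    root = ET.fromstring(report_xml)
    found = []
    for conn in root.iter("connection"):
        if conn.get("protocol") != "IRC":
            continue
        for chan in conn.iter("channel"):
            found.append((conn.get("remoteaddr"),
                          int(conn.get("remoteport")),
                          chan.get("name")))
    return found

print(irc_endpoints(REPORT))
```

Extracting the C&C server address and channel this way is exactly the kind of information the authors used to mitigate botnet risk.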

determine whether a manual analysis is necessary. In the future, we plan to extend CWSandbox with kernel-based hooking, which will let us monitor kernel-mode rootkits and other kernel-based malware. Furthermore, we intend to explore the ways in which we can use the CWSandbox-generated reports for malware classification.

References
1. I. Ivanov, "API Hooking Revealed," The Code Project, 2002; www.codeproject.com/system/hooksys.asp.
2. Holy Father, "Hooking Windows API—Technics of Hooking API Functions on Windows," CodeBreakers J., vol. 1, no. 2, 2004; www.secure-software-engineering.com/index.php?option=com_content&task=view&id=54&Itemid=27.
3. P. Baecher et al., "The Nepenthes Platform: An Efficient Approach to Collect Malware," Proc. 9th Int'l Symp. Recent Advances in Intrusion Detection (RAID 06), LNCS 4219, Springer-Verlag, 2006, pp. 165–184.

Carsten Willems is a PhD student in the Laboratory for Dependable Distributed Systems at the University of Mannheim, Germany. His research interests include malware research, including the analysis of Win32 malware. Willems has an MS in computer science from RWTH Aachen University, Germany. He is the author of CWSandbox, a tool for automatic behavior analysis. His company, CWSE GmbH, deals with software development in IT security. Contact him at cwillems@consolo.de.

Thorsten Holz is a PhD student in the Laboratory for Dependable Distributed Systems at the University of Mannheim, Germany. His research interests include honeypots, botnets, malware, and intrusion detection systems. Holz has an MS in computer science from RWTH Aachen University, Germany. Contact him at thorsten.holz@informatik.uni-mannheim.de.

Felix Freiling is a professor of computer science and heads the Laboratory for Dependable Distributed Systems at the University of Mannheim, Germany. His research interests include the theory and practice of dependability. Freiling has a PhD in computer science from Darmstadt University of Technology, Germany. Contact him at freiling@informatik.uni-mannheim.de.
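The API hooking that drives CWSandbox (see references 1 and 2) can't be reproduced meaningfully in a few portable lines, but the interposition pattern behind it can: wrap a function so every call is recorded before being delegated to the original code. The Python below is a conceptual analogue only; the real system rewrites Win32 API entry points in-process, and `create_file` here is a made-up stand-in for a hooked API:

```python
# Conceptual analogue of API hooking: wrap a function so every call is
# logged before being delegated to the original code. This only
# demonstrates the interposition pattern, not Win32 inline hooking.
import functools

call_log = []

def hook(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append((func.__name__, args))   # record the call
        return func(*args, **kwargs)             # delegate to original
    return wrapper

@hook
def create_file(path, mode):
    """Hypothetical stand-in for a hooked file-creation API."""
    return f"handle:{path}:{mode}"

h = create_file("C:\\evil.exe", "w")
print(call_log)
```

A behavior report is then just a structured rendering of `call_log` accumulated across all monitored APIs, which is the essence of what the sandbox emits as XML.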


Using Entropy Analysis to Find Encrypted and Packed Malware

In statically analyzing large sample collections, packed and encrypted malware pose a significant challenge to automating the identification of malware attributes and functionality. Entropy analysis examines the statistical variation in malware executables, enabling analysts to quickly and efficiently identify packed and encrypted samples.

ROBERT LYDA
Sparta

JAMES HAMROCK
McDonald Bradley

Malware authors often use encryption or packing (compression) methods to conceal their malicious executables' string data and code. These methods—which transform some or all of the original bytes into a series of random-looking data bytes—appear in 80 to 90 percent of malware samples.1 This fact creates special challenges for analysts who use static methods to analyze large malware collections, as they must quickly and efficiently identify the samples and unpack or decrypt them before analysis can begin. Many tools, including the packing tools themselves, are generally successful at automatically unpacking or decrypting malware samples, but they're not effective in all cases. Oftentimes, the tools fail to recognize and reverse the transformation scheme or find the original entry point in the malware binary. Many malware samples thus remain packed or fully encrypted, and analysts must identify them for manual analysis and reverse engineering.

The difficulty of recognizing these transformed bytes can vary greatly, depending on the transformation scheme's strength and the original bytes' statistical nature. However, stronger transformation schemes—such as Triple DES encryption—typically produce less predictable sequences. This principle serves as the basis for Bintropy, a prototype binary-file entropy analysis tool that we developed to help analysts conveniently and quickly identify encrypted or packed malware. Bintropy operates in multiple modes and is applicable to any file. Here, we focus on files in the Windows Portable Executable (PE) format, the format of the majority of malware executables.

Bintropy uses an established entropy formula to calculate the amount of statistical variation of bytes in a data stream. Specifically, it sums the frequency of each observed byte value (00h–FFh) that occurs in fixed-length data blocks, and then applies the entropy formula to generate entropy scores. Higher entropy scores tend to correlate with the presence of encryption or compression. Further, to compensate for the variety of known packing and encryption tools and the varying degree of transformations they produce, we developed a methodology that uses Bintropy to discriminate between native executables and those that have been packed or encrypted. Our methodology leverages the results obtained from training Bintropy over different sets of executable file types to derive statistical measures that generalize each file type's expected entropy ranges. The methodology compares the malware executable's entropy traits—which Bintropy computes—against the expected ranges to determine if the malware is packed or encrypted.

Here, we describe the Bintropy tool and methodology. We also discuss trends associated with malware encryption and packing, which we discovered by applying the tool and methodology to a corpus of 21,576 PE-formatted malware executable files obtained from a leading antivirus vendor.

Approach and technical analysis
Following a description of entropy and its measurement, we describe how we use entropy analysis to identify packed or encrypted malware executables. We then offer results from testing our methodology.

Entropy analysis
Information density, or entropy, is a method for measur-

40 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

ing uncertainty in a series of numbers or bytes.2 In technical terms, entropy measures the level of difficulty or the probability of independently predicting each number in the series. The difficulty in predicting successive numbers can increase or decrease depending on:

• the amount of information the predictor has about the function that generated the numbers, and
• any information retained about the prior numbers in the series.

For example, suppose we had a sequence of n consecutive numbers in a Fibonacci series, which is a sequence of numbers computed by adding the successive sums of the preceding two numbers in the series. If we had knowledge of how the Fibonacci function worked or saw enough numbers in the series to recognize the pattern, we could predict the series' next number with absolute certainty. In effect, the entropy changes when the predictor applies prior knowledge or relevant knowledge gained to determine the probabilities of successive numbers in the series. Thus, receiving information about the generator function reduces the entropy by the value of the information received.3

Although a sequence of good random numbers will have a high entropy level, that alone doesn't guarantee randomness. For example, a file compressed with a software compressor—such as gzip or winzip—might have a high entropy level, but the data is highly structured and therefore not random.4 Simply observing entropy will not necessarily provide enough information to let the observer distinguish between encryption and compression unless the observer knows how the data was generated. We can compute the entropy of a discrete random event x using the following formula2:

H(x) = -\sum_{i=1}^{n} p(i) \log_2 p(i),

where p(i) is the probability of the ith unit of information (such as a number) in event x's series of n symbols. This formula generates entropy scores as real numbers; when there are 256 possibilities, they are bounded within the range of 0 to 8.

Bintropy: A binary entropy analysis tool
Bintropy is a prototype analysis tool that estimates the likelihood that a binary file contains compressed or encrypted bytes. Specifically, the tool processes files by iterating through fixed-length data blocks in the binary, summing the frequency of each block's observed byte values (00h–FFh). From this, it calculates the block's entropy score. In addition to individual block entropy scores, Bintropy calculates other entropy-related file attributes, including the average and highest entropy scores. Finally, Bintropy formulates an overall confidence score by using a rule-based methodology to analyze the block entropy scores against a set of predefined entropy attribute metrics.

Bintropy has two modes of operation. In the first, the tool analyzes the entropy of each section of PE-formatted executables, as specified in the executable's header. This helps analysts determine which executable sections might be encrypted or packed. A standard compiler-generated PE executable has standard sections (such as .text, .data, .reloc, and .rsrc). However, many packing tools modify the original executable's format, compressing the standard sections' code and data and collapsing them into one or two new sections. In this mode, Bintropy calculates an entropy score for each section it encounters. It doesn't calculate a score for the header section, which in our experience is unlikely to contain encrypted or compressed bytes.

Bintropy's second operational mode completely ignores the file format. Instead, it analyzes the entire file's entropy, starting at the first byte and continuing through to the last. With a PE-formatted file, users can thus analyze the entropy of code or data hidden at the end of a file or in between PE-defined sections (cavities), which is where stealthy file-infecting samples, such as W32/Etap (http://vil.nai.com/vil/content/v_99380.htm), typically hide.

Entropy metrics and confidence-scoring methodology
There are currently hundreds of different packing algorithms, each of which employs popular compression and/or encryption algorithms such as Huffman, LZW, and polymorphism to protect executable files.5 However, our entropy analysis objective is not to model the transformations of any specific packing or encryption tool. Rather, it is to develop a set of metrics that analysts can use to generalize the packed or encrypted executable's entropy attributes and thus distinguish them from native (nonpacked or unencrypted) ones. As such, our methodology computes entropy at a naïve model level, in which we compute entropy based only on an executable byte's occurrence frequency, without considering how the bytes were produced.

Experiments. To develop a set of entropy metrics, we conducted a series of controlled experiments using the Bintropy tool. Our goal was to determine the optimal entropy metrics for native executable files and files con-


Table 1. Computed statistical measures based on four training sets.

DATA SET                 AVERAGE ENTROPY   99.99% CONFIDENCE INTERVAL   HIGHEST ENTROPY (AVERAGE)   99.99% CONFIDENCE INTERVAL
Plain text               4.347             4.066–4.629                  4.715                       4.401–5.030
Native executables       5.099             4.941–5.258                  6.227                       6.084–6.369
Packed executables       6.801             6.677–6.926                  7.233                       7.199–7.267
Encrypted executables    7.175             7.174–7.177                  7.303                       7.295–7.312
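As a rough sketch of the block-entropy computation and the decision rule derived from Table 1 (256-byte blocks, "valid" blocks with at least half nonzero bytes, and the 6.677/7.199 lower bounds for average and highest block entropy), one might write something like the following. Bintropy itself is not public, so the details are our reading of the article's description:

```python
# Sketch of a Bintropy-style check (our reading of the article's
# description; not the actual tool): Shannon entropy over 256-byte
# blocks, skipping "invalid" blocks that are at least half zero bytes,
# then the Table 1 rule: avg > 6.677 and max > 7.199 => packed/encrypted.
import math
from collections import Counter

BLOCK = 256
AVG_BOUND, MAX_BOUND = 6.677, 7.199   # lower CI bounds for packed files

def block_entropy(block):
    """Shannon entropy H = -sum p(i) log2 p(i), bounded by [0, 8]."""
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_packed(data):
    scores = []
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        if sum(1 for b in block if b != 0) >= len(block) / 2:  # valid block
            scores.append(block_entropy(block))
    if not scores:
        return False
    avg, high = sum(scores) / len(scores), max(scores)
    return avg > AVG_BOUND and high > MAX_BOUND

# A block holding each byte value exactly once has entropy exactly 8.
print(looks_packed(bytes(range(256)) * 4))   # -> True
print(looks_packed(b"A" * 1024))             # -> False (entropy 0)
```

Note how the valid-block filter implements the article's compensation for compiler-generated zero padding: all-zero alignment blocks never drag down the file's average score.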

taining data transformations produced by encryption and packing algorithms. The experiments consisted of four separate tests, with training data sets for native, compressed, and encrypted executable files, as well as a set of plain text files for additional comparison.

The native training set consisted of 100 Windows 32-bit PE executables, which we alphabetically selected from the default "systems" folder on a Windows XP Service Pack 2 OS environment. The packed training set represented a diverse set of packing algorithms; we generated these executables by applying UPX (http://upx.sourceforge.net), MEW1.1 (http://northfox.uw.hu/index.php?lang=eng&id=dev), and Morphine 1.2 (www.hxdef.org) packing transformations to three separate copies of the native executables. To generate the encrypted training set, we applied Pretty Good Privacy (PGP; www.pgpi.org/doc/pgpintro) file encryption to the native executables.

We also performed a series of tests using different-sized blocks. The tests determined that 256 bytes is an optimal block size. Tests using larger block sizes, such as 512 bytes, tended to reduce the subjects' entropy scores when encryption existed only in small areas of the executable. Our experiments also showed that executables generally contain many blocks of mostly (or all) zero-value data bytes, which compilers commonly generate to pad or align code sections. This technique can greatly reduce an executable's entropy score, because it increases the frequency of a single value. To compensate for this characteristic, we altered Bintropy to analyze only "valid" byte blocks—that is, blocks in which at least half of the bytes are nonzero.

Results. We applied the Bintropy tool to each of the four training sets to compute individual entropy measures for each set's files. We configured Bintropy to process files in 256-byte-sized blocks and to ignore the executables' format. For each file, the tool computed an average entropy score and recorded the highest block entropy score. Using standard statistical measures, we then aggregated the data for each data set, computing each set's respective entropy and highest entropy score averages. For each data set's average and highest entropy scores, we computed a confidence interval—the interval between two numbers (low and high bounds) with an associated probability p. We generated this probability from a random sampling of an underlying population (such as malware executables), such that if we repeated the sampling numerous times and recalculated each sample's confidence interval using the same method, a proportion p of the confidence intervals would contain the population parameter in question.

Using the training data results (see Table 1), we derive two entropy metrics based on the computed confidence intervals for average and highest entropy. Using a 99.99 percent confidence level, executables with an average entropy and a highest entropy block value of greater than 6.677 and 7.199, respectively, are statistically likely to be packed or encrypted. These two values are the lower confidence interval bounds of the entropy measures we computed for packed executables, and they form the basis for our methodology for analyzing malware executables for the presence of packing or encryption. If both the Bintropy-computed average file entropy and highest entropy block score exceed these respective values, we label a malware executable as packed or encrypted.

Limitations of the approach
It's infeasible to absolutely determine if a sample contains compressed or encrypted bytes. Indeed, Bintropy can produce both false positives and false negatives. False negatives can occur when large executables—those larger than 500 Kbytes—contain relatively few encrypted or compressed blocks and numerous valid blocks, thereby lowering the executable file's average entropy measure. False positives can occur when processing blocks that score higher than the packing confidence interval's lower bound, but the blocks' bytes contain valid instruction sequences that coincidentally have a high degree of variability.

Using the statistical results computed from our packed training data set, we calculated our confidence-interval-based methodology's expected false positive rate. We treated the packed data sets as a matched-pairs t-distribution because the underlying system files were the same and, therefore, the only randomization was that of the packer. We used these intervals as the entropy filter on unknown input samples. We applied a standard t-test to the data set and calculated a Type I error's significance level—in this case, the false positive rate—to be 0.038


percent. This value indicates the likelihood of an unpacked or unencrypted sample passing through the confidence-interval filter. To compute the expected false negative rate, we calculated the statistical likelihood that a packed sample falls outside our 99.99 percent confidence-interval range (that is, within the .01 percent range). Because we're concerned only about packed executables with entropy scores below the interval's lower bounds, we halve this value to obtain the expected false negative rate of .005 percent.

Finally, sophisticated malware authors could employ countermeasures to conceal their encryption or compression use. For example, they could make encrypted bytes generated using strong cryptography look less random by padding the bytes with redundant bytes. They could also exploit our approach by distributing encrypted bytes among invalid blocks, such that the blocks remained invalid. Such countermeasures could reduce the executable's entropy score, and thereby limit Bintropy's effectiveness.

Entropy trends
As mentioned, we identified entropy trends by running Bintropy and applying our confidence-interval methodology to a corpus of 21,567 malware Windows 32-based PE executables from a leading antivirus vendor's collection from January 2000 to December 2005. We organized the resulting malware samples by the year and month in which the vendor discovered them.

Our analysis computes the entropy of a malware executable's PE sections and analyzes the trends of packed or encrypted PE sections (as identified by our methodology) across a large time span. With the exception of the header section, we analyzed all sections identified in the malware executables. Further, we performed the analysis without regard to any particular section's purpose or "normal" use. This enabled us to analyze code or data hidden in or appended to any of the identified sections.

Our trend analysis features the top 13 PE sections that exceeded the packing lower-bound confidence interval threshold in aggregate. These sections comprise eight standard PE sections and five packer-generated sections. The eight standard sections—.text, .data, .rsrc, .reloc, .rdata, .idata, CODE, and DATA—are created by default by most PE-generating compilers. The remaining five sections—.aspack, UPX1, UPX2, pec1, and pec2—are created by packing technologies that replace the default PE-formatted sections and their bytes with custom ones. However, because other packers reuse the default sections, the packers that created the nondefault sections highlighted in our analysis are not necessarily the most prevalent packers in use.

Section packing trends over time
Figure 1 shows which sections were the most often packed or encrypted for a given year. We calculated each section's percentages by totaling the section's encrypted or packed occurrences and dividing that number by the total number of sections that were packed or encrypted that year.

Figure 1. Percentage of encrypted or packed sections over a six-year period. UPX1 was the most prevalent of the packed sections across the period, followed by the .text section.

In 2000, .reloc was the most often packed or encrypted section. This section's popularity steadily declined across the remaining years, with only a slight increase in 2005. The second most-packed section in 2000 was UPX1, which is generated by the very popular UPX packing tool. Due to UPX's prevalent use in numerous W32/GAOBOT and W32/SPYBOT variants, the presence of the UPX1 section in the data set increased over the next few years, peaking in 2002. Thereafter, its prevalence steadily decreased, but UPX1 remained the most popular of all the sections we identified as packed across this six-year period, followed by the .text section, which is where the compiler writes most of a program's executable code.

Evidence of packing in the .rdata section increased in popularity from 2000 through 2005. In 2000 and 2001, packing of the .text, .data, and .rsrc sections was very prevalent; it decreased in 2002, and the sections then steadily increased to peak in 2005. Packing in the CODE, DATA, and .idata sections shows no clear trends over the study period. UPX2, pec1, and pec2 were the least prevalent of the sections we identified as being packed; they were at their lowest in 2003 and were relatively more popular at the time period's beginning and end.

Additional packing trends
Figure 2 shows the annual number of packed or en-


crypted sections for each section type. This graph clearly shows the packing or encrypting trends of particular sections across the years. One of the most notable trends is the increased packing of the .text, .data, .rsrc, and especially the .rdata sections across the period. It also shows UPX1's overall prevalence and the lack of a perceivable trend for the CODE, DATA, and .idata sections.

Figure 2. Number of encrypted sections by year. Packing of the .text section increased the most dramatically across the period.

Figure 3 is an accumulation of Figure 2's data, showing the most commonly packed and encrypted sections over the six-year period. This graph helps delineate each section's exact popularity over the entire period compared to the previous graphs. As Figure 3 shows, the six most commonly packed sections, in order, were .text, UPX1, .data, .rsrc, .idata, and .rdata.

Figure 3. Total number of encrypted sections over the six-year study period. Packing of the .text and UPX1 sections was the most prevalent during this time period.

Figure 4 depicts each section's average entropy high-scores attribute accumulated over the six-year period. This graph accounts for only those samples that were above the packing confidence interval's lower bound. The overall average entropy value for all other nonpacked sections was approximately 4.1. The graph paints a fairly consistent picture: entropy levels increased from 2000 through 2005 for nearly every section and type of packing/encryption. The exceptions were the DATA section and pec1, which trended up-down-up, and pec2, which trended down. This data indicates that the variability of the bytes that the packing and encryption technologies produced generally increased throughout the six-year period.

Overall, the Bintropy tool proved useful for analyzing and generating statistics on malware collections that contained packed or encrypted samples. It analyzed PE sections of the malware binaries in detail, providing a quick statistical prediction of the existence of significant data randomization, thereby accurately identifying packed or encrypted malware samples. The tool was also successful in identifying encrypted sections and providing statistical data on large-sized malware sample collections at a low level of detail.

The advantage of using entropy analysis is that it offers a convenient and quick technique for analyzing a sample at the binary level and identifying suspicious PE file regions. Once the analysis identifies sections of abnormal entropy values, analysts can perform further detailed analysis with other reverse-engineering tools, such as the IDAPro disassembler.

Our research goal was to develop a coarse-grained methodology and tool to identify packed and encrypted executables. However, a more fine-grained approach might be useful in identifying the particular transformation algorithms that malware authors apply to their malware. To improve Bintropy's entropy computation beyond simple frequency counting, such an approach might further examine the algorithms and the statistical attributes of the transformations they produce to develop profiles or heuristics for fingerprinting their use in malware.

Acknowledgments
The authors especially thank Jim Horning, David Wells, and David Sames for their technical input and constructive feedback, which helped to significantly improve this article.

References
1. T. Brosch and M. Morgenstern, "Runtime Packers: The Hidden Problem," Proc. Black Hat USA, Black Hat, 2006; www.blackhat.com/presentations/bh-usa-06/BH-US-06-Morgenstern.pdf.
2. C.E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. of Illinois Press, 1963.
scores attribute accumulated over the six-year period. of Communication, Univ. of Illinois Press, 1963.


Related work

Using entropy to measure randomness or unpredictability in an event sequence or series of data values is a well-accepted statistical practice in the fields of thermodynamics and information theory.1,2 In malicious code analysis, researchers have used entropy analysis in various applications. Julien Olivain and Jean Goubault-Larrecq developed the Net-Entropy tool to identify anomalous encrypted network traffic that might indicate a network-based attack.3

Closer to our own research, a few tools analyze Portable Executable (PE) file entropy. WinHex (www.winhex.com/winhex/analysis.html) is a commercially available tool that uses entropy to identify common file types, including plain text, jpeg, and binary. Portable Executable Analysis Toolkit (PEAT) is a tool suite that lets analysts examine a Windows PE file's structural aspects.4 PEAT provides byte-value entropy scores for each PE segment's partitioned section. It then normalizes these entropy values against each window's total entropy. This helps analysts identify section portions that drastically change in entropy value, indicating section-alignment padding or some other alteration of the original file. To use PEAT effectively, analysts must have some domain knowledge about PE files, viruses, and other system-level concepts, as well as some experience working with PEAT.

We've extended PEAT's segment entropy score approach and created a detection tool for automatically identifying encrypted or packed PE executables with a certain degree of confidence. Bintropy has a similar fidelity of analysis capability, but accumulates the information to provide a quick statistical prediction for the existence of significant data randomization, which indicates encryption or packing in large file collections that include executable files. Analysts can also use Bintropy to perform a more in-depth analysis of any particular PE-formatted file section.

Finally, a group of hackers developed and maintains the PEiD analysis tool (http://peid.has.it). According to its Web site, the tool can identify the signatures of more than 600 packing tools. PEiD has an option for analyzing executable files' entropy. However, its developers don't provide detailed documentation on its implementation or underlying methodology.

References
1. C.E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. of Illinois Press, 1963.
2. R. Clausius, "On the Application of the Theorem of the Equivalence of Transformations to Interior Work," communicated to the Naturforschende Gesellschaft of Zurich, Jan. 27th, 1862; published in the Viertaljahrschrift of this Society, vol. vii., p. 48.
3. J. Olivain and J. Goubault-Larrecq, Detecting Subverted Cryptographic Protocols by Entropy Checking, research report LSV-06-13, Laboratoire Spécification et Vérification, June 2006; www.lsv.ens-cachan.fr/Publis/RAPPORTS_LSV/PDF/rr-lsv-2006-13.pdf.
4. M. Weber et al., "A Toolkit for Detecting and Analyzing Malicious Software," Proc. 18th Ann. Computer Security Applications Conf., IEEE CS Press, 2002, pp. 423–431.

3. R.W. Hamming, Coding and Information Theory, 2nd ed., Prentice-Hall, 1986.
4. M. Haahr, "An Introduction to Randomness and Random Numbers," Random.org, June 1999; www.random.org/essay.html.
5. A. Stephan, "Improving Proactive Detection of Packed Malware," Virus Bulletin, 01 Mar. 2006; www.virusbtn.com/virusbulletin/archive/2006/03/vb200603-packed.

[Figure 4 is a chart of entropy high scores (y-axis, 6.8 to 7.4) for the years 2000 through 2005, by section type (x-axis): .text, .data, .rsrc, .reloc, .rdata, .idata, CODE, DATA, .aspack, UPX1, UPX2, pec1, pec2.]

Figure 4. Annual average entropy high scores for each section type. The technologies' strengths generally increased over the study period.

Robert Lyda is a research engineer at Sparta, where he analyzes malicious code for government and law enforcement agencies. In addition to malware trend and technology assessments, he provides such agencies with detailed reporting of specific malware samples using static and dynamic analysis techniques. His research interests include applying machine-learning mechanisms for classifying malware samples based on statically observable features. He has a BS in computer science from the University of Maryland, College Park. Contact him at robert.lyda@sparta.com.

Jim Hamrock is a software engineer with McDonald Bradley, where he is a leading researcher in malware-analysis trends, applying mathematical and statistical models to study patterns and trends in large sample collections. His research interests include developing algorithms and software analysis tools and reverse engineering of malware samples. He has an MS in applied mathematics from Johns Hopkins University. Contact him at jhamrock@mcdonaldbradley.com.




Code Normalization
for Self-Mutating Malware

Next-generation malware will adopt self-mutation to


circumvent current malware detection techniques. The
authors propose a strategy based on code normalization
that reduces different instances of the same malware into a
common form that can enable accurate detection.

DANILO BRUSCHI, LORENZO MARTIGNONI, AND MATTIA MONGA
Università degli Studi di Milano

Most of today's commercial malware-detection tools recognize malware by searching for peculiar sequences of bytes. Such byte strings act as the malware's "fingerprint," enabling detection tools to recognize it inside executable programs, IP packet sequences, email attachments, digital documents, and so on. Thus, these patterns are usually called malware signatures. Detectors assume that these signatures won't change during the malware's lifetime. Accordingly, if the fingerprint changes (that is, if the malware code mutates), detection tools can't recognize it until the malware-detection team develops a new fingerprint and integrates it into the detector. To defeat signature-based detection, attackers have introduced metamorphic malware—that is, a self-mutating malicious code that changes itself and, consequently, changes its fingerprint automatically on every execution.1 (Code obfuscation involves transforming a program into a semantically equivalent one that's more difficult to reverse engineer. Self-mutation is a particular form of obfuscation in which the obfuscator and the code being obfuscated are the same entity.) Although this type of malware hasn't yet appeared in the wild, some prototypes have been implemented (for example, W32/Etap.D, W32/Zmist.DR, and W32/Dislex), demonstrating these mutation techniques' feasibility and efficacy against traditional antivirus software. Recent work also demonstrates that attackers can easily circumvent current commercial virus scanners by exploiting simple mutation techniques.2,3

Perfect detection of self-mutating malware is an undecidable problem—no algorithm can detect with complete accuracy all instances of a self-mutating malware.4 Despite such negative results, we strongly believe that from a practical viewpoint, reliable detection is possible. We base this belief on several considerations, which we outline in a later section. On the basis of these observations, and considering a series of advances in malware detection presented in the literature (see the sidebar), we developed our own approach for dealing with self-mutating code that uses static analysis to exploit the weaknesses of the transformations the self-mutating malware have adopted. The result is a prototype tool that, in the cases we've considered, can determine the presence of a known self-mutating malicious code, the guest, within an innocuous executable program, the host.

Mutation and infection techniques
Executable object mutations occur via various program-transformation techniques.5–7 Basically, malware mutates itself and hides its instructions within the host's benign program code.

Mutation techniques
A malware can adopt several common strategies to achieve code mutation. As it applies these transformations several times randomly, each one introduces further perturbations locally, to a basic block, as well as globally to the control-flow graph. When the malware introduces a fake conditional jump, for example, subsequent transformations obfuscate the way in which the program computes the predicate, and so on.

Instruction substitution. A sequence of instructions is associated with a set of alternative instruction sequences that are semantically equivalent to the original one. The

46 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Related work in malware-detection techniques


Frédéric Perriot first proposed the idea of leveraging code-optimization techniques to aid malicious code detection in 2003.1 Building on that work, we show in the main article that the transformations current malicious codes adopt are weak and can be reverted to reduce different instances of the same malware into the same canonical form. We can then use this form as a pattern for detection.

The malware detector Mihai Christodorescu proposes2 is based on the idea of semantic-aware signatures and leverages static analysis and decision procedures to identify common malicious behaviors that are shared among different malware variants but are generally obfuscated to protect them from standard detection techniques. Thus, a malware detector can use one signature to detect a big class of malicious codes that share the same common behaviors. Our work, instead, focuses on code normalization because we believe it's fundamental to identifying equivalent code fragments generated through self-mutation.

Other authors' works perform malware normalization via term rewriting.3 The rewriting system proposed is suited to the rules used by self-mutating engines and can help normalize different malware instances generated using instruction substitution and irrelevant instructions insertion. Our approach doesn't guarantee perfect equivalence among different normal forms (that is, different malware instances can still perform the same operation using different machine instructions), but it can remove unnecessary computations and recover the original control-flow graph. Moreover, our approach is independent of the particular rules for self-mutation the malware adopts, so we don't need any knowledge about the malware to normalize its instances.

Different researchers propose the idea of assessing program equivalence by comparing control-flow or call graphs. In one work,4 the similarity measure obtained from comparing the call graph helps automatically classify malicious code in families; this classification is coherent with the actual malware naming. In another,5 researchers measure program functions' similarity by comparing fingerprints generated from their control-flow graphs to each other and then using them to identify the same function within different executables. Unfortunately, we can't use these approaches in our context for two reasons:

• The malicious code can be located anywhere, even inside another procedure, so we should formulate the detection as a subgraph isomorphism and not as a graph isomorphism (or a simplified instance, such as comparing graphs' fingerprints).
• One main, observable difference among different instances of the same self-mutating malicious code is in the control graph's structure; thus, normalization is a fundamental step toward performing effective control-flow graph matching.

Finally, the idea of generating a fingerprint from the executable control-flow graph and using it to detect different instances of the same polymorphic worm is proposed in another work,6 from which we adopted our labeling technique. Their fingerprinting technique, however, suffers the latter of the two problems just described.

References
1. F. Perriot, "Defeating Polymorphism Through Code Optimization," Proc. Virus Bulletin Conf. 2003, Virus Bulletin, 2003, pp. 142–159.
2. M. Christodorescu et al., "Semantics-Aware Malware Detection," Proc. 2005 IEEE Symp. Security and Privacy, IEEE CS Press, 2005, pp. 32–46.
3. A. Walenstein et al., "Normalizing Metamorphic Malware using Term Rewriting," Proc. Int'l Workshop on Source Code Analysis and Manipulation (SCAM), IEEE CS Press, 2006, pp. 75–84.
4. E. Carrera and G. Erdélyi, "Digital Genome Mapping—Advanced Binary Malware Analysis," Proc. Virus Bulletin Conf., Virus Bulletin, 2004, pp. 187–197.
5. H. Flake, "Structural Comparison of Executable Objects," Proc. Conf. Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), IEEE CS Press, 2004, pp. 161–173.
6. C. Kruegel et al., "Polymorphic Worm Detection using Structural Information of Executables," Proc. Int'l Symp. Recent Advances in Intrusion Detection, Springer, 2005, pp. 207–226.

malware can replace every occurrence of the original sequence with an arbitrary element from this set.

Instruction permutation. Independent instructions—that is, those whose computations don't depend on previous instructions' results—can be arbitrarily permuted without altering the program's semantics. For example, the malware can execute the three statements a = b * c, d = b + e, and c = b & c in any order, provided that the use of the c variable precedes its new definition.

Garbage insertion. This is also known as dead-code insertion, and involves the malware inserting, at a particular program point, a set of valid instructions that don't alter its expected behavior. Given the following sequence of instructions a = b / d, b = a * 2, for example, the malware can insert any instruction that modifies b between the first and the second instruction. Moreover, instructions that reassign any other variables without really changing their value can be inserted at any point of the program (for example, a = a + 0, b = b * 1, ...).

Variable substitutions. The malware can replace a variable (register or memory address) with another variable belonging to a set of valid candidates preserving the program's behavior.

Control-flow alteration. Here, the malware alters the order of the instructions as well as the program's structure by introducing useless conditional and unconditional




branch instructions such that at runtime, the order in which the program executes single instructions isn't modified. Furthermore, the malware can translate direct jumps and function calls into indirect ones whose destination addresses are camouflaged through other instructions in order to prevent an accurate reconstruction of the control flow.

Infection techniques
To camouflage the link between the benign host and the malicious guest, the malware tangles each one's instructions together by exploiting smart techniques referred to as entry-point obfuscation (see http://www3.ca.com/securityadvisor/glossary.aspx). Once the malware achieves a seamless integration, the host invokes the new guest code in a way very similar to how it invokes other portions of its own code. Moreover, the host will continue to work exactly as it did before the infection.
The malware can adopt different techniques—which require minimal effort to implement—to achieve this goal.

Cavity insertion. Generally, executables contain several portions that aren't used to store data or code (compilers can introduce them to align code and data structure). The malware identifies these "cavities" and uses them to insert small pieces of malicious code that the host will execute after minor modifications of its code.

Jump-table manipulation. In a compiled program, high-level control-flow transfer constructs such as switch are implemented using jump tables to redirect the execution to the appropriate location according to the table's contents and the value that the control variable assumes. The malware can modify entries of such tables to get the host to redirect the execution anywhere.

Data-segment expansion. To create the required space inside the host code's address space, malware can expand some of the host segments as needed. Not all segments are suited for expansion because that would require relocating most of the code. Other segments, such as the one storing uninitialized data, seem to be more appropriate because their expansion allows the malware to insert malicious code without requiring further modification of the host code.

Proposed detection strategy
To be effective, malware mutations must be automatic and efficient; otherwise, their mutation computations would be too difficult to hide. More precisely,8 a self-mutating malware must be able to analyze its own body and extract from it all the information needed to mutate itself into the next generation, which in turn must be able to perform the same process, and so forth. Because mutations occur directly on machine code, and the mutation engine is embedded directly into the malicious code itself, the applicable transformations must, in most cases, be rather trivial. Consequently, we can iteratively reverse the mutation process until we obtain an archetype (that is, the original and unmutated version of the program from which the malware derives other instances). Experimental observations show that self-mutating programs are basically highly unoptimized code, containing a lot of redundant and useless instructions.
Our detection process is thus divided into three steps: code interpretation, code normalization, and code comparison. Figure 1 illustrates the entire detection process: the detector's input is an executable program that hosts a known malicious code. The code interpreter then transforms the program into an intermediate form, which is then normalized. Finally, the code comparator analyzes the normalized form to identify possible matches of the known malicious code within the normalized program.

[Figure 1 is a flowchart: an alleged infected program feeds code interpretation, then code normalization (expression propagation, dead-code elimination, expression simplification, indirections resolution, and control-flow normalization, applied iteratively), yielding normalized code that code comparison checks against known malicious code to answer "Match archetype?"]

Figure 1. Overview of the detection process. Starting with a suspicious program, we translate it into the normal form, normalize it, and then compare the normalized version with known malicious programs.

Code interpretation
To ease the manipulation of object code, our detector uses a high-level representation of machine instructions




to express every opcode's operational semantics, as well as the registers and memory addresses involved. Code interpretation considerably simplifies the subsequent subprocesses because normalization and comparisons will target a language with a very limited set of features and instructions. In fact, the instructions composing the languages are just assignments, function calls, jumps (conditional and unconditional), and function returns. The instruction operands can be only registers, memory addresses, and constants.
Table 1 shows a simple example, reduced from the original due to space constraints. In the table, the notation [r10] denotes the content of the memory whose effective address is in the register r10, and the notation r10@[31:31] denotes a slice of the register r10's bits (here, just bit 31). Note that even the simple dec instruction conceals a complex semantics: its argument is decremented by one, and the instruction, or the subtraction, produces an update of six control flags according to the result.

Table 1. An example of code interpretation.

MACHINE INSTRUCTION    INTERPRETED INSTRUCTION
pop %eax               r10 = [r11]
                       r11 = r11 + 4
lea %edi, [%ebp]       r06 = r12
dec %ebx               tmp = r08
                       r08 = r08 - 1
                       NF = r08@[31:31]
                       ZF = [r08 = 0?1:0]
                       CF = (~(tmp@[31:31])...
                       ...

Code normalization
The code normalizer's goal is to transform a program into a canonical form that's simpler in terms of structure or syntax, while preserving the original semantics. We observed that most of the transformations a malware uses to dissimulate its presence lead to underoptimized versions of the archetype. The mutated versions grow because they're stuffed with irrelevant computations whose presence only has the goal of avoiding recognition. Normalization thus aims to transform the code into a more compact form by reducing the number of useless instructions, as well as the number of indirect paths. We can consequently view normalization as an optimization of the malicious code aimed at simplifying the code structure.
The normalization process consists of a set of transformations that compilers adopt to optimize the code and improve its size.9–11 They all rely on the information collected during a static analysis phase (control-flow and data-flow analysis), which we perform on the code at the beginning of the normalization process. The more accurate the static analysis, the higher the chances of applying these transformations.
Given that the malicious code can be anywhere inside the host program, we perform the normalization on the whole program, letting the normalization process also target the malicious guest. As Figure 1 shows, all transformations are repeated iteratively because they depend on each other; normalization will stop when we can no longer apply any of these transformations to the code.

Expression propagation. The Intel IA-32 assembly instructions denote simple expressions that generally have no more than one or two operands. Propagation carries forward values assigned or computed by intermediate instructions. This lets us generate higher-level expressions (with more than two operands) and eliminate all intermediate temporary variables that the malware used to implement high-level expressions. The code fragment in Table 2 shows a simple scenario in which, thanks to propagation, our detector generates a higher-level expression.

Table 2. A scenario generating a high-level expression.

ORIGINAL EXPRESSION     RECONSTRUCTED EXPRESSION
r10 = [r11]             r10 = [r11]
r10 = r10 | r12         r10 = [r11] | r12
[r11] = [r11] & r12
[r11] = ~[r11]
[r11] = [r11] & r10     [r11] = (~([r11] & r12)) & ([r11] | r12)

Dead-code elimination. Dead instructions are those whose results the program never uses. We remove them from a program because they don't contribute to the computation. In Table 2, the first instruction, after propagation, is completely useless because the intermediate result has been propagated directly into the next instruction.
We also classify assignments as dead instructions when they define a variable but don't change its value (for example, r10 = r10 + 0).

Expression simplification. Most of the expressions contain arithmetical or logical operators, so they can sometimes be simplified automatically according to ordinary algebraic rules. When simplification isn't possible, our tool can reorder variables and constants to enable further simplification after propagation.
Simplification becomes very useful when applied to expressions representing branch conditions and memory addresses because it lets us identify tautological conditions as well as constant memory addresses somehow camouflaged through more complex expressions.

Indirection resolution. Assembly code is generally rich with indirect memory accesses and control transfers; for example, during compilation, switch-like statements are translated into jumps through an indirect jump table. When we encounter indirect control transfers during code interpretation, the currently available information isn't sufficient for us to estimate the set of admissible jump targets. During code normalization, however, we can "guess" some of the targets based on information collected through static analysis and subsequently elaborated during the transformations. Once identified, these new code segments are ready for analysis: we invoke the code interpretation on this code, and the normalizer then processes the output (purple line in Figure 1).

Control-flow normalization. A malware can significantly twist a program's control flow by inserting fake conditional and unconditional jumps. A twisted control flow can affect the quality of the entire normalization process because it can limit other transformations' effectiveness. At the same time, other transformations are essential for improving the quality of the control-flow graph's normalization (for example, algebraic simplifications and expression propagation). We can perform different types of normalizations on the control flow; for example, with code straightening, a straight sequence of instructions can replace a chain of unconditional jump instructions. During spurious path pruning, we can prune dead paths arising from fake conditional jump instructions from the control flow to create new opportunities for transformation.

An example. To better explain when and how we can apply the transformations composing the normalization process, we present a simple example. Figure 2 shows a small fragment of malicious code as well as the code obtained during the normalization process's intermediate iterations. The code in Figure 2a is unnecessarily complex, and we can translate it into a simpler form: the code contains a tautological conditional branch in addition to an indirect jump, whose target address we can evaluate statically.

(a) Original malicious code:
1  xor %ebx,%ebx
2  mov $0x10004014,%eax
3  mov %eax,0x1000400c
4  add %ebx,%eax
5  test %ebx,%ebx
6  jne <T>
7  push %ebx
8  mov $0x0,%ebx
T:
9  jmp *%eax
10 leave
11 ret
12 nop

(b) Intermediate form:
1  r11 := r11 ^ r11
2  r10 := 0x10004014
3  [0x1000400c] := r10
4  r10 := r10 + r11
5  tmp = r11 - r11
6  ZF = [tmp = 0?1:0]
7  jump (ZF = 1) T
8  [r15] := r11
9  r15 := r15 - 4
10 r11 := 0
T:
11 jump r10
12 r15 := r16
13 r16 := m[r15]
14 r15 := r15 + 4
15 return

(c) After simplification and propagation:
1  r11 := 0
2  r10 := 0x10004014
3  [0x1000400c] := 0x10004014
4  r10 := 0x10004014 + 0
5  tmp = 0 - 0
6  ZF = [0 = 0?1:0]
7  jump (ZF = 1) T
8  [r15] := 0
9  r15 := r15 - 4
10 r11 := 0
T:
11 jump 0x10004014 + 0
12 r15 := r16
13 r16 := m[r16]
14 r15 := r15 + 4
15 return

(d) Normal form:
1  r11 := 0
2  r10 := 0x10004014
3  [0x1000400c] := 0x10004014
4  r10 := 0x10004014 + 0
5  tmp = 0 - 0
6  ZF = 1
7  jump (ZF = 1) T
8  [r15] := 0
9  r15 := r15 - 4
10 r11 := 0
T:
11 jump 0x10004014
12 r15 := r16
13 r16 := m[r16]
14 r15 := r15 + 4
15 return

Figure 2. Malicious code and normalization. (a) A small fragment of malicious code and (b), (c), and (d) the corresponding steps of the transformation into the normalized form. To keep the figure compact, we simplified the instruction's semantics by removing irrelevant control flags updates. Instructions that are struck through are considered dead.

Figure 2b shows the output of the normalization process's first step, which consists of translating the code into its corresponding intermediate form. We apply transformations such as algebraic simplifications and expression propagation directly to this new form; note that we can evaluate the value of some expressions statically and propagate these values into other expressions that use them. The instruction at line 4 in Figure 2c, for example, turns out to be completely useless because it doesn't alter the original value computed by the instruction at line 2. Moreover, through propagation, this constant value is also copied inside the expression that represents the jump target address (line 11), thus allowing us to translate the indirect jump into a direct one. In Figure 2d, which represents the normal form, we can see that the conditional jump (line 7) has been removed because the condition turns out to be always true; thus we can remove the fallback path from the program. Instructions that are struck through are considered dead.

Code comparison
Unfortunately, we can't expect normalization to reduce different instances of a malicious code to the same normal form—not every transformation is statically reversible (for example, when the order of instructions is permutable, or when the malware performs the same opera-




tion executing different instructions that are transformed into a different intermediate form). Thus, a bytewise comparison will likely lead to false negatives. During our experiments, we tried to classify the typologies of differences found in the normalized instances and discovered that they were all local to basic blocks. In other words, normalized instances of malicious code often share the same control flow.
To elicit the similarities, therefore, we decided to represent the malicious code and the alleged normalized host program through their interprocedural control-flow graphs, which are all procedures' control-flow graphs combined. Under this assumption, we can formulate the search for malicious code as a subgraph isomorphism decision problem: given two graphs G1 and G2, is G1 isomorphic to a subgraph of G2? (Even though subgraph isomorphism is in general NP-complete,12 with our particular problem, control-flow graphs are very sparse, and finding a solution is usually tractable.) Figure 3 shows the two graphs just mentioned: Figure 3a models the malicious code, whereas Figure 3b matches the suspicious program (nodes highlighted are those that match).

Figure 3. Two control-flow graphs. We present (a) a malicious code M and (b) a normalized program PN. The nodes highlighted in purple are those from PN that match ones in M.

Comparison through interprocedural control-flow graphs lets us abstract the information from local to basic blocks. This is beneficial in that we're throwing away every possible source of difference, but it could be a disadvantage if we consider that the information we're losing can help identify pieces of code that share the same structure but have different behaviors. Thus, we've augmented these graphs by labeling both nodes and edges: we labeled nodes according to the instruction properties belonging to them and edges according to the type of flow relations among the nodes they connect. A similar labeling method is proposed elsewhere.13 We group instructions with similar semantics into classes and assign a number (label) to each node that represents all the classes involved in the basic block. Table 3 shows the classes in which we've grouped instructions and flow transitions. We also represent calls to shared library functions with the same notation: the caller node is connected to the function that's represented with just one node and labeled with a hash calculated on the function name.

Table 3. Instructions and flow transitions classes.

INSTRUCTIONS CLASSES      FLOW TRANSITIONS CLASSES
Integer arithmetic        One-way
Float arithmetic          Two-way
Logic                     Two-way (fallback or false)
Comparison                N-way (computed targets of indirect jumps or calls)
Function call
Indirect function call
Branch
Jump
Indirect jump
Function return

Prototype implementation
To experimentally verify our approach, in terms of both correctness and efficiency, we developed a prototype. We built the code normalization module on top of Boomerang (http://boomerang.sourceforge.net), an open source decompiler that reconstructs high-level code by analyzing binary executables. Boomerang performs the data- and control-flow analysis directly on an intermediate form14 automatically generated from machine code. We adapted it to our needs and used the engine to undo the previously described mutations. Using the information collected with the analysis, our tool decides which set of transformations to apply to a piece of code based on control- and data-flow analysis results. The analysis framework can also accommodate the resolution of indirections and performs jump- and call-table analysis15 (further details are available elsewhere16).
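The labeled-subgraph matching at the heart of the comparison step can be pictured with a small sketch. This brute-force matcher is our own simplification for illustration; the prototype relies on the VF2 algorithm from VFLIB, which prunes the search far more aggressively.

```python
from itertools import permutations

def subgraph_isomorphic(pattern_nodes, pattern_edges, host_nodes, host_edges):
    """Brute-force check: can the pattern be mapped injectively onto the
    host so that node labels agree and every pattern edge has a matching
    host edge? Exponential in general; real tools use VF2-style pruning."""
    p = list(pattern_nodes)
    for image in permutations(host_nodes, len(p)):
        m = dict(zip(p, image))
        if all(pattern_nodes[u] == host_nodes[m[u]] for u in p) and \
           all((m[u], m[v]) in host_edges for (u, v) in pattern_edges):
            return True
    return False

# Malicious archetype: three labeled basic blocks in a loop-like shape.
# Node labels here are illustrative class names, not the article's hashes.
malware_nodes = {0: "arith", 1: "branch", 2: "call"}
malware_edges = {(0, 1), (1, 0), (1, 2)}

# Host program: the archetype embedded among extra blocks.
host_nodes = {"a": "call", "b": "arith", "c": "branch",
              "d": "call", "e": "arith"}
host_edges = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "e")}

print(subgraph_isomorphic(malware_nodes, malware_edges,
                          host_nodes, host_edges))   # prints True
```

The mapping found here sends blocks 0, 1, 2 onto b, c, d; because control-flow graphs are sparse, even large hosts usually admit fast matching once labels cut down the candidate pairings.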

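The labeling-plus-matching pipeline described here can be sketched in a few lines of Python. This is only an illustration: the opcode-to-class mapping is a drastically simplified stand-in for the classes of Table 3, and the tiny backtracking matcher stands in for the VF2 algorithm the prototype borrows from VFLIB.

```python
# Sketch: label CFG basic blocks by the classes of instructions they
# contain, then search for the malware archetype's labeled CFG inside a
# program's CFG via induced-subgraph isomorphism (VF2-style backtracking).

INSTR_CLASS = {"add": "int", "sub": "int", "and": "logic", "or": "logic",
               "cmp": "cmp", "call": "call", "jmp": "jump", "ret": "ret"}

def block_label(instrs):
    # A node label is the set of instruction classes in the basic block.
    return frozenset(INSTR_CLASS.get(op, "other") for op in instrs)

def find_embedding(pat_adj, pat_lbl, tgt_adj, tgt_lbl):
    """Injective, label- and edge-preserving mapping of pattern nodes
    onto target nodes; adjacency is node -> set of successors."""
    order = list(pat_adj)
    def extend(mapping):
        if len(mapping) == len(order):
            return mapping
        p = order[len(mapping)]
        for t in tgt_adj:
            if t in mapping.values() or pat_lbl[p] != tgt_lbl[t]:
                continue
            # every edge between mapped nodes must match in both graphs
            if all((q in pat_adj[p]) == (mt in tgt_adj[t]) and
                   (p in pat_adj[q]) == (t in tgt_adj[mt])
                   for q, mt in mapping.items()):
                found = extend({**mapping, p: t})
                if found:
                    return found
        return None
    return extend({})

# Archetype: a comparison block that branches to two returning blocks.
pat_adj = {"a": {"b", "c"}, "b": set(), "c": set()}
pat_lbl = {"a": block_label(["cmp"]),
           "b": block_label(["add", "ret"]),
           "c": block_label(["sub", "ret"])}

# Host program: an extra entry block plus an equivalent region (n2..n4).
tgt_adj = {"n1": {"n2"}, "n2": {"n3", "n4"}, "n3": set(), "n4": set()}
tgt_lbl = {"n1": block_label(["call"]),
           "n2": block_label(["cmp"]),
           "n3": block_label(["add", "ret"]),
           "n4": block_label(["sub", "ret"])}

print(find_embedding(pat_adj, pat_lbl, tgt_adj, tgt_lbl))
```

In the sketch, the archetype's three labeled blocks are found embedded in the host program's graph even though the host contains extra code, which is the essence of the detection step.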
www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 51


Malware

After the prototype has normalized the code, it builds a labeled control-flow graph of the resulting code, along with the malware's control-flow graph, and feeds both to a subgraph isomorphism algorithm to perform the detection. The prototype performs subgraph matching by leveraging the VF2 algorithm from the VFLIB library (http://amalfi.dis.unina.it/graph/db/vflib-2.0/).17

Experimental results
To evaluate the presented approach, we performed a set of independent experimental tests to assess the normalization procedure's quality and the code comparator's precision. The results demonstrate how effective our approach is but also highlight that we still need to do a lot of work to build a code normalizer that we can use with real-world executables.

Code normalization evaluation
We evaluated the code normalization's effectiveness by analyzing two different self-mutating malicious programs: W32/Etap.D (also known as Metaphor) and W32/Dislex. The former is considered one of the most interesting self-mutating malicious codes and evolves through five steps. The malware

1. disassembles the current payload;
2. compresses the payload according to a set of predefined rules to avoid size explosion;
3. mutates the payload by introducing fake conditional and unconditional branches;
4. expands the payload by applying step 2's rules in reverse; and
5. assembles the mutated payload.

W32/Dislex is slightly simpler because it just inserts useless code and permutes the payload by inserting fake control-flow transitions.

We collected different instances of the malicious programs by executing them in a virtual environment and forcing them to infect a set of predefined test programs. We repeated this step several times consecutively in order to infect new programs, using a newly generated malware instance every time. We collected 115 different samples of W32/Etap.D and 63 of W32/Dislex. We manually identified each malware's starting point in the various hosts—for W32/Etap.D, we chose an address stored at a fixed location in the executable import address table, whereas for W32/Dislex we chose the image start address—to focus the analysis only on the malicious payload. (In both cases, the code fragment analyzed seems to belong to a decryption routine that decrypts the real malicious payload.) We then compared the code before applying normalization and after.

For W32/Etap.D, we noticed that, in some cases, there was a correspondence between the archetype and the instance even before normalization. Given that the types of transformations applied during self-mutation are randomly chosen, we believe that the malware applied very weak transformations while generating those instances. After normalization, we noticed that all the samples matched the chosen archetype. We also observed an average reduction in the code size of about 57 percent.

Further experimentation with W32/Dislex confirmed these results. We noticed that before normalization, some graphs corresponded to each other, but none matched the archetypes; a deeper investigation revealed that the instances generated during the same infection shared the same payload. After normalization, we noticed that the control-flow graphs were perfectly isomorphic if we didn't consider node labels. Through labeling, we identified four different types of archetypes that differed only in the labels of one or two nodes. In some cases, the nodes ended with a jump absent in the others; during normalization, the tool couldn't remove these extra jumps because they were located exactly at the end of a real basic block. Overall, thanks to normalization, we observed an average reduction in the graph size of roughly 65 percent and the elimination of approximately half the payload instructions.

In our experiments, we have applied normalization directly to the malicious code. In the real world, however, the malware is tangled into the host code—we would perform its normalization implicitly when normalizing the entire host code. Unfortunately, our prototype isn't mature enough to handle big executables, and although we believe normalization will be quite effective on the executable that hosts the malicious instructions, we were unable to make this assessment. In fact, two problems might reduce normalization's effectiveness:

• Our tool might not be able to explore the benign host code completely, and the code that invokes the malicious guest lies in the unexplored region.
• Our tool could explore the benign host code completely, but couldn't resolve the links with the malicious code region.

Heuristics to maximize code exploration already exist, and we can adopt them to overcome the first problem (Boomerang already handles common cases17). The second problem is less worrying because normalization appears to be rather effective in reconstructing an obfuscated control flow, and malware uses the same techniques to hide malicious code among host instructions.

Code comparison evaluation
We've evaluated the code comparator via an independent test to measure its precision (more details are available

52 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007



elsewhere18). First, we collected a huge set of system binary executables and constructed their augmented interprocedural control-flow graph. We then split this graph in order to construct a graph for each program function. We used the functions identified to simulate malicious code and searched within all the sample set's programs; we threw away graphs with fewer than five nodes because they're too small to unambiguously characterize a particular code fragment. We then divided the code comparator output into three sets:

• equivalent graphs generated from equivalent functions (two functions were equivalent if the hashes computed on their opcodes matched);
• equivalent graphs generated from functions with different hashes; and
• different graphs generated from functions with different hashes.

We then compared a small number of random elements of the last two sets to verify the presence of false positives and false negatives. Table 4 shows the results obtained through this manual inspection. Besides a bug found in the prototype, manual inspection highlighted that, in most cases, the compared functions were semantically equivalent even when the hashes didn't match (we suspect that the same function was compiled with slightly different options). False positives arose only when we compared very small graphs (fewer than seven nodes). Manual inspection also revealed that all graphs reported to be different were generated from different functions.

Table 4. Results from manual evaluation of a random subset of the code comparator results.

POSITIVE RESULTS (EQUIVALENT GRAPHS)          #    %
Equivalent code                               35   70
Equivalent code (negligible differences)       9   18
Different code (small number of nodes)         3    6
Unknown (too big to compare by hand)           1    2
Bugs                                           2    4

NEGATIVE RESULTS (DIFFERENT GRAPHS)           #    %
Different code                                50  100

Despite theoretical studies demonstrating that it's possible in principle to build undetectable malicious code, we've demonstrated that the techniques malicious code writers currently adopt to achieve perfect mutation don't let them get too close to the theoretical limit.

We believe that the experimental results we obtained regarding our normalization process demonstrate that it adequately treats the techniques that self-mutating malware currently adopts. Unfortunately, we expect that in the near future, such transformations will be replaced with more sophisticated ones, which could seriously undermine the effectiveness of static analysis and, consequently, our proposed approach as well. These transformations could include the use of function calls and returns to camouflage intrafunction control-flow transitions; the introduction of opaque predicates; the introduction of junk code containing useless memory references that will create spurious data dependencies; and the adoption of antidisassembling techniques.

Another important issue we must address in the future is reducing the resources required for the analysis. The static analysis we perform on the malicious code is in general quite expensive, so we believe that it's necessary to perform the analysis on the smallest portion of code possible, but this means that in the future, we must be able to identify which part of the code to focus on.

References
1. P. Ször and P. Ferrie, "Hunting for Metamorphic," Proc. Virus Bulletin Conf., Virus Bulletin, 2001, pp. 123–144.
2. M. Christodorescu and S. Jha, "Testing Malware Detectors," Proc. 2004 ACM SIGSOFT Int'l Symp. Software Testing and Analysis (ISSTA 04), ACM Press, 2004, pp. 34–44.
3. M. Christodorescu and S. Jha, "Static Analysis of Executables to Detect Malicious Patterns," Proc. Usenix Security Symp., Usenix Assoc., 2003, pp. 169–186.
4. D.M. Chess and S.R. White, "An Undetectable Computer Virus," Proc. Virus Bulletin Conf., Virus Bulletin, 2000; www.research.ibm.com/antivirus/SciPapers/VB2000DC.htm.
5. F.B. Cohen, A Short Course on Computer Viruses, 2nd ed., Wiley, 1994.
6. C. Collberg, C. Thomborson, and D. Low, A Taxonomy of Obfuscating Transformations, tech. report 148, Dept. of Computer Science, Univ. of Auckland, July 1997.
7. P. Ször, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005.
8. A. Lakhotia, A. Kapoor, and E.U. Kumar, "Are Metamorphic Viruses Really Invincible?" Virus Bulletin, Dec. 2004, pp. 5–7.
9. S.K. Debray et al., "Compiler Techniques for Code Compaction," ACM Trans. Programming Languages and Systems, vol. 22, no. 2, 2000, pp. 378–415.
10. S.S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997.
11. A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.


12. J.R. Ullman, "An Algorithm for Subgraph Isomorphism," J. ACM, vol. 23, no. 1, 1976, pp. 31–42.
13. C. Kruegel et al., "Polymorphic Worm Detection Using Structural Information of Executables," Proc. Int'l Symp. Recent Advances in Intrusion Detection, Springer, 2005, pp. 207–226.
14. C. Cifuentes and S. Sendall, "Specifying the Semantics of Machine Instructions," Proc. 6th Int'l Workshop on Program Comprehension (IWPC 98), IEEE CS Press, 1998, pp. 126–133.
15. C. Cifuentes and M.V. Emmerik, "Recovery of Jump Table Case Statements from Binary Code," Proc. 7th Int'l Workshop on Program Comprehension, IEEE CS Press, 2001, pp. 171–188.
16. D. Bruschi, L. Martignoni, and M. Monga, "Using Code Normalization for Fighting Self-Mutating Malware," Proc. Int'l Symp. Secure Software Engineering, IEEE CS Press, 2006, pp. 37–44.
17. L.P. Cordella et al., "A (Sub)graph Isomorphism Algorithm for Matching Large Graphs," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 10, 2004, pp. 1367–1372.
18. D. Bruschi, L. Martignoni, and M. Monga, "Detecting Self-Mutating Malware Using Control Flow Graph Matching," Proc. Conf. Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Springer, 2005, pp. 129–143.

Danilo Bruschi is a professor of computer sciences at Università degli Studi di Milano, Italy, where he is also director of the Master Program in ICT Security, director of the Laboratory for Security (LASER), and teaches computer and network security and operating systems. His main research interests include computer and network security, reliability, and survivability, computer forensics, social implications, and privacy. Bruschi has a PhD in computer sciences from the Università degli Studi di Milano. Contact him at bruschi@dico.unimi.it.

Lorenzo Martignoni is currently enrolled in the PhD program in computer science at Università degli Studi di Milano, Italy. His research interests include computer security and the analysis of malicious code and computer forensics in particular. Martignoni has an MS in computer sciences from Università degli Studi di Milano-Bicocca, Italy. Contact him at martign@dico.unimi.it.

Mattia Monga is an assistant professor in the Department of Computer Science and Communication at the Università degli Studi di Milano. His research activities are in software engineering and security. Monga has a PhD in computer and automation engineering from Politecnico di Milano, Italy. He is a member of the IEEE Computer Society and is on the steering committee of CLUSIT, an Italian association promoting awareness, continuous education, and information sharing about digital security. Contact him at monga@dico.unimi.it; http://homes.dico.unimi.it/~monga/.



Identity Management

Trust Negotiation in Identity Management

Most organizations require the verification of personal information before providing services, and the privacy of such information is of growing concern. The authors show how federated identity management systems can better protect users' information when integrated with trust negotiation.

ABHILASHA BHARGAV-SPANTZEL, ANNA C. SQUICCIARINI, AND ELISA BERTINO
Purdue University

In today's increasingly competitive business environment, more and more leading organizations are building Web-based infrastructures to gain the strategic advantages of collaborative networking. However, to facilitate collaboration and fully exploit such infrastructures, organizations must identify each user in the collaborative network as well as the resources each user is authorized to access. User identification and access control must be carried out so as to maximize user convenience and privacy without increasing organizations' operational costs. A federation can serve as the basic context for determining suitable solutions to this issue. A federation is a set of organizations that establish trust relationships with respect to the identity information—the federated identity information—that is considered valid. A federated identity management (IdM) system provides a group of organizations that collaborate with mechanisms for managing and gaining access to user identity information and other resources across organizational boundaries.

IdM systems involve at least two types of entities: identity providers and service providers. An identity provider (IdP) manages user authentication and user-identity-relevant information. A service provider (SP) offers services to users who satisfy the policy requirements associated with these services. It specifies and enforces the access-control policies for the resources it offers. An organization in a federation can act as both an IdP and an SP. In most IdM systems (see the "Initiatives and systems" sidebar), IdPs authenticate users using single-sign-on (SSO) technology. With SSO, users can log on with the same username and password for seamless access to federated services within one or multiple organizations. Federated identity includes not only users' login names, but also user properties, or user identity attributes (user attributes, for short). Thus, authorizations specified for a given resource are no longer expressed in terms of user login IDs but in terms of requirements and conditions against user properties.

One challenge with current IdM systems is distributing the IdPs' functionality among IdPs and SPs (in this article, we don't differentiate between SPs and IdPs in a federation). We need a secure and privacy-preserving mechanism for retrieving the user attributes from different SPs. The IdM system must provide only the user information that is needed to satisfy the requesting SPs' access-control policies. In this regard, users have differentiated privacy preferences for various types of personal information.1 For example, users might agree to share demographic information but not credit-card or health information. Such requirements call for a flexible and selective approach to sharing user attributes in federations. A system could achieve selective release of identity by supporting multiple federated digital identities. For example, a user could have a business identity and a personal identity, and their corresponding profiles would have associated privacy preferences. Such an approach, however, contradicts the main aim of federated identity solutions—that is, minimizing the management of multiple profiles by the user.

One way to achieve such flexibility and fine-grained access is to enhance IdM technology with automated trust-negotiation (ATN) techniques.2 Trust negotiation is an emerging access-control approach that aims to establish trust between negotiating parties online through bilateral credential disclosure. Such a negotiation aims to establish a trust level sufficient to release sensitive resources, which can be either data or services.

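The shift from login-ID-based authorization to conditions over user properties can be illustrated with a minimal sketch. The attribute names, values, and policy conditions below are hypothetical, not drawn from any particular IdM product:

```python
# Sketch: an attribute-based authorization check, as opposed to a
# login-ID access-control list. A policy is a set of named conditions
# over user attributes; no username appears anywhere in the policy.
def satisfies(policy, attributes):
    """True if the user's attributes meet every condition in the policy."""
    return all(cond(attributes.get(name)) for name, cond in policy.items())

# Hypothetical SP policy: any user over 18 who is affiliated with a
# federation member may access the service.
policy = {
    "age": lambda v: v is not None and v >= 18,
    "affiliation": lambda v: v in {"purdue.edu", "unimi.it"},
}

print(satisfies(policy, {"age": 30, "affiliation": "purdue.edu"}))  # True
print(satisfies(policy, {"age": 16, "affiliation": "purdue.edu"}))  # False
```

Because the policy only references properties, any user whose attributes satisfy the conditions is authorized, regardless of which federation member vouches for those attributes.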
PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY 55

Initiatives and systems

Liberty Alliance and WS-Federation are two emerging standards for identity federation in the corporate world. Because these projects are similar, we only describe the former.

Liberty Alliance (www.projectliberty.org) is based on Security Assertion Markup Language (SAML) and provides open standards for single sign-on with decentralized authentication. SSO lets users sign on once at a Liberty-enabled site and remain signed on when navigating to other Liberty-enabled sites. This group of Liberty-enabled sites belongs to a circle of trust—that is, a federation of SPs and IdPs based on the Liberty architecture. The IdP is a Liberty-enabled entity that creates, maintains, and manages user identity information and provides SPs with this information. Similarly, the federated attribute management and trust-negotiation (FAMTN) framework builds on an SSO and provides a flexible decentralized trust management system for registered users.

According to the Liberty Alliance framework, a federation might include multiple IdPs, which could also be SPs. Basically, in a given Liberty circle of trust, multiple IdPs can share a user's information. These IdPs establish trust relationships and access policies a priori while forming the circle of trust. The Liberty protocols don't dictate the underlying semantics and related protocols. Truly decentralized identity management requires a more automatic methodology for federating user information among IdPs. The FAMTN framework doesn't distinguish SPs from IdPs. Each SP in the federation can act as an IdP. SPs exchange information through automatic trust negotiation (ATN), according to an on-demand dynamic protocol.

The Shibboleth (http://shibboleth.internet2.edu) initiative originated in academia and is similar to the Liberty Alliance in that it aims to facilitate resource sharing between research and academic institutions. It extends the federated identity information concept to federated user attributes. When a user at an institution tries to use a resource at another, Shibboleth sends attributes about the user to the remote institution, rather than making the user log in to that institution. The receiver can check whether the attributes satisfy the SP's policy. The Shibboleth IdP accounts for all user attributes and user privacy preferences when giving information to other SPs. The FAMTN approach differs from Shibboleth in that it doesn't rely on a central IdP for all user attributes. Rather, user attributes are distributed among the federation SPs, each of which can act as an IdP. The ability to negotiate with different SPs adds flexibility to how users can define different privacy preferences with respect to federation members. Shibboleth requires trust agreements to define the population, retention, and use of attributes, thus making it difficult for external users (who aren't affiliated with the federation) to carry on ad hoc negotiations for the various services offered. In other words, unlike our framework, Shibboleth isn't open to external users.

Researchers have developed several systems and prototypes for trust negotiations in Web-based applications. TrustBuilder,1 one of the most significant proposals, provides a set of negotiation protocols that define the message ordering and the type of information the messages will contain, as well as strategies for controlling the messages' exact content. It defines various stra-

In this article, we discuss how to integrate federated IdM with trust-negotiation techniques. More specifically, we discuss how to implement trust negotiation between SPs in a federation, and between users and SPs. This is, to the best of our knowledge, the first attempt to integrate a federated IdM system with a trust-negotiation system. A key aspect of the resulting framework—federated attribute management and trust negotiation (FAMTN)—is that a user doesn't have to provide a federated attribute (that is, attributes the user is willing to share in a federation) more than once to a given federation. Internal users of FAMTN systems can perform negotiations by exploiting their SSO ID without having to repeat identity verification. Further, a FAMTN system supports temporary SSO, so external users can perform negotiations with the federation using the federated framework to reduce the amount of identity information they need to provide.

Comparison of IdM and ATN systems
The trust-negotiation paradigm has several similarities to federated IdM. Both aim to better handle users' sensitive information; however, trust negotiation ultimately aims to handle introductions between strangers, whereas IdM systems are typically for closed environments.

ATN systems and IdM systems also differ in several important ways, as Table 1 shows. Importantly, we based our analysis on the IdM and ATN models as they were originally designed. Researchers have proposed variations to both approaches in the past few years, which make the evaluation results slightly different.

Open versus closed environment
ATN techniques,3 developed for use in open systems, provide protocols for introducing strangers to each other. They might be useful for the initial trust-establishment process between users and IdPs or to automatically manage introductions between different federation groups.

Credential and identity attribute management
In a typical ATN system, the user is the IdP. ATN is a user-centric system in which a client stores credentials and provides them on behalf of a user through negotia-


tegies to let strangers establish trust by exchanging digital credentials and using access-control policies that specify the combinations of credentials a stranger must disclose to gain access to each local service or credential. Marianne Winslett and her colleagues2 developed Unipro, a unified scheme to model resource protection, including policies. It represents one of the most significant proposals in the negotiation research area, and most significantly influenced our work. However, Unipro doesn't support privacy policies, nor does it define an ad hoc policy language.

Kent Seamons and his colleagues3 explored the issue of supporting sensitive policies, obtained by introducing hierarchies in policy definitions. They also addressed privacy issues in trust negotiation.4 However, their approach doesn't provide a comprehensive solution to such problems because it only deals with protecting sensitive policies, achieved by dynamically modifying policies during a negotiation.

William Winsborough and Ninghui Li5 introduced a role-based trust-management language that they use to map entities to roles based on the properties described in their credentials. They also developed an algorithm to locate and retrieve credentials that aren't locally available. This credential chain discovery is an important aspect of trust negotiation because assuming the credentials to be locally stored is too strong an assumption for decentralized collaborative environments.

We based our framework on Trust-X,6 a trust-negotiation system for peer-to-peer environments. Trust-X is complemented by an ad hoc XML-based language, X-TNL, for encoding negotiation policies, digital credentials, and security-related information. A main difference between Trust-X and our work is that FAMTN's negotiation process is much more articulated and can involve third parties in addition to the two parties initiating the negotiation. Thus, FAMTN is characterized by multiparty negotiations, as opposed to Trust-X's two-party negotiations.

Having been widely studied in theory, ATN systems are now ready for use in real applications. TrustBuilder is an example of an actual system for support of trust negotiations. Current Web services only provide basic negotiation capabilities. The full potential of trust negotiations will be achieved when the practical limitations related with public-key infrastructures are overcome.

References
1. T. Yu, M. Winslett, and K.E. Seamons, "Supporting Structured Credentials and Sensitive Policies through Interoperable Strategies for Automated Trust Negotiation," ACM Trans. Information and System Security, vol. 6, no. 1, ACM Press, 2003, pp. 1–42.
2. T. Yu and M. Winslett, "A Unified Scheme for Resource Protection in Automated Trust Negotiation," Proc. IEEE Symp. Security and Privacy, IEEE CS Press, 2003, pp. 110–123.
3. K.E. Seamons, M. Winslett, and T. Yu, "Limiting the Disclosure of Access Control Policies during Automated Trust Negotiation," Proc. Network and Distributed System Security Symp., Internet Soc., 2001.
4. K.E. Seamons, M. Winslett, and T. Yu, "Protecting Privacy During On Line Trust Negotiation," Proc. 2nd Workshop Privacy Enhancing Technologies, Springer, 2002, pp. 129–143.
5. W.H. Winsborough and N. Li, "Protecting Sensitive Attributes in Automated Trust Negotiation," Proc. ACM Workshop Privacy in the Electronic Soc., ACM Press, 2002, pp. 41–51.
6. E. Bertino, E. Ferrari, and A.C. Squicciarini, "Trust-X: A Peer-to-Peer Framework for Trust Establishment," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 7, 2004, pp. 827–842.

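The bilateral credential disclosure these systems perform can be sketched as a simple fixed-point loop: each side releases a credential only once its release policy is satisfied by what the other side has already disclosed. The credential names and policies below are hypothetical, and real negotiation strategies (such as TrustBuilder's) are far richer:

```python
# Sketch of bilateral credential disclosure in trust negotiation.
# policies_x maps each credential a party holds to the set of the
# peer's credentials that must be seen before it can be released.
def negotiate(resource_policy, policies_sp, policies_user):
    """Return (sp_disclosed, user_disclosed) if the SP's resource
    policy can be satisfied through stepwise disclosure, else None."""
    sp_disclosed, user_disclosed = set(), set()
    progress = True
    while progress and not resource_policy <= user_disclosed:
        progress = False
        for cred, need in policies_user.items():   # user's turn
            if cred not in user_disclosed and need <= sp_disclosed:
                user_disclosed.add(cred)
                progress = True
        for cred, need in policies_sp.items():     # SP's turn
            if cred not in sp_disclosed and need <= user_disclosed:
                sp_disclosed.add(cred)
                progress = True
    if resource_policy <= user_disclosed:
        return sp_disclosed, user_disclosed
    return None

# Hypothetical scenario: the SP grants the service once it sees an
# employee ID; the user releases it only after seeing the SP's
# business certificate, which the SP releases unconditionally.
result = negotiate(
    resource_policy={"employee_id"},
    policies_sp={"bbb_cert": set()},
    policies_user={"employee_id": {"bbb_cert"}})
print(result)
```

If neither side can make progress (for instance, if the SP had no certificate to offer), the loop terminates without reaching the resource policy and the negotiation fails, mirroring the mutual, stepwise trust establishment described above.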
Table 1. Automated trust negotiation (ATN) vs. identity management (IdM) systems.

CRITERIA                ATN SYSTEMS                                       IDM SYSTEMS
Environment             Open                                              Closed
Credential management   User centric                                      Polycentric
Attributes used         Certified attributes or credentials               Certified and uncertified attributes
Attribute encoding      X.509 certificates, XML certificates              Username, Security Assertion Markup Language (SAML)
                                                                          assertions, X.509 certificates, Kerberos tickets
Architecture            Peer-to-peer                                      Client-server
Policies                Privacy policies, access-control policies         Privacy policies, authorization policies
Policy language         XML-based trust-negotiation language (X-TNL),     Extensible Access Control Markup Language (XACML)
                        RT, Protune, and so on
Trust model             Pairwise trust (some brokered trust)              Pairwise trust, brokered trust, community trust
Unique identification   Optional                                          Single sign-on required
Credential discovery    Credential chain management protocols             Discovery service protocols

tion. Although recent work has looked at storing user credentials with SPs using anonymous credentials, most ATN systems assume that users directly manage their own credentials. In IdM systems, on the other hand, SPs save user profiles for future use in the federation according to the user's privacy preferences.

ATNs typically negotiate certified attributes or credentials. IdM systems mainly use uncertified attributes, al-


though they can also support certified attributes. IdM systems usually rely on Security Assertion Markup Language (SAML) assertions for encoding attributes, whereas in ATN systems, attributes are encoded in credentials, which are the digital equivalent of physical certificates, represented according to the X.509 certificate format.

Architecture
An ATN system is typically used in peer-to-peer (P2P) systems, so clients and servers have the same basic architecture. Any entity serving as provider in a trust negotiation can act as a client in a different negotiation. In IdM frameworks, IdPs, SPs, and clients all have different architectural components depending on that entity's functionality. The P2P nature of ATN systems simplifies the integration of an ATN's architectural components with the existing IdM systems.

Policies
Both IdM and ATN systems aim to satisfy user privacy preferences for their personal data and to ensure that access-control policies are stated and enforced. So, both offer privacy and access-control policies. However, in ATN systems, access-control policies play a key role in the trust-negotiation processes, whereas they're only a marginal aspect in IdM systems. As such, ATN policies can be more complex and provide alternative ways of satisfying the requirements for access to a given resource or expressing different usage conditions. This ensures soundness for any transaction, meaning that if user preferences and the SP's requirements are compatible, the transaction will certainly succeed. Soundness isn't guaranteed in current IdM systems because they lack formal negotiation procedures and a corresponding expressive policy language. However, IdM systems provide mechanisms for policy exchange that additional negotiation modules could use to provide ATN functions.

User identity
Both ATN and IdM systems require users to be identified. Such a requirement is particularly relevant in IdM systems, which aim to uniquely identify users within federations. Users in an IdM mostly need an SSO to interact with any SP in the federation and to ensure that their attributes are linked to them. By contrast, identity is usually a secondary aspect in ATN systems because authentication is based mainly on user properties rather than on the sole identity. However, real case scenarios show that authentication is often a first-class requirement in specific negotiations. Further, IdM systems rely on SSO to identify users, so there's no need to certify user identities in other ways. ATN systems obtain identities using credential combinations, although they might use SSO in specific contexts. In ATN systems, there's no need to link multiple negotiations to the same identity because identification is (if required) executed on the fly, while the negotiation process is taking place.

Trust model
A typical IdM system has three types of trust models:4

• a pairwise model for two entities that have direct business agreements with each other;
• a brokered trust model for two entities that don't have a direct agreement with each other, but have agreements with one or more intermediaries so as to enable construction of a business trust path between the two entities; and
• a community trust model for several entities that have common business agreements within the community or federation.

Although all three trust models can use ATN systems, the brokered trust model integrated with ATN provides a unique feature to existing IdM systems.

Other similarities
Both ATN and IdM also require credential discovery, although they use different methods. Using a discovery service, IdMs collaborate to make assertions about a user from a local IdP to a remote IdP. Similarly, ATN systems use credential discovery to retrieve remote credentials not available at the negotiating parties.

Another related aspect is delegation. Although delegation isn't a main issue in trust negotiations, both IdM and ATN systems achieve delegation through ad hoc protocols and credentials enabling entities to negotiate on behalf of third parties. In IdM systems, we can use the brokered trust model to delegate the responsibility for attribute assertion to another IdP that the user trusts more.

Integrating IdM and trust negotiations
FAMTN combines the advantages of the IdM and ATN approaches, providing a truly distributed approach to managing user identities and attributes with negotiation capabilities.

A FAMTN federation essentially involves two types of entities: FAMTN service providers (FSPs) and users. In the FAMTN framework, we don't distinguish between SPs and IdPs: each SP in the federation can act as an IdP. SPs exchange information through ATN, according to an on-demand dynamic protocol. FSPs support identity and attribute provisioning, as we detail later.

Our approach supports negotiations between an FSP and the user, and between two FSPs in the same federation. The protocol for negotiations between FSPs and users depends on the interacting user's type. The distinction is based on the user's membership in the federation. A user who's affiliated with an organization within the

58 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007


Identity Management

federation is a member user of the federation. The federation is more likely to have information about a member user even if the member hasn't accessed any of its services. This also depends on the member organization's policy, which defines which of its affiliated user attributes are federated. An SSO user identification identifies the member in the federation.

On the contrary, external users must provide all required attributes at their first negotiation. The first negotiation between an external user and an FSP includes identity provisioning, because the provider issues a temporary user ID to be used within the federation. The use of a time-limited SSO ID for nonmembers ensures identity linkability. (We can reasonably assume that the federation policy defines the time interval.) Of course, users might have multiple identities but choose to adopt one for requesting access to service. We don't elaborate on this issue because it goes beyond our article's scope. By interacting further with the federation, the amount of information about users that is disclosed to the federation increases. This information can be linked to the user (who becomes a repeated external user) and thus reused in the subsequent negotiations. As a result, the system executes more efficient negotiations with fewer attributes required from the user.

Figure 1 shows an example. User U requests service from service provider SP1, which requires user attributes (a, b) to satisfy its service policy. U provides (a, b) and gets the service. Suppose that U, at the end of this successful negotiation, opts for sharing attribute (a) within the federation, and suppose that U then requires a service from another provider SP2 in the same federation. Suppose that the attribute requirements there are (a, c). In this case, however, U only has to provide the attribute c to receive the service.

Figure 1. External user negotiating with two service providers (SPs) of a federation. A user who has already provided attributes to any SP in the federation might not need to provide them again when another SP in the federation requires them.

At the end of a successful negotiation, users receive one of two types of ticket:

• a trust ticket provides information about the previous services and FSPs the user has accessed; and
• a session ticket provides recent history information to help speed up negotiations, as we detail later.

The second type of negotiation occurs between two FSPs. This negotiation type is useful when a user successfully negotiates a service from one FSP and automatically becomes eligible to receive service from another FSP. As such, when the user asks for a service, the FSP providing it can directly negotiate user-related attributes with the FSP holding such attributes from previous negotiations. Also, negotiations among FSPs might be required for verifying external user identities. Because we don't rely on a single IdP, an IdP might not be aware of the last registered users. When the FSP receives a request from a locally unknown user ID, it can directly interact with the SP that issued the claimed user ID to double check its validity (for simplicity, we assume the user ID contains FSP information to easily identify the issuer).

Architecture of service providers in FAMTN
A FAMTN framework consists of an FSP containing the necessary components required to execute two functions: trust negotiation among users and FSPs and federation of user attributes.

Figure 2 shows the FSP architecture. An FSP's components derive from FAMTN's two underlying frameworks: ATN and federated IdM. Each FSP can perform the functionality of an IdP and an SP.

The FSP's main components are:

• the Web services component, which enables secure communication within the federation and with the users; and
• the user negotiation component, which contains the modules executing the negotiation, depending on whether the user is a member or nonmember (this component is directly related to the trust ticket management layer).

Other parts of the FSP include the trust ticket management layer, which manages the trust tickets and the session tickets required for the negotiation. The policy management




and enforcement components store the authentication and access-control policies in the policy base and enforce them, respectively. The credential management system manages and validates certificates and user tickets by verifying the FSPs' signatures. It's also responsible for revocation when required. The attribute negotiation system consists of the main components required for negotiation:

• the tree manager, which stores the negotiation's state;
• the storage subsystem containing the sequence prediction module, which caches and manages previously used trust sequences and user profile information; and
• the compliance checker, which tests policy satisfaction and determines request replies during a negotiation.

Figure 2. The federated attribute management and trust negotiation (FAMTN) service provider architecture.

An example use case
Figure 3 shows an example scenario of the Liberty Web services framework (WSF)5 with additional FSP components. (See the "Initiatives and systems" sidebar for more on Liberty Alliance, which provides open standards for SSO with decentralized authentication.) In this scenario, the following steps take place:

1. A user, say Joe, accesses SP1 using SSO.
2. Using redirection and IdM system protocols, an IdP transmits a SAML assertion authenticating Joe to SP1.
3. SP1 requires a certificate from Joe to verify his address for delivery and that he is older than 21.
4. Joe doesn't trust SP1 so won't reveal his certified credential to it. He therefore negotiates with the IdP and reveals his credential to it instead.
5. SP1 negotiates with the IdP, which finally sends a SAML assertion stating whether Joe satisfies SP1's age criteria. So, Joe doesn't have to reveal the actual credential to SP1, ensuring that the credential is stored only with a trusted party.
6. Joe also registers his address with SP1 for delivery but imposes as a condition that his address should be released only to a member of the federation and only when the address is required for a purchased product delivery and the member is certified by the Better Business Bureau (BBB).
7. Joe subsequently accesses SP2 to order a pizza. Because of SSO he gets seamless access.
8. SP2 asks Joe for his address. Joe tells SP2 to get his profile from other sites in the federation. (In this case, it's actually an agent operating at the client on behalf of Joe that suggests request redirections. We use Joe to simplify the example's presentation.) Using the discovery service, SP2 contacts SP1, who negotiates with SP2 to verify that the conditions for Joe's attribute release are met. If the negotiation succeeds, SP2 receives the required information and can make the appropriate delivery.

This example demonstrates how we can implement additional privacy and flexible policies with ATN. Also, not all FSP components are required in a typical IdM system. An FSP can leverage modules belonging to the Liberty Alliance Framework or other IdM systems, such as the discovery service (DS) and personal profile (PP) policy and credential management systems. The ATN-specific parts (the solid color components) in Figure 3 are the subset of FSP components used for ATN in the Liberty WSF framework.

Negotiations in a FAMTN federation
Session tickets and trust tickets are the main building blocks in our trust negotiation protocols. Both ticket types are temporal with a fixed lifetime. We assume loosely synchronized clocks in the federation. We use the SSO ID as the user ID in the tickets.

A session ticket ensures that if the negotiation ends successfully and the same user requests the same FSP for the same service in a subsequent session, the system can grant the service immediately without having to unnecessarily repeat the trust-establishment process. A session ticket therefore contains the fields SignedFSP <τ(sreq), u, T, R>, where τ(sreq) denotes the service requested, u is the user ID, and T is the ticket timestamp. Here, R denotes the negotiation's result and can be a simple statement or a structured object. The use of structured objects is particularly interesting for tracing intermediate results of negotiations of aggregated services.

The FSP signs a session ticket and gives a receipt of the




trust establishment. Because session tickets are encrypted with the FSP's private key, they are tamperproof and verifiable. The time-out mechanism depends on the type of user attributes required for the service, and the service's security level.

The trust ticket determines the list of services external users have accessed. Assuming that all the FSPs are members of the same federation, any member provider can verify another member provider's signature. Such a ticket has the following form: SignedFSPlast <{⟨τ(s), FSP, T⟩}list, u, T − I>. Every 3-tuple in the list contains the service type, the corresponding FSP, and the timeout. The variable u corresponds to the temporary user identification, and T − I is the ID's expiration date. The ticket is signed by the last FSP with which the user had a successful transaction. At the end of a successful transaction, the FSP takes the current user trust ticket, removes all timed-out entries, appends its information, signs it, and sends it to the user.

Figure 3. Liberty Web services framework and federated service provider with three Web sites and system modules. The arrows indicate the possible communication of the various module sets. (IDP: identity provider; DS: discovery service; SP: service provider; PP: ID-SIS personal profile; WSC: Web service consumer; WSP: Web service provider.)
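The session- and trust-ticket structures described above can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the function names, the JSON encoding, and the use of a shared-key HMAC standing in for the FSP's public-key signature are our assumptions.

```python
import hashlib
import hmac
import json
import time

# Stand-in for the FSP's signing key; a real FSP would sign with its private key.
FSP_KEY = b"demo-fsp-key"

def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(FSP_KEY, payload, hashlib.sha256).hexdigest()

def make_session_ticket(service: str, user_id: str, result: str) -> dict:
    # Session ticket fields <tau(s_req), u, T, R>, signed by the FSP.
    body = {"service": service, "user": user_id,
            "time": time.time(), "result": result}
    return {"body": body, "sig": _sign(body)}

def update_trust_ticket(ticket: dict, service: str, fsp: str,
                        timeout: float, user_id: str, expiry: float) -> dict:
    # Trust ticket: a list of <service, FSP, timeout> 3-tuples plus the user ID u
    # and expiration T - I. The last FSP drops timed-out entries, appends its
    # own entry, and re-signs the whole ticket.
    now = time.time()
    entries = [e for e in ticket.get("body", {}).get("entries", [])
               if e["timeout"] > now]
    entries.append({"service": service, "fsp": fsp, "timeout": timeout})
    body = {"entries": entries, "user": user_id, "expiry": expiry}
    return {"body": body, "sig": _sign(body)}

def verify(ticket: dict) -> bool:
    # Any federation member able to check the FSP's signature can verify a ticket.
    return hmac.compare_digest(ticket["sig"], _sign(ticket["body"]))
```

A ticket whose body is altered (say, a rewritten user ID) then fails verification, which is the tamperproof property the tickets rely on.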
Implementing trust tickets through cookies
Many IdM systems use cookies to make user information available to servers. State information is stored at the client, which sends the cookie to the server the next time the user accesses that server. Like session and trust tickets, cookies can be valid only for the session during which they were issued or can persist beyond the session's end. A persistent cookie is typically written to a file on the browser's hard drive if its lifetime hasn't elapsed when the browser is shut down and therefore can be used for a longer period of time. In a truly distributed federation that has more than one IdP, an FSP needs a mechanism to determine which IdP has the user information. In Liberty, this problem is known as the introduction problem. Currently, Liberty Alliance protocols rely on cookies for redirecting IdPs.

Cookies offer several advantages. Implementing them is efficient because you don't need new software to use them, and you can use them independently of any authentication mechanism. They also provide dynamic state information, which is helpful for preventing several security threats. One such threat is an impersonation attack, which arises when a user has successfully logged onto one FSP, but the other FSPs in the federation don't re-authenticate the user. Thus if the authentication is no longer valid, because of attacks or other failure, the FSP has no straightforward way to detect it. Cookies help the FSP check whether the authentication ticket is associated with the user identity as well as whether the IdP session is valid for that user. Alternatives to using cookies for the introduction problem are based either on active interactions with the user or on the use of statically hand-configured lists of possible user IdPs. Such approaches inhibit the seamless SSO process and are less efficient.

Cookies, however, have some security problems:6

• They're usually in clear text. Headers are generally unprotected, and even encrypted cookies are vulnerable to replay attacks.
• Because cookies are stored on local machines, anyone using the machine can easily read them.
• You need to control where cookies are sent, because you wouldn't want to send the user cookie to an untrusted service provider. For example, several current spyware applications exploit user cookies, so we need to better control cookies' destinations.

Consequently, cookies shouldn't store personal identifiers or sensitive information. In real applications, however, a cookie typically stores the SSO user ID or other tracking record, which might leak information about the user. Better storage and usage protocols and mechanisms can address most of these security vulnerabilities. We propose implementing trust tickets in IdM systems using cookies to exploit cookies' advantages while preventing the vulnerabilities we've just described. Indeed, the timeouts and signed information given by the session and trust tickets contain reliable and dynamic state information. To further increase cookie security, federations should use mechanisms enabling selective download of cookies. Browsers typically give users limited choice about how to handle cookies. Control is coarse-grained: the browser either downloads no cookies or accepts all of them. Letting a user choose cookies from a Web site that uses a single domain versus multiple domains can cause problems in federations, which are typically multiple-domain envi-




Require: userID, userAuthenticationInfo
Ensure: IsRegistered(userID)
 1: userRequest ← getRequest(userID)
 2: if userRequest ∉ ServicesFSP then
 3:   return Abort-Negotiation
 4: end if
 5: *Comment: For Members*
 6: if isValidMember(userID) = true then
 7:   sessionTicket ← getSessionTicket(userID)
 8:   if sessionTicket ≠ NULL ∧ sessionTicket.time < timeout then
 9:     return OK
10:   end if
11:   MFSP = getMemberFSP(userID)
12:   remAttrList1 ← NEGOTIATEFSP(CurrFSP, MFSP,
13:       userID, userRequest)
14:   if remAttrList1 ≠ NULL then
15:     remAttrList2 ← NEGOTIATEUser(CurrFSP,
16:         userID, CurrPolicyFSP)
17:   else
18:     send(SessionTicket) → userID
19:     return OK
20:   end if
21:   if remAttrList2 ≠ NULL then
22:     return Abort-Negotiation
23:   else
24:     send(SessionTicket) → userID
25:     return OK
26:   end if
27: end if
28: *Comment: For Non-Members*
29: FSPlist ← getTrustTicket(userID)
30: while FSPlist ≠ EmptyList do
31:   Mi = rmHeadOfList(FSPlist)
32:   remAttrList3 ← NEGOTIATEFSP(CurrFSP, Mi,
33:       userID, userRequest)
34:   if remAttrList3 = NULL then
35:     send(TrustTicket) → userID
36:     return OK
37:   end if
38: end while
39: if remAttrList3 ≠ NULL then
40:   remAttrList4 ← NEGOTIATEUser(CurrFSP,
41:       userID, CurrPolicyFSP)
42: end if
43: if remAttrList4 ≠ NULL then
44:   return Abort-Negotiation
45: else
46:   send(TrustTicket) → userID
47:   return OK
48: end if

Figure 4. Algorithm for negotiating trust in FAMTN.
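The control flow of Figure 4 can be sketched in runnable form. This is an illustrative simplification only: the Fsp class, its attribute bookkeeping, and the assumption that negotiating directly with the user always supplies the missing attributes are ours, not part of the FAMTN specification.

```python
# A runnable sketch of Figure 4's control flow; all classes and helpers here
# are hypothetical stand-ins for the FAMTN components.
import time

class Fsp:
    def __init__(self, services, members=()):
        self.services = set(services)
        self.members = set(members)
        self.session_tickets = {}  # user_id -> session-ticket expiry timestamp
        self.remote_attrs = {}     # (other_fsp, user_id) -> attributes held there

    def is_valid_member(self, user_id):
        return user_id in self.members

    def negotiate_fsp(self, other_fsp, user_id, needed):
        # NEGOTIATEFSP: returns the attributes still missing after asking other_fsp.
        held = self.remote_attrs.get((other_fsp, user_id), set())
        return set(needed) - held

    def negotiate_user(self, user_id, needed):
        # NEGOTIATEUser: assume the user can disclose everything directly.
        return set()

def negotiate(fsp, user_id, service, needed_attrs, trust_ticket=()):
    """Returns 'OK' or 'Abort-Negotiation', mirroring Figure 4."""
    if service not in fsp.services:                      # lines 2-4
        return "Abort-Negotiation"

    if fsp.is_valid_member(user_id):                     # members, lines 6-27
        expiry = fsp.session_tickets.get(user_id)
        if expiry is not None and time.time() < expiry:  # valid session ticket
            return "OK"
        remaining = fsp.negotiate_fsp("member-home-fsp", user_id, needed_attrs)
        if remaining:                                    # home FSP lacked some
            remaining = fsp.negotiate_user(user_id, remaining)
        if remaining:
            return "Abort-Negotiation"
        fsp.session_tickets[user_id] = time.time() + 3600  # send session ticket
        return "OK"

    remaining = set(needed_attrs)                        # nonmembers, lines 29-48
    for prev_fsp in trust_ticket:                        # FSPs from the trust ticket
        remaining = fsp.negotiate_fsp(prev_fsp, user_id, remaining)
        if not remaining:                                # requirements satisfied;
            return "OK"                                  # send updated trust ticket
    if remaining:
        remaining = fsp.negotiate_user(user_id, remaining)
    return "Abort-Negotiation" if remaining else "OK"
```

In this sketch, a repeat external user whose attributes are already held by a previously visited FSP is served without negotiating with the user at all, which is exactly the efficiency argument the article makes for trust tickets.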

ronments. Building server filters is currently complicated and not feasible for average users. Like privacy preferences, a user should be able to set cookie preferences, specifying more fine-grained conditions. For example,

• accept only signed cookies from a given federation FSP;
• accept cookies from BBB-certified members by negotiating servers' attributes;
• send cookies that don't contain personally identifying information; and
• send cookies to FSPs that aren't in a conflict-of-interest class for the FSP that set the cookie.

We need a policy language to express these preferences that can be integrated with cookies' storage and usage mechanisms.

Negotiation in identity federated systems
The trust-establishment negotiation process depends on the type of user and the history of the user's interactions with the federation members. Algorithm 1 (Figure 4) shows the complete negotiation process developed for FAMTN. It includes all user cases, assuming one federation is in place. Multiple federations with nonempty intersection are outside this article's scope.

Four types of user cases give the basis of the design and analysis of the user–FSP negotiation process. Intuitively, a recent user should obtain service access faster than a new user. The short-termed session tickets help achieve this. Similarly, a repeat user, who has already received services from different FSPs in the federation, should get service access faster than a new external user. This is because the new external user directly negotiates all the required attributes with the FSP, whereas for a repeat user, the FSP can retrieve some of the attributes from FSPs the user has visited before. Information about the previously visited FSPs is in the list of trust tickets, which are retrieved iteratively until user attribute requirements are satisfied. At each iteration, the FSP requiring the user attributes to satisfy its service disclosure policy negotiates with the FSP indicated in the trust ticket. If the retrieved attributes don't suffice, the FSP negotiates directly with the user. Finally, a member user, being internal to the federation and thus more trusted, should have advantages in the negotiation process over a new external (nonmember) user. Indeed, the FSP retrieves the member user attributes di-



Existing federations
Federated identity can deliver several compelling benefits to organizations. Federation makes it possible for local identities and their associated data to stay in place, yet be linked together through higher-level mechanisms. The following are examples of existing federations.

The SWITCHaai Federation (www.switch.ch/aai/documents.html) is a group of organizations (universities, hospitals, and libraries, for example) that have agreed to cooperate on interorganizational authentication and authorization. They operate a Shibboleth-based authentication and authorization infrastructure (see http://shibboleth.internet2.edu).

By using Shibboleth authentication and authorization technology, InCommon (www.incommonfederation.org) facilitates sharing of protected resources, enabling collaboration between InCommon participants that protects privacy. Access decisions to protected resources are based on user attributes contributed by the user's home institution. InCommon became operational on 5 April 2005.

The HAKA Federation in Finland (www.csc.fi/suomi/funet/middleware) entered its production phase in late 2004. The Federation, established in 2003 and based on Shibboleth, currently includes two (of 20) universities and one (of 29) polytechnics as identity providers, and four service providers, including the National Library Portal (Nelli). In Finland, libraries in higher education traditionally cooperate in licensing electronic journals.

The Liberty Alliance Identity Federation Framework (ID-FF) allows single sign-on and account linking between partners with established trust relationships. The Identity Web Services Framework (ID-WSF) lets groups of trusted partners link to other groups and gives users control over how their information is shared. Finally, the Identity Services Interface Specifications (ID-SIS) will build a set of interoperable services on top of the ID-WSF.

rectly from the organizations in the federation with which users are affiliated. This provides an efficient mechanism for retrieving users' attributes because it avoids iterated negotiations among all the FSPs a user has interacted with. Here we assume that the affiliated organization stores and possibly certifies all of the member users' attributes. Member users can also use the session tickets like the external users.

Before we can fully integrate federated IdM systems and trust negotiation, several issues must be addressed, including questions regarding policies—that is, policy compliance and subsumption of policies. The language to define the policies should use vocabulary well understood not only by users and organizations, but by the whole set of organizations. This might not be a realistic assumption, and we need to look into alternatives. Policy languages supporting the specification of credential sharing within a federation don't exist and would be useful for better privacy control in a federation. Another important problem is the representation of attributes. This is essential for efficient lookup if several users are using the system. The attribute's meaning and its underlying logic can also help users infer implications between conditional attributes.

References
1. D.L. Baumer, J.B. Earp, and P.S. Evers, "Tit for Tat in Cyberspace: Consumer and Website Responses to Anarchy in the Market for Personal Information," North Carolina J. Law and Technology, vol. 4, no. 2, 2003, pp. 217–274.
2. H. Skogsrud et al., "Trust-serv: A Lightweight Trust Negotiation Service," Proc. 30th Int'l Conf. Very Large Data Bases, Morgan Kaufmann, 2004, pp. 1329–1332.
3. A. Hess et al., "Content-Triggered Trust Negotiation," ACM Trans. Information Systems Security, vol. 7, no. 3, 2004, pp. 428–456.
4. Liberty Alliance Project, Liberty Trust Model Guidelines, www.projectliberty.org/liberty/content/download/1232/8000/file/liberty-trust-models-guidelines-v1.0.pdf.
5. Liberty Alliance Project, Liberty Alliance ID-WSF 2.0 Specifications, www.projectliberty.org/resource_center/specifications/liberty_alliance_id_wsf_2_0_specifications.
6. V. Samar, "Single Sign-on Using Cookies for Web Applications," Proc. 8th Workshop Enabling Technologies on Infrastructure for Collaborative Enterprises (WETICE), IEEE CS Press, 1999, pp. 158–163.

Elisa Bertino is a professor of computer science and electrical and computer engineering and research director at the Center for Education and Research in Information Assurance and Security (CERIAS) at Purdue University. Her main research interests include security, database systems, object technology, multimedia systems, and Web-based information systems. Bertino has a PhD in computer science from the University of Pisa. She is a coeditor in chief of the VLDB Journal and member of the editorial boards of several publications, including IEEE Internet Computing, IEEE Security & Privacy, and ACM Transactions on Information and Systems Security. She is a Fellow of the IEEE and the ACM. Contact her at bertino@cs.purdue.edu.

Anna Cinzia Squicciarini is a postdoctoral research associate at Purdue University. Her main research interests range from trust negotiations, privacy models, and mechanisms for privilege and contract management in virtual organizations to grid systems and federated identity management. Squicciarini has a PhD in computer science from the University of Milan. Contact her at squiccia@cs.purdue.edu.

Abhilasha Bhargav-Spantzel is a computer science PhD candidate at CERIAS, Purdue University. Her main research interests include identity management, identity theft protection, cryptography, biometrics, and policy languages. Bhargav-Spantzel has a bachelor's degree in computer science and mathematics from Purdue University. Contact her at bhargav@cs.purdue.edu.



Education
Editors: Matt Bishop, bishop@cs.ucdavis.edu
Deborah A. Frincke, deborah.frincke@pnl.gov

Common Body of Knowledge for Information Security

MARIANTHI THEOHARIDOU AND DIMITRIS GRITZALIS
Athens University of Economics and Business

The need for skilled information security professionals has led various academic, governmental, and industrial organizations to work to develop a common body of knowledge (CBK) for the security domain. A CBK is a framework and collection of information that provides a basis for understanding terms and concepts in a particular knowledge area. It defines the basic information that people who work in that area are expected to know.1 The International Information Systems Security Certification Consortium ([ISC]2; www.isc2.org) defines a CBK as a taxonomy of topics relevant to professionals around the world.

Industry has initiated some existing CBK efforts, primarily for use in certification programs such as the Certified Information Systems Security Professionals (CISSP) and Certified Information Systems Auditor (CISA). The (ISC)2 CBK is used for CISSP certification, for example, and focuses on establishing a common framework of information security terms and principles to help information security professionals discuss, debate, and resolve relevant issues. Typical academic examples include CBKs for information assurance,2 information security in network technologies,3 and information security professionals.4 Rather than providing an information security CBK, these efforts focus on individual subdomains and serve as guides to create courses and curricula.2,3 A recent US Department of Homeland Security initiative created a CBK for secure software assurance,1,5 which provides a basis for secure software engineering and procurement of secure software.

Information security is a multidisciplinary endeavor.6,7 In practice, professionals need knowledge and experience from fields such as management, business administration, ethics, sociology, and political science. Yet, existing CBKs focus on specific information security subdomains and thus offer limited understanding and narrow perceptions of the overall domain. Our aim is to identify and define an InfoSec CBK to serve as a tool for developing an information security curriculum.

CBK development
We began by identifying industry and academic views on the issue of an InfoSec CBK. We first conducted a review of curricula and courses. Various curricula deal with InfoSec topics, but we focused on university programs related to the "information scene"—computer science, business administration, information systems management, sociology, and law programs that offer academic degrees in, or courses related to, InfoSec. In total, we studied the programs and courses currently offered at 135 academic institutions. The per-continent geographical distribution was: Africa (10), Asia (10), Australia (19), Europe (53), and South and North America (44). We based our review of these programs and courses on electronically available material on the universities' Web sites, as well as on a study of their syllabi. We found that 15 undergraduate and 45 postgraduate-level programs offer degrees in some aspect of information security. Most of the programs offering information security degrees are run under the umbrella of computer science or computer engineering departments. At the postgraduate level, a usual prerequisite is an undergraduate degree in computer science, electrical and computer engineering, or information systems. Several undergraduate (and graduate) programs offer information-security-related courses, without offering degrees in the subject. Such courses are usually few in number and are often optional. Table 1 describes the course content, which we divided into seven categories.

To identify the essential skill set for information security professionals, we also conducted a survey of industry needs. To that end, we consulted the online material provided by the Career Space Consortium (CSC; www.career-space.com), which includes 11 major information and communications technology (ICT) companies as members, and the European ICT Industry Association (www.eicta.org). CSC defines ICT professional areas and the skills needed for each, but it doesn't define a "pure" InfoSec professional area. To select relevant ICT areas, we studied the content of the security courses we identified previously.

As defined by the CSC, the ICT

64 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Table 1. Information security course content.

• Access control and privacy: Identification, authentication and authentication systems, access control, authorization, anonymity, and privacy
• Risk and attacks: Attacks, vulnerabilities, risks, intrusion detection, malicious software, tests and audits, safeguards, and intrusion handling
• Cryptography: Applied cryptography, digital signatures and certifications, key management, and public-key infrastructures
• Networks: Security theory, protocols and algorithms of networks, and firewalls
• Security design: Computer systems security design
• Business: Business continuity planning
• Ethics and law: Ethical and legal issues
Table 2. Technical and behavioral skills for information security professionals.

• Information Communications Technology: Networks, technology and computer engineering, systems design and architecture, programming, software development, mathematics, statistics, project management, business strategy and requirements analysis, testing, basic security skills, technical documentation, system management, quality assurance, and evaluation and configuration methodologies.
• Security: Information security and human computer interaction, computer forensics, database security and data mining, operation systems security, security architecture, malicious software, Internet and cybersecurity, incident handling, hacking, cryptography, biometric techniques, smart cards, auditing, data and infrastructure protection, and risk management.
• Behavioral: Leadership, ethics, analytical and conceptual thinking, perspicacity, creative thought, knowing one's limits, professional attitude, communication, technical orientation and interest, customer orientation, strategy and planning, writing and presentation skills, efficiency and quality, and applying knowledge.

professional areas that refer to security knowledge include:

• integration and test engineering,
• systems,
• data communications engineering,
• digital signal processing (DSP) application design,
• technical support,
• communication network design,
• software and application development,
• software architecture and design,
• research and technology development,
• ICT management,
• IT business consultancy, and
• ICT project management.

We surveyed the skills needed for each of these ICT categories, as defined by the CSC. To filter and enrich the skill set, we used numerous security job descriptions that we identified from online job postings. Table 2 shows the resulting skill set, which we used as a baseline for developing our proposed CBK.

Structure and content
We developed a hierarchy of concepts and elements from the university curricula we examined and checked it against the knowledge set defined in the industry surveys. Finally, we combined these elements and grouped them under common “roots” (hash algorithms and ciphers under the root cryptography, for example) to create a classification scheme with 10 basic domains:

• Security architectures and models
• Access control systems and methodologies
• Cryptography
• Network and telecommunications security
• Operating system security
• Program and application security
• Database security
• Business and management of information systems security
• Physical security and critical infrastructure protection
• Social, ethical, and legal considerations

We then split the list into subdomains, according to the elements that form each domain; doing so lets us focus in depth on each separately. This CBK aims to reconcile industry and academia needs and reflect information security’s multidisciplinary nature.7–9 It covers multiple dimensions, including governance, ethics, policies, certification, measurement and metrics, and best practices. Moreover, we identified the prerequisite knowledge that professionals should incorporate from several fundamental disciplines: computer science and engineering, law, sociology and criminology, ethics and psychology, business and information systems management, and didactics.

Two domains—“business and management of information systems security” (Figure 1) and “social, ethical, and legal considerations of security” (Figure 2)—demonstrate how multidisciplinary elements can be incorporated into the CBK. Figure 3 illustrates the “cryptography” domain, a topic with mathematical background that’s included in most syllabi. Figure 4 presents the physical security aspect grouped in the “physical security and critical infrastructure protection” domain. Note that some elements are included in more than one domain. The asterisks in the figures illustrate which disciplines (of the seven we identified) each topic mainly derives from.

Figure 1. Business and management of information systems security domain. Among the issues raised in the university curricula and professional duties we surveyed related to organizational issues in information security were personnel security, including training and awareness programs (derived from didactics), and business continuity planning (derived from business management). Asterisks indicate the disciplines from which the domains derive.

Figure 2. Social, ethical, and legal considerations domain. The study of sociology, criminology, and psychology can help security professionals in understanding the motives and social factors that affect human behavior and lead to security violations. Moreover, the study of ethics and legal issues is vital in protecting human rights, and information security professionals should not only comprehend and know the relevant legal issues but also be able to apply them with respect to computing.

We developed this broad CBK primarily as a conceptual tool for designing academic curricula and information security courses. It’s intentionally abstract and generic because it attempts to categorize the required knowledge for the information security area, rather than specific topics of interest. Although any CBK needs to be constantly refined to fit into the field’s emerging context and content, we believe the 10 domains we’ve identified could remain a steady reference set for some time.

Given that this CBK has yet to see thorough use in developing an academic curriculum, evaluation results aren’t available. Our future goal is to use it to develop an MSc program on information security and critical infrastructure protection. Toward that end, we first plan to restructure some existing information security courses based on this CBK—probably a general introductory course (undergraduate at senior level) and a technology-focused postgraduate course that covers material from one or more domains. Once we actually deliver these restructured courses, we will perform a round of refinements and possible improvements to the CBK. We’ll then continue to evaluate it based either on the experience gained from restructuring the rest of the existing courses or on developing essential new ones.

References
1. M. Bishop and S. Engle, “The Software Assurance CBK and University Curricula,” Proc. 10th Colloq. Information Systems Security Education,


Univ. of Maryland, 2006, pp. 14–21.
2. E. Crowley, “Information System Security Curricula Development,” Proc. 4th Conf. IT Curriculum, ACM Press, 2003, pp. 249–255.
3. K. Morneau, “Designing an Information Security Program as a Core Competency of Network Technologists,” Proc. 5th Conf. IT Education, ACM Press, 2004, pp. 29–32.
4. E. Smith et al., “Information Security Education: Bridging the Gap between Academic Institutions and Industry,” Proc. 4th IFIP World Conf. Information Security Education, Moscow Engineering Physics Institute, 2005, pp. 45–55.
5. S. Redwine, ed., Secure Software Assurance: A Guide to the CBK to Produce, Acquire and Sustain Secure Software, US Dept. of Homeland Security, 2006.
6. D. Gritzalis, M. Theoharidou, and E. Kalimeri, “Towards an Interdisciplinary InfoSec Education Model,” Proc. 4th IFIP World Conf. Information Security Education, Moscow Engineering Physics Institute, 2005, pp. 22–35.
7. C. Cresson-Wood, “Why Information Security is Now Multi-Disciplinary, Multi-Departmental, and Multi-Organizational in Nature,” Computer Fraud & Security, Elsevier, 2004, pp. 16–17.
8. C. Irvine, S.-K. Chin, and D. Frincke, “Integrating Security into the Curriculum,” Computer, vol. 31, no. 12, 1998, pp. 25–30.
9. K. Petrova et al., “Embedding Information Security Curricula in Existing Programmes,” Proc. 1st Ann. Conf. InfoSec Curriculum Development, ACM Press, 2004, pp. 20–29.

Figure 3. Cryptography domain. This topic derives from computer science and mathematics, but InfoSec professionals should also examine the legislative and law enforcement issues.

Figure 4. Physical security and critical infrastructure protection domain. Interesting aspects of this domain are the physical threats and vulnerabilities, as well as ethical and legal constraints when applying access control or surveillance in physical sites.

Marianthi Theoharidou is a PhD candidate in the department of informatics at Athens University of Economics and Business. Her research interests include information security management, information security education, risk analysis and management, and spam over Internet telephony. Theoharidou has an MSc in information systems from the Athens University of Economics and Business. She is a student member of the ACM and the IEEE. Contact her at mtheohar@aueb.gr.

Dimitris Gritzalis is an associate professor of information and communications technology security and the director of the Information Security and Critical Infrastructure Protection research group with the department of informatics at Athens University of Economics and Business. His research interests include information security management, information security education, security and privacy in ubiquitous computing, and spam over Internet telephony. Gritzalis has a PhD in security critical information systems from the University of the Aegean, Greece. He is the editor of Secure Electronic Voting (Kluwer, 2003). Contact him at dgrit@aueb.gr.



On the Horizon
Editor: O. Sami Saydjari, ssaydjari@cyberdefenseagency.com

Secure Communication without Encryption?

KEYE MARTIN
US Naval Research Laboratory

The potential computational speedup that quantum algorithms offer in certain problems threatens the security of current cryptographic techniques that rely on the infeasibility of factoring large numbers. But the same technology that currently threatens public-key infrastructure also provides a seeming alternative: a protocol for quantum key distribution (QKD), which provides a secure method for establishing a secret key between two participants. These two people can then use this key to encrypt information, providing them with the ability to communicate securely. QKD has been implemented numerous times, and it’s commercially available.

Recent investigations1 reveal that fundamental quantum components such as QKD can be manipulated in ways not anticipated until recently. Using any of several techniques ranging from simple covert tricks to more intricate aspects of quantum information, someone can use a quantum protocol to obtain a new protocol that’s physically indistinguishable from the original, but that also contains an information channel whose existence is undetectable by any currently known technology. Such “hidden channels” could potentially provide secure communication without encryption: the protection that quantum mechanics offers to keys could extend to the information transmitted during communication itself, making key-based encryption unnecessary.

A hidden channel
A hidden channel offers two people a way to communicate when they don’t want anyone else to know they’re communicating. What we consider shortly is a hidden channel within QKD. Two people, Alice and Bob, will engage in a typical instance of QKD that, to an outside observer, looks like any other. But when the session is over, they will have secretly communicated. Put succinctly, Alice and Bob say one thing but mean another.

Like all systems, a quantum system has state. A quantum system’s state is represented mathematically by vectors such as |0⟩ and |1⟩ or |–⟩ and |+⟩. We think of the first two states as quantum realizations of the classical bits 0 and 1, but we also think the same about the second pair of states. These states are examples of what are often called qubits. Suppose that someone sends us a qubit |*⟩ that represents either a 0 or a 1. How can we figure out which bit it represents? The short answer is that we can’t! For us, as people who would like to communicate, this is annoying, but it’s equally annoying to an eavesdropper, and this is a good thing.

To determine the value of a bit that someone sends us, we also need to know in which basis it was prepared. One example of a basis is the X basis, X = {|+⟩, |–⟩}, and another is the Z basis, Z = {|0⟩, |1⟩}. If we have a qubit in our possession, and we know the basis in which it was prepared, then we can interact with the system by performing a measurement on it, the result of which will be the classical bit’s value. This is why we must know the basis: it tells us the type of experiment we need to perform in a laboratory. Remember, our bits may be represented by photons!

Now think of Eve, an eavesdropper trying to listen in on Alice and Bob. Alice sends many qubits |*⟩ to Bob, each one prepared in a different basis. Eve has no idea what basis Alice used, so all she can do is guess and then perform a measurement (qubits can’t be copied); Bob has to do the same thing. But Alice knows the basis used and can tell Bob after the fact without disclosing the identity of any of the bits. Because Bob will guess the correct basis half the time, he should possess half the classical bits sent by Alice. However, if Eve guesses a basis and then measures each qubit as it travels from Alice to Bob, she will introduce errors into the bits that Alice and Bob share. By comparing their respective values for just some of the shared bits, Alice and Bob can detect Eve’s presence.

Now let’s recall one of the standard schemes for QKD, the BB84 protocol:2

1. Alice chooses a random string k of roughly 4n bits containing the eventual key.
2. Alice randomly codes each bit of k in either the X = {|+⟩, |–⟩} or Z = {|0⟩, |1⟩} bases.
3. Alice sends each resulting qubit to Bob.
4. Bob receives the 4n qubits, randomly measuring each in either the X or Z basis.
5. Alice announces in which basis she originally coded each bit of k.
6. Bob tells Alice which qubits he measured in the correct basis (but not the bit values he obtained); they now share roughly 2n bits.
7. Alice selects a subset of n bits from the group she formed in step 6 that they will use to check on interference by an eavesdropper (Eve), and tells Bob which bits she selected.
8. Alice and Bob compare their values of the n check bits; if more than an acceptable number disagree, they abort the protocol (eavesdropping).
9. Alice and Bob perform information reconciliation and privacy amplification to select a smaller m-bit key from the remaining n bits.

Throughout the literature, QKD is loosely considered to be “communication,” but strictly speaking, it really isn’t. Communication is the transfer of information from a sender to a receiver, with the implicit understanding that the information received is independent of any action on the receiver’s part. This independence is what makes communication a worthwhile endeavor: it grants the sender the freedom to say whatever he or she wants. If, for example, you talked to a friend on a noiseless telephone and he received a subset of roughly half the words you spoke, and if this subset were randomly determined by the number of times a hummingbird in the room flapped its wings between words, you wouldn’t feel as though communication were taking place. And that’s exactly what goes on with QKD.

Ideally, Alice hopes Bob will receive the entire bit string k correctly, but even in the absence of both environmental noise and eavesdropping, Bob still receives only half the bits in k, and this half is randomly determined by his ability to guess the basis in which Alice prepared the qubit. The key that Alice and Bob eventually share is thus determined by an interaction between the two parties. So Alice isn’t really sending Bob information, she’s sending him data—to send him information would imply that she decides up front on a message and that he receives it with high probability, independent of his own actions. In other words, communication occurs when Alice has some measure of control over the data Bob receives. To quote from a standard reference on quantum information,2

“Quantum cryptography is sometimes thought of not as secret key exchange or transfer, but rather as secret key generation, since fundamentally neither Alice nor Bob can predetermine the key they will ultimately end up with upon completion of the protocol.”

However, we can easily modify the QKD protocol so that Alice and Bob are able to communicate. Assume Alice wants to send Bob a single bit of information. All we have to do is make a simple change to step 7 from earlier: Alice randomly selects a bit from the group of 2n whose value is the information she wants to transmit. Then she randomly selects n – 1 check bits from the remaining 2n – 1. The nth check bit is chosen from the remaining n + 1 as being the bit to the immediate left of the information. (The case when the 2n bits are all 0 or all 1 is handled with a simple convention—for example, we could choose the nth check bit as the first of the remaining bits to mean 0, whereas choosing it as the last of the remaining bits would signify 1.)

So Bob now has the information Alice sent; he knows its relation to the last check bit because the two parties have agreed on this scheme in advance: they have agreed that Alice will covertly send Bob a pointer to the information. To illustrate how the hidden channel operates, suppose that Alice and Bob share the 2n bits

0010101110010101110101101100

Alice first selects the information bit (shown here in brackets)

00101011100101011[1]0101101100

and then selects n – 1 check bits at random:

00101011100101011[1]0101101100

One by one, these check bits are publicly announced and their values are compared. Alice and Bob now share n + 1 bits. She now selects the last check bit as the pointer to the information; with the announced check bits shown as asterisks, the string is

0*10*01*1*0*0*0**[1]*10*1*11**

Is Bob guaranteed to receive the information Alice sent? No, but he isn’t guaranteed to receive all the bits perfectly in the QKD case either. Suppose an eavesdropper measures only the qubit that holds the information in the wrong basis—if so,
Bob has a 50 percent chance of holding the wrong bit, even though he believes he has the right bit—or suppose background radiation acts as noise and causes the information bit to flip. In either case, Bob would have no idea and neither would Alice, but chances are good that such errors would manifest themselves in the check bits as well, which would then help Bob and Alice determine in advance the likelihood of successful communication. There are, however, major differences between these two formulations of QKD:

• In QKD, Alice sends Bob data. In the modified version, she sends a bit of information—that is, data she has control over. They’re communicating.
• The information they exchange is secure, so in principle, keys aren’t necessary. They’re communicating securely without encryption.
• Communication between them is hidden; to an outside observer, the protocol looks like any other instance of QKD. No observable differences can tip off an eavesdropper that covert communication is taking place. The communication is steganographic.

A critique
It’s worth going into more detail on why this form of steganographic communication is secure. Suppose an eavesdropper, perhaps after reading this article, assumes Alice and Bob are using the channel described earlier to secretly communicate. Alice and Bob don’t have to use the leftmost bit as a pointer—in general, they can use a bit that is displaced any number of positions from the information bit. Knowing this, an eavesdropper would have to determine which displacement they were using, by observing several trial runs of QKD. But if Alice sends bits across the hidden channel with equal frequency (something guaranteed to achieve at least 94 percent of the hidden channel’s capacity), all displacements appear equally likely to an outside observer: the values in all other locations are also equally distributed, due to the fact that k is randomly generated. Eve might then decide to simply try and match the bits in a fixed location to a coding of, say, the English alphabet. The problem for her then is that she won’t know the value of the bit in a fixed location every time QKD is run, but at best, every other time: at least half the bits in the code words she observes will be bits that Eve herself has guessed. The reason why is that quantum mechanics forces Eve to guess a basis before measuring a qubit. In fact, if Alice and Bob wish, they can use a random value of displacement each time a bit is communicated over the hidden channel by using the key generated in the previous QKD session.

Communication without encryption certainly seems possible: the natural encryption that quantum mechanics offers replaces mathematical encryption. But could a scheme like this ever provide a practical alternative to current forms of secure communication? We must resolve some unanswered questions:

• Can a hidden channel in a quantum protocol achieve high enough bandwidth to be a feasible alternative to present schemes? What is its capacity per unit time?
• How will two people communicate in this manner if they’re separated by a large distance, such as an ocean?

Transmitting qubits over large distances would require the use of satellites. If these satellites were placed into geostationary orbit, we might need huge light-gathering mirrors to prevent photon loss, which seems impractical at high orbit. Fortunately, research at Los Alamos National Laboratory has established the feasibility of free-space QKD between a ground station and a low-Earth-orbit satellite.3

To gauge what a hidden channel could offer, we must be able to measure the amount of information we can send through one, but in a way that takes into account the physical effort required to transmit this information. Of particular interest to us is the capacity per unit time—the amount of information the channel can transmit per second.

In information theory,4 capacity is usually calculated in units of capacity per unit symbol—that is, the amount of information transmitted per use of the channel, which implicitly assumes that the cost of sending one symbol is the same as the cost of sending any other symbol. But would you really want someone to tell you that “on average, one bit is transmitted per use of the channel”? One channel could take two days to transmit a bit, another could take less than a fraction of a second—which channel are they describing? Clearly, we’re interested in determining capacity per unit time, which will require us to model the effect of environmental noise on qubits in transit.

It might also require us to understand more subtle aspects. The receiver in this channel is a satellite in low-Earth orbit, and information
theory wasn’t developed on the assumption that the receiver might be moving. Because our satellite is in low orbit, a clock running on it moves slower than one on the ground due to time dilation, which occurs not only because it’s moving relative to Earth, but also because it’s close to Earth (and feels gravitational effects). Thus, capacity per unit time depends on frame of reference. So how do we determine the capacity of a relativistic timing channel? Which clock are we supposed to use? Barring a cascade of such channels, each passing messages from one to the other, we would guess that the relativistic correction to capacity wouldn’t affect our calculations much. However, the relativistic correction to time itself seems especially relevant if the sender wants to time stamp messages before transmitting them to the satellite—the receiver needs to know how the time on his or her clock relates to that on the sender’s, or else time stamping might impart incomplete information. (It’s intriguing to wonder if relativistic effects might also come into play in QKD implementations based on entangled pairs of qubits: photon entanglement can also depend on frame of reference.5)

Let’s now obtain a rough idea of how the hidden channel might perform in a freespace QKD setup. We will assume that each instance of QKD always runs to completion and takes roughly 1 second to do so, with respect to the sender’s clock. Notice that this means we ignore extreme effects by an eavesdropper whose interference causes us to abort the protocol and run it again, thereby at least doubling the time it takes to send a particular bit. It also means that we assume the environment doesn’t generate error rates that require us to abort. Minimally, then, the percentage of errors p in the 2n bits shared by Alice and Bob must be strictly less than 0.25, which is the error rate due solely to Eve when she measures all the qubits sent from Alice to Bob.

In several trial runs of freespace QKD,6 error rates p in the 2n bits shared by Alice and Bob ranged from a high of p = 0.089 to a low of p = 0.006, with the great majority being p = 0.04 or less. A lower bound on the hidden channel’s capacity in this experiment is obtained by calculating the capacity of a binary symmetric channel whose probability of a bit flip is p. Thus, the hidden channel’s capacity ranges from a low of 0.566 bits/sec to a high of 0.947 bits/sec, with an average of roughly 0.75 bits/sec, where the maximum possible capacity is 1 bit/sec. Although space prohibits going into details here, the information Eve can acquire per second ranges from a high of 0.00000043 bits/sec to a low of 0.0000000019 bits/sec. Notice that these results highlight a fundamental difference between classical and quantum communication: as the noise in the hidden channel decreases, an eavesdropper is able to learn less and less. The explanation for this is quantum mechanical in nature: eavesdropping necessitates the introduction of noise.

We’ve purposely avoided one question in this discussion: in the hidden channel, we assumed Alice and Bob had some way of authenticating each other’s identity, but can they do this in a uniquely quantum mechanical manner? Notice that the QKD protocol doesn’t provide a way for either of them to know who they’re talking to. Bob could be engaging in QKD with anyone—he has no way of knowing—and what’s the point of secure communication if you don’t know who you’re talking to?

References
1. K. Martin, Steganographic Communication with Quantum Information, tech. report, US Naval Research Lab., to appear in 2007.
2. M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, Cambridge Univ. Press, 2000.
3. R.J. Hughes et al., “Practical Free-Space Quantum Cryptography,” Proc. Quantum Computing and Quantum Communications: First NASA International Conference, LNCS 1509, Springer-Verlag, 1999, pp. 200–213.
4. C.E. Shannon, “A Mathematical Theory of Communication,” Bell Systems Technical J., vol. 27, July–Oct. 1948, pp. 379–423 and 623–656.
5. A. Bergou, R. Gingrich, and C. Adami, “Entangled Light in Moving Frames,” Physical Rev. A, vol. 68, no. 042102, 2003.
6. J.E. Nordholt and R.J. Hughes, “A New Face for Cryptography,” Los Alamos Science, no. 27, 2002, pp. 68–86.

Keye Martin is a researcher in the Center for High Assurance Computer Systems at the US Naval Research Laboratory. His research interests include relativistic and nonrelativistic quantum information, domain theory, and information hiding. He has a PhD in mathematics from Tulane University. Contact him at kmartin@itd.nrl.navy.mil.
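The capacity figures quoted above can be checked directly. For a binary symmetric channel with crossover probability p, the capacity is 1 − H(p), where H is the binary entropy; under the column's assumption of one QKD session per second, bits per use and bits per second coincide. A quick sketch (not from the article):

```python
import math

def bsc_capacity(p):
    """Capacity in bits per use of a binary symmetric channel with
    crossover probability p: C = 1 - H(p), H the binary entropy."""
    if p in (0.0, 1.0):
        return 1.0
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

# Error rates reported for the free-space QKD trials (reference 6);
# at one hidden-channel bit per 1-second session, bits/use = bits/sec.
for p in (0.089, 0.04, 0.006):
    print(f"p = {p:5.3f}  ->  {bsc_capacity(p):.3f} bits/sec")
```

This reproduces the reported endpoints: about 0.566 bits/sec at p = 0.089 and 0.947 bits/sec at p = 0.006, against a noiseless maximum of 1 bit/sec.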
Privacy Interests
Editors: E. Michael Power, michael.power@gowlings.com
Roland L. Trope, roland.trope@verizon.net

Setting Boundaries at Borders


Reconciling Laptop Searches and Privacy

E. MICHAEL POWER AND JONATHAN GILHEN
Gowling Lafleur Henderson LLP
ROLAND L. TROPE
Trope and Schramm LLP

If you’ve traveled internationally on business, the odds are that you’ve taken your laptop with you. Like most business travelers, you need these ubiquitous devices to do work, make presentations, and communicate with coworkers, family, and friends via the Internet. In a previous department, we explored the notion that laptops deserve special consideration because of the increasingly blurred line between home and office, the entrusting of intimate, private information to storage on laptops, and the resulting need to rethink the rules surrounding reasonable expectations of privacy.1 This time, we examine the nexus between laptops, a government’s search and seizure powers, and a traveler’s transit through an international border checkpoint where customs officials have enhanced powers to search travelers and their belongings.

This collision of interests between a person’s right to be secure from unreasonable searches and seizures and a government’s obligation to protect its borders from the smuggling of illicit materials and other informational contraband via laptops and other storage devices has recently become ripe for decision by courts that must answer three questions:

• How should the law treat a laptop when the government wants to search and seize its contents?
• How should that treatment change when a traveler brings the laptop into a border checkpoint?
• What deference, if any, should the

This last question is particularly vexing in light of the increasing probability that, in the absence of a well-founded, particularized suspicion, most travelers’ laptops will carry personal or commercially sensitive information, and few will be used to smuggle dangerous contraband.

As courts grapple with such questions, they will attempt to compare the unfamiliar to something more familiar, using analogies or metaphors to solve new problems based on earlier solutions. The analogies the courts choose will strongly influence the judicial discourse and potentially alter the decisions rendered. Are laptops merely luggage packed with data that can be easily rummaged in the course of a government search? Should they instead be viewed as personal libraries whose volumes include many unpublished, intensely private writings composed in the belief that, like personal letters, diaries, or attorneys’ trial-strategy notes, they would not be made public, and therefore require protection against warrantless, unreasonable searches and seizures as part of the owner’s basic human rights? Although some might question how laptops could form the

that this debate has already started in the US and Canadian courts.

Border searches
One perspective (known in the US as the border search doctrine) is that a search is reasonable simply because it occurs at the border.2,3 This rests on the view that the government’s interest “in preventing the entry of unwanted persons and effects is at its zenith at the international border.”2 The search of a traveler’s effects and containers such as luggage, pocketbooks, and briefcases—which the courts would elsewhere view as so invasive that it would require a warrant—are deemed so necessary and routine at border checkpoints that they may be conducted without a warrant and without a well-founded, particularized suspicion (a probable cause) to target one traveler instead of another—they can simply target any traveler passing through the checkpoint. However, some border searches (such as strip searches) are so nonroutine, because they are highly intrusive of a person’s privacy and dignity, that US courts require that they be based on what various courts refer to as a heightened level of suspicion, a particularized suspicion,4 or “real suspicion” supported by “objective, articulable facts”5 in order to be reasonable without a warrant.

At issue is whether the border search doctrine should include an exception based on the privacy and confidentiality of information increasingly stored on laptops—a position advanced in the November 2006 case of United States v. Arnold.
courts give to privacy interests at battleground for such an important The Arnold case
border checkpoints? discussion, several cases indicate In July 2005, after a nearly 24-hour,

72 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY
Privacy Interests

coach-class flight from the Philippines, Michael Arnold arrived at Los Angeles International Airport. He collected his luggage and stood in line to go through the customs checkpoint. US Customs and Border Patrol (CBP) Officer Peng asked a few routine questions and inspected Arnold’s luggage and carry-on bag, which contained what many travelers today have in carry-on bags: a laptop computer, a separate hard drive, a memory stick, and several CDs. Officer Peng made the customary request that Arnold turn on the computer to verify that it would operate. She then transferred the laptop to a second CBP officer who noticed numerous icons and folders on the display screen, including two folders labeled “Kodak pictures” and “Kodak memories.” (The term “Kodak memories” is part of popular North American parlance and appears in testimonials by Americans and Canadians on Kodak Web sites, such as www.kodakgallery.com/oprah_bigadventure/ and wwwcaen.kodak.com/CA/en/motion/publication/onFilm/jcLabrecqueQA.jhtml.) Purporting to act on those allegedly suspicious labels, the CBP officers clicked open the folders to view the contents. Among the images, they found one of two naked adult women.

With this discovery, Immigration and Customs Enforcement (ICE) special agents interrogated Arnold for several hours about his laptop’s contents. They expanded the search and found numerous images of what they believed to be child pornography. The ICE agents then seized Arnold’s computer equipment and released him. Two weeks later, federal agents obtained a warrant to search the laptop and storage devices and found additional images.

The government indicted Arnold for possession and transportation of child pornography. Arnold moved to suppress the evidence, claiming the CBP’s search and seizure of his computer equipment violated the US Constitution’s Fourth Amendment (which protects the “right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures”).

The US District Court, Central District of California, granted Arnold’s motion, reasoning that the nature of information entrusted to laptops and other electronic storage devices “renders a search of their contents substantially more intrusive than a search of the contents of a lunchbox or other tangible object.”6 In declining to analogize a computer to a mere storage container, the court justified its actions using a broad range of privacy interests:

“People keep all types of personal information on computers, including diaries, personal letters, medical information, photos, and financial records. Attorneys’ computers may contain confidential client information. Reporters’ computers may contain information about confidential sources or story leads. Inventors’ and corporate executives’ computers may contain trade secrets.”6

The court concluded that the government’s search didn’t originate with a well-founded, reasonable suspicion, and therefore should have complied with the Fourth Amendment’s warrant requirements. The court’s determination of a lack of reasonable suspicion rested on the fact that Officer Peng:

• had only a vague and inconsistent recollection of the circumstances surrounding the search;
• characterized Arnold as “disheveled” and then admitted the term came not from her but from the government’s counsel; and
• selected Arnold because “he did not become agitated when she pulled him aside for secondary questioning”—in her view, US citizens who are so selected “typically become agitated and mad.”6

Further compromising the testimony, Officer Peng’s suspicions consisted of a one-page memo she wrote
nearly a year after the search, at the government’s request and based, in part, on recollections by others.

Most relevant for future cases, the court reasoned that, although not as invasive as a body cavity search, “the search of one’s private and valuable personal information stored on a hard drive or other electronic storage device can be just as much, if not more, of an intrusion into the dignity and privacy interests of a person.”6 The court supported its proposition with the following analogy:

“This is because electronic storage devices function as an extension of our own memory. … Therefore, government intrusions into the mind—specifically those that would cause fear or apprehension in a reasonable person—are no less deserving of Fourth Amendment scrutiny than intrusions that are physical in nature.”6

An analogy between human memory and computer memory is inaccurate and likely misleading for future judicial consideration, but the court was on to something. Because people entrust intimate, sensitive data to laptops, a government’s search of a laptop deserves to be equated to searching a person’s thoughts and thought processes, and that clearly does amount to an invasive search. The court didn’t advocate permissive relaxation of border protection—only that a reasonable ground for suspicion exist if the government wishes to conduct an invasive search of a laptop without first obtaining a warrant.

A further aspect, not before the court in this case, is the plausible scenario of Arnold’s having encrypted all data on his hard drive (except for the file folders visible on the desktop). The CBP officers would have seen nothing when they clicked on the desktop icons. Could CBP have attempted, with or without a warrant, to compel Arnold to disclose the password? The answer would probably depend on whether he had memorized the password or written it down on paper in his possession or in an unencrypted document stored on his laptop. Arnold could refuse to disclose what he had memorized, and probably justify it with a Fifth Amendment right against self-incrimination. If the password were on paper in his possession or stored in a laptop file, the reasoning in Arnold suggests that the court might not require a warrant to search him for the paper, but would still require that CBP have a particularized suspicion (or a warrant) to search the unencrypted laptop files for the password because by encrypting the data Arnold would demonstrate both a subjectively and objectively reasonable expectation of privacy. Moreover, it’s doubtful that CBP could legally compel Arnold to assist in their search by identifying the paper or the file containing the password.

Northern perspective
To put a discussion of the American positions in perspective, it might be useful to consider the same issue outside the US. The Supreme Court of Canada took the opportunity to rule on the constitutionality of warrantless customs searches in R. v. Simmons.7 Writing in regard to section 98 of the Customs Act, dealing with searches of the person, the Court noted:

“The dominant theme uniting [the American] cases is that border searches lacking prior authorization and based on a standard lower than probable cause are justified by the national interests of sovereign states in preventing the entry of undesirable persons and prohibited goods, and in protecting tariff revenue. These important state interests, combined with the individual’s lowered expectation of privacy at an international border render border searches reasonable under the Fourth Amendment. In my view, the state interests enunciated throughout the American jurisprudence that are deemed to make border searches reasonable are no different in principle from the state interests which are at stake in a Canadian customs search for illegal narcotics. National self-protection becomes a compelling component in the calculus…. Consequently, travelers seeking to cross national boundaries fully expect to be subject to a screening process.”7

Although the Canadian Supreme Court didn’t go so far as to hold that searches at the border are reasonable by the very fact that they occur at the border, it did accept that a lower threshold, including search without a warrant, does not offend section 8 of the Canadian Charter of Rights and Freedoms (a provision comparable to the US Fourth Amendment). When reasonable suspicion has been formulated, both objectively and subjectively, the search may be more intrusive.
In Simmons, the Court also held that personal searches will engage the right to counsel in order to ensure that the person being searched has the procedural protections afforded under the law and that Canadian customs officers have reasonable grounds. (The denial of the right to retain and instruct counsel combined with customs officers’ failure to properly inform the detainee of her rights under the Customs Act made the search unreasonable in Simmons.) Similarly, the Court held that, although the authority provided in the Customs Act to search a person or their goods at the border doesn’t offend the Charter, the way in which the search is carried out can. For highly invasive searches, additional protections will be required:

“Searches of the third or bodily cavity type may raise entirely different constitutional issues, for it is obvious that the greater the intrusion, the greater must be the justification and the greater the degree of constitutional protection.”7

Canadian courts have yet to address personal information contained on a person’s laptop computer. If they take the same position as the US District Court in Arnold, such searches might require greater justification and benefit from greater constitutional protection.

At one level, laptops and computer devices simply store data; at another level, that data represents our intimate thoughts, hopes, dreams, and desires. When new technologies and new means of communication enhance our ability to express ourselves, people commonly explore them with insufficient regard to the privacy risks involved. That doesn’t mean abandoning privacy any more than someone who forgets to look both ways before crossing a street intends to sacrifice their safety. In a world in which threats of terrorism are real and continue to proliferate, a government seeking to defeat them must try to deprive the perpetrators of their most useful tools. At border “ports of entry,” government agents search for such tools and evidence that can provide early warning of emerging and imminent threats. If Arnold withstands appeal, the case provides a cogent reminder that privacy can be protected against ill-grounded—and thus unreasonable—searches and seizures without compromising national security. The treatment of personal and commercial information on laptops should not be linked to the device’s location but rather to the sensitivity of the information it contains. As the Arnold court observed, “as a search becomes more intrusive, it must be justified by a correspondingly higher level of suspicion of wrongdoing.”6 Given that searching a laptop’s contents approximates a kind of delayed mind reading, the privacy interests at stake are among those most deserving protection.

References
1. R.L. Trope and E.M. Power, “Lessons for Laptops from the 18th Century,” IEEE Security & Privacy, vol. 4, no. 4, 2006, pp. 64–68.
2. United States v. Flores-Montano, US Reports, vol. 541, 2004, p. 149.
3. United States v. Ramsey, US Reports, vol. 431, 1977, p. 606.
4. United States v. Guadalupe-Garza, Federal Supplement, 2nd Series, vol. 421, 1970, p. 876 (9th Circuit Court).
5. United States v. Rodriquez, Federal Supplement, 2nd Series, vol. 592, 1979, pp. 553–556 (9th Circuit Court).
6. United States v. Arnold, Federal Supplement, 2nd Series, case no. 05-00772, 02 Oct. 2006 (US District Court, Central District of Calif.).
7. Regina v. Simmons, Supreme Court Reports, vol. 2, 1988 (Canada).

E. Michael Power is a partner in the Ottawa, Canada, office of Gowling Lafleur Henderson LLP, where he provides strategies and legal advice on technology, privacy, regulatory, and information management issues. He has a BA, an MBA, and an LLB from Dalhousie University, Canada. Power is a coauthor (with Trope) of Sailing in Dangerous Waters: A Director’s Guide to Data Governance (American Bar Association, 2005). Contact him at michael.power@gowlings.com.

Roland L. Trope is a partner in the New York City office of Trope and Schramm LLP and an adjunct professor in the Department of Law at the US Military Academy. He has a BA in political science from the University of Southern California, a BA and an MA in English language and literature from Oxford University, and a JD from Yale Law School. Trope coauthored the treatise Checkpoints in Cyberspace: Best Practices for Averting Liability in Cross-Border Transactions (American Bar Association, 2005). Contact him at roland.trope@verizon.net.

Jonathan Gilhen is a student-at-law at Gowling Lafleur Henderson LLP. His research interests include competition and antitrust law, corporate finance and securities regulation, regulation of financial institutions, and corporate taxation. He has a BA in economics from Saint Mary’s University in Halifax, Canada, an MA in economics from the University of Victoria, and an LLB from the University of Ottawa. Contact him at jonathan.gilhen@gowlings.com.
Crypto Corner
Editors: Peter Gutmann, pgut001@cs.auckland.ac.nz • David Naccache, david.naccache@ens.fr • Charles C. Palmer, ccpalmer@us.ibm.com
When Cryptographers Turn Lead into Gold

Patrick P. Tsang, Dartmouth College

At its core, a cryptographer’s job is to “transmutate” trust: just as alchemists turn lead into gold, cryptographers transmutate trust in one or more assumptions into trust in some other, simpler and better-defined assumptions—the ones on which the security of complex monolithic systems relies. Because we can enforce and verify the resulting assumptions’ validity more easily, such transmutation makes those systems more secure, with higher assurance. Unlike alchemists, though, cryptographers have successfully constructed some of the building blocks (such as public-key encryption and digital signatures) that play a make-or-break role in many of today’s security-critical infrastructures.

In this installment of Crypto Corner, we’ll look at how cryptographers transmutate trust, identify some of the reasons why they sometimes fail, and investigate how they could do a better job.

The transmutation of trust
The idea of cryptographers as trust alchemists is best illustrated with a simple scenario. Let’s say that Alice wants to deliver messages to Bob secretly over an insecure communication channel wherein adversaries might passively listen or even actively drop and inject messages. To help Alice, cryptographers have constructed secure public-key encryption schemes, the confidentiality property of which guarantees the following: without knowledge of Bob’s private decryption key, adversaries—even those who can study the decryption of arbitrarily chosen ciphertexts—can’t extract a single bit of information about the underlying plaintext message from any ciphertext encrypted under Bob’s public encryption key. Alice can therefore encrypt messages for Bob before sending them to him.

If we assume that the public key Alice used during encryption was indeed Bob’s, then the problem of how to communicate securely is solved—cryptographers turned Alice’s need to trust that some arbitrary system will magically provide message confidentiality into trust that the public key she used was Bob’s, which is a much simpler and better-defined assumption. Alice doesn’t need to cross her fingers anymore after the transmutation.

Or does she? Can she really trust that she’s using Bob’s public key? Let’s summon our trust alchemists again, who this time conjure up a secure digital signature scheme. Secure digital signatures must be unforgeable: without the knowledge of Carol’s private signing key, no adversary can forge a signature on Carol’s behalf.

Now let’s assume the existence of a third party—called the certification authority (CA) in public-key infrastructure (PKI) nomenclature—who digitally signs certificates that bind public keys to identities only for those users who can authenticate to the CA both their identities and public-key ownership. Consequently, if a valid certificate accompanies Bob’s public key, Alice can be certain that the key she has is Bob’s. At this point, Alice’s trust is further transmutated into trust that the CA performs its job correctly and honestly.

But should Alice trust the CA? She might still be unhappy because she’s afraid that the CA could be tricked into signing a certificate for Eve, who pretends to own Bob’s public key. Cryptographic zero-knowledge and proof-of-knowledge protocols can help here, but ultimately cryptographers only transmutate trust—they don’t create it. At the end of the transformation chain, we must find a root of trust whose establishment we can verify without cryptography, be it via technology, economics, laws and policies, or a mix of these.

How well have our trust alchemists done?
If a sophomore alchemist could blow up her neighborhood while trying to turn lead into gold, you might also wonder if cryptographers have done a good job in securing systems. The answer “not quite” probably won’t surprise you. As we all know, system security is a difficult problem to solve because of its inherent lack of robustness—a system is only as secure as its weakest link, so even a slight flaw in the transmutation can break an entire system’s security, rendering the transmutation a total failure. Let’s look at which ingredients are currently missing in the
recipe for trust transmutation, why dent of the secret input’s value—for gramming languages, and compilers
they’re missing, and how our trust example, by padding dummy opera- to the software implementation.
alchemists can do a better job. tions into the algorithm. However,
timing attacks still exist and can Provable but
Side channels strike in new and surprising ways, uninstantiatable security
Advances in provable security in the such as when Hash functions are a good example of
past 20 years have resulted in a quan- how cryptographic protocols that are
tum leap in the security assurance of • dummy operations get optimized provably secure in research papers
many cryptographic protocols. Cryp- away by “smart” compilers, might fall short of finding an instanti-
tographers have moved from the “if- • adversaries launch timing attacks ation in the real world, thus leaving
cracked-then-fix-and-repeat” targeted at Secure Sockets Layer doubts about the security of their im-
construction paradigm to emphasiz- (SSL) Web servers from any re- plementation. A vital building block
ing precise formalisms of security mote location,1 in many cryptographic protocols is
models that capture powerful and yet • interkeystroke timing reveals a the use of collision-resistant hash
realistic adversarial capabilities. password as it’s typed in a Secure functions, whose existence had been
Today’s security formalisms, Shell (SSH) session,2 or taken for granted until recently when
however, still mostly focus on the • cache hit-and-miss attacks3 and researchers started realizing that such
protocol level. Perhaps this is in- branch-prediction attacks4 exploit functions had become an endangered
evitable because the abstraction that the underlying architecture’s effi- species. (A nail in MD5’s coffin came
cryptographers typically use— ciency features. last year, when anyone could find a
namely, modeling Alice and Bob as new collision in tens of seconds on a
Turing machines or probabilistic These scenarios suggest that trust al- laptop. Finding a SHA-1 collision still
polynomial-time algorithms—pre- chemists who think only at the pro- takes 268 steps today, although most
vents them from governing, let alone tocol level can’t defend against people believe it’s only a matter of
reasoning, how the algorithms they side-channel attacks. Rather, if time before the SHA family leaves us
devised should run as physical oper- cryptographers want to successfully as well. Two earlier Crypto Corner
ations in the machines we use today. secure systems through trust trans- installments covered this issue com-
Side channels are adversarial ca- mutation, their solutions must take prehensively.)5,6 The consequence of
pabilities that a security model into account the whole spectrum of such an extinction will be devastat-
can’t capture because of the exis- the problem, from the underlying ing; many cryptographic protocols,
tence of implementation-specific architecture, operating system, pro- including most encryption and digi-
surface areas below the abstraction.
Timing information, power con-
sumption, and even the acoustic
waves generated during algorithm
execution are a few examples of po-
tential attack surfaces. Let’s look
more closely at timing attacks,
which exploit the trivial fact that it
takes an algorithm different amounts
of time to complete for different in-
puts. A timing profile of multiple
algorithm runs gives statistically cor-
related information about the input
to the algorithm. Here, an input
could be external (such as a private
decryption key) or internal (such as
the randomness used). In either case,
as long as the input isn’t supposed to
fall into an attacker’s hands, it’s a juicy
piece of information.
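The timing-profile idea is easy to demonstrate. The following toy experiment (my own sketch, not from the article) times a byte-by-byte secret comparison that returns early at the first mismatch: guesses that share a longer prefix with the secret tend to take longer, which is precisely the statistical leak at issue. Python's hmac.compare_digest is a constant-time alternative that closes this particular channel.

```python
import hmac
import time

SECRET = b"s3cretk3y"

def naive_equal(a: bytes, b: bytes) -> bool:
    # Early-exit comparison: runtime grows with the length of the
    # matching prefix, which is exactly what a timing attacker measures.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def time_guess(guess: bytes, trials: int = 2000) -> float:
    # Build a timing profile from many runs and take the median
    # to average out measurement noise.
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        naive_equal(SECRET, guess)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# A guess sharing a longer prefix with the secret tends to take longer:
wrong_early = b"x" * len(SECRET)   # mismatch at byte 0
wrong_late = b"s3cretk3x"          # mismatch only at the last byte
profile = {wrong_early: time_guess(wrong_early),
           wrong_late: time_guess(wrong_late)}

# The fix: a comparison whose duration doesn't depend on where inputs differ.
assert not hmac.compare_digest(SECRET, wrong_late)
```

On a quiet machine the late-mismatch guess usually shows a slightly higher median; a real attacker needs far more samples, but the principle is the same statistical correlation the column describes.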
In theory, timing attacks are an
old and solved problem. All you have
to do is ensure that the time an algo-
rithm takes to complete is indepen-

www.computer.org/security/ ■ IEEE SECURITY & PRIVACY 77


Crypto Corner

tal signature schemes in use today, are shop/index.html for an example) or impact on everyone and everything,
proven secure under the assumption redesign the recipes so that they from buying a book online to de-
that the underlying hash functions are don’t require magic, by, for instance, fending a nation. It deserves curious
collision resistant. If these protocols constructing cryptographic proto- and skeptical eyes. A utopia we can
are instantiated with hash functions in cols whose security we can prove only dream of today—but should
which it’s possible to find collisions, without assuming random oracles. strive hard to achieve—is a world in
we won’t know if they’re secure any- which your mother says she prefers
more. In fact, Arjen Lenstra and his A study of ciphers the Cramer-Shoup cryptosystem10
colleague7 constructed two distinct In applied cryptography, multiple over RSA-OAEP because the secu-
X.509 certificates whose hashes col- professional disciplines such as math- rity of the former doesn’t rely on ran-
lide to demonstrate that MD5 colli- ematics, computer science, and dom oracles.
sions can violate the unforgeability engineering converge into one com-
guarantee of digital signatures and munity. Although cryptographers Randomness
thus the principles underlying the communicate their ideas and argu- Randomness is a must-have ingredi-
trust in PKI. ments into precise models, algo- ent for trust transmutation—it’s a
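The reason collision resistance erodes is the birthday bound: a brute-force search finds a collision in roughly the square root of the number of possible digests. As a hedged illustration (my own toy code, unrelated to the Lenstra and de Weger certificate construction), the sketch below finds a collision on SHA-256 truncated to 32 bits in about 2^16 attempts.

```python
import hashlib

def truncated_hash(data: bytes, nbytes: int = 4) -> bytes:
    # A deliberately weakened hash: only the first 4 bytes (32 bits)
    # of SHA-256. The birthday bound predicts a collision after
    # roughly 2^16 distinct inputs.
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 4):
    seen = {}  # maps truncated digest -> the input that produced it
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        digest = truncated_hash(msg, nbytes)
        if digest in seen:
            return seen[digest], msg  # two distinct colliding inputs
        seen[digest] = msg
        counter += 1

m1, m2 = find_collision()
assert m1 != m2
assert truncated_hash(m1) == truncated_hash(m2)
```

Doubling the truncation to 64 bits already pushes the same search to about 2^32 work; a full 256-bit output puts it far out of reach, unless structural attacks (as happened to MD5) shortcut the birthday bound.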
Another uninstantiability com- rithms, theorems, and proofs, people crucial, yet delicate resource for most
monly found in cryptographic con- outside this community tend to find cryptographic protocols. Without
structions is also related to hash cryptography research papers as illeg- the ability to pick elements from cer-
functions. We can prove a certain class ible as ciphertexts output from a se- tain sets uniformly at random, there
of cryptographic protocols’ security cure encryption. Few people are simply wouldn’t be secure public-
under the so-called Random Oracle willing to go beyond the threshold, key encryption or digital signatures.
Model (ROM), in which black-box the consequence of which is twofold. Consider an extreme situation in
entities known as random oracles an- First, the language cryptogra- which an adversary knows the entire
swer queries with truly random but phers speak is lost when it’s translated randomness used to generate a DSA
consistent responses. Nonetheless, into implementation languages; sys- signature—he or she could easily re-
no function behaves exactly the same tem developers misunderstand it, cover the private key from the signa-
as a random oracle does in reality. As a which routinely results in the inse- ture and start forging signatures
result, cryptographers suggest the use cure realization of secure protocols. universally. The fact that DSA signa-
of collision-resistant hash functions as As an example, Daniel Bleichen- tures are provably unforgeable no
a substitute when implementing bacher forged “provably unforge- longer guarantees the impossibility of
these protocols. How secure these able” RSA signatures thanks to a forgeries because the assumption
implementations actually are, how- flaw in the signature verification al- made in the security model about the
ever, is uncertain because the use of gorithm’s implementation (www. availability of a true random source
hash functions instead of random ora- mail-archive.com/cryptography@ whose process is unobservable to
cles violates the premises in the metzdowd.com/msg06537.html). anyone other than the signing entity
proofs. In fact, protocols exist that are Such an error could have been pre- ceases to hold.
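Operationally, the random oracle postulated by the ROM answers each fresh query with truly random bits and repeats itself on repeated queries. The lazily sampled table below (a standard proof device, sketched in my own code) implements exactly that behavior, and makes plain why no fixed, publicly computable hash function can realize it: the answers are flipped at query time rather than determined by any formula.

```python
import os

class RandomOracle:
    # A lazily sampled random oracle: on a fresh query, draw a truly
    # random answer; on a repeated query, return the stored answer.
    # This is the idealized black box the ROM assumes.
    def __init__(self, out_len: int = 32):
        self.out_len = out_len
        self.table = {}

    def query(self, msg: bytes) -> bytes:
        if msg not in self.table:
            self.table[msg] = os.urandom(self.out_len)  # fresh coins
        return self.table[msg]

oracle = RandomOracle()
a = oracle.query(b"hello")
b = oracle.query(b"hello")
c = oracle.query(b"world")
assert a == b        # consistent on repeated queries
assert len(a) == 32  # fixed-length, uniformly random output
```

Replacing oracle.query with, say, SHA-256 changes the object's nature: the "answers" become a fixed public function, which is exactly the substitution that invalidates the premises of ROM proofs.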
proven secure under ROM but are vented had the developer realized Despite its importance, few peo-
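The DSA failure mode described in the Randomness section, where an adversary who learns the per-signature randomness recovers the private key, is one line of modular algebra: from s = k^-1 * (h + x*r) mod q, a known k yields x = (s*k - h) * r^-1 mod q. Below is a toy sketch with deliberately tiny, hardcoded parameters of my own choosing; real DSA uses 2,048-bit parameters, but the algebra is identical.

```python
import hashlib

# Toy DSA parameters (my own choice): q = 101 is prime,
# p = 607 = 6*q + 1 is prime, and g = 2**6 mod p = 64 has order q.
p, q, g = 607, 101, 64

def h(msg: bytes) -> int:
    # Toy digest reduced mod q (real DSA truncates SHA-2 to |q| bits).
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % q

def sign(msg: bytes, x: int, k: int):
    # Standard DSA signing equations; k is the per-signature randomness.
    r = pow(g, k, p) % q
    s = (pow(k, -1, q) * (h(msg) + x * r)) % q
    return r, s

x = 57                     # private key
k = 23                     # the "random" nonce; assume it leaks
msg = b"transfer $100"
r, s = sign(msg, x, k)

# From s = k^-1 * (h + x*r) mod q, a leaked k gives the private key:
x_recovered = ((s * k - h(msg)) * pow(r, -1, q)) % q
assert x_recovered == x
```

The same algebra breaks signatures whose nonces are merely biased or reused across messages, which is why the unobservable-randomness assumption in the security model is load-bearing.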
shown to be insecure when we in- the subtlety of cryptographic proto- ple worry about this ingredient’s
stantiate the random oracles with col security and had thus put extra quality and supply—is the random-
collision-resistant hash functions.8 effort into ensuring the implemen- ness we get from a random source
Cryptographers continue to wage a tation correctly preserved the se- truly random, and could we ever run
heated debate on the ROM’s merits: mantics of every symbol and out of it? Most PC architectures today
one school of thought believes that statement in the protocols. aren’t equipped with a hardware ran-
proving security in the ROM pro- Second, security proofs in cryp- dom number generator, which
vides no practical security assurance, tography papers are so esoteric that makes it difficult to securely provide
whereas another believes that security few people actually read them. The randomness to cryptographic soft-
in the ROM can at least serve as a fact that it took seven years before ware running on these machines.
good heuristic. anyone found a flaw in the proof for Linux, for example, relies on extract-
Provably secure cryptographic the RSA-OAEP encryption9 sur- ing randomness from the entropy
protocols that aren’t instantiable are prised everyone (at the time, this en- gathered from random system events
like transmutation recipes that re- cryption helped secure e-commerce such as user input, disk access, and
quire magic beans. To solve the transactions). That only the experts network traffic. Nevertheless, ran-
problem, we must either find ways need to understand and review cryp- domness could be exhausted before
to synthesize these magic beans (see tography is an unfortunate miscon- it’s replenished, especially on systems
www.csrc.nist.gov/pki/HashWork ception: cryptography has a direct with few entropy sources, such as net-

78 IEEE SECURITY & PRIVACY ■ MARCH/APRIL 2007


Crypto Corner

work routers and sensor nodes that 3. D. Page, Theoretical Use of Cache Springer-Verlag, 2005, pp. 267–279.
don’t have user inputs or disks. Under Memory as a Cryptanalytic Side 8. R. Canetti, O. Goldreich, and S.
such circumstances, the running Channel, tech. report CSTR-02- Halevi, “The Random Oracle
process would either have to halt until 003, Dept. Computer Science, Methodology, Revisited,” J. ACM,
enough randomness becomes avail- Univ. of Bristol, June 2002. vol. 51, no. 4, 2004, pp. 557–594.
able or use insecure randomness. 4. O. Aciiçmez, C. Kaya Koç, and J.-P. 9. V. Shoup, “OAEP Reconsidered,”
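Entropy gathering of the kind described here rests on randomness extraction, that is, distilling nearly uniform bits from biased physical events. The sketch below is my own illustration of the classic von Neumann extractor, not the Linux kernel's actual design (the kernel mixes events into an entropy pool with hash functions). It shows both the debiasing and the cost: most of the raw input is discarded, which is the exhaustion problem in miniature.

```python
import random

def von_neumann_extract(bits):
    # Pair up raw bits: 01 -> emit 0, 10 -> emit 1, 00/11 -> discard.
    # For independent flips with any fixed bias p, P(01) == P(10),
    # so the emitted bits are exactly unbiased.
    out = []
    for b1, b2 in zip(bits[0::2], bits[1::2]):
        if b1 != b2:
            out.append(b1)
    return out

# A heavily biased source: about 90% ones (seeded for reproducibility).
rng = random.Random(1234)
raw = [1 if rng.random() < 0.9 else 0 for _ in range(100_000)]

clean = von_neumann_extract(raw)
ones = sum(clean) / len(clean)

# Raw bias is ~0.9; the extracted stream sits near 0.5 -- at a steep
# cost, since every surviving pair yields just one output bit.
assert 0.45 < ones < 0.55
assert len(clean) < len(raw) / 2
```

With a 90/10 source, only about 18 percent of the bit pairs survive, so a device with few entropy sources can indeed fall behind demand, as the column notes.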
Yogi Berra once said, "In theory, there is no difference between theory and practice, but in practice, there is." As we've seen in this article, although we've had some initial success in transmutating trust using cryptography at a theoretical level, we're not quite there yet in terms of making system security bulletproof in practice. To close that gap, we must incorporate the hardware architecture into our design and model, develop new skills to formalize implementation-specific adversarial capabilities, take into account the underlying hardware's features and weaknesses, and, if beneficial, augment the architecture with security-enhancing mechanisms. Moreover, we must make cryptography more accessible so that system security can benefit from the awareness of a bigger community, including systems designers, application developers, hardware architects, and end users. I'm not so much interested in being able to turn lead into gold—not only because I'm not convinced it will ever be possible, but also because of the insecurity I would feel once I became too rich. Rather, I'm confident in, and look forward to, trust alchemists doing an increasingly better job of making the world a more secure and better place to live for everybody.

Patrick P. Tsang is a PhD student at Dartmouth College. His research interests include applied cryptography, security and privacy, and trusted computing. Tsang has an MPhil in information engineering from the Chinese University of Hong Kong. He's a member of the International Association for Cryptologic Research (IACR). Contact him at patrick@cs.dartmouth.edu.


Secure Systems
Editor: S.W. Smith, sws@cs.dartmouth.edu

A Case (Study) For Usability in Secure Email Communication

APU KAPADIA
Dartmouth College

As a network security researcher, I find it very disappointing that most users can't, or simply don't, secure their everyday Internet communications. For good reason, usability in security has received a fair deal of attention in the past few years (see the September 2004 special issue on this topic1). To push the issue further, I decided to initiate my own informal case study on the usability and practical relevance of standard security mechanisms for email communication.

I focused my attention on available public-key cryptography techniques for digitally signing and encrypting email. My first step was to establish a public–private key pair to use with email. I chose to use Secure/Multipurpose Internet Mail Extensions (S/MIME), a standard for signing and encrypting email, because it's already supported by popular email clients such as Apple Mail, Outlook Express, and Mozilla's Thunderbird. Unlike S/MIME, Pretty Good Privacy (PGP) and the GNU Privacy Guard (GPG) proved unusable with nontechnical correspondents because they required them to install additional software. S/MIME, it seemed, was the better solution for these "everyday users," for whom the concepts of public-key infrastructure (PKI), PGP, certificates, keys, and so on remain elusive. Additionally, I decided to get my public key certified by Thawte (www.thawte.com), an online certificate authority (CA).

Digital signatures
After months of signing email, I've realized that, currently, everyday users seldom need to do so, as we will see if we examine email signatures more closely. For any message, Alice can use her private key to generate a cryptographic package that a recipient can verify only by using her public key and the original message. This package is called a digital signature and provides two basic properties: nonrepudiation and integrity.

Nonrepudiation
Nonrepudiation is the idea that, in theory, a signer such as Alice can't later deny that she signed the message. For example, occasionally I submit reviews for conference papers over email. I could digitally sign my messages to claim responsibility for my words. But as any security researcher would be quick to point out, digital signatures' nonrepudiability is just an illusion. Alice can always claim that someone stole her private key and that the signature is a forgery. And if that's not enough, Alice can publish her key in The New York Times, letting potentially anybody sign a message using it. In such situations, Alice can be penalized for negligence or irresponsible behavior, but she can't be held responsible for the contents of messages signed with her private key. Even if Bob tries to hold Alice to her original contract by proving that the signature he possesses was created before Alice published her key—perhaps by using a time-stamping service2 or an online notary—Alice can still claim that she didn't know her key was stolen. More sophisticated protocols for nonrepudiation are needed, but as it now stands with standard S/MIME, nonrepudiation for casual email users doesn't work in practice.

Integrity
Forging email messages on today's Internet is surprisingly easy, and forgeries such as phishing emails are a direct threat to everyday users. In theory, if messages are digitally signed, recipients can reject those with spoofed "From" addresses because their signatures won't be valid—that is, only Paypal can sign messages that appear to come from paypal.com. Digital signatures also provide protection against adversaries who modify parts of the message in transit, although I would argue that such email modifications present very little threat to everyday users—for them, digital signatures' main utility is in countering forged sender addresses. In practice, however, digital signatures are a weak line of defense. Phishers can use cleverly crafted email addresses such as customer-service@paypal-help.com to trick users into believing that they're corresponding with Paypal. Because phishers can legitimately own a domain such as paypal-help.com, a phisher can obtain a certificate and generate emails from that domain that have valid signatures (this is just a hypothetical example, but at the time of writing, paypal-help.com was registered under a foreign mailing address).

80 PUBLISHED BY THE IEEE COMPUTER SOCIETY ■ 1540-7993/07/$25.00 © 2007 IEEE ■ IEEE SECURITY & PRIVACY

Any mechanism that combats phishing must look beyond the integrity protection that digital signatures provide. Given that most email that users receive is unsigned, users routinely verify a message's integrity based on its contents and context. In fact, I find myself verifying messages' integrity based on their overall content, even when they are digitally signed. For lack of a better term, I call this form of integrity "semantic integrity," in contrast to the standard notion of (syntactic) integrity that digital signatures provide. When corresponding with familiar people, verifying the semantic integrity of email messages is surprisingly easy—digitally signed or not, strange text from a friend that contains an attached virus looks suspicious. I routinely ignore signatures from family, friends, and acquaintances simply because I'm confident that I can sniff out forgeries.

At this point, I will re-emphasize my focus on everyday users. Certainly, defense contractors, network administrators, and so on are well advised to digitally sign messages to correspondents who expect them. You can instruct employees to reject any message from the security officer without a valid signature—certain job functions rely on baseline security mechanisms for which you can provide training. For everyday users, however, using digital signatures to verify messages' integrity is both overkill and prone to error: the former because using signatures for detecting alterations doesn't address a tangible threat, and the latter because telling everyday users to "ensure that the signature is valid" to detect forgeries is a misguided heuristic. Focusing instead on tools that will help them verify semantic integrity is more promising.

Incrimination
Given that the two most important properties of digital signatures don't seem useful in practice, why might everyday users continue to sign email? The property of incrimination, although anecdotal, has made a lasting impact on my use of signatures and highlights the need for more research on usability in security.

By default, some email clients attempt to digitally sign replies to signed messages. While responding to my signed email, a correspondent who works for the military was told to "insert cryptocard." Because the correspondent was not familiar with digital signatures, I received a reply with a suspicious tone (whether intended or not, this is how I interpreted it). With the prospect of a potentially peeved military official, I found myself obliged to explain that I was not trying to do anything sneaky with government computers, and that the email client was the culprit with its automated behavior. A couple of test emails, with and without signatures, convinced the correspondent of my theory—that the email client was indeed trying to automatically sign replies to my signed messages.

In a separate incident, another correspondent, also unfamiliar with PKI, was facing problems after encountering a certificate signed by an untrusted CA. After clicking on "examine certificate," and a stray click later, my certificate was presented for examination. The email client had automatically obtained this certificate from earlier messages I had signed. From my correspondent's viewpoint, however, the problem with connecting to an untrusted email server was somehow linked to my name. Again, I found myself obliged to explain that I wasn't trying to do anything sneaky with my correspondent's email client. These incidents have taught me an important lesson: sign your messages only to people who understand the concept. Until more usable mechanisms are integrated into popular email clients, signatures using S/MIME should remain in the domain of "power users."

Encryption and the key distribution problem
Now, more than ever, the privacy of our communications is at risk. The government is increasingly interested in our conversations, and in an open system such as the Internet, we must take added measures to ensure our privacy rights. With the confidentiality of my electronic conversations in mind, I convinced some of my research colleagues to encrypt their email conversations with me.

While exchanging public keys, the most important step is to verify that a man-in-the-middle isn't subverting your exchange. If we assume that an adversary can control our conversations, we must verify the exchanged public keys' authenticity. Charlie, a man-in-the-middle, can pretend to be Bob with respect to Alice and Alice with respect to Bob. Alice and Bob then communicate "securely," except that they're both communicating through Charlie without realizing that he's decrypting and re-encrypting their messages. The most secure way for Alice and Bob to verify their keys' authenticity is to do so in person; this, however, is impractical, giving rise to the key-distribution problem: how can users distribute their public keys to other parties reliably? The PKI world has developed two solutions: either rely on a trusted third party (or a more elaborate network of trusted third parties) such as Thawte or VeriSign (www.verisign.com) to certify that your correspondent's public key is bound to his or her identity, or verify the authenticity yourself by checking the public key's fingerprints through an out-of-band (OOB) channel—that is, by a separate means of communication.

Third-party "trust"
Verifying the authenticity of keys with my correspondents was surprisingly error-prone. Let's analyze the PKI solution that relies on CAs first. If Alice's public key is certified (digitally signed) by a CA that Bob trusts, then Bob will accept Alice's certificate as being authentic. If Alice's key is certified by a CA that's not on Bob's trusted list, Bob can try to find a trusted path to Alice's certificate by starting at a CA that he does trust. Let's say that Bob trusts only CA1 and encounters Alice's certificate signed by CA3. Bob can try to find a chain of trust in which CA1 certifies CA2, who in turn certifies CA3 (certificate chains can be much longer in practice). This certificate chain lets Bob establish a path of trust to Alice's certificate, even though he doesn't explicitly trust CA3. PKI proposes meshes of CAs established by certification relationships. Meshes can also include hierarchies of higher-level CAs certifying lower-level CAs, as well as cross-certification authorities, which can bridge trust hierarchies into a mesh to aid in building trust paths.

Although this approach can provide a high level of assurance in enterprise-level communications, it has a few limitations when applied to email exchanges between everyday users. Mainly at fault is the list of "trusted" CAs that the email client's software vendor has pre-installed. A colleague of mine, Scott Rea, calls this a list of "third parties" as opposed to a list of "trusted third parties" because it doesn't correspond to the set of CAs that the email client's users trust. After all, I chose not to get my public key certified by an authority that I had never heard of (and hence didn't trust), but rather had it certified by Thawte. My correspondents, however, don't know my trusted CA a priori. A powerful man-in-the-middle attack could indeed create a bogus certificate for my identity, certified by a malicious CA that I don't trust but that is on the list of installed third-party CAs. Because the S/MIME email client would trust that certificate, were my colleagues accepting a fake certificate signed by another CA, or were they accepting my Thawte certificate?

Clearly, users must first trust the CAs installed in their email clients. Second, if Alice and Bob are exchanging keys, they should use a CA that they both trust. Absent a common trusted CA, the just-mentioned man-in-the-middle attack is still possible, with or without certificate chains. PKI has been plagued by its end-points: by pre-installing third-party CA certificates into email clients without rigorous auditing procedures, vendors are breaking the trust model required for PKI to be successful.

Now, consider enterprise systems, in which organizations can make rigorous policy decisions about a CA's certification procedures and thereby outsource the key-management functions to a trusted CA. They can also make rigorous policy decisions regarding valid trust paths to other CAs. For example, the Higher Education Bridge Certification Authority (HEBCA) has a stringent process of assigning levels of assurance (LOA) to CAs that are part of the bridge. Higher education organizations can then trust HEBCA, and the organizations that are part of HEBCA can trust each other's certificates. In other words, HEBCA "bridges" trust between different organizations operating under their own PKIs by certifying their CAs' practices. Training employees within an organization to recognize valid certificates is feasible because the organization has a financial incentive to do so. Everyday users, however, don't have the time or motivation for rigorous bookkeeping about various CAs' certification procedures.
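The chain-building described above (Bob trusts only CA1, and CA1 certifies CA2, who certifies CA3, who certified Alice's key) can be sketched as a toy path search. This is a simplified model for illustration only; real X.509 path validation also checks signatures, validity periods, revocation, and name constraints:

```python
# Toy model: each certificate is reduced to a subject -> issuer link.
# The CA names mirror the example in the text.
certs = {
    "Alice": "CA3",   # Alice's certificate is signed by CA3
    "CA3": "CA2",     # CA2 certifies CA3
    "CA2": "CA1",     # CA1 certifies CA2
    "CA1": "CA1",     # CA1 is self-signed (a root)
}

def find_trust_path(subject, trusted_roots, issuer_of):
    """Follow issuer links until reaching a CA the verifier trusts."""
    path = [subject]
    while True:
        issuer = issuer_of.get(path[-1])
        if issuer is None:
            return None               # dangling chain: unknown issuer
        if issuer in trusted_roots:
            return path + [issuer]    # chain anchors in a trusted CA
        if issuer == path[-1]:
            return None               # self-signed root, but untrusted
        path.append(issuer)

print(find_trust_path("Alice", {"CA1"}, certs))
# -> ['Alice', 'CA3', 'CA2', 'CA1']
```

With an empty or mismatched trusted-root set, the search fails, which is exactly the situation of two correspondents with no common trusted CA.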


CA-certified keys and trusted paths are less meaningful if users don't understand the certifying CA's procedures and are willing to accept any certificate that their email client trusts. (Note, however, that PKI can be quite successful as a means for an enterprise-level organization to authenticate everyday users—the organization can have rigorous policies about which CAs' certificates it should accept, without including everyday users in these trust decisions.)

As I've argued, exchanging keys using current implementations of S/MIME is risky for everyday users because their trust in their email clients is misplaced. We must take a long-term approach toward building usable key-management methods and educating everyday users about trusting CAs and establishing a common root of trust with their correspondents. An independent organization such as HEBCA can audit CAs carefully and help establish a common root of trust. Users and email-client vendors can then be instructed to trust only CAs with the auditing organization's approval. In the short term, however, because most everyday users don't have mutually trusted CAs, they should use the second solution, fingerprint verification, to foil man-in-the-middle attacks.

Fingerprint verification
A fingerprint is a secure hash of the public key and is a smaller, digested form. Verifying that the exchanged key's fingerprint matches the original key's fingerprint is a much faster way to verify the key's authenticity. The recently proposed concept of key continuity management (KCM)3,4 is an emerging alternative to the CA-based approach. KCM posits that once Bob has verified a key's fingerprints, he can be sure that the key he uses for encryption is the same one he's verified in the past. Users needn't rely on an elaborate network of CAs to certify keys. As with SSH, users of email clients are assumed to verify a newly observed key's fingerprint, after which key continuity gives the user a sense of security. This approach has limitations, however: what can Alice do if her key is compromised? In a CA-based approach, before using Alice's key to secure communications, Bob can check the CA's revocation list or use the Online Certificate Status Protocol (OCSP) to ensure that it hasn't been compromised. KCM, however, relies on Alice informing all her correspondents that her key has been compromised. KCM proponents argue that the added benefit of an infrastructureless approach outweighs the reduction in security from potentially compromised keys. If users verify fingerprints often enough, they can limit the amount of damage a compromised key causes.

This brings us to one final question: how can users verify a key's fingerprints reliably? One option is to verify fingerprints for email over IM and fingerprints for IM over email. However, this approach still won't protect us against motivated adversaries (or our employers!) who can intercept both communication lines and subvert our attempted OOB fingerprint verification. Exchanging SMS messages is a viable option5 because the mobile phone network is clearly separated from our organizations' networks (or are they?). After hearing about the purported collaboration between the NSA and AT&T, however, relying on phone companies to deliver electronic fingerprints also seems risky against capable adversaries. In the end, if you can't verify fingerprints in person, it seems safest to verify them over the phone. This is the standard method of fingerprint verification in voice-over-IP (VoIP) services such as Philip Zimmermann's Zfone (www.philzimmermann.com/EN/zfone/), given that it's very difficult for a man-in-the-middle to subvert a voice conversation in real time. Additionally, humans can easily verify the semantic integrity of a voice conversation with a known correspondent because a man-in-the-middle would have trouble impersonating your correspondent's voice. (Caveat: humans are poor at verifying the semantic integrity of conversations with unknown correspondents, a weakness that is exploited in social engineering attacks.) It would be prudent, however, to expect computers in the not-too-distant future to be able to synthesize voice in real time. A dedicated man-in-the-middle could then possibly replace the part of your conversation related to fingerprint verification. Soon, we will need more sophisticated methods for verifying a remote correspondent's fingerprints, but until then, relying on real-time voice verification seems to be the best option. In my personal experience, my correspondents seemed rather uncomfortable with the "geekiness" of reading random numbers over the phone. However, with VoIP software becoming more popular among everyday users, a mechanism to use the same verified keys for email communications would be a great solution to the problem of OOB fingerprint verification.

Although neither the trusted third-party solution nor the fingerprint solution seems sufficiently secure for everyday users in its current form, perhaps a hybrid approach is needed in the short term.
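A fingerprint of the kind just described is simply a cryptographic hash of the key bytes, formatted so that humans can read it aloud. A minimal sketch (the key bytes below are a stand-in for a real DER-encoded public key, and the 4-character grouping is one common display convention):

```python
import hashlib

def fingerprint(key_bytes: bytes) -> str:
    """SHA-256 of the public key, grouped for reading over the phone."""
    digest = hashlib.sha256(key_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 4] for i in range(0, len(digest), 4))

# Stand-in for Alice's DER-encoded public key.
alice_key = b"-----example public key bytes-----"
print(fingerprint(alice_key))
# Both parties compute this locally and compare the strings out of band;
# any mismatch means the exchanged key is not the one its owner published.
```

Because the hash is computed independently by each side, a man-in-the-middle who substituted his own key would produce a visibly different string.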


As I suggested with CA-based PKI, everyday users should verify a key's fingerprints. Mechanisms developed for KCM can bolster trust in CA-certified keys and ensure that users verify fingerprints to secure communication.

There are several barriers for everyday users who wish to secure their communications. S/MIME is supported by popular email clients, but casual users are lulled into a false sense of security; accepting "valid" signatures without comprehending the underlying trust assumptions, or being content with encrypted email without being diligent about fingerprint verification, highlights the mismatch between users' expectations and their communication's underlying security.

On the optimistic front, PKI awareness is increasing—here at Dartmouth College, all first-year students are issued PKI tokens, and research on usability for secure communication is gaining momentum. One promising approach uses attribute-based annotations to help users make better trust decisions about their email communication.6 Until such usable mechanisms are introduced into popular email clients, however, proceed with caution and verify those fingerprints.

Acknowledgments
The author thanks Scott Rea for his insightful comments and willingness to read multiple drafts of this article. He also thanks Sean Smith, Patrick Tsang, and Phoebe Wolfskill for their helpful comments.

References
1. IEEE Security & Privacy, special issue on usable security, vol. 2, no. 5, 2004.
2. S. Haber and W.S. Stornetta, "How to Time-Stamp a Digital Document," J. Cryptology, vol. 3, no. 2, 1991, pp. 99–111.
3. P. Gutmann, "Why Isn't the Internet Secure Yet, Dammit," Proc. AusCERT Asia Pacific Information Technology Security Conf., AusCERT, May 2004; http://conference.auscert.org.au/conf2004/.
4. S.L. Garfinkel and R.C. Miller, "Johnny 2: A User Test of Key Continuity Management with S/MIME and Outlook Express," Proc. Symp. Usable Privacy and Security (SOUPS 05), ACM Press, 2005, pp. 13–24.
5. A.J. Nicholson et al., "LoKey: Leveraging the SMS Network in Decentralized, End-to-End Trust Establishment," Proc. 4th Int'l Conf. Pervasive Computing (Pervasive 06), LNCS 3968, Springer-Verlag, 2006, pp. 202–219.
6. C. Masone and S.W. Smith, "Towards Usefully Secure Email," IEEE Technology and Society Magazine, to be published, Mar. 2007.

Apu Kapadia is a post-doctoral research fellow at the Institute for Security Technology Studies, Dartmouth College. His research interests include systems security and privacy, and he is particularly interested in anonymizing networks and usable mechanisms for enhancing privacy. Kapadia has a PhD in computer science from the University of Illinois at Urbana-Champaign. He is a member of the IEEE and the ACM. Contact him at akapadia@cs.dartmouth.edu.
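As a concrete coda to the column's Digital signatures discussion, the "cryptographic package" Alice produces can be sketched as hash-then-sign. The RSA parameters below are the classic textbook example and are far too small for real use; actual S/MIME clients use full-size keys and proper padding schemes, which this toy omits:

```python
import hashlib

# Textbook RSA parameters (illustration only; real keys are 2048+ bits).
p, q = 61, 53
n = p * q              # 3233, the public modulus
e = 17                 # public exponent
d = 2753               # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def sign(message: bytes) -> int:
    """Hash the message, then apply the private key."""
    m = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(m, d, n)        # only the private-key holder can compute this

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (n, e) can check the signature."""
    m = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == m

msg = b"I submitted this review."
sig = sign(msg)
assert verify(msg, sig)
# A tampered message almost surely hashes to a different value mod n, so
# verification fails (the tiny modulus makes accidental collisions possible
# here; a real-size modulus makes them negligible).
```

The asymmetry is the point: verification needs only the public pair (n, e), which is exactly what a certificate binds to Alice's identity.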



Digital Protection
Editors: Michael Lesk, lesk@acm.org
Martin R. Stytz, mstytz@att.net
Roland L. Trope, roland.trope@verizon.net

South Korea's Way to the Future

MICHAEL LESK
Rutgers University

South Korea is a country with advanced technology and a great many users of it. Can we use it as a model of what might happen as technology spreads? If so, the news is pretty good for consumers: several online services exist that are legal, with relatively generous terms.

South Korea leads the world in access to broadband services; as of early 2006, 83 percent of households had broadband, compared to roughly 45 percent in the US (www.websiteoptimization.com/bw/0607). Not coincidentally, the country also leads in the transition to digital music sales via digital rights management (DRM) software. In fact, the past decade has seen South Korea's music scene change dramatically. It once had 8,000 music stores; now, it has 400—partly because of the Asian financial crisis of the late 1990s, but mostly due to the change in music distribution patterns. South Korea is also the first country in which online music sales exceeded CD sales in value. According to The Korea Times, South Koreans spent 400 billion won buying music from traditional stores in 1999; by 2005, that number was down to 108 billion won, whereas online music purchases had reached 262 billion won.1 The exchange rate is roughly 1,000 won to every US dollar, so converting the word "billion" to "million" gives US readers an idea of what's going on in a country one-sixth the US's size and with 1/15th of its gross domestic product.

A typical South Korean mobile phone is a camera as well as a music player, and some include video players, stored-value cards, and full keyboards. South Korea's ringtone market alone is larger than its CD market—ringtones brought in US$336 million in 2004, for example. Consumers can buy unlimited music downloads for $5 per month or broadband service for $20 per month—and they're about to have a new form of interoperability among music vendors.

The new device choices
Many South Koreans have had mobile phones that double as music players for two years (Apple's upcoming iPhone won't be new technology to them). Mobile phone sales grew six times between 2004 and 2005, and the first half of 2006 saw sales of 39.2 billion won—more than 2005's entire revenue of 33.7 billion won.2 In South Korea, it's not just iPods but phones that let you instantly choose and listen to new music while walking around. The leading vendors are telecommunications companies, not computer manufacturers, and much of the market is subscription-based rather than item-by-item downloads.

South Korea's most successful unlimited subscription service is MelOn—owned by SK Telecom, the country's largest telecom company—with 600,000 customers who can listen to 700,000 songs for roughly $5 per month (http://investors.wmg.com/phoenix.zhtml?c=182480&p=irol-newsArticle&ID=854632&highlight=). This rapidly growing service controls 60 percent of the South Korean online music market. Unlimited online services for gaming and instant messaging are also popular in South Korea. Vendors might have moved to this model because of the widely publicized tragedy of a South Korean teenager who ran up a $3,000 bill playing online games and then committed suicide (www.techdirt.com/articles/20061214/131943.shtml).

But why did music copyright holders agree to this system? Oversimplifying, the South Korean recording industry is smaller and less powerful than the telecommunications industry. In fact, the music industry has agreed to some very bad bargains—at one point, it even accepted a percentage royalty on the value of ringtone sales. Thus, one company offered free ringtones as an advertisement or enticement for its other services, and paid nothing for using the music.

Music and commerce in Korea
As an example of the current business environment for South Korean musicians, the singer Rain announced a world tour in 2006 with a plan to take in 106 billion won. The tour's organizers, Star M, wrote on the singer's Web site (www.bi-rain.org) that "about 56 billion won will come from admission fees, 20 billion from the tour's copyrights, eight billion from DVD sales, six billion from
Digital Protection

merchandise, five billion from mobile revenues, five billion from Internet revenues, 4.5 billion won from the singer's photo album, and 1.3 billion won from broadcasting copyrights." Note that the royalties on the familiar plastic disks are roughly 20 percent of the revenue—they're not the lion's share of the gross.

Table 1. Impact of free downloads.

                           INCREASE (%)   DECREASE (%)
CD and merchandise sales        21              5
Concert attendance              30              0

US rock groups also get more revenue from concert tickets than from recordings. Marie Connolly and Alan Krueger point out that, "For the top 35 artists as a whole, income from touring exceeded income from record sales by a ratio of 7.5 to 1 in 2002."3 According to the International Federation of the Phonographic Industry, US record sales in 2004 were roughly $12 billion, whereas concert ticket sales were roughly $2 billion. Why? The split of concert revenue is quite different: once past initial expenses and the guaranty, a band might get 85 percent of the remaining revenue and the promoter 15 percent.3 The band usually gets all the T-shirt money; the concert promoter gets the beer sales. Recording sales are handled differently. The band might be entitled to a royalty of 10 percent of the wholesale price after subtracting payment to the band's producer and paying recording and promotion costs. The musicians might wind up with at most a small percent of what consumers pay, but they often end up with nothing—thus, from their viewpoint, CDs are just concert advertising. Their goal is to make the concerts a "hot ticket" so that they can charge very high prices for them.

So do musicians do better with music sold via online download services? Not really. Royalties are paid to the same recording companies and are relatively small. In 2006, the Allman Brothers and Cheap Trick sued Sony BMG for a larger share of royalties from digital downloads. According to The Wall Street Journal, for each iTunes sale of an Allman Brothers song at 99 cents, Sony gets 80 cents, and the Allman Brothers get 4.5 cents.4 Adding insult to injury, Sony then charges the musicians for packaging costs and breakage during shipping, traditional deductions from royalties that wouldn't seem to apply to iTunes.

The bands' lawsuit is primarily about the distinction between sales and licenses, something familiar to those few who care about the paperwork that comes with software. Sony accounts to the band for iTunes purchases as if they were sales, but insists to music purchasers that they're only buying licenses. If Sony were forced to be consistent, the Allman Brothers would get more money (perhaps up to 80 times more) or consumers would acquire more rights than Sony wants them to have (the right to transfer or resell the songs, for example).

Table 1 shows the results from a 2004 survey5 in which researchers interviewed musicians and asked them whether free downloads increased or decreased their sales of recordings or concert tickets. Those who didn't answer "increase" or "decrease" said "no effect," "don't know," or "not applicable."

Musicians, in general, aren't the leaders in the copyright–piracy debate. Recording companies might be suffering as CD sales drop, but the musicians get so little from record sales that the losses have less impact on them. It's remarkable how small the bands' share of the money is, considering the market power you might think they have. Smaller groups do even worse.

Where's the variety?
In terms of online download services' effect on creativity, the real question is deeper and more confusing. In the varied music world, which is more welcoming to new artists: the CD market, the download market, or the concert market? In recent years, both CD and concert music sales have become more concentrated as users buy fewer different items than in the past, with the biggest names getting more of the money.5 In contrast, the variety of online music is growing, following the phenomenon known as "The Long Tail."6 iTunes carries roughly 4 million songs; no conventional record store has more than, say, 150,000. However, the concentration of conventional sales is greater than this comparison would suggest. Stores like Tower Records (60,000 titles per store) have gone out of business whereas Wal-Mart, with 5,000 titles per store, now represents one-fifth of the CD business.7 Similarly, concert revenues are increasingly concentrated, although this partly reflects greater pricing accuracy, as the most popular bands charge higher prices to capture the revenue previously lost to scalpers.

This logic would suggest that if all the traditional revenue sources went away and consumers bought all of their music online, we'd probably have a greater variety of music available than we do now. Musicians would still get relatively little, but they'd have an easier time starting out from scratch than they do now (you could argue that the chances for new bands could hardly get any worse).

Locked no more
Another consumer-friendly step forward in South Korea comes by legal force. The MelOn service uses DRM technology to restrict its players to its own downloads (just in case you wondered why this article belongs in this department). Recently, the South Korean Fair Trade Commission (KFTC)—responsible for antitrust law enforcement—fined SK Telecom 330 million won, saying that as the dominant company in the mobile MP3 phone market, it must let people play other legal MP3 songs on its handsets.2 What this means is that SK Telecom can still have its music store, but it will have to let other music stores sell tracks that play on its phones. Of course, the KFTC's ruling allows access to those individuals ready to give away MP3s, whether of their own music, poetry readings, or just their dog barking. In the South Korean context, this makes some sense: the government built the telecom backbone network, so all providers should have an equal chance to operate over the SK Telecom music service infrastructure.

The US is also moving in this direction; even the big music companies are starting to make material available in unlocked formats.8 Many smaller vendors already sell unlocked music—it's their best way to deliver material that will play on an Apple iPod. If we all move toward MP3 as a general music format instead of DRM-encoded material, we should again expect an increase in the variety of music and of the gadgets to play it. As an example, South Koreans are considering clothing with built-in music hardware.9 (You could always start by combining headphones and earmuffs.)

Strangely enough, the fastest growing part of the digital download business is classical music, which is otherwise a backwater of the music industry. Although the music industry generally moans that digital revenues aren't yet compensating for declining CD sales, classical music revenues in the US (online and CD sales combined) were actually up 22.5 percent in 2006, according to Nielsen Music's 2006 Year-End Music Industry Report (http://biz.yahoo.com/bw/070104/20070104005813.html). Classical music is 3 to 4 percent of music sales in stores, but it's 12 percent of iTunes sales.10 Classical listeners are generally older and thought to be less likely to illegally download music; thus, it makes sense that this demographic would be the first place that a new legal marketing system for online music would pay off.

So what's in store for the future—that seen in South Korea or in classical music? Either way, I'm fairly optimistic: it looks as if we're moving to a world of relatively available and affordable online and mobile music with fewer restrictions than we have now. Keep your fingers crossed.

References
1. C. Garcia, "K-Pop Struggles to Boost Sales," The Korea Times, 30 Jan. 2007.
2. Yonhap News, "Corporate Watchdog Fines SK Telecom for Incompatible MP3 Service," 20 Dec. 2006; http://english.yna.co.kr/Engnews/20061220/660000000020061220135710E8.html.
3. M. Connolly and A.B. Krueger, "Rockonomics: The Economics of Popular Music," working paper no. 11282, Nat'l Bureau of Economic Research (NBER), 2005.
4. E. Smith, "Sony BMG Is Sued by Bands Over Song-Download Royalties," The Wall Street Journal, 28 Apr. 2006.
5. M. Peitz and P. Waelbroeck, "An Economist's Guide to Digital Music," working paper no. 1333, Munich Soc. for the Promotion of Economic Research (CESifo GmbH), 2004.
6. C. Anderson, The Long Tail, Hyperion, 2006.
7. W. Cohen, "Wal-Mart Wants $10 CDs," Rolling Stone, 12 Oct. 2004.
8. A. Veiga, "Music to Be Offered in MP3 File Format," The Associated Press, 6 Dec. 2006.
9. The Associated Press, "South Korea Wants People in 'Smart' Clothes," 16 Aug. 2006.
10. C. Higgins, "Big Demand for Classical Downloads is Music to Ears of Record Industry," The Guardian, 28 Mar. 2006.

Michael Lesk is a professor and chair of the library and information science department at Rutgers University. His research interests include digital libraries, computer networks, and databases. Lesk has a PhD in chemical physics from Harvard University. He is a member of the National Academy of Engineering, the ACM, the IEEE, and the American Society for Information Science and Technology (ASIS&T). Contact him at lesk@acm.org.



Building Security In
Editors: John Steven, jsteven@cigital.com
Gunnar Peterson, gunnar@arctecgroup.net

A Metrics Framework to Drive Application Security Improvement

Elizabeth A. Nichols, ClearPoint Metrics
Gunnar Peterson, Arctec Group

Web applications' functionality and user base have evolved along with the threat landscape. Although controls such as network firewalls are essential, they're wholly insufficient for providing overall Web application security. They provide security for underlying hosts and a means of communication, but do little to help the application resist attacks against its software implementation or design. Enterprises must therefore focus on the security of the Web application itself. But in doing so, questions immediately arise: "What could go wrong with my software? How vulnerable are my existing applications to the most common problems? What changes to my software development life cycle might affect these vulnerabilities?"

The Open Web Application Security Project (OWASP; www.owasp.org) Top Ten offers a starting point for figuring out what could go wrong. This installment of Building Security In presents metrics that can help quantify the impact that process changes in one life-cycle phase have on other phases. For the purposes of this short discussion, we've broken an application's life cycle into three main phases: design, deployment, and runtime. By organizing metrics according to life cycle in addition to OWASP type, insight from the derived quantitative results can potentially point to defective processes and even suggest strategies for improvement.

If you develop, manage, or administer Web application software and want to measure, analyze, and improve a development culture that produces secure code, this article provides an excellent starting point.

Life-cycle metrics
Software development managers use design-time metrics to make risk-management decisions when defining, implementing, and building software and related security mechanisms. Both managers and developers should harvest design-time metrics from source code via static analysis, from audits and assessments, and iteratively from other runtime and deployment-time metrics. The importance of design-time metrics stems from their ability to identify and characterize weaknesses early in the application's life cycle, when such weaknesses cost much less to fix.1

Deployment-time metrics measure changes to the system and its configuration over time. A common (if oversimplified) view is that change is the enemy of security. Deployment-time metrics provide hard data to characterize the amount of change actually present, uncover patterns over time, and help establish baselines for anomaly detection. When combined with runtime metrics, deployment-time metrics give insight into the rate of change and key service-level agreement metrics such as availability, mean time between failures, and mean time to repair.

Runtime metrics focus on the Web application's behavior in production and the security vulnerabilities discovered after deployment. Vulnerability discovery at runtime causes the most expense both in terms of application performance and customer impact. Over time, if the metrics collected in the earlier phases show improvement due to design and deployment process changes, then we would expect to see a corresponding improvement in runtime metrics.

The notions of design-time, deployment-time, and runtime metrics are particularly illustrative because they apply to distinct phases of the software development life cycle. We can harvest runtime metrics, for example, in the quality assurance phase.

Top Ten items
To explore some explicit metrics, let's review each OWASP Top Ten item and an example design, deployment, or runtime metric for it.

Unvalidated input
The first item—unvalidated input—involves the information from Web requests that isn't validated before the Web application uses it. Attackers can exploit these weaknesses to


compromise back-end components through the Web application.

A good design-time metric is "PercentValidatedInput." To compute this metric, let T equal the count of input forms or interfaces the application exposes (the number of HTML form POSTs, GETs, and so on) and let V equal the number of these interfaces that use input validation mechanisms. The ratio V/T makes a strong statement about the Web application's vulnerability to exploits from invalid input—the higher the percentage, the better. If a company sees that all of its Web applications have low values for PercentValidatedInput, then mandating the use of a standard input validation framework would drive lasting improvement for current and future applications.

Broken access control
The second item—broken access control—means the application fails to impose and enforce proper restrictions on what authenticated users may do. Attackers can exploit such weaknesses to access other users' accounts, view sensitive files, or use unauthorized functions.

An example runtime metric is AnomalousSessionCount, which we compute in two phases. The first phase derives a SessionTableAccessProfile by correlating application server user log entries for a user session with accessed database tables; the resulting value of SessionTableAccessProfile is represented as a user ID followed by a set of ordered pairs with a table name and a count. The second phase derives the AnomalousSessionCount by counting how many SessionTableAccessProfiles don't fit a predefined user profile. If AnomalousSessionCount is greater than one for any user, especially a privileged user, it could indicate the need for significant refactoring and redesign of the Web application's persistence layer. This is a clear case in which detection at design time is preferable.

Broken authentication and session management
The third item—broken authentication and session management—means the application doesn't properly protect account credentials and session tokens. Attackers that compromise passwords, keys, session cookies, or other tokens can defeat authentication restrictions and assume other users' identities.

An example runtime metric is BrokenAccountCount, which we can compute by counting the number of accounts that have no activity for more than 90 days and will never expire. Such accounts represent a clear risk of password compromise and resulting illegal access.

Cross-site scripting
With the fourth item—cross-site scripting or XSS—attackers can use the Web application as a mechanism to transport an attack to a user's browser. A successful attack can disclose the user's session token, attack the local machine, or spoof content to fool the user.

An example runtime metric is XsiteVulnCount, which we can obtain via a penetration-testing tool. The results will likely enter a bug-tracking process (developers can quickly fix XSS bugs). However, this is another case in which catching the problem earlier is far better than later.

Buffer overflow
The fifth item—buffer overflows—can crash Web application components such as libraries and drivers in languages that fail to validate input, or, in some cases, attackers can use them to take control of a process.

An example deployment-time metric is OverflowVulnCount, which we can obtain from standard vulnerability management tools that identify the patch level of installed software against the patch levels that repair known buffer overflow flaws. Another useful set of metrics provides statistics around the patching latency for known overflow vulnerabilities. To compute these metrics, calculate the minimum, maximum, mean, and standard deviation of the number of minutes/hours/days it took to patch detected overflow vulnerabilities during a given time period. A high mean or a high standard deviation indicates either slow or inconsistent patching processes.

Injection flaws
The sixth item—injection flaws—involves the Web application as it passes parameters when accessing external systems or the local operating system. If an attacker embeds malicious commands in these parameters, the external system can execute those commands on the Web application's behalf.

An example runtime metric is InjectionFlawCount, which we can derive from penetration tests that submit invalid parameters to a running copy of the Web application. This metric characterizes the Web application's vulnerability to potential attacks. Another runtime metric is ExploitedFlawCount, which we can derive from reported incidents in which an attacker successfully exploited the application via an injection flaw. This metric characterizes the impact actually suffered. Both metrics offer excellent feedback to the development organization about inadequate parameter checking.

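The patch-latency metrics for overflow vulnerabilities are ordinary summary statistics over repair times. A small illustration, assuming the latency values (in days) have already been exported from a vulnerability management tool; the numbers are made-up sample data:

```python
import statistics

# Sketch: summarizing patch latency (days from detection to patch)
# for known overflow vulnerabilities in one reporting period.
# The latency values are illustrative sample data only.
latencies_days = [2, 3, 3, 5, 8, 21, 34]

summary = {
    "min": min(latencies_days),
    "max": max(latencies_days),
    "mean": statistics.mean(latencies_days),
    "stdev": statistics.stdev(latencies_days),  # sample standard deviation
}

# A high mean or standard deviation flags slow or inconsistent patching.
print(summary)
```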



Improper error handling
With the seventh item—improper error handling—the application source code doesn't properly check or handle error conditions that occur during normal operation. If an attacker can introduce errors that the Web application doesn't handle, he or she can gain detailed system information, deny service, cause security mechanisms to fail, or crash the server.

For a design-time metric, use a static analysis tool to count the number of function calls that don't check return values. This "instance per application" count provides a good indicator of improper error handling's prevalence. A simple raw count performs best. In this case, dividing by the number of all function calls to normalize a raw count into a percentage potentially masks a serious problem.

Insecure storage
The eighth item—insecure storage—illustrates how Web applications frequently use cryptographic functions to protect information and credentials. However, these functions and the code needed to integrate them have proven difficult to code properly, frequently resulting in weak protection.

For a deployment-time metric, compute the percentage of servers with installed and active automatic hard-disk encryption to find the level of protection available as part of a Web application's operating environment. In short, the higher the metric value, the higher the level of protection.

Application denial of service
With the ninth item—denial of service—attackers can consume Web application resources to a point where other, legitimate users can no longer access or use the application. Attackers can also lock users out of their accounts or even cause the entire application to fail.

For a runtime metric, derive metrics from penetration tests that cover denial-of-service attacks. Vulnerability discovery can help here, but preventing denial of service can be a complicated design issue.

Insecure configuration management
The final item—insecure configuration management—focuses on how secure Web applications depend on a strong server configuration. Servers possess many configuration options that affect security, and no default configuration is secure.

For a deployment-time metric, count the number of service accounts (the ones a program uses to log into services such as database management systems) with weak or default passwords. This indicator helps quantify the risk of illegal access, breach of confidentiality, and loss of integrity. Consistent unacceptable exposure warrants better deployment standards.

Figure 1. Security scorecard. This layout helps reviewers assess one or more of the Web application's current states and quality by providing a color-coded score for each category of Open Web Application Security Project (OWASP; www.owasp.org) flaw.

A security scorecard
The scorecard in Figure 1 summarizes the relatively fine-grained metrics that calculate data values from penetration testing, static code analysis, incident management systems, vulnerability scanners, and other instrumentation as mentioned in the previous section. Several of our client companies have used this scorecard to track improvement in security-centric coding practices with their respective Web application development organizations.

The scorecard gives a calculated rating for seven of the OWASP Top Ten categories. Color helps translate the metric results to a more qualitative state: red for bad, green for good, and yellow for somewhere in between. If you perform this exercise in your own company, the keys to success include forming consensus around the mapping and only making changes in a controlled and fully auditable manner. Making a change based on a pseudo-political need to turn a red into a yellow or a yellow into a green will cause a lot of damage in the long run, rendering the scorecard and its underlying metrics useless.

The following steps, inspired by the Six Sigma framework (www.isixsigma.com), help map quantitative metric data into color-coded ratings:

• Express each underlying metric in terms of defects divided by opportunities. If, for example, a Web application has 100 input forms, and 12 of them have defective input validation, then the application gets a rating of 88. The equation is 1.0 – (#defects/#opportunities), scaled to a 0–100 rating.
• Map values to colors by comparing each value to thresholds. For example, your group could establish that red maps to values less than 80, yellow maps to values between 81 and 90, and green maps to values over 91.
• Aggregate all individual Web application scores in a given vulnerability category to create a single "summary" score.

On this last point, you can do the mapping of many scores to one color-coded state in several different ways. Some possibilities are:

• The assigned state for the entire vulnerability category takes the worst value or color. This harsh but useful method gives lagging applications a lot of visibility and stresses improvement for the category's worst score.
• Map the mean of all individual metrics to a color via a threshold mechanism.
• Compute a weighted mean of all individual metrics based on the application's agreed-upon criticality (consensus is key here). Map the weighted mean to a state using a threshold mechanism.
• Map the mean minus the standard deviation of all individual metrics to a state for a particular category. This approach favors consistency.
• Map the value of the lowest application in the top decile (or quartile and so on) to a state.

The scorecard provides additional indicators that show an upward, downward, or unmoving trend for the given time period relative to previous periods. Subordinate scorecards can include trend lines covering several historical time periods.

To enrich the above quantitative scoring, analysts should also include qualitative, unstructured annotations to the scorecard, describing how to use the data provided, what objectives it serves, how to interpret the results, and what actions the company has initiated as a result of the insights derived. In this way, organizations can begin to organize the myriad fine-grained metrics derived from their existing infrastructure and efficiently drive application security improvement.

As for the time involved, you can implement and regularly review a scorecard such as the one in Figure 1 incrementally by starting with easily obtained metrics such as those from your currently existing penetration testers, static code scanners, and incident management systems. In our own tests, in which we used a purpose-built security metrics platform, the scorecard took roughly two weeks of effort from initial design to deployment for automatic metric production.

References
1. D.E. Geer, A.R. Jaquith, and K. Soo Hoo, "Information Security: Why the Future Belongs to the Quants," IEEE Security & Privacy, vol. 1, no. 4, 2003, pp. 24–32.

Elizabeth A. Nichols is the Chief Technology Officer at ClearPoint Metrics. Her research interests include information security metrics design, automation, visualization, and benchmarking. Nichols has a PhD in mathematics from Duke University. Contact her at ean@clearpointmetrics.com.

Gunnar Peterson is a founder and managing principal at Arctec Group, which supports clients in strategic technology decision making and architecture. His work focuses on distributed systems security architecture, design, process, and delivery. Contact him at gunnar@arctecgroup.net.

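The Six Sigma-inspired steps described in the article (defects per opportunities, threshold-based coloring, and worst-value aggregation) can be sketched in a few lines of Python; the thresholds and application data below are illustrative assumptions, not the authors' values:

```python
# Sketch of the scorecard mapping: score each metric as
# 1.0 - defects/opportunities (scaled to 0-100), color it by threshold,
# and aggregate a category by its worst score. Thresholds and sample
# data are illustrative only.

def score(defects, opportunities):
    return 100.0 * (1.0 - defects / opportunities)

def color(value, red_below=80, green_above=91):
    if value < red_below:
        return "red"
    return "green" if value > green_above else "yellow"

# One category (say, unvalidated input) across several applications:
apps = {"app-a": score(12, 100), "app-b": score(2, 50), "app-c": score(0, 40)}

worst = min(apps.values())   # "worst value or color" aggregation
print(worst, color(worst))   # the whole category takes its worst color
```

Swapping `min` for a mean, weighted mean, or mean-minus-standard-deviation gives the other aggregation possibilities listed in the article.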


Emerging Standards
Editors: Rick Kuhn, kuhn@nist.gov
Susan Landau, susan.landau@sun.com
Ramaswamy Chandramouli, mouli@nist.gov

Infrastructure Standards for Smart ID Card Deployment

Ramaswamy Chandramouli, US National Institute of Standards & Technology
Philip Lee, Identity Alliance

Smart card deployment is increasing thanks to the addition of security features and improvements in computing power to support cryptographic algorithms with bigger footprints (for digitally signing and encrypting) in the smart card chips in the past five or six years.

Typical applications include subscriber identification module (SIM) cards (in telecommunications), micropayments (in financial transactions), commuter cards (in urban transportation systems), and identification (ID) cards. Although the share of cards used for identification applications (which we'll call smart ID cards) is relatively small within the overall smart card market, it's one of the fastest growing segments.

Smart ID cards control physical access to secure facilities and logical access to IT systems (Web servers, database servers, and workstations) and applications. Authentication of the card and holder takes place using a set of credentials. An organization deploying such cards must have an infrastructure for generating, collecting, storing, provisioning, and maintaining credentials. The components involved in these credential life-cycle management activities constitute what we'll call the smart ID card system infrastructure, which supports smart ID card deployment.

Not all components involved in this infrastructure have standardized interfaces. Moreover, no robust messaging standards exist for information exchange among the components. Yet, some efforts are under way to partially address the standards gap in this area.

Smart ID card system infrastructure
At the heart of the smart ID card system infrastructure is the identity management system (IDMS), which includes both a data repository and a software system that many organizations use to support identity-based applications such as single sign-on and authorization management. Broadly, the two most common areas of identity-based applications are physical access-control systems (PACS) and logical access-control systems (LACS). Despite the IDMS's versatility, no agreed-upon definition exists for its functional scope. Its canonical function as the manager of all forms of enterprise-wide credentials (identity information) is recognized, but individual product offerings vary widely in their functionality. The points of variation include the types of corporate (meta) directories to which the IDMS can interface (LDAP, for example), native database management system support (relational or object-oriented), the data schemas' expressive power (some IDMSs support the capture of authorization information such as roles, groups, and user IDs), and the mechanisms for connecting to the systems to which the IDMS must provision the credentials (connectors, agents, and so on).

A core component of the infrastructure for supporting identity-based applications, in general—and smart ID cards, in particular—thus consists of product offerings with varying functionality and interfaces. Our search to identify areas for standardization in the smart ID card system infrastructure therefore starts with information flows in and out of the IDMS. Based on a conceptual understanding of IDMS as the repository of all credentials, it's easy to see that it should have two kinds of information flow streams:

• The credential-collection stream (CCS) consists of all information flows needed to gather and consolidate credentials from multiple sources. Different types of credentials or credential-related information originate from these sources and flow into the IDMS.
• The credential-provisioning stream (CPS) consists of all information flows to various end points (or target systems) that need to perform identity verification. Where identity verification takes place, authentication credentials (a subset of the credentials stored in IDMS) flow from IDMS to the access-control entities, such as door panels or IT application systems.

When the authenticating credentials used for identity verification are long pieces of data (say, 25 bytes rather than 4-digit personal identification numbers) or the authentication process involves sophisticated transactions (a cryptographic protocol rather than the exchange of a simple shared secret, for example), credential verification requires a

smart card. Along with this requirement, the associated infrastructure needs a new component, called a card-management system (CMS), to securely populate the card with the credentials and to track the card's status regarding whether it's active, suspended, terminated, lost, or stolen. Like a PACS or LACS, the CMS becomes the target of provisioning in its own right in a smart ID card system infrastructure. Hence information flow from the IDMS to CMS becomes an important component of the CPS.

To design this infrastructure to meet an enterprise's functional and security needs, the enterprise consumer needs some market choices for all of the components. Yet, interface standards between the key infrastructure components are critical for ensuring that integrating components that meet the enterprise needs isn't a tedious and technically challenging task. In the rest of this article, we identify the components and discuss whether interface standards exist for them. Our scope encompasses both types of interface specifications—those for program-level (APIs) and message-level (messaging interface) interactions.

Figure 1 illustrates the components in a typical smart ID card system infrastructure, including numbered information flows for the CCS and CPS. Credential sources are the components from which CCSs originate, and credential targets are the components into which CPSs terminate (except for CPS-2C).

[Figure 1. Smart ID card system infrastructure components. Numbered information flows for the credential-collection stream (marked as CCS-x) and credential-provisioning stream (marked as CPS-x) illustrate the flow of information into and out of the IDMS component, respectively. Components shown: enrollment workstation, enterprise HR system, background investigation service, identity-management system (IDMS), credential database, physical access-control system (PACS; server and panels), single-sign-on directories (logical access-control system [LACS]), card-management system with graphical and electrical personalization modules, PKI server, and smart card.]

Glossary
API: application programming interface
CCS: credential-collection stream
CMS: card-management system
CPS: credential-provisioning stream
EWS: enrollment workstation
IDMS: identity management system
LACS: logical access-control systems
PACS: physical access-control systems
PC/SC: personal computer smart card
PKCS: public-key cryptography standard
PKI: public-key infrastructure
SDK: software development kit

Information flows in credential collection stream

All credentials originate primarily from the human resources (HR) systems or their equivalents within a given organization (personnel management systems, contractor registries, and so on). The person to be issued a smart ID card is called an applicant until the time the smart ID card is physically issued, and a card holder thereafter. The first CCS (marked as CCS-1 in Figure 1) is the pre-enrollment package, which contains the following information:

• applicant demographic information (name, address, social-security number, gender, birth date, and so on),
• applicant affiliation information (organization, department or division, country of citizenship, and status in the organization [such as employee or contractor], and so on), and
• sponsorship information (seal of approval attesting that the applicant is eligible to receive a smart ID card).

Once the pre-enrollment package is entered, the IDMS or sponsor notifies the applicant to go through the enrollment process at an enrollment workstation (EWS; see Figure 1), which collects the applicant's biometric information and performs identity proofing or vetting by examining the breeder documents, such as birth certificate, passport, driver's license, and residency and work permits.

The enrollment package generated from this process flows from the EWS to the IDMS (marked as CCS-2) and consists of the biometric information (facial image, fingerprint, and templates) and scanned copies of the breeder documents. The organization planning to issue a smart ID card sends out the applicant's demographic information along with a subset of the enrollment package (particularly the fingerprints) pertaining to that applicant for background investigation to verify whether the applicant is a law-abiding citizen. An example of such a

background investigation in the US is an FBI criminal history check. The appropriate authority in the organization then adjudicates the investigation report and sends the result to the IDMS (CCS-3).

Interface standards with credential-source components

Based on the process flow we've described, the infrastructure involved in credential collection consists of the following credential-source components (and interfaces):

• enterprise HR system-to-IDMS interface
• EWS-to-IDMS interface.

Enterprise HR systems are generally legacy IT systems (or customized HR modules from enterprise resource planning offerings) with heterogeneous database-management systems. Although of recent origin, IDMSs don't have standardized APIs or messaging interfaces. In the absence of interface specifications, HR systems and IDMSs use custom Web interfaces (all IDMSs come with Web interfaces). System integrators therefore have to make choices in the following areas for standardizing and securing interactions or information flows between enterprise HR systems and IDMSs:

• secure network protocols,
• data and messaging syntax, and
• message-level protections.

In the area of secure network protocols between Web interfaces, the standard industry practice is to specify the use of Secure HTTP (at the session level) and Transport Layer Security (at the transport level). If a Web service interface supporting a service-oriented protocol is involved, the associated secure version of application protocols such as SOAP 2.0 must also be specified. The messages in both directions (HR systems to IDMS and IDMS to HR systems, for example) must also be identified. Our reference architecture for the smart ID card system infrastructure includes the following messages related to the CCS:

• pre-enrollment package upload from HR systems to IDMS (for transferring CCS-1),
• pre-enrollment package response from IDMS to HR systems,
• enrollment package upload from EWS to IDMS (for transferring CCS-2), and
• adjudication package upload from HR systems to IDMS (for transferring CCS-3).

Because of the diverse platforms on which the components involved in transferring these messages are hosted, a machine-independent transfer syntax is needed. Again, the state of the practice is to choose XML, which means expressing the semantic structure of the messages through an artifact called XML schema. An XML schema for a given message flow (in the context of the smart ID card infrastructure) essentially consists of the description of the various credentialing elements. Given that the set of credentialing elements can vary by organization, it isn't possible to define a "standard XML schema" for any given message flow in the infrastructure. However, one school of thought advocates that a generic XML schema consisting of all possible credentialing elements can be defined to include a mandatory set and make the rest optional elements. The argument in favor of this approach is that it can establish uniformity in the syntactical representation of a given credential type (expiration date, for example), thus facilitating support for multiple identity-based applications using the same infrastructure.

Continuing with our process-flow analysis, we find that the enrollment package is the collection stream flowing from the EWS to the IDMS. An important aspect to remember is that an EWS could be an in-house system or be located at an enrollment service provider site (and hence under a different IT administration domain). Even with an in-house EWS system, the physical location could be a remote site and hence require use of the public network for communications. Given the nature of EWS-to-IDMS interactions and the fact that no programming or message-level interface specifications exist for IDMS, the process needs the same standardized set of requirements specified for HR system-to-IDMS interactions. Furthermore, the message-level protections assume added importance in EWS-to-IDMS interaction because the enrollment package contains privacy-sensitive information (biometrics and breeder documents). Organizations making use of either an in-house or service provider EWS to receive enrollment packages must therefore ensure that these packages are delivered through secure channels and that the necessary hardware and software elements (cryptographic modules) involved in providing the channels have the certification requirements consistent with cryptographic strength requirements.

One last aspect of the process flow

related to credential collection in the smart ID card system infrastructure is the flow of information from IDMS to background-verification systems. At least one such system—namely, the US Visit/IDENT system—provides a complete message interface specification requiring:

• secure network protocols and message-level protection (SOAP 2.0 with client-side secure sockets layer authentication) and
• data and messaging syntax (XML, 12 message types with XML schema for each).

Let's now examine the portion of the infrastructure involved in credential provisioning. To do so, we must identify all information flows that are part of the CPS, just as we did with the CCS.

Information flows in the credential-provisioning stream

All information flows in the CPS originate from the IDMS; the information content and number of flows depends on the type of authentication application. The various information flows include:

• physical access-control information (cardholder name, facial image, unique credentialing number, expiration date, or status) between IDMS and PACS (marked as CPS-1 in Figure 1);
• card-resident credential information (all credentials that will eventually reside on the card) between the IDMS and CMS (marked as CPS-2);
• graphical card-personalization information (all visual information found on the card such as the photograph, issuing organization's seal, and cardholder name) from the CMS to card printers (CPS-2A);
• electrical card-personalization information from the CMS to the smart ID card (CPS-2B);
• digitally signed third-party attestation of identity and credentials—the PKI certificates provided by a certificate authority—between the CMS and a PKI server (CPS-2C); and
• logical access-control information (cardholder name, a unique identifier such as user principal name, organizational role, or clearance level) between IDMS and an LACS module such as a single-sign-on (SSO) directory (CPS-3).

Having identified the information flows and the participating components, our next step is to look at the nature of the interfaces that these components present.

Interface standards with credential-target components

The credential-target components (and interfaces) involved in credential provisioning in the infrastructure are the:

• IDMS-to-PACS panel interface,
• IDMS-to-corporate directory interfaces (to support logical access control through single-sign-on mechanisms),
• IDMS-to-CMS interface, and
• CMS-to-PKI servers.

The information flows from CMS to the provisioning end points (the physical smart card, for example) are also part of the CPS; hence, the credential-provisioning function includes the CMS-to-card printer interface and the CMS-to-card interface (through the card reader device).

Like enterprise HR systems, most PACSs are stand-alone (rarely connected to enterprise networks) legacy systems without standardized APIs or messaging interfaces. The two primary components are the PACS server (the main data repository containing physical access-control information) and the PACS panel, which maintains a cache of the data required (in the form of a lookup table) for restricting physical access; the PACS panel activates a lock to open doors or turnstiles once the smart-card reader (called a PACS reader) matches the submitted data against the lookup table. Given that PACSs predate smart card and IDMS deployments (they were developed to work with magnetic stripe cards), the norm for getting access-control data into PACS servers (CPS-1) is through customized data-downloading scripts that periodically batch transfer data from relevant authorized sources such as HR systems and physical security office databases.

Because of the huge investment in PACS systems (even single large organizations generally have PACS from different manufacturers), the US Department of Homeland Security is sponsoring efforts to develop a middleware-oriented approach for interfacing between IDMSs and multiple PACSs. Under this approach, organizations can deploy PACS proxies with a standardized messaging interface that includes the following components:

• secure network protocols (SOAP over HTTP),
• data and messaging syntax (XML syntax, two main message types, and XML schemas for each message format), and
• message-level protections that

provide mutual authentication between the IDMS and PACS.

The interface required between IDMS and corporate directories for transferring credential information for logical access control (CPS-3) is one of the few areas in which standardized interfaces are available with the help of secure directory access protocols such as LDAP.

The IDMS-to-CMS interface is perhaps the most important one in the infrastructure. A CMS maintains life-cycle data such as card status and credential status and populates smart cards with credentials by establishing secure sessions. Hence all card-resident information (CPS-2) must be transferred from IDMS to CMS. As Figure 1 illustrates, the CMS, in turn, has to distribute or add to this information by communicating with other provisioning end-point entities such as PKI servers, card printers, and the smart card to be populated. CMS communicates with the following entities to perform the associated functions:

• PKI servers to request and obtain digital identity certificates to create bindings between cards and credentials;
• cryptographic libraries (not shown in Figure 1) to generate public–private key pairs and digitally sign some credential objects that will go onto the card;
• card printers to print cardholder names, photographs, security features such as holographic patterns, and so on; and
• smart cards for electrical personalization of credentials in a card's data objects or containers.

The most notable feature with respect to integrating CMS with other components in the smart ID card infrastructure is that almost all CMS vendors provide their own proprietary software development kits (SDKs) consisting of programming interface libraries for uploading information to and extracting information from the CMS. These SDK libraries facilitate the task of transferring card-resident credential information (CPS-2) from the IDMS to the CMS, as well as transferring graphical card-personalization information (CPS-2A) from the CMS to card printers. The downside is that these SDKs are useful only for integrating specific, designated CMS products; organizations must deploy new SDKs and develop new sets of data transfer programs if the CMS product in the smart ID card infrastructure changes. That said, the following platform- and product-neutral specifications are available for integrating CMS with PKI servers (for transferring CPS-2C) and smart cards (for transferring CPS-2B):

• Public-Key Cryptography Standard (PKCS) #10 is a messaging specification for requesting digital certificates from PKI servers run by different certificate authorities.
• Global Platform Messaging and API specifications (published by the GlobalPlatform.org industry consortium) enable a CMS to electrically personalize smart cards in a secure way.

After a smart ID card is issued, various components perform the actual authentication functions. These include the host application and the service-provider middleware that provides specialized functions, such as financial transactions and telecommunications, related to the smart card's application area. Because these components technically form part of the smart card user interface architecture, rather than the infrastructure architecture, we didn't consider their interfaces in this article. Even restricting our focus to infrastructure components in smart ID card systems, we find that the process is still in the early stages of transitioning from the use of customized data-upload and download scripts and communication sockets to the use of standardized application and network-layer protocols (that include security) using partially defined messaging specifications. Upgrading this process to one with standardized procedures can occur only when the components in the smart ID card system infrastructure have standardized APIs or message-level interface specifications. An alternate path toward this goal would be to employ middleware with standardized APIs for connecting to each of these components. For now, the road for both approaches seems long. Organizations deploying smart ID cards will have to live with proprietary APIs and messaging specifications for some time to come.

Ramaswamy Chandramouli is the director of the NIST Personal Identity Verification Program (NPIVP) at the US National Institute of Standards & Technology. His research interests include formal model-based testing, security architectures, role-based access control, and Domain Name System security. Chandramouli has a PhD in information security from George Mason University. He is coauthor of Role-Based Access Control (Artech House, 2007), which is now in its second edition. Contact him at mouli@nist.gov.

Philip Lee is a partner at Identity Alliance. His research interests include studying the convergence of the US government's Personal Identity Verification Program applications and existing enterprise identity-management solutions, as well as evaluating the feasibility of specialized biometric applications such as Match-On-Card. Lee has an MS in computer science from the University of Maryland. He is a member of the Smart Card Alliance and ASIS International. Contact him at lee@identityalliance.com.

Interested in writing for this department? Please contact editors Rick Kuhn, kuhn@nist.gov, Susan Landau, susan.landau@sun.com, and Ramaswamy Chandramouli, mouli@nist.gov.
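The pre-enrollment package (CCS-1) described in the credential-collection discussion carries demographic, affiliation, and sponsorship information, and the article notes that XML is the state of the practice for such messages. The following Python sketch, using only the standard library, builds a CCS-1-style message and checks that a mandatory set of elements is present. The element names are invented for illustration; as the article stresses, no standard schema exists.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names -- the article defines no official schema.
MANDATORY = ["Name", "Address", "SSN", "Gender", "BirthDate",
             "Organization", "Sponsorship"]

def build_pre_enrollment_package(fields):
    """Serialize a CCS-1-style pre-enrollment package as XML."""
    root = ET.Element("PreEnrollmentPackage")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

def missing_mandatory(xml_text):
    """Return the mandatory credentialing elements absent from a package."""
    root = ET.fromstring(xml_text)
    return [tag for tag in MANDATORY if root.find(tag) is None]

package = build_pre_enrollment_package({
    "Name": "J. Applicant",
    "Address": "123 Main St",
    "SSN": "000-00-0000",
    "Gender": "F",
    "BirthDate": "1970-01-01",
    "Organization": "Example Agency",
    "Sponsorship": "approved",
})
```

A receiving IDMS could reject any upload for which missing_mandatory returns a nonempty list.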

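The "generic schema" school of thought described in the credential-source discussion can be illustrated with an XML Schema fragment in which a core set of credentialing elements is mandatory and organization-specific elements are marked optional via minOccurs="0". The element names here are hypothetical, chosen only to mirror the credential types the article mentions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Credential">
    <xs:complexType>
      <xs:sequence>
        <!-- mandatory set: every organization must supply these -->
        <xs:element name="CardholderName" type="xs:string"/>
        <xs:element name="CredentialNumber" type="xs:string"/>
        <xs:element name="ExpirationDate" type="xs:date"/>
        <!-- optional elements: vary by organization -->
        <xs:element name="OrganizationalRole" type="xs:string" minOccurs="0"/>
        <xs:element name="ClearanceLevel" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because every organization's messages validate against the same schema, the syntactical representation of a shared credential type such as ExpirationDate stays uniform across identity-based applications.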

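The requirement that enrollment packages travel only over secure channels (because they carry biometrics and breeder documents) can be sketched with Python's standard library: a TLS client context that keeps certificate validation and hostname checking on and refuses legacy protocol versions. The IDMS URL below is a placeholder, and this is an illustrative sketch rather than the reference architecture's actual protocol binding.

```python
import ssl
import urllib.request

def make_enrollment_channel_context():
    """TLS client context for EWS-to-IDMS uploads: server-certificate
    validation and hostname checking stay on (the library default),
    and pre-TLS-1.2 protocol versions are refused."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def post_enrollment_package(url, xml_bytes):
    """POST an enrollment package to the IDMS over the verified channel.
    (Performs a real network call; the URL is a placeholder.)"""
    req = urllib.request.Request(
        url,
        data=xml_bytes,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )
    return urllib.request.urlopen(req, context=make_enrollment_channel_context())
```

In a deployment meeting the article's requirements, the underlying cryptographic module would additionally need certification consistent with the organization's cryptographic strength requirements.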

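The PACS panel behavior described in the credential-target discussion, matching reader-submitted data against a cached lookup table before releasing a lock, amounts to a simple cache check. A minimal sketch follows; the record layout and credential numbers are invented for illustration.

```python
# Hypothetical cache of CPS-1 data held by a PACS panel, keyed by the
# credential number the PACS reader submits.
panel_cache = {
    "1000-42": {"name": "J. Applicant", "status": "active",
                "expires": "2009-12-31"},
    "1000-77": {"name": "B. Contractor", "status": "suspended",
                "expires": "2008-06-30"},
}

def release_lock(credential_number, today):
    """Activate the door lock only if the submitted credential is
    cached, active, and unexpired (ISO dates compare lexicographically)."""
    entry = panel_cache.get(credential_number)
    return (entry is not None
            and entry["status"] == "active"
            and today <= entry["expires"])
```

For instance, release_lock("1000-42", "2007-03-01") grants access, while the suspended credential "1000-77" and any number never provisioned via CPS-1 are refused.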
Departments

From the Editors
4 Trusted Computing in Context
FRED B. SCHNEIDER

News
7 News Briefs
BRANDI ORTEGA

Interview
11 Silver Bullet Talks with Dorothy Denning
GARY MCGRAW

Education
64 Common Body of Knowledge for Information Security
MARIANTHI THEOHARIDOU AND DIMITRIS GRITZALIS

On the Horizon
68 Secure Communication without Encryption?
KEYE MARTIN

Privacy Interests
72 Setting Boundaries at Borders: Reconciling Laptop Searches and Privacy
E. MICHAEL POWER, JONATHAN GILHEN, AND ROLAND L. TROPE

Crypto Corner
76 When Cryptographers Turn Lead into Gold
PATRICK P. TSANG

Secure Systems
80 A Case (Study) For Usability in Secure Email Communication
APU KAPADIA

Digital Protection
85 South Korea's Way to the Future
MICHAEL LESK

Building Security In
88 A Metrics Framework to Drive Application Security Improvement
ELIZABETH A. NICHOLS AND GUNNAR PETERSON

Emerging Standards
92 Infrastructure Standards for Smart ID Card Deployment
RAMASWAMY CHANDRAMOULI AND PHILIP LEE

84 Ad Product Index
