You are on page 1of 4

Crypto Corner

Editors: Peter Gutmann, pgut001@cs.auckland.ac.nz


David Naccache, david.naccache@ens.fr
Charles C. Palmer, ccpalmer@us.ibm.com

Side Channel Attacks on


Cryptographic Software

W
hen it comes to cryptographic software, seal, measured the beam’s modu-
lation, and recreated the conversa-
side channels are an often-overlooked tions in the room. This listening
method went undetected for years.
threat. A side channel is any observable More recently, side channel
attacks have become a power-
side effect of computation that an attack- ful threat to cryptography. One
of the first papers on side channel
er could measure and possibly influence. Crypto is especially attacks showed how to recover an
RSA private key merely by timing
Nate Lawson vulnerable to side channel attacks its quality. For example, memory how long it took to decrypt a mes-
Root Labs because of its strict requirements allocations by unrelated processes sage.1 This was possible because
for absolute secrecy. In the soft- might skew some measurements, RSA and other public-key crypto-
ware world, side channel attacks so a particularly busy system systems work with large numbers
have sometimes been dismissed as might have a low S/N ratio. Error (for example, 2,048 bits), whereas
impractical. However, new system correction methods can assist with modern CPUs have a smaller word
architecture features, such as larger this case. size. Crypto implementations
cache sizes and multicore proces- Whereas covert channels in- compensate by using multipreci-
sors, have increased the prevalence volve the problem of preventing sion arithmetic, representing large
of side channels and quality of mea- cooperation, side channel attacks numbers by an array of words and
surement available to an attacker. are a purely adversarial problem. using a loop to carry overflows
Software developers must be aware Side channels emerge because from one word to the next.
of the potential for side channel at- computation occurs on a non-ide- To raise a multiprecision num-
tacks and plan appropriately. al system, composed of transistors, ber to an exponent, systems such
wires, power supplies, memory, as RSA commonly use square-and-
History of and peripherals. Each component multiply. This optimization de-
Side Channel Attacks has characteristics that vary with composes an exponentiation into a
Side channels are a variant of the the instructions and data being series of squarings (x2) and condi-
classic covert-channel problem. processed. When this variance is tional multiplies (* x), which oc-
Covert channels involve two or measurable by an attacker, a side cur if the bit in question is a one.
more processes collaborating to channel is present. This is similar to pencil-and-paper
communicate via a shared re- Intelligence agencies have of- multiplication, in which trailing
source that they can both affect ten relied on side channel attacks zeros mean that you shift the re-
and measure. Attackers can ex- to monitor their foes. In one clev- sult one decimal place to the left
ploit these channels to bypass op- er incident, the Soviet Union pro- while nonzero digits are multi-
erating system protections such as vided a large wooden seal to the plied and added to the result.
mandatory access control that are American consulate in Moscow. Because the multiply step is
intended to keep the processes The US ambassador proudly hung conditional, an attacker gains in-
separate. For example, one process it in his office after it had been formation about the total number
can allocate memory while the examined for covert transmitters. of one bits with each decryption.
other measures the amount of free It appeared to be clean. Unbe- By measuring the total time to
memory. Through repetition of knownst to the ambassador, the perform a multiprecision expo-
this behavior, the first process can seal contained a carefully designed nentiation with different input
slowly communicate information cavity that vibrated in response messages, the attacker can eventu-
to the second. The channel’s sig- to sounds in the room. The spies ally recover the entire private key
nal-to-noise (S/N) ratio measures transmitted a radio beam at the or enough to brute-force the rest.

72 COPublished by the IEEE Computer and Reliability Societies ■ 1540-7993/09/$26.00 © 2009 IEEE ■ NOVEMBER/DECEMBER 2009
Crypto Corner

Timing attacks have continu- Authentication Code) hash return self.Sign(msg) ==


ally improved, even being per- construction is often used to au- sig_bytes
formed against an SSL (Secure thenticate messages. To compute
Sockets Layer) implementation a given message’s HMAC, the The Java code was equivalent.
over a network.2 New ways to fil- sender hashes the message and The underlying compari-
ter jitter have improved the distin- a secret key twice using a cryp- son operator for both high-level
guishability to 200 ns over a LAN tographic hash algorithm (for languages performs a byte-wise
and 30 ms over the Internet.3 At- example, SHA-256). The re- match of the two arrays. If any el-
tacks have also exploited new side sult is attached to the message. ement didn’t match, the compari-
channels. Power consumption, RF The recipient then calculates the son loop would terminate early.
and electromagnetic emissions, message’s HMAC via the same This would provide a timing side
sound, vibration, and even heat process and compares the result channel in which the attacker
give away information about secret to the value included with the could iteratively fill in guesses for
computations. These attacks aren’t message. If they match, the mes- each byte of the HMAC field, re-
merely the subject of research pa- sage wasn’t tampered with after submitting the same message each
pers. Smartcards used for payment, the sender calculated the HMAC. time. When the guess was cor-
transit, and satellite TV have been One subtlety with this pro- rect, the comparison would take a
compromised by both active fault cess is that the value the recipient little longer. Eventually, when the
induction attacks (“glitching”) and calculates must be kept secret. whole HMAC was correct, the re-
side channel attacks. Hackers used Consider what would happen if cipient would accept the message.
a timing attack against a secret the result were revealed to the An implementation of this at-
key stored in the Xbox360 CPU sender in the case of a mismatch, tack over a TCP connection to
to forge an authenticator and load perhaps as part of an error mes- localhost took about a thousand
their own code.4 sage. An attacker could submit a queries per byte of the secret key.
Embedded-systems designers message with an invalid HMAC This means that an attacker could
are no longer the only ones who field, observe the error message, find a 128-bit key in less than a
must prevent side channel attacks. and then resend the same mes- few minutes. Because the array
Previously, network-based timing sage with the correct value at- comparison operators in Java and
attacks against SSL were the only tached. The recipient would Python aren’t implemented na-
side channel attack most software accept this message as valid, even tively, the timing difference for
developers needed to consider. though the sender didn’t create each loop iteration was relatively
But today, virtualization and ap- the authenticator. Although most large. But even if there was more
plication-hosting services such as systems probably don’t reveal the network jitter or the comparison
Amazon S3 have given attackers correct HMAC value directly, loop was faster, the attacker could
a more privileged vantage point a side channel attack can often simply take more samples, apply
of running code on the same sys- produce the same effect. an appropriate filter, and perform
tem (possibly even at the same I recently reviewed the open- a statistical hypothesis test to de-
privilege level) as the target’s code. source Google Keyczar cryp- termine which guess was correct.
Also, high-speed multicore CPUs tographic library for possible The solution to this problem is
with large caches and complicated flaws.5 This library provides use- to implement a comparison func-
instruction- and data-dependent ful high-level key management tion that doesn’t terminate early.
behavior provide more possibili- features, with separate imple- Although this might sound easy
ties for side channels and greater mentations in Java, Python, at first, eliminating all conditional
precision for measurements. and C++. Keyczar includes an branches from a comparison loop
To illustrate side channel at-
tacks against software cryptogra- Power consumption, RF and electromagnetic emissions,
phy, I analyze three recent attacks.
Each is increasingly more pow- sound, vibration, and even heat give away information
erful, to the point where the at-
tacker can recover an entire RSA about secret computations.
key by measuring the behavior of
a single decryption operation. HMAC implementation for au- is surprisingly difficult. Even with
thenticating messages. a correct algorithm, some un-
Keeping the Correct The Python code compared derlying detail of the high-level
Answer Secret the received and calculated byte language implementation (such
The HMAC (Hash Message values as follows: as garbage collection) could still

www.computer.org/security 73
Crypto Corner

AES process ciative cache, which maps multiple 3. Trigger and time another en-
addresses to the same cache line cryption of the same plaintext.
on the basis of some fraction of
the upper address bits. For exam- The first step ensures that all the
ple, the addresses 0x100, 0x200, AES lookup tables accessed by
Cache
and 0x800 would all use the same the given plaintext and key are
cache line if the cache had 256 cached. The second step forces
lines of one byte each. “Set” refers the CPU to evict part of one AES
Spy process to the number of possible destina- table that the attacker is target-
tion cache lines per address (that ing, on the basis of a guess of the
is, N-way). The CPU evicts older key byte. The final step tests the
entries when data is loaded into an attacker’s hypothesis. If a cache
already-filled cache line. miss occurs and the AES en-
AES (Advanced Encryption cryption takes longer than other
Standard) is a standard block cipher. cases, the guess for the XOR of
It encrypts and decrypts data with the plaintext and key bytes was
Figure 1. In a Prime+Probe attack, a spy process probes a secret key, using a combination correct and caused the CPU to
the cache by monitoring timing of accesses to its of primitives such as MixColumns, reload the table from RAM after
own memory. As the target process encrypts, it evicts ShiftRows, and SubBytes over it had been evicted. If not, the
portions of the attacker’s memory from the cache, many rounds (10 for a 128-bit guess was incorrect. The attacker
resulting in longer access times. The access times key). A common optimization repeats this process to narrow the
for the individual regions of the attacker’s memory ­technique on 32-bit processors is to possible key values.
correspond to which tables the encryption process precompute a series of tables on the Prime+Probe (see Figure 1) is
accessed, and thus the target’s key. basis of the combination of these more powerful. It’s analogous to
primitives. AES encryption then placing a film negative behind an
becomes a series of table lookups object and measuring the outline
leave a measurable timing differ- and XOR operations. cast by the object’s shadow. Instead
ence. The standard C memcmp() Because the index for these of timing the encryption process,
function is unsafe as well because AES tables is the XOR of a plain- which is subject to noise and jit-
it also terminates early. text byte and a key byte, the in- ter due to surrounding code in
dices themselves must remain the target, the attacker repeatedly
Footprints in the Cache secret. However, a spy process times accesses to its own memory
Like the original RSA timing at- running on the same system can while the target encrypts. Each
tack, the HMAC timing attack observe the variable timing of time an encryption occurs, the
combines many measurements of the AES encryption due to cache CPU evicts one or more lines of
the entire operation to find the behavior, narrowing down the the attacker’s memory from the
target’s secret. However, more possible values for the key.6 Even cache, causing timing variation.
powerful side channel attacks can if running a spy process isn’t pos- Because the cache eviction is local
give insight into an algorithm’s sible, a remote attacker can of- to the attacker, countermeasures
intermediate working values, re- ten trigger changes in the system such as randomizing or normaliz-
vealing the secret more quickly. cache state by interacting with ing the total encryption time have
Modern systems employ a other processes and timing those no effect.
CPU cache to keep frequently ac- unrelated tasks’ behavior. Such an attack isn’t merely a
cessed memory close to where it’s Dag Arne Osvik and his col- timing attack. Although time is
needed. When data for a given ad- leagues have described two useful the method for probing cache be-
dress is in the cache, it’s returned ways to induce variability and ob- havior, this attack could use other
immediately. If not, it’s fetched serve cache behavior: Evict+Time methods to determine the cache
from memory into the cache, and Prime+Probe.6 state. For example, if an instruc-
stalling the CPU for a few more Evict+Time works as follows: tion provided the number of valid
cycles. A cache is often divided cache lines for the current task, it
into blocks called lines. 1. Trigger an encryption in the would directly provide the same
Because a cache is smaller than target process. information obtained from this
the memory it shadows, the CPU 2. Evict memory from chosen timing side channel.
must have a policy for filling and cache lines by accessing the Intel and AMD (and previ-
reusing its space. The most com- appropriate addresses in the ously, Via) introduced AES in-
mon implementation is a set asso- attacker’s process. structions to address this problem

74 IEEE SECURITY & PRIVACY


Crypto Corner

and increase performance. Unfor-


tunately, owing to the structure
of AES, there appears to be no
S ide channel attacks were once
esoteric, remaining the do-
main of special-purpose hardware.
Seifert, “On the Power of Simple
Branch Prediction Analysis,” Proc.
2nd ACM Symp. Information, Com-
way to build a high-performance However, with the advent of cloud puter and Communications Security,
implementation on a general-pur- computing and virtualized servers, ACM Press, 2006, pp. 312–320.
pose CPU while avoiding cache you can no longer assume that at-
side channels. tackers are remote. Advanced sta- Nate Lawson is the founder of Root
tistical methods and modeling have Labs, a security consulting practice fo-
Which Way Did He Go? given them precise measurements cusing on kernel, embedded-platform,
A related but even more powerful independent of jitter. Meanwhile, and cryptography design and analysis.
attack uses the branch prediction CPUs’ increasing microarchitec- Contact him at nate@rootlabs.com.
cache’s status as a side channel.7 tural complexity has created more
Instead of detecting memory ac- side channels to exploit. Any soft- Selected CS articles and columns
cesses to the key data, this attack ware developer who writes or are also available for free at
determines the code path the tar- deploys an application utilizing http://ComputingNow.computer.org.
get process takes while executing cryptography must be aware of this
the encryption code. powerful class of attacks.
As I previously described,
square-and-multiply has an op- References
tional multiplication step. If the at- 1. P. Kocher, “Timing Attacks
tacker can detect when this branch on Implementations of Diffie-­
is taken, he or she can determine Hellman, RSA, DSS, and Other
which bits of the key are ones. Systems,” Cryptography Research,
(Other, more optimized routines 1995; www.cryptography.com/
such as sliding-window exponen- resources/whitepapers/Timing
tiation have similar weaknesses.) Attacks.pdf.
Because modern CPUs have a 2. D. Brumley and D. Boneh, “Re-
deep pipeline, they implement a mote Timing Attacks Are Prac-
branch prediction unit, which keeps tical,” Proc. 12th Conf. Usenix
track of the target address and Security Symp., Usenix Assoc.,
whether the branch was taken in 2003, p. 1.
a cache called the branch prediction 3. S.A. Crosby, D.S. Wallach, and
target buffer (BTB). As with the R.H. Riedi, “Opportunities and
memory cache, an attacker can Limits of Remote Timing At-
influence and measure the cache tacks,” ACM Trans. Information
state by performing jumps and and System Security, vol. 12, no.
timing either the encryption pro- 3, article 17; www.cs.rice.edu/
cess (as in Evict+Time) or its own ~dwallach/pub/crosby-timing
execution speed (Prime+Probe). 2009.pdf.
One potential hurdle for 4. “Timing Attack Tested Success-
branch prediction side channel at- fully: Downgrade from Any Ker-
tacks is disruption due to support nel without CPU-Key”; www.
code or other processes running. xbox-scene.com/xbox1data/sep/
This adds noise to the measure- EElZluZypZpmixPJrS.php.
ments. However, Onur Aciiçmez 5. N. Lawson, “Timing Attack
and his colleagues discovered that on Google Keyzar,” blog, 28
this noise was highly periodic.7 By May 2009; http://rdist.root.org/
taking several different measure- 2009/05/28/tim ing-attack-in
ments, they could select the one -google-keyczar-library.
with the lowest noise and use it 6. D.A. Osvik, A. Shamir, and E.
as the source for the key bits they Tromer, “Cache Attacks and
were detecting. Unlike the cache Countermeasures: The Case of
attacks on AES, such an attack AES,” Topics in Cryptology—CT-
can derive enough key bits from a RSA 2006, LNCS 3860, Spring-
single trace that repeated analysis er, 2006; pp. 1–20.
is unnecessary. 7. O. Aciiçmez, Ç.K. Koç, and J.-P.

www.computer.org/security 75