Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Werner Schindler · Sorin A. Huss (Eds.)
Constructive
Side-Channel Analysis
and Secure Design
Volume Editors
Werner Schindler
Bundesamt für Sicherheit in der Informationstechnik (BSI)
Godesberger Allee 185–189
53175 Bonn, Germany
E-mail: werner.schindler@bsi.bund.de
Sorin A. Huss
Technische Universität Darmstadt
Hochschulstr. 10
64289 Darmstadt, Germany
E-mail: huss@iss.tu-darmstadt.de
Local Organizers
Annelie Heuser Technische Universität Darmstadt, Germany
Michael Kasper Fraunhofer SIT, Germany
Marc Stöttinger Technische Universität Darmstadt, Germany
Michael Zohner Technische Universität Darmstadt, Germany
Program Committee
Onur Acıiçmez Samsung Electronics, USA
Guido Bertoni ST Microelectronics, Italy
Stanislav Bulygin TU Darmstadt, Germany
Ray Cheung City University of Hong Kong, Hong Kong
Jean-Luc Danger Télécom ParisTech, France
Markus Dichtl Siemens AG, Germany
Viktor Fischer Université de Saint-Etienne, France
Ernst-Günter Giessmann T-Systems International GmbH, Germany
Tim Güneysu Ruhr-Universität Bochum, Germany
Lars Hoffmann Giesecke & Devrient GmbH, Germany
Naofumi Homma Tohoku University, Japan
Marc Joye Technicolor, France
Jens-Peter Kaps George Mason University, USA
Çetin Kaya Koç University of California Santa Barbara, USA
and Istanbul Şehir University, Turkey
Arjen Lenstra EPFL, Switzerland
Pierre-Yvan Liardet ST Microelectronics, France
Stefan Mangard Infineon Technologies AG, Germany
Sandra Marcello Thales, France
David Naccache ENS Paris, France
External Reviewers
Michel Agoyan Bernhard Jungk Mathieu Renauld
Joppe Bos Markus Kasper Vladimir Rozic
Lilian Boussuet Michael Kasper Fabrizio de Santis
Pierre-Louis Cayrel Toshihiro Katashita Laurent Sauvage
Guillaume Duc Stéphanie Kerckhof Hermann Seuschek
Junfeng Fan Chong Hee Kim Marc Stöttinger
Lubos Gaspar Jiangtao Li Daehyun Strobel
Benedikt Gierlichs Marcel Medwed Mostafa Taha
Christophe Giraud Filippo Melzani Junko Takahashi
Sylvain Guilley Oliver Mischke Michael Tunstall
Yu-Ichi Hayashi Amir Moradi Rajesh Velegalati
Stefan Heyse Abdelaziz Moulay Markus Wamser
Matthias Hiller Nadia El Mrabet Michael Weiss
Philippe Hoogvorst Jean Nicolai Carolyn Withnall
Gabriel Hospodar David Oswald Meiyuan Zhao
Dimitar Jetchev Gilles Piret Michael Zohner
Table of Contents
Invited Talk I
700+ Attacks Published on Smart Cards: The Need for a Systematic Counter Strategy ..... 33
Mathias Wagner
Secure Design
An Interleaved EPE-Immune PA-DPL Structure for Resisting Concentrated EM Side Channel Attacks on FPGA Implementation ..... 39
Wei He, Eduardo de la Torre, and Teresa Riesgo
An Architectural Countermeasure against Power Analysis Attacks for FSR-Based Stream Ciphers ..... 54
Shohreh Sharif Mansouri and Elena Dubrova
Conversion of Security Proofs from One Leakage Model to Another: A New Issue ..... 69
Jean-Sébastien Coron, Christophe Giraud, Emmanuel Prouff, Soline Renner, Matthieu Rivain, and Praveen Kumar Vadnala
Fault Attacks
A Fault Attack on the LED Block Cipher ..... 120
Philipp Jovanovic, Martin Kreuzer, and Ilia Polian
Invited Talk II
A Closer Look at Security in Random Number Generators Design ..... 167
Viktor Fischer
1 Introduction
Side-channel attacks are among the most powerful attacks on cryptographic implementations. They exploit secret information that physically leaks out of a device. Typical side channels are the power consumption [11,12], the electromagnetic emanation [1], and the execution time of cryptographic algorithms [10]. The efficiency, or even the success, of an attack is largely determined by the measurement equipment used: the better the equipment, the lower the noise and the better the side-channel leakage can be exploited. Especially when countermeasure-enabled devices are analyzed, the setup is vital for keeping the number of power-trace acquisitions needed for a successful attack low.
In this paper, we present a setup that improves the efficiency of side-channel attacks by measuring the difference of two side-channel leakages.

W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 1–16, 2012.
© Springer-Verlag Berlin Heidelberg 2012

2 M. Hutter et al.

Our setup is based on the idea of using two cryptographic devices (instead of one) and measuring the difference of their physical characteristics (e.g., the power consumption). If both modules perform the same cryptographic operation, their physical characteristics are equal, so the difference of the two side-channel measurements becomes, in theory, zero. However, if one module processes different data than the other, a difference between the two measurements can be observed at the points in time when data-dependent information is processed. The difference of both side channels therefore contains only data-dependent signals and eliminates static and (non-data-dependent) dynamic signals, i.e., noise. Hence, the quality of the measurements improves significantly, so that fewer power traces have to be acquired in practice.
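The cancellation principle can be illustrated with a small simulation. The sketch below is our own construction with invented signal amplitudes, not measured data: both simulated devices share a static component and common-mode noise, which cancel in the difference, while the data-dependent leakage survives.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

static = 0.5 * np.sin(2 * np.pi * np.arange(n) / 50)  # static component, identical on both chips
common = 0.2 * rng.standard_normal(n)                 # supply/environment noise seen by both chips
leak1 = np.zeros(n); leak1[400:410] = 0.05            # data-dependent leakage of device 1
leak2 = np.zeros(n); leak2[400:410] = -0.05           # device 2 processes different data

# each device additionally has a small amount of independent noise
trace1 = static + common + leak1 + 0.005 * rng.standard_normal(n)
trace2 = static + common + leak2 + 0.005 * rng.standard_normal(n)

# what the differential probe sees: static part and common-mode noise cancel,
# only the data-dependent difference (plus a little residual noise) remains
diff = trace1 - trace2
```

In this toy model the residual noise in `diff` is an order of magnitude below the common-mode noise of a single trace, which is exactly the effect the setup exploits.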
To perform side-channel analysis attacks using our setup, an attacker can choose between two attack scenarios: (1) one device is fed with constant input data while the second device is fed with random data, or (2) one device is fed such that the targeted intermediate value is complementary to the intermediate value of the second device. For both scenarios, we quantified the efficiency in practical experiments. We designed three evaluation boards, each carrying two identical devices (an AT89S8253 microcontroller, an ATmega128, and a custom 8051 ASIC design, respectively). In our experiments, we used the Pearson correlation coefficient and performed a classical Differential (or Correlation-based) Power Analysis (DPA) attack [11,12] on the differential power trace. Our best results increased the correlation coefficient for the AT89S8253 from 0.64 to 0.99 (55 %), for the ATmega128 from 0.61 to 0.96 (57 %), and for the custom 8051 ASIC from 0.11 to 0.22 (100 %). Furthermore, we evaluated our method on countermeasure-enabled devices and performed attacks on an implementation that uses randomization techniques as well as on a masked AES implementation. In this scenario, the setup reduced the number of required traces by up to 90 %.
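The correlation step of such an attack can be sketched in a few lines. The toy model below is our own: it uses a simulated single-sample Hamming-weight leakage of a simple XOR intermediate value with an arbitrary key byte, not the measured traces of the paper, but the key-recovery logic (pick the guess with the highest absolute Pearson correlation) is the same.

```python
import numpy as np

rng = np.random.default_rng(1)
hw = lambda v: bin(v).count("1")  # Hamming weight of a byte

key = 0x2B                        # hypothetical key byte, for illustration only
pts = rng.integers(0, 256, size=2000)
# simulated leakage at one point in time: HW of the intermediate value plus noise
traces = np.array([hw(p ^ key) for p in pts], dtype=float)
traces += rng.standard_normal(len(pts))

def cpa(pts, traces):
    """Return the key guess with the highest absolute Pearson correlation."""
    best, best_rho = None, -1.0
    for guess in range(256):
        hyp = np.array([hw(p ^ guess) for p in pts], dtype=float)
        rho = abs(np.corrcoef(hyp, traces)[0, 1])
        if rho > best_rho:
            best, best_rho = guess, rho
    return best, best_rho

guess, rho = cpa(pts, traces)
```

On a differential trace, `traces` would simply be the samples of the measured difference at the point of interest; the evaluation is otherwise unchanged.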
The rest of this paper is organized as follows. In Section 2, we discuss related work. Section 3 gives a brief overview of side-channel measurements and describes how to improve the signal-to-noise ratio. After that, we present the new measurement setup and highlight its benefits. In Section 4, we describe the measurement process in detail and introduce two different measurement scenarios. The three evaluation boards are presented in Section 5. Section 6 describes the performed attacks. Results are given in Sections 7 and 8. Conclusions are drawn in Section 9.
2 Related Work
(RCIS) and Tohoku University [17]. The TSRC has released two boards, the INSTAC-8 with an 8-bit microcontroller and the INSTAC-32 with a 32-bit microcontroller and an FPGA. Of the SASEBO boards there exists a variety of evaluation platforms that contain Xilinx (SASEBO, SASEBO-G, SASEBO-GII) or Altera (SASEBO-B) FPGAs. The boards contain two FPGAs, one for the implementation of the cryptographic algorithm and one for handling control tasks. Since the FPGAs have integrated processor cores (PowerPC cores), both hardware and software implementations can be evaluated with these boards. An SCA simulation tool has also been presented by Eindhoven University of Technology. The tool, called PINPAS, allows analyzing the vulnerability of software algorithms to SCA attacks [7]. Commercial SCA evaluation setups are offered by companies like Cryptography Research (DPA Workstation [6]), Riscure (Inspector [16]), and Brightsight (Sideways [4]).
(Figure: schematic of the differential measurement setup. Two ICs (IC 1, IC 2) with separate ground lines GND1 and GND2 are connected over measurement resistors R1 and R2; the differential voltage VDiff is measured between the points A and B.)
to 7.3 in our experiments, cf. Section 7). Even very low power-consumption differences (caused, for example, by data-dependent operations) can be efficiently identified.
3. Higher signal-to-noise ratio. Since the noise level is reduced and the signal-acquisition resolution is increased, the signal-to-noise ratio (SNR) is higher compared to conventional DPA attack setups. In fact, the higher the SNR, the fewer traces have to be acquired.
The setup can be applied in scenarios where both devices run synchronously, i.e., the devices process the same operation and data at the same instant of time. This is typically the case for devices that are fed by an external clock source or that possess a very stable internal clock generator. In these cases, both devices can easily be synchronized by feeding them from the same clock source or by powering them up simultaneously. To reduce the synchronization effort of the proposed setup, a simple yet effective synchronization circuit based on an FPGA could be used. The FPGA would merely have to trigger a reset signal or toggle the power supply of both devices whenever the first response (e.g., a power-up indicator) is asynchronous. Once implemented, such an automatic trial-and-error device would be universally usable and able to provide a synchronous measurement setup very quickly.
For many embedded systems like conventional smart cards, the setup may fail because both devices exhibit asynchronous behavior that cannot be controlled by an attacker. This asynchronous behavior is caused by asynchronous designs, unstable clock sources, or side-channel countermeasures such as clock jitter. However, in a white-box scenario, where the implementation is known and the devices can be fully controlled, one can benefit from the higher signal-to-noise ratio of the setup to reduce the number of traces needed for a successful attack.
In this paper, we consider only contact-based power-analysis attacks, even though the idea can also be extended to electromagnetic (EM) attack settings. In such a scenario, the position of the probes plays a major role in the efficient cancellation of uninteresting signals.
4 Measurement Methodology
In the following, we describe the measurement process for performing side-channel attacks using our proposed setup. First, the setup has to be calibrated in order to efficiently identify interesting side-channel leakages. In a second step, an attacker (or evaluator) has to choose one of several attack scenarios, e.g., keeping the key or input data of one device constant, or choosing the inputs such that the targeted intermediate value is complementary to the intermediate value of the second device.
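For an AES S-box output, the complementary-value scenario can be instantiated concretely: in a white-box evaluation where the key is known, the plaintext byte for the second device can be computed so that its S-box output is the bitwise complement of the first device's. The sketch below is our own illustration (key and plaintext values are arbitrary); the S-box is built from its standard definition, inversion in GF(2^8) followed by the affine transformation.

```python
def gf_mul(a, b):
    """Carry-less multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse via a^254 (0 maps to 0, as in AES)."""
    if a == 0:
        return 0
    r, base, e = 1, a, 254
    while e:
        if e & 1:
            r = gf_mul(r, base)
        base = gf_mul(base, base)
        e >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_fn(a):
    """AES S-box: GF(2^8) inversion followed by the affine transformation."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [sbox_fn(i) for i in range(256)]
INV_SBOX = [0] * 256
for i, v in enumerate(SBOX):
    INV_SBOX[v] = i

def complementary_plaintext(p1, k):
    """Plaintext p2 for device 2 such that SBOX[p2 ^ k] is the bitwise
    complement of SBOX[p1 ^ k] on device 1."""
    return INV_SBOX[SBOX[p1 ^ k] ^ 0xFF] ^ k

k, p1 = 0x2B, 0x41          # arbitrary illustrative key and plaintext bytes
p2 = complementary_plaintext(p1, k)
```

With inputs chosen this way, every bit of the targeted intermediate value differs between the two devices, which maximizes the data-dependent signal in the differential trace.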
(Figure: power-trace plots, Voltage [V] over Time [ns].)
Fig. 5. The processing of different input data x and x′ causes a voltage difference between both ICs, which can be exploited in a side-channel attack
acquisition trace can be measured and evaluated, e.g., in elliptic curve based
implementations where the ephemeral key is changed in every execution. In this
case, the setup efficiently reveals the power-consumption difference of the two
devices in a single shot. This difference can then be compared with generated
power-consumption templates in order to classify the leakage according to the
processed data.
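The template-classification step mentioned here can be illustrated with a toy model. The sketch below is our own construction (class means, noise level, and trace length are invented): reduced templates are built as per-class mean traces in a profiling phase, and a single acquisition is classified by the nearest template.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, sigma = 100, 0.5
# hypothetical class means: one leakage profile per processed value (4 classes here)
means = {v: rng.standard_normal(n_samples) for v in range(4)}

def profile(v, n=200):
    """Profiling phase: average n noisy traces of class v into a reduced template."""
    return np.mean([means[v] + sigma * rng.standard_normal(n_samples)
                    for _ in range(n)], axis=0)

templates = {v: profile(v) for v in range(4)}

def classify(trace):
    """Attack phase: nearest-template (minimum Euclidean distance) classification."""
    return min(templates, key=lambda v: np.linalg.norm(trace - templates[v]))

single_shot = means[3] + sigma * rng.standard_normal(n_samples)  # one acquisition
```

Full template attacks additionally model the noise covariance [2,5]; the mean-only variant above is the simplest reduced form.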
Fig. 6. The test apparatus according to the ISO/IEC 10373-6 standard [8]
Fig. 7. The AT89S8253 evaluation board
coprocessor implemented in CMOS logic and in a masked logic style¹. The ASIC evaluation board additionally contains voltage regulators and two ROMs for storing the programs executed on the microcontroller cores.
Both devices on the respective evaluation board are connected to the same clock source, whereby the clock wires have been routed such that timing differences (i.e., clock skew) are minimized. All three evaluation boards make it possible to easily measure the core power consumption of each of the two devices over a measurement resistor, either in the VDD or in the GND line, as well as to measure the power-consumption difference of both devices.
Fig. 8. The ATmega128 evaluation board
Fig. 9. The ASIC prototype-chip evaluation board
All boards have been connected to a PC that runs Matlab [18] to control the entire measurement setup. The PC transmits three bytes over the serial connection to the two ICs assembled on each board. IC1 listens to the first byte, IC2 listens to the second byte, and the last byte starts the operation on both ICs.
The power consumption of the ICs has been measured using a 2.5 GHz LeCroy WavePro 725Zi 8-bit digital-storage oscilloscope. For all experiments, we used a sampling rate of 5 GS/s. Each IC has further been programmed to pull a debug pin high, which triggers the oscilloscope and starts the measurement process. Furthermore, we used an active differential probe, a LeCroy D320 WaveLink differential probe with 3.5 GHz bandwidth, to measure the difference of the two side channels.
Processor Synchronization. It turned out that the ICs of each setup are often not synchronized after startup and that their trigger signals occur at different points in time. This is because the two ICs are not powered up perfectly in parallel, which causes one IC to be clocked earlier or later than the other. In addition, the ICs exhibit slightly different characteristics (power consumption, timing, etc.) due to variations in the fabrication process. To minimize these differences, we recommend using only ICs with the same revision number, production line, and year/month of fabrication.
To synchronize the two ICs, we had to reset and power up the boards until they were synchronized (trial and error). For example, for the 8051 microcontroller AT89S8253 the probability of synchronization is 1/24, since the processor requires 12 clock cycles (so-called T-states) to execute a single machine cycle.
Exploiting the Difference of Side-Channel Leakages 11
Table 1. Result of the Constant-Value Attack using the Pearson Correlation coefficient
7 Results of Attacks
This section presents the results of the performed attacks. All boards have been
clocked at a frequency of 3.6864 MHz.
Table 1 shows the correlation coefficient for each measurement setup. For the
AT89S8253 and the ATmega128, we measured 1 000 power traces. 10 000 traces
have been measured for the 8051 CMOS core of the ASIC prototype chip.
The results show that our setup increased the correlation coefficient by 0.23 (about 36 %) compared to the result obtained with a classical CPA-attack setup. This means that the number of required power traces is reduced by a factor of about 2.7 (from 50 to only 18). The vertical resolution of the oscilloscope was increased from 81 mV/DIV (for the Reference Attack) to 11 mV/DIV (for the Constant-Value Attack), i.e., by a factor of about 7.3. Similar results have been obtained for the ATmega128: the correlation coefficient increased by 0.26 (about 43 %), so the required number of traces is reduced by a factor of 3.2 (from 57 to 18), and the acquisition resolution has been increased by a factor of about 3.8. An improvement of about 27 % has been obtained for the 8051 CMOS ASIC, reducing the required number of traces by a factor of 1.6 (from about 2 300 to only 1 400); the acquisition resolution has been increased by a factor of 3.3.
We also calculated the SNR in order to compare the signal level to the noise level. The SNR increased by a factor of 4.7 to 11.5 in our experiments (depending on the device used). An example of the SNR improvement on the ATmega128 is given in Appendix A.
Fig. 10. Result of a classical CPA attack on one ATmega128 device (Reference Attack)
Fig. 11. Result of a CPA attack that exploits the difference of two side channels (Complementary-Value Attack)
Table 3. Summary of the CPA attacks on the AES coprocessor of the prototype chip implemented in CMOS logic. For the attacks, we applied the Hamming-distance power model

output in the first round of the AES encryption. We targeted 8-byte transitions in the AES State and measured 200 000 power traces for the analyses.
The results show that our setup improves the correlation coefficient by between 30 % and 72 %. In five of the eight attacks, the correlation coefficient could be increased by more than 50 %. For the best attack, this means that 33 000 traces instead of about 97 000 traces have to be measured for a successful attack, which corresponds to a trace reduction by a factor of nearly 3.
9 Conclusion
In this paper, we presented a measurement setup that increases the efficiency of side-channel attacks. The idea of the setup is to use two cryptographic devices and to measure the difference of their side-channel leakages. If both devices perform the same operation synchronously while processing different data, the static and the data-independent power consumption is canceled out and only the data-dependent side-channel leakage remains, which can then be effectively identified. This results in a much higher signal-to-noise ratio during the measurement, so that up to 90 % fewer power traces have to be acquired for a successful attack, as shown in practical experiments. Furthermore, the setup can be used to efficiently identify differences in the instruction flow of cryptographic implementations or to discover data-dependent variations which can be exploited in attacks. The setup also significantly increases the efficiency of template-based side-channel attacks that use only a single acquired power trace to reveal secret information.
References
1. Agrawal, D., Archambeault, B., Rao, J.R., Rohatgi, P.: The EM side-channel(s).
In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp.
29–45. Springer, Heidelberg (2003)
2. Agrawal, D., Rao, J.R., Rohatgi, P., Schramm, K.: Templates as Master Keys.
In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 15–29. Springer,
Heidelberg (2005)
3. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage Model.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29.
Springer, Heidelberg (2004)
4. Brightsight. Unique Tools from the Security Lab, http://www.brightsight.com/
documents/marcom-materials/Brightsight Tools.pdf
5. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
6. Cryptography Research. DPA Workstation,
http://www.cryptography.com/technology/dpa-workstation.html
7. den Hartog, J., Verschuren, de Vink, E., de Vos, J., Wiersma, W.: PINPAS: A Tool for Power Analysis of Smartcards. In: SEC 2003, pp. 453–457 (2003)
8. International Organisation for Standardization (ISO). ISO/IEC 10373-6: Identifi-
cation cards - Test methods – Part 6: Proximity cards (2001)
9. International Organisation for Standardization (ISO). ISO/IEC 10373-7: Identifi-
cation cards - Test methods – Part 7: Vicinity cards (2001)
10. Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS,
and Other Systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp.
104–113. Springer, Heidelberg (1996)
11. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
12. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks – Revealing the Secrets
of Smart Cards. Springer (2007) ISBN 978-0-387-30857-9
13. Matsumoto, T., Kawamura, S., Fujisaki, K., Torii, N., Ishida, S., Tsunoo, Y., Saeki,
M., Yamagishi, A.: Tamper-resistance standardization research committee report.
In: The 2006 Symposium on Cryptography and Information Security (2006)
14. Popp, T., Kirschbaum, M., Mangard, S.: Practical Attacks on Masked Hardware.
In: Fischlin, M. (ed.) CT-RSA 2009. LNCS, vol. 5473, pp. 211–225. Springer,
Heidelberg (2009)
15. Popp, T., Kirschbaum, M., Zefferer, T., Mangard, S.: Evaluation of the Masked
Logic Style MDPL on a Prototype Chip. In: Paillier, P., Verbauwhede, I. (eds.)
CHES 2007. LNCS, vol. 4727, pp. 81–94. Springer, Heidelberg (2007)
16. Riscure. Inspector - The Side-Channel Test Tool,
http://www.riscure.com/fileadmin/images/Docs/Inspector_brochure.pdf
17. Side-channel attack standard evaluation board. The SASEBO Website,
http://www.rcis.aist.go.jp/special/SASEBO/
18. The Mathworks. MATLAB - The Language of Technical Computing,
http://www.mathworks.com/products/matlab/
We calculated the signal-to-noise ratio for the power measurements on the ATmega128 board (see Section 5 for a description of the board). Figure 12 shows three SNR plots corresponding to three performed attacks: the Reference Attack, the Constant-Value attack, and the Complementary-Value attack. The SNR is defined as the ratio of the signal power to the noise power. For the signal characterization, we calculated the variance of the means for each of the 256 possible intermediate values (300 power traces per value, resulting in 76 800 power traces in total). The noise has been characterized by calculating the variance of constant-value processing, cf. [12]. The SNR is improved by a factor of 21.6 (from 3 to about 65). For the Constant-Value attack, the SNR has been improved from 0.3 to about 14 (by a factor of about 46).
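The described SNR estimate (variance of the per-value mean traces divided by the noise variance of constant-value processing) can be sketched on simulated data. All leakage parameters below (Hamming-weight model, noise level, point of interest) are our own invention for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
hw = lambda v: bin(v).count("1")
sigma, n_per_value, n_points, poi = 1.0, 50, 20, 10  # invented leakage parameters

# simulate power traces: Hamming-weight leakage at one point of interest (poi)
traces, values = [], []
for v in range(256):
    for _ in range(n_per_value):
        t = sigma * rng.standard_normal(n_points)
        t[poi] += hw(v)
        traces.append(t)
        values.append(v)
traces, values = np.asarray(traces), np.asarray(values)

# signal: variance (over the 256 values) of the per-value mean traces
mean_traces = np.array([traces[values == v].mean(axis=0) for v in range(256)])
signal = mean_traces.var(axis=0)
# noise: variance of repeated processing of one constant value
noise = traces[values == 0].var(axis=0)
snr = signal / noise
```

The SNR peaks only at the point in time where the intermediate value actually leaks, which is also how such plots are read in Figure 12.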
Fig. 12. Signal-to-noise ratio of the Reference Attack, Constant-Value attack, and Complementary-Value attack on the ATmega128
Attacking an AES-Enabled NFC Tag:
Implications from Design to a Real-World
Scenario
1 Introduction
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 17–32, 2012.
Springer-Verlag Berlin Heidelberg 2012
18 T. Korak, T. Plos, and M. Hutter
Fig. 1. Architecture of the evaluated chip
Fig. 2. The development board with the evaluated chip
3 Measurement Setup
The LC584AM oscilloscope from LeCroy was used to record the traces, and the recording process was controlled by a computer running MATLAB scripts. To establish the communication between computer and tag, an RFID reader (Tagnology TagScan) was used. The EM probes for measuring the electromagnetic emanation are from Langer EMV-Technik. We were able to record one trace per second on average. The reasons for this rather low recording speed are, on the one hand, the two-step communication between computer and tag (the reader sits in between) and, on the other hand, the time-consuming storage of the traces on the computer. Three different measurement setups were used to record the traces needed for the SCA attacks: the real-world scenario, the test scenario, and the FPGA scenario.
Real-World Scenario. The real-world scenario is the most important one because it can be used to attack the real NFC-tag chip without additional requirements like trigger pins or an external power supply. In this scenario, the electromagnetic emanation of the chip is measured using an EM probe. In order to measure only the electromagnetic emanation and not the reader signal, we separated the chip from the antenna; this approach was presented by Hutter et al. [14] as well as by Carluccio et al. [4]. The chip could thus be placed outside the reader field for better measurement results. In our setup, the distance between the tag chip and the antenna was 25 centimeters. The presented modification can be made to every RFID tag. A second EM probe, placed inside the reader field, was used to obtain the trigger information; with these traces, the reader commands could easily be identified. The EM traces were recorded with a sampling rate of 2.5 GS/s. A schematic of the measurement setup for this scenario is shown in Figure 3. There were only small deviations in the duration between the reader command and the start of the AES calculation. With an alignment step, these deviations could be removed and satisfying DPA-attack results could be achieved. The least-squares matching method was used to align the traces.
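Least-squares alignment can be sketched as a search over candidate shifts that minimizes the sum of squared differences to a reference trace. The pattern, noise level, and shift below are synthetic; the paper's traces and search window are not reproduced here.

```python
import numpy as np

def align(trace, reference, max_shift=50):
    """Shift `trace` (circularly) so that the sum of squared differences
    to `reference` is minimal; return the aligned trace and the shift."""
    best = min(range(-max_shift, max_shift + 1),
               key=lambda s: np.sum((np.roll(trace, s) - reference) ** 2))
    return np.roll(trace, best), best

rng = np.random.default_rng(4)
reference = np.zeros(500)
reference[100:110] = 1.0                    # a distinctive pattern, e.g. a current peak
misaligned = np.roll(reference, 17) + 0.05 * rng.standard_normal(500)
aligned, shift = align(misaligned, reference)
```

In practice the matching is restricted to a window around a distinctive feature rather than the whole trace, but the criterion is the same.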
Test Scenario. The test scenario can only be performed with the development board and is also used to attack the ASIC chip. In this scenario, the chip was powered by an external power supply, so it does not use the supply voltage extracted from the reader field. We inserted a 100 Ω resistor into the ground line in order to measure the power consumption of the chip. A schematic overview of the measurement setup is shown in Figure 4. The amplitude of the recorded trace increases significantly when the chip starts an AES calculation, which could be used as trigger information. With this setup the traces were not perfectly aligned, so an alignment step was again necessary to obtain satisfying DPA-attack results.
FPGA Scenario. The FPGA scenario was used to attack the FPGA-prototype tag. In this scenario, the electromagnetic emanation of the FPGA was used as side-channel information; we measured it with an EM probe. One advantage of the FPGA-prototype tag for the EM measurements was that the FPGA chip is placed outside the reader field. Several pins of the FPGA-prototype tag can be used as debug pins. We used one of these pins to indicate when the AES calculation starts, and its signal could be used as trigger information. This trigger information was very accurate, so no alignment step was necessary for successful DPA attacks on the FPGA-prototype tag.

Fig. 3. Measurement setup of the real-world scenario
Fig. 4. Measurement setup of the test scenario
\[ n = 3 + 8\,\frac{z_{1-\alpha}^{2}}{\ln^{2}\left(\frac{1+\rho}{1-\rho}\right)} \tag{1} \]
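Equation (1) can be evaluated directly. The paper does not state its error probability, so α = 0.0001, a commonly assumed value in this rule of thumb, is our assumption; with it, the formula reproduces the trace counts reported later for the measured correlation values (about 246 for ρ = 0.325 and about 46-47 for ρ = 0.664).

```python
from math import log
from statistics import NormalDist

def traces_needed(rho, alpha=0.0001):
    """Evaluate Equation (1); alpha = 0.0001 is an assumed error probability."""
    z = NormalDist().inv_cdf(1 - alpha)  # quantile z_{1-alpha}
    return 3 + 8 * z ** 2 / log((1 + rho) / (1 - rho)) ** 2

n_real_world = traces_needed(0.325)  # correlation of the real-world scenario
n_test = traces_needed(0.664)        # correlation of the test scenario
```

The formula makes the trade-off explicit: the number of required traces grows roughly with 1/ρ² for small correlations, which is why even modest improvements of ρ pay off strongly.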
The results of the performed DPA/DEMA attacks can be split into two main parts: attacks with disabled countermeasures and attacks with enabled countermeasures. The attacks with disabled countermeasures were used to evaluate the performance of the different measurement setups. They are equivalent to attacks on an unprotected AES implementation, and results can be achieved with a small number of traces. The randomization parameters for the countermeasures were fixed: no dummy rounds are inserted at the beginning, and shuffling is deactivated, so the first S-box operation always appears at the same point in time for every new AES encryption. With this step we show that the different
Fig. 5. DEMA-attack result of the real-world scenario with countermeasures disabled. In this case the whole amplitude of the EM trace was recorded.
Fig. 6. DEMA-attack result of the real-world scenario with countermeasures disabled. In this case only the positive values of the EM trace were recorded.
As a result we got a higher correlation value of 0.325; the result is shown in Figure 6. According to Equation 1, 246 traces suffice to find the correct value of the first key byte. With this improvement we were able to decrease the number of required traces from 373 to 246.
As a second experiment we performed a DPA attack using the test scenario. In this scenario we used an external power supply for the chip and measured the power consumption over a resistor in the ground line. Here we got a correlation value of 0.664 for the correct key hypothesis; about 47 traces are needed to reveal the value of the first key byte.
In the FPGA case, about 54 traces are needed to perform a successful attack. For comparison, we have plotted the result of the DEMA attack on the FPGA-prototype tag in Figure 7; here the correlation value for the correct key hypothesis is 0.629. Filtering the recorded traces was the only preprocessing step required for a successful attack: a bandpass filter with a lower cut-off frequency of 15 MHz and an upper cut-off frequency of 25 MHz had to be used to obtain satisfying results. A dedicated pin was used for the trigger information, so the traces did not have to be aligned afterwards.
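A bandpass-filtering step like the one described can be sketched with a simple FFT filter. The paper does not specify the filter implementation, so the brick-wall FFT approach, the demo signal, and the down-scaled sampling rate below are all our own assumptions.

```python
import numpy as np

def fft_bandpass(trace, fs, f_lo, f_hi):
    """Zero all spectral components outside [f_lo, f_hi] (brick-wall FFT filter)."""
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(trace))

fs = 250e6                                     # down-scaled demo sampling rate
t = np.arange(2500) / fs
in_band = np.sin(2 * np.pi * 20e6 * t)         # tone inside the 15-25 MHz pass band
trace = in_band + np.sin(2 * np.pi * 5e6 * t)  # plus an out-of-band disturber
filtered = fft_bandpass(trace, fs, 15e6, 25e6)
```

In a real evaluation one would typically prefer a proper filter design (e.g. a Butterworth bandpass) to avoid ringing, but the brick-wall version suffices to show the principle.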
The test scenario and the FPGA scenario produce similar results: successful attacks can be performed with low effort, as only 47 and 55 traces, respectively, are needed to reveal the value of the first key byte. However, neither of these attacks can be performed on a real RFID tag. The real-world scenario that we used for our measurements, in contrast, can be applied to real RFID tags as well. Using that scenario, we were able to perform successful DEMA attacks on the unprotected AES implementation with 246 traces; compared to the FPGA scenario, the effort increases by a factor of 4.5. This result enables chip designers to evaluate the security of other implementations using the same production process at an early design step: an FPGA implementation of the chip can be used to evaluate the resistance of the ASIC against SCA attacks. If there is a redesign of an existing ASIC (e.g., new SCA countermeasures are implemented), the presented approach can be used to estimate the security of the new ASIC from the results of the SCA attacks on the FPGA implementation. We also use the results achieved above in the following section to evaluate the security of the protected AES implementation.
Fig. 7. DEMA-attack result of the FPGA scenario with countermeasures disabled
Fig. 8. Filtered EM trace of the initial key addition and the first three rounds of AES
Table 1. Estimate of the required number of traces for a successful DPA attack with
enabled countermeasures
would lead to a recording time of 189 days (using the factor of 4.5 from above). This effort is rather high, so we tried to find a way to reduce the impact of the countermeasures. In many applications the number of encryptions is also limited to a specific value, so a DPA/DEMA attack can only succeed if the number of required traces is below this value.
The approach we used to reduce the impact of the countermeasures was to obtain some information about the random value that defines the number of dummy rounds inserted at the beginning. For that purpose we recorded a set of 100 traces containing the initial key addition and the first AES round; one trace of this set is plotted in Figure 8. Our observations showed that delay cycles are also inserted during the initial key addition. After some analysis of the traces we found a pattern during the initial key addition: when calculating the difference of two traces, peaks appear at different points in time depending on the random variable that defines the number of dummy rounds inserted at the beginning. For the set of 100 traces we calculated the difference for every single pair of traces and could observe three different cases, which are illustrated in Figure 9:
Fig. 9. The left plot shows the difference of two traces without significant peaks (first
case). The plot in the middle shows the difference of two traces with four peaks with
comparable amplitude (second case). The plot on the right side shows the difference
of two traces with one significant peak (third case). Traces recorded with the FPGA
scenario have been used to generate these plots.
– In the first case no significant peaks appear in the difference trace.
– In the second case four peaks with comparable amplitude can be identified in
the difference trace.
– In the third case again four peaks in the difference trace can be identified
but one of these four peaks has a significantly higher amplitude.
Based on these observations we made the following assumptions: If the difference
of two traces leads to the first case, the same random value was used
for the dummy-round countermeasure of both encryptions. If the difference of
two traces leads to the second case, different random values were used for the
dummy-round countermeasure of the two encryptions. Finally, if the difference
leads to the third case, a specific value was used for the countermeasure during
one of the two encryptions.
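The three-way classification of a trace pair can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the function name, the peak threshold, and the factor of 2 used to recognize a "dominant" peak are assumptions.

```python
import statistics

def classify_pair(trace_a, trace_b, peak_threshold=4.0):
    """Classify the difference of two traces into the three observed cases.

    Case 1: no significant peaks (same dummy-round count for both traces).
    Case 2: several peaks of comparable amplitude (different counts).
    Case 3: one peak dominates (a specific count in one of the traces).
    Threshold and dominance factor are illustrative assumptions.
    """
    diff = [abs(a - b) for a, b in zip(trace_a, trace_b)]
    peaks = [d for d in diff if d > peak_threshold]
    if not peaks:
        return 1
    if max(peaks) > 2 * statistics.median(peaks):
        return 3
    return 2
```

For example, two flat traces fall into case 1, while a pair whose difference shows one outstanding peak among four falls into case 3.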
In a first attack scenario we used the third case to filter out the traces with one
specific number of dummy rounds inserted at the beginning. First we recorded
a set of traces including the first 16 rounds (there are 25 rounds in total, 15
dummy rounds and ten real AES rounds). In a next step we created a new set of
traces containing only these traces where the specific number of dummy rounds
were inserted at the beginning. In order to visualize our approach to filter out
the traces we have plotted the difference matrix for 100 traces which can be seen
in Figure 10. This matrix contains the absolute maximum value of the difference
of the two traces corresponding to the row number and column number. It is
clearly visible that for some traces this value is higher (darker points) compared
to other traces. In order to build the reduced set of traces we have selected only
these traces corresponding to a row number with a high value (dark points). As
we assume a uniform distribution of the random value, the size of this new set
is about 1/16 of the size of the original set. On the reduced set we performed
a DEMA attack. In order to conduct the first attack scenario we recorded a
set of 320 000 traces. After filtering out the dummy rounds with the approach
presented above the size of the set reduced by a factor of 16 to 20 000 traces.
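The filtering step via the difference matrix can be sketched as follows. This is a hypothetical reconstruction: the function names and the use of the row median as the "dark row" criterion are assumptions, not the paper's implementation.

```python
import statistics

def difference_matrix(traces):
    """Maximum absolute pointwise difference for every pair of traces
    (the quantity visualized in Fig. 10)."""
    n = len(traces)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            m[i][j] = max(abs(a - b) for a, b in zip(traces[i], traces[j]))
    return m

def reduced_set(traces, threshold):
    """Keep only traces whose matrix row is mostly 'dark', i.e. whose
    difference against most other traces contains the dominant peak."""
    m = difference_matrix(traces)
    return [traces[i] for i, row in enumerate(m)
            if statistics.median(row) > threshold]
```

On synthetic data where a minority of traces carries an extra spike, only those spiked traces survive the filtering.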
The reduced set only contains traces with a specific number of dummy rounds
at the beginning followed by the first real AES round processing the attacked
intermediate value. On this reduced set we performed a DEMA attack and were
able to reveal the value of the first key byte. It turned out that 15 dummy rounds
Attacking an AES-Enabled NFC Tag 27
Fig. 10. Visualization of the difference matrix for 100 traces
Fig. 11. DEMA-attack result of the FPGA scenario with active countermeasures
are inserted at the beginning when the special pattern appears in the difference
traces. Figure 11 shows the result of this attack. Compared to the results in
Figure 5, Figure 6 and Figure 7 no single correlation peak can be identified.
This is because shuffling spreads the single peak over 16 different points in time.
With a bigger set of traces the 16 peaks in the correlation trace of the correct
key hypothesis could be identified better. The maximum correlation value of the
attack is 0.03931.
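The correlation-based key recovery used in these DEMA attacks can be illustrated with simulated traces. This sketch is not the authors' setup: it uses a deliberately simplified leakage model (Hamming weight of plaintext XOR key byte plus Gaussian noise, omitting the S-box), and all names and parameters are assumptions.

```python
import random

random.seed(1)
HW = [bin(x).count("1") for x in range(256)]  # Hamming-weight table
SECRET = 0x2B                                 # hypothetical key byte

# Simulated leakage: HW(p XOR key) plus measurement noise.
plaintexts = [random.randrange(256) for _ in range(1000)]
traces = [HW[p ^ SECRET] + random.gauss(0, 0.5) for p in plaintexts]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_key(plaintexts, traces):
    """Key hypothesis whose predicted leakage correlates best with traces."""
    return max(range(256),
               key=lambda k: abs(pearson([HW[p ^ k] for p in plaintexts],
                                         traces)))
```

With enough traces the correct hypothesis yields the highest correlation peak; countermeasures such as shuffling lower that peak, which is exactly why the reduced trace sets above are needed.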
In a second attack scenario we used the first case of our observations above
to split the recorded traces into 16 groups. As we had the ability to read out
the random value used for the randomization for every encryption by a small
change in the FPGA implementation we were able to verify the performance of
the clustering approach. All the traces in one group belong to encryptions where
the same random value for the dummy rounds was used. In order to perform the
clustering we used the K-means clustering function provided by MATLAB with
the squared Euclidean distance as distance measure. We also did a performance
analysis where we performed the group building for 100 to 500 traces. There
is a linear relationship between runtime of the group building algorithm and
the number of traces used. The proportion of correctly classified traces is between
96% and 98%. Building the groups takes about 0.25 s per trace. It has to
be mentioned that for an attack the group-building step has to be conducted
only for, e.g., the first 100 traces. The large remaining part of the traces can be
clustered by simply comparing them with the groups. We achieved similar results by
comparing with one single trace of each group and by comparing with the mean
trace of each group. Here we were able to decrease the time to group one trace
to 0.1s. The length of the traces used for the mentioned experiment was 250 000
samples. The runtime strongly depends on the length of the used traces.
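The fast assignment step, comparing each new trace with the mean trace of every group, can be sketched as follows (function names are illustrative; the paper's clustering itself was done with MATLAB's K-means):

```python
def group_means(groups):
    """Mean trace per group: the templates used for fast assignment."""
    means = []
    for g in groups:
        n = len(g)
        means.append([sum(samples) / n for samples in zip(*g)])
    return means

def assign(trace, templates):
    """Index of the template closest in squared Euclidean distance."""
    def sqdist(t, u):
        return sum((a - b) ** 2 for a, b in zip(t, u))
    return min(range(len(templates)), key=lambda i: sqdist(trace, templates[i]))
```

Comparing against one mean trace per group instead of re-running K-means is what reduced the per-trace grouping time from about 0.25 s to 0.1 s in the experiments above.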
With the clustering approach it is now possible to decrease the number of
required traces for a successful DEMA attack on the secret AES key. First of
all we recorded a set of 320 000 traces containing the initial key XOR and the
first three rounds. Next we applied the clustering algorithm to group the traces
28 T. Korak, T. Plos, and M. Hutter
into 16 groups. The clustering step for 320 000 traces takes about 9 hours on
a standard desktop computer. Every group contains on average 20 000 traces
as the random value defining the number of dummy rounds at the beginning
follows a uniform distribution. Now there are more possibilities to conduct the
attack. One way is to put the focus just on the first round and perform a DEMA
attack on each of the 16 groups separately. The result of the attack using one
specific group (the one where no dummy rounds are inserted at the beginning)
leads to a significantly higher correlation value for the correct key byte. The
shuffling countermeasure is still active but Table 1 shows that 20 000 traces are
sufficient to find the correct key value even in the presence of shuffling. A second
way is to combine the first and the second round and try out all different
combinations of two groups. That means picking the first round from group A
and the second round from group B and performing a DPA attack on this combination.
If group A is the group where no dummy rounds are inserted at the beginning
and group B is the group containing traces where one dummy round is inserted
at the beginning the DPA attack leads to a correct result. This approach leads
to a higher computational effort because there are 256 possible combinations.
The number of required traces decreases because only 10 000 traces are needed
in each group. So the total number of traces decreases to 160 000. The runtime
for the DEMA attacks increases to nearly 15 hours in that case. Furthermore
we estimated the complexity for the focus on three rounds and the combination
of three groups. As the number of possible combinations increases to 4 096 the
runtime for the DEMA attacks increases to nearly 6.5 days. The positive effect
is that the number of required traces decreases again. A summary of the upper
scenarios can be found in Table 2.
Table 2. The influence of the clustering approach on the number of traces needed for
a successful DPA attack as well as on the calculation time for the attack
Groups used | Comb. | Required traces per group | Required traces overall | Time for DPA attack on one group [s] | Total time [s]
1           |    16 | 20 000                    | 320 000                 | 400                                  |   6 400
2           |   256 | 10 000                    | 160 000                 | 200                                  |  51 200
3           | 4 096 |  6 666                    | 106 666                 | 133                                  | 544 768
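The combinatorial arithmetic behind Table 2 can be reproduced directly. The function name is illustrative; the per-attack runtimes are the measured values from the table.

```python
# Measured DPA runtime (seconds) for one group combination,
# for 1-, 2- and 3-round combinations (values from Table 2).
TIME_PER_ATTACK = {1: 400, 2: 200, 3: 133}

def attack_cost(groups_used, base_traces=320_000, n_groups=16):
    """Number of group combinations, traces needed per group, and total
    DPA runtime when combining `groups_used` rounds."""
    combinations = n_groups ** groups_used          # 16^k combinations to try
    per_group = base_traces // n_groups // groups_used
    total_time = combinations * TIME_PER_ATTACK[groups_used]
    return combinations, per_group, total_time
```

This makes the trade-off explicit: combining more rounds lowers the trace requirement per group but multiplies the number of DPA runs by a factor of 16 per added round.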
Table 3. Comparison of the number of needed traces and the duration for recording
the required amount of traces for the FPGA scenario and the real-world scenario. Also
the influence of the used preprocessing techniques is illustrated. With windowing the
impact of shuffling can be decreased. With our clustering approach the impact of the
dummy rounds can be decreased. The number in the brackets denotes the number of
used groups for the DPA attack.
second. With the knowledge that the attack complexity for the real-world sce-
nario increases by a factor of 4.5 the number of required traces to perform a
successful attack as well as the attack duration can be given.
As shown with the FPGA scenario, we could find two different ways to decrease the
attack complexity. We were able to reveal some information about the number of
dummy rounds inserted before the first real AES round. Furthermore we could
show that with our approach it is possible to scale down the number of required
traces by adding more computational effort afterwards. This can be an important
step if the number of encryptions is limited to a fixed value (e.g. 200 000).
5 Conclusion
In this work we presented DPA and DEMA attacks on the AES implementation
of a security-enabled NFC tag. For the attacks we used an FPGA-prototype
version as well as a manufactured ASIC chip. Three different measurement setups
were used: a real-world scenario, a test scenario, and an FPGA scenario. We
could show that the results of the attacks on the ASIC chip using the real-world
scenario are comparable with the attack results on the FPGA prototype. The
effort for the attack on the ASIC chip is 4.5 times higher compared to the attack
on the FPGA prototype. The attacks on the ASIC chip were performed using a
real-world scenario without a dedicated trigger pin or an external power supply
of the chip. The attacks on the FPGA prototype were performed under labo-
ratory conditions. The attacked AES implementation also has countermeasures
against SCA attacks integrated which are the insertion of dummy rounds and
shuffling. We were able to enable and disable the countermeasures and so we
found a pattern to mitigate the impact of the dummy-round countermeasure.
This pattern gave us the ability to group the recorded traces according to the
number of dummy rounds inserted before the first real AES round. As a consequence
the attack complexity decreased. Only some knowledge about the AES
implementation (the usage of the dummy-round countermeasure) was needed in
order to find this pattern, so the presented approach is a serious threat for
implementations with countermeasures against SCA attacks. We could show that
with the presented approach it is possible to decrease the number of needed
traces for a successful DPA attack. In our special case the number of traces
could be reduced from 320 000 to less than 110 000 traces. As a side-effect the
computational effort increases but within acceptable limits.
Mathias Wagner
1 Introduction
Recent literature surveys showed that over the past two decades in excess of
700 papers have been published on attacks (or countermeasures thereto) on
embedded devices and smart cards in particular. Most of these attacks fall into
one of three classes: (hardware) reverse engineering, fault attacks, and side–
channel attacks. Not included here are pure software attacks, which are likely
even more abundant. Each year another 50–100 papers are added to this stack,
and this is not yet accounting for exponential growth.
This poses a severe problem for the development and deployment of highly
secure hardware and software of embedded devices. Typically, a new embedded
chip family needs 2-3 years to develop, with derivatives perhaps being spun off
within 12-18 months. The development of secure operating systems for those
chips is not significantly faster.
Commercial software development can only start after development tools for
the embedded chip (simulator, emulator) have become available, so perhaps 1
year before the embedded chip itself is commercially available. Thus, adding
the development times of hardware and software, and not accounting for further
time delays due to certification (such as Common Criteria or EMVCo), one can
conclude that easily 3 years will have passed since the embedded hardware was
originally conceived. Or, in other words, another 150–300 attack papers
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 33–38, 2012.
© Springer-Verlag Berlin Heidelberg 2012
will have been published by then. And during the foreseeable lifetime of the
product of another 3–5 years, this stack will increase to 300–600 papers.
So, how sure can we be that these embedded devices will not be hacked
during their lifetime? Clearly, old design strategies in hardware and software
development that operate in a "responsive mode" will not work. With these
strategies, every time a new attack becomes known one typically finds a patch,
applies it to the product, and then moves on.
To make matters more complicated: Since smart cards are generally certified
according to Common Criteria at level EAL 4+ or higher [1], meaning that they
are resistant to attackers with "high attack potential", any valid attack on a
smart card — even if only under lab conditions in one of the certified evaluation
labs — will reset the clock and will require that the embedded operating
system and, depending on the attack, perhaps also the underlying hardware be
tested again according to the Common Criteria rules. This adds costs and time
delay to any project. In the worst case yet another new attack will be found
whilst a product is still being tested in the evaluation labs, and the designers
have to go back to square one immediately. This way a product launch may be
delayed indefinitely.
What we thus need is a new, structurally different way of design that is much
more proactive and much more amenable to the requirements of today's
ever-faster moving security landscape.
In Section 2, for the sake of clarity, a brief overview of the dominant classes
of attacks is given, followed in Sect. 3 by a discussion of possible new design
strategies. However, not all problems will be solvable in the design phase, and
thus risk management and certification (Sect. 4) also need to be reviewed in this
context.
2 Overview of Attacks
Basically, there exist four classes of attacks: Reverse engineering of the hardware,
fault attacks, side–channel attacks, and software attacks. On top of this there
are attacks that combine elements from these four fundamental classes.
An example of a reverse engineering attack was published at Blackhat [2,3].
In a nutshell, the aim here is to identify the standard cells used in the design
of the chip, understand the connectivity between these cells, and eventually
recover the functionality of the chip. A prime target with this approach is to
dump the memory content of a smart card, the so–called Linear Code Extraction
Attack. Substantial expertise is required to be successful with such an attack, but
publicly available tools like Degate [4] help to automate this process. Typically,
countering a successful attack in this class requires changes in the hardware of the
embedded chip, and a software fix is often not possible. Another characteristic
of this attack class is that it is very tedious, but the individual steps can be
automated and progress towards success can always be measured. This keeps
the motivation of hackers high.
700+ Attacks Published on Smart Cards 35
Fault attacks are much less invasive and are typically performed with high–end
laser equipment. The aim here is, e.g., to introduce faults either during code execu-
tion, or when reading data or code from the various memories. A famous example
is the Bellcore attack [5], where the introduction of a single fault at the right stage
of an RSA calculation based on the Chinese Remainder Theorem will reveal the
secret key used. The most economical way to address these attacks is with the right
mix of sufficiently resilient hardware and a robust embedded operating system
that can cope with "mistakes" made by the hardware. The Bellcore attack already
demonstrates a key aspect of these types of attacks: Often, it suffices to have only
a few successful hits in order to succeed with the attack. However, the embedded
software has a chance of detecting the attack, e.g., when doing redundancy checks
that go wrong, or by monitoring alarms sent from the underlying hardware. Some
fault attacks like safe–error attacks [6] are rather difficult to cope with, though,
since the exploit already consists in detecting an "unusual" response of the
embedded device to the attack — e.g., a reset forced by an attack is already
useful information.
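The Bellcore observation can be reproduced with toy numbers: if one CRT half of an RSA signature is faulted, gcd(s^e − m mod N, N) reveals a secret prime factor. The following sketch uses deliberately tiny illustrative parameters (real keys are 2048+ bits) and a single bit-flip as the fault model; all names are assumptions.

```python
from math import gcd

# Toy RSA parameters (illustrative only).
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def sign_crt(m, fault=False):
    """RSA signature via the Chinese Remainder Theorem. `fault=True`
    flips one bit of the mod-q half, modelling a single induced fault."""
    s_p = pow(m, d % (p - 1), p)
    s_q = pow(m, d % (q - 1), q)
    if fault:
        s_q ^= 1
    # CRT recombination of the two halves
    return (s_p * q * pow(q, -1, p) + s_q * p * pow(p, -1, q)) % N

m = 65
s_faulty = sign_crt(m, fault=True)
# The intact half still satisfies s^e == m (mod p), but not (mod q),
# so the gcd isolates the factor p.
recovered = gcd((pow(s_faulty, e, N) - m) % N, N)
```

A single successful fault thus suffices, which is why the text stresses that only "a few successful hits" are needed.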
On the other hand, side–channel attacks are not invasive at all, and hence
there is by definition no way that an embedded device can tell it is being at-
tacked. Side–channel attacks aim at exploiting subtle differences in the power
consumption or the electromagnetic emission of these devices, e.g., differences
that depend on the secret key used during an encryption. Generally, these differ-
ences are so subtle that the measurement needs to be repeated many times and
quite some statistical analysis may be required. There is an abundance of
literature available on this subject.
Pure software attacks like fuzzing are of a different nature and will not be
further considered in this paper.
3 Possible Strategies
Strategies that can cope with new attacks even after the design of the embedded
device is finished are not easy to come by. However, new design principles do
begin to emerge.
In the past, the prevailing strategy had been security by obscurity, meaning
that the designers hoped that by making their design complicated and by hid-
ing their security countermeasures, it would be hard for an attacker to unravel
all the mysteries. However, this often underestimates the capabilities and the
determination of attackers. And once the attacker has been successful, it tends
to cause some embarrassment to the manufacturer. Consequently, in the long
run, it is much smarter to change to a strategy of integral security by design,
where the security lies in the secret keys used, but not in design and implemen-
tation details. Ideally, with such an approach a potential attacker can be given
all design details and it will still not help him/her to launch a successful attack.
Clearly, this is an ambitious goal.
Generally speaking it is favorable to address the root of a problem rather
than the symptoms. For instance, it is certainly possible to use a constant–
current source in hardware to make sure the power consumption is constant
and independent of the CPU and of the coprocessor activity. This way a side–
channel attack on the power consumption will be made much harder. However,
it does not help at all for attacks based on electromagnetic emissions. It is much
better to deploy mathematical countermeasures such as blinding in the CPU and
coprocessor design. These countermeasures address the root cause and provide
protection independent of the particular side–channel used for the attack.
As to fault attacks, given that a single fault may already lead to a successful
attack, it is prudent for the embedded device to react very harshly to any fault
attack that it detects, particularly so when assuming that it will not detect
all faults to begin with. Ideally, it will shut down for good once a few attacks
have been detected. However, this requires that the embedded device can detect
attacks with high confidence and that there are no false positives. A false positive
would result in a very bad user experience "in the field" and in unacceptably high
numbers of returns of dead devices to the manufacturer. Experience shows that
simple analogue security sensors tend to produce too many false positives under
poor environmental operating conditions, and hence the trend is towards more
sophisticated and even digital "integral" security sensor concepts.
Some manufacturers deploy strategies where essentially two CPU cores exist
in the embedded device that perform the same calculation and then compare. If
the results differ, likely a fault attack has occurred. This strategy is very generic
and thus in principle capable of catching also future attacks of certain types.
On the other hand, there are obvious weaknesses of this approach. For one,
the coprocessors are usually not doubled, so these are not protected by this scheme.
Secondly, the two CPU cores access the same memory and hence attacks on
the memory, say, during the fetching of code, will still apply. And thirdly, the
module that compares the results of the two CPU cores can also be attacked.
Commercially, the disadvantage of this approach is the substantial increase in
chip size, power consumption, and likely a degradation of RF performance.
Other strategies involve generating redundancy in time rather than in space,
by performing computations more than once. Obviously, this decreases the max-
imum performance of the embedded device, which needs to be compensated for
by having very powerful cores to begin with. The advantage is that this integral
method is very flexible and is not carved in stone. It can be applied where needed
and is very efficient with sparse resources such as chip area and power consump-
tion. It is entirely possible to cover not only the CPU with such an approach,
but also all crypto coprocessors, as well as memory access. Furthermore, the
time axis allows for more ”tricks” to be applied, such as mathematical blinding
techniques that change over time.
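The time-redundancy idea can be captured in a minimal sketch: compute twice, compare, and alarm on mismatch. All names here are hypothetical, and a real device would compare at much finer granularity and randomize timing between the two runs.

```python
class FaultAlarm(Exception):
    """Raised when redundant computations disagree (possible fault attack)."""

def redundant_compute(fn, x):
    """Run `fn` twice in time redundancy and compare the results.
    Any mismatch is treated as an injected fault."""
    first = fn(x)
    second = fn(x)
    if first != second:
        raise FaultAlarm("redundant results differ - possible fault injection")
    return first

# Demo fault injector: corrupts the result of the second invocation only.
calls = {"n": 0}
def faulty_square(x):
    calls["n"] += 1
    if calls["n"] == 2:
        return (x * x) ^ 1  # single bit-flip fault
    return x * x
```

Running `redundant_compute(faulty_square, 7)` raises `FaultAlarm`, while the fault-free computation simply returns its result at half throughput, the performance cost the text mentions.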
Independent of these particular strategies, formal methods for proving cor-
rectness and robustness against attacks can provide a greater level of confidence
that indeed all known cases have been covered correctly.
More attention also needs to be paid to the family concept found for embedded
devices, both in hardware and in software. For instance, the attack
presented in [2,3] was made considerably easier due to the fact that an evolu-
tion of closely related smart cards existed across many technology nodes, where
the attacker could learn and train his/her skills. Moreover, some members of
this family had fewer security features than others and were targeting other
markets with possibly lower security requirements. Again, these weaker members of
the same family provided stepping stones towards the ultimate attack. In order
to reduce such collateral damage, product diversification is required to target
different markets. Diversification of countermeasures may also be in order.
On a system level, it is always a wise idea to make the targets as unattractive
for attacks as possible. At the end of the day, except for ethical hacking, it is the
commercial business case that a hacker will review. Is it financially rewarding to
perform an attack or not — even though it may be theoretically feasible? Thus,
a system designer should refrain from making a single smart card or embedded
device too valuable to attack — e.g., by putting the same global system key into
all smart cards of an eco system. However, it is rather hard to estimate how
much value a single modern high–end smart card is capable of protecting — it
will be in the range of a few hundred k$, but not in the millions.
5 Conclusion
The wealth of new attacks on embedded devices and new countermeasures
thereto that emerges every year requires new approaches to the design of se-
cure embedded devices. Manufacturers need to embrace a philosophy of integral
security by design which is capable of coping with such new attacks, and where
the design could in principle be opened up to public review for analysis without
providing any substantial benefit to a potential attacker. Security by obscurity
References
1. Common Criteria for Smart Cards, http://www.commoncriteriaportal.org/
2. Tarnovsky, C.: Hacking the Smartcard Chip. In: Blackhat Conference, February 2-3
(2010), http://www.blackhat.com/html/bh-dc-10/bh-dc-10-briefings.html
3. Nohl, K., Tarnovsky, C.: Reviving Smart Card Analysis. In: Blackhat Conference,
August 3-4 (2011),
http://www.blackhat.com/html/bh-us-11/bh-us-11-briefings.html
4. Schobert, M.: http://www.degate.org/
5. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Cryp-
tographic Protocols for Faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS,
vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
6. Loubet-Moundi, P., Olivier, F., Vigilant, D.: Static Fault Attack on Hardware DES
Registers, http://eprint.iacr.org/2011/531.pdf
An Interleaved EPE-Immune PA-DPL Structure
for Resisting Concentrated EM Side Channel Attacks
on FPGA Implementation
1 Introduction
Power consumption and ElectroMagnetic (EM) attacks have been the most studied
attack types since the Side Channel Attack (SCA) was introduced by Paul Kocher et
al. [1]. DPL (Dual-rail Pre-charge Logic) has been experimentally proved to be an
effective countermeasure against SCA, masking data-dependent power or EM
variations through the complementary behavior of the True (T) and False (F) rails.
In [2], the Early Propagation Effect (EPE), also called the Early Evaluation/Pre-charge
Effect, was studied for the first time, revealing a potential defect in conventional DPL
logic that can impact the complementary balance between the T and F rails. The difference
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 39–53, 2012.
40 W. He, E. de la Torre, and T. Riesgo
in arrival time for the inputs of complementary gates (or LUTs on FPGAs) can
generate unintentional data-dependent power or EM peaks. This is particularly
critical in FPGA implementations due to the rigid routing resources. In recent years,
several countermeasures for repairing the EPE problem were proposed, mainly
depending on the use of dual-rail compound gates with complementary signal pairs.
In this structure, the corresponding gates from dual rails are set side by side but
routings are done automatically by the router, which may lead to non-identical routing
paths between the complementary rails. A dual-core structure called PA-DPL
(Precharge-Absorbed Dual-rail Precharge Logic) is proposed in [3], which aims to
resist the EPE problem while keeping the routing identical for implementations on
Xilinx FPGAs with 6-input LUTs. However, the separate placement of the dual cores
makes it vulnerable to concentrated EM attacks.
In this paper, we present a row-crossed interleaved structure to minimize dual-rail
imbalances caused by non-identical routings. The main merit is that identical routing
for complementary net pairs can be maintained between both interleaved dual cores,
thereby increasing the resistance to concentrated EM attacks. We also mitigate the
rigid timing of [3] by extending the signal's duty cycle, which helps to increase the
maximum working frequency. The complete design flow and security tests against
attacks on the interleaved PA-DPL will be given.
The rest of the paper is organized as follows. Section 2 presents an introduction to
the EPE problem and briefly discusses related techniques. Section 3 details the
proposed interleaved PA-DPL structure with identical routing. The implementation
flow of this structure for a simplified AES co-processor is shown in Section 4. Section 5
describes the experimental attacks and net delay results. The conclusion and future
work are given in Section 6.
2 Related Work
Side channel analysis reveals confidential information by analyzing side channel
leakages at a low level, namely the physical level. Therefore, countermeasures at this
level typically have better security performance than, for example, arithmetic
protections. However, physical leakages can be affected by many factors. Any minor
asymmetry between the T and F rails can lead to a detectable unbalanced
compensation in a DPL structure, such as a compensation skew, a switching time
swing, or a glitch. Typically, routing length and process variation are considered to be
two significant factors which impact the compensation between the T and F rails [4].
Conventional DPL structures may be vulnerable due to EPE, which can potentially
occur whenever gates switch, either from the pre-charge to the evaluation phase or
from the evaluation to the pre-charge phase. In fact, the EPE problem not only
opens the possibility of attacks against the power/EM variations caused by
switching-related glitches or skewed matching, but also against the switching actions
themselves, by measuring their time variation. Generally, EPE has three main impacts
that can potentially be used to launch side channel analysis.
Fig. 1. DPL compound XOR gate; the inverter factor is allowed in this example
Unintentional Switch. Normally, DPL logic requires that each compound gate
have one and only one switching action in each clock cycle to ensure that it is
data-independent [18][19]. For a gate whose inputs have varying arrival times, an
unintentional switch may happen depending on the input combination. As shown in
Figure 2, the XOR gate of the compound XOR gate has different input arrival times:
when the input combination is AT:BT=1:1 in the evaluation phase, a short switching action
occurs, which is inevitably reflected in the power or EM leakage. Since the switch can
occur only for this input combination, the corresponding peak in the power or
EM trace is data-dependent.
Switching Time. EPE also covers problems concerning gate switching time. The switching
time attack was first introduced in [5]. In DPL, the switching edge of a gate with
different input arrival times swings depending on the input combination. In Figure 3,
early switching and late switching reveal the input combinations "1:0" and "0:1"
respectively. Therefore, the starting edge of the switching action of this gate is also
data-dependent.
Skewed Compensation. The two gates in each compound gate should switch
simultaneously so as to match each other precisely. Even if the arrival times of the
inputs of each gate of the compound gate are kept identical, the XOR and
XNOR gates cannot switch at the same time because the arrival times at the two
gates differ (1 unit for the XOR gate, 2 units for the XNOR gate, as shown in Figure 4).
The minor peak residue due to this skewed compensation still exposes the design to attacks.
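The unintentional-switch and switching-time effects above can be illustrated with a toy discrete-time model of a single true-rail XOR gate whose inputs arrive at different times after pre-charge. This is only a behavioural sketch; the delay values are hypothetical, chosen to reproduce the behaviour described in the text.

```python
def xor_trace(a, b, delay_a=1, delay_b=2, horizon=4):
    """Output of a pre-charged XOR gate sampled at discrete times.

    All signals are 0 during pre-charge; input A (B) takes its final
    value a (b) only from time delay_a (delay_b) onwards.
    """
    trace = []
    for t in range(horizon):
        at = a if t >= delay_a else 0
        bt = b if t >= delay_b else 0
        trace.append(at ^ bt)
    return trace

# 1:1 causes a transient 0->1->0 glitch (unintentional switch);
# 1:0 switches early, 0:1 switches late (switching-time swing).
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"{a}:{b} -> {xor_trace(a, b)}")
```

Only the 1:1 combination yields two edges in one evaluation phase, and the single edge of 1:0 versus 0:1 occurs at different instants, so both the glitch and the edge position are data-dependent.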
42 W. He, E. de la Torre, and T. Riesgo
Fig. 2. For this single XOR gate in the XOR compound gate, different input delays lead to a
data-dependent unintentional switching action
Fig. 3. Switching time swings depending on the input combination of each single gate of the
XOR compound gate
concerns. BCDL is presented in [9]; it synchronizes all the input pairs of a
compound gate using a Bundle block. Since it imposes no limitation on the gate type, better
optimization reduces the resource cost compared with previous structures. Another
structure named DPL-noEE [10], evolved from BCDL, embeds the synchronization
logic into the encoding of the LUT equations. Any potential intermediate transition is
eliminated by changing the code values to the value of the pre-charged invalid state. It
has the highest efficiency in resource usage; however, the starting edge of the
evaluation phase swings depending on the input combination. In [13], the authors
explored place and route techniques for SDDL logic which keep identical routing
for both rails in an interleaved placement, while the EPE problem is still not solved.
Since the pre-charge and synchronization logic are embedded into the LUT
equations, PA-DPL has high efficiency in hardware resource usage compared with
most other EPE-resistant solutions. Equations of up to 4 inputs are permitted in a 6-input
LUT without prohibiting inverters, which further optimizes resource usage.
3.1 PA-DPL
PA-DPL evolves from FPGA-implemented SDDL logic [11][12]. As mentioned in
[3], the pre-charge action logic is absorbed into the LUT function by inserting a global
pre-charge signal. The Ex signal works together with the Pre-charge signal to restrict the
evaluation and pre-charge phases to fixed portions of the clock cycle. Pre-charge and Ex
are produced with a stable phase shift. The resistance against EPE benefits from the
following two points [3]:
2. Early Evaluation Prevention. Since valid data needs to propagate from the source
registers to the capture registers, the Ex signal in PA-DPL confines the
evaluation phase to a restricted period of each clock cycle, so that the
evaluation phase starts only after the valid data of the slowest input has arrived at each
LUT. A register stores the propagated valid data in each evaluation phase and then
releases it to the first LUT of the next sequential stage in the next evaluation
phase. Thus the T and F registers always store complementary valid data.
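The early-evaluation prevention can be sketched with a toy timing model in which the LUT output is held at the pre-charge value until the Ex window opens, after the slowest input has arrived. The delay and window values are hypothetical, chosen only for illustration.

```python
def gated_xor_trace(a, b, delay_a=1, delay_b=2, ex_start=3, horizon=6):
    """Timing model of an Ex-gated XOR LUT.

    The output is held in pre-charge (0) until the Ex window opens at
    ex_start, which is chosen after the slowest input has stabilized.
    """
    trace = []
    for t in range(horizon):
        at = a if t >= delay_a else 0
        bt = b if t >= delay_b else 0
        trace.append((at ^ bt) if t >= ex_start else 0)
    return trace

# With evaluation gated until both inputs are stable, every input
# combination produces at most one switching edge, at the same instant:
for a, b in [(0, 1), (1, 0), (1, 1)]:
    print(f"{a}:{b} -> {gated_xor_trace(a, b)}")
```

The glitch for 1:1 disappears, and the 1:0 and 0:1 cases now switch at the same time, removing the data-dependent edge swing.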
Threats from Concentrated EM Analysis. The two cores in PA-DPL can be placed
close together to obtain optimal timing and security performance; as shown in Figure 7,
the two cores are placed at a distance of 1 CLB row, hereafter called 1 DU
(Distance Unit). However, the complementary LUTs and routings are still deployed at
relatively large distances, here 5 DU. If a narrow probe can be positioned
precisely over either one of the two cores, the voltage induced by the magnetic field
of a pair of data-dependent cells cannot be balanced.
Power-based attacks depend on the global power consumption of the whole design,
so in this context the location of the core has no crucial influence on the
compensation of the overall power consumption, and a separate architecture for the dual
cores is not a serious weakness against power-based side channel analysis. However,
manufacturing process variation matters when facing more sophisticated power or
EM attacks. In [22][23] the authors demonstrated that closer locations on a chip exhibit
less process variation. In order to mitigate fabrication process deviations, it is
therefore better to deploy two complementary cells or nets in closer locations.
Compared with ASIC implementations, FPGAs offer much less freedom to choose the
resources used in a design, especially the routing resources. Using the FPGA
Place and Route (PAR) tools, users cannot control the router, which follows a pre-
defined routing algorithm.
Switch matrices offer connection possibilities between the horizontal and
vertical local lines. Combined with some special interconnects, the router tool
automatically connects all logic cells according to the specific design. Generally,
switch boxes on the perimeter differ from those inside the fabric in the number of allowable
routes. Since identical routings require identical routing resources, the placement of
the duplicated part should preferably avoid the perimeter resources so as to
prevent routing problems in advance.
In an interleaved placement, routing conflicts can occur when duplicating the
routings of the T core at the location of the F core, since the routing resources for the
F core may already have been assigned to nets of the T core. This makes the
direct copy-and-paste technique of [14] challenging if the F part is overlapped or
interleaved in the same fabric location as the T part.
4 Implementation
A simplified AES co-processor is chosen as the test design for our row-interleaved
PA-DPL. We implement the SBOX with logic elements instead of RAM. Figure 9
illustrates the block diagram of this design. It contains only the XOR operation and
SBOX substitution blocks. The T and F cores share the same control and clock generation
blocks in order to save resources. The partition method used is similar to the
technique in [15]. In each clock cycle, 8 bits of plaintext generated from a
An Interleaved EPE-Immune PA-DPL Structure for Resisting Concentrated EM SCAs 47
Fig. 10. Dual cores share the same control and clock-generating logics
Manual Phase. This step includes two synthesis iterations, one constraint insertion
and one file conversion. First, Virtex-4 is chosen as the target device to synthesize the
HDL files of our design. We obtain an ngc file, which is a binary netlist file
with constraint information. The size of each Boolean function in this file is
constrained to a 4-input LUT, since Virtex-4 FPGAs are based on 4-input LUTs. Then,
Virtex-5 is used as the target device to synthesize the ngc file. We set the maximum
input number of the 6-input LUTs to 4 and disable the optimization strategy in the process
properties; we then obtain an ncd file in which all the 6-input LUTs have at most 4 used
inputs, i.e., at least 2 unused inputs per 6-input LUT. This is exactly
what is required, because in PA-DPL 2 inputs of each LUT are needed to
implement the pre-charge and synchronization logic. A ucf file is used in this
synthesis to limit the use of CLBs to certain parts so as to create an initially interleaved
placement. As shown in Figure 11, after the synthesis the ncd file is converted to its
XDL (Xilinx Design Language) version.
Automatic Phase. An XDL file is a readable version of an ncd file. It contains all the
information of the design in a regular format for all instances and nets. Thereby, all the
copy-and-paste work can be done by modifying the XDL content programmatically.
Here, we constructed a script named SCB (Security Converter Box) to
automatically and safely convert the single core into an interleaved dual-core module at
this low level of description. SCB is written in Java using regular expressions. It
adapts itself to different designs, since users only need to supply two parameters: the
location of the T part (the part to be protected) and the displacement parameter for the F
part (for the Type C placement of Figure 5, this parameter is vertical '+1', horizontal
'0'). SCB automatically executes all the modifications and produces a converted XDL
file. This phase performs the following steps:
– Tag nets and instances according to the location parameters
– Duplicate and move the instances of the T part to the location of the F part
– Insert Prch and Ex into the free inputs of the LUTs
– Adjust the LUT equations
– Arrange the over-block nets (delete and convert the nets)
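The duplicate-and-move step can be sketched with a regular expression over placement records. This is only an illustrative sketch: the record format and all names below are simplified and hypothetical, not the exact XDL grammar.

```python
import re

# Hypothetical, simplified XDL-like instance records (real XDL syntax
# differs in detail).
xdl = '''inst "t_lut1" "SLICEL", placed CLBLL_X10Y20 SLICE_X10Y20, ;
inst "t_lut2" "SLICEL", placed CLBLL_X10Y22 SLICE_X10Y22, ;'''

def duplicate_with_offset(text, dy=1, prefix="F_"):
    """Copy every T-part instance to the F location by shifting the row
    coordinate (Type C placement: vertical +1, horizontal 0)."""
    out = []
    for line in text.splitlines():
        m = re.match(r'inst "(\w+)" "(\w+)", placed '
                     r'(\w+?)_X(\d+)Y(\d+) (\w+?)_X(\d+)Y(\d+)', line)
        if m:
            name, kind, site, x1, y1, sl, x2, y2 = m.groups()
            out.append(f'inst "{prefix}{name}" "{kind}", placed '
                       f'{site}_X{x1}Y{int(y1) + dy} '
                       f'{sl}_X{x2}Y{int(y2) + dy}, ;')
    return "\n".join(out)

print(duplicate_with_offset(xdl))
```

Each duplicated instance is also tagged (here by the name prefix), which is what allows the later conflict-check phase to identify the F-part nets.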
Routing Conflict Check Phase. After the conversion step, a PA-DPL protected circuit
in the interleaved structure is obtained. It is then transformed back into an ncd file. At this
point, conflicts between the T and F routing lines may still exist in the design.
The design is therefore checked by a tool developed on top of RapidSmith [16][17]. This
tool transforms every net into an abstract representation in which every net is represented
as a node and Programmable Interconnect Points (PIPs) define the connections
between these nodes. Since the copied routing lines were tagged in the previous
phase, the tool checks all routing information of the F part by comparing the path
shapes (PIP information) between the T and F rails. If the same PIP is found twice, the F
routing passing through this PIP conflicts with another routing which passes through
the same PIP. The tool then deletes the conflicting section of the T routing, re-routes it and
duplicates it to generate a new F routing. Then the tool checks the PIPs of the new F
routing again. If there are conflicts again, the procedure is repeated until no conflicts
are found. Figure 12 illustrates the block diagram of this check-and-re-route flow.
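The conflict condition of this check can be sketched by reducing each net to the set of PIPs its path uses; the net names and PIP identifiers below are invented for illustration, and the re-route step performed by the real RapidSmith-based tool is omitted.

```python
def find_conflicts(f_nets, claimed_pips):
    """Return the names of the F nets whose duplicated path reuses a
    PIP already claimed by another net (the conflict condition of the
    check phase)."""
    return [name for name, pips in f_nets.items() if pips & claimed_pips]

# Toy example: each net is reduced to the set of PIP identifiers it uses.
t_nets = {"t_n1": {"p1", "p2"}, "t_n2": {"p3", "p4"}}
f_nets = {"f_n1": {"p2", "p5"},   # reuses p2 claimed by t_n1 -> conflict
          "f_n2": {"p6", "p7"}}   # no shared PIP -> the copy is valid

claimed = set().union(*t_nets.values())
print(find_conflicts(f_nets, claimed))   # ['f_n1']
```

In the real flow, each reported net would trigger a re-route of the corresponding T section followed by a fresh duplication, and the check would be repeated until this list comes back empty.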
Fig. 13. Dual-core (protected) with row-crossed interleaved placement. Complementary routing
pairs in the core part are identical in shape.
the same placement constraints as the interleaved one for the convenience of the
comparison. A self-made EM probe (a copper multi-turn antenna with 0.5 mm diameter
and 26 turns) is used to gather the EM radiation. The sampling rate is 667 MSa/s, using an
Agilent oscilloscope with segmented memory.
[Plots: correlation coefficient traces of the three implementations; the key is revealed with 60 traces (peak 0.487), 50,000 traces (peak 0.021), and 62,000 traces (peak 0.016), respectively.]
Fig. 14. Correlation coefficient curves of the concentrated EM attacks. The interleaved
placement shows an improved protection level compared with the separate placement.
net delay comparison of the complementary nets from an interleaved-placement,
route-uncontrolled dual-core AES-8. Group I is the result of the same module as Group II,
except that it uses the identical-routing method. It is obvious that in Group I, for most of
the nets, the net delay difference is 0 ns; only a few of them show a minor difference, of less
than 20 ps. Comparatively, in Group II, since the nets are routed automatically by the router,
almost all of the complementary routing pairs have distinct time delays. The minor delay
differences in Group I are caused by the tiny net adjustments made when the router connects
the new core (F core) to the peripheral control logic. The test results validate the
assumption that, even if the physical level is unknown, identical nets in the FPGA Editor
view obtain the same net time delays.
Table 1. Delay difference comparison between Group I (interleaved placement with identical
routing) and Group II (interleaved placement without identical routing) for a routing pair with
11 net sections
             net1    net2    net3    net4    net5    net6    net7    net8    net9    net10   net11
Group I
net_F        0.423ns 0.728ns 0.496ns 1.060ns 0.446ns 0.980ns 0.548ns 1.125ns 0.758ns 0.164ns 0.626ns
net_T        0.423ns 0.728ns 0.496ns 1.060ns 0.446ns 0.982ns 0.548ns 1.143ns 0.758ns 0.164ns 0.626ns
net_F-net_T  0.000   0.000   0.000   0.000   0.000   -0.002  0.000   -0.018  0.000   0.000   0.000
Group II
net_F        0.421ns 0.686ns 0.494ns 1.058ns 0.443ns 1.125ns 0.529ns 1.124ns 0.759ns 0.410ns 0.626ns
net_T        0.423ns 0.728ns 0.496ns 1.060ns 0.446ns 0.982ns 0.548ns 1.143ns 0.758ns 0.164ns 0.626ns
net_F-net_T  -0.002  -0.042  -0.002  -0.002  -0.003  0.143   -0.019  -0.019  0.001   0.246   0.000
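The difference rows of Table 1 can be reproduced directly from the tabulated net_F and net_T values; this small check only restates the table's own numbers.

```python
# net_F and net_T delays (ns) copied from Table 1; Group II shares the
# same net_T reference values as Group I.
g1_F = [0.423, 0.728, 0.496, 1.060, 0.446, 0.980, 0.548, 1.125, 0.758, 0.164, 0.626]
g1_T = [0.423, 0.728, 0.496, 1.060, 0.446, 0.982, 0.548, 1.143, 0.758, 0.164, 0.626]
g2_F = [0.421, 0.686, 0.494, 1.058, 0.443, 1.125, 0.529, 1.124, 0.759, 0.410, 0.626]

diff1 = [round(f - t, 3) for f, t in zip(g1_F, g1_T)]
diff2 = [round(f - t, 3) for f, t in zip(g2_F, g1_T)]

print(max(abs(d) for d in diff1))   # worst case with identical routing
print(max(abs(d) for d in diff2))   # worst case with free routing
```

With identical routing the worst-case mismatch is 18 ps, versus 246 ps when the router is left free, matching the text's "less than 20 ps" claim for Group I.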
Fig. 15. Bar diagram of the time delay differences. The comparison proves that with identical
routings, complementary net pairs have an extremely small spread of delay differences.
6 Conclusion
This paper deals with the routing problem that occurs when overlapping the
complementary parts of dual-core structures in DPL logic. In our proposal, we
developed a technique capable of checking and repairing unmatched routing
pairs. By following the routing conflict checking flow, identical routing can be kept
for the complementary parts even if the placement is closely interleaved.
Based upon the EPE-resistant PA-DPL, we demonstrated an improved variant that has a
row-crossed interleaved structure for the core part with routing consistency. This
brings the corresponding complementary instances and nets as close as one DU while
the time delays of the complementary nets are kept identical, which effectively strengthens
the resistance against concentrated EM attacks. Meanwhile, interleaved PA-DPL
places the dual rails closely in parallel. This helps to reduce the impact of process
variation, since neighboring areas of a silicon chip provably have more similar electrical
parasitic parameters than two areas far apart [22]. We also corrected the Ex
signal in PA-DPL to release the timing pressure caused by the compressed evaluation
phase. After this improvement, the signal duty cycle can be expanded to 41.7% when the
core works at a 3 MHz frequency. Timing verification validates that the
combination of the proposed techniques significantly reduces the time delay
differences in each complementary net pair. A size comparison is made by comparing
LUT cost. The interleaved PA-DPL AES-8 occupies 353 LUTs, an increase factor of
2.69 compared with the 131 LUTs of the unprotected one. The separate PA-DPL version
occupies 355 LUTs; this minor difference between the interleaved and separate versions is
due to the different placements used, which impact the synthesis and mapping results.
The cost increase factor varies depending on the proportion of the whole circuit that the
core part accounts for. The comparison attacks on the different implementations show that
row-crossed interleaved PA-DPL increases the resistance against concentrated EM
analysis by factors of 1033 and 1.24 with respect to the unprotected circuit and to the
PA-DPL protected circuit with separate placement, respectively.
As a next step, we will test the circuit with more sophisticated attacks in order to
perform a thorough security verification. Reducing the transient peak current is another
part of the future work.
References
1. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.) CRYPTO
1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
2. Suzuki, D., Saeki, M.: Security Evaluation of DPA Countermeasures Using Dual-Rail Pre-
charge Logic Style. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp.
255–269. Springer, Heidelberg (2006)
3. He, W., De La Torre, E., Riesgo, T.: A Precharge-Absorbed DPL Logic for Reducing
Early Propagation Effects on FPGA Implementations. In: 6th IEEE International
Conference on ReConFigurable Computing and FPGAs, Cancun (2011)
4. Guilley, S., Chaudhuri, S., Sauvage, L., Graba, T., Danger, J.-L., Hoogvorst, P., Vong,
V.-N., Nassar, M.: Place-and-Route Impact on the Security of DPL Designs in FPGAs. In:
HOST, pp. 29–35. IEEE Computer Society (2008)
5. Guilley, S., Chaudhuri, S., Sauvage, L., Graba, T., Danger, J.-L., Hoogvorst, P., Vong, V.-N.,
Nassar, M.: Shall we trust WDDL? In: Future of Trust in Computing, Berlin, vol. 2 (2008)
6. Chen, Z., Zhou, Y.: Dual-Rail Random Switching Logic: A Countermeasure to Reduce
Side Channel Leakage. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249,
pp. 242–254. Springer, Heidelberg (2006)
7. Popp, T., Kirschbaum, M., Zefferer, T., Mangard, S.: Evaluation of the Masked Logic
Style MDPL on a Prototype Chip. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007.
LNCS, vol. 4727, pp. 81–94. Springer, Heidelberg (2007)
8. Popp, T., Mangard, S.: Masked Dual-Rail Pre-charge Logic: DPA-Resistance Without
Routing Constraints. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp.
172–186. Springer, Heidelberg (2005)
9. Nassar, M., Bhasin, S., Danger, J.-L., Duc, G., Guilley, S.: BCDL: a High Speed Balanced
DPL for FPGA with Global Precharge and No Early Evaluation. In: Proc. Design,
Automation and Test in Europe, pp. 849–854. IEEE Computer Society, Dresden (2010)
10. Bhasin, S., Guilley, S., Flament, F., Selmane, N., Danger, J.-L.: Countering Early
Evaluation: an Approach towards Robust Dual-Rail Precharge Logic. In: WESS. ACM,
Arizona (2010)
11. Tiri, K., Verbauwhede, I.: A Logic Level Design Methodology for a Secure DPA Resistant
ASIC or FPGA Implementation. In: Proc. Design, Automation and Test in Europe, pp.
246–251. IEEE Computer Society (2004)
12. Velegalati, R., Kaps, J.-P.: DPA Resistance for Light-Weight Implementations of
Cryptographic Algorithms on FPGAs. In: International Conference on Field Programmable
Logic and Applications (FPL), pp. 385–390. IEEE (2009)
13. Velegalati, R., Kaps, J.-P.: Improving Security of SDDL Designs Through Interleaved
Placement on Xilinx FPGAs. In: 21st IEEE International Conference on Field
Programmable Logic and Applications, Crete, Greece (2011)
14. Yu, P., Schaumont, P.: Secure FPGA circuits using Controlled Placement and Routing.
In: 5th IEEE International Conference on Hardware/Software Codesign and System
Synthesis, pp. 45–50 (2007)
15. Kaps, J.-P., Velegalati, R.: DPA Resistant AES on FPGA using Partial DDL. In: IEEE
FCCM, Symposium on Field-Programmable Custom Computing Machines, pp. 273–280
(2010)
16. Lavin, C., Padilla, M., Lamprecht, J., Lundrigan, P., Nelson, B., Hutchings, B.:
RapidSmith: Do-It-Yourself CAD Tools for Xilinx FPGAs. In: 21st IEEE International
Conference on Field Programmable Logic and Applications, pp. 349–355 (2011)
17. Lavin, C., Padilla, M., Lamprecht, J., Lundrigan, P., Nelson, B., Hutchings, B.: HMFlow:
Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping. In: 18th IEEE
Symposium on Field-Programmable Custom Computing Machines, Salt Lake City, USA
(2011)
18. Kulikowski, K., Karpovsky, M., Taubin, A.: Power Attacks on Secure Hardware Based on
Early Propagation of Data. In: IOLTS, pp. 131–138. IEEE Computer Society (2006)
19. Suzuki, D., Saeki, M.: An Analysis of Leakage Factors for Dual-Rail Pre-charge Logic
style. IEICE, Transactions on Fundamentals of Electronics, Communications and
Computer Sciences E91-A(1), 184–192 (2008)
20. Soares, R., Calazans, N., Lomné, V., Maurine, P.: Evaluating the Robustness of Secure
Triple Track Logic through Prototyping. In: 21st Symposium on Integrated Circuits and
System Design, pp. 193–198. ACM, New York (2008)
21. Stine, B., Boning, D., Chung, J.: Analysis and Decomposition of Spatial Variation in
Integrated Circuit Processes and Devices. IEEE Trans. on Semiconductor Manufacturing,
24–41 (1997)
22. Sedcole, P., Cheung, P.: Within-die Delay Variability in 90nm FPGAs and Beyond. In:
Proc. IEEE International Conference on Field Programmable Technology (FPT 2006), pp.
97–104 (2006)
23. Maiti, A., Schaumont, P.: Improved Ring Oscillator PUF: An FPGA-friendly Secure
Primitive. J. Cryptology 24, 375–397 (2010)
An Architectural Countermeasure
against Power Analysis Attacks
for FSR-Based Stream Ciphers
Abstract. Feedback Shift Register (FSR) based stream ciphers are known
to be vulnerable to power analysis attacks due to their simple hardware
structure. In this paper, we propose a countermeasure against non-invasive
power analysis attacks based on switching activity masking. Our solution
has a 50% smaller power overhead on average compared to previous
standard cell-based countermeasures. Its resistance against different types
of attacks is evaluated on the example of the Grain-80 stream cipher.
1 Introduction
Feedback Shift Register (FSR) based stream ciphers target highly constrained
applications and have the smallest hardware footprint of all existing crypto-
graphic systems [1]. They are resistant against many types of cryptographic
attacks, including algebraic attacks, chosen-IV attacks, and time/memory/data
tradeoff attacks [2,3] but, due to their simple hardware structure, they are vulner-
able to side channel attacks [4]. One of the most dangerous side channel attacks
is power analysis, which breaks a cipher by exploiting the information content of
its power signature. Two popular types of power analysis attacks are Differential
Power Analysis (DPA) [5] and Mutual Information Analysis (MIA) [6].
Several countermeasures against power analysis attacks for block ciphers have
been developed [7]. Although these countermeasures can be applied to stream
ciphers as well, their overhead is often too high.
In this paper we propose a countermeasure against power analysis attacks for
FSR-based stream ciphers which masks the power trace of a cipher by altering the
switching activity of its FSRs. The proposed solution can be implemented using
standard digital cells only and is therefore well compatible with the standard
ASIC design flow. Compared to previous standard cell-based countermeasures [8]
for FSR-based stream ciphers, it consumes 50% less power and uses 19% less area
on average. We evaluate its resistance against DPA, MIA, and more complex
attacks on the example of Grain-80.
The remainder of the paper is organised as follows: In Section 2, related work
is summarised; Section 3 makes a preliminary analysis of FSRs and analyses their
dynamic power consumption; Section 4 describes our countermeasure; hardware
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 54–68, 2012.
c Springer-Verlag Berlin Heidelberg 2012
An Architectural Countermeasure against Power Analysis Attacks 55
2 Related Work
Power analysis attacks were first proposed in 1998 [5]. Several countermeasures
have been suggested to protect cryptographic systems against power analysis
attacks.
Analog countermeasures hide the correlation between data and power con-
sumption using an analog isolation circuit which keeps the current at a
defined level at all times [7,9]. Most of these countermeasures target other cryptographic
hardware such as block ciphers [7]. Although analog countermeasures can be effective on
FSR-based stream ciphers, most of them have high area and power overheads
which make them unsuitable for highly constrained environments. The only work
which focuses directly on designing an analog countermeasure for FSR-based
stream ciphers is [10].
Cell level countermeasures implement the cipher using dual rail logic gates
such as SABL [11], TDPL [12] or 2N-2Np [13]. Dual rail gates have low power
variations compared to standard cells, but higher area and power consumption
than standard digital cells. Moreover, these gates are normally not
included in standard cell libraries and must be designed at transistor level.
Architecture level countermeasures protect the cipher by hiding the depen-
dency between data and power consumption [8] or by masking the power trace,
i.e. by changing the quality of the power diagram so that it shows a completely
different pattern compared to the original power diagram [4,5]. To the
best of our knowledge, the only architecture level countermeasure specifically targeting
FSR-based stream ciphers is [8]. The authors suggest a new implementation of
FSRs in which the number of flip-flops is doubled and control logic is inserted so
that, for an n-bit FSR, n flip-flops toggle in every cycle (see Figure 5-right) and
the power diagram is ideally flat. The countermeasure can be implemented using
only standard digital cells but carries high overheads: even without considering
the overheads of the control circuits, the average flip-flop switching activity of
the system is doubled (the average flip-flop switching activity of an n-bit FSR
is n/2 [10]).
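The n/2 average can be checked empirically on a small maximal-length LFSR; the 16-bit register and its tap positions (16, 14, 13, 11) are a standard example chosen for illustration, not taken from the paper.

```python
def lfsr_switching(n=16, taps=(16, 14, 13, 11), cycles=10000):
    """Average number of flip-flop toggles per cycle for a Fibonacci
    LFSR; for a maximal-length register this converges to roughly n/2."""
    state = [1] * n                      # non-zero seed
    total = 0
    for _ in range(cycles):
        fb = 0
        for t in taps:                   # feedback = XOR of tapped bits
            fb ^= state[t - 1]
        nxt = [fb] + state[:-1]          # shift one position
        total += sum(a != b for a, b in zip(state, nxt))
        state = nxt
    return total / cycles

print(round(lfsr_switching(), 2))        # close to 8.0 for n = 16
```

Because adjacent bits of an m-sequence agree and disagree about equally often, roughly half of the n flip-flops toggle each cycle, so doubling this activity is indeed a substantial power penalty.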
Fig. 1. An example of a faulty 5-bit FSR with a fault injected on f2 during the initial
cycle
of the cipher. There is a high correlation between this energetic trace and the
switching activity SA of the state bits fi of the FSR(s), i.e. how many FSR
state bits toggle in one cycle [8]. The high correlation can be explained by the
following observations:
– Given the size of the FSRs in FSR-based stream ciphers (2 × 80 bits for Grain-80,
2 × 128 bits for Grain-128, 288 bits for Trivium), most of the power of the
ciphers is consumed by the FSR(s) themselves, with only a marginal
contribution from the combinational blocks [8,10].
– The energy consumption of every flip-flop in an FSR is highly dependent
on its output bit. Clock switching has a significant but constant power con-
sumption; if the output of a flip-flop toggles, its energy consumption is much
higher than when its output does not toggle. The energy consumed in a
0 → 1 or 1 → 0 transition is, to a first approximation, equal, and much higher
than the energy consumed in a 0 → 0 or 1 → 1 transition [5].
Since the energetic trace of an FSR-based stream cipher has a very high corre-
lation with the switching activity of the state bits of its FSR(s), to alter the
energetic trace we propose to change the switching activity pattern of all its
FSRs, i.e. to modify the FSRs so that they have the same output stream as the
original FSRs, but a different switching activity in every cycle.
If the output fi of a flip-flop is toggled before it is passed on in an FSR, a fault is
injected into the chain and propagates through it (see Figure 1). The fault alters the
output stream of the cipher if it reaches any of the outputs of the chain going to
the combinational blocks. If the fault is corrected before that, however, the output stream
of the cipher remains unaltered while the switching activity pattern (and thus
the power graph) is changed. We insert fault injection/correction mechanisms
between the flip-flops composing an FSR, in such a way that the output stream
[Plot and table residue: the switching activity per clock cycle of the protected cipher, and a state table showing that the altered bits change SA while the output stream remains the same.]
of the cipher remains unaltered but the switching pattern of the flip-flops is
modified. The protected and original ciphers are functionally indistinguishable
in a non-invasive attack, because their output streams are identical; however, their
power signatures are different.
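The fault injection/correction idea can be sketched behaviourally: selected stages are XORed with a signal s that toggles every cycle, and the XOR is undone wherever the bit is used. This sketch uses a plain 5-stage shift register rather than a full Galois FSR, and the choice of masked stages is arbitrary.

```python
import random

def run_fsr(stream, n=5, mask=0):
    """Shift `stream` through an n-stage register.

    Stages selected by `mask` store the bit XORed with a signal s that
    toggles every cycle; the XOR is undone on read-out, so the output
    stream is unchanged while the flip-flop toggle pattern differs.
    """
    state, s = 0, 0
    out, sa = [], []
    for b in stream:
        logical = state ^ (mask if s else 0)      # undo the stage mask
        out.append((logical >> (n - 1)) & 1)      # corrected output bit
        logical_next = ((logical << 1) | b) & ((1 << n) - 1)
        s ^= 1                                    # s toggles each cycle
        nxt = logical_next ^ (mask if s else 0)   # re-apply the mask
        sa.append(bin(state ^ nxt).count("1"))    # toggles this cycle
        state = nxt
    return out, sa

random.seed(0)
stream = [random.getrandbits(1) for _ in range(64)]
out0, sa0 = run_fsr(stream, mask=0b00000)
out1, sa1 = run_fsr(stream, mask=0b00110)   # stages 1 and 2 altered
print(out0 == out1, sa0 == sa1)
```

Because s toggles every cycle, the toggle indicator of each masked stage is inverted relative to the unmasked register, so the per-cycle switching activity changes even though every output bit is identical.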
Fig. 4. A, B: correlation and mutual information between SAO and SAP for 160-bit
FSRs after 10K cycles as the number of altered state bits varies. C, D: distribution of the
correlation and mutual information between unrelated switching activities after 10K
cycles.
no change with 50% probability, −1 with 25% probability) was added to SAP:
without this noise, if the number of altered state bits is even (odd), the parity
of SAO and SAP is necessarily equal (different), and the mutual information
is close to 1 bit (by knowing SAO we can deduce the parity of SAP and
vice versa). This has no practical effect on power analysis attacks, because every
power analysis attack uses energetic traces that are obtained through noisy
measurements.
For a = n/2 = 80, ρ(SAP, SAO) ≈ 0 and I(SAP, SAO) has a minimum.
Random switching activities SAX and SAY of unrelated FSRs have correlation
and mutual information distributed as in Figures 4-C and 4-D (obtained from 600
runs of 10K cycles). The average value of I(SAX, SAY) is ∼0.11, which is the same
as the mutual information between SAO and SAP corresponding to n/2 obtained
from Figure 4-B.
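The near-zero correlation at a = n/2 can be reproduced with a simplified statistical model in which each state bit toggles independently with probability 1/2 and altered bits contribute an inverted toggle to the protected trace; this is an approximation of the FSR dynamics, not the paper's exact experiment.

```python
import random
random.seed(1)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def sa_correlation(n=160, altered=80, cycles=10000):
    """Correlation between the original switching activity SAO and the
    protected SAP when `altered` bits have their toggle inverted."""
    sao, sap = [], []
    for _ in range(cycles):
        t = [random.getrandbits(1) for _ in range(n)]   # toggle flags
        sao.append(sum(t))
        sap.append(sum(t[altered:]) + (altered - sum(t[:altered])))
    return pearson(sao, sap)

print(round(sa_correlation(), 3))            # near 0 when altered = n/2
print(round(sa_correlation(altered=0), 3))   # 1.0: no alteration
```

With a = n/2, the positive contribution of the unaltered bits and the inverted contribution of the altered bits have equal variance, so their covariances with SAO cancel and ρ vanishes in expectation.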
As a first approach, we therefore insert the modification points so that exactly n/2
state bits are altered, which guarantees the lowest dependency between SAO and
SAP. The number of n-bit altered FSRs with n/2 altered state bits that are
equivalent to a given n-bit FSR is given by the binomial coefficient C(n, n/2),
which is very high for the typical sizes of FSRs used in FSR-based stream ciphers
(∼9.2 × 10^46 for the combination of the two 80-bit FSRs of Grain-80). Note that in
Subsection 7.5 we will also assess the security of the method when the mask is
randomly picked among all possible masks.
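The mask count quoted for Grain-80 follows directly from the binomial coefficient:

```python
from math import comb

# Number of distinct alteration masks with exactly n/2 altered state
# bits, for the 160 combined state bits of Grain-80's two FSRs.
n = 160
masks = comb(n, n // 2)
print(f"{masks:.3e}")   # on the order of 9.2e46, as quoted in the text
```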
5 Hardware Implementation
To simplify manufacturing, we suggest designing all FSRs with the same lay-
out, with all modification points already implemented, as shown in Figure 5.
The modification points are activated or de-activated based on the values of the
en signals, which can be programmed after the chip has been manufactured.
Alternative solutions are discussed in Subsections 7.4 and 7.5.
In Figure 5-left, each signal fi between f0 = in and f4 = fn is XORed with
the s or s̄ signal before it is passed on in the chain if its relevant eni signal is
set to one; otherwise, the modification point is inactive and the signal is passed
60 S.S. Mansouri and E. Dubrova
Fig. 5. Left: Schematic diagram of the protected FSR with our countermeasure (the
red feedback function indicates a Galois feedback). Right: Schematic diagram of the
countermeasure in [8].
[Figure 6: two alternative implementations, A and B, of the modification-point multiplexer, with en, in, f, s and out signals]
– G4 corresponds to n/2 XOR gates which are active only when the FSR is loaded with the initial state during the first cycle of operation. If a state bit into which the initial state is loaded is altered and has an even index, the bit is inverted before it is loaded into the corresponding flip-flop.
An extension of our countermeasure to support parallelized FSRs is possible but
is outside the scope of this paper.
The gates in G1 and G3, if implemented as in Figure 6-A, have a different power consumption depending on whether the corresponding en signal is at 0 or 1. This can reveal to an attacker the number of en signals at 1. The gates can instead be implemented as in Figure 6-B: since the en signal is an input of both of the multiplexer's AND gates, in each cycle one of these two gates is active independently of the value of en, and the power consumption of the gates does not depend on the values of the en signals.
The gates in G1, G2, G3 and G4 add area and power penalties to the FSR
architecture. However, many of these cells have constant input values (G2) or
are active only during initialization (G4), and do not consume dynamic power
during cipher operation. Gates in G3 consume dynamic power during operation
but their number is limited because they are only inserted on the state bits that
are used as inputs of a combinational function (in Grain-80, on 30 bits out of
160; in Trivium, on 6 bits out of 288).
6 Experimental Results
We designed three versions of each of the Grain-80, Grain-128 and Trivium ciphers in Verilog: the first unprotected, the second protected as suggested in this paper (a simplified implementation is shown in Figure 7), and the third protected using the
countermeasure suggested in [8] (the implementation follows Figure 5-right). We
compare the area and power overheads of our suggested countermeasure with the
countermeasure in [8] (see Figure 5-right) because, to the best of our knowledge,
it is the only standard-cell architecture-level countermeasure targeting FSR-based stream ciphers. All ciphers were synthesised targeting minimal area using Cadence RTL Compiler in UMC 90 nm ASIC technology. Power results, comprising dynamic and leakage power, were obtained from Cadence RTL Compiler back-annotated with gate-level switching activity, with a power supply of 1.2 V at a 1 MHz clock frequency and a set of random test vectors.
As shown in Table 1, the ciphers protected as suggested in this paper are on average ∼ 19% smaller than the ciphers protected as in [8], and consume on average half the power. This discrepancy between relatively low area benefits
[Figure 7: simplified implementation of the protected Grain-80, with the LFSR and NLFSR, their feedback functions, and the output function H producing the keystream Z]
Fig. 8. Comparison between the power (current) consumption and the FSR switching activity of the protected and unprotected Grain-80
and high power benefits is due to the fact that most of the gates that are inserted
in the FSR do not toggle during operation. Compared to the original cipher, the
power overhead of the countermeasure is on average ∼ 50% for all three ciphers.
The power (current) diagrams of a protected and an unprotected Grain-80 over 300 execution cycles are shown in Figure 8-left; the state-bit switching activities are shown in Figure 8-right.
7 Security
Table 1. Area and power comparison between the original (Org.) Grain-80, Grain-128
and Trivium, the same ciphers using our countermeasure (R.SA), and the countermea-
sure in [8]
then calculated for each of the modelled energetic traces. In a successful attack,
the key k giving the highest value for the distinguisher corresponds to the secret
key ks. Attacks on longer traces are more likely to succeed: the Measurements to Disclosure (MTD) is defined as the minimal number of samples in the energetic traces for which the correct key's distinguisher becomes higher than that of all wrong key guesses.
The attack is successful if: (1) the pool of guessed keys contains the secret key
(ks ∈ K) and (2) the highest value of the distinguisher is obtained for k = ks .
The first strength of our countermeasure is that it makes it hard for an attacker
to find a pool of guessed keys containing the secret key, because normally getting
a pool of guessed keys requires assumptions on the power model of the system.
We assume, however, that a pool of guessed keys containing the secret key is available, and we check whether the distinguisher can reveal the secret key during an attack.
We consider two first-order attacks: the DPA attack [5], which uses Pearson's correlation coefficient as the distinguisher (d = ρ(Ei, EMki)), and the MIA attack (or generic side-channel attack) [16], which uses mutual information as the distinguisher (d = I(Ei, EMki)).
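To make the two distinguishers concrete, here is a minimal Python sketch (our illustration, not the paper's code) that computes a Pearson-correlation distinguisher and a histogram-based mutual-information distinguisher between an observed trace and a modelled one; the trace values are synthetic:

```python
import numpy as np

def dpa_distinguisher(e: np.ndarray, e_model: np.ndarray) -> float:
    """DPA distinguisher: Pearson correlation between observed and modelled traces."""
    return float(np.corrcoef(e, e_model)[0, 1])

def mia_distinguisher(e: np.ndarray, e_model: np.ndarray, bins: int = 8) -> float:
    """MIA distinguisher: mutual information I(E; EM) estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(e, e_model, bins=bins)
    p_xy = joint / joint.sum()              # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of E
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of EM
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

rng = np.random.default_rng(0)
e_model = rng.normal(size=5000)             # modelled energetic trace
e = e_model + 0.5 * rng.normal(size=5000)   # related observed trace: both d values are high
print(dpa_distinguisher(e, e_model), mia_distinguisher(e, e_model))
```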
Fig. 9. Correlation coefficients for the 300 guessed keys on the protected (left) Grain-80
after 1M cycles and unprotected (right) Grain-80 after 1K cycles
Fig. 10. Left: MIA attack on protected Grain-80 after 1M cycles. Right: joint proba-
bility distribution of SAi and SAM ks i after 1M cycles.
the higher the degree of independence between EMkmi and Ei. Without any information, the attacker could only guess m randomly. For a 160-bit FSR, or the two 80-bit FSRs of Grain-80, the mask distance would have a probability distribution as in Figure 11.
The Gaussian distribution would be tighter if n were higher than 160 (as for Grain-128 and Trivium). Given the sizes of FSRs used in stream ciphers, therefore, in most cases Ei and EMkmi would be only weakly related.
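For a randomly guessed mask, each of the n mask bits independently matches the cipher's mask bit with probability 1/2, so the mask distance r follows a Binomial(n, 1/2) distribution. A minimal Python sketch (our illustration, not the paper's code) reproduces the concentration around n/2 = 80:

```python
import math

def mask_distance_prob(n: int, lo: int, hi: int) -> float:
    """P(lo <= r <= hi) for a mask distance r ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, r) for r in range(lo, hi + 1)) / 2 ** n

n = 160  # two combined 80-bit FSRs of Grain-80
print(mask_distance_prob(n, 70, 90))   # ~0.90: the 70 <= r <= 90 band
print(mask_distance_prob(n, 60, 100))  # ~0.99: the 60 <= r <= 100 band
```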
Estimating the MTD of DPA and MIA attacks is computationally intensive.
MTDs are expected to rise as the mask distance gets closer to n2 . To estimate
MTDs we conducted 5 DPA and MIA attacks using 100 guessed keys for random
masks with r = 80 ± 5, 5 for random masks with r = 80 ± 10 and 5 for random
masks with r = 80 ± 20. For each mask distance, we found a lower bound for
which none of the 5 attacks was successful.
We found that, for a MIA attack conducted under the same conditions as in Section 7.2, the cipher in 90% of the cases (70 ≤ r ≤ 90) will not break before 100K cycles. In 99% of the cases (60 ≤ r ≤ 100), the cipher will not break before 5K cycles. The results are shown in Figure 11-left. The low success rate of these types of attacks is due to the fact that the mutual information curve remains low as long as the mask distance lies between 60 and 100, as shown in Figure 4-left.
DPA attacks are more successful because the relation between the mask distance r and the correlation between the energetic curves is linear, as shown in Figure 4-B. We found that only in 62% of the cases (75 ≤ r ≤ 85) is the MTD higher than 100K.
An attacker could also attack the cipher by using several different models of
the secret cipher, each of them obtained by estimating the energy consumption
EMmki of the cipher if its mask was a specific mask m ∈ M , where M is a pool
of 2, 3 or more guessed masks. It would then be possible to attack the cipher by
using multivariate correlation and/or multivariate mutual information between
all energetic traces Ei, EMmki. If chosen randomly, each guessed mask will in general have a mask distance r from the mask used by the cipher under attack, distributed as in Figure 11. Discussion of these attacks, which are computationally more intensive, lies outside the scope of this paper.
Fig. 11. Estimated MTDs for DPA (right) and MIA (left) attacks based on the mask
distance of two random masks
Using fixed en signals programmed after the chip has been manufactured makes
the en signals vulnerable to imaging semi-invasive attacks [15, 17], attacks in
which the attacker uses advanced optical probing to observe chip placement and
routing, and laser scanning techniques to find the active doped areas on the
surface of the chip. One of the solutions that makes it more difficult to see under the top metal layer of the chip is planarising each predecessor layer before applying the next layer [17], filling blank spaces on metal layers with metal pads to block the optical path. It is also possible to prevent decapsulation of the IC by implementing light sensors on the chip which prevent a decapsulated chip from functioning [15].
As shown in Figure 12-right, it is possible to drive the en signals using an
SRAM Physical Unclonable Function (PUF) [18], which makes imaging attacks
ineffective. When the cipher is powered up, en signals boot to a state which is
different for every manufactured chip and depends on device mismatches between
the different cells. For the same chip, the en signals should take the same values
at every run; failure to do so will only add some randomization on some bits of
the mask. Discussion about these issues is outside the scope of this paper.
Fig. 12. Two solutions for driving the en signals: PRNG (left) and PUF (right)
Since the mask will be randomly picked among all possible values, any model made by the attacker by guessing a mask will in general have a mask distance r from the cipher's mask, with r distributed as in Figure 11. Any DPA or MIA attack would therefore have the same success rate as in Figure 11. However, since the
mask is changed in every run, the attacker has only a single chance to sample a
specific energetic trace before a new mask is loaded and the hardware structure
of the cipher is changed.
8 Conclusion
In this paper we introduced a standard cell architectural level countermeasure for
FSR-based stream ciphers. The proposed countermeasure alters the power trace
by masking the switching activity of the FSR. This differentiates our approach
from previously proposed ones, which instead flatten the power trace. The new concept allows us to save on average 50% power and 19% area compared to the countermeasure in [8].
The proposed countermeasure can be implemented using standard digital cells
only. Therefore, it is compatible with the standard ASIC design flow and easier to
implement compared to analog countermeasures and cell level countermeasures,
which require analog and transistor-level design.
We evaluated the security of our approach by performing DPA and MIA attacks on the protected version of the Grain-80 stream cipher. The results show that first-order DPA and MIA attacks cannot break Grain-80 before 1M cycles. If the attacker guesses a mask, or the mask is randomly picked among all possible values, the success rate of MIA and DPA attacks depends on which mask is picked. We performed a probabilistic analysis and estimated the success rates of MIA and DPA up to 100K cycles to be less than 10% and 40%, respectively. Better results are expected for ciphers using larger FSRs, such as Grain-128 and Trivium. As a solution for further decreasing the success rate, we propose to change the mask randomly at every run using a PRNG.
In future work, we plan to investigate the possibility of changing the mask or some of its bits dynamically during the operation of the cipher.
References
1. Robshaw, M.: The eSTREAM Project. In: Robshaw, M., Billet, O. (eds.) New
Stream Cipher Designs. LNCS, vol. 4986, pp. 1–6. Springer, Heidelberg (2008)
2. De Cannière, C., Preneel, B.: Trivium. In: Robshaw, M., Billet, O. (eds.) New
Stream Cipher Designs. LNCS, vol. 4986, pp. 244–266. Springer, Heidelberg (2008)
3. Hell, M., Johansson, T., Maximov, A., Meier, W.: The Grain Family of Stream
Ciphers. In: Robshaw, M., Billet, O. (eds.) New Stream Cipher Designs. LNCS,
vol. 4986, pp. 179–190. Springer, Heidelberg (2008)
4. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets
of Smart Cards. Springer-Verlag New York, Inc. (2007)
5. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
6. Batina, L., Gierlichs, B., Prouff, E., et al.: Mutual information analysis: Compre-
hensive study. J. Cryptol. 24, 269–291 (2011)
7. Tokunaga, C., Blaauw, D.: Secure AES engine with a local switched-capacitor
current equalizer. In: IEEE International Solid-State Circuits Conference - Digest
of Technical Papers, ISSCC 2009 (2009)
8. Burman, S., Mukhopadhyay, D., Veezhinathan, K.: LFSR Based Stream Ciphers
Are Vulnerable to Power Attacks. In: Srinathan, K., Rangan, C.P., Yung, M. (eds.)
INDOCRYPT 2007. LNCS, vol. 4859, pp. 384–392. Springer, Heidelberg (2007)
9. Ratanpal, G., Williams, R., Blalock, T.: An on-chip signal suppression counter-
measure to power analysis attacks. IEEE Transactions on Dependable and Secure
Computing, 179–189 (2004)
10. Mansouri, S.S., Dubrova, E.: A Countermeasure Against Power Analysis Attacks
for FSR-Based Stream Ciphers. In: ACM Great Lakes Symposium on VLSI, pp.
235–240 (2011)
11. Atani, S., Atani, R.E., Mirzakuchaki, S., et al.: On DPA-resistive implementation of FSR-based stream ciphers using SABL logic styles. International Journal of Computers, Communications & Control (2008)
12. Bucci, M., Giancane, L., Luzzi, R., Trifiletti, A.: Three-Phase Dual-Rail Pre-charge
Logic. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 232–241.
Springer, Heidelberg (2006)
13. Moradi, A., Khatir, M., Salmasizadeh, M., et al.: Charge recovery logic as a side
channel attack countermeasure. In: ISQED 2009 (2009)
14. Hell, M., Johansson, T., Maximov, A., et al.: A Stream Cipher Proposal: Grain-128.
In: 2006 IEEE International Symposium on Information Theory, pp. 1614–1618
(2006)
15. Skorobogatov, S.P.: Semi-invasive attacks – a new approach to hardware security
analysis. University of Cambridge, Computer Laboratory, Tech. Rep. UCAM-CL-
TR-630 (April 2005)
16. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual Information Analysis - A
Generic Side-Channel Distinguisher. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008.
LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008)
17. Anderson, R., Bond, M., et al.: Cryptographic processors-a survey. Proceedings of
the IEEE 94, 357–369 (2006)
18. Sadeghi, A.-R., Naccache, D.: Towards Hardware-Intrinsic Security: Foundations
and Practice, 1st edn. Springer-Verlag New York, Inc. (2010)
Conversion of Security Proofs from One Leakage
Model to Another: A New Issue
1 Introduction
1.1 Context
Side Channel Analysis (SCA for short) is a class of attacks that extracts in-
formation on sensitive values by analyzing a physical leakage during the exe-
cution of a cryptographic algorithm. They take advantage of the dependence
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 69–81, 2012.
© Springer-Verlag Berlin Heidelberg 2012
70 J.-S. Coron et al.
L = ϕ(Z) + B, (1)
The second model assumes that the device leaks on the memory transitions
when a value Z is manipulated. In this situation the function ϕ depends on Z
but also on a second value Y corresponding to the initial state of the memory
before the writing of Z. More precisely, we have:
L = ϕ(Z ⊕ Y ) + B. (2)
In the particular case where ϕ is the HW function, the leakage L defined in (2)
corresponds to the so-called Hamming distance (HD) model.
Several works have demonstrated the validity of HW and HD models in prac-
tice, which are today commonly accepted by the SCA community. However other
more precise models exist in the literature (see for instance [3, 9, 16]).
In the rest of this paper, we keep generality by considering two models: the ODL model (Only manipulated Data Leak) and the MTL model (Memory Transition Leak), each defined by the leakage function expressed in (1) and (2), respectively.
Apart from very rare exceptions (e.g. [10]), security proofs in the literature are usually conducted in the ODL model. This is in particular the case for the countermeasures proposed in [14]. However, in practice, the leakage is better modelled by the MTL model. Starting from this observation, a natural question is to decide to what extent a countermeasure proved to be secure in the ODL model stays secure in the MTL model. Closely related to this question, an interesting and practically relevant problem is the design of methods to transform an implementation secure in the first model into a new implementation secure in the second. Hence, if we
assume that the memory transitions leak information, the leakage is modeled by
ϕ(Y ⊕Z)+B. In such a model a masking countermeasure may become ineffective.
For instance, if Z corresponds to a masked variable X ⊕ M and if Y equals the
mask, then the leakage reveals information on X. A very straightforward idea to
deal with this issue is to erase the memory before each new writing (e.g. set Y
to 0 in our example). One may note that such a technique is often used in prac-
tice at either the hardware or software level. Using such a method, the leakage
ϕ(Y ⊕ Z) + B is replaced by the sequence of consecutive leakages ϕ(Y ⊕ 0) + B1
and ϕ(0 ⊕ Z) + B2 that is equivalent to ϕ(Y ) + B1 and ϕ(Z) + B2 . The single
difference with classical ODL model is the additional assumption that the exe-
cution leaks the content of the memory before the writings. Since this leakage
corresponds to a variable that has been manipulated prior to Z, it is reasonable to assume that the leakage ϕ(Y) + B1 has already been taken into account when establishing the security of the countermeasure. As a consequence, this way of implementing a countermeasure proved to be secure in the ODL model seems at first glance to also offer security on a device leaking in the MTL model.
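The flaw that motivates this discussion — the transition leakage ϕ(Y ⊕ Z) cancelling the mask when the register previously held the mask M and is overwritten with the masked value X ⊕ M — can be checked with a short Python sketch; this is our illustration under the HW/HD instantiations of ϕ, not code from the paper:

```python
def hw(v: int) -> int:
    """Hamming weight: the instantiation of phi in the HW/HD models."""
    return bin(v).count("1")

X = 0b10110001  # sensitive value
for M in range(256):            # any 8-bit mask
    Z = X ^ M                   # masked variable written into the register
    Y = M                       # register previously held the mask
    # ODL model: leakage phi(Z) = HW(X ^ M) varies with the mask.
    # MTL model: leakage phi(Y ^ Z) = HW(M ^ (X ^ M)) = HW(X) -- the mask cancels.
    assert hw(Y ^ Z) == hw(X)   # first-order leakage on X in the MTL model
print("MTL leakage equals HW(X) =", hw(X), "for every mask")
```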
In this paper, we emphasize that a countermeasure proved to be secure in
ODL model may no longer stay secure in MTL model. Indeed, we exhibit a
case where a countermeasure proved to be second-order resistant in ODL model
1. b ← rand(1)
2. for a = 0 to 2^n − 1 do
3.   cmp ← compareb(t1 ⊕ a, t2)
4.   Rcmp ← (F(x̃ ⊕ a) ⊕ s1) ⊕ s2
5. return Rb

Step 4, expanded at the register level:
4.1 RA ← x̃ ⊕ a
4.2 RA ← F(RA)
4.3 RA ← RA ⊕ s1        (4)
4.4 RA ← RA ⊕ s2
4.5 Rcmp ← RA
In the following we will show that the distribution of the value Y defined in
(5) brings information on the sensitive variable X. We will consider two cases
depending on whether RA equals Rcmp or not.
L ∼ ϕ(Y ⊕ X̃ ⊕ a) + B , (6)
where Y denotes the initial state of Rcmp before being updated with X̃ ⊕ a,
defined above by (5).
From (5) and (6), we deduce:
L = ϕ(X̃) + B                                      if a = 0,
    ϕ(X ⊕ 1) + B                                   if a = 1 and T1 ⊕ T2 = 0,
    ϕ(X) + B                                       if a > 0 and T1 ⊕ T2 = a,
    ϕ(F(X̃ ⊕ (a − 2)) ⊕ S1 ⊕ S2 ⊕ X̃ ⊕ a) + B       if a > 1 and T1 ⊕ T2 = a − 1,
    ϕ(F(X̃ ⊕ (a − 1)) ⊕ S1 ⊕ S2 ⊕ X̃ ⊕ a) + B       otherwise.
where μ denotes the expectation E[ϕ(U)] with U uniform over F_2^n (e.g. for ϕ = HW we have μ = n/2). And when a > 1, the mean of (L|X = x) satisfies:

E(L|X = x) = μ/2^n + (1/2^n) ϕ(Da⊕(a−2) F(x ⊕ (a − 2) ⊕ (a − 1)))
           + (1/2^n) Σ_{t=0, t≠a,(a−1)}^{2^n−1} ϕ(Da⊕(a−1) F(x ⊕ (a − 1) ⊕ t)).   (12)
From an algebraic point of view, the sums in (11) and (12) may be viewed as the
mean of the value taken by Da F (x ⊕ t) (respectively Da⊕(a−1) F (x ⊕ (a − 1) ⊕ t))
over the coset x⊕{t, t ∈ [2, 2n −1]} (respectively x⊕{t, t ∈ [0, 2n −1]\{a−1, a}}).
Since those cosets are not all equal, the means are likely to be different for some
values of x. Let us for instance consider the case of F equal to the AES s-box and
let us assume that ϕ is the identity function. In Relation (11), the sum equals
34066 if x = 1 and equals 34046 if x = 2. When a > 1, we make a similar observation.
From (11) and (12), we can deduce that the mean leakage reveals information
on X and thus, the set of observations can be used to perform a first-order SCA.
By exhibiting several attacks in this section, we have shown that a second-order countermeasure proved to be secure in the ODL model may be broken by a first-order attack in the MTL model. These attacks demonstrate that particular attention must be paid when implementing Algorithm 1 on a device leaking in the MTL model. Otherwise, first-order leakages such as those exploited in the attacks presented above may occur. As already mentioned in the introduction, a natural solution to help the security designer deal with those security traps could be to systematically erase the registers before any writing. This solution is presented and discussed in the next section.
5 Experimental Results
This section provides the practical evaluation of the attacks presented above. We
have verified the attacks on block ciphers with two different kinds of s-boxes:
an 8-bit to 8-bit s-box (AES) and two 4-bit to 4-bit s-boxes (PRESENT and
Klein). We have implemented Algorithm 1 as described in Section 4.1 on an 8-bit microcontroller. Using 2O-CPA, we were able to find the secret key for all three s-boxes. In the case of the 4 × 4 s-boxes, we needed fewer than 10,000 power traces to find the correct key. However, for the 8 × 8 s-box, the number was much higher: more than 150,000 traces were required to distinguish the correct key from the rest of the key guesses.
Initially, we set the value in the two memory locations R0 and R1 to zero. We
randomly generate the plaintexts mi and the input/output masks ti,1 , ti,2 and
si,1 , si,2 using a uniform pseudo-random number generator where the value of i
varies from 1 to N (i.e., the number of measurements). Then, we calculate x̃i
from the correct key k via x̃i = k ⊕ mi ⊕ ti,1 ⊕ ti,2 . As described in Section 4.1,
before writing a new value to any memory location, we first erase its contents
by writing 0, and then write the new value as shown in (13). For verifying the
attacks, we only consider the power traces where a = 1. We measure the power consumption of the device during the manipulation of x̃i and during the memory erasing, respectively. This results in a sample of pairs of leakage points that
are combined thanks to the centered product combining function defined in [11].
For each key hypothesis kj , the obtained combined leakage sample (Li )1≤i≤N is
Fig. 2. Convergence with practical implementation of 2O-CPA for Klein
Fig. 3. Convergence with practical implementation of 2O-CPA for PRESENT
Fig. 4. Convergence with practical implementation of 2O-CPA for AES
Fig. 5. Convergence with practical implementation of 1O-CPA for PRESENT
correlated with the sample of hypotheses (HW(mi ⊕ kj))1≤i≤N. The key guess for which the correlation coefficient is maximal is taken as the correct key.
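The steps above can be sketched in Python; this is our own toy reconstruction of a noiseless 2O-CPA with the centered-product combining function of [11], using the 4-bit PRESENT s-box and simulated Hamming-weight leakages (the trace generation is an assumption, not the paper's measurement setup):

```python
import numpy as np

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # PRESENT s-box

def hw(v):
    return bin(v).count("1")

rng = np.random.default_rng(1)
k_true, N = 0x9, 5000
m = rng.integers(0, 16, N)   # plaintext nibbles
r = rng.integers(0, 16, N)   # random masks
l1 = np.array([hw(x) for x in r])                                     # leak of the mask
l2 = np.array([hw(SBOX[mi ^ k_true] ^ ri) for mi, ri in zip(m, r)])   # masked s-box output

# Centered-product combining function of the two leakage points [11]:
combined = (l1 - l1.mean()) * (l2 - l2.mean())

# First-order CPA on the combined sample against HW(S(m ^ k)) hypotheses.
scores = [abs(np.corrcoef(combined, [hw(SBOX[mi ^ k]) for mi in m])[0, 1])
          for k in range(16)]
print("recovered key:", hex(int(np.argmax(scores))))  # the true key 0x9
```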
Figure 2 and Figure 3 show the correlation traces for a 2O-CPA on the Klein and PRESENT s-boxes, respectively. As can be observed, the right key is found in both cases with fewer than 10,000 power traces. Figure 4 shows the correlation traces for a 2O-CPA on the AES s-box. Here the convergence of the traces to the correct key is observable only after 150,000 traces. Finally, Figure 5 shows the first-order attack on the PRESENT s-box in the Hamming distance model as described in Section 3.2. Here we implemented Algorithm 1 directly, without the additional step of erasing the memory contents before performing a write operation. The power traces are collected for 50,000 inputs, and only the traces corresponding to the case a = 1 are considered. The correct key candidate can be identified with fewer than 10,000 traces.
In this paper, we have shown that particular attention must be paid when implementing a countermeasure proved to be secure in one model on devices leaking in another one. In particular, we have shown that the second-order countermeasure proposed in [14], together with a security proof in the ODL model, is broken by first-order SCA when running on a device leaking in the MTL model. Then, we have focused on a method that looked at first glance very natural for converting a scheme resistant in the ODL model into a new one secure in the MTL model. Our analysis pointed out flaws in the conversion method and hence led us to identify two new issues that we believe to be very promising for further research. The first issue is the design of a generic countermeasure proved to be secure in any practical model, and the second is the design of a method for porting security from one model to another.
References
1. Blömer, J., Guajardo, J., Krummel, V.: Provably Secure Masking of AES. In:
Handschuh, H., Hasan, M.A. (eds.) SAC 2004. LNCS, vol. 3357, pp. 69–83.
Springer, Heidelberg (2004)
2. Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards Sound Approaches to
Counteract Power-Analysis Attacks. In: Wiener, M. (ed.) CRYPTO 1999. LNCS,
vol. 1666, pp. 398–412. Springer, Heidelberg (1999)
3. Doget, J., Prouff, E., Rivain, M., Standaert, F.: Univariate side channel attacks and
leakage modeling. In: Schindler, W., Huss, S. (eds.) Second International Workshop
on Constructive Side-Channel Analysis and Secure Design – COSADE 2011 (2011)
4. Genelle, L., Prouff, E., Quisquater, M.: Thwarting Higher-Order Side Channel
Analysis with Additive and Multiplicative Maskings. In: Preneel, B., Takagi, T.
(eds.) CHES 2011. LNCS, vol. 6917, pp. 240–255. Springer, Heidelberg (2011)
5. Goubin, L., Patarin, J.: DES and Differential Power Analysis – The Duplication
Method. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 158–172.
Springer, Heidelberg (1999)
6. Messerges, T.S.: Securing the AES Finalists Against Power Analysis Attacks. In:
Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 150–164. Springer, Heidelberg
(2001)
7. Oswald, E., Mangard, S., Pramstaller, N.: Secure and Efficient Masking of AES –
A Mission Impossible? Cryptology ePrint Archive, Report 2004/134 (2004)
8. Oswald, E., Schramm, K.: An Efficient Masking Scheme for AES Software Imple-
mentations. In: Song, J., Kwon, T., Yung, M. (eds.) WISA 2005. LNCS, vol. 3786,
pp. 292–305. Springer, Heidelberg (2006)
9. Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Power and Electromagnetic Anal-
ysis: Improved Model, Consequences and Comparisons. Integration 40(1), 52–60
(2007)
10. Prouff, E., Rivain, M.: A Generic Method for Secure SBox Implementation. In:
Kim, S., Yung, M., Lee, H.-W. (eds.) WISA 2007. LNCS, vol. 4867, pp. 227–244.
Springer, Heidelberg (2008)
11. Prouff, E., Rivain, M., Bévan, R.: Statistical Analysis of Second Order Differential
Power Analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)
12. Prouff, E., Roche, T.: Higher-Order Glitches Free Implementation of the AES Using
Secure Multi-party Computation Protocols. In: Preneel, B., Takagi, T. (eds.) CHES
2011. LNCS, vol. 6917, pp. 63–78. Springer, Heidelberg (2011)
13. Rivain, M., Dottax, E., Prouff, E.: Block Ciphers Implementations Provably Secure
Against Second Order Side Channel Analysis. Cryptology ePrint Archive, Report
2008/021 (2008), http://eprint.iacr.org/
14. Rivain, M., Dottax, E., Prouff, E.: Block Ciphers Implementations Provably Secure
Against Second Order Side Channel Analysis. In: Nyberg, K. (ed.) FSE 2008.
LNCS, vol. 5086, pp. 127–143. Springer, Heidelberg (2008)
15. Rivain, M., Prouff, E.: Provably Secure Higher-Order Masking of AES. In: Man-
gard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427.
Springer, Heidelberg (2010)
16. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Chan-
nel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659,
pp. 30–46. Springer, Heidelberg (2005)
17. Schramm, K., Paar, C.: Higher Order Masking of the AES. In: Pointcheval, D.
(ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 208–225. Springer, Heidelberg (2006)
Attacking Exponent Blinding
in RSA without CRT
Sven Bauer
1 Introduction
Consider a cryptographic device, e.g. a smart card, that calculates RSA signa-
tures. The device needs to be secured against side-channel attacks. The blinding
of the secret exponent [7] is one standard countermeasure in this situation.
To sign a value x, the device generates a random number r and calculates the
signature as xd+rϕ(N ) mod N , where N is the RSA modulus and d is the secret
exponent. For each signature calculation, a fresh random number r is generated.
So an attacker who uses power analysis obtains exactly one power trace from which he has to extract d + rϕ(N). Extracting the exponent from a single trace is unrealistic for modern hardware, even if this hardware is not perfectly SPA resistant.
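The correctness of exponent blinding rests on Euler's theorem: since x^ϕ(N) ≡ 1 (mod N) for gcd(x, N) = 1, the blinded exponent yields the same signature for every r. A minimal Python sketch with textbook toy parameters (p = 61, q = 53, e = 17 — an illustrative choice, not values from the paper):

```python
# Toy RSA parameters (illustrative only; real moduli are 1024+ bits).
p, q = 61, 53
N = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = pow(e, -1, phi)            # 2753, the secret exponent

x = 5                          # value to sign, gcd(x, N) = 1
reference = pow(x, d, N)
for r in range(8):             # fresh blinding factor per signature
    blinded_exponent = d + r * phi
    assert pow(x, blinded_exponent, N) == reference  # same signature, different exponent
print("blinded exponentiation matches for all r")
```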
The attack by Schindler and Itoh [8] starts with power traces of several signing processes. The attacker obtains power traces corresponding to blinded exponents d + rjϕ(N), j = 0, . . . , n − 1. The idea of Schindler and Itoh is to look
for power traces with the same blinding factor or, more generally, sets of power
This work has been supported by the German Bundesministerium für Bildung und
Forschung as part of the project RESIST with Förderkennzeichen 01IS10027E. Re-
sponsibility for the content of this publication lies with the author.
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 82–88, 2012.
© Springer-Verlag Berlin Heidelberg 2012
traces whose blinding factors add up to the same sum. The number of power traces required to have enough of these "collisions" is given by the (generalised) birthday paradox. The number of "collisions" becomes too small, or the number of sums to evaluate too large, for larger blinding factors (64 bits, for example). The attacker is allowed to make a limited number of errors, i.e. to identify some collisions incorrectly.
Fouque et al. [4] use a completely different attack method. In their approach the attacker also observes a number of power traces with different blinding factors. However, they assume that the attacker knows a few bits of each blinded exponent with certainty. The attacker then uses an exhaustive search on the unknown bits of each rj. Redundancy in the key material is used to calculate an approximation to d + r̃jϕ(N), where r̃j is the attacker's guess for rj. The attacker discards guesses which do not match the known bits of d + rjϕ(N). Having thus obtained a number of blinding factors rj, the attacker guesses chunks of d and ϕ(N) from the least significant bit upwards, again discarding guesses that do not match known bits.
As pointed out in [8] the model in [4] is not very realistic. An SPA attacker
will always have noisy measurements. Single bits are never known with certainty.
For example, an SPA attacker who looks at a single power trace of a square-and-
multiply implementation can usually separate individual operations but can only
give a probability that any particular operation is a squaring or a multiplication.
In Sect. 2 we give a realistic model of the information an SPA attacker obtains
that captures this idea. In Sect. 3 we translate the attack of [4] into this setting.
We show how an attacker, given some information about each bit of a blinded
exponent d + rϕ(N ), can use redundancy in key material to correct observation
errors and obtain r. Repeating this several times, our attacker can then find a
number of blinding factors rj and then combine the information of several power
traces to determine d and ϕ(N ). The idea of correcting errors in noisy observations
by exploiting redundancy in key material was inspired by cold boot attacks (see
[5], [6]).
For the remainder of this paper we assume that an SPA attacker measures
power traces on a cryptographic device while the device calculates the expo-
nentiation in an RSA signature generation. The exponentiation is implemented
as a square-and-multiply algorithm and is protected by exponent blinding as
described above. It is of course the attacker’s goal to find the secret exponent d.
2 A Statistical Model
Given these assumptions, the security of the cryptographic device depends on the
indistinguishability of squarings and multiplications. In practice, however, this is
rarely perfect. Usually, an attacker will be able to say that a particular operation
is a squaring with some likelihood. Note that, even if the attacker guesses 98%
of the exponent bits correctly, correcting the 2% erroneous guesses at unknown
positions in a 1024 bit exponent is unrealistic with exhaustive search.
84 S. Bauer

We assume that for each (square or multiply) operation there is one point in
time for which the power consumption follows a normal distribution N (μS , σ) for a
squaring and N (μM , σ) for a multiplication. This model has been justified by
measuring power traces on a smart card. An example is shown in Fig. 1, which
shows the distribution of the current at a fixed point in time for 16500
squarings and multiplications. The samples were measured on a typical smart
card controller. The same code was used for both squarings and multiplications.
The larger |μS − μM |/σ, the easier it is for the attacker to distinguish squarings
from multiplications.

Fig. 1. Histogram of current measurements for square and multiply operations with
mean, standard deviation and sample size. Normal distributions with corresponding
means and standard deviations are also shown.
We suppose that the attacker knows μS , μM and σ. He can obtain these
values, for example, by studying the usually unprotected RSA signature verifi-
cation if the same square-and-multiply implementation is used for both signature
generation and verification.
Now the attacker can convert a power trace that captures m operations (each
a squaring or a multiplication) to a sequence of values cj , j = 0, . . . , m − 1,
where each cj gives the likelihood that the j-th operation is a squaring. Each cj
is drawn from a distribution with p.d.f.

    g(t) = e^(−(t−μS)²/(2σ²)) / ( e^(−(t−μS)²/(2σ²)) + e^(−(t−μM)²/(2σ²)) ),   t ∈ R.   (1)
Note that template matching (see [2]) gives information of the same kind: how
well a template for a squaring operation matches determines how likely the
attacker thinks this operation is a squaring.
This model is more subtle than the usual assumption that a certain fraction
of an attacker’s guesses is incorrect. Our model also captures that the attacker
knows which guesses are more likely to be correct than others.
Translations between our model and the usual assumption are easily done. For
a sample t from a power trace, an attacker will decide that the corresponding
operation is a squaring if |t − μS | < |t − μM |. The probability that the guess
is correct is Φ(|μS − μM |/(2σ)), where Φ is the c.d.f. of the standard normal
distribution. From a statistical table we see that if |μS − μM |/σ = 2 the attacker
guesses correctly about 84% of the time.
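The conversion of a single power sample into the likelihood cj of Eq. (1) can be sketched in Python. This is a toy illustration of the model, not the paper's measurement setup; the function and parameter names are ours.

```python
import math

def squaring_likelihood(t, mu_s, mu_m, sigma):
    """Likelihood c_j that the operation producing power sample t is a
    squaring, assuming samples follow N(mu_s, sigma) for squarings and
    N(mu_m, sigma) for multiplications with equal priors (Eq. (1))."""
    ps = math.exp(-((t - mu_s) ** 2) / (2 * sigma ** 2))
    pm = math.exp(-((t - mu_m) ** 2) / (2 * sigma ** 2))
    return ps / (ps + pm)
```

A sample exactly halfway between the two means gives cj = 0.5; samples closer to μS give values above 0.5, matching the nearest-mean decision rule described above.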
3 The Attack
As in [4] we assume that the public exponent e is small (i.e. about 16 bit).
This is no real restriction since in practice e = 65537 seems to be used almost
exclusively.
Like the attack in [4], our attack consists of three steps. In a first step, the
attacker looks at a single power trace and builds a list of likely candidates for
the random blinding factor r. In a second step, this list is narrowed down to just
one value, so the attacker finds r for this power trace. The attacker repeats this
for a number n of power traces, obtaining the corresponding blinding factors rj ,
j = 0, . . . n − 1. Finally, the attacker puts the information together to construct
d and ϕ(N ).
3.1 Step 1: Find a List of Likely Candidates for the Blinding Factor
The attacker has recorded a single power trace of an RSA signature generation
with blinded exponent d+rϕ(N ) and converted it to a sequence cj , j = 0, . . . , m−
1 as in Sect. 2. So cj is the likelihood that operation j is a squaring.
Assume the random blinding factor r has a fixed, known bit length. Note that the most
significant bits of d + rϕ(N ) depend only on r and the most significant bits
of ϕ(N ) (up to a carry that might spill over, but is not very likely to propagate
very far). It is a well known trick of the trade to approximate the high bits of
ϕ(N ) by N , see, for example [1], [4].
The attacker makes a guess r̃ for the i most significant bits of r, calculates the i
most significant bits of r̃N and derives the corresponding sequence v0 , . . . , vw−1
of squarings or multiplications, i.e. vj ∈ {S, M}. The attacker can judge the
quality of his guess by calculating
    Q1 (r̃) = Σ_{j=0}^{w−1} log qj ,   where qj = cj if vj = S and qj = 1 − cj if vj = M.   (2)
The higher Q1 (r̃), the more likely the guess r̃ is to be correct. This way, the
attacker can assemble a set of guesses for the most significant bits of r, discarding
those with a low value Q1 . Given this set, the attacker guesses additional lower
bits, again discarding those guesses which score too low under Q1 .
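The scoring in step 1 can be sketched as follows. The helper names are ours, and the candidate-set management (keeping a list of high-scoring prefixes and extending them bit by bit) is omitted; cj values are assumed to lie strictly between 0 and 1.

```python
import math

def sm_sequence(exponent):
    """Square-and-multiply operation sequence for an exponent, scanning bits
    from the most significant downwards: one 'S' per bit after the leading
    one, plus an 'M' for each 1 bit."""
    ops = []
    for b in bin(exponent)[3:]:     # skip the '0b' prefix and the leading 1
        ops.append('S')
        if b == '1':
            ops.append('M')
    return ops

def q_score(ops, c):
    """Q(r~) = sum_j log q_j with q_j = c_j if operation j is a squaring and
    q_j = 1 - c_j otherwise (Eq. (2)); c holds the observed likelihoods."""
    return sum(math.log(cj if op == 'S' else 1.0 - cj)
               for op, cj in zip(ops, c))
```

A guess whose predicted operation sequence agrees with the observations accumulates likelihoods close to 1 and therefore the largest (least negative) score.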
As a result of step 1, the attacker has a set C of guesses for r. The size of this
set is typically a few million. The next task is to find the one correct value for
r, which, he hopes, is contained in his set of guesses.
By definition of the secret exponent d there is an integer k, 0 < k < e such
that
ed − kϕ(N ) = 1. (3)
Approximating ϕ(N ) by N again, we obtain an approximation for d:

    d̃(k) = (1 + kN ) / e.   (4)
The attacker now runs through all possible values of k and all guesses r̃ ∈ C and
calculates the upper half of d̃(k) + r̃N . This he views as an exponent and writes
down the corresponding sequence v0 , . . . , vw−1 of squarings or multiplications,
i.e. vj ∈ {S, M}. In a way similar to Sect. 3.1 he can judge the likelihood of k
and r̃ being correct by calculating
and r̃ being correct by calculating
    Q2 (r̃, k) = Σ_{j=0}^{w−1} log qj ,   where qj = cj if vj = S and qj = 1 − cj if vj = M.   (5)
The attacker expects that the pair (r̃, k) with the highest value Q2 (r̃, k) is the
correct blinding factor with the correct value of k in (3).
Note that here e, and hence k, is required to be small for the exhaustive search
on k to be feasible.
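For a toy key the relationship between e, d, k and ϕ(N ), and the quality of the approximation d̃(k), can be checked directly. The numbers below are our own illustration, not from the paper.

```python
# Toy RSA key: p = 61, q = 53.
N, phi, e = 61 * 53, 60 * 52, 17

# The correct k in (0, e) is the unique one satisfying e*d - k*phi = 1
# (Eq. (3)), i.e. the unique k for which 1 + k*phi is divisible by e.
k = next(k for k in range(1, e) if (1 + k * phi) % e == 0)
d = (1 + k * phi) // e

# Eq. (4): approximating phi(N) by N yields d_tilde(k), which slightly
# overestimates d; for realistic key sizes the two agree on roughly the
# upper half of the bits, since N - phi(N) = p + q - 1 is only about
# half as long as N.
d_tilde = (1 + k * N) // e
```

Here k = 15, d = 2753 and d̃(15) = 2852, an overestimate by k(N − ϕ(N ))/e ≈ 99; the error only disturbs the lower-order bits, which is why the upper half of d̃(k) + r̃N can be scored against the trace.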
The attacker repeats steps 1 and 2 for a number n of power traces. Note that
the exhaustive search on k in step 2 is only necessary once, because k has the
same value for all power traces.
As a result, the attacker has a set of power traces of exponentiations with
exponent d + rj ϕ(N ), j = 0, . . . , n − 1 and knows all blinding factors rj . Recall
that he also knows the high bits of d (from (3)) and of ϕ(N ) (because ϕ(N ) can
be approximated by N ). For the attacker it remains to find the lower half of d.
To do this, the attacker guesses chunks of d and ϕ(N ) from the least signif-
icant bit upwards. For a guess d̃, ϕ̃ of the w least significant bits of d, ϕ(N ),
respectively, he calculates uj = d̃ + rj ϕ̃, j = 0, . . . , n − 1 and converts the w
least significant bits of the uj to a sequence of squarings and multiplications
vj,0 , . . . , vj,mj −1 , vj,i ∈ {S, M}. He then calculates

    Q3 (d̃, ϕ̃) = Σ_{j=0}^{n−1} Σ_{i=0}^{mj −1} log qj,i ,   where qj,i = cj,i if vj,i = S and qj,i = 1 − cj,i if vj,i = M.   (6)
As in step 1 the attacker has to keep a set of likely choices for lower bits of d
and ϕ(N ) while working his way upwards from the least significant bit to the
middle of d and ϕ(N ). When he has gone through all unknown lower bits of d
and ϕ(N ) he can then use the known high bits of d and ϕ(N ) to discard wrong
guesses.
The final test is, of course, the signing of a value.
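The congruence underlying step 3 — that the w least significant bits of each blinded exponent d + rj ϕ(N ) depend only on the w least significant bits of d and ϕ(N ) — can be checked with a small sketch (the function name and the toy numbers are ours):

```python
def low_bits_match(d_low, phi_low, r, blinded, w):
    """True iff the guess (d_low, phi_low) for the w least significant bits
    of d and phi(N) reproduces the w low bits of the blinded exponent
    d + r*phi(N); carries into higher positions are irrelevant mod 2**w."""
    mask = (1 << w) - 1
    return (d_low + r * phi_low) & mask == blinded & mask
```

In the attack, each surviving guess is the one whose reconstructed low bits are consistent (in the Q3 sense) with all n observed traces at once, which is what makes the error correction possible.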
4 Discussion
Step 1 is the most critical part of the attack. If the correct value of r does not
survive in the set of best choices then step 1 of the attack fails. Note that this
can be detected in step 2. A wrong value for r will have a very low score under
Q2 . So the attacker can simply discard these power traces.
We have implemented the attack on a PC. Simulation results suggest that if
r is 32 bit in size, Δ = |μS − μM |/σ = 1.8 and up to 2²⁰ candidate values for r
are kept in step 1, then the attack finds r and k in 85% of the cases and within
about a day of computation time on a modern PC. Once k is known, running
steps 1 and 2 for further power traces takes less than a minute. For Δ = 2.4 the
success rate increases to 99%. If Δ = 1.8 the attacker guesses 18.4% of square-
or-multiply operations incorrectly. The value Δ = 2.4 corresponds to an error
rate of 11.5%.
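These error rates follow from the decision rule of Sect. 2; a one-line check (using the standard normal c.d.f. expressed via math.erf):

```python
import math

def error_rate(delta):
    """Per-operation misclassification probability of the nearest-mean
    rule: 1 - Phi(delta / 2), where delta = |mu_S - mu_M| / sigma."""
    return 1.0 - 0.5 * (1.0 + math.erf(delta / 2.0 / math.sqrt(2.0)))
```

Evaluating this gives error_rate(1.8) ≈ 0.184 and error_rate(2.4) ≈ 0.115, matching the 18.4% and 11.5% figures above.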
If r is 64 bit in size and Δ = 2.4 the attack finds r and k in 50% of the cases
and within a day if 2²⁰ candidate values are kept in step 1. Note that a blinding
factor of this size is sufficient to protect against the attack in [8]. However, the
attack in [8] is more generic and also applies to RSA implementations based on
the Chinese Remainder Theorem or point multiplication on elliptic curves.
We have explained the attack in the context of a square-and-multiply im-
plementation. It is easily extended to m-ary or sliding window methods. The
attack can also be applied to square-and-always-multiply implementations if the
attacker can distinguish real multiplications from fake multiplications with suf-
ficiently high probability.
The input data for the attack is a string of probabilities for a particular oper-
ation to be a squaring. There are many ways to obtain this string of probabilities
from a power trace. For simplicity we suggested that this could be done directly
by choosing a particular point in time within an operation. The attacker can
also apply template matching (see [2]) or correlations within a power trace (see
[3] and [9]). The method by which the probabilities are derived from the power
trace has a significant influence on the size of Δ and hence the efficiency of the
attack. (The larger Δ the more efficient is the attack.)
The most obvious countermeasure is to increase the size of the blinding factor
r. Increasing r makes the blinded exponent d+rϕ(N ) longer and degrades perfor-
mance. It would be interesting to have a formula that expresses the time/memory
complexity of step 1 of the attack in terms of Δ and the size of r.
5 Conclusion
References
1. Boneh, D.: Twenty Years of Attacks on the RSA Cryptosystem. Notices of the
AMS 46, 203–213 (1999)
2. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç, Ç.K.,
Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003)
3. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Horizontal Cor-
relation Analysis on Exponentiation. Cryptology ePrint Archive, Report 2010/394
(2010), http://eprint.iacr.org/2010/394
4. Fouque, P.-A., Kunz-Jacques, S., Martinet, G., Muller, F., Valette, F.: Power Attack
on Small RSA Public Exponent. In: Goubin, L., Matsui, M. (eds.) CHES 2006.
LNCS, vol. 4249, pp. 339–353. Springer, Heidelberg (2006)
5. Halderman, J.A., Schoen, S.D., Heninger, N., Clarkson, W., Paul, W., Calandrino,
J.A., Feldman, A.J., Appelbaum, J., Felten, E.W.: Lest We Remember: Cold Boot
Attacks on Encryption Keys. In: 2008 USENIX Security Symposium (2008),
http://www.usenix.org/events/sec08/tech/full_papers/halderman/halderman.pdf
6. Heninger, N., Shacham, H.: Reconstructing RSA Private Keys from Random Key
Bits. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 1–17. Springer,
Heidelberg (2009)
7. Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and
Other Systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113.
Springer, Heidelberg (1996)
8. Schindler, W., Itoh, K.: Exponent Blinding Does Not Always Lift (Partial) SPA
Resistance to Higher-Level Security. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011.
LNCS, vol. 6715, pp. 73–90. Springer, Heidelberg (2011)
9. Walter, C.D.: Sliding Windows Succumbs to Big Mac Attack. In: Koç, Ç.K., Nac-
cache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 286–299. Springer,
Heidelberg (2001)
A New Scan Attack on RSA
in Presence of Industrial Countermeasures
1 Introduction
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 89–104, 2012.
© Springer-Verlag Berlin Heidelberg 2012
90 J. Da Rolt et al.
input is set to a chosen value, then the circuit is reset, followed by its execution in
normal mode for some cycles, and finally the circuit is switched to test mode and the
scan contents are shifted out. By repeating this multiple times with different chosen
plaintexts, the scan contents may be analyzed to find the secret key. In the case of
scan-attacks the basic requirement is that the cipher operation may be stopped at any
moment, and the contents of the intermediate registers can be scanned out, thus
compromising the hardware implementation of the cryptographic algorithm. A
common technique adopted by many smart-card providers is to disable the test
circuitry (such as JTAG) after manufacturing test. This solution may not be
acceptable for systems which require test and debug facilities in the field. High
quality test is only ensured by full controllability and observability of the secure
circuit, which may compromise security. Another alternative is BIST, which is
intrinsically more secure. However, not all circuits are suited for BIST (e.g.
microprocessors), and BIST provides just a pass/fail signature, which is not useful for
diagnosis. Many countermeasures have been proposed in the literature [3], [4];
however, each of them has its limitations and there is no foolproof mechanism to
deal with this leakage through the scan chains.
One of the attacks proposed in the literature concerns the RSA algorithm [5].
However, it supposes that the design has a single scan chain. Unfortunately, this
assumption is not realistic, since more complex DfT methods are required for meeting
the design requirements and reducing the test cost. Techniques such as multiple scan
chains, pattern decompression [6], response compaction [7] and filters to increase the
tolerance to unknowns [8] are commonly inserted in the test infrastructure. These
structures are supposed to behave as countermeasures against scan attacks, due to the
apparent reduction of the observability of internal states, as proposed in [9].
In this paper we propose a new attack on RSA that works even in the presence of
advanced DfT methods. We describe all the issues on carrying out the attack, and how
to overcome them. Additionally, we prove its feasibility by actually performing the
attack on an RSA design. Moreover, the attack may be applied without knowledge of
the DfT structures, which makes it more realistic.
The outline of the paper is as follows. In Section 2, we present the previous work
performed in the field of scan attacks on symmetric and public-key ciphers and some
proposed countermeasures. The RSA scan attack itself is described in Section 3. Then
in Section 4, we describe how we deal with the practical aspects of performing the
attack. The experimental results, containing a discussion about the applicability of the
scan attack in the presence of industrial DfT methods and known scan-attack
countermeasures, are presented in Section 5. A comparison with the previous RSA
scan attack is given in Section 6. Finally, we conclude the paper with plans for future
work in Section 7.
2 Previous Work
The first scan attack proposed in the literature [1] was conceived to break a Data
Encryption Standard (DES) cipher. Karri et al. described a two phase procedure which
consists of first finding the position of the intermediary registers in the scan chain, and
then retrieving the first-round DES key by applying only 3 chosen plaintexts. Later the
same authors proposed [2] an attack on the Advanced Encryption Standard (AES). This
one was based on the differential method, which analyses the differences of scan
contents instead of the direct value itself. By using this method, the preliminary step of
identifying the position of the intermediary registers is no longer required. Advances
were also made on proving that public-key implementations are susceptible to scan
attacks. RSA and Elliptic Curve Cryptography (ECC) keys are retrieved by methods
described in [5] and [10] respectively. Besides, some scan-attacks were also proposed
for stream ciphers, for example [11].
The binary exponentiation algorithm is used as the target for the RSA scan
attack in [5], while the Montgomery powering ladder is used for the ECC attack in
[10]. Both attack methods are based on observing the values of the intermediate
register of interest on the scan chain for each bit of the secret key (the decryption
exponent for RSA, and the scalar multiplier for ECC), and then correlating this value
with a previous offline calculation, which the authors refer to as a ‘discriminator’. If
the value matches this discriminator value, a corresponding decision is taken on the
key bit.
In order to secure the test structures, several countermeasures have been proposed.
They may be classified into three different groups: (1) methods that control access to
the test facilities through the use of secure test wrappers [12]; (2) methods that detect
unauthorized scan operations [13], such as probing and other invasive attacks; (3)
methods that provide confusion of the stream shifted out from the scan outputs [14].
Additionally, it was suggested in [6] that advanced industrial DfT methods such as
response compression are enough to impede any attack. However, advanced attacks
[15], [16] have been conceived to deal with those methods.
3.1 RSA
The Rivest-Shamir-Adleman (RSA) algorithm is a widely used public-key
cryptographic algorithm, employed in a wide range of security protocols for
encryption, digital signatures, and key exchange. A brief description of the RSA
algorithm is presented below:
When RSA is implemented in hardware, there are various possible options and
many algorithms are available. The Montgomery exponentiation method is most often
used, owing to its efficient hardware implementation, as it does away with the
expensive division operation required for the modular multiplications involved in an
exponentiation. Hence we choose the Montgomery method as the target for our scan-
chain attack.
The Montgomery product of two n-bit numbers A and B is denoted by

    A * B = A · B · R⁻¹ mod N,

where ‘·’ denotes modular multiplication, N is the modulus of the modular
multiplications, and R = 2^n, with n being the number of bits of the RSA
modulus. In this case study, we are using 1024-bit RSA.
The algorithm for a Montgomery Exponentiation used in RSA can be presented as
follows [17]:
Algorithm 3: Montgomery exponentiation
INPUT: modulus m = (ml−1 … m0)b, R = b^l, exponent e = (et … e0)2 with et = 1,
and an integer x, 1 ≤ x < m (l is the number of bits of m, 1024 in our case;
b is the base, which is 2 for binary).
OUTPUT: x^e mod m.
1. xtilde ← Mont(x, R² mod m), A ← R mod m. (R mod m and R² mod m may
   be provided as inputs.)
2. For i from t down to 0 do the following:
   (a) A ← Mont(A, A).
   (b) If ei = 1, then A ← Mont(A, xtilde).
3. A ← Mont(A, 1).
4. Return (A).
Mont(A, A) is known as the squaring (S) operation, while Mont(A, xtilde) is
known as the multiplication (M) operation for Montgomery exponentiation. The
square and multiply operations are actually modular multiplications implemented
using the Montgomery multiplication algorithm [17]. Each iteration of the loop within
the algorithm consists either of a squaring and a multiplication if the key bit is 1,
or only a squaring if the key bit is 0.
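Algorithm 3 can be sketched in Python. This is a toy model, not the paper's hardware: the Montgomery product is computed directly from its definition via a modular inverse R⁻¹ mod m, whereas hardware uses word-level reduction. The sketch also records the S/M operation sequence that a scan or power attacker would observe.

```python
def mont_exp(x, e_bits, m, n):
    """Montgomery exponentiation (Algorithm 3) computing x**e mod m, with
    e given as a bit string whose leading bit is 1, R = 2**n and m odd.
    Returns the result and the observable 'S'/'M' operation sequence."""
    R = 1 << n
    r_inv = pow(R, -1, m)                      # R^-1 mod m (Python 3.8+)
    mont = lambda a, b: (a * b * r_inv) % m    # Mont(a, b) = a*b*R^-1 mod m
    xt = mont(x, (R * R) % m)                  # xtilde = x*R mod m
    A, ops = R % m, []                         # A = R mod m
    for bit in e_bits:
        A = mont(A, A); ops.append('S')        # step 2(a): always square
        if bit == '1':
            A = mont(A, xt); ops.append('M')   # step 2(b): multiply if ei = 1
    return mont(A, 1), ops                     # step 3: convert back
```

For e = 10001₂ = 17 the recorded sequence is S M S S S S M: the exponent bits can be read off directly from the S/M pattern, which is exactly the leakage the attacks in this paper exploit.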
In our proposed scan-based attack, we are focusing on the intermediary register (A,
in the algorithm above) which stores the value after each Montgomery multiplication.
Irrespective of how the RSA modular exponentiation is implemented, the intermediate
value will always be stored in a register. For instance, we may have a
hardware/software co-design for the RSA crypto-processor, where the Montgomery
multiplier is implemented as a co-processor in hardware (for efficiency) and the
control logic or the algorithm for the Montgomery exponentiation implemented in
software on a microcontroller. In this case, the results of the intermediate
Montgomery operations may be stored in an external RAM, but this value needs to be
transferred and stored in the registers inside the Montgomery multiplier datapath to
allow the module to perform the computations correctly.
The leakage analysis as well as the attack methods implemented by this tool rely on
some assumptions:
─ the cipher algorithm is known as well as the timing diagrams. The designer in
charge of checking scan attack immunity should have this information;
─ the scan chain structure is not known by the attacker: the scan length, the
number of internal chains, and the order of the scan flip-flops are supposed to
be hidden, although the input/output test pins (the interface) are controllable;
─ it is possible to control the test enable pin and then switch from mission mode to
test mode, which allows the cipher operation to be “stopped” at any moment;
─ it is possible to control the input plaintexts (e.g. a design primary input) and to
observe the values related to the intermediate states by means of scan out.
It is important to notice that all these assumptions are shared among all the scan
attacks proposed in the literature. Additionally, these assumptions are fulfilled by
the majority of test scenarios, due to the fact that high testability is achieved by
controlling and observing a huge number of internal design nodes.
One of the main advantages of the attack proposed in our paper over the previous
RSA attacks is the fact that it works in the presence of industrial DfT structures. For
that purpose, the differential mode [2], [16] is used to deal with the linear response
compactors which are inserted by the majority of DfT tools. Without compaction, the
values stored in the SFFs are directly observable at the test output while they are
shifted out. In the presence of compaction, on the other hand, each bit at the test
output depends on multiple SFFs. In the case of parity compactors, each output bit is
the XOR of the scan flip-flops on the same “slice”. This means that the
actual value stored in one SFF is not directly observable. Instead, if it differs from the
value expected, the parity of the whole slice also differs, and so faults may be
detected. This difference may also be exploited by an attacker.
Fig. 1.a shows a crypto block, its cipher plaintext, and the intermediate register
which is usually the target of the scan attack. The rest of the circuit is omitted for
didactic reasons. The differential mode consists of applying pairs of plaintexts, in this
example denoted by (M0, M1). The circuit is first reset and the message M0 is loaded.
Then after N clock cycles the circuit is halted and the intermediate register I0 is
shifted out. The same procedure is repeated for the message M1, for which I1 is
obtained. Let us suppose that I0 differs from I1 in 6 bit positions, as shown in 1.a,
where a bit flip is represented by a darker box. Let us also suppose that the
intermediate register contains only 16 bits and that bits 0, 8, 10, 13, 14, and 15 are
flipping. The parity of the differences is equal to 0, since there is an even number of
bit flips.
In Fig. 1.b, the flip-flops of the intermediary register are inserted into an example
DfT scenario with response compaction. In this case there are four scan chains divided
into four slices. RX represents the test output corresponding to slice X. As may be
seen, if only bit 0 flips in the first slice (an odd number), this difference is reflected
into a flip of R1. In slice 2, no bits flip and thus R2 remains the same. Two flips occur in
slice 3: bits 8 and 10. In this case, both flips mask each other, so 2 flips (even) result in 0
flips at the output R3. In slice 4, 3 bit flips are sensed as a bit flip in R4.
The parity of flips in the intermediate register is equal to the parity of flips at the
output of the response compactor. This comes from a basic property of this kind of
response compactor: the parity of differences measured at the test output is equal to
the parity of differences in the intermediate register.
This property is valid for any possible configuration of the scan chains (number of
scans versus slices). Additionally, it is also valid for compactors with multiple outputs;
in this case, the difference measured should consider all compactor outputs. Thus,
using the differential mode, the attacker observes differences in the intermediate
register and then retrieves the secret key. Complex scenarios with other FFs of the
circuit are shown in Section 4.
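The parity-preservation property can be replayed on the 16-bit example. This is a toy model of an XOR slice compactor; the slice layout (four consecutive SFFs per slice) is our assumption.

```python
from functools import reduce
from operator import xor

def compact(bits, slices):
    # XOR response compactor: one output bit per slice of scan flip-flops.
    return [reduce(xor, (bits[i] for i in s)) for s in slices]

# 16-bit intermediate register split into 4 slices of 4 SFFs, as in Fig. 1.b.
slices = [list(range(i, i + 4)) for i in range(0, 16, 4)]
I0 = [0] * 16
I1 = I0.copy()
for i in (0, 8, 10, 13, 14, 15):     # the 6 flipped positions of the example
    I1[i] ^= 1

out_diff = [a ^ b for a, b in zip(compact(I0, slices), compact(I1, slices))]
# Slice 1: one flip -> visible at R1; slice 3: two flips mask each other;
# slice 4: three flips -> visible. The (even) parity is preserved.
assert out_diff == [1, 0, 0, 1]
assert sum(out_diff) % 2 == sum(a ^ b for a, b in zip(I0, I1)) % 2 == 0
```

The second assertion is the property the attack relies on: the parity of differences at the compactor output equals the parity of differences in the register, whatever the slice arrangement.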
hypothesis 0 is correct and the secret key bit is 0. If it is equal to P1, then the secret
key bit is 1. This procedure is repeated for all the bits of the decryption key.
Performing scan attacks on actual designs requires additional procedures which have
not been taken into consideration by some previous attacks proposed in the literature.
The two main practical issues consist of (1) dealing with the other flip-flops of the
design; (2) finding out the exact time to halt the mission mode execution and to shift
out the internal contents. The first issue is solved by analyzing the leakage of the FFs
of the intermediate register at the test output (described in sub-section 4.1). The
second issue is described in sub-section 4.2.
The scenario of Fig. 1 is commonly assumed by scan attacks; however, in real
designs other FFs will be included in the scan chain. These additional FFs may
complicate the attack if no workaround is taken into account.
We define three types of scan flip-flops (SFFs), depending on the value they
store, as shown in Fig. 2.a. T1 SFFs correspond to the other IPs in the design,
which store data not dependent on the secret. T2 SFFs belong to the registers
directly related to the intermediate register, which store information related to the
secret key and are usually targeted by attackers (e.g. the AES round register). T3
SFFs store data related to the cipher but not to the intermediate registers
themselves (such as input/output buffers or other cipher registers). The leakage, if
it exists, concerns the T2 type.
The goal of the leakage analysis is to find out whether a particular bit of the
intermediate register (T2) can be observed at the test output, and to locate which
output bit is related to it. Thus the analysis focuses on one bit at a time, looking
for an eventual bit flip in T2. Denote by T2N the value stored in T2 after N clock
cycles while the design is running in mission mode from the plaintext M0 (the
first event in mission mode is a reset). The pair (M0, M1) is chosen so that the
value of T2N for M0 differs by a single bit from the value of T2N for M1. In
Fig. 2.a the darker blocks represent a bit that flips; in this case, the least
significant bit of T2N flips. Since the attack tries to verify whether it is possible
to observe a flip in the LSB of T2N, it is ideal that there is no flip in T1N. To
reduce the effect of the T1 flip-flops, all the inputs that are not related to the
cipher plaintext are kept constant. This means that T1N for M0 has the same
value as T1N for M1. However, the same method cannot be applied to reduce the
effects of T3. Since we suppose that the logic associated with T3 is unknown and
since its inputs are changing, the value of T3N for M0 may differ from that for
M1. In our example, suppose that exactly three bits of T3 flip.
Fig. 3. a. Design illustrating the categories of FFs. b. DfT scheme.
Figure 2.b shows the result of these bit flips in the scan chain and consequently in
the test outputs. For didactic reasons, we suppose that the DfT insertion created 4 scan
chains, and placed a pattern decompressor at the input and a response compressor
with two outputs (R and L). As may be seen, slice 1 contains only T1 scan flip-
flops, meaning that after the response compressor, the values of R1 and L1 are not
supposed to flip (because T1N has the same value for M0 and M1). For slice 2, the
same happens. Slice 3 contains the only flipping bit of T2N, and the other flip-flops in
the slice do not change. In this case, the bit flip of the first bit of T2N is observable
in R3. It means that an attacker could exploit the information contained in R3 to find
the secret key. Hence, this is considered a security leakage and may be exploited by
the attack described in Section 3.
Slice 4 and slice 5 contain flip-flop configurations that may complicate an attack.
For instance, in slice 4 there are FFs of T1 and T2 that are not affected by a change
from M0 to M1, but the slice contains one affected FF of T3. This implies that the L4
value flips, which may confuse the attacker (he expects a single bit flip caused by the
LSB of T2). In this case, the attacker is able to identify that the value of L4 is
dependent on the plaintext, but is not able to exploit this information, since the T3-
related logic is supposed to be unknown. Another complication is shown in the
configuration of slice 5. If the LSB of T2 is actually in the same slice as a flipping
SFF of T3, the flip is masked and no change is observed in L5. In this case, the
attacker is not able to exploit that bit.
Next, the attacker repeats this method for each bit of the intermediary register (e.g.,
1024 times for RSA). If he detects some useful leakage (like R3), he proceeds with
the attack method explained in Section 3.
Since the same hardware is commonly used for both encryption and decryption in
RSA, we can run the hardware with a known encryption key in order to get the timing
estimations. For instance, the attacker must find out the number of clock cycles that a
Montgomery multiplication operation takes. With a known key, we know the number of
Montgomery multiplications required for the square and multiply operations of the RSA
modular exponentiation (Algorithm 3). Dividing the total execution time of this
encryption by the number of operations gives the approximate time required for one
Montgomery operation. Then, using repeated trial-and-error steps of comparing the
actual output with the expected result after one Montgomery operation (presented in
Section 3), it may be possible to find out the exact number of clock cycles required.
This timing is utilized in our attack during the decryption process to find out the
decryption exponent. The RSA hardware is run in functional mode for the exact
number of cycles needed to execute a predetermined number of Montgomery
operations. Then the hardware is reset, scan enable is made high, and the scan-chain
contents are taken out. Depending on whether the key bit was 0 or 1, either only a
squaring (S) is performed, or both a square (S) and a multiply (M) are performed,
respectively.
A New Scan Attack on RSA in Presence of Industrial Countermeasures 99
In our proposed attack, we always run the software implementation for two
Montgomery cycles, taking the key bit as 0 and 1 (two hypotheses in parallel). If the
first bit was 1, both square (S0) and multiply (M0) operations are performed,
otherwise two squarings (S0 & S1) are performed. Then the actual result from the
scan-out of the hardware implementation after each key bit execution is checked with
the results of the simulation in software. If it matches with the first result (of S0 and
M0), then the key bit is 1, otherwise the key bit is 0. Now, for the next step starting
with the right key bit, again the decryption is performed in software assuming both 0
and 1 possibilities. This time we run for one or two Montgomery cycles depending on
whether the previous key bit was 0 or 1, respectively. If the previous key bit was 0,
then for a present key bit of 0 a squaring on the next key bit (S2) is performed, and
for a present key bit of 1 a multiply on the same key bit (M1) is performed. On the
other hand, if the previous key bit was 1, then for a present key bit of 0 squarings on
the same (S1) and the next (S2) key bits are performed, while for a present key bit of
1 a square (S1) and a multiply (M1) on the same key bit are performed. The results
are compared with the actual result from the scan-
out of the hardware implementation, and the corresponding decision taken. The
process is repeated in this way until all the decryption key bits are obtained.
As an example, if the decryption key bits were 101…, the timing decision tree
would follow the path denoted within the dotted lines in the figure (S0, M0, S1,
S2, M2,…).
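The bit-by-bit decision procedure above can be sketched in software. The following Python model is purely illustrative (the function names, the toy modulus, and the abstraction of Montgomery arithmetic as plain modular operations are ours): it plays both roles, the scanned hardware, which reveals the intermediate register after a chosen number of operations, and the attacker, who simulates the two key-bit hypotheses and compares them with the scan-out.

```python
def lr_square_multiply_trace(key_bits, m, n):
    """Left-to-right square-and-multiply; returns the list of
    intermediate register values after every S and M operation."""
    r, trace = 1, []
    for b in key_bits:
        r = (r * r) % n          # squaring (S): always performed
        trace.append(r)
        if b == 1:
            r = (r * m) % n      # multiply (M): only for key bit 1
            trace.append(r)
    return trace

def scan_out(key_bits, m, n, num_ops):
    """Model of the scanned hardware: intermediate register contents
    after exactly num_ops operations (None if the run is shorter)."""
    trace = lr_square_multiply_trace(key_bits, m, n)
    return trace[num_ops - 1] if num_ops <= len(trace) else None

def recover_key(m, n, key_len, oracle):
    """Two-hypothesis attack: at each step, simulate the next key bit
    as 1 (S then M) and compare with the scan-out after the run."""
    recovered, ops, r = [], 0, 1
    for _ in range(key_len):
        s = (r * r) % n                 # squaring, common to both hypotheses
        hyp1 = (s * m) % n              # hypothesis: present key bit = 1
        if oracle(ops + 2) == hyp1:     # matches the bit-1 simulation
            recovered.append(1); r, ops = hyp1, ops + 2
        else:                           # otherwise the bit was 0
            recovered.append(0); r, ops = s, ops + 1
    return recovered
```

With `key = [1, 0, 1, 1]`, `m = 5` and `n = 1019`, `recover_key` reconstructs the key from the scan-out oracle alone.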
5 Attack Tool
In order to apply the attack to actual designs, we developed an attack tool. The main
goal of this tool is to apply the attack method proposed in Section 3, as well as the
leakage analysis proposed in Section 4, to many different DfT configurations, without
modifying the attack.
The scan analysis tool is divided into three main parts: the attack methods and
ciphers (implemented in C++), the main controller (Perl), and the simulation part,
which is composed of an RTL deck and ModelSIM, as may be seen in Figure 1. In
order to use the tool the gate-level netlist must be provided by correctly setting the
path for both the netlist and technology files for ModelSIM simulations. Then the
design is linked to the RTL deck, which is used as an interface with the tool logic.
This connection is automatically done by giving the list of input and output data test
pins, as well as the control clock, reset, and test enable pins. Additionally, other inputs
such as plaintext and ciphertext must be set in the configuration file.
Once the DUT is linked, the tool may simulate it by calling ModelSIM SE with the
values established by the main controller. This interface is achieved by setting
environment variables in the Perl script which are read by the ModelSIM Tcl script
and then passed on to the RTL deck via generics. For instance, the information being
exchanged here is the plaintext (cipher specific), reset and scan enable timing (when
to scan and how long) and the value of the scan input (test specific). In return, the
scan output contents are stored in a file and they are processed by the main attack
controller in order to run the attacks.
On the left side of Fig. 5, the software part is shown (attack method and cipher
description). The new RSA attack is implemented in C++ based on an RSA cipher
implemented in the same language. We envisage new attacks against other ciphers,
e.g., ECC. Scan attacks on other similar cryptosystems may be conceived, since the
tool was built in such a way that adding a new cipher is straightforward.
The core of the tool is implemented by the attack controller (Perl) which calls the
attack method (using a SWIG interface). The attack controller ensures that the settings
are initialized and then it launches both the attack and the simulation. As a secondary
functionality, the controller handles some design aspects, like timing and multiple test
outputs, so that the attack method itself may abstract that information. For instance,
the attack method has no information on how many clock cycles it takes to execute a
Montgomery multiplication. The controller also finds out the number of scan cycles
for which the shift operation must be enabled so that the whole scan chain is unloaded.
6 Experimental Results
In order to test the effectiveness of the attack, we implemented a 1024-bit RSA
algorithm in hardware with separate datapaths for the Montgomery multiplier, the
adder/subtractor block and the main controller for the Montgomery exponentiation.
Then we envisaged different scenarios to test the attack flexibility. The first scenario
is a single chain containing all the FFs of the design. Then, in the next subsection, we
used Synopsys DfT Compiler (v2010.12) to insert more complex configurations such
as decompression/compaction. Finally, in the last subsection, we implemented some
countermeasures proposed in the literature to verify if the attack is able to overcome
them. All the runs were performed on an Intel Xeon X5460 CPU with four processors
and 4 GB of RAM.
The total number of FFs in the design is 9260. Out of these, 4500 belong to the T1
type, 1024 form the intermediate register (T2 type) and 4096 belong to the T3
type (see Section 3). Therefore using Synopsys DfT Compiler we inserted a single
chain with all these FFs, and the design was linked with the tool. Then the leakage
analysis was run over this configuration. For identifying each bit of the RSA
intermediate register (1024-bit long), the attack tool takes approximately 3.5 minutes
per bit. Then the tool proceeds with the attack method, in order to find the secret key.
In this phase, the tool takes again approximately 3.5 min per bit of secret key. Both
the timing for the leakage analysis and the attack are strongly dependent on the server
configuration. Additionally, the C++ code accounts for approximately 5 seconds of the 3.5
minutes, meaning that the simulation dominates the execution time.
For our test case, we required around 11 messages to find out the full 1024-bit
RSA exponent. This number is less than that required for the attack presented in [5]
(which takes around 30 messages).
In presence of inverters: this countermeasure aims at confusing the attacker, since the
sensitive data observed at the scan chain may be inverted. However, since these
inverters are always placed at the same locations in the scan chain, they are completely
transparent to the differential mode.
The effectiveness of the attack against this countermeasure was validated on the
RSA design containing multiple scan chains and compaction/compression module.
Two implementations were considered, with 4630 and 6180 inverters (50% and 75%
of the overall 9260 FFs in the design, respectively) randomly inserted in the scan
chains. For both cases, the tool was able to find leakage points and then to retrieve the
secret key.
In presence of partial scan: depending on the design, not all the flip-flops need to
be inserted in the scan chain in order to achieve high testability. As proposed in [4],
partial scan may be used for increasing the security of a RSA design against scan
attacks. However, the authors suppose that the attacker needs the whole sensitive
register to retrieve the secret key. As it was described in Section 3, the leakage
analysis feature can be used to find out which bits of the sensitive register are inserted
in the scan chain. Once these bits are identified, the attack can proceed with only
partial information, since each bit of the sensitive register is related to the key.
For evaluating the strength of the partial scan, we configured the DfT tool so as not
to insert some of the sensitive registers in the scan chain. In the first
case, half of the sensitive flip-flops were inserted in the chain. The tool was able to
correctly identify all the leaking bits and then to retrieve the secret key. Also in the
worst case situation, i.e., where only one secret bit was inserted in the chain, the tool
was still able to find out the correct secret key.
The approach taken in [4] is for a pure software attack which does not take into
account the practical aspects of applying it to an actual cryptographic hardware
implementation. The timing aspects are crucial to scan attacks on secure hardware,
and they have been addressed in this paper. Our scan-attack analysis tool integrates the
actual hardware (in the form of a gate-level netlist with inserted DFT) with the
software emulation which allows us to perform the attack in real-time. The secret
decryption exponent key bits are deciphered on-the-fly using this combined approach.
Left-to-right binary exponentiation (employed in ordinary exponentiation) is used
as the target RSA algorithm for the attack in [4]. This is generally not implemented in
hardware owing to the expensive division operation involved in modular operations.
We target the Montgomery Exponentiation algorithm, which is by far the most
popular and efficient implementation of RSA in hardware, as there are no division
operations involved (owing to performing the squaring and multiply operations in the
Montgomery domain).
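For reference, the following is a textbook sketch of Montgomery multiplication and exponentiation (cf. [17]). This is a minimal software model in Python, not the paper's hardware design; the parameter `width` (the size of the Montgomery radix R = 2^width) and all names are our own choices.

```python
def montgomery_setup(n, width):
    """Precompute R = 2^width, and n' with n*n' ≡ -1 (mod R)."""
    r = 1 << width
    r_inv = pow(r, -1, n)            # modular inverse (Python 3.8+)
    n_prime = (r * r_inv - 1) // n   # n*n' = R*R^-1 - 1 ≡ -1 (mod R)
    return r, n_prime

def mont_mul(a, b, n, r, n_prime):
    """Montgomery product a*b*R^-1 mod n: no division by n,
    only shifts and masks by the power-of-two radix R."""
    t = a * b
    m = (t * n_prime) % r            # mask instead of division
    u = (t + m * n) >> (r.bit_length() - 1)   # exact division by R
    return u - n if u >= n else u

def mont_exp(base, exp, n, width=16):
    """Left-to-right square-and-multiply, entirely in the Montgomery
    domain, as in hardware implementations (requires odd n < 2^width)."""
    r, n_prime = montgomery_setup(n, width)
    base_m = (base * r) % n          # map operand into Montgomery domain
    acc = r % n                      # Montgomery representation of 1
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r, n_prime)         # square (S)
        if bit == '1':
            acc = mont_mul(acc, base_m, n, r, n_prime)  # multiply (M)
    return mont_mul(acc, 1, n, r, n_prime)  # map result back out
```

The absence of any division by n is what makes this formulation attractive in hardware.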
Moreover, an inherent assumption in the attack in [4] is that there are no other
exponent key-bit dependent intermediate registers which change their value after each
square and multiply operation. This may not be the practical case in an actual
hardware implementation, where multiple registers are key dependent and change
their values together with the intermediate register of interest in the attack (for
instance, input and output buffers). These registers may mask the contents of the
target intermediate register after XOR-tree compaction (as shown in the leakage
analysis in Section 3). Our proposed scan-attack analysis takes into account the
contents of other key-dependent registers present in the scan chain and presents ways
to deal with this problem.
Finally, the attack in [4] cannot be applied to secure designs having test response
compaction and masking (which is usually employed in DfT for all industrial circuits
to reduce the test volume and cost). Our scan-attack analysis, on the other hand,
works in the presence of these scan compression DfT structures.
8 Conclusion
References
1. Yang, B., Wu, K., Karri, R.: Scan Based Side Channel Attack on Dedicated Hardware
Implementations of Data Encryption Standard. In: Proceedings IEEE International Test
Conference, ITC (2004)
2. Yang, B., Wu, K., Karri, R.: Secure Scan: A Design-for-Test Architecture for Crypto
Chips. In: Proceedings ACM/IEEE Design Automation Conference (DAC), pp. 135–140
(June 2005)
3. Sengar, G., Mukhopadhayay, D., Chowdhury, D.: An Efficient Approach to Develop
Secure Scan Tree for Crypto-Hardware. In: 15th International Conference on Advanced
Computing and Communications
4. Inoue, M., Yoneda, T., Hasegawa, M., Fujiwara, H.: Partial Scan Approach for Secret
Information Protection. In: European Test Symposium, pp. 143–148 (2009)
5. Nara, R., Satoh, K., Yanagisawa, M., Ohtsuki, T., Togawa, N.: Scan-Based Side-Channel
Attack Against RSA Cryptosystems Using Scan Signatures. IEICE Transaction
Fundamentals E93-A(12) (December 2010), Special Section on VLSI Design and CAD
Algorithms
6. Wang, L.-T., Wen, X., Furukawa, H., Hsu, F.-S., Lin, S.-H., Tsai, S.-W., Abdel-Hafez,
K.S., Wu, S.: VirtualScan: a new compressed scan technology for test cost reduction. In:
Proceedings of International Test Conference, ITC 2004, October 26-28, pp. 916–925
(2004)
7. Rajski, J., Tyszer, J., Kassab, M., Mukherjee, N.: Embedded deterministic test. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems 23(5), 776–
792 (2004)
8. Mitra, S., Kim, K.S.: X-compact: an efficient response compaction technique for test cost
reduction. In: Proc. ITC 2002, pp. 311–320 (2002)
9. Liu, C., Huang, Y.: Effects of Embedded Decompression and Compaction Architectures
on Side-Channel Attack Resistance. In: 25th IEEE VLSI Test Symposium, VTS (2007)
10. Nara, R., Togawa, N., Yanagisawa, M., Ohtsuki, T.: Scan-Based Attack against Elliptic
Curve Cryptosystems. In: Asia South-Pacific Design Automatic Conference, ASPDAC
(2010)
11. Liu, Y., Wu, K., Karri, R.: Scan-based Attacks on Linear Feedback Shift Register Based
Stream Ciphers. ACM Transactions on Design Automation of Electronic Systems,
TODAES (2011)
12. Das, A., Knezevic, M., Seys, S., Verbauwhede, I.: Challenge-response based secure test
wrapper for testing cryptographic circuits. In: IEEE European Test Symposium, ETS
(2011)
13. Hély, D., Flottes, M., Bancel, F., Rouzeyre, B., Berard, N., Renovell, M.: Scan Design and
Secure Chip. In: 10th IEEE International On-Line Testing Symposium, IOLTS 2004
(2004)
14. Hély, D., Bancel, F., Flottes, M., Rouzeyre, B.: Test Control for Secure Scan Designs. In:
European Test Symposium, ETS 2005 (2005)
15. Da Rolt, J., Di Natale, G., Flottes, M., Rouzeyre, B.: New security threats against chips
containing scan chain structures. Hardware Oriented Security and Trust, HOST (2011)
16. Da Rolt, J., Di Natale, G., Flottes, M., Rouzeyre, B.: Scan attacks and countermeasures in
presence of scan response compactors. In: 16th IEEE European Test Symposium, ETS
(2011)
17. Menezes, A., van Oorschot, P., Vanstone, S.: Efficient Implementations. In: Handbook of
Applied Cryptography, ch. 14. CRC Press (1996)
18. Gezel Hardware/Software Codesign Environment,
http://rijndael.ece.vt.edu/gezel2/
RSA Key Generation: New Attacks
1 Introduction
Generating RSA keys in tamper-resistant devices is not only practical but also a
good security practice, since it eliminates the single point of failure represented
by a workstation performing multiple key generations. Generating keys in the
field also raises the question of the necessary level of tamper resistance. The rel-
ative lack of publications related to side-channel attacks on RSA key generation
may give the impression that one can get away with basic countermeasures: the
published attacks concentrate on basic timing or SPA-type leakages [14,1] and
can be foiled with constant-time/SPA-secure implementation. In fact, even se-
cure implementations achieving the highest grade of tamper-resistance according
to the Common Criteria evaluations framework consider only timing and SPA
attacks [2]. However, a careful reading of [1] shows a different picture.
We assume that the trial division algorithm itself and the Miller-Rabin
test procedure are effectively protected against side-channel attacks. [. . . ]
If any security assumptions [. . . ] are violated, it may be possible to im-
prove our attack or to mount a different, even more efficient side-channel
attack.
In this paper, we close the gap and show new attacks against RSA key generation
that can be combined with the techniques from [1] but are also powerful enough
to fully reveal an RSA key by themselves. The settings that we consider
are similar to [1]: we assume an incremental prime search algorithm possibly
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 105–119, 2012.
© Springer-Verlag Berlin Heidelberg 2012
106 C. Vuillaume, T. Endo, and P. Wooderson
enhanced with sieving, but unlike [1] we focus on the primality test procedure.
Our tools consist of differential power analysis [3], the template attack machinery
[4], fault analysis [5] and combinations thereof. In particular, to the best of our
knowledge, this is the first published fault attack on RSA key generation.
The paper is organized as follows. Section 2 recalls basic facts about RSA key
generation. Section 3 is the first attack, which is a DPA on the least significant
bits of the prime numbers calculated within RSA key generation. Section 4 is
the second attack, a template attack on the most significant bits of the prime
numbers. In Section 5, two fault attacks are described: a fault attack for increas-
ing the number of samples available for leakage attacks, and a safe-error attack
revealing the most significant bits of the prime numbers. Finally, we conclude in
Section 7.
We will briefly enumerate basic facts about RSA key generation and prime
number generation; refer to [6] for a more complete overview. On input a
public exponent e and a bit length ℓ, RSA key generation calculates two random
large primes p and q of bit length ℓ, with the additional requirement that
gcd(e, φ(p ∗ q)) = 1, where φ is Euler's totient function. The RSA private key is
the exponent d = e^(−1) mod φ(p ∗ q), whereas the public key consists of the public
exponent e and the 2ℓ-bit RSA modulus n = p ∗ q.
The most computation-intensive step of RSA key generation is the generation
of prime numbers. To find large primes, one usually selects a random integer and
tests it for primality with a probabilistic test such as the Fermat or the Miller-
Rabin test. The Fermat primality test works as follows: given a prime candidate
p, a random number 0 < a < p is selected and a^(p−1) mod p is calculated and
compared with 1, which is the expected result when p is prime. It is well known
that there exist (composite) integers p̃ called Carmichael numbers for which
a^(p̃−1) ≡ 1 (mod p̃) for all integers a such that gcd(a, p̃) = 1, despite p̃ not being
prime. As a result, the Fermat test is rarely used in practice.
Owing to theoretical results on its average and worst case error probabilities
[7], the Miller-Rabin test is often preferred. In the Miller-Rabin test, instead of
p − 1, the odd exponent (p − 1)/2^s is employed: first, for a random 0 < a < p,
the exponentiation t = a^((p−1)/2^s) mod p is calculated and the result is compared
with 1 and −1; the test is passed if there is a match. If not, the following step
is repeated s − 1 times: t = t^2 mod p is calculated and the result is compared
with 1 and −1; if t = 1, the candidate p is composite and the test is failed, but
if t = −1 the test is passed. If after the s − 1 iterations t was never equal to 1 or
−1, the candidate p is composite and the test is failed.
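The round described above can be sketched as follows; this is a straightforward Python transcription of the description, not a hardened implementation.

```python
def miller_rabin_round(p, a):
    """One round of the Miller-Rabin test for odd p > 2 and a base
    0 < a < p: write p - 1 = d * 2^s with d odd, then examine the
    sequence t, t^2, t^4, ... where t = a^d mod p."""
    d, s = p - 1, 0
    while d % 2 == 0:                # extract the odd part of p - 1
        d //= 2
        s += 1
    t = pow(a, d, p)                 # t = a^((p-1)/2^s) mod p
    if t == 1 or t == p - 1:         # compared with 1 and -1: pass
        return True
    for _ in range(s - 1):           # repeated squaring step
        t = (t * t) % p
        if t == p - 1:               # t = -1: the test is passed
            return True
        if t == 1:                   # t = 1 without seeing -1: composite
            return False
    return False                     # never hit 1 or -1: composite
```

Unlike the Fermat test, this round rejects Carmichael numbers such as 561 for most bases.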
For efficiency reasons, it is preferable to apply a trial division step before ex-
ecuting the costly primality test. In addition, the cost of trial divisions can be
amortized over several candidates when an incremental search algorithm is used
[8,10]. Incremental prime search is one of the techniques recommended by cryp-
tographic standards for prime generation; see for example [11, Appendix B.3.6].
However, it is not the only way to sieve candidates that are not divisible by a set
of small primes: there exist other methods [12,13], but for the sake of simplicity
we will restrict the discussion to incremental prime search in this paper.
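An incremental search with amortized trial division can be sketched as follows. The truncated list of small primes and the callback interface are illustrative choices of ours; a real implementation would use many more sieving primes and a probabilistic primality test as the callback.

```python
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]  # illustrative, truncated

def incremental_prime_search(start, is_prime_test):
    """Incremental search: take an odd starting candidate and step by 2,
    amortizing trial division by updating the residues mod the small
    primes instead of dividing the full candidate each time. Assumes
    candidates are much larger than the sieving primes."""
    p = start | 1                                   # force odd candidate
    residues = [p % q for q in SMALL_PRIMES]        # one-time full division
    while True:
        if all(r != 0 for r in residues):           # survives the sieve
            if is_prime_test(p):                    # costly primality test
                return p
        p += 2
        residues = [(r + 2) % q                     # cheap residue update
                    for r, q in zip(residues, SMALL_PRIMES)]
```

Only candidates that survive the sieve reach the expensive primality test, which is the cost amortization discussed above.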
Property 2 (Quadrature Phase). Owing to a different hypothesis for p_i^(0),
the functions f_{i+1} and g_{i+1} are in quadrature phase. Formally, for j a positive
integer:

f_{i+1}(j + 2^(i−1)) = g_{i+1}(j)    (2)

Property 1 means that the functions f_{i+1} and g_{i+1} have their output flipped
every 2^i ticks, and Property 2 that the distance between the output flips of f_{i+1}
and g_{i+1} is 2^(i−1).
[Figure 1: successive candidate bits p_1^(j) and p_2^(j), the two hypotheses
p_2^(0) = 0 (classification by f_3) and p_2^(0) = 1 (classification by g_3) with the
resulting classes C and D, and the first-flip increments u and v.]
Fig. 1. Example of the attack on p_3^(0)
Attack Methodology: The core of the attack is that the knowledge of the bits
(p_{i−1}^(0) … p_0^(0))_2 and the guess of bit p_i^(0) = 0 (resp. p_i^(0) = 1) tells us when the
output of the function f_{i+1} (resp. g_{i+1}) is going to be flipped (without knowing the
exact value of the output). The attacker considers the two hypotheses p_i^(0) = 0
and p_i^(0) = 1. Accordingly, the attacker is able to calculate the smallest increment
u (resp. v) for which the output of the function f_{i+1} (resp. g_{i+1}) is flipped. By
Property 2, it is clear that we have |u − v| = 2^(i−1).
(0)
Next, following hypothesis pi = 0 , the attacker distributes the k measured
power traces of the Fermat test into two classes C and D:
– The power traces with index j < u belong to class C.
– The next 2i traces with index u ≤ j < u + 2i belong to class D.
– The next 2i traces with index u + 2i ≤ j < u + 2 ∗ 2i belong to class C.
– And so on.
After that, the attacker computes Δ^0, the difference of the average power traces
in classes C and D. Finally, the same thing is done for hypothesis p_i^(0) = 1 in
order to obtain Δ^1. Figure 1 illustrates the result of the two classifications.
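The classification and the differential trace Δ can be sketched as follows; the representation of power traces as plain lists of floats and the synthetic leakage used in the example are our own simplifications.

```python
def classify(num_traces, u, i):
    """Split trace indices j = 0..num_traces-1 into classes C and D:
    indices j < u go to C, then alternating runs of 2^i indices go to
    D, C, D, ... as in the enumeration above."""
    period = 1 << i
    C, D = [], []
    for j in range(num_traces):
        if j < u:
            C.append(j)
        elif ((j - u) // period) % 2 == 0:
            D.append(j)
        else:
            C.append(j)
    return C, D

def differential_trace(traces, C, D):
    """Difference of the average power traces of the two classes."""
    n_pts = len(traces[0])
    avg = lambda idx, t: sum(traces[j][t] for j in idx) / len(idx)
    return [avg(C, t) - avg(D, t) for t in range(n_pts)]
```

With a synthetic leaking point that flips every 2^i ticks at offset u, the correct classification produces a maximal difference at that point and zero elsewhere.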
(0)
Attack Result. On the one hand, a correct hypothesis pi = b leads to a correct
classification in the classes C and D. As a result, all of the traces in class C have
(j)
the same value for bit pi+1 = β, and all of the traces in class D have the same
value for bit pi+1 = 1 − β. Thus, the differential power trace Δb should exhibit
(j)
(j)
a peak as a result of opposite values for bit pi+1 in the two classes.
(0)
On the other hand, an incorrect hypothesis pi = 1 − b leads to an incorrect
classification in the classes C and D. As a consequence of Property 2, about one
(j) (j)
half of the power traces in C have pi+1 = 0 and the other half pi+1 = 1, and the
same thing can be said for class D. Therefore, the large peak that can be seen
in Δb should vanish in Δ1−b .
[Figure 2: differential power traces for the correct and the incorrect classification.]
Fig. 2. Attack result for DPA on p_1^(0), 100 samples
It is important to understand that while we use one single bit for the classifi-
cation of power traces, the physical state of this bit is only a minor contributor
to the DPA peak. Indeed, flipping one exponent bit can trigger a vastly differ-
ent behavior of the exponentiation algorithm, such as the activation of different
hardware modules, the execution of different program branches with different
addresses or the access to different pieces of data located at different addresses.
This is the reason why we were able to see DPA peaks with a very small number
of samples. Of course, the practicality of our attack depends on the targeted
hardware and software. Although we do not reveal the details of our experimen-
tal setup, we think that our result serves its purpose, namely showing that the
attack is not only theoretical but also practical, and that the scope of security
evaluations and countermeasures in prime generation should not be limited to
timing attacks or SPA.
Discussion. The number of bits that can be revealed by this attack is limited
by nature, because at least 2^(i+1) samples (e.g. executions of the Fermat test) are
necessary to reveal bits p_1^(0) to p_i^(0). According to the prime number theorem,
the number of ℓ-bit primes is about:

π(2^ℓ) − π(2^(ℓ−1)) ≈ 2^ℓ/ln(2^ℓ) − 2^(ℓ−1)/ln(2^(ℓ−1)) ≈ 2^(ℓ−1)/(ℓ ln 2)    (3)
Above, π is the prime-counting function. Thus, the average distance between
two ℓ-bit primes is about ℓ ln 2, and the average distance between a random ℓ-bit
number and the next prime is also about ℓ ln 2. For ℓ = 512 bits and excluding even
integers, this means that there are on average about 177 executions of the Fermat
test until a prime number is found, and for ℓ = 1024 bits 354 executions. As a
consequence, the best that we can hope for is that the DPA attack is effective
for the first 6 or 7 least significant bits. However, as will be shown in Section
4, revealing a few bits (even one single bit) will serve our purpose well enough,
because the DPA is only the first stage of a more complex attack.
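These figures are easy to reproduce; the helper below merely restates the prime-number-theorem estimate quoted above (truncating to an integer matches the quoted 177 and 354).

```python
import math

def avg_fermat_executions(bit_len):
    """Average number of Fermat-test executions in an incremental
    search over odd candidates: about (bit_len * ln 2) / 2, i.e. half
    the average prime gap near 2^bit_len (even integers are skipped)."""
    return bit_len * math.log(2) / 2
```

For ℓ = 512 this gives about 177 executions and for ℓ = 1024 about 354, in line with the text.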
Trial Division. The attack description above assumes that the Fermat test is calculated
after each increment of the prime candidate p^(j). In practice, incremental
prime search is combined with trial division; therefore, the Fermat test is not
systematically executed. But if the increment δ = j′ − j between the executions
of the Fermat test with successive candidates p^(j) and p^(j′) is known, the same
attack methodology can be applied. It seems possible to obtain δ if the implementation
is careless (i.e. not protected against SPA). Note that this may occur
even if a countermeasure against the attack presented in [1] is implemented.
However, trial division obviously decreases the number of times the Fer-
mat test is performed. Using the small primes π2 = 3, π3 = 5, . . . , πT (note
that 2 is excluded because we consider odd numbers only) and for πT large
enough, by direct application of the prime number theorem, the number of ex-
ecutions of the Fermat test (i.e. the number of survivors of the trial division
step) is approximately divided by ln(πT )/2. For example, with 256 small primes,
ln(π257 )/2 = ln(1621)/2 ≈ 3.7.
The attack uses the simple fact that in an incremental prime search, the most
significant bits of the exponent do not change. This fact is true not only for the
Fermat test but also for the Miller-Rabin test. As a result, multi-shot template
attacks are in scope: with enough executions of the prime test, the accuracy of the
template matching phase can be greatly increased. In addition, the same samples
and templates can be re-used for attacking all exponent bits, provided that they
are left unchanged by the incremental search.
For a single sample L with N points of interest, the likelihoods N(L|p_i = 0)
and N(L|p_i = 1) of observing sample L if bit p_i is 0 (resp. 1) are:

N(L|p_i = 0) = (2π)^(−N/2) |Σ_0|^(−1/2) · exp(−(1/2)(L − M_0)^T Σ_0^(−1)(L − M_0))
N(L|p_i = 1) = (2π)^(−N/2) |Σ_1|^(−1/2) · exp(−(1/2)(L − M_1)^T Σ_1^(−1)(L − M_1))    (4)
The highest likelihood yields the most probable value for bit p_i [4]. Similarly, for
multiple samples L^(0), …, L^(k−1), the highest value of the log-likelihood yields
the correct value for bit p_i.

L_0 = Σ_{j=0}^{k−1} log N(L^(j)|p_i = 0)
L_1 = Σ_{j=0}^{k−1} log N(L^(j)|p_i = 1)    (5)
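Template matching with Eqs. (4) and (5) can be sketched as follows. For brevity this sketch assumes independent points of interest (a diagonal Σ), whereas the equations above allow a full covariance matrix; all names are our own.

```python
import math

def log_gaussian(sample, mean, var):
    """Log-likelihood of one sample under a Gaussian template with
    independent points of interest (diagonal covariance), cf. Eq. (4)."""
    ll = 0.0
    for x, m, v in zip(sample, mean, var):
        ll += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return ll

def classify_bit(samples, templates):
    """Multi-shot template matching: sum the per-sample log-likelihoods
    (Eq. (5)) for each candidate bit value and pick the largest."""
    scores = {}
    for bit, (mean, var) in templates.items():
        scores[bit] = sum(log_gaussian(s, mean, var) for s in samples)
    return max(scores, key=scores.get)
```

Accumulating log-likelihoods over many samples is exactly what makes the multi-shot setting of the incremental search so favourable to the attacker.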
Next, we estimate the number of samples k that can be expected during prime
search. As explained in Section 3, the candidate is typically incremented about ℓ ln 2
times before a prime number is found. In addition, when trial division employs
the T − 1 first primes π_j (excluding 2), the number of calls to the primality test
is about ℓ ln 2/ln(π_T). For example, with ℓ = 512 bits and when T − 1 = 256, we
have on average 48 calls to the primality test and therefore 48 samples. When
ℓ = 1024, we can expect 96 samples. Note that when a prime is found, the
Miller-Rabin test is repeated several times in order to decrease the probability
of outputting a composite [9]. If the error probability must be smaller than 2−100 ,
this gives us 5 − 1 = 4 additional samples for the 512-bit case and 9 − 1 = 8
samples for the 1024-bit case.
Finally, our experimental results for the DPA validate the practicality of the
template attack as well, because the existence of DPA peaks implies that average
signals M0 and M1 can already be distinguished. Again, the differences in the
average signals arise not only from the value of bit pi but also from the behavior
of the exponentiation algorithm depending on bit p_i. Since we were able to
observe DPA peaks with as few as 100 samples, a template attack with 96
samples is very likely to be successful.
5 Fault Attacks
5.1 Improving Leakage Attacks
The two attacks presented in Sections 3 and 4 suffer from a relatively limited
number of available samples in the average case. This is due to the fact that
the prime search algorithm exits as soon as a prime number is found. But it is
easy to imagine a (multi-) fault attack that lifts this limitation: when a prime
number has been found but additional rounds of the primality tests are still being
executed in order to decrease the error probability, a fault is induced during the
execution of the primality tests. As a result, the candidate is incremented instead
of being returned, thereby increasing the number of samples. Interestingly, this
methodology can be applied to other scenarios. For example, while the attack
against incremental search in [1] has a success rate of 10-15% in the context of
a normal execution of the algorithm, it is always successful if the prime test is
disturbed.
We describe a possible implementation of the attack, where a fault is system-
atically triggered during the last multiplication of the primality test. In case the
candidate is composite, a faulty result is very unlikely to modify the outcome
of the test, but in case the candidate is a prime, a faulty result will certainly
mislabel the prime as composite. The positioning of the fault induction system
is not critical since we simply aim at corrupting the result of the multiplication,
without any particular fault model. In order to maximize the effect of the attack,
it should be repeated several times, therefore the fault induction system should
handle multiple faults. This kind of attack is realistic considering modern lasers,
which have very low latencies and can be triggered with oscilloscopes [16]. Next,
two scenarios must be considered.
Free-Run Search. In a free-run search, the residues obtained by trial division are
updated using the simple relationship p^(j+1) mod π_i = ((p^(j) mod π_i) + 2) mod π_i.
Thus, prime search can be continued indefinitely, until a prime number is found.
If the attacker is able to disrupt the primality test, he is also able to gather as
many samples as he likes. Thus, by combining fault attacks and leakage analysis,
the number of samples is essentially unlimited.
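This fault-assisted sample gathering can be modelled as follows; the model is ours and deliberately simplistic (every candidate counts as one observed primality-test execution, and trial division is ignored).

```python
def faulty_free_run_search(start, is_prime, faults):
    """Model of the multi-fault attack on a free-run search: the first
    `faults` primes encountered are mislabelled as composite by a fault
    injected into the primality test, so the search continues and the
    attacker observes extra executions (samples)."""
    p = start | 1                     # odd starting candidate
    remaining = faults
    samples = 0
    while True:
        samples += 1                  # one observed primality test
        if is_prime(p):
            if remaining == 0:
                return p, samples     # search finally terminates
            remaining -= 1            # fault: the prime is rejected
        p += 2
```

Each injected fault pushes the search past one more prime, so the number of observable samples grows without bound.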
Above, s is the size of the interval (excluding even numbers) and ℓ the target bit
length of the prime. As a result, it is desirable to select a relatively large search
interval in order to reduce the failure probability. For example, using s = 2048
candidates yields a failure probability smaller than 9.7·10^(−6) for 512-bit primes
and 3.1·10^(−3) for 1024-bit primes. Taking trial division into account, the attacker
is able to gather at most 2s/ln(π_T) samples. Note that the number of samples
does not depend on the target bit length of the primes. For example, with s = 2048
and 256 primes for trial division, one can expect at most 554 samples.
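The quoted failure probabilities are consistent with a simple model, which we reconstruct here since the corresponding formula is not reproduced above: assume each odd candidate near 2^ℓ is prime independently with probability 2/(ℓ ln 2), so an interval of s odd candidates contains no prime with probability about (1 − 2/(ℓ ln 2))^s. The sketch below is our own check of the quoted figures.

```python
import math

def search_failure_probability(s, bit_len):
    """P(no prime among s odd candidates near 2^bit_len), under the
    independence model described in the lead-in. This model is our
    reconstruction; it matches the figures quoted in the text."""
    p_prime = 2.0 / (bit_len * math.log(2))
    return (1.0 - p_prime) ** s

def max_samples(s, largest_trial_prime):
    """Upper bound 2s/ln(pi_T) on the number of gathered samples."""
    return int(2 * s / math.log(largest_trial_prime))
```

With s = 2048 this gives roughly 10^(−5) for 512-bit primes, 3·10^(−3) for 1024-bit primes, and at most 554 samples when sieving with 256 primes (π_257 = 1621).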
On the one hand, the DPA from Section 3 can reach the upper bound 2s/ ln(πT ),
because its objective is simply gathering as many samples as possible for building
templates: the outcome of key generation is irrelevant. On the other hand, the
template attack from Section 4 assumes that the RSA key generation is even-
tually successful, otherwise the gathered samples are worthless. As a result, for
the template attack, there is a trade-off between the accuracy of the template
matching phase (i.e. having more samples) and the probability of success of the
attack. Indeed, if too many faults are induced, it may be that there are no prime
numbers left in the interval, in which case the prime search algorithm fails and
the gathered samples must be discarded. Assuming that the attacker can restart
the prime search in case of failure, this is an optimization problem that does not
affect the final outcome of the attack.
Unlike the attacks presented in the above sections, which are meant to be com-
bined, the final (multi-) fault attack described below can work independently.
Although it is possible to combine it with the template attack from Section 4,
such combination is somewhat redundant because both attacks target the most
significant bits of the exponent in primality tests.
We assume that exponentiation is calculated with the square-and-multiply-always
algorithm; in other words, multiplications (possibly dummy) are calculated
independently from the value of the exponent bits p_i^(j). The attack follows the
principle of safe-errors [15]: if a fault is induced during a dummy multiplication,
the output of the exponentiation is unaffected, but if a fault is induced during
a non-dummy multiplication, the output is corrupted. Note that this assumes
prior reverse-engineering work for identifying a fault target that will fit in the
safe-error model. For example, corrupting memory storing the input values to
the primality test is unlikely to serve our purpose because such faults will af-
fect the outcome of all calculations, but targeting the multiplication circuit or
internal multiplication registers seems more promising.
In order to apply the safe-error methodology, we will take advantage of the
following property of the Miller-Rabin test.
Property 3 (Rarity of Strong Pseudoprimes). With very large probability,
a large random integer passing one round of the Miller-Rabin test is a prime.
For example, the probability that a random 512-bit integer which passes one
round of the Miller-Rabin test is composite is about 2^{−56}, and for a 1024-bit
integer the probability will be even smaller [7]. Thus, an integer which passes one
round of the Miller-Rabin test is prime with very high probability, and therefore
will pass any additional rounds as well. But despite the very low probability of
failing subsequent rounds, if this event happens, it is likely that the actual imple-
mentation of the prime search algorithm will continue the search and increment
the candidate.
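For reference, one round of the Miller-Rabin test can be sketched as follows. This is a generic textbook formulation with a random base, not the implementation analyzed in the paper:

```python
import random

def miller_rabin_round(n):
    """One round of the Miller-Rabin test with a random base.
    Returns False if n is certainly composite, True if n passes the round."""
    if n % 2 == 0:
        return n == 2
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    a = random.randrange(2, n - 1)
    x = pow(a, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False
```

A prime always passes; a composite passes only for the (rare) strong-liar bases, which is exactly the rarity exploited by Property 3.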
The details of the safe-error attack against prime generation are given in what
follows. Initially, the target bit index i is set to ℓ − 1 (i.e. the most significant
bit of the exponent).
Note that the last round cannot be used because if the target bit is zero, the
fault will not affect the result and the prime search will successfully terminate,
depriving the attacker of remaining samples. For 512-bit and 1024-bit primes,
RSA Key Generation: New Attacks 115
Free-Run Search. The attacker may disrupt the primality testing as many times
as he likes, allowing him to reveal a large portion of the bits of the initial candi-
date p(0) . When he is satisfied with the number of bits obtained, the prime test
is left unperturbed until a prime number p(j) (close to p(0) ) is found.
Array Search. The attacker may disrupt the primality test to the extent that
prime numbers are still present in the remaining part of the search interval.
Recall that for an ℓ-bit initial candidate p, the average number of primes in the
interval p, . . . , p + 2s is:

π(p + 2s) − π(p) ≈ 2s/ln(p) ≈ 2s/(ℓ ln 2)      (7)
Above, π is the prime-counting function. For example, for an ℓ = 512-bit initial
candidate and with an array size of s = 2048, there are 11.54 prime numbers
on average, and therefore the attack reveals 20 bits in the average case. But
it is of course possible that due to a “lucky” choice, there are more primes in
the interval. For example, if there were 147 primes in the interval, this would
be enough to reveal 256 bits, and the rest of p could be calculated using a
lattice attack following the same strategy as [1]. In case the prime search is
re-started after a failure, the attacker may simply try until sufficiently many
bits are available. But for a typical choice of the interval size s, it is extremely
unlikely that the interval will ever contain sufficiently many primes.
Under Hardy and Littlewood prime r-tuple conjecture, Gallagher proved that
the number of primes in a short interval follows a Poisson distribution [17]. As
a result, the probability to have more than k primes in an interval of size s is
about:
Pr[π(p + 2s) − π(p) ≥ k] ≈ 1 − PoissCDF(k, λ)   with   λ = 2s/(ℓ ln 2)      (8)

In addition, the Cumulative Distribution Function PoissCDF of a Poisson distribution is easily calculated with the equation below.

PoissCDF(k, λ) = e^{−λ} · Σ_{i=0}^{k} λ^i/i!      (9)
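Equations (7)–(9) can be evaluated directly. The sketch below (function name ours) reproduces the expected value 11.54 for ℓ = 512 and s = 2048, and shows why 147 primes in a single interval is out of reach:

```python
from math import exp, log

def poiss_cdf(k, lam):
    """Cumulative Poisson probability Pr[X <= k], computed iteratively:
    each term lam^i / i! * e^-lam is derived from the previous one."""
    term, total = exp(-lam), 0.0
    for i in range(k + 1):
        total += term
        term *= lam / (i + 1)
    return total

l, s = 512, 2048
lam = 2 * s / (l * log(2))          # expected number of primes, eq. (7)
print(round(lam, 2))                # 11.54
print(1 - poiss_cdf(146, lam))      # Pr[at least 147 primes]: numerically ~0
```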
[Figure: experimental probability vs. theoretical Poisson PDF of the number of primes in the interval (x-axis: number of primes, 100–180; y-axis: probability, 0–0.045)]
6 Countermeasures
A simple countermeasure to our attacks would be to ensure that key generation
is run in a secure environment, thereby eliminating the threat of side-channel
attacks. But a device able to run key generation in the field is undeniably more
attractive and more flexible, and eliminates infrastructure costs arising from the
requirement of a secure environment. We suggest a few countermeasures that
could prevent our attacks and allow key generation to be securely performed in
the field.
Next, with a random number a satisfying gcd(a, r·p) = 1, it follows from Euler's
theorem that if p is a prime, then a^{φ(r)·(p−1)} ≡ 1 (mod r·p) holds.
It is clear that if p is composite, a Fermat liar a for which a^{p−1} ≡ 1 (mod p)
yields a liar for the randomized test, therefore the number of liars of the ran-
domized Fermat test is strictly larger than the number of liars of the normal
Fermat test. However, in practice, the test is accurate enough for quickly elimi-
nating composites. In order to assess primality in a reliable way, a candidate that
passes the randomized Fermat test could be checked with several rounds of the
(non-randomized) Miller-Rabin test. Since this step takes place only once, the
number of samples will remain very small making template attacks extremely
difficult.
Together with a randomized trial division step, we believe that a randomized
primality step can effectively prevent all of the attacks presented in this paper.
As long as r is not too large (e.g. 32 or 64 bits), the impact on performance
should remain negligible.
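A minimal sketch of such a randomized Fermat test follows. This is our own illustration, not the paper's implementation; for simplicity r is chosen as a random small prime, so that φ(r) = r − 1:

```python
import random
from math import gcd

def _is_prime_td(n):
    """Trial-division primality check, adequate for a small r."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def randomized_fermat_test(p, r_bits=32):
    """Test a^((r-1)*(p-1)) == 1 (mod r*p) for a random small prime r
    and a random base a coprime to r*p. A prime p always passes; most
    composites fail, although the randomized test has strictly more
    liars than the plain Fermat test, as noted in the text."""
    r = random.getrandbits(r_bits) | 1
    while not _is_prime_td(r) or r == p:
        r = random.getrandbits(r_bits) | 1
    n = r * p
    a = random.randrange(2, n)
    while gcd(a, n) != 1:
        a = random.randrange(2, n)
    return pow(a, (r - 1) * (p - 1), n) == 1
```

Since r (and hence the blinded modulus r·p) changes on every invocation, the power traces of repeated tests on the same candidate no longer average coherently, which is the point of the countermeasure.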
7 Conclusion
We presented four different side-channel attacks against prime generation. The
first one is a DPA attack that can reveal a few of the least significant bits of
prime candidates. It is not intended to be used alone but is merely the first
step of a more complex attack. The second one is a template attack targeting
the most significant bits of prime candidates, which are left unchanged by the
incremental search. If necessary, the template building phase can be realized
with our first attack. Since primality testing is expected to be repeated several
times, the attack can take advantage of averaging and has a very high potential
against unprotected implementations. The practicality of our first and second
attack was confirmed with experimental results. The third one is a fault attack
preventing the prime search from terminating too quickly, thereby increasing the
number of samples available for power analysis. By combining the first, second
and third attack, it is possible to gather an arbitrarily high number of samples
for building templates, and depending on implementation parameters, several
dozens to several hundreds of samples for the template matching phase and DPA.
The last one is a safe-error attack effective when the exponentiations in primality
testing involve dummy operations. The attack can break free-run incremental
search algorithms, but not interval search algorithms, at least not for practical
choices of the interval size. Finally, we proposed several countermeasures against
our attacks, including a randomized variant of the Fermat test.
While the scope of the attacks presented in this paper is limited to incremental
prime search, it does not mean that other search strategies, especially those using
“deterministic” update methods of the prime candidate, are immune. We leave
this topic open for future research.
References
1. Finke, T., Gebhardt, M., Schindler, W.: A New Side-Channel Attack on RSA
Prime Generation. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747,
pp. 141–155. Springer, Heidelberg (2009)
2. Common Criteria Portal: Security Targets of ICs, Smart Cards and Smart Card-
Related Devices and Systems,
http://www.commoncriteriaportal.org/products/ (retrieved in December 2011)
3. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
4. Chari, S., Rao, J., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç, Ç.K.,
Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
5. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Cryp-
tographic Protocols for Faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS,
vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
6. Menezes, A., van Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography.
In: Public-Key Parameters, ch. 4. CRC Press (1996)
7. Damgård, I., Landrock, P., Pomerance, C.: Average Case Error Estimates for the
Strong Probable Prime Test. Mathematics of Computation 61(203), 177–194 (1993)
8. Brandt, J., Damgård, I., Landrock, P.: Speeding up Prime Number Generation. In:
Matsumoto, T., Imai, H., Rivest, R.L. (eds.) ASIACRYPT 1991. LNCS, vol. 739,
pp. 440–449. Springer, Heidelberg (1993)
9. Brandt, J., Damgård, I.B.: On Generation of Probable Primes by Incremental
Search. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 358–370.
Springer, Heidelberg (1993)
10. Silverman, R.D.: Fast Generation of Random, Strong RSA Primes. Crypto-
bytes 3(1), 9–13 (1997)
11. Federal Information Processing Standards: Digital Signature Standard (DSS).
FIPS PUB 186-3 (2009)
12. Joye, M., Paillier, P., Vaudenay, S.: Efficient Generation of Prime Numbers. In:
Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS, vol. 1965, pp. 340–354. Springer,
Heidelberg (2000)
13. Joye, M., Paillier, P.: Fast Generation of Prime Numbers on Portable Devices:
An Update. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp.
160–173. Springer, Heidelberg (2006)
14. Clavier, C., Coron, J.-S.: On the Implementation of a Fast Prime Generation Al-
gorithm. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp.
443–449. Springer, Heidelberg (2007)
15. Yen, S.-M., Joye, M.: Checking Before Output Not Be Enough Against Fault-Based
Cryptanalysis. IEEE Trans. Computers 49(9), 967–970 (2000)
16. Riscure. Diode Laser Station DLS 1.0.0714 Datasheet (2011)
17. Gallagher, P.X.: On the Distribution of Primes in Short Intervals. Mathematika 23,
4–9 (1976)
A Fault Attack on the LED Block Cipher
1 Introduction
Ubiquitous computing is enabled by small mobile devices, many of which process
sensitive personal information, including financial and medical data. These data
must be protected against unauthorized access using cryptographic methods.
The strength of cryptographic protection is determined by the (in)feasibility of
deriving secret information by an unauthorized party (attacker). On the other
hand, the acceptable complexity of cryptographic algorithms implementable on
mobile devices is typically restricted by stringent cost constraints and by power
consumption limits due to battery life-time and heat dissipation issues. There-
fore, methods which balance between a low implementation complexity and an
adequate level of protection have recently received significant interest [4,5].
Fault-based cryptanalysis [1] has emerged as a practical and effective technique
to break cryptographic systems, i.e., gain unauthorized access to the secret infor-
mation. Instead of attacking the cryptographic algorithm, a physical disturbance
(fault) is induced in the hardware on which the algorithm is executed. Means to
induce faults include parasitic charge-carrier generation by a laser beam; manipu-
lation of the circuit’s clock; and reduction of the circuit’s power-supply voltage [3].
Most fault-based attacks are based on running the cryptographic algorithm sev-
eral times, in presence and in absence of the disturbance. The secret information is
then derived from the differences between the outcomes of these calculations. The
success of a fault attack critically depends on the spatial and temporal resolution
of the attacker’s equipment. Spatial resolution refers to the ability to accurately
select the circuit element to be manipulated; temporal resolution stands for the
capacity to precisely determine the time (clock cycle) and the duration of fault
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 120–134, 2012.
c Springer-Verlag Berlin Heidelberg 2012
In this section we briefly recall the design of the block cipher LED, as specified
in [10]. It is immediately apparent that the specification of LED has many parallels
to the well-known block cipher AES. The LED cipher uses 64-bit blocks as states
and accepts 64- and 128-bit keys. Our main focus in this paper will be the
version having 64-bit keys which we will denote by LED-64. Other key lengths,
e.g. the popular choice of 80 bits, are padded to 128 bits by appending zeros
until the desired key length is reached. Depending on the key size, the encryption
algorithm performs 32 rounds for LED-64 and 48 rounds for LED-128. Later in
this section we will describe the components of such a round.
The 64-bit state of the cipher is conceptually arranged in a 4 × 4 matrix,
where each 4-bit sized entry is identified with an element of the finite field
122 P. Jovanovic, M. Kreuzer, and I. Polian
F16 ≅ F2[X]/⟨X^4 + X + 1⟩. In the following, we represent an element g ∈ F16,
with g = c3·X^3 + c2·X^2 + c1·X + c0 and ci ∈ F2, by

g ↦ c3 || c2 || c1 || c0
Here || denotes the concatenation of bits. In other words, this mapping identifies
an element of F16 with a bit string. For example, the polynomial X^3 + X + 1 has
the coefficient vector (1, 0, 1, 1) and is mapped to the bit string 1011. Note that
we write 4-bit strings always in their hexadecimal short form, i.e. 1011 = B.
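Arithmetic in this field is easy to implement on nibbles. A small sketch of multiplication modulo X^4 + X + 1 (the function name is ours):

```python
def gf16_mul(a, b):
    """Multiply two elements of F16 = F2[X]/(X^4 + X + 1), with
    elements encoded as 4-bit integers c3||c2||c1||c0."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a          # add (XOR) the current multiple of a
        b >>= 1
        hi = a & 0x8
        a = (a << 1) & 0xF  # multiply a by X ...
        if hi:
            a ^= 0x3        # ... reducing X^4 to X + 1
    return r

print(hex(gf16_mul(0xB, 0x2)))  # 0x5: X * (X^3 + X + 1) = X^2 + 1
```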
First, a 64-bit plaintext unit m is considered as a 16-fold concatenation of
4-bit strings m0 || m1 || · · · || m14 || m15 . Then these 4-bit strings are identified
with elements of F16 and arranged row-wise in a matrix of size 4 × 4:
        ⎛ m0  m1  m2  m3  ⎞
    m = ⎜ m4  m5  m6  m7  ⎟
        ⎜ m8  m9  m10 m11 ⎟
        ⎝ m12 m13 m14 m15 ⎠
Likewise, the key is arranged in one or two matrices of size 4 × 4 over F16 ,
according to its size of 64 bits or 128 bits:
        ⎛ k0  k1  k2  k3  ⎞                     ⎛ k16 k17 k18 k19 ⎞
    k = ⎜ k4  k5  k6  k7  ⎟   and possibly k̃ = ⎜ k20 k21 k22 k23 ⎟
        ⎜ k8  k9  k10 k11 ⎟                     ⎜ k24 k25 k26 k27 ⎟
        ⎝ k12 k13 k14 k15 ⎠                     ⎝ k28 k29 k30 k31 ⎠
Figure 1 below describes the way in which the encryption algorithm of LED op-
erates. It exhibits a special feature of this cipher – there is no key schedule.
On the one hand, this makes the implementation especially light-weight. On the
other hand, it may increase the cipher’s vulnerability to various attacks. Notice
Fig. 1. LED key usage: 64-bit key (top) and 128-bit key (bottom)
that key additions are performed only after four rounds have been executed. The
authors of the original paper [10] call these four rounds a single Step. Key ad-
ditions are effected by the function AddRoundKey (AK). It performs an addition
of the state matrix and the matrix representing the key using bitwise XOR. It
is applied for input- and output-whitening as well as after every fourth round.
We remark again that the original keys are used without further modification as
round keys.
Now we examine one round of the LED encryption algorithm. It is composed of
several operations. Figure 2 provides a rough overview. All matrices are defined
over the field F16 . The final value of the state matrix yields the 64-bit ciphertext
unit c in the obvious way. Let us have a look at the individual steps.
AddConstants (AC). For each round, a round constant consisting of a tuple
of six bits (b5 , b4 , b3 , b2 , b1 , b0 ) is defined as follows. Before the first round, we
start with the zero tuple. In consecutive rounds, we start with the previous
round constant. Then we shift the six bits one position to the left. The new
value of b0 is computed as b5 + b4 + 1. This results in the round constants whose
hexadecimal values are given in Table 1. Next, the round constant is divided into
two halves, which are placed in a constant matrix and added to the state matrix.
(In the current setting, matrix addition is nothing but bitwise XOR.)
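The update rule above amounts to a 6-bit LFSR. A minimal sketch (the function name is ours):

```python
def led_round_constants(rounds):
    """Round-constant schedule sketch: starting from the zero state,
    each round shifts the 6-bit state left by one and sets the new
    b0 = b5 XOR b4 XOR 1, with b5, b4 taken from the pre-shift state."""
    rc, out = 0, []
    for _ in range(rounds):
        b5, b4 = (rc >> 5) & 1, (rc >> 4) & 1
        rc = ((rc << 1) & 0x3F) | (b5 ^ b4 ^ 1)
        out.append(rc)
    return out

print([hex(c) for c in led_round_constants(8)])
# ['0x1', '0x3', '0x7', '0xf', '0x1f', '0x3e', '0x3d', '0x3b']
```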
SubCells (SC). Each entry x of the state matrix is replaced by the element
S[x] from the SBox given in Table 2. (This particular SBox was first used by the
block cipher PRESENT, see [5].)
ShiftRows (SR). For i = 1, 2, 3, 4, the i-th row of the state matrix is shifted
cyclically to the left by i − 1 positions.
x 0 1 2 3 4 5 6 7 8 9 A B C D E F
S[x] C 5 6 B 9 0 A D 3 E F 8 4 7 1 2
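The SubCells and ShiftRows steps described above can be sketched on a 4 × 4 nibble state as follows (the function names are ours):

```python
# The LED SBox from the table above (originally from PRESENT).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def sub_cells(state):
    """SubCells: replace every nibble x of the 4x4 state by S[x]."""
    return [[SBOX[x] for x in row] for row in state]

def shift_rows(state):
    """ShiftRows: rotate the i-th row (0-indexed) cyclically left by i."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]
```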
[Figure 3: propagation of the fault f injected at the first entry of the state matrix at the beginning of round r = 30, traced through rounds 30–32 and the AC, SC, SR, MCS and AK operations]
The entries k_i, 0 ≤ i ≤ 15, of the key are viewed as indeterminates. The following steps list the
expressions one has to compute to finally get the fault equations,
where S^{−1} is the inverse of the LED SBox. The remaining expressions are
computed in the same way again.
The XOR difference between the two related expressions, one derived from c and
the other one from c′, is computed and identified with the corresponding fault
value, which can be read off the fault propagation in Figure 3 above. Thus we
get
In summary one gets 16 fault equations for a fault injected at a particular 4-bit
element of the state matrix at the beginning of round r = 30. For the rest of the
paper we will denote the equations by Ex,i , where x ∈ {a, b, c, d} identifies the
block the equation belongs to and i ∈ {0, 1, 2, 3} the number of the equation as
ordered below. Let us list those 16 equations.
4 · a = S^{−1}(C·(c0 + k0) + C·(c4 + k4) + D·(c8 + k8) + 4·(c12 + k12)) +
        S^{−1}(C·(c′0 + k0) + C·(c′4 + k4) + D·(c′8 + k8) + 4·(c′12 + k12))     (Ea,0)

8 · a = S^{−1}(3·(c3 + k3) + 8·(c7 + k7) + 4·(c11 + k11) + 5·(c15 + k15)) +
        S^{−1}(3·(c′3 + k3) + 8·(c′7 + k7) + 4·(c′11 + k11) + 5·(c′15 + k15))   (Ea,1)

B · a = S^{−1}(7·(c2 + k2) + 6·(c6 + k6) + 2·(c10 + k10) + E·(c14 + k14)) +
        S^{−1}(7·(c′2 + k2) + 6·(c′6 + k6) + 2·(c′10 + k10) + E·(c′14 + k14))   (Ea,2)

2 · a = S^{−1}(D·(c1 + k1) + 9·(c5 + k5) + 9·(c9 + k9) + D·(c13 + k13)) +
        S^{−1}(D·(c′1 + k1) + 9·(c′5 + k5) + 9·(c′9 + k9) + D·(c′13 + k13))     (Ea,3)
Here the fault values a, b, c and d are unknown and thus have to be considered
indeterminates. Of course, for a concrete instance of the attack, we assume that
we are given the correct ciphertext c and the faulty ciphertext c′, and we assume
henceforth that these values have been substituted in the fault equations.
4 Key Filtering
The correct key satisfies all the fault equations derived above. Our attack is based
on quickly identifying large sets of key candidates which are inconsistent with
some of the fault equations and excluding these sets from further consideration.
The attack stops when the number of remaining key candidates is so small
that exhaustive search becomes feasible. Key candidates are organized using a
formalism called fault tuples (introduced below), and filters work directly on
fault tuples. The outline of our approach is as follows:
1. Key Tuple Filtering: Filter the key tuples and obtain the fault tuples to-
gether with their key candidate sets. (Section 4.1; this stage is partly inspired
by the evaluation of the fault equations in [9] and [11]).
2. Key Set Filtering: Filter the fault tuples to eliminate some key candidate
sets (Section 4.2).
3. Exhaustive Search: Find the correct key by considering every remaining
key candidate.
Details on the individual stages and the parameter choice for the attacks are
given below.
165 evaluations of simple polynomials over F16 . Since all entries are independent
from each other, the calculations can be performed in parallel using multiple
processors.
In the next step, we determine, for every x ∈ {a, b, c, d} the set of possible val-
ues jx of x such that Sx,0 (jx ), Sx,1 (jx ), Sx,2 (jx ) and Sx,3 (jx ) are all non-empty.
In other words, we are looking for jx which can occur on the left-hand side of
equations Ex,0 , Ex,1 , Ex,2 and Ex,3 for some possible values of key indetermi-
nates. We call an identified value jx ∈ F16 a possible fault value of x.
By combining the possible fault values of a, b, c, d in all available ways, we ob-
tain tuples t = (ja, jd, jc, jb) which we call fault tuples of the given pair (c, c′). For
each fault tuple, we intersect those sets Sx,i (jx ) which correspond to equations
involving the same key indeterminates:
(k0 , k4 , k8 , k12 ) : Sa,0 (ja ) ∩ Sd,1 (jd ) ∩ Sc,2 (jc ) ∩ Sb,3 (jb )
(k1 , k5 , k9 , k13 ) : Sa,3 (ja ) ∩ Sd,0 (jd ) ∩ Sc,1 (jc ) ∩ Sb,2 (jb )
(k2 , k6 , k10 , k14 ) : Sa,2 (ja ) ∩ Sd,3 (jd ) ∩ Sc,0 (jc ) ∩ Sb,1 (jb )
(k3 , k7 , k11 , k15 ) : Sa,1 (ja ) ∩ Sd,2 (jd ) ∩ Sc,3 (jc ) ∩ Sb,0 (jb )
By recombining the key values (k0 , . . . , k15 ) using all possible choices in these
four intersections, we arrive at the key candidate set for the given fault tuple. If
the size of the key candidate sets is sufficiently small, it is possible to skip the
second stage of the attack and to search all key candidate sets exhaustively for
the correct key.
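The recombination step can be sketched as a Cartesian product (the data layout, sets of 4-tuples of key nibbles, is our assumption, as is the function name):

```python
from itertools import product

def key_candidate_set(i0, i1, i2, i3):
    """Combine the four intersection sets into full key candidates.
    i0..i3 hold candidate 4-tuples for (k0,k4,k8,k12), (k1,k5,k9,k13),
    (k2,k6,k10,k14) and (k3,k7,k11,k15), respectively. The size of the
    result is the product of the four set sizes."""
    return [t0 + t1 + t2 + t3 for t0, t1, t2, t3 in product(i0, i1, i2, i3)]
```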
Each of the intersections in the above picture typically contains 2^4 – 2^8 el-
ements. Consequently, the typical size of a key candidate set is in the range
2^19 – 2^26. Unfortunately, often several fault tuples are generated. The key candi-
date sets corresponding to different fault tuples are necessarily pairwise disjoint
by their construction. Only one of them contains the true key, but up to now we
lack a way to distinguish the correct key candidate set (i.e. the one containing
the true key) from the wrong ones. Before we address this problem in the next
section, we illustrate the key set filtering by an example.
Example 1. In this example we take one of the official test vectors from the LED
specification and apply our attack. It is given by
k  = 01234567 89ABCDEF
m  = 01234567 89ABCDEF
c  = FDD6FB98 45F81456
c′ = 51B8AB31 169AC161
where the faulty ciphertext c′ is obtained when injecting the error e = 8 in the
first entry of the state matrix at the beginning of the 30-th round. Although the
attack is independent of the value of the error, we use a specific one here in order
to enable the reader to reproduce our results. Evaluation of the fault equations
provides us with the following table:
a      0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sa,0  0  2^14  0     2^14  0     0     0     0     0     2^14  0     2^14  0     0     0  0
#Sa,1  0  0     0     0     0     0     0     0     2^14  2^14  0     0     2^14  2^14  0  0
#Sa,2  0  0     0     0     2^14  0     0     2^14  0     2^14  2^14  0     0     0     0  0
#Sa,3  0  0     2^13  0     2^13  0     2^13  2^13  2^13  2^14  0     0     0     0     0  2^13
d      0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sd,0  0  2^13  2^13  2^13  0     0     0     2^13  0     0     2^13  2^14  0     2^13  0  0
#Sd,1  0  2^13  2^13  2^13  2^14  0     2^13  0     0     0     2^13  0     2^13  0     0  0
#Sd,2  0  0     2^14  2^13  0     0     2^13  0     2^13  2^13  0     2^13  0     0     0  2^13
#Sd,3  0  2^13  2^13  0     2^13  0     0     2^13  0     2^13  2^13  0     2^13  0     0  2^13
c      0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sc,0  0  0     0     2^14  2^14  0     0     0     2^13  0     0     2^13  2^13  0     0  2^13
#Sc,1  0  2^13  0     0     0     0     0     2^13  2^13  2^13  2^13  2^14  0     2^13  0  0
#Sc,2  0  2^13  0     0     0     2^13  2^14  0     2^14  0     0     2^13  0     0     0  2^13
#Sc,3  0  0     2^13  2^13  0     0     0     0     2^14  2^13  0     2^13  2^13  0     0  2^13
b      0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sb,0  0  0     0     2^13  0     2^13  0     0     0     2^14  0     2^13  0     2^13  0  2^14
#Sb,1  0  0     0     0     2^13  2^13  2^14  0     2^13  2^13  0     2^14  0     0     0  0
#Sb,2  0  2^13  0     2^14  2^13  2^13  0     2^13  0     0     0     2^13  2^13  0     0  0
#Sb,3  0  2^14  2^14  0     0     2^14  2^14  0     0     0     0     0     0     0     0  0
From this we see that there are two fault tuples, namely (9, 2, 8, 5) and (9, 2, B, 5).
The corresponding key candidate sets have 2^24 and 2^23 elements, respectively.
The problematic equations are obviously equations Ec,i for i ∈ {0, 1, 2, 3}.
There are two possible fault values, namely 8 and B. So far we have no way of
deciding which set contains the key and thus have to search through both of
them. Actually, in this example the correct key is contained in the candidate
set corresponding to the fault tuple (9, 2, B, 5).
S(x0 + 0) = y0 S(x0 + 0) = y0 + a
S(x4 + 1) = y4 S(x4 + 1) = y4 + b
S(x8 + 2) = y8 S(x8 + 2) = y8 + c
S(x12 + 3) = y12 S(x12 + 3) = y12 + d
Now we apply the inverse SBox to these equations and take the differences of
the equations involving the same elements yi . The result is the following system:
4 · f = S^{−1}(y0) + S^{−1}(y0 + a)
8 · f = S^{−1}(y4) + S^{−1}(y4 + b)
B · f = S^{−1}(y8) + S^{−1}(y8 + c)
2 · f = S^{−1}(y12) + S^{−1}(y12 + d)
Finally, we are ready to use a filter mechanism similar to the one in the preceding
subsection. For a given fault tuple (a, d, c, b), we try all possible values of the
elements yi and check whether there is one for which the system has a solution
for f . Thus we have to check four equations over F16 for consistency. This is
easy enough and can also be done in parallel. If there is no solution for f , we
discard the entire candidate set. While we are currently not using the absolute
values yi for the attack, we are exploring possible further speed-up techniques
based on these values.
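The consistency check just described can be made concrete. The sketch below is our own illustration (function names ours), using the LED SBox and the F16 arithmetic from Section 2:

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(i) for i in range(16)]

def gf16_mul(a, b):
    # Multiplication in F16 = F2[X]/(X^4 + X + 1), nibbles as integers.
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        hi = a & 0x8
        a = (a << 1) & 0xF
        if hi:
            a ^= 0x3
    return r

def tuple_consistent(a, d, c, b):
    """Keep the fault tuple (a, d, c, b) only if some nonzero f in F16
    satisfies all four equations coeff*f = S^-1(y) + S^-1(y + delta)
    for some y; otherwise the whole key candidate set is discarded."""
    def solvable(coeff, delta, f):
        t = gf16_mul(coeff, f)
        return any(INV_SBOX[y] ^ INV_SBOX[y ^ delta] == t for y in range(16))
    return any(all(solvable(cf, dl, f)
                   for cf, dl in ((4, a), (8, b), (0xB, c), (2, d)))
               for f in range(1, 16))
```

Since the y values per equation are independent, each of the four equations can be checked for solvability separately, which keeps the filter cheap (at most 15 · 4 · 16 SBox lookups per fault tuple).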
The effect of the attack depends strongly on injecting the fault in round 30:
1. Injecting the fault at an earlier round does not lead to useful fault equations,
since they would depend on all key elements k0 , . . . , k15 and no meaningful
key filtering would be possible.
2. Injecting the fault in a later round results in weaker fault equations which
do not rule out enough key candidates to make exhaustive search feasible.
3. If the fault is injected in round 30 at another entry of the state matrix than
the first, one gets different equations. However, they make the same kind of
key filtering possible as the equations in Section 3. Thus, if we allow fault
injections at random entries of the state matrix in round 30, the overall time
complexity rises only by a factor of 16.
Several properties of LED render it more resistant to the fault-based attack pre-
sented in this paper, compared to AES discussed in [9] and [11]. The derived
LED fault equations are more complex than their counterparts for AES [9,11].
This fact is due to the diffusion property of the MixColumnsSerial function,
which is a matrix multiplication that makes every block of the LED fault equa-
tions (Ex,j ) (Section 3.2) depend on all 16 key indeterminates. In every block we
have exactly one equation that depends on one of the key tuples (k0 , k4 , k8 , k12 ),
(k1 , k5 , k9 , k13 ), (k2 , k6 , k10 , k14 ), and (k3 , k7 , k11 , k15 ). In contrast, AES skips the
final MixColumns operation, and every block of its fault equations depends only
on four key indeterminates.
This observation yields an interesting approach to protect AES against the
fault attack from [9,11]. Adding operation MixColumns to the last round of AES
makes this kind of fault attack much harder, as the time for evaluating the AES
equations rises up to 2^32. Furthermore, as in the case of LED, it is possible that
several fault tuples have to be considered, further complicating the attack.
5 Experimental Results
In this section we report on some results and timings of our attack. The timings
were obtained on a 2.1 GHz AMD Opteron 6172 workstation having 48 GB
RAM. The LED cipher was implemented in C, the attack code in Python. We
performed our attack on 10000 examples using random keys, plaintext units and
faults. The faults were injected at the first element of the state matrix at the
beginning of round r = 30. On average, it took about 45 seconds to finish a single
run of the attack, including the key tuple filtering and the key set filtering. The
time for exhaustive search was not measured at this point. The execution time of
the attack could be further reduced by using a better performing programming
language like C/C++ and parallelization.
Table 3 shows the possible number of fault tuples (#ft) that appeared during
our experiments and the relation between the number of occurrences and the
cases where fault tuples could be discarded by key set filtering (Section 4.2).
For instance, column 3 (#ft = 2) reports that there were 3926 cases in which
two fault tuples were found, and 1640 of them could be eliminated using key set
filtering.
#ft 1 2 3 4 5 6 8 9 10 12 16 18 24 36
occurred 2952 3926 351 1887 1 307 394 15 1 101 39 10 14 2
discarded - 1640 234 1410 1 268 359 14 1 101 38 10 14 2
It is clear that key set filtering is very efficient. Especially if many fault tuples
had to be considered, some of them could be discarded in almost every case.
But also in the more frequent case of a small number of fault tuples there was
a significant gain. Figure 4 shows this using a graphical representation. (Note
the logarithmic y scale.) Altogether, in about 29.5% of the examples there was
a unique fault tuple, in another 29.6% of the examples there were multiple fault
tuples, none of which could be discarded, and in about 40.9% of the examples
some of the fault tuples could be eliminated using key set filtering.
[Figure 4: number of occurrences and number of discards for each number of fault tuples (logarithmic y scale)]
#ft 2 3 4 5 6 8 9 10 12 16 18 24 36
ødiscarded 0.4 0.9 1.4 2.0 2.5 3.6 3.7 5.0 6.1 8.4 8.4 12.6 24.0
k  = 01234567 89ABCDEF
m̃  = 10000000 10000000
c  = 04376B73 063BC443
c′ = 0E8F2863 17C57720
Again the error e = 8 is injected at the first entry of the state matrix at the
beginning of round r = 30. The key filtering stage returns two fault tuples
(5, 7, 7, 5) and (5, 9, 7, 5), both having key candidate sets of size 220 .
Now we form the pairwise intersections of the key candidate sets of the first
and second run. The only non-empty one contains a mere 8 key candidates from
which the correct key is found almost immediately.
Note that repeating an attack may or may not be feasible in practice. Experi-
ments demonstrate that our technique works using a single attack; several at-
tacks just further reduce the set of key candidates on which to run an exhaustive
search.
References
1. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Eliminating Errors
in Cryptographic Computations. J. Cryptology 14, 101–119 (2001)
2. National Institute of Standards and Technology (NIST). Advanced Encryption
Standard (AES). FIPS Publication 197 (2001),
http://www.itl.nist.gov/fipspubs/
3. Bar-El, H., Choukri, H., Naccache, D., Tunstall, M., Whelan, C.: The Sorcerer’s
Apprentice Guide to Fault Attacks. Proceedings of the IEEE 94, 370–382 (2006)
4. Hong, D., Sung, J., Hong, S., Lim, J., Lee, S., Koo, B., Lee, C., Chang, D., Lee,
J., Jeong, K., Kim, H., Kim, J., Chee, S.: HIGHT: A New Block Cipher Suitable
for Low-Resource Device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 46–59. Springer, Heidelberg (2006)
5. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw,
M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466.
Springer, Heidelberg (2007)
6. Kim, C.H., Quisquater, J.-J.: Fault Attacks for CRT Based RSA: New Attacks,
New Results, and New Countermeasures. In: Sauveron, D., Markantonakis, K.,
Bilas, A., Quisquater, J.-J. (eds.) WISTP 2007. LNCS, vol. 4462, pp. 215–228.
Springer, Heidelberg (2007)
7. Koren, I., Krishna, C.M.: Fault-Tolerant Systems. Morgan-Kaufman Publishers,
San Francisco (2007)
8. Hojsı́k, M., Rudolf, B.: Differential Fault Analysis of Trivium. In: Nyberg, K. (ed.)
FSE 2008. LNCS, vol. 5086, pp. 158–172. Springer, Heidelberg (2008)
9. Mukhopadhyay, D.: An Improved Fault Based Attack of the Advanced Encryption
Standard. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 421–
434. Springer, Heidelberg (2009)
10. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED Block Cipher. In:
Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer,
Heidelberg (2011)
11. Tunstall, M., Mukhopadhyay, D., Ali, S.: Differential Fault Analysis of the Ad-
vanced Encryption Standard Using a Single Fault. In: Ardagna, C.A., Zhou, J.
(eds.) WISTP 2011. LNCS, vol. 6633, pp. 224–233. Springer, Heidelberg (2011)
Differential Fault Analysis of Full LBlock
1 Introduction
Background. Cryptographic techniques are an essential means of ensuring confidentiality, privacy and data integrity. Recently, with
(The first author, Liang Zhao, is supported by a governmental scholarship from the China Scholarship Council.)
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 135–150, 2012.
© Springer-Verlag Berlin Heidelberg 2012
– Round of Attack: As a countermeasure against the DFA attack, a popular and simple method is to duplicate the encryption algorithm and check whether the same ciphertext is obtained [20]. As this protection cuts down the performance, only the computation of the last few rounds needs to be doubled [20]. This implies that the DFA attack should be mounted on an earlier round for key recovery. Since there usually exists a diffusion function in the block cipher, the round targeted by the DFA attack needs to be chosen carefully.
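As a sketch of this duplicate-and-compare protection (with a toy stand-in cipher, not LBlock; `encrypt_last_rounds` is our hypothetical name):

```python
def encrypt_last_rounds(state, subkeys):
    """Toy stand-in for the protected final rounds: XOR each subkey in turn."""
    for k in subkeys:
        state ^= k
    return state

def protected_encrypt(state, subkeys):
    """Run the protected rounds twice and suppress the output on mismatch."""
    c1 = encrypt_last_rounds(state, subkeys)
    c2 = encrypt_last_rounds(state, subkeys)   # redundant recomputation
    if c1 != c2:                               # a fault hit one of the two runs
        raise RuntimeError("fault detected: faulty ciphertext suppressed")
    return c1
```

A fault injected into only one of the two runs makes `c1 != c2`, so the adversary never sees the faulty ciphertext needed for the differential analysis.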
Our Contributions. LBlock [1] is a new lightweight block cipher which was presented at ACNS 2011. It is based on a 32-round variant of the Feistel structure with a 64-bit block size and an 80-bit key size. In [1], Wu et al. explored the strength of LBlock against attacks such as differential cryptanalysis, integral attacks and related-key attacks. Moreover, Minier et al. [26] recently analyzed the differential behavior of LBlock and presented a related-key impossible differential attack on round-reduced LBlock. However, analysis of implementation attacks on LBlock is lacking. Therefore, in the current paper, we consider the DFA attack on LBlock. Specifically, our analysis uses a practical fault model, the random bit model, in which the fault is injected at the end of each of the rounds from the 24th to the 31st, respectively. In this fault model, the adversary knows neither the fault value nor the fault position in the injection area. To the best of our knowledge, this is the first paper that proposes fault analysis of full LBlock. The details are as follows:
– Firstly, if the fault is injected at the end of any round from the 25th to the 31st, we present the general principle of the DFA attack for revealing the round subkeys K32, K31 and K30. These three round subkeys are then used to reveal the master key K simply by computing the inverse of the key scheduling. Specifically, when the fault is injected into the 25th or 26th round, we introduce the concept of the false active S-box, which must be distinguished from the (true) active S-box.
– Secondly, as DFA attacks on earlier rounds have been introduced for AES and DES by Derbez et al. [19] and Rivain [20], respectively, we also analyze the DFA attack on LBlock when the fault is injected into the right part at the end of the 24th round. If the fault model additionally assumes that the adversary knows the position of the corrupted 4 bits in the 24th round, we show that it is possible to reveal the round subkey K32 by the DFA attack directly, which implies that the master key K can also be revealed.
Moreover, in order to validate the DFA attack on LBlock, data complexity analyses are presented (see Table 6) and simulation experiments are conducted (see Table 7). The simulation results show the number of faults needed for the attack when the fault is injected into the different rounds (i.e., from the 24th round to the 31st round). Notably, if the fault is injected into the 25th round, the smallest average number of faults is needed for revealing the master key K.
Organization of the Paper. In Section 2, a detailed description and some properties of LBlock are presented. The proposed DFA attack for revealing the master key of LBlock is then introduced in Section 3, and Section 4 shows the corresponding data complexity and the simulation results. Concluding remarks and a possible countermeasure are drawn in the last section.
2 Preliminaries
In this section, the notations used in this paper are listed as follows. Then, we
present a brief description of LBlock. Moreover, some properties about LBlock
are given.
– M , C: 64-bit plaintext and ciphertext.
– K, K i−1 : 80-bit master key and 32-bit round subkey, i∈{2, 3, . . ., 33}.
– F (·), P (·): Round function and diffusion function in LBlock.
– sj (·): Confusion function with 4-bit S-box sj , j∈{0, 1, . . ., 7}.
– ⊕: Bitwise exclusive-OR operation.
– <<<, >>>: Left cyclic shift and right cyclic shift operations.
– ||: Concatenation operation.
– [v]2 : Binary form of an integer v.
Z = Z7||Z6||Z5||Z4||Z3||Z2||Z1||Z0 → U = Z6||Z4||Z7||Z5||Z2||Z0||Z3||Z1,
where U_{i−1} = U_{i−1}^0 U_{i−1}^1 ... U_{i−1}^7 if U is the result of the permutation operation in the (i−1)th round. Let the input of one round of the encryption be M_{i−1} = X_{i−1}||X_{i−2}; the output C_{i−1} = X_i||X_{i−1} can then be expressed as (F(X_{i−1}, K_{i−1}) ⊕ (X_{i−2} <<< 8), X_{i−1}). After the 32 encryption rounds, the ciphertext C = X32||X33 is obtained.
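The Feistel step just described can be sketched as follows. This is an illustrative sketch that keeps the round function F abstract (in LBlock, F(X, K) = P(S(X ⊕ K)) [1], but any F yields an invertible step):

```python
def rot32(x, n):
    """32-bit left rotate."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def round_enc(x_prev, x_prev2, k, F):
    """(X_{i-1}, X_{i-2}) -> (X_i, X_{i-1}), X_i = F(X_{i-1}, K) ^ (X_{i-2} <<< 8)."""
    return F(x_prev, k) ^ rot32(x_prev2, 8), x_prev

def round_dec(x_i, x_prev, k, F):
    """Inverse step: recover X_{i-2} = (X_i ^ F(X_{i-1}, K)) >>> 8."""
    return x_prev, rot32(x_i ^ F(x_prev, k), 24)   # right rotate 8 = left rotate 24
```

Because X_{i−1} is carried through unchanged, decryption needs only F itself and not its inverse, which is the standard Feistel property.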
The 80-bit master key K = k79 k78 k77 ... k0 is used to generate the round subkeys K_{i−1} ∈ {0, 1}^32. This key is stored in a key register. The details of the key scheduling are as follows:
– If i = 2, K_{i−1} = k79 k78 k77 ... k49 k48, taken directly from the master key K.
– If i ∈ {3, 4, ..., 33}, the round subkey K_{i−1} is obtained by the following steps:
  • (1) The key register is rotated: K <<< 29.
  • (2) [k79 k78 k77 k76] = s9[k79 k78 k77 k76]; [k75 k74 k73 k72] = s8[k75 k74 k73 k72].
  • (3) [k50 k49 k48 k47 k46] = [k50 k49 k48 k47 k46] ⊕ [i − 2]2.
  • (4) The leftmost 32 bits of the current register K are output as the round subkey K_{i−1}.
In LBlock [1], ten S-boxes are specified; they are used in the encryption/decryption algorithm and in the key scheduling. The details are shown in Table 1.

Table 1. The S-boxes of LBlock

x    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
s0  14  9 15  0 13  4 10 11  1  2  8  3  7  6 12  5
s1   4 11 14  9 15 13  0 10  7 12  5  6  2  8  1  3
s2   1 14  7 12 15 13  0  6 11  5  9  3  2  4  8 10
s3   7  6  8 11  0 15  3 14  9 10 12 13  5  2  4  1
s4  14  5 15  0  7  2 12 13  1  8  4  9 11 10  6  3
s5   2 13 11 12 15 14  0  9  7 10  6  3  1  8  4  5
s6  11  9  4 14  0 15 10 13  6 12  5  7  3  8  1  2
s7  13 10 15  0 14  4  9 11  2  1  8  3  7  5 12  6
s8   8  7 14  5 15 13  0  6 11 12  9 10  2  4  1  3
s9  11  5 15  0  7  2  9 13  4  8  1 12 14 10  3  6
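The key scheduling above can be sketched directly, using the s8 and s9 rows of Table 1. This is a sketch that assumes the k79-leftmost bit convention described above and has not been checked against official test vectors:

```python
# s8 and s9 exactly as in Table 1; only these two S-boxes enter the key schedule.
S8 = [8, 7, 14, 5, 15, 13, 0, 6, 11, 12, 9, 10, 2, 4, 1, 3]
S9 = [11, 5, 15, 0, 7, 2, 9, 13, 4, 8, 1, 12, 14, 10, 3, 6]
MASK80 = (1 << 80) - 1

def subkeys(K):
    """Return the 32 round subkeys K1..K32 derived from an 80-bit master key K."""
    reg = K & MASK80
    keys = [reg >> 48]                               # K1: leftmost 32 bits
    for i in range(3, 34):                           # rounds i = 3..33 give K2..K32
        reg = ((reg << 29) | (reg >> 51)) & MASK80   # K <<< 29
        top = reg >> 72                              # bits k79..k72
        reg = (reg & ((1 << 72) - 1)) | (S9[top >> 4] << 76) | (S8[top & 0xF] << 72)
        reg ^= (i - 2) << 46                         # [k50..k46] ^= [i-2]_2
        keys.append(reg >> 48)                       # leftmost 32 bits
    return keys
```

For the all-zero key, the first subkey is 0 and the second is determined by s9[0] = 11 and s8[0] = 8 entering the top byte, which makes the sketch easy to sanity-check by hand.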
Table 2. The number NuΔβ of output differences Δβ with N_s(Δα, Δβ) > 0

Δα    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
NuΔβ  6 6 4 6 6 6 6 8 6  6  4  8  8  8  8
Proof. This proof is immediate from the observation of the differential distribu-
tion table of each S-box sj .
According to Lemma 1, we can obtain the following propositions.
Proposition 1. For any S-box of LBlock, i.e., s ∈ {s0, s1, ..., s7}, when Δα > 0, the conditional probabilities Pr[N_s(Δα, Δβ) = 2 | N_s(Δα, Δβ) > 0] and Pr[N_s(Δα, Δβ) = 4 | N_s(Δα, Δβ) > 0] satisfy the following equations:

Pr[N_s(Δα, Δβ) = 2 | N_s(Δα, Δβ) > 0] = { 0,   NuΔβ = 4
                                          2/3, NuΔβ = 6
                                          1,   NuΔβ = 8     (1)

Pr[N_s(Δα, Δβ) = 4 | N_s(Δα, Δβ) > 0] = { 1,   NuΔβ = 4
                                          1/3, NuΔβ = 6
                                          0,   NuΔβ = 8     (2)
Proof. This proof is immediate from the distribution of N_s(Δα, Δβ) in the differential distribution table.
Proposition 2. For each S-box of LBlock, let the input difference Δα > 0. Then the probability Pr[N_s(Δα, Δβ) > 0] ≈ 0.4267. Moreover, if N_s(Δα, Δβ) > 0, the expectation of N_s(Δα, Δβ) is ≈ 2.6222.

Proof. According to Lemma 1, N_s(Δα, Δβ) ∈ {0, 2, 4}. If Δα > 0, then Δβ ∈ {1, 2, ..., 15}. Therefore, Pr[N_s(Δα, Δβ) > 0] can be expressed as

Pr[N_s(Δα, Δβ) > 0] = Pr[N_s(Δα, Δβ) = 2] + Pr[N_s(Δα, Δβ) = 4]
                    = (8 × 6 + 5 × 8 + 2 × 4)/(15 × 15) ≈ 0.4267.

Moreover, according to Proposition 1 and Table 2, the expectation of N_s(Δα, Δβ) is computed as

E(N_s(Δα, Δβ)) = (2/15 + 8/15 × 1/3) × 4 + (5/15 + 8/15 × 2/3) × 2 ≈ 2.6222.
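Proposition 2 can be checked mechanically by building the differential distribution table of an S-box, here s0 from Table 1; note that the expectation averages over a uniformly chosen Δα first, as in the proof above:

```python
def ddt(sbox):
    """T[da][db] = N_s(da, db) = #{x : S(x) ^ S(x ^ da) == db}."""
    T = [[0] * 16 for _ in range(16)]
    for da in range(16):
        for x in range(16):
            T[da][sbox[x] ^ sbox[x ^ da]] += 1
    return T

S0 = [14, 9, 15, 0, 13, 4, 10, 11, 1, 2, 8, 3, 7, 6, 12, 5]
T = ddt(S0)

# Pr[N_s > 0] over all 15 x 15 nonzero difference pairs
entries = [T[da][db] for da in range(1, 16) for db in range(1, 16)]
p = sum(1 for n in entries if n > 0) / len(entries)

# E[N_s | N_s > 0]: pick da uniformly, then db uniformly among the valid ones
row_means = []
for da in range(1, 16):
    nz = [T[da][db] for db in range(1, 16) if T[da][db] > 0]
    row_means.append(sum(nz) / len(nz))
e = sum(row_means) / 15
```

Running this for s0 reproduces the ≈ 0.4267 and ≈ 2.6222 figures of Proposition 2, and it also confirms Lemma 1's claim that every table entry lies in {0, 2, 4}.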
In the following, a property about the diffusion function P (·) is given.
Lemma 2. The inverse of the diffusion function P(·) can be expressed as P^{−1}(U) = U5||U7||U4||U6||U1||U3||U0||U2.
Proof. This proof is immediate.
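Lemma 2 can be verified directly. In the sketch below, the eight nibbles are held in a list indexed by subscript (index 7 = leftmost):

```python
def P(z):
    """Diffusion: Z7..Z0 -> Z6||Z4||Z7||Z5||Z2||Z0||Z3||Z1 (z[i] = nibble Z_i)."""
    return [z[1], z[3], z[0], z[2], z[5], z[7], z[4], z[6]]

def P_inv(u):
    """Inverse per Lemma 2: U5||U7||U4||U6||U1||U3||U0||U2."""
    return [u[2], u[0], u[3], u[1], u[6], u[4], u[7], u[5]]
```

Composing the two maps in either order returns the input, which is the whole content of the lemma.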
According to the expression of P −1 (·), the analysis on the S-box of LBlock can
be extended to the round function F (·). This can contribute to our analysis.
Next, we move on to breaking LBlock by the DFA attack.
For LBlock, if the fault is injected into the left part at the end of the rth round (i.e., r ∈ {29, 30, 31}), the adversary can distinguish the active S-boxes from the inactive S-boxes in each round directly from the difference pair (ΔX_{i−1}^e, ΔX_i^{e'}), where e, e' ∈ {0, 1, ..., 7}, e ≠ e'. Specifically, for the case r = 29, i ∈ {31, 32, 33}; for the case r = 30, i ∈ {32, 33}; and for the case r = 31, i = 33. Table 3 lists the active S-boxes of the 32nd round when the fault is injected into the 29th, 30th and 31st round, where the 32-bit corrupted X32, X31 and X30 are divided into eight 4-bit parts from the 31st bit to the 0th bit, respectively. As the deduced output difference ΔX_i^e ∈ {ΔX_{i(WP)}^e, ΔX_{i(RP)}^e}, the difference pair (ΔX_{i−1}^e, ΔX_{i(WP)}^e), which is the input and output difference pair of the active S-box in the (i−1)th round, can be extracted from (ΔX_{i−1}, ΔX_i). An adversary who knows these difference pairs can then mount a key recovery attack. Note that once a round key K_{i−1} is revealed, it can be used in the recovery of K_{i−2}. The final round subkeys K30, K31 and K32 are uniquely determined by using a few pairs of ciphertexts (C, C′).
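The key-recovery principle can be sketched for a single S-box: given the correct and faulty inputs entering the S-box and the observed output difference, only the 4-bit subkey values consistent with that differential survive. An illustrative sketch using s0 from Table 1 and hypothetical example values:

```python
S0 = [14, 9, 15, 0, 13, 4, 10, 11, 1, 2, 8, 3, 7, 6, 12, 5]

def key_candidates(sbox, x, x_faulty, dout):
    """4-bit subkey values k with S(x ^ k) ^ S(x' ^ k) equal to the observed dout."""
    return [k for k in range(16)
            if sbox[x ^ k] ^ sbox[x_faulty ^ k] == dout]

# Hypothetical example: true subkey nibble k = 5, correct input x = 0, faulty x' = 1.
dout = S0[0 ^ 5] ^ S0[1 ^ 5]          # the output difference the adversary observes
cands = key_candidates(S0, 0, 1, dout)
```

Each fault thus narrows one key nibble down to the N_s(Δα, Δβ) ∈ {2, 4} consistent values of the differential table, which is why a few ciphertext pairs suffice to determine the subkey uniquely.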
After the round subkeys K30, K31 and K32 are revealed, the master key K can be obtained by reversing the key scheduling rather than by brute-force analysis. The steps are as follows:

Table 4. Output differences ΔX_{33(WP)}^{e1} and ΔX_{33(RP)}^{e2}

– Step 1: Set the 80-bit key register K_reg = k79 k78 ... k1 k0. Then, input the round subkey K30 into the leftmost 32 bits of K_reg, i.e., k79 k78 ... k49 k48 = K30. For the round subkey K31, input the bits K31^23 K31^22 ... K31^3 into k42 k41 ... k22. Moreover, the bits K30^23 K30^22 ... K30^10 of K30 are input into k13 k12 ... k0 directly.
– Step 2: Extract the leftmost 8 bits of K31 and divide them into two 4-bit sets: IK1 = [K31^31 K31^30 K31^29 K31^28] and IK2 = [K31^27 K31^26 K31^25 K31^24]. Input IK1 and IK2 into s9^{−1} and s8^{−1}, respectively, and obtain the outputs [K31^31 K31^30 K31^29 K31^28] = s9^{−1}([K31^31 K31^30 K31^29 K31^28]) and [K31^27 K31^26 K31^25 K31^24] = s8^{−1}([K31^27 K31^26 K31^25 K31^24]). Specifically, s9^{−1} and s8^{−1} are the inverse S-boxes of s9 and s8. Then, k47 = K31^28, k46 k45 k44 k43 = K31^27 K31^26 K31^25 K31^24, and k21 k20 k19 = [K31^2 K31^1 K31^0] ⊕ [111], where [111] comes from [30]2. Moreover, extract the leftmost 8 bits of K32 and divide them into two 4-bit sets: IK3 = [K32^31 K32^30 K32^29 K32^28] and IK4 = [K32^27 K32^26 K32^25 K32^24]. Input IK3 and IK4 into s9^{−1} and s8^{−1} to obtain the corresponding outputs, i.e., [K32^31 K32^30 K32^29 K32^28] = s9^{−1}([K32^31 K32^30 K32^29 K32^28]) and [K32^27 K32^26 K32^25 K32^24] = s8^{−1}([K32^27 K32^26 K32^25 K32^24]).
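The reversal used in these steps relies on every forward key-schedule update (rotation, s9/s8 on the top byte, counter XOR) being invertible. A minimal sketch of one forward/backward update pair, under the same bit conventions as before and not validated against test vectors:

```python
S8 = [8, 7, 14, 5, 15, 13, 0, 6, 11, 12, 9, 10, 2, 4, 1, 3]
S9 = [11, 5, 15, 0, 7, 2, 9, 13, 4, 8, 1, 12, 14, 10, 3, 6]
S8_INV = [S8.index(v) for v in range(16)]   # inverse S-boxes, as in Step 2
S9_INV = [S9.index(v) for v in range(16)]
MASK80 = (1 << 80) - 1

def step_fwd(reg, i):
    """One forward key-schedule update for round i (i = 3..33)."""
    reg = ((reg << 29) | (reg >> 51)) & MASK80          # K <<< 29
    top = reg >> 72
    reg = (reg & ((1 << 72) - 1)) | (S9[top >> 4] << 76) | (S8[top & 0xF] << 72)
    return reg ^ ((i - 2) << 46)                        # counter XOR on k50..k46

def step_inv(reg, i):
    """Exact inverse: undo the XOR, then the S-boxes, then rotate right by 29."""
    reg ^= (i - 2) << 46
    top = reg >> 72
    reg = (reg & ((1 << 72) - 1)) | (S9_INV[top >> 4] << 76) | (S8_INV[top & 0xF] << 72)
    return ((reg >> 29) | (reg << 51)) & MASK80
```

Once the register bits have been filled in from K30, K31 and K32 as in Steps 1 and 2, iterating `step_inv` rolls the state back to the master key.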
ΔX_{33(WP)}^e}). Moreover, the corresponding 4-bit parts of K31 can also be revealed according
[to these] differences do not correspond to the output differences according to Eq. (2) (i.e., for the input difference ΔX_32^{e3}, the output difference is ΔX_{33(WP+RP)}^e). For this kind of input difference ΔX_32^e, we define the related S-box as a false active S-box. Therefore, for revealing the right round subkey K32, the true active S-boxes should be distinguished from the false active S-boxes. Table 5 lists these two kinds of S-boxes in the 32nd round when the fault is injected into the 25th and 26th round, respectively. It can be found that for the corrupted 4 bits of the 25th (and 26th) round, the corresponding input difference sets ΔX_32^e are different.
Based on the fault model, which is a random bit fault model, the following two steps are used in the attack procedure:
– Step 1: Produce the nonzero difference set {ΔX_32^e | ΔX_32^e > 0, e ∈ {0, 1, 2, ..., 7}}. Then, deduce the position of the injected fault in the 25th (or 26th) round from the generated difference set and Table 5.
– Step 2: Distinguish the active S-boxes from the false active S-boxes based on Table 5. Then, reveal the corresponding 4 bits of the round subkey K32^e
about which bit is corrupted in these 4 bits. For this semi-random bit model, as the adversary knows which 4 bits in the 24th round are corrupted, she/he can distinguish the active S-boxes from the false active S-boxes successfully (see Fig. 3). E.g., if the fault is injected into any bit of the first 4 bits at the end of the 24th round (i.e., X_24^0), the adversary knows that s4 is the unique active S-box. Then, the round subkey K32 can be revealed by using the general principle of the DFA attack.
Table 6. Number of faults FN needed for the DFA attack on the different rounds

Injected round  24th  25th  26th  27th  28th  29th  30th  31st
FN32             24     8     7     8    12    12    24    24
FN31              8     7     8    12    24    24    24     *
FN30              7     8    12    24    24    24     *     *
FNsum            24     8    12    24    24    24     *     *
Table 7. Number of faults used in the simulated DFA attacks (ten experiments)

Exp.  24th  25th    26th   27th    28th    29th    30th     31st
 1     21   8(7)    10(9)  10(9)   10(9)   10(9)   29(19)   42(21)
 2     28   13(8)   12(9)  11(10)  13(10)  12(9)   24(18)   76(29)
 3     17   15(8)   7(7)   9(8)    15(10)  14(9)   36(24)   42(19)
 4     21   11(8)   8(7)   12(10)  12(9)   9(8)    110(33)  41(21)
 5     19   10(8)   8(7)   9(9)    15(11)  10(8)   41(23)   35(22)
 6     21   19(8)   10(9)  10(8)   13(11)  15(8)   116(37)  30(20)
 7     21   8(6)    10(9)  9(7)    10(10)  12(10)  36(21)   36(19)
 8     20   17(12)  6(6)   11(9)   14(10)  15(10)  21(18)   39(22)
 9     19   11(9)   9(9)   14(12)  15(13)  9(8)    60(28)   40(21)
10     27   17(12)  7(7)   7(7)    17(10)  9(8)    65(23)   44(24)
In Table 7, for each DFA attack, only the number of faults for revealing K32 is presented. This is based on the fact that the DFA attack revealing K31 and K30 when the fault is injected into the (i−3)th round is the same as the DFA attack revealing K32 when the fault is injected into the (i−2)th and (i−1)th round, respectively. Specifically, in this table, the number of faults in brackets is the number of faults actually used for revealing K32, and the number of faults outside the brackets is the total number of faults used in the DFA attack under the random bit model. For the case where the fault is injected into the 24th round, the number of injected faults is measured under the semi-random bit model. Generally speaking, the simulation results verify the preceding data complexity analysis in most cases. The running time of each simulation is within one second: if the fault is injected into the 24th round, K32 is revealed within 0.08 seconds; in the other cases, K32 is revealed within 0.06 seconds.
fault is injected into the end of the rth round (r ∈ {25, 26, 27, 28, 29, 30, 31}), the round subkey can be revealed by using the pair of ciphertexts (C, C′) and differential cryptanalysis. Then, the master key is revealed by analyzing the key scheduling with the last three round subkeys. Specifically, if the fault is injected into the 25th or 26th round, the active S-boxes must first be distinguished from the false active S-boxes, which also have nonzero input differences. Moreover, when the fault is injected into the 24th round and the fault model is the semi-random bit model, in which a strong adversary knows the position of the corrupted 4 bits in the register, the DFA attack breaks LBlock immediately.
To thwart the proposed DFA attack on LBlock, a possible countermeasure is to protect the last few rounds by doubling the computation and checking the results. Moreover, as noted in [20], if the adversary has access to the corresponding decryption oracle, the proposed DFA attack can also be applied to the first few rounds of the cipher. This implies that the same number of rounds needs to be protected at the beginning of the cipher. According to our analysis, for LBlock, at least the last nine rounds and the first nine rounds are recommended to be protected against the DFA attack. However, what our work provides is a known lower bound on the number of rounds to be protected. Therefore, whether the DFA attack can succeed if the fault is injected into the middle rounds (e.g., the 23rd round) should be explored further. Moreover, investigating whether the DFA attack can efficiently reveal the master key K under the random bit model when the fault is injected into the 24th round is an interesting open problem.
References
1. Wu, W.-L., Zhang, L.: LBlock: A Lightweight Block Cipher. In: Lopez, J., Tsudik,
G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 327–344. Springer, Heidelberg (2011)
2. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw,
M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466.
Springer, Heidelberg (2007)
3. De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN — A
Family of Small and Efficient Hardware-Oriented Block Ciphers. In: Clavier, C.,
Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg
(2009)
4. Leander, G., Paar, C., Poschmann, A., Schramm, K.: New Lightweight DES Vari-
ants. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 196–210. Springer,
Heidelberg (2007)
5. Hong, D., Sung, J., Hong, S., Lim, J., Lee, S., Koo, B., Lee, C., Chang, D., Lee,
J., Jeong, K., Kim, H., Kim, J., Chee, S.: HIGHT: A New Block Cipher Suitable
for Low-Resource Device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 46–59. Springer, Heidelberg (2006)
6. Knudsen, L., Leander, G., Poschmann, A., Robshaw, M.J.B.: PRINTcipher: A
Block Cipher for IC-Printing. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 16–32. Springer, Heidelberg (2010)
7. Yang, L., Wang, M., Qiao, S.: Side Channel Cube Attack on PRESENT. In: Garay,
J.A., Miyaji, A., Otsuka, A. (eds.) CANS 2009. LNCS, vol. 5888, pp. 379–391.
Springer, Heidelberg (2009)
8. Bogdanov, A., Rechberger, C.: A 3-Subset Meet-in-the-Middle Attack: Cryptanal-
ysis of the Lightweight Block Cipher KTANTAN. In: Biryukov, A., Gong, G.,
Stinson, D.R. (eds.) SAC 2010. LNCS, vol. 6544, pp. 229–240. Springer, Heidel-
berg (2011)
9. Özen, O., Varıcı, K., Tezcan, C., Kocair, Ç.: Lightweight Block Ciphers Revisited:
Cryptanalysis of Reduced Round PRESENT and HIGHT. In: Boyd, C., González
Nieto, J. (eds.) ACISP 2009. LNCS, vol. 5594, pp. 90–107. Springer, Heidelberg
(2009)
10. Leander, G., Abdelraheem, M.A., AlKhzaimi, H., Zenner, E.: A Cryptanalysis of
PRINTcipher: The Invariant Subspace Attack. In: Rogaway, P. (ed.) CRYPTO
2011. LNCS, vol. 6841, pp. 206–221. Springer, Heidelberg (2011)
11. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Crypto-
graphic Protocols for Faults (Extended Abstract). In: Fumy, W. (ed.) EUROCRYPT
1997. LNCS, vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
12. Clavier, C.: Secret External Encodings Do not Prevent Transient Fault Analysis.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 181–194.
Springer, Heidelberg (2007)
13. Hemme, L.: A Differential Fault Attack Against Early Rounds of (Triple-)DES.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 254–267.
Springer, Heidelberg (2004)
14. Li, Y., Sakiyama, K., Gomisawa, S., Fukunaga, T., Takahashi, J., Ohta, K.: Fault
Sensitivity Analysis. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS,
vol. 6225, pp. 320–334. Springer, Heidelberg (2010)
15. Biham, E., Shamir, A.: Differential Fault Analysis of Secret Key Cryptosystems.
In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 513–525. Springer,
Heidelberg (1997)
16. Czapski, M., Nikodem, M.: Error Detection and Error Correction Procedures for
the Advanced Encryption Standard. Des. Codes Cryptogr. 49, 217–232 (2008)
17. Chen, C.N., Yen, S.M.: Differential Fault Analysis on AES Key Schedule and
Some Countermeasures. In: Safavi-Naini, R., Seberry, J. (eds.) ACISP 2003. LNCS,
vol. 2727, pp. 118–129. Springer, Heidelberg (2003)
18. Moradi, A., Shalmani, M.T.M., Salmasizadeh, M.: A Generalized Method of Differ-
ential Fault Attack Against AES Cryptosystem. In: Goubin, L., Matsui, M. (eds.)
CHES 2006. LNCS, vol. 4249, pp. 91–100. Springer, Heidelberg (2006)
19. Derbez, P., Fouque, P.-A., Leresteux, D.: Meet-in-the-Middle and Impossible Dif-
ferential Fault Analysis on AES. In: Preneel, B., Takagi, T. (eds.) CHES 2011.
LNCS, vol. 6917, pp. 274–291. Springer, Heidelberg (2011)
20. Rivain, M.: Differential Fault Analysis on DES Middle Rounds. In: Clavier, C.,
Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 457–469. Springer, Heidelberg
(2009)
21. Chen, H., Wu, W.-L., Feng, D.-G.: Differential Fault Analysis on CLEFIA. In:
Qing, S., Imai, H., Wang, G. (eds.) ICICS 2007. LNCS, vol. 4861, pp. 284–295.
Springer, Heidelberg (2007)
22. Takahashi, J., Fukunaga, T.: Improved Differential Fault Analysis on CLEFIA.
In: Fault Diagnosis and Tolerance in Cryptography-FDTC 2008, pp. 25–39. IEEE
Computer Society Press, Los Alamitos (2008)
23. Hojsík, M., Rudolf, B.: Differential Fault Analysis of Trivium. In: Nyberg, K. (ed.)
FSE 2008. LNCS, vol. 5086, pp. 158–172. Springer, Heidelberg (2008)
24. Esmaeili Salehani, Y., Kircanski, A., Youssef, A.: Differential Fault Analysis of
Sosemanuk. In: Nitaj, A., Pointcheval, D. (eds.) AFRICACRYPT 2011. LNCS,
vol. 6737, pp. 316–331. Springer, Heidelberg (2011)
25. Kircanski, A., Youssef, A.-M.: Differential Fault Analysis of HC-128. In: Bern-
stein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 261–278.
Springer, Heidelberg (2010)
26. Minier, M., Naya-Plasencia, M.: Some Preliminary Studies on the Differential
Behavior of the Lightweight Block Cipher LBlock. In: Leander, G., Standaert,
F.-X. (eds.) ECRYPT Workshop on Lightweight Cryptography, pp. 35–48 (2011),
http://www.uclouvain.be/crypto/ecrypt_lc11/static/post_proceedings.pdf
Contactless Electromagnetic Active Attack
on Ring Oscillator Based True Random Number
Generator
1 Introduction
True random number generators (TRNGs) are essential in data security hard-
ware. They are implemented to generate random streams of bits used in cryp-
tographic systems as confidential keys or random masks, to initialize vectors, or
to pad values. If an adversary is able to change the behavior of the generator
(for instance if he can change the bias of the generated stream of bits), he can
reduce the security of the whole cryptographic system.
Surprisingly, there are not many papers dealing with physical attacks on random number generators. The only practical attack, to the best of our knowledge, was published by Markettos and Moore [1]. In their attack, the attacker targets
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 151–166, 2012.
© Springer-Verlag Berlin Heidelberg 2012
2 Background
This section discusses the TRNG threats and describes briefly the generator
adopted as a design under test (DUT) in the rest of the paper.
The general structure of a TRNG is depicted in Figure 1. The generator is
composed of:
Fig. 1. Passive (2, 5) and active (1, 3, 4) attacks on a TRNG general structure
Two types of attacks on TRNGs can be considered: passive and active attacks.
Passive attacks collect some information about the generator in order to predict future values with a non-negligible probability (attacks 2 and 5 in Figure 1; see the arrow orientation). Active attacks tend to modify the behavior of the generator in order to somehow control its output (attacks 1, 3, and 4 in Figure 1). According to Figure 1, the adversary may target different parts of the TRNG in different ways. One could expect that the statistical tests (simple embedded tests or complex external tests) would detect the attack. One could also argue that algorithmic post-processing would reduce the force of the attack. However, algorithmic post-processing is missing in some generators [2], or embedded tests are not used because the generator is "provably secure" [3]. Nevertheless, it is common practice in applied cryptography that the security of all building elements is evaluated separately. For this reason, evaluating the robustness of the generator and all its parts is of great interest.
Many sources of randomness such as thermal noise, 1/f noise, shot noise or
metastability can be used in TRNGs. A good source of randomness should not
be manipulable (and therefore not attackable) or the manipulation should be
prevented. For example, the thermal noise quality can be guaranteed by control-
ling the temperature. It is thus reasonable to expect that attacks will not target
the source of randomness.
In this paper, we will consider attacks on entropy extraction (1). Their objective can be to bias the generator output or to reduce the digital noise entropy, since both bias and entropy reduction can simplify the subsequent attack on the cryptographic system: the exhaustive key search can be significantly shortened. We will not consider the other attacks from Figure 1, such as attacks on tests (2 and 3) and on post-processing (4), because of the huge number of methods and cases that would have to be considered. It is up to the designer to adapt the post-processing and embedded tests to the weaknesses of the generator. The aim of this paper is to
3 Experimental Setup
The EM attacks were realized on a board featuring an ACTEL Fusion FPGA. The board is dedicated to the evaluation of TRNGs. Special attention was paid to the power supply design, using low-noise linear regulators, and to the design of the power and ground planes. It is important to stress that the board was not specially designed to make EM fault injection or side-channel attacks easier, as is the case of the SASEBO board [4]. It can be seen in Figure 3 that the FPGA module was plugged into the motherboard containing the power regulator and USB interface.
the transfer. Two signals were exchanged between the boards: a clock signal
coming from the communication board and the random bitstream produced
by the TRNG inside the FPGA under attack. These two signals were mon-
itored with an oscilloscope during the attack in order to ensure that their
integrities were untouched. This implementation is called Target#2.
We ensured that the ROs were not initially locked due to their placement. In the rest of the paper, the term "locked" has the same meaning as in phase-locked loops (PLLs).
In both cases, the ROs were composed of three inverters (NOT gates), giving working frequencies of about 330 MHz. For Target#2, the TRNG was composed of 50 ROs. A sampling clock of 24 kHz was generated in an embedded PLL. This sampling frequency was chosen so as to make a 2-RO TRNG pass the NIST statistical tests. In general, decreasing the speed of the sampling clock will improve the behavior of the TRNG (the jitter accumulation time will be longer). Moreover, we used more ROs than Wold and Tan in [2] (50 versus 25). We stress that the TRNG featuring 50 ROs should pass the FIPS and NIST statistical tests under normal conditions without any problems.
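To illustrate why locking destroys the entropy, the following deliberately crude toy model (our assumption, not the authors' DUT) XORs one sampled bit per RO and models EM injection as forcing all ROs onto one phase:

```python
import random

def trng_bits(n_bits, n_ros=50, locked=False, seed=1):
    """Toy RO-TRNG: XOR one sampled bit per RO at each slow-clock tick.

    Free-running case: at a 24 kHz sampling clock, accumulated jitter exceeds
    the RO period, so each sampled phase is modeled as uniform in [0, 1).
    Locked case: EM injection forces every RO onto the same phase."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_bits):
        if locked:
            phases = [0.25] * n_ros                          # identical phases
        else:
            phases = [rng.random() for _ in range(n_ros)]
        bit = 0
        for p in phases:
            bit ^= p < 0.5                                   # sampled RO output
        out.append(int(bit))
    return out
```

With 50 (an even number of) locked ROs, the XOR collapses to a constant stream, which fails any statistical test; this is exactly the effect the EM injection attack exploits.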
one for controlling the whole platform and the other one for data acquisition
and storage.
The main element of both control and data acquisition chains is a personal
computer (PC), which:
– controls the amplitude and the frequency of the sine waveform signal pro-
vided by the signal generator to the input of the 50 W power amplifier,
– positions the micro-antenna above the IC surface thanks to the XYZ motor-
ized stages,
– collects data provided by the power meter, connected to a bi-directional
coupler, in order to monitor the forward (Pforward ) and reflected (Preflected)
powers,
– sends configuration data to the ACTEL Fusion FPGA and supplies target
boards via USB,
– stores the time domain traces of all signals of interest acquired using the
oscilloscope; in our case, the outputs of the four ROs (Target #1 - Out1 to
Out4 ) and the TRNG output (Target #2).
Note that according to safety standards, but also in order to limit the noise
during acquisitions, the whole EM injection platform is placed in an EMC table
top test enclosure with a 120 dB RF isolation.
A key element of this platform is the probe that converts electric energy into a powerful EM field (active attacks). Most micrometric EM probes generally used to characterize the susceptibility of ICs [5] are inductive, composed of a single coil into which a high-amplitude and thus sudden current variation is injected. These probes cannot be used in our context. Indeed, reducing the
This probe predominantly produces an electric field, and we can assume that only this component, at the tip end, can couple with the metal tracks inside the IC. Further information about the platform and the effects of EM injection is available in [6,7].
Fig. 8. Discrete Fourier Transform (DFT) factor Y(finj)/Y(fROi) vs. injection frequency (300–325 MHz, with and without injection), after analyzing signals Out1, Out2, Out3 and Out4 of RO1–RO4
Fig. 9. Discrete Fourier Transform |Y(f)| of the signals Out1 and Out3 under: a) normal conditions, b) EM injection at Finj = 309.7 MHz, Pforward = 3 mW
– Low MI values between Vi(t) and Vj(t) for Pforward = 340 nW, meaning that the ROs were not locked,
– Increased MI values when Pforward was higher, meaning that EM injections effectively lock the ROs,
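The mutual information (MI) criterion can be sketched as a plug-in estimate from the joint histogram of two binarized sample streams; this is an illustrative estimator, as the authors do not specify theirs:

```python
from math import log2

def mutual_information(xs, ys):
    """Plug-in MI estimate (in bits) between two equally long binary sequences."""
    n = len(xs)
    joint = {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    px = {v: sum(c for (x, _), c in joint.items() if x == v) / n for v in (0, 1)}
    py = {v: sum(c for (_, y), c in joint.items() if y == v) / n for v in (0, 1)}
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        if px[x] > 0 and py[y] > 0:
            mi += pxy * log2(pxy / (px[x] * py[y]))
    return mi
```

Two free-running, independent ROs yield an MI near 0 bits, while two ROs locked onto the injection frequency produce strongly dependent samples and an MI approaching the full entropy of one stream.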
Fig. 10. Subsequent traces in persistent display mode (bold) and mean traces (fine) of Out1 and Out3, corresponding to the ROs' outputs, a) under normal conditions and b) submitted to Pforward = 3 mW of 309.7 MHz EM injection
Fig. 11. a) Phase difference between Out1 and Out3 over time; b) phase distribution (histogram for the couple RO1–RO3, Finjection = 309.7 MHz)
In the case of strong EM harmonic injections, the two ROs are locked on the injection frequency. This is clearly visible in Figure 9b, where the biggest harmonic is that of the injected frequency. Next, we propose to evaluate the phase difference between the output signals of two ROs. The evolution of the phase difference between signals Out1 and Out3 is plotted in Figure 11a. According to the histogram in Figure 11b, the phase is distributed between 222° and 252° and centered around 237°, giving a range of variation of 30°. If we look at the phase evolution over time, it follows an almost sinusoidal tendency. As said before, during the harmonic injection, Out1 and Out3 are mainly composed of two frequencies: the injection frequency itself (finj) and the working frequency of the ring (fRO1 and fRO3, respectively). These two frequencies in the spectrum of each RO produce a beat phenomenon (as defined in acoustics), which explains the sinusoidal tendency of the phase.
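The beat explanation rests on the identity sin a + sin b = 2 sin((a+b)/2) cos((a−b)/2): the superposition of the two tones is a carrier at the mean frequency modulated by an envelope at the difference frequency. A quick numerical check with illustrative (not measured) frequencies:

```python
from math import sin, cos, pi, isclose

f1, f2 = 309.7e6, 313.0e6             # injection tone and a nearby RO tone
for k in range(100):
    t = k * 1e-10                     # sample well inside one beat period
    a, b = 2 * pi * f1 * t, 2 * pi * f2 * t
    assert isclose(sin(a) + sin(b),
                   2 * sin((a + b) / 2) * cos((a - b) / 2), abs_tol=1e-9)

beat_period = 1 / abs(f1 - f2)        # period of the envelope / phase oscillation
```

With these illustrative values the envelope, and hence the slow oscillation of the measured phase difference, has a period of about 0.3 µs.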
Fig. 12. Bitstream produced by the TRNG under different attack powers at 309.7 MHz using the electric probe (120×32). From left to right: a) no injection, b) PForward = 210 µW, c) PForward = 260 µW, d) PForward = 300 µW
Fig. 13. a) AM signal - b) TRNG stream of bits (raster scanning from bottom to top
and left to right) - c) Bias in % for the TRNG stream of bits
Looking at the bitstream or the bias, it is clear that the behavior of the TRNG
is impacted quickly (in less than 1 ms) by the EM perturbation, and that it
returns to its initial state with the same speed. In fact, we observed that the
bias was changing according to the dynamics of the power amplification chain,
which in our case has a time response of roughly 1 ms. The difference in the
bias between the different attack periods is due to the fact that the response
time of the power amplifier is not adapted to operation in AM mode. This
experiment makes clear that dynamic EM harmonic injection is feasible and
that it can be very powerful, able to control the behavior of an RO-based TRNG
even if it is composed of a large number of ROs. The dynamic control of the
EM harmonic injection is of paramount importance, because it can be used to
bypass embedded statistical tests launched periodically.
Contactless EM Active Attack on RO-Based TRNG 165
Fig. 14. a) AM signal - b) There might be something written in this stream of bits -
c) Bias in % for the TRNG stream of bits
6 Conclusion
Acknowledgments. The work presented in this paper was carried out within the
framework of the EMAISeCi project, number ANR-10-SEGI-005, supported by the
French "Agence Nationale de la Recherche" (ANR).
A Closer Look at Security
in Random Number Generators Design
Viktor Fischer
Abstract. The issue of random number generation is crucial for the im-
plementation of cryptographic systems. Random numbers are often used
in key generation processes, authentication protocols, zero-knowledge
protocols, padding, in many digital signature and encryption schemes, and
even in some side channel attack countermeasures. For these applica-
tions, security depends to a great extent on the quality of the source of
randomness and on the way this source is exploited. The quality of the
generated numbers is checked by statistical tests. In addition to the good
statistical properties of the obtained numbers, the output of the genera-
tor used in cryptography must be unpredictable. Besides quality and un-
predictability requirements, the generator must be robust against aging
effects and intentional or unintentional environmental variations, such
as temperature, power supply, electromagnetic emanations, etc. In this
paper, we discuss practical aspects of a true random number generator
design. Special attention is given to the analysis of security requirements
and to the way these requirements can be met in practice.
1 Introduction
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 167–182, 2012.
c Springer-Verlag Berlin Heidelberg 2012
the system and they should never leave the system in clear. For the same reason,
if the security system is implemented in a single chip (cryptographic system-
on-chip), the keys should be generated inside the same chip. Implementation of
random number generators in logic devices (including configurable logic devices)
is of paramount importance.
There are three basic challenges in modern embedded TRNG design: (i) find-
ing a good quality source of randomness (available in the digital technology);
(ii) finding an efficient and robust principle of randomness extraction; (iii) guar-
anteeing the security (e.g. by a robust design or by an efficient online testing).
Historically, three basic RNG classes are used in cryptography: deterministic,
nondeterministic (physical) and hybrid random number generators.
Deterministic (pseudo-) random number generators (DRNGs) are mostly fast
and have good statistical properties. They are usually used as key generators
in stream ciphers. Due to the existence of some underlying algorithms, DRNGs
are easy to implement in logic devices. However, if the algorithm is known, the
generator output is predictable. Even when the algorithm is not known but some
of the generator output sequences have been recorded, its behavior during the
recorded sequence can be used in future attacks.
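The predictability argument can be illustrated with a toy linear congruential generator (a deliberately weak DRNG, used here only for demonstration and not representative of generators used in practice): once an attacker knows the algorithm and observes a single output, every later output follows.

```python
# Toy LCG (MINSTD parameters) -- NOT a cryptographic DRNG. It yields its
# internal state directly, so one observed output reveals the full state.
M, A, C = 2**31 - 1, 48271, 0

def lcg(state):
    while True:
        state = (A * state + C) % M
        yield state

victim = lcg(12345)                             # secret seed
observed = [next(victim) for _ in range(3)]     # attacker records 3 outputs
attacker = lcg(observed[-1])                    # clone from the last output
predicted = [next(attacker) for _ in range(5)]
actual = [next(victim) for _ in range(5)]
# predicted == actual: the attacker reproduces the victim's future stream
```

Real DRNGs hide their state behind one-way output functions, but the lesson stands: their security is computational, not physical.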
Physical (true-) random number generators (TRNGs) use physical processes
to generate random numbers. If the underlying physical process cannot be con-
trolled, the generator output is unpredictable and/or uncontrollable. The final
speed of TRNGs is limited by the spectrum of the underlying physical phe-
nomenon and by the principle used to extract entropy from it (e.g. sampling fre-
quency linked with the noise spectrum). The statistical characteristics of TRNGs
are closely related to the quality of the entropy source, but also to the random-
ness extraction method. Because physical processes are subject to fluctuations,
the statistical characteristics of TRNGs are usually worse than those of DRNGs.
Hybrid random number generators (HRNGs) represent a combination of a
(fast and good quality) deterministic RNG seeded repeatedly by a (slow but
unpredictable) physical RNG. The designer has to find a satisfactory compromise
between the speed of the generator and its predictability (by adjusting the time
interval between seeds and the size of a seed).
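A minimal sketch of the hybrid structure (the class name and reseed interval are illustrative choices, not a vetted design): a fast software DRNG is reseeded at fixed intervals from a slower entropy source, and the interval between seeds sets the speed/unpredictability trade-off described above.

```python
import os
import random

class HybridRNG:
    """Fast DRNG periodically reseeded from a slow entropy source.

    Illustrative sketch: os.urandom stands in for the physical RNG and
    random.Random (non-cryptographic) for the fast DRNG.
    """

    def __init__(self, reseed_interval=1024):
        self.reseed_interval = reseed_interval   # bytes between reseeds
        self.count = 0
        self.drng = random.Random(os.urandom(32))  # initial physical seed

    def random_bytes(self, n):
        if self.count >= self.reseed_interval:
            self.drng.seed(os.urandom(32))       # fresh physical seed
            self.count = 0
        self.count += n
        return self.drng.randbytes(n)

rng = HybridRNG()
chunk = rng.random_bytes(16)
```

Shortening the reseed interval narrows the window in which a compromised DRNG state stays useful, at the cost of drawing on the slow physical source more often.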
TRNGs are the only cryptographic primitives that have not been subject
to standardization up to now. However, before using the generator in practice,
the principle and its implementation inside a cryptographic module has to be
validated by an accredited institution as part of a security evaluation process.
Generators that do not have a security certificate are considered to be insecure
in terms of their use in cryptographic applications. Many TRNG designs exist,
but only a few of them deal with security. In this paper, we will focus on security
aspects in TRNG design.
The paper is organized as follows. In Sec. 2, we present briefly basic approaches
in TRNG design. In Sec. 3, we present and discuss basic TRNG design evaluation
criteria and in Sec. 4 we analyze in detail TRNG security requirements. In Sec. 5,
we sum up basic requirements for future secure TRNG designs. We conclude the
paper in Sec. 6.
2 TRNG Design
The TRNG design styles have evolved significantly in the past few years. In the classical
approach (see Fig. 1a), the designers usually proposed some (new) principle
reflecting required design constraints such as area, throughput and/or power
consumption. In the development phase, they typically used FIPS 140-1 [9] or
FIPS 140-2 statistical tests to verify the quality of the generated bitstream,
because these simple tests need only short data files and give a good quality
estimation. In order to validate the final quality of the generated bitstream, the
designer tested the generated data using standard statistical test suites like NIST
SP 800-22 [20] or DIEHARD [19].
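As an example of why these simple tests suit the development phase, the FIPS 140-2 monobit test needs only 20,000 bits and reduces to a counting loop (the bounds below are the FIPS 140-2 values):

```python
def monobit_pass(bits):
    """FIPS 140-2 monobit test: 20,000 bits pass iff 9725 < #ones < 10275."""
    if len(bits) != 20000:
        raise ValueError("monobit test is defined on 20,000-bit samples")
    ones = sum(bits)
    return 9725 < ones < 10275

balanced = [0, 1] * 10000   # exactly 10,000 ones -> passes
stuck = [1] * 20000         # total failure of the source -> rejected
```

Such a test detects gross defects almost for free, but, as the next paragraph explains, passing it says nothing about whether the bits are truly random.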
Even though statistical tests are required to evaluate the quality of the gener-
ated sequence, they cannot distinguish between pseudo random data generated
by a deterministic generator and truly random data from a physical TRNG.
This was one of the reasons why the German BSI (Bundesamt für Sicherheit in der
Informationstechnik) proposed in 2001 a new methodology aimed at the evaluation
of physical random number generators. The AIS 31 methodology [15] defined
several RNG classes and their security requirements. It was updated in 2011
and new RNG classes were defined [16].
Fig. 1. Classical (a) and German BSI's (b) approach in TRNG design
very important security role if the source of randomness fails: (i) it can serve
temporarily as a DRNG; (ii) according to the application security level, it should
guarantee TRNG unpredictability in forward, backward or both directions.
Since the cryptographic algorithm implemented in the post-processing block
behaves as a DRNG when true randomness fails, the latest AIS methodology
[16] merges evaluation of true random number generators and pseudorandom
number generators into a common evaluation procedure and introduces new
RNG subclasses (see Tab. 1): Physical TRNG (PTG.1 and PTG.2), Hybrid
physical TRNG (PTG.3), Deterministic RNG (DRG.1, DRG.2 and DRG.3),
Hybrid deterministic RNG (DRG.4) and Non-physical TRNG (NTG).
Source of Randomness
Logic devices are designed for the implementation of deterministic logic systems.
Each unpredictable behavior in such a system (caused by a metastability, clock
jitter, radiation errors, etc.) can have catastrophic consequences for the behavior
of the overall system. For this reason, vendors of logic devices tend to minimize
these causes. As a consequence, the TRNG design should always be critically
examined in order to keep up with the evolution of the underlying technology.
Most logic devices do not contain analog blocks, so the sources of randomness
are related to the operation of logic gates. Analog physical phenomena (like
thermal, shot and flicker noise) are transformed to time domain instability of
logic signals [13]. This can be seen as a variation in the delay of logic gates,
analog behavior of logic gates between two logic levels (e.g. metastability) [18],
[14] or randomness in two concurrent writings to RAM memory blocks [12], [11].
The instability of gate delays causes signal propagation variations over time.
These variations can be seen as a clock period instability (the jitter) in clock
generators containing delay elements assembled in a closed loop (ring oscillators).
The variation in propagation time is also used in generators with delay elements
in an open chain assembly [7].
Some generators use the tracking jitter introduced by phase locked loops
(PLLs) available in digital technology [10].
Method of Randomness Extraction
In general, random numbers can be obtained in two ways: sampling random
signals at regular time intervals or sampling regular signals at random time
intervals. In synchronous systems, the first method is preferable in order to
guarantee a constant bit rate on the output. In logic devices, randomness is often
extracted by sampling a jittery (clock) signal using synchronous or asynchronous
flip-flops (latches) and a reference (clock) signal.
The choice between synchronous and asynchronous flip-flops does not seem
to be important in ASICs, but it is very important in FPGAs. This is because
synchronous flip-flops are hardwired in logic cells as optimized blocks and their
metastable behavior is consequently minimized. On the other hand, latches can
usually only be implemented in look-up tables (LUTs) and are therefore subject
to metastable behavior to a greater extent [7].
Other ways of extracting randomness are: (i) counting the number of random
events [28] or (ii) counting the number of reference clock periods in a randomly
changing time interval [26].
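The first extraction method, sampling a jittery clock with a reference clock, can be sketched with a toy model (all timing parameters below are illustrative assumptions, not values from any cited design): a flip-flop reads the oscillator's logic level on each reference edge, and the jitter accumulated between samples makes the sampled bit uncertain.

```python
# Toy model of randomness extraction by sampling a jittery ring
# oscillator (RO) with a reference clock. ALL timing values are assumed.
import numpy as np

rng = np.random.default_rng(42)
T_RO = 3.0e-9      # assumed nominal RO period
SIGMA = 30e-12     # assumed jitter std accumulated per RO period
T_REF = 100e-9     # reference (sampling) clock period

def sample_bits(n):
    bits, phase = [], 0.0
    periods = T_REF / T_RO             # RO periods elapsed between samples
    for _ in range(n):
        # accumulated jitter grows with the square root of elapsed periods
        phase += T_REF + SIGMA * np.sqrt(periods) * rng.standard_normal()
        # D flip-flop: sample the RO's logic level at the reference edge
        bits.append(int((phase % T_RO) < T_RO / 2))
    return bits

raw = sample_bits(10000)   # raw binary signal, to be tested/post-processed
```

The model makes the design trade-off visible: if the jitter accumulated between samples is small compared with the RO period, consecutive bits remain strongly dependent on the deterministic phase drift, which is exactly why the sampling period must be chosen with the jitter magnitude in mind.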
The randomness extraction method is usually linked to the basic principle of
the generator and to the source of randomness. The randomness extraction pro-
cedure and post-processing are sometimes merged into the same block and cannot
be separated [24]. In that case, the entropy of the randomness source is masked
by post-processing and cannot be evaluated or tested correctly.
Arithmetic Post-processing of the Raw Binary Signal
The entropy source may have some weaknesses that lead to the generation
of non-random numbers (e.g. long sequences of zeros or ones). In this case,
Cryptographic Post-processing
This kind of post-processing uses both the diffusion and confusion properties
of cryptographic functions. The excellent statistical characteristics of most
encryption algorithms can be used to mask generator imperfections. One of the
advantages of this approach is that the encryption key can be used as a
cryptographic variable to dynamically modify the behavior of the generator. Although
this kind of post-processing block (the cipher) is rather complex and expensive,
the TRNG can reuse (share) the cipher that is used for data encryption.
One of the most expensive (in time and area) but also one of the most secure
methods is cryptographic post-processing based on hash functions. It uses the
diffusion and one-wayness properties of hash functions (as opposed to encryption
of the raw binary signal) to ensure the unpredictability of bits generated by
the TRNG if a total breakdown of the noise source occurs. In this case, due
to the non-linearity property of hash functions, the TRNG will behave like a
cryptographically secure DRNG.
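A generic sketch of hash-based cryptographic post-processing (a common construction, not the specific design of any cited work): fixed-size blocks of raw bits are compressed through SHA-256, so the output inherits the one-wayness of the hash even if the raw stream degrades.

```python
import hashlib

def hash_postprocess(raw: bytes, block=64, out=32):
    """Compress each `block` raw bytes into `out` hashed bytes (out <= 32).

    The block/out ratio is the compression ratio; here 64 -> 32 discards
    half the raw bits, which is only sound if each raw block carries at
    least `out` bytes of entropy (illustrative parameters, not a spec).
    """
    processed = b""
    for i in range(0, len(raw) - block + 1, block):
        processed += hashlib.sha256(raw[i:i + block]).digest()[:out]
    return processed

conditioned = hash_postprocess(bytes(range(256)), block=64, out=32)
```

As the text notes, if the noise source breaks down entirely, this construction degenerates into a deterministic, but still one-way, output function.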
generated numbers. Another solution is to estimate the smallest bit rate available
at the output and to sample the output at this rate. The disadvantage of the
first solution is that, depending on the mean output bit rate and the need for
random numbers, the FIFOs sometimes need to be very big. The disadvantage
of the second solution is that, if the estimated bit rate is incorrect, random
numbers may not always be available at the output.
Power Consumption
The power consumption of the generator is linked to its randomness source (e.g.
the oscillator), to the clock frequency used and to the post-processing algorithm
agility. In power critical applications, the generator can be stopped when not in
use. However, the possibility to stop the bit stream generation can be used to
attack the generator.
Technological Requirements
Compared to the implementation of TRNGs in ASICs, their implementation in
FPGAs is much more restricted. Many TRNGs implemented in ASICs use analog
components to generate randomness (e.g. chaos based TRNGs using analog to
digital converters, free running oscillator based generators using thermal noise
from diodes and resistors, etc.) and to process randomness (e.g. operational
amplifiers, comparators, etc.).
Most of these functional blocks are usually not available in digital technology
and especially in FPGAs, although some of them may be available in selected
families, e.g. RC oscillators in Microsemi (Actel) Fusion FPGA, analog PLLs in
most Altera and Actel families but not in old Xilinx families. From the point
of view of feasibility, some generators are difficult or impossible to implement
in FPGAs, some are feasible in selected FPGAs, and the most general principles
are feasible in all FPGAs.
Stochastic models are different from physical models. Figure 2 depicts the
mechanical principle of metastability (which is useful for understanding
metastability in electronics). In this case, the physical model of metastability
would describe the shape of the hill, while the stochastic model would describe
the probability distribution of the ball's final position according to the shape
and width of the hill. In general, stochastic models are easier to construct.
The stochastic model must describe only the random process that is actually used
as the source of randomness. The metastability in Fig. 2 is related to the ability
of the ball to stay at the top of the hill for a random time interval. It is clear
that it is very difficult (but not completely impossible) to place and maintain
the ball on the top. However, it is completely impossible to place it periodically,
exactly at the top, within short time periods (in order to increase the bitrate),
as is supposed to be done in papers presumably relying on metastability, e.g. in [18].
The stochastic model serves for estimating the lower entropy bound. This
value should be used in the design of the arithmetic post-processing block: the
lower entropy bound determines the compression ratio necessary for increasing
the entropy per output bit to a value close to 1. It can also be used for testing
the entropy of the generated random bits in real time (online tests).
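The link between the entropy lower bound and the compression ratio can be made explicit for the simplest case of a biased binary source (the bias value below is illustrative, not from a real device):

```python
import math

def H(p):
    """Binary Shannon entropy in bits for a source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical raw-bit bias: P(1) = 0.6, so each raw bit carries H(0.6)
# bits of entropy, and post-processing must consume at least 1/H(0.6)
# raw bits per output bit to push the entropy per output bit towards 1.
p = 0.6
ratio = 1.0 / H(p)   # raw bits needed per output bit
```

For an i.i.d. source this is the information-theoretic minimum; real raw signals with correlations need a model-derived (lower) entropy bound and hence a larger ratio.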
Inner Testability
Inner testability means that the generator structure enables evaluation of the
entropy of the raw binary signal (if it is available) [6]. Indeed, in some designs,
randomness extraction and post-processing are merged into the same process and
the unprocessed random signal (the raw binary signal) is not available. Even if
this signal is available, it is sometimes composed of a pseudo random pattern
combined with a truly random bit stream [4].
The pseudo random pattern makes statistical evaluation of the raw signal
more difficult. For this reason, we propose a new testability level: absolute
inner testability. The raw binary signal of a generator featuring absolute inner
testability does not include a pseudo random pattern and contains only a true
random bit stream. If (for some reason) the source of randomness fails, the raw
signal of the generator will be zero. This fact can be used to detect very quickly
and easily the generator’s total failure.
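Under absolute inner testability, detecting the total failure described above reduces to a trivial check on the raw signal; a minimal sketch (the run length of 64 bits is an arbitrary illustrative choice, not a normative parameter):

```python
def tot_test_failed(raw_bits, n=64):
    """Total-failure (Tot) test sketch: alarm when the trailing n raw bits
    are all identical, i.e. the entropy source is presumed dead."""
    tail = raw_bits[-n:]
    return len(tail) == n and len(set(tail)) == 1

alive = [0, 1, 1, 0] * 32   # toggling raw signal -> no alarm
dead = [0] * 128            # stuck-at-zero raw signal -> alarm
```

Because a failed source yields a constant (here all-zero) raw signal rather than a plausible-looking pseudo-random pattern, this check can run continuously in a handful of logic gates.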
The TRNG characteristics discussed in Sec. 3 are not all equally important. Se-
curity parameters like robustness, availability of a stochastic model, testability,
etc. always take priority in a data security system. Their weight in TRNG eval-
uation is much higher than that of other parameters like power consumption,
bit rate, etc. For this reason, we will analyze these criteria in more detail and
give some practical recommendations in the next section.
The quality of the generator output is tightly linked to the quality of the source
of randomness and to the randomness extraction method used. The physical
characteristics of the source of randomness (e.g. frequency spectrum) and
the randomness extraction method determine the principal parameters of the
generated bit stream: the bias of the output bit stream, correlation between sub-
sequent bits, visible patterns, etc. While some of these faults can be corrected
by efficient post-processing, it is better if the generator inherently produces a
good quality raw bit stream.
It is of extreme importance that the generator is dimensioned to the minimum
amount of random physical quantities (noise, jitter, etc.) that cannot be further
reduced. The thermal noise can be considered as such a source of entropy. How-
ever, the total noise in digital devices is mostly a composition of random noises
(such as thermal noise, shot noise, flicker noise, etc.) coming from global and
independent local sources, but also of data-dependent deterministic noises that
can very often be manipulated.
If the extractor samples the source of randomness too fast, adjacent bits could
be correlated. For this reason, it is good practice to check the generated bit
stream for short-term autocorrelation. It is also possible that the digital noise
exhibits some other short term dependencies, which need to be detected by some
generator specific tests.
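A lag-1 autocorrelation check of the kind suggested above can be implemented in a few lines (a generic sanity test, not a normative procedure); oversampling the source shows up directly as a large lag-1 coefficient:

```python
import numpy as np

def lag1_autocorr(bits):
    """Sample lag-1 autocorrelation coefficient of a 0/1 bit stream."""
    x = np.asarray(bits, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return float((x[:-1] * x[1:]).sum() / denom) if denom else 0.0

rng = np.random.default_rng(7)
iid = rng.integers(0, 2, 100000)                      # ideal bit stream
oversampled = np.repeat(rng.integers(0, 2, 50000), 2)  # each bit sampled twice
```

For the i.i.d. stream the coefficient stays near zero (within sampling error), whereas sampling each underlying bit twice pushes it to roughly 0.5, mimicking an extractor that runs faster than the source can deliver fresh entropy.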
The behavior of the generator is often influenced by external and/or internal
electrical interferences. The most obvious effect of this will be discrete frequencies
from the power supply and from various internal signals appearing in the noise
spectrum.
The spectrum of the generated noise signal can be significantly influenced
by a low frequency 1/f noise caused by semiconductors. Furthermore, the high
frequencies from the noise spectrum may be unintentionally filtered out by some
internal capacities. Presumably white Gaussian noise will thus have a limited
spectrum that will not be uniform.
Some generators can feature so called bad spots. Bad spots are short time
periods, during which the generator ceases to work, due to some electrical inter-
ference or to extreme excursions of the generator’s overloaded circuitry.
Another dangerous feature of the generator can be a back door, which refers
to the deviations from uniform randomness deliberately introduced by the man-
ufacturer. For example, let us suppose that instead of using some physical pro-
cess, the generator would generate a high quality pseudo random sequence with
a 40-bit seed. It would be impossible to detect this behavior by applying stan-
dard statistical tests on the output bit stream, but it would be computationally
feasible for someone who knows the back door to guess successive keys.
When implementing TRNG as a part of a cryptographic system on chip,
designers must take into account that the circuitry surrounding the generator
will influence the generator’s behavior by the data dependent noise present in
power lines and by cross-talks. This impact is not so dangerous if two conditions
are fulfilled: (i) the low entropy bound estimation of the generator does not
include the digital noise from the system on chip activities; (ii) embedded online
tests verify continuously that the effective entropy is not below this bound.
Very few papers evaluate the impact of the environment on the source of
randomness and on the operation of the TRNG. The generator uses all the
sources contributing to the selected phenomena. For example, the clock jitter is
determined by the local noise sources, but also by global sources from power
supply, electromagnetic emanations, etc. If the lower entropy bound was estimated
for the sum of all noise sources, it will be sufficient for an attacker to put the
generator into ideal conditions (low-noise battery power supply, metallic shielding)
in order to reduce the entropy below the estimated lower bound.
The generator’s design must be evaluated in changing environmental condi-
tions (temperature, electromagnetic emanations, etc.). It must be tested, and the
embedded tests validated, for edge values (only one parameter set to its maximal
value) and corner values (several or all parameters set to their critical values)
of the environmental parameters.
Recently, we have developed a set of evaluation boards (modules) aimed at
fair TRNG benchmarking [5]. Five modules using five different FPGA families
are available: Altera Cyclone III, Altera Arria II, Xilinx Spartan 3, Xilinx Virtex
5 and Microsemi Fusion. All the modules have the same architecture, featuring the
selected FPGA device, a linear power supply, two LVDS outputs for external jitter
measurement and, optionally, 32 Mbits of external RAM for fast data acquisition.
The modules are plugged into a motherboard containing linear power supplies (the
card can also be powered by battery) and a USB interface control device from
Cypress. The modules are accessible remotely on demand and can be used for
a fair evaluation of TRNG designs in the same working conditions. The new
generation will be placed in an electromagnetic shielding and will communicate
with a PC via optical fibers.
Very few recent designs deal with stochastic models [23], [2], [3], [22], [8], [28].
The most comprehensive model of a two-oscillator based TRNG is presented in
[2]. It characterizes randomness in the frequency domain. However, the underlying
physical hypotheses (clock jitter as a one-dimensional Brownian motion) must
still be thoroughly evaluated.
A stochastic approach (an urn model) based on a known jitter size is pre-
sented by Sunar et al. in [23]. Unfortunately, it is based on several unrealistic
assumptions criticized by Dichtl in [8]. Some of these assumptions, such as jitter
overestimation (due to jitter measurement outside the device using standard in-
put/output circuitry) can be corrected by using differential oscilloscope probes
in combination with LVDS device outputs [25]. The unrealistic requirements placed
on the XOR gate were later resolved by Wold and Tan in [30].
However, the most security-critical assumption of Sunar et al. turned out to
be the mutual independence of the rings (the basic assumption for the validity of the
model). It was shown in [4] that the rings are not independent and that up to 25%
of them can be mutually locked. This phenomenon significantly reduces the
validity of Sunar et al.'s model and, consequently, the entropy estimation and
the security of the generator.
It is worth mentioning that Wold and Tan took another security-critical step:
since (after changing the original TRNG structure) the raw binary signal at the
XOR gate output passed the statistical tests more easily, they deduced that the
entropy was sufficient (without measuring the jitter) and consequently reduced
the number of rings considerably (from 114 to 25). From the security point of
view, this approach is not acceptable, since it causes a significant entropy
reduction (according to the model, only a few urns are filled).
The models presented in [3] are restricted to TRNGs based on coherent sam-
pling [17], [26], [10]. However, these models have only limited practical value,
because the first TRNGs, in [17] and [26], have some technological limits (the
difficulty of setting up the generated clock signal periods precisely) and the
PLL-based TRNG from [10] uses jitter with a complex profile (some deterministic
jitter coming from the PLL depends on the characteristics of the PLL control loop).
In contrast to standard methods, which test only the TRNG output, the AIS
methodology also requires testing (for higher security levels) the raw binary
signal (see Fig. 1b). This new approach is motivated by the fact that the
post-processing can mask serious defects of the generator. If a stochastic model of
the physical randomness source is available, it can be used in combination with
the raw signal to estimate the entropy and the bias depending on random input
variables and depending on the generator principle.
The raw binary signal is also used in online tests, which should be applied to
the digital noise signal while the generator is running. They provide
ways to stop the TRNG (at least temporarily) when a conspicuous statistical
feature is detected. A special kind of online tests required by the AIS methodol-
ogy is a “total failure test” or Tot test that should be able to immediately detect
total failure of the generator.
Evaluating TRNGs is a difficult task. Clearly, it should not be limited to test-
ing the TRNG output. Following the AIS methodology, the designer should also
provide a stochastic model based on the noise source and the extraction process
and propose statistical and online tests suited to the generator’s principle. The
AIS methodology does not favor or exclude any reasonable TRNG design. The
applicant can also substitute alternative evaluation criteria; however, these must
be clearly justified.
Surprisingly, no design in the literature has so far been evaluated following the
AIS recommendations for high-level security (separate testing of the raw binary
signal and of the internal random numbers, required for PTG.3 and PTG.4). Some
papers just apply the AIS tests T0 to T4 at the generator output. It is also worth
pointing out that no paper has so far proposed a design-specific online test, not
even a design-specific total failure test. Indeed, most recent designs are
still evaluated by their authors following the classical approach from Fig. 1a.
In our approach, we propose a new extension of security in TRNG design,
which is depicted in Fig. 3. This approach significantly simplifies the security
evaluation, the construction of the generator's stochastic model and, last but not
least, the realization of simple and fast embedded tests, while being entirely
compatible with the AIS methodology.
Fig. 3. New security approach in TRNG design based on embedded randomness testing
We propose to measure the source of entropy (e.g. the jitter) before the en-
tropy extraction. This way, the randomness quantification is easier and more
precise. Since the entropy extraction is an algorithmic process, it can be easily
included in the stochastic (mathematical) model. However, two conditions must
be fulfilled in our approach: (i) the method must quantify exactly the same
physical process that is used as the source of randomness; (ii) the entropy
extraction algorithm must be included in the stochastic model very precisely. We
have analyzed many recent TRNG principles. Unfortunately, only a few of them
are directly (without modification) applicable. For example, we can cite those
published in [17], [26], [10] and [27].
Some papers deal with the implementation of embedded tests (FIPS, NIST, etc.)
inside the device [21], [29]. Unfortunately, their authors do not consider the
impact of the tests on the TRNG itself. The tests temporarily generate digital
noise (which lets them pass more easily), whereas during normal operation the
effective noise (and consequently also the entropy) can be significantly smaller.
6 Conclusion
In this paper, we have presented basic approaches to designing modern TRNGs.
We have presented and discussed basic TRNG design evaluation criteria, such as
the sources of randomness and the randomness extraction method applied, the
arithmetic and cryptographic post-processing methods utilized, the output bitrate
and its stability, resource usage, power consumption, technological and design
automation requirements, etc.
We have explained that security parameters like robustness, availability of a
stochastic model, testability, etc. always take priority in a data security system.
We have also proposed a new level of testability criteria: absolute inner
testability. Furthermore, the new TRNG design approach presented in this paper,
which tests the source of entropy before entropy extraction, contributes to the
security enhancement of future TRNG designs. We have also proposed a solution
that can serve for fair TRNG benchmarking. In the last section, we summed up
several recommendations aimed at securing TRNG designs in general.
References
1. Badrignans, B., Danger, J.L., Fischer, V., Gogniat, G., Torres, L.: Security Trends
for FPGAs, 1st edn., ch. 5, pp. 101–135. Springer (2011)
2. Baudet, M., Lubicz, D., Micolod, J., Tassiaux, A.: On the security of oscillator-
based random number generators. Journal of Cryptology 24, 1–28 (2010)
3. Bernard, F., Fischer, V., Valtchanov, B.: Mathematical Model of Physical RNGs
Based on Coherent Sampling. Tatra Mt. Math. Publ. 45, 1–14 (2010)
4. Bochard, N., Bernard, F., Fischer, V., Valtchanov, B.: True-Randomness and Pseu-
dorandomness in Ring Oscillator-Based True Random Number Generators. Inter-
national Journal of Reconfigurable Computing, Article ID 879281, 13 pages (2010)
5. Bochard, N., Fischer, V.: A set of evaluation boards aimed at TRNG design eval-
uation and testing. Tech. rep., Laboratoire Hubert Curien, Saint-Etienne, France
(March 2012), http://www.cryptarchi.org
6. Bucci, M., Luzzi, R.: Design of Testable Random Bit Generators. In: Rao, J.R.,
Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 147–156. Springer, Heidelberg
(2005)
7. Danger, J.L., Guilley, S., Hoogvorst, P.: High Speed True Random Number Gen-
erator based on Open Loop Structures in FPGAs. Elsevier Microelectronics Jour-
nal 40(11), 1650–1656 (2009)
8. Dichtl, M., Golić, J.D.: High-Speed True Random Number Generation with Logic
Gates Only. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727,
pp. 45–62. Springer, Heidelberg (2007)
9. FIPS PUB 140-1: Security Requirements for Cryptographic Modules. National Insti-
tute of Standards and Technology (1994)
10. Fischer, V., Drutarovsky, M.: True Random Number Generator Embedded in Re-
configurable Hardware. In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002.
LNCS, vol. 2523, pp. 415–430. Springer, Heidelberg (2003)
11. Güneysu, T.: True Random Number Generation in Block Memories of Reconfig-
urable Devices. In: Proc. Int. Conf. on Field-Programmable Technology – FPT
2010, pp. 200–207. IEEE (2010)
12. Gyorfi, T., Cret, O., Suciu, A.: High Performance True Random Number Gen-
erator Based on FPGA Block RAMs. In: Proc. Int. Symposium on Parallel and
Distributed Processing, pp. 1–8. IEEE (2009)
13. Hajimiri, A., Lee, T.: A general theory of phase noise in electrical oscillators. IEEE
Journal of Solid-State Circuits 33(2), 179–194 (1998)
14. Holleman, J., Otis, B., Bridges, S., Mitros, A., Diorio, C.: A 2.92 µW Hardware
Random Number Generator. In: IEEE Proceedings of ESSCIRC (2006)
15. Killmann, W., Schindler, W.: AIS 31: Functionality classes and evaluation method-
ology for true (physical) random number generators, version 3.1. Bundesamt für
Sicherheit in der Informationstechnik (BSI), Bonn (2001),
http://www.bsi.bund.de/zertifiz/zert/interpr/ais31e.pdf
182 V. Fischer
16. Killmann, W., Schindler, W.: A proposal for: Functionality classes for random
number generators, version 2.0. Tech. rep., Bundesamt für Sicherheit in der Infor-
mationstechnik (BSI), Bonn (September 2011),
https://www.bsi.bund.de/EN/Home/home_node.html
17. Kohlbrenner, P., Gaj, K.: An Embedded True Random Number Generator for
FPGAs. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium
on Field Programmable Gate Arrays, pp. 71–78 (2004)
18. Majzoobi, M., Koushanfar, F., Devadas, S.: FPGA-Based True Random Num-
ber Generation Using Circuit Metastability with Adaptive Feedback Control. In:
Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 17–32. Springer,
Heidelberg (2011)
19. Marsaglia, G.: DIEHARD: Battery of Tests of Randomness (1996),
http://stat.fsu.edu/pub/diehard/
20. Rukhin, A., Soto, J., Nechvatal, J., Smid, J., Barker, E., Leigh, S., Leven-
son, M., Vangel, M., Banks, D., Heckert, A., Dray, J., Vo, S.: A statistical
test suite for random and pseudorandom number generators for cryptographic
applications. NIST Special Publication 800-22 (2001), http://csrc.nist.gov/,
http://csrc.ncsl.nist.gov/publications/nistbul/html-archive/dec-00.html
21. Santoro, R., Sentieys, O., Roy, S.: On-line monitoring of random number genera-
tors for embedded security. In: Proceedings of IEEE International Symposium on
Circuits and Systems, ISCAS 2009 (2009)
22. Simka, M., Drutarovsky, M., Fischer, V., Fayolle, J.: Model of a True Random
Number Generator Aimed at Cryptographic Applications. In: Proceedings of 2006
IEEE International Symposium on Circuits and Systems, ISCAS 2006, p. 4 (2006)
23. Sunar, B., Martin, W., Stinson, D.: A Provably Secure True Random Number
Generator with Built-In Tolerance to Active Attacks. IEEE Transactions on Com-
puters, 109–119 (2007)
24. Tkacik, T.: A Hardware Random Number Generator. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 450–453. Springer, Heidel-
berg (2003)
25. Valtchanov, B., Aubert, A., Bernard, F., Fischer, V.: Characterization of random-
ness sources in ring oscillator-based true random number generators in FPGAs.
In: 13th IEEE Workshop on Design and Diagnostics of Electronic Circuits and
Systems, DDECS 2010, pp. 1–6 (2010)
26. Valtchanov, B., Fischer, V., Aubert, A.: Enhanced TRNG Based on the Coher-
ent Sampling. In: 2009 International Conference on Signals, Circuits and Systems
(2009)
27. Varchola, M., Drutarovsky, M.: Embedded Platform for Automatic Testing and
Optimizing of FPGA Based Cryptographic True Random Number Generators.
Radioengineering 18(4), 631–638 (2009)
28. Varchola, M., Drutarovsky, M.: New High Entropy Element for FPGA Based True
Random Number Generators. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 351–365. Springer, Heidelberg (2010)
29. Veljkovic, F., Rozic, V., Verbauwhede, I.: Low-Cost Implementations of On-the-
Fly Tests for Random Number Generators. In: Design, Automation, and Test in
Europe – DATE 2012. EDAA (2012)
30. Wold, K., Tan, C.H.: Analysis and Enhancement of Random Number Generator
in FPGA Based on Oscillator Rings. In: 2008 International Conference on Recon-
figurable Computing and FPGAs, pp. 385–390 (2008)
Same Values Power Analysis
Using Special Points on Elliptic Curves
1 Introduction
An approach to prevent the DPA on ECC implementations is to randomize the
base point P at the beginning of an Elliptic Curve Scalar Multiplication (ECSM).
Common randomization techniques are projective randomization [6] and the ran-
dom isomorphic class [10]. However, Goubin pointed out that some points with a
zero value, namely (0, y) and (x, 0), are not randomized [9]. For an elliptic curve
E containing #E points, if an attacker can choose the base point P = (k^{-1} mod
#E)(0, y) for some integer k, he can detect whether the point kP is computed during
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 183–198, 2012.
© Springer-Verlag Berlin Heidelberg 2012
184 C. Murdica et al.
the ECSM of P. This attack is called the Refined Power Attack (RPA). Akishita
and Takagi extended this attack by pointing out that some special points with
no zero value might take a zero value in auxiliary registers during point addition
or doubling [2]. The Zero-Value Point Attack (ZPA) increases the number of
possible special points on an elliptic curve.
In this paper we introduce a new attack called Same-Values Analysis (SVA).
Instead of looking at points that show up zero values, we look at points that
show up equal values during the doubling or addition algorithms. We list the
conditions on special points that have those properties, even if the point is
randomized using the random projective coordinates countermeasure. An Internal
Comparative Power Analysis is used to detect whether the special point appears
during an ECSM. Our attack is the first attack based on Internal Power Analysis
on an ECC implementation. New possible special points on elliptic curves are
given, to which we must pay particular attention in addition to the special points
with zero values given in [2]. Finally, the isogeny defence, sometimes used to
protect against the RPA and the ZPA, must be updated to also prevent our attack.
The rest of the article is structured as follows. In Section 2, we describe some
properties of elliptic curve cryptosystems and give a description of the RPA
and the ZPA. In Section 3, we give a detailed description of the Same-Values
Analysis. Section 4 summarizes the applicability of the RPA, the ZPA and the
SVA on standardized elliptic curves; in this section, we show that the only
standardized curve secured against the RPA and the ZPA is not secured against
the SVA. In Section 5, we discuss the isogeny defence. In Section 6 we discuss
countermeasures to prevent the SVA. Finally, we conclude in Section 7.
In a finite field K = F_p, with p a prime such that p > 3, an elliptic curve can be
described by its Weierstrass form:
E : y^2 = x^3 + ax + b .
We denote by E(K) the set of points (x, y) ∈ K × K satisfying the equation, plus
the point at infinity O. E(K) has an Abelian group structure. Let P1 = (x1, y1)
and P2 = (x2, y2) be two points in E(K), different from the point O. The point
P3 = (x3, y3) = P1 + P2 can be computed as:
x3 = λ^2 − x1 − x2 ,   y3 = λ(x1 − x3) − y1 ,
with λ = (y1 − y2)/(x1 − x2) if P1 ≠ P2, and λ = (3x1^2 + a)/(2y1) if P1 = P2.
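As a concrete illustration of these addition formulas, here is a minimal Python sketch. The curve y^2 = x^3 + 2x + 3 over F_97 and the point (3, 6) are toy values chosen for the example, not parameters from the paper.

```python
# Affine point addition on E: y^2 = x^3 + ax + b over F_p, following the
# lambda formulas above. Toy parameters, for illustration only.
def ec_add(P1, P2, a, p):
    """Add two affine points, neither being the point at infinity O."""
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P1 + P2 = O
    if P1 != P2:
        lam = (y1 - y2) * pow(x1 - x2, -1, p) % p          # chord slope
    else:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

p, a, b = 97, 2, 3                       # toy curve y^2 = x^3 + 2x + 3
P = (3, 6)                               # lies on the curve
assert (P[1] ** 2 - (P[0] ** 3 + a * P[0] + b)) % p == 0
Q = ec_add(P, P, a, p)                   # doubling uses the tangent slope
assert (Q[1] ** 2 - (Q[0] ** 3 + a * Q[0] + b)) % p == 0
```

The modular inversions use Python's `pow(x, -1, p)` (available from Python 3.8), which is exactly the costly operation that the Jacobian coordinates below avoid.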
To avoid costly inversions, one can use the Jacobian projective coordinates sys-
tem. A point P = (x, y) is denoted by P = (X : Y : Z) in Jacobian coordinates,
with x = X/Z^2 and y = Y/Z^3.
Same Values Analysis 185
Algorithm 1. ECDBL^J
Input: P1 = (X1, Y1, Z1) = (λ1^2 x1, λ1^3 y1, λ1)
Output: P3 = (X3, Y3, Z3), P3 = 2P1
1: T4 ← X1, T5 ← Y1, T6 ← Z1
2: T1 ← T4 × T4 ; {= X1^2} (4 ← 2 × 2)
3: T2 ← T5 × T5 ; {= Y1^2} (6 ← 3 × 3)
4: T2 ← T2 + T2 ; {= 2Y1^2} (6 ← 6 + 6)
5: T4 ← T4 × T2 ; {= 2X1Y1^2} (8 ← 2 × 6)
6: T4 ← T4 + T4 ; {= 4X1Y1^2 = S} (8 ← 8 + 8)
7: T2 ← T2 × T2 ; {= 4Y1^4} (12 ← 6 × 6)
8: T2 ← T2 + T2 ; {= 8Y1^4} (12 ← 12 + 12)
9: T3 ← T6 × T6 ; {= Z1^2} (2 ← 1 × 1)
10: T3 ← T3 × T3 ; {= Z1^4} (4 ← 2 × 2)
11: T6 ← T5 × T6 ; {= Y1Z1} (4 ← 3 × 1)
12: T6 ← T6 + T6 ; {= 2Y1Z1} (4 ← 4 + 4)
13: T5 ← T1 + T1 ; {= 2X1^2} (4 ← 4 + 4)
14: T1 ← T1 + T5 ; {= 3X1^2} (4 ← 4 + 4)
15: T3 ← a × T3 ; {= aZ1^4} (4 ← 0 × 4)
16: T1 ← T1 + T3 ; {= 3X1^2 + aZ1^4 = M} (4 ← 4 + 4)
17: T3 ← T1 × T1 ; {= M^2} (8 ← 4 × 4)
18: T3 ← T3 − T4 ; {= −S + M^2} (8 ← 8 − 8)
19: T3 ← T3 − T4 ; {= −2S + M^2 = T} (8 ← 8 − 8)
20: T4 ← T4 − T3 ; {= S − T} (8 ← 8 − 8)
21: T1 ← T1 × T4 ; {= M(S − T)} (12 ← 4 × 8)
22: T4 ← T1 − T2 ; {= −8Y1^4 + M(S − T)} (12 ← 12 − 12)
23: X3 ← T3, Y3 ← T4, Z3 ← T6
24: return (X3, Y3, Z3)
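Algorithm 1 transcribes almost mechanically into code. The sketch below follows the same register schedule T1–T6; the prime, the curve parameter a and the point are toy values, and the result is mapped back to affine coordinates via x = X/Z^2, y = Y/Z^3.

```python
# Jacobian doubling as in Algorithm 1 (ECDBL^J), same register schedule.
def ecdbl_jacobian(X1, Y1, Z1, a, p):
    T4, T5, T6 = X1, Y1, Z1
    T1 = T4 * T4 % p          # X1^2
    T2 = T5 * T5 % p          # Y1^2
    T2 = 2 * T2 % p           # 2*Y1^2
    T4 = T4 * T2 % p          # 2*X1*Y1^2
    T4 = 2 * T4 % p           # S = 4*X1*Y1^2
    T2 = T2 * T2 % p          # 4*Y1^4
    T2 = 2 * T2 % p           # 8*Y1^4
    T3 = T6 * T6 % p          # Z1^2
    T3 = T3 * T3 % p          # Z1^4
    T6 = T5 * T6 % p          # Y1*Z1
    T6 = 2 * T6 % p           # Z3 = 2*Y1*Z1
    T5 = 2 * T1 % p           # 2*X1^2
    T1 = (T1 + T5) % p        # 3*X1^2
    T3 = a * T3 % p           # a*Z1^4
    T1 = (T1 + T3) % p        # M = 3*X1^2 + a*Z1^4
    T3 = T1 * T1 % p          # M^2
    T3 = (T3 - T4) % p        # M^2 - S
    T3 = (T3 - T4) % p        # X3 = T = M^2 - 2S
    T4 = (T4 - T3) % p        # S - T
    T1 = T1 * T4 % p          # M*(S - T)
    T4 = (T1 - T2) % p        # Y3 = M*(S - T) - 8*Y1^4
    return T3, T4, T6

# Toy check: double (3, 6) on y^2 = x^3 + 2x + 3 over F_97 (Z = 1).
p, a = 97, 2
X3, Y3, Z3 = ecdbl_jacobian(3, 6, 1, a, p)
x3 = X3 * pow(Z3 * Z3, -1, p) % p        # back to affine: x = X/Z^2
y3 = Y3 * pow(Z3 ** 3, -1, p) % p        #                 y = Y/Z^3
assert (x3, y3) == (80, 10)
```

Note that no inversion occurs inside `ecdbl_jacobian` itself; the single inversion is deferred to the final conversion back to affine coordinates.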
Algorithm 2. ECADD^J
Input: P1 = (X1, Y1, Z1) = (λ1^2 x1, λ1^3 y1, λ1), P2 = (X2, Y2, Z2) = (λ2^2 x2, λ2^3 y2, λ2)
Output: P3 = (X3, Y3, Z3), P3 = P1 + P2
1: T2 ← X1, T3 ← Y1, T4 ← Z1, T5 ← X2, T6 ← Y2, T7 ← Z2
2: T1 ← T7 × T7 ; {= Z2^2} (2₂ ← 1₂ × 1₂)
3: T2 ← T2 × T1 ; {= X1Z2^2 = U1} (2₁2₂ ← 2₁ × 2₂)
4: T3 ← T3 × T7 ; {= Y1Z2} (3₁1₂ ← 3₁ × 1₂)
5: T3 ← T3 × T1 ; {= Y1Z2^3 = S1} (3₁3₂ ← 3₁1₂ × 2₂)
6: T1 ← T4 × T4 ; {= Z1^2} (2₁ ← 1₁ × 1₁)
7: T5 ← T5 × T1 ; {= X2Z1^2 = U2} (2₁2₂ ← 2₂ × 2₁)
8: T6 ← T6 × T4 ; {= Y2Z1} (1₁3₂ ← 3₂ × 1₁)
9: T6 ← T6 × T1 ; {= Y2Z1^3 = S2} (3₁3₂ ← 1₁3₂ × 2₁)
10: T5 ← T5 − T2 ; {= U2 − U1 = H} (2₁2₂ ← 2₁2₂ − 2₁2₂)
11: T7 ← T4 × T7 ; {= Z1Z2} (1₁1₂ ← 1₁ × 1₂)
12: T7 ← T5 × T7 ; {= Z1Z2H = Z3} (3₁3₂ ← 1₁1₂ × 2₁2₂)
13: T6 ← T6 − T3 ; {= S2 − S1 = R} (3₁3₂ ← 3₁3₂ − 3₁3₂)
14: T1 ← T5 × T5 ; {= H^2} (4₁4₂ ← 2₁2₂ × 2₁2₂)
15: T4 ← T6 × T6 ; {= R^2} (6₁6₂ ← 3₁3₂ × 3₁3₂)
16: T2 ← T2 × T1 ; {= U1H^2} (6₁6₂ ← 2₁2₂ × 4₁4₂)
17: T5 ← T1 × T5 ; {= H^3} (6₁6₂ ← 4₁4₂ × 2₁2₂)
18: T4 ← T4 − T5 ; {= −H^3 + R^2} (6₁6₂ ← 6₁6₂ − 6₁6₂)
19: T1 ← T2 + T2 ; {= 2U1H^2} (6₁6₂ ← 6₁6₂ + 6₁6₂)
20: T4 ← T4 − T1 ; {= −H^3 − 2U1H^2 + R^2 = X3} (6₁6₂ ← 6₁6₂ − 6₁6₂)
21: T2 ← T2 − T4 ; {= U1H^2 − X3} (6₁6₂ ← 6₁6₂ − 6₁6₂)
22: T6 ← T6 × T2 ; {= R(U1H^2 − X3)} (9₁9₂ ← 3₁3₂ × 6₁6₂)
23: T1 ← T3 × T5 ; {= S1H^3} (9₁9₂ ← 3₁3₂ × 6₁6₂)
24: T1 ← T6 − T1 ; {= −S1H^3 + R(U1H^2 − X3) = Y3} (9₁9₂ ← 9₁9₂ − 9₁9₂)
25: X3 ← T4, Y3 ← T1, Z3 ← T7
26: return (X3, Y3, Z3)
The binary method is vulnerable to the Simple Power Analysis (SPA). The
Montgomery Ladder, which is secure against the SPA, can be used instead.
Our attack is presented against the Montgomery Ladder, but it can also work
on other algorithms.
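The ladder's regular structure (one addition and one doubling per scanned bit, whatever the bit value) can be sketched over any group. The toy below uses integers mod n, written additively, as a stand-in for curve points; it illustrates only the ladder's control flow, not a side-channel-hardened implementation.

```python
# Montgomery Ladder sketch: R1 - R0 = P is invariant, and both an
# addition and a doubling are performed for every scanned bit.
def montgomery_ladder(d, P, add, dbl, zero):
    R0, R1 = zero, P
    for bit in bin(d)[2:]:                      # most significant bit first
        if bit == '1':
            R0, R1 = add(R0, R1), dbl(R1)       # R0 += R1; R1 = 2*R1
        else:
            R1, R0 = add(R0, R1), dbl(R0)       # R1 += R0; R0 = 2*R0
    return R0

n = 1009                                        # toy additive group Z_n
add = lambda x, y: (x + y) % n
dbl = lambda x: 2 * x % n
assert montgomery_ladder(123, 7, add, dbl, 0) == 123 * 7 % n
```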
The DPA countermeasures of Section 2.3 do not protect against the RPA [9] and
the ZPA [2]. The RPA assumes that the scalar is fixed for several ECSMs and
that the base point P can be chosen.
The attacker starts by finding special points with zero values on the given
elliptic curve E:
– Point (x, 0): a point of this form is of order 2. In elliptic curve cryptosys-
tems, the order of the provided base point is checked, and points of order 2
never appear during an ECSM.
– Point (0, y): this form does not imply a special order of the point, so it can
appear during the ECSM.
Let P0 = (0, y). Suppose that the Montgomery Ladder (Algorithm 4) is used
to compute an ECSM. Suppose that the attacker already knows the N − i − 1
leftmost bits of the fixed scalar d = (d_{N−1}, d_{N−2}, ..., d_{i+1}, d_i, d_{i−1}, ..., d_0)_2. He
tries to recover the unknown bit d_i.
The attacker computes the point P = ((d_{N−1}, d_{N−2}, ..., d_{i+1})_2^{-1} mod #E) P0
and gives P to the targeted chip, which computes the ECSM using the scalar d.
If d_i = 0, then the point P0 will be doubled during the ECSM. If the attacker
is able to recognize a zero value in a register during the doubling, he can then
conclude whether his hypothesis was correct.
The ZPA [2] uses the same approach, except that the attacker is not only
interested in zero values in the coordinates, but also in zero values in intermediate
registers when computing the double of a point or the addition of two points.
Such a point is defined as a zero-value point.
Remark 1. The condition (ED2) can be avoided by changing the way T is computed
in ECDBL^J (Algorithm 1). See [2] for more details.
Proof. Given the definition of a same-values point and a point P1 = (X1 :
Y1 : Z1) = (λ1^2 x1 : λ1^3 y1 : λ1), we have to check the equalities during the
doubling whatever the value of λ1. So we have to check equalities between terms
with the same degree in λ1, and zero values between all terms. We denote by Si
the set of values that involve a term in λ1 with degree i. Looking at Algorithm 1,
we have:
– S2 = {X1, Z1^2},
– S4 = {X1^2, Y1Z1, 2Y1Z1, 2X1^2, 3X1^2, aZ1^4, M},
– S6 = {Y1^2, 2Y1^2},
– S8 = {2X1Y1^2, M^2, −S + M^2, T, S − T},
– S12 = {4Y1^4, 8Y1^4, M(S − T), −8Y1^4 + M(S − T)}
Equal values can only be found within the same set. Checking the equality of each
pair of terms per set and expanding gives the relations of the theorem. Checking
for zero values among all terms gives no additional conditions.
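The grouping by λ-degree can be checked numerically: under the randomization (X, Y, Z) = (λ^2 x, λ^3 y, λ), every term of degree i is multiplied by λ^i, so the quotient of two terms of the same degree does not depend on λ, while a relation between terms of different degrees cannot hold for all λ. A small Python sketch with toy values (F_97, affine point (3, 6)):

```python
# The quotient of two same-degree terms (here X1^2 and Y1*Z1, both of
# degree 4 in lambda) is invariant under the projective randomization.
p = 97
x1, y1 = 3, 6                                  # toy affine point (z1 = 1)
baseline = x1 * x1 * pow(y1, -1, p) % p        # quotient for lambda = 1

for l in (2, 5, 11, 42):                       # several randomizations
    X1, Y1, Z1 = l * l * x1 % p, l ** 3 * y1 % p, l % p
    q = X1 * X1 * pow(Y1 * Z1, -1, p) % p      # lambda^4 cancels out
    assert q == baseline                       # equality is lambda-free
```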
The Doubling Attack [8] consists of comparing two traces: one during the
computation of the ECSM with the base point P, and one during the computation
of the ECSM with the base point 2P. However, this attack does not work if one
of the countermeasures of Section 2.3 is used, which is not the case for the RPA,
the ZPA and the SVA.
A different approach of Collision Power Analysis was introduced by Schramm
et al. [11] to attack an implementation of the DES. Their attack consists in de-
tecting collisions within a single trace during the computation of an algorithm,
rather than across different traces. Clavier et al. presented an attack against a
protected implementation of the AES using the same principle [5].
An internal collision attack on ECC and RSA implementations was proposed
in [13], but it is restricted to inputs of low order, which are avoided in elliptic
curve cryptosystems. The authors of [7] combined active and passive attacks: they
introduce a fault so that the high-order base point becomes a low-order base point
of another curve, and they exploit the fact that the point at infinity shows up
under certain conditions on the scalar used.
Our attack is the first attack based on Internal Collision Analysis on an ECC
implementation with a base point of high order.
The attacker recovers several traces of the power consumption during the com-
putation of dP. He tries to detect internal collisions of power consumption in each
trace using the methodology of [11] and [5]. If a collision is detected, he can con-
clude that d_i = 0. Otherwise, he concludes that d_i = 1.
Using this method, the attacker can recursively recover all bits of d.
We can see that our attack works against all the standardized curves above. Note
that the curve secp224r1 does not contain any zero-value point, but it contains
same-values points. The curve secp224r1 is thus secure against the RPA and the
ZPA, but not against the SVA.
As mentioned in Section 2.4, a countermeasure to prevent the RPA and the ZPA
consists of using a curve E′ isogenous to the original curve E, such that E′ does
not contain any zero-value points. This countermeasure was introduced in [12],
but only to prevent the RPA. The countermeasure was extended in [3] to also
prevent the ZPA. The authors also give an algorithm that, given a curve E, finds a
curve E′ l-isogenous to E such that:
– l is as small as possible (if l > 107, the countermeasure is not applied),
– E′ does not contain any zero-value points,
– E′, with equation y^2 = x^3 + a′x + b′, satisfies a′ = −3, for efficiency.
Those conditions are not sufficient because of our new attack, the SVA.
In [3], isogenous curves of the standardized SECG curves [1] are given. We denote
by I(secpXrY) a possible isogenous curve of secpXrY satisfying the conditions
above, computed using the algorithm given in [3]. The curve parameters are
given in the appendix. If the isogeny degree is greater than 107, I(secpXrY) is not
computed (this is the case for secp160r2, secp192r1 and secp384r1). We give
below a summary of the presence of same-values points on these curves. Degree
is the degree of the isogeny between secpXrY and I(secpXrY).
We can see that the isogenous curves obtained with the algorithm in [3] are
secure against the RPA and the ZPA, but not against the SVA. If one uses the
isogeny defence as a countermeasure, he must update the algorithm to find
isogenous curves that are also secure against the SVA.
7 Conclusion
We introduced the first attack on an Elliptic Curve Cryptosystem implementation
based on internal collision analysis with a base point of high order. The attack,
called Same-Values Analysis, is based on special points that show up equalities in
intermediate values during the doubling of a point. These special points are called
same-values points. The random projective coordinates countermeasure [6] does
not prevent the attack. We showed that the only standardized SECG curve [1]
that does not contain any zero-value point usable for the RPA or the ZPA does
contain same-values points: we can then apply our attack on this curve. We also
showed that the isogeny defence used to prevent the RPA and the ZPA must be
updated to also prevent the SVA.
Scalar randomization [6], scalar splitting [4] or base point blinding [6] should
be used to protect against the RPA, the ZPA and the SVA.
Further work is to evaluate the SVA on real implementations and to compare it
with the RPA and the ZPA.
References
1. Standard for Efficient Cryptography (SECG), http://www.secg.org/
2. Akishita, T., Takagi, T.: Zero-Value Point Attacks on Elliptic Curve Cryptosystem.
In: Boyd, C., Mao, W. (eds.) ISC 2003. LNCS, vol. 2851, pp. 218–233. Springer,
Heidelberg (2003)
3. Akishita, T., Takagi, T.: On the Optimal Parameter Choice for Elliptic Curve
Cryptosystems Using Isogeny. In: Bao, F., Deng, R., Zhou, J. (eds.) PKC 2004.
LNCS, vol. 2947, pp. 346–359. Springer, Heidelberg (2004)
4. Ciet, M., Joye, M.: (Virtually) Free Randomization Techniques for Elliptic Curve
Cryptography. In: Qing, S., Gollmann, D., Zhou, J. (eds.) ICICS 2003. LNCS,
vol. 2836, pp. 348–359. Springer, Heidelberg (2003)
5. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Improved Collision-
Correlation Power Analysis on First Order Protected AES. In: Preneel, B., Takagi,
T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 49–62. Springer, Heidelberg (2011)
6. Coron, J.-S.: Resistance against Differential Power Analysis for Elliptic Curve
Cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp.
292–302. Springer, Heidelberg (1999)
7. Fan, J., Gierlichs, B., Vercauteren, F.: To Infinity and Beyond: Combined Attack
on ECC Using Points of Low Order. In: Preneel, B., Takagi, T. (eds.) CHES 2011.
LNCS, vol. 6917, pp. 143–159. Springer, Heidelberg (2011)
8. Fouque, P.-A., Valette, F.: The Doubling Attack – Why Upwards Is Better than
Downwards. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS,
vol. 2779, pp. 269–280. Springer, Heidelberg (2003)
9. Goubin, L.: A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems. In:
Desmedt, Y.G. (ed.) PKC 2003. LNCS, vol. 2567, pp. 199–210. Springer, Heidelberg
(2002)
10. Joye, M., Tymen, C.: Protections against Differential Analysis for Elliptic Curve
Cryptography. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS,
vol. 2162, pp. 377–390. Springer, Heidelberg (2001)
11. Schramm, K., Wollinger, T., Paar, C.: A New Class of Collision Attacks and Its
Application to DES. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 206–
222. Springer, Heidelberg (2003)
12. Smart, N.P.: An Analysis of Goubin’s Refined Power Analysis Attack. In:
Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 281–290.
Springer, Heidelberg (2003)
13. Yen, S.-M., Lien, W.-C., Moon, S.-J., Ha, J.C.: Power Analysis by Exploiting
Chosen Message and Internal Collisions – Vulnerability of Checking Mechanism
for RSA-Decryption. In: Dawson, E., Vaudenay, S. (eds.) Mycrypt 2005. LNCS,
vol. 3715, pp. 183–195. Springer, Heidelberg (2005)
secp160r1
secp160r1:
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFF
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFC
b = 1C97BEFC 54BD7A8B 65ACF89F 81D4D4AD C565FA45
I(secp160r1) (the isogeny degree is 13):
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFF
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFC
b = 1315649B C931E413 D426D94E 979B5FF8 83FE89C1
secp224r1
secp224r1:
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 00000000 00000000 00000001
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE FFFFFFFF FFFFFFFF FFFFFFFE
b = B4050A85 0C04B3AB F5413256 5044B0B7 D7BFD8BA 270B3943 2355FFB4
I(secp224r1) = secp224r1 (the isogeny degree is 1)
secp256r1
secp256r1:
p = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFF
a = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFC
b = 5AC635D8 AA3A93E7 B3EBBD55 769886BC 651D06B0 CC53B0F6 3BCE3C3E
27D2604B
I(secp256r1) (the isogeny degree is 23):
p = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFF
a = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFC
b = ACAA2B48 AECF20BC 9AB54168 A691BCE4 117A6909 342F0635 C278870F
3B71578F
secp521r1
secp521r1:
The Schindler-Itoh-attack in Case of Partial Information Leakage
Alexander Krüger
1 Introduction
Simple Power Analysis (SPA) and Differential Power Analysis (DPA) are a major
threat to implementations of ECC/RSA cryptosystems. In an implementation
of ECC, the double-and-add algorithm can be used to calculate the point dP
given the point P and a secret scalar d. Given one power trace, the attacker
might be able to distinguish between additions and doublings. Thus she can
find out the secret key, because in every round of the algorithm an addition is
performed if and only if the corresponding bit is one. A countermeasure against
this attack is the insertion of dummy additions, which means that an addition
is performed in every round of the algorithm regardless of the corresponding bit.
Even in this case a DPA is still possible, in which the attacker collects several power
traces and calculates the correlation between the power consumption and certain
intermediate values, see [1]. A countermeasure against DPA is (additive) exponent
blinding: here, given the secret scalar d and the order y of the base point P, a
random blinding factor r is chosen for every ECSM and the device computes
(d + ry)P, which equals dP since yP = O.
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 199–214, 2012.
© Springer-Verlag Berlin Heidelberg 2012
200 A. Krüger
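The blinding identity can be demonstrated in a toy group: below, the multiplicative group mod 101 stands in for the elliptic curve and exponentiation for scalar multiplication, so d + ry plays the role of the blinded scalar. All parameters are illustrative only.

```python
# Additive exponent blinding: since g has order y, the blinded exponent
# d + r*y produces the same group element as d for any blinding factor r.
import random

p, g = 101, 2          # toy group; 2 generates the full group mod 101
y = 100                # order of g
d = 57                 # "secret scalar"

for _ in range(5):
    r = random.randrange(1, 2 ** 16)           # fresh blinding factor
    v = d + r * y                              # blinded exponent
    assert pow(g, v, p) == pow(g, d, p)        # same result, new trace
```

Each execution processes a different bit pattern v, which is what defeats the averaging step of a DPA.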
2.1 Notation
As in the paper [6] we will use the following notation: We have an elliptic curve
E, defined over a finite field F, and a base point P ∈ E(F). Furthermore we have
a secret scalar d of bit length k, i.e. d < 2^k. The cryptographic device computes
the point dP and sends it to another party, as in the Diffie-Hellman key exchange.
So the attacker is assumed to know the points P and dP and wants to find the
secret scalar d.
The Schindler-Itoh-attack in Case of Partial Information Leakage 201
If the attacker is able to decide with a certain error rate from a single trace
whether a given operation is a dummy addition, she can use the Schindler-Itoh
attack to find the scalar. The Schindler-Itoh attack applies to RSA and to ECC;
here only ECC is considered. There are two versions of the Schindler-Itoh attack:
the basic version and the enhanced version.
Basic Version. In the basic version the attacker finds a t-collision, i.e. t traces
where the same factor is used for exponent blinding in each trace, and uses a
majority decision for every single bit of the blinded secret exponent. This way
she can correct the errors she made when guessing the secret scalar from single
traces.
There are essentially three phases of the basic version of the attack:
q_{t,b} = Σ_{s=u+1}^{2u+1} C(t, s) · ε_b^s · (1 − ε_b)^{t−s} .   (2)
Now if t = 2u + 1 traces with the same blinding factor are found, (k + R) · q_{t,b}
false guesses are to be expected. The attacker does not know for which bits the
majority decision was wrong. This yields Σ_{i=0}^{(k+R)·q_{t,b}} C(k + R, i) expected
operations to correct the remaining errors by brute force. Note that the attacker
is assumed to know dP and is thus able to verify a certain hypothesis for d.
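Equation (2) can be evaluated directly; the sketch below (error rate ε_b passed as `eps_b`, with example values) shows how quickly the majority vote suppresses single-trace errors.

```python
# Probability that a majority vote over t = 2u+1 traces is wrong, when a
# single-trace guess is wrong with probability eps_b (equation (2)).
from math import comb

def q_majority_error(t, eps_b):
    u = (t - 1) // 2
    return sum(comb(t, s) * eps_b ** s * (1 - eps_b) ** (t - s)
               for s in range(u + 1, t + 1))

assert q_majority_error(1, 0.2) == 0.2        # t = 1: a single guess
assert q_majority_error(11, 0.2) < 0.02       # 11-trace vote, same rate
```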
Enhanced Version. In the enhanced version the attacker finds several pairs of
u-tuples of traces, where for each pair the sum of the blinding factors of both u-
tuples is the same. This means we have two u-tuples of indices (j1, j2, ..., ju)
and (i1, i2, ..., iu), corresponding to blinded scalars v_k = d + r_k·y, such that
r_{j1} + r_{j2} + ... + r_{ju} = r_{i1} + r_{i2} + ... + r_{iu}. Finding several collisions yields a
system of linear equations in the blinding factors, which can be solved. This
way the secret scalar d can be found. The steps of the enhanced version of the
attack are the following:
1. Find several u-tuples with the same sum of blinding factors. Obtain a system
of linear equations in the blinding factors r1, r2, ..., rN over ℤ.
2. Find r′1, r′2, ..., r′N with (r1, r2, ..., rN) = (r′1, r′2, ..., r′N) + (c, c, ..., c) by
solving this system.
3. Compute for all j < N: ṽ_j − r′_j·y = d + r_j·y + e_j − (r_j − c)·y = d + c·y + e_j.
Then determine d from d + c·y ≡ d (mod y). In [6] an explicit algorithm for this
step is given.
In the enhanced version the definition of the error vectors e_j is different than in
the basic version: e_j is defined by e_j := ṽ_j − v_j, where v_j is the correct blinded
exponent and ṽ_j is the erroneous blinded exponent, which is the outcome of the
SPA.
In the original Schindler-Itoh setting, the whole blinded scalar leaks with a
certain error rate. We will now turn to scenarios where only partial information
about the secret scalar leaks with a certain error rate. Such scenarios are plau-
sible if a window algorithm, where there are several types of additions, is used
instead of the double-and-add algorithm. Perhaps the attacker can distinguish
between some types of addition, but cannot distinguish between all types of ad-
dition, revealing only some information about the secret scalar. As an example
we consider a 2-bit window algorithm with dummy additions. Given an elliptic
curve E defined over a finite field F, this algorithm looks like this:
Input: A point P in E(F), a scalar d = Σ_{i=0}^{n−1} d_i·2^i with d_i ∈ {0, 1} and n even
Output: The point Q = dP
1. Precompute 2P and 3P
2. Q := 0
3. Q̃ := 0
4. i := n − 1
5. While (i > 0)
5.1 Q := 2Q
5.2 Q := 2Q
5.3 If d_i = 1 and d_{i−1} = 1, then Q := Q + 3P
5.4 If d_i = 1 and d_{i−1} = 0, then Q := Q + 2P
5.5 If d_i = 0 and d_{i−1} = 1, then Q := Q + P
5.6 If d_i = 0 and d_{i−1} = 0, then
5.6.1 Choose x ∈ {1, 2, 3} randomly
5.6.2 Q̃ := Q + xP
5.7 i := i − 2
6. Return Q
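The control flow of this window algorithm can be checked over a toy additive group, with integers mod n standing in for curve points so that dP is plain multiplication; the dummy additions write to a throwaway accumulator exactly as in the listing. A sketch:

```python
# 2-bit window scalar multiplication with dummy additions, over Z_n.
import random

def window2_mult(d, P, n, bits):
    """Compute d*P mod n; bits is the (even) bit length used for d."""
    P2, P3 = 2 * P % n, 3 * P % n          # precomputed 2P and 3P
    Q = Q_tilde = 0
    i = bits - 1
    while i > 0:
        Q = 2 * Q % n                      # two doublings per window
        Q = 2 * Q % n
        di, di1 = (d >> i) & 1, (d >> (i - 1)) & 1
        if di and di1:
            Q = (Q + P3) % n
        elif di:
            Q = (Q + P2) % n
        elif di1:
            Q = (Q + P) % n
        else:                              # dummy addition, result unused
            Q_tilde = (Q + random.choice([P, P2, P3])) % n
        i -= 2
    return Q

n = 1009
assert window2_mult(0b10011100, 5, n, 8) == 0b10011100 * 5 % n
```

Note that an addition is performed in every iteration, so a trace shows the same doubling/addition rhythm regardless of the scalar.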
Two bits of the secret scalar are considered at once. After two doublings there
are four possible types of operations: addition of P, addition of 2P, addition of
3P, and dummy addition. A dummy addition is randomly one of the three types
of addition, whose result is not used. (For a further analysis of the algorithm, see
[5], p. 614.) The dummy additions are a countermeasure against SPA. Without
dummy additions, an attacker able to distinguish between additions can find
out whether d_l = d_{l−1} = 0 holds. Clearly, if an attacker can discriminate between
all four of these types, she can find out the whole scalar by SPA. If an attacker can
discriminate between all four types of operations with a tolerable error rate, the
attacker can just apply the normal Schindler-Itoh attack. But it is also possible
that the attacker can only gain partial information by SPA, which means that
she can discriminate between different classes of operations with a certain error
rate, but not between all four types of operations. We will consider two scenarios:
1. The attacker knows the used exponentiation algorithm and can differentiate
between doublings and additions. Furthermore she can distinguish additions
which are necessary for the calculation of dP from dummy additions with
a certain error rate. But she cannot decide whether P, 2P or 3P is added.
This is plausible if the attacker uses an address-bit attack, see [3].
2. The attacker knows the used exponentiation algorithm and can differentiate
between doublings and additions. Also she can decide whether P, 2P, or 3P
is added with a certain error rate, but she cannot detect dummy additions.
In both scenarios the attacker can find out some information on the secret scalar
with a certain error rate, but not all information. In the two following chapters
it will be analyzed whether an attack like the basic version of the Schindler-
Itoh attack can be used to find the secret scalar. For that, the attacker must be
able to find collisions of blinding factors and to correct her guessing errors. After
that she can use a variation of the BSGS algorithm to find the secret scalar.
φ : ℕ → ℕ,
(a_{n−1}, a_{n−2}, ..., a_0)_2 with even n ↦ (b_{n/2−1}, b_{n/2−2}, ..., b_0)_2,
with b_i = a_{2i} + a_{2i+1} − a_{2i} a_{2i+1}.
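In other words, φ records for every 2-bit window whether the window is nonzero: b_i = a_{2i} + a_{2i+1} − a_{2i}a_{2i+1} is exactly the logical OR of the two bits. A direct Python transcription:

```python
# phi maps an n-bit scalar (n even) to the n/2 per-window "nonzero" flags.
def phi(d, n):
    assert n % 2 == 0
    bits = [(d >> i) & 1 for i in range(n)]      # a_0, ..., a_{n-1}
    out = 0
    for i in range(n // 2):
        b = bits[2 * i] | bits[2 * i + 1]        # OR = a + a' - a*a' for bits
        out |= b << i
    return out

# The windows of 0b10011100 are, from the low end, 00, 11, 01, 10,
# so the flags are 0, 1, 1, 1:
assert phi(0b10011100, 8) == 0b1110
```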
So the decision rule is reasonable for small error rates. But finding a good value
for the threshold should be harder than in the original scenario of the Schindler-
Itoh attack, since the mean of HAM((x_j ⊕ e_j) ⊕ (x_m ⊕ e_m)) without a collision
is below 1/2. This means that the difference between the two cases is not as big
as in the leakage scenario originally considered in [6].
Remark 1. If all decisions are correct, the expected number of traces which the
attacker needs to find a t-collision is 2^{αR}, where
α = (1 + log(t!) + 1 − R log(2)) / (R t log(2) − 1).   (4)
If wrong decisions occur, more traces are needed. Note that the probability of a
wrong decision is bigger here than in the original setting of the Schindler-Itoh
attack.
Once a collision of t traces is found, the second phase of the attack works just
like in [6]: For every single bit the majority decision is applied.
Proposition 1. If φ(v_i) is known, then the blinded secret scalar v_i can be cal-
culated with approximately 3^{3(k+R)/16} steps on average.
Proof. There are (k + R)/2 additions, because every addition corresponds to
two bits. On average 1/4 of these additions are dummy additions, so there are
3(k + R)/8 additions which are not dummy additions. Each of these additions
corresponds to two bits. We can modify the BSGS algorithm in the following way:
Write v_j = q · 2^{(k+R)/2} + r with r < 2^{(k+R)/2}.
Clearly q = Σ_{j=(k+R)/2}^{k+R−1} v_{i,j}·2^{j−(k+R)/2} and r = Σ_{j=0}^{(k+R)/2−1} v_{i,j}·2^j.
Proposition 2. Approximately 2^{0.2972(k+R)} trials are needed on average
to find the secret scalar. If all majority decisions are correct, the attack
using the Schindler-Itoh approach is more efficient than a BSGS algorithm if
R < ((1 − 2 · 0.2972)/(2 · 0.2972)) · k ≈ 0.6825k.
If R is much smaller than k, the attack is significantly more efficient than the
standard attack. Note that this is the usual case in practice. Just like in [6], the
probability q_{t,b} that for a given addition the majority decision yields the wrong
result is:
q_{t,b} = Σ_{s=u+1}^{2u+1} C(t, s) · ε_b^s · (1 − ε_b)^{t−s} ,   t = 2u + 1.   (5)
If t traces with the same blinding factor are found, the expected number l of wrong majority decisions is:
$$l = q_{t,\epsilon_b} \cdot \frac{k+R}{2}. \tag{6}$$
Now the number L of possible locations of these at most l errors is:
$$L := \sum_{i=0}^{l} \binom{(k+R)/2}{i}. \tag{7}$$
For example, for k = 256 and R = 16 this means that the attack is more efficient than the standard attack iff log2(L) < 47.168. For k = 256 and R = 32 the attack is more efficient iff log2(L) < 42.413. Note that approximately $2^{80+\log_2(L)}$ trials are necessary to find the secret scalar in case of R = 16 and approximately $2^{86+\log_2(L)}$ trials in case of R = 32, compared to $2^{128}$ trials in case of the standard attack. Table 1 shows different values for M = log2(L) + 0.2972·(k+R) for different error rates ε_b and different values for t, where a t-collision is found.
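The quantities of Equations (5)-(7) and the resulting exponent M can be reproduced with a short sketch; the function name is illustrative and `eps` stands for the per-decision error rate ε_b:

```python
from math import comb, log2

def workload_exponent(k, R, t, eps):
    """M = log2(L) + 0.2972*(k+R): q is the probability of a wrong
    majority decision (Eq. 5), l the expected number of wrong decisions
    (Eq. 6), and L the number of possible error locations (Eq. 7)."""
    u = (t - 1) // 2
    q = sum(comb(t, s) * eps**s * (1 - eps)**(t - s)
            for s in range(u + 1, t + 1))
    n_ops = (k + R) // 2          # number of additions / decisions
    l = int(q * n_ops)            # expected number of wrong decisions
    L = sum(comb(n_ops, i) for i in range(l + 1))
    return log2(L) + 0.2972 * (k + R)
```

For ε_b = 0 the expression reduces to M = 0.2972·(k+R), i.e. roughly 80.8 for (k, R) = (256, 16).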
Table 1. Attack from chapter 4: values for M for (k,R)=(256,16) and (k,R)=(256,32) for different values of t and different values of ε_b. The attacker needs $2^M$ trials on average.
definition of α. If wrong decisions occur, more traces are needed. Note that the probability of a wrong decision is larger here than in the original setting of the Schindler-Itoh-attack.
After finding a class of t traces with the same exponent the majority decision
rule can be applied for every single bit just like in [6].
Lemma 5. After the attacker has successfully distinguished between the three types of addition, she can find out the secret scalar with approximately $2^{(k+R)/4}$ trials on average.

Proof. There are (k+R)/2 additions. For each addition the attacker has to find out whether it is a dummy addition. With a modified BSGS-Algorithm as in Proposition 2 she needs $2^{(k+R)/4}$ steps on average to do so.
For reasonable values of k and R, this is more efficient than the standard attack using the BSGS-algorithm. We now estimate the number of trials to correct the guessing errors. Note that the attacker has three different possible guesses for each operation (addition of P, 2P and 3P), so she cannot always apply the majority decision rule straightforwardly. For example, if the attacker has found a collision of three traces with the same blinded scalar, it is possible that she observed one addition of P, one addition of 2P and one addition of 3P at one position. To solve this problem, the majority decision rule is applied bitwise. A better decision strategy might further improve the attack. Straightforward combinatorics shows that the probability of a wrong majority decision for a given operation is:
$$q_{t,\epsilon_b} = \sum_{s=u+1}^{2u+1} \binom{t}{s}\, \epsilon_b^{\,s}\,(1-\epsilon_b)^{t-s} \cdot \left( \frac{2}{3} + \frac{1}{3 \cdot 2^{s-1}} \sum_{i=0}^{s-u-1} \binom{s}{i} \right), \qquad t = 2u+1. \tag{8}$$
If t traces with the same blinding factor are found, the expected number l of wrong majority decisions is:
$$l = q_{t,\epsilon_b} \cdot \frac{3(k+R)}{8}. \tag{9}$$
Now the number L of possible locations of these at most l errors is:
$$L := \sum_{i=0}^{l} \binom{3(k+R)/8}{i}. \tag{10}$$
Proposition 4. On average $L \cdot 2^{(k+R)/4 + 1}$ trials are enough to correct the wrong majority decisions. The attack is on average more efficient than the BSGS-algorithm if $\log_2(L) < \frac{k-R-4}{4}$.

Proof. Follows from Lemma 5 and the fact that for every error there are two possible corrections.
The Schindler-Itoh-attack in Case of Partial Information Leakage 209
So for k = 256 and R = 16 this means log2(L) < 59, and for k = 256 and R = 32 this means log2(L) < 55. Note that on average $2^M$ trials are necessary to find the secret scalar, where M = 69 + log2(L) if k = 256 and R = 16, and M = 73 + log2(L) if k = 256 and R = 32, compared to $2^{128}$ trials in case of the BSGS-algorithm. Table 2 shows different values for M for different error rates ε_b and different values for t, where a t-collision is found.
Table 2. Attack from Section 5.1: values for M for (k,R)=(256,16) and (k,R)=(256,32) for different values of t and different values of ε_b. The attacker needs $2^M$ trials on average.
Thus, in case of a dummy addition, the probability that the attacker has made the same guess four or five times is significantly lower than in the case that the operation is necessary for the calculation. Therefore the attacker can not only correct her guessing errors, but will additionally obtain some information about the location of the dummy additions.

Definition 1. Let us call an operation where she made the same guess at most three times suspicious. Let N1 be the number of suspicious operations and D1 the number of suspicious dummy operations. Let N2 be the number of not suspicious operations and D2 the number of not suspicious dummy operations.
With high probability most of the suspicious operations are dummy operations and only few dummy operations are not suspicious, i.e. N1 − D1 and D2 are small. The attacker searches the set of suspicious operations to find the N1 − D1 suspicious non-dummy operations, and searches the set of not suspicious operations to find the D2 dummy operations which are not suspicious. This way she can find the whole secret scalar much faster than by blindly trying all possible locations. The following proposition gives the average workload for an attack for t = 5:
Proof. The formulas for the average values of N1, N2, D1 and D2 are obtained straightforwardly. The attacker needs Ω1 trials to find the suspicious non-dummy operations and Ω2 trials to find the not suspicious dummy operations. With a variation of the BSGS-algorithm as in Lemma 2 this can be reduced to its square root.
This means that on average $2^M$ trials with $M = \log_2(\sqrt{\Omega_1} \cdot \sqrt{\Omega_2}) + \log_2(L)$ are necessary to find the secret scalar for t = 5. For the definition of L see Section 5.1; the factor L derives from the correction of guessing errors. Table 4 shows the value of M for (k,R)=(256,16) and (k,R)=(256,32) and for different error rates ε_b.
Thus this is clearly a further improvement over the attack in Section 5.1, and an attack on an implementation with k = 256 and R ≤ 32 becomes definitely feasible for error rates up to 0.15.
Table 4. Attack from Section 5.2: values for M for (k,R)=(256,16) and (k,R)=(256,32) for t = 5 and different values of ε_b. The attacker needs $2^M$ trials on average.
A Barrier in the First Phase. In the first phase the attacker has to detect collisions of sums of blinding factors. She decides for a collision if $HAM(NAF(\sum_{k=1}^{u} \tilde{v}_{j_k} - \sum_{k=1}^{u} \tilde{v}_{i_k})) < b_0$ for a certain threshold $b_0$. Essential for the decision rule is the fact that
$$\sum_{k=1}^{u} \tilde{v}_{j_k} - \sum_{k=1}^{u} \tilde{v}_{i_k} = r\,y + \sum_{k=1}^{u} e_{j_k} - \sum_{k=1}^{u} e_{i_k},$$
where $r = 0$ if $\sum_{k=1}^{u} r_{j_k} = \sum_{k=1}^{u} r_{i_k}$ and $r \in \mathbb{Z} \setminus \{0\}$ else.
Now consider the case where the attacker cannot find out ṽj = vj + ej, because she gets only partial information from SPA. Just like in the discussion of the first scenario, we can define a map φj : ℕ → ℕ, which is not injective. Given a blinded secret scalar vj, the attacker can find φj(vj) + ej. Here φj(vj) contains partial information about vj and ej is an error vector. We have
$$\sum_{k=1}^{u} \left(\varphi_{j_k}(v_{j_k}) + e_{j_k}\right) - \sum_{k=1}^{u} \left(\varphi_{i_k}(v_{i_k}) + e_{i_k}\right) = \sum_{k=1}^{u} \varphi_{j_k}(v_{j_k}) - \sum_{k=1}^{u} \varphi_{i_k}(v_{i_k}) + \sum_{k=1}^{u} e_{j_k} - \sum_{k=1}^{u} e_{i_k}. \tag{13}$$
But this is not the case in general. In fact it is the case if the following three conditions hold:
1. There is one single map φ with φj = φ for all j, i.e. φj does not depend on j.
2. φ(a + b) = φ(a) + φ(b) and φ(a − b) = φ(a) − φ(b) for all a > b.
3. φ(0) = 0.
In the first scenario, which is considered in chapter 4, the first condition holds; we even defined φ explicitly. In the second scenario, which is considered in chapter 5, the first condition does not hold, because it is randomly decided which type of addition is performed when a dummy addition is performed. This means φj(vj) does not only depend on vj, but also on random decisions the cryptographic device made when the j-th power trace was recorded. Thus the map depends on j. This means the first condition may be fulfilled sometimes, but is not fulfilled always. However, even if the first condition is fulfilled, there is no reason at all to assume that the second condition is fulfilled as well, because the map depends on which information the attacker can get. There is no reason why this information should lead to an additive map. This is why finding collisions of sums of blinding factors should be impossible if only partial information is available. But in phase one it is possible to find a weaker condition 2') than conditions 2) and 3), so that the Hamming weight of the NAF of (16) will be small and it is possible to find collisions:

2') $HAM(NAF(\sum_{k=1}^{u} \varphi_{j_k}(v_{j_k}) - \sum_{k=1}^{u} \varphi_{i_k}(v_{i_k})))$ is sufficiently small.
A Barrier in the Third Phase. But even if phase one and phase two work, we still face a problem in the third phase of the attack: The attacker first has to compute $\tilde{v}_j - (r_j - c)y = d + r_j y + e_j - (r_j - c)y = d + cy + e_j$. This is not possible, because she only knows φ(ṽj) and does not know ṽj. Let lj be the number of natural numbers x with φ(x) = φ(ṽj). The attacker can compute the set $\{x - r_j y \mid x \in \mathbb{N} \text{ and } \varphi(x) = \varphi(\tilde{v}_j)\}$ and gets lj hypotheses for d + cy + ej. But she can only verify a hypothesis for d + cy and not a hypothesis for d + cy + ej. Now the algorithm presented in [6] to compute d + cy from d + cy + ej requires the values d + cy + ej for several indices j as input. In fact, in [6] the algorithm takes this value for all N recorded power traces as an input. If the values d + cy + ej are needed for the indices j1, j2, ..., jr, the attacker needs $\prod_{i=1}^{r} l_{j_i}$ trials to find the secret scalar. This can be viewed as impractical for large values of r and lj.
Another reasonable approach in the second leakage scenario would be to guess the whole scalar and use the enhanced attack just like in [6]. Here the inability to recognize dummy operations just raises the error rate: the necessary operations are misinterpreted with an error rate ε_b, and additionally the dummy operations are always misinterpreted. By Lemma 3 this would lead to an overall error rate of 1/6 ≈ 16.67%, even if ε_b = 0. In [6], 13% is given as the maximal error rate which the enhanced version tolerates for R = 16. So this approach would also be impossible for R ≥ 16. However, the example does show that, despite the barriers highlighted in this chapter, every algorithm and every leakage scenario has to be analyzed carefully to determine whether a variation of the enhanced variant of the Schindler-Itoh-attack can be mounted.
7 Conclusion
It has been shown that the basic version of the Schindler-Itoh-attack can be generalized to a setting where only some of the bits leak by SPA with certain error rates. This is possible in two scenarios, where different information about a discrete exponentiation using a window method can be found out. In the first scenario dummy additions can be detected with a certain error rate, but different types of additions are indistinguishable. In the second scenario the three types of additions can be distinguished, but dummy operations cannot be detected. In both scenarios it is possible to find collisions and to correct the guessing errors using the Schindler-Itoh-attack, and to find out the remaining bits using a variation of the BSGS-algorithm. In the second scenario it is even possible to gain information about the location of the dummy operations by the methods of the Schindler-Itoh-attack. This way an attack on an implementation with a secret scalar of bit length 256 and a 32-bit randomization becomes feasible. However, finding the collisions is more difficult than in the setting considered in [6], because the expected Hamming weight in presence of a collision is higher. It has been shown that it is difficult to apply the enhanced version of the attack to the case of partial information leakage, due to the lack of arithmetic structure. However, it has to be further investigated in which situations this is possible.
References
1. Coron, J.-S.: Resistance against Differential Power Analysis for Elliptic Curve Cryp-
tosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
Springer, Heidelberg (1999)
2. Fouque, P.-A., Kunz-Jacques, S., Martinet, G., Muller, F., Valette, F.: Power Attack
on Small RSA Public Exponent. In: Goubin, L., Matsui, M. (eds.) CHES 2006.
LNCS, vol. 4249, pp. 339–353. Springer, Heidelberg (2006)
3. Itoh, K., Izu, T., Takenaka, M.: A Practical Countermeasure against Address-Bit
Differential Power Analysis. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003.
LNCS, vol. 2779, pp. 382–396. Springer, Heidelberg (2003)
4. Krüger, A.: Kryptographie mit elliptischen Kurven und Angriffe darauf (Elliptic Curve Cryptography and Attacks on it). Bachelor thesis, University of Bonn (2011)
5. Menezes, A., van Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography.
CRC Press (1996)
6. Schindler, W., Itoh, K.: Exponent Blinding Does Not Always Lift (Partial) SPA Resistance to Higher-Level Security. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 73–90. Springer, Heidelberg (2011)
Butterfly-Attack on Skein’s Modular Addition
1 Introduction
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 215–230, 2012.
c Springer-Verlag Berlin Heidelberg 2012
216 M. Zohner, M. Kasper, and M. Stöttinger
2 Background Theory
The following section provides an introduction to the background theory needed to understand this paper. It gives an overview of hash functions and side channel analysis and then presents a brief introduction to Skein.
where || denotes concatenation, ⊕ the binary XOR, K is the pre-shared key, H is the hash function used, and IPAD as well as OPAD are two constants defined as the hexadecimal values 3636...36 and 5C5C...5C, which have the same size as the state value used in H.
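The HMAC construction described here can be sketched as follows, with SHA-256 standing in for H. This is the textbook construction (key padded to the block size of H, as in RFC 2104), not code from the paper:

```python
import hashlib

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC construction H((K ^ OPAD) || H((K ^ IPAD) || msg))
    with SHA-256 as the underlying hash H."""
    block = 64  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block, b"\x00")              # pad key with zeros
    ipad = bytes(b ^ 0x36 for b in key)          # 3636...36
    opad = bytes(b ^ 0x5C for b in key)          # 5C5C...5C
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()
```

The result matches the standard HMAC-SHA-256 test vectors (e.g. RFC 4231).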
The correct key can then be recovered by taking the candidate with the highest absolute correlation.
An advancement of the DPA was presented in [6]. In this contribution, Bevan et al. compute the difference of means for the actual measurement and compare it to the theoretical distribution of the key using the least-squares method. The key hypothesis that minimizes the least-squares distance is then chosen as the correct key.
2.3 Skein
Skein is a hash function family with three different internal state sizes: 256, 512
and 1024 bits. Skein consists of three components:
– Threefish: a block cipher with block sizes of 256, 512 and 1024 bits.
– Unique Block Iteration (UBI): a construction similar to Matyas-Meyer-
Oseas that builds a compression function by connecting Threefish operations.
– Optional Argument System: provides optional features like tree hashing
or MAC.
The following sections will provide a basic introduction to Skein needed to un-
derstand the attack. For further information refer to [9].
Skein Hashing: To compute the hash of a message, Skein compresses data using Threefish and connects the Threefish calls using UBI. Upon being called, Threefish divides the input M = m0 m1 ... mN−1 and the state value K = k0 k1 ... kN−1 for N ∈ {4, 8, 16} into 64-bit blocks. Then it performs an AddRoundKey operation with M as plaintext and K as first round key:
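This initial key injection can be sketched as word-wise addition modulo 2^64. This is a simplified sketch: the real Threefish key schedule also mixes in tweak words and a key parity word, which are omitted here:

```python
MASK64 = (1 << 64) - 1

def add_round_key(m_words, k_words):
    """Word-wise addition of the 64-bit message words and key words
    modulo 2^64 -- the operation targeted by the attack in this paper."""
    return [(m + k) & MASK64 for m, k in zip(m_words, k_words)]
```

For Skein-256 the lists hold N = 4 words, for Skein-512 N = 8, and for Skein-1024 N = 16.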
With the resulting state values Skein then performs 72 rounds (80 rounds in case of the 1024-bit block size) of the operations MIX and Permute, and every fourth round it adds the next round key. For the attack described in this paper only the initial AddRoundKey operation is relevant; the other operations are therefore omitted. To compute the hash of a message, Skein calls UBI three times: first with a configuration block as input and the state value 0, then with the outcome of the first call as state value and the message as input, and lastly with 0 as input and the output from the second call as state value (cf. Figure 1).
Skein MAC: Skein provides its own MAC function for performance reasons. While with HMAC a hash function has to be called two times, Skein MAC only needs one additional UBI call at the beginning, using the key as input (cf. Figure 2). The output of this UBI call is then used as input for the usual Skein hashing. Note that the resulting chaining value before the third UBI call that uses the message as input is constant if the same key and configuration are used, so it can be precomputed and stored.
Fig. 3. Theoretical correlation of all elements in the field (28 ) with the correct key 89
(010110012 ) and the symmetric counterpart 217 (110110012 )
that it is symmetric, with the points of origin being the correct key and the key candidate with only the most significant bit differing from the correct key k, a candidate which we will from now on refer to as the symmetric counterpart. The symmetrical effect is due to the fact that:
$$corr(k + d \pmod{256}) = corr(k - d \pmod{256}), \qquad 0 \le d \le 128. \tag{5}$$
Furthermore, it is noticeable that an order can be established amongst the key candidates regarding their correlation. Candidates with a small Hamming distance to the correct key tend to have a higher correlation than candidates with a high Hamming distance. Also, amongst the group of candidates whose distance has the same Hamming weight, the ones which differ from the correct key in more significant bits have a higher correlation than the ones which differ in less significant bits.
The reason for all these effects is the carry bit and the carry bit propagation,
respectively [8]. A candidate has a higher correlation if its variance is similar to
the variance of the correct key for a specific input. In case of the modular addition
the implication is that the more constant the Hamming distance between a
candidate and the correct key is for all inputs, the higher the resulting correlation
of the candidate.
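The symmetry of Equation (5) can be reproduced with a small noise-free simulation, using the correct key 89 of Fig. 3. This is illustrative: a real attack correlates hypotheses against measured power traces, not against exact Hamming weights:

```python
def hw(x):
    return bin(x).count("1")

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

key = 89                                           # correct key of Fig. 3
ref = [hw((m + key) % 256) for m in range(256)]    # idealized leakage
corrs = [pearson([hw((m + k) % 256) for m in range(256)], ref)
         for k in range(256)]
# corrs[key] is maximal; corrs is symmetric around the correct key, and
# the symmetric counterpart 217 also shows a high correlation.
```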
Fig. 4. Hamming weight difference between the correct key 8 and every other of the
24 candidates
Figure 4 shows the Hamming weight difference for all candidates to the correct key, which in this case is 8. The difference for all candidates was computed and the occurrences for each value were accumulated. As one can see, the symmetric counterpart (0) is either one Hamming weight bigger or one smaller than the correct key, making it the second most constant candidate amongst all others.
If a carry occurs in a bit in which a candidate differs from the correct key, the Hamming distance of the candidate to the correct key is changed. The bigger the Hamming distance of a candidate to the correct key before the addition, the likelier the Hamming distance is to change after an addition, therefore causing a loss of correlation. Furthermore, bit differences in less significant bits influence more significant bits by either causing a faulty carry or missing a correct carry, therefore inducing even further discrepancy in the overall value.
Lastly, there is another conspicuity in the correlation of the modular addition. If one compares the resulting correlations of a specific candidate for two different bit sizes, one may notice that the correlation increases with the size of the operands processed in the modular addition. This effect is due to features of the Pearson correlation. According to Appendix A, for a b bit sized modular addition the Pearson correlation can be written as:
$$\frac{\sum_{i=1}^{2^b} (x_i \cdot y_i)}{\frac{b}{4} \cdot 2^b} - b \tag{6}$$
N \ R          8 bits or 16 bits       32 bits or 64 bits
≤ 16 bits      regular attack          regular attack
> 16 bits      Divide and Conquer      Divide and Conquer (costly)
The N-bit value is split into N/8 blocks of 8 bits or N/16 blocks of 16 bits. Thereby, the complexity of the DPA is reduced from $2^N$ to $\frac{N}{8} \cdot 2^8$ or $\frac{N}{16} \cdot 2^{16}$ hypothesis computations. The separation can be performed since the device splits the N bit modular addition into a size its registers can process. The device performs the operations independently, only passing a carry bit to the next block if necessary. We refer to this attack as the Divide and Conquer Approach.
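The register-level structure this approach exploits, independent 8-bit additions chained only by a carry bit, can be sketched as follows (an illustrative decomposition, not code from the paper):

```python
def add64_bytewise(m, k):
    """Perform a 64-bit modular addition as eight 8-bit register
    additions, passing only the carry bit between blocks -- the
    structure the Divide and Conquer approach attacks byte by byte."""
    result, carry = 0, 0
    for i in range(8):
        mb = (m >> (8 * i)) & 0xFF
        kb = (k >> (8 * i)) & 0xFF
        s = mb + kb + carry
        result |= (s & 0xFF) << (8 * i)
        carry = s >> 8
    return result

m, k = 0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF
assert add64_bytewise(m, k) == (m + k) % 2**64
```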
If the size of the modular addition operands and the register size of the device both exceed 16 bits, the divide and conquer becomes more costly in terms of required measurements. The problem is that the bits are no longer processed independently, and the power consumption of the omitted bits influences the attacked bits. Thus, more power traces are required in order to average out the noise.
The concept behind this is to keep the change of the key value to an attackable size. If it is known that only a certain number of bits were likely to have changed during the modular addition and the other bits of the key remained the same, the untouched bits can be omitted in the DPA, reducing the complexity. So instead of having a complexity of $2^N$ for the analysis, the masked divide and conquer strategy reduces the complexity to $2^\lambda \cdot \frac{N}{\lambda}$. The corresponding hypothesis function h for the DPA is:
where m_λ is the λ-bit input block with the variable data.
Depending on the least significant bit of the next key block, a carry in the most significant bit of the hypothesis leads to an increase of the overall Hamming weight by 1 in 50% of the cases, to no change at all in 25% of the cases, to a decrease by 1 in 12.5% of the cases, to a decrease by 2 in 6.25% of the cases, and so on. Thus an increase in Hamming weight by 1 is most likely, and we chose the hypothesis function to not reduce the result modulo $2^\lambda$.
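A hypothesis function matching this description can be sketched as follows (an illustrative sketch consistent with the text above, not necessarily identical to the authors' Equation 8):

```python
def hw(x: int) -> int:
    return bin(x).count("1")

def hypothesis(m_block: int, key_guess: int) -> int:
    """DPA hypothesis for a lambda-bit block: Hamming weight of the
    plain sum, deliberately NOT reduced modulo 2^lambda, so that the
    most likely carry effect (+1 on the overall Hamming weight) is
    part of the model."""
    return hw(m_block + key_guess)
```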
Fig. 5. Correlation of masked divide and conquer with block size 8 bits and 212 as
correct key
because their variance no longer matches. However, other candidates can have
their correlation increased because their behavior resembles the behavior of the
measured traces.
If the candidate is not the point of origin, there is no symmetrical effect and large values tend to be added, whereas if it is the point of origin, the values tend to be small. Figure 6 shows the squared difference computed for the correlation of Figure 5. As one can see, the key and its symmetric counterpart both have the minimal value, which leaves us with two possible keys.
With the Butterfly-Attack we have drastically reduced the number of possible keys for a λ-bit block from $2^\lambda$ to 2. While this may suffice for a lot of scenarios, there exist some algorithms like Skein-1024 which have a large state size, therefore still leaving us with too many key candidates to claim the attack successful. A summary of the complexity after the Butterfly-Attack for all
Fig. 6. Butterfly-Attack performed for each key of Figure 5 on the point in time with
the highest correlation
Skein variants is depicted in Table 2. For Skein-256 it is still feasible to find the
correct state, needed to forge a Skein MAC, by verifying all possible combina-
tions. However, an exhaustive search for Skein-512 already requires dedicated
hardware and for Skein-1024 it is computationally infeasible to try all possible
combinations. Therefore, in the next section we suggest a modified version of the
masked divide and conquer which reduces the number of candidates for a block
even further and thus lets us successfully attack Skein-512 and Skein-1024.
Improving the Masked Divide and Conquer: To further reduce the number of possible keys and make the attack feasible for Skein-512 and Skein-1024, we have to determine the correct key for a λ-bit block. Because the uncertainty only remains for the most significant bit, independent of the position of the mask, we let the λ-bit blocks partially overlap during the measurement phase. In that manner, a most significant bit in one position becomes one of the lesser significant bits in the next position and we can determine its value. The number of positions shifted can be varied as required. For instance, if one shifts the mask bits by λ − 1, the number of measurements needed would decrease to the factor of $\frac{N - 2\lambda}{\lambda - 1} + 2$. It would also be possible to test every single bit multiple times with this approach, adding redundancy and therefore more confidence in the key guess, but raising the number of measurements needed.
1. split the input message M = M0, M1, ..., Mn−1 and the key state K = K0, K1, ..., Kn−1 into n 64-bit blocks where n ∈ {4, 8, 16}, depending on the Skein variant used
2. perform a modular addition of Mi with Ki for 0 ≤ i < n
The message is directly added to the key state, so we do not have to change it, making this an ideal attack scenario for the masked divide and conquer. We demonstrate the attack on Skein-256, but because the 64-bit blocks attacked are independent of each other and the only difference in the attack between the Skein variants is the number of 64-bit blocks, the attack is also applicable to Skein-512 and Skein-1024 with nearly the same complexity.
In the following, we present the results of the divide and conquer on an 8-bit AVR ATMega2561 microcontroller [2] to prove the practical applicability of our side channel analysis on Skein. Then we switch to a 32-bit ARM Cortex-M3 microcontroller [1] and show the results of our masked divide and conquer. In both cases we used the reference implementation of Skein-256 submitted to the third round of the SHA-3 competition [3].
Because the AVR ATMega2561 has 8-bit registers and therefore splits the 64-bit modular addition into eight modular additions of 8-bit operands each, we do not need to mask the input. In total we applied the DPA using 200 power traces, which were enough to achieve a stable correlation. During the analysis of the resulting correlations we observed that for some key bytes the key with the highest correlation was not the correct key candidate. However, the symmetric shape of the correlation was still noticeable. Therefore, we applied the Butterfly-Attack on these correlations, resulting in the correct key candidate and its symmetric counterpart. Note that we could not use our approach of shifting the mask to attain the correct key, because the most significant bit is always the most significant bit in this 8-bit block. In order to pick the correct key, we analyzed the effect of a carry in the most significant bit and decided, for each input, whether a carry during the addition of two 8-bit values occurred or not. Thereby, we were able to restore the state value, enabling us to compute legitimate Skein-MACs.
In order to estimate the influence of the noise due to the omitted bits, we attacked the 32-bit ARM Cortex-M3 with the divide and conquer approach. In total we performed 5000 measurements of the device. We split the 32-bit key into four blocks of 8 bits each, computed hypotheses for each block, and
In order to estimate the benefit of the masked divide and conquer, we also performed it on the 32-bit ARM Cortex-M3. As mask size λ we settled on 8 bits because it brought the best trade-off for our setup (see Appendix D). Starting with the eight least significant bits of a 32-bit block, we performed 500 measurements and then shifted the random byte four positions towards the most significant bit. We proceeded in this manner until we covered all 32 bits. In total we had to perform 3500 measurements for each 32-bit modular addition. With the same setting as for the divide and conquer approach, we were able to achieve a stable correlation and thus recover the key for all 8-bit blocks after only 500 measurements.
To reduce the number of measurements for Skein-256 we attacked the eight 32-bit blocks simultaneously by choosing the same input for all of them. This decreased the measurements needed by a factor of eight for Skein-256 and sped up the computation of the hypothesis, because it only has to be computed once for each of the 32-bit blocks. We computed the most significant bits for the first of the two 32-bit blocks by analyzing the effects of a carry in the most significant bit and deciding for each input whether or not a carry occurred. For the DPA we used the hypothesis function mentioned in Equation 8 in order to compute the correlation between each of the 256 keys and the traces measured. The resulting correlation was then analyzed using the Butterfly-Attack, and the two points of origin of the symmetry were chosen as possible key candidates. In that manner we proceeded for all four 64-bit blocks and for every position of the mask. Finally, we combined the key hypotheses by choosing the bit value with the higher occurrence for each of the 256 bits, resulting in the correct state, which enabled us to forge legitimate Skein-MACs.
conquer against modular addition, which performs inefficiently for this particular scenario due to the infeasible computing overhead. Using the known divide and conquer method and the masked divide and conquer method, the key could not be recovered by applying a DPA in the regular manner. In order to cope with this problem we introduce the Butterfly-Attack, a new analysis method specifically designed for attacking modular addition. To show the applicability of our attack, we applied it to the reference implementation of Skein-256, where we successfully recovered the constant state value, enabling us to forge Skein-MACs. In future work we will perform our attack on more complex platforms like the Virtex-5 FPGA, and we will also attack different Skein variants.
References
1. ARM CortexM-3 product site,
http://www.arm.com/products/processors/cortex-m/cortex-m3.php
2. AVR ATMega2561 product site, http://www.atmel.com/
3. Skein submission to the final round of the SHA-3 contest,
http://csrc.nist.gov/groups/ST/hash/sha-3/Round3/documents/
Skein_FinalRnd.zip
4. Preneel, B., Govaerts, R., Vandewalle, J.: Hash Functions Based on Block Ciphers:
A Synthetic Approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp.
368–378. Springer, Heidelberg (1994),
http://www.springerlink.com/content/adq9luqrkkxmgk03/fulltext.pdf
5. Benoît, O., Peyrin, T.: Side-channel analysis of six SHA-3 candidates. In: Mangard,
S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 140–157. Springer,
Heidelberg (2010),
http://www.springerlink.com/content/822377q22h78420u/fulltext.pdf
6. Bevan, R., Knudsen, E.: Ways to Enhance Differential Power Analysis. In: Lee, P.J.,
Lim, C.H. (eds.) ICISC 2002. LNCS, vol. 2587, pp. 327–342. Springer, Heidelberg
(2003)
7. Krawczyk, H., Bellare, M., Canetti, R.: RFC 2104 – HMAC: Keyed-Hashing for Message Authentication. IETF (1997), http://tools.ietf.org/html/rfc2104
8. Lemke, K., Schramm, K., Paar, C.: DPA on n-Bit Sized Boolean and Arithmetic
Operations and Its Application to IDEA, RC6, and the HMAC-Construction. In:
Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 205–219.
Springer, Heidelberg (2004)
9. Ferguson, N., Lucks, S., Schneier, B., et al.: The Skein Hash Function Family. Submission to NIST, Round 3 (2010), http://www.skein-hash.info
10. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
11. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets
of Smart Cards (2007)
12. Zohner, M., Kasper, M., Stöttinger, M.: Side Channel Analysis of the SHA-3 Finalists. In: Design, Automation & Test in Europe, DATE (2012)
A Proof of Equation 6

The proof of Equation 6, with bit size b, hypothesis x, and reference y, assumes that the hypothesis was computed using Appendix B and that the reference uses the results of the correct key. We utilize the fact that for a b-bit modular addition the corresponding mean values $\bar{x}$ and $\bar{y}$ equal $\frac{b}{2}$ and the variances $\sigma_x$ and $\sigma_y$ equal $\frac{b}{4}$.

\begin{align*}
R_\rho(x,y) &= \frac{\sum_{i=1}^{2^b}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{2^b}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{2^b}(y_i-\bar{y})^2}}\\
&= \frac{\sum_{i=1}^{2^b}(x_i\cdot y_i) - \sum_{i=1}^{2^b}(x_i\cdot\bar{y}) - \sum_{i=1}^{2^b}(\bar{x}\cdot y_i) + \sum_{i=1}^{2^b}(\bar{x}\cdot\bar{y})}{\sqrt{2^b\cdot\sigma_x}\,\sqrt{2^b\cdot\sigma_y}}\\
&= \frac{\sum_{i=1}^{2^b}(x_i\cdot y_i) - \left(\tfrac{b}{2}\right)^2\cdot 2^b - \left(\tfrac{b}{2}\right)^2\cdot 2^b + \left(\tfrac{b}{2}\right)^2\cdot 2^b}{\tfrac{b}{4}\cdot 2^b}\\
&= \frac{\sum_{i=1}^{2^b}(x_i\cdot y_i) - \left(\tfrac{b}{2}\right)^2\cdot 2^b}{\tfrac{b}{4}\cdot 2^b}\\
&= \frac{\sum_{i=1}^{2^b}(x_i\cdot y_i)}{\tfrac{b}{4}\cdot 2^b} - \frac{\tfrac{b^2}{4}\cdot 2^b}{\tfrac{b}{4}\cdot 2^b}\\
&= \frac{\sum_{i=1}^{2^b}(x_i\cdot y_i)}{\tfrac{b}{4}\cdot 2^b} - b.
\end{align*}
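The mean, variance, and the final line of the derivation can be verified numerically. The following sketch (bit size b = 4 chosen arbitrarily) assumes Hamming-weight valued hypothesis and reference vectors; any permutation of the same multiset of weights preserves the mean b/2 and variance b/4:

```python
import random

def hw(v):
    """Hamming weight of v."""
    return bin(v).count("1")

b = 4                           # illustrative bit size
n = 2 ** b
x = [hw(v) for v in range(n)]   # hypothesis values over all b-bit words
y = x[:]                        # reference: a permutation of the same
random.shuffle(y)               # multiset keeps mean b/2 and variance b/4

mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n
assert mean == b / 2 and var == b / 4

# Pearson correlation computed directly ...
num = sum((xi - mean) * (yi - mean) for xi, yi in zip(x, y))
rho_direct = num / (n * var)
# ... equals the simplified form obtained at the end of the proof
rho_simplified = sum(xi * yi for xi, yi in zip(x, y)) / ((b / 4) * n) - b
assert abs(rho_direct - rho_simplified) < 1e-12
```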
The correlation of the symmetric counterpart for a modular addition of bit size b can be computed by:

$$\frac{\sum_{i=1}^{2^b}(x_i\cdot y_i)}{\tfrac{b}{4}\cdot 2^b} - b \;=\; \frac{\sum_{i=1}^{b-1} 2^{\,i-1}\cdot i\cdot(i-1)}{\tfrac{b}{4}\cdot 2^b} - b. \tag{10}$$
One can find a trade-off in choosing the parameter λ. The bigger λ is chosen, the fewer blocks have to be attacked, but the higher the complexity of the DPA. Conversely, the smaller λ is chosen, the more blocks have to be attacked, but the lower the complexity of the DPA. The optimal choice for λ minimizes the following equation:

$$T_{Total} = T_{measure}\cdot\frac{64}{\lambda}\cdot N_{measure} + T_{hypo}\cdot 2^{\lambda}\cdot\frac{N}{\lambda}\cdot N_{measure} \tag{11}$$

where $T_{measure}$ denotes the time needed for one measurement, $N_{measure}$ the number of measurements needed for one mask position, and $T_{hypo}$ the time needed to compute one key hypothesis during the DPA. The equation expresses the total time needed for the attack, which consists of the time needed for the measurement process and the time needed for the DPA.
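The optimal λ of Eq. (11) can be found by exhaustive search over the candidate bit widths. All timing constants below are invented placeholders for illustration, not values from the paper:

```python
# Illustrative constants (assumptions, not measured values from the paper):
T_measure = 1e-3    # time for one measurement, in seconds
T_hypo    = 1e-7    # time for one key hypothesis during the DPA
N_measure = 1000    # measurements needed for one mask position
N         = 64      # total number of key bits

def t_total(lam):
    """Total attack time of Eq. (11) for a given lambda."""
    return (T_measure * (64 / lam) * N_measure
            + T_hypo * (2 ** lam) * (N / lam) * N_measure)

best = min(range(1, 33), key=t_total)
print(best)  # with these constants the optimum is lambda = 11
```

The measurement term falls like 1/λ while the hypothesis term grows like 2^λ/λ, so the minimum sits where the two costs are of comparable size.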
MDASCA: An Enhanced Algebraic
Side-Channel Attack for Error Tolerance
and New Leakage Model Exploitation
1 Introduction
How to improve the efficiency and feasibility of side-channel attacks (SCAs) has
been widely studied in recent years. The objective is to fully utilize the leakage
information and reduce the number of measurements. This can be achieved from
two directions. One is to find new distinguishers for key recovery [3, 8, 34]. The
This work was supported in part by the National Natural Science Foundation of
China under the grants 60772082 and 61173191, and US National Science Foundation
under the grant CNS-0644188.
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 231–248, 2012.
c Springer-Verlag Berlin Heidelberg 2012
232 X. Zhao et al.
DES in eSmart 2010 [10] and Trivium in COSADE 2011 [21]. The data com-
plexity required in the attacks [10, 21] can be further reduced. As addressed
in [27, 28], ASCA is a generic framework and can be applied to more leakage
models. However, the diversity, error, and complexity of leakage models make it
difficult to adopt new models in ASCA.
2.2 MDASCA
Existing ASCAs [27, 28] add only a few equations to the algebraic system, as-
suming the deduction from leakages is single and correct. As a result, they are
sensitive to errors and likely to fail in practical attacks. In this section, we pro-
pose an enhanced ASCA technique, named Multiple Deductions-based ASCA
(MDASCA), in which a deduction set of multiple values is created and con-
verted into a constraint equation set. As long as the deductions are enumerable,
the whole equation system can be solved by a SAT solver in a reasonable amount
of time. Next, we describe the core of MDASCA, the representation of multiple
deductions with algebraic equations.
Suppose a targeted state X can be represented with m one-bit variables xj ,
j=1..m. φ(X) denotes the output value of the leakage function for X. If the
correct deduction d can be deduced accurately, d can be calculated as in Eq. (1).
d = φ(X), X = x1 x2 . . . xm (1)
Representing multiple deductions can be divided into the following two steps.
$$B_i = b_i^1 b_i^2 \ldots b_i^m, \qquad d_i = \varphi(B_i), \qquad 1 \le i \le S_p \tag{2}$$
Also, each $\bar{d}_i \in \bar{D}$ is an "impossible deduction". Similar to Eq. (2), new variables $\bar{B}_i$ and $\bar{b}_i^j$ are introduced. New equations can be built as in Eq. (3).
Which set to use (D, D̄, or both) is highly dependent on the leakage model and
the adversaries’ ability. Typically, if Sp < Sn , D is used because it leads to a less
complicated equation system. Otherwise, D̄ is preferred.
2. Building equations for the relationship between d and D (or D̄).
Note that if $B_i$ is equal to $X$, then $d_i = d$. $m \times S_p$ one-bit variables $e_i^j$ are introduced to represent whether $b_i^j$ is equal to $x_j$: $e_i^j = 1$ if $b_i^j = x_j$; otherwise $e_i^j = 0$. $S_p$ one-bit variables $c_i$ are introduced to represent whether $d_i$ is correct or not: $c_i = 1$ if $d_i = d$; otherwise $c_i = 0$. $c_i$ can be represented by Eq. (4), where $\neg$ denotes the NOT operation.

$$e_i^j = \neg(x_j \oplus b_i^j), \qquad c_i = \bigwedge_{j=1}^{m} e_i^j \tag{4}$$
Since only one element in D is equal to d, only one $c_i$ is 1. This can be represented as:

$$c_1 \vee c_2 \vee \ldots \vee c_{S_p} = 1, \qquad \neg c_i \vee \neg c_j = 1, \quad 1 \le i < j \le S_p \tag{5}$$
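The constraint of Eq. (5) is a standard "exactly-one" encoding; a minimal sketch of the clause generation, with variables written as signed integers in the DIMACS convention commonly used by SAT solvers:

```python
from itertools import combinations

def exactly_one_clauses(c_vars):
    """Clauses forcing exactly one of the variables in c_vars to be 1:
    one 'at least one' clause plus pairwise 'at most one' clauses,
    i.e. 1 + Sp*(Sp-1)/2 clauses in total, matching Eq. (5)."""
    clauses = [list(c_vars)]  # c1 v c2 v ... v cSp
    clauses += [[-ci, -cj] for ci, cj in combinations(c_vars, 2)]
    return clauses

print(exactly_one_clauses([1, 2, 3]))
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```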
As for the impossible deductions, none of the elements in D̄ is the correct de-
duction d. This can be represented by Eq. (6).
$$e_i^j = \neg(x_j \oplus \bar{b}_i^j), \qquad c_i = \bigwedge_{j=1}^{m} e_i^j = 0 \tag{6}$$
Let $n_{v,\varphi}$ and $n_{e,\varphi}$ denote the number of newly introduced variables and ANF equations in representing one deduction $d_i$ ($\bar{d}_i$); both depend on $\varphi$. According to Equations (2), (4), (5), $(1+2m+n_{v,\varphi})\,S_p$ variables and $1+(1+m+n_{e,\varphi})\,S_p+\binom{S_p}{2}$ ANF equations are introduced to represent $D$. According to Equations (3), (6), $(1+2m+n_{v,\varphi})\,S_n$ variables and $(1+m+n_{e,\varphi})\,S_n$ ANF equations are introduced to represent $\bar{D}$.
The new constraint equations mentioned above are quite simple. They can
be easily fed into the SAT solver [33] to accelerate the key search. To launch
MDASCA, it is important to choose φ and determine the deduction set D/D̄
under different models, which is addressed in Section 3.
MDASCA can handle the deduction offset easily by setting $\varphi = HW(\cdot)$, $d = HW(X)$. For example, if d = 3, then D = {2, 3, 4}.
Caches in microprocessors can leak secret information about the indexes of table (S-Box) lookups and thereby compromise cryptosystems. Numerous cache
attacks on AES have been published. There are three leakage models in cache
attacks: time driven (TILM) [4], access driven (ACLM) [2, 22, 25], and trace
driven (TRLM) [1, 6, 7, 13–15, 20]. Under TILM, only the overall execution
time is collected and it is difficult to deduce the internal states from few traces.
Under ACLM and TRLM, adversaries can measure the cache-collisions and infer
internal states with a single cache trace. Now we discuss how to use these two
models in MDASCA.
Suppose a table has $2^m$ bytes and a cache line has $2^n$ bytes (m > n). The whole table fills $2^{m-n}$ cache lines. A cipher process V performs k table lookups, denoted as $l_1, l_2, \ldots, l_k$. For each lookup $l_i$, the corresponding table index is $X_i$. Assume $l_t$ is the targeted table lookup.
1. Access-driven leakage model. Under ACLM [2, 22, 25], the cache lines
accessed by V can be profiled by a malicious process S and used to deduce lt . S
first fills the cache with its own data before V performs the lookups, and accesses
the same data after the lookups are done. S can tell whether a datum is in cache
or not by measuring its access time. A shorter access time indicates a cache hit.
A longer access time is a cache miss, implying that V already accessed the same
cache line. If S knows which cache line $l_t$ accessed, he knows the higher m − n bits of $X_t$. Let $\langle X\rangle$ denote the higher m − n bits of X. Then the correct deduction for $l_t$ is $\langle X_t\rangle$.
In practice, S observes many cache misses from two sources. Some are from
the k − 1 lookups other than lt . Some are from interfering processes that run
in parallel with V. Assume the interfering processes have accessed g different
cache lines, which can be considered as g more “lookups” at Xk+1 , ...Xk+g . All
the possible values of Xt form a collection L. Without loss of generality, we
assume the first Sp values of L are distinct and Sp ≤ k+g. The possible deduction
set D can be written as:
Note that the impossible deduction set $\bar{D}$ can also be obtained, with $S_n = 2^{m-n} - S_p$. So ACLM can be easily interpreted with multiple deductions by setting $\varphi = \langle\cdot\rangle$, $d = \langle X_t\rangle$ and $d_i = \langle X_i\rangle$. The values of the elements in D or $\bar{D}$ are known to adversaries after the deductions.
2. Trace-driven leakage model. Under TRLM [1, 6, 7, 13–15, 20], S can keep track of the cache hit/miss sequence of all of V's lookups to the same table via power or EM probes. Suppose there are r misses before $l_t$. Let SM(X) be the set of lookup indexes corresponding to the r misses, $SM(X) = \{X_{t_1}, X_{t_2}, \ldots, X_{t_r}\}$.
If lt is a cache hit, the data that Xt tries to access have been loaded into the
cache by previous lookups. The possible deduction set D for Xt can be written
as in Eq. (9) where Sp = r.
ACLM based cache attacks on AES have been widely studied [2, 22, 25]. This paper takes the AES implementation in OpenSSL 1.0.0d as the target, where n = 6 and m = 11, 10, 8 for tables of size 2KB, 1KB and 256 bytes, respectively. Accordingly, the leakages (i.e., $\ell = m - n$) are the higher 5, 4 and 2 bits of the table lookup index, respectively. This is equivalent to saying that one leakage reduces ξ(K) by a factor of about $2^{\ell}$. We consider two typical ACLM models.
1. Bangerter model. Bangerter et al. [2] launched attacks on AES in OpenSSL 1.0.0d with one 2KB table and 100 encryptions, assuming S can profile the accessed cache lines per lookup. The attack requires the CFS scheduler of Linux and hyper-threading. We refer to their work as the Bangerter model. In the attack, there are 16 leakages of table lookups in each round. After the first-round analysis, ξ(K) can be reduced to $2^{(8-\ell)\times 16}$. After the second-round analysis, ξ(K) can be approximately reduced to $2^{(8-2\ell)\times 16}$. For $\ell = 4$ or 5, two rounds of leakages from one cache trace are enough to recover K. For $\ell = 2$, after the first-round analysis, ξ(K) can be reduced to $2^{6\times 16}$. After the second-round analysis, three cache traces are enough to reduce ξ(K) to $2^{(6-2\times 3)\times 16} = 1$.
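The round-one arithmetic can be checked mechanically. This sketch assumes each of the 16 first-round lookups independently leaks the top ℓ bits of its 8-bit index, and that a second round of analysis leaks roughly 2ℓ bits per byte:

```python
def keyspace_bits_round1(l):
    """log2 of xi(K) after analyzing the 16 first-round lookups."""
    return (8 - l) * 16

def keyspace_bits_round2(l):
    """Approximate log2 of xi(K) after two rounds of analysis."""
    return max(8 - 2 * l, 0) * 16

for l in (2, 4, 5):
    print(l, keyspace_bits_round1(l), keyspace_bits_round2(l))
# l = 4 or 5 leaves 0 bits after two rounds; l = 2 still leaves 96 bits
# after round one, so several cache traces are needed.
```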
2. Osvik model. Osvik et al. [25] and Neve et al. [22] conducted ACLM based cache attacks on AES in OpenSSL 0.9.8a, assuming S can profile the accessed cache lines per encryption. We refer to their work as the Osvik model.
In OpenSSL 0.9.8a, four 1KB tables ($T_0, T_1, T_2, T_3$, m = 10) are used in the first nine rounds, and a table $T_4$ is used in the last round. The attacks in [25] can succeed with 300 samples by utilizing the 32 table lookups in the first two rounds of AES. The attack in [22] can break AES with 14 samples by utilizing the 16 $T_4$ table lookups in the final round of AES. AES in OpenSSL 1.0.0d removes $T_4$ and uses $T_0, T_1, T_2, T_3$ throughout the encryption. As a result, it is secure against [22] and increases the difficulty of [25].
The first TRLM based cache attack on AES was proposed by Bertoni et al. [6] and later improved in [13] through real power analysis. More simulation results with a 1KB table were presented in [1, 6, 7, 20]. Recently, several real-world attacks were proposed in [14, 15] on AES with a 256B table on a 32-bit ARM microprocessor. In [14], the cache events are detected with a power probe and only the first 18 lookups are analyzed; 30 traces reduce ξ(K) to $2^{30}$. In [15], the attacks are done with an EM probe, and ξ(K) is further reduced to 10 when two more lookups are considered.
Let $p_i$ and $k_i$ be the i-th byte of the plaintext and the master key. The i-th lookup index is $p_i \oplus k_i$. In TRLM, the XOR between two different lookup indexes ($p_i \oplus k_i$ and $p_j \oplus k_j$) can be leaked. From the 16 leakages in the first round, $p_i \oplus k_i \oplus p_j \oplus k_j$ can be recovered for $1 \le i < j \le 16$. ξ(K) can be reduced to $2^{128-15\ell}$ and further reduced to 1 by analyzing leakages in the later rounds.
(Fig. 1. Estimations in TRLM based MDASCA on AES: (a) $\#_\ell$, (b) $N_l$.)

The probability that a given cache line is not accessed after $\ell$ table lookups is $\left(\frac{N_c-1}{N_c}\right)^{\ell}$, so $\#_\ell = N_c\left(1 - \left(\frac{N_c-1}{N_c}\right)^{\ell}\right)$. Let $y_\ell$ be the $\ell$-th lookup index, and $\rho_\ell$ be the reduced percentage of $\xi(y_\ell)$ due to the $\ell$-th lookup. Then $\rho_\ell = \left(\frac{\#_\ell}{N_c}\right)^2 + \left(1 - \frac{\#_\ell}{N_c}\right)^2$, as also shown in [1]. Requiring $N_c \times (\rho_\ell)^{N_l} \le 1$ gives $N_l \approx -\log_{\rho_\ell} N_c$.
In [14, 15], $N_c = 16$. Fig. 1(a) shows how $\#_\ell$ changes with $\ell$. Even after 48 table lookups, $\#_\ell < 16$ and $\rho_\ell < 1$, which means there are still some cache misses that can be used for deductions. Fig. 1(b) shows how $N_l$ changes with $\ell$; the minimum of $N_l$ is 4 and the maximum is 22.24. If $N_l$ is 5 or 6, $\xi(K)$ can be reduced to $2^{76.10}$ or $2^{74.13}$, respectively.

Using the leakages in the second round, if $N_l$ is 5 or 6, $\xi(K)$ can be further reduced to approximately $2^{76.10-48.80} = 2^{27.30}$ or $2^{74.13-54.78} = 2^{19.35}$. After the third round, $\xi(K)$ can be reduced to 1 with high probability. So approximately 5 or 6 cache traces are enough to launch a successful TRLM based attack on AES. As it is very hard to analyze the third-round leakages manually, we verify this through MDASCA experiments, as shown in Section 5.4.
We adopt the technique in [19] to build the equation set for AES. Each S-Box
can be represented by 254 ANF equations with 262 variables. The full AES-128
including both encryption and key scheduling can be described by 58288 ANF
equations with 61104 variables. MDASCA does not require the full round of
AES, as analyzed in Section 4. We choose the CryptoMiniSat 2.9.0 solver [33] to solve the equations. The solver runs on an AMD Athlon 64 Dual Core 3600+ processor clocked at 2.0 GHz. We consider a trial of MDASCA to fail when no solution is found within 3600 seconds.
Fig. 2(b) shows the AddRoundKey of the first round in a single power trace where the offset is one. In this example, from Section 2.1, the error rate e is 9/16 = 56.25%. The deduction set for the 8th byte is D = {2, 3, 4}, $S_p = 3$. To represent each HW deduction in D, 99 new variables (not including the 8 variables in Eq. (2)) and 103 equations are required (see Appendix 2). So, $n_{v,\varphi} = 99$, $n_{e,\varphi} = 103$. According to Section 2.2, we use 348 new variables and 340 ANF equations to represent D.
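These variable and equation counts follow from the formulas of Section 2.2 with m = 8 and $S_p = 3$; a quick check:

```python
from math import comb

m, Sp = 8, 3                # 8-bit target byte, D = {2, 3, 4}
nv_phi, ne_phi = 99, 103    # per-deduction costs of the HW circuit (Appendix 2)

# Counts for representing D, per Section 2.2:
variables = (1 + 2 * m + nv_phi) * Sp
equations = 1 + (1 + m + ne_phi) * Sp + comb(Sp, 2)
print(variables, equations)  # -> 348 340
```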
As in [28], we considered several attack scenarios: known or unknown P/C,
consecutive or random distributions of correct HW deduction. Since the PBOPT
solver used in [24] fails on AES even when there is no error, we only compare
our results with [28]. We repeat the experiment for each scenario 100 times and
compute the average time. Our results indicate that, although the SAT solver has
some smart heuristics, most of the trials (more than 99%) succeed in reasonable
time with small variation.
Table 2 lists how many rounds are required for different scenarios. With one
power trace, when leakages are consecutive, and if P/C is known, only one round
is required instead of 3 in [28]; if P/C is unknown, 2 rounds are required instead
of 3 in [28]. The results are consistent with the analysis in Section 4.1.
Under HWLM, the average HW deduction error rate of a single power trace is about 75%, which is also indicated in [27, 28]. MDASCA can succeed even
with 80% error rate by analyzing 3 consecutive rounds in a single trace within 10
minutes. Even when the error rate is 100% (the number of all the HW deductions
is 3), AES can still be broken by analyzing two consecutive rounds of two power
traces within 2 minutes, or one round of three traces within 2 minutes. From the above, we can see that MDASCA has excellent error tolerance and can significantly increase the robustness and practicality of ASCA.
Note that MDASCA can also exploit a larger number of HW deductions, e.g., 4. If the HW leakages are not enough for the solver to find the single correct solution, a full AES encryption of an additional P/C can be added to the original equation set to verify the correctness of all possible solutions. The time complexity might be a bit higher, without increasing the data complexity.
((a) Cache events sampled after each table lookup; (b) cache events sampled after one encryption.)
Fig. 4(a) shows the cache events of the first 48 lookups (first 3 rounds) in 5 cache traces. The table lookups in the first round are more likely to cause misses. The probability of a cache hit increases in the following rounds. However, even after 48 lookups, there is still a high probability that the 16 cache lines of the 256B table have not all been accessed yet, consistent with the analysis in Section 4.3. Let the table lookup index be $y_i$ ($1 \le i \le 48$). Fig. 4(b) shows the number of deductions for the 48 table lookup indexes of the 5 cache traces. This number increases with the lookup position and ranges from 0 to 15 for these 5 traces.
Take the 8th and 9th lookups ($l_8$ and $l_9$) of the first sample in Fig. 4(a) as examples. For $l_8$, a cache miss is observed. Then the impossible deduction set of $\langle y_8\rangle$ is $\bar{D} = \{\langle y_1\rangle, \langle y_2\rangle, \langle y_3\rangle, \langle y_5\rangle, \langle y_6\rangle, \langle y_7\rangle\}$, $S_n = 6$. Note that all the variables of $\bar{D}$ have already been represented in the AES algebraic equation system, so $n_{v,\varphi} = 0$, $n_{e,\varphi} = 0$. We only need to count the newly introduced variables and equations of Eq. (6). According to Section 2.2, 30 ANF equations with 30 variables can be generated. For $l_9$, a cache hit happens. From Section 2.2, the possible deduction set of $\langle y_9\rangle$ (the higher four bits of $y_9$) is $D = \{\langle y_1\rangle, \langle y_2\rangle, \langle y_3\rangle, \langle y_5\rangle, \langle y_6\rangle, \langle y_7\rangle, \langle y_8\rangle\}$, $S_p = 7$. As all the variables of D have been represented in the AES system, according to Section 2.2, 57 ANF equations with 35 variables can be added to the equation system.
For some table lookups, it is hard to tell whether they are cache misses or hits because the peak is not high enough. In our MDASCA, we treat uncertain cache
events as cache hits. In some other scenarios, partially preloaded cache is also
considered and more cache hits are observed. Our MDASCA only utilizes cache
misses and still works in this case.
As in [15], we conduct several TRLM based MDASCAs on AES considering three scenarios: with both cache hit and miss events, with cache miss events only, and with cache miss events and four preloaded cache lines. Each attack in the three cases is repeated 100 times. To accelerate the equation solving procedure,
we also input the candidates of 4 key bits into the equation system and launch
16 ASCA instances corresponding to 16 possible candidates. The comparisons
of our results with previous work are listed in Table 4.
Table 4. Comparison of our results with previous work:

Attacks  | Utilized collisions | Collision type | Preloaded cache lines | Sample size | Key space | Time
[13]     | 16 lookups          | H/M            | 0                     | 14.5        | 2^68      | -
[14]     | 18 lookups          | H/M            | 0                     | 30          | 2^30      | -
[15]     | 20 lookups          | H/M            | 0                     | 30          | 10        | -
MDASCA   | 48 lookups          | H/M            | 0                     | 5 (6)       | 1         | 1 hour (5 minutes)
[15]     | 20 lookups          | M              | 0                     | 61          | -         | -
MDASCA   | 48 lookups          | M              | 0                     | 10          | 1         | 1 hour
[15]     | 20 lookups          | M              | 4                     | 119         | -         | -
MDASCA   | 48 lookups          | M              | 4                     | 24          | 1         | 1 hour
From Table 4, TRLM based MDASCA can exploit the cache behavior of three AES rounds (48 table lookups) and achieves better results than previous work [14, 15]. As few as five cache traces suffice to recover the 128-bit AES key within an hour. The complexity of both the online (number of measurements) and offline (recovering the key from the measurements) phases has been reduced. Moreover, the results are consistent with the theoretical analysis in Section 4.3.
6 Impact of MDASCA
References
1. Acıiçmez, O., Koç, Ç.K.: Trace Driven Cache Attack on AES. In: Rhee, M.S., Lee,
B. (eds.) ICISC 2006. LNCS, vol. 4296, pp. 112–121. Springer, Heidelberg (2006)
2. Bangerter, E., Gullasch, D., Krenn, S.: Cache Games - Bringing Access-Based
Cache Attacks on AES to Practice. In: IEEE S&P 2011, pp. 490–505 (2011)
3. Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F.X., Veyrat-Charvillon,
N.: Mutual Information Analysis: A Comprehensive Study. Journal of Cryptol-
ogy 24, 269–291 (2011)
4. Bernstein, D.J.: Cache-timing attacks on AES (2004),
http://cr.yp.to/papers.html#cachetiming
5. Berthold, T., Heinz, S., Pfetsch, M.E., Winkler, M.: SCIP – solving constraint integer programs. In: SAT 2009 (2009)
6. Bertoni, G., Zaccaria, V., Breveglieri, L., Monchiero, M., Palermo, G.: AES Power
Attack Based on Induced Cache Miss and Countermeasure. In: ITCC 2005, pp.
586–591. IEEE Computer Society (2005)
7. Bonneau, J.: Robust Final-Round Cache-Trace Attacks Against AES. Cryptology
ePrint Archive (2006), http://eprint.iacr.org/2006/374.pdf
8. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage
Model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp.
16–29. Springer, Heidelberg (2004)
9. Courtois, N., Pieprzyk, J.: Cryptanalysis of Block Ciphers with Overdefined Sys-
tems of Equations. In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp.
267–287. Springer, Heidelberg (2002)
10. Courtois, N., Ware, D., Jackson, K.: Fault-Algebraic Attacks on Inner Rounds of
DES. In: eSmart 2010, pp. 22–24 (September 2010)
11. Dinur, I., Shamir, A.: Side Channel Cube Attacks on Block Ciphers. Cryptology
ePrint Archive (2009), http://eprint.iacr.org/2009/127
12. Faugère, J.-C.: Gröbner Bases. Applications in Cryptology. In: FSE 2007 Invited
Talk (2007), http://fse2007.uni.lu/slides/faugere.pdf
13. Fournier, J., Tunstall, M.: Cache Based Power Analysis Attacks on AES. In:
Batten, L.M., Safavi-Naini, R. (eds.) ACISP 2006. LNCS, vol. 4058, pp. 17–28.
Springer, Heidelberg (2006)
14. Gallais, J.-F., Kizhvatov, I., Tunstall, M.: Improved Trace-Driven Cache-Collision
Attacks against Embedded AES Implementations. In: Chung, Y., Yung, M. (eds.)
WISA 2010. LNCS, vol. 6513, pp. 243–257. Springer, Heidelberg (2011)
15. Gallais, J.-F., Kizhvatov, I.: Error-Tolerance in Trace-Driven Cache Collision Attacks.
In: COSADE 2011, pp. 222–232 (2011)
16. Goyet, C., Faugère, J.-C., Renault, G.: Analysis of the Algebraic Side Channel Attack.
In: COSADE 2011, pp. 141–146 (2011)
17. Handschuh, H., Preneel, B.: Blind Differential Cryptanalysis for Enhanced Power
Attacks. In: Biham, E., Youssef, A.M. (eds.) SAC 2006. LNCS, vol. 4356, pp. 163–
173. Springer, Heidelberg (2007)
18. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
19. Knudsen, L.R., Miolane, C.V.: Counting equations in algebraic attacks on block
ciphers. International Journal of Information Security 9(2), 127–135 (2010)
20. Lauradoux, C.: Collision Attacks on Processors with Cache and Countermeasures.
In: WEWoRC 2005. LNI, vol. 74, pp. 76–85 (2005)
21. Improved Differential Fault Analysis of Trivium. In: COSADE 2011, pp. 147–158
(2011)
22. Neve, M., Seifert, J.-P.: Advances on Access-Driven Cache Attacks on AES. In: Bi-
ham, E., Youssef, A.M. (eds.) SAC 2006. LNCS, vol. 4356, pp. 147–162. Springer,
Heidelberg (2007)
23. FIPS 197, Advanced Encryption Standard, Federal Information Processing Stan-
dard, NIST, U.S. Dept. of Commerce, November 26 (2001)
24. Oren, Y., Kirschbaum, M., Popp, T., Wool, A.: Algebraic Side-Channel Analysis
in the Presence of Errors. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 428–442. Springer, Heidelberg (2010)
25. Osvik, D.A., Shamir, A., Tromer, E.: Cache Attacks and Countermeasures: The
Case of AES. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 1–20.
Springer, Heidelberg (2006)
26. Percival, C.: Cache missing for fun and profit (2005),
http://www.daemonology.net/hyperthreading-considered-harmful/
27. Renauld, M., Standaert, F.-X.: Algebraic Side-Channel Attacks. In: Bao, F., Yung,
M., Lin, D., Jing, J. (eds.) Inscrypt 2009. LNCS, vol. 6151, pp. 393–410. Springer,
Heidelberg (2010)
28. Renauld, M., Standaert, F., Veyrat-Charvillon, N.: Algebraic Side-Channel Attacks
on the AES: Why Time also Matters in DPA. In: Clavier, C., Gaj, K. (eds.) CHES
2009. LNCS, vol. 5747, pp. 97–111. Springer, Heidelberg (2009)
29. Renauld, M., Standaert, F.-X.: Representation-, Leakage- and Cipher- Dependen-
cies in Algebraic Side-Channel Attacks. In: Industrial Track of ACNS 2010 (2010)
30. Roche, T.: Multi-Linear Cryptanalysis in Power Analysis Attacks, MLPA. CoRR abs/0906.0237 (2009)
31. Schramm, K., Wollinger, T.J., Paar, C.: A New Class of Collision Attacks and
Its Application to DES. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp.
206–222. Springer, Heidelberg (2003)
32. Shannon, C.E.: Communication theory of secrecy systems. Bell System Technical
Journal 28 (1949); see in particular page 704
33. Soos, M., Nohl, K., Castelluccia, C.: Extending SAT Solvers to Cryptographic
Problems. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 5584, pp. 244–257.
Springer, Heidelberg (2009)
34. Whitnall, C., Oswald, E., Mather, L.: An Exploration of the Kolmogorov-Smirnov
Test as Competitor to Mutual Information Analysis. Cryptology ePrint Archive
(2011), http://eprint.iacr.org/2011/380.pdf
Algorithm 1 computes ξ(x) from two input parameters. The first is n, the number of bits in x. The second is m. If m is 1, the algorithm outputs ξ(x) for the case where HW(x) is known. Otherwise, it outputs ξ(x) for the case where both HW(x) and HW(S(x)) are known, where S(x) is the S-Box result of x.
1 Introduction
Side channel analysis utilizes physical leakage that is emitted during the execu-
tion of cryptographic devices in order to recover a secret. Among side channel
attacks, profiling based side channel attacks are considered to be the most effec-
tive attacks when a strong adversary is assumed. In profiling based side channel
attacks an adversary utilizes a training device, over which he has full control,
in order to gain additional knowledge for the attack against an identical target
device. A common profiling based side channel attack, the so-called template attack, was introduced as the most powerful type of profiling based side channel
attack from an information theoretical point of view [3]. However, since the tem-
plate attack requires many power traces in order to correctly model the power
consumption of the device, further profiling based side channel attacks were sug-
gested. A relatively new suggestion deals with machine learning techniques [8,13],
in particular support vector machines (SVM) [18]. These contributions focus on
the SVM as a binary classification method. The actual strength of SVM, i.e. the
ability to generalize a given problem, is not tackled and thus the full potential
of SVM in the area of side channel analysis is not utilized.
In this contribution we highlight the ability of SVM to build a generalized model from an underspecified profiling set by introducing the so-called SVM
W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 249–264, 2012.
c Springer-Verlag Berlin Heidelberg 2012
250 A. Heuser and M. Zohner
attack. The SVM attack is a profiling based side channel attack that reveals
cryptographic secrets by using SVM to predict the Hamming weight for a given
power consumption. We highlight the ability of SVM to build a generalized
model from a given profiling set by evaluating the required number of attack
traces to achieve a fixed guessing entropy for the SVM attack on power traces
with different noise levels and for a varying number of profiling traces. We show
that the SVM attack is better suited than the template attack when attacking
power traces with a high noise level and when given an underspecified profiling
base. Thus, the SVM attack lessens the significance of huge profiling bases, which
is the main drawback for template attacks.
2 Preliminaries
In this section we provide the reader with all necessary information about pro-
filing based side channel analysis, followed by an introduction to the area of
machine learning and support vector machines.
number of power trace vectors of the class c. Since template attacks rely on a
multivariate Gaussian noise model, the power trace vectors are considered to be
drawn from a multivariate distribution. More precisely,
Intelligent Machine Homicide 251
$$\mathcal{N}(l_c\,|\,\mu_c, \Sigma_c) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_c|^{1/2}}\,\exp\!\Big\{-\frac{1}{2}(l_c-\mu_c)^T\,\Sigma_c^{-1}\,(l_c-\mu_c)\Big\} \tag{1}$$

with

$$\tilde{\mu}_c = \frac{1}{N_c}\sum_{n_c=1}^{N_c} l_{n_c}, \qquad \tilde{\Sigma}_c = \frac{1}{N_c}\sum_{n_c=1}^{N_c}(l_{n_c}-\tilde{\mu}_c)(l_{n_c}-\tilde{\mu}_c)^T.$$

$$\log L_{k^*} \equiv \log \prod_{i=1}^{N_2} P(l_i\,|\,c) = \sum_{i=1}^{N_2} \log \mathcal{N}(l_i\,|\,\mu_c, \Sigma_c), \tag{2}$$
where the class c is calculated according to the leakage model given a key guess
k ∗ and an input.
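A minimal univariate sketch of the template attack described by Eqs. (1) and (2), on synthetic data with one point of interest and three classes; all numeric values are invented for illustration:

```python
import math, random

random.seed(0)

# Profiling phase: estimate (mu_c, var_c) per class from labeled traces.
classes = range(3)
templates = {}
for c in classes:
    traces = [random.gauss(6 * c, 0.5) for _ in range(200)]
    mu = sum(traces) / len(traces)
    var = sum((t - mu) ** 2 for t in traces) / len(traces)
    templates[c] = (mu, var)

def log_likelihood(l, mu, var):
    """Log of the Gaussian density of Eq. (1) for one point of interest."""
    return -0.5 * (math.log(2 * math.pi * var) + (l - mu) ** 2 / var)

# Attack phase (Eq. (2)): pick the class with maximal log-likelihood.
attack_trace = random.gauss(6 * 2, 0.5)  # a fresh trace of class c = 2
best = max(classes, key=lambda c: log_likelihood(attack_trace, *templates[c]))
print(best)  # -> 2
```

A real template attack uses the multivariate density over several points of interest, as in Eq. (1), but the classify-by-maximum-likelihood step is the same.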
In this section we describe the idea of classifying linearly separable data us-
ing support vector machines (SVM). Suppose we have a training set with N1
instances1 and a test set with N2 instances. Each instance in the training set
contains one assignment yi (i.e. a class label) and several attributes xi (i.e. fea-
tures2 or observed variables) with i = 1, ...N1 . Using SVM, the goal is to classify
each test instance xi with i = 1, . . . , N2 according to the corresponding data
attributes. In the following we restrict our focus on a binary classification prob-
lem and describe its extension in Subsection 2.5. Given a training set of pairs
(xi , yi ) with xi ∈ Rn and yi ∈ {±1}. Then the different attributes can be clas-
sified via a hyperplane H described as w, x + b = 0, where w ∈ Rn denotes
b
the normal to the hyperplane, w the perpendicular distance to the origin with
b ∈ R, and ·, · the dot-product in Rn . One chooses the primal decision function
τp (x) = sgn(w, x + b) to predict the class of the test data, cf. Figure 1. Thus,
one has to select the parameters w and b, which describe the hyperplane H.
While there exist many possible linear hyperplanes that separate two classes,
only one unique hyperplane maximizes the margin between the two classes. The
construction of the optimal separating hyperplane is discussed in the following.
Optimal Hyperplane Separation. Let us now consider the points that lie
closest to the separating hyperplane, i.e. the support vectors (filled black in
Figure 1). Moreover, let us define the hyperplane on which the support vectors
lie with H1 and H2 and let d1 and d2 be the respective distances of H1 and H2
¹ In the context of side channel analysis, instances are called measurements.
² In the context of side channel analysis, features are relevant points in time.
(Figure 1: a separating hyperplane with maximal margin; support vectors filled black, classes labeled $y_i = \pm 1$.)

Maximizing the margin leads to the optimization problem

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i(\langle w, x_i\rangle + b) \ge 1, \quad i = 1, \ldots, n. \tag{3}$$

The corresponding dual problem is

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,\langle x_i, x_j\rangle \tag{4}$$

$$\text{s.t.}\quad \alpha_i \ge 0,\ i = 1, \ldots, n \quad\text{and}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,$$

with the dual decision function

$$\tau_d(x) = \mathrm{sgn}\Big(\sum_{i=1}^{n}\alpha_i y_i\,\langle x, x_i\rangle + b\Big). \tag{5}$$
Note that this decision requires only the calculation of the dot-product of each in-
put vector xi , which is important for the kernel trick described in
Subsection 2.4.
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i. \tag{6}$$
Since for large ξi the constraints can always be met, an additional constant C > 0
is introduced, in order to determine the trade-off between margin maximization
and training error minimization. The conversion into the dual form is similar to the standard case (see [17] for details).
a binary classifier exist, e.g. one-against-one [7], one-against-all [20], and error coding [4]. Since all extensions perform similarly [11], we restrict ourselves to the description of the one-against-one strategy. The one-against-one extension trains a binary classifier for each possible pair of classes. Thus, for M classes, $(M-1)M/2$ binary classifiers are trained. The predictions of all binary classifiers are combined into the prediction of the multi-class classifier, and the class with the most votes is chosen. For more details, we refer to [7, 11].
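The one-against-one combination step can be sketched as follows; the pairwise "classifiers" here are a stand-in nearest-value rule, not trained SVMs:

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(x, classes, binary_predict):
    """Majority vote over the (M-1)M/2 pairwise classifiers.
    binary_predict(a, b, x) must return either a or b for instance x."""
    votes = Counter(binary_predict(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy pairwise rule standing in for a trained binary SVM: each classifier
# assigns x to whichever of its two classes is numerically closer.
nearest = lambda a, b, x: a if abs(x - a) < abs(x - b) else b

classes = range(9)  # e.g. the nine Hamming weight classes 0..8
assert len(list(combinations(classes, 2))) == 36  # (M-1)M/2 for M = 9
print(one_vs_one_predict(3.2, classes, nearest))  # -> 3
```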
$$\log L_{k^*} \equiv \log \prod_{i=1}^{N} P_{SVM}(l_i\,|\,c) = \sum_{i=1}^{N} \log P_{SVM}(l_i\,|\,c), \tag{7}$$

and the key candidate is chosen as

$$\arg\max_{k^*}\ \log L_{k^*}. \tag{8}$$
In our experiments we use the guessing entropy to evaluate how many attack traces are required to achieve a fixed guessing entropy. We fix the guessing entropy by defining two thresholds: a guessing entropy of 1 ($GE_1$) and a guessing entropy below 5 ($GE_5$).
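Concretely, the guessing entropy can be taken as the average rank of the correct key over repeated attacks; a minimal sketch (key values invented for illustration):

```python
def guessing_entropy(ranked_key_lists, correct_key):
    """Average rank (1 = best) of the correct key over several experiments;
    the GE_1 threshold is reached when this average equals 1."""
    ranks = [ranked.index(correct_key) + 1 for ranked in ranked_key_lists]
    return sum(ranks) / len(ranks)

# Two attack runs ranking three key candidates; the correct key 0x2A
# comes first in one run and third in the other:
runs = [[0x2A, 0x11, 0x07], [0x11, 0x07, 0x2A]]
print(guessing_entropy(runs, 0x2A))  # -> 2.0
```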
4 Experimental Results
In the following, we describe the experimental setup and the results of the com-
parison between the SVM attack and template attack. To present the results, we
first identify the influence of the parameters of SVM. Subsequently, we utilize
the knowledge of the effect of the parameters in order to determine the best set
of parameters for each scenario and state the corresponding results.
In order to obtain traces with different noise levels, we added white Gaussian
noise to the ATMega-256-1 measurements. For the traces with a low noise level
we used the original microcontroller measurements. The traces with a medium
noise level were acquired by adding 30 dB of white Gaussian noise. Lastly,
15 dB of white Gaussian noise was added to the microcontroller measurements
to obtain the traces with a high noise level. As SVM implementation we applied
the C-SVC implementation of libsvm [2], which uses the one-against-one multi-
class strategy and predicts the probability output PSV M . We trained the SVM
on a profiling base starting from 180 profiling traces and increased the number
of profiling traces by 180 after each evaluation until we reached a profiling base
of 2700 profiling traces.
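A sketch of how such noisy traces could be generated, assuming the dB figures denote the signal-to-noise ratio of the resulting trace (the paper does not state its exact noise convention, so this reading is an assumption):

```python
import math
import random

def add_white_gaussian_noise(trace, snr_db, rng=random.Random(1)):
    """Add zero-mean Gaussian noise scaled so that the ratio of signal
    power to noise power equals snr_db (one way to read '30 dB of
    white Gaussian noise')."""
    signal_power = sum(s * s for s in trace) / len(trace)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in trace]

clean = [0.5, -0.3, 0.4, -0.2] * 100        # toy power trace
noisy_medium = add_white_gaussian_noise(clean, 30)  # medium noise level
noisy_high = add_white_gaussian_noise(clean, 15)    # high noise level
```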
All experiments were performed using profiling traces in which each Hamming
weight occurred equally often. The equal distribution of Hamming weights is
beneficial for the evaluation using the guessing entropy, since the prediction of
Hamming weights is then independent of their distribution. Note that even if
the attacker is not able to choose the plaintexts in the profiling phase such that
all Hamming weights occur equally often, an error weight can be introduced
during the training of the SVM [18]. This weight is used to penalize errors for
different Hamming weights differently, such that an equally distributed profiling
base can be simulated.
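Such error weights can be sketched as penalties inversely proportional to class frequency, a common heuristic (libsvm exposes per-class cost weights for this purpose; the binomial counts below are illustrative):

```python
def balancing_error_weights(counts):
    """Per-class penalty weights that make an unbalanced profiling base
    behave like a balanced one during training: rare Hamming weights
    get proportionally larger misclassification costs."""
    total = sum(counts.values())
    m = len(counts)
    return {hw: total / (m * n) for hw, n in counts.items()}

# Hamming weights of a uniformly random byte are binomially distributed.
counts = {0: 4, 1: 32, 2: 112, 3: 224, 4: 280, 5: 224, 6: 112, 7: 32, 8: 4}
weights = balancing_error_weights(counts)
# HW 4 occurs 70x more often than HW 0, so HW 0 errors cost 70x more.
assert abs(weights[0] / weights[4] - 70) < 1e-9
```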
[Figure 2: (a) two-dimensional space for times A and B (axes: power consumption at times A and B; instances colored by Hamming weight HW 0–8); (b) density for time A]
Figure 2a depicts the instances in a two-dimensional space with the two axes
representing the power consumptions at times A and B, whereas each Hamming
weight is colored distinctly. Figure 2b shows the density of each Hamming weight
for time A. Note that the instances are visibly distinguishable by their Hamming
weight in both figures and there are only few conflicting instances (i.e., instances
that have the same feature values but a different Hamming weight).
Next, we executed C-SVC with varying parameters on the training instances
and evaluated the guessing entropy on the attack traces. The libsvm framework
allowed us to vary the cost of a wrong classification, the termination criterion,
and the kernel function. The tested kernels were the linear kernel, the RBF
kernel, the polynomial kernel, the power kernel, the hybrid kernel, and the log
kernel [10]. The results indicated that the RBF kernel, with a cost factor of 10
and a termination criterion of 0.02, performed best for the low noise traces.
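For reference, the RBF kernel is K(x, y) = exp(−γ‖x − y‖²); a minimal sketch (the value of γ is illustrative, not the one used in the experiments):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2):
    1.0 for identical inputs, decaying toward 0 with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

assert rbf_kernel([1.0, 2.0], [1.0, 2.0]) == 1.0   # identical points
assert rbf_kernel([0.0], [10.0]) < 1e-6            # distant points
```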
From our experiments we deduced that the cost factor affects the adaptation of
C-SVC to errors. If the cost factor is chosen high, C-SVC tries to separate the
instances while making as few errors as possible. While a minimization of errors
sounds desirable at first, it decreases the ability of C-SVC to generalize a problem;
a high cost factor should thus only be chosen when there are very few contradicting
instances.
The termination criterion, on the other hand, specifies the optimality con-
straint for the constructed hyperplane [18]. If chosen high, C-SVC is more likely
to find a hyperplane in a small number of iterations. However, since C-SVC re-
lies on an optimization problem, the resulting hyperplane for a high termination
criterion may be adequate but not optimal.
Lastly, we varied the input features, i.e., the number of relevant time instances
of the power trace. Starting from the two points in time with the highest cor-
relation, we increased the number of input features until we trained the SVM
on the eight points in time with the highest correlation. For our experiments,
four input features, i.e., the four points in time that leak the most information
about the processed variable, resulted in the smallest number of required attack
traces.
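The correlation-based selection of input features can be sketched as follows; the toy traces and labels are invented for illustration:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def top_k_points(traces, labels, k):
    """Indices of the k time samples whose power consumption correlates
    most strongly (in absolute value) with the Hamming weight labels."""
    n_points = len(traces[0])
    corr = [abs(pearson([t[j] for t in traces], labels))
            for j in range(n_points)]
    return sorted(range(n_points), key=lambda j: corr[j], reverse=True)[:k]

# Toy traces: sample 1 leaks the label; samples 0 and 2 are mostly noise.
traces = [[0.1, 1.0, 0.2], [0.2, 2.1, 0.1], [0.1, 2.9, 0.2], [0.2, 4.0, 0.1]]
labels = [1, 2, 3, 4]
print(top_k_points(traces, labels, 1))  # prints [1]
```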
Low-Noised Traces. The first comparison was performed on the original mi-
crocontroller traces, using the parameters determined in Section 4.2, i.e., the
RBF kernel, a cost factor of 10, and a termination criterion of 0.02. For both
the SVM attack and the template attack, we computed the guessing entropy for
an increasing profiling base. Figure 3a and Figure 3b depict the resulting classi-
fiers. The results of these experiments are listed in Table 1 and indicate that the
number of attack traces required for recovering the correct key is nearly equal
for both attacks.
Intelligent Machine Homicide 259
[Figure 3: resulting classifiers in the two-dimensional space (axes: power consumption at times A and B; Hamming weights 0–8)]
Also, the performance of both attacks stabilizes after only 20 profiling traces
per Hamming weight. This result was expected, since the instances of the
different Hamming weights could even be distinguished visually. Thus, reaching
the required guessing entropy threshold requires only the attack traces that
are needed to uniquely characterize the key.
Table 1. Guessing entropy for the SVM and template attack on traces with a low
noise level and a varying number of profiling traces per Hamming weight
[Figure: (a) two-dimensional space for times A and B (moderate noise level; Hamming weights HW 0–8); (b) density for time A]
Table 2. Guessing entropy for the SVM and template attack on traces with a moderate
noise level and a varying number of profiling traces per Hamming weight
Noticeable for this experiment is the poor performance of the SVM attack
compared to the template attack on a very small profiling base, i.e., 20 profiling
traces per Hamming weight. Given such a small profiling base, the template
attack manages to find the correct key using only few attack traces, while the
SVM attack is not able to find the correct key even when using all 1000 attack
traces. However, if given more profiling traces, the SVM attack quickly surpasses
the template attack in terms of guessing entropy.
High-Noised Traces. The last experiment was conducted on traces with a high
noise level. Figures 5a and 5b depict the instance distribution and the Hamming
weight densities for these traces. As expected, the high noise level makes the
instances very hard to distinguish, and a trend is only observable for Hamming
weight 0 and Hamming weight 8. However, because of the normally distributed
noise, we still expect each Hamming weight to have a high concentration of
instances around its respective expectation value. Thus, we chose the same cost
factor as for the traces with a moderate noise level.
[Figure 5: (a) two-dimensional space for times A and B (high noise level; Hamming weights HW 0–8); (b) density for time A]
Table 3. Guessing entropy for the SVM and template attack on traces with a high
noise level and a varying number of profiling traces per Hamming weight
The results of the corresponding experiments are listed in Table 3. Because
of the high noise level, the required number of attack traces for the guessing
entropy rises drastically. Neither the template attack nor the SVM attack is
able to recover the correct key, or even to narrow the key space down to five
possible keys, within 1000 attack traces if fewer than 60 profiling traces per
Hamming weight are used. However, if the profiling base is increased to 60 traces
per Hamming weight, the SVM attack manages to find the correct key. Using
the same profiling base, the template attack is only able to narrow the key space
down to 5 possible key values. The template attack first manages to recover the
correct key using a profiling base of 280 traces per Hamming weight. However,
even then the template attack still requires roughly twice as many attack traces
as the SVM attack.
Noticeable about the results is that, just like for the traces with a moderate
noise level, the SVM attack very quickly reaches a point where it fluctuates
around a certain number of attack traces, compared to the template attack,
which decreases the number of attack traces steadily but slowly. This is
observable for GE 1 as well as for GE 5, where the SVM attack starts
fluctuating at a profiling base of 80 traces per Hamming weight, whereas the
template attack continues to decrease the required attack traces in GE 5.
262 A. Heuser and M. Zohner
5 Conclusion
In this paper we presented a new profiling based side channel attack, the so-called
SVM attack. The SVM attack utilizes the machine learning algorithm SVM in
order to classify the Hamming weight of an intermediate value, which depends
on a secret key. In order to evaluate the gain of the SVM attack, we compared
it to the template attack. The comparison between the SVM attack and the
template attack was conducted by evaluating the number of traces in the attack
phase that are required to achieve a pre-fixed guessing entropy on a variable-sized
profiling base and by varying the noise level of the traces. While the template
attack required fewer attack traces when the noise level was low, the SVM attack
outperformed the template attack on traces with a higher noise level. This can
be explained by the different focus of templates and SVM. While templates try
to model the complete power consumption distribution of a device by taking all
elements into account, SVM focuses on the separation of classes, using only some
conflicting instances and the support vectors. Thus, SVM disregards instances
that are not important for the separation of classes, which allows SVM to achieve
a stable performance using a smaller profiling base than the template attack.
Future work may concentrate on the ability of SVM to generalize a given
problem. A possible scenario for this generalization is a profiling-based attack
that conducts the profiling phase on one device and performs the attack phase
on another device that is identical to the profiled device. This scenario is
especially interesting since it represents a practical profiling-based attack on a
device. Additionally, we plan to analyze further machine learning methods in
order to better adapt SVM to the challenges in the area of side channel analysis.
References
1. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template At-
tacks in Principal Subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 1–14. Springer, Heidelberg (2006)
2. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011),
http://www.csie.ntu.edu.tw/~cjlin/libsvm
3. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
4. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-
correcting output codes. J. Artif. Int. Res. 2, 263–286 (1995),
http://dl.acm.org/citation.cfm?id=1622826.1622834
5. Elaabid, M.A., Guilley, S., Hoogvorst, P.: Template attacks with a power model.
IACR Cryptology ePrint Archive 2007, 443 (2007)
6. Gierlichs, B., Lemke-Rust, K., Paar, C.: Templates vs. Stochastic Methods. In:
Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 15–29. Springer,
Heidelberg (2006)
7. Hastie, T., Tibshirani, R.: Classification by pairwise coupling (1998)
8. Hospodar, G., Mulder, E.D., Gierlichs, B., Verbauwhede, I., Vandewalle, J.: Least
square support vector machines for side-channel analysis. In: Constructive Side-
Channel Analysis and Secure Design, COSADE (2011)
9. Kasper, M., Schindler, W., Stöttinger, M.: A stochastic method for security evalua-
tion of cryptographic FPGA implementations. In: IEEE International Conference on
Field-Programmable Technology (FPT 2010), pp. 146–154. IEEE Press (December
2010)
10. Kiely, T., Gielen, G.: Performance modeling of analog integrated circuits using
least-squares support vector machines. In: Proceedings of the Design, Automation
and Test in Europe Conference and Exhibition, vol. 1, pp. 448–453 (February 2004)
11. Kreßel, U.H.G.: Pairwise classification and support vector machines, pp. 255–268.
MIT Press, Cambridge (1999),
http://dl.acm.org/citation.cfm?id=299094.299108
12. Lemke-Rust, K., Paar, C.: Analyzing Side Channel Leakage of Masked Implemen-
tations with Stochastic Methods. In: Biskup, J., López, J. (eds.) ESORICS 2007.
LNCS, vol. 4734, pp. 454–468. Springer, Heidelberg (2007)
13. Lerman, L., Bontempi, G., Markowitch, O.: Side channel attack: an approach based
on machine learning. In: Constructive Side-Channel Analysis and Secure Design,
COSADE (2011)
14. Mohamed, M.S.E., Bulygin, S., Zohner, M., Heuser, A., Walter, M.: Improved
algebraic side-channel attack on AES. Cryptology ePrint Archive, Report 2012/084
(2012)
15. Rechberger, C., Oswald, E.: Practical Template Attacks. In: Lim, C.H., Yung, M.
(eds.) WISA 2004. LNCS, vol. 3325, pp. 440–456. Springer, Heidelberg (2005)
16. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Chan-
nel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659,
pp. 30–46. Springer, Heidelberg (2005)
17. Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector
algorithms. Neural Comput. 12, 1207–1245 (2000),
http://dl.acm.org/citation.cfm?id=1139689.1139691
18. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Reg-
ularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
19. Standaert, F.X., Malkin, T.G., Yung, M.: A unified framework for the analysis of
side-channel key recovery attacks (extended version). Cryptology ePrint Archive,
Report 2006/139 (2006)
20. Weston, J., Watkins, C.: Multi-class support vector machines (1998)
21. Wu, T.F., Lin, C.J., Weng, R.C.: Probability estimates for multi-class classification
by pairwise coupling. Journal of Machine Learning Research 5, 975–1005 (2003)
Author Index
Endo, Takashi 105
Fischer, Viktor 151, 167
Flottes, Marie-Lise 89
Giraud, Christophe 69
Guilley, Sylvain 183
Guo, Shize 231
He, Wei 39
Heuser, Annelie 249
Hoogvorst, Philippe 183
Hutter, Michael 1, 17
Renner, Soline 69
Riesgo, Teresa 39
Rivain, Matthieu 69
Robisson, Bruno 151
Rouzeyre, Bruno 89
Sakurai, Kouichi 135
Schmidt, Jörn-Marc 1
Shi, Zhijie 231
Stöttinger, Marc 215
Vadnala, Praveen Kumar 69
Verbauwhede, Ingrid 89
Vuillaume, Camille 105