
Lecture Notes in Computer Science 7275

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Werner Schindler Sorin A. Huss (Eds.)

Constructive
Side-Channel Analysis
and Secure Design

Third International Workshop, COSADE 2012


Darmstadt, Germany, May 3–4, 2012
Proceedings

Volume Editors

Werner Schindler
Bundesamt für Sicherheit in der Informationstechnik (BSI)
Godesberger Allee 185–189
53175 Bonn, Germany
E-mail: werner.schindler@bsi.bund.de
Sorin A. Huss
Technische Universität Darmstadt
Hochschulstr. 10
64289 Darmstadt, Germany
E-mail: huss@iss.tu-darmstadt.de

ISSN 0302-9743 e-ISSN 1611-3349


ISBN 978-3-642-29911-7 e-ISBN 978-3-642-29912-4
DOI 10.1007/978-3-642-29912-4
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2012936495

CR Subject Classification (1998): E.3, D.4.6, K.6.5, C.2, J.1, G.2.1

LNCS Sublibrary: SL 4 – Security and Cryptology

© Springer-Verlag Berlin Heidelberg 2012


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface

COSADE 2012, the Third Workshop on Constructive Side-Channel Analysis


and Secure Design, was held in Darmstadt, Germany, during May 3–4, 2012.
COSADE 2012 was supported by CASED and its partners TU Darmstadt and
Fraunhofer SIT as well as by the German Federal Office for Information Security
(Bundesamt für Sicherheit in der Informationstechnik, BSI).
For researchers and experts from academia, industry and government who are
interested in attacks on cryptographic implementations and/or secure design,
COSADE workshops present a great opportunity to meet and enjoy intensive
discussions.
The program provides plenty of time for information exchange, for the further
development of existing scientific collaborations, and for the establishment of new ones.
This year 49 papers from several areas such as side-channel analysis, fault
analysis, secure design, and architectures were submitted. Each paper was as-
signed to three reviewers. The decision process was very challenging and resulted
in the selection of 16 interesting papers. Their carefully revised versions are con-
tained in the conference proceedings.
The Program Committee consisted of 33 members from 12 countries. The
members were carefully selected to represent both academia and industry, as
well as to include high-profile experts with research relevant to COSADE 2012.
The Program Committee was supported by 48 external reviewers. We are deeply
grateful to the members of the Program Committee as well as to the external
reviewers for their dedication and hard work.
Besides 16 contributed presentations, two highly relevant invited talks were
held. Mathias Wagner considered “700+ Attacks Published on Smart Cards:
The Need for a Systematic Counter Strategy,” while Viktor Fischer gave “A
Close Look at Security in Random Number Generators Design.” The workshop
program included special sessions. The presentation “Using Multi-Area Diode
Lasers and Developing EM FI Tools” considered fault injection attacks. More-
over, the outcome of DPA contest v3 was presented at COSADE 2012, and DPA
contest v4 was announced.
COSADE 2012 also had a Work in Progress session where cutting-edge re-
search results were presented. These contributions are not contained in this
volume because their submission deadline was after the editorial deadline of
these proceedings.
We are also very grateful to Annelie Heuser, Michael Kasper, Marc Stöttinger
and Michael Zohner for the local organization. Finally, we would like to pro-
foundly thank and give our regards to all the authors who submitted their pa-
pers to this workshop, and entrusted us with a fair and objective evaluation of
their work. We appreciate their creativity, hard work, and interesting results.

March 2012 Werner Schindler


Sorin A. Huss
Third International Workshop on Constructive
Side-Channel Analysis and Secure Design

Darmstadt, Germany, May 3–4, 2012

General Chairs and Program Chairs


Werner Schindler Bundesamt für Sicherheit in der
Informationstechnik (BSI), Germany
Sorin A. Huss Integrated Circuits and Systems Lab (ICS),
Technische Universität Darmstadt, Germany

Local Organizers
Annelie Heuser Technische Universität Darmstadt, Germany
Michael Kasper Fraunhofer SIT, Germany
Marc Stöttinger Technische Universität Darmstadt, Germany
Michael Zohner Technische Universität Darmstadt, Germany

Program Committee
Onur Acıiçmez Samsung Electronics, USA
Guido Bertoni ST Microelectronics, Italy
Stanislav Bulygin TU Darmstadt, Germany
Ray Cheung City University of Hong Kong, Hong Kong
Jean-Luc Danger Télécom ParisTech, France
Markus Dichtl Siemens AG, Germany
Viktor Fischer Université de Saint-Etienne, France
Ernst-Günter Giessmann T-Systems International GmbH, Germany
Tim Güneysu Ruhr-Universität Bochum, Germany
Lars Hoffmann Giesecke & Devrient GmbH, Germany
Naofumi Homma Tohoku University, Japan
Marc Joye Technicolor, France
Jens-Peter Kaps George Mason University, USA
Çetin Kaya Koç University of California Santa Barbara, USA
and Istanbul Şehir University, Turkey
Arjen Lenstra EPFL, Switzerland
Pierre-Yvan Liardet ST Microelectronics, France
Stefan Mangard Infineon Technologies AG, Germany
Sandra Marcello Thales, France
David Naccache ENS Paris, France

Elisabeth Oswald University of Bristol, UK


Emmanuel Prouff Oberthur Technologies, France
Anand Rajan Intel Corporation, USA
Steffen Reith Hochschule RheinMain, Germany
Akashi Satoh RCIS, Japan
Patrick Schaumont Virginia Tech, Blacksburg, USA
Abdulhadi Shoufan Khalifa University Abu-Dhabi, UAE
Sergei Skorobogatov University of Cambridge, UK
Georg Sigl Technische Universität München, Germany
François-Xavier Standaert Université Catholique de Louvain, Belgium
Lionel Torres LIRMM, University of Montpellier 2, France
Ingrid Verbauwhede Katholieke Universiteit Leuven, Belgium
Marc Witteman Riscure, The Netherlands
Michael Waidner Fraunhofer SIT, Germany

External Reviewers
Michel Agoyan Bernhard Jungk Mathieu Renauld
Joppe Bos Markus Kasper Vladimir Rozic
Lilian Bossuet Michael Kasper Fabrizio de Santis
Pierre-Louis Cayrel Toshihiro Katashita Laurent Sauvage
Guillaume Duc Stéphanie Kerckhof Hermann Seuschek
Junfeng Fan Chong Hee Kim Marc Stöttinger
Lubos Gaspar Jiangtao Li Daehyun Strobel
Benedikt Gierlichs Marcel Medwed Mostafa Taha
Christophe Giraud Filippo Melzani Junko Takahashi
Sylvain Guilley Oliver Mischke Michael Tunstall
Yu-Ichi Hayashi Amir Moradi Rajesh Velegalati
Stefan Heyse Abdelaziz Moulay Markus Wamser
Matthias Hiller Nadia El Mrabet Michael Weiss
Philippe Hoogvorst Jean Nicolai Carolyn Withnall
Gabriel Hospodar David Oswald Meiyuan Zhao
Dimitar Jetchev Gilles Piret Michael Zohner
Table of Contents

Practical Side-Channel Analysis


Exploiting the Difference of Side-Channel Leakages . . . . . . . . . . . . . . . . . . . 1
Michael Hutter, Mario Kirschbaum, Thomas Plos,
Jörn-Marc Schmidt, and Stefan Mangard
Attacking an AES-Enabled NFC Tag: Implications from Design to a
Real-World Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Thomas Korak, Thomas Plos, and Michael Hutter

Invited Talk I
700+ Attacks Published on Smart Cards: The Need for a Systematic
Counter Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Mathias Wagner

Secure Design
An Interleaved EPE-Immune PA-DPL Structure for Resisting
Concentrated EM Side Channel Attacks on FPGA Implementation . . . . . 39
Wei He, Eduardo de la Torre, and Teresa Riesgo
An Architectural Countermeasure against Power Analysis Attacks for
FSR-Based Stream Ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Shohreh Sharif Mansouri and Elena Dubrova
Conversion of Security Proofs from One Leakage Model to Another:
A New Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Jean-Sébastien Coron, Christophe Giraud, Emmanuel Prouff,
Soline Renner, Matthieu Rivain, and Praveen Kumar Vadnala

Side-Channel Attacks on RSA


Attacking Exponent Blinding in RSA without CRT . . . . . . . . . . . . . . . . . . 82
Sven Bauer
A New Scan Attack on RSA in Presence of Industrial
Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Jean Da Rolt, Amitabh Das, Giorgio Di Natale, Marie-Lise Flottes,
Bruno Rouzeyre, and Ingrid Verbauwhede
RSA Key Generation: New Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Camille Vuillaume, Takashi Endo, and Paul Wooderson

Fault Attacks
A Fault Attack on the LED Block Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Philipp Jovanovic, Martin Kreuzer, and Ilia Polian

Differential Fault Analysis of Full LBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . 135


Liang Zhao, Takashi Nishide, and Kouichi Sakurai

Contactless Electromagnetic Active Attack on Ring Oscillator Based


True Random Number Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Pierre Bayon, Lilian Bossuet, Alain Aubert, Viktor Fischer,
François Poucheret, Bruno Robisson, and Philippe Maurine

Invited Talk II
A Closer Look at Security in Random Number Generators Design . . . . . . 167
Viktor Fischer

Side-Channel Attacks on ECC


Same Values Power Analysis Using Special Points on Elliptic Curves . . . 183
Cédric Murdica, Sylvain Guilley, Jean-Luc Danger,
Philippe Hoogvorst, and David Naccache

The Schindler-Itoh-attack in Case of Partial Information Leakage . . . . . . 199


Alexander Krüger

Different Methods in Side-Channel Analysis


Butterfly-Attack on Skein’s Modular Addition . . . . . . . . . . . . . . . . . . . . . . . 215
Michael Zohner, Michael Kasper, and Marc Stöttinger

MDASCA: An Enhanced Algebraic Side-Channel Attack for Error


Tolerance and New Leakage Model Exploitation . . . . . . . . . . . . . . . . . . . . . 231
Xinjie Zhao, Fan Zhang, Shize Guo, Tao Wang, Zhijie Shi,
Huiying Liu, and Keke Ji

Intelligent Machine Homicide: Breaking Cryptographic Devices Using


Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Annelie Heuser and Michael Zohner

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265


Exploiting the Difference
of Side-Channel Leakages

Michael Hutter¹, Mario Kirschbaum¹, Thomas Plos¹,
Jörn-Marc Schmidt¹, and Stefan Mangard²

¹ Institute for Applied Information Processing and Communications (IAIK),
Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
{mhutter,mkirschbaum,tplos,jschmidt}@iaik.tugraz.at
² Infineon Technologies AG, Am Campeon 1-12, 85579 Neubiberg, Germany
stefan.mangard@infineon.com

Abstract. In this paper, we propose a setup that improves the performance
of implementation attacks by exploiting the difference of side-channel
leakages. The main idea of our setup is to use two cryptographic devices
and to measure the difference of their physical leakages, e.g., their power
consumption. This increases the signal-to-noise ratio of the measurement
and reduces the number of power-consumption traces needed for a successful
attack. The setup can be applied efficiently in (but is not limited to)
scenarios where two synchronous devices are available for analysis. By
applying template-based attacks, only a few power traces are required to
successfully identify weak but data-dependent leakage differences. In order
to quantify the efficiency of our proposed setup, we performed practical
experiments by designing three evaluation boards that assemble different
cryptographic implementations. The results of our investigations show that
the needed number of traces can be reduced by up to 90 %.

Keywords: Side-Channel Attacks, Power Analysis, Measurement Setup,


DPA, SPA.

1 Introduction

Side-channel attacks are among the most powerful attacks performed on
cryptographic implementations. They exploit secret information that physically
leaks out of a device. Typical side channels are the power consumption [11,12],
the electromagnetic emanation [1], or the execution time of cryptographic
algorithms [10]. The efficiency or even the success of an attack is largely
determined by the used measurement equipment. The better the equipment, the
lower the noise and the better the side-channel leakage can be exploited.
Especially when countermeasure-enabled devices are analyzed, the setup is vital
in order to limit the number of power traces needed for a successful attack.
In this paper, we present a setup that improves the efficiency of side-channel
attacks by measuring the difference of two side-channel leakages. Our setup is

W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 1–16, 2012.

© Springer-Verlag Berlin Heidelberg 2012

based on the idea of using two cryptographic devices (instead of one) and
measuring the difference of their physical characteristics (e.g., the power consumption).
If both modules perform the same cryptographic operation, their physical char-
acteristics are the same so that the difference of both side-channel measurements
becomes theoretically zero. However, if one module processes different data than
the other module, a difference in both measurements can be observed at loca-
tions in time when data-dependent information is processed. The difference of
both side channels therefore provides only data-dependent signals and eliminates
static and (non-data-dependent) dynamic signals (i.e., noise). Hence, the quality
of the measurements can be significantly improved, so that fewer power traces
have to be acquired in practice.
In order to perform side-channel analysis attacks using our setup, an attacker
can choose from two possible attacking scenarios: (1) one device is fed with
constant input data while the second device is fed with random data, or (2) one
device is fed in a way such that the targeted intermediate value is complementary
to the intermediate value of the second device. For both scenarios, we quantified
the efficiency by performing practical experiments. We designed three evaluation
boards where each board uses two devices (an AT89S8253 microcontroller, an
ATmega128, and a custom 8051 ASIC design). In our experiments, we applied
the Pearson Correlation coefficient and performed a classical Differential (or
Correlation based) Power Analysis (DPA) attack [11,12] on the differential power
trace. Our best results increased the correlation coefficient for the AT89S8253
from 0.64 to 0.99 (55 %), for the ATmega128 from 0.61 to 0.96 (57 %), and for
the custom 8051 ASIC from 0.11 to 0.22 (100 %). Furthermore, we evaluated
our method on countermeasure-enabled devices and performed attacks on an
implementation that uses randomization techniques as well as a masked AES
implementation. In this scenario, the setup reduces the number of needed traces
by up to 90 %.
The rest of this paper is organized as follows. In Section 2, we discuss re-
lated work. Section 3 gives a brief overview on side-channel measurements and
describes how to improve the signal-to-noise ratio. After that, we present the
new measurement setup and highlight the benefits. In Section 4, we describe the
measurement process in detail and introduce two different measurement scenar-
ios. The three evaluation boards are presented in Section 5. Section 6 describes
the performed attacks. Results are given in Section 7 and Section 8. Conclusions
are drawn in Section 9.

2 Related Work

There exist several side-channel analysis (SCA) measurement boards as well as


SCA simulation tools and evaluation setups. SCA measurement boards aim at
providing a common attack platform that eases the comparison of measurement
results. Well-known attack platforms for SCA evaluation are the INSTAC boards
from the Tamper-resistance Standardization Research Committee (TSRC) [13]
and the SASEBO boards from the Research Center for Information Security

(RCIS) and Tohoku University [17]. The TSRC has released two boards, the
INSTAC-8 with an 8-bit microcontroller and the INSTAC-32 with a 32-bit mi-
crocontroller and an FPGA. From the SASEBO boards there exist a variety
of different evaluation platforms that contain Xilinx (SASEBO, SASEBO-G,
SASEBO-GII) or Altera (SASEBO-B) FPGAs. The boards contain two FPGAs,
one for the implementation of the cryptographic algorithm and one for handling
control tasks. Since the FPGAs have integrated processor cores (PowerPC
processor cores), both hardware and software implementations can be evaluated
with these boards. An SCA simulation tool has also been presented by Eind-
hoven University of Technology. The tool is called PINPAS and allows analyzing
the vulnerability of software algorithms against SCA attacks [7]. Commercial
SCA evaluation setups are offered by companies like Cryptography Research
(DPA Workstation [6]), Riscure (Inspector [16]), and Brightsight (Sideways [4]).

3 The Measurement of Side-Channel Leakages


A measurement of side-channel leakage involves various components. Besides
components that are caused by the execution of an operation or due to data-
dependent variations, there exist components that are caused due to different
kinds of noise. Noise is produced by the equipment itself (e.g., quantization noise
of the digital oscilloscope, an unstable clock generator, glitches and variations
in the power supply, etc.), by the device (switching noise or noise due to leakage
currents), or by the environment (radiated or conducted emissions, cosmic radia-
tion, etc.). The higher the noise, the lower the measured side-channel leakage will
be and the more traces have to be acquired to perform a successful side-channel
attack. The signal-to-noise ratio is a measure to characterize the side-channel
leakage of cryptographic devices. It is the ratio between the (data-dependent)
signal and the noise component of a measurement [12].
In the following, we propose a new setup that can be used to increase the
signal-to-noise ratio of side-channel measurements. Instead of exploiting the side-
channel leakage of only one cryptographic device, we propose to use two devices
to exploit the difference of their side-channel leakages. The setup therefore
significantly reduces the number of power-consumption traces needed for a
successful attack.

3.1 The Proposed Measurement Setup


Figure 1 shows the schematic of the proposed setup. It consists of two crypto-
graphic Integrated Circuits (ICs) (IC1 on the left side and IC2 on the right side
of the figure). A resistor is placed in the ground line of each IC (GND1 and
GND2), which allows measuring the voltage drop across the resistors.
In contrast to classical power-analysis setups, we propose to measure the
voltage difference of both ICs, i.e., VDiff in Figure 1. This can simply be done by
using a differential probe which in fact implicitly subtracts the side-channel
leakage of both devices and allows the efficient acquisition of their side-channel
leakage difference.

Fig. 1. Schematic of the proposed setup

Fig. 2. Schematic of a Wheatstone bridge

In view of electrical metrology, the setup is actually equivalent to a bridge
circuit, which can be used to accurately measure very small variations of two
circuit branches. Figure 2 shows the schematic of a Wheatstone bridge. The
dynamic resistance RIC1 of IC1 and R1 form one branch, and the resistance RIC2
of IC2 and R2 represent the other branch of the bridge circuit. The voltage
difference of both branches is then measured between the points A and B.
The bridge can be manually balanced by varying the resistor R1. It is balanced
if a zero value is measured at VDiff, which means that the same amount of current
flows through the branch RIC1 + R1 and through the second branch RIC2 + R2.
Note that the voltage at point A is determined by the ratio RIC1/R1, and the
voltage at point B by the ratio RIC2/R2.
If both ICs process different data, the measurement bridge becomes
unbalanced. In this case, the measured voltage difference VDiff is high and this
causes a peak in the measured power traces. This voltage difference is in fact
proportional to the processed data and can therefore be exploited in side-channel
attacks.
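To illustrate the bridge behavior numerically, the following minimal sketch computes the voltage between points A and B; all component values (supply voltage, resistances) are hypothetical and ideal resistors are assumed:

```python
# Minimal numerical sketch of the bridge voltage V_Diff; all component
# values are hypothetical (ideal resistors, no parasitics assumed).

def bridge_vdiff(v_supply, r_ic1, r1, r_ic2, r2):
    """Voltage between points A and B of the Wheatstone bridge."""
    v_a = v_supply * r1 / (r_ic1 + r1)  # voltage at point A
    v_b = v_supply * r2 / (r_ic2 + r2)  # voltage at point B
    return v_a - v_b

# Balanced bridge: both ICs draw the same current, so V_Diff is zero.
print(bridge_vdiff(3.3, r_ic1=100.0, r1=10.0, r_ic2=100.0, r2=10.0))  # 0.0

# Different data changes the dynamic resistance of one IC slightly; the
# bridge becomes unbalanced and a small but measurable V_Diff appears.
print(bridge_vdiff(3.3, r_ic1=99.0, r1=10.0, r_ic2=100.0, r2=10.0))   # ~2.8 mV
```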

3.2 What Are the Advantages of the Proposed Setup?

The proposed setup provides three major advantages:

1. Reduction of noise. Constant and static power-consumption components
(e.g., the clock signal or non-data-dependent operations) are canceled out by the
setup
because the side-channel leakages of both devices are subtracted. Further-
more, noise from the environment is canceled out since both devices are
exposed to the same noise level.
2. Higher measurement sensitivity. Since the power-consumption traces of
both devices are implicitly subtracted by the setup, only their differences
are identified. This results in a much higher measurement sensitivity and
acquisition resolution (we achieved a signal amplification by a factor of up

to 7.3 in our experiments, cf. Section 7). Even very low power-consumption
differences (that are caused by data-dependent operations for example) can
be efficiently identified.
3. Higher signal-to-noise ratio. Since the noise level is reduced and the sig-
nal acquisition resolution is increased, the signal-to-noise ratio (SNR) is
higher compared to conventional DPA attack setups. In fact, the higher
the SNR, the fewer traces have to be acquired.

3.3 Applicability of the Setup

The setup can be applied in scenarios where both devices run synchronously, i.e.,
the devices process the same operation and data in the same instant of time.
This is typically the case for devices that are fed by an external clock source
or for devices that possess a very stable internal clock generator. In these cases,
both devices can be easily synchronized by feeding the same clock source or by a
simultaneous power up. In order to overcome the costly operation of the proposed
setup due to synchronization issues, a simple yet effective synchronization circuit
based on an FPGA could be used. The FPGA would just have to trigger a reset
signal or to toggle the power supply of both devices if the first response (e.g.,
a power-up indicator) is asynchronous. Once implemented, such an automatic
trial-and-error setup device would be universally usable and it would be able to
provide a synchronous measurement setup in no time.
For many embedded systems like conventional smart cards, the setup may
fail because both devices provide an asynchronous behavior which cannot be
controlled by an attacker. This asynchronous behavior is caused by asynchronous
designs, unstable clock sources, or by side-channel countermeasures such as clock
jitters. However, in a white-box scenario, where the implementation is known and
where the devices can be fully controlled, one can benefit from the higher signal-
to-noise ratio of the setup to reduce the number of needed traces for a successful
attack.
In this paper, we consider only contact-based power-analysis attacks even
though the idea can be also extended to electromagnetic (EM) based attack
settings. In such a scenario, the position of the probes plays a major role in
the efficient cancellation of uninteresting signals.

4 Measurement Methodology
In the following, we describe the measurement process to perform side-channel
attacks using our proposed setup. First, the setup has to be calibrated in order to
efficiently identify interesting side-channel leakages. In a second step, an attacker
(or evaluator) has to choose from various attacking scenarios, e.g., keeping the
key or input data of one device constant or choosing the inputs in such a way
that the targeted intermediate value is complementary to the intermediate value
of the second device.

Fig. 3. Power-consumption traces of two devices that process the same data (first
two rows from the top) are subtracted (difference trace at the bottom)

Fig. 4. Power-consumption traces of two devices that process different data (first
two rows from the top) are subtracted (difference trace at the bottom)

4.1 Calibration of the Setup


In order to calibrate our setup, both ICs have to execute the same operations
and the same data has to be processed (e.g., zero values). The resistor R1 has to
be adjusted such that a minimum voltage offset is measured at VDiff. Figure 3
shows the result of the calibration step. In the upper two plots of the figure,
the power-consumption traces of IC1 and IC2 are shown. Both ICs processed
the same operation and the same data. The lower plot shows the result after
subtracting both power traces. It shows that the signals are nearly canceled out
(e.g., the clock signal or the signal between 200 and 300 ns is much weaker in
the resulting power trace).
Figure 4 shows the subtraction of two power-consumption traces that are
caused by devices which process different data. In this case, the setup becomes
unbalanced and a significant voltage difference can be measured at VDiff. A
peak can be identified at locations in time when different data is processed.
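The effect of the calibration can also be checked numerically. The following sketch subtracts two recorded, already aligned traces (stored in hypothetical files; in the actual setup, the subtraction is performed in the analog domain by the differential probe):

```python
import numpy as np

# Sketch of the calibration check in software. In the actual setup the
# subtraction happens in the analog domain (differential probe); here two
# recorded, aligned traces from hypothetical files are subtracted instead.
trace_ic1 = np.load("ic1_trace.npy")  # assumed: sampled trace of IC1
trace_ic2 = np.load("ic2_trace.npy")  # assumed: sampled trace of IC2

diff = trace_ic1 - trace_ic2
print("max |difference|:", np.max(np.abs(diff)))
# With identical data and a well-adjusted R1, the difference is near zero
# (cf. Figure 3); residual peaks indicate data-dependent leakage (Figure 4).
```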
After calibration, an attacker has to choose between the two possible attacking
scenarios which are described in the following.

4.2 Scenario 1: Choosing a Constant Intermediate Value


In this scenario, one device is fed with constant input data such that the targeted
intermediate value is also constant. The second device is fed with random input
data. For both devices we assume a constant key.
This scenario is practicable for real-world attacks where the secret key of one
device is not known. The second device can be fed with constant input data
such that a difference in the power-consumption traces is caused that can be
exploited in an attack.
The advantage compared to a classical DPA attack lies in a much higher
signal-to-noise ratio of the performed measurement. Let Pmeas be the measured

power consumption of a single cryptographic device. Then, the power consump-


tion can be separated into several components such as an operation-dependent
part Pop , a data-dependent part Pdata , noise from the environment Penv.noise ,
and noise caused by the device itself, i.e., Pdev.noise (see [12] for a detailed de-
scription of power-trace characterization). Pmeas can therefore be modeled as a
sum of those components, i.e.,
Pmeas = Pop + Pdata + Penv.noise + Pdev.noise . (1)
In view of our proposed setup, the measured power consumption can then be
modeled as follows:
Pmeas = Pop1 + Pdata1 + Penv.noise1 + Pdev.noise1 −
(Pop2 + Pdata2 + Penv.noise2 + Pdev.noise2 ) (2)
= (Pdata1 − Pdata2 ) + (Pdev.noise1 − Pdev.noise2 ).
Since both devices process the same operation, Pop1 and Pop2 are equal and
are therefore implicitly canceled out by the setup. The same holds true for the
noise Penv.noise1 and Penv.noise2 that is caused by the proximity and that influ-
ences both devices with the same signal strength. Thus, the remaining power
consumption consists only of the difference of their data-dependent compo-
nents Pdata1 − Pdata2 as well as the difference of their electronic noise, i.e.,
Pdev.noise1 − Pdev.noise2 .

4.3 Scenario 2: Choosing Complementary Intermediate Values


In this scenario, one device is fed in a way such that the targeted intermediate
value is complementary to the intermediate value of the second device. Therefore,
the power-consumption difference is maximized because both devices always
process data that are complementary to each other.
This scenario is only practicable if the targeted intermediate value is known
by the attacker because only then the complementary value can be generated.
This is typically the case for design evaluators or compliance-testing bodies who
are in possession of the entire implementation and the secret key. By knowing the
targeted intermediate value, the complementary value can be easily calculated
which is then processed by the second device.
Figure 5 shows an example where two ICs process different input data x and
x′. The input values are chosen in a way such that the targeted intermediate
value y′ provides a maximum Hamming distance to y. This actually corresponds
to flipping all bits of the intermediate value y, i.e., to performing an XOR
operation of y with 255. For example, if the output byte y of IC1 is 3 (0x03),
the output byte y′ of IC2 is 252 (0xFC).
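A minimal sketch of this complement computation, using the byte values from the example above:

```python
# Sketch of the complement computation for scenario 2: the targeted
# intermediate byte of IC2 is the bitwise complement of the byte in IC1.

def complement_byte(y):
    return y ^ 0xFF  # flip all 8 bits, i.e., XOR with 255

y = 0x03                        # output byte y of IC1 from the example
print(hex(complement_byte(y)))  # 0xfc, i.e., 252, the byte y' for IC2
```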

4.4 Using Templates


Another big advantage of the proposed setup is the use of templates (cf. [5,2]).
The setup can be effectively applied in scenarios where only one single

Fig. 5. The processing of different input data x and x′ causes a voltage difference
between both ICs which can be exploited in a side-channel attack

acquisition trace can be measured and evaluated, e.g., in elliptic curve based
implementations where the ephemeral key is changed in every execution. In this
case, the setup efficiently reveals the power-consumption difference of the two
devices in a single shot. This difference can then be compared with generated
power-consumption templates in order to classify the leakage according to the
processed data.
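As an illustration, here is a hedged sketch of single-trace template classification under a multivariate Gaussian model (cf. [5]); the templates (mean vectors and covariance matrices per candidate value) are assumed to have been built in a prior profiling phase:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hedged sketch of single-trace template classification: for each candidate
# data value, a template (mean vector, covariance matrix) is assumed to have
# been built during profiling. The candidate with the highest Gaussian
# log-likelihood for the measured difference trace is returned.

def classify(trace, templates):
    """templates: dict mapping candidate value -> (mean, cov)."""
    scores = {v: multivariate_normal.logpdf(trace, mean=m, cov=c)
              for v, (m, c) in templates.items()}
    return max(scores, key=scores.get)
```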

4.5 The ISO/IEC 10373-6/7 Test Apparatus


The proposed setup is similar to the test apparatus for compliance testing of
identification cards specified in the ISO/IEC 10373-6 [8] (for proximity cards) or
10373-7 [9] (for vicinity cards) standard. Figure 6 shows a picture of the appara-
tus. It consists of a Radio-Frequency Identification (RFID) reader antenna in the
middle of the setup and two so-called sense coils. The sense coils have the same
distance to the reader antenna so that they measure the same signals emitted
by the reader. Both sense coils are connected such that the signal from one coil
is in phase opposition to the other coil. This theoretically cancels out the signal
of the reader and allows the detection of load modulation signals of contactless
identification cards (which are in fact much weaker than the RFID reader field).

5 Practical Evaluation of the Proposed Setup


In order to evaluate the efficiency of our proposed setup, we developed three pro-
totyping boards. Each board assembles two identical ICs and allows the measure-
ment of their power-consumption difference. We used the following processors: an
8051-compatible microcontroller (the AT89S8253 from Atmel), the ATmega128,
and another 8051-compatible microcontroller that has been incorporated in an
ASIC design fabricated as a prototype chip presented in [14,15].
Figure 7 shows a picture of the AT89S8253 board. It consists of two 8051
microcontrollers, a USB interface for communication, a BNC clock connector, a
reset switch, and some I/O pins. The ATmega128 evaluation board (see Fig. 8)
additionally features two JTAG interfaces, which allow the programming and
debugging of both devices.
The ASIC prototype-chip evaluation board is shown in Figure 9. Each ASIC
prototype chip contains an 8051-compatible microcontroller with an AES

Fig. 6. The test apparatus according to the ISO/IEC 10373-6 standard [8]

Fig. 7. The AT89S8253 evaluation board

coprocessor implemented in CMOS logic and in a masked logic style¹. The ASIC
evaluation board additionally contains voltage regulators and two ROMs for
storing the programs executed in the microcontroller cores.
Both devices on the respective evaluation board are connected to the same
clock source, whereby the clock wires have been routed in a way so that timing
differences (i.e., clock skew) are minimized. All three evaluation boards provide
the possibility to easily measure the core power consumption of each of the two
devices over a measurement resistor either in the VDD or in the GND line, as
well as to measure the power consumption difference of both devices.

6 Description of the Performed Attacks


We performed several attacks using the described evaluation boards. First, we
evaluated the efficiency of our proposed setup by setting the intermediate value
of one device to a constant value (further denoted as Constant-Value Attack ).
Second, we evaluated the performance of the setup by choosing complementary
intermediate values (further denoted as Complementary-Value Attack ). Third,
we evaluated the efficiency of our setup regarding side-channel countermeasures
and performed attacks on a randomized and a masked implementation using our
custom 8051 ASIC chip.
In order to compare the results, we performed a reference attack for each setup,
i.e., a classical Correlation Power Analysis (CPA) attack [3] on one IC of each
setup. As a target of these attacks, we considered the output of a MOV operation
(the input byte is moved from memory to a working register of the CPU). Note
that this or similar memory operations are also performed in implementations
of cryptographic algorithms such as DES or AES, e.g., moving the S-box output
byte after the first round of AES from a register to the RAM.
¹ As the type of the masked logic style implemented on our prototype chips is not
important for this paper, we omit further details about it.
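To illustrate the reference attack, a minimal sketch of a CPA on the MOV output is given below (Pearson correlation of Hamming-weight hypotheses with the trace samples, cf. [3]). The arrays traces and data are assumed measurement results, and the XOR-based hypothesis mapping is one common choice, not necessarily the exact one used in our implementations:

```python
import numpy as np

# Minimal CPA sketch (cf. [3]): correlate Hamming-weight hypotheses of the
# targeted intermediate byte with every trace sample.
# traces: (n, s) array of measured traces; data: (n,) known input bytes.
HW = np.array([bin(v).count("1") for v in range(256)])

def cpa(traces, data):
    t_c = traces - traces.mean(axis=0)        # center traces column-wise
    corr = np.zeros((256, traces.shape[1]))
    for cand in range(256):
        h = HW[data ^ cand].astype(float)     # hypothesis per trace
        h_c = h - h.mean()
        num = h_c @ t_c
        den = np.sqrt((h_c @ h_c) * (t_c ** 2).sum(axis=0))
        corr[cand] = num / den
    return corr  # the correct candidate shows the highest |corr| peak
```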

Fig. 8. The ATmega128 evaluation board

Fig. 9. The ASIC prototype-chip evaluation board

All boards have been connected to a PC that runs Matlab [18] in order to
control the entire measurement setup. The PC transmits three bytes over the
serial connection to both ICs that are assembled on each board. IC1 listens
to the first byte, IC2 listens to the second byte, and the last byte starts the
operation on both ICs.
The power consumption of the ICs has been measured using the 2.5 GHz
LeCroy WavePro 725Zi 8-bit digital-storage oscilloscope. For all experiments,
we used a sampling rate of 5 GS/s. Each IC has been further programmed to
pull a debug pin to high which triggers the oscilloscope and starts the measure-
ment process. Furthermore, we used an active differential probe to measure the
difference of both side channels. For this, we used the LeCroy D320 WaveLink
Differential probe with 3.5 GHz bandwidth.

Processor Synchronization. It turned out that the ICs of each setup are often
not synchronized after startup and their trigger signals occur at different points
in time. This is because both ICs are not powered up perfectly in parallel which
causes one IC to get clocked earlier or later than the other IC. In addition,
both ICs provide slightly different characteristics (power consumption, timing,
etc.) which is due to variations in the fabrication process of the ICs. In order to
minimize the differences, we recommend to use only ICs which provide at least
the same revision number, production line, and year/month of fabrication.
In order to synchronize the two ICs, we needed to reset and power up the
boards until they were synchronized (trial and error). For example, for the 8051
microcontroller AT89S8253 the probability of synchronization is 1/24 since the
processor requires 12 clock cycles (so-called T-states) to execute a single machine
cycle.

Table 1. Result of the Constant-Value Attack using the Pearson Correlation coefficient

AT89S8253 ATmega128 8051 CMOS ASIC


Reference Attack 0.64 0.61 0.11
Constant-Value Attack 0.87 0.87 0.14
Improvement 0.23 0.26 0.03
Improvement [%] 35.94 42.62 27.27

7 Results of Attacks

This section presents the results of the performed attacks. All boards have been
clocked at a frequency of 3.6864 MHz.

7.1 Choosing a Constant Intermediate Value

Table 1 shows the correlation coefficient for each measurement setup. For the
AT89S8253 and the ATmega128, we measured 1 000 power traces. 10 000 traces
have been measured for the 8051 CMOS core of the ASIC prototype chip.
The results show that our setup increased the correlation coefficient by 0.23 (about
36 %) compared to the result obtained from a classical CPA-attack setup. This
means that the number of needed power traces is reduced by a factor of about
2.7 (from 50 to only 18). The y-coordinate resolution of the oscilloscope was
increased from 81 mV/DIV (for the Reference Attack ) to 11 mV/DIV (for the
Constant-Value Attack ) which is a factor of about 7.3. Similar results have been
obtained for the ATmega128. The correlation coefficient increased by 0.26 (about
43 %), thus the needed number of traces is reduced by a factor of 3.2 (from 57
to 18). The acquisition resolution has been increased by a factor of about 3.8.
About 27 % improvement has been obtained for the 8051 CMOS ASIC such that
the needed number of traces is reduced by 1.6 (from about 2 300 to only 1 400).
The acquisition resolution has been increased by the factor 3.3.
We also calculated the SNR in order to compare the signal level to the noise
level. The SNR increased by a factor of 4.7 to 11.5 in our exper-
iments (depending on the used device). An example for the SNR improvement
on the ATmega128 is given in Appendix A.
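The trace counts reported above are consistent with the rule of thumb from [12] for the number of traces needed in a CPA, n = 3 + 8 · (z / ln((1+ρ)/(1−ρ)))², with z the quantile for the chosen error probability. A minimal sketch (assuming α = 0.0001, i.e., z = 3.719):

```python
import math

# Rule of thumb from [12] for the number of traces needed in a CPA with
# correlation coefficient rho; z is the quantile for alpha = 0.0001.

def needed_traces(rho, z=3.719):
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2

print(round(needed_traces(0.64)))  # ~51, matching the ~50 traces above
print(round(needed_traces(0.87)))  # ~19 (the text reports 18)
print(round(needed_traces(0.99)))  # ~7
```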

Table 2. Result of the Complementary-Value Attack using the Pearson Correlation


coefficient

AT89S8253 ATmega128 8051 CMOS ASIC


Reference Attack 0.64 0.61 0.11
Complementary-Value Attack 0.99 0.96 0.22
Improvement 0.35 0.35 0.11
Improvement [%] 54.69 57.38 100.00

Fig. 10. Result of a classical CPA attack on one ATmega128 device (Reference
Attack)

Fig. 11. Result of a CPA attack that exploits the difference of two side channels
(Complementary-Value Attack)

7.2 Choosing Complementary Intermediate Values


Table 2 shows the result for the Complementary-Value Attack. The result shows
a significant improvement of the correlation coefficient for every setup. The cor-
relation coefficient has been increased by 0.35 for both the AT89S8253 and the
ATmega128 setup, i.e., about 55-57 %. For the 8051 ASIC, a 100 % improvement
has been obtained.
Thus, the needed number of traces has been reduced by a factor of 7.2 for
the AT89S8253 (7 instead of 51 traces), a factor of 5.7 for the ATmega128 (10
instead of 57 traces), and a factor of 4.1 for the 8051 ASIC (about 550 instead
of 2 300 traces).
Figure 10 presents the results of a CPA attack that has been performed on
one ATmega128 microcontroller (Reference Attack ). It shows two correlation
peaks (two because the intermediate value has been moved two times in our
implementation). The peaks occur between the second and fourth microsecond
after the trigger signal. The maximum absolute correlation coefficient is 0.61 for
the correct intermediate-value guess (trace plotted in black). All other incorrect
guesses show no significant correlation (traces plotted in gray). Figure 11 shows
the result of the CPA attack that exploits the difference of two side channels
(Complementary-Value Attack ). For the correct intermediate guess, a correlation
of 0.96 has been obtained while no significant correlation can be discerned for
incorrect guesses.

8 Results of Attacks on Countermeasure-Enabled Devices


In order to evaluate the efficiency of our setup regarding side-channel counter-
measures, we investigated two different types of countermeasures: randomization
and masking. First, we present results of our ASIC prototype where the MOV op-
eration is randomized in time. Second, we present the results of an attack on a
masked implementation of a MOV operation as well as on the AES core.

8.1 Attacks on Randomization Countermeasures


We performed a Constant-Value Attack on a MOV operation using our ASIC
prototype and compared the results with a Reference Attack. For the attack, we
measured 10 000 power traces and applied a 50 % randomization in our measure-
ment. This means that the MOV operation is randomized at two locations in time.
The randomization experiment should indicate the performance of our proposed
measurement setup in case of noisy environments (i.e., in case of a randomiza-
tion countermeasure). Compared to the Reference Attack where we achieved a
correlation coefficient of 0.11, corresponding to 2 300 traces, the randomization
decreases the correlation coefficient to 0.07 (5 700 traces). This results in a factor
of approximately 2.5. Performing a Constant-Value Attack results in a corre-
lation coefficient of 0.09 (3450 traces), i.e., the factor can be reduced from 2.5
to approximately 1.65. Most probably, a Complementary-Value Attack would
decrease the factor even further.

8.2 Attacks on Masking Countermeasures


We also performed a Constant-Value Attack and a Complementary-Value Attack
on our masked 8051 ASIC chip. First, we targeted a masked MOV operation.
Second, we targeted the masked AES core. As a target for AES, we have chosen
the first S-box output of the first round of AES.
As a result of the Constant-Value Attack on the masked MOV operation, the
correlation coefficient increased from 0.05 to 0.10 in our experiments. This
means that about 8 400 fewer power traces have to be measured compared to a
classical DPA attack, i.e., a factor of 4. For the Complementary-Value Attack,
the correlation coefficient increased from 0.05 to 0.16. Thus, a factor of 10
fewer power traces are needed, which corresponds to about 90 %.
We also performed an attack on the masked AES core that has been imple-
mented on our ASIC prototype. As a reference, we measured the power consump-
tion of a single chip (IC1 ) during the execution of AES encryptions of known
plaintexts. We performed a standard CPA attack on the AES coprocessor based
on the Hamming distance (HD) of two consecutively processed S-box outputs
in IC1. Note that the device leaks the Hamming distance instead of the
Hamming weight of the intermediate values.
After that, we performed a Constant-Value Attack. IC1 performs the same
operation as in the reference attack, i.e., AES encryptions of known random
plaintexts. IC2 , in contrast, is fed with a constant plaintext. In our case, we set
all bytes of the secret key stored in IC2 to the value 82 (0x52). Moreover, the
plaintext of IC2 was chosen to be a zero value (0x00). This way, the output of
the S-box transformation in the first round of AES was constantly 0. Also in
this case, our CPA attack was based on the HD of two S-box outputs processed
by IC1 .
Table 3 shows the results of the performed attacks. The table compares the
results of the reference CPA attack on one single AES coprocessor (reference
attack) with the CPA results obtained from measuring the difference of the side-
channel leakages in case the second chip always computes 0 (0x00) at the S-box

Table 3. Summary of the CPA attacks on the AES coprocessor in the prototype chip
implemented in CMOS logic; For the attacks, we applied the Hamming-distance power
model

ASIC CHIP AES COPROCESSOR CMOS


Byte transition 2 → 1 3 → 2 4 → 3 16 → 4 1 → 5 11 → 6 3 → 7 4 → 8
Reference attack 0.0174 0.0163 0.0164 0.0315 0.0133 0.0170 0.0155 0.0292
Constant-Value attack 0.0226 0.0239 0.0278 0.0436 0.0223 0.0293 0.0267 0.0466
Improvement 0.0052 0.0076 0.0114 0.0121 0.009 0.0123 0.0112 0.0174
Improvement [%] 30 46 69 38 67 72 72 59

output in the first round of the AES encryption. We targeted 8-byte transitions
in the AES State and measured 200 000 power traces for the analyses.
The results show that our setup is able to improve the correlation coefficient
between 30 % and 72 %. In five of the eight attacks, the correlation coefficient
could be increased by more than 50 %. For the best attack, this means that
33 000 traces instead of about 97 000 traces have to be measured for the attack
to succeed, which corresponds to a trace reduction by a factor of nearly 3.

9 Conclusion
In this paper, we presented a measurement setup that increases the efficiency of
side-channel attacks. The idea of the setup is to use two cryptographic devices
and to measure the difference of their side-channel leakages. If both devices per-
form the same operation synchronously and if they process different data, the
static and the data-independent power consumption is canceled out and only
data-dependent side-channel leakage can be effectively identified. This results in
a much higher signal-to-noise ratio during the measurement, so that up to 90 %
fewer power traces have to be acquired for a successful attack, as shown in our
practical experiments.
ences in the instruction flow of cryptographic implementations or to discover
data-dependent variations which can be exploited in attacks. The setup further
significantly increases the efficiency of template-based side-channel attacks that
use only a single-acquisition power trace to reveal secret information.

Acknowledgements. The work has been supported by the European


Commission through the ICT program under contract ICT-SEC-2009-5-258754
(Tamper Resistant Sensor Node - TAMPRES) and by Austrian Science Fund
(FWF) under grant number P22241-N23 (Investigation of Implementation
Attacks - IIA).

References
1. Agrawal, D., Archambeault, B., Rao, J.R., Rohatgi, P.: The EM side-channel(s).
In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp.
29–45. Springer, Heidelberg (2003)
2. Agrawal, D., Rao, J.R., Rohatgi, P., Schramm, K.: Templates as Master Keys.
In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 15–29. Springer,
Heidelberg (2005)
3. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage Model.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29.
Springer, Heidelberg (2004)
4. Brightsight. Unique Tools from the Security Lab,
http://www.brightsight.com/documents/marcom-materials/Brightsight Tools.pdf
5. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
6. Cryptography Research. DPA Workstation,
http://www.cryptography.com/technology/dpa-workstation.html
7. den Hartog, J., Verschuren, J., de Vink, E., de Vos, J., Wiersma, W.: PINPAS: A
Tool for Power Analysis of Smartcards. In: SEC 2003, pp. 453–457 (2003)
8. International Organisation for Standardization (ISO). ISO/IEC 10373-6: Identifi-
cation cards - Test methods – Part 6: Proximity cards (2001)
9. International Organisation for Standardization (ISO). ISO/IEC 10373-7: Identifi-
cation cards - Test methods – Part 7: Vicinity cards (2001)
10. Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS,
and Other Systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp.
104–113. Springer, Heidelberg (1996)
11. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
12. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks – Revealing the Secrets
of Smart Cards. Springer (2007) ISBN 978-0-387-30857-9
13. Matsumoto, T., Kawamura, S., Fujisaki, K., Torii, N., Ishida, S., Tsunoo, Y., Saeki,
M., Yamagishi, A.: Tamper-resistance standardization research committee report.
In: The 2006 Symposium on Cryptography and Information Security (2006)
14. Popp, T., Kirschbaum, M., Mangard, S.: Practical Attacks on Masked Hardware.
In: Fischlin, M. (ed.) CT-RSA 2009. LNCS, vol. 5473, pp. 211–225. Springer,
Heidelberg (2009)
15. Popp, T., Kirschbaum, M., Zefferer, T., Mangard, S.: Evaluation of the Masked
Logic Style MDPL on a Prototype Chip. In: Paillier, P., Verbauwhede, I. (eds.)
CHES 2007. LNCS, vol. 4727, pp. 81–94. Springer, Heidelberg (2007)
16. Riscure. Inspector - The Side-Channel Test Tool,
http://www.riscure.com/fileadmin/images/Docs/Inspector_brochure.pdf
17. Side-channel attack standard evaluation board. The SASEBO Website,
http://www.rcis.aist.go.jp/special/SASEBO/
18. The Mathworks. MATLAB - The Language of Technical Computing,
http://www.mathworks.com/products/matlab/

A Appendix: Example of SNR Improvement

We calculated the signal-to-noise ratio for the power measurements on the AT-
mega128 board (see Section 5 for a description of the board). Figure 12 shows
three SNR plots according to three performed attacks: the Reference Attack,
Constant-Value attack, and Complementary-Value attack. The SNR is defined
as the ratio of signal power to the noise power. For the signal characterization,
we calculated the variance of means for each of the 256 possible intermediate
values (300 power traces for each value resulting in 76 800 power traces in total).
The noise has been characterized by calculating the variance of constant value
processing, cf. [12]. For the Complementary-Value attack, the SNR is improved
by a factor of 21.6 (from 3 to about 65). For the Constant-Value attack, the SNR
has been improved from 3 to about 14 (by a factor of 4.6).
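A minimal sketch of this SNR estimation (the arrays traces and values are assumed measurement results; the constant-value traces for the noise estimate are approximated here by the subset recorded for one fixed value):

```python
import numpy as np

# Sketch of the SNR estimation described above, cf. [12].
# traces: (n, s) array of measured traces; values: (n,) intermediate values.

def snr(traces, values, n_values=256):
    means = np.array([traces[values == v].mean(axis=0)
                      for v in range(n_values)])
    signal = means.var(axis=0)               # variance of the mean traces
    noise = traces[values == 0].var(axis=0)  # constant-value processing
    return signal / noise
```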

Fig. 12. Signal-to-noise ratio of the Reference Attack, Constant-Value attack, and
Complementary-Value attack on the ATmega128
Attacking an AES-Enabled NFC Tag:
Implications from Design to a Real-World
Scenario

Thomas Korak, Thomas Plos, and Michael Hutter

Institute for Applied Information Processing and Communications (IAIK),


Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
{thomas.korak,thomas.plos,michael.hutter}@iaik.tugraz.at

Abstract. Radio-frequency identification (RFID) technology is the en-


abler for applications like the future internet of things (IoT), where secu-
rity plays an important role. When integrating security to RFID tags, not
only the cryptographic algorithms need to be secure but also their im-
plementation. In this work we present differential power analysis (DPA)
and differential electromagnetic analysis (DEMA) attacks on a security-
enabled RFID tag. The attacks are conducted on both an ASIC-chip
version and on an FPGA-prototype version of the tag. The design of the
ASIC version equals that of commercial RFID tags and has the analog and
digital parts integrated on a single chip. Target of the attacks is an im-
plementation of the Advanced Encryption Standard (AES) with 128-bit
key length and DPA countermeasures. The countermeasures are shuf-
fling of operations and insertion of dummy rounds. Our results illustrate
that the effort for successfully attacking the ASIC chip in a real-world
scenario is only 4.5 times higher than for the FPGA prototype in a lab-
oratory environment. This let us come to the conclusion that the effort
for attacking contactless devices like RFID tags is only slightly higher
than that for contact-based devices. The results further underline that
the design of countermeasures like the insertion of dummy rounds has
to be done with great care, since the detection of patterns in power or
electromagnetic traces can be used to significantly lower the attacking
effort.

Keywords: Radio-Frequency Identification (RFID), Advanced Encryp-


tion Standard (AES), Side-Channel Analysis (SCA), Differential Power
Analysis (DPA), Differential Electromagnetic Analysis (DEMA).

1 Introduction

Radio-frequency identification (RFID) technology has gained a lot of attention


during the last decade and is already used in many applications like ticketing,
supply-chain management, electronic passports, access-control systems, and im-
mobilizers. The relevance of this technology is underlined by the integration
of RFID functionality into the latest generation of smart phones, which uses

W. Schindler and S.A. Huss (Eds.): COSADE 2012, LNCS 7275, pp. 17–32, 2012.
© Springer-Verlag Berlin Heidelberg 2012

so-called near-field communication (NFC). With this widespread use of RFID


technology, new applications like the future internet of things (IoT) will arise
where security plays an important role. When integrating security to RFID sys-
tems, not only the selected cryptographic algorithms have to be secure, but also
their implementation has to be protected against attacks such as side-channel
analysis.
An RFID system consists of a reader (e.g. a smart phone) and a tag that
communicate contactlessly by means of a radio frequency (RF) field. The tag is
a small microchip attached to an antenna. Passive tags also receive their power
supply from the RF field, which limits the available power budget of the tags.
Especially passive tags that can be produced at low cost will be used in appli-
cations like the future IoT, where tags have to be competitive in price. In order
to keep the price low, tags have to be produced in high volume and with small-
est possible chip size. These limitations make the integration of cryptographic
security to RFID tags challenging.
Recent incidents like the reverse engineering of the CRYPTO 1 algorithm in
Mifare tags [22], the breaking of the Digital Signature Transponder (DST) [3],
or the attacks on the Hitag 2 cipher [5] and the KeeLoq remote entry sys-
tem [7] have emphasized the need for integrating strong cryptographic security
to RFID tags. A lot of effort has been made by the research community to bring
strong security to resource-constrained RFID tags. Well-known examples are
symmetric-key schemes like the Advanced Encryption Standard (AES) [8, 10, 21]
and PRESENT [25], or public-key schemes like Elliptic Curve Cryptography
(ECC) [1, 2, 11, 27] and NTRU [12].
Having a strong cryptographic algorithm alone is not enough, also the imple-
mentation of the algorithm has to be secure. Techniques that exploit weaknesses
of an implementation are called implementation attacks. A prominent kind of
implementation attack is side-channel analysis (SCA). In an SCA attack, side-
channel information is measured during the execution of a cryptographic al-
gorithm to deduce secret data like the encryption key. As side-channel infor-
mation, execution time [18], power consumption [19], or electromagnetic (EM)
emissions [9] of a cryptographic device can be used. A very powerful SCA attack
is differential power analysis (DPA) introduced by Kocher et al. [19] that reveals
even very weak data-dependent information in the power consumption of a de-
vice. When using the EM emissions of a device instead of the power consumption,
the attack is called differential electromagnetic analysis (DEMA) [26]. In order
to make SCA attacks less efficient, so-called countermeasures are integrated.
While there is a large number of published articles about DPA and DEMA
attacks on contact-based devices, there is only a handful of them about attacks
on RFID devices. Hutter et al. [14, 15] have presented several DPA and DEMA
attacks on high frequency (HF) RFID prototype devices. Oren and Shamir [23]
have inspected the EM emissions of ultra-high frequency (UHF) tags to deduce
the secret kill password. Kasper et al. [17] and Oswald [6] have successfully
applied DEMA attacks on a contactless smart card that computes Triple DES
(3DES).

In this work we present DPA as well as DEMA attacks on a security-enabled NFC tag. The novelty of this work is that we have conducted the attacks on
two versions of the tag, an ASIC-chip version and an FPGA-prototype version.
Both versions implement the same functionality. The ASIC integrates the digital
part and the analog part on a single chip, which equals the design structure of
commercially available RFID tags. The FPGA prototype on the other hand has
the digital part implemented on the FPGA and the analog part is realized via
an extra analog front-end built with discrete components. Our work closes a gap in current publications, where either prototype tags or commercially available RFID tags are examined separately. The target of the SCA attacks is an AES implementation that has countermeasures integrated. The countermeasures are
shuffling of operations and insertion of dummy rounds. Our results show that the
effort for attacking the ASIC chip is only 4.5 times higher with our measurement
setup than for the FPGA prototype. This clarifies that the effort for attacking
commercial RFID tags is only slightly higher than for prototype devices. The
results also confirm that countermeasures like the insertion of dummy rounds have to be implemented very carefully, as the detection of patterns in the traces allows the attacking effort to be reduced significantly.
The remainder of this work is organized as follows. Section 2 provides an
overview of the ASIC chip and the FPGA prototype that we have used for
our measurements. In Section 3 we describe the different measurement-setup
scenarios. Side-channel analysis results are given in Section 4. Conclusions are
drawn in Section 5.

2 Overview of the Analyzed Devices


In this section we give an overview of the attacked hardware devices. For the
evaluation we use a security-enabled NFC-tag chip. First the focus is put on
the ASIC version of the security-enabled NFC-tag chip and then on the FPGA-
prototype version. The latter device is a prototype but with the connected an-
tenna it behaves like a commercial, passive RFID tag. It is an HF tag using a
frequency of 13.56 MHz in order to communicate with the reader and the com-
munication protocol is implemented according to the ISO 14443A standard [16].
The chip consists of two main parts, as can be seen in Figure 1: the analog
front-end (AFE) and the digital part. The antenna is connected to the AFE
that provides the power supply and the clock signal to the digital part. The dig-
ital part is responsible for processing the commands to communicate with the
reader. This part also contains a crypto unit with an AES implementation to
provide symmetric-key cryptography. The AES part is implemented as special-
purpose hardware to meet the most important requirements for RFID-tag chips:
low power consumption and small chip area. Low power consumption is a re-
quirement because the chip uses the power supply generated from the reader
field. Chip area is an important factor concerning the production costs. More
implementation details of the chip can be found in [13, 24].

Fig. 1. Architecture of the evaluated chip
Fig. 2. The development board with the evaluated chip

There are two countermeasures integrated into the AES implementation in order to increase the resistance against SCA attacks: the insertion of dummy
rounds and shuffling. The chip processes in total 25 rounds during an AES
encryption/decryption. Ten rounds relate to the real computation of AES and
fifteen rounds are dummy rounds that process random data. The dummy rounds
are inserted at the beginning and at the end in order to increase the effort
for SCA attacks. With shuffling, the processing order of the bytes of the state
is randomized. As the AES state consists of sixteen bytes, every byte can be processed at sixteen different points in time. For DPA/DEMA attacks it is very important to know at which point in time a specific byte of the state is processed; shuffling therefore increases the attack complexity.
As can be seen in Figure 2, the prototype chip is mounted on a development board that contains an antenna with four windings. The board also allows the chip to be powered by an external power supply. If an external power supply with
a voltage of 3.3 V or more is connected, the chip does not use the power supply
extracted from the reader field. This gave us the ability to measure the power
consumption of the chip with a resistor in the ground line.
In addition to the security-enabled NFC-tag chip we also use an FPGA-
prototype tag for the evaluation. The implementation of the digital part on the FPGA-prototype tag is identical to the one on the evaluated ASIC chip. For a
reader device, the FPGA-prototype tag appears like a regular, passive RFID tag.
It uses an external power supply but the reader field is used for communication
and for extracting the clock signal. We used the FPGA-prototype tag to show
that the DEMA-attack results achieved with this device are comparable with
the results from the real tag. Another advantage of the FPGA-prototype tag
is that we have more control over this device. We could use, e.g., a debug pin
in order to get a precise trigger signal. The FPGA prototype further gives the
ability to correct bugs detected on the real chip and evaluate the effects of the
modification. It is also important to mention that the FPGA-prototype version
enables the chip developers to test the implementation before manufacturing the
ASIC chip.

3 Measurement Setup
The LC584AM oscilloscope from LeCroy was used to record the traces and the
recording process was controlled with a computer running MATLAB scripts. In
order to establish the communication between computer and tag an RFID reader
(Tagnology TagScan) was used. The EM probes to measure the electromagnetic
emanation are from ‘Langer EMV Technik’. We were able to record one trace per second on average. The reasons for this rather low recording speed are, on the one hand, the two-step communication between computer and tag (the reader is in the middle) and, on the other hand, the time-consuming storage of the traces on the computer. Three different measurement setups were used in order
to record the traces needed for the SCA attacks: the real-world scenario, the test
scenario and the FPGA scenario.

Real-World Scenario. The real-world scenario is the most important one because
it can be used to attack the real NFC-tag chip without additional requirements
like trigger pins or external power supply. In this scenario the electromagnetic
emanation of the chip is measured using an EM probe. In order to measure only
the electromagnetic emanation and not the reader signal we separated the chip
and the antenna. This approach was presented by Hutter et al. [14] as well as by Carluccio et al. [4]. The chip could thus be placed outside the reader field for better
measurement results. In our setup the distance between tag chip and antenna
was 25 centimeters. The presented modification can be made with every RFID
tag. A second EM probe was used in order to get the trigger information. This
probe was placed inside the reader field. With these traces the reader commands
could be easily identified. The EM traces were recorded with a sampling rate
of 2.5 GS/s. A schematic of the measurement setup for this scenario can be
seen in Figure 3. There were only small deviations in the duration between the
reader command and the start of the AES calculation. With an alignment step these deviations could be removed and satisfactory DPA-attack results could be achieved. The least-square matching method was used to align the traces.
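For illustration, a minimal NumPy sketch of such a least-square alignment step is given below. This is not the authors' measurement code (the paper states only that MATLAB scripts were used); the pattern window and search range are placeholders that would have to be tuned to the concrete traces.

```python
import numpy as np

def align(trace, pattern, ref_pos, search):
    # Least-square matching: slide the reference pattern over the search
    # offsets, take the offset with the minimal sum of squared differences,
    # and shift the trace so the match lands at the reference position.
    costs = [np.sum((trace[o:o + len(pattern)] - pattern) ** 2) for o in search]
    best = search[int(np.argmin(costs))]
    return np.roll(trace, ref_pos - best)

# Example (offsets are illustrative): cut the pattern from a reference
# trace around the AES start, then align all remaining traces to it.
# pattern = traces[0][5000:5500]
# aligned = [align(t, pattern, 5000, range(4800, 5200)) for t in traces]
```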
Test Scenario. The test scenario can only be performed with the development
board and is also used to attack the ASIC chip. In that scenario the chip was
powered with an external power supply, so the chip does not use the supply
voltage extracted from the reader field. We inserted a resistor in the ground
line in order to measure the power consumption of the chip. The value of the
resistor was 100 Ω. A schematic overview of the measurement setup can be seen
in Figure 4. The amplitude of the recorded trace increases significantly when the
chip starts an AES calculation. This could be used as trigger information. With
that setup the traces were not perfectly aligned, so an alignment step was again necessary in order to obtain satisfactory DPA-attack results.

FPGA Scenario. The FPGA scenario was used to attack the FPGA-prototype
tag. In this scenario the electromagnetic emanation of the FPGA was used as
side-channel information. We used an EM probe to measure the electromagnetic
emanation.

Fig. 3. Measurement setup of the real-world scenario
Fig. 4. Measurement setup of the test scenario

One advantage of the FPGA-prototype tag for the EM measurements was that the FPGA chip is placed outside of the reader field. Several pins can
be used as debug pins on the FPGA-prototype tag. We used one of these pins to
indicate when the AES calculation starts. The signal of this pin could be used as
trigger information. This trigger information was very accurate so no alignment
step was necessary for successful DPA attacks on the FPGA prototype tag.

4 Side-Channel Analysis Results


In order to evaluate the security of the NFC tag we performed DPA and DEMA
attacks on the AES implementation on the chip. As the intermediate result we used the output of the S-box lookup for the first key byte in the first round of AES. The
Hamming-weight model was used as power model to get the hypothetical power
values. The Pearson correlation coefficient was used to calculate the correlation
between the hypothetical power values and the recorded traces. The equation to
calculate the correlation coefficient ρ can be found in [20].
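As an illustration of this standard correlation-DPA procedure, a NumPy sketch follows. It is not the authors' code; the S-box table is a stand-in and must be replaced by the real 256-entry AES S-box, and `traces`/`plaintexts` are assumed to be integer-valued NumPy arrays.

```python
import numpy as np

HW = np.array([bin(x).count("1") for x in range(256)])   # Hamming-weight table
SBOX = np.arange(256)  # stand-in only: substitute the real AES S-box here

def dpa_first_key_byte(traces, pt_byte):
    # Correlation DPA on the first S-box output with the Hamming-weight
    # model: for each of the 256 key guesses, correlate the hypothetical
    # power values with every sample of the traces (Pearson coefficient).
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    corr = np.zeros((256, traces.shape[1]))
    for k in range(256):
        h = HW[SBOX[pt_byte ^ k]].astype(float)  # hypothetical power values
        h = h - h.mean()
        corr[k] = (h @ t) / (np.sqrt((h ** 2).sum()) * t_norm)
    return corr

# The key guess whose row contains the highest absolute peak is taken
# as the value of the first key byte:
# key = np.abs(dpa_first_key_byte(traces, plaintexts[:, 0])).max(axis=1).argmax()
```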
As performance indicator for the attacks we used the number of required
traces n to reveal the value of the first key byte. The relationship between the
number of traces n and the correlation coefficient ρ is shown in Equation 1 [20]. For further calculations we used $z_{1-\alpha} = 3.719$ with $\alpha = 0.0001$.

$$ n = 3 + 8 \left( \frac{z_{1-\alpha}}{\ln \frac{1+\rho}{1-\rho}} \right)^{2} \qquad (1) $$
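Equation 1 is easy to evaluate numerically. The following sketch (illustrative, not from the paper; it uses Python's statistics module for the quantile z_{1-α}) reproduces the trace counts quoted in this section.

```python
import math
from statistics import NormalDist

def traces_needed(rho, alpha=0.0001):
    # Number of traces n required to distinguish the correct key
    # hypothesis with confidence 1 - alpha (Equation 1, cf. [20]).
    z = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}, about 3.719 here
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2

# Reproduces the figures used in this section (rounded up):
# traces_needed(0.267) -> ~373    traces_needed(0.325) -> ~246
# traces_needed(0.664) -> ~47     traces_needed(0.629) -> ~54
```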

The results of the performed DPA/DEMA attacks can be split into two main
parts: attacks with disabled countermeasures and attacks with enabled counter-
measures. The attacks with disabled countermeasures were used to evaluate the
performance of the different measurement setups. They are equivalent to attacks on an unprotected AES implementation, and results can be achieved with a small number of traces. The randomization parameters for the countermeasures were fixed. This means that no dummy rounds were inserted at the beginning and shuffling was deactivated, so the first S-box operation always appears at the same point in time for every new AES encryption.

Fig. 5. DEMA-attack result of the real-world scenario with countermeasures disabled. In this case the whole amplitude of the EM trace was recorded.
Fig. 6. DEMA-attack result of the real-world scenario with countermeasures disabled. In this case only the positive values of the EM trace were recorded.

With this step we show that the different approaches to measure the side-channel information as well as the attacks on the different hardware devices lead to comparable results, which is a very important observation.
The attacks with enabled countermeasures could only be performed on the
FPGA-prototype tag. The reason for this limitation is that the countermea-
sures on the ASIC version of the chip cannot be enabled because of a bug in
the implementation. On the FPGA-prototype version the parameters for the
countermeasures are random values. These values are updated for every AES
encryption. In that case a random number of dummy rounds is inserted at the
beginning and the first S-box operation is shuffled over sixteen positions in time.
Before starting with the attacks, we estimated the effort needed for successful attacks with enabled countermeasures, based on the results with disabled countermeasures.

4.1 Measurements with Disabled Countermeasures


Figure 5 shows the result of the DEMA attack on the security-enabled NFC
tag for the real-world scenario. Here the positive as well as the negative part of
the EM trace was recorded. The black correlation trace contains a clearly visi-
ble peak and belongs to the correct key hypothesis. The maximum correlation
value for this attack is 0.267. According to Equation 1, 373 traces are required
to obtain the correct value for the first key byte. In order to get a satisfactory result, two preprocessing steps had to be performed on the recorded traces: fil-
tering and aligning. A lowpass filter with a stop frequency of 8 MHz was used
to filter out surrounding noise and the reader signal. The filtered traces had to
be aligned because the used trigger signal, the pattern in the communication,
was not accurate enough. In order to achieve an even higher correlation value,
we performed further measurements where we only recorded the positive val-
ues of the EM traces. In this way we could increase the resolution of the voltage values.

As a result we obtained a higher correlation value of 0.325, which can be seen in Figure 6. According to Equation 1, with 246 traces the correct value for the
first key byte could be found. With this improvement we were able to decrease
the number of required traces from 373 to 246.
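A sketch of the lowpass preprocessing step follows. The paper specifies only the 8 MHz stop frequency and the 2.5 GS/s sampling rate; the Butterworth filter type and the filter order below are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 2.5e9     # oscilloscope sampling rate (2.5 GS/s)
F_CUT = 8e6    # lowpass corner suppressing noise and the 13.56 MHz reader carrier

def lowpass(trace, order=4):
    # Zero-phase lowpass filtering of a single trace; filter type and
    # order are assumptions, the paper only states the 8 MHz frequency.
    b, a = butter(order, F_CUT / (FS / 2))
    return filtfilt(b, a, trace)

# filtered = np.array([lowpass(t) for t in traces])
```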
As a second experiment we performed a DPA attack using the test scenario.
In this scenario we used an external power supply for the chip and measured the
power consumption with a resistor in the ground line. Here we got a correlation
value of 0.664 for the correct key hypothesis. About 47 traces are needed in order
to reveal the value of the first key byte.
In the FPGA case about 54 traces are needed in order to perform a successful
attack. For comparison we have plotted the result of the DEMA attack on the
FPGA prototype tag in Figure 7. Here the correlation value for the correct key
hypothesis is 0.629. Filtering the recorded traces was the only required prepro-
cessing step for a successful attack. Here a bandpass filter with a lower frequency
of 15 MHz and an upper frequency of 25 MHz had to be used in order to get sat-
isfying results. A dedicated pin was used for the trigger information and so the
traces did not have to be aligned afterwards.
The test scenario and the FPGA scenario produce similar results. Successful attacks can be performed with low effort: only 47 and 55 traces, respectively, are needed to reveal the value of the first key byte. However, neither of these attacks can be performed on a real RFID tag. The real-world scenario that we have used for our measurements can be performed on real RFID tags as well. We were able to perform successful DEMA attacks on the unprotected AES implementation with 246 traces using that scenario; compared to the FPGA scenario, the effort increases by a factor of 4.5. This result enables chip designers
to evaluate the security of other implementations using the same production
process in an early design step. An FPGA implementation of the chip can be used
in order to evaluate the resistance of the ASIC against SCA attacks. If there is a
redesign of an existing ASIC (e.g. new SCA countermeasures are implemented),
the presented approach can be used to evaluate the security of the new ASIC
using the results of the SCA attacks on the FPGA implementation. We also use
the achieved results from above in the following section in order to evaluate the
security of the protected AES implementation.

4.2 Measurements with Enabled Countermeasures


Before we started with the attack on the protected AES implementation, we did
some estimations on the effort needed for a successful attack. These estimations
can be found in Table 1. The dummy-round countermeasure increases the num-
ber of traces needed for a successful attack by a factor of 256 and also shuffling
increases the number of traces needed for a successful attack by a factor of 256.
As a result the total number of traces required for a successful attack increases
by a factor of 256² = 65 536. For a successful attack on the unprotected implementation 55 traces were needed, and this value multiplied by 65 536 gives
nearly four million traces. With our recording speed of one trace per second this
would lead to a recording time of about 42 days!

Fig. 7. DEMA-attack result of the FPGA scenario with countermeasures disabled
Fig. 8. Filtered EM trace of the initial key addition and the first three rounds of AES

Table 1. Estimate of the required number of traces for a successful DPA attack with enabled countermeasures

                                       Estimated number of traces
Countermeasures              FPGA scenario   Test scenario   Real-world scenario
No active countermeasures               55              47                   246
Shuffling                           14 080          12 032                62 976
Shuffling and dummy rounds     > 3 600 000     > 3 000 000          > 16 100 000

For the real-world scenario this would lead to a recording time of 189 days (using the factor of 4.5 from above). This effort is rather high, so we tried to find a way to reduce the impact of the
countermeasures. In many applications the number of encryptions is also limited
to a specific value, so a DPA/DEMA attack can only be successful if the number
of required traces is below this value.
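The estimate above is plain arithmetic on the stated numbers; as a sanity check:

```python
# Pure arithmetic from the numbers stated above (one trace per second):
traces = 55 * 256 * 256          # 3 604 480, "nearly four million"
days_fpga = traces / 86_400      # about 42 days for the FPGA scenario
days_real = 4.5 * days_fpga      # about 189 days for the real-world scenario
```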
The approach we used for reducing the impact of the countermeasures was to
get some information about the random value defining the number of dummy
rounds inserted at the beginning. For that purpose we recorded a set of 100 traces
containing the initial key addition and the first AES round. A plot showing one
trace of this set can be found in Figure 8. Our observations showed that delay
cycles are also inserted during the initial key addition. After some analysis of
the traces we found a pattern during the initial key addition. When calculating
the difference of two traces, peaks appear at different points in time depending
on the random variable defining the number of dummy rounds inserted at the
beginning. For the set of 100 traces we have calculated the difference for every
single pair of traces and could observe three different cases which are illustrated
in Figure 9:

– In the first case no significant peak can be identified.


– In the second case four significant peaks can be identified which have nearly
the same amplitude.

– In the third case again four peaks in the difference trace can be identified, but one of these four peaks has a significantly higher amplitude.

Fig. 9. The left plot shows the difference of two traces without significant peaks (first case). The plot in the middle shows the difference of two traces with four peaks with comparable amplitude (second case). The plot on the right side shows the difference of two traces with one significant peak (third case). Traces recorded with the FPGA scenario have been used to generate these plots.
Based on the above observations we made the following assumptions: If the difference of two traces leads to the first case, the same random value was used for the dummy-round countermeasure of both encryptions. If the difference of two traces leads to the second case, different random values were used for the dummy-round countermeasure of the two encryptions. Finally, if the difference leads to the third case, a specific value was used for the countermeasure during one of the two encryptions.
In a first attack scenario we used the third case to filter out the traces with one
specific number of dummy rounds inserted at the beginning. First we recorded
a set of traces including the first 16 rounds (there are 25 rounds in total, 15
dummy rounds and ten real AES rounds). In a next step we created a new set of
traces containing only these traces where the specific number of dummy rounds
were inserted at the beginning. In order to visualize our approach to filter out
the traces, we have plotted the difference matrix for 100 traces, which can be seen
in Figure 10. This matrix contains the absolute maximum value of the difference
of the two traces corresponding to the row number and column number. It is
clearly visible that for some traces this value is higher (darker points) compared
to other traces. In order to build the reduced set of traces we have selected only
those traces corresponding to a row number with a high value (dark points). As we assume a uniform distribution of the random value, the size of this new set is about 1/16 of the size of the original set. On the reduced set we performed
a DEMA attack. In order to conduct the first attack scenario we recorded a
set of 320 000 traces. After filtering out the dummy rounds with the approach presented above, the size of the set was reduced by a factor of 16 to 20 000 traces.
The reduced set only contains traces with a specific number of dummy rounds
at the beginning followed by the first real AES round processing the attacked
intermediate value. On this reduced set we performed a DEMA attack and were
able to reveal the value of the first key byte.

Fig. 10. Visualization of the difference matrix for 100 traces
Fig. 11. DEMA-attack result of the FPGA scenario with active countermeasures

It turned out that 15 dummy rounds are inserted at the beginning when the special pattern appears in the difference traces. Figure 11 shows the result of this attack. Compared to the results in
Figure 5, Figure 6, and Figure 7, no single correlation peak can be identified. This is because shuffling spreads the single peak over 16 different points in time.
With a bigger set of traces the 16 peaks in the correlation trace of the correct
key hypothesis could be identified better. The maximum correlation value of the
attack is 0.03931.
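A sketch of how such a difference matrix and the resulting trace selection could be computed follows. It is illustrative only: the median-score selection rule and the threshold are assumptions standing in for the manual selection of the dark rows of Figure 10.

```python
import numpy as np

def difference_matrix(traces):
    # Maximum absolute difference for every pair of traces (cf. Figure 10).
    n = len(traces)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = np.max(np.abs(traces[i] - traces[j]))
    return d

def select_special_traces(traces, threshold):
    # Traces recorded with the one specific dummy-round count differ
    # strongly from almost all other traces, so their rows in the
    # difference matrix are predominantly high ("dark" in Figure 10).
    score = np.median(difference_matrix(traces), axis=1)
    return [t for t, s in zip(traces, score) if s > threshold]
```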
In a second attack scenario we used the first case of our observations above
to split the recorded traces into 16 groups. As we had the ability to read out the random value used for the randomization of every encryption (by a small change in the FPGA implementation), we were able to verify the performance of the clustering approach. All the traces in one group belong to encryptions where
the same random value for the dummy rounds was used. In order to perform the
clustering we used the K-means clustering function provided by MATLAB with the squared Euclidean distance as distance measure. We also did a performance
analysis where we performed the group building for 100 to 500 traces. There
is a linear relationship between runtime of the group building algorithm and
the number of traces used. The proportion of correctly classified traces is between 96% and 98%. The building of the groups takes about 0.25 s per trace. It has to be mentioned that for an attack the group-building step has to be conducted only for, e.g., the first 100 traces. The large remaining part of the traces can then be clustered simply by comparing them with the groups. We achieved similar results by comparing with one single trace of each group and by comparing with the mean trace of each group. In this way we were able to decrease the time to group one trace to 0.1 s. The length of the traces used for the mentioned experiment was 250 000
samples. The runtime strongly depends on the length of the used traces.
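The clustering itself was done with MATLAB's K-means function; an equivalent sketch in Python (SciPy's kmeans2 as a stand-in, not the original scripts) could look as follows:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def group_traces(traces, n_groups=16):
    # Cluster the traces by dummy-round count with K-means (squared
    # Euclidean distance). Returns group centroids and a label per trace.
    centroids, labels = kmeans2(traces.astype(float), n_groups, minit="points")
    return centroids, labels

def assign_group(trace, centroids):
    # Fast classification of the remaining traces: nearest centroid,
    # i.e. "comparing with the mean trace of each group".
    return int(np.argmin(((centroids - trace) ** 2).sum(axis=1)))
```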
With the clustering approach it is now possible to decrease the number of
required traces for a successful DEMA attack on the secret AES key. First of
all we recorded a set of 320 000 traces containing the initial key XOR and the
first three rounds. Next we applied the clustering algorithm to group the traces

into 16 groups. The clustering step for 320 000 traces takes about 9 hours on
a standard desktop computer. Every group contains on average 20 000 traces
as the random value defining the number of dummy rounds at the beginning
follows a uniform distribution. There are now several possibilities for conducting the attack. One way is to put the focus just on the first round and perform a DEMA
attack on each of the 16 groups separately. The result of the attack using one
specific group (the one where no dummy rounds are inserted at the beginning)
leads to a significantly higher correlation value for the correct key byte. The
shuffling countermeasure is still active but Table 1 shows that 20 000 traces are
sufficient to find the correct key value even in the presence of shuffling. A second
way is to combine the first and the second round and to try out all different combinations of two groups. That means picking out the first round of group A and the second round of group B and performing a DPA attack on this combination.
If group A is the group where no dummy rounds are inserted at the beginning
and group B is the group containing traces where one dummy round is inserted
at the beginning the DPA attack leads to a correct result. This approach leads
to a higher computational effort because there are 256 possible combinations.
The number of required traces decreases because only 10 000 traces are needed
in each group. So the total number of traces decreases to 160 000. The runtime
for the DEMA attacks increases to nearly 15 hours in that case. Furthermore
we estimated the complexity when focusing on three rounds and combining three groups. As the number of possible combinations increases to 4 096, the runtime for the DEMA attacks increases to nearly 6.5 days. The positive effect is that the number of required traces decreases again. A summary of the above scenarios can be found in Table 2.

Table 2. The influence of the clustering approach on the number of traces needed for a successful DPA attack as well as on the calculation time for the attack

Groups used  Comb.   Required traces   Required traces   Time for DPA attack   Total
                     per group         overall           on one group [s]      time [s]
1               16            20 000           320 000                   400     6 400
2              256            10 000           160 000                   200    51 200
3            4 096             6 666           106 666                   133   544 768

In a last experiment we used another preprocessing step called windowing in order to reduce the impact of shuffling on the attack complexity. This approach is presented in the book of Mangard et al. [20]. With windowing it should be possible to decrease the attack complexity by a factor of four. A key factor for this step is to find a good window pattern. In our attacks it was very hard to find such a pattern, so we could only achieve a complexity reduction by a factor of 1.4.
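A minimal sketch of windowing as described in [20] follows. The 16 window offsets are an assumption; deriving them from the trace pattern is exactly the hard part mentioned above.

```python
import numpy as np

def window_traces(traces, starts, width):
    # Windowing (cf. [20]): sum the samples of all 16 candidate windows,
    # one per shuffled S-box position, so the leakage of the targeted
    # byte always contributes to the compressed trace.
    return np.stack([sum(t[s:s + width] for s in starts) for t in traces])

# starts = [w0, w1, ..., w15]  # the 16 window offsets, which have to be
#                              # found by inspecting the trace pattern
```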
Table 3 compares the FPGA scenario and the real-world scenario. Based on
the correlation values of the attacks using the FPGA scenario, the numbers of required traces n are calculated using Equation 1. With the number of traces, the attack duration can be calculated, as our recording speed is one trace per second.

Table 3. Comparison of the number of needed traces and the duration for recording the required amount of traces for the FPGA scenario and the real-world scenario. Also the influence of the used preprocessing techniques is illustrated. With windowing the impact of shuffling can be decreased. With our clustering approach the impact of the dummy rounds can be decreased. The number in brackets denotes the number of groups used for the DPA attack.

                                             FPGA scenario           Real-world scenario
Countermeasures           Preprocessing      n           Time        n            Time
No countermeasures        -                  55          < 1 min     246          < 5 min
Shuffling                 -                  17 886      5 hours     80 000       23 hours
Shuffling                 Windowing          9 119       2.5 hours   41 036       11.4 hours
Shuffling, dummy rounds   -                  4 571 000   53 days     20 480 000   246 days
Shuffling, dummy rounds   Clustering(1)      320 000     3.7 days    1 440 000    17 days
Shuffling, dummy rounds   Clustering(2)      160 000     1.9 days    720 000      8.5 days
Shuffling, dummy rounds   Clustering(3)      106 666     30 hours    480 000      5.6 days

Knowing that the attack complexity for the real-world scenario increases by a factor of 4.5, the number of traces required for a successful attack as well as the attack duration can be derived.

4.3 Summary of the Results


As we have shown with the DPA/DEMA attacks performed on the unprotected AES implementation, the effort (needed number of traces) for a successful attack in the real-world scenario is 4.5 times higher compared to the FPGA scenario.
Table 3 draws a comparison between the effort for a successful DPA attack using
the FPGA scenario and the real-world scenario. Attacks on the protected AES
implementation could only be performed in the FPGA scenario because of a bug
in the ASIC chip. The effort for the real-world scenario can be estimated based
on the results for the attacks on the unprotected AES implementation.
A successful attack on an unprotected AES implementation using the FPGA
scenario can be performed in less than one minute. With the real-world sce-
nario the value of a key byte can be revealed within five minutes. This result emphasizes again that it is possible to successfully attack an unprotected AES
implementation on an RFID tag with very low effort and that countermeasures
have to be implemented.
If the AES implementation is protected with countermeasures against SCA attacks (insertion of dummy rounds and shuffling), as is done on the FPGA-prototype tag, the attack complexity increases significantly. If no patterns can be found to decrease the influence of the countermeasures, 53 days are required
in order to record the amount of traces needed for a successful DEMA attack
on the FPGA-prototype tag. For the real-world scenario the duration has to be
multiplied by a factor of 4.5, so the duration for a successful attack increases to
246 days.
If the attacker can find a pattern to mitigate the influence of the used countermeasures, the effort for a successful attack can be decreased. As we have

shown with the FPGA scenario, we could find two different ways to decrease the
attack complexity. We were able to reveal some information about the number of
dummy rounds inserted before the first real AES round. Furthermore we could
show that with our approach it is possible to scale down the number of required
traces by adding more computational effort afterwards. This can be an important
step if the number of encryptions is limited to a fixed value (e.g. 200 000).

5 Conclusion
In this work we presented DPA and DEMA attacks on the AES implementation
of a security-enabled NFC tag. For the attacks we used an FPGA-prototype
version as well as a manufactured ASIC chip. Three different measurement se-
tups were used: a real-world scenario, a test scenario and an FPGA scenario. We
could show that the results of the attacks on the ASIC chip using the real-world
scenario are comparable with the attack results on the FPGA prototype. The
effort for the attack on the ASIC chip is 4.5 times higher compared to the attack
on the FPGA prototype. The attacks on the ASIC chip were performed using a
real-world scenario without a dedicated trigger pin or an external power supply
of the chip. The attacks on the FPGA prototype were performed under labo-
ratory conditions. The attacked AES implementation also has countermeasures against SCA attacks integrated, namely the insertion of dummy rounds and shuffling. We were able to enable and disable the countermeasures, and so we found a pattern to mitigate the impact of the dummy-round countermeasure.
This pattern gave us the ability to group the recorded traces according to the
number of dummy rounds inserted before the first real AES round. As a con-
sequence the attack complexity decreased. Only some knowledge about the AES implementation (the usage of the dummy-round countermeasure) was needed in order to find this pattern, so the presented approach is a serious threat for implementations with countermeasures against SCA attacks. We could show that
with the presented approach it is possible to decrease the number of needed
traces for a successful DPA attack. In our special case the number of traces
could be reduced from 320 000 to less than 110 000 traces. As a side-effect the
computational effort increases but within acceptable limits.

Acknowledgements. The work presented in this article has been supported by the European Commission through the ICT programs TAMPRES (under
contract ICT-SEC-2009-5-258754) and ECRYPT II (under contract ICT-2007-
216676).

References
[1] Auer, A.: Scaling Hardware for Electronic Signatures to a Minimum. Master thesis,
University of Technology Graz (October 2008)
[2] Batina, L., Guajardo, J., Kerins, T., Mentens, N., Tuyls, P., Verbauwhede, I.:
Public-Key Cryptography for RFID-Tags. In: Workshop on RFID Security 2006
(RFIDSec 2006), Graz, Austria, July 12-14 (2006)

[3] Bono, S., Green, M., Stubblefield, A., Juels, A., Rubin, A., Szydlo, M.: Secu-
rity Analysis of a Cryptographically-Enabled RFID Device. In: Proceedings of
USENIX Security Symposium, Baltimore, Maryland, USA, pp. 1–16. USENIX
(July-August 2005)
[4] Carluccio, D., Lemke, K., Paar, C.: Electromagnetic Side Channel Analysis of a
Contactless Smart Card: First Results. In: Oswald, E. (ed.) Workshop on RFID
and Lightweight Crypto (RFIDSec 2005), Graz, Austria, July 13-15, pp. 44–51
(2005)
[5] Courtois, N.T., O’Neil, S., Quisquater, J.-J.: Practical Algebraic Attacks on the
Hitag2 Stream Cipher. In: Samarati, P., Yung, M., Martinelli, F., Ardagna, C.A.
(eds.) ISC 2009. LNCS, vol. 5735, pp. 167–176. Springer, Heidelberg (2009)
[6] Oswald, D., Paar, C.: Breaking Mifare DESFire MF3ICD40: Power Analysis and
Templates in the Real World. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS,
vol. 6917, pp. 207–222. Springer, Heidelberg (2011)
[7] Eisenbarth, T., Kasper, T., Moradi, A., Paar, C., Salmasizadeh, M., Shalmani,
M.T.M.: On the Power of Power Analysis in the Real World: A Complete Break of
the KeeLoq Code Hopping Scheme. In: Wagner, D. (ed.) CRYPTO 2008. LNCS,
vol. 5157, pp. 203–220. Springer, Heidelberg (2008)
[8] Feldhofer, M., Dominikus, S., Wolkerstorfer, J.: Strong Authentication for RFID
Systems Using the AES Algorithm. In: Joye, M., Quisquater, J.-J. (eds.) CHES
2004. LNCS, vol. 3156, pp. 357–370. Springer, Heidelberg (2004)
[9] Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic Analysis: Concrete Results.
In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp.
251–261. Springer, Heidelberg (2001)
[10] Hämäläinen, P., Alho, T., Hännikäinen, M., Hämäläinen, T.D.: Design and Im-
plementation of Low-Area and Low-Power AES Encryption Hardware Core. In:
Proceedings of 9th EUROMICRO Conference on Digital System Design: Architec-
tures, Methods and Tools (DSD 2006), Dubrovnik, Croatia, August 30-September
1, pp. 577–583. IEEE Computer Society (2006)
[11] Hein, D., Wolkerstorfer, J., Felber, N.: ECC Is Ready for RFID – A Proof in
Silicon. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381,
pp. 401–413. Springer, Heidelberg (2009)
[12] Hoffstein, J., Pipher, J., Silverman, J.H.: NTRU: A Ring-Based Public Key Cryp-
tosystem. In: Buhler, J.P. (ed.) ANTS 1998. LNCS, vol. 1423, pp. 267–288.
Springer, Heidelberg (1998)
[13] Hutter, M., Feldhofer, M., Wolkerstorfer, J.: A Cryptographic Processor for Low-
Resource Devices: Canning ECDSA and AES Like Sardines. In: Ardagna, C.A.,
Zhou, J. (eds.) WISTP 2011. LNCS, vol. 6633, pp. 144–159. Springer, Heidelberg
(2011)
[14] Hutter, M., Mangard, S., Feldhofer, M.: Power and EM Attacks on Passive 13.56
MHz RFID Devices. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS,
vol. 4727, pp. 320–333. Springer, Heidelberg (2007)
[15] Hutter, M., Medwed, M., Hein, D., Wolkerstorfer, J.: Attacking ECDSA-Enabled
RFID Devices. In: Abdalla, M., Pointcheval, D., Fouque, P.-A., Vergnaud, D.
(eds.) ACNS 2009. LNCS, vol. 5536, pp. 519–534. Springer, Heidelberg (2009)
[16] International Organization for Standardization (ISO). ISO/IEC 14443: Identifica-
tion Cards - Contactless Integrated Circuit(s) Cards - Proximity Cards (2000)
[17] Kasper, T., Oswald, D., Paar, C.: EM Side-Channel Attacks on Commercial Con-
tactless Smartcards Using Low-Cost Equipment. In: Youm, H.Y., Yung, M. (eds.)
WISA 2009. LNCS, vol. 5932, pp. 79–93. Springer, Heidelberg (2009)

[18] Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS,
and Other Systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp.
104–113. Springer, Heidelberg (1996)
[19] Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
[20] Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks – Revealing the Secrets
of Smart Cards. Springer (2007) ISBN 978-0-387-30857-9
[21] Moradi, A., Poschmann, A., Ling, S., Paar, C., Wang, H.: Pushing the Limits: A
Very Compact and a Threshold Implementation of AES. In: Paterson, K.G. (ed.)
EUROCRYPT 2011. LNCS, vol. 6632, pp. 69–88. Springer, Heidelberg (2011)
[22] Nohl, K.: Cryptanalysis of Crypto-1. Computer Science Department University of
Virginia, White Paper (2008)
[23] Oren, Y., Shamir, A.: Remote Password Extraction from RFID Tags. IEEE Trans-
actions on Computers 56(9), 1292–1296 (2007)
[24] Plos, T., Feldhofer, M.: Hardware Implementation of a Flexible Tag Platform for
Passive RFID Devices. In: Proceedings of the 14th Euromicro Conference on Dig-
ital System Design Architectures, Methods and Tools (DSD 2011), Oulu, Finland,
pp. 293–300. IEEE Computer Society (August 2011)
[25] Poschmann, A.Y.: Lightweight Cryptography - Cryptographic Engineering for a
Pervasive World. PhD thesis, Faculty of Electrical Engineering and Information
Technology, Ruhr-University Bochum, Germany (February 2009)
[26] Quisquater, J.-J., Samyde, D.: A new Tool for Non-Intrusive Analysis of Smart
Cards Based on Electro-Magnetic Emissions, the SEMA and DEMA Methods.
Presented at the rump session of EUROCRYPT 2000 (2000)
[27] Tuyls, P., Batina, L.: RFID-Tags for Anti-counterfeiting. In: Pointcheval, D. (ed.)
CT-RSA 2006. LNCS, vol. 3860, pp. 115–131. Springer, Heidelberg (2006)
700+ Attacks Published on Smart Cards:
The Need for a Systematic Counter Strategy

Mathias Wagner

NXP Semiconductors Germany GmbH, Stresemannallee 101, 22529 Hamburg, Germany
mathias.wagner@nxp.com
http://www.nxp.com

Abstract. Recent literature surveys showed that in excess of 700 papers have been published on attacks (or countermeasures thereto) on embedded devices and smart cards, in particular. Most of these attacks fall into
ded devices and smart cards, in particular. Most of these attacks fall into
one of three classes, (hardware) reverse engineering, fault attacks, and
side–channel attacks. Not included here are pure software attacks. Each
year another 50–100 papers are being added to this stack and hence
it is becoming a necessity to find new ways to cope with new attacks
found during the design of secure smart cards, be it on the hardware or
the software side, or during their deployment phase. This paper explores
possible solutions to this issue.

Keywords: Smart card, attack, risk management, certification.

1 Introduction
Recent literature surveys showed that over the past two decades in excess of
700 papers have been published on attacks (or countermeasures thereto) on
embedded devices and smart cards, in particular. Most of these attacks fall into
one of three classes, (hardware) reverse engineering, fault attacks, and side–
channel attacks. Not included here are pure software attacks, which are likely even more abundant. Each year another 50–100 papers are added to this stack, and this is not yet accounting for exponential growth.
This poses a severe problem for the development and deployment of highly
secure hardware and software of embedded devices. Typically, a new embedded
chip family needs 2-3 years to develop, with derivatives perhaps being spun off
within 12-18 months. The development of secure operating systems for those
chips is not significantly faster.
Commercial software development can only start after development tools for
the embedded chip (simulator, emulator) have become available, so perhaps 1
year before the embedded chip itself is commercially available. Thus, adding
the development times of hardware and software and not accounting for further
time delays due to certification (such as Common Criteria or EMVCo), one can conclude that easily three years will have passed since the embedded hardware was originally conceived. Or, in other words, another 150 – 300 attack papers


will have been published by then. And during the foreseeable lifetime of the product of another 3-5 years, this stack will increase to 300 – 600 papers.
So, how sure can we be that these embedded devices will not be hacked during their lifetime? Clearly, old design strategies in hardware and software development that operate in a "responsive mode" will not work. With these strategies, every time a new attack becomes known, one typically finds a patch, applies it to the product, and then moves on.
To make matters more complicated: Since smart cards are generally certified
according to Common Criteria at level EAL 4+ or higher [1], meaning that they
are resistant to attackers with "high attack potential", any valid attack on a
smart card — and be that only under lab conditions in one of the certified eval-
uation labs — will reset the clock and will require that the embedded operating
system and, depending on the attack, perhaps also the underlying hardware be
tested again according to the Common Criteria rules. This adds costs and time
delay to any project. In the worst case yet another new attack will be found
whilst a product is still being tested in the evaluation labs, and the designers
have to go back to square one immediately. This way a product launch may be
delayed indefinitely.
What we thus need is a new, structurally different way of designing that is much more proactive and much more amenable to the requirements of today's ever-faster-moving security landscape.
In Section 2, for the sake of clarity, a brief overview of the dominant classes
of attacks is given, followed in Sect. 3 by a discussion of possible new design
strategies. However, not all problems will be solvable in the design phase, and
thus risk management and certification (Sect. 4) also need to be reviewed in this
context.

2 Overview of Attacks
Basically, there exist four classes of attacks: Reverse engineering of the hardware,
fault attacks, side–channel attacks, and software attacks. On top of this there are attacks that combine elements of these four fundamental classes.
An example of a reverse engineering attack was published at Blackhat [2,3].
In a nutshell, the aim here is to identify the standard cells used in the design
of the chip, understand the connectivity between these cells, and eventually
recover the functionality of the chip. A prime target with this approach is to
dump the memory content of a smart card, the so–called Linear Code Extraction
Attack. Substantial expertise is required to be successful with such an attack, but
publicly available tools like Degate [4] help to automate this process. Typically,
countering a successful attack in this class requires changes in the hardware of the
embedded chip, and a software fix is often not possible. Another characteristic
of this attack class is that it is very tedious, but the individual steps can be
automated and progress towards success can always be measured. This keeps
the motivation of hackers high.

Fault attacks are much less invasive and are typically performed with high–end
laser equipment. The aim here is, e.g., to introduce faults either during code execu-
tion, or when reading data or code from the various memories. A famous example
is the Bellcore attack [5], where the introduction of a single fault at the right stage
of an RSA calculation based on the Chinese Remainder Theorem will reveal the
secret key used. The most economical way to address these attacks is with a right
mix of sufficiently resilient hardware and a robust embedded operating system that can cope with "mistakes" made by the hardware. The Bellcore attack already
demonstrates a key aspect of these types of attacks: Often, it suffices to have only
a few successful hits in order to succeed with the attack. However, the embedded
software has a chance of detecting the attack, e.g., when doing redundancy checks
that go wrong, or by monitoring alarms sent from the underlying hardware. Some
fault attacks like safe–error attacks [6] are rather difficult to cope with, though, since there the exploit already happens by detecting an "unusual" response to the attack by the embedded device — e.g., a reset forced by an attack is already useful information.
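The Bellcore attack itself is a one-liner once a faulty signature is available. The toy demonstration below (a sketch with illustrative parameters; real moduli are 1024 bits or more) shows the principle.

```python
import math

def bellcore_factor(n, e, m, s_faulty):
    # Bellcore attack on RSA-CRT [5]: if a fault corrupts exactly one of
    # the two CRT half-exponentiations, the faulty signature s' of
    # message m satisfies gcd((s'^e - m) mod n, n) = p, revealing a
    # prime factor of the modulus.
    return math.gcd((pow(s_faulty, e, n) - m) % n, n)

# Toy demonstration (illustrative numbers only):
p, q = 1009, 1013
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))              # private exponent
m = 42
sp = pow(m, d % (p - 1), p)                    # correct half, mod p
sq = (pow(m, d % (q - 1), q) + 1) % q          # faulted half, mod q
s = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n  # CRT recombination
assert bellcore_factor(n, e, m, s) == p        # the fault leaks the factor p
```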
On the other hand, side–channel attacks are not invasive at all, and hence
there is by definition no way that an embedded device can tell it is being at-
tacked. Side–channel attacks aim at exploiting subtle differences in the power
consumption or the electromagnetic emission of these devices, e.g., differences
that depend on the secret key used during an encryption. Generally, these differ-
ences are so subtle that the measurement needs to be repeated many times and
quite some statistical analysis may be required. There is an abundant amount
of literature available on this subject.
Pure software attacks like fuzzing are of a different nature and will not be
further considered in this paper.

3 Possible Strategies
Strategies that can cope with new attacks even after the design of the embedded
device is finished are not easy to come by. However, new design principles do
begin to emerge.
In the past, the prevailing strategy had been security by obscurity, meaning
that the designers hoped that by making their design complicated and by hid-
ing their security countermeasures, it would be hard for an attacker to unravel
all the mysteries. However, this often underestimates the capabilities and the
determination of attackers. And once the attacker has been successful, it tends
to cause some embarrassment to the manufacturer. Consequently, in the long
run, it is much smarter to change to a strategy of integral security by design,
where the security lies in the secret keys used, but not in design and implemen-
tation details. Ideally, with such an approach a potential attacker can be given
all design details and it will still not help him/her to launch a successful attack.
Clearly, this is an ambitious goal.
Generally speaking it is favorable to address the root of a problem rather
than the symptoms. For instance, it is certainly possible to use a constant–
current source in hardware to make sure the power consumption is constant

and independent of the CPU and of the coprocessor activity. This way a side–
channel attack on the power consumption will be made much harder. However,
it does not help at all for attacks based on electromagnetic emissions. It is much
better to deploy mathematical countermeasures such as blinding in the CPU and
coprocessor design. These countermeasures address the root cause and provide
protection independent of the particular side–channel used for the attack.
As to fault attacks, given that a single fault may already lead to a successful
attack, it is prudent for the embedded device to react very harshly to any fault
attack that it detects, particularly so when assuming that it will not detect
all faults to begin with. Ideally, it will shut down for good once a few attacks
have been detected. However, this requires that the embedded device can detect
attacks with high confidence and that there are no false positives. A false positive would result in a very bad user experience "in the field" and in unacceptably high numbers of returns of dead devices to the manufacturer. Experience shows that
simple analogue security sensors tend to produce too many false positives in poor
environmental operating conditions and hence the trend is to more sophisticated
and even digital "integral" security sensor concepts.
Some manufacturers deploy strategies where essentially two CPU cores exist
in the embedded device that perform the same calculation and then compare. If
the results differ, likely a fault attack has occurred. This strategy is very generic
and thus in principle capable of catching also future attacks of certain types.
On the other hand, there are obvious weaknesses to this approach. For one, the coprocessors are usually not doubled, so these are not protected by this.
Secondly, the two CPU cores access the same memory and hence attacks on
the memory, say, during the fetching of code, will still apply. And thirdly, the
module that compares the results of the two CPU cores can also be attacked.
Commercially, the disadvantage of this approach is the substantial increase in
chip size, power consumption, and likely a degradation of RF performance.
Other strategies involve generating redundancy in time rather than in space,
by performing computations more than once. Obviously, this decreases the max-
imum performance of the embedded device, which needs to be compensated for
by having very powerful cores to begin with. The advantage is that this integral
method is very flexible and is not carved in stone. It can be applied where needed and is very efficient with scarce resources such as chip area and power consumption. It is entirely possible to cover not only the CPU with such an approach, but also all crypto coprocessors, as well as memory access. Furthermore, the time axis allows for more "tricks" to be applied, such as mathematical blinding techniques that change over time.
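The principle of temporal redundancy can be stated in a few lines. The sketch below is purely illustrative of the idea (the `aes_encrypt` routine in the usage comment is hypothetical); real devices additionally randomize timing and count alarms before permanently disabling the chip.

```python
class FaultAlarm(Exception):
    """Temporal redundancy detected inconsistent results."""

def protected(compute, *args):
    # Redundancy in time: execute the computation twice and compare.
    # A mismatch is interpreted as an induced fault; a real device would
    # log the alarm and eventually shut down for good.
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise FaultAlarm("result mismatch - possible fault attack")
    return first

# Usage (hypothetical cipher routine): ciphertext = protected(aes_encrypt, key, block)
```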
Independent of these particular strategies, formal methods for proving cor-
rectness and robustness against attacks can provide a greater level of confidence
that indeed all known cases have been covered correctly.
More attention also needs to be paid to the family concept found for embedded devices, both in hardware and in software. For instance, the attack
presented in [2,3] was made considerably easier due to the fact that an evolu-
tion of closely related smart cards existed across many technology nodes, where

the attacker could learn and train his/her skills. Moreover, some members of
this family had less security features than others and were targeting other mar-
kets with possibly less security requirements. Again, these weaker members of
the same family provided stepping stones towards the ultimate attack. In order
to reduce such collateral damage, product diversification is required to target
different markets. Diversification of countermeasures may also be in order.
On a system level, it is always a wise idea to make the targets as unattractive
for attacks as possible. At the end of the day, except for ethical hacking, it is the
commercial business case that a hacker will review. Is it financially rewarding to
perform an attack or not — even though it may be theoretically feasible? Thus,
a system designer should refrain from making a single smart card or embedded
device too valuable to attack — e.g., by putting the same global system key into all smart cards of an ecosystem. However, it is rather hard to estimate how
much value a single modern high–end smart card is capable of protecting — it
will be in the range of a few hundred k$, but not in the millions.

4 Certification and Risk Management


There is also room for improvement when it comes to certification and risk
management. The current certification schemes such as Common Criteria do
a very good job at assessing the security of an embedded device, and there is
substantial effort involved in these schemes to stay at the forefront of security
technology and to keep raising the bar. However, there are two effects here
that need to be considered. Firstly, it is a bit bizarre to hear that a brand
new embedded device fails a certification — perhaps only marginally — when
there are older embedded devices "out there" that were certified against a lower benchmark of security a few years before, and hence may actually be less secure. So, the overall security in that ecosystem would actually increase if these not quite perfect successor devices were to replace their even less secure predecessors.
This calls for proper risk management. For some industries, like banking, risk management is in place, whilst for others it is not.
Secondly, because these certification processes tend to be very slow, by the
time a product gets a certificate, it is already old. The certificate necessarily provides a snapshot of the security level at the time it was issued. This calls for
very lean and agile certification schemes with regular maintenance procedures
for products to check whether they are still secure enough.

5 Conclusion
The wealth of new attacks on embedded devices and new countermeasures
thereto that emerges every year requires new approaches to the design of se-
cure embedded devices. Manufacturers need to embrace a philosophy of integral
security by design which is capable of coping with such new attacks, and where
the design could in principle be opened up to public review for analysis without
providing any substantial benefit to a potential attacker. Security by obscurity

is a thing of the past. Countermeasures need to be as generic as possible and deal with entire classes of attacks rather than providing patches for very specific
attacks only. Naturally, countermeasures will need to be much more mathemat-
ical than in the past. Formal methods may help here to gain clarity and proofs
of completeness. These challenges present a huge opportunity for the security
research community. And finally, one will need to find ways to improve certifica-
tion schemes such as Common Criteria, to add to their highest security standards
also the flexibility to cope with an ever more quickly changing world.

References
1. Common Criteria for Smart Cards, http://www.commoncriteriaportal.org/
2. Tarnovsky, C.: Hacking the Smartcard Chip. In: Blackhat Conference, February 2-3
(2010), http://www.blackhat.com/html/bh-dc-10/bh-dc-10-briefings.html
3. Nohl, K., Tarnovsky, C.: Reviving Smart Card Analysis. In: Blackhat Conference,
August 3-4 (2011),
http://www.blackhat.com/html/bh-us-11/bh-us-11-briefings.html
4. Schobert, M.: http://www.degate.org/
5. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Cryp-
tographic Protocols for Faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS,
vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
6. Loubet-Moundi, P., Olivier, F., Vigilant, D.: Static Fault Attack on Hardware DES
Registers, http://eprint.iacr.org/2011/531.pdf
An Interleaved EPE-Immune PA-DPL Structure
for Resisting Concentrated EM Side Channel Attacks
on FPGA Implementation

Wei He, Eduardo de la Torre, and Teresa Riesgo

Centro de Electrónica Industrial, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain
{wei.he,eduardo.delatorre,teresa.riesgo}@upm.es

Abstract. The early propagation effect (EPE) is a critical problem of conventional dual-rail logic implementations against Side Channel Attacks (SCAs). Among
previous EPE-resistant architectures, PA-DPL logic offers EPE-free capability
at relatively low cost. However, its separate dual-core structure is a weakness when facing concentrated EM attacks, where a tiny EM probe can be precisely positioned close to one of the two cores. In this paper, we present a PA-DPL dual-core interleaved structure to strengthen the resistance against sophisticated EM attacks on Xilinx FPGA implementations. The main merit of the proposed structure is that the two routings of each signal pair are kept identical even though the dual cores are interleaved. By minimizing the distance between the complementary routings and instances of both cores, even a concentrated EM measurement cannot easily distinguish the minor EM-field imbalance. In PA-DPL, EPE is avoided by compressing the evaluation phase to a small portion of the clock period; the speed is therefore inevitably limited. Regarding this, we made an improvement to extend the duty cycle of the evaluation phase to more than 40 percent, yielding a larger maximum working frequency. The detailed design
flow is also presented. We validate the security improvement against EM attack
by implementing a simplified AES co-processor in Virtex-5 FPGA.

Keywords: Interleaved Placement, Dual-Core, Concentrated EM Attack, Routing Conflict, PA-DPL, PIP, LUT, FPGA.

1 Introduction

Power consumption and electromagnetic (EM) attacks have been the most studied attack types since Side Channel Attacks (SCAs) were introduced by Paul Kocher et al. [1]. DPL (Dual-rail Pre-charge Logic) has been experimentally proven to be an effective countermeasure against SCA, masking data-dependent power or EM variations through the complementary behavior of the True (T) and False (F) rails.
In [2], the Early Propagation Effect (EPE), also called the Early Evaluation/Pre-charge Effect, was studied for the first time, revealing a potential defect in conventional DPL logic that can impair the complementary balance between the T and F rails. The difference


in arrival times between the inputs of complementary gates (or LUTs on FPGAs) can generate unintentional data-dependent power or EM peaks. This is particularly critical in FPGA implementations due to the rigid routing resources. In recent years, several countermeasures for repairing the EPE problem have been proposed, mainly based on dual-rail compound gates with complementary signal pairs. In this structure, the corresponding gates of the two rails are placed side by side, but the routing is done automatically by the router, which may lead to non-identical routing paths between the complementary rails. A dual-core structure called PA-DPL (Pre-charge Absorbed Dual-rail Pre-charge Logic) was proposed in [3], which aims to resist the EPE problem while keeping the routing identical for implementations on Xilinx FPGAs with 6-input LUTs. However, the separate placement of the two cores makes it vulnerable to concentrated EM attacks.
In this paper, we present a row-crossed interleaved structure to minimize dual-rail unbalances caused by non-identical routings. Its main merit is that identical routing for complementary net pairs can be maintained even when the dual cores are interleaved, thereby increasing the resistance to concentrated EM attacks. We also mitigate the rigid timing of [3] by extending the signal's duty cycle, which helps to increase the maximum working frequency. The complete design flow and security tests attacking the interleaved PA-DPL are given.
The rest of the paper is organized as follows. Section 2 introduces the EPE problem and briefly discusses related techniques. Section 3 details the proposed interleaved PA-DPL structure with identical routing. The implementation flow of this structure for a simplified AES co-processor is shown in Section 4. Section 5 describes the experimental attacks and net delay results. Conclusions and future work are given in Section 6.

2 Related Work

Side channel analysis reveals confidential information by analyzing side channel leakages at a low, namely physical, level. Countermeasures at this level therefore typically offer better security than, for example, arithmetic protections. However, physical leakages are affected by many factors. Any minor asymmetry between the T and F rails can lead to detectable unbalanced compensation in a DPL structure, such as compensation skew, switching time swing, or glitches. Typically, routing length and process variation are considered the two most significant factors impacting the compensation between the T and F rails [4].

2.1 The Problem of Early Propagation Effect


DPL is a common logic style with a symmetrical structure and mirrored behavior. It generates complementary logic behaviors on the T and F rails and therefore exhibits a constant, stable switching pattern in the overall power or EM curves of both rails. Figure 1 shows a compound XOR gate where complementary inputs to the two gates generate complementary outputs.

Conventional DPL structures may be vulnerable due to EPE. EPE can potentially occur whenever gates switch, either from the pre-charge to the evaluation phase or from the evaluation to the pre-charge phase. The EPE problem not only opens the possibility of attacks on the power/EM variations caused by switching-related glitches or skewed matching, but also on the switching actions themselves, by measuring their time variation. Generally, EPE has three main impacts that can potentially be exploited in side channel analysis.

Fig. 1. DPL compound XOR gate, inverter factor is allowed in this example

Unintentional Switch. Normally, DPL logic requires that each compound gate has one and only one switching action per clock cycle, to ensure that its behavior is data-independent [18][19]. If the inputs of a gate have different arrival times, an unintentional switch may happen depending on the input combination. As shown in Figure 2, the inputs of the XOR gate of the compound XOR gate have different arrival times; when the input combination is AT:BT = 1:1 in the evaluation phase, a short switching action occurs, which is inevitably reflected in the power or EM leakage. Since the switch can occur only for this input combination, this peak in the power or EM trace is data-dependent.
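To make this concrete, the following minimal sketch (our illustration, not part of the original work; the zero-delay gate model and the one-unit skew are assumptions) simulates a single XOR gate whose input B arrives one time step after input A, starting from the pre-charge state. Counting output transitions shows that only the 1:1 evaluation combination produces a transient pulse:

def xor_trace(a_eval, b_eval, delay_b=1, steps=4):
    """Output of an XOR gate over time when A switches at t=1 and B at
    t=1+delay_b; both inputs hold the pre-charge value 0 before that."""
    out = []
    for t in range(steps):
        a = a_eval if t >= 1 else 0
        b = b_eval if t >= 1 + delay_b else 0
        out.append(a ^ b)
    return out

def transitions(trace):
    return sum(1 for x, y in zip(trace, trace[1:]) if x != y)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    trace = xor_trace(a, b)
    print(f"AT:BT={a}:{b}  output={trace}  transitions={transitions(trace)}")
# Only AT:BT=1:1 yields two transitions (a 0->1->0 pulse): a data-dependent
# glitch that shows up in the power/EM trace.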

Switching Time. EPE also manifests itself in the gate switching time. The switching-time attack was first introduced in [5]. In DPL, the switching edge of a gate with different input arrival times swings depending on the input combination. In Figure 3, early and late switching reveal the input combinations "1:0" and "0:1", respectively. The starting edge of the switching action of such a gate is therefore also data-dependent.

Skewed Compensation. The two gates of each compound gate should switch simultaneously so as to match each other precisely. Even if the arrival times of the inputs of each gate of the compound gate are kept identical, the XOR and XNOR gates cannot switch at the same time, because the arrival times of the two gates themselves differ (XOR gate 1 unit, XNOR gate 2 units, as shown in Figure 4). The minor peak residue due to this skewed compensation can still be exploited by attacks.

Fig. 2. For this single XOR gate in the XOR compound gate, different input delays lead to a data-dependent unintentional switching action

Fig. 3. The switching time swings depending on the input combination of each single gate of the XOR compound gate

Fig. 4. A skewed switching action causes an imperfect match

2.2 Previous Work Related to EPE Protection


Several techniques have been proposed in recent years for resisting EPE in FPGA implementations. DRSL, introduced in [6], synchronizes the inputs before the evaluation phase. STTL [20] ensures the same arrival time of the gate inputs by using an extra validation rail; it requires customized gates, which complicates the implementation. iMDPL [7], developed from MDPL [8], synchronizes the gate inputs with an SR latch, but the size and complexity of the resulting gate are a concern. BCDL, presented in [9], synchronizes all the input pairs of a compound gate using a Bundle block; since it places no limitation on the gate type, better optimization reduces the resource cost compared with the previous solutions. Another structure, DPL-noEE [10], evolved from BCDL, embeds the synchronization logic into the encoding of the LUT equations: any potential intermediate transition is eliminated by changing the code values to the pre-charged invalid state. It has the highest efficiency in resource usage; however, the starting edge of the evaluation phase swings depending on the input combination. In [13], the authors explored place and route techniques for SDDL logic which keep the routing identical for both rails in an interleaved placement, but the EPE problem is not solved there.

2.3 Interleaved Placement


In FPGAs, logic cells and routing resources are deployed as a highly regular array. Interleaved placement aims to overlap the T and F parts of the dual-rail modules by mapping the synthesized design into basic logic blocks (CLBs) side by side. This makes the distance between complementary cells as small as possible. In Xilinx FPGAs, the placement configuration can be controlled by setting prohibit constraints. Different placement types can be used for an interleaved dual-core module. Similarly to the work in [13], we investigated several placement types, shown in Figure 5. Types A and B give the smallest distance between complementary instances and nets, with a high placement density; type C offers more space for routing, but with a lower placement density.

Fig. 5. Possible placement configurations (types A, B, and C) for the dual-core interleaved structure

3 Proposal of Interleaved PA-DPL

Because the pre-charge and synchronization logic are embedded into the LUT equations, PA-DPL uses hardware resources more efficiently than most other EPE-resistant solutions. Equations of up to 4 inputs are permitted in a 6-input LUT without prohibiting inverters, which further optimizes resource usage.

3.1 PA-DPL
PA-DPL evolves from the FPGA-implemented SDDL logic [11][12]. As described in [3], the pre-charge logic is absorbed into the LUT function by inserting a global pre-charge signal. The Ex signal works together with the Pre-charge signal to restrict the evaluation and pre-charge phases to fixed portions of the clock cycle; Pre-charge and Ex are produced with a stable phase shift. The resistance against EPE rests on the following two points [3]:

1. Early Pre-charge Prevention. In PA-DPL, the Ex and Pre-charge signals are implemented on global clock networks and connected directly to every LUT of the protected part. All logic cells can therefore be pre-charged instantly and simultaneously, without waiting for the propagation of a pre-charge wave as in WDDL [11]. This ensures that the pre-charge phase always starts before the invalid (pre-charged) value of the fastest input arrives at each LUT, as illustrated in Figure 6.

2. Early Evaluation Prevention. Since valid data needs to propagate from the source registers to the capture registers, the Ex signal in PA-DPL confines the evaluation phase to a restricted period of each clock cycle, so that the evaluation phase starts only after the valid value of the slowest input has arrived at each LUT. A register stores the propagated valid data in each evaluation phase and releases it to the first LUT of the next sequential stage in the next evaluation phase, so the T and F registers always store complementary valid data.

Fig. 6. Implementation of PA-DPL logic [3]


Fig. 7. Separate dual-core structure of PA-DPL logic (the two cores are separated by 1 CLB row, i.e., 1 DU)

Threats from Concentrated EM Analysis. The two cores in PA-DPL can be placed close together to obtain good timing and security performance; in Figure 7 they are placed at a distance of 1 CLB row, hereafter called 1 DU (Distance Unit). However, the complementary LUTs and routings are still deployed at relatively large distances, here 5 DU. If a narrow probe can be positioned precisely over one of the two cores, the voltage induced by the magnetic field of a pair of data-dependent cells cannot be balanced.
Power-based attacks depend on the global power consumption of the whole design, so in this context the location of the core has no crucial influence on the compensation of the overall power consumption, and the separate dual-core architecture is not a major weakness against power-based side channel analysis. However, manufacturing process variation matters when facing more sophisticated power or EM attacks. In [22][23] the authors demonstrated that closer locations on a chip exhibit less process variation. To mitigate the effect of fabrication process deviations, it is thus better to deploy two complementary cells or nets at closer locations.

3.2 Routing Conflicts

Compared with ASIC implementations, FPGAs offer much less freedom in choosing the resources used by a design, especially the routing resources. Using the FPGA Place and Route (PAR) tools, users cannot control the router other than through its predefined routing algorithm.
Switch matrices offer the connection possibilities between the horizontal and vertical local lines. Combined with some special interconnects, the router automatically connects all logic cells according to the specific design. Generally, the switch boxes at the perimeter differ from those inside the fabric in the number of allowable routes. Since identical routings require identical routing resources, the placement of the duplicated part should avoid the perimeter resources, preventing routing problems in advance.
In an interleaved placement, routing conflicts can occur when duplicating the routings of the T core at the location of the F core, since the routing resources for the F core may already have been assigned to nets of the T core. This makes the direct copy-and-paste technique of [14] challenging when the F part is overlapped or interleaved in the same fabric location as the T part.

3.3 Timing Improvement


Compared with WDDL, the synchronized logic in [3] has a reduced duty cycle of 25%; in fact, additional timing margin can be recovered. Here we avoid the use of a frequency-doubled Ex signal, using instead one with the same frequency as the global pre-charge and clock signals, as shown in Figure 8. As before, a stable phase shift between Prch and Ex compresses the evaluation phase so that it starts only after the valid (evaluated) value of the slowest input has arrived at the gate (i.e., the LUT in an FPGA). This is easily done by setting the width of a Johnson counter to 6 bits (other widths can be chosen depending on the phase shift a specific design requires) and using the outputs of any two neighboring bits as the inputs of the global clock buffers of Prch and Ex, respectively; Prch then leads Ex by a 30° phase shift. With this configuration, we obtain synchronized signals with a fixed evaluation phase of 41.7% of the clock period. The configuration is related to the speed of the circuit: a smaller phase shift offers a larger duty cycle but risks exceeding the arrival time of the slowest input of some LUT. If the gate mapped to that LUT is critical (i.e., related to the key of the crypto-algorithm), side channel analysis of this part is still possible. A larger phase shift leads to a shorter evaluation phase (i.e., a smaller signal duty cycle) but prevents EPE in the majority of the critical cells.
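As a sanity check of this timing scheme (a behavioral sketch of our own, not the authors' HDL; the tap choice is an assumption), a 6-bit Johnson counter cycles through 2 × 6 = 12 states, so each bit is a square wave with a 12-step period and neighboring bits are shifted by one step, i.e., 360°/12 = 30°:

def johnson_counter(width=6, steps=12):
    """Simulate a Johnson (twisted-ring) counter; returns the state sequence."""
    state = [0] * width
    trace = []
    for _ in range(steps):
        trace.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]   # shift right, feed back inverted last bit
    return trace

trace = johnson_counter()
prch = [s[0] for s in trace]   # tap bit 0 as Prch
ex   = [s[1] for s in trace]   # neighboring tap bit 1 as Ex, one state (30 deg) later
print("Prch:", prch)           # [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print("Ex:  ", ex)             # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

Taps further apart would give larger multiples of 30°, which is how the phase shift (and hence the width of the evaluation phase) can be tuned to the speed of a specific design.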

Fig. 8. Timing schedule for interleaved PA-DPL XOR gate

4 Implementation

A simplified AES co-processor is chosen as the test design for our row-interleaved PA-DPL. We implement the S-box with logic elements instead of RAM. Figure 9 illustrates the block diagram of this design, which contains only the XOR operation and the S-box substitution block. The T and F cores share the same control and clock generation blocks in order to save resources; the partitioning method is similar to the technique in [15]. In each clock cycle, 8 bits of plaintext generated by a Pseudo-Random Number Generator (PRNG) are encrypted into 8 bits of ciphertext; this design is abbreviated AES-8 in the following. A pair of 8-bit registers stores the outputs of the T and F S-boxes. Figure 10 shows the design flow of the interleaved PA-DPL. The complete procedure is made up of a manual phase, an automatic phase, and a routing conflict check phase.

Fig. 9. Block diagram of dual cores simplified AES module

Fig. 10. Dual cores share the same control and clock-generating logics

Manual Phase. This step comprises two synthesis iterations, one constraint insertion, and one file conversion. First, Virtex-4 is chosen as the target device to synthesize the HDL file of our design. This yields an ngc file, a binary netlist file with constraint information, in which each Boolean function is constrained to a 4-input LUT, since Virtex-4 FPGAs are based on 4-input LUTs. Then, Virtex-5 is used as the target device to synthesize the ngc file. We set the maximum number of used inputs of the 6-input LUTs to 4 and disable the optimization strategy in the process properties; we then obtain an ncd file in which all 6-input LUTs have at most 4 used inputs, i.e., at least 2 unused inputs per LUT. This is exactly what is required, because in PA-DPL 2 inputs of each LUT are needed to implement the pre-charge and synchronization logic. A ucf file is used in this synthesis to limit the use of CLBs in certain parts, creating an initial interleaved placement. As shown in Figure 11, the ncd file is then converted to XDL (Xilinx Design Language).


Fig. 11. Single (unprotected) core with row-crossed interleaved placement

Automatic Phase. An XDL file is a readable version of an ncd file; it contains all the information of the design in a regular format for all instances and nets. All the copy-and-paste work can thus be done by modifying the XDL content programmatically. We constructed a script, named SCB (Security Converter Box), to automatically and safely convert the single core into an interleaved dual-core module at this low level of description. SCB is written in Java using regular expressions. It adapts itself to different designs, since users only need to supply two parameters: the location of the T part (the part to be protected) and the displacement of the F part (for the type C placement of Figure 5, this parameter is vertical '+1', horizontal '0'). SCB automatically executes all the modifications and produces a converted XDL file. This phase performs the following steps (see the sketch after this list):
– Tag nets and instances according to the location parameters.
– Duplicate and move the instances of the T part to the location of the F part.
– Insert Prch and Ex on the free inputs of each LUT.
– Adjust the LUT equations.
– Arrange the over-block nets (delete and convert these nets).
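As an illustration of the kind of text transformation SCB performs (a minimal sketch of our own in Python, not the actual Java tool; the XDL-like records and the regular expression are simplified and hypothetical), the following fragment duplicates T-part instances and displaces them by a vertical offset:

import re

# Hypothetical, simplified XDL-like instance records (real XDL is richer).
xdl_t_part = (
    'inst "T_ff1" "SLICEL", placed CLBLL_X10Y20 SLICE_X10Y20 ;\n'
    'inst "T_ff2" "SLICEL", placed CLBLL_X10Y22 SLICE_X10Y22 ;\n'
)

def duplicate_to_f_part(text, dy=1):
    """Create F-part copies of all T-part instances, displaced by dy rows."""
    pattern = re.compile(r'inst "T_(\w+)" "(\w+)", placed '
                         r'(\w+?)_X(\d+)Y(\d+) SLICE_X(\d+)Y(\d+) ;')
    f_lines = []
    for m in pattern.finditer(text):
        name, kind, tile, tx, ty, sx, sy = m.groups()
        f_lines.append(f'inst "F_{name}" "{kind}", placed '
                       f'{tile}_X{tx}Y{int(ty) + dy} SLICE_X{sx}Y{int(sy) + dy} ;')
    return text + "\n".join(f_lines) + "\n"

print(duplicate_to_f_part(xdl_t_part, dy=1))
# The real SCB additionally rewrites the LUT equations, inserts Prch/Ex on the
# free LUT inputs, and handles the nets; only the displacement idea is shown.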


Fig. 12. The T net is routed by the Xilinx router, so it has an optimal global timing result. The check and re-route flow deletes only the conflicting net section and keeps all other sections, so the optimized timing result provided by the router is preserved as much as possible.

Routing Conflict Check Phase. After the conversion step, a PA-DPL protected circuit in interleaved structure is obtained and transformed back into an ncd file. At this point, conflicts between T and F routing lines may still exist in the design, so the design is checked by a tool developed on top of RapidSmith [16][17]. This tool transforms every net into an abstract representation in which each net is a node and the Programmable Interconnect Points (PIPs) define the connections between these nodes. Since the copied routing lines were tagged in the previous phase, the tool checks all routing information of the F part by comparing the path shape (PIP information) between the T and F rails. If the same PIP is found twice, the F routing passing through this PIP conflicts with another routing using the same PIP. The tool then deletes the conflicting section of the T routing, re-routes it, and duplicates it to generate a new F routing. The PIPs of the new F routing are checked again; if conflicts remain, the procedure is repeated until no conflicts are found. Figure 12 illustrates the block diagram of this check and re-route flow.
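The conflict detection itself reduces to comparing PIP sets. The following sketch (our illustration with made-up PIP identifiers, not the actual RapidSmith-based tool) shows the core check for one duplicated net:

def find_conflicts(f_net_pips, used_pips):
    """Return the PIPs of a duplicated F net already claimed by other nets."""
    return sorted(set(f_net_pips) & used_pips)

# Hypothetical (tile, start-wire, end-wire) PIP identifiers for demonstration.
used_pips = {("INT_X10Y21", "NL1B0", "LOGIC_OUT4"),
             ("INT_X10Y22", "EL2B1", "IMUX_B3")}
f_net = [("INT_X10Y21", "NL1B0", "LOGIC_OUT4"),   # conflict: already used
         ("INT_X10Y23", "SS2B2", "IMUX_A1")]       # free

print(find_conflicts(f_net, used_pips))
# For every conflict found, the flow deletes that section of the T routing,
# re-routes it, duplicates it again to the F net, and re-checks until the
# returned set is empty.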

Fig. 13. Dual-core (protected) with row-crossed interleaved placement. Complementary routing
pairs in the core part are identical in shape.

The final layout of a PA-DPL AES-8 in the row-crossed interleaved structure is shown in the left part of Figure 13; the routings of the two core parts are identical. Different interleaved structures can be obtained by using different placement constraints in the manual phase. However, according to our test results, the configuration of the PIPs in the horizontal direction is not strictly identical in neighboring columns of the target device (Xilinx Virtex-5). We therefore chose the row-crossed type (type A in Figure 5), due to its high placement density and the perfect regularity of the PIP configuration in the vertical direction. A pair of identical routings from the interleaved placement is shown in the right part of Figure 13.

5 Test Attacks and Timing Check

Comparison attacks were performed to validate the protection improvements. We implemented the AES-8 co-processor in SE (Single-Ended, i.e., unprotected), separate PA-DPL, and row-crossed interleaved PA-DPL versions. They are all deployed at similar fabric locations of the same Virtex-5 FPGA chip in order to minimize the interference from process variation [21][22]. The control logic sends the plaintext and repeatedly runs the encryption core at a frequency of 3 MHz. The SE and separate PA-DPL designs use the same placement constraints as the interleaved one for ease of comparison. A self-made EM probe (a copper multi-turn antenna with 0.5 mm diameter and 26 turns) is used to gather the EM radiation, sampled at 667 MSa/s with an Agilent oscilloscope with segmented memory.

5.1 Experimental Attacks


Initial analysis shows that only 60 traces are enough to retrieve the right key in an attack on the SE implementation of AES-8. The separate dual-core PA-DPL resists the attack until the number of analyzed traces reaches about 50,000. For the interleaved version, the number of traces needed to reveal the key increases to 62,000, i.e., robustness improvement factors of 1033 (62,000/60 ≈ 1033) over the SE implementation and 1.24 (62,000/50,000 = 1.24) over the separate dual-core PA-DPL. The test results are plotted in Figure 14.

[Figure 14 shows three pairs of correlation plots: for the SE implementation, the key is revealed at 60 traces with a correlation peak of 0.487; for the separate PA-DPL, at 50,000 traces with a peak of 0.021; for the interleaved PA-DPL, at 62,000 traces with a peak of 0.016.]

Fig. 14. Correlation coefficient curves of the concentrated EM attacks. The implementation with interleaved placement shows an improved protection level compared with the one with separate placement.

5.2 Timing Verification


FPGA Editor offers a user-oriented interface that is convenient for identifying cells in the fabric matrix, but it does not strictly reflect the physical level of the chip: low-level (physical) parameters are typically kept confidential from users. We therefore compared the timing of the T and F routings to verify the improvement. Table 1 and Figure 15 show the comparison results. Group II is the net delay comparison of the complementary nets of an interleaved, route-uncontrolled dual-core AES-8. Group I is the result for the same module, except that the identical-routing method is used. In Group I the net delay difference is 0 ns for most of the nets; only a few have a minor difference of less than 20 ps. In Group II, by contrast, since the nets are routed automatically by the router, almost all complementary routing pairs have distinct delays. The minor delay differences in Group I are caused by the tiny net adjustments made when the router connects the new core (the F core) to the peripheral control logic. The test results validate the assumption that, even though the physical level is unknown, nets that are identical in the FPGA Editor view obtain the same net delays.

Table 1. Delay difference comparison between Group I (interleaved placement with identical
routing) and Group II (interleaved placement without identical routing) of a routing pair with
11 net sections

                 net1    net2    net3    net4    net5    net6    net7    net8    net9    net10   net11
I   net_F        0.423   0.728   0.496   1.060   0.446   0.980   0.548   1.125   0.758   0.164   0.626
    net_T        0.423   0.728   0.496   1.060   0.446   0.982   0.548   1.143   0.758   0.164   0.626
    net_F−net_T  0.000   0.000   0.000   0.000   0.000  −0.002   0.000  −0.018   0.000   0.000   0.000
II  net_F        0.421   0.686   0.494   1.058   0.443   1.125   0.529   1.124   0.759   0.410   0.626
    net_T        0.423   0.728   0.496   1.060   0.446   0.982   0.548   1.143   0.758   0.164   0.626
    net_F−net_T −0.002  −0.042  −0.002  −0.002  −0.003   0.143  −0.019  −0.019   0.001   0.246   0.000
(all delays in ns)


Fig. 15. Bar diagram of the delay differences. The comparison shows that with identical routings, complementary net pairs exhibit an extremely small spread of delay differences.

6 Conclusion
This paper addresses the routing problem that occurs when overlapping the complementary parts of dual-core structures in DPL logic. We developed a technique capable of checking and repairing unmatched routing pairs. By following the routing conflict checking flow, identical routing can be kept for the complementary parts even when the placement is closely interleaved. Based on the EPE-resistant PA-DPL, we demonstrated an improved variant with a row-crossed interleaved structure for the core part and routing consistency. This brings corresponding complementary instances and nets as close as one DU while keeping the delays of complementary nets identical, which effectively strengthens the resistance against concentrated EM attacks. The interleaved PA-DPL also keeps the dual rails closely parallel, which helps reduce the impact of process variation, since neighboring areas of a silicon chip provably have more similar electrical parasitic parameters than areas that lie apart [22]. We also corrected the Ex signal of PA-DPL to relieve the timing pressure caused by the compressed evaluation phase; after this improvement, the signal duty cycle can be expanded to 41.7% when the core works at a 3 MHz frequency. The timing verification confirms that the combination of the proposed techniques significantly reduces the delay difference within each complementary net pair. A size comparison was made in terms of LUT cost: the interleaved PA-DPL AES-8 occupies 353 LUTs, an increase factor of 2.69 over the 131 LUTs of the unprotected one, while the separate PA-DPL version occupies 355 LUTs. This minor difference between the interleaved and separate versions is due to the different placements, which impact the synthesis and mapping results. The cost increase factor varies depending on the proportion of the whole circuit occupied by the core part. The comparison attacks on the different implementations show that the row-crossed interleaved PA-DPL increases the resistance against concentrated EM analysis by factors of 1033 and 1.24 with respect to the unprotected circuit and the PA-DPL protected circuit with separate placement, respectively.
In the next step, we will test the circuit with more sophisticated attacks in order to perform a thorough security verification. Reducing the transient peak current is another part of the future work.

Acknowledgments. This work was partially supported by the Artemis programme under the project SMART (Secure, Mobile Visual Sensor Networks Architecture), number ARTEMIS-2008-100032, and by the RECINTO project, partially funded by the Community of Madrid.

References
1. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.) CRYPTO
1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
2. Suzuki, D., Saeki, M.: Security Evaluation of DPA Countermeasures Using Dual-Rail Pre-
charge Logic Style. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp.
255–269. Springer, Heidelberg (2006)
3. He, W., De La Torre, E., Riesgo, T.: A Precharge-Absorbed DPL Logic for Reducing
Early Propagation Effects on FPGA Implementations. In: 6th IEEE International
Conference on ReConFigurable Computing and FPGAs, Cancun (2011)
4. Guilley, S., Chaudhuri, S., Sauvage, L., Graba, T., Danger, J.-L., Hoogvorst, P., Vong,
V.-N., Nassar, M.: Place-and-Route Impact on the Security of DPL Designs in FPGAs. In:
HOST, pp. 29–35. IEEE Computer Society (2008)
5. Guilley, S., Chaudhuri, S., Sauvage, L., Graba, T., Danger, J.-L., Hoogvorst, P., Vong, V.-N.,
Nassar, M.: Shall we trust WDDL? In: Future of Trust in Computing, Berlin, vol. 2 (2008)
6. Chen, Z., Zhou, Y.: Dual-Rail Random Switching Logic: A Countermeasure to Reduce
Side Channel Leakage. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249,
pp. 242–254. Springer, Heidelberg (2006)

7. Popp, T., Kirschbaum, M., Zefferer, T., Mangard, S.: Evaluation of the Masked Logic
Style MDPL on a Prototype Chip. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007.
LNCS, vol. 4727, pp. 81–94. Springer, Heidelberg (2007)
8. Popp, T., Mangard, S.: Masked Dual-Rail Pre-charge Logic: DPA-Resistance Without
Routing Constraints. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp.
172–186. Springer, Heidelberg (2005)
9. Nassar, M., Bhasin, S., Danger, J.-L., Duc, G., Guilley, S.: BCDL: a High Speed Balanced
DPL for FPGA with Global Precharge and No Early Evaluation. In: Proc. Design,
Automation and Test in Europe, pp. 849–854. IEEE Computer Society, Dresden (2010)
10. Bhasin, S., Guilley, S., Flament, F., Selmane, N., Danger, J.-L.: Countering Early
Evaluation: an Approach towards Robust Dual-Rail Precharge Logic. In: WESS. ACM,
Arizona (2010)
11. Tiri, K., Verbauwhede, I.: A Logic Level Design Methodology for a Secure DPA Resistant
ASIC or FPGA Implementation. In: Proc. Design, Automation and Test in Europe, pp.
246–251. IEEE Computer Society (2004)
12. Velegalati, R., Kaps, J.-P.: DPA Resistance for Light-Weight Implementations of Cryptographic Algorithms on FPGAs. In: IEEE International Conference on Field Programmable Logic and Applications (FPL), pp. 385–390 (2009)
13. Velegalati, R., Kaps, J.-P.: Improving Security of SDDL Designs Through Interleaved
Placement on Xilinx FPGAs. In: 21st IEEE International Conference on Field
Programmable Logic and Applications, Crete, Greece (2011)
14. Yu, P., Schaumont, P.: Secure FPGA Circuits Using Controlled Placement and Routing. In: 5th IEEE International Conference on Hardware/Software Codesign and System Synthesis, pp. 45–50 (2007)
15. Kaps, J.-P., Velegalati, R.: DPA Resistant AES on FPGA using Partial DDL. In: IEEE
FCCM, Symposium on Field-Programmable Custom Computing Machines, pp. 273–280
(2010)
16. Lavin, C., Padilla, M., Lamprecht, J., Lundrigan, P., Nelson, B., Hutchings, B.:
RapidSmith: Do-It-Yourself CAD Tools for Xilinx FPGAs. In: 21st IEEE International
Conference on Field Programmable Logic and Applications, pp. 349–355 (2011)
17. Lavin, C., Padilla, M., Lamprecht, J., Lundrigan, P., Nelson, B., Hutchings, B.: HMFlow:
Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping. In: 18th IEEE
Symposium on Field-Programmable Custom Computing Machines, Salt Lake City, USA
(2011)
18. Kulikowski, K., Karpovsky, M., Taubin, A.: Power Attacks on Secure Hardware Based on Early Propagation of Data. In: IOLTS, pp. 131–138. IEEE Computer Society (2006)
19. Suzuki, D., Saeki, M.: An Analysis of Leakage Factors for Dual-Rail Pre-charge Logic
style. IEICE, Transactions on Fundamentals of Electronics, Communications and
Computer Sciences E91-A(1), 184–192 (2008)
20. Soares, R., Calazans, N., Lomné, V., Maurine, P.: Evaluating the Robustness of Secure
Triple Track Logic through Prototyping. In: 21st Symposium on Integrated Circuits and
System Design, pp. 193–198. ACM, New York (2008)
21. Stine, B., Boning, D., Chung, J.: Analysis and Decomposition of Spatial Variation in
Integrated Circuit Processes and Devices. IEEE Tran. on Semiconductor Manufacturing,
24–41 (1997)
22. Sedcole, P., Cheung, P.: Within-die Delay Variability in 90nm FPGAs and Beyond. In:
Proc. IEEE International Conference on Field Programmable Technology (FPT 2006), pp.
97–104 (2006)
23. Maiti, A., Schaumont, P.: Improved Ring Oscillator PUF: An FPGA-friendly Secure
Primitive. J. Cryptology 24, 375–397 (2010)
An Architectural Countermeasure
against Power Analysis Attacks
for FSR-Based Stream Ciphers

Shohreh Sharif Mansouri and Elena Dubrova

Department of Electronic Systems, School of ICT,


KTH - Royal Institute of Technology, Stockholm
{shsm,dubrova}@kth.se

Abstract. Feedback Shift Register (FSR) based stream ciphers are known
to be vulnerable to power analysis attacks due to their simple hardware
structure. In this paper, we propose a countermeasure against non-invasive
power analysis attacks based on switching activity masking. Our solution
has a 50% smaller power overhead on average compared to the previous
standard cell-based countermeasures. Its resistance against different types
of attacks is evaluated on the example of the Grain-80 stream cipher.

1 Introduction
Feedback Shift Register (FSR) based stream ciphers target highly constrained applications and have the smallest hardware footprint of all existing cryptographic systems [1]. They are resistant against many types of cryptographic attacks, including algebraic attacks, chosen-IV attacks, and time/memory/data tradeoff attacks [2,3] but, due to their simple hardware structure, they are vulnerable to side channel attacks [4]. One of the most dangerous side channel attacks is power analysis, which breaks a cipher by exploiting the information content of its power signature. Two popular types of power analysis attacks are Differential Power Analysis (DPA) [5] and Mutual Information Analysis (MIA) [6].
Several countermeasures against power analysis attacks for block ciphers have been developed [7]. Although these countermeasures can be applied to stream ciphers as well, their overhead is often too high.
In this paper we propose a countermeasure against power analysis attacks for FSR-based stream ciphers which masks the power trace of a cipher by altering the switching activity of its FSRs. The proposed solution can be implemented using standard digital cells only and is therefore well compatible with the standard ASIC design flow. Compared to previous standard cell-based countermeasures [8] for FSR-based stream ciphers, it consumes 50% less power and uses 19% less area on average. We evaluate its resistance against DPA, MIA, and more complex attacks on the example of Grain-80.
The remainder of the paper is organised as follows: In Section 2, related work
is summarised; Section 3 makes a preliminary analysis of FSRs and analyses their
dynamic power consumption; Section 4 describes our countermeasure; hardware


implementation, experimental results and security issues are considered respectively in Sections 5, 6 and 7; Section 8 concludes the paper.

2 Related Work
Power analysis attacks were first proposed in 1998 [5]. Several countermeasures have been suggested to protect cryptographic systems against power analysis attacks.
Analog countermeasures hide the correlation between data and power consumption using an analog isolation circuit which keeps the current at a defined level at all times [7,9]. Most of these countermeasures target other cryptographic hardware such as block ciphers [7]. Although analog countermeasures can be effective on FSR-based stream ciphers, most of them have high area and power overheads, which make them unsuitable for highly constrained environments. The only work that focuses directly on designing an analog countermeasure for FSR-based stream ciphers is [10].
Cell level countermeasures implement the cipher using dual-rail logic gates such as SABL [11], TDPL [12] or 2N-2N2P [13]. Dual-rail gates have low power variations compared to standard cells, but higher area and power consumption than standard digital cells. Moreover, these gates are normally not included in standard cell libraries and must be designed at transistor level.
Architecture level countermeasures protect the cipher by hiding the dependency between data and power consumption [8] or by masking the power trace, i.e., by changing the shape of the power diagram so that it shows a completely different pattern compared to the original one [4,5]. To the best of our knowledge, the only architecture level countermeasure specifically targeting FSR-based stream ciphers is [8]. The authors suggest a new implementation of FSRs in which the number of flip-flops is doubled and control logic is inserted so that, for an n-bit FSR, n flip-flops toggle in every cycle (see Figure 5-right) and the power diagram is ideally flat. The countermeasure can be implemented using only standard digital cells but carries high overheads: even without considering the overheads of the control circuits, the average flip-flop switching activity of the system is doubled (the average flip-flop switching activity of an n-bit FSR is n/2 [10]).

3 Preliminaries: Cipher Power Consumption and FSR Switching Activity
FSR-based stream ciphers such as [2,3,14] contain feedback shift registers and combinational blocks. From a hardware point of view, FSRs are chains of synchronous flip-flops connected back-to-back, with (in the Fibonacci configuration) the input of the first flip-flop obtained from a combinational block. The outputs of the flip-flops are defined as the state bits fi.
Non-invasive power analysis attacks can only observe the energetic trace of the complete cipher [15], obtained by probing the current on the power supply line

Fig. 1. An example of a faulty 5-bit FSR with a fault injected on f2 during the initial cycle

of the cipher. There is a high correlation between this energetic trace and the switching activity SA of the FSR state bits fi, i.e., how many FSR state bits toggle in one cycle [8]. This high correlation can be explained by the following observations:
– Given the size of the FSRs in FSR-based stream ciphers (2 × 80 bits for Grain-80, 2 × 128 bits for Grain-128, 288 bits for Trivium), most of the power of the ciphers is consumed by the FSRs themselves, with only a marginal contribution from the combinational blocks [8,10].
– The energy consumption of every flip-flop in an FSR depends strongly on its output bit. Clock switching has a significant but constant power consumption; if the output of a flip-flop toggles, its energy consumption is much higher than when its output does not toggle. The energy consumed in a 0 → 1 or 1 → 0 transition is, to a first approximation, equal, and much higher than the energy consumed in a 0 → 0 or 1 → 1 transition [5].

In an experiment that we ran on energetic traces of Grain-80 during operation (200 cycles were considered), we found a ∼85% correlation between the cipher's energetic trace and the switching activity of its FSR state bits (see Figure 2).

4 Switching Activity Alteration Countermeasure

4.1 Intuitive Idea

Since the energetic trace of an FSR-based stream cipher has a very high correlation with the switching activity of the state bits of its FSR(s), to alter the energetic trace we propose to change the switching activity pattern of all its FSRs, i.e., to modify the FSRs so that they have the same output stream as the original FSRs but a different switching activity in every cycle.
If the output fi of a flip-flop is toggled before it is passed on in an FSR, a fault is injected into the chain and propagates through it (see Figure 1). The fault alters the output stream of the cipher if it reaches any of the outputs of the chain going to combinational blocks. If the fault is corrected before that, however, the output stream of the cipher remains unaltered while the switching activity pattern (and thus the power graph) is changed. We insert fault injection/correction mechanisms between the flip-flops composing an FSR, in such a way that the output stream


Fig. 2. Power (current) of Grain-80 and state bits switching activity

Fig. 3. Two different protected FSRs and their switching activities

of the cipher remains unaltered but the switching pattern of the flip-flops is modified. The protected and original ciphers are functionally indistinguishable in a non-invasive attack, because their output streams are identical; however, their power signatures differ.

4.2 Alteration Mechanism


Our fault injection and correction mechanism consists of a number of XOR gates introduced in the middle of the flip-flop chain (modification points), which combine the output of a flip-flop fi with a periodic signal s or s̄ before passing it on in the chain. Signal s toggles between 0 and 1 in every cycle, starting from s = 1 in the first cycle of operation, and s̄ = NOT s. Depending on whether fi is combined with s or s̄, a modification point lets the signal pass unaltered in every even or odd cycle and inverts it in every odd or even cycle, as in Figure 3. Modification points combine fi with s if i is even and with s̄ if i is odd. Let us consider an FSR with two modification points, the first on state bit fj and the second on state bit fk. The first modification point injects a fault into the stream; the second corrects it. If k − j is even, both modification points combine the state bits with signal s or both with s̄; if k − j is odd, one modification point combines the state bit with signal s and the other with s̄. Correction of the fault at the second modification point is guaranteed because the following relations hold:

\[
f \gg i =
\begin{cases}
((f \oplus s) \gg i) \oplus s = ((f \oplus \bar{s}) \gg i) \oplus \bar{s}, & i \text{ even}\\
((f \oplus s) \gg i) \oplus \bar{s} = ((f \oplus \bar{s}) \gg i) \oplus s, & i \text{ odd}
\end{cases}
\]

where the ≫ i operator denotes a right shift by i positions.
State bits f1, ..., fj and fk+1, ..., fn keep their original switching pattern, while state bits fj+1, ..., fk take a new switching pattern. In general, we introduce an even number of modification points in an FSR, so that the first modification point introduces a fault, the second corrects it, the third introduces a fault, the fourth corrects it, and so on. This divides the state bits into unaltered state bits (those that keep their original switching pattern) and altered state bits (those that get a new switching pattern); altered state bits are marked in red in Figure 3. If an altered state bit is given as input to a combinational function, the fault is corrected before the combinational function (see bit f3 of FSR1 in Figure 3).
During the parallel initialization of the FSR, the initial state of the FSR is loaded into its flip-flops. With altered FSRs, because signal s is one in the first cycle of operation, initial state bits that are loaded into altered state bits must be inverted if they have an even index (see f2 and f4 for FSR1 and f2 for FSR2 in Figure 3).
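The whole mechanism can be checked with a small behavioral model (a sketch of our own, assuming a toy 5-bit Fibonacci FSR whose feedback taps, modification-point positions and initial state were chosen only for demonstration). It verifies that the output stream is unchanged while the per-cycle switching activity differs:

def run_fsr(init, cycles, mod_points=()):
    """Fibonacci FSR with optional modification points after the given
    flip-flop indices; returns (output stream, switching-activity trace)."""
    regs = list(init)
    out_stream, sa_trace = [], []
    for t in range(cycles):
        s = 1 - (t % 2)                 # s = 1 in the first cycle, then toggles
        fb = regs[0] ^ regs[-1]         # toy feedback; both taps stay unaltered
        nxt = [fb] + regs[:-1]          # plain shift
        for i in mod_points:            # XOR the passed-on bit with s or s-bar
            nxt[i + 1] ^= s if i % 2 == 0 else 1 - s
        sa_trace.append(sum(a != b for a, b in zip(regs, nxt)))
        regs = nxt
        out_stream.append(regs[-1])
    return out_stream, sa_trace

init = [0, 1, 1, 0, 1]
orig_out, orig_sa = run_fsr(init, 30)
prot_init = list(init)
prot_init[2] ^= 1                        # pre-invert this altered initial bit
prot_out, prot_sa = run_fsr(prot_init, 30, mod_points=(1, 3))
assert prot_out == orig_out              # identical output stream...
print(orig_sa[:6], prot_sa[:6])          # ...but different switching activity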

4.3 Power Traces Independence


Two tests can be performed to check whether the switching activities of the flip-flops in the original and protected ciphers, respectively denoted SAO and SAP, are independent: correlation and mutual information. Independence between SAO and SAP is important to guarantee immunity to power analysis attacks.
Pearson's correlation detects linear dependency between variables. The correlation between SAO and SAP is defined as:
\[
\rho(SA_O, SA_P) = \frac{E\big((SA_O - E(SA_O))\,(SA_P - E(SA_P))\big)}{\sigma(SA_O)\,\sigma(SA_P)}
\]
where E is the expected value operator and σ indicates standard deviation.
The mutual information between two random variables measures the amount
of information on one variable that one obtains by knowing the other variable and
can detect any type of dependency among them. The mutual information between
the switching activities of the original and protected ciphers is defined as:
\[
I(SA_O; SA_P) = \sum_{x,y} \Pr(SA_O = x,\, SA_P = y) \,\log_2 \frac{\Pr(SA_O = x,\, SA_P = y)}{\Pr(SA_O = x)\,\Pr(SA_P = y)}
\]

where Pr(SAO = x) is the probability of having SAO = x, Pr(SAP = y) the probability of having SAP = y, and Pr(SAO = x, SAP = y) the probability of having SAO = x and SAP = y at the same time.
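Both tests can be computed directly from sampled switching-activity traces. The sketch below (ours; the histogram-based MI estimator and its bin count are implementation choices, not from the paper) mirrors the two definitions:

import numpy as np

def pearson(x, y):
    """Pearson correlation rho(X, Y) of two sample vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std()))

def mutual_information(x, y, bins=16):
    """Estimate I(X; Y) in bits from samples via a joint histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                  # skip empty bins (log 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Toy stand-ins for two unrelated 160-bit switching-activity traces:
rng = np.random.default_rng(0)
sa_o = rng.binomial(160, 0.5, 10_000)
sa_p = rng.binomial(160, 0.5, 10_000)
print(pearson(sa_o, sa_p), mutual_information(sa_o, sa_p))
# Even for independent traces the histogram MI estimate stays slightly above
# zero, consistent with the small residual values reported below.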
Figures 4-A and 4-B show, respectively, the correlation and the mutual information between SAP and SAO as the number a of altered state bits changes. The figures were obtained by running 800 experiments (each of them a 10K-cycle run) on a protected and an unprotected 160-bit FSR. A random noise signal corresponding to ∼0.5% of the maximal switching activity value (+1 with 25% probability,


Fig. 4. A, B: correlation and mutual information between SAO and SAP for 160-bits
FSRs after 10K cycles as the number of altered state bits varies. C, D: correlation
and mutual information distribution between unrelated switching activities after 10K
cycles.

no change with 50% probability, −1 with 25% probability) was added to SAP: without this noise, if the number of altered state bits is even (odd), the parity of SAO and SAP is necessarily equal (different), and the mutual information is close to 1 bit (by knowing SAO we can deduce the parity of SAP and vice versa). This has no practical effect on power analysis attacks, because every power analysis attack uses energetic traces obtained through noisy measurements.
For a = n/2 = 80, ρ(SAP, SAO) ≈ 0 and I(SAP, SAO) has a minimum. Random switching activities SAX and SAY of unrelated FSRs have correlation and mutual information distributed as in Figures 4-C and 4-D (obtained for 600 10K-cycle runs). The average value of I(SAX, SAY) is ∼0.11, which is the same as the mutual information between SAO and SAP for a = n/2 obtained from Figure 4-B.
As a first step, we therefore insert the modification points so that exactly n/2 state bits are altered, which guarantees the lowest dependency between SAO and SAP. The number of altered n-bit FSRs that can be constructed equivalent to a given n-bit FSR with n/2 altered state bits is given by the binomial coefficient $\binom{n}{n/2}$, which is very large for the FSR sizes typically used in FSR-based stream ciphers (∼9.2 × 10^46 for the combination of the two 80-bit FSRs of Grain-80). Note that in Subsection 7.5 we will also assess the security of the method when the mask is randomly picked among all possible masks.

5 Hardware Implementation

To simplify manufacturing, we suggest designing all FSRs with the same layout, with all modification points already implemented, as shown in Figure 5. The modification points are activated or de-activated based on the values of the en signals, which can be programmed after the chip has been manufactured. Alternative solutions are discussed in Subsections 7.4 and 7.5.
In Figure 5-left, each signal fi between f0 = in and f4 = fn is XORed with
the s or s̄ signal before it is passed on in the chain if its relevant eni signal is
set to one; otherwise, the modification point is inactive and the signal is passed

Fig. 5. Left: Schematic diagram of the protected FSR with our countermeasure (the
red feedback function indicates a Galois feedback). Right: Schematic diagram of the
countermeasure in [8].

on unaltered in the chain. Signals ai indicate whether state bit fi is altered (ai = 1) or unaltered (ai = 0). Additional gates are inserted for the parallel initialization and for the correction of the altered state bits that are used as inputs of combinational blocks.
The inputs of the combinational blocks are unmasked. We chose this solution because the combinational blocks consume only a small percentage of the total power consumed by an FSR, and the information they leak in the FSR power trace is small compared to that leaked by the flip-flops; the same assumption was made in [8]. Since the outputs of all combinational blocks are unmasked, our countermeasure can also be used for Galois FSRs (feedbacks on multiple flip-flops in the chain, see Figure 5-left). If the leakage of such combinational functions is a concern, it can be blocked by implementing them in symmetric logic, i.e., gates built in such a way that the switching activity of the combinational function is always the same; this is outside the scope of this paper.
With reference to Figure 5-left, the extra gates can be divided into four groups based on their function:
– G1 corresponds to n XOR gates responsible for combining the fi signals with s or s̄ before passing them on in the chain. They are activated or excluded by AND gates based on the values of the en signals. G1* is an extended version of the G1 gates used for Galois FSRs, using 3-input XORs.
– G2 corresponds to n XOR gates which combine the eni signals to determine the altered or unaltered state of every state bit fi in the FSR: ai is set to 1 if there is an odd number of en signals set to 1 between en0 and eni−1 included, and to 0 otherwise (see the sketch after this list). The ai signals are used as inputs of the gates of G3 and G4.
– G3 contains AND and XOR gates used as fault correction units for the state bits fi that are used as inputs by external combinational blocks: if bit fi is altered (ai = 1), the gates are activated and the fault is corrected.

Fig. 6. Balanced design for G1 and G3 gates

– G4 corresponds to n/2 XOR gates which are active only when the FSR is loaded with the initial state during the first cycle of operation. If the state bit into which an initial state bit is loaded is altered and has an even index, the bit is inverted before it is loaded into the corresponding flip-flop.
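The parity computation performed by the G2 network can be stated compactly (a sketch of our own; the enable pattern is hypothetical):

from itertools import accumulate
from operator import xor

def altered_flags(en):
    """a_i for i = 1..n: a_i = parity of en_0 .. en_(i-1), i.e. a_i = 1 iff
    state bit f_i lies behind an odd number of active modification points."""
    return list(accumulate(en, xor))

print(altered_flags([0, 1, 0, 0, 1, 0]))   # -> [0, 1, 1, 1, 0, 0]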
An extension of our countermeasure to support parallelized FSRs is possible but
is outside the scope of this paper.
The gates in G1 and G3, if implemented as in Figure 6-A, have a different power consumption depending on whether the corresponding en signal is 0 or 1, which could reveal to an attacker the number of en signals set to 1. The gates can instead be implemented as in Figure 6-B: since the en signal is an input of both of the multiplexer's AND gates, in each cycle one of these two gates is active independently of the value of en, and the power consumption of the gates does not depend on the values of the en signals.
The gates in G1, G2, G3 and G4 add area and power penalties to the FSR architecture. However, many of these cells have constant input values (G2) or are active only during initialization (G4), and do not consume dynamic power during cipher operation. The gates in G3 consume dynamic power during operation, but their number is limited because they are only inserted on the state bits that are used as inputs of a combinational function (in Grain-80, on 30 bits out of 160; in Trivium, on 6 bits out of 288).

6 Experimental Results
We designed in Verilog three versions of the Grain-80, Grain-128 and Trivium ciphers: the first unprotected, the second protected as suggested in this paper (a simplified implementation is shown in Figure 7), and the third protected using the countermeasure suggested in [8] (the implementation follows Figure 5-right). We compare the area and power overheads of our countermeasure with the countermeasure in [8] because, to the best of our knowledge, it is the only standard cell architecture level countermeasure targeting FSR-based stream ciphers. All ciphers were synthesized targeting minimal area using Cadence RTL Compiler in UMC 90 nm ASIC technology. Power results were obtained from Cadence RTL Compiler back-annotated with gate-level switching activity, estimated as a combination of dynamic and leakage power, with a power supply of 1.2 V at 1 MHz clock frequency and a set of random test vectors.
As shown in Table 1, the ciphers protected as suggested in this paper are on average ∼19% smaller than the ciphers protected as in [8], and consume on average half as much power. This discrepancy between relatively low area benefits

Fig. 7. Schematic diagram of the protected Grain

Fig. 8. Comparison between the power (current) consumptions and FSR switching activities of both the protected and the unprotected Grain-80

and high power benefits is due to the fact that most of the gates inserted in the FSR do not toggle during operation. Compared to the original cipher, the power overhead of the countermeasure is on average ∼50% for all three ciphers. The power (current) diagrams of a protected and an unprotected Grain-80 over 300 execution cycles are shown in Figure 8-left; the state bit switching activities are shown in Figure 8-right.

7 Security

Side-channel attacks exploit leakage information from a cipher, which in the case of power analysis attacks is its energetic trace. The core idea of differential side-channel attacks is to compare a set of key-dependent predictions of the physical leakage with actual measurements from the cipher, in order to identify which predicted key in a pool of guessed keys is most likely the correct one. A comparison algorithm (also called a distinguisher) is used to distinguish the correctly predicted key from all the other guessed keys.
The cipher under attack, whose secret key is ks, is initialized with a known initial value IV and its power consumption is probed during operation. The power trace is then integrated to obtain an energetic trace Ei indicating the energy consumption of the cipher in clock cycle i. On the other hand, a set of energetic traces EMki is obtained from a model of the cipher: each of these traces indicates the estimated energetic trace of the cipher under attack, initialized with IV, if its secret key were k. Several EMki are obtained, one for each key k ∈ K, where K is the pool of guessed keys. The distinguisher d(Ei, EMki) is

Table 1. Area and power comparison between the original (Org.) Grain-80, Grain-128
and Trivium, the same ciphers using our countermeasure (R.SA), and the countermea-
sure in [8]

Property     |      Grain-80        |      Grain-128       |       Trivium
             | Org.   R.SA    [8]   | Org.   R.SA    [8]   | Org.   R.SA    [8]
Power (µW)   | 3.5    5.4     12.0  | 6.5    9.1     19.5  | 7.4    10.2    21.189
Norm.        | 1      1.54    3.43  | 1      1.4     3     | 1      1.4     2.9
Area (µm²)   | 4482   11962   14482 | 7007   19795   22319 | 7568   20280   28473
Norm.        | 1      2.7     3.2   | 1      2.8     3.2   | 1      2.7     3.8

then calculated for each of the modelled energetic traces. In a successful attack,
the key k giving the highest value of the distinguisher corresponds to the secret
key ks. Attacks on longer traces are more likely to succeed: the Measurements to
Disclosure (MTD) is defined as the minimal number of samples in the energetic
traces for which the correct key's distinguisher becomes higher than that of all
the wrong guessed keys.
The attack is successful if: (1) the pool of guessed keys contains the secret key
(ks ∈ K) and (2) the highest value of the distinguisher is obtained for k = ks .
The first strength of our countermeasure is that it makes it hard for an attacker
to find a pool of guessed keys containing the secret key, because normally
obtaining such a pool requires assumptions on the power model of the system.
We nevertheless assume that a pool of guessed keys containing the secret key is
available, and we check whether the distinguisher can reveal the secret key
during an attack.
We consider two first-order attacks: the DPA attack [5], which uses Pearson's
correlation coefficient as the distinguisher (d = ρ(Ei, EMki)), and the MIA
attack (or generic side-channel attack) [16], which uses mutual information as
the distinguisher (d = I(Ei; EMki)).
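As a small illustration of how such a distinguisher can be computed (our sketch in Python/NumPy, not the authors' code; the trace layout is an assumption), the Pearson-correlation distinguisher over a pool of guessed keys can be written as:

import numpy as np

def dpa_distinguisher(E, EM):
    # E: measured energetic trace, shape (samples,)
    # EM: modelled energetic traces, shape (guessed keys, samples)
    Ec = E - E.mean()
    EMc = EM - EM.mean(axis=1, keepdims=True)
    num = EMc @ Ec
    den = np.sqrt((EMc ** 2).sum(axis=1) * (Ec ** 2).sum())
    return num / den          # one Pearson correlation per guessed key

# the guessed key maximising the distinguisher is retained as the secret key:
# k_best = int(np.argmax(dpa_distinguisher(E, EM)))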

7.1 First-Order DPA Attack


We performed a DPA attack on an unprotected and a protected Grain-80 for a
pool of 300 guessed keys, containing among others the key ks. EMki is obtained
by running an unprotected cipher initialized with IV 300 times, each time using
a different key k.
The energetic traces were obtained from Cadence RTL Compiler, estimated from
gate-level switching activity obtained from simulation and back-annotated
through a VCD file. To make the attack realistic, white noise of up to 10% of
the maximum power is added to the power consumption of the unprotected cipher
in each sample. Figure 9-right shows the correlation coefficients of the guessed
keys for the DPA attack on the unprotected Grain-80 after 1K cycles. The
correlation peak in the diagram easily reveals the correct key (MTD < 1K). In
contrast, the protected Grain-80 (Figure 9-left) is still resistant against the
DPA attack after 1M cycles (MTD > 1M).

Fig. 9. Correlation coefficients for the 300 guessed keys on the protected (left) Grain-80
after 1M cycles and unprotected (right) Grain-80 after 1K cycles

7.2 First-Order MIA Attack


MIA attacks can exploit any abnormality in the joint probability distribution of
two variables and are therefore recommended [6, 16] when the attacker does not
have much knowledge about the target device.
We performed a MIA attack on the unprotected and the protected Grain-80
for 300 guessed keys. The attack could be performed on the energetic trace
Ei obtained from observation of the cipher under attack. However, as already
discussed, Ei has a linear relation with the state-bit switching activity trace
SAi of the cipher under attack, and EMki has a linear relation with the state-bit
switching activity SAMki of the cipher in the model. MIA attacks are sensitive
to noise: to make a worst-case analysis, we suppose that the attacker has been
able to extract SAi from Ei with only minimal noise (+1 with 25% probability,
no change with 50% probability and −1 with 25% probability). The attacker runs
a MIA attack between SAi and SAMki, which he can easily obtain through
high-level simulation of his model. The probability distributions are estimated
using 160 bins, one for each possible value of the switching activity.
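A minimal sketch of this MIA setting (ours, not the authors' code), with the ±1 noise model and the 160-bin histogram estimate:

import numpy as np

def noisy_sa(sa, rng):
    # minimal noise of Sect. 7.2: +1 with prob. 0.25, 0 with 0.5, -1 with 0.25
    return sa + rng.choice([-1, 0, 1], size=sa.shape, p=[0.25, 0.5, 0.25])

def mia_score(sa, sam, bins=160):
    # mutual information I(SA; SAM) from a 2-D histogram with 160 bins,
    # one bin per possible switching-activity value of the 160-bit state
    joint, _, _ = np.histogram2d(sa, sam, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())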
As shown in Figure 10, the MIA attack on the protected Grain-80 after 1M cycles
does not reveal the correct guessed key (MTD > 1M). The joint probability
distribution of SAi and SAMks,i after 1M cycles is Gaussian.

7.3 More Complex Attacks


An attacker could make a more advanced model of the protected stream cipher.
Instead of comparing Ei with the estimated energetic trace of the unprotected
cipher EMki, he could compare it with the estimated energetic trace EMkmi,
an estimation of the energetic trace of the cipher under attack initialized with
IV if its secret key were k and its secret mask (i.e. the number and position
of its altered bits) were a given mask m. With m random, the dependency between
EMkmi and Ei is determined by the number of bits r that have a different state
(altered or unaltered) in the cipher and the model (the mask distance). In terms
of correlation and mutual information, the dependency between the two variables
as r changes can be seen in Figures 4-A and 4-B. The closer r is to n/2,

Fig. 10. Left: MIA attack on protected Grain-80 after 1M cycles. Right: joint proba-
bility distribution of SAi and SAMks,i after 1M cycles.

the higher the degree of independence between EMkmi and Ei. Without any
information, the attacker can only guess m randomly. For a 160-bit FSR, or
the two 80-bit FSRs of Grain-80, the mask distance has a probability
distribution as in Figure 11. The Gaussian distribution would be tighter if n
were higher than 160 (as for Grain-128 and Trivium). Given the sizes of the
FSRs used in stream ciphers, in most cases Ei and EMkmi are therefore only
weakly related.
Estimating the MTD of DPA and MIA attacks is computationally intensive.
MTDs are expected to rise as the mask distance gets closer to n/2. To estimate
MTDs we conducted 5 DPA and 5 MIA attacks using 100 guessed keys for random
masks with r = 80 ± 5, 5 for random masks with r = 80 ± 10 and 5 for random
masks with r = 80 ± 20. For each mask distance, we found a lower bound for
which none of the 5 attacks was successful.
We found that for a MIA attack conducted under the same conditions as in
Section 7.2, the cipher will not break before 100K cycles in 90% of the cases
(70 ≤ r ≤ 90). In 99% of the cases (60 ≤ r ≤ 100), the cipher will not break
before 5K cycles. The results are shown in Figure 11-left. The low success rate
of these attacks is due to the fact that the mutual information curve remains
low as long as the mask distance lies roughly between 60 and 100, as shown in
Figure 4-left.
DPA attacks are more successful because the relation between the mask distance
r and the correlation between the energetic curves is linear, as shown in
Figure 4-B. We found that only in 62% of the cases (75 ≤ r ≤ 85) is the MTD
higher than 100K.
An attacker could also attack the cipher by using several different models of
the secret cipher, each obtained by estimating the energy consumption EMkmi
of the cipher if its mask were a specific mask m ∈ M, where M is a pool of 2,
3 or more guessed masks. It would then be possible to attack the cipher using
multivariate correlation and/or multivariate mutual information between all
energetic traces Ei, EMkmi. If chosen randomly, the guessed masks will in
general have a mask distance r from the mask used by the cipher under attack
distributed as in Figure 11. Discussion of these attacks, which are
computationally more intensive, lies outside the scope of this paper.

Fig. 11. Estimated MTDs for DPA (right) and MIA (left) attacks based on the mask
distance of two random masks (annotated regions: MIA MTD > 500K, > 100K, > 5K;
DPA MTD > 100K, > 30K, > 4K)

7.4 Invasive and Semi-invasive Attacks

Using fixed en signals programmed after the chip has been manufactured makes
the en signals vulnerable to imaging semi-invasive attacks [15, 17], in which
the attacker uses advanced optical probing to observe chip placement and
routing, and laser scanning techniques to find the active doped areas on the
surface of the chip. One solution which makes it more difficult to see under
the top metal layer of the chip is to planarise each layer before applying the
next [17], filling blank spaces on metal layers with metal pads to block the
optical path. It is also possible to prevent decapsulation of the IC by
implementing light sensors on the chip which stop a decapsulated chip from
functioning [15].
As shown in Figure 12-right, it is possible to drive the en signals using an
SRAM Physical Unclonable Function (PUF) [18], which makes imaging attacks
ineffective. When the cipher is powered up, the en signals boot to a state which
is different for every manufactured chip and depends on device mismatches
between the different cells. For the same chip, the en signals should take the
same values at every run; failure to do so only adds some randomization on some
bits of the mask. Discussion of these issues is outside the scope of this paper.

7.5 Random Mask Generator

To increase the security of the countermeasure, the en signals can be chosen
randomly at the start of every run using a set of simple ring oscillators as a
Physical Random Number Generator (PRNG).
An implementation of a simple ring-oscillator PRNG is shown in Figure 12-left.
The ring oscillators start oscillating when rst goes to 0 at the beginning of
every run, which makes the XOR gates act as delay lines. After some cycles, rst
is raised to 1 and the ring oscillators become memory elements, which fix the
values of the en signals during operation. Due to device mismatches and other
unpredictable parameters, the ring oscillators oscillate at slightly different
frequencies from one another and from run to run. The unpredictability of the
PRNG depends on how long the signal rst remains low. Discussion of the
properties of the random numbers generated by ring-oscillator PRNGs is outside
the scope of this paper.
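As an illustration only, a toy behavioural model (ours, not the paper's RTL) of this ring-oscillator PRNG can be sketched as follows; the parameter names and magnitudes are assumptions:

import random

def sample_en_bits(n_osc, t_low, f0=1.0, mismatch=0.01, jitter=0.001):
    # each oscillator accumulates phase at a slightly different, jittery
    # frequency while rst is low; raising rst freezes its output level
    bits = []
    for _ in range(n_osc):
        f = f0 * (1.0 + random.gauss(0.0, mismatch))   # per-oscillator mismatch
        phase = f * t_low + random.gauss(0.0, jitter) * t_low
        bits.append(int(phase) & 1)                    # frozen output level
    return bits

# The longer rst stays low (t_low), the more the accumulated phase spread
# exceeds one full period, making each frozen en bit harder to predict.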

Fig. 12. Two solutions for driving the en signals: PRNG (left) and PUF (right)

Since the mask is randomly picked among all possible values, any model made by
the attacker by guessing a mask will in general have a mask distance r from the
cipher, with r distributed as in Figure 11. Any DPA or MIA attack would
therefore have the same success rate as in Figure 11. Moreover, since the mask
is changed in every run, the attacker has only a single chance to sample a
specific energetic trace before a new mask is loaded and the hardware structure
of the cipher changes.

8 Conclusion
In this paper we introduced a standard cell architectural-level countermeasure
for FSR-based stream ciphers. The proposed countermeasure alters the power trace
by masking the switching activity of the FSR. This differentiates our approach
from the previously proposed ones, which instead flatten the power trace. The
new concept allows us to save on average 50% power and 19% area compared to the
countermeasure of [8].
The proposed countermeasure can be implemented using standard digital cells
only. Therefore, it is compatible with the standard ASIC design flow and easier to
implement compared to analog countermeasures and cell level countermeasures,
which require analog and transistor-level design.
We evaluated the security of our approach by performing DPA and MIA
attacks on the protected version of the Grain-80 stream cipher. The results show
that first-order DPA and MIA attacks cannot break Grain-80 before 1M
cycles. If the attacker guesses a mask or the mask is randomly picked among
all possible values, the success rate of MIA and DPA attacks depends on which
mask is picked. We performed a probabilistic analysis and estimated the success
rates of MIA and DPA up to 100K cycles to be less than 10% and 40%, respectively.
Better results are expected for ciphers using larger FSRs, such as Grain-128 and
Trivium. As a solution for further decreasing the success rate, we propose to
change the mask randomly at every run using a PRNG.
In future work, we plan to investigate the possibility of changing the mask, or
some of its bits, dynamically during the operation of the cipher.

Acknowledgment. This work was supported in part by project No. 621-2010-4388
from the Swedish Research Council.

References
1. Robshaw, M.: The eSTREAM Project. In: Robshaw, M., Billet, O. (eds.) New
Stream Cipher Designs. LNCS, vol. 4986, pp. 1–6. Springer, Heidelberg (2008)
2. De Cannière, C., Preneel, B.: Trivium. In: Robshaw, M., Billet, O. (eds.) New
Stream Cipher Designs. LNCS, vol. 4986, pp. 244–266. Springer, Heidelberg (2008)
3. Hell, M., Johansson, T., Maximov, A., Meier, W.: The Grain Family of Stream
Ciphers. In: Robshaw, M., Billet, O. (eds.) New Stream Cipher Designs. LNCS,
vol. 4986, pp. 179–190. Springer, Heidelberg (2008)
4. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets
of Smart Cards. Springer-Verlag New York, Inc. (2007)
5. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
6. Batina, L., Gierlichs, B., Prouff, E., et al.: Mutual information analysis: Compre-
hensive study. J. Cryptol. 24, 269–291 (2011)
7. Tokunaga, C., Blaauw, D.: Secure AES engine with a local switched-capacitor
current equalizer. In: IEEE International Solid-State Circuits Conference - Digest
of Technical Papers, ISSCC 2009 (2009)
8. Burman, S., Mukhopadhyay, D., Veezhinathan, K.: LFSR Based Stream Ciphers
Are Vulnerable to Power Attacks. In: Srinathan, K., Rangan, C.P., Yung, M. (eds.)
INDOCRYPT 2007. LNCS, vol. 4859, pp. 384–392. Springer, Heidelberg (2007)
9. Ratanpal, G., Williams, R., Blalock, T.: An on-chip signal suppression counter-
measure to power analysis attacks. IEEE Transactions on Dependable and Secure
Computing, 179–189 (2004)
10. Mansouri, S.S., Dubrova, E.: A Countermeasure Against Power Analysis Attacks
for FSR-Based Stream Ciphers. In: ACM Great Lakes Symposium on VLSI, pp.
235–240 (2011)
11. Atani, S., Atani, R.E., Mirzakuchaki, S., et al.: On DPA-resistive implementation of
FSR-based stream ciphers using SABL logic styles. International Journal of Computers,
Communications & Control (2008)
12. Bucci, M., Giancane, L., Luzzi, R., Trifiletti, A.: Three-Phase Dual-Rail Pre-charge
Logic. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 232–241.
Springer, Heidelberg (2006)
13. Moradi, A., Khatir, M., Salmasizadeh, M., et al.: Charge recovery logic as a side
channel attack countermeasure. In: ISQED 2009 (2009)
14. Hell, M., Johansson, T., Maximov, A., et al.: A Stream Cipher Proposal: Grain-128.
In: 2006 IEEE International Symposium on Information Theory, pp. 1614–1618
(2006)
15. Skorobogatov, S.P.: Semi-invasive attacks – a new approach to hardware security
analysis. University of Cambridge, Computer Laboratory, Tech. Rep. UCAM-CL-
TR-630 (April 2005)
16. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual Information Analysis - A
Generic Side-Channel Distinguisher. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008.
LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008)
17. Anderson, R., Bond, M., et al.: Cryptographic processors-a survey. Proceedings of
the IEEE 94, 357–369 (2006)
18. Sadeghi, A.-R., Naccache, D.: Towards Hardware-Intrinsic Security: Foundations
and Practice, 1st edn. Springer-Verlag New York, Inc. (2010)
Conversion of Security Proofs from One Leakage
Model to Another: A New Issue

Jean-Sébastien Coron 1,6, Christophe Giraud 2, Emmanuel Prouff 3,
Soline Renner 2,5, Matthieu Rivain 4, and Praveen Kumar Vadnala 1

1 Université du Luxembourg
  {jean-sebastien.coron,praveen.vadnala}@uni.lu
2 Oberthur Technologies, Crypto and Security Group,
  4, allée du Doyen Georges Brus, 33 600 Pessac, France
3 Oberthur Technologies, Crypto and Security Group,
  71-73, rue des Hautes Pâtures, 92 726 Nanterre, France
  {c.giraud,e.prouff,s.renner}@oberthur.com
4 CryptoExperts, 41, boulevard des Capucines, 75 002 Paris, France
  matthieu.rivain@cryptoexperts.com
5 Université Bordeaux I, 351, cours de la Libération, 33 405 Talence cedex, France
6 Tranef
  jscoron@tranef.com

Abstract. To guarantee the security of a cryptographic implementation
against Side Channel Attacks, a common approach is to formally prove
the security of the corresponding scheme in a model as pertinent as
possible. Nowadays, security proofs for masking schemes in the literature
are usually conducted in models where only the manipulated data are
assumed to leak. In practice, however, the leakage is better modeled by
encompassing memory transitions, as e.g. in the Hamming distance model.
From this observation, a natural question is to decide to what extent
a countermeasure proved secure in the first model stays secure in
the second. In this paper, we look at this issue and we show that it must
definitely be taken into account. Indeed, we show that a countermeasure
proved secure against second-order side-channel attacks in the first
model becomes vulnerable to a first-order side-channel attack in
the second model. Our results emphasize the issue of porting an imple-
mentation from devices leaking only on the manipulated data to devices
leaking on memory transitions.

1 Introduction

1.1 Context

Side Channel Analysis (SCA for short) is a class of attacks that extracts in-
formation on sensitive values by analyzing a physical leakage during the exe-
cution of a cryptographic algorithm. They take advantage of the dependence


between one or several manipulated value(s) and physical measurements. Im-
plementations of block ciphers have been a privileged target, and a large number
of countermeasures have been published during the last decade to protect
them [1, 4–8, 10, 12–15, 17].
One of the most common techniques to protect block ciphers against SCA
consists in randomly splitting each sensitive value of the processing into several
shares [2, 5, 14]. These shares must then be propagated throughout the algorithm
in such a way that no intermediate value is key-dependent, making SCA difficult
to perform. This kind of countermeasure can be characterized by the number of
random shares per sensitive variable: a so-called dth-order masking splits each
sensitive value into d + 1 shares. Theoretically, such a countermeasure can always
be broken by a so-called (d + 1)th-order side-channel analysis, where the adversary
is assumed to be able to observe the physical leakage related to the manipulation
of the d + 1 shares. However, in practice the difficulty of carrying out a higher-
order SCA increases exponentially with the order. As a consequence, the use of
a first- or second-order masking scheme is often sufficient to achieve practical
resistance.
When applying masking to protect a block cipher implementation, the most
critical parts to deal with are the non-linear functions, also called s-boxes. Among
the numerous methods that have been proposed in the literature, many have
been broken, which has raised the need for a formal analysis of the security
provided by such countermeasures. When the purpose is to thwart first-order
SCA only, a secure and efficient solution is to use pre-computed look-up tables
in RAM [6, 8]. When the countermeasure must also defeat second-order SCA,
there exists no solution which is at the same time secure and very efficient for
any kind of s-box. To the best of our knowledge only the schemes [4, 12, 14, 15]
have a formal proof of security. The schemes proposed in [4], [12] and [15] are
quite efficient but dedicated to the AES s-box only. In comparison, [14] is less
efficient but can be applied to protect any s-box implementation. In this paper,
we focus on the latter.
To guarantee the security of a cryptographic implementation against dth -order
SCA or to simply enable comparison between the resistance of several counter-
measures, it is nowadays a common approach to formally prove the security of a
scheme in a model as pertinent as possible. Two different models are generally
considered in the literature. We recall these models hereafter.
When the device writes a value Z into the memory, the first leakage model
assumes that the leakage L satisfies:

L = ϕ(Z) + B, (1)

with ϕ a (non-constant) function and B an independent Gaussian noise with
zero mean. Such a model is said to leak on the manipulated data bits only. For
example, the leakage function ϕ is often the Hamming weight (HW) function (or
an affine function of the HW). In that case, we usually speak of the Hamming
weight model. A more conservative choice in terms of security is to suppose that
ϕ might be the identity function, i.e. the leakage reveals the value of Z.

The second model assumes that the device leaks on the memory transitions
when a value Z is manipulated. In this situation the function ϕ depends on Z
but also on a second value Y corresponding to the initial state of the memory
before the writing of Z. More precisely, we have:

L = ϕ(Z ⊕ Y ) + B. (2)

In the particular case where ϕ is the HW function, the leakage L defined in (2)
corresponds to the so-called Hamming distance (HD) model.
Several works have demonstrated the validity of the HW and HD models in
practice, which are today commonly accepted by the SCA community. However,
other more precise models exist in the literature (see for instance [3, 9, 16]).
In the rest of this paper, we keep generality by considering two models: the
ODL model (Only manipulated Data Leak) and the MTL model (Memory
Transition Leak), defined by the leakage functions expressed in (1) and
(2) respectively.
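As a toy illustration of the difference between the two models (our sketch, with ϕ = HW; all names are ours), note how a memory transition between a mask and the corresponding masked value cancels the mask:

import random

def hw(v):
    return bin(v).count("1")

def leak_odl(z, sigma=0.5):
    return hw(z) + random.gauss(0, sigma)          # L = phi(Z) + B, model (1)

def leak_mtl(z, y, sigma=0.5):
    return hw(z ^ y) + random.gauss(0, sigma)      # L = phi(Z xor Y) + B, model (2)

x, m = 0xA5, random.getrandbits(8)
# noiseless MTL leakage of the transition M -> X xor M reveals HW(X) exactly:
assert leak_mtl(x ^ m, m, sigma=0) == hw(x)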

1.2 ODL Model vs. MTL Model

Except for very rare exceptions (e.g. [10]), security proofs in the literature are
usually conducted in the ODL model. This is in particular the case for the
countermeasures proposed in [14]. However, in practice, the leakage is better
modeled by the MTL model. Starting from this observation, a natural question
is to decide to what extent a countermeasure proved secure in the ODL model
stays secure in the MTL model. Very close to this question, an interesting and
practically relevant problem is the design of methods to transform an
implementation secure in the first model into a new implementation secure in
the second. If we assume that the memory transitions leak information, the
leakage is modeled by ϕ(Y ⊕ Z) + B. In such a model a masking countermeasure
may become ineffective. For instance, if Z corresponds to a masked variable
X ⊕ M and if Y equals the mask, then the leakage reveals information on X. A
straightforward idea to deal with this issue is to erase the memory before each
new writing (e.g. set Y to 0 in our example). One may note that such a
technique is often used in practice at either the hardware or software level.
Using such a method, the leakage ϕ(Y ⊕ Z) + B is replaced by the sequence of
consecutive leakages ϕ(Y ⊕ 0) + B1 and ϕ(0 ⊕ Z) + B2, which is equivalent to
ϕ(Y) + B1 and ϕ(Z) + B2. The single difference with the classical ODL model
is the additional assumption that the execution leaks the content of the memory
before the writings. Since this leakage corresponds to a variable that has been
manipulated prior to Z, it is reasonable to assume that the leakage ϕ(Y) + B1
has already been taken into account when establishing the security of the
countermeasure. As a consequence, this way of implementing a countermeasure
proved secure in the ODL model seems at first glance to also offer security on
a device leaking in the MTL model.
In this paper, we emphasize that a countermeasure proved secure in the ODL
model may no longer be secure in the MTL model. Indeed, we exhibit a case
where a countermeasure proved second-order resistant in the ODL model

no longer provides security against first-order SCA when implemented on a
device leaking on the memory transitions. Then, we show that the natural
method proposed above to transfer a countermeasure resistant in the ODL model
into a countermeasure resistant in the MTL model is flawed. These two results
highlight the current lack of a framework to solve the (practically) important
issue of porting an implementation from one family of devices to the other.

1.3 Paper Organization


This paper is organized as follows. In Section 2, we briefly recall a second-order
countermeasure proved secure in the ODL model [14]. In Section 3, we show that
this countermeasure can be broken by a first-order attack in the MTL model.
To thwart this attack, we apply in Section 4.1 the method described previously,
which erases the memory before each new writing, and we show that this method
does not provide an implementation secure in the second model. We provide the
results of a practical implementation of our attacks in Section 5. Finally, we
conclude in Section 6.

2 Securing Block Cipher against 2O-SCA


Most of the countermeasures published in the literature to thwart SCA are based
on the algebraic properties of the targeted algorithm (e.g. AES). However, when
the corresponding algorithm involves s-boxes with no particular algebraic
structure (e.g. those in the DES, PRESENT or FOX ciphers), only the methods
proposed in [14] achieve second-order security. In the following, we focus on
this latter case, where a random-like s-box must be implemented in a secure
way w.r.t. 2O-SCA. For this purpose, we consider the second variant proposed
in [14] (this choice can for instance be made because of its low RAM
consumption compared to the first variant).
Based on a secure primitive compareb, defined such that compareb(x, y) equals
b if x = y and b̄ otherwise (see [13, Appendix A] for more details), the authors
of [14] propose the algorithm below:

Algorithm 1. Computation of a 2O-masked s-box output from a 2O-masked input

Inputs: a masked value x̃ = x ⊕ t1 ⊕ t2 ∈ F2^n, the pair of input masks (t1, t2) ∈ F2^n × F2^n,
a pair of output masks (s1, s2) ∈ F2^m × F2^m, a (n, m) s-box function F
Output: the masked s-box output F(x) ⊕ s1 ⊕ s2 ∈ F2^m

1. b ← rand(1)
2. for a = 0 to 2^n − 1 do
3.    cmp ← compareb(t1 ⊕ a, t2)
4.    Rcmp ← (F(x̃ ⊕ a) ⊕ s1) ⊕ s2
5. return Rb

To compute F(x) ⊕ s1 ⊕ s2, the core idea of Algorithm 1 is to successively
read all values of the lookup table F, from index x̃ ⊕ a with a = 0 to index
x̃ ⊕ a with a = 2^n − 1. When the correct value F(x) ⊕ s1 ⊕ s2 is accessed, it
is stored in a pre-determined register Rb, whereas the other values
F(x̃ ⊕ a) ⊕ s1 ⊕ s2, with x̃ ⊕ a ≠ x, are stored in a garbage register Rb̄. In
practice two registers R0 and R1 are used and their roles are chosen thanks to
a random bit b.
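A minimal functional model of Algorithm 1 (our sketch; compare_b is modelled functionally, without the side-channel-hardened implementation of [13, Appendix A]):

import random

def compare_b(x, y, b):
    return b if x == y else 1 - b        # b if x = y, complement of b otherwise

def masked_sbox(F, n, x_tilde, t1, t2, s1, s2):
    """Return F(x) ^ s1 ^ s2 given x_tilde = x ^ t1 ^ t2 (F: table of length 2**n)."""
    b = random.getrandbits(1)
    R = [0, 0]                           # registers R0 and R1, roles chosen by b
    for a in range(2 ** n):
        cmp_ = compare_b(t1 ^ a, t2, b)
        R[cmp_] = F[x_tilde ^ a] ^ s1 ^ s2
    return R[b]                          # only the write at a = t1 ^ t2 lands here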
Depending on the loop index a, the fourth step of Algorithm 1 processes the
following operation:

cmp ← b;  Rcmp ← F(x) ⊕ s1 ⊕ s2          if a = t1 ⊕ t2,
cmp ← b̄;  Rcmp ← F(x̃ ⊕ a) ⊕ s1 ⊕ s2     otherwise.        (3)
In view of (3), it may be observed that the register Rb is modified only once,
whereas Rb̄ changes 2^n − 1 times. As proven in [14], this behavioral difference
between the registers Rb and Rb̄ cannot be successfully exploited by a
second-order attack when the device leaks in the ODL model. The proof can be
straightforwardly extended to any leakage model called linear, in which all bits
of the manipulated data leak independently. However, if Algorithm 1 is
implemented on a physical device with a different leakage model, then the
security proof of [14] can no longer be invoked. Hence, since the most common
alternative is the MTL model, it is particularly interesting to investigate
whether Algorithm 1 stays secure in this context. In the next section, we put
forward the kind of security issues raised by a straightforward implementation
of Algorithm 1 on a device leaking on the memory transitions. In particular,
for a specific (but quite natural) implementation, we exhibit a first-order SCA.

3 Attack of Algorithm 1 in the MTL Model

This section is organized as follows: first we present a straightforward
implementation of the 2O-SCA countermeasure described in Algorithm 1. Then we
expose how a first-order attack in the MTL model can break this second-order
countermeasure.
In the analysis developed in this paper, we denote random variables by
capital letters (e.g. X) and their values by small letters (e.g. x).

3.1 Straightforward Implementation of Algorithm 1

In the following, we assume that the considered device is based on an assembly
language for which a register RA is used as an accumulator. Moreover, we assume
that the registers RA, R0 and R1 are initially set to zero.
Based on these assumptions, the fourth step of Algorithm 1 can be
implemented in the following way:

4.1 RA ← x̃ ⊕ a
4.2 RA ← F(RA)
4.3 RA ← RA ⊕ s1                         (4)
4.4 RA ← RA ⊕ s2
4.5 Rcmp ← RA
74 J.-S. Coron et al.

During this processing, where X̃ = X ⊕ T1 ⊕ T2, the initial content of the
register Rcmp, denoted by Y, satisfies the following equation depending on the
values of the loop index a, T1 and T2:

Y = 0                                    if a = 0,
    0                                    if a = 1 and T1 ⊕ T2 = 0,
    0                                    if a > 0 and T1 ⊕ T2 = a,        (5)
    F(X̃ ⊕ (a − 2)) ⊕ S1 ⊕ S2             if a > 1 and T1 ⊕ T2 = (a − 1),
    F(X̃ ⊕ (a − 1)) ⊕ S1 ⊕ S2             otherwise.

In the following we will show that the distribution of the value Y defined in
(5) brings information on the sensitive variable X. We will consider two cases
depending on whether RA equals Rcmp or not.

3.2 Description of the First-Order Attack When RA = Rcmp


If we assume that the register Rcmp is the accumulator register RA, then
Step 4.5 of (4) is unnecessary and the register Rcmp leaks at each step; this
is in particular the case at Step 4.1.
In this part, we assume that the physical leakage of the device follows the
MTL model, and hence the leakage L associated to Step 4.1 of (4) satisfies:

L ∼ ϕ(Y ⊕ X̃ ⊕ a) + B , (6)

where Y denotes the initial state of Rcmp before being updated with X̃ ⊕ a,
defined above by (5).
From (5) and (6), we deduce:

L = ϕ(X̃) + B                                        if a = 0,
    ϕ(X ⊕ 1) + B                                     if a = 1 and T1 ⊕ T2 = 0,
    ϕ(X) + B                                         if a > 0 and T1 ⊕ T2 = a,
    ϕ(F(X̃ ⊕ (a − 2)) ⊕ S1 ⊕ S2 ⊕ X̃ ⊕ a) + B          if a > 1 and T1 ⊕ T2 = (a − 1),
    ϕ(F(X̃ ⊕ (a − 1)) ⊕ S1 ⊕ S2 ⊕ X̃ ⊕ a) + B          otherwise.

When a = 0, the leakage L involves only the uniformly distributed value X̃ and
thus brings no information on X. Therefore in the following we omit this
particular case. Hence, we have

L = ϕ(X) + B            if T1 ⊕ T2 = a,
    ϕ(X ⊕ 1) + B        if T1 ⊕ T2 = 0 and a = 1,        (7)
    ϕ(Z) + B            otherwise,

with Z a variable independent of X and with uniform distribution.


In view of (7), the leakage L depends on X. Indeed, the mean of (L|X = x)
satisfies:
E(L | X = x) = (1/2^n)·(ϕ(x) + ϕ(x ⊕ 1)) + ((2^n − 2)/2^n)·E(ϕ(Z))     if a = 1,
               (1/2^n)·ϕ(x) + ((2^n − 1)/2^n)·E(ϕ(Z))                  if a > 1,

or equivalently (since Z has uniform distribution, so that E(ϕ(Z)) = n/2 for ϕ = HW):

E(L | X = x) = (1/2^n)·(ϕ(x) + ϕ(x ⊕ 1)) + n·(2^n − 2)/2^(n+1)     if a = 1,        (8)
               (1/2^n)·ϕ(x) + n·(2^n − 1)/2^(n+1)                  if a > 1.
When a > 1, the mean in (8) is an affine function of ϕ(x); otherwise it is an
affine function of ϕ(x) + ϕ(x ⊕ 1). Therefore in both cases the mean leakage
reveals some information on X.
An adversary can thus target the second loop iteration in Algorithm 1 (i.e.
a = 1) and get a sample of observations for the leakage L defined as in (6). The
value X typically corresponds to the bitwise addition of a secret sub-key K and
a known plaintext subpart M. In such a case, and according to the statistical
dependence shown in (8), the set of observations can be used to perform a
first-order SCA allowing an attacker to recover the secret value K.
As an illustration, we simulated a first-order CPA in the Hamming weight
model without noise, targeting the second loop iteration (namely a = 1) with
the AES s-box. The secret key byte was recovered with a success rate of 99%
using 1.000.000 acquisitions.
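A simplified simulation of this CPA (our sketch, not the authors' code) can be written as follows; SBOX stands for the AES s-box table (not reproduced here), and the leakage is generated directly from (5)–(7) for a = 1:

import random
import numpy as np

def hw(v):
    return bin(v).count("1")

def leak_a1(k, m, sbox, sigma=0.0):
    t, s = random.randrange(256), random.randrange(256)   # T1^T2 and S1^S2
    xt = (k ^ m) ^ t                                      # masked input X~
    y = 0 if t in (0, 1) else sbox[xt] ^ s                # Y from (5) at a = 1
    return hw(y ^ xt ^ 1) + random.gauss(0, sigma)        # Step 4.1, MTL model

def first_order_cpa(sbox, k, n_traces):
    msgs = [random.randrange(256) for _ in range(n_traces)]
    L = np.array([leak_a1(k, m, sbox) for m in msgs])
    # prediction from (8) for a = 1: affine in HW(x) + HW(x ^ 1)
    corrs = []
    for kg in range(256):
        hyp = np.array([hw(m ^ kg) + hw(m ^ kg ^ 1) for m in msgs], float)
        corrs.append(abs(np.corrcoef(hyp, L)[0, 1]))
    return int(np.argmax(corrs))      # converges to k for large n_traces

Only the traces with T1 ⊕ T2 ∈ {0, 1} carry signal, which is why the simulated attack needs on the order of a million acquisitions.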

3.3 Description of the First-Order Attack When RA ≠ Rcmp
In this part, the accumulator register RA is assumed to be different from the
register Rcmp. Under such an assumption, Step 4.5 in (4) leaks the transition
between the initial content Y of Rcmp and the current content of RA. Namely,
denoting T1 ⊕ T2 and S1 ⊕ S2 by T and S respectively, we have:

L = ϕ(Y ⊕ F(X ⊕ T ⊕ a) ⊕ S) + B.        (9)
Due to (5), Relation (9) may be developed in the following way according to the
values of a and T:

L = ϕ(F(X ⊕ T) ⊕ S) + B                                  if a = 0,
    ϕ(F(X) ⊕ S) + B                                      if a = 1 and T = (a − 1),
    ϕ(F(X) ⊕ S) + B                                      if a > 0 and T = a,
    ϕ(Da⊕(a−2) F(X ⊕ (a − 2) ⊕ (a − 1))) + B             if a > 1 and T = (a − 1),
    ϕ(Da⊕(a−1) F(X ⊕ (a − 1) ⊕ T)) + B                   otherwise,        (10)

where Dy F denotes the derivative of F with respect to y ∈ F2^n, defined for
every x ∈ F2^n by Dy F(x) = F(x) ⊕ F(x ⊕ y).
In the first three cases of (10), the presence of S implies that the leakage L
is independent of X: in these cases the leakage is of the form ϕ(Z) + B where
Z is a uniform random variable independent of X. In the last two cases, S does
not appear anymore. As a consequence, it may be checked that the leakage L
depends on X. Indeed, by the law of total probability, for any x and a = 1,
the mean of (L|X = x) satisfies:
E(L | X = x) = 2μ/2^n + (1/2^n) · Σ_{t=2}^{2^n − 1} ϕ(Da F(x ⊕ t)),        (11)

where μ denotes the expectation E[ϕ(U)] with U uniform over F2^n (e.g. for
ϕ = HW we have μ = n/2). When a > 1, the mean of (L|X = x) satisfies:

E(L | X = x) = μ/2^n + (1/2^n) · ϕ(Da⊕(a−2) F(x ⊕ (a − 2) ⊕ (a − 1)))
             + (1/2^n) · Σ_{t=0, t∉{a−1, a}}^{2^n − 1} ϕ(Da⊕(a−1) F(x ⊕ (a − 1) ⊕ t)).        (12)

From an algebraic point of view, the sums in (11) and (12) may be viewed as the
mean of the values taken by Da F(x ⊕ t) (respectively Da⊕(a−1) F(x ⊕ (a − 1) ⊕ t))
over the coset x ⊕ {t, t ∈ [2, 2^n − 1]} (respectively
x ⊕ {t, t ∈ [0, 2^n − 1] \ {a − 1, a}}). Since those cosets are not all equal,
the means are likely to differ for some values of x. Let us for instance consider
the case of F equal to the AES s-box and assume that ϕ is the identity function.
In Relation (11), the sum equals 34066 if x = 1 and 34046 if x = 2. When a > 1,
we make a similar observation.
From (11) and (12), we deduce that the mean leakage reveals information on X
and thus the set of observations can be used to perform a first-order SCA.
By exhibiting several attacks in this section, we have shown that a second-
order countermeasure proved secure in the ODL model may be broken by a
first-order attack in the MTL model. These attacks demonstrate that particular
attention must be paid when implementing Algorithm 1 on a device leaking in
the MTL model. Otherwise, first-order leakages such as those exploited in the
attacks presented above may occur. As already mentioned in the introduction,
a natural solution to help the security designer deal with these security traps
could be to systematically erase the registers before any writing. This solution
is presented and discussed in the next section.

4 Study of a Straightforward Patch


In the following, we present a straightforward method to patch the flaw exhibited
in the previous section. The aim of this patch is to transform an implementation
secure in the ODL model into an implementation secure in the MTL model. It
essentially consists in erasing the memory before each new writing. In this
section, we evaluate this strategy when applied to Algorithm 1 on a device
leaking in the MTL model, and we show that this natural method does not suffice
to go from security in the ODL model to security in the MTL model. Indeed, we
present a second-order attack against the patched countermeasure.

4.1 Transformation of Algorithm 1 into an MTL-Resistant Scheme


As in the previous section, we assume that the leakage model is the MTL model
and that the registers Rb and Rb̄ are initially set to zero. In order to preserve
the security proof given in the first model, we apply the solution consisting in
erasing the memory before each new writing.
Conversion of Security Proofs from One Leakage Model to Another 77

Based on these assumptions, the fourth step of Algorithm 1 can be implemented
in the following way:

4.1 Rcmp ← 0
4.2 Rcmp ← F(x̃ ⊕ a) ⊕ s1 ⊕ s2        (13)
As previously, we assume that the initial state of Rcmp before Step 4.1 is equal
to Y. According to this decomposition, the register Rcmp is set to 0 before
the writing of Z = F(X̃ ⊕ a) ⊕ S1 ⊕ S2 in Step 4.2. Hence, the leakage
defined by (6) is replaced by the sequence of consecutive leakages ϕ(Y ⊕ 0) + B1
(Step 4.1) and ϕ(0 ⊕ Z) + B2 (Step 4.2), that is ϕ(Y) + B1 and ϕ(Z) + B2.
However this model is not equivalent to the ODL model since here the previous
value in Rcmp leaks whenever it is erased. As we show hereafter, such a leakage
enables a second-order attack breaking the countermeasure, although it is secure
in the ODL model.

4.2 Description of a Second-Order Attack


To perform our second-order attack, we use two leakages L1 and L2 obtained
during the same execution of Algorithm 1 implemented with (13).
The first leakage L1 corresponds to the manipulation of X̃ prior to Algorithm 1;
L1 thus satisfies:

L1 ∼ ϕ(X̃) + B0.        (14)

The second leakage L2 corresponds to Step 4.1 of (13). Thus it satisfies:

L2 ∼ ϕ(Y) + B1.        (15)
From (5) and (15), we deduce:

L2 = ϕ(0) + B1                               if a = 0,
     ϕ(0) + B1                               if a = 1 and T1 ⊕ T2 = 0,
     ϕ(0) + B1                               if a > 0 and T1 ⊕ T2 = a,
     ϕ(F(X̃ ⊕ (a − 2)) ⊕ S1 ⊕ S2) + B1        if a > 1 and T1 ⊕ T2 = (a − 1),
     ϕ(F(X̃ ⊕ (a − 1)) ⊕ S1 ⊕ S2) + B1        otherwise,        (16)
which implies that:

L2 = ϕ(0) + B1     if a = 0,
     ϕ(0) + B1     if a = 1 and T1 ⊕ T2 = 0 or 1,
     ϕ(Z) + B1     if a = 1 and T1 ⊕ T2 ≠ 0, 1,        (17)
     ϕ(0) + B1     if a > 1 and T1 ⊕ T2 = a,
     ϕ(Z) + B1     if a > 1 and T1 ⊕ T2 ≠ a,

where Z is a variable independent of X and with uniform distribution.
From (17), the leakage is independent of T1 ⊕ T2 when a = 0. For this
reason, in the following we only study the mean of L2 for a > 0:

E(L2) = ϕ(0)        if a = 1 and T1 ⊕ T2 = 0 or 1,
        E(ϕ(Z))     if a = 1 and T1 ⊕ T2 ≠ 0, 1,
        ϕ(0)        if a > 1 and T1 ⊕ T2 = a,
        E(ϕ(Z))     if a > 1 and T1 ⊕ T2 ≠ a,

or equivalently (since Z has uniform distribution, so that E(ϕ(Z)) = n/2 for ϕ = HW):

E(L2) = ϕ(0)     if a = 1 and T1 ⊕ T2 = 0 or 1,
        ϕ(0)     if a > 1 and T1 ⊕ T2 = a,        (18)
        n/2      otherwise.

On the other hand, the leakage L1 depends by definition on X ⊕ T1 ⊕ T2. As a
consequence, the pair (L1, L2) statistically depends on the sensitive value X.
Moreover, it can be seen in (18) that the leakage on T1 ⊕ T2 is maximal when
a = 1. An adversary can thus target the second loop iteration in Algorithm 1
(i.e. a = 1), make measurements for the pair of leakages (L1, L2) and then
perform a 2O-CPA to extract information on X from those measurements.

Fig. 1. Convergence with simulated curves without noise, for a = 1

We have simulated such a 2O-SCA with X = M ⊕ K, where M is an 8-bit
value known to the attacker and K an 8-bit secret key value. By combining L1
and L2 using the normalized multiplication and the optimal prediction function
defined in [11], the secret value k is recovered with a success rate of 99%
using fewer than 200.000 curves. Fig. 1 represents the convergence of the
maximal correlation value for different key guesses over the number of leakage
measurements. Each curve corresponds to a hypothesis on the secret K; the
black curve corresponds to the correct hypothesis k.
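For illustration (our sketch, not the authors' code), the combining and correlation steps can be written as follows. Here the centered product combining of [11], as used in Section 5, replaces the normalized multiplication and optimal prediction function, and the hypothesis is the plain Hamming weight HW(m ⊕ kg):

import numpy as np

def second_order_cpa(L1, L2, msgs, hw):
    """L1, L2: the two leakage samples (length N); msgs: known values m_i.
    Returns |correlation| per key guess on the combined leakage."""
    C = (L1 - L1.mean()) * (L2 - L2.mean())    # centered product combining
    Cc = C - C.mean()
    corrs = np.empty(256)
    for kg in range(256):
        hyp = np.array([hw(m ^ kg) for m in msgs], dtype=float)
        hyp -= hyp.mean()
        corrs[kg] = abs(hyp @ Cc) / np.sqrt((hyp ** 2).sum() * (Cc ** 2).sum())
    return corrs                               # peaks at the correct key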
The second-order attack presented in this section shows that erasing registers
before writing a new value does not suffice to port the security of an
implementation from the ODL model to the MTL model. For the case of
Algorithm 1, a possible patch is to erase Rcmp using a random value. However,
though this patch works

in the particular case of Algorithm 1, it does not provide a generic method to
transform a dth-order countermeasure secure in the ODL model into a dth-order
countermeasure secure in the MTL model. The design of such a generic method
is an interesting problem that we leave open for future research.

5 Experimental Results

This section provides a practical evaluation of the attacks presented above. We
have verified the attacks on block ciphers with two different kinds of s-boxes:
an 8-bit to 8-bit s-box (AES) and two 4-bit to 4-bit s-boxes (PRESENT and
Klein). We implemented Algorithm 1 as described in Section 4.1 on an 8-bit
microcontroller. Using 2O-CPA, we were able to find the secret key for all three
s-boxes. In the case of the 4 × 4 s-boxes, we needed fewer than 10.000 power
traces to find the correct key. However, for the 8 × 8 s-box, the number was
much higher: more than 150.000 traces were required to distinguish the correct
key from the rest of the key guesses.
Initially, we set the values in the two memory locations R0 and R1 to zero. We
randomly generate the plaintexts mi and the input/output masks ti,1, ti,2 and
si,1, si,2 using a uniform pseudo-random number generator, where i varies from
1 to N (the number of measurements). Then, we calculate x̃i from the correct
key k via x̃i = k ⊕ mi ⊕ ti,1 ⊕ ti,2. As described in Section 4.1, before writing
a new value to any memory location, we first erase its contents by writing 0,
and then write the new value as shown in (13). For verifying the attacks, we
only consider the power traces where a = 1. We measure the power consumption
of the device during the manipulation of x̃i and during the memory erasure,
respectively. This results in a sample of pairs of leakage points that are
combined using the centered product combining function defined in [11]. For
each key hypothesis kj, the obtained combined leakage sample (Li)1≤i≤N is

Fig. 2. Convergence with practical implementation of 2O-CPA for Klein

Fig. 3. Convergence with practical implementation of 2O-CPA for PRESENT

Fig. 4. Convergence with practical implementation of 2O-CPA for AES

Fig. 5. Convergence with practical implementation of 1O-CPA for PRESENT

correlated with the sample of hypotheses (HW(mi ⊕ kj))1≤i≤N. The key guess
for which the correlation coefficient is maximal is taken as the correct key.
Figure 2 and Figure 3 show the correlation traces for a 2O-CPA on the Klein
and PRESENT s-boxes, respectively. As can be observed, the right key is found
in both cases with fewer than 10.000 power traces. Figure 4 shows the correlation
traces for a 2O-CPA on the AES s-box. Here the convergence of the traces to
the correct key is observable only after 150.000 traces. Finally, Figure 5 shows
the first-order attack on the PRESENT s-box in the Hamming distance model
as described in Section 3.2. Here we implemented Algorithm 1 directly, without
the additional step of erasing the memory contents before performing a write
operation. The power traces are collected for 50.000 inputs, and only the traces
corresponding to the case a = 1 are considered. The correct key candidate can
be identified with fewer than 10.000 traces.

6 Conclusion and Perspectives

In this paper, we have shown that particular attention must be paid when
implementing a countermeasure proved secure in one model on devices leaking
in another one. In particular, we have shown that the second-order
countermeasure proposed in [14], together with its security proof in the ODL
model, is broken by a first-order SCA when running on a device leaking in the
MTL model. Then, we have focused on a method that looks at first glance very
natural to convert a scheme resistant in the ODL model into a new one secure
in the MTL model. Our analysis pointed out flaws in the conversion method and
hence led us to identify two new issues that we believe to be very promising
for further research. The first issue is the design of a generic countermeasure
proved secure in any practical model, and the second is the design of a method
for porting security from one model to another.

References
1. Blömer, J., Guajardo, J., Krummel, V.: Provably Secure Masking of AES. In:
Handschuh, H., Hasan, M.A. (eds.) SAC 2004. LNCS, vol. 3357, pp. 69–83.
Springer, Heidelberg (2004)
2. Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards Sound Approaches to
Counteract Power-Analysis Attacks. In: Wiener, M. (ed.) CRYPTO 1999. LNCS,
vol. 1666, pp. 398–412. Springer, Heidelberg (1999)
3. Doget, J., Prouff, E., Rivain, M., Standaert, F.: Univariate side channel attacks and
leakage modeling. In: Schindler, W., Huss, S. (eds.) Second International Workshop
on Constructive Side-Channel Analysis and Secure Design – COSADE 2011 (2011)
4. Genelle, L., Prouff, E., Quisquater, M.: Thwarting Higher-Order Side Channel
Analysis with Additive and Multiplicative Maskings. In: Preneel, B., Takagi, T.
(eds.) CHES 2011. LNCS, vol. 6917, pp. 240–255. Springer, Heidelberg (2011)
5. Goubin, L., Patarin, J.: DES and Differential Power Analysis – The Duplication
Method. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 158–172.
Springer, Heidelberg (1999)
6. Messerges, T.S.: Securing the AES Finalists Against Power Analysis Attacks. In:
Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 150–164. Springer, Heidelberg
(2001)
7. Oswald, E., Mangard, S., Pramstaller, N.: Secure and Efficient Masking of AES –
A Mission Impossible? Cryptology ePrint Archive, Report 2004/134 (2004)
8. Oswald, E., Schramm, K.: An Efficient Masking Scheme for AES Software Imple-
mentations. In: Song, J., Kwon, T., Yung, M. (eds.) WISA 2005. LNCS, vol. 3786,
pp. 292–305. Springer, Heidelberg (2006)
9. Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Power and Electromagnetic Anal-
ysis: Improved Model, Consequences and Comparisons. Integration 40(1), 52–60
(2007)
10. Prouff, E., Rivain, M.: A Generic Method for Secure SBox Implementation. In:
Kim, S., Yung, M., Lee, H.-W. (eds.) WISA 2007. LNCS, vol. 4867, pp. 227–244.
Springer, Heidelberg (2008)
11. Prouff, E., Rivain, M., Bévan, R.: Statistical Analysis of Second Order Differential
Power Analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)
12. Prouff, E., Roche, T.: Higher-Order Glitches Free Implementation of the AES Using
Secure Multi-party Computation Protocols. In: Preneel, B., Takagi, T. (eds.) CHES
2011. LNCS, vol. 6917, pp. 63–78. Springer, Heidelberg (2011)
13. Rivain, M., Dottax, E., Prouff, E.: Block Ciphers Implementations Provably Secure
Against Second Order Side Channel Analysis. Cryptology ePrint Archive, Report
2008/021 (2008), http://eprint.iacr.org/
14. Rivain, M., Dottax, E., Prouff, E.: Block Ciphers Implementations Provably Secure
Against Second Order Side Channel Analysis. In: Nyberg, K. (ed.) FSE 2008.
LNCS, vol. 5086, pp. 127–143. Springer, Heidelberg (2008)
15. Rivain, M., Prouff, E.: Provably Secure Higher-Order Masking of AES. In: Man-
gard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427.
Springer, Heidelberg (2010)
16. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Chan-
nel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659,
pp. 30–46. Springer, Heidelberg (2005)
17. Schramm, K., Paar, C.: Higher Order Masking of the AES. In: Pointcheval, D.
(ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 208–225. Springer, Heidelberg (2006)
Attacking Exponent Blinding
in RSA without CRT

Sven Bauer

Giesecke & Devrient GmbH,


Prinzregentenstrasse 159,
P.O. Box 80 07 29,
81607 Munich,
Germany
sven.bauer@gi-de.com

Abstract. A standard SPA protection for RSA implementations is
exponent blinding (see [7]). Fouque et al. [4] and, more recently, Schindler
and Itoh [8] have described side-channel attacks against such implemen-
tations. The attack in [4] requires that the attacker knows some bits of
the blinded exponent with certainty. The attack methods of [8] can be
defeated by choosing a sufficiently large blinding factor (about 64 bits).
In this paper we start from a more realistic model for the information
an attacker can obtain by simple power analysis (SPA) than the one that
forms the basis of the attack in [4]. We show how the methods of [4] can
be extended to work in this setting. This new attack works, under certain
restrictions, even for long blinding factors (i.e. 64 bits or more).

Keywords: SPA, RSA, exponent blinding.

1 Introduction
Consider a cryptographic device, e.g. a smart card, that calculates RSA signa-
tures. The device needs to be secured against side-channel attacks. Blinding
the secret exponent [7] is one standard countermeasure in this situation.
To sign a value x, the device generates a random number r and calculates the
signature as x^(d+rϕ(N)) mod N, where N is the RSA modulus and d is the secret
exponent. For each signature calculation, a fresh random number r is generated.
So an attacker who uses power analysis obtains exactly one power trace from
which he has to extract d + rϕ(N). Recovering the complete exponent from a
single trace is unrealistic for modern hardware, even if this hardware is not
perfectly SPA resistant.
The attack by Schindler and Itoh, [8], starts with power traces of several
signing processes. The attacker obtains power traces corresponding to blinded
exponents d + rj ϕ(N ), j = 0, . . . n − 1. The idea of Schindler and Itoh is to look
for power traces with the same blinding factor or, more generally, sets of power

This work has been supported by the German Bundesministerium für Bildung und
Forschung as part of the project RESIST with Förderkennzeichen 01IS10027E. Re-
sponsibility for the content of this publication lies with the author.


traces whose blinding factors add up to the same sum. The number of power
traces required to have enough of these “collisions” is given by the (generalised)
birthday paradox. The number of “collisions” becomes too small, or the number
of sums to evaluate too large, for larger blinding factors (64 bits, for example).
The attacker is allowed to make a limited number of errors, i.e. identify some
collisions incorrectly.
Fouque et al. [4] use a completely different attack method. In their approach
the attacker also observes a number of power traces with different blinding
factors. However, they assume that the attacker knows a few bits of each blinded
exponent with certainty. The attacker then uses an exhaustive search on the
unknown bits of each rj. Redundancy in the key material is used to calculate an
approximation to d + r̃j·ϕ(N), where r̃j is the attacker's guess for rj. The attacker
discards guesses which do not match the known bits of d + rj·ϕ(N). Having thus
obtained a number of blinding factors rj, the attacker guesses chunks of d and
ϕ(N) from the least significant bit upwards, again discarding guesses that do
not match known bits.
As pointed out in [8] the model in [4] is not very realistic. An SPA attacker
will always have noisy measurements. Single bits are never known with certainty.
For example, an SPA attacker who looks at a single power trace of a square-and-
multiply implementation can usually separate individual operations but can only
give a probability that any particular operation is a squaring or a multiplication.
In Sect. 2 we give a realistic model of the information an SPA attacker obtains
that captures this idea. In Sect. 3 we translate the attack of [4] into this setting.
We show how an attacker, given some information about each bit of a blinded
exponent d + rϕ(N ), can use redundancy in key material to correct observation
errors and obtain r. Repeating this several times, our attacker can then find a
number of blinding factors rj and then combine the information of several power
traces to determine d and ϕ(N ). The idea to correct errors in noisy observation
by exploiting redundancy in key material was inspired by cold boot attacks (see
[5], [6]).
For the remainder of this paper we assume that an SPA attacker measures
power traces on a cryptographic device while the device calculates the expo-
nentiation in an RSA signature generation. The exponentiation is implemented
as a square-and-multiply algorithm and is protected by exponent blinding as
described above. It is of course the attacker’s goal to find the secret exponent d.

2 A Statistical Model
Given these assumptions, the security of the cryptographic device depends on the
indistinguishability of squarings and multiplications. In practice, however, this is
rarely perfect. Usually, an attacker will be able to say that a particular operation
is a squaring with some likelihood. Note that, even if the attacker guesses 98%
of the exponent bits correctly, correcting the 2% erroneous guesses at unknown
positions in a 1024 bit exponent is unrealistic with exhaustive search.
We assume that for each (square or multiply) operation there is one point in
time for which the power consumption follows a normal distribution N(μS, σ)
for a squaring and N(μM, σ) for a multiplication.

Fig. 1. Histogram of current measurements for square and multiply operations with
mean, standard deviation and sample size. Normal distributions with corresponding
means and standard deviations are also shown.

This model has been justified by
measuring power traces on a smart card. An example is shown in Figure 1,
which shows the distribution of the current at a fixed point in time for 16500
squarings and multiplications. The samples were measured on a typical smart
card controller. The same code was used for both squarings and multiplications.
The larger |μS − μM|/σ, the easier it is for the attacker to distinguish squarings
from multiplications.
We suppose that the attacker knows μS , μM and σ. He can obtain these
values, for example, by studying the usually unprotected RSA signature verifi-
cation if the same square-and-multiply implementation is used for both signature
generation and verification.
Now the attacker can convert a power trace that captures m operations (each
a squaring or a multiplication) to a sequence of values cj , j = 0, . . . , m − 1,
where each cj gives the likelihood that the j-th operation is a squaring. Each cj
is drawn from a distribution with p.d.f.
g(t) = e^(−(t−μS)²/(2σ²)) / (e^(−(t−μS)²/(2σ²)) + e^(−(t−μM)²/(2σ²))),   t ∈ R.        (1)

Note that template matching (see [2]) gives information of the same kind: how
well a template for a squaring operation matches determines how likely the
attacker thinks this operation is to be a squaring.

This model is more subtle than the usual assumption that a certain fraction
of an attacker’s guesses is incorrect. Our model also captures that the attacker
knows which guesses are more likely to be correct than others.
Translations between our model and the usual assumption are easily done. For
a sample t from a power trace, an attacker will decide that the corresponding
operation is a squaring if |t − μS| < |t − μM|. The probability that this guess
is correct is Φ(|μS − μM|/(2σ)), where Φ is the c.d.f. of the standard normal
distribution. From a statistical table we see that if |μS − μM|/σ = 2, the attacker
guesses correctly about 84% of the time.
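
As a concrete illustration, the following minimal Python sketch (our own, not part of the original work; all names are illustrative) converts a measured sample t into the squaring likelihood of equation (1) and evaluates the hard-decision success probability Φ(|μS − μM|/(2σ)):

    import math

    def squaring_likelihood(t, mu_s, mu_m, sigma):
        # Posterior probability that the sampled operation is a squaring,
        # assuming equal priors and N(mu_s, sigma) / N(mu_m, sigma) leakage
        # as in equation (1).
        gs = math.exp(-(t - mu_s) ** 2 / (2 * sigma ** 2))
        gm = math.exp(-(t - mu_m) ** 2 / (2 * sigma ** 2))
        return gs / (gs + gm)

    def correct_guess_probability(mu_s, mu_m, sigma):
        # Probability that the hard decision |t - mu_s| < |t - mu_m| is right:
        # Phi(|mu_s - mu_m| / (2 sigma)), with Phi the standard normal c.d.f.
        delta = abs(mu_s - mu_m) / sigma
        return 0.5 * (1.0 + math.erf(delta / (2.0 * math.sqrt(2.0))))

    print(correct_guess_probability(0.0, 2.0, 1.0))  # Delta = 2  ->  ~0.84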

3 The Attack
As in [4] we assume that the public exponent e is small (about 16 bits).
This is no real restriction, since in practice e = 65537 seems to be used almost
exclusively.
Like the attack in [4], our attack consists of three steps. In a first step, the
attacker looks at a single power trace and builds a list of likely candidates for
the random blinding factor r. In a second step, this list is narrowed down to just
one value, so the attacker finds r for this power trace. The attacker repeats this
for a number n of power traces, obtaining the corresponding blinding factors rj ,
j = 0, . . . n − 1. Finally, the attacker puts the information together to construct
d and ϕ(N ).

3.1 Step 1: Find a List of Likely Candidates for the Blinding Factor
The attacker has recorded a single power trace of an RSA signature generation
with blinded exponent d+rϕ(N ) and converted it to a sequence cj , j = 0, . . . , m−
1 as in Sect. 2. So cj is the likelihood that operation j is a squaring.
Assume the random blinding factor r is ℓ bits long. Note that the ℓ most
significant bits of d + rϕ(N) depend only on r and the ℓ most significant bits
of ϕ(N) (up to a carry that might spill over, but is not very likely to propagate
very far). It is a well-known trick of the trade to approximate the high bits of
ϕ(N) by N; see, for example, [1], [4].
The attacker makes a guess r̃ for the i most significant bits of r, calculates the i
most significant bits of r̃N and derives the corresponding sequence v0 , . . . , vw−1
of squarings or multiplications, i.e. vj ∈ {S, M}. The attacker can judge the
quality of his guess by calculating

Q1(r̃) = Σ_{j=0}^{w−1} log qj ,   where qj = cj if vj = S and qj = 1 − cj if vj = M.   (2)

The higher Q1 (r̃), the more likely the guess r̃ is to be correct. This way, the
attacker can assemble a set of guesses for the most significant bits of r, discarding
those with a low value Q1 . Given this set, the attacker guesses additional lower
bits, again discarding those guesses which score too low under Q1 .
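
In code, this scoring step might look as follows; this is a hedged sketch (ours, not the paper's implementation), and the square-and-multiply convention in ops_from_bits is an assumption that must be matched to the actual device:

    import math

    def ops_from_bits(bits):
        # Left-to-right square-and-multiply: one S per bit after the leading 1,
        # plus one M per 1-bit (conventions may differ between devices).
        ops = []
        for b in bits[1:]:
            ops.append('S')
            if b == 1:
                ops.append('M')
        return ops

    def score(c, ops):
        # Q1 of equation (2); the guard avoids log(0) when some c_j is rounded
        # to exactly 0 or 1.
        q = 0.0
        for cj, v in zip(c, ops):
            qj = cj if v == 'S' else 1.0 - cj
            q += math.log(max(qj, 1e-300))
        return q

The same scoring function can be reused for Q2 and Q3 below, applied to the appropriate segment of the trace.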

3.2 Step 2: Use Redundancy in the Key to Find the Correct Blinding Factor

As a result of step 1, the attacker has a set C of guesses for r. The size of this
set is typically a few million. The next task is to find the one correct value of
r, which, he hopes, is contained in his set of guesses.
By definition of the secret exponent d there is an integer k, 0 < k < e such
that
ed − kϕ(N ) = 1. (3)
Approximating ϕ(N) by N again, we obtain an approximation for d:

d̃(k) = (1 + kN)/e.   (4)
The attacker now runs through all possible values of k and all guesses r̃ ∈ C and
calculates the upper half of d̃(k) + r̃N. This he views as an exponent and writes
down the corresponding sequence v0, . . . , vw−1 of squarings or multiplications,
i.e., vj ∈ {S, M}. In a way similar to Sect. 3.1 he can judge the likelihood of k
and r̃ being correct by calculating

Q2(r̃, k) = Σ_{j=0}^{w−1} log qj ,   where qj = cj if vj = S and qj = 1 − cj if vj = M.   (5)

The attacker expects that the pair (r̃, k) with the highest value Q2 (r̃, k) is the
correct blinding factor with the correct value of k in (3).
Note that here e, and hence k, is required to be small for the exhaustive search
on k to be feasible.
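
A sketch of this exhaustive search, reusing score() and ops_from_bits() from above (bits_of_upper_half() is a hypothetical helper that extracts the operation-relevant top bits of the blinded exponent):

    def step2(c, C, N, e):
        # Search over all small k and all surviving candidates r in C;
        # returns the best-scoring triple (Q2, r, k).
        best = None
        for k in range(1, e):
            d_approx = (1 + k * N) // e                     # equation (4)
            for r in C:
                u = d_approx + r * N                        # approximates d + r*phi(N)
                ops = ops_from_bits(bits_of_upper_half(u))  # hypothetical helper
                s = score(c, ops)
                if best is None or s > best[0]:
                    best = (s, r, k)
        return best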

3.3 Step 3: Find the Secret Exponent

The attacker repeats steps 1 and 2 for a number n of power traces. Note that
the exhaustive search on k in step 2 is only necessary once, because k has the
same value for all power traces.
As a result, the attacker has a set of power traces of exponentiations with
exponent d + rj ϕ(N ), j = 0, . . . , n − 1 and knows all blinding factors rj . Recall
that he also knows the high bits of d (from (3)) and of ϕ(N ) (because ϕ(N ) can
be approximated by N ). For the attacker it remains to find the lower half of d.
To do this, the attacker guesses chunks of d and ϕ(N) from the least significant
bit upwards. For a guess d̃, ϕ̃ of the w least significant bits of d, ϕ(N),
respectively, he calculates uj = d̃ + rj ϕ̃, j = 0, . . . , n − 1 and converts the w
least significant bits of each uj to a sequence of squarings and multiplications
vj,0, . . . , vj,mj−1, vj,i ∈ {S, M}. He then calculates

Q3(d̃, ϕ̃) = Σ_{j=0}^{n−1} Σ_{i=0}^{mj−1} log qj,i ,   where qj,i = cj,i if vj,i = S and qj,i = 1 − cj,i if vj,i = M.   (6)

As in step 1 the attacker has to keep a set of likely choices for lower bits of d
and ϕ(N ) while working his way upwards from the least significant bit to the
middle of d and ϕ(N ). When he has gone through all unknown lower bits of d
and ϕ(N ) he can then use the known high bits of d and ϕ(N ) to discard wrong
guesses.
The final test is, of course, the signing of a value.
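
In outline (again our sketch, under the same assumptions; ops_from_low_bits() is a hypothetical helper that maps the w low exponent bits to the tail of the operation sequence, and each trace c must be restricted to the corresponding segment), step 3 is a beam search over the low bits of d and ϕ(N):

    def step3(traces, r_list, width, beam_size):
        # Candidates are triples (score, low bits of d, low bits of phi(N)).
        # Extend one bit at a time, score against all n traces with
        # equation (6), and keep only the beam_size best candidates.
        cands = [(0.0, 0, 0)]
        for w in range(1, width + 1):
            nxt = []
            for _, d_lo, p_lo in cands:
                for bd in (0, 1):
                    for bp in (0, 1):
                        dg = d_lo | (bd << (w - 1))
                        pg = p_lo | (bp << (w - 1))
                        s = sum(score(c, ops_from_low_bits(dg + r * pg, w))
                                for c, r in zip(traces, r_list))
                        nxt.append((s, dg, pg))
            nxt.sort(reverse=True)
            cands = nxt[:beam_size]
        return cands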

4 Discussion

Step 1 is the most critical part of the attack. If the correct value of r does not
survive in the set of best choices then step 1 of the attack fails. Note that this
can be detected in step 2. A wrong value for r will have a very low score under
Q2 . So the attacker can simply discard these power traces.
We have implemented the attack on a PC. Simulation results suggest that if
r is 32 bits in size, Δ = |μS − μM|/σ = 1.8, and up to 2^20 candidate values for r
are kept in step 1, then the attack finds r and k in 85% of the cases, within
about a day of computation time on a modern PC. Once k is known, running
steps 1 and 2 for further power traces takes less than a minute. For Δ = 2.4 the
success rate increases to 99%. If Δ = 1.8 the attacker guesses 18.4% of the square-
or-multiply operations incorrectly; the value Δ = 2.4 corresponds to an error
rate of 11.5%.
If r is 64 bits in size and Δ = 2.4, the attack finds r and k in 50% of the cases,
within a day, if 2^20 candidate values are kept in step 1. Note that a blinding
factor of this size is sufficient to protect against the attack in [8]. However, the
attack in [8] is more generic and also applies to RSA implementations based on
the Chinese Remainder Theorem or point multiplication on elliptic curves.
We have explained the attack in the context of a square-and-multiply im-
plementation. It is easily extended to m-ary or sliding window methods. The
attack can also be applied to square-and-always-multiply implementations if the
attacker can distinguish real multiplications from fake multiplications with suf-
ficiently high probability.
The input data for the attack is a string of probabilities for a particular oper-
ation to be a squaring. There are many ways to obtain this string of probabilities
from a power trace. For simplicity we suggested that this could be done directly
by choosing a particular point in time within an operation. The attacker can
also apply template matching (see [2]) or correlations within a power trace (see
[3] and [9]). The method by which the probabilities are derived from the power
trace has a significant influence on the size of Δ and hence on the efficiency of
the attack (the larger Δ, the more efficient the attack).
The most obvious countermeasure is to increase the size of the blinding factor
r. Increasing r makes the blinded exponent d+rϕ(N ) longer and degrades perfor-
mance. It would be interesting to have a formula that expresses the time/memory
complexity of step 1 of the attack in terms of Δ and the size of r.

5 Conclusion

We presented a novel SPA attack against RSA signature generation protected by
exponent blinding. The attack is more realistic and can handle larger blinding
factors than previous attacks.
Last but not least we would like to thank Hermann Drexler and Jürgen Pulkus
for fruitful discussions.

References
1. Boneh, D.: Twenty Years of Attacks on the RSA Cryptosystem. Notices of the
AMS 46, 203–213 (1999)
2. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç, Ç.K.,
Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003)
3. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Horizontal Cor-
relation Analysis on Exponentiation. Cryptology ePrint Archive, Report 2010/394
(2010), http://eprint.iacr.org/2010/394
4. Fouque, P.-A., Kunz-Jacques, S., Martinet, G., Muller, F., Valette, F.: Power Attack
on Small RSA Public Exponent. In: Goubin, L., Matsui, M. (eds.) CHES 2006.
LNCS, vol. 4249, pp. 339–353. Springer, Heidelberg (2006)
5. Halderman, J.A., Schoen, S.D., Heninger, N., Clarkson, W., Paul, W., Calandrino,
J.A., Feldman, A.J., Appelbaum, J., Felten, E.W.: Lest We Remember: Cold Boot
Attacks on Encryption Keys. In: 2008 USENIX Security Symposium (2008),
http://www.usenix.org/events/sec08/tech/full_papers/halderman/halderman.pdf
6. Heninger, N., Shacham, H.: Reconstructing RSA Private Keys from Random Key
Bits. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 1–17. Springer,
Heidelberg (2009)
7. Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and
Other Systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113.
Springer, Heidelberg (1996)
8. Schindler, W., Itoh, K.: Exponent Blinding Does Not Always Lift (Partial) SPA
Resistance to Higher-Level Security. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011.
LNCS, vol. 6715, pp. 73–90. Springer, Heidelberg (2011)
9. Walter, C.D.: Sliding Windows Succumbs to Big Mac Attack. In: Koç, Ç.K., Nac-
cache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 286–299. Springer,
Heidelberg (2001)
A New Scan Attack on RSA
in Presence of Industrial Countermeasures

Jean Da Rolt¹, Amitabh Das², Giorgio Di Natale¹, Marie-Lise Flottes¹,
Bruno Rouzeyre¹, and Ingrid Verbauwhede²

¹ LIRMM (Université Montpellier II / CNRS UMR 5506), Montpellier, France
{darolt,dinatale,flottes,rouzeyre}@lirmm.fr
² Katholieke Universiteit Leuven, ESAT/COSIC, Leuven, Belgium
{amitabh.das,ingrid.verbauwhede}@esat.kuleuven.be

Abstract. This paper proposes a new scan-based side-channel attack on RSA
public-key cryptographic implementations in the presence of advanced Design
for Testability (DfT) techniques. The attack is performed on an actual hardware
implementation, for which different test scenarios were conceived (response
compaction, X-Masking). The practical aspects of scan-based attacks on the
RSA cryptosystem are also presented. Additionally, a novel scan-attack security
analysis tool is proposed which helps in evaluating the scan-chain leakage
resilience of security circuits.

Keywords: Scan-attacks, public-key cryptography, DfT methods.

1 Introduction

Security is a critical component of information technology and communication,
and one of the levers of its development, because it is the basis for establishing
the confidence of end users. Among the security threats, the vulnerability of
electronic equipment that implements cryptography, and thereby enables the
necessary services of confidentiality, identification and authentication, is perhaps
the most important. Fraudulent access, or "attacks", on the equipment to extract
sensitive information such as encryption keys undermines the whole chain of
secure transmission of information. One of these attacks exploits the scan-chain
Design for Test (DfT) infrastructure
inserted for testing the equipment. Testing acts like a double-edged sword. On one
hand, it is very important to test a cryptographic circuit thoroughly to ensure its
correct operation, and on the other hand, this test infrastructure may be exploited by
an attacker to extract secret information.
There have been many scan-attacks on cryptographic circuits proposed in the
literature [1][2], which focus on extracting the stored secret key. Once the secret key is
retrieved, more confidential data may be stolen. These attacks rely on the
observability of intermediate states of the cipher. Even if the cryptographic algorithms
are proven to be secure, accessing their intermediate registers compromises their
strength. The process to mount a scan-attack is as follows: first, the cipher plaintext


input is set to a chosen value, then the circuit is reset, followed by its execution in
normal mode for some cycles, and finally the circuit is switched to test mode and the
scan contents are shifted out. By repeating this multiple times with different chosen
plaintexts, the scan contents may be analyzed to find the secret key. In the case of
scan-attacks the basic requirement is that the cipher operation may be stopped at any
moment, and the contents of the intermediate registers can be scanned out, thus
compromising the hardware implementation of the cryptographic algorithm. A
common technique adopted by many smart-card providers is to disable the test
circuitry (such as JTAG) after manufacturing test. This solution may not be
acceptable for systems which require test and debug facilities in-the-field. High
quality test is only ensured by full controllability and observability of the secure
circuit, which may compromise security. Another alternative is BIST, which is
intrinsically more secure. However, not all circuits are suited for BIST (e.g.,
microprocessors), and BIST provides just a pass/fail signature, which is not useful for
diagnosis. Many countermeasures have been proposed in the literature [3][4];
however, each of them has its limitations and there is no foolproof mechanism to
deal with this leakage through the scan chains.
One of the attacks proposed in the literature concerns the RSA algorithm [5].
However, it supposes that the design has a single scan chain. Unfortunately, this
assumption is not realistic, since more complex DfT methods are required for meeting
the design requirements and reducing the test cost. Techniques such as multiple scan
chains, pattern decompression [6], response compaction [7] and filters to increase the
tolerance to unknowns [8] are commonly inserted in the test infrastructure. These
structures are supposed to behave as countermeasures against scan attacks, due to the
apparent reduction of the observability of internal states, as proposed in [9].
In this paper we propose a new attack on RSA that works even in the presence of
advanced DfT methods. We describe all the issues in carrying out the attack and how
to overcome them. Additionally, we prove its feasibility by actually performing the
attack on an RSA design. Moreover, the attack may be applied without knowledge of
the DfT structures, which makes it more realistic.
The outline of the paper is as follows. In Section 2, we present the previous work
performed in the field of scan-attacks on symmetric and public-key ciphers and some
proposed countermeasures. The RSA scan-attack itself is described in Section 3. Then,
in Section 4, we describe how we deal with the practical aspects of performing the
attack. Section 5 presents the attack tool. The experimental results, containing a
discussion of the applicability of the scan attack in the presence of industrial DfT
methods and known scan-attack countermeasures, are presented in Section 6. A
comparison with the previous RSA scan-attack is given in Section 7. Finally, we
conclude the paper with plans for future work in Section 8.

2 Previous Work

The first scan attack proposed in the literature [1] was conceived to break a Data
Encryption Standard (DES) cipher. Karri et al. described a two phase procedure which

consists in first finding the position of the intermediary registers on the scan chain, and
then retrieving the DES first round key by applying only 3 chosen plaintexts. Later the
same authors proposed [2] an attack on the Advanced Encryption Standard (AES). This
one was based on the differential method, which analyses the differences of scan
contents instead of the direct value itself. By using this method, the preliminary step of
identifying the position of the intermediary registers is no longer required. Advances
were also made on proving that public-key implementations are susceptible to scan
attacks. RSA and Elliptic Curve Cryptography (ECC) keys are retrieved by methods
described in [5] and [10] respectively. Besides, some scan-attacks were also proposed
for stream ciphers, for example [11].
The binary exponentiation algorithm is used as the target algorithm for the RSA scan-
attack in [5], while the Montgomery Powering Ladder is used for the ECC attack in
[10]. Both the attack methods are based on observing the values of the intermediate
register of interest on the scan chain for each bit of the secret key (decryption exponent
for RSA, and scalar multiplier for ECC), and then correlating this value with a previous
offline calculation, which the authors refer to as ‘discriminator’. If the value matches
with this discriminator value, a corresponding decision is taken on the key bit.
In order to secure the test structures, several countermeasures have been proposed.
They may be classified in three different groups: (1) methods to control the access to
the test facilities through the use of secure test wrappers [12]; (2) methods to detect
unauthorized scan operations [13] as probing and other invasive attacks; (3) methods
that provide confusion of the stream shifted out from the scan outputs [14].
Additionally, it was suggested in [6] that advanced industrial DfT methods such as
response compression are enough to impede any attack. However, advanced attacks
[15][16] have been conceived to deal with those methods.

3 Principles of the RSA Attack

3.1 RSA
The Rivest-Shamir-Adleman (RSA) algorithm is a widely used public-key
cryptographic algorithm, employed in a wide range of key-exchange protocols, such
as the popular Diffie-Hellman scheme. A brief description of the RSA algorithm is
presented below:

Algorithm 1: RSA Key generation


● Random primes p and q
● N = p*q (1024 bits)
● e = random integer co-prime to φ(N) = (p−1)*(q−1)
● d = e^(−1) mod φ(N)

Algorithm 2: RSA Encryption & Decryption

● Ciphertext c = m^e mod N
● Decrypted plaintext m = c^d mod N
Both the above operations are large number modular exponentiations.

When RSA is implemented in hardware, there are various possible options and
many algorithms are available. The Montgomery exponentiation method is most often
used, owing to its efficient hardware implementation: it does away with the
expensive division operation required for the modular multiplications involved in an
exponentiation. Hence we choose the Montgomery method as the target for our scan-
chain attack.
The Montgomery product of two n-bit numbers A and B is denoted by

A ⋆ B = A · B · R^(−1) mod N,

where '·' denotes a modular multiplication, N is the modulus or prime number in the
modular multiplications, and R = 2^n, with n being the number of bits of the RSA
algorithm used. In this case study, we are using 1024-bit RSA.
The algorithm for a Montgomery Exponentiation used in RSA can be presented as
follows [17]:
Algorithm 3: Montgomery exponentiation
INPUT: prime m = (m_{l−1} … m_0)_b, R = b^l, exponent e = (e_t … e_0)_2 with e_t = 1, and
an integer x, 1 ≤ x < m (l is the number of bits in the prime number, 1024 in our case;
b is the base, which is 2 for binary).
OUTPUT: x^e mod m.
1. xtilde ← Mont(x, R^2 mod m), A ← R mod m. (R mod m and R^2 mod m may
   be provided as inputs.)
2. For i from t down to 0 do the following:
   (a) A ← Mont(A, A).
   (b) If e_i = 1, then A ← Mont(A, xtilde).
3. A ← Mont(A, 1).
4. Return (A).
Mont(A, A) is known as the squaring (S) operation, while Mont(A, xtilde) is
known as the multiplication (M) operation of the Montgomery exponentiation. The
square and multiply operations are modular multiplications implemented
using the Montgomery multiplication algorithm [17]. Each iteration of the loop within
the algorithm consists either of a squaring and a multiplication if the key bit is 1,
or of only a squaring if the key bit is 0.
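
A functional Python model of Algorithm 3 can be useful for checking offline predictions. The sketch below is ours and is deliberately simplified: the Montgomery product is emulated as A·B·R^(−1) mod m with big-integer arithmetic rather than the word-level reduction a real datapath would use, and it is in no way constant-time:

    def mont_exp(x, e_bits, m, l):
        # e_bits is the exponent MSB-first with leading bit 1, m an odd
        # modulus, l its bit length, so R = 2^l and gcd(R, m) = 1.
        R = 1 << l
        Rinv = pow(R, -1, m)                 # modular inverse (Python 3.8+)
        mont = lambda a, b: (a * b * Rinv) % m
        xt = mont(x, (R * R) % m)            # xtilde <- Mont(x, R^2 mod m)
        A = R % m                            # A <- R mod m
        for ei in e_bits:
            A = mont(A, A)                   # squaring (S), always executed
            if ei == 1:
                A = mont(A, xt)              # multiplication (M), key-dependent
        return mont(A, 1)

    assert mont_exp(7, [1, 0, 1], 11, 8) == pow(7, 5, 11)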
In our proposed scan-based attack, we are focusing on the intermediary register (A,
in the algorithm above) which stores the value after each Montgomery multiplication.
Irrespective of how the RSA modular exponentiation is implemented, the intermediate
value will always be stored in a register. For instance, we may have a
hardware/software co-design for the RSA crypto-processor, where the Montgomery
multiplier is implemented as a co-processor in hardware (for efficiency) and the
control logic or the algorithm for the Montgomery exponentiation implemented in
software on a microcontroller. In this case, the results of the intermediate
Montgomery operations may be stored in an external RAM, but this value needs to be
transferred and stored in the registers inside the Montgomery multiplier datapath to
allow the module to perform the computations correctly.

3.2 Target RSA Hardware Implementation


We have built a hierarchical 1024-bit RSA hardware implementation (employing
the Montgomery exponentiation algorithm), which is the target of our proposed scan-
attack. It consists of an adder/subtractor arithmetic module, a Montgomery multiplier
block, and an RSA controller datapath for controlling the square and multiply
operations involved in the exponentiation. This is shown in the block diagram below.
The Gezel hardware/software co-design environment [18] was used to create the
design; it was transformed into VHDL using Gezel's fdlvhd converter tool, and
finally Synopsys Design Compiler v2009.06 was used to convert the VHDL file into
a gate-level netlist. Our implementation does not consider protection against Simple
Power Analysis (SPA), Differential Power Analysis (DPA) or Fault Attacks, but test
compression techniques supposedly acting as scan-attack countermeasures have been
included.

3.3 Assumptions of Scan Attacks

The leakage analysis as well as the attack methods implemented by our tool
(presented in Section 5) rest on the following assumptions:

─ the cipher algorithm is known, as well as the timing diagrams. The designer in
charge of checking scan-attack immunity has this information;
─ the scan chain structure is not known by the attacker. The scan length, the
number of internal chains and the order of the scan flip-flops are also supposed to
be hidden, although the input/output test pins (the interface) are controllable;
─ it is possible to control the test enable pin and thus switch from mission mode to
test mode, which allows the cipher operation to be "stopped" at any moment;
─ it is possible to control the input plaintexts (e.g., a design primary input) and to
observe the values related to the intermediate states by means of scan-out.

It is important to notice that all these assumptions are shared among all the scan
attacks proposed in the literature. Additionally, these assumptions are fulfilled by
the majority of test scenarios, due to the fact that high testability is achieved by
controlling and observing a huge number of design internal nodes.

3.4 Attack Basics: The Differential Mode

One of the main advantages of the attack proposed in this paper over previous
RSA attacks is the fact that it works in the presence of industrial DfT structures. For
that purpose, the differential mode [2], [16] is used to deal with the linear response
compactors inserted by the majority of DfT tools. Without compaction, the
values stored in the SFFs are directly observable at the test output while they are
shifted out. In the presence of compaction, on the other hand, each bit at the test
output depends on multiple SFFs. In the case of parity compactors, each output bit is
the XOR of the scan flip-flops on the same "slice", which means that the
actual value stored in one SFF is not directly observable. Instead, if it differs from the

value expected, the parity of the whole slice also differs, and so faults may be
detected. This difference may also be exploited by an attacker.
Fig. 1.a shows a crypto block, its cipher plaintext, and the intermediate register
which is usually the target of the scan attack. The rest of the circuit is omitted for
didactic reasons. The differential mode consists of applying pairs of plaintexts, in this
example denoted by (M0, M1). The circuit is first reset and the message M0 is loaded.
Then, after N clock cycles, the circuit is halted and the intermediate register I0 is
shifted out. The same procedure is repeated for the message M1, for which I1 is
obtained. Let us suppose that I0 differs from I1 in 6 bit positions as shown in Fig. 1.a,
where a bit flip is represented by a darker box. Let us also suppose that the
intermediate register contains only 16 bits and that bits 0, 8, 10, 13, 14, and 15 are
flipping. The parity of the differences is equal to 0, since there is an even number of
bit flips.

Fig. 1. a. Design with crypto block. b. Example of DfT scheme.

In Fig. 1.b, the flip-flops of the intermediary register are inserted as an example of
a DfT scenario with response compaction. In this case there are four scan chains divided
into four slices. RX represents the test output corresponding to slice X. As may be
seen, if only bit 0 flips in the first slice (an odd number), this difference is reflected
in a flip of R1. In slice 2, no bits flip and thus R2 remains the same. Two flips occur in
slice 3: bits 8 and 10. In this case the two flips mask each other, so an even number of
flips results in 0 flips at the output R3. In slice 4, 3 bit flips are sensed as a bit flip in R4.
The parity of flips in the intermediate register is equal to the parity of flips at the
output of the response compactor. This comes from a basic property of this kind of
response compactor: the parity of differences measured at the test output is equal to
the parity of differences in the intermediate register.
This property is valid for any possible configuration of the scan chains (number of
chains versus slices). Additionally, it is also valid for compactors with multiple outputs;
in this case, the difference measured should consider all compactor outputs. Thus,
using the differential mode, the attacker observes differences in the intermediate
register and then retrieves the secret key. Complex scenarios involving the other FFs
of the circuit are shown in Section 4.
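
The parity property is easy to check in simulation. The following self-contained snippet (our illustration) XOR-compacts two random 16-bit scan dumps into slices of four flip-flops and verifies that the parity of bit flips is preserved:

    import random

    def xor_compact(bits, per_slice):
        # One output bit per slice: the XOR (parity) of the SFFs in that slice.
        return [sum(sl) % 2 for sl in zip(*[iter(bits)] * per_slice)]

    random.seed(1)
    a = [random.randint(0, 1) for _ in range(16)]
    b = [random.randint(0, 1) for _ in range(16)]
    flips_in = sum(x ^ y for x, y in zip(a, b)) % 2
    flips_out = sum(x ^ y for x, y in
                    zip(xor_compact(a, 4), xor_compact(b, 4))) % 2
    assert flips_in == flips_out  # holds for any slice configuration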

3.5 Description of the Attack



As presented in sub-section 3.1, the Montgomery exponentiation consists of repeating
the Montgomery multiplication operation several times. The first multiplication in
the main loop, i.e., the squaring of A, is always performed, independently of the value
of the secret key bit. The second multiplication, A times xtilde, is performed only if
the decryption key bit is 1. The main idea of the attack proposed here is to check
whether the second operation is executed or not, by observing the value of A
afterwards. If it is, then the key bit is 1; otherwise it is 0. This procedure is repeated
for the whole key (1024 or 2048 bits).
In order to detect whether the second multiplication was executed, the attacker
must scan out the value of A after each loop iteration (timing issues are detailed in
Section 4). Additionally, as explained in the previous sub-section, a pair of plaintexts
is used to overcome the obscurity introduced by the response compactor. This pair
must be properly chosen so that a difference in the parity of A leads to the
decryption bit. For that, it is important to give a pair of specific message inputs to the
algorithm. The process to derive these 'good' pairs of messages is as follows:

Fig. 2. Hypothesis Decision

First, a pair of random 1024-bit messages is generated using a software pseudo-
random number generator. We denote them here as (M0, M1). Then, the
corresponding output responses (after one iteration of the exponentiation algorithm)
are computed for each of these messages, assuming the key bit to be both '0' and '1'.
Let (R00, R01, R10, R11) be the responses for messages M0 and M1 for key bits '0'
and '1' respectively, and let Parity(R00), Parity(R01), Parity(R10) and Parity(R11) be
the corresponding parities of these responses. Let P0 be equal to Parity(R00) XOR
Parity(R10) and P1 be equal to Parity(R01) XOR Parity(R11). If P0 != P1, then the
messages are taken to be useful; otherwise they are rejected and the process is
repeated until a pair of 'good' messages is obtained.
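
A sketch of this selection loop (ours; rsa_step(m, key_bit), standing in for one loop iteration of Algorithm 3 under a key-bit hypothesis, and the 1024-bit message size are assumptions for illustration):

    import random

    def parity(x):
        return bin(x).count('1') % 2

    def find_good_pair(rsa_step, bits=1024):
        while True:
            m0 = random.getrandbits(bits)
            m1 = random.getrandbits(bits)
            p0 = parity(rsa_step(m0, 0)) ^ parity(rsa_step(m1, 0))  # P0
            p1 = parity(rsa_step(m0, 1)) ^ parity(rsa_step(m1, 1))  # P1
            if p0 != p1:          # the two key-bit hypotheses are separable
                return m0, m1, p0, p1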
After a good pair of messages is found, it may be applied to the actual circuit. For
both elements of the pair, the application is executed in mission mode for the number of
clock cycles corresponding to the targeted step (decryption key bit). For both
elements, the scan contents are then shifted out and the parity of the difference in the
test output bitstream is measured. If the parity of differences is equal to P0, then the

hypothesis 0 is correct and the secret key bit is 0. If it is equal to P1, then the secret
key bit is 1. This procedure is repeated for all the bits of the decryption key.

4 Practical Aspects of the Attack

Performing scan attacks on actual designs requires additional procedures which have
not been taken into consideration by some previous attacks proposed in the literature.
The two main practical issues consist of (1) dealing with the other flip-flops of the
design; (2) finding out the exact time to halt the mission mode execution and to shift
out the internal contents. The first issue is solved by analyzing the leakage of the FFs
of the intermediate register at the test output (described in sub-section 4.1). The
second issue is described in sub-section 4.2.

4.1 Leakage Analysis

The scenario of Fig. 1 is commonly assumed by scan attacks; in real designs,
however, other FFs will be included in the scan chain. These additional FFs may
complicate the attack if no workaround is taken into account.
Fig. 3.a shows a design containing three types of FF. We define three types of
scan flip-flops (SFFs), depending on the value they store. T1 SFFs correspond to the
other IPs in the design, which store data not dependent on the secret. T2 SFFs belong
to the registers directly related to the intermediate register, which store information
related to the secret key and are usually targeted by attackers (e.g., the AES round
register). T3 SFFs store data related to the cipher but not to the intermediate registers
themselves (such as input/output buffers or other cipher registers). The leakage, if it
exists, concerns the T2 type.
The goal of the leakage analysis is to find out whether a particular bit of the
intermediate register (T2) can be observed at the test output, and to locate which
output bit is related to it. The analysis is thus focused on one bit at a time, looking for
an eventual bit flip in T2. In order to do that, the pair (M0, M1) is chosen so that the
value of T2N for M0 differs by a single bit from the value of T2N for M1, where
T2N denotes the value stored in T2 after N clock cycles while the design is running
in mission mode from the plaintext M0 (the first event in mission mode is a reset). In
Fig. 3.a the darker blocks represent a bit that flips; in this case, the least significant
bit of T2N flips. Since the attack tries to verify whether it is possible to observe a flip
in the LSB of T2N, it is ideal that there is no flip in T1N. To reduce the effect of the
T1 flip-flops, all the inputs that are not related to the cipher plaintext are kept
constant, which means that T1N for M0 has the same value as T1N for M1. However,
the same method cannot be applied to reduce the effects of T3: since we suppose that
the logic associated with T3 is unknown and since its inputs are changing, the value
of T3N for M0 may differ from that for M1. In our example, let three bits of T3 flip.

Fig. 3. a. Design illustrating the categories of FFs. b. DfT scheme.

Fig. 3.b shows the result of these bit flips in the scan chain and consequently at
the test outputs. For didactic reasons, we suppose that the DfT insertion created 4 scan
chains, and placed a pattern decompressor at the input and a response compressor
with two outputs (R and L). As may be seen, slice 1 contains only T1 scan flip-flops,
meaning that after the response compressor the values of R1 and L1 are not
supposed to flip (because T1N has the same value for M0 and M1). The same happens
for slice 2. Slice 3 contains the only flipping bit of T2N, and the other flip-flops in
the slice do not change. In this case, the flip of the first bit of T2N is observable
in R3. This means that an attacker could exploit the information contained in R3 to find
the secret key. Hence, this is considered a security leakage and may be exploited by
the attack described in Section 3.
Slices 4 and 5 contain flip-flop configurations that may complicate an attack.
For instance, in slice 4 there are FFs of T1 and T2 that are not affected by a change
from M0 to M1, but it contains one affected FF of T3. This implies that the L4
value flips, which may confuse the attacker (who expects a single bit flip caused by the
LSB of T2). In this case, the attacker is able to identify that the value of L4 is
dependent on the plaintext, but is not able to exploit this information, since the T3-
related logic is supposed to be unknown. Another complication is shown in the
configuration of slice 5: if the LSB of T2 is actually in the same slice as a flipping
SFF of T3, the flip is masked and no change is observed in L5. In this case, the
attacker is not able to exploit that bit.
Next, the attacker repeats this method for each bit of the intermediary register (e.g.,
1024 times for RSA). If some useful leakage is detected (like R3), he proceeds with
the attack method explained in Section 3.

4.2 Timing Aspects


The scan-based attack on RSA is targeted at finding the decryption key (which may
be 1024 or 2048 bits long). It is very important to find the exact time to scan out the
contents of the intermediate registers using the scan chains. The integral timing
aspects of the attack are presented pictorially in Fig. 4.

Fig. 4. Timing Estimation Tree

Since the same hardware is commonly used for both encryption and decryption in
RSA, we can run the hardware with a known encryption key in order to get the timing
estimations. For instance, the attacker must find out the number of clock cycles that a
Montgomery multiplication operation takes. With a known key, we know the number of
Montgomery multiplications required for the square and multiply operations of the RSA
modular exponentiation (Algorithm 3). Dividing the total execution time of this
encryption by the number of operations gives the approximate time required for one
Montgomery operation. Then, using repeated trial-and-error steps comparing the
actual output with the expected result after one Montgomery operation (presented in
Section 3), it is possible to find the exact number of clock cycles required.
This timing is utilized in our attack during the decryption process to find the
decryption exponent. The RSA hardware is run in functional mode for the exact
number of cycles needed to execute a predetermined number of Montgomery
operations. Then the hardware is reset, scan enable is made high and the scan-chain
contents are taken out. Depending on whether the key bit was 0 or 1, either a squaring
(S) is performed, or both a square (S) and a multiply (M) are performed, respectively.

In our proposed attack, we always run the software implementation for two
Montgomery cycles, taking the key bit as 0 and as 1 (two hypotheses in parallel). If the
first bit was 1, both square (S0) and multiply (M0) operations are performed;
otherwise two squarings (S0 & S1) are performed. Then the actual result from the
scan-out of the hardware implementation after each key bit execution is checked
against the results of the software simulation. If it matches the first result (of S0 and
M0), then the key bit is 1; otherwise the key bit is 0. Now, for the next step starting
with the right key bit, the decryption is again performed in software assuming both
the 0 and 1 possibilities. This time we run for one or two Montgomery cycles depending on
whether the previous key bit was 0 or 1, respectively. If the previous key bit was 0,
then a squaring on the next key bit (S2) is performed for key bit 0, and a multiply on
the same key bit (M1) is performed for present key bit 1. On the other hand, if the
previous key bit was 1, then a squaring on the same (S1) and the next key bit (S2) is
performed for present key bit 0, or a square (S1) and multiply (M1) on the same key
bit is performed. The results are compared with the actual result from the scan-
out of the hardware implementation, and the corresponding decision is taken. The
process is repeated in this way until all the decryption key bits are obtained.
As an example, if the decryption key bits were 101…, the timing decision tree
would follow the path denoted within the dotted lines in the figure (S0, M0, S1,
S2, M2,…).
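
In outline, the on-the-fly recovery loop might look as follows; this is our sketch, in which scan_parity_diff() and simulate() are placeholders for the hardware scan-out interface and the software model, respectively:

    def recover_key(m0, m1, n_bits):
        key = [1]             # the leading exponent bit is 1 by construction
        for i in range(1, n_bits):
            meas = scan_parity_diff(m0, m1, step=i)    # measured on hardware
            p0 = simulate(m0, m1, key + [0], step=i)   # hypothesis: bit = 0
            p1 = simulate(m0, m1, key + [1], step=i)   # hypothesis: bit = 1
            key.append(0 if meas == p0 else 1)         # p0 != p1 for good pairs
        return key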

5 Attack Tool

In order to apply the attack to actual designs, we developed an attack tool. The main
goal of this tool is to apply the attack method proposed in Section 3, as well as the
leakage analysis proposed in Section 4, to many different DfT configurations, without
modifying the attack.
The scan analysis tool is divided into three main parts: the attack methods and
ciphers (implemented in C++), the main controller (Perl), and the simulation part,
which is composed of an RTL deck and ModelSIM, as may be seen in Fig. 5. In
order to use the tool, the gate-level netlist must be provided by correctly setting the
paths for both the netlist and the technology files for the ModelSIM simulations. Then the
design is linked to the RTL deck, which is used as an interface with the tool logic.
This connection is automatically done by giving the list of input and output data test
pins, as well as the control clock, reset, and test enable pins. Additionally, other inputs
such as plaintext and ciphertext must be set in the configuration file.
Once the DUT is linked, the tool may simulate it by calling ModelSIM SE with the
values established by the main controller. This interface is achieved by setting
environment variables in the Perl script which are read by the ModelSIM Tcl script
and then passed on to the RTL deck via generics. For instance, the information being
exchanged here is the plaintext (cipher specific), reset and scan enable timing (when
to scan and how long) and the value of the scan input (test specific). In return, the
scan output contents are stored in a file and they are processed by the main attack
controller in order to run the attacks.

Fig. 5. High-level block diagram of the scan attack tool

On the left side of Fig. 5, the software part is shown (attack method and cipher
description). The new RSA attack is implemented in C++, based on an RSA cipher
implemented in the same language. We also foresee new attacks against other ciphers,
e.g., ECC; scan-attacks on other similar cryptosystems may be conceived, since the
tool was built in such a way that adding a new cipher is straightforward.
The core of the tool is implemented by the attack controller (Perl) which calls the
attack method (using a SWIG interface). The attack controller ensures that the settings
are initialized and then it launches both the attack and the simulation. As a secondary
functionality, the controller handles some design aspects, like timing and multiple test
outputs, so that the attack method itself may abstract away that information. For instance,
the attack method has no information on how many clock cycles it takes to execute a
Montgomery multiplication. The controller also finds out the number of scan cycles
for which the shift operation must be enabled so that the whole scan length is unloaded.

6 Experimental Results
In order to test the effectiveness of the attack, we implemented a 1024-bit RSA
algorithm in hardware, with separate datapaths for the Montgomery multiplier, the
adder/subtractor block and the main controller for the Montgomery exponentiation.
Then we envisaged different scenarios to test the attack flexibility. The first scenario
is a single chain containing all the FFs of the design. Then, in the next subsection, we
used Synopsys DfT Compiler (v2010.12) to insert more complex configurations such
as decompression/compaction. Finally, in the last subsection, we implemented some
countermeasures proposed in the literature, to verify whether the attack is able to
overcome them. All runs were performed on an Intel Xeon X5460 machine with four
processors and 4 GB of memory.

The total number of FFs in the design is 9260. Out of these, 4500 belong to the T1
type, 1024 form the intermediate register (T2 type) and 4096 belong to the T3
type (see Section 4.1). Therefore, using Synopsys DfT Compiler we inserted a single
chain with all these FFs, and the design was linked with the tool. Then the leakage
analysis was run over this configuration. For identifying each bit of the RSA
intermediate register (1024-bit long), the attack tool takes approximately 3.5 minutes
per bit. Then the tool proceeds with the attack method, in order to find the secret key.
In this phase, the tool again takes approximately 3.5 minutes per bit of secret key. Both
the timing of the leakage analysis and that of the attack are strongly dependent on the
server configuration. Additionally, the C++ code accounts for only about 5 seconds
of the 3.5 minutes, meaning that simulation dominates the execution time.
For our test case, we required around 11 messages to find out the full 1024-bit
RSA exponent. This number is less than that required for the attack presented in [5]
(which takes around 30 messages).

6.1 In Presence of DfT Methods


In order to test our scan-attack in the presence of industrial DfT methods, Synopsys
DFT Compiler was used to insert different DfT configurations in the RSA circuit. In
the first case, 120 scan chains were inserted, without compaction/compression. Since
the tool analyzes each scan output pin separately and independently, and since the
sensitive registers were converted to scan FFs, the attack with the tool was able to
find out the secret key. Changing the position of the sensitive FFs do not change the
result. The time taken to retrieve the key in this case is almost the same as that of the
previous case (with a single chain).
In a second scenario, pattern decompression and response compaction were inserted.
Different compression ratios were tested but, as proposed in [13], linear response
compactors do not lead to any increase in security. Since the test inputs are not
used by the pattern decompressor (the plaintext is a primary input), it does not affect the
attack method and hence is not taken into consideration. As the proposed methods
are all based on the differential mode, linear compactors do not impede the attack,
nor do they imply a significant increase in simulation time.
As a last scenario, the X-tolerant options were activated to add the masking logic
that deals with the unknowns present in the design. The masking blocks some scan
chains at the instant the contents are shifted out, if the test engineer believes that
there is an X that may corrupt the test output. This mask is controlled by the output of
the pattern decompressor, which is in turn controlled by the test inputs. Since the mask is
controllable, it is just a matter of shifting in the right pattern, one which does not mask the
confidential data, so that the masking is disabled when the sensitive data is shifted out.
Hence, our proposed scan-attack still works in the presence of masking.

6.2 In Presence of Proposed Countermeasures


In presence of inverters: one of the countermeasures proposed in the literature is the
insertion of dummy inverters before some FFs of the scan chain [16]. This technique
aims at confusing the attacker, since the sensitive data observed on the scan chain may
be inverted. However, since these inverters are always placed at the same locations in
the scan chain, they are completely transparent to the differential mode.
The effectiveness of the attack against this countermeasure was validated on the
RSA design containing multiple scan chains and a compaction/compression module.
Two implementations were considered, with 4630 and 6180 inverters (50% and 75%
of the overall 9260 FFs in the design, respectively) randomly inserted in the scan
chains. For both cases, the tool was able to find leakage points and then to retrieve the
secret key.
In presence of partial scan: depending on the design, not all the flip-flops need to
be inserted in the scan chain in order to achieve high testability. As proposed in [4],
partial scan may be used to increase the security of an RSA design against scan
attacks. However, the authors suppose that the attacker needs the whole sensitive
register to retrieve the secret key. As described in Section 4, the leakage
analysis feature can be used to find out which bits of the sensitive register are inserted
in the scan chain. Once these bits are identified, the attack can proceed with only
partial information, since each bit of the sensitive register is related to the key.
To evaluate the strength of partial scan, we configured the DfT tool so as not to
insert some of the sensitive registers into the scan chain. In the first case, half of the
sensitive flip-flops were inserted in the chain. The tool was able to correctly identify
all the leaking bits and then to retrieve the secret key. Even in the worst-case
situation, i.e., where only one secret bit was inserted in the chain, the tool was still
able to find the correct secret key.

7 Comparison with Previous RSA Attacks

The approach taken in [5] is a pure software attack which does not take into
account the practical aspects of applying it to an actual cryptographic hardware
implementation. The timing aspects are crucial to scan attacks on secure hardware,
and have been addressed in this paper. Our scan-attack analysis tool integrates the
actual hardware (in the form of a gate-level netlist with inserted DfT) with the
software emulation, which allows us to perform the attack in real time. The secret
decryption exponent bits are deciphered on-the-fly using this combined approach.
Left-to-right binary exponentiation (employed in ordinary exponentiation) is used
as the target RSA algorithm for the attack in [5]. This is generally not implemented in
hardware, owing to the expensive division operation involved in modular operations.
We target the Montgomery exponentiation algorithm, which is by far the most
popular and efficient implementation of RSA in hardware, as there are no division
operations involved (owing to performing the squaring and multiply operations in the
Montgomery domain).
Moreover, an inherent assumption of the attack in [5] is that there are no other
exponent-key-bit-dependent intermediate registers which change their value after each
square and multiply operation. This may not be the case in an actual
hardware implementation, where multiple registers are key-dependent and change
their values together with the intermediate register of interest in the attack (for
instance, input and output buffers). These registers may mask the contents of the
target intermediate register after XOR-tree compaction (as shown in the leakage
analysis in Section 4). Our proposed scan-attack analysis takes into account the
contents of the other key-dependent registers present in the scan chain, and presents
ways to deal with this problem.
Finally, the attack in [5] cannot be applied to secure designs having test response
compaction and masking (which are usually employed in DfT for all industrial circuits
to reduce the test volume and cost). Our scan-attack analysis, on the other hand,
works in the presence of these scan compression DfT structures.

8 Conclusion

In this paper, we have presented a new scan-based attack on RSA cryptosystems. A
scan-chain leakage analysis for the algorithm is presented, along with the practical
aspects of mounting the attack on an actual hardware implementation of RSA. A
comparison with the previous RSA scan-attack proposal is also made. We present a
scan-chain leakage analysis tool and explain its use through the RSA attack. State-of-
the-art scan-attack countermeasures and industrial test compression techniques,
which are supposed to act as countermeasures, are also evaluated for scan-leakage
strength using RSA as a case study. We successfully attacked the RSA implementation
in the presence of these countermeasures.
As future work, we plan to extend the scope of this scan-based attack from RSA to
ElGamal and other similar public-key implementations based on large-number modular
exponentiation. We will also extend the scope of our proposed attack to RSA
implementations with SPA and DPA countermeasures. This can also be an interesting
topic for future contributions in this domain.

Acknowledgement. This work has been supported in part by the European
Commission under grant agreement ICT-2007-238811 UNIQUE and in part by the
IAP Programme P6/26 BCRYPT of the Belgian State. Amitabh Das is funded by a
fellowship from Erasmus Mundus External Cooperation Window Lot 15.

References
1. Yang, B., Wu, K., Karri, R.: Scan Based Side Channel Attack on Dedicated Hardware
Implementations of Data Encryption Standard. In: Proceedings IEEE International Test
Conference, ITC (2004)
2. Yang, B., Wu, K., Karri, R.: Secure Scan: A Design-for-Test Architecture for Crypto
Chips. In: Proceedings ACM/IEEE Design Automation Conference (DAC), pp. 135–140
(June 2005)
3. Sengar, G., Mukhopadhayay, D., Chowdhury, D.: An Efficient Approach to Develop
Secure Scan Tree for Crypto-Hardware. In: 15th International Conference on Advanced
Computing and Communications

4. Inoue, M., Yoneda, T., Hasegawa, M., Fujiwara, H.: Partial Scan Approach for Secret
Information Protection. In: European Test Symposium, pp. 143–148 (2009)
5. Nara, R., Satoh, K., Yanagisawa, M., Ohtsuki, T., Togawa, N.: Scan-Based Side-Channel
Attack Against RSA Cryptosystems Using Scan Signatures. IEICE Transaction
Fundamentals E93-A(12) (December 2010), Special Section on VLSI Design and CAD
Algorithms
6. Wang, L.-T., Wen, X., Furukawa, H., Hsu, F.-S., Lin, S.-H., Tsai, S.-W., Abdel-Hafez,
K.S., Wu, S.: VirtualScan: a new compressed scan technology for test cost reduction. In:
Proceedings of International Test Conference, ITC 2004, October 26-28, pp. 916–925
(2004)
7. Rajski, J., Tyszer, J., Kassab, M., Mukherjee, N.: Embedded deterministic test. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems 23(5), 776–
792 (2004)
8. Mitra, S., Kim, K.S.: X-compact: an efficient response compaction technique for test cost
reduction. In: Proc. ITC 2002, pp. 311–320 (2002)
9. Liu, C., Huang, Y.: Effects of Embedded Decompression and Compaction Architectures
on Side-Channel Attack Resistance. In: 25th IEEE VLSI Test Symposium, VTS (2007)
10. Nara, R., Togawa, N., Yanagisawa, M., Ohtsuki, T.: Scan-Based Attack against Elliptic
Curve Cryptosystems. In: Asia South-Pacific Design Automatic Conference, ASPDAC
(2010)
11. Liu, Y., Wu, K., Karri, R.: Scan-based Attacks on Linear Feedback Shift Register Based
Stream Ciphers. ACM Transactions on Design Automation of Electronic Systems,
TODAES (2011)
12. Das, A., Knezevic, M., Seys, S., Verbauwhede, I.: Challenge-response based secure test
wrapper for testing cryptographic circuits. In: IEEE European Test Symposium, ETS
(2011)
13. Hély, D., Flottes, M., Bancel, F., Rouzeyre, B., Berard, N., Renovell, M.: Scan Design and
Secure Chip. In: 10th IEEE International On-Line Testing Symposium, IOLTS 2004
(2004)
14. Hély, D., Bancel, F., Flottes, M., Rouzeyre, B.: Test Control for Secure Scan Designs. In:
European Test Symposium, ETS 2005 (2005)
15. Da Rolt, J., Di Natale, G., Flottes, M., Rouzeyre, B.: New security threats against chips
containing scan chain structures. Hardware Oriented Security and Trust, HOST (2011)
16. Da Rolt, J., Di Natale, G., Flottes, M., Rouzeyre, B.: Scan attacks and countermeasures in
presence of scan response compactors. In: 16th IEEE European Test Symposium, ETS
(2011)
17. Menezes, A., van Oorschot, P., Vanstone, S.: Efficient Implementations. In: Handbook of
Applied Cryptography, ch. 14. CRC Press (1996)
18. Gezel Hardware/Software Codesign Environment,
http://rijndael.ece.vt.edu/gezel2/
RSA Key Generation: New Attacks

Camille Vuillaume¹, Takashi Endo¹, and Paul Wooderson²

¹ Renesas Electronics, Tokyo, Japan
{camille.vuillaume.cj,takashi.endo.ym}@renesas.com
² Renesas Electronics, Bourne End, UK
paul.wooderson@renesas.com

Abstract. We present several new side-channel attacks against RSA
key generation. Our attacks may be combined and are powerful enough
to fully reveal RSA primes generated on a tamper-resistant device, unless
adequate countermeasures are implemented. More precisely, we describe
a DPA attack, a template attack and several fault attacks against prime
generation. Our experimental results confirm the practicality of the DPA
and template attacks. To the best of our knowledge, these attacks are
the first of their kind and demonstrate that basic timing and SPA coun-
termeasures may not be sufficient for high-security applications.

Keywords: RSA key generation, prime generation, DPA, template, fault.

1 Introduction
Generating RSA keys in tamper-resistant devices is not only practical but also a
good security practice, since it eliminates the single point of failure represented
by a workstation performing multiple key generations. Generating keys in the
field also raises the question of the necessary level of tamper resistance. The
relative lack of publications related to side-channel attacks on RSA key generation
may give the impression that one can get away with basic countermeasures: the
published attacks concentrate on basic timing or SPA-type leakages [14,1] and
can be foiled with a constant-time/SPA-secure implementation. In fact, even
secure implementations achieving the highest grade of tamper resistance according
to the Common Criteria evaluation framework consider only timing and SPA
attacks [2]. However, a careful reading of [1] shows a different picture.
We assume that the trial division algorithm itself and the Miller-Rabin
test procedure are effectively protected against side-channel attacks. [. . . ]
If any security assumptions [. . . ] are violated, it may be possible to im-
prove our attack or to mount a different, even more efficient side-channel
attack.
In this paper, we close the gap and show new attacks against RSA key generation
that can be combined with the techniques from [1], but are also powerful
enough to fully reveal an RSA key by themselves. The settings that we consider
are similar to [1]: we assume an incremental prime search algorithm possibly


enhanced with sieving, but unlike [1] we focus on the primality test procedure.
Our tools consist of differential power analysis [3], the template attack machinery
[4], fault analysis [5] and combinations thereof. In particular, to the best of our
knowledge, this is the first published fault attack on RSA key generation.
The paper is organized as follows. Section 2 recalls basic facts about RSA key
generation. Section 3 is the first attack, which is a DPA on the least significant
bits of the prime numbers calculated within RSA key generation. Section 4 is
the second attack, a template attack on the most significant bits of the prime
numbers. In Section 5, two fault attacks are described: a fault attack for increas-
ing the number of samples available for leakage attacks, and a safe-error attack
revealing the most significant bits of the prime numbers. Finally, we conclude in
Section 7.

2 RSA Key Generation

We will briefly enumerate basic facts about RSA key generation and prime
number generation; refer to [6] for a more complete overview. On input a
public exponent e and a bit length ℓ, RSA key generation calculates two random
large primes p and q of bit length ℓ, with the additional requirement that
gcd(e, φ(p ∗ q)) = 1, where φ is Euler's totient function. The RSA private key is
the exponent d = e^(−1) mod φ(p ∗ q), whereas the public key consists of the public
exponent e and the 2ℓ-bit RSA modulus n = p ∗ q.
The most computation-intensive step of RSA key generation is the generation
of prime numbers. To find large primes, one usually selects a random integer and
tests it for primality with a probabilistic test such as the Fermat or the Miller-
Rabin test. The Fermat primality test works as follows: given a prime candidate
p, a random number 0 < a < p is selected, and a^(p−1) mod p is calculated and
compared with 1, which is the expected result when p is prime. It is well-known
that there exist (composite) integers p̃ called Carmichael numbers for which
a^(p̃−1) = 1 mod p̃ for all integers a such that gcd(a, p̃) = 1, despite p̃ not being
prime. As a result, the Fermat test is rarely used in practice.
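As an illustration, the Fermat test can be sketched in a few lines of Python; the function name and parameter choices below are ours, not taken from any particular implementation:

import random

def fermat_test(p, rounds=1):
    # Probabilistic Fermat test: a composite p passes a round only if
    # the random base a happens to be a Fermat liar for p.
    if p < 3 or p % 2 == 0:
        return p == 2
    for _ in range(rounds):
        a = random.randrange(2, p)     # random base 0 < a < p
        if pow(a, p - 1, p) != 1:      # a^(p-1) mod p must equal 1 for a prime
            return False
    return True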
Owing to theoretical results on its average and worst case error probabilities
[7], the Miller-Rabin test is often preferred. In the Miller-Rabin test, instead of
p − 1, the odd exponent (p − 1)/2^s is employed: first, for a random 0 < a < p,
the exponentiation t = a^((p−1)/2^s) mod p is calculated and the result is compared
with 1 and −1; the test is passed if there is a match. If not, the following step
is repeated s − 1 times: t = t^2 mod p is calculated and the result is compared
with 1 and −1; if t = 1, the candidate p is composite and the test is failed, but
if t = −1 the test is passed. If after the s − 1 iterations t was never equal to 1 or
−1, the candidate p is composite and the test is failed.
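A matching sketch of one Miller-Rabin round, following the description above (again, names and structure are our own):

import random

def miller_rabin_round(p):
    # Write p - 1 = 2^s * m with m odd, then walk the square chain.
    if p < 3 or p % 2 == 0:
        return p == 2
    s, m = 0, p - 1
    while m % 2 == 0:
        s, m = s + 1, m // 2
    a = random.randrange(2, p)
    t = pow(a, m, p)                   # t = a^((p-1)/2^s) mod p
    if t == 1 or t == p - 1:           # compare with 1 and -1: a match passes
        return True
    for _ in range(s - 1):             # the squaring step, repeated s - 1 times
        t = pow(t, 2, p)
        if t == p - 1:                 # -1 reached: the test is passed
            return True
        if t == 1:                     # 1 reached without -1: composite
            return False
    return False                       # t was never -1: composite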
For efficiency reasons, it is preferable to apply a trial division step before ex-
ecuting the costly primality test. In addition, the cost of trial divisions can be
amortized over several candidates when an incremental search algorithm is used
[8,10]. Incremental prime search is one of the techniques recommended by cryp-
tographic standards for prime generation; see for example [11, Appendix B.3.6].
However, it is not the only way to sieve candidates that are not divisible by a set
of small primes: there exist other methods [12,13], but for the sake of simplicity
we will restrict the discussion to incremental prime search in this paper.
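For reference, an incremental prime search with amortized trial division might look as follows; this is a minimal sketch (it reuses miller_rabin_round from the sketch above, and the small-prime table is truncated for brevity):

import random

SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]   # pi_2, pi_3, ..., pi_T

def incremental_prime_search(bits, mr_rounds=5):
    # Random odd starting candidate with the top bit set.
    p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
    # Trial-division residues are computed once and updated incrementally.
    residues = [p % q for q in SMALL_PRIMES]
    while True:
        if all(r != 0 for r in residues):              # survived trial division
            if all(miller_rabin_round(p) for _ in range(mr_rounds)):
                return p
        p += 2                                         # p^(j+1) <- p^(j) + 2
        residues = [(r + 2) % q for q, r in zip(SMALL_PRIMES, residues)]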

3 Differential Power Analysis on Least Significant Bits


Here, we will assume that the primality testing is performed with the Fermat
test, which is similar to, but conceptually simpler than, the Miller-Rabin test. We
will also briefly explain how to apply our results to the Miller-Rabin test. In the
typical DPA setting, the attacker controls or knows inputs and/or outputs of
the cryptosystem, predicts intermediate states of the cryptosystem and verifies
the correctness of his prediction using side-channel information. In the Fermat
test t = a^(p−1) mod p with a random basis a, the attacker can target either the
exponent p − 1 or the modulus p, but has no control or knowledge of the ex-
ponentiation basis a, and essentially no knowledge of the output t except when
t = 1 (in which case the test is passed).

3.1 The Basics


In the following, we will show that under particular circumstances, it is possi-
ble to bypass these limitations, and describe an attack that can reveal some of
the least significant bits of the prime candidate processed by the Fermat test.
Recall that in the incremental prime search, the j-th candidate p^(j) is tested for
primality and incremented p^(j+1) ← p^(j) + 2 in case of failure. For the sake of
simplicity, we assume that there is no trial division. Let (p^(j)_{ℓ−1} . . . p^(j)_1 p^(j)_0)_2 be
the binary representation of the j-th candidate p^(j). It is clear that p^(j)_0 = 1 since
candidates are always odd. In addition, p^(j) = p^(0) + 2j.

Assumptions: The bits (p^(0)_{i−1} . . . p^(0)_0)_2 are known. The target of the attack is
bit p^(0)_i. The attacker is able to gather k traces of the Fermat test
t^(j) = (a^(j))^(p^(j)−1) mod p^(j) with 0 ≤ j < k. We will use the functions f_{i+1}
and g_{i+1}, defined as follows:

– f_{i+1}(j) = p^(j)_{i+1} is the function mapping the increment j to the bit value
p^(j)_{i+1}, under the assumption that p^(0)_i = 0.
– g_{i+1}(j) = p^(j)_{i+1} is the function mapping the increment j to the bit value
p^(j)_{i+1}, under the assumption that p^(0)_i = 1.
It is easy to see that the following properties hold.

Property 1 (Antiperiodicity). The functions f_{i+1} and g_{i+1} are antiperiodic
with antiperiod 2^i. Formally, for j a positive integer:

f_{i+1}(j) = ¬f_{i+1}(j + 2^i) and g_{i+1}(j) = ¬g_{i+1}(j + 2^i),   (1)

where ¬ refers to bit negation.



Property 2 (Quadrature Phase). Owing to a different hypothesis for p^(0)_i,
the functions f_{i+1} and g_{i+1} are in quadrature phase. Formally, for j a positive
integer:

f_{i+1}(j + 2^{i−1}) = g_{i+1}(j)   (2)

Property 1 means that the functions f_{i+1} and g_{i+1} have their output flipped
every 2^i ticks, and Property 2 that the distance between the output flips of f_{i+1}
and g_{i+1} is 2^{i−1}.

[Figure: example with i = 2. The bit sequences p_1^(j) and p_2^(j) are shown as
functions of the increment j (0 to 20), together with the alternating class labels
C/D of f_3 under hypothesis p_2^(0) = 0 and of g_3 under hypothesis p_2^(0) = 1;
the first output flips occur at the increments u and v.]

Fig. 1. Example of the attack on p_3^(0)

Attack Methodology: The core of the attack is that the knowledge of the bits
(p^(0)_{i−1} . . . p^(0)_0)_2 and the guess of bit p^(0)_i = 0 (resp. p^(0)_i = 1) tells us when
the output of the function f_{i+1} (resp. g_{i+1}) is going to be flipped (without knowing
the exact value of the output). The attacker considers the two hypotheses p^(0)_i = 0
and p^(0)_i = 1. Accordingly, the attacker is able to calculate the smallest increment
u (resp. v) for which the output of the function f_{i+1} (resp. g_{i+1}) is flipped. By
Property 2, it is clear that we have |u − v| = 2^{i−1}.

Next, following hypothesis p^(0)_i = 0, the attacker distributes the k measured
power traces of the Fermat test into two classes C and D:

– The power traces with index j < u belong to class C.
– The next 2^i traces with index u ≤ j < u + 2^i belong to class D.
– The next 2^i traces with index u + 2^i ≤ j < u + 2 ∗ 2^i belong to class C.
– And so on.

After that, the attacker computes Δ^0, the difference of the average power traces
in classes C and D. Finally, the same thing is done for hypothesis p^(0)_i = 1 in
order to obtain Δ^1. Figure 1 illustrates the result of the two classifications.

Attack Result. On the one hand, a correct hypothesis p^(0)_i = b leads to a correct
classification in the classes C and D. As a result, all of the traces in class C have
the same value for bit p^(j)_{i+1} = β, and all of the traces in class D have the same
value for bit p^(j)_{i+1} = 1 − β. Thus, the differential power trace Δ^b should exhibit
a peak as a result of opposite values for bit p^(j)_{i+1} in the two classes.

On the other hand, an incorrect hypothesis p^(0)_i = 1 − b leads to an incorrect
classification in the classes C and D. As a consequence of Property 2, about one
half of the power traces in C have p^(j)_{i+1} = 0 and the other half p^(j)_{i+1} = 1, and the
same thing can be said for class D. Therefore, the large peak that can be seen
in Δ^b should vanish in Δ^{1−b}.
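The classification and differencing step can be summarized by the following sketch, where traces is assumed to be a hypothetical k × N numpy array of measured power traces ordered by the increment j (all names are ours):

import numpy as np

def dpa_classify(traces, u, i):
    # Indices j < u go to class C; from u on, blocks of 2^i traces
    # alternate D, C, D, C, ... according to the predicted output flips.
    period = 2 ** i
    in_c = np.zeros(len(traces), dtype=bool)
    in_c[:u] = True
    for start in range(u, len(traces), period):
        block = (start - u) // period
        in_c[start:start + period] = (block % 2 == 1)
    # Differential trace: mean of class C minus mean of class D.
    return traces[in_c].mean(axis=0) - traces[~in_c].mean(axis=0)

Calling dpa_classify(traces, u, i) and dpa_classify(traces, v, i) yields Δ^0 and Δ^1 respectively; only the trace for the correct hypothesis should exhibit a peak.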

3.2 Discussion and Extensions

Experimental Results. We experimented with our attack, using a secure micro-
controller executing Fermat tests a^(p−1) mod p, where p was incremented between
calls to the test. For the sake of simplicity, we selected a fixed basis a = 2, which
is sometimes suggested for accelerating the Fermat test, but we do not expect
that this choice has a significant impact on our results. As depicted in Figure 2,
we were able to observe DPA peaks with as few as 100 samples, and the peaks
should already be visible with even fewer samples.

[Figure: differential power traces for the correct and the incorrect classification;
only the correct classification exhibits a DPA peak.]

Fig. 2. Attack result for DPA on p_1^(0), 100 samples

It is important to understand that while we use a single bit for the classifi-
cation of power traces, the physical state of this bit is only a minor contributor
to the DPA peak. Indeed, flipping one exponent bit can trigger a vastly differ-
ent behavior of the exponentiation algorithm, such as the activation of different
hardware modules, the execution of different program branches with different
addresses or the access to different pieces of data located at different addresses.
This is the reason why we were able to see DPA peaks with a very small number
of samples. Of course, the practicality of our attack depends on the targeted
hardware and software. Although we do not reveal the details of our experimen-
tal setup, we think that our result serves its purpose, namely showing that the
attack is not only theoretical but also practical, and that the scope of security
evaluations and countermeasures in prime generation should not be limited to
timing attacks or SPA.

Discussion. The number of bits that can be revealed by this attack is limited
by nature, because at least 2^(i+1) samples (i.e. executions of the Fermat test) are
necessary to reveal bits p^(0)_1 to p^(0)_i. According to the prime number theorem,
the number of ℓ-bit primes is about:

π(2^ℓ) − π(2^{ℓ−1}) ≈ 2^ℓ/ln(2^ℓ) − 2^{ℓ−1}/ln(2^{ℓ−1}) ≈ 2^{ℓ−1}/(ℓ ln 2)   (3)

Above, π is the prime-counting function. Thus, the average distance between
two ℓ-bit primes is 2ℓ ln 2, and the average distance between a random ℓ-bit
number and the next prime is about ℓ ln 2. For ℓ = 512 bits and excluding even
integers, this means that there are on average about 177 executions of the Fermat
test until a prime number is found, and for ℓ = 1024 bits 354 executions. As a
consequence, the best that we can hope for is that the DPA attack is effective
for the first 6 or 7 least significant bits. However, as will be shown in Section
4, revealing a few bits (even one single bit) will serve our purpose well enough,
because the DPA is only the first stage of a more complex attack.

Trial Division. The attack description above assumes that the Fermat test is cal-
culated after each increment of the prime candidate p^(j). In practice, incremental
prime search is combined with trial division, and therefore the Fermat test is not
systematically executed. But if the increment δ = j′ − j between the executions
of the Fermat test with successive candidates p^(j) and p^(j′) is known, the same
attack methodology can be applied. It seems possible to obtain δ if the imple-
mentation is careless (i.e. not protected against SPA). Note that this may occur
even if a countermeasure against the attack presented in [1] is implemented.
However, trial division obviously decreases the number of times the Fer-
mat test is performed. Using the small primes π2 = 3, π3 = 5, . . . , πT (note
that 2 is excluded because we consider odd numbers only) and for πT large
enough, by direct application of the prime number theorem, the number of ex-
ecutions of the Fermat test (i.e. the number of survivors of the trial division
step) is approximately divided by ln(πT )/2. For example, with 256 small primes,
ln(π257 )/2 = ln(1621)/2 ≈ 3.7.

Miller-Rabin Test. The exponentiation in the Miller-Rabin test uses
(p^(j) − 1)/2^(s^(j)) as exponent instead of p^(j) − 1. In other words, the Miller-Rabin
test skips the s^(j) rightmost zero bits of p^(j) − 1. As a result, if the exponentiation is
computed from left to right, the same attack methodology is applicable, and if
computed from right to left, the power traces must be aligned according to the
value of s^(j). However, power traces where all rightmost bits until p^(j)_{i+1} are zero
(that is, s^(j) ≥ i + 1) should be excluded from the analysis because in that case
the Miller-Rabin test does not use bit p^(j)_{i+1}.

4 Template Attack on Most Significant Bits

4.1 Building Templates


Template attacks usually require access to a training ("blank") device [4]. Such
a "blank" device may be instantiated by an evaluation sample, a multi-application
smart card or a smart card with a vulnerable OS that allows the execution of
arbitrary programs (for example through a buffer overflow vulnerability). The
trend in security evaluations is to assume the existence of such devices, in which
case template attacks are in scope. But even when this is not the case, the
attack presented in Section 3 can be effectively used as a training phase. The
DPA can be repeated as many times as necessary, revealing as little as one single
bit from several primes p (and q). Once sufficiently many power traces (together
with the revealed bit value p_i) are gathered, it is possible to build the templates
P_0 = (M_0, Σ_0) using samples with p_i = 0 and P_1 = (M_1, Σ_1) using samples with
p_i = 1. The template P_b consists of the average signal M_b and the covariance
matrix Σ_b, and can be used to characterize the noise probability density function
of the leakage when p_i = b [4]. Although p_i is on the less significant side of p,
the templates will be effective for more significant bits as well, assuming that the
exponentiation algorithm performs the same actions for all bits.
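Concretely, building a template from the traces labelled with bit value b takes only a few lines of numpy; the variable names below are hypothetical:

import numpy as np

def build_template(samples):
    # samples: array of shape (count, N points of interest) for one bit value.
    samples = np.asarray(samples)
    M = samples.mean(axis=0)                 # average signal M_b
    Sigma = np.cov(samples, rowvar=False)    # N x N covariance matrix Sigma_b
    return M, Sigma

# Hypothetical usage, with traces labelled by the DPA stage:
# P0 = build_template(traces_bit0)
# P1 = build_template(traces_bit1)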

4.2 Template Attack

The attack uses the simple fact that in an incremental prime search, the most
significant bits of the exponent do not change. This fact is true not only for the
Fermat test but also for the Miller-Rabin test. As a result, multi-shot template
attacks are in scope: with enough executions of the prime test, the accuracy of
the template matching phase can be greatly increased. In addition, the same
samples and templates can be re-used for attacking all exponent bits, provided
that they are left unchanged by the incremental search.

For a single sample L with N points of interest, the likelihoods N(L|p_i = 0)
and N(L|p_i = 1) of observing sample L if bit p_i is 0 (resp. 1) are:

N(L|p_i = 0) = (1/√((2π)^N |Σ_0|)) · exp(−(1/2)·(L−M_0)^T Σ_0^(−1) (L−M_0))
N(L|p_i = 1) = (1/√((2π)^N |Σ_1|)) · exp(−(1/2)·(L−M_1)^T Σ_1^(−1) (L−M_1))   (4)

The highest likelihood yields the most probable value for bit p_i [4]. Similarly, for
multiple samples L^(0), . . . , L^(k−1), the highest value of the log-likelihood yields
the correct value for bit p_i.

L_0 = Σ_{j=0}^{k−1} log N(L^(j)|p_i = 0)
L_1 = Σ_{j=0}^{k−1} log N(L^(j)|p_i = 1)   (5)
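The multi-shot matching phase of Equations (4) and (5) then amounts to summing Gaussian log-densities; a sketch using scipy in place of a hand-rolled implementation:

import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(samples, template):
    # Sum of log N(L | p_i = b) over the samples L^(0), ..., L^(k-1)  (Eq. 5).
    M, Sigma = template
    return sum(multivariate_normal.logpdf(L, mean=M, cov=Sigma) for L in samples)

# The bit value whose template yields the larger log-likelihood is retained:
# bit = 0 if log_likelihood(samples, P0) > log_likelihood(samples, P1) else 1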

Next, we estimate the number of samples k that can be expected during prime
search. As explained in Section 3, about ℓ ln 2 / 2 odd candidates are typically
tested before a prime number is found. In addition, when trial division employs
the T − 1 first primes π_j (excluding 2), the number of calls to the primality test
is about ℓ ln 2/ln(π_T). For example, with ℓ = 512 bits and when T − 1 = 256, we
have on average 48 calls to the primality test and therefore 48 samples. When
ℓ = 1024, we can expect 96 samples. Note that when a prime is found, the
Miller-Rabin test is repeated several times in order to decrease the probability
of outputting a composite [9]. If the error probability must be smaller than 2^(−100),
this gives us 5 − 1 = 4 additional samples for the 512-bit case and 9 − 1 = 8
samples for the 1024-bit case.
Finally, our experimental results for the DPA validate the practicality of the
template attack as well, because the existence of DPA peaks implies that average
signals M_0 and M_1 can already be distinguished. Again, the differences in the
average signals arise not only from the value of bit p_i but also from the behavior
of the exponentiation algorithm depending on bit p_i. Since we were able to
observe DPA peaks with as few as 100 samples, a template attack with 96
samples is very likely to be successful.

5 Fault Attacks
5.1 Improving Leakage Attacks
The two attacks presented in Sections 3 and 4 suffer from a relatively limited
number of available samples in the average case. This is due to the fact that
the prime search algorithm exits as soon as a prime number is found. But it is
easy to imagine a (multi-) fault attack that lifts this limitation: when a prime
number has been found but additional rounds of the primality tests are still being
executed in order to decrease the error probability, a fault is induced during the
execution of the primality tests. As a result, the candidate is incremented instead
of being returned, thereby increasing the number of samples. Interestingly, this
methodology can be applied to other scenarios. For example, while the attack
against incremental search in [1] has a success rate of 10-15% in the context of
a normal execution of the algorithm, it is always successful if the prime test is
disturbed.
We describe a possible implementation of the attack, where a fault is system-
atically triggered during the last multiplication of the primality test. In case the
candidate is composite, a faulty result is very unlikely to modify the outcome
of the test, but in case the candidate is a prime, a faulty result will certainly

mislabel the prime as composite. The positioning of the fault induction system
is not critical since we simply aim at corrupting the result of the multiplication,
without any particular fault model. In order to maximize the effect of the attack,
it should be repeated several times, therefore the fault induction system should
handle multiple faults. This kind of attack is realistic considering modern lasers,
which have very low latencies and can be triggered with oscilloscopes [16]. Next,
two scenarios must be considered.

Free-Run Search. In a free-run search, the residues obtained by trial division are
updated using the simple relationship p^(j+1) mod π_i = ((p^(j) mod π_i) + 2) mod π_i.
Thus, prime search can be continued indefinitely, until a prime number is found.
If the attacker is able to disrupt the primality test, he is also able to gather as
many samples as he likes. Thus, by combining fault attacks and leakage analysis,
the number of samples is essentially unlimited.

Array Search. In an array search, the search interval is pre-determined, and
once all candidates are exhausted, a new interval is randomly selected. This is
typically what happens if a bit-array-like method is implemented [10]. In that
case, the number of samples k that can be obtained with the fault attack is
always smaller than the size s of the interval. Recall that the failure probability
(i.e. not finding a prime) is bounded by the following expression [9]:

Pr[Failure] ≤ exp(−2s/(ℓ ln 2))   (6)

Above, s is the size of the interval (excluding even numbers) and ℓ the target bit
length of the prime. As a result, it is desirable to select a relatively large search
interval in order to reduce the failure probability. For example, using s = 2048
candidates yields a failure probability smaller than 9.7 ∗ 10^(−6) for 512-bit primes
and 3.1 ∗ 10^(−3) for 1024-bit primes. Taking trial division into account, the attacker
is able to gather at most 2s/ln(π_T) samples. Note that the number of samples
does not depend on the target bit length of primes. For example, with s = 2048
and 256 primes for trial division, one can expect at most 554 samples.
On the one hand, the DPA from Section 3 can reach the upper bound 2s/ ln(πT ),
because its objective is simply gathering as many samples as possible for building
templates: the outcome of key generation is irrelevant. On the other hand, the
template attack from Section 4 assumes that the RSA key generation is even-
tually successful, otherwise the gathered samples are worthless. As a result, for
the template attack, there is a trade-off between the accuracy of the template
matching phase (i.e. having more samples) and the probability of success of the
attack. Indeed, if too many faults are induced, it may be that there are no prime
numbers left in the interval, in which case the prime search algorithm fails and
the gathered samples must be discarded. Assuming that the attacker can restart
the prime search in case of failure, this is an optimization problem that does not
affect the final outcome of the attack.

5.2 Safe-Error Attack

Unlike the attacks presented in the above sections, which are meant to be com-
bined, the final (multi-) fault attack described below can work independently.
Although it is possible to combine it with the template attack from Section 4,
such combination is somewhat redundant because both attacks target the most
significant bits of the exponent in primality tests.
We assume that exponentiation is calculated with the square-and-multiply-
always algorithm; in other words, multiplications (possibly dummy) are calcu-
lated independently from the value of the exponent bits p^(j)_i. The attack follows the
principle of safe-errors [15]: if a fault is induced during a dummy multiplication,
the output of the exponentiation is unaffected, but if a fault is induced during
a non-dummy multiplication, the output is corrupted. Note that this assumes
prior reverse-engineering work for identifying a fault target that will fit in the
safe-error model. For example, corrupting memory storing the input values to
the primality test is unlikely to serve our purpose because such faults will af-
fect the outcome of all calculations, but targeting the multiplication circuit or
internal multiplication registers seems more promising.
In order to apply the safe-error methodology, we will take advantage of the
following property of the Miller-Rabin test.
Property 3 (Rarity of Strong Pseudoprimes). With very large probability,
a large random integer passing one round of the Miller-Rabin test is a prime.
For example, the probability that a random 512-bit integer which passes one
round of the Miller-Rabin test is composite is about 2^(−56), and for a 1024-bit in-
teger the probability will be even smaller [7]. Thus, an integer which passes one
round of the Miller-Rabin test is prime with very high probability, and therefore
will pass any additional rounds as well. But despite the very low probability of
failing subsequent rounds, if this event happens, it is likely that the actual imple-
mentation of the prime search algorithm will continue the search and increment
the candidate.
The details of the safe-error attack against prime generation are given in what
follows. Initially, the target bit index i is set to ℓ − 1 (i.e. the most significant
bit of the exponentiation).

1. Wait until a first round of the Miller-Rabin test is passed.
2. Trigger a fault in the (2(ℓ − 1 − i) + 1)-th multiplication of the next round.
3. If the faulty round of the Miller-Rabin test is passed, the multiplication was
dummy and p^(j)_i = 0. Set i ← i − 1. If there are 2 or more rounds remaining,
go to step 2; otherwise go to step 1.
4. If the faulty round of the Miller-Rabin test is failed, the multiplication was
not dummy and p^(j)_i = 1. Set i ← i − 1 and go to step 1.
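The loop can be summarized by the following sketch; the device object and its methods (wait_first_round_pass, fault_round, total_rounds) are purely hypothetical stand-ins for the attacker's trigger and fault-injection setup:

def safe_error_attack(device, ell):
    # Recover exponent bits p_i from the most significant bit downwards.
    bits = {}
    i = ell - 1
    rounds_left = 0
    while i >= 0:
        if rounds_left < 2:                  # step 1: the last round is unusable
            device.wait_first_round_pass()
            rounds_left = device.total_rounds - 1
        # Step 2: fault the (2*(ell-1-i)+1)-th multiplication of the next round.
        passed = device.fault_round(2 * (ell - 1 - i) + 1)
        rounds_left -= 1
        if passed:                           # step 3: dummy multiplication
            bits[i] = 0
        else:                                # step 4: real multiplication;
            bits[i] = 1                      # the candidate is incremented
            rounds_left = 0
        i -= 1
    return bits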

Note that the last round cannot be used because if the target bit is zero, the
fault will not affect the result and the prime search will successfully terminate,
depriving the attacker of the remaining samples. For 512-bit and 1024-bit primes,

Table 1. Number of bits revealed per prime

bit length   bit pattern   probability   # of bits   MR rounds used
ℓ = 512      1             1/2           1           2
             01            1/4           2           2-3
             000           1/8           3           2-4
             001           1/8           3           2-4
ℓ = 1024     1             1/2           1           2
             01            1/4           2           2-3
             001           1/8           3           2-4
             0001          1/16          4           2-5
             00001         1/32          5           2-6
             000001        1/64          6           2-7
             0000000       1/128         7           2-8
             0000001       1/128         7           2-8

we have respectively 5 and 9 rounds, but only rounds 2 to 4 (3 rounds) and 2 to
8 (7 rounds) can be used.

On average 1.75 bits (resp. 1.98 bits) are revealed per 512-bit (resp. 1024-bit)
prime in the search interval, and at most 3 bits (resp. 7 bits). The situation
is similar to that of the fault attack from the previous subsection in that two
scenarios must be distinguished.

Free-Run Search. The attacker may disrupt the primality testing as many times
as he likes, allowing him to reveal a large portion of the bits of the initial candi-
date p^(0). When he is satisfied with the number of bits obtained, the prime test
is left unperturbed until a prime number p^(j) (close to p^(0)) is found.

Array Search. The attacker may disrupt the primality test to the extent that
prime numbers are still present in the remaining part of the search interval.
Recall that for an ℓ-bit initial candidate p, the average number of primes in the
interval p, . . . , p + 2s is:

π(p + 2s) − π(p) ≈ 2s/ln(p) ≈ 2s/(ℓ ln 2)   (7)

Above, π is the prime-counting function. For example, for an ℓ = 512-bit initial
candidate and with an array size of s = 2048, there are 11.54 prime numbers
on average, and therefore the attack reveals 20 bits in the average case. But
it is of course possible that due to a "lucky" choice, there are more primes in
the interval. For example, if there were 147 primes in the interval, this would
be enough to reveal 256 bits, and the rest of p could be calculated using a
lattice attack following the same strategy as [1]. In case the prime search is
re-started after a failure, the attacker may simply try until sufficiently many
bits are available. But for a typical choice of the interval size s, it is extremely
unlikely that the interval will ever contain sufficiently many primes.

Under the Hardy-Littlewood prime r-tuple conjecture, Gallagher proved that
the number of primes in a short interval follows a Poisson distribution [17]. As
a result, the probability to have more than k primes in an interval of size s is
about:

Pr[π(p + 2s) − π(p) ≥ k] ≈ 1 − PoissCDF(k, λ) with λ = 2s/(ℓ ln 2)   (8)

In addition, the Cumulative Distribution Function PoissCDF of a Poisson distri-
bution is easily calculated with the equation below.

PoissCDF(k, λ) = e^(−λ) Σ_{i=0}^{k} λ^i/i!   (9)

For ℓ = 512 and assuming an array of size 16,256 (corresponding to 2 KByte of
memory with a bit array implementation), the probability of having 147 primes
in the interval can be estimated to be about 6 ∗ 10^(−8). Assuming an array of size
24,576 (3 KBytes of memory), the probability is 0.22. For smaller sizes of the
interval or larger bit lengths ℓ, the probability is negligible.
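These estimates are easily reproduced numerically; a short sketch using scipy:

from math import log
from scipy.stats import poisson

ell = 512
for s in (16256, 24576):                 # the two array sizes from the text
    lam = 2 * s / (ell * log(2))         # expected number of primes (Eq. 8)
    # Pr[at least 147 primes] = 1 - PoissCDF(146, lam)
    print(s, poisson.sf(146, lam))       # compare with the values quoted above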
Figure 3 shows the theoretical Poisson probability distribution function with
λ = 2s/(ℓ ln 2) = 138.5 and the measured number of primes in an interval with
s = 24,576 and a bit length ℓ = 512. For experimental data, 1,000 different
intervals of size 2s = 2 ∗ 24,576 were randomly generated and the number of
primes in each interval was calculated. Experimental data is very close to the
theoretical value, which confirms the validity of our approximation.

[Figure: probability as a function of the number of primes (100 to 180); the
experimental probability closely follows the theoretical Poisson PDF.]

Fig. 3. Distribution of number of primes in 3KByte array



6 Countermeasures
A simple countermeasure to our attacks would be to ensure that key generation
is run in a secure environment, thereby eliminating the threat of side-channel
attacks. But a device able to run key generation in the field is undeniably more
attractive and more flexible, and eliminates infrastructure costs arising from the
requirement of a secure environment. We suggest a few countermeasures that
could prevent our attacks and allow key generation to be securely performed in
the field.

6.1 Alternative Prime Search Strategies


A naive countermeasure would be to give up trial division and choose a random
search algorithm instead, but this comes with a very high cost. Compared to the
case where 256 small primes are employed for sieving, one can expect a perfor-
mance degradation of 370%. Another possibility is to use alternative techniques
[12]. However, as long as the candidate update procedure is deterministic, the
risk of similar attacks remains.

6.2 Execution/Failure Counter


RSA key generation is generally not executed a large number of times in a
typical smart card life cycle. Imposing restrictions on the maximum number
of times key generation can be executed does not hinder a typical use of the
smart card, but can prevent several attacks such as our DPA (used for building
templates). In addition, although failure of finding a prime is something that
naturally occurs during incremental prime generation, a high number of failures
may be a sign that a fault attack is being executed. Therefore, we suggest that
the number of allowed failures should be kept relatively small, and execution
of the RSA key generation should be prevented in case too many failures have
been detected. For example, using Equation (6) and assuming that the search
interval is s = 2048 and that 1024-bit prime numbers are generated, it is easy to
see that the probability of failing 12 times is smaller than 2^(−99.9); in other words,
this should never happen in practice. Consequently, the maximum number of
allowed failures could be set to 12.

6.3 Randomizing the Primality Test


We believe that tackling the source of the problem, namely the primality test,
is a better solution. It is now a common practice to randomize RSA moduli and
exponents to prevent DPA, and it seems natural to apply the same approach to
key generation and primality tests, but the task is not trivial, because the secret
that must be protected is both the exponent and the modulus of the primality
test.
We describe a simple solution for the Fermat test, assuming that a table of
small primes (excluding 2) π2 = 3, π3 = 5, . . . , πT is available. This is usually
the case for eliminating candidates through trial division. By repeatedly selecting
small primes randomly, one can generate a random number r = Π_{i=2}^{T} π_i^(e_i),
where e_i is the multiplicity of prime π_i in the factorization of r. Since the
factorization of r is known, φ(r) can be easily calculated:

φ(r) = Π_{i=2...T, e_i>0} (π_i − 1) · π_i^(e_i − 1)   (10)

Next, with a random number a satisfying gcd(a, r ∗ p) = 1, it follows from Euler's
theorem that if p is a prime, then a^(φ(r)∗(p−1)) = 1 mod r ∗ p holds.
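A sketch of this randomized test (our own helper; it assumes p has already survived trial division, so that gcd(r, p) = 1, and the small-prime table is truncated):

import random
from math import gcd

SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]   # pi_2, ..., pi_T

def randomized_fermat(p, factors=8):
    # Draw r = prod pi_i^{e_i} by repeatedly picking small primes at random.
    exponents = {}
    for _ in range(factors):
        q = random.choice(SMALL_PRIMES)
        exponents[q] = exponents.get(q, 0) + 1
    r, phi_r = 1, 1
    for q, e in exponents.items():
        r *= q ** e
        phi_r *= (q - 1) * q ** (e - 1)    # Equation (10)
    # Random base a with gcd(a, r*p) = 1.
    while True:
        a = random.randrange(2, r * p)
        if gcd(a, r * p) == 1:
            break
    # Euler: if p is prime (and gcd(r, p) = 1), a^(phi(r)*(p-1)) = 1 mod r*p.
    return pow(a, phi_r * (p - 1), r * p) == 1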
It is clear that if p is composite, a Fermat liar a for which a^(p−1) = 1 mod p
yields a liar for the randomized test; therefore, the number of liars of the ran-
domized Fermat test is strictly larger than the number of liars of the normal
Fermat test.
nating composites. In order to assess primality in a reliable way, a candidate that
passes the randomized Fermat test could be checked with several rounds of the
(non-randomized) Miller-Rabin test. Since this step takes place only once, the
number of samples will remain very small, making template attacks extremely
difficult.
Together with a randomized trial division step, we believe that a randomized
primality step can effectively prevent all of the attacks presented in this paper.
As long as r is not too large (e.g. 32 or 64 bits), the impact on performance
should remain negligible.

7 Conclusion
We presented four different side-channel attacks against prime generation. The
first one is a DPA attack that can reveal a few of the least significant bits of
prime candidates. It is not intended to be used alone but is merely the first
step of a more complex attack. The second one is a template attack targeting
the most significant bits or prime candidates, which are left unchanged by the
incremental search. If necessary, the template building phase can be realized
with our first attack. Since primality testing is expected to be repeated several
times, the attack can take advantage of averaging and has a very high potential
against unprotected implementations. The practicality of our first and second
attack was confirmed with experimental results. The third one is a fault attack
preventing the prime search from terminating too quickly, thereby increasing the
number of samples available for power analysis. By combining the first, second
and third attack, it is possible to gather an arbitrarily high number of samples
for building templates, and depending on implementation parameters, several
dozens to several hundreds of samples for the template matching phase and DPA.
The last one is a safe-error attack, effective when the exponentiations in primality
testing involve dummy operations. The attack can break free-run incremental
search algorithms, but not interval search algorithms, at least not for practical
choices of the interval size. Finally, we proposed several countermeasures against
our attacks, including a randomized variant of the Fermat test.

While the scope of the attacks presented in this paper is limited to incremental
prime search, this does not mean that other search strategies, especially those using
"deterministic" update methods of the prime candidate, are immune. We leave
this topic open for future research.

References
1. Finke, T., Gebhardt, M., Schindler, W.: A New Side-Channel Attack on RSA
Prime Generation. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747,
pp. 141–155. Springer, Heidelberg (2009)
2. Common Criteria Portal: Security Targets of ICs, Smart Cards and Smart Card-
Related Devices and Systems,
http://www.commoncriteriaportal.org/products/ (retrieved in December 2011)
3. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
4. Chari, S., Rao, J., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç, Ç.K.,
Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
5. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Cryp-
tographic Protocols for Faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS,
vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
6. Menezes, A., van Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography.
In: Public-Key Parameters, ch. 4. CRC Press (1996)
7. Damgård, I., Landrock, P., Pomerance, C.: Average Case Error Estimates for the
Strong Probable Prime Test. Mathematics of Computation 61(203), 177–194 (1993)
8. Brandt, J., Damgård, I., Landrock, P.: Speeding up Prime Number Generation. In:
Matsumoto, T., Imai, H., Rivest, R.L. (eds.) ASIACRYPT 1991. LNCS, vol. 739,
pp. 440–449. Springer, Heidelberg (1993)
9. Brandt, J., Damgård, I.B.: On Generation of Probable Primes by Incremental
Search. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 358–370.
Springer, Heidelberg (1993)
10. Silverman, R.D.: Fast Generation of Random, Strong RSA Primes. Crypto-
bytes 3(1), 9–13 (1997)
11. Federal Information Processing Standards: Digital Signature Standard (DSS).
FIPS PUB 186-3 (2009)
12. Joye, M., Paillier, P., Vaudenay, S.: Efficient Generation of Prime Numbers. In:
Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS, vol. 1965, pp. 340–354. Springer,
Heidelberg (2000)
13. Joye, M., Paillier, P.: Fast Generation of Prime Numbers on Portable Devices:
An Update. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp.
160–173. Springer, Heidelberg (2006)
14. Clavier, C., Coron, J.-S.: On the Implementation of a Fast Prime Generation Al-
gorithm. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp.
443–449. Springer, Heidelberg (2007)
15. Yen, S.-M., Joye, M.: Checking Before Output Not Be Enough Against Fault-Based
Cryptanalysis. IEEE Trans. Computers 49(9), 967–970 (2000)
16. Riscure: Diode Laser Station DLS 1.0.0714 Datasheet (2011)
17. Gallagher, P.X.: On the Distribution of Primes in Short Intervals. Mathematika 23,
4–9 (1976)
A Fault Attack on the LED Block Cipher

Philipp Jovanovic, Martin Kreuzer, and Ilia Polian

Fakultät für Informatik und Mathematik
Universität Passau
D-94030 Passau, Germany

Abstract. A fault-based attack on the new low-cost LED block cipher
is reported. Parameterized sets of key candidates called fault tuples are
generated, and filtering techniques are employed to quickly eliminate
fault tuples not containing the correct key. Experiments for LED-64 show
that the number of remaining key candidates is practical for performing
brute-force evaluation even for a single fault injection. The extension of
the attack to LED-128 is also discussed.

Keywords: differential fault analysis, fault based attack, cryptanalysis,
LED, block cipher.

1 Introduction
Ubiquitous computing is enabled by small mobile devices, many of which process
sensitive personal information, including financial and medical data. These data
must be protected against unauthorized access using cryptographic methods.
The strength of cryptographic protection is determined by the (in)feasibility of
deriving secret information by an unauthorized party (attacker). On the other
hand, the acceptable complexity of cryptographic algorithms implementable on
mobile devices is typically restricted by stringent cost constraints and by power
consumption limits due to battery life-time and heat dissipation issues. There-
fore, methods which balance between a low implementation complexity and an
adequate level of protection have recently received significant interest [4,5].
Fault-based cryptanalysis [1] has emerged as a practical and effective technique
to break cryptographic systems, i.e., gain unauthorized access to the secret infor-
mation. Instead of attacking the cryptographic algorithm, a physical disturbance
(fault) is induced in the hardware on which the algorithm is executed. Means to
induce faults include parasitic charge-carrier generation by a laser beam; manipu-
lation of the circuit’s clock; and reduction of the circuit’s power-supply voltage [3].
Most fault-based attacks are based on running the cryptographic algorithm sev-
eral times, in presence and in absence of the disturbance. The secret information is
then derived from the differences between the outcomes of these calculations. The
success of a fault attack critically depends on the spatial and temporal resolution
of the attacker’s equipment. Spatial resolution refers to the ability to accurately
select the circuit element to be manipulated; temporal resolution stands for the
capacity to precisely determine the time (clock cycle) and the duration of fault


injection. Several previously published attacks make different assumptions about
vulnerable elements of the circuit accessible to the attacker and the required spa-
tial and temporal resolutions [6,8].
In this paper, we present a new fault-based attack on the LED block cipher [10],
a recently introduced low-cost cryptographic system specifically designed for
resource-constrained hardware implementations. The LED is a derivative of the
Advanced Encryption Standard (AES) [2], but can be implemented using less
resources. We demonstrate that the 64-bit key version of the LED cipher can
still be broken by a fault attack that uses the same rather weak assumptions on
the spatial resolution as an earlier attack targeting AES [9,11]. In the course
of the attack, relations between key bits are expressed by algebraic equations.
While the system of equations is significantly more complex than for AES, some
simplifications are sufficient to reduce the number of possible key candidates to
a value practical for brute-force analysis.
During the attack, sets of key candidates described by a parametrized data
structure called fault tuple are generated. Novel advanced filtering techniques
help to quickly identify (and discard) fault tuples which definitely do not cor-
respond to candidate sets containing the correct key. Experiments on a large
number of instances show that, when all filtering techniques are used, a single
fault injection is sufficient to break the cipher. The number of key candidates can
be further reduced by repeated fault injection. We also describe an extension of
the attack to the more expensive LED-128 cipher which assumes better control
of the circuit by the attacker.
The remainder of the paper is organized as follows. The 64-bit and 128-bit
versions of the LED cipher are described in the next section. The operation of
LED-64 with an injected fault is described in Section 3 and used to derive fault
equations. Techniques for generating and filtering the key candidates produced
by the attack are the subject of Section 4. Experimental results showing the
efficiency of the filtering techniques are reported in Section 5. Finally, Section 6
on variants of the attack and Section 7 containing our conclusions finish the
paper.

2 The LED Block Cipher

In this section we briefly recall the design of the block cipher LED, as specified
in [10]. It is immediately apparent that the specification of LED has many parallels
to the well-known block cipher AES. The LED cipher uses 64-bit blocks as states
and accepts 64- and 128-bit keys. Our main focus in this paper will be the
version having 64-bit keys which we will denote by LED-64. Other key lengths,
e.g. the popular choice of 80 bits, are padded to 128 bits by appending zeros
until the desired key length is reached. Depending on the key size, the encryption
algorithm performs 32 rounds for LED-64 and 48 rounds for LED-128. Later in
this section we will describe the components of such a round.
The 64-bit state of the cipher is conceptually arranged in a 4 × 4 matrix,
where each 4-bit sized entry is identified with an element of the finite field

F16 ≅ F2[X]/⟨X^4 + X + 1⟩. In the following, we represent an element g ∈ F16,
with g = c3·X^3 + c2·X^2 + c1·X + c0 and ci ∈ F2, by

g ↦ c3 || c2 || c1 || c0

Here || denotes the concatenation of bits. In other words, this mapping identifies
an element of F16 with a bit string. For example, the polynomial X^3 + X + 1 has
the coefficient vector (1, 0, 1, 1) and is mapped to the bit string 1011. Note that
we write 4-bit strings always in their hexadecimal short form, i.e. 1011 = B.
First, a 64-bit plaintext unit m is considered as a 16-fold concatenation of
4-bit strings m0 || m1 || · · · || m14 || m15. Then these 4-bit strings are identified
with elements of F16 and arranged row-wise in a matrix of size 4 × 4:

    ( m0  m1  m2  m3  )
m = ( m4  m5  m6  m7  )
    ( m8  m9  m10 m11 )
    ( m12 m13 m14 m15 )

Likewise, the key is arranged in one or two matrices of size 4 × 4 over F16,
according to its size of 64 bits or 128 bits:

    ( k0  k1  k2  k3  )                     ( k16 k17 k18 k19 )
k = ( k4  k5  k6  k7  )   and possibly k̃ = ( k20 k21 k22 k23 )
    ( k8  k9  k10 k11 )                     ( k24 k25 k26 k27 )
    ( k12 k13 k14 k15 )                     ( k28 k29 k30 k31 )

Figure 1 below describes the way in which the encryption algorithm of LED op-
erates. It exhibits a special feature of this cipher – there is no key schedule.
On the one hand, this makes the implementation especially light-weight. On the
other hand, it may increase the cipher's vulnerability to various attacks.

[Figure: the 64-bit variant XORs the key k before the first group of four rounds,
between consecutive groups, and after the last group; the 128-bit variant applies
k and k̃ alternately at these positions.]

Fig. 1. LED key usage: 64-bit key (top) and 128-bit key (bottom)

Notice that key additions are performed only after four rounds have been executed. The
authors of the original paper [10] call these four rounds a single Step. Key ad-
ditions are effected by the function AddRoundKey (AK). It performs an addition
of the state matrix and the matrix representing the key using bitwise XOR. It
is applied for input- and output-whitening as well as after every fourth round.

We remark again that the original keys are used without further modification as
round keys.
Now we examine one round of the LED encryption algorithm. It is composed of
several operations. Figure 2 provides a rough overview. All matrices are defined

AddConstants SubCells ShiftRows MixColumnsSerial


S S S S
S S S S
4 cells
S S S S
S S S S
4 cells
element of F16

Fig. 2. An overview of a single round of LED

over the field F16 . The final value of the state matrix yields the 64-bit ciphertext
unit c in the obvious way. Let us have a look at the individual steps.
AddConstants (AC). For each round, a round constant consisting of a tuple
of six bits (b5 , b4 , b3 , b2 , b1 , b0 ) is defined as follows. Before the first round, we
start with the zero tuple. In consecutive rounds, we start with the previous
round constant. Then we shift the six bits one position to the left. The new
value of b0 is computed as b5 + b4 + 1. This results in the round constants whose
hexadecimal values are given in Table 1. Next, the round constant is divided into

Table 1. The LED round constants


Rounds Constants
1-24 01,03,07,0F,1F,3E,3D,3B,37,2F,1E,3C,39,33,27,0E,1D,3A,35,2B,16,2C,18,30
25-48 21,02,05,0B,17,2E,1C,38,31,23,06,0D,1B,36,2D,1A,34,29,12,24,08,11,22,04

x = b5 || b4 || b3 and y = b2 || b1 || b0, where we interpret x and y as elements
of F16. Finally, we form the matrix

( 0  x  0  0 )
( 1  y  0  0 )
( 2  x  0  0 )
( 3  y  0  0 )

and add it to the state matrix. (In the current setting, matrix addition is
nothing but bitwise XOR.)
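The constants of Table 1 can be reproduced with the following sketch of this update rule (our own helper; b5 and b4 are taken from the previous constant):

def led_round_constants(rounds=48):
    rc, out = 0, []
    for _ in range(rounds):
        b0 = ((rc >> 5) ^ (rc >> 4) ^ 1) & 1     # b0 = b5 + b4 + 1
        rc = ((rc << 1) & 0x3F) | b0             # shift left, insert b0
        out.append(rc)
    return out           # 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3E, ... as in Table 1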
SubCells (SC). Each entry x of the state matrix is replaced by the element
S[x] from the SBox given in Table 2. (This particular SBox was first used by the
block cipher PRESENT, see [5].)
ShiftRows (SR). For i = 1, 2, 3, 4, the i-th row of the state matrix is shifted
cyclically to the left by i − 1 positions.

Table 2. The LED SBox

x 0 1 2 3 4 5 6 7 8 9 A B C D E F
S[x] C 5 6 B 9 0 A D 3 E F 8 4 7 1 2

MixColumnsSerial (MCS). Each column v of the state matrix is replaced by
the product M · v, where M is the matrix¹

    ( 4  1  2  2 )
M = ( 8  6  5  6 )
    ( B  E  A  9 )
    ( 2  2  F  B )

¹ In the specification of LED in the original paper [10], the first row of M is given as
4 2 1 1. This appears to be a mistake, as the results computed starting with these
values do not match those presented for the test examples later in the paper. The
matrix M used here is taken from the original authors' reference implementation
of LED and gives the correct results for the test examples.
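For reference, the three data-path operations of a round can be sketched as follows; the helper names are ours, and the state is represented as a 4 × 4 list of integers in the range 0..15:

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]            # Table 2
M = [[0x4, 0x1, 0x2, 0x2], [0x8, 0x6, 0x5, 0x6],
     [0xB, 0xE, 0xA, 0x9], [0x2, 0x2, 0xF, 0xB]]

def gf16_mul(a, b):
    # Multiplication in F16 = F2[X]/(X^4 + X + 1).
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13                  # reduce by X^4 + X + 1 (0b10011)
    return r

def sub_cells(state):
    return [[SBOX[x] for x in row] for row in state]

def shift_rows(state):
    # Row i (1-based) is rotated left by i - 1 positions.
    return [row[i:] + row[:i] for i, row in enumerate(state)]

def mix_columns_serial(state):
    # Each column v is replaced by the product M * v over F16.
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for r in range(4):
            acc = 0
            for t in range(4):
                acc ^= gf16_mul(M[r][t], state[t][c])
            out[r][c] = acc
    return out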

3 Fault Equations for LED-64


In this section we describe a way to cryptanalyze LED-64, the 64-bit version of the
LED block cipher, by fault induction. Our fault model assumes that an attacker
is capable of inducing a fault in a particular 4-bit entry of the state matrix at
a specified point during the encryption algorithm, changing it to a random and
unknown value. The attack is based on solving fault equations derived from
the propagation of this fault through the remainder of the encryption algorithm.
In the following we explain the construction of these fault equations.
The attack starts with a fault injection at the beginning of round r = 30. The
attacker then watches the error spread over the state matrix in the course of the
last three rounds. Figure 3 shows the propagation of a fault injected in the first
entry of the state matrix during the encryption. Every square depicts the XOR
difference of the correct and the faulty cipher state during that particular phase
of the last three encryption rounds.
In the end the attacker has two ciphertexts, the correct c = c0 || ... || c15 and
the faulty c′ = c′0 || ... || c′15, with ci, c′i ∈ F16. By working backwards from this
result, we construct equations that describe relations between c and c′. Such
relations exist, because the difference between c and c′ is due to a single faulty
state matrix entry at the beginning of round 30.
With the help of those equations we then try to limit the space of all possible
keys, such that we are able to perform a brute force attack, or in the best case,
get the secret key directly. Next, we discuss the method to establish the fault
equations.

3.1 Inversion of LED Steps


We consider c resp. c′ as a starting point and invert every operation of the
encryption until the beginning of round r = 30. The 4-bit sized elements ki with
0 ≤ i ≤ 15 of the key are viewed as indeterminates.

[Figure: XOR differences between the correct and the faulty state. In round 30 the
single faulty cell f becomes f′ after SubCells and the column (4f′, 8f′, Bf′, 2f′)
after MixColumnsSerial; in round 31 this column turns into the fault values
a, b, c, d after SubCells and then spreads to the full state; in round 32 the
differences q0, ..., q15 before and p0, ..., p15 after MixColumnsSerial reach the
ciphertext through the final AddRoundKey.]

Fig. 3. Fault propagation in the LED cipher

The following steps list the
expressions one has to compute to finally get the fault equations.

1. AK^-1: ci + ki and c′i + ki.
2. MCS^-1: Use the inverse matrix

       ( C  C  D  4 )
M^-1 = ( 3  8  4  5 )
       ( 7  6  2  E )
       ( D  9  9  D )

of the matrix M from the MCS operation to get the expressions

C · (c0 + k0) + C · (c4 + k4) + D · (c8 + k8) + 4 · (c12 + k12)   resp.
C · (c′0 + k0) + C · (c′4 + k4) + D · (c′8 + k8) + 4 · (c′12 + k12).

Obviously the other expressions are computed in a similar way.
3. SR^-1: As the operation only shifts the entries of the state matrix, the com-
puted expressions are unaffected.
4. SC^-1: Inverting the SC operation results in

S^-1(C · (c0 + k0) + C · (c4 + k4) + D · (c8 + k8) + 4 · (c12 + k12))   resp.
S^-1(C · (c′0 + k0) + C · (c′4 + k4) + D · (c′8 + k8) + 4 · (c′12 + k12)),

where S^-1 is the inverse of the LED SBox. The remaining expressions are
computed in the same way again.

3.2 Generation of Fault Equations

The XOR difference between the two related expressions, one derived from c and
the other one from c′, is computed and identified with the corresponding fault
value, which can be read off the fault propagation in Figure 3 above. Thus we
get

4 · a = S^-1(C · (c0 + k0) + C · (c4 + k4) + D · (c8 + k8) + 4 · (c12 + k12)) +
        S^-1(C · (c′0 + k0) + C · (c′4 + k4) + D · (c′8 + k8) + 4 · (c′12 + k12)).
In summary one gets 16 fault equations for a fault injected at a particular 4-bit
element of the state matrix at the beginning of round r = 30. For the rest of the
paper we will denote the equations by Ex,i , where x ∈ {a, b, c, d} identifies the
block the equation belongs to and i ∈ {0, 1, 2, 3} the number of the equation as
ordered below. Let us list those 16 equations.

4·a = S^-1(C·(c0+k0) + C·(c4+k4) + D·(c8+k8) + 4·(c12+k12)) +
      S^-1(C·(c′0+k0) + C·(c′4+k4) + D·(c′8+k8) + 4·(c′12+k12))   (Ea,0)
8·a = S^-1(3·(c3+k3) + 8·(c7+k7) + 4·(c11+k11) + 5·(c15+k15)) +
      S^-1(3·(c′3+k3) + 8·(c′7+k7) + 4·(c′11+k11) + 5·(c′15+k15))   (Ea,1)
B·a = S^-1(7·(c2+k2) + 6·(c6+k6) + 2·(c10+k10) + E·(c14+k14)) +
      S^-1(7·(c′2+k2) + 6·(c′6+k6) + 2·(c′10+k10) + E·(c′14+k14))   (Ea,2)
2·a = S^-1(D·(c1+k1) + 9·(c5+k5) + 9·(c9+k9) + D·(c13+k13)) +
      S^-1(D·(c′1+k1) + 9·(c′5+k5) + 9·(c′9+k9) + D·(c′13+k13))   (Ea,3)

2·d = S^-1(C·(c1+k1) + C·(c5+k5) + D·(c9+k9) + 4·(c13+k13)) +
      S^-1(C·(c′1+k1) + C·(c′5+k5) + D·(c′9+k9) + 4·(c′13+k13))   (Ed,0)
6·d = S^-1(3·(c0+k0) + 8·(c4+k4) + 4·(c8+k8) + 5·(c12+k12)) +
      S^-1(3·(c′0+k0) + 8·(c′4+k4) + 4·(c′8+k8) + 5·(c′12+k12))   (Ed,1)
9·d = S^-1(7·(c3+k3) + 6·(c7+k7) + 2·(c11+k11) + E·(c15+k15)) +
      S^-1(7·(c′3+k3) + 6·(c′7+k7) + 2·(c′11+k11) + E·(c′15+k15))   (Ed,2)
B·d = S^-1(D·(c2+k2) + 9·(c6+k6) + 9·(c10+k10) + D·(c14+k14)) +
      S^-1(D·(c′2+k2) + 9·(c′6+k6) + 9·(c′10+k10) + D·(c′14+k14))   (Ed,3)

2·c = S^-1(C·(c2+k2) + C·(c6+k6) + D·(c10+k10) + 4·(c14+k14)) +
      S^-1(C·(c′2+k2) + C·(c′6+k6) + D·(c′10+k10) + 4·(c′14+k14))   (Ec,0)
5·c = S^-1(3·(c1+k1) + 8·(c5+k5) + 4·(c9+k9) + 5·(c13+k13)) +
      S^-1(3·(c′1+k1) + 8·(c′5+k5) + 4·(c′9+k9) + 5·(c′13+k13))   (Ec,1)
A·c = S^-1(7·(c0+k0) + 6·(c4+k4) + 2·(c8+k8) + E·(c12+k12)) +
      S^-1(7·(c′0+k0) + 6·(c′4+k4) + 2·(c′8+k8) + E·(c′12+k12))   (Ec,2)
F·c = S^-1(D·(c3+k3) + 9·(c7+k7) + 9·(c11+k11) + D·(c15+k15)) +
      S^-1(D·(c′3+k3) + 9·(c′7+k7) + 9·(c′11+k11) + D·(c′15+k15))   (Ec,3)

1·b = S^-1(C·(c3+k3) + C·(c7+k7) + D·(c11+k11) + 4·(c15+k15)) +
      S^-1(C·(c′3+k3) + C·(c′7+k7) + D·(c′11+k11) + 4·(c′15+k15))   (Eb,0)
6·b = S^-1(3·(c2+k2) + 8·(c6+k6) + 4·(c10+k10) + 5·(c14+k14)) +
      S^-1(3·(c′2+k2) + 8·(c′6+k6) + 4·(c′10+k10) + 5·(c′14+k14))   (Eb,1)
E·b = S^-1(7·(c1+k1) + 6·(c5+k5) + 2·(c9+k9) + E·(c13+k13)) +
      S^-1(7·(c′1+k1) + 6·(c′5+k5) + 2·(c′9+k9) + E·(c′13+k13))   (Eb,2)
2·b = S^-1(D·(c0+k0) + 9·(c4+k4) + 9·(c8+k8) + D·(c12+k12)) +
      S^-1(D·(c′0+k0) + 9·(c′4+k4) + 9·(c′8+k8) + D·(c′12+k12))   (Eb,3)

Here the fault values a, b, c and d are unknown and thus have to be considered
indeterminates. Of course, for a concrete instance of the attack, we assume that
we are given the correct ciphertext c and the faulty ciphertext c and we assume
henceforth that these values have been substituted in the fault equations.

4 Key Filtering
The correct key satisfies all the fault equations derived above. Our attack is based
on quickly identifying large sets of key candidates which are inconsistent with
some of the fault equations and excluding these sets from further consideration.
The attack stops when the number of remaining key candidates is so small
that exhaustive search becomes feasible. Key candidates are organized using a
formalism called fault tuples (introduced below), and filters work directly on
fault tuples. The outline of our approach is as follows:

1. Key Tuple Filtering: Filter the key tuples and obtain the fault tuples to-
gether with their key candidate sets. (Section 4.1; this stage is partly inspired
by the evaluation of the fault equations in [9] and [11]).
2. Key Set Filtering: Filter the fault tuples to eliminate some key candidate
sets (Section 4.2).
3. Exhaustive Search: Find the correct key by considering every remaining
key candidate.

Details on the individual stages and the parameter choice for the attacks are
given below.

4.1 Key Tuple Filtering


In the following we let x be an element of {a, b, c, d} and i ∈ {0, 1, 2, 3}. Each
equation E_{x,i} depends on only four key indeterminates. In the first stage, we
start by computing for each equation E_{x,i} a list S_{x,i} of length 16. The j-th entry
of S_{x,i}, denoted S_{x,i}(j), is the set of all 4-tuples of values of key indeterminates
which produce the j-th field element as a result of evaluating equation E_{x,i} at
these values. Notice that we have to check 16^4 tuples of elements of F16 in order
to generate one S_{x,i}(j). The computation of all entries S_{x,i}(j) requires merely
16^5 evaluations of simple polynomials over F16. Since all entries are independent
from each other, the calculations can be performed in parallel using multiple
processors.
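A sketch of this first stage for a single equation, reusing gf16_mul and SBOX from the sketch in Section 2 (the function build_S and its parameters are our own names):

from itertools import product

INV_SBOX = [SBOX.index(x) for x in range(16)]   # inverse of the LED SBox

def gf16_inv(a):
    # Multiplicative inverse in F16, by brute force (a != 0).
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)

def build_S(lhs, coeffs, c_nibbles, cf_nibbles):
    # S[j] collects all key 4-tuples for which the right-hand side of the
    # fault equation equals lhs * j, i.e. for which the fault value is j.
    S = [[] for _ in range(16)]
    inv = gf16_inv(lhs)
    for key in product(range(16), repeat=4):    # 16^4 candidate tuples
        u = v = 0
        for m, c, cf, k in zip(coeffs, c_nibbles, cf_nibbles, key):
            u ^= gf16_mul(m, c ^ k)
            v ^= gf16_mul(m, cf ^ k)
        j = gf16_mul(INV_SBOX[u] ^ INV_SBOX[v], inv)
        S[j].append(key)
    return S

# E.g. for E_{a,0}:  S_a0 = build_S(0x4, [0xC, 0xC, 0xD, 0x4],
#                                   (c[0], c[4], c[8], c[12]),
#                                   (cf[0], cf[4], cf[8], cf[12]))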
In the next step, we determine, for every x ∈ {a, b, c, d} the set of possible val-
ues jx of x such that Sx,0 (jx ), Sx,1 (jx ), Sx,2 (jx ) and Sx,3 (jx ) are all non-empty.
In other words, we are looking for jx which can occur on the left-hand side of
equations Ex,0 , Ex,1 , Ex,2 and Ex,3 for some possible values of key indetermi-
nates. We call an identified value jx ∈ F16 a possible fault value of x.
By combining the possible fault values of a, b, c, d in all available ways, we ob-
tain tuples t = (ja , jd , jc , jb ) which we call fault tuples of the given pair (c, c ). For
each fault tuple, we intersect those sets Sx,i (jx ) which correspond to equations
involving the same key indeterminates:

(k0, k4, k8, k12): S_{a,0}(j_a) ∩ S_{d,1}(j_d) ∩ S_{c,2}(j_c) ∩ S_{b,3}(j_b)
(k1, k5, k9, k13): S_{a,3}(j_a) ∩ S_{d,0}(j_d) ∩ S_{c,1}(j_c) ∩ S_{b,2}(j_b)
(k2, k6, k10, k14): S_{a,2}(j_a) ∩ S_{d,3}(j_d) ∩ S_{c,0}(j_c) ∩ S_{b,1}(j_b)
(k3, k7, k11, k15): S_{a,1}(j_a) ∩ S_{d,2}(j_d) ∩ S_{c,3}(j_c) ∩ S_{b,0}(j_b)

By recombining the key values (k0 , . . . , k15 ) using all possible choices in these
four intersections, we arrive at the key candidate set for the given fault tuple. If
the size of the key candidate sets is sufficiently small, it is possible to skip the
second stage of the attack and to search all key candidate sets exhaustively for
the correct key.
Each of the intersections in the above picture contains typically 2^4 – 2^8 el-
ements. Consequently, the typical size of a key candidate set is in the range
2^19 – 2^26. Unfortunately, often several fault tuples are generated. The key candi-
date sets corresponding to different fault tuples are necessarily pairwise disjoint
by their construction. Only one of them contains the true key, but up to now we
lack a way to distinguish the correct key candidate set (i.e. the one containing
the true key) from the wrong ones. Before we address this problem in the next
section, we illustrate the key set filtering by an example.

Example 1. In this example we take one of the official test vectors from the LED
specification and apply our attack. It is given by

k = 01234567 89ABCDEF
m = 01234567 89ABCDEF
c  = FDD6FB98 45F81456
c′ = 51B8AB31 169AC161

where the faulty ciphertext c′ is obtained when injecting the error e = 8 in the
first entry of the state matrix at the beginning of the 30-th round. Although the
attack is independent of the value of the error, we use a specific one here in order
to enable the reader to reproduce our results. Evaluation of the fault equations
provides us with the following table:

a     0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sa,0 0  2^14  0     2^14  0     0     0     0     0     2^14  0     2^14  0     0     0  0
#Sa,1 0  0     0     0     0     0     0     0     2^14  2^14  0     0     2^14  2^14  0  0
#Sa,2 0  0     0     0     2^14  0     0     2^14  0     2^14  2^14  0     0     0     0  0
#Sa,3 0  0     2^13  0     2^13  0     2^13  2^13  2^13  2^14  0     0     0     0     0  2^13
d     0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sd,0 0  2^13  2^13  2^13  0     0     0     2^13  0     0     2^13  2^14  0     2^13  0  0
#Sd,1 0  2^13  2^13  2^13  2^14  0     2^13  0     0     0     2^13  0     2^13  0     0  0
#Sd,2 0  0     2^14  2^13  0     0     2^13  0     2^13  2^13  0     2^13  0     0     0  2^13
#Sd,3 0  2^13  2^13  0     2^13  0     0     2^13  0     2^13  2^13  0     2^13  0     0  2^13
c     0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sc,0 0  0     0     2^14  2^14  0     0     0     2^13  0     0     2^13  2^13  0     0  2^13
#Sc,1 0  2^13  0     0     0     0     0     2^13  2^13  2^13  2^13  2^14  0     2^13  0  0
#Sc,2 0  2^13  0     0     0     2^13  2^14  0     2^14  0     0     2^13  0     0     0  2^13
#Sc,3 0  0     2^13  2^13  0     0     0     0     2^14  2^13  0     2^13  2^13  0     0  2^13
b     0  1     2     3     4     5     6     7     8     9     A     B     C     D     E  F
#Sb,0 0  0     0     2^13  0     2^13  0     0     0     2^14  0     2^13  0     2^13  0  2^14
#Sb,1 0  0     0     0     2^13  2^13  2^14  0     2^13  2^13  0     2^14  0     0     0  0
#Sb,2 0  2^13  0     2^14  2^13  2^13  0     2^13  0     0     0     2^13  2^13  0     0  0
#Sb,3 0  2^14  2^14  0     0     2^14  2^14  0     0     0     0     0     0     0     0  0
From this we see that there are two fault tuples, namely (9, 2, 8, 5) and (9, 2, B, 5).
The corresponding key candidate sets have 2^24 and 2^23 elements, respectively.
The problematic equations are obviously the equations Ec,i for i ∈ {0, 1, 2, 3}:
there are two possible fault values, namely 8 and B. So far we have no way of
deciding which set contains the key and thus have to search through both of
them. Actually, in this example the correct key is contained in the candidate
set corresponding to the fault tuple (9, 2, B, 5).

4.2 Key Set Filtering


In the following we study the problem of how to decide whether a key candidate
set contains the true key or not.
Let xi ∈ F16 with i ∈ {0, 4, 8, 12} be the elements of the first column of the
state matrix at the beginning of round r = 31. The fault propagation in Figure 3
implies the following equations for the faulty elements xi′:

x0′ = x0 + 4f′     x8′  = x8  + Bf′
x4′ = x4 + 8f′     x12′ = x12 + 2f′

Next, let yi ∈ F16 be the values that we get after adding the round constants to
the elements xi and plugging the result into the SBox. These values satisfy

S(x0 + 0)  = y0     S(x0′ + 0)  = y0  + a
S(x4 + 1)  = y4     S(x4′ + 1)  = y4  + b
S(x8 + 2)  = y8     S(x8′ + 2)  = y8  + c
S(x12 + 3) = y12    S(x12′ + 3) = y12 + d

Now we apply the inverse SBox to these equations and take the differences of
the equations involving the same elements yi . The result is the following system:

4f′ = S^{-1}(y0)  + S^{-1}(y0 + a)
8f′ = S^{-1}(y4)  + S^{-1}(y4 + b)
Bf′ = S^{-1}(y8)  + S^{-1}(y8 + c)
2f′ = S^{-1}(y12) + S^{-1}(y12 + d)

Finally, we are ready to use a filter mechanism similar to the one in the preceding
subsection. For a given fault tuple (a, d, c, b), we try all possible values of the
elements yi and check whether there is one for which the system has a solution
for f′. Thus we have to check four equations over F16 for consistency. This is
easy enough and can also be done in parallel. If there is no solution for f′, we
discard the entire candidate set. While we are currently not using the absolute
values yi for the attack, we are exploring possible further speed-up techniques
based on these values.
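
A minimal sketch of this consistency check, assuming LED's S-box (the PRESENT S-box [10]) and a gf16_mul routine as in the sketch of Section 4.1; the pairing of coefficients and differences follows the four equations above.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]      # LED/PRESENT S-box
INV_SBOX = [SBOX.index(v) for v in range(16)]

def keep_fault_tuple(a, d, c, b, mul):
    """True iff some fault value f' > 0 makes all four equations solvable."""
    pairs = ((4, a), (8, b), (0xB, c), (2, d))        # (coefficient, difference)
    for f in range(1, 16):
        if all(any(INV_SBOX[y] ^ INV_SBOX[y ^ delta] == mul(coeff, f)
                   for y in range(16))
               for coeff, delta in pairs):
            return True
    return False
```

Each equation is tested independently over the 16 possible values of its yi, so the whole check costs at most 15 × 4 × 16 S-box lookups per fault tuple.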

4.3 Temporal and Spatial Aspects of the Attack

The effect of the attack depends strongly on injecting the fault in round 30:

1. Injecting the fault at an earlier round does not lead to useful fault equations,
since they would depend on all key elements k0 , . . . , k15 and no meaningful
key filtering would be possible.
2. Injecting the fault in a later round results in weaker fault equations which
do not rule out enough key candidates to make exhaustive search feasible.
3. If the fault is injected in round 30 at an entry of the state matrix other than
the first, one gets different equations. However, they make the same kind of
key filtering possible as the equations in Section 3. Thus, if we allow fault
injections at random entries of the state matrix in round 30, the overall time
complexity rises only by a factor of 16.

We experimented with enhancing the attack by level-2 fault equations which go
even further back in the fault history. These equations incorporate two inverse
SBoxes and depend on all parts k0 , . . . , k15 of the key. We determined experi-
mentally that they do not bring any speed-up compared to the exhaustive search
of remaining key candidates. Therefore, we do not report the details on these
equations.

4.4 Relation to AES

Several properties of LED render it more resistant to the fault-based attack pre-
sented in this paper, compared to AES discussed in [9] and [11]. The derived
LED fault equations are more complex than their counterparts for AES [9,11].
This fact is due to the diffusion property of the MixColumnsSerial function,

which is a matrix multiplication that makes every block of the LED fault equa-
tions (Ex,j ) (Section 3.2) depend on all 16 key indeterminates. In every block we
have exactly one equation that depends on one of the key tuples (k0 , k4 , k8 , k12 ),
(k1 , k5 , k9 , k13 ), (k2 , k6 , k10 , k14 ), and (k3 , k7 , k11 , k15 ). In contrast, AES skips the
final MixColumns operation, and every block of its fault equations depends only
on four key indeterminates.
This observation suggests an interesting approach to protecting AES against the
fault attack from [9,11]. Adding a MixColumns operation to the last round of AES
makes this kind of fault attack much harder, as the time for evaluating the AES
equations rises to 2^32. Furthermore, as in the case of LED, it is possible that
several fault tuples have to be considered, further complicating the attack.

5 Experimental Results
In this section we report on some results and timings of our attack. The timings
were obtained on a 2.1 GHz AMD Opteron 6172 workstation having 48 GB
RAM. The LED cipher was implemented in C, the attack code in Python. We
performed our attack on 10000 examples using random keys, plaintexts and
faults. The faults were injected at the first element of the state matrix at the
beginning of round r = 30. On average, it took about 45 seconds to finish a single
run of the attack, including the key tuple filtering and the key set filtering. The
time for exhaustive search was not measured at this point. The execution time of
the attack could be further reduced by using a higher-performance programming
language such as C/C++ and by parallelization.
Table 3 shows the possible number of fault tuples (#ft) that appeared during
our experiments and the relation between the number of occurrences and the
cases where fault tuples could be discarded by key set filtering (Section 4.2).
For instance, column 3 (#ft = 2) reports that there were 3926 cases in which
two fault tuples were found, and 1640 of them could be eliminated using key set
filtering.

Table 3. Efficiency of key set filtering

#ft 1 2 3 4 5 6 8 9 10 12 16 18 24 36
occurred 2952 3926 351 1887 1 307 394 15 1 101 39 10 14 2
discarded - 1640 234 1410 1 268 359 14 1 101 38 10 14 2

It is clear that key set filtering is very efficient. Especially if many fault tuples
had to be considered, some of them could be discarded in almost every case.
But also in the more frequent case of a small number of fault tuples there was
a significant gain. Figure 4 shows this using a graphical representation. (Note
the logarithmic y scale.) Altogether, in about 29.5% of the examples there was
a unique fault tuple, in another 29.6% of the examples there were multiple fault
tuples, none of which could be discarded, and in about 40.9% of the examples
some of the fault tuples could be eliminated using key set filtering.

Fig. 4. Efficiency of key set filtering: number of occurrences and of discards per
number of fault tuples (logarithmic y scale)

Finally, it is interesting to see how many fault tuples can be discarded on
average. These values are collected in Table 4.

Table 4. Average number of discards

#ft 2 3 4 5 6 8 9 10 12 16 18 24 36
ødiscarded 0.4 0.9 1.4 2.0 2.5 3.6 3.7 5.0 6.1 8.4 8.4 12.6 24.0

6 Extensions of the Attack


In this section we discuss some improvements and extensions of the attack in-
troduced in Section 4.

6.1 Multiple Fault Injection


It is possible to further reduce the key space by running the attack a second time
with the same key but a different plaintext. After the second attack, all sets of
key candidates from the first and the second attack are intersected pairwise.
This eliminates many “wrong” candidate sets and greatly reduces the number of
candidates in the correct one. The following example illustrates this technique.
Example 2. We repeat the attack from Example 1 with the same key k and a
different plaintext m̃:

k = 01234567 89ABCDEF
m̃ = 10000000 10000000
c  = 04376B73 063BC443
c′ = 0E8F2863 17C57720

Again the error e = 8 is injected at the first entry of the state matrix at the
beginning of round r = 30. The key filtering stage returns two fault tuples,
(5, 7, 7, 5) and (5, 9, 7, 5), both having key candidate sets of size 2^20.
Now we form the pairwise intersections of the key candidate sets of the first
and second run. The only non-empty one contains a mere 8 key candidates from
which the correct key is found almost immediately.

Note that repeating an attack may or may not be feasible in practice. Experi-
ments demonstrate that our technique works using a single attack; several at-
tacks just further reduce the set of key candidates on which to run an exhaustive
search.
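
The pairwise intersection step amounts to a one-liner; run1_sets and run2_sets below are hypothetical collections holding the key candidate sets (one per fault tuple) produced by the first and second run.

```python
def combine_runs(run1_sets, run2_sets):
    """Keep only the non-empty pairwise intersections of candidate sets."""
    return [s1 & s2 for s1 in run1_sets for s2 in run2_sets if s1 & s2]
```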

6.2 Extension of the Attack for LED-128


LED-128 uses a 128-bit key which is split into two 64-bit keys k and k̃ used
alternately as round keys. Since k and k̃ are independent of each other,
a straightforward application of the procedure from Section 3 would result in
fault equations with too many indeterminates to allow sufficient key filtering.
Unlike AES (where reconstructing the last subkey allows the derivation of all
other subkeys from the key schedule [9]), LED-128 inherently resists the fault
attack under the assumptions of this paper.
Still, LED-128 is vulnerable to a fault attack if we assume that the attacker
has the capability assumed in previous literature ([7], p. 298). If the key is stored
in a secure memory (EEPROM) and transferred to the device’s main memory
when needed, the attacker may reset selected bytes of the key, i.e., assign them
the value of 0, during the transfer from the EEPROM to the memory. If we
can temporarily set, using this technique, the round key k̃ to zero (or any other
known value) and leave k unchanged, then a simple modification of our attack
can derive k. Using the knowledge of k, we mount a second fault attack without
manipulating k̃. This second attack is another modification of our attack and is
used to determine k̃.

7 Conclusions and Future Work


We demonstrated that the LED-64 block cipher has a vulnerability to fault-
based attacks which roughly matches that of AES. The improved protection
mechanisms of LED can be overcome using clever manipulation of subsets of key candidates,
described by fault tuples. LED-128 is more challenging, even though its strength
collapses if the attacker has the ability to set one half of the key bits to a
known value (e.g., during the transfer from a secure memory location). In the
future, we plan to implement LED in hardware and to study attacks using a
fault-injection framework. We are interested in investigating the effectiveness of
hardware protection mechanisms in detecting and preventing attempted attacks.

References
1. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Eliminating Errors
in Cryptographic Computations. J. Cryptology 14, 101–119 (2001)
2. National Institute of Standards and Technology (NIST). Advanced Encryption
Standard (AES). FIPS Publication 197 (2001),
http://www.itl.nist.gov/fipspubs/
3. Bar-El, H., Choukri, H., Naccache, D., Tunstall, M., Whelan, C.: The Sorcerer’s
Apprentice Guide to Fault Attacks. Proceedings of the IEEE 94, 370–382 (2006)
4. Hong, D., Sung, J., Hong, S., Lim, J., Lee, S., Koo, B., Lee, C., Chang, D., Lee,
J., Jeong, K., Kim, H., Kim, J., Chee, S.: HIGHT: A New Block Cipher Suitable
for Low-Resource Device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 46–59. Springer, Heidelberg (2006)
5. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw,
M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466.
Springer, Heidelberg (2007)
6. Kim, C.H., Quisquater, J.-J.: Fault Attacks for CRT Based RSA: New Attacks,
New Results, and New Countermeasures. In: Sauveron, D., Markantonakis, K.,
Bilas, A., Quisquater, J.-J. (eds.) WISTP 2007. LNCS, vol. 4462, pp. 215–228.
Springer, Heidelberg (2007)
7. Koren, I., Krishna, C.M.: Fault-Tolerant Systems. Morgan Kaufmann Publishers,
San Francisco (2007)
8. Hojsı́k, M., Rudolf, B.: Differential Fault Analysis of Trivium. In: Nyberg, K. (ed.)
FSE 2008. LNCS, vol. 5086, pp. 158–172. Springer, Heidelberg (2008)
9. Mukhopadhyay, D.: An Improved Fault Based Attack of the Advanced Encryption
Standard. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 421–
434. Springer, Heidelberg (2009)
10. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED Block Cipher. In:
Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer,
Heidelberg (2011)
11. Tunstall, M., Mukhopadhyay, D., Ali, S.: Differential Fault Analysis of the Ad-
vanced Encryption Standard Using a Single Fault. In: Ardagna, C.A., Zhou, J.
(eds.) WISTP 2011. LNCS, vol. 6633, pp. 224–233. Springer, Heidelberg (2011)
Differential Fault Analysis of Full LBlock

Liang Zhao, Takashi Nishide, and Kouichi Sakurai

Graduate School of Information Science and Electrical Engineering,
Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
zhaoliangjapan@gmail.com, nishide@inf.kyushu-u.ac.jp,
sakurai@csce.kyushu-u.ac.jp

Abstract. LBlock is a 64-bit lightweight block cipher which can be im-
plemented in both hardware environments and software platforms. It
was designed by Wu and Zhang, and published at ACNS2011. In this
paper, we explore the strength of LBlock against differential fault
analysis (DFA). As far as we know, this is the first time a DFA attack
is used to analyze LBlock. Our DFA attack adopts the random bit fault
model. When the fault is injected at the end of a round from the 25th
round to the 31st round, the DFA attack reveals the last three
round subkeys (i.e., K32, K31 and K30) by analyzing the active S-boxes,
whose input and output differences can be obtained from the right
and faulty ciphertexts (C, C′). Then, the master key can be recovered
by inverting the key scheduling. In particular, when the fault is injected
at the end of the 25th or 26th round, we show that the active S-boxes
can be distinguished from the false active S-boxes by analyzing the
nonzero differences of the ciphertext pair (C, C′). A false active S-box,
as we define it, is one whose nonzero input difference does not correspond
to the right output difference. Moreover, since LBlock achieves its best
diffusion within eight rounds, a countermeasure might protect only the
first and last eight rounds. This raises the question of whether provoking
a fault at an earlier round of LBlock can still reveal a round subkey. Our
work answers this question: the DFA attack can be used to reveal the
round subkey when the fault is injected into the 24th round. If the fault
model used in this analysis is a semi-random bit model, the round subkey
can be revealed directly. The semi-random bit model corresponds to an
adversary who knows which 4 bits are corrupted at the chosen round but
not which exact bit within these 4 bits. Finally, a data complexity
analysis and simulations show the number of faults necessary for
revealing the master key.

Keywords: Differential fault analysis (DFA), Variant Feistel structure,
Differential distribution, Key scheduling.

1 Introduction
Background. Cryptographic techniques are seen as an essential method for
confidentiality, privacy protection and data integrity. Recently, with

The first author of this research, Liang Zhao, is supported by a governmental schol-
arship from the China Scholarship Council.


the development of communications and the electronics industry, low-resource
devices such as RFID tags and sensor nodes are widely used. As this kind of
device has special features, such as small storage space, weak computation
ability and strict power constraints [1], lightweight cryptography has become
a major research focus. In particular, the lightweight block cipher is one of its
most prominent topics. Several lightweight block ciphers have been proposed,
such as PRESENT [2], KATAN/KTANTAN [3], DESL/DESXL [4],
HIGHT [5] and PRINTcipher [6]. For these lightweight block ciphers, the corre-
sponding cryptanalysis has also been developed [7–10]. For example, in [7], a
side-channel attack on PRESENT was presented. In [8], a Meet-in-the-Middle
attack on KTANTAN was proposed to reveal the master key. In [10], an invariant
subspace attack was introduced to analyze PRINTcipher.

Previous Works. Fault analysis, an active side-channel attack (SCA), is an
implementation attack, in contrast to the classical techniques of differential
and linear cryptanalysis. Fault analysis was introduced by Boneh et al. [11]
against implementations of RSA-CRT in 1996. After that, several fault-based
analyses were proposed, such as Ineffective Fault Analysis (IFA) [13], Collision
Fault Analysis (CFA) [12] and Fault Sensitivity Analysis (FSA) [14]. However,
the most discussed fault analysis is Differential Fault Analysis (DFA). The DFA
attack was first introduced by Biham and Shamir [15] for analyzing the DES
block cipher. It can be seen as an extension of differential cryptanalysis [16].
Nowadays, the DFA attack is considered a serious threat to cryptographic
implementations. For example, in [17–22], DFA attacks were proposed for
analyzing the AES and DES block ciphers and the CLEFIA lightweight block
cipher. Moreover, for stream ciphers, such attacks have also been applied to
Trivium [23], Sosemanuk [24] and HC-128 [25].
For low-cost devices such as smart cards, fault analysis is also suitable for
analyzing the security of the deployed cipher. Some techniques targeting the
software or hardware components of smart cards are known to provoke faults
during the computation process, such as inducing a spike on the power supply or
using external approaches based on a laser or a Focused Ion Beam [19].
As the secret keys embedded in a secure computing device such as a smart card
or RFID tag can be revealed by fault analysis within a feasible computational
time, fault analysis is a serious issue for lightweight ciphers.
Challenge Issues. As discussed above, the DFA attack is also suitable for
analyzing lightweight ciphers, including lightweight block ciphers. A lightweight
block cipher is usually deployed on constrained devices, which implies that the
adversary lacks large computing power. Therefore, when proposing this kind of
SCA against a lightweight block cipher, two crucial challenges should be
considered:
– Fault model: A realistic fault model with weak assumptions should be used
in the DFA attack. Given the applications of lightweight block ciphers, a
weaker adversary implies a more practical attack; e.g., two fault models with
weak assumptions are the random bit model and the random byte model.
– Round of attack: A popular and simple countermeasure against the DFA
attack is to duplicate the encryption algorithm and check whether the same
ciphertext is obtained [20]. As this protection can degrade performance, only
the computation of the last few rounds is usually doubled [20]. This implies
that the DFA attack should target an earlier round for key recovery. Since
block ciphers usually contain a diffusion function, the round at which to
mount the DFA attack needs to be chosen carefully.
Our Contributions. LBlock [1] is a new lightweight block cipher which was
presented at ACNS2011. It is based on a 32-round variant Feistel structure
with a 64-bit block size and an 80-bit key size. In [1], Wu et al. explored the
strength of LBlock against attacks such as differential cryptanalysis, integral
attacks and related-key attacks. Moreover, Minier et al. [26] recently analyzed
the differential behavior of LBlock and presented a related-key impossible
differential attack on round-reduced LBlock. However, an analysis of its
resistance to implementation attacks is still lacking. Therefore, in the current
paper, we consider the DFA attack on LBlock. For our analysis, a practical
fault model, the random bit model, is utilized, where the fault is injected at the
end of a round from the 24th round to the 31st round. In this fault model, the
adversary knows neither the fault value nor the fault position within the
injection area. To the best of our knowledge, this is the first paper that proposes
a fault analysis of full LBlock. The details are as follows:
– Firstly, if the fault is injected at the end of a round from the 25th to the 31st,
we present the general principle of the DFA attack for revealing the round
subkeys K32, K31 and K30. These three round subkeys are then used to
reveal the master key K simply by computing the inverse of the key
scheduling. In particular, when the fault is injected into the 25th or 26th
round, we introduce the concept of the false active S-box, which must be
distinguished from the active S-box.
– Secondly, as DFA attacks on earlier rounds have been introduced for AES
and DES by Derbez et al. [19] and Rivain [20], respectively, we also analyze
the DFA attack on LBlock when the fault is injected into the right part at
the end of the 24th round. If the fault model includes the assumption that
the adversary knows the position of the corrupted 4 bits in the 24th round,
we show that it is possible to reveal the round subkey K32 by the DFA
attack directly. This implies that the master key K can also be revealed.
Moreover, in order to validate the DFA attack on LBlock, data complexity
analyses are presented (see Table 6), and simulation experiments were performed
(see Table 7). The simulation results show the number of faults needed for this
attack when the fault is injected into the different rounds (i.e., from the 24th
round to the 31st round). Notably, if the fault is injected into the 25th round,
the smallest average number of faults is needed for revealing the master key K.
Organization of the Paper. In Section 2, a detailed description and some
properties of LBlock are presented. The proposed DFA attack for revealing the
master key of LBlock is then introduced in Section 3, and Section 4 shows
the corresponding data complexity and the simulation results. Concluding
remarks and possible countermeasures are given in the last section.

2 Preliminaries
In this section, we first list the notations used in this paper, then present a brief
description of LBlock, and finally give some properties of LBlock.
– M, C: 64-bit plaintext and ciphertext.
– K, Ki−1: 80-bit master key and 32-bit round subkey, i ∈ {2, 3, ..., 33}.
– F(·), P(·): Round function and diffusion function of LBlock.
– sj(·): Confusion function with 4-bit S-box sj, j ∈ {0, 1, ..., 7}.
– ⊕: Bitwise exclusive-OR operation.
– <<<, >>>: Left and right cyclic shift operations.
– ||: Concatenation operation.
– [v]2: Binary form of an integer v.

2.1 LBlock Description


LBlock is a lightweight block cipher which is based on a 32-round variant Feistel
structure with the block size of 64 bits and key size of 80 bits (see Fig. 1). The F -
function includes a round subkey addition, a confusion by eight 4-bit S-boxes sj
(0≤j≤7) and a permutation of eight 4-bit words. Specially, for the permutation
operation, it can be expressed as follows:

Z = Z7||Z6||Z5||Z4||Z3||Z2||Z1||Z0 → U = Z6||Z4||Z7||Z5||Z2||Z0||Z3||Z1

where U_{i−1} = U_{i−1}^0||U_{i−1}^1||...||U_{i−1}^7 if the result U of the
permutation operation belongs to the (i−1)th round. Let the input of one round of
the encryption be M_{i−1} = X_{i−1}||X_{i−2}; the output C_{i−1} = X_i||X_{i−1} can be expressed
as (F(X_{i−1}, K_{i−1}) ⊕ (X_{i−2} <<< 8), X_{i−1}). After the 32 rounds of encryption, the
ciphertext C = X32||X33 is obtained.
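
To make the data flow concrete, here is a minimal Python sketch of one round, under our own representation assumptions (not part of the specification text above): each state half is a list of eight 4-bit words with index j holding word X^j, S-box sj acts on word j, and the <<< 8 rotation moves word j to position (j + 2) mod 8.

```python
def lblock_f(x, k, sboxes):
    """Round function F: subkey addition, S-box layer, word permutation."""
    z = [sboxes[j][x[j] ^ k[j]] for j in range(8)]
    src = [1, 3, 0, 2, 5, 7, 4, 6]              # U_j = Z_src[j], the map above
    return [z[src[j]] for j in range(8)]

def lblock_round(x1, x2, k, sboxes):
    """(X_{i-1}, X_{i-2}) -> X_i = F(X_{i-1}, K_{i-1}) xor (X_{i-2} <<< 8)."""
    fx = lblock_f(x1, k, sboxes)
    rot = [x2[(j - 2) % 8] for j in range(8)]   # <<< 8 bits = 2 word positions
    return [fx[j] ^ rot[j] for j in range(8)]
```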

Fig. 1. One round of the LBlock cipher



The 80-bit master key K = k79 k78 k77 ... k0 is used to generate the round subkeys
Ki ∈ {0, 1}^32. This key is stored in a key register. The details of the key scheduling
are as follows:
– If i = 2, Ki−1 = k79 k78 k77 ... k49 k48, taken directly from the master key K.
– If i ∈ {3, 4, ..., 33}, the round subkey Ki−1 is obtained by the following
steps:
• (1) Rotate the key register: K <<< 29.
• (2) [k79 k78 k77 k76] = s9[k79 k78 k77 k76]; [k75 k74 k73 k72] = s8[k75 k74 k73 k72].
• (3) [k50 k49 k48 k47 k46] ⊕ [i − 2]2.
• (4) The leftmost 32 bits of the current register K are output as the
round subkey Ki−1 (a code sketch of this update follows).
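
The register update can be sketched as follows, under the assumption (ours, not the paper's) that the 80-bit register is a Python integer with k79 as the most significant bit; S8 and S9 are copied from Table 1 below.

```python
S8 = [8, 7, 14, 5, 15, 13, 0, 6, 11, 12, 9, 10, 2, 4, 1, 3]   # s8 from Table 1
S9 = [11, 5, 15, 0, 7, 2, 9, 13, 4, 8, 1, 12, 14, 10, 3, 6]   # s9 from Table 1
MASK80 = (1 << 80) - 1

def ks_update(K, i):
    """One key-register update; the new register emits subkey K_{i-1}."""
    K = ((K << 29) | (K >> 51)) & MASK80                       # (1) K <<< 29
    hi = (K >> 72) & 0xFF                                      # words k79..k72
    K = (K & ~(0xFF << 72)) | (S9[hi >> 4] << 76) | (S8[hi & 0xF] << 72)  # (2)
    return K ^ ((i - 2) << 46)                                 # (3) XOR [i-2]_2

def subkey(K):
    """(4) The leftmost 32 bits of the register."""
    return K >> 48
```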
In the LBlock [1], ten S-boxes are specified. The details are shown in Table 1. These
S-boxes are used in the encryption/decryption algorithm and key scheduling.

Table 1. S-boxes of LBlock

s0 14 9 15 0 13 4 10 11 1 2 8 3 7 6 12 5
s1 4 11 14 9 15 13 0 10 7 12 5 6 2 8 1 3
s2 1 14 7 12 15 13 0 6 11 5 9 3 2 4 8 10
s3 7 6 8 11 0 15 3 14 9 10 12 13 5 2 4 1
s4 14 5 15 0 7 2 12 13 1 8 4 9 11 10 6 3
s5 2 13 11 12 15 14 0 9 7 10 6 3 1 8 4 5
s6 11 9 4 14 0 15 10 13 6 12 5 7 3 8 1 2
s7 13 10 15 0 14 4 9 11 2 1 8 3 7 5 12 6
s8 8 7 14 5 15 13 0 6 11 12 9 10 2 4 1 3
s9 11 5 15 0 7 2 9 13 4 8 1 12 14 10 3 6

2.2 Properties of LBlock


This section presents some analyses of LBlock that are relevant to our attack.
Definition 1. (Difference category) Let Δα be a nonzero difference. If Δα is
propagated through the word-wise permutation in the round function, the corresponding
output difference Δβ is denoted Δβ(WP). If Δα is propagated through the rotation
operation of each round, the corresponding output difference Δβ is denoted
Δβ(RP). If Δβ results from the propagation through both operations, it is denoted
Δβ(WP+RP), i.e., Δβ(WP+RP) = Δβ(WP) ⊕ Δβ(RP).
According to this definition, for LBlock, the output difference of the right-half
data of the (i−1)th round satisfies ΔXi ∈ {ΔXi(WP), ΔXi(RP), ΔXi(WP+RP)}.
Definition 2. (Differential distribution table) Let s be a 4×4 S-box, i.e.,
s: {0,1}^4 → {0,1}^4. For Δα, Δβ ∈ {0,1}^4, the input set of the S-box is
IN_s(Δα, Δβ) = {x ∈ {0,1}^4 | s(x) ⊕ s(x ⊕ Δα) = Δβ} and the corresponding
number of inputs is N_s(Δα, Δβ) = #{x ∈ {0,1}^4 | s(x) ⊕ s(x ⊕ Δα) = Δβ}.
The differential distribution table of s is a table (Δα, Δβ, N_s(Δα, Δβ)) whose
rows and columns are indexed by Δα and Δβ, respectively, with entries
N_s(Δα, Δβ).
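
Definition 2 translates directly into code; a short sketch for s0 from Table 1:

```python
S0 = [14, 9, 15, 0, 13, 4, 10, 11, 1, 2, 8, 3, 7, 6, 12, 5]   # s0 from Table 1

def ddt(sbox):
    """table[da][db] = N_s(da, db) = #{x : s(x) ^ s(x ^ da) = db}."""
    table = [[0] * 16 for _ in range(16)]
    for da in range(16):
        for x in range(16):
            table[da][sbox[x] ^ sbox[x ^ da]] += 1
    return table
```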

In the following, the properties of S-boxes used in LBlock are listed.


Lemma 1. For each S-box of LBlock, i.e., s ∈ {s0, s1, ..., s7}, given an input
difference Δα > 0, the number of possible output differences Δβ (i.e., Nu_Δβ)
is given in Table 2. Moreover, N_s(Δα, Δβ) ∈ {0, 2, 4}.

Table 2. Number of the possible output difference Δβ (NuΔβ )

Δα 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
NuΔβ 6 6 4 6 6 6 6 8 6 6 4 8 8 8 8

Proof. This proof is immediate from the observation of the differential distribu-
tion table of each S-box sj .
According to Lemma 1, we can obtain the following propositions.
Proposition 1. For any S-box of LBlock, i.e., s ∈ {s0, s1, ..., s7}, when Δα > 0,
the conditional probabilities Pr[N_s(Δα, Δβ) = 2 | N_s(Δα, Δβ) > 0] and
Pr[N_s(Δα, Δβ) = 4 | N_s(Δα, Δβ) > 0] satisfy the following equations:

Pr[N_s(Δα, Δβ) = 2 | N_s(Δα, Δβ) > 0] = { 0,   Nu_Δβ = 4
                                           2/3, Nu_Δβ = 6
                                           1,   Nu_Δβ = 8        (1)

Pr[N_s(Δα, Δβ) = 4 | N_s(Δα, Δβ) > 0] = { 1,   Nu_Δβ = 4
                                           1/3, Nu_Δβ = 6
                                           0,   Nu_Δβ = 8

Proof. This proof is immediate from the distribution of N_s(Δα, Δβ) in the
differential distribution table.
Proposition 2. For each S-box of LBlock, let the input difference Δα > 0. Then
the probability Pr[N_s(Δα, Δβ) > 0] ≈ 0.4267. Moreover, conditioned on
N_s(Δα, Δβ) > 0, the expectation of N_s(Δα, Δβ) is ≈ 2.6222.
Proof. According to Lemma 1, N_s(Δα, Δβ) ∈ {0, 2, 4}. If Δα > 0, then
Δβ ∈ {1, 2, ..., 15}. Therefore,
Pr[N_s(Δα, Δβ) > 0] = Pr[N_s(Δα, Δβ) = 2] + Pr[N_s(Δα, Δβ) = 4]
                    = (8 × 6 + 5 × 8 + 2 × 4)/(15 × 15) ≈ 0.4267.
Moreover, according to Proposition 1 and Table 2, the expectation of
N_s(Δα, Δβ) is computed as
E(N_s(Δα, Δβ)) = (2/15 + 8/15 × 1/3) × 4 + (5/15 + 8/15 × 2/3) × 2 ≈ 2.6222.
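
Both constants can be reproduced from the DDT (using the ddt() sketch above): within a row Δα the 16 solutions split over Nu_Δα nonzero entries of a bijective S-box, so the per-row conditional expectation is 16/Nu_Δα.

```python
T = ddt(S0)                                        # reproduces Table 2 for s0
nu = [sum(1 for db in range(16) if T[da][db] > 0) for da in range(1, 16)]
pr_nonzero = sum(nu) / (15 * 15)                   # = 96/225 ~ 0.4267
exp_ns = sum(16 / n for n in nu) / 15              # ~ 2.6222
```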
In the following, a property of the diffusion function P(·) is given.
Lemma 2. The inverse of the diffusion function P(·) can be expressed as
P^{-1}(U) = U5||U7||U4||U6||U1||U3||U0||U2.
Proof. Immediate.
Using the expression of P^{-1}(·), the analysis of the S-boxes of LBlock can be
extended to the round function F(·), which contributes to our analysis.
Next, we move on to breaking LBlock with the DFA attack.

3 Differential Fault Analysis on LBlock


In this section, the differential fault analysis of LBlock is presented. Before
describing this analysis, we introduce the fault model used.

3.1 Fault Model


The random bit fault model is used in our analysis; the fault is injected into
the rth round with r ∈ {24, 25, 26, 27, 28, 29, 30, 31}. Considering the
application devices of LBlock, such as smart cards, this model can be seen as a
realistic fault model with the following basic assumptions:
– The adversary can encrypt the same plaintext with the same master key to
obtain a pair of right and faulty ciphertexts (C, C′).
– Only one bit fault is randomly injected into the register used to store the
intermediate results. The adversary knows the fault injection area. In our
fault analysis, the fault is randomly injected into any bit of the internal
state at the end of the rth round. Specifically, if r ∈ {31, 30, 29}, the fault
is injected into the left part of the internal state; otherwise, it is injected
into the right part (see Fig. 2). For r ≤ 28, Fig. 2 shows that a fault
injected in the right part skips one round before propagating through the
round function F.
– The adversary does not know the position of the fault within the internal
state.
Owing to the slow diffusion of LBlock, this fault model is more suitable than
the byte-oriented random fault model for a DFA attack on an earlier round.

3.2 Attack Description for Retrieving the Master Key


General Principle of the DFA Attack on LBlock. The basic principle of
DFA can be described by Eq. (2), where xa = x and xb = x ⊕ Δα are a pair of
inputs, Δβ is the output difference, k is a round subkey, and s[·] denotes the
S-box operation:
s[k ⊕ x] ⊕ s[k ⊕ (x ⊕ Δα)] = Δβ.    (2)
If these inputs and the corresponding output difference are known, we can obtain
a set of key candidates k by solving Eq. (2). Based on Eq. (2), when the random
bit fault model is used to analyze LBlock, the DFA attack consists of three steps:
– Step 1: Identify the active S-box and deduce the input and corresponding
output difference of this S-box.
– Step 2: Reveal the key candidates based on the differential distribution table
(Δα, Δβ, N_s(Δα, Δβ)), where Δα and Δβ are the input and corresponding
output difference of the S-box.
– Step 3: Repeat the above steps to reduce the possible key space until the
unique key k remains (a code sketch of Steps 2 and 3 follows).
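
A minimal sketch of Steps 2 and 3. Each observation is a hypothetical triple (x, da, db) of the known S-box input and the input/output differences recovered from one right/faulty ciphertext pair; the names and the triple format are our assumptions.

```python
def key_candidates(sbox, x, da, db):
    """All 4-bit k with s[k ^ x] ^ s[k ^ x ^ da] = db, i.e., Eq. (2)."""
    return {k for k in range(16) if sbox[k ^ x] ^ sbox[k ^ x ^ da] == db}

def recover_word(sbox, observations):
    """Step 3: intersect candidate sets over several faults."""
    cands = set(range(16))
    for x, da, db in observations:
        cands &= key_candidates(sbox, x, da, db)
    return cands
```

With an expected 2.6222 candidates per nonzero DDT entry (Proposition 2), a few observations typically shrink the set to a single key word.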

Fig. 2. Error propagation paths: (a) fault→31st round, (b) fault→30th round,
(c) fault→29th round, (d) fault→28th round

Table 3. Active S-boxes of the 32nd round for the corrupted 4 bits

Bits        31–28  27–24  23–20  19–16  15–12  11–8   7–4    3–0
31st round  s7     s6     s5     s4     s3     s2     s1     s0
30th round  s5     s7     s4     s6     s1     s3     s0     s2
29th round  s1,s4  s0,s5  s6,s7  s6,s7  s0,s5  s1,s4  s2,s3  s2,s3

For LBlock, if the fault is injected into the left part at the end of the rth round
(i.e., r ∈ {29, 30, 31}), the adversary can directly distinguish the active S-boxes
from the inactive S-boxes in each round according to the difference pair
(ΔX_{i−1}^e, ΔX_i^e′), where e, e′ ∈ {0, 1, ..., 7} and e ≠ e′. Specifically, we have
i ∈ {31, 32, 33} for r = 29, i ∈ {32, 33} for r = 30, and i = 33 for r = 31.
Table 3 lists the active S-boxes of the 32nd round when the fault is injected into
the 29th, 30th and 31st round, where the corrupted 32-bit values X32, X31 and
X30 are divided into eight 4-bit words from the 31st bit down to the 0th bit,
respectively. As the deduced output difference satisfies
ΔX_i^e′ ∈ {ΔX_{i(WP)}^e′, ΔX_{i(RP)}^e′}, the difference pair
(ΔX_{i−1}^e, ΔX_{i(WP)}^e′), which is the input and output difference pair of an
active S-box in the (i−1)th round, can be extracted from (ΔX_{i−1}, ΔX_i). An
adversary who knows these difference pairs can then mount a key recovery
attack. Note that once a round subkey K_{i−1} is revealed, it can be used in the
recovery of K_{i−2}. The final round subkeys K30, K31 and K32 are uniquely
determined using only a few pairs of ciphertexts (C, C′).

After the round subkeys K30, K31 and K32 are revealed, the master key K
can be obtained by inverting the key scheduling rather than by brute force.
The steps are as follows:

1 e 2 e
Table 4. Output differences ΔX33(W P ) and ΔX33(RP )

Bits 31–28 27–24


e1 0 3 1 0 1 6
ΔX33(W P) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P )
27th e2 4 5 4 5
ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP )
e1 1 3 1 3
th ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P )
28 e2 2 4
ΔX33(RP ) ΔX33(RP ) ΔX33(RP )
Bits 23–20 19–16
1e 2 5 7 0 3 6
th ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P )
27 e
2 3 6 2 7
ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP )
e1 0 6 2 4
ΔX33(W P) ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P )
28th e2 7 1
ΔX33(RP ) ΔX33(RP ) ΔX33(RP )
Bits 15–12 11–8
e1 4 5 7 2 4 5
ΔX33(W P) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P )
27th e2 0 1 0 1
ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP )
e1 5 7 5 7
ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P )
28th e2 6 0
ΔX33(RP ) ΔX33(RP ) ΔX33(RP )
Bits 7–4 3–0
1e 3 1 6 2 4 7
th ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P ) ,ΔX33(W P )
27 e
2 2 7 3 6
ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP ) ΔX33(RP ) ,ΔX33(RP )
e1 2 4 0 6
ΔX33(W P) ΔX33(W P ) ,ΔX33(W P ) ΔX33(W P ) ,ΔX33(W P )
28th e2 3 5
ΔX33(RP ) ΔX33(RP ) ΔX33(RP )

– Step 1: Set the 80-bit key register K_reg = k79 k78 ... k1 k0. Then, load the
round subkey K30 into the leftmost 32 bits of K_reg, i.e.,
k79 k78 ... k49 k48 = K30. For the round subkey K31, load the bits
K31^23 K31^22 ... K31^3 into k42 k41 ... k22. Moreover, the bits
K30^23 K30^22 ... K30^10 of K30 are loaded into k13 k12 ... k0 directly.
– Step 2: Extract the leftmost 8 bits of K31 and divide them into two 4-bit
sets IK1 = [K31^31 K31^30 K31^29 K31^28] and
IK2 = [K31^27 K31^26 K31^25 K31^24]. Input IK1 and IK2 into s9^{-1} and
s8^{-1}, respectively, and obtain the outputs
[K31^31 K31^30 K31^29 K31^28] = s9^{-1}(IK1) and
[K31^27 K31^26 K31^25 K31^24] = s8^{-1}(IK2); here s9^{-1} and s8^{-1} are
the inverses of the S-boxes s9 and s8. Then k47 = K31^28,
k46 k45 k44 k43 = K31^27 K31^26 K31^25 K31^24, and
k21 k20 k19 = [K31^2 K31^1 K31^0] ⊕ [111], where [111] comes from [30]2.
Likewise, extract the leftmost 8 bits of K32 and divide them into two 4-bit
sets IK3 = [K32^31 K32^30 K32^29 K32^28] and
IK4 = [K32^27 K32^26 K32^25 K32^24]. Input IK3 and IK4 into s9^{-1} and
s8^{-1} to obtain the corresponding outputs
[K32^31 K32^30 K32^29 K32^28] = s9^{-1}(IK3) and
[K32^27 K32^26 K32^25 K32^24] = s8^{-1}(IK4). Then k18 = K32^28 ⊕ [1],
k17 = K32^27 ⊕ [0], and k16 k15 k14 = K32^26 K32^25 K32^24. At this point
the key register K_reg that produces the round subkey K30 has been found.
– Step 3: Repeat the inverse operation of the key scheduling from i = 30 down
to i = 2; the original master key K is then revealed (see the sketch below).
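
Step 3 can be sketched as code. This assumes the ks_update representation from the sketch in Section 2.1 (80-bit register as a Python integer, k79 most significant) and our own indexing convention that the register emitting K30 was produced with round counter i = 31; both are assumptions, not statements from the paper.

```python
S8_INV = [S8.index(v) for v in range(16)]
S9_INV = [S9.index(v) for v in range(16)]

def ks_invert(K, i):
    """Undo one register update ks_update(., i)."""
    K ^= (i - 2) << 46                                          # undo step (3)
    hi = (K >> 72) & 0xFF                                       # undo step (2)
    K = (K & ~(0xFF << 72)) | (S9_INV[hi >> 4] << 76) | (S8_INV[hi & 0xF] << 72)
    return ((K >> 29) | (K << 51)) & MASK80                     # undo (1): >>> 29

def master_key(K_reg_30):
    """Walk back from the register that emitted K_30 to the master key."""
    K = K_reg_30
    for i in range(31, 2, -1):          # invert the updates for i = 31 .. 3
        K = ks_invert(K, i)
    return K
```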
The above key recovery procedure depends only on the three consecutive round
subkeys K32, K31 and K30. Therefore, if the fault is injected into an earlier round
(i.e., r ≤ 28), this recovery procedure can also be used to obtain the master key K.
Let us consider the condition that the fault is injected into the right part
at the end of the rth round (r ∈ {27, 28}). As the deduced output difference
satisfies ΔX_i^e′ ∈ {ΔX_{i(WP)}^e1′, ΔX_{i(RP)}^e2′} (e′ ∈ {e1′, e2′}), the
adversary can apply the previous attack to reveal the master key K. Table 4
lists the nonzero output differences ΔX33(WP)^e1′ and ΔX33(RP)^e2′ when the
fault is injected into the 27th and 28th round, respectively. In particular, the
difference pair (ΔX32^e, ΔX33(WP)^e′) can be used to reveal the corresponding
4 bits K32^e of the round subkey.
For the previous DFA attack, the procedures for revealing the round subkey
words K32^e and K31^((e−2) mod 8) can run in parallel. This is based on the
fact that ΔX31^e = ΔX33(RP)^e and ΔX32^e ∈ {ΔX32(WP)^e1, ΔX32(RP)^e2},
where e ∈ {e1, e2}. E.g., if the fault is injected into X27^0, the 4-bit words
K32^0, K32^5 and K32^6 can be revealed according to the difference pairs
({ΔX32^0, ΔX32^5, ΔX32^6}, {ΔX33(WP)^2, ΔX33(WP)^4, ΔX33(WP)^7}).
Moreover, the 4-bit words K31^1 and K31^4 can simultaneously be revealed
according to the pairs ({ΔX31^1, ΔX31^4}, {ΔX32(WP)^0, ΔX32(WP)^6}).

Revealing K32 under Condition: Fault→25th and Fault→26th. If the fault
is injected into the right part during an earlier round (i.e., the 25th or 26th
round), the previous DFA attack cannot be used to reveal K32 immediately.
This is due to the fact that the nonzero output difference satisfies
ΔX33^e′ ∈ {ΔX33(WP)^e1′, ΔX33(RP)^e2′, ΔX33(WP+RP)^e3′}, where
e′ ∈ {e1′, e2′, e3′}. It implies that some input differences do not correspond to
the right output differences according to Eq. (2) (i.e., for such an input
difference ΔX32^e, the output difference is of the form ΔX33(WP+RP)^e3′).
For this kind of input difference ΔX32^e, we call the related S-box a false
active S-box. Therefore, for revealing the right round subkey K32, the true
active S-boxes must be distinguished from the false active S-boxes. Table 5
lists these two kinds of S-boxes in the 32nd round when the fault is injected
into the 25th and 26th round, respectively. It can be seen that, for each
corrupted 4-bit word of the 25th (and 26th) round, the corresponding set of
input differences {ΔX32^e} is different.
Based on the random bit fault model, the attack procedure uses the following
two steps (a sketch of the fault-localization step follows the list):
– Step 1: Produce the nonzero difference set {ΔX32^e | ΔX32^e > 0,
e ∈ {0, 1, 2, ..., 7}}. Then, deduce the position of the fault injected in the
25th (or 26th) round from the generated difference set and Table 5.
– Step 2: Distinguish the active S-boxes from the false active S-boxes based
on Table 5. Then, reveal the corresponding 4 bits K32^e of the round subkey
by using the general principle of the previous DFA attack.
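
For the 25th-round case, the union of active and false active S-boxes in each column of Table 5 misses exactly one index, so the faulted 4-bit word can be deduced from the pattern of nonzero word differences alone. A sketch (the data structure and function names are ours):

```python
# Bit group of the 25th round -> indices e with nonzero difference dX32^e
# (union of active and false active S-boxes from Table 5).
PATTERN_25 = {
    "31-28": {0, 1, 2, 3, 5, 6, 7},   # only s4 stays inactive
    "27-24": {0, 1, 3, 4, 5, 6, 7},   # only s2
    "23-20": {0, 1, 2, 3, 4, 6, 7},   # only s5
    "19-16": {0, 1, 2, 4, 5, 6, 7},   # only s3
    "15-12": {1, 2, 3, 4, 5, 6, 7},   # only s0
    "11-8":  {0, 1, 2, 3, 4, 5, 7},   # only s6
    "7-4":   {0, 2, 3, 4, 5, 6, 7},   # only s1
    "3-0":   {0, 1, 2, 3, 4, 5, 6},   # only s7
}

def locate_fault_25(dX32):
    """dX32: the eight word differences of X32; returns candidate bit groups."""
    observed = {e for e in range(8) if dX32[e]}
    return [grp for grp, full in PATTERN_25.items() if observed <= full]
```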



Table 5. Active S-boxes and false active S-boxes of the 32nd round

Bits                        31–28        27–24        23–20        19–16
25th  active S-boxes        s0,s2,s6     s0,s4,s6     s1,s3,s7     s1,s5,s7
      false active S-boxes  s1,s3,s5,s7  s1,s3,s5,s7  s0,s2,s4,s6  s0,s2,s4,s6
26th  active S-boxes        s2,s3,s4,s7  s1,s2,s3,s6  s1,s2,s4,s5  s0,s4,s5,s7
      false active S-boxes  s6           s7           s0           s1

Bits                        15–12        11–8         7–4          3–0
25th  active S-boxes        s2,s4,s6     s0,s2,s4     s3,s5,s7     s1,s3,s5
      false active S-boxes  s1,s3,s5,s7  s1,s3,s5,s7  s0,s2,s4,s6  s0,s2,s4,s6
26th  active S-boxes        s0,s3,s6,s7  s2,s5,s6,s7  s0,s1,s5,s6  s0,s1,s3,s4
      false active S-boxes  s2           s3           s4           s5

Revealing K32 under Condition: Fault→24th. The success of the previous
DFA attack is based on the condition that there exist inactive S-boxes for
locating the corrupted 4 bits within the 32-bit Xi (or Xi−1). The adversary
can then trace the active S-boxes in the 30th, 31st and 32nd round along the
error propagation route to reveal the corresponding round subkeys K32, K31
and K30. However, as LBlock achieves its best diffusion within eight rounds [1],
if a bit fault is injected into the right part at the end of the 24th round, the
fault is totally diffused by the 32nd round (see Fig. 3). Therefore, in the 32nd
round, all eight S-boxes have nonzero input differences. Under this condition,
if the fault model is the random bit model, the adversary can first check the
value N_s(Δα, Δβ) of each S-box from s0 to s7 to find the active S-box, and
then reveal the corresponding part of the round subkey K32. In this case, if
N_{sj}(Δα, Δβ) > 0, the corresponding S-box sj is considered an active S-box
candidate; otherwise it is a false active S-box. Let
SN = #{j ∈ {0, 1, 2, ..., 7} | N_{sj}(Δα, Δβ) > 0}. According to Lemma 1 and
Proposition 2, the adversary's success probability Pr[A = 1] can be computed
by Eq. (3). As Pr[N_{sj}(Δα, Δβ) > 0] ≈ 0.4267, we get
Pr[A = 1] = 0.4267 × 0.5733^7 + 0.4267^2 × 0.5733^6 × 1/2
          + 0.4267^3 × 0.5733^5 × 1/3 + 0.4267^4 × 0.5733^4 × 1/4
          + 0.4267^5 × 0.5733^3 × 1/5 + 0.4267^6 × 0.5733^2 × 1/6
          + 0.4267^7 × 0.5733 × 1/7 + 0.4267^8 × 1/8 ≈ 0.01563.
This implies that the adversary can distinguish the active S-box from the false
active S-boxes with probability approximately 0.01563 under the random bit
model.

Pr[A = 1] = Pr[A = 1 | SN = 1] + Pr[A = 1 | SN = 2] × 1/2
          + Pr[A = 1 | SN = 3] × 1/3 + Pr[A = 1 | SN = 4] × 1/4
          + Pr[A = 1 | SN = 5] × 1/5 + Pr[A = 1 | SN = 6] × 1/6
          + Pr[A = 1 | SN = 7] × 1/7 + Pr[A = 1 | SN = 8] × 1/8.    (3)
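
Eq. (3) evaluates numerically as below; this is a sketch of the paper's own expansion (no binomial weighting), with p = 96/225 from Proposition 2.

```python
p = 96 / 225                  # Pr[N_s > 0] from Proposition 2
pr_success = sum(p ** n * (1 - p) ** (8 - n) / n for n in range(1, 9))
print(pr_success)             # ~ 0.0156
```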
In fact, it can be found that if a fault is injected into any 4-bit word of the
24th round, there is only one active S-box in the 32nd round (see Fig. 3; a blue
number denotes that the corresponding S-box is a false active S-box, while a
black number denotes an active S-box). Therefore, if the DFA attack
corresponds to a stronger adversary, the round subkey K32 can be revealed
directly. Based on this consideration, we assume that the fault model is the
semi-random bit model. In this model, our attack corresponds to an adversary
who knows which 4 bits are faulted at the chosen round (i.e., the 24th round).
However, the adversary still does not know any information

about which bit is corrupted within these 4 bits. Under this semi-random bit
model, since the adversary knows which 4 bits of the 24th round are corrupted,
she/he can distinguish the active S-box from the false active S-boxes
successfully (see Fig. 3). E.g., if the fault is injected into any bit of the first
4 bits at the end of the 24th round (i.e., X24^0), the adversary knows that s4
is the unique active S-box. Then, the round subkey K32 can be revealed by
using the general principle of the DFA attack.

Fig. 3. Error propagation from 24th round to 32nd round

4 Theoretical and Simulation Results


Data Complexity Analysis. In LBlock, there are eight distinct S-boxes
(s0, s1, ..., s7) used in the F-function. According to Proposition 2, for
non-empty N_s(Δα, Δβ), the expectation of N_s(Δα, Δβ) is 2.6222. This implies
that about three faults (i.e., faulty ciphertexts) on average are needed to
reveal the input of each S-box. Therefore, about 24 faults should suffice to
reveal each round subkey Ki−1. However, in the DFA attack used here, some
parts of the round subkey can be recovered simultaneously; e.g., if the fault is
injected at the end of the 28th round and corrupts the first 4 bits, the
candidates for K32^0 and K32^6 can be obtained at the same time. Table 6
lists the number of faults (FN) for revealing the round subkeys
{K32, K31, K30} and the master key K. As the fault model is the random bit
model, we list the lower bound on the theoretical number of faults, assuming
that three faults on average are used to reveal each 4 bits of the round subkey.
Moreover, according to the structure of LBlock, if a round subkey Ki−1 of the
(i−1)th round is revealed, the corresponding input Xi−2||Xi−1 is also
recovered; this input can be seen as the output of the ith round. Therefore,
the total number of injected faults is at least the maximum among the numbers
of faults needed for revealing K32, K31 and K30.

Table 6. Data complexity analysis of the DFA attack on LBlock

Injected round 24th 25th 26th 27th 28th 29th 30th 31st
FN32           24   8    7    8    12   12   24   24
FN31           8    7    8    12   24   24   24   ∗
FN30           7    8    12   24   24   24   ∗    ∗
FNsum          24   8    12   24   24   24   ∗    ∗

Computer Simulation. The proposed DFA has been successfully implemented
in computer simulations. The simulations were run in Matlab 2009 on a
computer with a Core 2 Duo CPU at 1.40 GHz and 2.00 GB RAM. For each
injected round from the 24th to the 31st, ten simulation runs were performed
on LBlock. Table 7 lists the number of faults for the corresponding DFA attack.

Table 7. Simulation results of DFA attack on LBlock

Injected round 24th 25th 26th 27th 28th 29th 30th 31st
1 21 8(7) 10(9) 10(9) 10(9) 10(9) 29(19) 42(21)
2 28 13(8) 12(9) 11(10) 13(10) 12(9) 24(18) 76(29)
3 17 15(8) 7(7) 9(8) 15(10) 14(9) 36(24) 42(19)
4 21 11(8) 8(7) 12(10) 12(9) 9(8) 110(33) 41(21)
5 19 10(8) 8(7) 9(9) 15(11) 10(8) 41(23) 35(22)
6 21 19(8) 10(9) 10(8) 13(11) 15(8) 116(37) 30(20)
7 21 8(6) 10(9) 9(7) 10(10) 12(10) 36(21) 36(19)
8 20 17(12) 6(6) 11(9) 14(10) 15(10) 21(18) 39(22)
9 19 11(9) 9(9) 14(12) 15(13) 9(8) 60(28) 40(21)
10 27 17(12) 7(7) 7(7) 17(10) 9(8) 65(23) 44(24)

In Table 7, for each DFA attack, only the number of faults for revealing K32 is
presented. This is because the DFA attack for revealing K31 and K30 with the
fault injected into the (i−3)th round is the same as the DFA attack for revealing
K32 with the fault injected into the (i−2)th and (i−1)th round, respectively.
In this table, the number of faults in parentheses is the number of faults
actually used for revealing K32, and the number outside the parentheses is the
total number of faults used in the DFA attack under the random bit model.
For the case that the fault is injected into the 24th round, the number of
injected faults is measured under the semi-random bit model. Generally
speaking, the simulation results confirm the preceding data complexity
analysis in most cases. The running time of a simulation is within one second:
if the fault is injected into the 24th round, the time for revealing K32 is within
0.08 seconds; in the other cases, it is within 0.06 seconds.

5 Conclusions and Future Works


We have presented a differential fault analysis of the new lightweight block
cipher LBlock. The random bit fault model is utilized in our attack. When the

fault is injected at the end of the rth round (r ∈ {25, 26, 27, 28, 29, 30, 31}),
the round subkeys can be revealed by using pairs of ciphertexts (C, C′) and
differential cryptanalysis. The master key is then revealed from the last three
round subkeys by inverting the key scheduling. In particular, if the fault is
injected into the 25th or 26th round, the active S-boxes must first be
distinguished from the false active S-boxes, which also have nonzero input
differences. Moreover, when the fault is injected into the 24th round and the
fault model is the semi-random bit model, in which a stronger adversary knows
the position of the corrupted 4 bits in the register, the DFA attack breaks
LBlock directly.
To thwart the proposed DFA attack on LBlock, a possible countermeasure is
to protect the last few rounds by doubling the computation and checking the
results. Moreover, as noted in [20], if the adversary has access to the
corresponding decryption oracle, the proposed DFA attack can also be applied
to the first few rounds of the cipher, which implies that the same number of
rounds needs to be protected at the beginning of the cipher. According to our
analysis, for LBlock, at least the last nine rounds and the first nine rounds
should be protected against the DFA attack. However, our work only provides
a known lower bound on the number of rounds to be protected. Whether the
DFA attack can succeed if the fault is injected into the middle rounds (e.g.,
the 23rd round) should therefore be explored further. Moreover, investigating
whether the DFA attack can efficiently reveal the master key K under the
random bit model when the fault is injected into the 24th round is an
interesting open problem.

Acknowledgments. The authors would like to thank the anonymous reviewers


for their helpful and valuable comments. This research is (partially) supported by
JAPAN SCIENCE AND TECHNOLOGY AGENCY (JST), Strategic Japanese-
Indian Cooperative Programme on Multidisciplinary Research Field, which com-
bines Information and Communications Technology with Other Fields, entitled
“Analysis of Cryptographic Algorithms and Evaluation on Enhancing Network
Security Based on Mathematical Science”.

References
1. Wu, W.-L., Zhang, L.: LBlock: A Lightweight Block Cipher. In: Lopez, J., Tsudik,
G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 327–344. Springer, Heidelberg (2011)
2. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw,
M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466.
Springer, Heidelberg (2007)
3. De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN — A
Family of Small and Efficient Hardware-Oriented Block Ciphers. In: Clavier, C.,
Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg
(2009)
4. Leander, G., Paar, C., Poschmann, A., Schramm, K.: New Lightweight DES Vari-
ants. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 196–210. Springer,
Heidelberg (2007)

5. Hong, D., Sung, J., Hong, S., Lim, J., Lee, S., Koo, B., Lee, C., Chang, D., Lee,
J., Jeong, K., Kim, H., Kim, J., Chee, S.: HIGHT: A New Block Cipher Suitable
for Low-Resource Device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 46–59. Springer, Heidelberg (2006)
6. Knudsen, L., Leander, G., Poschmann, A., Robshaw, M.J.B.: PRINTcipher: A
Block Cipher for IC-Printing. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 16–32. Springer, Heidelberg (2010)
7. Yang, L., Wang, M., Qiao, S.: Side Channel Cube Attack on PRESENT. In: Garay,
J.A., Miyaji, A., Otsuka, A. (eds.) CANS 2009. LNCS, vol. 5888, pp. 379–391.
Springer, Heidelberg (2009)
8. Bogdanov, A., Rechberger, C.: A 3-Subset Meet-in-the-Middle Attack: Cryptanal-
ysis of the Lightweight Block Cipher KTANTAN. In: Biryukov, A., Gong, G.,
Stinson, D.R. (eds.) SAC 2010. LNCS, vol. 6544, pp. 229–240. Springer, Heidel-
berg (2011)
9. Özen, O., Varıcı, K., Tezcan, C., Kocair, Ç.: Lightweight Block Ciphers Revisited:
Cryptanalysis of Reduced Round PRESENT and HIGHT. In: Boyd, C., González
Nieto, J. (eds.) ACISP 2009. LNCS, vol. 5594, pp. 90–107. Springer, Heidelberg
(2009)
10. Leander, G., Abdelraheem, M.A., AlKhzaimi, H., Zenner, E.: A Cryptanalysis of
PRINTcipher: The Invariant Subspace Attack. In: Rogaway, P. (ed.) CRYPTO
2011. LNCS, vol. 6841, pp. 206–221. Springer, Heidelberg (2011)
11. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the Importance of Checking Crypto-
graphic Protocols for Faults (Extended Abstract). In: Fumy, W. (ed.) EUROCRYPT
1997. LNCS, vol. 1233, pp. 37–51. Springer, Heidelberg (1997)
12. Clavier, C.: Secret External Encodings Do not Prevent Transient Fault Analysis.
In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 181–194.
Springer, Heidelberg (2007)
13. Hemme, L.: A Differential Fault Attack Against Early Rounds of (Triple-)DES.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 254–267.
Springer, Heidelberg (2004)
14. Li, Y., Sakiyama, K., Gomisawa, S., Fukunaga, T., Takahashi, J., Ohta, K.: Fault
Sensitivity Analysis. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS,
vol. 6225, pp. 320–334. Springer, Heidelberg (2010)
15. Biham, E., Shamir, A.: Differential Fault Analysis of Secret Key Cryptosystems.
In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 513–525. Springer,
Heidelberg (1997)
16. Czapski, M., Nikodem, M.: Error Detection and Error Correction Procedures for
the Advanced Encryption Standard. Des. Codes Cryptogr. 49, 217–232 (2008)
17. Chen, C.N., Yen, S.M.: Differential Fault Analysis on AES Key Schedule and
Some Countermeasures. In: Safavi-Naini, R., Seberry, J. (eds.) ACISP 2003. LNCS,
vol. 2727, pp. 118–129. Springer, Heidelberg (2003)
18. Moradi, A., Shalmani, M.T.M., Salmasizadeh, M.: A Generalized Method of Differ-
ential Fault Attack Against AES Cryptosystem. In: Goubin, L., Matsui, M. (eds.)
CHES 2006. LNCS, vol. 4249, pp. 91–100. Springer, Heidelberg (2006)
19. Derbez, P., Fouque, P.-A., Leresteux, D.: Meet-in-the-Middle and Impossible Dif-
ferential Fault Analysis on AES. In: Preneel, B., Takagi, T. (eds.) CHES 2011.
LNCS, vol. 6917, pp. 274–291. Springer, Heidelberg (2011)
20. Rivain, M.: Differential Fault Analysis on DES Middle Rounds. In: Clavier, C.,
Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 457–469. Springer, Heidelberg
(2009)
21. Chen, H., Wu, W.-L., Feng, D.-G.: Differential Fault Analysis on CLEFIA. In:
Qing, S., Imai, H., Wang, G. (eds.) ICICS 2007. LNCS, vol. 4861, pp. 284–295.
Springer, Heidelberg (2007)

22. Takahashi, J., Fukunaga, T.: Improved Differential Fault Analysis on CLEFIA.
In: Fault Diagnosis and Tolerance in Cryptography-FDTC 2008, pp. 25–39. IEEE
Computer Society Press, Los Alamitos (2008)
23. Hojsı́k, M., Rudolf, B.: Differential Fault Analysis of Trivium. In: Nyberg, K. (ed.)
FSE 2008. LNCS, vol. 5086, pp. 158–172. Springer, Heidelberg (2008)
24. Esmaeili Salehani, Y., Kircanski, A., Youssef, A.: Differential Fault Analysis of
Sosemanuk. In: Nitaj, A., Pointcheval, D. (eds.) AFRICACRYPT 2011. LNCS,
vol. 6737, pp. 316–331. Springer, Heidelberg (2011)
25. Kircanski, A., Youssef, A.-M.: Differential Fault Analysis of HC-128. In: Bern-
stein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 261–278.
Springer, Heidelberg (2010)
26. Minier, M., Naya-Plasencia, M.: Some Preliminary Studies on the Differential
Behavior of the Lightweight Block Cipher LBlock. In: Leander, G., Standaert,
F.-X. (eds.) ECRYPT Workshop on Lightweight Cryptography, pp. 35–48 (2011),
http://www.uclouvain.be/crypto/ecrypt_lc11/static/post_proceedings.pdf
Contactless Electromagnetic Active Attack on Ring Oscillator Based True Random Number Generator

Pierre Bayon1, Lilian Bossuet1, Alain Aubert1, Viktor Fischer1, François Poucheret2,3, Bruno Robisson3, and Philippe Maurine2

1 University of Lyon, Hubert Curien Laboratory, CNRS 5516, 42000 Saint-Etienne, France
2 University of Montpellier 2, LIRMM Laboratory, CNRS 5506, 34000 Montpellier, France
3 CEA-LETI, SESAM Laboratory, Centre Microélectronique de Provence, 13541 Gardanne, France

Abstract. True random number generators (TRNGs) are ubiquitous in


data security as one of basic cryptographic primitives. They are primar-
ily used as generators of confidential keys, to initialize vectors, to pad
values, but also as random masks generators in some side channel attacks
countermeasures. As such, they must have good statistical properties, be
unpredictable and robust against attacks. This paper presents a contact-
less and local active attack on ring oscillators (ROs) based TRNGs using
electromagnetic fields. Experiments show that in a TRNG featuring fifty
ROs, the impact of a local electromagnetic emanation on the ROs is so
strong, that it is possible to lock them on the injected signal and thus to
control the monobit bias of the TRNG output even when low power elec-
tromagnetic fields are exploited. These results confirm practically that
the electromagnetic waves used for harmonic signal injection may rep-
resent a serious security threat for secure circuits that embed RO-based
TRNG.

Keywords: Active attacks, EM injections, IEMI, Ring oscillators,


TRNGs.

1 Introduction
True random number generators (TRNGs) are essential in data security hardware. They are implemented to generate the random streams of bits used in cryptographic systems as confidential keys, random masks, initialization vectors or padding values. If an adversary is able to change the behavior of the generator (for instance, if he can change the bias of the generated stream of bits), he can reduce the security of the whole cryptographic system.
Surprisingly, there are not many papers dealing with physical attacks on random number generators. To the best of our knowledge, the only practical attack was published by Markettos and Moore [1]. In their attack, the attacker targets
a two ring oscillator (RO) based TRNG implemented in a security-dedicated integrated circuit (IC). Markettos and Moore inject a sine wave signal onto the power pad of the device in order to intentionally modify the operating conditions of the two ROs and thus to get a biased output signal.
Within this context, our main contribution is an electromagnetic (EM) attack
on the RO based TRNG that can be seen as a significant improvement of the
attack introduced in [1]. In our attack, the attacker alters the entropy extractor
by injecting an EM signal into the device rather than by inducing a harmonic
signal on the power pad.
The EM injection is contactless and does not require any access to the power
line. The procedure may be applied to ROs operating at higher frequencies than
the cut-off frequencies of the power pad and the supply/ground network. Unlike
in [1], the proposed attack may work on generators featuring separate power and ground nets for each RO. Note that this technique is sometimes used in order to decouple the ROs and thus to maximize the entropy per bit at the generator's output.
In real cryptographic devices, the embedded TRNG is often built using more than two ROs (the 2-RO TRNG targeted in [1] is rather exceptional). For this reason, the EM attacks presented in this paper are evaluated on a TRNG using as many as 50 ROs. Until now, this kind of TRNG was considered invulnerable.
The paper is organized as follows. Section 2 presents the TRNG threat model
and the general structure of the generator based on ROs studied in the paper. In
Section 3, the whole experimental platform required for the EM injection attack
is detailed. Section 4 provides experimental results demonstrating the influence
of the EM injection on the ROs. Section 5 shows how the mono-bit bias of a
50-RO TRNG can be dynamically controlled.

2 Background

This section discusses TRNG threats and briefly describes the generator adopted as the design under test (DUT) in the rest of the paper.
The general structure of a TRNG is depicted in Figure 1. The generator is composed of:

– A digital noise source (randomness source + entropy extractor), which should give as much entropy per bit as possible, enable a sufficient bit rate and be robust to environmental (voltage, temperature) variations.
– An algorithmic post-processing block, which can be added at the output of the TRNG to enhance statistical properties without reducing the entropy.
– In some cases, embedded tests, which the designer can add to evaluate on-chip the quality of the randomness source in real time or to detect online the generator's permanent or temporary failure. However, advanced and complex statistical tests are time and energy consuming; therefore, the functionality and the quality of a TRNG can only be tested on-chip periodically.
Contactless EM Active Attack on RO-Based TRNG 153

Fig. 1. Passive (2, 5) and active (1, 3, 4) attacks on a TRNG general structure

2.1 TRNG Threat Model

Two types of attacks on TRNGs can be considered: passive and active attacks. Passive attacks collect some information about the generator in order to predict future values with a non-negligible probability (attacks 2 and 5 in Figure 1 – see arrow orientation). Active attacks tend to modify the behavior of the generator in order to somehow control its output (attacks 1, 3, and 4 in Figure 1). According to Figure 1, the adversary may target different parts of the TRNG in different ways. One could expect that the statistical tests (simple embedded tests or complex external tests) would detect the attack. One could also argue that the algorithmic post-processing would reduce the impact of the attack. However, algorithmic post-processing is missing in some generators [2], or embedded tests are not used because the generator is "provably secure" [3]. Nevertheless, it is common practice in applied cryptography to evaluate the security of all building elements separately. For this reason, evaluating the robustness of the generator and all its parts is of great interest.
Many sources of randomness, such as thermal noise, 1/f noise, shot noise or metastability, can be used in TRNGs. A good source of randomness should not be manipulable (and therefore not attackable), or the manipulation should be prevented. For example, the thermal noise quality can be guaranteed by controlling the temperature. It is thus reasonable to expect that attacks will not target the source of randomness.
In this paper, we consider attacks on entropy extraction (1). Their objective can be to bias the generator output or to reduce the digital noise entropy: both bias and entropy reduction can simplify the subsequent attack on the cryptographic system, since the exhaustive key search can be significantly shortened. We will not consider the other attacks from Figure 1, such as attacks on tests (2 and 3) and post-processing (4), because of the huge number of methods and cases that would have to be considered. It is up to the designer to adapt post-processing and embedded tests to the weaknesses of the generator. The aim of this paper is to show one possible weakness that could be targeted by an attacker in RO-based TRNGs.
As discussed in the introduction, the only published paper dealing with a practical active attack on TRNGs is by Markettos and Moore [1]. It deals with harmonic signal injection into the power line of a TRNG based on ROs. The authors claim that they could reduce the digital noise entropy when the frequency of the harmonic signal was close to the frequency of the ROs. Their study can be seen as a proof of concept of an attack on TRNGs using harmonic injection. Nevertheless, this attack has some practical limits. For example, it could probably be countered by separating the power and ground lines of all ROs, filtering the power supply, preventing access to the power line, etc. It is clear that the attack would be more effective if it were contactless and undetectable by embedded sensors, such as light sensors.
In this paper, we show that EM waves are good candidates for performing contactless attacks.

2.2 RO-Based TRNG

A jittery clock generated by a RO is the most common type of source of randomness used in TRNGs. ROs are easy to implement in both ASICs and FPGAs. A commonly used TRNG principle employing several ROs was proposed in [3] and enhanced in [2]. The resulting architecture, shown in Figure 2, represents one of the simplest TRNG structures that can be implemented in FPGAs. It needs only NOT gates (for implementing the ROs), flip-flops (as samplers) and a large XOR gate (the entropy collector). In [3], the authors proposed a mathematical model of the TRNG that guarantees enough entropy in the output bit, and thus robustness and security. In their model, the ROs are assumed to be independent. The generator has several parameters that can be tuned: the number of elements composing the ROs, the number of ROs and the sampling frequency. By modifying these parameters, the designer can change the statistical properties of the random stream of bits produced by the TRNG. For example, according to [2], for a sampling frequency of 100 MHz, a generator composed of 25 ROs, each using 3 NOT gates, generates a stream of bits passing the NIST and DIEHARD tests even without post-processing (in the original Sunar design [3], post-processing was mandatory).
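As an illustration of this structure, the following Python sketch simulates the generator behaviorally: several free-running ROs with jittery phases are each sampled by a flip-flop at the sampling frequency, and the samples are XORed into one output bit. The random-walk jitter model and the value of sigma are simplifying assumptions of ours, not parameters taken from [2,3].

```python
import numpy as np

rng = np.random.default_rng(1)

def ro_trng(n_bits, n_ros=50, f_ro=330e6, f_s=24e3, sigma=1e-3):
    """Behavioral sketch of the RO-based TRNG of Fig. 2.

    Each RO is a free-running clock whose phase accumulates Gaussian
    jitter (relative std `sigma` per RO period, random-walk model);
    sampling each RO at f_s and XOR-ing the samples yields one bit.
    """
    t = np.arange(1, n_bits + 1) / f_s          # sampling instants
    k_step = f_ro / f_s                         # RO periods per sample
    out = np.zeros(n_bits, dtype=np.uint8)
    for _ in range(n_ros):
        f = f_ro * (1 + 0.005 * rng.standard_normal())   # per-RO spread
        jitter = np.cumsum(rng.standard_normal(n_bits)) * sigma * np.sqrt(k_step)
        level = np.floor(2 * (t * f + jitter)).astype(np.int64) & 1
        out ^= level.astype(np.uint8)
    return out
```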

3 Experimental Setup

3.1 TRNG Implementation

The EM attacks were realized on a board featuring an ACTEL Fusion FPGA. The board is dedicated to the evaluation of TRNGs. Special attention was paid to the power supply design, using low-noise linear regulators, and to the design of the power and ground planes. It is important to stress that the board was not specially designed to make EM fault injection or side-channel attacks easier, as is the case for the SASEBO board [4]. As can be seen in Figure 3, the FPGA module was plugged into a motherboard containing the power regulator and the USB interface.

Fig. 2. RO-based TRNG

Fig. 3. Block diagram of the board dedicated to TRNG testing

In order to demonstrate that the EM injection can disturb both the RO and TRNG behavior, we performed attacks on two kinds of implementations:

– The first one was composed of four 3-element ROs. It was aimed at the measurement of the phase difference between the four generated clocks (see Figure 4). This implementation will be called Target#1.
– In the second implementation, depicted in Figure 5, the TRNG core was implemented in the FPGA board under attack. Another board, protected from EM emanations, generated the reference clock signals, read data from the TRNG and communicated with the computer. We decided to separate the communication from the random data generation in order to be sure that it was the TRNG that was faulty, not the communication. The communication module is composed of a serial-to-parallel converter, a FIFO and a USB controller. The USB interface throughput (up to 20 MB/s) was sufficient to handle the bit rate of the TRNG. The FIFO guarantees that no data are lost during the transfer. Two signals were exchanged between the boards: a clock signal coming from the communication board and the random bitstream produced by the TRNG inside the FPGA under attack. These two signals were monitored with an oscilloscope during the attack in order to ensure that their integrity was preserved. This implementation is called Target#2.

Fig. 4. Implementation for the measurement on ROs - Target#1

Fig. 5. TRNG testing architecture - Target#2

We ensured that the ROs were not initially locked due to their placement. In the rest of the paper, the term "locked" has the same meaning as in phase-locked loops (PLLs).
In both cases, the ROs were composed of three inverters (NOT gates), giving working frequencies of about 330 MHz. For Target#2, the TRNG was composed of 50 ROs. A sampling clock of 24 kHz was generated in an embedded PLL. This sampling frequency was chosen in order to make a 2-RO TRNG pass the NIST statistical tests. In general, decreasing the speed of the sampling clock improves the behavior of the TRNG (the jitter accumulation time is longer). Moreover, we used more ROs than Wold and Tan in [2] (50 versus 25). We stress that the TRNG featuring 50 ROs should pass the FIPS and NIST statistical tests under normal conditions without any problems.

3.2 EM Injection Platform

The EM injection platform is presented in Figure 6. The platform embeds a power injection chain supplying the micro-antenna, but also two other chains: one for controlling the whole platform and the other for data acquisition and storage.
The main element of both control and data acquisition chains is a personal
computer (PC), which:
– controls the amplitude and the frequency of the sine waveform signal pro-
vided by the signal generator to the input of the 50 W power amplifier,
– positions the micro-antenna above the IC surface thanks to the XYZ motor-
ized stages,
– collects data provided by the power meter, connected to a bi-directional
coupler, in order to monitor the forward (Pforward ) and reflected (Preflected)
powers,
– sends configuration data to the ACTEL Fusion FPGA and supplies target
boards via USB,
– stores the time domain traces of all signals of interest acquired using the
oscilloscope; in our case, the outputs of the four ROs (Target #1 - Out1 to
Out4 ) and the TRNG output (Target #2).

Fig. 6. Direct power injection platform

Note that according to safety standards, but also in order to limit the noise
during acquisitions, the whole EM injection platform is placed in an EMC table
top test enclosure with a 120 dB RF isolation.
A key element of this platform is the probe that converts electrical energy into a powerful EM field (active attacks). Most micrometric EM probes generally used to characterize the susceptibility of ICs [5] are inductive, composed of a single coil into which a high-amplitude, rapidly varying current is injected. These probes cannot be used in our context. Indeed, reducing the coil diameter to micrometric dimensions (200 μm - 20 μm) implies reducing the coil wire diameter, too. As a result, the amplitude of the current injected into the probe must be reduced to avoid any deterioration of the coil. Consequently, the power that can effectively be injected into such probes was experimentally found to be too small to significantly disturb the behavior of the logic device. After several attempts and prototype versions, we adopted the probe shown in Figure 7. It consists of a thin tungsten rod with a length of 30 mm and a diameter of 200 μm at one end and 10 μm at the other end.

Fig. 7. Unipole micro-probe

This probe predominantly produces an electric field, and we can assume that only this component, at the tip end, couples with the metal tracks inside the IC. Further information about the platform and the effects of EM injection is available in [6,7].

3.3 Attack Description

Inside the EMC table top test enclosure, the probe was located in the close vicinity of the FPGA plastic seal (the FPGA packaging was left intact), i.e. at a distance of roughly 100 μm from the DUT packaging. In order to maximize the impact of the EM injections, the tip of the probe was placed near the ROs implemented inside the FPGA.
– The first set of experiments, realized on Target#1, was aimed at analyzing the influence of the EM injections on the ROs. The EM signal power level Pforward was set successively to [340 nW; 34 μW; 1 mW; 3 mW], in the frequency range [300 MHz – 325 MHz]. With a sampling rate of 20 MS/s, we acquired 10 traces on each of the four oscilloscope channels, in order to record:
• Out1, the signal provided by RO#1, used as a trigger to synchronize the oscilloscope;
• Out2 to Out4, the signals provided by RO#2, RO#3 and RO#4.
Finally, all acquired data were analyzed off-line according to several criteria. One of them is the mutual information; this point is detailed in Section 4.2. Another one (detailed in Section 4.3) is the phase difference between the oscillating signals Out1 and Out3 under EM injection.
– The second set of experiments aimed at studying the behavior of a complete TRNG (Target#2) under EM emanation attacks. For each configuration, the TRNG output bitstream was stored and analyzed with and without EM injection. This latter set of experiments was conducted with a periodic signal of 309.7 MHz. This frequency corresponded to the value maximizing the coupling between the probe and the IC. It was found by analyzing the results of a Discrete Fourier Transform applied to the SPA signal obtained at different EM emanation frequencies. This point is further explained in the next section.

4 Effect of the EM Waves on the ROs - Target #1

4.1 Choice of the Injection Frequency

The frequency of the injected signal determines the success of the attack. Indeed, the coupling between the IC and the probe tip end depends strongly on this parameter.
Our first aim was to find the frequency that would impact the maximum number of ROs. For this reason, the EM injections were realized at different frequencies. More precisely, the frequency was swept over the range [300 MHz - 325 MHz] in steps of 50 kHz. This range was chosen because the oscillating frequencies fROi of all the ROs were measured and found to be spread between 325 MHz and 330 MHz.
During the frequency sweep, we analyzed the evolution of the ratios DFTRi = Yfinj / YfROi, where Yfinj is the amplitude of the spectral decomposition of Outi at the injected frequency and YfROi is the amplitude at fROi. As shown in Figure 8, within this frequency range, all the DFTRi ratios reach their maximum value at around f = 309.7 MHz. For this reason, and also because this frequency maximizes the EM injection effects on all the ROs, it was selected for all the following experiments. Figure 9b illustrates the effect of the EM injection at this frequency. It can be seen that the spectral decomposition of Out1 and Out3 shows a maximum at 309.7 MHz during perturbation signal injection. This maximum is fifteen times higher than the amplitude at fRO1 and fRO3, because the ROs oscillate at the injected frequency. However, this also means that all the ROs (or at least most of them) are mutually locked.
The selected frequency was kept unchanged during the rest of the experiments and also during the specific attacks on the TRNG.
When the RO was not perturbed by an EM injection, only the fundamental frequency composed the signal, and its magnitude was equal to 0.25 (Figure 9a). As a result, the DFT factor was near 0. Then, the EM harmonic signal of 309.7 MHz was injected. The 309.7 MHz harmonic was so strong that it appeared in the DFT, and its amplitude became fifteen times higher than that of the fundamental frequency (Figure 9b). The injected harmonic signal took control of the ROs and of the generated signals.
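For illustration, the DFT ratio can be computed off-line from a digitized trace as in the following sketch (a minimal example of ours; the sampling rate `fs` must be such that both frequency components appear as distinct spectral bins):

```python
import numpy as np

def dft_ratio(trace, fs, f_inj, f_ro):
    """DFT factor DFTR_i = Y(f_inj) / Y(f_RO_i) for one sampled RO output.

    Amplitudes are read from the FFT bins nearest to the two frequencies.
    """
    spectrum = np.abs(np.fft.rfft(trace)) / len(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    amp = lambda f: spectrum[np.argmin(np.abs(freqs - f))]
    return amp(f_inj) / amp(f_ro)
```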
[Figure 8 shows four panels (RO1 to RO4), each plotting Y(finj)/Y(fROi) versus injection frequency (300-325 MHz), with and without injection at maximum Pforward.]

Fig. 8. Discrete Fourier Transform (DFT) factor Yfinj/YfROi vs. injection frequency, after analyzing the signals Out1, Out2, Out3 and Out4

[Figure 9 shows the amplitude spectra |Y(f)| of Out1 and Out3 around 330 MHz: panel a) shows two distinct peaks separated by ΔF, panel b) a single dominant peak at the injection frequency.]

Fig. 9. Discrete Fourier Transform of the signals Out1 and Out3 under: a) normal conditions, b) EM injection at Finj = 309.7 MHz, Pforward = 3 mW

4.2 Proof of Effectiveness

In order to verify that all the ROs were effectively locked, we analyzed the evolution of the mutual information (MI) between the four RO output voltages (Vi(t), Vj(t)) as a function of the injected power Pforward. The MI is a general measure of the dependence between two random variables, and this parameter is often used as a generic side-channel distinguisher [8]. Concerning our experiments, we expected to observe:
– low MI values between Vi(t) and Vj(t) for Pforward = 340 nW, meaning that the ROs were not locked;
– increased MI values for higher Pforward, meaning that the EM injections effectively lock the ROs.

Table 1 shows the MI values at different levels of injection. As expected, the MI values were very low (0.04 bit on average) for Pforward = 340 nW. On the other hand, for Pforward = 3 mW, the MI average increased up to 0.99 bit at f = 309.7 MHz. This clearly demonstrates that the ROs were locked, or at least interdependent. This interdependence was also visible on the oscilloscope thanks to the persistence of the screen. Figure 10 shows the signals Out1 and Out3 obtained without (Figure 10a) and with (Figure 10b) signal injection. As can be seen, under attack the two ROs were synchronized and operated at the same frequency (we observed the same behavior for the other ROs).

Table 1. MI values for selected RO couples obtained at different injection powers

PForward at 309.7 MHz   340 nW    34 µW     1 mW      3 mW
MI(RO#1,RO#2)           0.0267    0.1746    0.5478    1.5729
MI(RO#1,RO#3)           0.0305    0.7697    0.7889    1.1029
MI(RO#1,RO#4)           0.0135    0.2838    0.6747    0.8221
MI(RO#2,RO#3)           0.1055    0.1086    0.3872    0.8379
MI(RO#2,RO#4)           0.0245    0.1332    0.2247    0.6477
MI(RO#3,RO#4)           0.0383    0.3196    0.8053    0.9382
MI average              0.0398    0.2983    0.5715    0.9870
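The paper does not specify which MI estimator was used; one simple possibility, shown below only as an illustration, is a plug-in estimate from a two-dimensional histogram of the two sampled voltages:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of MI in bits between two voltage traces."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                          # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```

Independent traces give values near 0 bit, while locked, strongly dependent traces give values approaching (or, with fine binning, exceeding) 1 bit, matching the trend of Table 1.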

Fig. 10. Subsequent traces in persistent display mode (bold) and mean traces (fine) of Out1 and Out3, corresponding to the ROs' outputs, a) under normal conditions and b) submitted to Pforward = 3 mW of 309.7 MHz EM injection
4.3 Phase Reduction

Under normal conditions, the ROs have different operating frequencies due to different interconnection delays. This is visible in Figure 9a. The difference ΔF = fRO1 − fRO3 produces a linear drift between the rising edges of the RO signals (their positions also depend on the jitter, but compared to ΔF, the jitter impact is smaller).

Fig. 11. a) Phase difference between Out1 and Out3 over time; b) phase distribution

In the case of strong EM harmonic injection, the two ROs are locked on the injection frequency. This is clearly visible in Figure 9b, where the strongest harmonic is the one at the injected frequency. Next, we evaluate the phase difference between the output signals of the two ROs. The evolution of the phase difference between the signals Out1 and Out3 is plotted in Figure 11a. According to the histogram in Figure 11b, the phase is distributed between 222° and 252° and centered around 237°. This gives a range of variation for the phase of 30°. Looking at the phase evolution over time, it follows an almost sinusoidal tendency. As said before, during the harmonic injection, Out1 and Out3 are mainly composed of two frequencies: one coming from the injection itself (finj) and the working frequency of the ring (fRO1 and fRO3, respectively). These two frequencies in the spectrum of each RO produce a beat phenomenon (as defined in acoustics). This beat phenomenon explains the sinusoidal tendency of the phase.
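A phase curve like the one of Figure 11a can be obtained from the two traces, for example via the analytic signal; the text does not state how the figure was computed, so the Hilbert-transform approach below is only one standard option:

```python
import numpy as np
from scipy.signal import hilbert

def phase_difference_deg(out1, out3):
    """Instantaneous phase difference (degrees) between two RO outputs."""
    phi1 = np.unwrap(np.angle(hilbert(out1)))
    phi3 = np.unwrap(np.angle(hilbert(out3)))
    return np.degrees(phi1 - phi3) % 360.0
```

Two locked signals with a residual beat show a phase difference oscillating inside a narrow range, here between 222° and 252°.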

5 Effect of the EM Waves on the TRNG - Target #2

5.1 Impact of the RO Dependence on the Random Bitstream

The TRNG output bitstream produced for several levels of EM injection power is shown in Figure 12. Each sample is composed of 120 successive 32-bit frames (black and white squares correspond to 1 and 0, respectively). Under normal conditions (Figure 12a), the TRNG bitstream passed the NIST statistical tests with 1 Gb of data (1000 sequences of 1 Mb). It is recommended and common to evaluate a bitstream starting with the frequency test (also called the monobit test), which evaluates the balance between the number of ones and zeros in the bitstream. If this test does not pass, it is not reasonable to continue with the other tests.
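The frequency (monobit) test reduces to one line of arithmetic; a sketch following the NIST SP 800-22 definition (our code, not the authors') is:

```python
import math

def monobit_pvalue(bits):
    """NIST SP 800-22 frequency (monobit) test; pass if p-value >= 0.01."""
    s = sum(2 * b - 1 for b in bits)          # map 0/1 to -1/+1 and sum
    return math.erfc(abs(s) / math.sqrt(2 * len(bits)))
```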

a) No injection b) PForward 210 uW c) PForward 260 uW d) PForward 300uW

Fig. 12. Bitstream produced by the TRNG under different attack powers at 309.7 MHz
using electric probe (120x32) - Starting from left to right: a) No injection b) PForward
= 210 µW c) PForward = 260 µW d) PForward = 300 µW

Table 2. Statistical parameters of the TRNG output bitstream

PForward     No injection   210 µW    260 µW    300 µW
Bias (%)     0.1            15.87     51.57     55
NIST tests   SUCCESS        FAIL      FAIL      FAIL

In Table 2, the bias is defined as Bias = abs(0.5 − P(0)) = abs(0.5 − P(1)), where P(x) is the probability of the element x. The bias can vary between 0 and 0.5. It is usually reported in %, after rescaling so that bias values of 0 and 0.5 correspond to 0% and 100%. We will use this bias representation in the rest of the paper. A good TRNG must have a bias close to 0%.
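In code, the definition and the rescaling to percent read as follows (an illustrative one-liner of ours):

```python
def bias_percent(bits):
    """Bias = |0.5 - P(1)|, rescaled so that a bias of 0.5 maps to 100 %."""
    p1 = sum(bits) / len(bits)
    return abs(0.5 - p1) * 200.0
```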
According to Figure 12 and Table 2, the effect of the EM injection on the bias is clear. For example, for a power of 210 μW (Figure 12b), the bias reaches about 16% (roughly 16 bits out of 100 bits of the bitstream are impacted by the signal injection). Increasing the injected power up to 260 μW, the bias rises above 50% (Figure 12c and Figure 12d).

5.2 Control of the Bias

The previous experiments confirmed that it is possible to statically control the bias of a RO-based TRNG. In the next experiments, we wanted to observe the dynamic behavior of the TRNG under attack. We added an amplitude modulator (AM) between the RF generator and the input of the power amplifier. This system performs the analog multiplication of the injection signal (a sine waveform fixed at 309.7 MHz, the active harmonic needed to perform the attacks) and a square waveform signal (the control signal), which accurately controls the beginning and the end of the EM injection. The control signal is provided by an external FPGA in order to deliver a desired injection timing sequence. Figure 13a represents the timing evolution of the AM signal in volts. Figure 13b shows the effect on the TRNG output bitstream. Finally, Figure 13c represents the evolution of the bias over time. It was computed using a sliding window of 10 000 bits, with a sliding step of 32 bits.
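A bias curve like the one of Figure 13c can be computed with a sliding window over the recorded bitstream; the sketch below uses the window and step sizes quoted in the text (the cumulative-sum trick is simply an efficiency choice of ours):

```python
import numpy as np

def sliding_bias(bits, window=10_000, step=32):
    """Bias (%) over a sliding window of the output bitstream."""
    bits = np.asarray(bits, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(bits)))
    starts = np.arange(0, len(bits) - window + 1, step)
    p1 = (csum[starts + window] - csum[starts]) / window
    return np.abs(0.5 - p1) * 200.0
```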

Fig. 13. a) AM signal; b) TRNG stream of bits (raster scanning from bottom to top and left to right); c) bias in % for the TRNG stream of bits

Looking at the bitstream or at the bias, it is clear that the behavior of the TRNG is quickly (in less than 1 ms) impacted by the EM perturbation and that it returns to its initial state with the same speed. In fact, we observed that the bias was changing according to the dynamics of the power amplification chain, which in our case has a time response of roughly 1 ms. The difference in the bias between the different attack periods is due to the fact that the response time of the power amplifier is not adapted to operation in AM mode. This experiment makes clear that dynamic EM harmonic injection is feasible and that it can be very powerful and able to control the behavior of a RO-based TRNG, even one composed of a large number of ROs. The dynamic control of the EM harmonic injection is of paramount importance, because it can be used to bypass embedded statistical tests launched periodically.
Fig. 14. a) AM signal; b) There might be something written in this stream of bits; c) bias in % for the TRNG stream of bits

In order to demonstrate further capabilities of the proposed EM attack, a complex square waveform signal was generated by an external FPGA to replace the 1 Hz signal previously used to modulate the injected frequency; the carrier frequency was kept at 309.7 MHz. In order to implement this experiment while keeping the same amplifier in the injection chain, we had to decrease the sampling frequency of the TRNG from 24 kHz to 500 Hz. This modification had an impact on the output bit rate of the TRNG, but not on its capability to produce a good-quality random bitstream that passes the tests. The control sequence was arranged in such a way that we obtained the bitstream shown in Figure 14. As can be seen, the word COSADE appears in the stream of bits. This definitively demonstrates that EM harmonic injection constitutes an important threat for RO-based TRNGs.

6 Conclusion

In this paper, an active EM attack on RO-based TRNGs was presented. The experimental setup was first described, with details about the EM harmonic injection platform and the DUTs. The first study, on the behavior of the source of entropy of the TRNG, i.e. the set of ROs, showed the efficiency of EM emanations in controlling the behavior of the ROs by locking them on the injected signal, depending on the power and the frequency of the injected signal. In a second experiment, realized on a 50-RO Wold's TRNG implemented in an FPGA, we demonstrated that it is possible to dynamically control the bias of the TRNG output.
Acknowledgments. The work presented in this paper was realized in the frame
of the EMAISeCi project number ANR-10-SEGI-005 supported by the French
”Agence Nationale de la Recherche” (ANR).

References
1. Markettos, A.T., Moore, S.W.: The Frequency Injection Attack on Ring-Oscillator-
Based True Random Number Generators. In: Clavier, C., Gaj, K. (eds.) CHES 2009.
LNCS, vol. 5747, pp. 317–331. Springer, Heidelberg (2009)
2. Wold, K., Tan, C.H.: Analysis and Enhancement of Random Number Generator in
FPGA Based on Oscillator Rings. In: International Conference on Reconfigurable
Computing and FPGAs (ReConFig 2008), pp. 385–390 (2008)
3. Sunar, B., Martin, W.J., Stinson, D.R.: A Provably Secure True Random Num-
ber Generator with Built-In Tolerance to Active Attacks. IEEE Transactions on
Computers 56(1), 109–119 (2007)
4. AIST, Side-channel Attack Standard Evaluation Board (SASEBO),
http://staff.aist.go.jp/akashi.satoh/SASEBO/en/index.html
5. Dubois, T., Jarrix, S., Penarier, A., Nouvel, P., Gasquet, D., Chusseau, L., Azais,
B.: Near-field electromagnetic characterization and perturbation of logic circuits.
In: Proc. 3rd Intern. Conf. on Near-Field Characterization and Imaging (ICONIC
2007), pp. 308–313 (2007)
6. Poucheret, F., Tobich, K., Lisart, M., Robisson, B., Chusseau, L., Maurine, P.:
Local and Direct EM Injection of Power into CMOS Integrated Circuits. In: Fault
Diagnosis and Tolerance in Cryptography, FDTC 2011 (2011)
7. Poucheret, F., Robisson, B., Chusseau, L., Maurine, P.: Local ElectroMagnetic Cou-
pling with CMOS Integrated Circuits. In: International Workshop on Electromag-
netic Compatibility of Integrated Circuits, EMC COMPO 2011 (2011)
8. Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F.X., Veyrat-Charvillon,
N.: Mutual Information Analysis: A Comprehensive Study. Journal of Cryptology,
1–23 (2010)
A Closer Look at Security in Random Number Generators Design

Viktor Fischer

Laboratoire Hubert Curien, UMR 5516 CNRS, Jean Monnet University, Member of University of Lyon, Rue du Prof. Benoit Lauras 18, 42000 Saint-Etienne, France
http://laboratoirehubertcurien.fr/spip.php?rubrique29

Abstract. The issue of random number generation is crucial for the implementation of cryptographic systems. Random numbers are often used in key generation processes, authentication protocols, zero-knowledge protocols, padding, in many digital signature and encryption schemes, and even in some side-channel attack countermeasures. For these applications, security depends to a great extent on the quality of the source of randomness and on the way this source is exploited. The quality of the generated numbers is checked by statistical tests. In addition to good statistical properties of the obtained numbers, the output of the generator used in cryptography must be unpredictable. Besides the quality and unpredictability requirements, the generator must be robust against aging effects and intentional or unintentional environmental variations, such as temperature, power supply, electromagnetic emanations, etc. In this paper, we discuss practical aspects of true random number generator design. Special attention is given to the analysis of the security requirements and to the way these requirements can be met in practice.

Keywords: Random number generation, cryptographic hardware, data security, statistical tests, digital design.

1 Introduction

Random number generators (RNGs) are one of the basic cryptographic primitives used to design cryptographic protocols. Their applications include, but are not limited to, the generation of cryptographic keys, initialization vectors, challenges, nonces and padding values, and the implementation of countermeasures against side-channel attacks. RNGs aimed at cryptographic applications must fulfill basic security requirements. First of all, their output values must have good statistical properties and be unpredictable. In modern designs, some additional features are required: the generator must be inherently secure, robust and resistant to attacks, and/or tested online using generator-specific tests.
The security of cryptographic systems is mainly linked to the protection of confidential keys. In high-end information security systems used in an uncontrolled environment, cryptographic keys should never be generated outside
the system and they should never leave the system in clear. For the same reason, if the security system is implemented in a single chip (a cryptographic system-on-chip), the keys should be generated inside the same chip. The implementation of random number generators in logic devices (including configurable logic devices) is therefore of paramount importance.
There are three basic challenges in modern embedded TRNG design: (i) finding a good-quality source of randomness (available in digital technology); (ii) finding an efficient and robust principle of randomness extraction; (iii) guaranteeing security (e.g. by a robust design or by efficient online testing).
Historically, three basic RNG classes are used in cryptography: deterministic, nondeterministic (physical) and hybrid random number generators.
Deterministic (pseudo-) random number generators (DRNGs) are mostly fast and have good statistical properties. They are usually used as key generators in stream ciphers. Since they are based on an underlying algorithm, DRNGs are easy to implement in logic devices. However, if the algorithm is known, the generator output is predictable. Even when the algorithm is not known but some of the generator output sequences have been recorded, the generator's behavior during the recorded sequence can be used in future attacks.
Physical (true-) random number generators (TRNGs) use physical processes
to generate random numbers. If the underlying physical process cannot be con-
trolled, the generator output is unpredictable and/or uncontrollable. The final
speed of TRNGs is limited by the spectrum of the underlying physical phe-
nomenon and by the principle used to extract entropy from it (e.g. sampling fre-
quency linked with the noise spectrum). The statistical characteristics of TRNGs
are closely related to the quality of the entropy source, but also to the random-
ness extraction method. Because physical processes are subject to fluctuations,
the statistical characteristics of TRNGs are usually worse than those of DRNGs.
Hybrid random number generators (HRNGs) represent a combination of a
(fast and good quality) deterministic RNG seeded repeatedly by a (slow but
unpredictable) physical RNG. The designer has to find a satisfactory compromise
between the speed of the generator and its predictability (by adjusting the time
interval between seeds and the size of a seed).
TRNGs are the only cryptographic primitives that have not been subject to standardization up to now. However, before the generator is used in practice, its principle and its implementation inside a cryptographic module have to be validated by an accredited institution as part of a security evaluation process. Generators that do not have a security certificate are considered to be insecure in terms of their use in cryptographic applications. Many TRNG designs exist, but only a few of them deal with security. In this paper, we will focus on security aspects of TRNG design.
The paper is organized as follows. In Sec. 2, we briefly present the basic approaches to TRNG design. In Sec. 3, we present and discuss basic TRNG design evaluation criteria, and in Sec. 4 we analyze TRNG security requirements in detail. In Sec. 5, we sum up basic requirements for future secure TRNG designs. We conclude the paper in Sec. 6.
2 TRNG Design

TRNG design styles have evolved significantly in the past few years. In the classical approach (see Fig. 1a), designers usually proposed some (new) principle reflecting the required design constraints, such as area, throughput and/or power consumption. In the development phase, they commonly used the FIPS 140-1 [9] or FIPS 140-2 statistical tests for verifying the quality of the generated bitstream, because these simple tests need only short data files and give a good quality estimation. In order to validate the final quality of the generated bitstream, the designer tested the generated data using standard statistical test suites like NIST SP 800-22 [20] or DIEHARD [19].
Even though statistical tests are required to evaluate the quality of the generated sequence, they cannot distinguish between pseudo-random data generated by a deterministic generator and truly random data from a physical TRNG. This was one of the reasons why the German BSI (Bundesamt für Sicherheit in der Informationstechnik) proposed in 2001 a new methodology aimed at the evaluation of physical random number generators. The AIS 31 methodology [15] defined several RNG classes and their security requirements. It was updated in 2011, and new RNG classes were defined [16].

[Figure 1 contrasts two block diagrams: (a) a classical TRNG producing its output directly; (b) the BSI AIS approach, where a digitized noise source produces a raw binary signal, followed by arithmetic and cryptographic post-processing, with embedded tests raising an alarm.]

Fig. 1. Classical (a) and German BSI's (b) approach in TRNG design

According to the TRNG evaluation methodology proposed by the BSI (see Fig. 1b), the generator should use an uncontrollable physical process as a source of randomness. Since the physical phenomena used in TRNGs are mostly analog processes, some method enabling data conversion from the analog to the digital domain (as a part of the randomness extraction procedure) is usually necessary.
The obtained unprocessed raw binary signal (the so-called digital noise) can have low entropy and/or bad statistical properties (e.g. it can be biased). In this case, some post-processing algorithms can be used to enhance the statistical parameters of the output bitstream. While algorithmic post-processing is optional, the subsequent cryptographic post-processing can be strictly required depending on the targeted security level. Cryptographic post-processing plays a very important security role if the source of randomness fails: (i) it can serve temporarily as a DRNG; (ii) according to the application security level, it should guarantee TRNG unpredictability in the forward, backward or both directions.
Since the cryptographic algorithm implemented in the post-processing block
behaves as a DRNG when a true randomness fails, the latest AIS methodology
[16] merges evaluation of true random number generators and pseudorandom
number generators into a common evaluation procedure and introduces new
RNG subclasses (see Tab. 1): Physical TRNG (PTG.1 and PTG.2), Hybrid
physical TRNG (PTG.3), Deterministic RNG (DRG.1, DRG.2 and DRG.3),
Hybrid deterministic RNG (DRG.4) and Non-physical TRNG (NTG).

Table 1. New AIS RNG classes

RNG Class   AIS20/AIS31 Class      Comments
PTG.1       AIS31, P1              Physical TRNG with an internal total failure test of the entropy source and tests of non-tolerable statistical defects of the TRNG output
PTG.2       AIS31, P2              PTG.1 + a stochastic model of the entropy source and statistical tests of the raw binary signal
PTG.3       No counterpart         PTG.2 + cryptographic post-processing (hybrid PTRNG)
DRG.1       AIS20, K2, partly K3   DRNG with forward secrecy
DRG.2       AIS20, K3              DRG.1 + backward secrecy
DRG.3       AIS20, K4              DRG.2 + enhanced backward secrecy
DRG.4       No counterpart         DRG.3 + enhanced forward secrecy (hybrid DRNG)
NTG.1       No counterpart         Non-physical TRNG with entropy estimation

TRNG output post-processing can sometimes mask serious faults that standard statistical tests may fail to detect. Therefore, the unprocessed digital noise must be tested in the classes with higher security requirements (PTG.2 and PTG.3). The dedicated tests should suit the generator's principle, with particular reference to its potential weaknesses, and should be executed on the fly.

3 TRNG Design Evaluation Criteria

True random number generators use different sources of randomness and numerous principles to extract it. TRNG designs (not TRNG implementations!) can be evaluated using three classes of criteria [1]: (i) characteristics related to the TRNG principle; (ii) design-related characteristics; and (iii) security-related characteristics.

3.1 Criteria Related to the TRNG Principle

This set of parameters determines the main characteristics of the generator. It includes parameters like the source of randomness, the method of randomness extraction, the post-processing algorithms, and the output bit rate and its stability.
Source of Randomness
Logic devices are designed for the implementation of deterministic logic systems. Any unpredictable behavior in such a system (caused by metastability, clock jitter, radiation errors, etc.) can have catastrophic consequences for the behavior of the overall system. For this reason, vendors of logic devices tend to minimize these effects. As a consequence, a TRNG design should always be critically examined in order to keep up with the evolution of the underlying technology.
Most logic devices do not contain analog blocks, so the sources of randomness
are related to the operation of logic gates. Analog physical phenomena (like
thermal, shot and flicker noise) are transformed to time domain instability of
logic signals [13]. This can be seen as a variation in the delay of logic gates,
analog behavior of logic gates between two logic levels (e.g. metastability) [18],
[14] or randomness in two concurrent writings to RAM memory blocks [12], [11].
The instability of gate delays causes signal propagation variations over time.
These variations can be seen as a clock period instability (the jitter) in clock
generators containing delay elements assembled in a closed loop (ring oscillators).
The variation in propagation time is also used in generators with delay elements
in an open chain assembly [7].
Some generators use the tracking jitter introduced by phase locked loops
(PLLs) available in digital technology [10].
Method of Randomness Extraction
In general, random numbers can be obtained in two ways: sampling random
signals at regular time intervals or sampling regular signals at random time
intervals. In synchronous systems, the first method is preferable in order to
guarantee a constant bit rate on the output. In logic devices, randomness is often
extracted by sampling a jittery (clock) signal using synchronous or asynchronous
flip-flops (latches) and a reference (clock) signal.
The choice between synchronous and asynchronous flip-flops does not seem
to be important in ASICs, but it is very important in FPGAs. This is because
synchronous flip-flops are hardwired in logic cells as optimized blocks and their
metastable behavior is consequently minimized. On the other hand, latches can
usually only be implemented in look-up tables (LUTs) and are therefore subject
to metastable behavior to a greater extent [7].
Other ways of extracting randomness are: (i) counting the number of random events [28], or (ii) counting the number of reference clock periods in a randomly changing time interval [26].
The randomness extraction method is usually linked to the basic principle of
the generator and to the source of randomness. The randomness extraction pro-
cedure and post-processing are sometimes merged into the same block and cannot
be separated [24]. In that case, the entropy of the randomness source is masked
by post-processing and cannot be evaluated or tested correctly.
Arithmetic Post-processing of the Raw Binary Signal
The entropy source may have some weaknesses that lead to the generation of non-random numbers (e.g. long sequences of zeros or ones). In this case, post-processing may be necessary to improve the statistical properties of the random numbers, for example to increase the entropy per bit, or to reduce bias and/or correlation.
The quality of the digital noise signal (the signal obtained at the output of the randomness extraction block) can deteriorate for several reasons: (i) the entropy of the source is not high enough (this can be the case if metastability is used as a source of randomness); (ii) the entropy, which is high in the original signal, is not efficiently extracted; (iii) the extracted samples are correlated. The entropy per bit at the output of the generator is mostly increased at the cost of a reduction and/or variation of the bit rate. Most arithmetic post-processing methods use some data compression technique in order to increase the entropy per bit at the generator's output.

Cryptographic Post-processing
This kind of post-processing uses both the diffusion and confusion properties of cryptographic functions. The perfect statistical characteristics of most encryption algorithms can be used to mask generator imperfections. One of the advantages of this approach is that the encryption key can be used as a cryptographic variable to dynamically modify the behavior of the generator. Although this kind of post-processing block (the cipher) is rather complex and expensive, the TRNG can reuse (share) the cipher that is used for data encryption.
One of the most expensive (in time and area) but also one of the most secure methods is cryptographic post-processing based on hash functions. It uses the diffusion and one-wayness properties of hash functions (as opposed to encryption of the raw binary signal) to ensure the unpredictability of the bits generated by the TRNG if a total breakdown of the noise source occurs. In this case, due to the non-linearity of hash functions, the TRNG will behave like a cryptographically secure DRNG.
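As an illustration of hash-based post-processing, the sketch below compresses each 64-byte chunk of raw digital noise to a 32-byte SHA-256 digest; the 2:1 compression ratio and the hash choice are our assumptions, to be dimensioned from the entropy of the raw signal in a real design:

```python
import hashlib

def hash_postprocess(raw: bytes, block: int = 64) -> bytes:
    """Compress raw digital noise with SHA-256 (illustrative sketch)."""
    out = bytearray()
    for i in range(0, len(raw) - block + 1, block):
        out += hashlib.sha256(raw[i:i + block]).digest()
    return bytes(out)
```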

Output Bit Rate and Its Stability
Speed is a secondary parameter (after security) in many cryptographic applications. Output bit rates from a hundred kilobits per second up to 1 megabit per second are usually sufficient. However, there are some speed-critical data security applications for which high-speed generators are required. For example, quantum cryptography requires a high bit rate (up to 100 megabits per second) because of the very low efficiency of key data transmission over the low-power optical channel.
High-speed telecommunication servers can be given as a second example. They need to generate session keys regularly and at a high frequency (tens of megabits per second). For example, a 10-Gbit Ethernet hub/server would need at least 20 Mbit/s of random bits to generate one 128-bit session key for each 64 kB data block (giving 4k enciphered blocks per key) in order to be able to face side-channel attacks.
Another aspect of the output bit rate that has to be considered is its regularity. Some generators deliver random numbers periodically, while others generate output at irregular time intervals. In the second case, a FIFO is required to accumulate the generated numbers. Another solution is to estimate the smallest bit rate available at the output and to sample the output at this rate. The disadvantage of the first solution is that, depending on the mean output bit rate and the need for random numbers, the FIFOs sometimes need to be very big. The disadvantage of the second solution is that, if the estimated bit rate is incorrect, the random numbers may not always be available at the output.

3.2 Criteria Related to the TRNG Design

Resource Usage
To evaluate the practical usefulness of various TRNG principles, it is important to analyze the kind and number of resources needed for the hardware implementation of the generator. Of course, the FPGA technology is more restrictive than its ASIC counterpart. In FPGAs, designers can use: LUT-based or multiplexer-based logic cells, embedded memory blocks, clock blocks featuring PLLs and DLLs, embedded RC oscillators, hardwired multipliers, programmable interconnections, etc.
FPGAs have many logic cells, so the use of logic cells (the logic area) is usually not a problem. However, the topology and the electrical parameters of the programmable interconnections are strongly technology dependent. Many TRNG designs require the designer's manual intervention during placement and routing (P/R). Some designs can easily be implemented in one FPGA family, but could be difficult or impossible to implement in others. The choice and the number of embedded hardwired blocks is usually much more limited (PLLs, RC oscillators, multipliers, memory blocks) and varies with the vendor and the technology. The use of hardwired blocks can thus be a limiting factor for the reusability of the TRNG principle.

Power Consumption
The power consumption of the generator is linked to its randomness source (e.g. the oscillator), to the clock frequency used and to the agility of the post-processing algorithm. In power-critical applications, the generator can be stopped when not in use. However, the possibility of stopping the bitstream generation can be used to attack the generator.

Technological Requirements
Compared to the implementation of TRNGs in ASICs, their implementation in FPGAs is much more restricted. Many TRNGs implemented in ASICs use analog components to generate randomness (e.g. chaos-based TRNGs using analog-to-digital converters, free-running oscillator based generators using thermal noise from diodes and resistors, etc.) and to process randomness (e.g. operational amplifiers, comparators, etc.).
Most of these functional blocks are usually not available in digital technology and especially in FPGAs, although some of them may be available in selected families, e.g. RC oscillators in the Microsemi (Actel) Fusion FPGA, and analog PLLs in most Altera and Actel families but not in old Xilinx families. From the point of view of feasibility, some generators are not feasible or are difficult to implement in FPGAs, some are feasible in selected FPGAs, and the most general principles are feasible in all FPGAs.

Design Automation Possibilities
The fact that a generator uses resources that are available in a given technology does not automatically mean that it can be implemented in that technology. The range of tolerance of some technology parameters can be such that it prevents reliable implementation of the generator. This is especially true in FPGA technology.
The parameter that limits generator implementation in FPGAs is the availability of routing resources and their characteristics. Some generators require perfectly balanced routing. This necessitates perfect control of the module placement (e.g. symmetrical placement of two modules in relation to another module) and routing. While most FPGA design tools allow precise control of placement, the routing process is difficult or impossible to control (e.g. in the Microsemi family). Even when routing can be partially or totally controlled (e.g. in the Altera and Xilinx families), the delays in the configurable routing net vary so much from device to device that it is impossible to balance module interconnections in a general manner, and the design will be device dependent, i.e. it has to be balanced manually for each device. Such manual intervention is not acceptable from the point of view of the practical implementation of the generator.
The best generators (very rare) can be mapped automatically (without manual intervention) in all FPGA families. From a practical point of view, the implementation of a generator that requires manual P/R for each family and/or type of device remains acceptable. However, generators that require manual optimization for each device are not tolerable in industrial applications.

3.3 Criteria Related to the TRNG Security

Robustness, Resistance against Attacks
Besides defining the compression ratio, the entropy bound given by the statistical model can be used for the security evaluation of the generator. Namely, it can help in estimating the robustness of the generator against intentional or unintentional environmental variations. Concerning attacks and the resistance against them, there are three possibilities: (i) a proof exists that the generator cannot malfunction as the result of any attack or of a changing environment (proof of security); (ii) neither a security proof nor an attack exists; (iii) some attack on the particular generator has been reported.

Existence of a Statistical Model and Its Reliability
The randomness of the generated bitstream can be characterized by the entropy increase per bit at the generator output. Unfortunately, entropy is a property of random variables and not of observed realizations (random numbers). In order to quantify the entropy, the distribution of the random variables must be analyzed, e.g. by the use of a stochastic model.
Stochastic models are different from physical models. Figure 2 depicts the mechanical principle of metastability (which is useful for understanding metastability in electronics). In this case, the physical model of metastability would describe the form of the hill, while the stochastic model would describe the probability distribution of the ball's final position according to the form and the width of the hill. In general, stochastic models are easier to construct.
The stochastic model must describe only the random process that is actually used as the source of randomness. The metastability in Fig. 2 is related to the ability of the ball to stay on the top of the hill for a random time interval. It is clear that it is very difficult (but not completely impossible) to place and maintain the ball on the top. However, it is completely impossible to place it periodically exactly at the top within small time periods (in order to increase the bit rate), as is supposed to be done in papers presumably using metastability, e.g. in [18].
The stochastic model serves for estimating the lower entropy bound. This
value should be used in the design of the arithmetic post-processing block: the
lower entropy bound determines the compression ratio necessary for increasing
the entropy per output bit to a value close to 1. It can also be used for testing
the entropy of the generated random bits in real time (online tests).
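To make the relationship concrete, the following minimal Python sketch (our illustration, not part of any standard) computes the Shannon entropy per raw bit of a biased source and the idealized n-to-1 compression ratio needed to bring the entropy per output bit close to 1; it assumes independent, identically distributed raw bits and entropy that simply accumulates under compression, which real stochastic models treat more carefully:

import math

def shannon_entropy_per_bit(p_one):
    # Shannon entropy (in bits) of a binary source emitting ones
    # with probability p_one.
    if p_one <= 0.0 or p_one >= 1.0:
        return 0.0
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))

def required_compression_ratio(h_raw, h_target=0.997):
    # Smallest n such that n raw bits carry at least h_target bits of
    # entropy, i.e. the idealized n-to-1 compression ratio needed to
    # bring the entropy per output bit close to 1.
    return math.ceil(h_target / h_raw)

h = shannon_entropy_per_bit(0.45)        # slightly biased raw signal
print(h, required_compression_ratio(h))  # ~0.9928 -> ratio 2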

Fig. 2. Mechanical (physical) model of metastability (legend: MSS – metastable state; SSx – stable state x; the metastability range lies at the top of the hill between the stable states SS0 and SS1)

Inner Testability
Inner testability means that the generator structure enables evaluation of the
entropy of the raw binary signal (if it is available) [6]. Indeed, in some designs,
randomness extraction and post-processing are merged into the same process and
the unprocessed random signal (the raw binary signal) is not available. Even if
this signal is available, it is sometimes composed of a pseudo random pattern
combined with a truly random bit stream [4].
The pseudo random pattern makes statistical evaluation of the raw signal more difficult. For this reason, we propose a new testability level: absolute inner testability. The raw binary signal of a generator featuring absolute inner testability does not include a pseudo random pattern and contains only a truly random bit stream. If (for some reason) the source of randomness fails, the raw signal of the generator will be zero. This fact can be used to detect the generator's total failure very quickly and easily.
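As a toy illustration (a sketch under the stated assumption that a failed source yields a constant-zero raw signal), the total failure test can then be an all-zero check on a short window of the raw signal; the window length would in practice be justified by the stochastic model:

def total_failure_alarm(raw_bits, window=64):
    # Alarm if the last `window` raw bits are all zero. For a generator
    # with absolute inner testability, a dead source of randomness
    # yields a constant-zero raw binary signal, so this trivial check
    # detects a total failure almost immediately (false-alarm
    # probability ~2^-64 for an ideal source with window=64).
    recent = raw_bits[-window:]
    return len(recent) == window and all(b == 0 for b in recent)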

3.4 TRNG Design Evaluation – Conclusions

The TRNG characteristics discussed in Sec. 3 are not all equally important. Security parameters like robustness, availability of a stochastic model, testability, etc. always take priority in a data security system. Their weight in TRNG evaluation is much higher than that of other parameters like power consumption, bit rate, etc. For this reason, we analyze these criteria in more detail and give some practical recommendations in the next section.

4 Main Security Issues in Published TRNG Designs

The output of a good TRNG should be indistinguishable from the output of an ideal TRNG, independently of operating conditions and time. The quality of the generator output bit stream and its security parameters, including robustness against aging, environmental changes and attacks, as well as the existence of self-tests and online tests, are very important in the TRNG design.

4.1 Sensitivity of the TRNG to Variations of Operating Conditions

The quality of the generator output is tightly linked to the quality of the source of randomness and to the randomness extraction method used. The physical characteristics of the source of randomness (e.g. its frequency spectrum) and the randomness extraction method determine the principal parameters of the generated bit stream: the bias of the output bit stream, the correlation between subsequent bits, visible patterns, etc. While some of these faults can be corrected by efficient post-processing, it is better if the generator inherently produces a raw bit stream of good quality.
It is of extreme importance that the generator is dimensioned for the minimum amount of random physical quantities (noise, jitter, etc.) that cannot be further reduced. Thermal noise can be considered as such a source of entropy. However, the total noise in digital devices is mostly a composition of random noises (such as thermal noise, shot noise, flicker noise, etc.) coming from global and independent local sources, but also of data-dependent deterministic noises that can very often be manipulated.
If the extractor samples the source of randomness too fast, adjacent bits could be correlated. For this reason, it is good practice to check the generated bit stream for short-term autocorrelation. It is also possible that the digital noise exhibits some other short-term dependencies, which need to be detected by generator-specific tests.
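Such a check can be sketched in a few lines of Python (the statistic only; a real test would compare the value against significance bounds derived for the sample size):

def lag_autocorrelation(bits, k=1):
    # Sample autocorrelation of the bit stream at lag k, with bits
    # mapped from {0, 1} to {-1, +1}. For an ideal TRNG the value is
    # close to 0 for every small k; sampling the randomness source too
    # fast shows up as a large lag-1 value.
    x = [2 * b - 1 for b in bits]
    n = len(x) - k
    return sum(x[i] * x[i + k] for i in range(n)) / n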
The behavior of the generator is often influenced by external and/or internal
electrical interferences. The most obvious effect of this will be discrete frequencies
from the power supply and from various internal signals appearing in the noise
spectrum.
The spectrum of the generated noise signal can be significantly influenced
by low-frequency 1/f noise caused by semiconductors. Furthermore, the high frequencies of the noise spectrum may be unintentionally filtered out by internal capacitances. A presumably white Gaussian noise will thus have a limited spectrum that is not uniform.
Some generators can feature so-called bad spots: short time periods during which the generator ceases to work, due to some electrical interference or to extreme excursions of the generator's overloaded circuitry.
Another dangerous feature of the generator can be a back door, which refers
to the deviations from uniform randomness deliberately introduced by the man-
ufacturer. For example, let us suppose that instead of using some physical pro-
cess, the generator would generate a high quality pseudo random sequence with
a 40-bit seed. It would be impossible to detect this behavior by applying stan-
dard statistical tests on the output bit stream, but it would be computationally
feasible for someone who knows the back door to guess successive keys.
When implementing a TRNG as part of a cryptographic system on chip, designers must take into account that the circuitry surrounding the generator will influence the generator's behavior through data-dependent noise present in the power lines and through cross-talk. This impact is not so dangerous if two conditions are fulfilled: (i) the lower entropy bound estimation of the generator does not include the digital noise from the system-on-chip activity; (ii) embedded online tests verify continuously that the effective entropy does not fall below this bound.
Very few papers evaluate the impact of the environment on the source of randomness and on the operation of the TRNG. The generator uses all the sources contributing to the selected phenomenon. For example, the clock jitter is determined by local noise sources, but also by global sources such as the power supply, electromagnetic emanations, etc. If the lower entropy bound was estimated for the sum of the noise sources, it is sufficient for the attacker to put the generator into ideal conditions (low-noise battery power supply, metallic shielding) in order to reduce the entropy below the estimated lower bound.
The generator's design must be evaluated under changing environmental conditions (temperature, electromagnetic emanations, etc.). It must be tested, and the embedded tests validated, for edge values (only one parameter is set to its maximal value) and corner values (several or all parameters are set to their critical values) of the environmental parameters.
Recently, we developed a set of evaluation boards (modules) aimed at fair TRNG benchmarking [5]. Five modules using five different FPGA families are available: Altera Cyclone III, Altera Arria II, Xilinx Spartan 3, Xilinx Virtex 5 and Microsemi Fusion. All the modules have the same architecture featuring the selected FPGA device, a linear power supply, two LVDS outputs for external jitter measurement and optionally 32 Mbits of external RAM for fast data acquisition. The modules are plugged into a motherboard containing linear power supplies (the card can also be powered by battery) and a USB interface control device from Cypress. The modules are accessible remotely on demand and can be used for a fair evaluation of TRNG designs under the same working conditions. The next generation will be placed in an electromagnetic shielding and will communicate with a PC via optical fibers.

4.2 Security Threats Related to Statistical Models and Entropy Estimators

Very few recent designs deal with stochastic models [23], [2], [3], [22], [8], [28]. The most comprehensive model of a two-oscillator based TRNG is presented in [2]. It characterizes randomness in the frequency domain. However, the underlying physical hypotheses (clock jitter as a one-dimensional Brownian motion) must still be thoroughly evaluated.
A stochastic approach (an urn model) based on a known jitter size is presented by Sunar et al. in [23]. Unfortunately, it is based on several unrealistic assumptions criticized by Dichtl in [8]. Some of these assumptions, such as jitter overestimation (due to jitter measurement outside the device using standard input/output circuitry), can be corrected by using differential oscilloscope probes in combination with LVDS device outputs [25]. Unrealistic requirements imposed on the XOR gate were later resolved by Wold and Tan in [30].
However, the most security-critical assumption of Sunar et al. turned out to be the mutual independence of the rings (the basic assumption for the validity of the model). It was shown in [4] that the rings are not independent and that up to 25% of them can be mutually locked. This phenomenon significantly reduces the validity of Sunar et al.'s model and consequently the entropy estimation and the security of the generator.
It is worth mentioning that Wold and Tan made another security-critical step: since (after changing the original TRNG structure) the raw binary signal at the XOR gate output passed statistical tests more easily, they deduced that the entropy was sufficient (without measuring the jitter) and consequently reduced the number of rings considerably (from 114 to 25). From the security point of view, this step is not acceptable, since it causes a significant entropy reduction (according to the model, only a few urns are filled).
The models presented in [3] are restricted to TRNGs based on coherent sampling [17], [26], [10]. However, these models have only limited practical value, because the first TRNGs in [17] and [26] have some technological limits (the difficulty of setting up the periods of the generated clock signals precisely) and the PLL-based TRNG from [10] uses jitter with a complex profile (some deterministic jitter coming from the PLL depends on the characteristics of the PLL control loop).

4.3 Embedded TRNG Testing and Related Security Issues

In contrast to standard methods that test only the TRNG output, the AIS methodology requires (for higher security levels) that the raw binary signal also be tested (see Fig. 1b). This new approach is motivated by the fact that the post-processing can mask serious defects of the generator. If a stochastic model of the physical randomness source is available, it can be used in combination with the raw signal to estimate the entropy and the bias depending on random input variables and on the generator principle.
The raw binary signal is also used in online tests. Online tests should be applied to the digital noise signal while the generator is running. They provide ways to stop the TRNG (at least temporarily) when a conspicuous statistical feature is detected. A special kind of online test required by the AIS methodology is the “total failure test” or Tot test, which should be able to immediately detect a total failure of the generator.
Evaluating TRNGs is a difficult task. Clearly, it should not be limited to test-
ing the TRNG output. Following the AIS methodology, the designer should also
provide a stochastic model based on the noise source and the extraction process
and propose statistical and online tests suited to the generator’s principle. The
AIS methodology does not favor or exclude any reasonable TRNG design. The
applicant can also substitute alternative evaluation criteria; however, these must be clearly justified.
Surprisingly, up to now no design in the literature has been evaluated following the AIS recommendations for high-level security (separate testing of the raw binary signal and of the internal random numbers, required for PTG.3 and PTG.4). Some papers just apply the AIS tests T0 to T4 at the generator output. It is also worth pointing out that no paper has so far proposed a design-specific online test, not even a design-specific total failure test. Surprisingly, most recent designs are still evaluated by their authors following the classical approach of Fig. 1a.
In our approach, we propose a new extension of security in TRNG design, which is depicted in Fig. 3. This approach significantly simplifies the security evaluation, the construction of the generator's stochastic model and, last but not least, the realization of simple and fast embedded tests, while being entirely compatible with the AIS methodology.

Fig. 3. New security approach in TRNG design based on embedded randomness testing (extended security approach: entropy source → entropy extractor → arithmetic & cryptographic post-processing → TRNG output; the digitized noise source and the raw binary signal output feed embedded tests of the source of randomness, which raise Alarm 1 and Alarm 2)

We propose to measure the source of entropy (e.g. the jitter) before the entropy extraction. This way, the randomness quantification is easier and more precise. Since the entropy extraction is an algorithmic process, it can easily be included in the stochastic (mathematical) model. However, two conditions must be fulfilled in our approach: (i) the method must quantify exactly the same physical process that is used as the source of randomness; (ii) the entropy extraction algorithm must be included in the stochastic model very precisely. We have analyzed many recent TRNG principles. Unfortunately, only a few of them
are directly (without modification) applicable. For example, we can cite those
published in [17], [26], [10] and [27].
Some papers deal with the implementation of embedded tests (FIPS, NIST, etc.) inside the device [21], [29]. Unfortunately, their authors do not consider the impact of the tests on the TRNG itself. The tests temporarily generate digital noise (which lets them pass more easily), while during normal operation the effective noise (and consequently also the entropy) can be significantly smaller.

5 Recommendations for Future Secure TRNG Designs


Based on the previous analysis of TRNG designs and of the security requirements of modern cryptographic systems, we propose that designers follow this list of recommendations:
– Designers should clearly define the targeted security level. For example, in
the context of the AIS procedure, they should specify the RND class.
– If higher security classes are targeted, the generator must be inner testable.
– A fast total failure test adapted to the TRNG principle must be proposed,
implemented and executed continuously.
– If online tests are embedded in the device, the designer should verify that the tests do not have any impact on the generated random numbers themselves; otherwise, the tests must be executed continuously.
– If the generator is part of a system on chip, the designer should verify that the system activity does not have a negative impact on the generator (i.e. that the generation of random numbers cannot be manipulated by varying the system activity).
– The highest security can be obtained if the source of randomness (e.g. the jitter) is measured online inside the device according to Fig. 3. In this case, the designer must pay particular attention to measuring exactly the same kind of physical parameter that is used as the source of randomness. The same parameter must be used to build the stochastic model and to verify the lower entropy bound in real time.
– The generator must be tested, and the embedded tests validated, at edge and corner values of the environmental parameters.

6 Conclusion
In this paper, we have presented the basic approaches to designing modern TRNGs. We have presented and discussed basic TRNG design evaluation criteria, such as the sources of randomness and the randomness extraction methods applied, the arithmetic and cryptographic post-processing methods utilized, the output bit rate and its stability, resource usage, power consumption, technological and design automation requirements, etc.
We have explained that security parameters like robustness, availability of a
stochastic model, testability, etc. always take priority in a data security system.

We have also proposed a new level of testability criteria: absolute inner testability. Furthermore, the new TRNG design approach presented in this paper, which tests the source of entropy before entropy extraction, contributes to the security enhancement of future TRNG designs. We have also proposed a solution that can serve for fair TRNG benchmarking. In the last section, we summed up several recommendations aimed at securing TRNG designs in general.

References
1. Badrignans, B., Danger, J.L., Fischer, V., Gogniat, G., Torres, L.: Security Trends
for FPGAs, 1st edn., ch. 5, pp. 101–135. Springer (2011)
2. Baudet, M., Lubicz, D., Micolod, J., Tassiaux, A.: On the security of oscillator-
based random number generators. Journal of Cryptology 24, 1–28 (2010)
3. Bernard, F., Fischer, V., Valtchanov, B.: Mathematical Model of Physical RNGs
Based on Coherent Sampling. Tatra Mt. Math. Publ. 45, 1–14 (2010)
4. Bochard, N., Bernard, F., Fischer, V., Valtchanov, B.: True-Randomness and Pseu-
dorandomness in Ring Oscillator-Based True Random Number Generators. Inter-
national Journal of Reconfigurable Computing, Article ID 879281, 13 (2010)
5. Bochard, N., Fischer, V.: A set of evaluation boards aimed at TRNG design eval-
uation and testing. Tech. rep., Laboratoire Hubert Curien, Saint-Etienne, France
(March 2012), http://www.cryptarchi.org
6. Bucci, M., Luzzi, R.: Design of Testable Random Bit Generators. In: Rao, J.R.,
Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 147–156. Springer, Heidelberg
(2005)
7. Danger, J.L., Guilley, S., Hoogvorst, P.: High Speed True Random Number Gen-
erator based on Open Loop Structures in FPGAs. Elsevier Microelectronics Jour-
nal 40(11), 1650–1656 (2009)
8. Dichtl, M., Golić, J.D.: High-Speed True Random Number Generation with Logic
Gates Only. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727,
pp. 45–62. Springer, Heidelberg (2007)
9. FIPS PUB 140-1: Security Requirements for Cryptographic Modules. National Institute of Standards and Technology (1994)
10. Fischer, V., Drutarovsky, M.: True Random Number Generator Embedded in Re-
configurable Hardware. In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002.
LNCS, vol. 2523, pp. 415–430. Springer, Heidelberg (2003)
11. Güneysu, T.: True Random Number Generation in Block Memories of Reconfig-
urable Devices. In: Proc. Int. Conf. on Field-Programmable Technology – FPT
2010, pp. 200–207. IEEE (2010)
12. Gyorfi, T., Cret, O., Suciu, A.: High Performance True Random Number Gen-
erator Based on FPGA Block RAMs. In: Proc. Int. Symposium on Parallel and
Distributed Processing, pp. 1–8. IEEE (2009)
13. Hajimiri, A., Lee, T.: A general theory of phase noise in electrical oscillators. IEEE
Journal of Solid-State Circuits 33(2), 179–194 (1998)
14. Holleman, J., Otis, B., Bridges, S., Mitros, A., Diorio, C.: A 2.92 µW Hardware
Random Number Generator. In: IEEE Proceedings of ESSCIRC (2006)
15. Killmann, W., Schindler, W.: AIS 31: Functionality classes and evaluation method-
ology for true (physical) random number generators, version 3.1. Bundesamt für
Sicherheit in der Informationstechnik (BSI), Bonn (2001),
http://www.bsi.bund.de/zertifiz/zert/interpr/ais31e.pdf

16. Killmann, W., Schindler, W.: A proposal for: Functionality classes for random
number generators, version 2.0. Tech. rep., Bundesamt für Sicherheit in der Infor-
mationstechnik (BSI), Bonn (September 2011),
https://www.bsi.bund.de/EN/Home/home_node.html
17. Kohlbrenner, P., Gaj, K.: An Embedded True Random Number Generator for
FPGAs. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium
on Field Programmable Gate Arrays, pp. 71–78 (2004)
18. Majzoobi, M., Koushanfar, F., Devadas, S.: FPGA-Based True Random Num-
ber Generation Using Circuit Metastability with Adaptive Feedback Control. In:
Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 17–32. Springer,
Heidelberg (2011)
19. Marsaglia, G.: DIEHARD: Battery of Tests of Randomness (1996),
http://stat.fsu.edu/pub/diehard/
20. Rukhin, A., Soto, J., Nechvatal, J., Smid, J., Barker, E., Leigh, S., Levenson, M., Vangel, M., Banks, D., Heckert, A., Dray, J., Vo, S.: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. NIST Special Publication 800-22 (2001), http://csrc.nist.gov/, http://csrc.ncsl.nist.gov/publications/nistbul/html-archive/dec-00.html
21. Santoro, R., Sentieys, O., Roy, S.: On-line monitoring of random number genera-
tors for embedded security. In: Proceedings of IEEE International Symposium on
Circuits and Systems, ISCAS 2009 (2009)
22. Simka, M., Drutarovsky, M., Fischer, V., Fayolle, J.: Model of a True Random
Number Generator Aimed at Cryptographic Applications. In: Proceedings of 2006
IEEE International Symposium on Circuits and Systems, ISCAS 2006, p. 4 (2006)
23. Sunar, B., Martin, W., Stinson, D.: A Provably Secure True Random Number
Generator with Built-In Tolerance to Active Attacks. IEEE Transactions on Com-
puters, 109–119 (2007)
24. Tkacik, T.: A Hardware Random Number Generator. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 450–453. Springer, Heidel-
berg (2003)
25. Valtchanov, B., Aubert, A., Bernard, F., Fischer, V.: Characterization of random-
ness sources in ring oscillator-based true random number generators in FPGAs.
In: 13th IEEE Workshop on Design and Diagnostics of Electronic Circuits and
Systems, DDECS 2010, pp. 1–6 (2010)
26. Valtchanov, B., Fischer, V., Aubert, A.: Enhanced TRNG Based on the Coher-
ent Sampling. In: 2009 International Conference on Signals, Circuits and Systems
(2009)
27. Varchola, M., Drutarovsky, M.: Embedded Platform for Automatic Testing and
Optimizing of FPGA Based Cryptographic True Random Number Generators.
Radioengineering 18(4), 631–638 (2009)
28. Varchola, M., Drutarovsky, M.: New High Entropy Element for FPGA Based True
Random Number Generators. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 351–365. Springer, Heidelberg (2010)
29. Veljkovic, F., Rozic, V., Verbauwhede, I.: Low-Cost Implementations of On-the-
Fly Tests for Random Number Generators. In: Design, Automation, and Test in
Europe – DATE 2012. EDAA (2012)
30. Wold, K., Tan, C.H.: Analysis and Enhancement of Random Number Generator
in FPGA Based on Oscillator Rings. In: 2008 International Conference on Recon-
figurable Computing and FPGAs, pp. 385–390 (2008)
Same Values Power Analysis
Using Special Points on Elliptic Curves

Cédric Murdica^{1,2}, Sylvain Guilley^{1,2}, Jean-Luc Danger^{1,2}, Philippe Hoogvorst^{2}, and David Naccache^{3}
1 Secure-IC S.A.S., 80 avenue des Buttes de Coësmes, F-35700 Rennes, France
  {cedric.murdica,sylvain.guilley,jean-luc.danger}@secure-ic.com
2 Département COMELEC, Institut TELECOM, TELECOM ParisTech, CNRS LTCI, Paris, France
  {cedric.murdica,sylvain.guilley,jean-luc.danger,philippe.hoogvorst}@telecom-paristech.fr
3 École normale supérieure, Équipe de cryptographie, 45 rue d'Ulm, F-75230 Paris cedex 05, France
  david.naccache@ens.fr

Abstract. Elliptic Curve Cryptosystems (ECC) on smart cards can be vulnerable to side-channel attacks such as Simple Power Analysis (SPA) or Differential Power Analysis (DPA) if they are not carefully implemented. Goubin proposed a variant of the DPA using the point (0, y). This point is randomized neither by projective coordinates nor by isomorphic class. Akishita and Takagi extended this attack by considering not only points with a zero coordinate, but also points yielding a zero value in intermediate registers during the doubling and addition formulas. This attack increases the number of possible special points on an elliptic curve that need particular attention. In this paper, we introduce a new attack based on special points that show up internal collisions in power analysis. This attack further increases the number of possible special points on an elliptic curve that need particular attention. Like Goubin's attack and Akishita and Takagi's attack, our attack works if a fixed scalar is used and the attacker can choose the base point.

Keywords: Elliptic Curve Cryptosystem, Differential Power Analysis, Zero Value Point Attack, Collision Power Analysis.

1 Introduction
An approach to prevent the DPA on ECC implementations is to randomize the base point P at the beginning of an Elliptic Curve Scalar Multiplication (ECSM). Common randomization techniques are projective randomization [6] and random isomorphic class [10]. However, Goubin pointed out that some points with a zero value, namely (0, y) and (x, 0), are not randomized [9]. For an elliptic curve E containing #E points, if an attacker can choose the base point P = (k^{-1} mod #E)(0, y) for some integer k, he can detect if the point kP is computed during
the ECSM of P. This attack is called the Refined Power Analysis (RPA). Akishita and Takagi extended this attack by pointing out that some special points with no zero coordinate might take a zero value in auxiliary registers during point addition or doubling [2]. The Zero-Value Point Attack (ZPA) increases the number of possible special points on an elliptic curve.
In this paper we introduce a new attack called the Same Values Analysis (SVA). Instead of looking at points that show up zero values, we look at points that show up equal values during the doubling or addition algorithms. We list the conditions for special points that have these properties, even if the point is randomized using the random projective coordinates countermeasure. An internal comparative power analysis is used to detect whether the special point appears during an ECSM. Our attack is the first attack based on internal power analysis on an ECC implementation. We give new possible special points on elliptic curves to which particular attention must be paid, in addition to the special points that show up zero values given in [2]. Finally, the isogeny defence, sometimes used to protect against the RPA and the ZPA, must be updated to also prevent our attack.
The rest of the article is structured as follows. In section 2, we describe some properties of elliptic curve cryptosystems and give a description of the RPA and the ZPA. In section 3, we give a detailed description of the Same-Values Analysis. Section 4 is a summary of the applicability of the RPA, the ZPA and the SVA on standardized elliptic curves. In this section, we show that the only standardized curve secure against the RPA and the ZPA is not secure against the SVA. In section 5, we discuss the isogeny defence. In section 6 we discuss countermeasures to prevent the SVA. Finally, we conclude in section 7.

2 Elliptic Curve Cryptosystems

In a finite field K = F_p, with p a prime such that p > 3, an elliptic curve can be described by its Weierstrass form:

E : y^2 = x^3 + ax + b .

We denote by E(K) the set of points (x, y) ∈ K^2 satisfying the equation, plus the point at infinity O. E(K) has an Abelian group structure. Let P1 = (x1, y1) and P2 = (x2, y2) be two points in E(K), different from the point O. The point P3 = (x3, y3) = P1 + P2 can be computed as:

x3 = λ^2 − x1 − x2 ,   y3 = λ(x1 − x3) − y1 ,

with λ = (y1 − y2)/(x1 − x2) if P1 ≠ P2, and λ = (3x1^2 + a)/(2y1) if P1 = P2.
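For concreteness, the affine group law above can be transcribed directly into Python (a minimal sketch for experimentation; pow(x, -1, p) computes a modular inverse, available in Python 3.8+, and the point at infinity as well as the case P1 = −P2 are not handled):

def ec_add_affine(P1, P2, a, p):
    # Affine addition/doubling on y^2 = x^3 + ax + b over F_p,
    # following the formulas above.
    (x1, y1), (x2, y2) = P1, P2
    if P1 != P2:
        lam = (y1 - y2) * pow(x1 - x2, -1, p) % p
    else:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)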

2.1 Elliptic Curve in Projective Jacobian Coordinates

To avoid costly inversions, one can use the Jacobian projective coordinates system. A point P = (x, y) is denoted by P = (X : Y : Z) in Jacobian coordinates with x = X/Z^2 and y = Y/Z^3, so P = (X : Y : Z) = (Z^2 x : Z^3 y : Z). The point at infinity is denoted by (1 : 1 : 0). The equation of an elliptic curve in the Jacobian projective coordinates system is:

E^J : Y^2 = X^3 + aXZ^4 + bZ^6 .

We give the addition (ECADD^J) and doubling (ECDBL^J) formulas in the Jacobian projective coordinates system. Let P1 = (X1 : Y1 : Z1) and P2 = (X2 : Y2 : Z2) be two points of E^J(K).

– ECDBL^J. P3 = (X3 : Y3 : Z3) = 2P1 is computed as:
  X3 = T,  Y3 = −8Y1^4 + M(S − T),  Z3 = 2Y1Z1,
  where S = 4X1Y1^2,  M = 3X1^2 + aZ1^4,  T = −2S + M^2.
– ECADD^J. P3 = (X3 : Y3 : Z3) = P1 + P2 is computed as:
  X3 = −H^3 − 2U1H^2 + R^2,  Y3 = −S1H^3 + R(U1H^2 − X3),  Z3 = Z1Z2H,
  where U1 = X1Z2^2,  U2 = X2Z1^2,  S1 = Y1Z2^3,  S2 = Y2Z1^3,  H = U2 − U1,  R = S2 − S1.
Our attack is presented in Jacobian coordinates, because this is the most commonly used system, but it can also work with other representations.
It is important to know the precise implementation of the doubling and addition to mount the proposed attack. They are given in algorithms (1) and (2), respectively.
Given P1 = (λ1^2 x1, λ1^3 y1, λ1) and P2 = (λ2^2 x2, λ2^3 y2, λ2), we will see that the degrees of λ1 and λ2 in the intermediate terms during the doubling of P1 or during the addition of P1 and P2 are important for our attack. The information on the right shows the degrees in λ1, λ2 of each operand and of the result. In the doubling algorithm (1), we denote by n an operand with a term λ1 of degree n. In the addition algorithm (2), we denote by l₁m₂ an operand with a term λ1 of degree l and a term λ2 of degree m. ×, − and + are field operations.

Algorithm 1. ECDBL^J
Input: P1 = (X1, Y1, Z1) = (λ1^2 x1, λ1^3 y1, λ1)
Output: P3 = (X3, Y3, Z3), P3 = 2P1
1: T4 ← X1, T5 ← Y1, T6 ← Z1
2: T1 ← T4 × T4 ; {= X1^2} (4 ← 2 × 2)
3: T2 ← T5 × T5 ; {= Y1^2} (6 ← 3 × 3)
4: T2 ← T2 + T2 ; {= 2Y1^2} (6 ← 6 + 6)
5: T4 ← T4 × T2 ; {= 2X1Y1^2} (8 ← 2 × 6)
6: T4 ← T4 + T4 ; {= 4X1Y1^2 = S} (8 ← 8 + 8)
7: T2 ← T2 × T2 ; {= 4Y1^4} (12 ← 6 × 6)
8: T2 ← T2 + T2 ; {= 8Y1^4} (12 ← 12 + 12)
9: T3 ← T6 × T6 ; {= Z1^2} (2 ← 1 × 1)
10: T3 ← T3 × T3 ; {= Z1^4} (4 ← 2 × 2)
11: T6 ← T5 × T6 ; {= Y1Z1} (4 ← 3 × 1)
12: T6 ← T6 + T6 ; {= 2Y1Z1} (4 ← 4 + 4)
13: T5 ← T1 + T1 ; {= 2X1^2} (4 ← 4 + 4)
14: T1 ← T1 + T5 ; {= 3X1^2} (4 ← 4 + 4)
15: T3 ← a × T3 ; {= aZ1^4} (4 ← 0 × 4)
16: T1 ← T1 + T3 ; {= 3X1^2 + aZ1^4 = M} (4 ← 4 + 4)
17: T3 ← T1 × T1 ; {= M^2} (8 ← 4 × 4)
18: T3 ← T3 − T4 ; {= −S + M^2} (8 ← 8 − 8)
19: T3 ← T3 − T4 ; {= −2S + M^2 = T} (8 ← 8 − 8)
20: T4 ← T4 − T3 ; {= S − T} (8 ← 8 − 8)
21: T1 ← T1 × T4 ; {= M(S − T)} (12 ← 4 × 8)
22: T4 ← T1 − T2 ; {= −8Y1^4 + M(S − T)} (12 ← 12 − 12)
23: X3 ← T3, Y3 ← T4, Z3 ← T6
24: return (X3, Y3, Z3)

Algorithm 2. ECADD^J
Input: P1 = (X1, Y1, Z1) = (λ1^2 x1, λ1^3 y1, λ1), P2 = (X2, Y2, Z2) = (λ2^2 x2, λ2^3 y2, λ2)
Output: P3 = (X3, Y3, Z3), P3 = P1 + P2
1: T2 ← X1, T3 ← Y1, T4 ← Z1, T5 ← X2, T6 ← Y2, T7 ← Z2
2: T1 ← T7 × T7 ; {= Z2^2} (2₂ ← 1₂ × 1₂)
3: T2 ← T2 × T1 ; {= X1Z2^2 = U1} (2₁2₂ ← 2₁ × 2₂)
4: T3 ← T3 × T7 ; {= Y1Z2} (3₁1₂ ← 3₁ × 1₂)
5: T3 ← T3 × T1 ; {= Y1Z2^3 = S1} (3₁3₂ ← 3₁1₂ × 2₂)
6: T1 ← T4 × T4 ; {= Z1^2} (2₁ ← 1₁ × 1₁)
7: T5 ← T5 × T1 ; {= X2Z1^2 = U2} (2₁2₂ ← 2₂ × 2₁)
8: T6 ← T6 × T4 ; {= Y2Z1} (1₁3₂ ← 3₂ × 1₁)
9: T6 ← T6 × T1 ; {= Y2Z1^3 = S2} (3₁3₂ ← 1₁3₂ × 2₁)
10: T5 ← T5 − T2 ; {= U2 − U1 = H} (2₁2₂ ← 2₁2₂ − 2₁2₂)
11: T7 ← T4 × T7 ; {= Z1Z2} (1₁1₂ ← 1₁ × 1₂)
12: T7 ← T5 × T7 ; {= Z1Z2H = Z3} (3₁3₂ ← 1₁1₂ × 2₁2₂)
13: T6 ← T6 − T3 ; {= S2 − S1 = R} (3₁3₂ ← 3₁3₂ − 3₁3₂)
14: T1 ← T5 × T5 ; {= H^2} (4₁4₂ ← 2₁2₂ × 2₁2₂)
15: T4 ← T6 × T6 ; {= R^2} (6₁6₂ ← 3₁3₂ × 3₁3₂)
16: T2 ← T2 × T1 ; {= U1H^2} (6₁6₂ ← 2₁2₂ × 4₁4₂)
17: T5 ← T1 × T5 ; {= H^3} (6₁6₂ ← 4₁4₂ × 2₁2₂)
18: T4 ← T4 − T5 ; {= −H^3 + R^2} (6₁6₂ ← 6₁6₂ − 6₁6₂)
19: T1 ← T2 + T2 ; {= 2U1H^2} (6₁6₂ ← 6₁6₂ + 6₁6₂)
20: T4 ← T4 − T1 ; {= −H^3 − 2U1H^2 + R^2 = X3} (6₁6₂ ← 6₁6₂ − 6₁6₂)
21: T2 ← T2 − T4 ; {= U1H^2 − X3} (6₁6₂ ← 6₁6₂ − 6₁6₂)
22: T6 ← T6 × T2 ; {= R(U1H^2 − X3)} (9₁9₂ ← 3₁3₂ × 6₁6₂)
23: T1 ← T3 × T5 ; {= S1H^3} (9₁9₂ ← 3₁3₂ × 6₁6₂)
24: T1 ← T6 − T1 ; {= −S1H^3 + R(U1H^2 − X3)} (9₁9₂ ← 9₁9₂ − 9₁9₂)
25: X3 ← T4, Y3 ← T1, Z3 ← T7
26: return (X3, Y3, Z3)
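To experiment with these intermediate values, algorithm (1) can be transcribed step by step into Python (a sketch that mirrors the register allocation above; the reduction mod p is applied after every field operation):

def ecdbl_jacobian(X1, Y1, Z1, a, p):
    # Step-by-step transcription of algorithm (1); comments give the
    # algebraic value held by each register.
    T1 = X1 * X1 % p              # X1^2
    T2 = Y1 * Y1 % p              # Y1^2
    T2 = (T2 + T2) % p            # 2*Y1^2
    T4 = X1 * T2 % p              # 2*X1*Y1^2
    T4 = (T4 + T4) % p            # S = 4*X1*Y1^2
    T2 = T2 * T2 % p              # 4*Y1^4
    T2 = (T2 + T2) % p            # 8*Y1^4
    T3 = Z1 * Z1 % p              # Z1^2
    T3 = T3 * T3 % p              # Z1^4
    T6 = Y1 * Z1 % p              # Y1*Z1
    T6 = (T6 + T6) % p            # Z3 = 2*Y1*Z1
    T5 = (T1 + T1) % p            # 2*X1^2
    T1 = (T1 + T5) % p            # 3*X1^2
    T3 = a * T3 % p               # a*Z1^4
    T1 = (T1 + T3) % p            # M = 3*X1^2 + a*Z1^4
    T3 = T1 * T1 % p              # M^2
    T3 = (T3 - T4) % p            # -S + M^2
    T3 = (T3 - T4) % p            # X3 = T = -2*S + M^2
    T4 = (T4 - T3) % p            # S - T
    T1 = T1 * T4 % p              # M*(S - T)
    T4 = (T1 - T2) % p            # Y3 = -8*Y1^4 + M*(S - T)
    return T3, T4, T6             # (X3, Y3, Z3)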

2.2 Elliptic Curve Scalar Multiplication


In an Elliptic Curve Cryptosystem, one has to compute an Elliptic Curve Scalar Multiplication (ECSM), that is, the computation of the point Q = dP for an integer d. One can use the binary method.

Algorithm 3. Binary Method
Input: d = (d_{N−1}, . . . , d_0)_2, P, d_{N−1} = 1
Output: A = dP
A ← P
for i = N − 2 downto 0 do
  A ← ECDBL^J(A)
  if d_i = 1 then A ← ECADD^J(A, P)
end for
return A

The binary method is vulnerable to the Simple Power Analysis (SPA). The Montgomery Ladder, which is secure against the SPA, can be used instead.

Algorithm 4. Montgomery Ladder
Input: d = (d_{N−1}, . . . , d_0)_2, P, d_{N−1} = 1
Output: A = dP
A ← P, B ← ECDBL^J(P)
for i = N − 2 downto 0 do
  if d_i = 0 then B ← ECADD^J(B, A), A ← ECDBL^J(A)
  if d_i = 1 then A ← ECADD^J(A, B), B ← ECDBL^J(B)
end for
return A
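A direct transcription of the Montgomery Ladder in Python might look as follows (a sketch reusing the ec_add_affine function sketched in section 2; the point at infinity is again ignored):

def montgomery_ladder(d_bits, P, a, p):
    # Montgomery ladder (algorithm (4)); d_bits is the binary
    # expansion (d_{N-1}, ..., d_0) with d_{N-1} = 1.
    A = P
    B = ec_add_affine(P, P, a, p)          # B = 2P
    for di in d_bits[1:]:                  # i = N-2 downto 0
        if di == 0:
            B = ec_add_affine(B, A, a, p)
            A = ec_add_affine(A, A, a, p)  # A <- 2A
        else:
            A = ec_add_affine(A, B, a, p)
            B = ec_add_affine(B, B, a, p)  # B <- 2B
    return A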

Our attack is presented against the Montgomery Ladder but it can also work
on other algorithms.

2.3 DPA Attack and Countermeasures


If the same scalar d is used several times with the same base point, the Montgomery Ladder is vulnerable to the DPA. The countermeasures given below can be used to prevent it.

Random Projective Coordinates [6]. A point P = (X : Y : Z) in Jacobian coordinates is equivalent to any point (r^2 X : r^3 Y : rZ), with r ∈ K*. One can randomize the base point at the beginning of the ECSM by choosing a random r.
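In code, the countermeasure amounts to one re-randomization at the start of the ECSM (a sketch):

import secrets

def randomize_jacobian(X, Y, Z, p):
    # Re-randomize the Jacobian representation of a point:
    # (X : Y : Z) -> (r^2*X : r^3*Y : r*Z) for a random r in K*.
    r = secrets.randbelow(p - 1) + 1
    return (r * r * X % p, r * r * r * Y % p, r * Z % p)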

Random Curve Isomorphism [10]. A curve E defined by E : y^2 = x^3 + ax + b in affine coordinates is isomorphic to the curve E′ defined by E′ : y^2 = x^3 + a′x + b′ if and only if there exists u ∈ K* such that u^4 a′ = a and u^6 b′ = b. The isomorphism ϕ is defined as:

ϕ : E → E′,  O ↦ O,  (x, y) ↦ (u^{−2} x, u^{−3} y),

and

ϕ^{−1} : E′ → E,  O ↦ O,  (x, y) ↦ (u^2 x, u^3 y).

The countermeasure consists of computing the ECSM on a random curve E′ instead of E.

2.4 RPA and ZPA Attacks and Countermeasures

The DPA countermeasures of section 2.3 do not protect against the RPA [9] and the ZPA [2]. The RPA assumes that the scalar is fixed for several ECSMs and that the base point P can be chosen.
The attacker starts by finding special points with zero values on the given elliptic curve E:

– Point (x, 0): a point of this form has order 2. In Elliptic Curve Cryptosystems, the order of the provided base point is checked and points of order 2 never appear during an ECSM.
– Point (0, y): this point has no special order, so it can appear during the ECSM.

Let P0 = (0, y). Suppose that the Montgomery Ladder (algorithm (4)) is used to compute an ECSM. Suppose that the attacker already knows the N − i − 1 leftmost bits of the fixed scalar d = (d_{N−1}, d_{N−2}, . . . , d_{i+1}, d_i, d_{i−1}, . . . , d_0)_2. He tries to recover the unknown bit d_i.
The attacker computes the point P = ((d_{N−1}, d_{N−2}, . . . , d_{i+1})_2^{−1} mod #E) P0 and gives P to the targeted chip, which computes the ECSM using the scalar d. If d_i = 0, then the point P0 will be doubled during the ECSM. If the attacker is able to recognize a zero value in a register during the doubling, he can then conclude whether his hypothesis was correct.
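The attacker-side computation of the chosen base point is straightforward (a sketch; scalar_mul is a hypothetical placeholder for any correct scalar multiplication routine, and curve_order stands for #E):

def rpa_chosen_base_point(prefix_bits, P0, curve_order, scalar_mul):
    # Build the chosen base point for the hypothesis "d_i = 0":
    # P = ((d_{N-1} ... d_{i+1})_2^{-1} mod #E) * P0, so that the
    # known prefix of the scalar maps P back onto the special point P0.
    k = int("".join(str(b) for b in prefix_bits), 2)
    k_inv = pow(k, -1, curve_order)
    return scalar_mul(k_inv, P0)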
The ZPA [2] uses the same approach, except that the attack is not only interested in zero values in the coordinates, but also in the intermediate registers when computing the double of a point or the addition of two points. Such a point is defined as a zero-value point.

Theorem 1. Let E be an elliptic curve over K = F_p, with p > 3, defined by the equation y^2 = x^3 + ax + b. If the algorithm ECDBL^J (1) is used to compute the double of the point P = (x, y), then P is a zero-value point if and only if one of the following conditions is satisfied:

3x^2 + a = 0 (ED1)
5x^4 + 2ax^2 − 4bx + a^2 = 0 (ED2)
P has order 3 (ED3)
x = 0 or x_{2P} = 0 (ED4)
y = 0 or y_{2P} = 0 (ED5)

Moreover, the zero-value point is randomized neither by random projective coordinates nor by a random isomorphic curve.

The proof of the theorem is given in [2].

Remark 1. The condition (ED2) can be avoided by changing the way T is computed in ECDBL^J (1). See [2] for more details.

Some countermeasures to prevent RPA and ZPA are given below.

Scalar Randomization [6]. Randomization of the scalar, using d′ = d + r·#E for a random r, is effective against the RPA and the ZPA.

Random Scalar Split [4]. Random scalar splitting, such as computing d1P + d2P with d = d1 + d2, is effective against the ZPA and the RPA. Other splitting methods are given in [4].

Point Blinding [6]. Computing Q = d(P + R) instead of dP, with R a random point, is effective against the RPA and the ZPA.

Isogeny Defence [12,3]. Computing the ECSM on a curve E′ isogenous to E, such that E′ does not contain any non-trivial zero-value point, is effective against the RPA and the ZPA. This countermeasure was introduced in [12], but only to prevent the RPA; it was extended to also prevent the ZPA in [3].

3 Same Values Analysis


In this section we describe our new attack, called the Same Values Analysis. We introduce special points on elliptic curves that have the property of showing up the same values among intermediate variables during the doubling of a point. Special points with the analogous property during the addition of points are also introduced. Finally, we describe our attack based on those special points.

3.1 Special Points of Same Values during Doubling


Let P1 = (X1 : Y1 : Z1) = (λ^2 x1 : λ^3 y1 : λ) for some λ ∈ F_p*, and P3 = (X3 : Y3 : Z3) = 2P1. We are interested in equalities of values during the computation of P3. We define the same-values points.

Definition 1. Let E be an elliptic curve over K = F_p, with p > 3, and let ECDBL be a doubling algorithm. A point P = (x, y) in E is said to be a same-values point relative to ECDBL if and only if any representation of P (i.e. the equivalent points (λ^2 x : λ^3 y : λ) for all λ ∈ K* in Jacobian coordinates) shows up the same values among intermediate variables during the computation of the point 2P using the algorithm ECDBL.

We give the following theorem.

Theorem 2. Let E be an elliptic curve over K = F_p, with p > 3, defined by the equation y^2 = x^3 + ax + b. The point P = (x, y) ∈ E is a same-values point relative to the algorithm ECDBL^J (1) if and only if one of the following conditions is satisfied:

x = 0 (SED1)
x = 1 (SED2)
x^2 = y (SED3)
−2x^2 + a = 0 (SED4)
y = 1 (SED5)
2y = 1 (SED6)
2x^2 = 1 (SED7)
3x^2 = 1 (SED8)
2x^2 + a = 1 (SED9)
y = 0 (SED10)
y = 2x^2 (SED11)
y = 3x^2 (SED12)
y = 3x^2 + a (SED13)
2y = 3x^2 (SED14)
2y = 3x^2 + a (SED15)
−x^2 + a = 0 (SED16)
a = 0 (SED17)
2xy^2 = (3x^2 + a)^2 (SED18)
6xy^2 = (3x^2 + a)^2 (SED19)
10xy^2 = (3x^2 + a)^2 (SED20)
−10xy^2 = (3x^2 + a)^2 (SED21)
x_{2P} = 0 (SED22)
12xy^2 = (3x^2 + a)^2 (SED23)
3x^2 + a = 0 (SED24)
−16xy^2 = (3x^2 + a)^2 (SED25)
4y^2 = (3x^2 + a)^2 (SED26)
4y^4 = (3x^2 + a)(12xy − (3x^2 + a)^2) (SED27)
12y^4 = (3x^2 + a)(12xy − (3x^2 + a)^2) (SED28)
8y^4 = (3x^2 + a)(12xy − (3x^2 + a)^2) (SED29)
16y^4 = (3x^2 + a)(12xy − (3x^2 + a)^2) (SED30)
Moreover, by definition, the same values appear even if the random projective
coordinates countermeasure is used.

Proof. Given the definition of a same-values point and a point P1 = (X1 : Y1 : Z1) = (λ1^2 x1 : λ1^3 y1 : λ1), we have to check equalities during the doubling whatever the value of λ1. So we have to check equalities between terms with the same degree of λ1, and zero values among all terms. We denote by S_i the set of values that involve a term in λ1 of degree i. Looking at algorithm (1), we have:

– S_2 = {X1, Z1^2},
– S_4 = {X1^2, Y1Z1, 2Y1Z1, 2X1^2, 3X1^2, aZ1^4, M},
– S_6 = {Y1^2, 2Y1^2},
– S_8 = {2X1Y1^2, M^2, −S + M^2, T, S − T},
– S_12 = {4Y1^4, 8Y1^4, M(S − T), −8Y1^4 + M(S − T)}

Equal values can only be found within the same set. Checking the equality of each pair of terms in a set and expanding gives the relations of the theorem. Checking for equal zero values among all terms gives no additional condition. □


3.2 Special Points of Same Values during Addition

Let P1 = (X1 : Y1 : Z1) = (λ1^2 x1 : λ1^3 y1 : λ1) and P2 = (X2 : Y2 : Z2) = (λ2^2 x2 : λ2^3 y2 : λ2) for some λ1, λ2 ∈ F_p*, and let P3 = (X3 : Y3 : Z3) = P1 + P2. We give here a definition and a theorem similar to those of the previous section.

Definition 2. Let E be an elliptic curve over K = F_p, with p > 3, and let ECADD be an addition algorithm. Points P1 = (x1, y1), P2 = (x2, y2) in E are said to be same-values points relative to ECADD if and only if any representations of P1 and P2 show up an equality of intermediate values during the computation of the point P1 + P2 using the algorithm ECADD.

Theorem 3. Let E be an elliptic curve over K = F_p, with p > 3, defined by the equation y^2 = x^3 + ax + b. The points P1 = (x1, y1), P2 = (x2, y2) are same-values points relative to the algorithm ECADD^J (2) if and only if one of the following conditions is satisfied:

x1 = x2 (SEA1)
2x1 = x2 (SEA2)
x1 = 0 (SEA3)
y1 = y2 (SEA4)
y1 = x2 − x1 (SEA5)
2y1 = y2 (SEA6)
y2 = x2 − x1 (SEA7)
y1 = 0 (SEA8)
y2 − y1 = x2 − x1 (SEA9)
2x1 − x2 = 0 (SEA10)
3x1 = x2 (SEA11)
x1 = x2 (SEA12)
x2 + x1 = 0 (SEA13)
(y2 − y1)^2 = x2(x2 − x1)^2 (SEA14)
(y2 − y1)^2 = −(x2 − x1)^2 (2x1 + x2) (SEA15)
(y2 − y1)^2 = (x2 − x1)^2 (x1 + x2) (SEA16)
(y2 − y1)^2 = 2(x2 − x1)^3 (SEA17)
(y2 − y1)^2 = x1(x2 − x1)^2 (SEA18)
(y2 − y1)^2 = 2x2(x2 − x1)^2 (SEA19)
(y2 − y1)^2 = 3x1(x2 − x1)^2 (SEA20)
(y2 − y1)^2 = (x2 − x1)^2 (x1 + x2) (SEA21)
(y2 − y1)^2 = (x2 − x1)^3 (SEA22)
2(y2 − y1)^2 = (x2 − x1)^2 (x1 + 2x2) (SEA23)
(y2 − y1)^2 = (x2 − x1)^2 (3x1 + x2) (SEA24)
(y2 − y1)^2 = (x2 − x1)^2 (−2x1 + x2) (SEA25)
2(y2 − y1)^2 = (x2 − x1)^2 (3x1 + 2x2) (SEA26)
2(y2 − y1)^2 = (x2 − x1)^2 (2x1 + x2) (SEA27)
(y2 − y1)^2 = 2x1(x2 − x1)^2 (SEA28)
(y2 − y1)(x1(x2 − x1)^2 − x_{P1+P2}) = y1(x2 − x1)^3 (SEA29)
(y2 − y1)(x1(x2 − x1)^2 − x_{P1+P2}) = 2y1(x2 − x1)^3 (SEA30)

Moreover, by definition, the same values appear even if the random projective coordinates countermeasure is used.
Proof. Given the definition of same-values points and points P1 = (λ1^2 x1 : λ1^3 y1 : λ1) and P2 = (λ2^2 x2 : λ2^3 y2 : λ2), we have to check equalities during the addition whatever the values of λ1, λ2. So we have to check equalities between terms with the same degrees of λ1, λ2, and zero values among all terms. We denote by S_{i,j} the set of values that involve a term in λ1 of degree i and a term in λ2 of degree j. Looking at algorithm (2), we have:

– S_{2,2} = {U1, U2, H},
– S_{3,3} = {S1, S2, Z3, R},
– S_{6,6} = {R^2, U1H^2, H^3, −H^3 + R^2, 2U1H^2, X3, U1H^2 − X3},
– S_{9,9} = {R(U1H^2 − X3), S1H^3, Y3}

Equal values can only be found within the same set. Checking the equality of each pair of terms in a set and expanding gives the relations of the theorem. Checking for equal zero values among all terms gives no additional condition. □


3.3 Collision Power Analysis


Collision Power Analysis consists in comparing the power consumption between different traces or in detecting collisions inside the same trace. The latter is called Internal Collision Analysis.
Collision power attacks exist against ECC implementations; the first one was introduced by Fouque and Valette and is called the Doubling Attack [8]. The

attack consists of comparing two traces during the computation of the ECSM
with the base point P and the computation of the ECSM with the base point 2P .
However, this attack does not work if one of the countermeasures of section 2.3
is used. This is not the case of the RPA, ZPA and SVA.
A different approach of Collision Power Analysis was introduced by Schramm
et al. [11] to attack an implementation of the DES. Their attack consists in de-
tecting collision in the same trace during the computation of an algorithm, not
in different traces. Clavier et al. exposed an attack against a protected imple-
mentation of the AES, using the same principle [5].
An internal collision attack on ECC and RSA implementations was proposed in [13], but it is restricted to inputs of low order, which are avoided in Elliptic Curve Cryptosystems. In [7], the authors combined active and passive attacks: they introduce a fault so that the high-order base point becomes a low-order base point of another curve, and they exploit the fact that the point at infinity shows up under certain conditions on the scalar used.
Our attack is the first attack based on Internal Collision Analysis on an ECC implementation with a base point of high order.

3.4 Collision Power Analysis on ECC Using Same-Values Points


Not all conditions given in theorems 2 and 3 lead to a successful attack. A collision in the power consumption can only be detected if an operation with the same inputs is computed several times, namely a field multiplication, addition or subtraction.
Among all the conditions of theorems 2 and 3, the conditions below give the required result:
– (SED2): x = 1. This condition implies that the power consumptions during the computation of the squares at lines 2 and 10 of algorithm (1) are the same.
– (SED3): y = x^2. This condition implies that the power consumptions during the computation of the additions at lines 12 and 13 of algorithm (1) are the same.
– (SED15): 2y = 3x^2 + a. This condition implies that the power consumptions during the computation of the square at line 17 of algorithm (1) and of the square of the value Z3, which occurs when P3 is later doubled or added, are the same.
– (SEA9): y2 − y1 = x2 − x1. This condition implies that the power consumptions during the computation of the square at line 15 of algorithm (2) and of the square of the value Z3, which occurs when P3 is later doubled or added, are the same.
Suppose that the Montgomery Ladder (4) is used to compute an ECSM, and suppose that the attacker already knows the N − i − 1 leftmost bits of the fixed scalar d = (d_{N−1}, d_{N−2}, . . . , d_{i+1}, d_i, d_{i−1}, . . . , d_0)_2. He tries to recover the unknown bit d_i.
The attacker chooses a point P0 satisfying one of the conditions (SED2), (SED3) or (SED15), computes the point P = ((d_{N−1}, d_{N−2}, . . . , d_{i+1})_2^{−1} mod #E) P0 and gives P to the targeted chip, which computes the ECSM using the fixed scalar d. If d_i = 0, the point P0 will be doubled during the ECSM, and a collision of power consumption will appear during the ECSM.

The attacker records several traces of the power consumption during the computation of dP. He tries to detect an internal collision of power consumption in each trace using the methodology of [11] and [5]. If a collision is detected, he can conclude that d_i = 0. Otherwise, he concludes that d_i = 1.
Using this method, the attacker can recursively recover all the bits of d.
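The collision detection step itself can be as simple as correlating the two windows of one trace that contain the targeted field operations (a sketch; in practice the windows are located by profiling and the scores are combined over many traces, as in [11] and [5]):

import numpy as np

def collision_score(trace, offset1, offset2, length):
    # Pearson correlation between two windows of a single power trace,
    # e.g. the windows covering the two field squarings that process
    # the same operand when the special point is doubled. A score close
    # to 1 over several traces supports the hypothesis d_i = 0.
    s1 = trace[offset1:offset1 + length]
    s2 = trace[offset2:offset2 + length]
    return float(np.corrcoef(s1, s2)[0, 1])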

4 Same-Values Points on Standardized Curves


The method to find same-values points is similar to the method to find zero-value points; the interested reader should refer to [2]. We give a summary of the non-trivial zero-value points and same-values points on the standardized curves of SECG [1] with a size of at least 160 bits. With reference to Remark 1, the condition (ED2) can be avoided. We denote by × that the curve contains a point satisfying the condition, and by - otherwise.

            RPA    ZPA    SVA     SVA     SVA      SVA
            (0,y)  (ED1)  (SED2)  (SED3)  (SED15)  summary
secp160r1   ×      -      -       -       ×        ×
secp160r2   ×      -      -       ×       -        ×
secp192r1   ×      ×      -       ×       ×        ×
secp224r1   -      -      -       ×       ×        ×
secp256r1   ×      -      -       ×       ×        ×
secp384r1   ×      ×      -       -       -        ×
secp521r1   ×      ×      ×       -       -        ×

We can see that our attack works against all the standardized curves above. Note that the curve secp224r1 does not contain any zero-value point, but it does contain same-values points. The curve secp224r1 is thus secure against the RPA and the ZPA, but not against the SVA.
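For toy curves, same-values points satisfying these conditions can even be found by exhaustive scan (a sketch restricted to p ≡ 3 mod 4, where a square root is a single modular exponentiation; for the standardized curves one instead solves the condition polynomials, as in [2]):

def find_same_values_points(a, b, p):
    # Naive scan for affine points satisfying (SED2), (SED3) or (SED15)
    # on y^2 = x^3 + ax + b over F_p. Only practical for small primes.
    assert p % 4 == 3  # so a square root of rhs is rhs^((p+1)/4) mod p
    hits = []
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        y = pow(rhs, (p + 1) // 4, p)
        if y * y % p != rhs:
            continue  # x is not the abscissa of a curve point
        for yy in {y, (-y) % p}:
            if x == 1:
                hits.append(((x, yy), "SED2"))
            if yy == x * x % p:
                hits.append(((x, yy), "SED3"))
            if 2 * yy % p == (3 * x * x + a) % p:
                hits.append(((x, yy), "SED15"))
    return hits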

5 Isogeny Defence Discussion

As mentioned in section 2.4, a countermeasure to prevent the RPA and the ZPA consists of using a curve E′ isogenous to the original curve E, such that E′ does not contain any zero-value points. This countermeasure was introduced in [12], but only to prevent the RPA; it was extended in [3] to also prevent the ZPA. The authors also give an algorithm that, given a curve E, finds a curve E′ l-isogenous to E such that:
– l is as small as possible (if l > 107, the countermeasure is not applied),
– E′ does not contain any zero-value points,
– E′, with equation y^2 = x^3 + a′x + b′, satisfies a′ = −3, for efficiency.
These conditions are no longer sufficient because of our new attack, the SVA.

In [3], the authors give isogenous curves of the standardized SECG curves [1]. We denote by I(secpXrY) a possible isogenous curve of secpXrY satisfying the conditions above, computed using the algorithm given in [3]. The curve parameters are given in the appendix. If the isogeny degree is greater than 107, I(secpXrY) is not computed (this is the case for secp160r2, secp192r1 and secp384r1). We give below a summary of the presence of same-values points on these curves. Degree is the degree of the isogeny between secpXrY and I(secpXrY).

              degree  RPA    ZPA    SVA     SVA     SVA      SVA
                      (0,y)  (ED1)  (SED2)  (SED3)  (SED15)  summary
I(secp160r1)  13      -      -      -       ×       -        ×
I(secp224r1)  1       -      -      -       ×       ×        ×
I(secp256r1)  23      -      -      -       ×       ×        ×
I(secp521r1)  5       -      -      -       -       ×        ×

We can see that the isogenous curves obtained with the algorithm in [3] are secure against the RPA and the ZPA, but not against the SVA. If one uses the isogeny defence as a countermeasure, the algorithm must be updated to find isogenous curves that are also secure against the SVA.

6 Countermeasures to Prevent the SVA


In this section we discuss some countermeasures to prevent the SVA.
Like the RPA and the ZPA, the attack works recursively on the bits of the secret scalar. Therefore scalar randomization [6] or scalar splitting [4] are effective against the RPA, the ZPA and the SVA.
Like the RPA and the ZPA, the attack can only be mounted if the base point can be chosen by the attacker. Therefore base point blinding [6] is effective against the RPA, the ZPA and the SVA.
The isogeny defence [12,3] was discussed in the previous section; the countermeasure must be updated to also prevent the SVA.
The random projective coordinates countermeasure [6] prevents neither the RPA, nor the ZPA, nor the SVA, as explained in [9,2] and in this article.
The random isomorphic curve countermeasure [10] described in section 2.3 is an interesting case. We can remark that if a point P = (x, y) is a same-values point of an elliptic curve E relative to ECDBL^J, this does not imply that P is a same-values point of an elliptic curve E′ isomorphic to E. Indeed, let ϕ be the isomorphism defined in section 2.3 and P′ = ϕ(P) = (u^{−2}x, u^{−3}y). In Jacobian coordinates, P′ = (X′ : Y′ : Z′) = (λ^2 u^{−2}x, λ^3 u^{−3}y, λ) for some λ. The same remark holds for the addition of points. Thus, if the point P′ is doubled, one has to consider the degrees of both u and λ in the terms.
The conditions given in theorems (2) and (3) for the elliptic curve E do not all hold for the elliptic curve E′. In particular, the conditions (SED2), (SED3), (SED15) and (SEA9) used to mount our attack do not hold in E′: the attack described in section 3.4 will fail if this countermeasure is used.

We give a summary of the countermeasures effective against the RPA, the ZPA and the SVA. ✓ means that the countermeasure is effective, × means that it fails.

                                    RPA  ZPA  SVA
Scalar Randomisation [6]            ✓    ✓    ✓
Scalar Splitting [4]                ✓    ✓    ✓
Point Blinding [6]                  ✓    ✓    ✓
Isogeny Defence [12,3]              ✓    ✓    ✓ (if updated)
Random Projective Coordinates [6]   ×    ×    ×
Random Isomorphic Curve [10]        ×    ×    ✓

7 Conclusion
We introduced the first attack on an Elliptic Curve Cryptosystem implementation based on internal collision analysis with a base point of high order. The attack, called the Same-Values Analysis, is based on special points that show up equalities among intermediate values during the doubling of a point. These special points are called same-values points. The random projective coordinates countermeasure [6] does not prevent the attack. We showed that the only standardized SECG curve [1] that does not contain any zero-value point allowing the RPA or the ZPA does contain same-values points: we can thus apply our attack on this curve. We also showed that the isogeny defence used to prevent the RPA and the ZPA must be updated to also prevent the SVA.
Scalar randomization [6], scalar splitting [4] or base point blinding [6] should be used to protect against the RPA, the ZPA and the SVA.
Further work is to evaluate the SVA on a real implementation and to compare it with the RPA and the ZPA.

References
1. Standard for Efficient Cryptography (SECG), http://www.secg.org/
2. Akishita, T., Takagi, T.: Zero-Value Point Attacks on Elliptic Curve Cryptosystem.
In: Boyd, C., Mao, W. (eds.) ISC 2003. LNCS, vol. 2851, pp. 218–233. Springer,
Heidelberg (2003)
3. Akishita, T., Takagi, T.: On the Optimal Parameter Choice for Elliptic Curve
Cryptosystems Using Isogeny. In: Bao, F., Deng, R., Zhou, J. (eds.) PKC 2004.
LNCS, vol. 2947, pp. 346–359. Springer, Heidelberg (2004)
4. Ciet, M., Joye, M.: (Virtually) Free Randomization Techniques for Elliptic Curve
Cryptography. In: Qing, S., Gollmann, D., Zhou, J. (eds.) ICICS 2003. LNCS,
vol. 2836, pp. 348–359. Springer, Heidelberg (2003)
5. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Improved Collision-
Correlation Power Analysis on First Order Protected AES. In: Preneel, B., Takagi,
T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 49–62. Springer, Heidelberg (2011)

6. Coron, J.-S.: Resistance against Differential Power Analysis for Elliptic Curve
Cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp.
292–302. Springer, Heidelberg (1999)
7. Fan, J., Gierlichs, B., Vercauteren, F.: To Infinity and Beyond: Combined Attack
on ECC Using Points of Low Order. In: Preneel, B., Takagi, T. (eds.) CHES 2011.
LNCS, vol. 6917, pp. 143–159. Springer, Heidelberg (2011)
8. Fouque, P.-A., Valette, F.: The Doubling Attack – Why Upwards Is Better than
Downwards. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS,
vol. 2779, pp. 269–280. Springer, Heidelberg (2003)
9. Goubin, L.: A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems. In:
Desmedt, Y.G. (ed.) PKC 2003. LNCS, vol. 2567, pp. 199–210. Springer, Heidelberg
(2002)
10. Joye, M., Tymen, C.: Protections against Differential Analysis for Elliptic Curve
Cryptography. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS,
vol. 2162, pp. 377–390. Springer, Heidelberg (2001)
11. Schramm, K., Wollinger, T., Paar, C.: A New Class of Collision Attacks and Its
Application to DES. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 206–
222. Springer, Heidelberg (2003)
12. Smart, N.P.: An Analysis of Goubin’s Refined Power Analysis Attack. In:
Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 281–290.
Springer, Heidelberg (2003)
13. Yen, S.-M., Lien, W.-C., Moon, S.-J., Ha, J.C.: Power Analysis by Exploiting
Chosen Message and Internal Collisions – Vulnerability of Checking Mechanism
for RSA-Decryption. In: Dawson, E., Vaudenay, S. (eds.) Mycrypt 2005. LNCS,
vol. 3715, pp. 183–195. Springer, Heidelberg (2005)

Appendix: Standardized Curves SECG [1] and Isogenous Curves
We give the standardized SECG curves [1] and isogenous curves that do not contain zero-value points. The curves are defined over F_p by the equation y^2 = x^3 + ax + b. The parameters are in hexadecimal notation.

secp160r1
secp160r1:
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFF
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFC
b = 1C97BEFC 54BD7A8B 65ACF89F 81D4D4AD C565FA45
I(secp160r1) (the isogeny degree is 13):
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFF
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 7FFFFFFC
b = 1315649B C931E413 D426D94E 979B5FF8 83FE89C1

secp224r1

secp224r1:
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 00000000 00000000 00000001
a = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE FFFFFFFF FFFFFFFF FFFFFFFE
b = B4050A85 0C04B3AB F5413256 5044B0B7 D7BFD8BA 270B3943 2355FFB4
I(secp224r1) = secp224r1 (the isogeny degree is 1)

secp256r1
secp256r1:
p = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFF
a = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFC
b = 5AC635D8 AA3A93E7 B3EBBD55 769886BC 651D06B0 CC53B0F6 3BCE3C3E
27D2604B
I(secp256r1) (the isogeny degree is 23):
p = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFF
a = FFFFFFFF 00000001 00000000 00000000 00000000 FFFFFFFF FFFFFFFF
FFFFFFFC
b = ACAA2B48 AECF20BC 9AB54168 A691BCE4 117A6909 342F0635 C278870F
3B71578F

secp521r1:
p = 01FF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF
a = 01FF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFC
b = 0051 953EB961 8E1C9A1F 929A21A0 B68540EE A2DA725B 99B315F3
    B8B48991 8EF109E1 56193951 EC7E937B 1652C0BD 3BB1BF07 3573DF88
    3D2C34F1 EF451FD4 6B503F00

I(secp521r1) (the isogeny degree is 5):
p = 01FF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF
a = 01FF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    FFFFFFFF FFFFFFFF FFFFFFFC
b = 11C5 ED61BC94 D9A8B1D7 792DAEC6 86260850 B24C72FF F920258F
    2203AC5E EEA586D3 3980EBEA B8733972 6E1C5545 28EB4DF3 8445A6D
    1891F60B 5B09C2C7 86DDCFCA
The Schindler-Itoh-attack
in Case of Partial Information Leakage

Alexander Krüger

SRC - Security Research & Consulting GmbH
Graurheindorfer Straße 149a, 53177 Bonn, Germany
alexander.krueger@src-gmbh.de
http://www.src-gmbh.de

Abstract. Schindler and Itoh proposed a side-channel attack on implementations of the double-and-add-algorithm with blinded exponents, where dummy additions can be detected with errors. Here this approach is generalized to partial information leakage: If window methods are used, several different types of additions occur. If the attacker can only discriminate between some types of additions, but not between all types, the so-called basic version of the attack is still feasible and the attacker can correct her guessing errors and find out the secret scalar. Sometimes generalized Schindler-Itoh methods can reveal even more bits than leak through SPA. In fact this makes an attack on a 2-bit window algorithm with a 32-bit randomization feasible, where the attacker can distinguish between additions of different values with error rates up to 0.15, but cannot detect dummy additions. A barrier to applying the so-called enhanced version to partial information leakage is described.

Keywords: side-channel analysis, SPA, Schindler-Itoh-attack, window methods, partial information leakage, dummy operations, exponent randomization, elliptic curve cryptography.

1 Introduction

Simple Power Analysis (SPA) and Differential Power Analysis (DPA) are a major
threat to implementations of ECC/RSA-Cryptosystems. In an implementation
of ECC the double-and-add-algorithm can be used to calculate the point dP
given the point P and a secret scalar d. Given one power trace, the attacker
might be able to distinguish between additions and doublings. Thus she can
find out the secret key, because in every round of the algorithm an addition is
performed, if and only if the corresponding bit is one. A countermeasure against
this attack is the insertion of dummy additions, which means that an addition
is performed in every round of the algorithm regardless the corresponding bit.
In this case still a DPA is possible, in which the attacker collects several power
traces and calculates the correlation between the power consumption and certain
intermediate values, s. [1]. A countermeasure against DPA is (additive) exponent
blinding: Here given the secret scalar d and the order y of the basepoint P as


a group element of the elliptic curve, at every execution of the algorithm a


random number r is chosen and the blinded scalar d + ry is calculated. Then
(d + ry)P is computed using the double-and-add-algorithm, so the intermediate
values change at every execution of the algorithm.
Consider an implementation which is only partially SPA-resistant, i.e. the attacker can distinguish between necessary additions and dummy additions with a certain error rate. In this situation Schindler and Itoh propose a side-channel attack in [6]. For every recorded power trace the attacker tries to find the dummy additions by SPA and gets a hypothesis for every bit. Then she finds several traces with the same blinding factor or the same sum of blinding factors and uses them to find the secret key, see chapter 2 of this paper or [6].
In the setting considered in [6] all bits leak with the same error rate. We will consider scenarios where only some information about the secret scalar leaks by SPA with a certain error rate. This is plausible if window methods are used for the computation of dP instead of the double-and-add-algorithm. Window methods are a generalization of the double-and-add-algorithm, where k bits of the secret scalar d are considered in one iteration step. For a description of a 2-bit window method with pseudocode, see chapter 3. In this case there are different types of additions, i.e. additions of $P, 2P, \ldots, (2^k - 1)P$. If all considered bits are zero, no addition is needed and a dummy addition may be performed. It is possible that the attacker can distinguish between some kinds of operations with a certain error rate, but not between all kinds. In this case the attacker only gets partial information with a certain error rate, and the naive approach to apply the Schindler-Itoh-attack would be to guess the other bits by chance. But this leads to high error rates, which make the attack unlikely to be feasible.
In this paper another approach is proposed for this situation: As a variation of the basic version of the attack, the attacker can find collisions of blinding factors using only partial information. Then she corrects the guessing errors she made in her initial SPA. To find out the rest of the secret scalar she has to use other cryptanalytic methods, see chapters 4 and 5. In the case where the attacker can distinguish between the additions of $P, 2P, \ldots, (2^k - 1)P$, but cannot detect dummy additions, she can even get information about the location of dummy additions from the Schindler-Itoh-attack under certain conditions, see chapter 5.2. We also point out a barrier to applying the enhanced version of the attack in the case of partial information leakage, see chapter 6.

2 Notation and the Schindler-Itoh-attack

2.1 Notation

As in the paper [6] we will use the following notation: We have an elliptic curve E, which is defined over a finite field F, and a base point P ∈ E(F). Furthermore we have a secret scalar d of bit length k, i.e. $d < 2^k$. The cryptographic device computes the point dP and sends it to another party, as in the Diffie-Hellman key exchange. So the attacker is assumed to know the points P and dP and wants

to find out d by monitoring the side-channels of the cryptographic device calcu-


lating dP. She has to solve the discrete-logarithm-problem using side-channels,
see [4] for a general discussion of this topic.
The attacker collects N power traces. Let y be the order of P as a group
element of the elliptic curve.
As a countermeasure against DPA the scalar is blinded. For $j = 1, 2, \ldots, N$ we have the blinded scalars $v_j = d + r_j y$, where $r_j$ is a random number. Let k + R be the bit length of the blinded exponent, i.e. $v_j < 2^{k+R}$. The binary representation of $v_j$ is $v_j = (v_{j,k+R-1}, v_{j,k+R-2}, \ldots, v_{j,0})_2$.

2.2 The Schindler-Itoh-attack

If the attacker is able to decide with a certain error rate from a single trace,
whether a given operation is a dummy addition, she can use the Schindler-Itoh-
attack to find the scalar. The Schindler-Itoh-attack applies to RSA and to ECC.
Here only ECC is considered. There are two versions of the Schindler-Itoh-attack:
The basic version and the enhanced version.

Basic Version. In the basic version the attacker finds a t-collision, i.e. t traces where the same factor is used for exponent blinding in each trace, and uses a majority decision for every single bit of the blinded secret exponent. This way she can correct the errors she made when guessing the secret scalar from single traces.
There are essentially three phases of the basic version of the attack:

1. Find t traces with the same blinding factor
2. Apply the majority decision rule for every bit
3. Correct the remaining errors by brute force

Let $v_j = (v_{j,k+R-1}, v_{j,k+R-2}, \ldots, v_{j,0})_2$ and $v_m = (v_{m,k+R-1}, v_{m,k+R-2}, \ldots, v_{m,0})_2$ be the blinded secret scalars of two different traces.
Let $e_j = (e_{j,k+R-1}, e_{j,k+R-2}, \ldots, e_{j,0})_2$ and $e_m = (e_{m,k+R-1}, e_{m,k+R-2}, \ldots, e_{m,0})_2$ be the guessing errors, which means $e_{j,i} = 0$ if and only if the attacker has guessed $v_{j,i}$ correctly. Let $\epsilon_b$ be the error rate, which means $\Pr(e_{j,i} = 1) = \epsilon_b$. We assume that $e_{j,i}$ and $e_{j,k}$ are independently distributed for $i \neq k$. The outcome of the SPAs of the attacker is $\tilde{v}_j = v_j \oplus e_j$ and $\tilde{v}_m = v_m \oplus e_m$.
In the first phase the attacker tries to find t traces with the same blinding factor, i.e. traces with indices $j_1, j_2, \ldots, j_t$ such that $r_{j_1} = r_{j_2} = \ldots = r_{j_t}$. She decides that two traces have the same blinding factor if $HAM(\tilde{v}_j \oplus \tilde{v}_m) < \mu$ holds for a certain threshold $\mu$. This decision rule is justified by the fact that if both traces have the same blinding factor, the following holds:

$$HAM(\tilde{v}_j \oplus \tilde{v}_m) = HAM((v_j \oplus e_j) \oplus (v_m \oplus e_m)) = HAM(e_j \oplus e_m) \quad (1)$$

$HAM(e_j \oplus e_m)$ is binomially $B(2\epsilon_b(1 - \epsilon_b), n)$-distributed, because $e_j$ and $e_m$ are independent. If the blinding factors of both traces are different, then $HAM(\tilde{v}_j \oplus \tilde{v}_m) = HAM((v_j \oplus e_j) \oplus (v_m \oplus e_m))$ is binomially $B(1/2, n)$-distributed, because $v_j$, $e_j$, $v_m$ and $e_m$ are independent.
After the collisions are found, the attacker applies the majority decision rule to every bit in the second phase. It is still possible that the majority decision yields wrong results for some of the bits. In the third phase the attacker corrects these errors. The probability $q_{t,\epsilon_b}$ that for a given bit the majority decision yields the wrong result is:


$$q_{t,\epsilon_b} = \sum_{s=u+1}^{2u+1} \binom{t}{s} \epsilon_b^s (1 - \epsilon_b)^{t-s}. \quad (2)$$

Now if t = 2u + 1 traces with the same blinding factor are found, $(k + R) \cdot q_{t,\epsilon_b}$ false guesses are to be expected. The attacker does not know for which bits the majority decision was wrong. This yields $\sum_{i=0}^{(k+R) \cdot q_{t,\epsilon_b}} \binom{k+R}{i}$ expected operations to correct the remaining errors by brute force. Note that the attacker is assumed to know dP and, thus, assumed to be able to verify a certain hypothesis for d.
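To make the three phases concrete, the following minimal Python sketch (our code and naming, not the algorithm from [6]) runs phases 1 and 2 on simulated noisy blinded exponents; the threshold and all parameters are illustrative.

```python
# Illustrative sketch of phases 1 and 2 of the basic attack (names are ours).
import random
from itertools import combinations

def hamming(x):
    return bin(x).count("1")

def find_collisions(noisy, mu):
    """Phase 1: report index pairs whose noisy exponents differ in < mu bits."""
    return [(j, m) for j, m in combinations(range(len(noisy)), 2)
            if hamming(noisy[j] ^ noisy[m]) < mu]

def majority_vote(colliding, n_bits):
    """Phase 2: decide every bit of the blinded exponent by majority."""
    t = len(colliding)
    return sum((1 << i) for i in range(n_bits)
               if 2 * sum((g >> i) & 1 for g in colliding) > t)

# Toy demonstration: 3 traces with the same blinded exponent, error rate 0.1
random.seed(1)
n_bits, eps = 64, 0.10
v = random.getrandbits(n_bits)
noisy = [v ^ sum((random.random() < eps) << i for i in range(n_bits))
         for _ in range(3)]
print(find_collisions(noisy, mu=25))   # all three pairs pass the threshold
recovered = majority_vote(noisy, n_bits)
print(hamming(recovered ^ v), "bit(s) left for the brute-force phase 3")
```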

Enhanced Version. In the enhanced version the attacker finds several pairs of u-tuples of traces, where for each pair the sums of the blinding factors of both u-tuples are the same. This means we have two u-tuples of indices $(j_1, j_2, \ldots, j_u)$ and $(i_1, i_2, \ldots, i_u)$ corresponding to blinded scalars $v_k = d + r_k y$, such that $r_{j_1} + r_{j_2} + \ldots + r_{j_u} = r_{i_1} + r_{i_2} + \ldots + r_{i_u}$. Finding several collisions yields a system of linear equations in the blinding factors, which can be solved. This way the secret scalar d can be found. The steps of the enhanced version of the attack are the following:

1. Find several u-tuples with the same sum of blinding factors. Obtain a system of linear equations in the blinding factors $r_1, r_2, \ldots, r_N$ over $\mathbb{Z}$.
2. Find $r'_1, r'_2, \ldots, r'_N$ with $(r_1, r_2, \ldots, r_N) = (r'_1, r'_2, \ldots, r'_N) + (c, c, \ldots, c)$ by solving this system.
3. Compute for all $j < N$: $\tilde{v}_j - r'_j y = d + r_j y + e_j - (r_j - c)y = d + cy + e_j$. Then determine $d + cy \equiv d \pmod{y}$. In [6] an explicit algorithm for this step is given.

In the enhanced version the definition of the error vectors ej is different than in
the basic version: ej is defined by ej := ṽj − vj , where vj is the correct blinded
exponent and ṽj is the erroneous blinded exponent, which is the outcome of the
SPA.

3 Leakage Scenarios for Partial Information

The precondition of the Schindler-Itoh-attack is that the attacker is able to de-


termine the whole secret scalar by SPA with a certain error rate. This is the case
when a double-and-add-algorithm is protected by dummy additions and the at-
tacker can distinguish between dummy additions and necessary additions with a

certain error rate. We will now turn to scenarios where only partial information about the secret scalar leaks with a certain error rate. Such scenarios are plausible if a window algorithm, where there are several types of additions, is used instead of the double-and-add-algorithm. Perhaps the attacker can distinguish between some types of addition, but cannot distinguish between all types of addition, revealing only some information about the secret scalar. As an example we consider a 2-bit window with dummy additions. Given an elliptic curve E defined over a finite field F this algorithm looks like this:

Input: A point P in E(F), a scalar $d = \sum_{i=0}^{n-1} d_i 2^i$ with $d_i \in \{0, 1\}$ and n even
Output: The point Q = dP

1. Precompute 2P and 3P
2. Q := 0
3. Q̃ := 0
4. i := n − 1
5. While (i > 0)
   5.1 Q := 2Q
   5.2 Q := 2Q
   5.3 If di = 1 and di−1 = 1, then Q := Q + 3P
   5.4 If di = 1 and di−1 = 0, then Q := Q + 2P
   5.5 If di = 0 and di−1 = 1, then Q := Q + P
   5.6 If di = 0 and di−1 = 0, then
       5.6.1 Choose x ∈ {1, 2, 3} randomly
       5.6.2 Q̃ := Q + xP
   5.7 i := i − 2
6. Return Q
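A runnable sketch of this algorithm follows. Since a full curve implementation would only distract, integers modulo a large prime stand in for curve points (an assumption for illustration only), and the dummy addition's result is computed but discarded.

```python
# A minimal Python sketch of the 2-bit window method with dummy additions,
# using a toy stand-in group instead of elliptic-curve points.
import random

GROUP_ORDER = 2**127 - 1   # hypothetical stand-in group (a prime)

def add(a, b):             # stand-in for point addition
    return (a + b) % GROUP_ORDER

def window2_mul(d, P, n):
    """Compute d*P with a 2-bit window and dummy additions (n even)."""
    bits = [(d >> i) & 1 for i in range(n)]
    table = {1: P, 2: add(P, P), 3: add(add(P, P), P)}   # precompute 2P, 3P
    Q = 0
    i = n - 1
    while i > 0:
        Q = add(Q, Q)                    # first doubling
        Q = add(Q, Q)                    # second doubling
        w = 2 * bits[i] + bits[i - 1]    # current 2-bit window value
        if w != 0:
            Q = add(Q, table[w])         # necessary addition
        else:                            # dummy addition, result discarded
            _ = add(Q, table[random.choice([1, 2, 3])])
        i -= 2
    return Q

assert window2_mul(0b1011, 5, 4) == (0b1011 * 5) % GROUP_ORDER
```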

Two bits of the secret scalar are considered at once. After two doublings there are four possible types of operations: addition of P, addition of 2P, addition of 3P and dummy addition. A dummy addition is randomly one of the three types of addition, whose result is not used. (For further analysis of the algorithm, see [5], p. 614.) The dummy additions are a countermeasure against SPA. Without dummy additions an attacker able to distinguish between additions and doublings can find out whether $d_l = d_{l-1} = 0$ holds. Clearly, if an attacker can discriminate between all four of these types, she can find out the whole scalar by SPA. If an attacker can discriminate between all four types of operations with a tolerable error rate, she can just apply the normal Schindler-Itoh-attack. But it is also possible that the attacker can only gain partial information by SPA, which means that she can discriminate between different classes of operation with a certain error rate, but not between all four types of operations. We will consider two scenarios.

1. The attacker knows the used exponentiation algorithm and can differentiate between doublings and additions. Furthermore she can distinguish additions which are necessary for the calculation of dP from dummy additions with a certain error rate. But she cannot decide whether P, 2P or 3P is added. This is plausible if the attacker uses an address-bit attack, see [3].

2. The attacker knows the used exponentiation algorithm and can differentiate between doublings and additions. Also she can decide whether P, 2P, or 3P is added with a certain error rate, but she cannot detect dummy additions.

In both scenarios the attacker can find out some information on the secret scalar with a certain error rate, but not all information. In the two following chapters it will be analyzed whether an attack like the basic version of the Schindler-Itoh-attack can be used to find the secret scalar. For that the attacker must be able to find collisions of blinding factors and to correct her guessing errors. After that she can use a variation of the BSGS-algorithm to find the secret scalar.

4 The First Leakage Scenario


To extend the Schindler-Itoh-attack to the first scenario we consider the following mapping:

$$\varphi: \mathbb{N} \to \mathbb{N}, \quad (a_{n-1}, a_{n-2}, \ldots, a_0)_2 \text{ with even } n \mapsto (b_{n/2-1}, b_{n/2-2}, \ldots, b_0)_2, \quad \text{with } b_i = a_{2i} + a_{2i+1} - a_{2i}a_{2i+1}$$

This mapping corresponds to the detection of dummy additions: If at least one of the two bits $a_{2i}$ and $a_{2i+1}$ is equal to one, then $b_i = a_{2i} + a_{2i+1} - a_{2i}a_{2i+1} = 1$. In this case the algorithm performs an addition which is really necessary. If $a_{2i} = a_{2i+1} = 0$, then $b_i = a_{2i} + a_{2i+1} - a_{2i}a_{2i+1} = 0$ and the algorithm performs a dummy addition. So we have $b_i = 0$ if and only if a dummy addition is performed. This means that given a secret scalar d, the attacker can find out $\varphi(d)$ with a certain error rate.
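In code, the map reads as follows (a direct transcription; the function name phi is ours):

```python
# phi collapses each 2-bit window of a scalar to a single bit that is 0
# exactly when the window is 00, i.e. when a dummy addition is performed.
def phi(a, n):
    """Map an n-bit value (n even) to its n/2-bit dummy-addition pattern."""
    b = 0
    for i in range(n // 2):
        a0 = (a >> (2 * i)) & 1
        a1 = (a >> (2 * i + 1)) & 1
        if a0 + a1 - a0 * a1:        # b_i = a_2i + a_2i+1 - a_2i * a_2i+1
            b |= 1 << i
    return b

assert phi(0b1100_0110, 8) == 0b1011   # windows 11,00,01,10 -> 1,0,1,1
```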
We now apply the first phase of the basic version of the attack: Given $v_j = \sum_{i=0}^{k+R-1} v_{j,i} 2^i$ with $v_{j,i} \in \{0, 1\}$, where $v_j = d + r_j y$ is the blinded exponent, the attacker considers $\varphi(v_j)$. Note that the mean of the Hamming weight of $\varphi(v_j) = x_j = (x_{j,(k+R)/2-1}, x_{j,(k+R)/2-2}, \ldots, x_{j,0})_2$ is $3(k+R)/8$, since only 1/4 of the additions are dummy additions. The attacker can find out $x_j \oplus e_j$, where $e_j = (e_{j,(k+R)/2-1}, e_{j,(k+R)/2-2}, \ldots, e_{j,0})$ is an error vector, i.e. $e_{j,i} = 0$ if and only if the attacker has guessed $x_{j,i}$ correctly. Let $\epsilon_b = \Pr(e_{j,i} = 1)$ be the error rate. We assume that $e_{j,i}$ and $e_{j,k}$ are independently distributed for $i \neq k$. As in [6] the attacker decides that $r_j = r_m$ holds if $HAM((x_j \oplus e_j) \oplus (x_m \oplus e_m)) < \mu$ holds for a certain threshold $\mu$. This can be justified by the following lemma:

Lemma 1. $HAM((x_j \oplus e_j) \oplus (x_m \oplus e_m))$ is binomially $B(2\epsilon_b(1 - \epsilon_b), \frac{k+R}{2})$-distributed if $r_j = r_m$, and binomially $B(\frac{3}{8} + \frac{\epsilon_b(1 - \epsilon_b)}{2}, \frac{k+R}{2})$-distributed otherwise.

Proof. If $r_j = r_m$, then $v_j = v_m$ and $x_j = \varphi(v_j) = \varphi(v_m) = x_m$. So $x_j \oplus x_m = (0, 0, \ldots, 0)$ and therefore

$$HAM((x_j \oplus e_j) \oplus (x_m \oplus e_m)) = HAM(e_j \oplus e_m). \quad (3)$$

Because $e_j$ and $e_m$ are independent, $HAM(e_j \oplus e_m)$ is binomially $B(2\epsilon_b(1 - \epsilon_b), \frac{k+R}{2})$-distributed.
Since $x_j$, $e_j$, $x_m$ and $e_m$ are independently distributed and $\frac{3}{4}$ of the bits of $x_j$ and $x_m$ are equal to one, $HAM(x_j \oplus e_j)$ and $HAM(x_m \oplus e_m)$ are independently binomially $B(\frac{3}{4} - \frac{\epsilon_b}{2}, \frac{k+R}{2})$-distributed. Therefore the probability of a single bit of $(x_j \oplus e_j) \oplus (x_m \oplus e_m)$ being one is $2 \cdot (\frac{3}{4} - \frac{\epsilon_b}{2}) \cdot (\frac{1}{4} + \frac{\epsilon_b}{2}) = \frac{3}{8} + \frac{\epsilon_b(1 - \epsilon_b)}{2}$.

So the decision rule is reasonable for small error rates. But finding a good value for the threshold should be harder than in the original scenario of the Schindler-Itoh-attack, since the mean of $HAM((x_j \oplus e_j) \oplus (x_m \oplus e_m))$ without a collision is below 1/2. This means that the difference between the two cases is not as big as in the leakage scenario originally considered in [6].
Remark 1. If all decisions are correct, the expected number of traces which the attacker needs to find a t-collision is $2^{\alpha R}$, where

$$\alpha = \frac{\log(t!) + (t-1)R\log(2)}{tR\log(2)}. \quad (4)$$

If wrong decisions occur, more traces are needed. Note that the probability of a wrong decision is bigger here than in the original setting of the Schindler-Itoh-attack.
Once a collision of t traces is found, the second phase of the attack works just
like in [6]: For every single bit the majority decision is applied.
Proposition 1. If $\varphi(v_i)$ is known, then the blinded secret scalar $v_i$ can be calculated with approximately $3^{3(k+R)/16}$ steps on average.
Proof. There are (k + R)/2 additions, because every addition corresponds to two bits. On average 1/4 of these additions are dummy additions and there are 3(k + R)/8 additions which are not dummy additions. Each of these additions corresponds to two bits. We can modify the BSGS-algorithm the following way: Write $v_j = q \cdot 2^{\frac{k+R}{2}} + r$ with $r < 2^{\frac{k+R}{2}}$. Clearly

$$q = \sum_{j=(k+R)/2}^{k+R-1} v_{i,j} 2^{j-(k+R)/2} \quad \text{and} \quad r = \sum_{j=0}^{(k+R)/2-1} v_{i,j} 2^j.$$

This means the $\frac{k+R}{2}$ bits of q correspond to the first $\frac{k+R}{4}$ additions performed by the algorithm, of which $\frac{3(k+R)}{16}$ are necessary. Each necessary addition is one of three possible types of addition (01, 10 and 11). So there are $3^{3(k+R)/16}$ possible values for q. Because of an analogous argument there are $3^{3(k+R)/16}$ possible values for r. So it is possible to make a table of all values for r (baby-steps) and see which of the possible values for q (giant-steps) matches.
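The following toy sketch (our code; integers modulo a prime stand in for the group and smul for scalar multiplication) implements this meet-in-the-middle search over the constrained candidate sets.

```python
# A runnable toy sketch of the modified baby-step giant-step from the proof.
from itertools import product

M = 2**31 - 1            # stand-in group order (prime)
P = 123456789            # stand-in "base point"

def smul(s, X):
    return (s * X) % M   # stand-in for scalar multiplication s*X

def candidates(pattern):
    """All half-exponents whose 2-bit windows are 0 where phi is 0 and in
    {1, 2, 3} where phi is 1 (pattern is given LSB-window first)."""
    choices = [[0] if b == 0 else [1, 2, 3] for b in pattern]
    for ws in product(*choices):
        yield sum(w << (2 * i) for i, w in enumerate(ws))

def modified_bsgs(X, low_pat, high_pat, half):
    baby = {(X - smul(r, P)) % M: r for r in candidates(low_pat)}  # baby steps
    G = smul(1 << half, P)
    for q in candidates(high_pat):                                 # giant steps
        g = smul(q, G)
        if g in baby:
            return (q << half) + baby[g]
    return None

v = 0b1100100100110110                           # 16-bit example exponent
low_pat, high_pat = [1, 1, 1, 0], [1, 1, 0, 1]   # phi-patterns of the two halves
assert modified_bsgs(smul(v, P), low_pat, high_pat, 8) == v
```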


Proposition 2. Approximately $2^{0.2972(k+R)}$ trials are needed on average to find the secret scalar. If all majority decisions are correct, the attack using the Schindler-Itoh-approach is more efficient than a BSGS-algorithm if $R < \frac{1 - 2 \cdot 0.2972}{2 \cdot 0.2972} k \approx 0.6825k$.
Proof. Follows from Proposition 1 by elementary calculation.

If R is much smaller than k, the attack is significantly more efficient than the standard attack. Note that this is the usual case in practice. Just like in [6], the probability $q_{t,\epsilon_b}$ that for a given addition the majority decision yields the wrong result is:

$$q_{t,\epsilon_b} = \sum_{s=u+1}^{2u+1} \binom{t}{s} \epsilon_b^s (1 - \epsilon_b)^{t-s}, \quad t = 2u + 1. \quad (5)$$

If t traces with the same blinding factor are found, the expected number l of wrong majority decisions is:

$$l = q_{t,\epsilon_b} \cdot \frac{k+R}{2} \quad (6)$$
Now the number L of possible locations of these at most l errors is:

$$L := \sum_{i=0}^{l} \binom{(k+R)/2}{i} \quad (7)$$

Proposition 3. Approximately $L \cdot 2^{0.2972(k+R)}$ trials are necessary on average to correct all wrong majority decisions. Approximately, the attack is more efficient than the BSGS-algorithm on average if $R < \frac{1 - 2 \cdot 0.2972}{2 \cdot 0.2972} k - \frac{\log_2(L)}{0.2972} \approx 0.6825k - 3.3650 \cdot \log_2(L)$ holds.

For example for k = 256 and R = 16 this means that the attack is more efficient than the standard attack iff $\log_2(L) < 47.168$. For k = 256 and R = 32 the attack is more efficient iff $\log_2(L) < 42.413$. Note that approximately $2^{80 + \log_2(L)}$ trials are necessary to find the secret scalar in case of R = 16 and approximately $2^{86 + \log_2(L)}$ trials in case of R = 32, compared to $2^{128}$ trials in case of the standard attack. Table 1 shows different values for $M = \log_2(L) + 0.2972 \cdot (k + R)$ for different error rates $\epsilon_b$ and different values for t, where a t-collision is found.

Table 1. Attack from chapter 4: Values for M for (k,R)=(256,16) and (k,R)=(256,32) for different values of t and different values for $\epsilon_b$. The attacker needs $2^M$ trials on average.

ε_b            0.05    0.10    0.15     0.20     0.25
M(t=3, R=16)   87.098  87.098  103.744  111.219  114.966
M(t=3, R=32)   93.180  99.350  110.075  131.790  155.410
M(t=5, R=16)   87.098  87.098  87.098   103.186  113.744
M(t=5, R=32)   93.180  93.180  93.180   99.350   110.075
M(t=7, R=16)   87.098  87.098  87.098   87.098   87.098
M(t=7, R=32)   93.180  93.180  93.180   93.180   93.180
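For concreteness, here is a small hedged computation of formulas (5)-(7) and of $M = \log_2(L) + 0.2972(k+R)$ (function names are ours); the exact rounding of l behind Table 1 is not specified in the text, so results may differ slightly from the printed entries.

```python
# Hedged numeric sketch of formulas (5)-(7) and the workload exponent M.
from math import comb, log2

def majority_error(t, eps):                        # formula (5), t = 2u + 1
    u = (t - 1) // 2
    return sum(comb(t, s) * eps**s * (1 - eps)**(t - s)
               for s in range(u + 1, t + 1))

def workload_exponent(k, R, t, eps):
    l = round(majority_error(t, eps) * (k + R) / 2)        # formula (6)
    L = sum(comb((k + R) // 2, i) for i in range(l + 1))   # formula (7)
    return log2(L) + 0.2972 * (k + R)

print(workload_exponent(256, 16, 3, 0.05))  # close to the Table 1 entry 87.098
```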

5 The Second Leakage Scenario


As described in chapter 3, the attacked algorithm randomly chooses the addition type (P, 2P or 3P) of dummy additions. It would also be possible that all dummy additions are of one fixed type A. In this case half of the additions would be additions of type A and only a quarter of the additions would be additions of type B and type C respectively. Thus only the additions of type A would be candidates for dummy additions and the attacker would get information about the location of dummy additions.

5.1 Applying the Schindler-Itoh-attack to the Second Leakage Scenario

In the second scenario an attacker can make several kinds of guessing errors. She can guess 11 instead of 01, 10 instead of 00 and so on. Note that guessing 01 instead of 10 and guessing 10 instead of 01 result in two wrong bits, while all other types of guessing errors result in only one wrong bit. Since there are three different types of additions, in general for each type of addition two types of mistakes are possible. We make the assumption that there is one fixed error rate $\epsilon_b$ and that given one operation both kinds of mistakes occur with the same probability. So given an operation of type A, the attacker will consider this operation to be an operation of type B with probability $\frac{\epsilon_b}{2}$ and will consider it to be an operation of type C with probability $\frac{\epsilon_b}{2}$. If the two bits 00 are processed, one randomly chosen type of the three different types of additions will be executed as a dummy addition. The attacker is not able to know that a dummy addition is executed and will determine the type of the addition. The error rate $\epsilon_b$ also holds for dummy additions. The attacker determines every type of operation and guesses the corresponding bits.
Lemma 2. The attacker will guess $\frac{k+R}{2}\epsilon_b + \frac{k+R}{6}$ bits wrong on average.

Lemma 3. Let there be a collision. Then the estimated Hamming weight of $\tilde{v}_j \oplus \tilde{v}_m$ is $(k + R) \cdot (\frac{1}{4}\epsilon_b^2 + \epsilon_b(1 - \epsilon_b) + \frac{1}{9})$.

Lemma 4. Let there be no collision. Then the estimated Hamming weight of $\tilde{v}_j \oplus \tilde{v}_m$ is $\frac{4(k+R)}{9}$.
The proofs of lemma 2, lemma 3 and lemma 4 are omitted here.
Thus for small error rates the expected Hamming weight of $\tilde{v}_j \oplus \tilde{v}_m$ in presence of a collision is significantly smaller than the expected Hamming weight without a collision. Thus it should be possible to find collisions in this scenario as well. Note that the Hamming weights are not binomially distributed here: the probabilities of two adjacent bits being one are not independent, because two adjacent bits correspond to one addition, so the probabilities of guessing the bits wrong are not independent.
Remark 2. If all decisions are correct, then the expected number of traces the attacker needs to find a t-collision is $2^{\alpha R}$ as in chapter 4, see formula (4) for the definition of $\alpha$. If wrong decisions occur, more traces are needed. Note that the probability of a wrong decision is bigger here than in the original setting of the Schindler-Itoh-attack.
After finding a class of t traces with the same exponent the majority decision
rule can be applied for every single bit just like in [6].

Lemma 5. After the attacker has successfully distinguished between the three types of addition, she can find out the secret scalar with approximately $2^{\frac{k+R}{4}}$ trials on average.

Proof. There are $\frac{k+R}{2}$ additions. For each addition the attacker has to find out whether it is a dummy addition. With a modified BSGS-algorithm as in the proof of Proposition 1 she needs $2^{\frac{k+R}{4}}$ steps on average to do so.

For reasonable values of k and R, this is more efficient than the standard attack using the BSGS-algorithm. We now estimate the number of trials to correct the guessing errors. Note that the attacker has three different possible guesses for each operation (addition of P, 2P and 3P), so she cannot always apply the majority decision rule straightforwardly. For example if the attacker has found a collision of three traces with the same blinded scalar, it is possible that she observed one addition of P, one addition of 2P and one addition of 3P at one position. To solve this problem, the majority decision rule is applied bitwise. A better decision strategy might further improve the attack. Straightforward combinatorics shows that the probability for a wrong majority decision for a given operation is:


$$q_{t,\epsilon_b} = \sum_{s=u+1}^{2u+1} \binom{t}{s} \epsilon_b^s (1 - \epsilon_b)^{t-s} \cdot \left(\frac{2}{3} + \frac{1}{3 \cdot 2^{s-1}} \sum_{i=0}^{s-u-1} \binom{s}{i}\right), \quad t = 2u + 1. \quad (8)$$

If t traces with the same blinding factor are found, the expected number l of wrong majority decisions is:

$$l = q_{t,\epsilon_b} \cdot \frac{3(k+R)}{8} \quad (9)$$
Now the number L of possible locations of these at most l errors is:

$$L := \sum_{i=0}^{l} \binom{3(k+R)/8}{i} \quad (10)$$

Proposition 4. On average $L \cdot 2^{\frac{k+R}{4}+1}$ trials are enough to correct the wrong majority decisions. The attack is on average more efficient than the BSGS-algorithm if $\log_2(L) < \frac{k - R - 4}{4}$.

Proof. Follows from Lemma 5 and the fact that for every error there are two possible corrections.

So for k = 256 and R = 16 this means $\log_2(L) < 59$ and for k = 256 and R = 32 this means $\log_2(L) < 55$. Note that on average $2^M$ trials are necessary to find the secret scalar, where $M = 69 + \log_2(L)$ if k = 256 and R = 16, and $M = 73 + \log_2(L)$ if k = 256 and R = 32, compared to $2^{128}$ trials in case of the BSGS-algorithm. Table 2 shows different values for M for different error rates $\epsilon_b$ and different values for t, where a t-collision is found.

Table 2. Attack from section 5.1: values for M for (k,R)=(256,16) and (k,R)=(256,32) for different values of t and different values for $\epsilon_b$. The attacker needs $2^M$ trials on average.

ε_b            0.05    0.10    0.15     0.20     0.25
M(t=3, R=16)   75.687  86.433  99.418   110.202  124.966
M(t=3, R=32)   79.768  90.680  103.919  118.290  132.850
M(t=5, R=16)   75.687  75.687  86.433   95.386   110.202
M(t=5, R=32)   79.768  79.768  90.680   99.802   114.965
M(t=7, R=16)   75.687  75.687  75.687   86.433   99.418
M(t=7, R=32)   79.768  79.768  79.768   90.680   103.919

5.2 Using the Found Collisions to Gain Even More Information

In this scenario it is not just possible to apply the Schindler-Itoh-attack to correct the errors; the attacker can also use the attack to gain information about the location of dummy additions. She can distinguish the three different types of additions, so to gain full information she has to detect the dummy additions. Suppose she has found t = 5 blinded scalars with the same blinding factor. She now applies the majority decision rule. In case of a dummy addition the probability that the attacker has made the same guess four or five times is significantly lower than in the case that the operation is necessary for the calculation. This gives her a criterion for where most of the dummy additions are located. The attacker considers two adjacent bits corresponding to one addition. For each of the five considered traces she has guessed that this addition and the two corresponding bits belong to one type (01, 10 or 11). If this operation is not a dummy addition, then every time the same operation has been performed and it depends only on guessing errors whether the attacker has made different guesses. In this case, with probability
$$p_{\epsilon_b} = (1 - \epsilon_b)^5 + 5(1 - \epsilon_b)^4 \epsilon_b + \epsilon_b^5 \cdot 12 \cdot \frac{1}{2^5} + 5 \epsilon_b^4 (1 - \epsilon_b) \cdot \frac{1}{2^4} \quad (11)$$
the attacker has made the same guess at least four times. The value of $p_{\epsilon_b}$ for several error rates is shown in Table 3.
If the addition is a dummy addition, then in each of the five executions of the algorithm the performed type of addition was chosen randomly. In this case the probability that the attacker has made the same guess at least four times is

$$q = 3 \cdot \frac{1}{3^5} + 3 \cdot 2 \cdot 5 \cdot \frac{1}{3^5} = \frac{11}{81} \approx 0.136 \quad (12)$$

Table 3. The value of $p_{\epsilon_b}$ for different values of $\epsilon_b$

ε_b      0.05   0.10   0.15   0.20   0.25
p_{ε_b}  0.977  0.987  0.835  0.738  0.634

Thus in case of a dummy addition the probability that the attacker has made the same guess four or five times is significantly lower than in the case that the operation is necessary for the calculation. Therefore the attacker can not only correct her guessing errors, but will additionally obtain some information about the location of the dummy additions.

Definition 1. Let us call an operation where she made the same guess at most three times suspicious. Let $N_1$ be the number of suspicious operations and $D_1$ the number of suspicious dummy operations. Let $N_2$ be the number of not suspicious operations and $D_2$ the number of not suspicious dummy operations.

With high probability most of the suspicious operations are dummy operations and only few dummy operations are not suspicious, i.e. $N_1 - D_1$ and $D_2$ are small. The attacker searches the set of suspicious operations to find the $N_1 - D_1$ suspicious non-dummy operations and searches the set of not suspicious operations to find the $D_2$ dummy operations which are not suspicious. This way she can find the whole secret scalar much faster than by blindly trying all possible locations. The following proposition gives the average workload for an attack for t = 5:

Proposition 5. On average we have $N_1 = \frac{k+R}{8} \cdot (4 - 3p_{\epsilon_b} - q)$, $N_2 = \frac{k+R}{8} \cdot (3p_{\epsilon_b} + q)$, $D_1 = \frac{(k+R)(1-q)}{8}$ and $D_2 = \frac{(k+R)q}{8}$. For t = 5, if she has corrected all guessing errors, the attacker can find the secret scalar in $\sqrt{\Omega_1} \cdot \sqrt{\Omega_2}$ trials, where $\Omega_1 = \sum_{i=0}^{N_1 - D_1} \binom{N_1}{i}$ and $\Omega_2 = \sum_{i=0}^{D_2} \binom{N_2}{i}$.

Proof. The formulas for the average values of $N_1$, $N_2$, $D_1$ and $D_2$ are obtained straightforwardly. The attacker needs $\Omega_1$ trials to find the suspicious non-dummy operations and $\Omega_2$ trials to find the not suspicious dummy operations. With a variation of the BSGS-algorithm as in Lemma 5 this can be reduced to its square root.


This means that on average $2^M$ trials with $M = \log_2(\sqrt{\Omega_1} \cdot \sqrt{\Omega_2}) + \log_2(L)$ are necessary to find the secret scalar for t = 5. For the definition of L see section 5.1; the factor L derives from the correction of guessing errors. Table 4 shows the value of M for (k,R)=(256,16) and (k,R)=(256,32) and for different error rates $\epsilon_b$.
Thus this is clearly a further advance over the attack in 5.1, and an attack on an implementation with k = 256 and R ≤ 32 becomes definitely feasible for error rates up to 0.15.

Table 4. Attack from section 5.2: values for M for (k,R)=(256,16) and (k,R)=(256,32) for t = 5 and different values for $\epsilon_b$. The attacker needs $2^M$ trials on average.

M \ ε_b   0.05    0.10    0.15    0.20    0.25
M(R=16)   26.167  33.653  51.339  66.584  86.437
M(R=32)   26.611  34.305  52.878  68.753  89.398
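A hedged sketch of the Proposition 5 quantities follows (helper names are ours). It evaluates the search exponent before error correction; the printed Table 4 values additionally include the guessing-error term $\log_2(L)$, whose rounding conventions the text does not fully specify.

```python
# Sketch evaluating the average quantities from Proposition 5.
from math import comb, log2, sqrt

def prop5_exponent(k, R, p, q=11/81, log2_L=0.0):
    N1 = (k + R) / 8 * (4 - 3 * p - q)      # suspicious operations
    D1 = (k + R) * (1 - q) / 8              # suspicious dummy operations
    D2 = (k + R) * q / 8                    # not suspicious dummy operations
    N2 = (k + R) / 2 - N1
    Om1 = sum(comb(round(N1), i) for i in range(round(N1 - D1) + 1))
    Om2 = sum(comb(round(N2), i) for i in range(round(D2) + 1))
    return log2(sqrt(Om1) * sqrt(Om2)) + log2_L

print(prop5_exponent(256, 16, p=0.977))   # compare with Table 4, eps_b = 0.05
```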

5.3 Simulation Data

To validate these results, numerical experiments have been conducted. For k = 256 and R = 16 as well as R = 32, 100,000 winning classes of 5 traces each have been simulated. The average results are shown in Table 5. The values for $N_1$, $D_1$, $D_2$ and l confirm the theory.

Table 5. Simulation data for attack from section 5.2

ε_b:                                                0.05    0.10    0.15    0.20    0.25
Number of suspicious operations (R=16)              31.711  37.682  46.188  56.096  66.623
Number of suspicious operations (R=32)              33.517  39.896  48.900  59.415  70.523
Number of suspicious dummy operations (R=16)        29.403  29.388  29.394  29.404  29.404
Number of suspicious dummy operations (R=32)        31.082  31.105  31.109  31.124  31.127
Number of not suspicious dummy operations (R=16)    4.613   4.627   4.629   4.623   4.619
Number of not suspicious dummy operations (R=32)    4.895   4.905   4.885   4.889   4.897
Number of guessing errors (R=16)                    0.090   0.660   2.072   4.518   8.117
Number of guessing errors (R=32)                    0.095   0.689   2.188   4.795   8.609
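A simulation of this kind is easy to reproduce. The following sketch (our code, not the author's original experiment) re-creates the counting of suspicious operations for one parameter set.

```python
# Compact re-simulation in the spirit of section 5.3: for each addition of a
# 5-collision, draw noisy type guesses and count suspicious operations.
import random

def simulate(k=256, R=16, eps=0.05, t=5, runs=1000):
    n_add = (k + R) // 2
    susp = susp_dummy = not_susp_dummy = 0
    for _ in range(runs):
        for _ in range(n_add):
            dummy = random.random() < 0.25     # a 00-window occurs with prob. 1/4
            fixed = random.choice((1, 2, 3))   # type of a necessary addition
            guesses = []
            for _ in range(t):
                true = random.choice((1, 2, 3)) if dummy else fixed
                guess = true
                if random.random() < eps:      # error: one of the two other types
                    guess = random.choice([x for x in (1, 2, 3) if x != true])
                guesses.append(guess)
            suspicious = max(guesses.count(x) for x in (1, 2, 3)) < 4
            susp += suspicious
            susp_dummy += suspicious and dummy
            not_susp_dummy += (not suspicious) and dummy
    print(susp / runs, susp_dummy / runs, not_susp_dummy / runs)

simulate()   # roughly reproduces the first column of Table 5
```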

6 A Barrier to Applying the Enhanced Version to Partial Information

Applying the enhanced version of the attack to partial information faces a barrier: the lack of arithmetic structure. The enhanced version consists of three different phases: firstly finding collisions, secondly solving a system of linear equations and thirdly determining d + cy. For the first phase and for the third phase the $\tilde{v}_{j_k}$ have to be added. In the first phase $\sum_{k=1}^{u} \tilde{v}_{j_k} - \sum_{k=1}^{u} \tilde{v}_{i_k}$ has to be computed to decide whether there is a collision, and in the third phase $\tilde{v}_j - r'_j y$ must be computed. This is not possible if the $\tilde{v}_{j_k}$ are only partially known.

A Barrier in the First Phase. In the first phase the attacker has to detect collisions of sums of blinding factors. She decides for a collision if $HAM(NAF(\sum_{k=1}^{u} \tilde{v}_{j_k} - \sum_{k=1}^{u} \tilde{v}_{i_k})) < b_0$ for a certain threshold $b_0$. Essential for the decision rule is the fact that

$$\sum_{k=1}^{u} \tilde{v}_{j_k} - \sum_{k=1}^{u} \tilde{v}_{i_k} = ry + \sum_{k=1}^{u} e_{j_k} - \sum_{k=1}^{u} e_{i_k},$$

where $r = 0$ if $\sum_{k=1}^{u} r_{j_k} = \sum_{k=1}^{u} r_{i_k}$ and $r \in \mathbb{Z} \setminus \{0\}$ else.

Now consider the case where the attacker cannot find out $\tilde{v}_j = v_j + e_j$, because she gets only partial information from SPA. Just like in the discussion of the first scenario we can define a map $\varphi_j: \mathbb{N} \to \mathbb{N}$, which is not injective. Given a blinded secret scalar $v_j$, the attacker can find $\varphi_j(v_j) + e_j$. Here $\varphi_j(v_j)$ contains partial information about $v_j$ and $e_j$ is an error vector. We have

$$\sum_{k=1}^{u} (\varphi_{j_k}(v_{j_k}) + e_{j_k}) - \sum_{k=1}^{u} (\varphi_{i_k}(v_{i_k}) + e_{i_k}) = \sum_{k=1}^{u} \varphi_{j_k}(v_{j_k}) - \sum_{k=1}^{u} \varphi_{i_k}(v_{i_k}) + \sum_{k=1}^{u} e_{j_k} - \sum_{k=1}^{u} e_{i_k} \quad (13)$$

It is only justified to use a decision rule like in [6] if

$$\sum_{k=1}^{u} \varphi_{j_k}(v_{j_k}) - \sum_{k=1}^{u} \varphi_{i_k}(v_{i_k}) = 0. \quad (14)$$

But this is not the case in general. In fact it is the case if the following three conditions hold:

1. There is one single map $\varphi$ with $\varphi_j = \varphi$ for all j, i.e. $\varphi_j$ does not depend on j.
2. $\varphi(a + b) = \varphi(a) + \varphi(b)$ and $\varphi(a - b) = \varphi(a) - \varphi(b)$ for all $a > b$
3. $\varphi(0) = 0$

In the first scenario, which is considered in chapter 4, the first condition holds; we even defined $\varphi$ explicitly. In the second scenario, which is considered in chapter 5, the first condition does not hold, because it is randomly decided which type of addition is performed when a dummy addition is performed. This means $\varphi_j(v_j)$ does not only depend on $v_j$ but also on random decisions the cryptographic device made when the j-th power trace was recorded. Thus the map depends on j. This means the first condition may be fulfilled sometimes, but is not fulfilled always. However, even if the first condition is fulfilled, there is no reason at all to assume that the second condition is fulfilled as well, because the map depends on which information the attacker can get. There is no reason why this information should lead to an additive map. This is why finding collisions of sums of blinding factors should be impossible if only partial information is available. But in phase one it is possible to find a weaker condition 2') than conditions 2) and 3), so that the Hamming weight of the NAF of the left-hand side of (14) will be small and it is possible to find collisions:

2') $HAM(NAF(\sum_{k=1}^{u} \varphi_{j_k}(v_{j_k}) - \sum_{k=1}^{u} \varphi_{i_k}(v_{i_k})))$ is sufficiently small.

A Barrier in the Third Phase. But even if phase one and phase two work, we still face a problem in the third phase of the attack: The attacker first has to compute $\tilde{v}_j - r'_j y = d + r_j y + e_j - (r_j - c)y = d + cy + e_j$. This is not possible, because she only knows $\varphi(\tilde{v}_j)$ and does not know $\tilde{v}_j$. Let $l_j$ be the number of natural numbers x with $\varphi(x) = \varphi(\tilde{v}_j)$. The attacker can compute the set $\{x - r'_j y \mid x \in \mathbb{N} \text{ and } \varphi(x) = \varphi(\tilde{v}_j)\}$ and gets $l_j$ hypotheses for $d + cy + e_j$. But she can only verify a hypothesis for $d + cy$ and not a hypothesis for $d + cy + e_j$. Now the algorithm presented in [6] to compute $d + cy$ from $d + cy + e_j$ requires the values $d + cy + e_j$ for several indices j as input. In fact in [6] the algorithm takes this value for all N recorded power traces as an input. If the values $d + cy + e_j$ are needed for the indices $j_1, j_2, \ldots, j_r$, the attacker needs $\prod_{i=1}^{r} l_{j_i}$ trials to find the secret scalar. This can be viewed as impractical for large values of r and $l_j$.
Another reasonable approach in the second leakage scenario would be to guess the whole scalar and use the enhanced attack just like in [6]. Here the inability to recognize dummy operations just raises the error rate: The necessary operations are misinterpreted with an error rate $\epsilon_b$ and additionally the dummy operations are always misinterpreted. By Lemma 2 this would lead to an overall error rate of $\frac{1}{6} \approx 16.67\%$, even if $\epsilon_b = 0$. In [6] 13% is given as the maximal error rate which the enhanced version tolerates for R = 16. So this approach would also be impossible for $R \geq 16$. However, the example does show that despite the barriers highlighted in this chapter, every algorithm and every leakage scenario has to be analyzed carefully to determine whether a variation of the enhanced variant of the Schindler-Itoh-attack can be mounted.

7 Conclusion

It has been shown that the basic version of the Schindler-Itoh-attack can be generalized to a setting where only some of the bits leak by SPA with a certain error rate. This is possible in two scenarios where different information about a discrete exponentiation using a window method can be found out. In the first scenario dummy additions can be detected with a certain error rate, but different types of additions are indistinguishable. In the second scenario the three types of additions can be distinguished, but dummy operations cannot be detected. In both scenarios it is possible to find collisions, to correct the guessing errors using the Schindler-Itoh-attack and to find out the remaining bits using a variation of the BSGS-algorithm. In the second scenario it is even possible to gain information about the location of the dummy operations by the methods of the Schindler-Itoh-attack. This way an attack on an implementation with a secret scalar of bit length 256 and a 32-bit randomization becomes feasible. However, finding the collisions is more difficult than in the setting considered in [6], because the expected Hamming weight in presence of a collision is higher. It has been shown that it is difficult to apply the enhanced version of the attack to the case of partial information leakage due to the lack of arithmetic structure. However, it has to be further investigated in which situations this is possible.

Acknowledgement. I would like to thank my colleague Dirk Feldhusen for encouraging me to write this paper, for his valuable support and for proofreading. I also would like to thank the anonymous referees for their helpful comments.

References
1. Coron, J.-S.: Resistance against Differential Power Analysis for Elliptic Curve Cryp-
tosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
Springer, Heidelberg (1999)
2. Fouque, P.-A., Kunz-Jacques, S., Martinet, G., Muller, F., Valette, F.: Power Attack
on Small RSA Public Exponent. In: Goubin, L., Matsui, M. (eds.) CHES 2006.
LNCS, vol. 4249, pp. 339–353. Springer, Heidelberg (2006)
3. Itoh, K., Izu, T., Takenaka, M.: A Practical Countermeasure against Address-Bit
Differential Power Analysis. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003.
LNCS, vol. 2779, pp. 382–396. Springer, Heidelberg (2003)
4. Krüger, A.: Kryptographie mit elliptischen Kurven und Angriffe darauf (Elliptic Curve Cryptography and Attacks on it). Bachelor thesis, University of Bonn (2011)
5. Menezes, A., van Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography.
CRC Press (1996)
6. Schindler, W., Itoh, K.: Exponent Blinding Does Not Always Lift (Partial) SPA Resistance to Higher-Level Security. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 73–90. Springer, Heidelberg (2011)
Butterfly-Attack on Skein’s Modular Addition

Michael Zohner¹,³, Michael Kasper²,³, and Marc Stöttinger¹,³

¹ Technische Universität Darmstadt, Integrated Circuits and Systems Lab,
Hochschulstraße 10, 64289 Darmstadt, Germany
{zohner,stoettinger}@iss.tu-darmstadt.de
² Fraunhofer Institute for Secure Information Technology (SIT), Rheinstraße 75,
64295 Darmstadt, Germany
michael.kasper@sit.fraunhofer.de
³ Center for Advanced Security Research Darmstadt (CASED), Mornewegstraße 32,
64289 Darmstadt, Germany
{michael.zohner,michael.kasper,marc.stoettinger}@cased.de

Abstract. At the cutting edge of today's security research and development, the SHA-3 contest evaluates a new successor of SHA-2 for secure hashing operations. One of the finalists is the SHA-3 candidate Skein. Like many other cryptographic primitives, Skein utilizes arithmetic operations, for instance modular addition. In this paper we introduce a new method of performing a DPA on modular addition of arbitrary length. We give an overview of side channel analysis of modular addition, followed by problems occurring when dealing with large operand sizes of 32 bits and more. To overcome these problems, we suggest a new method, called the Butterfly-Attack, to exploit the leakage of modular additions. Real world applicability is shown by applying our new approach to Skein-MAC, enabling us to forge legitimate MACs using Skein.

Keywords: side-channel, SHA-3, Skein, Butterfly-Attack, modular addition.

1 Introduction

In modern cryptography a huge effort is made to ensure the security of an al-


gorithm. Cryptographic schemes, for instance, are usually designed to rely on a
mathematical problem, which is known to be computationally infeasible to solve.
Thus, instead of attacking the cryptographic scheme directly, adversaries some-
times try to break its realization by attacking the implementation. One common kind of these implementation attacks on crypto systems is the Side Channel Attack. Side channel attacks utilize all kinds of physical leaking information, e.g.
the computation time of the cryptographic algorithm, the power consumption of
the device or even the electromagnetic emission during the active computation
phase of the device. Because they exploit features of operations, side channel
attacks may be used on various cryptographic algorithms like AES, RSA or
even hash functions. An example for an operation which may be exploited by a
side channel attack is the modular addition which is one of the basic operations


for modular arithmetic and therefore used in most cryptographic algorithms.


However, even though the modular addition is a basic operation, which is of-
ten exploited by side channel attacks, there still exist issues that complicate or
hinder a practical attack.
In this paper we focus on the side channel analysis of the modular addition.
First, we outline the state-of-the-art SCA methods against this type of integer
arithmetic. Following that, we introduce our approach to performing a power analysis against large-scale modular addition that uses a chosen-input scenario in order to reduce the number of required power traces. Subsequently, we detail a common problem
of side channel attacks on modular addition and introduce our new method of
analyzing the resulting correlation, the so called Butterfly-Attack. Finally we
apply the Butterfly-Attack on the reference implementation of Skein-MAC.

2 Background Theory
The following section provides an introduction to the background theory needed to understand this paper. It gives an overview of hash functions and side channel analysis and then presents a brief introduction to Skein.

2.1 Hash Functions


Hash functions $H: \{0,1\}^* \to \{0,1\}^n$ map a variable sized data value from an input domain to a fixed sized representation from their output range, the so called hash. Additionally, cryptographic hash functions like SHA-1 and SHA-2 have to guarantee certain properties, i.e. collision resistance, second preimage resistance and one-wayness, to be safely usable in cryptography.

Matyas-Meyer-Oseas Construction: To reduce the complexity of building cryptographic hash functions, several constructions, such as the Matyas-Meyer-Oseas construction [4], have been proposed. The Matyas-Meyer-Oseas construction builds a collision resistant hash function $H: \{0,1\}^* \to \{0,1\}^n$ by using a block cipher $E: \{0,1\}^q \times \{0,1\}^o \to \{0,1\}^o$ and an initialization vector $IV \in \{0,1\}^o$. The input message blocks $m_0 m_1 \ldots m_{p-1} = M$ serve as the key in the block cipher and must each match the expected block size q (M may eventually be padded). To compute H(M) the input message M is split into p blocks of length q and then processed as:

$$y_0 = IV, \quad y_{i+1} = E(y_i, m_i) \oplus y_i, \quad H(M) = y_p \quad \text{for } 0 \leq i < p \quad (1)$$

The values $y_i$ are called state values and their size is called the state size.
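As a minimal illustration of the chaining rule (1), the following sketch runs the construction with a toy stand-in for the block cipher E; the stand-in is purely illustrative and has no cryptographic strength.

```python
# Illustrative sketch of the chaining rule (1) with a toy stand-in cipher.
def E(y, m):
    """Stand-in for E(y_i, m_i), where the message block m_i acts as the key."""
    return (y * 2654435761 + m * 40503 + 1) % 2**64

def chain_hash(blocks, iv=0):
    y = iv
    for m in blocks:          # y_{i+1} = E(y_i, m_i) XOR y_i
        y = E(y, m) ^ y
    return y

print(hex(chain_hash([0x1234, 0x5678])))
```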

Message Authentication Codes: Message Authentication Codes (MACs) are used to provide end-to-end authentication for transmitted data. The most prominent MAC function, the Hash-based Message Authentication Code (HMAC) [7], computes a MAC for a message M as follows:

$$HMAC(M) = H((K \oplus OPAD) \| H((K \oplus IPAD) \| M)) \quad (2)$$

where $\|$ denotes the concatenation, $\oplus$ the binary XOR, K is the pre-shared key, H is the hash function used, and IPAD as well as OPAD are two constants defined as the hexadecimal values 3636...36 and 5C5C...5C, which have the same size as the state value used in H.
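A minimal sketch of formula (2), instantiated with SHA-256 as H (our choice for illustration), follows; Python's hmac module serves as a cross-check. It assumes the key is at most one block long, a case the full HMAC specification also handles by hashing longer keys first.

```python
# Minimal HMAC per formula (2), cross-checked against the hmac module.
import hashlib, hmac

def hmac_sketch(key: bytes, msg: bytes, block_size: int = 64) -> str:
    key = key.ljust(block_size, b"\x00")          # pad K to the block size
    ipad = bytes(k ^ 0x36 for k in key)           # K XOR IPAD
    opad = bytes(k ^ 0x5C for k in key)           # K XOR OPAD
    inner = hashlib.sha256(ipad + msg).digest()   # H((K XOR IPAD) || M)
    return hashlib.sha256(opad + inner).hexdigest()

key, msg = b"secret", b"message"
assert hmac_sketch(key, msg) == hmac.new(key, msg, hashlib.sha256).hexdigest()
```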

2.2 Side Channel Attacks

Side channel attacks are attacks against cryptographic implementations exploit-


ing all kinds of information that can be attained during the computation. They
were introduced by Paul Kocher in 1996 and have gained more and more im-
portance since then. Side channel attacks can be divided into different groups,
depending on the information they utilize in order to recover the key. The most
prominent groups of these attacks are Power Attacks, Timing Attacks, Electro-
magnetic Emission Attacks, and Cache Attacks. Power analysis attacks exploit
the dependency of the power intake of a device on the processed data [11]. An
adversary can thus make assumptions about the processed data by analyzing
the power consumption. For instance in CMOS based microcontrollers a 1 needs
more power to be represented than a 0 due to the bit flip caused by the pre-charge
of the bus lines with 0s. Thereby, the Hamming Weight (HW) of an intermediate
value can be estimated. The intermediate values, processed in the device, can be
estimated using a hypothesis function.
The most common Power Attack is the Differential Power Analysis (DPA)
[10]. It was introduced as an advancement of the Simple Power Analysis (SPA)
where an attacker directly interprets the power consumption of a device. Using
the DPA it is possible to statistically analyze the power consumption, thereby
reducing the impact of noise - the power consumption which is not due to the
attacked intermediate value. During the DPA a correlation between the hypothesis $h_k$ of a key candidate k and the measurement y can be computed using Pearson's correlation coefficient:

$$R_\rho(h_k, y) = \frac{\sum_{i=1}^{n} (h_{k,i} - \overline{h_k})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (h_{k,i} - \overline{h_k})^2 \cdot \sum_{i=1}^{n} (y_i - \overline{y})^2}} \quad (3)$$

The correct key can then be recovered by assuming the candidate with the
highest absolute correlation.
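A small NumPy sketch of such a DPA on a simulated 8-bit modular addition follows (our toy setup, reusing the key 89 from Figure 3): hypotheses are Hamming weights of (input + key) mod 256, and the candidate maximizing the absolute Pearson correlation is selected.

```python
# NumPy sketch of a DPA with formula (3) on a simulated modular addition.
import numpy as np

def hw(x):
    """Hamming weight of each byte in x."""
    return np.unpackbits(x.astype(np.uint8)[:, None], axis=1).sum(axis=1)

rng = np.random.default_rng(0)
true_key, n = 89, 5000
inputs = rng.integers(0, 256, n)
traces = hw((inputs + true_key) % 256) + rng.normal(0, 1.0, n)  # leakage + noise

corrs = [np.corrcoef(hw((inputs + k) % 256), traces)[0, 1] for k in range(256)]
print(int(np.argmax(np.abs(corrs))))  # recovers 89
```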
An advancement of the DPA was presented in [6]. In this contribution, Bevan
et al. compute the difference of means for the actual measurement and compare
it to the theoretical distribution of the key using the least square method. The key hypothesis that minimizes the least square method is then chosen as the correct key.

2.3 Skein

Skein is a hash function family with three different internal state sizes: 256, 512
and 1024 bits. Skein consists of three components:

– Threefish: a block cipher with block sizes of 256, 512 and 1024 bits.
– Unique Block Iteration (UBI): a construction similar to Matyas-Meyer-
Oseas that builds a compression function by connecting Threefish operations.
– Optional Argument System: provides optional features like tree hashing
or MAC.

The following sections will provide a basic introduction to Skein needed to un-
derstand the attack. For further information refer to [9].

Skein Hashing: To compute the hash of a message, Skein compresses data using Threefish and connects the Threefish calls using UBI. Upon being called, Threefish divides the input $M = m_0 m_1 \ldots m_{N-1}$ and the state value $K = k_0 k_1 \ldots k_{N-1}$ for $N \in \{4, 8, 16\}$ into 64-bit blocks. Then it performs an AddRoundKey operation with M as plaintext and K as first round key:

$$y_i = m_i + k_i \pmod{2^{64}}, \quad \text{for } 0 \leq i < N. \quad (4)$$

With the resulting state values Skein then performs 72 rounds (80 rounds in case of the block size being 1024 bits) of the operations MIX and Permute, and every fourth round it adds the next round key. For the attack described in this paper only the initial AddRoundKey operation is relevant; the other operations are therefore omitted. To compute the hash of a message, Skein calls UBI three times: first with a configuration block as input and the state value 0, then with the outcome of the first call as state value and the message as input, and lastly with 0 as input and the output from the second call as state value (cf. Figure 1).

Fig. 1. Straightforward hashing with Skein

Skein MAC: Skein provides its own MAC function for performance reasons.
While with HMAC a hash function has to be called two times, Skein MAC
only needs one additional UBI call in the beginning using the key as input (cf.
Figure 2). The output of this UBI call is then used as input for the usual Skein
hashing. Note that the resulting chaining value before the third UBI call that
uses the message as input is constant if the same key and configuration are used,
so it can be precomputed and stored.

Fig. 2. Skein’s built in MAC function

3 Side Channel Attacks Using the Modular Addition

This section provides a detailed analysis of the modular addition with regard to side channel analysis. First, we outline the effects of the modular addition on the resulting correlation of a DPA. Then we give a brief overview of the different attack scenarios for modular addition and the possible attacks which can be applied to each scenario. Finally, we introduce our own approach to attack the modular addition, followed by the Butterfly-Attack, which is used to recover the key from a resulting correlation, because the usual approach of finding the key by choosing the absolute maximum is not applicable in this case.

3.1 Modular Addition

When performing a DPA on a modular addition, the resulting Pearson correlation has a distinct shape, observable in Figure 3 [5]. As one can see, only the correct key has the highest attainable correlation of 1 or -1, respectively. If one takes a closer look at the shape of the resulting correlation, one may notice

Fig. 3. Theoretical correlation of all elements in the field $GF(2^8)$ with the correct key 89 $(01011001_2)$ and the symmetric counterpart 217 $(11011001_2)$

that it is symmetric, with the points of origin being the correct key and the key candidate with only the most significant bit differing from the correct key k, a candidate which we will from now on refer to as the symmetric counterpart. The symmetric effect is due to the fact that:

$$corr(k + d \pmod{256}) = corr(k - d \pmod{256}), \quad 0 \leq d \leq 128. \quad (5)$$

Furthermore, it is noticeable that an order can be established amongst the key candidates regarding their correlation. Candidates with a small Hamming distance to the correct key tend to have a higher correlation than candidates with a high Hamming distance. Also, amongst the group of candidates whose distance is of the same Hamming weight, the ones which differ from the correct key in more significant bits have a higher correlation than the ones which differ in less significant bits.
The reason for all these effects is the carry bit and the carry bit propagation,
respectively [8]. A candidate has a higher correlation if its variance is similar to
the variance of the correct key for a specific input. In case of the modular addition
the implication is that the more constant the Hamming distance between a
candidate and the correct key is for all inputs, the higher the resulting correlation
of the candidate.

Fig. 4. Hamming weight difference between the correct key 8 and every other of the $2^4$ candidates

Figure 4 shows the Hamming weight difference for all candidates to the correct
key, in this case the value is 8. The difference for all candidates was computed and
the occurrences for each value were accumulated. As one can see, the symmetric
counterpart (0) is either one Hamming weight bigger or one Hamming weight
smaller than the correct key, making it the second most constant candidate
amongst all others.

If a carry occurs in a bit in which a candidate differs from the correct key, the Hamming distance of the candidate to the correct key is changed. The bigger the Hamming distance of a candidate to the correct key is before the addition, the likelier the Hamming distance is to change after an addition, therefore causing a loss of correlation. Furthermore, bit differences in less significant bits influence more significant bits by either causing a faulty carry or missing a correct carry, therefore inducing even further discrepancy in the overall value.
Lastly, there is another conspicuity in the correlation of the modular addition. If one compares the resulting correlations of a specific candidate for two different bit sizes, one may notice that the correlation increases with the size of the operands processed in the modular addition. This effect is due to features of the Pearson correlation. According to Appendix A, for a b-bit modular addition the Pearson correlation can be written as:

$$\frac{\sum_{i=1}^{2^b} (x_i \cdot y_i)}{\frac{b}{4} \cdot 2^b} - b \quad (6)$$

where $x_i$ is the hypothesis and $y_i$ is the measured power consumption. The numerator of this equation increases faster for a steady $x_i$ than the divisor, thereby causing the correlation of the candidates to converge towards 1 if the operand sizes tend to infinity. An example can be seen in Appendix C, where the correlation for the symmetric counterpart for growing operand size is shown. While this will not be the case in real life applications, since the register size of a device will not grow infinitely, it still shows that attacking a modular addition with fewer bits is easier than attacking a modular addition with more bits, because the difference in correlation between the candidates makes the key easier to distinguish.

3.2 State-of-the-Art Attacks against Modular Addition


Since the modular addition is a common operation, several attacks already exist.
The most basic attack is to use the modular addition as hypothesis function,
determine the Hamming weight for each key candidate and input, compute the
correlation and pick the candidate with the highest result.
While this approach is possible if the bit size N of operands of the modular
addition is small (i.e. 8 bits or 16 bits), performing this attack for bigger operands
(i.e. 32 bits or 64 bits) becomes more computationally costly since a hypothesis
has to be computed for too many candidates. Additionally, the register size R of
the platform the attack is applied to (i.e. 8 bits, 16 bits, 32 bits, or even 64 bits)
is also of importance. Table 1 summarizes the different scenarios and applicable
attacks. If the size of the modular addition N is small the regular attack can be
performed regardless of the register size of the platform. If, on the other hand,
N is big, the regular approach becomes computationally costly and a different
approach should be taken in order to attack the implementation.
In case of the device having 8-bit or 16-bit registers, one can divide the N-bit input and the candidates during the hypothesis computation into $\lceil N/8 \rceil$ 8-bit or $\lceil N/16 \rceil$ 16-bit values. Thereby, the complexity of the DPA is reduced from $2^N$ to $\lceil N/8 \rceil \cdot 2^8$ or $\lceil N/16 \rceil \cdot 2^{16}$ hypothesis computations. The separation can be performed since the device splits the N-bit modular addition into pieces its registers can process. The device performs the operations independently, only passing a carry bit to the next block if necessary. We refer to this attack as the Divide and Conquer Approach.

Table 1. Possible attacks against modular addition

N \ R      | 8 bits or 16 bits  | 32 bits or 64 bits
≤ 16 bits  | regular attack     | regular attack
> 16 bits  | Divide and Conquer | Divide and Conquer (costly)
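To make the block-wise hypothesis computation concrete, the following Python sketch (our own illustration, not the authors' code; the helper names are hypothetical) computes the Hamming weight hypotheses for one 8-bit block of an N-bit modular addition:

def hamming_weight(x):
    return bin(x).count("1")

def dnc_hypotheses(inputs, block, block_bits=8):
    # Hypotheses for one 8-bit block of the N-bit modular addition:
    # one row per key candidate, one column per measured input.
    shift = block * block_bits
    mask = (1 << block_bits) - 1
    hyps = []
    for k in range(1 << block_bits):  # 2^8 candidates per block
        row = []
        for m in inputs:
            m_block = (m >> shift) & mask
            # Carries arriving from lower blocks are neglected here;
            # in the attack they act as noise that is averaged out.
            row.append(hamming_weight((m_block + k) & mask))
        hyps.append(row)
    return hyps

The candidate whose row correlates best with the measured traces is kept, and the procedure is repeated independently for every block.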
If the size of the modular addition operands and the register size of the device both exceed 16 bits, the divide and conquer becomes more costly in terms of required measurements. The problem is that the bits are no longer processed independently and the power consumption of the omitted bits influences the attacked bits. Thus, more power traces are required in order to average out the noise.

3.3 Improved Attack against Modular Addition


The state-of-the-art approaches of attacking the modular addition have a problem when dealing with a device which has registers of size 32 bits or 64 bits and which runs a modular addition with operands of 32 bits or more. Thus, in this section we introduce our approach to efficiently attack this scenario.

Masked Divide and Conquer: To reduce the complexity of attacking the N-bit modular addition result, it is again split into blocks which are analyzed independently of each other. The difference to the regular divide and conquer is that we only choose λ successive bits of the input at random and keep the rest zero. This is done by taking the N-bit input $M = m_0 m_1 \ldots m_{N-1}$, $m_i \in \{0,1\}$, and a mask $\alpha = \alpha_0 \alpha_1 \ldots \alpha_{N-1}$, $\alpha_i \in \{0,1\}$, which has only λ successive bits set to 1, and performing the bitwise logical AND, denoted by ∧:

$$m_i' = m_i \wedge \alpha_i \quad \text{for } 0 \le i < N. \tag{7}$$
The resulting $M' = m_0' m_1' \ldots m_{N-1}'$ is then used as input for the device. Note that in this case the term mask does not refer to the countermeasure masking, where a random value is used to change the processed intermediate variable in order to randomize the key hypothesis.
If sufficient measurements for this position of the λ successive bits in the
mask have been performed, the masking value α is shifted by λ bits and the
measurement process is started again. This continues until all bits in the mask
have, at least once, been set to 1.
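A minimal sketch of this measurement procedure in Python (our own hypothetical helper, assuming the window is shifted by a fixed number of positions):

import random

def masked_inputs(N, lam, shift, n_meas):
    pos = 0
    while pos < N:
        alpha = ((1 << lam) - 1) << pos   # mask with lambda successive bits set
        for _ in range(n_meas):
            # M' = M AND alpha as in Eq. (7): only lambda bits vary
            yield pos, random.getrandbits(N) & alpha
        pos += shift                      # shift the window towards the MSB

For the setup used later in Section 4.2 one would call masked_inputs(32, 8, 4, 500).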

The concept behind this is to keep the change of the key value to an attackable size. If it is known that only a certain number of bits were likely to have changed during the modular addition and the other bits of the key remained the same, the untouched bits can be omitted in the DPA, reducing the complexity. So instead of a complexity of $2^N$ for the analysis, the masked divide and conquer strategy reduces the complexity to $2^\lambda \cdot \lceil N/\lambda \rceil$. The corresponding hypothesis function h for the DPA is:

$$h(m_\lambda, k) = HW(m_\lambda + k), \quad \text{for } 0 \le k < 2^\lambda, \tag{8}$$

where $m_\lambda$ is the λ-bit input block with the variable data.
Depending on the least significant bit of the next key block, a carry in the
most significant bit of the hypothesis leads to an increase of the overall Hamming
weight by 1 in 50% of the cases, to no change at all in 25% of the cases, to a
decrease by 1 in 12.5% of the cases, to a decrease by 2 in 6.25% of the cases and
so on. Thus an increase in Hamming weight by 1 is most likely, and we chose the hypothesis function not to reduce the result modulo $2^\lambda$.

3.4 Symmetrical Analysis


When we performed a DPA using the divide and conquer approach on the modular addition, we observed a conspicuity within the resulting correlation. While after the regular attack the correct key possessed the highest correlation, the masked divide and conquer resulted in other candidates having a higher correlation even though the overall correlation converged. Thus, choosing the candidate with the highest correlation as the correct key results in a failed key recovery. The following section introduces a way of coping with this problem, making use of the features of the modular addition.

Problems When Analyzing the Correlation: In the resulting correlations


from the DPA it still holds that each key candidate with a small Hamming
distance to the correct key possesses a high correlation. The difference is that it
may occur that neither the correct key nor its symmetric counterpart have the
highest correlation among all possible key candidates. This renders the approach of finding the correct key by choosing the candidate with the highest correlation unsuitable.
An example can be seen in Figure 5, which depicts a symmetric correlation
for the masked divide and conquer where the mask size λ is 8 bits. The correct
key 212 has a low positive and a rather high negative correlation. But there are
other candidates that have an even higher positive and/or negative correlation.
The reason for this effect is again the carry bit propagation. If a carry occurs
during the addition of the most significant bit of the λ bits block with the
key, some of the omitted bits may also change depending on the value of the
next key block. If several lesser significant bits of the next key block are set to
1, multiple carries occur and thus affect the variance of the measured traces.
Thus, the correlation between the measured traces and the correct key decreases

Fig. 5. Correlation of masked divide and conquer with block size 8 bits and 212 as
correct key

because their variance no longer matches. However, other candidates can have
their correlation increased because their behavior resembles the behavior of the
measured traces.

The Butterfly-Attack: Because we can no longer detect the key by choosing


the candidate with the highest correlation, we make use of the symmetrical
features of the modular addition. As mentioned in Section 3.1, candidates with
the same distance to the correct key are similar in correlation. Therefore, instead
of finding the candidate with the highest result, we determine the points of origin
of the symmetry. This is done by a least squares approach on the correlation of all key hypotheses. Taking a candidate k as a point of origin, for all candidates of a modular addition of N-bit size we subtract the correlation of the candidate smaller than k from the correlation of the candidate bigger than k with the same distance to k, square the difference, and accumulate the result:

$$lsq_{corr}(k) = \sum_{j=1}^{2^{N-1}} \left( corr(k + j \bmod 2^N) - corr(k - j \bmod 2^N) \right)^2. \tag{9}$$

If the candidate is not the point of origin, there is no symmetrical effect and
large values tend to be added whereas if it is the point of origin, the values tend
to be small. Figure 6 shows the squared difference computed for the correlation of Figure 5. As one can see, the key and its symmetric counterpart both have the minimal value, which leaves us with two possible keys.
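A direct Python rendering of Eq. (9) (our own sketch; corr is assumed to hold the candidate correlations at the attacked point in time):

def butterfly(corr):
    n = len(corr)  # n = 2^N candidates
    scores = []
    for k in range(n):
        s = 0.0
        for j in range(1, n // 2 + 1):
            # squared difference of the symmetric pair around origin k
            d = corr[(k + j) % n] - corr[(k - j) % n]
            s += d * d
        scores.append(s)
    best = min(scores)
    # the correct key and its symmetric counterpart share the minimum
    return [k for k, s in enumerate(scores) if s == best]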
With the Butterfly-Attack we have drastically reduced the number of pos-
sible keys for a λ bit block from 2λ to 2. While this may suffice for a lot of
scenarios, there exist some algorithms like Skein-1024 which have a large state
size, therefore still leaving us with too many key candidates to claim the attack
successful. A summary over the complexity after the Butterfly-Attack for all

Fig. 6. Butterfly-Attack performed for each key of Figure 5 on the point in time with
the highest correlation

Table 2. Complexity of attacking the different Skein variants

Skein variant | complexity                  | complexity for λ = 8
256           | $(2^{64/\lambda-1})^4$      | $2^{28}$
512           | $(2^{64/\lambda-1})^8$      | $2^{56}$
1024          | $(2^{64/\lambda-1})^{16}$   | $2^{112}$

Skein variants is depicted in Table 2. For Skein-256 it is still feasible to find the
correct state, needed to forge a Skein MAC, by verifying all possible combina-
tions. However, an exhaustive search for Skein-512 already requires dedicated
hardware and for Skein-1024 it is computationally infeasible to try all possible
combinations. Therefore, in the next section we suggest a modified version of the
masked divide and conquer which reduces the number of candidates for a block
even further and thus lets us successfully attack Skein-512 and Skein-1024.

Improving the Masked Divide and Conquer: To further reduce the number of possible keys and make the attack feasible for Skein-512 and Skein-1024, we have to determine the correct key for a λ-bit block. Because the uncertainty only remains for the most significant bit, independent of the position of the mask, we let the λ-bit blocks partially overlap during the measurement phase. In that manner, the most significant bit at one position becomes one of the lesser significant bits at the next position and we can determine its value. The number of positions shifted can be varied as required. For instance, if one shifts the mask bits by λ − 1, the number of measurements needed would decrease to a factor of $\left\lceil \frac{N - 2\lambda}{\lambda - 1} \right\rceil + 2$. It would also be possible to test every single bit multiple times with this approach, adding redundancy and therefore more confidence in the key guess, but raising the number of measurements needed.

4 Applying the Butterfly-Attack to Skein-MAC

During the computation of Skein-MAC there is a modular addition of the in-


termediate state with the input message in the third UBI call, which meets our
requirements for a DPA [12]. The state value before the UBI call is constant if
the same key is used. Thus, if one regains the state value, one is able to forge an
arbitrary Skein-MAC. Skein does the modular addition as follows:

1. split the input message M = M0 , M1 , ..., Mn−1 and the key state K =
K0 , K1 , ..., Kn−1 into n 64 bit blocks where n ∈ {4, 8, 16}, depending on
the Skein variant used
2. perform a modular addition of Mi with Ki for 0 ≤ i < n

The message is directly added to the key state, so we do not have to change it, making this an ideal attack scenario for the masked divide and conquer. We demonstrate the attack on Skein-256, but because the attacked 64-bit blocks are independent of each other and the only difference between the Skein variants is the number of 64-bit blocks, the attack is also applicable to Skein-512 and Skein-1024 with nearly the same complexity.
In the following, we present the results of the divide and conquer on an 8-bit AVR ATMega2561 microcontroller [2] to prove the practical applicability of our side-channel analysis on Skein. Then we switch to a 32-bit ARM Cortex-M3 microcontroller [1] and show the results of our masked divide and conquer. In both cases we used the reference implementation of Skein-256 submitted to the third round of the SHA-3 competition [3].

4.1 Using the Divide and Conquer Approach

Because the AVR ATMega2561 has registers of 8 bits size and therefore splits the 64-bit modular addition into eight modular additions of 8-bit operands each, we do not need to mask the input. In total we applied the DPA using 200 power traces
which were enough to achieve a stable correlation. During the analysis of the re-
sulting correlations we observed that for some key bytes the key with the highest
correlation was not the correct key candidate. However, the symmetric shape of
the correlation was still noticeable. Therefore, we applied the Butterfly-Attack on
these correlations, resulting in the correct key candidate and its symmetric counterpart. Note that we could not use our approach of shifting the mask to attain the correct key, because the most significant bit is always the most significant bit in this 8-bit block. In order to pick the correct key, we analyzed the effect of a carry in the most significant bit and decided for certain inputs whether a carry during the addition of two 8-bit values occurred or not. Thereby, we were able to
restore the state value, enabling us to compute legitimate Skein-MACs.
In order to estimate the influence of the noise due to the omitted bits, we attacked the 32-bit ARM Cortex-M3 with the divide and conquer approach. In total we performed 5000 measurements of the device. We split the 32-bit key into four blocks of 8 bits each, computed hypotheses for each block, and

compared each block independently using the Pearson correlation. Interestingly, the number of measurements required for recovering the key varied from 8-bit block to 8-bit block for each of the attacked 32-bit modular additions. While the correlation of the 8 most significant bits ($b_{31} \ldots b_{24}$) of the key stabilized after 800 measurements, the correlation of the 8 least significant bits ($b_7 \ldots b_0$) of the correct key required 1500 measurements. The two 8-bit blocks in the middle ($b_{23} \ldots b_8$) required the highest number of traces for a stable correlation, with a total number of 3800 (for $b_{23} \ldots b_{16}$) and 4100 (for $b_{15} \ldots b_8$). The difference in required traces is probably due to the varying impact of the bits on the power consumption.

4.2 Using the Masked Divide and Conquer Approach

In order to estimate the benefit of the masked divide and conquer, we also
performed it on the 32 bit ARM Cortex-M3. As mask size λ we settled on
8 bits because it brought the best trade-off for our setup (see Appendix D).
Starting with the eight least significant bits of a 32 bits block, we performed
500 measurements and then shifted the random byte four positions towards the
most significant bit. We proceeded in this manner until we covered all 32 bits. In
total we had to perform 3500 measurements for each 32 bits modular addition.
With the same setting as for the divide and conquer approach, we were able to
achieve a stable correlation and thus recover the key for all 8 bits blocks after
only 500 measurements.
To reduce the number of measurements for Skein-256, we attacked the eight 32-bit blocks simultaneously by choosing the same input for all of them. This decreased the number of measurements needed by a factor of eight for Skein-256 and sped up the computation of the hypothesis, because it only has to be computed once for each of the 32-bit blocks. We computed the most significant bits for
the first of the two 32 bits blocks by analyzing the effects of a carry in the most
significant bit and deciding for each input whether or not a carry occurred.
For the DPA we used the hypothesis function mentioned in Equation 8 in order
to compute the correlation between each of the 256 keys and the traces measured.
The resulting correlation was then analyzed using the Butterfly-Attack and the
two points of origin of the symmetry were chosen as possible key candidates.
In that manner we proceeded for all four 64-bit blocks and for every position of the mask. Finally, we combined the key hypotheses by choosing the bit value with the higher occurrence for each of the 256 bits, resulting in the correct state, which enabled us to forge legitimate Skein-MACs.

5 Conclusions and Future Work

In this paper we investigated side-channel analysis of the modular addition. We introduced the masked divide and conquer, a new scheme for dealing with large-scale modular additions. This scheme is suitable for devices with register and operand sizes of 32 or 64 bits. It is an improvement of the regular divide and

conquer against the modular addition, which performs inefficiently in this particular scenario due to the infeasible computing overhead.
Using the known divide and conquer method and the masked divide and conquer method, the key could not be recovered by applying a DPA in the regular manner. In order to cope with this problem we introduced the Butterfly-Attack, a new analysis method specifically designed for attacking the modular addition. To
show the applicability of our attack, we applied it to the reference implemen-
tation of Skein-256 where we successfully recovered the constant state value,
enabling us to forge Skein-MACs. In future work we will perform our attack on
more complex platforms like the Virtex-5 FPGA and we will also attack different
Skein variants.

References
1. ARM CortexM-3 product site,
http://www.arm.com/products/processors/cortex-m/cortex-m3.php
2. AVR ATMega2561 product site, http://www.atmel.com/
3. Skein submission to the final round of the SHA-3 contest,
http://csrc.nist.gov/groups/ST/hash/sha-3/Round3/documents/
Skein_FinalRnd.zip
4. Preneel, B., Govaerts, R., Vandewalle, J.: Hash Functions Based on Block Ciphers:
A Synthetic Approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp.
368–378. Springer, Heidelberg (1994),
http://www.springerlink.com/content/adq9luqrkkxmgk03/fulltext.pdf
5. Benoît, O., Peyrin, T.: Side-channel analysis of six SHA-3 candidates. In: Mangard,
S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 140–157. Springer,
Heidelberg (2010),
http://www.springerlink.com/content/822377q22h78420u/fulltext.pdf
6. Bevan, R., Knudsen, E.: Ways to Enhance Differential Power Analysis. In: Lee, P.J.,
Lim, C.H. (eds.) ICISC 2002. LNCS, vol. 2587, pp. 327–342. Springer, Heidelberg
(2003)
7. Krawczyk, H., Bellare, M., Canetti, R.: RFC 2104 – HMAC: Keyed-Hashing for Message Authentication. IETF (1997), http://tools.ietf.org/html/rfc2104
8. Lemke, K., Schramm, K., Paar, C.: DPA on n-Bit Sized Boolean and Arithmetic
Operations and Its Application to IDEA, RC6, and the HMAC-Construction. In:
Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 205–219.
Springer, Heidelberg (2004)
9. Ferguson, N., Lucks, S., Schneier, B., et al.: The skein hash function family. Sub-
mission to NIST, Round 3 (2010), http://www.skein-hash.info
10. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
11. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer (2007)
12. Zohner, M., Kasper, M., Stöttinger, M.: Side Channel Analysis of the SHA-3 Finalists. In: Design, Automation & Test in Europe, DATE (2012)

A Proof of Equation 6

The proof of Equation 6, with bit size b, hypothesis x and reference y, assumes that the hypothesis was computed as in Appendix B and that for the reference the results of the correct key were used. We utilize the fact that for a b-bit modular addition the corresponding mean values $\bar{x}$ and $\bar{y}$ equal $\frac{b}{2}$ and the variances $\sigma_x^2$ and $\sigma_y^2$ equal $\frac{b}{4}$.
$$
\begin{aligned}
R_\rho(x, y) &= \frac{\sum_{i=1}^{2^b} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{2^b} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{2^b} (y_i - \bar{y})^2}} \\
&= \frac{\sum_{i=1}^{2^b} (x_i y_i) - \sum_{i=1}^{2^b} (x_i \bar{y}) - \sum_{i=1}^{2^b} (\bar{x} y_i) + \sum_{i=1}^{2^b} (\bar{x} \bar{y})}{\sqrt{2^b \sigma_x^2}\,\sqrt{2^b \sigma_y^2}} \\
&= \frac{\sum_{i=1}^{2^b} (x_i y_i) - \left(\frac{b}{2}\right)^2 2^b - \left(\frac{b}{2}\right)^2 2^b + \left(\frac{b}{2}\right)^2 2^b}{\frac{b}{4} \cdot 2^b} \\
&= \frac{\sum_{i=1}^{2^b} (x_i y_i) - \left(\frac{b}{2}\right)^2 2^b}{\frac{b}{4} \cdot 2^b} \\
&= \frac{\sum_{i=1}^{2^b} (x_i y_i)}{\frac{b}{4} \cdot 2^b} - \frac{\frac{b^2}{4} \cdot 2^b}{\frac{b}{4} \cdot 2^b} \\
&= \frac{\sum_{i=1}^{2^b} (x_i y_i)}{\frac{b}{4} \cdot 2^b} - b.
\end{aligned}
$$

B Algorithm for Computing the Optimal Correlation of a Modular Addition

Algorithm 1 shows how to compute the optimal correlation of a modular addition for every candidate.

C Example Convergence for Increasing Bit Size

The correlation of the symmetric counterpart for a modular addition of bit size b can be computed by:

$$\frac{\sum_{i=1}^{2^b} (x_i \cdot y_i)}{\frac{b}{4} \cdot 2^b} - b = \frac{\sum_{i=1}^{b} 2 \binom{b-1}{i-1}\, i\, (i-1)}{\frac{b}{4} \cdot 2^b} - b. \tag{10}$$

An example is the symmetric counterpart, for which the resulting correlation is equal to $\frac{b-2}{b}$; we verified this for $2 \le b < 65$.
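This claim is easy to check numerically; the following Python snippet (our own, assuming the binomial form of Eq. (10) as reconstructed above) verifies the right-hand side against $\frac{b-2}{b}$:

from math import comb

for b in range(2, 65):
    s = sum(2 * comb(b - 1, i - 1) * i * (i - 1) for i in range(1, b + 1))
    corr = s / ((b / 4) * 2 ** b) - b
    assert abs(corr - (b - 2) / b) < 1e-9  # matches (b-2)/b for all tested b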

Algorithm 1. Optimal correlation computation

Require: bit length b, key k
Ensure: correlation corr for all candidates
1: corr[2^b];
2: result[2^b][2^b];
3: for i from 1 to 2^b do
4:   for j from 1 to 2^b do
5:     result[i][j] = i + j (mod 2^b);
6:   end for
7: end for
8:
9: for i from 1 to 2^b do
10:   corr[i] = correlation(result[i], result[k]);
11: end for
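For reference, a runnable Python version of Algorithm 1 — a sketch under the assumption, suggested by the Hamming-weight statistics of Appendix A, that the correlation is taken over the Hamming weights of the sums:

import numpy as np

def optimal_correlations(b, key):
    n = 1 << b
    hw = np.array([bin(v).count("1") for v in range(n)])
    # result[i][j] = HW((j + i) mod 2^b) for candidate i and input j
    result = np.array([hw[(np.arange(n) + i) % n] for i in range(n)])
    ref = result[key]
    return np.array([np.corrcoef(result[i], ref)[0, 1] for i in range(n)])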

D Choosing the Parameter λ

One can find a trade-off in choosing the parameter λ. The bigger λ is chosen, the fewer blocks have to be attacked, but the higher the complexity of the DPA. Conversely, the smaller λ is chosen, the higher the number of blocks which have to be attacked, but the lower the complexity of the DPA. The optimal choice for λ minimizes the following equation:

$$T_{Total} = T_{measure} \cdot \left\lceil \frac{64}{\lambda} \right\rceil \cdot N_{measure} + T_{hypo} \cdot 2^\lambda \cdot \left\lceil \frac{N}{\lambda} \right\rceil \cdot N_{measure} \tag{11}$$
where $T_{measure}$ denotes the time needed for one measurement, $N_{measure}$ is the number of measurements needed for one mask position, and $T_{hypo}$ is the time needed to compute one key hypothesis during the DPA.
The equation minimizes the total time needed for the attack, which consists of the time needed for the measurement process and the time needed for the DPA.
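A sketch of this optimization in Python (our own code; the timing constants are placeholders that have to be measured for the concrete setup):

from math import ceil

def total_time(lam, N=64, t_measure=1.0, t_hypo=1e-6, n_measure=500):
    # Eq. (11): measurement time plus DPA hypothesis-computation time
    return (t_measure * ceil(64 / lam) * n_measure
            + t_hypo * (2 ** lam) * ceil(N / lam) * n_measure)

best_lambda = min(range(1, 17), key=total_time)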
MDASCA: An Enhanced Algebraic
Side-Channel Attack for Error Tolerance
and New Leakage Model Exploitation

Xinjie Zhao1 , Fan Zhang2 , Shize Guo3 ,


Tao Wang1 , Zhijie Shi2 , Huiying Liu1 , and Keke Ji1
1
Ordnance Engineering College, Shijiazhuang, Hebei, China
zhaoxinjieem@163.com
2
University of Connecticut, Storrs, Connecticut, USA
fan.zhang@engineer.uconn.edu, zshi@engr.uconn.edu
3
The Institute of North Electronic Equipment, Beijing, China

Abstract. Algebraic side-channel attack (ASCA) is a powerful cryptanalysis technique different from conventional side-channel attacks. This paper studies ASCA from three aspects: enhancement, analysis and application. To enhance ASCA, we propose a generic method, called Multiple Deductions-based ASCA (MDASCA), to cope with the multiple deductions caused by inaccurate measurements or interferences. For the
first time, we show that ASCA can exploit cache leakage models. We
analyze the attacks and estimate the minimal amount of leakages re-
quired for a successful ASCA on AES under different leakage models. In
addition, we apply MDASCA to attack AES on an 8-bit microcontroller
under Hamming weight leakage model, on two typical microprocessors
under access driven cache leakage model, and on a 32-bit ARM micro-
processor under trace driven cache leakage model. Many better results
are achieved compared to the previous work. The results are also consis-
tent with the theoretical analysis. Our work shows that MDASCA poses
great threats with its excellence in error tolerance and new leakage model
exploitation.

Keywords: Algebraic side-channel attack, Multiple deductions, Ham-


ming weight leakage, Cache leakage, AES.

1 Introduction

How to improve the efficiency and feasibility of side-channel attacks (SCAs) has
been widely studied in recent years. The objective is to fully utilize the leakage
information and reduce the number of measurements. This can be achieved from
two directions. One is to find new distinguishers for key recovery [3, 8, 34]. The

This work was supported in part by the National Natural Science Foundation of
China under the grants 60772082 and 61173191, and US National Science Foundation
under the grant CNS-0644188.


other is to combine SCA with mathematical techniques, such as differential based [17], linear based [30], collision based [31], cube based [11], and algebraic based [16, 27–29] SCAs. This paper studies algebraic based SCAs.
Algebraic cryptanalysis converts the key recovery into a problem of solving a
boolean equation system. The main idea was proposed by Shannon [32] and first
applied to block ciphers by Courtois et al. in ASIACRYPT 2002 [9]. However, the
complexity of algebraic cryptanalysis increases exponentially with the number of
rounds. As a result, it is mainly effective to the reduced-round block ciphers. SCA
can derive additional equations by analyzing the physical leakages (e.g., timing,
power, EM, cache, etc) and help to solve the equation system. The combination
of two techniques leads to Algebraic side-channel attacks (ASCA) [16, 24, 27–29].
In ASCA, the targeted cipher is first represented with a system of alge-
braic equations. The adversary chooses a leakage model and several interme-
diate states, according to his measurement capability and attack strategy. After
that, the physical leakages are measured and used to deduce the output val-
ues of the leakage function (model) for these targeted states. Then additional
equations representing these values are derived and added into the equation sys-
tem. Finally, the equation system is solved with solvers, such as Gröbner basis-based [12] or SAT-based [33] ones, to recover the key bits.

1.1 Related Work


In the original ASCA [27, 28], the key recovery was converted to a Boolean sat-
isfiability (SAT) problem and the zChaff solver was used. It has been shown that
ASCA can exploit the Hamming weight (HW) leakages in all rounds and recover
the key with a single trace even when both the plaintexts and the ciphertexts are
unknown [27, 28]. Further research on ASCA has been focused on three aspects.
The first is to leverage error tolerant ASCA to improve its practicability. The
work in [27, 28] was mainly based on an error-free assumption. Although it was
mentioned that a pair of HWs can be used to build the algebraic equations
even if one of them is incorrect, the error must be small. The details of the
method and the error rates in practice were not discussed. In CHES 2010, an
error tolerant ASCA (TASCA) [24] was proposed based on a pseudo-Boolean
optimization (PBOPT) problem. The SCIP solver [5] was used to handle the
wrong HW deductions. TASCA works on Keeloq when the error rate is less than
20% but fails on AES. Designing an error tolerant ASCA on AES is a challenge.
The second is to analyze the dependencies of ASCA. In ACNS 2010, it was
shown in [29] that the success rate of ASCA depends on the representations,
leakages and ciphers. However, it is difficult to predict the success rate because
of numerous parameters. At least 252 consecutive HW leakages are required in
ASCA on AES in [29], but the reason was not given. In COSADE 2011, the work
in [16] showed that the success rate highly depends on the algebraic immunity
and the distribution of leakage information. Still, it remains as an open problem
to estimate the number of leakages required for a specific cipher.
The third is to exploit new leakage models in ASCA. Previous work has stud-
ied the combination of algebraic cryptanalysis with fault attacks, e.g, attacks on

DES in eSmart 2010 [10] and Trivium in COSADE 2011 [21]. The data com-
plexity required in the attacks [10, 21] can be further reduced. As addressed
in [27, 28], ASCA is a generic framework and can be applied to more leakage
models. However the diversity, error, and complexity of leakage models make it
difficult to adopt new models in ASCA.

1.2 Our Work


In this paper, we study ASCA in three aspects.
Enhancement. We initiate our work by addressing the error tolerance in ASCA.
We observe that not only the errors but also the leakage models may cause multi-
ple results when inferring the output value of the leakage function for a targeted
state. The ability of handling such multiple values for a state is critical to im-
proving the error tolerance and extending the applications of ASCA. In Section
2, we introduce an enhanced ASCA technique named as Multiple Deductions-
based ASCA (MDASCA) and propose a generic method to represent multiple
values.
Analysis. In Section 3, we analyze the possible application scenarios of MDASCA.
For the first time, we can exploit cache leakage models, which are widely studied
in recent years [1, 2, 4, 6, 7, 13–15, 20, 22, 25]. More specifically, we use MDASCA
to exploit access driven [2, 22, 25] and trace driven [1, 6, 7, 13–15, 20] cache leakage
models. In Section 4, we take another approach to evaluate ASCA different from
[16, 29]. We estimate the minimal amount of leakages required for ASCA on AES.
Many new and better results are given based on theoretical analysis, and later
confirmed by experiments. For example, in HW-based ASCA on AES (standard
NIST implementation in [23]), we show that: under known plaintext/ciphertext
scenario, only one round of HW leakages is required instead of three rounds in
[28]; under unknown plaintext/ciphertext scenario, only two rounds of HW leak-
ages are required instead of three rounds in [28].
Application. To demonstrate the excellent error tolerance and new leakage
model exploiting ability of MDASCA, in Section 5, we conduct a series of physical
experiments on AES under different leakage models. Under the HW leakage
model, AES implemented on an 8-bit microcontroller can be broken even when
the HW deduction has 80% errors with a single power trace or 100% errors with
two traces, which is better than previous results in CHES 2009 [28] and CHES
2010 [24]. Under the access driven cache leakage model, AES implemented on
two typical microprocessors can be broken with only 1 and 36 cache traces,
respectively, compared with 100 in IEEE S&P 2011 [2] and 300 in CT-RSA
2006 [25]. Under the trace driven cache leakage model, AES implemented on a
32-bit ARM microprocessor can be broken with only 5 cache traces instead of
30 in COSADE 2011 [15]. Moreover, all the experimental results of MDASCA
are consistent with our theoretical analysis.
We describe the impacts of MDASCA in Section 6 and conclude this paper in Section 7.

2 MDASCA: Multiple Deductions-Based ASCA


2.1 Notations
To make our discussions concise and consistent, we first clarify some notations.
Deduction. In SCA, the output value of the leakage function for the targeted
state obtained from side-channel leakages is called deduction, denoted as d. The
specific meaning of the deduction highly depends on the leakage model.
Multiple Deductions. Due to the inaccurate measurements or the interfer-
ences from other components in the cryptosystem, the deduction from SCA is
not always equal to the correct value. Instead, multiple values are obtained dur-
ing the process, which are also referred to as multiple deductions.
Deduction Set. Multiple deductions are placed in a set, which is referred to
as deduction set, denoted as D. Note that in the attack, the adversaries may
also exploit the complement of the deduction set, denoted as D̄, which includes
the impossible values. The size of D (D̄) is denoted as Sp (Sn ) and is very
important to the efficiency of ASCA. The elements in D (D̄) are denoted as di
(d¯i ), 1 ≤ i ≤ Sp (Sn ). We assume that the correct deduction d is always in D
and not in D̄ throughout this paper.
Deduction Offset. The distance between the deductions di and the correct one
d is referred to as deduction offset, denoted as oi , oi = di − d. The value of oi is
very important when choosing the solving strategies (solvers) in ASCA.
Error Rate. In ASCA, the number of the targeted states where deductions are made is denoted as $N_T$. As to the possible deduction set D, the number of the targeted states where the deductions are wrong is denoted as $N_E$. We define the error rate e as $e = \frac{N_E}{N_T}$.

2.2 MDASCA
Existing ASCAs [27, 28] add only a few equations to the algebraic system, as-
suming the deduction from leakages is single and correct. As a result, they are
sensitive to errors and likely to fail in practical attacks. In this section, we pro-
pose an enhanced ASCA technique, named Multiple Deductions-based ASCA
(MDASCA), in which a deduction set of multiple values is created and con-
verted into a constraint equation set. As long as the deductions are enumerable,
the whole equation system can be solved by a SAT solver in a reasonable amount
of time. Next, we describe the core of MDASCA, the representation of multiple
deductions with algebraic equations.
Suppose a targeted state X can be represented with m one-bit variables $x^j$, j = 1..m. $\varphi(X)$ denotes the output value of the leakage function for X. If the correct deduction d can be deduced accurately, d can be calculated as in Eq. (1):

$$d = \varphi(X), \quad X = x^1 x^2 \ldots x^m \tag{1}$$
Representing multiple deductions can be divided into the following two steps.

1. Building equations for the deduction set D or D̄. Each $d_i \in D$, $1 \le i \le S_p$, is a "possible deduction" on X. New variables $B_i$ are introduced to represent X's value that generates $d_i$. Each $B_i$ is represented as m one-bit variables $b_i^j$. New equations can be built as shown in Eq. (2):

$$B_i = b_i^1 b_i^2 \ldots b_i^m, \quad d_i = \varphi(B_i), \quad 1 \le i \le S_p \tag{2}$$

Also, each $\bar{d}_i \in \bar{D}$ is an "impossible deduction". Similar to Eq. (2), new variables $\bar{B}_i$ and $\bar{b}_i^j$ are introduced. New equations can be built as in Eq. (3):

$$\bar{B}_i = \bar{b}_i^1 \bar{b}_i^2 \ldots \bar{b}_i^m, \quad \bar{d}_i = \varphi(\bar{B}_i), \quad 1 \le i \le S_n \tag{3}$$

Which set to use (D, D̄, or both) is highly dependent on the leakage model and
the adversaries’ ability. Typically, if Sp < Sn , D is used because it leads to a less
complicated equation system. Otherwise, D̄ is preferred.
2. Building equations for the relationship between d and D (or D̄). Note that if $B_i$ is equal to X, then $d_i = d$. $m \times S_p$ one-bit variables $e_i^j$ are introduced to represent whether $b_i^j$ is equal to $x^j$: $e_i^j = 1$ if $b_i^j = x^j$; otherwise $e_i^j = 0$. $S_p$ one-bit variables $c_i$ are introduced to represent whether $d_i$ is correct or not: $c_i = 1$ if $d_i = d$; otherwise $c_i = 0$. $c_i$ can be represented by Eq. (4), where ¬ denotes the NOT operation:

$$e_i^j = \neg(x^j \oplus b_i^j), \quad c_i = \bigwedge_{j=1}^{m} e_i^j \tag{4}$$

Since only one element in D is equal to d, only one $c_i$ is 1. This can be represented as:

$$c_1 \vee c_2 \vee \ldots \vee c_{S_p} = 1, \quad \neg c_i \vee \neg c_j = 1, \quad 1 \le i < j \le S_p \tag{5}$$

As for the impossible deductions, none of the elements in D̄ is the correct deduction d. This can be represented by Eq. (6):

$$e_i^j = \neg(x^j \oplus \bar{b}_i^j), \quad c_i = \bigwedge_{j=1}^{m} e_i^j = 0 \tag{6}$$

Let $n_{v,\varphi}$ and $n_{e,\varphi}$ denote the number of newly introduced variables and ANF equations needed to represent one deduction $d_i$ ($\bar{d}_i$); $n_{v,\varphi}$ and $n_{e,\varphi}$ depend on $\varphi$. According to Equations (2), (4), (5), $(1 + 2m + n_{v,\varphi})S_p$ variables and $1 + (1 + m + n_{e,\varphi})S_p + \binom{S_p}{2}$ ANF equations are introduced to represent D. According to Equations (3), (6), $(1 + 2m + n_{v,\varphi})S_n$ variables and $(1 + m + n_{e,\varphi})S_n$ ANF equations are introduced to represent D̄.
The new constraint equations mentioned above are quite simple. They can
be easily fed into the SAT solver [33] to accelerate the key search. To launch
MDASCA, it is important to choose φ and determine the deduction set D/D̄
under different models, which is addressed in Section 3.
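As an illustration, a small Python sketch (our own encoding, not the authors' equation generator) that emits Eq. (4) and Eq. (5) as CNF clauses over DIMACS-style variable numbers; it encodes the direction "c_i implies B_i = X", which together with the exactly-one constraint on the c_i pins down the correct deduction:

from itertools import combinations

def deduction_clauses(x, b, c):
    # x: bit variables of X; b[i]: bit variables of B_i; c[i]: selector c_i
    clauses = []
    for ci, bi in zip(c, b):
        for xj, bij in zip(x, bi):
            clauses.append([-ci, -xj, bij])  # c_i and  x^j  imply  b_i^j
            clauses.append([-ci, xj, -bij])  # c_i and !x^j  imply !b_i^j
    clauses.append(list(c))                  # Eq. (5): at least one c_i
    for ci, cj in combinations(c, 2):
        clauses.append([-ci, -cj])           # Eq. (5): at most one c_i
    return clauses

For m = 4 and $S_p$ = 3 one would pass, e.g., x = [1, 2, 3, 4], b = [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]] and c = [17, 18, 19].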

3 Analysis of Leakage Models in MDASCA

3.1 Hamming Weight Leakage Model with Errors

MDASCA can improve the error tolerance of ASCAs based on HW leakage


model (HWLM). In such attacks, the adversaries try to deduce the HW of the
targeted state X from measurements. In practice, due to noise, the adversaries
may get wrong values of HW(X) that are close to the correct one. For those implementations on some devices such as microcontrollers, the deduction offset for HW leakage is small and approximately ±1 away from HW(X), as also addressed in [24]. Therefore, the possible HW deduction set D can be written as:

$$D = \{HW(X) - 1, HW(X), HW(X) + 1\} \tag{7}$$

MDASCA can handle the deduction offset easily by setting $\varphi = HW(\cdot)$ and $d = HW(X)$. For example, if d = 3, then D = {2, 3, 4}.

3.2 Cache Leakage Models

Cache in the microprocessors can leak the secret information about the indexes
of table(S-Box) lookups and compromise the cryptosystems. Numerous cache
attacks on AES have been published. There are three leakage models in cache
attacks: time driven (TILM) [4], access driven (ACLM) [2, 22, 25], and trace
driven (TRLM) [1, 6, 7, 13–15, 20]. Under TILM, only the overall execution
time is collected and it is difficult to deduce the internal states from few traces.
Under ACLM and TRLM, adversaries can measure the cache-collisions and infer
internal states with a single cache trace. Now we discuss how to use these two
models in MDASCA.
Suppose a table has $2^m$ bytes and a cache line has $2^n$ bytes (m > n). The whole table will fill $2^{m-n}$ cache lines. A cipher process V performs k table lookups, denoted as $l_1, l_2, \ldots, l_k$. For each lookup $l_i$, the corresponding table index is $X_i$. Assume $l_t$ is the targeted table lookup.
1. Access driven leakage model. Under ACLM [2, 22, 25], the cache lines
accessed by V can be profiled by a malicious process S and used to deduce lt . S
first fills the cache with its own data before V performs the lookups, and accesses
the same data after the lookups are done. S can tell whether a datum is in cache
or not by measuring its access time. A shorter access time indicates a cache hit.
A longer access time is a cache miss, implying that V already accessed the same
cache line. If S knows which cache line $l_t$ accessed, he knows the higher m − n bits of $X_t$. Let $\langle X \rangle$ denote the function that extracts the higher m − n bits of X. Then the correct deduction for $l_t$ is $\langle X_t \rangle$.
In practice, S observes many cache misses from two sources. Some are from
the k − 1 lookups other than lt . Some are from interfering processes that run
in parallel with V. Assume the interfering processes have accessed g different
cache lines, which can be considered as g more "lookups" at $X_{k+1}, \ldots, X_{k+g}$. All the possible values of $\langle X_t \rangle$ form a collection L. Without loss of generality, we assume the first $S_p$ values of L are distinct and $S_p \le k + g$. The possible deduction set D can be written as:

$$D = \{d_1, \ldots, d_{S_p}\}, \quad d_i = \langle X_i \rangle, \quad 0 \le d_i < 2^{m-n} \tag{8}$$

Note that the impossible deduction set D̄ can also be obtained, with $S_n = 2^{m-n} - S_p$. So ACLM can be easily interpreted with multiple deductions by setting $\varphi = \langle \cdot \rangle$, $d = \langle X_t \rangle$ and $d_i = \langle X_i \rangle$. The values of the elements in D or D̄ are known to the adversary after the deductions.
2. Trace driven leakage model. Under TRLM [1, 6, 7, 13–15, 20], S can keep track of the cache hit/miss sequence of all lookups of V to the same table via power or EM probes. Suppose there are r misses before $l_t$. Let $S_M(X)$ be the set of lookup indexes corresponding to the r misses, $S_M(X) = \{\langle X_{t_1}\rangle, \langle X_{t_2}\rangle, \ldots, \langle X_{t_r}\rangle\}$. If $l_t$ is a cache hit, the data that $X_t$ tries to access have been loaded into the cache by previous lookups. The possible deduction set D for $\langle X_t \rangle$ can be written as in Eq. (9), where $S_p = r$:

$$D = S_M(X) = \{\langle X_{t_1}\rangle, \langle X_{t_2}\rangle, \ldots, \langle X_{t_r}\rangle\} \tag{9}$$

If a cache miss happens at $l_t$, the impossible deduction set D̄ for $\langle X_t \rangle$ can be written as in Eq. (10), where $S_n = r$:

$$\bar{D} = S_M(X) = \{\langle X_{t_1}\rangle, \langle X_{t_2}\rangle, \ldots, \langle X_{t_r}\rangle\} \tag{10}$$

So TRLM can also be interpreted under MDASCA by setting $\varphi = \langle \cdot \rangle$, $d = \langle X_t \rangle$ and $d_i = \langle X_{t_i} \rangle$. Different from ACLM, the elements in D or D̄ are among the set of the higher m − n bits of the table lookup indexes that caused cache misses. The exact value is unknown to the adversary even after the deductions.
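The following Python sketch (our own hypothetical helper) shows how the deduction sets of Eq. (9) and Eq. (10) fall out of a recorded hit/miss sequence; the lookup indexes are kept symbolic, mirroring the fact that their values stay unknown:

def trlm_deduction_sets(events):
    # events: 'M' (miss) / 'H' (hit) for the lookups l1, l2, ... to one table
    sets, misses = [], []
    for t, ev in enumerate(events):
        if ev == 'H':
            sets.append(('D', list(misses)))     # <X_t> equals one of these
        else:
            sets.append(('Dbar', list(misses)))  # <X_t> differs from all of these
            misses.append(t)                     # this lookup loaded a new line
    return sets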

4 Evaluation of MDASCA on AES


It is desirable to know how many leakages or traces are required in MDASCA. We
take AES-128 as an example to evaluate it under HWLM, ACLM and TRLM.
Suppose the master key is denoted as K, and the plaintext/ciphertext as P/C. We use ξ(x) to denote the key search space for a key-dependent variable x. $A_i$, $B_i$, $C_i$, $D_i$ stand for the states after the AddRoundKey ($AK_i$), SubBytes ($SB_i$), ShiftRows ($SR_i$) and MixColumns ($MC_i$) in the i-th round.

4.1 HWLM Based MDASCA


Under both known P/C and unknown P/C scenarios, the attacks in [28, 29]
require about 252 known HW leakages in three consecutive rounds to recover K.
They assume there are 84 HW leakages in each round (16 in AK, 16 in SB, and
52 in M C) and the deduction offset is 0. However, the number of HW leakages
that required can be less.
Consider a byte x: if its HW is leaked, we can calculate that ξ(x) = 50.27 = $2^{5.65}$. It is equivalent to say that ξ(x) is reduced by about $2^{8-5.65} = 2^{2.35}$. If both $HW(x)$

and $HW(S[x])$ are leaked in SubBytes, ξ(x) = 10.69 = $2^{3.42}$. These numbers are calculated by an algorithm in Appendix 1.
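The single-byte figure is easy to reproduce (our own Python snippet; the joint HW(x), HW(S[x]) figure of 10.69 additionally requires the AES S-box table):

from math import comb

# A HW value h occurs with probability C(8,h)/256 and leaves C(8,h)
# candidates, so E[remaining candidates] = sum_h C(8,h)^2 / 256 = C(16,8)/256.
xi = sum(comb(8, h) ** 2 for h in range(9)) / 256
print(xi)  # 50.2734375 ~= 2^5.65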
Taking the same assumptions as in [28, 29], we evaluate MDASCA on AES under the unknown P/C scenario. With 16 bytes leaked in $AK_1$ and $SB_1$, $\xi(B_1) = 2^{3.42 \times 16} = 2^{54.72}$. In $MC_1$, four bytes in a column can be calculated with 13 steps [28, 29]. Simply considering the last four steps, which write the output to $D_1$, we estimate that each column can reduce $\xi(D_1)$ by about $2^{2.35 \times 4}$. Taking all four columns into account, $\xi(D_1)$ should be reduced to approximately $2^{54.72 - 2.35 \times 4 \times 4} \approx 2^{17.12}$. $\xi(D_1)$ can be further reduced to 1 if the other nine leakages can be utilized. We believe 84 HW leakages can obtain $A_1$, $B_1$, $D_1$, which also applies to $A_2$, $B_2$, $D_2$.
We verify this with a C program and later with experiments in Section 5.2.
Two rounds of leakages can obtain the second round key by XORing D1 and
A2 . Then the master key K can be recovered via solving the equations on the
key schedule of AES. Note that the known P/C scenario is just a special case
of above. One round of leakages should be enough to recover the first round key
K since both A1 and P are known.

4.2 ACLM Based MDASCA

ACLM based cache attacks on AES have been widely studied [2, 22, 25]. This
paper takes the AES implementations in OpenSSL 1.0.0d as the target, where n = 6 and m = 11, 10, 8 for tables of size 2KB, 1KB and 256 bytes, respectively. Accordingly, the leakages (i.e., $\ell = m - n$) are the higher 5, 4 and 2 bits of the table lookup index, respectively. This is equivalent to saying that one leakage reduces ξ(K) by about $2^\ell$. We consider two typical ACLM models.
1. Bangerter model. Bangerter et al. [2] launched attacks on AES in OpenSSL
1.0.0d with one 2KB table and 100 encryptions, assuming S can profile the ac-
cessed cache lines per lookup. In the attack, the CFS Scheduler of Linux and
Hyper thread techniques are required. We refer to their work as Bangerter model.
In the attack, there are 16 leakages of table lookups in each round. After the first round analysis, ξ(K) can be reduced to $2^{(8-\ell) \times 16}$. After the second round analysis, ξ(K) can be approximately reduced to $2^{(8-2\ell) \times 16}$. For $\ell = 4$ or 5, two rounds of leakages from one cache trace are enough to recover K. For $\ell = 2$, after the first round analysis, ξ(K) can be reduced to $2^{6 \times 16}$. After the second round analysis, three cache traces are enough to reduce ξ(K) to $2^{(6 - 2 \times 3) \times 16} = 1$.
2. Osvik model. Osvik et al. [25] and Neve et al. [22] conducted ACLM based
cache attacks on AES in OpenSSL 0.9.8a, assuming S can profile the accessed
cache lines per encryption. We refer to their work as Osvik model.
In OpenSSL0.9.8a, four 1KB tables (T0 , T1 , T2 , T3 , m=10) are used in the first
nine rounds, and a table T4 is used in the last round. The attacks in [25] can
succeed with 300 samples by utilizing the 32 table lookups in the first two rounds
of AES. The attack in [22] can break AES with 14 samples by utilizing the 16 T4
table lookups in the final round of AES. AES in OpenSSL 1.0.0d removes $T_4$ and uses $T_0, T_1, T_2, T_3$ throughout the encryption. As a result, it is secure against [22] and increases the difficulty of [25].

We implemented AES in OpenSSL 1.0.0d with m = 10, n = 6 and $\ell = 4$. In total, there are 40 cache accesses to the same table in one encryption. Under the Osvik model [25], $16 \times \left(\frac{15}{16}\right)^{40} \approx 1.211$ cache lines will not be accessed by V, which means that on average there are 1.211 elements in the impossible deduction set for the table lookup indexes. Theoretically speaking, around $\frac{15}{1.211} \approx 12.386$ traces with one round of leakages can recover the high 4 bits of the table lookup indexes, which reduces ξ(K) to $2^{64}$. Utilizing the 16 table lookups in the second round, ξ(K) can be further reduced to 1. In practice, the number of traces required increases a little bit, as shown in Section 5.3.

4.3 TRLM Based MDASCA

The first TRLM based cache attack on AES was proposed by Bertoni et al. [6] and later improved in [13] through real power analysis. More simulation results with a 1KB table were presented in [1, 6, 7, 20]. Recently, several real-world attacks on AES with a 256B table on a 32-bit ARM microprocessor were proposed in [14, 15]. In [14], the cache events are detected with a power probe and only the first 18 lookups are analyzed; 30 traces reduce ξ(K) to $2^{30}$. In [15], the attacks are done with an EM probe and ξ(K) is further reduced to 10 when two more lookups are considered.
Let $p_i$ and $k_i$ be the i-th byte of the plaintext and the master key. The i-th lookup index is $p_i \oplus k_i$. In TRLM, the XOR between two different lookup indexes ($p_i \oplus k_i$ and $p_j \oplus k_j$) can be leaked. From the 16 leakages in the first round, $\langle p_i \oplus k_i \oplus p_j \oplus k_j \rangle$ can be recovered, $1 \le i < j \le 16$. ξ(K) can be reduced to $2^{128 - 15\ell}$ and further reduced to 1 by analyzing leakages in the later rounds.

Fig. 1. Estimations in TRLM based MDASCA on AES: (a) $\#_\ell$ versus ℓ; (b) $N_\ell$ versus ℓ

To calculate $N_\ell$, the number of cache traces required for a successful attack, we need to know $\#_\ell$, the expected number of cache lines that are updated after the first ℓ lookups. Let $N_c = 2^{m-n}$ be the number of cache lines that one table can fill. The probability for one cache line not to be updated after ℓ table lookups is $\left(\frac{N_c - 1}{N_c}\right)^\ell$, and $\#_\ell = N_c \left(1 - \left(\frac{N_c - 1}{N_c}\right)^\ell\right)$. Let $y_\ell$ be the ℓ-th lookup index, and $\rho_\ell$ be the reduced percentage of $\xi(y_\ell)$ due to the ℓ-th lookup. Then $\rho_\ell = \left(\frac{\#_\ell}{N_c}\right)^2 + \left(1 - \frac{\#_\ell}{N_c}\right)^2$, as also shown in [1]. Letting $N_c \times (\rho_\ell)^{N_\ell} \le 1$, we get $N_\ell \approx -\log_{\rho_\ell} N_c$.
In [14, 15], $N_c = 16$. Fig. 1(a) shows how $\#_\ell$ changes with ℓ. It is clear to see that even after 48 table lookups, $\#_\ell < 16$ and $\rho_\ell < 1$, which means there are still some cache misses that can be used for deductions. Fig. 1(b) shows how $N_\ell$ changes with ℓ, where the minimum of $N_\ell$ is 4 and the maximum is 22.24. If $N_\ell$ is 5 or 6, ξ(K) can be reduced to $2^{76.10}$ or $2^{74.13}$, respectively.
Using the leakages in the second round, if $N_\ell$ is 5 or 6, ξ(K) can be further reduced to approximately $2^{76.10 - 48.80} = 2^{27.30}$ or $2^{74.13 - 54.78} = 2^{19.35}$. After the third round, ξ(K) can be reduced to 1 with a high probability. So approximately 5 or 6 cache traces are enough to launch a successful TRLM based attack on AES. As it is really hard to analyze the third round leakages manually, we will verify this through the MDASCA experiments, as shown in Section 5.4.
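A short Python sketch evaluating these formulas (our own code, with $N_c = 16$ as in [14, 15]):

import math

def lines_updated(l, Nc=16):
    return Nc * (1 - ((Nc - 1) / Nc) ** l)   # expected #_l after l lookups

def traces_needed(l, Nc=16):
    p = lines_updated(l, Nc) / Nc
    rho = p * p + (1 - p) ** 2               # rho_l
    return -math.log(Nc) / math.log(rho)     # N_l ~ -log_rho(Nc)

for l in (1, 8, 16, 48):
    print(l, round(lines_updated(l), 2), round(traces_needed(l), 2))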

5 Application of MDASCA on AES


5.1 Experiment Setup
We conduct attacks under HWLM, ACLM and TRLM. Table 1 presents the
experiment setups.

Table 1. Experiment setup of MDASCA on AES

Leakage Model | Targeted platform | AES Implementation | Note
HWLM | 8-bit microcontroller ATMEGA324P | AES with compact table |
ACLM | Intel Pentium 4 processor¹, Fedora 8 Linux | AES in OpenSSL 1.0.0d | Bangerter model [2]
ACLM | Athlon 64 3000+ processor², Windows XP SP2 | AES in OpenSSL 1.0.0d | Osvik model [25]
TRLM | 32-bit ARM microprocessor NXP LPC2124 | AES with compact table |

We adopt the technique in [19] to build the equation set for AES. Each S-Box
can be represented by 254 ANF equations with 262 variables. The full AES-128
including both encryption and key scheduling can be described by 58288 ANF
equations with 61104 variables. MDASCA does not require the full round of
AES, as analyzed in Section 4. We choose the CryptoMiniSat 2.9.0 solver [33]
to solve the equations. The solver is running on an AMD Athlon 64 Dual core
3600+ processor clocked at 2.0GHz. We consider that a trial of MDASCA fails
when no solution is found within 3600 seconds.

5.2 Case Study 1: HWLM Based MDASCA on AES


As shown in Fig. 2(a), the instant power of AES-128 running on ATMEGA324P
is highly correlated to the HW of the data. Similar to [24], we deduce the HW
for 84 intermediate states in each round. According to Section 4.1, our attacks
only require a few rounds of HW leakages. We calculate the Pearson correlation
factor [8] when deducing the HWs from power traces.
¹ L1 cache setting: 16KB cache size, 8-way associative, 64B cache line size.
² L1 cache setting: 64KB cache size, 2-way associative, 64B cache line size.

Fig. 2. HW deductions in MDASCA on AES

Fig. 2(b) shows the AddRoundKey of the first round in a single power trace where the offset is one. In this example, following Section 2.1, the error rate e is $\frac{9}{16} = 56.25\%$. The deduction set for the 8-th byte is D = {2, 3, 4}, $S_p = 3$. To represent each HW deduction in D, 99 new variables (not including the 8 variables in Eq. (2)) and 103 equations are required (see Appendix 2). So, $n_{v,\varphi} = 99$, $n_{e,\varphi} = 103$. According to Section 2.2, we use 348 new variables and 340 ANF equations to represent D.
As in [28], we considered several attack scenarios: known or unknown P/C,
consecutive or random distributions of correct HW deduction. Since the PBOPT
solver used in [24] fails on AES even when there is no error, we only compare
our results with [28]. We repeat the experiment for each scenario 100 times and
compute the average time. Our results indicate that, although the SAT solver has
some smart heuristics, most of the trials (more than 99%) succeed in reasonable
time with small variation.
Table 2 lists how many rounds are required for different scenarios. With one
power trace, when leakages are consecutive, and if P/C is known, only one round
is required instead of 3 in [28]; if P/C is unknown, 2 rounds are required instead
of 3 in [28]. The results are consistent with the analysis in Section 4.1.

Table 2. Comparisons of HWLM based MDASCA on AES with previous work

Scenarios | error type | leakage type | [28] | MDASCA
known P/C | error free | consecutive | 3 rounds | 1 round (10 seconds)
known P/C | error free | random | 8 rounds | 5 rounds (120 seconds)
unknown P/C | error free | consecutive | 3 rounds | 2 rounds (10 seconds)
unknown P/C | error free | random | 8 rounds | 6 rounds (100 seconds)
known P/C | 80% error rate | consecutive | - | 3 rounds (600 seconds)
known P/C | 100% error rate | consecutive | - | 2 rounds (120 seconds, 2 power traces)
known P/C | 100% error rate | consecutive | - | 1 round (120 seconds, 3 power traces)

Under HWLM, the average HW deduction error rate of a single power trace is about 75%, as also indicated in [27, 28]. MDASCA can succeed even with an 80% error rate by analyzing 3 consecutive rounds in a single trace within 10 minutes. Even when the error rate is 100% (the number of all the HW deductions is 3), AES can still be broken by analyzing two consecutive rounds of two power traces within 2 minutes, or one round of three traces within 2 minutes. From the above, we can see that MDASCA has excellent error tolerance and can significantly increase the robustness and practicability of ASCA.
Note that MDASCA can also exploit a larger number of HW deductions, e.g., 4. If the HW leakages are not enough for the solver to find the single and correct solution, a full AES encryption of an additional P/C can be added into the original equation set to verify the correctness of all the possible solutions. The time complexity might be a bit higher without increasing the data complexity.

5.3 Case Study 2: ACLM Based MDASCA on AES


We conduct ACLM based MDASCA on AES under both Bangerter [2] and
Osvik [25] model. The comparisons of MDASCA with previous work are listed
in Table 3. The results are consistent with the analysis in Section 4.2.
Under the Bangerter model, we apply MDASCA to three AES implementations in OpenSSL 1.0.0d with a 2KB, 1KB or 256B table. Fig. 3(a) shows the cache events of 16 lookups in the first round with four 1KB tables. In our experience, there are 1-4 cache misses during each lookup due to the noise from other system processes. Take the third column of Fig. 3(a) as an example: we have 3 possible deductions on $\langle X_3 \rangle$, and D = {4, 11, 13}, $S_p = 3$. To represent each deduction in D, no new variables are introduced and 4 assignment equations are required, so $n_{v,\varphi} = 0$, $n_{e,\varphi} = 4$. In total, 31 ANF equations with 27 additional variables are introduced.

Table 3. Comparisons of ACLM based MDASCA with previous work

Attacks | AES implementation | Leakage model | Scenarios | samples | time
[2] | 2KB table | Bangerter | known P (unknown P/C) | 100 | 3 minutes
MDASCA | 2KB table | Bangerter | known P | 1 | 6 seconds
MDASCA | 2KB table | Bangerter | unknown P/C | 2 | 60 seconds
MDASCA | 1KB table | Bangerter | known P | 1 | 15 seconds
MDASCA | 1KB table | Bangerter | unknown P/C | 2 | 120 seconds
MDASCA | 256B table | Bangerter | known P | 3 | 60 seconds
[25] | 1KB table | Osvik | known P | 300 | 65 milliseconds
MDASCA | 1KB table | Osvik | known P | 36 | 1 hour

As shown in Table 3, only up to three traces are required in MDASCA, in contrast to 100 samples in [2]. In particular, when AES is implemented with a 256B table, the attack in [2] fails. This is because the leakage (the high 2 bits of the lookup index) is small and the number of rounds that can be analyzed is limited. MDASCA can utilize the leakages of all rounds and only three cache traces are required, which is the first successful ACLM based cache attack on AES with a compact table.

Fig. 3. Profiled ACLM based leakages of V by S: (a) cache events sampled after each table lookup; (b) cache events sampled after one encryption

Under the Osvik model, we apply MDASCA to the AES implementation in OpenSSL 1.0.0d with four 1KB tables (such an implementation defends well against the attack in [22]). Fig. 3(b) shows the events of 16 cache lines related to $T_0$ in 10 encryptions on a real system. About 13-16 cache lines (colored in cyan) have misses, which means that there are about 0-3 impossible deductions for the high 4 bits of every table lookup index $X_i$. Take the first column as an example: there are 3 impossible deductions for $\langle X_2 \rangle$, $\bar{D} = \{9, 10, 15\}$, $S_n = 3$. Therefore, 27 ANF equations with 27 variables are introduced.
From Table 3, 36 traces can recover the full key, compared to the 300 traces in [25] (in fact, to attack the same AES implementation in OpenSSL 0.9.8a as in [25], only 30 traces are required by MDASCA). In the attack, we first tried to directly use the equations generated by the ACLM based leakages. Experimental results show that the whole system cannot be solved even within one day. Then, to accelerate the equation solving process, we feed the candidate values of several key bits into the equation system besides the equations generated from the leakages. For example, four key bits need 16 enumerations. The results show that if the 4 input key bits are correct, the system can be solved in 40 minutes. Otherwise, the solver outputs "unsatisfiable" within 10-20 seconds. We repeated the tests about 100 times. On average one hour is enough to recover the AES-128 key.

5.4 Case Study 3: TRLM Based MDASCA on AES

In TRLM based MDASCA, we implement AES on a 32-bit ARM microprocessor NXP LPC2124 with a direct mapped cache, and profile the cache events via EM probes, as in [15]. The cache line size is 16 bytes. The table size is 256 bytes and fills up 16 cache lines ($\ell = 4$, $N_c = 16$). According to the analysis in Section 4.3, the cache events of the first three rounds are utilized in the attack. In an EM trace, a cache miss has a distinct peak. Thus cache hit/miss events can easily be distinguished.

Fig. 4. Profiled cache trace in TRLM based MDASCA: (a) cache events in five encryptions; (b) deduction set size

Fig. 4(a) shows the cache events of the first 48 lookups (first 3 rounds) in 5
cache traces. The table lookups in the first round are more likely to cause misses.
The probabilities of cache hit increase in following rounds. However, even after
48 lookups, there is still high probability that full 16 cache lines on the 256B
table have not been updated yet, consistent with the analysis in Section 4.3.
Let the table lookup index be yi (1 ≤ i ≤ 48). Fig. 4(b) shows the number of
deductions for 48 table lookup indexes of the 5 cache traces. This number is
increased with the table lookup number, and the range is 0-15 for these 5 traces.
Take the 8th and 9th lookups (l8 and l9) of the first sample in Fig. 4(a) as examples. For l8, a cache miss is observed. The impossible deduction set of ⟨y8⟩ (the higher four bits of y8) is then D̄ = {⟨y1⟩, ⟨y2⟩, ⟨y3⟩, ⟨y5⟩, ⟨y6⟩, ⟨y7⟩}, Sn = 6. Note that all the variables of D̄ have already been represented in the AES algebraic equation system, and nv,φ = 0, ne,φ = 0. We only need to compute the newly introduced variables and equations by Eq. (6). According to Section 2.2, 30 ANF equations with 30 variables can be generated. For l9, a cache hit happens. From Section 2.2, the possible deduction set of ⟨y9⟩ is D = {⟨y1⟩, ⟨y2⟩, ⟨y3⟩, ⟨y5⟩, ⟨y6⟩, ⟨y7⟩, ⟨y8⟩}, Sp = 7. As all the variables of D have been represented in the AES system, according to Section 2.2, 57 ANF equations with 35 variables can be added to the equation system.
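A minimal Python sketch of how these deduction sets can be derived from a hit/miss trace, assuming an initially empty, direct-mapped cache in which every miss loads a previously untouched line:

def deduction_sets(events):
    """events: list of 'H'/'M' cache events, one per table lookup."""
    sets = []
    fresh = []                     # lookups that touched a fresh cache line
    for i, ev in enumerate(events, 1):
        if ev == 'M':              # miss: <y_i> differs from all fresh lines
            sets.append(('impossible', list(fresh)))
            fresh.append(i)
        else:                      # hit: <y_i> equals one of the fresh lines
            sets.append(('possible', list(fresh)))
    return sets

# l4 and l9 hit, the rest miss: reproduces Sn = 6 for l8 and Sp = 7 for l9.
print(deduction_sets(['M', 'M', 'M', 'H', 'M', 'M', 'M', 'M', 'H'])[7:9])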
For some table lookups, it is hard to tell whether they cause a cache miss or a hit because the peak is not high enough. In our MDASCA, we treat uncertain cache events as cache hits. In some other scenarios, a partially preloaded cache is also considered and more cache hits are observed. Our MDASCA then utilizes only cache misses and still works in this case.
As in [15], we conduct several TRLM based MDASCAs on AES considering three scenarios: with both cache hit and miss events, with cache miss events only, and with cache miss events and four preloaded cache lines. Each attack in the three cases is repeated 100 times. To accelerate the equation solving procedure, we again input the candidates of 4 key bits into the equation system and launch 16 ASCA instances corresponding to the 16 possible candidates. The comparisons of our results with previous work are listed in Table 4.

Table 4. Comparisons of TRLM based MDASCA on AES with previous work

Attacks  Utilized collisions  Collision type  Preloaded cache lines  Sample size  Key space  Time
[13]     16 lookups           H/M             0                      14.5         2^68       -
[14]     18 lookups           H/M             0                      30           2^30       -
[15]     20 lookups           H/M             0                      30           10         -
MDASCA   48 lookups           H/M             0                      5 (6)        1          1 hour (5 minutes)
[15]     20 lookups           M               0                      61           -          -
MDASCA   48 lookups           M               0                      10           1          1 hour
[15]     20 lookups           M               4                      119          -          -
MDASCA   48 lookups           M               4                      24           1          1 hour

From Table 4, TRLM based MDASCA can exploit the cache behavior of three AES rounds (48 table lookups) and achieves better results than previous work [14, 15]. As few as five cache traces suffice to recover the 128-bit AES key within an hour. The complexity of both the online phase (number of measurements) and the offline phase (recovering the key from the measurements) is reduced. Moreover, the results are consistent with the theoretical analysis in Section 4.3.

6 Impact of MDASCA

The impact of MDASCA can be summarized as follows.


The first impact is on error tolerance. Providing error tolerance increases the robustness and practicability of ASCA. This can be addressed with two approaches. One is to embed the error tolerance into the solver, as in TASCA [28]. The SCIP solver [5] used in [28] requires the errors to be small and continuously distributed around the correct value (e.g., under HWLM), and might not work under ACLM and TRLM, where the error offset is discrete, unpredictable and large. The diversity among leakage models is the major barrier. The other approach is what MDASCA does: the errors are preprocessed and represented with new equations. The overhead of this approach consists of new variables and equations. However, the cryptanalyst can now focus on the leakage utilization and reduce the dependence on the solver. Moreover, our MDASCA results show that the complexity of solving the equations is not prohibitively high, and most instances can be solved within reasonable time with a small variance.
The second impact is on the applicability of attacks. Previous ASCAs [24, 27, 28] work well on some small devices (e.g., microcontrollers), where the power consumption is highly correlated to the HW and easy to measure. How can we adopt ASCA under different scenarios, such as ACLM and TRLM, where those advantages do not exist? For the first time, MDASCA extends the application of ASCA to more complicated models. Considering the microprocessors widely used in common PCs and embedded devices, it is difficult to launch HWLM based ASCA on them; cache attacks are more practical. Previous attacks [1, 2, 4, 6, 7, 13–15, 20, 22, 25] on AES can only use the leakages in the first two rounds due to the complexity of representing the cache leakages of the targeted states. MDASCA can exploit the cache leakages in more rounds, even in all the rounds. Thus the complexity of the attack and the required measurements are dramatically reduced.

7 Conclusion and Future Work


Due to the existence of noise and the intrinsic features of leakage models, correct deductions in ASCA are often hidden among multiple candidates. This paper proposes an enhanced ASCA attack called Multiple Deductions-based ASCA (MDASCA) to exploit these candidates. A generic method is described to represent multiple deductions with algebraic equations. Several leakage models suitable for MDASCA are analyzed and the details of the leakage exploitation are provided. For the first time, we evaluate the minimal amount of leakage required for MDASCA on AES under these models. To verify the practicality and the theoretical analysis, we have successfully launched real MDASCA attacks under different models and achieved better results.
MDASCA attests again that combining algebraic techniques with SCA is a promising way to fully utilize the leakages. Future work on MDASCA includes solver improvement (trying different solvers for better performance and solving capability), application extension (to different ciphers, leakage models and implementations) and security evaluation (as a benchmark to evaluate the physical security of ciphers).

Acknowledgments. The authors would like to thank François-Xavier Standaert, Yu Yu, Ruilin Li, Siwei Sun, Zheng Gong and the anonymous referees for helpful discussions and comments.

References
1. Acıiçmez, O., Koç, Ç.K.: Trace Driven Cache Attack on AES. In: Rhee, M.S., Lee,
B. (eds.) ICISC 2006. LNCS, vol. 4296, pp. 112–121. Springer, Heidelberg (2006)
2. Bangerter, E., Gullasch, D., Krenn, S.: Cache Games - Bringing Access-Based
Cache Attacks on AES to Practice. In: IEEE S&P 2011, pp. 490–505 (2011)
3. Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F.X., Veyrat-Charvillon,
N.: Mutual Information Analysis: A Comprehensive Study. Journal of Cryptol-
ogy 24, 269–291 (2011)
4. Bernstein, D.J.: Cache-timing attacks on AES (2004),
http://cr.yp.to/papers.html#cachetiming
5. Berthold, T., Heinz, S., Pfetsch, M.E., Winkler, M.: SCIP – solving constraint
integer programs. In: SAT 2009 (2009)
6. Bertoni, G., Zaccaria, V., Breveglieri, L., Monchiero, M., Palermo, G.: AES Power
Attack Based on Induced Cache Miss and Countermeasure. In: ITCC 2005, pp.
586–591. IEEE Computer Society (2005)
7. Bonneau, J.: Robust Final-Round Cache-Trace Attacks Against AES. Cryptology
ePrint Archive (2006), http://eprint.iacr.org/2006/374.pdf
8. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage
Model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp.
16–29. Springer, Heidelberg (2004)
9. Courtois, N., Pieprzyk, J.: Cryptanalysis of Block Ciphers with Overdefined Sys-
tems of Equations. In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp.
267–287. Springer, Heidelberg (2002)

10. Courtois, N., Ware, D., Jackson, K.: Fault-Algebraic Attacks on Inner Rounds of
DES. In: eSmart 2010, pp. 22–24 (September 2010)
11. Dinur, I., Shamir, A.: Side Channel Cube Attacks on Block Ciphers. Cryptology
ePrint Archive (2009), http://eprint.iacr.org/2009/127
12. Faugère, J.-C.: Gröbner Bases. Applications in Cryptology. In: FSE 2007 Invited
Talk (2007), http://fse2007.uni.lu/slides/faugere.pdf
13. Fournier, J., Tunstall, M.: Cache Based Power Analysis Attacks on AES. In:
Batten, L.M., Safavi-Naini, R. (eds.) ACISP 2006. LNCS, vol. 4058, pp. 17–28.
Springer, Heidelberg (2006)
14. Gallais, J., Kizhvatov, I., Tunstall, M.: Improved Trace-Driven Cache-Collision
Attacks against Embedded AES Implementations. In: Chung, Y., Yung, M. (eds.)
WISA 2010. LNCS, vol. 6513, pp. 243–257. Springer, Heidelberg (2011)
15. Gallais, J., Kizhvatov, I.: Error-Tolerance in Trace-Driven Cache Collision Attacks.
In: COSADE 2011, pp. 222–232 (2011)
16. Goyet, C., Faugère, J.-C., Renault, G.: Analysis of the Algebraic Side Channel Attack.
In: COSADE 2011, pp. 141–146 (2011)
17. Handschuh, H., Preneel, B.: Blind Differential Cryptanalysis for Enhanced Power
Attacks. In: Biham, E., Youssef, A.M. (eds.) SAC 2006. LNCS, vol. 4356, pp. 163–
173. Springer, Heidelberg (2007)
18. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
19. Knudsen, L.R., Miolane, C.V.: Counting equations in algebraic attacks on block
ciphers. International Journal of Information Security 9(2), 127–135 (2010)
20. Lauradoux, C.: Collision Attacks on Processors with Cache and Countermeasures.
In: WEWoRC 2005. LNI, vol. 74, pp. 76–85 (2005)
21. Mohamed, M.S.E., Bulygin, S., Buchmann, J.: Improved Differential Fault Analysis of Trivium. In: COSADE 2011, pp. 147–158 (2011)
22. Neve, M., Seifert, J.: Advances on Access-Driven Cache Attacks on AES. In: Bi-
ham, E., Youssef, A.M. (eds.) SAC 2006. LNCS, vol. 4356, pp. 147–162. Springer,
Heidelberg (2007)
23. FIPS 197, Advanced Encryption Standard, Federal Information Processing Stan-
dard, NIST, U.S. Dept. of Commerce, November 26 (2001)
24. Oren, Y., Kirschbaum, M., Popp, T., Wool, A.: Algebraic Side-Channel Analysis
in the Presence of Errors. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010.
LNCS, vol. 6225, pp. 428–442. Springer, Heidelberg (2010)
25. Osvik, D.A., Shamir, A., Tromer, E.: Cache Attacks and Countermeasures: The
Case of AES. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 1–20.
Springer, Heidelberg (2006)
26. Percival, C.: Cache missing for fun and profit (2005),
http://www.daemonology.net/hyperthreading-considered-harmful/
27. Renauld, M., Standaert, F.-X.: Algebraic Side-Channel Attacks. In: Bao, F., Yung,
M., Lin, D., Jing, J. (eds.) Inscrypt 2009. LNCS, vol. 6151, pp. 393–410. Springer,
Heidelberg (2010)
28. Renauld, M., Standaert, F., Veyrat-Charvillon, N.: Algebraic Side-Channel Attacks
on the AES: Why Time also Matters in DPA. In: Clavier, C., Gaj, K. (eds.) CHES
2009. LNCS, vol. 5747, pp. 97–111. Springer, Heidelberg (2009)
29. Renauld, M., Standaert, F.-X.: Representation-, Leakage- and Cipher- Dependen-
cies in Algebraic Side-Channel Attacks. In: Industrial Track of ACNS 2010 (2010)
30. Roche, T.: Multi-Linear Cryptanalysis in Power Analysis Attacks (MLPA). CoRR abs/0906.0237 (2009)

31. Schramm, K., Wollinger, T.J., Paar, C.: A New Class of Collision Attacks and
Its Application to DES. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp.
206–222. Springer, Heidelberg (2003)
32. Shannon, C.E.: Communication theory of secrecy systems. Bell System Technical
Journal 28 (1949); see in particular page 704
33. Soos, M., Nohl, K., Castelluccia, C.: Extending SAT Solvers to Cryptographic
Problems. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 5584, pp. 244–257.
Springer, Heidelberg (2009)
34. Whitnall, C., Oswald, E., Mather, L.: An Exploration of the Kolmogorov-Smirnov
Test as Competitor to Mutual Information Analysis. Cryptology ePrint Archive
(2011), http://eprint.iacr.org/2011/380.pdf

Appendix 1: Algorithm to Calculate ξ(x) under HWLM

Algorithm 1 computes ξ(x) from two input parameters. The first is n, the number of bits in x. The second is m. If m is 1, the algorithm outputs ξ(x) for the case in which HW(x) is known. Otherwise, it outputs ξ(x) for the case in which both HW(x) and HW(S(x)) are known, where S(x) is the S-box result of x.

Algorithm 1. Compute the search space of x

1: Inputs: n, m
2: Output: ξ(x), the expected search space of x
3: int i, j, sum = 0;
4: for i = 0 to 2^n − 1 do
5:   for j = 0 to 2^n − 1 do
6:     if (HW(i) == HW(j)) then
7:       if (m == 1 or HW(S(i)) == HW(S(j))) then
8:         sum++;
9:       end if
10:    end if
11:  end for
12: end for
13: return ξ(x) = (float) sum / 2^n;
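For reference, a direct Python port of Algorithm 1 for n = 8 might look as follows; the table sbox is assumed to hold the 256 AES S-box values from [23].

def hamming_weight(v):
    return bin(v).count('1')

def xi(n, m, sbox):
    total = 0
    for i in range(2 ** n):
        for j in range(2 ** n):
            if hamming_weight(i) == hamming_weight(j):
                if m == 1 or hamming_weight(sbox[i]) == hamming_weight(sbox[j]):
                    total += 1
    return total / 2 ** n          # expected search space of x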

Appendix 2: Hamming Weight Representation of a Byte

Suppose X is a byte containing 8 bits $(x_7 \ldots x_0)$. $HW(X)$ can be represented with a 4-bit value $Y = (y_3 \ldots y_0)$; $x_0$ and $y_0$ denote the LSBs. The bits of $Y$ can be calculated (with all sums taken over GF(2)) as:

$$y_3 = \prod_{i=0}^{7} x_i, \qquad y_2 = \bigoplus x_i x_j x_m x_n \ \ (0 \le i < j < m < n \le 7), \quad (11)$$

$$y_1 = \bigoplus x_i x_j \ \ (0 \le i < j \le 7), \qquad y_0 = \bigoplus_{i=0}^{7} x_i.$$
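Equation (11) states that the bits of HW(X) are the elementary symmetric polynomials e1, e2, e4, e8 of the bits of X, reduced mod 2. A small Python check of this identity over all 256 byte values could read:

from itertools import combinations

def esp_mod2(bits, k):
    """Elementary symmetric polynomial e_k of the bits, reduced mod 2."""
    acc = 0
    for comb in combinations(bits, k):
        term = 1
        for b in comb:
            term &= b
        acc ^= term
    return acc

for X in range(256):
    bits = [(X >> i) & 1 for i in range(8)]
    y0, y1, y2, y3 = (esp_mod2(bits, k) for k in (1, 2, 4, 8))
    assert sum(bits) == y0 + 2 * y1 + 4 * y2 + 8 * y3   # HW(X) recovered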
Intelligent Machine Homicide
Breaking Cryptographic Devices Using Support Vector Machines

Annelie Heuser1,2 and Michael Zohner1,2


1 Technische Universität Darmstadt, Germany
2 Center for Advanced Security Research Darmstadt (CASED), Germany
{annelie.heuser,michael.zohner}@cased.de

Abstract. In this contribution we propose the so-called SVM attack, a profiling based side channel attack, which uses the machine learning
algorithm support vector machines (SVM) in order to recover a crypto-
graphic secret. We compare the SVM attack to the template attack by
evaluating the number of required traces in the attack phase to achieve
a fixed guessing entropy. In order to highlight the benefits of the SVM
attack, we perform the comparison for power traces with a varying noise
level and vary the size of the profiling base. Our experiments indicate
that due to the generalization of SVM the SVM attack is able to recover
the key using a smaller profiling base than the template attack. Thus,
the SVM attack counters the main drawback of the template attack, i.e.
a huge profiling base.

1 Introduction

Side channel analysis utilizes physical leakage that is emitted during the execu-
tion of cryptographic devices in order to recover a secret. Among side channel
attacks, profiling based side channel attacks are considered to be the most effec-
tive attacks when a strong adversary is assumed. In profiling based side channel
attacks an adversary utilizes a training device, over which he has full control,
in order to gain additional knowledge for the attack against an identical target
device. A common profiling based side channel attack, the so-called template
attack, was introduced as the most powerful type of profiling based side channel
attack from an information theoretical point of view [3]. However, since the tem-
plate attack requires many power traces in order to correctly model the power
consumption of the device, further profiling based side channel attacks were sug-
gested. A relatively new suggestion deals with machine learning techniques [8,13],
in particular support vector machines (SVM) [18]. These contributions focus on
the SVM as a binary classification method. The actual strength of SVM, i.e. the
ability to generalize a given problem, is not tackled and thus the full potential
of SVM in the area of side channel analysis is not utilized.
In this contribution we highlight the ability of SVM to build a generalized
model from an underspecified profiling set by introducing the so-called SVM


attack. The SVM attack is a profiling based side channel attack that reveals
cryptographic secrets by using SVM to predict the Hamming weight for a given
power consumption. We highlight the ability of SVM to build a generalized
model from a given profiling set by evaluating the required number of attack
traces to achieve a fixed guessing entropy for the SVM attack on power traces
with different noise levels and for a varying number of profiling traces. We show
that the SVM attack is better suited than the template attack when attacking
power traces with a high noise level and when given an underspecified profiling
base. Thus, the SVM attack lessens the significance of huge profiling bases, which
is the main drawback for template attacks.

Organization. Section 2 presents the necessary background on side channel analysis as well as SVMs. In Section 3 we introduce the application of SVMs in side channel analysis. Section 4 presents our experimental results, followed by an interpretation. Section 5 concludes this paper and proposes ideas for further research.

2 Preliminaries
In this section we provide the reader with all necessary information about pro-
filing based side channel analysis, followed by an introduction to the area of
machine learning and support vector machines.

2.1 Side Channel Analysis


Side channel attacks exploit information that is unintentionally emitted during the execution of a cryptographic algorithm on a device. Such emitted information is, for instance, the execution time, the power consumption, or even the electromagnetic radiation. In the area of power analysis, a common attack type is the profiling based attack, which builds a profile of a training device, over which the adversary has full control, and utilizes this profile to recover secret information from an identical target device. Finding a suitable composition of keys, for which the profiles are built, is called leakage analysis, resulting in a leakage model. There exist two different strategies for profiling based attacks: classification [1,3,5] and regression [9,16]. In the following we detail the template attack, which is the most common profiling based side channel attack.

Template Attacks. In the following we describe a template attack that builds a template for each possible class $c \in \{1, \ldots, C\}$, where the number of classes $C$ depends on the assumed leakage model. Suppose an adversary is provided with power trace vectors $\{l_c^i\}_{i=1}^{N_c}$ for each class $c \in \{1, \ldots, C\}$, where $N_c$ is the number of power trace vectors of class $c$. Since template attacks rely on a multivariate Gaussian noise model, the power trace vectors are considered to be drawn from a multivariate distribution. More precisely,

$$\mathcal{N}(l_c \mid \mu_c, \Sigma_c) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_c|^{1/2}} \exp\Big\{ -\frac{1}{2} (l_c - \mu_c)^T \Sigma_c^{-1} (l_c - \mu_c) \Big\} \quad (1)$$

$$\text{and} \quad \hat{\mu}_c = \frac{1}{N_c} \sum_{n_c=1}^{N_c} l_{n_c}, \qquad \hat{\Sigma}_c = \frac{1}{N_c} \sum_{n_c=1}^{N_c} (l_{n_c} - \hat{\mu}_c)(l_{n_c} - \hat{\mu}_c)^T.$$

The construction of these templates is based on the estimation of the expected values $\hat{\mu}_c$ as well as the covariance matrices $\hat{\Sigma}_c$. The key recovery during the attack phase is performed using the maximum-likelihood estimator [3] or, equivalently, the log-likelihood rule, given by


$$\log L_{k^*} \equiv \log \prod_{i=1}^{N_2} P(l_i \mid c) = \sum_{i=1}^{N_2} \log \mathcal{N}(l_i \mid \mu_c, \Sigma_c), \quad (2)$$

where the class c is calculated according to the leakage model given a key guess
k ∗ and an input.
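As an illustration, the template construction and the log-likelihood rule of Equation (2) can be sketched with NumPy/SciPy as follows; the array shapes are our assumption, and note that np.cov uses the unbiased 1/(N_c − 1) normalization rather than the 1/N_c above.

import numpy as np
from scipy.stats import multivariate_normal

def build_templates(profiling_traces):
    """profiling_traces: dict mapping class c to an (N_c x d) trace array."""
    return {c: (L.mean(axis=0), np.cov(L, rowvar=False))
            for c, L in profiling_traces.items()}

def log_likelihood(attack_traces, template):
    """Sum of per-trace Gaussian log-densities, cf. Equation (2)."""
    mu, sigma = template
    return multivariate_normal.logpdf(attack_traces, mean=mu, cov=sigma).sum()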

2.2 Support Vector Machines

In this section we describe the idea of classifying linearly separable data using support vector machines (SVM). Suppose we have a training set with $N_1$ instances¹ and a test set with $N_2$ instances. Each instance in the training set contains one assignment $y_i$ (i.e. a class label) and several attributes $x_i$ (i.e. features² or observed variables) with $i = 1, \ldots, N_1$. Using SVM, the goal is to classify each test instance $x_i$ with $i = 1, \ldots, N_2$ according to the corresponding data attributes. In the following we restrict our focus to a binary classification problem and describe its extension in Subsection 2.5. Suppose we are given a training set of pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$. The instances can then be classified via a hyperplane $H$ described by $\langle w, x \rangle + b = 0$, where $w \in \mathbb{R}^n$ denotes the normal to the hyperplane, $b/\|w\|$ the perpendicular distance to the origin with $b \in \mathbb{R}$, and $\langle\cdot,\cdot\rangle$ the dot product in $\mathbb{R}^n$. One chooses the primal decision function $\tau_p(x) = \mathrm{sgn}(\langle w, x \rangle + b)$ to predict the class of the test data, cf. Figure 1. Thus, one has to select the parameters $w$ and $b$, which describe the hyperplane $H$. While there exist many linear hyperplanes that separate the two classes, only one unique hyperplane maximizes the margin between them. The construction of this optimal separating hyperplane is discussed in the following.

¹ In the context of side channel analysis, instances are called measurements.
² In the context of side channel analysis, features are relevant points in time.

Optimal Hyperplane Separation. Let us now consider the points that lie closest to the separating hyperplane, i.e. the support vectors (filled black in Figure 1). Moreover, let us define the hyperplanes on which the support vectors lie as $H_1$ and $H_2$, and let $d_1$ and $d_2$ be the respective distances of $H_1$ and $H_2$
to the hyperplane $H$, with $d_1 = d_2$.

[Fig. 1. Binary hyperplane classification: the margin between the two classes $y_i = \pm 1$; support vectors are filled black.]

SVM tries to maximize the margin, which corresponds to the following optimization problem:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \ge 1,\ i = 1, \ldots, m. \quad (3)$$

The usual approach to solving this problem in optimization theory is to transform it into the dual form, a more appropriate form that yields the same solution. To this end, we introduce the Lagrangian $L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i (\langle x_i, w \rangle + b) - 1 \big)$ with Lagrange multipliers $\alpha_i \ge 0$. The Lagrangian $L$ must be maximized with respect to the $\alpha_i$ and minimized with respect to $w$ and $b$. Consequently, this leads to $\sum_{i=1}^{m} \alpha_i y_i = 0$ and $w = \sum_{i=1}^{m} \alpha_i y_i x_i$. The Karush-Kuhn-Tucker theorem [18] states that only solutions with $\alpha_i \ne 0$ match the constraints in Equation (3). Substituting these equations into the Lagrangian yields the dual form


$$\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad (4)$$

$$\text{s.t.}\quad \alpha_i \ge 0,\ i = 1, \ldots, n \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0.$$

The dual decision function of SVM is thus given by:

$$\tau_d(x) = \mathrm{sgn}\Big( \sum_{i=1}^{n} \alpha_i y_i \langle x, x_i \rangle + b \Big). \quad (5)$$

Note that this decision function requires only the calculation of the dot product of each input vector $x_i$, which is important for the kernel trick described in Subsection 2.4.

2.3 Soft-Margin Classification


The optimization problems formulated in the previous section (cf. Equation (3) and Equation (4)) have two main drawbacks. First, a hyperplane that separates the classes is not bound to exist, since the classes might not be linearly separable. Second, if outliers occur, a hyperplane that fits all given instances might correctly describe the problem according to the training data, but fail to be an adequate solution for the overlying problem. Thus, the soft margin classification was introduced, which allows the intentional misclassification of training instances in order to achieve a better overall accuracy. The soft margin classification adds slack variables $\xi_i \ge 0$, $i = 1, \ldots, m$, that penalize instances on the wrong side of $H$, where the penalty increases with the distance. The goal of SVM with soft margin classification is to find a trade-off between maximizing the margin and minimizing the number of misclassified instances. The primal optimization problem becomes

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \xi_i \ge 0\ \forall i. \quad (6)$$

Since for large $\xi_i$ the constraints can always be met, an additional constant $C > 0$ is introduced in order to determine the trade-off between margin maximization and training error minimization. The conversion into the dual form is similar to the standard case (see [17] for details).

2.4 Kernel Trick


In the previous sections we described methods to classify linearly separable data, allowing for outliers. However, in some scenarios the data might not be linearly separable at all. Therefore, we sketch the idea of combining SVM with so-called kernel functions. As mentioned in Subsection 2.2, the optimization problem stated in Equation (4) only requires the computation of inner products of vectors. Thus, if the data is not linearly separable in the original space (e.g. $\mathbb{R}^n$), one can map the feature vectors into a space of higher dimension. The computation of the inner product is then extended by a non-linear mapping function $\phi(\cdot)$ through $\langle x_i, x_j \rangle \to \langle \phi(x_i), \phi(x_j) \rangle$. The exact mapping $\phi(\cdot)$ does not need to be known, which is denoted as the kernel trick, since it is implicitly defined by the kernel function $k$ with $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. The restrictions on possible kernel functions are discussed in [18]. In the following we state two kernel functions, which are also utilized in our experiments.

Example 1. Linear kernel function: $k(x_i, x_j) = x_i^T x_j$.

Example 2. Radial basis function (RBF): $k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$.
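The two example kernels can be written down directly; a small NumPy sketch:

import numpy as np

def linear_kernel(xi, xj):
    # Example 1: plain dot product in the original space
    return float(np.dot(xi, xj))

def rbf_kernel(xi, xj, gamma=0.5):
    # Example 2: Gaussian radial basis function with width parameter gamma
    diff = np.asarray(xi) - np.asarray(xj)
    return float(np.exp(-gamma * np.sum(diff ** 2)))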

2.5 Multi-class SVM


In its classical sense, SVM is a binary classifier, i.e. it only distinguishes two classes. However, several extensions for constructing a multi-class classifier from a binary classifier exist, e.g. one-against-one [7], one-against-all [20], and error coding [4]. Since all extensions perform similarly [11], we constrain ourselves to the description of the one-against-one strategy, of which a small sketch is given below. The one-against-one extension trains a binary classifier for each possible pair of classes. Thus, for M classes, M(M − 1)/2 binary classifiers are trained. The predictions of all binary classifiers are combined into the prediction of the multi-class classifier and the class with the most votes is chosen. For more details, we refer to [7, 11].
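The sketch below illustrates the voting step only; classifiers is assumed to map each class pair (a, b) to a trained binary decision function returning a or b:

from itertools import combinations
from collections import Counter

def one_against_one_predict(classifiers, x, classes):
    """classifiers[(a, b)] is a trained binary classifier returning a or b."""
    votes = Counter(classifiers[(a, b)](x)
                    for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]      # class with the most votes wins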

2.6 Probability Output


The SVM, as defined in the previous sections, outputs a label $y_i \in \{1, \ldots, N\}$, where $N$ is the number of classes. In terms of side channel attacks, where an erroneous environment is assumed, an attacker is rather interested in the probability of an instance $x_i$ belonging to a class $y_i$. Therefore, instead of predicting a class, we aim at predicting the probability $P_{SVM}(x_i \mid c)$ for all classes $c$. Since this is a very extensive field, we refer to [21] for a detailed description of how to calculate $P_{SVM}(x_i \mid c)$.
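For illustration, such probability outputs are exposed, for example, by scikit-learn's SVC wrapper around libsvm; the snippet below runs on toy data, whereas our experiments used libsvm's C-SVC directly.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(90, 4))          # toy stand-in for trace features
y_train = np.repeat(np.arange(9), 10)       # 10 toy samples per HW class

# probability=True enables Platt scaling and pairwise coupling, cf. [21]
clf = SVC(kernel='rbf', C=10, probability=True)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_train[:3])      # P_SVM(x_i | c) for every class c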

3 SVM in Side Channel Analysis


In this section we present the underlying attacker model, followed by the de-
scription of the SVM attack. Subsequently, we discuss an adequate metric to
compare the performance of the SVM attack to the performance of the template
attack. Finally, we define scenarios, for which we compare the performance of
the SVM attack to the performance of the template attack.

3.1 Assumed Attacker Model


We assume an attacker who has full control over a training device during the profiling phase and is able to measure the power consumption during the execution of a cryptographic algorithm. In the subsequent attack phase, the attacker aims at recovering an unknown secret key, processed by an identical target device, by measuring the power consumption of multiple executions processing known and random inputs. The assumed leakage model of the device is the Hamming weight model.
Our attacker model differs from the attacker model considered in [8, 13] re-
garding the number of attack traces. While the authors of [8, 13] assumed only
one attack trace, we assume an attacker who is able to measure multiple power
traces in the attack phase. We consider this attacker model since it is more ap-
propriate for highlighting the ability of SVM to generalize to a given problem
and because it is the most common in the context of profiled side channel anal-
ysis [3, 5, 6, 9, 12, 16]. Note that the decisions for the analysis in this contribution
were chosen to fit our assumed attacker model. However, if the number of attack
traces is limited to one, we recommend a combination of the SVM attack with
algebraic techniques [14] in order to recover the key.

3.2 How to Recover the Key


The authors of [8, 13] used SVM in order to recover single bits of the key with an accuracy of 50%–94%. However, due to their restriction to a single attack trace, a high additional computational complexity is required to recover the whole key. Since we are not limited to a single attack trace, we can utilize various methods that reduce the computational complexity of a key recovery. The first method is the extension of the bit leakage model to the Hamming weight leakage model. Using the Hamming weight leakage model, we can make assumptions about the whole intermediate value instead of only single bits of it. However, since the Hamming weight leakage model distinguishes nine different classes instead of only two, we have to utilize the multi-class extension of SVM (cf. Subsection 2.5) for the classification.
The second method is the extension of the attack to multiple power traces. We combine the predictions of SVM from $N$ power traces $l_i$, $i = 1, \ldots, N$, for all classes $c$ of the leakage model by using the probability outputs $P_{SVM}$ (cf. Subsection 2.6) in order to perform a log maximum likelihood estimation for each possible key $k^*$:

$$\log L_{k^*} \equiv \log \prod_{i=1}^{N} P_{SVM}(l_i \mid c) = \sum_{i=1}^{N} \log P_{SVM}(l_i \mid c). \quad (7)$$

One chooses the key that maximizes the likelihood:

$$\arg\max_{k^*}\ \log L_{k^*}. \quad (8)$$
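A sketch of this key recovery in Python, assuming an (N × 9) array of log-probabilities log P_SVM(l_i | c), the known plaintext bytes, an S-box table sbox and a Hamming weight function hw (all supplied by the caller):

import numpy as np

def recover_key(log_proba, plaintexts, sbox, hw):
    """log_proba: (N x 9) array of log P_SVM(l_i | c); plaintexts: N bytes."""
    scores = np.empty(256)
    for k in range(256):                     # all key byte candidates
        classes = [hw(sbox[p ^ k]) for p in plaintexts]
        scores[k] = log_proba[np.arange(len(plaintexts)), classes].sum()
    return int(scores.argmax()), scores      # cf. Equations (7) and (8)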

3.3 How to Compare the Performance


Profiling based side channel attacks can be compared by various measures. The most popular measure in the field of machine learning is the accuracy of the model on an independent test set [8, 13]. Using the accuracy as a measure is adequate when only one attack trace is available or when the number of elements of each predicted class is equal. However, for the attacker model underlying this contribution, which assumes multiple attack traces and a Hamming weight leakage model, the accuracy is not a suitable measure: the most likely Hamming weight class also has the most elements.
We therefore disregard the accuracy as a measure for our experiments and choose the guessing entropy [19], which was used in [13] to evaluate the number of remaining keys. The guessing entropy is defined as follows: let $g$ contain the descending probability ranking of all possible keys after $N$ iterations of Equation (2) or Equation (7), and let $i$ denote the index of the correct key in $g$. After conducting $s$ experiments, one obtains a matrix $[g_1, \ldots, g_s]$ and a corresponding vector $[i_1, \ldots, i_s]$. The guessing entropy then determines the average position of the correct key: $GE = \frac{1}{s}\sum_{x=1}^{s} i_x$. In other words, the guessing entropy describes the average number of guesses required to recover the actual key.

In our experiments we use the guessing entropy to evaluate how many attack traces are required to achieve a fixed guessing entropy. We fix the guessing entropy by defining two thresholds: a guessing entropy of 1 (GE_1) and a guessing entropy below 5 (GE_5).
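Computing the guessing entropy from the score vectors of Equation (2) or Equation (7) is then straightforward; a small sketch (a key byte space of size 256 is assumed):

import numpy as np

def guessing_entropy(score_matrix, correct_key):
    """score_matrix: (s x 256) log-likelihoods, one row per experiment."""
    ranks = []
    for scores in score_matrix:
        g = np.argsort(scores)[::-1]         # descending probability ranking
        ranks.append(int(np.flatnonzero(g == correct_key)[0]) + 1)
    return float(np.mean(ranks))             # average position of the key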

3.4 Scenarios for Profiling Based Power Analysis

In order to highlight the advantages of profiling using SVM, we evaluate the guessing entropy of the SVM attack and the template attack for different scenarios. First, we vary the signal-to-noise ratio of the traces in order to model devices with different noise levels. In total we use three different signal-to-noise ratios: no additional noise (low noise), 30 dB (medium noise), and 15 dB (high noise). Next, we vary the size of the profiling set in order to evaluate the number of profiling traces that are sufficient for the classifiers to accurately model the power consumption. Since the number of corresponding profiling traces can be very high, we are also interested in the performance of both classifiers in case of an underspecified profiling base.

4 Experimental Results

In the following, we describe the experimental setup and the results of the com-
parison between the SVM attack and template attack. To present the results, we
first identify the influence of the parameters of SVM. Subsequently, we utilize
the knowledge of the effect of the parameters in order to determine the best set
of parameters for each scenario and state the corresponding results.

4.1 Experimental Setup

For our experiments we measured the power consumption of an ATMega-256-1 microcontroller, which was powered by an external power supply and synchronized via an external frequency generator to 8 MHz. This setup was chosen in order to stabilize the measurements. The power consumption of the microcontroller was measured using a PicoScope 6000 oscilloscope. We measured the power consumption of the AES S-box substitution applied to the result of an XOR between a varying input message and a key, each of 8 bit size. The AES S-box substitution was chosen for the attack since it is a common target in side channel analysis and because it has a high level of diffusion. The high level of diffusion is beneficial to the analysis using the guessing entropy, since the correct value can be determined after knowing only a few Hamming weights. We measured 2700 traces for the profiling phase (300 for each of the nine Hamming weights) and 1000 traces for the attack phase. Furthermore, as input features for the SVM attack and the template attack we chose the points in time at which the highest correlation between the Hamming weight of the output of the S-box and the power consumption occurred [15].
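A sketch of this point-of-interest selection, assuming an (N × T) trace matrix and the corresponding Hamming weights of the S-box output:

import numpy as np

def select_pois(traces, hws, num_pois=4):
    """traces: (N x T) measurements; hws: N Hamming weights per trace."""
    hws = np.asarray(hws, dtype=float)
    corr = np.array([abs(np.corrcoef(hws, traces[:, t])[0, 1])
                     for t in range(traces.shape[1])])
    return np.argsort(corr)[::-1][:num_pois]   # indices of the strongest points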

In order to obtain traces with different noise levels, we added white Gaussian noise to the ATMega-256-1 measurements. For the traces with a low noise level we used the original microcontroller measurements. The traces with a medium noise level were acquired by adding 30 dB of white Gaussian noise. Lastly, 15 dB of white Gaussian noise was added to the microcontroller measurements to obtain the traces with a high noise level. As SVM implementation we applied the C-SVC implementation of libsvm [2], which uses the one-against-one multi-class strategy and predicts the probability output $P_{SVM}$. We trained the SVM on a profiling base starting from 180 profiling traces and increased the number of profiling traces by 180 after each evaluation until we reached a profiling base of 2700 profiling traces.
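A sketch of such noise addition at a target signal-to-noise ratio in dB (our exact noise-generation procedure may differ in detail):

import numpy as np

def add_awgn(traces, snr_db, seed=0):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    rng = np.random.default_rng(seed)
    noise_power = np.mean(traces ** 2) / 10 ** (snr_db / 10)
    return traces + rng.normal(0.0, np.sqrt(noise_power), size=traces.shape)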
All experiments were performed using profiling traces for which each Hamming weight occurred equally often. The equal distribution of Hamming weights is beneficial for the evaluation using the guessing entropy, since the prediction of Hamming weights is then independent of their distribution. Note that even if the attacker is not privileged to choose the plaintexts in the profiling phase such that all Hamming weights occur equally often, an error weight can be inserted during the training of SVM [18]. This weight is used to penalize errors for different Hamming weights differently, such that an equally distributed profiling base can be simulated.

4.2 Understanding the Effects of the C-SVC Parameters

The SVM implementation C-SVC is a highly optimized all-purpose learning algorithm, which makes it hard to know the optimal parameters for a given problem a priori [18]. Thus, to get an understanding of the effect of the parameters, we performed multiple executions of C-SVC and varied the set of parameters.
We analyze the low noise traces in order to get an estimation of the difficulty
of distinguishing the Hamming weights.

[Fig. 2. Distribution of the Hamming weights 0–8 on the low-noise traces: (a) two-dimensional space for times A and B; (b) density for time A.]

Figure 2a depicts a two-dimensional
space with the two axes representing the power consumption at times A and B, where each Hamming weight is colored distinctly. Figure 2b shows the density of each Hamming weight at time A. Note that the instances are visibly distinguishable by their Hamming weight in both figures and there are only a few conflicting instances (i.e. instances that have the same feature values but a different Hamming weight).
Next, we executed C-SVC with varying parameters on the training instances and evaluated the guessing entropy on the attack traces. The libsvm framework allowed us to vary the cost of a wrong classification, the termination criterion, and the kernel function. The tested kernels were the linear kernel, the RBF kernel, the polynomial kernel, the power kernel, the hybrid kernel, and the log kernel [10]. The results indicated that the RBF kernel, with a cost factor of 10 and a termination criterion of 0.02, performed best for the low noise traces.
From our experiments we deduced that the cost factor affects the adaptation of C-SVC to errors. If the cost factor is chosen high, C-SVC tries to separate the instances while making as few errors as possible. While a minimization of errors sounds desirable at first, it decreases the ability of C-SVC to generalize a problem and should thus only be chosen high when there are very few contradicting instances.
The termination criterion, on the other hand, specifies the optimality constraint for the constructed hyperplane [18]. If chosen high, C-SVC is more likely to find a hyperplane in a small number of iterations. However, since C-SVC relies on an optimization problem, the resulting hyperplane for a high termination criterion may be adequate but not optimal.
Lastly, we varied the input features, i.e. the number of relevant time instances of the power trace. Starting from the two points in time with the highest correlation, we increased the number of input features until we trained the SVM on the eight points in time with the highest correlation. For our experiments, four input features, i.e. the four points in time that leak the most information about the processed variable, resulted in the smallest number of attack traces.
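A parameter exploration of this kind can be sketched with a grid search; the snippet below uses scikit-learn on toy data for brevity, whereas our experiments varied libsvm's C-SVC parameters directly:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 4))               # toy stand-in for profiling traces
y = np.repeat(np.arange(9), 20)             # toy Hamming weight labels

params = {'kernel': ['linear', 'rbf'],
          'C': [1, 10, 100],                # cost of a wrong classification
          'gamma': ['scale', 0.1, 1.0],     # RBF width
          'tol': [0.001, 0.02]}             # termination criterion
search = GridSearchCV(SVC(), params, cv=3)
search.fit(X, y)
print(search.best_params_)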

4.3 Comparing SVM Attack and Template Attack

After understanding the influence of the parameters of C-SVC, we compared the SVM attack and the template attack. In the following, we first state the results of the experiments for each noise level and then interpret the results.

Low-Noised Traces. The first comparison was performed on the original microcontroller traces, using the parameters determined in Section 4.2, i.e. the RBF kernel, a cost factor of 10, and a termination criterion of 0.02. For both the SVM attack and the template attack, we computed the guessing entropy for an increasing profiling base. Figure 3a and Figure 3b depict the resulting classifiers. The results of these experiments are listed in Table 1 and indicate that the number of attack traces required for recovering the correct key is nearly equal for both attacks.

[Fig. 3. Classification models on the low-noise traces: (a) classification with templates; (b) classification with SVM. Axes: power consumption at times A and B.]

Also, the performance of both attacks stabilizes after only 20 profiling traces for each Hamming weight. This result was expected, since the instances of the different Hamming weights could even be distinguished visually. Thus, reaching the required guessing entropy threshold requires only the attack traces that are needed to uniquely characterize the key.

Table 1. Guessing Entropy for the SVM and template attack on traces with a low
noise level and a varying number of profiling traces per Hamming weight

GE    Attack    Number of profiling traces for each HW
                 20  40  60  80 100 120 140 160 180 200 220 240 260 280 300
GE_1  Template    5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
      SVM         6   6   6   6   6   6   6   6   6   6   6   6   6   6   6
GE_5  Template    3   2   2   2   2   2   2   2   2   2   2   2   2   2   2
      SVM         3   3   3   3   3   3   3   3   3   3   3   3   3   3   3

Moderate-Noised Traces. The next comparison was performed on the traces with a moderate noise level. In order to get an understanding of how the noise affects the distribution of Hamming weights, we again plotted the instance distribution and the Hamming weight densities (see Figure 4a and Figure 4b). As expected, adding normally distributed noise to the traces increases the number of conflicting instances and thus decreases the distinguishability of the Hamming weights. However, a trend is still observable. Thus, in order to decrease the influence of errors, we reduced the cost factor for a misclassified instance to 1.
The results of the experiments are listed in Table 2. As expected, both attacks require more attack traces to recover the key and need more profiling traces in order to achieve a stable guessing entropy. However, the correct key can still be recovered using less than 50 attack traces in most cases.

[Fig. 4. Distribution of the Hamming weights 0–8 on the moderate-noise traces: (a) two-dimensional space for times A and B; (b) density for time A.]

Table 2. Guessing entropy for the SVM and template attack on traces with a moderate
noise level and a varying number of profiling traces per Hamming weight

GE    Attack    Number of profiling traces for each HW
                 20  40  60  80 100 120 140 160 180 200 220 240 260 280 300
GE_1  Template   66  50  47  44  44  43  42  42  42  41  40  42  40  40  40
      SVM         -  43  35  29  27  27  26  25  25  25  23  23  25  25  24
GE_5  Template   25  18  17  17  17  17  17  17  16  16  16  16  15  15  15
      SVM       187  16  14  12  11  10  10  10  10   9   9   9  10  10  10

Noticeable for this experiment is the poor performance of the SVM attack compared to the template attack on a very small profiling base, i.e. 20 profiling traces per Hamming weight. Given such a small profiling base, the template attack manages to find the correct key using only a few attack traces, while the SVM attack is not able to find the correct key using all 1000 attack traces. However, if given more profiling traces, the SVM attack quickly surpasses the template attack in terms of guessing entropy.

High-Noised Traces. The last experiment was conducted on traces with a high noise level. Figure 5a and 5b depict the instance distribution and the Hamming weight densities for these traces. As expected, the high noise level makes the instances very hard to distinguish, and a trend is only observable for Hamming weight 0 and Hamming weight 8. However, because of the normally distributed noise, we still expect each Hamming weight to have a high concentration of instances around its respective expectation value. Thus, we chose the same cost factor as for the traces with a moderate noise level. The results of the corresponding experiments are listed in Table 3.

[Fig. 5. Distribution of the Hamming weights 0–8 on the high-noise traces: (a) two-dimensional space for times A and B; (b) density for time A.]

Table 3. Guessing entropy for the SVM and template attack on traces with a high
noise level and a varying number of profiling traces per Hamming weight

GE    Attack    Number of profiling traces for each HW
                 20  40  60  80 100 120 140 160 180 200 220 240 260 280 300
GE_1  Template    -   -   -   -   -   -   -   -   -   -   -   -   - 918 937
      SVM         -   - 975 578 684 579 671 595 580 668 568 588 585 595 591
GE_5  Template    -   - 928 771 610 511 487 480 425 382 373 356 343 335 325
      SVM         -   - 339 203 234 204 210 201 199 211 195 198 202 200 198

Because of the high noise level, the required number of attack traces for the guessing entropy rises drastically. Neither the template attack nor the SVM attack is able to recover the correct key, or at least narrow the key space down to five possible keys, within 1000 attack traces if less than 60 profiling traces per Hamming weight are used. However, if the profiling base is increased to 60 traces per Hamming weight, the SVM attack manages to find the correct key. Using the same profiling base, the template attack is only able to narrow the key space down to 5 possible key values. The template attack first manages to recover the correct key using a profiling base of 280 traces per Hamming weight. However, even though it then finds the correct key, the template attack still requires roughly twice the number of attack traces as the SVM attack.
Noticeable about the results is that, just like for the traces with a moderate noise level, the SVM attack quickly reaches a point where it fluctuates around a certain number of attack traces, whereas the template attack decreases the number of attack traces steadily but slowly. This is observable for GE_1 as well as for GE_5, where the SVM attack starts fluctuating at a profiling base of 80 traces per Hamming weight, whereas the template attack continues to decrease the required attack traces in GE_5.

4.4 Interpretation of the Results


The results of the comparison indicate that the template attack requires slightly fewer attack traces than the SVM attack when profiling traces with a low noise level. However, with increasing noise level the SVM attack outperforms the template attack. For higher noise levels, the number of attack traces required to achieve the desired guessing entropy increases more slowly for the SVM attack than for the template attack. Also, the number of profiling traces required for a stable number of attack traces is smaller for the SVM attack than for the template attack.
The reason for the smaller number of attack traces required by the SVM attack compared to the template attack on noisier traces is the focus of SVM on support vectors, i.e. on the separation of Hamming weights. A focus on the separation does not utilize all information about the distribution of power consumptions for each Hamming weight, but allows SVM a faster adaptation to the relevant task of the attack, i.e. the separation of Hamming weights. Also, SVM omits all instances that are correctly classified by the constructed hyperplane and have a greater distance to the hyperplane than the support vectors. Thus, only particular instances have an influence on the relocation of the hyperplane, which explains the fluctuation of the guessing entropy.
In comparison, the template attack aims at correctly modeling the distribution of the traces, which is more accurate, but also requires more profiling traces. When more noise is added to the traces, the variance of the measurements increases and thus more profiling traces are needed in order to correctly model the distribution. The slow and steady convergence in required attack traces can be explained by the increasing precision of the constructed distribution on the one hand and the decreasing influence of each additional instance on the other. The lower number of attack traces of the template attack for a very small profiling base, i.e. 20 profiling traces per Hamming weight, can be explained by the presumption of normally distributed instances. However, this presumption can also become a disadvantage of the template attack when analyzing traces which, in contrast to the traces used in our experiments, are not normally distributed. SVM, on the other hand, does not use such a presumption and thus performs worse on a very small profiling base, but is not restricted to a particular distribution of the traces.

5 Conclusion
In this paper we presented a new profiling based side channel attack, the so-called SVM attack. The SVM attack utilizes the machine learning algorithm SVM in order to classify the Hamming weight of an intermediate value, which depends on a secret key. In order to evaluate the gain of the SVM attack, we compared it to the template attack. The comparison was conducted by evaluating the number of traces in the attack phase that are required to achieve a pre-fixed guessing entropy on a variably sized profiling base and by varying the noise level of the traces. While the template attack required fewer attack traces when the noise level was low, the SVM attack outperformed the template attack on traces with a higher noise level. This can be explained by the different focus of templates and SVM. While templates try to model the complete power consumption distribution of a device by taking all elements into account, SVM focuses on the separation of classes, using only some conflicting instances and the support vectors. Thus, SVM disregards instances that are not important for the separation of classes, which allows SVM to achieve a stable performance using a smaller profiling base than the template attack.
Future work may concentrate on the ability of SVM to generalize a given problem. A possible scenario for the generalization is a profiling based attack that conducts the profiling phase on one device and performs the attack phase on another device that is identical to the profiled device. This scenario is especially interesting since it depicts a practical profiling based attack on a device. Additionally, we plan to analyze further machine learning methods in order to better adapt SVM to the challenges in the area of side channel analysis.

Acknowledgements. We would like to thank Eneldo Loza Mencia, Lorenz Weizsäcker, and Johannes Fürnkranz from the Knowledge Engineering Group of the Technische Universität Darmstadt for their very helpful suggestions on data classification.

References
1. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template At-
tacks in Principal Subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 1–14. Springer, Heidelberg (2006)
2. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011),
http://www.csie.ntu.edu.tw/~cjlin/libsvm
3. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg
(2003)
4. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-
correcting output codes. J. Artif. Int. Res. 2, 263–286 (1995),
http://dl.acm.org/citation.cfm?id=1622826.1622834
5. Elaabid, M.A., Guilley, S., Hoogvorst, P.: Template attacks with a power model.
IACR Cryptology ePrint Archive 2007, 443 (2007)
6. Gierlichs, B., Lemke-Rust, K., Paar, C.: Templates vs. Stochastic Methods. In:
Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 15–29. Springer,
Heidelberg (2006)
7. Hastie, T., Tibshirani, R.: Classification by pairwise coupling (1998)
8. Hospodar, G., Mulder, E.D., Gierlichs, B., Verbauwhede, I., Vandewalle, J.: Least squares support vector machines for side-channel analysis. In: Constructive Side-Channel Analysis and Secure Design, COSADE (2011)
9. Kasper, M., Schindler, W., Stöttinger, M.: A stochastic method for security evaluation of cryptographic FPGA implementations. In: IEEE International Conference on Field-Programmable Technology (FPT 2010), pp. 146–154. IEEE Press (December 2010)
10. Kiely, T., Gielen, G.: Performance modeling of analog integrated circuits using
least-squares support vector machines. In: Proceedings of the Design, Automation
and Test in Europe Conference and Exhibition, vol. 1, pp. 448–453 (February 2004)

11. Kreßel, U.H.G.: Pairwise classification and support vector machines, pp. 255–268.
MIT Press, Cambridge (1999),
http://dl.acm.org/citation.cfm?id=299094.299108
12. Lemke-Rust, K., Paar, C.: Analyzing Side Channel Leakage of Masked Implemen-
tations with Stochastic Methods. In: Biskup, J., López, J. (eds.) ESORICS 2007.
LNCS, vol. 4734, pp. 454–468. Springer, Heidelberg (2007)
13. Lerman, L., Bontempi, G., Markowitch, O.: Side channel attack: an approach based
on machine learning. In: Constructive Side-Channel Analysis and Secure Design,
COSADE (2011)
14. Mohamed, M.S.E., Bulygin, S., Zohner, M., Heuser, A., Walter, M.: Improved algebraic side-channel attack on AES. Cryptology ePrint Archive, Report 2012/084 (2012)
15. Rechberger, C., Oswald, E.: Practical Template Attacks. In: Lim, C.H., Yung, M.
(eds.) WISA 2004. LNCS, vol. 3325, pp. 440–456. Springer, Heidelberg (2005)
16. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Chan-
nel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659,
pp. 30–46. Springer, Heidelberg (2005)
17. Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector
algorithms. Neural Comput. 12, 1207–1245 (2000),
http://dl.acm.org/citation.cfm?id=1139689.1139691
18. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Reg-
ularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
19. Standaert, F.X., Malkin, T.G., Yung, M.: A unified framework for the analysis of
side-channel key recovery attacks (extended version). Cryptology ePrint Archive,
Report 2006/139 (2006)
20. Weston, J., Watkins, C.: Multi-class support vector machines (1998)
21. Wu, T.F., Lin, C.J., Weng, R.C.: Probability estimates for multi-class classification
by pairwise coupling. Journal of Machine Learning Research 5, 975–1005 (2003)
Author Index

Aubert, Alain 151
Bauer, Sven 82
Bayon, Pierre 151
Bossuet, Lilian 151
Coron, Jean-Sébastien 69
Danger, Jean-Luc 183
Da Rolt, Jean 89
Das, Amitabh 89
de la Torre, Eduardo 39
Di Natale, Giorgio 89
Dubrova, Elena 54
Endo, Takashi 105
Fischer, Viktor 151, 167
Flottes, Marie-Lise 89
Giraud, Christophe 69
Guilley, Sylvain 183
Guo, Shize 231
He, Wei 39
Heuser, Annelie 249
Hoogvorst, Philippe 183
Hutter, Michael 1, 17
Ji, Keke 231
Jovanovic, Philipp 120
Kasper, Michael 215
Kirschbaum, Mario 1
Korak, Thomas 17
Kreuzer, Martin 120
Krüger, Alexander 199
Liu, Huiying 231
Mangard, Stefan 1
Mansouri, Shohreh Sharif 54
Maurine, Philippe 151
Murdica, Cédric 183
Naccache, David 183
Nishide, Takashi 135
Plos, Thomas 1, 17
Polian, Ilia 120
Poucheret, François 151
Prouff, Emmanuel 69
Renner, Soline 69
Riesgo, Teresa 39
Rivain, Matthieu 69
Robisson, Bruno 151
Rouzeyre, Bruno 89
Sakurai, Kouichi 135
Schmidt, Jörn-Marc 1
Shi, Zhijie 231
Stöttinger, Marc 215
Vadnala, Praveen Kumar 69
Verbauwhede, Ingrid 89
Vuillaume, Camille 105
Wagner, Mathias 33
Wang, Tao 231
Wooderson, Paul 105
Zhang, Fan 231
Zhao, Liang 135
Zhao, Xinjie 231
Zohner, Michael 215, 249
