Cryptacus 2018 Paper 7

Towards Efficient and Reliable
Side Channel Evaluations at Design Time
Danilo Šijačić, Josep Balasch, Bohan Yang, and Ingrid Verbauwhede

imec-COSIC, KU Leuven, Belgium
Email: name.surname@esat.kuleuven.be
Abstract: Models and tools developed by the semiconductor community have matured over decades of use. As a result, hardware simulations can yield highly accurate pre-silicon estimates for e.g. timing and area figures. In this work we design, implement, and evaluate a highly efficient framework that combines a largely automated full-stack standard-cell design flow with state of the art techniques for side channel analysis. We show how this approach can be used to efficiently evaluate side channel leakage prior to chip manufacturing. Moreover, it is independent of the underlying countermeasure and it can be applied starting from the earliest stages of the design flow. In addition to the description of our toolchain we provide experimental validation through assessment of the side channel security of representative cryptographic circuits. We discuss aspects related to the performance, scalability, and utility of our approach. In particular, we show that our toolchain can evaluate information leakage with 1 million simulated traces in less than 4 hours using a single desktop workstation, for a design larger than 100kGE.

1. Introduction

1.1. Motivation

Side Channel Attacks (SCA) are acknowledged as major threats to cryptographic implementations. Unlike conventional cryptanalysis techniques stemming from mathematics, SCA leverage information that leaks through inherent physical channels. A very popular exploitable channel is the instantaneous power consumption (IPC) [1] of a device. An IPC measurement, or power trace, carries information about the values and operations internally processed by a circuit, including cryptographic keys. SCA security assessment is ultimately done after a device is manufactured. Vulnerabilities discovered through millions of post-silicon measurements cause major setbacks, and such measurements can expose flaws that require a complete redesign. In this context, simulations rise as an attractive alternative to assess the security of cryptographic implementations at design time. Simulation techniques for typical hardware design constraints are long-studied and well-integrated into Electronic Design Automation (EDA) tools. Hence they can provide remarkably accurate area, delay, and power estimates even in the earliest design stages. We aim to leverage this knowledge and show how it can be efficiently combined with the SCA evaluation techniques to diagnose SCA leakage starting from the earliest design steps.

1.2. Related Work

The topic of IPC simulation in the context of SCA has been previously addressed in the literature. The generation and analysis of IPC estimates is in fact the prevalent approach to evaluate secure logic styles. To name a few, Tiri and Verbauwhede [2] target an AES core implemented in Wave Dynamic Differential Logic (WDDL) [3]; Regazzoni et al. [4] target instruction set extensions in MOS Current Mode Logic; custom design flows [5] and models [6], [7], [8] are tailored for specific cases. It is understood from these works that different balances of the simulation accuracy/time trade-off influence the security assessment. Nevertheless, to the best of our knowledge there is no information on the computational cost of these simulations, nor detailed insight into how they should be carried out.

1.3. Our Contributions

We address SCA evaluations at design time in a wholesome and methodical manner. First, to make this approach practical, we develop, implement, and document a configurable and extensible framework. It combines state of the art EDA tools for synthesis and simulation with our custom parsers and analysis tools written in C. Secondly, we experimentally validate our toolchain by applying it to a number of protected circuits representative of cryptographic implementations. Lastly, we apply our toolchain to analyse a larger circuit and provide a discussion of our toolchain and proposed methodology.

2. Design Time Evaluation Framework

Precision of modern EDA models and absence of noise allow simulations intimate observations of the target circuit, otherwise unattainable using measurements. Available early in the design flow, simulations may greatly reduce time to market and make the entire design process less error prone. Hence SCA evaluations at design time can be of great help to digital designers. Nevertheless, simulations cannot capture manufacturing defects and variations. For this reason they are not a replacement for lab evaluations, but a valuable design aid.
2.1. Design Rationale

We focus on standard-cell (SC) design flow, as the most widely used in practice. SC libraries are collections of basic (e.g., AND2, OR2, INV), as well as more complex (e.g., MUX21) cells. They entail different physical “views” of SCs that are used for synthesis, simulation and optimization. SC design flow can be generally divided into three design stages: behavioral, structural, and physical. In the behavioral stage, the functionality of a design is commonly captured using hardware description languages. In the structural stage, the design is mapped to a set of SCs from a library through the processes of logic synthesis and library translation. In the physical stage, the design is brought to its final, post-layout form through the process of physical synthesis. After each synthesis step, design estimates are obtained by means of simulation. Designs proceed to the next stage only once they fit all constraints. If this is not the case, designers can immediately proceed to fix their design. The goal of our work is to incorporate SCA evaluations at design time in the same manner.

Information leakage estimates obtained at different abstraction layers need to be analyzed in order to assess the security of a circuit. We opt for a lightweight method. The Test Vector Leakage Assessment (TVLA) methodology presented in [9] uses the t-test distinguisher to detect statistical dependencies between sensitive data and side channel information contained in IPC measurements. The test analyzes two classes of measurements (partitioned according to sensitive information) to assess whether their means are different. A so-called t value is calculated by applying Welch's t-test. If the t value is outside the ±4.5 range, the test rejects the null hypothesis with confidence greater than 99.999% for large numbers of measurements, i.e. it indicates that the means of the two sets at a particular sample are distinguishable, thus highlighting the existence of side-channel leakage.

2.2. Framework Description

Our framework is depicted in Figure 1. Through a simple Command Line Interface (CLI) it allows traversal of entire design and analysis stages.

[Figure 1 omitted: block diagram connecting the CLI, the SM with its five parameter sets, the handlers (LS, LT, PS, LSIM, PSIM, SSIM), the analyzers (TVLA, DoM, ...), the parsers (LP, PP, SP), and the generators (TG, DG, SG), with a legend distinguishing C tools, Python tools, user I/O, textual, binary, and parametrized files.]
Figure 1: High level architecture of the toolchain.

The Session Manager (SM) is responsible for keeping track of sets of parameters of the design. We distinguish 5 types of parameters: (1) simulation, (2) design constraints, (3) design resources, (4) physical constraints, and (5) power models.

Handlers wrap commercial EDA tools, thereby facilitating design and simulation stages in a streamlined and automated manner. They produce TCL scripts (e.g. run.LS, par.LSIM) used to control the underlying tools. The set of commercial tools we currently rely on, together with their acronyms, is given in Table 1. The traversal of design stages is depicted in Figure 2. The initial behavioral (BEH) stage enables design capture and functional simulation of a circuit description in e.g. Verilog. Logic functionality is synthesized (SYN) using generic logic gates. This functionality is then mapped to library cells of a concrete library, to form a gate-level netlist (GLN). The Placed and Routed (PAR) design stage comes last, before the tape-out.

TABLE 1: List of commercial EDA tools used.
Acronym   Function              Tool
LS        Logic synthesis       Synopsys Design Compiler
LT        Library translation   Synopsys Design Compiler
PS        Physical synthesis    Cadence Innovus
LSIM      Logic simulation      Mentor Graphics ModelSim
PSIM      Physical simulation   Synopsys PrimeTime with PX plugin
SSIM      SPICE simulation      Synopsys HSPICE

[Figure 2 omitted: design.v traverses the design stages BEH, SYN, GLN, and PAR through the commercial tools LS, LT, and PS (scripts run.LS, run.LT, run.PS); SCA evaluation and SCA closure are attached to the SYN, GLN, and PAR stages (scripts syn.LSIM, gln.LSIM, par.LSIM).]
Figure 2: SC design flow stages using our toolchain.

Generators are utility tools which primarily aid the automation. The test bench generator (TG) produces test benches based on Verilog code of the design (e.g. tb.v) and parameters obtained from the SM. A Delay Generator (DG) is additionally used to annotate generic netlists at SYN design stage (see ∆-delay model in Section 2.3). We design a set of parsers optimized to process and store IPC data in an SCA-friendly manner, and implement them in C. Regardless of the type of data we parse, Logic Parsers (LP), Power Parsers (PP), and SPICE Parsers (SP) output a Power Frame File (PFF), a custom binary format. We use analyzers to process PFF files. Each sub-module implements a technique necessary to assess the SCA security of the circuit, e.g. TVLA or DPA. In particular for TVLA, we follow the roadmap due to Schneider and Moradi [10] which enables estimation and computation of distribution parameters in a single pass. The reason why we abstain from applying the faster leakage assessment of Reparaz et al. [11] is the prohibitive cost of storing
histograms in our setting, i.e. with high sampling rates and more than 8 quantization bits.

2.3. Simulation Models

Composite Current Sources (CCS) are state of the art SC models used for timing and power simulations with accuracy comparable to SPICE. For each SC they capture a high level of detail, such as asymmetries of transitions caused by different input pins of a SC. We rely on these models in GLN and PAR stages. In earlier stages we use ∆-delay models. For each SC, regardless of its functionality, size, or input transition, the output is delayed by ∆ such that:

1) ∆ = 0, also known as zero-delay model;
2) ∆ > 0;
3) ∆ = f(F, δ), where F is the cell's fanout, and δ is the delay when F = 1.

The selection of f allows the model to be refined. In this work we opt for a simple version given in Equation 1,

$$\Delta = \delta\,(1 + (F - 1)\,\theta), \tag{1}$$

where θ is a scaling factor with 0.05 ≤ θ ≤ 0.20. Commonly the number of toggles is used to represent IPC in the SCA community. It is a very crude model based on the predominance of dynamic power consumption. It is completely symmetrical in the sense that each transition always results in a Dirac-like pulse of unitary height. To address the asymmetry between rising and falling edges of a SC, we ameliorate this model as shown in Equation 2,

$$P_{\text{transition}} = \begin{cases} 1 & \text{for rising edge,} \\ 1 - \alpha & \text{for falling edge,} \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

with parameter −1.0 ≤ α ≤ 1.0. We refer to it as Marching-Sticks Model (MSM) for its graphical interpretation. In a sense, we can relate MSM to CCS power models in the same manner as ∆-delay models relate to the timing ones.

2.4. Simulation Methodology

Our methodology sits directly on top of every stage in the traditional SC design flow. While these stages may be technically alike to the functional simulations for timing closure (e.g., use same tools), the rationale behind them is completely different. In the traditional design flow designers care about the values in the steady state, i.e. after all transitions have settled. In contrast, we focus on observing changes in the dynamic IPC caused by an input transition. In order to capture all possible transitions of a circuit with n input bits we need to simulate $2^{2n} - 2^{n}$ transitions. We call this type of simulation Exhaustive Dynamic Power Capturing (EDPC). We find this feasible for circuits with up to 16 input bits. We primarily use EDPC for rigorous evaluation of smaller design blocks, such as S-Boxes or masked AND gates. For larger designs, we use (pseudo-)random transitions.

3. Experimental Validation

To validate our toolchain we perform a series of experiments using different representative cryptographic circuits, design stages, and model flavors. Due to space limitations we present only the first-order secure TI PRESENT S-box by Poschmann et al. [12]. The 4-bit PRESENT S-box is decomposed into two quadratic S-boxes F and G, which are split into three shares in accordance with the principles of Threshold Implementations (TI). The overall design requires three shares per variable and consumes no internal randomness. The total number of inputs (resp. outputs) is thus 12 (4 sensitive bits masked with 3 shares), leading to more than 16 million measurements for all possible transitions. We perform all experiments using a single thread of the Intel i7-7700k. We use a 45nm open source SC library from NanGate. Simulation resolution is set to 1ps.

Combining EDPC and first-order fix vs. random TVLA (see footnote 1) for this circuit results in a constant t-value equal to zero using MSM power models. As expected, this design shows no signs of leakage in the first-order moment. Figure 3 shows the result of the first-order TVLA given that one share is turned off (input value fixed to zero). As expected, breaking the defining property of TI yields an insecure implementation, caught by our tools. Additionally, Figure 3 shows how well MSM compares to CCS models. Figure 4 shows the result of the second-order TVLA for the first 1 million transitions. As expected, this implementation is not secure in higher orders.

Footnote 1: The fix set corresponds to the case where all 4 unshared input bits equal zero.

[Figure 3 omitted: plot of t-score versus t [ps] over 0 to 3500 ps, with one curve for CCS and one for MSM.]
Figure 3: First order TVLA, MSM vs. CCS at GLN stage, $2^{24} - 2^{12}$ traces, first share off.

4. Discussion

For this approach to be practical it has to yield results in a timely manner, such that designers are delayed minimally. Table 2 gives runtimes of simulations and processing of IPC data per 1 million traces at PAR stage that involves fully extracted layout parasitics. These numbers are obtained using a single thread. Due to the embarrassingly parallel nature of these simulations, computations can be broken into smaller batches (e.g. 10k traces large), and fed to the LSIM → LP → TVLA pipeline. Since all TVLA evaluations are performed on the fly using [10], in case a faulty implementation is
analyzed, designers may get feedback well before all the traces are run. To test the scalability of this approach we apply this recipe to a fully unrolled, 127kGE large AES circuit. We find the obtained numbers favorable. Namely, under these assumptions 1 million traces at 1ps resolution are evaluated in less than 36 thread hours. This maps to 4 hours on a single $300 CPU for a design that exceeds 100kGE. Since all models are deterministic this can easily be run on multiple machines, making it a very efficient and cost-effective solution for side channel evaluations at design time.

[Figure 4 omitted: plot of t-score versus t [ps] over 0 to 3500 ps.]
Figure 4: Second order TVLA, MSM at PAR stage, 1 million traces.

TABLE 2: Runtime per 1 million traces, per 1 thread of i7-7700k, for PAR stage simulations.
Design    Area [kGE]   Period [ps]   Simulate [h]   Process [h]
PRESENT   0.35         3600          0.01           0.01
AES-128   127.18       36000         30.33          5.58

5. Conclusions and Future Work

In this work we present a comprehensive and extensible framework for SCA evaluation at design time. Based on state of the art EDA tools and SCA evaluation methodologies, it combines them in a methodical and automated manner. We benchmark the performance of our complete toolchain to show its suitability in testing realistic cryptographic designs, and argue its feasibility for real-world use even when relying on a single desktop workstation. Our next steps are to gain insight on the influences of different SC libraries and to explore the effects of individual parameters in more detail. Furthermore, we plan to study the fine implications of simulation artifacts on the manufactured chip through comparison to actual measurements in a lab setting. We aim to use the insights gained along this research to refine existing models for the needs of more efficient and reliable SCA evaluation at design time.

Acknowledgments

This project has received partial funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 643161, and HECTOR grant agreement No. 644052; as well as partial funding from Intel.

References

[1] P. C. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Advances in Cryptology - CRYPTO ’99 (M. J. Wiener, ed.), vol. 1666 of LNCS, pp. 388–397, Springer, 1999.
[2] K. Tiri and I. Verbauwhede, “Simulation models for side-channel information leaks,” in Design Automation Conference - DAC 2005 (W. H. J. Jr., G. Martin, and A. B. Kahng, eds.), pp. 228–233, ACM, 2005.
[3] K. Tiri and I. Verbauwhede, “A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation,” in Design, Automation and Test in Europe - DATE 2004, pp. 246–251, IEEE Computer Society, 2004.
[4] F. Regazzoni, A. Cevrero, F. Standaert, S. Badel, T. Kluter, P. Brisk, Y. Leblebici, and P. Ienne, “A design flow and evaluation framework for DPA-resistant instruction set extensions,” in Cryptographic Hardware and Embedded Systems - CHES 2009 (C. Clavier and K. Gaj, eds.), vol. 5747 of LNCS, pp. 205–219, Springer, 2009.
[5] K. Tiri and I. Verbauwhede, “A digital design flow for secure integrated circuits,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1197–1208, 2006.
[6] M. Aigner, S. Mangard, F. Menichelli, R. Menicocci, M. Olivieri, T. Popp, G. Scotti, and A. Trifiletti, “Side channel analysis resistant design flow,” in 2006 IEEE International Symposium on Circuits and Systems, pp. 4 pp.–2912, May 2006.
[7] A. Moradi, M. Salmasizadeh, M. T. M. Shalmani, and T. Eisenbarth, “Vulnerability modeling of cryptographic hardware to power analysis attacks,” Integration, the VLSI Journal, vol. 42, no. 4, pp. 468–478, 2009.
[8] D. Fujimoto, M. Nagata, T. Katashita, A. T. Sasaki, Y. Hori, and A. Satoh, “A fast power current analysis methodology using capacitor charging model for side channel attack evaluation,” in Hardware-Oriented Security and Trust - HOST 2011, pp. 87–92, IEEE, 2011.
[9] J. Cooper, E. DeMulder, G. Goodwill, J. Jaffe, G. Kenworthy, and P. Rohatgi, “Test Vector Leakage Assessment (TVLA) methodology in practice.” International Cryptographic Module Conference, 2013. http://icmc-2013.org/wp/wp-content/uploads/2013/09/goodwillkenworthtestvector.pdf.
[10] T. Schneider and A. Moradi, “Leakage assessment methodology - A clear roadmap for side-channel evaluations,” in Cryptographic Hardware and Embedded Systems - CHES 2015 (T. Güneysu and H. Handschuh, eds.), vol. 9293 of LNCS, pp. 495–513, Springer, 2015.
[11] O. Reparaz, B. Gierlichs, and I. Verbauwhede, “Fast leakage assessment,” in Cryptographic Hardware and Embedded Systems - CHES 2017 (W. Fischer and N. Homma, eds.), vol. 10529 of LNCS, pp. 387–399, Springer, 2017.
[12] A. Poschmann, A. Moradi, K. Khoo, C. Lim, H. Wang, and S. Ling, “Side-channel resistant crypto for less than 2,300 GE,” J. Cryptology, vol. 24, no. 2, pp. 322–345, 2011.
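
As an illustration of the fixed vs. random test summarized in Section 2.1, the following is a minimal Python sketch of Welch's t-test at a single sample point, with class statistics accumulated in one pass. It is not the toolchain's C implementation: the names `RunningStats`, `stats_of`, `welch_t`, and `leaks` are hypothetical, and the one-pass update shown is the standard Welford recurrence, used here only as a stand-in for the incremental formulas of [10].

```python
import math


class RunningStats:
    """One-pass (Welford) mean and sample variance for one sample point."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def stats_of(values):
    """Hypothetical helper: accumulate a sequence of values in one pass."""
    s = RunningStats()
    for v in values:
        s.update(v)
    return s


def welch_t(a, b):
    """Welch's t statistic between two non-empty classes of measurements."""
    diff = a.mean - b.mean
    denom = math.sqrt(a.variance / a.n + b.variance / b.n)
    if denom == 0.0:
        # Noise-free simulations can produce zero variance in both classes.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / denom


def leaks(t, threshold=4.5):
    """True if |t| exceeds the TVLA threshold (4.5 in the text)."""
    return abs(t) > threshold


if __name__ == "__main__":
    fixed = stats_of([10, 11, 12, 13, 14])   # toy values, not real traces
    rand = stats_of([0, 1, 2, 3, 4])
    t = welch_t(fixed, rand)
    print(t, leaks(t))
```

In a real trace set, one such pair of accumulators would be kept per time sample, so that traces can be discarded as soon as they are processed.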

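The simulation models of Section 2.3 and the exhaustive transition count of Section 2.4 can be sketched in the same spirit. The Python below is an illustrative reading of Equation 1, Equation 2, and the EDPC count, under the assumption that a transition is an ordered pair of distinct input values; all function names are hypothetical and none of this is taken from the authors' tools.

```python
from itertools import product


def edpc_count(n):
    """Number of transitions between distinct n-bit inputs: 2^(2n) - 2^n."""
    return 2 ** (2 * n) - 2 ** n


def edpc_transitions(n):
    """Yield every ordered pair (before, after) of distinct n-bit values."""
    for before, after in product(range(2 ** n), repeat=2):
        if before != after:
            yield before, after


def fanout_delay(delta, fanout, theta):
    """Equation 1: delay grows linearly with fanout, scaled by theta."""
    return delta * (1 + (fanout - 1) * theta)


def msm_power(before_bit, after_bit, alpha):
    """Equation 2 (MSM): pulse height for one net changing value."""
    if before_bit == 0 and after_bit == 1:
        return 1.0            # rising edge
    if before_bit == 1 and after_bit == 0:
        return 1.0 - alpha    # falling edge
    return 0.0                # no transition


if __name__ == "__main__":
    # 12 input bits, as for the three-share PRESENT S-box in Section 3.
    print(edpc_count(12))
    print(fanout_delay(delta=10.0, fanout=3, theta=0.1))
    print(msm_power(1, 0, alpha=0.25))
```

Setting `alpha` to 0 in `msm_power` recovers the plain toggle-count model, and `fanout_delay` with `fanout=1` returns `delta` itself, matching the definition of δ.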