
Information Security – Theory vs. Reality
0368-4474, Winter 2013-2014

Lecture 6:
Machine Learning Approach
to Side-Channel Attacks
Guest lecturer: Guy Wolf

Machine Learning Approach to Side-Channel Attacks

Guy Wolf

Tel Aviv University

guy.wolf@cs.tau.ac.il

2013



Outline
1 Introduction
Side-Channel Leaks
Machine Learning
2 Quick Recap: Correlation Power Analysis
3 Alternative Attack: Template Power Analysis
Dimensionality Reduction
Classification
Power Trace Alignment
4 Information & Activity Leaks
Hidden Markov Model
Acoustic Analysis
5 Conclusion



Side-Channel Leaks

[Diagram: side channels leaking from a device under attack]
Power Consumption
Probing
Response Times
Temperatures
Acoustics
Electromagnetic Radiation

Processing Leaked Data

[Diagram: leaked data - vectors in R^(O(100+)) - fed into a processing/analysis pipeline]


Machine Learning

Machine learning encompasses tools that perform smart analysis of data, such as:
Discovery of useful, possibly unexpected, patterns in data
Non-trivial extraction of implicit, previously unknown and
potentially useful information from data
Exploration & analysis, by automatic or semi-automatic means,
of large quantities of data in order to discover meaningful
patterns
Common tasks include:
Dimensionality reduction
Clustering & Classification
Regression & Out-of-sample extension



Quick Recap:
Correlation Power Analysis



Quick Recap: Correlation Power Analysis

[Diagram: Key k and Plaintext p enter the cipher, producing the Ciphertext; an internal value y = f(k, p) is computed along the way]

Typical assumption:
power consumption correlates with the Hamming weight of y


Quick Recap: Correlation Power Analysis
Plaintexts → Traces → Internal values → Hamming weights
  p1 → (trace) → 00101001 → 3
  ...
  pn → (trace) → 10011011 → 5


Quick Recap: Correlation Power Analysis

Traces matrix (plaintexts × times) and Hamming-weight matrix (plaintexts × key candidates):
each row of the traces matrix is the recorded trace for one plaintext;
each entry of the Hamming-weight matrix is the Hamming weight of the
internal value for the designated plaintext and key candidate.

Correlation indicates the correct time and key


Quick Recap: Correlation Power Analysis
Pearson coefficient (linear correlation):
  cov(x, y) / (std(x) · std(y))

Mutual information (general statistical dependence):
  Entropy(x) + Entropy(y) − Entropy(x, y)

Correlation indicates the correct time and key
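A minimal NumPy sketch of this correlation step (not code from any of the cited papers; the intermediate-value function, the array names, and the 8-bit key-byte loop are assumptions):

```python
import numpy as np

def hamming_weight(v):
    return bin(int(v)).count("1")

def cpa(traces, plaintexts, intermediate):
    """Return (|correlation|, key candidate, time index) with the highest Pearson correlation."""
    centered = traces - traces.mean(axis=0)                  # center every time column
    col_norms = np.linalg.norm(centered, axis=0) + 1e-12
    best = (0.0, None, None)
    for k in range(256):                                     # one key-byte candidate at a time
        hw = np.array([hamming_weight(intermediate(k, p)) for p in plaintexts], dtype=float)
        hw -= hw.mean()
        # Pearson correlation of the HW column against every column of the traces matrix
        corr = hw @ centered / ((np.linalg.norm(hw) + 1e-12) * col_norms)
        t = int(np.abs(corr).argmax())
        if abs(corr[t]) > best[0]:
            best = (abs(corr[t]), k, t)
    return best
```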



Alternative Attack:
Template Power Analysis



Alternative Attack: Template Power Analysis
Training: Learn traces of many plaintexts & keys

[Diagram: recorded traces grouped by the plaintext (e.g., plaintext = p1) and key (e.g., key = k0, key = k1) that generated them]

Online: Attack using a single trace
Alternative Attack: Template Power Analysis
Representing power traces as vectors

Essentially, power traces are high-dimensional vectors of m = O(100+) measurements:

  trace x = ( x[1], x[2], ..., x[m] ) ∈ R^m

Traces can be analyzed as vectors in the Euclidean space R^m, with ℓ2 norms & distances
Somewhat arbitrary representation, but effective and convenient


Alternative Attack: Template Power Analysis
Learning & analyzing trace vectors

Template assumption: the positions of traces in R^m (with the ℓ2 norm)
correlate with the plaintext & key values that generated them

[Diagram: trace vectors clustered by the (plaintext, key) pairs (p0, k0), (p1, k1), (p2, k2), (p3, k3) that generated them]

Correlation is either directly seen or via some internal value


Ideally, correlation is expressed as clustering by plaintext & key
Theoretically, machine learning tools can be used to detect clustering
Alternative Attack: Template Power Analysis
Example: classic TPA attack using Gaussian statistical modeling¹

Model the power consumption of encrypting plaintext p with key k as a
random variable X(k,p) ~ N(μ(k,p), Σ(k,p)):
  X(k,p) is drawn from a multidimensional normal (Gaussian) distribution
  μ(k,p) ∈ R^m is the mean power consumption for k & p
  Σ(k,p) is the m × m noise covariance matrix for k & p

The likelihood of a trace x originating from k & p is
  L(k,p)(x) = ((2π)^m |Σ(k,p)|)^(−1/2) exp( −(1/2) (x − μ(k,p))^T Σ(k,p)^(−1) (x − μ(k,p)) )

Attack a single trace (with known plaintext p):
  Compute the likelihood of it originating from each key candidate (with p)
  Choose the key candidate with maximum likelihood

¹ “Template Attacks”, 2003, by S. Chari, J.R. Rao, and P. Rohatgi
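A minimal sketch of the maximum-likelihood decision above, assuming the templates (μ, Σ) were already estimated from training traces for the known plaintext p; the variable names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def template_attack(trace, templates):
    """templates: {key candidate k: (mu, Sigma)} estimated from training traces for the known plaintext."""
    best_key, best_loglik = None, -np.inf
    for k, (mu, sigma) in templates.items():
        loglik = multivariate_normal.logpdf(trace, mean=mu, cov=sigma)  # log L_(k,p)(x)
        if loglik > best_loglik:
            best_key, best_loglik = k, loglik
    return best_key          # maximum-likelihood key candidate
```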
Alternative Attack: Template Power Analysis
Curse of dimensionality

Unfortunately, directly analyzing high-dimensional vectors is usually
infeasible due to the “Curse of Dimensionality”.

Curse of Dimensionality
A general term for various phenomena that arise when
analyzing/organizing high-dimensional data.
Common theme - difficult/impractical/impossible to obtain statistical
significance due to the sparsity of the data in high dimensions
Causes poor performance/results of statistical methods compared to
low-dimensional data

Common solution - use dimensionality reduction methods and analyze the
resulting embedded space.
Example: only use the ℓ < m time indices that provide the highest differences
between mean power consumptions of different key-plaintext pairs (see the sketch below).
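One way to realize this example is to score every time index by how far apart the class means are and keep the top ℓ. A sketch under assumed array names (one reasonable realization, not a prescribed method):

```python
import numpy as np

def select_poi(traces, labels, l):
    """Keep the l time indices where the class means (per key-plaintext pair) differ the most."""
    classes = np.unique(labels)
    means = np.stack([traces[labels == c].mean(axis=0) for c in classes])  # (#classes, m)
    spread = means.max(axis=0) - means.min(axis=0)     # spread of the class means per time index
    return np.argsort(spread)[-l:]                     # the l most discriminating time indices
```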
Dimensionality Reduction
with
Principal Component Analysis



Dimensionality Reduction
Principal Component Analysis (PCA)

[Diagram: point cloud in 3D space]

Assume: avg = 0
Find: max variance directions


Dimensionality Reduction
Principal Component Analysis (PCA) - covariance matrix
[Diagram: the (traces × times) data matrix and its transpose multiply to give the (times × times) covariance matrix]

  cov(t1, t2) ≜ Σ_i trace_i[t1] · trace_i[t2]


Dimensionality Reduction
Principal Component Analysis (PCA) - spectral theorem

Spectral theorem applies to covariance matrices:
  λ_i φ_i = (Covariance matrix) φ_i,  with eigenvalue λ_i and eigenvector φ_i

[Diagram: SVD (Singular Value Decomposition) of the covariance matrix into its eigenvectors and eigenvalues]

Spectral Theorem: cov(t1, t2) = Σ_i λ_i φ_i(t1) φ_i(t2)
Dimensionality Reduction
Principal Component Analysis (PCA) - truncated SVD

[Plot: eigenvalues λ1, λ2, λ3, λ4, λ5, ... decay rapidly]

Many datasets (incl. power traces) have a decaying covariance spectrum
Dimensionality Reduction
Principal Component Analysis (PCA) - truncated SVD
[Diagram: the SVD of the covariance matrix, truncated to the leading eigenvectors (the principal components) and their eigenvalues]

Approximate the covariance matrix by truncating small eigenvalues from the SVD
Dimensionality Reduction
Principal Component Analysis (PCA) - example

Consider the simple case of traces that all lie on the same high-dimensional line
  The straight line is defined by a unit vector ψ, ‖ψ‖ = 1
  Points on the line are obtained by multiplying ψ by scalars
  The traces can be formulated as x_i = c_i ψ
  Covariance: cov(t1, t2) = Σ_i x_i[t1] x_i[t2] = Σ_i c_i ψ[t1] c_i ψ[t2]
              = (Σ_i c_i²) ψ[t1] ψ[t2] = ‖c‖² ψ(t1) ψ(t2),   where c ≜ (c1, c2, ...)

The covariance matrix has a single eigenvalue ‖c‖² and a single
eigenvector ψ, which defines the principal direction of the trace-vectors


Dimensionality Reduction
Principal Component Analysis (PCA) - example

[Diagram: points in 3D space; the first principal component φ1 = ψ points along the line they lie on. Each arrow λ_i φ_i has length given by its eigenvalue and direction given by its eigenvector]

principal components ⇒ max variance directions
Dimensionality Reduction
Principal Component Analysis (PCA) - projection

Projection on principal components:

[Diagram: the traces matrix is multiplied by the matrix of principal components to obtain the projected coordinates]
Dimensionality Reduction
Principal Component Analysis (PCA) - projection

Projection on principal components:

[Diagram: the 3D point cloud is projected onto λ1 φ1, mapping the 3D space to a 1D space]


Dimensionality Reduction
PCA algorithm

PCA algorithm:
1 Centering
2 Covariance
3 Eigendecomposition
4 Projection

Alternative method: Multi-Dimensional Scaling (MDS) - preserve
distances/inner-products with a minimal set of coordinates.

Short tutorial on PCA & MDS:
www.cs.haifa.ac.il/~rita/uml_course/lectures/PCA_MDS.pdf
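The four steps translate almost line for line into NumPy. A minimal sketch, assuming the traces are stacked as rows of an array; for the large m of real power traces a truncated SVD would normally replace the full covariance matrix:

```python
import numpy as np

def pca_project(traces, d):
    centered = traces - traces.mean(axis=0)            # 1. centering
    cov = centered.T @ centered / len(traces)          # 2. covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(cov)             # 3. eigendecomposition (ascending order)
    components = eigvecs[:, ::-1][:, :d]               #    keep the d leading principal components
    return centered @ components                       # 4. projection -> (n x d)
```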



Dimensionality Reduction
Summary

Traces (high-dim. vectors)  ↦ PCA ↦  Projected traces (low-dim. vectors)

Next task: how to find keys from the low-dimensional vectors?
Clustering & Classification
with
Support Vector Machine



Classification
Clustering

Cluster analysis
Clustering - the task of grouping objects such that objects in the
same cluster are more similar to each other than to those in other clusters.

Learning types
Unsupervised learning: trying to find hidden structures in unlabeled data.
Supervised learning: inferring functions from labeled training data.


Classification
Clustering & classification approaches

Classic TPA using Gaussian statistical models:
  The analysis considers many clusters (one for each key-plaintext pair)
  Clusters are assumed to correlate with normally distributed random
  variables X(k,p) ~ N(μ(k,p), Σ(k,p))
  Requires many traces for each key-plaintext pair to compute μ(k,p) & Σ(k,p)

Simplified bit clustering with Support Vector Machine (SVM):
  Classify each bit separately - only two classes are considered for each bit
  Requires fewer training traces than classic TPA - traces are grouped by
  bit values, not by the key value
  No statistical assumptions required - geometric classification using a
  separating hyperplane


Classification
Simplified bit clustering - only two classes

[Diagram: the clusters of (plaintext, key) pairs (p0, k0), (p1, k1), (p2, k2), (p3, k3) are split into two classes according to bit i of the key: 0 or 1]


Classification
Support Vector Machine (SVM) - separation with hyperplane

[Diagram: points labeled b = 0 and b = 1 in the plane; several different hyperplanes can separate the two classes]
Classification
Support Vector Machine (SVM) - quantifying robustness with margins

[Diagram: separating hyperplanes between the b = 0 and b = 1 classes with their margins; a larger margin means a more robust separation]
Classification
SVM formulation - hyperplane



[Diagram: a hyperplane through the origin with normal vector w; points on one side satisfy w · x > 0, points on the other side satisfy w · x < 0]
Classification
SVM formulation - hyperplane with margin



[Diagram: a hyperplane through the origin with normal vector w and margin α; one class satisfies w · x ≥ α, the other satisfies w · x ≤ −α]
Classification
SVM formulation - shifted hyperplane with margin

[Diagram: the hyperplane shifted to pass through a point u, with c = w · u; one class satisfies w · x − c ≥ α, the other satisfies w · x − c ≤ −α]
Classification
SVM algorithm

SVM training
Input:
  Points {x_i} from PCA of the traces
  Labels {b_i} according to the attacked bit:
    b_i = +1 if the bit is 0,  b_i = −1 if the bit is 1
Solve the quadratic program (e.g., using Lagrange multipliers):
  Find max α  (with ‖w‖ = 1)
  s.t.  b_i (w · x_i − c) ≥ α
Output: the solution (w, c, α)


Classification
SVM algorithm

SVM classifier
Input:
  New point x from the PCA projection of the attacked trace
  The solution (w, c, α) from SVM training
Classify by the value of w · x − c:
  w · x − c ≥ α:   attacked bit is 0
  0 ≤ w · x − c < α:   attacked bit is probably 0
  −α < w · x − c < 0:   attacked bit is probably 1
  w · x − c ≤ −α:   attacked bit is 1
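A minimal sketch of this per-bit classification using scikit-learn's linear SVC in place of the hand-written quadratic program; the array names are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def train_bit_classifier(proj, bit_labels):
    """proj: PCA-projected training traces; bit_labels: value of the attacked key bit per trace."""
    clf = SVC(kernel="linear")          # maximum-margin separating hyperplane (w, c)
    clf.fit(proj, bit_labels)
    return clf

def classify_bit(clf, attack_point):
    # the sign of w . x - c decides the bit; decision_function gives the signed distance
    return int(clf.predict(attack_point.reshape(1, -1))[0])
```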



Power Analysis with PCA & SVM
based on a recent paper²

[Diagram: traces → PCA → one SVM per bit (1st bit, ..., 7th bit)]

Use a single PCA & multiple SVMs (one per bit)
to learn traces (in the training phase) and attack a key byte; a sketch of this pipeline follows.

² “Side channel attack: an approach based on machine learning”, 2011,
by L. Lerman, G. Bontempi, and O. Markowitch
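A minimal sketch of the pipeline structure (one shared PCA, one linear SVM per bit of the attacked key byte); scikit-learn, the choice of d, and the variable names are assumptions, not details from the paper:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def train_pipeline(traces, key_bytes, d=10):
    pca = PCA(n_components=d).fit(traces)              # single PCA shared by all bits
    proj = pca.transform(traces)
    svms = [SVC(kernel="linear").fit(proj, (key_bytes >> b) & 1) for b in range(8)]
    return pca, svms

def attack_key_byte(pca, svms, trace):
    x = pca.transform(trace.reshape(1, -1))
    bits = [int(clf.predict(x)[0]) for clf in svms]    # one SVM per bit
    return sum(bit << b for b, bit in enumerate(bits)) # reassemble the key byte
```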
Power Analysis with PCA & SVM
based on a recent paper²

PCA results (colored by the specified bit values):
[Plots: one projection where the bit classes separate clearly (effective SVM attack) and one where they overlap (SVM attack will fail)]

Empirical results (on 3DES): descending success rate by bit position,
from ∼95% for the 7th bit down to ∼50% for the 1st bit

² “Side channel attack: an approach based on machine learning”, 2011,
by L. Lerman, G. Bontempi, and O. Markowitch
Power Trace Alignment with DTW



Power Trace Alignment
Motivation

Power traces can be misaligned for several reasons, such as:
  Synchronization issues between the sampling devices and the tested hardware
  Clock variabilities and instabilities
  Intentional countermeasures such as delays and modulations

Misaligned traces ⇒ incorrect/inaccurate correlations ⇒ wrong
classification and useless attacks


Power Trace Alignment
Naïve approach: static alignment by time offset

Theoretically:
[Diagram: two traces that differ only by a constant shift]
Use a time offset to align the traces

Realistically:
[Diagram: traces whose features drift by different amounts at different times]
Which offset to use?


Power Trace Alignment
Machine-learning approach: adaptive alignment by Dynamic Time Warp (DTW)

[Plots: misaligned traces warped onto a common time axis by DTW]


Power Trace Alignment
Using pairwise alignment in an attack

Training:
1 Acquire power traces
2 Choose reference trace (e.g., arbitrarily or use mean of all traces)
3 Align each trace to the reference trace using the pairwise
alignment
4 Apply training algorithm (e.g., PCA & SVM) to the aligned
traces

Online:
1 Acquire trace from attacked hardware
2 Align trace to the reference trace (from the training) using
pairwise alignment
3 Apply classification algorithm (e.g., PCA & SVM)
Power Trace Alignment
Pairwise alignment
[Diagram: Trace x along the horizontal axis, Trace y along the vertical axis]
Pairwise diff. matrix:
each cell (i, j) holds the difference between two trace entries, x[i] − y[j]
Power Trace Alignment
Pairwise alignment

Alignment path: get from the start to the end of both traces
[Diagram: different paths through the pairwise difference matrix]

1:1 alignment: trivial - nothing is modified by the alignment
  Aligned distance: Σ (diff)² = ‖x − y‖²
Time offset: works sometimes, but not always optimal
  Aligned distance: Σ (diff)² = ?
Extreme offset: complete misalignment - the worst alignment alternative
  Aligned distance: Σ (diff)² = ‖x‖² + ‖y‖²
Optimal alignment: optimize the alignment by minimizing the aligned distance
  Aligned distance: Σ (diff)² = min


Power Trace Alignment
Finding optimal pairwise alignment

Dynamic Programming
A method for solving complex problems by breaking them down
into simpler subproblems.
Applicable to problems exhibiting the properties of overlapping
subproblems and optimal substructure.
Better performance than naïve methods that do not utilize the
subproblem overlap.
Power Trace Alignment
Dynamic Time Warp (DTW)

Basic DTW Algorithm:
For each trace-time i and for each trace-time j:
  Set cost ← (x[i] − y[j])²
  Set the optimal distance at stage [i, j] to:
    DTW[i, j] ← cost + min( DTW[i, j−1], DTW[i−1, j−1], DTW[i−1, j] )

Optimal distance: DTW[m, n] (where m & n are the lengths of the traces).
Optimal alignment: backtrack the path leading to DTW[m, n] via the
min-cost choices of the algorithm
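A direct, quadratic-time sketch of this recurrence (array names assumed); the paper discussed next uses a coarse-to-fine variant to reach quasi-linear cost:

```python
import numpy as np

def dtw_distance(x, y):
    m, n = len(x), len(y)
    dtw = np.full((m + 1, n + 1), np.inf)
    dtw[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            dtw[i, j] = cost + min(dtw[i, j - 1],       # stretch y
                                   dtw[i - 1, j - 1],   # match
                                   dtw[i - 1, j])       # stretch x
    return dtw[m, n]   # optimal aligned distance; backtracking the min-cost choices gives the path
```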



Power Analysis with DTW-based Alignment
based on a recent paper³

Use coarse-grained matrices to avoid bad/unreasonable portions:
[Figures: coarse-grained difference matrices, progressively refined]
Drill down by fine graining to approximate the optimal
alignment with quasi-linear time & space requirements

³ “Improving Differential Power Analysis by Elastic Alignment”, 2011,
by J.G.J. van Woudenberg, M.F. Witteman, and B. Bakker
Power Analysis with DTW-based Alignment
based on a recent paper³

Experimental results:
Compare correlation DPA using 3 alignment methods:
  Static: simple static alignment by time offset
  SW: replace trace entries with the average of a sliding window
      (not strictly an alignment method, but simple & sometimes effective)
  DTW: elastic alignment with DTW

³ “Improving Differential Power Analysis by Elastic Alignment”, 2011,
by J.G.J. van Woudenberg, M.F. Witteman, and B. Bakker
Power Analysis with DTW-based Alignment
based on a recent paper³

[Plots: correlation DPA results with Static, SW, and DTW alignment on DES with a stable clock, and with SW and DTW alignment on DES with an unstable clock]

³ “Improving Differential Power Analysis by Elastic Alignment”, 2011,
by J.G.J. van Woudenberg, M.F. Witteman, and B. Bakker
Analyzing Non-Cryptographic Leaks
with
Hidden Markov Model



Information & Activity Leaks

Consider a secret sequence of activities and leaked information with the
following properties:
  Contains information about the secret sequence
  Contains noise
  Insufficient for directly recovering the secret information

If the activities follow known statistical patterns, then an attacker can
“guess” the secret sequence from the noisy leaks.

Attack: find the best hypothesis such that:
  1 It matches the leaked data
  2 It has high probability according to the statistical distribution of
    activity sequences


Information & Activity Leaks
Can it work?

Leaked information can be used for more than cryptographic purposes:
  Users are predictable - most activities are similar & repetitive
  Internet - common websites and surfing routines
  Emails/documents - linguistic models
  Passwords - the most common password is “password”
    Other examples: “qwerty”, “letmein”, “trustno1”, “dragon”,
    “monkey”, “ninja”, and “jesus”.
  News services often publish lists of the most common passwords of
  the year/month

Guess activities/information by detecting “reasonable” usage
patterns from the leaked data
A statistical model of the user's activity profile can be used for this task.
Markov Chain
Stochastic Process:

[Diagram: a sequence of states ... → q_{i−2} → q_{i−1} → q_i, with the next state q_{i+1} = ? being one of several candidates (x, y, z)]

Transition probabilities:
  Pr[q_{i+1} = ? | q_i, q_{i−1}, ..., q_2, q_1]


Markov Chain
Markov Process:

[Diagram: the same chain of states q_{i−2} → q_{i−1} → q_i → q_{i+1} = ?]

Transition probabilities (no history):
  Pr[q_{i+1} = ? | q_i] = Pr[q_{i+1} = ? | q_i, q_{i−1}, ..., q_2, q_1]


Markov Chain
Keyboard structure & text auto-complete

[Example: a chain of typed letters  → t → e → x → t → ]


Hidden Markov Model

[Diagram: hidden states h1 → h2 → h3 → h4, each emitting an observation o1, o2, o3, o4]

Transition probabilities: Pr[h_{i+1} = ? | h_i]
Leak probabilities: Pr[o_i = ? | h_i]

Viterbi Algorithm
A dynamic programming algorithm for finding the most likely
sequence of hidden states, especially in the context of Hidden Markov
models.
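A minimal NumPy sketch of the Viterbi recursion for such a model; the matrix names, the log-space formulation, and the interface are assumptions, not the specific models used in the papers below:

```python
import numpy as np

def viterbi(obs, start, trans, leak):
    """obs: observed leak symbols; start[i], trans[i, j], leak[i, o] are the HMM probabilities."""
    logp = np.log(start) + np.log(leak[:, obs[0]])     # best log-probability ending in each state
    back = []
    for o in obs[1:]:
        scores = logp[:, None] + np.log(trans)         # extend every path by one transition
        back.append(scores.argmax(axis=0))             # remember the best predecessor per state
        logp = scores.max(axis=0) + np.log(leak[:, o])
    path = [int(logp.argmax())]                        # backtrack the most likely hidden sequence
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]
```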
Acoustic Analysis of Keyboards
based on a paper⁴

[Figures: text recovered from keyboard sounds - the typed text, the HMM-only reconstruction, the reconstruction with HMM & spelling corrections, and password retrieval results]

⁴ “Keyboard Acoustic Emanations Revisited”, 2005, by L. Zhuang, F.
Zhou, and J.D. Tygar
Acoustic Analysis of Printers
based on a recent paper⁵

[Pictures of dot-matrix printers, taken from flylib.com/books/en/2.374.1.27/1/ and mindmachine.co.uk/book/print_06_dotmatrix_overview01.html]

⁵ “Acoustic Side-Channel Attacks on Printers”, 2010, by M. Backes, M.
Dürmuth, S. Gerling, M. Pinkal, C. Sporleder
Acoustic Analysis of Printers
based on a recent paper⁵

[Plot: sound intensity grows with the number of needles hitting the paper]

⁵ “Acoustic Side-Channel Attacks on Printers”, 2010, by M. Backes, M.
Dürmuth, S. Gerling, M. Pinkal, C. Sporleder
Acoustic Analysis of Printers
based on a recent paper⁵

Training:
  Feature extraction (split into words, noise reduction, etc.)
  Construct a DB with (word, sound) pairs

Online:
  Feature extraction (same as in training)
  For each word:
    Sort the DB by similarity/difference from the recorded sound
    Reorder the DB by n-gram/word distribution using an HMM
    Guess the printed word as the top candidate from the reordered DB
  (a sketch of this matching step follows)

⁵ “Acoustic Side-Channel Attacks on Printers”, 2010, by M. Backes, M.
Dürmuth, S. Gerling, M. Pinkal, C. Sporleder
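A minimal sketch of the per-word matching step, with an assumed language-model score prior(prev_word, word) standing in for the HMM-based reordering; all names are illustrative:

```python
import numpy as np

def guess_word(rec, db, prev_word, prior, top_k=20):
    """rec: features of the recorded word; db: list of (word, features) pairs from training."""
    ranked = sorted(db, key=lambda wf: np.linalg.norm(wf[1] - rec))[:top_k]   # acoustic similarity
    ranked.sort(key=lambda wf: prior(prev_word, wf[0]), reverse=True)         # language-model reordering
    return ranked[0][0]                                                       # top candidate
```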
Further Reading I

Side-channel attacks using machine learning tools:
  “Further hidden Markov model cryptanalysis” (2005) by P.J. Green, R. Noad, N.P. Smart
  “Analyzing side channel leakage of masked implementations with stochastic methods” (2007) by K. Lemke-Rust & C. Paar
  “Side channel attacks on cryptographic devices as a classification problem” (2007) by P. Karsmakers, B. Gierlichs, K. Pelckmans, K. De Cock, J. Suykens, B. Preneel, B. De Moor
  “Theoretical and practical aspects of mutual information based side channel analysis” (2009) by E. Prouff & M. Rivain
  “Cache-timing template attacks” (2009) by B.B. Brumley & R.M. Hakala
  “Machine learning in side-channel analysis: a first study” (2011) by G. Hospodar, B. Gierlichs, E. De Mulder, I. Verbauwhede, J. Vandewalle
  “Side channel attack: an approach based on machine learning” (2011) by L. Lerman, G. Bontempi, O. Markowitch
  “Side channel cryptanalysis using machine learning” (2012) by H. He, J. Jaffe, & L. Zou
  “PCA, eigenvector localization and clustering for side-channel attacks on cryptographic hardware devices” (2012) by D. Mavroeidis, L. Batina, T. van Laarhoven, E. Marchiori
  “Efficient Template Attacks Based on Probabilistic Multi-class Support Vector Machines” (2013) by T. Bartkewitz & K. Lemke-Rust


Further Reading II
Trace alignment:
  “Recovering secret keys from weak side channel traces of differing lengths” (2008) by C.D. Walter
  “Side Channel Analysis enhancement: a proposition for measurements resynchronisation” (2011) by N. Debande, Y. Souissi, M. Nassar, S. Guilley, T.H. Le, J.L. Danger
  “Improving differential power analysis by elastic alignment” (2011) by J.G.J. van Woudenberg, M.F. Witteman, B. Bakker
  “A general approach to power trace alignment for the assessment of side-channel resistance of hardened cryptosystems” (2012) by Q. Tian & S.A. Huss

Information retrieval from leaked data:
  “Keyboard acoustic emanations revisited” (2005) by L. Zhuang, F. Zhou, J.D. Tygar
  “Acoustic side-channel attacks on printers” (2010) by M. Backes, M. Dürmuth, S. Gerling, M. Pinkal, C. Sporleder
  “Building a side channel based disassembler” (2010) by T. Eisenbarth, C. Paar, Björn Weghenkel
  “Automated black-box detection of side-channel vulnerabilities in web applications” (2011) by P. Chapman & D. Evans
  “Current events: identifying webpages by tapping the electrical outlet” (2012) by S. S. Clark, B. A. Ransford, J. M. Sorber, W. Xu, E. G. Learned-Miller, K. Fu
  “Engineering statistical behaviors for attacking and defending covert channels” (2013) by V. Crespi, G. Cybenko, A. Giani


Conclusion
Machine learning
Retrieve meaningful information from vast amounts of leaked data.

Machine learning tools/concepts:
  Training/testing scheme
  Dimensionality reduction with PCA
  Clustering/classification with SVM
  Alignment with DTW
  Predicting/guessing usage patterns with HMM

Side channel applications:
  Template based power analysis & power trace alignment
  Acoustic analysis of keyboards & printers
Big Data
Not just from security leaks...

Big Data is produced and collected everywhere:


Web data
E-commerce (Amazon, Ebay, etc..)
Purchases at department/grocery stores
Bank/Credit Card transactions
Netflix/Blockbuster/VOD
Social networks (Facebook, Google, etc.)



Big Data
What can be done with it?

Processing Big Data involves several aspects, such as:


Aggregation and Statistics
Data warehouse and OLAP
Indexing, Searching, and Querying
Keyword based search
Pattern matching (XML/RDF)
Knowledge discovery
Data Mining
Statistical Modeling

