
Data Structures and Algorithms

Cryptography
Prepared by Kristian Guillaumier
Department of Intelligent Computer Systems

Note on the slides


Any code or pseudo-code shown in these
slides may not be the most efficient or optimal
way to implement an algorithm. The purpose
of code in these slides is to support an
argument or an explanation in class.
Send corrections to
kristian.guillaumier@um.edu.mt

Cryptography
Note: All content in this section from An Introduction to Cryptography, Linz
and corresponding Wikipedia articles (references checked).

Cryptography: the study of methods of sending messages in secret so that only the intended recipient can read the message.
Plain Text: the original message.
Cipher Text: the disguised message.
Encryption: the process of converting the plain text into cipher text.
Decryption: the process of converting the cipher text into plain text.
Cryptogram: the transmitted encrypted message (the transmission of the cipher text message).

Cryptography
Cryptographer: someone who studies
cryptography.
Cryptanalysis: the mathematical study of
defeating cryptographic methods.
Steganography: concealing the very existence of a message in the first place (e.g. false-bottomed suitcase, invisible ink).

Caesar Cipher (100BCE-44BCE)


images and examples from: http://en.wikipedia.org/wiki/Caesar_cipher

Substitution cipher.
Shift cipher.
Monographic substitution.
Shift (of 3) as in original version.

Caesar Cipher Example (shift 3)

Caesar Cipher
Implementing the algorithm using modulo
arithmetic.
Assign a number to each letter such that A=0, B=1,
C=2, and so on.
Know the size of the alphabet. In English, this
would be 26.
To encrypt a letter x by some shift n:
En(x) = (x+n) % 26.

To decrypt a letter x encrypted using some shift n:


Dn(x) = (x-n) % 26.
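
The two formulas above can be sketched in a few lines of Python. This is only an illustration, not an efficient or definitive implementation; the function names caesar_encrypt and caesar_decrypt are made up for this example, and the code assumes the A=0 ... Z=25 numbering with non-letters left untouched.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def caesar_encrypt(plain, n):
    """Apply En(x) = (x + n) % 26 to every letter; other characters are kept."""
    out = []
    for ch in plain.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + n) % 26])
        else:
            out.append(ch)
    return "".join(out)


def caesar_decrypt(cipher, n):
    """Dn(x) = (x - n) % 26 is the same as encrypting with shift -n."""
    return caesar_encrypt(cipher, -n)
```

Python's % operator always returns a non-negative result for a positive modulus, so (x-n) % 26 wraps around correctly without extra handling.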

Example

PT Letter   PT Number   Function (shift 3)   CT Number   CT Letter
A           0           (0+3)%26             3           D
B           1           (1+3)%26             4           E
C           2           (2+3)%26             5           F
X           23          (23+3)%26            0           A
Y           24          (24+3)%26            1           B
Z           25          (25+3)%26            2           C

Caesar Cypher: Cryptanalysis


Assumption: the attacker
knows we are using a Caesar
cypher but does not know
the shift (the secret).
Since there could only be 26
possible shifts, a brute force
attack can be attempted
even by hand.
Note: even if we are not
sure that a Caesar cypher is
in use we can eventually
determine it using a
technique called frequency
analysis.
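
A brute force attack of this kind could be sketched as follows. The helper names shift and brute_force are hypothetical, and the sketch simply lists all 26 candidate plain texts for a human (or a dictionary check) to inspect.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, n):
    """Shift every letter A-Z by n positions (n may be negative)."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + n) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )


def brute_force(cipher):
    """Return (shift, candidate plain text) for every possible shift."""
    return [(n, shift(cipher, -n)) for n in range(26)]


if __name__ == "__main__":
    for n, candidate in brute_force("KHOOR"):
        print(n, candidate)
```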

Caesar Cypher: Cryptanalysis


Multiple encryption passes do not improve the
strength of the encryption.
It can be shown that:

En(Em(x)) == En+m(x)
The Caesar cipher can be broken trivially.
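
A short derivation of the identity, using the definition En(x) = (x+n) % 26 from the earlier slide (a sketch of the reasoning rather than a formal proof):

```latex
\begin{align*}
E_n(E_m(x)) &= \big((x + m) \bmod 26 + n\big) \bmod 26 \\
            &= (x + m + n) \bmod 26 \\
            &= E_{(n+m) \bmod 26}(x)
\end{align*}
```

So two passes with shifts m and n are just one pass with shift n+m, and the attacker still faces only 26 possibilities.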

Website
US Army Field Manual for Basic Cryptanalysis:
http://www.umich.edu/~umich/fm-34-40-2/


Frequency Analysis
The frequency of occurrence of letters (or groups of
letters as in a digraph) in a language.
Example: E, T, A and O most common in English, while
Z, Q and X are least common.
TH, ER, ON, and AN are the most common digraphs.
SS , EE , TT , and FF are the most common repeats.
In some cryptographic systems these patterns in the
plain text manifest themselves in the cypher text as
well (as in the Caesar cypher) and can be exploited.

Distribution of letters in the English Language

Example
(see full at http://en.wikipedia.org/wiki/Frequency_analysis)

LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFE
MVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM
WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKE
ITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ
GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQ
XLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV
IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIR
GEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE
PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJ
XLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP
XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZIN
TCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX


Example
(see full at http://en.wikipedia.org/wiki/Frequency_analysis)
Convention: we use upper case for cypher text characters and lower
for plain text characters.
In cryptogram:
I is the most common single letter.
XL most common bigram.
XLI is the most common trigram.

In English:
e is the most common single letter.
th is the most common bigram.
the is the most common trigram.

This allows us to guess that:


I~e
XL~th, so X~t and L~h
XLI~the so X~t and L~h and I~e
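
Counting single letters, bigrams and trigrams by hand is tedious, so a small counter helps. The sketch below is one possible way to do it; ngram_counts is a name invented for this example, and it counts overlapping n-grams over the letters only.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping n-grams (n=1 letters, n=2 bigrams, n=3 trigrams)."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


if __name__ == "__main__":
    sample = "XLIXLIXL"
    for n in (1, 2, 3):
        # most_common(3) lists the three most frequent n-grams with counts
        print(n, ngram_counts(sample, n).most_common(3))
```

Running it on the full cryptogram and comparing the top entries against English frequency tables is how guesses such as I~e and XL~th are obtained.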

Example
(see full at http://en.wikipedia.org/wiki/Frequency_analysis)

In cryptogram:
E is the second most common single letter.

In English:
t is the second most common single letter but t is
already accounted for in our previous guesses so we
use a which is third most common single letter.

This allows us to guess that:


E~a

Using these guesses we get the following.



Guesses suggested by the partially decoded text:
heVe == here?? so V~r
Rtate == state?? so R~s

heVeTCSWPeYVaWHaVSReQMthaYVaOeaWHRtatePFa
MVaWHKVSTYhtZetheKeetPeJVSZaYPaRRGaReM
WQhMGhMtQaReWGPSReHMtQaRaKeaTtMJTPRGaVaKa
eTRaWHatthattMZeTWAWSQWtSWatTVaPMRtRSJ
GSTVReaYVeatCVMUeMWaRGMeWtMJMGCSMWtSJOMeQ
theVeQeVetQSVSTWHKPaGARCStRWeaVSWeeBtV
eZMtFSJtheKaGAaWHaPSWYSWeWeaVtheStheVtheR
GaPeRQeVeeBGeeHMWYPFhaVHaWHYPSRRFQMtha
PPtheaCCeaVaWGeSJKTVWMRheHYSPHtheQeMYhtSJ
theMWReGtQaROeVFVeZaVAaKPeaWHtaAMWYaPP
thMWYRMWtSGSWRMHeVatMSWMGSTPHhaVHPFKPaZeN
TCMteVJSVhMRSCMWMSWVeRCeGtMWYMt

atthattMZe == at that time?? so M~i, Z~m

hereTCSWPeYraWHarSseQithaYraOeaWHstatePFa
iraWHKrSTYhtmetheKeetPeJrSmaYPassGasei
WQhiGhitQaseWGPSseHitQasaKeaTtiJTPsGaraKa
eTsaWHatthattimeTWAWSQWtSWatTraPistsSJ
GSTrseaYreatCriUeiWasGieWtiJiGCSiWtSJOieQ
thereQeretQSrSTWHKPaGAsCStsWearSWeeBtr
emitFSJtheKaGAaWHaPSWYSWeWeartheStherthes
GaPesQereeBGeeHiWYPFharHaWHYPSssFQitha
PPtheaCCearaWGeSJKTrWisheHYSPHtheQeiYhtSJ
theiWseGtQasOerFremarAaKPeaWHtaAiWYaPP
thiWYsiWtSGSWsiHeratiSWiGSTPHharHPFKPameN
TCiterJSrhisSCiWiSWresCeGtiWYit

remarA == remark?? so A~k

hereTCSWPeYraWHarSseQithaYraOeaWHstatePFa
iraWHKrSTYhtmetheKeetPeJrSmaYPassGasei
WQhiGhitQaseWGPSseHitQasaKeaTtiJTPsGaraKa
eTsaWHatthattimeTWkWSQWtSWatTraPistsSJ
GSTrseaYreatCriUeiWasGieWtiJiGCSiWtSJOieQ
thereQeretQSrSTWHKPaGksCStsWearSWeeBtr
emitFSJtheKaGkaWHaPSWYSWeWeartheStherthes
GaPesQereeBGeeHiWYPFharHaWHYPSssFQitha
PPtheaCCearaWGeSJKTrWisheHYSPHtheQeiYhtSJ
theiWseGtQasOerFremarkaKPeaWHtakiWYaPP
thiWYsiWtSGSWsiHeratiSWiGSTPHharHPFKPameN
TCiterJSrhisSCiWiSWresCeGtiWYit

Qith == with?? and QhiGh == which?? (suggesting Q~w and G~c)

Example
(see full at http://en.wikipedia.org/wiki/Frequency_analysis)

hereuponlegrandarosewithagraveandstatelya
irandbroughtmethebeetlefromaglasscasei
nwhichitwasencloseditwasabeautifulscaraba
eusandatthattimeunknowntonaturalists
Hereupon Legrand arose, with a grave and stately air, and brought me the beetle
from a glass case in which it was enclosed. It was a beautiful scarabaeus, and, at
that time, unknown to naturalists - of course a great prize in a scientific point

Vigenère cipher

Example from Coursera.org (Stanford crypto course by Dan Boneh).


Take a key, e.g. crypto.
Take the message whatanicedaytoday.
Align and repeat the key.

cryptocryptocrypt
whatanicedaytoday
Add mod 26 (this example numbers the letters a=1, ..., z=26, so w + c = 23 + 3 = 26 = z):
zzzjucludtunwgcqs
Interesting property: the first encryption of p.t. a is z and the second encryption
of a is u. The mapping is not fixed.
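
The align-and-add step might be sketched as below. The function name vigenere and its offset parameter are assumptions made for this illustration: offset=0 uses the a=0 numbering of the Caesar slides, while offset=1 mimics an a=1 numbering, under which crypto and whatanicedaytoday give zzzjucludtunwgcqs (with g as the 14th letter). The sketch expects lowercase letters only.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def vigenere(text, key, offset=0, decrypt=False):
    """Add (or subtract, when decrypting) the repeated key letter by letter, mod 26."""
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(key)):
        shift = ALPHABET.index(k) + offset
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)


if __name__ == "__main__":
    print(vigenere("whatanicedaytoday", "crypto", offset=1))
```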


Decryption and cryptanalysis


Simply subtract mod 26 (instead of adding).
Cryptanalysis is trivial.
Guess a key length and a target language (English).
Let's guess correctly: 6.
Break the c.t. in chunks of 6:
zzzjuc.ludtun.wgcqs
Look at the first letter of each chunk and list these first letters... zlw
(Three letters are far too few for reliable statistics; a real attack needs a much longer cipher text.)

The most common letter in this list (call it L) will likely be the encryption of the most common letter in English, i.e.
e.
So now we know that cypher letter L is plain letter e.
Let's say the most common letter L was H. Subtract e from H to get c (with a=1 numbering, 8 - 5 = 3).
We found the first letter of the key.
Repeat for the 2nd to 6th letters.
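
The column-by-column guess could look like the sketch below. It is an illustration under simplifying assumptions: standard a=0 numbering, a known key length, lowercase letters only, and enough cipher text that e really is the most common plain letter in every column. The names encrypt and guess_key are invented for this example.

```python
import string
from collections import Counter
from itertools import cycle

ALPHABET = string.ascii_lowercase


def encrypt(text, key):
    """Plain Vigenere encryption with a=0, b=1, ... (helper for the demo)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(k)) % 26]
        for c, k in zip(text, cycle(key))
    )


def guess_key(cipher, key_len, most_common_plain="e"):
    """For each key position, assume the most common cipher letter encrypts 'e'."""
    key = []
    for i in range(key_len):
        column = cipher[i::key_len]  # every key_len-th letter shares one key letter
        top = Counter(column).most_common(1)[0][0]
        key.append(ALPHABET[(ALPHABET.index(top) - ALPHABET.index(most_common_plain)) % 26])
    return "".join(key)
```

On short or unusual texts the most common letter in a column is often not e, so in practice several candidates per position are tried.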


Polygraphic ciphers
As opposed to monographic ciphers.
Digraph substitution.
Monograph (a letter), Digraph (a pair of
letters).
Frequency analysis is much harder:
We'd need the frequencies of 600 digraphs
(in a Playfair cypher) rather than 26 monographs.
Explanation of 600 on next slide.

Number of digraphs
Consider an alphabet with 4 characters a, b, c and d.
How many pair combinations could we have: $4^2 = 16$. aa, ab, ac, ad, ba, bb, bc, bd, ..., dd.
What if we don't allow duplicates? We have to remove 4 possibilities: aa,
bb, cc, dd.
So the number of pairs without duplicates in an alphabet containing n
characters is $n^2 - n$.
What if we combine two characters into one to give us: a, b, c/d?
How many pairs do we get in an n-letter alphabet if we omit/combine m
letters: $(n-m)^2$.
Again, if we wish to omit duplicate pairs we'd get: $(n-m)^2 - (n-m)$.
In Playfair, we combine 2 letters of the 26-letter alphabet into one (m = 1), so we
get: $(26-1)^2 - (26-1) = 25^2 - 25 = 600$.
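
The count can be double-checked by brute enumeration. The helper name digraph_count is hypothetical; it just evaluates the formula, and the enumeration below it lists every ordered pair of distinct symbols.

```python
from itertools import permutations


def digraph_count(n, m=0):
    """Pairs without doubles in an n-letter alphabet after merging away m letters."""
    size = n - m
    return size * size - size


# Enumerate ordered pairs of two different symbols out of 25 and count them.
enumerated = len(list(permutations(range(25), 2)))
```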

The Playfair cipher (1854)


Use a 5x5 table.
From the top left, start writing a password (key,
secret) in every cell careful not to create duplicates.
After using enough boxes to fit the secret, fill in the
remaining boxes with the letters of the alphabet in
order (again careful not to create duplicates).
Since I can fit 25 letters in the table but the alphabet
consists of 26 characters, usually a letter is combined
(e.g. the i with the j or the w with the x).
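
Filling the table can be sketched as follows. The function name playfair_square and its merged parameter are assumptions for this illustration; by default J is folded into I, which is one of the conventions mentioned above.

```python
import string


def playfair_square(key, merged=("J", "I")):
    """Write the key, then the rest of the alphabet, skipping duplicates."""
    drop, keep = merged  # e.g. treat J as I so that 25 letters remain
    seen = []
    for ch in key.upper() + string.ascii_uppercase:
        if not ch.isalpha():
            continue
        if ch == drop:
            ch = keep
        if ch not in seen:
            seen.append(ch)
    # Cut the 25 letters into 5 rows of 5.
    return ["".join(seen[i:i + 5]) for i in range(0, 25, 5)]


if __name__ == "__main__":
    for row in playfair_square("playfair example"):
        print(" ".join(row))
```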


The Playfair cipher


The key used here is "playfair example". We get
this table (note the i and j are combined).

P    L    A    Y    F
I/J  R    E    X    M
B    C    D    G    H
K    N    O    Q    S
T    U    V    W    Z

Encrypting with Playfair


Consider this table:

A   Z   I   WX  D
E   U   T   G   Y
O   N   K   Q   M
H   F   J   L   S
V   R   P   B   C

Rule 1
If 2 letters are on the same row, their cypher text is
immediately to their right (use wrap-around).
E.g. VC is RV

A   Z   I   WX  D
E   U   T   G   Y
O   N   K   Q   M
H   F   J   L   S
V   R   P   B   C

Rule 2
If two letters are on the same column the cypher text is the
letters below (careful to apply wrap-around too if necessary).
E.g. ZF is UR.

A   Z   I   WX  D
E   U   T   G   Y
O   N   K   Q   M
H   F   J   L   S
V   R   P   B   C

Rule 3
If two letters are on the diagonal of a rectangle formed by them, then the
cypher letters are the letters on the opposite corners of that rectangle, each on
the same row as its plain text letter.
E.g. UL = GF and SZ = FD.

A   Z   I   WX  D
E   U   T   G   Y
O   N   K   Q   M
H   F   J   L   S
V   R   P   B   C

Rule 4+5
If the same letter appears as a pair in the plain
text, separate them with a Z before
encrypting.
If a single letter appears at the end of the
plain text when encrypting (there is an odd
number of letters to encrypt), pad with a Z.


Example

BUTTON
BUTZTON
BU TZ TO N
BU TZ TO NZ

Becomes

A   Z   I   WX  D
E   U   T   G   Y
O   N   K   Q   M
H   F   J   L   S
V   R   P   B   C

RG UI EK FU
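
Rules 1 to 5 can be put together in a short sketch. It is not meant as the definitive Playfair implementation: the names SQUARE, prepare, locate, encrypt_pair and encrypt are invented for this example, the table is the one used on these slides with the shared W/X cell written as W, and the corner case of a doubled Z is ignored.

```python
# The 5x5 table from the slides, row by row (the W/X cell is stored as W).
SQUARE = ["AZIWD", "EUTGY", "ONKQM", "HFJLS", "VRPBC"]


def prepare(plaintext, pad="Z"):
    """Rules 4+5: split into pairs, separating doubles and padding the end with Z."""
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else None
        if b is None or a == b:
            pairs.append(a + pad)  # lone last letter or doubled letter
            i += 1
        else:
            pairs.append(a + b)
            i += 2
    return pairs


def locate(ch, square=SQUARE):
    """Return (row, column) of a letter; X shares its cell with W."""
    ch = "W" if ch == "X" else ch
    for r, row in enumerate(square):
        c = row.find(ch)
        if c >= 0:
            return r, c
    raise ValueError(f"{ch!r} is not in the square")


def encrypt_pair(pair, square=SQUARE):
    (r1, c1), (r2, c2) = locate(pair[0], square), locate(pair[1], square)
    if r1 == r2:  # Rule 1: same row, take the letters to the right
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:  # Rule 2: same column, take the letters below
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    # Rule 3: rectangle, keep the row and swap the columns
    return square[r1][c2] + square[r2][c1]


def encrypt(plaintext, square=SQUARE):
    return " ".join(encrypt_pair(p, square) for p in prepare(plaintext))
```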

Cryptanalysis
The cypher text will never contain double letter digraphs
(pairs). If this observation is made over a suitably long stream
of cypher text (to make it statistically significant) we can infer
that Playfair is used.
If both the plain text and the cypher text are available then
finding the key is straightforward (assuming we have enough
text).
Try out (with the key "playfair example"): BU TZ TO NZ becomes CT UT VK SU

Cryptanalysis
In Playfair, a digraph and its reverse (e.g. AB and
BA) will decrypt to plain text in reverse (e.g. RE
and ER).
In English, there are many words which contain
these reversed digraphs such as REceivER and
DEpartED.
Identifying nearby reversed digraphs in the
ciphertext and matching the pattern to a list of
known plaintext words containing the pattern is
an easy way to generate possible plaintext strings
with which to begin constructing the key.

Cryptanalysis
Random-restart hill climbing.
Start with random square of letters.
Create mutation operations (swap letters,
swap rows/columns, reflecting).
Score the obtained plaintext with some fitness
function, e.g. comparing digraphs to a
frequency chart.


Cryptanalysis: an example
Taken from a forum challenge: http://s13.zetaboards.com/Crypto/topic/123730/1/

Challenge
TM NX LR QG CR XE EW EG VK GS MH XM EV
EK TV GV SU GZ KH IC NH NB TM SA VS KN
BH AN KT GI VO VA SF VA AR BV NI VE IV
AV HX IQ NK IS EU LE BM HA LX VC BF ST
CN RX MI AS HV AS HB CI HY BM AR BU NX
RU IS EU LE VA SF GZ KN HG GC RC IK BS
ES BP VA HU RE IR XE TY AB IU

The crib
"turkey eating title".
Possible ways to split in pairs:
?t ur ke ye at in gt it le
tu rk ey ea ti ng ti tl e?
Interesting pattern in the 2nd option: the digraph ti appears twice, two places apart. Let's try that.

tu rk ey ea ti ng ti tl e?
TM NX LR QG CR XE EW EG VK GS MH XM EV
EK TV GV SU GZ KH IC NH NB TM SA VS KN
BH AN KT GI VO VA SF VA AR BV NI VE IV
AV HX IQ NK IS EU LE BM HA LX VC BF ST
CN RX MI AS HV AS HB CI HY BM AR BU NX
RU IS EU LE VA SF GZ KN HG GC RC IK BS
ES BP VA HU RE IR XE TY AB IU

Possible digraph matches


Either
tu rk ey ea ti ng ti tl e?
AN KT GI VO VA SF VA AR BV
Or
tu rk ey ea ti ng ti tl e?
ST CN RX MI AS HV AS HB CI
Let's try the 2nd one.
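
Finding where the crib can sit is a pattern search: the crib has the same digraph (ti) in its 5th and 7th positions, so the cipher text must repeat a digraph in the same places, and different plain digraphs must give different cipher digraphs. The sketch below does this search; the names CIPHER_ROWS and crib_positions are invented for this example, and the challenge is assumed to be read row by row.

```python
# The challenge cipher text, read row by row.
CIPHER_ROWS = [
    "TM NX LR QG CR XE EW EG VK GS MH XM EV",
    "EK TV GV SU GZ KH IC NH NB TM SA VS KN",
    "BH AN KT GI VO VA SF VA AR BV NI VE IV",
    "AV HX IQ NK IS EU LE BM HA LX VC BF ST",
    "CN RX MI AS HV AS HB CI HY BM AR BU NX",
    "RU IS EU LE VA SF GZ KN HG GC RC IK BS",
    "ES BP VA HU RE IR XE TY AB IU",
]
CIPHER = " ".join(CIPHER_ROWS).split()
CRIB = "tu rk ey ea ti ng ti tl e?".split()


def crib_positions(cipher_pairs, crib_pairs):
    """Offsets where the repeat pattern of the crib matches the cipher text."""
    n = len(crib_pairs)
    known = [i for i, p in enumerate(crib_pairs) if "?" not in p]
    hits = []
    for start in range(len(cipher_pairs) - n + 1):
        window = cipher_pairs[start:start + n]
        # Equal plain digraphs <=> equal cipher digraphs (Playfair is one-to-one).
        if all(
            (crib_pairs[i] == crib_pairs[j]) == (window[i] == window[j])
            for i in known for j in known if i < j
        ):
            hits.append(start)
    return hits


if __name__ == "__main__":
    for start in crib_positions(CIPHER, CRIB):
        print(start, " ".join(CIPHER[start:start + len(CRIB)]))
```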

We now know some pairs


Plain   Cipher
tu      ST
rk      CN
ey      RX
ea      MI
ti      AS
ng      HV
ti      AS
tl      HB
e?      CI

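Let's pick on tu=ST

S and T must be on the same row or column. They can never be diagonals.
Why? Hint: T is in the ciphertext and t is in the plain text. With the rectangle rule, the
cypher letter for u lies in the column of t, so it could only be T itself if u were on the
same row as t, and then the letters would not form a rectangle.

[Figure: t and T on opposite corners of a rectangle, marked "Impossible"]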

So if tu=ST then U, T and S must be on the same row or column in
exactly these relative positions (allowing wrap-around), for instance

U T S   (same row)

or U, T, S from top to bottom in the same column.

t encrypts to S and u encrypts to T

Let's assume they are on the same column

From the crib there is another known digram starting with t:
ti=AS

U
T
S

T, I, A and S cannot all be on the same column because t already encrypts to S in
that column (and not to A).
T, I, A and S cannot all be on the same row because T and S are
on the same column.
So ti and AS are diagonals. For instance

U  .
T  A
S  I

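Let's consider ea=MI

From previous results, I is on the same column as A (directly below it), so a encrypts to I by the column rule.
e and M must therefore also be on the same column.
M must be under e.
There are 2 possibilities

[Figure: the two possible placements of E and M in the column containing A and I]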

Another useful pattern


Squares shifted by rows and columns are
functionally equivalent (key=hello world):

[Figure: two Playfair squares for this key, one a cyclic row/column shift of the other]

ey=RX

So far our table is handling tu=ST, ti=AS, and ea=MI properly.
Now for ey=RX, the letters E, Y, R and X can be either on the same column, on the same
row, or on diagonals.
They cannot be on the same column because column-wise we have
shown that e encrypts to M (from ea=MI).
If E, Y, R and X are on the same row then E and R, and Y and X are on the same
row (obviously).
If EY/RX are diagonals, E and R, and Y and X are on the same row.
So we have shown that E and R, and Y and X will be on the same row in all
cases.

Cryptanalysis example: conclusion


Notice that I'm building rules as to the relative
position of letters in the square.
By continuing my analysis along these lines, I'll
fill in the whole square.
See walkthrough here:
http://s13.zetaboards.com/Crypto/topic/123730/1/

Four-Square Cypher (1902)


Used four 5x5 matrices arranged in a square
(merging the i with j, omitting, say Q, or any
other combination).
Generally the upper left and lower right
matrices are filled with plain text (plain text
squares) and the other matrices filled with
mixed text (cypher text squares).
Each cypher text square is filled using a
scheme similar to Playfair.

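Example: keywords "example" and "keyword"

a b c d e   E X A M P
f g h i j   L B C D F
k l m n o   G H I J K
p r s t u   N O R S T
v w x y z   U V W Y Z

K E Y W O   a b c d e
R D A B C   f g h i j
F G H I J   k l m n o
L M N P S   p r s t u
T U V X Z   v w x y z

The upper-left and lower-right squares are the plain text squares (Q omitted); the other two are the cypher text squares.
Note: character case is used to simplify reading.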

Encryption
Split the plain text in digraphs:
THE SLICK BIRD
TH ES LI CK BI RD
Find the first letter of the digraph (T in TH) in the upper-left PT matrix.
Find the second letter of the digraph (H in TH) in the lower-right PT
matrix.

Encryption
The first cypher letter in the digraph is on the same row as the first plain
text letter and the same column as the second plain text letter.

Encryption
The second cypher letter in the digraph is on the same row as the second plain
text letter and the same column as the first plain text letter.

th becomes RB

Encryption
The slick bird

Plain text:  TH ES LI CK BI RD
Encrypted:   RB AS JD EH MD SE

Decryption uses exactly the reverse process.
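
The two lookups per digraph can be sketched as below. The names build_square, find and four_square_encrypt are invented for this illustration. The sketch assumes Q is omitted, the keyword "example" fills the upper-right square and "keyword" the lower-left one; under these assumptions it reproduces the digraphs shown above. A trailing single letter is simply dropped.

```python
import string


def build_square(key="", omit="Q"):
    """Key letters first, then the remaining alphabet, without duplicates or `omit`."""
    seen = []
    for ch in key.upper() + string.ascii_uppercase:
        if ch.isalpha() and ch != omit and ch not in seen:
            seen.append(ch)
    return ["".join(seen[i:i + 5]) for i in range(0, 25, 5)]


def find(square, ch):
    for r, row in enumerate(square):
        c = row.find(ch)
        if c >= 0:
            return r, c
    raise ValueError(f"{ch!r} is not in the square")


def four_square_encrypt(text, key_upper_right, key_lower_left):
    plain = build_square()  # used for both plain text squares
    upper_right = build_square(key_upper_right)
    lower_left = build_square(key_lower_left)
    letters = [c for c in text.upper() if c.isalpha()]
    out = []
    for a, b in zip(letters[0::2], letters[1::2]):
        r1, c1 = find(plain, a)  # first letter, upper-left square
        r2, c2 = find(plain, b)  # second letter, lower-right square
        # Row of the first letter + column of the second, and vice versa.
        out.append(upper_right[r1][c2] + lower_left[r2][c1])
    return " ".join(out)
```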


Cryptanalysis
Similar ideas to Playfair if both plain text and
cipher text are known.
Consider what happens when we encrypt
MI/LI/TA/RY, for MI we get JA and for LI
we get JD.
Notice the repetition of J in the cipher text.
This happens because M and L are on the
same row in the top left plain text square and the
I is the same. This is an exploitable pattern.

Cryptanalysis
Difference from Playfair:
Four-Square will not show reversed cypher
text digraphs for reversed plain text digraphs.
Consider DEpartED becomes
PWnksmMO
PW and MO are not reversals of each other.
This makes four-square stronger than Playfair.

ADFGVX cipher (1918)


Polybius square (cardinality):

Arrange letters in a square.

A 5x5 square (enough for 25 characters so we merge i with j as usual).
A 6x6 square (enough for 36 characters so we can fit 26 letters and 10 digits).
Each letter is represented using its coordinates in the square. E.g. using the
6x6, "Hello" becomes 22 15 26 26 33
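
A coordinate lookup of this kind might be sketched as follows. The name polybius_6x6 is hypothetical, and the sketch assumes the square is filled row by row with A to Z followed by 0 to 9, which is the layout that gives the numbers quoted above.

```python
import string

SYMBOLS = string.ascii_uppercase + string.digits  # 36 symbols, filled row by row


def polybius_6x6(text):
    """Replace each letter/digit by its 1-based (row, column) coordinates."""
    codes = []
    for ch in text.upper():
        if ch in SYMBOLS:
            row, col = divmod(SYMBOLS.index(ch), 6)
            codes.append(f"{row + 1}{col + 1}")
    return " ".join(codes)
```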


Polybius square
Left hand/right hand raising of torches.
Knock-knock.


ADFGVX cipher
The ADFGVX cypher (originally ADFGX; the V was added later to make the
square 6x6 so that it could include digits and shorten transmissions)
starts off with a Polybius square indexed by ADFGVX.
Each letter in ADFGVX sounds very distinct in Morse code.

[Figure: 6x6 Polybius square with rows and columns labelled A, D, F, G, V, X]

ADFGVX cipher
"SEND 20" is fractionated as

S   E   N   D   2   0
FF  FG  XA  GF  DX  DG

ADFGVX cipher
"The British have landed" would be enciphered
as XF FX FG AA AG AX XF AX FF FX FX DA GX FG AV
DA XA GF FG GF.
This is the fractionated text or transitional cypher
text.
So far this is a simple one-to-one substitution
(cryptographically useless by itself).
The next step involves a key. E.g. the word
"German".

ADFGVX cipher
Create a table with the key (German) as the heading and place all the
characters of the transitional cypher text in it horizontally, row by row (on the
slide, red arrows indicate this fill order).
XF FX FG AA AG AX XF AX FF FX FX DA GX FG AV DA XA GF FG GF

G E R M A N
X F F X F G
A A A G A X
X F A X F F
F X F X D A
G X F G A V
D A X A G F
F G G F

ADFGVX cipher
Now sort the table by the letters in the
keyword (moving each column together with its letter
of the keyword).

A E G M N R
F F X X G F
A A A G X A
F F X X F A
D X F X A F
A X G G V F
G A D A F X
  G F F   G

ADFGVX cipher
Now read the values column-wise, grouping in, say, 5-letter
blocks (for convenience when reading).

FAFDA GFAFX XAGXA XFGDF XGXXG AFGXF AVFFA AFFXG
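
The fill, sort and read-out steps amount to a columnar transposition, which could be sketched like this. The names columnar_transpose and blocks are invented for this example; the sketch takes the already fractionated text as input, so it does not need the Polybius square itself.

```python
def columnar_transpose(text, key):
    """Write text row by row under the key, sort columns by key letter, read down."""
    text = text.replace(" ", "")
    ncols = len(key)
    # Column i holds every ncols-th character starting at position i.
    columns = [text[i::ncols] for i in range(ncols)]
    # sorted() is stable, so repeated key letters keep their original order.
    order = sorted(range(ncols), key=lambda i: key[i])
    return "".join(columns[i] for i in order)


def blocks(text, size=5):
    """Group the output in fixed-size blocks for readability."""
    return " ".join(text[i:i + size] for i in range(0, len(text), size))


FRACTIONATED = "XF FX FG AA AG AX XF AX FF FX FX DA GX FG AV DA XA GF FG GF"

if __name__ == "__main__":
    print(blocks(columnar_transpose(FRACTIONATED, "GERMAN")))
```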

Cryptanalysis (for a special case)


Let's pick a Caesar shift:
abcdefghijklmnopqr...
cdefghijklmnopqrst...
Encrypt hello to get jgnnq.
This can be easily beaten using freq. analysis.
But now transpose columns of the cipher text. E.g. jgnnq
becomes ngnjq.
The transposed version is still subject to freq. analysis but if
we use it, we get lelho (hello transposed).
In other words we still don't get the plain text.
Transposition is the problem.

Hint on ADFGVX
If we notice that the ciphertext only consists
of 6 letters and has an even number of letters
then we could assume a 6x6 board and that
we're dealing with digraphs.
Frequency analysis matches (indicates)
plaintext for the language being assumed but
performing it will not give the plaintext. This is
a hint that transposition is used.

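Let's set a Polybius square

(A 5x5 square indexed by A, D, F, G, V and filled alphabetically is consistent with
the substitutions used in this example.)

    A D F G V
A   a b c d e
D   f g h i j
F   k l m n o
G   p q r s t
V   u v w x y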

Assume 2 plaintexts with a common prefix and substitute

MY NAME IS KRIS
FF VV FG AA FF AV DG GG FA GF DG GG
MY NAME IS JOHNNY
FF VV FG AA FF AV DG GG DV FV DF FG FG VV

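Use a key "cat": example 1

For MY NAME IS KRIS, i.e. FF VV FG AA FF AV
DG GG FA GF DG GG, we get:

C A T      Sort key:   A C T
F F V                  F F V
V F G                  F V G
A A F                  A A F
F A V                  A F V
D G G                  G D G
G F A                  F G A
G F D                  F G D
G G G                  G G G

Ciphertext: FFAAG FFGFV AFDGG GVGFV GADG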

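Use a key "cat": example 2

For MY NAME IS JOHNNY, i.e. FF VV FG AA FF AV DG GG DV
FV DF FG FG VV, we get (pad with V):

C A T      Sort key:   A C T
F F V                  F F V
V F G                  F V G
A A F                  A A F
F A V                  A F V
D G G                  G D G
G D V                  D G V
F V D                  V F D
F F G                  F F G
F G V                  G F V
V V V                  V V V

Ciphertext: FFAAG DVFGV FVAFD GFFFV VGFVG VDGVV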

The 2 ciphertexts next to each other

FFAAG FFGFV AFDGG GVGFV GADG
FFAAG DVFGV FVAFD GFFFV VGFVG VDGVV

Notice that there are common substrings, e.g. FFAAG, FVAFDG and VGFVG.

Remember the substrings we found:

FFAAG FFGFV AFDGG GVGFV GADG   (MY NAME IS KRIS)
FFAAG DVFGV FVAFD GFFFV VGFVG VDGVV   (MY NAME IS JOHNNY)

Pick the longest ones, say we pick the longest 3 (in fact 3 is the right
guess).
Now remember what the tables were.
Now we can guess the column length.

For "My name is Kris"

If we guessed the key length of 3 correctly, we know the column length is 8 (24 letters in 3 columns).
Break the ciphertext in chunks of 8:
FFAAGFFG FVAFDGGG VGFVGADG
Notice that the longer substrings are closer to the beginning of the unsorted
(original) key?

And now we have the key.

Finally
Read row-wise and use frequency analysis to
beat the substitution.
ADFGVX was cryptanalyzed for special cases
by Georges Painvin in 1918.
Cryptanalysis for the general case was found
by William Friedman in 1933.
More info at:
http://www.nku.edu/~christensen/section%2010%20ADFGVX.pdf

Secure stream ciphers


Most material in this section is taken from the Stanford
Crypto course by Dan Boneh (coursera.org).
m and c are the plaintext and the ciphertext.
E and D are the encryption and decryption algorithms.
k is the secret key.
E(k, m) = c
D(k, c) = m
E and D are publicly known.
Only k is kept secret.

Secure stream ciphers


Sharing secret (symmetric) keys: PK
infrastructure.
Digital signatures (a function of the content
being signed [hash?]; the signature can never be
the same for different documents, otherwise an attacker
would just copy and paste a signature that is
already in use).

Keyspace of a substitution cipher


Key space of a substitution cipher assuming 26
letters is 26!
Note, this is not the key space for Caesar
(the original Caesar cipher, with its fixed shift of 3, has no key).
26! is the number of substitution tables I
could possibly have.
This is approx. $2^{88}$.
Nonetheless, this is easily breakable using
frequency analysis.

What is a cipher?
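A cipher is a pair of efficient algorithms E and
D over the triple ($\mathcal{K}$: key space, $\mathcal{M}$: message
space, $\mathcal{C}$: ciphertext space).
$E: \mathcal{K} \times \mathcal{M} \to \mathcal{C}$
$D: \mathcal{K} \times \mathcal{C} \to \mathcal{M}$
Consistency:
$\forall m \in \mathcal{M}, k \in \mathcal{K}: D(k, E(k, m)) = m$
Efficient means "runs in polynomial time".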

The One Time Pad (Vernam, 1917)

$\mathcal{M} = \mathcal{C} = \{0,1\}^n$
i.e. message space = cipher text space = set of
all n-bit binary strings.
$\mathcal{K} = \{0,1\}^n$
So a key is a sequence of bits as long as the
message.
$c = E(k, m) = k \oplus m$
i.e. key XOR message.

One time pad is a cipher


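m: 0110111
k: 1011001
E: 1101110
To decrypt: $D(k, c) = k \oplus c$
c: 1101110
k: 1011001
D: 0110111
This is consistent (a cipher) because XOR is associative so:
$D(k, E(k, m)) = D(k, k \oplus m) = k \oplus (k \oplus m) = (k \oplus k) \oplus m = 0 \oplus m = m$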
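
The bit-by-bit XOR can be checked with a tiny sketch. The helper name xor_bits is hypothetical, and bit strings are used purely for readability.

```python
def xor_bits(a, b):
    """XOR two equal-length strings of '0'/'1' characters."""
    if len(a) != len(b):
        raise ValueError("the key must be as long as the message")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


m = "0110111"
k = "1011001"
c = xor_bits(k, m)          # encryption
recovered = xor_bits(k, c)  # decryption is the very same operation
```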


Note
If I know the cipher text c and the message m, it
is easy to get the key:
$k = m \oplus c$
One time pad is very, very fast.
Problem in practice: the key must be as
long as the message.
If Alice has a secure method to communicate the
key to Bob before sending the message, then
Alice might as well use that method to send the
message.

Security of OTP
What is a secure cipher?
Cipher text should not reveal anything about the plaintext.
A cipher (E, D) over $(\mathcal{K}, \mathcal{M}, \mathcal{C})$ has perfect secrecy (Claude Shannon) if:
$\forall m_0, m_1 \in \mathcal{M}$ s.t. $|m_0| = |m_1|$, and $\forall c \in \mathcal{C}$:
$\Pr[E(k, m_0) = c] = \Pr[E(k, m_1) = c]$, where k is chosen uniformly at random from $\mathcal{K}$.
i.e. for some random key k, the probability of getting a ciphertext c from $m_0$ is
the same as having got it from any other message $m_1$.
In other words, if I only have the ciphertext c, I can never tell if the message
was $m_0$ or $m_1$.
There is no ciphertext-only attack (other attacks may be possible).

OTP has perfect secrecy


Proof:
$\forall m, c: \Pr_k[E(k, m) = c] = \dfrac{\#\{k \in \mathcal{K} \text{ s.t. } E(k, m) = c\}}{|\mathcal{K}|}$

i.e. the number of keys that encrypt m into c divided by
the number of keys in the key space.
Suppose $\#\{k \in \mathcal{K} \text{ s.t. } E(k, m) = c\}$ is some
constant. $|\mathcal{K}|$ is a constant so $\Pr_k[E(k, m) = c]$ will be a
constant and the same for any m, c. If this property is
true, then the cypher will have perfect secrecy.

How many keys can encrypt a given message m into c? 1.

Only 1 OTP key maps m to c


How many keys exist such that $k \oplus m = c$?
$k \oplus m = c \Rightarrow k = m \oplus c$ (only 1 key).
(The above is easy to show: just XOR both sides of
$k \oplus m = c$ with m.)
OTP has perfect secrecy (no c.t.-only attack,
unlike, say, Playfair).
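
For small n the claim can be verified exhaustively. The sketch below (with the invented helper names xor and keys_mapping) tries every n-bit key and collects those that turn m into c.

```python
from itertools import product


def xor(a, b):
    """XOR two equal-length sequences of '0'/'1' characters."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def keys_mapping(m, c):
    """All keys k of the same length with k XOR m == c (brute force)."""
    return ["".join(k) for k in product("01", repeat=len(m)) if xor(k, m) == c]


# Every (message, ciphertext) pair of 3 bits is reached by exactly one key.
all_3bit = ["".join(bits) for bits in product("01", repeat=3)]
counts = {len(keys_mapping(m, c)) for m in all_3bit for c in all_3bit}
```

Since the count is 1 for every pair, the probability of any ciphertext is 1/|K| no matter which message was sent.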

Issues
OTP has long keys.
Is there another cipher that has perfect secrecy?
Shannon has proven that for perfect secrecy $|\mathcal{K}| \geq |\mathcal{M}|$;
in other words, the length of the key
must be $\geq$ the length of the message (bad news).
Perfect secrecy ciphers have practical issues.
The key needs to be truly random (cannot be
predictable).
Many-time pad attacks.

Weakness of Many Time Pad


Using the pad more than once will offer an
opportunity to attack using the ciphertexts
only.
A common method is called crib dragging.
There is a good example here:
http://travisdazell.blogspot.com/2012/11/many-time-pad-attack-crib-drag.html (Travis
Dazell).

Stream ciphers
The idea is to replace the random key with a
pseudorandom key.
A pseudorandom generator (PRG) is a
function:
$G: \{0,1\}^s \to \{0,1\}^n$ s.t. $n \gg s$
i.e. a function that takes a value called the seed (say,
128 bits) and gives you a much larger output stream
(e.g. Gb long).
G is deterministic. The only random thing is the
seed.
The output should look random.

Making OTP practical


Use a reasonably sized key k as the seed for G,
which will generate a pseudorandom string $G(k)$ as
long as the message m, and encrypt $c = m \oplus G(k)$ to
get c.
For the same seed k, the generator will
generate the same $G(k)$ so we can decrypt.
Can a stream cipher have perfect secrecy?
No (the key is shorter than the message; recall that's
proven by Shannon).
We'll need to come up with another definition of
security.
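
The construction can be illustrated with a short sketch. Here SHAKE-256 from Python's hashlib stands in for the generator G purely for demonstration; that choice, and the names G and stream_xor, are assumptions of this example and not a vetted stream cipher design.

```python
import hashlib


def G(seed: bytes, n: int) -> bytes:
    """Stand-in PRG: stretch a short seed into n pseudorandom-looking bytes."""
    return hashlib.shake_256(seed).digest(n)


def stream_xor(key: bytes, data: bytes) -> bytes:
    """c = m XOR G(k); applying it again with the same key decrypts."""
    pad = G(key, len(data))
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"a short 16B key!"
message = b"a message that is much longer than the key itself"
ciphertext = stream_xor(key, message)
```

Because G is deterministic, the receiver regenerates the same pad from the same key and XORs it away.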

The PRG must be unpredictable


Suppose that a PRG is predictable, i.e.:
$\exists i: G(k)|_{1,\dots,i} \xrightarrow{\text{alg.}} G(k)|_{i+1,\dots,n}$

There is some algorithm that, given the first i bits generated
by the PRG, can predict the rest.
Why is this bad?
Suppose I have some ciphertext $c_1$ of length n for some
message $m_1$.
Suppose that the attacker knows some prefix $m_1|_{1,\dots,i}$ of the
plain text message (e.g. an SMTP header).
If the attacker XORs this prefix with the equivalent prefix in the
ciphertext he will obtain the prefix of the pad:
$c_1|_{1,\dots,i} \oplus m_1|_{1,\dots,i} = G(k)|_{1,\dots,i}$
If the PRG is predictable, I can obtain the rest of the sequence
$G(k)|_{i+1,\dots,n}$ to get the plaintext from the ciphertext.

Predictable
Many random number generators in
frameworks/libraries are predictable (e.g.
random() in C).
Do not use for crypto.
Use specific cryptographically secure PRGs.
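
In Python the same distinction exists: the random module is a deterministic Mersenne Twister generator whose documentation warns against security use, while the secrets module draws from the operating system's cryptographically secure source. A small sketch of the difference (the variable names are just for illustration):

```python
import random
import secrets

# A general-purpose generator replays the same stream for the same seed.
random.seed(1234)
first = [random.getrandbits(8) for _ in range(4)]
random.seed(1234)
second = [random.getrandbits(8) for _ in range(4)]

# For keys and pads, ask the OS for unpredictable bytes instead.
key = secrets.token_bytes(16)
```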


Attacks on an OTP
So in order to make the OTP practical, our cipher
is:
$E(k, m) = m \oplus G(k)$
$D(k, c) = c \oplus G(k)$

k is the short key.

Security now cannot depend on the perfect
secrecy of the method but on the properties of the
PRG.

Two time pads are insecure


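Scenario:
$c_1 \leftarrow m_1 \oplus G(k)$
$c_2 \leftarrow m_2 \oplus G(k)$
Same pad, different messages.
If the attacker intercepts $c_1$ and $c_2$ then:

$c_1 \oplus c_2 = m_1 \oplus G(k) \oplus m_2 \oplus G(k)$
$= 0 \oplus m_1 \oplus m_2 = m_1 \oplus m_2$
$m_1$ and $m_2$ can be recovered from $m_1 \oplus m_2$ (given enough redundancy in the plaintexts).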
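
The cancellation of the pad is easy to see in code. In this sketch the messages and the helper name xor are made up for illustration, and the pad comes from os.urandom to stand in for G(k).

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


m1 = b"attack at dawn"
m2 = b"retreat at six"
pad = os.urandom(len(m1))  # the same pad is (wrongly) used twice

c1 = xor(m1, pad)
c2 = xor(m2, pad)

# The attacker only sees c1 and c2, yet the pad drops out completely.
leak = xor(c1, c2)
```

Anyone who knows or guesses one of the messages recovers the other immediately, which is what crib dragging exploits.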

Recent two time pad attacks


Microsoft PPTP.
Client side:
Sends messages $m_1, m_2, m_3, \dots$
Encryption is $[m_1 \| m_2 \| m_3 \| \dots] \oplus G(k)$
Where $\|$ is concatenation, i.e. all client messages are one stream.

Server side:
Sends messages $s_1, s_2, s_3, \dots$
Encryption is $[s_1 \| s_2 \| s_3 \| \dots] \oplus G(k)$
Where $\|$ is concatenation, i.e. all server messages are one stream.

The problem is that the client and server are communicating using
the same key, so the pad is used twice.
Never use the same key for client-to-server and server-to-client traffic.
The proper key should be a pair $(k_{C \to S}, k_{S \to C})$.

Recent two time pad attacks


WIFI, WEP, 802.11b.
Setup:
Computer and router share a secret long term key k. k is 104 bits.
The message (a frame) consists of the data m and a CRC:
$m \| \mathrm{CRC}(m)$
The pad is calculated as follows:
$G(\mathrm{IV} \| k)$
Where IV is a 24-bit sequence (the initialisation vector).
So:

Key for frame 1: $(0 \| k)$
Key for frame 2: $(1 \| k)$
Key for frame 3: $(2 \| k)$

WIFI, WEP, 802.11b


Problem 1: after $2^{24} \approx 16$M frames the IV cycles and
different frames will be encrypted with the same key.
Problem 2: on many devices, power cycling resets the
IV to 0.
Problem 3: the keys fed into the PRG are of the format
$\mathrm{IV} \| k$, where k remains the same. Only the 24-bit prefix
changes, so the keys are related (the 104-bit suffix is the same).
The PRG used in WEP (RC4, read about it) isn't designed to
handle related keys.
Attacks on the PRG are known: when observing something
in the order of 40,000 frames, the key can be fully
recovered (a few minutes on a busy network).

A better alternative
Treat all the frames as one long stream
$m_1 \| m_2 \| m_3 \| \dots$ and XOR it with $G(k)$.
The first segment of the pad encrypts $m_1$, the second
segment of the pad encrypts $m_2$, etc.
Each segment of the pad looks random and is
unrelated to the others.

Another attack: OTP provides confidentiality and not integrity
Given: a message m.
Cipher text is then $m \oplus k$.
Now the attacker attacks/modifies the c.t. by
doing this:
$(m \oplus k) \oplus p$
When the receiver receives $(m \oplus k) \oplus p$ and
decrypts it, he will end up with $m \oplus p$.
The receiver will never know that p has been
injected in the message.

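Another attack: OTP provides confidentiality and not integrity
Sender:
P.t. = "Bob"
C.t. = "Bob" $\oplus$ k

Attacker:
Sees "Bob" $\oplus$ k.
Designs a p such that ("Bob" $\oplus$ k) $\oplus$ p = "Eve" $\oplus$ k, i.e. p = "Bob" $\oplus$ "Eve".

Receiver:
Gets "Eve" $\oplus$ k.
Decrypts it to "Eve".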

Another attack: OTP provides confidentiality and not integrity

"Bob" in hex:             42 6F 62
"Eve" in hex:             45 76 65
"Bob" XOR "Eve" (= p):    07 19 07

This defect occurs because OTP is malleable.
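
The tampering can be replayed in a few lines. The names xor, p and tampered are chosen for this sketch only; the attacker never learns k, yet controls what the receiver reads.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


m = b"Bob"
k = os.urandom(len(m))       # one-time pad known only to sender and receiver
c = xor(m, k)                # what the attacker intercepts

p = xor(b"Bob", b"Eve")      # 07 19 07, computed without knowing k
tampered = xor(c, p)         # forwarded instead of c

received = xor(tampered, k)  # the receiver decrypts as usual
```

Detecting this kind of change needs a separate integrity mechanism; encryption with a pad alone does not provide one.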
