You are on page 1of 49

Cryptography

Topic 10: Introduction to Encryption

Topic 1 1
Goals of Cryptography
• The most fundamental problem cryptography
addresses: ensure security of communication over
insecure medium
• What does secure communication mean?
– confidentiality (privacy, secrecy)
• only the intended recipient can see the communication
– integrity (authenticity)
• the communication is generated by the alleged sender
• What does insecure medium mean?
– Two possibilities:
• Passive attacker: the adversary can eavesdrop
• Active attacker: the adversary has full control over the
communication channel

Topic 1 2
Approaches to Secure Communication

• Steganography
– “covered writing”
– hides the existence of a message
– depends on secrecy of method

• Cryptography
– “hidden writing”
– hide the meaning of a message
– depends on secrecy of a short key, not method

Topic 1 3
Terms: Cryptography, cryptanalysis,
and cryptology
• Cryptography,
– Traditionally, designing algorithms/protocols
– Nowadays, often synonym with cryptology

• Cryptanalysis
– Breaking algorithms/protocols

• Cryptology: both cryptography & cryptanalysis


– Becoming less common

Topic 1 4
What Cryptography is About?
• Constructing and analyzing protocols which
enables parties to achieve objectives,
overcoming the influence of adversaries.
– a protocol (or a scheme) is a suite of algorithms that
tell each party what to do

• How to devise and analyze protocols


– understand the threats posed by the adversaries and
the goals

Topic 1 5
A Sample List of Other Goals in
Modern Cryptography
• Modern cryptography covers many topics beyond
secure communication
–Pseudo-random number generation
–Non-repudiation: Digital signatures
–Zero-knowledge proof
–Commitment schemes
–E-voting
–Secret sharing
–Secure Multi-party Computation (Secure Function
Evaluation)
–…

Topic 1 6
History of Cryptography
• 2500+ years
• An ongoing battle between codemakers and
codebreakers
• Driven by communication & computation
technology
– paper and ink (until end of 19th century)
– cryptographic engine & telegram, radio
• Enigma machine, Purple machine used in WWII
– computers & digital communication

Topic 1 7
Major Events in History of
Cryptography
• Mono-alphabetical ciphers (Before 1000 AD)
• Frequency analysis (Before 1000 AD)
• Cipher machines (early 1900’s)
• Shannon developed theory of perfect secrecy and information theoretical
security (around 1950)
• US adopts Data Encryption Standard in 1977
• Notion of public key cryptography and digital signatures introduced
(1970~1976)
• The study of cryptography becomes mainstream in the research
community (1976)
• Development of computational security and other theoretical foundation
of modern cryptography (1980’s)

Topic 1 8
What is This Course About?
• Mostly mathematical
– Understand the mathematics underlying the
cryptographic algorithms & protocols
– Understand the power and limitations of cryptographic
tools
– Understand the formal approach to security in modern
cryptography

Topic 1 9
Backgrounds Necessary for the Course

• A bit of probability
• Algorithms and complexity
• Mathematical maturity
– understand what is (and what is not) a proper
definition
– know how to write a proof

Topic 1 10
Symmetric-key Encryption
• This is what cryptography is all about until 1970.
• Two parties (often called a sender and a receiver)
share some secret information called a key.

• Sender uses the key to encrypt (or “scramble”) the


message, before it is sent
• Receiver uses the same key to decrypt (or
“unscramble”) and recover the original message

Topic 1 11
Basic Terminology for Encryption
• Plaintext
– An original message
– Also referred to as message
• Plaintext space (aka Message space)
– the set consisting of all possible plaintexts
• Ciphertext
– A scrambled message
• Ciphertext space
– The set consisting of all possible scrambled message
• Key secret used in transformation
• Key space K

Topic 1 12
Notation for Symmetric-key
Encryption
• A symmetric-key encryption scheme is comprised of three
algorithms
– Gen the key generation algorithm
• The algorithm must be probabilistic/randomized
• Output: a key k
– Enc the encryption algorithm
• Input: key k, plaintext m
• Output: ciphertext c := Enck(m)
– Dec the decryption algorithm
• Input: key k, ciphertext c
• Output: plaintext m := Deck(m)
Requirement: k m [ Deck(Enck(m)) = m ]

Topic 1 13
Shift Cipher
• The Key Space K :
– [0 .. 25]
• Encryption given a key k:
– each letter in the plaintext P is
replaced with the k’th letter
following corresponding number
(shift right)
• Decryption given k:
– shift left

History: k = 3, Caesar’s cipher

Topic 1 14
Shift Cipher: Cryptanalysis
• Can an attacker find K?
– YES: by a bruteforce attack through exhaustive
key search,
• How to tell whether a shift is correct?
– key space is small (<= 26 possible keys).

• Cipher key space needs to be large


enough.
• Exhaustive key search can be effective.

Topic 1 15
Mono-alphabetic Substitution Cipher
• The key space: all permutations of  = {A, B, C, …, Z}
• Encryption given a key :
– each letter X in the plaintext P is replaced with (X)
• Decryption given a key :
– each letter Y in the cipherext P is replaced with -1(Y)

Example:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
= B A D C Z H W Y G O Q X S V T R N M L K J I P F E U
BECAUSE  AZDBJSZ

Topic 1 16
Strength of the Mono-alphabetic
Substitution Cipher
• Exhaustive search is difficult
– key space size is 26!  4×1026  288

• Dominates the art of secret writing throughout the first


millennium A.D.

• Thought to be unbreakable by many back then

• How to break it?

Topic 1 17
Cryptanalysis of Substitution Ciphers:
Frequency Analysis
• Basic ideas:
– Each language has certain features: frequency of
letters, or of groups of two or more letters.
– Substitution ciphers preserve the language features.
• History of frequency analysis
– Discovered by the Arabs; earliest known description is in a book
by the ninth-century scientist al-Kindi
– Rediscovered or introduced from the Arabs in the Europe during
the Renaissance
• Frequency analysis made substitution cipher insecure

Topic 1 18
Frequency of Letters in English

14

12

10

8
Series1
6

0
a b c d e f g h i j k l m n o p q r s t u v w x y z

Topic 1 19
How to Defeat Frequency
Analysis?
• Use larger blocks as the basis of substitution.
Rather than substituting one letter at a time,
substitute 64 bits at a time, or 128 bits.
– Leads to block ciphers such as DES & AES.

• Use different substitutions to get rid of frequency


features.
– Leads to polyalphabetical substituion ciphers, cipher
machines, and stream ciphers

Topic 1 20
Cryptography
CS 555

Topic 11: Evolution of Classical


Cryptography

Topic 2 21
Lecture Outline
• Basics of probability
• Vigenere cipher.
• Attacks on Vigenere: Kasisky Test
and Index of Coincidence
• Cipher machines: The Enigma
machine.

• Required readings:
– Katz & Lindell: 1.1 to 1.3
• Recommended readings
– The Code Book by Simon Singh

Topic 2 22
Begin Math

CS555 Topic 2 23
Random Variable
Definition
A discrete random variable, X, consists of a finite set
X, and a probability distribution defined on X. The
probability that the random variable X takes on the
value x is denoted Pr[X =x]; sometimes, we will
abbreviate this to Pr[x] if the random variable X is fixed.
It must be that
0  Pr[ x] for all x  X
 Pr[ x]  1
xX

Topic 2 24
Example of Random Variables
• Let random variable D1 denote the outcome of throw one
dice (with numbers 0 to 5 on the 6 sides) randomly, then
D={0,1,2,3,4,5} and Pr[D1=i] = 1/6 for 0 i  5
• Let random variable D2 denote the outcome of throw a
second such dice randomly
• Let random variable S1 denote the sum of the two dices,
then S ={0,1,2,…,10}, and Pr[S1=0] = Pr[S1=10] =
1/36 Pr[S1=1] = Pr[S1=9] = 2/36 = 1/18 …
• Let random variable S2 denote the sum of the two dices
modulo 6, what is the distribution of S2

Topic 2 25
Relationships between Two Random
Variables
Definitions
Assume X and Y are two random variables,
then we define:
- joint probability: Pr[x, y] is the probability that
X takes value x and Y takes value y.
- conditional probability: Pr[x|y] is the probability
that X takes on the value x given that Y takes
value y.
Pr[x|y] = Pr[x, y] / Pr[y]
- independent random variables: X and Y
are said to be independent if Pr[x,y] = Pr[x]P[y],
for all x  X and all y  Y.

Topic 2 26
Examples
• Joint probability of D1 and D2
for 0i, j5, Pr[D1=i, D2=j] = ?
• What is the conditional probability Pr[D1=i | D2=j]
for 0i, j5?
• Are D1 and D2 independent?

• Suppose D1 is plaintext and D2 is key, and S1


and S2 are ciphertexts of two different ciphers,
which cipher would you use?

Topic 2 27
Examples to think after class
• What is the joint probability of D1 and S1?
• What is the joint probability of D2 and S2?

• What is the conditional probability Pr[S1=s | D1=i] for 0i5 and


0s10?
• What is the conditional probability Pr[D1=i | S2=s] for 0i5 and
0s5?

• Are D1 and S1 independent?


• Are D1 and S2 independent?

CS555 Topic 2 28
Bayes’ Theorem

Bayes’ Theorem
If P[y] > 0 then

P[ x]P[ y | x]
P[ x | y ] 
P[ y ]
Corollary
X and Y are independent random variables
iff P[x|y] = P[x], for all x  X and all y  Y.

CS555 Topic 2 29
End Math

CS555 Topic 2 30
Ways to Enhance the Substitution
Cipher against Frequency Analysis
• Using nulls
– e.g., using numbers from 1 to 99 as the ciphertext
alphabet, some numbers representing nothing and are
inserted randomly
• Deliberately misspell words
– e.g., “Thys haz thi ifekkt off diztaughting thi ballans off
frikwenseas”
• Homophonic substitution cipher
– each letter is replaced by a variety of substitutes
• These make frequency analysis more difficult,
but not impossible

CS555 Topic 2 31
Towards the Polyalphabetic
Substitution Ciphers
• Main weaknesses of monoalphabetic substitution ciphers
– In ciphertext, different letters have different frequency
• each letter in the ciphertext corresponds to only one letter in the
plaintext letter
• Idea for a stronger cipher (1460’s by Alberti)
– Use more than one cipher alphabet, and switch between them
when encrypting different letters
• As result, frequencies of letters in ciphertext are similar
• Developed into a practical cipher by Vigenère (published
in 1586)

CS555 Topic 2 32
The Vigenère Cipher
Treat letters as numbers: [A=0, B=1, C=2, …, Z=25]
Number Theory Notation: Zn= {0, 1, …, n-1}
Definition:
Given m, a positive integer, P = C = (Z26)n, and K = (k1, k2,
… , km) a key, we define:
Encryption:
ek(p1, p2… pm) = (p1+k1, p2+k2…pm+km) (mod 26)
Decryption:
dk(c1, c2… cm) = (c1-k1, c2-k2 … cm- km) (mod 26)
Example:
Plaintext: C R Y P T O G R A P H Y
Key: LUCKLUC KLUCK
Ciphertext: N L A Z E I I B L J J I

CS555 Topic 2 33
Security of Vigenere Cipher
• Vigenere masks the frequency with which a
character appears in a language: one letter in
the ciphertext corresponds to multiple letters
in the plaintext. Makes the use of frequency
analysis more difficult.
• Any message encrypted
by a Vigenere cipher is a
collection of as many shift ciphers as there
are letters in the key.

CS555 Topic 2 34
Vigenere Cipher: Cryptanalysis
• Find the length of the key.
• Divide the message into
that many simple
substitution encryptions.
• Solve the resulting simple
substitutions.
– how?

CS555 Topic 2 35
How to Find the Key Length?
• For Vigenere, as the length of the keyword
increases, the letter frequency shows less
English-like characteristics and becomes
more random.
• Two methods to find the key
length:
– Kasisky test
– Index of coincidence
(Friedman)

CS555 Topic 2 36
Kasisky Test
• Note: two identical segments of plaintext, will be
encrypted to the same ciphertext, if the they
occur in the text at the distance , (0 (mod
m), m is the key length).
• Algorithm:
– Search for pairs of identical
segments of length at least 3
– Record distances between
the two segments: 1, 2, …
– m divides gcd(1, 2, …)

CS555 Topic 2 37
Example of the Kasisky Test

Key K I N G K I N G K I N G K I N G K I N G K I N G
PT t h e s u n a n d t h e m a n i n t h e m o o n
CT D P R Y E V N T N B U K W I A O X B U K W W B T

CS555 Topic 2 38
Index of Coincidence (Friedman)
Informally: Measures the probability that two random
elements of the n-letters string x are identical.
Definition:
Suppose x = x1x2…xn is a string of n alphabetic characters.
Then Ic(x), the index of coincidence is:

I ( x)  P( x  x )
when i and j are uniformly randomly chosen from [1..n]
c i j

CS555 Topic 2 39
Index of Coincidence (cont.)

• Consider the plaintext x, and f0, f1, … f25 are the


frequencies with which A, B, … Z appear in x and p0, p1,
… p25 are the probabilities with which A, B, … Z appear
in x.
• That is pi = fi / n where n is the length of x

• We want to compute Ic(x).

• Given frequencies of all letters in an alphabet, index of


coincidence is a feature of the frequencies
• It does not change under substitution

CS555 Topic 2 40
Index of Coincidence (cont.)
• We can choose two elements out of the
string of size n in  n  ways
 2
• For each i, there are  fi  ways of choosing
 
the elements to be i 2 

S
 fi  S S
  
i 0  2 
 f i ( f i  1)  fi
2
S
  pi
2
I C ( x)   i 0
 i 0

 n n(n  1) n2 i 0
 
 2

CS555 Topic 2 41
Index of Coincidence of English
• For English, S = 25 and pi can be estimated

Letter pi Letter pi Letter pi Letter pi


A .082 H .061 O .075 V .010
B .015 I .070 P .019 W .023
C .028 J .002 Q .001 X .001
D .043 K .008 R .060 Y .020
E .127 L .040 S .063 Z .001
F .022 M .024 T .091
G .020 N .067 U .028

i  25
I c ( x)   pi  0.065
2

i 0

CS555 Topic 2 42
Finding the Key Length
y = y1y2…yn, , assum m is the key length, write y vertically in an
m-row array

 y1 y m 1 ... y n  m 1  y1
y y m 2 ... y nm 2  
 2 y2
 ... ... ... ...  …
 
 ym y2m ... yn  ym

CS555 Topic 2 43
Finding out the Key Length
• If m is the key length, then the text ``looks like’’
English text
i  25
I c ( y i )   pi  0.065 1  i  m
2

i 0

• If m is not the key length, the text ``looks like’’


random text and:

i 25
1 2 1 1
Ic   ( )  26  2   0.038
i 0 26 26 26

CS555 Topic 2 44
Rotor Machines
• Basic idea: if the key in Vigenere cipher is very long,
then the attacks won’t work
• Implementation idea: multiple rounds of substitutions
• A machine consists of multiple cylinders
– Each character is encrypted by multiple cylinders
– Each cylinder has 26 states, at each state it is a
substitution cipher
– Each cylinder rotates to change states according to
different schedule

CS555 Topic 2 45
Rotor Machines
• A m-cylinder rotor machine has
– 26m different substitution ciphers
• 263 = 17576
• 264 = 456,976
• 255 = 11,881,376

CS555 Topic 2 46
Earliest Enigma Machine
• Use 3 scramblers (motors): 17576
substitutions

• 3 scramblers can be used in any


order: 6 combinations

• Plug board: allowed 6 pairs of


letters to be swapped before the
encryption process started and
after it ended.

CS555 Topic 2 47
History of the Enigma Machine
• Patented by Scherius in 1918
• Widely used by the Germans from 1926 to the
end of second world war
• First successfully broken by the Polish’s in the
thirties by exploiting the repeating of the
message key
• Then broken by the UK intelligence during the
WW II

CS555 Topic 2 49
Coming Attractions …
• Information-Theoretic secrecy (Perfect secrecy),
One-Time Pad

• Recommended reading for next lecture:


Katz and Lindell: Chapter 2

CS555 Topic 2 50

You might also like