You are on page 1of 8

Before we start with the technical

material, I want to tell you a little bit


about the history of cryptography. There's
a beautiful book on this topic by David
Kahn called the codebreakers. It covers
the history of cryptography all the way
from the Babylonian era, to the
present. Here, I'm just going to give you
a few examples of historical ciphers, all
of which are badly broken. So to talk
about ciphers the first thing I'm going to
do is introduce our friends Alice and Bob,
who are gonna be with us for the rest of
the quarter. So Alice and Bob are trying
to communicate securely and there is an
attacker who's trying to eavesdrop on
their conversation. So to communicate
securely, they're going to share a secret
key, which I'll denote by K. They both know
the secret key. But the attacker does not
know anything about this key K.
So now they're gonna use a cipher, which
is a pair of algorithms, an
encryption algorithm denoted by E, and a
decryption algorithm, D. These algorithms
work as follows: the encryption algorithm
E takes the message M as inputs. And it
takes as inputs, the key K. I'm gonna
put a wedge above the key input just
to denote the fact that this input is
really the key input. And then it outputs
a ciphertext. Which is the encryption of
the message M using the key K. I'm always
gonna write the key first. Now, and when I
write: = what I mean is that the
expression defines what C what the
variable C stands for. Now the ciphertext
is transmitted over the internet to Bob,
somehow. Actually it could be transmitted
over the internet. Could be transmitted
using an encrypted file system, it doesn't
really matter, but when the ciphertext
reaches Bob, he can plug it into the
decryption algorithm and give the
decryption algorithm the same key K.
Again, I'm gonna put a wedge here as well. To
denote the key inputs and the decryption
algorithm outputs the original plaintext
message. Now the reason we say that these
are symmetric ciphers is that both the
encrypter and decrypter actually use the
same key K. As we'll see later
in the course, there are ciphers where the
encrypter uses one key and the decrypter
uses a different key. But here we're just
going to focus on symmetric cipher where
both sides use the same key. Okay, so let
me give you a few historic examples of
ciphers. The first example, the simplest
one is called the substitution cipher. I

am sure all of you played the substitution


ciphers when you were in kindergarten.
Basically a key for a substitution cipher
is a substitution table that basically
says how to map our letters. So here for
example the letter A would be mapped to C,
the letter B would be mapped to W the
letter C would be mapped to N so on and so
forth and then the letter Z would be
mapped, say, to A. So this is one example
of a key for a substitution cipher. So just
to practice the notation we introduced
before, the encryption of a certain
message using this key, let's say the
message is BCZA, the encryption of this
message using this key here would be, is
done by substituting one letter at a time.
So B becomes W, C becomes N, Z becomes A,
and A becomes C. So the encryption of BCZA
is WNAC, and this defines the ciphertext.
Similarly we can decrypt the ciphertext using the same key and of course
we'll get back the original message.
Okay. So, just for historical
reasons, there is one example of something
that's related to the substitution cipher
called the Caesar cipher. The Caesar
cipher, actually, is not really a cipher
at all. And the reason is that it doesn't
have a key. What a Caesar cipher is, is
basically a substitution cipher where the
substitution is fixed. Namely, it's a
shift by three. So A becomes D, B becomes
E, C becomes F and so on and so forth.
What is it, Y becomes B and Z becomes C.
It's a fixed substitution that's applied
to all plaintext messages. So, again,
this is not a cipher, because there is no
key, the key is fixed. So if an attacker
knows how our encryption scheme works, he
can easily decrypt the message. The key is
not random, and therefore, decryption is
very easy once you understand how the
scheme actually works. Okay, so now, let's
go back to the substitution cipher, where
the keys really are chosen at random, the
substitution tables are chosen at random.
And let's see how we break this
substitution cipher. Turns out to be very
easy to break. The first question is, how
big is the key space? How many different
keys are there, assuming we have 26
letters? So, I hope all of you said that,
the number of keys is 26 factorial,
because, a key, a substitution key, is
simply a table, a permutation of all 26
letters. The number of permutations of 26
letters, is 26 factorial. If you calculate
this out, twenty-sixth factorial is about
two to the 88th, which means that
describing a key in a substitution cipher

takes about 88 bits. So, each key is


represented by about 88 bits. Now, this is
a perfectly fine size for a keyspace. In
fact, we're gonna be seeing ciphers that
are perfectly secure, or, you know, that
are adequately secure, with key spaces
that are roughly of this size. However,
even though the substitution cipher has a
large key space of size 2^88, it's
still terribly insecure. So let's see how
to break it. And to break it, we're going
to be using letter frequencies. So the
first question is, what is the most
frequent letter in English text? So I
imagine all of you know that, in fact, E
is the most common letter. And that is
gonna, if we make it quantitative, that's
gonna help us break a substitution cipher.
So just given the ciphertext, we can
already recover the plaintext. So the way
we do that is, first of all, using
frequencies of English letters. So here's
how this works. So you give me an
encrypted message using the substitution
cipher. All I know is that the plaintext
is in English and I know that the letter E
is the most frequent letter in English.
And in fact, it appears 12.7 percent of
the time in standard English texts. So
what I'll do is I'll look at the ciphertext you gave me and I'm going to count
how many times every letter appears. Now
the most common letter in the ciphertext
is going to be the encryption of the
letter E with very high probability. So
now I'm able to recover one entry in the
key table. Mainly the letter, mainly now I
know what the letter E maps to. The
next, most common letter in English is the
letter T, that appears about 9.1 percent
of the time. So now again, I count how
many times each letter appears in the
ciphertext. And the second most frequent
letter is very likely to be the encryption
of the letter T. So now I've recovered a
second entry in the key table. And I can
continue this way. In fact, the letter A
is the next most common letter; it appears
8.1 percent of the time. So now I can
guess that the third most common letter in
the ciphertext is the encryption of the
letter A. And now I've recovered three
entries in the key table. Well, so now
what do I do? The remaining letters in
English appear roughly same amount of
time, other than some rare letters like Q
and X. But we're kinda stuck at this
point. We figured out three entries in
the key table but what do we do next? So,
the next idea is to use frequencies of
pairs of letters. Sometimes these are

called digrams. So, what I'll do is, I'll


count how many times each pair of letters
appears in the cipher text, and, I know
that in English, the most common pairs of
letters are things like, HE, AN. IN, I
guess TH is another common pair of
letters. And so I know that the most
common pair of letters in the ciphertext
is likely to be the encryption of one of
these four pairs. And so by trial and
error I can sort of figure out more entry ...
more elements in the key table and again
by more trial and error this way by
looking at trigrams. I can actually figure
out the entire key table. So the bottom
line here is that in fact the substitution
cipher is vulnerable to the worst possible
type of attack namely a ciphertext only
attack. Just given the ciphertext the
attack that can recover the decryption key
and therefore recover the original plaintext. So there's really no point in
encrypting anything using the substitution
cipher, because the attacker can easily
decrypt it all; you might as well send
your plaintext completely in the clear.
So, now we're gonna fast-forward to the
Renaissance, and, I guess we're moving
from the Roman Era to the Renaissance, and
look at a cipher designed by a fellow
named Vigener, who lived in the
sixteenth century. He designed a couple
of ciphers. Here I'm gonna show you a
variant of one of his ciphers, this is
called a Vigener cipher. So, in a
Vigener cipher, the key is a
word. In this case the word, is CRYPTO,
it's got six letters in it. And then to
encrypt a message, what you do is, you
write the message under the key. So in
this case the message is "WHAT A NICE DAY
TODAY" and then you replicate the key as
many times as needed to cover the message.
And then the way you encrypt is basically,
you add the key letters to the message
letters, modulo 26. So just to give
you an example here, for example, if you
add Y and A, you get Z. If you add T and
A, you get U. And you do this for all the
letters. And remember, whenever you add,
you add modulo 26. So if you go past Z,
you go back to A. So, that's the
Vigener cipher. And in fact,
decryption is just as easy as encryption.
Basically, the way you would decrypt is,
again, you would write the ciphertext
under the key. You would replicate the key
and then you would subtract the key from
the ciphertext to get the original plain
text message. So, breaking the Vigener
cipher is actually quite easy. Let me

show you how you do it. The first thing we


need to do is we need to assume that we
know the length of the key. So let's just
assume we know that. In this case, the
length of the key is six. And then what we
do is we break the cipher text into groups
of six letters each, okay? So we're gonna
get a bunch, a bunch of groups like this.
Each one, contains six letters. And then
we're gonna look at, the first letter in
each group. Okay? So, in this case, yes,
we're looking at the first letter, every
six characters. Now, what do we know about
these six letters? We know that, in fact,
they're all encrypted using the same
letter in the ciphertext. All of these
are encrypted using the letter c. In other
words. Z L W is a shift by three of the
plaintext letters. So if we collect all
these letters then the most common letter
among the set is likely to be the
encryption of E, right? E is the most
common letter in English, therefore, if I
look at every sixth letter, the most
common letter in that set is likely to be
the encryption of the letter E. Ahah! So
let's just suppose that in fact the most
common letter every sixth letter happens
to be the letter H. Then we know that E
plus the first letter of the key is equal
to H. That says that the first letter of
the key is equal to H minus E. And in fact
that is the letter C. So now we've
recovered the first letter of the key. Now
we can continue doing this with the second
letter. So we look at the second letter in
every group of six characters and again,
we repeat the same exercise. We find
the most common letter among the sets and
we know that the most, this most common
letter is likely the encryption of E and
therefore whatever this letter, whatever
this most common letter is if we
subtract 'E' from it we're going to get the
second letter of the key. And so on and so
forth. With, the third letter every six
characters. And this way we recover, the
entire key. And that allows us to
decrypt, the message. Now, the only caveat
is that I had to assume ahead of time that
I know the length of the key, which in
this case is six. But if I don't know the
length of the key ahead of time, that's
not a problem either. What I would do is I
would run this decryption procedure,
assuming the key length is one. Then I'd
run it assuming the key length is two.
Then I would run it assuming the key
lengths is three. And so on, and so on,
and so on, until finally I get a message.

I get a decryption that makes sense,


that's sensical. And once I do that I know
that I've kind of recovered the right
length of the key and I know that
I've also recovered the right key and
therefore the right message. Okay? So
very, very quickly, you can recover, you
can decrypt Vigener cyphers. Again,
this is a ciphertext only attack. The
interesting thing is, Vigener had a
good idea here. This addition mod
26 is actually a good idea, and we'll see
that later, except it's executed very
poorly here. And so we'll correct that, a
little bit later. Okay, we're gonna
fast-forward now from the Renaissance to, to the nineteenth century where
everything became electric. And so people
wanted to design ciphers that use electric
motors. In particular, these ciphers are
called rotor machines because they use
rotors. So an early example is called the
Hibber machine which uses a single motor.
Here you have a picture of this machine.
The, the motor, the, I guess the rotor is
over here. And the secret key is captured
by this disc here, it's embedded inside of
this disc, which rotates by one notch
every time you press a key on the
typewriter, okay? So every time you, that
you hit a key, the disc rotates by one
notch. Now what does this key do? Well,
the key actually encodes a substitution
table. So ... and therefore, the disc actually
is the secret key. And as I said, this
disc encodes a substitution table. In this
case, if you happen to press C as the
first letter, output would be the letter
T. And then the disc would rotate by one
notch. After rotating, rotating by one
notch, the new substitution table becomes
the one shown here. You see that E,
basically, moves up, and then the
remaining letters move down. So imagine
this is basically a two dimensional
rendering of the disc rotating by one
notch. Then you press the next letter. And
the disc rotates again. You notice again N
moved up and the remaining letters moved
down. So in particular, if we hit the
letter C three times, the first time we
would output, the output would be T, the
second time the output would be S, and the
third time the output wold be K. So this
is how the single rotor machine works and
as it turned out very quickly after it was
advertised it was again broken basically
using letter frequency, digram
frequencies and trigram frequencies. It's
not that hard given enough ciphertext to
directly recover the secret key and then

the message. Again, a ciphertext only


attack. So to kind of work against these
frequency attacks, these statistical
attacks, these rotor machines became more
and more complicated over time. Until
finally, I'm sure you've all heard of the
Enigma. The Enigma is a kind of
complicated rotor machine. It uses
three, four, or five rotors. There are
different versions of the Enigma
machine. Here you see an example of the
Enigma machine with three rotors. The
secret key in the Enigma machine is the
initial setting of the rotors. Okay. So in
the case of three rotors there would be 26
cubed possible different keys. When you
type on the typewriter basically these
rotors here rotate at different rates. Oh,
forgot to say this is a diagram of an Enigma
machine using four rotors. As you type on
the typewriter the rotors rotate and
output the appropriate, letters of, the
ciphertext. So in this case the number
of keys is 26 to the fourth, which is two
to the eighteen, which is relatively a
small key space. Today you can kind of,
brute-force a search using a computer
through two to the eighteen different
keys, very, very quickly. You know, my
wristwatch can do it in just a few
seconds, I guess. And so, these, this
Enigma machine was, already was using
relatively small key spaces. But I'm sure
you've all heard that the British
cryptographers at Bletchley Park were
able to mount ciphertext only attacks on
the Enigma machine. They were able to
decrypt German ciphers back in World, in
World War Two. And that played an
important role in many different battles
during the war. After the war, I guess
that was the end kind of the mechanical
age and started the digital age where
folks were using computers. And as the
world kind of migrated to using computers,
the government realized that it's buying a
lot of digital equipment from industry.
And so they wanted industry to use a good
cipher so that when it buys equipment from
the, from industry, it would be getting
equipment with, with a decent cipher. And
so the government put out this request for
proposal for a data encryption standard,
a Federal data encryption standard. And
we're gonna talk about this effort, in
more detail later on in the course, but in
1974 a group at IBM put together a cipher
that became known as DES, data encryption
standard, which became a Federal standard
for encrypting data. The key space for DES

is two to the 56, which is relatively


small these days, but was large enough
back in 1974. And another interesting
thing about DES is, rather than, unlike
rotor machines which encrypt one character
at a time the data encryption standard
encrypts 64 bits at a time, namely eight
characters at a time. And we'll see the
significance of this later on in the
course. Because DES uses such
a small key space, these days it can be
broken by a brute-force search and so
these days DES is considered
insecure and should not be used in
projects. Unfortunately, it is used in
some legacy systems, but it definitely is
on its way out and definitely should not
be used in new projects. Today there are
new ciphers, things like the advanced
encryption standard which uses 128 bit
keys. Again, we'll talk about the advanced
encryption standards in much more detail
later on in the course. There are many,
many other types of ciphers. I mentioned
Salsa20 here. We'll see why in just a
minute. But this is all for the quick
historical survey and now we can get into
the more technical material.

You might also like