
T. W. Körner

February 14, 2011

Transmitting messages is an important practical problem. Coding theory includes the study of compression codes which enable us to send messages cheaply and error correcting codes which ensure that messages remain legible even in the presence of errors. Cryptography on the other hand, makes sure that messages remain unreadable — except to the intended recipient. These techniques turn out to have much in common.

Many Part II courses go deeply into one topic so that you need to understand the whole course before you understand any part of it. They often require a firm grasp of some preceding course. Although this course has an underlying theme, it splits into parts which can be understood separately and, although it does require knowledge from various earlier courses, it does not require mastery of that knowledge. All that is needed is a little probability, a little algebra and a fair amount of common sense. On the other hand, the variety of techniques and ideas probably makes it harder to understand everything in the course than in a more monolithic course.

Small print The syllabus for the course is defined by the Faculty Board Schedules (which are minimal for lecturing and maximal for examining). I should very much appreciate being told of any corrections or possible improvements however minor. This document is written in LaTeX2e and should be available from my home page

http://www.dpmms.cam.ac.uk/~twk

in latex, dvi, ps and pdf formats. Supervisors can obtain comments on the exercises at the end of these notes from the secretaries in DPMMS or by e-mail from me. My e-mail address is twk@dpmms.

These notes are based on notes taken in the course of a previous lecturer Dr Pinch, on the excellent set of notes available from Dr Carne's home page and on Dr Fisher's collection of examples. Dr Parker and Dr Lawther produced two very useful lists of corrections. Any credit for these notes belongs to them, any discredit to me. This is a course outline. A few proofs are included or sketched in these notes but most are omitted. Please note that vectors are row vectors unless otherwise stated.


Contents

1 Codes and alphabets
2 Huffman's algorithm
3 More on prefix-free codes
4 Shannon's noiseless coding theorem
5 Non-independence
6 What is an error correcting code?
7 Hamming's breakthrough
8 General considerations
9 Some elementary probability
10 Shannon's noisy coding theorem
11 A holiday at the race track
12 Linear codes
13 Some general constructions
14 Polynomials and fields
15 Cyclic codes
16 Shift registers
17 A short homily on cryptography
18 Stream ciphers
19 Asymmetric systems
20 Commutative public key systems
21 Trapdoors and signatures
22 Quantum cryptography
23 Further reading
24 Exercise Sheet 1
25 Exercise Sheet 2
26 Exercise Sheet 3
27 Exercise Sheet 4

1 Codes and alphabets

Originally, a code was a device for making messages hard to read. The study of such codes and their successors is called cryptography and will form the subject of the last quarter of these notes. However, in the 19th century the optical [1] and then the electrical telegraph made it possible to send messages speedily, but only after they had been translated from ordinary written English or French into a string of symbols.

The best known of the early codes is the Morse code used in electronic telegraphy. We think of it as consisting of dots and dashes but, in fact, it had three symbols: dot, dash and pause, which we write as •, − and ∗. Morse assigned a code word consisting of a sequence of symbols to each of the letters of the alphabet and each digit. Here are some typical examples.

A → •−∗     B → −•••∗    C → −•−•∗
D → −••∗    E → •∗       F → ••−•∗
O → −−−∗    S → •••∗     7 → −−•••∗

The symbols of the original message would be encoded and the code words sent in sequence, as in

SOS → •••∗ −−−∗ •••∗,

and then decoded in sequence at the other end to recreate the original message.

Exercise 1.1. Decode −•−•∗ −−−∗ −••∗ •∗.

[1] See The Count of Monte Cristo and various Napoleonic sea stories. A statue to the inventor of the optical telegraph (semaphore) was put up in Paris in 1893 but melted down during World War II and not replaced (http://hamradio.nikhef.nl/tech/rtty/chappe/). In the parallel universe of Disc World the clacks is one of the wonders of the Century of the Anchovy.
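Because every Morse code word ends with the pause symbol, decoding a received signal is just a matter of splitting on pauses. A minimal Python sketch (the table covers only the letters shown above, with dot, dash and pause written '.', '-' and '*'):

```python
# A toy Morse table (pause written '*'), covering only the letters shown
# above. Every code word ends in '*', which is what makes decoding easy.
MORSE = {
    "A": ".-*", "B": "-...*", "C": "-.-.*", "D": "-..*",
    "E": ".*", "F": "..-.*", "O": "---*", "S": "...*", "7": "--...*",
}
DECODE = {v: k for k, v in MORSE.items()}

def encode(message):
    return "".join(MORSE[ch] for ch in message)

def decode(signal):
    # Each '*' terminates exactly one code word, so split on it.
    return "".join(DECODE[w + "*"] for w in signal.split("*") if w)

print(encode("SOS"))           # ...*---*...*
print(decode(encode("SOS")))   # SOS
```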


Morse's system was intended for human beings. Once machines took over the business of encoding, other systems developed. A very influential one called ASCII was developed in the 1960s. This uses two symbols 0 and 1 and all code words have seven symbols. In principle, this would give 128 possibilities, but 0000000 and 1111111 are not used, so there are 126 code words allowing the original message to contain a greater variety of symbols than Morse code. Here are some typical examples.

A → 1000001   B → 1000010   C → 1000011
a → 1100001   b → 1100010   c → 1100011
+ → 0101011   ! → 0100001   7 → 0110111

Exercise 1.2. Encode b7!. Decode 110001111000011100010.

More generally, we have two alphabets 𝒜 and ℬ and a coding function c : 𝒜 → ℬ∗, where ℬ∗ consists of all finite sequences of elements of ℬ. If 𝒜∗ consists of all finite sequences of elements of 𝒜, then the encoding function c∗ : 𝒜∗ → ℬ∗ is given by

c∗(a_1 a_2 . . . a_n) = c(a_1)c(a_2) . . . c(a_n).

We demand that c∗ is injective, since otherwise it is possible to produce two messages which become indistinguishable once encoded. We call codes for which c∗ is injective decodable.

For many purposes, we are more interested in the collection of code words 𝒞 = c(𝒜) than the coding function c. If we look at the code words of Morse code and the ASCII code, we observe a very important difference. All the code words in ASCII have the same length (so we have a fixed length code), but this is not true for the Morse code (so we have a variable length code).

Exercise 1.3. Explain why (if c is injective) any fixed length code is decodable.

A variable length code need not be decodable even if c is injective.

Exercise 1.4. (i) Let 𝒜 = ℬ = {0, 1}. If c(0) = 0, c(1) = 00, show that c is injective but c∗ is not.

(ii) Let 𝒜 = {1, 2, 3, 4, 5, 6} and ℬ = {0, 1}. Show that there is a variable length coding c such that c is injective and all code words have length 2 or less. Show that there is no decodable coding c such that all code words have length 2 or less.

However, there is a family of variable length codes which are decodable in a natural way.


Definition 1.5. Let ℬ be an alphabet. We say that a finite subset 𝒞 of ℬ∗ is prefix-free if, whenever w ∈ 𝒞 is an initial sequence of w′ ∈ 𝒞, then w = w′. If c : 𝒜 → ℬ∗ is a coding function, we say that c is prefix-free if c is injective and c(𝒜) is prefix-free.

If c is prefix-free, then, not only is c∗ injective, but we can decode messages on the fly. Suppose that we receive a sequence b_1, b_2, . . . . The moment we have received some c(a_1), we know that the first message was a_1 and we can proceed to look for the second message. (For this reason prefix-free codes are sometimes called instantaneous codes or self punctuation codes.)

Exercise 1.6. Let 𝒜 = {0, 1, 2, 3}, ℬ = {0, 1}. If c, c̃ : 𝒜 → ℬ∗ are given by

c(0) = 0     c̃(0) = 0
c(1) = 10    c̃(1) = 01
c(2) = 110   c̃(2) = 011
c(3) = 111   c̃(3) = 111

show that c is prefix-free, but c̃ is not. By thinking about the way c̃ is obtained from c, or otherwise, show that c̃∗ is injective.

Exercise 1.7. Why is every injective fixed length code automatically prefix-free?

From now on, unless explicitly stated otherwise, c will be injective and the codes used will be prefix-free. In section 3 we show that we lose nothing by confining ourselves to prefix-free codes.

2 Huffman's algorithm

An electric telegraph is expensive to build and maintain. However good a telegraphist was, he could only send or receive a limited number of dots and dashes each minute. (This is why Morse chose a variable length code. The telegraphist would need to send the letter E far more often than the letter Q, so Morse gave E the short code •∗ and Q the long code −−•−∗.) It is possible to increase the rate at which symbols are sent and received by using machines, but the laws of physics (backed up by results in Fourier analysis) place limits on the number of symbols that can be correctly transmitted over a given line. (The slowest rates were associated with undersea cables.)

Customers were therefore charged so much a letter or, more usually, so much a word [2] (with a limit on the permitted word length). Obviously it made sense to have books of 'telegraph codes' in which one five letter combination, say, 'FTCGI' meant 'are you willing to split the difference?' and another 'FTCSU' meant 'cannot see any difference' [3].

[2] Leading to a prose style known as telegraphese. 'Arrived Venice. Streets flooded. Advise.'

Today messages are usually sent as binary sequences like 01110010 . . . , but the transmission of each digit still costs money. If we know that there are n possible messages that can be sent and that n ≤ 2^m, then we can assign each message a different string of m zeros and ones (usually called bits) and each message will cost mK cents, where K is the cost of sending one bit.

However, this may not be the best way of saving money. If, as often happens, one message (such as 'nothing to report') is much more frequent than any other, then it may be cheaper on average to assign it a shorter code word even at the cost of lengthening the other code words.

Problem 2.1. Given n messages M_1, M_2, . . . , M_n such that the probability that M_j will be chosen is p_j, find distinct code words C_j consisting of l_j bits so that the expected cost

K Σ_{j=1}^{n} p_j l_j

of sending the code word corresponding to the chosen message is minimised. Of course, we suppose K > 0.

The problem is interesting as it stands, but we have not taken into account the fact that a variable length code may not be decodable. To deal with this problem we add an extra constraint.

Problem 2.2. Given n messages M_1, M_2, . . . , M_n such that the probability that M_j will be chosen is p_j, find a prefix-free collection of code words C_j consisting of l_j bits so that the expected cost

K Σ_{j=1}^{n} p_j l_j

of sending the code word corresponding to the chosen message is minimised.

In 1951 Huffman was asked to write an essay on this problem as an end of term university exam. Instead of writing about the problem, he solved it completely.

[3] If the telegraph company insisted on ordinary words you got codes like 'FLIRT' for 'quality of crop good'. Google 'telegraphic codes and message practice, 1870-1945' for lots of examples.


Theorem 2.3. [Huffman's algorithm] The following algorithm solves Problem 2.2 with n messages. Order the messages so that p_1 ≥ p_2 ≥ . . . ≥ p_n. Solve the problem with n − 1 messages M′_1, M′_2, . . . , M′_{n−1} such that M′_j has probability p_j for 1 ≤ j ≤ n − 2, but M′_{n−1} has probability p_{n−1} + p_n. If C′_j is the code word corresponding to M′_j, the original problem is solved by assigning M_j the code word C′_j for 1 ≤ j ≤ n − 2 and M_{n−1} the code word consisting of C′_{n−1} followed by 0 and M_n the code word consisting of C′_{n−1} followed by 1.

Since the problem is trivial when n = 2 (give M_1 the code word 0 and M_2 the code word 1), this gives us what computer programmers and logicians call a recursive solution.

Recursive programs are often better adapted to machines than human beings, but it is very easy to follow the steps of Huffman's algorithm 'by hand'. (Note that the algorithm is very specific about the labelling of the code words.)

Example 2.4. Suppose n = 4 and M_j has probability j/10 for 1 ≤ j ≤ 4. Apply Huffman's algorithm.

Solution. (Note that we do not bother to reorder messages.) Combining messages in the suggested way, we get

1, 2, 3, 4
[1, 2], 3, 4
[[1, 2], 3], 4.

Working backwards, we get

C_{[[1,2],3]} = 0 . . . ,  C_4 = 1
C_{[1,2]} = 01 . . . ,  C_3 = 00
C_1 = 011,  C_2 = 010.

The reader is strongly advised to do a slightly more complicated example like the next.

Exercise 2.5. Suppose M_j has probability j/45 for 1 ≤ j ≤ 9. Apply Huffman's algorithm.

As we indicated earlier, the effects of Huffman's algorithm will be most marked when a few messages are highly probable.


Exercise 2.6. Suppose n = 64, M_1 has probability 1/2, M_2 has probability 1/4 and M_j has probability 1/248 for 3 ≤ j ≤ 64. Explain why, if we use code words of equal length, then the length of a code word must be at least 6. By using the ideas of Huffman's algorithm (you should not need to go through all the steps) obtain a set of code words such that the expected length of a code word sent is not more than 3.

Whilst doing the exercises, the reader must already have been struck by the fact that minor variations in the algorithm produce different codes. (Note, for example, that, if we have a Huffman code, then interchanging the roles of 0 and 1 will produce another Huffman type code.) In fact, although the Huffman algorithm will always produce a best code (in the sense of Problem 2.2), there may be other equally good codes which could not be obtained in this manner.

Exercise 2.7. Suppose n = 4, M_1 has probability .23, M_2 has probability .24, M_3 has probability .26 and M_4 has probability .27. Show that any assignment of the code words 00, 01, 10 and 11 produces a best code in the sense of Problem 2.2.

The fact that the Huffman code may not be the unique best solution means that we need to approach the proof of Theorem 2.3 with caution. We observe that reading a code word from a prefix-free code is like climbing a tree, with 0 telling us to take the left branch and 1 the right branch. The fact that the code is prefix-free tells us that each code word may be represented by a leaf at the end of a final branch. Thus, for example, the code word 00101 is represented by the leaf found by following left branch, left branch, right branch, left branch, right branch. The next lemma contains the essence of our proof of Theorem 2.3.

Lemma 2.8. (i) If we have a best code, then it will split into a left branch and right branch at every stage.

(ii) If we label every branch by the sum of the probabilities of all the leaves that spring from it then, if we have a best code, every branch belonging to a particular stage of growth will have at least as large a number associated with it as any branch belonging to a later stage.

(iii) If we have a best code, then interchanging the probabilities of leaves belonging to the last stage (ie the longest code words) still gives a best code.

(iv) If we have a best code, then two of the leaves with the lowest probabilities will appear at the last stage.

(v) There is a best code in which two of the leaves with the lowest probabilities are neighbours (have code words differing only in the last place).


In order to use the Huffman algorithm we need to know the probabilities of the n possible messages. Suppose we do not. After we have sent k messages, we will know that message M_j has been sent k_j times, and so will the recipient of the message. If we decide to use a Huffman code for the next message, it is not unreasonable (lifting our hat in the direction of the Reverend Thomas Bayes) to take

p_j = (k_j + 1)/(k + n).

Provided the recipient knows the exact version of the Huffman algorithm that we use, she can reconstruct our Huffman code and decode our next message. Variants of this idea are known as 'Huffman-on-the-fly' and form the basis of the kind of compression programs used in your computer. Notice, however, that whilst Theorem 2.3 is an examinable theorem, the contents of this paragraph form a non-examinable plausible statement.

3 More on prefix-free codes

It might be thought that Huffman's algorithm says all that is to be said on the problem it addresses. However, there are two important points that need to be considered. The first is whether we could get better results by using codes which are not prefix-free. The object of this section is to show that this is not the case.

As in section 1, we consider two alphabets 𝒜 and ℬ and a coding function c : 𝒜 → ℬ∗ (where, as we said earlier, ℬ∗ consists of all finite sequences of elements of ℬ). For most of this course ℬ = {0, 1}, but in this section we allow ℬ to have D elements. The elements of ℬ∗ are called words.

Lemma 3.1. [Kraft's inequality 1] If a prefix-free code 𝒞 consists of n words C_j of length l_j, then

Σ_{j=1}^{n} D^{−l_j} ≤ 1.

Lemma 3.2. [Kraft's inequality 2] Given strictly positive integers l_j satisfying

Σ_{j=1}^{n} D^{−l_j} ≤ 1,

we can find a prefix-free code 𝒞 consisting of n words C_j of length l_j.

Proof. Take l_1 ≤ l_2 ≤ . . . ≤ l_n. We give an inductive construction for an appropriate prefix-free code. Start by choosing C_1 to be any code word of length l_1.


Suppose that we have found a collection of r prefix-free code words C_k of length l_k [1 ≤ k ≤ r]. If r = n, we are done. If not, consider all possible code words of length l_{r+1}. Of these, D^{l_{r+1} − l_k} will have prefix C_k, so at most (in fact, exactly)

Σ_{k=1}^{r} D^{l_{r+1} − l_k}

will have one of the code words already selected as prefix. By hypothesis

Σ_{k=1}^{r} D^{l_{r+1} − l_k} = D^{l_{r+1}} Σ_{k=1}^{r} D^{−l_k} < D^{l_{r+1}}.

Since there are D^{l_{r+1}} possible code words of length l_{r+1}, there is at least one 'good code word' which does not have one of the code words already selected as prefix. Choose one of the good code words as C_{r+1} and restart the induction.

The method used in the proof is called a 'greedy algorithm' because we just try to do the best we can at each stage without considering future consequences.

Lemma 3.1 is pretty but not deep. MacMillan showed that the same inequality applies to all decodable codes. The proof is extremely elegant and (after one has thought about it long enough) natural.

Theorem 3.3. [The MacMillan inequality] If a decodable code 𝒞 consists of n words C_j of length l_j, then

Σ_{j=1}^{n} D^{−l_j} ≤ 1.

Using Lemma 3.2 we get the immediate corollary.

Lemma 3.4. If there exists a decodable code 𝒞 consisting of n words C_j of length l_j, then there exists a prefix-free code 𝒞′ consisting of n words C′_j of length l_j.

Thus, if we are only concerned with the length of code words, we need only consider prefix-free codes.


4 Shannon’s noiseless coding theorem

In the previous section we indicated that there was a second question we should ask about Huffman's algorithm. We know that Huffman's algorithm is best possible, but we have not discussed how good the best possible should be.

Let us restate our problem. (In this section we allow the coding alphabet ℬ to have D elements.)

Problem 4.1. Given n messages M_1, M_2, . . . , M_n such that the probability that M_j will be chosen is p_j, find a decodable code 𝒞 whose code words C_j consist of l_j bits so that the expected cost

K Σ_{j=1}^{n} p_j l_j

of sending the code word corresponding to the chosen message is minimised.

In view of Lemma 3.2 (any system of lengths satisfying Kraft's inequality is associated with a prefix-free and so decodable code) and Theorem 3.3 (any decodable code satisfies Kraft's inequality), Problem 4.1 reduces to an abstract minimising problem.

Problem 4.2. Suppose p_j ≥ 0 for 1 ≤ j ≤ n and Σ_{j=1}^{n} p_j = 1. Find strictly positive integers l_j minimising

Σ_{j=1}^{n} p_j l_j   subject to   Σ_{j=1}^{n} D^{−l_j} ≤ 1.

Problem 4.2 is hard because we restrict the l_j to be integers. If we drop the restriction, we end up with a problem in Part IB variational calculus.

Problem 4.3. Suppose p_j ≥ 0 for 1 ≤ j ≤ n and Σ_{j=1}^{n} p_j = 1. Find strictly positive real numbers x_j minimising

Σ_{j=1}^{n} p_j x_j   subject to   Σ_{j=1}^{n} D^{−x_j} ≤ 1.

Calculus solution. Observe that decreasing any x_k decreases Σ_{j=1}^{n} p_j x_j and increases Σ_{j=1}^{n} D^{−x_j}. Thus we may demand

Σ_{j=1}^{n} D^{−x_j} = 1.


The Lagrangian is

L(x, λ) = Σ_{j=1}^{n} p_j x_j − λ Σ_{j=1}^{n} D^{−x_j}.

Since

∂L/∂x_j = p_j + (λ log D) D^{−x_j},

we know that, at any stationary point,

D^{−x_j} = K_0(λ) p_j

for some K_0(λ) > 0. Since Σ_{j=1}^{n} D^{−x_j} = 1, our original problem will have a stationarising solution when D^{−x_j} = p_j, that is to say,

x_j = − (log p_j)/(log D)   and   Σ_{j=1}^{n} p_j x_j = − Σ_{j=1}^{n} p_j (log p_j)/(log D).

It is not hard to convince oneself that the stationarising solution just found is, in fact, minimising, but it is an unfortunate fact that IB variational calculus is suggestive rather than conclusive.

The next two exercises (which will be done in lectures and form part of the course) provide a rigorous proof.

Exercise 4.4. (i) Show that

log t ≤ t − 1

for t > 0, with equality if and only if t = 1.

(ii) [Gibbs' inequality] Suppose that p_j, q_j > 0 and

Σ_{j=1}^{n} p_j = Σ_{j=1}^{n} q_j = 1.

By applying (i) with t = q_j / p_j, show that

Σ_{j=1}^{n} p_j log p_j ≥ Σ_{j=1}^{n} p_j log q_j

with equality if and only if p_j = q_j.


Exercise 4.5. We use the notation of Problem 4.3.

(i) Show that, if x∗_j = − log p_j / log D, then x∗_j > 0 and

Σ_{j=1}^{n} D^{−x∗_j} = 1.

(ii) Suppose that y_j > 0 and

Σ_{j=1}^{n} D^{−y_j} = 1.

Set q_j = D^{−y_j}. By using Gibbs' inequality from Exercise 4.4 (ii), show that

Σ_{j=1}^{n} p_j x∗_j ≤ Σ_{j=1}^{n} p_j y_j

with equality if and only if y_j = x∗_j for all j.

Analysts use logarithms to the base e, but the importance of two-symbol alphabets means that communication theorists often use logarithms to the base 2.

Exercise 4.6. (Memory jogger.) Let a, b > 0. Show that

log_a b = (log b)/(log a).

The result of Problem 4.3 is so important that it gives rise to a definition.

Definition 4.7. Let 𝒜 be a non-empty finite set and A a random variable taking values in 𝒜. If A takes the value a with probability p_a, we say that the system has Shannon entropy [4] (or information entropy)

H(A) = − Σ_{a∈𝒜} p_a log_2 p_a.

Theorem 4.8. Let 𝒜 and ℬ be finite alphabets and let ℬ have D symbols. If A is an 𝒜-valued random variable, then any decodable code c : 𝒜 → ℬ∗ must satisfy

E|c(A)| ≥ H(A)/log_2 D.

[4] It is unwise for the beginner and may or may not be fruitless for the expert to seek a link with entropy in physics.


Here |c(A)| denotes the length of c(A). Notice that the result takes a particularly simple form when D = 2.

In Problem 4.3 the x_j are just positive real numbers, but in Problem 4.2 the l_j are integers. Choosing l_j as close as possible to the best x_j may not give the best l_j, but it is certainly worth a try.

Theorem 4.9. [Shannon–Fano encoding] Let 𝒜 and ℬ be finite alphabets and let ℬ have D symbols. If A is an 𝒜-valued random variable, then there exists a prefix-free (so decodable) code c : 𝒜 → ℬ∗ which satisfies

E|c(A)| ≤ 1 + H(A)/log_2 D.

Proof. By Lemma 3.2 (which states that, given lengths satisfying Kraft's inequality, we can construct an associated prefix-free code), it suffices to find strictly positive integers l_a such that

Σ_{a∈𝒜} D^{−l_a} ≤ 1,   but   Σ_{a∈𝒜} p_a l_a ≤ 1 + H(A)/log_2 D.

If we take

l_a = ⌈− log_D p_a⌉,

that is to say, we take l_a to be the smallest integer no smaller than − log_D p_a, then these conditions are satisfied and we are done.

It is very easy to use the method just indicated to find an appropriate code. (Such codes are called Shannon–Fano codes [5]. Fano was the professor who set the homework for Huffman. The point of view adopted here means that for some problems there may be more than one Shannon–Fano code.)

Exercise 4.10. (i) Let 𝒜 = {1, 2, 3, 4}. Suppose that the probability that letter k is chosen is k/10. Use your calculator [6] to find ⌈− log_2 p_k⌉ and write down an appropriate Shannon–Fano code c.

(ii) We found a Huffman code c_h for the system in Example 2.4. Show [7] that the entropy is approximately 1.85, that E|c(A)| = 2.4 and that E|c_h(A)| = 1.9. Check that these results are consistent with our previous theorems.

[5] Wikipedia and several other sources give a definition of Shannon–Fano codes which is definitely inconsistent with that given here. Within a Cambridge examination context you may assume that Shannon–Fano codes are those considered here.

[6] If you have no calculator, your computer has a calculator program. If you have no computer, use log tables. If you are on a desert island, just think.

[7] Unless you are on a desert island, in which case the calculations are rather tedious.


Putting Theorems 4.8 and 4.9 together, we get the following remarkable result.

Theorem 4.11. [Shannon's noiseless coding theorem] Let 𝒜 and ℬ be finite alphabets and let ℬ have D symbols. If A is an 𝒜-valued random variable, then any decodable code c which minimises E|c(A)| satisfies

H(A)/log_2 D ≤ E|c(A)| ≤ 1 + H(A)/log_2 D.

In particular, Huffman's code c_h for two symbols satisfies

H(A) ≤ E|c_h(A)| ≤ 1 + H(A).

Exercise 4.12. (i) Sketch h(t) = −t log t for 0 ≤ t ≤ 1. (We define h(0) = 0.)

(ii) Let

Γ = { p ∈ R^n : p_j ≥ 0, Σ_{j=1}^{n} p_j = 1 }

and let H : Γ → R be defined by

H(p) = Σ_{j=1}^{n} h(p_j).

Find the maximum and minimum of H and describe the points where these values are attained.

(iii) If n = 2^r + s with 0 ≤ s < 2^r and p_j = 1/n, describe the Huffman code c_h for two symbols and verify directly that (with the notation of Theorem 4.11)

H(A) ≤ E|c_h(A)| ≤ 1 + H(A).

Waving our hands about wildly, we may say that 'a system with low Shannon entropy is highly organised and, knowing the system, it is usually quite easy to identify an individual from the system'.

Exercise 4.13. The notorious Trinity gang has just been rounded up and Trubshaw of the Yard wishes to identify the leader (or Master, as he is called). Sam the Snitch makes the following offer. Presented with any collection of members of the gang, he will (by a slight twitch of his left ear) indicate if the Master is among them. However, in view of the danger involved, he demands ten pounds for each such encounter. Trubshaw believes that the probability of the jth member of the gang being the Master is p_j [1 ≤ j ≤ n] and wishes to minimise the expected drain on the public purse. Advise him.


5 Non-independence

(This section is non-examinable.)

In the previous sections we discussed codes c : 𝒜 → ℬ∗ such that, if a letter A ∈ 𝒜 was chosen according to some random law, E|c(A)| was about as small as possible. If we choose A_1, A_2, . . . independently according to the same law, then it is not hard to convince oneself that

E|c∗(A_1 A_2 A_3 . . . A_n)| = n E|c(A)|

will be as small as possible.

However, in re*l lif* th* let*ers a*e often no* i*d*p***ent. It is sometimes possible to send messages more efficiently using this fact.

Exercise 5.1. Suppose that we have a sequence X_j of random variables taking the values 0 and 1. Suppose that X_1 = 1 with probability 1/2 and X_{j+1} = X_j with probability .99, independent of what has gone before.

(i) Suppose we wish to send ten successive bits X_j X_{j+1} . . . X_{j+9}. Show that if we associate the sequence of ten zeros with 0, the sequence of ten ones with 10 and any other sequence a_0 a_1 . . . a_9 with 11a_0 a_1 . . . a_9, we have a decodable code which on average requires about 5/2 bits to transmit the sequence.

(ii) Suppose we wish to send the bits X_j X_{j+10^6} X_{j+2×10^6} . . . X_{j+9×10^6}. Explain why any decodable code will require on average at least 10 bits to transmit the sequence. (You need not do detailed computations.)

If we transmit sequences of letters by forming them into longer words and coding the words, we say we have a block code. It is plausible that the longer the blocks, the less important the effects of non-independence. In more advanced courses it is shown how to define entropy for systems like the one discussed in Exercise 5.1 (that is to say, Markov chains) and that, provided we take long enough blocks, we can recover an analogue of Theorem 4.11 (the noiseless coding theorem).

In the real world, the problem lies deeper. Presented with a photograph, we can instantly see that it represents Lena wearing a hat. If a machine reads the image pixel by pixel, it will have great difficulty recognising much, apart from the fact that the distribution of pixels is 'non-random' or has 'low entropy' (to use the appropriate hand-waving expressions). Clearly, it ought to be possible to describe the photograph with many fewer bits than are required to describe each pixel separately, but, equally clearly, a method that works well on black and white photographs may fail on colour photographs and a method that works well on photographs of faces may work badly when applied to photographs of trees.


Engineers have a clever way of dealing with this problem. Suppose we have a sequence x_j of zeros and ones produced by some random process. Someone who believes that they partially understand the nature of the process builds us a prediction machine which, given the sequence x_1, x_2, . . . , x_j so far, predicts that the next term will be x′_{j+1}. Now set

y_{j+1} ≡ x_{j+1} − x′_{j+1} mod 2.

If we are given the sequence y_1, y_2, . . . , we can recover the x_j inductively using the prediction machine and the formula

x_{j+1} ≡ y_{j+1} + x′_{j+1} mod 2.

If the prediction machine is good, then the sequence of y_j will consist

mainly of zeros and there will be many ways of encoding the sequence as

(on average) a much shorter code word. (For example, if we arrange the

sequence in blocks of ﬁxed length, many of the possible blocks will have very

low probability, so Huﬀman’s algorithm will be very eﬀective.)
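The residual trick above can be sketched in a few lines of Python (an illustration, not part of the notes); the predictor here, which simply guesses that the next bit repeats the last one, is a made-up stand-in for whatever prediction machine the engineer supplies:

```python
# Residual coding against a prediction machine: y[j] = x[j] XOR prediction.
def encode(x, predict):
    """Turn the source bits x into residuals y against the predictions."""
    y = []
    for j, bit in enumerate(x):
        y.append(bit ^ predict(x[:j]))  # y_{j+1} = x_{j+1} - x'_{j+1} mod 2
    return y

def decode(y, predict):
    """Recover x inductively: x_{j+1} = y_{j+1} + x'_{j+1} mod 2."""
    x = []
    for bit in y:
        x.append(bit ^ predict(x))
    return x

# Toy predictor: guess that the next bit repeats the previous one.
predict_last = lambda seen: seen[-1] if seen else 0

x = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
y = encode(x, predict_last)
assert decode(y, predict_last) == x
# For this slowly varying source, y is mostly zeros: [0,0,0,1,0,0,0,1,0,0]
```

If the predictor is good, the residual sequence is mostly zeros and compresses well, exactly as described in the text.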

Build a better mousetrap, and the world will beat a path to your door.

Build a better prediction machine and the world will beat your door down.

There is a further real world complication. Engineers distinguish be-

tween irreversible ‘lossy compression’ and reversible ‘lossless compression’.

For compact discs, where bits are cheap, the sound recorded can be recon-

structed exactly. For digital sound broadcasting, where bits are expensive,

the engineers make use of knowledge of the human auditory system (for ex-

ample, the fact that we cannot make out very soft noise in the presence of

loud noises) to produce a result that might sound perfect (or nearly so) to

us, but which is, in fact, not. For mobile phones, there can be greater loss of

data because users do not demand anywhere close to perfection. For digital

TV, the situation is still more striking with reduction in data content from

ﬁlm to TV of anything up to a factor of 60. However, medical and satellite

pictures must be transmitted with no loss of data. Notice that lossless coding

can be judged by absolute criteria, but the merits of lossy coding can only

be judged subjectively.

Ideally, lossless compression should lead to a signal indistinguishable

(from a statistical point of view) from a random signal in which the value

of each bit is independent of the value of all the others. In practice, this is

only possible in certain applications. As an indication of the kind of problem

involved, consider TV pictures. If we know that what is going to be transmit-

ted is ‘head and shoulders’ or ‘tennis matches’ or ‘cartoons’ it is possible to

obtain extraordinary compression ratios by ‘tuning’ the compression method


to the expected pictures, but then changes from what is expected can be dis-

astrous. At present, digital TV encoders merely expect the picture to consist

of blocks which move at nearly constant velocity remaining more or less un-

changed from frame to frame [8]. In this, as in other applications, we know

that after compression the signal still has non-trivial statistical properties,

but we do not know enough about them to exploit them.

6 What is an error correcting code?

In the introductory Section 1, we discussed ‘telegraph codes’ in which one

ﬁve letter combination ‘QWADR’, say, meant ‘please book quiet room for

two’ and another ‘QWNDR’ meant ‘please book cheapest room for one’.

Obviously, also, an error of one letter in this code could have unpleasant

consequences [9].

Today, we transmit and store long strings of binary sequences, but face the

same problem that some digits may not be transmitted or stored correctly.

We suppose that the string is the result of data compression and so, as we

said at the end of the last section, although the string may have non-trivial

statistical properties, we do not know enough to exploit this fact. (If we knew

how to exploit any statistical regularity, we could build a prediction device

and compress the data still further.) Because of this, we shall assume that

we are asked to consider a collection of m messages each of which is equally

likely.

Our model is the following. When the ‘source’ produces one of the m possible messages μ_i say, it is fed into a ‘coder’ which outputs a string c_i of n

binary digits. The string is then transmitted one digit at a time along a ‘com-

munication channel’. Each digit has probability p of being mistransmitted

(so that 0 becomes 1 or 1 becomes 0) independently of what happens to the

other digits [0 ≤ p < 1/2]. The transmitted message is then passed through

a ‘decoder’ which either produces a message μ_j (where we hope that j = i)

or an error message and passes it on to the ‘receiver’. The technical term

for our model is the binary symmetric channel (binary because we use two

symbols, symmetric because the probability of error is the same whichever

symbol we use).
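The channel model is easy to simulate (an illustration, not from the notes; the function name `bsc` is invented for this sketch):

```python
# A small simulation of the binary symmetric channel: each digit is
# flipped with probability p, independently of the other digits.
import random

def bsc(bits, p, rng):
    """Pass a list of 0/1 digits through a binary symmetric channel."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(1)
sent = [0, 1] * 5000
received = bsc(sent, p=0.1, rng=rng)
errors = sum(s != r for s, r in zip(sent, received))
print(errors / len(sent))   # empirical error rate, close to p = 0.1
```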

Exercise 6.1. Why do we not consider the case 1 ≥ p > 1/2? What if

p = 1/2?

[8] Watch what happens when things go wrong.
[9] This is a made up example, since compilers of such codes understood the problem.


For most of the time we shall concentrate our attention on a code C ⊆ {0, 1}^n consisting of the codewords c_i. (Thus we use a fixed length code.) We

say that C has size m = |C|. If m is large then we can send a large number

of possible messages (that is to say, we can send more information) but, as m

increases, it becomes harder to distinguish between diﬀerent messages when

errors occur. At one extreme, if m = 1, errors cause us no problems (since

there is only one message) but no information is transmitted (since there is

only one message). At the other extreme, if m = 2^n, we can transmit lots of

messages but any error moves us from one codeword to another. We are led

to the following rather natural deﬁnition.

Definition 6.2. The information rate of C is (log_2 m)/n.

Note that, since m ≤ 2^n, the information rate is never greater than 1. Notice also that the values of the information rate when m = 1 and m = 2^n agree with what we might expect.

How should our decoder work? We have assumed that all messages are

equally likely and that errors are independent (this would not be true if, for

example, errors occurred in bursts [10]).

Under these assumptions, a reasonable strategy for our decoder is to

guess that the codeword sent is one which diﬀers in the fewest places from

the string of n binary digits received. Here and elsewhere the discussion can

be illuminated by the simple notion of a Hamming distance.

Definition 6.3. If x, y ∈ {0, 1}^n, we write

d(x, y) = Σ_{j=1}^{n} |x_j − y_j|

and call d(x, y) the Hamming distance between x and y.

Lemma 6.4. The Hamming distance is a metric.
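Definition 6.3 translates directly into code; here is a small illustrative spot-check (not part of the notes) of the metric properties asserted in Lemma 6.4:

```python
# Hamming distance as in Definition 6.3: count the places where x and y differ.
def hamming(x, y):
    """Number of places in which the equal-length bit strings x and y differ."""
    assert len(x) == len(y)
    return sum(xj != yj for xj, yj in zip(x, y))

a, b, c = [0, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]
assert hamming(a, b) == 2
assert hamming(a, b) == hamming(b, a)                   # symmetry
assert hamming(a, c) <= hamming(a, b) + hamming(b, c)   # triangle inequality
```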

[10] For the purposes of this course we note that this problem could be tackled by permut-

ing the ‘bits’ of the message so that ‘bursts are spread out’. In theory, we could do better

than this by using the statistical properties of such bursts to build a prediction machine.

In practice, this is rarely possible. In the paradigm case of mobile phones, the properties

of the transmission channel are constantly changing and are not well understood. (Here

the main restriction on the use of permutation is that it introduces time delays. One

way round this is ‘frequency hopping’ in which several users constantly swap transmission

channels ‘dividing bursts among users’.) One desirable property of codes for mobile phone

users is that they should ‘fail gracefully’, so that as the error rate for the channel rises the

error rate for the receiver should not suddenly explode.


We now do some very simple IA probability.

Lemma 6.5. We work with the coding and transmission scheme described above. Let c ∈ C and x ∈ {0, 1}^n.
(i) If d(c, x) = r, then

Pr(x received given c sent) = p^r (1 − p)^{n−r}.

(ii) If d(c, x) = r, then

Pr(c sent given x received) = A(x) p^r (1 − p)^{n−r},

where A(x) does not depend on r or c.
(iii) If c′ ∈ C and d(c′, x) ≥ d(c, x), then

Pr(c sent given x received) ≥ Pr(c′ sent given x received),

with equality if and only if d(c′, x) = d(c, x).

This lemma justiﬁes our use, both explicit and implicit, throughout what

follows of the so-called maximum likelihood decoding rule.

Definition 6.6. The maximum likelihood decoding rule states that a string x ∈ {0, 1}^n received by a decoder should be decoded as (one of) the codeword(s) at the smallest Hamming distance from x.

Notice that, although this decoding rule is mathematically attractive, it

may be impractical if C is large. There is often no known way of finding

the codeword at the smallest distance from a particular x in an acceptable

number of steps. (We can always make a complete search through all the

members of C but unless there are very special circumstances this is likely

to involve an unacceptable amount of work.)
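A brute-force version of the rule, for illustration only (as the text warns, a complete search through C is only workable for small codes):

```python
# Maximum likelihood decoding (Definition 6.6) by exhaustive search.
def ml_decode(x, code):
    """Return a codeword at the smallest Hamming distance from x."""
    dist = lambda c: sum(ci != xi for ci, xi in zip(c, x))
    return min(code, key=dist)

# Toy code: the length-3 repetition code.
code = [(0, 0, 0), (1, 1, 1)]
assert ml_decode((0, 1, 0), code) == (0, 0, 0)
assert ml_decode((1, 1, 0), code) == (1, 1, 1)
```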

7 Hamming’s breakthrough

Although we have used simple probabilistic arguments to justify it, the max-

imum likelihood decoding rule will often enable us to avoid probabilistic

considerations (though not in the very important part of the course concerned with

Shannon’s noisy coding theorem) and concentrate on algebra and combi-

natorics. The spirit of most of the course is exempliﬁed in the next two

deﬁnitions.

Deﬁnition 7.1. We say that C is d error detecting if changing up to d digits

in a codeword never produces another codeword.


Deﬁnition 7.2. We say that C is e error correcting if knowing that a string

of n binary digits diﬀers from some codeword of C in at most e places we

can deduce the codeword.

Here are some simple schemes. Some of them use alphabets with more

than two symbols but the principles remain the same.

Repetition coding of length n. We take codewords of the form

c = (c, c, c, . . . , c)

with c = 0 or c = 1. The code C is n − 1 error detecting, and ⌊(n − 1)/2⌋

error correcting. The maximum likelihood decoder chooses the symbol that

occurs most often. (Here and elsewhere ⌊α⌋ is the largest integer N ≤ α and

⌈α⌉ is the smallest integer M ≥ α.) Unfortunately, the information rate is

1/n, which is rather low [11].
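A quick sketch of repetition coding and majority-vote decoding (illustrative Python, not part of the notes):

```python
# Repetition coding of length n: send each bit n times and decode by
# majority vote, as the maximum likelihood rule prescribes.
def rep_encode(bit, n):
    return [bit] * n

def rep_decode(received):
    ones = sum(received)
    return 1 if ones > len(received) - ones else 0

# With n = 5 the code corrects up to (5 - 1) // 2 = 2 errors.
word = rep_encode(1, 5)
word[0] ^= 1
word[3] ^= 1              # two errors
assert rep_decode(word) == 1
# The information rate is 1/n = 0.2 here.
```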

The Cambridge examination paper code. Each candidate is asked to write down a Candidate Identifier of the form 1234A, 1235B, 1236C, . . . (the eleven [12] possible letters are repeated cyclically) and a desk number. The first

four numbers in the Candidate Identiﬁer identify the candidate uniquely. If

the letter written by the candidate does not correspond to the first four

numbers the candidate is identiﬁed by using the desk number.

Exercise 7.3. Show that if the candidate makes one error in the Candidate

Identiﬁer, then this will be detected. Would this be true if there were 9 possible

letters repeated cyclically? Would this be true if there were 12 possible letters

repeated cyclically? Give reasons.

Show that, if we also use the Desk Number, then the combined code Candidate Number/Desk Number is one error correcting.

The paper tape code. Here and elsewhere, it is convenient to give {0, 1} the structure of the field F_2 = Z_2 by using arithmetic modulo 2. The codewords have the form

c = (c_1, c_2, c_3, . . . , c_n)

with c_1, c_2, . . . , c_{n−1} freely chosen elements of F_2 and c_n (the check digit) the element of F_2 which gives

c_1 + c_2 + · · · + c_{n−1} + c_n = 0.

The resulting code C is 1 error detecting since, if x ∈ F_2^n is obtained from c ∈ C by making a single error, we have

x_1 + x_2 + · · · + x_{n−1} + x_n = 1.

[11] Compare the chorus ‘Oh no John, no John, no John, no’.
[12] My guess.


However, it is not error correcting since, if

x_1 + x_2 + · · · + x_{n−1} + x_n = 1,

there are n codewords y with Hamming distance d(x, y) = 1. The information rate is (n − 1)/n. Traditional paper tape had 8 places per line, each of which could have a punched hole or not, so n = 8.
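A sketch of the paper tape code in Python (illustrative only; it also shows why two errors slip through undetected):

```python
# The paper tape code: n - 1 free bits plus a check digit making the sum
# even (arithmetic in F_2). A single error flips the parity and is detected.
def tape_encode(bits):
    return bits + [sum(bits) % 2]

def tape_check(word):
    """Return True if the word passes the parity check."""
    return sum(word) % 2 == 0

c = tape_encode([1, 0, 1, 1, 0, 1, 1])   # n = 8, as on traditional tape
assert tape_check(c)
c[4] ^= 1                                 # a single error...
assert not tape_check(c)                  # ...is detected,
c[5] ^= 1                                 # but a second error cancels it
assert tape_check(c)                      # and the pair passes unnoticed
```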

Exercise 7.4. If you look at the inner title page of almost any book published between 1970 and 2006 you will find its International Standard Book Number (ISBN). The ISBN uses single digits selected from 0, 1, . . . , 8, 9 and X representing 10. Each ISBN consists of nine such digits a_1, a_2, . . . , a_9 followed by a single check digit a_{10} chosen so that

10a_1 + 9a_2 + · · · + 2a_9 + a_{10} ≡ 0 mod 11. (∗)

(In more sophisticated language, our code C consists of those elements a ∈ F_{11}^{10} such that

Σ_{j=1}^{10} (11 − j)a_j = 0.)

(i) Find a couple of books [13] and check that (∗) holds for their ISBNs [14].

(ii) Show that (∗) will not work if you make a mistake in writing down

one digit of an ISBN.

(iii) Show that (∗) may fail to detect two errors.

(iv) Show that (∗) will not work if you interchange two distinct adjacent

digits (a transposition error).

(v) Does (iv) remain true if we replace ‘adjacent’ by ‘diﬀerent’ ?

Errors of type (ii) and (iv) are the most common in typing [15]. In communi-

cation between publishers and booksellers, both sides are anxious that errors

should be detected but would prefer the other side to query errors rather than

to guess what the error might have been.

(vi) After January 2007, the appropriate ISBN is a 13 digit number x_1 x_2 . . . x_{13} with each digit selected from 0, 1, . . . , 8, 9 and the check digit x_{13} computed by using the formula

x_{13} ≡ −(x_1 + 3x_2 + x_3 + 3x_4 + · · · + x_{11} + 3x_{12}) mod 10.

Show that we can detect single errors. Give an example to show that we cannot detect all transpositions.
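The two check-digit rules of Exercise 7.4 can be played with in a few lines (illustrative Python; the digit sequences below are made up, not real ISBNs):

```python
# Check-digit arithmetic for the ISBN-10 rule (*) and the post-2007 rule.
def isbn10_ok(digits):
    """digits: ten values in 0..10, with 10 standing for 'X'."""
    return sum((10 - j) * a for j, a in enumerate(digits)) % 11 == 0

def isbn13_check(first12):
    """Compute x_13 from the first twelve digits (weights 1, 3, 1, 3, ...)."""
    s = sum(a * (3 if j % 2 else 1) for j, a in enumerate(first12))
    return (-s) % 10

# Build a valid ISBN-10 from nine arbitrary digits, then break it.
good = [0, 5, 2, 1, 4, 3, 9, 8, 4]
good.append((-sum((10 - j) * a for j, a in enumerate(good))) % 11)
assert isbn10_ok(good)
bad = good[:]
bad[2] = (bad[2] + 1) % 11
assert not isbn10_ok(bad)   # a single-digit error is always caught (part (ii))
```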

[13] In case of difficulty, your college library may be of assistance.
[14] In fact, X is only used in the check digit place.
[15] Thus a syllabus for an earlier version of this course contained the rather charming misprint of ‘snydrome’ for ‘syndrome’.


Hamming had access to an early electronic computer but was low down

in the priority list of users. He would submit his programs encoded on paper

tape to run over the weekend but often he would have his tape returned

on Monday because the machine had detected an error in the tape. ‘If the

machine can detect an error’ he asked himself ‘why can the machine not

correct it?’ and he came up with the following scheme.

Hamming’s original code. We work in F_2^7. The codewords c are chosen to satisfy the three conditions

c_1 + c_3 + c_5 + c_7 = 0
c_2 + c_3 + c_6 + c_7 = 0
c_4 + c_5 + c_6 + c_7 = 0.

By inspection, we may choose c_3, c_5, c_6 and c_7 freely and then c_1, c_2 and c_4 are completely determined. The information rate is thus 4/7.

Suppose that we receive the string x ∈ F_2^7. We form the syndrome (z_1, z_2, z_4) ∈ F_2^3 given by

z_1 = x_1 + x_3 + x_5 + x_7
z_2 = x_2 + x_3 + x_6 + x_7
z_4 = x_4 + x_5 + x_6 + x_7.

If x is a codeword, then (z_1, z_2, z_4) = (0, 0, 0). If c is a codeword and the Hamming distance d(x, c) = 1, then the place in which x differs from c is given by z_1 + 2z_2 + 4z_4 (using ordinary addition, not addition modulo 2), as may be easily checked using linearity and a case by case study of the seven binary sequences x containing one 1 and six 0s. The Hamming code is thus 1 error correcting.
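A sketch of the scheme in Python (illustrative only; positions are 1-indexed to match the conditions above, so index 0 is unused):

```python
# Hamming's original code with syndrome decoding, following the
# equations in the text.
def syndrome(x):
    z1 = (x[1] + x[3] + x[5] + x[7]) % 2
    z2 = (x[2] + x[3] + x[6] + x[7]) % 2
    z4 = (x[4] + x[5] + x[6] + x[7]) % 2
    return z1, z2, z4

def correct(x):
    """Correct at most one error in place and return the word."""
    z1, z2, z4 = syndrome(x)
    pos = z1 + 2 * z2 + 4 * z4        # ordinary addition, not mod 2
    if pos:
        x[pos] ^= 1
    return x

# c_3, c_5, c_6, c_7 chosen freely; c_1, c_2, c_4 forced by the conditions.
c = [None, 0, 0, 1, 1, 0, 0, 1]
assert syndrome(c) == (0, 0, 0)
c[5] ^= 1                              # an error in place 5...
assert syndrome(c) != (0, 0, 0)
assert correct(c)[1:] == [0, 0, 1, 1, 0, 0, 1]   # ...is located and fixed
```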

Exercise 7.5. Suppose we use eight hole tape with the standard paper tape code and the probability that an error occurs at a particular place on the tape (i.e. a hole occurs where it should not or fails to occur where it should) is 10^{−4}. A program requires about 10 000 lines of tape (each line containing eight places) using the paper tape code. Using the Poisson approximation, direct calculation (possible with a hand calculator but really no advance on the Poisson method), or otherwise, show that the probability that the tape will be accepted as error free by the decoder is less than .04%.

Suppose now that we use the Hamming scheme (making no use of the last place in each line). Explain why the program requires about 17 500 lines of tape but that any particular line will be correctly decoded with probability about 1 − (21 × 10^{−8}) and the probability that the entire program will be correctly decoded is better than 99.6%.
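A quick numerical check of the figures quoted in Exercise 7.5 (this sketch verifies the answers the exercise already states, rather than doing the Poisson argument the exercise asks for):

```python
# Checking the rounded figures quoted in Exercise 7.5.
from math import comb

p = 1e-4
# Paper tape code: all 8 * 10 000 places must be error free.
p_all_clean = (1 - p) ** (8 * 10_000)
print(p_all_clean)        # about e**-8 = 0.000335..., i.e. less than .04%
assert p_all_clean < 0.0004

# Hamming scheme: a line of 7 used places is misdecoded only with >= 2 errors.
p_line_bad = sum(comb(7, k) * p**k * (1 - p)**(7 - k) for k in range(2, 8))
print(p_line_bad)         # about 21e-8, dominated by the two-error term
p_program_ok = (1 - p_line_bad) ** 17_500
assert p_program_ok > 0.996
```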


Hamming’s scheme is easy to implement. It took a little time for his com-

pany to realise what he had done [16] but they were soon trying to patent it.

In retrospect, the idea of an error correcting code seems obvious (Hamming’s

scheme had actually been used as the basis of a Victorian party trick) and

indeed two or three other people discovered it independently, but Hamming

and his co-discoverers had done more than ﬁnd a clever answer to a ques-

tion. They had asked an entirely new question and opened a new ﬁeld for

mathematics and engineering.

The times were propitious for the development of the new ﬁeld. Before

1940, error correcting codes would have been luxuries, solutions looking for problems; after 1950, with the rise of the computer and new communica-

tion technologies, they became necessities. Mathematicians and engineers

returning from wartime duties in code breaking, code making and general

communications problems were primed to grasp and extend the ideas. The

mathematical engineer Claude Shannon may be considered the presiding ge-

nius of the new ﬁeld.

The reader will observe that data compression shortens the length of

our messages by removing redundancy and Hamming’s scheme (like all error

correcting codes) lengthens them by introducing redundancy. This is true,

but data compression removes redundancy which we do not control and which

is not useful to us and error correction coding then replaces it with carefully

controlled redundancy which we can use.

The reader will also note an analogy with ordinary language. The idea

of data compression is illustrated by the fact that many common words are

short [17]. On the other hand the redund of ordin lang makes it poss to understa

it even if we do no catch everyth that is said.

8 General considerations

How good can error correcting and error detecting [18] codes be? The following

discussion is a natural development of the ideas we have already discussed.

Later, in our discussion of Shannon’s noisy coding theorem we shall see an-

other and deeper way of looking at the question.

Definition 8.1. The minimum distance d of a code is the smallest Hamming distance between distinct code words.

[16] Experienced engineers came away from working demonstrations muttering ‘I still don’t believe it’.
[17] Note how ‘horseless carriage’ becomes ‘car’ and ‘telephone’ becomes ‘phone’.
[18] If the error rate is low and it is easy to ask for the message to be retransmitted, it may be cheaper to concentrate on error detection. If there is no possibility of retransmission (as in long term data storage), we have to concentrate on error correction.

We call a code of length n, size m and distance d an [n, m, d] code. Less briefly, a set C ⊆ F_2^n, with |C| = m and

min{d(x, y) : x, y ∈ C, x ≠ y} = d,

is called an [n, m, d] code. By an [n, m] code we shall simply mean a code of length n and size m.

Lemma 8.2. A code of minimum distance d can detect d − 1 errors [19] and correct ⌊(d − 1)/2⌋ errors. It cannot detect all sets of d errors and cannot correct all sets of ⌊(d − 1)/2⌋ + 1 errors.

It is natural, here and elsewhere, to make use of the geometrical insight provided by the (closed) Hamming ball

B(x, r) = {y : d(x, y) ≤ r}.

Observe that

|B(x, r)| = |B(0, r)|

for all x and so, writing

V (n, r) = |B(0, r)|,

we know that V (n, r) is the number of points in any Hamming ball of radius r. A simple counting argument shows that

V (n, r) = Σ_{j=0}^{r} C(n, j),

where C(n, j) = n!/(j!(n − j)!) denotes the binomial coefficient.
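For concreteness, V (n, r) can be computed directly from this formula (illustrative Python, not part of the notes):

```python
# V(n, r): the number of points in a Hamming ball of radius r in {0,1}^n.
from math import comb

def V(n, r):
    return sum(comb(n, j) for j in range(r + 1))

# Radius-1 ball in {0,1}^7: the centre plus its 7 neighbours.
assert V(7, 1) == 8
assert V(5, 2) == 1 + 5 + 10
```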

Theorem 8.3. [Hamming’s bound] If a code C is e error correcting, then

|C| ≤ 2^n / V (n, e).

There is an obvious fascination (if not utility) in the search for codes

which attain the exact Hamming bound.

[19] This is not as useful as it looks when d is large. If we know that our message is likely

to contain many errors, all that an error detecting code can do is conﬁrm our expectations.

Error detection is only useful when errors are unlikely.


Definition 8.4. A code C of length n and size m which can correct e errors is called perfect if

m = 2^n / V (n, e).

Lemma 8.5. Hamming’s original code is a [7, 16, 3] code. It is perfect.

It may be worth remarking in this context that, if a code which can correct

e errors is perfect (i.e. has a perfect packing of Hamming balls of radius e),

then the decoder must invariably give the wrong answer when presented with

e + 1 errors. We note also that, if (as will usually be the case) 2^n/V (n, e) is not an integer, no perfect e error correcting code can exist.

Exercise 8.6. Even if 2^n/V (n, e) is an integer, no perfect code may exist.
(i) Verify that

2^90 / V (90, 2) = 2^78.

(ii) Suppose that C is a perfect 2 error correcting code of length 90 and size 2^78. Explain why we may suppose without loss of generality that 0 ∈ C.
(iii) Let C be as in (ii) with 0 ∈ C. Consider the set

X = {x ∈ F_2^90 : x_1 = 1, x_2 = 1, d(0, x) = 3}.

Show that, corresponding to each x ∈ X, we can find a unique c(x) ∈ C such that d(c(x), x) = 2.
(iv) Continuing with the argument of (iii), show that

d(c(x), 0) = 5

and that c_i(x) = 1 whenever x_i = 1. If y ∈ X, find the number of solutions to the equation c(x) = c(y) with x ∈ X and, by considering the number of elements of X, obtain a contradiction.
(v) Conclude that there is no perfect [90, 2^78] code.

The result of Exercise 8.6 was obtained by Golay. Far more importantly, he found another case when 2^n/V (n, e) is an integer and there does exist an associated perfect code (the Golay code).

Exercise 8.7. Show that V (23, 3) is a power of 2.
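The arithmetic behind Exercise 8.6(i) and Exercise 8.7 is easy to confirm numerically (illustrative Python): both ball sizes are exact powers of two, so the Hamming bound alone cannot rule these codes out.

```python
# Ball sizes for Exercise 8.6(i) and Exercise 8.7.
from math import comb

def V(n, r):
    return sum(comb(n, j) for j in range(r + 1))

assert V(90, 2) == 2**12    # hence 2**90 / V(90, 2) == 2**78
assert V(23, 3) == 2**11    # the ball size behind the binary Golay code
```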

Unfortunately, the proof that the Golay code is perfect is too long to be given in the course.

We obtained the Hamming bound, which places an upper bound on how

good a code can be, by a packing argument. A covering argument gives us

the GSV (Gilbert, Shannon, Varshamov) bound in the opposite direction.

Let us write A(n, d) for the size of the largest code with minimum distance

d.


Theorem 8.8. [Gilbert, Shannon, Varshamov] We have

A(n, d) ≥ 2^n / V (n, d − 1).

Until recently there were no general explicit constructions for codes which achieved the GSV bound (i.e. codes whose minimum distance d satisfied the inequality A(n, d)V (n, d − 1) ≥ 2^n). Such a construction was finally found by Garcia and Stichtenoth by using ‘Goppa’ codes.

9 Some elementary probability

Engineers are, of course, interested in ‘best codes’ of length n for reasonably

small values of n, but mathematicians are particularly interested in what

happens as n → ∞.

We recall some elementary probability.

Lemma 9.1. [Tchebychev’s inequality] If X is a bounded real valued random variable and a > 0, then

Pr(|X − EX| ≥ a) ≤ (var X)/a^2.

Theorem 9.2. [Weak law of large numbers] If X_1, X_2, . . . is a sequence of independent identically distributed real valued bounded random variables and a > 0, then

Pr( | n^{−1} Σ_{j=1}^{n} X_j − EX | ≥ a ) → 0

as n → ∞.

Applying the weak law of large numbers, we obtain the following impor-

tant result.

Lemma 9.3. Consider the model of a noisy transmission channel used in this course in which each digit has probability p of being wrongly transmitted independently of what happens to the other digits. If ε > 0, then

Pr( number of errors in transmission for message of n digits ≥ (1 + ε)pn ) → 0

as n → ∞.


By Lemma 8.2, a code of minimum distance d can correct ⌊(d − 1)/2⌋ errors. Thus, if we have an error rate p and ε > 0, we know that the probability that a code of length n with error correcting capacity ⌈(1 + ε)pn⌉ will fail to correct a transmitted message falls to zero as n → ∞. By definition, the biggest code with minimum distance ⌈2(1 + ε)pn⌉ has size A(n, ⌈2(1 + ε)pn⌉) and so has information rate log_2 A(n, ⌈2(1 + ε)pn⌉)/n. Study of the behaviour of log_2 A(n, nδ)/n will thus tell us how large an information rate is possible in the presence of a given error rate.

Definition 9.4. If 0 < δ < 1/2 we write

α(δ) = limsup_{n→∞} (log_2 A(n, nδ))/n.

Definition 9.5. We define the entropy function H : [0, 1] → R by H(0) = H(1) = 0 and

H(t) = −t log_2(t) − (1 − t) log_2(1 − t)

for 0 < t < 1.

Exercise 9.6. (i) We have already met Shannon entropy in Deﬁnition 4.7.

Give a simple system such that, using the notation of that deﬁnition,

H(A) = H(t).

(ii) Sketch H. What is the value of H(1/2)?

Theorem 9.7. With the deﬁnitions just given,

1 − H(δ) ≤ α(δ) ≤ 1 − H(δ/2)

for all 0 ≤ δ < 1/2.

Using the Hamming bound (Theorem 8.3) and the GSV bound (Theo-

rem 8.8), we see that Theorem 9.7 follows at once from the following result.

Theorem 9.8. We have

(log_2 V (n, nδ))/n → H(δ)

as n → ∞.
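The convergence in Theorem 9.8 is slow but visible numerically (illustrative Python; H and V are written out from the definitions above):

```python
# Watching (1/n) log2 V(n, n*delta) approach H(delta).
from math import comb, log2

def H(t):
    if t == 0 or t == 1:
        return 0.0
    return -t * log2(t) - (1 - t) * log2(1 - t)

def V(n, r):
    return sum(comb(n, j) for j in range(r + 1))

delta = 0.3
for n in (50, 500, 5000):
    print(n, log2(V(n, int(n * delta))) / n)   # creeps up towards H(0.3)
print(H(delta))                                 # about 0.8813
```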

Our proof of Theorem 9.8 depends, as one might expect, on a version of

Stirling’s formula. We only need the very simplest version proved in IA.

Lemma 9.9 (Stirling). We have

log_e n! = n log_e n − n + O(log_2 n).


We combine this with the remarks that

V (n, nδ) = Σ_{0≤j≤nδ} C(n, j)

and that very simple estimates give

C(n, m) ≤ Σ_{0≤j≤nδ} C(n, j) ≤ (m + 1) C(n, m),

where m = ⌊nδ⌋ and C(n, j) again denotes the binomial coefficient.

Although the GSV bound is very important, Shannon showed that a

stronger result can be obtained for the error correcting power of the best

long codes.

10 Shannon’s noisy coding theorem

In the backstreets of Cambridge (Massachusetts) there is a science museum

devoted to the glory of MIT. Since MIT has a great deal of glory and since

much thought has gone into the presentation of the exhibits, it is well worth

a visit. However, for any mathematician, the highlight is a glass case con-

taining such things as a juggling machine, an electronic calculator [20] that uses Roman numerals both externally and internally, the remnants of a machine built to guess which of heads and tails its opponent would choose next [21]

and a mechanical maze running mouse. These objects were built by Claude

Shannon.

In his 1937 master’s thesis, Shannon showed how to analyse circuits us-

ing Boolean algebra and binary arithmetic. During the war he worked on

gunnery control and cryptography at Bell labs and in 1948 he published A

Mathematical Theory of Communication [22]. Shannon had several predeces-

sors and many successors, but it is his vision which underlies this course.

Hamming’s bound together with Theorem 9.7 gives a very strong hint

that it is not possible to have an information rate greater than 1 −H(δ) for

an error rate δ < 1/2. (We shall prove this explicitly in Theorem 10.3.) On

the other hand the GSV bound together with Theorem 9.7 shows that it is

[20] THROBAC, the THrifty ROman numeral BAckwards-looking Computer. Google ‘MIT Museum’, go to ‘objects’ and then search ‘Shannon’.
[21] That is to say, a prediction machine. Google ‘Shannon Mind-Reading Machine’ for sites giving demonstrations and descriptions of the underlying program.
[22] This beautiful paper is available on the web and in his Collected Works.


always possible to have an information rate greater than 1 − H(2δ) for an

error rate δ < 1/4.

Although we can use repetition codes to get a positive information rate

when 1/4 ≤ δ < 1/2 it looks very hard at ﬁrst (and indeed second) glance to

improve these results.

However, Shannon realised that we do not care whether errors arise be-

cause of noise in transmission or imperfections in our coding scheme. By

allowing our coding scheme to be less than perfect (in this connection, see

Question 25.13) we can actually improve the information rate whilst still

keeping the error rate low.

Theorem 10.1. [Shannon’s noisy coding theorem] Suppose 0 < p < 1/2 and η > 0. Then there exists an n_0(p, η) such that, for any n > n_0, we can find codes of length n which have the property that (under our standard model of a symmetric binary channel with probability of error p) the probability that any codeword is mistaken is less than η and still have information rate 1 − H(p) − η.

Shannon’s theorem is a masterly display of the power of elementary prob-

abilistic arguments to overcome problems which appear insuperable by other

means [23].

However, it merely asserts that good codes exist and gives no means of

ﬁnding them apart from exhaustive search. More seriously, random codes

will have no useful structure and the only way to use them is to ‘search

through a large dictionary’ at the coding end and ‘search through an enor-

mous dictionary’ at the decoding end. It should also be noted that n_0(p, η) will be very large when p is close to 1/2.

Exercise 10.2. Why, in the absence of suitable structure, is the dictionary

at the decoding end much larger than the dictionary at the coding end?

It is relatively simple to obtain a converse to Shannon’s theorem.

Theorem 10.3. Suppose 0 < p < 1/2 and η > 0. Then there exists an n_0(p, η) such that, for any n > n_0, it is impossible to find codes of length n which have the property that (under our standard model of a symmetric binary channel with probability of error p) the probability that any codeword is mistaken is less than 1/2 and the code has information rate 1 − H(p) + η.

[23] Conway says that in order to achieve success in a mathematical field you must either

be ﬁrst or be clever. However, as in the case of Shannon, most of those who are ﬁrst to

recognise a new mathematical ﬁeld are also clever.


As might be expected, Shannon’s theorem and its converse extend to more

general noisy channels (in particular, those where the noise is governed by a

Markov chain M). It is possible to deﬁne the entropy H(M) associated with

M and to show that the information rate cannot exceed 1 −H(M) but that

any information rate lower than 1−H(M) can be attained with arbitrarily low

error rates. However, we must leave something for more advanced courses,

and as we said earlier, it is rare in practice to have very clear information

about the nature of the noise we encounter.

There is one very important theorem of Shannon which is not covered

in this course. In it, he reinterprets a result of Whittaker to show that any

continuous signal whose Fourier transform vanishes outside a range of length

R can be reconstructed from its value at equally spaced sampling points

provided those points are less than A/R apart. (The constant A depends

on the conventions used in deﬁning the Fourier transform.) This enables us

to apply the ‘digital’ theory of information transmission developed here to

continuous signals.

11 A holiday at the race track

Although this section is examinable [24], the material is peripheral to the course.

Suppose a very rich friend makes you the following oﬀer. Every day, at noon,

you may make a bet with her for any amount k you choose. You give her k

pounds which she keeps whatever happens. She then tosses a coin and, if it

shows heads, she pays you ku and, if it shows tails, she pays you nothing.

You know that the probability of heads is p. What should you do?

If pu < 1, you should not bet, because your expected winnings are neg-

ative. If pu > 1, most mathematicians would be inclined to bet, but how

much? If you bet your entire fortune and win, you will be better oﬀ than if

you bet a smaller sum, but, if you lose, then you are bankrupt and cannot

continue playing.

Thus your problem is to discover the proportion w of your present fortune

that you should bet. Observe that your choice of w will always be the same

(since you expect to go on playing for ever). Only the size of your fortune

will vary. If your fortune after n goes is Z_n, then

Z_{n+1} = Z_n Y_{n+1}

[24] When the author of the present notes gives the course. This is his interpretation of

the sentence in the schedules ‘Applications to gambling and the stock market.’ Other

lecturers may view matters diﬀerently.


where Y_{n+1} = uw + (1 − w) if the (n + 1)st throw is heads and Y_{n+1} = 1 − w if it is tails.

Using the weak law of large numbers, we have the following result.

Lemma 11.1. Suppose Y, Y_1, Y_2, . . . are identically distributed independent random variables taking values in [a, b] with 0 < a < b. If we write Z_n = Y_1 Y_2 . . . Y_n, then

Pr(|n^{−1} log Z_n − E log Y | > ε) → 0

as n → ∞.

Thus you should choose w to maximise

E log Y = p log(uw + (1 − w)) + (1 − p) log(1 − w).

Exercise 11.2. (i) Show that, for the situation described, you should not bet if up ≤ 1 and should take

w = (up − 1)/(u − 1)

if up > 1.

(ii) We write q = 1 − p. Show that, if up > 1 and we choose the optimum w,

E log Y = p log p + q log q + log u − q log(u − 1).

We have seen the expression −(p log p + q log q) before as (a multiple of) the Shannon information entropy of a simple probabilistic system. In a paper entitled A New Interpretation of Information Rate[25], Kelly showed how to interpret this and similar situations using communication theory. In his model a gambler receives information over a noisy channel about which horse is going to win. Just as Shannon's theorem shows that information can be transmitted over such a channel at a rate close to channel capacity with negligible risk of error (provided the messages are long enough), so the gambler can (with arbitrarily high probability) increase her fortune at a certain optimum rate provided that she can continue to bet long enough.

Although the analogy between betting and communication channels is

very pretty, it was the suggestion that those making a long sequence of bets

should aim to maximise the expectation of the logarithm (now called Kelly’s

criterion) which made the paper famous. Although Kelly seems never to

have used his idea in practice, mathematicians like Thorp, Berlekamp and

25

Available on the web. The exposition is slightly opaque because the Bell company

which employed Kelly was anxious not draw attention to the use of telephones for betting

fraud.

32

Shannon himself have made substantial fortunes in the stock market and

claim to have used Kelly’s ideas

26

.

Kelly is also famous for an early demonstration of speech synthesis in

which a computer sang ‘Daisy Bell’. This inspired the corresponding scene

in the ﬁlm 2001.

Before rushing out to the race track or stock exchange[27], the reader is invited to run computer simulations of the result of Kelly gambling for various values of u and p. She will observe that although, in the very long run, the system works, the short run can be very unpleasant indeed.
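The simulation the reader is invited to run takes only a few lines. The sketch below is illustrative: the values u = 2.5, p = 0.5, the bet counts and the over-betting fraction 0.9 are arbitrary choices, not values from the text.

```python
import random

def kelly_fraction(u, p):
    """Optimal fraction of your fortune to bet: w = (up - 1)/(u - 1), or 0 if up <= 1."""
    return max(0.0, (u * p - 1) / (u - 1))

def simulate(u, p, w, n_bets, seed=0):
    """Fortune after n_bets bets of fraction w, starting from 1 unit."""
    rng = random.Random(seed)
    fortune = 1.0
    for _ in range(n_bets):
        stake = w * fortune
        fortune -= stake              # she keeps the stake whatever happens
        if rng.random() < p:          # heads, with probability p
            fortune += stake * u      # she pays you ku
    return fortune

w = kelly_fraction(2.5, 0.5)              # up = 1.25 > 1, so Kelly says bet w = 1/6
print(w)
print(simulate(2.5, 0.5, w, 10000))       # typically grows over many bets
print(simulate(2.5, 0.5, 0.9, 10000))     # over-betting: typically collapses towards 0
```

Re-running with different seeds shows exactly the warning in the text: the long-run growth is reliable, but individual runs can spend a long time below the starting fortune.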

Exercise 11.3. Returning to our original problem, show that, if you bet less than the optimal proportion, your fortune will still tend to increase, but more slowly; but, if you bet more than some proportion w_1, your fortune will decrease. Write down the equation for w_1.

[Moral: If you use the Kelly criterion, veer on the side of under-betting.]

12 Linear codes

The next few sections involve no probability at all. We shall only be inter-

ested in constructing codes which are easy to handle and have all their code

words at least a certain Hamming distance apart.

Just as R^n is a vector space over R and C^n is a vector space over C, so F_2^n is a vector space over F_2. (If you know about vector spaces over fields, so much the better; if not, just follow the obvious paths.) A linear code is a subspace of F_2^n. More formally, we have the following definition.

Definition 12.1. A linear code is a subset C of F_2^n such that
(i) 0 ∈ C,
(ii) if x, y ∈ C, then x + y ∈ C.

Note that, if λ ∈ F_2, then λ = 0 or λ = 1, so that condition (i) of the definition just given guarantees that λx ∈ C whenever x ∈ C. We shall see that linear codes have many useful properties.

Example 12.2. (i) The repetition code with

C = {(x, x, . . . , x) : x ∈ F_2}

is a linear code.

[26] However, we hear more about mathematicians who win on the stock market than those who lose.

[27] A sprat which thinks it's a shark will have a very short life.

(ii) The paper tape code

C = { x : ∑_{j=1}^{n} x_j = 0 }

is a linear code.

(iii) Hamming's original code is a linear code.

The verification is easy. In fact, examples (ii) and (iii) are 'parity check codes' and so automatically linear, as we see from the next lemma.

Definition 12.3. Consider a set P in F_2^n. We say that C is the code defined by the set of parity checks P if the elements of C are precisely those x ∈ F_2^n with

∑_{j=1}^{n} p_j x_j = 0

for all p ∈ P.

Lemma 12.4. If C is a code defined by parity checks, then C is linear.

We now prove the converse result.

Definition 12.5. If C is a linear code, we write C^⊥ for the set of p ∈ F_2^n such that

∑_{j=1}^{n} p_j x_j = 0

for all x ∈ C.

Thus C^⊥ is the set of parity checks satisfied by C.

Lemma 12.6. If C is a linear code, then
(i) C^⊥ is a linear code,
(ii) (C^⊥)^⊥ ⊇ C.

We call C^⊥ the dual code to C.

In the language of the course on linear mathematics, C^⊥ is the annihilator of C. The following is a standard theorem of that course.

Lemma 12.7. If C is a linear code in F_2^n, then

dim C + dim C^⊥ = n.


Since the treatment of dual spaces is not the most popular piece of math-

ematics in IB, we shall give an independent proof later (see the note after

Lemma 12.13). Combining Lemma 12.6 (ii) with Lemma 12.7, we get the

following corollaries.

Lemma 12.8. If C is a linear code, then (C^⊥)^⊥ = C.

Lemma 12.9. Every linear code is deﬁned by parity checks.

Our treatment of linear codes has been rather abstract. In order to put

computational ﬂesh on the dry theoretical bones, we introduce the notion of

a generator matrix.

Definition 12.10. If C is a linear code of length n, any r × n matrix whose rows form a basis for C is called a generator matrix for C. We say that C has dimension or rank r.

Example 12.11. As examples, we can find generator matrices for the repetition code, the paper tape code and the original Hamming code.

Remember that the Hamming code is the code of length 7 given by the parity conditions

x_1 + x_3 + x_5 + x_7 = 0
x_2 + x_3 + x_6 + x_7 = 0
x_4 + x_5 + x_6 + x_7 = 0.
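As an illustrative aside (not part of the notes), the three parity conditions above are easy to check by machine: the sketch below enumerates all 16 codewords and confirms the familiar parameters of the code.

```python
from itertools import product

# The three parity checks of the length-7 Hamming code, as 1-based index sets.
CHECKS = [(1, 3, 5, 7), (2, 3, 6, 7), (4, 5, 6, 7)]

def is_codeword(x):
    """x is a tuple of 7 bits; test all three parity conditions over F_2."""
    return all(sum(x[i - 1] for i in idx) % 2 == 0 for idx in CHECKS)

codewords = [x for x in product((0, 1), repeat=7) if is_codeword(x)]
print(len(codewords))   # 16 = 2^4, so the code has rank 4

# For a linear code, the minimum distance equals the minimum non-zero weight.
min_weight = min(sum(x) for x in codewords if any(x))
print(min_weight)       # 3
```

Rank 4 and minimum distance 3 are exactly the parameters that make this a perfect 1 error correcting code of length 7.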

By using row operations and column permutations to perform Gaussian

elimination, we can give a constructive proof of the following lemma.

Lemma 12.12. Any linear code of length n has (possibly after permuting the order of coordinates) a generator matrix of the form

(I_r | B).

Notice that this means that any codeword x can be written as

(y|z) = (y|yB),

where y = (y_1, y_2, . . . , y_r) may be considered as the message and the vector z = yB of length n − r may be considered as the check digits. Any code whose codewords can be split up in this manner is called systematic.

We now give a more computational treatment of parity checks.


Lemma 12.13. If C is a linear code of length n with generator matrix G, then a ∈ C^⊥ if and only if

Ga^T = 0^T.

Thus

C^⊥ = (ker G)^T.

Using the rank–nullity theorem, we get a second proof of Lemma 12.7.

Lemma 12.13 enables us to characterise C^⊥.

Lemma 12.14. If C is a linear code of length n and dimension r with generator the r × n matrix G then, if H is any n × (n − r) matrix with columns forming a basis of ker G, we know that H is a parity check matrix for C and its transpose H^T is a generator for C^⊥.

Example 12.15. (i) The dual of the paper tape code is the repetition code.
(ii) Hamming's original code has dual with generator matrix

( 1 0 1 0 1 0 1 )
( 0 1 1 0 0 1 1 )
( 0 0 0 1 1 1 1 )

We saw above that the codewords of a linear code can be written

(y|z) = (y|yB),

where y may be considered as the vector of message digits and z = yB as the vector of check digits. Thus encoders for linear codes are easy to construct.

What about decoders? Recall that every linear code of length n has a (non-unique) associated parity check matrix H with the property that x ∈ C if and only if xH = 0. If z ∈ F_2^n, we define the syndrome of z to be zH. The following lemma is mathematically trivial but forms the basis of the method of syndrome decoding.

Lemma 12.16. Let C be a linear code with parity check matrix H. If we are given z = x + e, where x is a code word and the 'error vector' e ∈ F_2^n, then

zH = eH.

Suppose we have tabulated the syndrome uH for all u with 'few' non-zero entries (say, all u with d(u, 0) ≤ K). When our decoder receives z, it computes the syndrome zH. If the syndrome is zero, then z ∈ C and the decoder assumes the transmitted message was z. If the syndrome of the received message is a non-zero vector w, the decoder searches its list until it finds an e with eH = w. The decoder then assumes that the transmitted message was x = z − e (note that z − e will always be a codeword, even if not the right one). This procedure will fail if w does not appear in the list, but, for this to be the case, at least K + 1 errors must have occurred.
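A minimal sketch of this procedure for the K = 1 case, using the Hamming (7,4) code (the particular matrix layout, with row i of H the binary expansion of i, is an illustrative convention, not prescribed by the notes):

```python
# Parity check matrix H for the Hamming (7,4) code: row i is the binary
# expansion of i + 1, so the syndrome of a single error "names" its position.
H = [[(i >> b) & 1 for b in range(3)] for i in range(1, 8)]

def syndrome(z):
    """Compute zH over F_2 for a length-7 word z."""
    return tuple(sum(z[i] * H[i][b] for i in range(7)) % 2 for b in range(3))

def decode(z):
    """Syndrome decoding with K = 1: correct at most one flipped bit."""
    s = syndrome(z)
    if s == (0, 0, 0):
        return list(z)                      # z is already a codeword
    pos = s[0] + 2 * s[1] + 4 * s[2] - 1    # syndrome = binary position of the error
    corrected = list(z)
    corrected[pos] ^= 1
    return corrected

# Flip one bit of the zero codeword and recover it.
z = [0, 0, 0, 0, 0, 0, 0]
z[4] ^= 1
print(decode(z))    # [0, 0, 0, 0, 0, 0, 0]
```

This is exactly the special case recalled in the text: the non-zero syndrome is the index, written in binary, of the place assumed to be in error.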

If we take K = 1, that is, we only want a 1 error correcting code, then, writing e^{(i)} for the vector in F_2^n with 1 in the ith place and 0 elsewhere, we see that the syndrome e^{(i)}H is the ith row of H. If the received message z has syndrome zH equal to the ith row of H, then the decoder assumes that there has been an error in the ith place and nowhere else. (Recall the special case of Hamming's original code.)

If K is large the task of searching the list of possible syndromes becomes

onerous and, unless (as sometimes happens) we can ﬁnd another trick, we

ﬁnd that ‘decoding becomes dear’ although ‘encoding remains cheap’.

We conclude this section by looking at weights and the weight enumeration polynomial for a linear code. The idea here is to exploit the fact that, if C is a linear code and a ∈ C, then a + C = C. Thus the 'view of C' from any codeword a is the same as the 'view of C' from the particular codeword 0.

Definition 12.17. The weight w(x) of a vector x ∈ F_2^n is given by

w(x) = d(0, x).

Lemma 12.18. If w is the weight function on F_2^n and x, y ∈ F_2^n, then
(i) w(x) ≥ 0,
(ii) w(x) = 0 if and only if x = 0,
(iii) w(x) + w(y) ≥ w(x + y).

Since the minimum (non-zero) weight in a linear code is the same as the

minimum (non-zero) distance, we can talk about linear codes of minimum

weight d when we mean linear codes of minimum distance d.

The pattern of distances in a linear code is encapsulated in the weight

enumeration polynomial.

Definition 12.19. Let C be a linear code of length n. We write A_j for the number of codewords of weight j and define the weight enumeration polynomial W_C to be the polynomial in two real variables given by

W_C(s, t) = ∑_{j=0}^{n} A_j s^j t^{n−j}.

Here are some simple properties of W_C.


Lemma 12.20. Under the assumptions and with the notation of Definition 12.19, the following results are true.
(i) W_C is a homogeneous polynomial of degree n.
(ii) If C has rank r, then W_C(1, 1) = 2^r.
(iii) W_C(0, 1) = 1.
(iv) W_C(1, 0) takes the value 0 or 1.
(v) W_C(s, t) = W_C(t, s) for all s and t if and only if W_C(1, 0) = 1.

Lemma 12.21. For our standard model of communication along an error prone channel with independent errors of probability p and a linear code C of length n,

W_C(p, 1 − p) = Pr(receive a code word | code word transmitted)

and

Pr(receive incorrect code word | code word transmitted) = W_C(p, 1 − p) − (1 − p)^n.

Example 12.22. (i) If C is the repetition code, W_C(s, t) = s^n + t^n.
(ii) If C is the paper tape code of length n, W_C(s, t) = (1/2)((s + t)^n + (t − s)^n).

Example 12.22 is a special case of the MacWilliams identity.

Theorem 12.23. [MacWilliams identity] If C is a linear code,

W_{C^⊥}(s, t) = 2^{−dim C} W_C(t − s, t + s).

We give a proof as Exercise 26.9. (The result is thus not bookwork, though it could be set as a problem with appropriate hints.)
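The proof is not reproduced here, but the identity is easy to check numerically on a small code. The sketch below (an illustrative check, not a proof) evaluates both sides for the Hamming (7,4) code at an arbitrary point (s, t) = (0.3, 0.7):

```python
from itertools import product

# The Hamming (7,4) code via its parity checks.
CHECKS = [(1, 3, 5, 7), (2, 3, 6, 7), (4, 5, 6, 7)]
C = [x for x in product((0, 1), repeat=7)
     if all(sum(x[i - 1] for i in idx) % 2 == 0 for idx in CHECKS)]

# Its dual: every word orthogonal (over F_2) to all of C.
C_dual = [p for p in product((0, 1), repeat=7)
          if all(sum(pi * xi for pi, xi in zip(p, x)) % 2 == 0 for x in C)]

def W(code, s, t):
    """Weight enumeration polynomial of a length-7 code, evaluated at (s, t)."""
    return sum(s ** sum(x) * t ** (7 - sum(x)) for x in code)

# MacWilliams: W_{C^perp}(s, t) = 2^{-dim C} W_C(t - s, t + s), with dim C = 4.
s, t = 0.3, 0.7
lhs = W(C_dual, s, t)
rhs = W(C, t - s, t + s) / 16
print(abs(lhs - rhs) < 1e-9)    # True
```

Since both sides are polynomials of degree 7, agreement at enough points would in fact force the identity, but a single random point already makes a convincing sanity check.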

13 Some general constructions

However interesting the theoretical study of codes may be to a pure mathe-

matician, the engineer would prefer to have an arsenal of practical codes so

that she can select the one most suitable for the job in hand. In this section

we discuss the general Hamming codes and the Reed-Muller codes as well as

some simple methods of obtaining new codes from old.

Definition 13.1. Let d be a strictly positive integer and let n = 2^d − 1. Consider the (column) vector space D = F_2^d. Write down a d × n matrix H whose columns are the 2^d − 1 distinct non-zero vectors of D. The Hamming (n, n − d) code is the linear code of length n with H^T as parity check matrix.


Of course the Hamming (n, n−d) code is only deﬁned up to permutation

of coordinates. We note that H has rank d, so a simple use of the rank–nullity theorem shows that our notation is consistent.

Lemma 13.2. The Hamming (n, n − d) code is a linear code of length n and rank n − d [n = 2^d − 1].

Example 13.3. The Hamming (7, 4) code is the original Hamming code.

The fact that any two rows of H are linearly independent and a look at the

appropriate syndromes gives us the main property of the general Hamming

code.

Lemma 13.4. The Hamming (n, n − d) code has minimum weight 3 and is a perfect 1 error correcting code [n = 2^d − 1].

Hamming codes are ideal in situations where very long strings of binary

digits must be transmitted but the chance of an error in any individual

digit is very small. (Look at Exercise 7.5.) Although the search for perfect

codes other than the Hamming codes produced the Golay code (not discussed

here) and much interesting combinatorics, the reader is warned that, from a

practical point of view, it represents a dead end[28].

Here are a number of simple tricks for creating new codes from old.

Definition 13.5. If C is a code of length n, the parity check extension C^+ of C is the code of length n + 1 given by

C^+ = { x ∈ F_2^{n+1} : (x_1, x_2, . . . , x_n) ∈ C, ∑_{j=1}^{n+1} x_j = 0 }.

Definition 13.6. If C is a code of length n, the truncation C^− of C is the code of length n − 1 given by

C^− = {(x_1, x_2, . . . , x_{n−1}) : (x_1, x_2, . . . , x_n) ∈ C for some x_n ∈ F_2}.

[28] If we confine ourselves to the binary codes discussed in this course, it is known that perfect codes of length n with Hamming spheres of radius ρ exist for ρ = 0, ρ = n, ρ = (n − 1)/2 with n odd (the three codes just mentioned are easy to identify), ρ = 3 and n = 23 (the Golay code, found by direct search) and ρ = 1 and n = 2^m − 1. There are known to be non-Hamming codes with ρ = 1 and n = 2^m − 1; it is suspected that there are many of them and they are the subject of much research, but, of course, they present no practical advantages. The only linear perfect codes with ρ = 1 and n = 2^m − 1 are the Hamming codes.

Definition 13.7. If C is a code of length n, the shortening (or puncturing) C′ of C by the symbol α (which may be 0 or 1) is the code of length n − 1 given by

C′ = {(x_1, x_2, . . . , x_{n−1}) : (x_1, x_2, . . . , x_{n−1}, α) ∈ C}.

Lemma 13.8. If C is linear, so are its parity check extension C^+, its truncation C^− and its shortening C′ (provided that the symbol chosen is 0).

How can we combine two linear codes C_1 and C_2? Our first thought might be to look at their direct sum

C_1 ⊕ C_2 = {(x|y) : x ∈ C_1, y ∈ C_2},

but this is unlikely to be satisfactory.

Lemma 13.9. If C_1 and C_2 are linear codes, then we have the following relation between minimum distances:

d(C_1 ⊕ C_2) = min( d(C_1), d(C_2) ).

On the other hand, if C_1 and C_2 satisfy rather particular conditions, we can obtain a more promising construction.

Definition 13.10. Suppose C_1 and C_2 are linear codes of length n with C_1 ⊇ C_2 (i.e. with C_2 a subspace of C_1). We define the bar product C_1|C_2 of C_1 and C_2 to be the code of length 2n given by

C_1|C_2 = {(x|x + y) : x ∈ C_1, y ∈ C_2}.

Lemma 13.11. Let C_1 and C_2 be linear codes of length n with C_1 ⊇ C_2. Then the bar product C_1|C_2 is a linear code with

rank C_1|C_2 = rank C_1 + rank C_2.

The minimum distance of C_1|C_2 satisfies the equality

d(C_1|C_2) = min(2d(C_1), d(C_2)).
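The rank and distance formulas are easy to verify on a small example. The sketch below (an illustrative choice, not from the notes) takes C_1 the paper tape code of length 4 and C_2 the repetition code of length 4, so that C_1 ⊇ C_2:

```python
from itertools import product
from math import log2

n = 4
C1 = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]  # paper tape: rank 3, d = 2
C2 = [(0,) * n, (1,) * n]                                       # repetition: rank 1, d = 4

# Bar product C1|C2 = {(x|x+y) : x in C1, y in C2}, a code of length 2n = 8.
bar = {x + tuple((xi + yi) % 2 for xi, yi in zip(x, y))
       for x in C1 for y in C2}

rank = int(log2(len(bar)))
d = min(sum(c) for c in bar if any(c))
print(rank, d)    # 4 4: rank C1 + rank C2 = 3 + 1 and min(2 d(C1), d(C2)) = min(4, 4)
```

Here both terms of the minimum happen to coincide; varying C_1 and C_2 shows which term bites in general.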

We now return to the construction of specific codes. Recall that the Hamming codes are suitable for situations when the error rate p is very small and we want a high information rate. The Reed-Muller codes are suitable when the error rate is very high and we are prepared to sacrifice information rate. They were used by NASA for the radio transmissions from its planetary probes (a task which has been compared to signalling across the Atlantic with a child's torch[29]).

We start by considering the 2^d points P_0, P_1, . . . , P_{2^d−1} of the space X = F_2^d. Our code words will be of length n = 2^d and will correspond to the indicator functions I_A on X. More specifically, the possible code word c^A is given by

c^A_i = 1 if P_i ∈ A,
c^A_i = 0 otherwise,

for some A ⊆ X.

In addition to the usual vector space structure on F_2^n, we define a new operation

c^A ∧ c^B = c^{A∩B}.

Thus, if x, y ∈ F_2^n,

(x_0, x_1, . . . , x_{n−1}) ∧ (y_0, y_1, . . . , y_{n−1}) = (x_0 y_0, x_1 y_1, . . . , x_{n−1} y_{n−1}).

Finally we consider the collection of d hyperplanes

π_j = {p ∈ X : p_j = 0}   [1 ≤ j ≤ d]

in X and the corresponding indicator functions

h_j = c^{π_j},

together with the special vector

h_0 = c^X = (1, 1, . . . , 1).

Exercise 13.12. Suppose that x, y, z ∈ F_2^n and A, B ⊆ X.
(i) Show that x ∧ y = y ∧ x.
(ii) Show that (x + y) ∧ z = x ∧ z + y ∧ z.
(iii) Show that h_0 ∧ x = x.
(iv) If c^A + c^B = c^E, find E in terms of A and B.
(v) If h_0 + c^A = c^E, find E in terms of A.

We refer to 𝒜_0 = {h_0} as the set of terms of order zero. If 𝒜_k is the set of terms of order at most k, then the set 𝒜_{k+1} of terms of order at most k + 1 is defined by

𝒜_{k+1} = {a ∧ h_j : a ∈ 𝒜_k, 1 ≤ j ≤ d}.

Less formally, but more clearly, the elements of order 1 are the h_i, the elements of order 2 are the h_i ∧ h_j with i < j, the elements of order 3 are the h_i ∧ h_j ∧ h_k with i < j < k, and so on.

[29] Strictly speaking, the comparison is meaningless. However, it sounds impressive and that is the main thing.

Definition 13.13. Using the notation established above, the Reed-Muller code RM(d, r) is the linear code (i.e. subspace of F_2^n) generated by the terms of order r or less.

Although the formal definition of the Reed-Muller codes looks pretty impenetrable at first sight, once we have looked at sufficiently many examples it should become clear what is going on.

Example 13.14. (i) The RM(3, 0) code is the repetition code of length 8.
(ii) The RM(3, 1) code is the parity check extension of Hamming's original code.
(iii) The RM(3, 2) code is the paper tape code of length 8.
(iv) The RM(3, 3) code is the trivial code consisting of all the elements of F_2^8.
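These examples can be checked by machine. The sketch below (an illustrative aside) builds h_0 and the hyperplane indicators h_1, h_2, h_3 for d = 3, spans the terms of order at most r, and reads off the dimensions:

```python
from itertools import product, combinations

d = 3
points = list(product((0, 1), repeat=d))           # the 2^d points of X = F_2^d
n = len(points)                                    # code length n = 8

h0 = (1,) * n                                      # indicator of X itself
h = [tuple(1 if p[j] == 0 else 0 for p in points)  # h_j: indicator of the hyperplane p_j = 0
     for j in range(d)]

def wedge(x, y):
    """Componentwise product, i.e. intersection of indicator functions."""
    return tuple(a * b for a, b in zip(x, y))

def rm(r):
    """All codewords of RM(3, r): the F_2-span of the terms of order at most r."""
    gens = [h0]
    for k in range(1, r + 1):
        for idx in combinations(range(d), k):
            t = h0
            for j in idx:
                t = wedge(t, h[j])
            gens.append(t)
    span = {(0,) * n}
    for g in gens:                                 # grow the subspace one generator at a time
        span |= {tuple((a + b) % 2 for a, b in zip(v, g)) for v in span}
    return span

print([len(rm(r)) for r in range(4)])          # [2, 16, 128, 256]: dimensions 1, 4, 7, 8
print(all(sum(c) % 2 == 0 for c in rm(2)))     # True: RM(3, 2) has only even-weight words
```

The sizes 2, 16, 128, 256 match the repetition code, the extended Hamming code, the paper tape code and the whole space, as in Example 13.14.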

We now prove the key properties of the Reed-Muller codes. We use the notation established above.

Theorem 13.15. (i) The elements of order d or less (that is, the collection of all possible wedge products formed from the h_i) span F_2^n.
(ii) The elements of order d or less are linearly independent.
(iii) The dimension of the Reed-Muller code RM(d, r) is

\binom{d}{0} + \binom{d}{1} + \binom{d}{2} + · · · + \binom{d}{r}.

(iv) Using the bar product notation, we have

RM(d, r) = RM(d − 1, r)|RM(d − 1, r − 1).

(v) The minimum weight of RM(d, r) is exactly 2^{d−r}.

Exercise 13.16. The Mariner mission to Mars used the RM(5, 1) code. What was its information rate? What proportion of errors could it correct in a single code word?

Exercise 13.17. Show that the RM(d, d − 2) code is the parity extension code of the Hamming (N, N − d) code with N = 2^d − 1. (This is useful because we often want codes of length 2^d.)


14 Polynomials and ﬁelds

This section is starred. Its object is to make plausible the few facts from modern[30] algebra that we shall need. They were covered, along with much

else, in various post-IA algebra courses, but attendance at those courses is

no more required for this course than is reading Joyce’s Ulysses before going

for a night out at an Irish pub. Anyone capable of criticising the imprecision

and general slackness of the account that follows obviously can do better

themselves and should rewrite this section in an appropriate manner.

A field K is an object equipped with addition and multiplication which follow the same rules as do addition and multiplication in R. The only rule which will cause us trouble is

(⋆)  If x ∈ K and x ≠ 0, then we can find y ∈ K such that xy = 1.

Obvious examples of fields include R, C and F_2.

We are particularly interested in polynomials over ﬁelds, but here an

interesting diﬃculty arises.

Example 14.1. We have t^2 + t = 0 for all t ∈ F_2.

To get round this, we distinguish between the polynomial in the 'indeterminate' X,

P(X) = ∑_{j=0}^{n} a_j X^j,

with coefficients a_j ∈ K, and its evaluation P(t) = ∑_{j=0}^{n} a_j t^j for some t ∈ K. We manipulate polynomials in X according to the standard rules for polynomials, but say that

∑_{j=0}^{n} a_j X^j = 0

if and only if a_j = 0 for all j. Thus X^2 + X is a non-zero polynomial over F_2 all of whose values are zero.

The following result is familiar, in essence, from school mathematics.

Lemma 14.2. [Remainder theorem] (i) If P is a polynomial over a ﬁeld

K and a ∈ K, then we can ﬁnd a polynomial Q and an r ∈ K such that

P(X) = (X −a)Q(X) +r.

(ii) If P is a polynomial over a ﬁeld K and a ∈ K is such that P(a) = 0,

then we can ﬁnd a polynomial Q such that

P(X) = (X −a)Q(X).

[30] Modern, that is, in 1920.

The key to much of the elementary theory of polynomials lies in the fact

that we can apply Euclid’s algorithm to obtain results like the following.

Theorem 14.3. Suppose that T is a set of polynomials which contains at

least one non-zero polynomial and has the following properties.

(i) If Q is any polynomial and P ∈ T, then the product PQ ∈ T.

(ii) If P_1, P_2 ∈ T, then P_1 + P_2 ∈ T.

Then we can find a non-zero P_0 ∈ T which divides every P ∈ T.

Proof. Consider a non-zero polynomial P_0 of smallest degree in T.

Recall that the polynomial P(X) = X^2 + 1 has no roots in R (that is, P(t) ≠ 0 for all t ∈ R). However, by considering the collection of formal expressions a + bi [a, b ∈ R] with the obvious formal definitions of addition and multiplication and subject to the further condition i^2 + 1 = 0, we obtain a field C ⊇ R in which P has a root (since P(i) = 0). We can perform a similar trick with other fields.

Example 14.4. If P(X) = X^2 + X + 1, then P has no roots in F_2. However, if we consider

F_2[ω] = {0, 1, ω, 1 + ω}

with obvious formal definitions of addition and multiplication and subject to the further condition ω^2 + ω + 1 = 0, then F_2[ω] is a field containing F_2 in which P has a root (since P(ω) = 0).

Proof. The only thing we really need prove is that F_2[ω] is a field, and to do that the only thing we need to prove is that ⋆ holds. Since

(1 + ω)ω = 1,

this is easy.
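The four-element field can also be checked mechanically. A minimal sketch (an illustrative aside, not part of the notes): represent each element as a pair (a, b) meaning a + bω and reduce products using ω² = ω + 1, then verify that ⋆ holds.

```python
from itertools import product

# Elements of F_2[omega] as pairs (a, b) meaning a + b*omega, with omega^2 = omega + 1.
def add(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def mul(x, y):
    # (a + b w)(c + d w) = ac + (ad + bc) w + bd w^2, and w^2 = w + 1.
    a, b = x
    c, d = y
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

elements = list(product((0, 1), repeat=2))
one = (1, 0)

# Star property: every non-zero element has a multiplicative inverse.
for x in elements:
    if x != (0, 0):
        assert any(mul(x, y) == one for y in elements)
print(mul((1, 1), (0, 1)))    # (1, 0), i.e. (1 + w) * w = 1, as in the proof
```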

In order to state a correct generalisation of the ideas of the previous

paragraph we need a preliminary deﬁnition.

Definition 14.5. If P is a polynomial over a field K, we say that P is reducible if there exists a non-constant polynomial Q of degree strictly less than that of P which divides P. If P is a non-constant polynomial which is not reducible, then P is irreducible.

Theorem 14.6. If P is an irreducible polynomial of degree n ≥ 2 over a field K, then P has no roots in K. However, if we consider

K[ω] = { ∑_{j=0}^{n−1} a_j ω^j : a_j ∈ K }

with the obvious formal definitions of addition and multiplication and subject to the further condition P(ω) = 0, then K[ω] is a field containing K in which P has a root.

Proof. The only thing we really need prove is that K[ω] is a field, and to do that the only thing we need to prove is that ⋆ holds. Let Q be a non-zero polynomial of degree at most n − 1. Since P is irreducible, the polynomials P and Q have no common factor of degree 1 or more. Hence, by Euclid's algorithm, we can find polynomials R and S such that

R(X)Q(X) + S(X)P(X) = 1,

and so R(ω)Q(ω) + S(ω)P(ω) = 1. But P(ω) = 0, so R(ω)Q(ω) = 1 and we have proved ⋆.

In a proper algebra course we would simply deﬁne

K[ω] = K[X]/(P(X))

where (P(X)) is the ideal generated by P(X). This is a cleaner procedure

which avoids the use of such phrases as ‘the obvious formal deﬁnitions of

addition and multiplication’ but the underlying idea remains the same.

Lemma 14.7. If P is a polynomial over a ﬁeld K which does not factorise

completely into linear factors, then we can ﬁnd a ﬁeld L ⊇ K in which P has

more linear factors.

Proof. Factor P into irreducible factors and choose a factor Q which is not

linear. By Theorem 14.6, we can ﬁnd a ﬁeld L ⊇ K in which Q has a root α

say and so, by Lemma 14.2, a linear factor X −α. Since any linear factor of

P in K remains a factor in the bigger ﬁeld L, we are done.

Theorem 14.8. If P is a polynomial over a ﬁeld K, then we can ﬁnd a ﬁeld

L ⊇ K in which P factorises completely into linear factors.

We shall be interested in ﬁnite ﬁelds (that is ﬁelds K with only a ﬁnite

number of elements). A glance at our method of proving Theorem 14.8 shows

that the following result holds.

Lemma 14.9. If P is a polynomial over a ﬁnite ﬁeld K, then we can ﬁnd a

ﬁnite ﬁeld L ⊇ K in which P factorises completely.

In this context, we note yet another useful simple consequence of Euclid’s

algorithm.


Lemma 14.10. Suppose that P is an irreducible polynomial over a ﬁeld K

which has a linear factor X − α in some ﬁeld L ⊇ K. If Q is a polynomial

over K which has the factor X −α in L, then P divides Q.

We shall need a lemma on repeated roots.

Lemma 14.11. Let K be a field. If P(X) = ∑_{j=0}^{n} a_j X^j is a polynomial over K, we define P′(X) = ∑_{j=1}^{n} j a_j X^{j−1}.

(i) If P and Q are polynomials, (P + Q)′ = P′ + Q′ and (PQ)′ = P′Q + PQ′.
(ii) If P and Q are polynomials with P(X) = (X − a)^2 Q(X), then

P′(X) = 2(X − a)Q(X) + (X − a)^2 Q′(X).

(iii) If P is divisible by (X − a)^2, then P(a) = P′(a) = 0.

If L is a field containing F_2, then 2y = (1 + 1)y = 0y = 0 for all y ∈ L. We can thus deduce the following result, which will be used in the next section.

Lemma 14.12. If L is a field containing F_2 and n is an odd integer, then X^n − 1 can have no repeated linear factors as a polynomial over L.

We also need a result on roots of unity, given as part (v) of the next lemma.

Lemma 14.13. (i) If G is a finite Abelian group and x, y ∈ G have coprime orders r and s, then xy has order rs.
(ii) If G is a finite Abelian group and x, y ∈ G have orders r and s, then we can find an element z of G with order the lowest common multiple of r and s.
(iii) If G is a finite Abelian group, then there exists an N and an h ∈ G such that h has order N and g^N = e for all g ∈ G.
(iv) If G is a finite subset of a field K which is a group under multiplication, then G is cyclic.
(v) Suppose n is an odd integer. If L is a field containing F_2 such that X^n − 1 factorises completely into linear terms, then we can find an ω ∈ L such that the roots of X^n − 1 are 1, ω, ω^2, . . . , ω^{n−1}. (We call ω a primitive nth root of unity.)

Proof. (ii) Consider z = x^u y^v, where u is a divisor of r, v is a divisor of s, r/u and s/v are coprime and rs/(uv) = lcm(r, s).

(iii) Let h be an element of highest order in G and use (ii).

(iv) By (iii) we can find an integer N and an h ∈ G such that h has order N and any element g ∈ G satisfies g^N = 1. Thus X^N − 1 has a linear factor X − g for each g ∈ G and so ∏_{g∈G}(X − g) divides X^N − 1. It follows that the order |G| of G cannot exceed N. But, by Lagrange's theorem, N divides |G|. Thus |G| = N and h generates G.

(v) Observe that G = {ω : ω^n = 1} is an Abelian group with exactly n elements (since X^n − 1 has no repeated roots) and use (iv).

Here is another interesting consequence of Lemma 14.13 (iv).

Lemma 14.14. If K is a field with m elements, then there is an element k of K such that

K = {0} ∪ {k^r : 0 ≤ r ≤ m − 2}

and k^{m−1} = 1.

Proof. Observe that K \ {0} forms an Abelian group under multiplication.

We call an element k with the properties given in Lemma 14.14 a primitive element of K.

Exercise 14.15. Find all the primitive elements of F_7.

With this hint, it is not hard to show that there is indeed a field with 2^n elements containing F_2.

Lemma 14.16. Let L be some field containing F_2 in which X^{2^n−1} − 1 factorises completely. Then

K = {x ∈ L : x^{2^n} = x}

is a field with 2^n elements containing F_2.

Lemma 14.14 shows that there is (up to field isomorphism) only one field with 2^n elements containing F_2. We call it F_{2^n}.
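As a concrete sketch (an illustrative aside, not part of the notes), F_8 can be built as F_2[X] modulo the irreducible cubic X^3 + X + 1, in the spirit of Theorem 14.6; the code below verifies that every element satisfies x^8 = x and that X itself is a primitive element:

```python
from itertools import product

# F_8 as triples (a0, a1, a2) meaning a0 + a1 X + a2 X^2, modulo X^3 + X + 1.
def mul(x, y):
    """Multiply as polynomials over F_2, then reduce using X^3 = X + 1."""
    prod = [0] * 5
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            prod[i + j] ^= a & b
    for k in (4, 3):                 # reduce the degree-4 and degree-3 terms
        if prod[k]:
            prod[k] = 0
            prod[k - 3] ^= 1         # X^k = X^{k-3} (X + 1)
            prod[k - 2] ^= 1
    return tuple(prod[:3])

def power(x, m):
    r = (1, 0, 0)
    for _ in range(m):
        r = mul(r, x)
    return r

elements = list(product((0, 1), repeat=3))

# Lemma 14.16 with n = 3: every element satisfies x^8 = x.
print(all(power(x, 8) == x for x in elements))   # True

# X is a primitive element: its powers run through all 7 non-zero elements.
print(len({power((0, 1, 0), r) for r in range(7)}))   # 7
```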

15 Cyclic codes

In this section, we discuss a subclass of linear codes, the so-called cyclic codes.

Definition 15.1. A linear code C in F_2^n is called cyclic if

(a_0, a_1, . . . , a_{n−2}, a_{n−1}) ∈ C ⇒ (a_1, a_2, . . . , a_{n−1}, a_0) ∈ C.


Let us establish a correspondence between F_2^n and the polynomials over F_2 modulo X^n − 1 by setting

P_a = ∑_{j=0}^{n−1} a_j X^j

whenever a ∈ F_2^n. (Of course, X^n − 1 = X^n + 1, but in this context the first expression seems more natural.)
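Concretely (an illustrative sketch, not from the notes): multiplying P_a by X modulo X^n − 1 cyclically shifts the coefficient vector, which is why cyclic codes and ideals modulo X^n − 1 correspond.

```python
n = 7

def times_X(a):
    """Coefficients of X * P_a modulo X^n - 1: the top coefficient wraps
    around, so (a_0, ..., a_{n-1}) becomes (a_{n-1}, a_0, ..., a_{n-2})."""
    return [a[-1]] + a[:-1]

a = [1, 0, 1, 1, 0, 0, 0]      # represents 1 + X^2 + X^3
print(times_X(a))              # [0, 1, 0, 1, 1, 0, 0], i.e. X + X^3 + X^4

# Applying the map n times returns the original word.
b = a
for _ in range(n):
    b = times_X(b)
print(b == a)                  # True
```

This shift runs in the opposite direction to the one in Definition 15.1, but a code closed under one is closed under the other, since each is the (n−1)-fold iterate of its inverse.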

Exercise 15.2. With the notation just established, show that
(i) P_a + P_b = P_{a+b},
(ii) P_a = 0 if and only if a = 0.

Lemma 15.3. A code C in F_2^n is cyclic if and only if T_C = {P_a : a ∈ C} satisfies the following two conditions (working modulo X^n − 1).
(i) If f, g ∈ T_C, then f + g ∈ T_C.
(ii) If f ∈ T_C and g is any polynomial, then the product fg ∈ T_C.
(In the language of abstract algebra, C is cyclic if and only if T_C is an ideal of the quotient ring F_2[X]/(X^n − 1).)

From now on we shall talk of the code word f(X) when we mean the code word a with P_a(X) = f(X). An application of Euclid's algorithm gives the following useful result.

Lemma 15.4. A code C of length n is cyclic if and only if (working modulo X^n − 1, and using the conventions established above) there exists a polynomial g such that

C = {f(X)g(X) : f a polynomial}.

(In the language of abstract algebra, F_2[X] is a Euclidean domain and so a principal ideal domain. Thus every ideal of the quotient F_2[X]/(X^n − 1) is principal.) We call g(X) a generator polynomial for C.

Lemma 15.5. A polynomial g is a generator for a cyclic code of length n if and only if it divides X^n − 1.

Thus we must seek generators among the factors of X^n − 1 = X^n + 1. If there are no conditions on n, the result can be rather disappointing.

Exercise 15.6. If we work with polynomials over F_2, then

X^{2^r} + 1 = (X + 1)^{2^r}.

In order to avoid this problem and to be able to make use of Lemma 14.12,

we shall take n odd from now on. (In this case, the cyclic codes are said to be

separable.) Notice that the task of ﬁnding irreducible factors (that is factors

with no further factorisation) is a ﬁnite one.
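For n = 7 that finite task gives the standard factorisation X^7 + 1 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3) over F_2, a well-known fact which the sketch below verifies by multiplying out; it then builds the cyclic code generated by g(X) = 1 + X + X^3 (an illustrative aside, not part of the notes):

```python
def pmul(a, b):
    """Multiply two polynomials over F_2, given as coefficient lists (constant first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out

f1 = [1, 1]                # 1 + X
f2 = [1, 1, 0, 1]          # 1 + X + X^3
f3 = [1, 0, 1, 1]          # 1 + X^2 + X^3
print(pmul(pmul(f1, f2), f3))    # [1, 0, 0, 0, 0, 0, 0, 1], i.e. 1 + X^7

# The cyclic code of length 7 generated by g = f2 is spanned by g, Xg, X^2 g, X^3 g.
def times_X(a):                  # multiply by X modulo X^7 - 1
    return [a[-1]] + a[:-1]

rows, r = [], f2 + [0, 0, 0]     # pad g to length 7
for _ in range(4):
    rows.append(r)
    r = times_X(r)

code = {(0,) * 7}
for row in rows:                 # grow the F_2-span one generator at a time
    code |= {tuple(c[i] ^ row[i] for i in range(7)) for c in code}

print(len(code))                                           # 16: rank 4 = 7 - deg g
print(all(tuple(times_X(list(c))) in code for c in code))  # True: closed under cyclic shift
```

The resulting code has 16 words, minimum distance 3, and is (up to permutation of coordinates) the Hamming (7,4) code again.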


Lemma 15.7. Consider codes of length n. Suppose that g(X)h(X) = X^n − 1. Then g is a generator of a cyclic code C, and h is a generator for a cyclic code which is the reverse of C^⊥.

As an immediate corollary, we have the following remark.

Lemma 15.8. The dual of a cyclic code is itself cyclic.

Lemma 15.9. If a cyclic code C of length n has generator g of degree n − r, then g(X), Xg(X), . . . , X^{r−1}g(X) form a basis for C.

Cyclic codes are thus easy to specify (we just need to write down the generator polynomial g) and to encode.

We know that X^n + 1 factorises completely over some larger finite field and, since n is odd, we know, by Lemma 14.12, that it has no repeated factors. The same is therefore true for any polynomial dividing it.

Lemma 15.10. Suppose that g is a generator of a cyclic code C of odd length n. Suppose further that g factorises completely into linear factors in some field K containing F_2. If g = g_1 g_2 ... g_k with each g_j irreducible over F_2, and A is a subset of the set of all the roots of all the g_j containing at least one root of each g_j [1 ≤ j ≤ k], then

   C = {f ∈ F_2[X] : f(α) = 0 for all α ∈ A}.

Definition 15.11. A defining set for a cyclic code C is a set A of elements in some field K containing F_2 such that f ∈ F_2[X] belongs to C if and only if f(α) = 0 for all α ∈ A.

(Note that, if C has length n, A must be a set of zeros of X^n − 1.)

Lemma 15.12. Suppose that

   A = {α_1, α_2, ..., α_r}

is a defining set for a cyclic code C in some field K containing F_2. Let B be the n × r matrix over K whose jth column is

   (1, α_j, α_j^2, ..., α_j^{n−1})^T.

Then a vector a ∈ F_2^n is a code word in C if and only if

   aB = 0

in K.

The columns in B are not parity checks in the usual sense, since the code entries lie in F_2 and the computations take place in the larger field K.

With this background we can discuss a famous family of codes known as the BCH (Bose, Ray-Chaudhuri, Hocquenghem) codes. Recall that a primitive nth root of unity is a root α of X^n − 1 = 0 such that every root is a power of α.

Definition 15.13. Suppose that n is odd and K is a field containing F_2 in which X^n − 1 factorises into linear factors. Suppose that α ∈ K is a primitive nth root of unity. A cyclic code C with defining set

   A = {α, α^2, ..., α^{δ−1}}

is a BCH code of design distance δ.

Note that the rank of C will be n − k, where k is the degree of the product of those irreducible factors of X^n − 1 over F_2 which have a zero in A. Notice also that k may be very much larger than δ.

Example 15.14. (i) If K is a field containing F_2, then (a + b)^2 = a^2 + b^2 for all a, b ∈ K.

(ii) If P ∈ F_2[X] and K is a field containing F_2, then P(a)^2 = P(a^2) for all a ∈ K.

(iii) Let K be a field containing F_2 in which X^7 − 1 factorises into linear factors. If β is a root of X^3 + X + 1 in K, then β is a primitive root of unity and β^2 is also a root of X^3 + X + 1.

(iv) We continue with the notation of (iii). The BCH code with {β, β^2} as defining set is Hamming's original (7,4) code.
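Parts (iii) can be checked by direct computation in the field K = F_2[X]/(X^3 + X + 1) of order 8. A minimal sketch (the 3-bit integer representation of field elements is an implementation choice, not anything fixed by the text):

```python
# Sketch: arithmetic in K = F_2[X]/(X^3 + X + 1). Elements are 3-bit
# integers, bit j holding the coefficient of X^j; MOD encodes the modulus.
MOD = 0b1011  # X^3 + X + 1

def gf8_mul(a, b):
    prod = 0
    for i in range(3):            # carry-less multiplication
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(4, 2, -1):     # reduce degrees 4 and 3
        if (prod >> i) & 1:
            prod ^= MOD << (i - 3)
    return prod

def gf8_pow(a, m):
    r = 1
    for _ in range(m):
        r = gf8_mul(r, a)
    return r

beta = 0b010                      # the class of X, a root of X^3 + X + 1

def is_root(a):
    """Does a satisfy a^3 + a + 1 = 0 in K?"""
    return gf8_pow(a, 3) ^ a ^ 1 == 0
```

One checks that is_root(beta) and is_root(gf8_mul(beta, beta)) both hold, and that beta, beta^2, ..., beta^7 run through all seven non-zero elements of K with beta^7 = 1, so beta is indeed a primitive 7th root of unity.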

The next theorem contains the key fact about BCH codes.

Theorem 15.15. The minimum distance for a BCH code is at least as great as the design distance.

Our proof of Theorem 15.15 relies on showing that the matrix B of Lemma 15.12 is of full rank for a BCH code. To do this we use a result which every undergraduate knew in 1950.

Lemma 15.16. [The van der Monde determinant] We work over a field K. The determinant

   | 1          1          1          ...  1          |
   | x_1        x_2        x_3        ...  x_n        |
   | x_1^2      x_2^2      x_3^2      ...  x_n^2      |
   | .          .          .               .          |
   | x_1^{n−1}  x_2^{n−1}  x_3^{n−1}  ...  x_n^{n−1}  |

   = ∏_{1≤j<i≤n} (x_i − x_j).
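The identity is easy to test numerically for small n. A sketch using exact integer arithmetic, with the determinant expanded from the Leibniz formula (the sample points are arbitrary illustrative values):

```python
# Sketch: check det(x_j^{i-1}) = prod_{j<i} (x_i - x_j) exactly, small n.
from itertools import permutations

def det(matrix):
    """Determinant via the Leibniz formula (fine for small n)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        # parity of the permutation via its inversion count
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        term = 1
        for row in range(n):
            term *= matrix[row][perm[row]]
        total += (-1) ** inv * term
    return total

def vandermonde(xs):
    """Row i holds x_1^i, ..., x_n^i, as in Lemma 15.16."""
    return [[x ** i for x in xs] for i in range(len(xs))]

def product_formula(xs):
    out = 1
    for j in range(len(xs)):
        for i in range(j + 1, len(xs)):
            out *= xs[i] - xs[j]
    return out
```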

How can we construct a decoder for a BCH code? From now on, until the end of this section, we shall suppose that we are using the BCH code C described in Definition 15.13. In particular, C will have length n and defining set

   A = {α, α^2, ..., α^{δ−1}}

where α is a primitive nth root of unity in K. Let t be the largest integer with 2t + 1 ≤ δ. We show how we can correct up to t errors.

Suppose that a codeword c = (c_0, c_1, ..., c_{n−1}) is transmitted and that the string received is r. We write e = r − c and assume that

   E = {0 ≤ j ≤ n − 1 : e_j ≠ 0}

has no more than t members. In other words, e is the error vector and we assume that there are no more than t errors. We write

   c(X) = Σ_{j=0}^{n−1} c_j X^j,   r(X) = Σ_{j=0}^{n−1} r_j X^j,   e(X) = Σ_{j=0}^{n−1} e_j X^j.

Definition 15.17. The error locator polynomial is

   σ(X) = ∏_{j∈E} (1 − α^j X)

and the error co-locator is

   ω(X) = Σ_{i=0}^{n−1} e_i α^i ∏_{j∈E, j≠i} (1 − α^j X).

Informally, we write

   ω(X) = Σ_{i=0}^{n−1} e_i α^i σ(X)/(1 − α^i X).

We take ω(X) = Σ_j ω_j X^j and σ(X) = Σ_j σ_j X^j. Note that ω has degree at most t − 1 and σ degree at most t. Note that we know that σ_0 = 1, so both the polynomials ω and σ have t unknown coefficients.

Lemma 15.18. If the error locator polynomial σ is given, the value of e, and so of c, can be obtained directly.

We wish to make use of relations of the form

   1/(1 − α^j X) = Σ_{r=0}^{∞} (α^j X)^r.

Unfortunately, it is not clear what meaning to assign to such a relation. One way round is to work modulo Z^{2t} (more formally, to work in K[Z]/(Z^{2t})). We then have Z^u ≡ 0 for all integers u ≥ 2t.

Lemma 15.19. If we work modulo Z^{2t}, then

   (1 − α^j Z) Σ_{m=0}^{2t−1} (α^j Z)^m ≡ 1.

Thus, if we work modulo Z^{2t}, as we shall from now on, we may define

   1/(1 − α^j Z) = Σ_{m=0}^{2t−1} (α^j Z)^m.

Lemma 15.20. With the conventions already introduced:

(i) ω(Z)/σ(Z) ≡ Σ_{m=0}^{2t−1} Z^m e(α^{m+1}).

(ii) e(α^{m+1}) = r(α^{m+1}) for all 0 ≤ m ≤ 2t − 1.

(iii) ω(Z)/σ(Z) ≡ Σ_{m=0}^{2t−1} Z^m r(α^{m+1}).

(iv) ω(Z) ≡ (Σ_{m=0}^{2t−1} Z^m r(α^{m+1})) σ(Z).

(v) ω_j = Σ_{u+v=j} r(α^{u+1}) σ_v for all 0 ≤ j ≤ t − 1.

(vi) 0 = Σ_{u+v=j} r(α^{u+1}) σ_v for all t ≤ j ≤ 2t − 1.

(vii) The conditions in (vi) determine σ completely.

Part (vi) of Lemma 15.20 completes our search for a decoding method, since σ determines E, E determines e and e determines c. It is worth noting that the system of equations in part (v) suffices to determine the pair σ and ω directly.
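For the smallest case t = 1 the recipe collapses pleasantly: σ(X) = 1 − α^{j_0}X and the single syndrome r(α) = e(α) = α^{j_0} points straight at the error position j_0. A sketch for the (7,4) code of Example 15.14 (iv); the bit-vector representation of F_8 and the test codeword are illustrative choices:

```python
# Sketch: one-error decoding of the cyclic (7,4) code with defining set
# {alpha, alpha^2}, alpha a root of X^3 + X + 1 in F_8. With t = 1,
# sigma(X) = 1 - alpha^{j0} X and r(alpha) = alpha^{j0} locates the error.
MOD = 0b1011  # X^3 + X + 1 over F_2

def gf8_mul(a, b):
    prod = 0
    for i in range(3):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(4, 2, -1):
        if (prod >> i) & 1:
            prod ^= MOD << (i - 3)
    return prod

ALPHA = 0b010
ALPHA_POW = [1]
for _ in range(6):
    ALPHA_POW.append(gf8_mul(ALPHA_POW[-1], ALPHA))  # alpha^0 .. alpha^6

def evaluate(bits, point):
    """Evaluate sum_j bits[j] X^j at X = point, in F_8."""
    acc, p = 0, 1
    for bit in bits:
        if bit:
            acc ^= p
        p = gf8_mul(p, point)
    return acc

def correct_single_error(received):
    s = evaluate(received, ALPHA)        # syndrome r(alpha)
    if s == 0:
        return list(received)            # no error detected
    j0 = ALPHA_POW.index(s)              # alpha^{j0} = r(alpha)
    fixed = list(received)
    fixed[j0] ^= 1
    return fixed
```

With the codeword (1,1,0,1,0,0,0) (the generator 1 + X + X^3 itself), flipping any one bit and decoding recovers the original.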

Compact disc players use BCH codes. Of course, errors are likely to occur in bursts (corresponding to scratches etc.) and this is dealt with by distributing the bits (digits) in a single codeword over a much longer stretch of track. The code used can correct a burst of 4000 consecutive errors (2.5 mm of track).

Unfortunately, none of the codes we have considered work anywhere near the Shannon bound (see Theorem 10.1). We might suspect that this is because they are linear, but Elias has shown that this is not the case. (We just state the result without proof.)

Theorem 15.21. In Theorem 10.1 we can replace ‘code’ by ‘linear code’.

The advance of computational power and the ingenuity of the discoverers[31] have led to new codes which appear to come close to the Shannon bounds. But that is another story.

Just as pure algebra has contributed greatly to the study of error correcting codes, so the study of error correcting codes has contributed greatly to the study of pure algebra. The story of one such contribution is set out in T. M. Thompson's From Error-correcting Codes through Sphere Packings to Simple Groups [9] — a good, not too mathematical, account of the discovery of the last sporadic simple groups by Conway and others.

16 Shift registers

In this section we move towards cryptography, but the topic discussed will

turn out to have connections with the decoding of BCH codes as well.

Definition 16.1. A general feedback shift register is a map f : F_2^d → F_2^d given by

   f(x_0, x_1, ..., x_{d−2}, x_{d−1}) = (x_1, x_2, ..., x_{d−1}, C(x_0, x_1, ..., x_{d−2}, x_{d−1}))

with C a map C : F_2^d → F_2. The stream associated to an initial fill (y_0, y_1, ..., y_{d−1}) is the sequence

   y_0, y_1, ..., y_j, y_{j+1}, ...   with   y_n = C(y_{n−d}, y_{n−d+1}, ..., y_{n−1}) for all n ≥ d.
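A minimal sketch of Definition 16.1 in Python, with the feedback map C passed in as a function (the example register at the end is an arbitrary illustration):

```python
# Sketch: the stream of a general feedback shift register of length d.
# y_n = C(y_{n-d}, ..., y_{n-1}) for n >= d; earlier terms are the fill.

def fsr_stream(initial_fill, C, length):
    d = len(initial_fill)
    ys = list(initial_fill)
    while len(ys) < length:
        ys.append(C(*ys[-d:]) % 2)
    return ys[:length]

# Illustration: the linear register y_n = y_{n-3} + y_{n-1}.
stream = fsr_stream([1, 0, 0], lambda x0, x1, x2: x0 + x2, 10)
```

The output of such a register is necessarily eventually periodic, since there are only finitely many possible fills.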

Example 16.2. If the general feedback shift register f given in Definition 16.1 is a permutation, then C is linear in the first variable, i.e.

   C(x_0, x_1, ..., x_{d−2}, x_{d−1}) = x_0 + C′(x_1, x_2, ..., x_{d−2}, x_{d−1}).

[31] People like David MacKay, now better known for his superb ‘Sustainable Energy Without the Hot Air’ — rush out and read it.

Definition 16.3. We say that the function f of Definition 16.1 is a linear feedback register if

   C(x_0, x_1, ..., x_{d−1}) = a_0 x_0 + a_1 x_1 + ... + a_{d−1} x_{d−1},

with a_0 = 1.

Exercise 16.4. Discuss briefly the effect of omitting the condition a_0 = 1 from Definition 16.3.

The discussion of the linear recurrence

   x_n = a_0 x_{n−d} + a_1 x_{n−d+1} + ... + a_{d−1} x_{n−1}

over F_2 follows the IA discussion of the same problem over R, but is complicated by the fact that

   n^2 = n

in F_2. We assume that a_0 ≠ 0 and consider the auxiliary polynomial

   C(X) = X^d − a_{d−1} X^{d−1} − ... − a_1 X − a_0.

In the exercise below, the binomial coefficient (n choose v) is to be interpreted as the appropriate polynomial in n.

Exercise 16.5. Consider the linear recurrence

   x_n = a_0 x_{n−d} + a_1 x_{n−d+1} + ... + a_{d−1} x_{n−1}   (⋆)

with a_j ∈ F_2 and a_0 ≠ 0.

(i) Suppose K is a field containing F_2 such that the auxiliary polynomial C has a root α in K. Show that x_n = α^n is a solution of ⋆ in K.

(ii) Suppose K is a field containing F_2 such that the auxiliary polynomial C has d distinct roots α_1, α_2, ..., α_d in K. Show that the general solution of ⋆ in K is

   x_n = Σ_{j=1}^{d} b_j α_j^n

for some b_j ∈ K. If x_0, x_1, ..., x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.

(iii) Work out the first few lines of Pascal's triangle modulo 2. Show that the functions f_j : Z → F_2 given by

   f_j(n) = (n choose j)

are linearly independent in the sense that

   Σ_{j=0}^{m} b_j f_j(n) = 0

for all n implies b_j = 0 for 0 ≤ j ≤ m.

(iv) Suppose K is a field containing F_2 such that the auxiliary polynomial C factorises completely into linear factors. If the root α_u has multiplicity m(u) [1 ≤ u ≤ q], show that the general solution of ⋆ in K is

   x_n = Σ_{u=1}^{q} Σ_{v=0}^{m(u)−1} b_{u,v} (n choose v) α_u^n

for some b_{u,v} ∈ K. If x_0, x_1, ..., x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.

A strong link with the problem of BCH decoding is provided by Theorem 16.7 below.

Definition 16.6. If we have a sequence (or stream) x_0, x_1, x_2, ... of elements of F_2, then its generating function G is given by

   G(Z) = Σ_{j=0}^{∞} x_j Z^j.

Theorem 16.7. The stream (x_n) comes from a linear feedback generator with auxiliary polynomial C if and only if the generating function for the stream is (formally) of the form

   G(Z) = B(Z)/C(Z)

with B a polynomial of degree strictly smaller than that of C.

If we can recover C from G, then we have recovered the linear feedback generator from the stream.

The link with BCH codes is established by looking at Lemma 15.20 (iii) and making the following remark.

Lemma 16.8. If a stream (x_n) comes from a linear feedback generator with auxiliary polynomial C of degree d, then C is determined by the condition

   G(Z)C(Z) ≡ B(Z)   mod Z^{2d}

with B a polynomial of degree at most d − 1.

We thus have the following problem.

Problem. Given a generating function G for a stream, and knowing that

   G(Z) = B(Z)/C(Z)

with B a polynomial of degree less than that of C and the constant term in C equal to c_0 = 1, recover C.

The Berlekamp–Massey method. In this method we do not assume that the degree d of C is known. The Berlekamp–Massey solution to this problem is based on the observation that, since

   Σ_{j=0}^{d} c_j x_{n−j} = 0

(with c_0 = 1) for all n ≥ d, we have

   ( x_d      x_{d−1}   ...  x_1      x_0 ) ( 1   )   ( 0 )
   ( x_{d+1}  x_d       ...  x_2      x_1 ) ( c_1 )   ( 0 )
   ( ...                                  ) ( ... ) = ( ... )
   ( x_{2d}   x_{2d−1}  ...  x_{d+1}  x_d ) ( c_d )   ( 0 )   (⋆)

The Berlekamp–Massey method tells us to look successively at the matrices

   A_1 = ( x_0 ),   A_2 = ( x_1  x_0 ),   A_3 = ( x_2  x_1  x_0 ),   ...
                          ( x_2  x_1 )          ( x_3  x_2  x_1 )
                                                ( x_4  x_3  x_2 )

starting at A_r if it is known that r ≥ d. For each A_j we evaluate det A_j. If det A_j ≠ 0, then j − 1 ≠ d. If det A_j = 0, then j − 1 is a good candidate for d, so we solve ⋆ on the assumption that d = j − 1. (Note that a one-dimensional subspace of F_2^{d+1} contains only one non-zero vector.) We then check our candidate for (c_0, c_1, ..., c_d) over as many terms of the stream as we wish. If it fails the test, we then know that d ≥ j and we start again[32].

As we have stated it, the Berlekamp–Massey method is not an algorithm in the strict sense of the term, although it becomes one if we put an upper bound on the possible values of d. (A little thought shows that, if no upper bound is put on d, no algorithm is possible because, with a suitable initial stream, a linear feedback register with large d can be made to produce a stream whose initial values would be produced by a linear feedback register with much smaller d. For the same reason the Berlekamp–Massey method will produce the B of smallest degree which gives G and not necessarily the original B.) In practice, however, the Berlekamp–Massey method is very effective in cases when d is unknown.

[32] Note that, over F_2, det A_j can only take two values, so there will be many false alarms. Note also that the determinant may be evaluated much faster using reduction to (rearranged) triangular form than by Cramer's rule, and that, once the system is in (rearranged) triangular form, it is easy to solve the associated equations.

By careful arrangement of the work it is possible to cut down considerably

on the labour involved.
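A naive sketch of the search described above: for successive trial degrees d we solve the system ⋆ by Gaussian elimination over F_2 and keep the smallest d whose recurrence fits the whole given stream. (This captures the idea only, not the optimised bookkeeping of the real Berlekamp–Massey algorithm.)

```python
# Sketch: recover the taps (c_1, ..., c_d) of a linear feedback register
# from its output stream by trying d = 1, 2, ... and solving mod 2.

def solve_gf2(rows, rhs):
    """Gaussian elimination mod 2; returns one solution or None."""
    m, n = len(rows), len(rows[0])
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    pivot_cols, r = [], 0
    for col in range(n):
        piv = next((i for i in range(r, m) if aug[i][col]), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(m):
            if i != r and aug[i][col]:
                aug[i] = [a ^ b for a, b in zip(aug[i], aug[r])]
        pivot_cols.append(col)
        r += 1
    if any(row[-1] for row in aug[r:]):
        return None                    # inconsistent system
    sol = [0] * n
    for i, col in enumerate(pivot_cols):
        sol[col] = aug[i][-1]
    return sol

def find_register(stream, max_d):
    """Smallest d and taps with x_n = sum_j c_j x_{n-j} (mod 2)."""
    for d in range(1, max_d + 1):
        if len(stream) <= 2 * d:
            break                      # not enough equations left
        rows = [[stream[n - j] for j in range(1, d + 1)]
                for n in range(d, len(stream))]
        rhs = [stream[n] for n in range(d, len(stream))]
        c = solve_gf2(rows, rhs)
        if c is not None:
            return d, c
    return None
```

For instance, fourteen terms of the stream generated by x_n = x_{n−1} + x_{n−3} suffice to recover d = 3 and the taps (1, 0, 1).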

The solution of linear equations gives us a method of ‘secret sharing’.

Problem 16.9. It is not generally known that CMS when reversed forms the initials of ‘Secret Missile Command’. If the University is attacked by HEFCE[33], the Faculty Board will retreat to a bunker known as Meeting Room 23. Entry to the room involves tapping out a positive integer S (the secret) known only to the Chairman of the Faculty Board. Each of the n members of the Faculty Board knows a certain pair of numbers (their shadow) and it is required that, in the absence of the Chairman, any k members of the Faculty can reconstruct S from their shadows, but no k − 1 members can do so. How can this be done?

Here is one neat solution. Suppose S must lie between 0 and N (it is sensible to choose S at random). The Chairman chooses a prime p > N, n. She then chooses integers a_1, a_2, ..., a_{k−1} at random and distinct integers x_1, x_2, ..., x_n at random, subject to 0 ≤ a_j ≤ p − 1 and 1 ≤ x_j ≤ p − 1, sets a_0 = S and computes

   P(r) ≡ a_0 + a_1 x_r + a_2 x_r^2 + ... + a_{k−1} x_r^{k−1}   mod p,

choosing 0 ≤ P(r) ≤ p − 1. She then gives the rth member of the Faculty Board the pair of numbers (x_r, P(r)) (the shadow pair, to be kept secret from everybody else) and tells everybody the value of p. She then burns her calculations.

Suppose that k members of the Faculty Board with shadow pairs

   (y_j, Q(j)) = (x_{r_j}, P(r_j))

[1 ≤ j ≤ k] are together. By the properties of the van der Monde determinant (see Lemma 15.16),

   | 1  y_1  y_1^2  ...  y_1^{k−1} |
   | 1  y_2  y_2^2  ...  y_2^{k−1} |
   | 1  y_3  y_3^2  ...  y_3^{k−1} |
   | .  .    .           .         |
   | 1  y_k  y_k^2  ...  y_k^{k−1} |

   ≡ ∏_{1≤j<i≤k} (y_i − y_j) ≢ 0   mod p,

since transposing gives the determinant of Lemma 15.16 and the y_i are distinct.

[33] An institution like SPECTRE but without the charm.

Thus the system of equations

   z_0 + y_1 z_1 + y_1^2 z_2 + ... + y_1^{k−1} z_{k−1} ≡ Q(1)
   z_0 + y_2 z_1 + y_2^2 z_2 + ... + y_2^{k−1} z_{k−1} ≡ Q(2)
   z_0 + y_3 z_1 + y_3^2 z_2 + ... + y_3^{k−1} z_{k−1} ≡ Q(3)
   ...
   z_0 + y_k z_1 + y_k^2 z_2 + ... + y_k^{k−1} z_{k−1} ≡ Q(k)

(mod p) has a unique solution z. But we know that a is a solution, so z = a and the secret S = z_0.

On the other hand,

   | y_1      y_1^2      ...  y_1^{k−1}     |
   | y_2      y_2^2      ...  y_2^{k−1}     |
   | y_3      y_3^2      ...  y_3^{k−1}     |
   | .        .               .             |
   | y_{k−1}  y_{k−1}^2  ...  y_{k−1}^{k−1} |

   ≡ y_1 y_2 ... y_{k−1} ∏_{1≤j<i≤k−1} (y_i − y_j) ≢ 0   mod p,

so the system of equations

   z_0 + y_1 z_1 + y_1^2 z_2 + ... + y_1^{k−1} z_{k−1} ≡ Q(1)
   z_0 + y_2 z_1 + y_2^2 z_2 + ... + y_2^{k−1} z_{k−1} ≡ Q(2)
   z_0 + y_3 z_1 + y_3^2 z_2 + ... + y_3^{k−1} z_{k−1} ≡ Q(3)
   ...
   z_0 + y_{k−1} z_1 + y_{k−1}^2 z_2 + ... + y_{k−1}^{k−1} z_{k−1} ≡ Q(k−1)

(mod p) has a solution whatever value of z_0 we take, so k − 1 members of the Faculty Board have no way of saying that any possible value of S is more likely than any other.
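The whole scheme sketches naturally in a few lines. Solving the linear system above is equivalent to Lagrange interpolation of the polynomial at 0, which is the form used below; the prime and all parameters are illustrative choices, not fixed by the text:

```python
# Sketch: the secret sharing scheme just described, mod a prime p.
import random

def make_shadows(secret, n, k, p):
    """Choose a_1..a_{k-1} and distinct x_r at random; shadow = (x_r, P(r))."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]
    xs = random.sample(range(1, p), n)
    return [(x, sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p)
            for x in xs]

def recover_secret(shadows, p):
    """Lagrange interpolation at 0: S = sum_j Q(j) prod_{i!=j} y_i/(y_i - y_j),
    all computed mod the prime p (division via Fermat's little theorem)."""
    total = 0
    for j, (yj, qj) in enumerate(shadows):
        num = den = 1
        for i, (yi, _) in enumerate(shadows):
            if i != j:
                num = num * yi % p
                den = den * (yi - yj) % p
        total = (total + qj * num * pow(den, p - 2, p)) % p
    return total
```

Any k of the n shadows recover the secret exactly; the k − 1 member failure mode is the non-uniqueness argued above.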

One way of looking at this method of ‘secret sharing’ is to note that a polynomial of degree k − 1 can be recovered from its value at k points but not from its value at k − 1 points. However, the proof that the method works needs to be substantially more careful.

Exercise 16.10. Is the secret compromised if the values of the x_j become known?

17 A short homily on cryptography

Cryptography is the science of code making. Cryptanalysis is the art of code

breaking.

Two thousand years ago, Lucretius wrote that ‘Only recently has the

true nature of things been discovered’. In the same way, mathematicians

are apt to feel that ‘Only recently has the true nature of cryptography been

discovered’. The new mathematical science of cryptography with its promise

of codes which are ‘provably hard to break’ seems to make everything that

has gone before irrelevant.

It should, however, be observed that the best cryptographic systems of our

ancestors (such as diplomatic ‘book codes’) served their purpose of ensuring

secrecy for a relatively small number of messages between a relatively small

number of people extremely well. It is the modern requirement for secrecy

on an industrial scale to cover endless streams of messages between many

centres which has made necessary the modern science of cryptography.

More pertinently, it should be remembered that the German Naval Enigma codes not only appeared to be ‘provably hard to break’ (though not against the modern criteria of what this should mean) but, considered in isolation, probably were unbreakable in practice[34]. Fortunately the Submarine codes formed part of an ‘Enigma system’ with certain exploitable weaknesses. (For an account of how these weaknesses arose and how they were exploited see Kahn's Seizing the Enigma [4].)

Even the best codes are like the lock on a safe. However good the lock is, the safe may be broken open by brute force, or stolen together with its contents, or a key holder may be persuaded by fraud or force to open the lock, or the presumed contents of the safe may have been tampered with before they go into the safe, or . . . . The coding schemes we shall consider are, at best, cryptographic elements of larger possible cryptographic systems. The planning of cryptographic systems requires not only mathematics but also engineering, economics, psychology, humility and an ability to learn from past mistakes. Those who do not learn the lessons of history are condemned to repeat them.

[34] Some versions remained unbroken until the end of the war.

In considering a cryptographic system, it is important to consider its purpose. Consider a message M sent by A to B. Here are some possible aims.

Secrecy. A and B can be sure that no third party X can read the message M.

Integrity. A and B can be sure that no third party X can alter the message M.

Authenticity. B can be sure that A sent the message M.

Non-repudiation. B can prove to a third party that A sent the message M.

When you fill out a cheque giving the sum both in numbers and words you are seeking to protect the integrity of the cheque. When you sign a traveller's cheque ‘in the presence of the paying officer’ the process is intended, from your point of view, to protect authenticity and, from the bank's point of view, to produce non-repudiation.

Another point to consider is the level of security aimed at. It hardly matters if a few people use forged tickets to travel on the underground; it does matter if a single unauthorised individual can gain privileged access to a bank's central computer system. If secrecy is aimed at, how long must the secret be kept? Some military and financial secrets need only remain secret for a few hours, others must remain secret for years.

We must also, to conclude this non-exhaustive list, consider the level of security required. Here are three possible levels.

(1) Prospective opponents should find it hard to compromise your system even if they are in possession of a plentiful supply of encoded messages C_i.

(2) Prospective opponents should find it hard to compromise your system even if they are in possession of a plentiful supply of pairs (M_i, C_i) of messages M_i together with their encodings C_i.

(3) Prospective opponents should find it hard to compromise your system even if they are allowed to produce messages M_i and given their encodings C_i.

Clearly, safety at level (3) implies safety at level (2) and safety at level (2) implies safety at level (1). Roughly speaking, the best Enigma codes satisfied (1). The German Navy believed on good but mistaken grounds that they satisfied (2). Level (3) would have appeared evidently impossible to attain until a few years ago. Nowadays, level (3) is considered a minimal requirement for a really secure system.

18 Stream ciphers

One natural way of enciphering is to use a stream cipher. We work with streams (that is, sequences) of elements of F_2. We use a cipher stream k_0, k_1, k_2, .... The plain text stream p_0, p_1, p_2, ... is enciphered as the cipher text stream z_0, z_1, z_2, ... given by

   z_n = p_n + k_n.

This is an example of a private key or symmetric system. The security of the system depends on a secret (in our case the cipher stream k) shared between the encipherer and the decipherer. Knowledge of an enciphering method makes it easy to work out a deciphering method and vice versa. In our case a deciphering method is given by the observation that

   p_n = z_n + k_n.

(Indeed, writing α(p) = p + k, we see that the enciphering function α has the property that α^2 = ι, the identity map. Ciphers like this are called symmetric.)
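The whole cipher is a couple of lines of code, and enciphering and deciphering are literally the same operation (the bit values below are illustrative):

```python
# Sketch: stream cipher z_n = p_n + k_n over F_2; applying the same
# key stream twice returns the plain text (alpha^2 = identity).

def apply_stream(bits, key_stream):
    return [b ^ k for b, k in zip(bits, key_stream)]

plain = [1, 0, 1, 1, 0, 0, 1]
key   = [0, 1, 1, 0, 1, 0, 1]   # the shared secret (illustrative values)
cipher = apply_stream(plain, key)
```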

In the one-time pad, first discussed by Vernam in 1926, the cipher stream is a random sequence k_j = K_j, where the K_j are independent random variables with

   Pr(K_j = 0) = Pr(K_j = 1) = 1/2.

If we write Z_j = p_j + K_j, then we see that the Z_j are independent random variables with

   Pr(Z_j = 0) = Pr(Z_j = 1) = 1/2.

Thus (in the absence of any knowledge of the ciphering stream) the code-breaker is just faced by a stream of perfectly random binary digits. Decipherment is impossible in principle.

It is sometimes said that it is hard to find random sequences, and it is, indeed, rather harder than might appear at first sight, but it is not too difficult to rig up a system for producing ‘sufficiently random’ sequences[35]. The secret services of the former Soviet Union were particularly fond of one-time pads. The real difficulty lies in the necessity for sharing the secret sequence k. If a random sequence is reused it ceases to be random (it becomes ‘the same code as last Wednesday’ or ‘the same code as Paris uses’) so, when there is a great deal of code traffic[36], new one-time pads must be sent out. If random bits can be safely communicated, so can ordinary messages and the exercise becomes pointless.

[35] Take ten of your favourite long books, convert them to binary sequences x_{j,n} and set k_n = Σ_{j=1}^{10} x_{j,1000+j+n} + s_n, where s_n is the output of your favourite ‘pseudo-random number generator’ (in this connection see Exercise 27.16). Give a memory stick with a copy of k to your friend and, provided both of you obey some elementary rules, your correspondence will be safe from MI5. The anguished debate in the US about codes and privacy refers to the privacy of large organisations and their clients, not the privacy of communication from individual to individual.

In practice, we would like to start from a short shared secret ‘seed’ and generate a ciphering string k that ‘behaves like a random sequence’. This leads us straight into deep philosophical waters[37]. As might be expected, there is an illuminating discussion in Chapter III of Knuth's marvellous The Art of Computer Programming [7]. Note, in particular, his warning:

   . . . random numbers should not be generated with a method chosen at random. Some theory should be used.

One way that we might try to generate our ciphering string is to use a general feedback shift register f of length d with the initial fill (k_0, k_1, ..., k_{d−1}) as the secret seed.

Lemma 18.1. If f is a general feedback shift register of length d, then, given any initial fill (k_0, k_1, ..., k_{d−1}), there will exist N, M ≤ 2^d such that the output stream k satisfies k_{r+N} = k_r for all r ≥ M.

Exercise 18.2. Show that the decimal expansion of a rational number must be a recurrent expansion. Give a bound for the period in terms of the quotient. Conversely, by considering geometric series, or otherwise, show that a recurrent decimal represents a rational number.

Lemma 18.3. Suppose that f is a linear feedback register of length d.

(i) f(x_0, x_1, ..., x_{d−1}) = (x_0, x_1, ..., x_{d−1}) if (x_0, x_1, ..., x_{d−1}) = (0, 0, ..., 0).

(ii) Given any initial fill (k_0, k_1, ..., k_{d−1}), there will exist N, M ≤ 2^d − 1 such that the output stream k satisfies k_{r+N} = k_r for all r ≥ M.

We can complement Lemma 18.3 by using Lemma 14.16 and the associated discussion.

[36] In 1941, the Soviet Union's need for one-time pads suddenly increased and it appears that pages were reused in different pads. If the reader reflects, she will see that, though this is a mistake, it is one which it is very difficult to exploit. However, under the pressure of the cold war, US code-breakers managed to decode messages which, although several years old, still provided useful information. After 1944, the Soviet Union's one-time pads became genuinely one-time again and the coded messages became indecipherable.

[37] Where we drown at once, since the best (at least, in my opinion) modern view is that any sequence that can be generated by a program of reasonable length from a ‘seed’ of reasonable size is automatically non-random.

Lemma 18.4. A linear feedback register of length d attains its maximal period 2^d − 1 (for a non-trivial initial fill) when the roots of the auxiliary polynomial[38] are primitive elements of F_{2^d}.

(We will note why this result is plausible, but we will not prove it. See Exercise 27.19 for a proof.)

It is well known that short period streams are dangerous. During World War II the British Navy used codes whose period was adequately long for peace time use. The massive increase in traffic required by war time conditions meant that the period was now too short. By dint of immense toil, German naval code breakers were able to identify coincidences and crack the British codes.

Unfortunately, whilst short periods are definitely unsafe, it does not follow that long periods guarantee safety. Using the Berlekamp–Massey method we see that stream codes based on linear feedback registers are unsafe at level (2).

Lemma 18.5. Suppose that an unknown cipher stream k_0, k_1, k_2, ... is produced by an unknown linear feedback register f of unknown length d ≤ D. The plain text stream p_0, p_1, p_2, ... is enciphered as the cipher text stream z_0, z_1, z_2, ... given by

   z_n = p_n + k_n.

If we are given p_0, p_1, ..., p_{2D−1} and z_0, z_1, ..., z_{2D−1}, then we can find k_r for all r.

Thus, if we have a message of length twice the length of the linear feedback register together with its encipherment, the code is broken.

It is easy to construct immensely complicated looking linear feedback

registers with hundreds of registers. Lemma 18.5 shows that, from the point

of view of a determined, well equipped and technically competent opponent,

cryptographic systems based on such registers are the equivalent of leaving

your house key hidden under the door mat. Professionals say that such

systems seek ‘security by obscurity’.

However, if you do not wish to baffle the CIA, but merely prevent little old ladies in tennis shoes watching subscription television without paying for it, systems based on linear feedback registers are cheap and quite effective. Whatever they may say in public, large companies are happy to tolerate a certain level of fraud. So long as 99.9% of the calls made are paid for, the profits of a telephone company are essentially unaffected by the 0.1% which ‘break the system’.

[38] In this sort of context we shall sometimes refer to the ‘auxiliary polynomial’ as the ‘feedback polynomial’.

What happens if we try some simple tricks to increase the complexity of

the cipher text stream?

Lemma 18.6. If x_n is a stream produced by a linear feedback system of length N with auxiliary polynomial P and y_n is a stream produced by a linear feedback system of length M with auxiliary polynomial Q, then x_n + y_n is a stream produced by a linear feedback system of length N + M with auxiliary polynomial P(X)Q(X).

Note that this means that adding streams from two linear feedback systems is no more economical than producing the same effect with one. Indeed the situation may be worse, since a stream produced by a linear feedback system of given length may, possibly, also be produced by another linear feedback system of shorter length.
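Lemma 18.6 can be checked numerically in a small case. Over F_2 the auxiliary polynomials X^3 + X^2 + 1 and X^2 + X + 1 (chosen here purely for illustration) have product X^5 + X + 1, so the summed stream should satisfy the length-5 recurrence s_n = s_{n−4} + s_{n−5}:

```python
# Sketch: check Lemma 18.6 for registers with auxiliary polynomials
# P(X) = X^3 + X^2 + 1 and Q(X) = X^2 + X + 1; P(X)Q(X) = X^5 + X + 1
# mod 2, giving the recurrence s_n = s_{n-4} + s_{n-5} for the sum.

def lfsr(fill, taps, length):
    """taps[j] multiplies x_{n-d+j}: x_n = sum_j taps[j] x_{n-d+j} mod 2."""
    xs = list(fill)
    d = len(fill)
    while len(xs) < length:
        xs.append(sum(t * x for t, x in zip(taps, xs[-d:])) % 2)
    return xs[:length]

x = lfsr([1, 0, 0], [1, 0, 1], 40)   # x_n = x_{n-3} + x_{n-1}
y = lfsr([1, 0], [1, 1], 40)         # y_n = y_{n-2} + y_{n-1}
s = [a ^ b for a, b in zip(x, y)]    # the summed stream
```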

Lemma 18.7. Suppose that x_n is a stream produced by a linear feedback system of length N with auxiliary polynomial P and y_n is a stream produced by a linear feedback system of length M with auxiliary polynomial Q. Let P have roots α_1, α_2, ..., α_N and Q have roots β_1, β_2, ..., β_M over some field K ⊇ F_2. Then x_n y_n is a stream produced by a linear feedback system of length NM with auxiliary polynomial

   ∏_{1≤i≤N} ∏_{1≤j≤M} (X − α_i β_j).

We shall probably only prove Lemmas 18.6 and 18.7 in the case when all roots are distinct, leaving the more general case as an easy exercise. We shall also not prove that the polynomial ∏_{1≤i≤N} ∏_{1≤j≤M} (X − α_i β_j) obtained in Lemma 18.7 actually lies in F_2[X], but (for those who are familiar with the phrase in quotes) this is an easy exercise in ‘symmetric functions of roots’.

Here is an even easier remark.

Lemma 18.8. Suppose that x_n is a stream which is periodic with period N and y_n is a stream which is periodic with period M. Then the streams x_n + y_n and x_n y_n are periodic with periods dividing the lowest common multiple of N and M.

Exercise 18.9. One of the most confidential German codes (called FISH by the British) involved a complex mechanism which the British found could be simulated by two loops of paper tape of length 1501 and 1497. If k_n = x_n + y_n, where x_n is a stream of period 1501 and y_n is a stream of period 1497, what is the longest possible period of k_n? How many consecutive values of k_n would you need to find the underlying linear feedback register using the Berlekamp–Massey method if you did not have the information given in the question? If you had all the information given in the question, how many values of k_n would you need? (Hint: look at x_{n+1497} − x_n.)

You have shown that, given k_n for sufficiently many consecutive n, we can find k_n for all n. Can you find x_n for all n?
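The first part of the exercise comes down to a lowest-common-multiple computation (Lemma 18.8). A quick sketch, using only the two periods given in the exercise:

```python
from math import gcd

N, M = 1501, 1497
lcm = N * M // gcd(N, M)   # longest possible period of k_n = x_n + y_n
print(lcm)                  # 2246997, since gcd(1501, 1497) = 1
```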

It might be thought that the lengthening of the underlying linear feedback system obtained in Lemma 18.7 is worth having, but it is bought at a substantial price. Let me illustrate this by an informal argument. Suppose we have 10 streams x_{j,n} (without any peculiar properties) produced by linear feedback registers of length about 100. If we form k_n = ∏_{j=1}^{10} x_{j,n}, then the Berlekamp–Massey method requires of the order of 10^20 consecutive values of k_n, and the periodicity of k_n can be made still more astronomical. Our cipher key stream k_n appears safe from prying eyes. However, it is doubtful if the prying eyes will mind. Observe that (under reasonable conditions) about 2^{−1} of the x_{j,n} will have the value 1 and about 2^{−10} of the k_n = ∏_{j=1}^{10} x_{j,n} will have value 1. Thus, if z_n = p_n + k_n, in more than 999 cases out of 1000 we will have z_n = p_n. Even if we just combine two streams x_n and y_n in the way suggested, we may expect x_n y_n = 0 for about 75% of the time.
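The bias is easy to see empirically. A minimal simulation, assuming (as the informal argument does) that the ten streams behave like independent fair bit streams:

```python
import random

random.seed(0)
trials = 100_000
ones = 0
for _ in range(trials):
    k = 1
    for _ in range(10):          # product of 10 independent fair bits
        k &= random.getrandbits(1)
    ones += k
print(ones / trials)             # close to 2**-10 ≈ 0.000977
```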

Here is another example where the apparent complexity of the cipher key stream is substantially greater than its true complexity.

Example 18.10. The following is a simplified version of a standard satellite TV decoder. We have 3 streams x_n, y_n, z_n produced by linear feedback registers. If the cipher key stream is defined by

k_n = x_n if z_n = 0,
k_n = y_n if z_n = 1,

then

k_n = (y_n + x_n)z_n + x_n

and the cipher key stream is one that can be produced by a linear feedback register.
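The identity can be checked exhaustively over F_2. A small sketch (^ for addition mod 2, & for multiplication):

```python
# Verify k = (y + x)z + x over F_2 agrees with "x if z = 0, y if z = 1".
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            mux = x if z == 0 else y
            formula = ((y ^ x) & z) ^ x
            assert mux == formula
print("identity holds in all 8 cases")
```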

We must not jump to the conclusion that the best way round these difficulties is to use a non-linear feedback generator f. This is not the easy way out that it appears. If chosen by an amateur, the complicated-looking f so produced will have the apparent advantage that we do not know what is wrong with it and the very real disadvantage that we do not know what is wrong with it.

Another approach is to observe that, so far as the potential code breaker is concerned, the cipher stream method only combines the ‘unknown secret’ (here the feedback generator f together with the seed (k_0, k_1, . . . , k_{d−1})) with the unknown message p in a rather simple way. It might be better to consider a system with two functions F : F_2^m × F_2^n → F_2^q and G : F_2^m × F_2^q → F_2^n such that

G(k, F(k, p)) = p.

Here k will be the shared secret, p the message, and z = F(k, p) the encoded message, which can be decoded by using the fact that G(k, z) = p.

In the next section we shall see that an even better arrangement is possible. However, arrangements like this have the disadvantage that the message p must be entirely known before it is transmitted and the encoded message z must have been entirely received before it can be decoded. Stream ciphers have the advantage that they can be decoded ‘on the fly’. They are also much more error tolerant: a mistake in the coding, transmission or decoding of a single element only produces an error in a single place of the sequence. There will continue to be circumstances where stream ciphers are appropriate.

There is one further remark to be made. Suppose, as is often the case, that we know F, that n = q and that we know the ‘encoded message’ z. Suppose also that we know that the ‘unknown secret’ or ‘key’ k ∈ K ⊆ F_2^m and the ‘unknown message’ p ∈ T ⊆ F_2^n. We are then faced with the problem: solve the system

z = F(k, p) where k ∈ K, p ∈ T.    ⋆

Speaking roughly, the task is hopeless unless ⋆ has a unique solution[39]. Speaking even more roughly, this is unlikely to happen if |K||T| > 2^n and is likely to happen if 2^n is substantially greater than |K||T|. (Here, as usual, |B| denotes the number of elements of B.)

Now recall the definition of the information rate given in Definition 6.2. If the message set T has information rate µ and the key set (that is, the shared secret set) K has information rate κ, then, taking logarithms, we see that, if

n − mκ − nµ

is substantially greater than 0, then ⋆ is likely to have a unique solution but, if it is substantially smaller, this is unlikely.

[39] ‘According to some, the primordial Torah was inscribed in black flames on white fire. At the moment of its creation, it appeared as a series of letters not yet joined up in the form of words. For this reason, in the Torah rolls there appear neither vowels nor punctuation, nor accents; for the original Torah was nothing but a disordered heap of letters. Furthermore, had it not been for Adam’s sin, these letters might have been joined differently to form another story. For the kabalist, God will abolish the present ordering of the letters, or else will teach us how to read them according to a new disposition only after the coming of the Messiah.’ ([1], Chapter 2.) A reader of this footnote has directed me to the International Torah Codes Society.

Example 18.11. Suppose that, instead of using binary code, we consider an alphabet of 27 letters (the English alphabet plus a space). We must take logarithms to the base 27, but the considerations above continue to apply. The English language treated in this way has information rate about .4. (This is very much a ball-park figure. The information rate is certainly less than .5 and almost certainly greater than .2.)

(i) In the Caesar code, we replace the ith element of our alphabet by the (i + j)th (modulo 27). The shared secret is a single letter (the code for A, say). We have m = 1, κ = 1 and µ ≈ .4. Thus

n − mκ − nµ ≈ .6n − 1.

If n = 1 (so n − mκ − nµ ≈ −.4) it is obviously impossible to decode the message. If n = 10 (so n − mκ − nµ ≈ 5) a simple search through the 27 possibilities will almost always give a single possible decode.

(ii) In a simple substitution code, a permutation of the alphabet is chosen and applied to each letter of the code in turn. The shared secret is a sequence of 26 letters (given the coding of the first 26 letters, the 27th can then be deduced). We have m = 26, κ = 1 and µ ≈ .4. Thus

n − mκ − nµ ≈ .6n − 26.

In The Dancing Men, Sherlock Holmes solves such a code with n = 68 (so n − mκ − nµ ≈ 15) without straining the reader’s credulity too much, and I would think that, unless the message is very carefully chosen, most of my audience could solve such a code with n = 200 (so n − mκ − nµ ≈ 100).

(iii) In the one-time pad, m = n and κ = 1, so (if µ > 0)

n − mκ − nµ = −nµ → −∞

as n → ∞.

(iv) Note that the larger µ is, the slower n − mκ − nµ increases. This corresponds to the very general statement that the higher the information rate of the messages, the harder it is to break the code in which they are sent.
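The quantity n − mκ − nµ in the examples above can be tabulated directly. A tiny helper (the values µ ≈ .4 and κ = 1 are those used in the example):

```python
def margin(n, m, kappa, mu):
    """The rough 'unique decode' margin n - m*kappa - n*mu from the text."""
    return n - m * kappa - n * mu

print(margin(10, 1, 1, 0.4))             # Caesar code, n = 10: 5.0
print(round(margin(68, 26, 1, 0.4), 1))  # substitution, n = 68: 14.8
```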

The ideas just introduced can be formalised by the notion of unicity distance.

Definition 18.12. The unicity distance of a code is the number of bits of message required to exceed the number of bits of information in the key plus the number of bits of information in the message.

(The notion of information content brings us back to Shannon, whose paper Communication theory of secrecy systems[40], published in 1949, forms the first modern treatment of cryptography in the open literature.)

If we only use our code once to send a message which is substantially shorter than the unicity distance, we can be confident that no code breaker, however gifted, could break it, simply because there is no unambiguous decode. (A one-time pad has unicity distance infinity.) However, the fact that there is a unique solution to a problem does not mean that it is easy to find. We have excellent reasons, some of which are spelled out in the next section, to believe that there exist codes for which the unicity distance is essentially irrelevant to the maximum safe length of a message.

19 Asymmetric systems

Towards the end of the previous section, we discussed a general coding scheme depending on a shared secret key k known to the encoder and the decoder. The scheme can be generalised still further by splitting the secret in two. Consider a system with two functions F : F_2^m × F_2^n → F_2^q and G : F_2^p × F_2^q → F_2^n such that

G(l, F(k, p)) = p.

Here (k, l) will be a pair of secrets, p the message and z = F(k, p) the encoded message, which can be decoded by using the fact that G(l, z) = p. In this scheme, the encoder must know k, but need not know l, and the decoder must know l, but need not know k. Such a system is called asymmetric.

So far the idea is interesting but not exciting. Suppose, however, that we can show that
(i) knowing F, G and k, it is very hard to find l;
(ii) if we do not know l then, even if we know F, G and k, it is very hard to find p from F(k, p).
Then the code is secure at what we called level (3).

Lemma 19.1. Suppose that the conditions specified above hold. Then an opponent who is entitled to demand the encodings z_i of any messages p_i they choose to specify will still find it very hard to find p when given F(k, p).

Let us write F(k, p) = pK_A and G(l, z) = zK_A^{−1}, and think of pK_A as participant A’s encipherment of p and zK_A^{−1} as participant B’s decipherment of z. We then have

(pK_A)K_A^{−1} = p.

[40] Available on the web and in his Collected Papers.

Lemma 19.1 tells us that such a system is secure however many messages are sent. Moreover, if we think of A as a spy-master, he can broadcast K_A to the world (that is why such systems are called public key systems) and invite anybody who wants to spy for him to send him secret messages in total confidence[41].

It is all very well to describe such codes, but do they exist? There is very strong evidence that they do but, so far, all mathematicians have been able to do is to show that, provided certain mathematical problems which are believed to be hard are indeed hard, then good codes exist.

The following problem is believed to be hard.

Problem. Given an integer N, which is known to be the product N = pq of two primes p and q, find p and q.

Several schemes have been proposed based on the assumption that this factorisation is hard. (Note, however, that it is easy to find large ‘random’ primes p and q.) We give a very elegant scheme due to Rabin and Williams. It makes use of some simple number theoretic results from IA and IB.

The reader may well have seen the following results before. In any case, they are easy to obtain by considering primitive roots.

Lemma 19.2. If p is an odd prime, the congruence

x^2 ≡ d (mod p)

is soluble if and only if d ≡ 0 or d^{(p−1)/2} ≡ 1 (mod p).

Lemma 19.3. Suppose p is a prime such that p = 4k − 1 for some integer k. Then, if the congruence

x^2 ≡ d (mod p)

has any solution, it has d^k as a solution.
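Lemma 19.3 is easy to check numerically. A sketch with the toy prime p = 23 = 4·6 − 1 (my choice, purely for illustration):

```python
p = 23                      # p = 4k - 1 with k = 6
k = (p + 1) // 4
for x in range(1, p):
    d = (x * x) % p         # d is a quadratic residue by construction
    r = pow(d, k, p)        # the candidate root d^k of Lemma 19.3
    assert (r * r) % p == d
print("d^k is a square root of d mod", p, "for every residue d")
```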

We now call on the Chinese remainder theorem.

Lemma 19.4. Let p and q be primes of the form 4k − 1 and set N = pq. Then the following two problems are of equivalent difficulty.
(A) Given N and d, find all the m satisfying

m^2 ≡ d (mod N).

(B) Given N, find p and q.

(Note that, provided that d ≢ 0, knowing the solutions to (A) for any d gives us the four solutions for the case d = 1.) The result is also true, but much harder to prove, for general primes p and q.

At the risk of giving aid and comfort to followers of the Lakatosian heresy, it must be admitted that the statement of Lemma 19.4 does not really tell us what the result we are proving is, although the proof makes it clear that the result (whatever it may be) is certainly true. However, with more work, everything can be made precise.

[41] Although we make statements about certain codes along the lines of ‘It does not matter who knows this’, you should remember the German naval saying ‘All radio traffic is high treason’. If any aspect of a code can be kept secret, it should be kept secret.
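To make the content of Lemma 19.4 concrete, here is a hedged sketch (toy primes of my own choosing) of the easy direction: given p and q, the four square roots of d modulo N are assembled from the roots modulo p and modulo q (Lemma 19.3) by the Chinese remainder theorem.

```python
p, q = 23, 31                    # toy primes, both of the form 4k - 1
N = p * q

def crt(a, b):
    # The unique x mod N with x ≡ a (mod p) and x ≡ b (mod q).
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % N

m = 101                          # a message block
d = (m * m) % N
rp = pow(d, (p + 1) // 4, p)     # root of d mod p, by Lemma 19.3
rq = pow(d, (q + 1) // 4, q)     # root of d mod q
roots = sorted({crt(s * rp % p, t * rq % q)
                for s in (1, p - 1) for t in (1, q - 1)})
assert all(r * r % N == d for r in roots)
assert m in roots and len(roots) == 4
print(roots)
```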

We can now give the Rabin–Williams scheme. The spy-master A selects two very large primes p and q. (Since he has only done an undergraduate course in mathematics, he will take p and q of the form 4k − 1.) He keeps the pair (p, q) secret, but broadcasts the public key N = pq. If B wants to send him a message, she writes it in binary code and splits it into blocks of length m with 2^m < N < 2^{m+1}. Each of these blocks is a number r_j with 0 ≤ r_j < N. B computes s_j such that r_j^2 ≡ s_j (mod N) and sends s_j. The spy-master (who knows p and q) can use the method of Lemma 19.4 to find one of four possible values for r_j (the four square roots of s_j). Of these four possible message blocks, it is almost certain that three will be garbage, so the fourth will be the desired message.

If the reader reflects, she will see that the ambiguity of the root is genuinely unproblematic. (If the decoding is mechanical, then fixing 50 bits scattered throughout each block will reduce the risk of ambiguity to negligible proportions.) Slightly more problematic, from the practical point of view, is the possibility that someone could be known to have sent a very short message, that is, to have started with a block r_j such that 1 ≤ r_j ≤ N^{1/2}, but, provided sensible precautions are taken, this should not occur.

If I Google ‘Casino’, then I am instantly put in touch with several of the world’s ‘most trusted electronic casinos’, who subscribe to ‘responsible gambling’ and who have their absolute probity established by ‘internationally recognised Accredited Test Facilities’. Given these assurances, it seems churlish to introduce Alice and Bob, who live in different cities, can only communicate by e-mail and are so suspicious of each other that neither will accept the word of the other as to the outcome of the toss of a coin.

If, in spite of this difficulty, Alice and Bob wish to play heads and tails (the technical expression is ‘bit exchange’ or ‘bit sharing’), then the ambiguity of the Rabin–Williams scheme becomes an advantage. Let us set out the steps of a ‘bit sharing scheme’ based on Rabin–Williams.

STEP 1 Alice chooses at random two large primes p and q such that p ≡ q ≡ 3 (mod 4). She computes n = pq and sends n to Bob.

STEP 2 Bob chooses a random integer r with 1 < r < n/2. (He wishes to hide r from Alice, so he may take whatever other precautions he wishes in choosing r.) He computes m ≡ r^2 (mod n) and sends m to Alice.

STEP 3 Since Alice knows p and q, she can easily compute the 4 square roots of m modulo n. Exactly two of the roots, r_1 and r_2, will satisfy 1 < r_i < n/2. (If s is a root, so is −s.) However, Alice has no means of telling which is r. Alice writes out r_1 and r_2 in binary and chooses a place (the kth digit, say) where they differ. She then tells Bob ‘I choose the value u for the kth bit’.

STEP 4 Bob tells Alice the value of r. If the value of the kth bit of r is u, then Alice wins. If not, Bob wins. Alice checks that r^2 ≡ m (mod n). Since r_1 r_2^{−1} is a square root of unity which is neither 1 nor −1, knowing r_1 and r_2 is equivalent to factoring n, so she knows that Bob could not lie about the value of r. Thus Alice is happy.

STEP 5 Alice tells Bob the values of p and q. He checks that p and q are primes (see Exercise 27.12 for why he does this) and finds r_1 and r_2. After Bob has verified that r_1 and r_2 do indeed differ in the kth bit, he also is happy, since there is no way Alice could know from inspection of m which root he started with.

20 Commutative public key systems

In the previous sections we introduced the coding and decoding functions K_A and K_A^{−1} with the property that

(pK_A)K_A^{−1} = p

and satisfying the condition that knowledge of K_A does not help very much in finding K_A^{−1}. We usually require, in addition, that our system be commutative in the sense that

(pK_A^{−1})K_A = p

and that knowledge of K_A^{−1} does not help very much in finding K_A. The Rabin–Williams scheme, as described in the last section, does not have this property.

Commutative public key codes are very flexible and provide us with simple means for maintaining integrity, authenticity and non-repudiation. (This is not to say that non-commutative codes cannot do the same; simply that commutativity makes many things easier.)

Integrity and non-repudiation Let A ‘own a code’, that is, know both K_A and K_A^{−1}. Then A can broadcast K_A^{−1} to everybody, so that everybody can decode but only A can encode. (We say that K_A^{−1} is the public key and K_A the private key.) Then, for example, A could issue tickets to the castle ball carrying the coded message ‘admit Joe Bloggs’ which could be read by the recipients and the guards but would be unforgeable. However, for the same reason, A could not deny that he had issued the invitation.

Authenticity If B wants to be sure that A is sending a message, then B can send A a harmless random message q. If B receives back a message p such that pK_A^{−1} ends with the message q, then A must have sent it to B. (Anybody can copy a coded message, but only A can control the content.)

Signature Suppose now that B also owns a commutative code pair (K_B, K_B^{−1}) and has broadcast K_B^{−1}. If A wants to send a message p to B, he computes q = pK_A and sends pK_B^{−1} followed by qK_B^{−1}. B can now use the fact that

(qK_B^{−1})K_B = q

to recover p and q. B then observes that qK_A^{−1} = p. Since only A can produce a pair (p, q) with this property, A must have written it.

There is now a charming little branch of the mathematical literature based on these ideas, in which Albert gets Bertha to authenticate a message from Caroline to David using information from Eveline, Fitzpatrick, Gilbert and Harriet, whilst Ingrid, Jacob, Katherine and Laszlo play bridge without using a pack of cards. However, a cryptographic system is only as strong as its weakest link. Unbreakable password systems do not prevent computer systems being regularly penetrated by ‘hackers’ and, however ‘secure’ a transaction on the net may be, it can still involve a rogue at one end and a fool at the other.

The most famous candidate for a commutative public key system is the RSA (Rivest, Shamir, Adleman) system. It was the RSA system[42] that first convinced the mathematical community that public key systems might be feasible. The reader will have met the RSA in IA, but we will push the ideas a little bit further.

Lemma 20.1. Let p and q be primes. If N = pq and λ(N) = lcm(p − 1, q − 1), then

M^{λ(N)} ≡ 1 (mod N)

for all integers M coprime to N.

[42] A truly patriotic lecturer would refer to the ECW system, since Ellis, Cocks and Williamson discovered the system earlier. However, they worked for GCHQ and their work was kept secret.

Since we wish to appeal to Lemma 19.4, we shall assume in what follows that we have secretly chosen large primes p and q. We choose an integer e and then use Euclid’s algorithm to check that e and λ(N) are coprime and to find an integer d such that

de ≡ 1 (mod λ(N)).

If Euclid’s algorithm reveals that e and λ(N) are not coprime, we try another e. Since others may be better psychologists than we are, we would be wise to use some sort of random method for choosing p, q and e.

The public key includes the values of e and N, but we keep secret the value of d. Given a number M with 1 ≤ M ≤ N − 1, we encode it as the integer E with 1 ≤ E ≤ N − 1 such that

E ≡ M^d (mod N).

The public decoding method is given by the observation that

E^e ≡ M^{de} ≡ M (mod N)

for M coprime to N. (The probability that M is not coprime to N is so small that it can be neglected.) As was observed in IA, high powers are easy to compute.
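A toy numerical sketch of the scheme just described (tiny primes and exponents of my own choosing; real use needs very large random primes):

```python
from math import gcd, lcm

p, q = 101, 113                  # the secretly chosen primes
N = p * q
lam = lcm(p - 1, q - 1)          # λ(N) = lcm(p - 1, q - 1) = 2800 here
e = 9                            # trial e; Euclid checks gcd(e, λ(N)) = 1
assert gcd(e, lam) == 1
d = pow(e, -1, lam)              # d with de ≡ 1 (mod λ(N))

M = 4321                         # a message block coprime to N
E = pow(M, d, N)                 # encode with the secret d
assert pow(E, e, N) == M         # anyone can decode with the public e
print("recovered", pow(E, e, N))
```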

Exercise 20.2. Show how M^{2^n} can be computed using n multiplications. If 1 ≤ r ≤ 2^n, show how M^r can be computed using at most 2n multiplications.
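A sketch of the standard square-and-multiply answer to the exercise: repeated squaring gives M, M^2, M^4, . . . , and each binary digit of r costs at most one further multiplication.

```python
def power_mod(M, r, N):
    result, square = 1, M % N
    while r:
        if r & 1:                          # multiply step for this binary digit
            result = (result * square) % N
        square = (square * square) % N     # squaring step: M, M^2, M^4, ...
        r >>= 1
    return result

assert power_mod(7, 345, 1000) == pow(7, 345, 1000)
print(power_mod(7, 345, 1000))
```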

To show that (provided that factoring N is indeed hard) finding d from e and N is hard, we use the following lemma.

Lemma 20.3. Suppose that d, e and N are as above. Set de − 1 = 2^a b, where b is odd.
(i) a ≥ 1.
(ii) If y ≡ x^b (mod N) and y ≢ 1, then there exists an r with 0 ≤ r ≤ a − 1 such that

z = y^{2^r} ≢ 1 but z^2 ≡ 1 (mod N).

Combined with Lemma 19.4, the idea of Lemma 20.3 gives a fast probabilistic algorithm where, by making random choices of x, we very rapidly reduce the probability that we cannot find p and q to as close to zero as we wish.
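A hedged sketch of that probabilistic algorithm (toy numbers from the earlier RSA sketch; the parameter choices are mine): given N together with a valid pair (d, e), random choices of x quickly expose a nontrivial square root of 1 modulo N, and hence a factor.

```python
import random
from math import gcd

def factor_from_de(N, d, e):
    k = d * e - 1                     # k = 2^a * b with b odd
    a, b = 0, k
    while b % 2 == 0:
        a, b = a + 1, b // 2
    while True:
        x = random.randrange(2, N - 1)
        g = gcd(x, N)
        if g > 1:
            return g                  # lucky: x already shares a factor
        y = pow(x, b, N)
        for _ in range(a):
            z = (y * y) % N
            if z == 1 and y not in (1, N - 1):
                return gcd(y - 1, N)  # y is a nontrivial square root of 1
            y = z

random.seed(2)
N, e, d = 101 * 113, 9, 2489          # de ≡ 1 (mod λ(N)) as before
f = factor_from_de(N, d, e)
assert f in (101, 113)
print(f)
```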

Lemma 20.4. The problem of finding d from the public information e and N is essentially as hard as factorising N.

Remark 1 At first glance, we seem to have done as well for the RSA code as for the Rabin–Williams code. But this is not so. In Lemma 19.4 we showed that finding the four solutions of M^2 ≡ E (mod N) was equivalent to factorising N. In the absence of further information, finding one root is as hard as finding another. Thus the ability to break the Rabin–Williams code (without some tremendous stroke of luck) is equivalent to the ability to factor N. On the other hand, it is, a priori, possible that someone may find a decoding method for the RSA code which does not involve knowing d. They would then have broken the RSA code without finding d. It must, however, be said that, in spite of this problem, the RSA code is much used in practice and the Rabin–Williams code is not.

Remark 2 It is natural to ask what evidence there is that the factorisation problem really is hard. Properly organised, trial division requires O(N^{1/2}) operations to factorise a number N. This order of magnitude was not bettered until 1972, when Lehman produced an O(N^{1/3}) method. In 1974, Pollard[43] produced an O(N^{1/4}) method. In 1979, as interest in the problem grew because of its connection with secret codes, Lenstra made a breakthrough to an O(e^{c((log N)(log log N))^{1/2}}) method with c ≈ 2. Since then some progress has been made (Pollard reached O(e^{2((log N)(log log N))^{1/3}})) but, in spite of intense efforts, mathematicians have not produced anything which would be a real threat to codes based on the factorisation problem. A series of challenge numbers is hosted on the Wikipedia article entitled RSA. In 1996, it was possible to factor 100 (decimal) digit numbers routinely, 150 digit numbers with immense effort, but 200 digit numbers were out of reach. In May 2005, the 200 digit challenge number was factored by F. Bahr, M. Boehm, J. Franke

[43] Although mathematically trained, Pollard worked outside the professional mathematical community.

and T. Kleinjunge as follows:

27997833911221327870829467638722601621
07044678695542853756000992932612840010
76093456710529553608560618223519109513
65788637105954482006576775098580557613
57909873495014417886317894629518723786
9221823983
= 35324619344027701212726049781984643
686711974001976250236493034687761212536
79423200058547956528088349
× 7925869954478333033347085841480059687
737975857364219960734330341455767872818
152135381409304740185467

but the 210 digit challenge

24524664490027821197651766357308801846
70267876783327597434144517150616008300
38587216952208399332071549103626827191
67986407977672324300560059203563124656
12184658179041001318592996199338170121
49335034875870551067

remains (as of mid-2008) unfactored. Organisations which use the RSA and related systems rely on ‘security through publicity’. Because the problem of cracking RSA codes is so notorious, any breakthrough is likely to be publicly announced[44]. Moreover, even if a breakthrough occurs, it is unlikely to be one which can be easily exploited by the average criminal. So long as the secrets covered by RSA-type codes need only be kept for a few months rather than forever[45], the codes can be considered to be one of the strongest links in the security chain.

[44] And if not, it is most likely to be a government rather than a Mafia secret.

[45] If a sufficiently robust ‘quantum computer’ could be built, then it could solve the factorisation problem and the discrete logarithm problem (mentioned later) with high probability extremely fast. It is highly unlikely that such a machine would be or could be kept secret, since it would have many more important applications than code breaking.

21 Trapdoors and signatures

It might be thought that secure codes are all that are needed to ensure the security of communications, but this is not so. It is not necessary to read a message to derive information from it[46]. In the same way, it may not be necessary to be able to write a message in order to tamper with it.

Here is a somewhat far-fetched but worrying example. Suppose that, by wire tapping or by looking over people’s shoulders, I discover that a bank creates messages in the form M_1, M_2, where M_1 is the name of the client and M_2 is the sum to be transferred to the client’s account. The messages are then encoded according to the RSA scheme discussed after Lemma 20.1 as Z_1 = M_1^d and Z_2 = M_2^d. I then enter into a transaction with the bank which adds $1000 to my account. I observe the resulting Z_1 and Z_2 and then transmit Z_1 followed by Z_2^3.

Example 21.1. What will (I hope) be the result of this transaction?

We say that the RSA scheme is vulnerable to a ‘homomorphism attack’, that is to say, an attack which makes use of the fact that our code is a homomorphism. (If θ(M) = M^d, then θ(M_1 M_2) = θ(M_1)θ(M_2).)
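A toy demonstration of the attack, reusing small illustrative RSA numbers of my own choosing: cubing the ciphertext cubes the decoded amount, all without knowledge of the secret d.

```python
p, q = 101, 113
N = p * q
e, d = 9, 2489                  # de ≡ 1 (mod λ(N)) for these toy primes

amount = 1000
Z2 = pow(amount, d, N)          # the bank's encoding of M_2
forged = pow(Z2, 3, N)          # the attacker cubes the ciphertext
decoded = pow(forged, e, N)     # the bank decodes the tampered block
assert decoded == pow(amount, 3, N)
print(decoded)
```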

One way of increasing security against tampering is first to code our message by a classical coding method and then use our RSA (or similar) scheme on the result.

Exercise 21.2. Discuss briefly the effect of first using an RSA scheme and then a classical code.

However, there is another way forward which has the advantage of wider applicability, since it can also be used to protect the integrity of open (non-coded) messages and to produce password systems. These are the so-called signature systems. (Note that we shall be concerned with the ‘signature of the message’ and not the signature of the sender.)

Definition 21.3. A signature or trapdoor or hashing function is a mapping H : M → S from the space M of possible messages to the space S of possible signatures.

(Let me admit, at once, that Definition 21.3 is more of a statement of notation than a useful definition.) The first requirement of a good signature function is that the space M should be much larger than the space S, so that H is a many-to-one function (in fact, a great-many-to-one function) and we cannot work back from H(M) to M. The second requirement is that S should be large, so that a forger cannot (sensibly) hope to hit on H(M) by luck.

[46] During World War II, British bomber crews used to spend the morning before a night raid testing their equipment; this included the radios.

Obviously we should aim at the same kind of security as that offered by our ‘level 2’ for codes:

Prospective opponents should find it hard to find H(M) given M, even if they are in possession of a plentiful supply of messages M_i together with their signatures H(M_i).

I leave it to the reader to think about level 3 security (or to look at section 12.6 of [10]).

Here is a signature scheme due to Elgamal[47]. The message sender A chooses a very large prime p, some integer 1 < g < p and some other integer u with 1 < u < p (as usual, some randomisation scheme should be used). A then releases the values of p, g and y = g^u (modulo p), but keeps the value of u secret. Whenever he sends a message m (some positive integer), he chooses another integer k with 1 ≤ k ≤ p − 2 at random and computes r and s with 1 ≤ r ≤ p − 1 and 0 ≤ s ≤ p − 2 by the rules[48]

r ≡ g^k (mod p),    (*)
m ≡ ur + ks (mod p − 1).    (**)

Lemma 21.4. If conditions (*) and (**) are satisfied, then

g^m ≡ y^r r^s (mod p).

If A sends the message m followed by the signature (r, s), the recipient need only verify the relation g^m ≡ y^r r^s (mod p) to check that the message is authentic[49].
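A toy sketch of signing and checking (small illustrative numbers of my own; s is obtained from (**) by inverting k modulo p − 1, which is the Euclid step mentioned in the footnote):

```python
p, g, u = 467, 2, 127            # public p, g and the secret u
y = pow(g, u, p)                 # the released value y = g^u (mod p)

m = 100                          # the message
k = 213                          # random k, coprime to p - 1
r = pow(g, k, p)                 # rule (*)
s = (m - u * r) * pow(k, -1, p - 1) % (p - 1)   # solves rule (**)

# Lemma 21.4: the recipient's check
assert pow(g, m, p) == (pow(y, r, p) * pow(r, s, p)) % p
print("signature", (r, s), "verifies")
```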

Since k is random, it is believed that the only way to forge signatures is to find u from g^u (or k from g^k), and it is believed that this problem, which is known as the discrete logarithm problem, is very hard.

Needless to say, even if it is impossible to tamper with a message–signature pair, it is always possible to copy one. Every message should thus contain a unique identifier such as a time stamp.

[47] This is Dr Elgamal’s own choice of spelling, according to Wikipedia.

[48] There is a small point which I have glossed over here and elsewhere. Unless k and p − 1 are coprime, the equation (**) may not be soluble. However, the quickest way to solve (**), if it is soluble, is Euclid’s algorithm, which will also reveal if (**) is insoluble. If (**) is insoluble, we simply choose another k at random and try again.

[49] Sometimes m is replaced by some hash function H(m) of m, so (**) becomes H(m) ≡ ur + ks (mod p − 1). In this case the recipient checks that g^{H(m)} ≡ y^r r^s (mod p).

The evidence that the discrete logarithm problem is very hard is of the same nature and strength as the evidence that the factorisation problem is very hard. We conclude our discussion with a description of the Diffie–Hellman key exchange system, which is also based on the discrete logarithm problem.

The modern coding schemes which we have discussed have the disadvantage that they require lots of computation. This is not a disadvantage when we deal slowly with a few important messages. For the Web, where we must deal speedily with a lot of less than world-shattering messages sent by impatient individuals, it is a grave disadvantage. Classical coding schemes are fast but become insecure with reuse. Key exchange schemes use modern codes to communicate a new secret key for each message. Once the secret key has been sent slowly, a fast classical method based on the secret key is used to encode and decode the message. Since a different secret key is used each time, the classical code is secure.

How is this done? Suppose A and B are at opposite ends of a tapped telephone line. A sends B a (randomly chosen) large prime p and a randomly chosen g with 1 < g < p − 1. Since the telephone line is insecure, A and B must assume that p and g are public knowledge. A now chooses randomly a secret number α and tells B the value of g^α (modulo p). B chooses randomly a secret number β and tells A the value of g^β (modulo p). Since

g^{αβ} ≡ (g^α)^β ≡ (g^β)^α (mod p),

both A and B can compute k = g^{αβ} modulo p, and k becomes the shared secret key.
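The exchange is three lines of arithmetic. A minimal sketch with toy numbers (p, g, α and β are my own illustrative choices):

```python
p, g = 2027, 3                   # public prime and base

alpha, beta = 1492, 1066         # A's and B's secret numbers
A_sends = pow(g, alpha, p)       # g^α (mod p), visible on the line
B_sends = pow(g, beta, p)        # g^β (mod p)

k_A = pow(B_sends, alpha, p)     # A computes (g^β)^α
k_B = pow(A_sends, beta, p)      # B computes (g^α)^β
assert k_A == k_B                # both now hold k = g^(αβ) (mod p)
print("shared key:", k_A)
```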

The eavesdropper is left with the problem of finding k ≡ g^{αβ} from knowledge of g, g^α and g^β (modulo p). It is conjectured that this is essentially as hard as finding α and β from the values of g, g^α and g^β (modulo p), and this is the discrete logarithm problem.

22 Quantum cryptography

In the days when messages were sent in the form of letters, suspicious people might examine the creases where the paper was folded for evidence that the letter had been read by others. Our final cryptographic system has the advantage that it too will reveal attempts to read it. It also has the advantage that, instead of relying on the unproven belief that a certain mathematical task is hard, it depends on the fact that a certain physical task is impossible[50].

[50] If you believe our present theories of the universe.

We shall deal with a highly idealised system. The business of dealing with realistic systems is a topic of active research within the faculty. The system we sketch is called the BB84 system (since it was invented by Bennett and Brassard in 1984), but there is another system invented by Ekert.

Quantum mechanics tells us that a polarised photon has a state

|φ⟩ = α|↕⟩ + β|↔⟩

where α, β ∈ R, α² + β² = 1, |↕⟩ is the vertically polarised state and |↔⟩ is
the horizontally polarised state. Such a photon will pass through a vertical
polarising filter with probability α² and its state will then be |↕⟩. It will pass
through a horizontal polarising filter with probability β² and its state will
then be |↔⟩. We denote the orthonormal basis consisting of |↕⟩ and |↔⟩ by +.

We now consider a second basis given by

|↗⟩ = (1/√2)|↕⟩ + (1/√2)|↔⟩   and   |↘⟩ = (1/√2)|↕⟩ − (1/√2)|↔⟩

in which the states correspond to polarisation at angles π/4 and −π/4 to the
horizontal. Observe that a photon in either state will have a probability 1/2
of passing through either a vertical or a horizontal filter and will then be in
the appropriate state.

Suppose Eve [51] intercepts a photon passing between Alice and Bob. If

Eve knows that it is either horizontally or vertically polarised, then she can

use a vertical filter. If the photon passes through, she knows that it was
vertically polarised when Alice sent it and can pass on a vertically polarised
photon to Bob. If the photon does not pass through, she knows

that the photon was horizontally polarised and can pass on a horizontally

polarised photon to Bob. However, if Alice’s photon was actually diagonally

polarised (at angle ±π/4), this procedure will result in Eve sending Bob a

photon which is horizontally or vertically polarised.

It is possible that the finder of a fast factorising method would get a
Fields Medal. It is certain that anyone who can do better than Eve would

get the Nobel prize for physics since they would have overturned the basis of

Quantum Mechanics.

Let us see how this can (in principle) be used to produce a key exchange

scheme (so that Alice and Bob can agree on a random number to act as the

basis for a classical code).

STEP 1 Alice produces a secret random sequence a_1 a_2 ... of bits (zeros and
ones) and Bob produces another secret random sequence b_1 b_2 ... of bits.

STEP 2 Alice produces another secret random sequence c_1 c_2 .... She
transmits it to Bob as follows.

[51] This is a traditional pun.

If a_j = 0 and c_j = 0, she uses a vertically polarised photon.
If a_j = 0 and c_j = 1, she uses a horizontally polarised photon.
If a_j = 1 and c_j = 0, she uses a ‘left diagonally’ polarised photon.
If a_j = 1 and c_j = 1, she uses a ‘right diagonally’ polarised photon.

STEP 3 If b_j = 0, Bob uses a vertical polariser to examine the jth photon.
If he records a vertical polarisation, he sets d_j = 0; if a horizontal, he sets
d_j = 1. If b_j = 1, Bob uses a π/4 diagonal polariser to examine the jth
photon. If he records a left diagonal polarisation, he sets d_j = 0; if a right,
he sets d_j = 1.

STEP 4 Bob and Alice use another communication channel to tell each other
the values of the a_j and b_j. Of course, they should try to keep these
communications secret, but we shall assume that the worst has happened and these
values become known to Eve.

STEP 5 If the sequences are long, we can be pretty sure, by the law of large
numbers, that a_j = b_j in about half the cases. (If not, Bob and Alice can
agree to start again.) In particular, we can ensure that, with probability of
at least 1 − ε/4 (where ε is chosen in advance), the number of agreements is
sufficiently large for the purposes set out below. Alice and Bob only look at
the ‘good cases’ when a_j = b_j. In such cases, if Eve does not examine the
associated photon, then d_j = c_j. If Eve does examine the associated photon,
then with probability 1/4, d_j ≠ c_j.

To see this, we examine the case when c_j = 0 and Eve uses a diagonal
polariser. (The other cases may be treated in exactly the same way.) With
probability 1/2, a_j = 1, so the photon is diagonally polarised, Eve records the
correct polarisation and sends Bob a correctly polarised photon. Thus d_j = c_j.
With probability 1/2, a_j = 0, so the photon is vertically or horizontally
polarised. Since Eve records a diagonal polarisation, she will send a diagonally
polarised photon to Bob and, since Bob’s polariser is vertical, he will record
a vertical polarisation with probability 1/2.

STEP 6 Alice uses another communication channel to tell Bob the value of

a randomly chosen sample of good cases. Standard statistical techniques

tell Alice and Bob that, if the number of discrepancies is below a certain

level, the probability that Eve is intercepting more than a previously chosen

proportion p of photons is less than ε/4. If the number of discrepancies is

greater than the chosen level, Alice and Bob will abandon the attempt to

communicate.

STEP 7 If Eve is intercepting less than a proportion p of photons and q > p

(with q chosen in advance) the probability that she will have intercepted

more than a proportion q of the remaining ‘good’ photons is less than ε/4.

Although we shall not do this, the reader who has ploughed through these
notes will readily accept that Bob and Alice can use the message conveyed
through the remaining good photons to construct a common secret such that
Eve has probability less than ε/4 of guessing it.

Thus, unless they decide that their messages are being partially read,

Alice and Bob can agree a shared secret with probability less than ε that an

eavesdropper can guess it.
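A small simulation (an illustrative sketch, not part of the original notes) makes the 1/4 figure of STEP 5 visible. Here Eve, when present, measures every photon in a randomly chosen basis — a slight variant of the case-by-case analysis above, but the error rate in the good cases is the same 1/4.

```python
import random

random.seed(0)

def measure(photon, basis):
    """Measure a photon (basis, bit) in the given basis (0 = +, 1 = diagonal).
    A matching basis reads the bit exactly; otherwise the outcome is random
    and the photon is re-prepared in the measuring basis."""
    pbasis, pbit = photon
    if pbasis == basis:
        return pbit, photon
    bit = random.randrange(2)
    return bit, (basis, bit)

def error_rate(eve_present, n=20000):
    """Fraction of 'good cases' (a_j = b_j) in which d_j != c_j."""
    wrong = good = 0
    for _ in range(n):
        a, b, c = (random.randrange(2) for _ in range(3))
        photon = (a, c)                       # Alice's basis a_j and bit c_j
        if eve_present:
            _, photon = measure(photon, random.randrange(2))
        d, _ = measure(photon, b)             # Bob's measurement in basis b_j
        if a == b:
            good += 1
            wrong += (d != c)
    return wrong / good

assert error_rate(False) == 0.0               # no Eve: d_j = c_j always
assert 0.2 < error_rate(True) < 0.3           # Eve present: about 1/4
```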

There are various gaps in the exposition above. First, we have assumed
that Eve must hold her polariser at a small fixed number of angles. A little
thought shows that allowing her a free choice of angle will make little
difference. Secondly, since physical systems always have imperfections, some
‘good’ photons will produce errors even in the absence of Eve. This means
that p in STEP 6 must be chosen above the ‘natural noise level’ and the
sequences must be longer but, again, this ought to make little difference.
There is a further engineering problem: it is very difficult to send
single photons every time. If there are too many groups of photons, then Eve
need only capture one and let the rest go, so we cannot detect eavesdropping.
If there are only a few, then the values of p and q can be adjusted to

take account of this. There are several networks in existence which employ

quantum cryptography.

Quantum cryptography has deﬁnite advantages when matched individu-

ally against RSA, secret sharing (using a large number of independent chan-

nels) or one-time pads. It is less easy to ﬁnd applications where it is better

than the best choice of one of these three ‘classical’ methods [52].

Of course, quantum cryptography will appeal to those who need to per-

suade others that they are using the latest and most expensive technology

to guard their secrets. However, as I said before, coding schemes are, at best,
cryptographic elements of larger cryptographic systems. If smiling
white-coated technicians install big gleaming machines with ‘Unbreakable
Quantum Code Company’ painted in large letters above the keyboard in the
homes of Alice and Bob, it does not automatically follow that their
communications are safe. Money will buy the appearance of security. Only thought
will buy the appropriate security for a given purpose at an appropriate cost.
And even then we cannot be sure.

As we know,

There are known knowns.

There are things we know we know.

We also know

[52] One problem is indicated by the first British military action in World War I,
which was to cut the undersea telegraph cables linking Germany to the outside world.
Complex systems are easier to disrupt than simple ones.

There are known unknowns.

That is to say

We know there are some things

We do not know.

But there are also unknown unknowns,

The ones we don’t know

We don’t know [53].

23 Further reading

For many students this will be one of the last university mathematics courses

they will take. Although the twin subjects of error-correcting codes and

cryptography occupy a small place in the grand panorama of modern math-

ematics, it seems to me that they form a very suitable topic for such a ﬁnal

course.

Outsiders often think of mathematicians as guardians of abstruse but settled
knowledge. Even those who understand that there are still unsettled problems
ask what mathematicians will do when they run out of problems. At

a more subtle level, Kline’s magniﬁcent Mathematical Thought from Ancient

to Modern Times [5] is pervaded by the melancholy thought that, though the

problems will not run out, they may become more and more baroque and

inbred. ‘You are not the mathematicians your parents were’ whispers Kline

‘and your problems are not the problems your parents’ were.’

However, when we look at this course, we see that the idea of error-

correcting codes did not exist before 1940. The best designs of such codes

depend on the kind of ‘abstract algebra’ that historians like Kline and Bell

consider a dead end, and lie behind the superior performance of CD players

and similar artifacts.

In order to go further into the study of codes, whether secret or error

correcting, we need to go into the question of how the information content of

a message is to be measured. ‘Information theory’ has its roots in the code

breaking of World War II (though technological needs would doubtless have

led to the same ideas shortly thereafter anyway). Its development required a

level of sophistication in treating probability which was simply not available

in the 19th century. (Even the Markov chain is essentially 20th century [54].)

[53] Rumsfeld.
[54] We are now in the 21st century, but I suspect that we are still part of the
mathematical ‘long 20th century’ which started in the 1880s with the work of Cantor
and like-minded contemporaries.

The question of what makes a calculation difficult could not even have
been thought about until Gödel’s theorem (itself a product of the great
‘foundations crisis’ at the beginning of the 20th century). Developments by Turing
and Church of Gödel’s theorem gave us a theory of computational complexity
which is still under development today. The question of whether there

exist ‘provably hard’ public codes is intertwined with still unanswered ques-

tions in complexity theory. There are links with the profound (and very 20th

century) question of what constitutes a random number.

Finally, the invention of the electronic computer has produced a cultural

change in the attitude of mathematicians towards algorithms. Before 1950,

the construction of algorithms was a minor interest of a few mathematicians.

(Gauss and Jacobi were considered unusual in the amount of thought they

gave to actual computation.) Today, we would consider a mathematician as
much a maker of algorithms as a prover of theorems. The notion of the

probabilistic algorithm which hovered over much of our discussion of secret

codes is a typical invention of the last decades of the 20th century.

Although both the subjects of error correcting and secret codes are now

‘mature’ in the sense that they provide usable and well tested tools for prac-

tical application, they still contain deep unanswered questions. For example

How close to the Shannon bound can a ‘computationally easy’ error cor-

recting code get?

Do provably hard public codes exist?

Even if these questions are too hard, there must surely exist error
correcting and public codes based on new ideas [55]. Such ideas would be most

welcome and, although they are most likely to come from the professionals,

they might come from outside the usual charmed circles.

Those who wish to learn about error correction from the horse’s mouth

will consult Hamming’s own book on the matter [2]. For the present course,

the best book I know for further reading is Welsh [10]. After this, the book

of Goldie and Pinch [8] provides a deeper idea of the meaning of information

and its connection with the topic. The book by Koblitz [6] develops the

number theoretic background. The economic and practical importance of

transmitting, storing and processing data far outweighs the importance of

hiding it. However, hiding data is more romantic. For budding cryptologists

and cryptographers (as well as those who want a good read), Kahn’s The

Codebreakers [3] has the same role as is taken by Bell’s Men of Mathematics

for budding mathematicians.

I conclude with a quotation from Galbraith (referring to his time as am-

bassador to India) taken from Koblitz’s entertaining text [6].

I had asked that a cable from Washington to New Delhi . . . be

[55] Just as quantum cryptography was.

reported to me through the Toronto consulate. It arrived in code;

no facilities existed for decoding. They brought it to me at the

airport — a mass of numbers. I asked if they assumed I could

read it. They said no. I asked how they managed. They said

that when something arrived in code, they phoned Washington

and had the original read to them.

References

[1] U. Eco The Search for the Perfect Language (English translation), Black-

well, Oxford 1995.

[2] R. W. Hamming Coding and Information Theory (2nd edition) Prentice

Hall, 1986.

[3] D. Kahn The Codebreakers: The Story of Secret Writing MacMillan,

New York, 1967. (A lightly revised edition has recently appeared.)

[4] D. Kahn Seizing the Enigma Houghton Miﬄin, Boston, 1991.

[5] M. Kline Mathematical Thought from Ancient to Modern Times OUP,

1972.

[6] N. Koblitz A Course in Number Theory and Cryptography Springer,

1987.

[7] D. E. Knuth The Art of Computer Programming Addison-Wesley. The

third edition of Volumes I to III is appearing during this year and the

next (1998–9).

[8] G. M. Goldie and R. G. E. Pinch Communication Theory CUP, 1991.

[9] T. M. Thompson From Error-correcting Codes through Sphere Packings

to Simple Groups Carus Mathematical Monographs 21, MAA, Wash-

ington DC, 1983.

[10] D. Welsh Codes and Cryptography OUP, 1988.


There is a widespread superstition, believed both by supervisors and su-

pervisees, that exactly twelve questions are required to provide full under-

standing of six hours of mathematics and that the same twelve questions

should be appropriate for students of all abilities and all levels of diligence.

I have tried to keep this in mind, but have provided some extra questions in

the various exercise sheets for those who scorn such old wives’ tales.

24 Exercise Sheet 1

Q 24.1. (Exercises 1.1 and 1.2.) (i) Consider Morse code.

A → •−∗    B → −•••∗    C → −•−•∗
D → −••∗   E → •∗       F → ••−•∗
O → −−−∗   S → •••∗     7 → −−•••∗
Decode −•−•∗ −−−∗ −••∗ •∗.

(ii) Consider ASCII code.

A → 1000001 B → 1000010 C → 1000011

a → 1100001 b → 1100010 c → 1100011

+ → 0101011 ! → 0100001 7 → 0110111

Encode b7!. Decode 110001111000011100010.

Q 24.2. (Exercises 1.3, 1.4 and 1.7.) Consider two alphabets A and B and
a coding function c : A → B*.
(i) Explain, without using the notion of prefix-free codes, why, if c is
injective and fixed length, c is decodable. Explain why, if c is injective and
fixed length, c is prefix-free.
(ii) Let A = B = {0, 1}. If c(0) = 0, c(1) = 00, show that c is injective
but c* is not.
(iii) Let A = {1, 2, 3, 4, 5, 6} and B = {0, 1}. Show that there is a variable
length coding c such that c is injective and all code words have length 2 or
less. Show that there is no decodable coding c such that all code words have
length 2 or less.

Q 24.3. The product of two codes c_j : A_j → B_j* is the code

g : A_1 × A_2 → (B_1 ∪ B_2)*

given by g(a_1, a_2) = c_1(a_1)c_2(a_2).
Show that the product of two prefix-free codes is prefix-free, but the product
of a decodable code and a prefix-free code need not even be decodable.


Q 24.4. (Exercises 2.5 and 2.7)
(i) Apply Huffman’s algorithm to the nine messages M_j where M_j has
probability j/45 for 1 ≤ j ≤ 9.
(ii) Consider 4 messages with the following properties. M_1 has probability
.23, M_2 has probability .24, M_3 has probability .26 and M_4 has probability
.27. Show that any assignment of the code words 00, 01, 10 and 11 produces
a best code in the sense of this course.
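For part (i), Huffman's algorithm is short enough to sketch in code. The implementation below is an illustrative one (not from the notes); one can then check that Kraft's inequality holds with equality and that the expected length lies between the entropy H and H + 1.

```python
import heapq
from math import log2

def huffman(probs):
    """Binary Huffman code: returns a dict  message index -> codeword."""
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tag = len(probs)                 # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {m: "0" + w for m, w in c0.items()}
        merged.update({m: "1" + w for m, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, tag, merged))
        tag += 1
    return heap[0][2]

probs = [j / 45 for j in range(1, 10)]   # the nine messages of part (i)
code = huffman(probs)
avg = sum(p * len(code[i]) for i, p in enumerate(probs))
H = -sum(p * log2(p) for p in probs)
assert H <= avg < H + 1                  # noiseless coding theorem bounds
# Kraft's inequality holds with equality for a Huffman code
assert abs(sum(2.0 ** -len(w) for w in code.values()) - 1) < 1e-12
```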

Q 24.5. (Exercises 2.6 and 4.6.) (i) Consider 64 messages M_j. M_1 has
probability 1/2, M_2 has probability 1/4 and M_j has probability 1/248 for
3 ≤ j ≤ 64. Explain why, if we use code words of equal length, then the
length of a code word must be at least 6. By using the ideas of Huffman’s
algorithm (you should not need to go through all the steps) obtain a set of
code words such that the expected length of a code word sent is no more than 3.
(ii) Let a, b > 0. Show that

log_a b = (log b)/(log a).

Q 24.6. (Exercise 4.10) (i) Let A = {1, 2, 3, 4}. Suppose that the probability
that letter k is chosen is k/10. Use your calculator to find ⌈−log_2 p_k⌉ and
write down a Shannon–Fano code c.
(ii) We found a Huffman code c_h for the system in Example 2.4. Show that
the entropy is approximately 1.85, that E|c(A)| = 2.4 and that E|c_h(A)| =
1.9. Check that these results are consistent with the appropriate theorems
of the course.
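For (i), the lengths ⌈−log₂ p_k⌉ can be computed directly, and a prefix-free code with those lengths can be written down canonically (shortest word first, treating a running counter as a binary fraction). This is an illustrative sketch, not the notes' own construction.

```python
from math import ceil, log2

probs = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}        # p_k = k/10
lengths = {k: ceil(-log2(p)) for k, p in probs.items()}
assert lengths == {1: 4, 2: 3, 3: 2, 4: 2}

# Canonical assignment, shortest lengths first: because the lengths satisfy
# Kraft's inequality, no word ends up a prefix of a later, longer one.
code, value, prev = {}, 0, 0
for k in sorted(probs, key=lambda k: lengths[k]):
    value <<= lengths[k] - prev
    code[k] = format(value, "0%db" % lengths[k])
    prev = lengths[k]
    value += 1
```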

Q 24.7. (Exercise 5.1) Suppose that we have a sequence X_j of random
variables taking the values 0 and 1. Suppose that X_1 = 1 with probability 1/2
and X_{j+1} = X_j with probability .99, independent of what has gone before.
(i) Suppose we wish to send 10 successive bits X_j X_{j+1} ... X_{j+9}. Show
that if we associate the sequence of ten zeros with 0, the sequence of ten
ones with 10 and any other sequence a_0 a_1 ... a_9 with 11a_0 a_1 ... a_9, we have
a decodable code which on average requires about 5/2 bits to transmit the
sequence.
(ii) Suppose we wish to send the bits X_j X_{j+10^6} X_{j+2×10^6} ... X_{j+9×10^6}.
Explain why any decodable code will require on average at least 10 bits to
transmit the sequence. (You need not do detailed computations.)
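A quick simulation (illustrative, matching the parameters of the question) confirms the 'about 5/2 bits' claim in (i): runs of ten equal bits occur with probability .99⁹ ≈ .91, so the average length is roughly .91 × 1.5 + .09 × 12 ≈ 2.4.

```python
import random

random.seed(2)

def code_length(bits):
    """Ten zeros -> '0' (1 bit); ten ones -> '10' (2 bits);
    anything else -> '11' followed by the ten bits themselves (12 bits)."""
    if bits == [0] * 10:
        return 1
    if bits == [1] * 10:
        return 2
    return 12

trials, total = 50000, 0
for _ in range(trials):
    x = random.randrange(2)              # X_1 is 0 or 1 with probability 1/2
    bits = [x]
    for _ in range(9):
        if random.random() >= 0.99:      # X_{j+1} != X_j with probability .01
            x = 1 - x
        bits.append(x)
    total += code_length(bits)

average = total / trials
assert 2.2 < average < 2.7               # about 5/2 bits per block of ten
```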

Q 24.8. In Bridge, a 52 card pack is dealt to provide 4 hands of 13 cards

each.


(i) Purely as a matter of interest, we consider the following question. If

the contents of a hand are conveyed by one player to their partner by a

series of nods and shakes of the head how many movements of the head are

required? Show that at least 40 movements are required. Give a simple code

requiring 52 movements.

[You may assume for simplicity that the player to whom the information

is being communicated does not look at her own cards. (In fact this does not

make a diﬀerence since the two players do not acquire any shared information

by looking at their own cards.)]

(ii) If instead the player uses the initial letters of words (say using the 16

most common letters), how many words will you need to utter [56]?

Q 24.9. (i) In a comma code, like Morse code, one symbol from an alphabet of

m letters is reserved to end each code word. Show that this code is preﬁx-free

and give a direct argument to show that it must satisfy Kraft’s inequality.

(ii) Give an example of a code satisfying Kraft’s inequality which is not

decodable.
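For (i), the direct argument can be checked numerically (an illustrative sketch, with m and L chosen arbitrarily): counting every comma word of length at most L over an m-letter alphabet, the Kraft sum is 1 − ((m − 1)/m)^L, which always stays below 1.

```python
def kraft_sum(lengths, m):
    """Kraft sum  sum of m^{-l_i}  for word lengths over an m-letter alphabet."""
    return sum(m ** -l for l in lengths)

# Comma code on m letters: a word is any string over the other m - 1 letters,
# terminated by the reserved comma symbol, so there are (m-1)^k words of
# length k + 1.
m, L = 3, 10
lengths = [k + 1 for k in range(L) for _ in range((m - 1) ** k)]
s = kraft_sum(lengths, m)
assert s < 1
assert abs(s - (1 - ((m - 1) / m) ** L)) < 1e-12
```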

Q 24.10. Show that if an optimal binary code has word lengths s_1, s_2, ..., s_m
then

m log_2 m ≤ s_1 + s_2 + ⋯ + s_m ≤ (m² + m − 2)/2.

Q 24.11. (i) It is known that exactly one member of the starship Emphasise
has contracted the Macguffin virus. A test is available that will detect the
virus at any dilution. However, the power required is such that the ship’s
force shields must be switched off [57] for a minute during each test. Blood
samples are taken from all crew members. The ship’s computer has worked
out that the probability of crew member number i harbouring the virus is p_i.
(Thus the probability that the captain, who is, of course, number 1, has the
disease is p_1.) Explain how, by testing pooled samples, the expected number
of tests can be minimised. Write down the exact form of the test when there
are 2^n crew members and p_i = 2^{−n}.

(ii) Questions like (i) are rather artificial, since they require that exactly
one person carries the virus. Suppose that the probability that any member
of a population of 2^n has a certain disease is p (and that the probability
is independent of the health of the others) and there exists an error free
test which can be carried out on pooled blood samples which indicates the
presence of the disease in at least one of the samples or its absence from all.
Explain why there cannot be a testing scheme which can be guaranteed
to require less than 2^n tests to diagnose all members of the population. How
does the scheme suggested in the last sentence of (i) need to be modified to
take account of the fact that more than one person may be ill (or, indeed,
no one may be ill)? Show that the expected number of tests required by
the modified scheme is no greater than pn2^{n+1} + 1. Explain why the cost of
testing a large population of size x is no more than about 2pcx log_2 x with c
the cost of a test.

[56] ‘Marked cards, M. l’Anglais?’ I said, with a chilling sneer. ‘They are used, I am told,
to trap players–not unbirched schoolboys.’
‘Yet I say that they are marked!’ he replied hotly, in his queer foreign jargon. ‘In my
last hand I had nothing. You doubled the stakes. Bah, sir, you knew! You have swindled
me!’
‘Monsieur is easy to swindle – when he plays with a mirror behind him,’ I answered
tartly. (Under the Red Robe, S. J. Weyman)
[57] ‘Captain, ye canna be serious.’

(iii) In practice, pooling schemes will be less complicated. Usually a

group of x people are tested jointly and, if the joint test shows the disease,

each is tested individually. Explain why this is not sensible if p is large but is

sensible (with a reasonable choice of x) if p is small. If p is small, explain why

there is an optimum value for x. Write down (but do not attempt to solve) an

equation which indicates (in a ‘mathematical methods’ sense) that optimum

value in terms of p, the probability that an individual has the disease.

Schemes like these are only worthwhile if the disease is rare and the

test is both expensive and will work on pooled samples. However, these

circumstances do occur together from time to time and the idea then produces

public health beneﬁts much more cheaply than would otherwise be possible.

Q 24.12. (i) Give the appropriate generalisation of Huffman’s algorithm
to an alphabet with a symbols when you have m messages and m ≡ 1
mod a − 1.

(ii) Prove that your algorithm gives an optimal solution.

(iii) Extend the algorithm to cover general m by introducing messages of

probability zero.

Q 24.13. (i) A set of m apparently identical coins consists of m − 1 coins
and one heavier coin. You are given a balance in which you can weigh
equal numbers of the coins and determine which side (if either) contains the
heavier coin. You wish to find the heavy coin in the fewest average number
of weighings.
If 3^r + 1 ≤ m ≤ 3^{r+1}, show that you can label each coin with a ternary
number a_1 a_2 ... a_{r+1} with a_j ∈ {0, 1, 2} in such a way that the number of
coins having 1 in the jth place equals the number of coins with 2 in the jth
place for each j (think Huffman ternary trees).
By considering the Huffman algorithm problem for prefix-free codes on
an alphabet with three letters, solve the problem stated in the first part
and show that you do indeed have a solution. Show that your solution also
minimises the maximum number of weighings that you might have to do.

(ii) Suppose the problem is as before but m = 12 and the odd coin may

be heavier or lighter. Show that you need at least 3 weighings.

[In fact you can always do it in 3 weighings, but the problem of showing

this ‘is said to have been planted during the war . . . by enemy agents since

Operational Research spent so many man-hours on its solution.’ [58]]

Q 24.14. Extend the definition of entropy to a random variable X taking
values in the non-negative integers. (You must allow for the possibility of
infinite entropy.)
Compute the expected value EY and entropy H(Y) in the case when
Y has the geometric distribution, that is to say Pr(Y = k) = p^k(1 − p)
[0 < p < 1]. Show that, amongst all random variables X taking values in
the non-negative integers with the same expected value μ [0 < μ < ∞], the
geometric distribution maximises the entropy.
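The closed forms EY = p/(1 − p) and H(Y) = H(p)/(1 − p), with H(p) the entropy of a single p-biased bit, can be checked numerically by truncating the series; p = 0.3 below is an arbitrary illustration.

```python
from math import log2

p = 0.3
q = 1 - p
terms = range(500)                        # the tail beyond k = 500 is negligible
EY = sum(k * p**k * q for k in terms)
H = -sum(p**k * q * log2(p**k * q) for k in terms)

assert abs(EY - p / q) < 1e-9                              # EY = p/(1-p)
assert abs(H - (-(p * log2(p) + q * log2(q)) / q)) < 1e-9  # H(Y) = H(p)/(1-p)
```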

Q 24.15. A source produces a set A of messages M_1, M_2, ..., M_n with
non-zero probabilities p_1, p_2, ..., p_n. Let S be the codeword length when the
message is encoded by a decodable code c : A → B* where B is an alphabet
of k letters.
(i) Show that

(∑_{i=1}^{n} √p_i)² ≤ E(k^S).

[Hint: Cauchy–Schwarz, p_i^{1/2} = (p_i^{1/2} k^{s_i/2}) k^{−s_i/2}.]
(ii) Show that

min E(k^S) ≤ k (∑_{i=1}^{n} √p_i)²,

where the minimum is taken over all decodable codes.
[Hint: Look for a code with codeword lengths s_i = ⌈−log_k(p_i^{1/2}/λ)⌉ for an
appropriate λ.]

[58] The quotation comes from Pedoe’s The Gentle Art of Mathematics, which also gives a
very pretty solution. As might be expected, there are many accounts of this problem on
the web.


25 Exercise Sheet 2

Q 25.1. (Exercise 7.3.) In an exam each candidate is asked to write down a

Candidate Number of the form 3234A, 3235B, 3236C,. . . (the eleven possible

letters are repeated cyclically) and a desk number. (Thus candidate 0004

sitting at desk 425 writes down 0004D–425.)

the Candidate Identiﬁer identify the candidate uniquely. Show that if the

candidate makes one error in the Candidate Identiﬁer then that error can be

detected without using the Desk Number. Would this be true if there were

9 possible letters repeated cyclically? Would this be true if there were 12

possible letters repeated cyclically? Give reasons.

Show that if we combine the Candidate Number and the Desk Number

the combined code is one error correcting.

Q 25.2. (Exercise 6.1) In the model of a communication channel, we take

the probability p of error to be less than 1/2. Why do we not consider the

case 1 ≥ p > 1/2? What if p = 1/2?

Q 25.3. (Exercise 7.4.) If you look at the inner title page of almost any book
published between 1974 and 2007, you will find its International Standard
Book Number (ISBN). The ISBN uses single digits selected from 0, 1, ..., 8,
9 and X representing 10. Each ISBN consists of nine such digits a_1, a_2, ...,
a_9 followed by a single check digit a_10 chosen so that

10a_1 + 9a_2 + ⋯ + 2a_9 + a_10 ≡ 0 mod 11. (*)

(In more sophisticated language, our code C consists of those elements a ∈
F_11^10 such that ∑_{j=1}^{10} (11 − j)a_j = 0.)

(i) Find a couple of books and check that (∗) holds for their ISBNs.

(ii) Show that (∗) will not work if you make a mistake in writing down

one digit of an ISBN.

(iii) Show that (∗) may fail to detect two errors.

(iv) Show that (∗) will not work if you interchange two distinct adjacent

digits (a transposition error).

(v) Does (iv) remain true if we replace ‘adjacent’ by ‘diﬀerent’ ? Errors

of type (ii) and (iv) are the most common in typing.

In communication between publishers and booksellers, both sides are anx-

ious that errors should be detected but would prefer the other side to query

errors rather than to guess what the error might have been.

(vi) Since the ISBN contained information such as the name of the publisher,
only a small proportion of possible ISBNs could be used [59] and the
system described above started to ‘run out of numbers’. A new system was
introduced which is compatible with the system used to label most consumer
goods. After January 2007, the appropriate ISBN became a 13 digit
number x_1 x_2 ... x_13 with each digit selected from 0, 1, ..., 8, 9 and the check
digit x_13 computed by using the formula

x_13 ≡ −(x_1 + 3x_2 + x_3 + 3x_4 + ⋯ + x_11 + 3x_12) mod 10.

Show that we can detect single errors. Give an example to show that we
cannot detect all transpositions.

[59] The same problem occurs with telephone numbers. If we use the Continent, Country,
Town, Subscriber system we will need longer numbers than if we just numbered each
member of the human race.
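Both check-digit rules are one-liners; the digit strings below are arbitrary illustrations, not real ISBNs. The last assertion exhibits a transposition the 13-digit scheme misses: swapping adjacent digits whose difference is 5 changes the weighted sum by a multiple of 10.

```python
def isbn10_check(d):
    """Check digit a_10 making 10a_1 + 9a_2 + ... + 2a_9 + a_10 = 0 (mod 11);
    the value 10 is printed as X."""
    return -sum((10 - i) * x for i, x in enumerate(d)) % 11

def isbn13_check(d):
    """Check digit x_13 = -(x_1 + 3x_2 + x_3 + 3x_4 + ... + 3x_12) (mod 10)."""
    return -sum(x * (3 if i % 2 else 1) for i, x in enumerate(d)) % 10

nine = [0, 5, 2, 1, 0, 6, 1, 8, 2]           # arbitrary digits, not a real ISBN
check = isbn10_check(nine)
# (ii) every single-digit change is detected: the weights 10,...,2 are all
# invertible mod 11, so the sum moves by a nonzero amount mod 11.
for i in range(9):
    for wrong in range(10):
        if wrong != nine[i]:
            assert isbn10_check(nine[:i] + [wrong] + nine[i + 1:]) != check

# A transposition the 13-digit scheme misses: adjacent digits differing by 5.
twelve = [9, 7, 8, 0, 5, 2, 1, 0, 6, 1, 8, 4]
swapped = twelve[:]
swapped[3], swapped[4] = swapped[4], swapped[3]
assert isbn13_check(twelve) == isbn13_check(swapped)
```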

Q 25.4. (Exercise 7.5.) Suppose we use eight hole tape with the standard
paper tape code and the probability that an error occurs at a particular
place on the tape (i.e. a hole occurs where it should not or fails to occur
where it should) is 10^{−4}. A program requires about 10 000 lines of tape (each
line containing eight places) using the paper tape code. Using the Poisson
approximation, direct calculation (possible with a hand calculator but really
no advance on the Poisson method), or otherwise, show that the probability
that the tape will be accepted as error free by the decoder is less than .04%.
Suppose now that we use the Hamming scheme (making no use of the last
place in each line). Explain why the program requires about 17 500 lines of
tape but that any particular line will be correctly decoded with probability
about 1 − (21 × 10^{−8}) and the probability that the entire program will be
correctly decoded is better than 99.6%.
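The arithmetic can be checked directly. This is a sketch: acceptance of the plain tape is modelled as every line passing its parity check, i.e. suffering an even number of errors.

```python
p = 1e-4

# Plain tape: a line of 8 places passes its parity check when it has an even
# number of errors; the tape is accepted when all 10 000 lines pass.
line_passes = (1 + (1 - 2 * p) ** 8) / 2
plain_accepted = line_passes ** 10_000
assert plain_accepted < 0.0004                      # less than .04%

# Hamming scheme: 17 500 lines of 7 used places; a line decodes correctly
# when it suffers at most one error.
line_ok = (1 - p) ** 7 + 7 * p * (1 - p) ** 6
assert abs((1 - line_ok) - 21e-8) < 1e-9            # failure ~ C(7,2) p^2
program_ok = line_ok ** 17_500
assert program_ok > 0.996                           # better than 99.6%
```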

Q 25.5. If 0 < δ < 1/2, find an A(δ) > 0 such that, whenever 0 ≤ r ≤ nδ,
we have

∑_{j=0}^{r} C(n, j) ≤ A(δ) C(n, r).

(We use weaker estimates in the course but this is the most illuminating.
The particular value of A(δ) is unimportant so do not waste time trying to
find a ‘good’ value.)

Q 25.6. Show that the n-fold repetition code is perfect if and only if n is

odd.

Q 25.7. (i) What is the expected Hamming distance between two randomly
chosen code words in F_2^n? (As usual we suppose implicitly that the two choices
are independent and all choices are equiprobable.)
(ii) Three code words are chosen at random from F_2^n. If k_n is the expected
value of the distance between the closest two, show that n^{−1} k_n → 1/2 as
n → ∞.
[There are many ways to do (ii). One way is to consider Tchebychev’s
inequality.]

Q 25.8. (Exercises 11.2 and 11.3.) Consider the situation described in the
first paragraph of Section 11.
(i) Show that for the situation described you should not bet if up ≤ 1
and should take

w = (up − 1)/(u − 1)

if up > 1.
(ii) Let us write q = 1 − p. Show that, if up > 1 and we choose the
optimum w,

E log Y_n = p log p + q log q + log u − q log(u − 1).

(iii) Show that, if you bet less than the optimal proportion, your fortune
will still tend to increase but more slowly, but, if you bet more than some
proportion w_1, your fortune will decrease. Write down the equation for w_1.
[Moral: If you use the Kelly criterion, veer on the side of under-betting.]
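A simulation (with illustrative parameters p = 0.6 and u = 2, so that the optimum of (i) is w = (up − 1)/(u − 1) = 0.2) shows the optimum matching the closed form of (ii) and a heavy over-bet shrinking the fortune.

```python
import random
from math import log

random.seed(1)

def growth_rate(w, p, u, n=200_000):
    """Monte Carlo estimate of E log(Y_{j+1}/Y_j) when a fraction w of the
    current fortune is staked each time at odds u, winning with probability p."""
    total = 0.0
    for _ in range(n):
        if random.random() < p:
            total += log(1 - w + u * w)      # a winning stake returns u*w
        else:
            total += log(1 - w)              # a losing stake is gone
    return total / n

p, u = 0.6, 2.0
q = 1 - p
w_opt = (u * p - 1) / (u - 1)                # = 0.2, the optimum from (i)
exact = p * log(p) + q * log(q) + log(u) - q * log(u - 1)   # formula of (ii)

assert abs(growth_rate(w_opt, p, u) - exact) < 0.01
assert growth_rate(0.9, p, u) < 0            # over-betting: fortune decays
```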

Q 25.9. Your employer announces that he is abandoning the old-fashioned

paternalistic scheme under which he guarantees you a ﬁxed sum Kx (where,

of course, K, x > 0) when you retire. Instead, he will empower you by giving

you a ﬁxed sum x now, to invest as you wish. In order to help you and the

rest of the staﬀ, your employer arranges that you should obtain advice from

a ﬁnancial whizkid with a top degree from Cambridge. After a long lecture

in which the whizkid manages to be simultaneously condescending, boring

and incomprehensible, you come away with the following information.

When you retire, the world will be in exactly one of n states. By means of a piece of financial wizardry called ditching (or something like that) the whizkid can offer you a pension plan which for the cost of x_i will return K x_i q_i^{-1} if the world is in state i, but nothing otherwise. (Here q_i > 0 and Σ_{i=1}^n q_i = 1.) The probability that the world will be in state i is p_i. You must invest the entire fixed sum. (Formally, Σ_{i=1}^n x_i = x. You must also take x_i ≥ 0.) On philosophical grounds you decide to maximise the expected value S of the logarithm of the sum received on retirement. Assuming that you will have to live off this sum for the rest of your life, explain, in your opinion, why this choice is reasonable or explain why it is unreasonable.
Find the appropriate choices of x_i. Do they depend on the q_i?


Suppose that K is fixed, but the whizkid can choose q_i. We may suppose that what is good for you is bad for him, so he will seek to minimise S for your best choices. Show that he will choose q_i = p_i. Show that, with these choices,

S = log Kx.

Q 25.10. Let C be the code consisting of the word 10111000100 and its

cyclic shifts (that is 01011100010, 00101110001 and so on) together with the

zero code word. Is C linear? Show that C has minimum distance 5.
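Both claims are finite checks, so a short script can confirm the hand argument (the script is a cross-check, not the proof the question wants):

```python
from itertools import combinations

word = "10111000100"
# C = all cyclic shifts of the word, together with the zero word.
C = sorted({word[i:] + word[:i] for i in range(len(word))} | {"0" * len(word)})

def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

d_min = min(dist(a, b) for a, b in combinations(C, 2))

def add(a, b):
    # coordinatewise addition mod 2
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# A linear code must be closed under addition.
linear = all(add(a, b) in C for a in C for b in C)
print(len(C), d_min, linear)
```

Note that |C| = 12 is not a power of 2, which already rules out linearity.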

Q 25.11. (i) The original Hamming code was a 7 bit code used in an 8 bit system (paper tape). Consider the code c : {0, 1}^4 → {0, 1}^8 obtained by using the Hamming code for the first 7 bits and the final bit as a check digit so that

x_1 + x_2 + ... + x_8 ≡ 0 mod 2.

Find the minimum distance for this code. How many errors can it detect? How many can it correct?
(ii) Given a code of length n which corrects e errors, can you always construct a code of length n + 1 which detects 2e + 1 errors?
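A quick machine check of part (i). The parity check matrix used below (columns 1, ..., 7 in binary) is the standard presentation of the Hamming code; it may differ from the notes' version by a permutation of coordinates, which leaves all distances unchanged:

```python
from itertools import product, combinations

# Parity check matrix of the Hamming (7,4) code: column j is the binary
# expansion of j+1.
H = [[(j + 1) >> b & 1 for j in range(7)] for b in range(3)]

hamming = [w for w in product((0, 1), repeat=7)
           if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)]
assert len(hamming) == 16   # dimension 4

# Extend each codeword by an overall parity check bit.
extended = [w + (sum(w) % 2,) for w in hamming]

d_min = min(sum(a != b for a, b in zip(u, v))
            for u, v in combinations(extended, 2))
print(d_min)
```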

Q 25.12. In general, we work under the assumption that all messages sent

through our noisy channel are equally likely. In this question we drop this

assumption. Suppose that each bit sent through a channel has probability

1/3 of being mistransmitted. There are 4 codewords 1100, 0110, 0001, 1111

sent with probabilities 1/4, 1/2, 1/12, 1/6. If you receive 1001 what will you

decode it as, using each of the following rules?

(i) The ideal observer rule: find b ∈ C so as to maximise Pr(b sent | u received).
(ii) The maximum likelihood rule: find b ∈ C so as to maximise Pr(u received | b sent).
(iii) The minimum distance rule: find b ∈ C so as to minimise the Hamming distance d(b, u) from the received message u.
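The three rules can be evaluated mechanically with exact arithmetic (a check on the hand computation, not a substitute for it):

```python
from fractions import Fraction as F

codewords = {"1100": F(1, 4), "0110": F(1, 2),
             "0001": F(1, 12), "1111": F(1, 6)}
u = "1001"
p = F(1, 3)   # per-bit error probability

def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

def likelihood(b):
    # Pr(u received | b sent) = p^d (1-p)^(n-d) with d = d(b, u).
    d = dist(b, u)
    return p**d * (1 - p)**(len(u) - d)

ideal    = max(codewords, key=lambda b: codewords[b] * likelihood(b))
max_like = max(codewords, key=likelihood)
min_dist = min(codewords, key=lambda b: dist(b, u))
print(ideal, max_like, min_dist)
```

The moral of the question survives the computation: once the codewords are not equally likely, the three rules need not agree.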

Q 25.13. (i) Show that −t ≥ log(1 − t) for 0 ≤ t < 1.
(ii) Show that, if δ_N > 0, 1 − Nδ_N > 0 and N^2 δ_N → ∞, then

Π_{m=1}^{N−1} (1 − mδ_N) → 0.

(iii) Let V(n, r) be the number of points in a Hamming ball of radius r in F_2^n and let p(n, N, r) be the probability that N such balls chosen at random do not intersect. By observing that, if m non-intersecting balls are already placed, then an (m + 1)st ball which does not intersect them must certainly not have its centre in one of the balls already placed, show that, if N_n^2 2^{−n} V(n, r_n) → ∞, then p(n, N_n, r_n) → 0.
(iv) Show that, if 2β + H(α) > 1, then p(n, 2^{βn}, αn) → 0.
Thus simply throwing balls down at random will not give very good systems of balls with empty intersections.


26 Exercise Sheet 3

Q 26.1. A message passes through a binary symmetric channel with prob-

ability p of error for each bit and the resulting message is passed through

a second binary symmetric channel which is identical except that there is

probability q of error [0 < p, q < 1/2]. Show that the result behaves as if

it had been passed through a binary symmetric channel with probability of

error to be determined. Show that the probability of error is less than 1/2.

Can we improve the rate at which messages are transmitted (with low error)

by coding, sending through the ﬁrst channel, decoding with error correction

and then recoding, sending through the second channel and decoding with

error correction again or will this produce no improvement on treating the

whole thing as a single channel and coding and decoding only once?
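The combined channel has error probability p(1 − q) + q(1 − p), since a bit emerges wrong exactly when one channel flips it and the other does not. A minimal numerical check (the sample values of p and q are arbitrary):

```python
def combined_error(p, q):
    # A bit emerges wrong iff exactly one of the two channels flips it.
    return p * (1 - q) + q * (1 - p)

# 1/2 - combined_error(p, q) = (1 - 2p)(1 - 2q)/2 > 0 for p, q < 1/2.
for p in (0.01, 0.2, 0.49):
    for q in (0.05, 0.3, 0.49):
        assert 0 < combined_error(p, q) < 0.5
```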

Q 26.2. Write down the weight enumerators of the trivial code (that is to say, F_2^n), the zero code (that is to say, {0}), the repetition code and the simple parity code.

Q 26.3. List the codewords of the Hamming (7,4) code and its dual. Write

down the weight enumerators and verify that they satisfy the MacWilliams

identity.
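The verification can be delegated to a machine as a cross-check (not a substitute for listing the codewords by hand). The sketch below builds the Hamming (7,4) code from the standard parity check matrix with columns 1, ..., 7 in binary, finds its dual by brute force, and tests the MacWilliams identity at a few integer points:

```python
from itertools import product

# The Hamming (7,4) code as the kernel of the parity check matrix whose
# jth column is the binary expansion of j (for j = 1, ..., 7).
H = [[(j + 1) >> b & 1 for j in range(7)] for b in range(3)]

def in_code(w):
    return all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)

V = list(product((0, 1), repeat=7))
C = [w for w in V if in_code(w)]
dual = [y for y in V
        if all(sum(a * b for a, b in zip(x, y)) % 2 == 0 for x in C)]

def W(code, s, t):
    # Weight enumerator: sum of s^w(c) t^(n - w(c)) over codewords c.
    return sum(s**sum(w) * t**(7 - sum(w)) for w in code)

# MacWilliams identity W_{C^perp}(s, t) = 2^{-dim C} W_C(t - s, t + s),
# checked at a few integer points.
for s, t in [(1, 2), (2, 5), (3, 1)]:
    assert 2**4 * W(dual, s, t) == W(C, t - s, t + s)
print(len(C), len(dual))
```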

Q 26.4. (a) Show that if C is linear, then so are its extension C^+, truncation C^- and puncturing C', provided the symbol chosen to puncture by is 0. Give an example to show that C' may not be linear if we puncture by 1.
(b) Show that extension followed by truncation does not change a code. Is this true if we replace 'truncation' by 'puncturing'?
(c) Give an example where puncturing reduces the information rate and an example where puncturing increases the information rate.
(d) Show that the minimum distance of the parity extension C^+ is the least even integer n with n ≥ d(C).
(e) Show that the minimum distance of the truncation C^- is d(C) or d(C) − 1 and that both cases can occur.
(f) Show that puncturing cannot decrease the minimum distance, but give examples to show that the minimum distance can stay the same or increase.

Q 26.5. If C_1 and C_2 are linear codes of appropriate type with generator matrices G_1 and G_2, write down a generator matrix for C_1|C_2.

Q 26.6. Show that the weight enumerator of RM(d, 1) is

y^{2^d} + (2^{d+1} − 2) x^{2^{d−1}} y^{2^{d−1}} + x^{2^d}.


Q 26.7. (i) Show that every codeword in RM(d, d − 1) has even weight.
(ii) Show that RM(m, m − r − 1) ⊆ RM(m, r)^⊥.
(iii) By considering dimension, or otherwise, show that RM(m, r) has dual code RM(m, m − r − 1).

Q 26.8. (Exercises 8.6 and 8.7.) We show that, even if 2^n / V(n, e) is an integer, no perfect code may exist.
(i) Verify that

2^90 / V(90, 2) = 2^78.

(ii) Suppose that C is a perfect 2 error correcting code of length 90 and size 2^78. Explain why we may suppose, without loss of generality, that 0 ∈ C.
(iii) Let C be as in (ii) with 0 ∈ C. Consider the set

X = {x ∈ F_2^90 : x_1 = 1, x_2 = 1, d(0, x) = 3}.

Show that, corresponding to each x ∈ X, we can find a unique c(x) ∈ C such that d(c(x), x) = 2.
(iv) Continuing with the argument of (iii), show that

d(c(x), 0) = 5

and that c_i(x) = 1 whenever x_i = 1. If y ∈ X, find the number of solutions to the equation c(x) = c(y) with x ∈ X and, by considering the number of elements of X, obtain a contradiction.
(v) Conclude that there is no perfect [90, 2^78] code.
(vi) Show that V(23, 3) is a power of 2. (In this case a perfect code exists, called the binary Golay code.)
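Parts (i) and (vi) are short computations; the helper below is a sketch of them:

```python
from math import comb

def V(n, r):
    # Number of points in a Hamming ball of radius r in F_2^n.
    return sum(comb(n, i) for i in range(r + 1))

print(V(90, 2))   # 1 + 90 + 4005 = 4096 = 2^12, so 2^90 / V(90, 2) = 2^78
print(V(23, 3))   # 1 + 23 + 253 + 1771 = 2048 = 2^11
```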

Q 26.9. [The MacWilliams identity for binary codes] Let C ⊆ F_2^n be a linear code of dimension k.
(i) Show that

Σ_{x∈C} (−1)^{x·y} = 2^k if y ∈ C^⊥, and 0 if y ∉ C^⊥.

(ii) If t ∈ R, show that

Σ_{y∈F_2^n} t^{w(y)} (−1)^{x·y} = (1 − t)^{w(x)} (1 + t)^{n−w(x)}.

(iii) By using parts (i) and (ii) to evaluate

Σ_{x∈C} ( Σ_{y∈F_2^n} (−1)^{x·y} (s/t)^{w(y)} )

in two different ways, obtain the MacWilliams identity

W_{C^⊥}(s, t) = 2^{−dim C} W_C(t − s, t + s).

Q 26.10. An erasure is a digit which has been made unreadable in trans-

mission. Why are they easier to deal with than errors? Find a necessary and

suﬃcient condition on the parity check matrix for it to be always possible

to correct t erasures. Find a necessary and suﬃcient condition on the parity

check matrix for it never to be possible to correct t erasures (ie whatever

message you choose and whatever t erasures are made the recipient cannot

tell what you sent).

Q 26.11. Consider the collection K of polynomials

a_0 + a_1 ω

with a_j ∈ F_2, manipulated subject to the usual rules of polynomial arithmetic and to the further condition

1 + ω + ω^2 = 0.

Show, by finding a generator and writing out its powers, that K* = K \ {0} is a cyclic group under multiplication, and deduce that K is a finite field.
[Of course, this follows directly from general theory but direct calculation is not uninstructive.]
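The direct calculation can be mirrored in a few lines. The sketch below (an illustration; the pair (a_0, a_1) stands for a_0 + a_1 ω, and the condition 1 + ω + ω^2 = 0 gives the reduction ω^2 = 1 + ω) lists the powers of ω and shows they run through every non-zero element:

```python
def mul(x, y):
    # (a0 + a1 w)(b0 + b1 w) = a0 b0 + (a0 b1 + a1 b0) w + a1 b1 w^2,
    # then reduce using w^2 = 1 + w over F_2.
    (a0, a1), (b0, b1) = x, y
    c0 = (a0 * b0 + a1 * b1) % 2            # w^2 contributes 1
    c1 = (a0 * b1 + a1 * b0 + a1 * b1) % 2  # w^2 contributes w
    return (c0, c1)

w = (0, 1)
powers = [(1, 0)]          # w^0 = 1
for _ in range(3):
    powers.append(mul(powers[-1], w))
print(powers)   # w has order 3, so K* is cyclic of order 3
```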

Q 26.12. (i) Identify the cyclic codes of length n corresponding to each of the polynomials 1, X − 1 and X^{n−1} + X^{n−2} + ... + X + 1.

(ii) Show that there are three cyclic codes of length 7 corresponding to

irreducible polynomials of which two are versions of Hamming’s original code.

What are the other cyclic codes?

(iii) Identify the dual codes for each of the codes in (ii).

Q 26.13. (Example 15.14.) Prove the following results.
(i) If K is a field containing F_2, then (a + b)^2 = a^2 + b^2 for all a, b ∈ K.
(ii) If P ∈ F_2[X] and K is a field containing F_2, then P(a)^2 = P(a^2) for all a ∈ K.
(iii) Let K be a field containing F_2 in which X^7 − 1 factorises into linear factors. If β is a root of X^3 + X + 1 in K, then β is a primitive root of unity and β^2 is also a root of X^3 + X + 1.
(iv) We continue with the notation of (iii). The BCH code with {β, β^2} as defining set is Hamming's original (7,4) code.


Q 26.14. Let C be a binary linear code of length n, rank k and distance d.
(i) Show that C contains a codeword x with exactly d non-zero digits.
(ii) Show that n ≥ d + k − 1.
(iii) Prove that truncating C on the non-zero digits of x produces a code C' of length n − d, rank k − 1 and distance d' ≥ ⌈d/2⌉.
[Hint: To show d' ≥ ⌈d/2⌉, consider, for y ∈ C, the coordinates where x_j = y_j and the coordinates where x_j ≠ y_j.]
(iv) Show that

n ≥ d + Σ_{u=1}^{k−1} ⌈d/2^u⌉.

Why does (iv) imply (ii)? Give an example where n > d + k − 1.

Q 26.15. Implement the secret sharing method of page 57 with k = 2, n = 3, x_j = j + 1, p = 7, a_0 = S = 2, a_1 = 3. Check directly that any two people can find S but no single individual can.
If we take k = 3, n = 4, p = 6, x_j = j + 1, show that the first two members and the fourth member of the Faculty Board will be unable to determine S uniquely. Why does this not invalidate our method?
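A sketch of the k = 2, n = 3 computation (the parameters are those given in the question; the interpolation helper is mine):

```python
from itertools import combinations

p = 7                      # working modulus (prime)
a0, a1 = 2, 3              # secret S = a0 and random coefficient a1
points = [2, 3, 4]         # evaluation points x_j = j + 1

# Each participant's share is the value of a0 + a1*x at their point.
shares = {x: (a0 + a1 * x) % p for x in points}

def recover(x1, x2):
    # Two shares determine the line, hence its value at 0:
    # S = (y1*x2 - y2*x1) / (x2 - x1)  (mod p).
    y1, y2 = shares[x1], shares[x2]
    inv = pow(x2 - x1, p - 2, p)   # inverse mod p by Fermat's little theorem
    return (y1 * x2 - y2 * x1) * inv % p

for x1, x2 in combinations(points, 2):
    assert recover(x1, x2) == a0   # every pair finds S = 2
```

A single share, by contrast, is consistent with every possible secret: for each candidate S there is a choice of a_1 producing that share.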


27 Exercise Sheet 4

Q 27.1. (Exercise 18.2.) Show that the decimal expansion of a rational

number must be a recurrent expansion. Give a bound for the period in terms

of the quotient. Conversely, by considering geometric series, or otherwise,

show that a recurrent decimal represents a rational number.

Q 27.2. A binary non-linear feedback register of length 4 has defining relation

x_{n+1} = x_n x_{n−1} + x_{n−3}.

Show that the state space contains 4 cycles of lengths 1, 2, 4 and 9.
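The 16 states can be enumerated by machine as a check on the hand computation (the state convention below is an assumption consistent with the defining relation; note the map is invertible, so the state space really does decompose into cycles):

```python
from itertools import product

def step(state):
    # state = (x_{n-3}, x_{n-2}, x_{n-1}, x_n); the defining relation
    # x_{n+1} = x_n x_{n-1} + x_{n-3} (mod 2) gives the next state.
    a, b, c, d = state
    return (b, c, d, (d * c + a) % 2)

seen, lengths = set(), []
for start in product((0, 1), repeat=4):
    if start in seen:
        continue
    s, n = start, 0
    while True:                    # step is a bijection, so we must return
        seen.add(s)
        s = step(s)
        n += 1
        if s == start:
            break
    lengths.append(n)
print(sorted(lengths))
```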

Q 27.3. A binary LFR was used to generate the following stream

110001110001 . . .

Recover the feedback polynomial by the Berlekamp–Massey method. [The

LFR has length 4 but you should work through the trials for length r for

1 ≤ r ≤ 4.]
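As a cross-check on the Berlekamp–Massey working (this brute-force search is not the trial-by-trial method the question asks you to carry out), one can simply test every length-4 linear recurrence against the stream:

```python
from itertools import product

stream = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1]

# Search for (c1, c2, c3, c4) with
# x_n = c1 x_{n-1} + c2 x_{n-2} + c3 x_{n-3} + c4 x_{n-4}  (mod 2)
# holding for every position the stream determines.
solutions = []
for c in product((0, 1), repeat=4):
    if all(sum(ci * stream[n - 1 - i] for i, ci in enumerate(c)) % 2
           == stream[n] for n in range(4, len(stream))):
        solutions.append(c)
print(solutions)
```

The stream has period 6, and the unique consistent length-4 recurrence reproduces it indefinitely.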

Q 27.4. (Exercise 16.5.) Consider the linear recurrence

x_n = a_0 x_{n−d} + a_1 x_{n−d+1} + ... + a_{d−1} x_{n−1}    (⋆)

with a_j ∈ F_2 and a_0 ≠ 0.
(i) Suppose K is a field containing F_2 such that the auxiliary polynomial C has a root α in K. Show that x_n = α^n is a solution of ⋆ in K.
(ii) Suppose K is a field containing F_2 such that the auxiliary polynomial C has d distinct roots α_1, α_2, ..., α_d in K. Show that the general solution of ⋆ in K is

x_n = Σ_{j=1}^d b_j α_j^n

for some b_j ∈ K. If x_0, x_1, ..., x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.
(iii) Work out the first few lines of Pascal's triangle modulo 2. Show that the functions f_j : Z → F_2 given by

f_j(n) = (n choose j) mod 2

are linearly independent in the sense that

Σ_{j=0}^m b_j f_j(n) = 0

for all n implies b_j = 0 for 0 ≤ j ≤ m.
(iv) Suppose K is a field containing F_2 such that the auxiliary polynomial C factorises completely into linear factors. If the root α_u has multiplicity m(u) [1 ≤ u ≤ q], show that the general solution of ⋆ in K is

x_n = Σ_{u=1}^q Σ_{v=0}^{m(u)−1} b_{u,v} (n choose v) α_u^n

for some b_{u,v} ∈ K. If x_0, x_1, ..., x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.

Q 27.5. Consider the recurrence relation

u_{n+p} + Σ_{j=0}^{n−1} c_j u_{j+p} = 0

over a field (if you wish, you may take the field to be R, but the algebra is the same for all fields). We suppose c_0 ≠ 0. Write down an n × n matrix M such that

(u_1, u_2, ..., u_n)^T = M (u_0, u_1, ..., u_{n−1})^T.

Find the characteristic and minimal polynomials for M. Would your answers be the same if c_0 = 0?

Q 27.6. (Exercise 18.9.) One of the most confidential German codes (called FISH by the British) involved a complex mechanism which the British found could be simulated by two loops of paper tape of length 1501 and 1497. If k_n = x_n + y_n, where x_n is a stream of period 1501 and y_n is a stream of period 1497, what is the longest possible period of k_n? How many consecutive values of k_n would you need to find the underlying linear feedback register using the Berlekamp–Massey method if you did not have the information given in the question? If you had all the information given in the question, how many values of k_n would you need? (Hint: look at x_{n+1497} − x_n.)
You have shown that, given k_n for sufficiently many consecutive n, we can find k_n for all n. Can you find x_n for all n?
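The first answer rests on the fact that the sum of streams of periods 1501 and 1497 has period dividing their least common multiple, and the two lengths are coprime. A one-line check:

```python
from math import gcd

a, b = 1501, 1497
assert gcd(a, b) == 1       # coprime, so lcm(a, b) = a * b
print(a * b // gcd(a, b))   # longest possible period: 2246997
```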

Q 27.7. We work in F_2. I have a secret sequence k_1, k_2, ... and a message p_1, p_2, ..., p_N. I transmit p_1 + k_1, p_2 + k_2, ..., p_N + k_N and then, by error, transmit p_1 + k_2, p_2 + k_3, ..., p_N + k_{N+1}. Assuming that you know this and that my message makes sense, how would you go about finding my message? Can you now decipher other messages sent using the same part of my secret sequence?


Q 27.8. Give an example of a homomorphism attack on an RSA code. Show

in reasonable detail that the Elgamal signature scheme defeats it.

Q 27.9. I announce that I shall be using the Rabin–Williams scheme with

modulus N. My agent in X’Dofdro sends me a message m (with 1 ≤ m ≤

N − 1) encoded in the requisite form. Unfortunately, my cat eats the piece

of paper on which the prime factors of N are recorded, so I am unable to

decipher it. I therefore ﬁnd a new pair of primes and announce that I shall

be using the Rabin–Williams scheme with modulus N' > N. My agent now

recodes the message and sends it to me again.

The dreaded SNDO of X’Dofdro intercept both code messages. Show that

they can ﬁnd m. Can they decipher any other messages sent to me using

only one of the coding schemes?

Q 27.10. Extend the Diﬃe–Hellman key exchange system to cover three

participants in a way that is likely to be as secure as the two party scheme.

Extend the system to n parties in such a way that they can compute their

common secret key by at most n^2 − n communications of ‘Diﬃe–Hellman type

numbers’. (The numbers p and g of our original Diﬃe-Hellman system are

known by everybody in advance.) Show that this can be done using at most

2n − 2 communications by including several ‘Diﬃe–Hellman type numbers’

in one message.

Q 27.11. St Abacus, who established written Abacan, was led, on theological

grounds, to use an alphabet containing only three letters A, B and C and to

avoid the use of spaces. (Thus an Abacan book consists of a single word.) In

modern Abacan, the letter A has frequency .5 and the letters B and C both

have frequency .25. In order to disguise this, the Abacan Navy uses codes in

which the (3r + i)th number is x_{3r+i} + y_i modulo 3 [0 ≤ i ≤ 2], where x_j = 0 if the jth letter of the message is A, x_j = 1 if the jth letter of the message is B, x_j = 2 if the jth letter of the message is C, and y_0, y_1 and y_2 are the numbers 0, 1, 2 in some order.

Radio interception has picked up the following message.

120022010211121001001021002021

Although nobody in Naval Intelligence reads Abacan, it is believed that

the last letter of the message will be B if the Abacan ﬂeet is at sea. The

Admiralty are desperate to know the last letter and send a representative to

your rooms in Baker Street to ask your advice. Give it.


Q 27.12. Consider the bit exchange scheme proposed at the end of Section 19. Suppose that we replace STEP 5 by: Alice sends Bob r_1 and r_2 and Bob checks that

r_1^2 ≡ r_2^2 ≡ m mod n.

Suppose further that Alice cheats by choosing 3 primes p_1, p_2, p_3, and sending Bob p = p_1 and q = p_2 p_3. Explain how Alice can shift the odds of heads to 3/4. (She has other ways of cheating, but you are only asked to consider this one.)

Q 27.13. (i) Consider the Fermat code given by the following procedure. 'Choose N a large prime. Choose e and d so that a^{de} ≡ a mod N, encrypt using the publicly known N and e, decrypt using the secret d.' Why is this not a good code?
(ii) In textbook examples of the RSA code we frequently see e = 65537. How many multiplications are needed to compute a^e modulo N?
(iii) Why is it unwise to choose primes p and q with p − q small when forming N = pq for the RSA method? Factorise 1763.
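Part (iii) can be illustrated with Fermat's factorisation method, one standard way of exploiting a small p − q (the question does not prescribe a method; this is my choice of illustration). If p and q are close, then N = ((p+q)/2)^2 − ((p−q)/2)^2 with a tiny second square, so very few candidates need to be tried:

```python
from math import isqrt

def fermat_factor(N):
    # Search upwards from ceil(sqrt(N)) for a with a^2 - N a perfect
    # square b^2; then N = (a - b)(a + b).
    a = isqrt(N)
    if a * a < N:
        a += 1
    while True:
        b2 = a * a - N
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

print(fermat_factor(1763))   # 1763 = 42^2 - 1^2 = 41 * 43
```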

Q 27.14. The University of Camford is proud of the excellence of its privacy system CAMSEC. To advertise this fact to the world, the Vice-Chancellor decrees that the university telephone directory should bear on its cover a number N (a product of two very large secret primes) and each name in the University Directory should be followed by their personal encryption number e_i. The Vice-Chancellor knows all the secret decryption numbers d_i but gives these out on a need to know basis only. (Of course each member of staff must know their personal decryption number, but they are instructed to keep it secret.) Messages a from the Vice-Chancellor to members of staff are encrypted in the standard manner as a^{e_i} modulo N and decrypted as b^{d_i} modulo N.
(i) The Vice-Chancellor sends a message to all members of the University. An outsider intercepts the encrypted message to individuals i and j where e_i and e_j are coprime. How can the outsider read the message? Can she read other messages sent from the Vice-Chancellor to the ith member of staff only?
(ii) By means of a phone tapping device, the Professor of Applied Numismatics (number u in the University Directory) has intercepted messages from the Vice-Chancellor to her hated rival, the Professor of Pure Numismatics (number v in the University Directory). Explain why she can decode them. What moral should be drawn?

Q 27.15. The Poldovian Embassy uses a one-time pad to communicate with the notorious international spy Ivanovich Smith. The messages are coded in the obvious way. (If the pad has C, the 3rd letter of the alphabet, and the message has I, the 9th, then the encrypted message has L, the (3 + 9)th. Work modulo 26.) Unknown to them, the person whom they employ to carry the messages is actually the MI5 agent 'Union' Jack Caruthers in disguise. MI5 are on the verge of arresting Ivanovich when 'Union' Jack is given the message

LRPFOJQLCUD.

Caruthers knows that the actual message is

FLYXATXONCE

and suggests that 'the boffins change things a little' so that Ivanovich deciphers the message as

REMAINXHERE.

The only boffin available is you. Advise MI5.
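A sketch of the boffinry (letters are coded 0-25 here; the question's 1-26 convention differs only by a constant, which cancels in all the differences). The courier cannot read the pad, but adding target − plain to the ciphertext makes the unchanged pad decipher to the target:

```python
def to_nums(s):
    return [ord(ch) - ord('A') for ch in s]

def to_str(ns):
    return ''.join(chr(n % 26 + ord('A')) for n in ns)

cipher = "LRPFOJQLCUD"
plain  = "FLYXATXONCE"    # what Ivanovich would have deciphered
target = "REMAINXHERE"    # what MI5 want him to read

# Forge: add (target - plain) mod 26 to each ciphertext letter.
delta  = [(t - p) % 26 for t, p in zip(to_nums(target), to_nums(plain))]
forged = to_str([c + d for c, d in zip(to_nums(cipher), delta)])

# Check against the pad (unknown to the courier), recovered here from
# cipher = plain + pad:
pad = [(c - p) % 26 for c, p in zip(to_nums(cipher), to_nums(plain))]
assert to_str([f - k for f, k in zip(to_nums(forged), pad)]) == target
print(forged)
```

The point of the question is exactly this malleability: a one-time pad gives perfect secrecy but no integrity.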

Q 27.16. Suppose that X and Y are independent random variables taking values in Z_n. Show that

H(X + Y) ≥ max{H(X), H(Y)}.

Why is this remark of interest in the context of one-time pads?
Does this result remain true if X and Y need not be independent? Give a proof or counterexample.

Q 27.17. I use the Elgamal signature scheme described on page 77. Instead

of choosing k at random, I increase the value used by 2 each time I use

it. Show that it will often be possible to ﬁnd my privacy key u from two

successive messages.

Q 27.18. Conﬁdent in the unbreakability of RSA, I write the following.

What mistakes have I made?

0000001 0000000 0002048 0000001 1391142

0000000 0177147 1033288 1391142 1174371.

Advise me on how to increase the security of messages.

Q 27.19. Let K be the finite field with 2^d elements and primitive root α. (Recall that α is a generator of the cyclic group K \ {0} under multiplication.) Let T : K → F_2 be a non-zero linear map. (Here we treat K as a vector space over F_2.)
(i) Show that the map S : K × K → F_2 given by S(x, y) = T(xy) is a symmetric bilinear form. Show further that S is non-degenerate (that is to say, S(x, y) = 0 for all x implies y = 0).
(ii) Show that the sequence x_n = T(α^n) is the output from a linear feedback register of length at most d. (Part (iii) shows that it must be exactly d.)
(iii) Show that the period of the system (that is to say, the minimum period of the sequence x_n) is 2^d − 1. Explain briefly why this is best possible.


Contents

1 Codes and alphabets
2 Huffman's algorithm
3 More on prefix-free codes
4 Shannon's noiseless coding theorem
5 Non-independence
6 What is an error correcting code?
7 Hamming's breakthrough
8 General considerations
9 Some elementary probability
10 Shannon's noisy coding theorem
11 A holiday at the race track
12 Linear codes
13 Some general constructions
14 Polynomials and fields
15 Cyclic codes
16 Shift registers
17 A short homily on cryptography
18 Stream ciphers
19 Asymmetric systems
20 Commutative public key systems
21 Trapdoors and signatures
22 Quantum cryptography
23 Further reading
24 Exercise Sheet 1
25 Exercise Sheet 2
26 Exercise Sheet 3
27 Exercise Sheet 4

1 Codes and alphabets

Originally, a code was a device for making messages hard to read. The study of such codes and their successors is called cryptography and will form the subject of the last quarter of these notes. However, in the 19th century the optical and then the electrical telegraph made it possible to send messages speedily, but only after they had been translated from ordinary written English or French into a string of symbols. (See The Count of Monte Cristo and various Napoleonic sea stories. A statue to the inventor of the optical telegraph (semaphore) was put up in Paris in 1893 but melted down during World War II and not replaced (http://hamradio.nikhef.nl/tech/rtty/chappe/). In the parallel universe of Disc World the clacks is one of the wonders of the Century of the Anchovy.)

The best known of the early codes is the Morse code used in electronic telegraphy. We think of it as consisting of dots and dashes but, in fact, it had three symbols: dot, dash and pause, which we write as •, − and ∗. Morse assigned a code word consisting of a sequence of symbols to each of the letters of the alphabet and each digit. Here are some typical examples.

A → •−∗       B → −•••∗     C → −•−•∗
D → −••∗      E → •∗         F → ••−•∗
O → −−−∗      S → •••∗       7 → −−•••∗

The symbols of the original message would be encoded and the code words sent in sequence, as in

SOS → •••∗ −−−∗ •••∗,

and then decoded in sequence at the other end to recreate the original message.

Exercise 1.1. Decode −•−•∗ −−−∗ −••∗ •∗.

Morse’s system was intended for human beings. However. we have two alphabets A and B and a coding function c : A → B ∗ where B∗ consists of all ﬁnite sequences of elements of B. 2. this would give 128 possibilities. Once machines took over the business of encoding. We demand that c∗ is injective. c(1) = 00 show that c is injective but c∗ is not. 6} and B = {0. If A∗ consists of all ﬁnite sequences of elements of A.4. If c(0) = 0. Here are some typical examples A → 1000001 a → 1100001 + → 0101011 B → 1000010 b → 1100010 ! → 0100001 C → 1000011 c → 1100011 7 → 0110111 Exercise 1. Decode 110001111000011100010. (i) Let A = B = {0.3. A variable length code need not be decodable even if c is injective. More generally. This uses two symbols 0 and 1 and all code words have seven symbols. 4 . Exercise 1. Show that there is a variable length coding c such that c is injective and all code words have length 2 or less. All the code words in ASCII have the same length (so we have a ﬁxed length code). 1}. . c(an ). Explain why (if c is injective) any ﬁxed length code is decodable. there is a family of variable length codes which are decodable in a natural way. We call codes for which c∗ is injective decodable. but 0000000 and 1111111 are not used. (ii) Let A = {1. we are more interested in the collection of code words C = c(A) than the coding function c. an ) = c(a1 )c(a2 ) . other systems developed. . so there are 126 code words allowing the original message to contain a greater variety of symbols than Morse code. . but this is not true for the Morse code (so we have a variable length code). If we look at the code words of Morse code and the ASCII code. . For many purposes. A very inﬂuential one called ASCII was developed in the 1960s. Show that there is no decodable coding c such that all code words have length 2 or less. since otherwise it is possible to produce two messages which become indistinguishable once encoded. 4. 3. 5. 1}. 
then the encoding function c∗ : A∗ → B∗ is given by c∗ (a1 a2 . we observe a very important diﬀerence.2. Exercise 1. In principle. Encode b7!.

Definition 1.5. Let B be an alphabet. We say that a finite subset C of B* is prefix-free if, whenever w ∈ C is an initial sequence of w' ∈ C, then w = w'. If c : A → B* is a coding function, we say that c is prefix-free if c is injective and c(A) is prefix-free.

If c is prefix-free then, not only is c* injective, but we can decode messages on the fly. Suppose that we receive a sequence b_1, b_2, .... The moment we have received some c(a_1), we know that the first message was a_1 and we can proceed to look for the second message. (For this reason prefix-free codes are sometimes called instantaneous codes or self punctuation codes.)

Exercise 1.6. Let A = {0, 1, 2, 3}, B = {0, 1}. If c, c~ : A → B* are given by

c(0) = 0      c~(0) = 0
c(1) = 10     c~(1) = 01
c(2) = 110    c~(2) = 011
c(3) = 111    c~(3) = 111

show that c is prefix-free, but c~ is not. By thinking about the way c~ is obtained from c, or otherwise, show that c~* is injective.

Exercise 1.7. Why is every injective fixed length code automatically prefix-free?

From now on, unless explicitly stated otherwise, c will be injective and the codes used will be prefix-free. In section 3 we show that we lose nothing by confining ourselves to prefix-free codes.

2 Huffman's algorithm

An electric telegraph is expensive to build and maintain. However good a telegraphist was, he could only send or receive a limited number of dots and dashes each minute. (This is why Morse chose a variable length code.) It is possible to increase the rate at which symbols are sent and received by using machines, but the laws of physics (backed up by results in Fourier analysis) place limits on the number of symbols that can be correctly transmitted over a given line. (The slowest rates were associated with undersea cables.) Customers were therefore charged so much a letter or, more usually, so much a word (with a limit on the permitted word length), leading to a prose style known as telegraphese: 'Arrived Venice. Streets flooded. Advise.' Obviously it made
.) Exercise 1. 1. Suppose that we receive a sequence b1 . By thinking about the way c is obtained ˜ ˜ ∗ from c.Deﬁnition 1. .’ 2 5 . b2 . 1}.5. but c is not.

. as often happens. Given n messages M1 . Of course. To deal with this problem we add an extra constraint.2. However. . Google ‘telegraphic codes and message practice. The problem is interesting as it stands. . Today messages are usually sent in as binary sequences like 01110010 . then we can assign each message a diﬀerent string of m zeros and ones (usually called bits) and each message will cost mK cents where K is the cost of sending one bit. 3 6 . . . Mn such that the probability that Mj will be chosen is pj . one message (such as ‘nothing to report’) is much more frequent than any other then it may be cheaper on average to assign it a shorter code word even at the cost of lengthening the other code words. Instead of writing about the problem. but the transmission of each digit still costs money.sense to have books of ‘telegraph codes’ in which one ﬁve letter combination. Problem 2. If we know that there are n possible messages that can be sent and that n ≤ 2m . ﬁnd distinct code words Cj consisting of lj bits so that the expected cost n K j=1 pj lj of sending the code word corresponding to the chosen message is minimised. he solved it completely. Problem 2. ‘FTCGI’ meant ‘are you willing to split the diﬀerence?’ and another ‘FTCSU’ meant ‘cannot see any diﬀerence’3 . 1870-1945’ for lots of examples. . Given n messages M1 . In 1951 Huﬀman was asked to write an essay on this problem as an end of term university exam. . say. M2 . If the telegraph company insisted on ordinary words you got codes like ‘FLIRT’ for ‘quality of crop good’. . . ﬁnd a preﬁx-free collection of code words Cj consisting of lj bits so that the expected cost n K j=1 pj lj of sending the code word corresponding to the chosen message is minimised.1. but we have not taken into account the fact that a variable length code may not be decodable. . M2 . we suppose K > 0. If. this may not be the best way of saving money. . Mn such that the probability that Mj will be chosen is pj .

2 with n messages.4.) Combining messages in the suggested way. M2 . . 3]. 4. . Since the problem is trivial when n = 2 (give M1 the code word 0 and M2 the code word 1) this gives us what computer programmers and logicians call a recursive solution. Exercise 2.2] = 01 . Mj has probability j/10 for 1 ≤ j ≤ 4. Order the messages so that p1 ≥ p2 ≥ · · · ≥ pn .Theorem 2. Mn−1 such that Mj′ has ′ probability pj for 1 ≤ j ≤ n − 2. The reader is strongly advised to do a slightly more complicated example like the next. 7 . Suppose n = 4. Working backwards. message 1 Example 2. C4 = 1 C[1. the original problem is solved by ′ assigning Mj the code word Cj for 1 ≤ j ≤ n − 2 and Mn−1 the code word ′ ′ consisting of Cn−1 followed by 0 and Mn the code word consisting of Cn−1 followed by 1. (Note that we do not bother to reorder messages. Apply Huﬀman’s algorithm. .5. C3 = 00 C1 = 011. for example. but it is very easy to follow the steps of Huﬀman’s algorithm ‘by hand’. (Note that the algorithm is very speciﬁc about the labelling of the code words so that.3] = 0 . Apply Huﬀman’s algorithm. . . the eﬀects of Huﬀman’s algorithm will be most marked when a few messages are highly probable. 3. . 4 [[1. 2]. Recursive programs are often better adapted to machines than human beings. [Huﬀman’s algorithm] The following algorithm solves Problem 2. If ′ Cj is the code word corresponding to Mj′ . we get 1. 2. Suppose Mj has probability j/45 for 1 ≤ j ≤ 9. As we indicated earlier. C2 = 010. we get C[[1. 4 [1. but Mn−1 has probability pn−1 + pn . . Solution.3. 3. . 2]. . ′ ′ ′ Solve the problem with n − 1 messages M1 . .2].

Exercise 2.6. Suppose n = 64, M_1 has probability 1/2, M_2 has probability 1/4 and M_j has probability 1/248 for 3 ≤ j ≤ 64. Explain why, if we use code words of equal length, then the length of a code word must be at least 6. By using the ideas of Huffman's algorithm (you should not need to go through all the steps), obtain a set of code words such that the expected length of a code word sent is not more than 3.

We observe that reading a code word from a prefix-free code is like climbing a tree with 0 telling us to take the left branch and 1 the right branch. Thus, for example, the code word 00101 is represented by the leaf found by following left branch, left branch, right branch, left branch, right branch. The fact that the code is prefix-free tells us that each code word may be represented by a leaf at the end of a final branch.

Whilst doing the exercises the reader must already have been struck by the fact that minor variations in the algorithm produce different codes. (Note, for example, that interchanging the role of 0 and 1 will produce another Huffman type code.) In fact, although the Huffman algorithm will always produce a best code (in the sense of Problem 2.2), there may be other equally good codes which could not be obtained in this manner.

Exercise 2.7. Suppose n = 4, M_1 has probability .23, M_2 has probability .24, M_3 has probability .26 and M_4 has probability .27. Show that any assignment of the code words 00, 01, 10 and 11 produces a best code in the sense of Problem 2.2.

The fact that the Huffman code may not be the unique best solution means that we need to approach the proof of Theorem 2.3 with caution. The next lemma contains the essence of our proof of Theorem 2.3.

Lemma 2.8. (i) If we have a best code then it will split into a left branch and a right branch at every stage. (ii) If we label every branch by the sum of the probabilities of all the leaves that spring from it then, if we have a best code, every branch belonging to a particular stage of growth will have at least as large a number associated with it as any branch belonging to a later stage. (iii) If we have a best code then interchanging the probabilities of leaves belonging to the last stage (ie the longest code words) still gives a best code. (iv) If we have a best code then two of the leaves with the lowest probabilities will appear at the last stage. (v) There is a best code in which two of the leaves with the lowest probabilities are neighbours (have code words differing only in the last place).

In order to use the Huffman algorithm we need to know the probabilities of the n possible messages. Suppose we do not. After we have sent k messages we will know that message M_j has been sent k_j times, and so will the recipient of the message. If we decide to use a Huffman code for the next message, it is not unreasonable (lifting our hat in the direction of the Reverend Thomas Bayes) to take

p_j = (k_j + 1)/(k + n).

Provided the recipient knows the exact version of the Huffman algorithm that we use, she can reconstruct our Huffman code and decode our next message. Variants of this idea are known as 'Huffman-on-the-fly' and form the basis of the kind of compression programs used in your computer. Notice, however, that whilst Theorem 2.3 is an examinable theorem, the contents of this paragraph form a non-examinable plausible statement.

3 More on prefix-free codes

It might be thought that Huffman's algorithm says all that is to be said on the problem it addresses. However, there are two important points that need to be considered. The first is whether we could get better results by using codes which are not prefix-free. The object of this section is to show that this is not the case. As in section 1, we consider two alphabets A and B and a coding function c : A → B* (where, as we said earlier, B* consists of all finite sequences of elements of B). The elements of B* are called words. For most of this course B = {0, 1}, but in this section we allow B to have D elements.

Lemma 3.1. [Kraft's inequality 1] If a prefix-free code C consists of n words C_j of length l_j, then Σ_{j=1}^n D^{-l_j} ≤ 1.

Lemma 3.2. [Kraft's inequality 2] Given strictly positive integers l_j satisfying Σ_{j=1}^n D^{-l_j} ≤ 1, we can find a prefix-free code C consisting of n words C_j of length l_j.

Proof. We give an inductive construction for an appropriate prefix-free code. Take l_1 ≤ l_2 ≤ · · · ≤ l_n. Start by choosing C_1 to be any code word of length l_1.
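Kraft's inequality (Lemma 3.1) is easy to check numerically for any given system of lengths. A small illustrative helper (ours, not part of the notes):

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of Kraft's inequality: sum of D**(-l_j) over the code."""
    return sum(D ** -l for l in lengths)

# The prefix-free code of Example 2.4 (011, 010, 00, 1) has lengths 3, 3, 2, 1:
print(kraft_sum([3, 3, 2, 1]))   # 1/8 + 1/8 + 1/4 + 1/2 = 1.0
```

Equality with 1 means the code tree is 'full': no further code word can be added without destroying the prefix-free property.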

Suppose that we have found a collection of r prefix-free code words C_k of length l_k [1 ≤ k ≤ r]. If r = n we are done. If not, consider all possible code words of length l_{r+1}. Of these, D^{l_{r+1} − l_k} will have prefix C_k, so at most (in fact, exactly)

Σ_{k=1}^r D^{l_{r+1} − l_k}

will have one of the code words already selected as prefix. By hypothesis

Σ_{k=1}^r D^{l_{r+1} − l_k} = D^{l_{r+1}} Σ_{k=1}^r D^{−l_k} < D^{l_{r+1}}.

Since there are D^{l_{r+1}} possible code words of length l_{r+1}, there is at least one 'good code word' which does not have one of the code words already selected as prefix. Choose one of the good code words as C_{r+1} and restart the induction.

The method used in the proof is called a 'greedy algorithm' because we just try to do the best we can at each stage without considering future consequences.

Lemma 3.1 is pretty but not deep. MacMillan showed that the same inequality applies to all decodable codes. The proof is extremely elegant and (after one has thought about it long enough) natural.

Lemma 3.3. [The MacMillan inequality] If a decodable code C consists of n words C_j of length l_j, then Σ_{j=1}^n D^{−l_j} ≤ 1.

Using Lemma 3.2 we get the immediate corollary.

Theorem 3.4. If there exists a decodable code C consisting of n words C_j of length l_j, then there exists a prefix-free code C′ consisting of n words C′_j of length l_j.

Thus, if we are only concerned with the length of code words, we need only consider prefix-free codes.
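The greedy construction in the proof of Lemma 3.2 can be carried out literally. An illustrative sketch (function name ours):

```python
from itertools import product

def prefix_free_from_lengths(lengths, D=2):
    """Greedy construction from the proof of Lemma 3.2.

    Given lengths in increasing order satisfying Kraft's inequality,
    pick for each length the first word of that length that has no
    previously chosen word as a prefix.
    """
    assert lengths == sorted(lengths)
    chosen = []
    for l in lengths:
        for word in product(range(D), repeat=l):
            w = ''.join(map(str, word))
            if not any(w.startswith(c) for c in chosen):
                chosen.append(w)
                break
        else:  # no 'good code word' exists: Kraft's inequality must fail
            raise ValueError("lengths violate Kraft's inequality")
    return chosen

print(prefix_free_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```

The counting in the proof guarantees that the inner loop always finds a good word when Kraft's inequality holds.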

4 Shannon's noiseless coding theorem

In the previous section we indicated that there was a second question we should ask about Huffman's algorithm. We know that Huffman's algorithm is best possible, but we have not discussed how good the best possible should be. Let us restate our problem. (In this section we allow the coding alphabet B to have D elements.)

Problem 4.1. Given n messages M_1, M_2, ..., M_n such that the probability that M_j will be chosen is p_j, find a decodable code C whose code words C_j consist of l_j bits so that the expected cost

K Σ_{j=1}^n p_j l_j

of sending the code word corresponding to the chosen message is minimised.

In view of Lemma 3.2 (any system of lengths satisfying Kraft's inequality is associated with a prefix-free and so decodable code) and Lemma 3.3 (any decodable code satisfies Kraft's inequality), Problem 4.1 reduces to an abstract minimising problem.

Problem 4.2. Suppose p_j ≥ 0 for 1 ≤ j ≤ n and Σ_{j=1}^n p_j = 1. Find strictly positive integers l_j minimising Σ_{j=1}^n p_j l_j subject to Σ_{j=1}^n D^{−l_j} ≤ 1.

Problem 4.2 is hard because we restrict the l_j to be integers. If we drop the restriction we end up with a problem in Part IB variational calculus.

Problem 4.3. Suppose p_j ≥ 0 for 1 ≤ j ≤ n and Σ_{j=1}^n p_j = 1. Find strictly positive real numbers x_j minimising Σ_{j=1}^n p_j x_j subject to Σ_{j=1}^n D^{−x_j} ≤ 1.

Calculus solution. Observe that decreasing any x_k decreases Σ_{j=1}^n p_j x_j and increases Σ_{j=1}^n D^{−x_j}. Thus we may demand

Σ_{j=1}^n D^{−x_j} = 1.

The Lagrangian is

L(x, λ) = Σ_{j=1}^n p_j x_j − λ Σ_{j=1}^n D^{−x_j}.

Since

∂L/∂x_j = p_j + (λ log D) D^{−x_j},

we know that, at any stationary point, D^{−x_j} = K_0(λ) p_j for some K_0(λ) > 0. Since Σ_{j=1}^n D^{−x_j} = 1, our original problem will have a stationarising solution when D^{−x_j} = p_j, that is to say

x_j = − log p_j / log D,

and then

Σ_{j=1}^n p_j x_j = − Σ_{j=1}^n p_j log p_j / log D.

It is not hard to convince oneself that the stationarising solution just found is, in fact, minimising, but it is an unfortunate fact that IB variational calculus is suggestive rather than conclusive. The next two exercises (which will be done in lectures and form part of the course) provide a rigorous proof.

Exercise 4.4. (i) Show that log t ≤ t − 1 for t > 0 with equality if and only if t = 1. (ii) [Gibbs' inequality] Suppose that p_j, q_j > 0 and Σ_{j=1}^n p_j = Σ_{j=1}^n q_j = 1. By applying (i) with t = q_j/p_j, show that

Σ_{j=1}^n p_j log p_j ≥ Σ_{j=1}^n p_j log q_j

with equality if and only if p_j = q_j.
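Gibbs' inequality from Exercise 4.4(ii) can be tested numerically. An illustrative sketch (ours, not part of the notes):

```python
import math

def cross_entropy_sum(p, q):
    """Compute sum_j p_j log q_j (natural logarithm)."""
    return sum(pj * math.log(qj) for pj, qj in zip(p, q))

p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]

# Gibbs' inequality: sum p_j log p_j >= sum p_j log q_j, equality iff p = q.
print(cross_entropy_sum(p, p) - cross_entropy_sum(p, q))  # strictly positive here
```

The difference printed is the (natural-log) relative entropy of p with respect to q; it vanishes exactly when the two distributions coincide.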

Exercise 4.5. We use the notation of Problem 4.3. (i) Show that, if x*_j = − log p_j / log D, then x*_j > 0 and Σ_{j=1}^n D^{−x*_j} = 1. (ii) Suppose that y_j > 0 and Σ_{j=1}^n D^{−y_j} = 1. Set q_j = D^{−y_j}. By using Gibbs' inequality from Exercise 4.4 (ii), show that

Σ_{j=1}^n p_j x*_j ≤ Σ_{j=1}^n p_j y_j

with equality if and only if y_j = x*_j for all j.

Exercise 4.6. (Memory jogger.) Let a, b > 0. Show that log_a b = log b / log a.

The result of Problem 4.3 is so important that it gives rise to a definition.

Definition 4.7. Let A be a non-empty finite set and A a random variable taking values in A. If A takes the value a with probability p_a we say that the system has Shannon entropy[4] (or information entropy)

H(A) = − Σ_{a∈A} p_a log_2 p_a.

Analysts use logarithms to the base e, but the importance of two-symbol alphabets means that communication theorists often use logarithms to the base 2.

Theorem 4.8. Let A and B be finite alphabets and let B have D symbols. If A is an A-valued random variable, then any decodable code c : A → B* must satisfy

E|c(A)| ≥ H(A) / log_2 D.

[4] It is unwise for the beginner, and may or may not be fruitless for the expert, to seek a link with entropy in physics.
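Definition 4.7 translates directly into code. A sketch (ours):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(A) = -sum p_a log_2 p_a (Definition 4.7).

    Terms with p_a = 0 are skipped, matching the convention h(0) = 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0: a fair coin carries one bit per toss
```

A certain outcome, entropy([1.0]), gives 0: it carries no information at all.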

Here |c(A)| denotes the length of c(A). Notice that the result takes a particularly simple form when D = 2.

In Problem 4.3 the x_j are just positive real numbers, but in Problem 4.2 the l_j are integers. Choosing l_j as close as possible to the best x_j may not give the best l_j, but it is certainly worth a try.

Theorem 4.9. [Shannon–Fano encoding] Let A and B be finite alphabets and let B have D symbols. If A is an A-valued random variable, then there exists a prefix-free (so decodable) code c : A → B* which satisfies

E|c(A)| ≤ 1 + H(A) / log_2 D.

Proof. By Lemma 3.2 (which states that, given lengths satisfying Kraft's inequality, we can construct an associated prefix-free code), it suffices to find strictly positive integers l_a such that

Σ_{a∈A} D^{−l_a} ≤ 1, but Σ_{a∈A} p_a l_a ≤ 1 + H(A) / log_2 D.

If we take l_a to be the smallest integer no smaller than − log_D p_a, that is to say, l_a = ⌈− log_D p_a⌉, then these conditions are satisfied and we are done.

It is very easy to use the method just indicated to find an appropriate code. (Such codes are called Shannon–Fano codes[5].)

Exercise 4.10. (i) Let A = {1, 2, 3, 4}. Suppose that the probability that letter k is chosen is k/10. Use your calculator[6] to find ⌈− log_2 p_k⌉ and write down an appropriate Shannon–Fano code c. Show[7] that the entropy is approximately 1.85, but that E|c(A)| = 2.4. (ii) We found a Huffman code c_h for the system in Example 2.4. Check that E|c_h(A)| = 1.9 and that these results are consistent with our previous theorems.

[5] Wikipedia and several other sources give a definition of Shannon–Fano codes which is definitely inconsistent with that given here. The point of view adopted here means that for some problems there may be more than one Shannon–Fano code. Within a Cambridge examination context you may assume that Shannon–Fano codes are those considered here. Fano was the professor who set the homework for Huffman.
[6] If you have no calculator, your computer has a calculator program. If you have no computer, use log tables. If you are on a desert island, just think.
[7] Unless you are on a desert island, in which case the calculations are rather tedious.
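The Shannon–Fano recipe l_a = ⌈− log_D p_a⌉ and the numbers in Exercise 4.10 can be checked in a few lines (a sketch, ours):

```python
import math

def shannon_fano_lengths(probs, D=2):
    """l_a = ceil(-log_D p_a), as in the proof of Theorem 4.9."""
    return [math.ceil(-math.log(p, D)) for p in probs]

probs = [0.1, 0.2, 0.3, 0.4]              # letter k chosen with probability k/10
lengths = shannon_fano_lengths(probs)      # [4, 3, 2, 2]
expected = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print(lengths, expected, round(H, 2))
```

This reproduces Exercise 4.10: H ≈ 1.85 and E|c(A)| = 2.4, sandwiched between Theorem 4.8's lower bound and Theorem 4.9's upper bound 1 + H ≈ 2.85, with Huffman's 1.9 in between.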

Putting Theorems 4.8 and 4.9 together, we get the following remarkable result.

Theorem 4.11. [Shannon's noiseless coding theorem] Let A and B be finite alphabets and let B have D symbols. If A is an A-valued random variable, then any decodable code c which minimises E|c(A)| satisfies

H(A)/log_2 D ≤ E|c(A)| ≤ 1 + H(A)/log_2 D.

In particular, Huffman's code c_h for two symbols satisfies

H(A) ≤ E|c_h(A)| ≤ 1 + H(A).

Waving our hands about wildly, we may say that 'a system with low Shannon entropy is highly organised and, knowing the system, it is usually quite easy to identify an individual from the system'.

Exercise 4.12. (i) Sketch h(t) = −t log t for 0 ≤ t ≤ 1. (We define h(0) = 0.) (ii) Let

Γ = {p ∈ R^n : p_j ≥ 0, Σ_{j=1}^n p_j = 1}

and let H : Γ → R be defined by H(p) = Σ_{j=1}^n h(p_j). Find the maximum and minimum of H and describe the points where these values are attained. (iii) If n = 2^r + s with 0 ≤ s < 2^r and p_j = 1/n, describe the Huffman code c_h for two symbols and verify directly that (with the notation of Theorem 4.11) H(A) ≤ E|c_h(A)| ≤ 1 + H(A).

Exercise 4.13. The notorious Trinity gang has just been rounded up and Trubshaw of the Yard wishes to identify the leader (or Master, as he is called). Sam the Snitch makes the following offer. Presented with any collection of members of the gang, he will (by a slight twitch of his left ear) indicate if the Master is among them. However, in view of the danger involved, he demands ten pounds for each such encounter. Trubshaw believes that the probability of the jth member of the gang being the Master is p_j [1 ≤ j ≤ n] and wishes to minimise the expected drain on the public purse. Advise him.

5 Non-independence

(This section is non-examinable.) In the previous sections we discussed codes c : A → B* such that, if a letter A ∈ A was chosen according to some random law, E|c(A)| was about as small as possible. If we transmit sequences of letters by forming them into longer words and coding the words, we say we have a block code. If we choose A_1, A_2, ..., A_n independently according to the same law, then it is not hard to convince oneself that E|c*(A_1 A_2 A_3 ... A_n)| = nE|c(A)| will be as small as possible. However, in re*l lif* th* let*ers a*e often no* i*d*p***ent. It is sometimes possible to send messages more efficiently using this fact.

Exercise 5.1. Suppose that we have a sequence X_j of random variables taking the values 0 and 1. Suppose that X_1 = 1 with probability 1/2 and X_{j+1} = X_j with probability .99 independent of what has gone before. (i) Suppose we wish to send ten successive bits X_j X_{j+1} ... X_{j+9}. Show that if we associate the sequence of ten zeros with 0, the sequence of ten ones with 10 and any other sequence a_0 a_1 ... a_9 with 11a_0 a_1 ... a_9, we have a decodable code which on average requires about 5/2 bits to transmit the sequence. (ii) Suppose we wish to send the bits X_j X_{j+10^6} X_{j+2×10^6} ... X_{j+9×10^6}. Explain why any decodable code will require on average at least 10 bits to transmit the sequence. (You need not do detailed computations.)

In more advanced courses it is shown how to define entropy for systems like the one discussed in Exercise 5.1 (that is to say Markov chains) and that, provided we take long enough blocks, we can recover an analogue of Theorem 4.11 (the noiseless coding theorem). It is plausible that the longer the blocks, the less important the effects of non-independence. However, in the real world, the problem lies deeper. Presented with a photograph, we can instantly see that it represents Lena wearing a hat. If a machine reads the image pixel by pixel, it will have great difficulty recognising much. Clearly, it ought to be possible to describe the photograph with many fewer bits than are required to describe each pixel separately but, equally clearly, apart from the fact that the distribution of pixels is 'non-random' or has 'low entropy' (to use the appropriate hand-waving expressions), we do not know enough about them to exploit this. A method that works well on black and white photographs may fail on colour photographs, and a method that works well on photographs of faces may work badly when applied to photographs of trees.
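The 'about 5/2 bits' claim in Exercise 5.1(i) can be computed exactly under the stated chain. A sketch (the analysis, not the code, is in the notes):

```python
def expected_bits(p_stay=0.99):
    """Expected length of the code in Exercise 5.1(i):
    '0' for ten zeros, '10' for ten ones, '11' + the ten bits otherwise."""
    p_const = p_stay ** 9          # nine consecutive 'stay' transitions
    p_zeros = 0.5 * p_const        # the chain is symmetric, so each value
    p_ones = 0.5 * p_const         # of X_j has marginal probability 1/2
    p_other = 1 - p_const
    return 1 * p_zeros + 2 * p_ones + 12 * p_other

print(expected_bits())   # about 2.4, i.e. roughly 5/2 bits
```

Almost all of the ten-bit blocks are constant runs, so the two short code words carry nearly all the weight.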

Engineers have a clever way of dealing with this problem. Suppose we have a sequence x_j of zeros and ones produced by some random process. Someone who believes that they partially understand the nature of the process builds us a prediction machine which, given the sequence x_1, x_2, ..., x_j so far, predicts the next term will be x′_{j+1}. Now set

y_{j+1} ≡ x_{j+1} − x′_{j+1} mod 2.

If we are given the sequence y_1, y_2, ..., we can recover the x_j inductively, using the prediction machine and the formula

x_{j+1} ≡ y_{j+1} + x′_{j+1} mod 2.

If the prediction machine is good, then the sequence of y_j will consist mainly of zeros and there will be many ways of encoding the sequence as (on average) a much shorter code word. (For example, if we arrange the sequence in blocks of fixed length, many of the possible blocks will have very low probability, so Huffman's algorithm will be very effective.) Ideally, lossless compression should lead to a signal indistinguishable (from a statistical point of view) from a random signal in which the value of each bit is independent of the value of all the others. Build a better mousetrap, and the world will beat a path to your door. Build a better prediction machine, and the world will beat your door down.

There is a further real world complication. Engineers distinguish between reversible 'lossless compression' and irreversible 'lossy compression'. Medical and satellite pictures must be transmitted with no loss of data; however, this is only possible in certain applications. For compact discs, where bits are cheap, the sound recorded can be reconstructed exactly. For mobile phones, where bits are expensive, there can be greater loss of data because users do not demand anywhere close to perfection. For digital sound broadcasting, the engineers make use of knowledge of the human auditory system (for example, the fact that we cannot make out very soft noise in the presence of loud noises) to produce a result that might sound perfect (or nearly so) to us, but which is, in fact, not. For digital TV, the situation is still more striking, with reduction in data content from film to TV of anything up to a factor of 60. Notice that lossless coding can be judged by absolute criteria, but the merits of lossy coding can only be judged subjectively.

As an indication of the kind of problem involved, consider TV pictures. If we know that what is going to be transmitted is 'head and shoulders' or 'tennis matches' or 'cartoons', it is possible to obtain extraordinary compression ratios by 'tuning' the compression method to the expected pictures.
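The prediction-machine transform described above round-trips losslessly whatever the predictor does. Here is a sketch with a deliberately naive 'repeat the last bit' predictor of our own devising, suited to a slowly changing source like that of Exercise 5.1 (subtraction mod 2 is the same as addition mod 2, so both directions use +):

```python
def residuals(bits, predict):
    """y_{j+1} = x_{j+1} - x'_{j+1} (mod 2), where x'_{j+1} = predict(prefix)."""
    return [(x + predict(bits[:j])) % 2 for j, x in enumerate(bits)]

def reconstruct(ys, predict):
    """Invert the transform: x_{j+1} = y_{j+1} + x'_{j+1} (mod 2)."""
    xs = []
    for y in ys:
        xs.append((y + predict(xs)) % 2)
    return xs

# Hypothetical predictor: guess that the next bit repeats the last one.
predict = lambda prefix: prefix[-1] if prefix else 0

xs = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
ys = residuals(xs, predict)
print(ys)                                   # mostly zeros if the predictor is good
assert reconstruct(ys, predict) == xs       # lossless round trip
```

The residual stream has a 1 only where the source changes value, which is exactly the rare event for this source.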

Here, digital TV encoders merely expect the picture to consist of blocks which move at nearly constant velocity, remaining more or less unchanged from frame to frame[8]. Such tuning works well on the expected pictures, but then changes from what is expected can be disastrous.

6 What is an error correcting code?

In the introductory Section 1, we discussed 'telegraph codes' in which one five letter combination, 'QWADR' say, meant 'please book quiet room for two' and another, 'QWNDR', meant 'please book cheapest room for one'. Obviously, an error of one letter in this code could have unpleasant consequences[9].

Today, we transmit and store long strings of binary sequences, but face the same problem that some digits may not be transmitted or stored correctly. We suppose that the string is the result of data compression and so, although the string may have non-trivial statistical properties, we do not know enough about them to exploit them. (If we knew how to exploit any statistical regularity, we could build a prediction device and compress the data still further. As we said at the end of the last section, we know that after compression the signal still has non-trivial statistical properties, but, at present, we do not know enough to exploit this fact.) Because of this, we shall assume that we are asked to consider a collection of m messages, each of which is equally likely.

Our model is the following. When the 'source' produces one of the m possible messages, µ_i say, it is fed into a 'coder' which outputs a string c_i of n binary digits. The string is then transmitted one digit at a time along a 'communication channel'. Each digit has probability p of being mistransmitted (so that 0 becomes 1 or 1 becomes 0) independently of what happens to the other digits [0 ≤ p < 1/2]. The transmitted message is then passed through a 'decoder' which either produces a message µ_j (where we hope that j = i) or an error message, and passes it on to the 'receiver'. The technical term for our model is the binary symmetric channel (binary because we use two symbols, symmetric because the probability of error is the same whichever symbol we use).

Exercise 6.1. Why do we not consider the case 1 ≥ p > 1/2? What if p = 1/2?

[8] Watch what happens when things go wrong.
[9] This is a made up example, since compilers of such codes understood the problem.
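A toy simulation of the binary symmetric channel just described (a sketch; the seed and sizes are arbitrary choices of ours):

```python
import random

def bsc_transmit(bits, p, rng):
    """Pass bits through a binary symmetric channel: each digit is
    flipped with probability p, independently of the others."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(42)          # fixed seed so the experiment is repeatable
sent = [0] * 100_000
received = bsc_transmit(sent, 0.1, rng)
rate = sum(received) / len(received)
print(rate)                      # close to p = 0.1
```

By the weak law of large numbers (met again in Section 9), the empirical flip rate concentrates around p as the string grows.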

For most of the time we shall concentrate our attention on a code C ⊆ {0, 1}^n consisting of the codewords c_i. (Thus we use a fixed length code.) We say that C has size m = |C|. If m is large then we can send a large number of possible messages (that is to say, we can send more information) but, as m increases, it becomes harder to distinguish between different messages when errors occur. At one extreme, if m = 1, errors cause us no problems (since there is only one message) but no information is transmitted (since there is only one message). At the other extreme, if m = 2^n, we can transmit lots of messages but any error moves us from one codeword to another. We are led to the following rather natural definition.

Definition 6.2. The information rate of C is (log_2 m)/n.

Note that, since m ≤ 2^n, the information rate is never greater than 1. Notice also that the values of the information rate when m = 1 and m = 2^n agree with what we might expect.

How should our decoder work? We have assumed that all messages are equally likely and that errors are independent (this would not be true if, for example, errors occurred in bursts[10]). Under these assumptions, a reasonable strategy for our decoder is to guess that the codeword sent is one which differs in the fewest places from the string of n binary digits received. Here and elsewhere the discussion can be illuminated by the simple notion of a Hamming distance.

Definition 6.3. If x, y ∈ {0, 1}^n, we write

d(x, y) = Σ_{j=1}^n |x_j − y_j|

and call d(x, y) the Hamming distance between x and y.

Lemma 6.4. The Hamming distance is a metric.

[10] In the paradigm case of mobile phones, the properties of the transmission channel are constantly changing and are not well understood. For the purposes of this course we note that this problem could be tackled by permuting the 'bits' of the message so that 'bursts are spread out'. In theory, we could do better than this by using the statistical properties of such bursts to build a prediction machine, but, in practice, this is rarely possible. One way round this is 'frequency hopping' in which several users constantly swap transmission channels, 'dividing bursts among users'. (Here the main restriction on the use of permutation is that it introduces time delays.) One desirable property of codes for mobile phone users is that they should 'fail gracefully', so that as the error rate for the channel rises the error rate for the receiver should not suddenly explode.
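Definition 6.3 and the metric properties of Lemma 6.4 in code (an illustrative sketch, checked on one triple of strings):

```python
def hamming(x, y):
    """Hamming distance (Definition 6.3): the number of places where x and y differ."""
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

x, y, z = (0, 0, 1, 0, 1), (1, 0, 1, 1, 1), (1, 1, 1, 1, 0)
print(hamming(x, y))                                   # 2

# The metric axioms of Lemma 6.4, verified on this triple:
assert hamming(x, x) == 0
assert hamming(x, y) == hamming(y, x)                  # symmetry
assert hamming(x, z) <= hamming(x, y) + hamming(y, z)  # triangle inequality
```

Of course a few examples do not prove Lemma 6.4; the triangle inequality follows coordinate by coordinate from |x_j − z_j| ≤ |x_j − y_j| + |y_j − z_j|.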

We now do some very simple IA probability.

Lemma 6.5. We work with the coding and transmission scheme described above. Let c ∈ C and x ∈ {0, 1}^n. (i) If d(c, x) = r, then

Pr(x received given c sent) = p^r (1 − p)^{n−r}.

(ii) If d(c, x) = r, then

Pr(c sent given x received) = A(x) p^r (1 − p)^{n−r},

where A(x) does not depend on r or c. (iii) If c′ ∈ C and d(c′, x) ≥ d(c, x), then

Pr(c sent given x received) ≥ Pr(c′ sent given x received),

with equality if and only if d(c′, x) = d(c, x).

This lemma justifies our use, throughout what follows, of the so-called maximum likelihood decoding rule.

Definition 6.6. The maximum likelihood decoding rule states that a string x ∈ {0, 1}^n received by a decoder should be decoded as (one of) the codeword(s) at the smallest Hamming distance from x.

Notice that, although this decoding rule is mathematically attractive, it may be impractical if C is large, and there is often no known way of finding the codeword at the smallest distance from a particular x in an acceptable number of steps. (We can always make a complete search through all the members of C, but unless there are very special circumstances this is likely to involve an unacceptable amount of work.)

7 Hamming's breakthrough

Although we have used simple probabilistic arguments to justify it, the maximum likelihood decoding rule will often enable us to avoid probabilistic considerations (though not in the very important part of this course concerned with Shannon's noisy coding theorem) and concentrate on algebra and combinatorics. The spirit of most of the course is exemplified in the next two definitions.

Definition 7.1. We say that C is d error detecting if changing up to d digits in a codeword never produces another codeword.
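The maximum likelihood decoding rule of Definition 6.6, implemented as the complete search mentioned above (a sketch; practical only for tiny codes):

```python
def decode(received, codebook):
    """Return a codeword at the smallest Hamming distance from the
    received string (Definition 6.6), by exhaustive search."""
    dist = lambda x, y: sum(a != b for a, b in zip(x, y))
    return min(codebook, key=lambda c: dist(c, received))

# A length-3 repetition code: any single flipped digit is corrected.
codebook = ['000', '111']
print(decode('010', codebook))   # '000'
print(decode('110', codebook))   # '111'
```

Note that `min` silently breaks ties, matching the '(one of)' in Definition 6.6.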

Definition 7.2. We say that C is e error correcting if, knowing that a string of n binary digits differs from some codeword of C in at most e places, we can deduce the codeword.

Here are some simple schemes. Some of them use alphabets with more than two symbols, but the principles remain the same. (Here and elsewhere ⌊α⌋ is the largest integer N ≤ α and ⌈α⌉ is the smallest integer M ≥ α.)

Repetition coding of length n. We take codewords of the form c = (c, c, c, ..., c) with c = 0 or c = 1. The code C is n − 1 error detecting and ⌊(n − 1)/2⌋ error correcting. The maximum likelihood decoder chooses the symbol that occurs most often. Unfortunately, the information rate is 1/n, which is rather low[11].

The Cambridge examination paper code. Each candidate is asked to write down a Candidate Identifier of the form 1234A, 1235B, 1236C, ... (the eleven[12] possible letters are repeated cyclically) and a desk number. The first four numbers in the Candidate Identifier identify the candidate uniquely. If the letter written by the candidate does not correspond to the first four numbers, the candidate is identified by using the desk number.

Exercise 7.3. Show that if the candidate makes one error in the Candidate Identifier, then this will be detected. Would this be true if there were 9 possible letters repeated cyclically? Would this be true if there were 12 possible letters repeated cyclically? Give reasons. Show that, if we also use the Desk Number, then the combined code Candidate Number/Desk Number is one error correcting.

The paper tape code. Here and elsewhere, it is convenient to give {0, 1} the structure of the field F_2 = Z_2 by using arithmetic modulo 2. The codewords have the form c = (c_1, c_2, ..., c_n) with c_1, c_2, ..., c_{n−1} freely chosen elements of F_2 and c_n (the check digit) the element of F_2 which gives

c_1 + c_2 + · · · + c_{n−1} + c_n = 0.

The resulting code C is 1 error detecting since, if x ∈ F_2^n is obtained from c ∈ C by making a single error, we have

x_1 + x_2 + · · · + x_{n−1} + x_n = 1.

[11] Compare the chorus 'Oh no John, no John, no John, no'.
[12] My guess.
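The paper tape check digit at work (an illustrative sketch, ours):

```python
def add_check_digit(bits):
    """Paper tape code: append the check digit that makes the sum 0 (mod 2)."""
    return bits + [sum(bits) % 2]

def detects_error(word):
    """sum(word) mod 2 is 1 exactly when an odd number of digits is wrong."""
    return sum(word) % 2 == 1

c = add_check_digit([1, 0, 1, 1, 0, 1, 0])   # 8 places per line, as on paper tape
assert not detects_error(c)
c[3] ^= 1                     # a single error is detected ...
assert detects_error(c)
c[5] ^= 1                     # ... but a second error passes unnoticed
assert not detects_error(c)
```

This is exactly why the code is 1 error detecting but not error correcting: the syndrome says that something is wrong, but not where, and an even number of errors cancels out.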

However, it is not error correcting since, if x_1 + x_2 + · · · + x_{n−1} + x_n = 1, there are n codewords y with Hamming distance d(x, y) = 1. The information rate is (n − 1)/n. Traditional paper tape had 8 places per line, each of which could have a punched hole or not, so n = 8.

The ISBN. If you look at the inner title page of almost any book published between 1970 and 2006, you will find its International Standard Book Number (ISBN). The ISBN uses single digits selected from 0, 1, ..., 8, 9 and X representing 10. Each ISBN consists of nine such digits a_1, a_2, ..., a_9 followed by a single check digit a_10 chosen so that

10a_1 + 9a_2 + · · · + 2a_9 + a_10 ≡ 0 mod 11. (*)

(In more sophisticated language, our code C consists of those elements a ∈ F_11^10 such that Σ_{j=1}^{10} (11 − j)a_j = 0.)

Exercise 7.4. (i) Find a couple of books[13] and check that (∗) holds for their ISBNs[14]. (ii) Show that (∗) will not work if you make a mistake in writing down one digit of an ISBN. (iii) Show that (∗) may fail to detect two errors. (iv) Show that (∗) will not work if you interchange two distinct adjacent digits (a transposition error). (v) Does (iv) remain true if we replace 'adjacent' by 'different'? Errors of type (ii) and (iv) are the most common in typing[15]. In communication between publishers and booksellers, both sides are anxious that errors should be detected, but would prefer the other side to query errors rather than to guess what the error might have been. (vi) After January 2007, the appropriate ISBN is a 13 digit number x_1 x_2 ... x_13 with each digit selected from 0, 1, ..., 8, 9 and the check digit x_13 computed by using the formula

x_13 ≡ −(x_1 + 3x_2 + x_3 + 3x_4 + · · · + x_11 + 3x_12) mod 10.

Show that we can detect single errors. Give an example to show that we cannot detect all transpositions.

[13] In case of difficulty, your college library may be of assistance.
[14] X is only used in the check digit place.
[15] Thus a syllabus for an earlier version of this course contained the rather charming misprint of 'snydrome' for 'syndrome'.
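Condition (*) in code. The ISBN used below is synthetic, constructed by us to satisfy (*); it is not the ISBN of any real book:

```python
def isbn10_ok(isbn):
    """Check condition (*): 10*a1 + 9*a2 + ... + 2*a9 + a10 = 0 (mod 11).
    'X' stands for 10 and may only appear in the check digit place."""
    digits = [10 if ch == 'X' else int(ch) for ch in isbn]
    return sum((11 - j) * a for j, a in enumerate(digits, start=1)) % 11 == 0

print(isbn10_ok('123456789X'))   # True: 1..9 followed by check digit X (= 10)
print(isbn10_ok('213456789X'))   # False: an adjacent transposition is caught
```

Both detections in Exercise 7.4(ii) and (iv) work because 11 is prime: a single error changes the sum by a nonzero multiple of a nonzero weight, and a transposition of distinct digits changes it by (a_i − a_j) times the difference of two weights, neither of which can be 0 mod 11.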

Hamming had access to an early electronic computer but was low down in the priority list of users. He would submit his programs encoded on paper tape to run over the weekend, but often he would have his tape returned on Monday because the machine had detected an error in the tape. 'If the machine can detect an error,' he asked himself, 'why can the machine not correct it?' and he came up with the following scheme.

Hamming's original code. We work in F_2^7. The codewords c are chosen to satisfy the three conditions

c_1 + c_3 + c_5 + c_7 = 0
c_2 + c_3 + c_6 + c_7 = 0
c_4 + c_5 + c_6 + c_7 = 0.

By inspection, we may choose c_3, c_5, c_6 and c_7 freely, and then c_1, c_2 and c_4 are completely determined. The information rate is thus 4/7.

Suppose that we receive the string x ∈ F_2^7. We form the syndrome (z_1, z_2, z_4) ∈ F_2^3 given by

z_1 = x_1 + x_3 + x_5 + x_7
z_2 = x_2 + x_3 + x_6 + x_7
z_4 = x_4 + x_5 + x_6 + x_7.

If x is a codeword, then (z_1, z_2, z_4) = (0, 0, 0). If c is a codeword and the Hamming distance d(x, c) = 1, then the place in which x differs from c is given by z_1 + 2z_2 + 4z_4 (using ordinary addition, not addition modulo 2), as may be easily checked using linearity and a case by case study of the seven binary sequences x containing one 1 and six 0s. The Hamming code is thus 1 error correcting.

Exercise 7.5. Suppose we use eight hole tape with the standard paper tape code and the probability that an error occurs at a particular place on the tape (i.e. a hole occurs where it should not or fails to occur where it should) is 10^{-4}. A program requires about 10 000 lines of tape (each line containing eight places) using the paper tape code. Using the Poisson approximation, direct calculation (possible with a hand calculator but really no advance on the Poisson method), or otherwise, show that the probability that the tape will be accepted as error free by the decoder is less than .04%.

Suppose now that we use the Hamming scheme (making no use of the last place in each line). Explain why the program requires about 17 500 lines of tape, but that any particular line will be correctly decoded with probability about 1 − (21 × 10^{-8}) and the probability that the entire program will be correctly decoded is better than 99.6%.
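Hamming's original code with its syndrome decoder, exactly as described above (a sketch; 0-indexed Python lists stand in for the 1-indexed coordinates of the text):

```python
def syndrome(x):
    """Syndrome (z1, z2, z4) of a received word x in F_2^7."""
    z1 = (x[0] + x[2] + x[4] + x[6]) % 2
    z2 = (x[1] + x[2] + x[5] + x[6]) % 2
    z4 = (x[3] + x[4] + x[5] + x[6]) % 2
    return z1, z2, z4

def correct(x):
    """If at most one digit is wrong, z1 + 2*z2 + 4*z4 names the wrong place."""
    z1, z2, z4 = syndrome(x)
    place = z1 + 2 * z2 + 4 * z4       # ordinary addition, not mod 2
    y = list(x)
    if place:
        y[place - 1] ^= 1
    return y

# Build a codeword: choose c3, c5, c6, c7 freely; then c1, c2, c4 are forced.
c3, c5, c6, c7 = 1, 0, 1, 1
c1 = (c3 + c5 + c7) % 2
c2 = (c3 + c6 + c7) % 2
c4 = (c5 + c6 + c7) % 2
c = [c1, c2, c3, c4, c5, c6, c7]
assert syndrome(c) == (0, 0, 0)

# Flip each digit in turn; the decoder always recovers c.
for i in range(7):
    x = list(c)
    x[i] ^= 1
    assert correct(x) == c
```

The loop at the end is the 'case by case study' mentioned in the text, carried out by machine: flipping place i produces syndrome value i, so the decoder knows exactly which digit to repair.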

Hamming's scheme is easy to implement. It took a little time for his company to realise what he had done[16], but they were soon trying to patent it. In retrospect, the idea of an error correcting code seems obvious (Hamming's scheme had actually been used as the basis of a Victorian party trick) and indeed two or three other people discovered it independently, but Hamming and his co-discoverers had done more than find a clever answer to a question. They had asked an entirely new question and opened a new field for mathematics and engineering.

The times were propitious for the development of the new field. Before 1940, error correcting codes would have been luxuries, solutions looking for problems. After 1950, with the rise of the computer and new communication technologies, they became necessities. Mathematicians and engineers returning from wartime duties in code breaking, code making and general communications problems were primed to grasp and extend the ideas. The mathematical engineer Claude Shannon may be considered the presiding genius of the new field.

The reader will observe that data compression shortens the length of our messages by removing redundancy, and Hamming's scheme (like all error correcting codes) lengthens them by introducing redundancy. This is true, but data compression removes redundancy which we do not control and which is not useful to us, and error correction coding then replaces it with carefully controlled redundancy which we can use. The reader will also note an analogy with ordinary language. The idea of data compression is illustrated by the fact that many common words are short[17]. On the other hand the redund of ordin lang makes it poss to understa it even if we do no catch everyth that is said.

8 General considerations

How good can error correcting and error detecting[18] codes be? The following discussion is a natural development of the ideas we have already discussed. Later, in our discussion of Shannon's noisy coding theorem, we shall see another and deeper way of looking at the question.

Definition 8.1. The minimum distance d of a code is the smallest Hamming

[16] Experienced engineers came away from working demonstrations muttering 'I still don't believe it'.
[17] Note how 'horseless carriage' becomes 'car' and 'telephone' becomes 'phone'.
[18] If the error rate is low and it is easy to ask for the message to be retransmitted, it may be cheaper to concentrate on error detection. If there is no possibility of retransmission (as in long term data storage), we have to concentrate on error correction.

distance between distinct code words. We call a code of length n, size m and distance d an [n, m, d] code. Less briefly, a set C ⊆ F_2^n, with |C| = m and

min{d(x, y) : x, y ∈ C, x ≠ y} = d,

is called an [n, m, d] code. By an [n, m] code we shall simply mean a code of length n and size m.

Lemma 8.2. A code of minimum distance d can detect d − 1 errors[19] and correct ⌊(d−1)/2⌋ errors. It cannot detect all sets of d errors and cannot correct all sets of ⌊(d−1)/2⌋ + 1 errors.

This is not as useful as it looks when d is large. It is natural, here and elsewhere, to make use of the geometrical insight provided by the (closed) Hamming ball

B(x, r) = {y : d(x, y) ≤ r}.

Observe that |B(x, r)| = |B(0, r)| for all x and so, writing V(n, r) = |B(0, r)|, we know that V(n, r) is the number of points in any Hamming ball of radius r. A simple counting argument shows that

V(n, r) = Σ_{j=0}^r C(n, j),

where C(n, j) denotes the binomial coefficient.

Theorem 8.3. [Hamming's bound] If a code C is e error correcting, then

|C| ≤ 2^n / V(n, e).

There is an obvious fascination (if not utility) in the search for codes which attain the exact Hamming bound.

[19] If we know that our message is likely to contain many errors, all that an error detecting code can do is confirm our expectations. Error detection is only useful when errors are unlikely.
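V(n, r) and Hamming's bound in code (a sketch, ours):

```python
from math import comb

def V(n, r):
    """Number of points in a Hamming ball of radius r in {0,1}^n."""
    return sum(comb(n, j) for j in range(r + 1))

# Hamming's bound for a 1 error correcting code of length 7:
# |C| <= 2**7 / V(7, 1) = 128 / 8 = 16, attained by Hamming's original code.
print(V(7, 1), 2 ** 7 // V(7, 1))   # 8 16
```

Since Hamming's original code has exactly 2^4 = 16 codewords, it meets this bound with equality.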

Definition 8.4. A code C of length n and size m which can correct e errors is called perfect if

m = 2^n / V(n, e).

Lemma 8.5. Hamming's original code is a [7, 16, 3] code. It is perfect.

It may be worth remarking in this context that, if (as will usually be the case) 2^n/V(n, e) is not an integer, no perfect e error correcting code of length n can exist. Even if 2^n/V(n, e) is an integer, no perfect code may exist. We note also that, if a code which can correct e errors is perfect (i.e. has a perfect packing of Hamming balls of radius e), then the decoder must invariably give the wrong answer when presented with e + 1 errors.

Exercise 8.6. (i) Verify that 2^90 = 2^78 V(90, 2).
(ii) Suppose that C is a perfect 2 error correcting code of length 90 and size 2^78. Explain why we may suppose without loss of generality that 0 ∈ C.
(iii) Let C be as in (ii) with 0 ∈ C. Consider the set

X = {x ∈ F_2^90 : x_1 = 1, x_2 = 1, d(0, x) = 3}.

Show that, corresponding to each x ∈ X, we can find a unique c(x) ∈ C such that d(c(x), x) = 2. Show that d(c(x), 0) = 5 and that c_i(x) = 1 whenever x_i = 1.
(iv) Continuing with the argument of (iii), find, for each x ∈ X, the number of solutions to the equation c(x) = c(y) with y ∈ X and, by considering the number of elements of X, obtain a contradiction.
(v) Conclude that there is no perfect [90, 2^78] code.

The result of Exercise 8.6 was obtained by Golay. Far more importantly, he found another case when 2^n/V(n, e) is an integer and there does exist an associated perfect code (the Golay code).

Exercise 8.7. Show that V(23, 3) is a power of 2.

Unfortunately the proof that the Golay code is perfect is too long to be given in the course.

We obtained the Hamming bound, which places an upper bound on how good a code can be, by a packing argument. A covering argument gives us the GSV (Gilbert, Shannon, Varshamov) bound in the opposite direction. Let us write A(n, d) for the size of the largest code with minimum distance d.
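The arithmetic in part (i) of Exercise 8.6 and in Exercise 8.7 can be checked mechanically. A possible sketch (not part of the notes):

```python
from math import comb

def V(n, r):
    # Volume of a Hamming ball of radius r in F_2^n.
    return sum(comb(n, j) for j in range(r + 1))

# Exercise 8.6 (i): V(90, 2) = 1 + 90 + 4005 = 4096 = 2^12, so 2^78 * V(90, 2) = 2^90.
assert 2**90 == 2**78 * V(90, 2)

# Exercise 8.7: V(23, 3) = 1 + 23 + 253 + 1771 = 2048 = 2^11, a power of 2
# (the case exploited by the Golay code).
v = V(23, 3)
print(v, v == 2**11)
```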

Theorem 8.8. [Gilbert, Shannon, Varshamov] We have

A(n, d) ≥ 2^n / V(n, d − 1).

Until recently there were no general explicit constructions for codes which achieved the GSV bound (i.e. codes whose minimum distance d satisfied the inequality A(n, d)V(n, d − 1) ≥ 2^n). Such a construction was finally found by Garcia and Stichtenoth by using 'Goppa' codes.

Engineers are, of course, interested in 'best codes' of length n for reasonably small values of n, but mathematicians are particularly interested in what happens as n → ∞.

9 Some elementary probability

We recall some elementary probability.

Lemma 9.1. [Tchebychev's inequality] If X is a bounded real valued random variable and a > 0, then

Pr(|X − EX| ≥ a) ≤ (var X)/a².

Lemma 9.2. [Weak law of large numbers] If X_1, X_2, ... is a sequence of independent identically distributed real valued bounded random variables and a > 0, then

Pr(|n^{−1} Σ_{j=1}^{n} X_j − EX_1| ≥ a) → 0

as n → ∞.

Consider the model of a noisy transmission channel used in this course, in which each digit has probability p of being wrongly transmitted independently of what happens to the other digits. Applying the weak law of large numbers, we obtain the following important result.

Theorem 9.3. If ε > 0, then

Pr(number of errors in transmission for message of n digits ≥ (1 + ε)pn) → 0

as n → ∞.
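The covering argument behind the GSV bound can be made concrete: greedily keep words at distance at least d from all words kept so far; when the sweep finishes, every word of F_2^n lies within d − 1 of some kept word, so |C| V(n, d − 1) ≥ 2^n. A small illustrative sketch (the choice n = 7, d = 3 is arbitrary; not part of the notes):

```python
from itertools import product
from math import comb

def V(n, r):
    return sum(comb(n, j) for j in range(r + 1))

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def greedy_code(n, d):
    # Keep a word iff it is at distance >= d from everything kept so far.
    # The balls of radius d - 1 about the kept words then cover F_2^n.
    code = []
    for x in product((0, 1), repeat=n):
        if all(hamming(x, c) >= d for c in code):
            code.append(x)
    return code

C = greedy_code(7, 3)
print(len(C))                      # at least 2^7 / V(7, 2), as GSV guarantees
```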

By Lemma 8.2, a code of minimum distance d can correct ⌊(d − 1)/2⌋ errors. Thus, if we have an error rate p and ε > 0, a code of length n with error correcting capacity ⌈(1 + ε)pn⌉, that is to say of minimum distance ⌈2(1 + ε)pn⌉, will (by Theorem 9.3) fail to correct a transmitted message with probability falling to zero as n → ∞. By definition, the biggest code with minimum distance ⌈2(1 + ε)pn⌉ has size A(n, ⌈2(1 + ε)pn⌉) and so has information rate log_2 A(n, ⌈2(1 + ε)pn⌉)/n. Study of the behaviour of log_2 A(n, nδ)/n will thus tell us how large an information rate is possible in the presence of a given error rate.

Definition 9.4. If 0 < δ < 1/2, we write

α(δ) = lim sup_{n→∞} log_2 A(n, nδ)/n.

Definition 9.5. We define the entropy function H : [0, 1] → R by H(0) = H(1) = 0 and

H(t) = −t log_2(t) − (1 − t) log_2(1 − t)    [0 < t < 1].

Exercise 9.6. (i) We have already met Shannon entropy in Definition 4.7. Give a simple system such that, using the notation of that definition, H(A) = H(t).
(ii) Sketch H. What is the value of H(1/2)?

Theorem 9.7. With the definitions just given,

1 − H(δ) ≤ α(δ) ≤ 1 − H(δ/2)

for all 0 ≤ δ < 1/2.

Using the Hamming bound (Theorem 8.3) and the GSV bound (Theorem 8.8), we see that Theorem 9.7 follows at once from the following result.

Theorem 9.8. We have

log_2 V(n, nδ)/n → H(δ)

as n → ∞.

Our proof of Theorem 9.8 depends, as one might expect, on a version of Stirling's formula. We only need the very simplest version proved in IA.

Lemma 9.9 (Stirling). We have

log_e n! = n log_e n − n + O(log n).
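The convergence in Theorem 9.8 can be observed numerically. A sketch (the value δ = 0.3 is an arbitrary illustration; not part of the notes):

```python
from math import comb, log2

def H(t):
    # The entropy function of Definition 9.5.
    if t in (0, 1):
        return 0.0
    return -t * log2(t) - (1 - t) * log2(1 - t)

def rate(n, delta):
    # log2 V(n, n*delta) / n, which Theorem 9.8 says tends to H(delta).
    v = sum(comb(n, j) for j in range(int(n * delta) + 1))
    return log2(v) / n

for n in (10, 100, 1000):
    print(n, rate(n, 0.3), H(0.3))   # the ratio creeps up towards H(0.3) ~ 0.881
```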

We combine this with the remarks that

V(n, nδ) = Σ_{0 ≤ j ≤ nδ} (n choose j)

and that very simple estimates give

(n choose m) ≤ Σ_{0 ≤ j ≤ nδ} (n choose j) ≤ (m + 1)(n choose m),

where m = ⌊nδ⌋.

10 Shannon's noisy coding theorem

In the backstreets of Cambridge (Massachusetts) there is a science museum devoted to the glory of MIT. Since MIT has a great deal of glory and since much thought has gone into the presentation of the exhibits, it is well worth a visit. However, for any mathematician, the highlight is a glass case containing such things as a juggling machine, an electronic calculator²⁰ that uses Roman numerals both externally and internally, the remnants of a machine built to guess which of heads and tails its opponent would choose next²¹ and a mechanical maze running mouse. These objects were built by Claude Shannon.

In his 1937 master's thesis, Shannon showed how to analyse circuits using Boolean algebra and binary arithmetic. During the war he worked on gunnery control and cryptography at Bell labs and in 1948 he published A Mathematical Theory of Communication²². Shannon had several predecessors and many successors, but it is his vision which underlies this course.

Although the GSV bound is very important, Shannon showed that a stronger result can be obtained for the error correcting power of the best long codes. Hamming's bound together with Theorem 9.7 shows that it is not possible to have an information rate greater than 1 − H(δ) for an error rate δ < 1/2. (We shall prove this explicitly in Theorem 10.3.) On the other hand, the GSV bound together with Theorem 9.7 shows that it is

20 THROBAC, the THrifty ROman numeral BAckwards-looking Computer. Google 'MIT Museum', go to 'objects' and then search 'Shannon'.
21 That is to say a prediction machine. Google 'Shannon Mind-Reading Machine' for sites giving demonstrations and descriptions of the underlying program.
22 This beautiful paper is available on the web and in his Collected Works.

always possible to have an information rate greater than 1 − H(2δ) for an error rate δ < 1/4. Although we can use repetition codes to get a positive information rate when 1/4 ≤ δ < 1/2, it looks very hard at first (and indeed second) glance to improve these results. However, Shannon realised that we do not care whether errors arise because of noise in transmission or imperfections in our coding scheme. By allowing our coding scheme to be less than perfect (in this connection, see Question 25.13) we can actually improve the information rate whilst still keeping the error rate low.

Theorem 10.1. [Shannon's noisy coding theorem] Suppose 0 < p < 1/2 and η > 0. Then there exists an n_0(p, η) such that, for any n > n_0, we can find codes of length n which have the property that (under our standard model of a symmetric binary channel with probability of error p) the probability that any codeword is mistaken is less than η and still have information rate 1 − H(p) − η.

Shannon's theorem is a masterly display of the power of elementary probabilistic arguments to overcome problems which appear insuperable by other means²³. However, it merely asserts that good codes exist and gives no means of finding them apart from exhaustive search. More seriously, random codes will have no useful structure and, in the absence of suitable structure, the only way to use them is to 'search through a large dictionary' at the coding end and 'search through an enormous dictionary' at the decoding end.

Exercise 10.2. Why is the dictionary at the decoding end much larger than the dictionary at the coding end?

It should also be noted that n_0(p, η) will be very large when p is close to 1/2.

It is relatively simple to obtain a converse to Shannon's theorem.

Theorem 10.3. Suppose 0 < p < 1/2 and η > 0. Then there exists an n_0(p, η) such that, for any n > n_0,

23 Conway says that in order to achieve success in a mathematical field you must either be first or be clever. As in the case of Shannon, most of those who are first to recognise a new mathematical field are also clever.
it is impossible to ﬁnd codes of length n which have the property that (under our standard model of a symmetric binary channel with probability of error p) the probability that any codeword is mistaken is less than 1/2 and the code has information rate 1 − H(p) + η. Theorem 10. it merely asserts that good codes exist and gives no means of ﬁnding them apart from exhaustive search. see Question 25.always possible to have an information rate greater than 1 − H(2δ) for an error rate δ < 1/4. 23 30 . in the absence of suitable structure.3. for any n > n0 .

As might be expected, Shannon's theorem and its converse extend to more general noisy channels (in particular, those where the noise is governed by a Markov chain M). It is possible to define the entropy H(M) associated with M and to show that the information rate cannot exceed 1 − H(M) but that any information rate lower than 1 − H(M) can be attained with arbitrarily low error rates. However, it is rare in practice to have very clear information about the nature of the noise we encounter and, as we said earlier, we must leave something for more advanced courses.

There is one very important theorem of Shannon which is not covered in this course. In it, he reinterprets a result of Whittaker to show that any continuous signal whose Fourier transform vanishes outside a range of length R can be reconstructed from its value at equally spaced sampling points provided those points are less than A/R apart. (The constant A depends on the conventions used in defining the Fourier transform.) This enables us to apply the 'digital' theory of information transmission developed here to continuous signals.

11 A holiday at the race track

Although this section is examinable²⁴, the material is peripheral to the course.

Suppose a very rich friend makes you the following offer. Every day, at noon, you may make a bet with her for any amount k you choose. You give her k pounds which she keeps whatever happens. She then tosses a coin and, if it shows heads, she pays you ku and, if it shows tails, she pays you nothing.

24 When the author of the present notes gives the course, this is his interpretation of the sentence in the schedules 'Applications to gambling and the stock market.' Other lecturers may view matters differently.
You know that the probability of heads is p. What should you do? If pu < 1, you should not bet, because your expected winnings are negative. If pu > 1, most mathematicians would be inclined to bet, but how much? If you bet your entire fortune and win, you will be better off than if you bet a smaller sum, but, if you lose, then you are bankrupt and cannot continue playing. Thus your problem is to discover the proportion w of your present fortune that you should bet. Observe that your choice of w will always be the same (since you expect to go on playing for ever). Only the size of your fortune will vary. If your fortune after n goes is Z_n, then

Z_{n+1} = Z_n Y_{n+1}

where

Y_{n+1} = uw + (1 − w)  if the (n + 1)st throw is heads,
Y_{n+1} = 1 − w         if it is tails.

Using the weak law of large numbers, we have the following result.

Lemma 11.1. Suppose Y, Y_1, Y_2, ... are identically distributed independent random variables taking values in [a, b] with 0 < a < b. If we write Z_n = Y_1 Y_2 ⋯ Y_n, then

Pr(|n^{−1} log Z_n − E log Y| > ε) → 0

as n → ∞.

Thus you should choose w to maximise

E log Y_n = p log(uw + (1 − w)) + (1 − p) log(1 − w).

Exercise 11.2. (i) Show that, for the situation described, you should not bet if up ≤ 1 and should take

w = (up − 1)/(u − 1)

if up > 1.
(ii) We write q = 1 − p. Show that, if up > 1 and we choose the optimum w,

E log Y_n = p log p + q log q + log u − q log(u − 1).

We have seen the expression −(p log p + q log q) before as (a multiple of) the Shannon information entropy of a simple probabilistic system. In a paper entitled A New Interpretation of Information Rate²⁵ Kelly showed how to interpret this and similar situations using communication theory. In his model a gambler receives information over a noisy channel about which horse is going to win. Just as Shannon's theorem shows that information can be transmitted over such a channel at a rate close to channel capacity with negligible risk of error (provided the messages are long enough), so the gambler can (with arbitrarily high probability) increase her fortune at a certain optimum rate provided that she can continue to bet long enough. The exposition is slightly opaque because the Bell company which employed Kelly was anxious not to draw attention to the use of telephones for betting fraud.

Although Kelly seems never to have used his idea in practice, it was the suggestion that those making a long sequence of bets should aim to maximise the expectation of the logarithm (now called Kelly's criterion) which made the paper famous. Although the analogy between betting and communication channels is very pretty, mathematicians like Thorp, Berlekamp and Shannon himself have made substantial fortunes in the stock market and claim to have used Kelly's ideas²⁶.

25 Available on the web. Kelly is also famous for an early demonstration of speech synthesis in which a computer sang 'Daisy Bell'. This inspired the corresponding scene in the film 2001.
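The optimum w of Exercise 11.2 (i) and the growth of a Kelly gambler's fortune can be explored numerically. A sketch (the parameter values p = 0.6, u = 2 are illustrative choices, not from the notes):

```python
import random
from math import log

def growth(w, p, u):
    # Expected log-growth per bet when a fraction w of the fortune is staked.
    return p * log(u * w + (1 - w)) + (1 - p) * log(1 - w)

p, u = 0.6, 2.0                      # up = 1.2 > 1, so betting is favourable
w_kelly = (u * p - 1) / (u - 1)      # = 0.2, the optimum of Exercise 11.2 (i)

# A grid search agrees with the analytic optimum.
grid = [i / 1000 for i in range(0, 999)]
w_best = max(grid, key=lambda w: growth(w, p, u))
print(w_kelly, w_best)

# A short simulated run: fortune after many bets at the Kelly fraction.
random.seed(1)
fortune = 1.0
for _ in range(10_000):
    stake = w_kelly * fortune
    fortune -= stake
    if random.random() < p:
        fortune += u * stake
print(fortune > 1.0)
```

The positive value of growth(w_kelly, p, u) (about 0.02 per bet here) is exactly the optimum rate referred to in the text.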

Before rushing out to the race track or stock exchange²⁷, the reader is invited to run computer simulations of the result of Kelly gambling for various values of u and p. She will observe that although, in the very long run, the system works, the short run can be very unpleasant indeed.

Exercise 11.3. Returning to our original problem, show that, if you bet less than the optimal proportion, your fortune will still tend to increase but more slowly, but, if you bet more than some proportion w_1, your fortune will decrease. Write down the equation for w_1. [Moral: If you use the Kelly criterion, veer on the side of under-betting.]

12 Linear codes

The next few sections involve no probability at all. We shall only be interested in constructing codes which are easy to handle and have all their code words at least a certain Hamming distance apart.

Just as R^n is a vector space over R and C^n is a vector space over C, so F_2^n is a vector space over F_2. (If you know about vector spaces over fields, so much the better; if not, just follow the obvious paths.) A linear code is a subspace of F_2^n. More formally, we have the following definition.

Definition 12.1. A linear code is a subset C of F_2^n such that
(i) 0 ∈ C;
(ii) if x, y ∈ C, then x + y ∈ C.

Note that, if λ ∈ F_2, then λ = 0 or λ = 1, so that condition (i) of the definition just given guarantees that λx ∈ C whenever x ∈ C. We shall see that linear codes have many useful properties.

Example 12.2. (i) The repetition code with

C = {x : x = (x, x, ..., x)}

is a linear code.

26 Note that we hear more about mathematicians who win on the stock market than those who lose.
27 A sprat which thinks it's a shark will have a very short life.

(ii) The paper tape code

C = {x : x_1 + x_2 + ... + x_n = 0}

is a linear code.
(iii) Hamming's original code is a linear code.

The verification is easy. In fact, examples (ii) and (iii) are 'parity check codes' and so automatically linear, as we see from the next lemma.

Definition 12.3. Consider a set P in F_2^n. We say that C is the code defined by the set of parity checks P if the elements of C are precisely those x ∈ F_2^n with

Σ_{j=1}^{n} p_j x_j = 0

for all p ∈ P.

Lemma 12.4. If C is a code defined by parity checks, then C is linear.

We now prove the converse result.

Definition 12.5. If C is a linear code, we write C⊥ for the set of p ∈ F_2^n such that

Σ_{j=1}^{n} p_j x_j = 0

for all x ∈ C. We call C⊥ the dual code to C.

Thus C⊥ is the set of parity checks satisfied by C.

Lemma 12.6. If C is a linear code, then
(i) C⊥ is a linear code;
(ii) (C⊥)⊥ ⊇ C.

In the language of the course on linear mathematics, C⊥ is the annihilator of C. The following is a standard theorem of that course.

Lemma 12.7. If C is a linear code in F_2^n, then

dim C + dim C⊥ = n.
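The relation dim C + dim C⊥ = n can be checked by brute force for a small code. A sketch using Hamming's original code (not part of the notes):

```python
from itertools import product

def dot(x, y):
    return sum(a * b for a, b in zip(x, y)) % 2

n = 7
# Hamming's original code, defined by its three parity checks.
checks = [(1,0,1,0,1,0,1), (0,1,1,0,0,1,1), (0,0,0,1,1,1,1)]
C = [x for x in product((0, 1), repeat=n) if all(dot(p, x) == 0 for p in checks)]
# The dual code: all parity checks satisfied by every word of C.
C_dual = [p for p in product((0, 1), repeat=n) if all(dot(p, x) == 0 for x in C)]

print(len(C), len(C_dual))   # 16 and 8: dim C + dim C_dual = 4 + 3 = 7 = n
```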

Since the treatment of dual spaces is not the most popular piece of mathematics in IB, we shall give an independent proof later (see the note after Lemma 12.13). Combining Lemma 12.6 (ii) with Lemma 12.7, we get the following corollaries.

Lemma 12.8. If C is a linear code, then (C⊥)⊥ = C.

Lemma 12.9. Every linear code is defined by parity checks.

Our treatment of linear codes has been rather abstract. In order to put computational flesh on the dry theoretical bones, we introduce the notion of a generator matrix.

Definition 12.10. If C is a linear code of length n, any r × n matrix whose rows form a basis for C is called a generator matrix for C. We say that C has dimension or rank r.

Example 12.11. As examples, we can find generator matrices for the repetition code, the paper tape code and the original Hamming code. Remember that the Hamming code is the code of length 7 given by the parity conditions

x1 + x3 + x5 + x7 = 0
x2 + x3 + x6 + x7 = 0
x4 + x5 + x6 + x7 = 0.

By using row operations and column permutations to perform Gaussian elimination, we can give a constructive proof of the following lemma.

Lemma 12.12. Any linear code of length n has (possibly after permuting the order of coordinates) a generator matrix of the form

(I_r | B).

Notice that this means that any codeword x can be written as

(y | z) = (y | yB),

where y = (y_1, y_2, ..., y_r) may be considered as the message and the vector z = yB of length n − r may be considered the check digits. Any code whose codewords can be split up in this manner is called systematic.
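The Gaussian elimination argument can be sketched in code. The function below is a minimal illustration over F_2 (it assumes the rows of the given matrix are linearly independent, and is not from the notes):

```python
def systematic_form(G):
    # Row operations (and, if needed, column swaps) over F_2 bringing a
    # generator matrix to the systematic shape (I_r | B).
    G = [row[:] for row in G]
    r, n = len(G), len(G[0])
    perm = list(range(n))                 # records any column permutation used
    for i in range(r):
        # find a pivot 1 in the remaining rows, swapping columns if necessary
        j, k = next((j, k) for k in range(i, n) for j in range(i, r) if G[j][k])
        G[i], G[j] = G[j], G[i]
        if k != i:
            for row in G:
                row[i], row[k] = row[k], row[i]
            perm[i], perm[k] = perm[k], perm[i]
        for j in range(r):
            if j != i and G[j][i]:
                G[j] = [(a + b) % 2 for a, b in zip(G[j], G[i])]
    return G, perm

G = [[1, 1, 1, 0], [0, 1, 1, 1]]          # generator of a small toy code
S, perm = systematic_form(G)
print(S)                                  # rows begin with the identity block
```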

Lemma 12.13. If C is a linear code of length n and dimension r with generator the r × n matrix G, then a ∈ C⊥ if and only if Ga^T = 0^T. Thus C⊥ = (ker G)^T.

Using the rank, nullity theorem, we get a second proof of Lemma 12.7.

Lemma 12.13 enables us to characterise C⊥.

Lemma 12.14. If C is a linear code of length n and dimension r with generator matrix G, then, if H is any n × (n − r) matrix with columns forming a basis of ker G, we know that H is a parity check matrix for C and its transpose H^T is a generator for C⊥.

Example 12.15. (i) The dual of the paper tape code is the repetition code.
(ii) Hamming's original code has dual with generator matrix

1 0 1 0 1 0 1
0 1 1 0 0 1 1
0 0 0 1 1 1 1

We saw above that the codewords of a linear code can be written (y|z) = (y|yB), where y may be considered as the vector of message digits and z = yB as the vector of check digits. Thus encoders for linear codes are easy to construct.

What about decoders? Recall that every linear code of length n has a (non-unique) associated parity check matrix H with the property that x ∈ C if and only if xH = 0. If z ∈ F_2^n, we define the syndrome of z to be zH. The following lemma is mathematically trivial but forms the basis of the method of syndrome decoding.

Lemma 12.16. Let C be a linear code with parity check matrix H. If we are given z = x + e, where x is a code word and the 'error vector' e ∈ F_2^n, then

zH = eH.

Suppose we have tabulated the syndrome uH for all u with 'few' non-zero entries (say, all u with d(u, 0) ≤ K). When our decoder receives z, it computes the syndrome zH. If the syndrome is zero, then z ∈ C and the decoder assumes the transmitted message was z. If the syndrome of the received message is a non-zero vector w, the decoder searches its list until it

finds an e with eH = w. The decoder then assumes that the transmitted message was x = z − e (note that z − e will always be a codeword, even if not the right one). This procedure will fail if w does not appear in the list, but, for this to be the case, at least K + 1 errors must have occurred.

If we take K = 1, that is we only want a 1 error correcting code, then, writing e(i) for the vector in F_2^n with 1 in the ith place and 0 elsewhere, we see that the syndrome e(i)H is the ith row of H. If the transmitted message z has syndrome zH equal to the ith row of H, then the decoder assumes that there has been an error in the ith place and nowhere else. (Recall the special case of Hamming's original code.)

If K is large, the task of searching the list of possible syndromes becomes onerous and, unless (as sometimes happens) we can find another trick, we find that 'decoding becomes dear' although 'encoding remains cheap'.

We conclude this section by looking at weights and the weight enumeration polynomial for a linear code.

Definition 12.17. The weight w(x) of a vector x ∈ F_2^n is given by

w(x) = d(0, x).

Lemma 12.18. If w is the weight function on F_2^n and x, y ∈ F_2^n, then
(i) w(x) ≥ 0;
(ii) w(x) = 0 if and only if x = 0;
(iii) w(x) + w(y) ≥ w(x + y).

Since the minimum (non-zero) weight in a linear code is the same as the minimum (non-zero) distance, we can talk about linear codes of minimum weight d when we mean linear codes of minimum distance d.

The pattern of distances in a linear code is encapsulated in the weight enumeration polynomial. The idea here is to exploit the fact that, if C is a linear code and a ∈ C, then a + C = C. Thus the 'view of C' from any codeword a is the same as the 'view of C' from the particular codeword 0.

Definition 12.19. Let C be a linear code of length n. We write A_j for the number of codewords of weight j and define the weight enumeration polynomial W_C to be the polynomial in two real variables given by

W_C(s, t) = Σ_{j=0}^{n} A_j s^j t^{n−j}.

Here are some simple properties of W_C.
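Syndrome decoding with K = 1, as described above, can be sketched for Hamming's original code (an illustration, not part of the notes):

```python
# Parity check matrix H for Hamming's original code: row i (as used in the
# syndrome zH) is hit exactly when there is an error in place i + 1.
H = [(1,0,0), (0,1,0), (1,1,0), (0,0,1), (1,0,1), (0,1,1), (1,1,1)]

def syndrome(z):
    return tuple(sum(z[i] * H[i][j] for i in range(7)) % 2 for j in range(3))

# Tabulate the syndromes of all error patterns of weight <= 1 (K = 1).
table = {syndrome((0,)*7): (0,)*7}
for i in range(7):
    e = tuple(1 if j == i else 0 for j in range(7))
    table[syndrome(e)] = e

def decode(z):
    e = table[syndrome(z)]              # assumed error pattern
    return tuple((a + b) % 2 for a, b in zip(z, e))

x = (1, 0, 1, 0, 1, 0, 1)               # a codeword: all three checks vanish
z = (1, 0, 1, 1, 1, 0, 1)               # the same word with an error in place 4
print(decode(z) == x)                   # the single error is corrected
```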

Lemma 12.20. Under the assumptions and with the notation of Definition 12.19, the following results are true.
(i) W_C is a homogeneous polynomial of degree n.
(ii) If C has rank r, W_C(1, 1) = 2^r.
(iii) W_C(0, 1) = 1.
(iv) W_C(1, 0) takes the value 0 or 1.
(v) W_C(s, t) = W_C(t, s) for all s and t if and only if W_C(1, 0) = 1.

Lemma 12.21. For our standard model of communication along an error prone channel with independent errors of probability p and a linear code C of length n,

W_C(p, 1 − p) = Pr(receive a code word | code word transmitted)

and

Pr(receive incorrect code word | code word transmitted) = W_C(p, 1 − p) − (1 − p)^n.

Example 12.22. (i) If C is the repetition code, W_C(s, t) = s^n + t^n.
(ii) If C is the paper tape code of length n, W_C(s, t) = ½((s + t)^n + (t − s)^n).

Example 12.22 is a special case of the MacWilliams identity.

Theorem 12.23. [MacWilliams identity] If C is a linear code,

W_{C⊥}(s, t) = 2^{−dim C} W_C(t − s, t + s).

We give a proof as Exercise 26. (The result is thus not bookwork though it could be set as a problem with appropriate hints.)

13 Some general constructions

However interesting the theoretical study of codes may be to a pure mathematician, the engineer would prefer to have an arsenal of practical codes so that she can select the one most suitable for the job in hand. In this section we discuss the general Hamming codes and the Reed-Muller codes as well as some simple methods of obtaining new codes from old.

Definition 13.1. Let d be a strictly positive integer and let n = 2^d − 1. Consider the (column) vector space D = F_2^d. Write down a d × n matrix H whose columns are the 2^d − 1 distinct non-zero vectors of D. The Hamming (n, n − d) code is the linear code of length n with H^T as parity check matrix.
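Example 12.22 and the MacWilliams identity can be verified numerically at sample points, using the fact (from Example 12.15) that the repetition and paper tape codes are duals of one another. A sketch (the evaluation point (s, t) = (2, 3) is arbitrary; not part of the notes):

```python
from itertools import product

def words(n):
    return list(product((0, 1), repeat=n))

def weight_enum(C, s, t):
    # W_C(s, t) = sum over codewords of s^weight * t^(n - weight).
    n = len(C[0])
    return sum(s**sum(x) * t**(n - sum(x)) for x in C)

n = 4
paper_tape = [x for x in words(n) if sum(x) % 2 == 0]   # rank n - 1 = 3
repetition = [(0,)*n, (1,)*n]                           # rank 1

s, t = 2.0, 3.0
print(weight_enum(repetition, s, t), s**n + t**n)                  # agree
print(weight_enum(paper_tape, s, t), ((s+t)**n + (t-s)**n) / 2)    # agree

# MacWilliams with C the paper tape code, C-perp the repetition code:
lhs = weight_enum(repetition, s, t)
rhs = 2**(-(n - 1)) * weight_enum(paper_tape, t - s, t + s)
print(lhs, rhs)                                                    # agree
```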

Of course, the Hamming (n, n − d) code is only defined up to permutation of coordinates. We note that H has rank d, so a simple use of the rank nullity theorem shows that our notation is consistent.

Lemma 13.2. The Hamming (n, n − d) code is a linear code of length n and rank n − d [n = 2^d − 1].

The fact that any two rows of H^T are linearly independent and a look at the appropriate syndromes gives us the main property of the general Hamming code.

Lemma 13.3. The Hamming (n, n − d) code has minimum weight 3 and is a perfect 1 error correcting code [n = 2^d − 1].

Example 13.4. The Hamming (7, 4) code is the original Hamming code.

Hamming codes are ideal in situations where very long strings of binary digits must be transmitted but the chance of an error in any individual digit is very small. (Look at Exercise 7.5.)

If we confine ourselves to the binary codes discussed in this course, it is known that perfect codes of length n with Hamming spheres of radius ρ exist for ρ = 0, ρ = n, ρ = (n − 1)/2 with n odd (the three codes just mentioned are easy to identify), ρ = 3 and n = 23 (the Golay code, found by direct search) and ρ = 1 and n = 2^m − 1. The only linear perfect codes with ρ = 1 and n = 2^m − 1 are the Hamming codes. There are known to be non-Hamming codes with ρ = 1 and n = 2^m − 1; it is suspected that there are many of them and they are the subject of much research, but, of course, they present no practical advantages. Although the search for perfect codes other than the Hamming codes produced the Golay code (not discussed here) and much interesting combinatorics, the reader is warned that, from a practical point of view, it represents a dead end²⁸.

Here are a number of simple tricks for creating new codes from old.

Definition 13.5. If C is a code of length n, the parity check extension C⁺ of C is the code of length n + 1 given by

C⁺ = {x ∈ F_2^{n+1} : (x_1, x_2, ..., x_n) ∈ C, Σ_{j=1}^{n+1} x_j = 0}.

Definition 13.6. If C is a code of length n, the truncation C⁻ of C is the code of length n − 1 given by

C⁻ = {(x_1, x_2, ..., x_{n−1}) : (x_1, x_2, ..., x_n) ∈ C for some x_n ∈ F_2}.
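Lemma 13.3 can be checked by brute force for small d; the sketch below takes d = 4, so n = 15 (an illustration, not part of the notes):

```python
from itertools import product

d = 4
n = 2**d - 1                                  # 15
# Columns of H: all non-zero vectors of F_2^d, encoded from the integers 1..n.
cols = [tuple((v >> j) & 1 for j in range(d)) for v in range(1, n + 1)]

def in_code(x):
    # x is a codeword iff the columns at its 1-places sum to 0 (zero syndrome).
    return all(sum(x[i] * cols[i][j] for i in range(n)) % 2 == 0 for j in range(d))

C = [x for x in product((0, 1), repeat=n) if in_code(x)]
min_wt = min(sum(x) for x in C if any(x))
print(len(C), min_wt)                         # 2^11 codewords of minimum weight 3
# Perfect: the balls of radius 1 about the codewords tile F_2^15.
print(len(C) * (1 + n) == 2**n)               # True
```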

Definition 13.7. If C is a code of length n, the shortening (or puncturing) C′ of C by the symbol α (which may be 0 or 1) is the code of length n − 1 given by

C′ = {(x_1, x_2, ..., x_{n−1}) : (x_1, x_2, ..., x_{n−1}, α) ∈ C}.

Lemma 13.8. If C is linear, so is its parity check extension C⁺, its truncation C⁻ and its shortening C′ (provided that the symbol chosen is 0).

How can we combine two linear codes C_1 and C_2? Our first thought might be to look at their direct sum

C_1 ⊕ C_2 = {(x|y) : x ∈ C_1, y ∈ C_2},

but this is unlikely to be satisfactory.

Lemma 13.9. If C_1 and C_2 are linear codes, then

d(C_1 ⊕ C_2) = min(d(C_1), d(C_2)).

On the other hand, if C_1 and C_2 satisfy rather particular conditions, we can obtain a more promising construction.

Definition 13.10. Suppose C_1 and C_2 are linear codes of length n with C_1 ⊇ C_2 (i.e. with C_2 a subspace of C_1). We define the bar product C_1|C_2 of C_1 and C_2 to be the code of length 2n given by

C_1|C_2 = {(x|x + y) : x ∈ C_1, y ∈ C_2}.

Lemma 13.11. Let C_1 and C_2 be linear codes of length n with C_1 ⊇ C_2. Then the bar product C_1|C_2 is a linear code with

rank C_1|C_2 = rank C_1 + rank C_2.

The minimum distance of C_1|C_2 satisfies the equality

d(C_1|C_2) = min(2d(C_1), d(C_2)).

We now return to the construction of specific codes. Recall that the Hamming codes are suitable for situations when the error rate p is very small and we want a high information rate. The Reed-Muller codes are suitable when the error rate is very high and we are prepared to sacrifice information rate. They were used by NASA for the radio transmissions from its planetary probes (a task which has been compared to signalling across the Atlantic with a child's torch²⁹).
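The bar product and Lemma 13.11 can be illustrated with the paper tape and repetition codes of length 4 (the repetition code is a subspace of the paper tape code; the sketch is not part of the notes):

```python
from itertools import product

def words(n):
    return list(product((0, 1), repeat=n))

n = 4
C1 = [x for x in words(n) if sum(x) % 2 == 0]        # paper tape: d = 2, rank 3
C2 = [(0,)*n, (1,)*n]                                # repetition: d = 4, rank 1

# The bar product: (x | x + y) for x in C1, y in C2.
bar = [x + tuple((a + b) % 2 for a, b in zip(x, y)) for x in C1 for y in C2]

def min_dist(C):
    return min(sum(a != b for a, b in zip(x, y)) for x in C for y in C if x != y)

print(len(bar))        # 2^(rank C1 + rank C2) = 2^4 = 16
print(min_dist(bar))   # min(2 d(C1), d(C2)) = min(4, 4) = 4
```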

We start by considering the 2^d points P_0, P_1, ..., P_{2^d−1} of the space X = F_2^d. Our code words will be of length n = 2^d and will correspond to the indicator functions I_A on X. More specifically, for some A ⊆ X, the possible code word c_A is given by

(c_A)_i = 1 if P_i ∈ A,    (c_A)_i = 0 otherwise.

In addition to the usual vector space structure on F_2^n, we define a new operation

(x_0, x_1, ..., x_{n−1}) ∧ (y_0, y_1, ..., y_{n−1}) = (x_0 y_0, x_1 y_1, ..., x_{n−1} y_{n−1}).

Less formally, but more clearly,

c_A ∧ c_B = c_{A∩B}.

Finally we consider the collection of d hyperplanes in F_2^d,

π_j = {p ∈ X : p_j = 0}    [1 ≤ j ≤ d],

and the corresponding indicator functions

h_j = c_{π_j},

together with the special vector

h_0 = c_X = (1, 1, ..., 1).

Exercise 13.12. Suppose that x, y, z ∈ F_2^n and A, B ⊆ X.
(i) Show that x ∧ y = y ∧ x.
(ii) Show that (x + y) ∧ z = x ∧ z + y ∧ z.
(iii) Show that h_0 ∧ x = x.
(iv) If c_A + c_B = c_E, find E in terms of A and B.
(v) If h_0 + c_A = c_E, find E in terms of A.

We refer to A_0 = {h_0} as the set of terms of order zero. If A_k is the set of terms of order at most k, then the set A_{k+1} of terms of order at most k + 1 is defined by

A_{k+1} = {a ∧ h_j : a ∈ A_k, 1 ≤ j ≤ d}.

Less formally, the elements of order 1 are the h_i, the elements of order 2 are the h_i ∧ h_j with i < j, the elements of order 3 are the h_i ∧ h_j ∧ h_k with i < j < k, and so on.

29 Strictly speaking, the comparison is meaningless. However, it sounds impressive and that is the main thing.

Definition 13.13. Using the notation established above, the Reed-Muller code RM(d, r) is the linear code (i.e. subspace of F_2^n) generated by the terms of order r or less.

Although the formal definition of the Reed-Muller codes looks pretty impenetrable at first sight, once we have looked at sufficiently many examples it should become clear what is going on.

Example 13.14. (i) The RM(3, 0) code is the repetition code of length 8.
(ii) The RM(3, 1) code is the parity check extension of Hamming's original code.
(iii) The RM(3, 2) code is the paper tape code of length 8.
(iv) The RM(3, 3) code is the trivial code consisting of all the elements of F_2^8.

Exercise 13.15. Show that the RM(d, d − 2) code is the parity extension code of the Hamming (N, N − d) code with N = 2^d − 1.

Exercise 13.16. The Mariner mission to Mars used the RM(5, 1) code. What was its information rate? What proportion of errors could it correct in a single code word?

We now prove the key properties of the Reed-Muller codes.

Theorem 13.17. We use the notation established above.
(i) The elements of order d or less (that is, the collection of all possible wedge products formed from the h_i) span F_2^n.
(ii) The elements of order d or less are linearly independent.
(iii) The dimension of the Reed-Muller code RM(d, r) is

(d choose 0) + (d choose 1) + (d choose 2) + ... + (d choose r).

(iv) Using the bar product notation, we have

RM(d, r) = RM(d − 1, r)|RM(d − 1, r − 1).

(v) The minimum weight of RM(d, r) is exactly 2^{d−r}.
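The construction of RM(d, r) from the hyperplane indicators can be carried out directly; below, RM(3, 1) is built and its dimension and minimum weight checked against Theorem 13.17 (an illustration, not part of the notes):

```python
from itertools import product

d = 3
pts = list(product((0, 1), repeat=d))     # the 2^d points of X = F_2^d
n = len(pts)                              # code length 8

h0 = tuple(1 for _ in pts)                                           # c_X
h = [tuple(1 if p[j] == 0 else 0 for p in pts) for j in range(d)]    # hyperplanes

def span(gens):
    # All F_2 linear combinations of the generators.
    code = {tuple(0 for _ in range(n))}
    for g in gens:
        code |= {tuple((a + b) % 2 for a, b in zip(c, g)) for c in code}
    return code

RM_3_1 = span([h0] + h)                   # terms of order <= 1
print(len(RM_3_1))                        # 2^(1 + 3) = 16, matching (iii)
print(min(sum(x) for x in RM_3_1 if any(x)))   # minimum weight 2^(3-1) = 4, matching (v)
```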

14 Polynomials and fields

This section is starred. Its object is to make plausible the few facts from modern³⁰ algebra that we shall need. They were covered, along with much else, in various post-IA algebra courses, but attendance at those courses is no more required for this course than is reading Joyce's Ulysses before going for a night out at an Irish pub. Anyone capable of criticising the imprecision and general slackness of the account that follows obviously can do better themselves and should rewrite this section in an appropriate manner.

A field K is an object equipped with addition and multiplication which follow the same rules as do addition and multiplication in R. The only rule which will cause us trouble is

⋆ If x ∈ K and x ≠ 0, then we can find y ∈ K such that xy = 1.

Obvious examples of fields include R, C and F_2.

We are particularly interested in polynomials over fields, but here an interesting difficulty arises.

Example 14.1. We have t² + t = 0 for all t ∈ F_2.

Thus X² + X is a non-zero polynomial over F_2 all of whose values are zero. To get round this, we distinguish between the polynomial in the 'indeterminate' X,

P(X) = Σ_{j=0}^{n} a_j X^j,

with coefficients a_j ∈ K and its evaluation P(t) = Σ_{j=0}^{n} a_j t^j for some t ∈ K. We manipulate polynomials in X according to the standard rules for polynomials, but say that

Σ_{j=0}^{n} a_j X^j = 0

if and only if a_j = 0 for all j.

The following result is familiar, in essence, from school mathematics.

Lemma 14.2. [Remainder theorem] (i) If P is a polynomial over a field K and a ∈ K, then we can find a polynomial Q and an r ∈ K such that

P(X) = (X − a)Q(X) + r.

(ii) If P is a polynomial over a field K and a ∈ K is such that P(a) = 0, then we can find a polynomial Q such that

P(X) = (X − a)Q(X).

30 Modern, that is, in 1920.

The key to much of the elementary theory of polynomials lies in the fact that we can apply Euclid's algorithm to obtain results like the following.

Theorem 14.3. Suppose that P is a set of polynomials which contains at least one non-zero polynomial and has the following properties.
(i) If Q is any polynomial and P ∈ P, then the product PQ ∈ P.
(ii) If P_1, P_2 ∈ P, then P_1 + P_2 ∈ P.
Then we can find a non-zero P_0 ∈ P which divides every P ∈ P.

Proof. Consider a non-zero polynomial P_0 of smallest degree in P.

Recall that the polynomial P(X) = X^2 + 1 has no roots in R (that is, P(t) ≠ 0 for all t ∈ R). However, by considering the collection of formal expressions a + bi [a, b ∈ R] with the obvious formal definitions of addition and multiplication, and subject to the further condition i^2 + 1 = 0, we obtain a field C ⊇ R in which P has a root (since P(i) = 0). We can perform a similar trick with other fields.

Example 14.4. If P(X) = X^2 + X + 1, then P has no roots in F_2. However, if we consider F_2[ω] = {0, 1, ω, 1 + ω} with the obvious formal definitions of addition and multiplication, and subject to the further condition ω^2 + ω + 1 = 0, then F_2[ω] is a field containing F_2 in which P has a root (since P(ω) = 0).

Proof. The only thing we really need prove is that F_2[ω] is a field, and to do that the only thing we need to prove is that ⋆ holds. Since (1 + ω)ω = 1 this is easy.

In order to state a correct generalisation of the ideas of the previous paragraph we need a preliminary definition.

Definition 14.5. If P is a polynomial over a field K, we say that P is reducible if there exists a non-constant polynomial Q of degree strictly less than P which divides P. If P is a non-constant polynomial which is not reducible, then P is irreducible.

Theorem 14.6. If P is an irreducible polynomial of degree n ≥ 2 over a field K, then P has no roots in K. However, if we consider

K[ω] = { Σ_{j=0}^{n−1} a_j ω^j : a_j ∈ K }

with the obvious formal definitions of addition and multiplication, and subject to the further condition P(ω) = 0, then K[ω] is a field containing K in which P has a root.

Proof. The only thing we really need prove is that K[ω] is a field, and to do that the only thing we need to prove is that ⋆ holds. Let Q be a non-zero polynomial of degree at most n − 1. Since P is irreducible, the polynomials P and Q have no common factor of degree 1 or more. Hence, by Euclid's algorithm, we can find polynomials R and S such that

R(X)Q(X) + S(X)P(X) = 1

and so

R(ω)Q(ω) + S(ω)P(ω) = 1.

But P(ω) = 0, so R(ω)Q(ω) = 1 and we have proved ⋆.

In a proper algebra course we would simply define K[ω] = K[X]/(P(X)), where (P(X)) is the ideal generated by P(X). This is a cleaner procedure which avoids the use of such phrases as 'the obvious formal definitions of addition and multiplication', but the underlying idea remains the same.

Lemma 14.7. If P is a polynomial over a field K which does not factorise completely into linear factors, then we can find a field L ⊇ K in which P has more linear factors.

Proof. Factor P into irreducible factors and choose a factor Q which is not linear. By Theorem 14.6, we can find a field L ⊇ K in which Q has a root α, say, and so a linear factor X − α. Since any linear factor of P in K remains a factor in the bigger field L, we are done.

Theorem 14.8. If P is a polynomial over a field K, then we can find a field L ⊇ K in which P factorises completely into linear factors.

We shall be interested in finite fields (that is, fields K with only a finite number of elements). A glance at our method of proving Theorem 14.8 shows that the following result holds.

Lemma 14.9. If P is a polynomial over a finite field K, then we can find a finite field L ⊇ K in which P factorises completely.

In this context, we note yet another useful simple consequence of Euclid's algorithm.
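The construction of Example 14.4 is easy to simulate. In the sketch below (our own representation, not from the notes) an element a + bω of F_2[ω] is stored as the pair (a, b), and multiplication reduces using ω^2 = ω + 1.

```python
# Elements of F2[w], w^2 + w + 1 = 0, stored as (a, b) meaning a + b*w.
def add(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])

def mul(x, y):
    a, b = x
    c, d = y
    # (a + b w)(c + d w) = ac + (ad + bc) w + bd w^2, and w^2 = w + 1.
    ac, ad, bc, bd = a & c, a & d, b & c, b & d
    return (ac ^ bd, ad ^ bc ^ bd)

one, w, w1 = (1, 0), (0, 1), (1, 1)
print(mul(w1, w))   # (1, 0): (1 + w)w = 1, the key step in the proof
print(mul(w, w))    # (1, 1): w^2 = 1 + w, so P(w) = w^2 + w + 1 = 0
```

Checking that every non-zero element has an inverse verifies ⋆ for this four-element field.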

Lemma 14.10. Suppose that P is an irreducible polynomial over a field K which has a linear factor X − α in some field L ⊇ K. If Q is a polynomial over K which has the factor X − α in L, then P divides Q.

We shall need a lemma on repeated roots.

Lemma 14.11. Let K be a field. If P(X) = Σ_{j=0}^{n} a_j X^j is a polynomial over K, we define P′(X) = Σ_{j=1}^{n} j a_j X^{j−1}.
(i) If P and Q are polynomials, (P + Q)′ = P′ + Q′ and (PQ)′ = P′Q + PQ′.
(ii) If P and Q are polynomials with P(X) = (X − a)^2 Q(X), then P′(X) = 2(X − a)Q(X) + (X − a)^2 Q′(X).
(iii) If P is divisible by (X − a)^2, then P(a) = P′(a) = 0.

We can thus deduce the following result, which will be used in the next section.

Lemma 14.12. If L is a field containing F_2 and n is an odd integer, then X^n − 1 can have no repeated linear factors as a polynomial over L. (Observe that, if L is a field containing F_2, then 2y = (1 + 1)y = 0y = 0 for all y ∈ L.)

We also need a result on roots of unity, given as part (v) of the next lemma.

Lemma 14.13. (i) If G is a finite Abelian group and x, y ∈ G have coprime orders r and s, then xy has order rs.
(ii) If G is a finite Abelian group and x, y ∈ G have orders r and s, then we can find an element z of G with order the lowest common multiple of r and s.
(iii) If G is a finite Abelian group, then there exists an N and an h ∈ G such that h has order N and g^N = e for all g ∈ G.
(iv) If G is a finite subset of a field K which is a group under multiplication, then G is cyclic.
(v) Suppose n is an odd integer. If L is a field containing F_2 such that X^n − 1 factorises completely into linear terms, then we can find an ω ∈ L such that the roots of X^n − 1 are 1, ω, ω^2, . . . , ω^{n−1}. (We call ω a primitive nth root of unity.)

Proof. (ii) Consider z = x^u y^v, where u is a divisor of r and v is a divisor of s chosen so that r/u and s/v are coprime and rs/(uv) = lcm(r, s), and use (i).
(iii) Let h be an element of highest order in G and use (ii).
(iv) By (iii) we can find an integer N and an h ∈ G such that h has order N and any element g ∈ G satisfies g^N = 1. Thus X^N − 1 has a linear factor

X − g for each g ∈ G, and so ∏_{g∈G} (X − g) divides X^N − 1. It follows that the order |G| of G cannot exceed N. But, by Lagrange's theorem, N divides |G|. Thus |G| = N and h generates G.
(v) Observe that G = {ω : ω^n = 1} is an Abelian group with exactly n elements (since X^n − 1 has no repeated roots) and use (iv).

Lemma 14.14. If K is a field with m elements, then there is an element k of K such that

K = {0} ∪ {k^r : 0 ≤ r ≤ m − 2}

and k^{m−1} = 1.

Proof. Observe that K \ {0} forms an Abelian group under multiplication, and use Lemma 14.13 (iv).

Definition 14.15. We call an element k with the properties given in Lemma 14.14 a primitive element of K.

Exercise 14.16. Find all the primitive elements of F_7.

Here is another interesting consequence of Lemma 14.13 (iv).

Lemma 14.17. Let L be some field containing F_2 in which X^{2^n} − X factorises completely. Then K = {x ∈ L : x^{2^n} = x} is a field with 2^n elements containing F_2.

With this hint, it is not hard to show that there is indeed a field with 2^n elements containing F_2. Lemma 14.14 shows that there is (up to field isomorphism) only one field with 2^n elements containing F_2. We call it F_{2^n}.

15 Cyclic codes

In this section, we discuss a subclass of linear codes, the so-called cyclic codes.

Definition 15.1. A linear code C in F_2^n is called cyclic if

(a_0, a_1, . . . , a_{n−1}) ∈ C ⇒ (a_1, a_2, . . . , a_{n−1}, a_0) ∈ C.
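Lemma 14.14 can be checked by brute force for a small field. The helper below is ours: it computes multiplicative orders in F_7 and confirms that an element of order 6 (a primitive element, in the sense of Definition 14.15) really exists, as Exercise 14.16 asks you to work out by hand.

```python
def order(k, p):
    """Multiplicative order of k in F_p (p prime, 1 <= k <= p - 1)."""
    x, n = k % p, 1
    while x != 1:
        x = x * k % p
        n += 1
    return n

# Lemma 14.14 for K = F_7: some k satisfies K = {0} + {k^r} with k^6 = 1.
primitive = [k for k in range(1, 7) if order(k, 7) == 6]
print(bool(primitive))   # True
```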

F2 [X] is a Euclidean domain and so a principal ideal domain. An application of Euclid’s algorithm gives the following useful result.Let us establish a correspondence between Fn and the polynomials on F2 2 modulo X n − 1 by setting n−1 Pa = j=0 aj X j whenever a ∈ (Of course. A polynomial g is a generator for a cyclic code of length n if and only if it divides X n − 1. r r Exercise 15. (ii) Pa = 0 if and only if a = 0.5.) Notice that the task of ﬁnding irreducible factors (that is factors with no further factorisation) is a ﬁnite one. A code C of length n is cyclic if and only if (working modulo X n −1.) We call g(X) a generator polynomial for C.) From now on we shall talk of the code word f (X) when we mean the code word a with Pa (X) = f (X).12. we shall take n odd from now on. Fn . Lemma 15. 48 . With the notation just established. the cyclic codes are said to be separable. C is cyclic if and only if PC is an ideal of the quotient ring F2 [X]/(X n − 1). X − 1 = X n + 1 but in this context the ﬁrst expression seems more natural. g ∈ PC .) Exercise 15. Lemma 15.6.2. show that (i) Pa + Pb = Pa+b . 2 n Thus we must seek generators among the factors of X n − 1 = X n + 1. (In this case. then f + g ∈ PC . the result can be rather disappointing. In order to avoid this problem and to be able to make use of Lemma 14. (In the language of abstract algebra. (ii) If f ∈ PC and g is any polynomial.3. and using the conventions established above) there exists a polynomial g such that C = {f (X)g(X) : f a polynomial} (In the language of abstract algebra. then X 2 + 1 = (X + 1)2 . Lemma 15. (i) If f. then the product f g ∈ PC .4. If we work with polynomials over F2 . If there are no conditions on n. Thus the quotient F2 [X]/(X n − 1) is a principal ideal domain. A code C in Fn is cyclic if and only if PC = {Pa : a ∈ C} 2 satisﬁes the following two conditions (working modulo X n − 1).

Lemma 15.7. If a cyclic code C of length n has generator g of degree n − r, then

g(X), Xg(X), . . . , X^{r−1}g(X)

form a basis for C.

Cyclic codes are thus easy to specify (we just need to write down the generator polynomial g) and to encode.

Lemma 15.8. Suppose that g(X)h(X) = X^n − 1. Then g is a generator of a cyclic code C, and h is a generator for a cyclic code which is the reverse of C⊥.

As an immediate corollary, we have the following remark.

Lemma 15.9. The dual of a cyclic code is itself cyclic.

Definition 15.10. A defining set for a cyclic code C is a set A of elements in some field K containing F_2 such that f ∈ F_2[X] belongs to C if and only if f(α) = 0 for all α ∈ A.

(Note that, if C has length n, A must be a set of zeros of X^n − 1.)

Consider codes of length n. We know that X^n + 1 factorises completely over some larger finite field and, since n is odd, we know, by Lemma 14.12, that it has no repeated factors. The same is therefore true for any polynomial dividing it.

Lemma 15.11. Suppose that g is a generator of a cyclic code C of odd length n. Suppose further that g factorises completely into linear factors in some field K containing F_2. If g = g_1 g_2 . . . g_k with each g_j irreducible over F_2, and A is a subset of the set of all the roots of all the g_j containing at least one root of each g_j [1 ≤ j ≤ k], then

C = {f ∈ F_2[X] : f(α) = 0 for all α ∈ A}.

Lemma 15.12. Suppose that A = {α_1, α_2, . . . , α_r} is a defining set for a cyclic code C in some field K containing F_2. Let B be the n × r matrix over K whose jth column is

(1, α_j, α_j^2, . . . , α_j^{n−1})^T.

Then a vector a ∈ F_2^n is a code word in C if and only if aB = 0 in K.

The columns in B are not parity checks in the usual sense, since the code entries lie in F_2 and the computations take place in the larger field K.

With this background we can discuss a famous family of codes known as the BCH (Bose, Ray-Chaudhuri, Hocquenghem) codes. Recall that a primitive nth root of unity is a root α of X^n − 1 = 0 such that every root is a power of α.

Definition 15.13. Suppose that n is odd and K is a field containing F_2 in which X^n − 1 factorises into linear factors. Suppose that α ∈ K is a primitive nth root of unity. A cyclic code C with defining set A = {α, α^2, . . . , α^{δ−1}} is a BCH code of design distance δ.

Note that the rank of C will be n − k, where k is the degree of the product of those irreducible factors of X^n − 1 over F_2 which have a zero in A. Notice also that k may be very much larger than δ.

Example 15.14. (i) If K is a field containing F_2, then (a + b)^2 = a^2 + b^2 for all a, b ∈ K.
(ii) If P ∈ F_2[X] and K is a field containing F_2, then P(a)^2 = P(a^2) for all a ∈ K.
(iii) Let K be a field containing F_2 in which X^7 − 1 factorises into linear factors. If β is a root of X^3 + X + 1 in K, then β is a primitive root of unity and β^2 is also a root of X^3 + X + 1.
(iv) We continue with the notation of (iii). The BCH code with {β, β^2} as defining set is Hamming's original (7, 4) code.

The next theorem contains the key fact about BCH codes.

Theorem 15.15. The minimum distance for a BCH code is at least as great as the design distance.

Our proof of Theorem 15.15 relies on showing that the matrix B of Lemma 15.12 is of full rank for a BCH code. To do this we use a result which every undergraduate knew in 1950.

Lemma 15.16. [The van der Monde determinant] We work over a field K. The determinant

| 1  x_1  x_1^2  . . .  x_1^{n−1} |
| 1  x_2  x_2^2  . . .  x_2^{n−1} |
| .   .    .             .        |
| 1  x_n  x_n^2  . . .  x_n^{n−1} |

equals ∏_{1≤j<i≤n} (x_i − x_j).
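Example 15.14 (iv) can be verified computationally: the binary cyclic code of length 7 generated by g(X) = X^3 + X + 1 has 16 codewords and minimum weight 3, matching both Hamming's (7, 4) code and the design distance 3 of the defining set {β, β^2}. The bit-mask convention and helper names below are ours.

```python
def pmul(a, b):
    """Polynomial product over F2; bit i of an int is the coefficient of X^i."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def pmod(a, m):
    """Remainder of a on division by m, over F2."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

g = 0b1011                      # g(X) = X^3 + X + 1
modulus = (1 << 7) | 1          # X^7 + 1
code = {pmod(pmul(f, g), modulus) for f in range(16)}   # f runs over deg <= 3
min_wt = min(bin(c).count("1") for c in code if c)
print(len(code), min_wt)        # 16 3
```

So the minimum distance equals the design distance here; Theorem 15.15 only guarantees "at least".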

How can we construct a decoder for a BCH code? From now on, until the end of this section, we shall suppose that we are using the BCH code C described in Definition 15.13. In other words, C will have length n and defining set A = {α, α^2, . . . , α^{δ−1}}, where α is a primitive nth root of unity in K. Let t be the largest integer with 2t + 1 ≤ δ. We show how we can correct up to t errors.

Suppose that a codeword c = (c_0, c_1, . . . , c_{n−1}) is transmitted and that the string received is r. We write e = r − c and assume that

E = {0 ≤ j ≤ n − 1 : e_j ≠ 0}

has no more than t members. In other words, e is the error vector and we assume that there are no more than t errors. We write

c(X) = Σ_{j=0}^{n−1} c_j X^j,  r(X) = Σ_{j=0}^{n−1} r_j X^j,  e(X) = Σ_{j=0}^{n−1} e_j X^j.

Definition 15.17. The error locator polynomial is

σ(X) = ∏_{j∈E} (1 − α^j X)

and the error co-locator is

ω(X) = Σ_{i=0}^{n−1} e_i α^i ∏_{j∈E, j≠i} (1 − α^j X).

Informally, we write

ω(X) = Σ_{i=0}^{n−1} e_i α^i σ(X)/(1 − α^i X).

We take ω(X) = Σ_j ω_j X^j and σ(X) = Σ_j σ_j X^j. Note that ω has degree at most t − 1 and σ degree at most t. Note also that we know that σ_0 = 1, so both the polynomials ω and σ have t unknown coefficients.

We wish to make use of relations of the form

1/(1 − α^j X) = Σ_{r=0}^{∞} (α^j X)^r.

Unfortunately, it is not clear what meaning to assign to such a relation. One way round is to work modulo Z^{2t} (more formally, to work in K[Z]/(Z^{2t})). We then have Z^u ≡ 0 for all integers u ≥ 2t. Thus, if we work modulo Z^{2t}, as we shall from now on, we may define

1/(1 − α^j Z) = Σ_{m=0}^{2t−1} (α^j Z)^m.

Lemma 15.18. If we work modulo Z^{2t}, then

(1 − α^j Z) Σ_{m=0}^{2t−1} (α^j Z)^m ≡ 1.

Lemma 15.19. If the error locator polynomial σ is given, the value of e, and so of c, can be obtained directly. (Of course, σ determines E, E determines e, and e determines c.)

Lemma 15.20. With the conventions already introduced,
(i) ω(Z)/σ(Z) ≡ Σ_{m=0}^{2t−1} Z^m e(α^{m+1}),
(ii) e(α^{m+1}) = r(α^{m+1}) for all 0 ≤ m ≤ 2t − 1,
(iii) ω(Z)/σ(Z) ≡ Σ_{m=0}^{2t−1} Z^m r(α^{m+1}),
(iv) ω(Z) ≡ ( Σ_{m=0}^{2t−1} Z^m r(α^{m+1}) ) σ(Z),
(v) ω_j = Σ_{u+v=j} r(α^{u+1}) σ_v for all 0 ≤ j ≤ t − 1,
(vi) 0 = Σ_{u+v=j} r(α^{u+1}) σ_v for all t ≤ j ≤ 2t − 1,
(vii) the conditions in (vi) determine σ completely.

Part (vi) of Lemma 15.20 completes our search for a decoding method. It is worth noting that the system of equations in part (v) suffices to determine the pair σ and ω directly.

Compact disc players use BCH codes. Of course, errors are likely to occur in bursts (corresponding to scratches etc.) and this is dealt with by

distributing the bits (digits) in a single codeword over a much longer stretch of track. The code used can correct a burst of 4000 consecutive errors (2.5 mm of track).

Just as pure algebra has contributed greatly to the study of error correcting codes, so the study of error correcting codes has contributed greatly to the study of pure algebra. The story of one such contribution is set out in T. M. Thompson's From Error-correcting Codes through Sphere Packings to Simple Groups [9] — a good, not too mathematical, account of the discovery of the last sporadic simple groups by Conway and others. But that is another story.

Unfortunately, none of the codes we have considered work anywhere near the Shannon bound (see Theorem 10.1). We might suspect that this is because they are linear, but Elias has shown that this is not the case. (We just state the result without proof.)

Theorem 15.21. In Theorem 10.1 we can replace 'code' by 'linear code'.

The advance of computational power and the ingenuity of the discoverers[31] have led to new codes which appear to come close to the Shannon bounds.

[31] People like David MacKay, now better known for his superb 'Sustainable Energy Without the Hot Air' — rush out and read it.

16 Shift registers

In this section we move towards cryptography, but the topic discussed will turn out to have connections with the decoding of BCH codes as well.

Definition 16.1. A general feedback shift register is a map f : F_2^d → F_2^d given by

f(x_0, x_1, . . . , x_{d−2}, x_{d−1}) = (x_1, x_2, . . . , x_{d−1}, C(x_0, x_1, . . . , x_{d−2}, x_{d−1}))

with C a map C : F_2^d → F_2. The stream associated to an initial fill (y_0, y_1, . . . , y_{d−1}) is the sequence (y_n) with

y_n = C(y_{n−d}, y_{n−d+1}, . . . , y_{n−1}) for all n ≥ d.

Example 16.2. If the general feedback shift register f given in Definition 16.1 is a permutation, then C is linear in the first variable, that is,

C(x_0, x_1, . . . , x_{d−1}) = x_0 + C′(x_1, . . . , x_{d−1}).
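Definition 16.1 is easy to animate: iterate the state map and read off the stream. The feedback map C below is an arbitrary (non-linear) choice of ours; the finite state space forces the stream to be eventually periodic.

```python
def shift_register(C, state):
    """Yield the stream of a general feedback shift register with feedback map C."""
    while True:
        yield state[0]
        state = state[1:] + (C(*state),)

C = lambda x0, x1, x2: x0 ^ (x1 & x2)     # our example feedback map, d = 3
gen = shift_register(C, (1, 0, 0))
stream = [next(gen) for _ in range(20)]
print(stream[:9])   # [1, 0, 0, 1, 0, 0, 1, 0, 0] -- this fill gives period 3
```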

Definition 16.3. We say that the function f of Definition 16.1 is a linear feedback register if

C(x_0, x_1, . . . , x_{d−1}) = a_0 x_0 + a_1 x_1 + · · · + a_{d−1} x_{d−1}

with a_j ∈ F_2 and a_0 = 1.

Exercise 16.4. Discuss briefly the effect of omitting the condition a_0 = 1 from Definition 16.3.

Consider the linear recurrence

x_n = a_0 x_{n−d} + a_1 x_{n−d+1} + · · · + a_{d−1} x_{n−1}   ⋆

with a_j ∈ F_2 and a_0 ≠ 0. We consider the auxiliary polynomial

C(X) = X^d − a_{d−1} X^{d−1} − · · · − a_1 X − a_0.

The discussion of the linear recurrence ⋆ over F_2 follows the IA discussion of the same problem over R, but is complicated by the fact that n^2 = n in F_2. In the exercise below, (n choose v) is the appropriate polynomial in n.

Exercise 16.5. (i) Suppose K is a field containing F_2 such that the auxiliary polynomial C has a root α in K. Show that x_n = α^n is a solution of ⋆ in K.
(ii) Suppose K is a field containing F_2 such that the auxiliary polynomial C has d distinct roots α_1, α_2, . . . , α_d in K. Show that the general solution of ⋆ in K is

x_n = Σ_{j=1}^{d} b_j α_j^n

for some b_j ∈ K. If x_0, x_1, . . . , x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.
(iii) Work out the first few lines of Pascal's triangle modulo 2. Show that the functions f_j : Z → F_2 given by

f_j(n) = (n choose j), reduced mod 2,

are linearly independent, in the sense that

Σ_{j=0}^{m} b_j f_j(n) = 0 for all n

implies b_j = 0 for 0 ≤ j ≤ m.
(iv) Suppose K is a field containing F_2 such that the auxiliary polynomial C factorises completely into linear factors. If the root α_u has multiplicity m(u) [1 ≤ u ≤ q], show that the general solution of ⋆ in K is

x_n = Σ_{u=1}^{q} Σ_{v=0}^{m(u)−1} b_{u,v} (n choose v) α_u^n

for some b_{u,v} ∈ K. If x_0, x_1, . . . , x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.

A strong link with the problem of BCH decoding is provided by Theorem 16.7 below.

Definition 16.6. If we have a sequence (or stream) x_0, x_1, x_2, . . . of elements of F_2, then its generating function G is given by

G(Z) = Σ_{n=0}^{∞} x_n Z^n.

The link with BCH codes is established by looking at Lemma 15.20 (iii) and making the following remark.

Theorem 16.7. The stream (x_n) comes from a linear feedback generator with auxiliary polynomial C if and only if the generating function for the stream is (formally) of the form

G(Z) = B(Z)/C(Z)

with B a polynomial of degree strictly smaller than that of C.

If we can recover C from G, then we have recovered the linear feedback generator from the stream.

Lemma 16.8. If a stream (x_n) comes from a linear feedback generator with auxiliary polynomial C of degree d, then C is determined by the condition

G(Z)C(Z) ≡ B(Z) mod Z^{2d}

with B a polynomial of degree at most d − 1.
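The congruence G(Z)C(Z) ≡ B(Z) mod Z^{2d} can be checked on a toy stream. The sketch below is our own: it generates 2d bits from a register of length d = 3 and confirms that the coefficients of Z^d, . . . , Z^{2d−1} in the truncated product vanish, leaving a B of degree at most d − 1.

```python
def lfsr_stream(taps, fill, n):
    """x_m = sum_j taps[j] * x_{m-d+j} over F2; return the first n stream bits."""
    x = list(fill)
    while len(x) < n:
        x.append(sum(t & v for t, v in zip(taps, x[-len(taps):])) % 2)
    return x

d = 3
taps = [1, 1, 0]                        # x_m = x_{m-3} + x_{m-2}
x = lfsr_stream(taps, [1, 0, 0], 2 * d)
# Write the recurrence as a polynomial 1 + c_1 Z + ... + c_d Z^d (constant term 1).
C = [1] + [taps[d - j] for j in range(1, d + 1)]
# Coefficients of G(Z)C(Z) mod Z^(2d):
prod = [sum(x[m - i] & C[i] for i in range(min(m, d) + 1)) % 2 for m in range(2 * d)]
print(prod)   # [1, 0, 1, 0, 0, 0]: terms from Z^3 on vanish, B(Z) = 1 + Z^2
```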

We thus have the following problem.

Problem. Given a generating function G for a stream, and knowing that

G(Z) = B(Z)/C(Z)

with B a polynomial of degree less than that of C and the constant term in C equal to c_0 = 1, recover C.

The Berlekamp–Massey method. In this method we do not assume that the degree d of C is known. The Berlekamp–Massey solution to this problem is based on the observation that, since

Σ_{j=0}^{d} c_j x_{n−j} = 0

(with c_0 = 1) for all n ≥ d, we have

( x_d      x_{d−1}  . . .  x_1      x_0  ) ( 1   )   ( 0 )
( x_{d+1}  x_d      . . .  x_2      x_1  ) ( c_1 )   ( 0 )
(  .        .               .        .   ) (  .  ) = ( . )   ⋆
( x_{2d}   x_{2d−1} . . .  x_{d+1}  x_d  ) ( c_d )   ( 0 )

The Berlekamp–Massey method tells us to look successively at the matrices

A_1 = ( x_0 ),  A_2 = ( x_1 x_0 ; x_2 x_1 ),  A_3 = ( x_2 x_1 x_0 ; x_3 x_2 x_1 ; x_4 x_3 x_2 ),  . . . ,

starting at A_r if it is known that r ≥ d. For each A_j we evaluate det A_j. If det A_j ≠ 0, then j − 1 cannot be d. If det A_j = 0, then j − 1 is a good candidate for d, so we solve ⋆ on the assumption that d = j − 1. (Note that a one dimensional subspace of F_2^{d+1} contains only one non-zero vector.) We then check our candidate for (c_0, c_1, . . . , c_d) over as many terms of the stream as we wish. (A little thought shows that, over F_2, det A_j can only take two values, so there will be many false alarms.) If it fails the test, we then know that d ≥ j and we start again[32].

Note also that the determinant may be evaluated much faster using reduction to (rearranged) triangular form than by Cramer's rule, and that, once the system is in (rearranged) triangular form, it is easy to solve the associated equations. By careful arrangement of the work it is possible to cut down considerably on the labour involved.

[32] Note that, as we have stated it, the Berlekamp–Massey method is not an algorithm in the strict sense of the term, although it becomes one if we put an upper bound on the possible values of d. If no upper
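The claim underlying the procedure above — that a stream from a linear feedback register of length d is pinned down by 2d consecutive bits — can be sanity-checked on a toy example. Berlekamp–Massey does this efficiently via the determinants det A_j; the sketch below (our own helper names, exponential in d, so only for tiny cases) simply brute-forces the shortest recurrence consistent with the whole stream.

```python
def recurrence_ok(x, c):
    """Does x_n = c_1 x_{n-1} + ... + c_d x_{n-d} (mod 2) hold for all available n?"""
    d = len(c)
    return all(x[n] == sum(c[j] * x[n - 1 - j] for j in range(d)) % 2
               for n in range(d, len(x)))

def shortest_lfsr(x):
    """Smallest d, and taps c = [c_1, ..., c_d], reproducing the stream x."""
    for d in range(1, len(x) // 2 + 1):
        for mask in range(1 << d):
            c = [(mask >> j) & 1 for j in range(d)]
            if recurrence_ok(x, c):
                return d, c
    return None

x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]   # built from x_n = x_{n-2} + x_{n-3}
print(shortest_lfsr(x))   # (3, [0, 1, 1]): the register is recovered
```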

bound is put on d, no algorithm is possible because, with a suitable initial stream, a linear feedback register with large d can be made to produce a stream whose initial values would be produced by a linear feedback register with much smaller d. For the same reason the Berlekamp–Massey method will produce the B of smallest degree which gives G, and not necessarily the original B.

In practice, however, the Berlekamp–Massey method is very effective in cases when d is unknown.

The solution of linear equations gives us a method of 'secret sharing'.

Problem 16.9. If the University is attacked by HEFCE[33], the Faculty Board will retreat to a bunker known as Meeting Room 23. Entry to the room involves tapping out a positive integer S (the secret) known only to the Chairman of the Faculty Board. Each of the n members of the Faculty Board knows a certain pair of numbers (their shadow), and it is required that, in the absence of the Chairman, any k members of the Faculty can reconstruct S from their shadows, but no k − 1 members can do so. How can this be done?

Here is one neat solution. Suppose S must lie between 0 and N (it is sensible to choose S at random). The Chairman chooses a prime p > N, sets a_0 = S, chooses integers a_1, a_2, . . . , a_{k−1} at random subject to 0 ≤ a_j ≤ p − 1, and chooses distinct integers x_1, x_2, . . . , x_n at random subject to 1 ≤ x_j ≤ p − 1 (all of these to be kept secret from everybody else), and tells everybody the value of p. She then computes

P(r) ≡ a_0 + a_1 x_r + a_2 x_r^2 + · · · + a_{k−1} x_r^{k−1} (mod p),

choosing 0 ≤ P(r) ≤ p − 1. She then gives the rth member of the Faculty Board the pair of numbers x_r, P(r) (the shadow pair). She then burns her calculations.

Suppose that k members of the Faculty Board, with shadow pairs (y_j, Q_j) = (x_{r_j}, P(r_j)) [1 ≤ j ≤ k], are together. By the properties of the Van der Monde

[33] An institution like SPECTRE, but without the charm. It is not generally known that CMS, when reversed, forms the initials of 'Secret Missile Command'.
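The scheme just described is a version of Shamir's secret sharing. The sketch below (our own parameter choices) builds shadow pairs and reconstructs the secret; instead of solving the linear system directly, it interpolates the polynomial at 0 by Lagrange's formula, which amounts to the same thing.

```python
p = 101                      # a prime p > N
S = 57                       # the secret, with 0 <= S <= N < p
k, n = 3, 5
coeffs = [S, 12, 34]         # a_0 = S; a_1, a_2 chosen "at random"
shares = [(x, sum(a * x**i for i, a in enumerate(coeffs)) % p)
          for x in range(1, n + 1)]

def reconstruct(points, p):
    """Lagrange interpolation at 0 over F_p from k shadow pairs (x_j, Q_j)."""
    total = 0
    for i, (xi, qi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + qi * num * pow(den, -1, p)) % p
    return total

print(reconstruct(shares[:3], p))   # 57: any k = 3 shadows recover S
```

With only k − 1 shadows, every candidate value of z_0 remains consistent, exactly as the determinant argument below shows.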

determinant (see Lemma 15.16),

| 1  y_1  y_1^2  . . .  y_1^{k−1} |
| 1  y_2  y_2^2  . . .  y_2^{k−1} |
| .   .    .             .        |
| 1  y_k  y_k^2  . . .  y_k^{k−1} |   ≡   ∏_{1≤j<i≤k} (y_i − y_j)   ≢ 0   (mod p),

so the system of equations

z_0 + y_1 z_1 + y_1^2 z_2 + · · · + y_1^{k−1} z_{k−1} ≡ Q_1
z_0 + y_2 z_1 + y_2^2 z_2 + · · · + y_2^{k−1} z_{k−1} ≡ Q_2
z_0 + y_3 z_1 + y_3^2 z_2 + · · · + y_3^{k−1} z_{k−1} ≡ Q_3
. . .
z_0 + y_k z_1 + y_k^2 z_2 + · · · + y_k^{k−1} z_{k−1} ≡ Q_k   (mod p)

has a unique solution z. But we know that a is a solution, so z = a and the secret S = z_0.

On the other hand,

| y_1      y_1^2      . . .  y_1^{k−1}     |
| y_2      y_2^2      . . .  y_2^{k−1}     |
| .         .                 .            |
| y_{k−1}  y_{k−1}^2  . . .  y_{k−1}^{k−1} |   ≡   y_1 y_2 · · · y_{k−1} ∏_{1≤j<i≤k−1} (y_i − y_j)   ≢ 0   (mod p),

so, whatever value of z_0 we take, the system of equations

z_0 + y_1 z_1 + y_1^2 z_2 + · · · + y_1^{k−1} z_{k−1} ≡ Q_1
z_0 + y_2 z_1 + y_2^2 z_2 + · · · + y_2^{k−1} z_{k−1} ≡ Q_2
z_0 + y_3 z_1 + y_3^2 z_2 + · · · + y_3^{k−1} z_{k−1} ≡ Q_3
. . .
z_0 + y_{k−1} z_1 + y_{k−1}^2 z_2 + · · · + y_{k−1}^{k−1} z_{k−1} ≡ Q_{k−1}   (mod p)

has a solution, so k − 1 members of the Faculty Board have no way of saying that any possible value of S is more likely than any other.

One way of looking at this method of 'secret sharing' is to note that a polynomial of degree k − 1 can be recovered from its value at k points, but not from its value at k − 1 points. However, the proof that the method works needs to be substantially more careful.

Exercise 16.10. Is the secret compromised if the values of the x_j become known?

17 A short homily on cryptography

Cryptography is the science of code making. Cryptanalysis is the art of code breaking. The new mathematical science of cryptography, with its promise of codes which are 'provably hard to break', seems to make everything that has gone before irrelevant. It should, however, be observed that the best cryptographic systems of our ancestors (such as diplomatic 'book codes') served their purpose of ensuring secrecy for a relatively small number of messages between a relatively small number of people extremely well. It is the modern requirement for secrecy on an industrial scale, to cover endless streams of messages between many centres, which has made necessary the modern science of cryptography.

More pertinently, it should be remembered that the German Naval Enigma codes not only appeared to be 'provably hard to break' (though not against the modern criteria of what this should mean) but probably were unbreakable in practice[34]. Fortunately the Submarine codes formed part of an 'Enigma system' with certain exploitable weaknesses. (For an account of how these weaknesses arose and how they were exploited, see Kahn's Seizing the Enigma [4].)

Even the best codes are like the lock on a safe. However good the lock is, the safe may be broken open by brute force, or stolen together with its contents, or a key holder may be persuaded by fraud or force to open the lock, or the presumed contents of the safe may have been tampered with before they go into the safe, or . . . In the same way, the coding schemes we shall consider are, considered in isolation, at best cryptographic elements of larger possible cryptographic systems. The planning of cryptographic systems requires not only mathematics but also engineering, economics, psychology, humility and an ability to learn from past mistakes. Two thousand years ago, Lucretius wrote that 'Only recently has the true nature of things been discovered'. In the same way,

[34] Some versions remained unbroken until the end of the war.
mathematicians are apt to feel that ‘Only recently has the true nature of cryptography been discovered’. considered in isolation. economics. It should. Lucretius wrote that ‘Only recently has the true nature of things been discovered’. it should be remembered that the German Naval Enigma codes not only appeared to be ‘provably hard to break’ (though not against the modern criteria of what this should mean) but. the safe may be broken open by brute force. probably were unbreakable in practice34 .

Those who do not learn the lessons of history are condemned to repeat them.

In considering a cryptographic system, it is important to consider its purpose. Consider a message M sent by A to B. Here are some possible aims.

Secrecy. A and B can be sure that no third party X can read the message M.
Integrity. A and B can be sure that no third party X can alter the message M.
Authenticity. B can be sure that A sent the message M.
And, to conclude this non-exhaustive list, Non-repudiation. B can prove to a third party that A sent the message M.

When you fill out a cheque giving the sum both in numbers and words, you are seeking to protect the integrity of the cheque. When you sign a traveller's cheque 'in the presence of the paying officer', the process is intended, from the bank's point of view, to protect authenticity and, from your point of view, to produce non-repudiation.

If secrecy is aimed at, how long must the secret be kept? Some military and financial secrets need only remain secret for a few hours; others must remain secret for years.

Another point to consider is the level of security aimed at. Clearly, we must consider the level of security required: it hardly matters if a few people use forged tickets to travel on the underground, but it does matter if a single unauthorised individual can gain privileged access to a bank's central computer system. Here are three possible levels.

(1) Prospective opponents should find it hard to compromise your system even if they are in possession of a plentiful supply of encoded messages C_i.
(2) Prospective opponents should find it hard to compromise your system even if they are in possession of a plentiful supply of pairs (M_i, C_i) of messages M_i together with their encodings C_i.
(3) Prospective opponents should find it hard to compromise your system even if they are allowed to produce messages M_i and given their encodings C_i.

Clearly, safety at level (3) implies safety at level (2), and safety at level (2) implies safety at level (1). Roughly speaking, the best Enigma codes satisfied (1). The German Navy believed, on good but mistaken grounds, that they satisfied (2). Level (3) would have appeared evidently impossible to attain until a few years ago. Nowadays, level (3) is considered a minimal requirement for a really secure system.

18 Stream ciphers

One natural way of enciphering is to use a stream cipher. We work with streams (that is, sequences) of elements of F_2. We use a cipher stream k_0, k_1, k_2, . . . . The plain text stream p_0, p_1, p_2, . . . is enciphered as the cipher text stream z_0, z_1, z_2, . . . , given by

z_n = p_n + k_n.

In our case a deciphering method is given by the observation that

p_n = z_n + k_n.

(Indeed, writing α(p) = p + k, we see that the enciphering function α has the property that α^2 = ι, the identity map.) Knowledge of an enciphering method makes it easy to work out a deciphering method, and vice versa. Ciphers like this are called symmetric. This is an example of a private key or symmetric system: the security of the system depends on a secret (in our case the cipher stream k) shared between the encipherer and the decipherer.

In the one-time pad, first discussed by Vernam in 1926, the cipher stream is a random sequence k_j = K_j, where the K_j are independent random variables with Pr(K_j = 0) = Pr(K_j = 1) = 1/2. If we write Z_j = p_j + K_j, then we see that the Z_j are independent random variables with Pr(Z_j = 0) = Pr(Z_j = 1) = 1/2. Thus (in the absence of any knowledge of the ciphering stream) the codebreaker is just faced by a stream of perfectly random binary digits. Decipherment is impossible in principle. The secret services of the former Soviet Union were particularly fond of one-time pads. (The anguished debate in the US about codes and privacy refers to the privacy of large organisations and their clients, not the privacy of communication from individual to individual.)

It is sometimes said that it is hard to find random sequences, and it is, rather harder than might appear at first sight, but it is not too difficult to rig up a system for producing 'sufficiently random' sequences[35]. The real difficulty lies in the necessity for sharing the secret

[35] Take ten of your favourite long books, convert them to binary sequences x_{j,n} and set k_n = Σ_{j=1}^{10} x_{j,1000j+n} + s_n, where s_n is the output of your favourite 'pseudo-random number generator' (in this connection see Exercise 27.16). Give a memory stick with a copy of k to your friend and, provided both of you obey some elementary rules, your correspondence will be safe from MI5.
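The Vernam scheme is a one-line program. The minimal sketch below is ours: the same XOR map both enciphers and deciphers, which is exactly the statement α^2 = ι.

```python
import secrets

def vernam(stream, key):
    """z_n = p_n + k_n over F2 -- the same map enciphers and deciphers."""
    return [p ^ k for p, k in zip(stream, key)]

plain = [1, 0, 1, 1, 0, 0, 1, 0]
key = [secrets.randbits(1) for _ in plain]   # one-time pad: fresh random bits
cipher = vernam(plain, key)
print(vernam(cipher, key) == plain)          # True
```

Note that the security claim rests entirely on the key bits being fresh, uniform and never reused.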

Suppose that f is a linear feedback register of length d. If a random sequence is reused it ceases to be random (it becomes ‘the same code as last Wednesday’ or the ‘the same code as Paris uses’) so. though this is a mistake. the Soviet Union’s need for one-time pads suddenly increased and it appears that pages were reused in diﬀerent pads. k1 . x1 . 0. kd−1 ) as the secret seed. in particular. his warning: . (i) f (x0 . .sequence k. then. US code-breakers managed to decode messages which. although several years old. the Soviet Union’s one-time pads became genuinely one-time again and the coded messages became indecipherable. .3 by using Lemma 14. One way that we might try to generate our ciphering string is to use a general feedback shift register f of length d with the initial ﬁll (k0 . . Show that the decimal expansion of a rational number must be a recurrent expansion. Note. However. . . Exercise 18. there will exist N. . we would like to start from a short shared secret ‘seed’ and generate a ciphering string k that ‘behaves like a random sequence’. This leads us straight into deep philosophical waters37 . . . . . Some theory should be used. . there is an illuminating discussion in Chapter III of Knuth’s marvellous The Art of Computing Programming [7]. show that a recurrent decimal represents a rational number. In practice. . k1 . . Lemma 18. xd−1 ) = (0. under the pressure of the cold war. (ii) Given any initial ﬁll (k0 . . .3. kd−1 ). . still provided useful information. . M ≤ 2d − 1 such that the output stream k satisﬁes kr+N = kr for all r ≥ M . new one-time pads must be sent out. . . given any initial ﬁll (k0 . . 37 Where we drown at once. 36 62 .1. Lemma 18.2. We can complement Lemma 18. If random bits can be safely communicated. there will exist N. xd−1 ) = (x0 . If f is a general feedback shift register of length d. or otherwise. it is one which it is very diﬃcult to exploit. . xd−1 ) if (x0 . . As might be expected. x1 . 
so can ordinary messages and the exercise becomes pointless. In 1941. random numbers should not be generated with a method chosen at random. in my opinion) modern view is that any sequence that can be generated by a program of reasonable length from a ‘seed’ of reasonable size is automatically non-random. . . by considering geometric series. M ≤ 2d such that the output stream k satisﬁes kr+N = kr for all r ≥ M . kd−1 ). since the best (at least. when there is a great deal of code traﬃc36 . Give a bound for the period in terms of the quotient.16 and the associated discussion. . . k1 . x1 . Conversely. . . . . If the reader reﬂects. she will see that. After 1944. 0).
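The eventual periodicity promised by Lemmas 18.2 and 18.3 is easy to observe experimentally. The following sketch (illustrative only; the register length and taps are my own arbitrary choices, not an example from the text) implements a linear feedback shift register over F_2 and measures the period of its output stream.

```python
def lfsr(taps, fill):
    """Yield the output stream of a linear feedback shift register over F2.

    taps: indices of the register cells XORed together to form the feedback bit.
    fill: the initial fill (k0, ..., k_{d-1}), i.e. the secret seed.
    """
    state = list(fill)
    while True:
        yield state[0]
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]

def period(taps, fill):
    """Return the eventual period of the stream by detecting a repeated state."""
    state = tuple(fill)
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + (feedback,)
        step += 1
    return step - seen[state]

# A register of length 4 with recurrence k_{n+4} = k_n + k_{n+1}, whose
# auxiliary polynomial X^4 + X + 1 is primitive over F2, so the period
# attains the maximum 2^4 - 1 = 15 for a non-trivial fill (Lemma 18.3).
print(period((0, 1), (1, 0, 0, 0)))  # 15
```

The all-zero fill gives the constant zero stream (period 1), in line with Lemma 18.3(i).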

The plain text stream p0, p1, p2, ... is enciphered as the cipher text stream z0, z1, z2, ... given by z_n = p_n + k_n.

Lemma 18.4. Suppose that an unknown cipher stream k0, k1, k2, ... is produced by an unknown linear feedback register f of unknown length d ≤ D. If we are given p0, p1, ..., p_{2D-1} and z0, z1, ..., z_{2D-1}, then we can find k_r for all r.

Thus, if we have a message of length twice the length of the linear feedback register, together with its encipherment, the code is broken. Using the Berlekamp–Massey method, we see that stream codes based on linear feedback registers are unsafe at level (2). It is easy to construct immensely complicated looking linear feedback registers with hundreds of registers but, from the point of view of a determined, well equipped and technically competent opponent, cryptographic systems based on such registers are the equivalent of leaving your house key hidden under the door mat. Professionals say that such systems seek 'security by obscurity'.

It is well known that short period streams are dangerous. During World War II the British Navy used codes whose period was adequately long for peace time use. The massive increase in traffic required by war time conditions meant that the period was now too short. By dint of immense toil, German naval code breakers were able to identify coincidences and crack the British codes. However, whilst short periods are definitely unsafe, it does not follow that long periods guarantee safety.

Lemma 18.5. A linear feedback register of length d attains its maximal period 2^d − 1 (for a non-trivial initial fill) when the roots of the auxiliary polynomial38 are primitive elements of F_{2^d}.

(We will note why this result is plausible, but we will not prove it. See Exercise 27.19 for a proof.) Lemma 18.5 shows that long periods are easy to obtain but, as we have just said, long periods do not guarantee safety. Nonetheless, systems based on linear feedback registers are cheap and quite effective if you do not wish to baffle the CIA, but merely prevent little old ladies in tennis shoes watching subscription television without paying for it. Whatever they may say in public, large companies are happy to tolerate a certain level of fraud. So long as 99.9% of the calls made are paid for, the profits of a telephone company are essentially unaffected by the .1% which 'break the system'.

What happens if we try some simple tricks to increase the complexity of the cipher text stream?

Lemma 18.6. If x_n is a stream produced by a linear feedback system of length N with auxiliary polynomial P and y_n is a stream produced by a linear feedback system of length M with auxiliary polynomial Q, then x_n + y_n is a stream produced by a linear feedback system of length N + M with auxiliary polynomial P(X)Q(X).

Lemma 18.7. Suppose that x_n is a stream produced by a linear feedback system of length N with auxiliary polynomial P and y_n is a stream produced by a linear feedback system of length M with auxiliary polynomial Q. Let P have roots α1, α2, ..., αN and Q have roots β1, β2, ..., βM over some field K ⊇ F_2. Then x_n y_n is a stream produced by a linear feedback system of length NM with auxiliary polynomial ∏_{1≤i≤N} ∏_{1≤j≤M} (X − α_i β_j).

We shall probably only prove Lemmas 18.6 and 18.7 in the case when all the roots are distinct, leaving the more general case as an easy exercise. We shall also not prove that the polynomial ∏_{1≤i≤N} ∏_{1≤j≤M} (X − α_i β_j) obtained in Lemma 18.7 actually lies in F_2[X], but (for those who are familiar with the phrase in quotes) this is an easy exercise in 'symmetric functions of roots'.

Note that this means that adding streams from two linear feedback systems is no more economical than producing the same effect with one. Indeed, the situation may be worse, since a stream produced by a linear feedback system of given length may, possibly, also be produced by another linear feedback system of shorter length. Here is an even easier remark.

Lemma 18.8. Suppose that x_n is a stream which is periodic with period N and y_n is a stream which is periodic with period M. Then the streams x_n + y_n and x_n y_n are periodic with periods dividing the lowest common multiple of N and M.

Exercise 18.9. One of the most confidential German codes (called FISH by the British) involved a complex mechanism which the British found could be simulated by two loops of paper tape of length 1501 and 1497. If k_n = x_n + y_n, where x_n is a stream of period 1501 and y_n is a stream of period 1497, what is the longest possible period of k_n? How many consecutive values of k_n would you need to find the underlying linear feedback register using the Berlekamp–Massey method if you did not have the information given in the question? If you had all the information given in the question, how many values of k_n would you need? (Hint: look at x_{n+1497} − x_n.) You have shown that, given k_n for sufficiently many consecutive n, we can find k_n for all n. Can you find x_n for all n?

It might be thought that the lengthening of the underlying linear feedback system obtained in Lemma 18.7 is worth having, but it is bought at a substantial price. Suppose we have 10 streams x_{j,n} (without any peculiar properties) produced by linear feedback registers of length about 100. If we form k_n = ∏_{j=1}^{10} x_{j,n}, then the Berlekamp–Massey method requires of the order of 10^20 consecutive values of k_n, and the periodicity of k_n can be made still more astronomical. Our cipher key stream k_n appears safe from prying eyes. However, it is doubtful if the prying eyes will mind. Let me illustrate this by an informal argument. Observe that (under reasonable conditions) about 2^{-1} of the x_{j,n} will have value 1, so about 2^{-10} of the k_n = ∏_{j=1}^{10} x_{j,n} will have the value 1. Thus, if z_n = p_n + k_n, in more than 999 cases out of a 1000 we will have z_n = p_n. Even if we just combine two streams x_n and y_n in the way suggested, we may expect x_n y_n = 0 for about 75% of the time.

We must not jump to the conclusion that the best way round these difficulties is to use a non-linear feedback generator f. This is not the easy way out that it appears. If chosen by an amateur, the complicated looking f so produced will have the apparent advantage that we do not know what is wrong with it and the very real disadvantage that we do not know what is wrong with it.

Here is another example where the apparent complexity of the cipher key stream is substantially greater than its true complexity. The following is a simplified version of a standard satellite TV decoder.

Lemma 18.10. We have 3 streams x_n, y_n, z_n produced by linear feedback registers. If the cipher key stream is defined by

k_n = x_n if z_n = 0,
k_n = y_n if z_n = 1,

then k_n = (y_n + x_n)z_n + x_n and the cipher key stream is that produced by a linear feedback register.

38 In this sort of context we shall sometimes refer to the 'auxiliary polynomial' as the 'feedback polynomial'.
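The identity k_n = (y_n + x_n)z_n + x_n is just the arithmetic of F_2: when z_n = 0 the right hand side collapses to x_n, and when z_n = 1 it collapses to y_n. A quick exhaustive check (the bit values here are arbitrary, not streams from any particular register):

```python
# Verify over F2 that (y + x)*z + x selects x when z = 0 and y when z = 1.
# In F2, addition is XOR (^) and multiplication is AND (&).
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            k = ((y ^ x) & z) ^ x
            assert k == (x if z == 0 else y)
print("identity checked")
```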

Another approach is to observe that, so far as the potential code breaker is concerned, the cipher stream method only combines the 'unknown secret' (here, the feedback generator f together with the seed (k0, k1, ..., k_{d-1})) with the unknown message p in a rather simple way. It might be better to consider a system with two functions F : F_2^m × F_2^n → F_2^q and G : F_2^m × F_2^q → F_2^n such that

G(k, F(k, p)) = p.

Here k will be the shared secret, p the message and z = F(k, p) the encoded message, which can be decoded by using the fact that G(k, z) = p.

However, arrangements like this have the disadvantage that the message p must be entirely known before it is transmitted and the encoded message z must have been entirely received before it can be decoded. Stream ciphers have the advantage that they can be decoded 'on the fly'. They are also much more error tolerant: a mistake in the coding, transmission or decoding of a single element only produces an error in a single place of the sequence. There will continue to be circumstances where stream ciphers are appropriate, but the considerations above continue to apply. (In the next section we shall see that an even better arrangement is possible.)

There is one further remark to be made. Suppose, as is often the case, that n = q and that we know the 'encoded message' z. Suppose also that we know F, that the 'unknown secret' or 'key' k ∈ K ⊆ F_2^m and that the 'unknown message' p ∈ P ⊆ F_2^n. We are then faced with the problem:

⋆ Solve the system z = F(k, p) with k ∈ K, p ∈ P.

Speaking roughly, the task is hopeless unless ⋆ has a unique solution39. Speaking even more roughly, this is unlikely to happen if |K||P| > 2^n and is likely to happen if 2^n is substantially greater than |K||P|. (Here, as usual, |B| denotes the number of elements of B.)

Now recall the definition of the information rate given in Definition 6.2. If the message set M has information rate µ and the key set (that is, the shared secret set) K has information rate κ, then, taking logarithms, we see that, if n − mκ − nµ is substantially greater than 0, then ⋆ is likely to have a unique solution but, if it is substantially smaller, this is unlikely. The ideas just introduced can be formalised by the notion of unicity distance.

Definition 18.11. The unicity distance of a code is the number of bits of message required to exceed the number of bits of information in the key plus the number of bits of information in the message.

Example 18.12. Suppose that, instead of using binary code, we consider an alphabet of 27 letters (the English alphabet plus a space), so that we must take logarithms to the base 27. The English language treated in this way has information rate about .4. (This is very much a ball park figure: the information rate is certainly less than .5 and almost certainly greater than .2.)

(i) In the Caesar code, we replace the ith element of our alphabet by the (i + j)th (modulo 27). The shared secret is a single letter (the code for A, say). We have m = 1, κ = 1 and µ ≈ .4, so n − mκ − nµ ≈ .6n − 1. If n = 1 (so n − mκ − nµ ≈ −.4), it is obviously impossible to decode the message. If n = 10 (so n − mκ − nµ ≈ 5), a simple search through the 27 possibilities will almost always give a single possible decode.

(ii) In a simple substitution code, a permutation of the alphabet is chosen and applied to each letter of the code in turn. The shared secret is a sequence of 26 letters (given the coding of the first 26 letters, the 27th can then be deduced). We have m = 26, κ = 1 and µ ≈ .4, so n − mκ − nµ ≈ .6n − 26. In The Dancing Men, Sherlock Holmes solves such a code with n = 68 (so n − mκ − nµ ≈ 15) without straining the reader's credulity too much, and I would think that most of my audience could solve such a code with n = 200 (so n − mκ − nµ ≈ 100).

(iii) In the one-time pad, m = n and κ = 1, so (if µ > 0) n − mκ − nµ = −nµ → −∞ as n → ∞.

(iv) Note that the larger µ is, the slower n − mκ − nµ increases. This corresponds to the very general statement that the higher the information rate of the messages, the harder it is to break the code in which they are sent.

39 'According to some, the primordial Torah was inscribed in black flames on white fire. At the moment of its creation, it appeared as a series of letters not yet joined up in the form of words. For this reason, in the Torah rolls there appear neither vowels, nor punctuation, nor accents, for the original Torah was nothing but a disordered heap of letters. Furthermore, had it not been for Adam's sin, these letters might have been joined differently to form another story. For the kabalist, God will abolish the present ordering of the letters, or else will teach us how to read them according to a new disposition, only after the coming of the Messiah.' ([1], Chapter 2.) A reader of this footnote has directed me to the International Torah Codes Society.
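The arithmetic of Example 18.12(i) can be seen in miniature by brute force. In the sketch below (the 27-letter alphabet handling, the sample message and the tiny stand-in 'dictionary' are all my own illustrative choices), a long enough ciphertext survives only one of the 27 shifts as recognisable text, while a one-letter ciphertext decodes ambiguously under every shift.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 27 symbols: letters plus a space

def caesar(text, shift):
    """Replace the ith alphabet symbol by the (i + shift)th, modulo 27."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 27] for c in text)

# A tiny stand-in for 'recognisable English' (illustrative only).
WORDS = {"ATTACK AT DAWN", "RETREAT AT ONCE"}

cipher = caesar("ATTACK AT DAWN", 5)
candidates = [s for s in range(27) if caesar(cipher, -s % 27) in WORDS]
print(candidates)  # [5] -- only the true shift survives

# With a one-letter message, every shift gives a legal letter, so there is
# no unambiguous decode: n = 1 lies below the unicity distance.
short = caesar("A", 5)
print(len({caesar(short, -s % 27) for s in range(27)}))  # 27 distinct 'decodes'
```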

forms the ﬁrst modern treatment of cryptography in the open literature. 40 −1 Available on the web and in his Collected Papers. could break it. Suppose. The scheme can be generalised still further by splitting the secret in two. G and k it is very hard to ﬁnd l (ii) if we do not know l then. We then have −1 (pKA )KA = p. p) the encoded message which can be decoded by using the fact that G(l. Then the code is secure at what we called level (3). p)) = p. we discussed a general coding scheme depending on a shared secret key k known to the encoder and the decoder. Here (k. published in 1949. F (k. p).) If we only use our code once to send a message which is substantially shorter than the unicity distance. we can be conﬁdent that no code breaker.) However. So far the idea is interesting but not exciting. 19 Asymmetric systems Towards the end of the previous section. even if we know F . p). (A one-time pad has unicity distance inﬁnity. the fact that there is a unique solution to a problem does not mean that it is easy to ﬁnd. but need not know k. Suppose that the conditions speciﬁed above hold. Lemma 19. p n Consider a system with two functions F : Fm ×Fn → Fq and G : F2 ×Fq → F2 2 2 2 2 such that G(l. however. some of which are spelled out in the next section. simply because there is no unambiguous decode. l) will be be a pair of secrets. to believe that there exist codes for which the unicity distance is essentially irrelevant to the maximum safe length of a message. Let us write F (k. In this scheme. the encoder must know k. but need not know l and the decoder must know l.1. p the message and z = F (k. z) = zKA and think of pKA as −1 participant A’s encipherment of p and zKA as participant B’s decipherment of z. however gifted. We have excellent reasons.(The notion of information content brings us back to Shannon whose paper Communication theory of secrecy systems 40 . z) = p. G and k. p) = pKA and G(l. 68 . it is very hard to ﬁnd p from F (k. 
Such a system is called asymmetric. Then an opponent who is entitled to demand the encodings zi of any messages pi they choose to specify will still ﬁnd it very hard to ﬁnd p when given F (k. that we can show that (i) knowing F .

Lemma 19.1 tells us that such a system is secure however many messages are sent. Moreover, if we think of A as a spy-master, he can broadcast KA to the world (that is why such systems are called public key systems) and invite anybody who wants to spy for him to send him secret messages in total conﬁdence41 . It is all very well to describe such a code, but do they exist? There is very strong evidence that they do, but, so far, all mathematicians have been able to do is to show that provided certain mathematical problems which are believed to be hard are indeed hard, then good codes exist. The following problem is believed to be hard. Problem Given an integer N , which is known to be the product N = pq of two primes p and q, ﬁnd p and q. Several schemes have been proposed based on the assumption that this factorisation is hard. (Note, however, that it is easy to ﬁnd large ‘random’ primes p and q.) We give a very elegant scheme due to Rabin and Williams. It makes use of some simple number theoretic results from IA and IB. The reader may well have seen the following results before. In any case, they are easy to obtain by considering primitive roots. Lemma 19.2. If p is an odd prime the congruence x2 ≡ d mod p is soluble if and only if d ≡ 0 or d(p−1)/2 ≡ 1 modulo p. Lemma 19.3. Suppose p is a prime such that p = 4k − 1 for some integer k. Then, if the congruence x2 ≡ d mod p has any solution, it has dk as a solution. We now call on the Chinese remainder theorem. Lemma 19.4. Let p and q be primes of the form 4k − 1 and set N = pq. Then the following two problems are of equivalent diﬃculty. (A) Given N and d ﬁnd all the m satisfying m2 ≡ d mod N. (B) Given N ﬁnd p and q.
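Lemmas 19.3 and 19.4 can be made concrete with small numbers. For a prime p ≡ 3 (mod 4) (that is, of the form 4k − 1), the exponent k equals (p + 1)/4, and the Chinese remainder theorem stitches the roots mod p and mod q into the four roots mod N. The primes below are tiny illustrative choices, not values anyone would use; this sketch relies on Python 3.8+ for `pow(x, -1, m)`.

```python
def sqrt_mod_prime(d, p):
    """A square root of d modulo a prime p with p % 4 == 3 (Lemma 19.3:
    d^k works, where p = 4k - 1, i.e. the exponent is (p + 1) // 4)."""
    return pow(d, (p + 1) // 4, p)

def roots_mod_pq(d, p, q):
    """The four square roots of d modulo N = pq, via the Chinese remainder theorem."""
    N = p * q
    rp = sqrt_mod_prime(d % p, p)
    rq = sqrt_mod_prime(d % q, q)
    b = pow(q, -1, p)   # q * b ≡ 1 (mod p), so q*b is 1 mod p and 0 mod q
    a = pow(p, -1, q)   # p * a ≡ 1 (mod q), so p*a is 0 mod p and 1 mod q
    roots = set()
    for sp in (rp, p - rp):
        for sq in (rq, q - rq):
            roots.add((sp * q * b + sq * p * a) % N)
    return sorted(roots)

p, q = 7, 11            # both of the form 4k - 1; N = 77
s = 9 * 9 % (p * q)     # the 'cipher block' sent for message block 9
print(roots_mod_pq(s, p, q))  # [2, 9, 68, 75] -- the message 9 is among them
```

The decoder recovers four candidate blocks; as the text says, in practice three of them will be garbage.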

41 Although we make statements about certain codes along the lines of 'It does not matter who knows this', you should remember the German naval saying 'All radio traffic is high treason'. If any aspect of a code can be kept secret, it should be kept secret.



(Note that, provided that d ≢ 0, knowing the solutions to (A) for any d gives us the four solutions for the case d = 1.) The result is also true, but much harder to prove, for general primes p and q. At the risk of giving aid and comfort to followers of the Lakatosian heresy, it must be admitted that the statement of Lemma 19.4 does not really tell us what the result we are proving is, although the proof makes it clear that the result (whatever it may be) is certainly true. However, with more work, everything can be made precise.

We can now give the Rabin–Williams scheme. The spy-master A selects two very large primes p and q. (Since he has only done an undergraduate course in mathematics, he will take p and q of the form 4k − 1.) He keeps the pair (p, q) secret, but broadcasts the public key N = pq. If B wants to send him a message, she writes it in binary code and splits it into blocks of length m with 2^m < N < 2^{m+1}. Each of these blocks is a number r_j with 0 ≤ r_j < N. B computes s_j such that r_j^2 ≡ s_j modulo N and sends s_j. The spy-master (who knows p and q) can use the method of Lemma 19.4 to find one of four possible values for r_j (the four square roots of s_j). Of these four possible message blocks, it is almost certain that three will be garbage, so the fourth will be the desired message. If the reader reflects, she will see that the ambiguity of the root is genuinely unproblematic. (If the decoding is mechanical, then fixing 50 bits scattered throughout each block will reduce the risk of ambiguity to negligible proportions.) Slightly more problematic, from the practical point of view, is the possibility that someone could be known to have sent a very short message, that is, to have started with an m such that 1 ≤ m ≤ N^{1/2}, but, provided sensible precautions are taken, this should not occur.

If I Google 'Casino', then I am instantly put in touch with several of the world's 'most trusted electronic casinos' who subscribe to 'responsible gambling' and who have their absolute probity established by 'internationally recognised Accredited Test Facilities'. Given these assurances, it seems churlish to introduce Alice and Bob, who live in different cities, can only communicate by e-mail and are so suspicious of each other that neither will accept the word of the other as to the outcome of the toss of a coin. If, in spite of this difficulty, Alice and Bob wish to play heads and tails (the technical expression is 'bit exchange' or 'bit sharing'), then the ambiguity of the Rabin–Williams scheme becomes an advantage. Let us set out the steps of a 'bit sharing scheme' based on Rabin–Williams.

STEP 1 Alice chooses at random two large primes p and q such that p ≡ q ≡ 3 mod 4. She computes n = pq and sends n to Bob.

STEP 2 Bob chooses a random integer r with 1 < r < n/2. (He wishes to hide r from Alice, so he may take whatever other precautions he wishes in choosing r.) He computes m ≡ r^2 mod n and sends m to Alice.

STEP 3 Since Alice knows p and q, she can easily compute the 4 square roots of m modulo n. Exactly two of the roots, r_1 and r_2 say, will satisfy 1 < r_i < n/2. (If s is a root, so is −s.) However, Alice has no means of telling which is r. Alice writes out r_1 and r_2 in binary and chooses a place (the kth digit, say) where they differ. She then tells Bob 'I choose the value u for the kth bit'.

STEP 4 Bob tells Alice the value of r. If the value of the kth bit of r is u, then Alice wins. If not, Bob wins. Alice checks that r^2 ≡ m mod n. Since r_1^{-1} r_2 is a square root of unity which is neither 1 nor −1, knowing r_1 and r_2 is equivalent to factoring n, so she knows that Bob could not lie about the value of r. Thus Alice is happy.

STEP 5 Alice tells Bob the values of p and q. He checks that p and q are primes (see Exercise 27.12 for why he does this) and finds r_1 and r_2. After Bob has verified that r_1 and r_2 do indeed differ in the kth bit, he also is happy, since there is no way Alice could know from inspection of m which root he started with.
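The five steps can be walked through with toy numbers (the primes, Bob's choice of r, and the brute-force root search below are illustrative only; at real sizes Alice would use the method of Lemma 19.4 rather than trial search):

```python
import random

p, q = 19, 23            # STEP 1: both congruent to 3 mod 4 (toy sizes)
n = p * q                # n = 437, sent to Bob

r = 101                  # STEP 2: Bob's secret, with 1 < r < n/2
m = r * r % n            # sent to Alice

# STEP 3: Alice finds the four square roots of m and keeps the two below n/2.
roots = sorted(x for x in range(1, n) if x * x % n == m)  # brute force at toy size
r1, r2 = [x for x in roots if x < n / 2]

# Alice picks a bit position where r1 and r2 differ and guesses that bit of r.
k = next(i for i in range(r2.bit_length()) if (r1 >> i) & 1 != (r2 >> i) & 1)
u = random.choice([0, 1])

# STEP 4: Bob reveals r; Alice wins exactly when the kth bit of r equals u.
alice_wins = ((r >> k) & 1) == u
print(r1, r2)  # 32 101 -- Bob's r is one of these, but Alice cannot tell which
```

Since r1 and r2 differ in bit k and r is one of them, Alice's guess is right with probability exactly 1/2, which is the point of the protocol.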

20

Commutative public key systems

In the previous sections we introduced the coding and decoding functions KA and KA^{-1} with the property that

(pKA)KA^{-1} = p,

and satisfying the condition that knowledge of KA did not help very much in finding KA^{-1}. We usually require, in addition, that our system be commutative in the sense that

(pKA^{-1})KA = p

and that knowledge of KA^{-1} does not help very much in finding KA. The Rabin–Williams scheme, as described in the last section, does not have this property. Commutative public key codes are very flexible and provide us with simple means for maintaining integrity, authenticity and non-repudiation. (This is not to say that non-commutative codes can not do the same; simply that commutativity makes many things easier.)


Integrity and non-repudiation Let A 'own a code', that is, know both KA and KA^{-1}. Then A can broadcast KA^{-1} to everybody, so that everybody can decode but only A can encode. (We say that KA^{-1} is the public key and KA the private key.) Then, for example, A could issue tickets to the castle ball

carrying the coded message 'admit Joe Bloggs', which could be read by the recipients and the guards but would be unforgeable. (Anybody can copy a coded message, but only A can control the content.) For the same reason, A could not deny that he had issued the invitation.

Authenticity If B wants to be sure that A is sending a message, then B can send A a harmless random message q. If B receives back a message p such that pKA^{-1} ends with the message q, then A must have sent it to B.

Signature Suppose now that B also owns a commutative code pair (KB, KB^{-1}) and has broadcast KB^{-1}. If A wants to send a message p to B, he computes q = pKA and sends pKB^{-1} followed by qKB^{-1}. B can now use the fact that (xKB^{-1})KB = x to recover p and q. B then observes that qKA^{-1} = p, so, since only A can produce a pair (p, q) with this property, A must have written it.

There is now a charming little branch of the mathematical literature based on these ideas, in which Albert gets Bertha to authenticate a message from Caroline to David using information from Eveline, Fitzpatrick, Gilbert and Harriet, whilst Ingrid, Jacob, Katherine and Laszlo play bridge without using a pack of cards. However, a cryptographic system is only as strong as its weakest link. Unbreakable password systems do not prevent computer systems being regularly penetrated by 'hackers' and, however 'secure' a transaction on the net may be, it can still involve a rogue at one end and a fool at the other.

The most famous candidate for a commutative public key system is the RSA (Rivest, Shamir, Adleman) system42. The reader will have met the RSA in IA, but we will push the ideas a little bit further. It was the RSA system that first convinced the mathematical community that public key systems might be feasible.

Lemma 20.1. Let p and q be primes. If N = pq and λ(N) = lcm(p − 1, q − 1), then M^{λ(N)} ≡ 1 (mod N) for all integers M coprime to N.

42 A truly patriotic lecturer would refer to the ECW system, since Ellis, Cocks and Williamson discovered the system earlier. However, they worked for GCHQ and their work was kept secret.

Since we wish to appeal to Lemma 19.4, we shall assume in what follows that we have secretly chosen large primes p and q. We choose an integer e and then use Euclid’s algorithm to check that e and λ(N ) are coprime and to ﬁnd an integer d such that de ≡ 1 (mod λ(N )).

If Euclid's algorithm reveals that e and λ(N) are not coprime, we try another e. Since others may be better psychologists than we are, we would be wise to use some sort of random method for choosing p, q and e. The public key includes the value of e and N, but we keep secret the value of d. Given a number M with 1 ≤ M ≤ N − 1, we encode it as the integer E with 1 ≤ E ≤ N − 1 given by E ≡ M^d (mod N).
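With toy numbers, the key setup and the encode/decode round trip look as follows (all numerical values are illustrative only; real primes have hundreds of digits, and `pow(e, -1, lam)` plays the role of Euclid's algorithm here, requiring Python 3.8+). Note that, as in the text, the secret exponent d is used to encode and the public e to decode.

```python
from math import gcd

p, q = 19, 23                                   # secretly chosen primes (toy sizes)
N = p * q                                       # public modulus, 437
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lambda(N) = lcm(p-1, q-1) = 198

e = 5                                           # public exponent, coprime to lambda(N)
assert gcd(e, lam) == 1                         # otherwise we would try another e
d = pow(e, -1, lam)                             # secret exponent: d*e ≡ 1 (mod lambda(N))

M = 42                                          # a message, coprime to N
E = pow(M, d, N)                                # encoding E ≡ M^d (mod N)
assert pow(E, e, N) == M                        # public decoding: E^e ≡ M^{de} ≡ M (mod N)
print(d)  # 119, since 5 * 119 = 595 = 3 * 198 + 1
```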

The public decoding method is given by the observation that E^e ≡ M^{de} ≡ M (mod N) for M coprime to N. (The probability that M is not coprime to N is so small that it can be neglected.) As was observed in IA, high powers are easy to compute.

Exercise 20.2. Show how M^{2^n} can be computed using n multiplications. If 1 ≤ r ≤ 2^n, show how M^r can be computed using at most 2n multiplications.

To show that (provided that factoring N is indeed hard) finding d from e and N is hard, we use the following lemma.

Lemma 20.3. Suppose that d, e and N are as above. Set de − 1 = 2^a b where b is odd.
(i) a ≥ 1.
(ii) If y ≡ x^b (mod N) and y ≢ 1, then there exists an r with 0 ≤ r ≤ a − 1 such that z = y^{2^r} ≢ 1 but z^2 ≡ 1 (mod N).

Combined with Lemma 19.4, the idea of Lemma 20.3 gives a fast probabilistic algorithm where, by making random choices of x, we very rapidly reduce the probability that we can not find p and q to as close to zero as we wish.

Lemma 20.4. The problem of finding d from the public information e and N is essentially as hard as factorising N.
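Exercise 20.2 is the familiar 'square and multiply' idea: repeated squaring gives M^{2^n} in n multiplications, and selecting squarings according to the binary digits of r gives M^r in at most 2n multiplications. A counting sketch (the multiplication counter is my own addition, for illustration):

```python
def power_mod(M, r, N):
    """Compute M^r mod N by square-and-multiply, counting multiplications."""
    result, square, mults = 1, M % N, 0
    while r:
        if r & 1:
            result = result * square % N  # use this power of M: one multiplication
            mults += 1
        r >>= 1
        if r:
            square = square * square % N  # one squaring per remaining binary digit
            mults += 1
    return result, mults

val, mults = power_mod(3, 1000, 437)
print(val == pow(3, 1000, 437), mults <= 2 * (1000).bit_length())  # True True
```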


Remark 1 At first glance, we seem to have done as well for the RSA code as for the Rabin–Williams code. But this is not so. In Lemma 19.4 we showed that finding the four solutions of M^2 ≡ E (mod N) was equivalent to factorising N. In the absence of further information, finding one root is as hard as finding another. Thus the ability to break the Rabin–Williams code (without some tremendous stroke of luck) is equivalent to the ability to factor N. On the other hand, it is, a priori, possible that someone may find a decoding method for the RSA code which does not involve knowing d. They would have broken the RSA code without finding d. It must, however, be said that, in spite of this problem, the RSA code is much used in practice and the Rabin–Williams code is not.

Remark 2 It is natural to ask what evidence there is that the factorisation problem really is hard. Properly organised, trial division requires O(N^{1/2}) operations to factorise a number N. This order of magnitude was not bettered until 1972, when Lehman produced a O(N^{1/3}) method. In 1974, Pollard43 produced a O(N^{1/4}) method. In 1979, as interest in the problem grew because of its connection with secret codes, Lenstra made a breakthrough to a O(e^{c((log N)(log log N))^{1/2}}) method with c ≈ 2. Since then some progress has been made (Pollard reached O(e^{2((log N)(log log N))^{1/3}})) but, in spite of intense efforts, mathematicians have not produced anything which would be a real threat to codes based on the factorisation problem.

A series of challenge numbers is hosted on the Wikipedia article entitled RSA. In 1996, it was possible to factor 100 (decimal) digit numbers routinely, 150 digit numbers with immense effort, but 200 digit numbers were out of reach. In May 2005, the 200 digit challenge number was factored by F. Bahr, M. Boehm, J. Franke

43 Although mathematically trained, Pollard worked outside the professional mathematical community.



and T. Kleinjunge as follows 27997833911221327870829467638722601621 07044678695542853756000992932612840010 76093456710529553608560618223519109513 65788637105954482006576775098580557613 57909873495014417886317894629518723786 9221823983 = 35324619344027701212726049781984643 686711974001976250236493034687761212536 79423200058547956528088349 × 7925869954478333033347085841480059687 737975857364219960734330341455767872818 152135381409304740185467 but the 210 digit challenge 24524664490027821197651766357308801846 70267876783327597434144517150616008300 38587216952208399332071549103626827191 67986407977672324300560059203563124656 12184658179041001318592996199338170121 49335034875870551067 remains (as of mid-2008) unfactored. Organisations which use the RSA and related systems rely on ‘security through publicity’. Because the problem of cracking RSA codes is so notorious, any breakthrough is likely to be publicly announced44 . Moreover, even if a breakthrough occurs, it is unlikely to be one which can be easily exploited by the average criminal. So long as the secrets covered by RSA-type codes need only be kept for a few months rather than forever45 , the codes can be considered to be one of the strongest links in the security chain.

44 And if not, is most likely to be a government rather than a Mafia secret.
45 If a sufficiently robust 'quantum computer' could be built, then it could solve the factorisation problem and the discrete logarithm problem (mentioned later) with high probability extremely fast. It is highly unlikely that such a machine would be or could be kept secret, since it would have many more important applications than code breaking.



21 Trapdoors and signatures

It might be thought that secure codes are all that are needed to ensure the security of communications, but this is not so. It is not necessary to read a message to derive information from it46. In the same way, it may not be necessary to be able to write a message in order to tamper with it. Here is a somewhat far fetched but worrying example.

Example 21.1. Suppose that, by wire tapping or by looking over peoples' shoulders, I discover that a bank creates messages in the form (M1, M2), where M1 is the name of the client and M2 is the sum to be transferred to the client's account. The messages are then encoded, according to the RSA scheme discussed after Lemma 20.1, as Z1 = M1^d and Z2 = M2^d. I then enter into a transaction with the bank which adds $1000 to my account, observe the resulting Z1 and Z2, and then transmit Z1 followed by Z2^3. What will (I hope) be the result of this transaction?

We say that the RSA scheme is vulnerable to 'homomorphism attack', that is to say, an attack which makes use of the fact that our code is a homomorphism. (If θ(M) = M^d, then θ(M1 M2) = θ(M1)θ(M2).) One way of increasing security against tampering is to first code our message by a classical coding method and then use our RSA (or similar) scheme on the result.

Exercise 21.2. Discuss briefly the effect of first using an RSA scheme and then a classical code.

However, there is another way forward which has the advantage of wider applicability, since it can also be used to protect the integrity of open (non-coded) messages and to produce password systems. These are the so-called signature systems. (Note that we shall be concerned with the 'signature of the message' and not the signature of the sender.)

Definition 21.3. A signature or trapdoor or hashing function is a mapping H : M → S from the space M of possible messages to the space S of possible signatures.

(Let me admit, at once, that Definition 21.3 is more of a statement of notation than a useful definition.) The first requirement of a good signature function is that the space M should be much larger than the space S, so that H is a many-to-one function (in fact, a great-many-to-one function) and we can not work back from H(M) to M.

46 During World War II, British bomber crews used to spend the morning before a night raid testing their equipment. Unfortunately, this included the radios.

Obviously we should aim at the same kind of security as that offered by our 'level 2' for codes: prospective opponents should find it hard to find H(M) given M, or to work back from H(M) to M, even if they are in possession of a plentiful supply of message–signature pairs (Mi, H(Mi)) of messages Mi together with their encodings Ci. The second requirement is that S should be large, so that a forger can not (sensibly) hope to hit on H(M) by luck. I leave it to the reader to think about level 3 security (or to look at section 12.6 of [10]).

Here is a signature scheme due to Elgamal⁴⁷. The message sender A chooses a very large prime p, some integer 1 < g < p and some other integer u with 1 < u < p. A then releases the values of p, g and y = g^u (modulo p) but keeps the value of u secret. Whenever he sends a message m (some positive integer), he chooses another integer k with 1 ≤ k ≤ p − 2 at random and computes r and s with 1 ≤ r ≤ p − 1 and 0 ≤ s ≤ p − 2 by the rules⁴⁸

   r ≡ g^k (mod p),            (*)
   m ≡ ur + ks (mod p − 1).    (**)

Lemma 21.4. If conditions (*) and (**) are satisfied, then

   g^m ≡ y^r r^s (mod p).

If A sends the message m followed by the signature (r, s), the recipient need only verify the relation g^m ≡ y^r r^s (mod p) to check that the message is authentic⁴⁹.

There is a small point which I have glossed over here and elsewhere. Unless k and p − 1 are coprime, the equation (**) may not be soluble. However, the quickest way to solve (**), if it is soluble, is Euclid's algorithm, which will also reveal if (**) is insoluble. If (**) is insoluble, we simply choose another k at random and try again.

Since k is random, it is believed that the only way to forge signatures is to find u from g^u (or k from g^k), and it is believed that this problem, which is known as the discrete logarithm problem, is very hard. Needless to say, even if it is impossible to tamper with a message–signature pair, it is always possible to copy one. Every message should thus contain a unique identifier such as a time stamp.

47 This is Dr Elgamal's own choice of spelling according to Wikipedia.
48 As usual, some randomisation scheme should be used.
49 Sometimes m is replaced by some hash function H(m) of m, so (**) becomes H(m) ≡ ur + ks (mod p − 1). In this case the recipient checks that g^H(m) ≡ y^r r^s (mod p).
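As a concrete illustration, the scheme just described can be sketched in a few lines of Python. The numbers p, g and u below are made-up toy values chosen purely for demonstration; any real use needs p to be a very large prime.

```python
import math
import random

p = 467            # toy prime (a real scheme needs a very large prime)
g = 2              # base with 1 < g < p
u = 127            # A's secret, 1 < u < p
y = pow(g, u, p)   # released: p, g and y = g^u (mod p); u stays secret

def sign(m):
    """Pick a random k and solve (*) r = g^k (mod p) and
    (**) m = u*r + k*s (mod p-1) for s; retry when (**) is insoluble."""
    while True:
        k = random.randrange(1, p - 1)
        if math.gcd(k, p - 1) != 1:   # (**) may be insoluble; choose a new k
            continue
        r = pow(g, k, p)
        s = (m - u * r) * pow(k, -1, p - 1) % (p - 1)
        return r, s

def verify(m, r, s):
    """The recipient's check from Lemma 21.4: g^m = y^r * r^s (mod p)."""
    return pow(g, m, p) == pow(y, r, p) * pow(r, s, p) % p

r, s = sign(100)
assert verify(100, r, s)        # a genuine signature is accepted
assert not verify(101, r, s)    # an altered message is rejected
```

Note that `pow(k, -1, p - 1)` (Python 3.8+) performs exactly the Euclid's-algorithm inversion mentioned above.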

The evidence that the discrete logarithm problem is very hard is of the same kind of nature and strength as the evidence that the factorisation problem is very hard.

We conclude our discussion with a description of the Diffie–Hellman key exchange system, which is also based on the discrete logarithm problem. The modern coding schemes which we have discussed have the disadvantage that they require lots of computation. This is not a disadvantage when we deal slowly with a few important messages but, for the Web, where we must deal speedily with a lot of less than world-shattering messages sent by impatient individuals, this is a grave disadvantage. Classical coding schemes are fast but become insecure with reuse. Key exchange schemes use modern codes to communicate a new secret key for each message. Once the secret key has been sent slowly, a fast classical method based on the secret key is used to encode and decode the message. Since a different secret key is used each time, the classical code is secure.

How is this done? Suppose A and B are at opposite ends of a tapped telephone line. A sends B a (randomly chosen) large prime p and a randomly chosen g with 1 < g < p − 1. Since the telephone line is insecure, A and B must assume that p and g are public knowledge. A now chooses randomly a secret number α and tells B the value of g^α (modulo p). B chooses randomly a secret number β and tells A the value of g^β (modulo p). Since g^(αβ) ≡ (g^α)^β ≡ (g^β)^α, both A and B can compute k = g^(αβ) modulo p, and k becomes the shared secret key. The eavesdropper is left with the problem of finding k ≡ g^(αβ) from knowledge of g, g^α and g^β (modulo p). It is conjectured that this is essentially as hard as finding α and β from the values of g, g^α and g^β (modulo p), and this is the discrete logarithm problem.

22 Quantum cryptography

In the days when messages were sent in the form of letters, suspicious people might examine the creases where the paper was folded for evidence that the letter had been read by others. Our final cryptographic system has the advantage that it too will reveal attempts to read it. It also has the advantage that, instead of relying on the unproven belief that a certain mathematical task is hard, it depends on the fact that a certain physical task is impossible⁵⁰.

50 If you believe our present theories of the universe.

We shall deal with a highly idealised system. (The business of dealing with realistic systems is a topic of active research within the faculty.) Quantum mechanics tells us that a polarised photon has a state

   φ = α|↕⟩ + β|↔⟩

where α, β ∈ R, α² + β² = 1, |↕⟩ is the vertically polarised state and |↔⟩ is the horizontally polarised state. Such a photon will pass through a vertical polarising filter with probability α², and its state will then be |↕⟩. It will pass through a horizontal polarising filter with probability β², and its state will then be |↔⟩. We have an orthonormal basis consisting of |↕⟩ and |↔⟩. We now consider a second basis given by

   (1/√2)|↕⟩ + (1/√2)|↔⟩  and  (1/√2)|↕⟩ − (1/√2)|↔⟩,

in which the states correspond to polarisation at angles π/4 and −π/4 to the horizontal. Observe that a photon in either of these states will have a probability 1/2 of passing through either a vertical or a horizontal filter and will then be in the appropriate state.

Suppose Eve⁵¹ intercepts a photon passing between Alice and Bob. If Eve knows that it is either horizontally or vertically polarised, then she can use a vertical filter. If the photon passes through, she knows that it was vertically polarised when Alice sent it and can pass on a vertically polarised photon to Bob. If the photon does not pass through, she knows that the photon was horizontally polarised and can pass on a horizontally polarised photon to Bob. However, if Alice's photon was actually diagonally polarised (at angle ±π/4), this procedure will result in Eve sending Bob a photon which is horizontally or vertically polarised. It is certain that anyone who can do better than Eve would get the Nobel prize for physics, since they would have overturned the basis of Quantum Mechanics. (It is possible that the finder of a fast factorising method would get a Fields medal.)

Let us see how this can (in principle) be used to produce a key exchange scheme (so that Alice and Bob can agree on a random number to act as the basis for a classical code). The system we sketch is called the BB84 system (since it was invented by Bennett and Brassard in 1984), but there is another system invented by Ekert.

STEP 1 Alice produces a secret random sequence a1 a2 . . . of bits (zeros and ones) and Bob produces another secret random sequence b1 b2 . . . of bits.

STEP 2 Alice produces another secret random sequence c1 c2 . . . of bits. She transmits it to Bob as follows.

51 This is a traditional pun.

If aj = 0 and cj = 0, she uses a vertically polarised photon. If aj = 0 and cj = 1, she uses a horizontally polarised photon. If aj = 1 and cj = 0, she uses a 'left diagonally' polarised photon. If aj = 1 and cj = 1, she uses a 'right diagonally' polarised photon.

STEP 3 If bj = 0, Bob uses a vertical polariser to examine the jth photon. If he records a vertical polarisation, he sets dj = 0; if a horizontal, he sets dj = 1. If bj = 1, Bob uses a π/4 diagonal polariser to examine the jth photon. If he records a left diagonal polarisation, he sets dj = 0; if a right, he sets dj = 1.

STEP 4 Bob and Alice use another communication channel to tell each other the values of the aj and bj. Of course, they should try to keep these communications secret, but we shall assume that the worst has happened and these values become known to Eve. Alice and Bob only look at the 'good cases' when aj = bj. In such cases, if Eve does not examine the associated photon, then dj = cj: if bj = 0 then aj = 0, so the photon is vertically or horizontally polarised and Bob's vertical polariser reads cj exactly; if bj = 1 then aj = 1, so the photon is diagonally polarised and Bob's diagonal polariser again reads cj exactly.

If Eve does examine the associated photon, then, with probability 1/2, Eve records the correct polarisation and sends Bob a correctly polarised photon, so that dj = cj. With probability 1/2, she uses the wrong kind of polariser. To see what happens then, we examine the case when cj = 0, aj = bj = 0 and Eve uses a diagonal polariser. (The other cases may be treated in exactly the same way.) Since Eve records a diagonal polarisation, she will send a diagonally polarised photon to Bob and, since Bob's polariser is vertical, he will record a vertical polarisation with probability 1/2. Thus, if Eve examines the photon, then with probability 1/4 we will find dj ≠ cj.

STEP 5 If the sequences are long, we can be pretty sure, by the law of large numbers, that aj = bj in about half the cases. (If not, Bob and Alice can agree to start again.) In particular, we can ensure that, with probability of at least 1 − ε/4 (where ε is chosen in advance), the number of agreements is sufficiently large for the purposes set out below.

STEP 6 Alice uses another communication channel to tell Bob the value of a randomly chosen sample of good cases. If the number of discrepancies is greater than the chosen level, Alice and Bob will abandon the attempt to communicate. Standard statistical techniques tell Alice and Bob that, if the number of discrepancies is below a certain level, the probability that Eve is intercepting more than a previously chosen proportion p of photons is less than ε/4.

STEP 7 If Eve is intercepting less than a proportion p of photons and q > p (with q chosen in advance), the probability that she will have intercepted more than a proportion q of the remaining 'good' photons is less than ε/4.

Although we shall not do this, the reader who has ploughed through these

notes will readily accept that Bob and Alice can use the message conveyed through the remaining good photons to construct a common secret such that Eve has probability less than ε/4 of guessing it. Thus, unless they decide that their messages are being partially read, Alice and Bob can agree a shared secret with probability less than ε that an eavesdropper can guess it.

There are various gaps in the exposition above. First, we have assumed that Eve must hold her polariser at a small fixed number of angles. A little thought shows that allowing her a free choice of angle will make little difference. Secondly, since physical systems always have imperfections, some 'good' photons will produce errors even in the absence of Eve. This means that p in STEP 5 must be chosen above the 'natural noise level' and the sequences must be longer but, again, this ought to make little difference. There is a further engineering problem that it is very difficult just to send single photons every time. If there are too many groups of photons, then Eve only need capture one and let the rest go, so we can not detect eavesdropping. If there are only a few, then the values of p and q can be adjusted to take account of this.

There are several networks in existence which employ quantum cryptography. Quantum cryptography has definite advantages when matched individually against RSA, secret sharing (using a large number of independent channels) or one-time pads. It is less easy to find applications where it is better than the best choice of one of these three 'classical' methods⁵². Of course, quantum cryptography will appeal to those who need to persuade others that they are using the latest and most expensive technology to guard their secrets. If smiling white coated technicians install big gleaming machines with 'Unbreakable Quantum Code Company' painted in large letters above the keyboard in the homes of Alice and Bob, it does not automatically follow that their communications are safe. Money will buy the appearance of security. Only thought will buy the appropriate security for a given purpose at an appropriate cost. And even then we can not be sure.

However, as I said before, coding schemes are, at best, cryptographic elements of larger possible cryptographic systems. Complex systems are easier to disrupt than simple ones. As we know,
There are known knowns.
There are things we know we know.
We also know

52 One problem is indicated by the first British military action in World War I, which was to cut the undersea telegraph cables linking Germany to the outside world.

There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know⁵³.

23 Further reading

For many students this will be one of the last university mathematics courses they will take. Although the twin subjects of error-correcting codes and cryptography occupy a small place in the grand panorama of modern mathematics, it seems to me that they form a very suitable topic for such a final course. Kline's magnificent Mathematical Thought from Ancient to Modern Times [5] is pervaded by the melancholy thought that, though the problems will not run out, they may become more and more baroque and inbred. 'You are not the mathematicians your parents were' whispers Kline 'and your problems are not the problems your parents' were.' Outsiders often think of mathematicians as guardians of abstruse but settled knowledge. Even those who understand that there are still problems unsettled ask what mathematicians will do when they run out of problems. However, when we look at this course, we see that the idea of error-correcting codes did not exist before 1940. The best designs of such codes depend on the kind of 'abstract algebra' that historians like Kline and Bell consider a dead end, and lie behind the superior performance of CD players and similar artifacts.

In order to go further into the study of codes, whether secret or error correcting, we need to go into the question of how the information content of a message is to be measured. 'Information theory' has its roots in the code breaking of World War II (though technological needs would doubtless have led to the same ideas shortly thereafter anyway). Its development required a level of sophistication in treating probability which was simply not available in the 19th century. (Even the Markov chain is essentially 20th century⁵⁴.) The question of what makes a calculation difficult could not even have

53 Rumsfeld.
54 We are now in the 21st century, but I suspect that we are still part of the mathematical 'long 20th century' which started in the 1880s with the work of Cantor and like-minded contemporaries.

been thought about until Gödel's theorem (itself a product of the great 'foundations crisis' at the beginning of the 20th century). Developments by Turing and Church of Gödel's theorem gave us a theory of computational complexity which is still under development today. At a more subtle level, the invention of the electronic computer has produced a cultural change in the attitude of mathematicians towards algorithms. Before 1950, the construction of algorithms was a minor interest of a few mathematicians. (Gauss and Jacobi were considered unusual in the amount of thought they gave to actual computation.) Today, we would consider a mathematician as much a maker of algorithms as a prover of theorems. The notion of the probabilistic algorithm which hovered over much of our discussion of secret codes is a typical invention of the last decades of the 20th century. There are links with the profound (and very 20th century) question of what constitutes a random number.

Although both the subjects of error correcting and secret codes are now 'mature', in the sense that they provide usable and well tested tools for practical application, they still contain deep unanswered questions. For example: How close to the Shannon bound can a 'computationally easy' error correcting code get? Do provably hard public codes exist? The question of whether there exist 'provably hard' public codes is intertwined with still unanswered questions in complexity theory. Even if these questions are too hard, there must surely exist error correcting and public codes based on new ideas⁵⁵. Such ideas would be most welcome and, although they are most likely to come from the professionals, they might come from outside the usual charmed circles.

The economic and practical importance of transmitting, storing and processing data far outweighs the importance of hiding it. Of course, hiding data is more romantic.

For budding cryptologists and cryptographers (as well as those who want a good read), Kahn's The Codebreakers [3] has the same role as is taken by Bell's Men of Mathematics for budding mathematicians. Those who wish to learn about error correction from the horse's mouth will consult Hamming's own book on the matter [2]. For the present course, the best book I know for further reading is Welsh [10]. After this, the book of Goldie and Pinch [8] provides a deeper idea of the meaning of information and its connection with the topic. The book by Koblitz [6] develops the number-theoretic background. Finally, I conclude with a quotation from Galbraith (referring to his time as ambassador to India) taken from Koblitz's entertaining text [6]:

I had asked that a cable from Washington to New Delhi . . . be

55 Just as quantum cryptography was.

reported to me through the Toronto consulate. It arrived in code. They brought it to me at the airport — a mass of numbers. I asked if they assumed I could read it. They said no. I asked how they managed. They said that when something arrived in code, they phoned Washington and had the original read to them.

References

[1] U. Eco, The Search for the Perfect Language (English translation), Blackwell, Oxford, 1995.
[2] R. W. Hamming, Coding and Information Theory (2nd edition), Prentice Hall, 1986.
[3] D. Kahn, The Codebreakers: The Story of Secret Writing, MacMillan, New York, 1967.
[4] D. Kahn, Seizing the Enigma, Houghton Mifflin, Boston, 1991.
[5] M. Kline, Mathematical Thought from Ancient to Modern Times, OUP, 1972.
[6] N. Koblitz, A Course in Number Theory and Cryptography, Springer, 1987. (A lightly revised edition has recently appeared.)
[7] D. E. Knuth, The Art of Computer Programming, Addison-Wesley. (The third edition of Volumes I to III is appearing during this year and the next (1998–9).)
[8] C. M. Goldie and R. G. E. Pinch, Communication Theory, CUP, 1991.
[9] T. M. Thompson, From Error-correcting Codes through Sphere Packings to Simple Groups, Carus Mathematical Monographs 21, MAA, Washington DC, 1983.
[10] D. Welsh, Codes and Cryptography, OUP, 1988.

24 Exercise Sheet 1

There is a widespread superstition, believed both by supervisors and supervisees, that exactly twelve questions are required to provide full understanding of six hours of mathematics and that the same twelve questions should be appropriate for students of all abilities and all levels of diligence. I have tried to keep this in mind, but have provided some extra questions in the various exercise sheets for those who scorn such old wives' tales.

Q 24.1. (Exercises 1.1 and 1.2.)
(i) Consider Morse code:

   A → •−∗        B → −•••∗      C → −•−•∗
   D → −••∗       E → •∗         F → ••−•∗
   O → −−−∗       S → •••∗       7 → −−•••∗

Decode − • − • ∗ − − − ∗ − • • ∗ • ∗.
(ii) Consider ASCII code:

   A → 1000001    B → 1000010    C → 1000011
   a → 1100001    b → 1100010    c → 1100011
   + → 0101011    ! → 0100001    7 → 0110111

Encode b7!. Decode 110001111000011100010.

Q 24.2. The product of two codes cj : Aj → Bj∗ is the code g : A1 × A2 → (B1 ∪ B2)∗ given by g(a1, a2) = c1(a1)c2(a2). Show that the product of two prefix-free codes is prefix-free, but that the product of a decodable code and a prefix-free code need not even be decodable.

Q 24.3. (Exercises 1.4 and 1.7.) Consider two alphabets A and B and a coding function c : A → B∗.
(i) Explain why, if c is injective and fixed length, c is decodable. Explain, without using the notion of prefix-free codes, why, if c is injective and fixed length, c is prefix-free.
(ii) Let A = B = {0, 1}. If c(0) = 0, c(1) = 00, show that c is injective but c∗ is not.
(iii) Let A = {1, 2, 3, 4, 5, 6} and B = {0, 1}. Show that there is no decodable coding c such that all code words have length 2 or less. Show that there is a variable length coding c such that c is injective and all code words have length 2 or less.
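Part (ii) of the last question is easy to check mechanically. The following sketch (Python used purely as notation; the helper names are mine, not the notes') extends a letter coding to strings and tests the prefix-free property:

```python
def star(c, word):
    """Extend a letter coding c to strings: c*(a1 a2 ... an) = c(a1)c(a2)...c(an)."""
    return ''.join(c[a] for a in word)

def prefix_free(c):
    """True if no code word is a prefix of a different code word."""
    ws = list(c.values())
    return not any(i != j and ws[j].startswith(ws[i])
                   for i in range(len(ws)) for j in range(len(ws)))

# The code of Q 24.3 (ii): injective on letters, but c* is not injective,
# since the strings '00' and '1' share the encoding '00'.
c = {'0': '0', '1': '00'}
assert star(c, '00') == star(c, '1') == '00'
assert not prefix_free(c)
```

By contrast, the prefix-free code 0 → 0, 1 → 10 passes the same test.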

Q 24.4. (Exercises 2.5 and 2.7.)
(i) Consider 64 messages Mj: M1 has probability 1/2, M2 has probability 1/4 and Mj has probability 1/248 for 3 ≤ j ≤ 64. Explain why, if we use code words of equal length, then the length of a code word must be at least 6. By using the ideas of Huffman's algorithm (you should not need to go through all the steps), obtain a set of code words such that the expected length of a code word sent is no more than 3. (You need not do detailed computations.)
(ii) Consider 4 messages with the following properties. M1 has probability .23, M2 has probability .24, M3 has probability .26 and M4 has probability .27. Show that any assignment of the code words 00, 01, 10 and 11 produces a best code in the sense of this course.

Q 24.5. Let a, b > 0. Show that loga b = log b / log a.

Q 24.6. (Exercises 2.6 and 2.7.)
(i) Apply Huffman's algorithm to the nine messages Mj, where Mj has probability j/45 for 1 ≤ j ≤ 9.
(ii) We found a Huffman code ch for the system in Example 2.4. Suppose that the probability that letter k is chosen is k/10. Use your calculator to find ⌈− log2 pk⌉ and write down a Shannon–Fano code c. Show that the entropy is approximately 1.85, that E|c(A)| = 2.4 and that E|ch(A)| = 1.9. Check that these results are consistent with the appropriate theorems of the course.

Q 24.7. (Exercise 5.1.) Suppose that we have a sequence Xj of random variables taking the values 0 and 1. Suppose that X1 = 1 with probability 1/2 and Xj+1 = Xj with probability .99, independent of what has gone before.
(i) Suppose we wish to send 10 successive bits Xj Xj+1 . . . Xj+9. Show that, if we associate the sequence of ten zeros with 0, the sequence of ten ones with 10 and any other sequence a0 a1 . . . a9 with 11a0 a1 . . . a9, we have a decodable code which on average requires about 5/2 bits to transmit the sequence.
(ii) Suppose we wish to send the bits Xj Xj+10⁶ Xj+2×10⁶ . . . Xj+9×10⁶. Explain why any decodable code will require on average at least 10 bits to transmit the sequence.
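The figures quoted for the letter-frequency example above (pk = k/10) can be confirmed by machine. This is an illustrative sketch, not part of the notes: it computes the Shannon–Fano lengths ⌈− log2 pk⌉, the expected length of a Huffman code built by merging the two least probable symbols, and the entropy.

```python
import heapq
from math import ceil, log2

p = [0.1, 0.2, 0.3, 0.4]   # letter k chosen with probability k/10

# Shannon–Fano lengths l_k = ceil(-log2 p_k) and their expected value.
sf = [ceil(-log2(q)) for q in p]
E_sf = sum(q * l for q, l in zip(p, sf))

# Huffman: repeatedly merge the two least probable symbols; every merge
# adds one bit to the depth of each symbol inside the merged pair.
heap = [(q, [k]) for k, q in enumerate(p)]
heapq.heapify(heap)
depth = [0] * len(p)
while len(heap) > 1:
    q1, s1 = heapq.heappop(heap)
    q2, s2 = heapq.heappop(heap)
    for k in s1 + s2:
        depth[k] += 1
    heapq.heappush(heap, (q1 + q2, s1 + s2))
E_h = sum(q * l for q, l in zip(p, depth))

entropy = -sum(q * log2(q) for q in p)
```

The Shannon–Fano lengths come out as 4, 3, 2, 2 (average 2.4 bits), Huffman averages 1.9 bits, and the entropy is about 1.85, consistent with the noiseless coding theorem.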

Q 24.8. In Bridge, a 52 card pack is dealt to provide 4 hands of 13 cards each.
(i) If the contents of a hand are conveyed by one player to their partner by a series of nods and shakes of the head, how many movements of the head are required? Show that at least 40 movements are required. Give a simple code requiring 52 movements. [You may assume for simplicity that the player to whom the information is being communicated does not look at her own cards. (In fact this does not make a difference, since the two players do not acquire any shared information by looking at their own cards.)]
(ii) If instead the player uses the initial letters of words (say using the 16 most common letters), how many words will you need to utter⁵⁶?

Q 24.9.
(i) In a comma code, like Morse code, one symbol from an alphabet of m letters is reserved to end each code word. Show that this code is prefix-free and give a direct argument to show that it must satisfy Kraft's inequality. Show that, if an optimal binary code has word lengths s1, s2, . . . , sm, then

   m log2 m ≤ s1 + s2 + · · · + sm ≤ (m² + m − 2)/2.

(ii) Give an example of a code satisfying Kraft's inequality which is not decodable.

Q 24.10.
(i) It is known that exactly one member of the starship Emphasise has contracted the Macguffin virus. The ship's computer has worked out that the probability of crew member number i harbouring the virus is pi. (Thus the probability that the captain, who is, of course, number 1, has the disease is p1.) Blood samples are taken from all crew members. A test is available that will detect the virus at any dilution. However, the power required is such that the ship's force shields must be switched off⁵⁷ for a minute during each test. Explain how, by testing pooled samples, the expected number of tests can be minimised. Write down the exact form of the test when there are 2^n crew members and pi = 2^−n.
(ii) Questions like (i) are rather artificial, since they require that exactly one person carries the virus. Purely as a matter of interest, we consider the following question. Suppose that the probability that any member of a population of 2^n has a certain disease is p (and that the probability

56 'Marked cards, l'Anglais?' I said, with a chilling sneer. 'They are used, I am told, to trap players–not unbirched schoolboys.' 'Yet I say that they are marked!' he replied hotly, in his queer foreign jargon. 'In my last hand I had nothing. You doubled the stakes. Bah, you knew! You have swindled me!' 'Monsieur is easy to swindle – when he plays with a mirror behind him,' I answered tartly. Under the Red Robe, S. J. Weyman
57 'Captain, ye canna be serious, sir.'

is independent of the health of the others) and there exists an error-free test which can be carried out on pooled blood samples which indicates the presence of the disease in at least one of the samples or its absence from all. Usually a group of x people are tested jointly and, if the joint test shows the disease, each is tested individually. Explain why this is not sensible if p is large but is sensible (with a reasonable choice of x) if p is small. If p is small, explain why there is an optimum value for x. Write down (but do not attempt to solve) an equation which indicates (in a 'mathematical methods' sense) that optimum value in terms of p. Explain why the cost of testing a large population of size x is no more than about 2pcx log2 x with c the cost of a test. How does the scheme suggested in the last sentence of (i) need to be modified to take account of the fact that more than one person may be ill (or, indeed, no one may be ill)? Show that the expected number of tests required by the modified scheme is no greater than pn2^(n+1) + 1. Explain why there cannot be a testing scheme which can be guaranteed to require less than 2^n tests to diagnose all members of the population.
(iii) In practice, pooling schemes will be less complicated. Schemes like these are only worthwhile if the disease is rare and the test is both expensive and will work on pooled samples. However, these circumstances do occur together from time to time, and the idea then produces public health benefits much more cheaply than would otherwise be possible.

Q 24.12.
(i) A set of m apparently identical coins consists of m − 1 coins and one heavier coin. You are given a balance in which you can weigh equal numbers of the coins and determine which side (if either) contains the heavier coin. You wish to find the heavy coin in the fewest average number of weighings. If 3^r + 1 ≤ m ≤ 3^(r+1), show that you can label each coin with a ternary number a1 a2 . . . a(r+1) with aj ∈ {0, 1, 2} in such a way that the number of coins having 1 in the jth place equals the number of coins with 2 in the jth place for each j (think Huffman ternary trees).

Q 24.13.
(i) Give the appropriate generalisation of Huffman's algorithm to an alphabet with a symbols when you have m messages and m ≡ 1 (mod a − 1).
(ii) Prove that your algorithm gives an optimal solution.
(iii) Extend the algorithm to cover general m by introducing messages of probability zero.
By considering the Huffman algorithm problem for prefix-free codes on an alphabet with three letters, solve the problem stated in the first part

and show that you do indeed have a solution. Show that your solution also minimises the maximum number of weighings that you might have to do.
(ii) Suppose the problem is as before, but m = 12 and the odd coin may be heavier or lighter. Show that you need at least 3 weighings. [In fact you can always do it in 3 weighings, but the problem of showing this 'is said to have been planted during the war . . . by enemy agents since Operational Research spent so many man-hours on its solution.'⁵⁸] As might be expected, there are many accounts of this problem on the web.

Q 24.14. A source produces a set A of messages M1, M2, . . . , Mn with non-zero probabilities p1, p2, . . . , pn. Let S be the codeword length when the message is encoded by a decodable code c : A → B∗, where B is an alphabet of k letters.
(i) Show that

   (Σ_{i=1}^n pi^{1/2})² ≤ E(k^S).

[Hint: Cauchy–Schwarz. Show that Σ pi^{1/2} = Σ pi^{1/2} k^{si/2} k^{−si/2}.]
(ii) Show that

   min E(k^S) ≤ k (Σ_{i=1}^n pi^{1/2})²,

where the minimum is taken over all decodable codes. [Hint: Look for a code with codeword lengths si = ⌈− logk (pi^{1/2}/λ)⌉ for an appropriate λ.]

Q 24.15. Extend the definition of entropy to a random variable X taking values in the non-negative integers. (You must allow for the possibility of infinite entropy.) Compute the expected value EY and entropy H(Y) in the case when Y has the geometric distribution, that is to say Pr(Y = k) = p^k (1 − p) [0 < p < 1]. Show that, amongst all random variables X taking values in the non-negative integers with the same expected value µ [0 < µ < ∞], the geometric distribution maximises the entropy.

58 The quotation comes from Pedoe's The Gentle Art of Mathematics, which also gives a very pretty solution.
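For the geometric distribution in the last question, summing the series gives EY = p/(1 − p) and H(Y) = h(p)/(1 − p), where h is the entropy of a single p-biased bit. A quick numerical check (an illustrative sketch, with the tail of the series truncated where it is negligible):

```python
from math import log2

p = 0.3   # any 0 < p < 1 will do

# Direct summation for Pr(Y = k) = p^k (1 - p), truncated deep in the tail.
mean = sum(k * p**k * (1 - p) for k in range(400))
entropy = -sum((q := p**k * (1 - p)) * log2(q) for k in range(400))

h = -p * log2(p) - (1 - p) * log2(1 - p)   # binary entropy of p
assert abs(mean - p / (1 - p)) < 1e-9      # EY = p/(1-p)
assert abs(entropy - h / (1 - p)) < 1e-9   # H(Y) = h(p)/(1-p)
```

Both identities hold to floating-point accuracy for any p strictly between 0 and 1.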

25 Exercise Sheet 2

Q 25.1. (Exercise 6.1.) In the model of a communication channel, we take the probability p of error to be less than 1/2. Why do we not consider the case 1 ≥ p > 1/2? What if p = 1/2?

Q 25.2. (Exercise 7.3.) In an exam, each candidate is asked to write down a Candidate Number of the form 3234A, 3235B, 3236C, . . . (the eleven possible letters are repeated cyclically) and a desk number. (Thus candidate 0004 sitting at desk 425 writes down 0004D − −425.) The first four numbers in the Candidate Identifier identify the candidate uniquely. Show that, if the candidate makes one error in the Candidate Identifier, then that error can be detected without using the Desk Number. Would this be true if there were 9 possible letters repeated cyclically? Would this be true if there were 12 possible letters repeated cyclically? Give reasons. Show that, if we combine the Candidate Number and the Desk Number, the combined code is one error correcting.

Q 25.3. (Exercise 7.4.) If you look at the inner title page of almost any book published between 1974 and 2007, you will find its International Standard Book Number (ISBN). The ISBN uses single digits selected from 0, 1, . . . , 8, 9 and X representing 10. Each ISBN consists of nine such digits a1, a2, . . . , a9 followed by a single check digit a10 chosen so that

   10a1 + 9a2 + · · · + 2a9 + a10 ≡ 0 (mod 11).   (*)

(In more sophisticated language, our code C consists of those elements a ∈ F_11^10 such that Σ_{j=1}^{10} (11 − j)aj = 0.)
(i) Find a couple of books and check that (∗) holds for their ISBNs.
(ii) Show that (∗) will not work if you make a mistake in writing down one digit of an ISBN.
(iii) Show that (∗) may fail to detect two errors.
(iv) Show that (∗) will not work if you interchange two distinct adjacent digits (a transposition error).
(v) Does (iv) remain true if we replace 'adjacent' by 'different'? Errors of type (ii) and (iv) are the most common in typing. In communication between publishers and booksellers, both sides are anxious that errors should be detected, but would prefer the other side to query errors rather than to guess what the error might have been.
(vi) Since the ISBN contained information such as the name of the publisher,
only a small proportion of possible ISBNs could be used⁵⁹ and the system described above started to 'run out of numbers'.

59 The same problem occurs with telephone numbers. If we use the Continent, Country, Town, Subscriber system, we will need longer numbers than if we just numbered each member of the human race.
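The ISBN-10 check of Q 25.3 is easy to experiment with. In this sketch, '123456789X' is a made-up digit string (not a real book's ISBN) whose check digit happens to work out as X (= 10):

```python
def isbn10_ok(isbn):
    """Check 10*a1 + 9*a2 + ... + 2*a9 + a10 ≡ 0 (mod 11), 'X' standing for 10."""
    digits = [10 if ch == 'X' else int(ch) for ch in isbn]
    return sum(w * a for w, a in zip(range(10, 0, -1), digits)) % 11 == 0

assert isbn10_ok('123456789X')
assert not isbn10_ok('213456789X')   # an adjacent transposition is detected
assert not isbn10_ok('124456789X')   # a single wrong digit is detected
```

Working modulo the prime 11 is what makes parts (ii) and (iv) of the question work: every weight 1, . . . , 10 and every difference between distinct weights is invertible mod 11.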

A new system was introduced which is compatible with the system used to label most consumer goods. After January 2007, the appropriate ISBN became a 13 digit number x1 x2 . . . x13 with each digit selected from 0, 1, . . . , 8, 9 and the check digit x13 computed by using the formula

   x13 ≡ −(x1 + 3x2 + x3 + 3x4 + · · · + x11 + 3x12) (mod 10).

Show that we can detect single errors. Give an example to show that we cannot detect all transpositions.

Q 25.5. Show that the n-fold repetition code is perfect if and only if n is odd. If 0 < δ < 1/2, find an A(δ) > 0 such that, whenever 0 ≤ r ≤ nδ, we have

   Σ_{j=0}^{r} (n choose j) ≤ A(δ) (n choose r).

(We use weaker estimates in the course, but this is the most illuminating. The particular value of A(δ) is unimportant, so do not waste time trying to find a 'good' value.)

Q 25.6.
(i) What is the expected Hamming distance between two randomly chosen code words in F_2^n? (As usual, we suppose implicitly that the two choices are independent and all choices are equiprobable.)
(ii) Three code words are chosen at random from F_2^n. If kn is the expected value of the distance between the closest two, show that n⁻¹kn → 1/2 as n → ∞. [There are many ways to do (ii). One way is to consider Tchebychev's inequality.]

Q 25.7. Suppose we use eight hole tape with the standard paper tape code, and the probability that an error occurs at a particular place on the tape (i.e. a hole occurs where it should not or fails to occur where it should) is 10⁻⁴. A program requires about 10 000 lines of tape (each line containing eight places) using the paper tape code. Using the Poisson approximation, direct calculation (possible with a hand calculator but really no advance on the Poisson method), or otherwise, show that the probability that the tape will be accepted as error free by the decoder is less than .04%.

Suppose now that we use the Hamming scheme (making no use of the last place in each line). Explain why the program requires about 17 500 lines of tape, but that any particular line will be correctly decoded with probability about 1 − (21 × 10⁻⁸) and the probability that the entire program will be correctly decoded is better than 99.6%.
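The arithmetic behind the figures in Q 25.7 can be confirmed directly. The sketch below assumes the usual reading of the question: a parity check over the 8 places of each line for the plain code, and a Hamming (7,4) code ignoring the eighth place, so 4 message bits per line and hence 17 500 lines for the 70 000 message bits.

```python
from math import comb

p = 1e-4        # probability of an error at any one of the 8 places
lines = 10_000

# Plain paper tape code: a line slips past the parity check exactly when
# it carries an even number of errors, so P(line passes) = (1 + (1-2p)^8)/2.
line_passes = (1 + (1 - 2 * p) ** 8) / 2
accept = line_passes ** lines
assert accept < 0.0004          # 'accepted as error free' less than .04%

# Hamming scheme: a line decodes correctly when its 7 used places carry
# at most one error, so the failure rate is about C(7,2) p^2 = 21e-8.
hamming_lines = 17_500
line_ok = sum(comb(7, j) * p**j * (1 - p) ** (7 - j) for j in (0, 1))
assert abs((1 - line_ok) / 21e-8 - 1) < 0.01     # per-line failure ≈ 21×10⁻⁸
assert line_ok ** hamming_lines > 0.996          # whole program better than 99.6%
```

The acceptance probability for the plain tape is close to e⁻⁸ ≈ 0.034%, which is where the .04% bound comes from.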

(ii) Three code words are chosen at random from F_2^n. If k_n is the expected value of the distance between the closest two, show that n^{-1} k_n → 1/2 as n → ∞. [There are many ways to do (ii). One way is to consider Tchebychev's inequality.]

Q 25.9. (Exercises 11.2 and 11.3.) Consider the situation described in the first paragraph of Section 11.
(i) Show that for the situation described you should not bet if up ≤ 1 and should take
    w = (up − 1)/(u − 1)
if up > 1.
(ii) Let us write q = 1 − p. Show that, if up > 1 and we choose the optimum w,
    E log Y_n = p log p + q log q + log u − q log(u − 1).
(iii) Show that, if you bet less than the optimal proportion, your fortune will still tend to increase but more slowly, but, if you bet more than some proportion w_1, your fortune will decrease. Write down the equation for w_1. [Moral: If you use the Kelly criterion veer on the side of under-betting.]

Q 25.10. Your employer announces that he is abandoning the old-fashioned paternalistic scheme under which he guarantees you a fixed sum Kx (where, of course, x > 0) when you retire. Instead, he will empower you by giving you a fixed sum x now, to invest as you wish. In order to help you and the rest of the staff, your employer arranges that you should obtain advice from a financial whizkid with a top degree from Cambridge. After a long lecture in which the whizkid manages to be simultaneously condescending, boring and incomprehensible, you come away with the following information.
When you retire, the world will be in exactly one of n states. The probability that the world will be in state i is p_i. By means of a piece of financial wizardry called ditching (or something like that) the whizkid can offer you a pension plan which for the cost of x_i will return K q_i^{-1} x_i if the world is in state i, but nothing otherwise. (Here q_i > 0 and Σ_{i=1}^{n} q_i = 1.) You must invest the entire fixed sum. (Formally, Σ_{i=1}^{n} x_i = x. You must also take x_i ≥ 0.)
On philosophical grounds you decide to maximise the expected value S of the logarithm of the sum received on retirement. Assuming that you will have to live off this sum for the rest of your life, explain, in your opinion, why this choice is reasonable or explain why it is unreasonable. Find the appropriate choices of x_i. Do they depend on the q_i?
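The closed form in Q 25.9 (ii) can be checked numerically. A minimal sketch, with illustrative parameters p and u (any values with up > 1 behave the same way):

```python
from math import log

def growth(w, p, u):
    """Expected log-growth per bet when a fraction w of the fortune is
    staked at odds u with win probability p."""
    return p * log(1 + (u - 1) * w) + (1 - p) * log(1 - w)

p, u = 0.6, 2.0                         # illustrative values, up = 1.2 > 1
q = 1 - p
w_opt = (u * p - 1) / (u - 1)           # the Kelly fraction from part (i)
claimed = p * log(p) + q * log(q) + log(u) - q * log(u - 1)
assert abs(growth(w_opt, p, u) - claimed) < 1e-12

# Under-betting still grows (more slowly); betting well past w_opt shrinks
# the fortune -- the moral recorded in part (iii).
assert 0 < growth(w_opt / 2, p, u) < growth(w_opt, p, u)
assert growth(0.5, p, u) < 0
```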

Suppose that K is fixed, but the whizkid can choose the q_i. We may suppose that what is good for you is bad for him so he will seek to minimise S for your best choices. Show that he will choose q_i = p_i and that, with these choices, S = log Kx.

Q 25.11. (i) The original Hamming code was a 7 bit code used in an 8 bit system (paper tape). Consider the code c : {0, 1}^4 → {0, 1}^8 obtained by using the Hamming code for the first 7 bits and the final bit as a check digit so that x_1 + x_2 + · · · + x_8 ≡ 0 mod 2. Find the minimum distance for this code. How many errors can it detect? How many can it correct?
(ii) Given a code of length n which corrects e errors can you always construct a code of length n + 1 which detects 2e + 1 errors?

Q 25.12. Let C be the code consisting of the word 10111000100 and its cyclic shifts (that is 01011100010, 00101110001 and so on) together with the zero code word. Is C linear? Show that C has minimum distance 5.

Q 25.13. In general, we work under the assumption that all messages sent through our noisy channel are equally likely. In this question we drop this assumption. Suppose that each bit sent through a channel has probability 1/3 of being mistransmitted. There are 4 codewords 1100, 0110, 0001, 1111 sent with probabilities 1/4, 1/6, 1/12, 1/2. If you receive 1001 what will you decode it as, using each of the following rules?
(i) The ideal observer rule: find b ∈ C so as to maximise Pr(b sent | u received).
(ii) The maximum likelihood rule: find b ∈ C so as to maximise Pr(u received | b sent).
(iii) The minimum distance rule: find b ∈ C so as to minimise the Hamming distance d(b, u) from the received message u.

Q 25.14. (i) Show that −t ≥ log(1 − t) for 0 ≤ t < 1.
(ii) Show that, if δ_N > 0, 1 − N δ_N > 0 and N^2 δ_N → ∞, then Π_{m=1}^{N−1} (1 − m δ_N) → 0.
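The three rules in Q 25.13 really can disagree. A sketch of the computation for the received word 1001; note the pairing of the priors with the codewords in the order listed is an assumption of this sketch:

```python
from fractions import Fraction

priors = {"1100": Fraction(1, 4), "0110": Fraction(1, 6),
          "0001": Fraction(1, 12), "1111": Fraction(1, 2)}
u, perr = "1001", Fraction(1, 3)

def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

def likelihood(b):                       # Pr(u received | b sent)
    d = dist(b, u)
    return perr ** d * (1 - perr) ** (len(u) - d)

min_distance = min(priors, key=lambda b: dist(b, u))
max_likelihood = max(priors, key=likelihood)
ideal_observer = max(priors, key=lambda b: priors[b] * likelihood(b))

# With error probability 1/3 the likelihood decreases with distance, so the
# last two rules agree; the priors pull the ideal observer elsewhere.
assert min_distance == max_likelihood == "0001"
assert ideal_observer == "1111"
```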

(iii) Let V(n, r) be the number of points in a Hamming ball of radius r in F_2^n and let p(n, N, r) be the probability that N such balls chosen at random do not intersect. By observing that, if m non-intersecting balls are already placed, then an (m + 1)st ball which does not intersect them must certainly not have its centre in one of the balls already placed, show that, if N_n^2 2^{-n} V(n, r_n) → ∞, then p(n, N_n, r_n) → 0.
(iv) Show that, if 2β + H(α) > 1, then p(n, 2^{βn}, αn) → 0, even if 2^n / V(n, αn) → ∞. Thus simply throwing balls down at random will not give very good systems of balls with empty intersections.
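The ball volume V(n, r) defined in Q 25.14 (iii) is a plain binomial sum, and computing it also settles the arithmetic quoted in Q 26.7 on the next sheet:

```python
from math import comb

def V(n, r):
    """Number of points in a Hamming ball of radius r in F_2^n."""
    return sum(comb(n, j) for j in range(r + 1))

assert V(90, 2) == 1 + 90 + comb(90, 2) == 4096       # = 2^12
assert 2 ** 90 // V(90, 2) == 2 ** 78                 # Q 26.7 (i)
assert V(23, 3) == 2048                               # = 2^11, the Golay case
assert 2 ** 23 // V(23, 3) == 2 ** 12
```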

26 Exercise Sheet 3

Q 26.1. (a) Show that if C is linear, then so are its extension C^+, truncation C^− and puncturing C′, provided the symbol chosen to puncture by is 0. Give an example to show that C′ may not be linear if we puncture by 1.
(b) Show that extension followed by truncation does not change a code. Is this true if we replace 'truncation' by 'puncturing'?
(c) Give an example where puncturing reduces the information rate and an example where puncturing increases the information rate.
(d) Show that the minimum distance of the parity extension C^+ is the least even integer n with n ≥ d(C).
(e) Show that the minimum distance of the truncation C^− is d(C) or d(C) − 1 and that both cases can occur.
(f) Show that puncturing cannot decrease the minimum distance, but give examples to show that the minimum distance can stay the same or increase.

Q 26.2. A message passes through a binary symmetric channel with probability p of error for each bit and the resulting message is passed through a second binary symmetric channel which is identical except that there is probability q of error [0 < p, q < 1/2]. Show that the result behaves as if it had been passed through a binary symmetric channel with probability of error to be determined. Show that the probability of error is less than 1/2. Can we improve the rate at which messages are transmitted (with low error) by coding, sending through the first channel, decoding with error correction and then recoding, sending through the second channel and decoding with error correction again, or will this produce no improvement on treating the whole thing as a single channel and coding and decoding only once?

Q 26.3. If C_1 and C_2 are linear codes of appropriate type with generator matrices G_1 and G_2, write down a generator matrix for C_1|C_2.

Q 26.4. Write down the weight enumerators of the trivial code (that is to say, F_2^n), the zero code (that is to say, {0}), the repetition code and the simple parity code.

Q 26.5. List the codewords of the Hamming (7,4) code and its dual. Write down the weight enumerators and verify that they satisfy the MacWilliams identity.

Q 26.6. Show that the weight enumerator of RM(d, 1) is
    y^{2^d} + (2^{d+1} − 2) x^{2^{d−1}} y^{2^{d−1}} + x^{2^d}.
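For Q 26.2: a bit emerges flipped exactly when one channel, but not both, flips it, so the cascade is a BSC with error probability p(1 − q) + q(1 − p) = p + q − 2pq. A quick numerical check that this stays below 1/2 (indeed 1/2 − r = (1 − 2p)(1 − 2q)/2 > 0):

```python
def cascade(p, q):
    """Error probability of two binary symmetric channels in series."""
    return p * (1 - q) + q * (1 - p)

for p in (0.01, 0.2, 0.49):
    for q in (0.05, 0.3, 0.45):
        r = cascade(p, q)
        assert max(p, q) <= r < 0.5     # noisier than either channel, < 1/2
        assert abs((0.5 - r) - (1 - 2 * p) * (1 - 2 * q) / 2) < 1e-12
```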

(i) Show that every codeword in RM (d. even if 2n /V (n. (i) Show that 2k if y ∈ C ⊥ (−1)x. 23) is a power of 2.7. d − 1) has even weight. [The MacWilliams identity for binary codes] Let C ⊆ Fn be 2 a linear code of dimension k.9. 0) = 5 and that ci (x) = 1 whenever xi = 1. V (90. (vi) Show that V (3. we can ﬁnd a unique c(x) ∈ C such that d(c(x). 2) (ii) Suppose that C is a perfect 2 error correcting code of length 90 and size 278 . ﬁnd the number of solutions to the equation c(x) = c(y) with x ∈ X and. corresponding to each x ∈ X. / x∈C (ii) If t ∈ R.6 and 8. r) has dual code RM (m.) Q 26. x) = 2. by considering the number of elements of X. 2 Show that. show that d(c(x). x) = 3}.7. m − r − 1). (iii) By using parts (i) and (ii) to evaluate w(y) s (−1)x. d(0. (iii) Let C be as in (ii) with 0 ∈ C. (In this case a perfect code exists called the binary Golay code. tw(y) (−1)x. If y ∈ X. (ii) Show that RM (m. r)⊥ . without loss of generality. e) is an integer. (Exercises 8.y = 0 if y ∈ C ⊥ . no perfect code may exist. Consider the set X = {x ∈ F90 : x1 = 1. m − r − 1) ⊆ RM (m.y t x∈C y∈Fn 2 96 . obtain a contradiction.Q 26. (i) Verify that 290 = 278 . Explain why we may suppose. show that y∈Fn 2 Q 26.) We show that. show that RM (m. x2 = 1. 278 ] code. that 0 ∈ C.8.y = (1 − t)w(x) (1 + t)n−w(x) . or otherwise. (iii) By considering dimension. (v) Conclude that there is no perfect [90. (iv) Continuing with the argument of (iii).

13. (iii) Let K be a ﬁeld containing F2 in which X 7 − 1 factorises into linear factors. this follows directly from general theory but direct calculation is not uninstructive. Q 26. β 2 } as deﬁning set is Hamming’s original (7. Q 26.11. Why are they easier to deal with than errors? Find a necessary and suﬃcient condition on the parity check matrix for it to be always possible to correct t erasures. What are the other cyclic codes? (iii) Identify the dual codes for each of the codes in (ii). then β is a primitive root of unity and β 2 is also a root of X 3 + X + 1.] Q 26.) Prove the following results.12. The BCH code with {β. (Example 15. (i) If K is a ﬁeld containing F2 . b ∈ K. (i) Identify the cyclic codes of length n corresponding to each of the polynomials 1. (iv) We continue with the notation of (iii).in two diﬀerent ways.14. Q 26.4) code. (ii) Show that there are three cyclic codes of length 7 corresponding to irreducible polynomials of which two are versions of Hamming’s original code. obtain the MacWilliams identity WC ⊥ (s. Consider the collection K of polynomials a0 + a1 ω with aj ∈ F2 manipulated subject to the usual rules of polynomial arithmetic and to the further condition 1 + ω + ω 2 = 0. Show by ﬁnding a generator and writing out its powers that K ∗ = K \ {0} is a cyclic group under multiplication and deduce that K is a ﬁnite ﬁeld. then P (a)2 = P (a2 ) for all a ∈ K.10. 97 . X − 1 and X n−1 + X n−2 + · · · + X + 1. then (a + b)2 = a2 + b2 for all a. Find a necessary and suﬃcient condition on the parity check matrix for it never to be possible to correct t erasures (ie whatever message you choose and whatever t erasures are made the recipient cannot tell what you sent). (ii) If P ∈ F2 [X] and K is a ﬁeld containing F2 . If β is a root of X 3 + X + 1 in K. An erasure is a digit which has been made unreadable in transmission. [Of course. t) = 2− dim C WC (t − s. t + s).

Q 26.14. Let C be a binary linear code of length n, rank k and distance d.
(i) Show that C contains a codeword x with exactly d non-zero digits.
(ii) Show that n ≥ d + k − 1.
(iii) Prove that truncating C on the non-zero digits of x produces a code C′ of length n − d, rank k − 1 and distance d′ ≥ ⌈d/2⌉. [Hint: To show d′ ≥ ⌈d/2⌉, consider, for y ∈ C, the coordinates where x_j = y_j and the coordinates where x_j ≠ y_j.]
(iv) Show that
    n ≥ d + Σ_{u=1}^{k−1} ⌈d/2^u⌉.
Why does (iv) imply (ii)? Give an example where n > d + k − 1.

Q 26.15. Implement the secret sharing method of page 57 with k = 2, n = 3, p = 7, a_0 = S = 2, a_1 = 3 and x_j = j + 1. Check directly that any two people can find S but no single individual can.

Q 26.16. If we take k = 3, n = 4, p = 6 and x_j = j + 1, show that the first two members and the fourth member of the Faculty Board will be unable to determine S uniquely. Why does this not invalidate our method?
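The k = 2 instance in Q 26.15 can be worked by machine. A sketch (Lagrange interpolation at 0 over Z_7; the polynomial and share points follow the data given in the question):

```python
from itertools import combinations

p, a0, a1 = 7, 2, 3                     # secret S = a0 = 2, P(x) = 2 + 3x
shares = {x: (a0 + a1 * x) % p for x in (2, 3, 4)}   # x_j = j + 1

def recover(pts):
    """Evaluate the interpolating polynomial at 0 over Z_p."""
    s = 0
    for x, y in pts:
        num = den = 1
        for xo, _ in pts:
            if xo != x:
                num = num * -xo % p
                den = den * (x - xo) % p
        s = (s + y * num * pow(den, -1, p)) % p
    return s

for pair in combinations(shares.items(), 2):
    assert recover(list(pair)) == a0    # any two people find S = 2

# One share alone fixes nothing: every candidate secret admits a line
# through the single known point, so no individual learns S.
```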

27 Exercise Sheet 4

Q 27.1. A binary non-linear feedback register of length 4 has defining relation
    x_{n+1} = x_n x_{n−1} + x_{n−3}.
Show that the state space contains 4 cycles of lengths 1, 2, 4 and 9.

Q 27.2. (Exercise 16.) Show that the decimal expansion of a rational number must be a recurrent expansion. Give a bound for the period in terms of the quotient. Conversely, by considering geometric series, or otherwise, show that a recurrent decimal represents a rational number.

Q 27.3. A binary LFR was used to generate the following stream
    110001110001 . . .
Recover the feedback polynomial by the Berlekamp–Massey method. [The LFR has length 4 but you should work through the trials for length r for 1 ≤ r ≤ 4.]

Q 27.4. (Exercise 18.) Consider the linear recurrence
    x_n = a_0 x_{n−d} + a_1 x_{n−d+1} + · · · + a_{d−1} x_{n−1}    (⋆)
with a_j ∈ F_2 and a_0 ≠ 0.
(i) Suppose K is a field containing F_2 such that the auxiliary polynomial C has a root α in K. Show that x_n = α^n is a solution of ⋆ in K.
(ii) Suppose K is a field containing F_2 such that the auxiliary polynomial C has d distinct roots α_1, α_2, . . . , α_d in K. Show that the general solution of ⋆ in K is
    x_n = Σ_{j=1}^{d} b_j α_j^n
for some b_j ∈ K. If x_0, x_1, . . . , x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.
(iii) Work out the first few lines of Pascal's triangle modulo 2. Show that the functions f_j : Z → F_2,
    f_j(n) = (n choose j),
are linearly independent in the sense that Σ_{j=0}^{m} b_j f_j(n) = 0

for all n implies b_j = 0 for 1 ≤ j ≤ m.
(iv) Suppose K is a field containing F_2 such that the auxiliary polynomial C factorises completely into linear factors. If the root α_u has multiplicity m(u) [1 ≤ u ≤ q], show that the general solution of ⋆ in K is
    x_n = Σ_{u=1}^{q} Σ_{v=0}^{m(u)−1} b_{u,v} (n choose v) α_u^n
for some b_{u,v} ∈ K. If x_0, x_1, . . . , x_{d−1} ∈ F_2, show that x_n ∈ F_2 for all n.

Q 27.5. Consider the recurrence relation
    u_{n+p} + Σ_{j=0}^{n−1} c_j u_{j+p} = 0
over a field (if you wish, you may take the field to be R but the algebra is the same for all fields). We suppose c_0 ≠ 0. Write down an n × n matrix M such that
    M (u_0, u_1, . . . , u_{n−1})^T = (u_1, u_2, . . . , u_n)^T.
Find the characteristic and minimal polynomials for M. Would your answers be the same if c_0 = 0?

Q 27.6. (Exercise 18.) One of the most confidential German codes (called FISH by the British) involved a complex mechanism which the British found could be simulated by two loops of paper tape of length 1501 and 1497. If k_n = x_n + y_n where x_n is a stream of period 1501 and y_n is a stream of period 1497, what is the longest possible period of k_n? How many consecutive values of k_n would you need to find the underlying linear feedback register using the Berlekamp–Massey method if you did not have the information given in the question? If you had all the information given in the question how many values of k_n would you need? (Hint: Look at x_{n+1497} − x_n.) You have shown that, given k_n for sufficiently many consecutive n, we can find k_n for all n. Can you find x_n for all n?

Q 27.7. We work in F_2. I have a secret sequence k_1, k_2, k_3, . . . and a message p_1, p_2, . . . , p_N. I transmit p_1 + k_1, p_2 + k_2, . . . , p_N + k_N and then, by error, transmit p_1 + k_2, p_2 + k_3, . . . , p_N + k_{N+1}. Assuming that you know this and that my message makes sense, how would you go about finding my message? Can you now decipher other messages sent using the same part of my secret sequence?
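Q 27.3 asks for the trials by hand, but they are easy to mechanise: for each r = 1, …, 4 try every tap vector and keep the first that reproduces the whole stream (a brute-force stand-in for the Berlekamp–Massey bookkeeping; the recovered register is the same):

```python
from itertools import product

stream = [int(b) for b in "110001110001"]

def fits(taps):
    """Does x_n = sum taps[j] * x_{n-r+j} (mod 2) hold along the stream?"""
    r = len(taps)
    return all(stream[i] == sum(t * s for t, s in
                                zip(taps, stream[i - r:i])) % 2
               for i in range(r, len(stream)))

found = next(taps for r in range(1, 5)
             for taps in product((0, 1), repeat=r) if fits(taps))

# No register of length 1, 2 or 3 is consistent; the unique length-4 relation
# is x_n = x_{n-4} + x_{n-3} + x_{n-1}, feedback polynomial 1 + X + X^3 + X^4.
assert found == (1, 1, 0, 1)
```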

Q 27.8. I announce that I shall be using the Rabin–Williams scheme with modulus N. My agent in X'Dofdro sends me a message m (with 1 ≤ m ≤ N − 1) encoded in the requisite form. Unfortunately, my cat eats the piece of paper on which the prime factors of N are recorded, so I am unable to decipher it. I therefore find a new pair of primes and announce that I shall be using the Rabin–Williams scheme with modulus N′ > N. My agent now recodes the message and sends it to me again. The dreaded SNDO of X'Dofdro intercept both code messages. Show that they can find m. Can they decipher any other messages sent to me using only one of the coding schemes?

Q 27.9. St Abacus, who established written Abacan, was led, on theological grounds, to use an alphabet containing only three letters A, B and C and to avoid the use of spaces. (Thus an Abacan book consists of a single word.) In modern Abacan, the letter A has frequency .5 and the letters B and C both have frequency .25. In order to disguise this, the Abacan Navy uses codes in which the 3r + ith number is x_{3r+i} + y_i modulo 3 [0 ≤ i ≤ 2] where x_j = 0 if the jth letter of the message is A, x_j = 1 if the jth letter of the message is B, x_j = 2 if the jth letter of the message is C and y_0, y_1 and y_2 are the numbers 0, 1, 2 in some order. Radio interception has picked up the following message.
    120022010211121001001021002021
Although nobody in Naval Intelligence reads Abacan, it is believed that the last letter of the message will be B if the Abacan fleet is at sea. The Admiralty are desperate to know the last letter and send a representative to your rooms in Baker Street to ask your advice. Give it.

Q 27.10. Extend the Diffie–Hellman key exchange system to cover three participants in a way that is likely to be as secure as the two party scheme. Extend the system to n parties in such a way that they can compute their common secret key by at most n^2 − n communications of 'Diffie–Hellman type numbers'. (The numbers p and g of our original Diffie–Hellman system are known by everybody in advance.) Show that this can be done using at most 2n − 2 communications by including several 'Diffie–Hellman type numbers' in one message.

Q 27.11. Give an example of a homomorphism attack on an RSA code. Show in reasonable detail that the Elgamal signature scheme defeats it.
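For Q 27.10, one workable three-party arrangement, sketched with toy numbers (the 6 = n² − n forwarded quantities for n = 3 are the two middle assignment lines):

```python
p, g = 1019, 2                          # public parameters, toy sizes
a, b, c = 123, 456, 789                 # the three private exponents

# Round 1: everyone publishes g^secret.
ga, gb, gc = pow(g, a, p), pow(g, b, p), pow(g, c, p)
# Round 2: everyone raises a neighbour's published value to their own secret.
gab, gbc, gca = pow(gb, a, p), pow(gc, b, p), pow(ga, c, p)
# Each party finishes the remaining exponentiation privately.
key_a, key_b, key_c = pow(gbc, a, p), pow(gca, b, p), pow(gab, c, p)

assert key_a == key_b == key_c == pow(g, a * b * c, p)
```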

Q 27.12. (i) Consider the Fermat code given by the following procedure. 'Choose N a large prime. Choose e and d so that a^{de} ≡ a mod N. Encrypt using the publicly known N and e, decrypt using the secret d.' Why is this not a good code?
(ii) In textbook examples of the RSA code we frequently see e = 65537. How many multiplications are needed to compute a^e modulo N?
(iii) Why is it unwise to choose primes p and q with p − q small when forming N = pq for the RSA method? Factorise 1763.

Q 27.13. The University of Camford is proud of the excellence of its privacy system CAMSEC. To advertise this fact to the world, the Vice-Chancellor decrees that the university telephone directory should bear on its cover a number N (a product of two very large secret primes) and each name in the University Directory should be followed by their personal encryption number e_i. Messages a from the Vice-Chancellor to members of staff are encrypted in the standard manner as a^{e_i} modulo N and decrypted as b^{d_i} modulo N. The Vice-Chancellor knows all the secret decryption numbers d_i but gives these out on a need to know basis only. (Of course each member of staff must know their personal decryption number but they are instructed to keep it secret.)
(i) The Vice-Chancellor sends a message to all members of the University. An outsider intercepts the encrypted message to individuals i and j where e_i and e_j are coprime. How can the outsider read the message? Can she read other messages sent from the Vice-Chancellor to the ith member of staff only?
(ii) By means of a phone tapping device, the Professor of Applied Numismatics (number u in the University Directory) has intercepted messages from the Vice-Chancellor to her hated rival, the Professor of Pure Numismatics (number v in the University Directory). Explain why she can decode them.

Q 27.14. Consider the bit exchange scheme proposed at the end of Section 19. Suppose that we replace STEP 5 by:
    Alice sends Bob r_1 and r_2 and Bob checks that r_1^2 ≡ r_2^2 ≡ m mod n.
Suppose further that Alice cheats by choosing 3 primes p_1, p_2, p_3, and sending Bob p = p_1 and q = p_2 p_3. Explain how Alice can shift the odds of heads to 3/4. (She has other ways of cheating, but you are only asked to consider this one.) What moral should be drawn?

Q 27.15. The Poldovian Embassy uses a one-time pad to communicate with the notorious international spy Ivanovich Smith.
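The attack in Q 27.13 (i) in miniature (toy primes; `bezout` is the usual extended Euclid): with one modulus and coprime exponents e_i, e_j, write 1 = s·e_i + t·e_j and combine the two ciphertexts.

```python
from math import gcd

N = 3233                                # 61 * 53, a toy CAMSEC modulus
ei, ej = 7, 5                           # coprime encryption numbers
a = 1234                                # the broadcast message, gcd(a, N) = 1

ci, cj = pow(a, ei, N), pow(a, ej, N)   # what the outsider intercepts

def bezout(x, y):
    """Return (s, t) with s*x + t*y = gcd(x, y)."""
    if y == 0:
        return 1, 0
    s, t = bezout(y, x % y)
    return t, s - (x // y) * t

s, t = bezout(ei, ej)
assert s * ei + t * ej == gcd(ei, ej) == 1
# c_i^s * c_j^t = a^(s*e_i + t*e_j) = a; the negative exponent is a modular
# inverse (Python 3.8+ three-argument pow).
recovered = pow(ci, s, N) * pow(cj, t, N) % N
assert recovered == a
```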

The messages are coded in the obvious way. Work modulo 26. (If the pad has C the 3rd letter of the alphabet and the message has I the 9th then the encrypted message has L the 3 + 9th.) Unknown to them, the person whom they employ to carry the messages is actually the MI5 agent 'Union' Jack Caruthers in disguise. MI5 are on the verge of arresting Ivanovich when 'Union' Jack is given the message
    LRPFOJQLCUD.
Caruthers knows that the actual message is FLYXATXONCE and suggests that 'the boffins change things a little' so that Ivanovich deciphers the message as REMAINXHERE. The only boffin available is you. Advise MI5.

Q 27.16. Suppose that X and Y are independent random variables taking values in Z_n. Show that H(X + Y) ≥ max{H(X), H(Y)}. Why is this remark of interest in the context of one-time pads? Does this result remain true if X and Y need not be independent? Give a proof or counterexample.

Q 27.17. I use the Elgamal signature scheme described on page 77. Instead of choosing k at random, I increase the value used by 2 each time I use it. Show that it will often be possible to find my privacy key u from two successive messages. Advise me on how to increase the security of messages.

Q 27.18. Confident in the unbreakability of RSA, I write the following. What mistakes have I made?
    0000001 0000000 0002048 0000001 1391142 0000000 0177147 1033288 1391142 1174371.

Q 27.19. Let K be the finite field with 2^d elements and primitive root α. (Recall that α is a generator of the cyclic group K \ {0} under multiplication. Here we treat K as a vector space over F_2.) Let T : K → F_2 be a non-zero linear map.
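The boffins' task in Q 27.15 needs no knowledge of the pad: since cipher = message + pad letter by letter, adding (desired − actual) to each cipher letter re-targets the decryption. A sketch with 0-based letter values (the notes' 1-based convention differs only by a constant, which cancels in the attack):

```python
A = ord("A")

def add(u, v, sign=1):
    """Letterwise u + sign*v modulo 26."""
    return "".join(chr((ord(a) - A + sign * (ord(b) - A)) % 26 + A)
                   for a, b in zip(u, v))

cipher = "LRPFOJQLCUD"
actual = "FLYXATXONCE"
wanted = "REMAINXHERE"

pad = add(cipher, actual, -1)             # what the Poldovians actually used
tampered = add(cipher, add(wanted, actual, -1))   # cipher + (wanted - actual)
assert add(tampered, pad, -1) == wanted   # Ivanovich now reads REMAINXHERE
```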

(i) Show that the map S : K × K → F_2 given by S(x, y) = T(xy) is a symmetric bilinear form. Show further that S is non-degenerate (that is to say, S(x, y) = 0 for all x implies y = 0).
(ii) Show that the sequence x_n = T(α^n) is the output from a linear feedback register of length at most d. (Part (iii) shows that it must be exactly d.)
(iii) Show that the period of the system (that is to say the minimum period of T) is 2^d − 1. Explain briefly why this is best possible.
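Q 27.19 (ii) and (iii) can be checked for d = 3 (an illustrative sketch: K is built as F_2[X]/(X³ + X + 1) with α the class of X, and T is taken to be the trace x ↦ x + x² + x⁴, a non-zero F_2-linear map):

```python
MOD = 0b1011                            # X^3 + X + 1, bits = coefficients

def mul(a, b):                          # multiplication in F_8
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:                  # reduce any degree-3 overflow
            a ^= MOD
    return r

def trace(x):                           # x + x^2 + x^4, always 0 or 1 here
    x2 = mul(x, x)
    return x ^ x2 ^ mul(x2, x2)

alpha, elt = 0b010, 1
s = []
for _ in range(21):
    s.append(trace(elt))
    elt = mul(elt, alpha)

# alpha^3 = alpha + 1 and T is linear, so x_{n+3} = x_{n+1} + x_n: a linear
# feedback register of length 3, and the period is exactly 2^3 - 1 = 7.
assert all(s[n + 3] == s[n + 1] ^ s[n] for n in range(18))
assert s[7:14] == s[:7] and s[:7] != [s[0]] * 7
```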
