
1 Introduction and history

Reed-Muller codes are amongst the oldest and best-known families of codes. They were discovered and proposed by D. E. Muller and I. S. Reed in 1954.
Reed-Muller codes have many interesting properties that are worth examination; they
form an infinite family of codes, and larger Reed-Muller codes can be constructed from
smaller ones. This particular observation leads us to show that Reed-Muller codes can be
defined recursively.
Unfortunately, Reed-Muller codes become weaker as their length increases. However,
they are often used as building blocks in other codes.
One of the major advantages of Reed-Muller codes is the relative simplicity with which messages can be encoded and received transmissions decoded. We examine encoding using generator matrices and decoding using one form of a process known as majority logic.
Reed-Muller codes, like many other codes, have tight links to design theory; we briefly investigate this link between Reed-Muller codes and the designs resulting from affine geometries.
Finally, we present the reader with an implementation of the Reed-Muller encoding and
decoding process, written in ANSI C.

2 Monomials and vectors over F2


We begin by introducing the concept of vector spaces over F_2 and their correspondence to certain rings of polynomials.

Notation: We use the notation Z_n to refer to the set {0, 1, . . . , n − 1}.

Notation: For brevity, when it is unambiguous, we write a vector in the vector space F_2^n as a string of length n with elements in F_2. For example, if we have the vector σ = (1, 0, 1, 1, 0, 1, 0) ∈ F_2^7, we simply write σ as 1011010.

Notation: Assume that x and y are vectors in our vector space F_2^n (with x = (x_1, x_2, . . . , x_n) and y = (y_1, y_2, . . . , y_n)). Let a be an element of F_2, where we call a a scalar. Then we have the standard operations of addition and scalar multiplication for vectors (these are not described here, and we assume that the reader is familiar with them). However, we also extend the set of operations in the following way:
• Scalar addition
a + x = (a + x1 , a + x2 , . . . , a + xn )

• Vector complement

x̄ = 1 + x = (1 + x_1, 1 + x_2, . . . , 1 + x_n)

• Vector multiplication
x ∗ y = (x1 ∗ y1 , x2 ∗ y2 , . . . , xn ∗ yn )

Note that (F_2^n, +, ∗) forms a commutative ring.
Assume that we are given a vector space F_2^(2^m). Then we consider the ring R_m = F_2[x_0, x_1, . . . , x_{m−1}]. We will see shortly that there exists a bijection between the elements of R_m and F_2^(2^m) (in fact, an isomorphism of rings between (R_m, +, ∗) and (F_2^(2^m), +, ∗)).

Definitions: A Boolean monomial is an element p ∈ R_m of the form

p = x_0^{r_0} x_1^{r_1} · · · x_{m−1}^{r_{m−1}}

where r_i ∈ N and i ∈ Z_m. A Boolean polynomial is simply, as expected, a linear combination (with coefficients in F_2) of Boolean monomials; any element of R_m may be thought of as a Boolean polynomial.

Definition: Given a Boolean monomial p ∈ R_m, we say that p is in reduced form if it is squarefree. For any Boolean monomial q ∈ R_m, it is trivial to find the reduced form of q by applying:

x_i x_j = x_j x_i   (as R_m is a commutative ring)
x_i^2 = x_i         (as 0 ∗ 0 = 0 and 1 ∗ 1 = 1)
A Boolean polynomial in reduced form is simply a linear combination of reduced-form
Boolean monomials (with coefficients in F2 ).

Example: Say we have the Boolean polynomial p = 1 + x_1 + x_0^5 x_2^2 + x_0 x_1^4 x_2^{101} ∈ R_3. Then, by applying the above rules, we can get its reduced form, p′:

p′ = 1 + x_1 + x_0 x_2 + x_0 x_1 x_2

Consider the mapping ψ : R_m → F_2^(2^m), defined as follows (after each vector we indicate the length of its runs of repeated symbols):

ψ(0)   = 00···0                        (2^m zeroes)
ψ(1)   = 11···1                        (2^m ones)
ψ(x_0) = 11···1 00···0                 (two runs of length 2^{m−1})
ψ(x_1) = 11···1 00···0 11···1 00···0   (four runs of length 2^{m−2})
ψ(x_2) = 11···1 00···0 ··· 11···1 00···0   (eight runs of length 2^{m−3})
  ⋮
ψ(x_i) = 11···1 00···0 ···             (alternating runs of ones and zeroes, each of length 2^{m−i−1})
  ⋮

For any monomial p ∈ R_m, to calculate ψ(p), we find the reduced form

p′ = x_{i_1} x_{i_2} · · · x_{i_r}

(where i_j ∈ Z_m, i_j = i_k ⇒ j = k, and 0 ≤ r ≤ m). Then,

ψ(p) = ψ(x_{i_1}) ∗ ψ(x_{i_2}) ∗ · · · ∗ ψ(x_{i_r})

For any polynomial q ∈ R_m, we can write q as

q = m_1 + m_2 + · · · + m_r

(where m_i is a monomial of R_m, m_i = m_j ⇒ i = j, and 0 ≤ r ≤ 2^m). Then,

ψ(q) = ψ(m_1) + ψ(m_2) + · · · + ψ(m_r)

Example: Let p = 1 + x_1 + x_0^5 x_2^2 + x_0 x_1^4 x_2^{101} ∈ R_3; we have seen above that p has reduced form p′ = 1 + x_1 + x_0 x_2 + x_0 x_1 x_2. Then:

ψ(p) = ψ(p′)
     = ψ(1) + ψ(x_1) + ψ(x_0 x_2) + ψ(x_0 x_1 x_2)
     = ψ(1) + ψ(x_1) + ψ(x_0) ∗ ψ(x_2) + ψ(x_0) ∗ ψ(x_1) ∗ ψ(x_2)
     = 11111111 + 11001100 + 11110000 ∗ 10101010 + 11110000 ∗ 11001100 ∗ 10101010
     = 00010011
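The blockwise structure of ψ invites a bitwise implementation. The following C sketch is our own illustration (the helper names psi_xi and psi_monomial are hypothetical, not taken from the paper's source code); it relies on the observation that reading a position's index in binary, most significant bit first, tells you which run of each ψ(x_i) the position falls in:

```c
#include <assert.h>

/* Our own sketch: component at position pos (0-based, left to right)
   of psi(x_i) in R_m. psi(x_i) is 1 exactly where the i-th bit of pos
   (counting from the most significant of m bits) is 0. */
int psi_xi(int m, int i, int pos)
{
    return !((pos >> (m - 1 - i)) & 1);
}

/* psi of a squarefree monomial given as a bitmask of its variables
   (bit i set means x_i appears); the empty mask is the monomial 1. */
int psi_monomial(int m, unsigned vars, int pos)
{
    int i, val = 1;
    for (i = 0; i < m; i++)
        if (vars & (1u << i))
            val &= psi_xi(m, i, pos);   /* componentwise product */
    return val;
}
```

For m = 3 this reproduces the vectors used above, e.g. ψ(x_0 x_2) = 10100000.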

Proposition: ψ is a bijection, and indeed a homomorphism of rings (neither is proven here, but both follow readily from the construction of ψ). Thus, R_m and F_2^(2^m) are isomorphic, and from this point on, we will refer interchangeably to vectors and their associated polynomials in reduced form.

3 Simple view of Reed-Muller codes


We now investigate the basics of Reed-Muller codes, including what they are, and a technique
for encoding and decoding. The design theory background of Reed-Muller codes is not
discussed here, but interested readers may refer to section 4.3 to learn more.

Definition: The r-th order Reed-Muller code, denoted RM(r, m), is the set of all polynomials of degree at most r in the ring R_m, as defined in the previous section. Alternatively, through the isomorphism ψ, it may be thought of as a subspace of F_2^(2^m).

Observation: The 0th order Reed-Muller code RM(0, m) consists of the constant polynomials {0, 1}, which correspond to the following vectors:

{00···0, 11···1}   (each of length 2^m)

At the other extreme, the m-th order Reed-Muller code RM(m, m) = R_m ≅ F_2^(2^m).

3.1 Encoding and the generator matrix
For the Reed-Muller code RM(r, m), we define the generator matrix as follows: its rows, from top to bottom, are the vectors ψ(p) for the reduced monomials p of degree at most r, grouped by increasing degree:

G_RM(r,m) =
    ψ(1)
    ψ(x_0)
    ψ(x_1)
    ⋮
    ψ(x_{m−1})
    ψ(x_0 x_1)
    ψ(x_0 x_2)
    ⋮
    ψ(x_{m−2} x_{m−1})
    ψ(x_0 x_1 x_2)
    ⋮
    ψ(x_{m−r} x_{m−r+1} · · · x_{m−1})

Example: The generator matrix for RM(1, 3) is:

G_RM(1,3) =
    ψ(1)   = 1 1 1 1 1 1 1 1
    ψ(x_0) = 1 1 1 1 0 0 0 0
    ψ(x_1) = 1 1 0 0 1 1 0 0
    ψ(x_2) = 1 0 1 0 1 0 1 0

Example: The generator matrix for RM(2, 3) is:

G_RM(2,3) =
    ψ(1)       = 1 1 1 1 1 1 1 1
    ψ(x_0)     = 1 1 1 1 0 0 0 0
    ψ(x_1)     = 1 1 0 0 1 1 0 0
    ψ(x_2)     = 1 0 1 0 1 0 1 0
    ψ(x_0 x_1) = 1 1 0 0 0 0 0 0
    ψ(x_0 x_2) = 1 0 1 0 0 0 0 0
    ψ(x_1 x_2) = 1 0 0 0 1 0 0 0

Example: The generator matrix for RM(2, 4) is:

G_RM(2,4) =
    ψ(1)       = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    ψ(x_0)     = 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
    ψ(x_1)     = 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
    ψ(x_2)     = 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
    ψ(x_3)     = 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
    ψ(x_0 x_1) = 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    ψ(x_0 x_2) = 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
    ψ(x_0 x_3) = 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0
    ψ(x_1 x_2) = 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
    ψ(x_1 x_3) = 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
    ψ(x_2 x_3) = 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0

Theorem 1. The matrix G_RM(r,m) has dimension k × n, where

k = Σ_{i=0}^{r} C(m, i)    and    n = 2^m

(C(m, i) denotes the binomial coefficient). The dimension of the code is hence k.

Proof. The rows of the matrix may be partitioned by the degree of the monomial they represent; for the code RM(r, m), there are C(m, 0) rows corresponding to the monomial of degree 0 (namely 1), C(m, 1) rows corresponding to monomials of degree 1 (x_0, x_1, . . . , x_{m−1}), C(m, 2) rows corresponding to monomials of degree 2 (x_0 x_1, x_0 x_2, . . . , x_{m−2} x_{m−1}), and so on. Hence, there are

C(m, 0) + C(m, 1) + · · · + C(m, r) = Σ_{i=0}^{r} C(m, i)

such rows, and thus the dimension of the code is k. It is simple to see that the number of columns is 2^m by definition of ψ.

Encoding is then easy; we are given a message m ∈ F_2^k and we simply perform the multiplication m ∗ G_RM(r,m) to get our codeword c.

Example: Assume we want to encode the message m = 01101001010 using RM(2, 4). We have:

c = m ∗ G_RM(2,4) = 1010111111111010
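To make the encoding step concrete, here is a C sketch of it (our own illustration, not the rmencode source discussed in the appendix; the name rm_encode is hypothetical). It never stores G explicitly: rows are enumerated by degree and, within a degree, in the lexicographic monomial order used above, and each entry of ψ is computed on the fly:

```c
#include <assert.h>

/* Our own sketch of RM(r,m) encoding. msg[0..k-1] holds the message
   bits in the generator-row order of the text; out[0..2^m-1] receives
   the codeword. */
void rm_encode(int r, int m, const int *msg, int *out)
{
    int n = 1 << m, pos, d, i, j, row = 0, idx[32];
    for (pos = 0; pos < n; pos++)
        out[pos] = 0;
    for (d = 0; d <= r; d++) {            /* rows grouped by degree */
        for (i = 0; i < d; i++)
            idx[i] = i;                   /* first d-subset {0,...,d-1} */
        for (;;) {
            if (msg[row]) {               /* row selected: XOR psi(monomial) in */
                for (pos = 0; pos < n; pos++) {
                    int val = 1;
                    for (i = 0; i < d; i++)
                        /* psi(x_i) is 1 where the x_i-bit of pos is 0 */
                        val &= !((pos >> (m - 1 - idx[i])) & 1);
                    out[pos] ^= val;
                }
            }
            row++;
            /* advance idx to the next d-subset in lexicographic order */
            for (i = d - 1; i >= 0 && idx[i] == m - d + i; i--)
                ;
            if (i < 0)
                break;
            idx[i]++;
            for (j = i + 1; j < d; j++)
                idx[j] = idx[j - 1] + 1;
        }
    }
}
```

With this sketch, the message 01101001010 under RM(2, 4) yields the codeword 1010111111111010, as in the example above.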

3.2 Decoding using majority logic
There are different techniques available for decoding Reed-Muller codes, but one of the most
common and most easily implementable is majority logic decoding. We will investigate one
form of this technique here.
Before we begin, we mention the error-correcting capabilities of Reed-Muller codes. Given the code RM(r, m), the minimum distance between any two distinct codewords is 2^{m−r}. This is not proven here; interested readers can refer to Theorem 2 in section 4.2 of this document for a detailed proof of this fact, which relies on the recursive definition of Reed-Muller codes. To correct ε errors, we must have a distance strictly greater than 2ε. Thus, in the case of Reed-Muller codes, since we have distance 2^{m−r}, we can correct max(0, 2^{m−r−1} − 1) errors.
We now look at a basic algorithm for decoding using majority logic. The essential idea
behind this technique is that, for each row of the generator matrix, we attempt to determine
through a majority vote whether or not that row was employed in the formation of the
codeword corresponding to our message.

Definition: Let p be any monomial of degree d in R_m with reduced form p′. We form the set J of the variables not in p′, together with their complements. The characteristic vectors of p are all vectors corresponding to monomials of degree m − d over the variables of J. Note: any monomial containing both a variable and its complement corresponds to the 0 vector (through ψ: ψ(x_i x̄_i) = ψ(x_i) ∗ ψ(x̄_i) = 0). Since ψ is a bijection and ψ^{−1}(0) = 0, this implies that any such monomial is equivalent to the zero polynomial. Thus, without loss of generality, we only consider monomials in which no variable appears together with its complement.

Example: If we are working over RM(3, 4), the characteristic vectors of x_0 x_1 x_3 are the vectors corresponding to the monomials {x_2, x̄_2}. The characteristic vectors of x_0 x_2 are the vectors corresponding to the monomials {x_1 x_3, x_1 x̄_3, x̄_1 x_3, x̄_1 x̄_3}.

The way in which we accomplish this is that we begin at the bottom of our generator
matrix and work our way upwards. Our algorithm starts by examining the rows with mono-
mials of degree r. We begin by calculating 2m−r characteristic vectors for the row, and then
we take the dot product of each of these vectors with our received message. If the majority
of the dot products are 1, we assume that this row was used in constructing our codeword,
and we set the position in our original message vector corresponding to this row to 1. If the
majority of the dot products are 0, then we assume that this row was not used in forming
our message, and hence, we set the corresponding entry in the original message vector to 0.
After we have run this technique for all rows corresponding to monomials of degree r, we take the vector of length C(m, r) corresponding to the portion of the message we have just calculated, and we multiply it by the C(m, r) rows of our generator matrix that we have just considered. This gives us a vector s of length n. We add s to our received message, and proceed recursively on the rows corresponding to monomials of degree r − 1.
This is best clarified by an example.

Example: We decode the message u = 00110110 in RM(2, 3). The generator matrix for this code can be found in the preceding subsection.

As per the algorithm, we begin from the bottom of the matrix and work upwards, first considering the rows corresponding to monomials of degree r = 2.
Row x_1 x_2 has characteristic vectors x̄_0, x_0:

u · ψ(x_0) = 0,  u · ψ(x̄_0) = 0   →  m = 0

Row x_0 x_2 has characteristic vectors x̄_1, x_1:

u · ψ(x_1) = 1,  u · ψ(x̄_1) = 1   →  m = 10

Row x_0 x_1 has characteristic vectors x̄_2, x_2:

u · ψ(x_2) = 0,  u · ψ(x̄_2) = 0   →  m = 010

We have completed processing the rows corresponding to monomials of degree r = 2, so we compute s:

s = (0 1 0) · [ 1 1 0 0 0 0 0 0
                1 0 1 0 0 0 0 0
                1 0 0 0 1 0 0 0 ] = 1 0 1 0 0 0 0 0

We add s to u to get s + u = 10010110, and we proceed to process the rows corresponding to monomials of degree r − 1 = 1.
Row x_2 has characteristic vectors x̄_0 x̄_1, x̄_0 x_1, x_0 x̄_1, x_0 x_1:

u · (ψ(x̄_0) ∗ ψ(x̄_1)) = 1
u · (ψ(x̄_0) ∗ ψ(x_1)) = 1
u · (ψ(x_0) ∗ ψ(x̄_1)) = 1   →  m = 1010
u · (ψ(x_0) ∗ ψ(x_1)) = 1

We continue to proceed in this fashion, to discover that the original message was v = 0111010 (we can trivially check this by verifying that v · G_RM(2,3) = u). In this case, u had no errors (indeed, RM(2, 3) can correct 0 errors, which makes it a poor choice of code, but it suffices for the purposes of this example).
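The whole procedure can be sketched in C (our own illustration, not the appendix's rmdecode source; the names rm_decode and binom are hypothetical). For each row, enumerated from degree r down to degree 0 in the same order as the encoder, it takes the majority vote of the dot products with the 2^{m−d} characteristic vectors and then immediately peels the identified row off the received word:

```c
#include <assert.h>

static int binom(int m, int i)
{
    int num = 1, den = 1, j;
    for (j = 0; j < i; j++) { num *= m - j; den *= j + 1; }
    return num / den;
}

/* Our own sketch of majority-logic decoding. u[0..2^m-1] is the
   received word and is modified in place; msg[0..k-1] receives the
   decoded message bits in generator-row order. */
void rm_decode(int r, int m, int *u, int *msg)
{
    int n = 1 << m, d, i, j, pos, row, idx[32], base[33];
    base[0] = 0;
    for (d = 1; d <= r; d++)
        base[d] = base[d - 1] + binom(m, d - 1);
    for (d = r; d >= 0; d--) {            /* from the bottom of G upward */
        row = base[d];
        for (i = 0; i < d; i++)
            idx[i] = i;
        for (;;) {
            unsigned inset = 0;
            int a, ones = 0, total = 1 << (m - d);
            for (i = 0; i < d; i++)
                inset |= 1u << idx[i];
            /* one characteristic vector per choice a of complemented or
               uncomplemented variables outside the monomial */
            for (a = 0; a < total; a++) {
                int dotp = 0;
                for (pos = 0; pos < n; pos++) {
                    int v, abit = 0, match = 1;
                    for (v = 0; v < m; v++) {
                        if (inset & (1u << v))
                            continue;     /* variable in the monomial: free */
                        if (((pos >> (m - 1 - v)) & 1) != ((a >> abit) & 1))
                            match = 0;
                        abit++;
                    }
                    if (match)
                        dotp ^= u[pos];   /* dot product with received word */
                }
                ones += dotp;
            }
            msg[row] = (2 * ones > total);   /* majority vote */
            if (msg[row])                    /* peel this row off u */
                for (pos = 0; pos < n; pos++) {
                    int val = 1;
                    for (i = 0; i < d; i++)
                        val &= !((pos >> (m - 1 - idx[i])) & 1);
                    u[pos] ^= val;
                }
            row++;
            for (i = d - 1; i >= 0 && idx[i] == m - d + i; i--)
                ;
            if (i < 0)
                break;
            idx[i]++;
            for (j = i + 1; j < d; j++)
                idx[j] = idx[j - 1] + 1;
        }
    }
}
```

On the example above, decoding u = 00110110 in RM(2, 3) recovers 0111010 and leaves a fully peeled (all-zero) residual word.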

4 Different constructions for RM(r, m)


The isomorphism ψ, as mentioned above, is not unique. In fact, we can derive many such isomorphisms, and our generator matrix G_RM(r,m) depends on which of these isomorphisms we choose.

4.1 An alternate view of ψ
Another way that we can think of ψ is as follows. Consider the ring R_m and the vector space F_2^m. We can view the vectors in F_2^m simply as binary representations of the elements of Z_{2^m}, and thus we can derive a natural ordering from this observation.

Example: Suppose we are considering F_2^3. Our natural ordering for the vectors in this vector space is as follows:

(000, 001, 010, 011, 100, 101, 110, 111)

Let S be the set of all monomials in R_m. Let p ∈ S be a Boolean monomial with reduced form p′ = x_{i_1} x_{i_2} · · · x_{i_r} (i_j ∈ Z_m, i_j = i_k ⇒ j = k, 0 ≤ r ≤ m). We consider the function α, defined as follows (where P(X) is the power set of X):

α : S → P(Z_m)
p ≡ p′ = x_{i_1} x_{i_2} · · · x_{i_r} ↦ ∅ if r = 0, and {i_1, i_2, . . . , i_r} otherwise
We then define a class of functions. Given T ∈ P(Z_m), we define the following function:

f_T : F_2^m → F_2
(x_0, x_1, . . . , x_{m−1}) ↦ ∏_{i∈T} (x_i + 1) if T ≠ ∅, and 1 if T = ∅

Example: Evaluate f_{0,2}(1001) and f_{0,2}(0101).
We begin by noting that, by definition, f_{0,2}(x_0, x_1, x_2, x_3) = (x_0 + 1) · (x_2 + 1).

f_{0,2}(1001) = (1 + 1) · (0 + 1) = 0 · 1 = 0
f_{0,2}(0101) = (0 + 1) · (0 + 1) = 1 · 1 = 1
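A direct C transcription of f_T is straightforward (our own sketch; representing T as a bitmask, bit i set meaning i ∈ T, is our choice):

```c
#include <assert.h>

/* Our own sketch of f_T: T is a bitmask over Z_m, x an array of m bits. */
int f_T(unsigned T, const int *x, int m)
{
    int i, prod = 1;
    if (T == 0)
        return 1;                      /* the empty product is 1 */
    for (i = 0; i < m; i++)
        if (T & (1u << i))
            prod *= (x[i] + 1) & 1;    /* (x_i + 1) computed in F_2 */
    return prod;
}
```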

Then we consider the function β, defined as follows:

β : S → F_2^(2^m)
x ↦ (f_{α(x)}(0_b), f_{α(x)}(1_b), f_{α(x)}(2_b), . . . , f_{α(x)}((2^m − 1)_b))

where n_b is the binary representation of n as a vector in F_2^m.
Now we have all the tools we need in order to fully define ψ within this new framework, which, as we will see in a moment, is equivalent to the old framework. We first recognize that any element q ∈ R_m can be written as a sum of monomials, i.e. q = m_1 + m_2 + · · · + m_r, where m_i ∈ S, m_i = m_j ⇒ i = j, 0 ≤ r ≤ 2^m. We then define ψ as follows:

ψ : R_m → F_2^(2^m)
x = m_1 + m_2 + · · · + m_r ↦ β(0) if x = 0, and β(m_1) + β(m_2) + · · · + β(m_r) if x ∈ R_m \ {0}
This more precise definition of ψ is consistent with our previous definition of ψ, and this
is best shown with an example.

Example: Using ψ as defined above, we construct GRM(2,3) . The rows of GRM(2,3) corre-
spond to the monomials {1, x0 , x1 , x2 , x0 x1 , x0 x2 , x1 x2 }, and to get each row, we evaluate ψ
for each monomial.
ψ(1)       = β(1)       = (f_∅(000), f_∅(001), . . . , f_∅(111))             = 11111111
ψ(x_0)     = β(x_0)     = (f_{0}(000), f_{0}(001), . . . , f_{0}(111))       = 11110000
ψ(x_1)     = β(x_1)     = (f_{1}(000), f_{1}(001), . . . , f_{1}(111))       = 11001100
ψ(x_2)     = β(x_2)     = (f_{2}(000), f_{2}(001), . . . , f_{2}(111))       = 10101010
ψ(x_0 x_1) = β(x_0 x_1) = (f_{0,1}(000), f_{0,1}(001), . . . , f_{0,1}(111)) = 11000000
ψ(x_0 x_2) = β(x_0 x_2) = (f_{0,2}(000), f_{0,2}(001), . . . , f_{0,2}(111)) = 10100000
ψ(x_1 x_2) = β(x_1 x_2) = (f_{1,2}(000), f_{1,2}(001), . . . , f_{1,2}(111)) = 10001000

As we can see, these vectors form the rows of GRM(2,3) as defined in section 2.

4.2 Recursive definition of RM(r, m)


As mentioned in the introduction, it is interesting to note that Reed-Muller codes can be
defined recursively. However, the standard recursive definition of RM(r, m) will result in a
generator matrix that differs from the one defined by ψ. This is of no consequence, provided
that both encoding and decoding are done using the same matrix.
Here is the standard recursive definition for RM(r, m) ([4]):

1. RM(0, m) = {00···0, 11···1}   (the two constant vectors of length 2^m)

2. RM(m, m) = F_2^(2^m)

3. RM(r, m) = {(x, x + y) | x ∈ RM(r, m − 1), y ∈ RM(r − 1, m − 1)},  0 < r < m

Example: Using the recursive definition of Reed-Muller codes, we find RM(1, 2):

RM(1, 2) = {(x, x + y) | x ∈ RM(1, 1), y ∈ RM(0, 1)}


= {(x, x + y) | x ∈ {00, 01, 10, 11}, y ∈ {00, 11}}
= {0000, 0011, 0101, 0110, 1010, 1001, 1111, 1100}
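The recursive set construction can be transcribed almost verbatim in C (our own sketch; storing each codeword in the bits of an unsigned long, first half in the high bits, is our own representational choice, workable only for small m):

```c
#include <assert.h>

enum { MAXWORDS = 1 << 11 };   /* enough for codes of dimension <= 11 */

/* Our own sketch: fills out[] with all codewords of RM(r,m) via the
   recursive (x, x + y) construction; returns the number of codewords.
   Only practical for small m. */
int rm_words(int r, int m, unsigned long *out)
{
    unsigned long a[MAXWORDS], b[MAXWORDS];
    int i, j, cnt, ca, cb, n = 1 << m;
    if (r <= 0) {                      /* base case: {00...0, 11...1} */
        out[0] = 0;
        out[1] = (1ul << n) - 1;
        return 2;
    }
    if (r >= m) {                      /* base case: all of F_2^(2^m) */
        for (i = 0; i < (1 << n); i++)
            out[i] = (unsigned long)i;
        return 1 << n;
    }
    ca = rm_words(r, m - 1, a);
    cb = rm_words(r - 1, m - 1, b);
    cnt = 0;
    for (i = 0; i < ca; i++)
        for (j = 0; j < cb; j++)       /* (x, x + y): x left, x+y right */
            out[cnt++] = (a[i] << (n / 2)) | (a[i] ^ b[j]);
    return cnt;
}
```

rm_words(1, 2, w) reproduces exactly the eight words of RM(1, 2) listed above, and rm_words(1, 3, w) yields the 2^4 = 16 codewords of RM(1, 3), whose minimum nonzero weight is 2^{3−1} = 4, in agreement with Theorem 2 below.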

This recursive definition of the codes translates to a recursive definition of the generator matrix, as follows:

G_RM(r,m) = [ G_RM(r,m−1)   G_RM(r,m−1)
              0             G_RM(r−1,m−1) ]

where we consider the following base cases:

G_RM(0,m) = [ 11···1 ]   (a single row of 2^m ones)

G_RM(m,m) = [ G_RM(m−1,m)
              00···0 1 ]   (the added last row is 2^m − 1 zeroes followed by a single 1)

Example: Using the recursive definition of generator matrices for Reed-Muller codes, we construct G_RM(2,3):

G_RM(2,3) = [ G_RM(2,2)   G_RM(2,2)
              0           G_RM(1,2) ]

          = [ G_RM(1,2)   G_RM(1,2)
              0 0 0 1     0 0 0 1
              0 0 0 0     G_RM(1,1) G_RM(1,1)
              0 0 0 0     0 0 G_RM(0,1) ]

          = [ G_RM(1,1) G_RM(1,1)   G_RM(1,1) G_RM(1,1)
              0 0 G_RM(0,1)         0 0 G_RM(0,1)
              0 0 0 1               0 0 0 1
              0 0 0 0               G_RM(1,1) G_RM(1,1)
              0 0 0 0               0 0 G_RM(0,1) ]

          = [ 1 1 1 1 1 1 1 1
              0 1 0 1 0 1 0 1
              0 0 1 1 0 0 1 1
              0 0 0 1 0 0 0 1
              0 0 0 0 1 1 1 1
              0 0 0 0 0 1 0 1
              0 0 0 0 0 0 1 1 ]

One interesting property of the recursive definition is that it facilitates proving certain
properties of Reed-Muller codes; recursion allows us to derive inductive proofs that are, in
these cases, much simpler than non-inductive proofs.

Theorem 2. The distance of RM(r, m) is 2^{m−r}.

Proof. (from [4]) Let wt(x) denote the Hamming weight of x.
We proceed by induction on r. By definition:

RM(r, m) = {(x, x + y) | x ∈ RM(r, m − 1), y ∈ RM(r − 1, m − 1)}

It is easy to see that RM(r − 1, m − 1) ⊆ RM(r, m − 1). By combining these facts, we


observe that x + y ∈ RM(r, m − 1). We now consider two distinct cases.

1. x ≠ y
By the inductive hypothesis, the weight of x + y must be at least 2^{(m−1)−r}. We also have that the weight of x is at least 2^{(m−1)−r}. Thus:

wt(x, x + y) = wt(x + y) + wt(x) ≥ 2 · 2^{m−1−r} = 2^{m−r}

2. x = y
We then have that (x, x + y) = (x, 0) = (y, 0), and we know that y ∈ RM(r − 1, m − 1), so we have that:

wt(x, x + y) = wt(y, 0) = wt(y) ≥ 2^{(m−1)−(r−1)} = 2^{m−r}

Thus, the distance of the code is 2^{m−r}.

4.3 Affine geometries and Reed-Muller codes


Reed-Muller codes, like many other codes, have links to design theory, and in particular, to designs arising from affine geometries. We will briefly investigate this link here. Readers are assumed to have some basic familiarity with designs, although definitions are given where necessary.

4.3.1 Affine geometries and their designs


Definition: Let F be any field, and let V be a vector space over F of dimension n. Then, the
affine geometry of V, denoted AG(V), is the set of all cosets over all subspaces of V. More
precisely:
AG(V) = {x + U | x ∈ V, U is a subspace of V}

Definition: For any coset x + U, if U is a subspace of V with dimension s, we say that


the coset x + U has dimension s. Then, we call x + U an s-flat of AG(V).

In AG(V), the points correspond to the 0-flats. The 1-flats are the lines and the 2-flats
the planes, with incidence defined by containment, as we will see in an example in a moment.

Example: We consider the field F = F_2 and the vector space V = F_2^3. We then determine the subspaces of V, partition them by dimension, and derive the cosets. For brevity, we write vectors of V as 0-1 strings.
The subspace of dimension 0 is:
{000}
The cosets over this subspace give us the 0-flats, or the points, of our geometry:

{000} {001} {010} {011}


{100} {101} {110} {111}

The subspaces of dimension 1 are:

{000, 001} {000, 010} {000, 011} {000, 100}


{000, 101} {000, 110} {000, 111}

The cosets over these subspaces give us the 1-flats, or lines, of our geometry:

{000, 001} {000, 010} {000, 011} {000, 100}


{000, 101} {000, 110} {000, 111} {001, 011}
{001, 010} {001, 101} {001, 100} {001, 111}
{001, 110} {010, 011} {010, 110} {010, 111}
{010, 100} {010, 101} {011, 111} {011, 110}
{011, 101} {011, 100} {100, 101} {100, 110}
{100, 111} {101, 111} {101, 110} {110, 111}

The subspaces of dimension 2 are:

{000, 001, 010, 011} {000, 001, 100, 101} {000, 001, 110, 111} {000, 010, 100, 110}
{000, 010, 101, 111} {000, 011, 100, 111} {000, 011, 101, 110}

The cosets over these subspaces give us the 2-flats, or planes, of our geometry:

{000, 001, 010, 011} {000, 001, 100, 101} {000, 001, 110, 111} {000, 010, 100, 110}
{000, 010, 101, 111} {000, 011, 100, 111} {000, 011, 101, 110} {100, 101, 110, 111}
{010, 011, 110, 111} {010, 011, 100, 101} {001, 011, 101, 111} {001, 011, 100, 110}
{001, 010, 101, 110} {001, 010, 100, 111}

Trivially, the only subspace of V of dimension 3 is V, and the only coset derivable from this
subspace is V.

We also note that if, for an affine geometry AG(V), we take the set of 0-flats and the set
of s-flats (with s > 0) together, they form a balanced incomplete block design, with the set
of 0-flats as the points, and the set of s-flats as the blocks.

Example: In the previous example, we notice that the set of 0-flats as the points with
the set of 1-flats as the blocks forms a 2-(8, 2, 1)-BIBD. If we take the set of 0-flats as our
points and the set of 2-flats as our blocks, we have a 3-(8, 4, 1)-BIBD.

In the next section, we briefly examine the links between these designs and Reed-Muller
codes.

4.3.2 Reed-Muller codes and designs


Reed-Muller codes can be thought of as the designs of an affine geometry; the link between
the two is briefly explored here with an example. Interested readers should refer to [1] for a
more detailed examination of the ties between design theory and coding theory.

Theorem 3. RM(r, m) is the binary code of the design of points and (m − r)-flats of AG(F_2^m).

We do not prove this theorem; the proof simply requires showing that the incidence vectors of the (m − r)-flats of AG(F_2^m) span RM(r, m). A detailed proof may be found in [1].

Example: We demonstrate how RM(1, 3) is the binary code of the design of points and 2-flats of AG(F_2^3). We have already investigated AG(F_2^3) in the previous section, and we will rely on the results we obtained there.
We examine how the incidence vectors of the plane equations corresponding to the 2-flats of AG(F_2^3) are actually codewords in our design. Recall that G_RM(1,3), using the isomorphism ψ, is the matrix with rows ψ(1), ψ(x_0), ψ(x_1), ψ(x_2).
Then it is easy to see that the 16 messages over F_2^4 correspond to the codeword vectors associated with the following equations:

x ∈ F_2^4    x · G_RM(1,3)        x ∈ F_2^4    x · G_RM(1,3)

0000 0 1000 1
0001 x2 1001 1 + x2
0010 x1 1010 1 + x1
0011 x1 + x2 1011 1 + x1 + x2
0100 x0 1100 1 + x0
0101 x0 + x2 1101 1 + x0 + x2
0110 x0 + x1 1110 1 + x0 + x1
0111 x0 + x1 + x2 1111 1 + x0 + x1 + x2

Now we examine our design; each block corresponds to a plane, and each plane has an

incidence vector, as per the following table:

Block Plane eqn Inc vector


001, 010, 100, 111 1 + X0 + X1 + X2 = 0 0
000, 011, 101, 110 X 0 + X1 + X2 = 0 1
010, 011, 100, 101 1 + X 0 + X1 = 0 x2
000, 001, 110, 111 X 0 + X1 = 0 1 + x2
001, 011, 100, 110 1 + X 0 + X2 = 0 x1
000, 010, 101, 111 X 0 + X2 = 0 1 + x1
100, 101, 110, 111 1 + X0 = 0 x1 + x2
000, 001, 010, 011 X0 = 0 1 + x1 + x2
001, 010, 101, 110 1 + X 1 + X2 = 0 x0
000, 011, 100, 111 X 1 + X2 = 0 1 + x0
010, 011, 110, 111 1 + X1 = 0 x0 + x2
000, 001, 100, 101 X1 = 0 1 + x0 + x2
001, 011, 101, 111 1 + X2 = 0 x0 + x1
000, 010, 100, 110 X2 = 0 1 + x0 + x1

The vectors 0000 and 1111 in F42 do not have planes associated with their codewords;
however, their associated geometric structures may be formed by linear combinations of the
planes of the others.

Thus, Reed-Muller codes can be associated with the designs of affine geometries, and
indeed, several majority-logic decoding techniques have arisen because of this association.

A Implementation of Reed-Muller encoding and decoding
Using the techniques found in section 3, I have implemented two command-line applications
to perform encoding and decoding of Reed-Muller codes.
These applications have been coded in ANSI C and hence, should compile on any platform
that has an available C compiler. The source code is available on my personal web page, at

http://www.site.uottawa.ca/~raaphors

A.1 Encoding using rmencode


Encoding is done with the rmencode application, which is run in the following way:

rmencode r m vector1 [vector2 [vector3 [...]]]

where the vectors are elements of F_2^k, represented by 0-1 strings.


Here are some examples of the encoding process:

~ -> ./rmencode 2 4 01101001010
1010111111111010
~ -> ./rmencode 2 4 00000000000
0000000000000000
~ -> ./rmencode 2 4 11111111111
1110100010000001
~ -> ./rmencode 0 3 0
00000000
~ -> ./rmencode 0 3 1
11111111
~ -> ./rmencode 3 3 00110011
01101110
~ -> ./rmencode 3 3 00110010
11101110
~ ->

A.2 Decoding using rmdecode


Decoding is performed using the rmdecode program, which has the same basic command-line parameters as rmencode:

rmdecode r m vector1 [vector2 [vector3 [...]]]


However, in this case, of course, the specified vectors must be elements of F_2^(2^m), written as 0-1 strings.
Here are some examples of the decoding process:

~ -> ./rmencode 2 5 1111111111111111
01111110111010001110100010000001
~ -> ./rmdecode 2 4 1010111111111010
01101001010
~ -> ./rmdecode 2 4 1010111011111010
01101001010
~ -> ./rmdecode 2 4 1011111111111010
01101001010
~ -> ./rmdecode 2 5 01111110111010001110100010000001
1111111111111111
~ -> ./rmdecode 2 5 01101110101010001110101010000001
1111111111111111
~ ->

Note that in the final example, we add an error vector of weight 3 to the encoding of 1111111111111111 (namely 00010000010000000000001000000000), and the program decodes the correct message. This is as we expect in RM(2, 5), which can correct at most 2^{5−2−1} − 1 = 3 errors.

References
[1] Assmus, Jr., E. F. and Key, J. D. Designs and their Codes. Press Syndicate of the
University of Cambridge, Cambridge, 1992.

[2] Cameron, P. J. and Van Lint, J. H. Designs, Graphs, Codes, and their Links. Cambridge
University Press, Cambridge, 1991.

[3] Cooke, Ben. Reed-Muller Error Correcting Codes. MIT Undergraduate Journal of Math-
ematics Volume 1, MIT Department of Mathematics, 1999.

[4] Hankerson, D. R. et al. Coding Theory and Cryptography: The Essentials. Marcel
Dekker, New York, 1991.

[5] MacWilliams, F. J. and Sloane, N. J. A. The Theory of Error Correcting Codes. North-
Holland Mathematical Library Volume 16, Elsevier Science B. V., Amsterdam, 1977.

50 Channel Coding in Communication Networks

Proof. Let (c_0, c_1, . . . , c_{n−1}) be a codeword. Writing

H × (c_0, c_1, . . . , c_{n−1})^t

is equivalent to making a linear combination of the columns of H. If there is no zero linear combination of d − 1 or fewer columns of H, then the kernel of H (i.e. the code C) does not have a word of weight lower than d. □

2.2.3.7. Minimum distance of C and matrix H


The study of the columns of H gives the minimum distance of C.

EXAMPLE 2.13. We take as code C the (7,4,3) code known as the Hamming code. Its parity check matrix

H = [ 1010101
      0110011
      0001111 ]

has neither a zero column, nor two equal columns. The minimum distance of C is 3.

EXAMPLE 2.14. We take as code C the Hsiao code (8,4,4) whose parity check matrix is:

H = [ 10000111
      01001011
      00101101
      00011110 ]

It is a code that corrects 1 error and detects 2 of them.
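The column criterion can be checked mechanically. In the following C sketch (our own code; storing each column as an integer with the top row of H in the least significant bit is our encoding choice), the minimum distance is the size of the smallest nonempty set of columns whose sum is zero:

```c
#include <assert.h>

/* Our own sketch: cols[] holds the columns of a parity check matrix as
   integers (top row in the least significant bit). Returns the size of
   the smallest nonempty set of columns XORing to zero, i.e. the
   minimum distance; brute force, for small examples only. */
int min_distance(const int *cols, int ncols)
{
    int best = ncols + 1, mask, s, cnt, i;
    for (mask = 1; mask < (1 << ncols); mask++) {
        s = 0;
        cnt = 0;
        for (i = 0; i < ncols; i++)
            if (mask & (1 << i)) {
                s ^= cols[i];
                cnt++;
            }
        if (s == 0 && cnt < best)
            best = cnt;
    }
    return best;
}
```

With this encoding, the columns of the Hamming matrix of Example 2.13 are simply the integers 1 through 7, giving distance 3, while the eight Hsiao columns of Example 2.14 give distance 4.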

2.2.4. Some linear codes

The best known linear codes are the Hamming codes and the Reed-Muller codes (known as RM codes). Hamming codes have a parity check matrix formed by all the non-zero r-tuples; they are the (2^r − 1, 2^r − 1 − r, 3) codes. An RM code with a length of 2^m and order r is built on the basis of vectors v_0, v_1, . . . , v_m, where v_0 = (11 · · · 1) and v_i has 2^{i−1} "0" then 2^{i−1} "1" as components, from left to right, in alternation. The codewords of an RM code of length 2^m and order r are all the products (component by component) of at most r of the codewords v_i. An RM code of order r has length 2^m, dimension 1 + C(m,1) + C(m,2) + · · · + C(m,r), and minimum distance 2^{m−r}.
Block Codes 51

EXAMPLE 2.15 (m = 3, r = 2). We have v_0 = 11111111, v_1 = 01010101, v_2 = 00110011, v_3 = 00001111. The code has 11111111, 01010101, 00110011, 00001111, 00010001, 00000101, 00000011, 00000001, 00000000 as words.

2.2.5. Decoding of linear codes

There are various more or less complex decodings possible, such as, for example, trellis decoding, studied by S. Lin and T. Kasami amongst others.

Step by step decoding

Let us now introduce a very simple algorithm that can be used for all linear codes.

Let there be a linear code C, of length n, correcting t errors per word, for which we take a generator matrix G. We will suppose that C is binary, although this decoding extends directly to non-binary codes.

Preparation of decoding

The following steps must be performed before proceeding to decoding:

1) construction of the parity check matrix H on the basis of G;
2) construction of the table of pairs (weight, syndrome):
   – we take as vector x any vector whose Hamming weight (noted w_H(x)) is less than or equal to t,
   – we pose H[x]^t = [z_x]^t,
   – we memorize in a table all the pairs (w_H(x), z_x).

Decoding

Let c be a transmitted codeword, which is supposed to have been altered by an error x satisfying w_H(x) ≤ t. For each received word c + x we have an initialization phase and an iterative phase.

Initialization phase

The initialization phase comprises three stages:

1) calculation of H[c + x]^t (equal to H[x]^t), which we will call [z_x]^t,
2) search for z_x in the table of pairs, from which we deduce w_H(x),
3) initialization of a variable P to the found value of w_H(x).

Iterative phase, for i = 1 to n

Let us use l_i to indicate the binary vector of Hamming weight 1 whose single 1 is in position i. The iterative phase comprises two stages:
1) calculation of H[c + x + l_i]^t, and search for w_H(x + l_i) in the table. If it is not found, the error cannot be corrected. We pass to 5);
2) analysis of w_H(x + l_i):
   – if w_H(x + l_i) ≥ P, we do nothing,
   – if w_H(x + l_i) < P, then: c + x ← c + x + l_i and P ← w_H(x + l_i).

REMARK. We may stop the iterations as soon as w_H(x + l_i) = 0, since the error is then corrected.

EXAMPLE 2.16. Let C be a BCH code (see section 2.4), with n = 15, t = 2, and g(X) = 1 + X^4 + X^6 + X^7 + X^8. Its generator matrix is:

G = [ 100010111000000
      010001011100000
      001000101110000
      000100010111000
      000010001011100
      000001000101110
      000000100010111 ]

We find the parity check matrix:

H = [ 110000010000000
      011000001000000
      001100000100000
      000110000010000
      000011000001000
      000001100000100
      000000110000010
      000000011000001 ]

The table of pairs (w_H(x), z_x) contains 121 elements. Let us suppose that the transmitted codeword is c = 0 and the received word is c + x with x = (000100010000000).

Initialization phase: we have H(c + x)^t = z_x = (10111011)^t. We go through the table of pairs and find w_H(x) = 2. We pose P = 2.

Iterative phase:
– i = 1, 2, 3:
   – we find nothing in the table,
   – therefore, we do nothing;
– i = 4:
   – H[c + x + l_4]^t = [10000011]^t, and w_H(x + l_4) equals 1, lower than P,
   – we thus replace P by 1 and the received vector by c + x + l_4, i.e. (000000010000000);
– i = 5, 6, 7: nothing changes;
– i = 8: H[c + x + l_8]^t = [00000000]^t and w_H(x + l_8) equals 0. The corrected word is thus c + x + l_8.
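As an illustration of the whole step-by-step procedure on the small Hamming (7,4,3) code of Example 2.13 (t = 1), here is a C sketch (our own code, not from the book; bit i of a word stands for position i + 1, and the function names are ours):

```c
#include <assert.h>

/* Syndrome of a 7-bit word for the Hamming (7,4,3) parity check matrix
   of Example 2.13: column i (1-based) is the binary representation of i. */
static int syndrome(int w)
{
    int i, s = 0;
    for (i = 0; i < 7; i++)
        if (w & (1 << i))
            s ^= i + 1;
    return s;
}

/* Table of pairs: wt[s] = Hamming weight of the lightest error with
   syndrome s, for errors of weight <= t = 1; -1 if not in the table. */
static void build_table(int *wt)
{
    int i;
    for (i = 0; i < 8; i++)
        wt[i] = -1;
    wt[syndrome(0)] = 0;               /* the zero error */
    for (i = 0; i < 7; i++)            /* all weight-1 errors */
        wt[syndrome(1 << i)] = 1;
}

/* Step-by-step decoding: returns the corrected word. */
int decode(int w)
{
    int wt[8], P, i, s;
    build_table(wt);
    P = wt[syndrome(w)];
    if (P <= 0)
        return w;                      /* already a codeword (or not correctable) */
    for (i = 0; i < 7; i++) {          /* try flipping each position l_i */
        s = wt[syndrome(w ^ (1 << i))];
        if (s >= 0 && s < P) {
            w ^= 1 << i;               /* c + x <- c + x + l_i */
            P = s;
            if (P == 0)
                break;                 /* error fully corrected */
        }
    }
    return w;
}
```

For instance, any single-bit corruption of a codeword (such as the zero word, or 0000111, whose positions 1, 2, 3 sum to a zero syndrome) is corrected back.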

2.3. Finite fields

2.3.1. Basic concepts

We presume that the reader is already familiar with the notions of calculation modulo n, the field Z/(p) for p prime (also denoted Fp), and the Euclid and Bézout equalities. We also presume that the concept of the ring of polynomials over the field Fp is known. An important result concerning this ring of polynomials is the following.

PROPOSITION 2.10. Any non-zero polynomial of degree n has at most n roots in a field.

Proof. The proof is outside the scope of this book. 

A useful result for us is provided in the following proposition.

PROPOSITION 2.11. If β is a root of a polynomial f(X) of F2[X], then β^2 is also a root.

Proof. Let us pose f(X) = f0 + f1 X + ··· + fn X^n, fi ∈ F2. Since fi^2 = fi, we have the equalities f(β^2) = f0 + f1 β^2 + ··· + fn β^{2n} = (f0 + f1 β + ··· + fn β^n)^2 = 0^2 = 0. □


2.3.2. Polynomial modulo calculations: quotient ring

Let us suppose a polynomial a(X) ∈ F2 [X]. The set noted F2 [X]/(a(X)) is


the set of polynomial expressions in X, with coefficients in F2 , where we add and
multiply two elements calculating in F2 [X] then taking the remainder of the division
of the result by a(X). We easily prove that it is a ring.
EXAMPLE 2.17. Let us consider A = F2[X]/(a(X)), with a(X) = 1 + X + X^2 + X^3. Let us pose u1(X) = 1 + X + X^2 and u2(X) = X + X^3. In F2[X] we have u1(X)u2(X) = X + X^2 + X^4 + X^5, the remainder of whose division by a(X) is 1 + X^2, which is the result of u1(X)u2(X) in A.

EXAMPLE 2.18. Let us pose a(X) = X^5 + 1 and u1(X) = 1 + X + X^2. Calculate u1(X), Xu1(X), X^2 u1(X), X^3 u1(X), X^4 u1(X), and examine their representations as binary vectors of length 5. We see that we obtain a circular shift with each multiplication by X.
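The circular-shift behaviour of example 2.18 can be observed with a small sketch (same bitmask convention as before; `mulX` is an assumed helper name): multiplying by X modulo X^5 + 1 rotates the 5-bit coefficient vector.

```python
def mulX(v, n):
    """Multiply v(X) by X modulo X^n + 1: one circular shift of the n coefficients."""
    v <<= 1
    if (v >> n) & 1:
        v ^= (1 << n) | 1
    return v

u1 = 0b00111          # u1(X) = 1 + X + X^2
powers = [u1]
for _ in range(4):    # X*u1, X^2*u1, X^3*u1, X^4*u1
    powers.append(mulX(powers[-1], 5))
assert powers == [0b00111, 0b01110, 0b11100, 0b11001, 0b10011]
assert mulX(powers[-1], 5) == u1   # after five shifts we are back to u1(X)
```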

The ring F2 [X]/(a(X)) is called the quotient ring.

2.3.3. Irreducible polynomial modulo calculations: finite field

When p(X) is irreducible of degree n, we demonstrate that F2[X]/(p(X)) is a (finite) field with 2^n elements. The field F2[X]/(p(X)) is also denoted Fq, with q = 2^n. It is said that F2[X]/(p(X)) is a representation of Fq. If there are two irreducible polynomials of the same degree n, then we have two representations of the same field Fq.

It is sometimes necessary (for example for certain decodings) to seek the roots of
a polynomial in a given field. Let us give an example of such a search for the roots of
a polynomial b(Y ) in a finite field.

EXAMPLE 2.19. In F2[X]/(1 + X + X^4) we seek the roots of b(Y) = 1 + Y + Y^2:
1) is it 1 + X? We have (1 + X)^2 + (1 + X) + 1 = 1 + X + X^2 ≠ 0. It is not a root;
2) is it X + X^2? We have (X + X^2)^2 + (X + X^2) + 1 = 1 + X + X^4 = 0. It is a root;
3) is it 1 + X + X^2? We have (1 + X + X^2)^2 + (1 + X + X^2) + 1 = 0. It is a root.

We can write 1 + Y + Y^2 = (Y − (X + X^2))(Y − (1 + X + X^2)) (verify it). We can also verify that (X + X^2)^2 = 1 + X + X^2 (see proposition 2.11).
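Since the field only has 16 elements, the root search of example 2.19 can also be done exhaustively. The sketch below (our bitmask convention; `gf_mul` is our helper name) tests every element y:

```python
def gf_mul(a, b, mod):
    """Multiply a and b in F2[X]/(mod(X)), where mod has degree n."""
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

MOD = 0b10011   # 1 + X + X^4
roots = [y for y in range(16) if gf_mul(y, y, MOD) ^ y ^ 1 == 0]
assert roots == [0b0110, 0b0111]               # X + X^2 and 1 + X + X^2
assert gf_mul(0b0110, 0b0110, MOD) == 0b0111   # (X + X^2)^2 = 1 + X + X^2
```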

2.3.4. Order and inverse of an element of F2[X]/(p(X))

We can study the order and the inverse of an element of a ring, but here we place ourselves in a finite field. Let β ∈ F_{2^n} be non-zero. Let us note that it is invertible because it lies in a field. We consider the family E = {β, β^2, β^3, ...} of distinct successive powers of β.
2.3.4.1. Order
The order of β is the smallest positive integer e such that β e = 1 (e depends on β).

PROPOSITION 2.12. |E| = e.

Proof. E is finite because it is contained in a finite field. Let us pose E = {β, β^2, β^3, ..., β^r}. This means that β^{r+1} has already been obtained in the form β^i, with i ≤ r. Let us suppose β^{r+1} = β^t, with t ≥ 2. We then have ββ^r = ββ^{t−1}, and since β is invertible, we have β^r = β^{t−1} with t − 1 < r: two elements of E would coincide, which contradicts the definition of E. Thus β^{r+1} = β, from which we directly deduce that β^r = 1. The order of β is thus equal to r, and |E| = r = e. □

EXAMPLE 2.20. In F2[X]/(1 + X + X^2) the order of 1 + X is 3, and the inverse of 1 + X is X.

EXAMPLE 2.21. In the field F2[X]/(1 + X + X^4) let us pose β = X^3. We find that its order is 5.

EXAMPLE 2.22. In the field F2[X]/(1 + X + X^4) let us pose β = 1 + X. We find that its order is 15.
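Examples 2.20 to 2.22 can be replayed with a naive order computation that simply walks through the successive powers of β (the bitmask convention and the names `gf_mul`/`order_of` are ours):

```python
def gf_mul(a, b, mod):
    """Multiply a and b in F2[X]/(mod(X))."""
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

def order_of(b, mod):
    """Smallest e > 0 with b^e = 1 (b must be non-zero)."""
    e, p = 1, b
    while p != 1:
        p = gf_mul(p, b, mod)
        e += 1
    return e

assert order_of(0b11, 0b111) == 3          # example 2.20: 1 + X in F4
assert gf_mul(0b11, 0b10, 0b111) == 1      # and its inverse is X
assert order_of(0b1000, 0b10011) == 5      # example 2.21: X^3 in F16
assert order_of(0b11, 0b10011) == 15       # example 2.22: 1 + X in F16
```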

2.3.4.2. Properties of the order


The three following propositions express the properties of the order.

PROPOSITION 2.13. The order e of β divides 2^n − 1.

Proof. The set of the q − 1 invertible elements of the field forms a multiplicative group. The set of powers of β forms a multiplicative subgroup of it. We know that the cardinal of a subgroup divides the cardinal of the group which contains it. Lastly, e is the cardinal of the subgroup. Thus, e divides q − 1. □

PROPOSITION 2.14. If x is of order e, then x^u = 1 implies e|u.

Proof. If u = λe then x^u = (x^e)^λ = 1. Conversely, if x^u = 1, then by the Euclid equality u = qe + r, r < e, and thus 1 = x^u = x^{qe} x^r = (x^e)^q x^r = x^r. Since e is the order of x we must have r = 0. □

PROPOSITION 2.15. If x is of order e, then x^r is of order e/gcd(e, r).

Proof. Let us write (a, b) for gcd(a, b). We have (x^r)^{e/(e,r)} = (x^e)^{r/(e,r)} = 1. Thus the order of x^r divides e/(e, r) (see proposition 2.14). If we have (x^r)^E = 1, then e|rE, i.e. rE = λe for a certain λ, from which [r/(e, r)] × E = [e/(e, r)] × λ. Since (r/(e, r), e/(e, r)) = 1, E is a multiple of e/(e, r). □
Let us provide a method to compute the order of an element β of Fq, q = 2^n:
1) build the lattice of the divisors of q − 1;
2) test β^d for each maximal divisor d of q − 1;
3) if for a maximal divisor k we have β^k = 1, then start again with the lattice of the divisors of k;
4) if β^k ≠ 1 for every maximal divisor k of q − 1, then the order of β is q − 1 (see proposition 2.13).

EXAMPLE 2.23. Let F_{2^6} be represented by F2[X]/(1 + X + X^6). We seek the order of β = 1 + X:
1) the lattice of divisors of 2^6 − 1 = 63 is given in figure 2.1;
2) we must calculate β^9 and β^21. We find β^9 = 1 + X + X^2 + X^4 ≠ 1, which proves that the order does not divide 9. Moreover, we find β^21 = 1. Thus, the order of β is 21 or 7. The calculation yields β^7 = X + X^3 + X^4 + X^5 ≠ 1. Therefore, the order of β is 21.

Figure 2.1. Lattice of divisors of 26 − 1 = 63
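A sketch of this divisor-lattice method on the same β = 1 + X, using exponentiation by squaring for β^d (the helper names `gf_mul`/`gf_pow` are ours): testing the maximal divisors 9 and 21 of 63, then the divisors of 21, recovers the order 21.

```python
def gf_mul(a, b, mod):
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

def gf_pow(b, e, mod):
    """b^e by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, b, mod)
        b = gf_mul(b, b, mod)
        e >>= 1
    return r

MOD  = 0b1000011   # 1 + X + X^6, a representation of F64
beta = 0b11        # 1 + X
assert gf_pow(beta, 9, MOD) != 1    # the order does not divide 9
assert gf_pow(beta, 21, MOD) == 1   # the order divides 21
assert gf_pow(beta, 7, MOD) != 1    # the order is not 7, hence it is 21
ord_beta = min(e for e in (1, 3, 7, 9, 21, 63) if gf_pow(beta, e, MOD) == 1)
assert ord_beta == 21
```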

2.3.4.3. Primitive elements


An element of Fq is called primitive if its order is q − 1. We will see that there
always exists such an element in a field. We will prove the existence, then give the
number of such elements in Fq .

2.3.4.3.1. Existence
Propositions 2.16, 2.17 and 2.18 prove the existence of a primitive element. Let us pose q − 1 = p1^{m1} ··· pk^{mk} (the primary decomposition of q − 1).

PROPOSITION 2.16. There exists an element y1 whose order is of the form p1^{m1} p2^{i2} ··· pk^{ik}.

Proof. If not, every non-zero x of the field would be a root of X^{(q−1)/p1} − 1, which is not possible, because of the degree (see proposition 2.10). □
Of course, there is also an element y2 whose order is of the form p1^{i1} p2^{m2} p3^{i3} ··· pk^{ik}, and so on. There thus exist particular elements y1, y2, ..., yk.
PROPOSITION 2.17. Let z1 = y1^{p2^{m2} p3^{i3} ··· pk^{ik}}. Its order is p1^{m1}.

Proof. Applying proposition 2.15, we see that the order of the element z1 is equal to (p1^{m1} p2^{i2} ··· pk^{ik}) / (p1^{m1} p2^{i2} ··· pk^{ik}, p2^{m2} p3^{i3} ··· pk^{ik}) = p1^{m1}. □

Using the same argument we also obtain elements z2, ..., zk that have the respective orders p2^{m2}, ..., pk^{mk}.
2

PROPOSITION 2.18. The element t = z1 ··· zk has order q − 1.

Proof. Let E be the order of t. E is of the form p1^{r1} ··· pk^{rk} (see proposition 2.13), with ri ≤ mi for all i. We have t^{p1^{r1} p2^{r2} ··· pk^{rk}} = 1. Let us raise this to the power p2^{m2−r2} ··· pk^{mk−rk}. We obtain (t^{p1^{r1} p2^{r2} ··· pk^{rk}})^{p2^{m2−r2} ··· pk^{mk−rk}} = t^{p1^{r1} p2^{m2} ··· pk^{mk}} = z1^{p1^{r1} p2^{m2} ··· pk^{mk}} = 1, since the orders p2^{m2}, ..., pk^{mk} of z2, ..., zk divide the exponent. Thus (see proposition 2.14) p1^{m1} | p1^{r1} p2^{m2} ··· pk^{mk}, and then m1 ≤ r1, which means that r1 = m1. By symmetry we also obtain the equalities r2 = m2, ..., rk = mk, and thus the order of t is equal to q − 1. □

We cannot formally construct a primitive, but if we know one of them we can find
them all, as indicated by the following proposition.

PROPOSITION 2.19. Let x be a primitive. The element x^i is primitive if and only if (i, q − 1) = 1.

Proof. The order of x^i is (q − 1)/(q − 1, i) (see proposition 2.15). □

If we are not in a field, there may not exist a primitive element for the group of invertibles, as the following examples show. Let us recall that ϕ is the Euler indicator: the number of integers smaller than m and relatively prime to m is equal to ϕ(m).

EXAMPLE 2.24. In Z/(9) we have ϕ(9) = 6. The group of invertibles thus has 6 elements. The element 2 has order 6: it is a primitive of the group of invertibles.

EXAMPLE 2.25. In Z/(8) there are 4 invertibles. The invertibles 1, 3, 5, 7 have the respective orders 1, 2, 2, 2. Thus, there is no primitive.

2.3.4.3.2. The number of primitives


The number of primitives is specified by the following result.

COROLLARY 2.1. The number of primitives in Fq is ϕ(q − 1).

Proof. By definition of the Euler function ϕ, and by proposition 2.19. 


2.3.4.4. Use of the primitives


The primitive elements are often used in the application of error correcting codes.

2.3.4.4.1. The use of a primitive to represent the elements of a field


Any element β of F2[X]/(p(X)), with p(X) irreducible of degree n, is a polynomial expression in X with binary coefficients of degree at most n − 1. The product of two elements β1 and β2 is thus a product of two polynomials modulo p(X). It is a rather complex operation, both time and power consuming. Therefore, in practice it is interesting to change the representation of the field. We choose a primitive α and express any non-zero element of the field as a power of this primitive. The advantage is as follows: if β1 = α^i and β2 = α^j, the product is α^{i+j}, where i + j is calculated modulo q − 1, which is very easy and fast. Let us note that this representation renders the sum β1 + β2 more difficult to calculate than with the polynomial expression of the elements. This disadvantage can be mitigated by using a Zech table.

2.3.4.4.2. Zech’s log table to calculate the sum of two elements


If β1 = α^i and β2 = α^j, with i > j, then β1 + β2 = α^j(α^{i−j} + 1). The Zech table takes 1 + α^k as input and gives α^m as output, where 1 + α^k = α^m.

EXAMPLE 2.26. In F2[X]/(1 + X + X^4) we take as primitive α = X. We have the following representation:

1 = 1            α^5 = α + α^2         α^10 = 1 + α + α^2
α = α            α^6 = α^2 + α^3       α^11 = α + α^2 + α^3
α^2 = α^2        α^7 = 1 + α + α^3     α^12 = 1 + α + α^2 + α^3
α^3 = α^3        α^8 = 1 + α^2         α^13 = 1 + α^2 + α^3
α^4 = 1 + α      α^9 = α + α^3         α^14 = 1 + α^3

Let β1 = α^2 + α^3 and β2 = 1 + α + α^2 + α^3. The product is equal to α^{6+12} = α^{18} = α^3.

The Zech table is presented as follows:

1 + α = α^4      1 + α^2 = α^8     1 + α^3 = α^14    1 + α^4 = α
1 + α^5 = α^10   1 + α^6 = α^13    1 + α^7 = α^9

This is sufficient because we have the equality 1 + α^{i+q/2} = α^{i+q/2}(1 + α^{(q/2)−1−i}). For the sum, we have β1 = α^2(1 + α) = α^2 α^4 = α^6, and likewise β2 = 1 + α(1 + α) + α^3 = 1 + α^5 + α^3 = α^10 + α^3 = α^3(1 + α^7) = α^3 α^9 = α^12.
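Example 2.26 can be automated: we tabulate the powers of α, invert the table into a discrete log, and derive the Zech table. The function `add_pow` (our name) then adds two non-zero elements using exponents only:

```python
def gf_mul(a, b, mod):
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

MOD = 0b10011                      # 1 + X + X^4, alpha = X
pw = [1]
for _ in range(14):
    pw.append(gf_mul(pw[-1], 0b10, MOD))
log = {v: i for i, v in enumerate(pw)}
zech = {k: log[1 ^ pw[k]] for k in range(1, 15)}   # 1 + alpha^k = alpha^zech[k]

def add_pow(i, j):
    """Exponent of alpha^i + alpha^j, via alpha^j (1 + alpha^{i-j}); assumes i != j mod 15."""
    return (j + zech[(i - j) % 15]) % 15

assert add_pow(12, 6) == 4         # alpha^12 + alpha^6 = alpha^4 = 1 + alpha
assert all(pw[add_pow(i, j)] == pw[i] ^ pw[j]
           for i in range(15) for j in range(15) if i != j)
```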

2.3.4.5. How to find a primitive

We cannot find a primitive formally, but we can use the following algorithm:
1) create the lattice of the divisors of 2^n − 1;
2) choose a non-zero element β;
3) test the maximal divisors. If no maximal divisor d yields β^d = 1, then the element β is primitive.

EXAMPLE 2.27. In F64 represented by F2[X]/(1 + X + X^6) let us consider the non-zero element β = X. We find β^9 = X^3 + X^4 ≠ 1 and β^21 = 1 + X + X^3 + X^4 + X^5 ≠ 1. Thus, the order of β is 63: it is primitive.

2.3.4.6. Exponentiation
We saw how to search for the order of an element, and how to find out if it is primitive. For large fields (i.e. large q) we are led to calculate β^i for very large i. One of the best methods is to proceed as follows:
1) write i in base 2;
2) calculate the successive squares β, β^2, β^{2^2}, β^{2^3}, etc.;
3) calculate the necessary products (see example 2.28).

We prove that the complexity is in O(log i) instead of O(i).

EXAMPLE 2.28. Calculation of β^21, with the notations of example 2.23:
1) 21 = 16 + 4 + 1;
2) β → β^2 → β^4 → β^8 → β^16, which yields 1 + X → 1 + X^2 → 1 + X^4 → 1 + X^2 + X^3 → X + X^4;
3) β^21 = β^16 β^4 β = (X + X^4)(1 + X^4)(1 + X) = 1.

This method is used, for example, for the calculations necessary for the use of RSA in cryptography.

2.3.5. Minimum polynomials

Let β ∈ F_{2^n}. Let us consider the set Cβ = {β, β^2, β^{2^2}, β^{2^3}, ...}.

PROPOSITION 2.20. There exists a polynomial with binary coefficients which admits the elements of Cβ as its set of roots. This polynomial is irreducible.

Proof. Let us examine Cβ. It is finite, because it is included in a finite field. Let us pose Cβ = {β, β^2, β^{2^2}, ..., β^{2^{t−1}}}. This means that β^{2^t} is an element of the form β^{2^i}, with 0 ≤ i ≤ t − 1.

Let us suppose i ≠ 0. We then have (β^{2^{t−1}})^2 = (β^{2^{i−1}})^2, i.e. (β^{2^{t−1}}/β^{2^{i−1}})^2 = 1. However, the polynomial Z^2 − 1 = (Z − 1)^2 has 1 as its only root (see proposition 2.10). This leads to β^{2^{t−1}} = β^{2^{i−1}}. Thus β^{2^{t−1}} had already been obtained, which goes against the definition of Cβ. Therefore β^{2^t} = β. A consequence of this equality is that the class Cβ is stable under exponentiation by 2.

Now, let us consider the polynomial (Y − β)(Y − β^2)(Y − β^{2^2}) ··· (Y − β^{2^{t−1}}). It has the symmetric functions of its roots as coefficients. Thus, each of its coefficients is invariant under exponentiation by 2, and each coefficient is therefore binary. This polynomial is irreducible, since otherwise it would have a divisor of strictly smaller degree. This divisor would have at least one element of Cβ as a root. As this class is stable under exponentiation by 2, and according to proposition 2.11, the divisor would then have all the elements of Cβ as roots, which is impossible because of its degree (see proposition 2.10). Thus, such a divisor cannot exist. □

This irreducible polynomial is called the minimum polynomial of β, and we denote it by Mβ(Y). The set Cβ is called the cyclotomic class of β.

EXAMPLE 2.29. F2[X]/(1 + X + X^3), β = 1 + X:
1) the cyclotomic class of β is {1 + X, 1 + X^2, 1 + X + X^2};
2) we have Mβ(Y) = (Y − (1 + X))(Y − (1 + X^2))(Y − (1 + X + X^2)) = 1 + Y^2 + Y^3.
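The cyclotomic class and minimum polynomial of example 2.29 can be computed mechanically. In this sketch a polynomial in Y is a list of field elements in ascending degree; `cyc_class` and `minpoly` are our names:

```python
def gf_mul(a, b, mod):
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

def cyc_class(b, mod):
    """The cyclotomic class {b, b^2, b^4, ...} of b."""
    c = [b]
    while True:
        nxt = gf_mul(c[-1], c[-1], mod)
        if nxt == b:
            return c
        c.append(nxt)

def minpoly(b, mod):
    """Product of (Y - r) over the class of b; the coefficients come out binary."""
    p = [1]
    for r in cyc_class(b, mod):
        q = [0] * (len(p) + 1)
        for i, c in enumerate(p):
            q[i + 1] ^= c                  # c * Y  (addition in the field is xor)
            q[i] ^= gf_mul(c, r, mod)      # c * r
        p = q
    return p

MOD = 0b1011   # 1 + X + X^3
assert cyc_class(0b011, MOD) == [0b011, 0b101, 0b111]
assert minpoly(0b011, MOD) == [1, 0, 1, 1]   # 1 + Y^2 + Y^3
```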

When β is primitive, the polynomial Mβ(Y) is referred to as irreducible primitive, or simply primitive.

2.3.6. The field of nth roots of unity

When we study a cyclic code of length n, we are led to seek the smallest field Fq (q = 2^r) containing the nth roots of unity (i.e. the x such that x^n = 1). If x has order n, it is said to be a primitive nth root of unity.

PROPOSITION 2.21. The smallest field Fq (q = 2^r) which contains the nth roots of unity is such that r is the order of 2 modulo n.

Proof. Fq has q − 1 non-zero elements, which form a multiplicative group. The set of nth roots of unity forms a subgroup thereof. Thus, n divides 2^r − 1. Written differently, we have 2^r − 1 = λn, i.e. 2^r = 1 modulo n. Since r is minimal, r is the order of 2 modulo n. □

PROPOSITION 2.22. Let γ be an element of Fq (q = 2^r, r the order of 2 modulo n) which is an nth root of unity, primitive or not. We have:
1) 1 + γ + γ^2 + ··· + γ^{n−1} = 0 if γ ≠ 1,
2) 1 + γ + γ^2 + ··· + γ^{n−1} = 1 if γ = 1.

Proof. Indeed:
1) if γ ≠ 1, γ is a root of the polynomial (1 + X^n)/(1 + X) = 1 + X + ··· + X^{n−1}, since the group of the nth roots is the set of roots of the polynomial 1 + X^n;
2) if γ = 1, the sum equals n modulo 2, and n is odd, since it divides 2^r − 1. □

2.3.7. Projective geometry in a finite field

We consider F_{q^{m+1}} as a vector space of dimension m + 1 over Fq, with q = 2^s. Let α be a primitive of F_{q^{m+1}} and β be a primitive of Fq. We have β = α^{(q^{m+1}−1)/(q−1)}. We can build a particular geometry, known as projective geometry. We define the "points" of F_{q^{m+1}}, then the projective subspaces of dimensions 1, 2, ... in the following way.

2.3.7.1. Points
A point, denoted (α^i), is the subspace of F_{q^{m+1}} generated by α^i, deprived of 0. We have (α^i) = {α^i, βα^i, ..., β^{q−2}α^i} = L(α^i)\{0}. It is a subspace of dimension 1 deprived of 0.

2.3.7.2. Projective subspaces of order 1

If α^j ∉ (α^i), then a projective subspace of order 1, denoted (α^j, α^i), equals L(α^j, α^i)\{0}. It is often called a "projective straight line".

2.3.7.3. Projective subspaces of order t

It is a subspace of dimension t + 1 deprived of 0, in other words L(α^{i1}, ..., α^{i_{t+1}})\{0}. For t = 2, it is often called a "projective plane".

2.3.7.4. An example
Let us take q = 2, s = 1, m = 2, F8 = F2[X]/(1 + X + X^3), α = X. The points are as follows: (α^0) = {α^0} (because β = 1), (α^1), (α^2), (α^3), (α^4), (α^5), (α^6).

The projective lines are as follows:

(α^0, α^1) = {(α^0), (α^1), (α^3)}, because α^0 + α^1 = α^3
(α^0, α^2) = {(α^0), (α^2), (α^6)}, because α^0 + α^2 = α^6
(α^0, α^4) = {(α^0), (α^4), (α^5)}, because α^0 + α^4 = α^5
(α^1, α^2) = {(α^1), (α^2), (α^4)}
(α^1, α^5) = {(α^1), (α^5), (α^6)}
(α^2, α^3) = {(α^2), (α^3), (α^5)}
(α^3, α^4) = {(α^3), (α^4), (α^6)}, because α^3 + α^4 = α^6

The only projective plane is the whole field deprived of 0.
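This Fano-plane example can be rebuilt by brute force: with the discrete log table of F8, the third point of the line through (α^i) and (α^j) is (α^k) where α^k = α^i + α^j. The helper names are ours:

```python
def gf_mul(a, b, mod):
    n = mod.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod
    return r

MOD = 0b1011                       # 1 + X + X^3, alpha = X
pw = [1]
for _ in range(6):
    pw.append(gf_mul(pw[-1], 0b10, MOD))
log = {v: i for i, v in enumerate(pw)}
lines = {frozenset({i, j, log[pw[i] ^ pw[j]]})
         for i in range(7) for j in range(i + 1, 7)}
assert len(set(pw)) == 7                 # the 7 points of PG(2, 2)
assert len(lines) == 7                   # and its 7 lines
assert all(len(l) == 3 for l in lines)   # 3 points per line
assert frozenset({0, 2, 6}) in lines     # the line (alpha^0, alpha^2)
```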
2.3.7.5. Cyclic codes and projective geometry

We note that we can pass from one point to another by multiplication by α. Indeed, the number n of points of the geometry is (q^{m+1} − 1)/(q − 1), and 1 and α^{(q^{m+1}−1)/(q−1)} belong to the same point (1). The set of points can thus be arranged as a cyclic sequence. This suggests considering cyclic codes in Fq[X]/(X^n − 1), which is what we will return to in the description of PG-codes (see section 2.4).

2.4. Cyclic codes

After the theoretical results of C. Shannon and the first linear code constructions (Hamming, Golay), American engineers sought codes stable not only under addition (linear codes), but also under circular sliding (or shift). The codes obtained (cyclic codes) are linear codes with additional properties.

This new requirement led the mathematicians to exploit the structure of A = F2[X]/(X^n − 1), and, in particular, to study the ideals of A. An ideal of A is a non-empty subset, stable under addition, and stable under multiplication by any element of A. It is a cyclic code of length n. Everywhere hereinafter n is odd.

2.4.1. Introduction

The following results express the properties of a cyclic code.

PROPOSITION 2.23. Any code C stable under addition and circular shift may be represented as an ideal of A.

Proof. The circular shift on the right represents the multiplication by X in A. The code C is thus stable under addition and multiplication by X. It is therefore stable under addition and multiplication by any polynomial: thus, it is an ideal of A. Conversely, an ideal of A is clearly a code stable under addition and circular shift. □

PROPOSITION 2.24. Any cyclic code has the form (g(X)) (i.e. the set of multiples of g(X)), with g(X) dividing X^n − 1. More precisely, there is a one-to-one correspondence between the cyclic codes of length n and the set of divisors of X^n − 1.

Proof. We know that the ring A = F2[X]/(X^n − 1) is principal, i.e. every ideal of A is the set of the multiples of one of its elements.

Let C be a cyclic code in A, and let g(X) be a polynomial of minimum degree in the code. In F2[X] we have X^n − 1 = q(X)g(X) + r(X). In A we deduce r(X) = q(X)g(X), and thus r(X) is in C. Due to the minimality of the degree of g(X) it necessarily follows that r(X) = 0. Thus g(X) divides X^n − 1.
Reciprocally, let g(X) be a divisor of X^n − 1. It is straightforward to prove that (g(X)) is a cyclic code.

It remains to be shown that two distinct divisors of X^n − 1 generate two distinct codes C1 and C2. Let us suppose C1 = C2. Then in F2[X] we have g2(X) = q(X)g1(X) + λ(X)(X^n − 1), for a certain λ(X). Thus g1(X) divides g2(X), since g1(X) divides X^n − 1. By symmetry, g2(X) divides g1(X), hence the equality g1(X) = g2(X). □

PROPOSITION 2.25. Any cyclic code (a(X)), where a(X) is arbitrary, is also generated by PG(X) = (a(X), X^n − 1), the gcd of a(X) and X^n − 1.

Proof. Let us pose PG(X) = (a(X), X^n − 1). Using the Bezout equality we obtain PG(X) = λ(X)a(X) + µ(X)(X^n − 1), for certain λ(X), µ(X). This proves that PG(X) is in the code (a(X)). Any multiple of a(X) is also a multiple of PG(X). In (a(X)) there exists a generator of minimum degree; because of the degrees, it must necessarily be PG(X). □

2.4.2. Base, coding, dual code and code annihilator

We now develop the basic ideas on cyclic codes.

2.4.2.1. Cyclic code base

Let there be a cyclic code of length n, with a generator g(X) of degree n − k, and g(X) | X^n − 1.

PROPOSITION 2.26. Let h(X) = (X^n − 1)/g(X), so that k is the degree of h(X). The family F = {g(X), Xg(X), ..., X^{k−1}g(X)} is a basis of the code (g(X)).

Proof. Let a(X)g(X) be a word of the code. Let us pose a(X) = q(X)h(X) + r(X), with r(X) = 0 or deg(r(X)) < deg(h(X)). In A we have a(X)g(X) = r(X)g(X), since g(X)h(X) = 0; thus the family F is a generating one. Let us prove that it is of rank k.

Let us suppose α0 g(X) + α1 Xg(X) + ··· + α_{k−1} X^{k−1} g(X) = 0 in A. In F2[X] we deduce (α0 + α1 X + ··· + α_{k−1} X^{k−1})g(X) = λ(X)(X^n − 1), but the degree of the first member is at most n − 1. Thus both members are zero, and we have α0 = α1 = ··· = α_{k−1} = 0. □

EXAMPLE 2.30 (n = 7, g(X) = 1 + X + X^3). A basis of (g(X)) is {1 + X + X^3, X + X^2 + X^4, X^2 + X^3 + X^5, X^3 + X^4 + X^6}. The generator matrix G associated with this basis is:

G =
⎛ 1101000 ⎞
⎜ 0110100 ⎟
⎜ 0011010 ⎟
⎝ 0001101 ⎠

To build the code we form all the linear combinations of the rows of this generator matrix G. We find 2^4 = 16 words.
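A quick sketch that builds the 16 codewords of example 2.30 as the multiples i(X)g(X) with deg i(X) ≤ 3, and checks stability under the circular shift (`clmul` and `shift` are our helper names, bitmask convention as before):

```python
def clmul(a, b):
    """Carry-less product of two F2[X] polynomials stored as bitmasks."""
    r, i = 0, 0
    while b >> i:
        if (b >> i) & 1:
            r ^= a << i
        i += 1
    return r

g = 0b1011                                # g(X) = 1 + X + X^3
code = {clmul(i, g) for i in range(16)}   # deg i(X)g(X) <= 6: no reduction needed
assert len(code) == 16

def shift(c, n=7):
    """Circular shift: multiplication by X in F2[X]/(X^n - 1)."""
    c <<= 1
    if (c >> n) & 1:
        c ^= (1 << n) | 1
    return c

assert all(shift(c) in code for c in code)   # the code is indeed cyclic
```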

2.4.2.2. Coding
The first coding can be derived from proposition 2.26.

Suppose we have information blocks of length k. Each block will be encoded into a codeword of length n. We thus add n − k redundancy symbols to the k bits of information.

We will describe two classical encodings for cyclic codes. Let us suppose that the code considered here has length n and is generated by a polynomial g(X) of degree n − k. The information is a block of length k, represented by a binary sequence, say i0, i1, ..., i_{k−1}. We associate with this sequence the polynomial (known as the information polynomial) i(X) = i0 + i1 X + ··· + i_{k−1} X^{k−1}.

The first coding consists of calculating the polynomial i(X)g(X). It is clearly in the code, since it is a multiple of g(X): it is a codeword. This coding is referred to as non-systematic.

The second coding consists of calculating first X^{n−k} i(X). Then we calculate the remainder r(X) of the division of this new polynomial X^{n−k} i(X) by g(X). The polynomial obtained is r(X) + X^{n−k} i(X). The sequence of its coefficients is sent through the transmission channel. Generally, we first send the coefficient of largest degree.

The following proposition proves that we have indeed carried out a coding.

PROPOSITION 2.27. If i(X) is an information polynomial, then the polynomial r(X) + X^{n−k} i(X) is the corresponding codeword.

Proof. Since X^{n−k} i(X) = q(X)g(X) + r(X) and we are in characteristic 2, the polynomial r(X) + X^{n−k} i(X) equals q(X)g(X). It is divisible by g(X), thus it belongs to the code. □

This second coding is known as systematic, because the information appears in it unchanged. We sometimes speak of a "systematic code". This is an abuse of language: a cyclic code is an ideal of A, and does not depend at all on the encoding performed.
EXAMPLE 2.31 (n = 7, g(X) = 1+X +X 3 ). Let us take the information block equal
to 1011. The polynomial i(X) is equal to 1 + X 2 + X 3 . The polynomial X n−k i(X)
is X 3 + X 5 + X 6 . The remainder of the division of this polynomial by g(X) is 1. The
coded polynomial is, therefore, 1 + X 3 + X 5 + X 6 . As the length of the code is 7, we
will send the following sequence of 7 binary symbols through the channel 1001011
(the first sent is on the right).

EXAMPLE 2.32 (n = 15, g(X) = 1 + X 3 + X 4 ). Let us take the information block


equal to 10111110001. The polynomial i(X) is equal to 1 + X 2 + X 3 + X 4 + X 5 +
X 6 + X 10 . The polynomial X n−k i(X) is X 4 + X 6 + X 7 + X 8 + X 9 + X 10 + X 14 .
The remainder of the division of this polynomial by g(X) is X 2 + X 3 . The coded
polynomial is thus X 2 + X 3 + X 4 + X 6 + X 7 + X 8 + X 9 + X 10 + X 14 . We will
therefore send 001110111110001 (the first sent is on the right).
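Both systematic encodings of examples 2.31 and 2.32 can be reproduced with the remainder computation of section 2.4.2.2 (bitmask convention and the helper names `pmod`/`encode` are ours):

```python
def pmod(a, m):
    """Remainder of a(X) divided by m(X) in F2[X]."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def encode(info, n, g):
    """Systematic coding: info is the bit string i0 i1 ... i_{k-1}, left to right."""
    i_poly = sum(int(b) << j for j, b in enumerate(info))
    shifted = i_poly << (n - len(info))      # X^{n-k} i(X)
    return shifted ^ pmod(shifted, g)        # r(X) + X^{n-k} i(X)

c1 = encode("1011", 7, 0b1011)               # example 2.31
assert c1 == 0b1101001                       # 1 + X^3 + X^5 + X^6
c2 = encode("10111110001", 15, 0b11001)      # example 2.32
assert c2 == sum(1 << d for d in (2, 3, 4, 6, 7, 8, 9, 10, 14))
assert pmod(c1, 0b1011) == 0 and pmod(c2, 0b11001) == 0   # both are codewords
```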

2.4.2.3. Annihilator and dual of a cyclic code C

Let there be a code C equal to (g(X)).

DEFINITION 2.10 (ANNIHILATOR). The annihilator of (g(X)) is the set of polynomials v(X) whose product with every word of the code C is zero. It is written Ann(C).

Let h(X) = (X^n − 1)/g(X).

PROPOSITION 2.28. The annihilator of C is the cyclic code (h(X)).

Proof. Ann(C) is clearly a cyclic code; it is therefore generated by a polynomial H(X) of minimal degree which divides X^n − 1. But h(X) is in Ann(C), since g(X)h(X) = 0 in A; thus H(X) divides h(X). Moreover, since H(X)g(X) = 0, X^n − 1 divides H(X)g(X) in F2[X], so the degree of H(X) is at least that of h(X). Thus H(X) = h(X). □

DEFINITION 2.11 (ORTHOGONAL). The orthogonal (i.e. dual) of (g(X)) is the set of all the polynomials v(X) with zero scalar product with all the codewords of the code C. It is denoted (g(X))^⊥, or C^⊥.

PROPOSITION 2.29. The dual of C is the cyclic code ((h(X^{−1}), X^n − 1)).

Proof. Let τ be the map from A to A which sends a(X) to a(X^{−1}). We prove (and we will admit it) that τ is an automorphism. In addition we prove (and we will also admit it) the equality:

a(X)b(X) = Σ_{i=0}^{n−1} ⟨a(X), X^i τ(b(X))⟩ X^i

This equality implies that a(X)b(X) = 0 if and only if ⟨a(X), X^i τ(b(X))⟩ = 0 for all i. We proceed in two stages:
1) a polynomial b(X) is thus in Ann(C) if and only if τ(b(X)) ∈ (g(X))^⊥. Thus h(X^{−1}) ∈ (g(X))^⊥, and therefore (h(X^{−1})) ⊆ (g(X))^⊥;
2) in addition, let u(X) ∈ (g(X))^⊥. Then ⟨g(X), u(X)X^i⟩ = 0 for all i, and Σ_{i=0}^{n−1} ⟨g(X), u(X)X^i⟩ X^i = 0. This is equivalent to τ(u(X)) ∈ Ann((g(X))) = (h(X)). Thus τ(u(X)) is a multiple of h(X), so u(X) ∈ (h(X^{−1})), and finally (h(X^{−1})) ⊇ (g(X))^⊥, which proves the equality. Consequently, according to proposition 2.25, the code (h(X^{−1})) is also generated by (h(X^{−1}), X^n − 1). □

2.4.2.4. Cyclic code and error correcting capability: roots of g(X)

We consider a cyclic code of length n generated by a polynomial g(X) with mini-


mum distance d.

One of the great advantages of cyclic codes is that we can obtain information on their minimum distance, i.e. on their error correcting capability. More precisely, the error correcting capability of a code is linked to the roots of its generator.

Let α be a primitive nth root of unity.

2.4.2.5. The Vandermonde determinant

The following result expresses a property of the Vandermonde determinant.

PROPOSITION 2.30. Let us consider the determinant D in the d − 1 indeterminates X0, X1, ..., X_{d−2}:

D = | 1          1          1          ···  1              |
    | X0         X1         X2         ···  X_{d−2}        |
    | X0^2       X1^2       X2^2       ···  X_{d−2}^2      |
    | ···        ···        ···        ···  ···            |
    | X0^{d−2}   X1^{d−2}   X2^{d−2}   ···  X_{d−2}^{d−2}  |

It is equal to the product ∏_{j>i, i=0,...,d−3} (Xi − Xj).

Proof. A determinant is an alternating form of its columns. We observe here that if two of the variables are supposed equal, the determinant D is zero. It is thus divisible by the product P = ∏_{j>i, i=0,...,d−3} (Xi − Xj).
Let us consider the coefficient of X_{d−2}^{d−2} in P. It is equal to ∏_{j>i, i=0,...,d−4} (Xi − Xj). By a recurrence on the size of the determinant we easily prove that this coefficient is equal to the determinant:

| 1          1          1          ···  1              |
| X0         X1         X2         ···  X_{d−3}        |
| X0^2       X1^2       X2^2       ···  X_{d−3}^2      |
| ···        ···        ···        ···  ···            |
| X0^{d−3}   X1^{d−3}   X2^{d−3}   ···  X_{d−3}^{d−3}  |

This proves that the determinant D is equal to the product P. □

COROLLARY 2.2. Let j1, ..., j_{d−1} be distinct integers in {0, ..., n − 1}. Let D be the determinant defined by:

D = | (α^i)^{j1}        (α^i)^{j2}        ···  (α^i)^{j_{d−1}}        |
    | (α^{i+1})^{j1}    (α^{i+1})^{j2}    ···  (α^{i+1})^{j_{d−1}}    |
    | ···               ···               ···  ···                    |
    | (α^{i+d−2})^{j1}  (α^{i+d−2})^{j2}  ···  (α^{i+d−2})^{j_{d−1}}  |

It is not zero.

Proof. One of the properties of determinants is to be a multilinear function of their columns. Factoring α^{i j_l} out of column l, we can write D = α^{i(j1+j2+···+j_{d−1})} D′, with:

D′ = | 1               1               ···  1                 |
     | α^{j1}          α^{j2}          ···  α^{j_{d−1}}       |
     | ···             ···             ···  ···               |
     | α^{(d−2)j1}     α^{(d−2)j2}     ···  α^{(d−2)j_{d−1}}  |

Since α is of order n and the j_l are distinct integers smaller than n, the elements α^{j1}, ..., α^{j_{d−1}} are all different from each other. We may thus apply proposition 2.30: D′ is non-zero, and so is D. □

2.4.2.6. BCH theorem

We can provide a lower bound on the minimum distance of a cyclic code. This result is based on corollary 2.2.

PROPOSITION 2.31 (BCH THEOREM). Let C be a code of length n admitting among its roots the elements α^i, α^{i+1}, ..., α^{i+d−2}, where α is an nth root of unity of order greater than d − 2. Then the code has a minimum distance of at least d.

Proof. A parity check matrix of the code is clearly:

H = | 1   α^i        α^{2i}        α^{3i}        ···  α^{(n−1)i}        |
    | 1   α^{i+1}    α^{2(i+1)}    α^{3(i+1)}    ···  α^{(n−1)(i+1)}    |
    | 1   α^{i+2}    ···           ···           ···  α^{(n−1)(i+2)}    |
    | 1   ···        ···           ···           ···  ···               |
    | 1   α^{i+d−2}  ···           ···           ···  α^{(n−1)(i+d−2)}  |

By corollary 2.2, any choice of d − 1 columns of this matrix yields a non-zero determinant of the type studied there. Every element of the kernel of H (that is, every word of the code considered) thus has a Hamming weight that cannot be less than or equal to d − 1. □

2.4.3. Certain cyclic codes

We present here some of the most used classical codes. We will not speak of the generalized RS codes, of the alternant codes, or of the Goppa codes. Among the latter we find codes resulting from algebraic geometry, which is outside the scope of our subject matter. Often we will present only binary codes, although they also exist over Fq.

2.4.3.1. Hamming codes

Cyclic Hamming codes are equivalent to linear Hamming codes. They are very simple codes, with error correcting capability equal to 1. Let there be the field Fq, q = 2^r, and let α be a primitive of this field.

PROPOSITION 2.32. The binary code C having for roots the elements of the class of α is a cyclic code (n = q − 1, k = n − r, 3), called the Hamming code.

Proof. We take as generator the minimum polynomial of α. Since α is of order q − 1, no polynomial of the form 1 + X^i (i < q − 1) can have α as a root. The minimum weight of the code is thus 3. □
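For q = 8 (r = 3) the proposition gives the (7, 4, 3) code of example 2.30, whose generator 1 + X + X^3 is the minimum polynomial of a primitive α of F8. A sketch of the weight check (bitmask convention and names are ours):

```python
from collections import Counter

def clmul(a, b):
    """Carry-less product of two F2[X] polynomials stored as bitmasks."""
    r, i = 0, 0
    while b >> i:
        if (b >> i) & 1:
            r ^= a << i
        i += 1
    return r

code = {clmul(i, 0b1011) for i in range(16)}   # the (7, 4) cyclic Hamming code
weights = Counter(bin(c).count("1") for c in code)
assert min(w for w in weights if w > 0) == 3   # minimum weight, hence distance, 3
assert weights == Counter({0: 1, 3: 7, 4: 7, 7: 1})
```

The weight distribution 1 + 7z^3 + 7z^4 + z^7 confirms that the code corrects one error.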

There exists a generalization of these Hamming codes. Let there be the field F_{q^m}, q = 2^r. Let α be a primitive of F_{q^m}. We pose β = α^s. We seek a cyclic code with β for root, with an error correcting capability of 1, i.e. a code (n, k, 3).
PROPOSITION 2.33. Such a code verifies the following properties:
1) we have n = (q^m − 1)/(q^m − 1, s), and k = n − w where w is the cardinal of the class of β;
2) such a code exists if ((q^m − 1)/(q^m − 1, s), q − 1) = 1.

Proof. Indeed:
1) the length is the order of β. As the order of α is q^m − 1, we directly have the order of β. The generator is the minimum polynomial of β, of degree w;
2) since we want d = 3, no polynomial of the form 1 + µX^i (µ ∈ Fq, i < n) should admit β as a root. Thus β^i should only belong to Fq if it equals 1. The multiplicative groups (β) and Fq\{0} must have an intersection reduced to {1}. As their respective cardinals are (q^m − 1)/(q^m − 1, s) and q − 1, the proposition follows directly. □

PROPOSITION 2.34. The parameter s must be a multiple of q − 1.

Proof. For ((q^m − 1)/(q^m − 1, s), q − 1) = 1 to hold, it is necessary that (q^m − 1, s) be a multiple of q − 1, since q^m − 1 is divisible by q − 1. As (q^m − 1, s) divides s, the parameter s is then itself a multiple of q − 1. □

According to this proposition we see that the length of such a code cannot exceed
(q^m − 1)/(q − 1). We will now demonstrate that this length can be reached.

PROPOSITION 2.35. There exists such a code of length (q^m − 1)/(q − 1) if
(m, q − 1) = 1.

Proof. We pose β = α^{q−1}. The length is indeed (q^m − 1)/(q − 1). The multiplicative
groups (β) and Fq\{0} have the respective cardinals (q^m − 1)/(q^m − 1, q − 1) and
q − 1, i.e. again (q^m − 1)/(q − 1) and q − 1. But we have (q^m − 1)/(q − 1) = q^{m−1} +
q^{m−2} + · · · + q + 1. Since q^i − 1 = s_i(q − 1) for certain s_i, we have (q^m − 1)/(q − 1) =
λ(q − 1) + m. It follows that such a code exists if ((q − 1), m) = 1. □

2.4.3.2. BCH codes

Let there be Fq, q = 2^r, and a primitive element α. A binary BCH code of length n
is a code admitting as roots the cyclotomic classes of the elements α^i, α^{i+1}, . . . , α^{i+2δ−1},
for any i.

PROPOSITION 2.36. This code has a minimum distance equal to at least 2δ + 1.
Moreover, its dimension is at least n − δr.

Proof. The result for the minimum distance is a consequence of the BCH theorem.
The dimension stems from the fact that all the cyclotomic classes have a cardinal that
divides r (see proposition 2.46). 
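The dimension bound of Proposition 2.36 can be sketched numerically. The parameters below (r = 4, n = 15, δ = 2, roots α^1, . . . , α^4) are an illustrative choice, not taken from the text:

```python
# Dimension of a binary BCH code (Proposition 2.36), computed from the
# cyclotomic classes of its prescribed roots.  Illustrative choice:
# q = 16 (r = 4), n = 15, delta = 2, roots alpha^1 ... alpha^4.

def cyclotomic_class(i, n):
    """Cyclotomic class of i in Z/(n) under multiplication by 2."""
    c, j = set(), i % n
    while j not in c:
        c.add(j)
        j = (2 * j) % n
    return c

r, delta, i0 = 4, 2, 1
n = 2**r - 1

exponents = set()
for i in range(i0, i0 + 2 * delta):        # alpha^i, ..., alpha^{i+2*delta-1}
    exponents |= cyclotomic_class(i, n)

k = n - len(exponents)                      # deg g = cardinal of the union
print(n, k)                                 # a (15, 7) code with d >= 5
assert k >= n - delta * r                   # bound of Proposition 2.36
```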
70 Channel Coding in Communication Networks

2.4.3.3. Fire codes

These are binary codes directly defined by their generator g(X) = p(X)(X^c − 1),
where p(X) is an irreducible polynomial of degree m, not dividing X^c − 1. The length
n of the code is the least common multiple (LCM) of c and of the exponent e of p(X).

Such a code C can correct any packet of errors (or burst) of length b and detect all
bursts of length d, if the following conditions are verified:
1) d ≥ b,
2) b + d ≤ c + 1.
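The length n = LCM(c, e) can be computed by finding the exponent e of p(X), i.e. the order of X modulo p(X). A small sketch, with p(X) = 1 + X + X^4 and c = 7 as illustrative choices (not from the text):

```python
# Length of a Fire code: n = LCM(c, e), where e is the exponent of p(X),
# i.e. the smallest e with p(X) | X^e - 1.  Illustrative choice:
# p(X) = 1 + X + X^4 (a primitive polynomial, e = 15) and c = 7.

from math import lcm

def poly_mod(a, m):
    """Remainder of a by m in F_2[X] (polynomials as bit masks)."""
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def exponent(p):
    """Smallest e >= 1 such that X^e = 1 modulo p(X)."""
    e, x = 1, poly_mod(2, p)          # the integer 2 encodes the polynomial X
    while x != 1:
        x = poly_mod(x << 1, p)       # multiply by X, then reduce
        e += 1
    return e

p = 0b10011                           # p(X) = 1 + X + X^4
c = 7                                 # p(X) does not divide X^7 - 1
e = exponent(p)
n = lcm(c, e)
print(e, n)                           # e = 15, so n = LCM(7, 15) = 105
```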

To prove this capability it is enough to demonstrate that this code cannot contain the
sum of a burst of length b and a burst of length d.

A burst of length b can be represented by a polynomial B(X) of degree b − 1 and
constant term 1.

PROPOSITION 2.37. C does not contain the sum of a burst of length b and of a burst
of length d.

Proof. By reductio ad absurdum: let us suppose that C contains a polynomial
X^i B1(X) + X^j B2(X). By the cyclicity of the code this is equivalent to saying that C
contains a polynomial of the form B1(X) + X^k B2(X) (with k = j − i modulo
n). Since g(X) is divisible by X^c − 1, the latter must divide B1(X) + X^k B2(X).
Let us pose k = qc + r (Euclidean division). We deduce that X^c − 1 must divide
B1(X) + X^r B2(X) + (X^{qc} − 1)X^r B2(X), therefore B1(X) + X^r B2(X). We may
write B1(X) + X^r B2(X) = (X^c − 1)M(X). We proceed in two stages.

Let us suppose that M(X) is not zero. We observe that we have r + d − 1 > b − 1,
and that, therefore, r + d − 1 = c + u, where u is the degree of M(X). Thus
r = c − d + 1 + u ≥ b + u. From this we deduce that r ≥ b > b − 1 and that
r > u. From this we see that in the left-hand side term there exists a monomial X^r,
but that this monomial cannot exist in the right-hand side term, whose monomials have
degree at most u or at least c. The contradiction is obvious.

Thus, B1(X) + X^r B2(X) = 0, which involves (because of the constant term of B1(X))
that r = 0 and B1(X) = B2(X). Thus we are led to suppose that C contains a
polynomial B1(X)(1 + X^{qc}). Since g(X) is divisible by p(X), the latter must divide
B1(X)(1 + X^{qc}). Due to the degrees it cannot divide B1(X). It thus divides 1 + X^{qc}.
Thus, 1 + X^e and 1 + X^c must divide 1 + X^{qc}, and hence n, the LCM of e and c,
must divide qc, which is impossible since qc < n. □

2.4.3.4. RM codes
Let α be a primitive element of Fq, q = 2^m. Binary RM codes are defined by the set
of the powers of α which are roots of the code. The roots are the α^i such that the
Hamming weight of the binary decomposition of i is strictly lower than m − r. This is the
RM code of order r.

PROPOSITION 2.38. An RM code of order r has length q − 1, dimension 1 + C(m, 1) +
C(m, 2) + · · · + C(m, r) (where C(m, i) denotes the binomial coefficient), and minimum
distance 2^{m−r} − 1.

Proof. The proof is outside the scope of this work. 
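Although the proof is omitted, the parameter formulas of Proposition 2.38 are easy to evaluate; the following sketch does so for the illustrative choice m = 5, r = 2:

```python
# Parameters of the RM code of order r (Proposition 2.38):
# length q - 1 = 2^m - 1, dimension 1 + C(m,1) + ... + C(m,r),
# minimum distance 2^(m-r) - 1.  Illustrative choice: m = 5, r = 2.

from math import comb

def rm_parameters(m, r):
    n = 2**m - 1
    k = sum(comb(m, i) for i in range(r + 1))
    d = 2**(m - r) - 1
    return n, k, d

print(rm_parameters(5, 2))   # (31, 16, 7)
```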

2.4.3.5. RS codes
RS codes are codes whose coefficients are in F_{2^r}, with r ≥ 2, of length 2^r − 1.
Their roots are α^i, α^{i+1}, . . . , α^{i+δ−1}, where α is a primitive element.

PROPOSITION 2.39. Such a code is a (q − 1, q − 1 − δ, δ + 1) code. Its BCH distance
is its true distance.

Proof. We have deg g(X) = δ, and W_H(g(X)) = δ + 1. □

2.4.3.6. Codes with true distance greater than their BCH distance
In an exhaustive article1 "On the minimum distance of cyclic codes", J.H. van
Lint and R.M. Wilson provide all the binary codes with length not exceeding 61 that
have a true distance strictly larger than their BCH distance. Here are some examples;
each code is described in the form (length, dimension, true distance, BCH distance):
(31, 20, 6, 4), (31, 15, 8, 5), (39, 12, 12, 8), (45, 16, 10, 8),
(47, 23, 12, 6), (55, 30, 10, 5), (55, 20, 16, 8)
They are decodable by the FG-algorithm which we provide hereafter.

2.4.3.7. PG-codes
2.4.3.7.1. Introduction
We recall (see section 2.3) that we regard F_{q^{m+1}} as a vector space of dimension
m + 1 over Fq, with q = 2^s. Let α be a primitive element of F_{q^{m+1}} and β a primitive
element of Fq. We have β = α^{(q^{m+1} − 1)/(q − 1)}. We have a projective geometry
with n = (q^{m+1} − 1)/(q − 1) points.

We will construct codes in F2[X]/(X^n − 1), but prior to this we will provide two
definitions.

1. VAN LINT J.H., WILSON R.M., “On the minimum distance of cyclic codes”.

DEFINITION 2.12. Let there be 2 integers i and j, and let their respective writings in
base 2 be (i_0, . . . , i_u) and (j_0, . . . , j_u). It will be said that j is under i if (j_0, . . . , j_u)
is less than or equal to (i_0, . . . , i_u) for the product order, i.e. if j_l ≤ i_l for every l.

For example, j = 25 and i = 37. Then j is not under i. With j = 25 and i = 29, j
is under i.

DEFINITION 2.13. We use W_s(t(2^s − 1)) to indicate the maximum number of integers
of the form i(2^s − 1), disjoint two by two, that are under t(2^s − 1).

For example, s = 2, t = 5. Since 3 and 12 are under 15 and are disjoint, we have
W_2(5(2^2 − 1)) = 2.
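The two definitions above reduce to bit manipulations: j is under i exactly when the binary digits of j form a submask of those of i, and "disjoint" means sharing no binary digit. A brute-force sketch reproducing both examples:

```python
# "Under" relation of Definition 2.12 as bitwise domination, and a
# brute-force computation of W_s (Definition 2.13) over small sets.

from itertools import combinations

def under(j, i):
    """True if every binary digit of j is <= the same digit of i."""
    return j & i == j

def disjoint(a, b):
    """Two integers are disjoint if they share no binary digit."""
    return a & b == 0

def W(s, t):
    """Max number of pairwise disjoint multiples of 2^s - 1 under t*(2^s - 1)."""
    m = 2**s - 1
    top = t * m
    candidates = [i * m for i in range(1, t + 1) if under(i * m, top)]
    best = 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(disjoint(a, b) for a, b in combinations(subset, 2)):
                best = size
    return best

print(under(25, 37), under(25, 29))   # False True  (examples of Def. 2.12)
print(W(2, 5))                        # 2: {3, 12} are disjoint and under 15
```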

2.4.3.7.2. PG-codes
We consider the code C such that its orthogonal C⊥ contains all the projective
subspaces of order r of the projective geometry built on F_{q^{m+1}}. This code C is
called a PG-code of order r. The code C⊥ is characterized by the following proposition.

PROPOSITION 2.40. The code C⊥ has for zeros all the elements of the form α^{t(2^s − 1)}
where W_s(t(2^s − 1)) ≤ r, t ≠ 0.

Proof. The proof is outside the scope of this work. 

There does not exist a formula giving the dimension of this code. The code C⊥ must be
constructed explicitly, so that the sought code C can be deduced from it.

The minimum distance of C is given in the following proposition.

PROPOSITION 2.41. The BCH distance of the code C is given by:

d_BCH = (p^{s(m−r+1)} − 1)/(p^s − 1) + 1

Proof. The proof is outside the scope of this work. 

The error correcting capability of the code C stems from the following proposition.

PROPOSITION 2.42. We have the following results:

1) the number J of projective subspaces of rank r, containing a projective subspace
of a fixed rank r − 1, verifies: J = d_BCH − 1;

2) the J projective subspaces have a two by two intersection which is reduced to
this projective subspace of order r − 1;
3) we can correct up to J/2 errors by majority decoding.

Proof. Indeed:
1) it is the number of projective subspaces of rank r that contain a fixed subspace of
rank r − 1;
2) two distinct projective subspaces of order r containing it cannot have an intersection
of order r; their intersection is thus reduced to that subspace of order r − 1;
3) see section 2.6. □

PROPOSITION 2.43. The length of C is equal to the number of points.

Proof. This is straightforward. 

Since d_BCH = J + 1, we can correct up to J/2 errors. The J projective subspaces
are disjoint two by two, apart from the projective subspace of order r − 1.

2.4.3.7.3. An application
Majority decoding makes it possible to carry out decoding with cheap and fast
electronics, especially when decoding is in one stage. If decoding has more than three stages,
the complexity becomes very high.

The Japanese needed to find a powerful code with a cheap decoder. They wanted
to use it for their Teletext. Constraints: length of information 81, number of errors to
be corrected: 8. The solution found uses an information length of 82; the code is then
shortened by one position. The respective values of the parameters are: p = 2, r = 1,
m = 2. Decoding has 1 stage. We deduce from it the dimension of C: 82, the length of
the code: 273, its error correcting capability: 8 (s = 4, because 273 = 2^{4×2} + 2^4 + 1).
It is a (273, 82, 18) code shortened by 1 position, decodable by majority vote with 1
level. The price of an encoder/decoder was 175 FF in 1995.

2.4.3.8. QR codes
Binary quadratic residue codes (QR codes) have a length p, where p is a prime
number in the form 8m + 1 or 8m − 1. For each such p there are 4 QR codes. One has
all the modulo p squares as roots, another has all these squares and 1, another has all
the non squares, and the last one has all non squares and 1.

PROPOSITION 2.44. If p = 8m + 1, then d^2 > p, and if p = 8m − 1, then d(d − 1) ≥
p − 1.

Proof. The proof is outside the scope of this work. 



These codes have an important group of automorphisms. We can then think of
finding good algorithms for trapping isolated errors.

2.4.4. Existence and construction of cyclic codes

Faced with a list of tasks proposed by an industrialist, we are often led to ask whether
there exists a code which fulfills the requirements. If one does exist, it then has to be
constructed. There are tables of known codes for a certain number of values of the
parameters n, k and d.

2.4.4.1. Existence
It is often useful to simply know if there exists a cyclic code with the given param-
eters. It is the case when we are trying to satisfy a list of tasks. The first stage consists
of testing the existence of a code. More precisely, we are led to examine whether there
exists a polynomial g(X), divisor of X n − 1, with a given degree.

PROPOSITION 2.45. There exists a generator g(X), divisor of X^n − 1, with a given
degree s, if and only if there exists in Z/(n) a set of cyclotomic classes under multiplication
by 2, whose total cardinal is equal to s.

Proof. Let F_{2^r} be the smallest field containing the nth roots of unity. Let α be a
primitive element of this field, and β be a primitive nth root of unity.

Let us suppose that the polynomial g(X) has a degree s. Its s roots are powers
of β. The corresponding exponents are elements of Z/(n). According to proposition
2.11 (see section 2.3), these roots are grouped by cyclotomic classes.

The converse is straightforward. The polynomial that admits a union of cyclotomic classes as
its set of roots divides X^n − 1 and is binary (see proposition 2.20). □
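The cyclotomic classes that drive Propositions 2.45 to 2.47 are easy to enumerate. A small sketch for the illustrative case n = 15 = 2^4 − 1, whose class cardinals divide 4 as Proposition 2.46 predicts:

```python
# Cyclotomic classes of Z/(n) under multiplication by 2.  The possible
# generator degrees of Proposition 2.45 are exactly the total cardinals
# of unions of these classes.  Illustrative choice: n = 15 = 2^4 - 1.

def cyclotomic_classes(n):
    seen, classes = set(), []
    for i in range(n):
        if i in seen:
            continue
        c, j = [], i
        while j not in c:           # follow i, 2i, 4i, ... modulo n
            c.append(j)
            j = (2 * j) % n
        classes.append(c)
        seen |= set(c)
    return classes

classes = cyclotomic_classes(15)
print([sorted(c) for c in classes])
# [[0], [1, 2, 4, 8], [3, 6, 9, 12], [5, 10], [7, 11, 13, 14]]
# All cardinals (1, 4, 4, 2, 4) divide 4, as Proposition 2.46 predicts.
```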

PROPOSITION 2.46. In Z/(2^n − 1) there exists a cyclotomic class with a cardinal s
if and only if s divides n.

Proof. If there exists a class with a cardinal s, then s divides n. Let there be x in
Z/(2^n − 1), whose class has a cardinal s. By definition of cyclotomic classes we have
2^s x = x. In addition, there is also 2^n x = x. We use the Euclidean equality between n
and s: n = qs + r, 0 ≤ r < s. From there we obtain (2^s)^q × 2^r x = x, then 2^r x = x,
which implies r = 0, otherwise the class of x would contain fewer than s elements.
Thus, s divides n.

If there exists an s such that s divides n, then there exists a class with a cardinal
s. Let x = (2^n − 1)/(2^s − 1). We have (2^s − 1)x = 0, and the cardinal of the class
of x is thus at most equal to s. Let us suppose that the cardinal is t (0 < t < s).

We then have: (2^t − 1)x = 0, i.e. in Z: (2^t − 1)x = µ × (2^n − 1). This implies:
(2^t − 1)((2^n − 1)/(2^s − 1)) = µ(2^n − 1), from where 2^t − 1 = µ(2^s − 1), which is
impossible. □

2.4.4.2. Construction
There exist various possibilities to construct a binary cyclic code with a given
length n:
– we can use the cyclotomic classes of Z/(n), then construct minimum polynomials;
– we can directly seek g(X) by factorizing X^n − 1;
– we may also be led to seek a code which contains given words.

2.4.4.2.1. Use of classes of Z/(n)

As soon as we have ensured the existence of the generator g(X) of a cyclic code with
length n, we construct it using the following proposition.

PROPOSITION 2.47. Let g(X) be the generator of a cyclic code of length n with a
degree n − k. Let {C_{i1}, C_{i2}, . . . , C_{ir}} be the family of cyclotomic classes used in
Z/(n). The polynomial g(X) has the elements of the form α^j as roots, where j
traverses the union of the classes {C_{i1}, C_{i2}, . . . , C_{ir}}.

Proof. This is straightforward. This proposition simply indicates the link between
classes in Z/(n) and the roots of g(X). We will note that the cardinal of the union
of classes must be equal to n − k. □

2.4.4.2.2. Factorization by the Berlekamp method

We use a linear algebra method introduced by E. Berlekamp. This method is based
on the following propositions describing and justifying the factorization of a polynomial
f(X) of degree r. In the case of cyclic codes we are led to factorize polynomials
of the form X^n − 1, for odd n.

PROPOSITION 2.48. In A = F2[X]/(f(X)) the elevation to the square, which we will
note h, is a linear endomorphism.

Proof. We have successively:

a(X) → h(a(X)) = a(X)^2 = r_a(X) + q_a(X)f(X)
b(X) → h(b(X)) = b(X)^2 = r_b(X) + q_b(X)f(X)

From this we deduce:

h(a(X) + b(X)) = (a(X) + b(X))^2 = a(X)^2 + b(X)^2
= r_a(X) + r_b(X) + (q_a(X) + q_b(X))f(X) = h(a(X)) + h(b(X)) □

In F2[X] let us suppose that f(X) divides a^2(X) − a(X), for a certain a(X) of a
degree strictly smaller than the degree of f(X).

PROPOSITION 2.49. The GCD (f (X), a(X)) is a non-trivial factor of f (X).

Proof. We have a^2(X) − a(X) = a(X)(a(X) − 1) = λ(X)f(X), for a certain λ(X).
Any irreducible factor p(X) of f(X) divides either a(X) or a(X) − 1, but not both,
because otherwise it would divide their difference 1. Thus the GCD (f(X), a(X))
is formed by a family of primary factors of f(X). It can be equal neither to f(X) nor
to 1, because of the hypotheses regarding the degree of a(X). □

To factorize f(X) it is enough to find such an a(X), which the Berlekamp method gives
us. The identity map is noted id.

PROPOSITION 2.50. Any element of A, different from 1, which is in the kernel of h−id,
is such a polynomial a(X).

Proof. Indeed, a(X) ∈ Ker(h − id) is equivalent to a^2(X) − a(X) = 0 in A, i.e.
a^2(X) − a(X) = λ(X)f(X) in F2[X]. □

To find the kernel of h − id we proceed as follows:

1) considering the base {1, X, X^2, . . . , X^{r−1}} (r is the degree of f(X)) we
construct the matrix M of the endomorphism h, then we construct M − I;
2) using the Gaussian method we seek a base of the kernel of M − I;
3) if the only polynomial is 1, we cannot factorize: the polynomial f(X) has only
one primary factor. Otherwise, take a polynomial different from 1. It is the sought after
a(X);
4) we calculate GCD (f(X), a(X)). We obtain a factor f1(X) of f(X), then the second
one by simple division of f(X) by f1(X). We then have f(X) = f1(X)f2(X), and
we reiterate with these two new polynomials.
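Steps 1) to 4) can be sketched as follows, with polynomials encoded as integer bit masks (bit i is the coefficient of X^i); this encoding and the helper names are mine, and the sketch assumes a squarefree input, which holds for X^n − 1 with odd n. It reproduces the factorization of X^7 − 1 worked out in Example 2.34:

```python
# Berlekamp factorization over F_2, following steps 1)-4) above.
# Polynomials are integers: bit i is the coefficient of X^i.

def p_mod(a, m):
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def p_div(a, b):
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        s = a.bit_length() - 1 - db
        q |= 1 << s
        a ^= b << s
    return q

def p_gcd(a, b):
    while b:
        a, b = b, p_mod(a, b)
    return a

def kernel_of_h_minus_id(f):
    """Basis of Ker(h - id) in F_2[X]/(f), h being the squaring map."""
    r = f.bit_length() - 1
    # row i is the coefficient vector of X^(2i) mod f, minus X^i
    rows = [p_mod(1 << (2 * i), f) ^ (1 << i) for i in range(r)]
    basis, kernel = [], []          # basis: (pivot bit, row, combination)
    for i, row in enumerate(rows):
        comb = 1 << i               # tracks which basis polynomials combine
        reduced = True
        while reduced:              # reduce by pivots until stable
            reduced = False
            for pb, brow, bcomb in basis:
                if row >> pb & 1:
                    row ^= brow
                    comb ^= bcomb
                    reduced = True
        if row == 0:
            kernel.append(comb)     # comb encodes an a(X) with a^2 = a in A
        else:
            basis.append((row.bit_length() - 1, row, comb))
    return kernel

def factor(f):
    """Irreducible factors of a squarefree binary polynomial f."""
    stack, factors = [f], []
    while stack:
        f = stack.pop()
        if f.bit_length() <= 2:     # degree <= 1
            if f.bit_length() == 2:
                factors.append(f)
            continue
        a = next((k for k in kernel_of_h_minus_id(f) if k != 1), None)
        if a is None:
            factors.append(f)       # only the trivial kernel: f irreducible
        else:
            g = p_gcd(f, a)         # Proposition 2.49: a non-trivial factor
            stack += [g, p_div(f, g)]
    return sorted(factors)

# Example 2.34: X^7 - 1 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3)
print([bin(p) for p in factor((1 << 7) | 1)])
# ['0b11', '0b1011', '0b1101']
```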

EXAMPLE 2.33 (f(X) = 1 + X^2 + X^3 + X^4). We have:

M − I =
0010
0101
0101
0010

from where successively, by the Gaussian algorithm:

0010      1100      1100      1010      1010
1100      0010      0010      0010      0100
1100  →   1100  →   0000  →   0000  →   0000
0010      0010      0010      0100      0000

which yields the kernel matrix:
0101
1000
It provides a(X) = X + X^3, and we easily
find (f(X), a(X)) = 1 + X. The second factor is 1 + X + X^3. Neither of these two
factors can be factorized further.

We can prove that the factorization of X^{2^r} − X gives all the irreducible polynomials with a
degree dividing r.

EXAMPLE 2.34. Let us factorize X^7 − 1 in F2[X]. With the same notations as in the
previous example we have:

M =                 M − I =
1000000             0000000
0000100             0100100
0100000             0110000
0000010             0001010
0010000             0010100
0000001             0000011
0001000             0001001

We notice the simplicity of the construction of this matrix. Using the Gaussian
method we obtain the needed kernel matrix:

0110100
0001011
1000000

We take a(X) = X + X^2 + X^4 (first line) and we obtain f1(X) = 1 + X + X^3,
then f2(X) = 1 + X + X^2 + X^4. We factorize f2(X). The new matrix M − I equals:

M − I =
0011
0111
0100
0000

The kernel matrix is:
0011
1000

We take a(X) = X^2 + X^3 and obtain 1 + X + X^2 + X^4 = (1 + X)(1 + X^2 + X^3).


We factorize f1(X). The new matrix M − I equals:

M − I =
000
011
010

The kernel matrix is (100).

We cannot factorize further. It can be easily verified that 1 + X + X^3 is irreducible.
Finally we have X^7 − 1 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3). Thus, there are
2^3 − 2 non-trivial cyclic codes (the trivial ones being generated by 1 and by X^7 − 1).

2.4.4.2.3. Construction of a cyclic code generated by given words

We may sometimes have to find the smallest cyclic code which contains one or
more given codewords. Let m(X) be a given binary codeword of length n. We consider
it as an element of A = F2[X]/(X^n − 1). We seek the smallest cyclic code of
A containing this codeword.

PROPOSITION 2.51. Let λ(X) be the polynomial of the smallest degree, such that we
have λ(X)m(X) = 0. The required code is the ideal ((X^n − 1)/λ(X)).

Proof. Indeed:
1) the set of polynomials u(X) such that u(X)m(X) = 0 is an ideal of A. As
any ideal is principal, this ideal is generated by a polynomial with the smallest possible
degree. Thus, it is the polynomial λ(X) of the statement;
2) the polynomial λ(X) divides X^n − 1. Thus, m(X) is in the code ((X^n − 1)/
λ(X)). We pose g(X) = (X^n − 1)/λ(X);
3) every strict subcode of the code (g(X)) is generated by a polynomial of the
form u(X) × g(X) (with u(X) ≠ 1). If m(X) is in such a subcode, then
m(X) must be canceled by (X^n − 1)/(u(X)g(X)), which is impossible, because its
degree is strictly smaller than that of λ(X).

Thus, the required code is ((X^n − 1)/λ(X)). □

Using the Gaussian method pivots we easily find the required code. Let us note
that the pivots may be in any column.

EXAMPLE 2.35. Find the smallest cyclic code containing the following codeword:
110010100001110. By the Gaussian method we find, for example:

110010100001110    1
011001010000111    X
011110001001101    1 + X^2
101111000100110    X + X^3
100111000011101    1 + X^2 + X^4
000000000000000    1 + X + X^3 + X^5 = λ(X)

Each binary vector-row is followed on its right by a polynomial v(X). This translates
the fact that the row is equal to v(X)m(X). These polynomials v(X) appear
during the application of the Gaussian method.

We find g(X) = (X^15 − 1)/λ(X) = 1 + X + X^2 + X^4 + X^5 + X^8 + X^{10}.
The pivots are in columns 7, 8, 9, 10 and 11, the columns being numbered from 1
starting on the right.

In the general case we want to determine the smallest cyclic code containing the
codewords m1(X), m2(X), . . . , ms(X). Using the previous construction we construct
the polynomials λ1(X), λ2(X), . . . , λs(X). The polynomial with the smallest degree
canceling all the mi(X) is clearly the LCM of the λi(X). Another method is to seek the
GCD (m1(X), m2(X), . . . , ms(X), X^n − 1). It is the generator of the required code.
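Example 2.35 can be checked numerically with the GCD method just mentioned; the bit-mask encoding of polynomials below is mine, not from the text:

```python
# Numerical check of Example 2.35.  The word 110010100001110 (read left
# to right as coefficients of 1, X, ..., X^14) gives m(X); the generator
# of the smallest cyclic code containing it is the GCD of m(X) and
# X^15 - 1.  Polynomials are bit masks (bit i holds the coefficient of X^i).

def p_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def p_mod(a, m):
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def p_gcd(a, b):
    while b:
        a, b = b, p_mod(a, b)
    return a

word = "110010100001110"
m = sum(1 << i for i, c in enumerate(word) if c == "1")
xn1 = (1 << 15) | 1                       # X^15 - 1 (= X^15 + 1 over F_2)

g = p_gcd(m, xn1)                         # generator of the required code
lam = 0b101011                            # lambda(X) = 1 + X + X^3 + X^5

assert p_mul(g, lam) == xn1               # g(X) * lambda(X) = X^15 - 1
assert p_mod(m, g) == 0                   # m(X) lies in the code (g(X))
print(bin(g))                             # g = 1 + X + X^2 + X^4 + X^5 + X^8 + X^10
```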

2.4.4.3. Shortened codes and extended codes


2.4.4.3.1. Shortened codes
We remove the s first components of information from each codeword of the code
C. This amounts to considering only those codewords in C which have these s
components equal to 0. A shortened cyclic code is a linear code.

2.4.4.3.2. Extended codes


We add a parity symbol to each codeword, such that the sum of the symbols
of each extended codeword is even. The following is an interesting question:
is it possible for an extended cyclic code to be cyclic? That is one of the suggested
exercises.

2.4.4.4. Specifications
A specification is a set of constraints that the system of coding must satisfy. The
principal parameters, which industry specialists make a point of taking into account,
are as follows:
– length L of the information string to be coded,
– maximum redundancy rate,
– maximum length of the codewords,
– gross flow (i.e. in terms of binary symbols),
– net flow (i.e. in terms of information bits),
– residual error rate for an input error rate (i.e. pr for pe ),
– average space without errors between two badly decoded consecutive words,
– electronic constraints (delicate).

2.4.4.5. How should we look for a cyclic code?


There is no general method. We can start by looking for the possible values of k:
those that divide the length of information strings. Then one can try associating the
possible values of n to each possible value of k. We will study the values of k and
n by ascending values, in order to have the shortest possible code (thus, a priori, the

most economic). Then for fixed n and k we must examine whether there exists a cyclic
code. To that end we study the distribution of cyclotomic classes in Z/(n).

On the basis of this study we find what can be the error correcting capability of the
code. It is then necessary to use the formulae connecting pr to pe , as well as the BCH
theorem. We can have an idea of the decoding power of the code from the following
proposition.

PROPOSITION 2.52. Let there be a cyclic code of length n, dimension k, with error
correcting capability of t errors per word. Let pc be the probability of channel error, and
pr be the residual probability per corrected word. We have the inequality:

npr ≤ Σ_{i=t+1}^{n} (t + i) C(n, i) pc^i (1 − pc)^{n−i}

Proof. Since pr is the probability of error per symbol of a received word, the expectation
of the number of residual errors per corrected word is npr (binomial distribution).
We will provide an upper bound for this expectation.

Let us consider the event "the word has been decoded incorrectly". This event
is included in the following event E: "for any value i (i = t + 1, t + 2, . . . , n) of
the number of transmission errors occurring, the decoding algorithm decodes using
likelihood decoding". This means that the number of errors in the "corrected" word
is at most equal to t + i (i comes from the channel, t comes from decoding). The
expectation of the number of errors in this event E is:

Σ_{i=t+1}^{n} (t + i) C(n, i) pc^i (1 − pc)^{n−i}

which yields the result announced in the statement. 

It will be noted that pr is also the probability of residual errors for the information
block recovered after decoding, provided that this decoding is systematic. Otherwise
the residual rate is much greater.

In practice, when pc is not greater than 10^{−3}, we are able to take as a relation:

npr = (2t + 1) C(n, t + 1) pc^{t+1} (1 − pc)^{n−t−1}
It has to be well noted that the value pr obtained is the one provided by the code.
If we require a residual probability of p′, we must then check, for the code considered,
the inequality:

(2t + 1) C(n, t + 1) pc^{t+1} (1 − pc)^{n−t−1} ≤ np′

If it is satisfied, the code is acceptable.

Lastly, we are able to take into account the more delicate constraints on electronics.

EXAMPLE 2.36 (EXAMPLE OF A SEARCH FOR A CYCLIC CODE). Specifications:


– channel (i.e. input) error rate: 10−4 ,
– maximum redundancy rate: 0.18,
– maximum acceptable residual error rate: 10−5 ,
– information strings of length 105.

We will look for a natural (i.e. not shortened) cyclic code:


1) The possible values for k are the divisors of 105:

{1, 3, 5, 7, 15, 21, 35, 105}

2) Using the constraint on the redundancy rate we find the inequality n ≤ k/(0.82).
This yields the possible values for n for a value of k:

k 1 3 5 7 15 21 35 105
n 1 3 3 7 15 15 31 127

We see that there exists, perhaps, a natural cyclic code of length 127. We examine
the classes of Z/(127). We will look for a union of these classes with a total cardinal of
127 − 105 = 22. The cardinals of classes are divisors of 7 (because 127 = 2^7 − 1).
There are, thus, classes of cardinals 1 and 7. Since 22 = 3 × 7 + 1, we conclude that there
exists a code whose roots contain {α, α^3, α^5, α^0}.

The apparent distance of the code is 8. It thus corrects 3 errors per codeword. If
we approximate the member on the right of the formula linking pr to pe by
(t + 1) × C(n, t + 1) × pe^{t+1} × (1 − pe)^{n−t−1}, we must verify the inequality:

(t + 1) × C(n, t + 1) × pe^{t+1} × (1 − pe)^{n−t−1} ≤ 127 × 10^{−5}

We obtain: 4 × C(127, 4) × 10^{−16} × (0.9999)^{123}, to compare with 127 × 10^{−5}.
That is, 4.084 × 10^{−9} to compare with 127 × 10^{−5}. It is acceptable, therefore there
exists a natural cyclic code that satisfies the required constraints.
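The final computation of Example 2.36 can be reproduced directly, using the same approximation of the right-hand member as above:

```python
# Numerical check of Example 2.36: n = 127, t = 3, channel error rate
# pc = 1e-4, required residual rate 1e-5.  The left-hand side uses the
# approximation (t + 1) * C(n, t+1) * pc^(t+1) * (1 - pc)^(n-t-1).

from math import comb

n, t, pc, p_required = 127, 3, 1e-4, 1e-5

lhs = (t + 1) * comb(n, t + 1) * pc**(t + 1) * (1 - pc)**(n - t - 1)
rhs = n * p_required

print(f"{lhs:.3e} <= {rhs:.3e} : {lhs <= rhs}")
# about 4.08e-09 against 1.27e-03: the inequality holds comfortably
```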

2.4.4.6. How should we look for a shortened cyclic code?

We conduct the same study as before, but we allow ourselves to shorten the
considered codes, which yields a greater choice.

2.4.5. Applications of cyclic codes

Since the beginning of the 1970s many applications of error correcting codes have
been introduced. Let us cite a few:
– transmissions of images by remote spacecrafts,
– satellite transmissions,
– underwater transmissions,
– optical discs,
– Hubble,
– bar-codes,
– computer memory,
– mobiles,
– CD readers,
– cryptography.

2.5. Electronic circuits

The implementation of error correction on board a satellite, a remote spacecraft, in
computer memory, on optical or magnetic discs, in CD readers, is carried out using
electronic circuits. These circuits primarily use shift registers, carrying out multiplications
or divisions of polynomials with coefficients in F2 or F_{2^r}.

In this section, circuits are drawn without taking traditional standards into account,
as far as logical gates and oscillation are concerned. We will not represent connections
with the clock.

2.5.1. Basic gates for error correcting codes

There is the flip-flop, represented as follows, which contains a binary value. This
flip-flop is under the control of a clock. With each beat (or signal) of this clock the
flip-flop transmits the value that it contained and receives the value presented at input.
A flip-flop has an input and an output (see Figure 2.2).

Figure 2.2. Flip-flop (or oscillation)

There are also logical gates, “OR”, “AND”, “exclusive OR” represented as follows
(see Figure 2.3).
Chapter 8

Other Important Block Codes


8.1 Introduction
There are a variety of block codes of both historical and practical importance which are
used either as building blocks or components of other systems, which we have not yet seen
in this book. In this chapter we introduce some of the most important of these.

8.2 Hadamard Matrices, Codes, and Transforms


8.2.1 Introduction to Hadamard Matrices
A Hadamard matrix of order n is an n × n matrix Hn of ±1 entries such that

Hn Hn^T = nIn.

That is, by normalizing Hn by 1/√n an orthogonal matrix is obtained. The distinct columns
of Hn are pairwise orthogonal, as are the rows. Some examples of Hadamard matrices are:

H2 =
 1  1
 1 −1

H4 =
 1  1  1  1
 1 −1  1 −1
 1  1 −1 −1
 1 −1 −1  1

H8 =                                                             (8.1)
 1  1  1  1  1  1  1  1
 1 −1  1 −1  1 −1  1 −1
 1  1 −1 −1  1  1 −1 −1
 1 −1 −1  1  1 −1 −1  1
 1  1  1  1 −1 −1 −1 −1
 1 −1  1 −1 −1  1 −1  1
 1  1 −1 −1 −1 −1  1  1
 1 −1 −1  1 −1  1  1 −1

The operation of computing rHn, where r is a row vector of length n, is sometimes called
computing the Hadamard transform of r. As we show in Section 8.3.3, there are fast
algorithms for computing the Hadamard transform which are useful for decoding certain
Reed-Muller codes (among other things). Furthermore, the Hadamard matrices can be used
to define some error correction codes.
It is clear that multiplying a row or a column of Hn by −1 produces another Hadamard
matrix. By a sequence of such operations, a Hadamard matrix can be obtained which has
the first row and the first column equal to all ones. Such a matrix is said to be normalized.
Some of the operations associated with Hadamard matrices can be expressed using the
Kronecker product.

Definition 8.1 The Kronecker product A ⊗ B of an m × n matrix A with a p × q matrix
B is the mp × nq matrix obtained by replacing every element a_ij of A with the matrix a_ij B. The
Kronecker product is associative and distributive, but not commutative.

Example 8.1 Let

Theorem 8.1 The Kronecker product has the following properties [246, ch. 9]:

1. A ⊗ B ≠ B ⊗ A in general. (The Kronecker product does not commute.)
2. For a scalar x, (xA) ⊗ B = A ⊗ (xB) = x(A ⊗ B).
3. Distributive properties:

(A + B) ⊗ C = (A ⊗ C) + (B ⊗ C).
A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C).

4. Associative property: (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
5. Transposes: (A ⊗ B)^T = A^T ⊗ B^T.
6. Trace (for square A and B): tr(A ⊗ B) = tr(A)tr(B).
7. If A is diagonal and B is diagonal, then A ⊗ B is diagonal.
8. Determinant, where A is m × m and B is n × n: det(A ⊗ B) = det(A)^n det(B)^m.
9. The Kronecker product theorem:

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), (8.2)

provided that the matrices are shaped such that the indicated products are allowed.
10. Inverses: If A and B are nonsingular then A ⊗ B is nonsingular and

(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}. (8.3)

Returning now to Hadamard matrices, it may be observed that the Hadamard matrices
in (8.1) have the structure

H4 = H2 ⊗ H2,    H8 = H2 ⊗ H4.

This works in general:

Theorem 8.2 If Hn is a Hadamard matrix, then so is H2n = H2 ⊗ Hn.



Proof By the properties of the Kronecker product,

H2n H2n^T = (H2 ⊗ Hn)(H2 ⊗ Hn)^T = (H2 H2^T) ⊗ (Hn Hn^T) = (2I2) ⊗ (nIn)
= 2n(I2 ⊗ In) = 2nI2n. □

This construction of Hadamard matrices is referred to as the Sylvester construction. By this
construction, Hadamard matrices of sizes 1, 2, 4, 8, 16, 32, etc., exist. However, unless
a Hadamard matrix of size 6 exists, for example, then this construction cannot be used to
construct a Hadamard matrix of size 12. As the following theorem indicates, there is no
Hadamard matrix of size 6.
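Theorem 8.2 can be sketched directly with the Kronecker product; plain Python lists are used here (an illustrative encoding, not from the text), and the result is checked against the defining relation Hn Hn^T = nIn:

```python
# Sylvester construction (Theorem 8.2): H_2n = H_2 (x) H_n, built with a
# hand-rolled Kronecker product and checked against H_n * H_n^T = n I_n.

def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def gram(H):
    """Compute H * H^T."""
    n = len(H)
    return [[sum(H[i][k] * H[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

H = [[1]]                                  # H_1
for _ in range(3):                         # build H_2, H_4, H_8
    H = kron([[1, 1], [1, -1]], H)

n = len(H)
G = gram(H)
assert all(G[i][j] == (n if i == j else 0) for i in range(n) for j in range(n))
print(n, "x", n, "Hadamard matrix verified")
```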

Theorem 8.3 A Hadamard matrix must have an order that is either 1, 2, or a multiple of 4.

Proof [220, p. 44] Suppose without loss of generality that Hn is normalized. By column
permutations, we can put the first three rows of Hn in the following form, with i, j, k
and l counting the columns of each kind:

1 ...  1 | 1 ...  1 |  1 ...  1 |  1 ...  1
1 ...  1 | 1 ...  1 | −1 ... −1 | −1 ... −1
1 ...  1 | −1 ... −1 | 1 ...  1 | −1 ... −1
\__ i __/ \__ j __/  \__ k __/  \__ l __/

For example, j is the number of columns such that the first two rows of Hn have ones while
the third row has negative ones. Since the rows are orthogonal, we have

i + j − k − l = 0   (inner product of row 1 with row 2)
i − j + k − l = 0   (inner product of row 1 with row 3)
i − j − k + l = 0   (inner product of row 2 with row 3),

which collectively imply i = j = k = l. Thus n = 4i, so n must be a multiple of 4. (If
n = 1 or 2, then there are not three rows to consider.) □
This theorem does not exclude the possibility of a Hadamard matrix of order 12. However,
it cannot be obtained by the Sylvester construction.

8.2.2 The Paley Construction of Hadamard Matrices


Another method of constructing Hadamard matrices is by the Paley construction, which
employs some number-theoretic concepts. This allows, for example, creation of the Hadamard
matrix H12. While in practice Hadamard matrices of order 4k are most frequently
employed, the Paley construction introduces the important concepts of quadratic residues and
the Legendre symbol, both of which have application to other error correction codes.
Definition 8.2 For all numbers a such that (a, p) = 1, the number a is called a quadratic
residue modulo p if the congruence x^2 ≡ a (mod p) has some solution x. That is to say,
a is the square of some number, modulo p. If a is not a quadratic residue, then a is called
a quadratic nonresidue. □

If a is a quadratic residue modulo p, then so is a + p, so we consider as distinct residues
only those which are distinct modulo p.

Example 8.2 The easiest way to find the quadratic residues modulo a prime p is to list the nonzero
numbers modulo p, then square them.
Let p = 7. The set of nonzero numbers modulo p is {1, 2, 3, 4, 5, 6}. Squaring these numbers
modulo p we obtain the list {1^2, 2^2, 3^2, 4^2, 5^2, 6^2} = {1, 4, 2, 2, 4, 1}. So the quadratic residues
modulo 7 are {1, 2, 4}. The quadratic nonresidues are {3, 5, 6}. The number 9 is a quadratic residue
modulo 7, since 9 = 7 + 2, and 2 is a quadratic residue.
Now let p = 11. Forming the list of squares we have {1, 4, 9, 5, 3, 3, 5, 9, 4, 1}.

The quadratic residues modulo 11 are {1, 3, 4, 5, 9}. □

Theorem 8.4 Quadratic residues have the following properties:

1. There are (p − 1)/2 quadratic residues modulo p for an odd prime p.

2. The product of two quadratic residues or two quadratic nonresidues is always a
quadratic residue. The product of a quadratic residue and a quadratic nonresidue is
a quadratic nonresidue.

3. If p is of the form 4k + 1, then −1 is a quadratic residue modulo p. If p is of the
form 4k + 3, then −1 is a nonresidue modulo p.

The Legendre symbol is a number-theoretic function associated with quadratic residues.
Definition 8.3 Let p be an odd prime. The Legendre symbol χp(x) is defined as

    χp(x) =  0  if x is a multiple of p
             1  if x is a quadratic residue modulo p
            −1  if x is a quadratic nonresidue modulo p.

The Legendre symbol χp(x) is also denoted as (x/p). □

Example 8.3 Let p = 7. The Legendre symbol values are

    x:      0  1  2  3  4  5  6
    χ7(x):  0  1  1 −1  1 −1 −1

When p = 11 the Legendre symbol values are

    x:       0  1  2  3  4  5  6  7  8  9 10
    χ11(x):  0  1 −1  1  1  1 −1 −1 −1  1 −1

The key to the Paley construction of Hadamard matrices is the following theorem.

Lemma 8.5 [220, p. 46] For any c ≢ 0 (mod p),

    Σ_{b=0}^{p−1} χp(b) χp(b + c) = −1.    (8.4)
8.2 Hadamard Matrices, Codes, and Transforms 373

Proof From Theorem 8.4 and the definition of the Legendre symbol, χp(xy) = χp(x)χp(y).
Since b = 0 contributes nothing to the sum in (8.4), suppose b ≠ 0. Let z = (b + c)b^{−1}
(mod p). As b runs from 1, 2, ..., p − 1, z takes on distinct values in 0, 2, 3, ..., p − 1,
but not the value 1. Then

    Σ_{b=0}^{p−1} χp(b) χp(b + c) = Σ_{b=1}^{p−1} χp(b)^2 χp((b + c)b^{−1}) = Σ_{b=1}^{p−1} χp(z)
                                  = Σ_{z=0}^{p−1} χp(z) − χp(1) = 0 − χp(1) = −1,

where the sum over all z vanishes since half of the nonzero numbers z from 0 to p − 1 have
χp(z) = −1 and the other half χp(z) = 1, by Theorem 8.4, and χp(1) = 1. □
With this background, we can now define the Paley construction.

1. First, construct the p × p Jacobsthal matrix Jp, with elements qij given by qij =
χp(j − i) (with zero-based indexing). Note that the first row of the matrix is χp(j),
which is just the Legendre symbol sequence. The other rows are obtained by cyclic
shifting.
2. Second, form the matrix

    Hp+1 = [ 1   1^T    ]
           [ 1   Jp − I ],

where 1 is a column vector of length p containing all ones.

Example 8.4 Let p = 7. For the first row of the Jacobsthal matrix, see Example 8.3.
Cyclically shifting it gives

          0  1  1 −1  1 −1 −1
         −1  0  1  1 −1  1 −1
         −1 −1  0  1  1 −1  1
    J7 =  1 −1 −1  0  1  1 −1
         −1  1 −1 −1  0  1  1
          1 −1  1 −1 −1  0  1
          1  1 −1  1 −1 −1  0

and the Hadamard matrix is

          1  1  1  1  1  1  1  1
          1 −1  1  1 −1  1 −1 −1
          1 −1 −1  1  1 −1  1 −1
    H8 =  1 −1 −1 −1  1  1 −1  1
          1  1 −1 −1 −1  1  1 −1
          1 −1  1 −1 −1 −1  1  1
          1  1 −1  1 −1 −1 −1  1
          1  1  1 −1  1 −1 −1 −1

□

Example 8.5 We now show the construction of H12. The 11 × 11 Jacobsthal matrix is

           0  1 −1  1  1  1 −1 −1 −1  1 −1
          −1  0  1 −1  1  1  1 −1 −1 −1  1
           1 −1  0  1 −1  1  1  1 −1 −1 −1
          −1  1 −1  0  1 −1  1  1  1 −1 −1
          −1 −1  1 −1  0  1 −1  1  1  1 −1
    J11 = −1 −1 −1  1 −1  0  1 −1  1  1  1
           1 −1 −1 −1  1 −1  0  1 −1  1  1
           1  1 −1 −1 −1  1 −1  0  1 −1  1
           1  1  1 −1 −1 −1  1 −1  0  1 −1
          −1  1  1  1 −1 −1 −1  1 −1  0  1
           1 −1  1  1  1 −1 −1 −1  1 −1  0

and the Hadamard matrix is

           1  1  1  1  1  1  1  1  1  1  1  1
           1 −1  1 −1  1  1  1 −1 −1 −1  1 −1
           1 −1 −1  1 −1  1  1  1 −1 −1 −1  1
           1  1 −1 −1  1 −1  1  1  1 −1 −1 −1
           1 −1  1 −1 −1  1 −1  1  1  1 −1 −1
           1 −1 −1  1 −1 −1  1 −1  1  1  1 −1
    H12 =  1 −1 −1 −1  1 −1 −1  1 −1  1  1  1      (8.5)
           1  1 −1 −1 −1  1 −1 −1  1 −1  1  1
           1  1  1 −1 −1 −1  1 −1 −1  1 −1  1
           1  1  1  1 −1 −1 −1  1 −1 −1  1 −1
           1 −1  1  1  1 −1 −1 −1  1 −1 −1  1
           1  1 −1  1  1  1 −1 −1 −1  1 −1 −1

The following lemma establishes that the Paley construction gives a Hadamard matrix.

Lemma 8.6 Let Jp be a p × p Jacobsthal matrix. Then Jp Jp^T = pI − U and Jp U =
U Jp = 0, where U is the matrix of all ones.

Proof Let P = Jp Jp^T. Then

    p_ii = Σ_{k=0}^{p−1} q_ik^2 = p − 1   (since χp(x)^2 = 1 for x ≠ 0)

and, for i ≠ j,

    p_ij = Σ_{k=0}^{p−1} q_ik q_jk = Σ_{k=0}^{p−1} χp(k − i) χp(k − j)
         = Σ_{b=0}^{p−1} χp(b) χp(b + c) = −1   (subs. b = k − i, c = i − j, then use Lemma 8.5).

Thus P = pI − U. Also, Jp U = 0 since each row contains (p − 1)/2 elements equal to 1 and
(p − 1)/2 elements equal to −1. □
Now consider Hp+1 Hp+1^T. The (1,1) entry is 1 + 1^T 1 = p + 1, and the off-diagonal
blocks are 1^T + 1^T(Jp^T − I) = 1^T Jp^T = 0 and 1 + (Jp − I)1 = Jp 1 = 0, since the
row and column sums of Jp are zero. But from Lemma 8.6,

    U + (Jp − I)(Jp^T − I) = U + pI − U − Jp − Jp^T + I = (p + 1)I,

since Jp^T = −Jp: for p ≡ 3 (mod 4), χp(−x) = −χp(x) by Theorem 8.4.
So Hp+1 Hp+1^T = (p + 1) Ip+1.
8.2.3 Hadamard Codes
Let An be the binary matrix obtained by replacing the 1s in a Hadamard matrix with 0s, and
replacing the −1s with 1s. We have the following code constructions:

By the orthogonality of Hn, any pair of distinct rows of An must agree in n/2 places
and differ in n/2 places. Deleting the left column of An (since these bits are all the
same and do not contribute anything to the code), the rows of the resulting matrix
form a code of length n − 1 called the Hadamard code, denoted An, having n
codewords and minimum distance n/2. This is also known as the simplex code.

By including the binary complements of all codewords in An we obtain the code Bn,
which has 2n codewords of length n − 1 and a minimum distance of n/2 − 1.

Starting from An again, if we adjoin the binary complements of the rows of An, we
obtain a code with code length n, 2n codewords, and minimum distance n/2. This
code is denoted Cn.
This book does not treat many nonlinear codes. However, if any of these codes are constructed
using a Paley matrix with n > 8, then the codes are nonlinear. (The linear span
of the nonlinear code is a quadratic residue code.) Interestingly, if the Paley Hadamard
matrix is used in the construction of An or Bn, then the codes are cyclic, but not necessarily
linear. If the codes are constructed from Hadamard matrices obtained by the Sylvester
construction, the codes are linear.

8.3 Reed-Muller Codes


Reed-Muller codes were among the first codes to be deployed in space applications, being
used in the deep space probes flown from 1969 to 1977 [373, p. 149]. They were probably
the first family of codes to provide a mechanism for obtaining a desired minimum distance.
And, while they have been largely displaced by Reed-Solomon codes in volume of practice,
they have a fast maximum likelihood decoding algorithm which is still very attractive. They
are also used as components in several other systems. Furthermore, there are a variety
of constructions for Reed-Muller codes, which has made them useful in many theoretical
developments.

8.3.1 Boolean Functions


Reed-Muller codes are closely tied to functions of Boolean variables and can be described
as multinomials over the field GF(2) [284]. Consider a Boolean function of m variables,
f(v1, v2, ..., vm), which is a mapping from the vector space Vm of binary m-tuples to the
binary numbers {0, 1}. Such functions can be represented using a truth table, which is an
exhaustive listing of the input/output values. Boolean functions can also be written in terms
of the variables.
Example 8.6 The table below is a truth table for two functions of the variables v1, v2, v3, and v4.

    v4 = 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
    v3 = 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    v2 = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    v1 = 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    f1 = 0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
    f2 = 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0

It can be verified (using, e.g., methods from elementary digital logic design) that

    f1(v1, v2, v3, v4) = v1 + v2 + v3 + v4

and that

    f2(v1, v2, v3, v4) = 1 + v1v4 + v1v3 + v2v3.

□
The columns of the truth table can be numbered from 0 to 2^m − 1 using a base-2 representation
with v1 as the least-significant bit. Then without ambiguity, the functions can be represented
simply using their bit strings. From Example 8.6,
f1 = (0110100110010110)

f2 = (1111100110101100).
The number of distinct Boolean functions in m variables is the number of distinct binary
sequences of length 2^m, which is 2^{2^m}. The set M of all Boolean functions in m variables
forms a vector space that has a basis

    {1, v1, v2, ..., vm, v1v2, v1v3, ..., v_{m−1}vm, ..., v1v2v3⋯vm}.

Every function f in this space can be represented as a linear combination of these basis
functions:

    f = a0 1 + a1 v1 + a2 v2 + ⋯ + am vm + a12 v1v2 + ⋯ + a12⋯m v1v2⋯vm.

Functional and vector notation can be used interchangeably. Here are some examples
of some basic functions and their vector representations:

    1    ↔ 1    = 1111111111111111
    u1   ↔ v1   = 0101010101010101
    u2   ↔ v2   = 0011001100110011
    u3   ↔ v3   = 0000111100001111
    u4   ↔ v4   = 0000000011111111
    u1u2 ↔ v1v2 = 0001000100010001
    u1u2u3u4 ↔ v1v2v3v4 = 0000000000000001.

As this example demonstrates, juxtaposition of vectors represents the corresponding Boolean
'and' function, element by element. A vector representing a function can be written as

    f = a0 1 + a1 v1 + a2 v2 + ⋯ + am vm + a12 v1v2 + ⋯ + a12⋯m v1v2⋯vm.

8.3.2 Definition of the Reed-Muller Codes

Definition 8.4 [373, p. 151] The binary Reed-Muller code RM(r, m) of order r and length
2^m consists of all linear combinations of the vectors f associated with Boolean functions f that
are monomials of degree ≤ r in m variables. □
[genrm.cc]

Example 8.7 The RM(1, 3) code has length 2^3 = 8. The monomials of degree ≤ 1 are {1, v1, v2, v3},
with associated vectors

    1  ↔ (1 1 1 1 1 1 1 1)
    v3 ↔ (0 0 0 0 1 1 1 1)
    v2 ↔ (0 0 1 1 0 0 1 1)
    v1 ↔ (0 1 0 1 0 1 0 1).

It is natural to describe the code using a generator matrix having these vectors as rows,

        1 1 1 1 1 1 1 1
    G = 0 0 0 0 1 1 1 1
        0 0 1 1 0 0 1 1
        0 1 0 1 0 1 0 1 .

This is an (8, 4, 4) code; it is single-error correcting and double-error detecting. This is also the
extended Hamming code (obtained by adding an extra parity bit to the (7,4) Hamming code). □

Example 8.8 The RM(2, 4) code has length 16 and is obtained by linear combinations of the monomials
up to degree 2, which are

    {1, v1, v2, v3, v4, v1v2, v1v3, v1v4, v2v3, v2v4, v3v4}

with the following corresponding vector representations:

    1    = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)
    v4   = (0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1)
    v3   = (0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1)
    v2   = (0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1)
    v1   = (0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1)
    v3v4 = (0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1)
    v2v4 = (0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1)
    v2v3 = (0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1)
    v1v4 = (0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1)
    v1v3 = (0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1)
    v1v2 = (0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1).

This is a (16, 11) code, with minimum distance 4.

In general, for an RM(r, m) code, the dimension is

    k = 1 + C(m, 1) + C(m, 2) + ⋯ + C(m, r).

The codes are linear, but not cyclic.
As the following lemma states, we can recursively construct an RM(r + 1, m + 1) code,
twice the length, from an RM(r + 1, m) and an RM(r, m) code. In this context, the
notation (f, g) means the concatenation of the vectors f and g.

Lemma 8.7 RM(r + 1, m + 1) = {(f, f + g) for all f ∈ RM(r + 1, m) and g ∈ RM(r, m)}.

Proof The codewords of RM(r + 1, m + 1) are associated with Boolean functions in m + 1
variables of degree ≤ r + 1. If c(v1, ..., v_{m+1}) is such a function (i.e., it represents a
codeword) we can write it as

    c(v1, ..., v_{m+1}) = f(v1, ..., vm) + v_{m+1} g(v1, ..., vm),

where f is a Boolean function in m variables of degree ≤ r + 1, and hence represents a
codeword in RM(r + 1, m), and g is a Boolean function in m variables with degree ≤ r,
representing a codeword in RM(r, m).
Now let f̃(v1, v2, ..., v_{m+1}) = f(v1, v2, ..., vm) + 0 · v_{m+1} represent a codeword
in RM(r + 1, m + 1) and let g̃(v1, v2, ..., v_{m+1}) = v_{m+1} g(v1, v2, ..., vm) represent a
codeword in RM(r + 1, m + 1). The associated vectors, which are codewords in RM(r +
1, m + 1), are

    f̃ = (f, f)   and   g̃ = (0, g).

Their linear combination (f, f + g) must therefore also be a codeword in RM(r + 1, m + 1).
□
We now use this lemma to compute the minimum distance of an R M ( r , m ) code.

Theorem 8.8 The minimum distance of RM(r, m) is 2^{m−r}.

Proof By induction. When m = 1 the RM(0, 1) code is built from the basis {1}, giving rise
to two codewords: it is the length-2 repetition code. In this case d_min = 2. The RM(1, 1)
code is built upon the basis vectors {1, v1} and has four codewords of length two: 00, 01,
10, 11. Hence d_min = 1.
As an inductive hypothesis, assume that up to m and for 0 ≤ r ≤ m the minimum
distance is 2^{m−r}. We will show that d_min for RM(r, m + 1) is 2^{m−r+1}.
Let f and f′ be in RM(r, m) and let g and g′ be in RM(r − 1, m). By Lemma 8.7, the
vectors c1 = (f, f + g) and c2 = (f′, f′ + g′) must be in RM(r, m + 1).
If g = g′ then d(c1, c2) = d((f, f + g), (f′, f′ + g)) = 2d(f, f′) ≥ 2 · 2^{m−r}
by the inductive hypothesis. If g ≠ g′ then

    d(c1, c2) = w(f − f′) + w(g − g′ + f − f′).

Claim: w(x + y) ≥ w(x) − w(y). Proof: Let w_xy be the number of places in which the
nonzero digits of x and y overlap. Then w(x + y) = (w(x) − w_xy) + (w(y) − w_xy). But
since 2w(y) ≥ 2w_xy, the result follows.
By this result,

    d(c1, c2) ≥ w(f − f′) + w(g − g′) − w(f − f′) = w(g − g′).

But g − g′ ∈ RM(r − 1, m), so that w(g − g′) ≥ 2^{m−(r−1)} = 2^{m−r+1}. □
The following theorem is useful in characterizing the duals of RM codes.

Theorem 8.9 For 0 ≤ r ≤ m − 1, the RM(m − r − 1, m) code is dual to the RM(r, m)
code.

Proof [373, p. 154] Let a be a codeword in RM(m − r − 1, m) and let b be a codeword
in RM(r, m). Associated with a is a polynomial a(v1, v2, ..., vm) of degree ≤ m − r − 1;
associated with b is a polynomial b(v1, v2, ..., vm) of degree ≤ r. The product polynomial
has degree ≤ m − 1, and thus corresponds to a codeword in the RM(m − 1, m) code, with
vector representation ab. Since the minimum distance of RM(m − 1, m) is 2^{m−(m−1)} = 2,
and RM(m − 1, m) consists of all the even-weight vectors of length 2^m, the codeword ab
must have even weight. Thus a · b ≡ 0 (mod 2). From this, RM(m − r − 1, m) must be a
subset of the dual code to RM(r, m). Note that

    dim(RM(r, m)) + dim(RM(m − r − 1, m))
        = [1 + C(m, 1) + ⋯ + C(m, r)] + [1 + C(m, 1) + ⋯ + C(m, m − r − 1)]
        = [C(m, 0) + C(m, 1) + ⋯ + C(m, r)] + [C(m, m) + C(m, m − 1) + ⋯ + C(m, r + 1)]
        = Σ_{i=0}^{m} C(m, i) = 2^m,

using C(m, i) = C(m, m − i) in the second sum.
By the theorem regarding the dimensionality of dual codes, Theorem 2.8, RM(m − r − 1, m)
must be the dual to RM(r, m). □
It is clear that the weight distribution of RM(1, m) codes is A_0 = 1, A_{2^m} = 1 and A_{2^{m−1}} =
2^{m+1} − 2. Beyond these simple results, the weight distributions are more complicated.

8.3.3 Encoding and Decoding Algorithms for First-Order RM Codes

In this section we describe algorithms for encoding and decoding RM(1, m) codes, which
are (2^m, m + 1, 2^{m−1}) codes. In Section 8.3.4 we present algorithms for more general RM
codes.

Encoding RM(1, m) Codes

Consider the RM(1, 3) code generated by

        1 1 1 1 1 1 1 1
    G = 0 0 0 0 1 1 1 1      (8.6)
        0 0 1 1 0 0 1 1
        0 1 0 1 0 1 0 1 .

The columns of G consist of the numbers (1, 0, 0, 0) through (1, 1, 1, 1) in increasing binary
counting order. This sequence of bit values can
thus be obtained using a conventional binary digital counter. A block diagram of an encoding
circuit embodying this idea is shown in Figure 8.1.

Figure 8.1: An encoder circuit for an RM(1, 3) code (built around a 3-bit binary digital counter).

Decoding RM(1, m) Codes

The idea behind the decoder is to compare the received sequence r with every codeword in
RM(1, m) by means of correlation, then to select the codeword with the highest correlation.
As we shall show, because of the structure of the code these correlations can be computed
using a Hadamard transform. The existence of a fast Hadamard transform algorithm makes
this an efficient decoding algorithm.
Let r = (r0, r1, ..., r_{2^m−1}) be the received sequence, and let c = (c0, c1, ..., c_{2^m−1})
be a codeword. We note that

    d(r, c) = Σ_{i=0}^{2^m−1} d(ri, ci) = Σ_{i=0}^{2^m−1} ri ⊕ ci,

so that

    2^m − 2 d(r, c) = Σ_{i=0}^{2^m−1} (−1)^{ri ⊕ ci},    (8.7)

where ⊕ denotes addition modulo 2 and d(ri, ci) is the Hamming distance between the
arguments. A sequence which minimizes d(r, c) has the largest number of positive terms
in the sum on the right of (8.7) and therefore maximizes the sum.
Let F(r) be the transformation that converts the binary {0, 1} elements of r to binary ±1
values of a vector R according to

    F(r) = F(r0, r1, ..., r_{2^m−1}) = R = ((−1)^{r0}, (−1)^{r1}, ..., (−1)^{r_{2^m−1}}).

We refer to R as the bipolar representation of r. Similarly define F(c) = C = (C0, C1, ..., C_{2^m−1}).
We define the correlation function

    T = cor(R, C) = cor((R0, R1, ..., R_{2^m−1}), (C0, C1, ..., C_{2^m−1})) = Σ_{i=0}^{2^m−1} Ri Ci.

By (8.7), the codeword c which minimizes d(r, c) maximizes the correlation cor(R, C).
The decoding algorithm is summarized as: Compute Ti = cor(R, Ci), where Ci =
F(ci) for each of the 2^{m+1} codewords, then select that codeword for which cor(R, Ci) is
the largest. The simultaneous computation of all the correlations can be represented as a
matrix product. Let Ci be represented as a column vector and let

    H = [C0, C1, ..., C_{2^m−1}].

Then all the correlations can be computed by

    T = RH.

Recall that the generator matrix for the RM(1, m) code can be written with the rows
1, vm, ..., v2, v1. We actually find it convenient to deal explicitly with those codewords formed as linear
combinations of only the vectors v1, v2, ..., vm, since 1 + c complements all the elements
of c, which corresponds to negating the elements of the transform C. We therefore deal with
the 2^m × 2^m matrix H_{2^m}. Let us examine one of these matrices in detail. For the RM(1, 3)
code with G as in (8.6), the matrix H8 can be written as

          0   1   2   3   4   5   6   7
          1   1   1   1   1   1   1   1
          1  −1   1  −1   1  −1   1  −1
          1   1  −1  −1   1   1  −1  −1
    H8 =  1  −1  −1   1   1  −1  −1   1      (8.8)
          1   1   1   1  −1  −1  −1  −1
          1  −1   1  −1  −1   1  −1   1
          1   1  −1  −1  −1  −1   1   1
          1  −1  −1   1  −1   1   1  −1

(the column indices 0 through 7 are shown above the matrix).
Examination of this reveals that, with this column ordering, column 1 corresponds to F(v1),
column 2 corresponds to F(v2), and column 4 corresponds to F(v3). In general, the ith
column corresponds to the linear combination i1 v1 + i2 v2 + i3 v3, where i has the binary
representation

    i = i1 + 2 i2 + 4 i3.

We write the binary representation as i = (i3, i2, i1)_2. In the general case, for a 2^m × 2^m
Hadamard matrix, we place F(Σ_{j=1}^{m} ij vj) in the ith column, where i = (im, i_{m−1}, ..., i1)_2.
The computation RH is referred to as the Hadamard transform of R.
The decoding algorithm can be described as follows:

Algorithm 8.1 Decoding for RM(1, m) Codes

1  Input: r = (r0, r1, ..., r_{2^m−1}).
2  Output: A maximum-likelihood codeword ĉ.
3  Begin
4    Find the bipolar representation R = F(r).
5    Compute the Hadamard transform T = R H_{2^m} = (t0, t1, ..., t_{2^m−1}).
6    Find the coordinate ti with the largest magnitude.
7    Let i have the binary expansion (im, i_{m−1}, ..., i1)_2. (i1 LSB)
8    if (ti > 0)   (1 is not sent)
9      ĉ = Σ_{j=1}^{m} ij vj
10   else          (1 is sent: complement all the bits)
11     ĉ = 1 + Σ_{j=1}^{m} ij vj
12   end (if)
13 End

Example 8.9 For the RM(1, 3) code, suppose the received vector is

    r = [1, 0, 0, 1, 0, 0, 1, 0].

The steps of the algorithm follow:   [rmdecex.m]

1. Compute the transform: R = [−1, 1, 1, −1, 1, 1, −1, 1].
2. Compute T = RH = [2, −2, 2, −2, −2, 2, −2, −6].
3. The maximum absolute element occurs at t7 = −6, so i = 7 = (1, 1, 1)_2.
4. Since t7 < 0, ĉ = 1 + v1 + v2 + v3 = [1, 0, 0, 1, 0, 1, 1, 0]. □

Expediting Decoding Using the Fast Hadamard Transform

The main step of the algorithm is the computation of the Hadamard transform RH. This
can be considerably expedited by using a fast Hadamard transform, applicable to Hadamard
matrices obtained via the Sylvester construction. This transform is analogous to the fast
Fourier transform (FFT), but is over the set of numbers ±1. It is built on some facts from
linear algebra.
As we have seen, Sylvester Hadamard matrices can be built by

    H_{2^m} = H2 ⊗ H_{2^{m−1}}.    (8.9)

This gives the following factorization.

Theorem 8.10 The matrix H_{2^m} can be written as

    H_{2^m} = M_{2^m}^{(1)} M_{2^m}^{(2)} ⋯ M_{2^m}^{(m)},    (8.10)

where

    M_{2^m}^{(i)} = I_{2^{m−i}} ⊗ H2 ⊗ I_{2^{i−1}},

and where I_p is a p × p identity matrix.

Proof By induction. When m = 1 the result holds, as may be easily verified. Assume,
then, that (8.10) holds for m. We find that

    M_{2^{m+1}}^{(i)} = I_{2^{m+1−i}} ⊗ H2 ⊗ I_{2^{i−1}}
                      = (I2 ⊗ I_{2^{m−i}}) ⊗ H2 ⊗ I_{2^{i−1}}   (by the structure of the identity matrix)
                      = I2 ⊗ (I_{2^{m−i}} ⊗ H2 ⊗ I_{2^{i−1}})   (associativity)
                      = I2 ⊗ M_{2^m}^{(i)}                       (definition).

Furthermore, by the definition, M_{2^{m+1}}^{(m+1)} = H2 ⊗ I_{2^m}. We have

    H_{2^{m+1}} = M_{2^{m+1}}^{(1)} M_{2^{m+1}}^{(2)} ⋯ M_{2^{m+1}}^{(m+1)}
                = (I2 ⊗ M_{2^m}^{(1)})(I2 ⊗ M_{2^m}^{(2)}) ⋯ (I2 ⊗ M_{2^m}^{(m)})(H2 ⊗ I_{2^m})
                = (I2 H2) ⊗ (M_{2^m}^{(1)} M_{2^m}^{(2)} ⋯ M_{2^m}^{(m)} I_{2^m})   (Kronecker product theorem 8.2)
                = H2 ⊗ H_{2^m}. □

Example 8.10 By the theorem, we have the factorization

    H8 = M8^{(1)} M8^{(2)} M8^{(3)} = (I_{2^2} ⊗ H2 ⊗ I_{2^0})(I_{2^1} ⊗ H2 ⊗ I_{2^1})(I_{2^0} ⊗ H2 ⊗ I_{2^2}).

[hadex.m] Straightforward substitution and multiplication shows that this gives the matrix H8 in (8.8).
Let R = [R0, R1, ..., R7]. Then the Hadamard transform can be written

    T = RH8 = R(M8^{(1)} M8^{(2)} M8^{(3)}) = R(I_{2^2} ⊗ H2 ⊗ I_{2^0})(I_{2^1} ⊗ H2 ⊗ I_{2^1})(I_{2^0} ⊗ H2 ⊗ I_{2^2}).



The matrices involved are

                       1  1  0  0  0  0  0  0
                       1 −1  0  0  0  0  0  0
                       0  0  1  1  0  0  0  0
    M8^{(1)} = I4 ⊗ H2 =  0  0  1 −1  0  0  0  0
                       0  0  0  0  1  1  0  0
                       0  0  0  0  1 −1  0  0
                       0  0  0  0  0  0  1  1
                       0  0  0  0  0  0  1 −1

                            1  0  1  0  0  0  0  0
                            0  1  0  1  0  0  0  0
                            1  0 −1  0  0  0  0  0
    M8^{(2)} = I2 ⊗ H2 ⊗ I2 =  0  1  0 −1  0  0  0  0
                            0  0  0  0  1  0  1  0
                            0  0  0  0  0  1  0  1
                            0  0  0  0  1  0 −1  0
                            0  0  0  0  0  1  0 −1

                       1  0  0  0  1  0  0  0
                       0  1  0  0  0  1  0  0
                       0  0  1  0  0  0  1  0
    M8^{(3)} = H2 ⊗ I4 =  0  0  0  1  0  0  0  1
                       1  0  0  0 −1  0  0  0
                       0  1  0  0  0 −1  0  0
                       0  0  1  0  0  0 −1  0
                       0  0  0  1  0  0  0 −1

[testfht.cc, fht.cc, fht.m]
Figure 8.2 shows the flow diagram corresponding to the matrix multiplications, where arrows
indicate the direction of flow, arrows incident along a line imply addition, and the coefficients −1
along the horizontal branches indicate the gain along their respective branch. At each stage, the two-
point Hadamard transform is apparent. (At the first stage, the operations of H2 are enclosed in the box
to highlight the operation.) The interleaving of the various stages by virtue of the Kronecker product
is similar to the "butterfly" pattern of the fast Fourier transform. □

The conventional computation of the Hadamard transform R H_{2^m} produces 2^m elements,
each of which requires 2^m addition/subtraction operations, for a total complexity of (2^m)^2.
The fast Hadamard transform has m stages, each of which requires 2^m addition/subtraction
operations, for a total complexity of m 2^m. This is still exponential in m (typical for maximum
likelihood decoding), but much lower than brute force evaluation. Furthermore, as Figure
8.2 suggests, parallel/pipelined hardware architectures are possible.
The RM(1, m) decoding algorithm employing the fast Hadamard transform is referred
to as the "Green machine," after its developer at the Jet Propulsion Laboratory for the 1969
Mariner mission [373].


Figure 8.2: Signal flow diagram for the fast Hadamard transform.

8.3.4 The Reed Decoding Algorithm for RM(r, m) Codes, r ≥ 1

Efficient decoding algorithms for general RM codes rely upon the concept of majority logic
decoding, in which multiple estimates of a bit value are obtained, and the decoded value
is that value which occurs in the majority of estimates. We demonstrate this first for an
RM(2, 4) code, then develop a notation to extend this to other RM(r, m) codes.

Details for an RM(2, 4) Code

Let us write the generator for the RM(2, 4) code as

        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1    <- G0 (the row 1)
        0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1    <- G1 (the rows v4, v3, v2, v1)
        0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
        0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
        0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    G = 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1    <- G2 (the rows v3v4, v2v4, v2v3, v1v4, v1v3, v1v2)
        0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
        0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
        0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
        0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
        0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1

We partition the 11 input bits to correspond to the rows of this matrix as

    m = (m0 | m1 | m2) = (m0 | m4, m3, m2, m1 | m34, m24, m23, m14, m13, m12).

Thus the bit in m0 is associated with the zeroth order term, the m1 bits are associated
with the first order terms, and the second-order terms are associated with m2. The encoding
operation is

    c = (c0, c1, c2, ..., c15) = mG = m0 G0 + m1 G1 + m2 G2.    (8.11)
The general operation of the algorithm is as follows: Given a received vector r, estimates
are first obtained for the highest-order block of message bits, m2. Then m2G2 is subtracted
from r, leaving only lower-order codewords. Then the message bits for m1 are obtained,
then subtracted, and so forth.
The key to finding the message bits comes from writing multiple equations for the same
quantity and taking a majority vote. Selecting coded bits from (8.11) we have

    c0 = m0
    c1 = m0 + m1
    c2 = m0 + m2
    c3 = m0 + m1 + m2 + m12.

Adding these code bits together (modulo 2) we obtain an equation for the message bit m12:

    c0 + c1 + c2 + c3 = m12.

We can similarly obtain three other equations for the message bit m12:

    c4 + c5 + c6 + c7 = m12
    c8 + c9 + c10 + c11 = m12
    c12 + c13 + c14 + c15 = m12.

Given the code bits c0, ..., c15, we could thus compute m12 in four independent ways. However,
the code bits are not available at the receiver, only the received bits r = (r0, r1, ..., r15).
We use these in conjunction with the equations above to obtain four estimates of m12:

    m̂12^(1) = r0 + r1 + r2 + r3
    m̂12^(2) = r4 + r5 + r6 + r7
    m̂12^(3) = r8 + r9 + r10 + r11
    m̂12^(4) = r12 + r13 + r14 + r15.

Expressions such as this, in which the check sums all yield the same message bit, are said
to be orthogonal¹ on the message bit. From these four orthogonal equations, we determine
the value of m12 by majority vote. Given m̂12^(i), i = 1, 2, 3, 4, the decoded value m̂12 is

    m̂12 = maj(m̂12^(1), m̂12^(2), m̂12^(3), m̂12^(4)),

where maj(···) returns the value that occurs most frequently among its arguments.
If errors occur such that only one of the m̂12^(i) is incorrect, the majority vote gives the
correct answer. If two of them are incorrect, then it is still possible to detect the occurrence
of errors.
¹This is a different usage from orthogonality in the vector-space sense.

Similar sets of orthogonal equations can be written for each of the other second-order bits.
For m34, for example,

    m34 = c0 + c4 + c8 + c12
    m34 = c1 + c5 + c9 + c13
    m34 = c2 + c6 + c10 + c14
    m34 = c3 + c7 + c11 + c15.    (8.12)

Based upon these equations, majority logic decisions are computed for each element of the
second-order block. These decisions are then stacked up to give the block

    m̂2 = (m̂34, m̂24, m̂23, m̂14, m̂13, m̂12).

We then "peel off" these decoded values to get

    r′ = r − m̂2 G2.
Now we repeat for the first-order bits. We have eight orthogonal check sums on each of the
first-order message bits. For example,

    m1 = c0 + c1   m1 = c2 + c3     m1 = c4 + c5     m1 = c6 + c7
    m1 = c8 + c9   m1 = c10 + c11   m1 = c12 + c13   m1 = c14 + c15.

We use the bits of r′ to obtain eight estimates,

    m̂1^(1) = r′0 + r′1   m̂1^(2) = r′2 + r′3     m̂1^(3) = r′4 + r′5     m̂1^(4) = r′6 + r′7
    m̂1^(5) = r′8 + r′9   m̂1^(6) = r′10 + r′11   m̂1^(7) = r′12 + r′13   m̂1^(8) = r′14 + r′15,

then make a decision using majority logic,

    m̂1 = maj(m̂1^(1), m̂1^(2), ..., m̂1^(8)).

Similarly, eight orthogonal equations can be written on each of the bits m2, m3, and m4,
resulting in the estimate m̂1 = (m̂4, m̂3, m̂2, m̂1).
Having estimated m1, we strip it off from the received signal,

    r″ = r′ − m̂1 G1,

and look for m0. But if the previous decodings are correct,

    r″ = m0 1 + e.

Then m0 is obtained simply by majority vote:

    m̂0 = maj(r″0, r″1, ..., r″15).

If at any stage there is no clear majority, then a decoding failure is declared.



Example 8.11 The computations described here are detailed in the indicated file. Suppose m =
(00001001000), so the codeword is c = mG = (0101011001010110). Suppose the received vector
is r = (0101011011010110). The bit message estimates are

    m̂12^(1) = r0 + r1 + r2 + r3 = 0
    m̂12^(2) = r4 + r5 + r6 + r7 = 0
    m̂12^(3) = r8 + r9 + r10 + r11 = 1
    m̂12^(4) = r12 + r13 + r14 + r15 = 0.

We obtain m̂12 = maj(0, 0, 1, 0) = 0. We similarly find

    m̂13 = maj(0, 0, 1, 0) = 0   m̂14 = maj(1, 0, 0, 0) = 0   m̂23 = maj(1, 1, 0, 1) = 1
    m̂24 = maj(1, 0, 0, 0) = 0   m̂34 = maj(1, 0, 0, 0) = 0,

so that m̂2 = (001000). Removing this decoded value from the received vector we obtain

    r′ = r − m̂2 G2 = (0101010111010101).

At the next block we have

    m̂1 = maj(1, 1, 1, 1, 0, 1, 1, 1) = 1   m̂2 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0
    m̂3 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0   m̂4 = maj(1, 0, 0, 0, 0, 0, 0, 0) = 0,

so that m̂1 = (0001). We now remove this decoded block,

    r″ = r′ − m̂1 G1 = (0000000010000000).

The majority decision is m̂0 = 0. The overall decoded message is

    m̂ = (m̂0, m̂1, m̂2) = (00001001000).

This matches the message sent. □

A Geometric Viewpoint
Clearly the key to employing majority logic decoding on an RM(r, m) code is to find a
description of the equations which are orthogonal on each bit. Consider, for example, the
orthogonal equations for m34, as seen in (8.12). Writing down the indices of the checking
bits, we create the check set

    S34 = {{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}}.

Now represent the indices in 4-bit binary,

    S34 = {{(0000), (0100), (1000), (1100)}, {(0001), (0101), (1001), (1101)},
           {(0010), (0110), (1010), (1110)}, {(0011), (0111), (1011), (1111)}}.
Within each subset there are pairs of binary numbers which are adjacent, differing by a
single bit. We can represent this adjacency with a graph that has a vertex for each of the
numbers from 0000 to 1111, with edges between those vertices that are logically adjacent.

Figure 8.3: Binary adjacency relationships in three and four dimensions. (a) Three dimensions. (b) Four dimensions.

The graph for a code with m = 3 is shown in Figure 8.3(a). It forms a conventional 3-
dimensional cube. The graph for a code with m = 4 is shown in Figure 8.3(b); it forms a
4-dimensional hypercube. The check set S34 can be represented as subsets of the nodes in
the graph. Figure 8.4 shows these sets by shading the "plane" defined by each of the four
check subsets. Similarly, the check sets for each of the bits m12, m13, etc., form a set of planes.

Figure 8.4: Planes shaded to represent the equations orthogonal on bit m34.

With these observations, let us now develop the notation to describe the general situation.
For a codeword c = (c0, c1, ..., c_{n−1}), let the coordinate ci be associated with the binary m-
tuple Pi obtained by complementing the binary representation of the index i. For example, c0
is associated with P0 = (1111) (since 0 = (0000)_2) and c6 is associated with P6 = (1001)
(since 6 = (0110)_2). We think of the Pi as points on the adjacency graph such as those
shown in Figure 8.3.
Each codeword c in RM(r, m) forms an incidence vector for a subset of the graph,
selecting points in the graph corresponding to 1 bits in the codeword. For example, the
codeword c = (0101011001010110) is an incidence vector for the subset containing the
points {P1, P3, P5, P6, P9, P11, P13, P14}.

Let I = {1, 2, ..., m}. We represent the basis vectors for the RM(r, m) code as subsets
of I. For example, the basis vector v1 is represented by the set {1}. The basis vector v2v3 is
represented by the set {2, 3}. The basis vector v2v3v4 is represented by {2, 3, 4}. With this
notation, we now define the procedure for finding the orthogonal check sums for the vector
v_{i1} v_{i2} ⋯ v_{ip} [373, p. 160].

1. Let S = {S1, S2, ..., S_{2^{m−p}}} be the subset of points associated with the incidence
vector v_{i1} v_{i2} ⋯ v_{ip}.
2. Let {j1, j2, ..., j_{m−p}} be the set difference I − {i1, i2, ..., ip}. Let T be the subset
of points associated with the incidence vector v_{j1} v_{j2} ⋯ v_{j_{m−p}}. The set T is called the
complementary subspace to S.
3. The first check sum consists of the sum of the coordinates specified by the points in
T.
4. The other check sums are obtained by "translating" the set T by the points in S. That
is, for each Si ∈ S, we form the set T + Si. The corresponding check sum consists
of the sum of the coordinates specified by this set.

Example 8.12 Check sums for RM(2, 4). Let us find check sums for v3v4 = (0000000000001111).
1. The subset for which v3v4 is an incidence vector contains the points

    S = {P12, P13, P14, P15} = {(0011), (0010), (0001), (0000)}.

In Figure 8.4, the set S is indicated by the dashed lines.
2. The difference set is

    {j1, j2} = {1, 2, 3, 4} − {3, 4} = {1, 2},

which has the associated vector v1v2 = (0001000100010001). This is the incidence vector
for the set

    T = {P3, P7, P11, P15} = {(1100), (1000), (0100), (0000)}.

In Figure 8.4, the set T is the darkest of the shaded regions.
3. T represents the check sum m34 = c12 + c8 + c4 + c0.
4. The translations of T by the nonzero elements of S are:

    by P12 = (0011) → {(1111), (1011), (0111), (0011)} = {P0, P4, P8, P12}
    by P13 = (0010) → {(1110), (1010), (0110), (0010)} = {P1, P5, P9, P13}
    by P14 = (0001) → {(1101), (1001), (0101), (0001)} = {P2, P6, P10, P14}.

These correspond to the check sums

    m34 = c15 + c11 + c7 + c3
    m34 = c14 + c10 + c6 + c2
    m34 = c13 + c9 + c5 + c1,

which are shown in the figure as shaded planes.

390 Other Important Block Codes

Figure 8.5: Geometric descriptions of parity check equations for second-order vectors of the RM(2, 4) code.

Figure 8.5 indicates the check equations for all of the second-order vectors for the RM(2, 4) code. Now let us examine check sums for the first-order vectors.
1. For the vector v4 = (0000000011111111) the set S is

S = {p8, p9, p10, p11, p12, p13, p14, p15}
  = {(0111), (0110), (0101), (0100), (0011), (0010), (0001), (0000)}.

These eight points are connected by the dashed lines in Figure 8.6(a).

2. The difference set is

{1, 2, 3, 4} − {4} = {1, 2, 3},

which has the associated vector v1v2v3 = (0000000100000001). This is the incidence vector for the set

T = {p7, p15} = {(1000), (0000)}.

The corresponding equation is m4 = c8 + c0. The subset is indicated by the widest line in Figure 8.6(a).

3. There are eight translations of T by the points in S. These are shown by the other wide lines in the figure.
□


Figure 8.6: Geometric descriptions of parity check equations for first-order vectors of the RM(2, 4) code.

8.3.5 Other Constructions of Reed-Muller Codes


The |u|u + v| Construction   The |u|u + v| construction introduced in Exercise 3.29 may be used to construct Reed-Muller codes. In fact,

RM(r, m) = {[u|u + v] : u ∈ RM(r, m − 1), v ∈ RM(r − 1, m − 1)},

having generator matrix

G_RM(r, m) = [ G_RM(r, m − 1)    G_RM(r, m − 1)     ]
             [ 0                 G_RM(r − 1, m − 1) ].
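This recursive description is easy to verify numerically. The sketch below (hypothetical code, using the monomial-evaluation view of RM codes rather than any particular generator matrix) checks that RM(1, 3) = {[u|u + v] : u ∈ RM(1, 2), v ∈ RM(0, 2)}:

```python
from itertools import combinations, product

def rm_code(r, m):
    """All codewords of RM(r, m): evaluations of Boolean polynomials of
    degree <= r at the points of GF(2)^m (bit i of index t is x_i)."""
    monos = [s for d in range(r + 1) for s in combinations(range(m), d)]
    words = set()
    for coeffs in product([0, 1], repeat=len(monos)):
        words.add(tuple(
            sum(c * all((t >> i) & 1 for i in s)
                for c, s in zip(coeffs, monos)) % 2
            for t in range(2 ** m)))
    return words

left = rm_code(1, 3)
# the |u|u + v| construction from RM(1, 2) and RM(0, 2)
right = {u + tuple((a + b) % 2 for a, b in zip(u, v))
         for u in rm_code(1, 2) for v in rm_code(0, 2)}
```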

A Kronecker Construction   Let

G(2, 2) = [ 1 0 ]
          [ 1 1 ].

Define the m-fold Kronecker product of G(2, 2) as

G(2^m, 2^m) = G(2, 2) ⊗ G(2, 2) ⊗ · · · ⊗ G(2, 2)   (m operands),

which is a 2^m × 2^m matrix. Then the generator for the RM(r, m) code is obtained by selecting from G(2^m, 2^m) those rows with weight greater than or equal to 2^{m−r}.
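A minimal sketch of this construction, assuming G(2, 2) = [[1, 0], [1, 1]] as above. The row of the m-fold product indexed by (i1, . . . , im) has weight 2 raised to the number of ones in the index, so the selection rule keeps exactly k(r, m) rows:

```python
from itertools import product
from math import comb

G2 = [[1, 0], [1, 1]]

def kron(A, B):
    """Kronecker product of binary matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def rm_rows(r, m):
    """Rows of the m-fold Kronecker power of G2 with weight >= 2^(m-r)."""
    G = G2
    for _ in range(m - 1):
        G = kron(G, G2)
    return [row for row in G if sum(row) >= 2 ** (m - r)]

rows14 = rm_rows(1, 4)

# span the selected rows to get the full code of length 16
code14 = set()
for coeffs in product([0, 1], repeat=len(rows14)):
    w = [0] * 16
    for c, row in zip(coeffs, rows14):
        if c:
            w = [(a + b) % 2 for a, b in zip(w, row)]
    code14.add(tuple(w))
```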

8.4 Building Long Codes from Short Codes: The Squaring Construction

There are several ways of combining short codes together to obtain codes with different properties. Among these are the [u|u + v] construction (outlined in Exercise 3.29) and concatenation (described in Section 10.6). In this section we present another one, called the squaring construction [220, 204].
We begin by examining partitions of codes into cosets by subcodes. Let C0 = C be a binary linear (n, k0) block code with generator G and let C1 ⊂ C0 be a (n, k1) subcode of C0. That is, C1 is a subgroup of C0. Recall that a coset of C1 is a set of the form

c1 + C1 = {c1 + c : c ∈ C1},

where c1 ∈ C0 is a coset leader. We will take the nonzero coset leaders in C \ C1. From Section 2.2.5, recall that C0/C1 forms a factor group, partitioning C0 into 2^{k0−k1} disjoint subsets, each containing 2^{k1} codewords. Each of these subsets can be represented by a coset leader. The set of coset leaders is called the coset representative space. The coset representative for the coset C1 is always chosen to be 0. Denote this coset representative space by [C0/C1]. The code C1 and the set [C0/C1] share only the zero vector in common,

C1 ∩ [C0/C1] = {0}.

Without loss of generality, let G0 = G be expressed in a form such that k1 rows of G0 can be selected as a generator G1 for C1. The 2^{k0−k1} codewords generated by the remaining k0 − k1 rows of G0 \ G1 can be used to generate representatives for the cosets in C0/C1. Let G0\1 = G0 \ G1 (that is, the set difference, thinking of the rows as individual elements of the set). The 2^{k0−k1} codewords generated by G0\1 form a (n, k0 − k1) subcode of C0. Every codeword in C0 can be expressed as the sum of a codeword in C1 and a vector in [C0/C1]. We denote this as

C0 = C1 ⊕ [C0/C1] = {u + v : u ∈ C1, v ∈ [C0/C1]}.

The set-operand sum ⊕ is called the direct sum.

Example 8.13 While the squaring construction can be applied to any linear block code, we demonstrate it here for a Reed-Muller code. Consider the RM(1, 3) code with

G = G0 = [ 1 1 1 1 1 1 1 1 ]
         [ 0 0 1 1 0 0 1 1 ]
         [ 0 0 0 0 1 1 1 1 ]
         [ 0 1 0 1 0 1 0 1 ].

Let C1 be the (8, 3) code generated by the first k1 = 3 rows of the generator G0,

G1 = [ 1 1 1 1 1 1 1 1 ]
     [ 0 0 1 1 0 0 1 1 ]
     [ 0 0 0 0 1 1 1 1 ].

The cosets in C0/C1 are

[0, 0, 0, 0, 0, 0, 0, 0] + C1,   [0, 1, 0, 1, 0, 1, 0, 1] + C1.

The coset representatives are

[C0/C1] = {[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0, 1]},

generated by the rows of the matrix

G0\1 = G0 \ G1 = [0, 1, 0, 1, 0, 1, 0, 1]. □
One-level squaring is based on C1 and the partition C0/C1. Let |C0/C1|^2 denote the code C0/1 of length 2n obtained by the squaring construction, defined as

C0/1 = |C0/C1|^2 = {(a + x, b + x) : a, b ∈ C1 and x ∈ [C0/C1]}.   (8.13)

Since there are 2^{k0−k1} vectors in [C0/C1] and 2^{k1} choices each for a and b, there are 2^{k0−k1} 2^{k1} 2^{k1} = 2^{k0+k1} codewords in C0/1. The code C0/1 is thus a (2n, k0 + k1) code. Let

m = [m1,0, m1,1, . . . , m1,k1−1, m2,0, m2,1, . . . , m2,k1−1, m3,0, m3,1, . . . , m3,k0−k1−1]

be a message vector. A coded message of the form c = (a + x, b + x) from (8.13) can be obtained by

c = m [ G1    0    ]  = m Ĝ0/1,
      [ 0     G1   ]
      [ G0\1  G0\1 ]

so the matrix Ĝ0/1 is the generator for the code. The minimum weight for the code is

d0/1 = min(2d0, d1).
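The one-level squaring of Examples 8.13 and 8.14 can be checked directly. A sketch (hypothetical code, assuming the C0 = RM(1, 3) and C1 of Example 8.13, for which d0 = d1 = 4):

```python
from itertools import product

def span(rows):
    """All GF(2) linear combinations of the given rows."""
    words = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        words.add(tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2
                        for i in range(len(rows[0]))))
    return words

G0 = [(1, 1, 1, 1, 1, 1, 1, 1),
      (0, 0, 1, 1, 0, 0, 1, 1),
      (0, 0, 0, 0, 1, 1, 1, 1),
      (0, 1, 0, 1, 0, 1, 0, 1)]
G1 = G0[:3]                       # generator for C1
g01 = G0[3]                       # G0\1, the coset representative generator

C1 = span(G1)
reps = [(0,) * 8, g01]            # the coset representative space [C0/C1]

# the squaring construction (8.13): all (a + x, b + x)
C01 = {tuple((ai + xi) % 2 for ai, xi in zip(a, x)) +
       tuple((bi + xi) % 2 for bi, xi in zip(b, x))
       for a in C1 for b in C1 for x in reps}

# generator-matrix form: rows [G1 0], [0 G1], [G0\1 G0\1]
zero8 = (0,) * 8
Gsq = [r + zero8 for r in G1] + [zero8 + r for r in G1] + [g01 + g01]
```

The tests confirm the (2n, k0 + k1) = (16, 7) parameters and the minimum weight min(2d0, d1) = 4.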
We can express the generator matrix Ĝ0/1 in the following way. For two matrices M1 and M2 having the same number of columns, let M1 ⊕ M2 denote the stacking operation

M1 ⊕ M2 = [ M1 ]
          [ M2 ].

This is called the matrix direct sum. Let I2 be the 2 × 2 identity. Then

I2 ⊗ G1 = [ G1  0  ]
          [ 0   G1 ],

where ⊗ is the Kronecker product. We also have

[1 1] ⊗ G0\1 = [G0\1  G0\1].

Then we can write

Ĝ0/1 = I2 ⊗ G1 ⊕ [1 1] ⊗ G0\1.

Example 8.14 Continuing the previous example, the generator for the code |C0/C1|^2 is

Ĝ0/1 = [ 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ]
       [ 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 ]
       [ 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 ]
       [ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]
       [ 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 ]
       [ 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ]
       [ 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ].

We can further partition the cosets as follows. Let C2 be a (n, k2) subcode of C1 with generator G2, with 0 ≤ k2 ≤ k1. Then each of the 2^{k0−k1} cosets cl + C1 in the partition C0/C1 can be partitioned into 2^{k1−k2} cosets consisting of the following codewords:

cl + dp + C2 = {cl + dp + c : c ∈ C2}

for each l in 0, 1, 2, . . . , 2^{k0−k1} and each p in 1, 2, . . . , 2^{k1−k2}, where dp is a codeword in C1 but not in C2. This partition is denoted as C0/C1/C2. We can express the entire code as the direct sum

C0 = [C0/C1] ⊕ [C1/C2] ⊕ C2.

Let G1\2 denote the generator matrix for the coset representative space [C1/C2]. Then G1\2 = G1 \ G2.

Example 8.15 Let C2 be generated by the first two rows of G1, so

G2 = [ 1 1 1 1 1 1 1 1 ]
     [ 0 0 1 1 0 0 1 1 ].

There are two cosets in C1/C2,

[0, 0, 0, 0, 0, 0, 0, 0] + C2,   [0, 0, 0, 0, 1, 1, 1, 1] + C2.

The set of coset representatives [C1/C2] is generated by

G1\2 = G1 \ G2 = [0, 0, 0, 0, 1, 1, 1, 1]. □

Two-level squaring begins by forming the two one-level squaring construction codes C0/1 = |C0/C1|^2 and C1/2 = |C1/C2|^2, with generators Ĝ0/1 and Ĝ1/2, respectively, given by

Ĝ0/1 = [ G1    0    ]      Ĝ1/2 = [ G2    0    ]
       [ 0     G1   ]             [ 0     G2   ]       (8.14)
       [ G0\1  G0\1 ],            [ G1\2  G1\2 ].

Note that C1/2 is a subcode (subgroup) of C0/1. The coset representatives for C0/1/C1/2, which are denoted by [C0/1/C1/2], form a linear code. Let Ĝ0/1\1/2 denote the generator matrix for the coset representatives [C0/1/C1/2]. Then form the code C0/1/2 = |C0/C1/C2|^4 by

C0/1/2 = |C0/C1/C2|^4 = {(a + x, b + x) : a, b ∈ C1/2 and x ∈ [C0/1/C1/2]}.

That is, it is obtained by the squaring construction of C0/1 and C0/1/C1/2. The generator matrix for C0/1/2 is

Ĝ0/1/2 = [ Ĝ1/2      0        ]
         [ 0         Ĝ1/2     ]
         [ Ĝ0/1\1/2  Ĝ0/1\1/2 ].

This gives a (4n, k0 + 2k1 + k2) linear block code with minimum distance

d0/1/2 = min(4d0, 2d1, d2).

Writing Ĝ0/1 and Ĝ1/2 as in (8.14), rearranging rows and columns, and performing some simple row operations, we can write Ĝ0/1/2 as

Ĝ0/1/2 = I4 ⊗ G2 ⊕ G_RM(0, 2) ⊗ G0\1 ⊕ G_RM(1, 2) ⊗ G1\2.

Note that

G_RM(0, 2) = [1 1 1 1]   and   G_RM(1, 2) = [ 1 1 1 1 ]
                                            [ 0 0 1 1 ]
                                            [ 0 1 0 1 ]

are the generator matrices for the zeroth- and first-order Reed-Muller codes of length 4.
More generally, let C1, C2, . . . , Cm be a sequence of linear subcodes of C = C0 with generators Gi, minimum distances di, and dimensions k1, k2, . . . , km satisfying

C0 ⊇ C1 ⊇ · · · ⊇ Cm
k0 ≥ k1 ≥ · · · ≥ km ≥ 0.

Then form the chain of partitions

C0/C1, C0/C1/C2, . . . , C0/C1/· · ·/Cm,

such that the code can be expressed as the direct sum

C0 = [C0/C1] ⊕ [C1/C2] ⊕ · · · ⊕ [Cm−1/Cm] ⊕ Cm.

Assume that the generator matrix is represented in a way that G0 ⊇ G1 ⊇ · · · ⊇ Gm. Let Gi\i+1 denote the generator matrix for the coset representative space [Ci/Ci+1], with

rank(Gi\i+1) = rank(Gi) − rank(Gi+1)

and Gi\i+1 = Gi \ Gi+1. Then higher-level squaring is performed recursively. From the code

Ĉ0/1/···/m−1 = |C0/C1/· · ·/Cm−1|^{2^{m−1}}

and the code

Ĉ1/2/···/m = |C1/C2/· · ·/Cm|^{2^{m−1}},

form the code

Ĉ0/1/···/m = |C0/C1/· · ·/Cm|^{2^m}
           = {(a + x, b + x) : a, b ∈ Ĉ1/2/···/m, x ∈ [Ĉ0/1/···/m−1/Ĉ1/2/···/m]}.

The generator matrix can be written (after appropriate rearrangement of the rows) as

Ĝ0/1/···/m = I_{2^m} ⊗ Gm ⊕ ⊕_{r=0}^{m−1} G_RM(r, m) ⊗ Gr\r+1,

where G_RM(r, m) is the generator of the RM(r, m) code of length 2^m.

8.5 Quadratic Residue Codes

Quadratic residue codes are codes of length p, where p is a prime, with coefficients in GF(s), where s is a quadratic residue of p. They have rather good distance properties, being among the best codes known of their size and dimension.

We begin the construction with the following notation. Let p be prime. Denote the set of quadratic residues of p by Qp and the set of quadratic nonresidues by Np. Then the elements in GF(p) are partitioned into sets as

GF(p) = Qp ∪ Np ∪ {0}.

As we have seen, the multiplicative group of GF(p) is cyclic. This gives rise to the following observation:

Lemma 8.11 A primitive element of GF(p) must be a quadratic nonresidue. That is, it is in Np.

Proof Let γ be a primitive element of GF(p). We know γ^{p−1} = 1, and p − 1 is the smallest such power. Suppose γ is a quadratic residue. Then there is a number α (a square root) such that α^2 = γ. Taking powers of α, we have α^{2(p−1)} = 1. Furthermore, the powers α, α^2, α^3, . . . , α^{2(p−1)} can be shown to all be distinct. But this contradicts the order p − 1 of the multiplicative group of the field. □
So a primitive element γ ∈ GF(p) satisfies γ^e ∈ Qp if and only if e is even, and γ^e ∈ Np if and only if e is odd. The elements of Qp correspond to the first (p − 1)/2 consecutive powers of γ^2; that is, Qp is a cyclic group under multiplication modulo p, generated by γ^2.

The quadratic residue codes are designed as follows. Choose a field GF(s) as the field for the coefficients, where s is a quadratic residue modulo p. We choose an extension field GF(s^m) so that it has a primitive pth root of unity; from Lemma 5.16 we must have p | s^m − 1. (It can be shown [220, p. 519] that if s = 2, then p must be of the form p = 8k ± 1.)
Let β be a primitive pth root of unity in GF(s^m). Then the conjugates with respect to GF(s) are

β, β^s, β^{s^2}, β^{s^3}, . . . .

The cyclotomic coset is {1, s, s^2, s^3, . . .}. Since s ∈ Qp and Qp is a group under multiplication modulo p, Qp is closed under multiplication by s. So all of the elements in the cyclotomic coset are in Qp. Thus Qp is a cyclotomic coset or the union of cyclotomic cosets.

Example 8.16 Let p = 11, which has quadratic residues Qp = {1, 3, 4, 5, 9}. Let s = 3. A field having a primitive 11th root of unity is GF(3^5). Let β ∈ GF(3^5) be a primitive 11th root of unity. The conjugates of β are

β, β^3, β^9, β^{27} = β^5, β^{81} = β^4,

so the cyclotomic coset is

{1, 3, 9, 5, 4},

which is identical to Qp. □
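Both facts are easy to reproduce computationally. A small sketch covering the p = 11, s = 3 example (and the p = 23, s = 2 case used later for the Golay code):

```python
def quadratic_residues(p):
    """Nonzero squares modulo the prime p."""
    return sorted({(a * a) % p for a in range(1, p)})

def cyclotomic_coset(i, s, p):
    """The coset {i, i*s, i*s^2, ...} under multiplication by s mod p."""
    coset, x = set(), i
    while x not in coset:
        coset.add(x)
        x = (x * s) % p
    return sorted(coset)

Q11 = quadratic_residues(11)
Q23 = quadratic_residues(23)
```

For both examples the cyclotomic coset of 1 is exactly the set of quadratic residues, as the text asserts.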
Now let β be a primitive pth root of unity in GF(s^m). Because of the results above,

q(x) = ∏_{i ∈ Qp} (x − β^i)

is a polynomial with coefficients in GF(s). Furthermore,

n(x) = ∏_{i ∈ Np} (x − β^i)   (8.15)

also has coefficients in GF(s). We thus have the factorization

x^p − 1 = q(x) n(x) (x − 1).

Let R be the ring GF(s)[x]/(x^p − 1).

Definition 8.5 [220, p. 481] For a prime p, the quadratic residue codes of length p, denoted Q, Q̄, N, and N̄, are the cyclic codes (or ideals of R) with generator polynomials

q(x),   (x − 1)q(x),   n(x),   (x − 1)n(x),

respectively. The codes Q and N have dimension (p + 1)/2; the codes Q̄ and N̄ have dimension (p − 1)/2. The codes Q and N are sometimes called augmented QR codes, while Q̄ and N̄ are called expurgated QR codes. □
Example 8.17 Let p = 17 and s = 2. The field GF(2^8) has a primitive 17th root of unity, which is β = α^{15}. The quadratic residues modulo 17 are {1, 2, 4, 8, 9, 13, 15, 16}.

QR codes tend to have rather good distance properties. Some binary QR codes are the best codes known for their particular values of n and k. A bound on the distance is provided by the following.

Theorem 8.12 The minimum distance d of the codes Q or N satisfies d^2 ≥ p. If, additionally, p = 4l − 1 for some l, then d^2 − d + 1 ≥ p.
The proof relies on the following lemma.

Lemma 8.13 Let q̃(x) = q(x^n), where n ∈ Np (where the operations are in the ring R). Then the roots of q̃(x) are in the set {β^i, i ∈ Np}. That is, q̃(x) is a scalar multiple of n(x). Similarly, n(x^n) is a scalar multiple of q(x).

Proof Let μ be a generator of the nonzero elements of GF(p). From the discussion around Lemma 8.11, Qp is generated by even powers of μ and Np is generated by odd powers of μ.

Write q̃(x) = ∏_{i ∈ Qp} (x^n − β^i). Then for any m ∈ Np,

q̃(β^m) = ∏_{i ∈ Qp} (β^{mn} − β^i).

But since m ∈ Np and n ∈ Np, mn ∈ Qp (being the product of two odd powers of μ). So β^i = β^{mn} for some value of i ∈ Qp, so β^m is a root of q̃(x). □

The effect of the evaluation q(x^n) is to permute the coefficients of q(x).
Proof of Theorem 8.12 [220, p. 483]. Let a(x) be a codeword of minimum nonzero weight d in Q. Then by Lemma 8.13, the polynomial ã(x) = a(x^n) is a codeword in N. Since the coefficients of ã(x) are simply a permutation (and possible scaling) of those of a(x), ã(x) must be a codeword of minimum weight in N. The product a(x)ã(x) must be a multiple of the polynomial

1 + x + x^2 + · · · + x^{p−1}.

Thus a(x)ã(x) has weight p. Since a(x) has weight d, the maximum weight of a(x)ã(x) is d^2. We obtain the bound d^2 ≥ p.

If p = 4l − 1 then n = −1 is a quadratic nonresidue. In the product a(x)ã(x) = a(x)a(x^{−1}) there are d terms equal to 1, so the maximum weight of the product is d^2 − d + 1. □

Table 8.1 summarizes known distance properties for some augmented binary QR codes, with indications of best known codes. In some cases, d is expressed in terms of upper and lower bounds.

Table 8.1: Extended Quadratic Residue Codes Q [220, 373]

 n    k    d       n     k    d        n     k    d
 8    4    4*     74    37   14      138    69   14-22
18    9    6*     80    40   16*     152    76   20
24   12    8*     90    45   18*     168    84   16-24
32   16    8*     98    49   16      192    96   16-28
42   21   10*    104    52   20*     194    97   16-28
48   24   12*    114    57   12-16   200   100   16-32
72   36   12     128    64   20

* Indicates that the code is as good as the best known for this n and k.

While general decoding techniques have been developed for QR codes, we present only a decoding algorithm for a particular QR code, the Golay code presented in the next section. Decoding algorithms for other QR codes are discussed in [220], [287], [283], and [75].

8.6 Golay Codes

Of these codes it was said, "The Golay code is probably the most important of all codes, for both practical and theoretical reasons" [220, p. 64]. While the Golay codes have not supported the burden of applications this alleged importance would suggest, they do lie at the confluence of several routes of theoretical development and are worth studying.
Let us take p = 23 and form the binary QR code. The field GF(2^11) has a primitive 23rd root of unity. The quadratic residues are

Qp = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}

and the corresponding generators for Q and N are

q(x) = ∏_{i ∈ Qp} (x − β^i) = 1 + x + x^5 + x^6 + x^7 + x^9 + x^{11}

n(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^{10} + x^{11}.

This produces a (23, 12, 7) code, the Golay code G23. It is straightforward to verify that this code is a perfect code: the number of points out to a distance t = 3 is equal to

(23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3) = 2048 = 2^{23−12}.

Appending an overall parity check bit to each codeword of G23 produces the (24, 12, 8) extended Golay code G24, with a generator matrix G as in (8.16). [The entries of the 12 × 24 matrix in (8.16) are not recoverable in this reproduction; its structure is described below.]   (8.16)
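The stated generators and the perfect-code property can be confirmed with a few lines of GF(2) polynomial arithmetic. A sketch (polynomials are integer bitmasks, with bit i standing for x^i):

```python
from math import comb

def gf2_mul(a, b):
    """Multiply polynomials over GF(2), represented as integer bitmasks."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

q = sum(1 << i for i in (0, 1, 5, 6, 7, 9, 11))    # q(x) above
n = sum(1 << i for i in (0, 2, 4, 5, 6, 10, 11))   # n(x) above
product_poly = gf2_mul(gf2_mul(q, n), 0b11)        # q(x) n(x) (x + 1)

# perfect-code count: points within Hamming distance 3 of a codeword
ball = sum(comb(23, i) for i in range(4))
```

The product reproduces the factorization x^23 − 1 = q(x) n(x) (x − 1), and the sphere count equals 2^{23−12}, confirming that G23 is perfect.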
We note the following facts about G24.

In this representation, the 11 × 11 matrix A^T_{11} on the upper right is obtained from the transpose of the 12 × 12 Hadamard matrix of Paley type (8.5) by removing the first row and column of H12, then replacing −1 by 1 and 1 by 0. Since the rows of H12 differ in six places, the rows of A^T_{11} differ in six places. Because of the identity block, the sum of any two rows of G has weight 8.

If u and v are rows of G (not necessarily distinct), then wt(u · v) ≡ 0 (mod 2). So every row of G is orthogonal to every other row. Therefore, G is also the parity check matrix H of G24. Also, G24 is dual to itself, G24 = G24^⊥. Such a code is called self-dual.

Every codeword has even weight: if there were a codeword u of odd weight, then wt(u · u) = 1. Furthermore, since every row of the generator has weight divisible by 4, every codeword has weight divisible by 4.
The weight distributions for the (23, 12) and the (24, 12) codes are shown in Table 8.2.

Table 8.2: Weight Distributions for the G23 and G24 Codes [373]

G23:  i:   0    7    8    11    12    15    16    23
      Ai:  1  253  506  1288  1288   506   253     1

G24:  i:   0    8    12    16    24
      Ai:  1  759  2576   759     1

8.6.1 Decoding the Golay Code

We present here two decoding algorithms for the Golay code. The first decoder, due to [75], is algebraic and is similar in spirit to the decoding algorithms used for BCH and Reed-Solomon codes. The second decoder is arithmetic, being similar in spirit to the Hamming decoder presented in Section 1.9.1.

Algebraic Decoding of the G23 Golay Code

The algebraic decoder works similarly to those we have seen for BCH codes. An algebraic syndrome is first computed, which is used to construct an error locator polynomial. The roots of the error locator polynomial determine the error locations, which for a binary code is sufficient for the decoding. Having minimum distance 7, G23 is capable of correcting up to three errors.
Let β be a primitive 23rd root of unity in GF(2^11). Recall that the quadratic residues modulo 23 are Qp = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} and the generator polynomial is

g(x) = ∏_{i ∈ Qp} (x − β^i).

Thus β, β^3, and β^9 are all roots of g(x), and hence of any codeword c(x) = m(x)g(x). Let c(x) be the transmitted codeword, and let r(x) = c(x) + e(x) be the received polynomial. We define the syndromes as

si = r(β^i) = e(β^i).

If there are no errors, then si = 0 for i ∈ Qp. Thus, for example, if s1 = s3 = s9 = 0, no errors are detected. If there is a single error, e(x) = x^{j1}, then

s1 = β^{j1},   s3 = β^{3j1},   s9 = β^{9j1}.

When this condition is detected, single-error correction can proceed.

Suppose there are two or three errors, e(x) = x^{j1} + x^{j2} + x^{j3}. Let z1 = β^{j1}, z2 = β^{j2}, and z3 = β^{j3} be the error locators (where z3 = 0 in the case of only two errors). The syndromes in this case are

si = z1^i + z2^i + z3^i.
Define the error locator polynomial as

L(x) = (x − z1)(x − z2)(x − z3) = x^3 + σ1 x^2 + σ2 x + σ3,

where, by polynomial multiplication,

σ1 = z1 + z2 + z3
σ2 = z1 z2 + z1 z3 + z2 z3
σ3 = z1 z2 z3.

The problem now is to compute the coefficients of the error locator polynomial using the syndrome values. By substitution of the definitions, it can be shown that

s3 = s1^3 + σ2 s1 + σ3
s5 = s1^5 + σ2 s3 + σ3 s1^2
s7 = s1 s3^2 + σ2 s5 + σ3 s1^4
s9 = s1^9 + σ2 s7 + σ3 s3^2.
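These four relations are Newton's identities in characteristic 2, and they can be spot-checked numerically. The sketch below builds GF(2^11) as GF(2)[x]/(x^11 + x^2 + 1) — an assumed choice of modulus, not necessarily the one used in the book's software — and verifies the relations for random error locators:

```python
import random

MOD = (1 << 11) | (1 << 2) | 1          # x^11 + x^2 + 1

def gmul(a, b):
    """Multiplication in GF(2^11) = GF(2)[x] / (x^11 + x^2 + 1)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        b >>= 1
        a <<= 1
        if a >> 11:
            a ^= MOD
    return out

def gpow(a, k):
    out = 1
    for _ in range(k):
        out = gmul(out, a)
    return out

random.seed(2023)
ok = True
for _ in range(25):
    z1, z2, z3 = (random.randrange(1, 1 << 11) for _ in range(3))
    # power-sum syndromes s_i = z1^i + z2^i + z3^i
    s1 = z1 ^ z2 ^ z3
    s3 = gpow(z1, 3) ^ gpow(z2, 3) ^ gpow(z3, 3)
    s5 = gpow(z1, 5) ^ gpow(z2, 5) ^ gpow(z3, 5)
    s7 = gpow(z1, 7) ^ gpow(z2, 7) ^ gpow(z3, 7)
    s9 = gpow(z1, 9) ^ gpow(z2, 9) ^ gpow(z3, 9)
    # elementary symmetric functions (locator coefficients)
    sig2 = gmul(z1, z2) ^ gmul(z1, z3) ^ gmul(z2, z3)
    sig3 = gmul(gmul(z1, z2), z3)
    ok = ok and s3 == (gpow(s1, 3) ^ gmul(sig2, s1) ^ sig3)
    ok = ok and s5 == (gpow(s1, 5) ^ gmul(sig2, s3) ^ gmul(sig3, gpow(s1, 2)))
    ok = ok and s7 == (gmul(s1, gpow(s3, 2)) ^ gmul(sig2, s5) ^ gmul(sig3, gpow(s1, 4)))
    ok = ok and s9 == (gpow(s1, 9) ^ gmul(sig2, s7) ^ gmul(sig3, gpow(s3, 2)))
```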

By application of these equivalences (see golaysimp.m) it can be shown that

(8.17)

The quantity D thus has a cube root in GF(2^11). From (8.17) we obtain σ2 = s1^2 + D^{1/3}; similarly for σ3. Combining these results, we obtain the following equations:

σ1 = s1.

An example of the decoder is shown in testGolay.cc.

Arithmetic Decoding of the G24 Code

In this section we present an arithmetic decoder, which uses the weight structure of the syndrome to determine the error patterns.

Besides the generator matrix representation of (8.16), it is convenient to employ a systematic generator. The generator matrix can be written in the form

G = [I12  B].

It may be observed that B is orthogonal,

B^T B = I.

Let r = c + e and let e = (x, y), where x and y are each vectors of length 12. Since the code is capable of correcting up to three errors, there are only a few possible weight distributions of x and y to consider:

wt(x) ≤ 3   wt(y) = 0
wt(x) ≤ 2   wt(y) = 1
wt(x) ≤ 1   wt(y) = 2
wt(x) = 0   wt(y) = 3.

Since the code is self-dual, the generator matrix is also the parity check matrix. We can compute a syndrome by

s = G r^T = G e^T = G [x, y]^T = x^T + B y^T.

If y = 0, then s = x^T. If s has weight ≤ 3, we conclude that y = 0. The error pattern is e = (x, 0) = (s^T, 0).

Suppose now that wt(y) = 1, where the error is in the ith coordinate of y, and that wt(x) ≤ 2. The syndrome in this case is

s = x^T + bi,

where bi is the ith column of B. The position i is found by identifying the position such that wt(s + bi) = wt(x) ≤ 2. Having thus identified i, the error pattern is e = ((s + bi)^T, yi). Here, the notation yi is the vector of length 12 having a 1 in position i and zeros elsewhere.

If wt(x) = 0 and wt(y) = 2 or 3, then s = bi + bj or s = bi + bj + bk. Since B is an orthogonal matrix,

B^T s = B^T (B y^T) = y^T.

The error pattern is e = (0, (B^T s)^T).

Finally, if wt(x) = 1 and wt(y) = 2, let the nonzero coordinate of x be at index i. Then

B^T s = B^T (x^T + B y^T) = B^T x^T + y^T = ri^T + y^T,

where ri is the ith row of B. The error pattern is e = (xi, (B^T s)^T + ri), with xi the length-12 vector having a 1 in position i.
Combining all these cases together, we obtain the following decoding algorithm.
Algorithm 8.2 Arithmetic Decoding of the Golay G24 Code

(This presentation is due to Wicker [373].)

 1  Input: r = c + e, the received vector
 2  Output: c, the decoded vector
 3  Compute s = G r^T (compute the syndrome)
 4  if wt(s) ≤ 3
 5      e = (s^T, 0)
 6  else if wt(s + bi) ≤ 2 for some column vector bi
 7      e = ((s + bi)^T, yi)
 8  else
 9      Compute B^T s
10      if wt(B^T s) ≤ 3
11          e = (0, (B^T s)^T)
12      else if wt(B^T s + ri^T) ≤ 2 for some row vector ri
13          e = (xi, (B^T s)^T + ri)
14      else
15          Too many errors: declare an uncorrectable error pattern and stop.
16      end
17  end
18  c = r + e

A Matlab implementation that may be used to generate examples is in golayarith.m.
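Algorithm 8.2 translates directly into software. The sketch below is a Python rendering (not the golayarith.m of the text). Since the entries of (8.16) are not reproduced here, B is built in a standard bordered-circulant form from the quadratic residues modulo 11 — an assumption about the particular matrix; the self-checks confirm that this B is orthogonal and that [I12 | B] generates a distance-8 code:

```python
from itertools import combinations

Q11 = {1, 3, 4, 5, 9}                               # quadratic residues mod 11

# 12 x 12 B as a bordered circulant over Q11 ∪ {0}; B B^T = I (mod 2)
B = [[0] + [1] * 11] + [
    [1] + [1 if (j - i) % 11 in Q11 | {0} else 0 for j in range(11)]
    for i in range(11)]

def wt(v): return sum(v)
def add(u, v): return [(a + b) % 2 for a, b in zip(u, v)]
def col(i): return [B[j][i] for j in range(12)]
def unit(i): return [1 if j == i else 0 for j in range(12)]

def decode(r):
    """Algorithm 8.2: arithmetic decoding of G24 = [I12 | B]."""
    s = r[:12]
    for k in range(12):                             # s = x^T + B y^T
        if r[12 + k]:
            s = add(s, col(k))
    e = None
    if wt(s) <= 3:
        e = s + [0] * 12                            # line 5
    else:
        for i in range(12):                         # line 6
            if wt(add(s, col(i))) <= 2:
                e = add(s, col(i)) + unit(i)
                break
    if e is None:
        t = [sum(B[j][k] * s[j] for j in range(12)) % 2 for k in range(12)]
        if wt(t) <= 3:                              # line 10
            e = [0] * 12 + t
        else:
            for i in range(12):                     # line 12
                if wt(add(t, B[i])) <= 2:
                    e = unit(i) + add(t, B[i])
                    break
    return add(r, e) if e is not None else None     # None: uncorrectable

# self-checks: orthogonality, minimum distance 8, 3-error correction
orth = all(sum(B[i][k] * B[j][k] for k in range(12)) % 2 == (i == j)
           for i in range(12) for j in range(12))
rows_g = [unit(i) + B[i] for i in range(12)]
words = []
for msg in range(1, 1 << 12):
    w = [0] * 24
    for i in range(12):
        if (msg >> i) & 1:
            w = add(w, rows_g[i])
    words.append(w)
min_weight = min(wt(w) for w in words)

c = words[11]                                       # an arbitrary codeword
decode_ok = all(
    decode([(b + (1 if k in err else 0)) % 2 for k, b in enumerate(c)]) == c
    for nerr in range(4) for err in combinations(range(24), nerr))
```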


13 gates. The eight X-OR trees for generating the eight syndrome bits are identical. These provide uniform and minimal delay in the error-correction process.

4.3 REED-MULLER CODES


Reed-Muller (RM) codes form a class of multiple-error-correcting codes. These codes were discovered by Muller in 1954 [9], but the first decoding algorithm for these codes was devised by Reed, also in 1954 [10]. They are simple in construction and rich in structural properties. They can be decoded easily in several ways using either hard- or soft-decision algorithms.

For any integers m and r with 0 ≤ r ≤ m, there exists a binary rth-order RM code, denoted by RM(r, m), with the following parameters:

Code length: n = 2^m,
Dimension: k(r, m) = 1 + (m choose 1) + (m choose 2) + · · · + (m choose r),
Minimum distance: dmin = 2^{m−r},

where (m choose i) = m!/(i!(m − i)!) is the binomial coefficient. For example, let m = 5 and r = 2. Then n = 32, k(2, 5) = 16, and dmin = 8. There exists a (32, 16) RM code with a minimum distance of 8.
For 1 ≤ i ≤ m, let vi be a 2^m-tuple over GF(2) of the following form:

vi = (0 · · · 0, 1 · · · 1, 0 · · · 0, · · · , 1 · · · 1),   (4.4)

which consists of 2^{m−i+1} alternating all-zero and all-one 2^{i−1}-tuples. For m = 4, we have the following four 16-tuples:

v4 = (0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1),
v3 = (0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1),
v2 = (0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1),
v1 = (0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1).
Let a = (a0, a1, · · · , a_{n−1}) and b = (b0, b1, · · · , b_{n−1}) be two binary n-tuples. We define the following logic (Boolean) product of a and b:

a · b ≜ (a0 · b0, a1 · b1, · · · , a_{n−1} · b_{n−1}),

where "·" denotes the logic product (or AND operation); i.e., ai · bi = 1 if and only if ai = bi = 1. For m = 4,

v3 · v2 = (0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1).

For simplicity, we use ab for a · b.
106 Chapter 4 Important Linear Block Codes

Let v0 denote the all-one 2^m-tuple, v0 = (1, 1, · · · , 1). For 1 ≤ i1 < i2 < · · · < il ≤ m, the product vector vi1 vi2 · · · vil is said to have degree l. Because the weights of v1, v2, · · · , vm are even and powers of 2, the weight of the product vi1 vi2 · · · vil is also even and a power of 2; in fact, it is 2^{m−l}.

The rth-order RM code, RM(r, m), of length 2^m is generated (or spanned) by the following set of independent vectors:

G_RM(r, m) = {v0, v1, · · · , vm, v1v2, v1v3, · · · , vm−1vm, · · · , up to products of degree r}.   (4.5)

There are

k(r, m) = 1 + (m choose 1) + (m choose 2) + · · · + (m choose r)

vectors in G_RM(r, m). Therefore, the dimension of the code is k(r, m).

If the vectors in G_RM(r, m) are arranged as rows of a matrix, then the matrix is a generator matrix of the RM(r, m) code. Hereafter, we use G_RM(r, m) as the generator matrix. For 0 ≤ l ≤ r, there are exactly (m choose l) rows in G_RM(r, m) of weight 2^{m−l}. Because all the vectors in G_RM(r, m) are of even weight, all the codewords in the RM(r, m) code have even weight. From the code construction we readily see that the RM(r − 1, m) code is a proper subcode of the RM(r, m) code. Hence, we have the following inclusion chain:

RM(0, m) ⊂ RM(1, m) ⊂ · · · ⊂ RM(r, m).   (4.6)

Furthermore, RM codes have the following structural property: the (m − r − 1)th-order RM code, RM(m − r − 1, m), is the dual code of the rth-order RM code, RM(r, m) (see Problem 4.9). The zeroth-order RM code is a repetition code, and the (m − 1)th-order RM code is a single-parity-check code.

EXAMPLE 4.2

Let m = 4 and r = 2. The second-order RM code of length n = 16 is generated by the following 11 vectors:

v0    1111111111111111
v4    0000000011111111
v3    0000111100001111
v2    0011001100110011
v1    0101010101010101
v3v4  0000000000001111
v2v4  0000000000110011
v1v4  0000000001010101
v2v3  0000001100000011
v1v3  0000010100000101
v1v2  0001000100010001

This is a (16, 11) code with a minimum distance of 4.
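The generator of Example 4.2 can be produced programmatically. A sketch (with vi[t] taken to be bit i − 1 of the index t, which reproduces (4.4)):

```python
from itertools import combinations, product

m = 4
n = 2 ** m
# the basis tuples v1..v4 of (4.4)
v = {i: [(t >> (i - 1)) & 1 for t in range(n)] for i in range(1, m + 1)}

def prod(idx):
    """Logic (AND) product of the v_i for i in idx; empty product is v0."""
    return [int(all(v[i][t] for i in idx)) for t in range(n)]

# rows of G_RM(2, 4): v0, the degree-1 vectors, and the degree-2 products
rows = [prod(s) for d in range(3) for s in combinations(range(1, m + 1), d)]

# span the rows to enumerate the whole code
code = set()
for coeffs in product([0, 1], repeat=len(rows)):
    w = [0] * n
    for c, row in zip(coeffs, rows):
        if c:
            w = [(a + b) % 2 for a, b in zip(w, row)]
    code.add(tuple(w))
```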

With the preceding construction, the generator matrix G_RM(r, m) of the RM(r, m) code is not in systematic form. It can be put in systematic form with row and column permutations. RM codes in this form have many interesting and useful structures that simplify the decoding; this will be discussed in a later section.

The Reed decoding algorithm for RM codes is best explained by an example. Consider the second-order RM code RM(2, 4) of length 16 given in Example 4.2. Suppose the message

a = (a0, a4, a3, a2, a1, a34, a24, a14, a23, a13, a12)

is encoded into the codeword

b = (b0, b1, · · · , b15) = a0v0 + a4v4 + a3v3 + a2v2 + a1v1 + a34v3v4 + a24v2v4 + a14v1v4 + a23v2v3 + a13v1v3 + a12v1v2.

Note that the sum of the first four components of each generator vector is zero, except for the vector v1v2. The same is true for the other three groups of four consecutive components. As a result, we have the following four sums that relate the information bit a12 to the code bits:

a12 = b0 + b1 + b2 + b3,
a12 = b4 + b5 + b6 + b7,
a12 = b8 + b9 + b10 + b11,
a12 = b12 + b13 + b14 + b15.

These four sums give four independent determinations of the information bit a12 from the code bits. If the codeword (b0, b1, · · · , b15) is transmitted and there is a single transmission error in the received vector, the error can affect only one determination of a12. As a result, the other three determinations of a12 will give the correct value of a12. This is the basis for decoding RM codes.
Let r = (r0, r1, · · · , r15) be the received vector. In decoding a12, we form the following sums:

A1 = r0 + r1 + r2 + r3,
A2 = r4 + r5 + r6 + r7,
A3 = r8 + r9 + r10 + r11,
A4 = r12 + r13 + r14 + r15,

by replacing the code bits with the corresponding received bits in the four determinations of a12. These sums are called check-sums, which are the estimates of a12. Then, a12 is decoded based on the following majority-logic decision rule: a12 is taken to be the value assumed by the majority in {A1, · · · , A4}. If there is a tie, a random choice of the value of a12 is made. It is clear that if there is only one error in the received vector, a12 is decoded correctly.

Similar independent determinations of information bits a13, a23, a14, a24, and a34 can be made from the code bits. For example, the four independent determinations of a13 are:

a13 = b0 + b1 + b4 + b5,
a13 = b2 + b3 + b6 + b7,
a13 = b8 + b9 + b12 + b13,
a13 = b10 + b11 + b14 + b15.

At the decoder, we decode a13 by forming the following four check-sums from the received bits, using the preceding four independent determinations of a13:

A1 = r0 + r1 + r4 + r5,
A2 = r2 + r3 + r6 + r7,
A3 = r8 + r9 + r12 + r13,
A4 = r10 + r11 + r14 + r15.

From these four check-sums we use the majority-logic decision rule to decode a13. If there is a single transmission error in the received sequence, the information bits a12, a13, a23, a14, a24, and a34 will be decoded correctly.
After the decoding of a12, a13, a23, a14, a24, and a34, the vector

a34v3v4 + a24v2v4 + a14v1v4 + a23v2v3 + a13v1v3 + a12v1v2

is subtracted from r. The result is a modified received vector:

r^(1) = (r0^(1), r1^(1), · · · , r15^(1)).

In the absence of errors, r^(1) is simply the following codeword:

a0v0 + a4v4 + a3v3 + a2v2 + a1v1 = (b0^(1), b1^(1), · · · , b15^(1)).

We note that, starting from the first component, the sums of every two consecutive components in v0, v4, v3, and v2 are zero; however, the sum of every two consecutive components of v1 is equal to 1. As a consequence, we can form the following eight independent determinations of the information bit a1 from the code bits b0^(1) through b15^(1):

a1 = b0^(1) + b1^(1),    a1 = b8^(1) + b9^(1),
a1 = b2^(1) + b3^(1),    a1 = b10^(1) + b11^(1),
a1 = b4^(1) + b5^(1),    a1 = b12^(1) + b13^(1),
a1 = b6^(1) + b7^(1),    a1 = b14^(1) + b15^(1).

Similar independent determinations of a2, a3, and a4 can be formed. In decoding a1, we form the following check-sums from the bits of the modified received vector r^(1) and the preceding eight independent determinations of a1:

A1^(1) = r0^(1) + r1^(1),    A5^(1) = r8^(1) + r9^(1),
A2^(1) = r2^(1) + r3^(1),    A6^(1) = r10^(1) + r11^(1),
A3^(1) = r4^(1) + r5^(1),    A7^(1) = r12^(1) + r13^(1),
A4^(1) = r6^(1) + r7^(1),    A8^(1) = r14^(1) + r15^(1).

From these check-sums we decode a1 using the majority-logic decision rule. Similarly, we can decode the information bits a2, a3, and a4.
After the decoding of a1, a2, a3, and a4, we remove the effect of a1, a2, a3, and a4 from r^(1) and form the following modified received vector:

r^(2) = (r0^(2), r1^(2), · · · , r15^(2))
      = r^(1) − a4v4 − a3v3 − a2v2 − a1v1.

In the absence of errors, r^(2) is the following codeword:

a0v0 = (a0, a0, · · · , a0).

This result gives 16 independent determinations of a0. In decoding a0, we set a0 to the value taken by the majority of the bits in r^(2). This step completes the entire decoding.
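The three-step procedure just described can be written out in full. The sketch below (hypothetical code, not from the text) decodes RM(2, 4) by forming, for each monomial, the check-sums over the positions obtained by fixing the complementary coordinates, exactly as in the worked example; ties are broken as 0 rather than randomly:

```python
from itertools import combinations, product

M, N = 4, 16

def bit(t, i):
    """Value of v_i at position t (bit i-1 of the index)."""
    return (t >> (i - 1)) & 1

def encode(a):
    """a maps each monomial (a frozenset of variable indices) to a bit."""
    return [sum(coef for mono, coef in a.items()
                if all(bit(t, i) for i in mono)) % 2 for t in range(N)]

def majority(sums):
    return 1 if 2 * sum(sums) > len(sums) else 0   # ties decoded as 0

def reed_decode(r):
    """Three-step Reed (majority-logic) decoding of RM(2, 4)."""
    r = list(r)
    a = {}
    for level in (2, 1, 0):
        for mono in combinations(range(1, M + 1), level):
            others = [i for i in range(1, M + 1) if i not in mono]
            sums = []
            for fix in product([0, 1], repeat=len(others)):
                pos = [t for t in range(N)
                       if all(bit(t, i) == f for i, f in zip(others, fix))]
                sums.append(sum(r[t] for t in pos) % 2)
            a[frozenset(mono)] = majority(sums)
        # remove the decoded level from r before the next step
        lvl = encode({m_: c for m_, c in a.items() if len(m_) == level})
        r = [(x + y) % 2 for x, y in zip(r, lvl)]
    return a

# self-check: any single error is corrected
monos = [frozenset(s) for d in range(3)
         for s in combinations(range(1, M + 1), d)]
truth = {mono: (len(mono) + min(mono, default=0)) % 2 for mono in monos}
word = encode(truth)
single_error_ok = all(
    reed_decode([(b + (1 if t == j else 0)) % 2 for j, b in enumerate(word)])
    == truth
    for t in range(N))
```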
The demonstrated decoding is referred to as majority-logic decoding. Because it consists of three steps (or levels) of decoding, it is called three-step majority-logic decoding. It can easily be implemented using majority-logic elements.

If there is only one error in the received vector r, the information bits a12, a13, a14, a23, a24, and a34 will be correctly decoded. Then, the modified received vector r^(1) will still contain a single error at the same location. This single error affects only one of the eight check-sums for ai, with 1 ≤ i ≤ 4. The other seven check-sums give the correct value of ai. Therefore, the information bits a1, a2, a3, and a4 will be correctly decoded. As a result, the next modified received vector r^(2) will contain only one error (still at the same location), and the information bit a0 will be correctly decoded.

If there are two transmission errors in r, these two errors may affect two check-sums for an information bit aij. In this case, there is no majority in the four check-sums; two check-sums take 0, and two other check-sums take 1. A choice between these two values as the decoded value of aij may result in an incorrect decoding of aij. This incorrect decoding affects the subsequent levels of decoding and results in error propagation. Consequently, the decodings of a1, a2, a3, a4, and a0 are very likely to be incorrect. Because the code guarantees correcting any single error but not two errors, its minimum distance is at least 3 but less than 5. Since all the codewords have even weight, the minimum distance of the code must be 4.

We have used an example to introduce the concepts of majority-logic decoding and the multiple-step decoding process of the Reed algorithm. Now we are ready to present the general algorithm for decoding RM codes. The major part of the decoding is to form check-sums at each decoding step.
110 Chapter 4 Important Linear Block Codes

Consider the rth-order RM code, RM(r, m). Let

a = (a_0, a_1, ..., a_m, a_{1,2}, ..., a_{m-1,m}, ..., a_{1,2,...,r}, ..., a_{m-r+1,m-r+2,...,m})

be the message to be encoded. The corresponding codeword is

b = (b_0, b_1, ..., b_{n-1})
  = a_0 v_0 + Sum_{1<=i_1<=m} a_{i_1} v_{i_1} + Sum_{1<=i_1<i_2<=m} a_{i_1 i_2} v_{i_1} v_{i_2}
    + ... + Sum_{1<=i_1<...<i_r<=m} a_{i_1 i_2 ... i_r} v_{i_1} v_{i_2} ... v_{i_r}.    (4.7)
Let r = (r_0, r_1, ..., r_{n-1}) be the received vector. Decoding of the RM(r, m) code
consists of r + 1 steps. At the first step of decoding, the information bits a_{i_1 i_2 ... i_r}
corresponding to the product vectors v_{i_1} v_{i_2} ... v_{i_r} of degree r in (4.7) are decoded
based on their check-sums formed from the received bits in r. Based on these
decoded information bits, the received vector r = (r_0, r_1, ..., r_{n-1}) is modified.
Let r^(1) = (r_0^(1), r_1^(1), ..., r_{n-1}^(1)) denote the modified received vector. At the second
step of decoding, the bits in the modified received vector r^(1) are used to form
the check-sums for decoding the information bits a_{i_1 i_2 ... i_{r-1}} that correspond to the
product vectors v_{i_1} v_{i_2} ... v_{i_{r-1}} of degree r - 1 in (4.7). Then, the decoded information
bits at the second step of decoding are used to modify r^(1). The modification results
in the next modified received vector r^(2) = (r_0^(2), r_1^(2), ..., r_{n-1}^(2)) for the third step of
decoding. This step-by-step decoding process continues until the last information
bit a_0 that corresponds to the all-one vector v_0 in (4.7) is decoded. This decoding
process is called (r + 1)-step majority-logic decoding [2, 11].
Now, we need to know how to form check-sums for decoding at each step. For
1 <= i_1 < i_2 < ... < i_{r-l} <= m with 0 <= l < r, we form the following index set:

S := { c_{i_1-1} 2^{i_1-1} + c_{i_2-1} 2^{i_2-1} + ... + c_{i_{r-l}-1} 2^{i_{r-l}-1} : c_{i_j-1} in {0, 1} for 1 <= j <= r - l },    (4.8)

which is a set of 2^{r-l} nonnegative integers less than 2^m in binary form. The exponent
set {i_1 - 1, i_2 - 1, ..., i_{r-l} - 1} is a subset of {0, 1, ..., m - 1}. Let E be the set of
integers in {0, 1, ..., m - 1} but not in {i_1 - 1, i_2 - 1, ..., i_{r-l} - 1}; that is,

E := {0, 1, ..., m - 1} \ {i_1 - 1, i_2 - 1, ..., i_{r-l} - 1}    (4.9)
   = {j_1, j_2, ..., j_{m-r+l}},

where 0 <= j_1 < j_2 < ... < j_{m-r+l} <= m - 1. We form the following set of integers:

S^c := { d_{j_1} 2^{j_1} + d_{j_2} 2^{j_2} + ... + d_{j_{m-r+l}} 2^{j_{m-r+l}} : d_{j_t} in {0, 1} for 1 <= t <= m - r + l }.    (4.10)

Note that there are 2^{m-r+l} nonnegative integers in S^c, and

S n S^c = {0}.

For l = r, S = {0}, and S^c = {0, 1, ..., 2^m - 1}.
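The sets S, E, and S^c of (4.8)-(4.10) are mechanical to compute. A sketch (the helper name `index_sets` is ours, not from the text; the exponent set {i_1 - 1, ..., i_{r-l} - 1} is passed as a list of 0-based bit positions):

```python
from itertools import product

def index_sets(exponents, m):
    """S of (4.8) and S^c of (4.10) for the given exponent set
    (0-based bit positions i_1 - 1, ..., i_{r-l} - 1)."""
    S = sorted(sum(c * 2 ** e for c, e in zip(bits, exponents))
               for bits in product((0, 1), repeat=len(exponents)))
    E = [e for e in range(m) if e not in exponents]        # (4.9)
    Sc = sorted(sum(d * 2 ** e for d, e in zip(bits, E))
                for bits in product((0, 1), repeat=len(E)))
    return S, Sc

# Check-sums for a12 in RM(2, 4): i1 = 1, i2 = 2, exponent set {0, 1}
S, Sc = index_sets([0, 1], m=4)
assert S == [0, 1, 2, 3] and Sc == [0, 4, 8, 12]
```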

For 0 <= l <= r, suppose we have decoded the information bits a_{i_1 i_2 ... i_{r-l+1}}.
We form the modified received vector:

r^(l) = r^(l-1) - Sum_{1<=i_1<...<i_{r-l+1}<=m} a_{i_1 i_2 ... i_{r-l+1}} v_{i_1} v_{i_2} ... v_{i_{r-l+1}},    (4.11)

where r^(l-1) is the modified received vector for the lth decoding step.
For each integer q in S^c, we form the following set of indices:

B := q + S
   = {q + s : s in S}.    (4.12)

Then, the check-sums for decoding the information bits a_{i_1 i_2 ... i_{r-l}} are

A_q^(l) = Sum_{t in B} r_t^(l)    (4.13)

for q in S^c. Because there are 2^{m-r+l} integers q in S^c, we can form 2^{m-r+l} check-sums
for each information bit a_{i_1 i_2 ... i_{r-l}}.


At the first step of decoding, l = 0, and 2^{m-r} check-sums can be formed for
decoding each information bit a_{i_1 i_2 ... i_r}. If there are 2^{m-r-1} - 1 or fewer errors in
the received vector r, then more than half of the check-sums assume the true value
of a_{i_1 i_2 ... i_r}, and hence the decoding of a_{i_1 i_2 ... i_r} is correct;
however, if r contains 2^{m-r-1} or more errors, there is no guarantee
that the majority of the check-sums will assume the value of a_{i_1 i_2 ... i_r}. In this case,
the majority-logic decision may result in an incorrect decoding. For example, consider
an error pattern with 2^{m-r-1} errors such that each error appears in a different check-
sum. In this case, half of the 2^{m-r} check-sums assume the value 0, and the other half
assume the value 1. There is no clear majority. A random choice between the two values
may result in an incorrect decoding of a_{i_1 i_2 ... i_r}. Next, consider another error pattern
of 2^{m-r-1} + 1 errors such that each error appears in a different check-sum. In this
case, 2^{m-r-1} + 1 (a majority) of the check-sums assume the opposite value of a_{i_1 i_2 ... i_r},
and the majority-logic decision based on the check-sums results in an incorrect decoding
of a_{i_1 i_2 ... i_r}. Note that the number of check-sums is doubled at each subsequent
decoding step. If there are 2^{m-r-1} - 1 or fewer errors in the received vector r,
then the majority-logic decision based on check-sums results in correct decoding at
each step. This implies that the minimum distance of the RM(r, m) code is at least
2 * (2^{m-r-1} - 1) + 1 = 2^{m-r} - 1. Because the codewords in RM(r, m) have even
weights, the minimum distance is at least 2^{m-r}; however, each product vector of
degree r in the generator matrix G_RM(r, m) has weight 2^{m-r} and is a codeword.
Therefore, the minimum distance of the code is exactly 2^{m-r}.

The Reed decoding algorithm for RM codes is simply a multistage decoding
algorithm in which the decoded information at each stage of decoding is passed
down for the next stage of decoding.

Consider the second-order RM code of length 16 with m = 4 given in Example 4.2.
Suppose we want to construct the check-sums for the information bit a_12. Because

i_1 = 1 and i_2 = 2, we obtain the following sets:

S = {c_0 + c_1 * 2 : c_0, c_1 in {0, 1}}
  = {0, 1, 2, 3},
E = {0, 1, 2, 3} \ {0, 1}
  = {2, 3},
S^c = {d_2 2^2 + d_3 2^3 : d_2, d_3 in {0, 1}}
    = {0, 4, 8, 12}.

Then, the index sets for forming the check-sums for a_12 are

B_1 = 0 + S = {0, 1, 2, 3},
B_2 = 4 + S = {4, 5, 6, 7},
B_3 = 8 + S = {8, 9, 10, 11},
B_4 = 12 + S = {12, 13, 14, 15}.

It follows from (4.13) with l = 0 that the four check-sums for a_12 are

A_1^(0) = r_0 + r_1 + r_2 + r_3,
A_2^(0) = r_4 + r_5 + r_6 + r_7,
A_3^(0) = r_8 + r_9 + r_10 + r_11,
A_4^(0) = r_12 + r_13 + r_14 + r_15.

Now, consider the check-sums for a_13. Because i_1 = 1 and i_2 = 3, we obtain the
following sets:

S = {c_0 + c_2 2^2 : c_0, c_2 in {0, 1}}
  = {0, 1, 4, 5},
E = {0, 1, 2, 3} \ {0, 2}
  = {1, 3},
S^c = {d_1 * 2 + d_3 2^3 : d_1, d_3 in {0, 1}}
    = {0, 2, 8, 10}.

The index sets for constructing the check-sums for a_13 are

B_1 = 0 + S = {0, 1, 4, 5},
B_2 = 2 + S = {2, 3, 6, 7},
B_3 = 8 + S = {8, 9, 12, 13},
B_4 = 10 + S = {10, 11, 14, 15}.

From these index sets and (4.13), we obtain the following check-sums for a_13:

A_1^(0) = r_0 + r_1 + r_4 + r_5,
A_2^(0) = r_2 + r_3 + r_6 + r_7,
A_3^(0) = r_8 + r_9 + r_12 + r_13,
A_4^(0) = r_10 + r_11 + r_14 + r_15.
Using the same procedure, we can form the check-sums for the information bits
a_14, a_23, a_24, and a_34.
To find the check-sums for a_1, a_2, a_3, and a_4, we first form the modified received
vector r^(1) based on (4.11):

r^(1) = r - a_12 v_1 v_2 - a_13 v_1 v_3 - a_14 v_1 v_4 - a_23 v_2 v_3 - a_24 v_2 v_4 - a_34 v_3 v_4.

Suppose we want to form the check-sums for a_3. Because i_1 = 3, we obtain the
following sets:

S = {c_2 2^2 : c_2 in {0, 1}} = {0, 4},
E = {0, 1, 2, 3} \ {2} = {0, 1, 3},
S^c = {d_0 + d_1 * 2 + d_3 2^3 : d_0, d_1, d_3 in {0, 1}}
    = {0, 1, 2, 3, 8, 9, 10, 11}.

Then, the index sets for forming the check-sums of a_3 are

B_1 = {0, 4},    B_5 = {8, 12},
B_2 = {1, 5},    B_6 = {9, 13},
B_3 = {2, 6},    B_7 = {10, 14},
B_4 = {3, 7},    B_8 = {11, 15}.

It follows from (4.13) with l = 1 that we obtain the following eight check-sums:

A_1^(1) = r_0^(1) + r_4^(1),    A_5^(1) = r_8^(1) + r_12^(1),
A_2^(1) = r_1^(1) + r_5^(1),    A_6^(1) = r_9^(1) + r_13^(1),
A_3^(1) = r_2^(1) + r_6^(1),    A_7^(1) = r_10^(1) + r_14^(1),
A_4^(1) = r_3^(1) + r_7^(1),    A_8^(1) = r_11^(1) + r_15^(1).

Similarly, we can form the check-sums for a_1, a_2, and a_4.
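To make the multistep procedure concrete, here is an illustrative reimplementation of the Reed algorithm for the RM(2, 4) example of this section. The function names are ours, not from the text, and the vector convention v_i[l] = bit i - 1 of l is the one used throughout the example:

```python
from itertools import combinations, product

M, N = 4, 16

def v(i):
    """v0 is the all-one vector; for i >= 1, component l is bit i-1 of l."""
    return [1] * N if i == 0 else [(l >> (i - 1)) & 1 for l in range(N)]

def encode(a0, a, aij):
    """RM(2,4) codeword for message bits a0, {i: a_i}, {(i,j): a_ij}."""
    b = [a0] * N
    for i, coef in a.items():
        b = [(x + coef * y) % 2 for x, y in zip(b, v(i))]
    for (i, j), coef in aij.items():
        b = [(x + coef * p * q) % 2 for x, p, q in zip(b, v(i), v(j))]
    return b

def majority(sums):
    """Majority value of a list of 0/1 check-sums."""
    return 1 if 2 * sum(sums) > len(sums) else 0

def checksums(r, exponents):
    """Check-sums A_q = sum_{t in q+S} r_t for the given exponent set."""
    S = [sum(c * 2 ** e for c, e in zip(bits, exponents))
         for bits in product((0, 1), repeat=len(exponents))]
    comp = [e for e in range(M) if e not in exponents]
    Sc = [sum(d * 2 ** e for d, e in zip(bits, comp))
          for bits in product((0, 1), repeat=len(comp))]
    return [sum(r[q + s] for s in S) % 2 for q in Sc]

def decode(r):
    r = list(r)
    # Step 1: decode the second-order bits, then strip them from r.
    aij = {(i, j): majority(checksums(r, [i - 1, j - 1]))
           for i, j in combinations(range(1, M + 1), 2)}
    for (i, j), coef in aij.items():
        r = [(x + coef * p * q) % 2 for x, p, q in zip(r, v(i), v(j))]
    # Step 2: decode the first-order bits from r^(1), then strip them.
    a = {i: majority(checksums(r, [i - 1])) for i in range(1, M + 1)}
    for i, coef in a.items():
        r = [(x + coef * y) % 2 for x, y in zip(r, v(i))]
    # Step 3: a0 is the majority bit of r^(2).
    return majority(r), a, aij
```

With at most one transmission error, `decode` returns the transmitted message bits, in line with the single-error argument given earlier in this section.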

[Figure 4.2, a plot of bit-error probability versus E_b/N_0 (in dB), compares uncoded
BPSK with the RM codes RM(16, 11, 4), RM(32, 6, 16), RM(32, 16, 8), RM(32, 26, 4),
RM(64, 22, 16), and RM(64, 42, 8).]

FIGURE 4.2: Bit-error performances of some RM codes with majority-logic decoding.

Error performances of some RM codes of lengths up to 64 using majority-logic


decoding are shown in Figure 4.2.

4.4 OTHER CONSTRUCTIONS FOR REED-MULLER CODES


Besides the construction method presented in Section 4.3, there are other methods
for constructing RM codes. We present three such methods in this section. These
methods reveal more structures of RM codes that are useful in constructing trellis
diagrams and soft-decision decodings.
Let A = [a_ij] be an m x m matrix and B = [b_ij] be an n x n matrix over
GF(2). The Kronecker product of A and B, denoted by A (x) B, is the mn x mn matrix
obtained from A by replacing every entry a_ij with the matrix a_ij B. Note that for
a_ij = 1, a_ij B = B, and for a_ij = 0, a_ij B is an n x n zero matrix. Let

G_(2,2) = [1 1
           0 1]    (4.14)

be a 2 x 2 matrix over GF(2). The twofold Kronecker product of G_(2,2) is defined as

G_(2,2) (x) G_(2,2) = [1 1 1 1
                       0 1 0 1
                       0 0 1 1
                       0 0 0 1].    (4.15)
The threefold Kronecker product of G_(2,2) is defined as

G_(2,2) (x) G_(2,2) (x) G_(2,2) =

[1 1 1 1 1 1 1 1
 0 1 0 1 0 1 0 1
 0 0 1 1 0 0 1 1
 0 0 0 1 0 0 0 1
 0 0 0 0 1 1 1 1
 0 0 0 0 0 1 0 1
 0 0 0 0 0 0 1 1
 0 0 0 0 0 0 0 1].    (4.16)

Similarly, we can define the m-fold Kronecker product of G_(2,2). Let n = 2^m. We use
G_(n,n) to denote the m-fold Kronecker product of G_(2,2). G_(n,n) is a 2^m x 2^m matrix
over GF(2). The rows of G_(n,n) have weights 2^0, 2^1, 2^2, ..., 2^m, and the number of
rows with weight 2^{m-l} is (m choose l) for 0 <= l <= m.
The generator matrix G_RM(r, m) of the rth-order RM code RM(r, m) of length
n = 2^m consists of those rows of G_(n,n) with weights equal to or greater than
2^{m-r}. These rows are the same vectors given by (4.5), except they are in a different
permutation.
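This construction is easy to check numerically. A sketch (the helper names `kron` and `G_kron` are ours):

```python
from math import comb

def kron(A, B):
    """Kronecker product of two binary matrices (lists of lists)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def G_kron(m):
    """m-fold Kronecker product of G(2,2) = [[1,1],[0,1]]."""
    G22 = [[1, 1], [0, 1]]
    G = G22
    for _ in range(m - 1):
        G = kron(G, G22)
    return G

G = G_kron(4)                      # a 16 x 16 matrix
weights = [sum(row) for row in G]
# the number of rows of weight 2^(m-l) is (m choose l)
assert all(weights.count(2 ** (4 - l)) == comb(4, l) for l in range(5))
# rows of weight >= 2^(m-r) generate RM(r, m): 11 rows for RM(2, 4)
assert sum(w >= 4 for w in weights) == 11
```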

EXAMPLE 4.4

Let m = 4. The fourfold Kronecker product of G_(2,2) is

              1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
              0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
              0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
              0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
              0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
              0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
              0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
              0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
G_(16,16) =   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
              0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
              0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
              0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
              0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
              0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
              0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
              0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

The generator matrix G_RM(2, 4) of the second-order RM code, RM(2, 4), of length
16 consists of the rows in G_(16,16) with weights 2^2, 2^3, and 2^4. Thus, we obtain

               1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
               0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
               0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
               0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
               0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
G_RM(2, 4) =   0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
               0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
               0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
               0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
               0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
               0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1

which is exactly the same matrix given in Example 4.2, except for the ordering of
the rows.

Let u = (u_0, u_1, ..., u_{n-1}) and v = (v_0, v_1, ..., v_{n-1}) be two n-tuples over
GF(2). From u and v we form the following 2n-tuple:

|u|u + v| := (u_0, u_1, ..., u_{n-1}, u_0 + v_0, u_1 + v_1, ..., u_{n-1} + v_{n-1}).    (4.17)

For i = 1, 2, let C_i be a binary (n, k_i) linear code with generator matrix G_i and
minimum distance d_i, respectively. Assume that d_2 > d_1. We form the following
linear code of length 2n:

C = |C_1|C_1 + C_2|
  = {|u|u + v| : u in C_1 and v in C_2}.    (4.18)

C is a binary (2n, k_1 + k_2) linear code with generator matrix

G = [G_1  G_1
     0    G_2],    (4.19)

where 0 is a k_2 x n zero matrix. The minimum distance d_min(C) of C is

d_min(C) = min{2d_1, d_2}.    (4.20)

To prove this, let x = |u|u + v| and y = |u'|u' + v'| be two distinct codewords in C.
The Hamming distance between x and y can be expressed in terms of Hamming
weights as follows:

d(x, y) = w(u + u') + w(u + u' + v + v'),    (4.21)

where w(z) denotes the Hamming weight of z. There are two cases to be considered,
v = v' and v != v'. If v = v', since x != y, we must have u != u'. In this case,

d(x, y) = w(u + u') + w(u + u') = 2w(u + u').    (4.22)



Because u + u' is a nonzero codeword in C_1, w(u + u') >= d_1. It follows from
(4.22) that

d(x, y) >= 2d_1.    (4.23)

If v != v', we have

d(x, y) = w(u + u') + w(u + u' + v + v')
        >= w((u + u') + (u + u' + v + v'))
        = w(v + v').    (4.24)

Since v + v' is a nonzero codeword in C_2, w(v + v') >= d_2. From (4.24) we have

d(x, y) >= d_2,    (4.25)

and hence, from (4.23) and (4.25),

d(x, y) >= min{2d_1, d_2}.    (4.26)

Because x and y are any two different codewords in C, the minimum distance
d_min(C) must be lower bounded as follows:

d_min(C) >= min{2d_1, d_2}.    (4.27)

Let u_0 and v_0 be two minimum-weight codewords in C_1 and C_2, respectively.
Then, w(u_0) = d_1 and w(v_0) = d_2. The vector |u_0|u_0| is a codeword in C with
weight w(|u_0|u_0|) = 2d_1. The vector |0|v_0| is also a codeword in C with weight
w(|0|v_0|) = d_2. From (4.27) we see that d_min(C) must be either 2d_1 or d_2;
hence, we conclude that

d_min(C) = min{2d_1, d_2}.    (4.28)

The construction of a long code from two component codes as just described is called
the |u|u + v|-construction [12, 13], which is a technique for constructing
long codes from short codes.

Let C_1 be the (8, 4) linear code of minimum distance 4 generated by

G_1 = [1 1 1 1 1 1 1 1
       0 0 0 0 1 1 1 1
       0 0 1 1 0 0 1 1
       0 1 0 1 0 1 0 1].

Let C_2 be the (8, 1) repetition code of minimum distance 8 generated by

G_2 = [1 1 1 1 1 1 1 1].

Using the |u|u + v|-construction, we obtain a (16, 5) binary linear code of minimum
distance 8 with the following generator matrix:

G = [G_1  G_1
     0    G_2]

  = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
     0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
     0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
     0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1].

RM codes of length 2^m can be constructed from RM codes of length 2^{m-1}
using the |u|u + v|-construction [12]. For m >= 2, the rth-order RM code in |u|u + v|-
construction is given as follows:

RM(r, m) = {|u|u + v| : u in RM(r, m - 1) and v in RM(r - 1, m - 1)}    (4.29)

with generator matrix

G_RM(r, m) = [G_RM(r, m - 1)   G_RM(r, m - 1)
              0                G_RM(r - 1, m - 1)].    (4.30)

The matrix of (4.30) shows that an RM code can be constructed recursively from
short RM codes by a sequence of |u|u + v|-constructions. For example, the rth-order
RM code RM(r, m) of length 2^m can be constructed from the RM codes RM(r, m - 2),
RM(r - 1, m - 2), and RM(r - 2, m - 2) of length 2^{m-2}. The generator matrix in
terms of these component codes is given as follows:

G = [G_RM(r, m - 2)   G_RM(r, m - 2)       G_RM(r, m - 2)       G_RM(r, m - 2)
     0                G_RM(r - 1, m - 2)   0                    G_RM(r - 1, m - 2)
     0                0                    G_RM(r - 1, m - 2)   G_RM(r - 1, m - 2)
     0                0                    0                    G_RM(r - 2, m - 2)].    (4.31)

The recursive structure of RM codes is very useful in analyzing and constructing


their trellises [15, 16]. This structure also allows us to devise multistage soft-decision
decoding schemes for RM codes that achieve good error performance with reduced
decoding complexity. This topic will also be discussed in a later chapter.
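The recursion (4.30) translates directly into code. A sketch (the helper name is ours; the base cases use the standard facts that RM(0, m) is the length-2^m repetition code and RM(m, m) is the whole space F_2^(2^m)):

```python
from math import comb

def rm_generator(r, m):
    """Generator matrix of RM(r, m), built via
    G(r, m) = [G(r, m-1)  G(r, m-1); 0  G(r-1, m-1)]."""
    if r == 0:
        return [[1] * 2 ** m]                 # repetition code
    if r == m:
        # RM(m, m) is all of F_2^(2^m): RM(m-1, m) plus one odd-weight row
        return rm_generator(m - 1, m) + [[0] * (2 ** m - 1) + [1]]
    upper = [row + row for row in rm_generator(r, m - 1)]
    lower = [[0] * 2 ** (m - 1) + row for row in rm_generator(r - 1, m - 1)]
    return upper + lower

G = rm_generator(2, 4)
assert len(G) == sum(comb(4, i) for i in range(3))    # k = 11
assert all(len(row) == 16 for row in G)
```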
Consider a Boolean function f(X_1, X_2, ..., X_m) of m variables, X_1, X_2, ...,
X_m, that take values of 0 or 1 [12, 14]. For each combination of values of X_1, X_2, ...,
and X_m, the function f takes a truth value of either 0 or 1. For the 2^m combinations
of values of X_1, X_2, ..., X_m, the truth values of f form a 2^m-tuple over GF(2).
For a nonnegative integer l less than 2^m, let (b_{l1}, b_{l2}, ..., b_{lm}) be the standard
binary representation of l, such that l = b_{l1} + b_{l2} * 2 + b_{l3} * 2^2 + ... + b_{lm} * 2^{m-1}. For a

given Boolean function f(X_1, X_2, ..., X_m), we form the following 2^m-tuple (truth
vector):

v(f) = (v_0, v_1, ..., v_l, ..., v_{2^m-1}),    (4.32)

where

v_l = f(b_{l1}, b_{l2}, ..., b_{lm})    (4.33)

and (b_{l1}, b_{l2}, ..., b_{lm}) is the standard binary representation of the index integer l.
We say that the Boolean function f(X_1, ..., X_m) represents the vector v(f). We use
the notation v(f) for the vector represented by f(X_1, X_2, ..., X_m). For 1 <= i <= m,
consider the Boolean function

f(X_1, X_2, ..., X_m) = X_i.    (4.34)

It is easy to see that this Boolean function represents the vector v_i defined above.
For 1 <= i, j <= m, the Boolean function

f(X_1, X_2, ..., X_m) = X_i X_j    (4.35)

represents the logic product of v_i and v_j, represented by g(X_1, ..., X_m) = X_i and
h(X_1, X_2, ..., X_m) = X_j, respectively. For 1 <= i_1 < i_2 < ... < i_r <= m, the Boolean
function

f(X_1, X_2, ..., X_m) = X_{i_1} X_{i_2} ... X_{i_r}    (4.36)

represents the logic product of v_{i_1}, v_{i_2}, ..., and v_{i_r}. Therefore, the generator vectors
of the rth-order RM code of length n = 2^m (the rows in G_RM(r, m)) are represented
by the Boolean functions in the following set:

{1, X_1, X_2, ..., X_m, X_1 X_2, X_1 X_3, ..., X_{m-1} X_m,
 ..., up to all products of r variables}.    (4.37)

Let P(r, m) denote the set of all Boolean functions (or polynomials) of degree r or
less with m variables. Then, RM(r, m) is given by the following set of vectors [12]:

RM(r, m) = {v(f) : f in P(r, m)}.    (4.38)

The Boolean representation is very useful in studying the weight distribution


of RM codes [18, 20].
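The Boolean-monomial description gives a direct way to write down the generator rows. A sketch (helper names are ours; bit b_{l1} is taken as the least significant bit of l, matching the binary representation used with (4.33)):

```python
from itertools import combinations
from math import comb

def monomial_vector(indices, m):
    """Truth vector of X_{i1} X_{i2} ... : component l is 1 iff every
    selected bit b_{l,i} of l is 1 (b_{l,1} = least significant bit)."""
    return [int(all((l >> (i - 1)) & 1 for i in indices))
            for l in range(2 ** m)]

def rm_boolean_rows(r, m):
    """One generator row of RM(r, m) per monomial of degree <= r."""
    return [monomial_vector(idx, m)
            for d in range(r + 1)
            for idx in combinations(range(1, m + 1), d)]

rows = rm_boolean_rows(2, 4)
assert len(rows) == sum(comb(4, i) for i in range(3))     # 11 rows
# a degree-d monomial has truth-vector weight 2^(m-d)
assert sum(monomial_vector((1, 2), 4)) == 2 ** (4 - 2)
```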

4.5 THE SQUARING CONSTRUCTION OF CODES


Consider a binary (n, k) linear code C with generator matrix G. For 0 <= k_1 <= k, let
C_1 be an (n, k_1) linear subcode of C that is spanned by k_1 rows of G. Partition C into
2^{k-k_1} cosets of C_1. This partition of C with respect to C_1 is denoted by C/C_1. As
shown in Section 3.5, each coset of C_1 is of the following form:

v_l (+) C_1 := {v_l + u : u in C_1},    (4.39)

with 1 <= l <= 2^{k-k_1}, where for v_l != 0, v_l is in C but not in C_1, and for v_l = 0, the
coset 0 (+) C_1 is just the subcode C_1 itself. The codeword v_l is called the leader (or
representative) of the coset v_l (+) C_1. We also showed in Section 3.5 that any codeword
120 Chapter 4 Important Linear Block Codes

in a coset can be used as the coset representative without changing the composition
of the coset. The all-zero codeword O is always used as the representative for C1.
The set of representatives for the cosets in the partition C/C_1 is denoted by [C/C_1],
which is called the coset representative space for the partition C/C_1. Because all the
cosets in C/C_1 are disjoint, C_1 and [C/C_1] have only the all-zero codeword 0 in
common; that is, C_1 n [C/C_1] = {0}. Then, we can express C as the sum of the coset
representatives in [C/C_1] and the codewords in C_1 as follows:

C = [C/C_1] (+) C_1.    (4.40)

The preceding sum is called the direct-sum of [C/C_1] and C_1.


Let G_1 be the subset of k_1 rows of the generator matrix G that generates the
linear subcode C_1. Then, the 2^{k-k_1} codewords generated by the k - k_1 rows in the
set G \ G_1 can be used as the representatives for the cosets in C/C_1. These 2^{k-k_1}
codewords form an (n, k - k_1) linear subcode of C.
Let C_2 be an (n, k_2) linear subcode of C_1 with 0 <= k_2 <= k_1. We can further
partition each coset v_l (+) C_1 in the partition C/C_1 based on C_2 into 2^{k_1-k_2} cosets of
C_2; each coset consists of the following codewords in C:

v_l (+) w_q (+) C_2 := {v_l + w_q + u : u in C_2},    (4.41)

with 1 <= l <= 2^{k-k_1} and 1 <= q <= 2^{k_1-k_2}, where for w_q != 0, w_q is a codeword in C_1 but
not in C_2. We denote this partition C/C_1/C_2. This partition consists of 2^{k-k_2} cosets
of C_2. Now, we can express C as the following direct-sum:

C = [C/C_1] (+) [C_1/C_2] (+) C_2.    (4.42)

Let C_1, C_2, ..., C_m be a sequence of linear subcodes of C with dimensions
k_1, k_2, ..., k_m, respectively, such that

C >= C_1 >= C_2 >= ... >= C_m,    (4.43)

and

k >= k_1 >= k_2 >= ... >= k_m >= 0.    (4.44)

Then, we can form a chain of partitions,

C/C_1, C/C_1/C_2, ..., C/C_1/.../C_m,    (4.45)

and can express C as the following direct-sum:

C = [C/C_1] (+) [C_1/C_2] (+) ... (+) [C_{m-1}/C_m] (+) C_m.    (4.46)

We now present another method for constructing long codes from a sequence
of subcodes of a given short code. This method is known as the squaring construc-
tion [15].
Let C_0 be a binary (n, k_0) linear block code with minimum Hamming dis-
tance d_0. Let C_1, C_2, ..., C_m be a sequence of subcodes of C_0 such that

C_0 >= C_1 >= C_2 >= ... >= C_m.    (4.47)

For 0 <= i <= m, let G_i, k_i, and d_i be the generator matrix, the dimension, and the
minimum distance of the subcode C_i. We form a chain of partitions as
follows:

C_0/C_1, C_0/C_1/C_2, ..., C_0/C_1/.../C_m.    (4.48)

For 0 <= i < m, let G_{i/i+1} denote the generator matrix for the coset representative
space [C_i/C_{i+1}]. The rank of G_{i/i+1} is

Rank(G_{i/i+1}) = Rank(G_i) - Rank(G_{i+1}) = k_i - k_{i+1}.    (4.49)

Without loss of generality, we assume that G_0 contains G_1 contains G_2 contains ... contains G_m. Then, for
0 <= i < m,

G_{i/i+1} = G_i \ G_{i+1}.    (4.50)

One-level squaring construction is based on the partition C_0/C_1. Let
a = (a_0, a_1, ..., a_{n-1}) and b = (b_0, b_1, ..., b_{n-1}) be two binary n-tuples, and let
(a, b) denote the 2n-tuple (a_0, a_1, ..., a_{n-1}, b_0, b_1, ..., b_{n-1}). We form the following
set of 2n-tuples:

|C_0/C_1|^2 := {(a + x, b + x) : a, b in C_1 and x in [C_0/C_1]}.    (4.51)

Then, |C_0/C_1|^2 is a (2n, k_0 + k_1) linear block code with minimum Hamming distance

D_1 := min{2d_0, d_1}.    (4.52)

The generator matrix for |C_0/C_1|^2 is given by

G = [G_1      0
     0        G_1
     G_{0/1}  G_{0/1}].    (4.53)

Let M_1 and M_2 be two matrices with the same number of columns. The matrix

M = [M_1
     M_2]

is called the direct-sum of M_1 and M_2, denoted by M = M_1 (+) M_2. Then, we can
express the generator matrix for the one-level squaring construction code |C_0/C_1|^2
in the following form:

G = I_2 (x) G_1 (+) (1, 1) (x) G_{0/1},    (4.54)

where (x) denotes the Kronecker product, (+) the direct-sum, I_2 the identity matrix of
dimension 2,

I_2 (x) G_1 = [G_1  0
              0    G_1],    (4.55)

and

(1, 1) (x) G_{0/1} = [G_{0/1}  G_{0/1}].    (4.56)
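A small instance of the one-level squaring construction can be checked directly. A sketch, taking C_0 to be an (8, 4) code with d_0 = 4 and C_1 its (8, 1) repetition subcode with d_1 = 8 (helper names are ours):

```python
from itertools import product

# C0 = (8, 4) code with d0 = 4; C1 = its (8, 1) repetition subcode, d1 = 8
G0 = [[1, 1, 1, 1, 1, 1, 1, 1],
      [0, 0, 0, 0, 1, 1, 1, 1],
      [0, 0, 1, 1, 0, 0, 1, 1],
      [0, 1, 0, 1, 0, 1, 0, 1]]
G1 = [G0[0]]
G01 = G0[1:]          # G_{0/1} = G0 \ G1, spans [C0/C1]

zero = [0] * 8
# Generator of |C0/C1|^2 per (4.53): [G1 0; 0 G1; G01 G01]
G = ([row + zero for row in G1] +
     [zero + row for row in G1] +
     [row + row for row in G01])

def min_distance(G):
    """Minimum weight over all nonzero codewords generated by G."""
    d = None
    for coeffs in product((0, 1), repeat=len(G)):
        if not any(coeffs):
            continue
        w = [0] * len(G[0])
        for c, row in zip(coeffs, G):
            if c:
                w = [(x + y) % 2 for x, y in zip(w, row)]
        wt = sum(w)
        d = wt if d is None else min(d, wt)
    return d

assert len(G) == 5                       # k0 + k1 = 4 + 1
assert min_distance(G) == min(2 * 4, 8)  # D1 = min{2 d0, d1} = 8
```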



Now, we extend the one-level squaring construction to a two-level squaring
construction. First, we form two codes, U := |C_0/C_1|^2 and V := |C_1/C_2|^2, using one-
level squaring construction. It is easy to see that V is a subcode of U. The two-level
squaring construction based on the partitions C_0/C_1, C_1/C_2, and C_0/C_1/C_2 gives
the following code:

|C_0/C_1/C_2|^4 := {(a + x, b + x) : a, b in V and x in [U/V]}
                 = {(a + x, b + x) : a, b in |C_1/C_2|^2    (4.57)
                    and x in [|C_0/C_1|^2 / |C_1/C_2|^2]},

which is simply the code obtained by one-level squaring construction based on U and
U/V. This code is a (4n, k_0 + 2k_1 + k_2) linear block code with minimum Hamming
distance

D_2 := min{4d_0, 2d_1, d_2}.    (4.58)
Let G_U, G_V, and G_{U/V} denote the generator matrices for U, V, and [U/V],
respectively. Then, the generator matrix for |U/V|^2 is

G = [G_V      0
     0        G_V
     G_{U/V}  G_{U/V}].    (4.59)

We put G_V and G_U in the form of (4.53) and note that G_{U/V} = G_U \ G_V. Then,
we can put the generator matrix G of |C_0/C_1/C_2|^4 = |U/V|^2 given by (4.59) in the
following form:

G = [G_2      0        0        0
     0        G_2      0        0
     0        0        G_2      0
     0        0        0        G_2
     G_{0/1}  G_{0/1}  G_{0/1}  G_{0/1}
     G_{1/2}  G_{1/2}  G_{1/2}  G_{1/2}
     0        0        G_{1/2}  G_{1/2}
     0        G_{1/2}  0        G_{1/2}].    (4.60)

We can express this matrix in the following compact form:

G = I_4 (x) G_2 (+) (1 1 1 1) (x) G_{0/1} (+) [1 1 1 1
                                               0 0 1 1
                                               0 1 0 1] (x) G_{1/2}.    (4.61)

Note that

(1 1 1 1)   and   [1 1 1 1
                   0 0 1 1
                   0 1 0 1]

are the generator matrices of the zeroth- and first-order RM codes of length 4.
Higher-level squaring construction can be carried out recursively in a similar
manner. For m >= 2, let

U_m := |C_0/C_1/.../C_{m-1}|^{2^{m-1}}

and

V_m := |C_1/C_2/.../C_m|^{2^{m-1}}

denote the two codes obtained by (m - 1)-level squaring construction. The code
obtained by m-level squaring construction is then

|C_0/C_1/.../C_m|^{2^m} := {(a + x, b + x) : a, b in V_m and x in [U_m/V_m]}.    (4.62)

The generator matrix is [15, 16]:

G = I_{2^m} (x) G_m (+) Sum-direct_{0<=r<m} G_RM(r, m) (x) G_{r/r+1},    (4.63)

where I_{2^m} denotes the identity matrix of dimension 2^m, and G_RM(r, m) is the
generator matrix of the rth-order RM code of length 2^m.
RM codes are examples of the squaring construction. Long RM codes
can be constructed from short RM codes using the squaring construction
[15, 16]. From the construction of RM codes, for 0 < i <= r,

RM(r, m) >= RM(r - 1, m) >= ... >= RM(r - i, m).    (4.64)

Consider the RM code RM(r, m). As shown in Section 4.4, this code can be obtained
from the RM(r, m - 1) and RM(r - 1, m - 1) codes using the |u|u + v|-construction. It
follows from (4.30) that the generator matrix for the RM(r, m) code can be expressed
as follows:

G_RM(r, m) = [G_RM(r, m - 1)   G_RM(r, m - 1)
              0                G_RM(r - 1, m - 1)].    (4.65)

We define

D_RM(r/r - 1, m - 1) := G_RM(r, m - 1) \ G_RM(r - 1, m - 1).    (4.66)

Note that D_RM(r/r - 1, m - 1) consists of those rows in G_RM(r, m - 1) that are not
in G_RM(r - 1, m - 1), and it spans the coset representative space [RM(r, m - 1)/
RM(r - 1, m - 1)]. Now, we can express G_RM(r, m - 1) in the following form:

G_RM(r, m - 1) = [G_RM(r - 1, m - 1)
                  D_RM(r/r - 1, m - 1)].    (4.67)

Replacing G_RM(r, m - 1) in (4.65) with the expression of (4.67) and performing row
operations, we can put G_RM(r, m) in the following form:

G_RM(r, m) = [G_RM(r - 1, m - 1)    0
              0                     G_RM(r - 1, m - 1)
              D_RM(r/r - 1, m - 1)  D_RM(r/r - 1, m - 1)].    (4.68)

This is exactly the generator matrix form of one-level squaring construction. There-
fore, the rth-order RM code of length 2^m can be constructed from the rth-order and
(r - 1)th-order RM codes of length 2^{m-1}; that is,

RM(r, m) = |RM(r, m - 1)/RM(r - 1, m - 1)|^2.    (4.69)



Because

RM(r, m - 1) = |RM(r, m - 2)/RM(r - 1, m - 2)|^2

and

RM(r - 1, m - 1) = |RM(r - 1, m - 2)/RM(r - 2, m - 2)|^2,

we can construct RM(r, m) from RM(r, m - 2), RM(r - 1, m - 2), and RM(r -
2, m - 2) using two-level squaring construction; that is,

RM(r, m) = |RM(r, m - 2)/RM(r - 1, m - 2)/RM(r - 2, m - 2)|^{2^2}.    (4.70)

Repeating the preceding process, we find that for 1 <= mu <= r, we can express the
RM(r, m) code as a mu-level squaring construction code as follows:

RM(r, m) = |RM(r, m - mu)/RM(r - 1, m - mu)/.../RM(r - mu, m - mu)|^{2^mu}.    (4.71)

A problem related to the construction of codes from component codes is code


decomposition. A code is said to be decomposable if it can be expressed in terms of
component codes. A code is said to be µ-level decomposable if it can be expressed
as a µ-level squaring construction code from a sequence of subcodes of a given code,
as shown in (4.62). From (4.71) we see that an RM code is µ-level decomposable.
A µ-level decomposable code can be decoded in multiple stages: component
codes are decoded sequentially one at a time, and decoded information is passed
from one stage to the next stage. This multistage decoding provides a good trade-off
between error performance and decoding complexity, especially for long codes.
RM codes also can be constructed from Euclidean geometry, which is discussed
in Chapter 8. This construction reveals more algebraic and geometric structures of
these codes, especially the structures and the number of minimum-weight codewords.
There is a one-to-one correspondence between a minimum-weight codeword of the
rth-order RM code, RM(r, m), and an (m - r)-dimensional flat in m-dimensional
Euclidean geometry, EG(m, 2), over GF(2). This correspondence gives the number
of minimum-weight codewords of the RM(r, m) code [12, 17]:

A_{2^{m-r}} = 2^r * Prod_{i=0}^{m-r-1} (2^{m-i} - 1)/(2^{m-r-i} - 1).    (4.72)

In fact, these minimum-weight codewords span (or generate) the code; that is, the
linear combinations of these minimum-weight codewords produce all the codewords
of the RM(r, m) code.
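Formula (4.72) is simple to evaluate, and small cases can be cross-checked by enumeration. A sketch (helper names are ours; RM(1, 3) is enumerated from its monomial generator rows):

```python
from math import prod
from itertools import combinations

def num_min_weight(r, m):
    """A_{2^(m-r)} = 2^r * prod_{i=0}^{m-r-1} (2^(m-i)-1)/(2^(m-r-i)-1),
    per (4.72); the quotient is always an integer."""
    num = prod(2 ** (m - i) - 1 for i in range(m - r))
    den = prod(2 ** (m - r - i) - 1 for i in range(m - r))
    return 2 ** r * num // den

# Brute-force check on RM(1, 3): count codewords of weight 2^(m-r) = 4.
def monomial(indices, m):
    return [int(all((l >> (i - 1)) & 1 for i in indices))
            for l in range(2 ** m)]

rows = [monomial(idx, 3) for d in range(2)
        for idx in combinations((1, 2, 3), d)]
count = 0
for msg in range(2 ** len(rows)):
    w = [0] * 8
    for idx, row in enumerate(rows):
        if (msg >> idx) & 1:
            w = [(x + y) % 2 for x, y in zip(w, row)]
    if sum(w) == 4:
        count += 1

assert num_min_weight(1, 3) == 14
assert count == 14
```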
The weight distribution of several subclasses of RM codes and all RM codes
of lengths up to 512 have been enumerated [12, 18-21]. The first-order RM code,
RM(1, m), has only three weights, 0, 2^{m-1}, and 2^m. The numbers of codewords of
these weights are

A_0 = A_{2^m} = 1,
A_{2^{m-1}} = 2^{m+1} - 2.    (4.73)

The second-order RM code, RM(2, m), has the following weight distribution:

A_0 = A_{2^m} = 1,

A_{2^{m-1} +- 2^{m-1-l}} = 2^{l(l+1)} * [Prod_{i=m-2l+1}^{m} (2^i - 1)] / [Prod_{i=1}^{l} (2^{2i} - 1)]    for 1 <= l <= floor(m/2),

A_{2^{m-1}} = 2^{(m^2+m+2)/2} - 2 - 2 Sum_{l=1}^{floor(m/2)} 2^{l(l+1)} * [Prod_{i=m-2l+1}^{m} (2^i - 1)] / [Prod_{i=1}^{l} (2^{2i} - 1)].    (4.74)

Because the (m - r - 1)th-order RM code, RM(m - r - 1, m), is the dual code of
the rth-order RM code, RM(r, m), the weight distributions of the RM(m - 2, m)
and RM(m - 3, m) codes can be derived from (4.73), (4.74), and the MacWilliams
identity. For example, the second-order RM code RM(2, 5) of length 32 has the
following weight distribution:

i      0    8     12     16     20    24   32
A_i    1   620  13888  36518  13888  620   1

This code is a self-dual code.
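Evaluating (4.74) reproduces this table. A sketch (the helper name is ours; all arithmetic is exact integer arithmetic):

```python
from math import prod

def rm2_weight_distribution(m):
    """Weight distribution of RM(2, m) from (4.74): {weight: count}."""
    n = 2 ** m
    dist = {0: 1, n: 1}
    mid = 0
    for l in range(1, m // 2 + 1):
        A = (2 ** (l * (l + 1))
             * prod(2 ** i - 1 for i in range(m - 2 * l + 1, m + 1))
             // prod(2 ** (2 * i) - 1 for i in range(1, l + 1)))
        dist[n // 2 - 2 ** (m - 1 - l)] = A
        dist[n // 2 + 2 ** (m - 1 - l)] = A
        mid += A
    dist[n // 2] = 2 ** ((m * m + m + 2) // 2) - 2 - 2 * mid
    return dist

d = rm2_weight_distribution(5)
assert d == {0: 1, 8: 620, 12: 13888, 16: 36518, 20: 13888, 24: 620, 32: 1}
assert sum(d.values()) == 2 ** 16     # all 2^k codewords accounted for
```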

RM codes form a remarkable class of linear block codes. Their rich structural
properties make them very easy to decode by either hard- or soft-decision decoding.
Various soft-decision decoding algorithms for these codes have been devised, and
some will be discussed in later chapters. Other classes of codes are more powerful
than RM codes; for the same minimum distance, those codes have higher rates;
however, the low decoding complexity of RM codes makes them very attractive
in practical applications. In fact, in terms of both error performance and decoding
complexity, RM codes often outperform their corresponding more powerful codes.
The (m - 2)th-order RM code of length 2^m is actually the distance-4 extended
Hamming code obtained by adding an overall parity bit to the Hamming code of
length 2^m - 1.

4.6 THE (24, 12) GOLAY CODE


Besides the Hamming codes, the only other nontrivial binary perfect code is the
(23, 12) Golay code constructed by M. J. E. Golay in 1949 [3]. This code has a
minimum distance of 7 and is capable of correcting any combination of three or
fewer random errors in a block of 23 digits. The code has abundant and beautiful
algebraic structure, and it has become a subject of study by many coding theorists
and mathematicians; many research papers have been published on its structure and
decoding. The Golay code is the most extensively studied single code. In addition
to having beautiful structure, this code has been used in many real communication
systems for error control. This code in its cyclic form will be studied in Chapter 5.
The (23. 12) Golay code can be extended by adding an overall parity-check bit
to each codeword. This extension results in a (24, 12) code with a minimum distance
Foundations of Coding: Theory and Applications of Error-Correcting Codes
with an Introduction to Cryptography and Information Theory
by Jiří Adámek
Copyright © 1991 John Wiley & Sons, Inc.

Chapter 9

Reed-Muller Codes:
Weak Codes with Easy
Decoding

We now introduce an interesting class of multiple-error-correcting binary


codes whose prime importance lies in an easily implementable decoding
technique: the Reed-Muller codes. Their parameters are listed in Figure 1.

Length:                  n = 2^m

Information symbols:     k = Sum_{i=0}^{r} (m choose i)

Minimum distance:        d = 2^{m-r}

Error-control capacity:  corrects 2^{m-r-1} - 1 errors by a
                         technique based on majority logic
                         which is easy to implement

Figure 1: Parameters of the Reed-Muller code R(r, m)

There are closely related punctured Reed-Muller codes, the parameters of


which are presented in Figure 2.
One of these codes, the Reed-Muller code R(1, 5), was used by the
1969 Mariner to transmit pictures of the Moon. The code R(1, 5) has

Length:                  n = 2^m - 1

Information symbols:     k = Sum_{i=0}^{r} (m choose i)

Minimum distance:        d = 2^{m-r} - 1

Error-control capacity:  corrects errors as R(r, m) does,
                         but has a better information rate
                         and is a cyclic code

Figure 2: Parameters of the punctured Reed-Muller code

length 32 with 6 information bits and it corrects 7 errors. Each dot of the
transmitted picture was assigned one of 2^6 = 64 degrees of greyness, and
these 6 information bits were then encoded into a word of length 32.
We introduce Reed-Muller codes by means of Boolean polynomials,
which we first discuss in some detail. To understand the decoding of Reed-
Muller codes, it is more convenient to work with finite geometries, where
code words become characteristic functions of flats.

9.1 Boolean Functions


Reed-Muller codes are best described by means of Boolean polynomials.
We first show how binary words translate to Boolean functions and Boolean
polynomials. Then we introduce the codes and present their basic proper-
ties. However, the decoding is better explained in a different, geometrical,
presentation of binary words, which we introduce later.

Definition. A Boolean function f = f(x_0, x_1, ..., x_{m-1}) of m variables
is a rule which assigns to each m-tuple (x_0, x_1, ..., x_{m-1}) of 0's
and 1's a value f(x_0, x_1, ..., x_{m-1}) = 0 or 1. In other words, f is a
function from Z_2^m to Z_2.

Truth Table. A simple way of presenting a Boolean function is to list all
of its values. That is, we write down all the 2^m combinations of the
values of all variables x_0, x_1, ..., x_{m-1}, and then to each
combination, we assign the value f(x_0, x_1, ..., x_{m-1}). For notational
convenience, we proceed in such a way that the columns form binary
expansions of the numbers 0, 1, 2, ... (from the first row downward). This
presentation is called the truth table of the function f.

An example of a truth table of a Boolean function of three variables is


presented in Figure 3. Observe that a truth table of a Boolean function

    x_0  0 1 0 1 0 1 0 1
    x_1  0 0 1 1 0 0 1 1
    x_2  0 0 0 0 1 1 1 1
    f    0 1 1 0 1 1 1 0

Figure 3: An example of a Boolean function of three variables

of three variables yields a binary word of length 8, and, conversely, every
binary word of length 8 is the truth table of some Boolean function. We
thus can (and will) identify Boolean functions of three variables with
binary words of length 8. For example, the word 01101110 is the same as
the Boolean function in Figure 3.
More generally, every binary word f of length 2^m is considered as a
Boolean function of m variables. If we write the indices starting with
zero, then the binary word

    f = f_0 f_1 ... f_{2^m - 1}

is the Boolean function with

    f(0, 0, ..., 0, 0) = f_0,
    f(0, 0, ..., 0, 1) = f_1,
    f(0, 0, ..., 1, 0) = f_2,
    ...
    f(1, 1, ..., 1, 1) = f_{2^m - 1}.

In general,

    f_i = f(i_{m-1}, ..., i_1, i_0),

where the number i has the binary expansion i_{m-1} ... i_1 i_0 (i.e.,
i = sum_{k=0}^{m-1} i_k 2^k).

Examples
(1) There are two constant Boolean functions

    1 = 11...11  and  0 = 00...00.

(2) Every variable is a Boolean function. For example, x_0 is the Boolean
    function which assigns to each m-tuple (x_0, x_1, ..., x_{m-1}) the
    first coordinate value x_0. Thus, the value is 0 for all even numbers
    and 1 for all odd ones:

        x_0 = 0 1 0 1 0 1 0 ... 1.

    (See Figure 3 for the case m = 3.) In general,

        x_i is the binary word whose kth position equals 1 precisely
        when the binary expansion of k has 1 in the ith position
        (k = 0, 1, ..., 2^m - 1).

    This follows from the way we are writing the truth tables. For
    example, x_1 = 00110011...0011, and

        x_{m-1} = 000...00 111...11.

    For m = 4, the four variables are shown in Figure 4.

    x_0  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    x_1  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    x_2  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    x_3  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

Figure 4: The four variables as Boolean functions (m = 4)
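The rule just stated is easy to check mechanically. The following sketch (Python; the function name is mine, not the book's) builds the truth-table word of a variable x_i for any m, putting a 1 in position k exactly when the binary expansion of k has a 1 in the ith position:

```python
def variable_word(i, m):
    """Truth-table word of the variable x_i as a list of 2**m bits.

    Position k equals 1 precisely when the binary expansion of k
    has a 1 in the ith position (k = 0, 1, ..., 2**m - 1).
    """
    return [(k >> i) & 1 for k in range(2 ** m)]

# Reproduce Figure 4 (m = 4), one row per variable:
for i in range(4):
    print("x_%d " % i + " ".join(str(b) for b in variable_word(i, 4)))
```

For m = 3 this reproduces the rows x_0 = 01010101, x_1 = 00110011, and x_2 = 00001111 of Figure 3.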

9.2 Boolean Polynomials


We now introduce the basic logical operations on Boolean functions of
m variables. If f is a Boolean function, we denote by f_i
(i = 0, 1, ..., 2^m - 1) the ith position of its truth table.

The logical sum (also called "exclusive or") of Boolean functions f
and g is the function

    f + g,

whose value is 1 precisely when either f or g has value 1, but not both.
That is, the ith coordinate of f + g is f_i + g_i (addition modulo 2). The
logical sum is nothing new: it is just the usual addition in the linear
space Z_2^n, where n = 2^m.

The logical product (also called "and") of Boolean functions f and g is
the function

    fg,

whose value is 1 precisely when both f and g have value 1. That is, the
ith coordinate of fg is f_i g_i (multiplication modulo 2). This is a new
operation imposed on Z_2^n for n = 2^m, viz., the coordinatewise
multiplication

    (f_{2^m - 1} ... f_1 f_0)(g_{2^m - 1} ... g_1 g_0) =
        (f_{2^m - 1} g_{2^m - 1}) ... (f_1 g_1)(f_0 g_0).

Remarks
(1) Observe that the logical product fulfils

        ff = f.

    Thus, no exponents higher than 1 are ever needed.

(2) There are other natural operations which, however, can be expressed
    by means of the logical sum and logical product. For example, the
    negation of f (which has value 1 precisely when f has value 0) is
    simply expressed as a sum with the constant function 1 = 11...11:

        not f = 1 + f.

    The operation "or" (f or g), whose value is 1 precisely when f or g
    (or both) have value 1, can be expressed as

        f or g = f + g + fg.
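As a quick illustration, here is a sketch of these operations on truth-table words (Python; the function names are mine). Words are lists of bits with position i holding f_i:

```python
def lsum(f, g):
    # logical sum ("exclusive or"): coordinatewise addition modulo 2
    return [(a + b) % 2 for a, b in zip(f, g)]

def lprod(f, g):
    # logical product ("and"): coordinatewise multiplication
    return [a * b for a, b in zip(f, g)]

def lneg(f):
    # negation: the sum with the constant function 1 = 11...11
    return [(1 + a) % 2 for a in f]

def lor(f, g):
    # "or", expressed as f + g + fg
    return lsum(lsum(f, g), lprod(f, g))
```

One can check, for instance, that lprod(f, f) == f for every word f, which is Remark (1) above.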

(3) We have seen that 1 and each variable x_i are Boolean functions. Other
    Boolean functions can be obtained by addition and multiplication. For
    example: x_i x_j, 1 + x_i + x_j, etc. These are called Boolean
    polynomials:

Definition. By a Boolean polynomial in m indeterminates is meant a sum
of (some of the) following Boolean functions: 1, x_i, x_i x_j, ...,
x_{i_1} x_{i_2} ... x_{i_s} (where the indices range over 0, 1, ..., m - 1).
The zero function 0 is called a Boolean polynomial of degree -1, the
function 1 is called a Boolean polynomial of degree 0, and a Boolean
polynomial f has degree k >= 1 provided that k is the maximum number of
factors in a summand of f (i.e., f has a summand x_{i_1} x_{i_2} ... x_{i_k},
and no summand has more than k factors).
a k

Examples
(1) The Boolean polynomial 1 + x_0 x_1 of three indeterminates has
    degree 2. It is the negation of the polynomial

        x_0 x_1 = (01010101)(00110011) = 00010001.

    Thus,

        1 + x_0 x_1 = 11101110.

    The Boolean polynomial 1 + x_0 x_1 considered as a function of four
    indeterminates is the word

        1110111011101110.

(2) The polynomial x_i x_j (i ≠ j) is the binary word whose kth position
    is 1 precisely when k has 1 in positions i and j. The number of such
    k's is 2^(m-2) (because we can choose the m - 2 remaining positions
    of k arbitrarily). Thus, the binary word x_i x_j has Hamming
    weight 2^(m-2). Analogously, x_i x_j x_k has Hamming weight 2^(m-3),
    and, in general,

        x_{i_1} x_{i_2} ... x_{i_s} has Hamming weight 2^(m-s)

    for arbitrary pairwise distinct indices i_1, ..., i_s chosen between 0
    and m - 1.

Remark. (4) A Boolean polynomial never uses exponents, simply because
x_i^2 = x_i. Thus, there are just four Boolean polynomials of one
indeterminate x, viz.: x, x + 1, 0, and 1.

We will later introduce (non-Boolean) polynomials of one indeterminate;
those are fundamentally different from the Boolean polynomials since all
the information is carried by the exponents.

Translation between words and Boolean polynomials. Every Boolean
polynomial of m variables yields a binary word of length 2^m: for a single
indeterminate, see Example (2) in 9.1, and, further, we perform the
required additions and multiplications.

Conversely, every binary word f = f_0 f_1 ... f_{2^m - 1} can be
translated into a Boolean polynomial as follows. First, observe that the
last indeterminate x_{m-1} is the word

    x_{m-1} = 00...00 11...11,

whose first half is the constant function 0 and the other half is the
constant function 1 (now both considered in m - 1 indeterminates, i.e., in
the length 2^m / 2 = 2^(m-1)). As a consequence, we see that for each
Boolean function f(x_0, x_1, ..., x_{m-1}), the first half of the
corresponding word f is f(x_0, x_1, ..., x_{m-2}, 0) and the other half is
f(x_0, x_1, ..., x_{m-2}, 1) (both considered as functions of m - 1
indeterminates).

The following trivial proposition, applied recursively, then yields an
algorithm for converting a word into a Boolean polynomial:

Proposition. Every Boolean function f of m variables can be expressed
[by means of the two halves f(x_0, ..., x_{m-2}, 0) and
f(x_0, ..., x_{m-2}, 1) of the corresponding binary word] as follows:

    f(x_0, ..., x_{m-2}, x_{m-1}) =
        f(x_0, ..., x_{m-2}, 0)
        + [f(x_0, ..., x_{m-2}, 0) + f(x_0, ..., x_{m-2}, 1)] x_{m-1}.

PROOF. Since x_{m-1} can only take one of the two possible values, 0 or 1,
it is sufficient to verify that the identity holds for both of them. For
x_{m-1} = 0, the identity is clear, and for x_{m-1} = 1, we get

    f(x_0, ..., x_{m-2}, 1) = f(x_0, ..., x_{m-2}, 0)
        + [f(x_0, ..., x_{m-2}, 0) + f(x_0, ..., x_{m-2}, 1)].  □

Example. (3) Let us translate f = 01101110 into a Boolean polynomial
(of three variables). We apply the preceding proposition:

    f = 0110 + [0110 + 1110] x_2
      = 0110 + 1000 x_2.

Next we apply the same proposition to the two words 0110 and 1000 (of
two indeterminates):

    f = (01 + [01 + 10] x_1) + (10 + [10 + 00] x_1) x_2
      = 01 + 11 x_1 + 10 x_2 + 10 x_1 x_2.

Finally, an application of the proposition to the words of length 2 (in
the indeterminate x_0) yields:

    f = (0 + [0 + 1] x_0) + (1 + [1 + 1] x_0) x_1 + (1 + [1 + 0] x_0) x_2
        + (1 + [1 + 0] x_0) x_1 x_2
      = x_0 + x_1 + x_2 + x_0 x_2 + x_1 x_2 + x_0 x_1 x_2.
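The proposition, applied recursively exactly as in Example (3), can be turned into a short program. The sketch below (Python; the function name and the monomial encoding are my own choices, not the book's) returns the polynomial as a set of monomials, each monomial a frozenset of variable indices, with the empty set standing for the summand 1:

```python
def word_to_polynomial(f):
    """Translate a binary word f = (f_0, ..., f_{2^m - 1}) into its
    Boolean polynomial via the proposition

        f = f(..., 0) + [f(..., 0) + f(..., 1)] x_{m-1},

    applied recursively to the two halves of the word."""
    n = len(f)
    if n == 1:
        return {frozenset()} if f[0] else set()
    half = n // 2
    first, second = f[:half], f[half:]
    last_var = half.bit_length() - 1          # index m - 1
    p0 = word_to_polynomial(first)            # translates f(..., 0)
    diff = [(a + b) % 2 for a, b in zip(first, second)]
    p1 = {mono | {last_var} for mono in word_to_polynomial(diff)}
    return p0 ^ p1   # symmetric difference = addition modulo 2

# Example (3): 01101110 -> x_0 + x_1 + x_2 + x_0x_2 + x_1x_2 + x_0x_1x_2
print(sorted(sorted(mono) for mono in word_to_polynomial([0, 1, 1, 0, 1, 1, 1, 0])))
```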

Theorem. The linear space Z_2^n, where n = 2^m, has a basis formed by all
one-summand Boolean polynomials, i.e., by the following polynomials:

    1,
    x_i        (i = 0, 1, ..., m - 1),
    x_i x_j    (i, j = 0, 1, ..., m - 1 and i ≠ j),
    ...
    x_0 x_1 ... x_{m-1}.

PROOF. Every word of length n = 2^m is a Boolean function of m
indeterminates, and, hence, can be expressed as a Boolean polynomial. It
follows that the one-summand Boolean polynomials span the whole
space Z_2^n. In order to prove that they form a basis, it is sufficient to
show that their number is n = 2^m, the dimension of the space. In fact, we
have 1 polynomial of degree 0, m polynomials of degree 1, C(m, 2)
polynomials of degree 2, etc. For each k = 0, 1, ..., m, there are C(m, k)
one-summand polynomials of degree k, and, thus, the total number of the
one-summand polynomials is

    sum_{k=0}^{m} C(m, k) = 2^m.                                   (9.2.1)

[The last equation is easily derived from the binomial theorem applied to
(1 + 1)^m.]  □

9.3 Reed-Muller Codes

Definition. By the Reed-Muller code of length n = 2^m and degree r
(r = 0, 1, ..., m) is meant the binary code R(r, m) of all binary words of
length n which have, as Boolean polynomials, degree at most r.

Examples
(1) R(0, m) consists of the polynomials of degree at most 0, i.e., of 0
    and 1. Thus, R(0, m) is the repetition code of length 2^m.

(2) R(1, m) has basis 1, x_0, ..., x_{m-1}. In fact, each polynomial of
    degree at most 1 is a sum of (some of) those m + 1 polynomials, and
    they are linearly independent by Theorem 9.2. Thus, R(1, m) is a
    (2^m, m + 1)-code.
    For example, R(1, 3) has the following generator matrix:

              1   [ 1 1 1 1 1 1 1 1 ]
        G =  x_0  [ 0 1 0 1 0 1 0 1 ]
             x_1  [ 0 0 1 1 0 0 1 1 ]
             x_2  [ 0 0 0 0 1 1 1 1 ]

    We will see that R(1, 3) is the extended Hamming code (see 8.5).
    R(1, 4) is a (16, 5)-code and R(1, 5) is a (32, 6)-code, which was
    used by the 1969 Mariner, as mentioned in the introduction.
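A generator matrix of this kind can be produced mechanically for any R(r, m): its rows are the truth-table words of the one-summand polynomials of degree at most r. A sketch (Python; the function name is mine):

```python
from itertools import combinations

def rm_generator_matrix(r, m):
    """Generator matrix of R(r, m): one row per one-summand Boolean
    polynomial 1, x_i, x_i x_j, ... of degree at most r."""
    rows = []
    for deg in range(r + 1):
        for idxs in combinations(range(m), deg):
            # word of the monomial: position k is 1 iff the binary
            # expansion of k has 1 in every position listed in idxs
            rows.append([int(all((k >> i) & 1 for i in idxs))
                         for k in range(2 ** m)])
    return rows

for row in rm_generator_matrix(1, 3):   # the matrix G displayed above
    print(row)
```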

(3) R(2, m) has basis 1, x_0, ..., x_{m-1}, x_0 x_1, x_0 x_2, ...,
    x_{m-2} x_{m-1}. Thus, this is a (2^m, C(m, 2) + m + 1)-code. For
    example, R(2, 3) is an (8, 7)-code. It is easy to see that this is
    just the even-parity code of length 8. R(2, 4) is a (16, 11)-code. We
    will see that R(2, 4) is the extended Hamming code.

(4) R(m-1, m) is the even-parity code of length 2^m. In fact, every code
    word of R(m-1, m) is a sum of polynomials x_{i_1} ... x_{i_s}, where
    s <= m - 1. Each of these polynomials has an even Hamming weight
    [see Example (2) of 9.2]. Thus, R(m-1, m) is a subspace of the
    even-parity code. Moreover, R(m-1, m) has dimension 2^m - 1, because
    it contains all of the basis polynomials in Theorem 9.2 except the
    last one, x_0 x_1 ... x_{m-1}. Since the even-parity code also has
    dimension 2^m - 1, the two codes are identical by Corollary (4)
    of 7.4.

Theorem. The Reed-Muller code R(r, m) has

    k = sum_{i=0}^{r} C(m, i)

information symbols, and its dual code is R(m-r-1, m).

PROOF. I. The space R(r, m) is spanned by all the polynomials
x_{i_1} ... x_{i_s}, where 0 <= s <= r, and for a given s, we have C(m, s)
such polynomials. By Theorem 9.2, all these polynomials are linearly
independent; thus, they form a basis of R(r, m). We conclude that R(r, m)
has dimension sum_{s=0}^{r} C(m, s).

II. The dimension of R(r, m)^⊥ is 2^m - sum_{i=0}^{r} C(m, i) (by
Theorem 7.7). Using Equation (9.2.1) and the well-known identity

    C(m, s) = C(m, m - s),

we see that the dimension of R(r, m)^⊥ can be rewritten as follows:

    dim R(r, m)^⊥ = sum_{s=0}^{m} C(m, s) - sum_{s=0}^{r} C(m, s)
                  = sum_{s=r+1}^{m} C(m, s)
                  = sum_{s=r+1}^{m} C(m, m - s)
                  = sum_{i=0}^{m-r-1} C(m, i).

We conclude that the linear spaces R(r, m)^⊥ and R(m-r-1, m) have the
same dimension. By Corollary (4) of 7.4, it is now sufficient to show that
R(m-r-1, m) is a subspace of the space R(r, m)^⊥.

Thus, it is our task to verify that each Boolean polynomial f of degree
p <= m - r - 1 is orthogonal to all Boolean polynomials g of degree q <= r
[and, thus, f lies in R(r, m)^⊥]. Since the scalar product f · g is the
sum of the coordinates of the logical product fg, we just have to show
that fg, represented as a binary word, has an even Hamming weight. The
degree of the polynomial fg is at most p + q <= m - r - 1 + r = m - 1
(see Exercise 9A). Thus, fg is a code word of R(m-1, m), which by
Example (4) above implies that fg has an even Hamming weight.  □
Example. (5) R(m-2, m) is the extended Hamming code of length 2^m
(see 8.5). In fact, the dual code is

    R(m-2, m)^⊥ = R(1, m)

and, thus, R(m-2, m) has the following parity check matrix:

               1     [ 1 1 1 1 ... 1 1 1 1 ]
              x_0    [ 0 1 0 1 ... 0 1 0 1 ]
        H =   x_1  = [ 0 0 1 1 ... 0 0 1 1 ]
              ...
            x_{m-1}  [ 0 0 0 0 ... 1 1 1 1 ]

We can add the first row to all the other rows, and then interchange it
with the last row. This yields another parity check matrix:

            [ 1 0 1 0 ... 1 0 1 0 ]
            [ 1 1 0 0 ... 1 1 0 0 ]
        H ~   ...
            [ 1 1 1 1 ... 0 0 0 0 ]
            [ 1 1 1 1 ... 1 1 1 1 ]

By deleting the last column and the last row from the new matrix, we
obtain an m by 2^m - 1 matrix H_0 with pairwise distinct, nonzero columns.
Thus, H_0 is a parity check matrix of the Hamming code. Consequently, our
matrix

        [           0 ]
        [    H_0    0 ]
        [           0 ]
        [ 1 1 1 ... 1 ]

is a parity check matrix of the extended Hamming code.

Remarks
(1) Encoding of the Reed-Muller codes can be performed in the usual way
    by multiplying the information word by the generator matrix [see
    Remark (4) of 8.1]. In other words, the information bits become the
    coefficients of the corresponding Boolean polynomial. For example,
    in R(1, m), we encode the m + 1 information bits as follows:

                                       [    1    ]
                                       [   x_0   ]
        [u_1, u_2, ..., u_{m+1}]  ·    [   x_1   ]
                                       [   ...   ]
                                       [ x_{m-1} ]

            = u_1 · 1 + u_2 x_0 + ... + u_{m+1} x_{m-1}.
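In R(1, m) this multiplication amounts to evaluating the affine polynomial u_1 + u_2 x_0 + ... + u_{m+1} x_{m-1} at every point. A self-contained sketch (Python; the function name is mine):

```python
def rm1_encode(u, m):
    """Encode m + 1 information bits u = (u_1, ..., u_{m+1}) with R(1, m):
    the code word is u_1 * 1 + u_2 * x_0 + ... + u_{m+1} * x_{m-1}
    (mod 2), evaluated position by position."""
    assert len(u) == m + 1
    word = []
    for k in range(2 ** m):
        bit = u[0]                             # coefficient of the constant 1
        for i in range(m):
            bit ^= u[i + 1] & ((k >> i) & 1)   # coefficient of x_i
        word.append(bit)
    return word

# Information bits (0, 1, 0, 0) select the polynomial x_0:
print(rm1_encode([0, 1, 0, 0], 3))
```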

(2) The minimum weight of the Reed-Muller code R(r, m) is

        d = 2^(m-r).

    In fact, the code word x_0 x_1 ... x_{r-1} has Hamming weight 2^(m-r)
    [see Example (2) of 9.2]; thus, d <= 2^(m-r). Since R(r, m) is a
    subspace of the even-parity code R(m-1, m), we know that d is even.
    In 9.6, we will see that R(r, m) can correct 2^(m-r-1) - 1 errors.
    Thus, by Proposition 4.6, d > 2(2^(m-r-1) - 1), and we conclude that

        2^(m-r) - 2 < d <= 2^(m-r).

    Since d is even, this proves d = 2^(m-r).
(3) In some respects, it is more suitable to work with the punctured
    Reed-Muller codes R*(r, m) (see 8.6). This means that in R(r, m), we
    delete the last symbol from each code word. The resulting code
    R*(r, m) has length 2^m - 1, and the number of information symbols
    can be shown to be equal to that in R(r, m). The minimum Hamming
    distance of R*(r, m) is 2^(m-r) - 1 and, thus, it can correct
    2^(m-r-1) - 1 errors, the same number as the original code can. We
    list all punctured Reed-Muller codes of lengths 7, ..., 127 in
    Appendix B.
    For example, R*(0, m) is the repetition code of length 2^m - 1 and
    R*(m-2, m) is the Hamming code of that length.
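The parameters in Figures 1 and 2 can be tabulated directly from these remarks. A sketch (Python; the function name is mine):

```python
from math import comb

def rm_parameters(r, m, punctured=False):
    """(length, information symbols, minimum distance) of R(r, m),
    or of the punctured code when punctured=True (Figures 1 and 2)."""
    n = 2 ** m
    k = sum(comb(m, i) for i in range(r + 1))
    d = 2 ** (m - r)
    return (n - 1, k, d - 1) if punctured else (n, k, d)

print(rm_parameters(1, 5))   # the Mariner code: length 32, 6 information bits
```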

9.4 Geometric Interpretation:
    Three-Dimensional Case

In order to explain the decoding of Reed-Muller codes, it is convenient
to introduce a new interpretation: instead of with Boolean functions, we
work with flats (or affine subspaces). We first present the case of the
three-dimensional geometry, which corresponds to the codes of length 8,
and then the general geometry of Z_2^m.
Recall that the conventional three-dimensional Euclidean geometry
operates within the linear space R^3, whose points (or vectors) are
triples a = a_1 a_2 a_3 of real numbers. We have lines in R^3, which can
be described as follows:

    a + t b    (a, b in R^3, b ≠ 0).

Here t denotes a real parameter; thus, the line a + t b is the set of all
points { a + t b | t ∈ R }. Further, we have planes in R^3:

    a + t b + s c    (a, b, c in R^3, b and c linearly independent).

Here, again, t and s are real parameters.


Now, the binary Euclidean three-dimensional geometry can be introduced
quite analogously. Its points are the vectors of the linear space Z_2^3.
In contrast to the real case above, there are precisely eight points. We
can enumerate them by the corresponding binary expansions of their
indices: p_0 = 000, p_1 = 001, etc.; see Figure 5.

    Point        Characteristic Function
    p_0 = 000         00000001
    p_1 = 001         00000010
    p_2 = 010         00000100
    p_3 = 011         00001000
    p_4 = 100         00010000
    p_5 = 101         00100000
    p_6 = 110         01000000
    p_7 = 111         10000000

Figure 5: Points of the binary three-dimensional Euclidean geometry

Lines of the Euclidean geometry can be described as follows:

    a + t b    (a, b in Z_2^3, b ≠ 0),

where t is a binary parameter, t = 0, 1. Thus, the line has precisely two
points: a and a + b. Conversely, every pair a, a' of points constitutes a
line, viz.,

    a + t(a' - a).

It follows that lines are just all two-point subsets:

    {p_0, p_1}, {p_0, p_2}, ..., {p_6, p_7}.

The number of lines is

    C(8, 2) = 28,

since a line is just a choice of an (unordered) pair from among the eight
points.

Finally, planes of the Euclidean geometry are described as

    a + t_1 b_1 + t_2 b_2    (a, b_1, b_2 in Z_2^3;
                              b_1, b_2 linearly independent),

where t_1 and t_2 are binary parameters. The plane, then, consists of the
following four points: a, a + b_1, a + b_2, and a + b_1 + b_2. All planes
are listed in Figure 6.

    Plane                  Characteristic Function   Boolean Polynomial
    {p_1, p_3, p_5, p_7}         10101010            x_0
    {p_2, p_3, p_6, p_7}         11001100            x_1
    {p_4, p_5, p_6, p_7}         11110000            x_2
    {p_0, p_2, p_4, p_6}         01010101            1 + x_0
    {p_0, p_1, p_4, p_5}         00110011            1 + x_1
    {p_0, p_1, p_2, p_3}         00001111            1 + x_2
    {p_1, p_2, p_5, p_6}         01100110            x_0 + x_1
    {p_1, p_3, p_4, p_6}         01011010            x_0 + x_2
    {p_2, p_3, p_4, p_5}         00111100            x_1 + x_2
    {p_1, p_2, p_4, p_7}         10010110            x_0 + x_1 + x_2
    {p_0, p_3, p_4, p_7}         10011001            1 + x_0 + x_1
    {p_0, p_2, p_5, p_7}         10100101            1 + x_0 + x_2
    {p_0, p_1, p_6, p_7}         11000011            1 + x_1 + x_2
    {p_0, p_3, p_5, p_6}         01101001            1 + x_0 + x_1 + x_2

Figure 6: Planes in the binary three-dimensional Euclidean geometry

Observe that the planes passing through the origin (a = 0) are precisely
those which can be described by a homogeneous equation

    h_0 x_0 + h_1 x_1 + h_2 x_2 = 0.

In fact, such a plane is precisely a two-dimensional subspace of Z_2^3;
see Remark (4) of 7.7. It follows that a general plane can always be
described by a (possibly nonhomogeneous) equation

    h_0 x_0 + h_1 x_1 + h_2 x_2 = c.

(Given a plane a + t_1 b_1 + t_2 b_2 and describing the parallel plane
t_1 b_1 + t_2 b_2 by the equation h_0 x_0 + h_1 x_1 + h_2 x_2 = 0, put
c = h_0 a_0 + h_1 a_1 + h_2 a_2, where a = a_0 a_1 a_2.)

Furthermore, every line can be described by a pair of nonhomogeneous
equations

    h_0 x_0 + h_1 x_1 + h_2 x_2 = c,
    h'_0 x_0 + h'_1 x_1 + h'_2 x_2 = c'.

This follows from the fact that every line a + t b is an intersection of
two planes: choose a basis b, d, d' of the space Z_2^3 and consider the
planes a + t b + s d and a + t b + s' d'.
Lines and planes are examples of flats (also called affine spaces). A
flat in the space Z_2^3 is a coset (6.2) of a linear subspace of the
space Z_2^3. That is, a flat has the form

    a + K = { a + b | b is a point of K },

where a is a point of Z_2^3, and K is a linear subspace. If the dimension
of K is s, we call the coset an s-flat. Thus, lines are precisely the
1-flats, and planes are the 2-flats. For each point p_i, we have a 0-flat
{ p_i }, and there is precisely one 3-flat, viz., the whole space Z_2^3.

Every flat L can be described by the binary word f_L = f_7 ... f_1 f_0
defined by

    f_i = 1 if the point p_i lies in L,
          0 otherwise.

The word f_L (or the corresponding Boolean function of three variables)
is called the characteristic function of the flat L. (For 1-flats and
2-flats, see the above figures.)

Given flats L and L', their intersection L ∩ L' is obviously
characterized by the logical product f_L f_L'. For example, the first two
planes in Figure 6 intersect in the line { p_3, p_7 }. The logical product
of their characteristic functions is

    x_0 x_1 = 10001000,

which is indeed the characteristic function of { p_3, p_7 }.



Remark. The Reed-Muller code R(1, 3) is spanned by the characteristic
functions of all planes. In fact, we see in Figure 6 that the
characteristic functions of planes are precisely all Boolean polynomials
of degree 1 in three variables.

The Reed-Muller code R(2, 3) is spanned by the characteristic functions
of all planes and all lines. In fact, every line L is an intersection of
two planes. Hence, the characteristic function f_L is a product of two
Boolean polynomials of degree 1, which is a Boolean polynomial of
degree 2, i.e., a code word of R(2, 3).

9.5 Geometric Interpretation: General Case

We now pass to the Euclidean geometry of the m-dimensional binary linear
space Z_2^m. The points, or vectors, of Z_2^m can, again, be ordered by
the binary expansions of the numbers 0, 1, ..., 2^m - 1. That is, we put
Z_2^m = { p_0, p_1, p_2, ..., p_{2^m - 1} }, where

    p_0 = 000...00,
    p_1 = 000...01,
    p_2 = 000...10,
    ...
    p_{2^m - 1} = 111...11.

Definition. Let K be an r-dimensional linear subspace of the space Z_2^m.
Every coset

    a + K = { a + b | b lies in K }

of K is called an r-flat in the binary m-dimensional Euclidean geometry.
An (m-1)-flat is also called a hyperplane.

Notation. Given a basis b_1, ..., b_r of the subspace K, the r-flat a + K
is also denoted by

    a + t_1 b_1 + ... + t_r b_r.

It has 2^r points (given by the 2^r choices t_i = 0, 1 for i = 1, ..., r).

Another notation of the r-flat a + K is by means of a nonhomogeneous
system of linear equations: whenever the space K is described by equations
H x^tr = 0 [Remark (4) of 7.7], then the r-flat a + K is described by the
following equations:

    H x^tr = c^tr,    where c^tr = H a^tr.

The number of these equations is m - r. For example, each hyperplane is
described by a single equation:

    h_0 x_0 + h_1 x_1 + ... + h_{m-1} x_{m-1} = c.

Examples
(1) Every 0-flat is a one-point set. Thus, there are 2^m different
    0-flats, viz., { p_0 }, ..., { p_{2^m - 1} }.

(2) Every 1-flat (or line) is a two-point set,

        a + t b = { a, a + b },

    and, conversely, every two-point set is a 1-flat. Therefore, there
    are C(2^m, 2) 1-flats.

(3) Let P_i denote the flat described by the equation x_i = 1. That is,
    P_i is the set of all points p_k which have a 1 in the ith position.
    For example,

        P_0 = { p_1, p_3, p_5, ..., p_{2^m - 1} }.

    Each P_i is a hyperplane. In fact, the point p_{2^i} has precisely
    one nonzero position (the ith one), and, thus,

        P_i = p_{2^i} + K,

    where K is the linear space determined by the equation x_i = 0. It is
    clear that the dimension of K is m - 1.
    The number of all hyperplanes is 2(2^m - 1). In fact, the space Z_2^m
    has precisely 2^m - 1 subspaces K of dimension m - 1 (see
    Exercise 7H). Each of these subspaces K has 2^(m-1) points and, thus,
    by Remark 6.2, there are 2^m / 2^(m-1) = 2 cosets modulo K.

(4) For i ≠ j, the intersection P_i ∩ P_j (i.e., the set of all points
    with a 1 in the ith and jth positions) is an (m-2)-flat. In fact, for
    the point a = p_{2^i + 2^j} (with 1's just in the ith and jth
    positions), we have

        P_i ∩ P_j = a + K,

    where K is determined by the equations x_i = x_j = 0. The dimension
    of K is m - 2.

Definition. By the characteristic function of an r-flat L is meant the
binary word f_L = f_{2^m - 1} ... f_1 f_0 defined by

    f_i = 1 if the point p_i lies in L,
          0 otherwise.

Remark. The characteristic function f_L can be interpreted as a Boolean
polynomial f_L(x_0, x_1, ..., x_{m-1}). It follows from 9.1 that L
consists of precisely those points p_i = x_0 x_1 ... x_{m-1} which satisfy
f_L(x_0, x_1, ..., x_{m-1}) = 1. Shortly:

    x_0 x_1 ... x_{m-1} lies in L  <=>  f_L(x_0, x_1, ..., x_{m-1}) = 1.

Examples
(5) The only m-flat, i.e., the space Z_2^m, has the characteristic
    function 1 = 111...11. For the hyperplane P_i above, f_{P_i} = x_i.

(6) Every hyperplane has a characteristic function which is a Boolean
    polynomial of degree 1: if the hyperplane L is described by the
    equation

        h_0 x_0 + h_1 x_1 + ... + h_{m-1} x_{m-1} = c,

    then

        f_L(x_0, ..., x_{m-1}) =
            h_0 x_0 + h_1 x_1 + ... + h_{m-1} x_{m-1} + c + 1.

    In fact, a point x_0 x_1 ... x_{m-1} lies in the hyperplane precisely
    when h_0 x_0 + ... + h_{m-1} x_{m-1} = c, i.e., when
    f_L(x_0, x_1, ..., x_{m-1}) = 1.

(7) For two flats L and L', the function f_L f_L' is the characteristic
    function of the intersection L ∩ L'.
    For example, the polynomial x_i x_j is the characteristic function of
    the (m-2)-flat which is the intersection of the hyperplanes P_i
    and P_j. More generally: the Boolean polynomial

        x_{i_1} x_{i_2} ... x_{i_s}

    is the characteristic function of an (m-s)-flat.

Theorem. The characteristic function of an r-flat is a Boolean polynomial
of degree m - r.

PROOF. In the notation above, we have described each r-flat L by m - r
equations H x^tr = c^tr, or

    sum_{j=0}^{m-1} h_{ij} x_j = c_i    for i = 1, 2, ..., m - r.

We can rewrite the equations as follows:

    sum_{j=0}^{m-1} (h_{ij} x_j) + c_i + 1 = 1    for i = 1, 2, ..., m - r.

Then the following Boolean polynomial of degree m - r,

    f(x_0, ..., x_{m-1}) =
        prod_{i=1}^{m-r} ( sum_{j=0}^{m-1} h_{ij} x_j + c_i + 1 ),

is the characteristic function of L: the equation f(x_0, ..., x_{m-1}) = 1
holds precisely when sum_{j=0}^{m-1} h_{ij} x_j + c_i + 1 = 1 for each
i = 1, ..., m - r.  □

Corollary. (1) Reed-Muller codes can be characterized geometrically as
follows: R(r, m) is spanned by all characteristic functions of flats of
dimension at least m - r in the binary m-dimensional geometry over Z_2.

(2) Every characteristic function of an (r+1)-flat lies in the dual code
of R(r, m).

In fact, R(r, m) contains all characteristic functions of s-flats,
s >= m - r, by the preceding theorem. That those functions span the
space R(r, m) follows from Example (7). Thus, (1) is true, and (2) follows
from Theorem 9.3.  □

9.6 Decoding Reed-Muller Codes

We now present an interesting and easily implementable decoding technique
for the code R(r, m). It is based on majority logic, and it can correct
2^(m-r-1) - 1 errors. In contrast to other decoding techniques, the
present method does not use syndromes; rather, it directly computes the
corrupted bits from the properties of the received word. The idea is as
follows. We receive a binary word w = w_{2^m - 1} ... w_1 w_0 of
length 2^m. Assuming that less than 2^(m-r-1) bits are corrupted, we want
to determine, for each i = 0, ..., 2^m - 1, whether or not w_i should be
corrected. This can be reformulated by asking whether the position of w
corresponding to the 0-flat { p_i } should be corrected.

Instead of answering this question directly, we take a broader point of
view: for each s-flat L, where s = 0, 1, ..., r + 1, we determine whether
the positions of the received word w corresponding to the points of L
(i.e., those bits w_i such that the point p_i lies in L) are corrupted or
not. Well, not exactly. We just distinguish between "even" and "odd"
s-flats: an s-flat L is called even if the number of all corrupted
positions w_i corresponding to points p_i of L is even; otherwise L is
odd. If we are able to determine the parity of each s-flat L, the decoding
is clear: we correct a bit w_i if and only if the 0-flat { p_i } has odd
parity. The trick is that we start with s = r + 1 and then proceed to the
lower dimensions.

Thus, the first step of decoding the word w is to determine, for each
(r+1)-flat L, whether L is odd or even. This is performed by computing
the scalar product of w with the characteristic function f_L:

    L is even  <=>  w · f_L = 0.

In fact: if w is a code word, then w · f_L = 0 by Corollary (2) in 9.5.
Now, if precisely two of the bits of w corresponding to points of L are
corrupted, the value w · f_L will not be changed. Analogously with 4, 6,
... bits. But if an odd number of bits of w corresponding to points of L
are corrupted, then the received word fulfils w · f_L = 1.

For s-flats L, where s <= r, we proceed by majority logic: suppose we
already know the parity of every (s+1)-flat containing L; then we say
that L is odd if a majority of these (s+1)-flats is odd, and L is even if
a majority of them is even. The reason why this procedure works is that
each s-flat is contained in a large number of (s+1)-flats:

Theorem. Every s-flat L in the binary m-dimensional geometry is contained
in exactly 2^(m-s) - 1 different (s+1)-flats. Furthermore, each point
outside of L lies in precisely one of these (s+1)-flats.

PROOF. I. We prove first that every s-dimensional linear subspace K
of Z_2^m is contained in precisely 2^(m-s) - 1 different subspaces of
dimension s + 1.

Every (s+1)-dimensional space K' containing K has the form K' = K + t b
for some point b outside of K, where

    K + t b = { a + t b | a ∈ K and t = 0, 1 }.

This follows immediately from the fact that every basis of K can be
extended (by a single vector) to a basis of K'; see Theorem 7.4.

Next observe that for two points b, b' outside of K, the linear subspaces
K + t b and K + t b' coincide if and only if b and b' lie in the same
coset modulo K (6.2). In fact, if K + t b = K + t b', then b can be
expressed as a + t b', where a lies in K; thus, t ≠ 0 and we get

    b - b' = a ∈ K.

By Proposition 6.2, b and b' lie in the same coset. Conversely, if
b - b' = a is a point of K, then b = a + b' lies in K + t b', and
b' = -a + b lies in K + t b; thus, K + t b = K + t b'. By Remark 6.2,
there are 2^m / 2^s cosets modulo K. One of them is K itself, and all
other cosets contain only points outside of K. Consequently, there are
2^(m-s) - 1 different spaces K + t b for b outside of K.

II. Every s-flat

    L = a + K    (dim K = s)

is contained in 2^(m-s) - 1 different (s+1)-flats a + K', where K' is an
(s+1)-dimensional space containing K (this follows from I). It remains to
prove that every (s+1)-flat a' + K' containing L has the mentioned form.
That is, we want to show that a + K' = a' + K'. In fact, since a lies
in a' + K', the difference a - a' is in K' and, hence, the points a
and a' lie in the same coset modulo K'.

III. Every point b outside of the s-flat L = a + K lies in an (s+1)-flat
containing L, viz., the flat a + K', where K' = K + t(b - a). In fact, b
lies in a + [K + t(b - a)] because by choosing t = 1 and 0 ∈ K, we have

    b = a + [0 + (b - a)].

To verify that K' has dimension s + 1, it is sufficient to show that
b - a lies outside of K: in fact, if b - a lies in K, then
a + (b - a) = b lies in L.

Finally, we prove that the (s+1)-flat is unique. Every (s+1)-flat
containing a + K has the form a + K', where K' is an (s+1)-dimensional
linear space containing K. If b lies in such a flat a + K', then b - a is
also a point of K'; thus, K' contains the linear space K + t(b - a). The
last two spaces both have dimension s + 1; hence, they coincide.  □

Corollary. If the number of errors in a received word is less
than 2^(m-r-1), then for each s-flat L, 0 <= s <= r, the majority of
(s+1)-flats containing L have the same parity of errors as L.

In fact, let t < 2^(m-r-1) bits of the received word w be corrupted. We
know that L is contained in

    2^(m-s) - 1 >= 2^(m-r) - 1 > 2t

(s+1)-flats L', and each such flat L' is determined by one point outside
of L. Let us begin with all points p_i outside of L such that w_i is a
corrupted bit. There are at most t corresponding (s+1)-flats L'. All the
remaining flats L' have the property that they contain no point p_i
outside of L such that w_i is incorrect. Thus, L' has the same error
parity as L. The number of the latter flats is at
least (2^(m-r) - 1) - t > t; thus, they form a majority.  □
9.6. DECODING REED-MULLER CODES 157

Decoding Algorithm for the Reed-Muller Code R(r, m)

First step: Receiving a word w, call each (r + 1)-flat L odd if the scalar
product of its characteristic function f_L with w is 1, otherwise call L even.
That is, for (r + 1)-flats L:

L is  odd   if w · f_L = 1,
      even  if w · f_L = 0.

Recursive steps: For all s = r, r − 1, . . . , 0 such that each (s + 1)-flat
has already been called odd or even, call an s-flat L odd if a majority of the
(s + 1)-flats containing L is odd, otherwise call L even.

Last step: Correct the ith bit of w if and only if the 0-flat {p_i} has been
called odd.

Example. Working with R(1, 3), we have received

11101010.

The first step is to decide which planes (see Figure 6 in 9.4) are odd and
which are even. For example, the plane L = {p1, p3, p5, p7} is even since
w · f_L = 11101010 · 10101010 = 0. See Figure 7. Next we must decide,

Plane               Parity    Plane               Parity
{p1, p3, p5, p7}    even      {p1, p3, p4, p6}    odd
{p2, p3, p6, p7}    odd       {p2, p3, p4, p5}    even
{p4, p5, p6, p7}    odd       {p0, p3, p4, p7}    even
{p0, p2, p4, p6}    odd       {p0, p2, p5, p7}    even
{p0, p1, p4, p5}    even      {p0, p1, p6, p7}    odd
{p0, p1, p2, p3}    even      {p1, p2, p4, p7}    even
{p1, p2, p5, p6}    odd       {p0, p3, p5, p6}    odd

Figure 7: First step of decoding 11101010

for each line L, whether L is odd or even. For example, the line {p0, p1}
is contained in three planes (see Figure 6 of 9.4):

{p0, p1, p4, p5} - even,
{p0, p1, p2, p3} - even,
{p0, p1, p6, p7} - odd.

Line      Parity     Line      Parity
{p0, p1}  even       {p2, p4}  even
{p0, p2}  even       {p2, p5}  even
{p0, p3}  even       {p2, p6}  odd
{p0, p4}  even       {p2, p7}  even
{p0, p5}  even       {p3, p4}  even
{p0, p6}  odd        {p3, p5}  even
{p0, p7}  even       {p3, p6}  odd
{p1, p2}  even       {p3, p7}  even
{p1, p3}  even       {p4, p5}  even
{p1, p4}  even       {p4, p6}  odd
{p1, p5}  even       {p4, p7}  even
{p1, p6}  odd        {p5, p6}  odd
{p1, p7}  even       {p5, p7}  even
{p2, p3}  even       {p6, p7}  odd

Figure 8: Second step of decoding 11101010

By majority vote, the line {p0, p1} is even. We must go through all the
lines and perform such a majority vote. The result is seen in Figure 8.
Finally, we are prepared to correct the individual bits. The point p0 is
contained in seven lines, one odd and six even; thus w0 will not be corrected.
Also p1 is contained in one odd and six even lines, etc. The only bit to
correct is w6 because p6 is contained in seven odd lines. The word sent
is

10101010

[which is the polynomial 1 + x1, a code word of R(1, 3)].
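The whole computation can be replayed mechanically. Below is a minimal Python sketch (an illustration added to this text, not the book's own program; function names are mine). It takes the bit at string position i as the value at point p_i, which differs from the labelling in the figures, but the decoded string comes out the same:

```python
from itertools import combinations

def flats_r13():
    """Flats of the affine geometry behind R(1, 3): points are the vectors
    of F_2^3 encoded as the integers 0..7 (addition = XOR)."""
    points = list(range(8))
    # A 4-subset of F_2^3 is a plane (2-flat) iff the XOR of its points is 0.
    planes = [frozenset(s) for s in combinations(points, 4)
              if s[0] ^ s[1] ^ s[2] ^ s[3] == 0]
    lines = [frozenset(s) for s in combinations(points, 2)]  # every pair is a 1-flat
    return points, lines, planes

def majority_decode_r13(word):
    """Majority-logic decoding of an 8-bit word for R(1, 3)."""
    w = [int(b) for b in word]
    points, lines, planes = flats_r13()
    # First step: a plane is odd iff the XOR of the received bits on it is 1.
    plane_parity = {P: sum(w[i] for i in P) % 2 for P in planes}
    # Recursive step: a line is odd iff a majority of the 3 planes containing it are.
    line_parity = {}
    for L in lines:
        votes = [plane_parity[P] for P in planes if L <= P]
        line_parity[L] = int(2 * sum(votes) > len(votes))
    # Last step: flip bit i iff a majority of the 7 lines through p_i are odd.
    for i in points:
        votes = [line_parity[L] for L in lines if i in L]
        if 2 * sum(votes) > len(votes):
            w[i] ^= 1
    return ''.join(map(str, w))
```

Decoding the received word 11101010 indeed returns 10101010, with 14 planes and 28 lines taking part in the votes, as in Figures 7 and 8.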

Concluding Remark. We have presented a method by which the Reed-
Muller code R(r, m) can correct 2^{m−r−1} − 1 errors. As explained in Re-
mark (1) of 9.3, this proves that the minimum distance is 2^{m−r}.

For example, the code R(1, 5) is a (32, 6)-code [see Example (1) in 9.3],
which corrects 2^{5−1−1} − 1 = 7 errors, as mentioned in the introduction.
246 CHAPTER 12. BCH CODES

n = 2^m − 1, we derive a binary code K* of length nm by substituting for
each symbol the corresponding binary m-tuple.

For example, the generator polynomial of the Reed-Solomon code in
Example (2) is the word α³ α 1 α³ 1 0 0 of length 7 over GF(8). Since α³ =
α + 1 corresponds to 110, α corresponds to 010, etc., the derived binary
code of length 21 contains the following word:

110 010 100 110 100 000 000.

From the (7,2)-Reed-Solomon code over GF(8), we derive a binary (21,6)-
code.

The importance of these derived codes lies in their capability of correct-
ing burst errors of length (t − 1)m + 1. In fact, such a burst in the derived
binary word can corrupt at most t (adjacent) symbols of the original word
over GF(2^m); see Figure 2.
m

[Figure 2 shows an original word over GF(2^m) above its derived binary word, with a burst of consecutive bit errors falling into at most t adjacent symbols.]

Figure 2: A burst error on a word derived from a Reed-Solomon code

For example, the above double-error-correcting Reed-Solomon code over


GF(8) yields a binary (21,6)-code which corrects burst errors of length 3 +
1 = 4 : any such burst affects at most two adjacent symbols of the original
word.
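The burst-correcting claim can be sanity-checked by sliding a window over the bit positions: a burst of (t − 1)m + 1 consecutive bits never touches more than t of the m-bit symbol slots. A small Python sketch (illustrative only; names are mine):

```python
def symbols_hit(m, burst_len, start):
    """Indices of the m-bit symbols touched by a burst of burst_len
    consecutive bit errors starting at bit position start."""
    return {bit // m for bit in range(start, start + burst_len)}

def max_symbols_hit(n, m, burst_len):
    """Worst case over all placements of the burst inside a derived
    binary word of n symbols (n*m bits)."""
    return max(len(symbols_hit(m, burst_len, s))
               for s in range(n * m - burst_len + 1))
```

For the (7,2)-Reed-Solomon code over GF(8) (n = 7, m = 3, t = 2), a burst of length (2 − 1)·3 + 1 = 4 hits at most two symbols, while a burst of length 5 can already hit three.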

12.5 Generalized Reed-Muller Codes


Recall that puncturing (8.5) is a modification of a code consisting in delet-
ing the last symbol from each code word. We now return to the Reed-
Muller codes R(r, m) of length n = 2^m studied in Chapter 9. Since the
code R(r, m) has even minimum distance d = 2^{m−r} (Remark 9.6), the
punctured code R(r, m)* has the same error-correcting capacity: it corrects
2^{m−r−1} − 1 errors. The punctured code has a better information rate and,
moreover, it is cyclic.

Notation. For each natural number s, we denote by w_q(s) the weight of
the q-ary expansion of s, i.e., the sum of its q-ary digits. Examples:
w_2(1) = w_2(2) = 1 and w_2(3) = 2.

Theorem. The punctured Reed-Muller code R(r, m)* is a binary cyclic code
consisting of all binary polynomials of degree less than 2^m − 1 which have
β^s as zeros, where β is a fixed primitive element of GF(2^m), and s = 1, 2,
. . . , 2^m − 1 is any number with w_2(s) < m − r. In notation:

w(x) lies in R(r, m)*  ⟺  w(β^i) = 0 for all i < 2^m with 0 < w_2(i) < m − r.

The proof, which is quite technical, is omitted. The reader can find it
in the monograph of MacWilliams and Sloane (1981).

Definition. Let β be a primitive element of GF(q^m). By a generalized
Reed-Muller code of length n = q^m − 1 and order r over the alphabet GF(q)
is meant the cyclic code of all polynomials with β^i as zeros for all i = 1, 2,
. . . , n such that w_q(i) < m − r.

Remark. The case q = 2 leads to the punctured Reed-Muller code R(r, m)*
by the preceding theorem. Thus, Reed-Muller codes form a subclass of the
extended generalized Reed-Muller codes: R(r, m) is obtained from R(r, m)*
by extension (8.5).

Examples
(1) Take q = 2, m = 4, and r = 2. All the numbers i = 1, 2, . . . , 2^4 − 1
    with w_2(i) < 2 are 1, 2, 4, and 8. Thus, we obtain a cyclic binary code
    of length 15 with zeros at β, β^2, β^4, and β^8. Equivalently, with a zero
    at β. This is the Hamming (15, 11)-code.

(2) For q = 2, m = 4, and r = 1, we consider all the numbers i = 1, 2,
    . . . , 2^4 − 1 with w_2(i) < 3: besides the four numbers above, these are
    3, 5, 6, 9, 10, and 12. The code is determined by the zeros β, β^3,
    and β^5. This is the BCH (15, 5)-code.

(3) Take q = 3, m = 3, and r = 1. All the numbers i = 1, 2, . . . , 3^3 − 1
    with w_3(i) < 2 are 1, 3, and 9. The resulting ternary code of length 26
    is determined by a single zero β.
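These zero sets are easy to recompute. The Python sketch below (an added illustration; it takes w_q to be the sum of the q-ary digits, which is what the ternary example requires) lists all exponents i with w_q(i) < m − r:

```python
def grm_zero_exponents(q, m, r):
    """Exponents i, 1 <= i <= q^m - 1, with w_q(i) < m - r; the generalized
    Reed-Muller code of order r has beta^i as a zero exactly for these i."""
    def w(s):  # sum of the digits of the q-ary expansion of s
        total = 0
        while s:
            total += s % q
            s //= q
        return total
    return [i for i in range(1, q ** m) if w(i) < m - r]
```

This reproduces {1, 2, 4, 8} for Example (1), adds {3, 5, 6, 9, 10, 12} for Example (2), and gives {1, 3, 9} for the ternary Example (3).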

Proposition. The binary generalized Reed-Muller code has minimum dis-
tance at least 2^{m−r} − 1.
PROOF. It is sufficient to prove that the generalized Reed-Muller code is a
subcode of the t-error-correcting BCH code, where t = 2^{m−r−1} − 1. Since
the latter code has minimum distance d ≥ 2t + 1 (Theorem 12.3), it follows
that the former code has this property too. Observe that the first number
i with w_2(i) ≥ m − r is i = 2^{m−r} − 1. Thus, each i = 1, 2, . . . , 2^{m−r} −
2 satisfies w_2(i) < m − r. It follows that every code word w(x) of the
generalized Reed-Muller code has β^i as zeros for i = 1, 2, . . . , 2^{m−r} − 2 = 2t.
Therefore, w(x) is a code word of the t-error-correcting BCH code. □

12.6 Goppa Codes: Asymptotically Good


Codes
We now introduce a class of noncyclic binary linear codes, discovered by
Goppa in 1970, which generalize binary BCH codes and contain a number
of interesting codes (e.g., a double-error-correcting code of length 8). Later
we will also use these codes for a construction of secret codes. Parameters
of the important subclass of irreducible binary Goppa codes are listed in
Figure 3.

Specified:                 By an irreducible polynomial of degree t over GF(2^m)
Length:                    n = 2^m
Information symbols:       k ≥ n − mt
Minimum distance:          d ≥ 2t + 1
Error-control capability:  Corrects t errors by a technique
                           analogous to that for BCH codes

Figure 3: Parameters of irreducible binary Goppa codes

Notation. Let r(x) be a fixed polynomial over a field F. Given an element a
with r(a) ≠ 0, we denote by 1/(x − a) the following polynomial over F:

1/(x − a) = − [(r(x) − r(a)) / (x − a)] · r(a)^{−1}.

(The right-hand side is well-defined: since the polynomial r(x) − r(a) has
a as a zero, it is divisible by x − a; see 11.1.)
Foundations of Coding: Theory and Applications of Error-Correcting Codes
with an Introduction to Cryptography and Information Theory,
by Jiří Adámek.
Copyright © 1991 John Wiley & Sons, Inc.

Appendix B

BCH Codes and Reed-Muller Codes

We list parameters of all BCH codes (Chapter 12) and all punctured Reed-
Muller codes R(r, m)* (see 9.3) of lengths n = 7, 15, 31, 63, 127. The
number of information symbols is denoted by k.

Length   Minimum distance   BCH k   R(r, m)* k   r
   7            3              4         4       1
   7            7              1         1       0
  15            3             11        11       2
  15            5              7         -       -
  15            7              5         5       1
  15           15              1         1       0
  31            3             26        26       3
  31            5             21         -       -
  31            7             16        16       2
  31           11             11         -       -
  31           15              6         6       1
  31           31              1         1       0

Length   Minimum distance   BCH k   R(r, m)* k   r
  63            3             57        57       4
  63            5             51         -       -
  63            7             45        42       3
  63            9             39         -       -
  63           11             36         -       -
  63           13             30         -       -
  63           15             24        22       2
  63           21             18         -       -
  63           23             16         -       -
  63           27             10         -       -
  63           31              7         7       1
  63           63              1         1       0
 127            3            120       120       5
 127            5            113         -       -
 127            7            106        99       4
 127            9             99         -       -
 127           11             92         -       -
 127           13             85         -       -
 127           15             78        64       3
 127           19             71         -       -
 127           21             64         -       -
 127           23             57         -       -
 127           27             50         -       -
 127           29             43         -       -
 127           31             36        29       2
 127           43             29         -       -
 127           47             22         -       -
 127           55             15         -       -
 127           63              8         8       1
 127          127              1         1       0
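The R(r, m)* columns of these tables follow from two facts stated earlier: puncturing keeps the dimension, which for R(r, m) is C(m, 0) + · · · + C(m, r), and lowers the minimum distance 2^{m−r} by one. A Python sketch reproducing the entries (illustrative; the function name is mine):

```python
from math import comb

def punctured_rm(r, m):
    """(length, dimension, minimum distance) of the punctured Reed-Muller
    code R(r, m)*: n = 2^m - 1, k = sum of C(m, i) for i <= r,
    d = 2^(m - r) - 1."""
    n = 2 ** m - 1
    k = sum(comb(m, i) for i in range(r + 1))
    d = 2 ** (m - r) - 1
    return n, k, d
```

For example, punctured_rm(2, 4) gives the (15, 11, 3) entry and punctured_rm(3, 6) the (63, 42, 7) entry of the tables.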

 
    [ 0  1  1  1  1  1 ]
    [ 1  0  1 −1 −1  1 ]
A = [ 1  1  0  1 −1 −1 ] .
    [ 1 −1  1  0  1 −1 ]
    [ 1 −1 −1  1  0  1 ]
    [ 1  1 −1 −1  1  0 ]

In a fashion analogous to that of Section 1.9.1, we can show that G12 is a [12, 6, 6]
self-dual code. The code G11 is an [11, 6, 5] code obtained from G12 by puncturing. Again,
equivalent codes are obtained regardless of the coordinate. However, adding an overall
parity check to G11 in the same coordinate may not give the same G12 back; it will give
either a [12, 6, 6] code or a [12, 6, 5] code depending upon the coordinate; see Exercise 61.

Exercise 60 Prove that G12 is a [12, 6, 6] self-dual ternary code. 

Exercise 61 Number the columns of the matrix A used to generate G12 by ∞, 0, 1, 2, 3,
4. Let G̃12 be obtained from G12 by scaling column ∞ by −1.
(a) Show how to give the entries in row two of A using squares and non-squares of integers
modulo 5.
(b) Why is G̃12 a [12, 6, 6] self-dual code? Hint: Use Exercise 60.
(c) Show that puncturing G̃12 in any coordinate and adding back an overall parity check in
that same position gives the same code G̃12.
(d) Show that if G12 is punctured in coordinate ∞ and this code is then extended in the
same position, the resulting code is G12.
(e) Show that if G12 is punctured in any coordinate other than ∞ and this code is then
extended in the same position, the resulting code is a [12, 6, 5] code. 

In Exercise 61, we scaled the first column of A by −1 to obtain a [12, 6, 6] self-dual code
G̃12 equivalent to G12. By that exercise, if we puncture G̃12 in any coordinate and then extend
in the same coordinate, we get G̃12 back. In Chapter 10 we will see that these punctured
codes are all equivalent to each other and to G11. As a result any [11, 6, 5] code equivalent
to one obtained by puncturing G̃12 in any coordinate will be called the ternary Golay code;
any [12, 6, 6] code equivalent to G12 (or G̃12) will be called the extended ternary Golay
code.

1.10 Reed–Muller codes

In this section, we introduce the binary Reed–Muller codes. Nonbinary generalized Reed–
Muller codes will be examined in Section 13.2.3. The binary codes were first constructed
and explored by Muller [241] in 1954, and a majority logic decoding algorithm for them was
described by Reed [293] also in 1954. Although their minimum distance is relatively small,
they are of practical importance because of the ease with which they can be implemented
and decoded. They are of mathematical interest because of their connection with finite affine

and projective geometries; see [4, 5]. These codes can be defined in several different ways.
Here we choose a recursive definition based on the (u | u + v) construction.
Let m be a positive integer and r a nonnegative integer with r ≤ m. The binary codes
we construct will have length 2^m. For each length there will be m + 1 linear codes, denoted
R(r, m) and called the rth order Reed–Muller, or RM, code of length 2^m. The codes R(0, m)
and R(m, m) are trivial codes: the 0th order RM code R(0, m) is the binary repetition code
of length 2^m with basis {1}, and the mth order RM code R(m, m) is the entire space F_2^{2^m}.
For 1 ≤ r < m, define

R(r, m) = {(u, u + v) | u ∈ R(r, m − 1), v ∈ R(r − 1, m − 1)}.   (1.7)
Let G(0, m) = [11 · · · 1] and G(m, m) = I_{2^m}. From the above description, these are
generator matrices for R(0, m) and R(m, m), respectively. For 1 ≤ r < m, using (1.5),
a generator matrix G(r, m) for R(r, m) is

G(r, m) = [ G(r, m − 1)   G(r, m − 1)     ]
          [      O        G(r − 1, m − 1) ] .
We illustrate this construction by producing the generator matrices for R(r, m) with
1 ≤ r < m ≤ 3:

          [ 1 0 1 0 ]              [ 1 0 1 0 1 0 1 0 ]
G(1, 2) = [ 0 1 0 1 ] ,  G(1, 3) = [ 0 1 0 1 0 1 0 1 ] ,  and
          [ 0 0 1 1 ]              [ 0 0 1 1 0 0 1 1 ]
                                   [ 0 0 0 0 1 1 1 1 ]

          [ 1 0 0 0 1 0 0 0 ]
          [ 0 1 0 0 0 1 0 0 ]
          [ 0 0 1 0 0 0 1 0 ]
G(2, 3) = [ 0 0 0 1 0 0 0 1 ] .
          [ 0 0 0 0 1 0 1 0 ]
          [ 0 0 0 0 0 1 0 1 ]
          [ 0 0 0 0 0 0 1 1 ]

From these matrices, notice that R(1, 2) and R(2, 3) are both the set of all even weight
vectors in F_2^4 and F_2^8, respectively. Notice also that R(1, 3) is an [8, 4, 4] self-dual code,
which must be Ĥ_3 by Exercise 56.
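The recursion behind these matrices translates directly into code. The following Python sketch (an added illustration, not from the book; function names are mine) builds G(r, m) from definition (1.7) and brute-forces the minimum weight for small parameters:

```python
def rm_generator(r, m):
    """Generator matrix of R(r, m), rows as bit lists, via the (u | u + v)
    construction: R(0, m) is the repetition code, R(m, m) the full space."""
    if r == 0:
        return [[1] * 2 ** m]
    if r == m:
        return [[int(i == j) for j in range(2 ** m)] for i in range(2 ** m)]
    top = [row + row for row in rm_generator(r, m - 1)]        # rows (u, u)
    bottom = [[0] * 2 ** (m - 1) + row                         # rows (0, v)
              for row in rm_generator(r - 1, m - 1)]
    return top + bottom

def min_weight(G):
    """Minimum weight of the binary code spanned by G (brute force)."""
    k, n = len(G), len(G[0])
    best = n
    for mask in range(1, 2 ** k):
        word = [0] * n
        for i in range(k):
            if mask >> i & 1:
                word = [a ^ b for a, b in zip(word, G[i])]
        best = min(best, sum(word))
    return best
```

rm_generator(1, 3) reproduces the matrix G(1, 3) above, and min_weight confirms that R(1, 3) is an [8, 4, 4] code.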
The dimension, minimum weight, and duals of the binary Reed–Muller codes can be
computed directly from their definitions.
Theorem 1.10.1 Let r be an integer with 0 ≤ r ≤ m. Then the following hold:
(i) R(i, m) ⊆ R( j, m), if 0 ≤ i ≤ j ≤ m.
(ii) The dimension of R(r, m) equals
     C(m, 0) + C(m, 1) + · · · + C(m, r),
     where C(m, i) denotes the binomial coefficient "m choose i".
(iii) The minimum weight of R(r, m) equals 2^{m−r}.
(iv) R(m, m)⊥ = {0}, and if 0 ≤ r < m, then R(r, m)⊥ = R(m − r − 1, m).

Proof: Part (i) is certainly true if m = 1 by direct computation and if j = m as R(m, m)
is the full space F_2^{2^m}. Assume inductively that R(k, m − 1) ⊆ R(ℓ, m − 1) for all 0 ≤ k ≤
ℓ < m. Let 0 < i ≤ j < m. Then:

R(i, m) = {(u, u + v) | u ∈ R(i, m − 1),v ∈ R(i − 1, m − 1)}


⊆ {(u, u + v) | u ∈ R( j, m − 1),v ∈ R( j − 1, m − 1)}
= R( j, m).

So (i) follows by induction if 0 < i. If i = 0, we only need to show that the all-one vector
of length 2m is in R( j, m) for j < m. Inductively assume the all-one vector of length 2m−1
is in R( j, m − 1). Then by definition (1.7), we see that the all-one vector of length 2m is in
R( j, m) as one choice for u is 1 and one choice for v is 0.
For (ii) the result is true for r = m as R(m, m) = F_2^{2^m} and
C(m, 0) + C(m, 1) + · · · + C(m, m) = 2^m.
It is also true for m = 1 by inspection. Now assume that R(i, m − 1) has dimension
C(m − 1, 0) + C(m − 1, 1) + · · · + C(m − 1, i)   for all 0 ≤ i < m.
By the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has dimension the sum of the
dimensions of R(r, m − 1) and R(r − 1, m − 1), that is,
[C(m − 1, 0) + · · · + C(m − 1, r)] + [C(m − 1, 0) + · · · + C(m − 1, r − 1)].
The result follows by the elementary properties of binomial coefficients:
C(m − 1, 0) = C(m, 0)   and   C(m − 1, i − 1) + C(m − 1, i) = C(m, i).
Part (iii) is again valid for m = 1 by inspection and for both r = 0 and r = m as R(0, m)
is the binary repetition code of length 2^m and R(m, m) = F_2^{2^m}. Assume that R(i, m − 1)
has minimum weight 2^{m−1−i} for all 0 ≤ i < m. If 0 < r < m, then by definition (1.7) and
the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has minimum weight min{2 ·
2^{m−1−r}, 2^{m−1−(r−1)}} = 2^{m−r}.
To prove (iv), we first note that R(m, m)⊥ is {0} since R(m, m) = F_2^{2^m}. So if we define
R(−1, m) = {0}, then R(−1, m)⊥ = R(m − (−1) − 1, m) for all m > 0. By direct compu-
tation, R(r, m)⊥ = R(m − r − 1, m) for all r with −1 ≤ r ≤ m when m = 1. Assume inductively
that if −1 ≤ i ≤ m − 1, then R(i, m − 1)⊥ = R((m − 1) − i − 1, m − 1). Let 0 ≤ r <
m. To prove R(r, m)⊥ = R(m − r − 1, m), it suffices to show that R(m − r − 1, m) ⊆
R(r, m)⊥ as dim R(r, m) + dim R(m − r − 1, m) = 2m by (ii). Notice that with the defini-
tion of R(−1, m), (1.7) extends to the case r = 0. Let x = (a, a + b) ∈ R(m − r − 1, m)
where a ∈ R(m − r − 1, m − 1) and b ∈ R(m − r − 2, m − 1), and let y = (u, u + v) ∈
R(r, m) where u ∈ R(r, m − 1) and v ∈ R(r − 1, m − 1). Then x · y = 2a · u + a · v + b ·
u + b · v = a · v + b · u + b · v. Each term is 0 as follows. As a ∈ R(m − r − 1, m − 1) =
R(r − 1, m − 1)⊥ , a · v = 0. As b ∈ R(m − r − 2, m − 1) = R(r, m − 1)⊥ , b · u = 0 and
36 Basic concepts of linear codes

b · v = 0 using R(r − 1, m − 1) ⊆ R(r, m − 1) from (i). We conclude that R(m − r −


1, m) ⊆ R(r, m)⊥ , completing (iv). 

We make a few observations based on this theorem. First, since R(0, m) is the length
2^m repetition code, R(m − 1, m) = R(0, m)⊥ is the code of all even weight vectors in
F_2^{2^m}. We had previously observed this about R(1, 2) and R(2, 3). Second, if m is odd and
r = (m − 1)/2, we see from parts (iii) and (iv) that R(r, m) = R((m − 1)/2, m) is self-dual
with minimum weight 2^{(m+1)/2}. Again we had observed this about R(1, 3). In the exercises,
you will also verify the general result that puncturing R(1, m) and then taking the subcode
of even weight vectors produces the simplex code S_m of length 2^m − 1.

Exercise 62 In this exercise we produce another generator matrix G′(1, m) for R(1, m).
Define

G′(1, 1) = [ 1 1 ]
           [ 0 1 ] .

For m ≥ 2, recursively define

G′(1, m) = [ G′(1, m − 1)   G′(1, m − 1) ]
           [  00 · · · 0      11 · · · 1  ] ,

and define G′′(1, m) to be the matrix obtained from G′(1, m) by removing the bottom row
and placing it as row two in the matrix, moving the rows below down.
(a) Show that G′(1, 1) is a generator matrix for R(1, 1).
(b) Find the matrices G′(1, 2), G′′(1, 2), G′(1, 3), and G′′(1, 3).
(c) What do you notice about the columns of the matrices obtained from G′′(1, 2) and
G′′(1, 3) by deleting the first row and the first column?
(d) Show using induction, part (a), and the definition (1.7) that G′(1, m) is a generator
matrix for R(1, m).
(e) Formulate a generalization of part (c) that applies to the matrix obtained from G′′(1, m)
by deleting the first row and the first column. Prove your generalization is correct.
(f) Show that the code generated by the matrix obtained from G′′(1, m) by deleting the
first row and the first column is the simplex code S_m.
(g) Show that the code R(m − 2, m) is the extended binary Hamming code Ĥ_m.
Notice that this problem shows that the extended binary Hamming codes and their duals
are Reed–Muller codes. 

1.11 Encoding, decoding, and Shannon’s Theorem

Since the inception of coding theory, codes have been used in many diverse ways; in
addition to providing reliability in communication channels and computers, they give high
fidelity on compact disc recordings, and they have also permitted successful transmission
of pictures from outer space. New uses constantly appear. As a primary application of
codes is to store or transmit data, we introduce the process of encoding and decoding a
message.

1.11.1 Encoding
Let C be an [n, k] linear code over the field F_q with generator matrix G. This code has q^k
codewords which will be in one-to-one correspondence with q^k messages. The simplest way
to view these messages is as k-tuples x in F_q^k. The most common way to encode the message
x is as the codeword c = xG. If G is in standard form, the first k coordinates of the codeword
c are the information symbols x; the remaining n − k symbols are the parity check symbols,
that is, the redundancy added to x in order to help recover x if errors occur. The generator
matrix G may not be in standard form. If, however, there exist column indices i 1 , i 2 , . . . , i k
such that the k × k matrix consisting of these k columns of G is the k × k identity matrix,
then the message is found in the k coordinates i 1 , i 2 , . . . , i k of the codeword scrambled but
otherwise unchanged; that is, the message symbol x j is in component i j of the codeword.
If this occurs, we say that the encoder is systematic. If G is replaced by another generator
matrix, the encoding of x will, of course, be different. By row reduction, one could always
choose a generator matrix so that the encoder is systematic. Furthermore, if we are willing
to replace the code with a permutation equivalent one, by Theorem 1.6.2, we can choose a
code with generator matrix in standard form, and therefore the first k bits of the codeword
make up the message.
The method just described shows how to encode a message x using the generator matrix of
the code C. There is a second way to encode using the parity check matrix H. This is easiest to
do when G is in standard form [I_k | A]. In this case H = [−A^T | I_{n−k}] by Theorem 1.2.1.
Suppose that x = x_1 · · · x_k is to be encoded as the codeword c = c_1 · · · c_n. As G is in
standard form, c_1 · · · c_k = x_1 · · · x_k. So we need to determine the n − k parity check symbols
(redundancy symbols) c_{k+1} · · · c_n. As 0 = H c^T = [−A^T | I_{n−k}] c^T, we get A^T x^T = [c_{k+1} · · · c_n]^T.
One can generalize this when G is a systematic encoder.

Example 1.11.1 Let C be the [6, 3, 3] binary code with generator and parity check matrices
   
1 0 0 1 0 1 1 1 0 1 0 0

G= 0 1 0 1 1 0 and 
H= 0 1 1 0 1 0 ,
0 0 1 0 1 1 1 0 1 0 0 1

respectively. Suppose we desire to encode the message x = x1 x2 x3 to obtain the codeword


c = c1 c2 · · · c6 . Using G to encode yields

c = xG = (x1 , x2 , x3 , x1 + x2 , x2 + x3 , x1 + x3 ). (1.8)

Using H to encode, 0 = H cT leads to the system

0 = c1 + c2 + c4 ,
0 = c2 + c3 + c5 ,
0 = c1 + c3 + c6 .

As G is in standard form, c1 c2 c3 = x1 x2 x3 , and solving this system clearly gives the same
codeword as in (1.8). 
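Both encodings in this example are easy to mechanize. A minimal Python sketch (illustrative; matrices as lists of rows over F_2, function names mine):

```python
G = [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 1]]

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

def encode(x, G):
    """c = xG over F_2."""
    return [sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G)]

def parity_checks(c, H):
    """H c^T over F_2; all zeros exactly when c is a codeword."""
    return [sum(h * ci for h, ci in zip(row, c)) % 2 for row in H]
```

Every encoded message passes all three parity checks, and the last three coordinates agree with the parity symbols of (1.8).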

Exercise 63 Let C be the Hamming code H3 of Example 1.2.3, with parity check matrix
 
0 1 1 1 1 0 0
H = 1 0 1 1 0 1 0 .
1 1 0 1 0 0 1

(a) Construct the generator matrix for C and use it to encode the message 0110.
(b) Use your generator matrix to encode x1 x2 x3 x4 .
(c) Use H to encode the messages 0110 and x1 x2 x3 x4 . 

Since there is a one-to-one correspondence between messages and codewords, one often
works only with the encoded messages (the codewords) at both the sending and receiving
end. In that case, at the decoding end in Figure 1.1, we are satisfied with an estimate ĉ
obtained by the decoder from y, hoping that this is the codeword c that was transmitted.
However, if we are interested in the actual message, a question arises as to how to recover
the message from a codeword. If the codeword c = xG, and G is in standard form, the
message is the first k components of c; if the encoding is systematic, it is easy to recover
the message by looking at the coordinates of G containing the identity matrix. What can
be done otherwise? Because G has independent rows, there is an n × k matrix K such
that G K = Ik ; K is called a right inverse for G and is not necessarily unique. As c = xG,
cK = xG K = x.
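For G in standard form, a right inverse is explicit: stacking I_k on top of an (n − k) × k zero block gives K with GK = I_k, so cK recovers the message. A Python sketch (illustrative; names are mine):

```python
def right_inverse(G):
    """Right inverse K (n x k) of a generator matrix in standard form
    G = [I_k | A]: K = [I_k stacked on a zero block], so that G K = I_k."""
    k, n = len(G), len(G[0])
    K = [[int(i == j) for j in range(k)] for i in range(k)]
    K += [[0] * k for _ in range(n - k)]
    return K

def matmul2(A, B):
    """Matrix product over F_2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]
```

With the generator matrix of Example 1.11.1, matmul2(G, K) is I_3, and for any codeword c = xG, matmul2([c], K) returns the message x.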

Exercise 64 Let G be a k × n generator matrix for a binary code C.


(a) Suppose G = [I_k | A]. Show that the n × k matrix

[ I_k ]
[  O  ] ,

where O is the (n − k) × k zero matrix, is a right inverse of G.


(b) Find a 7 × 3 right inverse K of G, where
 
1 0 1 1 0 1 1
G = 1 1 0 1 0 1 0 .
0 0 1 1 1 1 0

Hint: One way K can be found is by using four zero rows and the three rows of I3 .
(c) Find a 7 × 4 right inverse K of G, where
 
1 1 0 1 0 0 0
0 1 1 0 1 0 0
G= 0 0 1 1 0 1 0 .

0 0 0 1 1 0 1

Remark: In Chapter 4, we will see that G generates a cyclic code and the structure of
G is typical of the structure of generator matrices of such codes.
(d) What is the message x if xG = 1000110, where G is given in part (c)? 

[Figure 1.2 depicts the binary symmetric channel: a sent 0 or 1 is received unchanged with probability 1 − ε and flipped with probability ε.]

Figure 1.2 Binary symmetric channel.

1.11.2 Decoding and Shannon’s Theorem


The process of decoding, that is, determining which codeword (and thus which message x)
was sent when a vector y is received, is more complex. Finding efficient (fast) decoding
algorithms is a major area of research in coding theory because of their practical applications.
In general, encoding is easy and decoding is hard, if the code has a reasonably large
size.
In order to set the stage for decoding, we begin with one possible mathematical model
of a channel that transmits binary data. This model is called the binary symmetric channel
(or BSC) with crossover probability ε and is illustrated in Figure 1.2. If 0 or 1 is sent, the
probability it is received without error is 1 − ε; if a 0 (respectively 1) is sent, the probability
that a 1 (respectively 0) is received is ε. In most practical situations ε is very small. This
is an example of a discrete memoryless channel (or DMC), a channel in which inputs
and outputs are discrete and the probability of error in one bit is independent of previous
bits. We will assume that it is more likely that a bit is received correctly than in error; so
ε < 1/2.⁵
If E_1 and E_2 are events, let prob(E_1) denote the probability that E_1 occurs and prob(E_1 |
E_2) the probability that E_1 occurs given that E_2 occurs. Assume that c ∈ F_2^n is sent and
y ∈ F_2^n is received and decoded as ĉ ∈ F_2^n. So prob(c | y) is the probability that the codeword
c is sent given that y is received, and prob(y | c) is the probability that y is received given that
the codeword c is sent. These probabilities can be computed from the statistics associated
with the channel. The probabilities are related by Bayes' Rule

prob(c | y) = prob(y | c) prob(c) / prob(y),

where prob(c) is the probability that c is sent and prob(y) is the probability that y is received.
There are two natural means by which a decoder can make a choice based on these two
probabilities. First, the decoder could choose ĉ = c for the codeword c with prob(c | y)
maximum; such a decoder is called a maximum a posteriori probability (or MAP) decoder.

5 While ε is usually very small, if ε > 1/2, the probability that a bit is received in error is higher than the
probability that it is received correctly. So one strategy is to interchange 0 and 1 immediately at the receiving
end. This converts the BSC with crossover probability ε to a BSC with crossover probability 1 − ε < 1/2. This
of course does not help if ε = 1/2; in this case communication is not possible – see Exercise 77.

Symbolically, a MAP decoder makes the decision

ĉ = arg max_{c∈C} prob(c | y).

Here arg max_{c∈C} prob(c | y) is the argument c of the probability function prob(c | y) that
maximizes this probability. Alternately, the decoder could choose ĉ = c for the codeword c
with prob(y | c) maximum; such a decoder is called a maximum likelihood (or ML) decoder.
Symbolically, a ML decoder makes the decision

ĉ = arg max_{c∈C} prob(y | c).   (1.9)

Consider ML decoding over a BSC. If y = y_1 · · · y_n and c = c_1 · · · c_n,

prob(y | c) = ∏_{i=1}^{n} prob(y_i | c_i),

since we assumed that bit errors are independent. By Figure 1.2, prob(y_i | c_i) = ε if y_i ≠ c_i
and prob(y_i | c_i) = 1 − ε if y_i = c_i. Therefore

prob(y | c) = ε^{d(y,c)} (1 − ε)^{n−d(y,c)} = (1 − ε)^n (ε/(1 − ε))^{d(y,c)}.   (1.10)

Since 0 < ε < 1/2, 0 < ε/(1 − ε) < 1. Therefore maximizing prob(y | c) is equivalent
to minimizing d(y, c), that is, finding the codeword c closest to the received vector y in
Hamming distance; this is called nearest neighbor decoding. Hence on a BSC, maximum
likelihood and nearest neighbor decoding are the same.
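The equivalence is easy to see numerically. The Python sketch below (an added illustration using a toy length-5 repetition code) evaluates prob(y | c) via (1.10) and compares the maximum-likelihood choice with the nearest codeword:

```python
def hamming(y, c):
    """Hamming distance between two strings of equal length."""
    return sum(a != b for a, b in zip(y, c))

def likelihood(y, c, eps):
    """prob(y | c) on a BSC with crossover probability eps, formula (1.10)."""
    d = hamming(y, c)
    return eps ** d * (1 - eps) ** (len(y) - d)

code = ['00000', '11111']   # binary repetition code of length 5
y = '11010'
ml = max(code, key=lambda c: likelihood(y, c, 0.1))   # maximum likelihood
nn = min(code, key=lambda c: hamming(y, c))           # nearest neighbor
```

Both rules pick 11111 (distance 2) over 00000 (distance 3); on a BSC with eps < 1/2 they agree for every received word.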
Let e = y − c so that y = c + e. The effect of noise in the communication channel is to
add an error vector e to the codeword c, and the goal of decoding is to determine e. Nearest
neighbor decoding is equivalent to finding a vector e of smallest weight such that y − e is in
the code. This error vector need not be unique since there may be more than one codeword
closest to y; in other words, (1.9) may not have a unique solution. When we have a decoder
capable of finding all codewords nearest to the received vector y, then we have a complete
decoder.
To examine vectors closest to a given codeword, the concept of spheres about codewords
proves useful. The sphere of radius r centered at a vector u in F_q^n is defined to be the set

S_r(u) = { v ∈ F_q^n | d(u, v) ≤ r }

of all vectors whose distance from u is less than or equal to r. The number of vectors in
S_r(u) equals

∑_{i=0}^{r} C(n, i) (q − 1)^i.   (1.11)

These spheres are pairwise disjoint provided their radius is chosen small enough.
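Formula (1.11) is a one-line computation, and it exhibits perfect packings: spheres of radius 1 about the 16 codewords of the [7, 4] Hamming code fill F_2^7 exactly. A Python sketch (illustrative; the function name is mine):

```python
from math import comb

def sphere_size(n, q, r):
    """Number of vectors of F_q^n in a Hamming sphere of radius r, per (1.11)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))
```

The same computation with n = 23, r = 3 gives 2048 = 2^11, the packing identity behind the binary Golay code.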

Theorem 1.11.2 If d is the minimum distance of a code C (linear or nonlinear) and t =
⌊(d − 1)/2⌋, then spheres of radius t about distinct codewords are disjoint.

Proof: If z ∈ St (c1 ) ∩ St (c2 ), where c1 and c2 are codewords, then by the triangle inequality
(Theorem 1.4.1(iv)),

d(c1 , c2 ) ≤ d(c1 , z) + d(z, c2 ) ≤ 2t < d,

implying that c1 = c2 . 

Corollary 1.11.3 With the notation of the previous theorem, if a codeword c is sent and y
is received where t or fewer errors have occurred, then c is the unique codeword closest to
y. In particular, nearest neighbor decoding uniquely and correctly decodes any received
vector in which at most t errors have occurred in transmission.

Exercise 65 Prove that the number of vectors in Sr (u) is given by (1.11). 

For purposes of decoding as many errors as possible, this corollary implies that for given
n and k, we wish to find a code with as high a minimum weight d as possible. Alternately,
given n and d, one wishes to send as many messages as possible; thus we want to find a
code with the largest number of codewords, or, in the linear case, the highest dimension.
We may relax these requirements somewhat if we can find a code with an efficient decoding
algorithm.
Since the minimum distance of C is d, there exist two distinct codewords such that the
spheres of radius t + 1 about them are not disjoint. Therefore if more than t errors occur,
nearest neighbor decoding may yield more than one nearest codeword. Thus C is a t-error-
correcting code but not a (t + 1)-error-correcting code. The packing radius of a code is the
largest radius of spheres centered at codewords so that the spheres are pairwise disjoint.
This discussion shows the following two facts about the packing radius.

Theorem 1.11.4 Let C be an [n, k, d] code over Fq . The following hold:


(i) The packing radius of C equals t = ⌊(d − 1)/2⌋.
(ii) The packing radius t of C is characterized by the property that nearest neighbor decoding
always decodes correctly a received vector in which t or fewer errors have occurred but
will not always decode correctly a received vector in which t + 1 errors have occurred.

The decoding problem now becomes one of finding an efficient algorithm that will correct
up to t errors. One of the most obvious decoding algorithms is to examine all codewords
until one is found with distance t or less from the received vector. But obviously this is
a realistic decoding algorithm only for codes with a small number of codewords. Another
obvious algorithm is to make a table consisting of a nearest codeword for each of the q^n
vectors in F_q^n and then look up a received vector in the table in order to decode it. This is
impractical if q^n is very large.
For an [n, k, d] linear code C over F_q, we can, however, devise an algorithm using a
table with q^{n−k} rather than q^n entries where one can find the nearest codeword by looking
up one of these q^{n−k} entries. This general decoding algorithm for linear codes is called
syndrome decoding. Because our code C is an elementary abelian subgroup of the additive
group of F_q^n, its distinct cosets x + C partition F_q^n into q^{n−k} sets of size q^k. Two vectors x
and y belong to the same coset if and only if y − x ∈ C. The weight of a coset is the smallest
weight of a vector in the coset, and any vector of this smallest weight in the coset is called
42 Basic concepts of linear codes

a coset leader. The zero vector is the unique coset leader of the code C. More generally,
every coset of weight at most t = ⌊(d − 1)/2⌋ has a unique coset leader.
Exercise 66 Do the following:
(a) Prove that if C is an [n, k, d] code over F_q, every coset of weight at most t = ⌊(d − 1)/2⌋
has a unique coset leader.
(b) Find a nonzero binary code of length 4 and minimum weight d in which all cosets have
unique coset leaders and some coset has weight greater than t = ⌊(d − 1)/2⌋. 
Choose a parity check matrix H for C. The syndrome of a vector x in F_q^n with respect to
the parity check matrix H is the vector in F_q^{n−k} defined by

    syn(x) = H x^T.

The code C consists of all vectors whose syndrome equals 0. As H has rank n − k, every
vector in F_q^{n−k} is a syndrome. If x_1, x_2 ∈ F_q^n are in the same coset of C, then x_1 − x_2 = c ∈ C.
Therefore syn(x_1) = H(x_2 + c)^T = H x_2^T + H c^T = H x_2^T = syn(x_2). Hence x_1 and x_2 have
the same syndrome. On the other hand, if syn(x_1) = syn(x_2), then H(x_2 − x_1)^T = 0 and so
x_2 − x_1 ∈ C. Thus we have the following theorem.
Theorem 1.11.5 Two vectors belong to the same coset if and only if they have the same
syndrome.
Hence there exists a one-to-one correspondence between cosets of C and syndromes. We
denote by C_s the coset of C consisting of all vectors in F_q^n with syndrome s.
Suppose a codeword sent over a communication channel is received as a vector y. Since
in nearest neighbor decoding we seek a vector e of smallest weight such that y − e ∈ C,
nearest neighbor decoding is equivalent to finding a vector e of smallest weight in the coset
containing y, that is, a coset leader of the coset containing y. The Syndrome Decoding
Algorithm is the following implementation of nearest neighbor decoding. We begin with a
fixed parity check matrix H .
I. For each syndrome s ∈ F_q^{n−k}, choose a coset leader e_s of the coset C_s. Create a table
pairing the syndrome with the coset leader.
This process can be somewhat involved, but this is a one-time preprocessing task that
is carried out before received vectors are analyzed. One method of computing this table
will be described shortly. After producing the table, received vectors can be decoded.
II. After receiving a vector y, compute its syndrome s using the parity check matrix H .
III. y is then decoded as the codeword y − e_s.
Syndrome decoding requires a table with only q^{n−k} entries, which may be a vast
improvement over a table of q^n vectors showing which codeword is closest to each of these.
However, there is a cost for shortening the table: before looking in the table of syndromes,
one must perform a matrix-vector multiplication in order to determine the syndrome of the
received vector. Then the table is used to look up the syndrome and find the coset leader.
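As a concrete illustration, Steps I–III can be sketched in a few lines of Python. The parity check matrix below is that of the [7, 4] binary Hamming code discussed later in this section (column i is the binary numeral for i); the sketch enumerates all of F_2^7 to build the table, which is practical only for tiny codes.

```python
from itertools import product

# Sketch of Steps I-III of syndrome decoding for a small binary code.
# H is a parity check matrix of the [7,4,3] binary Hamming code.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
n = 7

def syn(x):
    """Syndrome syn(x) = H x^T over F_2, as a tuple."""
    return tuple(sum(h[i] * x[i] for i in range(n)) % 2 for h in H)

# Step I: pair each syndrome with a minimum-weight vector of its coset.
table = {}
for x in sorted(product([0, 1], repeat=n), key=sum):  # increasing weight
    table.setdefault(syn(x), x)                       # first hit has min weight

def decode(y):
    """Steps II-III: look up the coset leader and subtract it from y."""
    e = table[syn(y)]
    return tuple((a - b) % 2 for a, b in zip(y, e))

# The codeword 1011010 with an error in its first bit is corrected:
assert decode((0, 0, 1, 1, 0, 1, 0)) == (1, 0, 1, 1, 0, 1, 0)
```

Note that the one-time preprocessing (Step I) dominates: the lookup in Steps II–III costs only one matrix–vector product and one table access per received vector.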
How do we construct the table of syndromes as described in Step I? We briefly discuss
this for binary codes; one can extend this easily to nonbinary codes. Given the t-error-
correcting code C of length n with parity check matrix H , we can construct the syndromes
as follows. The coset of weight 0 has coset leader 0. Consider the n cosets of weight 1.
43 1.11 Encoding, decoding, and Shannon’s Theorem

Choose an n-tuple with a 1 in position i and 0s elsewhere; the coset leader is the n-tuple and
the associated syndrome is column i of H . For the ( n2 ) cosets of weight 2, choose an n-tuple
with two 1s in positions i and j, with i < j, and the rest 0s; the coset leader is the n-tuple
and the associated syndrome is the sum of columns i and j of H. Continue in this manner
through the cosets of weight t. We could choose to stop here. If we do, we can decode any
received vector with t or fewer errors, but if the received vector has more than t errors, it
will be either incorrectly decoded (if the syndrome of the received vector is in the table) or
not decoded at all (if the syndrome of the received vector is not in the table). If we decide
to go on and compute syndromes of weights w greater than t, we continue in the same
fashion with the added feature that we must check for possible repetition of syndromes.
This repetition will occur if the n-tuple of weight w is not a coset leader or it is a coset
leader with the same syndrome as another leader of weight w, in which cases we move on
to the next n-tuple. We continue until we have 2^{n−k} syndromes. The table produced will
allow us to perform nearest neighbor decoding.
Syndrome decoding is particularly simple for the binary Hamming codes H_r with
parameters [n = 2^r − 1, 2^r − 1 − r, 3]. We do not have to create the table for syndromes and
corresponding coset leaders. This is because the coset leaders are unique and are the 2^r
vectors of weight at most 1. Let H_r be the parity check matrix whose columns are the
binary numerals for the numbers 1, 2, . . . , 2^r − 1. Since the syndrome of the binary n-tuple
of weight 1 whose unique 1 is in position i is the r-tuple representing the binary numeral for
i, the syndrome immediately gives the coset leader and no table is required for syndrome
decoding. Thus Syndrome Decoding for Binary Hamming Codes takes the form:
I. After receiving a vector y, compute its syndrome s using the parity check matrix H_r.
II. If s = 0, then y is in the code and y is decoded as y; otherwise, s is the binary numeral
for some positive integer i and y is decoded as the codeword obtained from y by adding
1 to its ith bit.
The above procedure is easily modified for Hamming codes over other fields. This is
explored in the exercises.
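The binary table-free decoder can be sketched directly; the only assumption is the column ordering just described (column i of H_r is the binary numeral for i), so the syndrome, read as an integer, is the XOR of the 1-based positions holding a 1.

```python
# Sketch of table-free syndrome decoding for the binary Hamming code H_r
# whose ith column is the binary numeral for i (here any length 2^r - 1).
def syndrome(y):
    s = 0
    for i, bit in enumerate(y, start=1):
        if bit:
            s ^= i          # add column i of H_r over F_2
    return s

def decode(y):
    s = syndrome(y)
    c = list(y)
    if s != 0:
        c[s - 1] ^= 1       # nonzero syndrome s is the error position
    return c
```

By construction, decode flips at most one bit and always returns a vector of syndrome 0, i.e. a codeword.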

Exercise 67 Construct the parity check matrix of the binary Hamming code H_4 of length
15 where the columns are the binary numerals for 1, 2, . . . , 15 in that order. Using this parity
check matrix decode the following vectors, and then check that your decoded vectors are
actually codewords.
(a) 001000001100100,
(b) 101001110101100,
(c) 000100100011000. 

Exercise 68 Construct a table of all syndromes of the ternary tetracode of Example 1.3.3
using the generator matrix of that example to construct the parity check matrix. Find a coset
leader for each of the syndromes. Use your parity check matrix to decode the following
vectors, and then check that your decoded vectors are actually codewords.
(a) (1, 1, 1, 1),
(b) (1, −1, 0, −1),
(c) (0, 1, 0, 1). 
44 Basic concepts of linear codes

Exercise 69 Let C be the [6, 3, 3] binary code with generator matrix G and parity check
matrix H given by

        [1 0 0 0 1 1]             [0 1 1 1 0 0]
    G = [0 1 0 1 0 1]   and   H = [1 0 1 0 1 0].
        [0 0 1 1 1 0]             [1 1 0 0 0 1]
(a) Construct a table of coset leaders and associated syndromes for the eight cosets of C.
(b) One of the cosets in part (a) has weight 2. This coset has three coset leaders. Which
coset is it and what are its coset leaders?
(c) Using part (a), decode the following received vectors:
(i) 110110,
(ii) 110111,
(iii) 110001.
(d) For one of the received vectors in part (c) there is ambiguity as to what codeword
it should be decoded to. List the other nearest neighbors possible for this received
vector. 

Exercise 70 Let Ĥ_3 be the extended Hamming code with parity check matrix

    [1 1 1 1 1 1 1 1]
    [0 0 0 0 1 1 1 1]
    [0 0 1 1 0 0 1 1]
    [0 1 0 1 0 1 0 1].

Number the coordinates 0, 1, 2, . . . , 7. Notice that if we delete the top row of the parity
check matrix, we have the coordinate numbers in binary. We can decode Ĥ_3 without a table
of syndromes and coset leaders using the following algorithm. If y is received, compute
syn(y) using the parity check matrix. If syn(y) = (0, 0, 0, 0)^T, then y has no errors. If
syn(y) = (1, a, b, c)^T, then there is a single error in the coordinate position abc (written in
binary). If syn(y) = (0, a, b, c)^T with (a, b, c) ≠ (0, 0, 0), then there are two errors, in
coordinate position 0 and in the coordinate position abc (written in binary).
(a) Decode the following vectors using this algorithm:
(i) 10110101,
(ii) 11010010,
(iii) 10011100.
(b) Verify that this procedure provides a nearest neighbor decoding algorithm for Ĥ_3. To do
this, the following must be verified. All weight 0 and weight 1 errors can be corrected,
accounting for nine of the 16 syndromes. All weight 2 errors cannot necessarily be
corrected, but all weight 2 errors lead to one of the seven syndromes remaining. 

A received vector may contain both errors (where a transmitted symbol is read as a
different symbol) and erasures (where a transmitted symbol is unreadable). These are
fundamentally different in that the locations of errors are unknown, whereas the locations
of erasures are known. Suppose c ∈ C is sent, and the received vector y contains ν errors
and ε erasures. One could certainly not guarantee that y can be corrected if ε ≥ d because
there may be a codeword other than c closer to y. So assume that ε < d. Puncture C in the

ε positions where the erasures occurred in y to obtain an [n − ε, k*, d*] code C*. Note that
k* = k by Theorem 1.5.7(ii), and d* ≥ d − ε. Puncture c and y similarly to obtain c* and y*;
these can be viewed as sent and received vectors using the code C* with y* containing ν errors
but no erasures. If 2ν < d − ε ≤ d*, then c* can be recovered from y* by Corollary 1.11.3. There
is a unique codeword c ∈ C which when punctured produces c*: if puncturing
both c and c′ yields c*, then wt(c − c′) ≤ ε < d, a contradiction unless c = c′. The following
theorem summarizes this discussion and extends Corollary 1.11.3.

Theorem 1.11.6 Let C be an [n, k, d] code. If a codeword c is sent and y is received where
ν errors and ε erasures have occurred, then c is the unique codeword in C closest to y
provided 2ν + ε < d.
Exercise 71 Let Ĥ_3 be the extended Hamming code with parity check matrix

    [1 1 1 1 1 1 1 1]
    [0 0 0 0 1 1 1 1]
    [0 0 1 1 0 0 1 1]
    [0 1 0 1 0 1 0 1].

Correct the received vector 101∗0111, where ∗ is an erasure. 

In Exercises 70 and 71 we explored the decoding of the [8, 4, 4] extended Hamming code
Ĥ_3. In Exercise 70, we had the reader verify that there are eight cosets of weight 1 and seven
of weight 2. Each of these cosets is a nonlinear code and so it is appropriate to discuss the
weight distribution of these cosets and to tabulate the results. In general, the complete coset
weight distribution of a linear code is the weight distribution of each coset of the code. The
next example gives the complete coset weight distribution of Ĥ_3. As every [8, 4, 4] code
is equivalent to Ĥ_3, by Exercise 56, this is the complete coset weight distribution of any
[8, 4, 4] binary code.

Example 1.11.7 The complete coset weight distribution of the [8, 4, 4] extended binary
Hamming code Ĥ_3 is given in the following table:

    Coset     Number of vectors of given weight         Number
    weight    0   1   2   3   4   5   6   7   8       of cosets
      0       1   0   0   0  14   0   0   0   1           1
      1       0   1   0   7   0   7   0   1   0           8
      2       0   0   4   0   8   0   4   0   0           7

Note that the first line is the weight distribution of Ĥ_3. The second line is the weight
distribution of each coset of weight 1. This code has the special property that all cosets of
a given weight have the same weight distribution. This is not the case for codes in general.
In Exercise 73 we ask the reader to verify some of the information in the table. Notice that
this code has the all-one vector 1 and hence the table is symmetric about the middle weight.
Notice also that an even weight coset has only even weight vectors, and an odd weight
coset has only odd weight vectors. These observations hold in general; see Exercise 72.
The information in this table helps explain the decoding of Ĥ_3. We see that all the cosets

of weight 2 have four coset leaders. This implies that when we decode a received vector in
which two errors had been made, we actually have four equally likely codewords that could
have been sent. 
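The table in Example 1.11.7 can be confirmed by brute force, a sketch: group all 256 vectors of F_2^8 by syndrome. Since the [8, 4, 4] extended Hamming code is self-dual, the rows of its parity check matrix also generate it.

```python
from itertools import product

# Brute-force complete coset weight distribution of the [8,4,4] code.
H = [(1, 1, 1, 1, 1, 1, 1, 1),
     (0, 0, 0, 0, 1, 1, 1, 1),
     (0, 0, 1, 1, 0, 0, 1, 1),
     (0, 1, 0, 1, 0, 1, 0, 1)]

def syn(x):
    return tuple(sum(a * b for a, b in zip(h, x)) % 2 for h in H)

cosets = {}
for x in product([0, 1], repeat=8):
    cosets.setdefault(syn(x), []).append(x)

by_weight = {}          # coset weight -> list of weight distributions
for vecs in cosets.values():
    dist = [0] * 9
    for v in vecs:
        dist[sum(v)] += 1
    wt = min(w for w, c in enumerate(dist) if c)
    by_weight.setdefault(wt, []).append(dist)

# the counts and distributions claimed in Example 1.11.7:
assert len(by_weight[0]) == 1 and len(by_weight[1]) == 8 and len(by_weight[2]) == 7
assert by_weight[0][0] == [1, 0, 0, 0, 14, 0, 0, 0, 1]
assert all(d == [0, 0, 4, 0, 8, 0, 4, 0, 0] for d in by_weight[2])
```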
Exercise 72 Let C be a binary code of length n. Prove the following.
(a) If C is an even code, then an even weight coset of C has only even weight vectors, and
an odd weight coset has only odd weight vectors.
(b) If C contains the all-one vector 1, then in a fixed coset, the number of vectors of weight
i is the same as the number of vectors of weight n − i, for 0 ≤ i ≤ n. 
Exercise 73 Consider the complete coset weight distribution of Ĥ_3 given in Example
1.11.7. The results of Exercise 72 will be useful.
(a) Prove that the weight distributions of the cosets of weight 1 are as claimed.
(b) (Harder) Prove that the weight distributions of the cosets of weight 2 are as
claimed. 
We conclude this section with a discussion of Shannon's Theorem in the framework of
the decoding we have developed. Assume that the communication channel is a BSC with
crossover probability ρ on which syndrome decoding is used. The word error rate P_err
for this channel and decoding scheme is the probability that the decoder makes an error,
averaged over all codewords of C; for simplicity we assume that each codeword of C is
equally likely to be sent. A decoder error occurs when the codeword produced by the
decoder is not the originally transmitted word c when y is received. The syndrome decoder
makes a correct decision if y − c is a chosen coset leader. The probability of this is

    ρ^{wt(y−c)} (1 − ρ)^{n−wt(y−c)}

by (1.10). Therefore the probability that the syndrome decoder makes a correct decision is
Σ_{i=0}^{n} α_i ρ^i (1 − ρ)^{n−i}, where α_i is the number of cosets of weight i. Thus

    P_err = 1 − Σ_{i=0}^{n} α_i ρ^i (1 − ρ)^{n−i}.    (1.12)

Example 1.11.8 Suppose binary messages of length k are sent unencoded over a BSC with
crossover probability ρ. This in effect is the same as using the [k, k] code F_2^k. This code
has a unique coset, the code itself, and its leader is the zero codeword of weight 0. Hence
(1.12) shows that the probability of decoder error is

    P_err = 1 − ρ^0 (1 − ρ)^k = 1 − (1 − ρ)^k.

This is precisely what we expect, as the probability of no decoding error is the probability
(1 − ρ)^k that the k bits are received without error. 
Example 1.11.9 We compare sending 2^4 = 16 binary messages unencoded to encoding
using the [7, 4] binary Hamming code H_3. Assume communication is over a BSC with
crossover probability ρ. By Example 1.11.8, P_err = 1 − (1 − ρ)^4 for the unencoded data.
H_3 has one coset of weight 0 and seven cosets of weight 1. Hence P_err = 1 − (1 − ρ)^7 −
7ρ(1 − ρ)^6 by (1.12). For example if ρ = 0.01, P_err without coding is 0.039 403 99. Using
H_3, it is 0.002 031 04 . . . . 
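These numbers can be reproduced directly from (1.12), a short sketch with the crossover probability of the example:

```python
# Numerical check of Examples 1.11.8 and 1.11.9 with crossover prob. 0.01.
rho = 0.01
perr_unencoded = 1 - (1 - rho) ** 4                           # [4,4] code F_2^4
perr_hamming = 1 - (1 - rho) ** 7 - 7 * rho * (1 - rho) ** 6  # [7,4] code H_3
print(round(perr_unencoded, 8))   # 0.03940399
print(round(perr_hamming, 8))     # 0.00203104
```

Encoding with H_3 thus reduces the word error rate by roughly a factor of 19 at this crossover probability, at the cost of sending 7 bits instead of 4.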
54 4. Some Good Codes

corrects if at most one error occurs; otherwise, the word is declared an erasure. In
the end, this turns out to increase the efficiency of the collaborating pair.
We extend the example treated above to introduce a sequence of codes defined
by P. Elias in 1954. We start with an extended Hamming code C_1 of length n_1 = 2^m.
Assume that the codes are to be used on a B.S.C. with bit error probability p,
where n_1 p < ½. For C_2 we take the extended Hamming code of length 2^{m+1}.
Define V_1 := C_1 and define V_2 to be the direct product of C_1 and C_2. We continue
in this way: if V_i has been defined, then V_{i+1} is the direct product of V_i and the
extended Hamming code C_{i+1} of length 2^{m+i}. Denote the length of V_i by n_i and
its dimension by k_i. Finally, let E_i be the expected number of errors per block in
words of V_i after decoding.
From the definition, we have:

    n_{i+1} = n_i · 2^{m+i},

    k_{i+1} = k_i · (2^{m+i} − m − i − 1),

and from Example 3.3.4 it follows that E_{i+1} ≤ E_i² and E_1 ≤ (n_1 p)² ≤ ¼. So these
codes have the property that E_i tends to zero as i → ∞.
From the recurrence relations for n_i and k_i, we find

    n_i = 2^{mi + i(i−1)/2},    k_i = n_i ∏_{j=0}^{i−1} (1 − (m + j + 1)/2^{m+j}).

So, if R_i denotes the rate of V_i, then

    R_i → ∏_{j=0}^{∞} (1 − (m + j + 1)/2^{m+j}) > 0

for i → ∞. So we have a sequence of codes for which the length tends to ∞, the
rate does not tend to 0, and nevertheless the error probability tends to 0. This is
close to what Shannon's theorem promises us. Note that these codes, called Elias
codes, have minimum distance d_i = 4^i and hence d_i/n_i → 0 as i → ∞.

§4.5. Reed-Muller Codes


We shall now describe a class of binary codes connected with finite
geometries. The codes were first treated by D. E. Muller (1954) and I. S. Reed
(1954). The codes are not as good as some of the codes that will be treated in
later chapters but in practice they have the advantage that they are easy to
decode. The method is a generalization of majority logic decoding (see
Section 3.4).
There are several ways of representing the codewords of a Reed-Muller
code. We shall try to give a unified treatment which shows how the different
points of view are related. As preparation we need a theorem from number
theory that is a century old (Lucas (1878)).

(4.5.1) Theorem. Let p be a prime and let

    n = n_0 + n_1 p + ... + n_l p^l    and    k = k_0 + k_1 p + ... + k_l p^l

be representations of n and k in base p (i.e. 0 ≤ n_i ≤ p − 1, 0 ≤ k_i ≤ p − 1).
Then

    (n choose k) ≡ ∏_{i=0}^{l} (n_i choose k_i)  (mod p).

PROOF. We use the fact that (1 + x)^p ≡ 1 + x^p (mod p). If 0 ≤ r < p then

    (1 + x)^{np+r} ≡ (1 + x^p)^n (1 + x)^r  (mod p).

Comparing coefficients of x^{kp+s} (where 0 ≤ s < p) on both sides yields

    (np + r choose kp + s) ≡ (n choose k)(r choose s)  (mod p).

The result now follows by induction. □
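The theorem is easy to test numerically, a sketch comparing both sides of the congruence for small parameters:

```python
from math import comb

# Sketch: check Lucas' theorem (4.5.1) by comparing C(n, k) mod p with
# the product of digitwise binomial coefficients in base p.
def digits(n, p):
    d = []
    while n:
        d.append(n % p)
        n //= p
    return d or [0]

def lucas(n, k, p):
    dn, dk = digits(n, p), digits(k, p)
    dk += [0] * (len(dn) - len(dk))           # pad k's expansion (k <= n)
    r = 1
    for ni, ki in zip(dn, dk):
        r = r * comb(ni, ki) % p              # comb(ni, ki) = 0 if ki > ni
    return r

assert all(lucas(n, k, p) == comb(n, k) % p
           for p in (2, 3, 5)
           for n in range(40)
           for k in range(n + 1))
```

For p = 2 the product is 1 exactly when the binary digits of k are dominated by those of n, the special case used repeatedly below.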

The following theorem on weights of polynomials is also a preparation.

Let q = 2^r. For a polynomial P(x) ∈ F_q[x] we define the Hamming weight
w(P(x)) to be the number of nonzero coefficients in the expansion of P(x). Let
c ∈ F_q, c ≠ 0. The polynomials (x + c)^i, i ≥ 0, are a basis of F_q[x].

(4.5.2) Theorem (Massey et al. 1973; cf. [49]). Let P(x) = Σ_{i=0}^{l} b_i (x + c)^i,
where b_l ≠ 0, and let i_0 be the smallest index i for which b_i ≠ 0. Then

    w(P(x)) ≥ w((x + c)^{i_0}).


PROOF. For l = 0 the assertion is obvious. We use induction. Assume the
theorem is true for l < 2^n. Now let 2^n ≤ l < 2^{n+1}. Then we have

    P(x) = Σ_{i=0}^{2^n − 1} b_i (x + c)^i + Σ_{i=2^n}^{l} b_i (x + c)^i

         = P_1(x) + (x + c)^{2^n} P_2(x) = (P_1(x) + c^{2^n} P_2(x)) + x^{2^n} P_2(x),

where P_1(x) and P_2(x) are polynomials for which the theorem holds. We
distinguish two cases.
(i) If P_1(x) = 0 then w(P(x)) = 2w(P_2(x)) and since i_0 ≥ 2^n

    w((x + c)^{i_0}) = w((x^{2^n} + c^{2^n})(x + c)^{i_0 − 2^n}) = 2w((x + c)^{i_0 − 2^n}),

from which the assertion follows.
(ii) If P_1(x) ≠ 0 then for every term in c^{2^n} P_2(x) that cancels a term in P_1(x) we
have a term in x^{2^n} P_2(x) that does not cancel. Hence w(P(x)) ≥ w(P_1(x))
and the result follows from the induction hypothesis. □

The three representations of codewords in Reed-Muller codes which we
now introduce are: (i) characteristic functions of subsets in AG(m, 2); (ii)
coefficients of binary expansions of polynomials; and (iii) lists of values which
are taken by a Boolean function on F_2^m.
First some notations and definitions. We consider the points of AG(m, 2),
i.e. F_2^m, as column vectors and denote the standard basis by u_0, u_1, ..., u_{m−1}.
Let the binary representation of j be j = Σ_{i=0}^{m−1} ξ_{ij} 2^i (0 ≤ j < 2^m).
We define x_j := Σ_{i=0}^{m−1} ξ_{ij} u_i. This represents a point of AG(m, 2) and all
points are obtained in this way. Let E be the matrix with columns x_j
(0 ≤ j < 2^m). Write n := 2^m. The m by n matrix E is a list of the points of
AG(m, 2), written as column vectors.

(4.5.3) Definitions.
(i) A_i := {x_j ∈ AG(m, 2) | ξ_{ij} = 1}, i.e. A_i is an (m − 1)-dimensional affine
subspace (a hyperplane), for 0 ≤ i < m;
(ii) v_i := the ith row of E, i.e. the characteristic function of A_i. The vector v_i
is a word in F_2^n; as usual we write 1 := (1, 1, ..., 1) for the characteristic
function of AG(m, 2);
(iii) if a = (a_0, a_1, ..., a_{n−1}) and b = (b_0, b_1, ..., b_{n−1}) are words in F_2^n, we
define ab := (a_0 b_0, a_1 b_1, ..., a_{n−1} b_{n−1}).

(4.5.4) Lemma. Let l = Σ_{i=0}^{m−1} ξ_{il} 2^i and let i_1, ..., i_s be the values of i for which
ξ_{il} = 0. If

    v_{i_1} v_{i_2} ... v_{i_s} = (a_{l,0}, a_{l,1}, ..., a_{l,n−1}),

then

    (x + 1)^l = Σ_{j=0}^{n−1} a_{l,j} x^{n−1−j}.

(Here, as usual, a product with no factors (s = 0) is defined to be 1.)

PROOF. By Theorem 4.5.1 the binomial coefficient (l choose n − 1 − j) is 1 iff ξ_{ij} = 1
for every i for which ξ_{il} = 0. By (4.5.3) (i), (ii) and (iii) we also have a_{l,j} = 1 iff
ξ_{ij} = 1 for i = i_1, ..., i_s. □

The following shows how to interpret the products v_{i_1} ... v_{i_s} geometrically.

(4.5.5) Lemma. If i_1, i_2, ..., i_s are different then

(i) v_{i_1} v_{i_2} ... v_{i_s} is the characteristic function of the (m − s)-flat
A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_s},
(ii) the weight w(v_{i_1} ... v_{i_s}) of the vector v_{i_1} ... v_{i_s} in F_2^n is 2^{m−s},
(iii) the characteristic function of {x_j}, i.e. the jth basis vector of F_2^n, is

    e_j = ∏_{i=0}^{m−1} {v_i + (1 + ξ_{ij}) 1},

(iv) the products v_{i_1} ... v_{i_s} (0 ≤ s ≤ m) are a basis of F_2^n.

PROOF.
(i) This is a consequence of (4.5.3)(i)-(iii).
(ii) By (i) the weight is the cardinality of an (m − s)-flat.
(iii) Consider the matrix E. For every i such that ξ_{ij} = 0 we replace the ith
row of E, i.e. v_i, by its complement 1 + v_i. If we then multiply the rows of
the new matrix, the product vector will have entry 1 only in position j,
since all possible columns occur only once. As an example consider {x_14}
in the following table. Since 14 = 2 + 2² + 2³ we see that 1 + ξ_{ij} = 1
only if i = 0 (here j = 14). So in the table we complement the row
corresponding to v_0 and then multiply to find (v_0 + 1) v_1 v_2 v_3, which is a row
vector which has a 1 only in the fourteenth position.
(iv) There are Σ_{s=0}^{m} (m choose s) = 2^m = n products v_{i_1} ... v_{i_s}. The result follows from
(iii). Since the polynomials (x + 1)^l are independent we could also have
used Lemma 4.5.4. □

The following table illustrates Lemmas 4.5.4 and 4.5.5. For example,
v_0 v_2 corresponds to l = 15 − 2^0 − 2^2 = 10 and hence (x + 1)^{10} =
x^{10} + x^8 + x^2 + 1.

    v_{i_1} v_{i_2} ... v_{i_s}   Coordinates = coefficients of (x + 1)^l   l = n − 1 − Σ_t 2^{i_t}

    1               1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1    15 = 1111
    v_0             0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1    14 = 1110
    v_1             0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1    13 = 1101
    v_2             0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1    11 = 1011
    v_3             0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1     7 = 0111
    v_0 v_1         0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1    12 = 1100
    v_0 v_2         0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1    10 = 1010
    v_0 v_3         0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1     6 = 0110
    v_1 v_2         0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1     9 = 1001
    v_1 v_3         0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1     5 = 0101
    v_2 v_3         0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1     3 = 0011
    v_0 v_1 v_2     0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1     8 = 1000
    v_0 v_1 v_3     0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1     4 = 0100
    v_0 v_2 v_3     0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1     2 = 0010
    v_1 v_2 v_3     0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1     1 = 0001
    v_0 v_1 v_2 v_3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1     0 = 0000
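The correspondence of Lemma 4.5.4, and hence every row of this table, can be checked mechanically; the sketch below computes the products over F_2 and the coefficients of (x + 1)^l via Theorem 4.5.1.

```python
from itertools import combinations

# Sketch verifying Lemma 4.5.4 for m = 4 (n = 16): the coordinates of
# v_{i1}...v_{is} are the coefficients of (x+1)^l, position j holding the
# coefficient of x^(n-1-j).
m = 4
n = 1 << m
# v_i has a 1 in position j iff bit i of j is set (Definition 4.5.3)
v = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

# coefficient of x^k in (x+1)^l over F_2, by Lucas' theorem (4.5.1)
def coeff(l, k):
    return 1 if l & k == k else 0

for s in range(m + 1):
    for subset in combinations(range(m), s):   # the i with xi_{il} = 0
        l = n - 1 - sum(1 << i for i in subset)
        row = [1] * n
        for i in subset:
            row = [a * b for a, b in zip(row, v[i])]
        assert row == [coeff(l, n - 1 - j) for j in range(n)]

# e.g. v0*v2 corresponds to l = 10 and (x+1)**10 = x**10 + x**8 + x**2 + 1
```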

(4.5.6) Definition. Let 0 ≤ r < m. The linear code of length n = 2^m which has
the products v_{i_1} ... v_{i_s} with s ≤ r factors as basis is called the rth order binary
Reed-Muller code (RM code; notation R(r, m)).

The special case R(0, m) is the repetition code. From Lemma 4.5.5(i) we see
that the Boolean function x_{i_1} x_{i_2} ... x_{i_s}, where x = (x_0, ..., x_{m−1}) runs through
F_2^m, has value 1 iff x ∈ A_{i_1} ∩ ... ∩ A_{i_s}. Hence R(r, m) consists of the sequences
of values taken by polynomials in x_0, ..., x_{m−1} of degree at most r.

(4.5.7) Theorem. R(r, m) has minimum distance 2^{m−r}.

PROOF. By the definition and Lemma 4.5.5(ii) the minimum distance is at most
2^{m−r}, and by Lemma 4.5.4 and Theorem 4.5.2 it is at least 2^{m−r}. (Also see
Problem 4.7.9.) □
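For small parameters the theorem can be verified exhaustively, a sketch: span the products v_{i_1} ... v_{i_s} with s ≤ r and take the minimum weight over all nonzero codewords.

```python
from itertools import combinations, product

# Brute-force check of Theorem 4.5.7 for small parameters.
def rm_basis(r, m):
    n = 1 << m
    v = [[(j >> i) & 1 for j in range(n)] for i in range(m)]
    basis = []
    for s in range(r + 1):
        for idx in combinations(range(m), s):
            row = [1] * n                      # s = 0 gives the vector 1
            for i in idx:
                row = [a * b for a, b in zip(row, v[i])]
            basis.append(row)
    return basis

def min_weight(basis):
    best = len(basis[0])
    for cs in product([0, 1], repeat=len(basis)):
        if any(cs):
            word = [sum(c * row[j] for c, row in zip(cs, basis)) % 2
                    for j in range(len(basis[0]))]
            best = min(best, sum(word))
    return best

for m, r in [(3, 0), (3, 1), (3, 2), (4, 1), (4, 2)]:
    assert min_weight(rm_basis(r, m)) == 2 ** (m - r)
```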

(4.5.8) Theorem. The dual of R(r, m) is R(m − r − 1, m).

PROOF.
(a) By the definition and the independence of the products v_{i_1} ... v_{i_s}, the
dimension of R(r, m) is 1 + (m choose 1) + ... + (m choose r). So dim R(r, m) +
dim R(m − r − 1, m) = n.
(b) Let v_{i_1} ... v_{i_s} and v_{j_1} ... v_{j_t} be basis vectors of R(r, m) and R(m − r − 1, m)
respectively. Then s + t < m. Hence the product of these two basis vectors
has the form v_{k_1} ... v_{k_u} where u < m. By Lemma 4.5.5(ii) this product has
even weight, i.e. the original two basis vectors are orthogonal. □

Corollary. R(m − 2, m) is the [n, n − m − 1] extended Hamming code.
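Both parts of the proof can be confirmed by direct computation for m = 4, a sketch: the product bases of R(r, m) and R(m − r − 1, m) are pairwise orthogonal and their sizes add up to n = 2^m.

```python
from itertools import combinations

# Check of Theorem 4.5.8 for m = 4.
m = 4
n = 1 << m
v = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

def basis(r):
    rows = []
    for s in range(r + 1):
        for idx in combinations(range(m), s):
            row = [1] * n
            for i in idx:
                row = [a * b for a, b in zip(row, v[i])]
            rows.append(row)
    return rows

for r in range(m - 1):
    B1, B2 = basis(r), basis(m - r - 1)
    assert len(B1) + len(B2) == n                      # dimensions sum to n
    assert all(sum(a * b for a, b in zip(x, y)) % 2 == 0
               for x in B1 for y in B2)                # all pairs orthogonal
```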

We have chosen the characteristic functions of certain flats as basis for an
RM code. We shall now show that for every flat of suitable dimension the
characteristic function is in certain RM codes.

(4.5.9) Theorem. Let C = R(m − l, m) and let A be an l-flat in AG(m, 2). Then
the characteristic function of A is in C.

PROOF. Let f = Σ_{j=0}^{n−1} f_j e_j be the characteristic function of A. By Definition
4.5.3 and Lemma 4.5.5(iii) we have

    e_j = Σ_{s=0}^{m} Σ_{(i_1, ..., i_s): j ∈ C(i_1, ..., i_s)} v_{i_1} v_{i_2} ... v_{i_s}

and therefore

    f = Σ_{s=0}^{m} Σ_{(i_1, ..., i_s)} ( Σ_{j ∈ C(i_1, ..., i_s)} f_j ) v_{i_1} v_{i_2} ... v_{i_s}.

Here the inner sum counts the number of points in the intersection of A and
the s-flat

    L = {x_j ∈ AG(m, 2) | j ∈ C(i_1, ..., i_s)}.

If s > m − l then L ∩ A is either empty or an affine subspace of positive
dimension. In both cases |L ∩ A| is even, i.e. the inner sum is 0. □

This theorem and the definition show that a word is in R(r, m) iff it is the
sum of characteristic functions of affine subspaces of dimension ≥ m − r. In
the terminology of Boolean functions, R(r, m) is the set of polynomials in x_0,
x_1, ..., x_{m−1} of degree ≤ r.
In Section 3.2 we defined the notion of equivalence of codes using permutations
acting on the positions of the codewords. Let us now consider a code C
of length n and the permutations π ∈ S_n which map every word in C to a word
in C. These permutations form a group, called the automorphism group of C
(Notation: Aut(C)). For example, if C is the repetition code then Aut(C) = S_n.

(4.5.10) Theorem. AGL(m, 2) ⊂ Aut(R(r, m)).

PROOF. This is an immediate consequence of Theorem 4.5.9 and the fact that
AGL(m, 2) maps a k-flat onto a k-flat (for every k). □

Remark. The reader should realize that we consider AGL(m, 2) acting on
AG(m, 2) as a group of permutations of the n positions, which have been
numbered by the elements of AG(m, 2).

Without going into details we briefly describe a decoding procedure for
RM codes which is a generalization of majority decoding. Let C = R(r, m).
By Theorems 4.5.8 and 4.5.9 the characteristic function of any (r + 1)-flat in
AG(m, 2) is a parity check vector for C. Given an r-flat A there are 2^{m−r} − 1
distinct (r + 1)-flats which contain A. A point not in A is in exactly one of
these (r + 1)-flats. Each of these (r + 1)-flats contains the points of A and
exactly as many points not in A.
Now let us look at the result of the parity checks. Let a received word
contain fewer than 2^{m−r−1} errors (see Theorem 4.5.7). Let t parity checks fail.
There are two possible explanations:
(i) This was caused by an odd number of errors in the positions of A,
compensated 2^{m−r} − 1 − t times by an odd number of errors in the
remaining positions of the check set.
(ii) The number of errors in the positions of A is even but in t of the parity
check equations there is an odd number of errors in the remaining
positions.
By maximum likelihood (ii) is more probable than (i) if t < 2^{m−r−1} and
otherwise (i) is more probable. This means that it is possible to determine the
parity of the number of errors in the positions of any r-flat. Then, using a
similar procedure, the same thing is done for (r − 1)-flats, etc. After r + 1 steps
the errors have been located. This procedure is called multistep majority
decoding.
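The starting point of the procedure, that characteristic functions of (r + 1)-flats are parity checks, can be verified by brute force for C = R(1, 3), a sketch identifying the points of AG(3, 2) with the integers 0..7.

```python
from itertools import combinations, product

# Check: the characteristic function of every 2-flat in AG(3, 2) is
# orthogonal to every codeword of R(1, 3) (Theorems 4.5.8 and 4.5.9).
m = 3
n = 1 << m
v = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

# the 16 codewords of R(1, 3): the span of {1, v0, v1, v2}
gens = [[1] * n] + v
code = set()
for cs in product([0, 1], repeat=len(gens)):
    code.add(tuple(sum(c * g[j] for c, g in zip(cs, gens)) % 2
                   for j in range(n)))

# all 2-flats: cosets a + <d1, d2> of the 2-dimensional subspaces
flats = set()
for d1, d2 in combinations(range(1, n), 2):
    span = {0, d1, d2, d1 ^ d2}
    for a in range(n):
        flats.add(frozenset(a ^ s for s in span))

for fl in flats:
    chi = tuple(1 if j in fl else 0 for j in range(n))
    assert all(sum(x * y for x, y in zip(chi, c)) % 2 == 0 for c in code)
```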

§4.6. Kerdock Codes

We shall briefly discuss a class of nonlinear codes known as Kerdock codes
(cf. [75], [11]). A Kerdock code is a subcode of a second order Reed-Muller
code consisting of a number of cosets of the corresponding first order Reed-Muller
code. Note that R(2, m) is itself a union of cosets of R(1, m), each coset
corresponding to some quadratic form

(4.6.1)    Q(v) = Σ_{0 ≤ i < j < m} q_{ij} v_i v_j.

Corresponding to Q, there is an alternating bilinear form B defined by

    B(v, w) := Q(v + w) − Q(v) − Q(w) = v B w^T,

where B is a symplectic matrix (zero on the diagonal and B = −B^T). By an
easy induction proof one can show that, by a suitable affine transformation,
Q can be put into the form

(4.6.2)    Σ_{i=0}^{h−1} v_{2i} v_{2i+1} + L(v),

where L is linear and 2h is the rank of B. In fact, one can see to it that L(v) = 0,
1, or v_{2h}.

(4.6.3) Lemma. The number of points (x_0, x_1, ..., x_{2h−1}) ∈ F_2^{2h} for which
Σ_{i=0}^{h−1} x_{2i} x_{2i+1} = 0 is 2^{2h−1} + 2^{h−1}.

PROOF. If x_0 = x_2 = ... = x_{2h−2} = 0, then there are 2^h choices for
(x_1, ..., x_{2h−1}). Otherwise there are 2^{h−1} choices. So, the number of zeros is
2^h + (2^h − 1) 2^{h−1} = 2^{2h−1} + 2^{h−1}. □
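A quick brute-force confirmation of the lemma, a sketch counting zeros of the standard form for a few values of h:

```python
from itertools import product

# Count the zeros of x0*x1 + x2*x3 + ... + x_{2h-2}*x_{2h-1} over F_2^{2h}
# and compare with Lemma 4.6.3.
for h in (1, 2, 3):
    count = sum(1 for x in product([0, 1], repeat=2 * h)
                if sum(x[2 * i] * x[2 * i + 1] for i in range(h)) % 2 == 0)
    assert count == 2 ** (2 * h - 1) + 2 ** (h - 1)
```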

From (4.6.2) and (4.6.3) we find the following lemma.

(4.6.4) Lemma. Let m be even. If Q(v) is a quadratic form corresponding to a
symplectic form of rank m, then the coset of R(1, m) determined by Q(v) has 2^m
words of weight 2^{m−1} − 2^{m/2−1} and 2^m words of weight 2^{m−1} + 2^{m/2−1}.

(Note that this implies that if Q has rank smaller than m, the corresponding
coset has smaller minimum weight.)
Clearly, a union of cosets of R(1, m) will be a code with minimum distance
at most 2^{m−1} − 2^{m/2−1}. We wish to form a code C by taking the union of
cosets corresponding to certain quadratic forms Q_1, ..., Q_l (with associated
108 6. Cyclic Codes

§6.11. Generalized Reed-Muller Codes


We shall define a class of (extended) cyclic codes over F_q that are equivalent to
Reed-Muller codes in the case q = 2. First, we generalize the idea of Hamming
weight to integers written in the q-ary number system.

(6.11.1) Definition. If q is an integer ≥ 2 and j = Σ_{i=0}^{m−1} ξ_i q^i, with 0 ≤ ξ_i < q
for i = 0, 1, ..., m − 1, then we define w_q(j) := Σ_{i=0}^{m−1} ξ_i.

Note that the sum is taken in ℤ. The new class of codes is defined as follows.

(6.11.2) Definition. The shortened rth order generalized Reed-Muller code (GRM
code) of length n = q^m − 1 over F_q is the cyclic code with generator

    g(x) := ∏^{(r)} (x − α^j),

where α is a primitive element in F_{q^m} and the upper index (r) indicates that the
product is over integers j with 0 ≤ j < q^m − 1 and 0 ≤ w_q(j) < (q − 1)m − r.

The rth order GRM code of length q^m has a generator matrix G* obtained
from the generator matrix G of the shortened GRM code by adjoining a column
of 0s and then a row of 1s.
Note that the set of exponents in this definition of shortened GRM codes is
indeed closed under multiplication by q. Let h(x) be the check polynomial of the
shortened rth order GRM code. Then the dual of this code has the polynomial
h*(x) as generator, where h*(x) is obtained from h(x) by reversing the order of
the powers of x. It is defined in the same way as g(x), now with the condition
0 < w_q(j) ≤ r.
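For q = 2 the dimension implied by this definition can be checked against the Reed-Muller dimension (cf. Theorem 6.11.5 below), a sketch that only counts exponents:

```python
from math import comb

# Count the exponents in Definition 6.11.2 for q = 2 and check that the
# lengthened rth order GRM code of length 2^m has dimension
# sum_{s=0}^{r} C(m, s), the dimension of R(r, m).
def w_q(j, q):
    s = 0
    while j:
        s += j % q
        j //= q
    return s

q = 2
for m in (3, 4, 5):
    for r in range(m):
        n = q ** m - 1
        deg_g = sum(1 for j in range(n) if w_q(j, q) < (q - 1) * m - r)
        dim = (n - deg_g) + 1            # +1 for the adjoined row of 1s
        assert dim == sum(comb(m, s) for s in range(r + 1))
```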
We have the following generalization of Theorem 4.5.8.

(6.11.3) Theorem. The dual of the rth order GRM code of length q^m is equivalent
to a GRM code of order (q − 1)m − r − 1.

PROOF. We have seen above that (x − 1)h*(x) is the generator of the shortened
GRM code of order (q − 1)m − r − 1. If we now lengthen the cyclic codes to
GRM codes, we must show orthogonality of the rows of the generator matrices.
The only ones for which this is not a consequence of the duality of the shortened
codes are the all-one rows. For these, the factor (x − 1) in the generators and the
fact that the length is q^m takes care of that. Since the dimensions of the two codes
add up to q^m, we are done. □
To handle the binary case, we need a lemma.

(6.11.4) Lemma. Let C_1 and C_2 be cyclic codes of length n over F_q with check
polynomials f_1(x) := ∏_{i=1}^{k_1} (x − α_i), resp. f_2(x) := ∏_{i=1}^{k_2} (x − β_i). Let C be the
cyclic code of the same length for which the check polynomial has all the products
α_i β_j as its zeros. Then C contains all the words ab, where a ∈ C_1, b ∈ C_2.

PROOF. We use the representation of cyclic codes by linear recurring sequences,
given at the end of §6.5. We know that the coordinates of a and b can be represented
as sums a_l = Σ_{i=1}^{k_1} c_i α_i^l and b_l = Σ_{j=1}^{k_2} c′_j β_j^l. The result follows immediately
from this representation and the definition of ab. □
The following theorem justifies the terminology of this section.

(6.11.5) Theorem. The rth order binary GRM code of length 2m is equivalent to
the rth order Reed-Muller code of length 2m •

PROOF. The proof is by induction. For r = 0, the codes defined by (4.5.6) and
(6.11.2) are both repetition codes. We know that the binary Hamming code is
cyclic. So, for r = 1 we are done by the corollary to Theorem 4.5.8. Assume
that the assertion is true for some value of r. The check polynomial h*(x) of the
shortened GRM code has zeros α^j, where w_2(j) ≤ r. The zeros of the check
polynomial of the shortened first-order RM code are the powers α^j with w_2(j) = 1.
The theorem now follows from the induction hypothesis, Definition 4.5.6 and Lemma
6.11.4. □

We end this section with a theorem on weights in RM codes. It is another
application of Theorem 6.8.5.

(6.11.6) Theorem. Let F = F(x_1, x_2, ..., x_m) be a polynomial of degree r defined
on F_2^m. We write G ⊆ F if the monomials of G form a subset of the set of monomials
of F. We define v(G) to be the number of variables not involved in G and we denote
the number of monomials in G by |G|. If N(F) is the number of zeros of F in F_2^m,
then

    N(F) = 2^{m−1} + Σ_{G⊆F} (−1)^{|G|} 2^{|G|+v(G)−1}.

PROOF. For every G ⊆ F, we define f(G) to be the number of points in F_2^m where
all the monomials of G have the value 0 and all the other monomials of F have
the value 1. Clearly we have

    Σ_{H⊆G} f(H) = 2^{v(F−G)}

(because this is the number of points in the affine subspace of F_2^m defined by
x_{i_1} = x_{i_2} = ... = x_{i_s} = 1, where the x_{i_t} are the variables occurring in F − G). It
follows from Theorem 6.8.6 that

    f(G) = Σ_{H⊆G} μ(H, G) 2^{v(F−H)}.

Furthermore

    N(F) = Σ_{G⊆F, |F−G| ≡ 0 (mod 2)} f(G).

Since Σ_{G⊆F} f(G) = 2^m, we find



    N(F) = 2^{m−1} + (1/2) Σ_{G⊆F} (−1)^{|F−G|} f(G)
         = 2^{m−1} + (1/2) Σ_{G⊆F} (−1)^{|F−G|} Σ_{H⊆G} μ(H, G) 2^{v(F−H)}
         = 2^{m−1} + (1/2) Σ_{H⊆F} (−1)^{|F−H|} 2^{v(F−H)} Σ_{H⊆G⊆F} 1
         = 2^{m−1} + (1/2) Σ_{H⊆F} (−1)^{|F−H|} 2^{v(F−H)} 2^{|F−H|}
         = 2^{m−1} + (1/2) Σ_{G⊆F} (−1)^{|G|} 2^{v(G)+|G|}.                □

We now apply this to RM codes.

(6.11.7) Theorem. The weights of the codewords in R(r, m) are divisible by
2^{⌈m/r⌉−1}.

PROOF. The code R(r, m) consists of the sequences of values taken by polynomials
of degree at most r in m binary variables. The codeword corresponding to a
polynomial F has weight 2^m − N(F). If G ⊆ F and G has degree d, then
v(G) ≥ m − |G|·d, i.e. |G| ≥ ⌈(m − v(G))/d⌉. Since

    v(G) + ⌈(m − v(G))/d⌉ − 1 ≥ ⌈m/d⌉ − 1 ≥ ⌈m/r⌉ − 1,

every exponent |G| + v(G) − 1 in Theorem 6.11.6 is at least ⌈m/r⌉ − 1, and
the result follows from Theorem 6.11.6. □

§6.12. Comments
The reader who is interested in seeing the trace function and idempotents
used heavily in proofs should read [46, Chapter 15].
A generalization of BCH codes will be treated in Chapter 9. There is
extensive literature on weights, dimension, covering radius, etc. of BCH
codes. We mention the Carlitz-Uchiyama bound which depends on a deep
theorem in number theory by A. Weil. For the bound we refer to [42]. For a
generalization of QR codes to word length n a prime power, in which case the
theory is similar to Section 6.9, we refer to a paper by J. H. van Lint and F. J.
MacWilliams (1978; [45]).

(why?). We do not have to consider combinations of five or more vectors


because they would necessarily have weight 8 or higher. Hence d = 8, and C
is a triple-error-correcting code.
We obtain the Golay [23,12] code by removing a column of G. Although it
is not obvious, no matter which column of G we remove, we get an
equivalent code. This is not true for all codes. The Golay code has minimum
weight 7 and is a triple-error-correcting code.
Golay also discovered a ternary [12,6] Golay code C. This code can be de-
scribed by a generator matrix G = (I | A) where A is as follows:
        ( 0 1 1 1 1 1 )
        ( 1 0 1 2 2 1 )
    A = ( 1 1 0 1 2 2 )
        ( 1 2 1 0 1 2 )
        ( 1 2 2 1 0 1 )
        ( 1 1 2 2 1 0 )
By Theorem 3, C is self-orthogonal, hence self-dual. The fact that d = 6 is
left for Problem 21.

2.5 REED-MULLER CODES

Reed-Muller codes are an infinite family of codes that are defined recur-
sively. Many things are known about them, including their minimum weights.
Berlekamp [2] calls them weak codes that are easy to decode. At modest
lengths, however, there are good Reed-Muller codes; they get weaker as their
lengths increase. However, they are valuable as building blocks for other
codes.
If D_1 is a binary [n, k_1, d_1] code and D_2 is a binary [n, k_2, d_2] code, we
construct a binary code C of length 2n as follows: C = {|u|u + v| where u is
in D_1, v is in D_2}. Then (Problem 22) C is a [2n, k_1 + k_2, min(2d_1, d_2)] code.
Further, if G_i is a generator matrix for D_i, i = 1, 2, then

    ( G_1  G_1 )
    (  0   G_2 )

is a generator matrix for C (Problem 23).
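The |u|u + v| construction is easy to check by brute force. A minimal sketch (not from the text; the choice of D_1 as the [4,3,2] even-weight code and D_2 as the [4,1,4] repetition code is an assumed example):

```python
# Sketch (illustrative example): the |u|u+v| construction over F_2 with
# D1 the [4,3,2] even-weight code and D2 the [4,1,4] repetition code.
from itertools import product

def codewords(gen_rows):
    """All F_2 linear combinations of the generator rows."""
    n = len(gen_rows[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(gen_rows)):
        words.add(tuple(sum(c * r[i] for c, r in zip(coeffs, gen_rows)) % 2
                        for i in range(n)))
    return words

def u_u_plus_v(D1, D2):
    """C = {(u, u+v) : u in D1, v in D2}; the length doubles."""
    return {u + tuple((a + b) % 2 for a, b in zip(u, v))
            for u in D1 for v in D2}

D1 = codewords([(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)])  # [4,3,2]
D2 = codewords([(1, 1, 1, 1)])                               # [4,1,4]
C = u_u_plus_v(D1, D2)
d = min(sum(w) for w in C if any(w))  # min(2*d1, d2) = 4
```

Here C turns out to be a [8, 4, 4] code, as the formula min(2d_1, d_2) predicts.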
One way of defining Reed-Muller codes is recursively using this construc-
tion. Reed-Muller codes, R(r, m), are binary codes that exist at length n = 2^m
for 0 ≤ r ≤ m. R(m, m) is the whole space, and R(0, m) is the repetition
code.
If 0 ≤ r < m, define R(r + 1, m + 1) to be {|u|u + v| where u is in
R(r + 1, m) and v is in R(r, m)}. Let G(r, m) denote a generator matrix of
R(r, m). Then we see that

    G(r + 1, m + 1) = ( G(r + 1, m)  G(r + 1, m) )
                      (      0          G(r, m)  )

is a generator matrix of R(r + 1, m + 1).

Theorem 22. The dimension of R(r, m) equals 1 + (m choose 1) + ... + (m choose r). The
minimum weight of R(r, m) equals 2^{m−r}.

Proof. We leave it to Problem 24 to prove the statement about dimension
by induction on m using the identity (m choose i) + (m choose i−1) = (m+1 choose i).
Suppose that the minimum weight of R(r, m) is 2^{m−r}. Then the minimum
weight of R(r + 1, m + 1) equals min(2 · 2^{m−r−1}, 2^{m−r}) = 2^{m−r}. We will see
below that the formula is correct for m = 2, so it is correct for all m by
induction. ■

It is interesting that one can write down a generator matrix for a specific
R(r, m) without having computed the generator matrices for smaller R(r, m).
We can show (Problem 25) that R(r_1, m) ⊆ R(r_2, m) if r_1 ≤ r_2, so we start
with a basis of R(0, m), namely the all-one vector 1. We will call 1 v_0. We will
extend this to a basis of R(1, m), then to a basis of R(2, m), and so on.
Start with m = 2:

    v_0      1 1 1 1
    v_1      0 0 1 1
    v_2      0 1 0 1
    v_1 v_2  0 0 0 1

As noted, v_0 is a basis of R(0, 2), which has dimension 1 and d = 4. Now
v_0, v_1, v_2 are a basis of R(1, 2), which has dimension 3 and d = 2. Note that
v_1 v_2 has ones in those positions in which both v_1 and v_2 are one. All four
vectors are a basis of the whole space. This is not quite the basis we gave in
our definition, but the rows of this basis are a permutation of the rows in the
defining basis. This holds in general.
Consider now m = 3:

    v_0          1 1 1 1 1 1 1 1
    v_1          0 0 0 0 1 1 1 1
    v_2          0 0 1 1 0 0 1 1
    v_3          0 1 0 1 0 1 0 1
    v_1 v_2      0 0 0 0 0 0 1 1
    v_1 v_3      0 0 0 0 0 1 0 1
    v_2 v_3      0 0 0 1 0 0 0 1
    v_1 v_2 v_3  0 0 0 0 0 0 0 1

We notice that v_1 is chosen to have its first half zero, second half one; v_2 has
its first quarter zero, second quarter one, third quarter zero, fourth quarter
one, and so on. Again v_i v_j has ones only where both v_i and v_j have ones, and
v_1 v_2 v_3 is one only where all three of v_1, v_2, and v_3 are one. Note that R(1, 3) is
equivalent to the code C_5.
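These basis vectors can be generated mechanically. A sketch (assuming the coordinate ordering used in the tables above, where v_1 switches on the second half):

```python
# Sketch (illustrative): the monomial basis rows of R(r, m) as evaluation
# vectors over F_2^m, with the first variable as the most significant bit.
from itertools import combinations, product

def monomial_basis(r, m):
    points = list(product([0, 1], repeat=m))       # the 2^m columns
    return {S: tuple(int(all(p[i] for i in S)) for p in points)
            for size in range(r + 1)
            for S in combinations(range(m), size)}

rows = monomial_basis(2, 3)   # basis of R(2, 3): 1 + 3 + 3 = 7 rows
```

The empty set yields v_0, singletons yield the v_i, and each pair S = {i, j} yields the product row v_i v_j.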
In addition to knowing the dimension and minimum weight of a Reed-
Muller code, we can identify its dual code. Naturally this is another Reed-
Muller code.

Theorem 23. R(m - r - 1, m) and R(r, m) are dual codes.

Proof. There are two things to prove: One is that the dimensions of these
two codes add up to 2^m, which we leave to Problem 27; the other is that these
codes are orthogonal to each other.
We prove the orthogonality by induction on m. We can verify this easily
for m = 2. Suppose that it is true for all R(r, m′) where m′ < m. Recall that

    G(r, m) = ( G(r, m−1)   G(r, m−1)  )
              (     0       G(r−1, m−1) )

and

    G(m − r − 1, m) = ( G(m − r − 1, m−1)   G(m − r − 1, m−1) )
                      (        0            G(m − r − 2, m−1) ).

It is enough to show that these bases are orthogonal to each other.
Any row of (G(r, m−1)  G(r, m−1)) is orthogonal to any row of
(G(m − r − 1, m−1)  G(m − r − 1, m−1)), since each of these rows is a
repeated pair of vectors (a, a) where a has length 2^{m−1}, and two such
vectors are clearly orthogonal to each other. By the induction assumption,
the rows of G(r, m−1) are orthogonal to the rows of G(m − r − 2, m−1).
The same is true for the rows of G(r − 1, m−1) and the rows of
G(m − r − 1, m−1). Since R(m − r − 2, m−1) ⊆ R(m − r − 1, m−1), the rows of
G(r − 1, m−1) are orthogonal to the rows of G(m − r − 2, m−1).
We have checked enough rows to prove our point. ■

We have seen enough of Reed-Muller codes to get the feeling that these
are codes with geometrical connections, but we will not explore these
connections here. We just remark that their geometric nature can be used to
show that the group of each R(r, m) contains the affine group, AG(m, 2), of
order 2^m (2^m − 1)(2^m − 2)(2^m − 2^2) ··· (2^m − 2^{m−1}).
We will spend just a bit of time on decoding Reed-Muller codes, called
Reed decoding. Reed decoding of R(r, m) is based on the fact that it is
possible to construct 2^{m−r} check sums each involving 2^r bits of a received
word where each received bit is used in only one check sum. This is a form of
what is called majority logic decoding.

We illustrate this first for R(1, 3), where we construct four check sums each
involving two bits of a received word where each received bit is used in only
one check sum. Notice that we are again decoding the single-error-correcting
extended Hamming code, but now by Reed decoding. Recall the basis
v_0, v_1, v_2, v_3. Let y = (y_0, y_1, ..., y_7) be a received vector. If there are no
errors, y = a_0 v_0 + a_1 v_1 + a_2 v_2 + a_3 v_3. It is not hard to see that

    a_1 = y_0 + y_4 = y_1 + y_5 = y_2 + y_6 = y_3 + y_7,
    a_2 = y_0 + y_2 = y_1 + y_3 = y_4 + y_6 = y_5 + y_7,
    a_3 = y_0 + y_1 = y_2 + y_3 = y_4 + y_5 = y_6 + y_7.

Hence, if a single error occurs, we can determine a_i, i = 1, 2, 3, by a majority
count. We can then find a_0 by taking a majority vote on all eight bits of
y + a_1 v_1 + a_2 v_2 + a_3 v_3.
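A sketch of this majority vote in Python (the pair indices below are derived from the basis v_1 = 00001111, v_2 = 00110011, v_3 = 01010101 given above; the helper names are ours):

```python
# Sketch of Reed decoding for R(1,3) via the check sums described above.
V = {0: (1,) * 8,
     1: (0, 0, 0, 0, 1, 1, 1, 1),
     2: (0, 0, 1, 1, 0, 0, 1, 1),
     3: (0, 1, 0, 1, 0, 1, 0, 1)}

def encode(a):
    """Codeword a0*v0 + a1*v1 + a2*v2 + a3*v3 over F_2."""
    return tuple(sum(a[i] * V[i][j] for i in range(4)) % 2 for j in range(8))

def majority(bits):
    return int(2 * sum(bits) > len(bits))

def reed_decode_r13(y):
    """Recover (a0, a1, a2, a3) from a received word with at most 1 error."""
    pair_sums = {1: [(0, 4), (1, 5), (2, 6), (3, 7)],  # pairs differing only in x1
                 2: [(0, 2), (1, 3), (4, 6), (5, 7)],
                 3: [(0, 1), (2, 3), (4, 5), (6, 7)]}
    a = {i: majority([(y[p] + y[q]) % 2 for p, q in pairs])
         for i, pairs in pair_sums.items()}
    # strip a1*v1 + a2*v2 + a3*v3, then majority-vote the constant term a0
    residual = [(y[j] + sum(a[i] * V[i][j] for i in (1, 2, 3))) % 2
                for j in range(8)]
    return (majority(residual), a[1], a[2], a[3])
```

With at most one error, three of the four check sums for each a_i are correct, so the majority is always right.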

2.6 PUNCTURING, EXTENDING, AND SHORTENING

If C is an [n, k] binary code, then the code we obtain by adding an overall
parity check is an [n + 1, k] code called the extended C. This process is called
extending C.
Note that the extended C has only even weight vectors.
If C is an [n, k] code, then the code we obtain by removing a column of a
generator matrix of C is called a punctured C. This process is called
puncturing C. Note that for a given C, puncturing at different columns may
give inequivalent codes. The punctured code has length n − 1 and dimension
k or k − 1. If C has minimum weight d, the punctured code usually has
minimum weight d − 1 but could conceivably have minimum weight d.
The binary extended Hamming code is a [2^r, 2^r − 1 − r, 4] code. The
extended Hamming [8,4,4] code, C_5, is equivalent to R(1, 3). In general,
R(r − 2, r), a [2^r, 2^r − 1 − r, 4] code, is the extended Hamming code whose
dual is R(1, r).
If C is an [n, k, d] code, a shortened code C′ of C is the set of all
codewords in C that are 0 in a fixed position, with that position deleted. If the
deleted position had ones in it, C′ is an [n − 1, k − 1, d′] code with d′ ≥ d.
Note that C_4 is a shortened code of C_5. A shortened code of the
[2^r, 2^r − 1 − r, 4] extended Hamming code and the [2^r − 1, r + 1] punctured
first-order Reed-Muller code are duals of each other.
We obtained the Golay [23,12] code by puncturing the Golay [24,12]
code C. Alternatively, we could have started with a definition of the [23,12]
code and obtained C as its extension.

PROBLEMS

1. Prove the four numbered facts about cosets given in Section 2.1.
2. How many errors can C2 (the code constructed after Theorem 1 in
Chapter 1) correct?
3. (a) Construct a standard array for the code C3 (Problem 1.13). For each
coset give its syndrome.
(b) Using the constructions of part (a), decode the received vectors
y_1 = (0,1,1,0,1,1) and y_2 = (0,0,0,1,0,0) for C3.

(c) Find a pattern of 2 errors in C3 that your standard array decoding


scheme will correct and find a pattern of 2 errors that it will not
correct.
4. Compute the coset weight distribution of C3.
5. Calculate the probability of correctly decoding a message in C3 using
syndrome decoding if q = .9 and p = .1.
6. What can you say about the coset weight distribution of a [20,12]
double-error-correcting binary code?
7. Show that a binary code can correct all single errors iff any parity check
matrix has distinct nonzero columns.
8. The code Ham(4,2) has the following parity check matrix H:

        ( 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 )
    H = ( 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 )
        ( 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 )
        ( 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 )

Decode the vectors y_1 = (1,1,1,0,1,1,1,0,0,0,0,0,0,0,0) and y_2 =
(1,1,0,0,1,1,0,0,1,1,0,0,1,1,1) for Ham(4,2).
9. Is there a [12,7,5] binary code?
10. Construct a standard array for the ternary Hamming [4,2,3] code. Here
q = 3 and r = 2.
11. Using the parity check matrix H given below, devise a decoding scheme
for the extended Hamming [8,4] code that is analogous to Hamming
decoding. Correct as many errors as you can and detect as many errors
as you can detect (but cannot correct).
        ( 1 1 1 1 1 1 1 1 )
    H = ( 0 0 0 1 1 1 1 0 )
        ( 0 1 1 0 0 1 1 0 )
        ( 1 0 1 0 1 0 1 0 )
An RS code C can thus be described using the n × k generator matrix G. From the encoding
function Enc defined using (1), it is clear that G is the Vandermonde matrix

        ( 1  α_0      ...  α_0^{k−1}     )
    G = ( 1  α_1      ...  α_1^{k−1}     )
        ( ...         ...  ...           )          (3)
        ( 1  α_{n−1}  ...  α_{n−1}^{k−1} )

The minimum distance d of an RS code C can be computed algebraically using Lemma 2.1.
Lemma 2.1. A polynomial of degree D over a field F has at most D roots (counting multiplicity).
Proof. The lemma is proved by induction on the degree D. The case D = 0 is obvious. Let
f(X) be a nonzero polynomial of degree D over F, and let α ∈ F be a root of f(X). By the division
theorem for polynomials over a field, we can write f(X) = Q(X)(X − α) + R(X), where the remainder
polynomial R(X) has degree less than 1 and is therefore a constant polynomial. Since
R(α) = f(α) = 0, we must have R(X) = 0. Therefore f(X) = (X − α)Q(X). By the induction
hypothesis, Q(X), which has degree D − 1, has at most D − 1 roots. These roots together with α
make up at most D roots for f(X).
Since the degree of the encoded polynomial in (1) is k − 1, a codeword c can have at most
k − 1 entries M(α_i) equal to zero. The minimum distance d, equal to the minimum weight
of any codeword in C, therefore satisfies d ≥ n − k + 1. The Singleton bound (proven in
Lecture 5) gives d ≤ n − k + 1 for any code. Hence, the minimum distance of
the RS code C is d = n − k + 1. The upper bound can also be demonstrated by constructing a
codeword with exactly d = n − k + 1 non-zero entries. Let M(x) = (x − α_0)(x − α_1) ... (x − α_{k−2})
be the encoding polynomial as in (1). Since the degree of M(x) is k − 1, there exists a message
m = [m_0, ..., m_{k−1}] which corresponds to the polynomial M(x), simply by matching coefficients
in (1). Hence, evaluating M(x) at all the α_i, i = 0, ..., n − 1, yields a codeword with k − 1 zeros
followed by n − k + 1 non-zero entries. We record the distance property of RS codes as:
Lemma 2.2. Reed-Solomon codes meet the Singleton bound, i.e., a code of block length n and
dimension k has distance n − k + 1.
RS codes can thus be used to achieve a relative distance of δ = d/n = (n − k + 1)/n = 1 − R + o(1)
for any rate R = k/n. However, the alphabet size q scales as q = Ω(n). By the Plotkin bound, for
codes over an alphabet of size q we have R ≤ 1 − (q/(q − 1))δ, so to meet the Singleton bound q has to
grow with the block length n. We now use similar algebraic ideas to construct codes over smaller
alphabet sizes, at the expense of worse rate vs. distance trade-offs.
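The distance computation above can be checked by enumeration for small parameters. A sketch, assuming the prime field F_7 with n = q = 7 and k = 3:

```python
# Sketch (assumed parameters): the RS code over F_7 with n = q = 7, k = 3.
# Enumerating all q^k - 1 nonzero messages confirms d = n - k + 1.
from itertools import product

q, k = 7, 3
n = q

def rs_encode(m):
    """Evaluate M(x) = m0 + m1*x + ... + m_{k-1}*x^{k-1} at every alpha in F_q."""
    return [sum(mj * pow(alpha, j, q) for j, mj in enumerate(m)) % q
            for alpha in range(q)]

weights = [sum(c != 0 for c in rs_encode(m))
           for m in product(range(q), repeat=k) if any(m)]
d = min(weights)   # n - k + 1 = 5: the Singleton bound is met with equality
```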

3 Reed-Muller Codes
In what follows, we generalize the RS codes described in Section 2 by extending
the polynomial encoding in (1) to multivariate polynomials. The resulting codes are hereafter
referred to as Reed-Muller (RM) codes.1

3.1 Bivariate RM Codes


We begin with the simplest extension, from univariate to bivariate polynomials. Let m be the
matrix [m_ij] for 0 ≤ i ≤ ℓ − 1 and 0 ≤ j ≤ ℓ − 1, denoting a message of k = ℓ^2 symbols in F_q.
The encoding function

    Enc : F_q^{ℓ×ℓ} → F_q^{q×q}

is given by mapping a message m to the codeword given by the matrix [M(α_x, α_y)] for α_x ∈ F_q
and α_y ∈ F_q, where M(x, y) is given by

    M(x, y) = Σ_{i=0}^{ℓ−1} Σ_{j=0}^{ℓ−1} m_ij x^i y^j.     (4)

The resulting RM code is a [q 2 , `2 , d]q linear code. Linearity can be verified as in Section 2. The
minimum distance d of the RM code can be computed using the following result.
Lemma 3.1. The tensor product of two [q, `, d]q RS codes C1 and C2 is the [q 2 , `2 , d2 ]q (bivariate)
RM code C.
Proof. The tensor product of two codes C_1 and C_2 is defined as the code C = C_1 ⊗ C_2 given by

    C_1 ⊗ C_2 = { G_1 m G_2^T | m ∈ F_q^{ℓ×ℓ} },

where G_1 and G_2 are the generator matrices for C_1 and C_2, respectively. Since both C_1 and C_2
are RS codes, the matrices G_1 and G_2 are both equal to the RS generator matrix G given in (3).
Hence, a message m is mapped to the codeword M = G m G^T ∈ C. The entry M(α_x, α_y) in row
x and column y of the codeword M is given by the product g_x m g_y^T, where g_x denotes the row
[1, α_x, ..., α_x^{ℓ−1}] of G, for 0 ≤ x ≤ q − 1. Hence, the product code is such that

    M(α_x, α_y) = Σ_{i=0}^{ℓ−1} Σ_{j=0}^{ℓ−1} m_ij α_x^i α_y^j,

which is consistent with the definition of the bivariate Reed-Muller code C in (4) with x and y
replaced with αx and αy .
The use of tensor product codes and the result of Lemma 3.1 imply that the [q^2, ℓ^2, d]_q Reed-
Muller code has distance d = (q − ℓ + 1)^2 = q^2 − 2q(ℓ − 1) + (ℓ − 1)^2 and rate R = ℓ^2/q^2. Note that
the distance d = (q − ℓ + 1)^2 no longer achieves equality in the Singleton bound d ≤ q^2 − ℓ^2 + 1.
However, the alphabet size q in this case scales as q = O(√n). This demonstrates the trade-off
between optimal distance and smaller alphabet size when moving from RS codes to RM
codes.
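For small parameters the distance (q − ℓ + 1)^2 can be verified exhaustively. A sketch with the assumed values q = 5 and ℓ = 2:

```python
# Sketch (assumed parameters): the bivariate RM code with q = 5, l = 2.
# Exhaustive enumeration confirms d = (q - l + 1)^2 = 16 at length q^2 = 25.
from itertools import product

q, l = 5, 2

def rm2_encode(m):
    """m[i][j] is the coefficient of x^i y^j; evaluate at all of F_q x F_q."""
    return [sum(m[i][j] * pow(ax, i, q) * pow(ay, j, q)
                for i in range(l) for j in range(l)) % q
            for ax in range(q) for ay in range(q)]

weights = []
for flat in product(range(q), repeat=l * l):
    if any(flat):
        m = [flat[i * l:(i + 1) * l] for i in range(l)]
        weights.append(sum(c != 0 for c in rm2_encode(m)))
d = min(weights)   # (q - l + 1)^2 = 16, short of the Singleton bound 22
```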
1 An alternate definition of Reed-Muller codes is common, but Prof. Guruswami finds the multivariate polynomial
interpretation clearer.

3.2 Multivariate RM Codes
The bivariate extension of Section 3.1 generalizes in the natural way to multivariate polynomials.
A multivariate RM code C with v variables x_1, ..., x_v can be interpreted as the tensor product code
of v RS codes C_1, ..., C_v. The encoding function

    Enc : F_q^{ℓ_1 × ··· × ℓ_v} → F_q^{q × ··· × q}

maps a message m = [m_{i_1...i_v}] to the evaluations of the polynomial M(x_1, ..., x_v) given by

    M(x_1, ..., x_v) = Σ_{i_1=0}^{ℓ_1−1} ··· Σ_{i_v=0}^{ℓ_v−1} m_{i_1...i_v} ∏_{j=1}^{v} x_j^{i_j}.     (5)

The resulting RM code is a [q^v, ∏_{j=1}^{v} ℓ_j, ∏_{j=1}^{v} d_j]_q linear code. Linearity can be verified using an
identical method to that of Section 2. The minimum distance d of the multivariate RM code can be
proven using the multivariate extension of Lemma 3.1 or using the following result.
Lemma 3.2. A non-zero polynomial P(x_1, ..., x_v) over a field F_q with maximum degree d_i in the
variable x_i is non-zero in at least ∏_{i=1}^{v} (q − d_i) points of F_q^v.
Proof. We use induction on v. The case v = 1 is the content of Lemma 2.1. Write P(x_1, ..., x_v) as

    P(x_1, ..., x_v) = R_{d_v}(x_1, ..., x_{v−1}) x_v^{d_v} + ... + R_0(x_1, ..., x_{v−1}),

which is a polynomial of degree d_v in the variable x_v. By the induction hypothesis, there are at
least ∏_{i=1}^{v−1} (q − d_i) choices of (x_1, ..., x_{v−1}) for which the leading coefficient
R_{d_v}(x_1, ..., x_{v−1}) is non-zero. For each such choice, P is a non-zero polynomial of degree d_v
in x_v, which by Lemma 2.1 is non-zero for at least q − d_v values of x_v.

The following construction demonstrates how equality is achieved in the bound provided by
Lemma 3.2: it suffices that the polynomial have exactly d_i = ℓ_i − 1 distinct roots in each
variable. Hence, let M_i(x_i) be the product (x_i − α_{i,1}) ... (x_i − α_{i,ℓ_i−1}), where the α_{i,j} are
distinct, and let M(x_1, ..., x_v) = ∏_{i=1}^{v} M_i(x_i).

3.3 Variant on Multivariate Reed-Muller Codes


We next relax the condition on multivariate RM codes that independently bounds the maximum
degree of each variable x_i, and instead allow codeword polynomials M(x_1, ..., x_v) with total degree at
most ℓ. The encoding function is similar to that in Section 3.2, with the encoding polynomial M
given by

    M(x_1, ..., x_v) = Σ_{i_1, ..., i_v ≥ 0, i_1 + ... + i_v ≤ ℓ} m_{i_1...i_v} ∏_{j=1}^{v} x_j^{i_j}.     (6)
The resulting code C is a [q^v, k, d]_q linear code, where k is the total number of tuples (i_1, ..., i_v)
of nonnegative integers satisfying i_1 + ... + i_v ≤ ℓ. The values of k and d are computed using the
following results.
Observation 3.3. The value of k for the given code C is (v + ℓ choose v) (stated without proof).
Lemma 3.4. A non-zero polynomial P(x_1, ..., x_v) of total degree at most ℓ over F_q is zero on at
most a fraction ℓ/q of the points of F_q^v.
Proof. The statement is proved by induction on v. The case v = 1 states that a univariate polynomial of
degree ℓ has at most ℓ roots, which is Lemma 2.1. For v > 1, let ℓ_1 be the largest power of x_v
appearing in P and write

    P(x_1, ..., x_v) = R_{ℓ_1}(x_1, ..., x_{v−1}) x_v^{ℓ_1} + ... + R_0(x_1, ..., x_{v−1}),

where R_{ℓ_1} is non-zero and has total degree at most ℓ − ℓ_1. For a uniformly random point
(α_1, ..., α_v), the probability that P(α_1, ..., α_v) = 0 is bounded as

    Pr[P(α_1, ..., α_v) = 0] ≤ Pr[R_{ℓ_1}(α_1, ..., α_{v−1}) = 0]
                               + Pr[P(α_1, ..., α_v) = 0 | R_{ℓ_1}(α_1, ..., α_{v−1}) ≠ 0]
                             ≤ (ℓ − ℓ_1)/q + ℓ_1/q = ℓ/q,     (7)

where we used the induction hypothesis for R_{ℓ_1}, which has total degree at most ℓ − ℓ_1, and the
fact that a univariate polynomial in x_v of degree ℓ_1 has at most ℓ_1 roots.
The result of Lemma 3.4 can then be used to show that (assuming ℓ ≤ q) the distance
of the code C can be bounded as d ≥ (1 − ℓ/q) q^v. This again suggests that RM codes do not provide
constant R, δ > 0 for constant q, i.e., q must increase with n.
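Lemma 3.4 is easy to sanity-check numerically. A sketch for one assumed polynomial of total degree ℓ = 3 in v = 2 variables over F_11:

```python
# Sketch: a numeric check of Lemma 3.4 for one assumed polynomial of
# total degree l = 3 in v = 2 variables over F_11.
from itertools import product

q, l = 11, 3

def P(x, y):
    return (x * x * y + 3 * x * y + 7) % q   # total degree 3

zeros = sum(P(x, y) == 0 for x, y in product(range(q), repeat=2))
assert zeros <= (l / q) * q ** 2   # at most an l/q fraction of the q^2 points
```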

4 Binary Reed-Muller Codes


We now shift our attention to the "original" Reed-Muller codes. These are the binary codes de-
fined by Muller (1954); Reed (1954) gave a polynomial time majority logic decoder for them
(which we will discuss later). The binary RM code C results from evaluating a multilinear encoding
polynomial M, given by

    M(x_1, ..., x_v) = Σ_{S : |S| ≤ ℓ} c_S ∏_{i∈S} x_i,

at all 2^v points of F_2^v (the coefficients c_S are the message bits). The binary RM code C is a
[2^v, Σ_{i=0}^{ℓ} (v choose i), d]_2 linear code, where the distance d is given by the following lemma.
Lemma 4.1. The minimum distance d of the binary RM code described above is d = 2^{v−ℓ}.

Proof. Consider the encoding polynomial M(x_1, ..., x_v) = ∏_{i=1}^{ℓ} x_i resulting from the message
with c_S = 1 if and only if S = {1, ..., ℓ}. There are exactly 2^{v−ℓ} choices of
(x_1, ..., x_v) that make M non-zero, namely those with x_1 = ... = x_ℓ = 1. The distance d is thus
bounded as d ≤ 2^{v−ℓ}. Next, consider an arbitrary non-zero polynomial M(x_1, ..., x_v) and let ∏_{i=1}^{r} x_i be
a maximal monomial of M, i.e., reorder the indices {1, ..., v} such that

    M(x_1, ..., x_v) = ∏_{i=1}^{r} x_i + R(x_1, ..., x_v),

where, by maximality, no monomial of R(x_1, ..., x_v) contains all of x_1, ..., x_r. There are 2^{v−r}
ways to assign values to the variables x_{r+1}, ..., x_v, and none of these assignments can cancel the
maximal monomial: the resulting polynomial in x_1, ..., x_r still contains x_1 ··· x_r, hence is non-zero
and has at least one non-zero evaluation. This leads to the bound d ≥ 2^{v−r}, which implies d ≥ 2^{v−ℓ}
since r ≤ ℓ by the definition of M.
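Lemma 4.1 can be confirmed by brute force for small parameters. A sketch with the assumed values v = 4, ℓ = 2, i.e. the [16, 11, 4] code R(2, 4):

```python
# Sketch: brute-force check of Lemma 4.1 for v = 4, l = 2, i.e. all
# multilinear polynomials of degree <= 2 in 4 binary variables.
from itertools import combinations, product

v, l = 4, 2
points = list(product([0, 1], repeat=v))
monomials = [S for size in range(l + 1) for S in combinations(range(v), size)]

def weight(coeffs):
    """Hamming weight of the evaluation vector of sum_S c_S prod_{i in S} x_i."""
    return sum(sum(c for c, S in zip(coeffs, monomials)
                   if all(p[i] for i in S)) % 2
               for p in points)

d = min(weight(c) for c in product([0, 1], repeat=len(monomials)) if any(c))
# d == 2 ** (v - l): the minimum distance of R(2, 4)
```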

5 Summary
Two families of linear codes, Reed-Solomon and Reed-Muller, were presented and analyzed using
various algebraic properties. Though Reed-Solomon codes can be used to achieve R, δ > 0,
and in fact achieve the optimal trade-off matching the Singleton bound, this can only be done if
the alphabet size q increases linearly in the block length, i.e., q ≥ n. Reed-Muller codes use
multivariate polynomials to give codes over smaller alphabets, although they are unable to give
codes with R, δ > 0 over a bounded alphabet size.

CSE 533: Error-Correcting Codes (Autumn 2006)

Lecture 8: Reed Muller Codes

Lecturer: Venkat Guruswami Scribe: Prasad Raghavendra

In the previous lecture, we defined Reed-Muller codes and their variants. Today, we will study
an efficient algorithm for decoding a Reed-Muller code when the number of errors is less than half
the distance. Then we shall return to our original goal of constructing explicit codes with constant
relative distance and rate. Towards this, we will convert Reed-Solomon codes into binary codes.

1 Recap
Let R(r, m) denote the r-th order Reed-Muller code. The messages consist of multilinear
polynomials in variables X_1, X_2, ..., X_m of degree at most r. Recall that the length of the code is
n = 2^m, the dimension is k = Σ_{i=0}^{r} (m choose i), and the distance is 2^{m−r}, i.e., R(r, m) is a
[2^m, Σ_{i=0}^{r} (m choose i), 2^{m−r}]_2 linear code.
Some interesting special cases of the Reed-Muller code:
m √
• With r = m2 , R(r, m) gives a code with rate 12 and distance d = 2 2 = n. Although this is
not constant distance, it is a fairly non-trivial code with a good rate.

• With r = 1, R(r, m) yeilds a linear code with parameters [2m , m + 1, 2m−1 ]. Further with
r = 1, a code word consists of the evaluation of a degree 1(linear) function over Fm
2 . Hence
R(1, m) code consists of the Hadammard codes and their complements.

2 Reed’s algorithm
2.1 Notation
We will use X = (x_1, ..., x_m) to denote an element of F_2^m. For a subset S ⊆ {1, ..., m}, let X_S
denote the restriction of X to the indices in S, i.e., the |S|-dimensional vector consisting of the x_i for i ∈ S.
Denote by S̄ the complement of a set S.
Let R(r, m) be a binary Reed-Muller code. Let f : F_2^m → F_2 be given as a table of values. If f
is a codeword then f is a polynomial P(x_1, ..., x_m) of degree at most r. For a general function f,
define the distance from a polynomial P(x_1, ..., x_m) as follows:

    Δ(f, P) = |{a ∈ F_2^m | f(a) ≠ P(a)}|

2.2 Algorithm
The input is a received word at distance less than d/2 from a codeword. In particular, we are
given a function f : F_2^m → F_2 such that there exists a polynomial P of degree r with
Δ(f, P) < 2^{m−r}/2. The goal of the algorithm is to output the polynomial P.
Let us say P is of the form

    P(X) = Σ_{S ⊆ {1,...,m}, |S| ≤ r} c_S ∏_{i∈S} x_i

For a subset S ⊆ {1, 2, ..., m}, define the polynomial

    R_S(X) = ∏_{i∈S} x_i

Observation 2.1. For any proper subset T ⊂ S, we have

    Σ_{a ∈ F_2^{|S|}} R_T(a) = 0

Proof: Let i ∈ S − T; then we can split the summation according to the value of a_i:

    Σ_{a ∈ F_2^{|S|}} R_T(a) = Σ_{a: a_i = 0} R_T(a) + Σ_{a: a_i = 1} R_T(a)
                             = 2 Σ_{a: a_i = 0} R_T(a) = 0,

since R_T does not involve x_i.

Observation 2.2. For any set S,

    Σ_{a ∈ F_2^{|S|}} R_S(a) = 1

Proof: Recall that R_S(a) = ∏_{i∈S} a_i. Hence R_S(a) is 0 for all but one of the values a ∈ F_2^{|S|},
namely the all-ones vector, and the identity follows.

Lemma 2.3. For every b ∈ F_2^{m−r}, every set S of size r, and every degree r polynomial P, the
following is true:

    Σ_{a ∈ F_2^m, a_S̄ = b} P(a) = c_S

Proof: Let P_b be the polynomial obtained by substituting b ∈ F_2^{m−r} for the variables {x_i | i ∈ S̄}.
Hence P_b is of the following form:

    P_b(X) = c_S R_S(X) + Σ_{T ⊂ S} a_T R_T(X)

for some coefficients a_T. Then

    Σ_{a ∈ F_2^m, a_S̄ = b} P(a) = Σ_{y ∈ F_2^r} P_b(y)
                                = Σ_{y ∈ F_2^r} c_S R_S(y) + Σ_{T ⊂ S} a_T Σ_{y ∈ F_2^r} R_T(y)

Using Observations 2.1 and 2.2 with the above equation, it follows that

    Σ_{a ∈ F_2^m, a_S̄ = b} P(a) = c_S

Lemma 2.3 suggests an algorithmic method to obtain the coefficients c_S of the polynomial
P: simply sum the values of the polynomial over all a ∈ F_2^m with a_S̄ = b, for some b ∈ F_2^{m−r}.
Further, the 2^{m−r} choices of b give sums that range over disjoint sets of points in F_2^m; that is,
the sets {a ∈ F_2^m | a_S̄ = b} are all disjoint.
By our assumption, f differs from the polynomial P in at most 2^{m−r−1} − 1 positions. Hence
for at least 2^{m−r} − (2^{m−r−1} − 1) values of b we have

    Σ_{a ∈ F_2^m, a_S̄ = b} f(a) = Σ_{a ∈ F_2^m, a_S̄ = b} P(a) = c_S

Out of the 2^{m−r} sums of the form Σ_{a: a_S̄ = b} f(a), more than a 1/2 fraction of them are equal to
c_S. Thus a natural way to compute c_S is to take the majority of all these sums. Towards finding
the lower degree terms of P, reduce the problem as follows:

    f′ = f − Σ_{|S|=r} c_S R_S(x)
    P′ = P − Σ_{|S|=r} c_S R_S(x)

Since Δ(f′, P′) = Δ(f, P) < 2^{m−r−1} < 2^{m−(r−1)}/2, and P′ has degree r − 1, the above
procedure can be used to find the degree r − 1 terms of P′. Hence, iteratively, all the coefficients
c_S of P can be computed.
The formal description of the algorithm is given below.

Reed's Algorithm

Input: A function f : F_2^m → F_2 such that there exists a polynomial P of degree r
with Δ(f, P) < 2^{m−r}/2.
Output: The polynomial P.

    t ← r, F ← f, P ← 0
    While t ≥ 0 do
        For each S ⊆ {1, ..., m} with |S| = t
            c_S = Majority over all b ∈ F_2^{m−t} of  Σ_{a ∈ F_2^m, a_S̄ = b} F(a)
            P = P + c_S ∏_{i∈S} x_i
            For each x ∈ F_2^m
                F(x) = F(x) − c_S ∏_{i∈S} x_i
        t ← t − 1
    Output P
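A direct transcription of the algorithm in Python (the dictionary representation of f and the choice to compute all coefficients of one degree level before subtracting are ours):

```python
# Sketch of Reed's algorithm for small r, m. f maps points of F_2^m
# (tuples of bits) to bits; the output maps each monomial S to c_S.
from itertools import combinations, product

def reed_decode(f, r, m):
    F = dict(f)             # working copy; degree-t terms are removed per level
    coeffs = {}
    for t in range(r, -1, -1):
        level = {}
        for S in combinations(range(m), t):
            Sbar = [i for i in range(m) if i not in S]
            votes = []
            for b in product([0, 1], repeat=len(Sbar)):
                total = 0   # sum of F(a) over all a with a_Sbar = b
                for y in product([0, 1], repeat=len(S)):
                    a = [0] * m
                    for i, yi in zip(S, y):
                        a[i] = yi
                    for i, bi in zip(Sbar, b):
                        a[i] = bi
                    total ^= F[tuple(a)]
                votes.append(total)
            level[S] = int(2 * sum(votes) > len(votes))  # majority of the sums
        coeffs.update(level)
        for S, c in level.items():
            if c:           # subtract c_S * prod_{i in S} x_i from F
                for p in F:
                    if all(p[i] for i in S):
                        F[p] ^= 1
    return coeffs
```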

3 Extension Fields
For every prime p, the field F_p consists of {0, 1, ..., p − 1} with addition and multiplication modulo
p. For any integer k, the extension field F_{p^k} is obtained as follows.
Consider the ring F_p[x] consisting of polynomials in one variable x with coefficients in
F_p. Let q(x) be an irreducible polynomial of degree k in F_p[x]. Since q(x) is irreducible, for any
other polynomial r(x) one of the following two cases is true:

• gcd(q(x), r(x)) = 1: By Euclid's algorithm, we can find a(x), b(x) such that a(x)r(x) +
b(x)q(x) = 1, i.e., there exists a(x) such that a(x)r(x) ≡ 1 mod q(x).

• gcd(q(x), r(x)) = q(x): In this case r(x) ≡ 0 mod q(x).

That is, each polynomial r(x) is either 0 mod q(x) or has an inverse a(x). Hence the set of
polynomials modulo q(x) forms a field. This field consists of all polynomials over F_p with degree
less than k and is denoted by F_p[x]/(q(x)). Clearly there are p^k elements in this field. Further, it
can be shown that the fields obtained from different degree k irreducible polynomials all behave the
same way, i.e., are isomorphic to each other.

Notice that the set of polynomials F_p[x] forms a vector space over F_p. Hence the field F_p[x]/(q(x))
also forms a vector space over the field F_p. Every polynomial of degree less than k can be repre-
sented naturally as a length k vector over F_p. This gives a representation/mapping of elements of
the extension field F_p[x]/(q(x)) as k-dimensional F_p vectors. So we have a mapping

    φ : F_p[x]/(q(x)) → F_p^k

In fact the above mapping φ is linear in the following sense: for any two elements
r(x), s(x) of F_p[x]/(q(x)) we have

    φ(r(x) + s(x)) = φ(r(x)) + φ(s(x))

In summary, elements of the extension field can be represented as k-dimensional vectors over
F_p in a way that preserves linearity.
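A sketch of this construction for p = 2, representing elements of F_2[x]/(q(x)) as bit masks; q(x) = x^3 + x + 1 is an assumed irreducible polynomial giving F_8:

```python
# Sketch: arithmetic in F_2[x]/(q(x)) with polynomials stored as bit masks
# (LSB = constant term), using the assumed irreducible q(x) = x^3 + x + 1.
def poly_mulmod(r, s, q_poly, k):
    """Multiply r and s as F_2 polynomials, then reduce modulo q(x)."""
    prod = 0
    for i in range(k):
        if (s >> i) & 1:
            prod ^= r << i              # shifting = multiplying by x^i
    for i in range(2 * k - 2, k - 1, -1):
        if (prod >> i) & 1:
            prod ^= q_poly << (i - k)   # rewrite x^i using q(x) = 0
    return prod

q_poly = 0b1011   # x^3 + x + 1
# walk the powers of x: x turns out to generate all of F_8 minus zero
elems = set()
e = 1
for _ in range(7):
    e = poly_mulmod(e, 0b010, q_poly, 3)
    elems.add(e)
```

Addition is just XOR of the bit masks, which is exactly the linearity of φ noted above.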

3.1 Irreducible Polynomials


Explicit irreducible polynomials are necessary to construct extension fields. It can be shown that
there is an abundance of irreducible polynomials; a non-negligible fraction of the degree k
polynomials are irreducible. Explicit constructions of irreducible polynomials are also known.
Lemma 3.1. For each k > 0 the following polynomial over F_2 is irreducible:

    P(x) = x^{2·3^{k−1}} + x^{3^{k−1}} + 1

Proof: Suppose not; say there are non-trivial polynomials Q(x), R(x) over F_2 such that P(x) =
Q(x)R(x). Observe that

    x^{3^k} − 1 = (x^{3^{k−1}} − 1)(x^{2·3^{k−1}} + x^{3^{k−1}} + 1)

Let F̄_2 be the algebraic closure of F_2. All our arguments will be over F̄_2, of which F_2 is a subfield.
Let ζ be a primitive 3^k-th root of 1. So we have

    ζ^{3^k} = 1,   ζ^{3^{k−1}} ≠ 1.

Since x^{3^k} − 1 = (x^{3^{k−1}} − 1)P(x), we get P(ζ) = 0. Hence either Q(ζ) = 0 or R(ζ) = 0; without
loss of generality we can assume Q(ζ) = 0. Recall that Φ : x → x^2 gives an automorphism of
F̄_2 known as the Frobenius mapping. That is, for any two elements x, y we have

    Φ(x) + Φ(y) = Φ(x + y)
    Φ(x)Φ(y) = Φ(xy)
Further, the elements of F_2 = {0, 1} are fixed by the mapping Φ : x → x^2. In particular, the
coefficients of the polynomial Q are fixed by Φ. Therefore, if we apply Φ to the equation Q(ζ) = 0,
we get Q(ζ^2) = 0. Applying Φ repeatedly, we can conclude that ζ, ζ^2, ζ^4, . . . , ζ^(2^t), . . . are all
roots of Q. Now let us count the number of distinct elements of the form ζ^(2^i). Let n = 3^k; then
Euler's totient function gives φ(n) = 2·3^(k−1), and by Euler's theorem

    2^φ(n) ≡ 1 (mod n).

Clearly this implies that ζ^(2^φ(n)) = ζ. Since ζ is a primitive n-th root of unity, ζ^(2^i) = ζ implies
2^i ≡ 1 (mod n). It can be shown that 2 is a primitive root modulo n = 3^k; therefore for any
0 < i < φ(n), 2^i ≢ 1 (mod n). Hence the elements ζ, ζ^2, . . . , ζ^(2^(φ(n)−1)) are all distinct. Recall
that all these elements are roots of Q. But since P(x) = Q(x)R(x) with R nonconstant, the degree
of Q is less than deg P = φ(n). This is a contradiction, since Q cannot have more roots than its
degree.
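The lemma is easy to sanity-check for small k by brute force. The following sketch (polynomials encoded as Python ints with bit i holding the coefficient of x^i; all helper names are mine) tests P(x) against every candidate divisor of degree up to half its own:

```python
def poly_mod(a, b):
    """Remainder of a divided by b over F_2 (polynomials as int bitmasks)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def is_irreducible(p):
    """Trial division by every monic polynomial of degree up to deg(p)/2."""
    dp = p.bit_length() - 1
    for d in range(1, dp // 2 + 1):
        for low in range(2 ** d):
            q = (1 << d) | low          # monic candidate divisor of degree d
            if poly_mod(p, q) == 0:
                return False
    return True

for k in (1, 2, 3):
    e = 3 ** (k - 1)
    p = (1 << (2 * e)) | (1 << e) | 1   # x^(2*3^(k-1)) + x^(3^(k-1)) + 1
    assert is_irreducible(p)
```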

4 Reed Solomon Codes


Reed-Solomon codes are explicit linear codes that are optimal in that they meet the Singleton
bound. However, these codes are over an alphabet of size q ≥ n. Now let us try to obtain good
codes over a smaller alphabet (say F_2) using Reed-Solomon codes.
A natural thing to do is to represent each symbol by a binary string of length log q. Let us
assume q is a power of 2, i.e. q = 2^t. This can be arranged by choosing a Reed-Solomon code
over the extension field defined by a degree-t irreducible polynomial over F_2. Then, as observed
earlier, there is a natural representation of elements of the extension field as t-dimensional vectors
over F_2. Recall that this representation preserves linearity. This is good news, since we obtain
not only a binary code but a binary linear code.
Let us investigate the parameters of the binary linear code obtained. Suppose we started with
an [n, k, d]_{2^t} Reed-Solomon code; then the length of our new code is nt, and its dimension is kt.
We know that two codewords of the original code may differ in as few as d symbols, and it is
possible that each of these d symbols differs in just one bit. Hence, in the worst case, the distance
remains only d. Therefore the code obtained has parameters [nt, kt, d]_2.
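As a tiny numeric sketch of this parameter bookkeeping (the function name is illustrative; the example uses the classic [255, 223, 33] Reed-Solomon code over F_{2^8}):

```python
def binary_image_params(n, k, d, t):
    """Parameters guaranteed for the binary image of an [n, k, d]_{2^t} code:
    length and dimension scale by t, but only distance d is guaranteed,
    since each of the d differing symbols may differ in a single bit."""
    return n * t, k * t, d

assert binary_image_params(255, 223, 33, 8) == (2040, 1784, 33)
```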
However, Reed-Solomon codes are popular in practice because errors in real-life channels tend
to be burst errors. Hence it is more likely that all d errors fall together in the same symbol, or in
very few symbols; in this case the codeword can clearly be recovered.
We still have not reached our goal of constructing a family of binary codes with constant rate
and constant distance.
An Idea:
The conversion from Reed-Solomon codes to binary linear codes failed because by changing
one bit we could change the original symbol. In other words, two symbols could differ in just
one bit of their representation. Hence we should represent the original alphabet so that different
symbols differ in several places. This is exactly the same as the original problem of error-correcting
codes. So the idea is to encode the symbols of the large alphabet using codewords of
an error-correcting code. In the next class, we will use this idea to construct binary codes with
constant rate and constant distance.

6.4. REED-MULLER CODES 65

their squared Euclidean distance will be (see Footnote 2)

    ‖s(x) − s(y)‖² = 4α² d_H(x, y).

Therefore

    d²_min(s(C)) = 4α² d_H(C) = 4α² d,

where d = d_H(C) is the minimum Hamming distance of C.
It follows that the nominal coding gain of s(C) is

    γ_c(s(C)) = d²_min(s(C)) / (4E_b) = kd/n.    (6.1)
Thus the parameters (n, k, d) directly determine γc (s(C)) in this very simple way. (This gives
another reason to prefer Eb /N0 to SNRnorm in the power-limited regime.)
Moreover, every vector s(x) ∈ s(C) has the same number of nearest neighbors Kmin (s(x)),
namely the number Nd of nearest neighbors to x ∈ C. Thus Kmin (s(C)) = Nd , and Kb (s(C)) =
Nd /k.
Consequently the union bound estimate of P_b(E) is

    P_b(E) ≈ K_b(s(C)) Q(√(2γ_c(s(C)) E_b/N_0))
           = (N_d/k) Q(√(2(kd/n) E_b/N_0)).    (6.2)

In summary, the parameters and performance of the binary signal constellation s(C) may be
simply determined from the parameters (n, k, d) and Nd of C.
Exercise 1. Let C be an (n, k, d) binary linear code with d odd. Show that if we append an
overall parity check p = Σ_i x_i to each codeword x, then we obtain an (n + 1, k, d + 1) binary
linear code C′ with even minimum distance. Show that the nominal coding gain γ_c(C′) is always
greater than γ_c(C) if k > 1. Conclude that we can focus primarily on linear codes with d even.
Exercise 2. Show that if C is a binary linear block code, then in every coordinate position
either all codeword components are 0 or half are 0 and half are 1. Show that a coordinate
in which all codeword components are 0 may be deleted (“punctured”) without any loss in
performance, but with savings in energy and in dimension. Show that if C has no such all-zero
coordinates, then s(C) has zero mean: m(s(C)) = 0.

6.4 Reed-Muller codes


The Reed-Muller (RM) codes are an infinite family of binary linear codes that were among the
first to be discovered (1954). For block lengths n ≤ 32, they are the best codes known with
minimum distances d equal to powers of 2. For greater block lengths, they are not in general
the best codes known, but in terms of performance vs. decoding complexity they are still quite
good, since they admit relatively simple ML decoding algorithms.
Footnote 2: Moreover, the Euclidean-space inner product of s(x) and s(y) is

    ⟨s(x), s(y)⟩ = (n − d_H(x, y))α² + d_H(x, y)(−α²) = (n − 2d_H(x, y))α².

Therefore s(x) and s(y) are orthogonal if and only if d_H(x, y) = n/2. Also, s(x) and s(y) are
antipodal (s(x) = −s(y)) if and only if d_H(x, y) = n.
66 CHAPTER 6. INTRODUCTION TO BINARY BLOCK CODES

For any integers m ≥ 0 and 0 ≤ r ≤ m, there exists an RM code, denoted by RM(r, m), that
has length n = 2^m and minimum Hamming distance d = 2^(m−r).
For r = m, RM(m, m) is defined as the universe (2^m, 2^m, 1) code. It is helpful also to define
RM codes for r = −1 by RM(−1, m) = (2^m, 0, ∞), the trivial code of length 2^m. Thus for
m = 0, the two RM codes of length 1 are the (1, 1, 1) universe code RM(0, 0) and the (1, 0, ∞)
trivial code RM(−1, 0).
The remaining RM codes for m ≥ 1 and 0 ≤ r < m may be constructed from these elementary
codes by the following length-doubling construction, called the |u|u + v| construction (originally
due to Plotkin). RM(r, m) is constructed from RM(r − 1, m − 1) and RM(r, m − 1) as

RM(r, m) = {(u, u + v) | u ∈ RM(r, m − 1), v ∈ RM(r − 1, m − 1)}. (6.3)

From this construction, it is easy to prove the following facts by recursion:

(a) RM(r, m) is a binary linear block code with length n = 2^m and dimension

    k(r, m) = k(r, m − 1) + k(r − 1, m − 1).

(b) The codes are nested, in the sense that RM(r − 1, m) ⊆ RM(r, m).

(c) The minimum distance of RM(r, m) is d = 2^(m−r) if r ≥ 0 (if r = −1, then d = ∞).

We verify that these assertions hold for RM(0, 0) and RM(−1, 0).
For m ≥ 1, the linearity and length of RM(r, m) are obvious from the construction. The
dimension (size) follows from the fact that (u, u + v) = 0 if and only if u = v = 0.
Exercise 6 below shows that the recursion for k(r, m) leads to the explicit formula

    k(r, m) = Σ_{0≤j≤r} (m choose j),    (6.4)

where (m choose j) denotes the combinatorial coefficient m!/(j!(m − j)!).
The nesting property for m follows from the nesting property for m − 1.
Finally, we verify that the minimum nonzero weight of RM(r, m) is 2^(m−r) as follows:

(a) if u = 0, then w_H((0, v)) = w_H(v) ≥ 2^(m−r) if v ≠ 0, since v ∈ RM(r − 1, m − 1);

(b) if u + v = 0, then u = v ∈ RM(r − 1, m − 1) and w_H((v, 0)) ≥ 2^(m−r) if v ≠ 0;

(c) if u ≠ 0 and u + v ≠ 0, then both u and u + v are in RM(r, m − 1) (since RM(r − 1, m − 1)
is a subcode of RM(r, m − 1)), so

    w_H((u, u + v)) = w_H(u) + w_H(u + v) ≥ 2 · 2^(m−r−1) = 2^(m−r).

Equality clearly holds for (0, v), (v, 0) or (u, u) if we choose v or u as a minimum-weight
codeword from their respective codes.
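These facts can be checked mechanically for small codes. The sketch below (all helper names are mine) builds RM(r, m) directly from the |u|u + v| construction and verifies n = 2^m, k(r, m) = Σ_{j≤r} (m choose j), and d = 2^(m−r):

```python
from itertools import product
from math import comb, inf

def rm(r, m):
    """RM(r, m) as a list of binary tuples, via the |u|u + v| construction."""
    n = 2 ** m
    if r < 0:
        return [(0,) * n]                                     # trivial code
    if r >= m:
        return [tuple(w) for w in product((0, 1), repeat=n)]  # universe code
    A = rm(r, m - 1)
    B = rm(r - 1, m - 1)
    return [tuple(u) + tuple(a ^ b for a, b in zip(u, v)) for u in A for v in B]

def params(C):
    """(n, k, d) of a binary code given as a list of 2^k distinct tuples."""
    n = len(C[0])
    k = len(C).bit_length() - 1
    d = min((sum(c) for c in C if any(c)), default=inf)
    return n, k, d

for r, m in [(1, 3), (2, 4), (1, 4)]:
    n, k, d = params(rm(r, m))
    assert n == 2 ** m
    assert k == sum(comb(m, j) for j in range(r + 1))
    assert d == 2 ** (m - r)
```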

The |u|u + v| construction suggests the following tableau of RM codes, with one row per length
n = 2^m:

    n = 1:  (1,1,1), (1,0,∞)
    n = 2:  (2,2,1), (2,1,2), (2,0,∞)
    n = 4:  (4,4,1), (4,3,2), (4,1,4), (4,0,∞)
    n = 8:  (8,8,1), (8,7,2), (8,4,4), (8,1,8), (8,0,∞)
    n = 16: (16,16,1), (16,15,2), (16,11,4), (16,5,8), (16,1,16), (16,0,∞)
    n = 32: (32,32,1), (32,31,2), (32,26,4), (32,16,8), (32,6,16), (32,1,32), (32,0,∞)

The diagonals of the tableau are labeled as follows: r = m, d = 1 (universe codes); r = m − 1,
d = 2 (SPC codes); r = m − 2, d = 4 (extended Hamming codes); k = n/2 (self-dual codes);
r = 1, d = n/2 (biorthogonal codes); r = 0, d = n (repetition codes); and r = −1, d = ∞
(trivial codes).

Figure 2. Tableau of Reed-Muller codes.

In this tableau each RM code lies halfway between the two codes of half the length that are
used to construct it in the |u|u + v| construction, from which we can immediately deduce its
dimension k.
Exercise 3. Compute the parameters (k, d) of the RM codes of lengths n = 64 and 128.
There is a known closed-form formula for the number N_d of codewords of minimum weight
d = 2^(m−r) in RM(r, m):

    N_d = 2^r · Π_{0≤i≤m−r−1} (2^(m−i) − 1)/(2^(m−r−i) − 1).    (6.5)

Example 4. The number of weight-8 words in the (32, 16, 8) code RM(2, 5) is

    N_8 = 4 · (31 · 15 · 7)/(7 · 3 · 1) = 620.

The nominal coding gain of RM(2, 5) is γ_c(C) = 4 (6.02 dB); however, since K_b = N_8/k = 38.75,
the effective coding gain by our rule of thumb is only about γ_eff(C) ≈ 5.0 dB.
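A quick sketch reproducing Example 4 under the 0.2-dB-per-factor-of-two rule of thumb quoted in Exercise 4 below (the function names and the exact form of the γ_eff estimate are my reading of that rule, not formulas stated in the text):

```python
from math import log10, log2, prod

def N_min_weight(r, m):
    """Number of minimum-weight (d = 2^(m-r)) codewords of RM(r, m), eq. (6.5)."""
    num = prod(2 ** (m - i) - 1 for i in range(m - r))
    den = prod(2 ** (m - r - i) - 1 for i in range(m - r))
    return 2 ** r * num // den

def eff_gain_db(n, k, d, Nd):
    """Rule of thumb: gamma_eff ~ gamma_c (in dB) minus 0.2 dB per factor of 2 in K_b."""
    gamma_c_db = 10 * log10(k * d / n)
    Kb = Nd / k
    return gamma_c_db - 0.2 * log2(Kb) if Kb > 1 else gamma_c_db

assert N_min_weight(2, 5) == 620                        # Example 4
assert abs(eff_gain_db(32, 16, 8, 620) - 5.0) < 0.1    # ~5.0 dB for RM(2,5)
```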

The codes with r = m − 1 are single-parity-check (SPC) codes with d = 2. These codes
have nominal coding gain 2(k/n), which goes to 2 (3.01 dB) as n → ∞; however, since N_d =
2^m(2^m − 1)/2, we have K_b = 2^(m−1) → ∞, which ultimately limits the effective coding gain.
The codes with r = m − 2 are extended Hamming (EH) codes with d = 4. These codes
have nominal coding gain 4(k/n), which goes to 4 (6.02 dB) as n → ∞; however, since N_d =
2^m(2^m − 1)(2^m − 2)/24, we again have K_b → ∞.
Exercise 4 (optimizing SPC and EH codes). Using the rule of thumb that a factor of two
increase in K_b costs 0.2 dB in effective coding gain, find the value of n for which an (n, n − 1, 2)
SPC code has maximum effective coding gain, and compute this maximum in dB. Similarly, find
m such that a (2^m, 2^m − m − 1, 4) extended Hamming code has maximum effective coding gain,
using N_d = 2^m(2^m − 1)(2^m − 2)/24, and compute this maximum in dB.
The codes with r = 1 (first-order Reed-Muller codes) are interesting, because as shown in
Exercise 5 they generate biorthogonal signal sets of dimension n = 2^m and size 2^(m+1), with
nominal coding gain (m + 1)/2 → ∞. It is known that as n → ∞ this sequence of codes can
achieve arbitrarily small Pr(E) for any E_b/N_0 greater than the ultimate Shannon limit, namely
E_b/N_0 > ln 2 (−1.59 dB).
Exercise 5 (biorthogonal codes). We have shown that the first-order Reed-Muller codes
RM(1, m) have parameters (2^m, m + 1, 2^(m−1)), and that the (2^m, 1, 2^m) repetition code RM(0, m)
is a subcode.
(a) Show that RM(1, m) has one word of weight 0, one word of weight 2^m, and 2^(m+1) − 2
words of weight 2^(m−1). [Hint: first show that the RM(1, m) code consists of 2^m complementary
codeword pairs {x, x + 1}.]
(b) Show that the Euclidean image of an RM(1, m) code is an M = 2^(m+1) biorthogonal signal
set. [Hint: compute all inner products between code vectors.]
(c) Show that the code C′ consisting of all words in RM(1, m) with a 0 in any given coordinate
position is a (2^m, m, 2^(m−1)) binary linear code, and that its Euclidean image is an M = 2^m
orthogonal signal set. [Same hint as in part (a).]
(d) Show that the code C″ consisting of the code words of C′ with the given coordinate deleted
("punctured") is a binary linear (2^m − 1, m, 2^(m−1)) code, and that its Euclidean image is an
M = 2^m simplex signal set. [Hint: use Exercise 7 of Chapter 5.]
In Exercise 2 of Chapter 1, it was shown how a 2^m-orthogonal signal set A can be constructed
as the image of a 2^m × 2^m binary Hadamard matrix. The corresponding 2^(m+1)-biorthogonal
signal set ±A is identical to that constructed above from the (2^m, m + 1, 2^(m−1)) first-order RM
code.
The code dual to RM(r, m) is RM(m − r − 1, m); this can be shown by recursion from the
facts that the (1, 1) and (1, 0) codes are duals and that by bilinearity

    ⟨(u, u + v), (u′, u′ + v′)⟩ = ⟨u, u′⟩ + ⟨u + v, u′ + v′⟩ = ⟨u, v′⟩ + ⟨v, u′⟩ + ⟨v, v′⟩,

since ⟨u, u′⟩ + ⟨u, u′⟩ = 0. In particular, this confirms that the repetition and SPC codes are
duals, and shows that the biorthogonal and extended Hamming codes are duals.
This also shows that RM codes with k/n = 1/2 are self-dual. The nominal coding gain of a
rate-1/2 RM code of length 2^m (m odd) is 2^((m−1)/2), which goes to infinity as m → ∞. It seems
likely that as n → ∞ this sequence of codes can achieve arbitrarily small Pr(E) for any E_b/N_0
greater than the Shannon limit for ρ = 1 b/2D, namely E_b/N_0 > 1 (0 dB).

Exercise 6 (generator matrices for RM codes). Let square 2^m × 2^m matrices U_m, m ≥ 1, be
specified recursively as follows. The matrix U_1 is the 2 × 2 matrix

    U_1 = [ 1 0 ]
          [ 1 1 ].

The matrix U_m is the 2^m × 2^m matrix

    U_m = [ U_{m−1}    0      ]
          [ U_{m−1}  U_{m−1} ].

(In other words, U_m is the m-fold tensor product of U_1 with itself.)

(a) Show that RM(r, m) is generated by the rows of U_m of Hamming weight 2^(m−r) or greater.
[Hint: observe that this holds for m = 1, and prove by recursion using the |u|u + v| construction.]
For example, give a generator matrix for the (8, 4, 4) RM code.
(b) Show that the number of rows of U_m of weight 2^(m−r) is (m choose r). [Hint: use the fact
that (m choose r) is the coefficient of z^(m−r) in the integer polynomial (1 + z)^m.]

(c) Conclude that the dimension of RM(r, m) is k(r, m) = Σ_{0≤j≤r} (m choose j).
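A small sketch of this construction (helper names are mine): build U_m by the recursion above, keep the rows of weight at least 2^(m−r), and check the row-weight census of part (b):

```python
from math import comb

def U(m):
    """U_m via the recursion U_m = [[U_{m-1}, 0], [U_{m-1}, U_{m-1}]]."""
    M = [[1]]
    for _ in range(m):
        M = [row + [0] * len(row) for row in M] + [row + row for row in M]
    return M

def rm_generator(r, m):
    """Rows of U_m of Hamming weight >= 2^(m-r): a generator matrix of RM(r, m)."""
    return [row for row in U(m) if sum(row) >= 2 ** (m - r)]

G = rm_generator(1, 3)              # generator matrix of the (8, 4, 4) RM code
assert len(G) == 4                  # k(1, 3) = C(3,0) + C(3,1) = 4
# part (b): the number of rows of U_m of weight 2^(m-j) is C(m, j)
for m in range(1, 6):
    rows = U(m)
    for j in range(m + 1):
        assert sum(1 for row in rows if sum(row) == 2 ** (m - j)) == comb(m, j)
```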

6.4.1 Effective coding gains of RM codes

We provide below a table of the nominal spectral efficiency ρ, nominal coding gain γ_c, number
of nearest neighbors N_d, error coefficient per bit K_b, and estimated effective coding gain γ_eff at
P_b(E) ≈ 10^−5 for various Reed-Muller codes, so that the student can consider these codes as
components in system design exercises.
In later lectures, we will consider trellis representations and trellis decoding of RM codes. We
give here two complexity parameters of the minimal trellises for these codes: the state complexity
s (the binary logarithm of the maximum number of states in a minimal trellis), and the branch
complexity t (the binary logarithm of the maximum number of branches per section in a minimal
trellis). The latter parameter gives a more accurate estimate of decoding complexity.

    code        ρ      γ_c    γ_c (dB)   N_d     K_b    γ_eff (dB)   s    t

(8,7,2) 1.75 7/4 2.43 28 4 2.0 1 2
(8,4,4) 1.00 2 3.01 14 4 2.6 2 3
(16,15,2) 1.88 15/8 2.73 120 8 2.1 1 2
(16,11,4) 1.38 11/4 4.39 140 13 3.7 3 5
(16, 5,8) 0.63 5/2 3.98 30 6 3.5 3 4
(32,31, 2) 1.94 31/16 2.87 496 16 2.1 1 2
(32,26, 4) 1.63 13/4 5.12 1240 48 4.0 4 7
(32,16, 8) 1.00 4 6.02 620 39 4.9 6 9
(32, 6,16) 0.37 3 4.77 62 10 4.2 4 5
(64,63, 2) 1.97 63/32 2.94 2016 32 1.9 1 2
(64,57, 4) 1.78 57/16 5.52 10416 183 4.0 5 9
(64,42, 8) 1.31 21/4 7.20 11160 266 5.6 10 16
(64,22,16) 0.69 11/2 7.40 2604 118 6.0 10 14
(64, 7,32) 0.22 7/2 5.44 126 18 4.6 5 6
Table 1. Parameters of RM codes with lengths n ≤ 64.

6.5 Decoding of binary block codes


In this section we will first show that with binary codes MD decoding reduces to "maximum-
reliability decoding." We will then discuss the penalty incurred by making hard decisions and
then performing classical error-correction, and show how the penalty may be partially mitigated
by using erasures, or almost completely by using generalized minimum distance (GMD) decoding.

6.5.1 Maximum-reliability decoding

All of our performance estimates assume minimum-distance (MD) decoding. In other words,
given a received sequence r ∈ R^n, the receiver must find the signal s(x) for x ∈ C such that the
squared distance ‖r − s(x)‖² is minimum. We will show that in the case of binary codes, MD
decoding reduces to maximum-reliability (MR) decoding.
Since ‖s(x)‖² = nα² is independent of x with binary constellations s(C), MD decoding is
equivalent to maximum-inner-product decoding: find the signal s(x) for x ∈ C such that the
inner product

    ⟨r, s(x)⟩ = Σ_k r_k s(x_k)

is maximum. Since s(x_k) = (−1)^(x_k) α, the inner product may be expressed as

    ⟨r, s(x)⟩ = α Σ_k r_k (−1)^(x_k) = α Σ_k |r_k| sgn(r_k)(−1)^(x_k).

The sign sgn(rk ) ∈ {±1} is often regarded as a “hard decision” based on rk , indicating which of
the two possible signals {±α} is more likely in that coordinate without taking into account the
remaining coordinates. The magnitude |rk | may be viewed as the reliability of the hard decision.
This rule may thus be expressed as: find the codeword x ∈ C that maximizes the reliability

    r(x | r) = Σ_k |r_k| (−1)^(e(x_k, r_k)),

where the "error" e(x_k, r_k) is 0 if the signs of s(x_k) and r_k agree, or 1 if they disagree. We call
this rule maximum-reliability decoding.
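A brute-force sketch of MR decoding over a small code (the helper name and test vector are illustrative):

```python
def mr_decode(code, r):
    """Brute force: return x maximizing r(x|r) = sum_k |r_k| (-1)^e(x_k, r_k)."""
    def reliability(x):
        # e(x_k, r_k) = 0 exactly when sgn(r_k) agrees with s(x_k) = (-1)^x_k * alpha
        return sum(abs(rk) * (1 if (rk >= 0) == (xk == 0) else -1)
                   for xk, rk in zip(x, r))
    return max(code, key=reliability)

# (3, 1, 3) repetition code: the received vector leans toward the all-one word
code = [(0, 0, 0), (1, 1, 1)]
assert mr_decode(code, [-0.9, 0.2, -0.4]) == (1, 1, 1)
```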
Any of these optimum decision rules is easy to implement for small constellations s(C). How-
ever, without special tricks they require at least one computation for every codeword x ∈ C, and
therefore become impractical when the number 2^k of codewords becomes large. Finding simpler
decoding algorithms that give a good tradeoff of performance vs. complexity, perhaps only for
special classes of codes, has therefore been the major theme of practical coding research.
For example, the Wagner decoding rule, the earliest "soft-decision" decoding algorithm (circa
1955), is an optimum decoding rule for the special class of (n, n − 1, 2) SPC codes that requires
many fewer than 2^(n−1) computations.
Exercise 7 (“Wagner decoding”). Let C be an (n, n − 1, 2) SPC code. The Wagner decoding
rule is as follows. Make hard decisions on every symbol rk , and check whether the resulting
binary word is in C. If so, accept it. If not, change the hard decision in the symbol rk for which
the reliability metric |rk | is minimum. Show that the Wagner decoding rule is an optimum
decoding rule for SPC codes. [Hint: show that the Wagner rule finds the codeword x ∈ C that
maximizes r(x | r).]
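A direct sketch of the Wagner rule as just described (the helper name is mine; the bit convention 1 ↔ −α matches the mapping s(x_k) = (−1)^(x_k) α):

```python
def wagner_decode(r):
    """Wagner rule for the (n, n-1, 2) SPC code: hard-decide each symbol, then
    flip the least-reliable one if the overall parity check fails."""
    y = [1 if rk < 0 else 0 for rk in r]      # hard decisions (1 <-> -alpha)
    if sum(y) % 2 == 1:                       # parity fails: y is not a codeword
        i = min(range(len(r)), key=lambda k: abs(r[k]))
        y[i] ^= 1
    return y

# the third symbol is hard-decided wrongly but is also the least reliable
assert wagner_decode([-0.8, -1.1, -0.1, 0.9]) == [1, 1, 0, 0]
```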
6.5. DECODING OF BINARY BLOCK CODES 71

6.5.2 Hard decisions and error-correction

Early work on decoding of binary block codes assumed hard decisions on every symbol, yielding
a hard-decision n-tuple y ∈ (F2 )n . The main decoding step is then to find the codeword x ∈ C
that is closest to y in Hamming space. This is called error-correction.
If C is a linear (n, k, d) code, then, since the Hamming metric is a true metric, no error can
occur when a codeword x is sent unless the number of hard decision errors t = dH (x, y) is at
least as great as half the minimum Hamming distance, t ≥ d/2. For many classes of binary
block codes, efficient algebraic error-correction algorithms exist that are guaranteed to decode
correctly provided that 2t < d. This is called bounded-distance error-correction.
Example 5 (Hamming codes). The first binary error-correction codes were the Hamming
codes (mentioned in Shannon's original paper). A Hamming code C is a (2^m − 1, 2^m − m − 1, 3)
code that may be found by puncturing a (2^m, 2^m − m − 1, 4) extended Hamming RM(m − 2, m)
code in any coordinate. Its dual C⊥ is a (2^m − 1, m, 2^(m−1)) code whose Euclidean image is a
2^m-simplex constellation. For example, the simplest Hamming code is the (3, 1, 3) repetition
code; its dual is the (3, 2, 2) SPC code, whose image is the 4-simplex constellation of Figure 1.
The generator matrix of C⊥ is an m × (2^m − 1) matrix H whose 2^m − 1 columns must run
through the set of all nonzero binary m-tuples in some order (else C would not be guaranteed
to correct any single error; see next paragraph).
Since d = 3, a Hamming code should be able to correct any single error. A simple method for
doing so is to compute the "syndrome"

    yH^T = (x + e)H^T = eH^T,

where e = x + y. If yH^T = 0, then y ∈ C and y is assumed to be correct. If yH^T ≠ 0, then
the syndrome yH^T is equal to one of the rows of H^T, and a single error is assumed to have
occurred in the corresponding position. Thus it is always possible to change any y ∈ (F_2)^n into
a codeword by changing at most one bit.
This implies that the 2^(n−m) "Hamming spheres" of radius 1 and size 2^m centered on the 2^(n−m)
codewords x, which consist of x and the n = 2^m − 1 n-tuples y within Hamming distance 1 of
x, form an exhaustive partition of the set of 2^n n-tuples that comprise Hamming n-space (F_2)^n.
In summary, Hamming codes form a "perfect" Hamming sphere-packing of (F_2)^n, and have a
simple single-error-correction algorithm.
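A sketch of this single-error-correction procedure for m = 3, i.e. the (7, 4, 3) Hamming code (the particular column ordering of H is a choice made here, not fixed by the text):

```python
# H^T stored as a 7x3 list: row i is the binary expansion of i+1,
# so the columns of H run through all nonzero 3-tuples.
HT = [[int(b) for b in format(i + 1, "03b")] for i in range(7)]

def correct(y):
    """Single-error correction: flip the bit whose H^T row equals the syndrome."""
    syn = [sum(y[i] * HT[i][j] for i in range(7)) % 2 for j in range(3)]
    y = list(y)
    if any(syn):
        y[HT.index(syn)] ^= 1
    return y

y = [0, 0, 0, 0, 0, 0, 0]
y[4] ^= 1                            # one bit error on the zero codeword
assert correct(y) == [0] * 7
```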
We now show that even if an error-correcting decoder does optimal MD decoding in Hamming
space, there is a loss in coding gain of the order of 3 dB relative to MD Euclidean-space decoding.
Assume an (n, k, d) binary linear code C with d odd (the situation is worse when d is even).
Let x be the transmitted codeword; then there is at least one codeword at Hamming distance
d from x, and thus at least one real n-tuple in s(C) at squared Euclidean distance 4α²d from s(x).
For any ε > 0, a hard-decision decoding error will occur if the noise exceeds α + ε in any (d + 1)/2
of the places in which that word differs from x. Thus with hard decisions the minimum squared
distance to the decision boundary in Euclidean space is α²(d + 1)/2. (For d even, it is α²d/2.)
On the other hand, with "soft decisions" (reliability weights) and MD decoding, the minimum
squared distance to any decision boundary in Euclidean space is α²d. To the accuracy of
the union bound estimate, the argument of the Q function thus decreases with hard-decision
decoding by a factor of (d + 1)/2d, or approximately 1/2 (−3 dB) when d is large. (When d is
even, this factor is exactly 1/2.)

Example 6 (Hard and soft decoding of antipodal codes). Let C be the (2, 1, 2) binary code;
then the two signal points in s(C) are antipodal, as shown in Figure 3(a) below. With hard
decisions, real 2-space R2 is partitioned into four quadrants, which must then be assigned to one
or the other of the two signal points. Of course, two of the quadrants are assigned to the signal
points that they contain. However, no matter how the other two quadrants are assigned, there
will be at least one decision boundary at squared distance α2 from a signal point, whereas with
MD decoding the decision boundary is at distance 2α2 from both signal points. The loss in the
error exponent of Pb (E) is therefore a factor of 2 (3 dB).

Figure 3. Decision regions in R^n with hard decisions: (a) the (2, 1, 2) code; (b) the (3, 1, 3) code.

Similarly, if C is the (3, 1, 3) code, then R3 is partitioned by hard decisions into 8 octants, as
shown in Figure 3(b). In this case (the simplest example of a Hamming code), it is clear how
best to assign four octants to each signal point. The squared distance from each signal point
to the nearest decision boundary is now 2α2 , compared to 3α2 with “soft decisions” and MD
decoding in Euclidean space, for a loss of 2/3 (1.76 dB) in the error exponent.

6.5.3 Erasure-and-error-correction

A decoding method halfway between hard-decision and "soft-decision" (reliability-based) tech-
niques involves the use of "erasures." With this method, the first step of the receiver is to map
each received signal r_k into one of three values, say {0, 1, ?}, where for some threshold T,

    r_k → 0 if r_k > T;
    r_k → 1 if r_k < −T;
    r_k → ? if −T ≤ r_k ≤ T.

The decoder subsequently tries to map the ternary-valued n-tuple into the closest codeword
x ∈ C in Hamming space, where the erased positions are ignored in measuring Hamming distance.
If there are s erased positions, then the minimum distance between codewords is at least
d − s in the unerased positions, so correct decoding is guaranteed if the number t of errors in the
unerased positions satisfies t < (d−s)/2, or equivalently if 2t+ s < d. For many classes of binary
block codes, efficient algebraic erasure-and-error-correcting algorithms exist that are guaranteed
to decode correctly if 2t + s < d. This is called bounded-distance erasure-and-error-correction.
6.5. DECODING OF BINARY BLOCK CODES 73

Erasure-and-error-correction may be viewed as a form of MR decoding in which all reliabilities
|r_k| are made equal in the unerased positions, and are set to 0 in the erased positions.
The ternary-valued output allows a closer approximation to the optimum decision regions in
Euclidean space than with hard decisions, and therefore reduces the loss. With an optimized
threshold T , the loss is typically only about half as much (in dB).

Figure 4. Decision regions with hard decisions and erasures for the (2, 1, 2) code.

Example 6 (cont.). Figure 4 shows the 9 decision regions for the (2, 1, 2) code that result from
hard decisions and/or erasures on each symbol. Three of the resulting regions are ambiguous.
The minimum squared distances to these regions are

    a² = 2(α − T)²,
    b² = (α + T)².

To maximize the minimum of a² and b², we make a² = b² by choosing T = ((√2 − 1)/(√2 + 1))α,
which yields

    a² = b² = (8/(√2 + 1)²)α² = 1.372α².

This is about 1.38 dB better than the squared Euclidean distance α² achieved with hard decisions
only, but is still 1.63 dB worse than the 2α² achieved with MD decoding.
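The threshold computation is easy to verify numerically:

```python
from math import sqrt

# Threshold that equalizes a^2 = 2(alpha - T)^2 and b^2 = (alpha + T)^2
alpha = 1.0
T = (sqrt(2) - 1) / (sqrt(2) + 1) * alpha
a2 = 2 * (alpha - T) ** 2
b2 = (alpha + T) ** 2
assert abs(a2 - b2) < 1e-12
assert abs(a2 - 8 / (sqrt(2) + 1) ** 2 * alpha ** 2) < 1e-12   # = 1.372 alpha^2
```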
Exercise 8 (Optimum threshold T). Let C be a binary code with minimum distance d, and
let received symbols be mapped into hard decisions or erasures as above. Show that:
(a) For any integers t and s such that 2t + s ≥ d and for any decoding rule, there exists some
pattern of t errors and s erasures that will cause a decoding error;
(b) The minimum squared distance from any signal point to its decoding decision boundary is
equal to at least min_{2t+s≥d} {s(α − T)² + t(α + T)²};
(c) The value of T that maximizes this minimum squared distance is T = ((√2 − 1)/(√2 + 1))α,
in which case the minimum squared distance is equal to (4/(√2 + 1)²)α²d = 0.686α²d. Again,
this is a loss of 1.63 dB relative to the squared distance α²d that is achieved with MD decoding.

6.5.4 Generalized minimum distance decoding

A further step in this direction that achieves almost the same performance as MD decoding,
to the accuracy of the union bound estimate, yet still permits algebraic decoding algorithms, is
generalized minimum distance (GMD) decoding.
In GMD decoding, the decoder keeps both the hard decision sgn(rk ) and the reliability |rk | of
each received symbol, and orders them in order of their reliability.
The GMD decoder then performs a series of erasure-and-error decoding trials in which the
s = d − 1, d − 3, . . . least reliable symbols are erased. (The intermediate trials are not necessary,
because if d − s is even and 2t < d − s, then also 2t < d − s − 1, so the trial with one additional
erasure will find the same codeword.) The number of such trials is d/2 if d is even, or (d + 1)/2
if d is odd; i.e., the number of trials needed is ⌈d/2⌉.
Each trial may produce a candidate codeword. The set of ⌈d/2⌉ trials may thus produce up
to ⌈d/2⌉ distinct candidate codewords. These words may finally be compared according to their
reliability r(x | r) (or any equivalent optimum metric), and the best candidate chosen.
Example 7. For an (n, n − 1, 2) SPC code, GMD decoding performs just one trial with
the least reliable symbol erased; the resulting candidate codeword is the unique codeword that
agrees with all unerased symbols. Therefore in this case the GMD decoding rule is equivalent
to the Wagner decoding rule (Exercise 7), which implies that it is optimum.
It can be shown that no error can occur with a GMD decoder provided that the squared norm
‖n‖² of the noise vector is less than α²d; i.e., the squared distance from any signal point to its
decision boundary is α²d, just as for MD decoding. Thus there is no loss in coding gain or error
exponent compared to MD decoding.
It has been shown that for the most important classes of algebraic block codes, GMD decoding
can be performed with little more complexity than ordinary hard-decision or erasures-and-errors
decoding. Furthermore, it has been shown that not only is the error exponent of GMD decod-
ing equal to that of optimum MD decoding, but also the error coefficient and thus the union
bound estimate are the same, provided that GMD decoding is augmented to include a d-erasure-
correction trial (a purely algebraic solution of the n − k linear parity-check equations for the d
unknown erased symbols).
However, GMD decoding is a bounded-distance decoding algorithm, so its decision regions are
like spheres of squared radius α2 d that lie within the MD decision regions Rj . For this reason
GMD decoding is inferior to MD decoding, typically improving over erasure-and-error-correction
by 1 dB or less. GMD decoding has rarely been used in practice.

6.5.5 Summary

In conclusion, hard decisions allow the use of efficient algebraic decoding algorithms, but incur
a significant SNR penalty, of the order of 3 dB. By using erasures, about half of this penalty
can be avoided. With GMD decoding, efficient algebraic decoding algorithms can in principle
be used with no loss in performance, at least as estimated by the union bound estimate.
Chapter 7

Introduction to finite fields

This chapter provides an introduction to several kinds of abstract algebraic structures, partic-
ularly groups, fields, and polynomials. Our primary interest is in finite fields, i.e., fields with
a finite number of elements (also called Galois fields). In the next chapter, finite fields will be
used to develop Reed-Solomon (RS) codes, the most useful class of algebraic codes. Groups and
polynomials provide the requisite background to understand finite fields.
A field is more than just a set of elements: it is a set of elements under two operations,
called addition and multiplication, along with a set of properties governing these operations.
The addition and multiplication operations also imply inverse operations called subtraction and
division. The reader is presumably familiar with several examples of fields, such as the real field
R, the complex field C, the field of rational numbers Q, and the binary field F2 .

7.1 Summary
In this section we briefly summarize the results of this chapter. The main body of the chapter
will be devoted to defining and explaining these concepts, and to proofs of these results.
For each prime p and positive integer m ≥ 1, there exists a finite field F_{p^m} with p^m elements,
and there exists no finite field with q elements if q is not a prime power. Any two fields with p^m
elements are isomorphic.
The integers modulo p form a prime field F_p under mod-p addition and multiplication. The
polynomials F_p[x] over F_p modulo an irreducible polynomial g(x) ∈ F_p[x] of degree m form a
finite field with p^m elements under mod-g(x) addition and multiplication. For every prime p,
there exists at least one irreducible polynomial g(x) ∈ F_p[x] of each positive degree m ≥ 1, so
all finite fields may be constructed in this way.
Under addition, F_{p^m} is isomorphic to the vector space (F_p)^m. Under multiplication, the nonzero
elements of F_{p^m} form a cyclic group {1, α, . . . , α^(p^m − 2)} generated by a primitive element
α ∈ F_{p^m}.
The elements of F_{p^m} are the p^m roots of the polynomial x^(p^m) − x ∈ F_p[x]. The polynomial
x^(p^m) − x is the product of all monic irreducible polynomials g(x) ∈ F_p[x] such that deg g(x)
divides m. The roots of a monic irreducible polynomial g(x) ∈ F_p[x] form a cyclotomic coset of
deg g(x) elements of F_{p^m} which is closed under the operation of raising to the pth power.
For every n that divides m, F_{p^m} contains a subfield with p^n elements.
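One of these facts, that x^(p^m) − x is the product of the monic irreducible polynomials of degree dividing m, can be sanity-checked for p = 2 by brute force: the degrees of those factors must then account for all 2^m roots (all helper names below are mine; polynomials are encoded as int bitmasks):

```python
def poly_mod(a, b):
    """Remainder of a divided by b over F_2 (polynomials as int bitmasks)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def irreducibles(d):
    """All monic irreducible degree-d polynomials over F_2, by trial division."""
    out = []
    for low in range(2 ** d):
        p = (1 << d) | low
        if all(poly_mod(p, (1 << e) | lo) != 0
               for e in range(1, d // 2 + 1) for lo in range(2 ** e)):
            out.append(p)
    return out

# sum over d | m of d * #(irreducibles of degree d) must equal 2^m,
# since each irreducible of degree d contributes d distinct roots of x^(2^m) - x
for m in (1, 2, 3, 4):
    total = sum(d * len(irreducibles(d)) for d in range(1, m + 1) if m % d == 0)
    assert total == 2 ** m
```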

76 CHAPTER 7. INTRODUCTION TO FINITE FIELDS

For further reading on this beautiful subject, see E. R. Berlekamp, Algebraic Coding Theory
(Aegean Park Press, 1984); R. Lidl and H. Niederreiter, Introduction to Finite Fields and their
Applications (Cambridge University Press, 1986); R. J. McEliece, Finite Fields for Computer
Scientists and Engineers (Kluwer, 1987); M. R. Schroeder, Number Theory in Science and
Communication (Springer, 1986); or indeed any book on finite fields or algebraic coding theory.

7.2 The integers


We begin with a brief review of the familiar factorization properties of the set Z of integers. We
will use these properties immediately in our discussion of cyclic groups and their subgroups and
of prime fields. Moreover, we will model our later discussion of the factorization properties of
polynomials on the discussion here.

7.2.1 Definitions

An integer n is said to be a divisor of an integer i if i is an integer multiple of n; i.e., i = qn for
some integer q. Thus all integers are trivially divisors of 0.
The integers that have integer inverses, namely ±1, are called the units of Z. If u is a unit
and n is a divisor of i, then un is a divisor of i and n is a divisor of ui. Thus the factorization
of an integer can only be unique up to a unit u, and ui has the same divisors as i. We therefore
consider only factorizations of positive integers into products of positive integers.
Every nonzero integer i is divisible by 1 and i; these divisors are called trivial. An integer n
is said to be a factor of an integer i if n is positive and a nontrivial divisor of i. For example, 1
has no nontrivial divisors and thus no factors.
A positive integer greater than 1 that has no nontrivial divisors is called a prime integer.

7.2.2 Mod-n arithmetic

Given a positive integer n, every integer i may be uniquely expressed as i = qn + r for some
integer remainder r in the interval 0 ≤ r ≤ n − 1 and some integer quotient q. This may be
proved by the Euclidean division algorithm, which if i ≥ n just subtracts n from i repeatedly
until the remainder lies in the desired interval.
The remainder r, denoted by r = i mod n, is the more important part of this expression. The
set of possible mod-n remainders is the set of n integers Rn = {0, 1, . . . , n − 1}. Evidently n is
a divisor of i if and only if i mod n = 0.
Remainder arithmetic using the mod-n remainder set Rn is called “mod-n arithmetic.” The
rules for mod-n arithmetic follow from the rules for integer arithmetic as follows. Let r = i mod n
and s = j mod n; then, as integers, r = i − qn and s = j − tn for some quotients q and t. Then

r + s = i + j − (q + t)n;
rs = ij − (qj + ti)n + qtn2 .

Hence (r + s) mod n = (i + j) mod n and rs mod n = ij mod n; i.e., the mod-n remainder of
the sum or product of two integers is equal to the mod-n remainder of the sum or product of
their mod-n remainders, as integers.
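This consistency property can be illustrated with a few lines of Python (an illustration added to these notes; the particular values of n, i, and j are arbitrary):

```python
# Illustration: the mod-n remainder of a sum or product of two integers
# equals the mod-n remainder of the sum or product of their remainders.
n = 7
i, j = 23, 55
r, s = i % n, j % n              # r = i mod n, s = j mod n

assert (i + j) % n == (r + s) % n
assert (i * j) % n == (r * s) % n
```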

The mod-n addition and multiplication rules are therefore defined as follows:

r ⊕ s = (r + s) mod n;
r ∗ s = (rs) mod n,

where “r” and “s” denote elements of the remainder set Rn on the left and the corresponding
ordinary integers on the right. This makes mod-n arithmetic consistent with ordinary integer
arithmetic in the sense expressed in the previous paragraph.

7.2.3 Unique factorization

Given a positive integer i, we may factor i into a unique product of prime factors by simply
factoring out primes no greater than i until we arrive at the quotient 1, as the reader has known
since grade school. For the time being, we will take this unique factorization property as given.
A proof will be given as an exercise after we prove the corresponding property for polynomials.
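The grade-school procedure can be sketched in a few lines of Python (an illustration added to these notes; the function name `factor` is ours). It simply factors out successive trial divisors until the quotient reaches 1:

```python
# Sketch of trial-division factorization: factor out each divisor d as many
# times as possible, in increasing order, until the quotient is 1. Any d
# that divides at this point must be prime, since its smaller factors have
# already been removed.

def factor(i):
    """Return the prime factorization of a positive integer as a list."""
    factors = []
    d = 2
    while i > 1:
        while i % d == 0:
            factors.append(d)
            i //= d
        d += 1
    return factors

assert factor(60) == [2, 2, 3, 5]
```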

7.3 Groups
We now introduce groups.

Definition 7.1 A group is a set of elements G = {a, b, c, . . .} and an operation ⊕ for which the
following axioms hold:

• Closure: for any a ∈ G, b ∈ G, the element a ⊕ b is in G.

• Associative law: for any a, b, c ∈ G, (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c).

• Identity: There is an identity element 0 in G for which a ⊕ 0 = 0 ⊕ a = a for all a ∈ G.

• Inverse: For each a ∈ G, there is an inverse (−a) such that a ⊕ (−a) = 0.

In general it is not necessary that a ⊕ b = b ⊕ a. A group G for which a ⊕ b = b ⊕ a for all
a, b ∈ G is called abelian or commutative. In these notes all groups will be abelian.
In view of the associative law, we may write (a⊕b)⊕c as a⊕b⊕c without ambiguity. Moreover,
in an abelian group the elements a, b, c may be written in any order.
Frequently, the operation in a group is called multiplication, usually represented either by ∗
or juxtaposition. The identity is then denoted by 1 (or e) and the inverse of a by a−1 . Additive
notation is generally used only for abelian groups, whereas multiplicative notation is used for
both abelian and nonabelian groups. Since we consider only abelian groups, we will use additive
notation when the nature of the group is unspecified.
As an example, the set of integers Z with the usual addition operation + forms an abelian
group. Also, the real field R forms an additive abelian group under ordinary addition in which
the identity is 0 and the inverse of a is −a. More interestingly, as the reader should verify,
the nonzero elements of R form a multiplicative abelian group under ordinary multiplication, in
which the identity is 1 and the inverse of a is a−1 = 1/a. We will see that every field has similar
additive and multiplicative group properties.

This example illustrates that the group structure (i.e., the properties stemming from the group
operation ⊕) may reflect only part of the structure of the given set of elements; e.g., the additive
group structure of R takes no account of the fact that real numbers may also be multiplied, and
the multiplicative group structure of R − {0} takes no account of the fact that real numbers may
also be added.
We abbreviate b ⊕ (−a) for any a, b ∈ G by b − a and regard “−” as an additional opera-
tion implicitly defined by the axioms. In an additive group, “−” is called subtraction; in a
multiplicative group, “−” is called division and denoted by / or ÷.
Because of the inverse operation, cancellation is always permissible; i.e., if x ⊕ a = y ⊕ a, we
can add −a to both sides, showing that x = y. Similarly, one can move terms from one side of
an equation to the other; i.e., x ⊕ a = y implies x = y − a.
Exercise 1 (Inverses and cancellation)
(a) Verify the following set of implications for arbitrary elements a, b of a group G which is
not necessarily abelian:

b ⊕ a = 0 ⇒ b = −a ⇒ a ⊕ b = 0 ⇒ a = −b ⇒ b ⊕ a = 0.

(b) Use this result to show that the inverse is unique, i.e., that a ⊕ b = 0 ⇒ b = −a, and
that the inverse also works on the left, i.e., b ⊕ a = 0 ⇒ b = −a. Note that this shows that
cancellation is permitted on either the right or the left.
(c) Show that the identity element is unique, i.e., that for a, b ∈ G, a ⊕ b = a ⇒ b = 0 and
b ⊕ a = a ⇒ b = 0.
If G has a finite number of elements, G = {a1 , a2 , . . . , an }, then G is said to be finite and
|G| = n is said to be the order of G. The group operation ⊕ may then be specified by an n × n
“addition table” whose entry at row i, column j is ai ⊕ aj . The cancellation property implies
that if aj ≠ ak , then ai ⊕ aj ≠ ai ⊕ ak . This means that all elements in any row i of the addition
table are distinct; i.e., each row contains each element of G exactly once. Similarly, each column
contains each element of G exactly once. Thus the group axioms restrict the group operation ⊕
more than might be immediately evident.
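The "each row and column contains each element exactly once" property (i.e., the addition table is a Latin square) is easy to verify computationally. Here is a Python sketch, added to these notes, using the cyclic group Zn introduced below as the example group:

```python
# Sketch: the mod-n addition table of Z_n is a Latin square -- each row
# and each column contains every group element exactly once, as forced by
# the cancellation property.
n = 6
table = [[(i + j) % n for j in range(n)] for i in range(n)]

for i in range(n):
    row = table[i]
    col = [table[j][i] for j in range(n)]
    assert sorted(row) == list(range(n))   # row i is a permutation of Z_n
    assert sorted(col) == list(range(n))   # column i is too
```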

7.3.1 Alternative group axioms

The property that a “row of the addition table,” namely a ⊕ G = {a ⊕ b | b ∈ G} is just the set
of elements of G in a different order (i.e., a permutation of G) is a fundamental property of any
group G. We will now show that this permutation property may be taken as one of the group
axioms. Subsequently we will use this property to prove that certain sets are groups.

Theorem 7.1 (Alternative group axioms) Let G = {a, b, c, . . .} be a set of elements on
which an operation ⊕ is defined. Then G is a group under the operation ⊕ if and only if the
following axioms hold:
following axioms hold:

• Associative law: for any a, b, c ∈ G, (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c).

• Identity: There is an identity element 0 in G for which a ⊕ 0 = 0 ⊕ a = a for all a ∈ G.

• Permutation property: For each a ∈ G, a ⊕ G = {a ⊕ b | b ∈ G} is a permutation of G.

Proof. (⇒) If G is a group under ⊕, then by the closure property every element a ⊕ b is in G.
Moreover, the fact that a ∈ G has an inverse −a ∈ G implies that every element b ∈ G may be
written as a ⊕ (−a ⊕ b) ∈ a ⊕ G, so every element of G is in a ⊕ G. Finally, from the cancellation
property, a ⊕ b = a ⊕ c implies b = c. Thus the correspondence between G and a ⊕ G defined by
b ↔ a ⊕ b is one-to-one; i.e., a permutation.
(⇐) Conversely, if a ⊕ G is a permutation of G for every a ∈ G, then (a) the closure property
holds; i.e., a ⊕ b ∈ G for all a, b ∈ G; (b) since 0 ∈ a ⊕ G, there must exist a unique b ∈ G such
that a ⊕ b = 0, so a has a unique inverse −a = b under ⊕. Thus G is a group under ⊕.
The properties of “rows” a ⊕ G hold equally for “columns” G ⊕ a, even when G is nonabelian.
For example, the set R∗ of nonzero elements of the real field R forms an abelian group under
real multiplication, because real multiplication is associative and commutative with identity 1,
and αR∗ is a permutation of R∗ for any α ∈ R∗ .
Exercise 2 (Invertible subsets).
(a) Let H be a set of elements on which an associative operation ⊕ is defined with identity 0,
and let G be the subset of elements h ∈ H which have unique inverses −h such that h ⊕ −h = 0.
Show that G is a group under ⊕.
(b) Show that the nonzero elements of the complex field form a group under complex multi-
plication.
(c) Show that the set of invertible n × n real matrices forms a (nonabelian) group under real
matrix multiplication.
(d) What are the invertible elements of Z under multiplication? Do they form a group?

7.3.2 Cyclic groups

An important example of a finite abelian group is the set of remainders Rn = {0, 1, . . . , n − 1}
under mod-n addition, where n is a given positive integer. This group is called “the integers
mod n” and is denoted by Zn . Note that Z1 is the trivial group {0}.
A finite cyclic group is a finite group G with a particular element g ∈ G, called the generator,
such that each element of G can be expressed as the sum, g⊕· · ·⊕g, of some number of repetitions
of g.1 Thus each element of G appears in the sequence of elements {g, g ⊕ g, g ⊕ g ⊕ g, . . .}. We
denote such an i-fold sum by ig, where i is a positive integer and g is a group element; i.e.,

1g = g, 2g = g ⊕ g, . . . , ig = g ⊕ · · · ⊕ g (i terms), . . .

Since g generates G and G includes the identity element 0, we must have ig = 0 for some positive
integer i. Let n be the smallest such integer; thus ng = 0 and ig ≠ 0 for 1 ≤ i ≤ n − 1. Adding
the sum of j g’s for any j > 0 to each side of ig ≠ 0 results in (i + j)g ≠ jg. Thus the elements
{1g, 2g, . . . , ng = 0} must all be different.
1 Mathematicians say also that an infinite group G = {. . . , −1g, 0g, 1g, 2g, . . .} generated by a single element g
is cyclic; e.g., the group of integers Z is an infinite cyclic group with generator 1. Although such infinite cyclic
groups have the single-generator property of finite cyclic groups, they do not “cycle.” Hereafter, “cyclic group”
will mean “finite cyclic group.”

We can also add jg to both sides of the equality ng = 0, yielding (j + n)g = jg for any j > 0.
Thus for each i > n, ig is equal to some earlier element in the sequence, namely (i − n)g. The
elements {1g, 2g, . . . , ng = 0} therefore constitute all of the distinct elements in G, and the order
of G is |G| = n. If we define 0g to be the identity 0, then the elements of G may be conveniently
represented as G = {0g = 0, 1g, . . . , (n − 1)g}.
Figure 1 illustrates the cyclic structure of G that arises from the relation (j + n)g = jg.
[Figure: n points arranged in a circle, labeled 0 = ng = 2ng = · · · ; g = (n + 1)g = · · · ;
2g = (n + 2)g = · · · ; 3g = (n + 3)g = · · · ; . . . ; (n − 1)g = (2n − 1)g = · · · ]
Figure 1. The cyclic structure of a cyclic group: the sequence {1g, 2g, . . .} goes from the group
element g up to ng = 0, then returns to g and continues to cycle.

Addition in a cyclic group of order n can be understood in terms of mod-n addition. In
particular, since ng = 0, we also have 2ng = 0, 3ng = 0, etc. Since any integer i may be uniquely
written as i = qn + r where the remainder r = i mod n is in the set Rn = {0, 1, . . . , n − 1}, we
have ig = (qn)g + rg = rg, where rg = (i mod n)g is one of the elements of G. The addition
rule of G is thus as follows: for each 0 ≤ i, j < n,

ig ⊕ jg = (i + j mod n)g.
Evidently 0g is the identity, and the inverse of a nonzero element ig is (n − i)g.
We thus see that any cyclic group G of order n is essentially identical to Zn . More precisely,
the correspondence ig ∈ G ↔ i ∈ Zn is preserved under addition; i.e., ig ⊕ jg ↔ i ⊕ j for each
i, j ∈ Zn . This type of correspondence is called an isomorphism. Specifically, two finite groups
G and H are isomorphic if there exists an invertible2 function h : G → H mapping each α ∈ G
into a β = h(α) ∈ H such that h(α ⊕ α′ ) = h(α) ⊕ h(α′ ), where ⊕ denotes the group operation
of G on the left and that of H on the right. In summary:

Theorem 7.2 (Cyclic groups) The elements of a cyclic group G of order n with generator
g are {0g, 1g, 2g, . . . , (n − 1)g}. The addition rule is ig ⊕ jg = (i + j mod n)g, the identity is
0g, and the inverse of ig ≠ 0g is (n − i)g. Finally, G is isomorphic to Zn under the one-to-one
correspondence ig ↔ i.

Since Zn is abelian, it follows that all cyclic groups are abelian.
In multiplicative notation, the elements of a cyclic group G of order n with generator g are
denoted by {g^0 = 1, g^1 , g^2 , . . . , g^(n−1) }, the multiplication rule is g^i ∗ g^j = g^(i+j mod n) ,
the identity is g^0 = 1, and the inverse of g^i ≠ 1 is g^(n−i) . For example, if ω = e^(2πi/n) , the set
{1, ω, ω^2 , . . . , ω^(n−1) } of complex nth roots of unity is a cyclic group under complex
multiplication, isomorphic to Zn .
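The roots-of-unity example can be checked numerically with a short Python sketch (added to these notes; the tolerance 1e-9 is an arbitrary choice for floating-point comparison):

```python
# Numeric sketch: the complex nth roots of unity multiply like Z_n,
# i.e. w^i * w^j = w^((i + j) mod n), illustrating the isomorphism.
import cmath

n = 5
w = cmath.exp(2j * cmath.pi / n)     # a primitive nth root of unity

for i in range(n):
    for j in range(n):
        lhs = w**i * w**j
        rhs = w**((i + j) % n)
        assert abs(lhs - rhs) < 1e-9
```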
2 A function h : G → H is called invertible if for each β ∈ H there is a unique α ∈ G such that β = h(α). An
invertible function is also called a one-to-one correspondence, denoted by G ↔ H.

7.3.3 Subgroups

A subgroup S of a group G is a subset of the elements of the group such that if a, b ∈ S, then
a ⊕ b ∈ S and −a ∈ S. A subgroup S therefore includes the identity element of G and the
inverse of each element in S. The associative law holds for S since it holds for G. Therefore a
subgroup S ⊆ G is itself a group under the group operation of G.
For example, the set of integers Z is a subgroup of the additive group of R.
If G is abelian, then S must be abelian; however, S may be abelian even if G is nonabelian.
For any g ∈ G, we define the coset (translate) S ⊕ g = {s ⊕ g | s ∈ S}. The zero coset S ⊕ 0 is
thus equal to S itself; moreover, by Theorem 7.1, S ⊕ g = S whenever g ∈ S.
The following theorem states a more general result:

Lemma 7.3 (Cosets) Two cosets S ⊕ g and S ⊕ h are the same if g − h ∈ S, but are disjoint
if g − h ∉ S.

Proof. If g − h ∈ S, then the elements of S ⊕ h include (g − h) ⊕ h = g and therefore all
elements of S ⊕ g, so S ⊕ g ⊆ S ⊕ h; similarly S ⊕ h ⊆ S ⊕ g.
On the other hand, if S ⊕ g and S ⊕ h have any element in common, say s ⊕ g = s′ ⊕ h, then
g − h = s′ − s ∈ S; thus, g − h ∉ S implies that S ⊕ g and S ⊕ h are disjoint.
It follows that the distinct cosets S ⊕ g of a subgroup S ⊆ G form a disjoint partition of G,
since every element g ∈ G lies in some coset, namely S ⊕ g.
The elements s ⊕ g of a coset S ⊕ g are all distinct, since s ⊕ g = s′ ⊕ g implies s = s′ . Therefore
if S is finite, then all cosets of S have the same size, namely the size |S| of S = S ⊕ 0. If G is
finite, G is therefore the disjoint union of a finite number |C| of cosets of S ⊆ G, each of size
|S|, so |G| = |C||S|. This proves Lagrange’s theorem:

Theorem 7.4 (Lagrange) If S is a subgroup of a finite group G, then |S| divides |G|.
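The coset partition underlying this proof can be made concrete with a Python sketch (added to these notes; the choice of Z12 and the subgroup generated by 4 is arbitrary):

```python
# Sketch: the cosets of a subgroup S of Z_12 partition the group into
# |G|/|S| disjoint pieces of equal size, as in the proof of Lagrange.
n = 12
G = set(range(n))
S = {0, 4, 8}                      # the subgroup of Z_12 generated by 4

cosets = {frozenset((s + g) % n for s in S) for g in G}

assert all(len(c) == len(S) for c in cosets)     # all cosets have size |S|
assert set().union(*cosets) == G                 # the cosets cover G
assert len(cosets) * len(S) == n                 # |G| = |C||S|
```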

7.3.4 Cyclic subgroups

Given any finite group G and any element g ∈ G, the set of elements generated by g, namely
S(g) = {g, g ⊕ g, . . .}, is a cyclic subgroup of G. The order of g is defined as the order |S(g)|
of S(g). By Lagrange’s theorem, |S(g)| divides |G|, and by the cyclic groups theorem, S(g) is
isomorphic to Z|S(g)| . (If g = 0, then S(g) = {0} and |S(g)| = 1. We will assume g ≠ 0.)
As a fundamental example, let G be the cyclic group Zn = {0, 1, . . . , n − 1}, and let S(m) be
the cyclic subgroup {m, 2m, . . .} generated by m ∈ Zn . Here im = m ⊕ · · · ⊕ m is simply the sum
of m with itself i times; i.e., im ∈ G is the ordinary product im mod n. The order |S(m)| of
S(m) is the least positive integer k such that km = 0 mod n; i.e., such that the integer product
km is divisible by n. Thus km is the least common multiple of m and n, denoted lcm(m, n), and
|S(m)| = k = lcm(m, n)/m. By elementary number theory, lcm(m, n) = mn/ gcd(m, n) for any
positive integers m, n, so we may alternatively write |S(m)| = n/ gcd(m, n), where gcd(m, n)
denotes the greatest common divisor of m and n. This shows explicitly that |S(m)| divides n.
For example, suppose n = 10 and m = 4. Then S(4) = {4, 8, 2, 6, 0}. Thus |S(4)| = 5,
consistent with |S(4)| = lcm(4, 10)/4 = 20/4 = 5 or |S(4)| = 10/ gcd(4, 10) = 10/2 = 5.
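This example, and the formula |S(m)| = n/gcd(m, n), can be checked with a small Python sketch (added to these notes; the helper name `cyclic_subgroup` is ours):

```python
# Sketch: enumerate the cyclic subgroup S(m) = {m, 2m, ...} of Z_n and
# confirm that its order is n / gcd(m, n).
from math import gcd

def cyclic_subgroup(m, n):
    """Elements m, 2m, 3m, ... of Z_n, in the order they are generated."""
    S, x = [], m % n
    while x not in S:
        S.append(x)
        x = (x + m) % n
    return S

S4 = cyclic_subgroup(4, 10)
assert S4 == [4, 8, 2, 6, 0]            # the example in the text
assert len(S4) == 10 // gcd(4, 10)      # |S(4)| = 10/2 = 5
```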

Now when does S(m) = Zn ? This occurs if and only if gcd(m, n) = 1; i.e., if and only if m is
relatively prime to n. In short, m generates Zn and has order |S(m)| = n if and only if m and
n are relatively prime. The number of integers in the set {0, 1, . . . , n − 1} that have order n is
called the Euler number φ(n).
For example, in Z10 the integers that are relatively prime to 10 are {1, 3, 7, 9}, so φ(10) = 4.
The orders of the other elements of Z10 are as follows:

• 0 is the only element of order 1, and S(0) = {0}.

• 5 is the only element of order 2, and S(5) = {0, 5}.

• {2, 4, 6, 8} have order 5, and S(2) = S(4) = S(6) = S(8) = {0, 2, 4, 6, 8}.

In general, Zn has a cyclic subgroup Sd of order d for each positive integer d that divides n,
including 1 and n. Sd consists of {0, n/d, 2n/d, . . . , (d −1)n/d}, and is isomorphic to Zd . Sd thus
contains φ(d) elements that are relatively prime to d, each of which has order d and generates
Sd . The remaining elements of Sd belong also to smaller cyclic subgroups.
For example, Z10 has a subgroup S5 = {0, 2, 4, 6, 8} with 5 elements. Four of these elements,
namely {2, 4, 6, 8}, are relatively prime to 5 and generate S5 . The remaining element of S5 ,
namely 0, has order 1.
Since every element of Zn has some definite order d that divides n, we have

n = Σ_{d : d|n} φ(d).        (7.1)

The notation d : d|n means the set of positive integers d, including 1 and n, that divide n.
All Euler numbers may be determined recursively from this expression. For example, φ(1) =
1, φ(2) = 2 − φ(1) = 1, φ(3) = 3 − φ(1) = 2, φ(4) = 4 − φ(1) − φ(2) = 2, . . ..
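This recursion translates directly into Python (a sketch added to these notes; naive and exponentially slow for large n, but faithful to the recursion φ(n) = n − Σ φ(d) over proper divisors d of n):

```python
# Sketch: compute Euler numbers recursively from equation (7.1),
# i.e. phi(n) = n - sum of phi(d) over the proper divisors d of n.

def phi(n):
    if n == 1:
        return 1
    return n - sum(phi(d) for d in range(1, n) if n % d == 0)

assert [phi(n) for n in range(1, 5)] == [1, 1, 2, 2]   # matches the text
assert phi(10) == 4                                    # {1, 3, 7, 9}
```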
Exercise 3. Show that φ(n) ≥ 1 for all n ≥ 1. [Hint: Find the order of 1 in Zn .]
Since every cyclic group G of size n is isomorphic to Zn , these results apply to every cyclic
group. In particular, every cyclic group G of size n has φ(n) generators that generate G, which
are called the primitive elements of G. G also contains one cyclic subgroup of size d for each d
that divides n.
Exercise 4. Show that every subgroup of Zn is cyclic. [Hint: Let s be the smallest nonzero
element in a subgroup S ⊆ Zn , and compare S to the subgroup generated by s.]

7.4 Fields
Definition 7.2 A field is a set F of at least two elements, with two operations ⊕ and ∗, for
which the following axioms are satisfied:

• The set F forms an abelian group (whose identity is called 0) under the operation ⊕.
• The set F∗ = F − {0} = {a ∈ F : a ≠ 0} forms an abelian group (whose identity is called 1)
under the operation ∗.
• Distributive law: For all a, b, c ∈ F, (a ⊕ b) ∗ c = (a ∗ c) ⊕ (b ∗ c).

The operation ⊕ is called addition (and often denoted by +), and the operation ∗ is called
multiplication (and often denoted by juxtaposition). As in ordinary arithmetic, we often omit the
parentheses around a product of elements, using the convention “multiplication before addition;”
e.g., we interpret a ⊕ b ∗ c as a ⊕ (b ∗ c).
The reader may verify that R, C, Q and F2 each form a field according to this definition under
conventional addition and multiplication.
Exercise 5. Show that for any element a ∈ F, a ∗ 0 = 0.

7.4.1 Prime fields

A fundamental example of a finite (Galois) field is the set Fp of mod-p remainders, where p is
a given prime number. Here, as in Zp , the set of elements is Rp = {0, 1, · · · , p − 1}, and the
operation ⊕ is mod-p addition. The multiplicative operation ∗ is mod-p multiplication; i.e.,
multiply integers as usual and then take the remainder after division by p.

Theorem 7.5 (Prime fields) For every prime p, the set Rp = {0, 1, · · · , p − 1} forms a field
(denoted by Fp ) under mod-p addition and multiplication.

Proof. We have already seen that the elements of Fp form an abelian group under addition
modulo p, namely the cyclic group Zp .
The associative and commutative properties of multiplication mod p follow from the corre-
sponding properties of ordinary multiplication; the distributive law follows from the correspond-
ing property for ordinary addition and multiplication. The multiplicative identity is 1.
To see that the nonzero elements F∗p = Fp − {0} form a group under multiplication, we use
Theorem 7.1. By unique factorization, the product of two nonzero integers a, b < p cannot
equal 0 mod p. Therefore the nonzero elements F∗p are closed under multiplication mod p. Also,
for a, b, c ∈ F∗p and b ≠ c we have a(b − c) mod p ≠ 0. Thus ab ≠ ac mod p, which implies
a ∗ b ≠ a ∗ c. Consequently there are no zeroes or repetitions in the set of p − 1 elements
{a ∗ 1, a ∗ 2, . . . , a ∗ (p − 1)}, which means they must be a permutation of F∗p .
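The permutation argument in this proof, and its failure for composite moduli, can be verified directly in Python (a sketch added to these notes; the function name is ours):

```python
# Sketch: for prime p, each row {a*1, ..., a*(p-1)} of the mod-p
# multiplication table is a permutation of the nonzero elements, as argued
# in the proof above; for composite n the property fails.

def rows_are_permutations(n):
    nonzero = set(range(1, n))
    return all({(a * b) % n for b in nonzero} == nonzero for a in nonzero)

assert rows_are_permutations(7)          # 7 is prime
assert not rows_are_permutations(6)      # 2 * 3 = 0 mod 6, a zero divisor
```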
We next show that Fp is essentially the only field with p elements. More precisely, we show
that all fields with p elements are isomorphic. Two fields F and G are isomorphic if there
is an invertible function h : F → G mapping each α ∈ F into a β = h(α) ∈ G such that
h(α ⊕ α′ ) = h(α) ⊕ h(α′ ) and h(α ∗ α′ ) = h(α) ∗ h(α′ ). Less formally, F and G are isomorphic
if there is a one-to-one correspondence F ↔ G that translates the addition and multiplication
tables of F to those of G and vice versa.

Let F be any field with a prime number p of elements. By the field axioms, F has an additive
identity 0 and multiplicative identity 1. Consider the additive cyclic subgroup generated by 1,
namely S(1) = {1, 1 ⊕ 1, . . .}. By Lagrange’s theorem, the order of S(1) divides |F| = p, and
therefore must be equal to 1 or p. But 1 ⊕ 1 ≠ 1, else 1 = 0, so 1 must have order p. In other
words, S(1) = F, and the additive group of F is isomorphic to that of Zp . We may therefore
denote the elements of F by {0, 1, 2, . . . , p − 1}, and use mod-p addition as the addition rule.
The only remaining question is whether this correspondence F ↔ Zp under addition extends
to multiplication. The distributive law shows that it does: j ∗ i is the sum of j terms each equal
to i, so j ∗ i = (ji mod p). Therefore, in summary:
Theorem 7.6 (Prime field uniqueness) Every field F with a prime number p of elements is
isomorphic to Fp via the correspondence 1 ⊕ · · · ⊕ 1 (i terms) ∈ F ↔ i ∈ Fp .

In view of this elementary isomorphism, we will denote any field with a prime number p of
elements by Fp .
It is important to note that the set Zn of integers mod n does not form a field if n is not prime.
The reason is that n = ab for some positive integers a, b < n; then a and b are nonzero elements
of Zn with ab mod n = 0, so the set of nonzero elements of Zn is not closed under multiplication
mod n.
However, we will see shortly that there do exist finite fields with non-prime numbers of elements
that use other rules for addition and multiplication.

7.4.2 The prime subfield of a finite field

A subfield G of a field F is a subset of the field that is itself a field under the operations of F.
For example, the real field R is a subfield of the complex field C. We now show that every finite
field Fq has a subfield that is isomorphic to a prime field Fp .
Let Fq be a finite field with q elements. By the field axioms, Fq has an additive identity 0 and
a multiplicative identity 1.
Consider the cyclic subgroup of the additive group of Fq that is generated by 1, namely
S(1) = {1, 1 ⊕ 1, . . .}. Let n = |S(1)|. By the cyclic group theorem, S(1) is isomorphic to Zn ,
and its elements may be denoted by {0, 1, 2, . . . , n − 1}, with mod-n addition.
By the distributive law in Fq , the product i∗j (in Fq ) of two nonzero elements in S(1) is simply
the sum of ij ones, which is an element of S(1), namely ij mod n. Since this is a product of
nonzero elements of Fq , by the field axioms ij mod n must be nonzero for all nonzero i, j. This
will be true if and only if n is a prime number p.
Thus S(1) forms a subfield of Fq with a prime number p of elements. By the prime field
theorem of the previous subsection, S(1) is isomorphic to Fp . Thus the elements of S(1), which
are called the integers of Fq , may be denoted by Fp = {0, 1, . . . , p − 1}, and the addition and
multiplication rules of Fq reduce to mod-p addition and multiplication in Fp .
The prime p is called the characteristic of Fq . Since the p-fold sum of the identity 1 with itself
is 0, the p-fold sum of every field element β ∈ Fq with itself is 0: pβ = 0.
In summary:
Theorem 7.7 (Prime subfields) The integers {1, 1 ⊕ 1, . . .} of any finite field Fq form a sub-
field Fp ⊆ Fq with a prime number p of elements, where p is the characteristic of Fq .

7.5 Polynomials
We now consider polynomials over Fp , namely polynomials whose coefficients lie in Fp and
for which polynomial addition and multiplication is performed in Fp . We will see that the
factorization properties of polynomials are similar to those of the integers, and that the analogue
to mod-n arithmetic is arithmetic modulo a polynomial f (x).
A nonzero polynomial f (x) of degree m over a field F is an expression of the form

f (x) = f0 + f1 x + f2 x2 + · · · + fm xm ,

where fi ∈ F, 0 ≤ i ≤ m, and fm ≠ 0. We say that deg f (x) = m. The symbol x represents
an indeterminate (or “placeholder”), not an element of F; i.e., two polynomials are different if
and only if their coefficients are different3 . The nonzero polynomials of degree 0 are simply the
nonzero field elements f0 ∈ F. There is also a special zero polynomial f (x) = 0 whose degree
is defined by convention as deg 0 = −∞; we will explain the reason for this convention shortly.
The set of all polynomials over F in an indeterminate x is denoted by F[x].
The rules for adding, subtracting or multiplying polynomials are the same over a general field
F as over the real field R, except that coefficient operations are in F. In particular, addition and
subtraction are performed componentwise. For multiplication, the coefficients of a polynomial
product f (x) = h(x)g(x) are determined by convolution:

f_i = Σ_{j=0}^{i} h_j g_{i−j} .

If two nonzero polynomials are multiplied, then their degrees add; i.e., deg(h(x)g(x)) =
deg h(x) + deg g(x). The convention deg 0 = −∞ ensures that this formula continues to hold
when h(x) or g(x) is the zero polynomial.
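The convolution rule is easy to implement directly. Here is a Python sketch (added to these notes; coefficients are stored lowest degree first, and the function name `poly_mul` is ours):

```python
# Sketch: polynomial multiplication over F_p by the convolution formula
# f_i = sum over j of h_j * g_(i-j), with all coefficient arithmetic mod p.
# A polynomial is a coefficient list [f0, f1, ...], lowest degree first.

def poly_mul(h, g, p):
    f = [0] * (len(h) + len(g) - 1)      # deg f = deg h + deg g
    for i in range(len(f)):
        for j in range(len(h)):
            if 0 <= i - j < len(g):
                f[i] = (f[i] + h[j] * g[i - j]) % p
    return f

# Over F_2: (x + 1)(x + 1) = x^2 + 1, since the cross terms cancel mod 2.
assert poly_mul([1, 1], [1, 1], 2) == [1, 0, 1]
```

Note that the length of the result list enforces the degree-additivity rule deg(h(x)g(x)) = deg h(x) + deg g(x) for nonzero inputs.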
The set F[x] has many of the properties of a field. It is evidently an abelian group under
addition whose identity is the zero polynomial 0 ∈ F[x]. It is closed under multiplication, which
is both associative and commutative and which distributes over addition. It has a multiplicative
identity 1 ∈ F[x], and the cancellation law holds.
However, in general we cannot divide evenly by a nonzero polynomial, since a polynomial f (x)
with deg f (x) > 0 has no multiplicative inverse. Therefore, like the ring of integers Z, F[x] is a
ring,4 not a field. We now develop a series of properties of F[x] that resemble those of Z.
3 Over the real field R, a polynomial f (x) is sometimes regarded as a function f : R → R. This alternative
viewpoint makes little difference in the real case, since two polynomials over R are different if and only if the
corresponding polynomial functions are different. However, over finite fields it is important to maintain the
distinction. For example, over F2 the polynomial functions x and x2 both map 0 → 0, 1 → 1, yet the polynomials
x and x2 are different.
4 The axioms of a ring are similar to those for a field, except that there is no multiplicative inverse. For example,
Z and Zn (for n not a prime) are rings. In fact, Z and F[x] are integer domains, which are the nicest kind of
rings. An integer domain is a ring with commutative multiplication and a multiplicative identity 1 such that the
nonzero elements are closed under multiplication.
Exercise 6. Show that an integer domain with a finite number of elements must be a finite field. [Hint:
consider its cyclic multiplicative subgroups.]

7.5.1 Definitions

A polynomial g(x) is said to be a divisor of a polynomial f (x) if f (x) is a polynomial multiple
of g(x); i.e., f (x) = q(x)g(x) for some polynomial q(x). Thus all polynomials are trivially
divisors of the zero polynomial 0.
divisors of the zero polynomial 0.
The polynomials that have polynomial inverses are the nonzero degree-0 polynomials β ∈ F∗ =
F − {0}. These are called the units of F[x]. If u(x) is a unit polynomial and g(x) is a divisor of
f (x), then u(x)g(x) is a divisor of f (x) and g(x) is a divisor of u(x)f (x). Thus the factorization
of a polynomial can be unique only up to a unit polynomial u(x), and u(x)f (x) has the same
divisors as f (x).
A monic polynomial is a nonzero polynomial f (x) of degree m with high-order coefficient fm
equal to 1; i.e., f (x) = f0 + f1 x + f2 x2 + · · · + xm . Every nonzero polynomial g(x) may be
written as the product g(x) = gm f (x) of a monic polynomial f (x) of the same degree with a unit
polynomial u(x) = gm , and the product of two monic polynomials is monic. We may therefore
consider only factorizations of monic polynomials into products of monic polynomials.
Every nonzero polynomial f (x) is divisible by 1 and f (x); these divisors are called trivial. A
polynomial g(x) is said to be a factor of a polynomial f (x) if g(x) is monic and a nontrivial
divisor of f (x). Thus the degree of any factor g(x) of f (x) satisfies 1 ≤ deg g(x) < deg f (x).
A polynomial g(x) of degree 1 or more that has no factors is called an irreducible polynomial,
and a monic irreducible polynomial is called a prime polynomial. Our goal now is to show that
every monic polynomial has a unique factorization into prime polynomial factors.

7.5.2 Mod-g(x) arithmetic

Given a monic polynomial g(x) of degree m, every polynomial f (x) may be expressed as f (x) =
q(x)g(x)+r(x) for some polynomial remainder r(x) such that deg r(x) < m and some polynomial
quotient q(x). This may be proved by the Euclidean long division algorithm of high school, with
component operations in F; i.e., divide g(x) into f (x) by long division, high-degree terms first,
stopping when the degree of the remainder is less than that of g(x). The following exercise
shows that the resulting quotient q(x) and remainder r(x) are unique.
Exercise 7 (Euclidean division algorithm).
(a) For the set F[x] of polynomials over any field F, show that the distributive law holds:
(f1 (x) + f2 (x))h(x) = f1 (x)h(x) + f2 (x)h(x).
(b) Use the distributive law to show that for any given f (x) and g(x) in F[x], there is a unique
q(x) and r(x) with deg r(x) < deg g(x) such that f (x) = q(x)g(x) + r(x).
The remainder polynomial r(x), denoted by r(x) = f (x) mod g(x), is the more important
part of this decomposition. The set of all possible remainder polynomials is the set RF,m =
{r0 + r1 x + · · · + rm−1 xm−1 | rj ∈ F, 0 ≤ j ≤ m − 1}, whose size is |RF,m | = |F|m . Evidently
g(x) is a divisor of f (x) if and only if f (x) mod g(x) = 0.
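The long-division procedure described above can be sketched in Python (added to these notes; coefficients are lowest degree first, the divisor g is assumed monic, and the zero remainder is represented by the empty list):

```python
# Sketch: Euclidean long division of f(x) by a monic g(x) over F_p,
# returning (q, r) with f = q*g + r and deg r < deg g. At each step the
# leading term of r is cancelled against the leading term of g.

def poly_divmod(f, g, p):
    r = [c % p for c in f]
    q = [0] * max(len(f) - len(g) + 1, 1)
    while len(r) >= len(g) and any(r):
        shift = len(r) - len(g)
        coef = r[-1]                    # leading coefficient of r
        q[shift] = coef                 # g is monic, so no inversion needed
        for i, gi in enumerate(g):
            r[shift + i] = (r[shift + i] - coef * gi) % p
        while r and r[-1] == 0:         # drop the zeroed leading terms
            r.pop()
    return q, r

# Over F_2: x^2 + 1 = (x + 1)(x + 1) + 0, so (x^2 + 1) mod (x + 1) = 0.
q, r = poly_divmod([1, 0, 1], [1, 1], 2)
assert q == [1, 1] and r == []
```

Because g(x) is monic, no division of coefficients is required; for a general nonzero g(x) one would first multiply by the inverse of its leading coefficient in F_p.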
Remainder arithmetic using the remainder set RF,m is called “mod-g(x) arithmetic.” The
rules for mod-g(x) arithmetic follow from the rules for polynomial arithmetic as follows. Let
r(x) = f (x) mod g(x) and s(x) = h(x) mod g(x); then, as polynomials, r(x) = f (x) − q(x)g(x)
and s(x) = h(x) − t(x)g(x) for some quotient polynomials q(x) and t(x). Then

f (x) + h(x) = r(x) + s(x) + (q(x) + t(x))g(x);
f (x)h(x) = r(x)s(x) + (q(x)s(x) + t(x)r(x))g(x) + q(x)t(x)g^2 (x).

Hence (f (x) + h(x)) mod g(x) = (r(x) + s(x)) mod g(x) and f (x)h(x) mod g(x) = r(x)s(x)
mod g(x). In other words, the mod-g(x) remainder of the sum or product of two polynomials is
equal to the mod-g(x) remainder of the sum or product of their mod-g(x) remainders.
The mod-g(x) addition and multiplication rules are therefore defined as follows:

r(x) ⊕ s(x) = (r(x) + s(x)) mod g(x);


r(x) ∗ s(x) = (r(x)s(x)) mod g(x),

where “r(x)” and “s(x)” denote elements of the remainder set RF,m on the left and the corre-
sponding ordinary polynomials on the right. This makes mod-g(x) arithmetic consistent with
ordinary polynomial arithmetic in the sense of the previous paragraph.
Note that the mod-g(x) addition rule is just componentwise addition of coefficients in F. In this
sense the additive groups of RF,m and of the vector space Fm of m-tuples over F are isomorphic.

7.5.3 Unique factorization

By definition, every monic polynomial f (x) is either irreducible or can be factored into a product
of monic polynomial factors, each of lower degree. In turn, if a factor is not irreducible, it can
be factored further. Since factor degrees are decreasing but bounded below by 1, we must
eventually arrive at a product of monic irreducible (prime) polynomials. The following theorem
shows that there is only one such set of prime polynomial factors, regardless of the order in
which the polynomial is factored.

Theorem 7.8 (Unique factorization of polynomials) Over any field F, every monic poly-
nomial f (x) ∈ F[x] of degree m ≥ 1 may be written in the form

f (x) = ∏_{i=1}^{k} ai (x),

where each ai (x), 1 ≤ i ≤ k, is a prime polynomial in F[x]. This factorization is unique, up to
the order of the factors.

Proof. We have already shown that f (x) may be factored in this way, so we need only prove
uniqueness. Thus assume hypothetically that the theorem is false and let m be the smallest
degree such that there exists a degree-m monic polynomial f (x) with more than one such
factorization,
f (x) = a1 (x) · · · ak (x) = b1 (x) · · · bj (x); j, k ≥ 1, (7.2)
where a1 (x), . . . , ak (x) and b1 (x), . . . , bj (x) are prime polynomials. We will show that this implies
a polynomial f′(x) of degree less than m with a non-unique factorization, and this contradiction
will prove the theorem. Now a1 (x) cannot appear on the right side of (7.2), else it could be
factored out for an immediate contradiction. Similarly, b1 (x) cannot appear on the left. Without
loss of generality, assume deg b1 (x) ≤ deg a1 (x). By the Euclidean division algorithm, a1 (x) =
q(x)b1 (x) + r(x). Since a1 (x) is irreducible, r(x) ≠ 0 and 0 ≤ deg r(x) < deg b1 (x) ≤ deg a1 (x).
88 CHAPTER 7. INTRODUCTION TO FINITE FIELDS

Thus r(x) has a prime factorization r(x) = βr1 (x) · · · rn (x), where β is the high-order coefficient
of r(x), and b1 (x) is not a divisor of any of the ri (x), since it has greater degree. Substituting
into (7.2), we have

(q(x)b1 (x) + βr1 (x) · · · rn (x))a2 (x) · · · ak (x) = b1 (x) · · · bj (x),

or, defining f′(x) = r1 (x) · · · rn (x)a2 (x) · · · ak (x) and rearranging terms,

f′(x) = r1 (x) · · · rn (x)a2 (x) · · · ak (x) = β^−1 b1 (x)(b2 (x) · · · bj (x) − q(x)a2 (x) · · · ak (x)).

Now f′(x) is monic, because it is a product of monic polynomials; it has degree less than f (x),
since deg r(x) < deg a1 (x); and it has two different factorizations, with b1 (x) a factor in one but
not a divisor of any of the factors in the other; contradiction.
Exercise 8. Following this proof, prove unique factorization for the integers Z.

7.5.4 Enumerating prime polynomials

The prime polynomials in F[x] are analogous to the prime numbers in Z. One way to enumerate
the prime polynomials is to use an analogue of the sieve of Eratosthenes. For integers, this
method goes as follows: Start with a list of all integers greater than 1. The first integer on the
list is 2, which is prime. Erase all multiples of 2 (even integers). The next remaining integer
is 3, which must be the next prime. Erase all multiples of 3. The next remaining integer is 5,
which must be the next prime. Erase all multiples of 5. And so forth.
Similarly, to find the prime polynomials in F2 [x], for example, first list all polynomials of degree
1 or more in F2 [x] in order of degree. (Note that all nonzero polynomials in F2 [x] are monic.)
No degree-1 polynomial can have a factor, so the two degree-1 polynomials, x and x + 1, are
both prime. Next, erase all degree-2 multiples of x and x + 1, namely

x2 = x ∗ x;
x2 + x = x ∗ (x + 1);
x2 + 1 = (x + 1) ∗ (x + 1)

from the list of four degree-2 polynomials. This leaves one prime degree-2 polynomial, namely
x2 + x + 1. Next, erase all degree-3 multiples of x, x + 1, and x2 + x + 1 from the list of eight
degree-3 polynomials, namely the six polynomials

x3 = x ∗ x ∗ x;
x3 + x2 = (x + 1) ∗ x ∗ x;
x3 + x = (x + 1) ∗ (x + 1) ∗ x;
x3 + x2 + x = x ∗ (x2 + x + 1);
x3 + 1 = (x + 1) ∗ (x2 + x + 1);
x3 + x2 + x + 1 = (x + 1) ∗ (x + 1) ∗ (x + 1).

The remaining two polynomials, namely x3 + x2 + 1 and x3 + x + 1, must therefore be prime.


Exercise 9. Find all prime polynomials in F2 [x] of degrees 4 and 5. [Hint: There are three
prime polynomials in F2 [x] of degree 4 and six of degree 5.]
Continuing in this way, we may list all prime polynomials in F2 [x] up to any desired degree.

It turns out that the number N (m) of prime polynomials of F2 [x] of degree m is N (m) =
2, 1, 2, 3, 6, 9, 18, 30, 56, 99, . . . for m = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . .. (In Section 7.9 we will give a
simpler method to compute N (m), and will show that N (m) > 0 for all m.)
A similar sieve algorithm may be used to find the prime polynomials in F[x] over any finite
field F. The algorithm starts with a listing of the monic polynomials ordered by degree, and
successively erases the multiples of lower-degree prime polynomials.

7.6 A construction of a field with pm elements


We now show how to construct a field with pm elements for any prime integer p and positive
integer m ≥ 1. Its elements will be the set RF,m of remainder polynomials of degree less than m,
and multiplication will be defined modulo an irreducible polynomial g(x) of degree m. We will
subsequently show that every finite field is isomorphic to a finite field that is constructed
in this way.
The construction assumes the existence of a prime polynomial g(x) ∈ Fp [x] of degree m. The
proof that such a polynomial exists for all prime p and m ≥ 1 will be deferred until later. The
field that we construct will be denoted by Fg(x) .
The set of elements of Fg(x) will be taken to be the mod-g(x) remainder set RFp ,m = {r0 +
r1 x + · · · + rm−1 xm−1 | rj ∈ Fp , 0 ≤ j ≤ m − 1}, whose size is |RFp ,m | = pm .
The addition and multiplication rules will be taken to be those of mod-g(x) arithmetic. We
must show that the axioms of a field are satisfied with these definitions.
The associative, commutative and distributive laws for mod-g(x) arithmetic follow from the
corresponding laws for ordinary polynomial arithmetic.
Mod-g(x) addition of two remainder polynomials in Fg(x) yields a remainder polynomial of
degree < m in Fg(x) . Fg(x) evidently forms an abelian group under mod-g(x) addition. (As
already mentioned, this group is isomorphic to the additive group of (Fp )m .)
Mod-g(x) multiplication of two remainder polynomials r(x), s(x) yields the remainder polyno-
mial t(x) = r(x)s(x) mod g(x). The following exercise shows that the nonzero elements of Fg(x)
form an abelian group under mod-g(x) multiplication:
Exercise 10. Let g(x) be a prime polynomial of degree m, and let r(x), s(x), t(x) be polyno-
mials in Fg(x) .
(a) Prove the distributive law, i.e., (r(x)+ s(x)) ∗t(x) = r(x) ∗t(x)+ s(x) ∗t(x). [Hint: Express
each product as a remainder using the Euclidean division algorithm.]
(b) For r(x) ≠ 0, show that r(x) ∗ s(x) ≠ r(x) ∗ t(x) if s(x) ≠ t(x).
(c) For r(x) = 0, show that as s(x) runs through all nonzero polynomials in Fg(x) , the product
r(x) ∗ s(x) also runs through all nonzero polynomials in Fg(x) .
(d) Using part (c) and Theorem 7.1, show that the nonzero elements of Fg(x) form an abelian
group under mod-g(x) multiplication.
Since we have verified the three field axioms, we have proved:

Theorem 7.9 (Construction of Fg(x) ) If g(x) is a prime polynomial of degree m over a
prime field Fp , then the set of remainder polynomials RFp ,m with mod-g(x) arithmetic forms a
finite field Fg(x) with pm elements.

Example 1. Let us construct a finite field with 22 = 4 elements using the prime degree-2
polynomial g(x) = x2 + x + 1 ∈ F2 [x].
There are four remainder polynomials mod x2 + x + 1, namely {0, 1, x, x + 1}. Addition is
componentwise mod 2. For multiplication, note that x∗x = x+1 since x2 mod (x2 +x+1) = x+1.
Also x ∗ x ∗ x = x ∗ (x + 1) = 1 since x3 mod (x2 + x + 1) = 1. The three nonzero elements
{1, x, x + 1} thus form a cyclic group under mod-g(x) multiplication, which verifies the second
field axiom for this example.
The complete mod-g(x) addition and multiplication tables are as follows:
  ⊕  |  0    1    x   x+1          ∗  |  0    1    x   x+1
 ----+--------------------        ----+--------------------
  0  |  0    1    x   x+1          0  |  0    0    0    0
  1  |  1    0   x+1   x           1  |  0    1    x   x+1
  x  |  x   x+1   0    1           x  |  0    x   x+1   1
 x+1 | x+1   x    1    0          x+1 |  0   x+1   1    x

7.7 The multiplicative group of F∗q is cyclic


In this section we consider an arbitrary finite field Fq with q elements. By the second field axiom,
the set F∗q of all q − 1 nonzero elements must form a finite abelian group under multiplication.
In this section we will show that this group is actually cyclic.
We start by showing that every element of F∗q is a root of the polynomial xq−1 − 1 ∈ Fq [x].
Thus we first need to discuss roots of polynomials over arbitrary fields.

7.7.1 Roots of polynomials

Let F[x] be the set of polynomials over an arbitrary field F. If f (x) ∈ F[x] has a degree-1 factor
x − α for some α ∈ F, then α is called a root of f (x).
Since any f (x) may be uniquely expressed as f (x) = q(x)(x−α)+β for some quotient q(x) and
some β ∈ F (i.e., for some remainder r(x) = β of degree less than 1), it follows that f (α) = β.
Therefore α is a root of f (x) if and only if f (α) = 0 — i.e., if and only if α is a root of the
polynomial equation f (x) = 0.
By degree additivity, the degree of a polynomial f (x) is equal to the sum of the degrees of
its prime factors, which are unique by unique factorization. Therefore a polynomial of degree
m can have at most m degree-1 factors. This yields what is sometimes called the fundamental
theorem of algebra:

Theorem 7.10 (Fundamental theorem of algebra) Over any field F, a monic polynomial
f (x) ∈ F[x] of degree m can have no more than m roots in F. If it does have m roots {β1 , . . . , βm },
then the unique factorization of f (x) is f (x) = (x − β1 ) · · · (x − βm ).

Since the polynomial xn − 1 can have at most n roots in F, we have an important corollary:

Theorem 7.11 (Cyclic multiplicative subgroups) In any field F, the multiplicative group
F∗ of nonzero elements has at most one cyclic subgroup of any given order n. If such a subgroup
exists, then its elements {1, β, . . . , β n−1 } satisfy
xn − 1 = (x − 1)(x − β) · · · (x − β n−1 ).

For example, the complex multiplicative group C∗ has precisely one cyclic subgroup of each
finite size n, consisting of the n complex nth roots of unity. The real multiplicative group R∗
has cyclic subgroups of size 1 ({1}) and 2 ({±1}), but none of any larger size.
Exercise 11. For 1 ≤ j ≤ n, the jth elementary symmetric function σj (S) of a set S of n
elements of a field F is the sum of all C(n, j) = n!/(j!(n − j)!) products of j distinct elements of S. In particular,
σ1 (S) is the sum of all elements of S, and σn (S) is the product of all elements of S.
(a) Show that if S = {1, β, . . . , β^{n−1}} is a cyclic subgroup of F∗ , then σj (S) = 0 for 1 ≤ j ≤ n−1
and σn (S) = (−1)^{n+1}. In particular,

∑_{j=0}^{n−1} β^j = 0, if n > 1;        ∏_{j=0}^{n−1} β^j = (−1)^{n+1}.

Verify for S = {±1, ±i} (the four complex 4th roots of unity).
(b) Prove that for any odd prime integer p,
(p − 1)! = 1 · 2 · 3 · · · (p − 1) = −1 mod p.
Verify for p = 3, 5 and 7.

7.7.2 Factoring xq − x over Fq

For any β ∈ Fq∗ , consider the cyclic subgroup S(β) = {1, β, β 2 , β 3 , . . .} of Fq∗ generated by β.
The size |S(β)| of this subgroup is called the multiplicative order of β.
By the cyclic group theorem, β |S(β)| = 1, and by Lagrange’s theorem, |S(β)| must divide
|F∗q | = q − 1. It follows that β q−1 = 1 for all β ∈ Fq∗ .
In other words, every β ∈ Fq∗ is a root of the polynomial equation xq−1 = 1, or equivalently
of the polynomial xq−1 − 1 ∈ Fq [x]. By the polynomial roots theorem, xq−1 − 1 can have at
most q − 1 roots in Fq , so these are all the roots of xq−1 − 1. Thus xq−1 − 1 factors into the
product of the degree-1 polynomials x − β for all β ∈ F∗q . Moreover, since 0 ∈ Fq is a root of the
polynomial x and x(xq−1 − 1) = xq − x, the polynomial xq − x factors into the product of the
degree-1 polynomials x − β for all β ∈ Fq .
To summarize:

Theorem 7.12 In a finite field Fq with q elements, every nonzero field element β ∈ Fq satisfies
β q−1 = 1 and has a multiplicative order |S(β)| that divides q − 1. The nonzero elements of Fq
are the q − 1 distinct roots of the polynomial xq−1 − 1 ∈ Fq [x]; i.e.,

x^{q−1} − 1 = ∏_{β ∈ F∗q} (x − β).        (7.3)

The elements of Fq are the q distinct roots of the polynomial x^q − x ∈ Fq [x]; i.e.,

x^q − x = ∏_{β ∈ Fq} (x − β).        (7.4)

Exercise 12.
(a) Verify (7.3) for the prime field F5 .
(b) Verify (7.3) for the field F4 that was constructed in Example 1. [Hint: use a symbol other
than x for the indeterminate in (7.3).]

7.7.3 Every finite field has a primitive element

A primitive element of a finite field Fq is an element α whose multiplicative order |S(α)| equals
q − 1. If α is a primitive element, then the cyclic group {1, α, α2 , . . . , αq−2 } is a set of q − 1
distinct nonzero elements of Fq , which therefore must be all the nonzero elements. Thus if we can
show that Fq has at least one primitive element, we will have shown that its nonzero elements
F∗q form a cyclic group under multiplication of size q − 1.
By Lagrange’s theorem, the multiplicative order |S(β)| of each nonzero element β ∈ F∗q divides
q − 1. Therefore the size d of each cyclic subgroup of Fq∗ divides q − 1. As we have seen, the
number of elements in a cyclic group or subgroup of size d that have order d is the Euler number
φ(d). Since by the cyclic subgroups theorem F∗q has at most one cyclic subgroup of each size d,
the number of elements in F∗q with order less than q − 1 is at most

∑_{d: d|(q−1), d≠q−1} φ(d).

But since the Euler numbers satisfy the relationship (7.1), which in this case is

q − 1 = ∑_{d: d|(q−1)} φ(d),

we conclude that there must be at least φ(q − 1) elements of F∗q with order q − 1. Indeed, since
F∗q has at most φ(q − 1) elements of order q − 1, all inequalities must be satisfied with equality;
i.e., F∗q has precisely φ(d) elements of order d for each divisor d of q − 1.
We saw in Exercise 3 that φ(q − 1) ≥ 1, so a primitive element α of order q − 1 exists. Thus
F∗q is cyclic and has one cyclic subgroup of each order d that divides q − 1. This proves the
following theorem:

Theorem 7.13 (Primitive elements) Given any field Fq with q elements, the nonzero ele-
ments of Fq form a multiplicative cyclic group F∗q = {1, α, α2 , . . . , αq−2 }. Consequently F∗q has
φ(d) ≥ 1 elements of multiplicative order d for every d that divides q − 1, and no elements of
any other order. In particular, F∗q has φ(q − 1) ≥ 1 primitive elements.

Henceforth we will usually write the elements of a finite field Fq as {0, 1, α, α2 , . . . , αq−2 }, where
α denotes a primitive element. For Fg(x) , denoting a field element β as a power of α rather than
as a remainder polynomial helps to avoid confusion when we consider polynomials in β.
Example 2. The prime field F5 has φ(1) = 1 element of order 1 (the element 1), φ(2) = 1
element of order 2 (namely 4 = -1), and φ(4) = 2 primitive elements of order 4 (namely, 2 and
3). We can therefore write F5 = {0, 1, 2, 22 , 23 }, since 22 = 4 and 23 = 3 mod 5.
Example 3. A field F16 = {0, 1, α, . . . , α14 } with 16 elements has

• φ(1) = 1 element of order 1 (the element 1);

• φ(3) = 2 elements of order 3 (α5 and α10 );

• φ(5) = 4 elements of order 5 (α3 , α6 , α9 , α12 ), and

• φ(15) = 8 primitive elements of order 15 (α, α2 , α4 , α7 , α8 , α11 , α13 , α14 ).



The “logarithmic” representation of the nonzero elements of Fq as distinct powers of a primitive


element α is obviously highly convenient for multiplication and division. Multiplication in Fq is
often carried out by using such a “log table” to convert a polynomial f (x) ∈ Fq to the exponent
i such that f (x) = αi , and then using an inverse “antilog table” to convert back after adding or
subtracting exponents. (Note that the zero element can be included in this scheme if we define
0 = α−∞ .)

7.8 Every finite field is isomorphic to a field Fg(x)


We now wish to show that every finite field Fq is isomorphic to a field Fg(x) of the type that
we have previously constructed. In particular, this will show that the number of elements of a
finite field must be q = pm , a prime power.
The development relies on the properties of minimal polynomials, which are the factors that
appear in the unique factorization of xq − x over the prime subfield Fp of Fq .

7.8.1 Factoring xq − x into minimal polynomials over Fp

Again, consider any field Fq with q elements. We have seen in Theorem 7.12 that the polynomial
xq − x ∈ Fq [x] factors completely into q degree-1 factors x − β ∈ Fq [x], β ∈ Fq .
We have also seen that if Fq has characteristic p, then Fq has a prime subfield Fp with p
elements. The prime subfield Fp contains the integers of Fq , which include {0, ±1}. Therefore
we may regard xq − x alternatively as a polynomial in Fp [x].
By unique factorization, xq − x factors over Fp into a unique product of prime polynomials
gi (x) ∈ Fp [x]:

x^q − x = ∏_i gi (x).        (7.5)

Since each coefficient of gi (x) is an element of Fp ⊆ Fq , it is also an element of Fq , so gi (x) is


also a monic polynomial in Fq [x]. We therefore have the following two factorizations of xq − x
in Fq [x]:

x^q − x = ∏_{β ∈ Fq} (x − β) = ∏_i gi (x).        (7.6)

Since the first factorization is the unique prime factorization, it follows that each monic polyno-
mial gi (x) of degree greater than 1 must be reducible over Fq , and must factor into a product
of degree-1 monic polynomials; i.e.,


gi (x) = ∏_{j=1}^{deg gi (x)} (x − βij ).        (7.7)

The prime polynomials gi (x) are called the minimal polynomials of Fq . Since each β ∈ Fq
appears exactly once on the left side of (7.6), it also appears as a factor in exactly one minimal
polynomial in (7.7). Thus the elements of Fq are partitioned into disjoint sets {βi1 , . . . , βik }
where k = deg gi (x), and each β ∈ Fq is a root of exactly one minimal polynomial of Fq , called
the minimal polynomial of β.
The key property of the minimal polynomial of β is the following:

Lemma 7.14 Let g(x) be the minimal polynomial of any given β ∈ Fq . Then g(x) is the monic
polynomial of least degree in Fp [x] such that g(β) = 0. Moreover, for any f (x) ∈ Fp [x], f (β) = 0
if and only if g(x) divides f (x).

Proof: Let h(x) ∈ Fp [x] be a monic polynomial of least degree such that h(β) = 0. Using
the Euclidean division algorithm, g(x) = q(x)h(x) + r(x) where deg r(x) < deg h(x). Since
h(β) = g(β) = 0, we must have r(β) = 0. By the smallest degree property of h(x), this implies
that r(x) = 0, so h(x) divides g(x). But since g(x) is irreducible, h(x) cannot have degree less
than g(x); i.e., deg h(x) = deg g(x). Moreover, since both h(x) and g(x) are monic, this implies
that h(x) = g(x). Thus g(x) is the monic polynomial of least degree in Fp [x] such that g(β) = 0.
Now let f (x) be any polynomial in Fp [x] that satisfies f (β) = 0. By Euclidean division, f (x) =
q(x)g(x) + r(x) with deg r(x) < deg g(x). Thus r(β) = f (β) = 0. Since deg r(x) < deg g(x),
r(β) = 0 if and only if r(x) = 0; i.e., if and only if g(x) divides f (x).
Example 1 (cont.). Again consider the field F4 of Example 1, whose elements we now write
as {0, 1, α, α2 }, where α may be taken as x or x + 1. This field has characteristic 2. The prime
factorization of the binary polynomial x4 − x = x4 + x ∈ F2 [x] is

x4 + x = x(x + 1)(x2 + x + 1),

so the minimal polynomials of F4 are x, x + 1 and x2 + x + 1. The elements 0 and 1 ∈ F4 are


the roots of x and x + 1, respectively. From (7.7), the other two elements of F4 , namely α and
α2 , must be roots of x2 + x + 1 ∈ F2 [x]. We verify that

x2 + x + 1 = (x + α)(x + α2 )

since α + α2 = 1 and α ∗ α2 = α3 = 1.

7.8.2 Valuation maps, minimal polynomials and subfields

Given a field Fq with prime subfield Fp , we now consider evaluating a nonzero polynomial
f (x) = ∑_i fi x^i ∈ Fp [x] at an element β ∈ Fq to give a value

f (β) = ∑_{i=0}^{deg f (x)} fi β^i

in Fq , where fi is taken as an element of Fq for the purposes of this evaluation. The value of the
zero polynomial at any β is 0.
The value f (β) depends on both the polynomial f (x) and the field element β ∈ Fq . Rather than
regarding f (β) as a function of β, as the notation suggests, we will regard f (β) as a function of
the polynomial f (x) ∈ Fp [x] for a fixed β. In other words, we consider the map mβ : Fp [x] → Fq
that is defined by mβ (f (x)) = f (β).
The set of values mβ (Fp [x]) of this map as f (x) ranges over polynomials in Fp [x] is by definition
the subset of elements Gβ ⊆ Fq that can be expressed as linear combinations over Fp of powers of
β. We will show that Gβ forms a subfield of Fq that is isomorphic to the polynomial remainder
field Fg(x) , where g(x) is the minimal polynomial of β, namely the monic polynomial of least
degree such that g(β) = 0.

We observe that the map mβ : Fp [x] → Fq preserves addition and multiplication; i.e.,
mβ (f1 (x) + f2 (x)) = mβ (f1 (x)) + mβ (f2 (x)) since both sides equal f1 (β) + f2 (β), and
mβ (f1 (x)f2 (x)) = mβ (f1 (x))mβ (f2 (x)) since both sides equal f1 (β)f2 (β).
We can now prove the desired isomorphism between the fields Fg(x) and Gβ :

Theorem 7.15 (Subfields generated by β ∈ Fq ) For any β ∈ Fq , let g(x) be the minimal
polynomial of β. Then the set of all linear combinations Gβ = {f (β) = ∑_i fi β^i , f (x) ∈ Fp [x]}
over Fp of powers of β is equal to the set {r(β), r(x) ∈ RFp ,m } of values of remainder polynomials
r(x) ∈ RFp ,m , and Gβ is a field which is isomorphic to the field Fg(x) under the correspondence
r(β) ∈ Gβ ↔ r(x) ∈ RFp ,m .

Proof. We first verify that the correspondence mβ : RFp ,m → Gβ is one-to-one (invertible).


First, if f (β) is any element of Gβ , then by Euclidean division we can write f (x) = q(x)g(x)+r(x)
where r(x) ∈ RFp ,m , and then f (β) = q(β)g(β)+r(β) = r(β), so f (β) = r(β) for some remainder
polynomial r(x). Thus mβ (RFp ,m ) = mβ (Fp [x]) = Gβ . On the other hand, no two remainder
polynomials r(x), s(x) with degrees less than m can evaluate to the same element of Gβ , because
if r(β) = s(β), then r(x) − s(x) is a nonzero polynomial of degree less than g(x) that evaluates
to 0, contradiction.
Now, as we have already seen, mβ (r(x) + s(x)) = mβ (r(x)) + mβ (s(x)) and mβ (r(x)s(x)) =
mβ (r(x))mβ (s(x)), which verifies that this correspondence is an isomorphism.
We remark that Gβ may be viewed as the smallest subfield of Fq containing the element β,
because any subfield containing β must also contain all powers of β and all linear combinations
of powers over Fp .

7.8.3 Isomorphism theorems

We have shown that every finite field Fq contains a primitive element α. In this case, the subfield
Gα consisting of all linear combinations over Fp of powers of α must evidently be the whole field
Fq . Thus we obtain our main theorem:

Theorem 7.16 (Every finite field is isomorphic to a field Fg(x) ) Every finite field Fq of
characteristic p with q elements is isomorphic to a polynomial remainder field Fg(x) , where g(x)
is a prime polynomial in Fp [x] of degree m. Hence q = pm for some positive integer m.

Exercise 13. For which integers q, 1 ≤ q ≤ 12, does a finite field Fq exist?
Finally, we wish to show that all fields with pm elements are isomorphic. The following lemma
shows that every prime polynomial g(x) of degree m (we are still assuming that there exists at
least one) is a minimal polynomial of every field with pm elements:
Lemma 7.17 Every prime polynomial g(x) ∈ Fp [x] of degree m divides x^{p^m} − x.

Proof. If g(x) is a prime polynomial in Fp [x] of degree m, then the set RFp ,m with mod-g(x)
arithmetic forms a field Fg(x) with p^m elements. The remainder polynomial x ∈ RFp ,m is a field
element β ∈ Fg(x) . Evidently g(β) = 0, but r(β) ≠ 0 for any nonzero r(x) with deg r(x) < m;
therefore g(x) is the minimal polynomial of β. Since β^{p^m −1} = 1, β is a root of x^{p^m −1} − 1.
This implies that g(x) divides x^{p^m −1} − 1, and thus also x^{p^m} − x.

Consequently every field of size pm includes m elements whose minimal polynomial is g(x).
Therefore by the same construction as above, we can prove:

Theorem 7.18 (All finite fields of the same size are isomorphic) For any prime poly-
nomial g(x) ∈ Fp [x] of degree m, every field of pm elements is isomorphic to the polynomial
remainder field Fg(x) .

7.8.4 More on the factorization of x^{p^m} − x

We can now obtain further information on the factorization of xq − x. In view of Theorem 7.16,
we now set q = pm .
We first show that the set of roots of a minimal polynomial gi (x) ∈ Fp [x] is closed under the
operation of taking the pth power. This follows from the curious but important fact that over a
field F of characteristic p, taking the pth power is a linear operation. For example, when p = 2,
squaring is linear because

(α + β)2 = α2 + αβ + αβ + β 2 = α2 + β 2 .

More generally, over any field F,

(α + β)^p = ∑_{j=0}^{p} C(p, j) α^j β^{p−j},

where C(p, j) α^j β^{p−j} denotes the sum of C(p, j) = p!/(j!(p − j)!) terms equal to α^j β^{p−j}.
If F has characteristic p, then the integer C(p, j) may be reduced mod p. Now p! contains a
factor of p, but for 1 ≤ j ≤ p − 1, j! and (p − j)! do not contain a factor of p. Therefore
C(p, j) = 0 mod p for 1 ≤ j ≤ p − 1, and

(α + β)^p = α^p + β^p .

By taking the pth power n times, we may extend this result as follows:

Lemma 7.19 (Linearity of taking the p^n th power) Over any field F of characteristic p,
for any n ≥ 1, taking the p^n th power is linear; i.e.,

(α + β)^{p^n} = α^{p^n} + β^{p^n} .

Note that if F has q = p^m elements, then β^{p^m} = β for all β ∈ F, so this lemma becomes
repetitive for n ≥ m.
Exercise 14. Using this lemma, prove that if f (x) = ∑_{i=0}^{m} fi x^i , then

f^{p^n}(x) = (f0 + f1 x + f2 x^2 + · · · + fm x^m)^{p^n} = f0^{p^n} + f1^{p^n} x^{p^n} + f2^{p^n} x^{2p^n} + · · · + fm^{p^n} x^{mp^n} .

This result yields a useful test for whether a polynomial f (x) ∈ F[x] is in Fp [x] or not, and a
useful formula in case it is:

Lemma 7.20 (Prime subfield polynomials) For any field F of characteristic p and any
f (x) ∈ F[x], f^p(x) = f (x^p) if and only if f (x) ∈ Fp [x]; i.e., if and only if all coefficients fi
are in the prime subfield Fp ⊆ F.

Proof. By Exercise 14, we have

f^p(x) = (f0 + f1 x + f2 x^2 + · · · + fn x^n)^p = f0^p + f1^p x^p + f2^p x^{2p} + · · · + fn^p x^{np} .

Now the elements of F that are in Fp are precisely the p roots of the polynomial x^p − x; thus
β^p = β if and only if β ∈ Fp . Thus the right side of this equation simplifies to f (x^p) if and only
if fi ∈ Fp for all i.
Exercise 15. Prove that a positive integer n is prime if and only if (x − a)^n = x^n − a mod n
for every integer a that is relatively prime to n.[5]
Using Lemma 7.20, we now show that the roots of a minimal polynomial are a cyclotomic coset
of the form {β, β^p, β^{p^2}, . . .}:

Theorem 7.21 (Roots of minimal polynomials) Let g(x) be a minimal polynomial of a fi-
nite field F with p^m elements. Then the roots of g(x) are a set of the form {β, β^p, β^{p^2}, . . . , β^{p^{n−1}}},
where n is a divisor of m. Moreover, g(x) divides x^{p^n} − x.

Proof. Let β be any root of g(x). Since g(x) ∈ Fp [x], Lemma 7.20 shows that g(x^p) = g^p(x).
Therefore g(β^p) = g^p(β) = 0. Thus β^p is also a root of g(x). Iterating, β^{p^2}, β^{p^3}, . . . , β^{p^i}, . . . are
all roots of g(x). Because F is finite, these roots cannot all be distinct. Therefore let n be the
smallest integer such that β^{p^n} = β. Thus β^{p^j} ≠ β for 1 ≤ j < n. This implies that β^{p^{j+k}} ≠ β^{p^j}
for 0 ≤ j < n, 1 ≤ k < n; i.e., all elements of the set {β, β^p, β^{p^2}, . . . , β^{p^{n−1}}} are distinct. Thus
β, β^p, β^{p^2}, . . . is a cyclic sequence and β^{p^j} = β if and only if n is a divisor of j. Since β^{p^m} = β,
we see that n must divide m.
Finally, we show that these roots are all of the roots of g(x); i.e., deg g(x) = n and

g(x) = ∏_{i=0}^{n−1} (x − β^{p^i}).

The right side of this equation is a monic polynomial h(x) ∈ F[x] of degree n. Since the roots
of h(x) are roots of g(x), h(x) must divide g(x) in F[x]. Now, using Lemma 7.20, we can prove
that h(x) is actually a polynomial in Fp [x], because

h^p(x) = ∏_{i=0}^{n−1} (x − β^{p^i})^p = ∏_{i=0}^{n−1} (x^p − β^{p^{i+1}}) = h(x^p),

where we use the linearity of taking the pth power and the fact that β^{p^n} = β. Therefore, since
g(x) has no factors in Fp [x], g(x) must actually be equal to h(x).
Finally, since the roots of g(x) all satisfy β^{p^n} = β, they are all roots of the polynomial x^{p^n} − x,
which implies that g(x) divides x^{p^n} − x.
[5] This is the basis of the polynomial-time primality test of [Agrawal, Kayal and Saxena, 2002].

This theorem has some important implications. First, the degree n of a minimal polynomial
g(x) of a finite field F with p^m elements must be a divisor of m. Second, the subfield Gβ of F
generated by a root β of g(x) must have p^n elements. Third, x^{p^n} − x divides x^{p^m} − x, since the
elements of Gβ are all the roots of x^{p^n} − x and are also roots of x^{p^m} − x.
Conversely, let g(x) be any prime polynomial in Fp [x] of degree n. Then there is a finite field
generated by g(x) with p^n elements. This proves that g(x) divides x^{p^n} − x, and thus g(x) divides
x^{p^m} − x for every multiple m of n. Thus the divisors of x^{p^m} − x include every prime polynomial
in Fp [x] whose degree n divides m.
Moreover, x^{p^m} − x has no repeated factors. We proved this earlier assuming the existence of a
field F with p^m elements; however, we desire a proof that does not make this assumption. The
following exercise yields such a proof.
Exercise 16 (x^{p^m} − x has no repeated factors). The formal derivative of a degree-n polynomial
f (x) ∈ Fp [x] is defined as

f′(x) = ∑_{j=1}^{n} (j mod p) fj x^{j−1} .

(a) Show that if f (x) = g(x)h(x), then f′(x) = g′(x)h(x) + g(x)h′(x).
(b) Show that a prime polynomial g(x) is a repeated divisor of f (x) if and only if g(x) is a
divisor of both f (x) and f′(x).
(c) Show that x^{p^m} − x has no repeated prime factors over Fp .
Now we can conclude our discussion of the factorization of x^{p^m} − x as follows:

Theorem 7.22 (Factors of x^{p^m} − x) The polynomial x^{p^m} − x factors over Fp into the product
of the prime polynomials in Fp [x] whose degrees divide m, with no repetitions.

For example, over F2 , we have

x2 + x = x(x + 1);
x4 + x = x(x + 1)(x2 + x + 1);
x8 + x = x(x + 1)(x3 + x2 + 1)(x3 + x + 1);
x16 + x = x(x + 1)(x2 + x + 1)(x4 + x3 + 1)(x4 + x3 + x2 + x + 1)(x4 + x + 1).

Exercise 17. Find all prime polynomials g(x) ∈ F3 [x] of degree 1 and 2 over the ternary field
F3 . Show that the product of these polynomials is x9 − x = x9 + 2x. Explain, with reference to
F9 .

7.9 Finite fields Fpm exist for all prime p and m ≥ 1


At last we can prove that for every prime p and positive integer m there exists a prime polynomial
g(x) ∈ Fp[x] of degree m. This will prove the existence of a finite field Fg(x) with p^m elements.
Using the factorization of Theorem 7.22, we will show that there do not exist enough prime
polynomials of degree less than m for their product to reach degree p^m.
Let N(n) denote the number of prime polynomials over Fp of degree n. The product of these
polynomials has degree nN(n). Since x^{p^m} − x is the product of these polynomials for all divisors
n of m, and there are no repeated factors, its degree p^m is equal to

p^m = Σ_{n: n|m} nN(n).   (7.8)

This formula may be solved recursively for each N (m), starting with N (1) = p.
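The recursion can be sketched in a few lines (Python; the function name is ours, not the text's). We illustrate it over F3, where the degree-1 and degree-2 counts are consistent with Exercise 17:

```python
def num_prime_polys(p, m):
    """N(m): number of prime polynomials of degree m over Fp, obtained by
    solving  p^k = sum over n | k of n*N(n)  (eq. 7.8) recursively for k <= m."""
    N = {}
    for k in range(1, m + 1):
        lower = sum(n * N[n] for n in range(1, k) if k % n == 0)
        N[k] = (p ** k - lower) // k
    return N[m]

# Over F3: three monic linear primes, and (9 - 3)/2 = 3 primes of degree 2,
# whose combined degree 3*1 + 3*2 = 9 matches the factorization of x^9 - x.
print([num_prime_polys(3, m) for m in range(1, 5)])   # → [3, 3, 8, 18]
```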
Exercise 18. Calculate N (m) for p = 2 for m = 1 to 10. Check your results against those
stated in Section 7.5.4.
Now we are in a position to prove the desired theorem:

Theorem 7.23 (Existence of prime polynomials) Let N (m) be the number of prime poly-
nomials in Fp [x] of degree m, which is given recursively by (7.8). For every prime p and positive
integer m, N (m) > 0.

Proof. Note first that nN(n) ≤ p^n. Thus

p^m ≤ mN(m) + Σ_{n<m: n|m} p^n ≤ mN(m) + (m/2)p^{m/2},

where we have upper-bounded the number of terms in the sum by m/2 and upper-bounded each
term by p^{m/2}, since the largest divisor of m other than m itself is at most m/2. Thus

mN(m) ≥ p^m − (m/2)p^{m/2} = p^{m/2}(p^{m/2} − m/2).

The quantity p^{m/2} − m/2 is positive for p = 2, m = 2, and is increasing in both p and m. Thus
mN(m) is positive for all prime p and all m ≥ 2. Moreover, N(1) = p.
Since a finite field Fg(x) with pm elements can be constructed from any prime polynomial
g(x) ∈ Fp [x] of degree m, this implies:

Theorem 7.24 (Existence of finite fields) For every prime p and positive integer m, there
exists a finite field with pm elements.

Moreover, for each n that divides m, there exists a unique subfield G with p^n elements, namely
the roots of the polynomial x^{p^n} − x:

Theorem 7.25 (Existence of finite subfields) Every finite field with pm elements has a sub-
field with pn elements for each positive integer n that divides m.
In summary, the factorization of x^{p^m} − x into minimal polynomials partitions the elements of
F_{p^m} into cyclotomic cosets whose properties are determined by their minimal polynomials. The
roots of g(x) have multiplicative order k if g(x) divides x^k − 1 and does not divide x^j − 1 for
j < k. Moreover, the roots of g(x) are elements of the subfield with p^n elements if and only if
g(x) divides x^{p^n} − x, or equivalently if their order k divides p^n − 1.
100 CHAPTER 7. INTRODUCTION TO FINITE FIELDS

Example 3 (cont.) Over F2, the polynomial x^16 + x factors as follows:

x^16 + x = x(x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x + 1).

Moreover, x^3 + 1 = (x + 1)(x^2 + x + 1) and x^5 + 1 = (x + 1)(x^4 + x^3 + x^2 + x + 1). The primitive
elements are thus the roots of x^4 + x + 1 and x^4 + x^3 + 1. If we choose a root of x^4 + x + 1 as
α, then F16 = {0, 1, α, . . . , α^14} partitions into cyclotomic cosets as follows:

• One zero element (0), minimal polynomial x;

• One element of order 1 (1), minimal polynomial x + 1;

• Two elements of order 3 (α^5, α^10), minimal polynomial x^2 + x + 1;

• Four elements of order 5 (α^3, α^6, α^9, α^12), minimal polynomial x^4 + x^3 + x^2 + x + 1;

• Four elements of order 15 (α, α^2, α^4, α^8), minimal polynomial x^4 + x + 1;

• Four elements of order 15 (α^7, α^14, α^13, α^11), minimal polynomial x^4 + x^3 + 1.

F16 has a prime subfield F2 consisting of the elements whose minimal polynomials divide x^2 + x,
namely 0 and 1. It also has a subfield F4 consisting of the elements whose minimal polynomials
divide x^4 + x, namely {0, 1, α^5, α^10}. Alternatively, F4* consists of the three elements of F16*
whose multiplicative orders divide 3.
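This partition is easy to reproduce by direct computation. In the sketch below (our own helper names, not the text's), elements of F16 are 4-bit integers, multiplication is carry-less multiplication reduced modulo x^4 + x + 1, and α is represented by the integer 2:

```python
from collections import defaultdict

MOD = 0b10011          # x^4 + x + 1

def gmul(a, b):
    """Multiply in F16 = F2[x]/(x^4 + x + 1); elements are 4-bit ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def order(a):
    """Multiplicative order of a nonzero field element."""
    k, x = 1, a
    while x != 1:
        x, k = gmul(x, a), k + 1
    return k

alpha = 0b0010          # the class of x, a root of x^4 + x + 1
powers, x = [], 1
for i in range(15):     # the "log table": powers[i] = alpha^i
    powers.append(x)
    x = gmul(x, alpha)
assert x == 1           # alpha is primitive: its order is 15

by_order = defaultdict(list)
for i, a in enumerate(powers):
    by_order[order(a)].append(i)     # group exponents i by the order of alpha^i
print({k: v for k, v in sorted(by_order.items())})
# → {1: [0], 3: [5, 10], 5: [3, 6, 9, 12], 15: [1, 2, 4, 7, 8, 11, 13, 14]}
```

The exponents of order dividing 3 are exactly {0, 5, 10}, recovering F4* = {1, α^5, α^10} as above.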
Exercise 19 (construction of F32 ).
(a) Find the prime polynomials in F2 [x] of degree 5, and determine which have primitive roots.
(b) For some minimal polynomial g(x) with a primitive root α, construct a field Fg(x) with 32
elements. Give a table with the elements partitioned into cyclotomic cosets as above. Specify the
minimal polynomial and the multiplicative order of each nonzero element. Identify the subfields
of Fg(x) .
(c) Show how to do multiplication and division in Fg(x) using this “log table.” Discuss the
rules for multiplication and division in Fg(x) when one of the field elements involved is the zero
element 0 ∈ Fg(x) .
(d) [Optional] If you know something about maximum-length shift-register (MLSR) sequences,
show that there exists a correspondence between the “log table” given above and a certain MLSR
sequence of length 31.
11.3. GRAPH-THEORETIC PROPERTIES OF GRAPHICAL REALIZATIONS 159

Note that there are 4 “active” generators at each of the 12 state times, if we take the time axis
to be circular (“end-around”). On the other hand, if we were to assume a conventional time
axis, then at least 8 generators would have to be active at the central state time.
Note also that if we “unwrap” these generators onto an infinite conventional time axis, then we
get generators for a rate-1/2 16-state period-4 time-varying (or rate-4/8 16-state time-invariant)
binary linear convolutional code, as follows:
···
... 00 11 01 11 01 11 00 00 00 00 ...
... 00 00 11 11 10 01 11 00 00 00 ...
... 00 00 00 11 01 10 11 11 00 00 ...
... 00 00 00 00 11 01 11 01 11 00 ...
···
This “Golay convolutional code” has a minimum Hamming distance of 8 and an average of
Kb = 12.25 weight-8 codewords per information bit, so its nominal coding gain is γc = 4 (6 dB)
and its effective coding gain is γeff = 5.3 dB, which are remarkable for a 16-state rate-1/2 code.
In summary, by considering a state realization with a single cycle rather than a conventional
trellis realization, we may be able to obtain a state complexity as small as the square root of
the minimum state complexity of a conventional trellis.

11.3.5 Hadamard-transform-based realizations of RM codes

In Exercise 6 of Chapter 6, it was shown that all Reed-Muller codes RM(r, m) of length 2^m could
be generated by a single “universal” 2^m × 2^m generator matrix Um = (U1)^{⊗m}, the m-fold tensor
product of the 2 × 2 matrix

U1 = [ 1 0 ]
     [ 1 1 ]

with itself. The matrix Um is called the Hadamard transform matrix over F2. For any binary
2^m-tuple u ∈ (F2)^{2^m}, the binary 2^m-tuple y = uUm is called the Hadamard transform of u.
Since (Um)^2 = I_{2^m}, the identity matrix, it follows that the Hadamard transform of y is u;
i.e., u = yUm.
More particularly, RM(r, m) = {y = uUm}, where the coordinates of the binary 2^m-tuple u are
free in the k(r, m) positions corresponding to the k(r, m) rows of Um of weight 2^{m−r} or greater,
and fixed to 0 in the remaining coordinates. In other words, RM(r, m) is the set of Hadamard
transforms of all 2^{k(r,m)} binary 2^m-tuples that are all-zero in a certain 2^m − k(r, m) coordinates.
(Compare the Fourier transform characterization of Reed-Solomon codes in Chapter 8.)
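This characterization can be tested directly. The sketch below (illustrative Python, not from the text) builds U3 = (U1)^{⊗3}, checks the involution property, and generates the code from the rows of weight 2^{m−r} = 4 or greater, recovering the (8, 4, 4) code RM(1, 3):

```python
import itertools

def kron(A, B):
    """Kronecker (tensor) product of 0/1 matrices."""
    return [[a & b for a in ra for b in rb] for ra in A for rb in B]

U1 = [[1, 0],
      [1, 1]]
U3 = kron(kron(U1, U1), U1)          # the 8 x 8 Hadamard transform matrix over F2
n = len(U3)

# (Um)^2 = I over F2: the transform is its own inverse.
for i in range(n):
    for j in range(n):
        assert sum(U3[i][k] & U3[k][j] for k in range(n)) % 2 == (i == j)

# RM(1, 3): let u be free exactly on the rows of weight >= 2^(m-r) = 4.
free = [i for i, row in enumerate(U3) if sum(row) >= 4]
code = set()
for bits in itertools.product([0, 1], repeat=len(free)):
    u = [0] * n
    for i, b in zip(free, bits):
        u[i] = b
    code.add(tuple(sum(u[k] & U3[k][j] for k in range(n)) % 2
                   for j in range(n)))
print(len(code), min(sum(y) for y in code if any(y)))   # → 16 4
```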
We can construct a graphical realization of a Hadamard transform as follows. The 2 × 2
Hadamard transform y = uU1 is explicitly given by the two equations
y0 = u0 + u1 ;
y1 = u 1 ,
which are realized by the normal graph of Figure 8. (This is sometimes called a controlled-not
gate, where y1 = u1 is regarded as a control variable.)
Figure 8. Normal graph of a 2 × 2 Hadamard transform.


160 CHAPTER 11. CODES ON GRAPHS

Note that there are no arrows (directed edges) in this behavioral realization. Either u or y
may be taken as input, and correspondingly y or u as output; i.e., the graph is a realization of
either the Hadamard transform y = uU1 or the inverse Hadamard transform u = yU1 .
A 2^m × 2^m Hadamard transform y = uUm may then be realized by connecting these 2 × 2
transforms in tensor product fashion. For example, the 8 × 8 Hadamard transform is given
explicitly by the eight equations

y0 = u0 + u1 + u2 + u3 + u4 + u5 + u6 + u7;
y1 = u1 + u3 + u5 + u7;
y2 = u2 + u3 + u6 + u7;
y3 = u3 + u7;
y4 = u4 + u5 + u6 + u7;
y5 = u5 + u7;
y6 = u6 + u7;
y7 = u7.

These equations are realized by the tensor product graph of Figure 9. (Compare the “butterflies”
in the graph of an 8 × 8 fast Fourier transform.)

Figure 9. Normal graph of an 8 × 8 Hadamard transform.
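The butterfly structure can be read as computing y = uUm in m stages of 2 × 2 controlled-not operations, one stage per tensor factor. A sketch (Python; the in-place indexing convention, one butterfly per index pair differing in a single bit, is ours):

```python
import itertools

def hadamard_f2(u):
    """Compute y = u * Um over F2 for len(u) = 2^m, using m stages of
    2x2 controlled-not butterflies: (a, b) -> (a + b, b)."""
    y = list(u)
    n, bit = len(y), 1
    while bit < n:
        for i in range(n):
            if not i & bit:
                y[i] ^= y[i | bit]
        bit <<= 1
    return y

# Check the butterfly stages against the eight explicit equations above.
for u in itertools.product([0, 1], repeat=8):
    y = hadamard_f2(u)
    assert y[0] == sum(u) % 2                 # y0 = u0 + ... + u7
    assert y[3] == (u[3] + u[7]) % 2          # y3 = u3 + u7
    assert y[6] == (u[6] + u[7]) % 2          # y6 = u6 + u7
    assert y[7] == u[7]                       # y7 = u7
    assert hadamard_f2(y) == list(u)          # involution: (Um)^2 = I
print("butterfly stages reproduce the 8 x 8 Hadamard transform")
```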

A Reed-Muller code of length 8 may then be realized by fixing certain of the uk to zero while
letting the others range freely. For example, the (8, 4, 4) code is obtained by fixing u0 = u1 =
u2 = u4 = 0, which yields the equations

y0 = u3 + u5 + u6 + u7;
y1 = u3 + u5 + u7;
y2 = u3 + u6 + u7;
y3 = u3 + u7;
y4 = u5 + u6 + u7;
y5 = u5 + u7;
y6 = u6 + u7;
y7 = u7.

These equations are realized by the graph of Figure 10(a), which may be simplified to that
of Figure 10(b). Here we regard the “inputs” uj as internal variables, and the “outputs” yk as
external variables.

Figure 10. (a) Normal graph of (8, 4, 4) RM code. (b) Equivalent realization.

In Figure 10(b), all state variables are binary and all constraint codes are simple (3, 2, 2) parity-
check constraints or (3, 1, 3) repetition constraints. It is believed (but not proved) that this
realization is the most efficient possible realization for the (8, 4, 4) code in this sense. However,
Figure 10(b) has cycles.
It is easy to see how the cycle-free graph of Figures 7(a) (as well as 7(b), or a minimal four-
section, four-state trellis) may be obtained by agglomerating subgraphs of Figure 10(b). Such
a graph is depicted in Figure 11. The code symbols are partitioned into four 2-tuples. A state
space of dimension 2 connects the two halves of a codeword (meeting the cut-set bound). Two
constraint codes of length 6 and dimension 3 determine the possible combinations of symbol
4-tuples and state 2-tuples in each half of the code.

Figure 11. Tree-structured realization of (8, 4, 4) RM code.

Similarly, we may realize any Reed-Muller code RM(r, m) in any of these styles. By starting
with a Hadamard transform realization as in Figure 10(a) and reducing it as in Figure 10(b), we
can obtain a realization in which all state variables are binary and all constraint codes are simple
(3, 2, 2) parity-check constraints or (3, 1, 3) repetition constraints; however, such a realization will
generally have cycles. By agglomerating variables, we can obtain a tree-structured, cycle-free
realization as in Figure 11 which reflects the |u|u + v| iterative RM code construction.
Exercise 1. (Realizations of repetition and SPC codes)
Show that a reduced Hadamard transform realization of a repetition code RM(0, m) or a
single-parity-check code RM(m−1, m) is a cycle-free tree-structured realization with a minimum
number of (3, 1, 3) repetition constraints or (3, 2, 2) parity-check constraints, respectively, and
furthermore with minimum diameter (distance between any two code symbols in the tree).
Show that these two realizations are duals; i.e., one is obtained from the other via interchange
of (3, 2, 2) constraints and (3, 1, 3) constraints.
Exercise 2. (Dual realizations of RM codes)
Show that in general a Hadamard transform (HT) realization of any Reed-Muller code
RM(r, m) is the dual of the HT realization of the dual code RM(m − r − 1, m); i.e., one is
obtained from the other via interchange of (3, 2, 2) constraints and (3, 1, 3) constraints.
Exercise 3. (General tree-structured realizations of RM codes)
Show that there exists a tree-structured realization of RM(r, m) of the following form:

(In the figure, four leaf constraint codes C2, each attached to 2^{m−2} code symbols, are joined
through state spaces of dimension s(r, m) to two central constraint codes C1.)

Figure 12. Tree-structured realization of RM(r, m).

Show that s(r, m) = dim RM(r, m − 1) − dim RM(r − 1, m − 1) (see Exercise 1 of Chapter 10).
Show that the cut-set bound is met everywhere. Finally, show that
dim C2 = dim RM(r, m − 2);
dim C1 = dim RM(r, m − 1) − 2 dim RM(r − 2, m − 2) = t(r, m),
where t(r, m) is the branch complexity of RM(r, m) (compare Table 1 of Chapter 6). For example,
there exists a tree-structured realization of the (32, 16, 8) RM code as follows:

(Four (14, 7) leaf constraints, each attached to 8 code symbols, are joined through state spaces
of dimension 6 to two central (18, 9) constraints.)

Figure 13. Tree-structured realization of the (32, 16, 8) RM code.



11.4 Appendix. Classes of graphical realizations


There are various classes of graphical realizations that can be used for general linear behavioral
realizations. Here we will briefly discuss factor graphs, Markov graphs, and block diagrams.

11.4.1 Factor graphs

A factor graph represents a global function of a set of variables (both internal and external) that
factors into a product of local functions defined on subsets of the variables.
The indicator function ΦB(y, s) of a behavior B is a {0, 1}-valued function of external variables
y and internal variables s that equals 1 for valid trajectories (y, s) and equals 0 otherwise. If a
trajectory (y, s) is valid whenever its components lie in a set of local constraint codes {Ck , k ∈ K},
then the global indicator function ΦB is the product of local indicator functions {ΦCk , k ∈ K}.
Thus a behavioral realization may be represented by a factor graph.
A Tanner-type factor graph is an undirected bipartite graph in which variables are represented
by one type of vertex (with internal and external variables denoted differently), and functions
are represented by a different type of vertex. A Tanner graph of a behavioral realization may
be interpreted as a Tanner-type factor graph simply by regarding the constraint vertices as
representatives of constraint indicator functions. Similarly, a normal (Forney-type) factor graph
is an undirected graph in which internal variables are represented by edges, external variables
are represented by dongles, and functions are represented by vertices; in the same way a normal
graph of a behavioral realization may be interpreted as a normal factor graph.
In the following chapters, we will be interested in global probability functions that factor into
a product of local probability functions; then factor graphs become very useful.

11.4.2 Markov graphs

Markov graphs are often used in statistical physics and statistical inference to represent global
probability distributions that factor into a product of local distributions.
A Markov graph (Markov random field) is an undirected graph in which variables are repre-
sented by vertices, and a constraint or function is represented by an edge (if it has degree 2), or
by a hyperedge (if it has degree greater than 2). Moreover, a hyperedge is usually represented by
a clique, i.e., a set of ordinary edges between every pair of variables incident on the hyperedge.
(This style of graph representation sometimes generates inadvertent cliques.)
Markov graphs are particularly nice when the degrees of all constraints are 2 or less. Such a
representation is called a pairwise Markov graph. We may then represent constraints by ordinary
edges. Pairwise constraints often arise naturally in physical models.
Figure 14 shows how any Tanner graph (or Tanner-type factor graph) may be transformed
into a pairwise Markov realization by a simple conversion. Here each constraint code has been
replaced by a state “supervariable” whose alphabet is the set of all codewords in the constraint
code. Each edge then represents the constraint that the associated ordinary variable must be
equal to the corresponding component of the supervariable.
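The conversion can be illustrated on a toy example (the graph and all names below are ours, not from the text): two (3, 2, 2) parity constraints sharing one bit are replaced by supervariables ranging over their codewords, with pairwise equality constraints on the edges:

```python
from itertools import product

# Toy Tanner graph: bits x0..x4 and two (3, 2, 2) parity constraints,
# C1 on (x0, x1, x2) and C2 on (x2, x3, x4).
spc = [w for w in product([0, 1], repeat=3) if sum(w) % 2 == 0]
constraints = [((0, 1, 2), spc), ((2, 3, 4), spc)]

# Pairwise Markov model: one supervariable per constraint, ranging over
# the 4 codewords of the (3, 2, 2) code; each edge is the pairwise
# constraint "bit x_v equals the matching component of the supervariable".
valid = set()
for x in product([0, 1], repeat=5):
    for s1 in constraints[0][1]:
        for s2 in constraints[1][1]:
            edges_ok = (all(x[v] == s1[k] for k, v in enumerate(constraints[0][0]))
                        and all(x[v] == s2[k] for k, v in enumerate(constraints[1][0])))
            if edges_ok:
                valid.add(x)

# The pairwise model realizes exactly the behavior of the original Tanner graph.
direct = {x for x in product([0, 1], repeat=5)
          if sum(x[0:3]) % 2 == 0 and sum(x[2:5]) % 2 == 0}
assert valid == direct and len(valid) == 8
print(len(valid))   # → 8
```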
56 CHAPTER 4. HAMMING CODES

For a Hamming code, the covering radius is 1. Indeed, for any perfect e-error-correcting code,
the covering radius is e.

(4.2.1) Proposition. Let C be an e-error-correcting code. Then cr(C) ≥ e, with equality if and
only if C is a perfect e-error-correcting code.

Proof. As C is an e-error-correcting code, the spheres of radius e around
codewords are pairwise disjoint. Therefore the spheres of radius e − 1 around
codewords do not cover the whole space. Thus cr(C) > e − 1, whence cr(C) ≥ e.
If we have equality, then we must have equality in the Sphere Packing Bound
2.2.6, hence the code is perfect. 2
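For small codes the covering radius can be computed by brute force. A sketch (Python; the construction of the [7, 4] Hamming code from its three parity checks is standard, the helper names are ours):

```python
from itertools import product

def covering_radius(code, n):
    """cr(C): the largest distance from any word of F2^n to the code."""
    return max(min(sum(a != b for a, b in zip(w, c)) for c in code)
               for w in product([0, 1], repeat=n))

# The [7, 4] Hamming code: position j (1-indexed) carries the binary
# representation of j, and c is a codeword iff all three checks are even.
checks = [[j for j in range(1, 8) if (j >> k) & 1] for k in range(3)]
code = [c for c in product([0, 1], repeat=7)
        if all(sum(c[j - 1] for j in chk) % 2 == 0 for chk in checks)]

assert len(code) == 16
print(covering_radius(code, 7))   # → 1: cr(C) = e = 1, as for any perfect code
```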

(4.2.2) Proposition. The covering radius of the linear code C is equal to the maximum weight
of a coset leader.

Proof. The coset of the word −x consists of the sum of −x with each
individual codeword of C, so the weights of the coset members give the distances
of x from the various codewords. The minimal such weight is thus the distance
of x from the code and also the weight of a coset leader. The maximum weight
of a coset leader is therefore the largest distance of any word x from the code.
2

As with dmin , the covering radius of a code is, in general, difficult to compute.
The following problem, reminiscent of Problem 4.1.5, can be of great help.

(4.2.3) Problem. Let the [n1, k1] linear code C1 over F have generator matrix G1,
and let the [n2, k2] linear code C2 over F have generator matrix G2. Consider the
[n1 + n2, k1 + k2] linear code C over F with generator matrix

G = [ 0   G1 ]
    [ G2  ∗  ],

where the upper left 0 is a k1 × n2 matrix of 0’s and the lower right ∗ is an arbitrary
k2 × n1 matrix with entries from F.
Prove that cr(C) ≤ cr(C1) + cr(C2).

4.3 First order Reed-Muller codes


In 1954, I.S. Reed and D.E. Muller introduced independently a class of binary
codes of length 2^m, for any integer m, associated with Boolean logic. The first
of these codes, for each m, fits naturally into the context of Hamming codes.
A code dual to a binary extended Hamming code is called a first order Reed-
Muller code, denoted RM(1, m), where m is the redundancy of the associated
Hamming code. Any code that is equivalent to a first order Reed-Muller code is
also first order Reed-Muller. We shall concentrate on the specific code RM(1, m)
with generator matrix XLm.
The associated dual Hamming code is sometimes called a shortened first
order Reed-Muller code or a simplex code. The dual Hamming code can be

easily recovered from RM(1, m). Indeed by first choosing all the codewords of
RM(1, m) that have a 0 in their first coordinate position and then deleting this
now useless coordinate, we find the dual Hamming code. This is clear when we
consider how the matrix XLm was constructed by bordering the matrix Lm , the
generator matrix of the dual lexicographic Hamming code. (See page 51.)
Having earlier constructed the generator XLm as a matrix in bordered block
form, we now examine it again, but blocked in a different manner. Notice that
RM(1, 1) = F_2^2, RM(1, 2) is the parity check code of length 4, and RM(1, 3) is a
self-dual extended [8, 4] Hamming code.
Examples.

XL1 = [ 0 1 ]
      [ 1 1 ]

XL2 = [ 0 0 1 1 ]
      [ 0 1 0 1 ]
      [ 1 1 1 1 ]

XL3 = [ 0 0 0 0 1 1 1 1 ]
      [ 0 0 1 1 0 0 1 1 ]
      [ 0 1 0 1 0 1 0 1 ]
      [ 1 1 1 1 1 1 1 1 ]

XL4 = [ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]
      [ 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 ]
      [ 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ]
      [ 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ]
      [ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
For i between 2^{m−1} and 2^m, the column m-tuple containing the binary rep-
resentation of i is just that for i − 2^{m−1} with its leading 0 replaced by a 1.
Therefore, if we ignore the top row of XLm, then the remaining m rows consist
of an m × 2^{m−1} matrix repeated twice. Indeed this repeated matrix is nothing
other than XLm−1. We now observe the recursive construction:

XLm = [ 0 ··· 0     1 ··· 1 ]
      [ XLm−1       XLm−1   ]
(4.3.1) Theorem. For each m, the first order Reed-Muller code RM(1, m) is
a binary linear [2^m, m + 1, 2^{m−1}] code.

Proof. Certainly RM(1, m) is linear of length 2^m, and its dimension m + 1
is evident from the generator matrix XLm. From their generator matrices XL1
and XL2, it is easy to see that RM(1, 1) (= F_2^2) and RM(1, 2) (the parity check
code of length 4) both have minimum distance 2^{m−1}.
We now verify the minimum distance in general by induction, assuming that
we already have dmin(RM(1, m − 1)) = 2^{m−2}. Let C1 be RM(1, m − 1) with
minimum distance d1 = 2^{m−2}, and let C2 be the repetition code of length
2^{m−1}, whose minimum distance is therefore d2 = 2^{m−1}. The generator matrix
XLm for RM(1, m) is then constructed from the generators G1 = XLm−1 and
G2 = [1 · · · 1] according to the recipe of Problem 4.1.5. Therefore, by that
problem, we have

dmin(RM(1, m)) = min(2d1, d2) = min(2 · 2^{m−2}, 2^{m−1}) = 2^{m−1},

as claimed. 2
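The recursive construction and Theorem 4.3.1 can be verified computationally for small m. A sketch (Python; the function names are ours):

```python
from itertools import product

def XL(m):
    """Generator matrix XLm of RM(1, m), via the bordered recursion."""
    if m == 1:
        return [[0, 1], [1, 1]]
    top = [0] * 2 ** (m - 1) + [1] * 2 ** (m - 1)
    return [top] + [row + row for row in XL(m - 1)]

for m in range(1, 5):
    G = XL(m)
    # enumerate all F2-linear combinations of the rows of G
    words = {tuple(sum(a * g for a, g in zip(coeffs, col)) % 2
                   for col in zip(*G))
             for coeffs in product([0, 1], repeat=len(G))}
    assert len(words) == 2 ** (m + 1)                          # dimension m + 1
    assert min(sum(w) for w in words if any(w)) == 2 ** (m - 1)
print("RM(1, m) is a [2^m, m + 1, 2^(m-1)] code for m = 1, ..., 4")
```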

(4.3.2) Theorem. The first order Reed-Muller code RM(1, m) consists of a
unique word of weight 0, namely 0, a unique word of weight 2^m, namely 1, and
2^{m+1} − 2 words of weight 2^{m−1}.

Proof. The last row of the generator matrix XLm is 1; so 0 and 1 are
the unique codewords of weight 0 and 2^m, respectively. By Theorem 4.3.1 the
linear code RM(1, m) has no codewords c of weight between 0 and 2^{m−1}, and
so it also has no codewords 1 + c of weight between 0 and 2^{m−1}. That is, it has
no codewords of weight between 2^{m−1} and 2^m. Therefore all codewords other
than 0 and 1 have weight exactly 2^{m−1}. 2

(4.3.3) Corollary. The dual of the binary Hamming code of redundancy
m consists of a unique word of weight 0, namely 0, and 2^m − 1 words of weight
2^{m−1}.

Proof. In recovering the dual Hamming code from RM(1, m), we shorten
the code by taking all codewords that begin with 0 and then delete that position.
In particular the codeword 1 of RM(1, m) does not survive. But by Theorem
4.3.2 all other nonzero codewords of RM(1, m) have weight 2^{m−1}. As only zeros
are deleted, all the nonzero codewords of the dual Hamming code also will have
weight 2^{m−1}. 2

These dual Hamming codes are equidistant codes, in that distinct codewords
are at a fixed distance from each other, here 2^{m−1}. They satisfy the Plotkin
bound 2.3.8 with equality. (The proof of the Plotkin bound as given in Prob-
lem 3.1.5 compares the minimum distance with the average distance between
codewords. For an equidistant code these are the same.)
For a binary word x ∈ F_2^n, consider the corresponding word x* ∈ {+1, −1}^n
gotten by replacing each 0 by the real number +1 and each 1 by −1.

(4.3.4) Lemma. If x, y ∈ F_2^n, then as vectors of real numbers x* · y* =
n − 2dH(x, y). In particular, if x, y ∈ F_2^{2h} with dH(x, y) = h, then x* · y* = 0.

Proof. The dot product of two ±1 vectors is the number of places in which
they are the same minus the number of places where they are different. Here
that is (n − dH (x, y)) − dH (x, y). 2

Let RM(1, m)± be the code got by replacing each codeword c of RM(1, m)
with its ±1 version c*. List the codewords of RM(1, m)± as c*_1, c*_2, . . . , c*_{2^{m+1}}.

(4.3.5) Lemma. If c* ∈ RM(1, m)± then also −c* ∈ RM(1, m)±. We have

c*_i · c*_j = 2^m    if c*_i = c*_j;
            = −2^m   if c*_i = −c*_j;
            = 0      if c*_i ≠ ±c*_j.

Proof. As 1 ∈ RM(1, m) we have (1 + c)* = −c* ∈ RM(1, m)±. By
Theorem 4.3.2, if distinct b, c ∈ RM(1, m) with b ≠ 1 + c, then dH(b, c) = 2^{m−1}.
The lemma follows from Lemma 4.3.4. 2

We use this lemma as the basis of a decoding algorithm. When a vector r
is received, calculate each of the dot products r · c*_i, for i = 1, . . . , 2^{m+1}. Then
decode to that codeword c*_j that maximizes the dot product.
In fact this can be done a little more efficiently. Arrange our listing of
RM(1, m)± so that c*_{i+2^m} = −c*_i, for each i = 1, . . . , 2^m. That is, the second half
of the list is just the negative of the first half. In decoding, we calculate only
those dot products from the first half, r · c*_i for i = 1, . . . , 2^m, and select that j
that maximizes the absolute value |r · c*_j|. The received word r is then decoded
to c*_j if r · c*_j is positive and to −c*_j if r · c*_j is negative.
We organize this as Hadamard transform decoding. Set n = 2^m, and let Hn
be the n × n matrix whose ith row is the codeword c*_i. The dot products of
Lemma 4.3.5 then give

Hn Hn^T = n In×n,

since the negative of a row of Hn is never a row. Upon receiving the vector r,
we calculate its Hadamard transform r̂ = Hn r^T. If r̂_j is that coordinate of r̂
that has the largest absolute value, then we decode r to c*_j in case r̂_j > 0 or
to −c*_j in case r̂_j < 0.
An important aspect of Hadamard transform decoding is that it is a soft
decision algorithm rather than a hard decision algorithm. We need not require
that the received vector r have entries ±1. Its entries can be arbitrary real
numbers, and the algorithm still works without modification.
Example. Consider the code RM(1, 3)± which comes from RM(1, 3) of
length n = 2^3 = 8 with generator matrix XL3. Let

H8 = [ +1 +1 +1 +1 +1 +1 +1 +1 ]
     [ +1 −1 +1 −1 +1 −1 +1 −1 ]
     [ +1 +1 −1 −1 +1 +1 −1 −1 ]
     [ +1 −1 −1 +1 +1 −1 −1 +1 ]
     [ +1 +1 +1 +1 −1 −1 −1 −1 ]
     [ +1 −1 +1 −1 −1 +1 −1 +1 ]
     [ +1 +1 −1 −1 −1 −1 +1 +1 ]
     [ +1 −1 −1 +1 −1 +1 +1 −1 ],

so H8 H8^T = 8 I8×8. The codewords of RM(1, 3)± are then the rows of H8
and their negatives.
Suppose we receive the vector r = (1, 1, −1, 1, −1, −1, 1, 1). This has
Hadamard transform H8 r^T = (2, −2, −2, 2, 2, −2, 6, 2). The entry with
largest absolute value is r̂_7 = 6 > 0, so we decode to

c*_7 = (+1, +1, −1, −1, −1, −1, +1, +1).

If next we receive r = (−.7, 1, 0, −.8, −.9, 1, .9, −1), then

H8 r^T = (−.5, −.9, 1.3, −6.3, −.5, −.9, .9, 1.3).

The entry with largest absolute value is r̂_4 = −6.3 < 0, so we decode to

−c*_4 = (−1, +1, +1, −1, −1, +1, +1, −1).
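Hadamard transform decoding, including both received words above, can be sketched in a few lines (Python; H8 is rebuilt here by the standard Sylvester doubling H_{2n} = [[H, H], [H, −H]], which reproduces the matrix of the example):

```python
def hadamard(m):
    """H_{2^m} by Sylvester doubling: H_{2n} = [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(m):
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

def ht_decode(H, r):
    """Hadamard transform decoding: compute rhat = H r^T, find the entry
    of largest absolute value, and return the correspondingly signed row."""
    rhat = [sum(h * x for h, x in zip(row, r)) for row in H]
    j = max(range(len(rhat)), key=lambda i: abs(rhat[i]))
    sign = 1 if rhat[j] > 0 else -1
    return [sign * h for h in H[j]]

H8 = hadamard(3)
# The hard-decision received word decodes to c*_7:
assert ht_decode(H8, [1, 1, -1, 1, -1, -1, 1, 1]) == \
       [1, 1, -1, -1, -1, -1, 1, 1]
# The soft-decision received word decodes to -c*_4, with no change to the code:
assert ht_decode(H8, [-.7, 1, 0, -.8, -.9, 1, .9, -1]) == \
       [-1, 1, 1, -1, -1, 1, 1, -1]
print("both received words of the example decode correctly")
```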

(4.3.6) Problem. Assume that you are using the code RM(1, 3)± of the example.
Use Hadamard transform decoding to decode the received word

(.5, .4, −.6, .5, .6, −.3, .5, −.6).

For any positive integer n, a ±1 square matrix H of side n that satisfies

H H^T = n In×n

is called a Hadamard matrix. If we take as a code the rows of H and their
negatives, then Hadamard transform decoding will work exactly as described
above. Such a code (or its {0, 1} counterpart) is called a Hadamard code.

(4.3.7) Problem. Prove that a Hadamard matrix of side n must have n = 1, 2
or n a multiple of 4. (Remark. It is a long-standing conjecture of combinatorial
design theory that the converse of this problem is true: for each such n, there exists a
Hadamard matrix.)

Begin with a Hadamard code of side n. Choose those n codewords that start
with +1, drop that position, and translate back to a {0, 1} code. The result is
a binary code of length n − 1 and size n which is equidistant of distance n/2. A
code constructed in this fashion is a shortened Hadamard code. Starting with
the matrix H8 of the example above, we recover from RM(1, 3) and RM(1, 3)±
the [7, 3] dual Hamming code.

(4.3.8) Problem. Let h be a positive integer. Let C be a binary equidistant code of
length 2h − 1 and size 2h with distance h.
(a) Prove that C is a shortened Hadamard code.
(b) Prove that C meets the Plotkin bound 2.3.8 with equality.

Although any Hadamard matrix can be used to design a code that allows
Hadamard transform decoding, there are certain advantages to be gained from
using those matrices that come from Reed-Muller codes as described. The
existence of a soft decision algorithm is good, but we hope to implement it as
efficiently as possible. Consider decoding using the matrix H8 of the example.
Each decoding process requires 63 operations, 56 additions for calculating the
8 dot products and 7 comparisons among the answers. (By convention the
operation of negation is considered to make no contribution.) Certain annoying
repetitions are involved. For instance, both the second and the sixth rows of
H8 begin +1, −1, +1, −1; so the corresponding calculation r1 − r2 + r3 − r4 is
made twice during the decoding process. Can this and other patterns within
H8 be exploited? The answer is “yes,” and it is this fact that makes a matrix
derived from RM(1, m) a better choice than other Hadamard matrices with the
same dimensions.

Let H1 = [1], a 1 × 1 Hadamard matrix, and define recursively a 2^{m+1} × 2^{m+1}
matrix in block form

H_{2^{m+1}} = [ +H_{2^m}  +H_{2^m} ]
              [ +H_{2^m}  −H_{2^m} ].

Then

H2 = [ +1 +1 ]
     [ +1 −1 ],

and

H4 = [ +1 +1 +1 +1 ]
     [ +1 −1 +1 −1 ]
     [ +1 +1 −1 −1 ]
     [ +1 −1 −1 +1 ].
The matrix H8 is that of the example. This construction can be continued,
for all m. The matrix H_{2^m} produced is a Hadamard matrix associated with
RM(1, m)± and the Reed-Muller code RM(1, m) whose generator matrix is XLm.
The recursive construction of H_{2^m} is related to that of XLm and admits a
streamlined implementation of decoding for RM(1, m) and RM(1, m)±, using
the so-called Fast Hadamard Transform or FHT algorithm. For instance, FHT
decoding of RM(1, 3)± can be achieved with 31 operations rather than the 63
counted previously.
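The 31-operation count can be checked with an in-place fast Hadamard transform (a standard algorithm; the sketch below is ours, not the book's implementation): three butterfly stages on 8 entries use 24 additions and subtractions, and the final search adds 7 comparisons.

```python
def fht(r):
    """In-place fast Hadamard transform: computes H_n r^T for n = 2^m
    in m stages of butterflies (a, b) -> (a + b, a - b)."""
    y = list(r)
    n, bit, ops = len(y), 1, 0
    while bit < n:
        for i in range(n):
            if not i & bit:
                y[i], y[i | bit] = y[i] + y[i | bit], y[i] - y[i | bit]
                ops += 2                  # one addition, one subtraction
        bit <<= 1
    return y, ops

y, ops = fht([1, 1, -1, 1, -1, -1, 1, 1])
assert ops == 24          # 3 stages x 8 operations; plus 7 comparisons = 31
assert max(y, key=abs) == 6               # again the seventh coordinate wins
print(ops + 7)   # → 31
```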
The Reed-Muller codes in general, and the code RM(1, 3) in particular, are
important codes that often arise as constituents of larger codes. It is therefore
worthwhile to have decoding algorithms that are as efficient as possible. Sun
and Van Tilborg have given a soft decision algorithm for RM(1, 3) that is related
to FHT decoding but only needs, on the average, 14 operations with worst-case
performance of 17 operations.
POLYNOMIAL CODES AND FINITE
GEOMETRIES∗
E. F. Assmus, Jr and J. D. Key

Contents

1 Introduction

2 Projective and affine geometries
2.1 Projective geometry
2.2 Affine geometry
2.3 Designs from geometries
2.4 Codes from designs

3 The Reed-Muller codes
3.1 Definitions
3.2 Geometries and Reed-Muller codes
3.3 Decoding

4 The group-algebra approach
4.1 Elementary results and Berman’s theorem
4.2 Isometries of the group algebra
4.3 Translation-invariant extended cyclic codes
4.4 The generator polynomials of punctured Reed-Muller codes and their p-ary analogues
4.5 Orthogonals and annihilators
4.6 The codes of the designs from AGm(Fp)

The authors wish to thank Paul Camion, Pascale Charpin and Projet Codes at INRIA
for the hospitality and support shown during the preparation of this manuscript. In
particular, the first author spent much of 1992-1993 at Projet Codes where the bulk of
his work on the chapter was completed.


5 Generalized Reed-Muller codes
5.1 Introduction
5.2 Definitions
5.3 The single-variable approach
5.4 Roots, dimensions and minimum weights
5.5 Codes invariant under the full affine group
5.6 The geometric codes
5.7 The codes of the designs from P Gm(Fp)
5.8 The subfield subcodes
5.9 Formulas for p-ranks

1 Introduction
The reader familiar with “Designs and their Codes” will soon understand the
debt this chapter owes to that book — especially its Chapter 5. We have,
however, entirely reworked that material and, more importantly, added a
discussion of the group-algebra approach to the Reed-Muller and generalized
Reed-Muller codes. This enables us to include a straightforward new proof
of Berman’s theorem identifying the Reed-Muller codes with the radical
powers in the appropriate modular group algebra and to use our treatment
of the Mattson-Solomon polynomial to give a proof of the generalization
of Berman’s theorem to the p-ary case. We have also included Charpin’s
treatment [16] of the characterization of “affine-invariant” extended cyclic
codes due to Kasami, Lin and Peterson.
We have relied heavily on Charpin’s doctoral thesis [14, 16] for the new
material. The older material relies (as did Chapter 5 of our book) on the
treatment of the polynomial codes introduced by Kasami, Lin and Peterson
[29] given by Delsarte, Goethals and MacWilliams [18].
Our definition of the generalized Reed-Muller codes is the straightfor-
ward generalization of the boolean-function definition of the Reed-Muller
codes and, for us, the cyclicity of the punctured variants is simply a conse-
quence of the easily seen fact that their automorphism groups contain the
general linear groups.
We are, of course, principally interested in the geometric nature of certain
of these codes. Were one interested only in the binary case the development
would be very short and our treatment reflects that fact in that we first dis-
cuss the Reed-Muller codes giving complete proofs that differ substantially
from those given for the general case. In fact, we have here an instance in

which the generalization to an arbitrary finite field seems far from trivial,
the biggest hurdle being the passage to fields that are not of prime order.
The peculiar nature of the definitions of the geometric codes in the
coding-theory literature was due to the interest — at the time of their in-
troduction — in majority-logic decoding of these codes; we therefore also
give a short discussion of decoding. On the other hand, we give the natu-
ral definitions of the geometric codes (as codes generated by the incidence
vectors of the geometric objects at hand) and, hence, our definitions are not
the ones found in many engineering texts.
We review the necessary geometry briefly before beginning our discussion
of the codes; our treatment is undoubtedly too brief to be useful to a reader
with no background whatsoever in finite geometry and such a reader may
wish to jump directly to Section 3 — which may even motivate a study of
the geometry involved. Much of the material will be understandable even
without a firm grip on the geometry and subsequent sections should be of
interest to professional coding theorists. We have, at least, tried to make
them so.
We assume a knowledge of coding theory and we believe the reader will
find in Chapter 1 the coding theory necessary for a study of this chapter.
We have not attempted to discuss open problems or to explore new
avenues of research. The reader interested in such matters may wish to
consult our book [2] or the articles cited in the bibliography.

2 Projective and affine geometries


Let F be a field and V a vector space over F . We denote by P G(V ) the
projective geometry of V . Its elements are the subspaces of V and its
structure is given by set-theoretic inclusion. Similarly, AG(V ) denotes the
affine geometry of V . Its elements are the cosets, x + U , of subspaces U
of V , where x is any vector in V , and again the structure is given by set-
theoretic inclusion. The “geometry” of these structures arises by viewing
inclusion as an incidence relation.

2.1 Projective geometry


If the vector space V has dimension n + 1 over F , then P G(V ) has projec-
tive dimension n. We record this with the notation P Gn (F ), realizing V
as F n+1 . In this case a “point” of the geometry is given in homogeneous co-
ordinates by (x0 , x1 , . . . , xn ) where all xi are in F and are not all zero; each

point has many such coordinate representations[1], in fact q − 1 when F is Fq ,


since (x0 , x1 , . . . , xn ) and (λx0 , λx1 , . . . , λxn ) yield the same 1-dimensional
subspace of F n+1 for any non-zero λ, the 1-dimensional subspaces being the
points — or objects of projective dimension 0. Similarly, the projective
dimension of any subspace is defined to be 1 less than the dimension of
the subspace (as a vector space over F ).
Thus the points of P G(V ) are the 1-dimensional subspaces of V , the
lines are the 2-dimensional subspaces of V , the planes the 3-dimensional
subspaces of V , and the hyperplanes the n-dimensional subspaces of V .
Neither {0} nor V plays a significant role in projective geometry and they
are usually ignored. Frequently when working with projective geometry the
projective dimension is referred to simply as the dimension. The dimension
formula for subspaces of V holds for projective dimension as well, provided
it is written as follows:
$$\dim(U) + \dim(W) = \dim(U + W) + \dim(U \cap W),$$
where U and W are arbitrary non-zero subspaces of V and $U + W = \langle U \cup W \rangle = \{u + w \mid u \in U,\ w \in W\}$. Note that we use $\langle S \rangle$ to denote the
subspace generated by the set S. The formula has the following important
consequence:
Suppose H is a hyperplane of P Gn (F ). If U is a subspace of dimension
t > 0, then U ∩ H has dimension t or t − 1, the former if and only if U is
contained in H.
If P and Q are distinct points of P G(V ), then P + Q is necessarily a
line of P G(V ), again by the above formula, and, in fact, it is the unique
line through P and Q. Thus every two distinct points lie on a unique line.
In projective dimension 2, i.e. in a projective plane, every two distinct lines
intersect in a unique point. We will, as here, use geometric terminology
whenever convenient.
If F = Fq and V is m-dimensional, one can see by counting bases that
the number of subspaces of V of dimension k, where 0 < k ≤ m, is
$$\frac{(q^m - 1)(q^m - q) \cdots (q^m - q^{k-1})}{(q^k - 1)(q^k - q) \cdots (q^k - q^{k-1})}.$$
Similarly — or by using the above formula on a quotient space — if V
is of dimension m, U a subspace of dimension r, and k an integer with
0 ≤ r < k ≤ m, then the number of subspaces of V of dimension k that
contain U is
$$\frac{(q^m - q^r)(q^m - q^{r+1}) \cdots (q^m - q^{k-1})}{(q^k - q^r)(q^k - q^{r+1}) \cdots (q^k - q^{k-1})}.$$

[1] Except in the binary case; it is this uniqueness that makes the Reed-Muller codes so much easier to analyze than the generalized Reed-Muller codes.
In particular, the number of points of P Gn (Fq ) is $\frac{q^{n+1}-1}{q-1} = q^n + q^{n-1} + \cdots + 1$
and the number of lines in the pencil of lines containing a point is
$\frac{q^{n+1}-q}{q^2-q} = \frac{q^n-1}{q-1} = q^{n-1} + \cdots + 1$.
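These counting formulas are easy to check numerically. The following sketch evaluates the subspace-count formula directly (the function name num_subspaces is ours, not notation from the text) and recovers the point and line counts just given:

```python
from math import prod

def num_subspaces(m, k, q):
    """Number of k-dimensional subspaces of an m-dimensional vector
    space over F_q, via the counting-bases formula in the text."""
    if k < 0 or k > m:
        return 0
    numerator = prod(q**m - q**i for i in range(k))
    denominator = prod(q**k - q**i for i in range(k))
    return numerator // denominator

# Points of PG_2(F_2), i.e. 1-dimensional subspaces of F_2^3: 2^2 + 2 + 1 = 7.
print(num_subspaces(3, 1, 2))  # 7
# Lines of PG_3(F_2), i.e. 2-dimensional subspaces of F_2^4.
print(num_subspaces(4, 2, 2))  # 35
```

For k = 1 and m = n + 1 this reproduces the point count $q^n + q^{n-1} + \cdots + 1$ of P Gn (Fq ).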

Definition 2.1 If V and W are finite-dimensional vector spaces, then


P G(V ) and P G(W ) are isomorphic if there is a bijection

ϕ : P G(V ) → P G(W )

such that, for U, U′ ∈ P G(V ), U ⊆ U′ if and only if U ϕ ⊆ U′ϕ. If W = V ,


then such a map ϕ is called an automorphism or collineation of P G(V ).

Since the projective dimension of P G(V ) is equal to the length of the


longest chain, U1 , U2 , . . . , Uk , of elements of P G(V ) satisfying U1 ⊂ U2 ⊂
. . . ⊂ Uk , it follows that isomorphic geometries have the same projective
dimension. That is, V and W must be of the same dimension and, provided
they are vector spaces over the same field, they must be isomorphic as vector
spaces. Any invertible linear transformation from V to W will induce an
isomorphism of the geometries, but something slightly more general will
also, a so-called semilinear transformation:

Definition 2.2 Let F be a field and let V and W be vector spaces over F .
A semilinear transformation of V into W is given by a map

T :V →W

together with an associated automorphism, α(T ), of the field F . The map T


is additive, i.e. (v + u)T = vT + uT for all v, u ∈ V , and $(av)T = a^{\alpha(T)}(vT)$
for all a ∈ F and v ∈ V .

A semilinear transformation carries subspaces into subspaces, preserv-


ing inclusion, and thus induces an incidence-preserving map on the projec-
tive geometries. It is an isomorphism of the projective spaces whenever
T is an isomorphism of the additive structures, the inverse being given by
T −1 , with the associated automorphism of F being α(T )−1 . Notice that

the composition of semilinear transformations is again semilinear and, in


fact, α(ST ) = α(S)α(T ). It follows that when V = W the semilinear iso-
morphisms form a group and that the map sending T to α(T ) defines a
homomorphism into the Galois group of F (here the automorphism group
of F ). The kernel is the group of invertible linear transformations of V .
In terms of bases, given ordered bases v1 , v2 , . . . , vm and w1 , w2 , . . . , wn
of V and W , respectively, if $(v_i)T = \sum_{j=1}^{n} a_{ij} w_j$, $A = (a_{ij})$ and α = α(T), then
$$T : (x_1, x_2, \ldots, x_m) \mapsto (x_1^\alpha, x_2^\alpha, \ldots, x_m^\alpha)A,$$
where, as usual, we have used the bases to identify V with F m and W with
F n . In matrix form, the composition of two semilinear transformations,
(α, A) and (β, B), is $(\alpha\beta, A^\beta B)$, where $A^\beta$ denotes the matrix $(a_{ij}^\beta)$. Since
a matrix A together with an automorphism α clearly yield, by the above
formula, a semilinear transformation, the map sending T to α(T ), in the
case where V = W , is a homomorphism onto the Galois group of F .
Thus, for a given vector space V , the group of semilinear isomorphisms
of V contains GL(V ), the group of invertible linear transformations of V ,
as a normal subgroup, the quotient being the Galois group of F . The group
of semilinear isomorphisms is denoted by Γ L(V ) . Clearly every semilinear
isomorphism of V induces an isomorphism of P G(V ). The scalar trans-
formations (i.e. those that send v to av for some fixed a ∈ F ) induce the
identity isomorphism and they are the only semilinear isomorphisms that
do. The subgroup of scalar transformations is the centre of GL(V ) and a
normal subgroup of Γ L(V ); the quotient groups are denoted, respectively,
by P GL(V ) — the projective general linear group — and P Γ L(V )
— the projective semilinear group. If V is n-dimensional and a basis
has been chosen, P GL(V ) becomes a matrix group modulo scalar matrices
and is denoted by P GLn (F ); similarly in this case we write P Γ Ln (F ) for
P Γ L(V ). Each of these groups acts as a permutation group on the elements
of P G(V ), the action on the points of P G(V ) being doubly-transitive, which
means that given any two pairs of distinct points, (P, Q) and (P′, Q′), there
is an automorphism in P GL(V ) which simultaneously carries P to P′ and
Q to Q′. In the standard notation, P GLn (F ) acts on P Gn−1 (F ); similarly
for the semilinear group.
All the collineations of P G(V ) are induced by semilinear transforma-
tions; this is the content of the following classical fundamental theorem
of projective geometry:

Theorem 2.3 Let V be a vector space of dimension at least 3. Then


P Γ L(V ) is the full automorphism group of P G(V ).

There are well-established proofs of this theorem readily available: see


Artin [1, Chapter II], for example, or, for a slightly more modern account,
Hahn and O’Meara [24, Chapter 3]. Also note that the theorem starts with
planes; the projective line consists merely of points and the lack of any inci-
dences allows an arbitrary permutation to be admitted as an automorphism.
Amongst the automorphisms of P Gn (Fq ) there is always one of order
$v = (q^{n+1} - 1)/(q - 1)$ that permutes the points of the geometry in a single cycle
of this length, called a Singer cycle (after Singer [49]). This automorphism
is constructed as follows: view the finite field K = Fqn+1 as a vector space of
dimension n + 1 over the field F = Fq and let ω be a primitive element of K,
that is, a generator of the cyclic group K × = K − {0}. Using the given field
structure, it is clear that multiplication by ω induces a linear transformation
on the vector space V = K. Since the field F has ω v as primitive element,
it is easy to see that this linear transformation induces an automorphism of
P G(V ) that acts as a cycle of length v on the v points of the geometry. In
fact, the 1-dimensional subspaces of V = K given by the non-zero vectors
$1, \omega, \omega^2, \ldots, \omega^{v-1}$ represent all the points of P Gn (F ), where, of course, $\omega^v$
represents the same point as 1, etc.
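For the smallest interesting case the construction can be carried out explicitly. The sketch below realizes F8 as F2[x]/(x³ + x + 1); the choice of irreducible polynomial and the 3-bit integer encoding (bit i holding the coefficient of xⁱ) are our illustrative assumptions:

```python
# Realize F_8 as F_2[x]/(x^3 + x + 1), encoding field elements as 3-bit
# integers with bit i the coefficient of x^i.
MODULUS = 0b1011  # x^3 + x + 1

def times_omega(a):
    """Multiply by the primitive element w = x, reducing mod x^3 + x + 1."""
    a <<= 1
    if a & 0b1000:
        a ^= MODULUS
    return a

powers = []
a = 1  # w^0
for _ in range(7):
    powers.append(a)
    a = times_omega(a)

# Since q = 2, each nonzero vector is by itself a projective point, so the
# powers 1, w, ..., w^6 should run through all 7 points of PG_2(F_2).
print(powers)   # [1, 2, 4, 3, 6, 7, 5]
print(a == 1)   # True: w^7 = 1, a Singer cycle of length v = 7
```

Multiplication by ω thus acts as a single 7-cycle on the points of the Fano plane, exactly as described above.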

2.2 Affine geometry


The affine geometry, AG(V ), where V is a vector space over a field F ,
consists of all cosets, x + U , of all subspaces U of V with incidence defined
through the natural inclusion relation. Here the dimension is the same as
that of the vector space—for obvious geometric reasons. The dimension of
a coset is that of the defining subspace U , and if the latter has dimension
r, we will also refer to a coset of U as an r-flat. Thus the points are
all the vectors, including 0, the lines are 1-dimensional cosets, or 1-flats,
the planes are the 2-dimensional cosets, or 2-flats, and so on, with the
hyperplanes the cosets of dimension n − 1 — where V is of dimension n
over F . We also write AGn (F ) for AG(V ), in analogy with the projective
case. The affine geometry of these cosets is defined by the inclusion relation
which specifies that, if M = x + U and N = y + W are cosets in AG(V ),
then M contains N if M ⊇ N , from which it follows that W is a subspace of
U . The affine geometry AG(M ) is, by definition, the set of cosets of AG(V )
that are contained in M together with the induced incidence relation. This

is quite clear when M is a subspace but if M = x + U with x ∉ U it follows
also that AG(M ) is isomorphic to AG(U ) since every element of AG(M )
can be written in the form x + U′ for some subspace U′ of U . As in the
projective case we will use standard geometric terminology — in particular
the notion of parallelism:

Definition 2.4 The cosets x+U and y+W in AG(V ) are parallel if U ⊆ W
or W ⊆ U .

Cosets of the same subspace are thus parallel and cosets of the same
dimension are parallel if and only if they are cosets of the same subspace.
For a given subspace U of dimension r, its distinct cosets partition V into
parallel r-flats and parallelism is an equivalence relation on the set of r-flats
of V , the equivalence classes being called parallel classes. Hyperplanes, i.e.
(n − 1)-flats, in AGn (F ) are parallel if and only if they are equal or intersect
in the empty set and in AGn (F2 ) a hyperplane and its complement make
up a parallel class. In AGn (Fq ) there are q hyperplanes in a parallel class.
Here is one more important fact about flats that we will need to properly
explain Reed’s decoding algorithm for Reed-Muller codes:
If M is an r-flat and N an (n − r)-flat in AGn (F ), then either M ∩ N is
a single point, in which case N meets all the r-flats parallel to M in a single
point, or else the intersection of N with an r-flat parallel to M is either a
flat of positive dimension or the empty set.
As in the projective case, both GL(V ) and Γ L(V ) act on the geometry,
but now we also have V itself acting via translation. The underlying action
of the affine general linear group, AGL(V ) , and the affine semilinear
group, AΓ L(V ), is given as follows: for T ∈ Γ L(V ) and v ∈ V , the map
(T, v) is defined by
x(T, v) = xT + v
for each x ∈ V . Such maps preserve cosets and thus act on AG(V ). Com-
position is given by (S, v)(T, w) = (ST, vT + w) and it follows that these
affine groups are semi-direct products of the linear and semilinear groups
(respectively) with the additive group of V , the action of the linear and
semilinear groups on V being the natural one. The permutation action on
the points of AG(V ), i.e. on the vectors in V , is doubly-transitive and, if
F = F2 , it is triply-transitive.
Given a basis $v_1, v_2, \ldots, v_n$ for V , if (T, v) is an element of AΓ L(V ),
and $v = \sum_i b_i v_i$, define the matrix A via $v_i T = \sum_j a_{ij} v_j$, and let α be the
field automorphism associated with T . Then
$$(T, v) : (x_1, \ldots, x_n) \mapsto (x_1^\alpha, \ldots, x_n^\alpha)A + (b_1, \ldots, b_n).$$

Moreover, given any triple (α, A, (b1 , . . . , bn )) where α is an automorphism of


the field F , A is an n × n matrix with entries from F and (b1 , . . . , bn ) ∈ F n ,
the formula above defines an element of AΓ L(V ) and, in fact, with the
obvious multiplication of the triples,

$$(\alpha, A, (b_1, \ldots, b_n))(\beta, B, (c_1, \ldots, c_n)) = (\alpha\beta, A^\beta B, (b_1^\beta + c_1, \ldots, b_n^\beta + c_n)),$$

we have an isomorphism of AΓ L(V ) with this group, denoted by AΓ Ln (F ).


Similarly we write AGLn (F ) for the affine linear group — when it is given
explicitly.
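Over a prime field the Galois group is trivial, so the triples reduce to pairs (A, b) and the multiplication rule reads (A, b)(B, c) = (AB, bB + c). The following sketch verifies this composition rule point by point; the two affine maps over F3 are arbitrary choices of ours:

```python
import itertools

p = 3  # a prime field, so the associated field automorphism is trivial

def apply_affine(x, A, b):
    """Row-vector affine action x |-> xA + b over F_p."""
    n = len(x)
    y = [sum(x[i] * A[i][j] for i in range(n)) % p for j in range(n)]
    return tuple((y[j] + b[j]) % p for j in range(n))

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

# Two invertible affine maps over F_3, chosen arbitrarily.
A, b = [[1, 2], [0, 1]], (1, 0)
B, c = [[2, 1], [1, 1]], (0, 2)

# Composite pair per the text's rule (S, v)(T, w) = (ST, vT + w).
AB = mat_mul(A, B)
bc = apply_affine(b, B, c)

for x in itertools.product(range(p), repeat=2):
    assert apply_affine(apply_affine(x, A, b), B, c) == apply_affine(x, AB, bc)
print("composition rule verified on all of F_3^2")
```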
In analogy with the projective case, there is a fundamental theorem of
affine geometry which states that for n ≥ 2, Aut(AGn (F )) = AΓ Ln (F ).
This is the same theorem, in effect, as the fundamental theorem for projec-
tive geometry, if we consider the way in which affine geometries are embed-
ded in projective geometries:

Theorem 2.5 Let V be a vector space over F , H a hyperplane, and x a


vector in V that is not in H. Set $P G(V)_H = \{U \mid U \in P G(V),\ U \not\subseteq H\}$.
Define a map
ϕ : AG(x + H) → P G(V )
by M 7→ hM i for any coset M ∈ AG(x+H). Then ϕ is an incidence preserv-
ing injection with image P G(V )H . Further, the inverse map ϕ−1 satisfies

U ϕ−1 = U ∩ (x + H),

for any U ∈ P G(V )H .

This is the fundamental embedding theorem and the proof is quite


direct from the definitions; it can be found in Gruenberg and Weir [23].
Note that the choice of the hyperplane H and vector x that produce the
embedding is not crucial since for another choice, K and y, it is clear
that AG(x + H) is isomorphic to AG(y + K) and, moreover, H and K
are equivalent under the projective group. One generally thinks of the 1-
dimensional subspaces of H as the “points at infinity” of the projective
space P G(V ) and discarding these points leaves the affine geometry of the
same dimension. In coordinate terms one can view H as the hyperplane in

F n+1 = {(x0 , x1 , . . . , xn ) | xi ∈ F } given by the equation X0 = 0, taking,


for convenience, x = (1, 0, . . . , 0). Then the embedded affine space is F n
viewed as the last n coordinates, where every projective point not at infinity
has homogeneous coordinates that can be taken to be (1, x1 , . . . , xn ). More
precisely, the embedded affine geometry of dimension n is obtained from a
projective geometry of dimension n by removing a hyperplane and all the
subspaces contained in it. The points and subspaces remaining form the
affine geometry.
When doing computations one works, normally, in the affine space. In an
affine geometry of dimension n, once a basis is chosen for the vector space,
any r-flat can be given by a set of (n − r) independent linear equations
and solutions are points of the geometry. In the projective case one uses
homogeneous equations, of course, and only looks for non-zero solutions —
which are not precisely the points but only representatives. So, for example,
in AG4 (F ) the equations X1 + X2 − X3 = 0 and X1 + X4 = 1 define a
2-flat; it is given by (0, 0, 0, 1) + U where U is the 2-dimensional subspace
{(x, y, x + y, −x) | x, y ∈ F }. In other words the 2-flat consists of all vectors
in F 4 of the form (x, y, x + y, 1 − x).
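Over F2 this example can be enumerated outright. The sketch below solves the two equations by brute force and compares the result with the parametrization just given:

```python
import itertools

# The 2-flat of AG_4(F_2) cut out by X1 + X2 - X3 = 0 and X1 + X4 = 1
# (over F_2 subtraction is the same as addition), compared with the
# parametrization (x, y, x + y, 1 - x) from the text.
solutions = {v for v in itertools.product(range(2), repeat=4)
             if (v[0] + v[1] - v[2]) % 2 == 0 and (v[0] + v[3]) % 2 == 1}
parametrized = {(x, y, (x + y) % 2, (1 - x) % 2)
                for x in range(2) for y in range(2)}

print(sorted(solutions))          # the four points of the 2-flat
print(solutions == parametrized)  # True
```

As a 2-flat should, the solution set has 2² = 4 points, one of them the particular solution (0, 0, 0, 1).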

2.3 Designs from geometries


To define incidence structures from P G(V ) and AG(V ) we need to choose
point sets and block sets; the incidence relation will be that of the geometry,
namely containment. In every case the point set of our design will be the set
of points of the geometry: for projective spaces the 1-dimensional subspaces
of V and for affine spaces the vectors of V . For blocks we will take all
the subspaces (or cosets in the affine case) of a fixed dimension. In every
case the double-transitivity of the group involved will assure us that we are
dealing with a 2-design.
Thus, for example, we can consider the design of points and lines, the
design of points and planes, or the design of points and hyperplanes of a
geometry and be assured of a 2-design. The parameters will depend on both
the dimension of the geometry and the cardinality of the finite field. By
fixing one of these and letting the other vary we obtain numerous infinite
families of designs. Each of these designs will have an automorphism group
containing P Γ L(V ) or AΓ L(V ) in the projective or affine case, respectively.
Except for isolated cases the parameters will admit many designs other than
these classical designs, and a large amount of effort has gone into classifying
the classical designs amongst those with the same parameters.

Perhaps the most interesting case is that of dimension 2. In the projective


case, P G2 (Fq ) produces the design of points and lines of a 3-dimensional
vector space over a finite field, a classical projective plane. It is a design
with parameters 2-($q^2 + q + 1$, q + 1, 1). For q a proper power of a prime
there are many such designs that do not arise from P G2 (Fq ), but for q a
prime only the classical plane has appeared — and it is possible that there
may not be any others. For q not a power of a prime no designs with
these parameters have been discovered. Observe that to recover q from the
parameters one must take λ1 − λ2 in the notation of Chapter 1; this integer
is an important parameter of a design, and is called the order.
In the affine case, AG2 (Fq ) produces the design of points and lines of
a 2-dimensional vector space, i.e. a classical affine plane. It is a 2-($q^2$, q, 1)
design. It also has order q.
Projective planes are symmetric designs, i.e. have the same number of
points as blocks. For a symmetric 2-(v, k, λ) design λ1 = k and the order is
given as k − λ, as it was for projective planes. More generally, the design
of points and hyperplanes of a projective geometry produces a symmetric
design. If the finite field has q elements and the geometry has projective
dimension n, then this design of points and hyperplanes is a symmetric
design with parameters
$$2\text{-}\left(\frac{q^{n+1}-1}{q-1},\ \frac{q^n-1}{q-1},\ \frac{q^{n-1}-1}{q-1}\right)$$

and order $q^{n-1}$.
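A short computation recovers the familiar small cases from these formulas; the helper name below is ours:

```python
def hyperplane_design_params(n, q):
    """Parameters 2-(v, k, lambda) of the design of points and hyperplanes
    of PG_n(F_q), together with its order q^(n-1)."""
    v = (q**(n + 1) - 1) // (q - 1)
    k = (q**n - 1) // (q - 1)
    lam = (q**(n - 1) - 1) // (q - 1)
    return v, k, lam, q**(n - 1)

print(hyperplane_design_params(2, 2))  # (7, 3, 1, 2): the Fano plane
print(hyperplane_design_params(3, 2))  # (15, 7, 3, 4): points/planes of PG_3(F_2)
```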

2.4 Codes from designs


For any finite incidence structure D with point set P and block set B, the
code $C_p(\mathcal{D})$ of D over a prime field $F_p$ is the subspace of the space $F_p^{\mathcal{P}}$ of
all functions from P to Fp that is spanned by the incidence vectors of the
blocks of D. This code is equivalent to the code given by the row space of
any incidence matrix of the incidence structure — where we use the blocks
to index the rows (and the points the columns) of the incidence matrix.
Although this is the appropriate way to view the incidence matrix in the
context of this chapter, it does sometimes prove useful to examine the code
given by the row space of the “point by block” incidence matrix; see, for
example, [52].
For any subset X ⊆ P, we denote the characteristic function of X by
$v^X$ and refer to $v^X$ as the incidence vector of X. Thus
$$v^X(x) = \begin{cases} 1 & \text{if } x \in X \\ 0 & \text{if } x \notin X, \end{cases}$$
where $v^X(x)$ denotes the value that the function $v^X$ takes at the point x.
Then
$$C_p(\mathcal{D}) = \langle v^B \mid B \in \mathcal{B} \rangle.$$
The dimension of Cp (D) is referred to as the p-rank of D. The rank tends to
vary with p in the general case; for so-called 2-designs it is easily determined
except for those primes dividing the order of the design.
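As a concrete instance, the 2-rank of the design of points and lines of P G2 (F2 ), the Fano plane, can be computed directly. The sketch below builds the line incidence vectors as bitmasks and row-reduces over F2; the bitmask encoding is our implementation choice:

```python
import itertools

# Points of PG_2(F_2): the seven nonzero vectors of F_2^3.  Each nonzero
# vector a also determines a line, namely the points p with <a, p> = 0.
points = [p for p in itertools.product(range(2), repeat=3) if any(p)]
lines = []
for a in points:
    incident = [i for i, p in enumerate(points)
                if sum(a[j] * p[j] for j in range(3)) % 2 == 0]
    lines.append(sum(1 << i for i in incident))  # incidence row as a bitmask

def rank_gf2(rows):
    """Rank over F_2 of rows encoded as integers (XOR linear basis)."""
    basis = {}
    for r in rows:
        while r:
            high_bit = r.bit_length() - 1
            if high_bit not in basis:
                basis[high_bit] = r
                break
            r ^= basis[high_bit]
    return len(basis)

print(rank_gf2(lines))  # 4, the 2-rank of the Fano plane
```

The rank 4 is consistent with Section 3, where the binary code of this design is identified with the [7, 4] Hamming code.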
The minimum weight of the code arising from an incidence structure is
clearly at most equal to the cardinality of the smallest block. In general the
minimum weight is strictly less than this cardinality, but for the classical
geometric designs studied in this chapter there is a distinguished prime one
considers, and for these codes we will have equality.
As we will soon see, one of the most widely studied classes of binary
codes, the Reed-Muller codes, arises precisely as the class of codes given by
geometric designs over the binary field — although the original presentation
of these codes in 1954 was in the boolean-function context and was given by
electrical engineers.

3 The Reed-Muller codes


The Reed-Muller codes have already been defined in Chapter 1 (Section 13).
For completeness, and in order to establish our notation for this section and
those to follow, we will repeat some of the definitions and results.

3.1 Definitions
Throughout this section F will denote the field F2 . Let V be a vector space
of dimension m over F . We let $F^V$ denote the vector space over F of all
functions from V to F . As a vector space over F , $F^V$ has dimension $2^m$,
the cardinality of the set V . Since $F^V$ will be the ambient space for the
Reed-Muller codes we must choose a basis for it and we choose the standard
basis consisting of the characteristic functions of the elements of the set V .
Denoting a typical element of V by v these basis elements are $\{v^v \mid v \in V\}$,
where we write $v^v$ instead of the more cumbersome $v^{\{v\}}$. Viewing V as $F^m$,

it too has a standard basis e1 , e2 , . . . , em , where

$$e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0),$$
with the 1 in the $i$th position.

Moreover, any function f ∈ $F^V$ can be given as a function of m variables


corresponding to the m coordinate positions: writing the vector x ∈ V as
$$x = (x_1, x_2, \ldots, x_m) = \sum_{i=1}^{m} x_i e_i,$$

then f = f (x1 , x2 , . . . , xm ). The “polynomial” xi is, for example, the linear


functional that projects a vector in V onto its ith coordinate in the given
basis, its value at $\sum_{j=1}^{m} x_j e_j$ being $x_i$.
As a function on V , $x_i^k = x_i$ whenever k > 0, so we obtain all the
monomial functions via the $2^m$ monomial functions:
$$\mathcal{M} = \{x_1^{i_1} x_2^{i_2} \cdots x_m^{i_m} \mid i_k = 0 \text{ or } 1;\ k = 1, 2, \ldots, m\},$$
where we write 1 for the constant function $x_1^0 x_2^0 \cdots x_m^0$ with value 1 at all
points of V ; as a code vector it is the all-one vector 1. The linear combinations
over F of these $2^m$ monomials give all the polynomial functions,
since, once again, we can reduce any polynomial in the $x_i$ modulo $x_i^2 - x_i$,
for i = 1, 2, . . . , m. The set M of $2^m$ monomials is another basis for the
vector space $F^V$; the following lemma indicates how each of our given basis
elements of characteristic functions of the vectors in V is given as a polynomial,
i.e. as a sum of elements of M. This not only proves the assertion but
also shows that the set M is a linearly independent set of vectors in $F^V$.

Lemma 3.1 Set K = {1, 2, . . . , m} and, for w = (w1 , w2 , . . . , wm ) ∈ V , let


Iw = {i ∈ K | wi = 1}. Then
$$v^w = \prod_{k=1}^{m} (x_k + 1 + w_k) = \sum_{K \supseteq J \supseteq I_w} \prod_{j \in J} x_j.$$

Proof: The proof is simple: the first polynomial is easily seen to define the
characteristic function of the vector w; and the expansion of this product is
clearly the sum on the right. 2
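The lemma can also be confirmed by exhaustive computation for small m; the sketch below checks both equalities for every pair w, x in F2³ (the helper names are ours):

```python
import itertools

# Check Lemma 3.1 for m = 3: the product prod_k (x_k + 1 + w_k) over F_2
# is the characteristic function of {w}, and it equals the sum of the
# monomials prod_{j in J} x_j over all subsets J with I_w ⊆ J ⊆ K.
m = 3
V = list(itertools.product(range(2), repeat=m))

def product_form(w, x):
    val = 1
    for k in range(m):
        val = val * ((x[k] + 1 + w[k]) % 2) % 2
    return val

def monomial_sum_form(w, x):
    Iw = {i for i in range(m) if w[i] == 1}
    total = 0
    for t in range(m + 1):
        for J in itertools.combinations(range(m), t):
            if Iw <= set(J):
                total += all(x[j] for j in J)  # the empty product is 1
    return total % 2

ok = all(product_form(w, x) == (x == w) == monomial_sum_form(w, x)
         for w in V for x in V)
print(ok)  # True
```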
We repeat the definition of the Reed-Muller codes:

Definition 3.2 Let V denote the vector space of dimension m over F = F2


and let r satisfy 0 ≤ r ≤ m. The Reed-Muller code of order r, denoted
by R(r, m), is the subspace of $F^V$ (with basis the characteristic functions of
the vectors of V ) that consists of all polynomial functions in the xi of degree
at most r, i.e.
$$R(r, m) = \left\langle \prod_{i \in I} x_i \;\middle|\; I \subseteq \{1, 2, \ldots, m\},\ 0 \leq |I| \leq r \right\rangle.$$

Example 3.3 The first-order Reed-Muller code R(1, m) consists of all lin-
ear combinations of the monomials xi and 1 and hence each codeword, apart
from 0 and 1, is given either by a non-zero linear functional on V or by 1
plus such a functional. Since any non-zero linear functional has $2^{m-1}$ zeros,
every vector of R(1, m), apart from 0 and 1, has weight $2^{m-1}$. A generator
matrix for R(1, m) using the basis $x_1, x_2, \ldots, x_m, 1$ can be written so that
the first $2^m - 1$ columns and m rows are the binary representations of the
numbers between 1 and $2^m - 1$, whereas the last column is all 0, apart from
a final row where all entries are equal to 1. This is clearly a generator matrix
for the orthogonal of the extended Hamming code, i.e. $R(1, m) = (\widehat{H_m})^{\perp}$,
where $\widehat{C}$ denotes the code obtained from C by adding an overall parity check.
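The weight claim in this example is easy to confirm by enumerating all of R(1, m) for a small m; here we evaluate every codeword of R(1, 4) as a Boolean function on F2⁴:

```python
import itertools

# Enumerate R(1, 4) as the functions a_1 x_1 + ... + a_4 x_4 + c on F_2^4
# and record the weight of each codeword; apart from 0 and 1, every
# weight should be 2^(m-1) = 8.
m = 4
V = list(itertools.product(range(2), repeat=m))
weights = set()
for coeffs in itertools.product(range(2), repeat=m + 1):  # (a_1,...,a_m, c)
    word = [(sum(a * x for a, x in zip(coeffs[:m], v)) + coeffs[m]) % 2
            for v in V]
    weights.add(sum(word))
print(sorted(weights))  # [0, 8, 16]
```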

As an immediate consequence of the definition and the linear indepen-


dence of the functions in M, we have that
$$\dim(R(r, m)) = \binom{m}{0} + \binom{m}{1} + \binom{m}{2} + \cdots + \binom{m}{r}.$$

In particular, dim(R(1, m)) = 1 + m.
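The dimension formula is immediate to evaluate; the function name rm_dim below is ours:

```python
from math import comb

def rm_dim(r, m):
    """Dimension of the Reed-Muller code R(r, m): sum of binomial
    coefficients C(m, 0) + C(m, 1) + ... + C(m, r)."""
    return sum(comb(m, i) for i in range(r + 1))

print(rm_dim(1, 3))          # 4: R(1, 3) is an [8, 4] code
print(rm_dim(2, 4))          # 11
print(rm_dim(4, 4) == 2**4)  # True: R(m, m) is the whole ambient space
```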


The trivial cases include the repetition code, R(0, m) = F 1, R(m, m) =
$F^V$ and the code R(m − 1, m), which is of codimension 1 in $F^V$ and equal
to (F 1)⊥ . The Reed-Muller codes are a nested sequence of codes. That is,

R(r, m) ⊆ R(s, m)

whenever 0 ≤ r ≤ s ≤ m.
We mentioned above that the orthogonal of R(0, m) = F 1 is R(m−1, m).
This is a special case of the following result, which was proved in Chapter 1:

Theorem 3.4 For any m ≥ 1 and any r such that 0 ≤ r < m,

R(r, m)⊥ = R(m − r − 1, m).



We will, in fact, reprove this result in Section 5 when we give its straight-
forward generalization to generalized Reed-Muller codes, Theorem 5.8.
Example 3.5 From Theorem 3.4 we get immediately that
$$R(1, m)^{\perp} = \widehat{H_m} = R(m - 2, m).$$

Thus, extended Hamming codes are Reed-Muller codes.


In the next subsection we will see the connection between the Reed-
Muller codes and the codes of the designs of points and flats in affine space
over F2 . The codes of the analogous designs from projective spaces over F2
arise as punctured Reed-Muller codes:
Definition 3.6 For 0 ≤ r < m the punctured Reed-Muller code of
order r, R(r, m)∗ , is the code obtained from R(r, m) by puncturing at the
vector 0 ∈ V .
One could puncture at any vector of V and get an isomorphic code
since the set of polynomial functions is invariant under translation in V ;
i.e. if f is a polynomial in the xi ’s of degree s then so is g where g =
f (x1 + a1 , . . . , xm + am ) for any vector a = (a1 , . . . , am ) ∈ V , which means
that the automorphism group of any Reed-Muller code acts transitively on
the coordinates.
Example 3.7 If m = 3, r = 1, R(1, 3) is a self-dual [8, 4, 4] binary code, and
R(1, 3)∗ is a [7, 4, 3] code, viz. the Hamming code H3 . Example 3.5 gives the
reason for this and shows that Hamming codes are punctured Reed-Muller
codes.
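This example, too, can be checked by brute force: enumerate R(1, 3) as polynomial functions, drop the coordinate at the zero vector, and read off the parameters:

```python
import itertools

# Puncture R(1, 3) at the coordinate indexed by the zero vector and
# check that the result is a [7, 4, 3] code, the Hamming code H_3.
V = list(itertools.product(range(2), repeat=3))  # V[0] = (0, 0, 0)
punctured = set()
for coeffs in itertools.product(range(2), repeat=4):  # (a_1, a_2, a_3, c)
    word = tuple((sum(a * x for a, x in zip(coeffs[:3], v)) + coeffs[3]) % 2
                 for v in V[1:])  # discard the coordinate at 0
    punctured.add(word)

print(len(punctured))                            # 16 = 2^4, so dimension 4
print(min(sum(w) for w in punctured if any(w)))  # 3, the minimum weight
```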
Proposition 3.8 For r < m the punctured Reed-Muller code R(r, m)∗ is a
$$\left[2^m - 1,\ \binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{r}\right]$$

binary code.
Proof: This follows easily: the dimension must be that of R(r, m) since
all the vectors in this code are of even weight and the projection cannot,
therefore, have a nontrivial kernel. 2
Finally, note that it follows from Theorem 3.4 that
$$(R(r, m)^*)^{\perp} = R(m - r - 1, m)^* \cap (F_2\mathbf{1})^{\perp},$$
provided r < m. That is, $(R(r, m)^*)^{\perp}$ consists of the vectors of R(m − r −
1, m) with a zero at 0, that coordinate being discarded.

3.2 Geometries and Reed-Muller codes


The set of vectors V is the point set for any design defined from an affine
geometry AGm (F2 ); moreover the binary codes of all the associated designs
of points and flats are subspaces of $F^V$. Similarly, the designs from the
projective geometry P Gm−1 (F2 ) all have point set $V^* = V - \{0\}$ and $F^{V^*}$
is the ambient space of their binary codes. In this section we indicate how
to associate these design codes with the Reed-Muller and punctured Reed-
Muller codes of the last section.
Consider the generating elements of R(r, m): the polynomial xi as a
codeword has value 1 at a point x in V if the vector x has a 1 in the
coordinate position i and value 0 otherwise. Thus $1 + x_i = v^H$, where H
is the hyperplane with the equation Xi = 0. Also, xi is the characteristic
function of the complement of this hyperplane, i.e. the (m − 1)-flat with
equation Xi = 1. Similarly, (1 + xi )(1 + xj ), for i 6= j, is the characteristic
function of the intersection of two hyperplanes, a subspace of dimension
m − 2. In general, all the elements of M are the incidence vectors of flats
in the affine geometry and R(r, m) is spanned by the incidence vectors of
these (m − s)-flats, for 0 ≤ s ≤ r. In order to show that R(r, m) is the
binary code of the design of points and (m − r)-flats of AGm (F2 ), which
is our aim, we need to show that the vectors given by the (m − r)-flats
span R(r, m). Notice that we already have this result for the first-order
Reed-Muller codes, since the linear equations certainly define (m − 1)-flats
and, furthermore, R(1, m) has precisely 2(2m − 1) such vectors, the number
of (m − 1)-flats in AGm (F2 ). Thus, if A is the affine design of points and
(m − 1)-flats, we have that

R(1, m) = C2 (A).

The general case is almost as easy. First of all we have that the flats are
in the Reed-Muller code:

Proposition 3.9 The incidence vectors of the (m − r)-flats of AGm (F2 )


are all in R(r, m).

Proof: Any (m − r)-flat T in AGm (F2 ) consists of all the vectors (points of
the affine space) x = (x1 , x2 , . . . , xm ) that satisfy r linear equations,
$$\sum_{j=1}^{m} a_{ij} X_j = b_i, \quad \text{for } i = 1, 2, \ldots, r,$$

where all the aij and bi are in F2 . The polynomial,
$$\prod_{i=1}^{r}\Big(b_i + 1 + \sum_{j=1}^{m} a_{ij} x_j\Big),$$

has degree at most r and thus is in R(r, m). Moreover it is clearly the
characteristic function v T of T . 2
In fact the degree of the polynomial is exactly r when the equations are
independent and the proof actually shows that all the (m − s)-flats are in
R(r, m) provided s ≤ r.
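The proof can be replayed on a small example. In the Python sketch below (the rows `A` and right-hand sides `b` are an arbitrary choice of r = 2 independent equations in AG_4(F2), mine rather than the text's), the proof's product polynomial reproduces the characteristic function of the flat, and that vector does lie in R(2, 4):

```python
from itertools import combinations

def rm_pivots(r, m):
    """Leading-bit F_2 basis for R(r, m); a codeword is an int whose bit p
    is its value at the point p of F_2^m."""
    pivots = {}
    for d in range(r + 1):
        for S in combinations(range(m), d):
            mask = sum(1 << i for i in S)
            x = sum(1 << p for p in range(1 << m) if (p & mask) == mask)
            while x:
                h = x.bit_length() - 1
                if h in pivots:
                    x ^= pivots[h]
                else:
                    pivots[h] = x
                    break
    return pivots

def in_span(x, pivots):
    while x:
        h = x.bit_length() - 1
        if h not in pivots:
            return False
        x ^= pivots[h]
    return True

def parity(x):
    return bin(x).count("1") % 2

m, r = 4, 2
A = [0b0011, 0b0101]                 # coefficient rows a_i (bit j is a_ij)
b = [1, 0]
points = range(1 << m)
# the flat T: solutions of the r linear equations sum_j a_ij x_j = b_i
flat = sum(1 << p for p in points
           if all(parity(A[i] & p) == b[i] for i in range(r)))
# the proof's polynomial prod_i (b_i + 1 + sum_j a_ij x_j), evaluated pointwise
poly = sum(1 << p for p in points
           if all((b[i] + 1 + parity(A[i] & p)) % 2 == 1 for i in range(r)))
assert poly == flat                  # it is the characteristic function of T
assert bin(flat).count("1") == 1 << (m - r)   # 2^(m-r) points on the flat
assert in_span(flat, rm_pivots(r, m))         # and it lies in R(r, m)
print("ok")
```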

Theorem 3.10 Let A be the design of points and r-flats of the affine ge-
ometry AGm (F2 ), where 0 ≤ r ≤ m. Then the binary code C2 (A) is the
Reed-Muller code R(m − r, m). Its dimension is
$$\binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m-r}.$$

Let P be the design of points and r-dimensional subspaces of the pro-


jective geometry P Gm−1 (F2 ) where 1 ≤ r ≤ m − 1. Then the binary code
C2 (P) is the punctured Reed-Muller code R(m − r − 1, m)∗ . Its dimension
is
$$\binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m-r-1}.$$

Proof: The characteristic function of any (t + 1)-flat is the sum of the


characteristic functions of two t-flats contained in it and thus the binary
code of the design of points and (m−r)-flats contains, by a trivial induction,
the characteristic function of every (m − s)-flat for 0 ≤ s ≤ r and hence the
code of this design is R(r, m). Reversing the roles of r and m − r gives the
first part of the theorem.
For the second part of the theorem, notice first that the code of the
design is contained in the punctured Reed-Muller code. Extend the code
of the design by an overall parity check and note that this extended code
is a subcode of R(m − r − 1, m) and that incidence vectors of the (r + 1)-
dimensional subspaces of V generate this extended code. Now every (r + 2)-
dimensional subspace of V has an incidence vector that is the sum of all the
incidence vectors of the (r + 1)-dimensional subspaces it contains. But, over
F2 , an (r + 1)-flat that is not a subspace consists of an (r + 2)-dimensional

subspace from which the points of an (r + 1)-dimensional subspace, the


subspace of which it is a coset, have been removed. Thus, in the code of
the design it is the sum of the incidence vectors of an (r + 2)-dimensional
subspace and an (r + 1)-dimensional subspace. It follows that all the (r +
1)-flats of V are in the extended code of the design and it is, therefore,
R(m − r − 1, m). 2
In the proof the essential new point is that subspaces alone generate
the Reed-Muller codes because flats can be obtained from subspaces, a fact
which makes the discussion of the binary case very easy. We record this as

Corollary 3.11 The Reed-Muller code R(m−r, m) is generated by the char-


acteristic functions of the r-dimensional subspaces of F2m or, indeed, by the
r-flats containing any fixed point of F2m .

The characteristic functions of the r-flats are vectors of weight 2r and are
precisely the minimum-weight vectors of R(m−r, m), as we shall soon prove.
Before doing so, we introduce two exact sequences that arise naturally from
the geometric nature of the Reed-Muller and punctured Reed-Muller codes.

Lemma 3.12 Any embedding of P Gm−1 (F2 ) into P Gm (F2 ) gives rise to
the following two short exact sequences whenever 0 ≤ r < m:

(i) 0 → R(m − r − 1, m)∗ → R(m − r, m + 1)∗ → R(m − r, m) → 0;

(ii) 0 → R(m − r − 1, m) → R(m − r, m + 1)∗ → R(m − r, m)∗ → 0.

Proof:
Let W be the (m+1)-dimensional vector space defining P Gm (F2 ). Then
an embedding of P Gm−1 (F2 ) in P Gm (F2 ) is given by a hyperplane H of W
and, moreover, the complement of H in W , H = W − H, is a copy of
AGm (F2 ), as we explained in Section 2.2.
Let D be the design of points and r-dimensional subspaces of P Gm (F2 ).
Using P G(H) we form the design D1 of r-dimensional subspaces in
P Gm−1 (F2 ), and from AG(H) we form the design D2 of r-flats in AGm (F2 ).
By Theorem 3.10, C2 (D) = R(m − r, m + 1)∗ , C2 (D1 ) = R(m − r − 1, m)∗
and C2 (D2 ) = R(m − r, m).
Any block of the design D is either in H or meets it in an (r − 1)-
dimensional (projective) subspace. The intersection with H is thus empty or
an r-flat; clearly every r-flat of AG(H) arises in this way. Thus C = C2 (D)
projects onto C2 (D2 ), and C2 (D1 ) is in the kernel. Thus dim(C2 (D1 )) ≤

dim(C2 (D)) − dim(C2 (D2 )), and using the formula for the dimension of the
Reed-Muller codes, we have
$$\sum_{i=0}^{m-r-1}\binom{m}{i} \le \sum_{i=0}^{m-r}\binom{m+1}{i} - \sum_{i=0}^{m-r}\binom{m}{i}.$$

Using the identity $\binom{m+1}{k} = \binom{m}{k} + \binom{m}{k-1}$ repeatedly shows that this is actually
an equality, and hence that C2 (D1 ) is the whole kernel. This yields the short
exact sequence (i).
To obtain the second sequence we use the same embedding but now
project C onto the coordinate positions corresponding to the points of
P G(H). Let E 2 be the design of points and (r − 1)-dimensional subspaces of
P Gm−1 (F2 ), and E 1 the design of points and (r +1)-flats of AGm (F2 ). Then
C2 (E 2 ) = R((m − 1) − (r − 1), m)∗ , and C2 (E 1 ) = R(m − r − 1, m). Cer-
tainly C projects onto C2 (E 2 ) since every r-dimensional projective subspace
of P G(W ) meets H in an (r − 1)-dimensional subspace — or is contained
in H — and every (r − 1)-dimensional subspace arises in this way. Two
r-dimensional subspaces of P G(W ) that meet P G(H) in the same (r − 1)-
dimensional subspace have disjoint intersections in H and thus form two
cosets of the same r-dimensional subspace of AG(H). Together they form
an (r + 1)-dimensional space. It follows immediately that the kernel of the
projection is C2 (E 1 ) and thus yields the sequence (ii). 2
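The dimension bookkeeping behind both sequences is a binomial identity that can be checked mechanically (a small Python sketch; `rm_dim` is my name): the dimension of the middle code must equal the sum of the dimensions of the kernel and the image.

```python
from math import comb

def rm_dim(s, m):
    """dim R(s, m) = C(m,0) + C(m,1) + ... + C(m,s); by Proposition 3.8 the
    punctured code R(s, m)* has the same dimension when s < m."""
    return sum(comb(m, i) for i in range(s + 1))

for m in range(2, 10):
    for r in range(m):
        # sequence (i): 0 -> R(m-r-1, m)* -> R(m-r, m+1)* -> R(m-r, m) -> 0
        assert rm_dim(m - r, m + 1) == rm_dim(m - r - 1, m) + rm_dim(m - r, m)
print("ok")
```

The same count serves sequence (ii), since the punctured and unpunctured codes involved have equal dimensions.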
We next draw out the consequences of Lemma 3.12 and in so doing
prove that the minimum weights of the Reed-Muller codes are as we have
indicated and, more importantly, determine the nature of the minimum-
weight vectors.

Theorem 3.13 For 0 ≤ r ≤ m the minimum weight of R(m − r, m) is


2r and the vectors of minimum weight are the incidence vectors of the r-
flats of AGm (F2 ). For 1 ≤ r ≤ m the minimum weight of R(m − r, m)∗ is
2r − 1 and the vectors of minimum weight are the incidence vectors of the
(r − 1)-dimensional subspaces of P Gm−1 (F2 ).

Proof: Clearly the minimum weights are at most 2r and 2r − 1 by Theo-


rem 3.10. Now we use the short exact sequences and induction on m, the
result being trivial for m = 1. Assume the result true for m and all r < m.
Thus we assume that R(m − s, m) has minimum weight 2s for m − s ≤ m
and R(m − s, m)∗ has minimum weight 2s − 1 for m − s < m and that the
minimum-weight vectors are as announced.

Since, for r = 0, the result is trivial for any m we may assume r > 0
and consider dimension m + 1. If r = m the results are easy, for then we
have R(1, m + 1), a case we have already discussed. We suppose 0 < r < m
and use the notation of Lemma 3.12. Thus D is the design of points and
r-dimensional subspaces of P Gm (F2 ). Let v be a minimum-weight vector
of C = C2 (D), so that wt(v) ≤ 2r+1 − 1. If v is zero at the coordinates
corresponding to H = W − H, then v can be viewed in C2 (D1 ), from the
short exact sequence (i), and hence v has weight 2r+1 −1 and is the incidence
vector of an r-dimensional subspace of H (and hence of W ), by the induction
hypothesis. If v is zero at the coordinates corresponding to H, then v can
be viewed in C2 (E 1 ) = R(m − (r + 1), m), from the short exact sequence
(ii), and thus has weight at least 2r+1 , which is not possible. Thus v can
be taken to have support meeting both H and H. Again by the induction
hypothesis, the weight is at least 2r + 2r − 1 = 2r+1 − 1, using the last
non-zero terms of the short exact sequences, and hence has exactly this
weight. Furthermore, restricted to P G(H), v is the incidence vector of an
(r − 1)-dimensional subspace. To show that v is the incidence vector of an
r-dimensional subspace of P Gm (F2 ), construct an r-dimensional subspace
of P Gm (F2 ) whose incidence vector w coincides with v on P G(H) and that
contains at least one point in H in common with the support of v. Then
the weight of v − w is easily seen to be less than 2r+1 − 1 and hence v = w.
This gives the projective result for projective dimension m from which the
affine result for dimension m + 1 follows since the Reed-Muller codes are
invariant under translation in V — as we remarked in the last section —
which means it is sufficient to consider only those minimum-weight vectors
of the Reed-Muller code with a 1 at 0. 2
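For a small case the theorem can be confirmed by brute force. The Python sketch below (names and representation mine) enumerates all of R(2, 4) and checks both that the minimum weight is 2^r = 4 and that the number of minimum-weight words equals the number of 2-flats of AG_4(F2):

```python
from itertools import combinations

m, r = 4, 2                       # test case: R(m - r, m) = R(2, 4)
n = 1 << m
# generator vectors: monomials of degree <= m - r, as ints over the n points
gens = []
for d in range(m - r + 1):
    for S in combinations(range(m), d):
        mask = sum(1 << i for i in S)
        gens.append(sum(1 << p for p in range(n) if (p & mask) == mask))
pivots = {}
for x in gens:                    # reduce to an F_2 basis
    while x:
        h = x.bit_length() - 1
        if h in pivots:
            x ^= pivots[h]
        else:
            pivots[h] = x
            break
words = [0]
for bvec in pivots.values():      # enumerate all 2^k codewords
    words += [w ^ bvec for w in words]
weights = [bin(w).count("1") for w in words if w]
assert min(weights) == 1 << r     # minimum weight 2^r = 4
# the minimum-weight words are the incidence vectors of the r-flats:
# 2^(m-r) cosets for each of the [m choose r]_2 subspaces
gauss = 1
for i in range(r):
    gauss = gauss * ((1 << (m - i)) - 1) // ((1 << (r - i)) - 1)
assert weights.count(1 << r) == (1 << (m - r)) * gauss   # 4 * 35 = 140
print("dim =", len(pivots), "min weight =", min(weights))
```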
It should be noted that the code of any projective-geometry design is
cyclic due to the existence of Singer cycles (as already mentioned in Sec-
tion 2.1) and hence the punctured Reed-Muller codes are cyclic.
We summarize the results obtained on the properties of the Reed-Muller
codes and finite geometries over the field F2 :

Theorem 3.14 Let m be any positive integer.

(1) If A is the design of points and r-flats of the affine geometry AGm (F2 ),
where 0 ≤ r ≤ m, then the binary code C = C2 (A) is the Reed-Muller
code R(m − r, m). It is a $[2^m,\ \binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m-r},\ 2^r]$ binary
code and the minimum-weight vectors are the incidence vectors of the

r-flats. Further, C contains the incidence vectors of all t-flats for


r ≤ t ≤ m.
For r > 0 the orthogonal, C ⊥ , is the Reed-Muller code R(r − 1, m),
which is the binary code of the design of points and (m − r + 1)-flats
of the affine geometry AGm (F2 ).

(2) If D is the design of points and r-dimensional subspaces of the pro-


jective geometry P Gm (F2 ), where 0 ≤ r ≤ m, then the binary code
C = C2 (D) is the punctured Reed-Muller code R(m − r, m + 1)∗ . It
is a $[2^{m+1} - 1,\ \binom{m+1}{0} + \binom{m+1}{1} + \cdots + \binom{m+1}{m-r},\ 2^{r+1} - 1]$ binary cyclic code
and the minimum-weight vectors are the incidence vectors of the r-
dimensional subspaces. Further, C contains the incidence vectors of
all t-dimensional subspaces for r ≤ t ≤ m.
The code orthogonal to R(m − r, m + 1)∗ is R(r, m + 1)∗ ∩ (F2 1)⊥ ,
which is the even-weight subcode of the binary code of the design of
points and (m − r)-dimensional subspaces of P Gm (F2 ).

Example 3.15 (1) The code of the design of points and lines in AG4 (F2 )
is R(3, 4), which is the even-weight subcode of F V . Its orthogonal is
F 1 = R(0, 4). The code of the design of points and planes is R(2, 4),
of dimension 11, with orthogonal the code from the design of points
and hyperplanes, of dimension 5, i.e. R(1, 4).

(2) The code of the design of points and lines in P G3 (F2 ) is R(2, 4)∗ ,
of dimension 11 and minimum weight 3; it is, of course, a binary
Hamming code.

(3) A basis consisting of the incidence vectors of lines in P Gm (F2 ) for the
code R(m − 2, m)∗ = Hm can be found as follows (as described in
Key and Sullivan [32]): take any line and include its incidence vector;
take any point off the line, and include the three incidence vectors of
the three lines joining the new point to the points on the first line.
Continue in this way: at each stage, if there is a point not yet incident
with a chosen line then simply take all the incidence vectors of the
lines joining that point to the points already obtained. These incidence
vectors are clearly linearly independent and, as is easily seen, are equal
in number to the dimension; hence they yield a basis. The successive
dimensions are

1, 1 + 3 = 4, 4 + 7 = 11, 11 + 15 = 26, 26 + 31 = 57, . . .



Moreover, the incidence vectors chosen that have a point in common


with a fixed point of the first line form a collection of parity checks (of
the simplex code that is dual to the Hamming code) that are “focused
on” that fixed point — see Section 3.3 — and can be used for majority-
logic decoding.
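The successive dimensions quoted in (3) can be checked against the dimension formula for R(m − 2, m)∗ (a small Python sketch; `hamming_dim` is my name):

```python
from math import comb

def hamming_dim(m):
    """dim R(m-2, m)* = C(m,0) + ... + C(m,m-2) = 2^m - 1 - m."""
    return sum(comb(m, i) for i in range(m - 1))

assert [hamming_dim(m) for m in range(2, 7)] == [1, 4, 11, 26, 57]
# each stage adjoins one line per point already reached: 2^(m-1) - 1 new vectors
for m in range(3, 7):
    assert hamming_dim(m) == hamming_dim(m - 1) + (1 << (m - 1)) - 1
print("ok")
```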

The group AGLm (F2 ), in its natural action on V = F2m , yields a group
of automorphisms of every Reed-Muller code R(r, m) and P GLm (F2 ), in its
natural action on V ∗ = V − {0}, yields a group of automorphisms of every
punctured Reed-Muller code R(r, m)∗ . That P GLm (F2 ) is the full group
of automorphisms of R(r, m)∗ whenever 0 < r < m − 1 follows from Theo-
rem 3.13 and the fundamental theorem of projective geometry, Theorem 2.3.
From this it follows that AGLm (F2 ) is the full group of automorphisms of
R(r, m) whenever 0 < r < m − 1. One must be careful here and note that
it is not the entire projective space that must be preserved, but only part of
it, to ensure that the automorphism comes from the general linear group.

3.3 Decoding
One of the attractions of Reed-Muller codes is the simple and easily im-
plemented decoding scheme that is available, with decoding decisions made
by majority vote, just as with the repetition code — which is, of course,
the simplest Reed-Muller code, R(0, m). Since the scheme is related to
the geometric nature of these codes we describe it here. The scheme dates
from the very beginning of coding theory and is due to Reed [46]. It was
Reed’s algorithm that prompted the investigation of majority-logic decoding
and the rather peculiar definition of so-called Euclidean-geometry codes as
maximal cyclic subspaces of duals of the codes generated by certain flats in
AGm (F2s ). We begin by describing majority-logic decoding.
Let C be an arbitrary linear code contained in the ambient space Fqn .
Recall that a parity check is simply a code vector in the orthogonal code,
C ⊥ , and that the support of a vector in Fqn is the set of coordinate positions
in which it has non-zero entries. Suppose we are given J parity checks and a
coordinate position, i say, such that the intersection of the supports of any
two of the given parity checks is precisely the singleton set {i}. If a received
vector has been perturbed by t or fewer errors during transmission, where
2t ≤ J, then, clearly, at least half of the parity checks will give zero (i.e.
check) when applied to the received vector unless the symbol at coordinate i

is in error. Moreover, had we normalized the given parity checks so that each
had a 1 at coordinate i, then, in the event that one of the t or fewer errors
had occurred at coordinate i, at least half of the parity checks would record
that error. Thus a majority “vote” of the values of the parity checks corrects
the entry at coordinate i. This is the essence of majority-logic decoding.
Such a collection of parity checks is said to be focused on i2 . Note that
if each of the coordinates of C has a collection of J parity checks focused
on it, then the code will necessarily correct t or fewer errors — where again
2t ≤ J — and therefore C must have minimum weight at least 2t + 1.
Indeed, it is very easy to see that if there is a set of J parity checks focused
on a coordinate i, then any code vector with a non-zero entry at i must
have weight at least J + 1 in order to satisfy the J parity checks. Note also
that any code with a transitive automorphism group (and, in particular, a
cyclic code) will have minimum weight at least J + 1 provided that one, and
hence all coordinates, has a collection of J parity checks focused on it. In
the cyclic case majority-logic decoding, when it is available, is particularly
simple.
An instructive example is the dual C of a binary Hamming code, fre-
quently referred to as a simplex code. It has the classical Steiner triple
system, namely the lines of the projective geometry, among its parity checks
and the pencil of lines through a point of the geometry fulfills the require-
ments for a collection of parity checks focused on the given point; if the
Hamming code is of block length 2m − 1, then J = 2m−1 − 1. Indeed, C has
minimum weight 2m−1 and t = 2m−2 − 1 errors can be corrected. A simpler,
but still important, case is the following:

Example 3.16 The repetition code of length n over Fq has, clearly, n − 1


parity checks of weight 2 focused on any given coordinate and, for odd n,
one simply uses a majority vote to determine the symbol sent, obtaining the
correct symbol provided at most (n − 1)/2 errors occurred during transmission.
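The mechanism of a collection of parity checks focused on a coordinate can be illustrated concretely. Below is a Python sketch (function and variable names are mine) that applies a majority vote of the n − 1 weight-2 checks of the length-7 repetition code, each containing the voted-on coordinate 0:

```python
def majority_correct(y, checks, i):
    """Correct coordinate i of the binary word y by a majority vote of parity
    checks focused on i (each check is a set of positions containing i)."""
    failing = sum(sum(y[p] for p in c) % 2 for c in checks)
    y = list(y)
    if 2 * failing > len(checks):      # more than half the checks fail
        y[i] ^= 1
    return y

# the repetition code of length 7: the n - 1 weight-2 checks {0, j} focus on 0
n = 7
checks = [{0, j} for j in range(1, n)]
sent = [1] * n
recv = [0] + [1] * (n - 1)             # error at the voted-on coordinate
assert majority_correct(recv, checks, 0) == sent
recv2 = [1, 0, 0] + [1] * (n - 3)      # two errors elsewhere: coordinate 0 kept
assert majority_correct(recv2, checks, 0)[0] == 1
print("ok")
```

Voting coordinate by coordinate in this way recovers the sent symbol whenever at most (n − 1)/2 errors occurred, as in the example.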

To use majority logic to correct many errors one must have many parity
checks focused on each coordinate, which entails that the minimum weight
d⊥ of C ⊥ be small; in fact, in order to have J parity checks focused on a given
coordinate we must have (d⊥ − 1)J ≤ n − 1. The examples above have the
2
Unfortunately the term “orthogonal on i” is the terminology of most of the coding
literature. Blahut, recognizing the problem with that terminology, used “concurrent on
i” in [7], but that does not seem to have been adopted. We here make another attempt
at change.

smallest possible minimum weights for their duals and allow error-correction
via majority logic to correct up to the full error-correcting capacity of the
code. In the coding literature such codes are said to be “completely orthog-
onalizable”.
Consider next the Reed-Muller code C = R(r, m) where r < m. A
basis for C is the set of monomials of degree r or less. The idea of Reed’s
decoding scheme is to determine first the “information bits” corresponding
to monomials of degree r, thus reducing the problem to decoding in the Reed-
Muller code R(r − 1, m). Let K be a subset of {1, 2, . . . , m} of cardinality
r and let L be the complement of K. Now the monomial of degree r,
$$\prod_{k\in K} x_k,$$

as an element of C = R(r, m), is the characteristic function of the (m−r)-flat


S given by the equations
Xk = 1, k ∈ K.
Also
$$\prod_{l\in L} x_l$$
is the characteristic function of the r-flat T given by

Xl = 1, l ∈ L

and, moreover, is an element of R(r − 1, m)⊥ . Each of the 2m−r translates


of T meets the flat S precisely once, but any other (m − r)-flat given by a
different monomial of degree r evenly. (To see this the reader may wish to
think of subspaces of the relevant dimensions; in one case the intersection
is the zero vector and S is a transversal to the 2m−r translates of T ; in the
other the intersection is a subspace of positive dimension and S meets a
translate of T either in a flat of that dimension or not at all.) Thus, a
majority vote of the parity checks corresponding to these $2^{m-r}$ translates will
record only the information bit corresponding to $\prod_{k\in K} x_k$ provided at most
$2^{m-r-1} - 1$ errors have been made in transmission. Note that one
retrieves the information bit directly by majority vote and that, after
determining those information bits corresponding to the $\binom{m}{r}$ monomials of degree
r, the received vector is adjusted and decoding proceeds in R(r − 1, m) via
precisely the same method.
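Reed's scheme is short enough to implement in full for the first-order codes. The Python sketch below is mine (including the indexing convention that bit k of a point p gives x_k(p)); it decodes R(1, 3) by majority vote over the translates of the lines in each coordinate direction and corrects every single-error pattern:

```python
from itertools import product

def encode_r1(a0, a, m):
    """R(1, m) codeword of the message (a0; a_1,...,a_m):
    v(p) = a0 + sum_k a_k x_k(p) over F_2, for points p = 0..2^m - 1."""
    return [(a0 + sum(a[k] for k in range(m) if (p >> k) & 1)) % 2
            for p in range(1 << m)]

def reed_decode_r1(y, m):
    """Reed's majority-logic decoder for R(1, m); corrects up to
    2^(m-2) - 1 errors by majority over 2^(m-1) parities per coefficient."""
    a = []
    for k in range(m):
        # one parity per translate of the line in direction e_k
        votes = [y[c] ^ y[c | (1 << k)]
                 for c in range(1 << m) if not c & (1 << k)]
        a.append(1 if 2 * sum(votes) > len(votes) else 0)
    # peel off the degree-1 part; the constant term is a majority of the rest
    residual = [(y[p] + sum(a[k] for k in range(m) if (p >> k) & 1)) % 2
                for p in range(1 << m)]
    a0 = 1 if 2 * sum(residual) > len(residual) else 0
    return a0, a

m = 3
for msg in product([0, 1], repeat=m + 1):
    a0, a = msg[0], list(msg[1:])
    v = encode_r1(a0, a, m)
    for e in range(1 << m):            # any single transmission error
        y = list(v)
        y[e] ^= 1
        assert reed_decode_r1(y, m) == (a0, a)
print("ok")
```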
Finally, we make contact, briefly, with so-called L-step majority-logic
decoding. In our discussion of Reed’s algorithm we used parity checks which

were not in the dual of the code in question: the r-flat T above has a
characteristic function in R(r − 1, m)⊥ but not in R(r, m)⊥ . However, if
T′ is any translate of T then T ∪ T′ is an (r + 1)-flat whose characteristic
function is in R(r, m)⊥ . Moreover, the $2^{m-r} - 1$ flats of this form (namely,
T ∪ T′, where T′ is a distinct translate of T) are focused on T in the sense
that the intersection of the supports of any two of them is precisely the
set T . Thus, provided sufficiently few errors were made in transmission, a
majority vote using these parity checks will give the sum of the error bits
contained in the coordinate positions corresponding to the flat T . Such a
“divide and conquer” technique using majority or threshold circuitry was
thoroughly investigated early in the history of coding theory and was the
subject of Massey’s thesis, [41]. For a fuller discussion of the decoding of
Reed-Muller and Generalized Reed-Muller codes and L-step majority-logic
decoding the reader may wish to consult [40] or a textbook on error control,
for example, [7] or [36].

4 The group-algebra approach


We have not, so far, taken full advantage of the fact that the coordinate
set of the codes in question is, itself, endowed with structure. We did, of
course, use that structure in defining the Singer cycles — where the co-
ordinate set was given the structure of a field — and, moreover, we used
the additive structure to discuss the minimum-weight vectors and the auto-
morphism groups of the Reed-Muller codes. But, for example, we have not
yet explicitly shown how to get the generator polynomials of the punctured
Reed-Muller codes although we know they are cyclic. In fact, of course, there
are many Singer cycles and the codes are therefore cyclic in many ways —
which is another way of saying that one must specify explicitly how a code is
cyclic before one can compute the generator polynomial. The group-algebra
approach allows one to naturally specify the roots of the generator polyno-
mial without actually choosing the Singer cycle and it is this intrinsic nature
of the approach which makes the group-algebra setting so attractive.
We follow Charpin [14] but Landrock and Manz [34] have also given
an expository account; the original source of this approach was Berman’s
seminal paper, [6]. One exploits the modular group algebra of an elementary
abelian3 group, the additive group of the field that labels the coordinates,
3
The group, in fact, need not be elementary abelian nor even abelian and the general
case has been treated; see, for example, the chapter by Ward in this Handbook. But, if one

as the ambient space for the codes.

4.1 Elementary results and Berman’s theorem


We proceed in full generality but the reader interested only in Berman’s
result and the Reed-Muller codes can take G below to be F2m and F to be
F2 .
Let q = pm and set G equal to the additive subgroup of the field Fq .
We will very soon regard G as Fq , but for the moment it is merely an
elementary abelian p-group4 of order pm . Let F be any subfield of Fq and
set R = F [G], the group algebra of G over the field F . Recall that the
elements of R are simply functions from G to F and therefore, when G and
F are taken as suggested above, R is the ambient space of the Reed-Muller
codes. We choose, however, to formulate things a bit differently and view
the group algebra in a polynomial way — as one frequently does with group
algebras given by abelian monoids5 . Thus a typical element of R is a formal
sum $\sum_{g\in G} x_g X^g$ where the xg are elements of F and, as a function, it is
simply the one that assigns xg to the element g of G. Addition and scalar
multiplication are component-wise and the multiplication is given by the
addition in G. Thus,
$$\sum_{g\in G} x_g X^g + \sum_{g\in G} y_g X^g = \sum_{g\in G} (x_g + y_g) X^g$$

and, for c ∈ F ,
$$c\Big(\sum_{g\in G} x_g X^g\Big) = \sum_{g\in G} (c\,x_g) X^g;$$

using the “polynomial” multiplication $X^g X^h = X^{g+h}$ gives the usual
multiplication formula in the group algebra:
$$\Big(\sum_{g\in G} x_g X^g\Big)\Big(\sum_{h\in G} y_h X^h\Big) = \sum_{g,h\in G} x_g y_h X^{g+h} = \sum_{k\in G}\Big(\sum_{h\in G} x_{k-h} y_h\Big) X^k.$$

Notice that $X^0$ is the unit element of the commutative ring R; i.e. $X^0 a = a$
for every a ∈ R. The augmentation map R → F given by $\sum_{g\in G} x_g X^g \mapsto$

restricts oneself to p-groups, a result of Faldum’s [20] shows that one might as well restrict
oneself to the elementary abelian case — as far as producing “good” codes is concerned.
4
That is, G is an abelian group all of whose non-identity elements have order p or, in
other words, a vector space over the field Fp . Since the group operation is being written
additively “order p” means simply that pg = 0 for every g ∈ G.
5
The paradigm is the ordinary polynomial ring where the monoid in question is the set
of non-negative integers under addition.

$\sum_{g\in G} x_g$ is clearly a linear transformation of the vector space structure of R
onto F ; moreover, it is an algebra homomorphism — as one can easily check
from the multiplication formula. We denote the kernel of this augmentation
map by M ; it is, of course, an ideal of R, but much more is true: since we
are in characteristic p we have the Frobenius homomorphism, $a \mapsto a^p$, at
our disposal and the fact that G is an elementary abelian p-group gives
$$\Big(\sum_{g\in G} x_g X^g\Big)^p = \sum_{g\in G} x_g^p X^0 = \Big(\sum_{g\in G} x_g^p\Big) X^0 = \Big(\sum_{g\in G} x_g\Big)^p X^0,$$

which shows that every element not in M is invertible in R and hence that
M is the unique maximal ideal of R.
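These formulas are easy to experiment with. In the Python sketch below (the bitmask representation is my choice), an element $\sum_g x_g X^g$ of F2[G] is an int whose bit g is the coefficient x_g; the checks confirm that each generator $X^g - 1$ of M is nilpotent and that, in line with the Frobenius identity above, any element of odd support squares to the unit and so is invertible:

```python
def ga_mul(a, b, m):
    """Product in R = F_2[G], G = (F_2^m, +): an element sum_g x_g X^g is an
    int whose bit g is x_g, and exponents add by XOR of the bit positions."""
    out = 0
    for g in range(1 << m):
        if (a >> g) & 1:
            for h in range(1 << m):
                if (b >> h) & 1:
                    out ^= 1 << (g ^ h)
    return out

def augmentation(a):
    """Image of a under the augmentation map: the coefficient sum in F_2."""
    return bin(a).count("1") % 2

m = 3
one = 1                              # the unit X^0
for g in range(1, 1 << m):
    u = (1 << g) ^ one               # the generator X^g - 1 = X^g + 1 of M
    assert augmentation(u) == 0      # it lies in the augmentation ideal M
    assert ga_mul(u, u, m) == 0      # (X^g - 1)^2 = X^(2g) - 1 = 0: nilpotent
# an element of odd support is outside M and squares to X^0, hence invertible
a = (1 << 0b011) ^ (1 << 0b101) ^ (1 << 0b110)
assert augmentation(a) == 1
assert ga_mul(a, a, m) == one
print("ok")
```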
In the binary case, with the interpretation suggested above, M is the
Reed-Muller code R(m − 1, m); we shall shortly see that the powers of the
ideal M give precisely the Reed-Muller codes.
Observe that in our present notation the characteristic function of a subset
S of G is given by the element $\sum_{g\in S} X^g$ of the group algebra. Consider
next the element $X^g - X^0 = X^g - 1$ of the ideal M , where we have set
$X^0 = 1$ since it is the unit element of R. Provided g ≠ 0,
$$(X^g - 1)^{p-1} = \sum_{i=0}^{p-1} (-1)^{p-1-i} \binom{p-1}{i} X^{ig} = \sum_{i=0}^{p-1} (-1)^{p-1} X^{ig} = \sum_{h\in U} X^h$$

where $U = \langle g\rangle = \{ig \mid 0 \le i < p\}$ is the subspace over Fp spanned by g. We
have here used the fact that $(-1)^{p-1} = 1$, even when p = 2, and the fact that
$\binom{p-1}{i} = (-1)^i$ since we are working in a field of characteristic p. Moreover,
if we are given a set of linearly independent elements of G, $g_1, g_2, \ldots, g_r$ say,
then one checks easily that $\prod_{i=1}^{r} (X^{g_i} - 1)^{p-1} = \sum_{g\in U} X^g$, where now U

is the subspace spanned by {g1 , g2 , . . . , gr }. In fact, we have the following


more precise statement:

Lemma 4.1 Let S be a non-empty subset of G. Then
$$\prod_{g\in S} (X^g - 1)^{p-1} = \begin{cases} \sum_{g\in\langle S\rangle} X^g & \text{if } S \text{ is a linearly independent set} \\ 0 & \text{otherwise.} \end{cases}$$

Proof: We have already remarked on the case of a linearly independent set


S, so suppose S is linearly dependent. We wish to show that the product
is zero. If 0 ∈ S, that result is immediate; otherwise, let S′ be a linearly
independent subset of S with the property that there is a g0 ∈ S − S′

contained in ⟨S′⟩. Then the product in question is $\big(\prod_{g\in S'} (X^g - 1)^{p-1}\big)(X^{g_0} - 1)^{p-1}\, a$ where a is an element of R.
By the first part of the lemma,
$$\Big(\prod_{g\in S'} (X^g - 1)^{p-1}\Big)(X^{g_0} - 1)^{p-1} = \Big(\sum_{g\in\langle S'\rangle} X^g\Big)\Big(\sum_{h\in\langle g_0\rangle} X^h\Big) = \sum_{h\in\langle g_0\rangle}\sum_{g\in\langle S'\rangle} X^{g+h}.$$
Since g + h runs through ⟨S′⟩ as g does for every h ∈ ⟨g0⟩, this latter sum is
$\sum_{h\in\langle g_0\rangle}\sum_{g\in\langle S'\rangle} X^g = p \sum_{g\in\langle S'\rangle} X^g = 0$ and we have the result. 2

Now since the ideal M is generated linearly over the field F by the
elements $X^g - 1$, the ideal $M^r$ is generated linearly by elements of the
form $\prod_{g\in S} (X^g - 1)$ where S is a subset of G of cardinality r. Moreover,
in characteristic 2, the subsets S can be taken to be linearly independent
subsets of the vector space G over F2 , by the above Lemma. Hence in this
binary case $M^r$ is generated linearly by the characteristic functions of the r-
dimensional subspaces of the vector space G over F2 . Because of the simple
result (Corollary 3.11) that R(m − r, m) is generated by the characteristic
functions of subspaces of dimension r, we have proved Berman's theorem:

Theorem 4.2 In the group algebra F2 [G], where G is an m-dimensional


vector space over F2 , the Reed-Muller code R(m − r, m) = M r , where M is
the unique maximal ideal of F2 [G].

Remark: The theorem is even true for r = 0 provided we define M 0 = R,


as is customary, it being the ideal generated by 1. Observe that for r = m
we have the repetition code and, indeed, $\prod_{g\in B} (X^g - 1) = \sum_{g\in G} X^g$, the all-one
vector, for every basis B of G.
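Berman's theorem can be verified directly for small m. The Python sketch below (representation and names mine) computes M^r from products of the generators $X^g - 1$ and compares its span with the monomial span of R(m − r, m), under the identification of the group element g with the point g of F_2^m:

```python
from itertools import combinations

def ga_mul(a, b, m):
    """Product in F_2[G], G = (F_2^m, +); bit g of an int is the coefficient
    of X^g, and exponents add by XOR."""
    out = 0
    for g in range(1 << m):
        if (a >> g) & 1:
            for h in range(1 << m):
                if (b >> h) & 1:
                    out ^= 1 << (g ^ h)
    return out

def span_pivots(vectors):
    pivots = {}
    for x in vectors:
        while x:
            h = x.bit_length() - 1
            if h in pivots:
                x ^= pivots[h]
            else:
                pivots[h] = x
                break
    return pivots

def in_span(x, pivots):
    while x:
        h = x.bit_length() - 1
        if h not in pivots:
            return False
        x ^= pivots[h]
    return True

def same_span(u, v):
    pu, pv = span_pivots(u), span_pivots(v)
    return len(pu) == len(pv) and all(in_span(x, pv) for x in u)

m = 3
n = 1 << m
for r in range(1, m + 1):
    # M^r: products of r generators X^g - 1 = X^g + 1 (g != 0); repeated
    # factors square to zero, so distinct subsets suffice
    mr = []
    for gs in combinations(range(1, n), r):
        prod = 1                              # start from the unit X^0
        for g in gs:
            prod = ga_mul(prod, (1 << g) ^ 1, m)
        mr.append(prod)
    # R(m - r, m): monomials of degree <= m - r, coordinates indexed by points
    rm = []
    for d in range(m - r + 1):
        for S in combinations(range(m), d):
            mask = sum(1 << i for i in S)
            rm.append(sum(1 << p for p in range(n) if (p & mask) == mask))
    assert same_span(mr, rm)                   # Berman: M^r = R(m - r, m)
print("ok")
```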

4.2 Isometries of the group algebra


If R = F [G] is the group algebra of any group G, abelian or not, then any
automorphism σ of the group G induces an automorphism of R, which we
also denote by σ, via
$$\sigma\Big(\sum_{g\in G} x_g X^g\Big) = \sum_{g\in G} x_g X^{\sigma(g)},$$

as one can easily check. Moreover, in the basis given by X g , the coding-
theory basis we have chosen, such an automorphism is weight preserving
— i.e. it is also an isometry preserving the Hamming metric. If σ is any

automorphism of the algebra R that is weight preserving, then it must be


given monomially; i.e. σ(X g ) must be of the form aX h for some h ∈ G
and a ∈ F × ; for G an elementary abelian p-group we have — since auto-
morphisms preserve the unit element — $\sigma(X^0) = \sigma((X^g)^p) = a^p X^0 = X^0$,
which implies that a = 1, since F is of characteristic p, and that the auto-
morphism is given by a coordinate permutation. But now one easily checks
that setting h = σ(g) defines an automorphism of G that induces the given
isometry. In the case of an elementary abelian p-group G, the automorphism
group is simply GL(G) where G is viewed as a vector space over the field
Fp . In the case at hand we can choose a basis for G and then GLm (Fp ),
where |G| = pm , is precisely the group of isometric automorphisms of our
group algebra. We record this fact with

Proposition 4.3 The group of isometric automorphisms of the group al-


gebra F [G], where G is an elementary abelian p-group and F a field of
characteristic p, is GL(G) = Aut(G) in its natural action on the coordinate
set G.

A group algebra, F [G], comes canonically equipped with an involutory


anti-automorphism induced by the map G → G which sends a group ele-
ment to its inverse. When G is abelian this canonical map is an involutory
automorphism and clearly isometric. We denote this canonical involution
by x 7→ x; in GL(G) it is represented by the map g 7→ −g. As we shall see
this canonical automorphism plays an important role when discussing the
orthogonal of a code viewed in R. It is the analogue of taking the “reverse”
when computing orthogonals to cyclic codes.
The ideal M of R is intrinsically defined since it consists of the nilpotent6
elements of R. It follows that every automorphism fixes M and hence
all powers of M ; in particular, isometric automorphisms fix M r for all r.
There are isometries not given by automorphisms, of course. For example,
multiplication by X g yields an isometry of R; such an isometry is clearly
given by a translation in the vector space G. Since any ideal of R is fixed by
such a multiplication, all the powers of M are fixed. We thus have AGL(G)
acting as a group of isometries of R and fixing the ideals that are here of
interest. We have now explained in our new language — but in a more
general setting — what we already know about the Reed-Muller codes.
6
An element of a ring is nilpotent if some power of it is 0; in our case the generators of
M , viz. X g − 1, are 0 when raised to the pth power and it follows easily that all elements
of M are nilpotent. M is the Jacobson radical of the ring R.

In the early history of coding theory there was great interest in deciding
which extended cyclic codes were “affine invariant” and Kasami, Lin and
Peterson [29] settled the question. Because of the historic interest and the
motivation it will provide for the rest of this chapter, we discuss and prove
their result. We are here, as in all of this section, following Charpin [14].
First of all it must be emphasized that “affine invariant” refers not to
the group of isometries discussed above but to a smaller group; a more
precise name would be “translation-invariant extended cyclic codes”. The
point is that one does not demand invariance under the group AGL(G), but
only under the subgroup AGL1 (Fq ), where now we are viewing G as the
field Fq . There are many more codes invariant under this smaller group,
even if one insists that the codes be self-dual: see, for example, [17] where
all binary, affine-invariant self-dual codes of block length at most 512 have
been found and where evidence is presented to suggest that the number goes
to infinity with the admissible block length. On the other hand, the only
binary codes invariant under the larger group are the Reed-Muller codes
(see Theorem 4.17 below).
We shall see in a moment how to extend the cyclic codes in question so
that they will lie in the ideal M , but let us note first that a linear subspace
of R invariant under translation is simply an ideal of R. Our aim, therefore,
is to characterize those ideals invariant under the isometric automorphisms
given by X g 7→ X ug where u is a non-zero field element and where we have
identified G with Fq.

4.3 Translation-invariant extended cyclic codes
We prove here the theorem of Kasami, Lin and Peterson characterizing
“affine-invariant” cyclic codes.
Set n = pm − 1 and suppose C ⊂ F n is a cyclic code. Now C is specified
completely by the nth roots of unity that are roots of its generator polyno-
mial. We shall assume that 1 is not a root for we wish to extend C by an
overall parity check and we wish to avoid trivial cases. Let α be a primitive
nth root of unity, i.e. a primitive element of Fq , where q = pm . Then the
set of roots of the generator polynomial are specified by that subset T of
{1, 2, . . . , n − 1} where αi is a root if and only if i ∈ T . We shall refer to T
as the defining set of the cyclic code C. We embed C in R as follows:
$$(c_0, c_1, \ldots, c_{n-1}) \mapsto \Big(-\sum_{i=0}^{n-1} c_i\Big) X^0 + \sum_{i=0}^{n-1} c_i X^{\alpha^i}.$$
Clearly the image, which we denote by Ĉ, is invariant under the map
$$\sum_{g \in G} x_g X^g \mapsto \sum_{g \in G} x_g X^{\alpha g}$$
and, indeed, any linear subspace of R invariant under this map comes from
a cyclic code. Since α is a generator of Fq× the image is invariant under the
maps given by X g 7→ X ug for all non-zero u ∈ Fq . We have, by our choice
of α, embedded all cyclic codes over F in M .
Consider next the following F -linear maps φs of R into the space G = Fq :
$$\phi_s\Big(\sum_{g \in G} x_g X^g\Big) = \sum_{g \in G} x_g g^s,$$
where 0 ≤ s < n. With the proviso that 0^0 = 1, the map φ_0 is simply the augmentation map with kernel M. Observe that if i is in the defining set of a cyclic code C, then the image, Ĉ, of C has the property that φ_i(c) = 0 for all c ∈ Ĉ. Moreover, for any embedded cyclic code, φ_0(c) = 0 for all c ∈ Ĉ. Thus, we "extend" T to T̂ = T ∪ {0} and note that the image of the cyclic code is defined by T̂ in the sense that c ∈ Ĉ if and only if φ_i(c) = 0 for all i ∈ T̂. Hence we abuse the terminology and refer to T̂ as the defining set of Ĉ.
Unlike φ0 , φs is not an algebra homomorphism for s > 0. It does,
however, have an important multiplicative property which we now explain.
Let N = {0, 1, . . . , n} and define a partial order on N by k ⪯ l if and only if k_ν ≤ l_ν for all ν, where $k = \sum_{\nu=0}^{m-1} k_\nu p^\nu$ and $l = \sum_{\nu=0}^{m-1} l_\nu p^\nu$ are the p-ary expansions of k and l. We give k ≺ l the obvious meaning: k ⪯ l but k ≠ l.
Then

Proposition 4.4 For all x, y ∈ R,
$$\phi_s(xy) = \sum_{i \preceq s} \binom{s}{i}\, \phi_i(x)\,\phi_{s-i}(y).$$

Proof: Setting $x = \sum_{g \in G} x_g X^g$ and $y = \sum_{h \in G} y_h X^h$ and writing out the definition of φ_s(xy) yields
$$\phi_s(xy) = \sum_{i=0}^{s} \binom{s}{i}\, \phi_i(x)\,\phi_{s-i}(y)$$

and an application of Lucas's theorem⁷ gives the result. □
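Lucas's theorem (stated in the footnote) is easy to sanity-check numerically. A small sketch in Python — the helper names are ours, not the text's:

```python
from math import comb

def digits(n, p):
    """p-ary digits of n, least significant first."""
    d = []
    while n:
        d.append(n % p)
        n //= p
    return d

def preceq(i, s, p):
    """The partial order i ⪯ s: every p-ary digit of i is at most that of s."""
    di, ds = digits(i, p), digits(s, p)
    di += [0] * (len(ds) - len(di))
    return all(a <= b for a, b in zip(di, ds))

def lucas(s, i, p):
    """binom(s, i) mod p via the digit-wise product of Lucas's theorem."""
    di, ds = digits(i, p), digits(s, p)
    di += [0] * (len(ds) - len(di))
    out = 1
    for a, b in zip(ds, di):
        out = out * comb(a, b) % p
    return out

for p in (2, 3, 5):
    for s in range(60):
        for i in range(s + 1):
            assert comb(s, i) % p == lucas(s, i, p)
            assert (comb(s, i) % p != 0) == preceq(i, s, p)
```

The second assertion is exactly the statement used in the corollary that follows: the binomial coefficient survives mod p precisely when i ⪯ s.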


We have immediately the following
Corollary 4.5 If I is an ideal of R and φ_s(x) = 0 for all x ∈ I, then φ_i(x) = 0 for all x ∈ I and i ⪯ s.
Proof: Since xX^g ∈ I for all x ∈ I the formula above yields
$$\phi_s(xX^g) = \sum_{i \preceq s} \binom{s}{i}\, \phi_i(x)\, g^{s-i} = 0$$
for all g ∈ G^×, and unless φ_i(x) = 0 for all i ⪯ s we would have a non-zero polynomial, $\sum_{i \preceq s} \binom{s}{i} \phi_i(x) Z^{s-i}$, of degree less than n with n roots, namely the elements of G^×. □


The discussion above and the corollary yield the theorem of Kasami, Lin
and Peterson:

Theorem 4.6 A cyclic code of block length p^m − 1 has an extension which is translation invariant if and only if its defining set T does not contain 0 and has the property that s ∈ T implies i ∈ T for all non-zero i ⪯ s.

Proof: Clearly an extended cyclic code that is translation invariant is an ideal and hence its defining set has the required property. On the other hand if an extended cyclic code has a defining set with the required property the formula shows that φ_s(cX^g) = 0 for all c ∈ Ĉ and all s ∈ T or, in other words, that c ∈ Ĉ implies cX^g ∈ Ĉ for all g ∈ G or that Ĉ is translation invariant. □
The powers of M are, of course, extended cyclic codes that are translation invariant since they are invariant under the larger group AGL(G). Thus we should be able to determine their generator polynomials. The reader should observe that these polynomials will depend on the choice of α, but the defining sets are intrinsic to R since they are given by the appropriate φ_i's. This intrinsic nature of the group-algebra approach has been exploited in diverse directions by Charpin and her students. The interested reader may wish to consult [16, 17]. In the following section we give the promised defining sets for the Reed-Muller codes and look briefly at their p-ary analogues.
7 Lucas's theorem states that $\binom{s}{i} \equiv \prod_\nu \binom{s_\nu}{i_\nu}$ modulo p. Hence $\binom{s}{i}$ is non-zero if and only if i ⪯ s.

4.4 The generator polynomials of punctured Reed-Muller codes and their p-ary analogues
We will in fact determine the defining set of R(r, m). If that set is T̂ = T ∪ {0} and α is the chosen nth root of unity then the generator polynomial of R(r, m)∗ is simply $\prod_{i \in T}(Z - \alpha^i)$. Now R(r, m) = M^{m−r} and we are interested in the case where r < m. For r = m − 1 we know that T = ∅ since M is, clearly, annihilated only by φ_0 and, of course, R(m − 1, m)∗ = $F_2^{2^m - 1}$, the whole ambient space. Since the dimension of R(r, m) is $k = \sum_{i=0}^{r} \binom{m}{i}$ we know that $|T| = 2^m - 1 - k = \sum_{i=0}^{m-r-1} \binom{m}{i} - 1$ and that therefore $|\hat{T}| = \sum_{i=0}^{m-r-1} \binom{m}{i}$. This is the cardinality of the set of integers less than n whose binary expansions have fewer than m − r entries equal to 1. As the next proposition will show, T̂ is precisely this defining set and therefore the generator polynomial of R(r, m)∗ is
$$\prod_{0 < wt_2(i) < m-r} (Z - \alpha^i)$$
where wt_2 is the function given by the following more general

Definition 4.7 For any integers k ≥ 0 and q > 1, the q-weight of k, written wt_q(k), is
$$wt_q(k) = \sum_{\nu=0}^{\infty} k_\nu,$$
where $k = \sum_{\nu=0}^{\infty} k_\nu q^\nu$ is the q-ary expansion of k.
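Definition 4.7, together with the root count for R(r, m)∗ given above, can be checked by brute force for small m. A Python sketch (the function names are ours):

```python
from math import comb

def wt(k, q):
    """q-weight: the digit sum of the q-ary expansion of k (Definition 4.7)."""
    s = 0
    while k:
        s += k % q
        k //= q
    return s

# Root count for the generator polynomial of R(r, m)*: the roots are the
# alpha^i with 0 < wt_2(i) < m - r, and there should be
# sum_{i=0}^{m-r-1} C(m, i) - 1 of them.
for m in range(2, 11):
    for r in range(m):
        T = [i for i in range(1, 2**m - 1) if wt(i, 2) < m - r]
        assert len(T) == sum(comb(m, i) for i in range(m - r)) - 1
        # equivalently, |T| = (2^m - 1) - dim R(r, m):
        assert len(T) == (2**m - 1) - sum(comb(m, i) for i in range(r + 1))
```

Both assertions hold for every 2 ≤ m ≤ 10 and 0 ≤ r < m, matching the counting argument in the text.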

Proposition 4.8 The defining set of the ideal M t in F2 [G], where G is the
elementary abelian 2-group of order 2m , is that subset of {0, 1, . . . , 2m − 2}
whose elements have binary expansions containing fewer than t entries equal
to 1. That is, the defining set is {i | 0 ≤ i < 2m − 1 and wt2 (i) < t}.

Proof: We use induction on t. We have the result for t = 1 since 0 is the only integer k with wt_2(k) = 0. Suppose the result true for t and consider t + 1. Now, a typical generating element of M^{t+1} is of the form x(X^g − 1)
where x ∈ M t . By the nested nature of the ideals we know, of course, that
M t+1 is annihilated by all φs with wt2 (s) < t and we need only show that
it is annihilated by those φs with wt2 (s) = t. For such an s we have that
$$\phi_s(x(X^g - 1)) = \sum_{i \preceq s} \binom{s}{i}\, \phi_i(x)\,\phi_{s-i}(X^g - 1) = \phi_s(x)\,\phi_0(X^g - 1) + \sum_{i \prec s} \binom{s}{i}\, \phi_i(x)\,\phi_{s-i}(X^g - 1).$$

But the first summand on the right side is 0 since X g − 1 ∈ M and the
second summand is 0 since i ≺ s implies wt2 (i) < wt2 (s) = t. Since we
know the dimension of the ideal, we must have precisely the defining set. □
The above proof is due to Charpin [13]; observe that it does not depend
on the fact that we are in characteristic 2 and, therefore, proves more. We
have, in fact, proved the following

Proposition 4.9 Let R = F[G] where G is an elementary abelian p-group and F a field of characteristic p. Then the ideal M^t, where M is the ideal of nilpotent elements of R, is annihilated by all φ_s with wt_p(s) < t.

In the event that the field F is not a subfield of G one must take an overfield of both in order to have a target for the functions φ_s, but this does not affect the proof.
Of course M^t is an extended cyclic code invariant under translation, but since we have not yet computed its dimension, we cannot assert that we have its defining set — as we did in the binary case. The proposition does, however, show that the dimension is at most equal to |{i | 0 ≤ i < p^m − 1, wt_p(i) ≥ t}| since we are in the presence of an extended cyclic code — which means that dim_F(M^t) = p^m − |T̂|, where T̂ is the defining set of M^t. We will soon exhibit linearly independent elements that will give us not only this dimension but also the so-called "Jennings⁸ basis" of the algebra F[G].
Let {g_0, g_1, . . . , g_{m−1}} be a basis of the F_p-space G. For any $k = \sum_{\nu=0}^{m-1} k_\nu p^\nu$, where 0 ≤ k_ν < p for all ν, set $J_k = \prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{k_\nu}$. Clearly J_k ∈ M^t whenever wt_p(k) ≥ t. Moreover, these elements are linearly independent over F, where F is any field of characteristic p. For suppose $\sum_{wt_p(k) \geq t} a_k J_k = 0$, where all a_k ∈ F. Choose j such that wt_p(j) is a minimum with a_j ≠ 0 and set $j = \sum_{\nu=0}^{m-1} j_\nu p^\nu$. Multiplying the linear relation by $\prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{p-1-j_\nu}$, bearing in mind that (X^g − 1)^p = 0 for any g, yields $a_j \prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{p-1} = 0$; since this product is, by Lemma 4.1, the non-zero all-one vector $\sum_{g \in G} X^g$, we conclude that a_j = 0.
We have thus proved the following
8 In fact, this basis first appeared in a paper by Lombardo-Radice, [38]. Lombardo-Radice goes over the same ground as Jennings did ([28]) but only for abelian groups; Jennings was aware of the work of Lombardo-Radice and extended that work to arbitrary p-groups.

Theorem 4.10 Let G be an elementary abelian p-group of order p^m and F a field of characteristic p. For any basis {g_0, . . . , g_{m−1}} of G, the p^m elements
$$\prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{e_\nu}$$
where 0 ≤ e_ν < p form a linear basis for R = F[G]. Moreover, the elements
$$\Big\{\, \prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{e_\nu} \;\Big|\; \sum_{\nu=0}^{m-1} e_\nu \geq t,\ 0 \leq e_\nu < p \,\Big\}$$
form a basis of M^t, where M is the radical of R.

Such a basis for F[G] was exploited by Jennings [28] and is called a Jennings basis of the group algebra. It is, as the construction shows, independent of the coefficient field of the modular algebra and simultaneously exhibits bases for all powers of the radical.
We note here that the index of nilpotency of the radical is 1 + m(p − 1); i.e.
$$M^{1+m(p-1)} = 0$$
but M^k ≠ 0 for any smaller k. Just as in the binary case, M^{m(p−1)} is the repetition code generated by $\prod_{\nu=0}^{m-1} (X^{g_\nu} - 1)^{p-1} = \sum_{g \in G} X^g$; it is the minimal ideal of R, which means that it is contained in every non-zero ideal of R, a fact that is easily seen using the Jennings basis.
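The Jennings basis lends itself to direct computation. The sketch below builds the group algebra F_p[G] for small (arbitrarily chosen) p and m, forms the elements $\prod_\nu (X^{g_\nu}-1)^{e_\nu}$, and checks their linear independence by row reduction mod p; all names are ours:

```python
from itertools import product

p, m = 3, 2                              # small example parameters (our choice)
G = list(product(range(p), repeat=m))    # the elementary abelian group Z_p^m
idx = {g: i for i, g in enumerate(G)}
ident = tuple([0] * m)

def mul(x, y):
    """Convolution product in the group algebra F_p[G]."""
    z = [0] * len(G)
    for g, a in enumerate(x):
        if a:
            for h, b in enumerate(y):
                if b:
                    k = tuple((G[g][i] + G[h][i]) % p for i in range(m))
                    z[idx[k]] = (z[idx[k]] + a * b) % p
    return z

def x_minus_1(nu):
    """The generator X^{g_nu} - 1, g_nu the nu-th standard basis vector of G."""
    v = [0] * len(G)
    v[idx[tuple(1 if i == nu else 0 for i in range(m))]] = 1
    v[idx[ident]] = (v[idx[ident]] - 1) % p
    return v

def J(ks):
    """Jennings element prod_nu (X^{g_nu} - 1)^{k_nu}."""
    out = [0] * len(G)
    out[idx[ident]] = 1                  # start from the identity X^0 = 1
    for nu, k in enumerate(ks):
        for _ in range(k):
            out = mul(out, x_minus_1(nu))
    return out

def rank_mod_p(rows):
    """Rank of a list of vectors over F_p by Gaussian elimination."""
    rows, r = [row[:] for row in rows], 0
    for c in range(len(G)):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] % p), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], p - 2, p)
        rows[r] = [a * inv % p for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

for t in range(m * (p - 1) + 1):
    Js = [J(ks) for ks in product(range(p), repeat=m) if sum(ks) >= t]
    assert rank_mod_p(Js) == len(Js)     # linear independence: a basis of M^t
```

For t = 0 this recovers a full basis of R, in line with Theorem 4.10.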

Corollary 4.11 The code M^t is a code of block length p^m, dimension |{k | 0 ≤ k < p^m, wt_p(k) ≤ m(p − 1) − t}| and minimum weight (b + 1)p^a, where t = a(p − 1) + b with 0 ≤ b < p − 1. As an extended cyclic code its defining set is {i | 0 ≤ i < p^m − 1, wt_p(i) < t}.

Proof: The dimension is |{k | 0 ≤ k < p^m, wt_p(k) ≥ t}|, of course, but taking the set of complements, (p^m − 1) − k, gives the above description — which is sometimes more useful. As for the minimum weight, the BCH bound implies that the minimum weight is at least as announced, since $k = \sum_{\nu=0}^{a-1} (p-1)p^\nu + bp^a = (b+1)p^a - 1$ is the smallest integer with wt_p(k) = t. On the other hand, $(X^{g_0} - 1)^b \prod_{\nu=1}^{a} (X^{g_\nu} - 1)^{p-1}$ yields a vector of the given weight since the product is the characteristic function of the a-dimensional subspace generated by {g_1, . . . , g_a}, and multiplying by (X^{g_0} − 1)^b merely takes the sum of b + 1 distinct, weighted translates. □
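The arithmetic fact used in the BCH-bound step — that $(b+1)p^a - 1$ is the smallest integer of p-weight t — can be checked directly; a small sketch (helpers are ours):

```python
def wt(k, p):
    """p-weight: digit sum of the p-ary expansion of k."""
    s = 0
    while k:
        s += k % p
        k //= p
    return s

for p in (2, 3, 5):
    for t in range(1, 3 * (p - 1) + 1):
        a, b = divmod(t, p - 1)       # t = a(p-1) + b with 0 <= b < p-1
        k = (b + 1) * p**a - 1        # = sum_{v<a} (p-1)p^v + b*p^a
        assert wt(k, p) == t
        assert all(wt(j, p) < t for j in range(k))
```

Every smaller integer has strictly smaller p-weight, which is exactly what makes k the first "root position" excluded from the defining set.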

Observe that the minimum-weight vectors we have exhibited have their supports lying in an (a+1)-dimensional subspace of G, namely the subspace generated by {g_0, g_1, . . . , g_a}. When t is divisible by p − 1 these minimum-weight vectors are characteristic functions of subspaces — just as in the binary case.

Corollary 4.12 If t = a(p − 1) then M^t contains the incidence vector of every a-flat of the affine geometry AG_m(F_p) and hence contains the code over F_p of the design of points and a-flats of AG_m(F_p).

Example 4.13 There is a simple formula, easily derived, for the dimension of M^{(m−1)(p−1)}, since it is the number of ways of selecting at most p − 1 objects — repetitions allowed — from a set of m objects. One has then (cf. Example 5.6) that
$$\dim\big(M^{(m-1)(p-1)}\big) = \binom{m+p-1}{m}$$
and that among the minimum-weight vectors one finds the characteristic functions of flats of codimension 1. As we shall see it is the code over F_p of this affine design.
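The count in Example 4.13 is easy to confirm against Theorem 4.10's description of the basis of M^t; a sketch, with all names ours:

```python
from math import comb
from itertools import product

def dim_Mt(p, m, t):
    """dim M^t = number of exponent tuples (e_0,...,e_{m-1}), 0 <= e_v < p,
    with sum >= t (Theorem 4.10)."""
    return sum(1 for e in product(range(p), repeat=m) if sum(e) >= t)

for p in (2, 3, 5):
    for m in (1, 2, 3):
        assert dim_Mt(p, m, (m - 1) * (p - 1)) == comb(m + p - 1, m)
```

The complementation e_ν ↦ p − 1 − e_ν turns the condition Σe_ν ≥ (m−1)(p−1) into Σe_ν ≤ p − 1, i.e. into the multiset count the example describes.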

4.5 Orthogonals and annihilators
We have already seen that for the Reed-Muller codes

R(r, m)⊥ = R(m − r − 1, m)

for 0 ≤ r < m, or — in the current language — that the Reed-Muller code (M^{m−r})^⊥ is precisely M^{r+1}. Moreover, the same equality is true if we replace the orthogonal by the annihilator in the group algebra.
More precisely, if S is any subset of R we set

Ann(S) = {x ∈ R | xs = 0 for all s ∈ S}.

Then, since M^{m(p−1)+1} = {0}, Ann(M^t) ⊇ M^{m(p−1)+1−t} and a dimension argument yields the equality. We explain the entire matter using the canonical automorphism
$$x = \sum_{g \in G} x_g X^g \mapsto \sum_{g \in G} x_g X^{-g} = \sum_{g \in G} x_{-g} X^g = \bar{x}$$

introduced in Section 4.2. In order to eliminate visual confusion we will use the following notation for the usual inner product:
$$[x, y] = \Big[\sum_{g} x_g X^g, \sum_{g} y_g X^g\Big] = \sum_{g} x_g y_g.$$

Of course, for a code C ⊆ R,

C ⊥ = {x ∈ R | [x, c] = 0 for all c ∈ C}.

We have immediately the following "adjoint" relationship:
$$[\bar{x}, \bar{y}] = [x, y].$$
This shows that $\overline{S^\perp} = \bar{S}^\perp$, where we have set, for S ⊆ R, $\bar{S} = \{\bar{s} \mid s \in S\}$. Now, since xy = 0 if and only if $\sum_{h \in G} x_{k-h} y_h = \sum_{h \in G} x_{k+h} y_{-h} = 0$ for all k ∈ G, xy = 0 if and only if $[X^k x, \bar{y}] = 0$ for all k ∈ G, and we get the following result — which is the analogue of Theorem 5.23 of Chapter 1 giving the orthogonals to cyclic codes and, moreover, admits a generalization to a more general case: Proposition 1.2 of Chapter (Ward).
Proposition 4.14 For an ideal I of R,
$$\mathrm{Ann}(I) = \overline{I^\perp} = \bar{I}^\perp.$$

The ideals we are concerned with are invariant under all the isometric au-
tomorphisms and, in particular, under the canonical automorphism. Hence
we have
Corollary 4.15 If an ideal I is invariant under the canonical automorphism, i.e. if Ī = I, then Ann(I) = I^⊥. In particular, in the group algebra R we have that
$$\mathrm{Ann}(M^t) = (M^t)^\perp = M^{m(p-1)+1-t}.$$

Example 4.16 Taking t = m(p − 1), we have that M is the orthogonal of the one-dimensional repetition code spanned by the all-one vector $\sum_{g \in G} X^g$ and, in particular, is of codimension 1 in R, a fact that has emerged in various ways during our discussion of the group-algebra approach.

Remark: Observe that even if an ideal is not invariant under the canonical
automorphism, the proposition shows that its annihilator and its orthogonal
are equivalent codes.

4.6 The codes of the designs from AGm (Fp )
We next prove that M r(p−1) is the code, when p is prime, of the design
of points and r-flats of the affine geometry AGm (Fp ), generalizing what we
already know when the prime is 2 (the Reed-Muller case). This more general
result follows easily from a corollary of a theorem of Delsarte [19] which
characterizes the subspaces left invariant by AGLm (Fq ) acting naturally
on the group ring Fq [G] where G is the additive group of the field Fqm .
The corollary characterizes the codes in the group ring R invariant under
AGLm (Fp ) acting naturally on R as precisely the powers of M . A recent
elegant proof of this result by Weidner [54], using the Jennings basis, will
be sketched now, but we will return to this matter in the next section.

Theorem 4.17 Let R be the group ring over F_p of an elementary abelian p-group, G, of order p^m with AGL(G) = AGL_m(F_p) — the automorphism group of the group G — acting naturally on R. Let M be the radical of R. Then the only subspaces of R invariant under AGL_m(F_p) are the powers of M. In group-theoretical terms R, viewed as a module over AGL_m(F_p), is uniserial⁹ and M^t/M^{t+1} is an irreducible GL_m(F_p)-module for 0 ≤ t ≤ m(p − 1).

Proof: We know, of course, that the powers of M are invariant under AGL_m(F_p). Any subspace invariant under AGL_m(F_p) is necessarily an ideal of R and, since M^{m(p−1)+1} = 0, given any ideal I of R there is a smallest t with M^t ⊆ I and M^{t−1} ⊄ I — unless, of course, I = R in which case we have our assertion since, by convention, R = M^0. If I ≠ M^t, then there is an x ∈ I which is not in M^t and because I is an ideal multiplying by a suitable element of R insures that x ∈ M^{t−1} ∩ I but x ∉ M^t. But then M^{t−1} ∩ I would be a proper AGL_m(F_p)-submodule of M^{t−1} strictly containing M^t, an impossibility whenever M^{t−1}/M^t is irreducible. Thus we are reduced to showing that M^t/M^{t+1} is irreducible as an AGL_m(F_p)-module for all t. In fact we will show that it is an irreducible GL_m(F_p)-module for all t. (A slight change at the end of the argument would, in fact, show that it is an irreducible SL_m(F_p)-module but we do not need this generality for the purpose at issue.)
Of course, the action of GLm (Fp ) on R is given once a basis of the
elementary abelian group G is given and then that action is given by the
9 A module is "uniserial" if it has only one composition series and "irreducible" if it has no proper non-zero submodules.

realization of the group via non-singular m × m matrices over F_p. We slightly change our notation, letting the basis of the elementary abelian group be g_1, g_2, . . . , g_m. Then, for an element σ ∈ GL_m(F_p) represented by the matrix (a_{ij}), we have
$$\sigma g_i = \sum_{j=1}^{m} a_{ij} g_j$$

and hence, by definition,
$$\sigma X^{g_i} = X^{\sigma g_i} = X^{\sum_{j=1}^{m} a_{ij} g_j}.$$

Now set x_i = X^{g_i} − 1. A basis for M/M² is given by the images of x_1, x_2, . . . , x_m and a basis for the quotient M^t/M^{t+1} is given by the images of elements of the form $\prod x_i^{k_i}$ where 0 ≤ k_i ≤ p − 1 and $\sum k_i = t$. These elements are part of the Jennings basis of R given by our choice of the basis of G. Moreover, the image of
$$\Big\{\, \prod_{i=1}^{m} x_i^{k_i} \;\Big|\; 0 \leq k_i \leq p-1,\ \sum_i k_i = t \,\Big\}$$

in M^t/M^{t+1} is a basis (over F_p) of that GL_m(F_p)-module. We will systematically throughout the proof work with these elements and ignore any elements of M^{t+1} that arise during calculations; this is tantamount to working in the quotient space. As an example of this caveat we note — since
$$X^{g+h} - 1 = (X^g - 1) + (X^h - 1) + (X^g - 1)(X^h - 1)$$

and since we are working over a prime field — that, modulo M²,
$$\sigma(x_i) = \sigma(X^{g_i} - 1) = X^{\sum_j a_{ij} g_j} - 1 \equiv \sum_{j=1}^{m} a_{ij} x_j$$

when σ is given by the matrix (a_{ij}). Of course, since the σ are algebra homomorphisms, we have that
$$\sigma\Big(\prod_{i=1}^{m} x_i^{k_i}\Big) = \prod_{i=1}^{m} (\sigma x_i)^{k_i}.$$

It is well-known — and easy to prove — that a p-Sylow subgroup of GL_m(F_p) is given by the lower triangular matrices with 1's on the diagonal and that any non-zero S-module (when the field is F_p) contains a non-zero

element left fixed by all elements of S.10 Letting S be this p-Sylow subgroup
we investigate the action of S on M t /M t+1 , which we also denote by V t .
One first shows by induction on m that every non-zero S-submodule, W, of V^t contains the image of the element $w_t = x_1^{p-1} \cdots x_a^{p-1} x_{a+1}^{b}$ where t = a(p − 1) + b with 0 ≤ b < p − 1. For m = 1 this is obvious since, in this case, modulo M^{t+1}, M^t is generated over F_p by x_1^t. Let m > 1 and let w be an element of M^t not in M^{t+1} that represents an element of the submodule W fixed by all elements of S; write
$$w = \sum_{i=0}^{p-1} w_i x_m^i$$

where w_i ∈ M^{t−i} ∩ ⟨x_1, . . . , x_{m−1}⟩. Set k = max{i | w_i ≠ 0}. For k = 0 the induction on m gives the result. Suppose k ≠ 0. For 1 ≤ i < m define σ_i ∈ S by:
$$\sigma_i(x_j) = \begin{cases} x_j & \text{if } j \neq m; \\ x_m + x_i & \text{if } j = m. \end{cases}$$
Then
$$\sigma_i w = \sum_{j=0}^{k} w_j (x_m + x_i)^j = \sum_{j=0}^{k} v_j x_m^j$$

for some v_j ∈ M^{t−j} ∩ ⟨x_1, . . . , x_{m−1}⟩ with v_{k−1} = w_{k−1} + k x_i w_k. On the other hand σ_i w = w and hence v_{k−1} = w_{k−1}, yielding k x_i w_k = 0. But k is a positive integer less than p and hence non-zero in F_p. So we have that x_i w_k = 0 for 1 ≤ i < m and it follows that w_k is a scalar multiple of $\prod_{i=1}^{m-1} x_i^{p-1}$, which entails a = m − 1 and b = k. But then w_i, for i < k, is in M^{(m−1)(p−1)+k−i} ∩ ⟨x_1, . . . , x_{m−1}⟩ = 0 and w is a scalar multiple of the sought w_t. Thus, we have the assertion.
Next, setting t′ = m(p − 1) − t, consider the bilinear map
$$\phi : V^t \times V^{t'} \to V^{m(p-1)} \approx F_p$$
given by φ(x, y) = xy. It is invariant under GL_m(F_p), i.e. φ(σx, σy) = φ(x, y) for all σ. Moreover, the form is non-degenerate. Thus there is a v_t ∈ V^t with φ(v_t, w_{t′}) = 1. It follows that
$$v_t = x_{m-a}^{b} \prod_{i=m+1-a}^{m} x_i^{p-1}.$$
10 In other words, over F_p a p-group has only the trivial irreducible representation.

We claim that v_t generates V^t as an S-module: let W be the S-submodule generated by v_t and set
$$W^\perp = \{\, y \in V^{t'} \mid \phi(w, y) = 0 \text{ for all } w \in W \,\}.$$

W^⊥ is also an S-submodule because of the invariance and, because w_{t′} is not in W^⊥, W^⊥ = 0 — which gives that W = V^t. Finally, consider the element of GL_m(F_p) that sends x_i to x_{m+1−i} for 1 ≤ i ≤ m. It sends w_t to v_t, which shows that any non-trivial GL_m(F_p)-submodule of V^t must, in fact, be V^t. Thus these modules are irreducible. □
Note: Another proof — somewhat more robust since it has something to
say about non-prime fields — of this result due to Mortimer [44] will be
given in Section 5.5.
Now the code generated by the r-flats of the affine geometry AGm (Fp )
is clearly invariant under the group AGLm (Fp ) and is contained in M r(p−1)
but not in M r(p−1)+1 . This yields

Theorem 4.18 For any prime p, the code of the design of points and r-flats
of the affine geometry AGm (Fp ) is M r(p−1) .

Since the dimension of M^{(m−1)(p−1)} is easy to compute we have a proof of the following important fact:

Corollary 4.19 The dimension of the code over F_p of the design of points and (m − 1)-flats of AG_m(F_p) is
$$\binom{m+p-1}{m}.$$

It is equally easy to compute the dimension of M^{p−1}, which gives us the following

Corollary 4.20 The dimension of the code over F_p of the design of points and lines of AG_m(F_p) is
$$p^m - \binom{m+p-2}{m}.$$
5 GENERALIZED REED-MULLER CODES 42

Proof: The proof consists of observing that, by complementation, one need only count the number of ways of choosing at most p − 2 objects from m objects — repetitions allowed. □
Remark: In both cases the formulas are simple binomial coefficients since
one does not have to worry about the constraint p − 1 on the number of
repetitions allowed.
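Corollary 4.20 can be confirmed for small parameters by counting p-weights directly, per Corollary 4.11; a quick sketch (helper names are ours):

```python
from math import comb

def wt(k, p):
    """p-weight: digit sum of the p-ary expansion of k."""
    s = 0
    while k:
        s += k % p
        k //= p
    return s

# dim M^{p-1} = |{k : 0 <= k < p^m, wt_p(k) >= p-1}|, which should equal
# p^m - C(m+p-2, m) by the complementation argument above.
for p in (2, 3, 5):
    for m in (1, 2, 3):
        d = sum(1 for k in range(p**m) if wt(k, p) >= p - 1)
        assert d == p**m - comb(m + p - 2, m)
```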
Although we now know the dimensions and minimum weights of the
codes coming from AGm (Fp ) we have not yet determined the nature of
the minimum-weight vectors nor have we discussed the codes coming from
P Gm (Fp ). We postpone that discussion until Section 5.7 where all the rel-
evant facts are established. See, for example, Theorem 5.42 and Theo-
rem 5.44.

5 Generalized Reed-Muller codes

5.1 Introduction
Our description of generalized Reed-Muller codes is based, primarily, on
the now classic paper of Delsarte, Goethals and MacWilliams [18]. We
have, however, reworked the material in several important respects and have
introduced a different notation. The definitions are based on the polyno-
mial codes introduced by Kasami, Lin and Peterson [30, 31]; these authors
introduced the primitive generalized Reed-Muller codes and Weldon [55]
introduced the non-primitive generalized Reed-Muller codes and the single-
variable approach using the Mattson-Solomon polynomial. Our treatment
of that polynomial appears to be new in that we view it in a quotient ring
that is slightly different from the one traditionally used.
Were it not for the complication introduced by moving from a prime
field to Fq , where q is a proper prime power, much of the material in this
section could be avoided. The reader interested only in the prime case will,
however, need to read Section 5.7 and the previous material necessary for
its understanding.

5.2 Definitions
First we describe the so-called m-variable approach. This is entirely analogous to our approach to the Reed-Muller codes (which are, simply, the generalized Reed-Muller codes for q = 2) and the generalization is straightforward (the functions involved being F_q-valued — rather than boolean — and having F_q-valued variables).
Let q = pt , where p is a prime. Set E = Fq and let V be a vector space
of dimension m over E. Again we will denote a general vector in V by v,
and we will take V to be the space E m of m-tuples, with standard basis
e1 , . . . , em , where
e_i = (0, 0, . . . , 1, 0, . . . , 0), with the 1 in the ith position.
Our codes will be q-ary codes, i.e. codes over E, and the ambient space will
be the function space E V , with the usual basis of characteristic functions of
the vectors of V . As in Section 3, we can denote the members f ∈ E V by
functions of the m-variables denoting the coordinates of a variable vector in
V , i.e. if
x = (x1 , x2 , . . . , xm ) ∈ V,
then f ∈ E V is given by
f = f (x1 , x2 , . . . , xm )
and the xi take values in E. Since every element in E satisfies aq = a, the
polynomial functions in the m variables can be reduced modulo xqi − xi (as
was done in Section 3 for q = 2) and we can again form the set M of q m
monomial functions
$$\mathcal{M} = \{\, x_1^{i_1} x_2^{i_2} \cdots x_m^{i_m} \mid 0 \leq i_k \leq q-1,\ k = 1, 2, \ldots, m \,\}. \qquad (1)$$
For a monomial in M the degree ρ is the total degree, i.e. $\rho = \sum_{k=1}^{m} i_k$, and we have that 0 ≤ ρ ≤ m(q − 1). We will shortly show that M forms another
basis (that we will not use for the codes) of E V — as was done for q = 2;
we do this in an entirely analogous way by expressing each characteristic
function of a vector as a polynomial — in fact, a linear combination, with
coefficients in Fq , of members of M:
Lemma 5.1 For w = (w_1, w_2, . . . , w_m) ∈ V,
$$v^w = \prod_{i=1}^{m} \left(1 - (x_i - w_i)^{q-1}\right).$$

Proof: Since a^{q−1} = 1 for any non-zero a ∈ E, 1 − (x_i − w_i)^{q−1} = 0 whenever x_i ≠ w_i; thus the polynomial function on the right is clearly the same as the characteristic function on the left. □
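Lemma 5.1 can be tested directly when q is prime (we use integer arithmetic mod q as a stand-in for E = F_q; a proper prime power q would need genuine field arithmetic). A sketch, with our own names:

```python
from itertools import product

q, m = 5, 2     # a small prime q (our simplification) and m variables
V = list(product(range(q), repeat=m))

def char_poly(w, x):
    """Evaluate prod_i (1 - (x_i - w_i)^{q-1}) mod q at the point x."""
    out = 1
    for xi, wi in zip(x, w):
        out = out * (1 - pow(xi - wi, q - 1, q)) % q
    return out

# the polynomial agrees with the characteristic function of w everywhere
for w in V:
    for x in V:
        assert char_poly(w, x) == (1 if x == w else 0)
```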

Example 5.2 The polynomial 1 − (x_i − a)^{q−1} is the characteristic function of the (m − 1)-flat in E^m given by the equation X_i = a. This polynomial is not linear unless q = 2.

Since E^V has dimension q^m, M, being of cardinality q^m, is another basis for E^V — by the above lemma. The space E^V can then be viewed as the space of all polynomials (reduced modulo x_i^q − x_i) in the m variables; i.e. all linear combinations with coefficients in E of the monomials in M. We use this interpretation to define the generalized Reed-Muller codes:

Definition 5.3 Let E = F_q, where q = p^t and p is a prime, and set V = E^m. Then for any ρ such that 0 ≤ ρ ≤ m(q−1), the ρth order generalized Reed-Muller code R_E(ρ, m) over E is the subspace of E^V (with basis the characteristic functions of the vectors in V) of all reduced m-variable polynomial functions of degree at most ρ. Thus
$$R_E(\rho, m) = R_{F_q}(\rho, m) = \Big\langle\, x_1^{i_1} x_2^{i_2} \cdots x_m^{i_m} \in \mathcal{M} \;\Big|\; \sum_{k=1}^{m} i_k \leq \rho \,\Big\rangle.$$

Example 5.4 (1) For q = 2 and 0 ≤ ρ ≤ m, R_{F_2}(ρ, m) = R(ρ, m).

(2) For any q, R_{F_q}(0, m) is the one-dimensional span of the constant function 1, and $R_{F_q}(m(q-1), m) = F_q^{q^m}$, the entire ambient space.

The dimension of a generalized Reed-Muller code can be obtained by simply counting the number of elements in its obvious monomial basis:

Theorem 5.5 For any ρ such that 0 ≤ ρ ≤ m(q − 1),
$$\dim(R_{F_q}(\rho, m)) = \sum_{i=0}^{\rho} \sum_{k=0}^{m} (-1)^k \binom{m}{k} \binom{i - kq + m - 1}{i - kq} = \sum_{k=0}^{m} (-1)^k \binom{m}{k} \binom{m + \rho - kq}{\rho - kq}.$$

Proof: We use the fact that the number of ways of picking j objects from a set of m objects — with repetitions allowed — is $\binom{j+m-1}{m-1} = \binom{j+m-1}{j}$. An inclusion-exclusion argument shows that the inner sum is the number of ways of picking i objects from a set of m objects, when no object can be chosen more than q − 1 times. Summing on i yields the result. The simplification to a single sum is due to Calkin. □
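Theorem 5.5's two formulas can be checked against a direct count of the monomial basis. In the sketch below (names are ours) the guard `C` implements the usual convention that out-of-range binomial coefficients vanish:

```python
from math import comb
from itertools import product

def C(n, k):
    """Binomial coefficient with C(n, k) = 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def dim_grm(q, m, rho):
    """Single-sum dimension formula of Theorem 5.5."""
    return sum((-1) ** k * comb(m, k) * C(m + rho - k * q, rho - k * q)
               for k in range(m + 1))

for q in (2, 3, 4, 5):
    for m in (1, 2, 3):
        for rho in range(m * (q - 1) + 1):
            # direct count of monomials with 0 <= i_k <= q-1, total degree <= rho
            direct = sum(1 for i in product(range(q), repeat=m) if sum(i) <= rho)
            assert dim_grm(q, m, rho) == direct
```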

Example 5.6 (1) By a direct count we have dim(R_{F_q}(1, m)) = 1 + m, just as in the binary case — but note that the function x_i is zero on the hyperplane X_i = 0 but takes on many different values off the hyperplane unless q = 2. Since every non-constant linear polynomial in the m variables has q^{m−1} zeros, these codes have minimum weight q^m − q^{m−1} = q^{m−1}(q − 1) and all code vectors except the zero vector and non-zero multiples of 1 have this weight.

(2) By a direct count we have, for 0 ≤ ρ ≤ q − 1, dim(R_{F_q}(ρ, 1)) = 1 + ρ, and, since a polynomial in one variable of degree at most ρ can have at most ρ distinct roots, the minimum weight in this code is q − ρ, there being a polynomial of degree ρ with exactly ρ distinct roots since ρ < q. These codes are of genus zero in the sense of Tsfasman¹¹ and Vlăduţ, [53].

(3) More generally, if ρ ≤ q − 1 no choice of i_1, i_2, . . . , i_m with $\sum i_k = \rho$ will ever have an i_k > q − 1 and inclusion-exclusion is unnecessary in the proof. Moreover, by introducing a "dummy" object the number of ways of choosing at most ρ things — repetitions allowed — from a set of m objects is easily seen to be $\binom{\rho+m}{m}$ and hence one sees directly that
$$\dim(R_{F_q}(\rho, m)) = \binom{\rho + m}{m}, \quad \text{for } 0 \leq \rho \leq q-1.$$

Just as in the binary case, the codes orthogonal to generalized Reed-Muller codes are again generalized Reed-Muller codes. Again the proof is entirely analogous to the binary case, but we need a lemma which generalizes the result that R(m − 1, m) consists of even-weight vectors. The required result is an easy consequence of the orthogonality relations; thus the proof depends simply on the fact that $\sum_{i=0}^{n-1} \omega^i = 0$ whenever ω is an nth root of unity different from 1.

Lemma 5.7 If f ∈ E^V has degree ρ < m(q − 1) as a polynomial in x_1, x_2, . . . , x_m then
$$\sum_{w \in V} f(w) = 0.$$

In fact, if f is any linear combination of the elements of M with coefficients in any overfield of E, the same result holds.
11 These codes are usually referred to as MDS codes, or sometimes optimal codes, in the coding literature; we have chosen to adopt the new terminology introduced in [53].

Proof: The result is clearly true for any constant function (since the block
length of the code is a multiple of, in fact a power of, p, the characteristic
of Fq ) so we need to prove the assertion only for monomial functions, i.e.
elements of M, of positive degree less than m(q − 1). Moreover, if in such
a monomial any ik = 0, the sum is again a multiple of q and hence 0. We
thus restrict ourselves to those monomials in which every x_i appears. The orthogonality relations for the group E^× × · · · × E^×, using E itself as the field where the characters take their values, yield immediately — taking η as the principal character and χ the character sending a = (a_1, . . . , a_m) to $a_1^{i_1} \cdots a_m^{i_m}$ — that
$$\sum_{a} a_1^{i_1} \cdots a_m^{i_m} = 0$$
since there is some k for which i_k < q − 1, and since the sum need only be taken over those vectors all of whose entries are non-zero. □

Theorem 5.8 For ρ < m(q − 1)

RFq (ρ, m)⊥ = RFq (m(q − 1) − 1 − ρ, m).

Proof: If f has degree at most ρ and g has degree at most m(q − 1) − 1 − ρ, then the product fg has degree less than m(q − 1). Thus Lemma 5.7 implies that
$$\sum_{w \in V} f(w)\, g(w) = 0$$

and the corresponding codewords are orthogonal. Hence

RFq (ρ, m)⊥ ⊇ RFq (m(q − 1) − 1 − ρ, m)

and now we need only check the dimensions: the involution of M that sends $x_1^{i_1} \cdots x_m^{i_m}$ to $x_1^{q-1-i_1} \cdots x_m^{q-1-i_m}$ yields the fact that the number of monomials of degree greater than m(q − 1) − 1 − ρ is equal to the number of degree less than or equal to ρ and hence dim(R_{F_q}(m(q − 1) − 1 − ρ, m)) = q^m − dim(R_{F_q}(ρ, m)), as required. □
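Theorem 5.8's containment can be verified numerically for a small prime q (again using arithmetic mod q as a stand-in for F_q; names and parameter choices are ours). The sketch checks that evaluation vectors of low-degree monomials are pairwise orthogonal:

```python
from itertools import product

q, m = 3, 2        # a small prime q so that mod-q arithmetic models F_q
V = list(product(range(q), repeat=m))

def ev(exp):
    """Evaluation vector over V of the monomial x_1^{e_1}...x_m^{e_m} (0^0 = 1)."""
    vec = []
    for x in V:
        val = 1
        for xi, e in zip(x, exp):
            val = val * pow(xi, e, q) % q
        vec.append(val)
    return vec

nmax = m * (q - 1)
for rho in range(nmax):
    A = [ev(e) for e in product(range(q), repeat=m) if sum(e) <= rho]
    B = [ev(e) for e in product(range(q), repeat=m) if sum(e) <= nmax - 1 - rho]
    # every codeword of R(rho, m) is orthogonal to every codeword of the
    # claimed orthogonal R(m(q-1)-1-rho, m)
    for a in A:
        for b in B:
            assert sum(s * t for s, t in zip(a, b)) % q == 0
```

Together with the dimension count of Theorem 5.5, this containment pins down the orthogonal exactly.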
The generalized Reed-Muller codes are codes over possibly non-prime fields and thus could not be the codes of designs coming from affine geometries — unless q happens to be a prime. They do contain the incidence vectors of flats in the geometry, as we will shortly see. In order to demonstrate this it is convenient, notationally, to make an observation — important in itself — about the automorphism group of R_{F_q}(ρ, m):
Theorem 5.9 For 0 ≤ ρ ≤ m(q − 1), the automorphism group of RFq(ρ, m) contains the affine group AGLm(Fq) in its natural action on V = F_q^m.
Proof: Recall that for any code C ⊆ F^P, an automorphism of C is a permutation σ of P that preserves C, i.e. for which, if c ∈ C, then c^σ ∈ C, where c^σ is defined by c^σ(P) = c(σ(P)) for P ∈ P.
Now γ ∈ AGLm(Fq) is given by

    γ : v ↦ Av + a,

where v, a ∈ V = E^m — viewed as column vectors — and A is a non-singular m × m matrix over E. Thus, for f ∈ RFq(ρ, m), f^γ is defined by

    f^γ(x) = f(Ax + a),

and so, clearly, f^γ ∈ RFq(ρ, m). □
Note: Berger and Charpin [5] have shown that AGLm(Fq) is the full group of permutation automorphisms of these generalized Reed-Muller codes — when, of course, 0 < ρ < m(q − 1). See Chapter (Huffman) for the details.

Theorem 5.10 For 0 ≤ r ≤ m and ρ ≥ r(q − 1), the generalized Reed-


Muller code RFq (ρ, m) contains the incidence vector of any (m − r)-flat of
AGm (Fq ).

Proof: Any (m − r)-flat in AGm(Fq) consists of all the points x of V satisfying r independent equations

    ∑_{j=1}^{m} a_{ij} X_j = w_i,  for i = 1, 2, . . . , r,

where all a_{ij} and w_i are in Fq. If the code RFq(ρ, m) contains the incidence vector of some t-flat, then it will contain the incidence vector of every t-flat, since the affine group AGLm(Fq) acts transitively on t-flats and, as we have just seen, preserves the code. So we need only construct one (m − r)-flat that is in RFq(ρ, m).
Consider the polynomial

    p(x_1, . . . , x_m) = ∏_{i=1}^{r} (1 − x_i^{q−1}),
of degree r(q − 1). Then p(x_1, . . . , x_m) = 0 on V unless x_i = 0 for i = 1, 2, . . . , r. Thus the codeword corresponding to p(x) has the entry 1 at points on the (m − r)-flat defined by the r equations

    X_1 = 0, X_2 = 0, . . . , X_r = 0

and the entry 0 at points off the flat. Hence it is the incidence vector of this (m − r)-flat. Since p(x_1, . . . , x_m) ∈ RFq(ρ, m) for ρ ≥ r(q − 1), we have the result. □
As in the binary case the proof shows more, namely that the generalized
Reed-Muller code contains all (m − s)-flats for 0 ≤ s ≤ r. Moreover, the
subcode generated by the (m−r)-flats contains, by the same induction argu-
ment used in the binary case, all (m − s)-flats for 0 ≤ s ≤ r. Note, however,
that when using characteristic functions of t-flats to obtain characteristic
functions of (t + 1)-flats one could use coefficients other than 1 provided
q > 2 and hence obtain vectors that are supported on the (t + 1)-flat but
are not characteristic functions.
Example 5.11 Take q = 3 and m = 2. The geometry is then AG2(F3), the affine plane of order 3. Let C = C3(AG2(F3)) be the code over F3 associated with this plane, i.e. the code generated by the incidence matrix of the plane. The incidence vectors of the lines (1-flats) will be in RF3(ρ, 2) for ρ = 2, 3 and 4. In fact C = RF3(2, 2), while, as we know, RF3(3, 2) = (F3)^⊥ and RF3(4, 2) = F_3^9, the entire ambient space.
This example is indicative of what happens in the case of planes over prime fields. A rather easy argument using elementary divisors (see [2, Chapter 6]) shows that the dimension of the code of any affine plane of prime order p is p(p + 1)/2, and since the computation of the dimension of RFp(p − 1, 2) is also easy (see Example 5.6 above) and also yields p(p + 1)/2, we have a proof, which avoids the use of Delsarte's theorem, of the following
Proposition 5.12 The code over Fp of the desarguesian affine plane of prime order p is the generalized Reed-Muller code RFp(p − 1, 2).

Since it is easy to see ([2, Corollary 6.4.1]) that the code over Fp of any affine plane of order p has as minimum-weight vectors only the scalar multiples of the characteristic functions of lines of the plane, the above proposition yields an elementary proof of the following
Corollary 5.13 The generalized Reed-Muller code RFp(p − 1, 2) is a

    [p^2, p(p + 1)/2, p]

code over Fp all of whose minimum-weight vectors are scalar multiples of characteristic functions of 1-flats of F_p^2.

In terms of the modular algebra R = Fp[G], where G is the elementary abelian p-group of order p^2, the result above is expressed as

    M^{p−1} = RFp(p − 1, 2) = Cp(AG2(Fp)).

These equalities should, strictly speaking, be isomorphisms, but if we take the point of view that the set on which the various functions involved are defined is a fixed copy of F_p^2, we actually can assert equality. We shall see later on (Theorem 5.19) that, more generally, in the prime case M^{m(p−1)−ρ} = RFp(ρ, m), which is a generalization of Berman's theorem.
Just as for the Reed-Muller codes, we can remove a coordinate position to obtain a code of length q^m − 1, which turns out to be cyclic:

Definition 5.14 The ρth order punctured generalized Reed-Muller code, where 0 ≤ ρ < m(q − 1), denoted by RFq(ρ, m)*, is the code of length q^m − 1 obtained by deleting the coordinate position 0 from RFq(ρ, m).

For q = 2, RF2(ρ, m)* = R(ρ, m)*, the punctured Reed-Muller code. These are also called shortened generalized Reed-Muller codes (see van Lint [37]) or cyclic generalized Reed-Muller codes (see Blake and Mullin [8]), since our next result will show they are cyclic (which we already know for q = 2). Observe also that any coordinate position can be deleted in place of 0, since AGLm(Fq) acts transitively on the vectors of V = E^m.
Theorem 5.15 For any ρ such that 0 ≤ ρ < m(q − 1), the automorphism group of RFq(ρ, m)* contains the general linear group GLm(Fq). In particular, RFq(ρ, m)* is a cyclic code.

Proof: The group GLm(Fq) is the stabilizer of 0 in AGLm(Fq), so it obviously acts on RFq(ρ, m)*. We can obtain a cyclic group of order q^m − 1 acting on it in the usual way: consider the field K = F_{q^m} and let ω be a primitive element in K; now K, as a vector space over Fq = E, is isomorphic to V, and multiplication by ω simply cycles the elements of K^× = V − {0}. (This map also yields a Singer cycle on the projective points — as discussed earlier in Section 2.) □
Corollary 5.16 Provided that ρ < m(q − 1), the generalized Reed-Muller codes RFq(ρ, m) are extended cyclic codes and of the same dimension as the corresponding cyclic codes RFq(ρ, m)*.

Proof: By Lemma 5.7, f(0) = −∑_{w≠0} f(w) provided that the degree of f is less than m(q − 1). □
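The relation in this proof is easy to confirm numerically. A sketch (my own, for q = 3, m = 2) checks f(0) = −∑_{w≠0} f(w) for every monomial of degree less than m(q − 1), which is why puncturing at 0 loses no dimension:

```python
from itertools import product

q, m = 3, 2
pts = list(product(range(q), repeat=m))  # the point 0 = (0, 0) comes first

deficient = []  # monomials of degree < m(q-1) whose value at 0 is NOT forced
for e in product(range(q), repeat=m):
    if sum(e) < m * (q - 1):
        vals = [pow(a[0], e[0], q) * pow(a[1], e[1], q) % q for a in pts]
        if vals[0] != (-sum(vals[1:])) % q:
            deficient.append(e)
# f(0) = -sum_{w != 0} f(w) holds for every such monomial, hence for every such f
assert deficient == []
```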
5.3 The single-variable approach

We introduce now the single-variable approach to the generalized Reed-Muller codes, utilizing the Mattson-Solomon polynomial [42].
Taking ω to be a primitive element of K = F_{q^m} and, using the same notation as above, set v = q^m − 1 and consider the vector space of polynomials in Z with coefficients in K and of degree less than v. Then Lagrange interpolation (see, for example, [48]) shows that any function from K^× to K is given uniquely by such a polynomial, viewed as a polynomial function in the single variable Z. In terms of the characteristic functions of the points ω^i of V* = V − {0} ≈ K^×, where V = F_q^m ≈ K, such a polynomial function
can be written as

    P(Z) = ∑_{i=0}^{v−1} P(ω^i) g_i(Z),    (2)

where g_i(Z) denotes the characteristic function of {ω^i}, i.e.

    g_i(Z) = −ω^i (Z^v − 1)/(Z − ω^i).    (3)
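Equation (3) can be verified directly in a small field. A sketch (my own, in F_9 represented as F_3[w]/(w^2 + w − 1); since Z^v − 1 = ∏_k (Z − ω^k), the quotient in (3) is the product over k ≠ i):

```python
q, v = 3, 8

def mul(a, b):
    """F_9 = F_3[w]/(w^2 + w - 1); elements are pairs (a0, a1) = a0 + a1*w."""
    c2 = a[1] * b[1]
    return ((a[0] * b[0] + c2) % q, (a[0] * b[1] + a[1] * b[0] - c2) % q)

pw = [(1, 0)]
for _ in range(v - 1):
    pw.append(mul(pw[-1], (0, 1)))  # pw[i] = w^i

def g(i, z):
    """g_i(z) = -w^i (z^v - 1)/(z - w^i), the quotient expanded as prod_{k != i} (z - w^k)."""
    val = (-pw[i][0] % q, -pw[i][1] % q)
    for k in range(v):
        if k != i:
            val = mul(val, ((z[0] - pw[k][0]) % q, (z[1] - pw[k][1]) % q))
    return val

# g_i is indeed the characteristic function of {w^i} on the non-zero elements
assert all(g(i, pw[j]) == ((1, 0) if i == j else (0, 0))
           for i in range(v) for j in range(v))
```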
Clearly the polynomials Z^i, for i = 0, 1, . . . , v − 1, form an alternative basis, and the correspondence is given as follows: if

    P(Z) = ∑_{j=0}^{v−1} c_j Z^j,  where c_j ∈ K,

then, using the discrete Fourier transform and noting that 1/v is −1 when viewed in K,

    c_j = −∑_{i=0}^{v−1} P(ω^i) ω^{−ji} = −φ_{v−j}(P),    (4)

where φ_s is the function defined in Section 4.3. Then

    P(Z) = ∑_{j=0}^{v−1} c_j Z^j = −∑_{j=0}^{v−1} φ_{v−j}(P) Z^j
is the Mattson-Solomon polynomial of the function P.
We now restrict to those functions taking values in E = Fq ⊆ K, i.e. we require that P(ω^i) ∈ E for all i. This is equivalent, from Equation (4), to requiring that (c_j)^q = c_{qj}, where the subscripts are taken modulo v = q^m − 1. These functions form an E-subspace of K^V. Denoting this subspace by L and writing its vectors in terms of the basis of characteristic functions, we have

    L = { (P(1), . . . , P(ω^{v−1})) | P(Z) = ∑_{j=0}^{v−1} c_j Z^j, c_j ∈ K, (c_j)^q = c_{qj} }.
The vector space L over the field E corresponds with the vector-space structure of the polynomial ring E[Y]/(Y^v − 1) via

    (P(1), . . . , P(ω^{v−1})) ↦ ∑_{i=0}^{v−1} P(ω^i) Y^i.

In fact a polynomial f(Y) = ∑_{i=0}^{v−1} a_i Y^i corresponds to the function P(Z) defined by P(Z) = ∑_{j=0}^{v−1} c_j Z^j, where c_j = −f(ω^{−j}). If the polynomial g(Y) divides Y^v − 1, then the cyclic code generated by g(Y) contains f(Y) if and only if f(ω^{−j}) = 0 for all zeros ω^{−j} of g(Y). The corresponding P(Z) ∈ L has the property that c_j = 0 if ω^{−j} is a root of g(Y). Thus the cyclic code with zeros {ω^{−j} | j ∈ T}, where T ⊆ {0, 1, . . . , v − 1}, can be characterized as

    { (P(1), . . . , P(ω^{v−1})) | P(Z) = ∑_{j=0}^{v−1} c_j Z^j ∈ L, c_j = 0 if j ∈ T }.
Note that if a positive integer u has an orbit of length i under the map j ↦ jq modulo v, i.e. if uq^i ≡ u (mod v) and i is the smallest integer satisfying the congruence, then the coefficient of Z^u must be chosen in a field of degree i over E; this agrees, of course, with the dimensional requirements.
For the extended codes we adjoin the extra coordinate position corresponding to 0 ∈ K, where the entry is −∑_{i=0}^{v−1} P(ω^i). Since P(0) = c_0, Equation (4) implies that all extended cyclic codes are contained in

    M = {(P(0), P(1), P(ω), . . . , P(ω^{v−1})) | P(Z) ∈ L},    (5)
the subspace of E^{q^m} consisting of those vectors the sum of whose coordinates is zero. (We have deliberately called this subspace M to suggest

to the reader the correspondence with M .) It follows from the above


that every polynomial P (Z) in K[Z] of degree less than v has the prop-
erty that z∈K P (z) = 0; thus a polynomial r(Z) = vj=0 cj Z j satisfies
P P
P
z∈K r(z) = 0 if and only if it is of degree less than v — i.e. has cv = 0 —
since such polynomials form a subspace of codimension 1 (as a vector space
over K).
We next establish the correspondence between reduced polynomials in the variables x_1, . . . , x_m of degree less than m(q − 1) and polynomials P(Z) ∈ L. In fact, we do more and establish an algebra isomorphism between E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m) and the E-subalgebra of K[Z]/(Z^{q^m} − Z) given by the set of fixed points of the Frobenius map x ↦ x^q. The difference between what follows and the development given in [18] is that here the Mattson-Solomon polynomials live in K[Z]/(Z^{q^m} − Z) rather than K[Z]/(Z^{q^m−1} − 1).
In order to explain the correspondence we first introduce the trace: let Tr_{K/E} denote the trace from K to E, i.e. for z ∈ K,

    Tr_{K/E}(z) = z + z^q + z^{q^2} + · · · + z^{q^{m−1}}.

Since the trace is a linear transformation from the vector space K over E onto E, given any basis {α_1, α_2, . . . , α_m} for K over E there is a unique complementary basis {β_1, β_2, . . . , β_m} for K over E such that

    Tr_{K/E}(α_i β_j) = δ_{ij},

where δ_{ij} denotes the Kronecker delta function. (See, for example, Lidl and Niederreiter [35, Theorem 2.24].)
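For K = F_9 over E = F_3 the trace is z ↦ z + z^3, and the complementary basis can be found by brute force. A sketch (my own) of the uniqueness statement just cited, using the basis {1, ω} with ω a root of Z^2 + Z − 1:

```python
q = 3

def mul(a, b):
    """F_9 = F_3[w]/(w^2 + w - 1); elements are pairs (a0, a1) = a0 + a1*w."""
    c2 = a[1] * b[1]
    return ((a[0] * b[0] + c2) % q, (a[0] * b[1] + a[1] * b[0] - c2) % q)

def trace(z):
    """Tr_{K/E}(z) = z + z^q for [K : E] = 2; the value always lies in the prime field."""
    zq = mul(mul(z, z), z)  # z^3, the Frobenius image of z
    t = ((z[0] + zq[0]) % q, (z[1] + zq[1]) % q)
    assert t[1] == 0        # trace values lie in F_3
    return t[0]

elts = [(a, b) for a in range(q) for b in range(q)]
w = (0, 1)
# search for the basis {b1, b2} complementary to {1, w}: Tr(alpha_i * beta_j) = delta_ij
found = [(b1, b2) for b1 in elts for b2 in elts
         if (trace(b1), trace(mul(w, b1)), trace(b2), trace(mul(w, b2))) == (1, 0, 0, 1)]
assert len(found) == 1  # the complementary basis exists and is unique
```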
Using the basis {1, ω, ω^2, . . . , ω^{m−1}}, where ω is a primitive element for K, let the complementary basis be {β_1, β_2, . . . , β_m}. Then z ∈ K satisfies z = ∑_{i=1}^{m} a_i ω^{i−1}, where the a_i are in E, if and only if a_i = Tr_{K/E}(β_i z).
Since E ⊆ K, we can define a ring homomorphism θ by

    θ : E[x_1, . . . , x_m] → K[Z],
        x_i ↦ β_i Z + (β_i Z)^q + · · · + (β_i Z)^{q^{m−1}} = Tr_{K/E}(β_i Z),

where we are utilizing the Frobenius map of K[Z] into itself and slightly abusing the trace notation. Following θ by the natural map

    K[Z] → K[Z]/(Z^{q^m} − Z),

using the standard representatives — namely polynomials in Z of degree less than or equal to v — and viewing Z as Z + (Z^{q^m} − Z), we see that
(Tr_{K/E}(β_i Z))^q = Tr_{K/E}(β_i Z); hence we get the induced ring homomorphism

    θ̄ : E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m) → K[Z]/(Z^{q^m} − Z).
We can thus convert any "reduced" polynomial in the variables x_i with coefficients in E into a polynomial in Z, of degree less than or equal to v = q^m − 1, with coefficients in K. It follows from Lemma 5.7, and the fact that the vector

    (Tr_{K/E}(β_1 z), . . . , Tr_{K/E}(β_m z))

takes on every value in E^m as z varies over K, that the image of p(x_1, . . . , x_m) is of degree less than v provided p(x_1, . . . , x_m) is of degree less than m(q − 1). Moreover, the polynomial P(Z) = ∑_{j=0}^{v−1} c_j Z^j has (c_j)^q = c_{jq} (with subscripts computed modulo v) if and only if P(Z)^q = P(Z) in the ring K[Z]/(Z^{q^m} − Z) — since qj ≠ v for j < v and therefore computing subscripts modulo v is the same as viewing the polynomial in K[Z]/(Z^{q^m} − Z). Since (p(x_1, . . . , x_m))^q = p(x_1, . . . , x_m) in the ring E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m), the reduced polynomials of degree less than m(q − 1) have images, under θ̄, in L.
Conversely, we define a ring homomorphism

    K[Z] → K[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m)

by

    Z ↦ ∑_{i=1}^{m} x_i ω^{i−1}.

Since (∑_{i=1}^{m} x_i ω^{i−1})^{q^m} = ∑_{i=1}^{m} x_i ω^{i−1}, we obtain a ring homomorphism

    K[Z]/(Z^{q^m} − Z) → K[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m).
If P(Z)^q = P(Z), then the image of P(Z) must lie in E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m), since this is the subring of K[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m) left fixed by the Frobenius map x ↦ x^q. Let R denote the subring of K[Z]/(Z^{q^m} − Z) left pointwise fixed by the Frobenius map; then we have a ring homomorphism

    ψ : R → E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m)

and, using the fact that Tr_{K/E}(x_i β) = x_i Tr_{K/E}(β), it follows easily that ψ ∘ θ̄ is the identity map. Moreover, since in K[Z]/(Z^{q^m} − Z) we have (Z^v)^k = Z^v
for any positive integer k, P(Z)^q = P(Z) = ∑_{j=0}^{v} c_j Z^j if and only if c_v ∈ E and ∑_{j=0}^{v−1} c_j Z^j ∈ L. Both rings have dimension q^m as E-algebras, and hence θ̄ is an isomorphism of rings, with ψ the inverse of θ̄. In addition, under this ring isomorphism

    M ≈ RFq(m(q − 1) − 1, m).

In fact, much more is true: a simple calculation shows that Z^j corresponds to a polynomial in the x_i's of degree¹² wt_q(j); thus, if P(Z) ∈ L is such that c_j = 0 for wt_q(j) > ρ, its image in E[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m) has degree less than or equal to ρ. Hence the isomorphism above carries, for 0 ≤ ρ < m(q − 1), the extended cyclic code with defining set T̂ = {j | wt_q(j) < m(q − 1) − ρ} onto the generalized Reed-Muller code RFq(ρ, m). Hence we have proved the following
Theorem 5.17 Let R be the subring of F_{q^m}[Z]/(Z^{q^m} − Z) left pointwise fixed by the Frobenius homomorphism x ↦ x^q. Then R is isomorphic (as an algebra over the field Fq) to Fq[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q − x_m). Moreover, the isomorphism can be chosen in such a way that the extended cyclic code over Fq with defining set {j | wt_q(j) < m(q − 1) − ρ} is carried onto the generalized Reed-Muller code RFq(ρ, m).
5.4 Roots, dimensions and minimum weights

In this section we draw out the consequences of what we have just proved and discuss the minimum weights of the relevant codes.
Theorem 5.18 If K = F_{q^m}, E = Fq, and ω is a primitive element of K, then, for 0 ≤ u ≤ q^m − 2, ω^u is a root of the generator polynomial of the code RFq(ρ, m)* if and only if 0 < wt_q(u) ≤ m(q − 1) − 1 − ρ.

This is a consequence of Theorem 5.17 and the discussion at the beginning of Section 4.3, where α^i is a root if and only if i is in the defining set T.
We also obtain the promised generalization (due to Charpin [14, 15]) of Berman's theorem:

Theorem 5.19 For any prime p, and any ρ such that 0 ≤ ρ < m(p − 1), if M is the radical of Fp[G], the group algebra over Fp of the elementary abelian group G of order p^m, then the code given by M^{m(p−1)−ρ} is the generalized Reed-Muller code RFp(ρ, m).

¹² The reduction modulo x_i^q − x_i can only reduce the degree of a given monomial and, for future reference, we note that the reduction is by a multiple of q − 1.
This is a consequence of Theorem 5.17 and Corollary 4.11.
We draw out the consequences of Theorem 5.18 below.
Corollary 5.20 For 0 ≤ ρ < m(q − 1) the code RFq(ρ, m)* is the cyclic code with generator polynomial

    g(Y) = ∏ (Y − ω^u),

the product being taken over all u with 0 < u < q^m − 1 and wt_q(u) ≤ m(q − 1) − 1 − ρ, where ω is a primitive element of F_{q^m}.
Corollary 5.21 For 0 ≤ ρ < m(q − 1) the code (RFq(ρ, m)*)^⊥ is the cyclic code with generator polynomial

    g(Y) = ∏ (Y − ω^u),

the product being taken over all u with 0 ≤ u < q^m − 1 and wt_q(u) ≤ ρ, where ω is a primitive element of F_{q^m}. Moreover,

    (RFq(ρ, m)*)^⊥ = (Fq)^⊥ ∩ RFq(m(q − 1) − 1 − ρ, m)*.
Proof: That the generating polynomial is as asserted follows from results in Chapter 1 on cyclic codes. The second statement then follows from Corollary 5.20, with the extra factor (Y − 1) placing the code inside (Fq)^⊥. □
Corollary 5.22 For 0 ≤ ρ < m(q − 1), the dimensions of both RFq(ρ, m)* and RFq(ρ, m) are given by

    |{u | 0 ≤ u ≤ q^m − 1 and wt_q(u) ≤ ρ}|.

Proof: This is simply a restatement of the value of the dimension in terms of the q-weight. □
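Corollary 5.22 makes the dimension a simple digit-sum count. A sketch (my own check, against Example 5.23 below and against the familiar binary Reed-Muller dimension formula):

```python
from math import comb

def wt_q(u, q, m):
    """q-weight of u: the digit sum of u written in base q with m digits."""
    s = 0
    for _ in range(m):
        s += u % q
        u //= q
    return s

def grm_dim(q, m, rho):
    """|{u | 0 <= u <= q^m - 1 and wt_q(u) <= rho}|, as in Corollary 5.22."""
    return sum(1 for u in range(q ** m) if wt_q(u, q, m) <= rho)

assert grm_dim(3, 2, 2) == 6  # the dimension computed in Example 5.23
# for q = 2 this recovers the usual Reed-Muller dimension sum_{i <= rho} C(m, i)
for m in range(1, 6):
    for rho in range(m + 1):
        assert grm_dim(2, m, rho) == sum(comb(m, i) for i in range(rho + 1))
```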
Example 5.23 For m = 2 and q = 3, E = F3 and K = F9. The quadratic f(Z) = Z^2 + Z − 1 is a primitive polynomial for K, with primitive root ω. Since m(q − 1) = 4, RE(1, 2)* and RE(2, 2)* are the only interesting
cases. The generator polynomial for RE(2, 2)* is g(Y) = (Y − ω)(Y − ω^3) = Y^2 + Y − 1 and the code has dimension 6. A generator matrix is

    G = [ −1  1  1  0  0  0  0  0
           0 −1  1  1  0  0  0  0
           0  0 −1  1  1  0  0  0
           0  0  0 −1  1  1  0  0
           0  0  0  0 −1  1  1  0
           0  0  0  0  0 −1  1  1 ].
The extended code, RE(2, 2), has a generator matrix that is G augmented by an extra column whose entries are all −1: this is a generator matrix for the code of the affine plane AG2(F3). If the extra column, corresponding to 0, is labelled 0, and added as the first column, and the columns of G then labelled 1 to 8, then the plane can be pictured as in Figure 1, with incidence matrix as given in Figure 2, where the rows are arranged in parallel classes. The columns then correspond to the points

    (0, 0), (1, 0), (0, 1), (1, −1), (−1, −1), (−1, 0), (0, −1), (−1, 1), (1, 1);

equivalently, they correspond to the elements of F9 in the order

    0, 1, ω, ω^2, ω^3, ω^4, ω^5, ω^6, ω^7.

The matrix

    [ 0  1
      1 −1 ]

cycles the last eight of these points and corresponds to multiplication by ω. The line {0, 1, 5}, for example, has the equation X_2 = 0, and the line {6, 5, 8} has the equation X_1 + X_2 + 1 = 0.
[Figure 1: The affine plane AG2(F3), drawn with its nine points labelled 0 to 8.]
[Figure 2: Incidence matrix for AG2(F3); the twelve rows, arranged in four parallel classes of lines, each have 1's in the three columns, labelled 0 to 8, of the points the line contains.]
The generator polynomial for RE(1, 2)* is

    f(Y) = (Y − ω)(Y − ω^2)(Y − ω^3)(Y − ω^4)(Y − ω^6)

and that for (RE(2, 2)*)^⊥ is

    (Y − 1)f(Y) = Y^6 + Y^5 − Y^4 − Y^2 − Y + 1,

so a check matrix¹³ for RE(2, 2)* is

    H = [ 1 −1 −1  0 −1  1  1  0
          0  1 −1 −1  0 −1  1  1 ].
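The data of this example can be checked by polynomial division over F_3 (a sketch of my own): g(Y) divides Y^8 − 1, the quotient h(Y) = (Y^8 − 1)/g(Y) has degree 6 (so the cyclic code has dimension 8 − 2 = 6), and the coefficients of h reversed give exactly the first row of H.

```python
q = 3

def polydiv(num, den):
    """Divide polynomials over F_3; coefficient lists, lowest degree first."""
    num = num[:]
    dq = len(den) - 1
    inv = pow(den[-1] % q, q - 2, q)  # inverse of the leading coefficient
    quot = [0] * (len(num) - dq)
    for i in range(len(num) - 1, dq - 1, -1):
        c = num[i] * inv % q
        quot[i - dq] = c
        for j, d in enumerate(den):
            num[i - dq + j] = (num[i - dq + j] - c * d) % q
    return quot, num[:dq]

g = [-1, 1, 1]                             # g(Y) = Y^2 + Y - 1
h, rem = polydiv([-1] + [0] * 7 + [1], g)  # divide Y^8 - 1 by g
assert rem == [0, 0]                       # g divides Y^8 - 1: the code is cyclic
assert len(h) - 1 == 6                     # so dim RE(2,2)* = 8 - deg g = 6
# reversed h is the first row of the check matrix H (the second row is its shift)
assert list(reversed(h)) == [x % q for x in [1, -1, -1, 0, -1, 1, 1]]
```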
Theorem 5.24 If ρ = r(q − 1) + s, where 0 ≤ s < q − 1, then the code RFq(ρ, m)* is a subcode of a BCH code of length q^m − 1 over Fq with designed distance

    (q − s)q^{m−r−1} − 1.
Proof: From Theorem 5.18, for a primitive element ω of F_{q^m}, ω^u is a root of the generator polynomial of RFq(ρ, m)* if and only if 0 < wt_q(u) < m(q − 1) − ρ. Now

    m(q − 1) − ρ = (m − r)(q − 1) − s = (m − r − 1)(q − 1) + (q − 1 − s),

¹³ This matrix is frequently called a parity-check matrix in the literature.
so if we let h be the smallest integer with wt_q(h) = m(q − 1) − ρ, then

    h = (q − s − 1)q^{m−r−1} + ∑_{i=0}^{m−r−2} (q − 1)q^i = (q − s)q^{m−r−1} − 1.

It follows that every integer u with 0 ≤ u < h satisfies wt_q(u) < m(q − 1) − ρ, and thus the elements ω^1, ω^2, . . . , ω^{h−1} are all roots of the generator polynomial of the code. Thus RFq(ρ, m)* is a subcode of a BCH code of designed distance (q − s)q^{m−r−1} − 1, as stated. □
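The closed form for h in this proof can be confirmed by exhaustive search. A sketch (my own, for the assumed small case q = 3, m = 4; any parameters would do):

```python
def wt_q(u, q):
    """q-weight: digit sum of u written in base q."""
    s = 0
    while u:
        s += u % q
        u //= q
    return s

q, m = 3, 4
for rho in range(m * (q - 1)):
    r, s = divmod(rho, q - 1)
    target = m * (q - 1) - rho
    h = next(u for u in range(q ** m) if wt_q(u, q) == target)
    # the smallest integer of q-weight m(q-1) - rho is exactly (q-s) q^(m-r-1) - 1 ...
    assert h == (q - s) * q ** (m - r - 1) - 1
    # ... and every smaller positive u has smaller q-weight, so w, ..., w^(h-1) are roots
    assert all(wt_q(u, q) < target for u in range(1, h))
```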
The designed distance is the true minimum distance, as the following theorem shows by an explicit construction of codewords of this weight.

Theorem 5.25 For any ρ such that 0 ≤ ρ < m(q − 1), where ρ = r(q − 1) + s with 0 ≤ s < q − 1, RFq(ρ, m) has vectors of weight (q − s)q^{m−r−1} that consist of the sum of multiples of the incidence vectors of (q − s) parallel (m − r − 1)-flats, all contained in an (m − r)-flat.
Proof: Given arbitrary elements w_i ∈ E, for i = 1, . . . , r, and s distinct elements w′_j in E, let

    p(x_1, . . . , x_m) = ∏_{i=1}^{r} (1 − (x_i − w_i)^{q−1}) ∏_{j=1}^{s} (x_{r+1} − w′_j).

Then p(x_1, . . . , x_m) has degree r(q − 1) + s = ρ and is zero in E^m = V unless

    x_i = w_i, for i = 1, . . . , r,    (6)
    x_{r+1} ≠ w′_j, for j = 1, . . . , s.    (7)

There are (q − s)q^{m−r−1} vectors in E^m satisfying both conditions, and the codeword corresponding to p(x_1, . . . , x_m) has this weight.
To establish the geometric nature of the codewords defined by such polynomials, consider the q^{m−r−1} points of E^m satisfying (6) and the additional equation x_{r+1} = c, where c is an element of E that is not amongst the w′_j. Then these points all belong to an (m − r − 1)-flat, and the corresponding coordinate positions in the codeword of p(x) have the constant value

    ∏_{j=1}^{s} (c − w′_j)
on these points. The same is true of each of the (q − s) elements of E that are not amongst the w′_j, and hence we get a vector of the stated form. □

Remark: If s = 0, then the polynomial p(x_1, . . . , x_m) is the incidence vector of an (m − r)-flat.
Corollary 5.26 If ρ = r(q − 1) + s < m(q − 1) with 0 ≤ s < q − 1, then RFq(ρ, m) has minimum weight (q − s)q^{m−r−1} and RFq(ρ, m)* has minimum weight (q − s)q^{m−r−1} − 1.

Proof: By taking the flat with w_i = 0 for i = 1, . . . , r, and w′_j ≠ 0 for j = 1, . . . , s, it follows that the coordinate at the point 0 of the corresponding codeword is non-zero, so that the corresponding codeword in RFq(ρ, m)* has weight (q − s)q^{m−r−1} − 1. This is the minimum weight by Theorem 5.24. By translation invariance the minimum weight of RFq(ρ, m) must be (q − s)q^{m−r−1}, since it has vectors of that weight and, as we have just seen, the minimum weight of RFq(ρ, m)* is (q − s)q^{m−r−1} − 1. □
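The minimum-weight formula can be verified exhaustively for small parameters. A sketch (my own, enumerating all codewords of R_F3(ρ, 2) for each ρ):

```python
from itertools import product

q, m = 3, 2
pts = list(product(range(q), repeat=m))

def codewords(rho):
    """All evaluation vectors of reduced polynomials of degree <= rho on F_3^2."""
    monos = [e for e in product(range(q), repeat=m) if sum(e) <= rho]
    vecs = [[pow(a[0], e[0], q) * pow(a[1], e[1], q) % q for a in pts] for e in monos]
    for coeffs in product(range(q), repeat=len(monos)):
        yield [sum(c * v[k] for c, v in zip(coeffs, vecs)) % q for k in range(len(pts))]

def min_weight(rho):
    return min(sum(1 for x in w if x) for w in codewords(rho) if any(w))

# Corollary 5.26 for q = 3, m = 2: minimum weight (q - s) q^(m-r-1), rho = r(q-1) + s
for rho in range(1, m * (q - 1)):
    r, s = divmod(rho, q - 1)
    assert min_weight(rho) == (q - s) * q ** (m - r - 1)
```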
Corollary 5.27 Let p be a prime. The code over Fp of the design of points and r-flats of the affine geometry AGm(F_{p^t}) has minimum weight p^{tr}.

Proof: Apply Theorem 5.25 with s = 0. Since the code of the design is a subset of the generalized Reed-Muller code, it must have at least this minimum weight, and since it has vectors of this weight, this must be the minimum weight. □
Corollary 5.28 Let p be a prime. The code over Fp generated by the differences of the incidence vectors of two parallel r-flats of the affine geometry AGm(F_{p^t}) has minimum weight 2p^{tr}.

Proof: Set q = p^t and take ρ = (m − r − 1)(q − 1) + (q − 2). Each of the generating vectors of the code in question is in RFq(ρ, m) which, even as a code over Fq, has minimum weight 2q^r. Thus the code of the design, being a subset and having vectors of this weight, must also have minimum weight 2q^r. □
5.5 Codes invariant under the full affine group

Delsarte [19], generalizing the ideas of Kasami, Lin and Peterson and Theorem 4.6, characterized those codes invariant under GLm(Fq) in its natural action on the non-zero vectors of an m-dimensional vector space over Fq; he also discussed, in a very general way, the "projective" case. In our report [3] we described in full Delsarte's proof for the affine case when q is a prime. Here we give an alternative approach to this case due to Mortimer [44]; it is more direct and more suitable for our purposes.
Mortimer's results culminate in the proof that the only codes in E^{E^m}, where E = Fp and p is a prime, invariant under G, where ASLm(Fp) ⊆ G ⊆ AGLm(Fp), are the generalized Reed-Muller codes RE(ρ, m). We will prove his more general results leading to this, all of which can be found in [44, Chapter 5]. Note that we have already sketched in Section 4.6, Theorem 4.17, a new proof, due to Weidner, of the main result. Weidner's proof is in the single-variable context and utilizes the Jennings basis.
As usual, let E = Fq, where q = p^t and p is a prime, and set V = E^m. We will be slightly more general than in Definition 5.3 and let K denote any extension field of E, and consider the vector space K^V — viewing that space as the space of linear combinations of M with coefficients in the field K, where M denotes the set of monomials in m variables, as in Section 5.2, Equation (1), page 43. For any ρ with 0 ≤ ρ ≤ m(q − 1), we write

    K_ρ = {f | f ∈ K^V and deg(f) ≤ ρ},    (8)

where the degree of f is the total degree, i.e. the maximum value of ∑ a_i over the monomials x_1^{a_1} x_2^{a_2} · · · x_m^{a_m} that actually occur in the expression for f. For K = E we have K_ρ = RE(ρ, m).
For any integers b ≥ 0 and i and j such that 1 ≤ i, j ≤ m, define linear transformations δ_i^b and ε_{i,j}^b from K^V to itself by giving them as follows on our chosen basis, the monomials in M:

    (x_1^{a_1} x_2^{a_2} · · · x_m^{a_m}) δ_i^b = (a_i choose b) x_1^{a_1} x_2^{a_2} · · · x_i^{a_i−b} · · · x_m^{a_m}    (9)

    (x_1^{a_1} x_2^{a_2} · · · x_m^{a_m}) ε_{i,j}^b = (a_i choose b) x_1^{a_1} x_2^{a_2} · · · x_i^{a_i−b} · · · x_j^{a_j+b} · · · x_m^{a_m}.    (10)

Since the binomial coefficient (a_i choose b) = 0 for a_i < b, δ_i^b annihilates the monomial x_1^{a_1} x_2^{a_2} · · · x_m^{a_m} unless b ≤ a_i; similarly ε_{i,j}^b annihilates it unless b ≤ a_i. Both δ_i^0 and ε_{i,j}^0 are the identity on K^V.
Theorem 5.29 Let T be the translation subgroup of AGLm(Fq). Then a subspace C of K^V is a T-module if and only if it is invariant under δ_i^b for all i and b such that 1 ≤ i ≤ m and 0 ≤ b ≤ q − 1.

Proof: For u ∈ E and 1 ≤ i ≤ m let τ_i^u denote the translation of V such that

    τ_i^u : (x_1, x_2, . . . , x_m) ↦ (x_1, . . . , x_i − u, . . . , x_m).    (11)
It is clearly sufficient to show that each τ_i^u in its action on K^V is a linear combination of the δ_i^b, and conversely.
Let f = ∑_j p_j x_i^j be any function in K^V, where the p_j are polynomials independent of x_i. Then

    (f)τ_i^u = ∑_j p_j (x_i + u)^j
             = ∑_j p_j ∑_b (j choose b) x_i^{j−b} u^b
             = ∑_b u^b ∑_j (j choose b) p_j x_i^{j−b}
             = ∑_b u^b (f)δ_i^b,

and thus τ_i^u = ∑_b u^b δ_i^b. On the other hand,
    ∑_{u∈E^×} u^{−b} (f)τ_i^u = ∑_{u∈E^×} u^{−b} ∑_j p_j (x_i + u)^j
             = ∑_{u∈E^×} u^{−b} ∑_j ∑_k (j choose k) p_j x_i^{j−k} u^k
             = ∑_j ∑_k (j choose k) p_j x_i^{j−k} ∑_{u∈E^×} u^{k−b}
             = −(f)δ_i^b                   if b ≠ 0, q − 1,
               −(f)δ_i^0 − (f)δ_i^{q−1}   if b = 0 or q − 1,

so that δ_i^b = −∑_{u∈E^×} u^{−b} τ_i^u for b ≠ 0, q − 1, and δ_i^{q−1} = −δ_i^0 − ∑_{u∈E^×} τ_i^u. Thus each translation is a linear combination of the δ_i^b over K, and conversely, giving the theorem. □
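The operator identity τ_i^u = ∑_b u^b δ_i^b can be checked concretely. A sketch (my own, in one variable over F_3, so functions are reduced polynomials c_0 + c_1 x + c_2 x^2 and the translation acts by x ↦ x + u, matching the expansion in the proof):

```python
from math import comb

q = 3  # one-variable case: functions on F_3 as coefficient lists [c0, c1, c2]

def delta(c, b):
    """(x^a) delta^b = C(a, b) x^(a-b), extended linearly (Equation (9), one variable)."""
    out = [0] * q
    for a, ca in enumerate(c):
        if a >= b:
            out[a - b] = (out[a - b] + comb(a, b) * ca) % q
    return out

def evaluate(c, x):
    return sum(ca * pow(x, a, q) for a, ca in enumerate(c)) % q

# the translation x -> x + u agrees with sum_b u^b delta^b as an operator on functions
for c in [[1, 2, 1], [0, 1, 0], [2, 0, 2]]:
    for u in range(q):
        image = [0] * q
        for b in range(q):
            db = delta(c, b)
            image = [(iv + pow(u, b, q) * dv) % q for iv, dv in zip(image, db)]
        assert all(evaluate(image, x) == evaluate(c, (x + u) % q) for x in range(q))
```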
Now we show that invariance under transvections is equivalent to invariance under the ε_{i,j}^b. In fact, we need only the following transvections: for u ∈ E and i, j = 1, 2, . . . , m with i ≠ j, define

    γ_{i,j}^u : (x_1, x_2, . . . , x_m) ↦ (x_1, . . . , x_i − ux_j, . . . , x_m).    (12)

Then γ_{i,j}^u is a transvection with axis given by x_j = 0. Recall that the special affine group, ASLm(Fq), is generated by the translations and transvections: see, for example, [23].
Theorem 5.30 A subspace C of K^V is invariant under ASLm(Fq) if and only if it is invariant under all the transformations δ_i^b and ε_{i,j}^b with 0 ≤ b ≤ q − 1 and i ≠ j satisfying 1 ≤ i, j ≤ m.

Proof: In view of Theorem 5.29, since SLm(Fq) is generated by the transvections γ_{i,j}^u, we need only show that each of these is a linear combination over E of the ε_{i,j}^b, and conversely.
Any f ∈ K^V can be written in the form

    f = ∑_{r,s} p_{r,s} x_i^r x_j^s,

where p_{r,s} is a polynomial independent of x_i and x_j. Then

    (f)γ_{i,j}^u = ∑_{r,s} p_{r,s} (x_i + ux_j)^r x_j^s
                 = ∑_{r,s} p_{r,s} ∑_b (r choose b) x_i^{r−b} x_j^{s+b} u^b
                 = ∑_b u^b ∑_{r,s} (r choose b) p_{r,s} x_i^{r−b} x_j^{s+b}
                 = ∑_b u^b (f)ε_{i,j}^b.

Thus γ_{i,j}^u = ∑_b u^b ε_{i,j}^b, and we can invert this formula to obtain the converse exactly as in the proof of Theorem 5.29. □
Theorem 5.31 Let C be a subspace of K^V. Then C is invariant under AGLm(Fq) if and only if

1. C is invariant under the transformations δ_i^b and ε_{i,j}^b for i ≠ j and 1 ≤ i, j ≤ m and 0 ≤ b ≤ q − 1, and
2. C is spanned by monomials.

Proof: By the previous theorems, the first condition characterizes subspaces invariant under ASLm(Fq), so it will suffice to show that an ASLm(Fq)-invariant subspace is also invariant under AGLm(Fq) if and only if it is spanned by monomials. This is equivalent to showing that if a monomial appears with a non-zero coefficient in a function in C, then the monomial itself is in C.
The group AGLm(Fq) is generated by ASLm(Fq) and the dilations η_i^u defined by

    η_i^u : (x_1, x_2, . . . , x_m) ↦ (x_1, . . . , ux_i, . . . , x_m)    (13)

for i = 1, 2, . . . , m and u ∈ E^×. Suppose C is an ASLm(Fq)-module spanned by monomials. Each η_i^u maps each monomial to a scalar multiple of itself, so C is invariant under η_i^u. Thus C is an AGLm(Fq)-module.
Conversely, suppose that C is an AGLm(Fq)-invariant subspace that is not spanned by monomials. C is then invariant under the transformations

    λ_i^k = −∑_{u∈E^×} u^k η_i^u,    (14)

for 0 ≤ k ≤ q − 1 and 1 ≤ i ≤ m. Then

    (x_1^{a_1} x_2^{a_2} · · · x_m^{a_m}) λ_i^k = (−∑_{u∈E^×} u^{k−a_i}) x_1^{a_1} x_2^{a_2} · · · x_m^{a_m}
        = x_1^{a_1} x_2^{a_2} · · · x_m^{a_m}  if k ≡ a_i (mod (q − 1)),
          0  otherwise.
Thus if f ∈ C then (f)λ_i^k ∈ C; it consists of the terms of f that contain x_i^k if k ≠ 0, q − 1, and of those containing x_i^{q−1} or independent of x_i if k = 0 or q − 1.
Choose f ∈ C such that none of the monomial terms in it are in C. Subject to this condition, choose f to have a minimal number of terms. Within f choose a term g with a maximal number of exponents which are neither 0 nor q − 1. Subject to this condition, choose g with a maximal number of exponents q − 1. Relabelling subscripts, we have
    g = x_1^{a_1} x_2^{a_2} · · · x_r^{a_r} x_{r+1}^{q−1} · · · x_{r+s}^{q−1},

where 0 < a_i < q − 1.
The function (f)λ_1^{a_1} · · · λ_r^{a_r} of C contains the terms of f that contain x_i raised to the exponent a_i for i = 1, . . . , r. By the minimality of the number of terms in f, we have f = (f)λ_1^{a_1} · · · λ_r^{a_r}. Thus every monomial in f begins with x_1^{a_1} x_2^{a_2} · · · x_r^{a_r} · · · and, by the choice of g, the remaining exponents are 0 or q − 1.
If r + s < m then the monomial g is fixed by κ_i = ε_{i,m}^{q−1} ε_{m,i}^{q−1} for i = r + 1, . . . , r + s. If κ_i annihilates any monomial of f then (f)κ_i contains fewer terms than f, contradicting the choice of f. Thus each term of f begins

    x_1^{a_1} x_2^{a_2} · · · x_r^{a_r} x_{r+1}^{q−1} · · · x_{r+s}^{q−1} · · · .

Since g has the maximal number of exponents q − 1, we have f = ag for some a ∈ E, contradicting our hypothesis.
Thus r + s = m and g = x_1^{a_1} x_2^{a_2} · · · x_r^{a_r} x_{r+1}^{q−1} · · · x_m^{q−1}. There must be another term h of f, and we can take this to be

    h = x_1^{a_1} x_2^{a_2} · · · x_r^{a_r} x_{r+1}^{q−1} · · · x_t^{q−1},

where r < t < m, by changing the last variables if necessary (leaving g fixed). Now (h)δ_m^{q−1} = 0, and

    (f)δ_m^{q−1} δ_{m−1}^{q−1} · · · δ_{t+1}^{q−1} = h + · · ·

contains fewer terms than f and is still in C. This contradicts the choice of f as a function, none of whose terms lies in C, with a minimal number of terms. This contradiction gives the theorem. □
Now take q = p a prime, so that K is any field of characteristic p.

Lemma 5.32 The collection of transformations κ_{i,j} acts transitively on the
set of all monomials of fixed degree (ignoring scalar multiples) when q = p
is a prime.

Proof: We prove this recursively. Let g = x_1^{a_1} \cdots x_m^{a_m} and h = x_1^{b_1} \cdots x_m^{b_m}
be two monomials with a_1 + \cdots + a_m = b_1 + \cdots + b_m. Suppose that after a
change of variables (if necessary) we have

a_1 = b_1, …, a_{r-1} = b_{r-1}, a_r > b_r, …, a_{s-1} > b_{s-1}, a_s < b_s, …, a_m < b_m.

Clearly
ar − br ≤ (bs − as ) + · · · + (bm − am )
and thus there are integers cj for j = s, . . . , m with

ar − br = cs + · · · + cm

and 0 ≤ cj ≤ bj − aj . Thus

e = (g)κ_{r,s}^{c_s} \cdots κ_{r,m}^{c_m}
  = u x_1^{b_1} \cdots x_{r-1}^{b_{r-1}} x_r^{a_r - c_s - \cdots - c_m} x_{r+1}^{a_{r+1}} \cdots x_{s-1}^{a_{s-1}} x_s^{a_s + c_s} \cdots x_m^{a_m + c_m}
  = u x_1^{b_1} \cdots x_r^{b_r} x_{r+1}^{a_{r+1}} \cdots x_{s-1}^{a_{s-1}} x_s^{a_s + c_s} \cdots x_m^{a_m + c_m}

for some non-zero u ∈ E is a monomial with one more exponent in common
with h than g has. The lemma now follows by induction. □

Theorem 5.33 Let

ASL_m(F_p) ⊆ G ⊆ AGL_m(F_p)

where p is a prime. If C is a non-trivial G-invariant subspace of K^V, where
V = F_p^m, then C = K_k for some k such that 0 ≤ k ≤ m(p − 1).

Proof: The proof is by induction on m. For m = 1 let f ∈ C be of maximal
degree k, say. Then (f)δ_1^i has degree k − i, and so C contains functions of
each degree less than k. It follows that C = K_k.
Now suppose m ≥ 2 and let f ∈ C be of maximal degree k. If k = 0
then C = K0 = h1i. If k = 1 then C contains a linear function, and hence
all such, since G is transitive on linear functions; thus C = K1 . Now use
induction on k. Then for some i, (f)δ_i^1 ∈ C has degree k − 1 and hence
Kk−1 ∩ C is not trivial and thus equal to Kk−1 by the induction hypothesis.
Thus Kk−1 ⊂ C. The function f can then be taken to be homogeneous of
degree k.
Suppose k ≤ (m − 1)(p − 1). Choose a monomial g = x_1^{a_1} \cdots x_m^{a_m} amongst
the terms of f with a_1 the maximal exponent of x_1 in f. From the lemma
above we have a product σ of transformations κ_{i,j} such that (g)σ is a mono-
mial of degree k that is independent of x_1. Thus h = (f)σ ∈ C is, by the
maximality of a_1, independent of x_1. The subspace C′ of C consisting of the
functions in C that are independent of x_1 is invariant under ASL_{m-1}(F_p).
By the induction hypothesis, C then contains every function of degree k that
is independent of x_1. In particular, C contains a monomial of degree k and,
since G acts transitively on the monomials of degree k, it follows that C
contains all the monomials of degree k. Since C ⊃ K_{k-1}, we have C = K_k.
Now suppose that (m − 1)(p − 1) < k < m(p − 1). Since K_k^⊥ =
K_{m(p-1)-k-1} and

K_{k-1} ⊂ C ⊆ K_k,

we have

K_{m(p-1)-k-1} ⊆ C^⊥ ⊂ K_{m(p-1)-k}.
Since from the above inequality we have that

m(p − 1) − k ≤ m(p − 1) − (m − 1)(p − 1) − 1 ≤ (m − 1)(p − 1),

we get C^⊥ = K_{m(p-1)-k-1} by the argument above, and thus C = K_k. □

Corollary 5.34 With the natural action of AGL_m(F_p) on a vector space V
of dimension m over F_p, where p is a prime, the only subspaces of F_p^V left
invariant by AGL_m(F_p) are the generalized Reed-Muller codes RFp(ρ, m).

5.6 The geometric codes


We are now ready to consider the so-called non-primitive codes. We restrict
ourselves to the case of geometric interest; the reader interested in the
general case may wish to consult [2, Section 5.6].
Set n = (q^m − 1)/(q − 1) = q^{m-1} + \cdots + 1 and observe that n is the number
of points of PG_{m-1}(F_q). We wish to look at those polynomials P(Z) =
\sum_{j=0}^{v-1} c_j Z^j ∈ L for which P(ω^i) = P(ω^{n+i}) for all i, in order that they should
define functions on the projective points, ω^n being a primitive element of the
field F_q. It follows from Equation (4) that, for such a polynomial, c_j = ω^{jn} c_j
for each j, so that c_j = 0 unless j ≡ 0 (mod q − 1). Hence the polynomial
has the form

P(Z) = \sum_{i=0}^{n-1} c_{i(q-1)} Z^{i(q-1)}.    (15)

We set L^{proj} equal to the subspace of L consisting of all such polynomials.
Now, since P(ω^i) = P(ω^{i+n}), the usual vector of length q^m − 1 will consist
of q − 1 repetitions of the vector

(P(1), P(ω), P(ω^2), …, P(ω^{n-1})),

and we will take n to be the length of the geometric codes we will consider.
Since, for any integer j, j ≡ wt_q(j) (mod q − 1), in the isomorphism
we have given between R and E[x_1, …, x_m]/(x_1^q − x_1, …, x_m^q − x_m), the
polynomials in L^{proj} will correspond to reduced polynomials all of whose
monomials have degree divisible by q − 1. It follows that we should
look only at those generalized Reed-Muller codes of order divisible by q − 1.
Thus, we shall change notation and use r rather than ρ = r(q − 1) in the
following
following

Definition 5.35 The r-th order projective generalized Reed-Muller
code

PFq(r, m)

where 0 ≤ r < m is the code of length n = (q^m − 1)/(q − 1) given as the set
of vectors

{(P(1), P(ω), …, P(ω^{n-1})) | P(Z) ∈ L^{proj}, c_j = 0 for wt_q(j) > r(q − 1)}.

Here P(Z) is defined in Equation (15) and L^{proj} is defined above. The r-th
order projective generalized Reed-Muller code is also given by

\left\langle x_1^{i_1} x_2^{i_2} \cdots x_m^{i_m} \,\Big|\, \sum_{k=1}^{m} i_k \equiv 0 \pmod{q-1},\ \sum_{k=1}^{m} i_k \leq r(q-1) \right\rangle,

where these polynomials are only evaluated on a set of representatives in
F_q^m of the projective points. Observe that these codes are still cyclic since
they are invariant under a Singer cycle. This is clear, of course, from the
definition — since it is phrased in the single-variable language. The following
proposition is also clear.

Proposition 5.36 The dimension of PFq(r, m) is

|{j | 0 ≤ j ≤ q^m − 1, (q − 1) divides j, wt_q(j) ≤ r(q − 1)}|.

Here the weight wt_q(j) is defined in Definition 4.7, on page 33.
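The counting in Proposition 5.36 is mechanical, and the example that follows can be checked with a few lines of code. The sketch below (in Python; the function names are ours, not the text's) computes wt_q(j) by summing base-q digits and counts the exponents admitted by the proposition.

```python
def wt(j, q):
    """q-weight of j: the sum of the digits of j written in base q."""
    s = 0
    while j > 0:
        s += j % q
        j //= q
    return s

def dim_PF(q, r, m):
    """Dimension of PF_q(r, m): the number of j with 0 <= j <= q^m - 1,
    (q - 1) | j and wt_q(j) <= r(q - 1), as in Proposition 5.36."""
    return sum(1 for j in range(q ** m)
               if j % (q - 1) == 0 and wt(j, q) <= r * (q - 1))
```

For instance, `dim_PF(3, 1, 3)` recovers the dimension 7 of the length-13 code PF3(1, 3) of Example 5.37.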

Example 5.37 To construct the code PF3(1, 3), of length 13 and dimension
7, the multi-variable formulation is the easiest to give, since the generating
monomials are readily seen to be

{1, x_1x_2, x_1x_3, x_2x_3, x_1^2, x_2^2, x_3^2}.

If the irreducible cubic X^3 − X^2 + 1, with root ω, is used to obtain F_{27}, then
the matrix

\begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}
can be used to generate representatives of the projective points, starting, say,
with (1, 0, 0)^t. The seven generating monomials yield the generator matrix

G = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & -1 & 1 & -1 & -1 & 0 \\
0 & 0 & 0 & -1 & -1 & 0 & 0 & 1 & -1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 1 & 0 & 1 & 1 & -1 & 0 & -1 \\
1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1
\end{pmatrix}.
This code is the code over F_3 of the projective plane of order 3. The code
vectors in G can be described geometrically: for example, labelling the columns
1 to 13, to represent the points, and using sets of these numbers to represent
lines, the penultimate row (corresponding to the monomial x_2^2) repre-
sents the complement of the line {1, 3, 4, 8}, i.e. the vector 1 − v^{\{1,3,4,8\}} in our
usual notation for characteristic functions. The second row (corresponding
to the monomial x_1x_2) is the vector v^{\{3,5,6,10\}} − v^{\{3,9,11,12\}}. In terms of ho-
mogeneous coordinates for the projective geometry, the line {1, 3, 4, 8} rep-
resents the point set {(1, 0, 0), (0, 0, 1), (−1, 0, 1), (1, 0, 1)}, which is (0, 1, 0)^t
in homogeneous coordinates.

Since our projective codes are cyclic we can use the roots to obtain the
orthogonal in the usual way. The code orthogonal to PFq (r, m) is obtained
as follows:

Theorem 5.38 If 0 ≤ r < m then


(PFq (r, m))⊥ = PFq (m − r − 1, m) ∩ (Fq 1)⊥ .

Remark: In Example 5.37, r = 1 and the orthogonal is PF3 (1, 3) ∩ (F3 1)⊥ ,
as expected.

Theorem 5.39 The minimum weight of PFq(m − r, m) is

(q^r − 1)/(q − 1) = q^{r-1} + \cdots + 1.

Proof: Since, by Corollary 5.26, the minimum weight of RFq((m − r)(q −
1), m)^* is q^r − 1, the minimum weight is at least (q^r − 1)/(q − 1). But the
polynomial

p(x_1, \ldots, x_m) = \prod_{i=1}^{m-r} \left( 1 - x_i^{q-1} \right)

is such that each of its monomials has degree divisible by q − 1 and yields a
code vector. Since it takes the value 1 at 0, it obviously yields a vector of
weight (q^r − 1)/(q − 1) in PFq(m − r, m). □
The polynomial above that yields a minimum-weight vector is, in fact,
the incidence vector of an r-dimensional subspace of F_q^m. Thus projectively
it is an (r − 1)-dimensional subspace of PG_{m-1}(F_q).

5.7 The codes of the designs from PG_m(F_p)


We have already discussed in Section 4.6 the codes coming from AG_m(F_p),
showing that the code of the design of points and r-flats is precisely M^{r(p-1)},
which, by Theorem 5.19, is RFp((m − r)(p − 1), m). We are now in a position
to consider the projective case — when q = p is a prime^{14} — and we wish
to show that the code of the design of points and projective r-dimensional
subspaces of PG_m(F_p) is precisely PFp(m − r, m + 1). If P is this design
then, since P has an automorphism group that acts transitively on the set
of r-dimensional subspaces, we already know from above that C_p(P) ⊆ PFp(m −
r, m + 1), because the characteristic functions of the r-dimensional subspaces
are in PFp(m − r, m + 1). Clearly, we can use a dimension argument to get
the equality.
the equality. The following lemma gives a recursion for the dimension of the
projective generalized Reed-Muller codes at hand and will allow us to use
an induction argument. In view of our treatment of the Reed-Muller codes,
this recursion can be viewed as a substitute for the Pascal-triangle property
enjoyed by the binomial coefficients.

Lemma 5.40 For p a prime, the dimensions of the generalized Reed-Muller


codes and the projective generalized Reed-Muller codes are related by the
following recursion:

dim(PFp (r − 1, m)) + dim(RFp (r(p − 1), m)) = dim(PFp (r, m + 1)).

Proof: Let Q, A and P be the sets of integers whose cardinalities give the
dimensions of PFp(r − 1, m), RFp(r(p − 1), m) and PFp(r, m + 1), respectively.
We must show that |Q| + |A| = |P|. Now Q is the set of integers u satisfying
0 ≤ u ≤ p^m − 1, where (p − 1) divides u, and wt_p(u) ≤ (r − 1)(p − 1). Similarly,
A is the set of integers u satisfying 0 ≤ u ≤ p^m − 1 and wt_p(u) ≤ r(p − 1),
while P is the set of integers u satisfying 0 ≤ u ≤ p^{m+1} − 1, where (p − 1)
divides u, and wt_p(u) ≤ r(p − 1).
^{14} The general case will be treated in the following section.
Divide P into the following two disjoint sets: Q′, the set of those integers
in P whose p-ary expansion has u_m = p − 1, and A′, those integers in P
whose p-ary expansion has u_m < p − 1. For u ∈ P, where u = u_0 + \cdots + u_m p^m,
set f(u) = u − u_m p^m. The reader will have no difficulty in seeing that f
yields a one-to-one correspondence between Q′ and Q and between A′ and
A. □
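The bijection u ↦ u − u_m p^m in the proof can also be tested numerically. A small sketch (Python; the function names are ours) that counts the sets Q, A and P of the proof directly and checks the recursion of Lemma 5.40 for small parameters:

```python
def wt(u, p):
    # p-weight: digit sum of u written in base p
    s = 0
    while u > 0:
        s += u % p
        u //= p
    return s

def dim_PF(p, r, m):
    # |{u : 0 <= u <= p^m - 1, (p - 1) | u, wt_p(u) <= r(p - 1)}|
    return sum(1 for u in range(p ** m)
               if u % (p - 1) == 0 and wt(u, p) <= r * (p - 1))

def dim_RF(p, rho, m):
    # |{u : 0 <= u <= p^m - 1, wt_p(u) <= rho}|
    return sum(1 for u in range(p ** m) if wt(u, p) <= rho)

def check(p, r, m):
    # Lemma 5.40: dim PF_p(r-1, m) + dim RF_p(r(p-1), m) = dim PF_p(r, m+1)
    return dim_PF(p, r - 1, m) + dim_RF(p, r * (p - 1), m) == dim_PF(p, r, m + 1)
```

For example, `check(3, 1, 2)` confirms 1 + 6 = 7, the dimension of PF3(1, 3) from Example 5.37.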
We now use an embedding of PG_{m-1}(F_p) in PG_m(F_p), just as we did in
the Reed-Muller case; here we know that the code of the projective design of
r-dimensional subspaces projects onto RFp((m − r)(p − 1), m), which is the
code of the design of r-dimensional flats of AG_m(F_p), and that the kernel
contains the code of r-dimensional subspaces of PG_{m-1}(F_p). An induction
on m now yields the dimensional equality we seek and hence the following

Theorem 5.41 For p a prime, the code over F_p of the design of points
and r-dimensional subspaces of PG_m(F_p) is the projective generalized Reed-
Muller code PFp(m − r, m + 1). Moreover, we have the following exact se-
quence for these codes:

0 → PFp(m − r − 1, m) → PFp(m − r, m + 1) → RFp((m − r)(p − 1), m) → 0.

Remark: The sequence above can also be read as an exact sequence of the
geometric codes and, as such, is

0 → Cp (Q) → Cp (P) → Cp (A) → 0,

where Q is the design of points and r-dimensional subspaces of P Gm−1 (Fp ),


P the design of points and r-dimensional subspaces of P Gm (Fp ) and A the
design of points and r-flats of AGm (Fp ).
We now want to identify the minimum-weight vectors in these geometric
codes. We already know the minimum weights and that among the
minimum-weight vectors one finds the relevant geometric objects: the char-
acteristic functions of the r-flats in the affine case and the characteristic
functions of the r-dimensional projective subspaces in the projective case.
We must, therefore, prove that only these vectors and their scalar multiples
are minimum-weight vectors. We begin with the affine case:

Theorem 5.42 For p a prime, the minimum-weight vectors of RFp((m −
r)(p − 1), m) are the scalar multiples of the characteristic functions of the
r-flats of AG_m(F_p).
Proof: We know that the scalar multiples of the r-flats are minimum-weight
vectors. Moreover, if any minimum-weight vector has as its support an r-
flat, then it clearly must be a scalar multiple of the characteristic function
of that flat. Suppose, therefore, that we have a minimum-weight vector v
whose support, X say, is not an r-flat. Of course, |X| = p^r. Without loss of
generality we may assume that X contains the zero vector. Now, since we
are over a prime field, the set X cannot be closed under addition (for then
it would be a subspace). Let x ∈ X be such that x + X ≠ X. Now the
vector v is a linear combination of characteristic functions of r-flats, i.e.

v = \sum_{S \in \mathcal{S}} a_S v^S,

where \mathcal{S} is a collection of r-flats and the a_S are in F_p. Let w be the translate
of v by x. Then the support of w is x + X, v ≠ w, and

w = \sum_{S \in \mathcal{S}} a_S v^{x+S}.

Since v − w is in the code generated by the differences of the incidence vectors
of parallel r-flats, it has, by Corollary 5.28, weight at least 2p^r. But this is
impossible, since its support is a subset of X ∪ (x + X), which is of cardinality
less than 2p^r since x ∈ X ∩ (x + X). Thus every minimum-weight vector
is supported on an r-flat and hence is a scalar multiple of the characteristic
function of an r-flat. □
We complete our discussion of the geometric codes in the case in which
q = p is a prime by showing that we have the analogous result in the
projective case. In order to do so we first proceed more generally with q
arbitrary and introduce some temporary notation.
Let Ar,m denote the design of points and r-flats of AGm (Fq ) and Pr,m
denote the design of points and r-dimensional subspaces of P Gm (Fq ).
Consider next the subcode Er,m of Cp (Ar,m ) generated by the differ-
ences of incidence vectors of parallel r-flats. Just as in the binary case (see
Section 3.2) Er,m is in the kernel of the projection of Cp (Pr,m ) onto the
coordinates corresponding to the embedded (m − 1)-dimensional projective
space, the image of the projection being Cp (Pr−1,m−1 ). Observe that by
using the (q − 1)-to-1 map of V − {0} onto PG_{m-1}(F_q), where V is the m-
dimensional vector space over F_q defining the projective space, we can pull
the code Cp(P_{r-1,m-1}) back to F_p^{V^*}, where we are writing V^* for V − {0};
this simply amounts to repeating each column (q − 1) times. By adjoining
an overall parity check to this pull-back we get the code in F_p^V that is gen-
erated by the incidence vectors of the r-dimensional subspaces of V. Call
this code P_{r,m}. Viewing Cp(A_{r,m}) and E_{r,m} in this same ambient space we
have, clearly, that

E_{r,m} + P_{r,m} = Cp(A_{r,m}).
This equation points to the reason why the binary case is so easy: when
q = 2, Er,m ⊆ Pr,m and thus we need analyse only the projective geometry
codes.
For the same reason as in the binary case, P_{r+1,m} ⊆ P_{r,m} and, further-
more, P_{r+1,m} ⊆ E_{r,m} since, if T is any (r + 1)-dimensional subspace, S any
r-dimensional subspace contained in it, and v is in T but not in S, then

-v^T = \sum_{a \in F_q,\, a \neq 0} \left( v^S - v^{av+S} \right).

Letting ar,m be the p-rank of Ar,m , pr,m the p-rank of Pr,m and setting
er,m = dim(Er,m ), we have that dim(Pr,m ) = pr−1,m−1 and that

pr−1,m−1 + er,m ≥ ar,m + pr,m−1 , (16)

since the intersection, Pr,m ∩ Er,m , contains Pr+1,m . Further, we have the
following:

Lemma 5.43 Given an embedding of P Gm−1 (Fq ) in P Gm (Fq ), if the first


of the following sequences,

0 → Cp (Pr,m−1 ) → Cp (Pr,m ) → Cp (Ar,m ) → 0

and
0 → Er,m → Cp (Pr,m ) → Cp (Pr−1,m−1 ) → 0,
that arise from the embedding is exact, then so is the second and, moreover,
in that case we have

pr,m = ar,m + pr,m−1 and er,m + pr−1,m−1 = pr,m .

Proof: Clearly, just as in the binary case, the sequences follow easily from
the embedding, and we need only check that the kernels are as described.
That the codes are contained in the kernels is obvious; thus in order to prove
that they are the kernels we must check the dimensions. From the discussion
preceding the lemma, in particular (16), and the second sequence, we have
that
pr,m−1 + ar,m ≤ pr−1,m−1 + er,m ≤ pr,m
and the result follows since, if the first sequence is exact, p_{r,m-1} + a_{r,m} =
p_{r,m}. □
With this machinery in place one can now, for any prime p, imitate the
proof for the binary case: see Theorem 3.13. Note that here we need only
identify the vectors in the projective codes since we have already determined
the minimum-weight vectors in the affine case. We leave to the reader the
proof of the following

Theorem 5.44 For p a prime, the minimum-weight vectors of the code of


the design of points and r-dimensional subspaces of P Gm (Fp ) are the scalar
multiples of the incidence vectors of these subspaces.

Remark: We note that we have thus determined the minimum-weight vec-


tors of the codes PFp (r, m) in the prime case, since these codes are codes of
designs arising from P Gm−1 (Fp ).

5.8 The subfield subcodes


To obtain the codes of the designs coming from affine and projective spaces
over Fq , in the case in which q is a proper prime power, we need to restrict
the codes, RFq ((m − r)(q − 1), m) and PFq ((m − r), m + 1), to subfield
subcodes — also defined in Chapter 1, Section 5.

Definition 5.45 Let C be a linear code over a field E and let F be a subfield
of E. The set C′ of vectors in C, all of whose coordinates lie in F, is called
the subfield subcode of C over F.

It is easy to verify that C′ is a linear code over F and that any per-
mutation of the coordinate positions preserving C also preserves C′. We
are interested here only in the case where F = F_p, the prime subfield of
E = F_q. In what follows q = p^t.

Definition 5.46 Denote by AFq/Fp(ρ, m) the subfield subcode of the generalized
Reed-Muller code RFq(ρ, m), and by PFq/Fp(r, m) the subfield subcode
of the projective generalized Reed-Muller code PFq(r, m).
5 GENERALIZED REED-MULLER CODES 74

Taking first the single-variable approach, P(Z) = \sum_{j=0}^{v-1} c_j Z^j yields a
vector in AFq/Fp(ρ, m) if P(ω^j) ∈ F for all j, and P(0) ∈ F, which is
equivalent to the condition that c_{jp} = (c_j)^p for all j, the subscripts being read
modulo v = q^m − 1 as usual. Writing

V_ρ = {u | 0 ≤ u ≤ q^m − 1, wt_q(up^j) ≤ ρ for j = 0, 1, …, t − 1}

(where up^j is taken reduced modulo q^m − 1, for the same reasons as before),
we have that AFq/Fp(ρ, m) is given by

\left\{ (P(0), \ldots, P(\omega^{v-1})) \;\Big|\; P(Z) = \sum_{u \in V_\rho} c_u Z^u,\ c_u \in F_{q^m},\ c_{up} = (c_u)^p \right\}

and that

dim(AFq/Fp(ρ, m)) = |V_ρ|.
Clearly, by Theorem 5.10, AFq/Fp(ρ, m) will contain the incidence vector of
any (m − r)-flat when ρ ≥ r(q − 1). Its minimum weight d_ρ is bounded by

(q − s)q^{m-r-1} ≤ d_ρ ≤ q^{m-r},

where ρ = r(q − 1) + s and 0 ≤ s < q − 1, and, from Theorem 5.25, attains
the lower bound if there are s distinct w_j′ in E such that the c_k, as defined
there, are in F. In particular, if s = 0, then this holds and d_ρ = q^{m-r}.
Further, in the case s = q − 2 this is also the case: the vector obtained is
the difference of the incidence vectors of two parallel (m − r − 1)-flats, which
is clearly a vector of the subfield subcode.
For the orthogonal code, AFq/Fp(ρ, m)^⊥, clearly we have

AFq/Fp(µ, m) ⊆ AFq/Fp(ρ, m)^⊥,

where µ = m(q − 1) − ρ − 1 = (m − r − 1)(q − 1) + (q − 2 − s) (and ρ = r(q − 1) + s,
and 0 ≤ s < q − 1, as above). Its minimum weight d_ρ^⊥ thus certainly satisfies

d_ρ^⊥ ≤ d_µ ≤ q^{r+1},    (17)

from the above discussion. A lower bound for d_ρ^⊥ follows from the BCH
bound, and some evaluations of these are quoted in Delsarte et al. [18,
Theorem 4.3.1]. In particular, for ρ = r(q − 1) this gives

d_{r(q-1)}^⊥ ≥ (p + q)q^{r-1},
and for ρ = r(q − 1) + (q − 2) = (r + 1)(q − 1) − 1, it gives

d_{r(q-1)+(q-2)}^⊥ ≥ q^{r+1},

which, from (17), yields

d_{r(q-1)+(q-2)}^⊥ = q^{r+1}.

For the codes of designs arising from projective geometries, we must take
the subfield subcodes of the codes PFq(m − r, m + 1). As we have already
indicated, the minimum weight of this code is q^r + q^{r-1} + \cdots + 1 and the
incidence vectors of the projective subspaces of dimension r are minimum-
weight vectors.
Our interest is in the codes given by the designs of r-flats of the affine
spaces and r-dimensional subspaces of the projective spaces. Just as in the
binary case we must first analyse the codimension-1 case — in the projective
case, the design of points and hyperplanes of a projective space. This case
was, historically, the one given the most attention and was introduced for
projective planes by Prange, with Rudolph considerably enriching the subject
and making serious conjectures. The first systematic treatment in the
case of planes was given by Graham and MacWilliams [22]. These results
were generalized to higher dimensions by Goethals and Delsarte [21] and
MacWilliams and Mann [39]; in particular, these authors computed the dimension
of the code of the design of points and hyperplanes of an arbitrary
projective space. The results are highly diverse and some of the proofs very
technical: thus we only state what we need and show the reader how to
construct these codes, using results of Delsarte et al. [18]. More recently,
Rose [47] has given elegant new proofs of some of the results, and Brouwer
and Wilbrink [10, Theorem 4.8] have given a simple method to compute the
p-ranks of the codes in question. We refer the reader also to a fuller account
in Assmus and Key [2].
We thus now simply state the general theorem. Notice that everything
stated is true also for p = 2, but that Theorem 3.14 gives more precise
information in that case.

Theorem 5.47 Let m be any positive integer, q = p^t where p is a prime,
and let 0 ≤ r ≤ m.

(1) The code over F_p of the design of points and r-flats in the affine geometry
AG_m(F_q) is AFq/Fp((m − r)(q − 1), m). It has minimum weight
q^r and the minimum-weight vectors are the multiples of the incidence
vectors of the r-flats. The p-rank is given by the cardinality of the set
of integers u satisfying

• 0 ≤ u ≤ q^m − 1
• wt_q(up^j) ≤ (m − r)(q − 1), j = 0, 1, …, t − 1

where up^j is reduced modulo q^m − 1. The orthogonal code satisfies

AFq/Fp((m − r)(q − 1), m)^⊥ ⊇ AFq/Fp(r(q − 1) − 1, m)

and

AFq/Fp(r(q − 1) − 1, m) = ⟨v^M − v^N | M, N parallel (m − r)-flats in V⟩.

This latter code has minimum weight 2q^{m-r}, with minimum-weight vectors
the multiples of the differences of the incidence vectors of two parallel
(m − r)-flats. The minimum weight, d_{(m-r)(q-1)}^⊥, of the orthogonal
code satisfies

(q + p)q^{m-r-1} ≤ d_{(m-r)(q-1)}^⊥ ≤ 2q^{m-r}.

When q = p the subfield codes are the generalized Reed-Muller codes,
i.e.

AFp/Fp((m − r)(p − 1), m) = RFp((m − r)(p − 1), m)

and

RFp((m − r)(p − 1), m)^⊥ = RFp(r(p − 1) − 1, m).

(2) The code over F_p of the design of points and r-dimensional subspaces
of the projective geometry PG_m(F_q) is PFq/Fp(m − r, m + 1). It has
minimum weight (q^{r+1} − 1)/(q − 1) and the minimum-weight vectors
are the multiples of the incidence vectors of the blocks. The p-rank is
given by the cardinality of the set of integers u satisfying

• 0 ≤ u ≤ q^{m+1} − 1
• (q − 1) divides u
• wt_q(up^j) ≤ (m − r)(q − 1), j = 0, 1, …, t − 1

where up^j is reduced modulo q^{m+1} − 1. The orthogonal code satisfies

PFq/Fp(m − r, m + 1)^⊥ ⊇ PFq/Fp(r, m + 1) ∩ ⟨1⟩^⊥

and has minimum weight at least (q^{m-r+1} − 1)/(q − 1) + 1; if q = p
the subfield codes are the non-primitive generalized Reed-Muller codes
and this becomes an equality.
and this becomes an equality.

In particular, the code over F_p of the design of points and hyperplanes
in the affine geometry AG_m(F_q) is AFq/Fp(q − 1, m), and the code over F_p of
the design of points and hyperplanes of the projective geometry PG_m(F_q)
is PFq/Fp(1, m + 1).
In order to actually construct the subfield subcodes, the m-variable ap-
proach is once again the most straightforward; it is described fully in [18].
Before describing the construction, we need some notation: if k satisfies
0 ≤ k ≤ q − 1, and k = \sum_{i=0}^{t-1} k_i p^i, where 0 ≤ k_i ≤ p − 1, then write

[pk] = k_{t-1} + k_0 p + \cdots + k_{t-2} p^{t-1} = pk − k_{t-1}(q − 1),    (18)

i.e. [pk] = pk mod (q − 1) if k < q − 1, and [pk] = q − 1 if k = q − 1.
Further, write [k] = k.
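The map k ↦ [pk] of Equation (18) is simply a cyclic shift of the t p-ary digits of k, and the two descriptions can be checked against each other. A sketch (Python; the function names are ours):

```python
def bracket(k, p, t):
    """[pk] as in Equation (18), via the digit description: cyclically
    shift the t p-ary digits of k (here q = p**t and 0 <= k <= q - 1)."""
    digits = [(k // p ** i) % p for i in range(t)]   # k = sum k_i p^i
    return digits[t - 1] + sum(digits[i] * p ** (i + 1) for i in range(t - 1))

def bracket_closed(k, p, t):
    """[pk] via the closed form: pk mod (q - 1) for k < q - 1, else q - 1."""
    q = p ** t
    return q - 1 if k == q - 1 else (p * k) % (q - 1)
```

For q = 4 (p = 2, t = 2), for instance, [2·2] = 1 while [2·3] = 3.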

Theorem 5.48 For any ρ such that 0 ≤ ρ ≤ m(q − 1), the code

AFq/Fp(ρ, m)

consists of the following polynomial functions, in terms of the usual basis of
characteristic functions on F_q^m:

p(x_1, \ldots, x_m) = \sum_{l_1, \ldots, l_m} d(l_1, l_2, \ldots, l_m)\, x_1^{l_1} x_2^{l_2} \cdots x_m^{l_m},

where 0 ≤ l_i ≤ q − 1, d(l_1, l_2, …, l_m) ∈ F_q, and

(1) \sum_{i=1}^{m} [p^j l_i] ≤ ρ, for j = 0, 1, …, t − 1;

(2) d([p^j l_1], …, [p^j l_m]) = (d(l_1, …, l_m))^{p^j}, for j = 0, 1, …, t − 1.

Example 5.49 Take m = 2 and q = 4 = 2^2. Thus t = 2 and 0 ≤
ρ ≤ 6. Taking ρ = 3 will give AF4/F2(3, 2) = C2(AG2(F4)). Then V_3 =
{0, 1, 2, 3, 4, 6, 8, 9, 12} and so dim(AF4/F2(3, 2)) = 9. If ω is a primitive ele-
ment for E = F4, a root of X^2 + X + 1 = 0, then polynomials that generate
the code, according to Theorem 5.48, are

{1, x_1^3, x_2^3, x_1 + x_1^2, x_2 + x_2^2, ωx_1 + ω^2 x_1^2, ωx_2 + ω^2 x_2^2, x_1x_2^2 + x_1^2x_2, ωx_1x_2^2 + ω^2 x_1^2x_2}.

A generator matrix can be constructed from these polynomials, and its entries
are all, of course, in F_2. For example, let K = F16 be constructed from E
using the primitive polynomial X^2 + ωX + ω, let a be a root of this polynomial,
and order the vectors of E^2 in the usual way, i.e. 0, 1, a, a^2, …, a^{14}. Then the
codeword obtained from the polynomial ωx_2 + ω^2 x_2^2 is

(0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0).
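The set V_3 of this example can be recomputed directly from the definition of V_ρ. A sketch (Python; the function names are ours), with up^j reduced modulo q^m − 1 and q^m − 1 itself kept fixed, as in the text:

```python
def wt(u, q):
    # q-weight: digit sum of u written in base q
    s = 0
    while u > 0:
        s += u % q
        u //= q
    return s

def V(p, t, m, rho):
    """V_rho = {u : 0 <= u <= q^m - 1, wt_q(u p^j) <= rho for j = 0..t-1},
    with u p^j reduced modulo q^m - 1 (and q^m - 1 left fixed)."""
    q = p ** t
    v = q ** m - 1
    def red(x):
        return v if x % v == 0 and x != 0 else x % v
    return [u for u in range(q ** m)
            if all(wt(red(u * p ** j), q) <= rho for j in range(t))]
```

Here `V(2, 2, 2, 3)` returns the nine exponents {0, 1, 2, 3, 4, 6, 8, 9, 12} listed above.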

The codes PFq/Fp(r, m) can be constructed in a manner analogous to the
primitive case, as in Theorem 5.48. With the added condition that (q − 1)
divides \sum_i l_i, the codewords are given by the first n = (q^m − 1)/(q − 1)
coordinates, as in Definition 5.35.

Example 5.50 For m = 3, q = 4 and n = (4^3 − 1)/(4 − 1) = 21, taking
r = 1 produces PF4/F2(1, 3), the binary code of the projective plane
of order 4, PG_2(F_4). If ω is a primitive element for E = F4, then the
polynomials that generate (over F2) PF4/F2(1, 3) are

{1, x_1^3, x_2^3, x_3^3, x_1x_2^2 + x_1^2x_2, ωx_1x_2^2 + ω^2x_1^2x_2, x_2x_3^2 + x_2^2x_3, ωx_2x_3^2 + ω^2x_2^2x_3, x_3x_1^2 + x_3^2x_1, ωx_3x_1^2 + ω^2x_3^2x_1}.

The dimension is thus 10, and a generator matrix is given by the
codewords corresponding to each of these ten polynomials p(x_1, …, x_m),
appropriately evaluated, taking X^3 + ω^2X^2 + ωX + ω for the generating
polynomial of F64 over F4. For example, the codeword corresponding to
p(x_1, …, x_m) = ωx_1x_2^2 + ω^2x_1^2x_2 is

(0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0).

This is the vector v^L − v^M, where L and M are the lines L = {3, 4, 7, 17, 19} =
(1, 1, 0)^t and M = {3, 5, 10, 11, 14} = (1, ω, 0)^t, and the points are labelled
1, 2, …, 21 in the order given^{15}.
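The dimension 10 here agrees with the count in Theorem 5.47(2); for the plane PG_2(F_4), points and lines, one takes m = 2 and r = 1 in that theorem, so the code is PF4/F2(1, 3). A sketch of that count (Python; the function names are ours):

```python
def wt(u, q):
    # q-weight: digit sum of u written in base q
    s = 0
    while u > 0:
        s += u % q
        u //= q
    return s

def prank_PG(p, t, m, r):
    """p-rank of the design of points and r-dimensional subspaces of
    PG_m(F_q), q = p^t, i.e. dim PF_{q/p}(m - r, m + 1), counted as in
    Theorem 5.47(2)."""
    q = p ** t
    v = q ** (m + 1) - 1
    def red(x):
        # reduce modulo q^{m+1} - 1, keeping v itself fixed
        return v if x % v == 0 and x != 0 else x % v
    bound = (m - r) * (q - 1)
    return sum(1 for u in range(q ** (m + 1))
               if u % (q - 1) == 0
               and all(wt(red(u * p ** j), q) <= bound for j in range(t)))
```

Thus `prank_PG(2, 2, 2, 1)` gives the dimension 10 of this example, and `prank_PG(3, 1, 2, 1)` gives the dimension 7 of Example 5.37.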

Note: The designs formed from affine or projective geometries may happen
to have orders divisible by primes other than the characteristic prime for the
geometry. The codes for such primes will not be of any interest — a result
that follows from work of Mortimer [45] on the modular representations of
doubly-transitive groups.
^{15} Computations here and elsewhere were done with Cayley [9] and Magma [11].
5.9 Formulas for p-ranks


The dimensions of the codes arising from finite geometries are given in the
preceding section in terms of the number of integers with q-weight satisfying
certain properties, and this may very well be the most efficient way
to calculate the dimension in the general case, since there seems to be no
simple formula that will cover all possibilities — except in the case of the
Reed-Muller codes, where q = p = 2. There are, however, simplifications in
certain cases, and we give some of these below.
But we first give Hamada’s [25, 26] rather complicated general formula:

Result 5.51 (Hamada) Let q = p^t and let D denote the design of points
and r-dimensional subspaces of the projective geometry PG_m(F_q), where
0 < r < m. Then the p-rank of D is given by

\sum_{s_0} \cdots \sum_{s_{t-1}} \; \prod_{j=0}^{t-1} \; \sum_{i=0}^{L(s_{j+1}, s_j)} (-1)^i \binom{m+1}{i} \binom{m + s_{j+1}p - s_j - ip}{m},

where s_t = s_0 and the summations are taken over all integers s_j (for j =
0, 1, …, t − 1) such that

r + 1 ≤ s_j ≤ m + 1, and 0 ≤ s_{j+1}p − s_j ≤ (m + 1)(p − 1),

and

L(s_{j+1}, s_j) = \lfloor (s_{j+1}p - s_j)/p \rfloor,

i.e. the greatest integer not exceeding (s_{j+1}p − s_j)/p.
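The formula as stated can be checked by machine against small cases computed elsewhere in the text: the code of PG_2(F_4) has 2-rank 10 (Example 5.50), PG_2(F_3) gives 3-rank 7 (Example 5.37), and the binary code of the Fano plane PG_2(F_2) is well known to have 2-rank 4. A sketch (Python; the function names are ours):

```python
from itertools import product
from math import comb

def hamada(p, t, m, r):
    """p-rank of the design of points and r-dimensional subspaces of
    PG_m(F_q), q = p^t, computed from Result 5.51 (Hamada)."""
    total = 0
    for s in product(range(r + 1, m + 2), repeat=t):
        # d[j] = s_{j+1} p - s_j, with indices taken modulo t (s_t = s_0)
        d = [s[(j + 1) % t] * p - s[j] for j in range(t)]
        if any(x < 0 or x > (m + 1) * (p - 1) for x in d):
            continue
        term = 1
        for x in d:
            # inner alternating sum, with L = floor(x / p)
            term *= sum((-1) ** i * comb(m + 1, i) * comb(m + x - i * p, m)
                        for i in range(x // p + 1))
        total += term
    return total
```

For instance, `hamada(2, 2, 2, 1)` reproduces the 2-rank 10 of the code of PG_2(F_4).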

This formula is deduced in [25, 26]. It simplifies in certain cases, in


particular in the case of designs of points and hyperplanes, when the formula
becomes that found earlier by Graham and MacWilliams [22] for planes, and
by Goethals and Delsarte [21], MacWilliams and Mann [39], and Smith [50],
for general m. It becomes in that case:

Result 5.52 If q = p^t, the p-rank of the design of points and hyperplanes
of PG_m(F_q) is

\binom{m+p-1}{m}^t + 1.

If q = p^t, the p-rank of the design of points and (m − 1)-flats of the affine
geometry AG_m(F_q) is

\binom{m+p-1}{m}^t.
Observe that the passage from the projective dimension to the affine dimension
is, in the codimension-1 case, quite easy: since the minimum weight is
given by the number of points of a hyperplane, and since the incidence vector
of a hyperplane is a minimum-weight vector, one simply projects onto the
affine points off one such hyperplane — losing one dimension.
In the case q = p = 2, the codes are Reed-Muller codes, and the 2-ranks are
explicitly given in Theorem 3.14. When q = p in general, Theorem 5.5 gives
the p-rank; we restate it here for the affine geometry designs.
Result 5.53 For any r such that 0 ≤ r ≤ m,

\dim(RF_p((m-r)(p-1), m)) = \sum_{k=0}^{m} (-1)^k \binom{m}{k} \binom{m + (m-r)(p-1) - kp}{m}.
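The alternating sum can be verified against the defining count, the number of exponents u with wt_p(u) ≤ (m − r)(p − 1). A sketch (Python; the function names are ours), with the usual convention that a binomial coefficient with negative upper entry is zero:

```python
from math import comb

def dim_RF_formula(p, r, m):
    """dim RF_p((m-r)(p-1), m) by the alternating sum of Result 5.53."""
    rho = (m - r) * (p - 1)
    return sum((-1) ** k * comb(m, k) * comb(m + rho - k * p, m)
               for k in range(m + 1)
               if m + rho - k * p >= 0)   # binomial vanishes for negative tops

def dim_RF_count(p, r, m):
    """Direct count: |{u : 0 <= u <= p^m - 1, wt_p(u) <= (m-r)(p-1)}|."""
    rho = (m - r) * (p - 1)
    def wt(u):
        s = 0
        while u > 0:
            s += u % p
            u //= p
        return s
    return sum(1 for u in range(p ** m) if wt(u) <= rho)
```

For p = 2 and r = 1 this recovers the familiar Reed-Muller dimension; e.g. both functions give 7 for (p, r, m) = (2, 1, 3).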

A simplification of this for q = p prime and r = 1, when we have the


design of points and lines, was obtained by Ceccherini and Hirschfeld [12];
a general summation formula in the prime case but for any r has been
established by Hirschfeld and Shaw [27] and is as follows:
Result 5.54 For p a prime, the p-rank of the design of points and r-
dimensional projective subspaces of PG_m(F_p) is

p^m + p^{m-1} + \cdots + 1 \; - \; \sum_{i=0}^{r-1} (-1)^i \binom{(r-i)(p-1)-1}{i} \binom{m + (r-i)p - r}{m-i}.
For r = 1 Result 5.54 has a simple form, which follows from Theorem 5.38
and the simple form for the dimension of points and hyperplanes; note that
the derivation of the formula below is particularly easy, since one can utilize
Corollary 4.19 to get the dimension of the projective code of points and
hyperplanes — as explained above — and not even invoke Theorem 5.38,
but instead note that the complements of the hyperplanes give the necessary
orthogonal (with a compensating loss of the gained dimension). Thus the
code over F_p of the design of points and lines of the projective geometry
PG_m(F_p), where p is a prime, has dimension

p^m + p^{m-1} + \cdots + 1 - \binom{m+p-1}{m}.
These simple derivations for the case q = p a prime highlight the difficulties
of the prime-power case.
As we have seen in Corollary 4.20:

Result 5.55 The p-rank of the design of points and lines of the affine ge-
ometry AG_m(F_p), where p is a prime, is

p^m - \binom{m+p-2}{m}.

In the case q = p = 3 and r = 1, we have a Steiner triple system.
Hence we have

Result 5.56 The 3-rank of the Steiner triple system of points and lines of
AG_m(F_3) is 3^m − 1 − m.
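Both of these closed forms agree with the weight count that defines the dimension of RF_p((m − 1)(p − 1), m), the code of points and lines. A sketch (Python; the function names are ours):

```python
from math import comb

def rank_lines_count(p, m):
    """Direct count of dim RF_p((m-1)(p-1), m), the p-rank of the design
    of points and lines of AG_m(F_p)."""
    rho = (m - 1) * (p - 1)
    def wt(u):
        # digit sum of u written in base p
        s = 0
        while u > 0:
            s += u % p
            u //= p
        return s
    return sum(1 for u in range(p ** m) if wt(u) <= rho)

def rank_lines_formula(p, m):
    # Result 5.55: p^m - binom(m+p-2, m); for p = 3 this is 3^m - 1 - m
    return p ** m - comb(m + p - 2, m)
```

For p = 3 the binomial is C(m + 1, m) = m + 1, giving the Steiner triple system value 3^m − 1 − m of Result 5.56.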

Another particular case that gave a bound for the p-rank of a translation
plane is the following, from Key and Mackenzie [33]:

Result 5.57 If D is the design of points and m-flats in AG_{2m}(F_p), where
p is a prime, then the p-rank of D is given by

\dim(RF_p(m(p-1), 2m)) = \mathrm{rank}_p(D) = \sum_{i=0}^{m-1} (-1)^i \binom{2m}{i} \binom{m + (m-i)p}{2m}.

There are also other cases where simpler arguments give the p-rank and
even a basis in terms of incidence vectors of the geometric objects involved.
For example, Bagchi and Sastry [4] have produced a simple derivation of the
dimension of the binary code of the design of points and planes in PG_3(F_{2^t})
by finding a set of planes whose incidence vectors form a basis:

Result 5.58 (Bagchi and Sastry) Let D be the design of points and
planes in PG_3(F_{2^t}) and let O be an ovoid in PG_3(F_{2^t}). Then the incidence
vectors of the tangent planes to the ovoid form a basis for C_2(D). It
follows that dim(C_2(D)) = 2^{2t} + 1.

Another result which describes an explicit basis of incidence vectors (in
this case lines) of the affine plane AG_2(F_p), where p is a prime, has
been obtained by Moorhouse [43]. In this case the dimension of the code
over F_p is

$$\binom{p+1}{2} = \sum_{i=1}^{p} i,$$

and a basis can be had by ordering, arbitrarily,
any p of the p + 1 parallel classes and taking one line from the first, two
from the second, etc., even the choices of the lines being made arbitrarily;
the basis consists of the incidence vectors of the selected lines.
A conjecture of Hamada (see [25, 26]), that the p-rank of the design of
points and r-dimensional flats of a finite-geometry design over a field of char-
acteristic p is always the smallest for designs with the same parameters and
also characterizes such designs, is false in general, the counter-examples first
occurring for 2-(31, 7, 7) designs: see Tonchev [51] and Delsarte and Goethals
[21]. However, the minimality of the p-rank still appears to be true, and the
conjecture still stands for designs of points and hyperplanes and also for de-
signs of points and lines; moreover, when p = q = 2, this limited conjecture
is valid.

References
[1] E. Artin. Geometric Algebra. New York: Wiley Interscience, 1957.

[2] E. F. Assmus, Jr. and J. D. Key. Designs and their Codes. Cambridge
University Press, 1992. Cambridge Tracts in Mathematics, Vol. 103
(Second printing with corrections, 1993).

[3] Edward F. Assmus, Jr. and Jennifer D. Key. Codes and finite geome-
tries. Technical report, INRIA, 1993. Report No. 2027.

[4] Bhaskar Bagchi and N. S. Narasimha Sastry. Even order inversive
planes, generalized quadrangles and codes. Geom. Dedicata, 22:137–147, 1987.

[5] Thierry Berger and Pascale Charpin. The automorphism group of Gen-
eralized Reed-Muller codes. Discrete Math., 117:1–17, 1993.

[6] S. D. Berman. On the theory of group codes. Kibernetika, 3(1):31–39, 1967.

[7] Richard E. Blahut. Theory and Practice of Error Control Codes.
Addison-Wesley, 1983.

[8] Ian F. Blake and Ronald C. Mullin. The Mathematical Theory of Cod-
ing. New York: Academic Press, 1975.
HAMMING, GOLAY AND REED–MULLER CODES 31

It is interesting to note that the submatrix B of the parity-check matrix of C24 satisfies B = B^T.
This means that code C24 is a self-dual code. Details of self-dual codes are not covered in
this book. Interested readers are referred to (MacWilliams and Sloane 1977; Wicker 1995).
In program golay24.c, the encoding is performed by recurrence with H , as indicated
in Equation (1.20). As before, let wtH (x̄) denote the Hamming weight of a vector x̄.
The decoding steps in arithmetic decoding of the extended (24, 12, 8) Golay code are
as follows (Vanstone and van Oorschot 1989; Wicker 1995):

1. Compute the syndrome s̄ = r̄H .

2. If wtH(s̄) ≤ 3, then set ē = (s̄, 0̄) and go to step 8.

3. If wtH(s̄ + rowi) ≤ 2 for some row rowi of B, then set ē = (s̄ + rowi, x̄i), where x̄i is a 12-bit vector with
only the i-th coordinate nonzero, and go to step 8.

4. Compute s̄B.

5. If wtH(s̄B) ≤ 3, then set ē = (0̄, s̄B) and go to step 8.

6. If wtH(s̄B + rowi) ≤ 2 for some row rowi of B, then set ē = (x̄i, s̄B + rowi), with x̄i defined as above, and
go to step 8.

7. r̄ is corrupted by an uncorrectable error pattern, set error failure flag. End of decoding.
8. Set ĉ = r̄ + ē. End of decoding.

2.3 Binary Reed–Muller codes


Binary RM codes constitute a family of error correcting codes that are easy to decode using
majority-logic (ML) circuits. In addition, codes in this family are known to have relatively
simple and highly structured trellises (Lin et al. 1998). Details about trellises of linear block
codes can be found in Chapter 7.
An elegant definition of binary RM codes is obtained with the use of binary polyno-
mials (or Boolean functions). With this definition, RM codes become close relatives of
Bose–Chaudhuri–Hocquenghem (BCH) codes and Reed–Solomon codes, all members of
the class of polynomial codes.2

2.3.1 Boolean polynomials and RM codes


This section closely follows the development of (MacWilliams and Sloane 1977). Let

f (x1 , x2 , . . . , xm )

denote a Boolean function on m binary-valued variables x1 , x2 , . . . , xm . It is well known


that such a function can be specified by a truth table. The truth table lists the value of f
for all 2^m combinations of values of its arguments. All the usual Boolean operations,
such as AND (conjunction), OR (disjunction) and NOT (negation), can be applied to
Boolean functions.
2 Polynomial codes are presented in Section 3.4.
Example 2.3.1 Consider the function f (x1 , x2 ) with the following truth table:

x2 0 0 1 1
x1 0 1 0 1
f (x1 , x2 ) 0 1 1 0

Then,
f (x1 , x2 ) = (x1 AND NOT(x2 )) OR (NOT(x1 ) AND x2 ) .

Associated with each Boolean function f, let f̄ denote the binary vector of length 2^m which
is obtained from evaluating f at all possible 2^m values of the m variables x1, x2, . . . , xm.
In Example 2.3.1, f̄ = (0110), where the convention taken for ordering the bit positions
of f̄ is in accordance with a binary representation of integers, with x1 being the LSB and
xm the MSB.
Also note that a Boolean function can be written directly from its truth table to get the
disjunctive normal form (DNF). Using the DNF, any Boolean function can be expressed as
the sum3 of the 2^m elementary functions 1, x1, x2, . . . , xm, x1x2, . . . , x1x2 · · · xm, such that

f̄ = a0·1̄ + a1x̄1 + a2x̄2 + · · · + amx̄m + a12x̄1x̄2 + · · · + a12···mx̄1x̄2 · · · x̄m,   (2.4)

where the term a0·1̄ accounts for the independent (degree-0) term. For Example 2.3.1 above,
f̄ = x̄1 + x̄2.
A binary RM (2^m, k, 2^{m−r}) code, denoted RM(r, m), is defined as the set of vectors
associated with all Boolean functions of degree up to r in m variables. RM(r, m) is also
known as the r-th order RM code of length 2^m. The dimension of RM(r, m) can easily be
shown to be equal to

$$k = \sum_{i=0}^{r} \binom{m}{i},$$

which is the number of monomials of degree up to r that can be formed with m variables.
In view of Equation (2.4), a generator matrix for RM(r, m) is formed by taking as rows
the vectors associated with the k Boolean functions which can be expressed as polynomials
of degree up to r in m variables.

Example 2.3.2 The first order RM code of length 8, RM(1, 3), is an (8, 4, 4) binary code,
and can be constructed from Boolean functions of degree up to one in three variables:
{1, x1 , x2 , x3 }, so that

1̄ = 1 1 1 1 1 1 1 1
x̄1 = 0 0 0 0 1 1 1 1
x̄2 = 0 0 1 1 0 0 1 1
x̄3 = 0 1 0 1 0 1 0 1

3 “sum” means logical “XOR” and “multiplication” means logical “AND” in this context.
A generator matrix for RM(1, 3) is thus given by

G(1, 3) = [ 1̄ ; x̄1 ; x̄2 ; x̄3 ] =

1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1      (2.5)

Note that code RM(1, 3) can be also obtained from a Hamming (7, 4, 3) code by append-
ing at the end of each code word an overall parity-check bit. The only difference between
the extended Hamming code and RM(1, 3) will be a possible permutation of bit (column)
positions.

Dual codes of RM codes are also RM codes


It can be shown that RM(m − r − 1, m) is the dual code of RM(r, m). In other words, the
generator matrix of RM(m − r − 1, m) can be used as a parity-check matrix of RM(r, m).

2.3.2 Finite geometries and majority-logic decoding


RM codes can be formulated in terms of a finite geometry. A Euclidean geometry EG(m, 2)
of dimension m over GF(2) consists of 2^m points, which are all the binary vectors of length
m. Note that the columns of the matrix formed by the last three rows of the generator matrix
of RM(1, 3), see Example 2.3.2 above, are the 8 points of EG(3, 2). If the zero point is
deleted, then a projective geometry PG(m − 1, 2) is obtained. The reader is referred to (Lin
and Costello 2005) for an excellent treatment of finite geometries and RM codes. Finite
geometry codes are in effect a generalization of RM codes.4
The connection between codes and finite geometries can be introduced as follows:
consider EG(m, 2). The columns of the matrix with rows x̄1, x̄2, . . . , x̄m are taken as the
coordinates of the points of the geometry EG(m, 2). Then there is a one-to-one correspondence
between the components of binary vectors of length 2^m and the points of EG(m, 2). In particular, a
subset of EG(m, 2) can be associated with each binary vector w̄ = (w1, w2, . . . , w_{2^m}) of
length 2^m, by interpreting w̄ as selecting the i-th point whenever wi = 1. Stated otherwise, w̄ is
an incidence vector.
Binary RM codes can then be defined as follows: the code words of RM(r, m) are the
incidence vectors of all subspaces (i.e., linear combinations of points) of dimension m − r
in EG(m, 2) (Theorem 8 of (MacWilliams and Sloane 1977)). From this it follows that the
number of minimum-weight code words of RM(r, m) is

$$A_{2^{m-r}} = 2^r \prod_{i=0}^{m-r-1} \frac{2^{m-i}-1}{2^{m-r-i}-1}.  \qquad (2.6)$$

The code obtained by deleting (puncturing) the coordinate corresponding to

x1 = x2 = · · · = xm = 0

from all the code words of RM(r, m) is the binary cyclic RM(r, m) code, which has

$$A_{2^{m-r}-1} = \prod_{i=0}^{m-r-1} \frac{2^{m-i}-1}{2^{m-r-i}-1}  \qquad (2.7)$$

minimum-weight code words.

4 See Section 3.4.


In terms of decoding, it turns out that RM codes can be decoded with ML decoding.
The idea is the following: the parity-check matrix induces 2n−k parity-check equations.
Designing an ML decoder is selecting a subset of the parity-check equations in such a way
that a majority vote is taken on the value of a specific code position. As an illustration,
consider again the RM(1, 3) code of Example 2.3.2 above.

Example 2.3.3 Let v̄ = ūG = (v1, v2, v3, v4, v5, v6, v7, v8) be a code word in RM(1, 3). As
mentioned before, in this case Equation (2.5) is also a parity-check matrix, since r = 1 and
m − r − 1 = 1, and the code is self-dual. All 15 possible nonzero combinations of parity-
check equations (rows of H) are the following:

v1 + v 2 + v 3 + v 4 + v 5 + v 6 + v 7 + v 8 =0
v5 + v6 + v7 + v8 =0
v3 + v4 + v7 + v8 =0
v2 + v4 + v6 + v8 =0
v1 + v2 + v3 + v4 =0
v1 + v2 + v5 + v6 =0
v1 + v3 + v5 + v7 =0
v3 + v4 + v5 + v6 =0
v2 + v4 + v5 + v7 =0
v2 + v3 + v6 + v7 =0
v1 + v2 + v7 + v8 =0
v1 + v3 + v6 + v8 =0
v1 + v4 + v5 + v8 =0
v2 + v3 + v5 + v8 =0
v1 + v4 + v6 + v7 =0 (2.8)

The reader is invited to verify that the sum vi + vj of every pair of code bits vi, vj,
with i ≠ j, appears in exactly four equations. Whenever a set of equations includes a term
vi + vj, but no other sum of a pair appears more than once, the parity checks involved are
said to be orthogonal on positions vi and vj.
It is now shown how a single error can be corrected. Let

r̄ = v̄ + ē = (r1 , r2 , r3 , r4 , r5 , r6 , r7 , r8 )

denote a received vector after code word v̄ is transmitted over a BSC. Suppose that a single
error is to be corrected in the fifth position, v5 . A procedure to design an ML decoder for
this case is as follows:
Two equations involving the term vi + v5 are selected, with i ≠ 5, and another set of
two equations with the term vj + v5, j ≠ 5, with i ≠ j. Select (arbitrarily, as long as i ≠ j
and both are different from 5), say, i = 3, j = 4. There are four parity checks orthogonal on the
term v3 + v5. Select any two of them. Do the same for the term v4 + v5.
The syndromes associated with these equations are denoted S1 and S2 for v3 + v5 and
S3 and S4 for v4 + v5 ,

S1 = r1 + r3 + r5 + r7
S2 = r3 + r4 + r5 + r6
S3 = r2 + r4 + r5 + r7
S4 = r1 + r4 + r5 + r8   (2.9)
Because v̄ is a code word in RM(1, 3), the set of equations (2.9) is equivalent to

S1 = e1 + e3 + e5 + e7
S2 = e3 + e4 + e5 + e6
S3 = e2 + e4 + e5 + e7
S4 = e1 + e4 + e5 + e8   (2.10)
Since S1, S2, S3 and S4 are orthogonal on e3 + e5 and e4 + e5, a new pair of equations
orthogonal on e5 can be formed as

Ŝ1 = e3 + e5
Ŝ2 = e4 + e5   (2.11)

where Ŝ1 and Ŝ2 denote the ML estimates of e3 + e5 and e4 + e5 obtained from the
syndromes in (2.9). Equations (2.11) are orthogonal on e5 and consequently the value of e5
can be obtained by a majority vote, for example, by setting it to the output of a logic AND
gate with inputs Ŝ1 and Ŝ2.
Suppose that code word v̄ = (11110000) is transmitted and r̄ = (11111000) is received.
Then (2.9) gives:

S1 = r1 + r3 + r5 + r7 = 1
S2 = r3 + r4 + r5 + r6 = 1
S3 = r2 + r4 + r5 + r7 = 1
S4 = r1 + r4 + r5 + r8 = 1   (2.12)

Thus both e3 + e5 and e4 + e5 are estimated as having value equal to ‘1’. From (2.11),
it is concluded that e5 = 1, and the estimated error vector is ē = (00001000), from which
the estimated code word is v̂ = r̄ + ē = (11110000). This shows how an error in the fifth
position can be corrected with a two-step ML decoder.
In the previous example, it was shown how an error in a specific position of an RM(1, 3)
code can be corrected. A similar procedure can be applied to every position in the code.
Therefore, a total of eight ML estimates will be obtained.
In general, an RM(r, m) code can be decoded with an (r + 1)-step ML decoder capable
of correcting any combination of up to ⌊(2^{m−r} − 1)/2⌋ random errors (Lin and Costello
2005; MacWilliams and Sloane 1977).
In addition, it is next shown that a cyclic RM (r, m) code is simpler to decode. The
argument is as follows. It is shown in Chapter 3 that in a cyclic code C, if (v1 , v2 , . . . , vn )
is a code word of C, then its cyclic shift (vn , v1 , . . . , vn−1 ) is also a code word of C.
Therefore, once a particular position can be corrected with ML decoding, all the other
positions can also be corrected with the same algorithm (or hardware circuit) by cyclically
shifting received code words, until all n positions have been tried.

Figure 2.1 A majority-logic decoder for a cyclic RM (1, 3) code.

Example 2.3.4 In this example, a decoder for the cyclic RM(1, 3) code, a binary Hamming
(7, 4, 3) code, is derived. To obtain the parity-check equations from those of the RM(1, 3)
code, remove the coordinate v1, for which x1 = x2 = · · · = xm = 0, from all equations. Let the
code words of the cyclic RM(1, 3) code be indexed by relabeling the code word elements:

(v2, v3, v4, v5, v6, v7, v8) → (v1, v2, v3, v4, v5, v6, v7).

As before, an ML decoder for correcting an error in an arbitrary position (say, the fifth
position again) can be derived. This can be shown to result in the following seven nonzero
parity-check equations:

v 1 + v2 + v3 + v5 =0
v2 + v3 + v4 + v6 =0
v3 + v4 + v5 + v7 =0
v1 + v4 + v5 + v6 =0
v2 + v5 + v6 + v7 =0
v1 + v3 + v6 + v7 =0
v1 + v2 + v4 + v7 =0 (2.13)

In a manner similar to the previous example, the syndromes S1 and S2 below are orthogonal
on v4 and v5 , and S2 and S3 are orthogonal on v5 and v6 :

S1 = e 3 + e 4 + e 5 + e 7
S2 = e 1 + e 4 + e 5 + e 6
S3 = e 2 + e 5 + e 6 + e 7 (2.14)
Based on the estimates of e4 + e5 and e5 + e6, two additional equations orthogonal on e5 can
be formed to give the final estimate,

Ŝ1 = e4 + e5
Ŝ2 = e5 + e6   (2.15)

where Ŝ1 and Ŝ2 denote the ML estimates obtained from (2.14). This results in the
circuit shown in Figure 2.1. The circuit operates as follows. Initially, the contents of the
seven-bit register are set to zero. Suppose that a single error is contained in the received word
in position i, for 1 ≤ i ≤ 7. At each clock cycle, the contents of the register are cyclically
shifted to the right by one position. Time, in clock cycles, is denoted by the subindex n in
the following.
Consider first the case i = 1. That is, there is an error in the first code word position.
After three cycles, the error is contained in register 5 (v5 ). The output of the ML circuit is set
to en = 1. Four cycles later (a total of seven cycles), the first received bit is output and the
error is corrected. Consider now the case i = 7. After nine cycles, the error is detected and
en = 1. Again, four cycles later (total 13 cycles), the bit in the last position is output and
the error corrected. This decoder has a latency of 13 cycles. Every 13 cycles, the contents
of the shift register are cleared and a new code word can be processed.

Problems
1. Using the union bound, estimate the probability of a bit error of a binary Hamming
(7,4,3) code, denoted CH 1 , with binary transmission over an AWGN channel.
2. Find the weight distribution of the (23, 12, 7) Golay code.
3. Repeat problem 1 for the binary (23, 12, 7) Golay code, denoted CG . Compare the
performances of CH 1 and CG . Comment on the underlying trade-off between the
code rate and the coding gain.

4. Repeat problem 1 with binary transmission over a flat Rayleigh fading channel. (Use
Monte Carlo integration for the union bound.)
5. Consider a binary Hamming (15,11) code, denoted CH 2 .

(a) Give the generator and parity-check matrices in systematic form, denoted Gsys
and Hsys, for this code.
(b) Suppose that the parity-check matrix is rearranged in such a way that the syn-
drome vector is the binary representation of an integer denoting the position
of the error. Let Hint denote this matrix. Find the permutation of columns πc
needed to be applied to Hsys in order to obtain Hint .
(c) A code word v̄ = ūGsys is sent through a BSC and received with a single error
as r̄. Show that the syndrome of the vector πc (r̄) is the binary representation
of the position of the error, where πc is the permutation in part (b).
(d) Sketch block diagrams of the circuits used in encoding and decoding CH 2 .
Ch. 1. §9. Construction of new codes from old (II) 31

We see by induction that S_r consists of 0 and 2^r − 1 codewords of weight
2^{r−1}. (The reader may recognize this inductive process as one of the standard
ways of building Hadamard matrices; more about this in the next chapter.)
This is called a simplex code, because every pair of codewords is the same
distance apart. So if the codewords were marked at the vertices of a unit cube
in n dimensions, they would form a regular simplex. E.g. when r = 2, S_2 =
code #9 forms a regular tetrahedron (the double lines in Fig. 1.12).

Fig. 1.12. Code #9 drawn as a tetrahedron.

The simplex code S_r will also reappear later under the name of a maximal-
length feedback shift register code (see §4 of Ch. 3 and Ch. 14).

Fig. 1.13. Variations on the simplex code. The simplex code
[2^r − 1, r, 2^{r−1}] (e.g. [15, 4, 8]) is augmented (by adding the all-ones vector 1)
to the punctured Reed-Muller code [2^r − 1, r + 1, 2^{r−1} − 1] (e.g. [15, 5, 7]),
and expurgating the punctured Reed-Muller code recovers the simplex code;
lengthening gives the 1st order Reed-Muller code [2^r, r + 1, 2^{r−1}] (e.g. [16, 5, 8]).

The dual of the extended Hamming code is also an important code, for it is
a first-order Reed-Muller code (see Ch. 13). It is obtained by lengthening S_r as
described in (V). For example, lengthening S_3 in this way we obtain the code in
Fig. 1.14.

0 0 0 0 0 0 0 0
0 1 0 1 0 1 0 1
0 0 1 1 0 0 1 1
0 1 1 0 0 1 1 0
0 0 0 0 1 1 1 1
0 1 0 1 1 0 1 0
0 0 1 1 1 1 0 0
0 1 1 0 1 0 0 1
1 1 1 1 1 1 1 1
1 0 1 0 1 0 1 0
1 1 0 0 1 1 0 0
1 0 0 1 1 0 0 1
1 1 1 1 0 0 0 0
1 0 1 0 0 1 0 1
1 1 0 0 0 0 1 1
1 0 0 1 0 1 1 0

Fig. 1.14. Code #10, an [8, 4, 4] 1st order Reed-Muller code.

Problem. (43) A signal set is a collection of real vectors x = (x1, . . . , xn). Define
x · y = Σ_{i=1}^{n} x_i y_i (evaluated as a real number) and call x · x the energy of x. The
unit vectors s^(1), . . . , s^(n) (where s^(i) has a 1 in the i-th component and 0
elsewhere) form an orthogonal signal set, since s^(i) · s^(j) = δ_ij. Consider the
translated signal set {t^(i) = s^(i) − a}. Show that the total energy Σ_{i=1}^{n} t^(i) · t^(i) is
minimized by choosing

a = (1/n, . . . , 1/n).

The resulting {t^(i)} is called a simplex set (and is the continuous analog of the
binary simplex code described above). The biorthogonal signal set {±s^(i)} is
the continuous analog of the first order Reed-Muller code.

§10. Some general properties of a linear code

To conclude this chapter we give several important properties of linear


codes. The first three apply to linear codes over any field.

Theorem 9. If H is the parity check matrix of a code of length n, then the code
has dimension n − r iff some r columns of H are linearly independent but no
r + 1 columns are. (Thus r is the rank of H.)

This requires no proof.


Reed-Muller codes

§1. Introduction

Reed-Muller (or RM) codes are one of the oldest and best understood
families of codes. However, except for first-order RM codes and codes of
modest block lengths, their minimum distance is lower than that of BCH
codes. But the great merit of RM codes is that they are relatively easy to
decode, using majority-logic circuits (see §§6 and 7).
In fact RM codes are the simplest examples of the class of geometrical
codes, which also includes Euclidean geometry and projective geometry
codes, all of which can be decoded by majority logic. A brief account of these
geometrical codes is given in §8, but regretfully space does not permit a more
detailed treatment. In compensation we give a fairly complete bibliography.
§§2, 3 and 4 give the basic properties of RM codes, and Figs. 13.3 and 13.4
give a summary of the properties. Sections 9, 10 and 11 consider the
automorphism groups and Mattson-Solomon polynomials of these codes.
The next chapter will discuss 1st order RM codes and their applications,
while Chapter 15 studies 2nd order RM codes, and also the general problem of
finding weight enumerators of RM codes.

§2. Boolean functions

Reed-Muller codes can be defined very simply in terms of Boolean
functions. We are going to define codes of length n = 2^m, and to do so we shall
need m variables v1, . . . , vm which take the values 0 or 1. Alternatively, let
v = (v1, . . . , vm) range over V^m, the set of all binary m-tuples. [In earlier
chapters this set has been called F^m, but in Chs. 13 and 14 it will be more
convenient to call it V^m.]
Any function f(v) = f(v1, . . . , vm) which takes on the
values 0 and 1 is called a Boolean function (abbreviated B.f.). Such a function
can be specified by a truth table, which gives the value of f at all of its 2^m
arguments. For example when m = 3 one such Boolean function is specified
by the truth table:

v3 = 0 0 0 0 1 1 1 1
v2 = 0 0 1 1 0 0 1 1
v1 = 0 1 0 1 0 1 0 1
f  = 0 0 0 1 0 0 0 0

The last row of the table gives the values taken by f, and is a binary vector of
length n = 2^m which is denoted by f. A code will consist of all vectors f,
where the function f belongs to a certain class.
The columns of the truth table are for the moment assumed to have the
natural ordering illustrated above.
The last row of the truth table can be filled in arbitrarily, so there are 2^{2^m}
Boolean functions of m variables.


The usual logical operations may be applied to Boolean functions:

f EXCLUSIVE OR g = f + g,
f AND g = fg,
f OR g = f + g + fg,          (1)
NOT f = f̄ = 1 + f.

The right-hand sides of these equations define the operations in terms of
arithmetic modulo 2.
A Boolean function can be written down immediately from its truth table:
in the preceding example,

f = v1 v2 v̄3,

since the right-hand side is equal to 1 exactly when f is. This is called the
disjunctive normal form for f (see Problem 1).
Using the rules (1) this simplifies to (check!)

f = v1v2 + v1v2v3.
Notice that v_i^2 = v_i. It is clear that in this way any Boolean function can be
expressed as a sum of the 2^m functions

1, v1, v2, . . . , vm, v1v2, v1v3, . . . , v_{m−1}vm, . . . , v1v2 · · · vm,   (2)

with coefficients which are 0 or 1. Since there are 2^{2^m} Boolean functions
altogether, all these sums must be distinct.
In other words the 2^m vectors corresponding to the functions (2) are
linearly independent.

Problem. (1) (Decomposition of Boolean functions.) If f(v1, . . . , vm) is a
B.f., show the following.
(a) f(v1, . . . , vm) = vm f(v1, . . . , v_{m−1}, 1) OR v̄m f(v1, . . . , v_{m−1}, 0), and
(b) f(v1, . . . , vm) = vm g(v1, . . . , v_{m−1}) + h(v1, . . . , v_{m−1}), where g, h are B.f.'s.
(c) Disjunctive normal form:

$$f(v_1, \ldots, v_m) = \sum_{i_1=0}^{1} \cdots \sum_{i_m=0}^{1} f(i_1, \ldots, i_m)\, w_1^{i_1} \cdots w_m^{i_m},$$

where w_i^1 = v_i and w_i^0 = v̄_i.

Theorem 1. Any Boolean function f can be expanded in powers of v_i as

$$f(v_1, \ldots, v_m) = \sum_{a \in V^m} g(a)\, v_1^{a_1} \cdots v_m^{a_m},  \qquad (3)$$

where the coefficients are given by

$$g(a) = \sum_{b \subseteq a} f(b_1, \ldots, b_m),  \qquad (4)$$

and b ⊆ a means that the 1's in b are a subset of the 1's in a.

Proof. For m = 1, the disjunctive normal form for f is

f(v1) = f(0)(1 + v1) + f(1)v1 = f(0)·1 + (f(0) + f(1))v1,

which proves (3) and (4). Similarly for m = 2 we have

f(v1, v2) = f(0, 0)(1 + v1)(1 + v2) + f(0, 1)(1 + v1)v2 + f(1, 0)v1(1 + v2) + f(1, 1)v1v2
= f(0, 0)·1 + {f(0, 0) + f(1, 0)}v1 + {f(0, 0) + f(0, 1)}v2 + {f(0, 0) + f(0, 1) + f(1, 0) + f(1, 1)}v1v2,

which again proves (3) and (4). Clearly we can continue in this way. Q.E.D.

Problem. (2) (The randomization lemma.) Let f(v1, . . . , v_{m−1}) be an arbitrary
Boolean function. Show that vm + f(v1, . . . , v_{m−1}) takes the values 0 and 1
equally often.

§3. Reed-Muller codes

As in §2, v = (v1, . . . , vm) denotes a vector which ranges over V^m, and f is
the vector of length 2^m obtained from a Boolean function f(v1, . . . , vm).

Definition. The r-th order binary Reed-Muller (or RM) code R(r, m) of length
n = 2^m, for 0 ≤ r ≤ m, is the set of all vectors f, where f(v1, . . . , vm) is a
Boolean function which is a polynomial of degree at most r.

For example, the first order RM code of length 8 consists of the 16
codewords

a0·1 + a1v1 + a2v2 + a3v3,   a_i = 0 or 1,

which are shown in Fig. 13.1.

0                  0 0 0 0 0 0 0 0
v3                 0 0 0 0 1 1 1 1
v2                 0 0 1 1 0 0 1 1
v1                 0 1 0 1 0 1 0 1
v2 + v3            0 0 1 1 1 1 0 0
v1 + v3            0 1 0 1 1 0 1 0
v1 + v2            0 1 1 0 0 1 1 0
v1 + v2 + v3       0 1 1 0 1 0 0 1
1                  1 1 1 1 1 1 1 1
1 + v3             1 1 1 1 0 0 0 0
1 + v2             1 1 0 0 1 1 0 0
1 + v1             1 0 1 0 1 0 1 0
1 + v2 + v3        1 1 0 0 0 0 1 1
1 + v1 + v3        1 0 1 0 0 1 0 1
1 + v1 + v2        1 0 0 1 1 0 0 1
1 + v1 + v2 + v3   1 0 0 1 0 1 1 0

Fig. 13.1. The 1st order Reed-Muller code of length 8.

This code is also shown in Fig. 1.14, and indeed we shall see that R(1, m) is
always the dual of an extended Hamming code. R(1, m) is also the code
obtained from a Sylvester-type Hadamard matrix in §3 of Ch. 2.
All the codewords of R(1, m) except 0 and 1 have weight 2^{m−1}. Indeed any
B.f. of degree exactly 1 corresponds to a vector of weight 2^{m−1}, by Problem 2.
In general the r-th order RM code consists of all linear combinations of the
vectors corresponding to the products

1, v1, v2, . . . , vm, v1v2, . . . (all products of degree at most r),

which therefore form a basis for the code. There are

$$k = 1 + \binom{m}{1} + \binom{m}{2} + \cdots + \binom{m}{r}$$

such basis vectors, and as we saw in §2 they are linearly independent.
So k is the dimension of the code.
For example when m = 4 the 16 possible basis vectors for Reed-Muller
codes of length 16 are shown in Fig. 13.2 (check!).

1          1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
v4         0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
v3         0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
v2         0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
v1         0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
v3v4       0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
v2v4       0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
v1v4       0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
v2v3       0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
v1v3       0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
v1v2       0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
v2v3v4     0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
v1v3v4     0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
v1v2v4     0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
v1v2v3     0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
v1v2v3v4   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Fig. 13.2. Basis vectors for RM codes of length 16.

The basis vectors for the r-th order RM code of length 16, R(r, 4), are:

Order r    Rows of Fig. 13.2
0          1
1          1–5
2          1–11
3          1–15
4          all

Reed-Muller codes of length 2^{m+1} may be very simply obtained from RM
codes of length 2^m using the |u | u + v| construction of §9 of Ch. 2.

Theorem 2.

R(r + 1, m + 1) = {|u | u + v| : u ∈ R(r + 1, m), v ∈ R(r, m)}.

For example R(1, 4) was constructed in this way in §9 of Ch. 2.
There is an equivalent statement in terms of generator matrices. Let
G(r, m) be a generator matrix for R(r, m). Then the theorem says

$$G(r+1, m+1) = \begin{pmatrix} G(r+1, m) & G(r+1, m) \\ 0 & G(r, m) \end{pmatrix}.$$

(Indeed, a codeword generated by this matrix has the form |u | u + v| where
u ∈ R(r + 1, m), v ∈ R(r, m).)

Proof. By definition a typical codeword f in R(r + 1, m + 1) comes from a
polynomial f(v1, . . . , v_{m+1}) of degree at most r + 1. We can write (as in
Problem 1)

f(v1, . . . , v_{m+1}) = g(v1, . . . , vm) + v_{m+1} h(v1, . . . , vm),

where deg(g) ≤ r + 1 and deg(h) ≤ r. Let g and h be the vectors (of length 2^m)
corresponding to g(v1, . . . , vm) and h(v1, . . . , vm). Of course g ∈ R(r + 1, m)
and h ∈ R(r, m). But now consider g(v1, . . . , vm) and v_{m+1}h(v1, . . . , vm) as
polynomials in v1, . . . , v_{m+1}. The corresponding vectors (now of length 2^{m+1})
are |g | g| and |0 | h| (see Problem 7). Therefore f = |g | g| + |0 | h|. Q.E.D.

Notice the resemblance between the recurrence for RM codes in Theorem
2 and the recurrence for binomial coefficients

$$\binom{m+1}{k} = \binom{m}{k} + \binom{m}{k-1}.  \qquad (5)$$

(See Problem 4.)


The minimum distance of Reed-Muller codes is easily obtained from
Theorem 2.

Theorem 3. ~(r, m) has minimum distance 2m-'.

Proof. By induction on m, using Theorem 33 of Ch. 2. Q.E.D.

Figure 13.3 shows the dimensions of the first few [n, k, d] RM codes
(check!).
Notice that R(m, m) contains all vectors of length 2^m; R(m − 1, m) consists
of all even weight vectors, and R(0, m) consists of the vectors 0, 1
(Problem 5).

Theorem 4. R(m − r − 1, m) is the dual code to R(r, m), for 0 ≤ r ≤ m − 1.

Proof. Take a ∈ R(m − r − 1, m), b ∈ R(r, m). Then a(v1, . . . , vm) is a poly-
nomial of degree ≤ m − r − 1, b(v1, . . . , vm) has degree ≤ r, and their product
ab has degree ≤ m − 1. Therefore ab ∈ R(m − 1, m) and has even weight.
Therefore the dot product a · b = 0 (modulo 2). So R(m − r − 1, m) ⊆

length n     4    8   16   32   64  128  256  512
m            2    3    4    5    6    7    8    9

distance d             dimension k
   1         4    8   16   32   64  128  256  512
   2         3    7   15   31   63  127  255  511
   4         1    4   11   26   57  120  247  502
   8              1    5   16   42   99  219  466
  16                   1    6   22   64  163  382
  32                        1    7   29   93  256
  64                             1    8   37  130
 128                                  1    9   46
 256                                       1   10
 512                                            1

Fig. 13.3. Reed-Muller codes.

R(r, m)^⊥. But

dim R(m − r − 1, m) + dim R(r, m)
= 1 + \binom{m}{1} + · · · + \binom{m}{m−r−1} + 1 + \binom{m}{1} + · · · + \binom{m}{r} = 2^m,

which implies R(m − r − 1, m) = R(r, m)^⊥. Q.E.D.

These properties are collected in Fig. 13.4.

For any m and any r, 0 ≤ r ≤ m, there is a binary r-th order RM code
R(r, m) with the following properties:
length n = 2^m,
dimension k = 1 + \binom{m}{1} + · · · + \binom{m}{r},
minimum distance d = 2^{m−r}.
R(r, m) consists of (the vectors corresponding to) all polynomials
in the binary variables v1, . . . , vm of degree ≤ r. The dual of
R(r, m) is R(m − r − 1, m) (Theorem 4). R(r, m) is an extended
cyclic code (Theorem 11), and can be easily encoded and decoded (by
majority logic, §§6, 7). Decoding is especially easy for first-order RM
codes (Ch. 14). Aut(R(r, m)) = GA(m) (Theorem 24). Good practical
codes, also the basis for constructing many other codes (Ch. 15).

Fig. 13.4. Summary of Reed-Muller codes.

Punctured Reed-Muller codes.

Definition. For 0 ≤ r ≤ m − 1, the punctured RM code R(r, m)* is obtained by
puncturing (or deleting) the coordinate corresponding to v1 = · · · = vm = 0
from all the codewords of R(r, m).

(In fact we shall see in the next section that an equivalent code is obtained
no matter which coordinate is punctured.)
Clearly R(r, m)* has length 2^m − 1, minimum distance 2^{m−r} − 1, and dimen-
sion 1 + \binom{m}{1} + · · · + \binom{m}{r}.

Problems. (3) Show that R(r, m) is a subcode of R(r + 1, m). In fact show that R(r + 1, m) = {a + b : a ∈ R(r, m), b is zero or a polynomial in v1, ..., vm of degree exactly r + 1}.
(4) If Theorem 2 is used as a recursive definition of RM codes, use Equation (5) to calculate their dimension. Obtain Fig. 13.3 from Pascal's triangle for binomial coefficients.
(5) Show that R(0, m) and R(0, m)* are repetition codes, R(m − 1, m) contains all vectors of even weight, and R(m, m) and R(m − 1, m)* contain all vectors, of the appropriate lengths.
(6) Let |S | T| = {|s | t| : s ∈ S, t ∈ T}. Show that

  R(r + 1, m + 1) = ∪ |S | S|,

where S runs through those cosets of R(r, m) which are contained in R(r + 1, m).
(7) Let f(v1, ..., vm) be a Boolean function of m variables, and let f be the corresponding binary vector of length 2^m. Show that the vectors of length 2^(m+1) corresponding to g(v1, ..., v_{m+1}) = f(v1, ..., vm) and to h(v1, ..., v_{m+1}) = v_{m+1} f(v1, ..., vm) are |f | f| and |0 | f| respectively.
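Problem 7 is the doubling step behind the recursive definition. As a C sketch (illustrative only, assuming the truth-table ordering in which v_{m+1} is 0 on the first 2^m positions and 1 on the rest), the two length-2^(m+1) vectors can be built like this:

```c
/* Problem 7 in vector form (a sketch): from the truth table f of
   f(v1,...,vm), of length len = 2^m, build the length-2^(m+1) tables
   g <- |f|f|  for g = f(v1,...,vm)          (v_{m+1} is ignored), and
   h <- |0|f|  for h = v_{m+1} f(v1,...,vm)  (zero where v_{m+1} = 0). */
static void extend(const int *f, int len, int *g, int *h)
{
    for (int p = 0; p < len; p++) {
        g[p] = f[p];   g[len + p] = f[p];
        h[p] = 0;      h[len + p] = f[p];
    }
}
```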

§4. RM codes and geometries

Many properties of RM codes are best stated in the language of finite geometries. The Euclidean geometry EG(m, 2) of dimension m over GF(2) contains 2^m points, whose coordinates are all the binary vectors v = (v1, ..., vm) of length m. If the zero point is deleted, the projective geometry PG(m − 1, 2) is obtained. (See Appendix B for further details.)

Any subset S of the points of EG(m, 2) has associated with it a binary incidence vector x(S) of length 2^m, containing a 1 in those components s ∈ S, and zeros elsewhere.

This gives us another way of thinking about codewords of R(r, m), namely as (incidence vectors of) subsets of EG(m, 2).
For example, the Euclidean geometry EG(3, 2) consists of 8 points P0, P1, ..., P7 whose coordinates we may take to be the following column vectors:

         P0  P1  P2  P3  P4  P5  P6  P7
  v̄3     1   1   1   1   0   0   0   0
  v̄2     1   1   0   0   1   1   0   0
  v̄1     1   0   1   0   1   0   1   0

The subset S = {P2, P3, P4, P5} has incidence vector

  x(S) = 00111100.

This is a codeword of R(1, 3) - see Fig. 13.1.


For any value of m let us write the complements v̄m, ..., v̄1 of the vectors vm, ..., v1 as follows:

  v̄m      1 1 1 1 · · · 0 0 0 0
  v̄m−1    1 1 1 1 · · · 0 0 0 0
   ·             · · ·
  v̄2      1 1 0 0 · · · 1 1 0 0
  v̄1      1 0 1 0 · · · 1 0 1 0

We take the columns of this array to be the coordinates of the points in EG(m, 2). In this way there is a 1-to-1 correspondence between the points of EG(m, 2) and the components (or coordinate positions) of binary vectors of length 2^m. Any vector x of length 2^m describes a subset of EG(m, 2), consisting of those points P for which x_P = 1. Clearly x is the incidence vector of this subset. The number of points in the subset is equal to the weight of x.
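In code the correspondence is just bit manipulation. The sketch below is illustrative (not from the text) and assumes the labelling above: the i-th coordinate of the point P_p is the complement of bit i − 1 of the position p.

```c
/* i-th coordinate (1-based) of the point P_p of EG(m,2): under the
   labelling above it is the complement of bit i-1 of the position p. */
static int point_coord(int p, int i)
{
    return 1 - ((p >> (i - 1)) & 1);
}

/* p-th component of the vector vi of the text (0 on the first half of
   each period of length 2^i, 1 on the second half)                    */
static int v_component(int i, int p)
{
    return (p >> (i - 1)) & 1;
}
```

With this labelling v_component(i, p) == 1 exactly when the i-th coordinate of P_p is 0, so each vector vi is the incidence vector of a hyperplane through the origin, in agreement with the remark that follows.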
For example the vectors vi themselves are the characteristic vectors of hyperplanes which pass through the origin, the vivj describe subspaces of dimension m − 2, and so on. (Of course there are other hyperplanes through the origin besides the vi. For example no vi contains the point 11 · · · 1. Similarly the vivj are not the only subspaces of dimension m − 2, and so on.)

One of the advantages of this geometrical language is that it enables us to say exactly what the codewords of minimum weight are in the r-th order Reed-Muller code of length 2^m: they are the (m − r)-flats in EG(m, 2). This is proved in Theorems 5 and 7. In Theorem 9 we use this fact to determine the number of codewords of minimum weight. Then in §5 we show that the codewords of minimum weight generate R(r, m). Along the way we prove that the punctured code R(r, m)* is cyclic. The proofs given in §§4, 5 should be omitted on a first reading.
Let H be any hyperplane in EG(m, 2). By definition the incidence vector h = x(H) consists of all points v which satisfy a linear equation in v1, ..., vm. In other words, the Boolean function h is a linear function of v1, ..., vm, and so is a codeword of weight 2^(m−1) in R(1, m).

We remark that if l ∈ R(r, m) is the incidence vector of a set S, then hl ∈ R(r + 1, m) and is the incidence vector of S ∩ H. We are now ready for the first main theorem.

Theorem 5. Let l be a minimum weight codeword of R(r, m), say l = x(S). Then S is an (m − r)-dimensional flat in EG(m, 2) (which need not pass through the origin).

E.g. the 14 codewords of weight 4 in the [8, 4, 4] extended Hamming code are the 14 planes of Euclidean 3-space EG(3, 2).

Proof. Let H be any hyperplane EG(m − 1, 2) in EG(m, 2) and let H' be the parallel hyperplane, so that EG(m, 2) = H ∪ H'.

By the above remark the incidence vectors of S ∩ H and S ∩ H' are in R(r + 1, m), and so these sets contain 0 or ≥ 2^(m−r−1) points. Since |S| = 2^(m−r) = |S ∩ H| + |S ∩ H'|, |S ∩ H| = 0, 2^(m−r−1) or 2^(m−r). The following Lemma then completes the proof of the theorem.

Lemma 6. (Rothschild and van Lint.) Let S be a subset of EG(m, 2) such that |S| = 2^(m−r), and |S ∩ H| = 0, 2^(m−r−1) or 2^(m−r) for all hyperplanes H in EG(m, 2). Then S is an (m − r)-dimensional flat in EG(m, 2).

Proof. By induction on m. The result is trivial for m = 2.

Case (i). Suppose for some H, |S ∩ H| = 2^(m−r). Then S ⊆ H, i.e. S ⊆ EG(m − 1, 2). Let X be any hyperplane in H. There exists another hyperplane H'' of EG(m, 2) such that X = H ∩ H'', and S ∩ X = S ∩ H'', i.e. |S ∩ X| = 0, 2^((m−1)−(r−1)−1) or 2^((m−1)−(r−1)). By the induction hypothesis S is an ((m − 1) − (r − 1))-flat in EG(m − 1, 2) and hence in EG(m, 2).

Case (ii). If for some H, |S ∩ H| = 0, then replacing H by its parallel hyperplane reduces this to case (i).

Case (iii). It remains to consider the case when |S ∩ H| = 2^(m−r−1) for all H. Consider

  Σ_H Σ_{a ∈ S} Σ_{b ∈ S} χ_H(a) χ_H(b) = |S|(2^m − 1) + |S|(|S| − 1)(2^(m−1) − 1),

where H runs through the hyperplanes of EG(m, 2) and χ_H is the characteristic function of H, since there are 2^m − 1 hyperplanes in EG(m, 2) through a point and 2^(m−1) − 1 through a line. The LHS is 2^(2m−2r−1)(2^m − 1). Substituting |S| = 2^(m−r) on the RHS leads to a contradiction.   Q.E.D.

The converse of Theorem 5 is:

Theorem 7. The incidence vector of any (m − r)-flat in EG(m, 2) is in R(r, m).

Proof. Any (m − r)-flat in EG(m, 2) consists of all points v which satisfy r linear equations over GF(2), say

  Σ_{j=1}^{m} a_ij v_j = b_i,   i = 1, ..., r,

or equivalently

  Σ_{j=1}^{m} a_ij v_j + b_i + 1 = 1,   i = 1, ..., r.

This can be replaced by the single equation

  Π_{i=1}^{r} ( Σ_{j=1}^{m} a_ij v_j + b_i + 1 ) = 1,

i.e., by a polynomial equation of degree ≤ r in v1, ..., vm. Therefore the flat is in R(r, m).   Q.E.D.
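The proof is constructive: evaluating the degree-r product at every point of EG(m, 2) yields the incidence vector of the flat. A small C sketch (illustrative only; it assumes the §4 labelling in which the j-th coordinate of the point at position p is the complement of bit j − 1 of p, and it fixes m ≤ 8):

```c
/* Incidence vector x of the flat {v : sum_j a[i][j] v_j = b[i], i = 1..r}
   in EG(m,2) (m <= 8), built as in Theorem 7 by evaluating the degree-r
   product  prod_i (sum_j a[i][j] v_j + b[i] + 1)  at every point P_p,
   whose j-th coordinate is taken to be the complement of bit j-1 of p.  */
static void flat_vector(int m, int r, int a[][8], const int *b, int *x)
{
    for (int p = 0; p < (1 << m); p++) {
        int prod = 1;
        for (int i = 0; i < r; i++) {
            int lin = b[i] ^ 1;                       /* b_i + 1        */
            for (int j = 0; j < m; j++)               /* + a_ij * v_j   */
                lin ^= a[i][j] & (1 - ((p >> j) & 1));
            prod &= lin;
        }
        x[p] = prod;
    }
}
```

For r independent equations the resulting vector has weight 2^(m−r), the number of points of an (m − r)-flat.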

Combining Theorems 5 and 7 we obtain

Theorem 8. The codewords of minimum weight in R(r, m) are exactly the incidence vectors of the (m − r)-dimensional flats in EG(m, 2).

Minimum weight codewords in R(r, m)* are obtained from minimum weight codewords in R(r, m) which have a 1 in coordinate 0 (by deleting that 1). Such a codeword is the incidence vector of a subspace PG(m − r − 1, 2) in PG(m − 1, 2). (We remind the reader that in a projective geometry there is no distinction between flats and subspaces - see Appendix B.)

Theorem 9. The number of codewords of minimum weight in:

(a) R(r, m)* is

  Π_{i=0}^{m−r−1} (2^(m−i) − 1) / (2^(m−r−i) − 1),

(b) R(r, m) is

  2^r Π_{i=0}^{m−r−1} (2^(m−i) − 1) / (2^(m−r−i) − 1).

Proof. Theorems 3 and 5 of Appendix B.

*§5. The minimum weight vectors generate the code

Theorem 10. The incidence vectors of the projective subspaces PG(μ − 1, 2) of PG(m − 1, 2) generate R(r, m)*, where μ = m − r.

Proof. (The beginner should skip this.) Let α be a primitive element of GF(2^m). Then the points of PG(m − 1, 2) can be taken to be {1, α, α^2, α^3, ..., α^(2^m − 2)}. Let l = 2^μ − 2.

A subset T = {α^{d_0}, ..., α^{d_l}} of these points will be represented in the usual way by the polynomial

  W_T(x) = x^{d_0} + · · · + x^{d_l}.

If T = {α^{d_0}, ..., α^{d_l}} is a PG(μ − 1, 2) then the points of T are all nonzero linear combinations over GF(2) of μ linearly independent points α_0, ..., α_{μ−1} (say) of GF(2^m). In other words the points of T are

  Σ_{j=0}^{μ−1} a_ij α_j = α^{d_i},   i = 0, 1, ..., l,

where (a_i0, a_i1, ..., a_i,μ−1) runs through all nonzero binary μ-tuples. Also x W_T(x) represents the PG(μ − 1, 2) spanned by αα_0, ..., αα_{μ−1}. Thus every cyclic shift of the incidence vector of a PG(μ − 1, 2) is the incidence vector of another PG(μ − 1, 2).

Let B be the code generated by all W_T(x), where T is any PG(μ − 1, 2). Clearly B is a cyclic code and is contained in R(r, m)*; the theorem asserts that in fact B = R(r, m)*. We establish this by showing that

  dim B ≥ 1 + C(m, 1) + · · · + C(m, r).


The dimension of B is the number of α^s which are not zeros of B; i.e. the number of α^s such that W_T(α^s) ≠ 0 for some T. Now

  W_T(α^s) = Σ_b (b_0 α_0 + · · · + b_{μ−1} α_{μ−1})^s,

where the summation extends over all nonzero binary μ-tuples b = (b_0, ..., b_{μ−1}). Call this last expression F_s(α_0, ..., α_{μ−1}). Then

  F_s(α_0, ..., α_{μ−1}) = Σ s!/(i_0! · · · i_{μ−1}!) α_0^{i_0} · · · α_{μ−1}^{i_{μ−1}},   (6)

where the last sum extends over all i_0 ≥ 1, ..., i_{μ−1} ≥ 1 with i_0 + · · · + i_{μ−1} = s.

This is a homogeneous polynomial of degree s in α_0, ..., α_{μ−1}.


Then dim B is the number of s such that F_s(α_0, ..., α_{μ−1}) is not identically zero, when the α_i are linearly independent.

In fact we will just count those F_s(α_0, ..., α_{μ−1}) which contain a coefficient which is nonzero modulo 2. We note that such an F_s
(i) cannot be identically zero, and
(ii) cannot have α_0, ..., α_{μ−1} linearly dependent (Problem 8). From Lucas' theorem, a multinomial coefficient

  s!/(i_0! · · · i_{μ−1}!)

is nonzero modulo 2 iff

  (i_0)_j + (i_1)_j + · · · + (i_{μ−1})_j = (s)_j for all j,

where (x)_j denotes the j-th bit in the binary expansion of x.

Therefore (6) contains a nonzero coefficient whenever the binary expansion of s contains ≥ μ 1's. For example if s = 2^{l_0} + 2^{l_1} + · · · + 2^{l_{w−1}}, w ≥ μ, then (6) contains a nonzero coefficient corresponding to i_0 = 2^{l_0}, ..., i_{μ−2} = 2^{l_{μ−2}}, i_{μ−1} = 2^{l_{μ−1}} + · · · + 2^{l_{w−1}}.

The number of such s in the range [1, 2^m − 1] is

  C(m, μ) + C(m, μ + 1) + · · · + C(m, m) = 1 + C(m, 1) + · · · + C(m, r).   Q.E.D.

Problem. (8) Show that if α_0, ..., α_{μ−1} are linearly dependent, then F_s(α_0, ..., α_{μ−1}) is identically zero modulo 2.

Important Remark. For nonnegative integers s let w2(s) denote the number of 1's in the binary expansion of s. Then the proof of this theorem has shown that α^s is a nonzero of R(r, m)* iff 1 ≤ s ≤ 2^m − 1 and w2(s) ≥ μ. Or in other words,

Theorem 11. The punctured RM code R(r, m)* is a cyclic code, which has as zeros α^s for all s satisfying

  1 ≤ w2(s) ≤ m − r − 1 and 1 ≤ s ≤ 2^m − 2.

The generator and check polynomials for the punctured RM code R(r, m)* are, for 0 ≤ r ≤ m − 1,

  g(x) = Π M^(s)(x)  over 1 ≤ w2(s) ≤ m − r − 1, 1 ≤ s ≤ 2^m − 2,   (7)

  h(x) = (x + 1) Π M^(s)(x)  over m − r ≤ w2(s) ≤ m − 1, 1 ≤ s ≤ 2^m − 2,   (8)

where s runs through representatives of the cyclotomic cosets, and M^(s)(x) is the minimal polynomial of α^s. (Remember that an empty product is 1.)
An alternative form is obtained by using α^(−1) instead of α as the primitive element. This has the effect of replacing s by 2^m − 1 − s and w2(s) by m − w2(s). Then

  g(x) = Π M^(s)(x)  over r + 1 ≤ w2(s) ≤ m − 1, 1 ≤ s ≤ 2^m − 2,   (9)

  h(x) = (x + 1) Π M^(s)(x)  over 1 ≤ w2(s) ≤ r, 1 ≤ s ≤ 2^m − 2.   (10)

If the generator polynomial of R(r, m)* is given by Equation (7), then by Theorem 4 of Ch. 7, the dual code has generator polynomial (10).

The idempotent of R(r, m)* is

  θ_0 + Σ θ_s  over 1 ≤ w2(s) ≤ r, 1 ≤ s ≤ 2^m − 2,

or equivalently (again replacing α by α^(−1)),

  θ_0 + Σ θ_s  over m − r ≤ w2(s) ≤ m − 1, 1 ≤ s ≤ 2^m − 2,

where s runs through the representatives of the cyclotomic cosets, and θ_s is defined in Ch. 8.
The dual to R(r, m)*, by Theorem 5 of Ch. 8, has the idempotent

  1 + θ_0 + Σ θ_s  over m − r ≤ w2(s) ≤ m − 1, 1 ≤ s ≤ 2^m − 2.

For example, the idempotents of R(1, m)* and R(2, m)* may be taken to be

  θ_0 + θ_1

and

  θ_0 + θ_1 + Σ_{i=[(m+1)/2]}^{m−1} θ_{l_i}

respectively, where l_i = 1 + 2^i. Then general forms for codewords in various RM codes of length 2^m are as follows. Here the first part of the codeword is the overall parity check, and the second part is in the cyclic code R*.

  R(0, m):      |a | a θ_0|

  R(1, m):      |a | a θ_0 + a_1 x^{i_1} θ_1|

  R(2, m):      |a | a θ_0 + a_1 x^{i_1} θ_1 + Σ_{i=[(m+1)/2]}^{m−1} a_i x^{s_i} θ_{l_i}|

  R(m − 2, m):  |f(1) | f(x)(1 + θ_1)|

  R(m − 1, m):  |f(1) | f(x)|

where a, a_i ∈ GF(2), i_1, s_i ∈ {0, 1, 2, ..., 2^m − 2}, and f(x) is arbitrary.

Nesting habits. Figure 13.5 shows the nesting habits of BCH and punctured RM codes. Here B(d) denotes the BCH code of designed distance d, and the binary numbers in parentheses are the exponents of the zeros of the codes (one from each cyclotomic coset). The codes get smaller as we move down the page.

  R(m−1, m)* = B(1)   {∅}
        |
  R(m−2, m)* = B(3)   {1}
        |
      B(5)   {1, 11}
        |
      B(7)   {1, 11, 101}
       / \
      |   B(9)    {1, 11, 101, 111}
      |     |
  R(m−3, m)*        B(11)   {1, 11, 101, 111, 1001}
  {1, 11, 101,        |
   1001, ...}       B(13)   {1, 11, 101, 111, 1001, 1011}
      |               |
      |             B(15)   {1, 11, 101, 111, 1001, 1011, 1101}
       \             / \
  R(m−4, m)* {w2(s) ≤ 3}    B(17)   {1, ..., 1111}

  Fig. 13.5. Nesting habits of BCH and punctured RM codes.

We see that

  R(r, m)* ⊆ BCH code of designed distance 2^(m−r) − 1,
  R(r, m) ⊆ extended BCH code of designed distance 2^(m−r) − 1.

Theorem 12. The incidence vectors of the (m − r)-flats in EG(m, 2) generate R(r, m).

Proof. We recall that R(r, m) is R(r, m)* with an overall parity check added. By Theorem 10, the incidence vectors of the (m − r)-flats with a 1 in coordinate 0 generate R(r, m). So certainly all the (m − r)-flats generate R(r, m).   Q.E.D.

We mention without proof the following generalization of this result.

Theorem 13. (MacWilliams and Mann.) The rank over GF(p) of the incidence matrix of the hyperplanes of an m-dimensional Euclidean or projective geometry over GF(p^s) is

  C(m + p − 1, m)^s + ε,

where ε = 1 for the projective and 0 for the Euclidean geometry.

Research Problem (13.1). Is there a codeword a(x) ∈ R(r, m)* which is the incidence vector of a PG(m − r − 1, 2) and generates R(r, m)*?

§6. Encoding and decoding (I)

There are two obvious ways to encode an RM code. The first uses the generator matrix in the form illustrated in Fig. 13.2 (this is a nonsystematic encoder). The second, which is systematic, makes use of the fact (proved in Theorem 11) that RM codes are extended cyclic codes. In this § we give a decoding algorithm which applies specifically to the first encoder, and then in §7 we give a more general decoding algorithm which applies to any encoder.

We illustrate the first decoder by studying the [16, 11, 4] second-order RM code of length 16, R(2, 4). As generator matrix G we take the first 11 rows of Fig. 13.2. Thus the message symbols a_0; a_1, ..., a_4; a_12, ..., a_34 are encoded into the codeword

  x = aG = a_0 1 + a_4 v4 + · · · + a_1 v1 + · · · + a_12 v1 v2   (11)
         = x_0 x_1 · · · x_15 (say).

This is a single-error-correcting code, and we shall show how to correct one error by majority logic decoding. (Unlike the majority logic decoder for RS codes given in Ch. 10, this is a practical scheme.) The first step is to recover the 6 symbols a_12, ..., a_34. Observe from Fig. 13.2 that if there are no errors,

  a_12 = x_0 + x_1 + x_2 + x_3
       = x_4 + x_5 + x_6 + x_7
       = x_8 + x_9 + x_10 + x_11
       = x_12 + x_13 + x_14 + x_15,   (12)

  a_13 = x_0 + x_1 + x_4 + x_5
       = x_2 + x_3 + x_6 + x_7
       = x_8 + x_9 + x_12 + x_13
       = x_10 + x_11 + x_14 + x_15,   (13)
  . . . . . . . . . . . . . . . .
  a_34 = x_0 + x_4 + x_8 + x_12
       = x_1 + x_5 + x_9 + x_13
       = x_2 + x_6 + x_10 + x_14
       = x_3 + x_7 + x_11 + x_15.

Equation (12) gives 4 votes for the value of a_12, Equation (13) gives 4 votes for a_13, and so on. So if one error occurs, the majority vote is still correct, and thus each a_ij is obtained correctly.
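As a concrete sketch (illustrative only; GF(2) addition is XOR), the four votes of Equation (12) and the majority decision for a_12 look like this in C:

```c
/* Majority-logic estimate of a_12 for R(2,4) from a received word y[16],
   using the four votes of Equation (12); ^ is addition over GF(2).      */
static int estimate_a12(const int *y)
{
    int ones = 0;
    for (int i = 0; i < 4; i++)          /* the four check sets of (12) */
        ones += y[4*i] ^ y[4*i + 1] ^ y[4*i + 2] ^ y[4*i + 3];
    return ones >= 3;                    /* majority of the 4 votes      */
}
```

With at most one error at least three of the four votes equal a_12, so the majority (a 2-2 tie being read as 0) is correct.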
To find the symbols a_1, ..., a_4, subtract

  a_12 v1 v2 + a_13 v1 v3 + · · · + a_34 v3 v4

from x, giving say x' = x'_0 x'_1 · · · x'_15. Again from Fig. 13.2 we observe that

  a_1 = x'_0 + x'_1
      = x'_2 + x'_3
      = · · ·
      = x'_14 + x'_15,

  a_2 = x'_0 + x'_2
      = · · ·
Now it is easier: there are 8 votes for each a_i, and so if there is one error the majority vote certainly gives each a_i correctly. It remains to determine a_0. We have

  x'' = x' − (a_1 v1 + · · · + a_4 v4) = a_0 1 + error,

and a_0 = 0 or 1 according to the number of 1's in x''.

This scheme is called the Reed decoding algorithm, and will clearly work for any RM code.
How do we find which components of the codeword x are to be used in the parity checks (12), (13), ...? To answer this we shall give a geometric description of the algorithm for decoding R(r, m). We first find a_u, where u = u_1 · · · u_r say. The corresponding row of the generator matrix, v_{u_1} · · · v_{u_r}, is the incidence vector of an (m − r)-dimensional subspace S of EG(m, 2). For example, the double line in Fig. 13.6 shows the plane S corresponding to a_12. Let T be the "complementary" subspace to S with incidence vector v_{τ_1} · · · v_{τ_{m−r}}, where {τ_1, ..., τ_{m−r}} is the complement of {u_1, ..., u_r} in {1, 2, ..., m}. Clearly T meets S in a single point, the origin.

Let U_1, ..., U_{2^{m−r}} be all the translates of T in EG(m, 2), including T itself. (These are shaded in Fig. 13.6.) Each U_i meets S in exactly one point.
Theorem 14. If there are no errors, a_u is given by

  a_u = Σ_{P ∈ U_i} x_P,   i = 1, ..., 2^{m−r}.

Fig. 13.6. EG(4, 2) showing the subspace S (double lines) and U_1, ..., U_4 (shaded).

These equations are a generalization of Equations (12), (13), and give 2^{m−r} votes for a_u.

Proof. Because of the form of the generator matrix, the codeword x is

  x = Σ_ρ a_ρ v_{ρ_1} · · · v_{ρ_s},

where the sum is over all subsets ρ = {ρ_1, ..., ρ_s} of {1, ..., m} of size at most r. (This generalizes Equation (11).) Therefore

  Σ_{P ∈ U_i} x_P = Σ_ρ a_ρ Σ_{P ∈ U_i} (v_{ρ_1} · · · v_{ρ_s})_P = Σ_ρ a_ρ N(U_i, ρ),

where N(U_i, ρ) is the number of points in the intersection of U_i and the subspace W with incidence vector v_{ρ_1} · · · v_{ρ_s}.

We use the fact that the intersection of two subspaces is a subspace, and all subspaces (except points) contain an even number of points. Now T and W intersect in the subspace with incidence vector

  v_{τ_1} · · · v_{τ_{m−r}} v_{ρ_1} · · · v_{ρ_s}.

If s < r, this subspace has dimension at least 1, and N(U_i, ρ) is even. On the other hand, if s = r but W ≠ S, then one of the ρ_i must equal one of the τ_j, say ρ_1 = τ_1. Then T and W intersect in

  v_{τ_1} · · · v_{τ_{m−r}} v_{ρ_2} · · · v_{ρ_r},

which again has dimension at least 1, and N(U_i, ρ) is even. Finally, if W = S, N(U_i, ρ) = 1.   Q.E.D.

This theorem implies that, if no more than [(2^{m−r} − 1)/2] errors occur, majority logic decoding will recover each of the symbols a_u correctly, where u is any string of r symbols. The rest of the a's can be recovered in the same way, as shown in the previous example. Thus the Reed decoding algorithm can correct [(d − 1)/2] = [(2^{m−r} − 1)/2] errors.

§7. Encoding and decoding (II)

The Reed decoding algorithm does not apply if the code is encoded systematically as an extended cyclic code, as in §8 of Ch. 7. Fortunately another majority logic decoding algorithm is available, and its description will lead us to a more general class of codes, the finite geometry codes.

If C is any [n, k] code over GF(q), the rows of the H matrix are parity checks, i.e. define equations

  Σ_{i=0}^{n−1} h_i x_i = 0

which every codeword x must satisfy. Of course any linear combination of the rows of H is also a parity check: so in all there are q^{n−k} parity checks. The art of majority logic decoding is to choose the best subset of these equations.

Definition. A set of parity check equations is called orthogonal on the i-th coordinate if x_i appears in each equation, but no other x_j appears more than once in the set.

Example. Consider the [7, 3, 4] simplex code, with parity check matrix

      [ 1 1 0 1 0 0 0 ]
  H = [ 0 1 1 0 1 0 0 ]
      [ 0 0 1 1 0 1 0 ]
      [ 0 0 0 1 1 0 1 ]

Seven of the 16 parity checks are shown in Fig. 13.7.

  0 1 2 3 4 5 6
  1 1 0 1 0 0 0
  0 1 1 0 1 0 0
  0 0 1 1 0 1 0
  0 0 0 1 1 0 1
  1 0 0 0 1 1 0
  0 1 0 0 0 1 1
  1 0 1 0 0 0 1

  Fig. 13.7. Parity checks on the [7, 3, 4] code.

Rows 1, 5, 7 of Fig. 13.7 are the parity checks

  x_0 + x_1 + x_3 = 0,
  x_0 + x_4 + x_5 = 0,
  x_0 + x_2 + x_6 = 0,

which are orthogonal on coordinate 0. Of course these correspond to the lines through the point 0 in Fig. 2.12. Similarly the lines through 1 give three parity checks orthogonal on coordinate 1, and so on.
Suppose now that an error vector e occurs and y = x + e is received. If there are J parity checks orthogonal on coordinate 0, we define S_1, ..., S_J to be the result of applying these parity checks to y. In the above example we have

  S_1 = y_0 + y_1 + y_3 = e_0 + e_1 + e_3,
  S_2 = y_0 + y_4 + y_5 = e_0 + e_4 + e_5,
  S_3 = y_0 + y_2 + y_6 = e_0 + e_2 + e_6.

Theorem 15. If not more than [J/2] errors occur, then the true value of e_0 is the value taken by the majority of the S_i's, with the rule that ties are broken in favor of 0.

Proof. Suppose at most [J/2] errors occur. (i) If e_0 = 0, then at most [J/2] equations are affected by the errors. Therefore at least ⌈J/2⌉ of the S_i's are equal to 0. (ii) If e_0 = 1, then fewer than [J/2] equations are affected by the other errors. Hence the majority of the S_i's are equal to 1.   Q.E.D.
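For the [7, 3, 4] example above, J = 3 and a single error is always corrected. A C sketch of this one-step decision for e_0 (illustrative only):

```c
/* One-step majority-logic estimate of e_0 for the [7,3,4] simplex code,
   from the three checks S_1, S_2, S_3 orthogonal on coordinate 0.       */
static int estimate_e0(const int *y)
{
    int s1 = y[0] ^ y[1] ^ y[3];
    int s2 = y[0] ^ y[4] ^ y[5];
    int s3 = y[0] ^ y[2] ^ y[6];
    return (s1 + s2 + s3) >= 2;          /* majority of S_1, S_2, S_3 */
}
```

By the cyclic-shift remark below, the same circuit shifted around the word decodes every coordinate.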

Corollary 16. If there are J parity checks orthogonal on every coordinate, the code can correct [J/2] errors.

Remarks. (i) If the code is cyclic, once a set of J parity checks orthogonal on one coordinate has been found, J parity checks orthogonal on the other coordinates are obtained by cyclically shifting the first set.
(ii) The proof of Theorem 15 shows that some error vectors of weight greater than [J/2] will cause incorrect decoding. However, one of the nice features of majority logic decoding (besides the inexpensive circuitry) is that often many error vectors of weight greater than [J/2] are also corrected.
(iii) Breaking ties. In case of a tie, the rule is to favor 0 if it is one of the alternatives, but otherwise to break ties in any way. Equivalently, use the majority of {0, S_1, S_2, ...}.

This method of decoding is called one-step majority logic decoding. However, usually there are not enough orthogonal parity checks to correct up to one-half of the minimum distance, as the following theorem shows.

Theorem 17. For a code over GF(q), the number of errors which can be corrected by one-step majority logic decoding is at most

  (n − 1) / (2(d' − 1)),

where d' is the minimum distance of the dual code.

Proof. The parity checks orthogonal on the first coordinate have the form

  1 x x x 0 0 0 0 · · ·      (≥ d' − 1 other nonzero coordinates)
  1 0 0 0 x x x x · · ·      (≥ d' − 1 other nonzero coordinates)

since each corresponds to a codeword in the dual code, and outside the first coordinate their supports are disjoint. Therefore J ≤ (n − 1)/(d' − 1). By Remark (ii) above, some error pattern of weight [(n − 1)/(2(d' − 1))] + 1 will cause the decoder to make a mistake.   Q.E.D.

Examples. (1) For the [23, 12, 7] Golay code, d' = 8, and so at most [22/(2 · 7)] = 1 error can be corrected by one-step decoding.
(2) Likewise most RS codes cannot be decoded by one-step decoding, since d' = n − d + 2.

However, there are codes for which one-step majority logic decoding is useful, such as the [7, 3, 4] code of Fig. 13.7 and more generally the difference-set cyclic codes described below.

L-step decoding. Some codes, for example RM codes, can be decoded using several stages of majority logic.

Definition. A set of parity checks S_1, S_2, ... is called orthogonal on coordinates a, b, ..., c if the sum x_a + x_b + · · · + x_c appears in each S_i, but no other x_j appears more than once in the set.

Example. 2-step decoding of the [7, 4, 3] code. The 7 nonzero parity checks are

  0 1 2 3 4 5 6
  1 1 1 0 1 0 0
  0 1 1 1 0 1 0
  0 0 1 1 1 0 1
  1 0 0 1 1 1 0
  0 1 0 0 1 1 1
  1 0 1 0 0 1 1
  1 1 0 1 0 0 1

There are two parity checks orthogonal on coordinates 0 and 1, namely

  S_1 = y_0 + y_1 + y_2 + y_4,
  S_2 = y_0 + y_1 + y_3 + y_6,

two which are orthogonal on coordinates 0 and 2,

  y_0 + y_2 + y_1 + y_4,
  y_0 + y_2 + y_5 + y_6,

and so on. Suppose there is one error. Then the majority rule gives the correct value of e_0 + e_1 (from the first pair of equations), and of e_0 + e_2 (from the second pair). Now the equations

  S_4 = e_0 + e_1,
  S_5 = e_0 + e_2

are orthogonal on e_0 and again the majority rule gives e_0. This is a two-step majority logic decoding algorithm to find e_0. A circuit for doing this is shown in Fig. 13.8.

Fig. 13.8. Two-step majority decoding of the [7, 4, 3] code. (The circled M in the figure denotes a majority gate.)

Since the code is cyclic, it is enough to design a decoder which corrects the
first coordinate. The others are then corrected automatically.
A decoder which has L levels of majority logic is called an L-step decoder.
The basic idea, as illustrated in the preceding example, is that the number of
coordinates in the check sums which are being estimated decreases from level
to level, until at the final step we have estimates for the individual coor-
dinates.

Lemma 18. If there are J checks at each stage of the decoding, then [J/2] errors can be corrected.

Proof. As for Theorem 15. Q.E.D.

Even with L-step decoding it may not be possible to correct up to half the
minimum distance.

Theorem 19. For a code over GF(q), the number of errors which can be corrected by L-step majority decoding is at most

  n/d' − 1/2,

where d' is the minimum distance of the dual code.

Proof. Suppose there are a set of J parity checks orthogonal on l coordinates. We shall show that J ≤ 2n/d' − 1, which by Remark (ii) above proves the theorem. Let the i-th parity check involve a_i coordinates besides the l. Since these checks correspond to codewords in the dual code, we have

  l + a_i ≥ d',   a_i + a_j ≥ d' (i ≠ j).   (14)

Set S = Σ_{i=1}^{J} a_i. Then S ≤ n − l, and from (14),

  Jl + S ≥ J d',   (J − 1)S ≥ C(J, 2) d'.

Eliminating l gives n − S ≥ (J d' − S)/J, and eliminating S then gives (J − 1)d' ≤ 2n − 2d'.   Q.E.D.

Example. For the [23, 12, 7] Golay code, L-step decoding cannot correct more than 2 errors.

In contrast to this, we have:

Theorem 20. For the r-th order RM code R(r, m), (r + 1)-step majority decoding can correct [(d − 1)/2] = [(2^{m−r} − 1)/2] errors.

Proof. The dual code is R(m − r − 1, m) and by Theorem 8 the low weight codewords in the dual code are the incidence vectors of the (r + 1)-dimensional flats in EG(m, 2).

Let V be any r-flat. We will find a set of parity checks orthogonal on the coordinates y_P, P ∈ V. In fact, let U be any (r + 1)-flat containing V. Now each of the 2^m − 2^r points not in V determines a U, and each U is determined by 2^{r+1} − 2^r such points. Therefore there are (2^m − 2^r)/(2^{r+1} − 2^r) = 2^{m−r} − 1 different U's. Any two such U's meet only in V. Thus we have an estimate for the sum

  Σ_{P ∈ V} y_P.

This estimate will be correct provided no more than [(2^{m−r} − 1)/2] errors occur. We repeat this for all r-flats V.

Next, let W be any (r − 1)-flat, and let V be any r-flat containing W. There are 2^{m−r+1} − 1 such V's and from the first stage we know the values of the corresponding sums. Therefore we can obtain an estimate for the value of the sum

  Σ_{P ∈ W} y_P.

Proceeding in this way, after r + 1 steps we finally arrive at an estimate for y_P, for any point P, which will be correct provided no more than [(d − 1)/2] = [(2^{m−r} − 1)/2] errors occur.   Q.E.D.

Improvements of the decoding algorithm. A more practical scheme than the preceding is to use the cyclic code R(r, m)*, since then one need only construct a circuit to decode the first coordinate. The dual code R(r, m)*^⊥ now contains the incidence vectors of all (r + 1)-flats in EG(m, 2) which do not pass through the origin.

This is illustrated by the [7, 4, 3] code, for which we gave a two-step decoding algorithm in Fig. 13.8. This code is in fact R(1, 3)*.

The following technique, known as sequential code reduction, considerably reduces the number of majority gates, but at the cost of increasing the delay in decoding.
We shall illustrate the technique by applying it to the decoder of Fig. 13.8. The idea is very simple. In Fig. 13.8, let

  S_4, S'_4, S''_4, ...

denote the output from the first majority gate at successive times. Then

  S_4 = e_0 + e_1,
  S'_4 = e_1 + e_2,
  S''_4 = e_2 + e_3, . . . .

Note that

  S_5 = e_0 + e_2 = S_4 + S'_4.

So if we are willing to wait one clock cycle, S_5 can be obtained without using the second majority gate. The resulting circuit is shown in Fig. 13.9.

In general this technique (when it is applicable) will reduce the number of majority gates, adders, etc., from an exponential function of the number of steps to a linear function, at the cost of a linear delay (as in Fig. 13.10).

The name "sequential code reduction" comes from the fact that at each successive stage of the decoder we have estimates for additional parity checks, and so the codeword appears to belong to a smaller code. In the example, after the first stage we know all the sums e_i + e_j, which are in fact parity checks on the [7, 1, 7] repetition code.
Unfortunately, sequential code reduction doesn't apply to all codes (and even when it does apply, it may be a difficult task to find the best decoder).

  Fig. 13.9. Decoder for the [7, 4, 3] code using sequential code reduction.

  Fig. 13.10. Circuit using sequential code reduction; circuit for L-step decoding.

References to this and to other modifications and generalizations of the basic algorithm are given in the notes.

Threshold decoding. Let us return briefly to the problem of decoding an arbitrary binary linear code. Suppose the codeword x is transmitted and y is received. The decoder decides that the most likely error vector ê is the coset leader of the coset containing y, and decodes y as x̂ = y + ê. We saw in Theorem 5 of Ch. 1 that ê is a function of the syndrome S. More precisely ê is a binary vector-valued function of the n − k components S_1, S_2, ..., S_{n−k} of S (see Fig. 13.11). For small codes we could synthesize ê by a simple combinational logic circuit using AND's, OR's, etc. (but no delay elements). This can be further simplified if the code is cyclic, in which case the circuit is called a Meggitt decoder. Figure 13.8 is a simple example of a Meggitt decoder. See also Problem 36 of Ch. 7.

  Fig. 13.11. [The received vector y is buffered while the syndrome S is computed; a second circuit computes ê from S, and the decoder outputs x̂ = y + ê.]
But larger codes (e.g. BCH codes, Fig. 9.5) require more complicated components to synthesize ê. For example in this chapter we have used majority gates to decode RM codes. A more general notion is that of a weighted majority gate, or threshold gate, which is a function θ(v_1, ..., v_m) of v_1, ..., v_m, with real weights a_1, ..., a_m, defined by

  θ(v_1, ..., v_m) = ½ (1 − sgn(a_1(−1)^{v_1} + · · · + a_m(−1)^{v_m})),   (15)

where these sums are real.

In contrast to majority logic decoding, any code can be decoded by a one-step threshold gate decoding algorithm. It is easy to show that this can always be done: what is hard is to find an efficient way of doing it.

Theorem 21. (Rudolph.) Any binary linear code can be decoded by one-step threshold gate decoding.

Proof. Write ê = (f_1, ..., f_n), where each component f_i = f_i(S) = f_i(S_1, ..., S_{n−k}) is a function of the syndrome. Let F_i(S) = 1 − 2 f_i(S) be the corresponding real ±1-valued function. By Equation (11) of Ch. 14, F_i(S) can be written

  F_i(S) = (1/2^{n−k}) Σ_{u ∈ V^{n−k}} F̂_i(u) (−1)^{u · S},

where the F̂_i(u) are the Hadamard coefficients of F_i(S) given by Equation (8) of Ch. 14. Then

  f_i(S) = ½ (1 − (1/2^{n−k}) Σ_{u ∈ V^{n−k}} F̂_i(u) (−1)^{u · S}).

If θ is any threshold gate function of the 2^{n−k} inputs Σ_{j=1}^{n−k} u_j S_j, u ∈ V^{n−k}, with weights a_u, it is immediate from the definition of θ that

  θ(Σ_j u_j S_j : u ∈ V^{n−k}) = ½ (1 − sgn Σ_{u ∈ V^{n−k}} a_u (−1)^{u · S}),

where sgn(x) = 1 if x ≥ 0, = −1 if x < 0. Therefore

  f_i(S) = θ(Σ_j u_j S_j : u ∈ V^{n−k})   (16)

if we take a_u = F̂_i(u)/2^{n−k}. Since we can do this for each i, the theorem follows.   Q.E.D.
Unfortunately Equation (16) represents ê as a threshold function of all the 2^{n−k} parity checks, so this is not a practical algorithm. However in a number of cases it has been possible to find a different one-step threshold gate realization of ê which involves many fewer parity checks (see Notes).

Research Problem (13.2). Find the most efficient one-step threshold gate realization of a given Boolean function.

§8. Other geometrical codes

(I) Difference-set cyclic codes. Let Π be the projective plane PG(2, p^s) of order p^s (see Appendix on Finite Geometries). Π contains n = p^{2s} + p^s + 1 points, which can be represented as triples

    (β_1, β_2, β_3),  β_i ∈ GF(p^s).

Note that (λβ_1, λβ_2, λβ_3), λ ∈ GF(p^s)*, is the same point as (β_1, β_2, β_3). Each triple can be regarded as an element of GF(p^{3s}), i.e. can be written as a power of α, where α is a primitive element of GF(p^{3s}). Some scalar multiple of each triple is equal to α^i for 0 ≤ i < n. We label the n points of the plane by these powers of α.

Let α^{i_1}, ..., α^{i_l}, l = p^s + 1, be a line of Π. The incidence vector of this line has 1's in exactly the coordinates i_1, ..., i_l. By the proof of Theorem 10, any cyclic shift of this vector is the incidence vector of another line. Since there are n shifts and n lines, every line of Π is obtained in this way.

Let ℬ be the code generated over GF(p) by these n incidence vectors, and let 𝒞 = ℬ^⊥. Clearly ℬ is a cyclic code of length n. From Theorem 13, ℬ has dimension (p(p + 1)/2)^s + 1.

𝒞 can be decoded by one-step majority logic, as follows. The incidence vectors of the l = p^s + 1 lines through a point of Π form a set of orthogonal checks on that coordinate. (They are orthogonal because two lines through a point have no other intersection.) By Corollary 16, one-step majority logic decoding will correct [½(p^s + 1)] errors, and the code has minimum distance at least p^s + 2.
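For the smallest case p = 2, s = 1, the orthogonality of these checks can be verified directly. A minimal Python sketch, assuming the seven points of PG(2, 2) are labeled 0, ..., 6 so that one line is {0, 1, 3} and the remaining lines are its cyclic shifts (the labeling is an assumption for illustration; some such labeling always exists):

```python
ds = [0, 1, 3]  # assumed labels of one line of PG(2, 2)
lines = [{(d + i) % 7 for d in ds} for i in range(7)]  # all 7 lines, as cyclic shifts

# the l = p^s + 1 = 3 lines through a point meet pairwise only in that point,
# so their incidence vectors are orthogonal checks on that coordinate
for point in range(7):
    through = [L for L in lines if point in L]
    assert len(through) == 3
    for i in range(3):
        for j in range(i + 1, 3):
            assert through[i] & through[j] == {point}
```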

Examples. (1) The simplest example is when p = 2, s = 1. Then 𝒞 is the [7, 3, 4] binary simplex code again, as shown in Fig. 13.7.
(2) If p = s = 2, ℬ is generated by the lines of the projective plane of order 4, and 𝒞 is a [21, 11, 6] binary code.
These codes are closely related to difference sets.
These codes are closely related to difference sets.

Definition. A planar difference set modulo n = l(l − 1) + 1 is a set of l numbers d_1, ..., d_l with the property that the l(l − 1) differences d_i − d_j (i ≠ j), when reduced modulo n, are exactly the numbers 1, 2, ..., n − 1 in some order.

Examples. (1) For l = 3, the numbers d_1 = 0, d_2 = 1, d_3 = 3 form a planar difference set modulo 7. Indeed, the differences modulo 7 are 1 − 0 = 1, 3 − 0 = 3, 3 − 1 = 2, 0 − 1 = 6, 0 − 3 = 4, 1 − 3 = 5.
(2) {0, 1, 3, 9} is a difference set modulo 13, and {0, 2, 7, 8, 11} is a difference set modulo 21.
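These examples are easy to check mechanically. A small Python sketch (the function name is ours):

```python
def is_planar_difference_set(ds, n):
    """Check that the l(l-1) differences d_i - d_j (i != j), reduced mod n,
    are exactly 1, 2, ..., n-1 in some order."""
    diffs = [(a - b) % n for a in ds for b in ds if a != b]
    return sorted(diffs) == list(range(1, n))

assert is_planar_difference_set([0, 1, 3], 7)         # l = 3, n = 7
assert is_planar_difference_set([0, 1, 3, 9], 13)     # l = 4, n = 13
assert is_planar_difference_set([0, 2, 7, 8, 11], 21)
assert not is_planar_difference_set([0, 1, 2], 7)     # repeated differences
```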
The only known planar difference sets are those obtained from a projective plane Π of order p^s in the following way: let α^{i_1}, ..., α^{i_l} be the points of a line of Π. Then {i_1, ..., i_l} is a planar difference set. For suppose two differences are equal, say

    i_r − i_s = i_t − i_u,  where i_r ≠ i_t.

Then the cyclic shift of this line which sends the point α^{i_r} into the point α^{i_t} gives a new line which meets the first in two points, a contradiction. Thus we have proved

Theorem 22. (Singer.) If α^{i_1}, ..., α^{i_l}, l = p^s + 1, are the points of a line in a projective plane PG(2, p^s), then {i_1, ..., i_l} is a planar difference set modulo p^{2s} + p^s + 1.

Research Problem (13.3). Are there any other planar difference sets?

(II) Euclidean and projective geometry codes. Instead of taking ℬ to be generated by the incidence vectors of the lines of a projective plane, we may use the incidence vectors of the r-dimensional flats of a Euclidean geometry EG(m, p^s) or a projective geometry PG(m, p^s). Then 𝒞 = ℬ^⊥ is a Euclidean or projective geometry code. There is no simple formula for the dimension of these codes. Decoding can be done by r-step majority logic decoding as for RM codes. Further details and generalizations can be found in the references given in the notes. All of these codes can be regarded as extensions of RM codes.

§9. Automorphism groups of the RM codes

Let A = (a_{ij}) be an invertible m × m binary matrix, and let b be a binary m-tuple. The transformation

    T: replace (v_1, ..., v_m)^T by A(v_1, ..., v_m)^T + b    (17)

is a permutation of the set of 2^m m-tuples which sends 0 into b.

We may also think of T as permuting Boolean functions:

    T: replace f(v_1, ..., v_m) by f(Σ_i a_{1i}v_i + b_1, ..., Σ_i a_{mi}v_i + b_m).    (18)

The set of all such transformations T forms a group, with composition as the group operation. The order of this group is found as follows. The first column of A may be chosen in 2^m − 1 ways, the second in 2^m − 2, the third in 2^m − 4, .... Furthermore there are 2^m choices for b. So this group, which is called the general affine group and is denoted by GA(m), has order

    |GA(m)| = 2^m(2^m − 1)(2^m − 2)(2^m − 2^2) ··· (2^m − 2^{m−1}).    (19)

A useful approximation to its order is

    |GA(m)| ≈ 0.29 · 2^{m²+m} for m large.
(We encountered another form of this group in §5 of Ch. 8.)
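Formula (19) and the 0.29 approximation can be checked numerically; the constant is the infinite product of (1 − 2^{−j}) over j ≥ 1, which is approximately 0.288788. A Python sketch:

```python
def ga_order(m):
    # |GA(m)| = 2^m (2^m - 1)(2^m - 2)(2^m - 2^2) ... (2^m - 2^{m-1}),  Eq. (19)
    order = 2 ** m  # choices for b
    for i in range(m):
        order *= 2 ** m - 2 ** i
    return order

def gl_order(m):
    # |GL(m, 2)| = (2^m - 1)(2^m - 2) ... (2^m - 2^{m-1}) = |GA(m)| / 2^m
    return ga_order(m) // 2 ** m

assert ga_order(2) == 4 * 3 * 2       # |GA(2)| = 24
assert gl_order(3) == 7 * 6 * 4       # |GL(3, 2)| = 168
ratio = ga_order(20) / 2 ** (20 * 20 + 20)
assert abs(ratio - 0.28878810) < 1e-5  # the "0.29" of the approximation
```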
It is clear from (18) that if f is a polynomial of degree r, so is Tf. Therefore the group GA(m) permutes the codewords of the r-th order RM code ℛ(r, m), and

    GA(m) ⊆ Aut ℛ(r, m).    (20)
The subgroup of GA(m) consisting of all transformations

    T: replace (v_1, ..., v_m)^T by A(v_1, ..., v_m)^T    (21)

(i.e., those for which b = 0) is the general linear group GL(m, 2) (see §5 of Ch. 8), and has order

    |GL(m, 2)| = (2^m − 1)(2^m − 2)(2^m − 2^2) ··· (2^m − 2^{m−1})
               ≈ 0.29 · 2^{m²} for m large.    (22)

Since (21) fixes the zero m-tuple, the group GL(m, 2) permutes the codewords of the punctured RM code ℛ(r, m)*:

    GL(m, 2) ⊆ Aut ℛ(r, m)*.    (23)

Note that GL(m, 2) is doubly transitive and GA(m) is triply transitive (Problem 9).

Theorem 23. For 1 ≤ r ≤ m − 1,
(a) Aut ℛ(r, m)* ⊆ Aut ℛ(r + 1, m)*,
(b) Aut ℛ(r, m) ⊆ Aut ℛ(r + 1, m).

Proof of (b). Let x_1, ..., x_B be the minimum weight vectors of ℛ(r, m). For π ∈ Aut ℛ(r, m), let πx_i = x_{i'}. Now x_i is an (m − r)-flat. If Y is any (m − r − 1)-flat, then for some i, j, Y = x_i ∗ x_j. Therefore

    πY = π(x_i ∗ x_j) = πx_i ∗ πx_j = x_{i'} ∗ x_{j'},

which is the intersection of two (m − r)-flats, and contains 2^{m−r−1} points since π is a permutation. Thus πY is an (m − r − 1)-flat. So π permutes the generators of ℛ(r + 1, m), and therefore preserves the whole code. Part (a) is proved in the same way. Q.E.D.

It is immediate from Problem 5 that Aut ℛ(r, m)* is the full symmetric group on the 2^m − 1 coordinates for r = 0 and m − 1, and Aut ℛ(r, m) is the full symmetric group on the 2^m coordinates for r = 0, m − 1 and m. In the remaining cases we show that equality holds in (20) and (23).

Theorem 24. For 1 ≤ r ≤ m − 2,
(a) Aut ℛ(r, m)* = GL(m, 2),
(b) Aut ℛ(r, m) = GA(m).

Proof. (i) Adjoining the all-ones vector 1 to the simplex code ℋ_m^⊥ gives ℛ(1, m)*, and puncturing ℛ(1, m) in the 0 coordinate gives ℛ(1, m)* back. By Problem 30 of Ch. 8, Aut ℋ_m^⊥ = Aut ℛ(1, m)*. From (23), and the remark following Theorem 13 of Ch. 8, since ℋ_m^⊥ has dimension m,

    Aut ℋ_m^⊥ = Aut ℛ(1, m)* = GL(m, 2).

Finally, by Problem 29 of Ch. 8, Aut ℋ_m = GL(m, 2), where ℋ_m is the Hamming code.
(ii) Let G_1 = Aut ℛ(1, m)*, G_2 = Aut ℛ(1, m). Clearly G_1 is the subgroup of G_2 which fixes the 0 coordinate. Since GA(m) is transitive, so is G_2. Each coset of G_1 in G_2 sends 0 to a different point, so |G_2| = 2^m|G_1|. Therefore from (19) and (22), G_2 = GA(m). Again by Problem 29 of Ch. 8, Aut ℛ(m − 2, m) = Aut ℛ(1, m) = GA(m).
(iii) From Theorem 23 and (i), (ii),

    GA(m) = Aut ℛ(1, m) ⊆ Aut ℛ(2, m) ⊆ ··· ⊆ Aut ℛ(m − 2, m) = GA(m),
    GL(m, 2) = Aut ℛ(1, m)* ⊆ Aut ℛ(2, m)* ⊆ ··· ⊆ Aut ℛ(m − 2, m)* = GL(m, 2).

Q.E.D.

Problem. (9) Show that GL(m, 2) and GA(m) are in fact groups of the stated orders, and are respectively doubly and triply transitive.

*§10. Mattson-Solomon polynomials of RM codes

In this section we shall show that the Boolean function defining a codeword of an RM code is really the same as the Mattson-Solomon polynomial (Ch. 8) of the codeword.

Let α be a primitive element of GF(2^m). Then 1, α, ..., α^{m−1} is a basis for GF(2^m). Let λ_0, ..., λ_{m−1} be the complementary basis (Ch. 4). We shall now consider RM codes to be defined by truth tables in which the columns are taken in the order 0, 1, α, α^2, ..., α^{2^m−2}. For example, when m = 3 (with α a root of x^3 + x + 1), the truth table is shown in Fig. 13.12.

          0   1   α   α^2  α^3  α^4  α^5  α^6
    v_3   0   0   0   1    0    1    1    1
    v_2   0   0   1   0    1    1    1    0
    v_1   0   1   0   0    1    0    1    1

    Fig. 13.12.

Lemma 25. The MS polynomial of the codeword v_{j+1} in ℛ(1, m) is T_m(λ_j z), where T_m is the trace function defined in §8 of Ch. 4.

Proof. Let M be the matrix consisting of the rows v_m, ..., v_1 of the truth table, with the first, or zero, column deleted. M is an m × (2^m − 1) matrix (M_{ki}), and

    α^i = Σ_{k=0}^{m−1} M_{ki} α^{m−k−1}.

We must show that T_m(λ_j α^i) = M_{m−j−1,i}, the i-th coordinate of v_{j+1}. In fact,

    T_m(λ_j α^i) = Σ_{k=0}^{m−1} M_{ki} T_m(λ_j α^{m−k−1}) = M_{m−j−1,i},

by the property of a complementary basis. Q.E.D.

Corollary 26. If a is any binary vector of length 2^m, corresponding to the Boolean function a(v_1, ..., v_m), then the MS polynomial of a is

    A(z) = a(T_m(λ_0 z), T_m(λ_1 z), ..., T_m(λ_{m−1} z)).    (24)

Notes. (i) A(0) = Σ_{i=0}^{2^m−2} A(α^i) = a(0, ..., 0) is an overall parity check on a.
(ii) When evaluating the RHS of (24), high powers of z are reduced by z^{2^m−1} = 1. However, once A(z) has been obtained, in order to use the properties of MS polynomials given in Ch. 8, A(z) must be considered as a polynomial in ℱ[z], ℱ = GF(2^m).

Conversely, if a is the vector with MS polynomial A(z), the B.f. corresponding to a is

    a(v_1, ..., v_m) = A(v_1 + v_2 α + ··· + v_m α^{m−1}).
Example. Suppose m = 3 and the codeword is v_1, so that the MS polynomial is A(z) = z + z^2 + z^4. Then the corresponding B.f. is

    A(v_1 + v_2 α + v_3 α^2) = v_1 + v_2 T_3(α) + v_3 T_3(α^2)
                             = v_1 + (v_2 + v_3)(α + α^2 + α^4) = v_1,

in agreement with Fig. 13.12.
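The trace computations in this example can be checked with a few lines of field arithmetic. A Python sketch, representing GF(8) = GF(2)[x]/(x^3 + x + 1) by 3-bit integers (the representation is our choice):

```python
def gf8_mul(a, b):
    # carry-less multiply, then reduce by x^3 = x + 1 (the 0b1011 mask)
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

def trace(z):
    # T3(z) = z + z^2 + z^4
    z2 = gf8_mul(z, z)
    return z ^ z2 ^ gf8_mul(z2, z2)

alpha = 0b010  # a root of x^3 + x + 1
assert trace(0b001) == 1                                  # T3(1) = 1
assert trace(alpha) == 0 and trace(gf8_mul(alpha, alpha)) == 0
assert sum(trace(z) for z in range(8)) == 4               # T3 is balanced on GF(8)
```

Since T3(alpha) = T3(alpha^2) = 0, the linearized polynomial z + z^2 + z^4 evaluated at v_1 + v_2*alpha + v_3*alpha^2 collapses to v_1, as claimed.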

Problem. (10) Repeat for the codeword v_2.

*§11. The action of the general affine group on Mattson-Solomon polynomials

Definition. An affine polynomial is a linearized polynomial plus a constant (see §9 of Ch. 4), i.e. has the form

    F(z) = γ + Σ_{i=0}^{m−1} γ_i z^{2^i},

where γ, γ_i ∈ GF(2^m).

Problem. (11) Show that those zeros of an affine polynomial which lie in GF(2^m) form an r-flat in EG(m, 2).

The main result of this section is the following:

Theorem 27. A transformation of the general affine group GA(m) acts on the MS polynomial A(z) of a vector of length 2^m by replacing z by F(z), where F(z) is an affine polynomial with exactly one zero in GF(2^m). Conversely, any such transformation of MS polynomials arises from a transformation of GA(m).

Proof. Consider any transformation belonging to GA(m), say

    v^T → Av^T + b,

where

    v = (v_1, ..., v_m) = (T_m(λ_0 z), ..., T_m(λ_{m−1} z)).

The variable z in the MS polynomial is related to v by

    z = a · v^T,

where a = (1, α, ..., α^{m−1}). Thus z is transformed as follows:

    z → aAv^T + a · b.    (25)

Let aA = β = (β_0, ..., β_{m−1}) be a new basis for GF(2^m). Then

    z → a · b + (β · λ^T)z + (β · (λ^2)^T)z^2 + ··· + (β · (λ^{2^{m−1}})^T)z^{2^{m−1}},    (26)

where λ = (λ_0, ..., λ_{m−1}), λ^{2^i} = (λ_0^{2^i}, ..., λ_{m−1}^{2^i}), and the T denotes transpose. This is an affine polynomial.

Problem. (12) Show that the polynomial (26) has exactly one zero in GF(2^m).

Conversely, any transformation

    z → F(z) = u_0 + f(z),  u_0 ∈ GF(2^m),    (27)

where f(z) is a linearized polynomial and F(z) has exactly one zero in GF(2^m), is in GA(m). We decompose (27) into

    z → z + u_0,

which is clearly in GA(m), followed by

    z → f(z) = Σ_{j=0}^{m−1} γ_j z^{2^j}.    (28)

Problem. (13) Show that (28) is in GL(m, 2), i.e. has the form

    z → Σ_{i=0}^{m−1} (β · (λ^{2^i})^T) z^{2^i}

for some basis β_0, ..., β_{m−1} of GF(2^m).

Notes to Chapter 13.

§l. Reed-Muller codes are named after Reed [1104] and Muller [975]
(although Peterson and Weldon [1040, p. 141] attribute their discovery to an
earlier, unpublished paper of Mitani [963]).

Nonbinary RM codes have been defined by several authors- see Delsarte


et al. [365], Kasami, Lin and Peterson [739-741], Massey et al. [924] and
Weldon [1401]. See also [99, 1396].

§2. Truth tables are widely used in switching theory (see for example McCluskey [935, Ch. 3]) and in elementary logic, where they are usually written with FALSE instead of 0 and TRUE instead of 1 (see for example Kemeny et al. [755, Ch. 1]). In either form they are of great importance in discrete mathematics. For the disjunctive normal form see Harrison [606, p. 59] or McCluskey [935, p. 78].

§3. Problem 6 is due to S.M. Reddy.

§4. Lemma 6 is from Rothschild and Van Lint [1128].

§5. The proof of Theorem 10 is taken from Delsarte [343].

Lucas' theorem for multinomial coefficients was used in the proof of Theorem 10. This is a straightforward generalization of the usual Lucas theorem for binomial coefficients, which in the binary case is as follows.

Theorem 28. (Lucas [862].) Let the binary expansions of n, k and l = n − k be

    n = Σ_i n_i 2^i,  k = Σ_i k_i 2^i,  l = Σ_i l_i 2^i,

where n_i, k_i, l_i are 0 or 1. Then

    C(n, k) ≡ 1 (mod 2) iff k_i ≤ n_i for all i,
    C(n, k) ≡ 0 (mod 2) iff k_i > n_i for some i.

Equivalently,

    C(n, k) ≡ 1 (mod 2) iff k_i + l_i ≤ n_i for all i,
    C(n, k) ≡ 0 (mod 2) iff k_i + l_i > n_i for some i.

For a proof see for example Berlekamp [113, p. 113]. Singmaster [1216] gives
generalizations.
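In the binary case Theorem 28 says that C(n, k) is odd exactly when every binary digit of k is dominated by the corresponding digit of n, i.e. when n AND k = k bitwise. A quick Python check:

```python
from math import comb

def lucas_parity(n, k):
    # C(n, k) is odd iff k_i <= n_i for every binary digit, i.e. n & k == k
    return 1 if n & k == k else 0

for n in range(128):
    for k in range(n + 1):
        assert comb(n, k) % 2 == lucas_parity(n, k)
```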
That RM codes are extended cyclic codes (Theorem 11) was simultaneously discovered by Kasami, Lin and Peterson [739, 740] and by Kolesnik and Mironchikov (see [774] and the references given there). See also Camion [237].
Theorem 13 is from MacWilliams and Mann [884]. The ranks of the
incidence matrices of subspaces of other dimensions have been determined
by Goethals and Delsarte [499], Hamada [591], and Smith [1243-1246].

§6. The Reed decoding algorithm was given by Reed [1104], and was the first
nontrivial majority logic decoding algorithm. For more about this algorithm

see Gore [544] and Green and San Souci [555]. Massey [918] (see also [921]) and Kolesnik and Mironchikov (described in Dobrushin [378]) extensively studied majority logic and threshold decoding. Rudolph [1130, 1131] introduced one-step weighted majority decoding, and this work was extended by Chow [294], Duc [387], Gore [544, 545], Ng [990] and Rudolph and Robbins [1134].
Techniques for speeding up the decoding of Reed-Muller codes were given
by Weldon [1403]. See also Peterson and Weldon [1040, Ch. 10]. Decoding
Reed-Muller and other codes using a general purpose computer has been
investigated by Paschburg et al. [1025, 1026].
Duc [388] has given conditions on a code which must be satisfied if L-step decoding can correct [½(d − 1)] errors. See also Kugurakov [786].
The Reed decoding algorithm will correct many error patterns of weight greater than [½(d − 1)]. Krichevskii [784] has investigated just how many.
Theorems 17 and 19 are given by Lin [835-837].
Other papers on majority-logic decoding are Berman and Yudanina [137], Chen [271], Delsarte [348], Duc and Skattebol [390], Dyn'kin and Tenegol'ts [397], Kasami and Lin [732, 733], Kladov [763], Kolesnik [773], Longobardi et al. [860], Redinbo [1101], Shiva and Tavares [1205], Smith [1245] and Warren [1390].

§7. Decoding by sequential code reduction was introduced by Rudolph and Hartmann [1132]. Some applications are given in [1112]. Meggitt decoders were described in [952]. Theorem 21 is due to Rudolph [1131]. Rudolph and Robbins [1134] show that the same result holds using only positive weights a_i in (15). Longobardi et al. [860] and Robbins [1115] have found efficient threshold decoding circuits for certain codes.

§8. Difference-set cyclic codes were introduced by Weldon [1400]. Their dimension was found by Graham and MacWilliams [553]. For Theorem 22 see Singer [1213].
For more about difference sets see Baumert [82], Hall [589], Mann [908] and Raghavarao [1085, Ch. 10].
Euclidean and projective geometry codes were studied by Prange [1074, 1075], but the first published discussion was given by Rudolph [1130]. There is now an extensive literature on these codes and their generalizations; see for example Chen [267-270], Chen et al. [273, 274, 276], Cooper and Gore [307], Delsarte [343], Goethals [491, 494], Goethals and Delsarte [499], Gore and Cooper [548], Hartmann et al. [611], Kasami et al. [732, 740, 741], Lin et al. [833, 835-837, 840], Rao [1092], Smith [1246] and Weldon [1402].

§9. The class of codes which are invariant under the general affine group has
been studied by Kasami, Lin and Peterson [738] and Delsarte [344].

§11. Problem 11 is from Berlekamp [113, Ch. 11].


First-order Reed-Muller codes

§1. Introduction

In this chapter we continue the study of Reed-Muller codes begun in the previous chapter, concentrating on the first-order codes ℛ(1, m). Under the name of pseudo-noise or PN sequences, the codewords of ℛ(1, m), or more precisely, of the simplex code 𝒮_m, are widely used in range-finding, synchronizing, modulation, and scrambling, and in §2 we describe their properties. In §3 the difficult (and essentially unsolved) problem of classifying the cosets of ℛ(1, m) is investigated. Then §4 describes the encoding and decoding of ℛ(1, m). This family of codes is one of the few for which maximum likelihood decoding is practical, using the Green Machine decoder. (One of these codes was used to transmit pictures from Mars.) The final section deals with cosets of greatest minimum weight. These correspond to Boolean functions which are least like linear functions, the so-called bent functions.

§2. Pseudo-noise sequences

The codewords (except for 0 and 1) of the cyclic [2^m − 1, m, 2^{m−1}] simplex code 𝒮_m or the [2^m, m + 1, 2^{m−1}] extended cyclic first-order Reed-Muller code ℛ(1, m) resemble random sequences of 0's and 1's (Fig. 14.1).

    0 0 0 0 0 0 0
    1 1 1 0 1 0 0
    0 1 1 1 0 1 0
    0 0 1 1 1 0 1
    1 0 0 1 1 1 0
    0 1 0 0 1 1 1
    1 0 1 0 0 1 1
    1 1 0 1 0 0 1

    Fig. 14.1. Codewords of the [7, 3, 4] cyclic simplex code.

In fact we shall see that if c is any nonzero codeword of 𝒮_m, then c has many of the properties that we would expect from a sequence obtained by tossing a fair coin 2^m − 1 times. For example, the number of 0's and the number of 1's in c are as nearly equal as they can be. Also, define a run to be a maximal string of consecutive identical symbols. Then one half of the runs in c have length 1, one quarter have length 2, one eighth have length 3, and so on. In each case the number of runs of 0's is equal to the number of runs of 1's. Perhaps the most important property of c is that its autocorrelation function is given by

    ρ(0) = 1,  ρ(τ) = −1/(2^m − 1)  for 1 ≤ τ ≤ 2^m − 2

(see Fig. 14.4).
This randomness makes these codewords very useful in a number of
applications, such as range-finding, synchronizing, modulation, scrambling,
etc.
Of course the codewords are not really random, and one way this shows up
is that the properties we have mentioned hold for every nonzero codeword in
a simplex code, whereas in a coin-tossing experiment there would be some
variation from sequence to sequence. (For this reason these codewords are
unsuitable for serious encryption.)
These codewords can be generated by shift registers, and we now describe
how this is done and give their properties.
Let

    h(x) = x^m + h_{m−1}x^{m−1} + ··· + h_1 x + 1

be a primitive irreducible polynomial of degree m over GF(2) (see Ch. 4). As in §3 of Ch. 8, h(x) is the check polynomial of the [2^m − 1, m, 2^{m−1}] simplex code 𝒮_m. We construct a linear feedback shift register whose feedback connections are defined by h(x), as in Fig. 14.2.

    Fig. 14.2. Feedback shift register defined by h(x).


For example, if h(x) = x^4 + x + 1, the shift register is shown in Fig. 14.3. The successive states of the register, starting from 0001, are

    0001, 1000, 0100, 0010, 1001, 1100, 0110, 1011,
    0101, 1010, 1101, 1110, 1111, 0111, 0011, (repeats),

a cycle of period 15.

    Fig. 14.3. Shift register corresponding to h(x) = x^4 + x + 1, showing successive states.

Suppose the initial contents (or state) of the shift register are a_{m−1}, ..., a_1, a_0 as in Fig. 14.2. The output is taken from the right-hand end of the register, and is the infinite binary sequence a = a_0 a_1 a_2 ···.

Definition. For any nonzero initial state, the output a is called a pseudo-noise (or PN) sequence. (These sequences are also called pseudo-random sequences, m-sequences, or maximal-length feedback shift register sequences.) An example is shown in Fig. 14.3, which gives the successive states of the shift register if the initial state is 0001. The output sequence is the rightmost column of the states, i.e.,

    a = a_0 a_1 ··· = 100 010 011 010 111, 100 ...    (1)

having period 15.
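For h(x) = x^4 + x + 1 the recurrence of Property I reduces to a_l = a_{l−3} + a_{l−4} (mod 2), so sequence (1) can be regenerated in a few lines. A Python sketch of the Fibonacci-style shift register:

```python
def pn_sequence(m, taps, length, state=None):
    # Fibonacci LFSR: a_l = sum of a_{l-t} over t in taps (mod 2);
    # taps (3, 4) corresponds to the check polynomial h(x) = x^4 + x + 1
    a = list(state if state is not None else [1] + [0] * (m - 1))
    while len(a) < length:
        a.append(sum(a[-t] for t in taps) % 2)
    return a

a = pn_sequence(m=4, taps=(3, 4), length=34)
assert a[:15] == [1,0,0,0,1,0,0,1,1,0,1,0,1,1,1]   # sequence (1)
assert a[15:30] == a[:15]                           # period 2^4 - 1 = 15
# window property (Problem 1): every nonzero 4-tuple appears once per period
windows = {tuple(a[i:i + 4]) for i in range(15)}
assert len(windows) == 15 and (0, 0, 0, 0) not in windows
```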

Properties of a PN sequence.

Property I. A PN sequence a satisfies the recurrence

    a_l = h_{m−1}a_{l−1} + h_{m−2}a_{l−2} + ··· + h_1 a_{l−m+1} + a_{l−m}  (mod 2),

for l = m, m + 1, ....

Property II. For some initial state a_0 ··· a_{m−1}, the sequence a is periodic with period n = 2^m − 1, i.e. a_{n+i} = a_i for all i ≥ 0, and n is the smallest number with this property.

Proof. Observe that for any l ≥ m, Property I implies

    (a_{l−m+1}, ..., a_l)^T = U(a_{l−m}, ..., a_{l−1})^T

for a fixed m × m binary matrix U determined by h(x). Then

    (a_{l−m+1}, ..., a_l)^T = U^2(a_{l−m−1}, ..., a_{l−2})^T = ··· = U^{l−m+1}(a_0, ..., a_{m−1})^T.

Now U is the companion matrix of h(x) (see §3 of Ch. 4), and by Problem 20 of Ch. 4, n = 2^m − 1 is the smallest number such that U^n = I. Therefore there is a vector b = (a_0, ..., a_{m−1}) such that U^n b^T = b^T, and U^i b^T ≠ b^T for 1 ≤ i ≤ n − 1. If the initial state is taken to be b, then a has period 2^m − 1. Q.E.D.

Property III. With this initial state, the shift register goes through all possible 2^m − 1 nonzero states before repeating.

Proof. Obvious because the period is 2^m − 1. Q.E.D.

Note that the zero state doesn't occur, unless a is identically zero. Also 2^m − 1 is the maximum possible period for an m-stage linear shift register.

Property IV. For any nonzero initial state, the output sequence has period 2^m − 1, and in fact is obtained by dropping some of the initial digits from a.

Proof. This follows from (III). Q.E.D.


Now let us examine the properties of any segment

    c = a_u a_{u+1} ··· a_{u+n−1}

of length n = 2^m − 1 from a.

Property V. c belongs to the simplex code 𝒮_m with check polynomial h(x).

Proof. c is clearly in the code with parity check polynomial h(x), since c satisfies Equation (12) of Ch. 7. Since h(x) is a primitive polynomial, this is a simplex code (by §3 of Ch. 8).

Property VI. (The shift-and-add property.) The sum of any segment c with a cyclic shift of itself is another cyclic shift of c.

Proof. Immediate from V. Q.E.D.

Problems. (1) (The window property.) Show that if a window of width m is slid along a PN sequence then each of the 2^m − 1 nonzero binary m-tuples is seen exactly once in a period.
(2) Consider an m-stage shift register defined by h(x) (as in Fig. 14.2), where h(x) is not necessarily a primitive polynomial. If the initial state is a_0 = 1, a_1 = ··· = a_{m−1} = 0, show that the period of a is p, where p is the smallest positive integer for which h(x) divides x^p − 1.
(3) If h(x) is irreducible, but not necessarily primitive, then p divides 2^m − 1.
(4) If a = a_0 a_1 ··· is a PN sequence, then so is b = a_0 a_i a_{2i} ··· if i is relatively prime to 2^m − 1.
(5) Some shift of a, i.e. b = a_s a_{s+1} a_{s+2} ··· = b_0 b_1 b_2 ··· (say), has the property that b_i = b_{2i} for all i. (b just consists of repetitions of the idempotent of the simplex code; see §4 of Ch. 8.)

Pseudo-randomness properties.

Property VII. In any segment c there are 2^{m−1} 1's and 2^{m−1} − 1 0's.

Proof. From (V). Q.E.D.

Property VIII. In c, one half of the runs have length 1, one quarter have length 2, one eighth have length 3, and so on, as long as these fractions give integral numbers of runs. In each case the number of runs of 0's is equal to the number of runs of 1's.

Problem. (6) Prove this.

Autocorrelation function. We come now to the most important property, the autocorrelation function. The autocorrelation function ρ(τ) of an infinite real or complex sequence s_0 s_1 s_2 ··· of period n is defined by

    ρ(τ) = (1/n) Σ_{i=0}^{n−1} s_i s̄_{τ+i}  for τ = 0, ±1, ±2, ...    (2)

where the bar denotes complex conjugation. This is a periodic function: ρ(τ) = ρ(τ + n). The autocorrelation function of a binary sequence a_0 a_1 ··· of period n is then defined to be the autocorrelation function of the real sequence (−1)^{a_0}, (−1)^{a_1}, ... obtained by replacing 1's by −1's and 0's by +1's. Thus

    ρ(τ) = (1/n) Σ_{i=0}^{n−1} (−1)^{a_i + a_{τ+i}}.    (3)

Alternatively, let A be the number of places where a_0 ··· a_{n−1} and the cyclic shift a_τ a_{τ+1} ··· a_{τ+n−1} agree, and D the number of places where they disagree (so A + D = n). Then

    ρ(τ) = (A − D)/n.    (4)

For example, the PN sequence (1) has autocorrelation function ρ(0) = 1, ρ(τ) = −1/15 for 1 ≤ τ ≤ 14, as shown in Fig. 14.4.
p(T)

Property IX. The autocorrelation function of a PN sequence of period n = 2^m − 1 is given by

    ρ(0) = 1,  ρ(τ) = −1/(2^m − 1)  for 1 ≤ τ ≤ 2^m − 2.    (5)

    Fig. 14.4. Autocorrelation function of a PN sequence.



Proof. From (4),

    ρ(τ) = (n − 2d)/n,    (6)

where d = dist(a_0 ··· a_{n−1}, a_τ ··· a_{τ+n−1}) = wt(a_u ··· a_{u+n−1}) for some u, by (VI). The result then follows from (V) and (VII). Q.E.D.
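Property IX is easy to confirm numerically for the period-15 sequence (1). A Python sketch using definition (3):

```python
def autocorrelation(bits, tau):
    # rho(tau) = (1/n) * sum_i (-1)^(a_i + a_{i+tau}), indices mod the period n
    n = len(bits)
    return sum((-1) ** (bits[i] ^ bits[(i + tau) % n]) for i in range(n)) / n

a = [1,0,0,0,1,0,0,1,1,0,1,0,1,1,1]   # one period of sequence (1)
assert autocorrelation(a, 0) == 1.0
assert all(abs(autocorrelation(a, t) - (-1/15)) < 1e-12 for t in range(1, 15))
```

The two-valued spectrum reflects the shift-and-add property: a shifted period differs from the original in exactly 2^{m−1} = 8 places, so A − D = 7 − 8 = −1.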

Problems. (7) Show that (5) is the best possible autocorrelation function of any binary sequence of period n = 2^m − 1, in the sense of minimizing max_{0<i<n} ρ(i).
(8) (A test to distinguish a PN sequence from a coin-tossing sequence.) Let c_0, ..., c_{N−1} be N consecutive binary digits from a PN sequence of period 2^m − 1, where N < 2^m − 1, and form the b × (N − b + 1) matrix M whose (i, j) entry is c_{i+j}, where m < b < ½N. Show that the rank of M over GF(2) is less than b. [Hint: there are only m linearly independent sequences in 𝒮_m.] On the other hand, show that if c_0, ..., c_{N−1} is a segment of a coin-tossing sequence, where each c_j is 0 or 1 with probability ½, then the probability that rank(M) < b is at most 2^{2b−N−1}. This is very small if b ≪ ½N.
Thus the question "Is rank(M) = b?" is a test on a small number of digits from a PN sequence which shows a departure from true randomness. E.g. if m = 11, 2^m − 1 = 2047, b = 15, the test will fail if applied to any N = 50 consecutive digits of a PN sequence, whereas the probability that a coin-tossing sequence fails is at most 2^{−21}.

§3. Cosets of the first-order Reed-Muller code

The problem of enumerating the cosets of a first-order Reed-Muller code arises in various practical situations. The general problem is unsolved (see Research Problem 1.1 of Chapter 1), although the cosets have been completely enumerated for n ≤ 32. However, there are a few interesting things we can say. For example, finding the weight distribution of the coset containing v amounts to finding the Hadamard transform of v. Also enumerating the cosets of ℛ(1, m) is equivalent to classifying Boolean functions modulo linear functions. The properties of the cosets described here will also be useful in decoding (see §4). We end this section with a table of cosets of ℛ(1, 4). Those cosets of greatest minimum weight are especially interesting and will be studied in the final section of the chapter.

Notation. As in §2 of the preceding chapter, v = (v_1, ..., v_m) denotes a vector which ranges over V^m, and if f(v_1, ..., v_m) is any Boolean function, f is the corresponding binary vector of length 2^m.

The first-order Reed-Muller code ℛ(1, m) consists of all vectors

    u_0 1 + Σ_{i=1}^m u_i v_i,  u_i = 0 or 1,    (7)

corresponding to linear Boolean functions. Define the orthogonal code 𝒪_m to be the [2^m, m, 2^{m−1}] code consisting of the vectors

    Σ_{i=1}^m u_i v_i = u · v,  u_i ∈ {0, 1}.

Thus

    ℛ(1, m) = 𝒪_m ∪ (1 + 𝒪_m).

Suppose that in the codewords of ℛ(1, m) we replace 1's by −1's and 0's by 1's. The resulting set of 2^{m+1} real vectors are the coordinates of the vertices of a regular figure in 2^m-dimensional Euclidean space, called a cross-polytope (i.e. a generalized octahedron). For example, when m = 2, we obtain the 4-dimensional cross-polytope (also called a 16-cell) shown in Fig. 14.5.

    Fig. 14.5. 4-dimensional cross-polytope, with vertices corresponding to codewords of ℛ(1, 2).

This set of real vectors is also called a biorthogonal signal set (see Problem 43 of Ch. 1).
If the same transformation is applied to the codewords of 𝒪_m, we obtain a set of 2^m mutually orthogonal real vectors.
For any vector u = (u_1, ..., u_m) in V^m, f(u) will denote the value of f at u, or equally the component of f in the place corresponding to u.
It will be convenient to have a name for the real vector obtained from a binary vector f by replacing 1's by −1's and 0's by +1's: call it F. Thus the component of F in the place corresponding to u is

    F(u) = (−1)^{f(u)}.

Hadamard transforms and cosets of ℛ(1, m). Recall from Ch. 2 that the Hadamard transform F̂ of a real vector F is given by

    F̂(u) = Σ_{v∈V^m} (−1)^{u·v} F(v),  u ∈ V^m,
         = Σ_{v∈V^m} (−1)^{u·v + f(v)}.    (8)

F̂ is a real vector of length 2^m. Alternatively

    F̂ = FH,

where H is the 2^m × 2^m symmetric Hadamard matrix given by

    H = (H_{u,v}),  H_{u,v} = (−1)^{u·v},  u, v ∈ V^m.    (9)

Consequently

    F = (1/2^m) F̂H,    (10)

or

    F(v) = (1/2^m) Σ_{u∈V^m} (−1)^{u·v} F̂(u).    (11)

Observe from (8) that F̂(u) is equal to the number of 0's minus the number of 1's in the binary vector

    f + Σ_{i=1}^m u_i v_i.

Thus

    F̂(u) = 2^m − 2 dist(f, Σ_{i=1}^m u_i v_i),    (12)

or

    dist(f, Σ_{i=1}^m u_i v_i) = ½{2^m − F̂(u)}.    (13)

Also

    dist(f, 1 + Σ_{i=1}^m u_i v_i) = ½{2^m + F̂(u)}.    (14)

Now the weight distribution of that coset (of a code 𝒞) which contains f gives the distances of f from the codewords of 𝒞. Therefore we have proved:

Theorem 1. The weight distribution of that coset of ℛ(1, m) which contains f is given by the numbers

    ½{2^m ± F̂(u)}  for u ∈ V^m.

The weight distribution of the coset containing f is thus determined by the Hadamard transform of F.

Problem. (9) If the coset of 𝒪_m containing f has weight distribution A_i'(f), 0 ≤ i ≤ 2^m, show that the coset of ℛ(1, m) containing f has weight distribution

    A_i(f) = A_i'(f) + A'_{2^m−i}(f),  0 ≤ i ≤ 2^m.

Equations (13) and (14) say that the closest codeword of ℛ(1, m) to f is that Σ u_i v_i for which |F̂(u)| is largest.

Example. For m = 2, we have

    v_2 = 0 0 1 1
    v_1 = 0 1 0 1

Suppose f = 0 0 0 1 = v_1 v_2. Then F = 1 1 1 − (where − stands for −1). The Hadamard transform coefficients of F are, from (8),

    F̂(00) = F(00) + F(01) + F(10) + F(11) = 1 + 1 + 1 − 1 = 2,
    F̂(01) = F(00) − F(01) + F(10) − F(11) = 2,
    F̂(10) = 2,
    F̂(11) = −2.

Indeed,

    F = ¼{2 × (1 1 1 1) + 2 × (1 − 1 −) + 2 × (1 1 − −) − 2 × (1 − − 1)} = 1 1 1 −,

verifying (11).
The code ℛ(1, 2) consists of the vectors {0000, 0101, 0011, 0110, 1111, 1010, 1100, 1001}. The weight distribution of the coset containing f = 0001 is, according to Theorem 1,

    A_1(f) = 4,  A_3(f) = 4,

which is indeed the case.
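The computation in this example, and the statement of Theorem 1, can be replayed in code. A Python sketch (the ordering of V^2 is an implementation choice):

```python
from itertools import product

def hadamard_transform(F, m):
    # F_hat(u) = sum over v of (-1)^(u.v) F(v),  u, v in V^m   (Equation (8))
    V = list(product((0, 1), repeat=m))
    return [sum((-1) ** sum(ui * vi for ui, vi in zip(u, v)) * F[j]
                for j, v in enumerate(V)) for u in V]

f = [0, 0, 0, 1]                 # truth table of f = v1 v2, m = 2
F = [(-1) ** b for b in f]       # the (+1, -1) vector
Fh = hadamard_transform(F, 2)
assert sorted(Fh) == [-2, 2, 2, 2]
assert sum(x * x for x in Fh) == 16          # Parseval: 2^(2m)
# Theorem 1: the coset weights are (2^m -+ F_hat(u)) / 2
weights = sorted([(4 - x) // 2 for x in Fh] + [(4 + x) // 2 for x in Fh])
assert weights == [1, 1, 1, 1, 3, 3, 3, 3]   # A_1 = 4, A_3 = 4
```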


The Hadamard transform coefficients of a (+1, −1)-vector F satisfy the following orthogonality relation.

Lemma 2.

    Σ_{u∈V^m} F̂(u)F̂(u + v) = 2^{2m} if v = 0, and 0 if v ≠ 0.

Proof.

    LHS = Σ_{u∈V^m} Σ_{w∈V^m} (−1)^{u·w}F(w) Σ_{x∈V^m} (−1)^{(u+v)·x}F(x)
        = Σ_{w,x∈V^m} (−1)^{v·x}F(w)F(x) Σ_{u∈V^m} (−1)^{u·(w+x)}.

The inner sum is equal to 2^m δ_{w,x}. Therefore

    LHS = 2^m Σ_{w∈V^m} (−1)^{v·w}F(w)^2
        = 2^m Σ_{w∈V^m} (−1)^{v·w},  since F(w) = ±1.

Q.E.D.

Corollary 3. (Parseval's equation.)

    Σ_{u∈V^m} F̂(u)^2 = 2^{2m}.    (15)

Note that Corollary 3 and Equation (12) imply that the weight distribution A_i'(f), 0 ≤ i ≤ 2^m, of the coset f + 𝒪_m satisfies

    Σ_{i=0}^{2^m} (2^m − 2i)^2 A_i'(f) = 2^{2m}.

Boolean functions and cosets of ℛ(1, m). The codewords of ℛ(1, m) are the linear functions (7) of v_1, ..., v_m. Let us say that two Boolean functions f(v_1, ..., v_m) and g(v_1, ..., v_m) are equivalent if the difference

    f(v_1, ..., v_m) − g(v_1, ..., v_m)

is in ℛ(1, m). If this is so, f and g are in the same coset of ℛ(1, m). Equivalence classes of Boolean functions under this definition of equivalence are in 1-1 correspondence with the cosets of ℛ(1, m).

Theorem 4. Suppose the Boolean functions f and g are related by

    g(v) = f(Av + B),    (16)

for some invertible m × m binary matrix A and some binary m-tuple B. We say that g is obtained from f by an affine transformation. Then the cosets of ℛ(1, m) containing f and g have the same weight distribution.

Proof. From Theorem 1 it is enough to show that the sets {±Ĝ(u): u ∈ V^m} and {±F̂(u): u ∈ V^m} are equal. In fact,

    Ĝ(u) = Σ_{v∈V^m} (−1)^{u·v}G(v)
         = Σ_{v∈V^m} (−1)^{u·v}F(Av + B).

Set v = A^{−1}w + A^{−1}B; then

    Ĝ(u) = Σ_{w∈V^m} (−1)^{u·A^{−1}w}(−1)^{u·A^{−1}B}F(w)
         = ± Σ_{w∈V^m} (−1)^{u'·w}F(w),  where u' = uA^{−1},
         = ±F̂(u'). Q.E.D.
Therefore, in order to lump together cosets with the same weight distribution, we can introduce a stronger definition of equivalence. Namely we define f and g to be equivalent if

    g(v) = f(Av + B) + a_0 + Σ_{i=1}^m a_i v_i    (17)

for some binary invertible matrix A, vector B and constants a_i. Now all Boolean functions in the same equivalence class belong to cosets with the same weight distribution.
However, the cosets containing f and g may have the same weight distribution even if f and g are not related as in Equation (17). The first time this happens is when n = 32, where for example two cosets of ℛ(1, 5) whose representative Boolean functions are not related as in (17) both have the weight distribution

    A_12(f) = A_20(f) = 16,  A_16(f) = 32.
We conclude this section with a table giving the weight distribution of the cosets of ℛ(1, 4) (Fig. 14.6). The table gives, for each weight distribution, the simplest corresponding Boolean function f and the number of cosets having this weight distribution.

    Number  Typical Boolean function        Nonzero A_i(f)                                        Remarks
    1       0                               A_0 = 1, A_8 = 30, A_16 = 1                           The code itself
    16      v1v2v3v4                        A_1 = 1, A_7 = 15, A_9 = 15, A_15 = 1
    120     v1v2v3                          A_2 = 1, A_6 = 7, A_8 = 16, A_10 = 7, A_14 = 1
    560     v1v2v3v4 + v1v2                 A_3 = 1, A_5 = 3, A_7 = 12, A_9 = 12, A_11 = 3, A_13 = 1
    840     v1v2v3 + v1v4                   A_4 = 2, A_6 = 8, A_8 = 12, A_10 = 8, A_12 = 2
    35      v1v2                            A_4 = 4, A_8 = 24, A_12 = 4
    448     v1v2v3v4 + v1v2 + v3v4          A_5 = 6, A_7 = 10, A_9 = 10, A_11 = 6
    28      v1v2 + v3v4                     A_6 = 16, A_10 = 16                                   Bent functions

    Fig. 14.6. Cosets of the first-order Reed-Muller code of length 16.

For example, the last line of the table means that there are 28 cosets with weight distribution

    A_6(f) = A_10(f) = 16,

where f = v1v2 + v3v4 or is obtained from this by a substitution of the form shown in Equation (17). These 28 f's are called bent functions, for reasons which will be given in the final section (see also Problem 16).

§4. Encoding and decoding R(1, m)

General techniques for encoding and decoding Reed-Muller codes were
described in Ch. 13. However there are special techniques for first-order
Reed-Muller codes which are described in this section.

R(1, m) is a [2^m, m + 1, 2^{m−1}] code, and so has low rate and can correct
many errors. It is therefore particularly suited to very noisy channels. For
example R(1, 5) was successfully used in the Mariner 9 spacecraft to transmit
pictures from Mars, one of which is shown in Fig. 14.7. (Each dot in the
picture was assigned one of 64 levels of greyness and encoded into a 32-bit
codeword.)
Encoding is especially simple. We describe the method by giving the
encoder for R(1, 3), an [8, 4, 4] code.
A message u_0u_1u_2u_3 is to be encoded into the codeword

                                  [ 1 1 1 1 1 1 1 1 ]
(x_0 x_1 ··· x_7) = (u_0 u_1 u_2 u_3) [ 0 0 0 0 1 1 1 1 ]   (18)
                                  [ 0 0 1 1 0 0 1 1 ]
                                  [ 0 1 0 1 0 1 0 1 ]

This is accomplished by the circuit shown in Fig. 14.8. The clock circuit in
Fig. 14.8 goes through the successive states t_1t_2t_3 = 000, 001, 010, 011, 100, ...,
111, 000, 001, ... (i.e., counts from 0 to 7). The circuit forms

u_0 + u_1t_1 + u_2t_2 + u_3t_3,

which, from Equation (18), is the codeword x_0x_1 ··· x_7. Nothing could be
simpler.
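As a quick illustration of Equation (18), here is a minimal Python sketch of the encoder (the function name rm13_encode is ours, not the book's): column t of the generator matrix is (1, t_1, t_2, t_3), so the codeword bit at clock state t is u_0 + u_1t_1 + u_2t_2 + u_3t_3 (mod 2).

```python
# Sketch of the R(1, 3) encoder of Equation (18). Row 0 of the generator
# matrix is all ones; row i (i = 1, 2, 3) lists bit i of the column index
# t = t1 t2 t3, so column t is (1, t1, t2, t3).

def rm13_encode(u):
    """Encode message bits (u0, u1, u2, u3) into the 8-bit codeword of R(1, 3)."""
    u0, u1, u2, u3 = u
    codeword = []
    for t in range(8):                      # clock states t1 t2 t3 = 000 ... 111
        t1, t2, t3 = (t >> 2) & 1, (t >> 1) & 1, t & 1
        codeword.append((u0 + u1 * t1 + u2 * t2 + u3 * t3) % 2)
    return codeword

print(rm13_encode((1, 0, 1, 1)))   # → [1, 0, 0, 1, 1, 0, 0, 1]
```

Every nonzero weight of R(1, 3) is 4 or 8, consistent with the [8, 4, 4] parameters above.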

Decoding. As we saw in §3 of Ch. 1, maximum likelihood decoding requires
comparing the received vector f with every codeword (7) of R(1, m). I.e., we
must find the distance from f to every codeword of R(1, m), and then decode
f as the closest codeword. From Equations (13) and (14) above, this amounts
to finding the largest component |F̂(u)|, where F̂ is the Hadamard transform
of F given by Equation (8). Suppose the largest component is |F̂(u_1, ..., u_m)|.

Fig. 14.7. Part of the Grand Canyon on Mars. This photograph was transmitted by
the Mariner 9 spacecraft on 19 January 1972 using the first-order Reed-Muller code
R(1, 5). Photograph courtesy of NASA/JPL.

Fig. 14.8. Encoder for R(1, 3). The message enters on the left and the codeword
x_7 ··· x_1x_0 leaves on the right, under control of the clock.



If F̂(u_1, ..., u_m) ≥ 0 we decode f as

Σ_{i=1}^m u_i v_i

(from (13)), whereas if F̂(u_1, ..., u_m) < 0 we decode f as

1 + Σ_{i=1}^m u_i v_i.

Direct calculation of F̂ = F H_{2^m} by multiplying F and H_{2^m} would require
about 2^m × 2^m = 2^{2m} additions and subtractions. Fortunately there is a faster
way to obtain F̂, which is a discrete version of the so-called Fast Fourier
Transform. This is possible because H_{2^m} can be written as a product of
m 2^m × 2^m matrices, each of which has only two non-zero elements per
column. Thus only m·2^m additions and subtractions are needed to evaluate
F̂ = F H_{2^m}.
In order to explain this decomposition of H_{2^m} we must introduce
Kronecker products.

Kronecker product of matrices.

Definition. If A = (a_{ij}) is an m × m matrix and B = (b_{ij}) is an n × n matrix over
any field, the Kronecker product of A and B is the mn × mn matrix obtained
from A by replacing every entry a_{ij} by a_{ij}B. This product is written A ⊗ B.
Symbolically we have

A ⊗ B = (a_{ij}B).

For example, with H_2 = [1 1; 1 −1] and I_2 the 2 × 2 unit matrix,

            [ 1  1  0  0 ]                [ 1  0  1  0 ]
I_2 ⊗ H_2 = [ 1 −1  0  0 ]   (19)   H_2 ⊗ I_2 = [ 0  1  0  1 ]   (20)
            [ 0  0  1  1 ]                [ 1  0 −1  0 ]
            [ 0  0  1 −1 ]                [ 0  1  0 −1 ]

This shows that in general A ⊗ B ≠ B ⊗ A.
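The definition can be sketched directly in code (a minimal illustration, not from the book; the helper name kron is ours):

```python
# Kronecker product A ⊗ B of square matrices, built entry-by-entry by
# replacing a_ij with the block a_ij * B, as in the definition above.

def kron(A, B):
    """Kronecker product of an m x m matrix A and an n x n matrix B (lists of rows)."""
    m, n = len(A), len(B)
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(m * n)]
            for i in range(m * n)]

I2 = [[1, 0], [0, 1]]
H2 = [[1, 1], [1, -1]]
print(kron(I2, H2) == kron(H2, I2))   # → False: A ⊗ B ≠ B ⊗ A in general
```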

Problem. (10) Prove the following properties of the Kronecker product.
(i) Associative law:

A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.

(ii) Distributive law:

(A + B) ⊗ C = A ⊗ C + B ⊗ C.

(iii)

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).   (21)
Hadamard matrices. Let us define the Hadamard matrix H_{2^m} of order 2^m
inductively by

H_{2^m} = H_2 ⊗ H_{2^{m−1}}, for m ≥ 2.

H_2 is the Sylvester-type Hadamard matrix of Ch. 2 (see Fig. 2.3); it is also the
matrix given by Equation (9) above provided u and v take on values in V^m in
the right order.

Theorem 5. (The fast Hadamard transform theorem.)

H_{2^m} = M^{(1)}_{2^m} M^{(2)}_{2^m} ··· M^{(m)}_{2^m},   (22)

where

M^{(i)}_{2^m} = I_{2^{m−i}} ⊗ H_2 ⊗ I_{2^{i−1}},

and I_n is an n × n unit matrix.

Proof. By induction on m. For m = 1 the result is obvious. Assume the result
is true for m. Then for 1 ≤ i ≤ m,

M^{(i)}_{2^{m+1}} = I_{2^{m+1−i}} ⊗ H_2 ⊗ I_{2^{i−1}}
              = I_2 ⊗ I_{2^{m−i}} ⊗ H_2 ⊗ I_{2^{i−1}}
              = I_2 ⊗ M^{(i)}_{2^m},

and

M^{(m+1)}_{2^{m+1}} = H_2 ⊗ I_{2^m}.

Therefore

M^{(1)}_{2^{m+1}} ··· M^{(m+1)}_{2^{m+1}} = (I_2 ⊗ M^{(1)}_{2^m}) ··· (I_2 ⊗ M^{(m)}_{2^m})(H_2 ⊗ I_{2^m})
= H_2 ⊗ (M^{(1)}_{2^m} ··· M^{(m)}_{2^m}) by (21),
= H_2 ⊗ H_{2^m} by the induction hypothesis,
= H_{2^{m+1}}.   Q.E.D.
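Applying the m sparse factors of Equation (22) one after another is exactly the fast Hadamard transform; each factor pairs entries whose indices differ in one bit. A minimal sketch (function name ours, not the book's):

```python
# Fast Hadamard transform via the factorization of Theorem 5: m butterfly
# passes cost m*2^m additions/subtractions instead of the 2^(2m) of a direct
# matrix multiply. Pass with span h applies H_2 to entries h apart.

def fast_hadamard(F):
    """Return F * H_{2^m} for a list F of length 2^m."""
    Fhat = list(F)
    n = len(Fhat)
    h = 1
    while h < n:                          # one pass per factor M^(i)
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = Fhat[j], Fhat[j + h]
                Fhat[j], Fhat[j + h] = a + b, a - b
        h *= 2
    return Fhat

print(fast_hadamard([1] * 8))   # → [8, 0, 0, 0, 0, 0, 0, 0]
```

The first pass (h = 1) produces exactly (F_0 + F_1, F_0 − F_1, F_2 + F_3, F_2 − F_3, ...), matching the first stage of the decoder described below.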

Example. For m = 2,

M^{(1)}_4 = I_2 ⊗ H_2,   M^{(2)}_4 = H_2 ⊗ I_2

(see (19) and (20)), and indeed

M^{(1)}_4 M^{(2)}_4 = [ 1  1  0  0 ][ 1  0  1  0 ]   [ 1  1  1  1 ]
                [ 1 −1  0  0 ][ 0  1  0  1 ] = [ 1 −1  1 −1 ] = H_4.
                [ 0  0  1  1 ][ 1  0 −1  0 ]   [ 1  1 −1 −1 ]
                [ 0  0  1 −1 ][ 0  1  0 −1 ]   [ 1 −1 −1  1 ]

For m = 3,

M^{(1)}_8 = I_4 ⊗ H_2,   M^{(2)}_8 = I_2 ⊗ H_2 ⊗ I_2,   M^{(3)}_8 = H_2 ⊗ I_4.
Decoding circuit: the Green machine. We now give a decoding circuit for
R(1, m) which is based on Theorem 5. This circuit is called the Green
machine after its discoverer R. R. Green. We illustrate the method by
describing the decoder for R(1, 3).
Suppose f = f_0f_1 ··· f_7 is the received vector, and let

F = F_0F_1 ··· F_7 = ((−1)^{f_0}, (−1)^{f_1}, ..., (−1)^{f_7}).

We wish to find

F̂ = F H_8 = (F_0F_1 ··· F_7) M^{(1)}_8 M^{(2)}_8 M^{(3)}_8, from (22).

Now M^{(1)}_8 = I_4 ⊗ H_2 is the block-diagonal matrix with four copies of H_2
on the diagonal. So

F M^{(1)}_8 = (F_0 + F_1, F_0 − F_1, F_2 + F_3, F_2 − F_3, F_4 + F_5, F_4 − F_5, F_6 + F_7, F_6 − F_7).

The circuit shown in Fig. 14.9 calculates the components of F M^{(1)}_8 two at a
time. The switches are arranged so that after F_0 ··· F_3 have been read in, the
two-stage registers on the right contain (F_0 − F_1, F_0 + F_1) and (F_2 − F_3, F_2 + F_3)
respectively. These four quantities are used in the second stage (see Fig.
14.10). Then F_4 ··· F_7 are read in, and (F_4 − F_5, F_4 + F_5) and (F_6 − F_7, F_6 + F_7)
are formed in the same pair of two-stage registers.
The second stage calculates

(F M^{(1)}_8) M^{(2)}_8,   (23)

Fig. 14.9. First stage of the Green machine; the input F_7F_6 ··· F_1F_0 enters serially.

Fig. 14.10. Second stage of the Green machine.

where

M^{(2)}_8 = I_2 ⊗ H_2 ⊗ I_2.

So the product (23) is

(F_0 + F_1 + F_2 + F_3, F_0 − F_1 + F_2 − F_3, F_0 + F_1 − F_2 − F_3, F_0 − F_1 − F_2 + F_3,
 F_4 + F_5 + F_6 + F_7, ..., F_4 − F_5 − F_6 + F_7).

This product is formed by the circuit shown in Fig. 14.10. The third stage
calculates

(F M^{(1)}_8 M^{(2)}_8) M^{(3)}_8,   (24)

where

M^{(3)}_8 = H_2 ⊗ I_4.

So the product (24) is

F̂ = (F_0 + F_1 + ··· + F_7, F_0 − F_1 + F_2 − F_3 + F_4 − F_5 + F_6 − F_7, ...,
     F_0 − F_1 − F_2 + F_3 − F_4 + F_5 + F_6 − F_7)
  = (F̂_0, F̂_1, ..., F̂_7),

which are the desired Hadamard transform components. These are formed by the
circuit shown in Fig. 14.11.
Figures 14.9-14.11 together comprise the Green machine. The final stage is
to find that i for which |F̂_i| is largest. Then f is decoded either as the i-th
codeword of R(1, 3) if F̂_i ≥ 0, or as the complement of the i-th codeword if
F̂_i < 0.
Note that the Green machine has the useful property that the circuit for
decoding R(1, m + 1) is obtained from that for R(1, m) by adding an extra
register to the m-th stage and then adding one more stage.
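The whole decoding procedure can be sketched in a few lines of Python (a brute-force transform rather than the book's staged circuit; the function name is ours): form F, compute the Hadamard transform, find the index u maximizing |F̂_u|, and emit that codeword, complemented if F̂_u < 0.

```python
# Maximum-likelihood decoder for R(1, 3): pick the affine function closest
# to the received word via the Hadamard transform coefficients.

def rm13_decode(f):
    """Decode an 8-bit received word to the nearest codeword of R(1, 3)."""
    F = [(-1) ** bit for bit in f]
    Fhat = [sum((-1) ** bin(u & v).count("1") * F[v] for v in range(8))
            for u in range(8)]
    u = max(range(8), key=lambda i: abs(Fhat[i]))
    comp = 1 if Fhat[u] < 0 else 0
    # codeword bit at position v is comp + u.v (mod 2)
    return [(comp + bin(u & v).count("1")) % 2 for v in range(8)]

# the codeword 10011001 with its last bit flipped is corrected:
print(rm13_decode([1, 0, 0, 1, 1, 0, 0, 0]))   # → [1, 0, 0, 1, 1, 0, 0, 1]
```

Since R(1, 3) has minimum distance 4, any single bit error is corrected this way.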

Fig. 14.11. Third stage of the Green machine.



Problem. (11) Show that M^{(i)}_{2^m} M^{(j)}_{2^m} = M^{(j)}_{2^m} M^{(i)}_{2^m}. (This implies that the order of the
stages in the decoder may be changed without altering the final output.)

§5. Bent Functions

Cosets of a first-order Reed-Muller code with the largest minimum weight
are especially interesting. When m is even the corresponding Boolean func-
tions are called bent functions, because they are in some sense furthest away
from linear functions. In this section we study their properties and give
several constructions. Bent functions will be used in the next chapter in the
construction of the nonlinear Kerdock codes.

Definition. A Boolean function f(v_1, ..., v_m) is called bent if the Hadamard
transform coefficients F̂(u) given by Equation (8) are all ±2^{m/2}.

Examples. (1) f(v_1, v_2) = v_1v_2 is a bent function, since the F̂(u) are all ±2 (see
the example preceding Lemma 2).
(2) f(v_1, v_2, v_3, v_4) = v_1v_2 + v_3v_4 is bent, as shown by the last row of Fig. 14.6.
Since F̂(u) is an integer (from Equation (8)), if f(v_1, ..., v_m) is bent then m
must be even. From now on we assume m is even and ≥ 2.
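The definition is easy to test mechanically. A small sketch (ours; variables v_1, v_2, ... are taken as the bits of the integer index, low bit first):

```python
# Check bentness directly from the definition: f on V^m is bent iff every
# Hadamard transform coefficient F^(u) = sum_v (-1)^(u.v + f(v)) is ±2^(m/2).

def hadamard_coeffs(f, m):
    """Transform coefficients of (-1)^f for f: V^m -> {0, 1}."""
    return [sum((-1) ** (bin(u & v).count("1") + f(v)) for v in range(2 ** m))
            for u in range(2 ** m)]

def is_bent(f, m):
    target = 2 ** (m // 2)
    return m % 2 == 0 and all(abs(c) == target for c in hadamard_coeffs(f, m))

# v1 v2 (bits 0 and 1) and v1 v2 + v3 v4 are bent; a linear function is not:
print(is_bent(lambda v: (v & 1) & (v >> 1 & 1), 2))    # → True
print(is_bent(lambda v: ((v & 1) & (v >> 1 & 1)) ^ ((v >> 2 & 1) & (v >> 3 & 1)), 4))  # → True
print(is_bent(lambda v: v & 1, 2))                     # → False
```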

Theorem 6. A bent function f(v_1, ..., v_m) is further away from any linear
function than any other Boolean function. More precisely, f(v_1, ..., v_m) is bent iff the
corresponding vector f has distance 2^{m−1} ± 2^{m/2−1} from every codeword of
R(1, m). If f is not bent, f has distance less than 2^{m−1} − 2^{m/2−1} from some
codeword of R(1, m).

Proof. If f is not bent, then the F̂(u) are not all ±2^{m/2}. From Corollary 3, since
there are 2^m summands in Equation (15), some |F̂(u)| must be bigger than 2^{m/2}.
Therefore from Equation (13) or (14), the distance between f and some
codeword of R(1, m) is less than 2^{m−1} − 2^{m/2−1}.   Q.E.D.

Theorem 7. f(v_1, ..., v_m) is bent iff the 2^m × 2^m matrix H whose (u, v)-th entry is
(1/2^{m/2}) F̂(u + v) is a Hadamard matrix.

Proof. From Lemma 2.   Q.E.D.



Note that if f(v_1, ..., v_m) is bent, then we may write

F̂(u)/2^{m/2} = (−1)^{f̃(u)},   (25)

which defines a Boolean function f̃(u_1, ..., u_m). The Hadamard transform
coefficients of f̃ (obtained by setting f = f̃ in Equation (8)) are 2^{m/2}(−1)^{f(u)} =
±2^{m/2}. Therefore f̃ is also a bent function!
Thus there is a natural pairing f ↔ f̃ of bent functions.

Problem. (12) Show that f is bent iff the matrix whose (u, v)-th entry is
(−1)^{f(u+v)}, for u, v ∈ V^m, is a Hadamard matrix.

Theorem 8. If f(v_1, ..., v_m) is bent and m > 2, then deg f ≤ ½m.

Proof. Suppose f is bent and m > 2. The proof uses the expansion of f given
by Theorem 1 of Ch. 13, and requires a lemma.
Let F(u) = (−1)^{f(u)}, let F̂(u) be the Hadamard transform of F(u) given by
Equation (8), and let f̃(u) be as in Equation (25).

Lemma 9. If C is any [m, k] code,

Σ_{u∈C⊥} f(u) = 2^{m−k−1} − 2^{½m−1} + 2^{½m−k} Σ_{u∈C} f̃(u),   (26)

where the sums are evaluated as real sums.

Proof. This is just a restatement of Lemma 2 of Ch. 5. We start from

Σ_{u∈C⊥} F(u) = (1/2^k) Σ_{u∈C} F̂(u),

and set F(u) = 1 − 2f(u) and F̂(u) = 2^{m/2}(1 − 2f̃(u)).   Q.E.D.

To complete the proof of the theorem, we apply the lemma with

C⊥ = {b ∈ V^m : b ⊆ a},   C = {b ∈ V^m : b ⊆ ā},

where a is some vector of V^m and ā is its complement. Then |C⊥| = 2^{wt(a)}
and (26) becomes

Σ_{b⊆a} f(b) = 2^{wt(a)−1} − 2^{½m−1} + 2^{wt(a)−½m} Σ_{b⊆ā} f̃(b).   (27)

Now Theorem 1 of Ch. 13 states that

f(v_1, ..., v_m) = Σ_{a∈V^m} g(a) v_1^{a_1} ··· v_m^{a_m},

where

g(a) = Σ_{b⊆a} f(b) (mod 2).

Thus g(a) is given by Equation (27). But if wt(a) > ½m and m > 2, the RHS
of (27) is even, and g(a) is zero. Therefore f has degree at most ½m.   Q.E.D.
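The expansion used in this proof is easy to compute: the coefficient g(a) is the subset-sum of f mod 2, and deg f is the largest weight of an a with g(a) = 1. A small sketch (ours) checking Theorem 8 on f = v_1v_2 + v_3v_4:

```python
# Degree of the algebraic normal form of a Boolean function, via the Moebius
# formula g(a) = sum_{b subset of a} f(b) (mod 2) from Theorem 1 of Ch. 13.

def anf_degree(f, m):
    """Degree of f: V^m -> {0, 1}, inputs given as m-bit integers."""
    deg = 0
    for a in range(2 ** m):
        g, b = 0, a
        while True:                  # iterate all subsets b of a (bitwise)
            g ^= f(b)
            if b == 0:
                break
            b = (b - 1) & a
        if g:
            deg = max(deg, bin(a).count("1"))
    return deg

f = lambda v: ((v & 1) & (v >> 1 & 1)) ^ ((v >> 2 & 1) & (v >> 3 & 1))
print(anf_degree(f, 4))   # → 2, which is m/2 as Theorem 8 requires
```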

Problem. (13) Show that f is bent iff for all v ≠ 0, v ∈ V^m, the directional
derivative of f in the direction v, defined by

f_v(x) = f(x + v) + f(x), x ∈ V^m,

takes the values 0 and 1 equally often.

We shall now construct some families of bent functions.

Theorem 10.

h(u_1, ..., u_m, v_1, ..., v_n) = f(u_1, ..., u_m) + g(v_1, ..., v_n)

is a bent function (of m + n arguments) iff f is a bent function (of u_1, ..., u_m)
and g is a bent function (of v_1, ..., v_n).

Proof. We shall write w ∈ V^{m+n} as w = (u, v) where u ∈ V^m and v ∈ V^n. From
Equation (8),

Ĥ(w) = Σ_{t∈V^{m+n}} (−1)^{w·t + h(t)}, where t = (r, s),
     = F̂(u)Ĝ(v).   (28)

If f and g are bent, then from Equation (28) Ĥ(w) = ±2^{½(m+n)} and so h is bent.
Conversely, suppose h is bent but f is not, so that |F̂(λ)| > 2^{½m} for some λ ∈ V^m.
Then with w = (λ, v),

±2^{½(m+n)} = Ĥ(w) = F̂(λ)Ĝ(v)

for all v ∈ V^n. Therefore |Ĝ(v)| < 2^{½n} for all v ∈ V^n, which is impossible by
Corollary 3.   Q.E.D.

Theorem 10 enables us to generalize examples (1) and (2):

Corollary 11.

v_1v_2 + v_3v_4 + ··· + v_{m−1}v_m   (29)

is bent, for any even m ≥ 2.

It is easy to see that if f(v) is bent then so is any function f(Av + B)
obtained from f by an affine transformation.

Problems. (14) Show that if m is even then v_1v_2 + v_2v_3 + ··· + v_{m−1}v_m is bent.
(15) Use Dickson's theorem (Theorem 4 of the next chapter) to show that
any quadratic bent function is an affine transformation of (29).
(16) (a) The bent function v_1v_2 + v_3v_4 can be represented by the graph on the
vertices 1, 2, 3, 4 whose edges are the pairs {i, j} with v_iv_j occurring in f, here
{1, 2} and {3, 4}. Show that the 28 quadratic bent functions of v_1, v_2, v_3, v_4 fall
into four graph types, of sizes 3, 12, 12 and 1.
(b) Let f = v_1v_2 + v_3v_4. Show that the vectors of weight 6 in f + R(1, 4) form
a 2-(16, 6, 2) design. (This design is called a biplane - see for example
Cameron [231].)
(17) Show that v_1v_2 and v_1v_2 + v_3v_4 are essentially the only bent functions of 2
and 4 arguments.

A Boolean function f(v) = f(v_1, ..., v_m) is called decomposable if there is a
binary matrix A such that

f(Av) = f_1(v_1, ..., v_r) + f_2(v_{r+1}, ..., v_m)

for some r. Otherwise f is indecomposable.

Problem. (18) Show that if f(v_1, ..., v_m) is a bent function of degree ½m ≥ 3,
then f is indecomposable.

Theorem 12. For any function g(v_1, ..., v_m), the function

f(u_1, ..., u_m, v_1, ..., v_m) = Σ_{i=1}^m u_iv_i + g(v_1, ..., v_m)   (30)

is bent.

Proof. From Equations (8) and (13), f(u_1, ..., u_m, v_1, ..., v_m) is bent iff the
number of vectors (u, v) = (u_1, ..., u_m, v_1, ..., v_m) which are zeros of

h = f(u_1, ..., u_m, v_1, ..., v_m) + Σ_{i=1}^m λ_iu_i + Σ_{i=1}^m μ_iv_i   (31)

is 2^{2m−1} ± 2^{m−1}, for all λ = (λ_1, ..., λ_m), μ = (μ_1, ..., μ_m). Substituting (30) into
(31) we obtain

h = g(v_1, ..., v_m) + Σ_{i=1}^m (v_i + λ_i)u_i + Σ_{i=1}^m μ_iv_i.

(i) For any v ≠ λ, the first sum is not identically zero, and h is a linear
function of u_1, ..., u_m. Thus there are 2^{m−1} choices of u_1, ..., u_m for which h
is zero. The total number of zeros of h of this type is 2^{m−1}(2^m − 1).
(ii) Now suppose v = λ. Then

h = g(v_1, ..., v_m) + Σ_{i=1}^m μ_iv_i,

which is either 0 or 1, independent of u_1, ..., u_m. If it is 1, the total number of
zeros of h is 2^{2m−1} − 2^{m−1}. But if it is 0, there are 2^m additional zeros (u
arbitrary, v = λ), for a total of 2^{2m−1} + 2^{m−1}.   Q.E.D.
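Theorem 12 can be confirmed by brute force for small m. The sketch below (ours) checks that for every one of the 16 functions g on V^2, the function f(u, v) = u·v + g(v) of 4 variables is bent:

```python
# Brute-force check of the construction of Theorem 12 for m = 2:
# f(u, v) = u.v + g(v) is bent for every g on V^2.

from itertools import product

def is_bent(f, n):
    """f: V^n -> {0, 1} is bent iff all transform coefficients are ±2^(n/2)."""
    coeffs = [sum((-1) ** (bin(w & t).count("1") + f(t)) for t in range(2 ** n))
              for w in range(2 ** n)]
    return all(abs(c) == 2 ** (n // 2) for c in coeffs)

m = 2
ok = True
for g_table in product((0, 1), repeat=2 ** m):       # all functions g on V^m
    def f(t, g_table=g_table):
        u, v = t >> m, t & (2 ** m - 1)              # split the 2m input bits
        return (bin(u & v).count("1") + g_table[v]) % 2   # u.v + g(v)
    if not is_bent(f, 2 * m):
        ok = False
print(ok)   # → True
```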

Problems. (19) Let

f_1(u_1, ..., u_m, v_1, ..., v_m) = Σ_{i=1}^m u_iv_i.

Show that f_1, f_1 + u_1u_2u_3, f_1 + u_1u_2u_3u_4, ..., and f_1 + u_1 ··· u_m are m − 1
inequivalent bent functions, of degrees 2, 3, ..., m.
(20) Show that

f(u_1, ..., u_m, v_1, ..., v_m) = Σ_{i=1}^m u_iφ_i(v) + g(v)

is bent, where g(v) is arbitrary and φ(v) = (φ_1(v), ..., φ_m(v)) is a 1-1 mapping
of V^m onto itself.

Thus we have constructed a number of bent functions, and many others
will be found in the references given in the Notes. But the following problem
is unsolved:

Research Problem (14.1). Classify all bent functions of m variables.


Appendix B

Finite geometries

§ 1. Introduction

Finite geometries are large combinatorial objects just as codes are, and
therefore it is not surprising that they have turned up in many chapters of this
book (see especially Ch. 13). In this Appendix we sketch the basic theory of
these geometries, beginning in §2 with the definitions of projective and affine
geometries. The most important examples (for us) are the projective geometry
PG(m, q) and the affine or Euclidean geometry EG(m, q) of dimension m
constructed from a finite field GF(q). For dimension m ≥ 3 there are no other
geometries (Theorem 1).
In §3 we study some of the properties of PG(m, q) and EG(m, q),
especially their collineation groups (Theorem 7 and Corollary 9) and the
number of subspaces of each dimension (Theorems 4-6).
In dimension 2 things are more complicated. A projective plane is
equivalent to a Steiner system S(2, n + 1, n² + n + 1), for some n ≥ 2, and an
affine plane to an S(2, n, n²) for some n ≥ 2 (Theorem 10). But now other
kinds of planes exist besides PG(2, q) and EG(2, q) - see §4.

§2. Finite geometries, PG(m, q) and EG(m, q)

Definition. A finite projective geometry consists of a finite set Ω of points
p, q, ... together with a collection of subsets L, M, ... of Ω called lines, which
satisfies axioms (i)-(iv). (If p ∈ L we say that p lies on L or L passes through
p.)
(i) There is a unique line (denoted by (pq)) passing through any two
distinct points p and q.

(ii) Every line contains at least 3 points.
(iii) If distinct lines L, M have a common point p, and if q, r are points of L
not equal to p, and s, t are points of M not equal to p, then the lines (qt) and
(rs) also have a common point (see Fig. 1).

Fig. 1. Axiom (iii).

(iv) For any point p there are at least two lines not containing p, and for
any line L there are at least two points not on L.
A subspace of the projective geometry is a subset S of Ω such that
(v) If p, q are distinct points of S then S contains all points of the line (pq).
Examples of subspaces are the points and lines of Ω and Ω itself. A
hyperplane H is a maximal proper subspace, so that Ω is the only subspace
which properly contains H.

Definition. An affine or Euclidean geometry is obtained by deleting the points
of a fixed hyperplane H (called the hyperplane at infinity) from the subspaces
of a projective geometry. The resulting sets are called the subspaces of the
affine geometry.
A set T of points in a projective or affine geometry is called independent if,
for every x ∈ T, x does not belong to the smallest subspace which contains
T − {x}. For example, any three points not on a line are independent. The
dimension of a subspace S is r − 1, where r is the size of the largest set of
independent points in S. In particular, if S = Ω this defines the dimension of
the projective geometry.

The projective geometry PG(m, q). The most important examples of pro-
jective and affine geometries are those obtained from finite fields.
Let GF(q) be a finite field (see Chs. 3, 4) and suppose m ≥ 2. The points of
Ω are taken to be the nonzero (m + 1)-tuples

(a_0, a_1, ..., a_m), a_i ∈ GF(q),

with the rule that

(a_0, a_1, ..., a_m) and (λa_0, λa_1, ..., λa_m)

are the same point, where λ is any nonzero element of GF(q). These are
called homogeneous coordinates for the points. There are q^{m+1} − 1 nonzero
(m + 1)-tuples, and each point appears q − 1 times, so the number of points in
Ω is (q^{m+1} − 1)/(q − 1).
The line through two distinct points (a_0, ..., a_m) and (b_0, ..., b_m) consists
of the points

(λa_0 + μb_0, ..., λa_m + μb_m),   (1)

where λ, μ ∈ GF(q) are not both zero. A line contains q + 1 points since there
are q² − 1 choices for λ, μ and each point appears q − 1 times in (1).
Axioms (i), (ii) are clearly satisfied.

Problem. (1) Check that (iii) and (iv) hold.

The projective geometry defined in this way is denoted by PG(m, q).

Problem. (2) Show that PG(m, q) has dimension m.

Examples. (1) If m = q = 2, the projective plane PG(2, 2) contains 7 points
labeled (001), (010), (100), (011), (101), (110), (111), and 7 lines, as shown in
Fig. 2 (cf. Fig. 2.12).

Fig. 2. The projective plane PG(2, 2).



(2) If m = 2, q = 3 we obtain the projective plane PG(2, 3), containing
3² + 3 + 1 = 13 points

(001) (010) (011) (012)
(100) (101) (102) (110)
(111) (112) (120) (121)
(122),

and 13 lines, nine of which are shown in Fig. 3.

Fig. 3. The 13 points and nine of the 13 lines of the projective plane PG(2, 3).

It is convenient to extend the definition of PG(m, q) to include the values
m = −1, 0 and 1, even though these degenerate geometries do not satisfy
axiom (iv). Thus PG(−1, q) is the empty set, PG(0, q) is a point, and PG(1, q)
is a line.
A hyperplane or subspace of dimension m − 1 in PG(m, q) consists of
those points (a_0, ..., a_m) which satisfy a homogeneous linear equation

λ_0a_0 + λ_1a_1 + ··· + λ_ma_m = 0, λ_i ∈ GF(q).

Such a hyperplane is in fact a PG(m − 1, q), and will be denoted by
[λ_0, ..., λ_m]. Note that [λ_0, ..., λ_m] and [μλ_0, ..., μλ_m], μ ≠ 0, represent the
same hyperplane. The lines (i.e. hyperplanes) in Figs. 2, 3 have been labeled in
this way. Clearly a point (a_0, ..., a_m) is on the hyperplane [λ_0, ..., λ_m] iff
Σ_{i=0}^m λ_ia_i = 0.
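These definitions can be enumerated directly for prime q (a small sketch of ours; it normalizes each point so that its left-most nonzero coordinate is 1, anticipating Problem 3 below):

```python
# Enumerate PG(m, q) for prime q via normalized homogeneous coordinates,
# and test point-hyperplane incidence sum(l_i a_i) = 0 (mod q).

from itertools import product

def pg_points(m, q):
    """Points of PG(m, q), q prime, as (m+1)-tuples with leading nonzero 1."""
    pts = []
    for a in product(range(q), repeat=m + 1):
        nonzero = [x for x in a if x != 0]
        if nonzero and nonzero[0] == 1:
            pts.append(a)
    return pts

def on_hyperplane(point, hyperplane, q):
    return sum(l * a for l, a in zip(hyperplane, point)) % q == 0

fano = pg_points(2, 2)
print(len(fano))                                    # → 7
# by duality the lines of PG(2, 2) have the same normalized labels,
# and each line contains exactly q + 1 = 3 points:
print(all(sum(on_hyperplane(p, l, 2) for p in fano) == 3 for l in fano))  # → True
```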

Problems. (3) Show that the points of PG(m, q) can be uniquely labeled by
making the left-most nonzero coordinate equal to 1 (as in Example 2).
(4) Show that PG(m, q) is constructed from a vector space V of dimension
m + 1 over GF(q) by taking the 1-dimensional subspaces of V to be the points
of PG(m, q) and the 2-dimensional subspaces to be the lines.
(5) Find the four missing lines in Fig. 3.
(6) Construct PG(3, 2).

The affine or Euclidean geometry EG(m, q). This is obtained from
PG(m, q) by deleting the points of a hyperplane H (it doesn't matter which one,
by Corollary 8). For example, deleting the line [100] from Fig. 2 gives the EG(2, 2)
consisting of the four points (100), (101), (110), (111).
In general if we choose H to be the hyperplane [10 ··· 0] consisting of all
points with a_0 = 0, we are left with the points whose coordinates can be taken
to be (1, a_1, ..., a_m). In this way the q^m points of EG(m, q) can be labeled by
the m-tuples (a_1, ..., a_m), a_i ∈ GF(q).
Again we make the convention that EG(−1, q) is empty, EG(0, q) is the
point 0, and EG(1, q) is a line.

Problem. (7) Show that the dimension (as defined above) of EG(m, q) is equal
to m. Show that EG(m, q) is also a vector space of dimension m over GF(q).

Remark. The nonzero elements of GF(q^{m+1}) represent the points of PG(m, q),
but there are q − 1 elements sitting on each point. For example, take GF(4) =
{0, 1, ω, ω²}. The elements 100, ω00, ω²00 of GF(4³) all represent the point
(100) of PG(2, 4). The line through (100) and (010) contains the five points

100 (or ω00 or ω²00),
010 (or 0ω0 or 0ω²0),
110 (or ωω0 or ω²ω²0),
1ω0 (or ωω²0 or ω²10),
1ω²0 (or ω10 or ω²ω0).

The points of the affine geometry are all elements of GF(q^m).

Desarguesian geometries. If the dimension exceeds 2 all projective and affine


geometries come from finite fields. But in dimension 2 things can be more
complicated.

Theorem 1. If m ≥ 3 then a finite projective geometry of dimension m is a
PG(m, q) for some q, and an affine geometry of dimension m is an EG(m, q)
for some q.

PG(m, q) is called a Desarguesian geometry since Desargues' theorem holds
there.
The proof of Theorem 1 is in three steps. (i) A projective geometry of
dimension m > 2 is one in which Desargues' theorem holds; (ii) the points of a
Desarguesian geometry can be given coordinates from a possibly noncom-
mutative field S; (iii) if S is finite it is commutative, hence S = GF(q) for
some q. For the details see Artin [29, Ch. 2], Baer [56, Ch. 7], Dembowski
[370, Ch. 1], Herstein [642, p. 70], Veblen and Young [1368, Vol. 1, Ch. 2].

§3. Properties of PG(m, q) and EG(m, q)

Subspaces of PG(m, q)

Problem. (8) Show that if S is a subspace of PG(m, q) then S is a PG(r, q) for
some r, 0 ≤ r ≤ m, and that S may be defined as the set of points satisfying
m − r independent homogeneous linear equations.

This implies that the intersection of two distinct PG(m − 1, q)'s in a
PG(m, q) is a PG(m − 2, q) (since the points satisfy two linear equations). The
intersection of a PG(m − 1, q) and a PG(m − 2, q) is either the PG(m − 2, q)
or a PG(m − 3, q), and so on. The intersection of a PG(m − 1, q) and a line is
either the line or a point.
In general, the intersection of a PG(r, q) (defined by m − r equations) and a
PG(s, q) (m − s equations) has dimension r, r − 1, ..., or r − m + s, supposing
s ≥ r. If r − m + s < 0, the subspaces may be disjoint.

Principle of duality. Since points and hyperplanes in a PG(m, q) are both
represented by (m + 1)-tuples, there is a natural 1-1 correspondence between
them, with the point p corresponding to the dual hyperplane [p]. Similarly
there is a 1-1 correspondence between lines and subspaces PG(m − 2, q),
with the line (pq) corresponding to the dual subspace [p] ∩ [q]. This cor-
respondence has the property that if p is on the line (qr) then the dual
hyperplane [p] contains the dual subspace [q] ∩ [r].
Similarly there is a 1-1 correspondence (the technical term is a cor-
relation) between subspaces of dimension r and subspaces of dimension
m − r − 1, which preserves incidence. For example if two PG(r, q)'s meet in a
point then the dual PG(m − r − 1, q)'s span a hyperplane.
This correspondence justifies the principle of duality, which says that any
statement about PG(m, q) remains true if we interchange "point" and
"hyperplane," "PG(r, q)" and "PG(m − r − 1, q)," "intersect" and "span,"
and "contained in" and "contained by."
An important application of this principle is:

Theorem 2. If s ≥ r, the number of subspaces PG(s, q) in PG(m, q) which
contain a given PG(r, q) is equal to the number of PG(m − s − 1, q) contained
in a given PG(m − r − 1, q).

Problem. (9) Prove directly the special case of Theorem 2 which says that the
number of lines through a point is equal to the number of points on a
hyperplane.

The number of subspaces

Theorem 3. The number of PG(r, q) contained in a PG(m, q) is

(q^{m+1} − 1)(q^{m+1} − q) ··· (q^{m+1} − q^r) / ((q^{r+1} − 1)(q^{r+1} − q) ··· (q^{r+1} − q^r)) = [m+1 choose r+1],   (2)

where [m+1 choose r+1] is a Gaussian binomial coefficient defined in Problem 3
of Ch. 15.

Proof. The numerator of (2) is the number of ways of picking r + 1 in-
dependent points in PG(m, q), to define a PG(r, q). However, many of these
sets of points determine the same PG(r, q), so we must divide by the
denominator of (2), which is the number of ways of picking r + 1 independent
points in a PG(r, q).   Q.E.D.
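Equation (2) is directly computable; a small sketch (ours) of the Gaussian binomial coefficient, with the number of PG(r, q) in PG(m, q) given by gaussian(m + 1, r + 1, q):

```python
# Gaussian binomial coefficient [n choose k]_q, computed as the exact integer
# quotient in Equation (2).

def gaussian(n, k, q):
    """[n choose k]_q = prod_{i<k} (q^n - q^i) / prod_{i<k} (q^k - q^i)."""
    num = den = 1
    for i in range(k):
        num *= q ** n - q ** i
        den *= q ** k - q ** i
    return num // den

# 7 points and 7 lines in the Fano plane PG(2, 2):
print(gaussian(3, 1, 2), gaussian(3, 2, 2))   # → 7 7
```

The equal counts of points and hyperplanes illustrate the duality noted in Problem 10.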

A similar argument proves:

Theorem 4. In PG(m, q) let R = PG(r, q) ⊂ S = PG(s, q). The number of
subspaces T of dimension t with R ⊂ T ⊂ S is

[s − r choose t − r].

Problem. (10) Use Theorem 3 to show that the number of PG(r, q) contained
in PG(m, q) is equal to the number of PG(m − r − 1, q) contained in PG(m, q).

Subspaces and flats of EG(m, q). A subspace S of EG(m, q) is called a flat.

Problem. (11) Show that if a flat contains the origin then it is a linear subspace
of EG(m, q) regarded as a vector space; and that a flat not containing the
origin is a coset of a linear subspace.

Thus a flat of dimension r in EG(m, q) is a coset of an EG(r, q), and will be
referred to as an EG(r, q) or an r-flat. A subspace PG(r, q) of PG(m, q) is
also called an r-flat.

Theorem 5. The number of EG(r, q) in an EG(m, q) is q^{m−r} [m choose r].

Proof. Let EG(m, q) be obtained from PG(m, q) by deleting the hyperplane H.
A PG(r, q) either meets H in a PG(r − 1, q) or is contained in H. Thus the
desired number is the difference between the number of PG(r, q) in PG(m, q)
and the number of PG(r, q) in H. By Theorem 3 this is

[m + 1 choose r + 1] − [m choose r + 1] = q^{m−r} [m choose r],

by Problems 3(b), 3(e) of Ch. 15.   Q.E.D.
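The identity used in this proof can be checked numerically for small parameters (a sketch of ours, reusing the Gaussian binomial from Equation (2)):

```python
# Numerical check of [m+1 choose r+1]_q - [m choose r+1]_q = q^(m-r) [m choose r]_q
# for small m, r and prime-power q.

def gaussian(n, k, q):
    num = den = 1
    for i in range(k):
        num *= q ** n - q ** i
        den *= q ** k - q ** i
    return num // den

ok = all(gaussian(m + 1, r + 1, q) - gaussian(m, r + 1, q)
         == q ** (m - r) * gaussian(m, r, q)
         for q in (2, 3, 4) for m in range(1, 6) for r in range(m + 1))
print(ok)   # → True
```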

Theorem 6. In EG(m, q) let R = EG(r, q) ⊂ S = EG(s, q), where r ≥ 1. The
number of flats T of dimension t with R ⊂ T ⊂ S is

[s − r choose t − r].

Proof. Follows from Theorem 4.   Q.E.D.

Note that in a projective geometry two hyperplanes always meet in a
subspace of dimension m − 2, whereas in an affine geometry two hyperplanes
may meet in a subspace of dimension m − 2 or not at all. Disjoint hyperplanes
are called parallel.

Problem. (12) Show that EG(m, q) can be decomposed into q mutually parallel
hyperplanes.

The collineation group of PG(m, q).



Definition. A collineation of a projective or affine geometry is a permutation
of its points which maps lines onto lines. It follows that every subspace is
mapped onto a subspace of the same dimension.
For example, the permutation ((011), (010), (001)) ((110), (111), (100)) is a
collineation of PG(2, 2) - see Fig. 2.
The set of all collineations of PG(m, q) forms its collineation group.
Suppose q = p^s where p is a prime.
Recall from Theorem 12 of Ch. 4 that the automorphism group of the field
GF(p^s) is a cyclic group of order s generated by

σ_p : β → β^p, β ∈ GF(p^s).

Clearly σ_p is a collineation of PG(m, p^s).


Let C be an invertible (m + 1) × (m + 1) matrix over GF(p^s). Then the
permutation of the points of PG(m, p^s) given by

(a_0, ..., a_m) → (a_0, ..., a_m)C

is also a collineation. Clearly C and λC, λ ≠ 0, give the same collineation.
Together σ_p and the matrices C generate a group consisting of the
permutations

(a_0, ..., a_m) → (a_0^{p^i}, ..., a_m^{p^i})C, 0 ≤ i < s.   (3)

There are s ∏_{i=0}^m (q^{m+1} − q^i) such permutations, but only

(s/(q − 1)) ∏_{i=0}^m (q^{m+1} − q^i)   (4)

distinct collineations. This group of collineations is denoted by PΓL_{m+1}(q),
q = p^s.

Theorem 7. (The fundamental theorem of projective geometry.) PΓL_{m+1}(q) is
the full collineation group of PG(m, q).

For the proof see for example Artin [29, p. 88], Baer [56, Ch. 3] or
Carmichael [250, p. 360].
Since PΓL_{m+1}(q) is doubly transitive we have:

Corollary 8. There is essentially only one way of obtaining EG(m, q) from
PG(m, q).

Corollary 9. The full collineation group of EG(m, q) is the subgroup of
PΓL_{m+1}(q) which fixes the hyperplane at infinity (setwise), and has order

s q^m ∏_{i=0}^{m−1} (q^m − q^i).   (5)

(See for example Carmichael [250, p. 374].)

Problem. (13) Given EG(m, q) show that there is essentially only one way to
add a hyperplane and obtain PG(m, q).

§4. Projective and affine planes

A projective geometry of dimension 2 is a projective plane. Unlike the
situation in higher dimensions, a projective plane need not be a PG(2, q) for
any q.
In §5 of Ch. 2 we defined a projective plane to be a Steiner system
S(2, n + 1, n² + n + 1) for some n ≥ 2, or in other words (Definition 2): a
collection of n² + n + 1 points and n² + n + 1 lines, with n + 1 points on each
line and a unique line containing any two points.
However, the best definition of a projective plane is this. Definition 3. A
projective plane is a collection of points and lines satisfying (i) there is a
unique line containing any two points, (ii) any two distinct lines meet at a
unique point, and (iii) there exist four points no three of which lie on a line.

Theorem 10. The three definitions of a projective plane are equivalent.

Sketch of Proof. Definition 1 (§2) ⇒ Definition 3. It is only necessary to show
that any two lines meet. This follows because otherwise the two lines would
contain four independent points and the dimension would not be 2.
Definition 3 ⇒ Definition 2. Take two points p, q and a line L not contain-
ing them. Then the number of lines through p (or through q) is equal to the
number of points on L. Call this number n + 1. Then the total number of
points (or lines) is n(n + 1) + 1 = n² + n + 1.
Definition 2 ⇒ Definition 1. To prove (iii) we show that any two lines meet.
This follows from evaluating in two ways the sum of χ(p, L, M) over all
points p and distinct lines L, M, where χ(p, L, M) = 1 if p = L ∩ M, = 0
otherwise. The dimension is 2, for if p, q, r, s are independent points then the
lines (pq) and (rs) do not meet.   Q.E.D.

A Steiner system S(2, n + 1, n² + n + 1) is called a projective plane of order
n. Thus Figs. 2, 3 show projective planes of orders 2 and 3. In general a
PG(2, q) is a projective plane of order q. From Theorem 7 of Ch. 4, this gives
Desarguesian projective planes of all prime power orders.

However, not all projective planes are Desarguesian. In fact non-Desar-
guesian planes are known of all orders n = p^e > 8 where p is a prime and
e > 1. For n ≤ 8 we have

Theorem 11. The projective planes of orders n = 2, 3, 4, 5, 7, 8 are unique (and
are the Desarguesian planes PG(2, n)).

For the proof see Problem 13 of Ch. 20, and the references on p. 144 of
Dembowski [370].
We know from Problem 11 of Ch. 19 that there is no projective plane of
order 6. This is a special case of

Theorem 12. (Bruck and Ryser.) If n ≡ 1 or 2 (mod 4) and if n is not the sum
of two squares then there is no projective plane of order n.

For the proof see Hall [587, p. 175] or Hughes and Piper [674, p. 87].
Thus planes are known of orders 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, ..., orders
6, 14, 21, ... do not exist by Theorem 12, and orders 10, 12, 15, 18, 20, ... are
undecided. For the connection between codes and orders n ≡ 2 (mod 4) see
Problem 11 of Ch. 19.

Affine or euclidean planes. An affine geometry of dimension 2 is an affine plane,
and is obtained by deleting the points of a fixed line from a projective plane.
A second definition was given in §5 of Ch. 2: an affine plane is an S(2, n, n²),
n ≥ 2. A third definition is this. An affine plane is a collection of points and
lines satisfying (i) there is a unique line containing any two points, (ii) given
any line L and any point p ∉ L there is a unique line through p which does not
meet L, and (iii) there exist three points not on a line. Again the three
definitions agree, and we call an S(2, n, n²) an affine plane of order n. Then the
results given above about the possible orders of projective planes apply also
to affine planes.

Notes on Appendix B

Projective geometries are discussed by Artin [29], Baer [56], Biggs [143],
Birkhoff [152, Ch. 8], Carmichael [250], Dembowski [370], Hall [583, Ch. 12],
MacNeish [869], Segre [1173], and Veblen and Young [1368]. References on
projective planes are Albert and Sandler [20], Hall [582, Ch. 20 and 587, Ch.
12], Segre [1173] and especially Dembowski [370] and Hughes and Piper [674].
For the numbers of subspaces see for example Carmichael [250] or Goldman
and Rota [519]. See also the series of papers by Dai, Feng, Wan and Yang
[103, 104, 1441, 1457-1460, 1474].
