Reed-Muller codes are amongst the oldest and most well-known of codes. They were discovered and proposed by D. E. Muller and I. S. Reed in 1954.
Reed-Muller codes have many interesting properties that are worth examination; they
form an infinite family of codes, and larger Reed-Muller codes can be constructed from
smaller ones. This particular observation leads us to show that Reed-Muller codes can be
defined recursively.
Unfortunately, Reed-Muller codes become weaker as their length increases. However,
they are often used as building blocks in other codes.
One of the major advantages of Reed-Muller codes is the relative simplicity with which messages can be encoded and received transmissions decoded. We examine encoding using generator matrices and decoding using one form of a process known as majority logic.
Reed-Muller codes, like many other codes, have tight links to design theory; we briefly investigate this link between Reed-Muller codes and the designs resulting from affine geometries.
Finally, we present the reader with an implementation of the Reed-Muller encoding and
decoding process, written in ANSI C.
Notation: Assume that x and y are vectors over the vector space F_2^n (with x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn)). Let a be an element of F_2, where we call a a scalar. Then we have the standard operations of addition and scalar multiplication for vectors (these are not described here, and we assume that the reader is familiar with them). However, we also extend the set of operations in the following way:
• Scalar addition
a + x = (a + x1, a + x2, . . . , a + xn)
• Vector complement
x̄ = 1 + x = (1 + x1, 1 + x2, . . . , 1 + xn)
• Vector multiplication
x ∗ y = (x1 ∗ y1, x2 ∗ y2, . . . , xn ∗ yn)
Note that (F_2^n, +, ∗) forms a commutative ring.
Assume that we are given the vector space F_2^{2^m}. Then we consider the ring Rm = F_2[x0, x1, . . . , xm−1]. We will see shortly that there exists a bijection between elements of Rm and F_2^{2^m} (in fact, an isomorphism of rings between (Rm, +, ∗) and (F_2^{2^m}, +, ∗)).
Example: Say we have the Boolean polynomial p = 1 + x1 + x0^5 x2^2 + x0 x1^4 x2^101 ∈ R3. Then, by applying the above rules, we can get its reduced form, p′:
p′ = 1 + x1 + x0 x2 + x0 x1 x2
Consider the mapping ψ : Rm → F_2^{2^m}, defined as follows: for any polynomial p ∈ Rm, to calculate ψ(p), we find the reduced form
q = m1 + m2 + . . . + mr
and take ψ(p) to be the sum of the vectors associated with the monomials mi (a precise description of these vectors is given in section 4.1).
Definition: The rth order Reed-Muller code, denoted RM(r, m), is the set of all polynomials of degree at most r in the ring Rm, as defined in the previous section. Alternatively, through the isomorphism ψ, it may be thought of as a subspace of F_2^{2^m}.
Observation: The 0th order Reed-Muller code RM(0, m) consists of the monomials {0, 1}, which correspond to the all-zeros and all-ones vectors of length 2^m.
3.1 Encoding and the generator matrix
For the Reed-Muller code RM(r, m), we define the generator matrix as follows:
G_RM(r,m) = [ ψ(1)
              ψ(x0)
              ψ(x1)
              ...
              ψ(xm−1)
              ψ(x0 x1)
              ψ(x0 x2)
              ...
              ψ(xm−2 xm−1)
              ψ(x0 x1 x2)
              ...
              ψ(xm−r xm−r+1 · · · xm−1) ]

that is, the rows are the images under ψ of all reduced monomials of degree at most r.
Example: The generator matrix for RM(2, 4) is:

G_RM(2,4):
  ψ(1)      = 1111111111111111
  ψ(x0)     = 1111111100000000
  ψ(x1)     = 1111000011110000
  ψ(x2)     = 1100110011001100
  ψ(x3)     = 1010101010101010
  ψ(x0 x1)  = 1111000000000000
  ψ(x0 x2)  = 1100110000000000
  ψ(x0 x3)  = 1010101000000000
  ψ(x1 x2)  = 1100000011000000
  ψ(x1 x3)  = 1010000010100000
  ψ(x2 x3)  = 1000100010001000
The length of the code is n = 2^m, and the dimension of the code is hence k = C(m,0) + C(m,1) + · · · + C(m,r) (where C(m, i) is the binomial coefficient), the number of rows of the generator matrix.
Encoding is then easy; we are given a message m ∈ Fk2 and we simply perform the
multiplication m ∗ GRM(r,m) to get our codeword c.
Example: Assume we want to encode the message m = 01101001010 using RM(2, 4).
We have:
c = m ∗ GRM(2,4) = 1010111111111010
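For concreteness, the encoding step can be sketched in a few lines of Python (a sketch only, not the ANSI C implementation mentioned in the appendix; here ψ is taken so that ψ(x_i) is 1 exactly at the positions whose bit i, counted from the most significant end, is 0, matching the RM(2, 4) matrix above):

```python
from itertools import combinations

def psi(S, m):
    # generator-matrix row for the monomial with variable set S: entry j is
    # 1 iff bit i of j (counted from the most significant bit) is 0 for
    # every i in S; psi(set(), m) is the all-ones row for the monomial 1
    return [int(all(((j >> (m - 1 - i)) & 1) == 0 for i in S))
            for j in range(1 << m)]

def rm_generator(r, m):
    # rows ordered by degree, then lexicographically, as in the text
    return [psi(S, m)
            for d in range(r + 1)
            for S in combinations(range(m), d)]

def rm_encode(message, r, m):
    # multiply the message by G_RM(r,m) over F_2: XOR the selected rows
    code = [0] * (1 << m)
    for bit, row in zip(message, rm_generator(r, m)):
        if bit:
            code = [a ^ b for a, b in zip(code, row)]
    return code
```

With these conventions, encoding the message 01101001010 with RM(2, 4) reproduces the codeword 1010111111111010 of the example above.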
3.2 Decoding using majority logic
There are different techniques available for decoding Reed-Muller codes, but one of the most
common and most easily implementable is majority logic decoding. We will investigate one
form of this technique here.
Before we begin, we mention the error-correcting capabilities of Reed-Muller codes. Given the code RM(r, m), the minimum distance between any two codewords is 2^{m−r}. This is not proven here; interested readers can refer to theorem 2 in section 4.2 of this document for a detailed proof of this fact that relies on the recursive definition of Reed-Muller codes. To correct ϵ errors, we must have a distance strictly greater than 2ϵ. Thus, in the case of Reed-Muller codes, since we have distance 2^{m−r}, we can correct max(0, 2^{m−r−1} − 1) errors.
We now look at a basic algorithm for decoding using majority logic. The essential idea
behind this technique is that, for each row of the generator matrix, we attempt to determine
through a majority vote whether or not that row was employed in the formation of the
codeword corresponding to our message.
Definition: Let p be any monomial of degree d in Rm with reduced form p′. Then we form the set J of variables not in p′, together with their complements. The characteristic vectors of p are all vectors corresponding to monomials of degree m − d over the variables of J. Note: Any monomial containing both a variable and its complement corresponds to the 0 vector (through ψ: ψ(xi x̄i) = ψ(xi) ∗ ψ(x̄i) = 0). Since ψ is a bijection and ψ−1(0) = 0, this implies that any monomial containing both a variable and its complement is equivalent to the monomial of degree 0. Thus, without loss of generality, we only consider monomials in which no variable appears together with its complement.
Example: If we are working over RM(3, 4), the characteristic vectors of x0 x1 x3 are the vectors corresponding to the monomials {x2, x̄2}. The characteristic vectors of x0 x2 are the vectors corresponding to the monomials {x1 x3, x̄1 x3, x1 x̄3, x̄1 x̄3}.
We accomplish this by beginning at the bottom of our generator matrix and working our way upwards. The algorithm starts by examining the rows corresponding to monomials of degree r. For each such row, we calculate its 2^{m−r} characteristic vectors, and then take the dot product of each of these vectors with our received word. If the majority of the dot products are 1, we assume that this row was used in constructing our codeword, and we set the position in our original message vector corresponding to this row to 1. If the majority of the dot products are 0, then we assume that this row was not used in forming our message, and hence we set the corresponding entry in the original message vector to 0.
After we have run this technique for all rows corresponding to monomials of degree r, we take the vector of length C(m, r) corresponding to the portion of the message we have just calculated, and we multiply it by the C(m, r) rows of our generator matrix that we have just considered. This gives us a vector s of length n. We add s to our received message, and proceed recursively on the rows corresponding to monomials of degree r − 1.
This is best clarified by an example.
Example: We decode the message u = 00110110 in RM(2, 3). The generator matrix for this code can be found in the preceding subsection.
As per the algorithm, we begin from the bottom of the matrix and work up, first considering the rows corresponding to monomials of degree r = 2.
Row x1 x2 has characteristic vectors x0, x̄0:
u · ψ(x0) = 0
u · ψ(x̄0) = 0
→ m_{x1 x2} = 0
We continue to proceed in this fashion, to discover that the original message was v =
0111010 (we can trivially check that this is correct by verifying that v · GRM(2,3) = u). In
this case, u had no errors (indeed, RM(2, 3) can correct 0 errors, which makes it a poor
choice of code, but it will suffice for the purposes of this example).
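The full procedure can be sketched in Python (again a sketch, not the author's implementation; it assumes the row ordering and coordinate convention of the generator matrices shown in the text, so each characteristic vector is the indicator of one of the 2^{m−d} cosets fixing a complement pattern on the absent variables):

```python
from itertools import combinations

def psi(S, m):
    # generator-matrix row for the monomial with variable set S: entry j is
    # 1 iff bit i of j (counted from the most significant bit) is 0 for
    # every i in S, matching the matrices shown in the text
    return [int(all(((j >> (m - 1 - i)) & 1) == 0 for i in S))
            for j in range(1 << m)]

def rm_decode(u, r, m):
    """Majority-logic decoding: vote on every degree-d row, peel that
    degree's contribution off the received word, then move to degree d-1."""
    u = list(u)
    n = 1 << m
    coeff = {}
    for d in range(r, -1, -1):
        for S in combinations(range(m), d):
            absent = [i for i in range(m) if i not in S]
            total = 1 << len(absent)
            votes = 0
            for pattern in range(total):
                # dot product of u with one characteristic vector: the
                # indicator of positions agreeing with `pattern` on the
                # absent variables
                dot = 0
                for j in range(n):
                    if all(((j >> (m - 1 - i)) & 1) == ((pattern >> k) & 1)
                           for k, i in enumerate(absent)):
                        dot ^= u[j]
                votes += dot
            coeff[S] = int(2 * votes > total)       # majority vote
        for S in combinations(range(m), d):         # peel this layer off
            if coeff[S]:
                u = [a ^ b for a, b in zip(u, psi(S, m))]
    return [coeff[S] for d in range(r + 1) for S in combinations(range(m), d)]
```

Decoding u = 00110110 in RM(2, 3) with this sketch returns the message 0111010 found above.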
4.1 An alternate view of ψ
Another way that we can think of ψ is as follows. Consider the ring Rm and the vector space F_2^m. We can consider the vectors in F_2^m simply as binary representations of the elements of Z_{2^m}, and thus we can derive a natural ordering from this observation.
Example: Suppose we are considering F_2^3. Our natural ordering for the vectors in this vector space is:
000, 001, 010, 011, 100, 101, 110, 111
Let S be the set of all monomials in Rm. Let p ∈ S be a Boolean monomial with reduced form p′ = x_{i1} x_{i2} · · · x_{ir} (ij ∈ Zm, ij = ik → j = k, 0 ≤ r ≤ m). We consider the function α, defined as follows (where P(X) is the power set of X):

α : S → P(Zm)
p ≡ p′ = x_{i1} x_{i2} · · · x_{ir} ↦ ∅ if r = 0, and {i1, i2, . . . , ir} otherwise
We then define a class of certain functions. Given T ∈ P(Zm), we define the following function:

fT : F_2^m → F_2
fT(x0, x1, . . . , xm−1) = ∏_{i∈T} (xi + 1) if T ̸= ∅, and fT = 1 if T = ∅

For example:
f{0,2}(1011) = (1 + 1) · (1 + 1) = 0 · 0 = 0
f{0,2}(0101) = (0 + 1) · (0 + 1) = 1 · 1 = 1
For a monomial m ∈ S, let β(m) = (f_{α(m)}(v0), f_{α(m)}(v1), . . . , f_{α(m)}(v_{2^m−1})), where v0, v1, . . . , v_{2^m−1} are the vectors of F_2^m in their natural ordering. Then ψ is given by:

x = m1 + m2 + . . . + mr ↦ β(0) if x = 0, and β(m1) + β(m2) + . . . + β(mr) if x ∈ Rm \ {0}
This more precise definition of ψ is consistent with our previous definition of ψ, and this
is best shown with an example.
Example: Using ψ as defined above, we construct GRM(2,3) . The rows of GRM(2,3) corre-
spond to the monomials {1, x0 , x1 , x2 , x0 x1 , x0 x2 , x1 x2 }, and to get each row, we evaluate ψ
for each monomial.
ψ(1) = β(1) = (f∅ (000), f∅(001), . . . , f∅ (111)) = 11111111
ψ(x0 ) = β(x0 ) = (f{0} (000), f{0} (001), . . . , f{0} (111)) = 11110000
ψ(x1 ) = β(x1 ) = (f{1} (000), f{1} (001), . . . , f{1} (111)) = 11001100
ψ(x2 ) = β(x2 ) = (f{2} (000), f{2} (001), . . . , f{2} (111)) = 10101010
ψ(x0 x1 ) = β(x0 x1 ) = (f{0,1} (000), f{0,1} (001), . . . , f{0,1} (111)) = 11000000
ψ(x0 x2 ) = β(x0 x2 ) = (f{0,2} (000), f{0,2} (001), . . . , f{0,2} (111)) = 10100000
ψ(x1 x2 ) = β(x1 x2 ) = (f{1,2} (000), f{1,2} (001), . . . , f{1,2} (111)) = 10001000
As we can see, these vectors form the rows of GRM(2,3) as defined in section 2.
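The f_T construction translates directly into code; a small Python sketch (the natural ordering being 000, 001, . . . , 111):

```python
from itertools import product

def f(T, x):
    # f_T(x) = product over i in T of (x_i + 1) over F_2; empty T gives 1
    result = 1
    for i in T:
        result &= x[i] ^ 1
    return result

def beta(T, m):
    # the row vector (f_T(v)) for all v in F_2^m in their natural ordering
    return [f(T, v) for v in product((0, 1), repeat=m)]
```

For example, beta({0}, 3) gives 11110000 = ψ(x0) and beta({1, 2}, 3) gives 10001000 = ψ(x1 x2), matching the rows above.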
1. RM(0, m) = {00 · · · 0, 11 · · · 1}, the two constant vectors of length 2^m
2. RM(m, m) = F_2^{2^m}
3. RM(r, m) = {(x, x + y) : x ∈ RM(r, m − 1), y ∈ RM(r − 1, m − 1)} for 0 < r < m
Example: Using the recursive definition of Reed-Muller codes, we find RM(1, 2): taking x ∈ RM(1, 1) = F_2^2 and y ∈ RM(0, 1) = {00, 11}, the words (x, x + y) give
RM(1, 2) = {0000, 0101, 1010, 1111, 0011, 0110, 1001, 1100}
This recursive definition of the codes translates to a recursive definition of the generator matrix, as follows:

G_RM(r,m) = [ G_RM(r,m−1)   G_RM(r,m−1)   ]
            [ 0             G_RM(r−1,m−1) ]

where we consider the following base cases:

G_RM(0,m) = [ 11 · · · 1 ]   (a single row of 2^m ones)

G_RM(m,m) = [ G_RM(m−1,m)  ]
            [ 00 · · · 0 1 ]   (the final row consists of 2^m − 1 zeros followed by a single 1)
Example: Using the recursive definition of generator matrices for Reed-Muller codes, we construct G_RM(2,3):

G_RM(2,3) = [ G_RM(2,2)   G_RM(2,2) ]
            [ 0           G_RM(1,2) ]

          = [ G_RM(1,2)   G_RM(1,2)           ]
            [ 0 0 0 1     0 0 0 1             ]
            [ 0 0 0 0     G_RM(1,1) G_RM(1,1) ]
            [ 0 0 0 0     0 0 G_RM(0,1)       ]

          = [ G_RM(1,1) G_RM(1,1)   G_RM(1,1) G_RM(1,1) ]
            [ 0 0 G_RM(0,1)         0 0 G_RM(0,1)       ]
            [ 0 0 0 1               0 0 0 1             ]
            [ 0 0 0 0               G_RM(1,1) G_RM(1,1) ]
            [ 0 0 0 0               0 0 G_RM(0,1)       ]

          = [ 1 1 1 1 1 1 1 1 ]
            [ 0 1 0 1 0 1 0 1 ]
            [ 0 0 1 1 0 0 1 1 ]
            [ 0 0 0 1 0 0 0 1 ]
            [ 0 0 0 0 1 1 1 1 ]
            [ 0 0 0 0 0 1 0 1 ]
            [ 0 0 0 0 0 0 1 1 ]
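The recursion is easy to code; a Python sketch of the construction and its two base cases:

```python
def rm_gen(r, m):
    """Generator matrix of RM(r, m) built by the recursive definition."""
    if r == 0:
        return [[1] * (1 << m)]                  # one row of 2^m ones
    if r == m:
        # G(m-1, m) on top of the row 00...01
        return rm_gen(m - 1, m) + [[0] * ((1 << m) - 1) + [1]]
    top = rm_gen(r, m - 1)                       # G(r, m-1) placed twice
    bottom = rm_gen(r - 1, m - 1)                # zeros next to G(r-1, m-1)
    half = 1 << (m - 1)
    return ([row + row for row in top] +
            [[0] * half + row for row in bottom])
```

Calling rm_gen(2, 3) reproduces the 7 × 8 matrix computed above.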
One interesting property of the recursive definition is that it facilitates proving certain
properties of Reed-Muller codes; recursion allows us to derive inductive proofs that are, in
these cases, much simpler than non-inductive proofs.
Proof. (from [4]) Let wt(x) denote the weight of a codeword x.
We proceed by induction on r. By definition, a codeword of RM(r, m) can be written as (x, x + y) with x ∈ RM(r, m − 1) and y ∈ RM(r − 1, m − 1). Two cases arise:
1. x ̸= y
By the inductive hypothesis, the weight of x + y must be at least 2^{(m−1)−r}. We also have that the weight of x is at least 2^{(m−1)−r} (when x ̸= 0; if x = 0 the word is (0, y) of weight at least 2^{(m−1)−(r−1)}). Thus wt((x, x + y)) = wt(x) + wt(x + y) ≥ 2^{m−r}.
2. x = y
We then have that (x, x + y) = (x, 0) = (y, 0), and we know that y ∈ RM(r − 1, m − 1), so we have that wt((x, x + y)) = wt(y) ≥ 2^{(m−1)−(r−1)} = 2^{m−r}.
In AG(V), the points correspond to the 0-flats. The 1-flats are the lines and the 2-flats
the planes, with incidence defined by containment, as we will see in an example in a moment.
Example: We consider the field F = F2 and the vector space V = F32 . We then determine
the subspaces of V, partition them by dimension, and derive the cosets. For brevity, we
write vectors of V as 0-1 strings.
The subspace of dimension 0 is:
{000}
The cosets over this subspace give us the 0-flats, or the points, of our geometry:
The subspaces of dimension 1 are the seven sets {000, v} for each non-zero v ∈ V. The cosets over these subspaces give us the 1-flats, or lines, of our geometry: the 28 pairs of distinct points.
The subspaces of dimension 2 are:
{000, 001, 010, 011} {000, 001, 100, 101} {000, 001, 110, 111} {000, 010, 100, 110}
{000, 010, 101, 111} {000, 011, 100, 111} {000, 011, 101, 110}
The cosets over these subspaces give us the 2-flats, or planes, of our geometry:
{000, 001, 010, 011} {000, 001, 100, 101} {000, 001, 110, 111} {000, 010, 100, 110}
{000, 010, 101, 111} {000, 011, 100, 111} {000, 011, 101, 110} {100, 101, 110, 111}
{010, 011, 110, 111} {010, 011, 100, 101} {001, 011, 101, 111} {001, 011, 100, 110}
{001, 010, 101, 110} {001, 010, 100, 111}
Trivially, the only subspace of V of dimension 3 is V, and the only coset derivable from this
subspace is V.
We also note that if, for an affine geometry AG(V), we take the set of 0-flats and the set
of s-flats (with s > 0) together, they form a balanced incomplete block design, with the set
of 0-flats as the points, and the set of s-flats as the blocks.
Example: In the previous example, we notice that the set of 0-flats as the points with
the set of 1-flats as the blocks forms a 2-(8, 2, 1)-BIBD. If we take the set of 0-flats as our
points and the set of 2-flats as our blocks, we have a 3-(8, 4, 1)-BIBD.
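Both design claims are small enough to check by brute force; a Python sketch with points of AG(F_2^m) encoded as integers:

```python
from itertools import combinations

def flats(m, s):
    # all s-flats of AG(F_2^m): cosets of the s-dimensional subspaces,
    # with points encoded as integers 0 .. 2^m - 1
    subspaces = set()
    for basis in combinations(range(1, 1 << m), s):
        span = {0}
        for b in basis:
            span |= {x ^ b for x in span}
        if len(span) == 1 << s:              # the basis was independent
            subspaces.add(frozenset(span))
    return {frozenset(x ^ a for x in S)
            for S in subspaces for a in range(1 << m)}

def is_design(points, blocks, t, lam):
    # a t-design with index lam: every t-subset of the points lies in
    # exactly lam blocks
    return all(sum(1 for B in blocks if set(T) <= B) == lam
               for T in combinations(points, t))
```

For m = 3 this reproduces the example: flats(3, 1) yields the 28 lines, a 2-(8, 2, 1) design, and flats(3, 2) yields the 14 planes, a 3-(8, 4, 1) design.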
In the next section, we briefly examine the links between these designs and Reed-Muller
codes.
Theorem 3. RM(r, m) is the binary code of the design of points and (m − r)-flats of AG(F_2^m).
We do not prove this theorem; it simply requires showing that the incidence vectors of the (m − r)-flats of AG(F_2^m) span RM(r, m). A detailed proof may be found in [1].
Example: We demonstrate how RM(1, 3) is the binary code of the design of points and 2-flats of AG(F_2^3). We have already investigated AG(F_2^3) in the previous section, and we will rely on the results we obtained there.
We examine how the incidence vectors of the plane equations corresponding to the 2-flats of AG(F_2^3) are actually codewords in our design. Recall that G_RM(1,3), using the isomorphism ψ, is as follows:

G_RM(1,3) = [ ψ(1)
              ψ(x0)
              ψ(x1)
              ψ(x2) ]
Then it is easy to see that the 16 messages over F_2^4 correspond to the codeword vectors associated with the following equations:
Now we examine our design; each block corresponds to a plane, and each plane has an incidence vector, as per the following table:
The vectors 0000 and 1111 in F_2^4 do not have planes associated with their codewords;
however, their associated geometric structures may be formed by linear combinations of the
planes of the others.
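For RM(1, 3) this can be verified directly; a Python sketch in which codewords are 8-bit integers, bit j being coordinate j (so the incidence vector of the plane {000, 001, 010, 011} corresponds to ψ(x0)):

```python
def span(generators):
    # all F_2 linear combinations of the given vectors (bitmask integers)
    space = {0}
    for g in generators:
        space |= {v ^ g for v in space}
    return space

# the seven planes through 0 in AG(F_2^3), from the previous section,
# with the binary strings read as integers; their cosets give all 14 planes
subspaces = [{0, 1, 2, 3}, {0, 1, 4, 5}, {0, 1, 6, 7}, {0, 2, 4, 6},
             {0, 2, 5, 7}, {0, 3, 4, 7}, {0, 3, 5, 6}]
planes = {frozenset(x ^ a for x in S) for S in subspaces for a in range(8)}
incidence = [sum(1 << p for p in P) for P in planes]

# rows of G_RM(1,3), with coordinate j stored in bit j of an integer
rows = [0b11111111, 0b00001111, 0b00110011, 0b01010101]
```

Here span(incidence) equals span(rows), a space of 16 codewords: the plane incidence vectors span exactly RM(1, 3), and the only codewords without planes are the all-zeros and all-ones vectors.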
Thus, Reed-Muller codes can be associated with the designs of affine geometries, and
indeed, several majority-logic decoding techniques have arisen because of this association.
A Implementation of Reed-Muller encoding and decoding
Using the techniques found in section 3, I have implemented two command-line applications
to perform encoding and decoding of Reed-Muller codes.
These applications have been coded in ANSI C and hence, should compile on any platform
that has an available C compiler. The source code is available on my personal web page, at
http://www.site.uottawa.ca/~raaphors
~ -> ./rmencode 2 5 1111111111111111
01111110111010001110100010000001
~ -> ./rmdecode 2 4 1010111111111010
01101001010
~ -> ./rmdecode 2 4 1010111011111010
01101001010
~ -> ./rmdecode 2 4 1011111111111010
01101001010
~ -> ./rmdecode 2 5 01111110111010001110100010000001
1111111111111111
~ -> ./rmdecode 2 5 01101110101010001110101010000001
1111111111111111
~ ->
Note that in the final example, we add an error vector of weight 3 to the encoding of 1111111111111111 (namely 00010000010000000000001000000000), and the program decodes the correct message. This is as we expect in RM(2, 5), which can correct at most 2^{5−2−1} − 1 = 3 errors.
References
[1] Assmus, Jr., E. F. and Key, J. D. Designs and their Codes. Press Syndicate of the
University of Cambridge, Cambridge, 1992.
[2] Cameron, P. J. and Van Lint, J. H. Designs, Graphs, Codes, and their Links. Cambridge
University Press, Cambridge, 1991.
[3] Cooke, Ben. Reed-Muller Error Correcting Codes. MIT Undergraduate Journal of Math-
ematics Volume 1, MIT Department of Mathematics, 1999.
[4] Hankerson, D. R. et al. Coding Theory and Cryptography: The Essentials. Marcel
Dekker, New York, 1991.
[5] MacWilliams, F. J. and Sloane, N. J. A. The Theory of Error Correcting Codes. North-
Holland Mathematical Library Volume 16, Elsevier Science B. V., Amsterdam, 1977.
50 Channel Coding in Communication Networks
EXAMPLE 2.13. We take as code C the (7,4,3) code (known as the Hamming code). Its parity check matrix:

H = [ 1 0 1 0 1 0 1 ]
    [ 0 1 1 0 0 1 1 ]
    [ 0 0 0 1 1 1 1 ]

has neither a zero column, nor two equal columns. The minimum distance of C is 3.
EXAMPLE 2.14. We take as code C the Hsiao code (8,4,4) whose parity check matrix is:

H = [ 1 0 0 0 0 1 1 1 ]
    [ 0 1 0 0 1 0 1 1 ]
    [ 0 0 1 0 1 1 0 1 ]
    [ 0 0 0 1 1 1 1 0 ]
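Both minimum distances can be confirmed by enumerating the kernel of H; a small Python sketch with each row of H packed into an integer:

```python
def min_distance(h_rows, n):
    # smallest Hamming weight of a non-zero word in the kernel of H over F_2
    parity = lambda x: bin(x).count("1") & 1
    weights = [bin(v).count("1")
               for v in range(1, 1 << n)
               if all(parity(v & row) == 0 for row in h_rows)]
    return min(weights)

hamming = [0b1010101, 0b0110011, 0b0001111]             # H of the (7,4,3) code
hsiao = [0b10000111, 0b01001011, 0b00101101, 0b00011110]  # H of the (8,4,4) code
```

min_distance(hamming, 7) returns 3 and min_distance(hsiao, 8) returns 4, as stated.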
The best known linear codes are the Hamming codes and the Reed-Muller codes (known as RM codes). Hamming codes have a parity check matrix formed by all the non-zero r-tuples. They are the (2^r − 1, 2^r − 1 − r, 3) codes. An RM code with a length of 2^m and order r is built on the basis of vectors v0, v1, . . . , vm, where v0 = (11 · · · 1) and vi has 2^{i−1} “0” then 2^{i−1} “1” as components from left to right, in alternation. The codewords of an RM code of length 2^m and order r are generated by all the products (component by component) of at most r of the vectors vi. An RM code of order r has a length 2^m, a dimension 1 + C(m,1) + C(m,2) + · · · + C(m,r), and a minimum distance 2^{m−r}.
Block Codes 51
There are various more or less complex decodings possible, such as, for example,
lattice decoding, studied by S. Lin and T. Kasami amongst others.
Let us now introduce a very easy algorithm that can be used for all linear codes.
Let there be a linear code C, of length n, correcting t errors per word, for which we take a generator matrix G. We will suppose that C is binary, although this decoding extends directly to non-binary codes.
Preparation of decoding
Decoding
Initialization phase
Let us use 1_i to indicate the binary vector of Hamming weight 1, where the 1 is in position i. The iterative stage comprises two steps:
1) calculation of H[c + x + 1_i]^t, and search for w_H(x + 1_i) in the table. If it is not found, the error cannot be corrected. We pass to 5);
2) analysis of w_H(x + 1_i):
– if w_H(x + 1_i) ≥ P, we do nothing,
– if w_H(x + 1_i) < P, then: c + x ← c + x + 1_i and P ← w_H(x + 1_i).
EXAMPLE 2.16. Let C be a BCH code (see section 2.4), with n = 15, t = 2, and g(X) = 1 + X^4 + X^6 + X^7 + X^8. Its generator matrix is:

G = [ 1 0 0 0 1 0 1 1 1 0 0 0 0 0 0 ]
    [ 0 1 0 0 0 1 0 1 1 1 0 0 0 0 0 ]
    [ 0 0 1 0 0 0 1 0 1 1 1 0 0 0 0 ]
    [ 0 0 0 1 0 0 0 1 0 1 1 1 0 0 0 ]
    [ 0 0 0 0 1 0 0 0 1 0 1 1 1 0 0 ]
    [ 0 0 0 0 0 1 0 0 0 1 0 1 1 1 0 ]
    [ 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1 ]
The table of pairs (w_H(x), z_x) contains 121 elements. Let us suppose that the transmitted codeword is c = 0 and the received word is x = (000100010000000).
Iterative phase:
– i = 1, 2, 3:
  – we find nothing in the table,
  – therefore, we do nothing;
– i = 4:
  – H[c + x + 1_4]^t = [11100011]^t, and w_H(x + 1_4) equals 1, lower than P,
  – we thus replace P by 1 and the received vector by c + x + 1_4, i.e. (000000010000000);
– i = 5, 6, 7: nothing changes;
– i = 8: H[c + x + 1_8]^t = [00000000]^t and w_H(x + 1_8) equals 0. The corrected word is thus c + x + 1_8.
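A Python sketch of this table-based decoding for the same code (a sketch under the assumption that the syndrome may be represented by the remainder modulo g(X), which identifies the coset for a cyclic code; polynomials are packed into integers with bit i holding the coefficient of X^i):

```python
from itertools import combinations

N, T = 15, 2
G = 0b111010001                     # g(X) = 1 + X^4 + X^6 + X^7 + X^8

def poly_rem(p, g):
    # remainder of p modulo g, polynomials over F_2 packed into integers
    dg = g.bit_length() - 1
    while p.bit_length() - 1 >= dg:
        p ^= g << (p.bit_length() - 1 - dg)
    return p

# table of coset leaders: syndrome -> weight, for all patterns of weight <= T
table = {0: 0}
for w in range(1, T + 1):
    for positions in combinations(range(N), w):
        e = sum(1 << i for i in positions)
        table[poly_rem(e, G)] = w

def decode(x):
    # flip bits as long as the coset-leader weight strictly decreases
    p = table.get(poly_rem(x, G))
    if p is None:
        return None                 # more than T errors detected
    while p:
        for i in range(N):
            w = table.get(poly_rem(x ^ (1 << i), G))
            if w is not None and w < p:
                x ^= 1 << i
                p = w
                break
        else:
            return None
    return x
```

The table indeed holds 1 + 15 + 105 = 121 entries, and with c = 0 and two errors (as in the example) decode returns the zero codeword.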
We presume that the reader is already familiar with the notions of modulo n calculations, the Z/(p) field for p prime (also noted Fp), and the Euclid and Bezout equalities. We also presume that the concept of the ring of polynomials over the field Fp is known. An important result concerning the ring of polynomials is the following.
It is sometimes necessary (for example for certain decodings) to seek the roots of
a polynomial in a given field. Let us give an example of such a search for the roots of
a polynomial b(Y ) in a finite field.
We can study the order and the inverse of an element of a ring, but here we place ourselves in a finite field. Let β ∈ F_{2^n} be non-zero. Let us note that it is invertible because it lies in a field. We consider the family E = {β, β^2, β^3, . . .} of distinct successive powers of β.
2.3.4.1. Order
The order of β is the smallest positive integer e such that β^e = 1 (e depends on β).
Proof. The set of q − 1 invertibles of the field forms a multiplicative group. The set of powers of β forms a multiplicative subgroup. We know that the cardinal of a subgroup divides the cardinal of the group containing it. Lastly, e is the cardinal of the subgroup. Thus, e divides q − 1.
Proof. Let us note (a, b) for pgcd(a, b). We have (x^r)^{e/(e,r)} = (x^e)^{r/(e,r)} = 1. Thus, the order of x^r divides e/(e, r) (see proposition 2.14). If we have (x^r)^E = 1, then e | rE, i.e. rE = λe for a certain λ, from where (r/(e, r)) × E = (e/(e, r)) × λ. Since (r/(e, r), e/(e, r)) = 1, E is a multiple of e/(e, r).
2.3.4.3.1. Existence
Propositions 2.16, 2.17 and 2.18 prove the existence of a primitive. Let us pose q − 1 = p1^{m1} · · · pk^{mk} (primary decomposition of q − 1).
Proof. If not, every non-zero x of the field would be a root of X^{(q−1)/p1} − 1, which is not possible because of the degree (see proposition 2.10).
Of course, there is also an element y2 whose order is of the form p1^{i1} p2^{m2} p3^{i3} · · · pk^{ik}, and so on. There thus exist particular elements y1, y2, . . . , yk.
PROPOSITION 2.17. Let z1 = y1^{p2^{m2} p3^{i3} · · · pk^{ik}}. Its order is p1^{m1}.
Proof. Applying proposition 2.15 we see that the order of the element z1 is equal to
(p1^{m1} p2^{i2} · · · pk^{ik}) / (p1^{m1} p2^{i2} · · · pk^{ik}, p2^{m2} p3^{i3} · · · pk^{ik}) = p1^{m1}.
Using the same argument we also obtain the elements z2, . . . , zk that have respective orders p2^{m2}, . . . , pk^{mk}.
Proof. Let E be the order of t. E is of the form p1^{r1} · · · pk^{rk} (see proposition 2.13), with ri ≤ mi for all i. We have t^{p1^{r1} p2^{r2} · · · pk^{rk}} = 1. Let us raise to the power p2^{m2−r2} · · · pk^{mk−rk}. We have:
(t^{p1^{r1} p2^{r2} · · · pk^{rk}})^{p2^{m2−r2} · · · pk^{mk−rk}} = t^{p1^{r1} p2^{m2} · · · pk^{mk}} = z1^{p1^{r1} p2^{m2} · · · pk^{mk}} = 1.
Thus (see proposition 2.14) p1^{m1} divides p1^{r1} p2^{m2} · · · pk^{mk}, hence m1 ≤ r1, which means that r1 = m1. By symmetry we also obtain the equalities r2 = m2, . . . , rk = mk, and thus the order of t is equal to q − 1.
We cannot formally construct a primitive, but if we know one of them we can find them all, as indicated by the following proposition.
If we are not in a field there may not exist a primitive for the group of invertibles, as the following examples show. Let us recall that ϕ is the Euler indicator: the number of integers smaller than m and relatively prime to m is equal to ϕ(m).
EXAMPLE 2.24. In Z/(9) we have ϕ(9) = 6. The group of invertibles thus has 6 elements. The element 2 has order 6: it is a primitive of the group of invertibles.
EXAMPLE 2.25. In Z/(8) there are 4 invertibles. The invertibles 1, 3, 5, 7 have the respective orders 1, 2, 2, 2. Thus, there are no primitives.
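These two examples can be checked mechanically; a small Python sketch:

```python
from math import gcd

def order(a, m):
    # multiplicative order of a modulo m (a must be invertible mod m)
    x, e = a % m, 1
    while x != 1:
        x = x * a % m
        e += 1
    return e

def invertibles(m):
    # the elements of Z/(m) that are invertible, i.e. coprime to m
    return [a for a in range(1, m) if gcd(a, m) == 1]
```

Modulo 9 there are 6 invertibles and order(2, 9) == 6, so 2 is a primitive; modulo 8 the orders of 1, 3, 5, 7 are 1, 2, 2, 2, so no element generates the group.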
1 = 1            α^5 = α + α^2          α^10 = 1 + α + α^2
α = α            α^6 = α^2 + α^3        α^11 = α + α^2 + α^3
α^2 = α^2        α^7 = 1 + α + α^3      α^12 = 1 + α + α^2 + α^3
α^3 = α^3        α^8 = 1 + α^2          α^13 = 1 + α^2 + α^3
α^4 = 1 + α      α^9 = α + α^3          α^14 = 1 + α^3

1 + α = α^4      1 + α^2 = α^8     1 + α^3 = α^14     1 + α^4 = α
1 + α^5 = α^10   1 + α^6 = α^13    1 + α^7 = α^9
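The table above can be regenerated in a few lines of Python, representing elements of F16 = F2[X]/(1 + X + X^4) as 4-bit integers (bit i holding the coefficient of α^i; this assumes the modulus 1 + X + X^4, consistent with the entry α^4 = 1 + α):

```python
MOD = 0b10011                 # 1 + X + X^4

def gf16_mul(a, b):
    # multiplication in F_2[X]/(1 + X + X^4)
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:       # degree reached 4: reduce by the modulus
            a ^= MOD
    return r

alpha = 0b0010                # the class of X
powers = [1]
for _ in range(15):
    powers.append(gf16_mul(powers[-1], alpha))
```

One can read the table off `powers` (e.g. powers[4] is 1 + α and powers[10] is 1 + α + α^2), check that α has order 15 (so it is a primitive), and confirm Zech entries such as 1 + α^5 = α^10.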
2.3.4.6. Exponentiation
We saw how to search for the order of an element, and how to find out if it is primitive. For large fields (i.e. large q) we are led to calculate β^i for very large i. One of the best methods is to proceed as follows:
1) break up i in base 2;
2) calculate the successive squarings, i.e. β, β^2, β^{2^2}, β^{2^3}, etc.;
3) calculate the necessary products (see example 2.28).
This method is used, for example, for calculations necessary for the use of RSA in
cryptography.
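The three steps above are the classical square-and-multiply method; a Python sketch that works for any associative multiplication (shown with modular integers, which is essentially the RSA use just mentioned):

```python
def power(beta, i, mul, one=1):
    # compute beta^i from the base-2 decomposition of i: square repeatedly,
    # and multiply in the squarings where the corresponding bit of i is 1
    result = one
    while i:
        if i & 1:
            result = mul(result, beta)
        beta = mul(beta, beta)    # next power beta^(2^k)
        i >>= 1
    return result
```

For instance, power(3, 1000003, lambda a, b: a * b % 1000033) agrees with Python's built-in pow(3, 1000003, 1000033) while using only about log2(i) squarings.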
PROPOSITION 2.20. There exists a polynomial with binary coefficients which admits all the elements of this part as the set of its roots. This polynomial is irreducible.
Proof. Let us examine Cβ. It is a finite part, because it is included in a finite field. Let us pose Cβ = {β, β^2, β^{2^2}, . . . , β^{2^{t−1}}}. Then β^{2^t} is an element of the form β^{2^i}, with 0 ≤ i ≤ t − 1.
Let us suppose i ̸= 0. We then have (β^{2^{t−1}})^2 = (β^{2^{i−1}})^2, and thus (β^{2^{t−1}}/β^{2^{i−1}})^2 = 1. However, the polynomial Z^2 − Z has only two roots (see proposition 2.10), which are 0 and 1. This leads to β^{2^{t−1}} = β^{2^{i−1}}. Thus, β^{2^{t−1}} would have been already obtained, which goes against the definition of Cβ. Therefore, β^{2^t} = β. A consequence of this equality is that the class Cβ is stable under exponentiation by 2.
Now, let us consider the polynomial (Y − β)(Y − β^2)(Y − β^{2^2}) · · · (Y − β^{2^{t−1}}). It has the symmetric functions of its roots as coefficients. Thus, each of its coefficients is invariant under exponentiation by 2. Each coefficient is, therefore, binary. This polynomial is irreducible, since otherwise it would have a divisor of strictly smaller degree, and this divisor would have at least one element of Cβ as a root. As this class is stable under exponentiation by 2, and according to proposition 2.11, this divisor would have all the elements of Cβ as roots. This is impossible according to proposition 2.10. Thus, this divisor cannot exist.
When we study a cyclic code of length n, we are led to seek the smallest field Fq (q = 2^r) containing the nth roots of unity (i.e. the x such that x^n = 1). If x has order n, it is said to be a primitive nth root of unity.
PROPOSITION 2.21. The smallest field Fq (q = 2^r) which contains the nth roots of unity is such that r is the order of 2 modulo n.
Proof. Fq has q − 1 non-zero elements, which form a multiplicative group. The set of nth roots of unity forms a subgroup thereof. Thus, n divides 2^r − 1. Written differently, we have 2^r − 1 = λn, or otherwise 2^r = 1 modulo n, which shows that r is the order of 2 modulo n.
Proof. Indeed:
1) γ is a root of the polynomial (1 + X^n)/(1 + X), since the group of the nth roots is the set of roots of the polynomial 1 + X^n;
2) n is odd, since it divides 2^r − 1.
2.3.7.1. Points
A point, noted (α^i), is the subspace of F_{q^{m+1}} generated by α^i, deprived of 0. We have (α^i) = {α^i, βα^i, . . . , β^{q−2} α^i} = L(α^i)\{0}. It is a subspace of dimension 1 deprived of 0.
2.3.7.4. An example
Let us take q = 2, s = 1, m = 2, F8 = F2[X]/(1 + X + X^3), α = X. The points are as follows: (α^0) = {α^0} because β = 1, and (α^1), (α^2), (α^3), (α^4), (α^5), (α^6).
After the theoretical results of C. Shannon and the first linear code constructions (Hamming, Golay), American engineers sought codes stable not only under addition (linear codes), but also under circular sliding (or shift). The codes obtained (cyclic codes) are linear codes with additional properties.
2.4.1. Introduction
PROPOSITION 2.23. Any code C, stable under addition and circular shift, may be represented as an ideal of A.
Proof. The circular shift on the right represents multiplication by X in A. The code C is thus stable under addition and multiplication by X. It is therefore stable under addition and multiplication by any polynomial: thus, it is an ideal of A. Conversely, an ideal of A is clearly a code stable under addition and circular shift.
PROPOSITION 2.24. Any cyclic code has the form (g(X)) (i.e. the set of multiples of g(X)), with g(X) dividing X^n − 1. More precisely, there is a bijection between the cyclic codes of length n and the set of divisors of X^n − 1.
Proof. We know that the ring A = F2[X]/(X^n − 1) is principal, i.e. each of its ideals is the set of the multiples of one of its elements.
It remains to be shown that two distinct divisors of X^n − 1 generate two distinct codes C1 and C2. Let us suppose C1 = C2. Then in F2[X] we have g2(X) = q(X)g1(X) + λ(X)(X^n − 1), for a certain λ(X). Thus g1(X) divides g2(X), since g1(X) divides X^n − 1. By symmetry, we prove the equality g1(X) = g2(X).
PROPOSITION 2.25. Any cyclic code (a(X)), where a(X) is unspecified, is still generated by PG(X), where PG(X) = (a(X), X^n − 1).
Proof. Let us pose PG(X) = (a(X), X^n − 1). Using the Bezout equality we obtain PG(X) = λ(X)a(X) + µ(X)(X^n − 1), for certain λ(X), µ(X). This proves that PG(X) is in the code (a(X)). Any multiple of a(X) is thus a multiple of PG(X). In (a(X)) there exists a generator of minimum degree; because of the degrees, it must necessarily be PG(X).
Proof. Let the word a(X)g(X) belong to the code. Let us pose that a(X) =
q(X)h(X) + r(X), with r(X) = 0 or deg(r(X)) < deg(h(X)). In A we have
a(X)g(X) = r(X)g(X) since g(X)h(X) = 0, and thus the family F is a generating
one. Let us prove that it is of rank k.
2.4.2.2. Coding
The first coding can be derived from proposition 2.26. We will describe how two of the classical codings for cyclic codes are performed.
Thus, let us suppose that the code considered here has a length n, and is generated by a polynomial g(X) of degree n − k. The information is a block of length k, represented by a binary sequence, say i0, i1, . . . , ik−1. We associate to this sequence the polynomial (known as the information polynomial): i(X) = i0 + i1X + · · · + ik−1X^{k−1}.
The second coding consists of calculating first X n−k i(X). Then we calculate the
remainder r(X) of the division of this new polynomial X n−k i(X) by g(X). The
polynomial obtained is r(X) + X n−k i(X). The sequence of its coefficients is sent
through the transmission channel. Generally, we first send the largest degree.
The following proposition proves that we have indeed carried out a coding.
Proof. The polynomial r(X) + X n−k i(X) is divisible by g(X), thus it belongs to the
code.
EXAMPLE 2.31 (n = 7, g(X) = 1+X +X 3 ). Let us take the information block equal
to 1011. The polynomial i(X) is equal to 1 + X 2 + X 3 . The polynomial X n−k i(X)
is X 3 + X 5 + X 6 . The remainder of the division of this polynomial by g(X) is 1. The
coded polynomial is, therefore, 1 + X 3 + X 5 + X 6 . As the length of the code is 7, we
will send the following sequence of 7 binary symbols through the channel 1001011
(the first sent is on the right).
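The systematic coding of example 2.31 can be sketched in Python, with polynomials packed into integers (bit i holding the coefficient of X^i):

```python
def poly_rem(p, g):
    # remainder of p modulo g, polynomials over F_2 packed into integers
    dg = g.bit_length() - 1
    while p.bit_length() - 1 >= dg:
        p ^= g << (p.bit_length() - 1 - dg)
    return p

def cyclic_encode(info, n, g):
    # codeword r(X) + X^(n-k) i(X), where r(X) is the remainder of
    # X^(n-k) i(X) divided by g(X); the result is divisible by g(X)
    k = n - (g.bit_length() - 1)
    shifted = info << (n - k)
    return shifted ^ poly_rem(shifted, g)
```

With n = 7, g(X) = 1 + X + X^3 (0b1011) and i(X) = 1 + X^2 + X^3 (0b1101), the remainder is 1 and the codeword is 1 + X^3 + X^5 + X^6, reproducing the example.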
Proof. Ann(C) is clearly a cyclic code and is therefore generated by a polynomial h(X) of minimal degree which divides X^n − 1. But H(X) is in the code Ann(C); thus, h(X) divides H(X). Since h(X)g(X) = 0, the degree of h(X) is equal to or higher than that of H(X). Thus, h(X) = H(X).
DEFINITION 2.11 (ORTHOGONAL). The orthogonal (i.e. dual) of (g(X)) is the set of
all the polynomials v (X) with zero scalar product with all the codewords of the code
C. It is noted by (g(X))⊥ , or C ⊥ .
Proof. Let τ be the map from A to A which sends each a(X) to a(X^{−1}). One proves (and we will admit it) that τ is an automorphism. In addition, one proves (and we will also admit it) the equality:

a(X)b(X) = Σ_{i=0}^{n−1} ⟨a(X), X^i τ(b(X))⟩ X^i
One of the large advantages of cyclic codes is that we can have information on
their minimum distance, i.e. on their error correcting capability. More precisely, the
error correcting capability of a code is linked to the roots of the generator.
Let us consider the coefficient of X_{d−2}^{d−2} in P. It is equal to ∏_{j>i, i=0,...,d−4} (X_i − X_j). By a recurrence on the size of the determinant we easily prove that this coefficient is equal to the determinant:

| 1          1          1          1          · · ·  1            |
| X_0        X_1        X_2        X_3        · · ·  X_{d−3}      |
| X_0^2      X_1^2      X_2^2      X_3^2      · · ·  X_{d−3}^2    |
| ...        ...        ...        ...        . . .  ...          |
| X_0^{d−3}  X_1^{d−3}  X_2^{d−3}  X_3^{d−3}  · · ·  X_{d−3}^{d−3} |

It is not zero.
Since α is of order n and we have d − 2 < n, the elements α^{j_1}, . . . , α^{j_{d−1}} are all different from each other. We may thus apply proposition 2.30, and D is non-zero.
We can give a lower bound on the minimum distance of a cyclic code. This result is based on corollary 2.2.
Based on corollary 2.2, for any choice of d − 1 columns of this matrix we obtain a determinant resembling the one studied in corollary 2.2. Every element of the kernel of H (that is, of the code considered) thus has a Hamming weight that cannot be less than or equal to d − 1.
We provide here some of the most used classical codes. We will not speak of the
generalized RS codes, of the alternating codes, or of the Goppa codes. Among the
latter we find codes resulting from algebraic geometry, which is outside the scope of
our subject matter. Often we will present only binary codes, although they also exist
in Fq .
PROPOSITION 2.32. The binary code C having for roots the elements of the class of
α is a cyclic code (n = q − 1, k = n − r, 3), called the Hamming code.
There exists a generalization of these Hamming codes. Let there be the field F_{q^m}, q = 2^r. Let α be a primitive element of F_{q^m}. We set β = α^s. We seek a cyclic code with β for root, with an error correcting capability of 1, i.e. a code (n, k, 3).
Proof. Indeed:
1) the length is the order of β. As the order of α is q^m − 1, we directly have the order of β;
2) since we want d = 3, no polynomial of the form 1 + µX^i (µ ∈ F_q, i < n) should admit β as a root. Thus β^i should only belong to F_q if it equals 1. The multiplicative groups ⟨β⟩ and F_q\{0} must have an intersection reduced to {1}. As their respective cardinals are (q^m − 1)/(q^m − 1, s) and q − 1, the proposition follows directly.
According to this proposition we see that the length of such a code cannot exceed
(q m − 1)/(q − 1). We will now demonstrate that this length can be reached.
PROPOSITION 2.35. There exists such a code of length (q^m − 1)/(q − 1), if (m, q − 1) = 1.
Proof. We set β = α^{q−1}. The length is indeed (q^m − 1)/(q − 1). The multiplicative groups ⟨β⟩ and F_q\{0} have the respective cardinals (q^m − 1)/(q^m − 1, q − 1) and q − 1, i.e. again (q^m − 1)/(q − 1) and q − 1. But we have (q^m − 1)/(q − 1) = q^{m−1} + q^{m−2} + · · · + q + 1. Since q^i − 1 = s_i(q − 1) for certain s_i, we have (q^m − 1)/(q − 1) = λ(q − 1) + m. It follows that such a code exists if (q − 1, m) = 1.
Proof. The result for the minimum distance is a consequence of the BCH theorem.
The dimension stems from the fact that all the cyclotomic classes have a cardinal that
divides r (see proposition 2.46).
Such a code C can correct any packet of errors (or burst) of length b and detect all
bursts of length d, if the following conditions are verified:
1) d ≥ b,
2) b + d ≤ c + 1.
To prove this capacity it is enough to demonstrate that this code cannot contain the
sum of a burst of length b and a burst of length d.
PROPOSITION 2.37. C does not contain the sum of a burst of length b and of a burst
of length d.
Let us suppose that M(X) is not zero. We observe that we have r + d − 1 > b − 1, and that, therefore, r + d − 1 = c + u, where u is the degree of M(X). Thus r = c − d + 1 + u ≥ b + u. From this we deduce that r ≥ b > b − 1 and that r > u. From this we see that in the left-hand side term there exists a monomial X^r, but that this monomial cannot exist in the right-hand side term. The contradiction is obvious.
2.4.3.4. RM codes
Let α be a primitive element of F_q, q = 2^m. Binary RM codes are defined by the expression of the powers of α which are roots of the code. Their roots are the α^i such that the Hamming weight of the binary decomposition of i is strictly lower than m − r. Such a code is an RM code of order r.
PROPOSITION 2.38. An RM code of order r has length q − 1, dimension 1 + (m choose 1) + (m choose 2) + · · · + (m choose r), and minimum distance 2^{m−r} − 1.
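The dimension in proposition 2.38 can be checked against the root description above. The sketch below is ours, with one reading assumption: the root condition is taken over the nonzero exponents i = 1, . . . , q − 2. It counts the roots α^i with wt(i) < m − r and compares the resulting dimension with 1 + (m choose 1) + · · · + (m choose r):

```python
from math import comb

def rm_dimension_from_roots(m, r):
    """Dimension of the length-(2^m - 1) RM code of order r, counted from its roots."""
    n = 2**m - 1
    # roots are alpha^i, i nonzero, whose binary decomposition has weight < m - r
    roots = [i for i in range(1, n) if bin(i).count("1") < m - r]
    return n - len(roots)

checks = [(rm_dimension_from_roots(m, r), sum(comb(m, i) for i in range(r + 1)))
          for m in (3, 4, 5) for r in range(1, m)]
```

For every (m, r) tried, the two counts agree, e.g. m = 3, r = 1 gives 7 − 3 = 4 = 1 + 3.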
2.4.3.5. RS codes
RS codes are codes whose coefficients are in F_{2^r}, with r ≥ 2, of length 2^r − 1. Their roots are α^i, α^{i+1}, . . . , α^{i+δ−1}, where α is a primitive element.
2.4.3.6. Codes with true distance greater than their BCH distance
In an exhaustive article1, “On the minimum distance of cyclic codes”, J.H. van Lint and R.M. Wilson provide all the binary codes with length not exceeding 61 that have a true distance strictly larger than their BCH distance. Here are some examples;
each code is described in the form (length, dimension, true distance, BCH distance):
(31, 20, 6, 4), (31, 15, 8, 5), (39, 12, 12, 8), (45, 16, 10, 8),
(47, 23, 12, 6), (55, 30, 10, 5), (55, 20, 16, 8)
They are decodable by the FG-algorithm which we provide hereafter.
2.4.3.7. PG-codes
2.4.3.7.1. Introduction
We recall (see section 2.3) that we regard F_{q^{m+1}} as a vector space of dimension m + 1 over F_q, with q = 2^s. Let α be a primitive element of F_{q^{m+1}} and β a primitive element of F_q. We have β = α^{(q^{m+1}−1)/(q−1)}. We have a projective geometry with n = (q^{m+1}−1)/(q−1) points.
We will construct codes in F2 [X]/(X n − 1), but prior to this we will provide two
definitions.
1. VAN LINT J.H., WILSON R.M., “On the minimum distance of cyclic codes”.
DEFINITION 2.12. Let there be two integers i and j, and let their respective writings in base 2 be (i_0, . . . , i_u) and (j_0, . . . , j_u). We say that j is under i if (j_0, . . . , j_u) is less than or equal to (i_0, . . . , i_u) for the product (componentwise) order.
For example, take j = 25 and i = 37. Then j is not under i. With j = 25 and i = 29, j is under i.
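Since the comparison is made digit by digit in base 2, "j is under i" is simply a bitwise test. A one-line sketch (ours):

```python
def is_under(j, i):
    """True if every base-2 digit of j is <= the corresponding digit of i."""
    return (i & j) == j

r1 = is_under(25, 37)   # 25 = 11001, 37 = 100101: not under
r2 = is_under(25, 29)   # 25 = 11001, 29 = 11101: under
```

This reproduces both cases of the example above.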
DEFINITION 2.13. We use W_s(t(2^s − 1)) to denote the maximum number of pairwise disjoint integers of the form i(2^s − 1) that are under t(2^s − 1).
For example, take s = 2, t = 5. Since 3 and 12 are under 15 and are disjoint, we have W_2(5(2^2 − 1)) = 2.
2.4.3.7.2. PG-codes
We consider the code C such that its orthogonal C⊥ contains all the projective subspaces of order r of the field F_{q^{m+1}}. This code C is called a PG-code of order r. The code C⊥ is characterized by the following proposition.
PROPOSITION 2.40. The code C⊥ has for zeros all the elements of the form α^{t(2^s−1)} where W_s(t(2^s − 1)) ≤ r, t ≠ 0.
There does not exist a formula giving the dimension of this code. It should be
constructed, so that the sought code C can be deduced from it.
d_BCH = (p^{s(m−r+1)} − 1)/(p^s − 1) + 1
The error correcting capability of the code C stems from the following proposition.
Proof. Indeed:
1) it is the number of subspaces of the field that contain a fixed subspace of dimension r;
2) two projective subspaces of order r cannot have an intersection of order r − 1 since they are distinct;
3) see section 2.6.
2.4.3.7.3. An application
Majority decoding makes it possible to carry out a cheap and fast electronic opera-
tion, especially when decoding is in one stage. If decoding has more than three stages,
the complexity becomes very high.
The Japanese needed to find a powerful code with a cheap decoder. They wanted
to use it for their Teletext. Constraints: length of information 81, number of errors to
be corrected: 8. The solution found uses an information length of 82. The code is then shortened by one position. The respective values of the parameters are: p = 2, r = 1, m = 2. Decoding has 1 stage. We deduce from it the dimension of C: 82, the length of the code: 273, its error correcting capability: 8 (s = 4, because 273 = 2^{4×2} + 2^4 + 1). It is a (273, 82, 18) code shortened by 1 position, decodable by majority vote with 1 level. The price of an encoder/decoder was 175 FF in 1995.
2.4.3.8. QR codes
Binary quadratic residue codes (QR codes) have a length p, where p is a prime
number in the form 8m + 1 or 8m − 1. For each such p there are 4 QR codes. One has
all the modulo p squares as roots, another has all these squares and 1, another has all
the non squares, and the last one has all non squares and 1.
Faced with a list of tasks proposed by an industrialist, we are often led to seek if
there exists a code which fulfills the requirements. If one does exist, it then has to be
constructed. There are tables of known codes, for a certain number of values of the
parameters n, k and d.
2.4.4.1. Existence
It is often useful to simply know whether there exists a cyclic code with the given parameters. This is the case when we are trying to satisfy a specification. The first stage consists of testing the existence of a code. More precisely, we are led to examine whether there exists a polynomial g(X), divisor of X^n − 1, with a given degree.
Proof. Let F_{2^r} be the smallest field containing the nth roots of unity. Let α be a primitive element of this field, and β be an nth primitive root of unity.
Let us suppose that the polynomial g(X) has a degree s. Its s roots are powers
of β. The corresponding exponents are elements of Z/(n). According to proposition
2.11 (see section 2.3), these roots are grouped by cyclotomic classes, and, therefore,
by the powers.
Proof. If there exists a class with a cardinal s, then s divides n. Let there be x in Z/(2^n − 1), whose class has a cardinal s. By definition of cyclotomic classes we have 2^s x = x. In addition, we also have 2^n x = x. We use the Euclidean equality between n and s: n = qs + r, 0 ≤ r < s. From there we obtain (2^s)^q × 2^r x = x, then 2^r x = x, which implies r = 0, since otherwise the class of x would contain fewer than s elements. Thus, s divides n.
If there exists an s such that s divides n, then there exists a class with a cardinal s. Let x = (2^n − 1)/(2^s − 1). We have (2^s − 1)x = 0, and the cardinal of the class of x is thus at most equal to s. Let us suppose that the cardinal is t (0 < t < s).
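The first half of this proof can be observed directly. The sketch below (ours) computes the cyclotomic classes of Z/(2^n − 1) under multiplication by 2 and checks that every cardinal divides n, here for n = 4:

```python
def cyclotomic_classes(n):
    """Cyclotomic classes of Z/(2^n - 1) under x -> 2x (mod 2^n - 1)."""
    modulus = 2**n - 1
    seen, classes = set(), []
    for x in range(modulus):
        if x in seen:
            continue
        cls, y = [], x
        while y not in cls:          # follow the orbit of x under doubling
            cls.append(y)
            y = (2 * y) % modulus
        seen.update(cls)
        classes.append(cls)
    return classes

sizes = {len(c) for c in cyclotomic_classes(4)}   # class cardinals in Z/15
```

For Z/15 the classes are {0}, {1,2,4,8}, {3,6,12,9}, {5,10}, {7,14,13,11}, with cardinals 1, 2 and 4, all divisors of 4.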
2.4.4.2. Construction
There exist various possibilities to construct a binary cyclic code with a given
length n:
– we can use the cyclotomic classes of Z/(n), then construct minimum polyno-
mials;
– we can directly seek g(X) by factorizing X n − 1;
– we may also be led to seek a code which contains given words.
PROPOSITION 2.47. Let g(X) be the generator of a cyclic code of length n with a degree n − k. Let {C_{i_1}, C_{i_2}, . . . , C_{i_r}} be the family of cyclotomic classes found in Z/(n). The polynomial g(X) has the elements of the form α^j as roots, where j traverses the union of the classes {C_{i_1}, C_{i_2}, . . . , C_{i_r}}.
Proof. This is straightforward. This proposition simply indicates the link between classes in Z/(n) and the roots of g(X). We note that the cardinal of the union of classes must be equal to n − k.
PROPOSITION 2.48. In A = F_2[X]/(f(X)) the squaring map, which we will denote h, is a linear endomorphism.
In F_2[X] let us suppose that f(X) divides a^2(X) − a(X), for a certain a(X) of a degree strictly smaller than the degree of f(X).
Proof. We have a^2(X) − a(X) = a(X)(a(X) − 1) = λ(X)f(X), for a certain λ(X). Any irreducible factor p(X) of f(X) divides either a(X) or a(X) − 1, but not both, because otherwise it would divide their difference 1. Thus the GCD (f(X), a(X)) is formed by a family of primary factors of f(X). It can be equal neither to f(X) nor to 1, because of the hypotheses regarding the degree of a(X).
To factorize f(X) it is enough to find a(X), which the Berlekamp method gives us. The identity map is denoted id.
PROPOSITION 2.50. Any element of A, different from 1, which is in the kernel of h − id, is such a polynomial a(X).
which yields the kernel matrix:

0010
0101
1000

It provides a(X) = X + X^3, and we easily find (f(X), a(X)) = 1 + X. The second factor is 1 + X + X^3. Neither of these two factors can be factorized further.
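This small factorization can be verified mechanically. The sketch below (ours) uses the integer-bitmask representation of binary polynomials; with f(X) = (1 + X)(1 + X + X^3) = 1 + X^2 + X^3 + X^4 and a(X) = X + X^3, it checks that f divides a^2 − a and that (f, a) = 1 + X:

```python
def p_mod(a, b):
    """Remainder of binary polynomial a modulo b (bit i <-> X^i)."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def p_gcd(a, b):
    while b:
        a, b = b, p_mod(a, b)
    return a

def p_mul(a, b):
    """Carry-less (GF(2)) polynomial product."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a, b = a << 1, b >> 1
    return out

f = 0b11101                        # 1 + X^2 + X^3 + X^4
a = 0b1010                         # X + X^3
divisible = p_mod(p_mul(a, a) ^ a, f) == 0   # f | a^2 - a
g1 = p_gcd(f, a)                   # expected 1 + X = 0b11
```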
We can prove that the factorization of X^{2^r} − X gives all the irreducible polynomials with a degree dividing r.
EXAMPLE 2.34. Let us factorize X 7 − 1 in F2 [X]. With the same notations as in the
previous example we have:
M =
1000000
0000100
0100000
0000010
0010000
0000001
0001000

and M − I =

0000000
0100100
0110000
0001010
0010100
0000011
0001001
We notice the simplicity of the construction of this matrix. Using the Gaussian
method we obtain the needed matrix:
M =
0110100
0001011
1000000

The kernel matrix is:

0000
0011
1000
PROPOSITION 2.51. Let λ(X) be the polynomial of the smallest degree, such that we
have λ(X)m(X) = 0. The required code is the ideal ((X n − 1)/λ(X)).
Proof. Indeed:
1) the set of polynomials u(X), such that u(X)m(X) = 0 is an ideal of A. As
any ideal is principal, this ideal is generated by a polynomial with the smallest possible
degree. Thus, it is the polynomial λ(X) of the statement;
2) the polynomial λ(X) divides X n − 1. Thus, m(X) is in the code ((X n − 1)/
λ(X)). We pose g(X) = (X n − 1)/λ(X);
3) every strict subcode of the code (g(X)) is generated by a polynomial of the form u(X) × g(X) (with u(X) ≠ 1). If m(X) is in such a subcode, then m(X) must be canceled by (X^n − 1)/u(X)g(X), which is impossible, because its degree is strictly smaller than that of λ(X).
Using the Gaussian method pivots we easily find the required code. Let us note
that the pivots may be in any column.
EXAMPLE 2.35. Find the smallest cyclic code containing the following codeword:
110010100001110. By the Gaussian method we find, for example:
110010100001110    1
011001010000111    X
011110001001101    1 + X^2
101111000100110    X + X^3
100111000011101    1 + X^2 + X^4
000000000000000    1 + X + X^3 + X^5 = λ(X)
Each binary vector-row is followed on its right by a polynomial v(X). This trans-
lates the fact that the row is equal to v(X)m(X). These polynomials v(X) appear
during the application of the Gaussian method.
In the general case we want to determine the smallest cyclic code containing the codewords m_1(X), m_2(X), . . . , m_s(X). Using the previous construction we construct the polynomials λ_1(X), λ_2(X), . . . , λ_s(X). The polynomial with the smallest degree canceling all the m_i(X) is clearly the LCM of the λ_i(X). Another method is to seek the GCD (m_1(X), m_2(X), . . . , m_s(X)); it is the generator of the required code.
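The GCD method can be checked on Example 2.35. Below is a sketch (ours, with the reading assumption that position k of the written word holds the coefficient of X^k): it computes (m(X), X^15 − 1) for the word 110010100001110 and verifies that this generator times λ(X) = 1 + X + X^3 + X^5 gives back X^15 − 1:

```python
def p_mod(a, b):
    """Remainder of binary polynomial a modulo b (bit i <-> X^i)."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def p_gcd(a, b):
    while b:
        a, b = b, p_mod(a, b)
    return a

def p_mul(a, b):
    """Carry-less (GF(2)) polynomial product."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a, b = a << 1, b >> 1
    return out

word = "110010100001110"                  # m(X), coefficient of X^k at position k
m = sum(1 << k for k, c in enumerate(word) if c == "1")
xn1 = (1 << 15) | 1                       # X^15 - 1 (= X^15 + 1 over F2)
g = p_gcd(m, xn1)                         # generator of the smallest cyclic code
lam = 0b101011                            # lambda(X) = 1 + X + X^3 + X^5
ok = p_mul(g, lam) == xn1                 # g(X) * lambda(X) = X^15 - 1
```

The generator found has degree 10, consistent with λ(X) of degree 5 and n = 15.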
2.4.4.4. Specifications
A specification is a set of constraints that the system of coding must satisfy. The
principal parameters, which industry specialists make a point of taking into account,
are as follows:
– length L of the information string to be coded,
– maximum redundancy rate,
– maximum length of the codewords,
– gross flow (i.e. in terms of binary symbols),
– net flow (i.e. in terms of information bits),
– residual error rate for an input error rate (i.e. pr for pe ),
– average space without errors between two badly decoded consecutive words,
– electronic constraints (delicate).
most economic). Then for fixed n and k we must examine whether there exists a cyclic
code. To that end we study the distribution of cyclotomic classes in Z/(n).
On the basis of this study we find what can be the error correcting capability of the
code. It is then necessary to use the formulae connecting pr to pe , as well as the BCH
theorem. We can have an idea of the decoding power of the code from the following
proposition.
PROPOSITION 2.52. Let there be a cyclic code of length n, dimension k, with an error correcting capability of t errors per word. Let p_c be the probability of channel error, and p_r the residual probability per corrected word. We have the inequality:

np_r ≤ Σ_{i=t+1}^{n} (t + i) (n choose i) p_c^i (1 − p_c)^{n−i}
Proof. Since p_r is the probability of error per symbol of a received word, the expectation of the number of residual errors per corrected word is np_r (binomial distribution). We will provide an upper bound on this expectation.
Let us consider the event “the word has been decoded incorrectly”. This event is included in the following event E: “for any value i (i = t + 1, t + 2, . . . , n) of the number of transmission errors occurring, the decoding algorithm decodes using likelihood decoding”. This means that the number of errors in the “corrected” word is at most equal to t + i (i comes from the channel, t comes from decoding). The expectation of the number of errors in this event E is:

Σ_{i=t+1}^{n} (t + i) (n choose i) p_c^i (1 − p_c)^{n−i}
It will be noted that pr is also the probability of residual errors for the information
block recovered after decoding, provided that this decoding is systematic. Otherwise
the residual rate is much greater.
In practice, when p_c is not greater than 10^{−3}, we are able to take as a relation:

np_r = (2t + 1) (n choose t+1) p_c^{t+1} (1 − p_c)^{n−t−1}
It should be well noted that the value p_r obtained is the one provided by the code. If we require a residual probability of p′, we must then check, for the code considered, the inequality:

(2t + 1) (n choose t+1) p_c^{t+1} (1 − p_c)^{n−t−1} ≤ np′
Lastly, we are able to take into account the more delicate constraints on electronics.
2) Using the constraint on the redundancy rate we find the inequality n ≤ k/(0.82).
This yields the possible values for n for a value of k:
k 1 3 5 7 15 21 35 105
n 1 3 3 7 15 15 31 127
We see that there exists, perhaps, a natural cyclic code of length 127. We examine the classes of Z/(127). We will look for a union of these classes with a cardinal 127 − 105 = 22. The cardinals of classes are divisors of 7 (because 127 = 2^7 − 1). There are, thus, classes of cardinal 1 and 7. Since 22 = 3 × 7 + 1, we conclude that there exists a code whose roots contain {α, α^3, α^5, α^0}.
The apparent distance of the code is 8. It thus corrects 3 errors per codeword. If we approximate the right-hand member of the formula linking p_r to p_e by (t + 1) × (n choose t+1) × p_e^{t+1} × (1 − p_e)^{n−t−1}, we must verify the inequality:

(t + 1) × (n choose t+1) × p_e^{t+1} × (1 − p_e)^{n−t−1} ≤ 127 × 10^{−5}

We obtain 4 × (127 choose 4) × 10^{−16} × (0.9999)^{123} ≈ 4.084 × 10^{−9}, to be compared with 127 × 10^{−5}. It is acceptable; therefore there exists a code satisfying the specification.
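These numbers are easy to reproduce. A sketch (ours) evaluating the left member with t = 3, n = 127 and p_e = 10^{−4}:

```python
from math import comb

n, t, pe = 127, 3, 1e-4
# (t + 1) * C(n, t+1) * pe^(t+1) * (1 - pe)^(n - t - 1)
bound = (t + 1) * comb(n, t + 1) * pe**(t + 1) * (1 - pe)**(n - t - 1)
target = 127e-5
```

The bound comes out near 4.08 × 10^{−9}, comfortably below 127 × 10^{−5}.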
Since the beginning of the 1970s many applications of error correcting codes have been introduced. Let us cite a few:
– transmission of images by remote spacecraft,
– satellite transmissions,
– underwater transmissions,
– optical discs,
– Hubble,
– bar-codes,
– computer memory,
– mobiles,
– CD readers,
– cryptography.
In this section, circuits are drawn without taking traditional standards into account,
as far as logical gates and oscillation are concerned. We will not represent connections
with the clock.
There is the flip-flop, represented as follows, which contains a binary value. This
flip-flop is under the control of a clock. With each beat (or signal) of this clock the
flip-flop transmits the value that it contained and receives the value presented at input.
A flip-flop has an input and an output (see Figure 2.2).
There are also logical gates, “OR”, “AND”, “exclusive OR” represented as follows
(see Figure 2.3).
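A division circuit built from three flip-flops and two exclusive-OR gates computes the remainder of Example 2.31. The sketch below (ours) clocks such a register for g(X) = 1 + X + X^3, feeding X^3 i(X) highest-degree coefficient first:

```python
def lfsr_remainder(bits_msb_first):
    """Simulate the division register for g(X) = 1 + X + X^3.
    s0, s1, s2 hold the coefficients of X^0, X^1, X^2; each clock beat
    multiplies the state by X, adds the input bit, and reduces X^3 to
    1 + X through the feedback connections."""
    s0 = s1 = s2 = 0
    for b in bits_msb_first:
        s0, s1, s2 = b ^ s2, s0 ^ s2, s1
    return s0, s1, s2

# X^3 i(X) = X^3 + X^5 + X^6, coefficients fed from degree 6 down to 0
remainder = lfsr_remainder([1, 1, 0, 1, 0, 0, 0])
```

After seven clock beats the register holds the remainder 1, as in the example.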
Chapter 8
H_n H_n^T = nI.

The columns of H_n are pairwise orthogonal, as are the rows. Some examples of Hadamard matrices are:
H4 =
 1  1  1  1
 1 -1  1 -1
 1  1 -1 -1
 1 -1 -1  1

H8 =
 1  1  1  1  1  1  1  1
 1 -1  1 -1  1 -1  1 -1
 1  1 -1 -1  1  1 -1 -1
 1 -1 -1  1  1 -1 -1  1
 1  1  1  1 -1 -1 -1 -1
 1 -1  1 -1 -1  1 -1  1
 1  1 -1 -1 -1 -1  1  1
 1 -1 -1  1 -1  1  1 -1
The operation of computing rH_n, where r is a row vector of length n, is sometimes called computing the Hadamard transform of r. As we show in Section 8.3.3, there are fast algorithms for computing the Hadamard transform which are useful for decoding certain Reed-Muller codes (among other things). Furthermore, the Hadamard matrices can be used to define some error correction codes.
It is clear that multiplying a row or a column of H_n by -1 produces another Hadamard matrix. By a sequence of such operations, a Hadamard matrix can be obtained which has the first row and the first column equal to all ones. Such a matrix is said to be normalized.
Some of the operations associated with Hadamard matrices can be expressed using the
Kronecker product.
370 Other Important Block Codes
Theorem 8.1 The Kronecker product has the following properties [246, ch. 9]:
1. (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C).
2. A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C).
4. Associative property: (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
5. Transposes: (A ⊗ B)^T = A^T ⊗ B^T.
6. Trace (for square A and B): tr(A ⊗ B) = tr(A) tr(B).
7. If A is diagonal and B is diagonal, then A ⊗ B is diagonal.
8. Determinant, where A is m × m and B is n × n: det(A ⊗ B) = det(A)^n det(B)^m.
9. The Kronecker product theorem:
Returning now to Hadamard matrices, it may be observed that the Hadamard matrices in (8.1) have the structure H_{2n} = H_2 ⊗ H_n (the Sylvester construction).
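A sketch (ours) of the Sylvester recursion, using a hand-rolled Kronecker product and checking the defining property H H^T = nI for H8:

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

H2 = [[1, 1], [1, -1]]

def sylvester(m):
    """Build H_{2^m} by repeated Kronecker product with H2."""
    H = H2
    for _ in range(m - 1):
        H = kron(H2, H)
    return H

H8 = sylvester(3)
n = len(H8)
gram = [[sum(H8[i][k] * H8[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)]
is_hadamard = all(gram[i][j] == (n if i == j else 0)
                  for i in range(n) for j in range(n))
```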
Theorem 8.3 A Hadamard matrix must have an order that is either 1, 2, or a multiple of 4.
Proof [220, p. 44] Suppose without loss of generality that H_n is normalized. By column permutations, we can put the first three rows of H_n in the following form:
1  1 · · ·  1    1  1 · · ·  1     1  1 · · ·  1    1  1 · · ·  1
1  1 · · ·  1    1  1 · · ·  1    -1 -1 · · · -1   -1 -1 · · · -1
1  1 · · ·  1   -1 -1 · · · -1     1  1 · · ·  1   -1 -1 · · · -1
 (i columns)      (j columns)       (k columns)      (l columns)
For example, j is the number of columns such that the first two rows of H_n have ones while the third row has negative ones. Since the rows are orthogonal, we have

i + j − k − l = 0 (inner product of row 1 with row 2)
i − j + k − l = 0 (inner product of row 1 with row 3)
i − j − k + l = 0 (inner product of row 2 with row 3),

which, together with i + j + k + l = n, collectively imply i = j = k = l. Thus n = 4i, so n must be a multiple of 4. (If n = 1 or 2, then there are not three rows to consider.) □
This theorem does not exclude the possibility of a Hadamard matrix of order 12. However,
it cannot be obtained by the Sylvester construction.
Example 8.2 The easiest way to find the quadratic residues modulo a prime p is to list the nonzero numbers modulo p, then square them.
Let p = 7. The set of nonzero numbers modulo p is {1, 2, 3, 4, 5, 6}. Squaring these numbers modulo p we obtain the list {1^2, 2^2, 3^2, 4^2, 5^2, 6^2} = {1, 4, 2, 2, 4, 1}. So the quadratic residues modulo 7 are {1, 2, 4}. The quadratic nonresidues are {3, 5, 6}. The number 9 is a quadratic residue modulo 7, since 9 = 7 + 2, and 2 is a quadratic residue.
Now let p = 11. Forming the list of squares we have {1, 4, 9, 5, 3, 3, 5, 9, 4, 1}, so the quadratic residues modulo 11 are {1, 3, 4, 5, 9}.
The Legendre symbol is a number theoretic function associated with quadratic residues.
Definition 8.3 Let p be an odd prime. The Legendre symbol χ_p(x) is defined as

χ_p(x) =  0 if x is a multiple of p,
          1 if x is a quadratic residue modulo p,
         -1 if x is a quadratic nonresidue modulo p.
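Euler's criterion, x^{(p−1)/2} ≡ χ_p(x) (mod p), gives a direct way to compute the Legendre symbol. A sketch (ours):

```python
def legendre(x, p):
    """Legendre symbol via Euler's criterion: x^((p-1)/2) mod p is 1 or p - 1."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

qr7 = {x for x in range(1, 7) if legendre(x, 7) == 1}
qr11 = {x for x in range(1, 11) if legendre(x, 11) == 1}
```

This recovers the residue sets of Example 8.2.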
The key to the Paley construction of Hadamard matrices is the following theorem.
Proof From Theorem 8.4 and the definition of the Legendre symbol, χ_p(xy) = χ_p(x)χ_p(y). Since b = 0 contributes nothing to the sum in (8.4), suppose b ≠ 0. Let z = (b + c)b^{−1} (mod p). As b runs through 1, 2, . . . , p − 1, z takes on distinct values in 0, 2, 3, . . . , p − 1, but not the value 1. Then

Σ_{b=0}^{p−1} χ_p(b)χ_p(b + c) = Σ_{b=1}^{p−1} χ_p(b^2)χ_p(z) = Σ_{z=0, z≠1}^{p−1} χ_p(z) = 0 − χ_p(1) = −1,

where the last equality follows since half of the nonzero numbers z from 0 to p − 1 have χ_p(z) = −1 and the other half χ_p(z) = 1, by Theorem 8.4.
With this background, we can now define the Paley construction.
Example 8.4 Let p = 7. For the first row of the matrix, see Example 8.3. The 7 × 7 Jacobsthal matrix is

J7 =
 0  1  1 -1  1 -1 -1
-1  0  1  1 -1  1 -1
-1 -1  0  1  1 -1  1
 1 -1 -1  0  1  1 -1
-1  1 -1 -1  0  1  1
 1 -1  1 -1 -1  0  1
 1  1 -1  1 -1 -1  0

and the resulting Hadamard matrix is

H8 =
 1  1  1  1  1  1  1  1
 1 -1  1  1 -1  1 -1 -1
 1 -1 -1  1  1 -1  1 -1
 1 -1 -1 -1  1  1 -1  1
 1  1 -1 -1 -1  1  1 -1
 1 -1  1 -1 -1 -1  1  1
 1  1 -1  1 -1 -1 -1  1
 1  1  1 -1  1 -1 -1 -1
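The matrix above can be regenerated from the Legendre symbol. The sketch below (ours) builds the Jacobsthal matrix with entries χ_p(j − i), borders it with a row and column of ones while replacing the diagonal zeros by −1, and checks the Hadamard property for p = 7:

```python
def legendre(x, p):
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def paley_hadamard(p):
    """Paley construction for a prime p = 3 (mod 4): order p + 1 matrix."""
    J = [[legendre(j - i, p) for j in range(p)] for i in range(p)]
    H = [[1] * (p + 1)]                       # first row of all ones
    for i in range(p):
        # first column of ones; diagonal 0 of J becomes -1
        H.append([1] + [J[i][j] if i != j else -1 for j in range(p)])
    return H

H8 = paley_hadamard(7)
n = len(H8)
is_hadamard = all(sum(H8[i][k] * H8[j][k] for k in range(n)) == (n if i == j else 0)
                  for i in range(n) for j in range(n))
```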
Example 8.5 We now show the construction of H12. The 11 × 11 Jacobsthal matrix is
0 1 -1 1 1 1 -1 -1 -1 1 -1
-1 0 1 -1 1 1 1 -1 -1 -1 1
1 -1 0 1 -1 1 1 1 -1 -1 -1
-1 1 -1 0 1 -1 1 1 1 -1 -1
-1 -1 1 -1 0 1 -1 1 1 1 -1
J11 = -1 -1 -1 1 -1 0 1 -1 1 1 1
1 -1 -1 -1 1 -1 0 1 -1 1 1
1 1 -1 -1 -1 1 -1 0 1 -1 1
1 1 1 -1 -1 -1 1 -1 0 1 -1
-1 1 1 1 -1 -1 -1 1 -1 0 1
-1 -1 1 1 1 -1 -1 -1 1 -1 0
H12 =
 1  1  1  1  1  1  1  1  1  1  1  1
 1 -1  1 -1  1  1  1 -1 -1 -1  1 -1
 1 -1 -1  1 -1  1  1  1 -1 -1 -1  1
 1  1 -1 -1  1 -1  1  1  1 -1 -1 -1
 1 -1  1 -1 -1  1 -1  1  1  1 -1 -1
 1 -1 -1  1 -1 -1  1 -1  1  1  1 -1
 1 -1 -1 -1  1 -1 -1  1 -1  1  1  1      (8.5)
 1  1 -1 -1 -1  1 -1 -1  1 -1  1  1
 1  1  1 -1 -1 -1  1 -1 -1  1 -1  1
 1  1  1  1 -1 -1 -1  1 -1 -1  1 -1
 1 -1  1  1  1 -1 -1 -1  1 -1 -1  1
 1  1 -1  1  1  1 -1 -1 -1  1 -1 -1
The following lemma establishes that the Paley construction gives a Hadamard matrix.
Σ_{k=0}^{p−1} χ_p(k − i)χ_p(k − j) = Σ_{b=0}^{p−1} χ_p(b)χ_p(b + c) = −1 (substituting b = k − i, c = i − j, then using Lemma 8.5).

Also, J_p u = 0 for the all-ones vector u, since each row contains (p − 1)/2 elements equal to 1 and (p − 1)/2 elements equal to −1. □
Now
Starting from A_n again, if we adjoin the binary complements of the rows of A_n, we obtain a code with code length n, 2n codewords, and minimum distance n/2. This code is denoted B_n.
This book does not treat many nonlinear codes. However, if any of these codes are constructed using a Paley matrix with n > 8, then the codes are nonlinear. (The linear span of the nonlinear code is a quadratic residue code.) Interestingly, if the Paley Hadamard matrix is used in the construction of A_n or B_n, then the codes are cyclic, but not necessarily linear. If the codes are constructed from Hadamard matrices built by the Sylvester construction, the codes are linear.
f2 = (1111100110101100).
The number of distinct Boolean functions in m variables is the number of distinct binary sequences of length 2^m, which is 2^{2^m}. The set M of all Boolean functions in m variables forms a vector space that has a basis

{1, v1, v2, . . . , vm, v1v2, v1v3, . . . , v_{m−1}vm, . . . , v1v2v3 · · · vm}.

Every function f in this space can be represented as a linear combination of these basis functions:

f = a_0 1 + a_1 v1 + a_2 v2 + · · · + a_m vm + a_{12} v1v2 + · · · + a_{12···m} v1v2 · · · vm.
Functional and vector notation can be used interchangeably. Here are some examples
of some basic functions and their vector representations:
1 ↔ 1111111111111111
v1 ↔ 0101010101010101
v2 ↔ 0011001100110011
v3 ↔ 0000111100001111
v4 ↔ 0000000011111111
v1v2 ↔ 0001000100010001
v1v2v3v4 ↔ 0000000000000001.
As this example demonstrates, juxtaposition of vectors represents the corresponding Boolean 'and' function, element by element. A vector representing a function can be written as

f = a_0 1 + a_1 v1 + a_2 v2 + · · · + a_m vm + a_{12} v1v2 + · · · + a_{12···m} v1v2 · · · vm.
Definition 8.4 [373, p. 151] The binary Reed-Muller code RM(r, m) of order r and length 2^m consists of all linear combinations of vectors f associated with Boolean functions f that are monomials of degree ≤ r in m variables.
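Definition 8.4 translates directly into a generator-matrix construction: evaluate each monomial of degree ≤ r at the 2^m points. A sketch (ours, using bit j − 1 of the point index as the value of v_j):

```python
from itertools import combinations

def rm_generator(r, m):
    """Rows = evaluations of the degree <= r monomials at all 2^m points."""
    rows = []
    for deg in range(r + 1):
        for mono in combinations(range(1, m + 1), deg):
            rows.append([int(all((k >> (j - 1)) & 1 for j in mono))
                         for k in range(2**m)])
    return rows

G13 = rm_generator(1, 3)
strings = ["".join(map(str, row)) for row in G13]
```

For RM(1, 3) this yields the rows 1, v1, v2, v3; for RM(2, 4) it yields the 11 monomial rows of Example 8.8.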
genrm.cc

G = [ 1 1 1 1 1 1 1 1
      0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1 ]
This is an (8,4,4) code; it is single-error correcting and double-error detecting. This is also the
extended Hamming code (obtained by adding an extra parity bit to the (7,4) Hamming code). 0
8.3 Reed-Muller Codes 377
Example 8.8 The RM(2, 4) code has length 16 and is obtained by linear combinations of the monomials up to degree 2, which are

1    = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)
v4   = (0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1)
v3   = (0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1)
v2   = (0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1)
v1   = (0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1)
v3v4 = (0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1)
v2v4 = (0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1)
v2v3 = (0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1)
v1v4 = (0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1)
v1v3 = (0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1)
v1v2 = (0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1).
+ +
Proof The codewords of R M ( r 1, m 1) are associated with Boolean functions in m 1 +
variables of degree F r +
1. If c(v1, . . . , v m + l ) is such a function (i.e., it represents a
codeword) we can write it as
c(~I, 1 . . 9 vm+l) = f(v1,. . + v m + l g ( v l , . . ., Urn),
- 9 vm)
Proof By induction. When m = 1 the RM(0, 1) code is built from the basis {1}, giving rise to two codewords: it is the length-2 repetition code. In this case d_min = 2. The RM(1, 1) code is built upon the basis vectors {1, v1} and has four codewords of length two: 00, 01, 10, 11. Hence d_min = 1.
As an inductive hypothesis, assume that up to m and for 0 ≤ r ≤ m the minimum distance is 2^{m−r}. We will show that d_min for RM(r, m + 1) is 2^{m−r+1}.
Let f and f′ be in RM(r, m) and let g and g′ be in RM(r − 1, m). By Lemma 8.7, the vectors c1 = (f, f + g) and c2 = (f′, f′ + g′) must be in RM(r, m + 1).
If g = g′ then d(c1, c2) = d((f, f + g), (f′, f′ + g)) = 2d(f, f′) ≥ 2 · 2^{m−r} by the inductive hypothesis. If g ≠ g′ then

d(c1, c2) = w(f − f′) + w(g − g′ + f − f′).

Claim: w(x + y) ≥ w(x) − w(y). Proof: Let w_{xy} be the number of places in which the nonzero digits of x and y overlap. Then w(x + y) = (w(x) − w_{xy}) + (w(y) − w_{xy}). But since 2w(y) ≥ 2w_{xy}, the result follows.
By this result,

d(c1, c2) ≥ w(f − f′) + w(g − g′) − w(f − f′) = w(g − g′).

But g − g′ ∈ RM(r − 1, m), so that w(g − g′) ≥ 2^{m−(r−1)} = 2^{m−r+1}. □
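The (u, u + v) recursion of Lemma 8.7 also gives a quick computational check of this theorem. A sketch (ours) builds a basis of RM(r, m) recursively and brute-forces the minimum weight of RM(1, 3), which should be 2^{3−1} = 4:

```python
from itertools import product

def rm_basis(r, m):
    """Basis of RM(r, m) via the (u, u + v) construction."""
    if r == 0:
        return [[1] * 2**m]                       # repetition code
    if r == m:
        # full space F_2^(2^m): unit-vector basis
        return [[int(i == j) for j in range(2**m)] for i in range(2**m)]
    upper = rm_basis(r, m - 1)
    lower = rm_basis(r - 1, m - 1)
    return ([u + u for u in upper] +
            [[0] * 2**(m - 1) + v for v in lower])

basis = rm_basis(1, 3)
codewords = set()
for coeffs in product([0, 1], repeat=len(basis)):
    w = [0] * 8
    for c, row in zip(coeffs, basis):
        if c:
            w = [a ^ b for a, b in zip(w, row)]
    codewords.add(tuple(w))
d_min = min(sum(w) for w in codewords if any(w))
```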
The following theorem is useful in characterizing the duals of RM codes.
By the theorem regarding the dimensionality of dual codes, Theorem 2.8, RM(m − r − 1, m) must be the dual to RM(r, m). □
It is clear that the weight distribution of RM(1, m) codes is A_0 = 1, A_{2^m} = 1 and A_{2^{m−1}} = 2^{m+1} − 2. Beyond these simple results, the weight distributions are more complicated.
G = [ 1 1 1 1 1 1 1 1
      0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1 ]

The columns of G consist of the numbers (1, 0, 0, 0) through (1, 1, 1, 1) in increasing binary counting order (with the least-significant bit on the right). This sequence of bit values can thus be obtained using a conventional binary digital counter. A block diagram of an encoding circuit embodying this idea is shown in Figure 8.1.
where ⊕ denotes addition modulo 2 and d(r_i, c_i) is the Hamming distance between the arguments. A sequence which minimizes d(r, c) has the largest number of positive terms in the sum on the right of (8.7) and therefore maximizes the sum.
Let Φ(r) be the transformation that converts the binary {0, 1} elements of r to the ±1 values of a vector R according to
By (8.7), the codeword c which minimizes d(r, c) maximizes the correlation cor(R, C).
The decoding algorithm is summarized as: compute T_i = cor(R, C_i), where C_i = Φ(c_i), for each of the 2^{m+1} codewords, then select the codeword for which cor(R, C_i) is the largest. The simultaneous computation of all the correlations can be represented as a matrix. Let C_i be represented as a column vector and let
Recall that the generator matrix for the RM(1, m) code can be written as
We actually find it convenient to deal explicitly with those codewords formed as linear combinations of only the vectors v1, v2, . . . , vm, since 1 + c complements all the elements of c, which corresponds to negating the elements of the transform C. We therefore deal with the 2^m × 2^m matrix H_{2^m}. Let us examine one of these matrices in detail. For the RM(1, 3) code with G as in (8.6), the matrix H8 can be written as
       0   1   2   3   4   5   6   7
H8 = [ 1   1   1   1   1   1   1   1
       1  -1   1  -1   1  -1   1  -1
       1   1  -1  -1   1   1  -1  -1
       1  -1  -1   1   1  -1  -1   1
       1   1   1   1  -1  -1  -1  -1
       1  -1   1  -1  -1   1  -1   1
       1   1  -1  -1  -1  -1   1   1
       1  -1  -1   1  -1   1   1  -1 ]
Examination of this reveals that, with this column ordering, column 1 corresponds to Φ(v1), column 2 corresponds to Φ(v2), and column 4 corresponds to Φ(v3). In general, the ith column corresponds to the linear combination i1·v1 + i2·v2 + i3·v3, where i has the binary representation i = i1 + 2 i2 + 4 i3. We write the binary representation as i = (i3, i2, i1)_2. In the general case, for a 2^m × 2^m Hadamard matrix, we place Φ(Σ_{j=1}^{m} i_j v_j) in the ith column, where i = (i_m, i_{m−1}, . . . , i1)_2.
The computation RH is referred to as the Hadamard transform of R.
The decoding algorithm can be described as follows:

r = [1, 0, 0, 1, 0, 0, 1, 0].

The steps of the algorithm follow (rmdecex.m):
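Carrying the algorithm out on this r by brute force (a sketch, ours): map r to ±1 values, correlate against every column of H8, pick the entry of largest magnitude, and rebuild the codeword from its index and sign.

```python
def parity(x):
    """Parity of the popcount of x."""
    return bin(x).count("1") & 1

r = [1, 0, 0, 1, 0, 0, 1, 0]
R = [1 - 2 * b for b in r]                    # 0 -> +1, 1 -> -1
# Hadamard transform: T[j] = sum_i R[i] * (-1)^{popcount(i & j)}
T = [sum(R[i] * (-1)**parity(i & j) for i in range(8)) for j in range(8)]
j_best = max(range(8), key=lambda j: abs(T[j]))
sign = 1 if T[j_best] < 0 else 0
# decoded codeword: parity of (i & j_best), complemented if correlation < 0
c = [parity(i & j_best) ^ sign for i in range(8)]
dist = sum(a != b for a, b in zip(r, c))
```

The largest-magnitude entry is T[7] = −6, so the decoded codeword is the complement of v1 + v2 + v3, at Hamming distance 1 from r.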
where

M_{2^m}^{(i)} = I_{2^{m−i}} ⊗ H_2 ⊗ I_{2^{i−1}},

and where I_p is a p × p identity matrix.
Proof By induction. When m = 1 the result holds, as may be easily verified. Assume,
then, that (8.10) holds for m. We find that
hadex.m — Straightforward substitution and multiplication shows that this gives the matrix H8 in (8.8).
Let R = [ R o , R1, . . . , R7]. Then the Hadamard transform can be written
M8^{(2)} = I_2 ⊗ H_2 ⊗ I_2, whose two diagonal blocks are copies of

H_2 ⊗ I_2 = [ 1  0  1  0
              0  1  0  1
              1  0 -1  0
              0  1  0 -1 ]

and

M8^{(3)} = H_2 ⊗ I_4 = [ 1  0  0  0  1  0  0  0
                         0  1  0  0  0  1  0  0
                         0  0  1  0  0  0  1  0
                         0  0  0  1  0  0  0  1
                         1  0  0  0 -1  0  0  0
                         0  1  0  0  0 -1  0  0
                         0  0  1  0  0  0 -1  0
                         0  0  0  1  0  0  0 -1 ]
Figure 8.2 shows the flow diagram corresponding to the matrix multiplications (testfht.cc, fht.cc, fht.m), where arrows indicate the direction of flow, arrows incident along a line imply addition, and the coefficients -1 along the horizontal branches indicate the gain along their respective branch. At each stage, the two-point Hadamard transform is apparent. (At the first stage, the operations of H2 are enclosed in the box to highlight the operation.) The interleaving of the various stages by virtue of the Kronecker product is similar to the "butterfly" pattern of the fast Fourier transform. □
Figure 8.2: Signal flow diagram for the fast Hadamard transform.
Efficient decoding algorithms for general RM codes rely upon the concept of majority logic
decoding, in which multiple estimates of a bit value are obtained, and the decoded value
is that value which occurs in the majority of estimates. We demonstrate this first for a
RM(2, 4) code, then develop a notation to extend this to other RM(r, m) codes.
G = [ G0 ; G1 ; G2 ] =
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1     (1)
  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1     (v4)
  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1     (v3)
  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1     (v2)
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1     (v1)
  0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1     (v3v4)
  0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1     (v2v4)
  0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1     (v1v4)
  0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1     (v2v3)
  0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1     (v1v3)
  0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 ]   (v1v2)
Thus the bits in m0 are associated with the zeroth-order term, the m1 bits are associated
with the first-order terms, and the second-order terms are associated with m2. The encoding
operation is
c = m G = m0 G0 + m1 G1 + m2 G2.   (8.11)
The general operation of the algorithm is as follows: Given a received vector r, estimates
are first obtained for the highest-order block of message bits, m2. Then m̂2 G2 is subtracted
from r, leaving only lower-order codewords. Then the message bits for m1 are obtained,
then subtracted, and so forth.
The key to finding the message bits comes from writing multiple equations for the same
quantity and taking a majority vote. Selecting coded bits from (8.11) we have
c0 = m0
c1 = m0 + m1
c2 = m0 + m2
c3 = m0 + m1 + m2 + m12.
Adding these code bits together (modulo 2) we obtain an equation for the message bit m12:
c0 + c1 + c2 + c3 = m12.
We can similarly obtain three other equations for the message bit m12:
c4 + c5 + c6 + c7 = m12
c8 + c9 + c10 + c11 = m12
c12 + c13 + c14 + c15 = m12.
Expressions such as this, in which the check sums all yield the same message bit, are said
to be orthogonal¹ on the message bit. From these four orthogonal equations, we determine
the value of m12 by majority vote. Given the estimates m̂12^(i), i = 1, 2, 3, 4, the decoded
value m̂12 is
m̂12 = maj(m̂12^(1), m̂12^(2), m̂12^(3), m̂12^(4)),
where maj(· · ·) returns the value that occurs most frequently among its arguments.
If errors occur such that only one of the m̂12^(i) is incorrect, the majority vote gives the
correct answer. If two of them are incorrect, then it is still possible to detect the occurrence
of errors.
¹This is a different usage from orthogonality in the vector-space sense.
386 Other Important Block Codes
m34 = c0 + c4 + c8 + c12
m34 = c1 + c5 + c9 + c13
m34 = c2 + c6 + c10 + c14
m34 = c3 + c7 + c11 + c15   (8.12)
Based upon these equations, majority logic decisions are computed for each element of the
second-order block. These decisions are then stacked up to give the block
m̂2 = (m̂34, m̂24, m̂14, m̂23, m̂13, m̂12).
We then "peel off" these decoded values to get
r′ = r − m̂2 G2.
Now we repeat for the first-order bits. We have eight orthogonal check sums on each of the
first-order message bits. For example,
m1 = c0 + c1   m1 = c2 + c3   m1 = c4 + c5   m1 = c6 + c7
m1 = c8 + c9   m1 = c10 + c11   m1 = c12 + c13   m1 = c14 + c15.
We use the bits of r′ to obtain eight estimates,
m̂1^(1) = r′0 + r′1    m̂1^(2) = r′2 + r′3    m̂1^(3) = r′4 + r′5    m̂1^(4) = r′6 + r′7
m̂1^(5) = r′8 + r′9    m̂1^(6) = r′10 + r′11   m̂1^(7) = r′12 + r′13   m̂1^(8) = r′14 + r′15,
then make a decision using majority logic,
m̂1 = maj(m̂1^(1), m̂1^(2), . . . , m̂1^(8)).
Similarly, eight orthogonal equations can be written on each of the bits m2, m3, and m4,
resulting in the estimated block m̂1 = (m̂4, m̂3, m̂2, m̂1).
Having estimated m̂1, we strip it off from the received signal,
r″ = r′ − m̂1 G1,
and look for m0. But if the previous decodings are correct,
r″ = m0 1 + e,
where 1 is the all-ones vector. Then m0 is obtained simply by majority vote:
m̂0 = maj(r″0, r″1, . . . , r″15).
Example 8.11 The computations described here are detailed in the indicated file. Suppose m =
(00001001000), so the codeword is c = mG = (0101011001010110). Suppose the received vector
is r = (0101011011010110). The message bit estimates are
m̂12^(1) = r0 + r1 + r2 + r3 = 0
m̂12^(2) = r4 + r5 + r6 + r7 = 0
m̂12^(3) = r8 + r9 + r10 + r11 = 1
m̂12^(4) = r12 + r13 + r14 + r15 = 0.
We obtain m̂12 = maj(0, 0, 1, 0) = 0. We similarly find
m̂24 = maj(1, 0, 0, 0) = 0   m̂34 = maj(1, 0, 0, 0) = 0,
and so forth, so that m̂2 = (001000). Removing this decoded value from the received vector we
obtain
r′ = r − m̂2 G2.
At the next block we have
m̂1 = maj(1, 1, 1, 1, 0, 1, 1, 1) = 1   m̂2 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0
m̂3 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0   m̂4 = maj(1, 0, 0, 0, 0, 0, 0, 0) = 0,
so that m̂1 = (0001). We now remove this decoded block:
r″ = r′ − m̂1 G1 = (0000000010000000).
A Geometric Viewpoint
Clearly the key to employing majority logic decoding on an RM(r, m) code is to find a
description of the equations which are orthogonal on each bit. Consider, for example, the
orthogonal equations for m34, as seen in (8.12). Writing down the indices of the checking
bits, we create the check set
S34 = {{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}}.
Consider a graph whose vertices are labeled with the binary numbers from 0000 to 1111, with
edges between those vertices that are logically adjacent.
The graph for a code with m = 3 is shown in Figure 8.3(a); it forms a conventional 3-
dimensional cube. The graph for a code with m = 4 is shown in Figure 8.3(b); it forms a
4-dimensional hypercube. The check set S34 can be represented as subsets of the nodes in
the graph. Figure 8.4 shows these sets by shading the "plane" defined by each of the four
check subsets. Similarly, the check sets for each of the bits m12, m13, etc., form a set of planes.
Figure 8.4: Planes shaded to represent the equations orthogonal on bit m34.
With these observations, let us now develop the notation to describe the general situation.
For a codeword c = (c0, c1, . . . , cn−1), let the coordinate ci be associated with the binary m-
tuple Pi obtained by complementing the binary representation of the index i. For example, c0
is associated with P0 = (1111) (since 0 = (0000)2) and c6 is associated with P6 = (1001)
(since 6 = (0110)2). We think of the Pi as points on an adjacency graph such as those shown in Figure 8.3.
Let I = {1, 2, . . . , m}. We represent the basis vectors for the RM(r, m) code as subsets
of I. For example, the basis vector v1 is represented by the set {1}. The basis vector v2v3 is
represented by the set {2, 3}. The basis vector v2v3v4 is represented by {2, 3, 4}. With this
notation, we now define the procedure for finding the orthogonal check sums for the vector
v_{i1} v_{i2} · · · v_{ip} [373, p. 160].
1. Let S = {S1, S2, . . . , S_{2^{m−p}}} be the subset of points associated with the incidence
vector v_{i1} v_{i2} · · · v_{ip}.
2. Let {j1, j2, . . . , j_{m−p}} be the set difference I − {i1, i2, . . . , ip}. Let T be the subset
of points associated with the incidence vector v_{j1} v_{j2} · · · v_{j_{m−p}}. The set T is called the
complementary subspace to S.
3. The first check sum consists of the sum of the coordinates specified by the points in
T.
4. The other check sums are obtained by "translating" the set T by the points in S. That
is, for each Si ∈ S, we form the set T + Si. The corresponding check sum consists
of the sum of the coordinates specified by this set.
Example 8.12 Check sums for RM(2, 4). Let us find check sums for v3v4 = (0000000000001111).
1. The subset for which v3v4 is an incidence vector contains the points
Figure 8.5: Geometric descriptions of parity check equations for second-order vectors of
the RM(2, 4) code.
Figure 8.5 indicates the check equations for all of the second-order vectors for the RM(2, 4) code.
Now let us examine check sums for the first-order vectors.
1. For the vector v4 = (0000000011111111) the set S is
These eight points are connected by the dashed lines in Figure 8.6(a).
2. The difference set is
{1, 2, 3, 4} − {4} = {1, 2, 3},
which has the associated vector v1v2v3 = (0000000100000001), which is the incidence vector
for the set
T = {P7, P15} = {(1000), (0000)}.
The corresponding equation is m4 = c7 + c15. The subset is indicated by the widest line in
Figure 8.6(a).
3. There are eight translations of T by the points in S. These are shown by the other wide lines
in the figure. □
Figure 8.6: Geometric descriptions of parity check equations for first-order vectors of the
RM(2, 4) code.
Example 8.13 While the squaring construction can be applied to any linear block code, we demon-
strate it here for a Reed-Muller code. Consider the RM(1, 3) code with
G = G0 = [ 1 1 1 1 1 1 1 1
           0 0 1 1 0 0 1 1
           0 0 0 0 1 1 1 1
           0 1 0 1 0 1 0 1 ].
Let C1 be the (8, 3) code generated by the first k1 = 3 rows of the generator G0,
G1 = [ 1 1 1 1 1 1 1 1
       0 0 1 1 0 0 1 1
       0 0 0 0 1 1 1 1 ].
The cosets in C/C1 are C1 and (01010101) + C1. For matrices M1 and M2, let
M1 ⊕ M2 = [ M1  0
            0   M2 ].
This is called the matrix direct sum. Let I2 be the 2 × 2 identity. Then
I2 ⊗ G1 = G1 ⊕ G1 = [ G1  0
                      0   G1 ].
Example 8.14 Continuing the previous example, the generator for the code |C0/C1|² is
𝒢0/1 = [ 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
         0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0
         0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
         0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
         0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
         0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
         0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ].
We can further partition the cosets as follows. Let C2 be an (n, k2) subcode of C1 with
generator G2, with 0 ≤ k2 ≤ k1. Then each of the 2^{k0−k1} cosets c_l + C1 in the partition
C0/C1 can be partitioned into 2^{k1−k2} cosets consisting of the following codewords:
c_l + d_p + C2 = {c_l + d_p + c : c ∈ C2}.
Continuing the example, let C2 be the (8, 2) subcode of C1 with generator
G2 = [ 1 1 1 1 1 1 1 1
       0 0 1 1 0 0 1 1 ].
There are two cosets in C1/C2, and
𝒢0/1 = [ G1    0
         0     G1
         G0\1  G0\1 ].   (8.14)
Note that C1/2 is a subcode (subgroup) of C0/1. The coset representatives for C0/1/C1/2,
which are denoted by [C0/1/C1/2], form a linear code. Let 𝒢0/1\1/2 denote the generator
matrix for the coset representatives [C0/1/C1/2]. Then form the code C0/1/2 = |C0/C1/C2|⁴
by
C0/1/2 = |C0/C1/C2|⁴ = {(a + x, b + x) : a, b ∈ C1/2 and x ∈ [C0/1/C1/2]}.
That is, it is obtained by the squaring construction of C0/1 and C0/1/C1/2. The generator
matrix for C0/1/2 is
𝒢0/1/2 = [ 𝒢1/2      0
           0         𝒢1/2
           𝒢0/1\1/2  𝒢0/1\1/2 ].
This gives a (4n, k0 + 2k1 + k2) linear block code with minimum distance
d0/1/2 = min(d0, 2d1, 4d2).
8.4 Building Long Codes from Short Codes: The Squaring Construction 395
Writing 𝒢0/1 and 𝒢1/2 as in (8.14), rearranging rows and columns and performing some
simple row operations, we can write 𝒢0/1/2 as
𝒢0/1/2 = [ G2    0     0     0
           0     G2    0     0
           0     0     G2    0
           0     0     0     G2
           G1\2  G1\2  G1\2  G1\2
           0     G1\2  0     G1\2
           0     0     G1\2  G1\2
           G0\1  G0\1  G0\1  G0\1 ],
which can be expressed as
𝒢0/1/2 = (I4 ⊗ G2) ⊕ (G_RM(1, 2) ⊗ G1\2) ⊕ (G_RM(0, 2) ⊗ G0\1).
Note that
G_RM(0, 2) = [ 1 1 1 1 ]  and  G_RM(1, 2) = [ 1 1 1 1
                                              0 1 0 1
                                              0 0 1 1 ]
are the generator matrices for the zeroth- and first-order Reed-Muller codes of length 4.
More generally, let C1, C2, . . . , Cm be a sequence of linear subcodes of C = C0 with
generators Gi, minimum distances di, and dimensions k1, k2, . . . , km satisfying
C0 ⊇ C1 ⊇ · · · ⊇ Cm
k ≥ k1 ≥ · · · ≥ km ≥ 0.
Then form the chain of partitions
C0/C1, C0/C1/C2, . . . , C0/C1/· · ·/Cm,
such that the code can be expressed as the direct sum
C0 = [C0/C1] ⊕ [C1/C2] ⊕ · · · ⊕ [C_{m−1}/Cm] ⊕ Cm.
Assume that the generator matrices are represented in such a way that G0 ⊇ G1 ⊇ · · · ⊇ Gm.
Let G_{i\i+1} denote the generator matrix for the coset representative space [Ci/Ci+1]. The
m-level squaring construction then gives the code
|C0/C1/· · ·/Cm|^{2^m} = {(a + x, b + x) : a, b ∈ C_{1/2/···/m}, x ∈ [C_{0/1/···/m−1}/C_{1/2/···/m}]}.
The generator matrix can be written (after appropriate rearrangement of the rows) as
𝒢_{0/1/···/m} = (I_{2^m} ⊗ Gm) ⊕ ⨁_{r=0}^{m−1} (G_RM(r, m) ⊗ G_{r\r+1}).
GF(p) = Qp ∪ Np ∪ {0}.
As we have seen, GF(p) is cyclic. This gives rise to the following observation:
Example 8.16 Let p = 11, which has quadratic residues Qp = {1, 3, 4, 5, 9}. Let s = 3. A field
having a primitive 11th root of unity is GF(3⁵). Let β ∈ GF(3⁵) be a primitive 11th root of unity.
The conjugates of β are
β, β³, β⁹, β²⁷ = β⁵, β⁸¹ = β⁴.
8.5 Quadratic Residue Codes 397
q(x) = ∏_{i∈Qp} (x − βⁱ),   n(x) = ∏_{i∈Np} (x − βⁱ).   (8.15)
QR codes tend to have rather good distance properties. Some binary QR codes are the best
codes known for their particular values of n and k. A bound on the distance is provided by
the following.
Theorem 8.12 The minimum distance d of the codes Q or N satisfies d² ≥ p. If, addition-
ally, p = 4l − 1 for some l, then d² − d + 1 ≥ p.
The proof relies on the following lemma.
Lemma 8.13 Let θ(x) = q(xⁿ), where n ∈ Np (where the operations are in the ring R).
Then the roots of θ(x) are in the set {βⁱ : i ∈ Np}. That is, θ(x) is a scalar multiple of n(x).
Similarly, n(xⁿ) is a scalar multiple of q(x).
Proof Let γ be a generator of the nonzero elements of GF(p). From the discussion around
Lemma 8.11, Qp is generated by the even powers of γ and Np by the odd powers of γ.
Write θ(x) = ∏_{i∈Qp} (xⁿ − βⁱ). Then for any m ∈ Np,
θ(βᵐ) = ∏_{i∈Qp} (βᵐⁿ − βⁱ).
But since m ∈ Np and n ∈ Np, we have mn ∈ Qp (being the product of two odd powers of
γ). So βᵐⁿ = βⁱ for some i ∈ Qp, so βᵐ is a root of θ(x). □
The effect of evaluating q(xⁿ) is to permute the coefficients of q(x).
Proof of Theorem 8.12. [220, p. 483]. Let a(x) be a codeword of minimum nonzero
weight d in Q. Then by Lemma 8.13, the polynomial ã(x) = a(xⁿ) is a codeword in N.
Since the coefficients of ã(x) are simply a permutation (and possible scaling) of those of
a(x), ã(x) must be a codeword of minimum weight in N. The product a(x)ã(x) must be a
multiple of the polynomial
1 + x + x² + · · · + x^{p−1}.
Thus a(x)ã(x) has weight p. Since a(x) has weight d, the maximum weight of a(x)ã(x)
is d². We obtain the bound d² ≥ p.
If p = 4l − 1, then n = −1 is a quadratic nonresidue. In the product a(x)ã(x) =
a(x)a(x⁻¹) there are d terms equal to 1, so the maximum weight of the product is d² − d + 1. □
Table 8.1 summarizes known distance properties for some augmented binary QR codes,
with indications of best known codes. In some cases, d is expressed in terms of upper and
lower bounds.
While general decoding techniques have been developed for QR codes, we present only
a decoding algorithm for a particular QR code, the Golay code presented in the next section.
Decoding algorithms for other QR codes are discussed in [220], [287], [283], and [75].
While the Golay codes have not perhaps supported the burden of applications this alleged
importance would suggest, they do lie at the confluence of several routes of theoretical
development and are worth studying.
Let us take p = 23 and form the binary QR code. The field GF(2¹¹) has a primitive
23rd root of unity β. The quadratic residues are
Qp = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18},
and the corresponding generators for Q and N are
q(x) = ∏_{i∈Qp} (x − βⁱ) = 1 + x + x⁵ + x⁶ + x⁷ + x⁹ + x¹¹
n(x) = 1 + x² + x⁴ + x⁵ + x⁶ + x¹⁰ + x¹¹.
This produces a (23, 12, 7) code, the Golay code 𝒢23. It is straightforward to verify that this
code is a perfect code: the number of points out to a distance t = 3 is equal to
1 + 23 + 253 + 1771 = 2048 = 2¹¹,
so that 2¹² · 2¹¹ = 2²³, and the spheres of radius 3 around the codewords exactly fill the space.
G = [ I12 | B ]   (8.16)
where B is a 12 × 12 binary matrix containing the 11 × 11 block A^T_11 described below.
We note the following facts about 𝒢24.
In this representation, the 11 × 11 matrix A^T_11 on the upper right is obtained from the
transpose of the 12 × 12 Hadamard matrix of Paley type (8.5) by removing the first row
and column of H12, then replacing −1 by 1 and 1 by 0. Since the rows of H12 differ
in six places, the rows of A^T_11 differ in six places. Because of the identity block, the
sum of any two rows of G has weight 8.
If u and v are rows of G (not necessarily distinct), then wt(u · v) ≡ 0 (mod 2). So
every row of G is orthogonal to every other row. Therefore, G is also the parity check
matrix H of 𝒢24. Also, 𝒢24 is dual to itself: 𝒢24 = 𝒢24^⊥. Such a code is called self-dual.
Every codeword has even weight: if there were a codeword u of odd weight, then
wt(u · u) = 1. Furthermore, since every row of the generator has weight divisible by
4, every codeword has weight divisible by 4.
The weight distributions for the (23, 12) and the (24, 12) codes are shown in Table
8.2.
Table 8.2: Weight Distributions for the 𝒢23 and 𝒢24 Codes [373]
𝒢23:  i:  0    7    8    11    12    15   16   23
      Ai: 1   253  506  1288  1288  506  253   1
𝒢24:  i:  0    8    12    16   24
      Ai: 1   759  2576  759   1
g(x) = ∏_{i∈Qp} (x − βⁱ).
Thus β, β³, and β⁹ are all roots of g(x), and hence of any codeword c(x) = m(x)g(x). Let
c(x) be the transmitted codeword, and let r(x) = c(x) + e(x) be the received polynomial.
We define the syndromes as
s_j = r(βʲ) = e(βʲ).
The error locator polynomial is
L(x) = (x − z1)(x − z2)(x − z3) = x³ + σ1 x² + σ2 x + σ3,
8.6 Golay Codes 401
The problem now is to compute the coefficients of the error locator polynomial using the
syndrome values. By substitution of the definitions, it can be shown that
S3 = S1³ + σ2 S1 + σ3
S5 = S1⁵ + σ2 S3 + σ3 S1²
S7 = S1 S3² + σ2 S5 + σ3 S1⁴
S9 = S1⁹ + σ2 S7 + σ3 S3².
Eliminating variables from these relations leads to equation (8.17), which involves a
quantity D. The quantity D has a cube root in GF(2¹¹). From (8.17) we obtain σ2 = S1² + D^{1/3};
similarly for σ3. Combining these results, we obtain the following equations:
σ1 = S1
An example of the decoder is shown in testGolay.cc.
The generator matrix can be written G = [I12 B].
It may be observed that B is orthogonal,
Bᵀ B = I.
Let r = c + e and let e = (x, y), where x and y are each vectors of length 12. Since the code
is capable of correcting up to three errors, there are only a few possible weight distributions
of x and y to consider:
wt(x) ≤ 3, wt(y) = 0
wt(x) ≤ 2, wt(y) = 1
wt(x) ≤ 1, wt(y) = 2
wt(x) = 0, wt(y) = 3.
Since the code is self-dual, the generator matrix is also the parity check matrix. We can
compute a syndrome by
s = G rᵀ = G eᵀ = G [x, y]ᵀ = xᵀ + B yᵀ.
If y = 0, then s = xᵀ. If s has weight ≤ 3, we conclude that y = 0. The error pattern is
e = (x, 0) = (sᵀ, 0).
Suppose now that wt(y) = 1, where the error is in the ith coordinate of y, and that
wt(x) ≤ 2. The syndrome in this case is
s = xᵀ + b_i,
where b_i is the ith column of B. The position i is found by identifying the position such that
wt(s + b_i) = wt(x) ≤ 2. Having thus identified i, the error pattern is e = ((s + b_i)ᵀ, y_i).
Here, the notation y_i is the vector of length 12 having a 1 in position i and zeros elsewhere.
If wt(x) = 0 and wt(y) = 2 or 3, then s = b_i + b_j or s = b_i + b_j + b_k. Since B is an
orthogonal matrix,
Bᵀ s = Bᵀ (B yᵀ) = yᵀ.
The error pattern is e = (0, (Bᵀ s)ᵀ).
Finally, if wt(x) = 1 and wt(y) = 2, let the nonzero coordinate of x be at index i. Then
Bᵀ s = Bᵀ (xᵀ + B yᵀ) = Bᵀ xᵀ + yᵀ = r_iᵀ + yᵀ,
where r_i is the ith row of B. The error pattern is e = (x_i, (Bᵀ s)ᵀ + r_i).
Combining all these cases together, we obtain the following decoding algorithm (see
golayarith.m):
4   if wt(s) ≤ 3
5       e = (sᵀ, 0)
6   else if wt(s + b_i) ≤ 2 for some column vector b_i
7       e = ((s + b_i)ᵀ, y_i)
8   else
9       Compute Bᵀs
10      if wt(Bᵀs) ≤ 3
11          e = (0, (Bᵀs)ᵀ)
12      else if wt(Bᵀs + r_iᵀ) ≤ 2 for some row vector r_i
13          e = (x_i, (Bᵀs)ᵀ + r_i)
14      else
15          Too many errors: declare uncorrectable error pattern and stop.
16      end
17  end
18  c = r + e
13 gates. The eight XOR trees for generating the eight syndrome bits are identical.
These provide uniform and minimum delay in the error-correction process.
where (m over i) = m!/(i! (m − i)!) is the binomial coefficient. For example, let m = 5 and r = 2.
Then n = 32, k(2, 5) = 16, and dmin = 8. There exists a (32, 16) RM code with a
minimum distance of 8.
For 1 ≤ i ≤ m, let v_i be a 2^m-tuple over GF(2) of the following form:
v_i = (0 · · · 0  1 · · · 1  0 · · · 0  · · ·  1 · · · 1),   (4.4)
consisting of alternating runs of 2^{i−1} zeros and 2^{i−1} ones. Products of these vectors are
formed componentwise, where "·" denotes the logic product (or AND operation); i.e.,
a_i · b_i = 1 if and only if a_i = b_i = 1. For m = 4, a product of l distinct vectors
v_{i1} v_{i2} · · · v_{il}
is said to have degree l. Because the weights of v1, v2, · · · , vm are even and powers
of 2, the weight of the product v_{i1} v_{i2} · · · v_{il} is also even and a power of 2; in fact, it is 2^{m−l}.
The rth-order RM code, RM(r, m), of length 2^m is generated (or spanned) by
the following set of independent vectors:
G_RM(r, m) = {v0, v1, · · · , vm, v1v2, v1v3,   (4.5)
· · · , up to products of degree r}.
There are
k(r, m) = 1 + (m choose 1) + (m choose 2) + · · · + (m choose r)
vectors in G_RM(r, m). Therefore, the dimension of the code is k(r, m).
If the vectors in G_RM(r, m) are arranged as rows of a matrix, then the matrix
is a generator matrix of the RM(r, m) code. Hereafter, we use G_RM(r, m) as the
generator matrix. For 0 ≤ l ≤ r, there are exactly (m choose l)
rows in G_RM(r, m) of weight
2^{m−l}. Because all the vectors in G_RM(r, m) are of even weight, all the codewords
in the RM(r, m) code have even weight. From the code construction we readily see
that the RM(r − 1, m) code is a proper subcode of the RM(r, m) code. Hence, we
have the following inclusion chain:
RM(0, m) ⊂ RM(1, m) ⊂ · · · ⊂ RM(r, m) ⊂ · · · ⊂ RM(m, m).
EXAMPLE 4.2
Let m = 4 and r = 2. The second-order RM code of length n = 16 is generated by
the following 11 vectors:
v0   = 1111111111111111
v4   = 0000000011111111
v3   = 0000111100001111
v2   = 0011001100110011
v1   = 0101010101010101
v3v4 = 0000000000001111
v2v4 = 0000000000110011
v1v4 = 0000000001010101
v2v3 = 0000001100000011
v1v3 = 0000010100000101
v1v2 = 0001000100010001
With the preceding construction, the generator matrix G_RM(r, m) of the
RM(r, m) code is not in systematic form. It can be put in systematic form with
row and column permutations.
Section 4.3 Reed-Muller Codes
RM codes have many interesting and useful structures that simplify their encoding
and decoding. This will be discussed in a later section.
The Reed decoding algorithm for RM codes is best explained by an example.
Consider the second-order RM code RM(2, 4) of length 16 given in Example 4.2.
The four check-sums
A1 = r0 + r1 + r2 + r3,
A2 = r4 + r5 + r6 + r7,
A3 = r8 + r9 + r10 + r11,
A4 = r12 + r13 + r14 + r15
give four independent determinations of the information bit a12.
Similar independent determinations of the information bits a13, a23, a14, a24, and
a34 can be made from the code bits. For example, the four independent determina-
tions of a13 are:
a13 = b0 + b1 + b4 + b5,
a13 = b2 + b3 + b6 + b7,
a13 = b8 + b9 + b12 + b13,
a13 = b10 + b11 + b14 + b15.
At the decoder, we decode a13 by forming the following four check-sums from the
received bits using the preceding four independent determinations of a13:
A1 = r0 + r1 + r4 + r5,
A2 = r2 + r3 + r6 + r7,
A3 = r8 + r9 + r12 + r13,
A4 = r10 + r11 + r14 + r15.
From these four check-sums we use the majority-logic decision rule to decode a13.
If there is a single transmission error in the received sequence, the information bits
a12, a13, a23, a14, a24, and a34 will be decoded correctly.
After the decoding of a12, a13, a23, a14, a24, and a34, their contributions are removed
from the received vector.
We note that, starting from the first component, the sums of every two consecutive
components in v0, v4, v3, and v2 are zero; however, the sum of every two consecutive
components of v1 is equal to 1. As a consequence, we can form the following
eight independent determinations of the information bit a1 from the code bits b0^(1)
through b15^(1):
a1 = b0^(1) + b1^(1),    a1 = b8^(1) + b9^(1),
a1 = b2^(1) + b3^(1),    a1 = b10^(1) + b11^(1),
a1 = b4^(1) + b5^(1),    a1 = b12^(1) + b13^(1),
a1 = b6^(1) + b7^(1),    a1 = b14^(1) + b15^(1).
The corresponding check-sums formed from the modified received bits are
A1^(1) = r0^(1) + r1^(1),    A5^(1) = r8^(1) + r9^(1),
A2^(1) = r2^(1) + r3^(1),    A6^(1) = r10^(1) + r11^(1),
A3^(1) = r4^(1) + r5^(1),    A7^(1) = r12^(1) + r13^(1),
A4^(1) = r6^(1) + r7^(1),    A8^(1) = r14^(1) + r15^(1).
Let r = (r0, r1, · · · , r_{n−1}) be the received vector. Decoding of the RM(r, m) code
consists of r + 1 steps. At the first step of decoding, the information bits a_{i1 i2 ··· ir}
corresponding to the product vectors v_{i1} v_{i2} · · · v_{ir} of degree r in (4.7) are decoded
based on their check-sums formed from the received bits in r. Based on these
decoded information bits, the received vector r = (r0, r1, · · · , r_{n−1}) is modified.
Let r^(1) = (r0^(1), r1^(1), . . . , r_{n−1}^(1)) denote the modified received vector. At the second
step of decoding, the bits in the modified received vector r^(1) are used to form
the check-sums for decoding the information bits a_{i1 i2 ··· i_{r−1}} that correspond to the
product vectors v_{i1} v_{i2} · · · v_{i_{r−1}} of degree r − 1 in (4.7). Then, the decoded information
bits at the second step of decoding are used to modify r^(1). The modification results
in the next modified received vector r^(2) = (r0^(2), r1^(2), · · · , r_{n−1}^(2)) for the third step of
decoding. This step-by-step decoding process continues until the last information
bit a0, which corresponds to the all-one vector v0 in (4.7), is decoded. This decoding
process is called (r + 1)-step majority-logic decoding [2, 11].
Now, we need to know how to form check-sums for decoding at each step. For
1 ≤ i1 < i2 < · · · < i_{r−l} ≤ m with 0 ≤ l < r, we form the following index set:
S ≜ { c_{i1} 2^{i1−1} + c_{i2} 2^{i2−1} + · · · + c_{i_{r−l}} 2^{i_{r−l}−1} : c_{ij} ∈ {0, 1} for 1 ≤ j ≤ r − l },   (4.8)
which is a set of 2^{r−l} nonnegative integers less than 2^m in binary form.
For each nonnegative integer q < 2^m whose binary representation uses only the
positions not in {i1, i2, . . . , i_{r−l}}, we form the translate
B = {q + s : s ∈ S}.
The check-sums for decoding the information bits a_{i1 i2 ··· i_{r−l}} are
A^(l) = Σ_{t∈B} r_t^(l).
For the bit a12 of the RM(2, 4) code we have S = {0, 1, 2, 3}, and the translates are
B1 = 0 + S = {0, 1, 2, 3},
B2 = 4 + S = {4, 5, 6, 7},
B3 = 8 + S = {8, 9, 10, 11},
B4 = 12 + S = {12, 13, 14, 15}.
It follows from (4.13) with l = 0 that the four check-sums for a12 are
A1^(0) = r0 + r1 + r2 + r3,
A2^(0) = r4 + r5 + r6 + r7,
A3^(0) = r8 + r9 + r10 + r11,
A4^(0) = r12 + r13 + r14 + r15.
For a13 we have S = {0, 1, 4, 5}, and the translates q take values in {0, 2, 8, 10}.
From these index sets we obtain the following check-sums for a13:
A1^(0) = r0 + r1 + r4 + r5,
A2^(0) = r2 + r3 + r6 + r7,
A3^(0) = r8 + r9 + r12 + r13,
A4^(0) = r10 + r11 + r14 + r15.
Using the same procedure, we can form the check-sums for the information bits
a14, a23, a24, and a34.
To find the check-sums for a1, a2, a3, and a4, we first form the modified received
vector r^(1) based on (4.11):
r^(1) = r + â12 v1v2 + â13 v1v3 + · · · + â34 v3v4.
Suppose we want to form the check-sums for a3. Because i1 = 3, we obtain the
following sets:
S = {c3 2² : c3 ∈ {0, 1}} = {0, 4},
with the translates q taking values in {0, 1, 2, 3, 8, 9, 10, 11}. It follows from (4.13)
with l = 1 that we obtain the following eight check-sums:
A1^(1) = r0^(1) + r4^(1),    A5^(1) = r8^(1) + r12^(1),
A2^(1) = r1^(1) + r5^(1),    A6^(1) = r9^(1) + r13^(1),
A3^(1) = r2^(1) + r6^(1),    A7^(1) = r10^(1) + r14^(1),
A4^(1) = r3^(1) + r7^(1),    A8^(1) = r11^(1) + r15^(1).
Similarly, we can form the check-sums for a1, a2, and a4.
114 Chapter 4 Important Linear Block Codes
Figure: Bit-error performance of some RM codes with BPSK signaling, compared with
uncoded BPSK: RM(16, 11, 4), RM(32, 6, 16), RM(32, 16, 8), RM(32, 26, 4),
RM(64, 22, 16), and RM(64, 42, 8). (Bit-error probability versus Eb/N0 in dB.)
Section 4.4 Other Constructions for Reed-Muller Codes 115
The three-fold Kronecker product of G(2,2) is defined as
G(8,8) = G(2,2) ⊗ G(2,2) ⊗ G(2,2)
       = [ 1 1 ; 0 1 ] ⊗ [ 1 1 ; 0 1 ] ⊗ [ 1 1 ; 0 1 ]
       = [ 1 1 1 1 1 1 1 1
           0 1 0 1 0 1 0 1
           0 0 1 1 0 0 1 1
           0 0 0 1 0 0 0 1
           0 0 0 0 1 1 1 1
           0 0 0 0 0 1 0 1
           0 0 0 0 0 0 1 1
           0 0 0 0 0 0 0 1 ]   (4.16)
Similarly, we can define the m-fold Kronecker product of G(2,2). Let n = 2^m. We use
G(n,n) to denote the m-fold Kronecker product of G(2,2). G(n,n) is a 2^m × 2^m matrix
over GF(2). The rows of G(n,n) have weights 2⁰, 2¹, 2², . . . , 2^m, and the number of
rows with weight 2^{m−l} is (m choose l) for 0 ≤ l ≤ m.
The generator matrix G_RM(r, m) of the rth-order RM code RM(r, m) of length
n = 2^m consists of those rows of G(n,n) with weight equal to or greater than
2^{m−r}. These rows are the same vectors as given by (4.5), except in a different
permutation.
EXAMPLE 4.6
The four-fold Kronecker product of G(2,2) is the 16 × 16 matrix
G(2⁴,2⁴) =
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
  0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
  0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
  0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
  0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
  0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ]
The generator matrix G_RM(2, 4) of the second-order RM code, RM(2, 4), of length
16 consists of the rows in G(16,16) with weights 2², 2³, and 2⁴. Thus, we obtain
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
GRM(2, 4) = 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
which is exactly the same matrix given in Example 4.2, except for the ordering of
the rows.
Let u = (u0, u1, . . . , u_{n−1}) and v = (v0, v1, . . . , v_{n−1}) be two n-tuples over
GF(2). From u and v we form the following 2n-tuple:
|u|u + v| = (u0, u1, . . . , u_{n−1}, u0 + v0, u1 + v1, . . . , u_{n−1} + v_{n−1}).   (4.17)
For i = 1, 2, let Ci be a binary (n, ki) linear code with generator matrix Gi and
minimum distance di, respectively. Assume that d2 > d1. We form the following
linear code of length 2n:
C = |C1|C1 + C2|
  = {|u|u + v| : u ∈ C1 and v ∈ C2}.   (4.18)
C is a (2n, k1 + k2) linear code,   (4.19)
with minimum distance dmin(C) = min{2d1, d2}.   (4.20)
To prove this, let x = |u|u + v| and y = |u′|u′ + v′| be two distinct codewords in C.
The Hamming distance between x and y can be expressed in terms of Hamming
weights as follows:
d(x, y) = w(u + u′) + w(u + v + u′ + v′),
where w(z) denotes the Hamming weight of z. There are two cases to be considered,
v = v′ and v ≠ v′. If v = v′, since x ≠ y, we must have u ≠ u′. In this case,
d(x, y) = 2 w(u + u′) ≥ 2d1.
If v ≠ v′, then, because w(a) + w(b) ≥ w(a + b),
d(x, y) = w(u + u′) + w(u + u′ + v + v′) ≥ w(v + v′) ≥ d2.
Because x and y are two distinct codewords in C, the minimum distance dmin(C)
must be lower bounded as follows:
dmin(C) ≥ min{2d1, d2}.
Codewords achieving this bound exist; hence
dmin(C) = min{2d1, d2}.   (4.28)
Let C1 be the (8, 4) first-order RM code RM(1, 3), of minimum distance 4, with generator
G1 = [ 1 1 1 1 1 1 1 1
       0 0 0 0 1 1 1 1
       0 0 1 1 0 0 1 1
       0 1 0 1 0 1 0 1 ].
Let C2 be the (8, 1) repetition code of minimum distance 8, generated by
G2 = [ 1 1 1 1 1 1 1 1 ].
G = [ G1  G1
      0   G2 ]
  = [ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
      0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]
G_RM(r, m) = [ G_RM(r, m − 1)   G_RM(r, m − 1)
               0                G_RM(r − 1, m − 1) ].   (4.30)
The matrix of (4.30) shows that a RM code can be constructed recursively from
shorter RM codes by a sequence of |u|u + v|-constructions. For example, the rth-order
RM code RM(r, m) of length 2^m can be constructed from the RM codes RM(r, m − 2),
RM(r − 1, m − 2), and RM(r − 2, m − 2) of length 2^{m−2}. The generator matrix in
terms of the component codes is given as follows:
G = [ G_RM(r, m−2)   G_RM(r, m−2)     G_RM(r, m−2)     G_RM(r, m−2)
      0              G_RM(r−1, m−2)   0                G_RM(r−1, m−2)
      0              0                G_RM(r−1, m−2)   G_RM(r−1, m−2)
      0              0                0                G_RM(r−2, m−2) ].   (4.31)
Given a Boolean function f(X1, X2, . . . , Xm), we form the following 2^m-tuple
(vector):
v(f) = (v0, v1, . . . , vl, . . . , v_{2^m−1}),   (4.32)
where
vl = f(bl1, bl2, . . . , blm),   (4.33)
and (bl1, bl2, . . . , blm) is the standard binary representation of the index integer l.
We say that the Boolean function f(X1, . . . , Xm) represents the vector v; we use
the notation v(f) for the vector represented by f(X1, X2, . . . , Xm). For 1 ≤ i ≤ m,
it is easy to see that the Boolean function Xi represents the vector vi defined by (4.4).
For 1 ≤ i, j ≤ m, the function Xi Xj
represents the logic product of vi and vj, which are represented by g(X1, . . . , Xm) = Xi and
h(X1, X2, . . . , Xm) = Xj, respectively. For 1 ≤ i1 < i2 < · · · < ir ≤ m, the Boolean
function
X_{i1} X_{i2} · · · X_{ir}   (4.36)
represents the logic product of v_{i1}, v_{i2}, . . . , and v_{ir}. Therefore, the generator vectors
of the rth-order RM code of length n = 2^m (the rows in G_RM(r, m)) are represented
by the Boolean functions in the following set:
{1, X1, X2, . . . , Xm, X1X2,   (4.37)
. . . , up to all products of r variables}.
Let P(r, m) denote the set of all Boolean functions (or polynomials) of degree r or
less with m variables. Then, RM(r, m) is given by the following set of vectors [12]:
RM(r, m) = {v(f) : f ∈ P(r, m)}.   (4.39)
with 1 ≤ l ≤ 2^{k−k1}, where for wl ≠ 0, wl is in C but not in C1, and for wl = 0, the coset 0 ⊕ C1 is just the subcode C1 itself. The codeword wl is called the leader (or representative) of the coset wl ⊕ C1. We also showed in Section 3.5 that any codeword in a coset can be used as the coset representative without changing the composition of the coset. The all-zero codeword 0 is always used as the representative for C1. The set of representatives of the cosets in the partition C/C1 is denoted by [C/C1], which is called the coset representative space for the partition C/C1. Because all the cosets in C/C1 are disjoint, C1 and [C/C1] have only the all-zero codeword 0 in common; that is, C1 ∩ [C/C1] = {0}. Then, we can express C as the direct-sum of the coset representatives in [C/C1] and the codewords in C1 as follows:
    C = [C/C1] ⊕ C1.      (4.40)

Suppose C2 is a subcode of C1 with dimension k2. Then C1 can in turn be partitioned into cosets

    wq ⊕ C2,      (4.41)

with 1 ≤ l ≤ 2^{k−k1} and 1 ≤ q ≤ 2^{k1−k2}, where for wq ≠ 0, wq is a codeword in C1 but not in C2. We denote this partition C/C1/C2. This partition consists of 2^{k−k2} cosets of C2. Now, we can express C as the following direct-sum:

    C = [C/C1] ⊕ [C1/C2] ⊕ C2.      (4.42)

In general, suppose we have a sequence of subcodes

    C ⊇ C1 ⊇ C2 ⊇ · · · ⊇ Cm      (4.43)
with dimensions

    k ≥ k1 ≥ k2 ≥ · · · ≥ km ≥ 0.      (4.44)

Then, we can form a chain of partitions,

    C/C1, C/C1/C2, ..., C/C1/C2/ · · · /Cm,      (4.45)

and C can be expressed as the direct-sum

    C = [C/C1] ⊕ [C1/C2] ⊕ · · · ⊕ [Cm−1/Cm] ⊕ Cm.      (4.46)
We now present another method for constructing long codes from a sequence of subcodes of a given short code. This method is known as the squaring construction [15]. Let C0 be a binary (n, k0) linear block code with minimum Hamming distance d0. Let C1, C2, ..., Cm be a sequence of subcodes of C0 such that

    C0 ⊇ C1 ⊇ C2 ⊇ · · · ⊇ Cm.      (4.47)
Section 4.5 The Squaring Construction of Codes 121
For 0 ≤ i ≤ m, let Gi, ki, and di be the generator matrix, the dimension, and the minimum distance of the subcode Ci. We form a chain of partitions as follows:

    C0/C1, C0/C1/C2, ..., C0/C1/ · · · /Cm.      (4.48)

For 0 ≤ i < m, let G_{i/i+1} denote the generator matrix for the coset representative space [Ci/Ci+1]. The rank of G_{i/i+1} is

    Rank(G_{i/i+1}) = Rank(Gi) − Rank(Gi+1) = ki − ki+1.      (4.49)

Without loss of generality, we assume that G0 ⊃ G1 ⊃ · · · ⊃ Gm. Then, for 0 ≤ i < m,

    G_{i/i+1} = Gi \ Gi+1.      (4.50)

One-level squaring construction is based on C0 and its subcode C1. Let a = (a0, a1, ..., a_{n−1}) and b = (b0, b1, ..., b_{n−1}) be two binary n-tuples, and let (a, b) denote the 2n-tuple (a0, a1, ..., a_{n−1}, b0, b1, ..., b_{n−1}). We form the following set of 2n-tuples:

    |C0/C1|² = {(a + x, b + x) : a, b ∈ C1, x ∈ [C0/C1]}.      (4.51)

Then, |C0/C1|² is a (2n, k0 + k1) linear block code with minimum Hamming distance

    D1 = min{2d0, d1}.      (4.52)
Let M1 and M2 be two matrices with the same number of columns. The matrix obtained by stacking M1 on top of M2 is denoted by

    [ M1
      M2 ].      (4.55)

For example,

    [ 1 1 1 1 ]      and      [ 1 1 1 1
                                0 0 1 1
                                0 1 0 1 ]

are the generator matrices of the zeroth- and first-order RM codes of length 4.
Higher-level squaring construction can be carried out recursively in a similar manner. For m ≥ 2, let U_{m−1} and V_{m−1} = |C1/C2/ · · · /Cm|^{2^{m−1}} denote the two codes obtained by (m − 1)-level squaring construction. The code obtained by m-level squaring construction is then the one-level squaring construction |U_{m−1}/V_{m−1}|² based on U_{m−1} and its subcode V_{m−1}.
Its generator matrix is formed from the matrices

    G = ⊗_{0 ≤ i < m} G_{i/i+1}.      (4.63)

In particular, the generator matrix of the rth-order RM code of length 2^m can be put in the form

    GRM(r, m) = [ GRM(r, m − 1)   GRM(r, m − 1)
                  0               GRM(r − 1, m − 1) ].      (4.65)
We define

    GRM(r/r − 1, m − 1) = GRM(r, m − 1) \ GRM(r − 1, m − 1),      (4.66)

so that GRM(r, m − 1) = GRM(r − 1, m − 1) ∪ GRM(r/r − 1, m − 1).      (4.67)

Then GRM(r, m) can be rearranged as

    GRM(r, m) = [ GRM(r − 1, m − 1)      0
                  GRM(r/r − 1, m − 1)    GRM(r/r − 1, m − 1)
                  0                      GRM(r − 1, m − 1) ].      (4.68)

This is exactly the generator matrix form of one-level squaring construction. Therefore, the rth-order RM code of length 2^m can be constructed from the rth-order and (r − 1)th-order RM codes of length 2^{m−1}; that is,

    RM(r, m) = |RM(r, m − 1)/RM(r − 1, m − 1)|².      (4.69)
Because

    RM(r, m − 1) = |RM(r, m − 2)/RM(r − 1, m − 2)|²

and

    RM(r − 1, m − 1) = |RM(r − 1, m − 2)/RM(r − 2, m − 2)|²,

we can construct RM(r, m) from RM(r, m − 2), RM(r − 1, m − 2), and RM(r − 2, m − 2) using two-level squaring construction; that is,

    RM(r, m) = |RM(r, m − 2)/RM(r − 1, m − 2)/RM(r − 2, m − 2)|^{2²}.      (4.70)
Repeating the preceding process, we find that for 1 ≤ µ ≤ r, we can express the RM(r, m) code as a µ-level squaring construction code. The number of minimum-weight codewords of RM(r, m) is

    A_{2^{m−r}} = 2^r ∏_{i=0}^{m−r−1} (2^{m−i} − 1)/(2^{m−r−i} − 1).      (4.72)

In fact, these minimum-weight codewords span (or generate) the code; that is, the linear combinations of these minimum-weight codewords produce all the codewords of the RM(r, m) code.
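The count (4.72) is easy to evaluate exactly, because the ratio of the two products is an integer (it is a Gaussian binomial coefficient). The following C sketch uses our own function name; the formula is the text's:

```c
#include <stdint.h>

/* Number of minimum-weight codewords of RM(r, m), following (4.72):
 * A = 2^r * prod_{i=0}^{m-r-1} (2^(m-i) - 1) / (2^(m-r-i) - 1).
 * The numerator product is divisible by the denominator product, so
 * exact integer arithmetic suffices for small m (no overflow checks). */
uint64_t rm_min_weight_count(int r, int m)
{
    uint64_t num = 1, den = 1;
    for (int i = 0; i <= m - r - 1; ++i) {
        num *= (1ULL << (m - i)) - 1;
        den *= (1ULL << (m - r - i)) - 1;
    }
    return (1ULL << r) * (num / den);
}
```

For RM(1, 3) this gives 14, agreeing with 2^{m+1} − 2 from (4.73), and for RM(2, 5) it gives 620, agreeing with the weight distribution tabulated below.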
The weight distribution of several subclasses of RM codes and all RM codes of lengths up to 512 have been enumerated [12, 18-21]. The first-order RM code, RM(1, m), has only three weights, 0, 2^{m−1}, and 2^m. The numbers of codewords of these weights are

    A0 = A_{2^m} = 1,   A_{2^{m−1}} = 2^{m+1} − 2.      (4.73)
Section 4.6 The (24, 12) Golay Code 125
A0 = A_{2^m} = 1, and the numbers of codewords of the intermediate weights are given by a product formula in [12]. For example, the second-order RM code RM(2, 5) of length 32 has the following weight distribution:

    i     0    8      12      16      20      24     32
    Ai    1    620    13888   36518   13888   620    1

This code is a self-dual code.
RM codes form a remarkable class of linear block codes. Their rich structural properties make them very easy to decode by either hard- or soft-decision decoding. Various soft-decision decoding algorithms for these codes have been devised, and some will be discussed in later chapters. Other classes of codes are more powerful than RM codes: for the same minimum distance, these codes have higher rates. However, the low decoding complexity of RM codes makes them very attractive in practical applications. In fact, in terms of both error performance and decoding complexity, RM codes often outperform their corresponding more powerful codes. The (m − 2)th-order RM code of length 2^m is actually the distance-4 extended Hamming code obtained by adding an overall parity bit to the Hamming code of length 2^m − 1.
Chapter 9
Reed-Muller Codes:
Weak Codes with Easy
Decoding
Length: n = 2^m
Information symbols: k = Σ_{i=0}^{r} (m choose i)
Minimum distance: d = 2^{m−r}
The code used was the Reed-Muller code of length 32 with 6 information bits, and it corrects 7 errors. Each dot of the transmitted picture was assigned one of 2^6 = 64 degrees of greyness, and these 6 information bits were then encoded into a word of length 32.
We introduce Reed-Muller codes by means of Boolean polynomials,
which we first discuss in some detail. To understand the decoding of Reed-
Muller codes, it is more convenient to work with finite geometries, where
code words become characteristic functions of flats.
A Boolean function of m variables is a map from Z_2^m to Z_2. To describe it, we list all 2^m combinations of values of the variables x0, x1, ..., x_{m−1}, and then to each combination, we assign a value 0 or 1. For example, for m = 3:

    x0    0 1 0 1 0 1 0 1
    x1    0 0 1 1 0 0 1 1
    x2    0 0 0 0 1 1 1 1
    f     0 1 1 0 1 1 1 0

The values of f form a binary word f = f0 f1 ... f_{2^m−1} of length 2^m:

    f(0, 0, ..., 0, 0) = f0,
    f(0, 0, ..., 0, 1) = f1,
    f(0, 0, ..., 1, 0) = f2,
    ...
    f(1, 1, ..., 1, 1) = f_{2^m−1}.

In general,

    fi = f(i_{m−1}, ..., i1, i0),

where i_{m−1} ... i1 i0 is the binary expansion of the index i.
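This indexing convention can be made concrete with a small C sketch; the bit packing (bit i of an unsigned holds fi), the fixed m = 3 signature, and the function names are our own illustrative choices:

```c
/* Pack the word f_0 f_1 ... f_7 of a Boolean function of three
 * variables into an unsigned, bit i = f_i, where the coordinates of
 * the point with index i are the bits of i (least significant = x0). */
typedef int (*boolfun3)(int x0, int x1, int x2);

unsigned word_of(boolfun3 f)
{
    unsigned w = 0;
    for (int i = 0; i < 8; ++i)
        w |= (unsigned)(f(i & 1, (i >> 1) & 1, (i >> 2) & 1) & 1) << i;
    return w;
}

/* A polynomial reproducing the f row of the truth table above,
 * i.e. the word 01101110 (packed as 0x76). */
int sample_f(int x0, int x1, int x2)
{
    return x0 ^ x1 ^ x2 ^ (x0 & x2) ^ (x1 & x2) ^ (x0 & x1 & x2);
}
```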
Examples

(1) There are two constant Boolean functions:

    1 = 11 ... 11   and   0 = 00 ... 00.

(2) Each variable xi is itself a Boolean function. The function x0 assigns to each combination of values its coordinate value x0. Thus, the value is 0 for all even-indexed positions and 1 for all odd ones:

    x0 = 0 1 0 1 0 1 0 ... 1.

This follows from the way we are writing the truth tables. For example, for m = 4, x1 = 0011 0011 ... 0011, and the full truth table is

    x0    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    x1    0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    x2    0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    x3    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

where n = 2^m.
9.2. BOOLEAN POLYNOMIALS

The logical sum and the logical product of two Boolean functions are performed coordinatewise; in particular,

    (f_{2^m−1} ... f1 f0)(g_{2^m−1} ... g1 g0) = (f_{2^m−1} g_{2^m−1}) ··· (f1 g1)(f0 g0).
Remarks

(1) Observe that the logical product fulfils

    ff = f.

Thus, no exponents higher than 1 are ever needed.

(2) There are other natural operations which, however, can be expressed by means of the logical sum and logical product. For example, the negation f̄ (which has value 1 precisely when f has value 0) is simply expressed as a sum with the constant function 1 = 11 ... 11:

    f̄ = 1 + f.

The operation "or" (f or g), whose value is 1 precisely when f or g (or both) have value 1, can be expressed as

    f or g = f + g + fg.
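On words packed as bit masks, the logical sum is XOR and the logical product is AND, so the derived operations of Remark (2) become one-liners. The packing and names below are illustrative, not from the text:

```c
/* Boolean-function words of length n = 2^m packed as bit masks
 * (here n = 8, i.e. m = 3). Sum = XOR, product = AND. */
enum { WORD_MASK = 0xFF };

unsigned bf_not(unsigned f)                   /* negation: 1 + f   */
{
    return ~f & WORD_MASK;
}

unsigned bf_or(unsigned f, unsigned g)        /* "or": f + g + fg  */
{
    return (f ^ g) ^ (f & g);
}
```

The identity (f ^ g) ^ (f & g) == f | g holds bit by bit, which is exactly the text's formula f or g = f + g + fg.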
(3) We have seen that 1 and each variable xi are Boolean functions. Other Boolean functions can be obtained by addition and multiplication. For example: xi xj, 1 + xi + xj, etc. These are called Boolean polynomials.

Examples

(1) The Boolean polynomial 1 + x0x1 of three indeterminates has degree 2. We have

    x0x1 = (01010101)(00110011) = 00010001.

Thus,

    1 + x0x1 = 11101110.

In four indeterminates, the same polynomial represents the word 1110111011101110.
(2) To find the word corresponding to a Boolean polynomial, we substitute the truth-table word of each indeterminate, see Example (2) in 9.1, and, further, we perform the required additions and multiplications.

Conversely, every binary word f = f0 f1 ... f_{2^m−1} can be translated into a Boolean polynomial as follows. First, observe that the last indeterminate x_{m−1} is the word

    x_{m−1} = 00 ... 00 11 ... 11,

whose first half is the constant function 0 and the other half is the constant function 1 (now both considered in m − 1 indeterminates, i.e., in the length 2^m/2 = 2^{m−1}). As a consequence, we see that for each Boolean function f(x0, x1, ..., x_{m−1}), the first half of the corresponding word f is f(x0, ..., x_{m−2}, 0) and the second half is f(x0, ..., x_{m−2}, 1). This yields the identity

    f(x0, ..., x_{m−2}, x_{m−1}) = f(x0, ..., x_{m−2}, 0) + [f(x0, ..., x_{m−2}, 0) + f(x0, ..., x_{m−2}, 1)] x_{m−1}.

Since x_{m−1} takes only the values 0 and 1, it is sufficient to verify that the identity holds for both of them. For x_{m−1} = 0, the right-hand side is f(x0, ..., x_{m−2}, 0); for x_{m−1} = 1, it is

    f(x0, ..., x_{m−2}, 1) = f(x0, ..., x_{m−2}, 0) + [f(x0, ..., x_{m−2}, 0) + f(x0, ..., x_{m−2}, 1)].
Example. (3) Let us translate f = 01101110 into a Boolean polynomial (of three variables). We apply the preceding proposition:

    f = 0110 + [0110 + 1110] x2
      = 0110 + 1000 x2.

Next we apply the same proposition to the two words 0110 and 1000 (of two indeterminates):

    f = (0 + [0 + 1] x0) + (1 + [1 + 1] x0) x1 + (1 + [1 + 0] x0) x2 + (1 + [1 + 0] x0) x1x2
      = x0 + x1 + (1 + x0) x2 + (1 + x0) x1x2
      = x0 + x1 + x2 + x0x2 + x1x2 + x0x1x2.
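The recursive translation just carried out is the binary Möbius transform of the word. A C sketch follows; the in-place bit manipulation and the function name are our own, while the halving identity is the text's:

```c
/* Word-to-polynomial translation by repeated use of the proposition
 * f = f(...,0) + [f(...,0) + f(...,1)] x_{m-1}. word holds f_0..f_{n-1}
 * in bits 0..n-1; bit s of the result is the coefficient of the
 * monomial formed by the variables whose bits are set in s. */
unsigned anf(unsigned word, int m)
{
    int n = 1 << m;
    for (int step = 1; step < n; step <<= 1)   /* one pass per variable */
        for (int i = 0; i < n; ++i)
            if (i & step)                      /* upper half += lower half */
                word ^= ((word >> (i - step)) & 1u) << i;
    return word;
}
```

For the example word 01101110 (packed as 0x76), anf returns coefficients for x0, x1, x2, x0x2, x1x2, and x0x1x2 (bits 1, 2, 4, 5, 6, 7, i.e. 0xF6), matching the expansion found above.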
It follows that every Boolean function is a sum of some of the following Boolean polynomials:

    1,
    xi (i = 0, 1, ..., m − 1),
    xi xj (i, j = 0, 1, ..., m − 1 and i ≠ j),
    ...
    x0 x1 ... x_{m−1}.
These polynomials form a basis of the linear space of all Boolean functions; their number is

    1 + m + (m choose 2) + ··· + (m choose m) = 2^m.      (9.2.1)

[The last equation is easily derived from the binomial theorem applied to (1 + 1)^m.]

Examples

(1) R(0, m) consists of the polynomials of degree at most 0, i.e., of 0 and 1. Thus, R(0, m) is the repetition code of length 2^m.
For example, R(1, 3) has the following generator matrix:

          1    [ 1 1 1 1 1 1 1 1
    G =  x0      0 1 0 1 0 1 0 1
         x1      0 0 1 1 0 0 1 1
         x2      0 0 0 0 1 1 1 1 ]

We will see that R(1, 3) is the extended Hamming code (see 8.5). R(1, 4) is a (16, 5)-code and R(1, 5) is a (32, 6)-code, which was used by the 1969 Mariner, as mentioned in the introduction.
(4) R(m − 1, m) contains all of the basis polynomials in Theorem 9.2 except the last one, x0 x1 ... x_{m−1}. It is easy to see that R(2, 3) is just the even-parity code of length 8. R(2, 4) is a (16, 11)-code; we will see that R(2, 4) is the extended Hamming code. Since the even-parity code of length 2^m also has dimension 2^m − 1, R(m − 1, m) is the even-parity code of length 2^m.
The dimension of R(r, m) is

    k = Σ_{i=0}^{r} (m choose i).

Since (m choose s) = (m choose m − s), we obtain

    dim R(r, m) + dim R(m − r − 1, m) = Σ_{i=0}^{r} (m choose i) + Σ_{i=0}^{m−r−1} (m choose i) = Σ_{i=0}^{m} (m choose i) = 2^m.

We conclude that the linear spaces R(r, m) and R(m − r − 1, m) have complementary dimensions. To show that R(m − r − 1, m) is the dual code, it remains to verify that any code word f of R(m − r − 1, m), of degree p ≤ m − r − 1, is orthogonal to each code word g of R(r, m), of degree q ≤ r. Since the scalar product f · g is the sum of the coordinates of the logical product fg, we just have to show that fg, represented as a binary word, has an even Hamming weight. The degree of the polynomial fg is at most p + q ≤ m − r − 1 + r = m − 1 (see Exercise 9A). Thus, fg is a code word of R(m − 1, m), which by Example (4) above implies that fg has an even Hamming weight. □
Example. (5) R(m − 2, m) is the extended Hamming code of length 2^m. Indeed,

    R(m − 2, m)^⊥ = R(1, m),

and, thus, R(m − 2, m) has the following parity check matrix:

               1       [ 1 1 1 1 ··· 1 1 1 1
    H =       x0         0 1 0 1 ··· 0 1 0 1
              x1         0 0 1 1 ··· 0 0 1 1
              ...
              x_{m−1}    0 0 0 0 ··· 1 1 1 1 ]

We can add the first row to all the other rows, and then interchange it with the last row. This yields another parity check matrix:

          [ 1 0 1 0 ··· 1 0 1 0
            1 1 0 0 ··· 1 1 0 0
    H' =    ...
            1 1 1 1 ··· 0 0 0 0
            1 1 1 1 ··· 1 1 1 1 ]

By deleting the last column and the last row from the new matrix, we obtain an m by 2^m − 1 matrix H0 with pairwise distinct, nonzero columns, i.e., a parity check matrix of the Hamming code of length 2^m − 1. Consequently,

    [ H0              0
      1 1 1 1 ··· 1 1 1 1 ]

is a parity check matrix of the extended Hamming code.
9.4. GEOMETRIC INTERPRETATION: 3-DIMENSIONAL CASE 147
Remarks

(1) Encoding of the Reed-Muller codes can be performed in the usual way by multiplying the information word by the generator matrix [see Remark (4) of 8.1]. In other words, the information bits become the coefficients of the corresponding Boolean polynomial. For example, in R(1, m), we encode the m + 1 information bits u1, u2, ..., u_{m+1} as follows:

                             [   1
                                x0
    [u1, u2, ..., u_{m+1}] ·    x1       =  u1·1 + u2·x0 + ··· + u_{m+1}·x_{m−1}.
                                ...
                              x_{m−1} ]
(2) The minimum distance of R(r, m) is d = 2^{m−r}. In fact, since R(r, m) is contained in the even-parity code R(m − 1, m), we know that d is even. In 9.6, we will see that R(r, m) can correct 2^{m−r−1} − 1 errors. Thus, by Proposition 4.6, d ≥ 2(2^{m−r−1} − 1) + 1 = 2^{m−r} − 1. On the other hand, the code word x0 x1 ... x_{r−1} has Hamming weight 2^{m−r}, and we conclude that

    2^{m−r} − 1 ≤ d ≤ 2^{m−r}.

Since d is even, this proves d = 2^{m−r}.
(3) In some respects, it is more suitable to work with the punctured Reed-Muller codes (see 8.6). This means that in R(r, m), we delete the last symbol from each code word. The resulting code has length 2^m − 1, the number of information symbols can be shown to be unchanged, and the punctured code corrects the same number of errors as the original code can. We list all punctured Reed-Muller codes of lengths 7, ..., 127 in Appendix B. For example, the punctured R(0, m) is the repetition code of length 2^m − 1, and the punctured R(m − 2, m) is the Hamming code of that length.
To understand the decoding of Reed-Muller codes, it is convenient to work with flats (or affine subspaces). We first present the case of the three-dimensional geometry, which corresponds to the codes of length 8, and then the general geometry of Z_2^m.

Recall that the conventional three-dimensional Euclidean geometry operates within the linear space R^3, whose points (or vectors) are triples of real numbers. Lines in R^3 are described as follows:

    a + t b   (a, b in R^3, b ≠ 0).

Here t denotes a real parameter; thus, the line a + t b is the set of all points {a + t b | t ∈ R}. Further, we have planes in R^3.
In contrast to the real case above, the space Z_2^3 has precisely eight points. We can enumerate them by the corresponding binary expansions of their indices: p0 = 000, p1 = 001, etc.; see Figure 5. Lines are described as

    a + t b   (a, b in Z_2^3, b ≠ 0),

where t is a binary parameter, t = 0, 1. Thus, the line has precisely two points: a and a + b. Conversely, every pair a, a' of points constitutes a line, viz.,

    a + t(a' − a).

There are (8 choose 2) = 28 lines, since a line is just a choice of an (unordered) pair from among the eight points. Planes are described as a + t1 b1 + t2 b2, where t1 and t2 are binary parameters and b1, b2 are linearly independent. The plane, then, consists of the four points a, a + b1, a + b2, and a + b1 + b2. The fourteen planes of Z_2^3 are listed in Figure 6.
    {p1, p3, p5, p7}    10101010    x0
    {p2, p3, p6, p7}    11001100    x1
    {p4, p5, p6, p7}    11110000    x2
    {p0, p2, p4, p6}    01010101    1 + x0
    {p0, p1, p4, p5}    00110011    1 + x1
    {p0, p1, p2, p3}    00001111    1 + x2
    {p1, p2, p5, p6}    01100110    x0 + x1
    {p1, p3, p4, p6}    01011010    x0 + x2
    {p2, p3, p4, p5}    00111100    x1 + x2
    {p1, p2, p4, p7}    10010110    x0 + x1 + x2
    {p0, p3, p4, p7}    10011001    1 + x0 + x1
    {p0, p2, p5, p7}    10100101    1 + x0 + x2
    {p0, p1, p6, p7}    11000011    1 + x1 + x2
    {p0, p3, p5, p6}    01101001    1 + x0 + x1 + x2
Observe that the planes passing through the origin (a = 0) are precisely those which can be described by a homogeneous equation

    h0 x0 + h1 x1 + h2 x2 = 0,

and every plane is described by an equation

    h0 x0 + h1 x1 + h2 x2 = c.

(Given a plane a + t1 b1 + t2 b2, the parallel plane t1 b1 + t2 b2 through the origin satisfies the homogeneous equation; the constant c is then determined by the point a = a0 a1 a2.) Every line is described by a pair of equations

    h0 x0 + h1 x1 + h2 x2 = c,
    h0' x0 + h1' x1 + h2' x2 = c'.

This follows from the fact that every line a + t b is an intersection of two planes: choose a basis b, d, d' of the space Z_2^3 and consider the planes a + t b + s d and a + t b + s'd'.
Lines and planes are examples of flats (also called affine subspaces). A flat in the space Z_2^m is a coset (6.2) of a linear subspace of the space Z_2^m. That is, a flat has the form

    a + K = {a + b | b is a point of K},

where K is a linear subspace. If the dimension of K is s, we call the coset an s-flat. Thus, lines are precisely the 1-flats, and planes are the 2-flats. For each point pi, we have a 0-flat {pi}, and there is precisely one 3-flat, viz., the whole space Z_2^3.

Every flat L can be described by the binary word f_L = f_{2^m−1} ... f1 f0 defined by

    fi = 1 if the point pi lies in L,   fi = 0 otherwise.

The word f_L (or the corresponding Boolean function of three variables) is called the characteristic function of the flat L. (For 1-flats and 2-flats, see the above figures.)

Given flats L and L', their intersection L ∩ L' is obviously characterized by the logical product f_L f_{L'}. For example, the first two planes in Figure 6 intersect in the line {p3, p7}. The logical product of their characteristic functions is

    x0 x1 = 10001000,
which is the characteristic function of that line.

In the general case, the points of Z_2^m are

    p0 = 000...00,
    p1 = 000...01,
    p2 = 000...10,
    ...
    p_{2^m−1} = 111...11.

An s-flat, i.e., a coset of an s-dimensional subspace, is also denoted by

    a + t1 b1 + ··· + ts bs,

where b1, ..., bs are linearly independent and t1, ..., ts are binary parameters. Every s-flat can also be described by a system of linear equations

    H x^tr = c^tr,   where c^tr = H a^tr.
In particular, every hyperplane [(m − 1)-flat] is described by a single equation

    h0 x0 + h1 x1 + ··· + h_{m−1} x_{m−1} = c.

Examples

(1) Every 0-flat is a one-point set. Thus, there are 2^m different 0-flats, viz., {p0}, ..., {p_{2^m−1}}.

(2) Every 1-flat, or line, is a pair of points

    a + t b = {a, a + b},

and, conversely, every pair of points is a line.

(3) Let Pi denote the flat described by the equation xi = 1. That is, Pi is the set of all points pk which have a 1 in the ith position. For example,

    P0 = {p1, p3, p5, ..., p_{2^m−1}}.

Each Pi is a hyperplane. In fact, the point p_{2^i} has precisely one nonzero coordinate, the ith one, and

    Pi = p_{2^i} + K,

where K is the subspace of all points with ith coordinate 0.

(4) For i ≠ j, the intersection Pi ∩ Pj (i.e., the set of all points with a 1 in the ith and jth positions) is an (m − 2)-flat. In fact, for the point a = p_{2^i + 2^j} (with 1's just on the ith and jth positions), we have

    Pi ∩ Pj = a + K,

where K is the subspace of all points with ith and jth coordinates 0.
The characteristic function f_L of a flat L assigns to each point pi the value 1 if pi lies in L, and 0 otherwise. Shortly:

    x0 x1 ... x_{m−1} lies in L ⟺ f_L(x0, x1, ..., x_{m−1}) = 1.
Examples

(5) The only m-flat, i.e., the whole space Z_2^m, has the characteristic function 1.

(6) If a hyperplane L is described by the equation h0 x0 + ··· + h_{m−1} x_{m−1} = c, then its characteristic function is

    f_L(x0, ..., x_{m−1}) = h0 x0 + h1 x1 + ··· + h_{m−1} x_{m−1} + c + 1.

(7) For two flats L and L', the function f_L f_{L'} is the characteristic function of the intersection L ∩ L'. For example, the polynomial xi xj is the characteristic function of the (m − 2)-flat which is the intersection of the hyperplanes Pi and Pj. More generally, the Boolean polynomial

    x_{i1} x_{i2} ··· x_{ir}

is the characteristic function of an (m − r)-flat, the intersection of the hyperplanes P_{i1}, ..., P_{ir}.
Such an (m − r)-flat is described by the system of equations

    Σ_{j=0}^{m−1} h_{ij} xj = ci,   for i = 1, 2, ..., m − r.

Assuming now that fewer than 2^{m−r−1} bits of a received word are corrupted, we want to determine which positions are in error. Thus, the first step of decoding the word w is to determine, for each (r + 1)-flat L, whether L is odd or even. This is performed by computing the scalar product of w with the characteristic function f_L:

    L is even ⟺ w · f_L = 0.
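The scalar product over F_2 reduces to the parity of the bitwise AND of the two words. A C sketch, using our bit-mask packing (bit i = symbol at point pi):

```c
/* First decoding step: a flat L with characteristic word fl is odd
 * with respect to the received word w iff the scalar product w . f_L
 * over F_2 is 1, i.e. iff w AND fl has odd Hamming weight. */
int flat_is_odd(unsigned w, unsigned fl)
{
    unsigned x = w & fl;
    int parity = 0;
    while (x) {                /* parity of the number of set bits */
        parity ^= 1;
        x &= x - 1;            /* clear lowest set bit */
    }
    return parity;
}
```

With the received word 11101010 of the worked example later in the chapter (packed as 0xEA) and the plane {p1, p3, p5, p7} (word 10101010, i.e. 0xAA), the product has even weight, so the plane is even.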
Consider a linear subspace K of dimension s, and the linear subspaces containing it of dimension s + 1. Every (s + 1)-dimensional space K̂ containing K has the form K̂ = K + t b for some point b outside of K, where

    K + t b = {a + t b | a ∈ K and t = 0, 1}.

This follows immediately from the fact that every basis of K can be extended (by a single vector) to a basis of K̂; see Theorem 7.4.

Next observe that for two points b, b' outside of K, the linear subspaces K + t b and K + t b' coincide if and only if b and b' lie in the same coset modulo K (6.2). In fact, if K + t b = K + t b', then b can be expressed as a + t b', where a lies in K; thus, t ≠ 0 and we get

    b − b' = a ∈ K.

By Proposition 6.2, b and b' lie in the same coset. Conversely, if b − b' = a is a point of K, then b = a + b' lies in K + t b', and b' = −a + b lies in K + t b; thus, K + t b = K + t b'. By Remark 6.2, there are 2^m / 2^s cosets modulo K. One of them is K itself, and all other cosets contain only points outside of K. Consequently, there are 2^{m−s} − 1 different spaces K + t b for b outside of K.
Analogously, consider an s-flat L = a + K (dim K = s). Every (s + 1)-flat containing L has the form a + K̂, where K̂ = K + t(b − a) for some point b outside of L; indeed, b = a + [0 + (b − a)] lies in a + K̂.
Proposition. If a code word of R(r, m) is sent and fewer than 2^{m−r−1} bits are corrupted, then for each s-flat L, 0 ≤ s ≤ r, the majority of (s + 1)-flats containing L have the same parity of errors as L. [Indeed, L is contained in 2^{m−s} − 1 different (s + 1)-flats, and

    2^{m−s} − 1 ≥ 2^{m−r} − 1 > 2t

for the number t of errors.]

First step: Receiving a word w, call each (r + 1)-flat L odd if the scalar product of its characteristic function f_L with w is 1; otherwise call L even. That is, for (r + 1)-flats L:

    L is odd if w · f_L = 1, and even if w · f_L = 0.
Example. Suppose the code R(1, 3) is used, and we receive the word

    w = 11101010.

The first step is to decide which planes (see Figure 6 in 9.4) are odd and which are even. For example, the plane L = {p1, p3, p5, p7} is even since

    w · f_L = 11101010 · 10101010 = 0.

See Figure 7. Next we must decide, for each line L, whether L is odd or even. For example, the line {p0, p1} is contained in three planes (see Figure 6 of 9.4):

    {p0, p1, p4, p5} - even,
    {p0, p1, p2, p3} - even,
    {p0, p1, p6, p7} - odd.

The majority of these planes are even, so we conclude that the line {p0, p1} is even.
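The vote just taken can be sketched in C; the array-of-votes interface is our own illustrative choice:

```c
/* Majority vote of the decoding step: an s-flat L inherits the parity
 * (0 = even, 1 = odd) held by the majority of the (s+1)-flats that
 * contain it. */
int majority_parity(const int *parities, int count)
{
    int odd = 0;
    for (int i = 0; i < count; ++i)
        odd += parities[i] ? 1 : 0;
    return 2 * odd > count;    /* 1 iff "odd" wins the majority */
}
```

For the line {p0, p1} above, the votes (even, even, odd) give an even majority.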
For example, the code R(1, 5) is a (32, 6)-code [see Example (1) in 9.3], which corrects 2^{5−1} − 1 = 7 errors, as mentioned in the introduction.
246 CHAPTER 12. BCH CODES
1 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0 0 0 000.
moreover, it is cyclic.
12.5. GENERALIZED REED-MULLER CODES 247
    w2(3) = 2.

The proof, which is quite technical, is omitted. The reader can find it in the monograph of MacWilliams and Sloane (1981).

By the rth-order generalized Reed-Muller code is meant the cyclic code of all polynomials with β^i as zeros for all i = 1, 2, ..., n such that wq(i) < m − r.

Examples

(1) Take q = 2, m = 4, and r = 2. All the numbers i = 1, 2, ..., 2^4 − 1 with w2(i) < 2 are 1, 2, 4, and 8. Thus, we obtain a cyclic binary code.

Since the latter code has minimum distance d ≥ 2t + 1 (Theorem 12.3), it follows that the former code has this property too. Observe that the first number i with w2(i) ≥ m − r is i = 2^{m−r} − 1. Thus, each i = 1, 2, ..., 2^{m−r} − 2 satisfies w2(i) < m − r. It follows that every code word w(x) of the generalized Reed-Muller code has β^i as a zero for i = 1, 2, ..., 2^{m−r} − 2 = 2t.
Length: n = 2^m
    (…) = (r(x) − r(a)) / (x − a).

(The right-hand side is well-defined: since the polynomial r(x) − r(a) has a as a zero, it is divisible by x − a; see 11.1.)
Foundations of Coding: Theory and Applications of Error-Correcting Codes
with an Introduction to Cryptography and Information Theory
by Jiří Adámek
Copyright © 1991 John Wiley & Sons, Inc.
Appendix B

BCH Codes and Reed-Muller Codes

We list parameters of all BCH codes (Chapter 12) and all punctured Reed-Muller codes (see 9.3) of lengths n = 7, 15, 31, 63, 127. The number of information symbols is denoted by k.
    n     d     k (BCH)    k (RM)    r
    7     3     4          4         1
          7     1          1         0
    15    3     11         11        2
          5     7          -         -
          7     5          5         1
          15    1          1         0
    31    3     26         26        3
          5     21         -         -
          7     16         16        2
          11    11         -         -
          15    6          6         1
          31    1          1         0
    63    3     57         57        4
          5     51         -         -
          7     45         42        3
          9     39         -         -
          11    36         -         -
          13    30         -         -
          15    24         22        2
          21    18         -         -
          23    16         -         -
          27    10         -         -
          31    7          7         1
          63    1          1         0
        [  0  1  1  1  1  1
           1  0  1 −1 −1  1
           1  1  0  1 −1 −1
    A =    1 −1  1  0  1 −1
           1 −1 −1  1  0  1
           1  1 −1 −1  1  0  ].
In Exercise 61, we scaled the first column of A by −1 to obtain a [12, 6, 6] self-dual code G̃12 equivalent to G12. By that exercise, if we puncture G̃12 in any coordinate and then extend in the same coordinate, we get G̃12 back. In Chapter 10 we will see that these punctured codes are all equivalent to each other and to G11. As a result, any [11, 6, 5] code equivalent to one obtained by puncturing G̃12 in any coordinate will be called the ternary Golay code; any [12, 6, 6] code equivalent to G12 (or G̃12) will be called the extended ternary Golay code.
In this section, we introduce the binary Reed–Muller codes. Nonbinary generalized Reed–
Muller codes will be examined in Section 13.2.3. The binary codes were first constructed
and explored by Muller [241] in 1954, and a majority logic decoding algorithm for them was
described by Reed [293] also in 1954. Although their minimum distance is relatively small,
they are of practical importance because of the ease with which they can be implemented
and decoded. They are of mathematical interest because of their connection with finite affine
and projective geometries; see [4, 5]. These codes can be defined in several different ways.
Here we choose a recursive definition based on the (u | u + v) construction.
Let m be a positive integer and r a nonnegative integer with r ≤ m. The binary codes
we construct will have length 2m . For each length there will be m + 1 linear codes, denoted
R(r, m) and called the r th order Reed–Muller, or RM, code of length 2m . The codes R(0, m)
and R(m, m) are trivial codes: the 0th order RM code R(0, m) is the binary repetition code of length 2m with basis {1}, and the mth order RM code R(m, m) is the entire space F2^{2^m}.
For 1 ≤ r < m, define
R(r, m) = {(u, u + v) | u ∈ R(r, m − 1),v ∈ R(r − 1, m − 1)}. (1.7)
Let G(0, m) = [11 · · · 1] and G(m, m) = I2m . From the above description, these are
generator matrices for R(0, m) and R(m, m), respectively. For 1 ≤ r < m, using (1.5),
a generator matrix G(r, m) for R(r, m) is
    G(r, m) = [ G(r, m − 1)    G(r, m − 1)
                O              G(r − 1, m − 1) ].
We illustrate this construction by producing the generator matrices for R(r, m) with
1 ≤ r < m ≤ 3:
    G(1, 2) = [ 1 0 1 0                G(1, 3) = [ 1 0 1 0 1 0 1 0
                0 1 0 1 ,                          0 1 0 1 0 1 0 1
                0 0 1 1 ]                          0 0 1 1 0 0 1 1
                                                   0 0 0 0 1 1 1 1 ] ,   and

    G(2, 3) = [ 1 0 0 0 1 0 0 0
                0 1 0 0 0 1 0 0
                0 0 1 0 0 0 1 0
                0 0 0 1 0 0 0 1
                0 0 0 0 1 0 1 0
                0 0 0 0 0 1 0 1
                0 0 0 0 0 0 1 1 ] .
From these matrices, notice that R(1, 2) and R(2, 3) are both the set of all even weight
vectors in F42 and F82 , respectively. Notice also that R(1, 3) is an [8, 4, 4] self-dual code,
which must be H 3 by Exercise 56.
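The recursion behind these matrices also gives the dimensions directly: G(r, m) stacks a repeated copy of G(r, m − 1) over G(r − 1, m − 1), so the dimensions add. A C sketch of this count (function name ours, the recursion is Theorem 1.10.1(ii)):

```c
/* Dimension of R(r, m) read off the (u | u + v) construction:
 * k(r, m) = k(r, m-1) + k(r-1, m-1), with the trivial codes as
 * base cases. */
int rm_dim(int r, int m)
{
    if (r <= 0)
        return 1;              /* R(0, m): repetition code, basis {1} */
    if (r >= m)
        return 1 << m;         /* R(m, m): the whole space */
    return rm_dim(r, m - 1) + rm_dim(r - 1, m - 1);
}
```

rm_dim(1, 2) and rm_dim(2, 3) return 3 and 7, the row counts of G(1, 2) and G(2, 3) above.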
The dimension, minimum weight, and duals of the binary Reed–Muller codes can be
computed directly from their definitions.
Theorem 1.10.1 Let r be an integer with 0 ≤ r ≤ m. Then the following hold:
(i) R(i, m) ⊆ R( j, m), if 0 ≤ i ≤ j ≤ m.
(ii) The dimension of R(r, m) equals
    (m choose 0) + (m choose 1) + ··· + (m choose r).
(iii) The minimum weight of R(r, m) equals 2m−r .
(iv) R(m, m)⊥ = {0}, and if 0 ≤ r < m, then R(r, m)⊥ = R(m − r − 1, m).
So (i) follows by induction if 0 < i. If i = 0, we only need to show that the all-one vector
of length 2m is in R( j, m) for j < m. Inductively assume the all-one vector of length 2m−1
is in R( j, m − 1). Then by definition (1.7), we see that the all-one vector of length 2m is in
R( j, m) as one choice for u is 1 and one choice for v is 0.
For (ii) the result is true for r = m as R(m, m) = F2^{2^m} and

    (m choose 0) + (m choose 1) + ··· + (m choose m) = 2m .
It is also true for m = 1 by inspection. Now assume that R(i, m − 1) has dimension
    (m−1 choose 0) + (m−1 choose 1) + ··· + (m−1 choose i)   for all 0 ≤ i < m.
By the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has dimension the sum of the
dimensions of R(r, m − 1) and R(r − 1, m − 1), that is,
    (m−1 choose 0) + (m−1 choose 1) + ··· + (m−1 choose r) + (m−1 choose 0) + (m−1 choose 1) + ··· + (m−1 choose r−1).

The result follows by the elementary properties of binomial coefficients:

    (m−1 choose 0) = (m choose 0)   and   (m−1 choose i−1) + (m−1 choose i) = (m choose i).
Part (iii) is again valid for m = 1 by inspection and for both r = 0 and r = m as R(0, m) is the binary repetition code of length 2m and R(m, m) = F2^{2^m}. Assume that R(i, m − 1)
has minimum weight 2m−1−i for all 0 ≤ i < m. If 0 < r < m, then by definition (1.7) and
the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has minimum weight min{2 ·
2m−1−r , 2m−1−(r −1) } = 2m−r .
To prove (iv), we first note that R(m, m)⊥ is {0} since R(m, m) = F2^{2^m}. So if we define R(−1, m) = {0}, then R(−1, m)⊥ = R(m − (−1) − 1, m) for all m > 0. By direct computation, R(r, m)⊥ = R(m − r − 1, m) for all r with −1 ≤ r ≤ m when m = 1. Assume inductively
that if −1 ≤ i ≤ m − 1, then R(i, m − 1)⊥ = R((m − 1) − i − 1, m − 1). Let 0 ≤ r <
m. To prove R(r, m)⊥ = R(m − r − 1, m), it suffices to show that R(m − r − 1, m) ⊆
R(r, m)⊥ as dim R(r, m) + dim R(m − r − 1, m) = 2m by (ii). Notice that with the defini-
tion of R(−1, m), (1.7) extends to the case r = 0. Let x = (a, a + b) ∈ R(m − r − 1, m)
where a ∈ R(m − r − 1, m − 1) and b ∈ R(m − r − 2, m − 1), and let y = (u, u + v) ∈
R(r, m) where u ∈ R(r, m − 1) and v ∈ R(r − 1, m − 1). Then x · y = 2a · u + a · v + b ·
u + b · v = a · v + b · u + b · v. Each term is 0 as follows. As a ∈ R(m − r − 1, m − 1) = R(r − 1, m − 1)⊥ , a · v = 0. As b ∈ R(m − r − 2, m − 1) = R(r, m − 1)⊥ , b · u = 0; and since v ∈ R(r − 1, m − 1) ⊆ R(r, m − 1), also b · v = 0. Hence x · y = 0.
We make a few observations based on this theorem. First, since R(0, m) is the length
2m repetition code, R(m − 1, m) = R(0, m)⊥ is the code of all even weight vectors in
m
F22 . We had previously observed this about R(1, 2) and R(2, 3). Second, if m is odd and
r = (m − 1)/2 we see from parts (iii) and (iv) that R(r, m) = R((m − 1)/2, m) is self-dual
with minimum weight 2(m−1)/2 . Again we had observed this about R(1, 3). In the exercises,
you will also verify the general result that puncturing R(1, m) and then taking the subcode
of even weight vectors produces the simplex code S m of length 2m − 1.
Exercise 62 In this exercise we produce another generator matrix G′(1, m) for R(1, m). (The hat and prime marks below distinguish the two matrices of the exercise; the scan dropped them.) Define

    Ĝ(1, 1) = [ 1 1
                0 1 ].

For m ≥ 2, recursively define

    Ĝ(1, m) = [ Ĝ(1, m − 1)    Ĝ(1, m − 1)
                00 ··· 0        11 ··· 1 ],

and define G′(1, m) to be the matrix obtained from Ĝ(1, m) by removing the bottom row and placing it as row two in the matrix, moving the rows below down.
(a) Show that Ĝ(1, 1) is a generator matrix for R(1, 1).
(b) Find the matrices Ĝ(1, 2), G′(1, 2), Ĝ(1, 3), and G′(1, 3).
(c) What do you notice about the columns of the matrices obtained from G′(1, 2) and G′(1, 3) by deleting the first row and the first column?
(d) Show using induction, part (a), and the definition (1.7) that Ĝ(1, m) is a generator matrix for R(1, m).
(e) Formulate a generalization of part (c) that applies to the matrix obtained from G′(1, m) by deleting the first row and the first column. Prove your generalization is correct.
(f) Show that the code generated by the matrix obtained from G′(1, m) by deleting the first row and the first column is the simplex code S m .
(g) Show that the code R(m − 2, m) is the extended binary Hamming code H m .
(g) Show that the code R(m − 2, m) is the extended binary Hamming code H m .
Notice that this problem shows that the extended binary Hamming codes and their duals
are Reed–Muller codes.
Since the inception of coding theory, codes have been used in many diverse ways; in
addition to providing reliability in communication channels and computers, they give high
fidelity on compact disc recordings, and they have also permitted successful transmission
of pictures from outer space. New uses constantly appear. As a primary application of
codes is to store or transmit data, we introduce the process of encoding and decoding a
message.
1.11 Encoding, decoding, and Shannon's Theorem
1.11.1 Encoding
Let C be an [n, k] linear code over the field Fq with generator matrix G. This code has q k
codewords which will be in one-to-one correspondence with q k messages. The simplest way
to view these messages is as k-tuples x in Fqk . The most common way to encode the message
x is as the codeword c = xG. If G is in standard form, the first k coordinates of the codeword
c are the information symbols x; the remaining n − k symbols are the parity check symbols,
that is, the redundancy added to x in order to help recover x if errors occur. The generator
matrix G may not be in standard form. If, however, there exist column indices i 1 , i 2 , . . . , i k
such that the k × k matrix consisting of these k columns of G is the k × k identity matrix,
then the message is found in the k coordinates i 1 , i 2 , . . . , i k of the codeword scrambled but
otherwise unchanged; that is, the message symbol x j is in component i j of the codeword.
If this occurs, we say that the encoder is systematic. If G is replaced by another generator
matrix, the encoding of x will, of course, be different. By row reduction, one could always
choose a generator matrix so that the encoder is systematic. Furthermore, if we are willing
to replace the code with a permutation equivalent one, by Theorem 1.6.2, we can choose a
code with generator matrix in standard form, and therefore the first k bits of the codeword
make up the message.
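The encoding c = xG is a sum of generator rows over F_2, which is compact to sketch in C. Packing each row of G as a bit mask (bit j = column j) is our own convention:

```c
/* Encoding c = xG over F_2: the codeword is the XOR of the rows of G
 * selected by the nonzero message bits. rows[i] holds row i of G as
 * a bit mask; x[0..k-1] is the message. */
unsigned encode(const unsigned *rows, int k, const int *x)
{
    unsigned c = 0;
    for (int i = 0; i < k; ++i)
        if (x[i])
            c ^= rows[i];
    return c;
}
```

With the [6, 3, 3] generator of Example 1.11.1 packed as {0x29, 0x1A, 0x34} (bit 0 = first column), the message (1, 1, 0) encodes to 0x33, i.e. the codeword 110011.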
The method just described shows how to encode a message x using the generator matrix of
the code C. There is a second way to encode using the parity check matrix H . This is easiest to
do when G is in standard form [Ik | A]. In this case H = [−AT | In−k ] by Theorem 1.2.1.
Suppose that x = x1 · · · xk is to be encoded as the codeword c = c1 · · · cn . As G is in
standard form, c1 · · · ck = x1 · · · xk . So we need to determine the n − k parity check symbols
(redundancy symbols) ck+1 · · · cn . As 0 = H cT = [−AT | In−k ]cT , AT xT = [ck+1 · · · cn ]T .
One can generalize this when G is a systematic encoder.
Example 1.11.1 Let C be the [6, 3, 3] binary code with generator and parity check matrices
        [ 1 0 0 1 0 1 ]               [ 1 1 0 1 0 0 ]
    G = [ 0 1 0 1 1 0 ]   and   H =   [ 0 1 1 0 1 0 ] ,
        [ 0 0 1 0 1 1 ]               [ 1 0 1 0 0 1 ]

so that
c = xG = (x1 , x2 , x3 , x1 + x2 , x2 + x3 , x1 + x3 ). (1.8)
Alternatively, the parity check equations 0 = H cT read:

    0 = c1 + c2 + c4 ,
    0 = c2 + c3 + c5 ,
    0 = c1 + c3 + c6 .
As G is in standard form, c1 c2 c3 = x1 x2 x3 , and solving this system clearly gives the same
codeword as in (1.8).
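The check-equation route can be sketched directly in C for this [6, 3, 3] code. The output packing (bit j − 1 = symbol c_j) and the function name are our own conventions:

```c
/* Parity-check encoding for the [6, 3, 3] code of Example 1.11.1:
 * the information symbols pass through, and the three check equations
 * 0 = c1+c2+c4, 0 = c2+c3+c5, 0 = c1+c3+c6 determine the redundancy. */
unsigned encode_by_checks(const int x[3])
{
    int c4 = x[0] ^ x[1];      /* from 0 = c1 + c2 + c4 */
    int c5 = x[1] ^ x[2];      /* from 0 = c2 + c3 + c5 */
    int c6 = x[0] ^ x[2];      /* from 0 = c1 + c3 + c6 */
    return (unsigned)(x[0] | (x[1] << 1) | (x[2] << 2)
                      | (c4 << 3) | (c5 << 4) | (c6 << 5));
}
```

The message (1, 1, 0) again yields the codeword 110011 (packed as 0x33), agreeing with (1.8).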
Exercise 63 Let C be the Hamming code H3 of Example 1.2.3, with parity check matrix
0 1 1 1 1 0 0
H = 1 0 1 1 0 1 0 .
1 1 0 1 0 0 1
(a) Construct the generator matrix for C and use it to encode the message 0110.
(b) Use your generator matrix to encode x1 x2 x3 x4 .
(c) Use H to encode the messages 0110 and x1 x2 x3 x4 .
Since there is a one-to-one correspondence between messages and codewords, one often
works only with the encoded messages (the codewords) at both the sending and receiving
end. In that case, at the decoding end in Figure 1.1, we are satisfied with an estimate ĉ obtained by the decoder from y, hoping that this is the codeword c that was transmitted.
However, if we are interested in the actual message, a question arises as to how to recover
the message from a codeword. If the codeword c = xG, and G is in standard form, the
message is the first k components of c; if the encoding is systematic, it is easy to recover
the message by looking at the coordinates of G containing the identity matrix. What can
be done otherwise? Because G has independent rows, there is an n × k matrix K such
that GK = I_k; K is called a right inverse for G and is not necessarily unique. As c = xG,
cK = xGK = x.
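When G is in standard form [I_k | A], one right inverse is immediate: stack I_k on top of n − k zero rows. A sketch with the matrices of Example 1.11.1 (the helper `matmul2` is our own):

```python
G = [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 1]]
# Since G = [I3 | A], stacking I3 on top of three zero rows gives a right inverse.
K = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]

def matmul2(A, B):
    """Matrix product over F_2."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

assert matmul2(G, K) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # G K = I_k
# Recovering the message from a codeword: c = xG implies cK = x.
c = [1, 0, 1, 1, 1, 0]                    # encodes x = (1, 0, 1)
x = [sum(c[t] * K[t][j] for t in range(6)) % 2 for j in range(3)]
assert x == [1, 0, 1]
```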
Hint: One way K can be found is by using four zero rows and the three rows of I3 .
(c) Find a 7 × 4 right inverse K of G, where

    G = [1 1 0 1 0 0 0]
        [0 1 1 0 1 0 0]
        [0 0 1 1 0 1 0].
        [0 0 0 1 1 0 1]
Remark: In Chapter 4, we will see that G generates a cyclic code and the structure of
G is typical of the structure of generator matrices of such codes.
(d) What is the message x if xG = 1000110, where G is given in part (c)?
39 1.11 Encoding, decoding, and Shannon’s Theorem
Figure 1.2 Binary symmetric channel: each transmitted bit (0 or 1) is received correctly with probability 1 − ρ and flipped to the other symbol with crossover probability ρ.
5 While ρ is usually very small, if ρ > 1/2, the probability that a bit is received in error is higher than the
probability that it is received correctly. So one strategy is to interchange 0 and 1 immediately at the receiving
end. This converts the BSC with crossover probability ρ to a BSC with crossover probability 1 − ρ < 1/2. This
of course does not help if ρ = 1/2; in this case communication is not possible – see Exercise 77.
ĉ = arg max_{c∈C} prob(c | y).

Here arg max_{c∈C} prob(c | y) is the argument c of the probability function prob(c | y) that
maximizes this probability. Alternately, the decoder could choose ĉ = c for the codeword c
with prob(y | c) maximum; such a decoder is called a maximum likelihood (or ML) decoder.
Symbolically, a ML decoder makes the decision

ĉ = arg max_{c∈C} prob(y | c).   (1.9)
On a BSC, prob(y | c) = ∏_{i=1}^{n} prob(y_i | c_i), since we assumed that bit errors are independent. By Figure 1.2, prob(y_i | c_i) = ρ if y_i ≠ c_i
and prob(y_i | c_i) = 1 − ρ if y_i = c_i. Therefore

prob(y | c) = ρ^{d(y,c)} (1 − ρ)^{n−d(y,c)} = (1 − ρ)^n (ρ/(1 − ρ))^{d(y,c)}.   (1.10)
Since 0 < ρ < 1/2, 0 < ρ/(1 − ρ) < 1. Therefore maximizing prob(y | c) is equivalent
to minimizing d(y, c), that is, finding the codeword c closest to the received vector y in
Hamming distance; this is called nearest neighbor decoding. Hence on a BSC, maximum
likelihood and nearest neighbor decoding are the same.
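The equivalence of maximum likelihood and nearest neighbor decoding on a BSC is easy to confirm by brute force on a small code. An illustrative sketch (all names our own), using the [6, 3, 3] code of Example 1.11.1:

```python
import itertools

# Build the full codeword set of the [6, 3, 3] code of Example 1.11.1.
G = [[1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 1]]
code = set()
for x in itertools.product([0, 1], repeat=3):
    code.add(tuple(sum(x[i] * G[i][j] for i in range(3)) % 2 for j in range(6)))

def d(u, v):
    """Hamming distance."""
    return sum(a != b for a, b in zip(u, v))

rho = 0.05                        # crossover probability, 0 < rho < 1/2

def likelihood(y, c):
    """prob(y | c) from equation (1.10)."""
    k = d(y, c)
    return rho**k * (1 - rho)**(len(y) - k)

y = (1, 0, 1, 1, 0, 0)            # codeword 101110 with one bit flipped
nearest = min(code, key=lambda c: d(y, c))
ml = max(code, key=lambda c: likelihood(y, c))
assert nearest == ml == (1, 0, 1, 1, 1, 0)   # the two decoders agree
```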
Let e = y − c so that y = c + e. The effect of noise in the communication channel is to
add an error vector e to the codeword c, and the goal of decoding is to determine e. Nearest
neighbor decoding is equivalent to finding a vector e of smallest weight such that y − e is in
the code. This error vector need not be unique since there may be more than one codeword
closest to y; in other words, (1.9) may not have a unique solution. When we have a decoder
capable of finding all codewords nearest to the received vector y, then we have a complete
decoder.
To examine vectors closest to a given codeword, the concept of spheres about codewords
proves useful. The sphere of radius r centered at a vector u in F_q^n is defined to be the set

S_r(u) = {v ∈ F_q^n | d(u, v) ≤ r}

of all vectors whose distance from u is less than or equal to r. The number of vectors in
S_r(u) equals

∑_{i=0}^{r} (n choose i) (q − 1)^i.   (1.11)
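Formula (1.11) is easy to evaluate; for instance, it shows that the radius-1 spheres about the 16 codewords of the [7, 4, 3] binary Hamming code exactly fill F_2^7. A sketch (the function name is our own):

```python
from math import comb

def sphere_size(n, q, r):
    """Number of vectors in a Hamming sphere of radius r in F_q^n, eq. (1.11)."""
    return sum(comb(n, i) * (q - 1)**i for i in range(r + 1))

# The [7, 4, 3] Hamming code: 16 disjoint spheres of radius 1 cover F_2^7,
# so the code is perfect.
assert sphere_size(7, 2, 1) == 8
assert 16 * sphere_size(7, 2, 1) == 2**7
```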
These spheres are pairwise disjoint provided their radius is chosen small enough.

Theorem 1.11.2 If d is the minimum distance of a code C and t = ⌊(d − 1)/2⌋, then spheres of radius t centered at distinct codewords of C are disjoint.

Proof: If z ∈ S_t(c_1) ∩ S_t(c_2), where c_1 and c_2 are codewords, then by the triangle inequality
(Theorem 1.4.1(iv)), d(c_1, c_2) ≤ d(c_1, z) + d(z, c_2) ≤ 2t ≤ d − 1 < d,
implying that c_1 = c_2.
Corollary 1.11.3 With the notation of the previous theorem, if a codeword c is sent and y
is received where t or fewer errors have occurred, then c is the unique codeword closest to
y. In particular, nearest neighbor decoding uniquely and correctly decodes any received
vector in which at most t errors have occurred in transmission.
For purposes of decoding as many errors as possible, this corollary implies that for given
n and k, we wish to find a code with as high a minimum weight d as possible. Alternately,
given n and d, one wishes to send as many messages as possible; thus we want to find a
code with the largest number of codewords, or, in the linear case, the highest dimension.
We may relax these requirements somewhat if we can find a code with an efficient decoding
algorithm.
Since the minimum distance of C is d, there exist two distinct codewords such that the
spheres of radius t + 1 about them are not disjoint. Therefore if more than t errors occur,
nearest neighbor decoding may yield more than one nearest codeword. Thus C is a t-error-
correcting code but not a (t + 1)-error-correcting code. The packing radius of a code is the
largest radius of spheres centered at codewords so that the spheres are pairwise disjoint.
This discussion shows the following two facts about the packing radius.
The decoding problem now becomes one of finding an efficient algorithm that will correct
up to t errors. One of the most obvious decoding algorithms is to examine all codewords
until one is found with distance t or less from the received vector. But obviously this is
a realistic decoding algorithm only for codes with a small number of codewords. Another
obvious algorithm is to make a table consisting of a nearest codeword for each of the q n
vectors in Fqn and then look up a received vector in the table in order to decode it. This is
impractical if q n is very large.
For an [n, k, d] linear code C over Fq , we can, however, devise an algorithm using a
table with q n−k rather than q n entries where one can find the nearest codeword by looking
up one of these q n−k entries. This general decoding algorithm for linear codes is called
syndrome decoding. Because our code C is an elementary abelian subgroup of the additive
group of Fqn , its distinct cosets x + C partition Fqn into q n−k sets of size q k . Two vectors x
and y belong to the same coset if and only if y − x ∈ C. The weight of a coset is the smallest
weight of a vector in the coset, and any vector of this smallest weight in the coset is called
a coset leader. The zero vector is the unique coset leader of the code C. More generally,
every coset of weight at most t = ⌊(d − 1)/2⌋ has a unique coset leader.
Exercise 66 Do the following:
(a) Prove that if C is an [n, k, d] code over F_q, every coset of weight at most t = ⌊(d − 1)/2⌋
has a unique coset leader.
(b) Find a nonzero binary code of length 4 and minimum weight d in which all cosets have
unique coset leaders and some coset has weight greater than t = ⌊(d − 1)/2⌋.
Choose a parity check matrix H for C. The syndrome of a vector x in F_q^n with respect to
the parity check matrix H is the vector in F_q^{n−k} defined by

syn(x) = Hx^T.
The code C consists of all vectors whose syndrome equals 0. As H has rank n − k, every
vector in Fqn−k is a syndrome. If x1 , x2 ∈ Fqn are in the same coset of C, then x1 − x2 = c ∈ C.
Therefore syn(x1 ) = H (x2 + c)T = H xT2 + H cT = H xT2 = syn(x2 ). Hence x1 and x2 have
the same syndrome. On the other hand, if syn(x1 ) = syn(x2 ), then H (x2 − x1 )T = 0 and so
x2 − x1 ∈ C. Thus we have the following theorem.
Theorem 1.11.5 Two vectors belong to the same coset if and only if they have the same
syndrome.
Hence there exists a one-to-one correspondence between cosets of C and syndromes. We
denote by C_s the coset of C consisting of all vectors in F_q^n with syndrome s.
Suppose a codeword sent over a communication channel is received as a vector y. Since
in nearest neighbor decoding we seek a vector e of smallest weight such that y − e ∈ C,
nearest neighbor decoding is equivalent to finding a vector e of smallest weight in the coset
containing y, that is, a coset leader of the coset containing y. The Syndrome Decoding
Algorithm is the following implementation of nearest neighbor decoding. We begin with a
fixed parity check matrix H .
I. For each syndrome s ∈ F_q^{n−k}, choose a coset leader e_s of the coset C_s. Create a table
pairing the syndrome with the coset leader.
This process can be somewhat involved, but this is a one-time preprocessing task that
is carried out before received vectors are analyzed. One method of computing this table
will be described shortly. After producing the table, received vectors can be decoded.
II. After receiving a vector y, compute its syndrome s using the parity check matrix H .
III. y is then decoded as the codeword y − e_s.
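Steps I–III can be prototyped directly for a small code. The sketch below (our own names, using the [6, 3, 3] code of Example 1.11.1) builds the Step I table by running through vectors in order of increasing weight, so the first vector reaching each syndrome is a coset leader:

```python
import itertools

H = [[1, 1, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 0, 1, 0, 0, 1]]
n, r = 6, 3

def syn(v):
    """Syndrome H v^T over F_2."""
    return tuple(sum(h[j] * v[j] for j in range(n)) % 2 for h in H)

# Step I: scan vectors by increasing weight; the first vector achieving each
# syndrome is a coset leader for that coset.
table = {}
for v in sorted(itertools.product([0, 1], repeat=n), key=sum):
    table.setdefault(syn(v), v)
assert len(table) == 2**r         # one leader per syndrome (q^(n-k) entries)

def decode(y):
    s = syn(y)                    # Step II: compute the syndrome of y
    e = table[s]                  # ...and look up its coset leader
    return tuple((a + b) % 2 for a, b in zip(y, e))   # Step III: y - e

y = (1, 0, 1, 1, 0, 0)            # codeword 101110 with an error in position 5
assert decode(y) == (1, 0, 1, 1, 1, 0)
```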
Syndrome decoding requires a table with only q n−k entries, which may be a vast im-
provement over a table of q n vectors showing which codeword is closest to each of these.
However, there is a cost for shortening the table: before looking in the table of syndromes,
one must perform a matrix-vector multiplication in order to determine the syndrome of the
received vector. Then the table is used to look up the syndrome and find the coset leader.
How do we construct the table of syndromes as described in Step I? We briefly discuss
this for binary codes; one can extend this easily to nonbinary codes. Given the t-error-
correcting code C of length n with parity check matrix H , we can construct the syndromes
as follows. The coset of weight 0 has coset leader 0. Consider the n cosets of weight 1.
Choose an n-tuple with a 1 in position i and 0s elsewhere; the coset leader is the n-tuple and
the associated syndrome is column i of H. For the (n choose 2) cosets of weight 2, choose an n-tuple
with two 1s in positions i and j, with i < j, and the rest 0s; the coset leader is the n-tuple
and the associated syndrome is the sum of columns i and j of H. Continue in this manner
through the cosets of weight t. We could choose to stop here. If we do, we can decode any
received vector with t or fewer errors, but if the received vector has more than t errors, it
will be either incorrectly decoded (if the syndrome of the received vector is in the table) or
not decoded at all (if the syndrome of the received vector is not in the table). If we decide
to go on and compute syndromes of weights w greater than t, we continue in the same
fashion with the added feature that we must check for possible repetition of syndromes.
This repetition will occur if the n-tuple of weight w is not a coset leader or it is a coset
leader with the same syndrome as another leader of weight w, in which cases we move on
to the next n-tuple. We continue until we have 2n−k syndromes. The table produced will
allow us to perform nearest neighbor decoding.
Syndrome decoding is particularly simple for the binary Hamming codes H_r with parameters
[n = 2^r − 1, 2^r − 1 − r, 3]. We do not have to create the table for syndromes and
corresponding coset leaders. This is because the coset leaders are unique and are the 2^r
vectors of weight at most 1. Let H_r be the parity check matrix whose columns are the
binary numerals for the numbers 1, 2, . . . , 2^r − 1. Since the syndrome of the binary n-tuple
of weight 1 whose unique 1 is in position i is the r -tuple representing the binary numeral for
i, the syndrome immediately gives the coset leader and no table is required for syndrome
decoding. Thus Syndrome Decoding for Binary Hamming Codes takes the form:
I. After receiving a vector y, compute its syndrome s using the parity check matrix Hr .
II. If s = 0, then y is in the code and y is decoded as y; otherwise, s is the binary numeral
for some positive integer i and y is decoded as the codeword obtained from y by adding
1 to its ith bit.
The above procedure is easily modified for Hamming codes over other fields. This is
explored in the exercises.
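The two-step procedure above is short enough to state in code. A sketch for r = 3 (our own names; column i of H is the binary numeral for i, so a nonzero syndrome, read as a number, is the position of the error):

```python
r, n = 3, 7
# Column i of H is the binary numeral of i (most significant bit on top).
H = [[(i >> (r - 1 - row)) & 1 for i in range(1, n + 1)] for row in range(r)]

def decode(y):
    s = [sum(H[row][j] * y[j] for j in range(n)) % 2 for row in range(r)]
    i = int("".join(map(str, s)), 2)      # syndrome read as a binary numeral
    y = y[:]
    if i:                                 # nonzero syndrome: single error
        y[i - 1] ^= 1                     # flip bit i (positions 1, ..., n)
    return y

c = [0, 0, 1, 0, 1, 1, 0]    # a codeword: columns 3, 5, 6 of H sum to 0
y = c[:]; y[6] ^= 1          # introduce an error in position 7
assert decode(y) == c
assert decode(c) == c        # zero syndrome: y is already a codeword
```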
Exercise 67 Construct the parity check matrix of the binary Hamming code H4 of length
15 where the columns are the binary numbers 1, 2, . . . , 15 in that order. Using this parity
check matrix decode the following vectors, and then check that your decoded vectors are
actually codewords.
(a) 001000001100100,
(b) 101001110101100,
(c) 000100100011000.
Exercise 68 Construct a table of all syndromes of the ternary tetracode of Example 1.3.3
using the generator matrix of that example to construct the parity check matrix. Find a coset
leader for each of the syndromes. Use your parity check matrix to decode the following
vectors, and then check that your decoded vectors are actually codewords.
(a) (1, 1, 1, 1),
(b) (1, −1, 0, −1),
(c) (0, 1, 0, 1).
Exercise 69 Let C be the [6, 3, 3] binary code with generator matrix G and parity check
matrix H given by

    G = [1 0 0 0 1 1]           [0 1 1 1 0 0]
        [0 1 0 1 0 1]   and H = [1 0 1 0 1 0].
        [0 0 1 1 1 0]           [1 1 0 0 0 1]
(a) Construct a table of coset leaders and associated syndromes for the eight cosets of C.
(b) One of the cosets in part (a) has weight 2. This coset has three coset leaders. Which
coset is it and what are its coset leaders?
(c) Using part (a), decode the following received vectors:
(i) 110110,
(ii) 110111,
(iii) 110001.
(d) For one of the received vectors in part (c) there is ambiguity as to what codeword
it should be decoded to. List the other nearest neighbors possible for this received
vector.
Exercise 70 Let Ĥ_3 be the extended Hamming code with parity check matrix

    Ĥ_3 = [1 1 1 1 1 1 1 1]
          [0 0 0 0 1 1 1 1]
          [0 0 1 1 0 0 1 1].
          [0 1 0 1 0 1 0 1]
Number the coordinates 0, 1, 2, . . . , 7. Notice that if we delete the top row of Ĥ_3, we have
the coordinate numbers in binary. We can decode Ĥ_3 without a table of syndromes and coset
leaders using the following algorithm. If y is received, compute syn(y) using the parity check
matrix Ĥ_3. If syn(y) = (0, 0, 0, 0)^T, then y has no errors. If syn(y) = (1, a, b, c)^T, then there
is a single error in the coordinate position abc (written in binary). If syn(y) = (0, a, b, c)^T
with (a, b, c) ≠ (0, 0, 0), then there are two errors, in coordinate position 0 and in the
coordinate position abc (written in binary).
(a) Decode the following vectors using this algorithm:
(i) 10110101,
(ii) 11010010,
(iii) 10011100.
(b) Verify that this procedure provides a nearest neighbor decoding algorithm for Ĥ_3. To do
this, the following must be verified. All weight 0 and weight 1 errors can be corrected,
accounting for nine of the 16 syndromes. All weight 2 errors cannot necessarily be
corrected but all weight 2 errors lead to one of the seven syndromes remaining.
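The algorithm described in Exercise 70 is easy to prototype. A sketch under our own naming (coordinates numbered 0–7 as in the exercise):

```python
# Decoding the extended Hamming code via its syndrome (1,a,b,c) / (0,a,b,c):
# (1,a,b,c) means one error at position abc; (0,a,b,c) with abc != 000 means
# two errors, at position 0 and at position abc.
H = [[1, 1, 1, 1, 1, 1, 1, 1],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [0, 0, 1, 1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1, 0, 1]]

def decode(y):
    s = [sum(row[j] * y[j] for j in range(8)) % 2 for row in H]
    abc = 4 * s[1] + 2 * s[2] + s[3]   # the position abc in binary
    y = y[:]
    if s[0] == 1:                      # odd number of errors: one error at abc
        y[abc] ^= 1
    elif abc != 0:                     # even, nonzero: errors at 0 and abc
        y[0] ^= 1
        y[abc] ^= 1
    return y

c = [0] * 8
y = c[:]; y[5] ^= 1                    # single error in coordinate 5 = 101
assert decode(y) == c
y2 = c[:]; y2[0] ^= 1; y2[3] ^= 1      # two errors, at coordinates 0 and 3
assert decode(y2) == c
```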
A received vector may contain both errors (where a transmitted symbol is read as a
different symbol) and erasures (where a transmitted symbol is unreadable). These are
fundamentally different in that the locations of errors are unknown, whereas the locations
of erasures are known. Suppose c ∈ C is sent, and the received vector y contains ν errors
and ε erasures. One could certainly not guarantee that y can be corrected if ε ≥ d because
there may be a codeword other than c closer to y. So assume that ε < d. Puncture C in the
Theorem 1.11.6 Let C be an [n, k, d] code. If a codeword c is sent and y is received where
ν errors and ε erasures have occurred, then c is the unique codeword in C closest to y
provided 2ν + ε < d.
Exercise 71 Let Ĥ_3 be the extended Hamming code with parity check matrix

    Ĥ_3 = [1 1 1 1 1 1 1 1]
          [0 0 0 0 1 1 1 1]
          [0 0 1 1 0 0 1 1].
          [0 1 0 1 0 1 0 1]

Correct the received vector 101∗0111, where ∗ is an erasure.
In Exercises 70 and 71 we explored the decoding of the [8, 4, 4] extended Hamming code
Ĥ_3. In Exercise 70, we had the reader verify that there are eight cosets of weight 1 and seven
of weight 2. Each of these cosets is a nonlinear code and so it is appropriate to discuss the
weight distribution of these cosets and to tabulate the results. In general, the complete coset
weight distribution of a linear code is the weight distribution of each coset of the code. The
next example gives the complete coset weight distribution of Ĥ_3. As every [8, 4, 4] code
is equivalent to Ĥ_3, by Exercise 56, this is the complete coset weight distribution of any
[8, 4, 4] binary code.
Example 1.11.7 The complete coset weight distribution of the [8, 4, 4] extended binary
Hamming code Ĥ_3 is given in the following table:

    Coset     Number of vectors of given weight          Number
    weight    0  1  2  3   4  5  6  7  8                of cosets
    0         1  0  0  0  14  0  0  0  1                    1
    1         0  1  0  7   0  7  0  1  0                    8
    2         0  0  4  0   8  0  4  0  0                    7
Note that the first line is the weight distribution of Ĥ_3. The second line is the weight
distribution of each coset of weight one. This code has the special property that all cosets of
a given weight have the same weight distribution. This is not the case for codes in general.
In Exercise 73 we ask the reader to verify some of the information in the table. Notice that
this code has the all-one vector 1 and hence the table is symmetric about the middle weight.
Notice also that an even weight coset has only even weight vectors, and an odd weight
coset has only odd weight vectors. These observations hold in general; see Exercise 72.
The information in this table helps explain the decoding of Ĥ_3. We see that all the cosets
of weight 2 have four coset leaders. This implies that when we decode a received vector in
which two errors had been made, we actually have four equally likely codewords that could
have been sent.
Exercise 72 Let C be a binary code of length n. Prove the following.
(a) If C is an even code, then an even weight coset of C has only even weight vectors, and
an odd weight coset has only odd weight vectors.
(b) If C contains the all-one vector 1, then in a fixed coset, the number of vectors of weight
i is the same as the number of vectors of weight n − i, for 0 ≤ i ≤ n.
Exercise 73 Consider the complete coset weight distribution of Ĥ_3 given in Example
1.11.7. The results of Exercise 72 will be useful.
(a) Prove that the weight distribution of the cosets of weight 1 is as claimed.
(b) (Harder) Prove that the weight distribution of the cosets of weight 2 is as
claimed.
We conclude this section with a discussion of Shannon's Theorem in the framework of
the decoding we have developed. Assume that the communication channel is a BSC with
crossover probability ρ on which syndrome decoding is used. The word error rate P_err
for this channel and decoding scheme is the probability that the decoder makes an error,
averaged over all codewords of C; for simplicity we assume that each codeword of C is
equally likely to be sent. A decoder error occurs when ĉ = arg max_{c∈C} prob(y | c) is not the
originally transmitted word c when y is received. The syndrome decoder makes a correct
decision if y − c is a chosen coset leader. This probability is

ρ^{wt(y−c)} (1 − ρ)^{n−wt(y−c)}

by (1.10). Therefore the probability that the syndrome decoder makes a correct decision is
∑_{i=0}^{n} α_i ρ^i (1 − ρ)^{n−i}, where α_i is the number of coset leaders of weight i. Thus

P_err = 1 − ∑_{i=0}^{n} α_i ρ^i (1 − ρ)^{n−i}.   (1.12)
Example 1.11.8 Suppose binary messages of length k are sent unencoded over a BSC with
crossover probability ρ. This in effect is the same as using the [k, k] code F_2^k. This code
has a unique coset, the code itself, and its leader is the zero codeword of weight 0. Hence
(1.12) shows that the probability of decoder error is

P_err = 1 − ρ^0 (1 − ρ)^k = 1 − (1 − ρ)^k.

This is precisely what we expect as the probability of no decoding error is the probability
(1 − ρ)^k that the k bits are received without error.
Example 1.11.9 We compare sending 2^4 = 16 binary messages unencoded to encoding
using the [7, 4] binary Hamming code H_3. Assume communication is over a BSC with
crossover probability ρ. By Example 1.11.8, P_err = 1 − (1 − ρ)^4 for the unencoded data.
H_3 has one coset of weight 0 and seven cosets of weight 1. Hence P_err = 1 − (1 − ρ)^7 −
7ρ(1 − ρ)^6 by (1.12). For example, if ρ = 0.01, P_err without coding is 0.039 403 99. Using
H_3, it is 0.002 031 04 . . . .
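The two figures in Example 1.11.9 follow directly from (1.12) and can be reproduced in a couple of lines:

```python
# Word error rate comparison from Example 1.11.9: 4 information bits sent
# unencoded versus encoded with the [7, 4] Hamming code, on a BSC with rho = 0.01.
rho = 0.01
perr_unencoded = 1 - (1 - rho)**4
# H_3 has one coset leader of weight 0 and seven of weight 1, so by (1.12):
perr_hamming = 1 - (1 - rho)**7 - 7 * rho * (1 - rho)**6
print(round(perr_unencoded, 8))   # 0.03940399
print(round(perr_hamming, 8))     # 0.00203104
```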
54 4. Some Good Codes
corrects if at most one error occurs; otherwise, the word is declared an erasure. In
the end, this turns out to increase the efficiency of the collaborating pair.
We extend the example treated above, to introduce a sequence of codes defined
by P. Elias in 1954. We start with an extended Hamming code C_1 of length n_1 = 2^m.
Assume that the codes are to be used on a B.S.C. with bit error probability p,
where n_1 p < 1/2. For C_2 we take the extended Hamming code of length 2^{m+1}.
Define V_1 := C_1 and define V_2 to be the direct product of C_1 and C_2. We continue
in this way: if V_i has been defined, then V_{i+1} is the direct product of V_i and the
extended Hamming code C_{i+1} of length 2^{m+i}. Denote the length of V_i by n_i and
its dimension by k_i. Finally, let E_i be the expected number of errors per block in
words of V_i after decoding.

From the definition, we have

n_{i+1} = 2^{m+i} n_i,   k_{i+1} = (2^{m+i} − m − i − 1) k_i,

and from Example 3.3.4 it follows that E_{i+1} ≤ E_i^2 and E_1 ≤ (n_1 p)^2 ≤ 1/4. So these
codes have the property that E_i tends to zero as i → ∞.
From the recurrence relations for n_i and k_i, we find

n_i = 2^{mi + i(i−1)/2},   k_i = n_i ∏_{j=0}^{i−1} (1 − (m + j + 1)/2^{m+j}).

Hence the rate satisfies

R_i → ∏_{j=0}^{∞} (1 − (m + j + 1)/2^{m+j}) > 0
for i → ∞. So we have a sequence of codes for which the length tends to ∞, the
rate does not tend to 0, and nevertheless the error probability tends to 0. This is
close to what Shannon's theorem promises us. Note that these codes, called Elias
codes, have minimum distance d_i = 4^i and hence d_i/n_i → 0 as i → ∞.
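The recurrences and the closed form for n_i are easy to verify numerically; a sketch (m = 5 is an arbitrary choice of ours, and nine iterations suffice to see the rate settle):

```python
# Parameters of the Elias codes built from extended Hamming codes of lengths
# 2^m, 2^(m+1), ...: n_{i+1} = 2^(m+i) n_i, k_{i+1} = (2^(m+i) - m - i - 1) k_i.
m = 5
n, k = 2**m, 2**m - m - 1          # C_1 is the [2^m, 2^m - m - 1] code
for i in range(1, 9):
    # closed form n_i = 2^(m i + i(i-1)/2)
    assert n == 2**(m * i + i * (i - 1) // 2)
    L = 2**(m + i)                 # length of the next extended Hamming code
    n, k = n * L, k * (L - m - i - 1)
print(k / n)                       # the rate stays bounded away from 0
```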
PROOF. We use the fact that (1 + x)^p ≡ 1 + x^p (mod p). If 0 ≤ r < p then

(1 + x)^{np+r} ≡ (1 + x^p)^n (1 + x)^r (mod p).

Comparing coefficients of x^{np+s} (where 0 ≤ s < p) on both sides yields
(4.5.2) Theorem (Massey et al. 1973; cf. [49]). Let P(x) = ∑_{i=0}^{l} b_i (x + c)^i,
where b_l ≠ 0, and let i_0 be the smallest index i for which b_i ≠ 0. Then
(4.5.3) Definitions.
(i) A_i := {x_j ∈ AG(m, 2) | ξ_{ij} = 1}, i.e. A_i is an (m − 1)-dimensional affine subspace
(a hyperplane), for 0 ≤ i < m;
(ii) v_i := the ith row of E, i.e. the characteristic function of A_i. The vector v_i
is a word in F_2^n; as usual we write 1 := (1, 1, . . . , 1) for the characteristic
function of AG(m, 2);
(iii) if a = (a_0, a_1, . . . , a_{n−1}) and b = (b_0, b_1, . . . , b_{n−1}) are words in F_2^n, we define
(4.5.4) Lemma. Let l = ∑_{i=0}^{m−1} ξ_{il} 2^i and let i_1, . . . , i_s be the values of i for which
ξ_{il} = 0. If

v_{i_1} v_{i_2} · · · v_{i_s} = (a_{l,0}, a_{l,1}, . . . , a_{l,n−1}),

then

(x + 1)^l = ∑_{j=0}^{n−1} a_{l,j} x^{n−1−j}.
The following shows how to interpret the products v_{i_1} · · · v_{i_s} geometrically.
(ii) the weight w(v_{i_1} · · · v_{i_s}) of the vector v_{i_1} · · · v_{i_s} in F_2^n is 2^{m−s},
(iii) the characteristic function of {x_j}, i.e. the jth basis vector of F_2^n, is

e_j = ∏_{i=0}^{m−1} {v_i + (1 + ξ_{ij})1},

(iv) the products v_{i_1} · · · v_{i_s} (0 ≤ s ≤ m) are a basis of F_2^n.

PROOF.
(iv) There are ∑_{s=0}^{m} (m choose s) = 2^m = n products v_{i_1} · · · v_{i_s}. The result follows from
(iii). Since the polynomials (x + 1)^l are independent we could also have
used Lemma 4.5.4. □
The following table illustrates Lemmas 4.5.4 and 4.5.5. For example,
v_0 v_2 corresponds to l = 15 − 2^0 − 2^2 = 10 and hence (x + 1)^10 =
x^10 + x^8 + x^2 + 1.
(4.5.6) Definition. Let 0 ≤ r < m. The linear code of length n = 2^m which has
the products v_{i_1} · · · v_{i_s} with s ≤ r factors as basis is called the rth order binary
Reed-Muller code (RM code; notation R(r, m)).

The special case R(0, m) is the repetition code. From Lemma 4.5.5(i) we see
that the Boolean function x_{i_1} x_{i_2} · · · x_{i_s}, where x = (x_0, . . . , x_{m−1}) runs through
F_2^m, has value 1 iff x ∈ A_{i_1} ∩ · · · ∩ A_{i_s}. Hence R(r, m) consists of the sequences
of values taken by polynomials in x_0, . . . , x_{m−1} of degree at most r.
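This last description can be made concrete: the following sketch (our own names, using the coordinate convention of Definition 4.5.3, where ξ_{ij} is bit i of j) generates R(r, m) as the truth tables of Boolean polynomials of degree at most r.

```python
from itertools import combinations, product

def rm_code(r, m):
    """Sketch: R(r, m) as evaluations of degree-<= r Boolean polynomials."""
    n = 2**m
    # v_i is the characteristic function of A_i = {x_j : bit i of j is 1}
    v = [[(j >> i) & 1 for j in range(n)] for i in range(m)]
    basis = []
    for s in range(r + 1):
        for idx in combinations(range(m), s):
            row = [1] * n
            for i in idx:
                row = [a * b for a, b in zip(row, v[i])]
            basis.append(row)            # the product v_{i_1} ... v_{i_s}
    code = set()
    for coeffs in product([0, 1], repeat=len(basis)):
        code.add(tuple(sum(c * b[j] for c, b in zip(coeffs, basis)) % 2
                       for j in range(n)))
    return code

c = rm_code(1, 3)
assert len(c) == 2**(1 + 3)                    # dimension 1 + C(3, 1) = 4
assert min(sum(w) for w in c if any(w)) == 4   # minimum distance 2^(m-r)
```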
PROOF. By the definition and Lemma 4.5.5(ii) the minimum distance is at most
2^{m−r} and by Lemma 4.5.4 and Theorem 4.5.2 it is at least 2^{m−r}. (Also see
Problem 4.7.9.) □
PROOF.
(a) By the definition and the independence of the products v_{i_1} · · · v_{i_s}, the
dimension of R(r, m) is 1 + (m choose 1) + · · · + (m choose r). So dim R(r, m) +
dim R(m − r − 1, m) = n.
(b) Let v_{i_1} · · · v_{i_s} and v_{j_1} · · · v_{j_t} be basis vectors of R(r, m) and R(m − r − 1, m)
respectively. Then s + t < m. Hence the product of these two basis vectors
has the form v_{k_1} · · · v_{k_u} where u < m. By Lemma 4.5.5(ii) this product has
even weight, i.e. the original two basis vectors are orthogonal. □
(4.5.9) Theorem. Let C = R(m − l, m) and let A be an l-flat in AG(m, 2). Then
the characteristic function of A is in C.

Here the inner sum counts the number of points in the intersection of A and
the s-flat

L = {x_j ∈ AG(m, 2) | j ∈ C(i_1, . . . , i_s)}.

If s > m − l then L ∩ A is either empty or an affine subspace of positive
dimension. In both cases |L ∩ A| is even, i.e. the inner sum is 0. □

This theorem and the definition show that a word is in R(r, m) iff it is the
sum of characteristic functions of affine subspaces of dimension ≥ m − r. In
the terminology of Boolean functions, R(r, m) is the set of polynomials in x_0,
x_1, . . . , x_{m−1} of degree ≤ r.
In Section 3.2 we defined the notion of equivalence of codes using permutations
acting on the positions of the codewords. Let us now consider a code C
of length n and the permutations π ∈ S_n which map every word in C to a word
in C. These permutations form a group, called the automorphism group of C
(notation: Aut(C)). For example, if C is the repetition code then Aut(C) = S_n.
PROOF. This is an immediate consequence of Theorem 4.5.9 and the fact that
AGL(m, 2) maps a k-flat onto a k-flat (for every k). □
parity of the number of errors in the positions of any r-flat. Then, using a
similar procedure, the same thing is done for (r − 1)-flats, etc. After r + 1 steps
the errors have been located. This procedure is called multistep majority
decoding.
where L is linear and 2h is the rank of B. In fact, one can see to it that L(v) = 0,
and the weights occurring in the corresponding coset of R(1, m) are then 2^{m−1}
and 2^{m−1} ± 2^{m−h−1}.
(Note that this implies that if Q has rank smaller than m, the corresponding
coset has smaller minimum weight.)
Clearly, a union of cosets of R(1, m) will be a code with minimum distance
at most 2^{m−1} − 2^{m/2−1}. We wish to form a code C by taking the union of
cosets corresponding to certain quadratic forms Q_1, . . . , Q_l (with associated
108 6. Cyclic Codes
g(x) := ∏^{(r)} (x − α^j),

where α is a primitive element in F_{q^m} and the upper index (r) indicates that the
product is over integers j with 0 ≤ j < q^m − 1 and 0 ≤ w_q(j) < (q − 1)m − r.
The r-th order GRM code of length q^m has a generator matrix G* obtained
from the generator matrix G of the shortened GRM code by adjoining a column
of 0s and then a row of 1s.
Note that the set of exponents in this definition of shortened GRM codes is
indeed closed under multiplication by q. Let h(x) be the check polynomial of the
shortened r-th order GRM code. Then the dual of this code has the polynomial
h*(x) as generator, where h*(x) is obtained from h(x) by reversing the order of
the powers of x. It is defined in the same way as g(x), now with the condition
0 < w_q(j) ≤ r.
We have the following generalization of Theorem 4.5.8.
(6.11.3) Theorem. The dual of the r-th order GRM code of length q^m is equivalent
to a GRM code of order (q − 1)m − r − 1.
PROOF. We have seen above that (x − 1)h*(x) is the generator of the shortened
GRM code of order (q − 1)m − r − 1. If we now lengthen the cyclic codes to
GRM codes, we must show orthogonality of the rows of the generator matrices.
The only ones for which this is not a consequence of the duality of the shortened
codes are the all-one rows. For these, the factor (x − 1) in the generators and the
fact that the length is q^m takes care of that. Since the dimensions of the two codes
add up to q^m, we are done. □
To handle the binary case, we need a lemma.
(6.11.4) Lemma. Let C_1 and C_2 be cyclic codes of length n over F_q with check
polynomials f_1(x) := ∏_{i=1}^{k_1} (x − α_i), resp. f_2(x) := ∏_{j=1}^{k_2} (x − β_j). Let C be the cyclic code
of the same length for which the check polynomial has all the products α_i β_j as its
zeros. Then C contains all the words ab, where a ∈ C_1, b ∈ C_2.
§6.11. Generalized Reed-Muller Codes 109
(6.11.5) Theorem. The rth order binary GRM code of length 2^m is equivalent to
the rth order Reed-Muller code of length 2^m.

PROOF. The proof is by induction. For r = 0, the codes defined by (4.5.6) and
(6.11.2) are both repetition codes. We know that the binary Hamming code is
cyclic. So, for r = 1 we are done by the corollary to Theorem 4.5.8. Assume
that the assertion is true for some value of r. The check polynomial h*(x) of the
shortened GRM code has zeros α^j, where w_2(j) ≤ r. The zeros of the check polynomial
of the shortened 1st order RM code are the powers α^j with w_2(j) = 1. The
theorem now follows from the induction hypothesis, Definition 4.5.6 and Lemma
6.11.4. □
PROOF. For every G ⊂ F, we define f(G) to be the number of points in F_2^m where
all the monomials of G have the value 0 and all the other monomials of F have
the value 1. Clearly we have

∑_{H⊂G} f(H) = 2^{ν(F−G)}

(because this is the number of points in the affine subspace of F_2^m defined by
x_{i_1} = x_{i_2} = · · · = x_{i_s} = 1, where the x_{i_t} are the variables occurring in F − G). It
follows from Theorem 6.8.6 that

f(G) = ∑_{H⊂G} (−1)^{|G−H|} 2^{ν(F−H)}.

Furthermore

N(F) = ∑_{G⊂F, |F−G|≡0 (mod 2)} f(G)

= 2^{m−1} + (1/2) ∑_{H⊂F} (−1)^{|F−H|} 2^{ν(F−H)} ∑_{H⊂G⊂F} 1

= 2^{m−1} + (1/2) ∑_{H⊂F} (−1)^{|F−H|} 2^{ν(F−H)} 2^{|F−H|}
PROOF. The code R(r, m) consists of the sequences of values taken by polynomials
of degree at most r in m binary variables. The codeword corresponding to a
polynomial F has weight 2^m − N(F). If G ⊂ F and G has degree d, then
ν(G) ≥ m − |G|·d, i.e. |G| ≥ ⌈(m − ν(G))/d⌉. Since

ν(G) + ⌈(m − ν(G))/d⌉ ≥ ⌈m/d⌉,

the result follows from Theorem 6.11.6. □
§6.12. Comments
The reader who is interested in seeing the trace function and idempotents
used heavily in proofs should read [46, Chapter 15].
A generalization of BCH codes will be treated in Chapter 9. There is
extensive literature on weights, dimension, covering radius, etc. of BCH
codes. We mention the Carlitz-Uchiyama bound which depends on a deep
theorem in number theory by A. Weil. For the bound we refer to [42]. For a
generalization of QR codes to word length n a prime power, in which case the
theory is similar to Section 6.9 we refer to a paper by J. H. van Lint and F. J.
MacWilliams (1978; [45]).
32 Useful Background
By Theorem 3, C is self-orthogonal, hence self-dual. The fact that d = 6 is
left for Problem 21.
Reed-Muller codes are an infinite family of codes that are defined recur-
sively. Many things are known about them, including their minimum weights.
Berlekamp [2] calls them weak codes that are easy to decode. At modest
lengths, however, there are good Reed-Muller codes; they get weaker as their
lengths increase. However, they are valuable as building blocks for other
codes.
If D_1 is a binary [n, k_1, d_1] code and D_2 is a binary [n, k_2, d_2] code, we
construct a binary code C of length 2n as follows: C = {|u|u + v| where u is
in D_1, v is in D_2}. Then (Problem 22) C is a [2n, k_1 + k_2, min(2d_1, d_2)] code.
Further, if G_i is a generator matrix for D_i, i = 1, 2, then

    [G_1  G_1]
    [ 0   G_2]

is a generator matrix for C (Problem 23).
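The |u|u + v| construction is a one-liner to prototype. An illustrative sketch (our own names), which recovers the [8, 4, 4] code R(1, 3) from R(1, 2) (the even weight code of length 4) and R(0, 2) (the repetition code):

```python
from itertools import product

def uuv(D1, D2):
    """The |u|u + v| construction over F_2: C = {(u, u + v)}."""
    return {tuple(u) + tuple((a + b) % 2 for a, b in zip(u, v))
            for u in D1 for v in D2}

n = 4
D1 = {tuple(w) for w in product([0, 1], repeat=n)
      if sum(w) % 2 == 0}                     # [4, 3, 2] even weight code
D2 = {(0, 0, 0, 0), (1, 1, 1, 1)}             # [4, 1, 4] repetition code
C = uuv(D1, D2)                               # an [8, 4, 4] code: R(1, 3)
assert len(C) == 2**(3 + 1)                   # dimension k1 + k2
assert min(sum(w) for w in C if any(w)) == min(2 * 2, 4)   # min(2 d1, d2)
```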
One way of defining Reed-Muller codes is recursively using this construction.
Reed-Muller codes, R(r, m), are binary codes that exist at length n = 2^m
for 0 ≤ r ≤ m. R(m, m) is the whole space, and R(0, m) is the repetition
code.

If 0 ≤ r < m, define R(r + 1, m + 1) to be {|u|u + v| where u is in
R(r + 1, m) and v is in R(r, m)}. Let G(r, m) denote a generator matrix of
R(r, m). Then we see that

    G(r + 1, m + 1) = [G(r + 1, m)  G(r + 1, m)]
                      [0            G(r, m)    ]

is a generator matrix of R(r + 1, m + 1).
Reed-Muller Codes 33
It is interesting that one can write down a generator matrix for a specific
R(r, m) without having computed the generator matrices for smaller R(r, m).
We can show (Problem 25) that R(r_1, m) ⊆ R(r_2, m) if r_1 < r_2, so we start
with a basis of R(0, m), namely the all-one vector, which we will call v_0. We will extend this to a
basis of R(1, m), then to a basis of R(2, m), and so on.

Start with m = 2:

    v_0      1 1 1 1
    v_1      0 0 1 1
    v_2      0 1 0 1
    v_1 v_2  0 0 0 1

We notice that v_1 is chosen to have its first half zero, second half one; v_2 has
its first quarter zero, second quarter one, third quarter zero, fourth quarter
one, and so on. Again v_i v_j has ones only where both v_i and v_j have ones, and
v_1 v_2 v_3 is one only where all three of v_1, v_2, and v_3 are one. Note that R(1, 3) is
equivalent to the code C_5.
In addition to knowing the dimension and minimum weight of a Reed-Muller code, we can identify its dual code. Naturally this is another Reed-Muller code: the dual of R(r, m) is R(m − r − 1, m).
Proof. There are two things to prove: one is that the dimensions of these two codes add up to 2^m, which we leave to Problem 27; the other is that these codes are orthogonal to each other.
We prove the orthogonality by induction on m. We can verify this easily for m = 2. Suppose that it is true for all R(r, m′) where m′ < m. Recall that

    G(r, m) = [ G(r, m − 1)   G(r, m − 1)     ]
              [ 0             G(r − 1, m − 1) ]

and similarly for G(m − r − 1, m).
We have seen enough of Reed-Muller codes to get the feeling that these
are codes with geometrical connections, but we will not explore these
connections here. We just remark that their geometric nature can be used to
show that the group of each R(r, m) contains the affine group AG(m, 2), of order 2^m (2^m − 1)(2^m − 2)(2^m − 2^2) ··· (2^m − 2^(m−1)).
We will spend just a bit of time on decoding Reed-Muller codes, called
Reed decoding. Reed decoding of R(r, m) is based on the fact that it is possible to construct 2^(m−r) check sums, each involving 2^r bits of a received word, where each received bit is used in only one check sum. This is a form of
what is called majority logic decoding.
We illustrate this first for R(1, 3), where we construct four check sums, each involving two bits of a received word, where each received bit is used in only one check sum. Notice that we are again decoding the single-error-correcting extended Hamming code, but now by Reed decoding. Recall the basis v0, v1, v2, v3. Let y = (y0, y1, ..., y7) be a received vector. If there are no errors, y = a0 v0 + a1 v1 + a2 v2 + a3 v3. It is not hard to see that

    a1 = y0 + y4 = y1 + y5 = y2 + y6 = y3 + y7,
    a2 = y0 + y2 = y1 + y3 = y4 + y6 = y5 + y7,
    a3 = y0 + y1 = y2 + y3 = y4 + y5 = y6 + y7.
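As a sketch (our own, assuming the basis v1 = 00001111, v2 = 00110011, v3 = 01010101 built as above), the majority-logic decoder for R(1, 3) fits in a few lines of Python and corrects any single error:

```python
def decode_r13(y):
    """Reed (majority-logic) decoding of R(1, 3); y is a list of 8 bits."""
    v = {1: [0, 0, 0, 0, 1, 1, 1, 1],
         2: [0, 0, 1, 1, 0, 0, 1, 1],
         3: [0, 1, 0, 1, 0, 1, 0, 1]}
    pairs = {1: [(0, 4), (1, 5), (2, 6), (3, 7)],   # four disjoint check sums per a_i
             2: [(0, 2), (1, 3), (4, 6), (5, 7)],
             3: [(0, 1), (2, 3), (4, 5), (6, 7)]}
    a = {i: 1 if sum((y[j] + y[k]) % 2 for j, k in ps) > 2 else 0
         for i, ps in pairs.items()}                # majority vote of the four sums
    # strip a1 v1 + a2 v2 + a3 v3; what remains is a0 v0 plus at most one error
    residual = [(y[j] + sum(a[i] * v[i][j] for i in v)) % 2 for j in range(8)]
    a0 = 1 if sum(residual) > 4 else 0              # majority vote for a0
    return (a0, a[1], a[2], a[3])

# codeword for (a0, a1, a2, a3) = (1, 0, 1, 1) is 10011001; flip bit 0 and decode
assert decode_r13([0, 0, 0, 1, 1, 0, 0, 1]) == (1, 0, 1, 1)
```

Since a single error corrupts only one of the four disjoint check sums for each ai, the majority always recovers the correct coefficient.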
PROBLEMS
1. Prove the four numbered facts about cosets given in Section 2.1.
2. How many errors can C2 (the code constructed after Theorem 1 in
Chapter 1) correct?
3. (a) Construct a standard array for the code C3 (Problem 1.13). For each
coset give its syndrome.
(b) Using the constructions of part (a), decode the received vectors
y1 = (0, 1, 1, 0, 1, 1) and y2 = (0, 0, 0, 1, 0, 0) for C3.
The minimum distance d of an RS code C can be computed algebraically using Lemma 2.1.
Lemma 2.1. A polynomial of degree D over a field F has at most D roots (counting multiplicity).
Proof. The lemma is proved by induction on the degree D. The case D = 0 is obvious. Let f(X) be a nonzero polynomial of degree D over F, and let α ∈ F be a root of f(X). By the division theorem for polynomials over a field, we can write f(X) = Q(X)(X − α) + R(X), where R(X) is the remainder polynomial with degree less than 1, and therefore a constant polynomial. Since f(α) = R(α) = 0, we must have R(X) = 0. Therefore f(X) = (X − α)Q(X). By the induction hypothesis, Q(X), which has degree D − 1, has at most D − 1 roots. These roots, together with α, make up at most D roots for f(X).
Since the degree of the encoded polynomial in (1) is k − 1, a codeword c can have at most k − 1 elements M(αi) equal to zero. The minimum distance d, equal to the minimum weight of any codeword in C, therefore satisfies d ≥ n − k + 1. The Singleton bound (proven in Lecture 5) provides a bound of d ≤ n − k + 1 for any code. Hence, the minimum distance of the RS code C is d = n − k + 1. The upper bound can also be demonstrated by constructing a codeword with exactly d = n − k + 1 non-zero entries. Let M(x) = (x − α0)(x − α1) ··· (x − αk−2) be the encoding polynomial as in (1). Since the degree of M(x) is k − 1, there exists a message m = [m0, ..., mk−1] which corresponds to the polynomial M(x), simply by matching coefficients in (1). Hence, evaluating M(x) for all αi, i = 0, ..., q − 1, yields a codeword with k − 1 zeros followed by n − k + 1 non-zero entries. We record the distance property of RS codes as:
Lemma 2.2. Reed-Solomon codes meet the Singleton bound, i.e., a code of block length n and
dimension k has distance n − k + 1.
RS codes can thus be used to achieve a relative distance of δ = d/n = (n − k + 1)/n = 1 − R + o(1) for any rate R = k/n. However, the alphabet size q scales as q = Ω(n). By the Plotkin bound, for codes over an alphabet of size q, we have R ≤ 1 − (q/(q − 1))δ, so to meet the Singleton bound q has to grow with the block length n. We now use similar algebraic ideas to construct codes over smaller alphabet sizes, at the expense of worse rate vs. distance trade-offs.
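Lemma 2.2 is easy to confirm by brute force for tiny parameters. The following Python sketch (our own; it assumes a prime field F_q so that arithmetic is just mod q) encodes by polynomial evaluation and finds the minimum weight exhaustively:

```python
import itertools

def rs_encode(msg, q):
    """Evaluate the message polynomial sum_i m_i x^i at every point of F_q (q prime)."""
    return [sum(mi * pow(a, i, q) for i, mi in enumerate(msg)) % q
            for a in range(q)]

q, k = 7, 3                        # a [7, 3] RS code over F_7
min_wt = min(sum(1 for s in rs_encode(m, q) if s)
             for m in itertools.product(range(q), repeat=k) if any(m))
assert min_wt == q - k + 1         # d = n - k + 1 = 5: the Singleton bound is met
```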
3 Reed-Muller Codes
In what follows, we generalize the RS codes described in Section 2 by extending the polynomial encoding in (1) to multivariate polynomials. The resulting codes are hereafter referred to as Reed-Muller (RM) codes.¹
The resulting RM code is a [q², ℓ², d]_q linear code. Linearity can be verified as in Section 2. The minimum distance d of the RM code can be computed using the following result.
Lemma 3.1. The tensor product of two [q, ℓ, d]_q RS codes C1 and C2 is the [q², ℓ², d²]_q (bivariate) RM code C.
Proof. The tensor product of two codes C1 and C2 is defined as the code C = C1 ⊗ C2 given by the set of q × q matrices {G1 m G2^T | m an ℓ × ℓ message matrix over Fq}, where G1 and G2 are the generator matrices for C1 and C2, respectively. Since both C1 and C2 are RS codes, the matrices G1 and G2 are both equal to the RS generator matrix G given in (3). Hence, a message m is mapped to the codeword M = G m G^T ∈ C. The entry M(αx, αy) in row x and column y of the codeword M is given by the product gx m gy^T, where gx denotes the row [1, αx, ..., αx^(ℓ−1)] of G, for 0 ≤ x ≤ q − 1. Hence, the product code is such that

    M(αx, αy) = Σ_{i=0}^{ℓ−1} Σ_{j=0}^{ℓ−1} mij αx^i αy^j,

which is consistent with the definition of the bivariate Reed-Muller code C in (4), with x and y replaced by αx and αy.
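The identity M = G m G^T can be tested exhaustively for tiny parameters. This Python sketch (ours; prime field, q = 5 and ℓ = 2) brute-forces the minimum weight of the bivariate code and recovers (q − ℓ + 1)²:

```python
import itertools

q, l = 5, 2                                     # two [q, l] = [5, 2] RS components
G = [[pow(a, i, q) for i in range(l)] for a in range(q)]   # q x l evaluation matrix

def rm2_encode(m):
    """Bivariate codeword M = G m G^T: M[x][y] = sum_{i,j} m[i][j] a_x^i a_y^j."""
    return [[sum(G[x][i] * m[i][j] * G[y][j] for i in range(l) for j in range(l)) % q
             for y in range(q)] for x in range(q)]

def weight(M):
    return sum(1 for row in M for s in row if s)

# brute-force minimum weight over all nonzero l x l messages
min_wt = min(weight(rm2_encode([list(t[:l]), list(t[l:])]))
             for t in itertools.product(range(q), repeat=l * l) if any(t))
assert min_wt == (q - l + 1) ** 2               # d = (5 - 2 + 1)^2 = 16
```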
The use of tensor product codes and the result of Lemma 3.1 imply that the [q², ℓ², d]_q Reed-Muller code has distance d = (q − ℓ + 1)² = q² − 2q(ℓ − 1) + (ℓ − 1)² and rate R = ℓ²/q². Note that the distance d = (q − ℓ + 1)² no longer achieves equality in the Singleton bound d ≤ q² − ℓ² + 1. However, the alphabet size q in this case scales as q = O(√n). This demonstrates the trade-off between optimal distance and smaller alphabet size that is characteristic of RM codes over RS codes.
¹An alternate definition of Reed-Muller codes is common, but Prof. Guruswami claims the multivariate polynomial interpretation is clearer.
3.2 Multivariate RM Codes
The bivariate extension of Section 3.1 generalizes in the natural way to multivariate polynomials.
A multivariate RM code C with v variables x1 , . . . , xv can be interpreted as the tensor product code
of v RS codes C1 , . . . , Cv . The encoding function
which is a polynomial of degree dv in the variable xv. By Lemma 2.1, there are at least q − dv values of xv for which P(x1, ..., xv) is a non-zero polynomial in x1, ..., xv−1. For each of these (at least q − dv) values of xv, by induction there are at least Π_{i=1}^{v−1} (q − di) choices of x1, ..., xv−1 that lead to a nonzero evaluation.
The following construction demonstrates how equality is achieved in the bound provided by Lemma 3.2. Since the bound results from each Mi having at most di roots, equality is achieved whenever there are exactly di distinct roots for each xi. Hence, let Mi(xi) be the product (xi − αi,1) ··· (xi − αi,ℓi−1), where the αi,j are distinct, and let M(x1, ..., xv) = Π_{i=1}^{v} Mi(xi).
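The counting argument can be verified directly. In this sketch (our own), each M_i has d_i distinct roots in F_q, and the number of nonzero evaluations of the product comes out to exactly Π(q − d_i):

```python
import itertools

q = 5
degs = [1, 2]                                   # degrees d_1, d_2 of M_1, M_2

def M(point):
    """M(x_1, x_2) = prod_i M_i(x_i), where M_i has the d_i distinct roots 1..d_i."""
    val = 1
    for x, d in zip(point, degs):
        for root in range(1, d + 1):
            val = val * (x - root) % q
    return val

nonzeros = sum(1 for p in itertools.product(range(q), repeat=len(degs)) if M(p))
assert nonzeros == (q - degs[0]) * (q - degs[1])   # exactly prod_i (q - d_i) = 12
```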
The resulting code C is a [q^v, k, d]_q linear code, where k is the total number of tuples (i1, ..., iv) of nonnegative integers satisfying i1 + ... + iv ≤ ℓ. The values of k and d are computed using the following results.

Observation 3.3. The value of k for the given code C is the binomial coefficient (v + ℓ choose v) (stated without proof).
Lemma 3.4. A non-zero polynomial P(x1, ..., xv) of total degree at most ℓ over Fq is zero on at most a fraction ℓ/q of the points in Fq^v.
Proof. The statement is proved via induction on v. The case v = 1 states that a univariate polynomial of degree ℓ has at most ℓ roots, and is proved using Lemma 2.1. We next note that such a polynomial can be written as
Proof. Consider the encoding polynomial M(x1, ..., xv) = Π_{i=1}^{ℓ} xi, resulting from the message with coefficient cS = 1 if and only if S = {1, ..., ℓ}. There are exactly 2^(v−ℓ) choices for (x1, ..., xv) that make M non-zero, namely those with x1 = ... = xℓ = 1. The distance d is thus bounded as d ≤ 2^(v−ℓ). Next, consider a non-zero polynomial M(x1, ..., xv) and let Π_{i=1}^{r} xi be its maximal monomial, i.e., reorder the indices {1, ..., v} such that

    M(x1, ..., xv) = Π_{i=1}^{r} xi + R(x1, ..., xv),

where there is no monomial term in R(x1, ..., xv) with more than r variables. There are 2^(v−r) ways to choose the variables xr+1, ..., xv, but none of them can cause the maximal monomial to be cancelled. This leads to the bound d ≥ 2^(v−r), which implies d ≥ 2^(v−ℓ) since r ≤ ℓ by the definition of M.
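For small parameters, the distance d = 2^(v−ℓ) can be checked by enumerating every multilinear polynomial of total degree at most ℓ (a brute-force sketch of ours):

```python
import itertools

v, l = 3, 1                        # multilinear polynomials of total degree <= l
monomials = [S for k in range(l + 1) for S in itertools.combinations(range(v), k)]

def evaluate(coeffs):
    """Evaluation vector of sum_S c_S prod_{i in S} x_i over all points of F_2^v."""
    return [sum(c for c, S in zip(coeffs, monomials) if all(x[i] for i in S)) % 2
            for x in itertools.product([0, 1], repeat=v)]

min_wt = min(sum(evaluate(cs))
             for cs in itertools.product([0, 1], repeat=len(monomials)) if any(cs))
assert min_wt == 2 ** (v - l)      # d = 2^(v - l) = 4
```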
5 Summary
Two families of linear codes, Reed-Solomon and Reed-Muller, were presented and analyzed using
various algebraic properties. Though the Reed-Solomon codes can be used to achieve R, δ > 0,
and in fact achieve the optimal trade-off matching the Singleton bound, this can only be done if
the alphabet size q increases linearly in the block length, i.e., q ≥ n. Reed-Muller codes use
multivariate polynomials to give codes over smaller alphabets, although they are unable to give
codes with R, δ > 0 over a bounded alphabet size.
CSE 533: Error-Correcting Codes (Autumn 2006)
In the previous lecture, we defined Reed-Muller codes and their variants. Today, we will study an efficient algorithm for decoding Reed-Muller codes when the number of errors is less than half the distance. Then we shall return to our original goal of constructing explicit codes with constant relative distance and rate. Towards this, we will convert Reed-Solomon codes into binary codes.
1 Recap
Let R(r, m) denote the r-th order Reed-Muller code. The messages therefore consist of multilinear polynomials in variables X1, X2, ..., Xm of degree at most r. Recall that the length of the code is n = 2^m, the dimension is k = Σ_{i=0}^{r} (m choose i), and the distance is 2^(m−r); i.e., R(r, m) is a [2^m, Σ_{i=0}^{r} (m choose i), 2^(m−r)]_2 linear code.
Some interesting special cases of the Reed-Muller code:

• With r = m/2, R(r, m) gives a code with rate 1/2 and distance d = 2^(m/2) = √n. Although this is not constant relative distance, it is a fairly non-trivial code with a good rate.

• With r = 1, R(r, m) yields a linear code with parameters [2^m, m + 1, 2^(m−1)]. Further, with r = 1, a codeword consists of the evaluations of a degree-1 (linear) function over F2^m. Hence the R(1, m) code consists of the Hadamard codes and their complements.
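The weight structure of R(1, m) is easy to confirm by enumeration (a small sketch of ours for m = 3): apart from the all-zero and all-one words, every codeword has weight exactly 2^(m−1):

```python
import itertools

m = 3
points = list(itertools.product([0, 1], repeat=m))

# codewords of R(1, m): evaluation tables of affine functions a0 + sum_i a_i x_i
code = {tuple((a[0] + sum(a[i + 1] * x[i] for i in range(m))) % 2 for x in points)
        for a in itertools.product([0, 1], repeat=m + 1)}

assert len(code) == 2 ** (m + 1)             # a [2^m, m+1] code
weights = sorted(sum(c) for c in code)
# one word of weight 0, one of weight 2^m, the rest of weight 2^(m-1)
assert weights == [0] + [2 ** (m - 1)] * (2 ** (m + 1) - 2) + [2 ** m]
```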
2 Reed’s algorithm
2.1 Notation
We will use X = (x1, ..., xm) to denote an element of F2^m. For a subset S ⊆ {1, ..., m}, let XS denote the restriction of X to the indices in S, i.e., the |S|-dimensional vector consisting of xi for i ∈ S. Denote by S̄ the complement of a set S.
Let R(r, m) be a binary Reed-Muller code. Let f : F2^m → F2 be given as a table of values. If f is a codeword, then f is a polynomial P(x1, ..., xm) of degree at most r. For a general function f, define the distance from a polynomial P(x1, ..., xm) as follows:

    Δ(f, P) = |{a ∈ F2^m : f(a) ≠ P(a)}|
2.2 Algorithm
The input consists of a received word which is at most d/2 away from a codeword. In particular, we are given a function f : F2^m → F2 such that there exists a polynomial P of degree r with Δ(f, P) < 2^(m−r)/2. The goal of the algorithm is to output the polynomial P.
Let us say P is of the form

    P(X) = Σ_{S ⊆ {1,...,m}, |S| ≤ r} cS Π_{i ∈ S} xi
Proof: Recall that RS(X) = Π_{i ∈ S} xi. Hence, for all but one of the values a ∈ F2^|S|, RS(a) is 0. Therefore the above identity follows.
Lemma 2.3. For all b ∈ F2^(m−r), a set S with |S| = r, and a degree-r polynomial P, the following is true:

    Σ_{a ∈ F2^m, aS̄ = b} P(a) = cS

Proof: Restricted to the points a with aS̄ = b, P becomes a polynomial Pb in the variables xi, i ∈ S:

    Pb(X) = cS RS(X) + Σ_{T ⊂ S} aT RT(X)

for some aT. Hence

    Σ_{a ∈ F2^m, aS̄ = b} P(a) = Σ_{y ∈ F2^r} Pb(y)
                              = Σ_{y ∈ F2^r} cS RS(y) + Σ_{T ⊂ S} aT Σ_{y ∈ F2^r} RT(y)
                              = cS,

since RS(y) = 1 for exactly one y ∈ F2^r, while each RT with T ⊂ S equals 1 for an even number (2^(r−|T|)) of values of y.
Lemma 2.3 suggests an algorithmic method to obtain the coefficients cS of the polynomial P: simply sum the values of the polynomial over all a ∈ F2^m with aS̄ = b, for some b ∈ F2^(m−r). Further, for each of the 2^(m−r) choices of b, the sum ranges over a disjoint set of points in F2^m; that is, the sets {a ∈ F2^m | aS̄ = b} are all disjoint.
By our assumption, f differs from the polynomial P in at most 2^(m−r−1) − 1 positions. Hence, for at least 2^(m−r) − (2^(m−r−1) − 1) values of b we have

    Σ_{a ∈ F2^m, aS̄ = b} f(a) = Σ_{a ∈ F2^m, aS̄ = b} P(a) = cS

Out of the 2^(m−r) sums of the form Σ_{a ∈ F2^m, aS̄ = b} f(a), more than a 1/2 fraction are thus equal to cS, so a natural way to compute cS is to take the majority of all these sums. Towards finding the other lower-degree terms in P, we reduce the problem as follows:
    f′ = f − Σ_{|S|=r} cS RS(x)
    P′ = P − Σ_{|S|=r} cS RS(x)

Since f′ is at most 2^(m−r−1) − 1 < 2^(m−(r−1))/2 away from the degree-(r − 1) polynomial P′, the above procedure can be used to find the degree-(r − 1) terms of P′. Hence, iteratively, all the coefficients cS of P can be computed.
The formal description of the algorithm is given below.

Reed's Algorithm
    t ← r, F ← f, P ← 0
    While t ≥ 0 do
        For each S ⊆ {1, ..., m} with |S| = t:
            cS ← Majority over all b of Σ_{a ∈ F2^m, aS̄ = b} F(a)
            P ← P + cS Π_{i ∈ S} xi
            For each x ∈ F2^m:
                F(x) ← F(x) − cS Π_{i ∈ S} xi
        t ← t − 1
    Output P
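The algorithm translates almost line for line into Python. In this sketch (names are ours), the function table is a dictionary from points of F_2^m to bits, and each coefficient c_S is taken as the majority of the 2^(m−t) check sums:

```python
import itertools

def reed_decode(f, m, r):
    """Reed's majority-logic decoder.  f maps each point of F_2^m (an m-tuple
    of 0/1) to F_2; returns the coefficients c_S of a degree-<=r polynomial P,
    assuming f differs from P in fewer than 2^(m-r-1) positions."""
    F = dict(f)                              # working table; terms get stripped off
    coeffs = {}
    points = list(itertools.product([0, 1], repeat=m))
    for t in range(r, -1, -1):
        for S in itertools.combinations(range(m), t):
            Sbar = [i for i in range(m) if i not in S]
            sums = {}                        # one check sum per assignment b to S-bar
            for a in points:
                b = tuple(a[i] for i in Sbar)
                sums[b] = (sums.get(b, 0) + F[a]) % 2
            votes = list(sums.values())
            c_S = 1 if 2 * sum(votes) > len(votes) else 0   # majority over all b
            coeffs[S] = c_S
            if c_S:                          # strip c_S * prod_{i in S} x_i from F
                for a in points:
                    if all(a[i] for i in S):
                        F[a] ^= 1
    return coeffs

# sanity check: P = 1 + x0 + x2 on F_2^3 (r = 1), with one evaluation flipped
P = {a: (1 + a[0] + a[2]) % 2 for a in itertools.product([0, 1], repeat=3)}
recv = dict(P)
recv[(1, 1, 1)] ^= 1                         # one error < 2^(m-r-1) = 2
c = reed_decode(recv, 3, 1)
assert {S for S, v in c.items() if v} == {(), (0,), (2,)}
```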
3 Extension Fields
For every prime p, the field Fp consists of {0, 1, . . . , p−1} with addition and multiplication modulo
p. For any integer k, the extension field Fp^k is obtained as follows.
Consider the ring Fp[x] consisting of polynomials in one variable x with coefficients in Fp. Let q(x) be an irreducible polynomial of degree k in Fp[x]. Since q(x) is irreducible, for any other polynomial r(x) one of the following two cases is true:
• gcd(q(x), r(x)) = 1: by Euclid's algorithm, we can find a(x), b(x) such that a(x)r(x) + b(x)q(x) = 1, i.e., there exists a(x) such that a(x)r(x) ≡ 1 mod q(x);
• q(x) divides r(x), i.e., r(x) ≡ 0 mod q(x).
That is, each polynomial r(x) is either 0 mod q(x) or has an inverse a(x). Hence the set of polynomials modulo q(x) forms a field. This field consists of all polynomials over Fp with degree less than k and is denoted Fp[x]/(q(x)). Clearly there are p^k elements in this field. Further, it can be shown that the fields obtained from different degree-k irreducible polynomials all behave the same way, i.e., are isomorphic to each other.
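As a concrete sketch (ours), take p = 3 and q(x) = x² + 1, which is irreducible over F_3 since −1 is not a square mod 3. The 9 residues form the field F_9, and every nonzero element indeed has an inverse:

```python
p = 3                      # F_9 = F_3[x]/(x^2 + 1); element (a0, a1) means a0 + a1*x

def mul(r, s):
    """Multiply in F_3[x]/(x^2 + 1), reducing with x^2 = -1."""
    a0, a1 = r
    b0, b1 = s
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

elements = [(a, b) for a in range(p) for b in range(p)]
# every nonzero residue has a multiplicative inverse, so the residues form a field
for r in elements:
    if r != (0, 0):
        assert any(mul(r, s) == (1, 0) for s in elements)
```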
Notice that the set of polynomials Fp[x] forms a vector space over Fp. Hence the field Fp[x]/(q(x)) also forms a vector space over the field Fp. Every polynomial of degree less than k can be represented naturally as a length-k vector over Fp. This yields a representation of elements of the extension field Fp[x]/(q(x)) as k-dimensional Fp-vectors, i.e., a mapping

    φ : Fp[x]/(q(x)) → Fp^k

In fact, the above mapping φ is a linear mapping in the following sense: for any two elements r(x), s(x) of Fp[x]/(q(x)) we have φ(r(x) + s(x)) = φ(r(x)) + φ(s(x)).
Proof: Suppose not; let us say there are polynomials Q(x), R(x) over F2 such that P(x) = Q(x)R(x). Observe that

    x^(3^k) − 1 = (x^(3^(k−1)) − 1)(x^(2·3^(k−1)) + x^(3^(k−1)) + 1)

Let F2* be the algebraic closure of F2. All our arguments will be over F2*, of which F2 is a subfield. Let ζ be a primitive 3^k-th root of 1. So we have

    ζ^(3^k) = 1,    ζ^(3^(k−1)) ≠ 1

Since x^(3^k) − 1 = (x^(3^(k−1)) − 1)P(x), we get P(ζ) = 0. Hence either Q(ζ) = 0 or R(ζ) = 0; without loss of generality we can assume Q(ζ) = 0. Recall that Φ : x → x² gives an automorphism of F2*, known as the Frobenius map. That is, for any two elements x, y we have (x + y)² = x² + y².
Further, the elements of F2 = {0, 1} are fixed by the mapping Φ : x → x². In particular, the coefficients of the polynomial Q are fixed by Φ. Therefore, if we apply Φ to the equation Q(ζ) = 0, we get Q(ζ²) = 0. Applying this repeatedly, we can conclude that ζ, ζ², ζ⁴, ..., ζ^(2^t), ... are all roots of Q. Now let us count the number of distinct elements of the form ζ^(2^t). Let n = 3^k; then Euler's totient function gives φ(n) = 2 · 3^(k−1). By Euler's theorem,

    2^φ(n) ≡ 1 mod n

Clearly this implies that ζ^(2^φ(n)) = ζ. Since ζ is a primitive n-th root of 1, ζ^(2^i) = ζ implies 2^i ≡ 1 mod n. It can be shown that 2 is a primitive root modulo n = 3^k. Therefore, for any i < φ(n), 2^i ≢ 1 mod n. Hence all the elements ζ, ζ², ..., ζ^(2^(φ(n)−1)) are distinct. Recall that all these elements are roots of Q. But since P(x) = Q(x)R(x), the degree of Q is less than φ(n). This is a contradiction, since Q cannot have more roots than its degree.
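The fact used above, that 2 is a primitive root modulo n = 3^k (so its order is φ(n) = 2·3^(k−1)), can be checked numerically for small k (a sketch of ours):

```python
def mult_order(a, n):
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    x, k = a % n, 1
    while x != 1:
        x = x * a % n
        k += 1
    return k

# 2 is a primitive root mod 3^k: its order equals phi(3^k) = 2 * 3^(k-1)
for k in range(1, 7):
    assert mult_order(2, 3 ** k) == 2 * 3 ** (k - 1)
```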
6.4. REED-MULLER CODES 65
Therefore

    d²min(s(C)) = 4α² dH(C) = 4α² d,

where d = dH(C) is the minimum Hamming distance of C.
It follows that the nominal coding gain of s(C) is

    γc(s(C)) = d²min(s(C)) / (4Eb) = kd/n.    (6.1)

Thus the parameters (n, k, d) directly determine γc(s(C)) in this very simple way. (This gives another reason to prefer Eb/N0 to SNRnorm in the power-limited regime.)
Moreover, every vector s(x) ∈ s(C) has the same number of nearest neighbors Kmin(s(x)), namely the number Nd of nearest neighbors to x ∈ C. Thus Kmin(s(C)) = Nd, and Kb(s(C)) = Nd/k.
Consequently the union bound estimate of Pb(E) is

    Pb(E) ≈ Kb(s(C)) Q(√(2 γc(s(C)) Eb/N0))
          = (Nd/k) Q(√(2 (dk/n) Eb/N0)).    (6.2)
In summary, the parameters and performance of the binary signal constellation s(C) may be
simply determined from the parameters (n, k, d) and Nd of C.
Exercise 1. Let C be an (n, k, d) binary linear code with d odd. Show that if we append an overall parity check p = Σi xi to each codeword x, then we obtain an (n + 1, k, d + 1) binary linear code C′ with d + 1 even. Show that the nominal coding gain γc(C′) is always greater than γc(C) if k > 1. Conclude that we can focus primarily on linear codes with d even.
Exercise 2. Show that if C is a binary linear block code, then in every coordinate position
either all codeword components are 0 or half are 0 and half are 1. Show that a coordinate
in which all codeword components are 0 may be deleted (“punctured”) without any loss in
performance, but with savings in energy and in dimension. Show that if C has no such all-zero
coordinates, then s(C) has zero mean: m(s(C)) = 0.
For any integers m ≥ 0 and 0 ≤ r ≤ m, there exists an RM code, denoted by RM(r, m), that has length n = 2^m and minimum Hamming distance d = 2^(m−r).
For r = m, RM(m, m) is defined as the universe (2^m, 2^m, 1) code. It is helpful also to define RM codes for r = −1 by RM(−1, m) = (2^m, 0, ∞), the trivial code of length 2^m. Thus for m = 0, the two RM codes of length 1 are the (1, 1, 1) universe code RM(0, 0) and the (1, 0, ∞) trivial code RM(−1, 0).
The remaining RM codes for m ≥ 1 and 0 ≤ r < m may be constructed from these elementary codes by the following length-doubling construction, called the |u|u + v| construction (originally due to Plotkin). RM(r, m) is constructed from RM(r − 1, m − 1) and RM(r, m − 1) as

    RM(r, m) = {(u, u + v) | u ∈ RM(r, m − 1), v ∈ RM(r − 1, m − 1)}.    (6.3)

(a) RM(r, m) is a binary linear block code with length n = 2^m and dimension k(r, m) = k(r, m − 1) + k(r − 1, m − 1).
(b) The codes are nested, in the sense that RM(r − 1, m) ⊆ RM(r, m).
(c) The minimum distance of RM(r, m) is d = 2^(m−r) if r ≥ 0 (if r = −1, then d = ∞).
We verify that these assertions hold for RM(0, 0) and RM(−1, 0).
For m ≥ 1, the linearity and length of RM(r, m) are obvious from the construction. The dimension (size) follows from the fact that (u, u + v) = (0, 0) if and only if u = v = 0.
Exercise 5 below shows that the recursion for k(r, m) leads to the explicit formula

    k(r, m) = Σ_{0 ≤ j ≤ r} (m choose j),    (6.4)

where (m choose j) denotes the combinatorial coefficient m!/(j!(m − j)!).
The nesting property for m follows from the nesting property for m − 1.
Finally, we verify that the minimum nonzero weight of RM(r, m) is 2^(m−r) as follows: the weight of a word (u, u + v) satisfies

    wt((u, u + v)) ≥ min(2 wt(u), wt(v)).

Equality clearly holds for (0, v), (v, 0), or (u, u) if we choose v or u as a minimum-weight codeword from its respective code.
Figure 2. Tableau of Reed-Muller codes (lengths 1 through 32). The families along the diagonals of the tableau are:

    r = m, d = 1:       universe codes (1,1,1), (2,2,1), (4,4,1), (8,8,1), (16,16,1), (32,32,1);
    r = m − 1, d = 2:   SPC codes (2,1,2), (4,3,2), (8,7,2), (16,15,2), (32,31,2);
    r = m − 2, d = 4:   extended Hamming codes (8,4,4), (16,11,4), (32,26,4);
    k = n/2:            self-dual codes (2,1,2), (8,4,4), (32,16,8);
    r = 1, d = n/2:     biorthogonal codes (2,2,1), (4,3,2), (8,4,4), (16,5,8), (32,6,16);
    r = 0, d = n:       repetition codes (2,1,2), (4,1,4), (8,1,8), (16,1,16), (32,1,32);
    r = −1, d = ∞:      trivial codes (1,0,∞), (2,0,∞), (4,0,∞), (8,0,∞), (16,0,∞), (32,0,∞).
In this tableau each RM code lies halfway between the two codes of half the length that are
used to construct it in the |u|u + v| construction, from which we can immediately deduce its
dimension k.
Exercise 3. Compute the parameters (k, d) of the RM codes of lengths n = 64 and 128.
There is a known closed-form formula for the number Nd of codewords of minimum weight d = 2^(m−r) in RM(r, m):

    Nd = 2^r Π_{0 ≤ i ≤ m−r−1} (2^(m−i) − 1) / (2^(m−r−i) − 1).    (6.5)

Example 4. The number of weight-8 words in the (32, 16, 8) code RM(2, 5) is

    N8 = 4 · (31 · 15 · 7) / (7 · 3 · 1) = 620.
The nominal coding gain of RM(2, 5) is γc (C) = 4 (6.02 dB); however, since Kb = N8 /k = 38.75,
the effective coding gain by our rule of thumb is only about γeff (C) ≈ 5.0 dB.
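Formula (6.5) is straightforward to evaluate; a small Python sketch (ours) reproduces the N8 = 620 count of Example 4:

```python
def n_min(r, m):
    """Number of minimum-weight (d = 2^(m-r)) codewords in RM(r, m), per (6.5)."""
    num = den = 1
    for i in range(m - r):                 # i = 0, ..., m-r-1
        num *= 2 ** (m - i) - 1
        den *= 2 ** (m - r - i) - 1
    return 2 ** r * num // den             # the product is always an integer

assert n_min(2, 5) == 620                  # weight-8 words in the (32, 16, 8) code
```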
68 CHAPTER 6. INTRODUCTION TO BINARY BLOCK CODES
The codes with r = m − 1 are single-parity-check (SPC) codes with d = 2. These codes have nominal coding gain 2(k/n), which goes to 2 (3.01 dB) as n → ∞; however, since Nd = 2^m (2^m − 1)/2, we have Kb = 2^(m−1) → ∞, which ultimately limits the effective coding gain.
The codes with r = m − 2 are extended Hamming (EH) codes with d = 4. These codes have nominal coding gain 4(k/n), which goes to 4 (6.02 dB) as n → ∞; however, since Nd = 2^m (2^m − 1)(2^m − 2)/24, we again have Kb → ∞.
Exercise 4 (optimizing SPC and EH codes). Using the rule of thumb that a factor of two increase in Kb costs 0.2 dB in effective coding gain, find the value of n for which an (n, n − 1, 2) SPC code has maximum effective coding gain, and compute this maximum in dB. Similarly, find m such that a (2^m, 2^m − m − 1, 4) extended Hamming code has maximum effective coding gain, using Nd = 2^m (2^m − 1)(2^m − 2)/24, and compute this maximum in dB.
The codes with r = 1 (first-order Reed-Muller codes) are interesting, because as shown in Exercise 5 they generate biorthogonal signal sets of dimension n = 2^m and size 2^(m+1), with nominal coding gain (m + 1)/2 → ∞. It is known that as n → ∞ this sequence of codes can achieve arbitrarily small Pr(E) for any Eb/N0 greater than the ultimate Shannon limit, namely Eb/N0 > ln 2 (−1.59 dB).
Exercise 5 (biorthogonal codes). We have shown that the first-order Reed-Muller codes RM(1, m) have parameters (2^m, m + 1, 2^(m−1)), and that the (2^m, 1, 2^m) repetition code RM(0, m) is a subcode.
(a) Show that RM(1, m) has one word of weight 0, one word of weight 2^m, and 2^(m+1) − 2 words of weight 2^(m−1). [Hint: first show that the RM(1, m) code consists of 2^m complementary codeword pairs {x, x + 1}.]
(b) Show that the Euclidean image of an RM(1, m) code is an M = 2^(m+1) biorthogonal signal set. [Hint: compute all inner products between code vectors.]
(c) Show that the code C′ consisting of all words in RM(1, m) with a 0 in any given coordinate position is a (2^m, m, 2^(m−1)) binary linear code, and that its Euclidean image is an M = 2^m orthogonal signal set. [Same hint as in part (a).]
(d) Show that the code C″ consisting of the code words of C′ with the given coordinate deleted ("punctured") is a binary linear (2^m − 1, m, 2^(m−1)) code, and that its Euclidean image is an M = 2^m simplex signal set. [Hint: use Exercise 7 of Chapter 5.]
In Exercise 2 of Chapter 1, it was shown how a 2^m-orthogonal signal set A can be constructed as the image of a 2^m × 2^m binary Hadamard matrix. The corresponding 2^(m+1)-biorthogonal signal set ±A is identical to that constructed above from the (2^m, m + 1, 2^(m−1)) first-order RM code.
The code dual to RM(r, m) is RM(m − r − 1, m); this can be shown by recursion from the facts that the (1, 1) and (1, 0) codes are duals and that, by bilinearity,

    ⟨(u, u + v), (u′, u′ + v′)⟩ = ⟨u, v′⟩ + ⟨v, u′⟩ + ⟨v, v′⟩,

since ⟨u, u′⟩ + ⟨u, u′⟩ = 0. In particular, this confirms that the repetition and SPC codes are duals, and shows that the biorthogonal and extended Hamming codes are duals.
This also shows that RM codes with k/n = 1/2 are self-dual. The nominal coding gain of a
rate-1/2 RM code of length 2m (m odd) is 2(m−1)/2 , which goes to infinity as m → ∞. It seems
likely that as n → ∞ this sequence of codes can achieve arbitrarily small Pr(E) for any Eb /N0
greater than the Shannon limit for ρ = 1 b/2D, namely Eb /N0 > 1 (0 dB).
6.4. REED-MULLER CODES 69
We provide below a table of the nominal spectral efficiency ρ, nominal coding gain γc , number
of nearest neighbors Nd , error coefficient per bit Kb , and estimated effective coding gain γeff at
Pb (E) ≈ 10−5 for various Reed-Muller codes, so that the student can consider these codes as
components in system design exercises.
In later lectures, we will consider trellis representations and trellis decoding of RM codes. We
give here two complexity parameters of the minimal trellises for these codes: the state complexity
s (the binary logarithm of the maximum number of states in a minimal trellis), and the branch
complexity t (the binary logarithm of the maximum number of branches per section in a minimal
trellis). The latter parameter gives a more accurate estimate of decoding complexity.
All of our performance estimates assume minimum-distance (MD) decoding. In other words, given a received sequence r ∈ R^n, the receiver must find the signal s(x), x ∈ C, for which the squared distance ‖r − s(x)‖² is minimum. We will show that in the case of binary codes, MD decoding reduces to maximum-reliability (MR) decoding.
Since ‖s(x)‖² = nα² is independent of x with binary constellations s(C), MD decoding is equivalent to maximum-inner-product decoding: find the signal s(x), x ∈ C, for which the inner product

    ⟨r, s(x)⟩ = Σk rk s(xk)

is maximized.
The sign sgn(rk ) ∈ {±1} is often regarded as a “hard decision” based on rk , indicating which of
the two possible signals {±α} is more likely in that coordinate without taking into account the
remaining coordinates. The magnitude |rk | may be viewed as the reliability of the hard decision.
This rule may thus be expressed as: find the codeword x ∈ C that maximizes the reliability
    r(x | r) = Σk |rk| (−1)^e(xk, rk),
where the “error” e(xk , rk ) is 0 if the signs of s(xk ) and rk agree, or 1 if they disagree. We call
this rule maximum-reliability decoding.
Any of these optimum decision rules is easy to implement for small constellations s(C). How-
ever, without special tricks they require at least one computation for every codeword x ∈ C, and
therefore become impractical when the number 2k of codewords becomes large. Finding simpler
decoding algorithms that give a good tradeoff of performance vs. complexity, perhaps only for
special classes of codes, has therefore been the major theme of practical coding research.
For example, the Wagner decoding rule, the earliest “soft-decision” decoding algorithm (circa
1955), is an optimum decoding rule for the special class of (n, n − 1, 2) SPC codes that requires
many fewer than 2n−1 computations.
Exercise 7 (“Wagner decoding”). Let C be an (n, n − 1, 2) SPC code. The Wagner decoding
rule is as follows. Make hard decisions on every symbol rk , and check whether the resulting
binary word is in C. If so, accept it. If not, change the hard decision in the symbol rk for which
the reliability metric |rk | is minimum. Show that the Wagner decoding rule is an optimum
decoding rule for SPC codes. [Hint: show that the Wagner rule finds the codeword x ∈ C that
maximizes r(x | r).]
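Wagner's rule is only a few lines; here is a Python sketch (ours), with the mapping bit 0 → +α, bit 1 → −α and α = 1:

```python
def wagner_decode(r):
    """Wagner decoding for an (n, n-1, 2) SPC code.  Convention: r[k] > 0 is the
    hard decision for bit 0 (signal +alpha), r[k] < 0 for bit 1 (signal -alpha)."""
    x = [0 if rk > 0 else 1 for rk in r]          # hard decisions
    if sum(x) % 2 != 0:                           # parity check fails
        k = min(range(len(r)), key=lambda k: abs(r[k]))
        x[k] ^= 1                                 # flip the least reliable symbol
    return x

# codeword (0, 1, 1) -> signals (+1, -1, -1); noise pushes the first symbol to -0.2
assert wagner_decode([-0.2, -0.9, -1.1]) == [0, 1, 1]
```

This uses n comparisons instead of one reliability computation per codeword, which is the point of the exercise.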
6.5. DECODING OF BINARY BLOCK CODES 71
Early work on decoding of binary block codes assumed hard decisions on every symbol, yielding
a hard-decision n-tuple y ∈ (F2 )n . The main decoding step is then to find the codeword x ∈ C
that is closest to y in Hamming space. This is called error-correction.
If C is a linear (n, k, d) code, then, since the Hamming metric is a true metric, no error can
occur when a codeword x is sent unless the number of hard decision errors t = dH (x, y) is at
least as great as half the minimum Hamming distance, t ≥ d/2. For many classes of binary
block codes, efficient algebraic error-correction algorithms exist that are guaranteed to decode
correctly provided that 2t < d. This is called bounded-distance error-correction.
Example 5 (Hamming codes). The first binary error-correction codes were the Hamming
codes (mentioned in Shannon's original paper). A Hamming code C is a (2^m − 1, 2^m − m − 1, 3) code that may be found by puncturing a (2^m, 2^m − m − 1, 4) extended Hamming RM(m − 2, m) code in any coordinate. Its dual C⊥ is a (2^m − 1, m, 2^(m−1)) code whose Euclidean image is a 2^m-simplex constellation. For example, the simplest Hamming code is the (3, 1, 3) repetition code; its dual is the (3, 2, 2) SPC code, whose image is the 4-simplex constellation of Figure 1.
The generator matrix of C⊥ is an m × (2^m − 1) matrix H whose 2^m − 1 columns must run
through the set of all nonzero binary m-tuples in some order (else C would not be guaranteed
to correct any single error; see next paragraph).
Since d = 3, a Hamming code should be able to correct any single error. A simple method for doing so is to compute the "syndrome"

    yH^T = (x + e)H^T = eH^T,

where e = x + y. If yH^T = 0, then y ∈ C and y is assumed to be correct. If yH^T ≠ 0, then the syndrome yH^T is equal to one of the rows of H^T, and a single error is assumed to have occurred in the corresponding position. Thus it is always possible to change any y ∈ (F2)^n into a codeword by changing at most one bit.
This implies that the 2^(n−m) "Hamming spheres" of radius 1 and size 2^m centered on the 2^(n−m) codewords x, which consist of x and the n = 2^m − 1 n-tuples y within Hamming distance 1 of x, form an exhaustive partition of the set of 2^n n-tuples that comprise Hamming n-space (F2)^n.
In summary, Hamming codes form a “perfect” Hamming sphere-packing of (F2 )n , and have a
simple single-error-correction algorithm.
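Syndrome decoding of the (7, 4, 3) Hamming code, as a small Python sketch (ours; H is stored transposed, one column of H per row of the list):

```python
# (7, 4, 3) Hamming code; row j of H below is the column of H for position j
H = [[int(b) for b in format(j, '03b')] for j in range(1, 8)]   # all nonzero 3-tuples

def syndrome(y):
    return [sum(y[j] * H[j][i] for j in range(7)) % 2 for i in range(3)]

def correct(y):
    """If the syndrome is nonzero, it matches the column of H at the error position."""
    s = syndrome(y)
    if any(s):
        y = list(y)
        y[H.index(s)] ^= 1
    return y

recv = [0] * 7
recv[4] ^= 1                      # single error in the all-zero codeword
assert correct(recv) == [0] * 7
```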
We now show that even if an error-correcting decoder does optimal MD decoding in Hamming space, there is a loss in coding gain of the order of 3 dB relative to MD Euclidean-space decoding.
Assume an (n, k, d) binary linear code C with d odd (the situation is worse when d is even). Let x be the transmitted codeword; then there is at least one codeword at Hamming distance d from x, and thus at least one real n-tuple in s(C) at squared Euclidean distance 4α²d from s(x). For any ε > 0, a hard-decision decoding error will occur if the noise exceeds α + ε in any (d + 1)/2 of the places in which that word differs from x. Thus with hard decisions the minimum squared distance to the decision boundary in Euclidean space is α²(d + 1)/2. (For d even, it is α²d/2.)
On the other hand, with "soft decisions" (reliability weights) and MD decoding, the minimum squared distance to any decision boundary in Euclidean space is α²d. To the accuracy of the union bound estimate, the argument of the Q function thus decreases with hard-decision decoding by a factor of (d + 1)/2d, or approximately 1/2 (−3 dB) when d is large. (When d is even, this factor is exactly 1/2.)
Example 6 (Hard and soft decoding of antipodal codes). Let C be the (2, 1, 2) binary code;
then the two signal points in s(C) are antipodal, as shown in Figure 3(a) below. With hard
decisions, real 2-space R2 is partitioned into four quadrants, which must then be assigned to one
or the other of the two signal points. Of course, two of the quadrants are assigned to the signal
points that they contain. However, no matter how the other two quadrants are assigned, there will be at least one decision boundary at squared distance α² from a signal point, whereas with MD decoding the decision boundary is at squared distance 2α² from both signal points. The loss in the error exponent of Pb(E) is therefore a factor of 2 (3 dB).
Figure 3. Decision regions in Rn with hard decisions. (a) (2, 1, 2) code; (b) (3, 1, 3) code.
Similarly, if C is the (3, 1, 3) code, then R3 is partitioned by hard decisions into 8 octants, as
shown in Figure 3(b). In this case (the simplest example of a Hamming code), it is clear how
best to assign four octants to each signal point. The squared distance from each signal point
to the nearest decision boundary is now 2α2 , compared to 3α2 with “soft decisions” and MD
decoding in Euclidean space, for a loss of 2/3 (1.76 dB) in the error exponent.
6.5.3 Erasure-and-error-correction
An intermediate strategy between hard decisions and MD decoding is erasure-and-error-correction.
The decoder maps each received symbol rk into one of the three values {0, 1, ?}, where “?” denotes
an erasure, according to a threshold T ≥ 0:
rk → 0 if rk > T ;
rk → 1 if rk < −T ;
rk → ? if −T ≤ rk ≤ T.
The decoder subsequently tries to map the ternary-valued n-tuple into the closest codeword
x ∈ C in Hamming space, where the erased positions are ignored in measuring Hamming distance.
If there are s erased positions, then the minimum distance between codewords is at least
d − s in the unerased positions, so correct decoding is guaranteed if the number t of errors in the
unerased positions satisfies t < (d−s)/2, or equivalently if 2t+ s < d. For many classes of binary
block codes, efficient algebraic erasure-and-error-correcting algorithms exist that are guaranteed
to decode correctly if 2t + s < d. This is called bounded-distance erasure-and-error-correction.
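The threshold rule and a bounded-distance decoder can be sketched in a few lines of Python; the function names and the brute-force minimization over the codebook are our illustrative choices, suitable only for small codes:

```python
def quantize(r, T):
    """Map each received value to 0, 1, or '?' per the threshold rule
    above (with the convention that symbol 0 is sent as +alpha)."""
    return [0 if rk > T else 1 if rk < -T else '?' for rk in r]

def erasure_and_error_decode(y, codewords):
    """Pick the codeword at minimum Hamming distance from y, ignoring
    erased positions (brute force over the codebook)."""
    def dist(c):
        return sum(1 for yk, ck in zip(y, c) if yk != '?' and yk != ck)
    return min(codewords, key=dist)

code = [(0, 0, 0), (1, 1, 1)]              # the (3, 1, 3) repetition code
y = quantize([0.9, 0.1, 1.1], T=0.3)       # -> [0, '?', 0]: s = 1, t = 0, so 2t + s < d
print(erasure_and_error_decode(y, code))   # -> (0, 0, 0)
```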
6.5. DECODING OF BINARY BLOCK CODES 73
Figure 4. Decision regions with hard decisions and erasures for the (2, 1, 2) code.
Example 6 (cont.). Figure 4 shows the 9 decision regions for the (2, 1, 2) code that result from
hard decisions and/or erasures on each symbol. Three of the resulting regions are ambiguous.
The minimum squared distances to these regions are
a² = 2(α − T )²;
b² = (α + T )².
To maximize the minimum of a² and b², we make a² = b² by choosing T = ((√2 − 1)/(√2 + 1))α,
which yields
a² = b² = (8/(√2 + 1)²)α² = 1.372α².
This is about 1.38 dB better than the squared Euclidean distance α2 achieved with hard decisions
only, but is still 1.63 dB worse than the 2α2 achieved with MD decoding.
Exercise 8 (Optimum threshold T ). Let C be a binary code with minimum distance d, and
let received symbols be mapped into hard decisions or erasures as above. Show that:
(a) For any integers t and s such that 2t + s ≥ d and for any decoding rule, there exists some
pattern of t errors and s erasures that will cause a decoding error;
(b) The minimum squared distance from any signal point to its decoding decision boundary is
equal to at least min_{2t+s≥d} {s(α − T )² + t(α + T )²};
(c) The value of T that maximizes this minimum squared distance is T = ((√2 − 1)/(√2 + 1))α,
in which case the minimum squared distance is equal to (4/(√2 + 1)²)α²d = 0.686 α²d. Again,
this is a loss of 1.63 dB relative to the squared distance α²d that is achieved with MD decoding.
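The optimization of the threshold in Example 6 is easy to check numerically; a minimal Python sketch, assuming α = 1, maximizes the minimum of a² = 2(α − T)² and b² = (α + T)² by grid search:

```python
import math

# Grid search over T for the (2, 1, 2) example of Example 6: maximize
# the minimum of a^2 = 2(alpha - T)^2 and b^2 = (alpha + T)^2.
alpha = 1.0
best_T, best_val = max(
    ((T, min(2 * (alpha - T) ** 2, (alpha + T) ** 2))
     for T in (i / 100000 for i in range(50000))),
    key=lambda p: p[1],
)
# Closed-form optimum from the text, for comparison:
T_star = (math.sqrt(2) - 1) / (math.sqrt(2) + 1) * alpha
print(round(best_T, 4), round(best_val, 4), round(T_star, 4))  # -> 0.1716 1.3726 0.1716
```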
6.5.4 Generalized minimum-distance decoding
A further step in this direction that achieves almost the same performance as MD decoding,
to the accuracy of the union bound estimate, yet still permits algebraic decoding algorithms, is
generalized minimum-distance (GMD) decoding.
In GMD decoding, the decoder keeps both the hard decision sgn(rk ) and the reliability |rk | of
each received symbol, and orders them in order of their reliability.
The GMD decoder then performs a series of erasure-and-error decoding trials in which the
s = d − 1, d − 3, . . . least reliable symbols are erased. (The intermediate trials are not necessary
because if d − s is even and 2t < d − s, then also 2t < d − s − 1, so the trial with one additional
erasure will find the same codeword.) The number of such trials is d/2 if d is even, or (d + 1)/2
if d is odd; i.e., the number of trials needed is ⌈d/2⌉.
Each trial may produce a candidate codeword. The set of ⌈d/2⌉ trials may thus produce up
to ⌈d/2⌉ distinct candidate codewords. These words may finally be compared according to their
likelihoods p(r | x) (or any equivalent optimum metric), and the best candidate chosen.
Example 7. For an (n, n − 1, 2) SPC code, GMD decoding performs just one trial with
the least reliable symbol erased; the resulting candidate codeword is the unique codeword that
agrees with all unerased symbols. Therefore in this case the GMD decoding rule is equivalent
to the Wagner decoding rule (Exercise 7), which implies that it is optimum.
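For the SPC case, the Wagner rule is easy to state in code; a minimal Python sketch (the mapping 0 → +1, 1 → −1 and the function name are our conventions):

```python
def wagner_decode(r):
    """Wagner rule for an (n, n-1, 2) SPC code (0 -> +1, 1 -> -1):
    take hard decisions; if their parity fails, flip the least
    reliable position, i.e. the one with the smallest |rk|."""
    hard = [0 if rk > 0 else 1 for rk in r]
    if sum(hard) % 2 == 1:                          # parity check fails
        k = min(range(len(r)), key=lambda i: abs(r[i]))
        hard[k] ^= 1
    return hard

print(wagner_decode([0.8, -0.7, 0.1]))  # parity fails; flips position 2 -> [0, 1, 1]
```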
It can be shown that no error can occur with a GMD decoder provided that the squared norm
||n||2 of the noise vector is less than α2 d; i.e., the squared distance from any signal point to its
decision boundary is α2 d, just as for MD decoding. Thus there is no loss in coding gain or error
exponent compared to MD decoding.
It has been shown that for the most important classes of algebraic block codes, GMD decoding
can be performed with little more complexity than ordinary hard-decision or erasures-and-errors
decoding. Furthermore, it has been shown that not only is the error exponent of GMD decod-
ing equal to that of optimum MD decoding, but also the error coefficient and thus the union
bound estimate are the same, provided that GMD decoding is augmented to include a d-erasure-
correction trial (a purely algebraic solution of the n − k linear parity-check equations for the d
unknown erased symbols).
However, GMD decoding is a bounded-distance decoding algorithm, so its decision regions are
like spheres of squared radius α2 d that lie within the MD decision regions Rj . For this reason
GMD decoding is inferior to MD decoding, typically improving over erasure-and-error-correction
by 1 dB or less. GMD decoding has rarely been used in practice.
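The trial structure of GMD decoding can be sketched for a small codebook. The following Python is a brute-force illustration only (not an efficient algebraic decoder); the function names are ours, and the candidates are compared by a correlation metric, which is an optimum metric for equal-energy binary signaling:

```python
def gmd_decode(r, codewords, d):
    """GMD sketch: run erasure-and-error trials with the
    s = d-1, d-3, ... least reliable symbols erased, then pick the
    best candidate by correlation (0 -> +1, 1 -> -1)."""
    n = len(r)
    order = sorted(range(n), key=lambda i: abs(r[i]))  # least reliable first
    hard = [0 if rk > 0 else 1 for rk in r]
    candidates = set()
    for s in range(d - 1, -1, -2):
        erased = set(order[:s])
        dist = lambda c: sum(1 for i in range(n)
                             if i not in erased and hard[i] != c[i])
        candidates.add(min(codewords, key=dist))
    corr = lambda c: sum(r[i] * (1 if c[i] == 0 else -1) for i in range(n))
    return max(candidates, key=corr)

repetition = [(0, 0, 0), (1, 1, 1)]  # the (3, 1, 3) code
print(gmd_decode([0.9, -0.2, 0.4], repetition, d=3))  # -> (0, 0, 0)
```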
6.5.5 Summary
In conclusion, hard decisions allow the use of efficient algebraic decoding algorithms, but incur
a significant SNR penalty, of the order of 3 dB. By using erasures, about half of this penalty
can be avoided. With GMD decoding, efficient algebraic decoding algorithms can in principle
be used with no loss in performance, at least as estimated by the union bound estimate.
Chapter 7
Introduction to Finite Fields
This chapter provides an introduction to several kinds of abstract algebraic structures, partic-
ularly groups, fields, and polynomials. Our primary interest is in finite fields, i.e., fields with
a finite number of elements (also called Galois fields). In the next chapter, finite fields will be
used to develop Reed-Solomon (RS) codes, the most useful class of algebraic codes. Groups and
polynomials provide the requisite background to understand finite fields.
A field is more than just a set of elements: it is a set of elements under two operations,
called addition and multiplication, along with a set of properties governing these operations.
The addition and multiplication operations also imply inverse operations called subtraction and
division. The reader is presumably familiar with several examples of fields, such as the real field
R, the complex field C, the field of rational numbers Q, and the binary field F2 .
7.1 Summary
In this section we briefly summarize the results of this chapter. The main body of the chapter
will be devoted to defining and explaining these concepts, and to proofs of these results.
For each prime p and positive integer m ≥ 1, there exists a finite field Fpm with pm elements,
and there exists no finite field with q elements if q is not a prime power. Any two fields with pm
elements are isomorphic.
The integers modulo p form a prime field Fp under mod-p addition and multiplication. The
polynomials Fp [x] over Fp modulo an irreducible polynomial g(x) ∈ Fp [x] of degree m form a
finite field with pm elements under mod-g(x) addition and multiplication. For every prime p,
there exists at least one irreducible polynomial g(x) ∈ Fp [x] of each positive degree m ≥ 1, so
all finite fields may be constructed in this way.
Under addition, Fpm is isomorphic to the vector space (Fp)^m. Under multiplication, the nonzero
elements of Fpm form a cyclic group {1, α, . . . , α^(p^m − 2)} generated by a primitive element α ∈ Fpm.
The elements of Fpm are the p^m roots of the polynomial x^(p^m) − x ∈ Fp[x]. The polynomial
x^(p^m) − x is the product of all monic irreducible polynomials g(x) ∈ Fp[x] such that deg g(x)
divides m. The roots of a monic irreducible polynomial g(x) ∈ Fp[x] form a cyclotomic coset of
deg g(x) elements of Fpm which is closed under the operation of raising to the pth power.
For every n that divides m, Fpm contains a subfield with pn elements.
76 CHAPTER 7. INTRODUCTION TO FINITE FIELDS
For further reading on this beautiful subject, see [E. R. Berlekamp, Algebraic Coding Theory,
Aegean Press, 1984], [R. Lidl and H. Niederreiter, Introduction to Finite Fields and their
Applications, Cambridge University Press, 1986], [R. J. McEliece, Finite Fields for Computer
Scientists and Engineers, Kluwer, 1987], [M. R. Schroeder, Number Theory in Science and
Communication, Springer, 1986], or indeed any book on finite fields or algebraic coding theory.
7.2.1 Definitions
Given a positive integer n, every integer i may be uniquely expressed as i = qn + r for some
integer remainder r in the interval 0 ≤ r ≤ n − 1 and some integer quotient q. This may be
proved by the Euclidean division algorithm, which if i ≥ n just subtracts n from i repeatedly
until the remainder lies in the desired interval.
The remainder r, denoted by r = i mod n, is the more important part of this expression. The
set of possible mod-n remainders is the set of n integers Rn = {0, 1, . . . , n − 1}. Evidently n is
a divisor of i if and only if i mod n = 0.
Remainder arithmetic using the mod-n remainder set Rn is called “mod-n arithmetic.” The
rules for mod-n arithmetic follow from the rules for integer arithmetic as follows. Let r = i mod n
and s = j mod n; then, as integers, r = i − qn and s = j − tn for some quotients q and t. Then
r + s = i + j − (q + t)n;
rs = ij − (qj + ti)n + qtn2 .
Hence (r + s) mod n = (i + j) mod n and rs mod n = ij mod n; i.e., the mod-n remainder of
the sum or product of two integers is equal to the mod-n remainder of the sum or product of
their mod-n remainders, as integers.
The mod-n addition and multiplication rules are therefore defined as follows:
r ⊕ s = (r + s) mod n;
r ∗ s = (rs) mod n,
where “r” and “s” denote elements of the remainder set Rn on the left and the corresponding
ordinary integers on the right. This makes mod-n arithmetic consistent with ordinary integer
arithmetic in the sense expressed in the previous paragraph.
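This consistency is easy to verify by machine; a small Python check (n = 7 is our arbitrary choice):

```python
n = 7
add = lambda r, s: (r + s) % n   # the mod-n addition rule
mul = lambda r, s: (r * s) % n   # the mod-n multiplication rule

# The mod-n remainder of a sum or product of two integers equals the
# mod-n sum or product of their mod-n remainders. (Python's % already
# returns a remainder in {0, ..., n-1}, even for negative integers.)
for i in range(-20, 20):
    for j in range(-20, 20):
        assert (i + j) % n == add(i % n, j % n)
        assert (i * j) % n == mul(i % n, j % n)
print("mod-%d arithmetic is consistent with integer arithmetic" % n)
```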
Given a positive integer i, we may factor i into a unique product of prime factors by simply
factoring out primes no greater than i until we arrive at the quotient 1, as the reader has known
since grade school. For the time being, we will take this unique factorization property as given.
A proof will be given as an exercise after we prove the corresponding property for polynomials.
7.3 Groups
We now introduce groups.
Definition 7.1 A group is a set of elements G = {a, b, c, . . .} and an operation ⊕ for which the
following axioms hold:
• Closure: for all a, b ∈ G, the element a ⊕ b is in G;
• Associativity: for all a, b, c ∈ G, (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c);
• Identity: there is an identity element 0 ∈ G such that a ⊕ 0 = 0 ⊕ a = a for all a ∈ G;
• Inverses: for each a ∈ G there is an inverse element −a ∈ G such that a ⊕ (−a) = 0.
A group G is called abelian if the operation ⊕ is also commutative: a ⊕ b = b ⊕ a for all a, b ∈ G.
For example, the set of real numbers R forms an abelian group under ordinary addition, and the
set R − {0} of nonzero real numbers forms an abelian group under ordinary multiplication.
This example illustrates that the group structure (i.e., the properties stemming from the group
operation ⊕) may reflect only part of the structure of the given set of elements; e.g., the additive
group structure of R takes no account of the fact that real numbers may also be multiplied, and
the multiplicative group structure of R − {0} takes no account of the fact that real numbers may
also be added.
We abbreviate b ⊕ (−a) for any a, b ∈ G by b − a and regard “−” as an additional opera-
tion implicitly defined by the axioms. In an additive group, “−” is called subtraction; in a
multiplicative group, “−” is called division and denoted by / or ÷.
Because of the inverse operation, cancellation is always permissible; i.e., if x ⊕ a = y ⊕ a, we
can add −a to both sides, showing that x = y. Similarly, one can move terms from one side of
an equation to the other; i.e., x ⊕ a = y implies x = y − a.
Exercise 1 (Inverses and cancellation)
(a) Verify the following set of implications for arbitrary elements a, b of a group G which is
not necessarily abelian:
b ⊕ a = 0 ⇒ b = −a ⇒ a ⊕ b = 0 ⇒ a = −b ⇒ b ⊕ a = 0.
(b) Use this result to show that the inverse is unique, i.e., that a ⊕ b = 0 ⇒ b = −a, and
that the inverse also works on the left, i.e., b ⊕ a = 0 ⇒ b = −a. Note that this shows that
cancellation is permitted on either the right or the left.
(c) Show that the identity element is unique, i.e., that for a, b ∈ G, a ⊕ b = a ⇒ b = 0 and
b ⊕ a = a ⇒ b = 0.
If G has a finite number of elements, G = {a1 , a2 , . . . , an }, then G is said to be finite and
|G| = n is said to be the order of G. The group operation ⊕ may then be specified by an n × n
“addition table” whose entry at row i, column j is ai ⊕ aj . The cancellation property implies
that if aj ≠ ak , then ai ⊕ aj ≠ ai ⊕ ak . This means that all elements in any row i of the addition
table are distinct; i.e., each row contains each element of G exactly once. Similarly, each column
contains each element of G exactly once. Thus the group axioms restrict the group operation ⊕
more than might be immediately evident.
The property that a “row of the addition table,” namely a ⊕ G = {a ⊕ b | b ∈ G} is just the set
of elements of G in a different order (i.e., a permutation of G) is a fundamental property of any
group G. We will now show that this permutation property may be taken as one of the group
axioms. Subsequently we will use this property to prove that certain sets are groups.
Theorem 7.1 (Permutation property) Let G be a set of elements on which an associative
operation ⊕ is defined with an identity element 0 ∈ G. Then G is a group under ⊕ if and only
if a ⊕ G is a permutation of G for every a ∈ G.
Proof. (⇒) If G is a group under ⊕, then by the closure property every element a ⊕ b is in G.
Moreover, the fact that a ∈ G has an inverse −a ∈ G implies that every element b ∈ G may be
written as a ⊕ (−a ⊕ b) ∈ a ⊕ G, so every element of G is in a ⊕ G. Finally, from the cancellation
property, a ⊕ b = a ⊕ c implies b = c. Thus the correspondence between G and a ⊕ G defined by
b ↔ a ⊕ b is one-to-one; i.e., a permutation.
(⇐) Conversely, if a ⊕ G is a permutation of G for every a ∈ G, then (a) the closure property
holds; i.e., a ⊕ b ∈ G for all a, b ∈ G; (b) since 0 ∈ a ⊕ G, there must exist a unique b ∈ G such
that a ⊕ b = 0, so a has a unique inverse −a = b under ⊕. Thus G is a group under ⊕.
The properties of “rows” a ⊕ G hold equally for “columns” G ⊕ a, even when G is nonabelian.
For example, the set R∗ of nonzero elements of the real field R forms an abelian group under
real multiplication, because real multiplication is associative and commutative with identity 1,
and αR∗ is a permutation of R∗ for any α ∈ R∗ .
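The permutation property is easy to test numerically; a small Python sketch (the function name is ours) checks whether every row a ⊕ G of an operation table is a permutation of G:

```python
def is_group_table(G, op):
    """Check the permutation property: a 'op' G must be a permutation
    of G for every a in G (given associativity and an identity)."""
    return all(sorted(op(a, b) for b in G) == sorted(G) for a in G)

Z6 = list(range(6))
print(is_group_table(Z6, lambda a, b: (a + b) % 6))             # True: additive group Z6
print(is_group_table([1, 2, 3, 4, 5], lambda a, b: a * b % 6))  # False: 2 * 3 = 0 mod 6
```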
Exercise 2 (Invertible subsets).
(a) Let H be a set of elements on which an associative operation ⊕ is defined with identity 0,
and let G be the subset of elements h ∈ H which have unique inverses −h such that h ⊕ −h = 0.
Show that G is a group under ⊕.
(b) Show that the nonzero elements of the complex field form a group under complex multi-
plication.
(c) Show that the set of invertible n × n real matrices forms a (nonabelian) group under real
matrix multiplication.
(d) What are the invertible elements of Z under multiplication? Do they form a group?
7.3.2 Cyclic groups
A finite group G is said to be cyclic if there is an element g ∈ G, called a generator, such that
every element of G can be expressed as ig for some positive integer i, where
1g = g, 2g = g ⊕ g, . . . , ig = g ⊕ · · · ⊕ g (i terms), . . .
Since g generates G and G includes the identity element 0, we must have ig = 0 for some positive
integer i. Let n be the smallest such integer; thus ng = 0 and ig ≠ 0 for 1 ≤ i ≤ n − 1. Adding
the sum of j g’s for any j > 0 to each side of ig ≠ 0 results in (i + j)g ≠ jg. Thus the elements
{1g, 2g, . . . , ng = 0} must all be different.
1. Mathematicians say also that an infinite group G = {. . . , −1g, 0g, 1g, 2g, . . .} generated by a single element g
is cyclic; e.g., the group of integers Z is an infinite cyclic group with generator 1. Although such infinite cyclic
groups have the single-generator property of finite cyclic groups, they do not “cycle.” Hereafter, “cyclic group”
will mean “finite cyclic group.”
We can also add jg to both sides of the equality ng = 0, yielding (j + n)g = jg for any j > 0.
Thus for each i > n, ig is equal to some earlier element in the sequence, namely (i − n)g. The
elements {1g, 2g, . . . , ng = 0} therefore constitute all of the distinct elements in G, and the order
of G is |G| = n. If we define 0g to be the identity 0, then the elements of G may be conveniently
represented as G = {0g = 0, 1g, . . . , (n − 1)g}.
Figure 1 illustrates the cyclic structure of G that arises from the relation (j + n)g = jg.
Figure 1. The cyclic structure of a cyclic group: the sequence {1g, 2g, . . .} goes from the group
element g up to ng = 0, then returns to g and continues to cycle.
Theorem 7.2 (Cyclic groups) The elements of a cyclic group G of order n with generator
g are {0g, 1g, 2g, . . . , (n − 1)g}. The addition rule is ig ⊕ jg = (i + j mod n)g, the identity is
0g, and the inverse of ig ≠ 0g is (n − i)g. Finally, G is isomorphic to Zn under the one-to-one
correspondence ig ↔ i.
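Theorem 7.2 can be illustrated concretely; a small Python sketch (the function name is ours), using the multiplicative group of nonzero residues mod 7, which is cyclic with generator 3:

```python
def cyclic_group(g, op, identity):
    """Generate S(g) = {1g, 2g, ...}, stopping when ng = identity."""
    elems, x = [], g
    while True:
        elems.append(x)
        if x == identity:
            return elems
        x = op(x, g)

# The nonzero residues mod 7 under multiplication, generated by 3:
S = cyclic_group(3, lambda a, b: a * b % 7, 1)
print(S)  # -> [3, 2, 6, 4, 5, 1], so the order is n = 6
# Addition rule of Theorem 7.2: ig 'op' jg = ((i + j) mod n)g, with S[i] = (i+1)g.
n = len(S)
assert all(S[i] * S[j] % 7 == S[(i + j + 1) % n]
           for i in range(n) for j in range(n))
```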
7.3.3 Subgroups
A subgroup S of a group G is a subset of the elements of the group such that if a, b ∈ S, then
a ⊕ b ∈ S and −a ∈ S. A subgroup S therefore includes the identity element of G and the
inverse of each element in S. The associative law holds for S since it holds for G. Therefore a
subgroup S ⊆ G is itself a group under the group operation of G.
For example, the set of integers Z is a subgroup of the additive group of R.
If G is abelian, then S must be abelian; however, S may be abelian even if G is nonabelian.
For any g ∈ G, we define the coset (translate) S ⊕ g = {s ⊕ g | s ∈ S}. The zero coset S ⊕ 0 is
thus equal to S itself; moreover, by Theorem 7.1, S ⊕ g = S whenever g ∈ S.
The following lemma states a more general result:
Lemma 7.3 (Cosets) Two cosets S ⊕ g and S ⊕ h are the same if g − h ∈ S, but are disjoint
if g − h ∉ S.
It follows that the distinct cosets of a subgroup S partition the group G. Moreover, since the
map s → s ⊕ g is one-to-one, every coset S ⊕ g has |S| elements. This proves:
Theorem 7.4 (Lagrange) If S is a subgroup of a finite group G, then |S| divides |G|.
Given any finite group G and any element g ∈ G, the set of elements generated by g, namely
S(g) = {g, g ⊕ g, . . .}, is a cyclic subgroup of G. The order of g is defined as the order |S(g)|
of S(g). By Lagrange’s theorem, |S(g)| divides |G|, and by the cyclic groups theorem, S(g) is
isomorphic to Z|S(g)| . (If g = 0, then S(g) = {0} and |S(g)| = 1. We will assume g = 0.)
As a fundamental example, let G be the cyclic group Zn = {0, 1, . . . , n − 1}, and let S(m) be
the cyclic subgroup {m, 2m, . . .} generated by m ∈ Zn . Here im = m ⊕ · · · ⊕ m is simply the sum
of m with itself i times; i.e., im ∈ G is the ordinary product im mod n. The order |S(m)| of
S(m) is the least positive integer k such that km = 0 mod n; i.e., such that the integer product
km is divisible by n. Thus km is the least common multiple of m and n, denoted lcm(m, n), and
|S(m)| = k = lcm(m, n)/m. By elementary number theory, lcm(m, n) = mn/ gcd(m, n) for any
positive integers m, n, so we may alternatively write |S(m)| = n/ gcd(m, n), where gcd(m, n)
denotes the greatest common divisor of m and n. This shows explicitly that |S(m)| divides n.
For example, suppose n = 10 and m = 4. Then S(4) = {4, 8, 2, 6, 0}. Thus |S(4)| = 5,
consistent with |S(4)| = lcm(4, 10)/4 = 20/4 = 5 and with |S(4)| = 10/ gcd(4, 10) = 10/2 = 5.
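The formula |S(m)| = n/gcd(m, n) is easy to check by machine; a small Python sketch (the function name is ours):

```python
from math import gcd

def S(m, n):
    """Cyclic subgroup of Zn generated by m: {m, 2m, 3m, ...} mod n."""
    elems, x = [], m % n
    while x not in elems:
        elems.append(x)
        x = (x + m) % n
    return elems

print(S(4, 10))                            # -> [4, 8, 2, 6, 0]
assert len(S(4, 10)) == 10 // gcd(4, 10)   # |S(m)| = n / gcd(m, n)
```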
Now when does S(m) = Zn ? This occurs if and only if gcd(m, n) = 1; i.e., if and only if m is
relatively prime to n. In short, m generates Zn and has order |S(m)| = n if and only if m and
n are relatively prime. The number of integers in the set {0, 1, . . . , n − 1} that have order n is
called the Euler number φ(n).
For example, in Z10 the integers that are relatively prime to 10 are {1, 3, 7, 9}, so φ(10) = 4.
The orders of the other elements of Z10 are as follows:
• {2, 4, 6, 8} have order 5, and S(2) = S(4) = S(6) = S(8) = {0, 2, 4, 6, 8};
• {5} has order 2, and S(5) = {0, 5};
• {0} has order 1, and S(0) = {0}.
In general, Zn has a cyclic subgroup Sd of order d for each positive integer d that divides n,
including 1 and n. Sd consists of {0, n/d, 2n/d, . . . , (d −1)n/d}, and is isomorphic to Zd . Sd thus
contains φ(d) elements that are relatively prime to d, each of which has order d and generates
Sd . The remaining elements of Sd belong also to smaller cyclic subgroups.
For example, Z10 has a subgroup S5 = {0, 2, 4, 6, 8} with 5 elements. Four of these elements,
namely {2, 4, 6, 8}, are relatively prime to 5 and generate S5 . The remaining element of S5 ,
namely 0, has order 1.
Since every element of Zn has some definite order d that divides n, we have
n = Σ_{d: d|n} φ(d).    (7.1)
The notation d : d|n means the set of positive integers d, including 1 and n, that divide n.
All Euler numbers may be determined recursively from this expression. For example, φ(1) =
1, φ(2) = 2 − φ(1) = 1, φ(3) = 3 − φ(1) = 2, φ(4) = 4 − φ(1) − φ(2) = 2, . . ..
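This recursion can be sketched directly in Python (the function name is ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def phi(n):
    """Euler number phi(n), computed recursively from equation (7.1):
    n equals the sum of phi(d) over all divisors d of n."""
    if n == 1:
        return 1
    return n - sum(phi(d) for d in range(1, n) if n % d == 0)

print([phi(n) for n in (1, 2, 3, 4, 10)])  # -> [1, 1, 2, 2, 4]
```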
Exercise 3. Show that φ(n) ≥ 1 for all n ≥ 1. [Hint: Find the order of 1 in Zn .]
Since every cyclic group G of size n is isomorphic to Zn , these results apply to every cyclic
group. In particular, every cyclic group G of size n has φ(n) generators that generate G, which
are called the primitive elements of G. G also contains one cyclic subgroup of size d for each d
that divides n.
Exercise 4. Show that every subgroup of Zn is cyclic. [Hint: Let s be the smallest nonzero
element in a subgroup S ⊆ Zn , and compare S to the subgroup generated by s.]
7.4 Fields
Definition 7.2 A field is a set F of at least two elements, with two operations ⊕ and ∗, for
which the following axioms are satisfied:
• The set F forms an abelian group (whose identity is called 0) under the operation ⊕.
• The set F∗ = F − {0} = {a ∈ F : a ≠ 0} forms an abelian group (whose identity is called 1)
under the operation ∗.
• Distributive law: For all a, b, c ∈ F, (a ⊕ b) ∗ c = (a ∗ c) ⊕ (b ∗ c).
The operation ⊕ is called addition (and often denoted by +), and the operation ∗ is called
multiplication (and often denoted by juxtaposition). As in ordinary arithmetic, we often omit the
parentheses around a product of elements, using the convention “multiplication before addition;”
e.g., we interpret a ⊕ b ∗ c as a ⊕ (b ∗ c).
The reader may verify that R, C, Q and F2 each form a field according to this definition under
conventional addition and multiplication.
Exercise 5. Show that for any element a ∈ F, a ∗ 0 = 0.
A fundamental example of a finite (Galois) field is the set Fp of mod-p remainders, where p is
a given prime number. Here, as in Zp , the set of elements is Rp = {0, 1, · · · , p − 1}, and the
operation ⊕ is mod-p addition. The multiplicative operation ∗ is mod-p multiplication; i.e.,
multiply integers as usual and then take the remainder after division by p.
Theorem 7.5 (Prime fields) For every prime p, the set Rp = {0, 1, · · · , p − 1} forms a field
(denoted by Fp ) under mod-p addition and multiplication.
Proof. We have already seen that the elements of Fp form an abelian group under addition
modulo p, namely the cyclic group Zp .
The associative and commutative properties of multiplication mod p follow from the corre-
sponding properties of ordinary multiplication; the distributive law follows from the correspond-
ing property for ordinary addition and multiplication. The multiplicative identity is 1.
To see that the nonzero elements F∗p = Fp − {0} form a group under multiplication, we use
Theorem 7.1. By unique factorization, the product of two nonzero integers a, b < p cannot
equal 0 mod p. Therefore the nonzero elements F∗p are closed under multiplication mod p. Also,
for a, b, c ∈ F∗p and b ≠ c, we have a(b − c) mod p ≠ 0. Thus ab ≠ ac mod p, which implies
a ∗ b ≠ a ∗ c. Consequently there are no zeroes or repetitions in the set of p − 1 elements
{a ∗ 1, a ∗ 2, . . . , a ∗ (p − 1)}, which means they must be a permutation of F∗p .
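The permutation argument in this proof can be tested directly; a small Python sketch (the function name is ours) distinguishes prime from composite moduli:

```python
def nonzero_rows_are_permutations(n):
    """True iff {a*1, ..., a*(n-1)} mod n is a permutation of
    {1, ..., n-1} for every nonzero a, as in the proof above."""
    R = list(range(1, n))
    return all(sorted(a * b % n for b in R) == R for a in R)

print(nonzero_rows_are_permutations(7))   # True: F7 is a field
print(nonzero_rows_are_permutations(6))   # False: Z6 has zero divisors (2 * 3 = 0)
```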
We next show that Fp is essentially the only field with p elements. More precisely, we show
that all fields with p elements are isomorphic. Two fields F and G are isomorphic if there
is an invertible function h : F → G mapping each α ∈ F into a β = h(α) ∈ G such that
h(α ⊕ α′) = h(α) ⊕ h(α′) and h(α ∗ α′) = h(α) ∗ h(α′). Less formally, F and G are isomorphic
if there is a one-to-one correspondence F ↔ G that translates the addition and multiplication
tables of F to those of G and vice versa.
Let F be any field with a prime number p of elements. By the field axioms, F has an additive
identity 0 and multiplicative identity 1. Consider the additive cyclic subgroup generated by 1,
namely S(1) = {1, 1 ⊕ 1, . . .}. By Lagrange’s theorem, the order of S(1) divides |F| = p, and
therefore must be equal to 1 or p. But the order of 1 cannot be 1, else 1 = 0, contradicting the
field axioms; so 1 must have order p. In other
words, S(1) = F, and the additive group of F is isomorphic to that of Zp . We may therefore
denote the elements of F by {0, 1, 2, . . . , p − 1}, and use mod-p addition as the addition rule.
The only remaining question is whether this correspondence F ↔ Zp under addition extends
to multiplication. The distributive law shows that it does: j ∗ i is the sum of j terms each equal
to i, so j ∗ i = (ji mod p). Therefore, in summary:
Theorem 7.6 (Prime field uniqueness) Every field F with a prime number p of elements is
isomorphic to Fp via the correspondence 1 ⊕ 1 ⊕ · · · ⊕ 1 (i terms) ∈ F ↔ i ∈ Fp .
In view of this elementary isomorphism, we will denote any field with a prime number p of
elements by Fp .
It is important to note that the set Zn of integers mod n does not form a field if n is not prime.
The reason is that n = ab for some positive integers a, b < n; regarded as elements of Zn , these
satisfy ab = 0 mod n, so the set of nonzero elements of Zn is not closed under multiplication mod n.
However, we will see shortly that there do exist finite fields with non-prime numbers of elements
that use other rules for addition and multiplication.
A subfield G of a field F is a subset of the field that is itself a field under the operations of F.
For example, the real field R is a subfield of the complex field C. We now show that every finite
field Fq has a subfield that is isomorphic to a prime field Fp .
Let Fq be a finite field with q elements. By the field axioms, Fq has an additive identity 0 and
a multiplicative identity 1.
Consider the cyclic subgroup of the additive group of Fq that is generated by 1, namely
S(1) = {1, 1 ⊕ 1, . . .}. Let n = |S(1)|. By the cyclic group theorem, S(1) is isomorphic to Zn ,
and its elements may be denoted by {0, 1, 2, . . . , n − 1}, with mod-n addition.
By the distributive law in Fq , the product i∗j (in Fq ) of two nonzero elements in S(1) is simply
the sum of ij ones, which is an element of S(1), namely ij mod n. Since this is a product of
nonzero elements of Fq , by the field axioms ij mod n must be nonzero for all nonzero i, j. This
will be true if and only if n is a prime number p.
Thus S(1) forms a subfield of Fq with a prime number p of elements. By the prime field
theorem of the previous subsection, S(1) is isomorphic to Fp . Thus the elements of S(1), which
are called the integers of Fq , may be denoted by Fp = {0, 1, . . . , p − 1}, and the addition and
multiplication rules of Fq reduce to mod-p addition and multiplication in Fp .
The prime p is called the characteristic of Fq . Since the p-fold sum of the identity 1 with itself
is 0, the p-fold sum of every field element β ∈ Fq with itself is 0: pβ = 0.
In summary:
Theorem 7.7 (Prime subfields) The integers {1, 1 ⊕ 1, . . .} of any finite field Fq form a sub-
field Fp ⊆ Fq with a prime number p of elements, where p is the characteristic of Fq .
7.5 Polynomials
We now consider polynomials over Fp , namely polynomials whose coefficients lie in Fp and
for which polynomial addition and multiplication is performed in Fp . We will see that the
factorization properties of polynomials are similar to those of the integers, and that the analogue
to mod-n arithmetic is arithmetic modulo a polynomial f (x).
A nonzero polynomial f (x) of degree m over a field F is an expression3 of the form
f (x) = f0 + f1 x + f2 x² + · · · + fm x^m ,
where the coefficients f0 , . . . , fm lie in F and fm ≠ 0. The set of all polynomials over F is
denoted by F[x]. Polynomials are added componentwise, and are multiplied by the usual
convolutional rule: if f (x) = h(x)g(x), then the coefficients of the product are
fi = Σ_{j=0}^{i} hj gi−j .
If two nonzero polynomials are multiplied, then their degrees add; i.e., deg(h(x)g(x)) =
deg h(x) + deg g(x). The convention deg 0 = −∞ ensures that this formula continues to hold
when h(x) or g(x) is the zero polynomial.
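The convolutional multiplication rule can be sketched directly; a small Python illustration over Fp (coefficient lists, lowest-order first, are our convention):

```python
def poly_mul(h, g, p=2):
    """Multiply polynomials over Fp; h[i] is the coefficient of x^i.
    Implements the convolution fi = sum over j of h[j] * g[i - j]."""
    f = [0] * (len(h) + len(g) - 1)
    for i, hi in enumerate(h):
        for j, gj in enumerate(g):
            f[i + j] = (f[i + j] + hi * gj) % p
    return f

# (1 + x)(1 + x) = 1 + x^2 over F2, since the cross term 2x vanishes;
# note also that the degrees add: deg 1 + deg 1 = deg 2.
print(poly_mul([1, 1], [1, 1]))  # -> [1, 0, 1]
```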
The set F[x] has many of the properties of a field. It is evidently an abelian group under
addition whose identity is the zero polynomial 0 ∈ F[x]. It is closed under multiplication, which
is both associative and commutative and which distributes over addition. It has a multiplicative
identity 1 ∈ F[x], and the cancellation law holds.
However, in general we cannot divide evenly by a nonzero polynomial, since a polynomial f (x)
with deg f (x) > 0 has no multiplicative inverse. Therefore F[x] is a ring,4 not a field, like the
ring of integers Z. We now develop a series of properties of F[x] that resemble those of Z.
3. Over the real field R, a polynomial f (x) is sometimes regarded as a function f : R → R. This alternative
viewpoint makes little difference in the real case, since two polynomials over R are different if and only if the
corresponding polynomial functions are different. However, over finite fields it is important to maintain the
distinction. For example, over F2 the polynomial functions x and x2 both map 0 → 0, 1 → 1, yet the polynomials
x and x2 are different.
4. The axioms of a ring are similar to those for a field, except that there is no multiplicative inverse. For example,
Z and Zn (for n not a prime) are rings. In fact, Z and F[x] are integral domains, which are the nicest kind of
rings. An integral domain is a ring with commutative multiplication and a multiplicative identity 1 such that the
nonzero elements are closed under multiplication.
Exercise 6. Show that an integral domain with a finite number of elements must be a finite field. [Hint:
consider its cyclic multiplicative subgroups.]
7.5.1 Definitions
Given a monic polynomial g(x) of degree m, every polynomial f (x) may be expressed as f (x) =
q(x)g(x)+r(x) for some polynomial remainder r(x) such that deg r(x) < m and some polynomial
quotient q(x). This may be proved by the Euclidean long division algorithm of high school, with
component operations in F; i.e., divide g(x) into f (x) by long division, high-degree terms first,
stopping when the degree of the remainder is less than that of g(x). The following exercise
shows that the resulting quotient q(x) and remainder r(x) are unique.
Exercise 7 (Euclidean division algorithm).
(a) For the set F[x] of polynomials over any field F, show that the distributive law holds:
(f1 (x) + f2 (x))h(x) = f1 (x)h(x) + f2 (x)h(x).
(b) Use the distributive law to show that for any given f (x) and g(x) in F[x], there is a unique
q(x) and r(x) with deg r(x) < deg g(x) such that f (x) = q(x)g(x) + r(x).
The remainder polynomial r(x), denoted by r(x) = f (x) mod g(x), is the more important
part of this decomposition. The set of all possible remainder polynomials is the set RF,m =
{r0 + r1 x + · · · + rm−1 xm−1 | rj ∈ F, 0 ≤ j ≤ m − 1}, whose size is |RF,m | = |F|m . Evidently
g(x) is a divisor of f (x) if and only if f (x) mod g(x) = 0.
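The long-division algorithm for polynomials can be sketched in a few lines; a Python illustration over Fp for monic g(x) (coefficient lists lowest-order first, our convention):

```python
def poly_divmod(f, g, p=2):
    """Euclidean division of f(x) by monic g(x) over Fp, high-degree
    terms first; returns (q, r) with deg r < deg g."""
    f = f[:]
    dg = len(g) - 1
    q = [0] * max(len(f) - dg, 1)
    for i in range(len(f) - dg - 1, -1, -1):
        coef = f[i + dg] % p          # g is monic, so no inversion is needed
        q[i] = coef
        for j, gj in enumerate(g):    # subtract coef * x^i * g(x)
            f[i + j] = (f[i + j] - coef * gj) % p
    return q, f[:dg]

# x^3 + x + 1 = x * (x^2 + 1) + 1 over F2:
q, r = poly_divmod([1, 1, 0, 1], [1, 0, 1])
print(q, r)  # -> [0, 1] [1, 0], i.e. q(x) = x and r(x) = 1
```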
Remainder arithmetic using the remainder set RF,m is called “mod-g(x) arithmetic.” The
rules for mod-g(x) arithmetic follow from the rules for polynomial arithmetic as follows. Let
r(x) = f (x) mod g(x) and s(x) = h(x) mod g(x); then, as polynomials, r(x) = f (x) − q(x)g(x)
and s(x) = h(x) − t(x)g(x) for some quotient polynomials q(x) and t(x). Then
r(x) + s(x) = f (x) + h(x) − (q(x) + t(x))g(x);
r(x)s(x) = f (x)h(x) − (q(x)h(x) + t(x)f (x))g(x) + q(x)t(x)g(x)².
Hence (f (x) + h(x)) mod g(x) = (r(x) + s(x)) mod g(x) and f (x)h(x) mod g(x) = r(x)s(x)
mod g(x). In other words, the mod-g(x) remainder of the sum or product of two polynomials is
equal to the mod-g(x) remainder of the sum or product of their mod-g(x) remainders.
The mod-g(x) addition and multiplication rules are therefore defined as follows:

    r(x) ⊕ s(x) = (r(x) + s(x)) mod g(x);
    r(x) ∗ s(x) = (r(x)s(x)) mod g(x),
where “r(x)” and “s(x)” denote elements of the remainder set RF,m on the left and the corre-
sponding ordinary polynomials on the right. This makes mod-g(x) arithmetic consistent with
ordinary polynomial arithmetic in the sense of the previous paragraph.
Note that the mod-g(x) addition rule is just componentwise addition of coefficients in F. In this
sense the additive groups of RF,m and of the vector space Fm of m-tuples over F are isomorphic.
By definition, every monic polynomial f (x) is either irreducible or can be factored into a product
of monic polynomial factors, each of lower degree. In turn, if a factor is not irreducible, it can
be factored further. Since factor degrees are decreasing but bounded below by 1, we must
eventually arrive at a product of monic irreducible (prime) polynomials. The following theorem
shows that there is only one such set of prime polynomial factors, regardless of the order in
which the polynomial is factored.
Theorem 7.8 (Unique factorization of polynomials) Over any field F, every monic polynomial f(x) ∈ F[x] of degree m ≥ 1 may be written in the form

    f(x) = ∏_{i=1}^{k} a_i(x),

where each a_i(x) is a prime (monic irreducible) polynomial, and this factorization is unique up to the order of the factors.
Proof. We have already shown that f (x) may be factored in this way, so we need only prove
uniqueness. Thus assume hypothetically that the theorem is false and let m be the smallest
degree such that there exists a degree-m monic polynomial f (x) with more than one such
factorization,
f (x) = a1 (x) · · · ak (x) = b1 (x) · · · bj (x); j, k ≥ 1, (7.2)
where a1 (x), . . . , ak (x) and b1 (x), . . . , bj (x) are prime polynomials. We will show that this implies
a polynomial f (x) with degree less than m with non-unique factorization, and this contradiction
will prove the theorem. Now a1 (x) cannot appear on the right side of (7.2), else it could be
factored out for an immediate contradiction. Similarly, b1 (x) cannot appear on the left. Without
loss of generality, assume deg b1(x) ≤ deg a1(x). By the Euclidean division algorithm, a1(x) =
q(x)b1(x) + r(x). Since a1(x) is irreducible, r(x) ≠ 0 and 0 ≤ deg r(x) < deg b1(x) ≤ deg a1(x).
Thus r(x) has a prime factorization r(x) = βr1 (x) · · · rn (x), where β is the high-order coefficient
of r(x), and b1 (x) is not a divisor of any of the ri (x), since it has greater degree. Substituting
into (7.2), we have

    r(x)a2(x) · · · ak(x) = b1(x)(b2(x) · · · bj(x) − q(x)a2(x) · · · ak(x)),

or, defining f̃(x) = r1(x) · · · rn(x)a2(x) · · · ak(x) and rearranging terms,

    f̃(x) = r1(x) · · · rn(x)a2(x) · · · ak(x) = β^{−1} b1(x)(b2(x) · · · bj(x) − q(x)a2(x) · · · ak(x)).

Now f̃(x) is monic, because it is a product of monic polynomials; it has degree less than that of f(x), since deg r(x) < deg a1(x); and it has two different factorizations, with b1(x) a factor in one but not a divisor of any of the factors in the other; contradiction.
Exercise 8. Following this proof, prove unique factorization for the integers Z.
The prime polynomials in F[x] are analogous to the prime numbers in Z. One way to enumerate
the prime polynomials is to use an analogue of the sieve of Eratosthenes. For integers, this
method goes as follows: Start with a list of all integers greater than 1. The first integer on the
list is 2, which is prime. Erase all multiples of 2 (even integers). The next remaining integer
is 3, which must be the next prime. Erase all multiples of 3. The next remaining integer is 5,
which must be the next prime. Erase all multiples of 5. And so forth.
Similarly, to find the prime polynomials in F2 [x], for example, first list all polynomials of degree
1 or more in F2 [x] in order of degree. (Note that all nonzero polynomials in F2 [x] are monic.)
No degree-1 polynomial can have a factor, so the two degree-1 polynomials, x and x + 1, are
both prime. Next, erase all degree-2 multiples of x and x + 1, namely
x2 = x ∗ x;
x2 + x = x ∗ (x + 1);
x2 + 1 = (x + 1) ∗ (x + 1)
from the list of four degree-2 polynomials. This leaves one prime degree-2 polynomial, namely
x2 + x + 1. Next, erase all degree-3 multiples of x, x + 1, and x2 + x + 1 from the list of eight
degree-3 polynomials, namely the six polynomials
x3 = x ∗ x ∗ x;
x3 + x2 = (x + 1) ∗ x ∗ x;
x3 + x = (x + 1) ∗ (x + 1) ∗ x;
x3 + x2 + x = x ∗ (x2 + x + 1);
x3 + 1 = (x + 1) ∗ (x2 + x + 1);
x3 + x2 + x + 1 = (x + 1) ∗ (x + 1) ∗ (x + 1).
It turns out that the number N (m) of prime polynomials of F2 [x] of degree m is N (m) =
2, 1, 2, 3, 6, 9, 18, 30, 56, 99, . . . for m = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . .. (In Section 7.9 we will give a
simpler method to compute N (m), and will show that N (m) > 0 for all m.)
A similar sieve algorithm may be used to find the prime polynomials in F[x] over any finite
field F. The algorithm starts with a listing of the monic polynomials ordered by degree, and
successively erases the multiples of lower-degree prime polynomials.
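The sieve just described is easy to run mechanically. The sketch below (our own illustration, with invented helper names) represents binary polynomials as integer bitmasks and erases all products of lower-degree polynomials, leaving the prime polynomials of each degree:

```python
from collections import Counter

def polymul(a, b):
    """Product in F2[x], with polynomials stored as integer bitmasks (carry-less multiply)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def prime_polys_f2(max_deg):
    """Sieve of Eratosthenes in F2[x]: list all prime polynomials of degree <= max_deg."""
    limit = 1 << (max_deg + 1)          # bitmasks of all polynomials of degree <= max_deg
    composite = set()
    primes = []
    for f in range(2, limit):           # all polynomials of degree >= 1, in order of degree
        if f in composite:
            continue
        primes.append(f)                # not erased, so f has no lower-degree factor
        for h in range(2, limit):       # erase every multiple of f that fits in range
            m = polymul(f, h)
            if m < limit:
                composite.add(m)
    return primes

counts = Counter(p.bit_length() - 1 for p in prime_polys_f2(6))
# counts[m] for m = 1..6 comes out as 2, 1, 2, 3, 6, 9, matching the sequence quoted above
```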
Example 1. Let us construct a finite field with 22 = 4 elements using the prime degree-2
polynomial g(x) = x2 + x + 1 ∈ F2 [x].
There are four remainder polynomials mod x2 + x + 1, namely {0, 1, x, x + 1}. Addition is
componentwise mod 2. For multiplication, note that x∗x = x+1 since x2 mod (x2 +x+1) = x+1.
Also x ∗ x ∗ x = x ∗ (x + 1) = 1 since x3 mod (x2 + x + 1) = 1. The three nonzero elements
{1, x, x + 1} thus form a cyclic group under mod-g(x) multiplication, which verifies the second
field axiom for this example.
The complete mod-g(x) addition and multiplication tables are as follows:
  ⊕   |  0    1    x    x+1        ∗   |  0    1    x    x+1
  ----+---------------------      -----+---------------------
  0   |  0    1    x    x+1        0   |  0    0    0    0
  1   |  1    0    x+1  x          1   |  0    1    x    x+1
  x   |  x    x+1  0    1          x   |  0    x    x+1  1
  x+1 |  x+1  x    1    0          x+1 |  0    x+1  1    x
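These tables can be reproduced mechanically. In the sketch below (our own, not from the text), the four field elements are encoded as the 2-bit integers 0, 1, 2, 3, standing for 0, 1, x, x + 1; addition is bitwise XOR, and multiplication reduces mod g(x) = x^2 + x + 1:

```python
def gf4_add(a, b):
    # Mod-2 componentwise addition of coefficients
    return a ^ b

def gf4_mul(a, b):
    """Multiply in F4 = F2[x] mod (x^2 + x + 1); elements are 2-bit ints."""
    p = 0
    for i in range(2):                  # carry-less product of two degree-<2 polynomials
        if (b >> i) & 1:
            p ^= a << i
    if p & 0b100:                       # reduce: replace x^2 by x + 1
        p ^= 0b111
    return p

# x * x = x + 1 and x * (x + 1) = 1, as computed in Example 1
```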
Let F[x] be the set of polynomials over an arbitrary field F. If f (x) ∈ F[x] has a degree-1 factor
x − α for some α ∈ F, then α is called a root of f (x).
Since any f (x) may be uniquely expressed as f (x) = q(x)(x−α)+β for some quotient q(x) and
some β ∈ F (i.e., for some remainder r(x) = β of degree less than 1), it follows that f (α) = β.
Therefore α is a root of f (x) if and only if f (α) = 0 — i.e., if and only if α is a root of the
polynomial equation f (x) = 0.
By degree additivity, the degree of a polynomial f (x) is equal to the sum of the degrees of
its prime factors, which are unique by unique factorization. Therefore a polynomial of degree
m can have at most m degree-1 factors. This yields what is sometimes called the fundamental
theorem of algebra:
Theorem 7.10 (Fundamental theorem of algebra) Over any field F, a monic polynomial
f (x) ∈ F[x] of degree m can have no more than m roots in F. If it does have m roots {β1 , . . . , βm },
then the unique factorization of f (x) is f (x) = (x − β1 ) · · · (x − βm ).
Since the polynomial xn − 1 can have at most n roots in F, we have an important corollary:
Theorem 7.11 (Cyclic multiplicative subgroups) In any field F, the multiplicative group
F∗ of nonzero elements has at most one cyclic subgroup of any given order n. If such a subgroup
exists, then its elements {1, β, . . . , β n−1 } satisfy
xn − 1 = (x − 1)(x − β) · · · (x − β n−1 ).
For example, the complex multiplicative group C∗ has precisely one cyclic subgroup of each
finite size n, consisting of the n complex nth roots of unity. The real multiplicative group R∗
has cyclic subgroups of size 1 ({1}) and 2 ({±1}), but none of any larger size.
Exercise 11. For 1 ≤ j ≤ n, the jth elementary symmetric function σj(S) of a set S of n
elements of a field F is the sum of all (n choose j) products of j distinct elements of S. In particular,
σ1(S) is the sum of all elements of S, and σn(S) is the product of all elements of S.
(a) Show that if S = {1, β, . . . , β^{n−1}} is a cyclic subgroup of F∗, then σj(S) = 0 for 1 ≤ j ≤ n−1
and σn(S) = (−1)^{n+1}. In particular,

    ∑_{j=0}^{n−1} β^j = 0, if n > 1;        ∏_{j=0}^{n−1} β^j = (−1)^{n+1}.
Verify for S = {±1, ±i} (the four complex 4th roots of unity).
(b) Prove that for any odd prime integer p,
(p − 1)! = 1 · 2 · 3 · · · (p − 1) = −1 mod p.
Verify for p = 3, 5 and 7.
7.7 The multiplicative group of F∗q is cyclic
For any β ∈ Fq∗ , consider the cyclic subgroup S(β) = {1, β, β 2 , β 3 , . . .} of Fq∗ generated by β.
The size |S(β)| of this subgroup is called the multiplicative order of β.
By the cyclic group theorem, β |S(β)| = 1, and by Lagrange’s theorem, |S(β)| must divide
|F∗q | = q − 1. It follows that β q−1 = 1 for all β ∈ Fq∗ .
In other words, every β ∈ Fq∗ is a root of the polynomial equation xq−1 = 1, or equivalently
of the polynomial xq−1 − 1 ∈ Fq [x]. By the polynomial roots theorem, xq−1 − 1 can have at
most q − 1 roots in Fq , so these are all the roots of xq−1 − 1. Thus xq−1 − 1 factors into the
product of the degree-1 polynomials x − β for all β ∈ F∗q . Moreover, since 0 ∈ Fq is a root of the
polynomial x and x(xq−1 − 1) = xq − x, the polynomial xq − x factors into the product of the
degree-1 polynomials x − β for all β ∈ Fq .
To summarize:
Theorem 7.12 In a finite field Fq with q elements, every nonzero field element β ∈ Fq satisfies
β q−1 = 1 and has a multiplicative order |S(β)| that divides q − 1. The nonzero elements of Fq
are the q − 1 distinct roots of the polynomial xq−1 − 1 ∈ Fq [x]; i.e.,
    x^{q−1} − 1 = ∏_{β∈F∗q} (x − β).        (7.3)
The elements of Fq are the q distinct roots of the polynomial xq − x ∈ Fq [x]; i.e.,
    x^q − x = ∏_{β∈Fq} (x − β).        (7.4)
Exercise 12.
(a) Verify (7.3) for the prime field F5 .
(b) Verify (7.3) for the field F4 that was constructed in Example 1. [Hint: use a symbol other
than x for the indeterminate in (7.3).]
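Exercise 12(a) can also be checked numerically. The sketch below (our own illustration) multiplies out ∏(x − β) over all nonzero β of F5, with polynomials stored as coefficient lists (lowest degree first), and recovers x^4 − 1:

```python
def polymul_mod_p(a, b, p):
    """Product of two coefficient lists (lowest degree first) over the prime field Fp."""
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return r

p = 5
prod = [1]
for beta in range(1, p):
    prod = polymul_mod_p(prod, [(-beta) % p, 1], p)  # multiply by (x - beta)

# prod == [4, 0, 0, 0, 1], i.e. x^4 + 4 = x^4 - 1 over F5, verifying (7.3) for q = 5
```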
A primitive element of a finite field Fq is an element α whose multiplicative order |S(α)| equals
q − 1. If α is a primitive element, then the cyclic group {1, α, α2 , . . . , αq−2 } is a set of q − 1
distinct nonzero elements of Fq , which therefore must be all the nonzero elements. Thus if we can
show that Fq has at least one primitive element, we will have shown that its nonzero elements
F∗q form a cyclic group under multiplication of size q − 1.
By Lagrange’s theorem, the multiplicative order |S(β)| of each nonzero element β ∈ F∗q divides
q − 1. Therefore the size d of each cyclic subgroup of Fq∗ divides q − 1. As we have seen, the
number of elements in a cyclic group or subgroup of size d that have order d is the Euler number
φ(d). Since by the cyclic subgroups theorem F∗q has at most one cyclic subgroup of each size d,
the number of elements in F∗q with order less than q − 1 is at most
    ∑_{d: d|(q−1), d≠q−1} φ(d).
But since the Euler numbers satisfy the relationship (7.1), which in this case is
    q − 1 = ∑_{d: d|(q−1)} φ(d),
we conclude that there must be at least φ(q − 1) elements of F∗q with order q − 1. Indeed, since
F∗q has at most φ(q − 1) elements of order q − 1, all inequalities must be satisfied with equality;
i.e., F∗q has precisely φ(d) elements of order d for each divisor d of q − 1.
We saw in Exercise 3 that φ(q − 1) ≥ 1, so a primitive element α of order q − 1 exists. Thus
F∗q is cyclic and has one cyclic subgroup of each order d that divides q − 1. This proves the
following theorem:
Theorem 7.13 (Primitive elements) Given any field Fq with q elements, the nonzero ele-
ments of Fq form a multiplicative cyclic group F∗q = {1, α, α2 , . . . , αq−2 }. Consequently F∗q has
φ(d) ≥ 1 elements of multiplicative order d for every d that divides q − 1, and no elements of
any other order. In particular, F∗q has φ(q − 1) ≥ 1 primitive elements.
Henceforth we will usually write the elements of a finite field Fq as {0, 1, α, α2 , . . . , αq−2 }, where
α denotes a primitive element. For Fg(x) , denoting a field element β as a power of α rather than
as a remainder polynomial helps to avoid confusion when we consider polynomials in β.
Example 2. The prime field F5 has φ(1) = 1 element of order 1 (the element 1), φ(2) = 1
element of order 2 (namely 4 = −1), and φ(4) = 2 primitive elements of order 4 (namely, 2 and
3). We can therefore write F5 = {0, 1, 2, 2^2, 2^3}, since 2^2 = 4 and 2^3 = 3 mod 5.
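Example 2 can be checked directly. The sketch below (our own) computes the multiplicative order of each nonzero element of F5 by repeated multiplication:

```python
def mult_order(beta, p):
    """Multiplicative order of beta in the prime field Fp (beta not divisible by p)."""
    k, x = 1, beta % p
    while x != 1:
        x = (x * beta) % p
        k += 1
    return k

orders = {beta: mult_order(beta, 5) for beta in range(1, 5)}
# orders == {1: 1, 2: 4, 3: 4, 4: 2}: phi(4) = 2 primitive elements, namely 2 and 3
```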
Example 3. A field F16 = {0, 1, α, . . . , α^{14}} with 16 elements has φ(1) = 1 element of order 1 (the element 1), φ(3) = 2 elements of order 3, φ(5) = 4 elements of order 5, and φ(15) = 8 primitive elements of order 15.
Again, consider any field Fq with q elements. We have seen in Theorem 7.12 that the polynomial
xq − x ∈ Fq [x] factors completely into q degree-1 factors x − β ∈ Fq [x], β ∈ Fq .
We have also seen that if Fq has characteristic p, then Fq has a prime subfield Fp with p
elements. The prime subfield Fp contains the integers of Fq , which include {0, ±1}. Therefore
we may regard xq − x alternatively as a polynomial in Fp [x].
By unique factorization, x^q − x factors over Fp into a unique product of prime polynomials
gi(x) ∈ Fp[x]:

    x^q − x = ∏_i gi(x).        (7.6)
Since the first factorization is the unique prime factorization, it follows that each monic polyno-
mial gi (x) of degree greater than 1 must be reducible over Fq , and must factor into a product
of degree-1 monic polynomials; i.e.,
    gi(x) = ∏_{j=1}^{deg gi(x)} (x − βij).        (7.7)
The prime polynomials gi (x) are called the minimal polynomials of Fq . Since each β ∈ Fq
appears exactly once on the left side of (7.6), it also appears as a factor in exactly one minimal
polynomial in (7.7). Thus the elements of Fq are partitioned into disjoint sets {βi1 , . . . , βik }
where k = deg gi (x), and each β ∈ Fq is a root of exactly one minimal polynomial of Fq , called
the minimal polynomial of β.
The key property of the minimal polynomial of β is the following:
Lemma 7.14 Let g(x) be the minimal polynomial of any given β ∈ Fq . Then g(x) is the monic
polynomial of least degree in Fp [x] such that g(β) = 0. Moreover, for any f (x) ∈ Fp [x], f (β) = 0
if and only if g(x) divides f (x).
Proof: Let h(x) ∈ Fp [x] be a monic polynomial of least degree such that h(β) = 0. Using
the Euclidean division algorithm, g(x) = q(x)h(x) + r(x) where deg r(x) < deg h(x). Since
h(β) = g(β) = 0, we must have r(β) = 0. By the smallest degree property of h(x), this implies
that r(x) = 0, so h(x) divides g(x). But since g(x) is irreducible, h(x) cannot have degree less
than g(x); i.e., deg h(x) = deg g(x). Moreover, since both h(x) and g(x) are monic, this implies
that h(x) = g(x). Thus g(x) is the monic polynomial of least degree in Fp [x] such that g(β) = 0.
Now let f (x) be any polynomial in Fp [x] that satisfies f (β) = 0. By Euclidean division, f (x) =
q(x)g(x) + r(x) with deg r(x) < deg g(x). Thus r(β) = f (β) = 0. Since deg r(x) < deg g(x),
r(β) = 0 if and only if r(x) = 0; i.e., if and only if g(x) divides f (x).
Example 1 (cont.). Again consider the field F4 of Example 1, whose elements we now write
as {0, 1, α, α2 }, where α may be taken as x or x + 1. This field has characteristic 2. The prime
factorization of the binary polynomial x^4 − x = x^4 + x ∈ F2[x] is

    x^4 + x = x(x + 1)(x^2 + x + 1),

so the minimal polynomials of F4 are x (the minimal polynomial of 0), x + 1 (of 1), and
x^2 + x + 1 (of both α and α^2). Indeed, over F4,

    x^2 + x + 1 = (x + α)(x + α^2),

since α + α^2 = 1 and α ∗ α^2 = α^3 = 1.
Given a field Fq with prime subfield Fp, we now consider evaluating a nonzero polynomial
f(x) = ∑_i fi x^i ∈ Fp[x] at an element β ∈ Fq to give a value

    f(β) = ∑_{i=0}^{deg f(x)} fi β^i
in Fq , where fi is taken as an element of Fq for the purposes of this evaluation. The value of the
zero polynomial at any β is 0.
The value f (β) depends on both the polynomial f (x) and the field element β ∈ Fq . Rather than
regarding f (β) as a function of β, as the notation suggests, we will regard f (β) as a function of
the polynomial f (x) ∈ Fp [x] for a fixed β. In other words, we consider the map mβ : Fp [x] → Fq
that is defined by mβ (f (x)) = f (β).
The set of values mβ (Fp [x]) of this map as f (x) ranges over polynomials in Fp [x] is by definition
the subset of elements Gβ ⊆ Fq that can be expressed as linear combinations over Fp of powers of
β. We will show that Gβ forms a subfield of Fq that is isomorphic to the polynomial remainder
field Fg(x) , where g(x) is the minimal polynomial of β, namely the monic polynomial of least
degree such that g(β) = 0.
We observe that the map mβ : Fp [x] → Fq preserves addition and multiplication; i.e.,
mβ (f1 (x) + f2 (x)) = mβ (f1 (x)) + mβ (f2 (x)) since both sides equal f1 (β) + f2 (β), and
mβ (f1 (x)f2 (x)) = mβ (f1 (x))mβ (f2 (x)) since both sides equal f1 (β)f2 (β).
We can now prove the desired isomorphism between the fields Fg(x) and Gβ:

Theorem 7.15 (Subfields generated by β ∈ Fq) For any β ∈ Fq, the subset Gβ ⊆ Fq is a subfield of Fq that is isomorphic to the polynomial remainder field Fg(x), where g(x) is the minimal polynomial of β.
We have shown that every finite field Fq contains a primitive element α. In this case, the subfield
Gα consisting of all linear combinations over Fp of powers of α must evidently be the whole field
Fq . Thus we obtain our main theorem:
Theorem 7.16 (Every finite field is isomorphic to a field Fg(x) ) Every finite field Fq of
characteristic p with q elements is isomorphic to a polynomial remainder field Fg(x) , where g(x)
is a prime polynomial in Fp [x] of degree m. Hence q = pm for some positive integer m.
Exercise 13. For which integers q, 1 ≤ q ≤ 12, does a finite field Fq exist?
Finally, we wish to show that all fields with pm elements are isomorphic. The following lemma
shows that every prime polynomial g(x) of degree m (we are still assuming that there exists at
least one) is a minimal polynomial of every field with pm elements:
Lemma 7.17 Every prime polynomial g(x) ∈ Fp[x] of degree m divides x^{p^m} − x.

Proof. If g(x) is a prime polynomial in Fp[x] of degree m, then the set RFp,m with mod-g(x)
arithmetic forms a field Fg(x) with p^m elements. The remainder polynomial x ∈ RFp,m is a field
element β ∈ Fg(x). Evidently g(β) = 0, but r(β) ≠ 0 for any nonzero r(x) with deg r(x) < m;
therefore g(x) is the minimal polynomial of β. Since β^{p^m−1} = 1, β is a root of x^{p^m−1} − 1.
This implies that g(x) divides x^{p^m−1} − 1, and thus also x^{p^m} − x.
Consequently every field of size pm includes m elements whose minimal polynomial is g(x).
Therefore by the same construction as above, we can prove:
Theorem 7.18 (All finite fields of the same size are isomorphic) For any prime poly-
nomial g(x) ∈ Fp [x] of degree m, every field of pm elements is isomorphic to the polynomial
remainder field Fg(x) .
7.8.4 More on the factorization of x^{p^m} − x
We can now obtain further information on the factorization of xq − x. In view of Theorem 7.16,
we now set q = pm .
We first show that the set of roots of a minimal polynomial gi (x) ∈ Fp [x] is closed under the
operation of taking the pth power. This follows from the curious but important fact that over a
field F of characteristic p, taking the pth power is a linear operation. For example, when p = 2,
squaring is linear because
(α + β)2 = α2 + αβ + αβ + β 2 = α2 + β 2 .
By taking the pth power n times, we may extend this result as follows:
Lemma 7.19 (Linearity of taking the pn th power) Over any field F of characteristic p,
for any n ≥ 1, taking the pn th power is linear; i.e.,
    (α + β)^{p^n} = α^{p^n} + β^{p^n}.
Note that if F has q = p^m elements, then β^{p^m} = β for all β ∈ F, so this lemma becomes
repetitive for n ≥ m.
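For the prime field Fp itself this lemma is easy to check numerically: every binomial coefficient C(p, k) with 0 < k < p is divisible by p, so all cross terms of (α + β)^p vanish. A small sanity check (our own, not from the text):

```python
from math import comb

p = 7
# The inner binomial coefficients C(p, k), 0 < k < p, all vanish mod p ...
assert all(comb(p, k) % p == 0 for k in range(1, p))
# ... so taking the p-th power is additive over Fp (here it is in fact the
# identity map, since beta^p = beta for all beta in Fp by Theorem 7.12)
assert all(pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
           for a in range(p) for b in range(p))
```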
Exercise 14. Using this lemma, prove that if f(x) = ∑_{i=0}^{m} fi x^i, then

    f^{p^n}(x) = (f0 + f1 x + f2 x^2 + · · · + fm x^m)^{p^n} = f0^{p^n} + f1^{p^n} x^{p^n} + f2^{p^n} x^{2p^n} + · · · + fm^{p^n} x^{mp^n}.
This result yields a useful test for whether a polynomial f (x) ∈ F[x] is in Fp [x] or not, and a
useful formula in case it is:
7.8. EVERY FINITE FIELD IS ISOMORPHIC TO A FIELD FG(X) 97
Lemma 7.20 (Prime subfield polynomials) For any field F of characteristic p and any
f(x) ∈ F[x], f^p(x) = f(x^p) if and only if f(x) ∈ Fp[x]; i.e., if and only if all coefficients fi
are in the prime subfield Fp ⊆ F.
Proof. By Exercise 14 with n = 1, f^p(x) = f0^p + f1^p x^p + · · · + fm^p x^{mp}. Now the
elements of F that are in Fp are precisely the p roots of the polynomial x^p − x; thus β^p = β
if and only if β ∈ Fp. Thus the right side of this equation simplifies to f(x^p) if and only if
fi ∈ Fp for all i.
Exercise 15. Prove that a positive integer n is prime if and only if (x − a)^n = x^n − a mod n
for every integer a that is relatively prime to n.^5
Using Lemma 7.20, we now show that the roots of a minimal polynomial are a cyclotomic coset
of the form {β, β^p, β^{p^2}, . . .}:

Theorem 7.21 (Roots of minimal polynomials) Let g(x) be a minimal polynomial of a finite field F with p^m elements. Then g(x) has n distinct roots {β, β^p, β^{p^2}, . . . , β^{p^{n−1}}} for some n that divides m, and g(x) divides x^{p^n} − x.
Proof. Let β be any root of g(x). Since g(x) ∈ Fp[x], Lemma 7.20 shows that g(x^p) = g^p(x).
Therefore g(β^p) = g^p(β) = 0, so β^p is also a root of g(x). Iterating, β^p, β^{p^2}, . . . , β^{p^i}, . . . are
all roots of g(x). Because F is finite, these roots cannot all be distinct. Therefore let n be the
smallest integer such that β^{p^n} = β; thus β^{p^j} ≠ β for 1 ≤ j < n. This implies that β^{p^j} ≠ β^{p^{j+k}}
for 0 ≤ j < n, 1 ≤ k < n; i.e., all elements of the set {β, β^p, β^{p^2}, . . . , β^{p^{n−1}}} are distinct. Thus
β, β^p, β^{p^2}, . . . is a cyclic sequence, and β^{p^j} = β if and only if n is a divisor of j. Since β^{p^m} = β,
we see that n must divide m.
Finally, we show that these roots are all of the roots of g(x); i.e., deg g(x) = n and

    g(x) = ∏_{i=0}^{n−1} (x − β^{p^i}).
The right side of this equation is a monic polynomial h(x) ∈ F[x] of degree n. Since the roots
of h(x) are roots of g(x), h(x) must divide g(x) in F[x]. Now, using Lemma 7.20, we can prove
that h(x) is actually a polynomial in Fp[x], because

    h^p(x) = ∏_{i=0}^{n−1} (x − β^{p^i})^p = ∏_{i=0}^{n−1} (x^p − β^{p^{i+1}}) = ∏_{i=0}^{n−1} (x^p − β^{p^i}) = h(x^p),

where we use the linearity of taking the pth power and the fact that β^{p^n} = β. Therefore, since
the prime polynomial g(x) has no monic divisors in Fp[x] of degree at least 1 other than itself,
g(x) must actually be equal to h(x).
Finally, since the roots of g(x) all satisfy β^{p^n} = β, they are all roots of the polynomial x^{p^n} − x,
which implies that g(x) divides x^{p^n} − x.
5. This is the basis of the polynomial-time primality test of [Agrawal, Kayal and Saxena, 2002].
This theorem has some important implications. First, the degree n of a minimal polynomial
g(x) of a finite field F with p^m elements must be a divisor of m. Second, the subfield Gβ of F
generated by a root β of g(x) must have p^n elements. Third, x^{p^n} − x divides x^{p^m} − x, since the
elements of Gβ are all the roots of x^{p^n} − x and are also roots of x^{p^m} − x.
Conversely, let g(x) be any prime polynomial in Fp[x] of degree n. Then there is a finite field
generated by g(x) with p^n elements. This proves that g(x) divides x^{p^n} − x, and thus g(x) divides
x^{p^m} − x for every multiple m of n. Thus the divisors of x^{p^m} − x include every prime polynomial
in Fp[x] whose degree n divides m.
Moreover, x^{p^m} − x has no repeated factors. We proved this earlier assuming the existence of a
field F with p^m elements; however, we desire a proof that does not make this assumption. The
following exercise yields such a proof.
Exercise 16 (x^{p^m} − x has no repeated factors). The formal derivative of a degree-n polynomial
f(x) = ∑_{j=0}^{n} fj x^j ∈ Fp[x] is defined as

    f′(x) = ∑_{j=1}^{n} (j mod p) fj x^{j−1}.

(a) Show that if f(x) = g(x)h(x), then f′(x) = g′(x)h(x) + g(x)h′(x).
(b) Show that a prime polynomial g(x) is a repeated divisor of f(x) if and only if g(x) is a
divisor of both f(x) and f′(x).
(c) Show that x^{p^m} − x has no repeated prime factors over Fp.
Now we can conclude our discussion of the factorization of x^{p^m} − x as follows:

Theorem 7.22 (Factors of x^{p^m} − x) The polynomial x^{p^m} − x factors over Fp into the product
of the prime polynomials in Fp[x] whose degrees divide m, with no repetitions.

For example, over F2:
    x^2 + x = x(x + 1);
    x^4 + x = x(x + 1)(x^2 + x + 1);
    x^8 + x = x(x + 1)(x^3 + x^2 + 1)(x^3 + x + 1);
    x^16 + x = x(x + 1)(x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x + 1).
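The last of these factorizations can be verified by multiplying the factors back together. The sketch below (our own, using the integer-bitmask representation of binary polynomials) checks that the product of the listed prime polynomials is x^16 + x:

```python
from functools import reduce

def polymul(a, b):
    """Product in F2[x], with polynomials stored as integer bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# x, x+1, x^2+x+1, x^4+x^3+1, x^4+x^3+x^2+x+1, x^4+x+1 as bitmasks
factors = [0b10, 0b11, 0b111, 0b11001, 0b11111, 0b10011]
product = reduce(polymul, factors)
assert product == (1 << 16) | 0b10      # x^16 + x
```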
Exercise 17. Find all prime polynomials g(x) ∈ F3 [x] of degree 1 and 2 over the ternary field
F3 . Show that the product of these polynomials is x9 − x = x9 + 2x. Explain, with reference to
F9 .
7.9 Finite fields with p^m elements exist for all prime p and m ≥ 1

Since x^{p^m} − x is the product of the prime polynomials in Fp[x] whose degrees divide m, with
no repetitions (Theorem 7.22), its degree p^m must equal the sum of the degrees of these factors:

    p^m = ∑_{n: n|m} n N(n),        (7.8)

where N(n) denotes the number of prime polynomials in Fp[x] of degree n. This formula may be
solved recursively for each N(m), starting with N(1) = p.
Exercise 18. Calculate N (m) for p = 2 for m = 1 to 10. Check your results against those
stated in Section 7.5.4.
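Exercise 18 can be automated. The sketch below (our own) solves (7.8) recursively for N(m):

```python
def count_prime_polys(p, max_m):
    """Solve p^m = sum over n|m of n*N(n) recursively for N(m)."""
    N = {}
    for m in range(1, max_m + 1):
        lower = sum(n * N[n] for n in range(1, m) if m % n == 0)
        N[m] = (p ** m - lower) // m    # the division is always exact
    return N

N = count_prime_polys(2, 10)
# N[m] for m = 1..10: 2, 1, 2, 3, 6, 9, 18, 30, 56, 99, matching Section 7.5.4
```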
Now we are in a position to prove the desired theorem:
Theorem 7.23 (Existence of prime polynomials) Let N (m) be the number of prime poly-
nomials in Fp [x] of degree m, which is given recursively by (7.8). For every prime p and positive
integer m, N(m) > 0.

Proof. From (7.8),

    mN(m) = p^m − ∑_{n: n|m, n≠m} n N(n) ≥ p^m − (m/2) p^{m/2},

where we have upperbounded the number of terms in the sum by m/2 and upperbounded each
term by p^{m/2}, since n N(n) ≤ p^n and the largest divisor of m other than m is at most m/2. Thus

    mN(m) ≥ p^m − (m/2)p^{m/2} = p^{m/2} (p^{m/2} − m/2).

The quantity p^{m/2} − m/2 is positive for p = 2, m = 2, and is increasing in both p and m. Thus
mN(m) is positive for all prime p and all m ≥ 2. Moreover N(1) = p.
Since a finite field Fg(x) with pm elements can be constructed from any prime polynomial
g(x) ∈ Fp [x] of degree m, this implies:
Theorem 7.24 (Existence of finite fields) For every prime p and positive integer m, there
exists a finite field with pm elements.
Moreover, for each n that divides m, there exists a unique subfield G with p^n elements, namely
the roots of the polynomial x^{p^n} − x:
Theorem 7.25 (Existence of finite subfields) Every finite field with pm elements has a sub-
field with pn elements for each positive integer n that divides m.
In summary, the factorization of x^{p^m} − x into minimal polynomials partitions the elements of
Fpm into cyclotomic cosets whose properties are determined by their minimal polynomials. The
roots of g(x) have multiplicative order k if g(x) divides x^k − 1 and does not divide x^j − 1 for
j < k. Moreover, the roots of g(x) are elements of the subfield with p^n elements if and only if
g(x) divides x^{p^n} − x, or equivalently if their order k divides p^n − 1.
F16 has a prime subfield F2 consisting of the elements whose minimal polynomials divide x2 + x,
namely 0 and 1. It also has a subfield F4 consisting of the elements whose minimal polynomials
divide x4 + x, namely {0, 1, α5 , α10 }. Alternatively, F∗4 consists of the three elements of F∗16
whose multiplicative orders divide 3.
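This partition of F16 into cyclotomic cosets can be generated mechanically: writing each nonzero element as α^j, the coset of α^j consists of the exponents {j, 2j, 4j, . . .} mod 15. A sketch (our own, not from the text):

```python
def cyclotomic_cosets(p, n):
    """Partition {0, ..., n-1} into cosets {j, p*j, p^2*j, ...} mod n."""
    seen, cosets = set(), []
    for j in range(n):
        if j in seen:
            continue
        coset, k = [], j
        while k not in seen:
            seen.add(k)
            coset.append(k)
            k = (k * p) % n
        cosets.append(coset)
    return cosets

cosets = cyclotomic_cosets(2, 15)
# [[0], [1, 2, 4, 8], [3, 6, 12, 9], [5, 10], [7, 14, 13, 11]]
```

The coset sizes (1, 4, 4, 2, 4) are the degrees of the corresponding minimal polynomials, and the size-2 coset {5, 10} picks out exactly the elements α^5 and α^10 of the subfield F4 noted above.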
Exercise 19 (construction of F32 ).
(a) Find the prime polynomials in F2 [x] of degree 5, and determine which have primitive roots.
(b) For some minimal polynomial g(x) with a primitive root α, construct a field Fg(x) with 32
elements. Give a table with the elements partitioned into cyclotomic cosets as above. Specify the
minimal polynomial and the multiplicative order of each nonzero element. Identify the subfields
of Fg(x) .
(c) Show how to do multiplication and division in Fg(x) using this “log table.” Discuss the
rules for multiplication and division in Fg(x) when one of the field elements involved is the zero
element 0 ∈ Fg(x) .
(d) [Optional] If you know something about maximum-length shift-register (MLSR) sequences,
show that there exists a correspondence between the “log table” given above and a certain MLSR
sequence of length 31.
11.3 Graph-theoretic properties of graphical realizations
Note that there are 4 “active” generators at each of the 12 state times, if we take the time axis
to be circular (“end-around”). On the other hand, if we were to assume a conventional time
axis, then at least 8 generators would have to be active at the central state time.
Note also that if we “unwrap” these generators onto an infinite conventional time axis, then we
get generators for a rate-1/2 16-state period-4 time-varying (or rate-4/8 16-state time-invariant)
binary linear convolutional code, as follows:
···
... 00 11 01 11 01 11 00 00 00 00 ...
... 00 00 11 11 10 01 11 00 00 00 ...
... 00 00 00 11 01 10 11 11 00 00 ...
... 00 00 00 00 11 01 11 01 11 00 ...
···
This “Golay convolutional code” has a minimum Hamming distance of 8 and an average of
Kb = 12.25 weight-8 codewords per information bit, so its nominal coding gain is γc = 4 (6 dB)
and its effective coding gain is γeff = 5.3 dB, which are remarkable for a 16-state rate-1/2 code.
In summary, by considering a state realization with a single cycle rather than a conventional
trellis realization, we may be able to obtain a state complexity as small as the square root of
the minimum state complexity of a conventional trellis.
In Exercise 6 of Chapter 6, it was shown that all Reed-Muller codes RM(r, m) of length 2m could
be generated by a single “universal” 2^m × 2^m generator matrix Um = (U1)^{⊗m}, the m-fold tensor
product of the 2 × 2 matrix

    U1 = [ 1 0 ]
         [ 1 1 ]

with itself. The matrix Um is called the Hadamard transform matrix over F2. For any binary
2^m-tuple u ∈ (F2)^{2^m}, the binary 2^m-tuple y = uUm is called the Hadamard transform of u.
Since (Um)^2 = I_{2^m}, the identity matrix, it follows that the Hadamard transform of y is u;
i.e., u = yUm.
More particularly, RM(r, m) = {y = uUm }, where the coordinates of the binary 2m -tuple u are
free in the k(r, m) positions corresponding to the k(r, m) rows of Um of weight 2m−r or greater,
and fixed to 0 in the remaining coordinates. In other words, RM(r, m) is the set of Hadamard
transforms of all 2k(r,m) binary 2m -tuples that are all-zero in a certain 2m − k(r, m) coordinates.
(Compare the Fourier transform characterization of Reed-Solomon codes in Chapter 8.)
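The transform and its involution property are easy to reproduce. The sketch below (our own; helper names invented) builds Um as an m-fold tensor power over F2 and checks that applying it twice returns the original tuple:

```python
def tensor(A, B):
    """Kronecker (tensor) product of binary matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def hadamard_matrix(m):
    """Um = (U1) tensored with itself m times over F2, with U1 = [[1, 0], [1, 1]]."""
    U = [[1]]
    for _ in range(m):
        U = tensor(U, [[1, 0], [1, 1]])
    return U

def hadamard_transform(u, U):
    """y = u U over F2."""
    n = len(U)
    return [sum(u[i] * U[i][j] for i in range(n)) % 2 for j in range(n)]

U3 = hadamard_matrix(3)
u = [0, 0, 0, 1, 0, 1, 1, 1]
y = hadamard_transform(u, U3)
assert hadamard_transform(y, U3) == u   # (Um)^2 = I, so the transform is its own inverse
```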
We can construct a graphical realization of a Hadamard transform as follows. The 2 × 2
Hadamard transform y = uU1 is explicitly given by the two equations
y0 = u0 + u1 ;
y1 = u 1 ,
which are realized by the normal graph of Figure 8. (This is sometimes called a controlled-not
gate, where y1 = u1 is regarded as a control variable.)
Figure 8. Normal graph of the 2 × 2 Hadamard transform: a parity (+) node realizes y0 = u0 + u1, and an equality (=) node realizes y1 = u1.
Note that there are no arrows (directed edges) in this behavioral realization. Either u or y
may be taken as input, and correspondingly y or u as output; i.e., the graph is a realization of
either the Hadamard transform y = uU1 or the inverse Hadamard transform u = yU1 .
A 2m × 2m Hadamard transform y = uUm may then be realized by connecting these 2 × 2
transforms in tensor product fashion. For example, the 8 × 8 Hadamard transform is given
explicitly by the eight equations
y 0 = u0 + u1 + u2 + u3 + u4 + u5 + u6 + u7 ;
y1 = u1 + u3 + u5 + u7 ;
y2 = u 2 + u 3 + u 6 + u 7 ;
y3 = u 3 + u 7 ;
y4 = u 4 + u 5 + u 6 + u 7 ;
y5 = u 5 + u 7 ;
y6 = u 6 + u 7 ;
y7 = u 7 .
These equations are realized by the tensor product graph of Figure 9. (Compare the “butterflies”
in the graph of an 8 × 8 fast Fourier transform.)
Figure 9. Normal graph of the 8 × 8 Hadamard transform, built from three stages of 2 × 2 transforms connected in tensor product fashion; the outputs y0, . . . , y7 appear in natural order and the inputs in bit-reversed order (u0, u4, u2, u6, u1, u5, u3, u7).
A Reed-Muller code of length 8 may then be realized by fixing certain of the uk to zero while
letting the others range freely. For example, the (8, 4, 4) code is obtained by fixing u0 = u1 =
u2 = u4 = 0, which yields the equations
y0 = u3 + u5 + u6 + u7 ;
y1 = u3 + u 5 + u7 ;
y 2 = u3 + u6 + u7 ;
y 3 = u3 + u7 ;
y 4 = u5 + u6 + u7 ;
y 5 = u5 + u7 ;
y6 = u 6 + u 7 ;
y7 = u 7 .
These equations are realized by the graph of Figure 10(a), which may be simplified to that
of Figure 10(b). Here we regard the “inputs” uj as internal variables, and the “outputs” yk as
external variables.
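As a numerical check (our own sketch, not from the text), enumerating all 16 settings of (u3, u5, u6, u7) in the equations above confirms that they generate a binary code of length 8, dimension 4, and minimum distance 4:

```python
from itertools import product

def encode(u3, u5, u6, u7):
    """The eight output equations above, with u0 = u1 = u2 = u4 fixed to 0."""
    return ((u3 + u5 + u6 + u7) % 2,
            (u3 + u5 + u7) % 2,
            (u3 + u6 + u7) % 2,
            (u3 + u7) % 2,
            (u5 + u6 + u7) % 2,
            (u5 + u7) % 2,
            (u6 + u7) % 2,
            u7)

code = {encode(*u) for u in product((0, 1), repeat=4)}
assert len(code) == 16                                  # dimension 4
assert min(sum(c) for c in code if any(c)) == 4         # minimum distance 4
```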
Figure 10. (a) Normal graph of (8, 4, 4) RM code. (b) Equivalent realization.
In Figure 10(b), all state variables are binary and all constraint codes are simple (3, 2, 2) parity-
check constraints or (3, 1, 3) repetition constraints. It is believed (but not proved) that this
realization is the most efficient possible realization for the (8, 4, 4) code in this sense. However,
Figure 10(b) has cycles.
It is easy to see how the cycle-free graph of Figures 7(a) (as well as 7(b), or a minimal four-
section, four-state trellis) may be obtained by agglomerating subgraphs of Figure 10(b). Such
a graph is depicted in Figure 11. The code symbols are partitioned into four 2-tuples. A state
space of dimension 2 connects the two halves of a codeword (meeting the cut-set bound). Two
constraint codes of length 6 and dimension 3 determine the possible combinations of symbol
4-tuples and state 2-tuples in each half of the code.
[Figure: two (6, 3) constraint codes joined by a state space of dimension 2;
the left constraint is attached to the symbol 2-tuples (y0, y1) and (y2, y3),
the right to (y4, y5) and (y6, y7).]
Figure 11. Tree-structured realization of (8, 4, 4) RM code.
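The claim that the central state space has dimension 2 (meeting the cut-set bound) can be verified numerically. The sketch below assumes the standard monomial generators 1, x1, x2, x3 for the (8, 4, 4) code, consistent with the Hadamard-transform construction above, and computes the state dimension at the central cut as k minus the dimensions of the subcodes supported entirely on one half:

```python
import math
from itertools import product

# Generators of the (8, 4, 4) code: evaluation vectors of 1, x1, x2, x3.
G = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 0, 0, 0, 0),
    (1, 1, 0, 0, 1, 1, 0, 0),
    (1, 0, 1, 0, 1, 0, 1, 0),
]

def span(G):
    """All GF(2) linear combinations of the rows of G."""
    return {tuple(sum(c * g for c, g in zip(coeffs, col)) % 2
                  for col in zip(*G))
            for coeffs in product((0, 1), repeat=len(G))}

C = span(G)
k = 4
past_only = [c for c in C if all(b == 0 for b in c[4:])]    # zero on 2nd half
future_only = [c for c in C if all(b == 0 for b in c[:4])]  # zero on 1st half
# State-space dimension at the central cut:
state_dim = k - int(math.log2(len(past_only))) - int(math.log2(len(future_only)))
```

Each half supports only a one-dimensional subcode, so the state dimension is 4 − 1 − 1 = 2, as the text asserts.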
162 CHAPTER 11. CODES ON GRAPHS
Similarly, we may realize any Reed-Muller code RM(r, m) in any of these styles. By starting
with a Hadamard transform realization as in Figure 10(a) and reducing it as in Figure 10(b), we
can obtain a realization in which all state variables are binary and all constraint codes are simple
(3, 2, 2) parity-check constraints or (3, 1, 3) repetition constraints; however, such a realization will
generally have cycles. By agglomerating variables, we can obtain a tree-structured, cycle-free
realization as in Figure 11 which reflects the |u|u + v| iterative RM code construction.
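The |u|u + v| construction mentioned here is easy to state in code. The following sketch builds RM(r, m) recursively from that rule, with the repetition code and the full space as base cases, and checks the (8, 4, 4) parameters:

```python
from itertools import product

def rm(r, m):
    """Reed-Muller code RM(r, m) via the |u|u+v| construction (set of tuples)."""
    if r == 0:
        return {(0,) * 2**m, (1,) * 2**m}                        # repetition code
    if r == m:
        return {tuple(v) for v in product((0, 1), repeat=2**m)}  # full space
    U = rm(r, m - 1)
    V = rm(r - 1, m - 1)
    # each codeword is (u, u + v) with u in RM(r, m-1), v in RM(r-1, m-1)
    return {u + tuple((a + b) % 2 for a, b in zip(u, v)) for u in U for v in V}
```

For example, `rm(1, 3)` yields the sixteen codewords of the (8, 4, 4) code, with minimum nonzero weight 4.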
Exercise 1. (Realizations of repetition and SPC codes)
Show that a reduced Hadamard transform realization of a repetition code RM(0, m) or a
single-parity-check code RM(m−1, m) is a cycle-free tree-structured realization with a minimum
number of (3, 1, 3) repetition constraints or (3, 2, 2) parity-check constraints, respectively, and
furthermore with minimum diameter (distance between any two code symbols in the tree).
Show that these two realizations are duals; i.e., one is obtained from the other via interchange
of (3, 2, 2) constraints and (3, 1, 3) constraints.
Exercise 2. (Dual realizations of RM codes)
Show that in general a Hadamard transform (HT) realization of any Reed-Muller code
RM(r, m) is the dual of the HT realization of the dual code RM(m − r − 1, m); i.e., one is
obtained from the other via interchange of (3, 2, 2) constraints and (3, 1, 3) constraints.
Exercise 3. (General tree-structured realizations of RM codes)
Show that there exists a tree-structured realization of RM(r, m) of the following form:
[Diagram: four leaf constraint codes C2, each attached to 2^{m−2} code
symbols, are connected through state spaces of dimension s(r, m) to two
central constraint codes C1, which are themselves joined by a state space
of dimension s(r, m).]
Show that s(r, m) = dim RM(r, m − 1) − dim RM(r − 1, m − 1) (see Exercise 1 of Chapter 10).
Show that the cut-set bound is met everywhere. Finally, show that
dim C2 = dim RM(r, m − 2);
dim C1 = dim RM(r, m − 1) − 2 dim RM(r − 2, m − 2) = t(r, m),
where t(r, m) is the branch complexity of RM(r, m) (compare Table 1 of Chapter 6). For example,
there exists a tree-structured realization of the (32, 16, 8) RM code as follows:
[Diagram: four (14, 7) leaf constraint codes, each attached to 8 code
symbols, connected through state spaces of dimension 6 to two central
(18, 9) constraint codes, themselves joined by a state space of dimension 6.]
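The formulas of Exercise 3 can be evaluated for the (32, 16, 8) code RM(2, 5); this sketch uses the standard binomial-sum formula for dim RM(r, m):

```python
from math import comb

def dim_rm(r, m):
    """dim RM(r, m) = sum of C(m, i) for i = 0..r (0 for r < 0)."""
    if r < 0:
        return 0
    return sum(comb(m, i) for i in range(min(r, m) + 1))

r, m = 2, 5                                            # the (32, 16, 8) code
s = dim_rm(r, m - 1) - dim_rm(r - 1, m - 1)            # state dimension
dim_C2 = dim_rm(r, m - 2)                              # leaf constraint codes
dim_C1 = dim_rm(r, m - 1) - 2 * dim_rm(r - 2, m - 2)   # central constraints
```

This gives s = 11 − 5 = 6, dim C2 = 7, and dim C1 = 11 − 2 = 9, matching the (14, 7) and (18, 9) constraint codes in the diagram.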
A factor graph represents a global function of a set of variables (both internal and external) that
factors into a product of local functions defined on subsets of the variables.
The indicator function ΦB(y, s) of a behavior B is a {0, 1}-valued function of external variables
y and internal variables s that equals 1 for valid trajectories (y, s) and equals 0 otherwise. If a
trajectory (y, s) is valid whenever its components lie in a set of local constraint codes {Ck , k ∈ K},
then the global indicator function ΦB is the product of local indicator functions {ΦCk , k ∈ K}.
Thus a behavioral realization may be represented by a factor graph.
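As a toy illustration of a global indicator function factoring into local indicators, consider a small hypothetical behavior (chosen only for illustration) with external bits y0, y1, y2 and one internal state bit s, constrained by two local parity checks:

```python
from itertools import product

# Local constraint codes over GF(2):
#   C1: y0 + y1 + s = 0      C2: s + y2 = 0
def phi_C1(y0, y1, s):
    return 1 if (y0 + y1 + s) % 2 == 0 else 0

def phi_C2(s, y2):
    return 1 if (s + y2) % 2 == 0 else 0

def phi_B(y0, y1, y2, s):
    # Global indicator function = product of the local indicator functions.
    return phi_C1(y0, y1, s) * phi_C2(s, y2)

# The external behavior: y-configurations that extend to a valid trajectory.
valid = {(y0, y1, y2) for y0, y1, y2, s in product((0, 1), repeat=4)
         if phi_B(y0, y1, y2, s)}
```

The valid external configurations are exactly the four codewords of the (3, 2, 2) single-parity-check code, as the two local constraints force s = y2 and y0 + y1 + y2 = 0.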
A Tanner-type factor graph is an undirected bipartite graph in which variables are represented
by one type of vertex (with internal and external variables denoted differently), and functions
are represented by a different type of vertex. A Tanner graph of a behavioral realization may
be interpreted as a Tanner-type factor graph simply by regarding the constraint vertices as
representatives of constraint indicator functions. Similarly, a normal (Forney-type) factor graph
is an undirected graph in which internal variables are represented by edges, external variables
are represented by dongles, and functions are represented by vertices; in the same way a normal
graph of a behavioral realization may be interpreted as a normal factor graph.
In the following chapters, we will be interested in global probability functions that factor into
a product of local probability functions; then factor graphs become very useful.
Markov graphs are often used in statistical physics and statistical inference to represent global
probability distributions that factor into a product of local distributions.
A Markov graph (Markov random field) is an undirected graph in which variables are repre-
sented by vertices, and a constraint or function is represented by an edge (if it has degree 2), or
by a hyperedge (if it has degree greater than 2). Moreover, a hyperedge is usually represented by
a clique, i.e., a set of ordinary edges between every pair of variables incident on the hyperedge.
(This style of graph representation sometimes generates inadvertent cliques.)
Markov graphs are particularly nice when the degrees of all constraints are 2 or less. Such a
representation is called a pairwise Markov graph. We may then represent constraints by ordinary
edges. Pairwise constraints often arise naturally in physical models.
Figure 14 shows how any Tanner graph (or Tanner-type factor graph) may be transformed
into a pairwise Markov realization by a simple conversion. Here each constraint code has been
replaced by a state “supervariable” whose alphabet is the set of all codewords in the constraint
code. Each edge then represents the constraint that the associated ordinary variable must be
equal to the corresponding component of the supervariable.
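The supervariable conversion can be sketched on a one-constraint Tanner graph; here the constraint is a (3, 2, 2) parity check, and each edge becomes the pairwise constraint "y_k equals component k of the supervariable S":

```python
from itertools import product

# Toy Tanner graph: one (3, 2, 2) parity-check constraint on (y0, y1, y2).
constraint = {c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0}

# Pairwise-Markov conversion: the constraint code becomes a supervariable S
# ranging over its codeword set; each edge only checks one component equality.
def edge_ok(y_k, S, k):
    return S[k] == y_k

valid = {(y0, y1, y2)
         for S in constraint
         for y0, y1, y2 in product((0, 1), repeat=3)
         if edge_ok(y0, S, 0) and edge_ok(y1, S, 1) and edge_ok(y2, S, 2)}
```

The pairwise realization admits exactly the same external configurations as the original constraint, as expected.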
56 CHAPTER 4. HAMMING CODES
For a Hamming code, the covering radius is 1. Indeed, for any perfect
e-error-correcting code, the covering radius is e.
Proof. The coset of the word −x consists of the sum of −x with each
individual codeword of C, so the weights of the coset members give the distances
of x from the various codewords. The minimal such weight is thus the distance
of x from the code and also the weight of a coset leader. The maximum weight
of a coset leader is therefore the largest distance of any word x from the code.
□
As with dmin , the covering radius of a code is, in general, difficult to compute.
The following problem, reminiscent of Problem 4.1.5, can be of great help.
( 4.2.3 ) Problem. Let the [n1 , k1 ] linear code C1 over F have generator matrix G1 ,
and let the [n2 , k2 ] linear code C2 over F have generator matrix G2 . Consider the
[n1 + n2 , k1 + k2 ] linear code C over F with generator matrix
G = [ 0    G1 ]
    [ G2   ∗  ] ,
where the upper left 0 is a k1 × n2 matrix of 0’s and the lower right ∗ is an arbitrary
k2 × n1 matrix with entries from F .
Prove that cr(C) ≤ cr(C1 ) + cr(C2 ).
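The bound of Problem 4.2.3 can be spot-checked by brute force on small codes. This sketch takes the ∗ block to be zero (one admissible choice) and uses [3, 1] repetition and [3, 2] parity-check components:

```python
from itertools import product

def span(G):
    """All GF(2) linear combinations of the rows of G."""
    n = len(G[0])
    return {tuple(sum(c * g[j] for c, g in zip(coeffs, G)) % 2 for j in range(n))
            for coeffs in product((0, 1), repeat=len(G))}

def covering_radius(C):
    """Largest distance of any word of F_2^n from the code C (brute force)."""
    n = len(next(iter(C)))
    return max(min(sum(a != b for a, b in zip(x, c)) for c in C)
               for x in product((0, 1), repeat=n))

G1 = [(1, 1, 1)]                 # [3, 1] repetition code, cr = 1
G2 = [(1, 0, 1), (0, 1, 1)]     # [3, 2] parity-check code, cr = 1
C1, C2 = span(G1), span(G2)

# G = [[0, G1], [G2, 0]]: first n2 = 3 coordinates from G2, last n1 = 3 from G1.
G = [(0, 0, 0) + g for g in G1] + [g + (0, 0, 0) for g in G2]
C = span(G)
```

With ∗ = 0 the construction is a direct sum and the bound is met with equality.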
easily recovered from RM(1, m). Indeed by first choosing all the codewords of
RM(1, m) that have a 0 in their first coordinate position and then deleting this
now useless coordinate, we find the dual Hamming code. This is clear when we
consider how the matrix XLm was constructed by bordering the matrix Lm , the
generator matrix of the dual lexicographic Hamming code. (See page 51.)
Having earlier constructed the generator XLm as a matrix in bordered block
form, we now examine it again, but blocked in a different manner. Notice that
RM(1, 1) = F2^2, RM(1, 2) is the parity check code of length 4, and RM(1, 3) is a
self-dual extended [8, 4] Hamming code.
Examples.
XL1 =
[ 0 1 ]
[ 1 1 ]
and
XL2 =
[ 0 0 1 1 ]
[ 0 1 0 1 ]
[ 1 1 1 1 ]
XL3 =
[ 0 0 0 0 1 1 1 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 1 0 1 0 1 0 1 ]
[ 1 1 1 1 1 1 1 1 ]
XL4 =
[ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]
[ 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 ]
[ 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ]
[ 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ]
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
For i between 2^{m−1} and 2^m, the column m-tuple containing the binary
representation of i is just that for i − 2^{m−1} with its leading 0 replaced
by a 1. Therefore, if we ignore the top row of XLm, then the remaining m rows
consist of an m × 2^{m−1} matrix repeated twice. Indeed this repeated matrix
is nothing other than XLm−1. We now observe the recursive construction:
XLm =
[ 0 · · · 0    1 · · · 1 ]
[ XLm−1        XLm−1     ] .
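The recursion can be transcribed directly. This sketch rebuilds XLm and checks that, below the all-one row, the columns are the binary representations of 0, 1, . . . , 2^m − 1:

```python
def xl(m):
    """Generator matrix XL_m of RM(1, m), built by the bordered recursion."""
    if m == 1:
        return [[0, 1], [1, 1]]
    prev = xl(m - 1)
    top = [0] * 2**(m - 1) + [1] * 2**(m - 1)   # new bordering row
    return [top] + [row + row for row in prev]  # XL_{m-1} repeated twice
```

Calling `xl(3)` reproduces the matrix XL3 displayed above.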
( 4.3.1) Theorem. For each m, the first order Reed-Muller code RM(1, m) is
a binary linear [2^m, m + 1, 2^{m−1}] code.
Proof. Certainly RM(1, m) is linear of length 2^m, and its dimension m + 1
is evident from the generator matrix XLm. From their generator matrices XL1
and XL2, it is easy to see that RM(1, 1) (= F2^2) and RM(1, 2) (the parity check
code of length 4) both have minimum distance 2^{m−1}.
We now verify the minimum distance in general by induction, assuming that
we already have dmin(RM(1, m − 1)) = 2^{m−2}. Let C1 be RM(1, m − 1) with
minimum distance d1 = 2^{m−2}, and let C2 be the repetition code of length
2^{m−1}, whose minimum distance is therefore d2 = 2^{m−1}. The generator matrix
XLm for RM(1, m) is then constructed from the generators G1 = XLm−1 and
G2 = [1 · · · 1] according to the recipe of Problem 4.1.5. Therefore, by that
problem, we have
dmin(RM(1, m)) = min(2d1, d2) = min(2 · 2^{m−2}, 2^{m−1}) = 2^{m−1},
as claimed. □
Proof. The last row of the generator matrix XLm is 1; so 0 and 1 are
the unique codewords of weight 0 and 2^m, respectively. By Theorem 4.3.1 the
linear code RM(1, m) has no codewords c of weight between 0 and 2^{m−1}, and
so it also has no codewords 1 + c of weight between 0 and 2^{m−1}. That is, it has
no codewords of weight between 2^{m−1} and 2^m. Therefore all codewords other
than 0 and 1 have weight exactly 2^{m−1}. □
Proof. In recovering the dual Hamming code from RM(1, m), we shorten
the code by taking all codewords that begin with 0 and then delete that position.
In particular the codeword 1 of RM(1, m) does not survive. But by Theorem
4.3.2 all other nonzero codewords of RM(1, m) have weight 2^{m−1}. As only zeros
are deleted, all the nonzero codewords of the dual Hamming code also will have
weight 2^{m−1}. □
These dual Hamming codes are equidistant codes in that distinct codewords
are at a fixed distance from each other, here 2^{m−1}. They satisfy the Plotkin
bound 2.3.8 with equality. (The proof of the Plotkin bound as given in Prob-
lem 3.1.5 compares the minimum distance with the average distance between
codewords. For an equidistant code these are the same.)
For a binary word x ∈ F2^n, consider the corresponding word x∗ ∈ {+1, −1}^n
gotten by replacing each 0 by the real number +1 and each 1 by −1.
Proof. The dot product of two ±1 vectors is the number of places in which
they are the same minus the number of places where they are different. Here
that is (n − dH(x, y)) − dH(x, y) = n − 2dH(x, y). □
Let RM(1, m)± be the code got by replacing each codeword c of RM(1, m)
with its ±1 version c∗. List the codewords of RM(1, m)± as c∗_1, c∗_2, . . . , c∗_{2^{m+1}}.
( 4.3.5) Lemma. If c∗ ∈ RM(1, m)± then also −c∗ ∈ RM(1, m)±. We have
so H8 H8^T = 8 I8×8. The codewords of RM(1, 3)± are then the rows of H8
and their negatives.
Suppose we receive the vector r = (1, 1, −1, 1, −1, −1, 1, 1). This has
Hadamard transform H8 r^T = (2, −2, 2, −2, 2, −2, 6, 2). The entry with
largest absolute value is r̂7 = 6 > 0, so we decode to
The entry with largest absolute value is r̂4 = −6.3 < 0, so we decode to
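The decoding step can be reproduced in a few lines. The sketch below assumes the Sylvester-type recursive construction of H8 (introduced later in this section); with a different row ordering of H8 the intermediate transform entries would be permuted, so only the decisive entry, the value 6 attained by a unique row, is asserted:

```python
def sylvester(m):
    """Sylvester-type Hadamard matrix H_{2^m} from the recursive block form."""
    H = [[1]]
    for _ in range(m):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H8 = sylvester(3)
r = [1, 1, -1, 1, -1, -1, 1, 1]
r_hat = [sum(h * x for h, x in zip(row, r)) for row in H8]  # H8 r^T

# Decode: pick the entry of largest absolute value; its sign chooses
# between the corresponding row and its negative.
i = max(range(8), key=lambda j: abs(r_hat[j]))
decoded = H8[i] if r_hat[i] > 0 else [-x for x in H8[i]]
```

The winning entry is 6 = 8 − 2·1, consistent with the lemma above: the decoded codeword differs from r in exactly one place.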
( 4.3.6 ) Problem. Assume that you are using the code RM(1, 3)± of the example.
Use Hadamard transform decoding to decode the received word
A ±1 matrix H satisfying H H^T = n I_{n×n} is called a Hadamard matrix. If
we take as a code the rows of H and their negatives, then Hadamard transform
decoding will work exactly as described above. Such a code (or its {0, 1}
counterpart) is called a Hadamard code.
Begin with a Hadamard code of side n. Choose those n codewords that start
with +1, drop that position, and translate back to a {0, 1} code. The result is
a binary code of length n − 1 and size n which is equidistant of distance n/2. A
code constructed in this fashion is a shortened Hadamard code. Starting with
the matrix H8 of the example above, we recover from RM(1, 3) and RM(1, 3)±
the [7, 3] dual Hamming code.
Although any Hadamard matrix can be used to design a code that allows
Hadamard transform decoding, there are certain advantages to be gained from
using those matrices that come from Reed-Muller codes as described. The
existence of a soft decision algorithm is good, but we hope to implement it as
efficiently as possible. Consider decoding using the matrix H8 of the example.
Each decoding process requires 63 operations, 56 additions for calculating the
8 dot products and 7 comparisons among the answers. (By convention the
operation of negation is considered to make no contribution.) Certain annoying
repetitions are involved. For instance, both the second and the sixth rows of
H8 begin +1, −1, +1, −1; so the corresponding calculation r1 − r2 + r3 − r4 is
made twice during the decoding process. Can this and other patterns within
H8 be exploited? The answer is “yes,” and it is this fact that makes a matrix
derived from RM(1, m) a better choice than other Hadamard matrices with the
same dimensions.
4.3. FIRST ORDER REED-MULLER CODES 61
Let H1 = [1], a 1 × 1 Hadamard matrix, and define recursively a 2^{m+1} × 2^{m+1}
matrix in block form

H_{2^{m+1}} =
[ +H_{2^m}   +H_{2^m} ]
[ +H_{2^m}   −H_{2^m} ] .

Then

H2 =
[ +1 +1 ]
[ +1 −1 ] ,

and

H4 =
[ +1 +1 +1 +1 ]
[ +1 −1 +1 −1 ]
[ +1 +1 −1 −1 ]
[ +1 −1 −1 +1 ] .
The matrix H8 is that of the example. This construction can be continued
for all m. The matrix H_{2^m} produced is a Hadamard matrix associated with
RM(1, m)± and the Reed-Muller code RM(1, m) whose generator matrix is XLm.
The recursive construction of H_{2^m} is related to that of XLm and admits a
streamlined implementation of decoding for RM(1, m) and RM(1, m)±, using
the so-called Fast Hadamard Transform or FHT algorithm. For instance, FHT
decoding of RM(1, 3)± can be achieved with 31 operations rather than the 63
counted previously.
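The fast Hadamard transform can be sketched as a butterfly recursion. For n = 8 it uses 8 · log2(8) = 24 additions/subtractions, which together with the 7 comparisons gives the 31 operations quoted above:

```python
def fht(r):
    """Fast Hadamard transform (Sylvester ordering), n log2(n) add/subtracts."""
    r = list(r)
    n = len(r)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = r[j], r[j + h]
                r[j], r[j + h] = a + b, a - b   # butterfly step
        h *= 2
    return r
```

The output agrees entry for entry with the direct matrix-vector product against the Sylvester Hadamard matrix.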
The Reed-Muller codes in general, and the code RM(1, 3) in particular, are
important codes that often arise as constituents of larger codes. It is therefore
worthwhile to have decoding algorithms that are as efficient as possible. Sun
and Van Tilborg have given a soft decision algorithm for RM(1, 3) that is related
to FHT decoding but only needs, on the average, 14 operations with worst-case
performance of 17 operations.
POLYNOMIAL CODES AND FINITE
GEOMETRIES∗
E. F. Assmus, Jr and J. D. Key
1 Introduction
The reader familiar with “Designs and their Codes” will soon understand the
debt this chapter owes to that book — especially its Chapter 5. We have,
however, entirely reworked that material and, more importantly, added a
discussion of the group-algebra approach to the Reed-Muller and generalized
Reed-Muller codes. This enables us to include a straightforward new proof
of Berman’s theorem identifying the Reed-Muller codes with the radical
powers in the appropriate modular group algebra and to use our treatment
of the Mattson-Solomon polynomial to give a proof of the generalization
of Berman’s theorem to the p-ary case. We have also included Charpin’s
treatment [16] of the characterization of “affine-invariant” extended cyclic
codes due to Kasami, Lin and Peterson.
We have relied heavily on Charpin’s doctoral thesis [14, 16] for the new
material. The older material relies (as did Chapter 5 of our book) on the
treatment of the polynomial codes introduced by Kasami, Lin and Peterson
[29] given by Delsarte, Goethals and MacWilliams [18].
Our definition of the generalized Reed-Muller codes is the straightfor-
ward generalization of the boolean-function definition of the Reed-Muller
codes and, for us, the cyclicity of the punctured variants is simply a conse-
quence of the easily seen fact that their automorphism groups contain the
general linear groups.
We are, of course, principally interested in the geometric nature of certain
of these codes. Were one interested only in the binary case the development
would be very short and our treatment reflects that fact in that we first dis-
cuss the Reed-Muller codes giving complete proofs that differ substantially
from those given for the general case. In fact, we have here an instance in
2 PROJECTIVE AND AFFINE GEOMETRIES 3
which the generalization to an arbitrary finite field seems far from trivial,
the biggest hurdle being the passage to fields that are not of prime order.
The peculiar nature of the definitions of the geometric codes in the
coding-theory literature was due to the interest — at the time of their in-
troduction — in majority-logic decoding of these codes; we therefore also
give a short discussion of decoding. On the other hand, we give the natu-
ral definitions of the geometric codes (as codes generated by the incidence
vectors of the geometric objects at hand) and, hence, our definitions are not
the ones found in many engineering texts.
We review the necessary geometry briefly before beginning our discussion
of the codes; our treatment is undoubtedly too brief to be useful to a reader
with no background whatsoever in finite geometry and such a reader may
wish to jump directly to Section 3 — which may even motivate a study of
the geometry involved. Much of the material will be understandable even
without a firm grip on the geometry and subsequent sections should be of
interest to professional coding theorists. We have, at least, tried to make
them so.
We assume a knowledge of coding theory and we believe the reader will
find in Chapter 1 the coding theory necessary for a study of this chapter.
We have not attempted to discuss open problems or to explore new
avenues of research. The reader interested in such matters may wish to
consult our book [2] or the articles cited in the bibliography.
ϕ : P G(V ) → P G(W )
Definition 2.2 Let F be a field and let V and W be vector spaces over F .
A semilinear transformation of V into W is given by a map
T :V →W
α(T ), then
T : (x1, x2, . . . , xm) ↦ (x1^α, x2^α, . . . , xm^α) A,
where, as usual, we have used the bases to identify V with F m and W with
F n . In matrix form, the composition of two semilinear transformations,
(α, A) and (β, B), is (αβ, A^β B), where A^β denotes the matrix (a_{ij}^β). Since
a matrix A together with an automorphism α clearly yield, by the above
formula, a semilinear transformation, the map sending T to α(T ), in the
case where V = W , is a homomorphism onto the Galois group of F .
Thus, for a given vector space V , the group of semilinear isomorphisms
of V contains GL(V ), the group of invertible linear transformations of V ,
as a normal subgroup, the quotient being the Galois group of F . The group
of semilinear isomorphisms is denoted by Γ L(V ) . Clearly every semilinear
isomorphism of V induces an isomorphism of P G(V ). The scalar trans-
formations (i.e. those that send v to av for some fixed a ∈ F ) induce the
identity isomorphism and they are the only semilinear isomorphisms that
do. The subgroup of scalar transformations is the centre of GL(V ) and a
normal subgroup of Γ L(V ); the quotient groups are denoted, respectively,
by P GL(V ) — the projective general linear group — and P Γ L(V )
— the projective semilinear group. If V is n-dimensional and a basis
has been chosen, P GL(V ) becomes a matrix group modulo scalar matrices
and is denoted by P GLn (F ); similarly in this case we write P Γ Ln (F ) for
P Γ L(V ). Each of these groups acts as a permutation group on the elements
of P G(V ), the action on the points of P G(V ) being doubly-transitive, which
means that given any two pairs of distinct points, (P, Q) and (P 0 , Q0 ), there
is an automorphism in P GL(V ) which simultaneously carries P to P 0 and
Q to Q0 . In the standard notation, P GLn (F ) acts on P Gn−1 (F ); similarly
for the semilinear group.
All the collineations of P G(V ) are induced by semilinear transforma-
tions; this is the content of the following classical fundamental theorem
of projective geometry:
Definition 2.4 The cosets x+U and y+W in AG(V ) are parallel if U ⊆ W
or W ⊆ U .
Cosets of the same subspace are thus parallel and cosets of the same
dimension are parallel if and only if they are cosets of the same subspace.
For a given subspace U of dimension r, its distinct cosets partition V into
parallel r-flats and parallelism is an equivalence relation on the set of r-flats
of V , the equivalence classes being called parallel classes. Hyperplanes, i.e.
(n − 1)-flats, in AGn (F ) are parallel if and only if they are equal or intersect
in the empty set and in AGn (F2 ) a hyperplane and its complement make
up a parallel class. In AGn (Fq ) there are q hyperplanes in a parallel class.
Here is one more important fact about flats that we will need to properly
explain Reed’s decoding algorithm for Reed-Muller codes:
If M is an r-flat and N an (n − r)-flat in AGn (F ), then either M ∩ N is
a single point, in which case N meets all the r-flats parallel to M in a single
point, or else the intersection of N with an r-flat parallel to M is either a
flat of positive dimension or the empty set.
As in the projective case, both GL(V ) and Γ L(V ) act on the geometry,
but now we also have V itself acting via translation. The underlying action
of the affine general linear group, AGL(V ) , and the affine semilinear
group, AΓ L(V ), is given as follow: for T ∈ Γ L(V ) and v ∈ V , the map
(T, v) is defined by
x(T, v) = xT + v
for each x ∈ V . Such maps preserve cosets and thus act on AG(V ). Com-
position is given by (S, v)(T, w) = (ST, vT + w) and it follows that these
affine groups are semi-direct products of the linear and semilinear groups
(respectively) with the additive group of V , the action of the linear and
semilinear groups on V being the natural one. The permutation action on
the points of AG(V ), i.e. on the vectors in V , is doubly-transitive and, if
F = F2 , it is triply-transitive.
Given a basis v1 , v2 , . . . , vn for V , if (T, v) is an element of AΓ L(V ),
and v = Σ_i b_i v_i, define the matrix A via v_i T = Σ_j a_{ij} v_j, and let α be the
U ϕ−1 = U ∩ (x + H),
where v^X(x) denotes the value that the function v^X takes at the point x.
Then
Cp(D) = ⟨v^B | B ∈ B⟩.
The dimension of Cp (D) is referred to as the p-rank of D. The rank tends to
vary with p in the general case; for so-called 2-designs it is easily determined
except for those primes dividing the order of the design.
The minimum weight of the code arising from an incidence structure is
clearly at most equal to the cardinality of the smallest block. In general the
minimum weight is strictly less than this cardinality, but for the classical
geometric designs studied in this chapter there is a distinguished prime one
considers, and for these codes we will have equality.
As we will soon see, one of the most widely studied classes of binary
codes, the Reed-Muller codes, arises precisely as the class of codes given by
geometric designs over the binary field — although the original presentation
of these codes in 1954 was in the boolean-function context and was given by
electrical engineers.
3.1 Definitions
Throughout this section F will denote the field F2. Let V be a vector space
of dimension m over F . We let F^V denote the vector space over F of all
functions from V to F . As a vector space over F , F^V has dimension 2^m,
the cardinality of the set V . Since F^V will be the ambient space for the
Reed-Muller codes we must choose a basis for it and we choose the standard
basis consisting of the characteristic functions of the elements of the set V .
Denoting a typical element of V by v, these basis elements are {v^v | v ∈ V },
where we write v^v instead of the more cumbersome v^{{v}}. Viewing V as F^m,
3 THE REED-MULLER CODES 13
e_i = (0, 0, . . . , 1, 0, . . . , 0), with the 1 in the i-th position.
where we write 1 for the constant function x1^0 x2^0 · · · xm^0 with value 1 at all
points of V ; as a code vector it is the all-one vector 1. The linear com-
binations over F of these 2^m monomials give all the polynomial functions,
since, once again, we can reduce any polynomial in the xi modulo x_i^2 − x_i,
for i = 1, 2, . . . , m. The set M of 2^m monomials is another basis for the
vector space F^V ; the following lemma indicates how each of our given basis
elements of characteristic functions of the vectors in V is given as a polyno-
mial, i.e. as a sum of elements of M. This not only proves the assertion but
also shows that the set M is a linearly independent set of vectors in F^V .
Proof: The proof is simple: the first polynomial is easily seen to define the
characteristic function of the vector w; and the expansion of this product is
clearly the sum on the right. □
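The lemma's product formula can be checked exhaustively for a small case; here m = 3 and the point w = (1, 0, 1) is an arbitrary illustrative choice:

```python
from itertools import product

m = 3
w = (1, 0, 1)   # an arbitrary point of V = F2^3

def char_poly(x, w):
    """Product over i of (x_i + w_i + 1) in F2: equals 1 iff x == w."""
    val = 1
    for xi, wi in zip(x, w):
        val *= (xi + wi + 1) % 2
    return val % 2

table = {x: char_poly(x, w) for x in product((0, 1), repeat=m)}
```

The function takes the value 1 at w and 0 everywhere else, i.e. it is exactly the characteristic function of w.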
We repeat the definition of the Reed-Muller codes:
Example 3.3 The first-order Reed-Muller code R(1, m) consists of all lin-
ear combinations of the monomials xi and 1 and hence each codeword, apart
from 0 and 1, is given either by a non-zero linear functional on V or by 1
plus such a functional. Since any non-zero linear functional has 2^{m−1} zeros,
every vector of R(1, m), apart from 0 and 1, has weight 2^{m−1}. A generator
matrix for R(1, m) using the basis x1, x2, . . . , xm, 1 can be written so that
the first 2^m − 1 columns and m rows are the binary representations of the
numbers between 1 and 2^m − 1, whereas the last column is all 0, apart from
a final row where all entries are equal to 1. This is clearly a generator matrix
for the orthogonal of the extended Hamming code, i.e. R(1, m) = (Ĥ_m)^⊥,
where Ĉ denotes the code obtained from C by adding an overall parity check.
R(r, m) ⊆ R(s, m)
whenever 0 ≤ r ≤ s ≤ m.
We mentioned above that the orthogonal of R(0, m) = F 1 is R(m−1, m).
This is a special case of the following result, which was proved in Chapter 1:
We will, in fact, reprove this result in Section 5 when we give its straight-
forward generalization to generalized Reed-Muller codes, Theorem 5.8.
Example 3.5 From Theorem 3.4 we get immediately that
R(1, m)^⊥ = Ĥ_m = R(m − 2, m).
binary code.
Proof: This follows easily: the dimension must be that of R(r, m) since
all the vectors in this code are of even weight and the projection cannot,
therefore, have a nontrivial kernel. □
Finally, note that it follows from Theorem 3.4 that
(R(r, m)∗ )⊥ = R(m − r − 1, m)∗ ∩ (F2 1)⊥
provided r < m. That is, (R(r, m)∗ )⊥ consists of the vectors of (R(m − r −
1, m)) with a zero at 0, that coordinate being discarded.
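The duality statements above can be spot-checked on generators. The sketch below builds monomial evaluation bases for m = 4 and verifies that the generator sets of R(1, 4) and R(2, 4) are mutually orthogonal, with dimensions summing to 2^m:

```python
from itertools import product, combinations

m = 4
points = list(product((0, 1), repeat=m))

def monomial_row(S):
    """Evaluation vector of the monomial prod_{i in S} x_i over F2^m."""
    return tuple(int(all(p[i] for i in S)) for p in points)

def rm_gen(r, m):
    """Monomial generator rows for R(r, m): all monomials of degree <= r."""
    return [monomial_row(S)
            for d in range(r + 1)
            for S in combinations(range(m), d)]

G1 = rm_gen(1, m)          # R(1, 4): 5 generators
G2 = rm_gen(m - 2, m)      # R(2, 4): 11 generators
ortho = all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in G1 for v in G2)
```

Every inner product is even because the supporting point count 2^{m − |S ∪ T|} is even whenever deg(S) + deg(T) < m, which is the combinatorial heart of the duality theorem.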
R(1, m) = C2 (A).
The general case is almost as easy. First of all we have that the flats are
in the Reed-Muller code:
Proof: Any (m − r)-flat T in AGm(F2) consists of all the vectors (points of
the affine space) x = (x1, x2, . . . , xm) that satisfy r linear equations,

Σ_{j=1}^{m} a_{ij} X_j = b_i , for i = 1, 2, . . . , r,
has degree at most r and thus is in R(r, m). Moreover it is clearly the
characteristic function v^T of T . □
In fact the degree of the polynomial is exactly r when the equations are
independent and the proof actually shows that all the (m − s)-flats are in
R(r, m) provided s ≤ r.
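The proof's observation, that the characteristic function of an (m − r)-flat has weight 2^{m−r} and lies in R(r, m), can be verified for a small example: m = 4, r = 2, with the flat cut out by x1 = 1, x2 = 0 (an arbitrary choice; coordinates are 0-indexed in the code):

```python
from itertools import product, combinations

m, r = 4, 2
points = list(product((0, 1), repeat=m))

# An (m−r)-flat cut out by r independent equations: x_0 = 1 and x_1 = 0.
flat_ind = tuple(int(p[0] == 1 and p[1] == 0) for p in points)

def monomial_row(S):
    """Evaluation vector of the monomial prod_{i in S} x_i over F2^m."""
    return tuple(int(all(p[i] for i in S)) for p in points)

def gf2_rank(rows):
    """Rank over GF(2), rows packed into integers."""
    vals = [int("".join(map(str, row)), 2) for row in rows]
    rank = 0
    while vals:
        pivot = vals.pop()
        if pivot == 0:
            continue
        rank += 1
        top = pivot.bit_length() - 1
        vals = [x ^ pivot if (x >> top) & 1 else x for x in vals]
    return rank

gens = [monomial_row(S) for d in range(r + 1) for S in combinations(range(m), d)]
# Membership in R(r, m): adjoining the indicator must not increase the rank.
in_code = gf2_rank(gens) == gf2_rank(gens + [flat_ind])
weight = sum(flat_ind)
```

The indicator here is x1(1 + x2) = x1 + x1 x2, a polynomial of degree exactly r = 2, so it indeed lies in R(2, 4) and has weight 2^{4−2} = 4.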
Theorem 3.10 Let A be the design of points and r-flats of the affine ge-
ometry AGm (F2 ), where 0 ≤ r ≤ m. Then the binary code C2 (A) is the
Reed-Muller code R(m − r, m). Its dimension is
(m choose 0) + (m choose 1) + · · · + (m choose m − r).
The characteristic functions of the r-flats are vectors of weight 2^r and are
precisely the minimum-weight vectors of R(m − r, m), as we shall soon prove.
Before doing so, we introduce two exact sequences that arise naturally from
the geometric nature of the Reed-Muller and punctured Reed-Muller codes.
Lemma 3.12 Any embedding of P Gm−1 (F2 ) into P Gm (F2 ) gives rise to
the following two short exact sequences whenever 0 ≤ r < m:
Proof:
Let W be the (m+1)-dimensional vector space defining P Gm (F2 ). Then
an embedding of P Gm−1 (F2 ) in P Gm (F2 ) is given by a hyperplane H of W
and, moreover, the complement of H in W , H̄ = W − H, is a copy of
AGm (F2 ), as we explained in Section 2.2.
Let D be the design of points and r-dimensional subspaces of P Gm (F2 ).
Using P G(H) we form the design D1 of r-dimensional subspaces in
P Gm−1 (F2 ), and from AG(H̄) we form the design D2 of r-flats in AGm (F2 ).
By Theorem 3.10, C2 (D) = R(m − r, m + 1)∗ , C2 (D1 ) = R(m − r − 1, m)∗
and C2 (D2 ) = R(m − r, m).
Any block of the design D is either in H or meets it in an (r − 1)-
dimensional (projective) subspace. The intersection with H̄ is thus empty or
an r-flat; clearly every r-flat of AG(H̄) arises in this way. Thus C = C2 (D)
projects onto C2 (D2 ), and C2 (D1 ) is in the kernel. Thus dim(C2 (D1 )) ≤
dim(C2 (D)) − dim(C2 (D2 )), and using the formula for the dimension of the
Reed-Muller codes, we have
Σ_{i=0}^{m−r−1} (m choose i) ≤ Σ_{i=0}^{m−r} (m+1 choose i) − Σ_{i=0}^{m−r} (m choose i).
Using the identity (m+1 choose k) = (m choose k) + (m choose k−1) repeatedly
shows that this is actually
an equality, and hence that C2 (D1 ) is the whole kernel. This yields the short
exact sequence (i).
To obtain the second sequence we use the same embedding but now
project C onto the coordinate positions corresponding to the points of
P G(H). Let E2 be the design of points and (r − 1)-dimensional subspaces of
P Gm−1 (F2 ), and E1 the design of points and (r + 1)-flats of AGm (F2 ). Then
C2 (E2 ) = R((m − 1) − (r − 1), m)∗ , and C2 (E1 ) = R(m − r − 1, m). Certainly
C projects onto C2 (E2 ) since every r-dimensional projective subspace
of P G(W ) meets H in an (r − 1)-dimensional subspace — or is contained
in H — and every (r − 1)-dimensional subspace arises in this way. Two
r-dimensional subspaces of P G(W ) that meet P G(H) in the same (r − 1)-
dimensional subspace have disjoint intersections in H̄ and thus form two
cosets of the same r-dimensional subspace of AG(H̄). Together they form
an (r + 1)-dimensional space. It follows immediately that the kernel of the
projection is C2 (E1 ) and thus yields the sequence (ii). □
We next draw out the consequences of Lemma 3.12 and in so doing
prove that the minimum weights of the Reed-Muller codes are as we have
indicated and, more importantly, determine the nature of the minimum-
weight vectors.
Since, for r = 0, the result is trivial for any m we may assume r > 0
and consider dimension m + 1. If r = m the results are easy, for then we
have R(1, m + 1), a case we have already discussed. We suppose 0 < r < m
and use the notation of Lemma 3.12. Thus D is the design of points and
r-dimensional subspaces of P Gm (F2 ). Let v be a minimum-weight vector
of C = C2 (D), so that wt(v) ≤ 2^{r+1} − 1. If v is zero at the coordinates
corresponding to H̄ = W − H, then v can be viewed in C2 (D1 ), from the
short exact sequence (i), and hence v has weight 2^{r+1} − 1 and is the incidence
vector of an r-dimensional subspace of H (and hence of W ), by the induction
hypothesis. If v is zero at the coordinates corresponding to H, then v can
be viewed in C2 (E1 ) = R(m − (r + 1), m), from the short exact sequence
(ii), and thus has weight at least 2^{r+1}, which is not possible. Thus v can
be taken to have support meeting both H and H̄. Again by the induction
hypothesis, the weight is at least 2^r + 2^r − 1 = 2^{r+1} − 1, using the last
non-zero terms of the short exact sequences, and hence has exactly this
weight. Furthermore, restricted to P G(H), v is the incidence vector of an
(r − 1)-dimensional subspace. To show that v is the incidence vector of an
r-dimensional subspace of P Gm (F2 ), construct an r-dimensional subspace
of P Gm (F2 ) whose incidence vector w coincides with v on P G(H) and that
contains at least one point in H̄ in common with the support of v. Then
the weight of v − w is easily seen to be less than 2^{r+1} − 1 and hence v = w.
This gives the projective result for projective dimension m from which the
affine result for dimension m + 1 follows since the Reed-Muller codes are
invariant under translation in V — as we remarked in the last section —
which means it is sufficient to consider only those minimum-weight vectors
of the Reed-Muller code with a 1 at 0. □
It should be noted that the code of any projective-geometry design is
cyclic due to the existence of Singer cycles (as already mentioned in Sec-
tion 2.1) and hence the punctured Reed-Muller codes are cyclic.
We summarize the results obtained on the properties of the Reed-Muller
codes and finite geometries over the field F2 :
(1) If A is the design of points and r-flats of the affine geometry AG_m(F_2),
where 0 ≤ r ≤ m, then the binary code C = C_2(A) is the Reed-Muller
code R(m − r, m). It is a $[2^m,\ \binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m-r},\ 2^r]$ binary
code and the minimum-weight vectors are the incidence vectors of the
3 THE REED-MULLER CODES 21
Example 3.15 (1) The code of the design of points and lines in AG_4(F_2)
is R(3, 4), which is the even-weight subcode of F^V. Its orthogonal is
F𝟏 = R(0, 4). The code of the design of points and planes is R(2, 4),
of dimension 11, with orthogonal the code from the design of points
and hyperplanes, of dimension 5, i.e. R(1, 4).
(2) The code of the design of points and lines in P G_3(F_2) is R(2, 4)^*,
of dimension 11 and minimum weight 3; it is, of course, a binary
Hamming code.
(3) A basis consisting of the incidence vectors of lines in P G_m(F_2) for the
code R(m − 2, m)^* = H_m can be found as follows (as described in
Key and Sullivan [32]): take any line and include its incidence vector;
take any point off the line, and include the three incidence vectors of
the three lines joining the new point to the points on the first line.
Continue in this way: at each stage, if there is a point not yet incident
with a chosen line then simply take all the incidence vectors of the
lines joining that point to the points already obtained. These incidence
vectors are clearly linearly independent and, as is easily seen, are equal
in number to the dimension; hence they yield a basis. The successive
dimensions are
The group AGL_m(F_2), in its natural action on V = F_2^m, yields a group
of automorphisms of every Reed-Muller code R(r, m), and P GL_m(F_2), in its
natural action on V^* = V − {0}, yields a group of automorphisms of every
punctured Reed-Muller code R(r, m)^*. That P GL_m(F_2) is the full group
of automorphisms of R(r, m)^* whenever 0 < r < m − 1 follows from Theo-
rem 3.13 and the fundamental theorem of projective geometry, Theorem 2.3.
From this it follows that AGL_m(F_2) is the full group of automorphisms of
R(r, m) whenever 0 < r < m − 1. One must be careful here and note that
it is not the entire projective space that must be preserved, but only part of
it, to ensure that the automorphism comes from the general linear group.
3.3 Decoding
One of the attractions of Reed-Muller codes is the simple and easily im-
plemented decoding scheme that is available, with decoding decisions made
by majority vote, just as with the repetition code — which is, of course,
the simplest Reed-Muller code, R(0, m). Since the scheme is related to
the geometric nature of these codes we describe it here. The scheme dates
from the very beginning of coding theory and is due to Reed [46]. It was
Reed’s algorithm that prompted the investigation of majority-logic decoding
and the rather peculiar definition of so-called Euclidean-geometry codes as
maximal cyclic subspaces of duals of the codes generated by certain flats in
AGm (F2s ). We begin by describing majority-logic decoding.
Let C be an arbitrary linear code contained in the ambient space F_q^n.
Recall that a parity check is simply a code vector in the orthogonal code,
C^⊥, and that the support of a vector in F_q^n is the set of coordinate positions
in which it has non-zero entries. Suppose we are given J parity checks and a
coordinate position, i say, such that the intersection of the supports of any
two of the given parity checks is precisely the singleton set {i}. If a received
vector has been perturbed by t or fewer errors during transmission, where
2t ≤ J, then, clearly, at least half of the parity checks will give zero (i.e.
check) when applied to the received vector unless the symbol at coordinate i
is in error. Moreover, had we normalized the given parity checks so that each
had a 1 at coordinate i, then, in the event that one of the t or fewer errors
had occurred at coordinate i, at least half of the parity checks would record
that error. Thus a majority “vote” of the values of the parity checks corrects
the entry at coordinate i. This is the essence of majority-logic decoding.
Such a collection of parity checks is said to be focused on i.² Note that
if each of the coordinates of C has a collection of J parity checks focused
on it, then the code will necessarily correct t or fewer errors — where again
2t ≤ J — and therefore C must have minimum weight at least 2t + 1.
Indeed, it is very easy to see that if there is a set of J parity checks focused
on a coordinate i, then any code vector with a non-zero entry at i must
have weight at least J + 1 in order to satisfy the J parity checks. Note also
that any code with a transitive automorphism group (and, in particular, a
cyclic code) will have minimum weight at least J + 1 provided that one, and
hence all coordinates, has a collection of J parity checks focused on it. In
the cyclic case majority-logic decoding, when it is available, is particularly
simple.
An instructive example is the dual C of a binary Hamming code, fre-
quently referred to as a simplex code. It has the classical Steiner triple
system, namely the lines of the projective geometry, among its parity checks
and the pencil of lines through a point of the geometry fulfills the require-
ments for a collection of parity checks focused on the given point; if the
Hamming code is of block length 2^m − 1, then J = 2^{m−1} − 1. Indeed, C has
minimum weight 2^{m−1} and t = 2^{m−2} − 1 errors can be corrected. A simpler,
but still important, case is the following:
To use majority logic to correct many errors one must have many parity
checks focused on each coordinate, which entails that the minimum weight
d^⊥ of C^⊥ be small; in fact, in order to have J parity checks focused on a given
coordinate we must have (d^⊥ − 1)J ≤ n − 1. The examples above have the
² Unfortunately the term “orthogonal on i” is the terminology of most of the coding
literature. Blahut, recognizing the problem with that terminology, used “concurrent on
i” in [7], but that does not seem to have been adopted. We here make another attempt
at change.
smallest possible minimum weights for their duals and allow error-correction
via majority logic to correct up to the full error-correcting capacity of the
code. In the coding literature such codes are said to be “completely orthog-
onalizable”.
Consider next the Reed-Muller code C = R(r, m) where r < m. A
basis for C is the set of monomials of degree r or less. The idea of Reed’s
decoding scheme is to determine first the “information bits” corresponding
to monomials of degree r, thus reducing the problem to decoding in the Reed-
Muller code R(r − 1, m). Let K be a subset of {1, 2, . . . , m} of cardinality
r and let L be the complement of K. Now the monomial of degree r,
$$\prod_{k \in K} x_k,$$
$$x_l = 1, \quad l \in L
were not in the dual of the code in question: the r-flat T above has a
characteristic function in R(r − 1, m)^⊥ but not in R(r, m)^⊥. However, if
T′ is any translate of T then T ∪ T′ is an (r + 1)-flat whose characteristic
function is in R(r, m)^⊥. Moreover, the 2^{m−r} − 1 flats of this form (namely,
T ∪ T′, where T′ is a distinct translate of T) are focused on T in the sense
that the intersection of the supports of any two of them is precisely the
set T . Thus, provided sufficiently few errors were made in transmission, a
majority vote using these parity checks will give the sum of the error bits
contained in the coordinate positions corresponding to the flat T . Such a
“divide and conquer” technique using majority or threshold circuitry was
thoroughly investigated early in the history of coding theory and was the
subject of Massey’s thesis, [41]. For a fuller discussion of the decoding of
Reed-Muller and Generalized Reed-Muller codes and L-step majority-logic
decoding the reader may wish to consult [40] or a textbook on error control,
for example, [7] or [36].
simply the one that assigns xg to the element g of G. Addition and scalar
multiplication are component-wise and the multiplication is given by the
addition in G. Thus,
$$\sum_{g\in G} x_g X^g + \sum_{g\in G} y_g X^g = \sum_{g\in G}(x_g + y_g)X^g$$
and, for c ∈ F,
$$c\Big(\sum_{g\in G} x_g X^g\Big) = \sum_{g\in G}(c\,x_g)X^g;$$
restricts oneself to p-groups, a result of Faldum's [20] shows that one might as well restrict
oneself to the elementary abelian case — as far as producing “good” codes is concerned.
⁴ That is, G is an abelian group all of whose non-identity elements have order p or, in
other words, a vector space over the field F_p. Since the group operation is being written
additively, “order p” means simply that pg = 0 for every g ∈ G.
⁵ The paradigm is the ordinary polynomial ring, where the monoid in question is the set
of non-negative integers under addition.
4 THE GROUP-ALGEBRA APPROACH
The map $\sum_{g\in G} x_g X^g \mapsto \sum_{g\in G} x_g$ is clearly a linear transformation of the vector space structure of R
onto F; moreover, it is an algebra homomorphism — as one can easily check
onto F ; moreover, it is an algebra homomorphism — as one can easily check
from the multiplication formula. We denote the kernel of this augmentation
map by M ; it is, of course, an ideal of R, but much more is true: since we
are in characteristic p we have the Frobenius homomorphism, a 7→ ap , at
our disposal and the fact that G is an elementary abelian p-group gives
$$\Big(\sum_{g\in G} x_g X^g\Big)^p = \sum_{g\in G} x_g^p X^0 = \Big(\sum_{g\in G} x_g^p\Big) X^0 = \Big(\sum_{g\in G} x_g\Big)^p X^0,$$
which shows that every element not in M is invertible in R and hence that
M is the unique maximal ideal of R.
In the binary case, with the interpretation suggested above, M is the
Reed-Muller code R(m − 1, m); we shall shortly see that the powers of the
ideal M give precisely the Reed-Muller codes.
Observe that in our present notation the characteristic function of a sub-
set S of G is given by the element $\sum_{g\in S} X^g$ of the group algebra. Consider
Since g + h runs through ⟨S′⟩ as g does for every h ∈ ⟨g_0⟩, this latter sum is
$$\sum_{h\in\langle g_0\rangle}\sum_{g\in\langle S'\rangle} X^g = p \sum_{g\in\langle S'\rangle} X^g = 0$$
and we have the result. □
Now since the ideal M is generated linearly over the field F by the
elements X^g − 1, the ideal M^r is generated linearly by elements of the
form $\prod_{g\in S}(X^g - 1)$ where S is a subset of G of cardinality r. Moreover,
B of G.
as one can easily check. Moreover, in the basis given by the X^g, the coding-
theory basis we have chosen, such an automorphism is weight preserving
— i.e. it is also an isometry preserving the Hamming metric. If σ is any
In the early history of coding theory there was great interest in deciding
which extended cyclic codes were “affine invariant” and Kasami, Lin and
Peterson [29] settled the question. Because of the historic interest and the
motivation it will provide for the rest of this chapter, we discuss and prove
their result. We are here, as in all of this section, following Charpin [14].
First of all it must be emphasized that “affine invariant” refers not to
the group of isometries discussed above but to a smaller group; a more
precise name would be “translation-invariant extended cyclic codes”. The
point is that one does not demand invariance under the group AGL(G), but
only under the subgroup AGL1 (Fq ), where now we are viewing G as the
field Fq . There are many more codes invariant under this smaller group,
even if one insists that the codes be self-dual: see, for example, [17] where
all binary, affine-invariant self-dual codes of block length at most 512 have
been found and where evidence is presented to suggest that the number goes
to infinity with the admissible block length. On the other hand, the only
binary codes invariant under the larger group are the Reed-Muller codes
(see Theorem 4.17 below).
We shall see in a moment how to extend the cyclic codes in question so
that they will lie in the ideal M , but let us note first that a linear subspace
of R invariant under translation is simply an ideal of R. Our aim, therefore,
is to characterize those ideals invariant under the isometric automorphisms
given by X g 7→ X ug where u is a non-zero field element and where we have
identified G with Fq .
and, indeed, any linear subspace of R invariant under this map comes from
a cyclic code. Since α is a generator of Fq× the image is invariant under the
maps given by X g 7→ X ug for all non-zero u ∈ Fq . We have, by our choice
of α, embedded all cyclic codes over F in M .
Consider next the following F-linear maps φ_s of R into the space G = F_q:
$$\varphi_s\Big(\sum_{g\in G} x_g X^g\Big) = \sum_{g\in G} x_g\, g^s,$$
where 0 ≤ s < n. With the proviso that 0^0 = 1, the map φ_0 is simply the
augmentation map with kernel M. Observe that if i is in the defining set of
a cyclic code C, then the image, Ĉ, of C has the property that φ_i(c) = 0 for
all c ∈ Ĉ. Moreover, for any embedded cyclic code, φ_0(c) = 0 for all c ∈ Ĉ.
Thus, we “extend” T to T̂ = T ∪ {0} and note that the image of the cyclic
code is defined by T̂ in the sense that c ∈ Ĉ if and only if φ_i(c) = 0 for all
i ∈ T̂. Hence we abuse the terminology and refer to T̂ as the defining set
of Ĉ.
Unlike φ0 , φs is not an algebra homomorphism for s > 0. It does,
however, have an important multiplicative property which we now explain.
Let N = {0, 1, . . . , n} and define a partial order ⪯ on N by k ⪯ l if and only
if k_ν ≤ l_ν for all ν, where $k = \sum_{\nu=0}^{m-1} k_\nu p^\nu$ and $l = \sum_{\nu=0}^{m-1} l_\nu p^\nu$ are the p-ary
expansions of k and l. We give k ≺ l the obvious meaning: k ⪯ l but k ≠ l.
Then
for all g ∈ G^×, and unless φ_i(x) = 0 for all i ⪯ s we would have a
non-zero polynomial, $\sum_{i\preceq s}\binom{s}{i}\varphi_i(x)Z^{s-i}$, of degree less than n with n roots,
Proposition 4.8 The defining set of the ideal M^t in F_2[G], where G is the
elementary abelian 2-group of order 2^m, is that subset of {0, 1, . . . , 2^m − 2}
whose elements have binary expansions containing fewer than t entries equal
to 1. That is, the defining set is {i | 0 ≤ i < 2^m − 1 and wt_2(i) < t}.
$$= \varphi_s(x)\,\varphi_0(X^g - 1) + \sum_{i\prec s}\binom{s}{i}\varphi_i(x)\,\varphi_{s-i}(X^g - 1).$$
But the first summand on the right side is 0 since X^g − 1 ∈ M and the
second summand is 0 since i ≺ s implies wt_2(i) < wt_2(s) = t. Since we
know the dimension of the ideal, we must have precisely the defining set. □
The above proof is due to Charpin [13]; observe that it does not depend
on the fact that we are in characteristic 2 and, therefore, proves more. We
have, in fact, proved the following
In the event that the field F is not a subfield of G one must take an overfield
of both in order to have a target for the functions φ_s, but this does not affect
the proof.
Of course M^t is an extended cyclic code invariant under translation,
but since we have not yet computed its dimension, we cannot assert that
we have its defining set — as we did in the binary case. The proposition
does, however, show that the dimension is at most equal to |{i | 0 ≤ i <
p^m − 1, wt_p(i) ≥ t}|, since we are in the presence of an extended cyclic code
— which means that dim_F(M^t) = p^m − |T̂|, where T̂ is the defining set
of M^t. We will soon exhibit linearly independent elements that will give
us not only this dimension but also the so-called “Jennings⁸ basis” of the
algebra F[G].
Let {g_0, g_1, . . . , g_{m−1}} be a basis of the F_p-space G. For any
$k = \sum_{\nu=0}^{m-1} k_\nu p^\nu$, where 0 ≤ k_ν < p for all ν, set $J_k = \prod_{\nu=0}^{m-1}\big(X^{g_\nu} - 1\big)^{k_\nu}$. Clearly
J_k ∈ M^t whenever wt_p(k) ≥ t. Moreover, these elements are linearly in-
dependent over F, where F is any field of characteristic p. For suppose
$\sum_{\mathrm{wt}_p(k)\geq t} a_k J_k = 0$, where all a_k ∈ F. Choose j such that wt_p(j) is minimal
Such a basis for F [G] was exploited by Jennings [28] and is called a
Jennings basis of the group algebra. It is, as the construction shows, in-
dependent of the coefficient field of the modular algebra and simultaneously
exhibits bases for all powers of the radical.
We note here that the index of nilpotency of the radical is 1 + m(p − 1);
i.e.
$$M^{1+m(p-1)} = 0$$
but M^k ≠ 0 for any smaller k. Just as in the binary case, M^{m(p−1)} is the
repetition code generated by $\prod_{\nu=0}^{m-1}\big(X^{g_\nu}-1\big)^{p-1} = \sum_{g\in G} X^g = \mathbf{1}$; it is the
minimal ideal of R, which means that it is contained in every non-zero ideal
of R, a fact that is easily seen using the Jennings basis.
Proof: The dimension is |{k | 0 ≤ k < p^m, wt_p(k) ≥ t}|, of course, but
taking the set of complements, (p^m − 1) − k, gives the above description
— which is sometimes more useful. As for the minimum weight, the BCH
bound implies that the minimum weight is at least as announced, since
$k = \sum_{\nu=0}^{a-1}(p-1)p^\nu + bp^a = (b+1)p^a - 1$ is the smallest integer with wt_p(k) = t.
On the other hand, $(X^{g_0}-1)^b \prod_{\nu=1}^{a}(X^{g_\nu}-1)^{p-1}$ yields a vector of the given
Example 4.13 There is a simple formula, easily derived, for the dimension
of M (m−1)(p−1) , since it is the number of ways of selecting at most p − 1
objects — repetitions allowed — from a set of m objects. One has then (cf.
Example 5.6) that
$$\dim\big(M^{(m-1)(p-1)}\big) = \binom{m+p-1}{m}$$
and that among the minimum-weight vectors one finds the characteristic
functions of flats of codimension 1. As we shall see it is the code over Fp of
this affine design.
$$x = \sum_{g\in G} x_g X^g \;\mapsto\; \sum_{g\in G} x_g X^{-g} = \sum_{g\in G} x_{-g} X^g = \bar{x}$$
The ideals we are concerned with are invariant under all the isometric au-
tomorphisms and, in particular, under the canonical automorphism. Hence
we have
Corollary 4.15 If an ideal I is invariant under the canonical automor-
phism, i.e. if Ī = I, then Ann(I) = I^⊥. In particular, in the group algebra
R we have that
$$\mathrm{Ann}(M^t) = (M^t)^\perp = M^{m(p-1)+1-t}.$$
Example 4.16 Taking t = m(p − 1), we have that M = (Fp 1)⊥ and, in
particular, is of codimension 1 in R, a fact that has emerged in various ways
during our discussion of the group-algebra approach.
Remark: Observe that even if an ideal is not invariant under the canonical
automorphism, the proposition shows that its annihilator and its orthogonal
are equivalent codes.
$$X^{g+h} - 1 = (X^g - 1) + (X^h - 1) + (X^g - 1)(X^h - 1)$$
when σ is given by the matrix (a_{ij}). Of course, since the σ are algebra
homomorphisms, we have that
$$\sigma\Big(\prod_{i=1}^{m} x_i^{k_i}\Big) = \prod_{i=1}^{m}(\sigma x_i)^{k_i}.$$
element left fixed by all elements of S.¹⁰ Letting S be this p-Sylow subgroup
we investigate the action of S on M^t/M^{t+1}, which we also denote by V^t.
One first shows by induction on m that every non-zero S-submodule,
W, of V^t contains the image of the element $w_t = x_1^{p-1}\cdots x_a^{p-1}x_{a+1}^{b}$, where
t = a(p − 1) + b with 0 ≤ b < p − 1. For m = 1 this is obvious since, in
this case, modulo M^{t+1}, M^t is generated over F_p by x_1^t. Let m > 1 and
let w be an element of M^t not in M^{t+1} that represents an element of the
submodule W fixed by all elements of S; write
$$w = \sum_{i=0}^{p-1} w_i\, x_m^i$$
for some v_j ∈ M^{t−j} ∩ ⟨x_1, . . . , x_{m−1}⟩ with v_{k−1} = w_{k−1} + k x_i w_k. On the
other hand σ_i w = w and hence v_{k−1} = w_{k−1}, yielding k x_i w_k = 0. But
k is a positive integer less than p and hence non-zero in F_p. So we have
that x_i w_k = 0 for 1 ≤ i < m and it follows that w_k is a scalar multiple of
$\prod_{i=1}^{m-1} x_i^{p-1}$, which entails a = m − 1 and b = k. But then w_i, for i < k, is
in M^{(m−1)(p−1)+k−i} ∩ ⟨x_1, . . . , x_{m−1}⟩ = 0 and w is a scalar multiple of the
sought w_t. Thus, we have the assertion.
Next, setting t′ = m(p − 1) − t, consider the bilinear map
$$\varphi : V^t \times V^{t'} \to V^{m(p-1)} \approx \mathbf{F}_p$$
given by φ(x, y) = xy. It is invariant under GL_m(F_p), i.e. φ(σx, σy) =
φ(x, y) for all σ. Moreover, the form is non-degenerate. Thus there is a
v_t ∈ V^t with φ(v_t, w_{t′}) = 1. It follows that
$$v_t = x_{m-a}^{b}\prod_{i=m+1-a}^{m} x_i^{p-1}.$$
¹⁰ In other words, over F_p a p-group has only the trivial irreducible representation.
Theorem 4.18 For any prime p, the code of the design of points and r-flats
of the affine geometry AG_m(F_p) is M^{r(p−1)}.
Corollary 4.19 The dimension of the code over F_p of the design of points
and (m − 1)-flats of AG_m(F_p) is
$$\binom{m+p-1}{m}.$$
Corollary 4.20 The dimension of the code over F_p of the design of points
and lines of AG_m(F_p) is
$$p^m - \binom{m+p-2}{m}.$$
5 GENERALIZED REED-MULLER CODES 42
5.2 Definitions
First we describe the so-called m-variable approach. This is entirely anal-
ogous to our approach to the Reed-Muller codes (which are, simply, the
Proof: Since a^{q−1} = 1 for any non-zero a ∈ E, 1 − (x_i − w_i)^{q−1} = 0 whenever
x_i ≠ w_i; thus the polynomial function on the right is clearly the same as the
characteristic function on the left. □
Proof: We use the fact that the number of ways of picking j objects from
a set of m objects — with repetitions allowed — is $\binom{j+m-1}{m-1} = \binom{j+m-1}{j}$.
An inclusion-exclusion argument shows that the inner sum is the number
of ways of picking i objects from a set of m objects, when no object can
be chosen more than q − 1 times. Summing on i yields the result. The
simplification to a single sum is due to Calkin. □
Proof: The result is clearly true for any constant function (since the block
length of the code is a multiple of, in fact a power of, p, the characteristic
of Fq ) so we need to prove the assertion only for monomial functions, i.e.
elements of M, of positive degree less than m(q − 1). Moreover, if in such
a monomial any ik = 0, the sum is again a multiple of q and hence 0. We
thus restrict ourselves to those monomials in which every x_i appears. The
orthogonality relations for the group E^× × · · · × E^×, using E itself as the
field where the characters take their values, yield immediately, taking η
as the principal character and χ the character sending a = (a_1, . . . , a_m) to
$a_1^{i_1}\cdots a_m^{i_m}$, that
$$\sum_{a} a_1^{i_1}\cdots a_m^{i_m} = 0$$
since there is some k for which i_k < q − 1, and since the sum need only be
taken over those vectors all of whose entries are non-zero. □
and now we need only check the dimensions: the involution of M that
sends $x_1^{i_1}\cdots x_m^{i_m}$ to $x_1^{q-1-i_1}\cdots x_m^{q-1-i_m}$ yields the fact that the number of
Theorem 5.9 For 0 ≤ ρ ≤ m(q − 1), the automorphism group of RF_q(ρ, m)
contains the affine group AGL_m(F_q) in its natural action on V = F_q^m.
γ : v 7→ Av + a,
Proof: Any (m − r)-flat in AGm (Fq ) consists of all the points x of V satis-
fying r independent equations
$$\sum_{j=1}^{m} a_{ij} X_j = w_i, \quad \text{for } i = 1, 2, \dots, r$$
j=1
where all aij and wi are in Fq . If the code RFq (ρ, m) contains the incidence
vector of some t-flat, then it will contain the incidence vector of every t-flat,
since the affine group AGLm (Fq ) acts transitively on t-flats and, as we have
just seen, preserves the code. So we need only construct one (m − r)-flat
that is in RFq (ρ, m).
Consider the polynomial
$$p(x_1, \dots, x_m) = \prod_{i=1}^{r}\big(1 - x_i^{q-1}\big),$$
whose codeword has the entry 1 at the points of the flat defined by
X_1 = 0, X_2 = 0, . . . , X_r = 0
and the entry 0 at points off the flat. Hence it is the incidence vector of this
(m − r)-flat. Since p(x1 , . . . , xm ) ∈ RFq (ρ, m) for ρ ≥ r(q − 1), we have the
result. 2
As in the binary case the proof shows more, namely that the generalized
Reed-Muller code contains all (m − s)-flats for 0 ≤ s ≤ r. Moreover, the
subcode generated by the (m−r)-flats contains, by the same induction argu-
ment used in the binary case, all (m − s)-flats for 0 ≤ s ≤ r. Note, however,
that when using characteristic functions of t-flats to obtain characteristic
functions of (t + 1)-flats one could use coefficients other than 1 provided
q > 2 and hence obtain vectors that are supported on the (t + 1)-flat but
are not characteristic functions.
Example 5.11 Take q = 3 and m = 2. The geometry is then AG2 (F3 ), the
affine plane of order 3. Let C = C3 (AG2 (F3 )) be the code over F3 associated
with this plane, i.e. the code generated by the incidence matrix of the plane.
The incidence vectors of the lines (1-flats) will be in RF_3(ρ, 2) for ρ = 2, 3
and 4. In fact C = RF_3(2, 2), while, as we know, RF_3(3, 2) = (F_3 𝟏)^⊥ and
RF_3(4, 2) = F_3^9, the entire ambient space.
Since it is easy to see ([2, Corollary 6.4.1]) that the code over Fp of
any affine plane of order p has as minimum-weight vectors only the scalar
multiples of the characteristic functions of lines of the plane, the above
proposition yields an elementary proof of the following
Corollary 5.16 Provided that ρ < m(q − 1), the generalized Reed-Muller
codes RFq (ρ, m) are extended cyclic codes and of the same dimension as the
corresponding cyclic codes RFq (ρ, m)∗ .
Proof: By Lemma 5.7, $f(0) = -\sum_{w\neq 0} f(w)$ provided that the degree of f
is less than m(q − 1). □
then, using the discrete Fourier transform and noting that 1/v is −1 when
viewed in K,
$$c_j = -\sum_{i=0}^{v-1} P(\omega^i)\,\omega^{-ji} = -\varphi_{v-j}(P), \qquad (4)$$
where φs is the function defined in Section 4.3. Then
$$P(Z) = \sum_{j=0}^{v-1} c_j Z^j = -\sum_{j=0}^{v-1} \varphi_{v-j}(P)\, Z^j,$$
The vector space L over the field E corresponds with the vector-space struc-
ture of the polynomial ring E[Y ]/(Y v − 1) via
$$\big(P(1), \dots, P(\omega^{v-1})\big) \mapsto \sum_{i=0}^{v-1} P(\omega^i)\, Y^i.$$
Note that if a positive integer u has an orbit of length i under the map
j ↦ jq modulo v, i.e. if uq^i ≡ u (mod v) and i is the smallest positive integer
satisfying the congruence, then the coefficient of Z^u must be chosen in a field
of degree i over E; this agrees, of course, with the dimensional requirements.
For the extended codes we adjoin the extra coordinate position corre-
sponding to 0 ∈ K, where the entry is $-\sum_{i=0}^{v-1} P(\omega^i)$. Since P(0) = c_0,
Equation (4) implies that all extended cyclic codes are contained in
where we are utilizing the Frobenius map of K[Z] into itself and slightly
abusing the trace notation. Following θ by the natural map
$$K[Z] \to K[Z]/(Z^{q^m} - Z),$$
using the standard representatives — namely polynomials in Z of degree
less than or equal to v — and viewing Z as Z + (Z^{q^m} − Z), we see that
(Tr_{K/E}(β_i Z))^q = Tr_{K/E}(β_i Z); hence we get the induced ring homomor-
phism,
$$\bar{\theta} : E[x_1, \dots, x_m]/(x_1^q - x_1, \dots, x_m^q - x_m) \to K[Z]/(Z^{q^m} - Z),$$
given by
$$Z \mapsto \sum_{i=1}^{m} x_i\,\omega^{i-1}.$$
Since $\big(\sum_{i=1}^{m} x_i\omega^{i-1}\big)^{q^m} = \sum_{i=1}^{m} x_i\omega^{i-1}$, we obtain a ring homomorphism
$$K[Z]/(Z^{q^m} - Z) \to K[x_1, \dots, x_m]/(x_1^q - x_1, \dots, x_m^q - x_m).$$
If P(Z)^q = P(Z), then the image of P(Z) must lie in E[x_1, . . . , x_m]/(x_1^q −
x_1, . . . , x_m^q − x_m), since this is the subring of K[x_1, . . . , x_m]/(x_1^q − x_1, . . . , x_m^q −
x_m) left fixed by the Frobenius map, x ↦ x^q. Let R denote the subring of
K[Z]/(Z^{q^m} − Z) left pointwise fixed by the Frobenius map; then we have a
ring homomorphism
and using the fact that Tr_{K/E}(x_i β) = x_i Tr_{K/E}(β) it follows easily that ψ ∘ θ̄
is the identity map. Moreover, since in K[Z]/(Z^{q^m} − Z) we have (Z^v)^k = Z^v
and $\sum_{j=0}^{v-1} c_j Z^j \in L$. Both rings have dimension q^m as E-algebras and hence
Theorem 5.19 For any prime p, and any ρ such that 0 ≤ ρ < m(p − 1), if
M is the radical of Fp [G], the group algebra over Fp of the elementary abelian
group G of order pm , then the code given by M m(p−1)−ρ is the generalized
Reed-Muller code RFp (ρ, m).
¹² The reduction modulo x_i^q − x_i can only reduce the degree of a given monomial and,
Corollary 5.20 For 0 ≤ ρ < m(q − 1) the code RF_q(ρ, m)^* is the cyclic
code with generator polynomial
$$g(Y) = \prod_{\substack{0 < u < q^m - 1\\ \mathrm{wt}_q(u)\, \le\, m(q-1)-1-\rho}} (Y - \omega^u),$$
Corollary 5.21 For 0 ≤ ρ < m(q − 1) the code (RF_q(ρ, m)^*)^⊥ is the cyclic
code with generator polynomial
$$g(Y) = \prod_{\substack{0 \le u < q^m - 1\\ \mathrm{wt}_q(u)\, \le\, \rho}} (Y - \omega^u),$$
Corollary 5.22 For 0 ≤ ρ < m(q − 1), the dimensions of both RF_q(ρ, m)^*
and RF_q(ρ, m) are given by
The extended code, RE (2, 2), has a generator matrix that is G augmented
by an extra column whose entries are −1’s: this is a generator matrix for
the code of the affine plane AG2 (F3 ). If the extra column, corresponding to
0, is labelled 0, and added as the first column, and the columns of G then
labelled 1 to 8, then the plane can be pictured as in Figure 1, with incidence
matrix as given in Figure 2, where the rows are arranged in parallel classes.
The columns then correspond to the points
(0, 0), (1, 0), (0, 1), (1, −1), (−1, −1), (−1, 0), (0, −1), (−1, 1), (1, 1);
0, 1, ω, ω^2, ω^3, ω^4, ω^5, ω^6, ω^7.
The matrix
$$\begin{pmatrix} 0 & 1\\ 1 & -1 \end{pmatrix}$$
cycles the last eight of these points and corresponds to multiplication by
ω. The line {0, 1, 5}, for example, has the equation X_2 = 0 and the line
{6, 5, 8} has the equation X_1 + X_2 + 1 = 0.
Figure 1: The affine plane AG_2(F_3)
5 GENERALIZED REED-MULLER CODES 57
Figure 2: the incidence matrix of AG_2(F_3), with columns indexed by the
points 0, 1, . . . , 8 and the twelve rows (the lines, each with three 1's)
arranged in parallel classes.
(Y − 1)f(Y) = Y^6 + Y^5 − Y^4 − Y^2 − Y + 1,
It follows that every integer u with 0 ≤ u < h satisfies wt_q(u) < m(q −
1) − ρ, and thus the elements ω^1, ω^2, . . . , ω^{h−1} are all roots of the generator
polynomial of the code. Thus RF_q(ρ, m)^* is a subcode of a BCH code of
designed distance (q − s)q^{m−r−1} − 1 as stated. □
The designed distance is the true minimum distance, as the following
theorem shows by an explicit construction of codewords of this weight.
Theorem 5.25 For any ρ such that 0 ≤ ρ < m(q − 1), where ρ = r(q − 1) + s
with 0 ≤ s < q − 1, RF_q(ρ, m) has vectors of weight (q − s)q^{m−r−1} that
consist of the sum of multiples of the incidence vectors of (q − s) parallel
(m − r − 1)-flats, all contained in an (m − r)-flat.
x_i = w_i, for i = 1, . . . , r, (6)
x_{r+1} ≠ w′_j for j = 1, . . . , s. (7)
There are (q − s)q^{m−r−1} vectors in E^m satisfying both conditions and the
codeword corresponding to p(x_1, . . . , x_m) has this weight.
To establish the geometric nature of the codewords defined by such poly-
nomials, consider the q^{m−r−1} points of E^m satisfying (6) and the additional
equation x_{r+1} = c, where c is an element of E that is not amongst the w′_j.
Then these points all belong to an (m − r − 1)-flat and the corresponding
coordinate positions in the codeword of p(x) have the constant value
$$\prod_{j=1}^{s}(c - w'_j).$$
Corollary 5.27 Let p be a prime. The code over F_p of the design of points
and r-flats of the affine geometry AG_m(F_{p^t}) has minimum weight p^{tr}.
Proof: Apply Theorem 5.25 with s = 0. Since the code of the design is
a subset of the generalized Reed-Muller code, it must have at least this
minimum weight, and since it has vectors of this weight, this must be the
minimum weight. 2
Corollary 5.28 Let p be a prime. The code over F_p generated by the dif-
ferences of the incidence vectors of two parallel r-flats of the affine geometry
AG_m(F_{p^t}) has minimum weight 2p^{tr}.
Since $\binom{a_i}{b} = 0$ for a_i < b, δ_i^b annihilates the monomial $x_1^{a_1}x_2^{a_2}\cdots x_m^{a_m}$
unless b ≤ a_i; similarly ε_{i,j}^b annihilates $x_1^{a_1}x_2^{a_2}\cdots x_m^{a_m}$ unless b ≤ a_i. Both δ_i^0
and ε_{i,j}^0 are the identity on K^V.
independent of x_i. Then
$$(f)\tau_i^u = \sum_j p_j (x_i + u)^j = \sum_j p_j \sum_b \binom{j}{b} x_i^{j-b} u^b = \sum_b u^b \sum_j \binom{j}{b} p_j\, x_i^{j-b} = \sum_b u^b\, (f)\delta_i^b,$$
so that $\delta_i^b = -\sum_{u\in E^\times} u^{-b}\,\tau_i^u$ for b ≠ 0, q − 1, and $\delta_i^{q-1} = -\delta_i^0 - \sum_{u\in E^\times}\tau_i^u$.
Thus each translation is a linear combination of the δ_i^b over K, and con-
versely, giving the theorem. □
Proof: In view of Theorem 5.29, since SL_m(F_q) is generated by the transvec-
tions γ_{i,j}^u, we need only show that each of these is a linear combination over
E of the ε_{i,j}^b, and conversely.
Any f ∈ K^V can be written in the form
$$f = \sum_{r,s} p_{r,s}\, x_i^r x_j^s$$
Thus $\gamma_{i,j}^u = \sum_b u^b\, \varepsilon_{i,j}^b$, and we can invert this formula to obtain the converse
exactly as in the proof of Theorem 5.29. □
2. C is spanned by monomials.
$$\sum_{u\in E^\times} \cdots \;=\; \begin{cases} x_1^{a_1} x_2^{a_2}\cdots x_m^{a_m} & \text{if } k \equiv a_i \pmod{q-1}\\ 0 & \text{otherwise.} \end{cases}$$
contains fewer terms than f and is still in C. This contradicts the choice
of f as a function none of whose terms lies in C with a minimal number of
terms. This contradiction gives the theorem. □
Now take q = p a prime, so that K is any field of characteristic p.
Proof: We prove this recursively. Let $g = x_1^{a_1}\cdots x_m^{a_m}$ and $h = x_1^{b_1}\cdots x_m^{b_m}$
be two monomials with a_1 + · · · + a_m = b_1 + · · · + b_m. Suppose that after a
change of variables (if necessary) we have
Clearly
    a_r − b_r ≤ (b_s − a_s) + ⋯ + (b_m − a_m)
    a_r − b_r = c_s + ⋯ + c_m

and 0 ≤ c_j ≤ b_j − a_j. Thus
    e = (g) ε_{r,s}^{c_s} ⋯ ε_{r,m}^{c_m}
      = u x_1^{b_1} ⋯ x_{r−1}^{b_{r−1}} x_r^{a_r − c_s − ⋯ − c_m} x_{r+1}^{a_{r+1}} ⋯ x_{s−1}^{a_{s−1}} x_s^{a_s + c_s} ⋯ x_m^{a_m + c_m}
      = u x_1^{b_1} ⋯ x_r^{b_r} x_{r+1}^{a_{r+1}} ⋯ x_{s−1}^{a_{s−1}} x_s^{a_s + c_s} ⋯ x_m^{a_m + c_m}
Now suppose that (m−1)(p−1) < k < m(p−1). Since K_k^⊥ = K_{m(p−1)−k−1} and

    K_{k−1} ⊂ C ⊆ K_k,

we have

    K_{m(p−1)−k−1} ⊆ C^⊥ ⊂ K_{m(p−1)−k}.
Since from the above inequality we have that
Corollary 5.34 With the natural action of AGL_m(F_p) on a vector space V
of dimension m over F_p, where p is a prime, the only subspaces of F_p^V left
invariant by AGL_m(F_p) are the generalized Reed-Muller codes RF_p(ρ, m).
and we will take n to be the length of the geometric codes we will consider.
Since, for any integer j, j ≡ wt_q(j) (mod q−1), in the isomorphism
we have given between R and E[x_1, …, x_m]/(x_1^q − x_1, …, x_m^q − x_m), the
Here P(Z) is defined in Equation (15) and L_proj is defined above. The r-th
order projective generalized Reed-Muller code is also given by

    ⟨ x_1^{i_1} x_2^{i_2} ⋯ x_m^{i_m} | Σ_{k=1}^{m} i_k ≡ 0 (mod (q−1)), Σ_{k=1}^{m} i_k ≤ r(q−1) ⟩,

Here the weight wt_q(j) is defined in Definition 4.7, on page 33.
Example 5.37 To construct the code PF_3(1, 3), of length 13 and dimension
7, the multi-variable formulation is the easiest to give, since the generating
monomials are readily seen to be

    {1, x_1 x_2, x_1 x_3, x_2 x_3, x_1^2, x_2^2, x_3^2}.
If the irreducible cubic X^3 − X^2 + 1, with root ω, is used to obtain F_27, then
the matrix

    ( 0  0  −1 )
    ( 1  0   0 )
    ( 0  1   1 )
Since our projective codes are cyclic we can use the roots to obtain the
orthogonal in the usual way. The code orthogonal to PFq (r, m) is obtained
as follows:
Remark: In Example 5.37, r = 1 and the orthogonal is PF_3(1, 3) ∩ (F_3 1)^⊥,
as expected.
Proof: Since, by Corollary 5.26, the minimum weight of RF_q((m − r)(q −
1), m)^∗ is q^r − 1, the minimum weight is at least (q^r − 1)/(q − 1). But the
polynomial

    p(x_1, …, x_m) = Π_{i=1}^{m−r} (1 − x_i^{q−1})
is such that each of its monomials has degree divisible by q − 1 and yields a
code vector. Since it takes the value 1 at 0, it obviously yields a vector of
weight (q^r − 1)/(q − 1) in PF_q(m − r, m). □
The polynomial above that yields a minimum-weight vector is, in fact,
the incidence vector of an r-dimensional subspace of F_q^m. Thus projectively
it is an (r − 1)-dimensional subspace of PG_{m−1}(F_q).
Proof: Let Q, A and P be the sets of integers whose cardinalities give the
dimensions of PF_p(r−1, m), RF_p(r(p−1), m) and PF_p(r, m+1), respectively.
We must show that |Q| + |A| = |P|. Now Q is the set of integers u satisfying
0 ≤ u ≤ p^m − 1, where (p−1) divides u, and wt_p(u) ≤ (r−1)(p−1). Similarly,
A is the set of integers satisfying 0 ≤ u ≤ p^m − 1 and wt_p(u) ≤ r(p−1),
while P is the set of integers satisfying 0 ≤ u ≤ p^{m+1} − 1, p−1 divides u, and
wt_p(u) ≤ r(p−1).
14 The general case will be treated in the following section.
Divide P into the following two disjoint sets: Q0, the set of those integers
in P whose p-ary expansion has u_m = p − 1, and A0, those integers in P
whose p-ary expansion has u_m < p − 1. For u ∈ P, where u = u_0 + ⋯ + u_m p^m,
set f(u) = u − u_m p^m. The reader will have no difficulty in seeing that f
yields a one-to-one correspondence between Q0 and Q and between A0 and
A. □
We now use an embedding of P Gm−1 (Fp ) in P Gm (Fp ) just as we did in
the Reed-Muller case; here we know that the code of the projective design of
r-dimensional subspaces projects onto RFp ((m − r)(p − 1), m), which is the
code of the design of r-dimensional flats of AGm (Fp ), and that the kernel
contains the code of r-dimensional subspaces of P Gm−1 (Fp ). An induction
on m now yields the dimensional equality we seek and hence the following
Theorem 5.41 For p a prime, the code over Fp of the design of points
and r-dimensional subspaces of P Gm (Fp ) is the projective generalized Reed-
Muller code PFp (m − r, m + 1). Moreover, we have the following exact se-
quence for these codes:
Remark: The sequence above can also be read as an exact sequence of the
geometric codes and, as such, is
Proof: We know that the scalar multiples of the r-flats are minimum-weight
vectors. Moreover, if any minimum-weight vector has as its support an r-
flat, then it clearly must be a scalar multiple of the characteristic function
of that flat. Suppose, therefore, that we have a minimum-weight vector v
whose support, X say, is not an r-flat. Of course, |X| = pr . Without loss of
generality we may assume that X contains the zero vector. Now, since we
are over a prime field, the set X cannot be closed under addition (for then
it would be a subspace). Let x ∈ X be such that x + X 6= X. Now the
vector v is a linear combination of characteristic functions of r-flats, i.e.
    v = Σ_{S∈S} a_S v^S,
Since v−w is in the code generated by the differences of the incidence vectors
of parallel r-flats, it has, by Corollary 5.28, weight at least 2pr . But this is
impossible since its support is a subset of X ∪ (x + X), which is of cardinality
less than 2p^r since x ∈ X ∩ (x + X). Thus every minimum-weight vector
is supported on an r-flat and hence is a scalar multiple of the characteristic
function of an r-flat. □
We complete our discussion of the geometric codes in the case in which
q = p is a prime by showing that we have the analogous result in the
projective case. In order to do so we first proceed more generally with q
arbitrary and introduce some temporary notation.
Let Ar,m denote the design of points and r-flats of AGm (Fq ) and Pr,m
denote the design of points and r-dimensional subspaces of P Gm (Fq ).
Consider next the subcode Er,m of Cp (Ar,m ) generated by the differ-
ences of incidence vectors of parallel r-flats. Just as in the binary case (see
Section 3.2) Er,m is in the kernel of the projection of Cp (Pr,m ) onto the
coordinates corresponding to the embedded (m − 1)-dimensional projective
space, the image of the projection being Cp (Pr−1,m−1 ). Observe that by
using the (q − 1)-to-1 map of V − {0} onto P Gm−1 (Fq ), where V is the m-
dimensional vector space over Fq defining the projective space, we can pull
the code C_p(P_{r−1,m−1}) back to F_p^{V^∗}, where we are writing V^∗ for V − {0};
this simply amounts to repeating each column (q − 1) times. By adjoining
an overall parity check to this pull-back we get the code in F_p^V that is
generated by the incidence vectors of the r-dimensional subspaces of V. Call
this code Pr,m . Viewing Cp (Ar,m ) and Er,m in this same ambient space we
have, clearly, that
Er,m + Pr,m = Cp (Ar,m ).
This equation points to the reason why the binary case is so easy: when
q = 2, Er,m ⊆ Pr,m and thus we need analyse only the projective geometry
codes.
For the same reason as in the binary case, Pr+1,m ⊆ Pr,m and, further-
more, Pr+1,m ⊆ Er,m since, if T is any (r + 1)-dimensional subspace, S any
r-dimensional subspace contained in it, and v is in T but not in S, then
    −v^T = Σ_{a∈F_q, a≠0} (v^S − v^{av+S}).
Letting ar,m be the p-rank of Ar,m , pr,m the p-rank of Pr,m and setting
er,m = dim(Er,m ), we have that dim(Pr,m ) = pr−1,m−1 and that
since the intersection, Pr,m ∩ Er,m , contains Pr+1,m . Further, we have the
following:
and
0 → Er,m → Cp (Pr,m ) → Cp (Pr−1,m−1 ) → 0,
that arise from the embedding is exact, then so is the second and, moreover,
in that case we have
Proof: Clearly, just as in the binary case, the sequences follow easily from
the embedding, and we need only check that the kernels are as described.
That the codes are contained in the kernels is obvious; thus in order to prove
that they are the kernels we must check the dimensions. From the discussion
preceding the lemma, in particular (16), and the second sequence, we have
that
    p_{r,m−1} + a_{r,m} ≤ p_{r−1,m−1} + e_{r,m} ≤ p_{r,m}

and the result follows since, if the first sequence is exact, p_{r,m−1} + a_{r,m} =
p_{r,m}. □
With this machinery in place one can now, for any prime p, imitate the
proof for the binary case: see Theorem 3.13. Note that here we need only
identify the vectors in the projective codes since we have already determined
the minimum-weight vectors in the affine case. We leave to the reader the
proof of the following
Definition 5.45 Let C be a linear code over a field E and let F be a subfield
of E. The set C 0 of vectors in C, all of whose coordinates lie in F , is called
the subfield subcode of C over F .
It is easy to verify that C 0 is a linear code over F and that any per-
mutation of the coordinate positions preserving C also preserves C 0 . We
are interested here only in the case where F = F_p, the prime subfield of
E = F_q. In what follows q = p^t.
Definition 5.46 Denote by AFq /Fp (ρ, m) the subfield subcode of the gener-
alized Reed-Muller code RFq (ρ, m) and by PFq /Fp (r, m) the subfield subcode
of the projective generalized Reed-Muller code PFq (r, m).
(where up^j is taken reduced modulo q^m − 1, for the same reasons as before),
we have that AF_q/F_p(ρ, m) is given by

    { (P(0), …, P(ω^{v−1})) | P(Z) = Σ_{u∈V_ρ} c_u Z^u, c_u ∈ F_{q^m}, c_{up} = (c_u)^p }

and that

    dim(AF_q/F_p(ρ, m)) = |V_ρ|.
Clearly, by Theorem 5.10, AF_q/F_p(ρ, m) will contain the incidence vector of
any (m−r)-flat when ρ ≥ r(q−1). Its minimum weight d_ρ is bounded by

    d_ρ^⊥ ≤ d_µ ≤ q^{r+1},    (17)

from the above discussion. A lower bound for d_ρ^⊥ follows from the BCH
bound, and some evaluations of these are quoted in Delsarte et al. [18,
Theorem 4.3.1]. In particular, for ρ = r(q−1) this gives

    d_{r(q−1)}^⊥ ≥ (p + q) q^{r−1},
    d_{r(q−1)+(q−2)}^⊥ ≥ q^{r+1},

    d_{r(q−1)+(q−2)}^⊥ = q^{r+1}.
For the codes of designs arising from projective geometries, we must take
the subfield subcodes of the codes PFq (m − r, m + 1). As we have already
indicated the minimum weight of this code is q r + q r−1 + · · · + 1 and the
incidence vectors of the projective subspaces of dimension r are minimum-
weight vectors.
Our interest is in the codes given by the designs of r-flats of the affine
spaces and r-dimensional subspaces of the projective spaces. Just as in the
binary case we must first analyse the codimension 1 case — in the projective
case the design of points and hyperplanes of a projective space. This case
was, historically, the one given the most attention and was introduced for
projective planes by Prange with Rudolph considerably enriching the sub-
ject and making serious conjectures. The first systematic treatment in the
case of planes was given by Graham and MacWilliams [22]. These results
were generalized to higher dimensions by Goethals and Delsarte [21] and
MacWilliams and Mann [39]; in particular, these authors computed the di-
mension of the code of the design of points and hyperplanes of an arbitrary
projective space. The results are highly diverse and some of the proofs very
technical: thus we only state what we need and show the reader how to
construct these codes, using results of Delsarte et al. [18]. More recently,
Rose [47] has given elegant new proofs of some of the results and Brouwer
and Wilbrink [10, Theorem 4.8] have given a simple method to compute the
p-ranks of the codes in question. We refer the reader also to a fuller account
in Assmus and Key [2].
We thus now simply state the general theorem. Notice that everything
stated is true also for p = 2, but that Theorem 3.14 gives more precise
information in that case.
(1) The code over Fp of the design of points and r-flats in the affine geom-
etry AGm (Fq ), is AFq /Fp ((m − r)(q − 1), m). It has minimum weight
• 0 ≤ u ≤ q^m − 1
• wt_q(up^j) ≤ (m − r)(q − 1), j = 0, 1, …, t − 1
and
This latter code has minimum weight 2q^{m−r}, with minimum-weight vectors
the multiples of the difference of the incidence vectors of two parallel
(m − r)-flats. The minimum weight, d_{(m−r)(q−1)}^⊥, of the orthogonal
code satisfies

    (q + p) q^{m−r−1} ≤ d_{(m−r)(q−1)}^⊥ ≤ 2q^{m−r}.
(2) The code over F_p of the design of points and r-dimensional subspaces
of the projective geometry PG_m(F_q) is PF_q/F_p(m − r, m + 1). It has
minimum weight (q^{r+1} − 1)/(q − 1) and the minimum-weight vectors
are the multiples of the incidence vectors of the blocks. The p-rank is
given by the cardinality of the set of integers u satisfying
• 0 ≤ u ≤ q^{m+1} − 1
• (q − 1) divides u
• wt_q(up^j) ≤ (m − r)(q − 1), j = 0, 1, …, t − 1
i.e.

    [pk] = { pk mod (q − 1)   if k < q − 1
           { q − 1            if k = q − 1.
Further, write [k] = k.
Theorem 5.48 For any ρ such that 0 ≤ ρ ≤ m(q − 1), the code
ω^2 x_1^2, ωx_2 + ω^2 x_2^2, x_1 x_2^2 + x_1^2 x_2, ωx_1 x_2^2 + ω^2 x_1^2 x_2}. A generator matrix from
these polynomials can be constructed, and the entries are all, of course, in
F_2. For example, if K = F_16 is constructed from E using the primitive
polynomial X^2 + ωX + ω and a is a root of this, then, ordering the vectors
of E^2 in the usual way, i.e. 0, 1, a, a^2, …, a^14, the codeword obtained
from the polynomial ωx_2 + ω^2 x_2^2 is

    (0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0).
The codes PF_q/F_p(r, m) can be constructed in a manner analogous to the
primitive case as in Theorem 5.48. With the added condition that (q − 1)
divides Σ_i l_i, the codewords are given by the first n = (q^m − 1)/(q − 1)
(0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0).
This is the vector v^L − v^M where L and M are lines, L = {3, 4, 7, 17, 19} =
(1, 1, 0)^t, and M = {3, 5, 10, 11, 14} = (1, ω, 0)^t, where the points are labelled
1, 2, …, 21 in the order given.^{15}
Note: The designs formed from affine or projective geometries may happen
to have orders divisible by primes other than the characteristic prime for the
geometry. The codes for such primes will not be of any interest — a result
that follows from work of Mortimer [45] on the modular representations of
doubly-transitive groups.
15 Computations here and elsewhere were with Cayley [9] and Magma [11].
Result 5.51 (Hamada) Let q = p^t and let D denote the design of points
and r-dimensional subspaces of the projective geometry PG_m(F_q), where
0 < r < m. Then the p-rank of D is given by

    Σ_{s_0} ⋯ Σ_{s_{t−1}}  Π_{j=0}^{t−1}  Σ_{i=0}^{L(s_{j+1}, s_j)} (−1)^i C(m+1, i) C(m + s_{j+1}p − s_j − ip, m),

where s_t = s_0 and

    L(s_{j+1}, s_j) = ⌊(s_{j+1}p − s_j)/p⌋,

i.e. the greatest integer not exceeding (s_{j+1}p − s_j)/p.
These simple derivations for the case q = p a prime highlight the difficulties
of the prime-power case.
As we have seen in Corollary 4.20:
Result 5.55 The p-rank of the design of points and lines of the affine
geometry AG_m(F_p), where p is a prime, is

    p^m − C(m+p−2, m).
Result 5.56 The 3-rank of the Steiner triple system of points and lines of
AG_m(F_3) is 3^m − 1 − m.
Another particular case that gave a bound for the p-rank of a translation
plane is the following from Key and Mackenzie [33]:
Result 5.57 If D is the design of points and m-flats in AG_{2m}(F_p), where
p is a prime, then the p-rank of D is given by

    dim RF_p(m(p−1), 2m) = rank_p(D) = Σ_{i=0}^{m−1} (−1)^i C(2m, i) C(m + (m−i)p, 2m).
There are also other cases where simpler arguments give the p-rank and
even a basis in terms of incidence vectors of the geometric objects involved.
For example, Bagchi and Sastry [4] have produced a simple derivation of the
dimension of the binary code of the design of points and planes in PG_3(F_{2^t})
by finding a set of planes whose incidence vectors form a basis:

Result 5.58 (Bagchi and Sastry) Let D be the design of points and
planes in PG_3(F_{2^t}) and let O be an ovoid in PG_3(F_{2^t}). Then the incidence
vectors of the tangent planes to the ovoid form a basis for C_2(D). It
follows that dim(C_2(D)) = 2^{2t} + 1.
any p of the p + 1 parallel classes and taking one line from the first, two
from the second, etc., even the choices of the lines being made arbitrarily;
the basis consists of the incidence vectors of the selected lines.
A conjecture of Hamada (see [25, 26]) that the p-rank of the design of
points and r-dimensional flats of a finite-geometry design over a field of char-
acteristic p is always the smallest for designs with the same parameters and
also characterizes such designs, is false in general, the counter-examples first
occurring for 2-(31,7,7) designs: see Tonchev [51] and Delsarte and Goethals
[21]. However, the minimality of the p-rank still appears to be true, and the
conjecture still stands for designs of points and hyperplanes and also for de-
signs of points and lines; moreover, when p = q = 2, this limited conjecture
is valid.
References
[1] E. Artin. Geometric Algebra. New York: Wiley Interscience, 1957.
[2] E. F. Assmus, Jr. and J. D. Key. Designs and their Codes. Cambridge
University Press, 1992. Cambridge Tracts in Mathematics, Vol. 103
(Second printing with corrections, 1993).
[3] Edward F. Assmus, Jr. and Jennifer D. Key. Codes and finite geome-
tries. Technical report, INRIA, 1993. Report No. 2027.
[5] Thierry Berger and Pascale Charpin. The automorphism group of Gen-
eralized Reed-Muller codes. Discrete Math., 117:1–17, 1993.
[8] Ian F. Blake and Ronald C. Mullin. The Mathematical Theory of Cod-
ing. New York: Academic Press, 1975.
HAMMING, GOLAY AND REED–MULLER CODES 31
7. If r̄ is corrupted by an uncorrectable error pattern, set the error failure flag. End of decoding.
8. Set ĉ = r̄ + ē. End of decoding.
f(x1, x2, …, xm)

    x2           0 0 1 1
    x1           0 1 0 1
    f(x1, x2)    0 1 1 0
Then,
f (x1 , x2 ) = (x1 AND NOT(x2 )) OR (NOT(x1 ) AND x2 ) .
Associated with each Boolean function f , let f¯ denote the binary vector of length 2m which
is obtained from evaluating f at all possible 2m values of the m variables x1 , x2 , . . . , xm .
In Example 2.3.1, f¯ = (0110), where the convention taken for ordering the bit positions
of f¯ is in accordance with a binary representation of integers, with x1 being the LSB and
xm the MSB.
Also note that a Boolean function can be written directly from its truth table to get the
disjunctive normal form (DNF). Using the DNF, any Boolean function can be expressed as
the sum^3 of 2^m elementary functions: 1, x1, x2, …, xm, x1x2, …, x1x2 · · · xm, such that
f¯ = a0 1̄ + a1 x̄1 + a2 x̄2 + · · · + am x̄m + a12 x̄1 x̄2 + · · · + a12···m x̄1 x̄2 · · · x̄m ,    (2.4)
where 1̄ is added to account for independent terms (degree 0). For Example 2.3.1 above,
f¯ = x̄1 + x̄2 .
A binary (2^m, k, 2^{m−r}) RM code, denoted RM(r, m), is defined as the set of vectors
associated with all Boolean functions of degree up to r in m variables. RM(r, m) is also
known as the r-th order RM code of length 2^m. The dimension of RM(r, m) can easily be
shown to be equal to
    k = Σ_{i=0}^{r} C(m, i),
Example 2.3.2 The first order RM code of length 8, RM(1, 3), is an (8, 4, 4) binary code,
and can be constructed from Boolean functions of degree up to one in three variables:
{1, x1 , x2 , x3 }, so that
1̄ = 1 1 1 1 1 1 1 1
x̄1 = 0 0 0 0 1 1 1 1
x̄2 = 0 0 1 1 0 0 1 1
x̄3 = 0 1 0 1 0 1 0 1
3 “sum” means logical “XOR” and “multiplication” means logical “AND” in this context.
A generator matrix for RM(1, 3) is thus given by

              ( 1̄  )   ( 1 1 1 1 1 1 1 1 )
    G(1, 3) = ( x̄1 ) = ( 0 0 0 0 1 1 1 1 )    (2.5)
              ( x̄2 )   ( 0 0 1 1 0 0 1 1 )
              ( x̄3 )   ( 0 1 0 1 0 1 0 1 )
Note that code RM(1, 3) can also be obtained from a Hamming (7, 4, 3) code by
appending at the end of each code word an overall parity-check bit. The only difference between
the extended Hamming code and RM(1, 3) will be a possible permutation of bit (column)
positions.
of points of the geometry EG(m, 2). Then, there is a one-to-one correspondence between
the components of binary vectors of length 2^m and the points of EG(m, 2). A given binary
vector of length 2^m is associated with a subset of points of EG(m, 2). In particular, a
subset of EG(m, 2) can be associated with each binary vector w̄ = (w1, w2, …, w_{2^m}) of
length 2^m, by interpreting it as selecting points whenever wi = 1. Stated otherwise, w̄ is
an incidence vector.
Binary RM codes can then be defined as follows: the code words of RM(r, m) are the
incidence vectors of all subspaces (i.e., linear combinations of points) of dimension m − r
in EG(m, 2) (Theorem 8 of (MacWilliams and Sloane 1977)). From this it follows that the
number of minimum-weight code words of RM(r, m) is

    A_{2^{m−r}} = 2^r Π_{i=0}^{m−r−1} (2^{m−i} − 1)/(2^{m−r−i} − 1).    (2.6)
x1 = x2 = · · · = xm = 0
4 See Section 3.4.
from all the code words of RM(r, m) is the binary cyclic RM(r, m) code, which has

    A_{2^{m−r}−1} = Π_{i=0}^{m−r−1} (2^{m−i} − 1)/(2^{m−r−i} − 1)    (2.7)
    v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8 = 0
    v5 + v6 + v7 + v8 = 0
    v3 + v4 + v7 + v8 = 0
    v2 + v4 + v6 + v8 = 0
    v1 + v2 + v3 + v4 = 0
    v1 + v2 + v5 + v6 = 0
    v1 + v3 + v5 + v7 = 0
    v3 + v4 + v5 + v6 = 0
    v2 + v4 + v5 + v7 = 0
    v2 + v3 + v6 + v7 = 0
    v1 + v2 + v7 + v8 = 0
    v1 + v3 + v6 + v8 = 0
    v1 + v4 + v5 + v8 = 0
    v2 + v3 + v5 + v8 = 0
    v1 + v4 + v6 + v7 = 0    (2.8)
The reader is invited to verify that the sum vi + vj of every pair of code bits vi, vj,
with i ≠ j, appears in exactly four equations. Whenever a set of equations includes a term
vi + vj, but no other sum of pairs appears more than once, the parity checks involved are
said to be orthogonal on positions vi and vj.
It is now shown how a single error can be corrected. Let
r̄ = v̄ + ē = (r1 , r2 , r3 , r4 , r5 , r6 , r7 , r8 )
denote a received vector after code word v̄ is transmitted over a BSC. Suppose that a single
error is to be corrected in the fifth position, v5 . A procedure to design an ML decoder for
this case is as follows:
Two equations involving the term vi + v5 are selected, with i ≠ 5, and another set of
two equations with the term vj + v5, j ≠ 5, with i ≠ j. Select (arbitrarily, as long as i ≠ j
and both are different from 5), say, i = 3, j = 4. There are four parity checks orthogonal on
the term v3 + v5. Select any two of them. Do the same for the term v4 + v5.
HAMMING, GOLAY AND REED–MULLER CODES 35
The syndromes associated with these equations are denoted S1 and S2 for v3 + v5 and
S3 and S4 for v4 + v5 ,
    S1 = r1 + r3 + r5 + r7
    S2 = r3 + r4 + r5 + r6
    S3 = r2 + r4 + r5 + r7
    S4 = r1 + r4 + r5 + r8    (2.9)
Because v̄ is a code word in RM(1, 3), the set of equations (2.9) is equivalent to
    S1 = e1 + e3 + e5 + e7
    S2 = e3 + e4 + e5 + e6
    S3 = e2 + e4 + e5 + e7
    S4 = e1 + e4 + e5 + e8    (2.10)
Since S1 , S2 , S3 and S4 are orthogonal on e3 + e5 and e4 + e5 , a new pair of equations
orthogonal on e5 can be formed as:
[Figure 2.1: Majority-logic decoder for the cyclic RM(1, 3) code: received bits r enter a seven-stage shift register (positions 7, 6, …, 1) of D elements, with majority-logic (M) gates forming the estimates e_{n−4} and e_n; the switches open and close at n = 7.]
Example 2.3.4 In this example, a decoder for the cyclic RM(1, 3) code, a binary Hamming
(7, 4, 3) code, is derived. To obtain the parity-check equations from those of the RM(1, 3)
code, remove the coordinate v1, for which x1 = x2 = · · · = xm = 0, from all equations. Let the
code words of the cyclic RM(1, 3) code be indexed by relabeling the code word elements:
(v2 , v3 , v4 , v5 , v6 , v7 , v8 ) → (v1 , v2 , v3 , v4 , v5 , v6 , v7 ).
As before, an ML decoder for correcting an error in an arbitrary position (say, the fifth
position again) can be derived. This can be shown to result in the following seven nonzero
(linearly independent) parity-check equations:
v 1 + v2 + v3 + v5 =0
v2 + v3 + v4 + v6 =0
v3 + v4 + v5 + v7 =0
v1 + v4 + v5 + v6 =0
v2 + v5 + v6 + v7 =0
v1 + v3 + v6 + v7 =0
v1 + v2 + v4 + v7 =0 (2.13)
In a manner similar to the previous example, the syndromes S1 and S2 below are orthogonal
on v4 and v5 , and S2 and S3 are orthogonal on v5 and v6 :
    S1 = e3 + e4 + e5 + e7
    S2 = e1 + e4 + e5 + e6
    S3 = e2 + e5 + e6 + e7    (2.14)
Based on the estimates e4 + e5 and e5 + e6, two additional orthogonal equations on e5 can
be formed to give the final estimate,

    S1 = e4 + e5
    S2 = e5 + e6    (2.15)

where ej, j = 4, 5, 6, represents the ML estimate from the previous step. This results in the
circuit shown in Figure 2.1. The circuit operates as follows. Initially, the contents of the
seven-bit register are set to zero. Suppose that a single error is contained in the received word
in position i, for 1 ≤ i ≤ 7. At each clock cycle, the contents of the register are cyclically
shifted to the right by one position. Time, in clock cycles, is denoted by the subindex n in
the following.
Consider first the case i = 1. That is, there is an error in the first code word position.
After three cycles, the error is contained in register 5 (v5 ). The output of the ML circuit is set
to en = 1. Four cycles later (a total of seven cycles), the first received bit is output and the
error is corrected. Consider now the case i = 7. After nine cycles, the error is detected and
en = 1. Again, four cycles later (total 13 cycles), the bit in the last position is output and
the error corrected. This decoder has a latency of 13 cycles. Every 13 cycles, the contents
of the shift register are cleared and a new code word can be processed.
Problems
1. Using the union bound, estimate the probability of a bit error of a binary Hamming
(7,4,3) code, denoted CH 1 , with binary transmission over an AWGN channel.
2. Find the weight distribution of the (23, 12, 7) Golay code.
3. Repeat problem 1 for the binary (23, 12, 7) Golay code, denoted CG . Compare the
performances of CH 1 and CG . Comment on the underlying trade-off between the
code rate and the coding gain.
4. Repeat problem 1 with binary transmission over a flat Rayleigh fading channel. (Use
Monte Carlo integration for the union bound.)
5. Consider a binary Hamming (15,11) code, denoted CH 2 .
(a) Give the generator and parity-check matrices in systematic form, denoted Gsys
and Hsys, for this code.
(b) Suppose that the parity-check matrix is rearranged in such a way that the syn-
drome vector is the binary representation of an integer denoting the position
of the error. Let Hint denote this matrix. Find the permutation of columns πc
needed to be applied to Hsys in order to obtain Hint .
(c) A code word v̄ = ūGsys is sent through a BSC and received with a single error
as r̄. Show that the syndrome of the vector πc (r̄) is the binary representation
of the position of the error, where πc is the permutation in part (b).
(d) Sketch block diagrams of the circuits used in encoding and decoding CH 2 .
Ch. 1. §9. Construction of new codes from old (II) 31
[Fig. 1.12. Code #9 drawn as a tetrahedron.]
The simplex code 𝒮_r will also reappear later under the name of a maximal-
length feedback shift register code (see §4 of Ch. 3 and Ch. 14).

    [2^r, r + 1, 2^{r−1}]
    e.g. [16, 5, 8]

The dual of the extended Hamming code is also an important code, for it is
a first-order Reed-Muller code (see Ch. 13). It is obtained by lengthening 𝒮_r as
described in (V). For example lengthening 𝒮_3 in this way we obtain the code in
Fig. 1.14.
    0 0 0 0 0 0 0 0
    0 1 0 1 0 1 0 1
    0 0 1 1 0 0 1 1
    0 1 1 0 0 1 1 0
    0 0 0 0 1 1 1 1
    0 1 0 1 1 0 1 0
    0 0 1 1 1 1 0 0
    0 1 1 0 1 0 0 1
    1 1 1 1 1 1 1 1
    1 0 1 0 1 0 1 0
    1 1 0 0 1 1 0 0
    1 0 0 1 1 0 0 1
    1 1 1 1 0 0 0 0
    1 0 1 0 0 1 0 1
    1 1 0 0 0 0 1 1
    1 0 0 1 0 1 1 0

Fig. 1.14. Code # 10, an [8, 4, 4] 1st order Reed-Muller code.
elsewhere) form an orthogonal signal set, since s^{(i)} · s^{(j)} = δ_{ij}. Consider the
translated signal set {t^{(i)} = s^{(i)} − a}. Show that the total energy Σ_i t^{(i)} · t^{(i)} is
minimized by choosing
Theorem 9. If H is the parity check matrix of a code of length n, then the code
has dimension n − r iff some r columns of H are linearly independent but no
r + 1 columns are. (Thus r is the rank of H.)
§I. Introduction
Reed-Muller (or RM) codes are one of the oldest and best understood
families of codes. However, except for first-order RM codes and codes of
modest block lengths, their minimum distance is lower than that of BCH
codes. But the great merit of RM codes is that they are relatively easy to
decode, using majority-logic circuits (see §§6 and 7).
In fact RM codes are the simplest examples of the class of geometrical
codes, which also includes Euclidean geometry and projective geometry
codes, all of which can be decoded by majority logic. A brief account of these
geometrical codes is given in §8, but regretfully space does not permit a more
detailed treatment. In compensation we give a fairly complete bibliography.
§§2, 3 and 4 give the basic properties of RM codes, and Figs. 13.3 and 13.4
give a summary of the properties. Sections 9, 10 and 11 consider the
automorphism groups and Mattson-Solomon polynomials of these codes.
The next chapter will discuss 1st order RM codes and their applications,
while Chapter 15 studies 2nd order RM codes, and also the general problem of
finding weight enumerators of RM codes.
convenient to call it V_m.] Any function f(v) = f(v1, …, vm) which takes on the

    …

since the right-hand side is equal to 1 exactly when f is. This is called the
disjunctive normal form for f (see Problem 1).
Using the rules (1) this simplifies to (check!)
Notice that v_j^2 = v_j. It is clear that in this way any Boolean function can be
expressed as a sum of the 2^m functions

with coefficients which are 0 or 1. Since there are 2^{2^m} Boolean functions
altogether, all these sums must be distinct.
In other words the 2^m vectors corresponding to the functions (2) are
linearly independent.
372 Reed-Muller codes Ch. 13. §3.
(b) f(v1, …, vm) = vm g(v1, …, v_{m−1}) + h(v1, …, v_{m−1}), where g, h are B.f.'s.
(c) Disjunctive normal form:

where w_i^1 = v_i, w_i^0 = v̄_i.
    g(a) = Σ_{b ⊆ a} f(b1, …, bm),    (4)
As in §2, v = (v1, …, vm) denotes a vector which ranges over V_m, and f is
the vector of length 2^m obtained from a Boolean function f(v1, …, vm).
Definition. The r-th order binary Reed-Muller (or RM) code ℛ(r, m) of length
n = 2^m, for 0 ≤ r ≤ m, is the set of all vectors f, where f(v1, …, vm) is a
Boolean function which is a polynomial of degree at most r.
For example, the first order RM code of length 8 consists of the 16
codewords
    0                  0 0 0 0 0 0 0 0
    v3                 0 0 0 0 1 1 1 1
    v2                 0 0 1 1 0 0 1 1
    v1                 0 1 0 1 0 1 0 1
    v2 + v3            0 0 1 1 1 1 0 0
    v1 + v3            0 1 0 1 1 0 1 0
    v1 + v2            0 1 1 0 0 1 1 0
    v1 + v2 + v3       0 1 1 0 1 0 0 1
    1                  1 1 1 1 1 1 1 1
    1 + v3             1 1 1 1 0 0 0 0
    1 + v2             1 1 0 0 1 1 0 0
    1 + v1             1 0 1 0 1 0 1 0
    1 + v2 + v3        1 1 0 0 0 0 1 1
    1 + v1 + v3        1 0 1 0 0 1 0 1
    1 + v1 + v2        1 0 0 1 1 0 0 1
    1 + v1 + v2 + v3   1 0 0 1 0 1 1 0

Fig. 13.1. The 1st order Reed-Muller code of length 8.
This code is also shown in Fig. 1.14, and indeed we shall see that ℛ(1, m) is
always the dual of an extended Hamming code. ℛ(1, m) is also the code
obtained from a Sylvester-type Hadamard matrix in §3 of Ch. 2.
All the codewords of ℛ(1, m) except 0 and 1 have weight 2^{m−1}. Indeed any
B.f. of degree exactly 1 corresponds to a vector of weight 2^{m−1}, by Problem 2.
In general the r-th order RM code consists of all linear combinations of the
vectors corresponding to the products v_{i1} v_{i2} ⋯ v_{is}, s ≤ r. There are

    k = 1 + C(m, 1) + C(m, 2) + ⋯ + C(m, r)

such basis vectors, and as we saw in §2 they are linearly independent.
So k is the dimension of the code.
For example when m = 4 the 16 possible basis vectors for Reed-Muller
codes of length 16 are shown in Fig. 13.2 (check!).
    1             1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    v4            0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
    v3            0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    v2            0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    v1            0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    v3 v4         0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
    v2 v4         0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
    v1 v4         0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
    v2 v3         0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
    v1 v3         0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
    v1 v2         0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
    v2 v3 v4      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
    v1 v3 v4      0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
    v1 v2 v4      0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
    v1 v2 v3      0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
    v1 v2 v3 v4   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Fig. 13.2. Basis vectors for RM codes of length 16.
The basis vectors for the r-th order RM code of length 16, ℛ(r, 4), are:

    Order r    Rows of Fig. 13.2
    0          1
    1          1-5
    2          1-11
    3          1-15
    4          all
Theorem 2.

    ℛ(r+1, m+1) = { |u | u+v| : u ∈ ℛ(r+1, m), v ∈ ℛ(r, m) }.

(Indeed, a codeword generated by this matrix has the form |u | u+v| where
u ∈ ℛ(r+1, m), v ∈ ℛ(r, m).)
where deg(g) ≤ r + 1 and deg(h) ≤ r. Let g and h be the vectors (of length 2^m)
corresponding to g(v₁, …, v_m) and h(v₁, …, v_m). Of course g ∈ ℛ(r+1, m)
and h ∈ ℛ(r, m). But now consider g(v₁, …, v_m) and v_{m+1}h(v₁, …, v_m) as
polynomials in v₁, …, v_{m+1}. The corresponding vectors (now of length 2^{m+1})
are |g | g| and |0 | h| (see Problem 7). Therefore f = |g | g| + |0 | h|. Q.E.D.
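Theorem 2 can be used directly as a recursive construction. The sketch below (a hypothetical rm function of ours) builds codes by the |u | u + v| rule from the base cases ℛ(0, m) and ℛ(m, m), and checks the parameters of ℛ(1, 3) and ℛ(2, 4).

```python
def rm(r, m):
    """Codewords of R(r, m) as tuples, via Theorem 2's |u | u+v| rule."""
    n = 2 ** m
    if r <= 0:                      # repetition code {0, 1}
        return {tuple([0] * n), tuple([1] * n)}
    if r >= m:                      # the whole space F_2^(2^m)
        return {tuple((i >> k) & 1 for k in range(n)) for i in range(2 ** n)}
    return {u + tuple((a + b) % 2 for a, b in zip(u, v))
            for u in rm(r, m - 1) for v in rm(r - 1, m - 1)}

r13 = rm(1, 3)
dmin = min(sum(c) for c in r13 if any(c))
```

The recursion also shows why the dimensions add: each codeword is determined by one choice of u and one choice of v.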
   k(r+1, m+1) = k(r+1, m) + k(r, m),   (5)

where k(r, m) denotes the dimension of ℛ(r, m).
Figure 13.3 shows the dimensions of the first few [n, k, d] RM codes
(check!)
Notice that ℛ(m, m) contains all vectors of length 2^m; ℛ(m-1, m) consists
of all even weight vectors, and ℛ(0, m) consists of the vectors 0, 1
(Problem 5).
Theorem 4. ℛ(m-r-1, m) is the dual code of ℛ(r, m), for 0 ≤ r ≤ m - 1.

Proof. Take a ∈ ℛ(m-r-1, m), b ∈ ℛ(r, m). Then a(v₁, …, v_m) is a polynomial
of degree ≤ m - r - 1, b(v₁, …, v_m) has degree ≤ r, and their product
ab has degree ≤ m - 1. Therefore ab ∈ ℛ(m-1, m) and has even weight.
Therefore the dot product a · b = 0 (modulo 2). So ℛ(m-r-1, m) ⊆ ℛ(r, m)^⊥.
376 Reed-Muller codes Ch. 13. §3.
The entry in row d and column n is the dimension k of the [n, k, d] code:

 d \ n    2    4    8   16   32   64  128  256  512
   1      2    4    8   16   32   64  128  256  512
   2      1    3    7   15   31   63  127  255  511
   4           1    4   11   26   57  120  247  502
   8                1    5   16   42   99  219  466
  16                     1    6   22   64  163  382
  32                          1    7   29   93  256
  64                               1    8   37  130
 128                                    1    9   46
 256                                         1   10
 512                                              1

Fig. 13.3. Reed-Muller codes.
For any m and any r, 0 ≤ r ≤ m, there is a binary r-th order RM code
ℛ(r, m) with the following properties:
   length n = 2^m,
(In fact we shall see in the next section that an equivalent code is obtained
no matter which coordinate is punctured.)
Clearly ℛ(r, m)* has length 2^m - 1, minimum distance 2^{m-r} - 1, and
dimension 1 + C(m, 1) + ⋯ + C(m, r).
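Both the dimensions and the duality of Theorem 4 can be verified numerically for small m. A sketch for m = 4, r = 1 (generator rows are products of at most r of the vᵢ; the helper names are ours):

```python
from itertools import combinations

m = 4
n = 2 ** m
v = [[(p >> i) & 1 for p in range(n)] for i in range(m)]

def generator_rows(r):
    # products of at most r of the v_i: a generator matrix of R(r, m)
    rows = []
    for s in range(r + 1):
        for sub in combinations(range(m), s):
            row = [1] * n
            for i in sub:
                row = [a * b for a, b in zip(row, v[i])]
            rows.append(row)
    return rows

def gf2_rank(rows):
    # Gaussian elimination over GF(2) on rows packed into integers
    pivots, rank = {}, 0
    for row in rows:
        x = sum(bit << i for i, bit in enumerate(row))
        while x:
            h = x.bit_length() - 1
            if h in pivots:
                x ^= pivots[h]
            else:
                pivots[h] = x
                rank += 1
                break
    return rank

r = 1
A, B = generator_rows(r), generator_rows(m - r - 1)
orthogonal = all(sum(a * b for a, b in zip(x, y)) % 2 == 0
                 for x in A for y in B)
```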
Problems. (3) Show that ℛ(r, m) is a subcode of ℛ(r+1, m). In fact show that
ℛ(r+1, m) = {a + b : a ∈ ℛ(r, m), b is zero or a polynomial in v₁, …, v_m of degree
exactly r + 1}.
(4) If Theorem 2 is used as a recursive definition of RM codes, use
Equation (5) to calculate their dimension. Obtain Fig. 13.3 from Pascal's
triangle for binomial coefficients.
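The dimensions in Fig. 13.3 come straight from the binomial sums; a short computation (Python) that regenerates a few of the entries:

```python
from math import comb

def rm_dim(r, m):
    # k = 1 + C(m,1) + ... + C(m,r)
    return sum(comb(m, i) for i in range(r + 1))

# spot-check entries of Fig. 13.3: (r, m) -> (n, k, d)
spot_checks = {
    (2, 4): (16, 11, 4),
    (2, 5): (32, 16, 8),
    (3, 7): (128, 64, 16),
    (3, 9): (512, 130, 64),
}
```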
(5) Show that ℛ(0, m) and ℛ(0, m)* are repetition codes, ℛ(m-1, m)
contains all vectors of even weight, and ℛ(m, m) and ℛ(m-1, m)* contain
all vectors, of the appropriate lengths.
(6) Let |S | T| = {|s | t| : s ∈ S, t ∈ T}. Show that

   ℛ(r+1, m+1) = |ℛ(r+1, m) | ℛ(r+1, m)| + |0 | ℛ(r, m)|.
This gives us another way of thinking about codewords of ℛ(r, m), namely
as (incidence vectors of) subsets of EG(m, 2).
For example, the Euclidean geometry EG(3, 2) consists of 8 points
P₀, P₁, …, P₇ whose coordinates we may take to be the following column
vectors:
       P₀ P₁ P₂ P₃ P₄ P₅ P₆ P₇
v₃ :    0  0  0  0  1  1  1  1
v₂ :    0  0  1  1  0  0  1  1
v₁ :    0  1  0  1  0  1  0  1
h = χ(H) consists of all points v which satisfy a linear equation in v₁, …, v_m.
In other words, the Boolean function h is a linear function of v₁, …, v_m, and
so is a codeword of weight 2^{m-1} in ℛ(r, m).
We remark that if f ∈ ℛ(r, m) is the incidence vector of a set S, then
hf ∈ ℛ(r+1, m) and is the incidence vector of S ∩ H. We are now ready for
the first main theorem.
Proof. Let H be any hyperplane EG(m-1, 2) in EG(m, 2) and let H′ be the
parallel hyperplane, so that EG(m, 2) = H ∪ H′.
By the above remark S ∩ H and S ∩ H′ are in ℛ(r+1, m), and so contain
0 or ≥ 2^{m-r-1} points. Since |S| = 2^{m-r} = |S ∩ H| + |S ∩ H′|, |S ∩ H| = 0, 2^{m-r-1}
or 2^{m-r}. The following Lemma then completes the proof of the theorem.
Lemma 6. (Rothschild and van Lint.) Let S be a subset of EG(m, 2) such that
|S| = 2^{m-r}, and |S ∩ H| = 0, 2^{m-r-1} or 2^{m-r} for all hyperplanes H in EG(m, 2).
Then S is an (m-r)-dimensional flat in EG(m, 2).
Case (iii). It remains to consider the case when |S ∩ H| = 2^{m-r-1} for all H.
Consider

   Σ_{H⊂EG(m,2)} |S ∩ H|² = Σ_{a∈S} Σ_{b∈S} Σ_{H⊂EG(m,2)} χ_H(a) χ_H(b),

since there are 2^m - 1 hyperplanes in EG(m, 2) through a point and 2^{m-1} - 1
through a line. The LHS is 2^{2m-2r-1}(2^m - 1). Substituting |S| = 2^{m-r} on the RHS
Theorem 7. The incidence vector of any (m-r)-flat in EG(m, 2) is in ℛ(r, m).
   Σ_{j=1}^{m} a_{ij} v_j = b_i,   i = 1, …, r,

or equivalently

   Σ_{j=1}^{m} a_{ij} v_j + b_i + 1 = 1,   i = 1, …, r.

This can be replaced by the single equation

   ∏_{i=1}^{r} ( Σ_{j=1}^{m} a_{ij} v_j + b_i + 1 ) = 1,
(b) ℛ(r, m) is
A subset T = {α^{d₀}, …, α^{d_l}} of these points will be represented in the usual
way by the polynomial

   θ_T(x) = x^{d₀} + ⋯ + x^{d_l}.

If T = {α^{d₀}, …, α^{d_l}} is a PG(μ-1, 2) then the points of T are all nonzero
linear combinations over GF(2) of μ linearly independent points a₀, …, a_{μ-1}
(say) of GF(2^m). In other words the points of T are

   Σ_{j=0}^{μ-1} a_{ij} a_j = α^{d_i},   i = 0, 1, …, l,

where (a_{i0}, a_{i1}, …, a_{i,μ-1}) runs through all nonzero binary μ-tuples. Also
x θ_T(x) represents the PG(μ-1, 2) spanned by αa₀, …, αa_{μ-1}. Thus every
cyclic shift of the incidence vector of a PG(μ-1, 2) is the incidence vector of
another PG(μ-1, 2).
Let 𝒞 be the code generated by all θ_T(x), where T is any PG(μ-1, 2).
Clearly 𝒞 is a cyclic code and is contained in ℛ(r, m)*; the theorem asserts
that in fact 𝒞 = ℛ(r, m)*. We establish this by showing that

   F_s(a₀, …, a_{μ-1}) = Σ_{j₀+⋯+j_{μ-1}=s} [ s! / (j₀! ⋯ j_{μ-1}!) ] a₀^{j₀} ⋯ a_{μ-1}^{j_{μ-1}}.   (6)
Q.E.D.
Problem. (8) Show that if a₀, …, a_{μ-1} are linearly dependent, then
F_s(a₀, …, a_{μ-1}) is identically zero modulo 2.
Important Remark. For nonnegative integers s let w₂(s) denote the number of
1's in the binary expansion of s. Then the proof of this theorem has shown
that α^s is a nonzero of ℛ(r, m)* iff 1 ≤ s ≤ 2^m - 1 and w₂(s) ≥ μ. Or in other
words,
Theorem 11. The punctured RM code ℛ(r, m)* is a cyclic code, which has as
zeros α^s for all s satisfying

   1 ≤ w₂(s) ≤ m - r - 1 and 1 ≤ s ≤ 2^m - 2.
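Theorem 11 is consistent with the dimension of ℛ(r, m)*: the number of zeros is Σ_{i=1}^{m-r-1} C(m, i), leaving dimension 2^m - 1 - (number of zeros) = 1 + C(m, 1) + ⋯ + C(m, r). A quick numerical check (our own helper function):

```python
from math import comb

def punctured_rm_dim(r, m):
    n = 2 ** m - 1
    # zeros alpha^s of R(r, m)*: 1 <= s <= 2^m - 2 with 1 <= w2(s) <= m - r - 1
    zeros = [s for s in range(1, n) if 1 <= bin(s).count("1") <= m - r - 1]
    return n - len(zeros)
```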
The generator and check polynomials for the punctured RM code ℛ(r, m)*
are, for 0 ≤ r ≤ m - 1,

   g(x) = ∏ M^{(s)}(x),   product over 1 ≤ w₂(s) ≤ m-r-1, 1 ≤ s ≤ 2^m - 2,   (7)

   h(x) = (x + 1) ∏ M^{(s)}(x),   product over m-r ≤ w₂(s) ≤ m-1, 1 ≤ s ≤ 2^m - 2,   (8)

   h(x) = (x + 1) ∏ M^{(s)}(x),   product over 1 ≤ w₂(s) ≤ r, 1 ≤ s ≤ 2^m - 2.   (10)
or equivalently (again replacing α by α^{-1}),

   θ₀ + Σ θ_s,   sum over m-r ≤ w₂(s) ≤ m-1, 1 ≤ s ≤ 2^m - 2.
For example, the idempotents of ℛ(1, m)* and ℛ(2, m)* may be taken to be

   θ₀ + Σ_{w₂(s)=m-1} θ_s

and

   θ₀ + Σ_{m-2 ≤ w₂(s) ≤ m-1} θ_s

(one s from each cyclotomic coset).
Nesting habits. Figure 13.5 shows the nesting habits of BCH and punctured
RM codes. Here ℬ(d) denotes the BCH code of designed distance d, and the
binary numbers in parentheses are the exponents of the zeros of the codes
(one from each cyclotomic coset). The codes get smaller as we move down
the page.
ℛ(m-1)* = ℬ(1)   {∅}
        |
ℛ(m-2)* = ℬ(3)   {1}
        |
      ℬ(5)   {1, 11}
        |
      ℬ(7)   {1, 11, 101}
      /                 \
ℛ(m-3)* {1, 11, 101, 1001, …}     ℬ(9) {1, 11, 101, 111}
        |                               |
        |                   ℬ(11) {1, 11, 101, 111, 1001}
        |                               |
Fig. 13.5. Nesting habits of BCH and punctured RM codes.
We see that

   ℛ(r, m)* ⊆ BCH code of designed distance 2^{m-r} - 1,
   ℛ(r, m) ⊆ extended BCH code of designed distance 2^{m-r} - 1.
Proof. We recall that ℛ(r, m) is ℛ(r, m)* with an overall parity check added.
By Theorem 10, the incidence vectors of the (m-r)-flats with a 1 in
coordinate 0 generate ℛ(r, m). So certainly all the (m-r)-flats generate
ℛ(r, m). Q.E.D.
Theorem 13. (MacWilliams and Mann.) The rank over GF(p) of the incidence
matrix of the hyperplanes of an m-dimensional Euclidean or projective geometry
over GF(p^s) is

   C(m+p-1, m)^s + ε
Research Problem (13.1). Is there a codeword a(x) ∈ ℛ(r, m)* which is the
incidence vector of a PG(m-r-1, 2) and generates ℛ(r, m)*?
There are two obvious ways to encode an RM code. The first uses the
generator matrix in the form illustrated in Fig. 13.2 (this is a nonsystematic
encoder). The second, which is systematic, makes use of the fact (proved in
Theorem 11) that RM codes are extended cyclic codes. In this § we give a
decoding algorithm which applies specifically to the first encoder, and then in
§7 we give a more general decoding algorithm which applies to any encoder.
We illustrate the first decoder by studying the [16, 11, 4] second-order RM
code of length 16, ℛ(2, 4). As generator matrix G we take the first 11 rows of
Fig. 13.2. Thus the message symbols
a₁₂ = x₀ + x₁ + x₂ + x₃
    = x₄ + x₅ + x₆ + x₇
    = x₈ + x₉ + x₁₀ + x₁₁
    = x₁₂ + x₁₃ + x₁₄ + x₁₅,   (12)

a₁₃ = x₀ + x₁ + x₄ + x₅
    = x₂ + x₃ + x₆ + x₇
    = x₈ + x₉ + x₁₂ + x₁₃
    = x₁₀ + x₁₁ + x₁₄ + x₁₅,   (13)

a₃₄ = x₀ + x₄ + x₈ + x₁₂
    = x₁ + x₅ + x₉ + x₁₃
    = x₂ + x₆ + x₁₀ + x₁₄
    = x₃ + x₇ + x₁₁ + x₁₅.
Equation (12) gives 4 votes for the value of a₁₂, Equation (13) gives 4 votes
for a₁₃, and so on. So if one error occurs, the majority vote is still correct, and
thus each a_{ij} is obtained correctly.
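The four disjoint check sums in (12) can be simulated: encode a message with the degree-at-most-2 rows (a generator matrix of ℛ(2, 4), in our own row order rather than that of Fig. 13.2), flip any single bit, and the majority of the four votes still gives a₁₂.

```python
import random
from itertools import combinations

m, n = 4, 16
v = [[(p >> i) & 1 for p in range(n)] for i in range(m)]

def row(sub):
    out = [1] * n
    for i in sub:
        out = [a * b for a, b in zip(out, v[i])]
    return out

subsets = [s for d in range(3) for s in combinations(range(m), d)]
G = [row(s) for s in subsets]          # 11 rows: degrees 0, 1, 2

random.seed(1)
msg = [random.randint(0, 1) for _ in G]
x = [sum(msg[k] * G[k][p] for k in range(len(G))) % 2 for p in range(n)]
a12 = msg[subsets.index((0, 1))]       # the coefficient of v1 v2

def vote_a12(y):
    # the four disjoint sums of Equation (12), then a majority vote
    votes = [sum(y[4 * b: 4 * b + 4]) % 2 for b in range(4)]
    return 1 if sum(votes) >= 3 else 0

single_error_ok = all(
    vote_a12([(bit + (p == e)) % 2 for p, bit in enumerate(x)]) == a12
    for e in range(n))
```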
To find the symbols a₁, …, a₄, subtract

   Σ_{i<j} a_{ij} v_i v_j

from x, giving say x′ = x₀′x₁′ ⋯ x₁₅′. Again from Fig. 13.2 we observe that

a₁ = x₀′ + x₁′
   = x₂′ + x₃′
   = ⋯
   = x₁₄′ + x₁₅′,
a₂ = x₀′ + x₂′
Now it is easier: there are 8 votes for each a_i, and so if there is one error the
   = a₀1 + error,

and a₀ = 0 or 1 according to the number of 1's in x″.
This scheme is called the Reed decoding algorithm, and will clearly work
for any RM code.
How do we find which components of the codeword x are to be used in the
parity checks (12), (13), …? To answer this we shall give a geometric description
of the algorithm for decoding ℛ(r, m). We first find a_u, where
u = u₁ ⋯ u_r say. The corresponding row of the generator matrix, v_{u₁} ⋯ v_{u_r},
is the incidence vector of an (m-r)-dimensional subspace S of EG(m, 2). For
example, the double line in Fig. 13.6 shows the plane S corresponding to a₁₂.
Let T be the "complementary" subspace to S with incidence vector
v_{τ₁} ⋯ v_{τ_{m-r}}, where {τ₁, …, τ_{m-r}} is the complement of {u₁, …, u_r} in
{1, 2, …, m}. Clearly T meets S in a single point, the origin.
Let U₁, …, U_{2^{m-r}} be all the translates of T in EG(m, 2), including T itself.
(These are shaded in Fig. 13.6.) Each U_i meets S in exactly one point.
Fig. 13.6. EG(4, 2) showing the subspace S (double lines) and U₁, …, U₄ (shaded).
These equations are a generalization of Equations (12), (13), and give 2^{m-r}
votes for a_u.
where the sum is over all subsets {ρ₁, …, ρ_s} of {1, …, m} of size at most r.
(This generalizes Equation (11).) Therefore

   Σ_{P∈U_i} x_P = Σ_ρ a_ρ Σ_{P∈U_i} (v_{ρ₁} ⋯ v_{ρ_s})_P
If s < r, this subspace has dimension at least 1, and N(U_i, ρ) is even. On the
other hand, if s = r but W ≠ S, then one of the ρ_i must equal one of the τ_j, say
ρ₁ = τ₁. Then T and W intersect in
This theorem implies that, if no more than [½(2^{m-r} - 1)] errors occur,
majority logic decoding will recover each of the symbols a_u correctly, where
u is any string of r symbols. The rest of the a's can be recovered in the same
way, as shown in the previous example. Thus the Reed decoding algorithm can
correct [½(d - 1)] = [½(2^{m-r} - 1)] errors.
The Reed decoding algorithm does not apply if the code is encoded
systematically as an extended cyclic code, as in §8 of Ch. 7. Fortunately
another majority logic decoding algorithm is available, and its description will
lead us to a more general class of codes, the finite geometry codes.
If 𝒞 is any [n, k] code over GF(q), the rows of the H matrix are parity
   Σ_{i=0}^{n-1} h_i x_i = 0
Example. Consider the [7, 3, 4] simplex code, with parity check matrix

       [1 1 0 1 0 0 0]
   H = [0 1 1 0 1 0 0]
       [0 0 1 1 0 1 0]
       [0 0 0 1 1 0 1].
Seven of the 16 parity checks are shown in Fig. 13.7.
   0 1 2 3 4 5 6
   1 1 0 1 0 0 0
   0 1 1 0 1 0 0
   0 0 1 1 0 1 0
   0 0 0 1 1 0 1
   1 0 0 0 1 1 0
   0 1 0 0 0 1 1
   1 0 1 0 0 0 1
Fig. 13.7. Parity checks on the [7, 3, 4] code.
have
   S₁ = y₀ + y₁ + y₃ = e₀ + e₁ + e₃,
   S₂ = y₀ + y₄ + y₅ = e₀ + e₄ + e₅,
   S₃ = y₀ + y₂ + y₆ = e₀ + e₂ + e₆.
Theorem 15. If not more than [J/2] errors occur, then the true value of e₀ is the
value taken by the majority of the S_i's, with the rule that ties are broken in favor of
0.

Proof. Suppose at most [J/2] errors occur. (i) If e₀ = 0, then at most [J/2]
equations are affected by the errors. Therefore at least ⌈J/2⌉ of the S_i's are
equal to 0. (ii) If e₀ = 1, then less than [J/2] equations are affected by the other
errors. Hence the majority of the S_i's are equal to 1. Q.E.D.
Corollary 16. If there are J parity checks orthogonal on every coordinate, the
code can correct [J/2] errors.
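For the [7, 3, 4] code above, the three checks S₁, S₂, S₃ give J = 3, so [J/2] = 1 error is corrected. A direct simulation (the check supports are read off Fig. 13.7; the rest of the code is our own):

```python
# supports of the three parity checks orthogonal on coordinate 0
checks = [(0, 1, 3), (0, 4, 5), (0, 2, 6)]

def estimate_e0(y):
    syndromes = [sum(y[i] for i in c) % 2 for c in checks]
    # majority of {0, S1, S2, S3}: ties broken in favor of 0
    return 1 if sum(syndromes) >= 2 else 0

# the code: null space of the seven cyclic shifts of the check 1101000
base = [1, 1, 0, 1, 0, 0, 0]
duals = [base[-t:] + base[:-t] for t in range(7)]
codewords = [y for y in ([(i >> k) & 1 for k in range(7)] for i in range(128))
             if all(sum(a * b for a, b in zip(y, h)) % 2 == 0 for h in duals)]

corrects_one_error = all(
    estimate_e0([(c[i] + (i == e)) % 2 for i in range(7)]) == (1 if e == 0 else 0)
    for c in codewords for e in range(7))
```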
Remarks. (i) If the code is cyclic, once a set of J parity checks orthogonal on
one coordinate has been found, J parity checks orthogonal on the other
coordinates are obtained by cyclically shifting the first set.
(ii) The proof of Theorem 15 shows that some error vectors of weight
greater than [J/2] will cause incorrect decoding. However, one of the nice
features of majority logic decoding (besides the inexpensive circuitry) is that
often many error vectors of weight greater than [J/2] are also corrected.
(iii) Breaking ties. In case of a tie, the rule is to favor 0 if it is one of the
alternatives, but otherwise to break ties in any way. Equivalently, use the
majority of {0, S₁, S₂, …}.
This method of decoding is called one-step majority logic decoding.
However, usually there are not enough orthogonal parity checks to correct up
to one-half of the minimum distance, as the following theorem shows.
Theorem 17. For a code over GF(q), the number of errors which can be
corrected by one-step majority logic decoding is at most

   (n - 1) / (2(d′ - 1)),

where d′ is the minimum distance of the dual code.
Proof. The parity checks orthogonal on the first coordinate have the form

   x x x x 0 0 0 0      (at least d′ - 1 nonzero components besides the first)
   x 0 0 0 x x x x      (at least d′ - 1 nonzero components besides the first)

since each check is a dual codeword of weight ≥ d′, and no two checks share a
nonzero coordinate other than the first.
Examples. (1) For the [23, 12, 7] Golay code, d′ = 8, and so at most [22/(2·7)] = 1
error can be corrected by one-step decoding.
(2) Likewise most RS codes cannot be decoded by one-step decoding, since
d' = n- d + 2.
However, there are codes for which one-step majority logic decoding is
useful, such as the [7, 3, 4] code of Fig. 13.7 and more generally the
difference-set cyclic codes described below.
L-step decoding. Some codes, for example RM codes, can be decoded using
several stages of majority logic.
Example. 2-step decoding of the [7, 4, 3] code. The 7 nonzero parity checks are

   0 1 2 3 4 5 6
   1 1 1 0 1 0 0
   0 1 1 1 0 1 0
   0 0 1 1 1 0 1
   1 0 0 1 1 1 0
   0 1 0 0 1 1 1
   1 0 1 0 0 1 1
   1 1 0 1 0 0 1
There are two parity checks orthogonal on coordinates 0 and 1, namely

and so on. Suppose there is one error. Then the majority rule gives the correct
value of e₀ + e₁ (from the first pair of equations), and of e₀ + e₂ (from the
[M] = MAJORITY GATE
Fig. 13.8. Two-step majority decoding of the [7, 4, 3] code.
Since the code is cyclic, it is enough to design a decoder which corrects the
first coordinate. The others are then corrected automatically.
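A complete two-step decoder for e₀ of the [7, 4, 3] code can be sketched as follows (Python; the pairing of checks orthogonal on {0, j} is found by search rather than read off the figure):

```python
from itertools import combinations

# parity checks of the [7,4,3] code: cyclic shifts of 1110100 (listed above)
base = [1, 1, 1, 0, 1, 0, 0]
duals = [base[-t:] + base[:-t] for t in range(7)]

def majority(bits):
    # majority vote with an implicit extra 0 vote (ties go to 0)
    return 1 if 2 * sum(bits) > len(bits) + 1 else 0

def decode_e0(y):
    estimates = []
    for j in range(1, 7):
        # two checks whose supports meet exactly in {0, j}
        cands = [h for h in duals if h[0] and h[j]]
        a, b = next((a, b) for a, b in combinations(cands, 2)
                    if all(not (a[i] and b[i])
                           for i in range(7) if i not in (0, j)))
        s = [sum(hi * yi for hi, yi in zip(h, y)) % 2 for h in (a, b)]
        estimates.append(majority(s))      # first step: estimate of e0 + ej
    return majority(estimates)             # second step: estimate of e0

code = [y for y in ([(i >> k) & 1 for k in range(7)] for i in range(128))
        if all(sum(a * b for a, b in zip(y, h)) % 2 == 0 for h in duals)]
```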
A decoder which has L levels of majority logic is called an L-step decoder.
The basic idea, as illustrated in the preceding example, is that the number of
coordinates in the check sums which are being estimated decreases from level
to level, until at the final step we have estimates for the individual coordinates.
Lemma 18. If there are J checks at each stage of the decoding, then [J/2] errors
can be corrected.
Even with L-step decoding it may not be possible to correct up to half the
minimum distance.
Theorem 19. For a code over GF(q), the number of errors which can be
Example. For the [23, 12, 7] Golay code, L-step decoding cannot correct more
than 2 errors.
Theorem 20. For the r-th order RM code ℛ(r, m), (r+1)-step majority decoding
can correct [½(d - 1)] = [½(2^{m-r} - 1)] errors.
Proof. The dual code is ℛ(m-r-1, m) and by Theorem 8 the low weight
codewords in the dual code are the incidence vectors of the (r+1)-dimensional
flats in EG(m, 2).
Let V be any r-flat. We will find a set of parity checks orthogonal on the
coordinates y_P, P ∈ V. In fact, let U be any (r+1)-flat containing V. Now
each of the 2^m - 2^r points not in V determines a U, and each U is determined
by 2^{r+1} - 2^r such points. Therefore there are (2^m - 2^r)/(2^{r+1} - 2^r) = 2^{m-r} - 1
different U's. Any two such U's meet only in V. Thus we have an estimate
for the sum

This estimate will be correct provided no more than [½(2^{m-r} - 1)] errors occur.
We repeat this for all r-flats V.
Next, let W be any (r-1)-flat, and let V be any r-flat containing W. There
are 2^{m-r+1} - 1 such V's and from the first stage we know the values of the
corresponding sums. Therefore we can obtain an estimate for the value of the
sum

Proceeding in this way, after r + 1 steps we finally arrive at an estimate for y_P,
for any point P, which will be correct provided no more than [½(d - 1)] =
[½(2^{m-r} - 1)] errors occur. Q.E.D.
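The flat-counting used twice in this proof is elementary arithmetic; a sketch checking both counts:

```python
def flats_through(m, r):
    # number of (r+1)-flats containing a fixed r-flat in EG(m, 2):
    # 2^m - 2^r points outside the flat, 2^(r+1) - 2^r determining each
    assert (2 ** m - 2 ** r) % (2 ** (r + 1) - 2 ** r) == 0
    return (2 ** m - 2 ** r) // (2 ** (r + 1) - 2 ** r)
```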
denote the output from the first majority gate at successive times. Then

   s₄ = e₀ + e₁,
   s₅ = e₁ + e₂,

Note that

So if we are willing to wait one clock cycle, s₅ can be obtained without using
the second majority gate. The resulting circuit is shown in Fig. 13.9.
In general this technique (when it is applicable) will reduce the number of
majority gates, adders, etc., from an exponential function of the number of
steps to a linear function, at the cost of a linear delay (as in Fig. 13.10).
The name "sequential code reduction" comes from the fact that at each
successive stage of the decoder we have estimates for additional parity
checks, and so the codeword appears to belong to a smaller code. In the
example, after the first stage we know all the sums e_i + e_j, which are in fact
parity checks on the [7, 1, 7] repetition code.
Unfortunately, sequential code reduction doesn't apply to all codes (and
even when it does apply, it may be a difficult task to find the best decoder).
Fig. 13.9. Decoder for [7, 4, 3] code using sequential code reduction.
[Block diagram: the received vector x is stored in a buffer while the syndrome S
is computed; from S the error estimate ê is computed and subtracted from x.]
Fig. 13.11.
(15)
Theorem 21. (Rudolph.) Any binary linear code can be decoded by one-step
threshold gate decoding.
Proof. Write ê = (f₁, …, f_n), where each component f_i = f_i(S) = f_i(S₁, …, S_{n-k})
is a function of the syndrome. Let F_i(S) = 1 - 2f_i(S) be the corresponding real
±1-valued function. By Equation (11) of Ch. 14, F_i(S) can be written

   F_i(S) = 2^{-(n-k)} Σ_{u∈V^{n-k}} F̂_i(u)(-1)^{u·S},

where the F̂_i(u) are the Hadamard coefficients of F_i(S) given by Equation (8)
of Ch. 14. Then

   f_i(S) = ½ ( 1 - 2^{-(n-k)} Σ_{u∈V^{n-k}} F̂_i(u)(-1)^{u·S} ).

If θ is any threshold gate function of the 2^{n-k} inputs Σ_{j=1}^{n-k} u_j S_j, u ∈ V^{n-k}, with
weights a_u, it is immediate from the definition of θ that

   θ( Σ_j u_j S_j : u ∈ V^{n-k} ) = ½ ( 1 - sgn Σ_{u∈V^{n-k}} a_u (-1)^{u·S} ),   (16)
Unfortunately Equation (16) represents ê as a threshold function of all the
2^{n-k} parity checks, so this is not a practical algorithm. However in a number
of cases it has been possible to find a different one-step threshold gate
realization of ê which involves many fewer parity checks (see Notes).
Research Problem (13.2). Find the most efficient one-step threshold gate
realization of a given Boolean function.
(1) Difference-set cyclic codes. Let Π be the projective plane PG(2, p^s) of
order p^s (see Appendix on Finite Geometries). Π contains n = p^{2s} + p^s + 1
points, which can be represented as triples

   (β₁, β₂, β₃),   β_i ∈ GF(p^s).

Note that (λβ₁, λβ₂, λβ₃), λ ∈ GF(p^s)*, is the same point as (β₁, β₂, β₃). Each
triple can be regarded as an element of GF(p^{3s}), i.e. can be written as a power
of α, where α is a primitive element of GF(p^{3s}). Some scalar multiple of each
triple is equal to α^i for 0 ≤ i < n. We label the n points of the plane by these
powers of α.
Let α^{i₁}, …, α^{i_l}, l = p^s + 1, be a line of Π. The incidence vector of this line
has 1's in exactly the coordinates i₁, …, i_l. By the proof of Theorem 10, any
cyclic shift of this vector is the incidence vector of another line. Since there
are n shifts and n lines, every line of Π is obtained in this way.
Let ℬ be the code generated over GF(p) by these n incidence vectors, and
let 𝒞 = ℬ^⊥. Clearly ℬ is a cyclic code of length n. From Theorem 13, ℬ has
dimension C(p+1, 2)^s + 1.
𝒞 can be decoded by one-step majority logic, as follows. The incidence
vectors of the l = p^s + 1 lines through a point of Π form a set of orthogonal
checks on that coordinate. (They are orthogonal because two lines through a
point have no other intersection.) By Corollary 16, one-step majority logic
decoding will correct [½(p^s + 1)] errors, and the code has minimum distance at
least p^s + 2.
d₁, …, d_l with the property that the l(l - 1) differences d_i - d_j (i ≠ j), when
reduced modulo n, are exactly the numbers 1, 2, …, n - 1 in some order.
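For p^s = 2 the plane has n = 7 points and lines of l = 3 points, and the line {1, 2, 4} is the classical perfect difference set modulo 7. A sketch of the defining property:

```python
def is_planar_difference_set(d, n):
    """The l(l-1) differences d_i - d_j (i != j), reduced mod n,
    must be exactly the numbers 1, ..., n-1 in some order."""
    diffs = sorted((a - b) % n for a in d for b in d if a != b)
    return diffs == list(range(1, n))
```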
Research Problem (13.3). Are there any other planar difference sets?
replace v by vA + b.   (21)
(i.e., for which b = 0) is the general linear group GL(m, 2) (see §5 of Ch. 8),
and has order

   |GL(m, 2)| = (2^m - 1)(2^m - 2)(2^m - 2²) ⋯ (2^m - 2^{m-1})
              ≈ 0.29 · 2^{m²} for m large.   (22)
Since (21) fixes the zero m-tuple, the group GL(m, 2) permutes the codewords
of the punctured RM code ℛ(r, m)*:

   GL(m, 2) ⊆ Aut ℛ(r, m)*.   (23)
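The order formula (22) and the constant 0.29 are easy to reproduce:

```python
def gl2_order(m):
    # |GL(m,2)| = (2^m - 1)(2^m - 2)(2^m - 4) ... (2^m - 2^(m-1))
    order = 1
    for i in range(m):
        order *= 2 ** m - 2 ** i
    return order

ratio = gl2_order(10) / 2 ** 100   # close to the limiting constant 0.2888...
```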
Proof of (b). Let x₁, …, x_B be the minimum weight vectors of ℛ(r, m). For
π ∈ Aut ℛ(r, m), let πx_i = x_i′. Now x_i is an (m-r)-flat. If Y is any (m-r-1)-
which is the intersection of two (m-r)-flats, and contains 2^{m-r-1} points since
π is a permutation. Thus πY is an (m-r-1)-flat. So π permutes the
generators of ℛ(r+1, m), and therefore preserves the whole code. Part (a) is
proved in the same way. Q.E.D.
By Problem 30 of Ch. 8, Aut ℋ_m = Aut ℛ(1, m)*. From (23), and the remark
following Theorem 13 of Ch. 8, since 𝒮_m has dimension m,

   Aut ℋ_m = Aut ℛ(1, m)* = GL(m, 2).

Finally, by Problem 29 of Ch. 8, Aut 𝒮_m = GL(m, 2).
(ii) Let G₁ = Aut ℛ(1, m)*, G₂ = Aut ℛ(1, m). Clearly G₁ is the subgroup of
G₂ which fixes the 0 coordinate. Since GA(m) is transitive, so is G₂. Each
coset of G₁ in G₂ sends 0 to a different point, so |G₂| = 2^m |G₁|. Therefore from
(19) and (22) G₂ = GA(m). Again by Problem 29 of Ch. 8, Aut ℛ(m-2, m) =
Aut ℛ(1, m) = GA(m).
(iii) From Theorem 23 and (i), (ii),

   GA(m) = Aut ℛ(1, m) ⊆ Aut ℛ(2, m) ⊆ ⋯ ⊆ Aut ℛ(m-2, m) = GA(m),
   GL(m, 2) = Aut ℛ(1, m)* ⊆ Aut ℛ(2, m)* ⊆ ⋯ ⊆ Aut ℛ(m-2, m)* = GL(m, 2).

Q.E.D.
Problem. (9) Show that GL(m, 2), GA(m) are in fact groups of the stated
orders, and are respectively doubly and triply transitive.
In this section we shall show that the Boolean function defining a code-
word of an RM code is really the same as the Mattson-Solomon polynomial
(Ch. 8) of the codeword.
Let α be a primitive element of GF(2^m). Then 1, α, …, α^{m-1} is a basis for
GF(2^m). Let λ₀, …, λ_{m-1} be the complementary basis (Ch. 4). We shall now
consider RM codes to be defined by truth tables in which the columns are
taken in the order 0, 1, α, α², …, α^{2^m-2}. For example, when m = 3, the truth
table is shown in Fig. 13.12.
       0  1  α  α²  α³  α⁴  α⁵  α⁶
v₃ :   0  0  0  1   0   1   1   1
v₂ :   0  0  1  0   1   1   1   0
v₁ :   0  1  0  0   1   0   1   1
Fig. 13.12.
Proof. Let ℳ be the matrix consisting of the rows v_m, …, v₁ of the truth table,
with the first, or zero, column deleted. ℳ is an m × (2^m - 1) matrix (ℳ_{ki}), and
In fact,

   T_m(λ_j α^i) = Σ_{k=0}^{m-1} ℳ_{ki} T_m(λ_j α^{m-k-1}) = ℳ_{m-j-1, i},
(ii) When evaluating the RHS of (24), high powers of z are reduced by
z^{2^m-1} = 1. However, once A(z) has been obtained, in order to use the properties
of MS polynomials given in Ch. 8, A(z) must be considered as a
polynomial in ℱ[z], ℱ = GF(2^m).
Conversely, if a is the vector with MS polynomial A(z), the Boolean function
corresponding to a is
Problem. (11) Show that those zeros of an affine polynomial which lie in
GF(2^m) form an r-flat in EG(m, 2).
Theorem 27. A transformation of the general affine group GA(m) acts on the
MS polynomial A(z) of a vector of length 2^m by replacing z by F(z), where
F(z) is an affine polynomial with exactly one zero in GF(2^m). Conversely, any
such transformation of MS polynomials arises from a transformation of
GA(m).
Problem. (12) Show that the polynomial (26) has exactly one zero in GF(2^m).
Conversely, any transformation

   z → F(z) = u₀ + f(z),   u₀ ∈ GF(2^m),   (27)

where f(z) is a linearized polynomial and F(z) has exactly one zero in GF(2^m),
is in GA(m). We decompose (27) into

   z → z + u₀,

which is clearly in GA(m), followed by

   z → f(z) = Σ_{j=0}^{m-1} γ_j z^{2^j}.   (28)
Problem. (13) Show that (28) is in GL(m, 2), i.e. has the form

   z → Σ_{i=0}^{m-1} c_i z^{2^i},

where the coefficients c_i correspond to an invertible m × m binary matrix.
§l. Reed-Muller codes are named after Reed [1104] and Muller [975]
(although Peterson and Weldon [1040, p. 141] attribute their discovery to an
earlier, unpublished paper of Mitani [963]).
§2. Truth tables are widely used in switching theory (see for example
McCluskey [935, Ch. 3]) and in elementary logic, where they are usually
written with FALSE instead of 0 and TRUE instead of 1 (see for example
Kemeny et al. [755, Ch. 1]). In either form they are of great importance in
discrete mathematics. For the disjunctive normal form see Harrison [606, p.
59] or McCluskey [935, p. 78].
For a proof see for example Berlekamp [113, p. 113]. Singmaster [1216] gives
generalizations.
That RM codes are extended cyclic codes (Theorem 11) was simultaneously
discovered by Kasami, Lin and Peterson [739, 740] and Kolesnik
and Mironchikov (see [774] and the references given there). See also Camion
[237].
Theorem 13 is from MacWilliams and Mann [884]. The ranks of the
incidence matrices of subspaces of other dimensions have been determined
by Goethals and Delsarte [499], Hamada [591], and Smith [1243-1246].
§6. The Reed decoding algorithm was given by Reed [1104], and was the first
nontrivial majority logic decoding algorithm. For more about this algorithm
see Gore [544] and Green and San Souci [555]. Massey [918] (see also [921])
and Kolesnik and Mironchikov (described in Dobrushin [378]) extensively
studied majority logic and threshold decoding. Rudolph [1130, 1131] introduced
one-step weighted majority decoding, and this work was extended
by Chow [294], Due [387], Gore [544, 545], Ng [990] and Rudolph and Robbins
[1134].
Techniques for speeding up the decoding of Reed-Muller codes were given
by Weldon [1403]. See also Peterson and Weldon [1040, Ch. 10]. Decoding
Reed-Muller and other codes using a general purpose computer has been
investigated by Paschburg et al. [1025, 1026].
Due [388] has given conditions on a code which must be satisfied if L-step
decoding can correct [½(d - 1)] errors. See also Kugurakov [786].
The Reed decoding algorithm will correct many error patterns of weight
greater than [½(d - 1)]. Krichevskii [784] has investigated just how many.
Theorems 17 and 19 are given by Lin [835-837].
Other papers on majority-logic decoding are Berman and Yudanina [137],
Chen [271], Delsarte [348], Due and Skattebol [390], Dyn'kin and Tenegol'ts [397],
Kasami and Lin [732, 733], Kladov [763], Kolesnik [773], Longobardi
et al. [860], Redinbo [1101], Shiva and Tavares [1205], Smith [1245]
and Warren [1390].
§9. The class of codes which are invariant under the general affine group has
been studied by Kasami, Lin and Peterson [738] and Delsarte [344].
The codewords (except for 0 and 1) of the cyclic [2^m - 1, m, 2^{m-1}] simplex
code 𝒮_m or the [2^m, m + 1, 2^{m-1}] extended cyclic first-order Reed-Muller code
ℛ(1, m) resemble random sequences of 0's and 1's (Fig. 14.1). In fact we shall
see that if c is any nonzero codeword of 𝒮_m, then c has many of the properties
that we would expect from a sequence obtained by tossing a fair coin 2^m - 1
times. For example, the number of 0's and the number of 1's in c are as nearly
equal as they can be. Also, define a run to be a maximal string of consecutive
identical symbols. Then one half of the runs in c have length 1, one quarter
0 0 0 0 0 0 0
1 1 1 0 1 0 0
0 1 1 1 0 1 0
0 0 1 1 1 0 1
1 0 0 1 1 1 0
0 1 0 0 1 1 1
1 0 1 0 0 1 1
1 1 0 1 0 0 1
Fig. 14.1. Codewords of the [7, 3, 4] cyclic simplex code.
have length 2, one eighth have length 3, and so on. In each case the number of
runs of 0 is equal to the number of runs of 1. Perhaps the most important
property of c is that its auto-correlation function is given by
[Diagram: a four-stage feedback shift register with feedback taps given by
h(x) = x^4 + x + 1; the output sequence 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 … has
period 15.]

Successive states (initial state 0001):

   0 0 0 1
   1 0 0 0
   0 1 0 0
   0 0 1 0
   1 0 0 1
   1 1 0 0
   0 1 1 0
   1 0 1 1
   0 1 0 1
   1 0 1 0
   1 1 0 1
   1 1 1 0
   1 1 1 1
   0 1 1 1
   0 0 1 1
   0 0 0 1
   (repeats)

Fig. 14.3. Shift register corresponding to h(x) = x^4 + x + 1, showing successive states.
Definition. For any nonzero initial state, the output a is called a pseudo-noise
(or PN) sequence. (These sequences are also called pseudo-random
sequences, m-sequences, or maximal length feedback shift register
sequences.) An example is shown in Fig. 14.3, which gives the successive
states of the shift register if the initial state is 0001. The output sequence is
the 4th column, i.e.,

   a = a₀a₁ ⋯ = 100 010 011 010 111, 100 ⋯   (1)
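The register of Fig. 14.3 is easy to simulate; the sketch below (our own implementation) reproduces the states and the output sequence (1).

```python
def lfsr(initial_state, nbits):
    """Shift register for h(x) = x^4 + x + 1: the feedback bit is the
    mod-2 sum of the last two stages; the output is the last stage."""
    s = list(initial_state)
    out, states = [], []
    for _ in range(nbits):
        states.append(tuple(s))
        out.append(s[-1])
        s = [(s[-2] + s[-1]) % 2] + s[:-1]
    return out, states

out, states = lfsr([0, 0, 0, 1], 30)
```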
Properties of a PN sequence.
for l = m, m + 1, ….
   (a_{l-m+1}, …, a_l)ᵀ = U(a_{l-m}, …, a_{l-1})ᵀ = U²(a_{l-m-1}, …, a_{l-2})ᵀ = ⋯
                       = U^{l-m+1}(a₀, …, a_{m-1})ᵀ.

Now U is the companion matrix of h(x) (see §3 of Ch. 4), and by Problem 20
of Ch. 4, n = 2^m - 1 is the smallest number such that Uⁿ = I. Therefore there
is a vector b = (a₀ ⋯ a_{m-1}) such that Uⁿbᵀ = bᵀ, and Uⁱbᵀ ≠ bᵀ for 1 ≤ i ≤
n - 1. If the initial state is taken to be b, then a has period 2^m - 1. Q.E.D.
Property III. With this initial state, the shift register goes through all possible
2^m - 1 nonzero states before repeating.
Note that the zero state doesn't occur, unless a is identically zero. Also 2^m - 1
is the maximum possible period for an m-stage linear shift register.
Property IV. For any nonzero initial state, the output sequence has period
2^m - 1, and in fact is obtained by dropping some of the initial digits from a.
of length n = 2^m - 1 from a.
Proof. c is clearly in the code with parity check polynomial h(x), since c
satisfies Equation (12) of Ch. 7. Since h(x) is a primitive polynomial, this is a
simplex code (by §3 of Ch. 8).
Property VI. (The shift-and-add property.) The sum of any segment c with a
cyclic shift of itself is another cyclic shift of c.
Pseudo-randomness properties.
Property VII. In any segment c there are 2^{m-1} 1's and 2^{m-1} - 1 0's.
Property VIII. In c, one half of the runs have length 1, one quarter have length
2, one eighth have length 3, and so on, as long as these fractions give integral
numbers of runs. In each case the number of runs of 0's is equal to the number
of runs of 1's.
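Properties VII and VIII can be checked on the period-15 sequence (1) (a sketch; the cyclic sequence is first rotated to a run boundary so that the wraparound run is counted correctly):

```python
c = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # one period of (1)

# rotate so the period starts at a run boundary, then list the cyclic runs
k = next(i for i in range(len(c)) if c[i - 1] != c[i])
rc = c[k:] + c[:k]
runs = []
i = 0
while i < len(rc):
    j = i
    while j < len(rc) and rc[j] == rc[i]:
        j += 1
    runs.append((rc[i], j - i))   # (symbol, run length)
    i = j
```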
   ρ(τ) = (1/n) Σ_{i=0}^{n-1} s_i s_{i+τ}   for τ = 0, ±1, ±2, …   (2)
[Plot of the autocorrelation function ρ(τ) of a PN sequence of period 15,
with peaks at τ = 0, 15, 30, ….]
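For the period-15 sequence (1), replacing 0 by +1 and 1 by -1 and applying (2) gives ρ(0) = 1 and ρ(τ) = -1/15 for τ not a multiple of 15; a sketch:

```python
n = 15
a = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # PN sequence (1)
s = [(-1) ** bit for bit in a]                       # 0 -> +1, 1 -> -1

def rho(tau):
    # autocorrelation (2) of the periodic +-1 sequence
    return sum(s[i] * s[(i + tau) % n] for i in range(n)) / n
```

The off-peak value -1/15 reflects the shift-and-add property: a plus a shift of itself is another shift, with eight 1's, hence seven agreements and eight disagreements.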
Problems. (7) Show that (5) is the best possible autocorrelation function of
any binary sequence of period n = 2^m - 1, in the sense of minimizing
max_{0<i<n} ρ(i).
where m < b < ½N. Show that the rank of M over GF(2) is less than b. [Hint:
there are only m linearly independent sequences in 𝒮_m.] On the other hand,
show that if c₀, …, c_{N-1} is a segment of a coin-tossing sequence, where each
c_j is 0 or 1 with probability ½, then the probability that rank(M) < b is at most
2^{2b-N-1}. This is very small if b ≪ ½N.
Thus the question "Is rank(M) = b?" is a test on a small number of digits
from a PN sequence which shows a departure from true randomness. E.g. if
m = 11, 2^m - 1 = 2047, b = 15, the test will fail if applied to any N = 50
consecutive digits of a PN sequence, whereas the probability that a coin-tossing
sequence fails is at most 2^{-21}.
   u₀1 + Σ_{i=1}^{m} u_i v_i,   u_i = 0 or 1,   (7)

Thus
Suppose that in the codewords of ℛ(1, m) we replace 1's by -1's and 0's
by 1's. The resulting set of 2^{m+1} real vectors are the coordinates of the vertices
of a regular figure in 2^m-dimensional Euclidean space, called a cross-polytope
(i.e. a generalized octahedron). For example, when m = 2, we obtain the
4-dimensional cross-polytope (also called a 16-cell) shown in Fig. 14.5.
Fig. 14.5. 4-dimensional cross-polytope, with vertices corresponding to codewords
of ℛ(1, 2).
This set of real vectors is also called a biorthogonal signal set; see
Problem 43 of Ch. 1.
If the same transformation is applied to the codewords of Ŝ_m, we obtain a
set of 2^m mutually orthogonal real vectors.
For any vector u = (u₁, …, u_m) in V^m, f(u) will denote the value of f at u,
or equally the component of f in the place corresponding to u.
It will be convenient to have a name for the real vector obtained from a
binary vector f by replacing 1's by -1's and 0's by +1's: call it F. Thus the
component of F in the place corresponding to u is

   F(u) = (-1)^{f(u)}.
Hadamard transforms and cosets of ℛ(1, m). Recall from Ch. 2 that the
Hadamard transform of a real vector F is given by
F̂(u) = Σ_{v∈V^m} (−1)^{u·v} F(v),   u ∈ V^m,
     = Σ_{v∈V^m} (−1)^{u·v + f(v)}.   (8)
F = (1/2^m) F̂ H_{2^m},   (10)
or
F(v) = (1/2^m) Σ_{u∈V^m} (−1)^{u·v} F̂(u).   (11)
Observe from (8) that F̂(u) is equal to the number of 0's minus the number of
1's in the binary vector
f + Σ_{i=1}^{m} u_i v_i.
Thus
wt(f + Σ_{i=1}^{m} u_i v_i) = ½{2^m − F̂(u)},   (12)
or
dist{f, Σ_{i=1}^{m} u_i v_i} = ½{2^m − F̂(u)}.   (13)
Ch. 14. §3. Cosets of the first-order Reed-Muller code 415
Also
dist{f, 1 + Σ_{i=1}^{m} u_i v_i} = ½{2^m + F̂(u)}.   (14)
Now the weight distribution of that coset (of a code 𝒞) which contains f gives
the distances of f from the codewords of 𝒞. Therefore we have proved:
Theorem 1. The weight distribution of that coset of ℛ(1, m) which contains f
consists of the numbers ½{2^m − F̂(u)} and ½{2^m + F̂(u)}, taken over all u ∈ V^m.
For example, if m = 2 and f = v_1 v_2, then F̂(u) = ±2 for every u, so,
according to Theorem 1,
A_1(f) = 4,   A_3(f) = 4.
Lemma 2.
Σ_{u∈V^m} F̂(u) F̂(u + v) = { 2^{2m} if v = 0,
                            0      if v ≠ 0.
Proof.
LHS = Σ_{u∈V^m} [ Σ_{w∈V^m} (−1)^{u·w} F(w) ] [ Σ_{x∈V^m} (−1)^{(u+v)·x} F(x) ]
    = Σ_{w,x∈V^m} (−1)^{v·x} F(w) F(x) Σ_{u∈V^m} (−1)^{u·(w+x)}.
The inner sum is 2^m if w = x and 0 otherwise, so
LHS = 2^m Σ_{w∈V^m} (−1)^{v·w} F(w)^2
    = 2^m Σ_{w∈V^m} (−1)^{v·w},   since F(w) = ±1,
which is 2^{2m} if v = 0 and 0 otherwise.   Q.E.D.
Corollary 3.
Σ_{u∈V^m} F̂(u)^2 = 2^{2m}.   (15)
Note that Corollary 3 and Equation (12) imply that the weight distribution
A_i(f), 0 ≤ i ≤ 2^m, of the coset f + ℛ(1, m) satisfies
Σ_{i=0}^{2^m} (2^m − 2i)^2 A_i(f) = 2^{2m}.
Boolean functions and cosets of ℛ(1, m). The codewords of ℛ(1, m) are the
linear functions (7) of v_1, ..., v_m. Let us say that two Boolean functions
f(v_1, ..., v_m) and g(v_1, ..., v_m) are equivalent if
g(v) = f(Av + B)
for some invertible m × m binary matrix A and some binary m-tuple B. We say
that g is obtained from f by an affine transformation. Then the cosets of
ℛ(1, m) containing f and g have the same weight distribution.
Proof. From Theorem 1 it is enough to show that the sets {± Ĝ(u): u ∈ V^m}
and {± F̂(u): u ∈ V^m} are equal. In fact,
Ĝ(u) = Σ_{v∈V^m} (−1)^{u·v} F(Av + B).
Set v = A^{−1}w + A^{−1}B; then
Ĝ(u) = ± Σ_{w∈V^m} (−1)^{u'·w} F(w),   where u' = uA^{−1},
     = ± F̂(u').   Q.E.D.
Therefore, in order to lump together cosets with the same weight dis-
tribution, we can introduce a stronger definition of equivalence. Namely, we
define f and g to be equivalent if
g(v) = f(Av + B) + Σ_{i=1}^{m} a_i v_i + a_0   (17)
for some binary invertible matrix A, vector B and constants a_i. Now all
Boolean functions in the same equivalence class belong to cosets with the
same weight distribution.
However, the cosets containing f and g may have the same weight
distribution even if f and g are not related as in Equation (17). The first time
this happens is when n = 32.
(x_0 x_1 ... x_7) = (u_0 u_1 u_2 u_3) [ 1 1 1 1 1 1 1 1 ]
                                      [ 0 0 0 0 1 1 1 1 ]   (18)
                                      [ 0 0 1 1 0 0 1 1 ]
                                      [ 0 1 0 1 0 1 0 1 ]
This is accomplished by the circuit shown in Fig. 14.8. The clock circuit in
Fig. 14.8 goes through the successive states t_1 t_2 t_3 = 000, 001, 010, 011, 100, ...,
111, 000, 001, ... (i.e., counts from 0 to 7). The circuit forms
u_0 + u_1 t_1 + u_2 t_2 + u_3 t_3   (mod 2),
which, from Equation (18), is the codeword x_0 x_1 ··· x_7. Nothing could be
simpler.
Fig. 14.7. Part of the Grand Canyon on Mars. This photograph was transmitted by
the Mariner 9 spacecraft on 19 January 1972 using the first-order Reed-Muller code
ℛ(1, 5). Photograph courtesy of NASA/JPL.
[Fig. 14.8. Encoding circuit for ℛ(1, 3): the message u_0 u_1 u_2 u_3 and a clock
drive the encoder, which outputs the codeword x_7 ··· x_1 x_0.]
Σ_{i=1}^{m} u_i v_i,   (19)
1 + Σ_{i=1}^{m} u_i v_i.   (20)
Therefore
M^{(1)}_{2^{m+1}} ··· M^{(m+1)}_{2^{m+1}} = (I_2 ⊗ M^{(1)}_{2^m}) ··· (I_2 ⊗ M^{(m)}_{2^m})(H_2 ⊗ I_{2^m})
= H_2 ⊗ (M^{(1)}_{2^m} ··· M^{(m)}_{2^m})   by (21),
= H_2 ⊗ H_{2^m}   by the induction hypothesis,
= H_{2^{m+1}}.   Q.E.D.
Ch. 14. §4. Encoding and decoding ℛ(1, m) 423
Example. For m = 2,
M_4^{(1)} M_4^{(2)} = (I_2 ⊗ H_2)(H_2 ⊗ I_2)
= [ 1  1  0  0 ] [ 1  0  1  0 ]   [ 1  1  1  1 ]
  [ 1 −1  0  0 ] [ 0  1  0  1 ] = [ 1 −1  1 −1 ] = H_4.
  [ 0  0  1  1 ] [ 1  0 −1  0 ]   [ 1  1 −1 −1 ]
  [ 0  0  1 −1 ] [ 0  1  0 −1 ]   [ 1 −1 −1  1 ]
For m = 3,
M_8^{(1)} = I_4 ⊗ H_2,   M_8^{(2)} = I_2 ⊗ H_2 ⊗ I_2,   M_8^{(3)} = H_2 ⊗ I_4.
Decoding circuit: the Green machine. We now give a decoding circuit for
ℛ(1, m) which is based on Theorem 5. This circuit is called the Green
machine after its discoverer R. R. Green. We illustrate the method by
describing the decoder for ℛ(1, 3).
Suppose f = f_0 f_1 ··· f_7 is the received vector, and let
F = F_0 F_1 ··· F_7
  = ((−1)^{f_0}, (−1)^{f_1}, ..., (−1)^{f_7}).
We wish to find
F̂ = F H_8
  = (F_0 F_1 ··· F_7) M_8^{(1)} M_8^{(2)} M_8^{(3)},   from (22).
Now
M_8^{(1)} = I_4 ⊗ H_2 =
[ 1  1  .  .  .  .  .  . ]
[ 1 −1  .  .  .  .  .  . ]
[ .  .  1  1  .  .  .  . ]
[ .  .  1 −1  .  .  .  . ]
[ .  .  .  .  1  1  .  . ]
[ .  .  .  .  1 −1  .  . ]
[ .  .  .  .  .  .  1  1 ]
[ .  .  .  .  .  .  1 −1 ]
(dots denote zeros).
So
F M_8^{(1)} = (F_0 + F_1, F_0 − F_1, F_2 + F_3, F_2 − F_3, F_4 + F_5, F_4 − F_5, F_6 + F_7, F_6 − F_7).
The circuit shown in Fig. 14.9 calculates the components of F M_8^{(1)} two at a
time. The switches are arranged so that after F_0 ··· F_3 have been read in, the
two-stage registers on the right contain (F_0 − F_1, F_0 + F_1) and (F_2 − F_3, F_2 + F_3)
respectively. These four quantities are used in the second stage (see Fig.
14.10). Then F_4 ··· F_7 are read in, and (F_4 − F_5, F_4 + F_5) and (F_6 − F_7, F_6 + F_7)
are formed in the same pair of two-stage registers.
The second stage calculates
F M_8^{(1)} M_8^{(2)} = (F_0 + F_1 + F_2 + F_3, F_0 − F_1 + F_2 − F_3,
F_0 + F_1 − F_2 − F_3, F_0 − F_1 − F_2 + F_3, F_4 + F_5 + F_6 + F_7,
F_4 − F_5 + F_6 − F_7, F_4 + F_5 − F_6 − F_7, F_4 − F_5 − F_6 + F_7).   (23)
[Fig. 14.9. First stage of the Green machine; the input is F_7 F_6 ··· F_1 F_0.]
where
M_8^{(2)} = I_2 ⊗ H_2 ⊗ I_2,   M_8^{(3)} = H_2 ⊗ I_4.
The third stage forms F M_8^{(1)} M_8^{(2)} M_8^{(3)} = (F̂_0, ..., F̂_7),
which are the desired Hadamard transform components. These are formed by the
circuit shown in Fig. 14.11.
Figures 14.9-14.11 together comprise the Green machine. The final stage is
to find that i for which |F̂_i| is largest. Then f is decoded either as the i-th
codeword of ℛ(1, 3) if F̂_i ≥ 0, or as the complement of the i-th codeword if
F̂_i < 0.
Note that the Green machine has the useful property that the circuit for
decoding ℛ(1, m + 1) is obtained from that for ℛ(1, m) by adding an extra
register to the m-th stage and then adding one more stage.
Problem. (11) Show that M_8^{(i)} M_8^{(j)} = M_8^{(j)} M_8^{(i)} for all i, j. (This implies that the order of the
stages in the decoder may be changed without altering the final output.)
Examples. (1) f(v_1, v_2) = v_1 v_2 is a bent function, since the F̂(u) are all ±2 (see
the example preceding Lemma 2).
(2) f(v_1, v_2, v_3, v_4) = v_1 v_2 + v_3 v_4 is bent, as shown by the last row of Fig. 14.6.
Since F̂(u) is an integer (from Equation (8)), if f(v_1, ..., v_m) is bent then m
must be even. From now on we assume m is even and ≥ 2.
Theorem 6. A bent function f(v_1, ..., v_m) is further away from any linear
function
than any other Boolean function. More precisely, f(v_1, ..., v_m) is bent iff the
corresponding vector f has distance 2^{m−1} ± 2^{m/2−1} from every codeword of
ℛ(1, m). If f is not bent, f has distance less than 2^{m−1} − 2^{m/2−1} from some
codeword.
Proof. If f is not bent, then the F̂(u) are not all ±2^{m/2}. From Corollary 3, since
there are 2^m summands in Equation (15), some |F̂(u)| must be bigger than 2^{m/2}.
Therefore from Equation (13) or (14), the distance between f and some
codeword of ℛ(1, m) is less than 2^{m−1} − 2^{m/2−1}.   Q.E.D.
Theorem 7. f(v_1, ..., v_m) is bent iff the 2^m × 2^m matrix H whose (u, v)-th entry is
(1/2^{m/2}) F̂(u + v) is a Hadamard matrix.
Suppose f is bent, so that 2^{−m/2} F̂(u) = ±1 for all u. Write
2^{−m/2} F̂(u) = (−1)^{f̃(u)},   (25)
which defines a Boolean function f̃(u_1, ..., u_m). The Hadamard transform
coefficients of F̃ (obtained by setting f = f̃ in Equation (8)) are 2^{m/2}(−1)^{f(u)} =
±2^{m/2}. Therefore f̃ is also a bent function!
Thus there is a natural pairing f ↔ f̃ of bent functions.
Problem. (12) Show that f is bent iff the matrix whose (u, v)-th entry is
(−1)^{f(u+v)}, for u, v ∈ V^m, is a Hadamard matrix.
Theorem 8. If f(v_1, ..., v_m) is bent and m > 2, then deg f ≤ ½m.
Proof. Suppose f is bent and m > 2. The proof uses the expansion of f given
by Theorem 1 of Ch. 13, and requires a lemma.
Let F(u) = (−1)^{f(u)}, let F̂(u) be the Hadamard transform of F(u) given by
Equation (8), and let f̃(u) be as in Equation (25),
where a is some vector of V^m and ā is its complement. Then |{b : b ⊆ a}| = 2^{wt(a)},
and (26) becomes
Σ_{b⊆a} f(b) = 2^{wt(a)−1} − 2^{½m−1} + 2^{wt(a)−½m} Σ_{b⊆ā} f̃(b).   (27)
where
g(a) = Σ_{b⊆a} f(b)   (mod 2).
Thus g(a) is given by Equation (27). But if wt(a) > ½m and m > 2, the RHS
of (27) is even, and g(a) is zero. Therefore f has degree at most ½m.   Q.E.D.
Problem. (13) Show that f is bent iff for all v ≠ 0, v ∈ V^m, the directional
derivative of f in the direction v, defined by
f_v(x) = f(x + v) + f(x),   x ∈ V^m,
takes the values 0 and 1 equally often.
Theorem 10.
h(u_1, ..., u_m, v_1, ..., v_n) = f(u_1, ..., u_m) + g(v_1, ..., v_n)
is a bent function (of m + n arguments) iff f is a bent function (of u_1, ..., u_m)
and g is a bent function (of v_1, ..., v_n).
Proof. We shall write w ∈ V^{m+n} as w = (u, v) where u ∈ V^m and v ∈ V^n. From
Equation (8),
Ĥ(w) = Σ_{t∈V^{m+n}} (−1)^{w·t + h(t)},   where t = (r, s),
     = Σ_{r∈V^m} (−1)^{u·r + f(r)} Σ_{s∈V^n} (−1)^{v·s + g(s)}
     = F̂(u) Ĝ(v).   (28)
If f and g are bent, then from Equation (28) Ĥ(w) = ±2^{(m+n)/2} and so h is bent.
Conversely, suppose h is bent but f is not, so that |F̂(A)| > 2^{m/2} for some A ∈ V^m.
Then with w = (A, v),
±2^{(m+n)/2} = Ĥ(w) = F̂(A) Ĝ(v)
for all v ∈ V^n. Therefore |Ĝ(v)| < 2^{n/2} for all v ∈ V^n, which is impossible by
Corollary 3.   Q.E.D.
Ch. 14. §5. Bent functions 429
Corollary 11.
v_1 v_2 + v_3 v_4 + ··· + v_{m−1} v_m   (29)
is bent, for any even m ≥ 2.
It is easy to see that if f(v) is bent then so is any function f(Av +B)
obtained from f by an affine transformation.
Problems. (14) Show that if m is even then v_1 v_2 + v_2 v_3 + ··· + v_{m−1} v_m is bent.
(15) Use Dickson's theorem (Theorem 4 of the next chapter) to show that
any quadratic bent function is an affine transformation of (29).
(16) (a) The bent function v_1 v_2 + v_3 v_4 can be represented by the graph with
vertices 1, 2, 3, 4 and an edge {i, j} for each term v_i v_j. Show that the 28
quadratic bent functions of v_1, v_2, v_3, v_4 fall into four graph types [graphs
omitted]: (3 types), (12 types), (12 types), and (1 type).
(b) Let f = v_1 v_2 + v_3 v_4. Show that the vectors of weight 6 in f + ℛ(1, 4) form
a 2-(16, 6, 2) design. (This design is called a biplane; see for example
Cameron [231].)
(17) Show that v_1 v_2 and v_1 v_2 + v_3 v_4 are essentially the only bent functions of 2
and 4 arguments.
Theorem 12. For any function g(v_1, ..., v_m), the function
f(u_1, ..., u_m, v_1, ..., v_m) = u_1 φ_1(v) + ··· + u_m φ_m(v) + g(v_1, ..., v_m)
is bent, where g(v) is arbitrary and φ = (φ_1, ..., φ_m) is a 1-1 mapping
of V^m onto itself.
Proof. From Equations (8) and (13), f(u_1, ..., u_m, v_1, ..., v_m) is bent iff the
number of vectors (u, v) = (u_1, ..., u_m, v_1, ..., v_m) which are zeros of
(i) For any v ≠ A, the first sum is not identically zero, and h is a linear
function of u_1, ..., u_m. Thus there are 2^{m−1} choices of u_1, ..., u_m for which h
is zero. The total number of zeros of h of this type is 2^{m−1}(2^m − 1).
(ii) Now suppose v = A.
Finite geometries
§ 1. Introduction
Finite geometries are large combinatorial objects just as codes are, and
therefore it is not surprising that they have turned up in many chapters of this
book (see especially Ch. 13). In this Appendix we sketch the basic theory of
these geometries, beginning in §2 with the definitions of projective and affine
geometries. The most important examples (for us) are the projective geometry
PG(m, q) and the affine or Euclidean geometry EG(m, q) of dimension m
constructed from a finite field GF(q). For dimension m ≥ 3 there are no other
geometries (Theorem 1).
In §3 we study some of the properties of PG(m, q) and EG(m, q),
especially their collineation groups (Theorem 7 and Corollary 9) and the
number of subspaces of each dimension (Theorems 4-6).
In dimension 2 things are more complicated. A projective plane is
equivalent to a Steiner system S(2, n + 1, n^2 + n + 1), for some n ≥ 2, and an
affine plane to an S(2, n, n^2) for some n ≥ 2 (Theorem 10). But now other
kinds of planes exist besides PG(2, q) and EG(2, q); see §4.
(iv) For any point p there are at least two lines not containing p, and for
any line L there are at least two points not on L.
A subspace of the projective geometry is a subset S of Ω such that
(v) If p, q are distinct points of S then S contains all points of the line (pq).
Examples of subspaces are the points and lines of Ω and Ω itself. A
hyperplane H is a maximal proper subspace, so that Ω is the only subspace
which properly contains H.
The projective geometry PG(m, q). The most important examples of pro-
jective and affine geometries are those obtained from finite fields.
Let GF(q) be a finite field (see Chs. 3, 4) and suppose m ≥ 2. The points of
Ω are taken to be the nonzero (m + 1)-tuples (a_0, a_1, ..., a_m) with a_i ∈ GF(q),
where (a_0, ..., a_m) and (λa_0, ..., λa_m)
are the same point, for any nonzero element λ of GF(q). These are
called homogeneous coordinates for the points. There are q^{m+1} − 1 nonzero
(m + 1)-tuples, and each point appears q − 1 times, so the number of points in
Ω is (q^{m+1} − 1)/(q − 1).
The line through two distinct points (a_0, ..., a_m) and (b_0, ..., b_m) consists
of the points
(λa_0 + μb_0, ..., λa_m + μb_m),   (1)
where λ, μ ∈ GF(q) are not both zero. A line contains q + 1 points, since there
are q^2 − 1 choices for λ, μ and each point appears q − 1 times in (1).
Axioms (i), (ii) are clearly satisfied.
Fig. 3. The 13 points and nine of the 13 lines of the projective plane PG(2, 3).
Problems. (3) Show that the points of PG(m, q) can be uniquely labeled by
making the left-most nonzero coordinate equal to 1 (as in Example 2).
(4) Show that PG(m, q) is constructed from a vector space V of dimension
m + 1 over GF(q) by taking the 1-dimensional subspaces of V to be the points
of PG(m, q) and the 2-dimensional subspaces to be the lines.
(5) Find the four missing lines in Fig. 3.
(6) Construct PG(3, 2).
The affine or Euclidean geometry EG(m, q). This is obtained from
PG(m, q) by deleting the points of a hyperplane H (it doesn't matter which one,
by Corollary 8). For example, deleting the line [100] from Fig. 2 gives the EG(2, 2)
shown in the figure.
[Figure: the four points of EG(2, 2).]
In general if we choose H to be the hyperplane [1 0 ··· 0], consisting of all
points with a_0 = 0, we are left with the points whose coordinates can be taken
to be (1, a_1, ..., a_m). In this way the q^m points of EG(m, q) can be labeled by
the m-tuples (a_1, ..., a_m), a_i ∈ GF(q).
Again we make the convention that EG(−1, q) is empty, EG(0, q) is the
point 0, and EG(1, q) is a line.
Problem. (7) Show that the dimension (as defined above) of EG(m, q) is equal
to m. Show that EG(m, q) is also a vector space of dimension m over GF(q).
Remark. The nonzero elements of GF(q^{m+1}) represent the points of PG(m, q),
but there are q − 1 elements sitting on each point. For example, take GF(4) =
{0, 1, ω, ω^2}. The elements 100, ω00, ω^2 00 of GF(4^3) all represent the point
(100) of PG(2, 4). The line through (100) and (010) contains the five points
Subspaces of PG(m, q)
Problem. (9) Prove directly the special case of Theorem 2 which says that the
number of lines through a point is equal to the number of points on a
hyperplane.
where
[m over r]
is a Gaussian binomial coefficient, defined in Problem 3 of Ch. 15.
Problem. (10) Use Theorem 3 to show that the number of PG(r, q) contained
in PG(m, q) is equal to the number of PG(m − r − 1, q) contained in PG(m, q).
Appendix B. §3. Finite geometries 699
Problem. (11) Show that if a flat contains the origin then it is a linear subspace
of EG(m, q) regarded as a vector space; and that a flat not containing the
origin is a coset of a linear subspace.
Thus a flat of dimension r in EG(m, q) is a coset of an EG(r, q), and will be
referred to as an EG(r, q) or an r-flat. A subspace PG(r, q) of PG(m, q) is
also called an r-flat.
[(m + 1) over (r + 1)] − [m over (r + 1)] = q^{m−r} [m over r],
by Problems 3(b), 3(e) of Ch. 15.   Q.E.D.
[(t − r) over (s − r)].
Proof. Follows from Theorem 4. Q.E.D.
Problem. (12) Show that EG(m, q) can be decomposed into q mutually parallel
hyperplanes.
(s/(q − 1)) ∏_{i=0}^{m} (q^{m+1} − q^i)   (4)
For the proof see for example Artin [29, p. 88], Baer [56, Ch. 3] or
Carmichael [250, p. 360].
Since PΓL_{m+1}(q) is doubly transitive we have:
s ∏_{i=0}^{m−1} (q^m − q^i).   (5)
Problem. (13) Given EG(m, q), show that there is essentially only one way to
add a hyperplane and obtain PG(m, q).
For the proof see Problem 13 of Ch. 20, and the references on p. 144 of
Dembowski [370].
We know from Problem 11 of Ch. 19 that there is no projective plane of
order 6. This is a special case of
Theorem 12. (Bruck and Ryser.) If n ≡ 1 or 2 (mod 4) and if n is not the sum
of two squares then there is no projective plane of order n.
For the proof see Hall [587, p. 175] or Hughes and Piper [674, p. 87].
Thus planes are known of orders 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, ...; orders
6, 14, 21, ... do not exist by Theorem 12; and orders 10, 12, 15, 18, 20, ... are
undecided. For the connection between codes and orders n ≡ 2 (mod 4) see
Problem 11 of Ch. 19.
Notes on Appendix B
Projective geometries are discussed by Artin [29], Baer [56], Biggs [143],
Birkhoff [152, Ch. 8], Carmichael [250], Dembowski [370], Hall [583, Ch. 12],
MacNeish [869], Segre [1173], and Veblen and Young [1368]. References on
projective planes are Albert and Sandler [20], Hall [582, Ch. 20 and 587, Ch.
12], Segre [1173] and especially Dembowski [370] and Hughes and Piper [674].
For the numbers of subspaces see for example Carmichael [250] or Goldman
and Rota [519]. See also the series of papers by Dai, Feng, Wan and Yang
[103, 104, 1441, 1457-1460, 1474].