
Trellis Structure of Codes

Alexander Vardy
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
1308 W. Main Street, Urbana, IL 61801
vardy@shannon.csl.uiuc.edu

January 21, 1998

Alexander Vardy
Coordinated Science Laboratory
Department of Electrical Engineering
Department of Mathematics
Department of Computer Science
University of Illinois at Urbana-Champaign

This work was supported in part by the David and Lucile Packard Foundation Fellowship, and by the U.S. National Science Foundation under grants NCR-9415860 and NCR-9501345.

© January 1998 by Alexander Vardy. This document is intended to appear as a chapter in the Handbook of Coding Theory, edited by Vera S. Pless, W. Cary Huffman, and Richard A. Brualdi, to be published by Elsevier Science Publishers, Amsterdam.
Contents
1. Introduction and historical overview
  1.1. Historical overview
  1.2. Overview of the chapter
2. The trellis and its complexity measures
  2.1. One-to-one, proper, and biproper trellises
  2.2. Trellis representation of block codes
  2.3. Complexity measures for trellises
3. The Viterbi algorithm
  3.1. Computing flows on a trellis
  3.2. Maximum-likelihood decoding
  3.3. Complexity of the Viterbi algorithm
4. The minimal trellis: properties and constructions
  4.1. Existence and uniqueness
  4.2. Constructions of the minimal trellis
  4.3. Properties of the minimal trellis
5. The permutation problem
  5.1. Complexity of the permutation problem
  5.2. Upper bounds on trellis complexity
  5.3. Lower bounds on trellis complexity
  5.4. Table of bounds for short codes
  5.5. Asymptotic bounds on trellis complexity
6. The sectionalization problem
  6.1. Functions and operations on trellises
  6.2. The sectionalization algorithm
  6.3. Dynamics of optimal sectionalizations
7. Guide to the literature
References
Index
1. Introduction and historical overview
A trellis T is an edge-labeled directed graph with the property that every vertex in T has a well-defined depth. Namely, there is an independent subset V_0 of the vertex set V; vertices in this subset are called root vertices, and their depth is zero by convention. For every other vertex v ∈ V, there is at least one directed path to v from some root vertex, and if this path contains i edges then all the directed paths from a root vertex to v contain i edges. The integer i is thus the depth of v. For example, four of the five edge-labeled directed graphs depicted in Figure 1 are trellises (which ones?).
Here is some notation and terminology pertinent to trellises. Because we are interested only in directed paths, we shall hereafter use the word "path" to mean a directed path. We let V_i denote the collection of vertices of depth i in T, and refer to V_i as the set of vertices at time i. The vertex set of T is thus naturally partitioned into disjoint subsets V_0, V_1, V_2, ..., so that every edge in T that begins at a vertex in V_i ends at a vertex in V_{i+1}. The ordered index set I = {0, 1, 2, ...} induced by this partition is called the time axis for T. As we shall see later in this chapter, this temporal notation is both natural and convenient in the study of trellises.
We defer a more detailed definition of a trellis to the next section. However, it is already apparent from the foregoing paragraphs that the trellis is essentially a graph-theoretic object. It is therefore rather remarkable that the trellis plays such a prominent role in coding theory. In coding theory, trellises establish close ties between block codes and convolutional codes, serve as a blueprint for a number of efficient decoding algorithms, and give rise to a variety of challenging and important problems. Some of these problems have elegant solutions, while many others remain largely unsolved. An overview of these fascinating connections between codes and trellises is the subject of this chapter. We start with some history.

1.1. Historical overview


The trellis originated in the study of finite-state automata. It was first introduced by Forney [33] in 1967 as a conceptual means to better explain the algorithm [35] for decoding convolutional codes. To this day, trellis decoding algorithms, such as Viterbi decoding [82, 126], the BCJR algorithm [2], sequential decoding [31, 36, 101, 106], and others, remain the primary motivation for research on trellis structure of codes. However, while a large part of the research today (as well as this chapter) is concerned with the trellis structure of block codes, during the first two decades since its invention in [33] the trellis was used and studied almost exclusively in the context of convolutional codes.
For an in-depth treatment of the algebraic theory of convolutional codes, we refer the reader to [83, 34, 22]. Here, we will only note that an encoder E for a convolutional code can be regarded as a finite-state machine. As such, E has a state-transition diagram, which is an
[Footnote: When drawing a trellis, it is customary to arrange the vertices of each subset V_0, V_1, ... along a vertical line, with V_{i+1} to the right of V_i. All the edges are then oriented from left to right. For example, Figures 1a and 1b depict one and the same trellis, with the trellis structure being much more apparent in Figure 1b.]
edge-labeled directed graph, although not necessarily a trellis. In order to visualize certain properties of a convolutional code, it is often convenient to "expand" the state diagram of
[Figure 1 shows five edge-labeled directed graphs, panels a through e.]
Figure 1. Some graphs that are trellises and some that are not
the encoder in time, so that the set of states is replicated for each time unit. This process always produces a trellis. For example, Figure 1c may be regarded as the state diagram of a rate 1/2 convolutional code, while Figure 1d depicts the corresponding trellis.
Following the publication of [33, 35], trellises quickly became ubiquitous in the theory of convolutional codes. Concurrently, trellis decoding algorithms for convolutional codes became the decoding method of choice in coding practice. For example, the Linkabit Corporation designed and built, in the early 1970s, a convolutional encoder and Viterbi decoder for a wide variety of applications [89]. Indeed, it is a curious fact that trellises were actually used to transmit images from Mars (during the 1977 Voyager mission, using the NASA Planetary Standard [76, p. 534] convolutional code) before the first truly rigorous definition of a trellis was given by Massey [80] in 1978. Massey [80] writes in his 1978 paper:

It is becoming apparent that "trellises" are much more fundamental than anyone had originally expected -- even if no one has as yet said precisely what a "trellis" is. [...] We now give a precise definition of what past researchers seem to have meant by a "trellis."

In fact, trellises were (and still are) so prevalent in the study of convolutional codes that when Ungerboeck [108] introduced a pioneering coded-modulation scheme based on signal-set partitioning, the class of codes proposed in [108] became known as trellis codes. This terminology was adopted merely because Ungerboeck used convolutional codes to address the partition of a signal constellation. Thus, despite what their name seems to imply, trellis codes are only superficially related to trellises; we refer the reader to [108, 37, 38] for a comprehensive overview of this important subject.
Finally we note that, notwithstanding the remark of Massey [80] quoted above, trellises for convolutional codes are still often studied in the general framework of Figures 1c and 1d. At least in part, this is due to the fact that trellises for most convolutional codes are rather uneventful -- they are time-invariant, meaning that the number of vertices at time i, as well as all other relevant properties of a trellis, remain the same for all i. The kind of trellises that we study in this chapter are almost never time-invariant.

In 1974, Bahl, Cocke, Jelinek, and Raviv [2], following up on an unpublished remark of Forney, found that linear block codes can also be represented by a trellis, and showed how to construct such a trellis. They thus uncovered an important connection between block and convolutional codes. The trellis construction of [2] is indeed an important one; it will be described in detail in Section 4. First, however, we need to define the trellis representation of a block code. Briefly, a trellis T representing a block code has finitely many vertices, so the vertex set V can always be partitioned as V_0 ∪ V_1 ∪ ... ∪ V_n for some integer n. Now consider the ordered sequences of edge labels along each path of length n in T. Evidently, each such sequence defines an ordered n-tuple over the label alphabet A. We say that T represents a block code C of length n over A (or simply that T is a trellis for C) if the set of all such n-tuples is precisely the set of codewords of C. For example, the reader can easily verify that the trellis depicted in Figure 1e represents the (8, 4, 4) extended binary Hamming code in this way.
The discovery of Bahl, Cocke, Jelinek, and Raviv [2] created immanent potential for applications of the algebraic and combinatorial theory of block codes in the study of their trellis representations. However, the subject remained dormant for a long while. The list of papers
on trellis structure of block codes published during the fifteen years between 1974 and 1988 is very short. We will review these papers in just a few paragraphs below.
In 1978, Wolf [126] elaborated upon the BCJR construction of a trellis in [2], and argued that such a trellis might be employed for maximum-likelihood decoding of block codes with the Viterbi algorithm. He also observed that the BCJR trellis for a linear (n, k, d) code over F_q satisfies |V_i| ≤ q^{n-k} for all i, a result now known as the Wolf bound. In the same year, Massey [80] finally gave the first rigorous definition of the trellis as a graph-theoretic object, together with a new construction of trellises for block codes. Three years later, Dumer [26] presented existence results that led to asymptotic upper bounds on the trellis complexity of binary codes meeting the Gilbert-Varshamov bound. These asymptotic upper bounds are still the best known today -- see [74, 71, 128] and Section 5. However, with both [80] and [26] being rather obscure references, the subject became largely forgotten.
The study of trellis structure of block codes was re-awakened in 1988 by the papers of Forney [38] and Muder [87]. In the appendix to his 1988 paper, Forney [38] sketched yet another construction of a trellis for both linear block codes and lattices, and claimed that this construction is minimal in a certain sense. This claim motivated the work of Muder [87], who re-derived much of the graph-theoretic formalism introduced by Massey [80] and used it to prove that every linear block code has an essentially unique minimal trellis. Muder [87] was able to show that the construction of Forney [38] does indeed produce such a trellis. For an extensive treatment of minimal trellises, their properties and constructions, see Section 4.
Muder [87] also considered how permuting the coordinates of each codeword in a linear block code C can change the structure of the minimal trellis for C. It is worth quoting a paragraph from [87], which reads as follows:
In discussing the trellises of specific codes, we run into a problem of terminology. We usually refer, for example, to the (12, 6, 6) ternary Golay code, when in fact we mean a class of ternary linear codes with these parameters. For instance, an orthogonal transformation of "the" Golay code is also "the" Golay code, even though it may be a different set of codewords. The difficulty this presents is that the two codes may have different trellises...
A similar remark was made ten years earlier by Massey [80], who used the term art of trellis decoding to describe the problem of minimizing the trellis complexity via permutations, or more generally, via operations on the time axis for the code. For an overview of the current state of knowledge on the art of trellis decoding, see Sections 5 and 6.
Notably, the work of Muder in [87] goes significantly beyond [80]. In particular, Muder [87] determined the trellis complexity of binary and ternary Golay codes, and gave bounds on the size of the trellis for general block codes. He observed that these bounds are exact for maximum-distance separable (MDS) codes, and nearly exact for perfect codes. The papers of Forney [38] and Muder [87] thus produced a range of significant results on the trellis structure of block codes, and set the stage for future work in this area.
The first few years since the publication of [38] and [87] witnessed a slow but steady stream of further important results. In particular, the connection between the trellis complexity
of a code and its generalized Hamming weight hierarchy (GHW) was uncovered in [61, 114] and further studied in [39]. Optimal permutations of the time axis for the binary Reed-Muller codes were found in [60], while certain `good' permutations for binary BCH codes were presented in [114]. The first asymptotic bounds on trellis complexity of binary codes were given in [71, 128]. Forney's construction [38] of minimal trellises for block codes was extended to general group codes in [42], and to lattices in [40]. Other contemporaneous results on trellises for block codes were reported in [5, 59, 60, 61, 65, 119].
During the years 1995-1997, over two decades since the work of Bahl, Cocke, Jelinek, and Raviv [2], the subject of trellis representation of block codes finally experienced an exponential growth of interest. For example, about a third of the papers collected in the special issue [32] on "Codes and Complexity" of the IEEE Transactions on Information Theory are devoted to trellis complexity of block codes and lattices. Although the progress obtained in these papers and other recent work on the subject is quite impressive, it is fair to say that we still know much less than we would like to. The general problem area thus remains wide-open for future research.

1.2. Overview of the chapter


We hope that this chapter will serve as an elementary, albeit brief, introduction to the fascinating problems that arise in trellis representation of block codes. At the same time, we hope that it will provide a useful reference for future work in this area.
To date, the subject of trellis structure of block codes has accumulated a sizable body of literature -- a comprehensive bibliography, consisting of some 100 references, is compiled in the last section of this chapter. A detailed survey of all this work would hardly fit in a multi-volume treatise, much less in a single chapter. Thus our overview here will be neither exhaustive nor completely self-contained. In particular, we will assume that the reader is already familiar with the basic theory of block codes, although we will not assume any prior knowledge about trellises. Furthermore, when reviewing the literature on trellis representation of block codes, we will have to be selective. To some degree, the selection reflects the personal preferences of the author.
Here is how the rest of this chapter is organized. In the next section, we revisit the definition of a trellis. We further define one-to-one, proper, and biproper trellises, and introduce the various measures of trellis complexity that will be used throughout this chapter. Section 3 is a primer on the Viterbi algorithm. We describe this algorithm in a general setting of operations in a certain semiring, and follow closely the exposition given by McEliece in [82].
Section 4 is devoted to minimal trellises, for a fixed order of the time axis. We start with the definition of minimality given by Muder [87], and prove that the minimal trellis for a linear code is unique up to graph isomorphism. We will then describe several constructions of the minimal trellis, due to Bahl, Cocke, Jelinek, and Raviv [2], Massey [80], Forney [38], and Kschischang and Sorokine [70]. We will further prove that a trellis is minimal if and only if it is biproper. As a corollary to our proof, we conclude in Section 4 that the minimal trellis simultaneously minimizes all the measures of trellis complexity introduced in Section 2.
Sections 5 and 6 constitute a brief overview of the art of trellis decoding. Section 5 deals with permutations of the time axis, while Section 6 is concerned with another operation on the time axis called sectionalization. The problem of finding the optimal permutation for a general linear code appears to be intractable [55, 57, 112]. Hence Section 5 deals primarily with upper and lower bounds on trellis complexity under all possible permutations. Upper bounds are usually obtained by exhibiting specific `good' permutations of the time axis, while lower bounds follow by exploiting certain properties of the code that are invariant under coordinate permutations, such as the distance spectrum and the generalized Hamming weight hierarchy. We concentrate on the lower bounds and discuss their asymptotic form for n → ∞, especially for codes meeting the Gilbert-Varshamov bound.
In contrast, the sectionalization problem turns out to be more tractable. In particular, in Section 6 we present a polynomial-time algorithm which produces the optimal sectionalization of a given trellis T in time O(n^2), where n is the length of the code represented by T. The algorithm is developed in a general setting of certain operations and functions defined on the set of trellises; it therefore applies to both linear and nonlinear codes, and easily accommodates a wide range of optimality criteria. The exposition in Section 6 follows [75].
This might be an appropriate place to mention two topics closely related to the subject matter of this chapter, but not covered herein. The first of these topics is trellis structure and complexity of lattices. By and large, it appears that trellis complexity problems for lattices are considerably more difficult than the corresponding problems for block codes: a trellis for a given n-dimensional lattice Λ is determined by an ordered basis v_1, v_2, ..., v_n for Λ, and the number of vertices, edges, and labels in the corresponding trellis depends not only on the order in which the basis vectors are listed (analogous to the permutation problem for block codes), but also on the choice of the basis itself. The first detailed study of minimal trellises for lattices was undertaken by Forney in [40]. Subsequently, upper and lower bounds on trellis complexity of lattices were developed in [3, 11, 40, 103, 104, 105]. In particular, the work of Tarokh and Blake [11, 103, 104] contains profound results on the fundamental trellis complexity functions T_1(·), T_2(·), and T_3(·), which describe the best possible tradeoff between the coding gain of a lattice and its trellis complexity. Another closely related topic is that of representing a code by a graph. In this context, one encounters various generalizations of a trellis, such as tail-biting trellises [17, 67, 100], transparent trellises and trellis manifolds, as well as Tanner graphs [102] that are in some well-defined sense diametrically opposite to trellises. All these representations are special cases of the general concept of a factor graph, and we refer the reader to [44, 123] for a detailed treatment of factor graphs and the associated iterative algorithms: the min-sum and the sum-product.
We will compensate to some extent for the omission of these two subjects in this chapter by providing a comprehensive bibliography on these and other related topics in Section 7.

2. The trellis and its complexity measures
In this section, we introduce the basic definitions that will be used throughout this chapter. We first elaborate upon some of the definitions made in the previous section. An edge-labeled directed graph consists of a set V of vertices, a set A called the alphabet, and a set E of ordered triples (v, v′, a), with v, v′ ∈ V and a ∈ A, called edges. We say that an edge (v, v′, a) ∈ E begins at v, ends at v′, and has label a. A directed walk of length ℓ from a vertex v_0 ∈ V to a vertex v_ℓ ∈ V is an ordered sequence of ℓ edges e_1, e_2, ..., e_ℓ ∈ E, such that e_1 begins at v_0, e_ℓ ends at v_ℓ, and each consecutive pair of edges e_i, e_{i+1} shares a common vertex v_i at which e_i ends and e_{i+1} begins. If the ℓ+1 vertices v_0, v_1, ..., v_ℓ are all distinct, then e_1, e_2, ..., e_ℓ is called a path of length ℓ. We say that the vertices v_0, v_1, ..., v_ℓ lie on this path.
A trellis T was defined in the previous section as an edge-labeled directed graph in which every vertex has a well-defined depth. In the context of trellis representation of block codes, we can assume that the maximum depth of a vertex in T is finite. We denote this maximum depth by n, and call it the depth of T. This leads to the following definition.
Definition 2.1. A trellis T = (V, E, A) of depth n is an edge-labeled directed graph with the following property: the vertex set V can be decomposed as a union of disjoint subsets

    V = V_0 ∪ V_1 ∪ ... ∪ V_n                                                  (1)

such that every edge in T that begins at a vertex in V_i ends at a vertex in V_{i+1}, and every vertex in T lies on at least one path from a vertex in V_0 to a vertex in V_n.
We will assume, unless stated otherwise, that the subsets V_0, V_n ⊆ V each consist of a single vertex, called the root and the toor, respectively. This assumption is crucial in the study of minimal trellises, but it should not be construed as part of the definition of a trellis. In fact, this assumption will be dropped in Section 6. Until then, we shall denote the root by α and the toor by ω, so that V_0 = {α} and V_n = {ω}.
The trellises defined above are called reduced in [68, 70, 115]. A non-reduced "trellis" might contain vertices that do not lie on a path from α to ω, and in fact may have no paths from α to ω at all. In the context of trellis representation of block codes, vertices that do not lie on a path from α to ω can be removed from the trellis without affecting the code being represented. Hence, we assume that a trellis is always reduced, and make this part of its definition.
The partition of the vertex set of a trellis defined in (1) induces the corresponding partition of the set of edges E into disjoint subsets

    E = E_1 ∪ E_2 ∪ ... ∪ E_n                                                  (2)

where E_i consists of all the edges in T that end in a vertex of V_i. This further induces a decomposition of the label alphabet A as

    A = A_1 ∪ A_2 ∪ ... ∪ A_n                                                  (3)

where A_i is the set of edge labels for edges in E_i. Notice that the subsets A_1, A_2, ..., A_n are not necessarily disjoint. In fact, A_1 = A_2 = ... = A_n = A for most trellises in this chapter.
Hereafter, we will refer to V_i, E_i, and A_i, respectively, as the set of vertices, edges, and labels at time i. Historically, this temporal terminology is due to the fact that trellises originated in the study of finite-state machines evolving in time. For the same reason, the vertices in T are often called states. The edges of T are sometimes called branches, since trellises may be regarded as compact representations of trees [87].

2.1. One-to-one, proper, and biproper trellises


We now introduce three special types of trellises, distinguished by certain properties of the edge labeling. The significance of the definitions below will become apparent in Section 4.
Definition 2.2. A trellis T is said to be one-to-one if all paths from a root vertex to a toor vertex in T are labeled distinctly.
One-to-one trellises are also called observable in [70], because observing a sequence of n edge labels in a one-to-one trellis uniquely determines the trellis path.
Figure 2. Four trellises representing the code {000, 011, 111, 100}: (a) an unobservable trellis; (b) an observable trellis that is not proper; (c) a proper trellis that is not biproper; (d) a biproper trellis.
Notice that the condition of Definition 2.2 may be equivalently phrased as follows: all paths of length n in T are labeled distinctly. The next definition includes the condition that all paths of length n are labeled distinctly as an extreme special case.
Definition 2.3a. A trellis T is said to be proper if there is a unique root vertex α, and all paths of length i starting at α are labeled distinctly, for all i = 1, 2, ..., n.
Thus a proper trellis is also one-to-one, but not necessarily vice versa. An example of a one-to-one trellis that is not proper is depicted in Figure 2. The condition used in Definition 2.3a
implies a global property of a trellis. However, a simple inductive argument shows that this condition is, in fact, equivalent to a local property. In other words, one can equivalently define proper trellises as follows.
Definition 2.3b. A trellis T is said to be proper if there is a unique root vertex α, and the edges beginning at any given vertex of T are labeled distinctly.
We say that T is co-proper if the condition of Definition 2.3b holds with the direction of all edges reversed. Namely, a trellis T is co-proper if there is a unique toor vertex ω, and the edges ending at any vertex of T are labeled distinctly.
Definition 2.4. A trellis T is said to be biproper if it is both proper and co-proper.
For an example of a proper trellis that is not also co-proper, see again Figure 2. Thus we have the following proper inclusion chain between the three types of trellises that we have just defined:

    {biproper trellises} ⊂ {proper trellises} ⊂ {one-to-one trellises}
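Definitions 2.3b and 2.4 are local conditions, so they can be tested edge by edge. A sketch in Python (hypothetical code, assuming the edge-list representation used earlier; the trellis below is a hand-built biproper trellis for the code {000, 011, 111, 100} of Figure 2):

from collections import defaultdict

edges = [("root", "m", "0"), ("root", "m", "1"),
         ("m", "a", "0"), ("m", "b", "1"),
         ("a", "toor", "0"), ("b", "toor", "1")]

def is_proper(edges):
    # Definition 2.3b: the edges beginning at any given vertex
    # are labeled distinctly (a unique root is assumed here)
    out = defaultdict(list)
    for v, w, a in edges:
        out[v].append(a)
    return all(len(lbls) == len(set(lbls)) for lbls in out.values())

def is_coproper(edges):
    # the same condition with the direction of all edges reversed
    return is_proper([(w, v, a) for v, w, a in edges])

assert is_proper(edges) and is_coproper(edges)    # hence biproper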

2.2. Trellis representation of block codes


Notice that the foregoing definitions are concerned with the properties of the trellis as an edge-labeled directed graph. As such, they do not necessarily have anything to do with codes. The following definition, already mentioned in the previous section, establishes the relation between codes and trellises.
Definition 2.5. Let T = (V, E, A) be a trellis of depth n. Then the sequence of edge labels along each path of length n in T defines an ordered n-tuple over the label alphabet A. We say that T represents a block code C of length n over A if the set of all such n-tuples is precisely the set of codewords of C.
We sometimes write C(T) to denote the code represented by T. If T is one-to-one then the total number of paths from the root to the toor in T is equal to the number of codewords of C(T). In general, there could be more paths in T than there are codewords in C(T).
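This path count is easy to check by brute force on small trellises. A sketch in Python (hypothetical code; the trellis is the biproper one for {000, 011, 111, 100} built above) enumerates all root-to-toor label sequences, which yields C(T), and confirms that no label sequence repeats, so the trellis is one-to-one:

from collections import defaultdict

edges = [("root", "m", "0"), ("root", "m", "1"),
         ("m", "a", "0"), ("m", "b", "1"),
         ("a", "toor", "0"), ("b", "toor", "1")]

def path_labels(edges, root="root", toor="toor"):
    out = defaultdict(list)
    for v, w, a in edges:
        out[v].append((w, a))
    def walk(v, word):                  # depth-first path enumeration
        if v == toor:
            yield word
        for w, a in out[v]:
            yield from walk(w, word + a)
    return list(walk(root, ""))

words = path_labels(edges)
assert sorted(words) == ["000", "011", "100", "111"]    # the code C(T)
assert len(words) == len(set(words))    # one-to-one: distinct labelings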
We say that two trellises T = (V, E, A) and T′ = (V′, E′, A) are isomorphic if there is a one-to-one mapping ψ : V → V′, such that (v_1, v_2, a) is an edge in E if and only if (ψ(v_1), ψ(v_2), a) is an edge in E′. It is easy to see that an isomorphism between trellises must preserve the partitions of vertices and edges defined in (1) and (2), namely:

    ψ(V_i) = V_i′   for i = 0, 1, ..., n
and
    ψ(E_i) = E_i′   for i = 1, 2, ..., n

It follows that isomorphic trellises always represent the same code. On the other hand, it is not difficult to see that two trellises for the same code C need not be isomorphic.
2.3. Complexity measures for trellises
Given two non-isomorphic trellises representing the same code, which one is better? While there is no single answer to this question, one can define several measures of trellis complexity and argue that the trellis which minimizes these complexity measures is the one to prefer. Just as the performance of a code on a real channel is not completely determined by its minimum distance, the actual complexity of trellis decoding is not completely determined by the complexity measures introduced below. Actual decoding complexity depends on the specific algorithm being used, and can often be reduced by means of computational techniques such as Gray codes, efficient processing of parallel transitions, and recursive decoding of sub-trellises [114, 75, 45]. Nevertheless, the trellis complexity measures defined in the following paragraphs usually govern the complexity of trellis decoding in the same sense as the minimum distance of a code usually governs its performance.
Let C be a block code of length n over the finite field F_q. Let T = (V, E, F_q) be a trellis of depth n that represents C, and let V_0, V_1, ..., V_n be the underlying partition of the vertex set of T, as defined in (1). In compliance with the standard terminology, we use the term "states" in place of vertices in what follows. The following measures of the state complexity of T have been introduced in the literature:

    state-cardinality profile: the ordered sequence |V_0|, |V_1|, ..., |V_n|    (4)
    maximum number of states:  V_max = max{|V_0|, |V_1|, ..., |V_n|}            (5)
    total number of states:    |V| = |V_0| + |V_1| + ... + |V_n|                (6)

We shall see in Section 4 that if C is a linear code over F_q and T is the minimal trellis for C, then the state-cardinality profile |V_0|, |V_1|, ..., |V_n| consists of powers of q. Hence in this context it is more convenient to define s_i = log_q |V_i|, and use logarithmic complexity measures:

    state-complexity profile: the ordered sequence s_0, s_1, ..., s_n           (7)
    state complexity:         s = max{s_0, s_1, ..., s_n}                       (8)
    total span:               σ = s_0 + s_1 + ... + s_n                         (9)
The single most widely accepted measure of trellis complexity is the state complexity s defined by (8). If a trellis for a binary code C is implemented in hardware, then 2^s is the number of add-compare-select (ACS) circuits that one has to put on a chip [85]. If the state complexity s of the minimal trellis for a given linear code C is minimized over all possible coordinate permutations, then it becomes a descriptive characteristic of the code itself. Muder [87] contends that it is a fundamental one. He writes that "the logarithm of the maximum number of states in the minimal trellis of a code is an intuitive measure of the decoding complexity of the code and appears to be a fundamental descriptive characteristic, comparable to the quantities n (length), k (size), and d (minimum distance)."
On the other hand, trellis complexity measures based on the number of edges in the trellis may be preferable for the following reason. The state complexity s can be somewhat artificially reduced through a process called sectionalization. In contrast, the analogous measure of edge complexity is invariant under sectionalization, as we shall see in Section 6.
Let E_1, E_2, ..., E_n be a partition of the edge set of T, as defined in (2). Then edge complexity measures analogous to (4)-(6) are defined as follows:

    edge-cardinality profile: the ordered sequence |E_1|, |E_2|, ..., |E_n|     (10)
    maximum number of edges:  E_max = max{|E_1|, |E_2|, ..., |E_n|}             (11)
    total number of edges:    |E| = |E_1| + |E_2| + ... + |E_n|                 (12)

If C is a linear code and T is minimal, then each of |E_1|, |E_2|, ..., |E_n| is again a power of q. Thus we let b_i = log_q |E_i|, and define logarithmic edge-complexity measures as follows:

    edge-complexity profile: the ordered sequence b_1, b_2, ..., b_n            (13)
    edge complexity:         b = max{b_1, b_2, ..., b_n}                        (14)
    total edge-span:         ε = b_1 + b_2 + ... + b_n                          (15)

McEliece argues in [82] that the total number of edges in the trellis is the most meaningful measure of Viterbi decoding complexity. Indeed, as we shall see in Section 3, the Viterbi algorithm on a trellis T, when used for maximum-likelihood decoding of C(T), requires |E| binary additions and |E| - |V| + 1 binary comparisons, where |E| - |V| + 1 counts the total number of merges or expansions in T. This leads to the following trellis complexity measures:

    expansion or merge index:     E = |E| - |V| + 1                             (16)
    Viterbi decoding complexity:  D = 2|E| - |V| + 1                            (17)

Example 2.1. Consider the trellis T = (V, E, F_2) depicted in Figure 3. It can be verified by direct inspection that this trellis represents the (8, 4, 4) extended binary Hamming code. We shall see in Section 4 that this is, in fact, the minimal trellis for this code, and in Section 5 we will learn that it is also optimal with respect to all possible coordinate permutations. The state-cardinality profile for this trellis is {1, 2, 4, 8, 4, 8, 4, 2, 1}, so that the maximum number of states is V_max = 8 and the total number of states is |V| = 34. The number of states at each time is indeed a power of 2, and the logarithmic state-complexity profile is given by {0, 1, 2, 3, 2, 3, 2, 1, 0}. Thus the state complexity is s = 3 and the total span is σ = 14. The edge-cardinality profile for the same trellis is {2, 4, 8, 8, 8, 8, 4, 2}. We see that the number of edges in E_i is at most twice the number of vertices in V_i -- this is always true in a minimal trellis for a binary code. The maximum number of edges is E_max = 8, and the total number of edges is |E| = 44. The logarithmic edge-complexity profile is given by {1, 2, 3, 3, 3, 3, 2, 1}, so that the edge complexity is b = 3 and the total edge-span is ε = 18.
Figure 3. Minimal trellis for the (8, 4, 4) extended binary Hamming code
We can easily compute the expansion index as E = |E| - |V| + 1 = 11, and it can be verified by direct inspection that this is indeed the number of expansions (or bifurcations) in the trellis. Inspection further confirms that this is also the total number of merges in the trellis. The Viterbi decoding complexity for the (8, 4, 4) Hamming code, using the trellis in Figure 3, is given by D = |E| + E = 55. We shall see in the next section that this is indeed the total number of additions and comparisons required by the Viterbi algorithm on this trellis. ◊
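The bookkeeping in Example 2.1 is entirely mechanical. A sketch in Python (hypothetical code, not from the chapter) derives all fourteen measures (4)-(17) from the two cardinality profiles of the trellis in Figure 3:

from math import log2

Vprof = [1, 2, 4, 8, 4, 8, 4, 2, 1]    # state-cardinality profile (4)
Eprof = [2, 4, 8, 8, 8, 8, 4, 2]       # edge-cardinality profile (10)

V_max, V_total = max(Vprof), sum(Vprof)        # (5), (6)
s_prof = [int(log2(x)) for x in Vprof]         # (7), here q = 2
s, sigma = max(s_prof), sum(s_prof)            # (8), (9)
E_max, E_total = max(Eprof), sum(Eprof)        # (11), (12)
b_prof = [int(log2(x)) for x in Eprof]         # (13)
b, eps = max(b_prof), sum(b_prof)              # (14), (15)
expansion = E_total - V_total + 1              # (16)
D = 2 * E_total - V_total + 1                  # (17)

assert (V_max, V_total, s, sigma) == (8, 34, 3, 14)
assert (E_max, E_total, b, eps) == (8, 44, 3, 18)
assert (expansion, D) == (11, 55)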

All the measures of trellis complexity in (4)-(17) have been introduced and studied by past researchers -- see [62] for a recent exposition. Furthermore, several other complexity measures that we did not mention can be found in the literature. While there is no universal agreement as to which of these complexity measures is the most appropriate, this usually does not present a problem because all these measures are closely related. A trellis that minimizes one complexity measure is often minimal, or close to minimal, with respect to most other complexity measures. In fact, we will prove in Section 4 that the minimal trellis simultaneously minimizes all fourteen trellis complexity measures defined in (4)-(17). Furthermore, we shall see in Section 5 that all these complexity measures coincide asymptotically for n → ∞.

3. The Viterbi algorithm
The Viterbi algorithm is an application of the dynamic programming methodology [20, Chapter 16] to the problem of computing flows on a trellis. It is a simple algorithm, but nonetheless a fundamental one. It was introduced by Andrew J. Viterbi [118] in 1967, and motivated the invention of trellises by Forney [33] shortly thereafter. To this day, maximum-likelihood decoding of block and convolutional codes using the Viterbi algorithm on a trellis constitutes the main application of trellises in practice.
This section contains a detailed description of the Viterbi algorithm in the general setting of computing path flows on a trellis. In this setting, the application to maximum-likelihood decoding becomes a simple special case. We follow closely an excellent exposition of the Viterbi algorithm on a trellis given by McEliece in [82]. We are grateful to Robert J. McEliece for explicit permission to use this material.
We start by establishing the terminology pertinent to this section. If e = (v, v′, a) is an edge in a trellis T = (V, E, A), we let λ(e) = a denote the label of e. Hereafter, we assume that the label alphabet A is an algebraic set S, closed under two binary operations ⋆ and +, called product and addition, respectively. The two operations satisfy the following axioms:
A1. The product operation ⋆ is associative, and there is an identity element 1, such that a ⋆ 1 = 1 ⋆ a = a for all a ∈ S. This makes (S, ⋆) a semigroup with identity, or a monoid.
A2. The addition operation + is associative and commutative, and there is an identity element 0, such that a + 0 = 0 + a = a for all a ∈ S. This makes (S, +) an abelian semigroup with identity, or a commutative monoid.
A3. The distributive law (a + b) ⋆ c = (a ⋆ c) + (b ⋆ c) holds for all triples a, b, c ∈ S.
The algebraic structure (S, ⋆, +) is called a semiring. For more details on the general properties of semirings, we refer the reader to [25, 56]. We will soon encounter specific examples of semirings that are of importance in the context of the Viterbi algorithm. First, however, we need to define flows on a trellis.
Definition 3.1. Let T = (V, E, S) be a trellis. If P = e_1, e_2, ..., e_ℓ is a path in T, then the flow along P is defined as the ordered product

    F(P) = λ(e_1) ⋆ λ(e_2) ⋆ ... ⋆ λ(e_ℓ)                                      (18)

of the edge labels along the path. If u and v are two vertices in T, then the flow from u to v is denoted F(u, v) and defined as the sum of the flows along all paths from u to v.
Notice that the terms product and sum in the foregoing definition refer to the semiring operations ⋆ and +, respectively. Thus the order of multiplication in (18) is significant, as the product operation need not be commutative. Observe that if there is no path from u to v, then the flow F(u, v) is an empty sum, which may be taken as 0 ∈ S by convention. On the other hand, we define F(v, v) = 1 for all v, where 1 is the multiplicative identity of S.
We can now describe the purpose of the Viterbi algorithm. The object of the Viterbi algorithm, when applied to a trellis T = (V, E, S), is to compute the flow from the root vertex α to the toor vertex ω. This flow may have different meanings, depending on the particular semiring used to label the trellis. The following examples illustrate this point.
Example 3.1.1. Let S = {0, 1}, with ⋆ and + being the Boolean AND and OR operations, respectively. This is the simplest example of a semiring. Suppose we interpret edges labeled 1 as being "active" and edges labeled 0 as being "inactive." Then for all u, v ∈ V, the flow F(u, v) = 1 if and only if there is a path from u to v that consists of active edges. ◊
Example 3.2.1. Suppose that the edges of T are labeled by symbols from an arbitrary alphabet A, and let the product operation ⋆ be string concatenation. Further define addition + as the operation of taking the union of sets of strings. Then the flow from the root vertex to the toor vertex in the trellis is the set of all n-tuples over A that correspond to ordered sequences of edge labels along each path from the root to the toor. In other words, the flow F(α, ω) is precisely the block code represented by the trellis! In automata theory and symbolic dynamics [77], the flow F(α, ω) would be called the language of the trellis. ◊
Example 3.3.1. Let S be the ring Z[x] of polynomials in x over the integers, with the usual polynomial addition and multiplication. Suppose that we re-label each edge e in T by the monomial x^{wt(e)}, where wt(e) is the weight of e in the original trellis. Then the flow F(u, v) is the generating function for the weights of the paths from u to v. In particular, if the weight of e is taken as the Hamming weight of its label, then the flow F(α, ω) is the weight-enumerator polynomial for the code represented by T. ◊
Example 3.4.1. Let S be the set of nonnegative real numbers, plus the special symbol ∞. Define the product ⋆ to be ordinary addition, with the real number 0 playing the role of the multiplicative identity. Define the addition + to be the operation of taking the minimum, with the symbol ∞ playing the role of the additive identity. Thus min{s, ∞} = s for all real numbers s, as is natural. If we now interpret the label of each edge e as its cost, then the flow F(u, v) is the cost of the lowest-cost path from u to v. As we shall see later in this section, this is the semiring appropriate for maximum-likelihood decoding with the Viterbi algorithm, in which case the costs are log-likelihood functions. We call it the min-sum semiring. ◊
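Since only the two operations and their identities ever change, these examples can be captured uniformly in code. A sketch in Python (hypothetical code, not from the chapter): each semiring is a 4-tuple (add, mul, zero, one), and the distributive law A3 is spot-checked in the min-sum semiring.

from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")

# Example 3.1.1: the Boolean semiring ({0,1}, OR, AND)
boolean = Semiring(add=lambda a, b: a | b,
                   mul=lambda a, b: a & b, zero=0, one=1)

# Example 3.2.1: sets of strings under union and concatenation
language = Semiring(add=lambda a, b: a | b,
                    mul=lambda a, b: {x + y for x in a for y in b},
                    zero=set(), one={""})

# Example 3.4.1: the min-sum semiring
minsum = Semiring(add=min, mul=lambda a, b: a + b,
                  zero=float("inf"), one=0.0)

# spot-check of the distributive law A3 in the min-sum semiring
a, b, c = 1.2, 0.35, 0.51
assert minsum.mul(minsum.add(a, b), c) == \
       minsum.add(minsum.mul(a, c), minsum.mul(b, c))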

3.1. Computing flows on a trellis


The foregoing examples show that computing flows on a trellis is important, and the Viterbi algorithm is an efficient way to carry out this computation. The underlying partition of the vertex set V = V_0 ∪ V_1 ∪ ... ∪ V_n serves as the computational backbone for the Viterbi algorithm on a trellis. The algorithm first computes the flows F(α, v) for all the vertices in V_1, then it uses this information to compute the flows F(α, v) for all v ∈ V_2, and so forth, until the value F(α, ω) is finally computed and returned. This is the classical bottom-to-top approach of dynamic programming, pioneered by Bellman [4] in 1957.
The precise description of the Viterbi algorithm now follows. First, however, we need some more notation. If e = (v, v′, a) is an edge in a trellis T = (V, E, S), we say that v is the initial vertex of e and v′ is the final vertex of e, and write init(e) = v and fin(e) = v′ as a shorthand for the two vertices. Since we are only interested in flows from a single vertex -- the root α -- we henceforth simplify notation by writing F(v) to denote the flow F(α, v) from the root to v. Here is a pseudo-code description of the Viterbi algorithm in this notation.

/* The Viterbi algorithm on a trellis */

F(α) := 1                                        /* initialization */
for i = 1 to n do
{
    for v ∈ V_i do
    {
        F(v) := Σ_{e : fin(e) = v} F(init(e)) ⋆ λ(e)                           (19)
    }
}
return F(ω)
For example, if the Viterbi algorithm is applied to the trellis in Figure 4a, whose edges are labeled by the abstract elements a, b, ..., h from a semiring S, the resulting sequence of computations is given by:

    F(v11) = F(α) ⋆ a = 1 ⋆ a = a                                              (20)
    F(v12) = F(α) ⋆ b = 1 ⋆ b = b                                              (21)
    F(v21) = F(v11) ⋆ c + F(v11) ⋆ d = ac + ad                                 (22)
    F(v22) = F(v11) ⋆ e + F(v12) ⋆ f = ae + bf                                 (23)
    F(ω)  = F(v21) ⋆ g + F(v22) ⋆ h = acg + adg + aeh + bfh                    (24)

Thus in this case, at least, the Viterbi algorithm correctly computes the flow F(ω) = F(α, ω).
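The hand computation (20)-(24) can be replayed mechanically by working in the semiring of Example 3.2.1, with the abstract labels a, ..., h treated as strings, so that products become concatenations and sums become set unions. A sketch in Python (hypothetical code; the vertex names follow Figure 4a):

from collections import defaultdict

edges = [("root", "v11", "a"), ("root", "v12", "b"),
         ("v11", "v21", "c"), ("v11", "v21", "d"), ("v11", "v22", "e"),
         ("v12", "v22", "f"),
         ("v21", "toor", "g"), ("v22", "toor", "h")]
by_depth = ["root", "v11", "v12", "v21", "v22", "toor"]

into = defaultdict(list)           # edges grouped by their final vertex
for v, w, a in edges:
    into[w].append((v, a))

F = {"root": {""}}                 # F(root) := 1, the empty string
for w in by_depth[1:]:             # line (19) in the string semiring
    F[w] = {p + a for v, a in into[w] for p in F[v]}

assert F["toor"] == {"acg", "adg", "aeh", "bfh"}      # agrees with (24)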
The following theorem shows that this is true in general.
Theorem 3.1. The Viterbi algorithm correctly computes the flow F(v) for all v ∈ V.
Proof. The proof mimics the algorithm, and proceeds by induction on the depth of v in the trellis. Suppose that v ∈ V_i, and consider first the case i = 1. If v ∈ V_1 and there is a single edge from α to v, then the flow F(v) = F(α, v) is just the label of this edge. If there is more than one edge from α to v, then F(v) is the sum of the labels on all such edges. In either case, it is correct to say that F(v) is the sum of the labels on all the edges that end at v, since all the edges in E_1 begin at α. Thus

    F(v) = Σ_{e : fin(e) = v} λ(e) = Σ_{e : fin(e) = v} 1 ⋆ λ(e)
         = Σ_{e : fin(e) = v} F(α) ⋆ λ(e) = Σ_{e : fin(e) = v} F(init(e)) ⋆ λ(e)        (25)

where the last equality in (25) follows again from the fact that all the edges in E_1 begin at α. But the right-hand side of (25) is exactly the value assigned to F(v) by the Viterbi algorithm.
The above establishes the induction base. Now assume that the Viterbi algorithm correctly computes the flows from the root α to all the vertices in V_i, and consider a vertex v ∈ V_{i+1}.
The value assigned to F(v) by the Viterbi algorithm is given by (19). By the definition of a trellis, if fin(e) = v and v is a vertex of V_{i+1}, then init(e) is necessarily a vertex of V_i. Hence, by the induction hypothesis and Definition 3.1, we have

    F(init(e)) = Σ_{P : α → init(e)} F(P)                                      (26)

where P : α → u denotes the set of all paths from the root to a vertex u ∈ V. Substituting the expression in (26) into (19), while using the commutativity of addition in S and the distributive law, we conclude that the Viterbi algorithm computes

    F(v) := Σ_{e : fin(e) = v} F(init(e)) ⋆ λ(e)
          = Σ_{e : fin(e) = v} ( Σ_{P : α → init(e)} F(P) ) ⋆ λ(e)
          = Σ_{e : fin(e) = v} Σ_{P : α → init(e)} F(P e)                      (27)

where P e denotes the path which consists of the concatenation of P with the edge e. But all the paths from α to v are of the form P e for some edge e that ends at v and some path P from α to the vertex init(e) ∈ V_i. Thus we can rewrite the double sum in (27) simply as

    F(v) = Σ_{P : α → v} F(P)                                                  (28)

The expression in (28) is precisely the flow from the root to v, according to Definition 3.1. This completes the induction step, and shows that the Viterbi algorithm correctly computes the flows from α to all the vertices in the trellis.
We observe that if there are multiple toor vertices, then the Viterbi algorithm correctly computes the flow from the root α to each toor vertex. If there are multiple root vertices, all initialized to 1 ∈ S, and a single toor vertex ω, then the Viterbi algorithm correctly computes the sum of the flows to the toor from all the root vertices. Finally, if there are multiple root and toor vertices, then the Viterbi algorithm will compute the sum of the flows from all the root vertices for each toor vertex. These properties of the Viterbi algorithm are important in computing flows on sectionalized and tail-biting trellises [75, 17, 67].
Having proved that the Viterbi algorithm works in general, let us see how it operates when the edge labels come from the four specific semirings described in Examples 3.1.1-3.4.1.
Example 3.1.2. Consider the trellis in Figure 4b, whose edges are labeled by the elements of the semiring S = {0, 1} described in Example 3.1.1. The Viterbi algorithm in this case follows the computation in (20)-(24), while interpreting ⋆ as Boolean AND (denoted ∧), and + as Boolean OR (denoted ∨). The algorithm thus computes successively:

    F(v11) = F(α) ∧ 0 = 1 ∧ 0 = 0
    F(v12) = F(α) ∧ 1 = 1 ∧ 1 = 1
    F(v21) = (F(v11) ∧ 0) ∨ (F(v11) ∧ 1) = (0 ∧ 0) ∨ (0 ∧ 1) = 0
    F(v22) = (F(v11) ∧ 1) ∨ (F(v12) ∧ 1) = (0 ∧ 1) ∨ (1 ∧ 1) = 1
    F(ω)  = (F(v21) ∧ 0) ∨ (F(v22) ∧ 1) = (0 ∧ 0) ∨ (1 ∧ 1) = 1

The Viterbi algorithm hence concludes that F(α, ω) = 1, which means that there is at least one active path from α to ω. Indeed, P = (α, v12, 1), (v12, v22, 1), (v22, ω, 1) is such a path. ◊

Example 3.2.2. Consider again the trellis in Figure 4b, but this time think of the edge labels as elements in the finite field F_2. The semiring operations in this case are string concatenation (denoted ·) and set union, as discussed in Example 3.2.1. The semiring S thus consists of all sets of strings over F_2. Notice that the multiplicative identity in S is the empty string ε, while the additive identity is the empty set ∅. The Viterbi algorithm again follows the computations in (20)-(24) and successively finds:

    F(v11) = ε · 0 = {0}
    F(v12) = ε · 1 = {1}
    F(v21) = ({0} · 0) ∪ ({0} · 1) = {00, 01}
    F(v22) = ({0} · 1) ∪ ({1} · 1) = {01, 11}
    F(ω)  = ({00, 01} · 0) ∪ ({01, 11} · 1) = {000, 010, 011, 111}

The value of F(ω) is indeed the entire code represented by the trellis in Figure 4b, as may be verified by direct inspection. ◊

Figure 4. Different semiring labelings of the same trellis: (a) abstract labels a, ..., h; (b) binary labels; (c) monomials x^{wt(e)}; (d) real-valued edge costs.

Example 3.3.2. Now consider the trellis in Figure 4c. The edges in this trellis are labeled by monomials of the form x^{wt(e)}, where wt(e) is the Hamming weight of the corresponding label in Figure 4b. The appropriate semiring in this case is Z[x], as discussed in Example 3.3.1. The Viterbi algorithm computes:

    F(v11) = 1 · 1 = 1
    F(v12) = 1 · x = x
    F(v21) = 1 · 1 + 1 · x = x + 1
    F(v22) = 1 · x + x · x = x^2 + x
    F(ω)  = (x + 1) · 1 + (x^2 + x) · x = x^3 + x^2 + x + 1

We conclude that the Viterbi algorithm in this example returns the weight-enumerator polynomial for the code computed in the previous example. ◊
Example 3.4.2. Finally, consider the trellis in Figure 4d. Each edge in this trellis is labeled by a real number which represents the cost of this edge. The appropriate semiring in this case is the min-sum semiring of Example 3.4.1, so that ⋆ is ordinary addition and + is the operation of taking the minimum. The computation sequence is given by:

    F(v11) = 0 + 1.20 = 1.20
    F(v12) = 0 + 0.35 = 0.35
    F(v21) = min{1.20 + 0.51, 1.20 + 0.92} = 1.71
    F(v22) = min{1.20 + 0.51, 0.35 + 0.51} = 0.86
    F(ω)  = min{1.71 + 0.22, 0.86 + 1.61} = 1.93

The Viterbi algorithm thus concludes that the lowest possible cost along a path from α to ω is F(ω) = 1.93, and it is easy to see that P = (α, v11, 1.20), (v11, v21, 0.51), (v21, ω, 0.22) is the lowest-cost path. If the edge costs in Figure 4d are the log-likelihoods observed upon transmission of a codeword from the code computed in Example 3.2.2, the Viterbi algorithm will find that the codeword 010, corresponding to this path P, is the most likely. ◊
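All four computations above are instances of one procedure with different operations plugged in. A generic sketch in Python (hypothetical code, not from the chapter), run here in the min-sum semiring on the edge costs of Figure 4d:

from collections import defaultdict

def viterbi_flow(edges, by_depth, add, mul, one):
    # edges are triples (initial vertex, final vertex, label);
    # vertices in by_depth are listed in order of increasing depth
    into = defaultdict(list)
    for v, w, lab in edges:
        into[w].append((v, lab))
    F = {by_depth[0]: one}                    # F(root) := 1
    for w in by_depth[1:]:
        terms = [mul(F[v], lab) for v, lab in into[w]]    # line (19)
        total = terms[0]
        for t in terms[1:]:
            total = add(total, t)
        F[w] = total
    return F

edges = [("root", "v11", 1.20), ("root", "v12", 0.35),
         ("v11", "v21", 0.51), ("v11", "v21", 0.92), ("v11", "v22", 0.51),
         ("v12", "v22", 0.51),
         ("v21", "toor", 0.22), ("v22", "toor", 1.61)]
by_depth = ["root", "v11", "v12", "v21", "v22", "toor"]

F = viterbi_flow(edges, by_depth, add=min,
                 mul=lambda f, c: f + c, one=0.0)
assert abs(F["toor"] - 1.93) < 1e-9           # agrees with Example 3.4.2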

3.2. Maximum-likelihood decoding


We now elaborate upon Example 3.4.2, and show how maximum-likelihood decoding with the Viterbi algorithm works in general. Suppose that a codeword of a code C of length n over F_q is transmitted over a noisy communication channel with input alphabet X = F_q and output alphabet Y. We assume that the channel is memoryless and discrete. This means that the channel output Y is a discrete set, and the channel is completely characterized by q known probability-mass functions:

    f(· | x) : Y → [0, 1]   for all x ∈ X = F_q

where f(y | x) is the probability of receiving y ∈ Y given that the symbol x ∈ F_q was transmitted. The assumption that Y is a discrete set is needed only to simplify the terminology. The argument below holds essentially without change if Y is continuous, say Y = R, in which case the f(· | x) are continuous probability-density functions.
If a vector y = (y_1, y_2, ..., y_n) ∈ Y^n is observed at the channel output, the optimal decoding strategy is to find the codeword c ∈ C that maximizes the probability Pr{c | y} that c was transmitted given that y was received. We may often assume w.l.o.g. that the codewords of C are transmitted with equal a priori probability 1/|C|. In this case, by a simple application of the Bayes rule, the optimal decoding strategy is equivalent to finding the most likely codeword c = (c_1, c_2, ..., c_n) ∈ C that maximizes the probability Pr{y | c} that y would be received if c was transmitted. A decoder that always finds the most likely codeword (or one such codeword, if there are ties) is said to be a maximum-likelihood decoder for C.
Now recall the assumption that the channel is memoryless. This assumption means that the noise is an i.i.d. random process, and Pr{y | c} factors as follows:

    Pr{y | c} = ∏_{i=1}^{n} Pr{y_i | c_i} = ∏_{i=1}^{n} f(y_i | c_i)           (29)

We can take logarithms to convert the product in (29) into a sum, and use negation to make this sum nonnegative. Thus maximizing (29) over all codewords is equivalent to

    arg max_{c ∈ C} Pr{y | c} = arg min_{c ∈ C} Σ_{i=1}^{n} - log f(y_i | c_i)    (30)

This is precisely the min-sum form that we need to invoke the Viterbi algorithm. Indeed, suppose that T = (V, E, F_q) is a trellis that represents C. Given the observed channel output y = (y_1, y_2, ..., y_n) ∈ Y^n, we first relabel the edges in T as follows. If e ∈ E_i and λ(e) is the label of e in T, then the new label is defined by the mapping:

    λ(e) ↦ λ′(e) = - log f(y_i | λ(e))                                         (31)

This produces the trellis T′ = (V, E, S), where S is the min-sum semiring of Example 3.4.1. It follows from the fact that T represents C, along with the new edge labeling in (31), that

    min_{c ∈ C} Σ_{i=1}^{n} - log f(y_i | c_i)
        = min_{e_1, e_2, ..., e_n : α → ω} { λ′(e_1) + λ′(e_2) + ... + λ′(e_n) }    (32)

where the minimization on the right-hand side is over all paths from the root to the toor in the trellis T′. But in the min-sum semiring S, real addition is the product operation, so that λ′(e_1) + λ′(e_2) + ... + λ′(e_n) = F(P) by Definition 3.1, where P is the path e_1, e_2, ..., e_n. Furthermore, minimization is the addition operation in S. Hence we can rewrite (32) as

    min_{c ∈ C} Σ_{i=1}^{n} - log f(y_i | c_i) = Σ_{P : α → ω} F(P) = F(α, ω)

Thus the flow from the root to the toor in T′ is precisely the log-likelihood cost of the most likely codeword, namely

    max_{c ∈ C} Pr{y | c} = exp( -F(ω) )

Finding the most likely codeword in C is therefore equivalent to finding the lowest-cost path in T′. Indeed, given such a path in T′, the most likely codeword can be reconstructed as
the sequence of edge labels along the same path in T. To find not only the lowest cost but also the lowest-cost path, we need a slight modification of the Viterbi algorithm. Here is a pseudo-code description of the modified procedure.

/* Maximum-likelihood decoding with the Viterbi algorithm */

F(α) := 0                                        /* initialization */
for i = 1 to n do
{
    for v ∈ V_i do
    {
        F(v) := min_{e : fin(e) = v} { F(init(e)) + λ′(e) }                    (33)
        survivor(v) := arg min_{e : fin(e) = v} { F(init(e)) + λ′(e) }         (34)
    }
}
v := ω                                           /* trace-back initialization */
for i = n down to 1 do
{
    c_i := λ(survivor(v))
    v := init(survivor(v))
}
return (c_1, c_2, ..., c_n)
Namely, all we have to do is to keep track of the particular edge that achieves the minimum value in (33); we call it the survivor edge in (34). Given the additional information computed in (34), we can easily reconstruct the lowest-cost path by tracing back the sequence of survivor edges from the toor to the root. This path is usually called the survivor path in the literature on Viterbi decoding [35, 76, 81]. The most likely codeword is thus obtained as the sequence of edge labels along the survivor path.
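An end-to-end sketch in Python (hypothetical code, not from the chapter) ties the pieces together: the edges of the trellis of Figure 4b are relabeled with the costs -log f(y_i | λ(e)) of a binary symmetric channel with crossover probability p, as in (31), and the min-sum recursion (33)-(34) with trace-back recovers the most likely codeword.

import math
from collections import defaultdict

# the trellis of Figure 4b: triples (initial vertex, final vertex, label)
edges = [("root", "v11", "0"), ("root", "v12", "1"),
         ("v11", "v21", "0"), ("v11", "v21", "1"), ("v11", "v22", "1"),
         ("v12", "v22", "1"),
         ("v21", "toor", "0"), ("v22", "toor", "1")]
levels = [["root"], ["v11", "v12"], ["v21", "v22"], ["toor"]]

def decode(y, p=0.1):
    # BSC(p) costs, as in (31): -log f(y_i | x)
    cost = lambda yi, x: -math.log(1 - p) if yi == x else -math.log(p)
    into = defaultdict(list)
    for v, w, x in edges:
        into[w].append((v, x))
    F, survivor = {"root": 0.0}, {}
    for i, level in enumerate(levels[1:]):              # lines (33)-(34)
        for w in level:
            F[w], sv, sx = min((F[v] + cost(y[i], x), v, x)
                               for v, x in into[w])
            survivor[w] = (sv, sx)
    word, v = [], "toor"                                # trace-back
    while v != "root":
        v, x = survivor[v]
        word.append(x)
    return "".join(reversed(word))

assert decode("100") == "000"    # the unique closest codeword to y = 100

For the received word 100, the codeword of {000, 010, 011, 111} at minimum Hamming distance is 000, and the sketch confirms that the trace-back recovers it.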

3.3. Complexity of the Viterbi algorithm


Regardless of whether the Viterbi algorithm is used for maximum-likelihood decoding or for any other purpose, its computational complexity is the same. The following theorem shows that the number of additions and multiplications in S performed by the Viterbi algorithm depends only on the total number of edges and vertices in the trellis.
Theorem 3.2. The Viterbi algorithm on a trellis T = (V, E, S) requires |E| multiplications in S and E = |E| - |V| + 1 additions in S.
Proof. Given a vertex v ∈ V, we denote by deg_in(v) the in-degree of v, which is the total number of edges that end at v. The execution of line (19) -- or of line (33) in the special case of maximum-likelihood decoding -- in the main loop of the Viterbi algorithm requires deg_in(v) multiplications in S and deg_in(v) - 1 additions in S. This is so because the summation in (19) is over all the edges that end in v. This line is executed for every vertex in V, except the root. Thus, we have

    multiplications = Σ_{i=1}^{n} Σ_{v ∈ V_i} deg_in(v)
                    = Σ_{v ∈ V \ {α}} deg_in(v) = |E|                          (35)

    additions = Σ_{i=1}^{n} Σ_{v ∈ V_i} ( deg_in(v) - 1 )
              = Σ_{v ∈ V \ {α}} deg_in(v) - Σ_{v ∈ V \ {α}} 1 = |E| - |V| + 1  (36)

It is easy to see that each edge in E is indeed counted exactly once in (35) and (36). These are all the additions and multiplications performed by the algorithm.
Following McEliece [82], we conclude this section with some remarks concerning the relationship between the Viterbi algorithm and other similar algorithms that may be found in the computer science literature. The closest match to the Viterbi algorithm is arguably the Dijkstra algorithm [23], but there are important differences. The Dijkstra algorithm finds the shortest paths from a given initial vertex to all other vertices in an arbitrary finite directed graph, tacitly assuming that the underlying graph is complete. The latter assumption is costly in the context of trellises, since trellises are not at all complete. Thus when the Dijkstra algorithm is applied to a trellis, it is not as efficient as the Viterbi algorithm -- its running time is O(|V|^2). Furthermore, as pointed out in [1, Section 5.10], the Dijkstra algorithm does not lend itself to the "semiring" generalization. The semiring generalization is available for the Floyd-Warshall type algorithms described in [1, Section 5.6] and [20, Chapter 24] that find the shortest paths between all pairs of vertices. However, the complexity of these algorithms is O(|V|^3), and there does not appear to be any way to significantly simplify these algorithms if only flows from one particular vertex are required. Another close match to the Viterbi algorithm is the Dag-Shortest-Paths algorithm, described in [20, Section 25.4], which finds the single-source shortest paths in a directed acyclic graph (DAG). The complexity of this algorithm is O(|V| + |E|), which is better than the Dijkstra algorithm, but still not as good as the Viterbi algorithm. The Viterbi algorithm on a trellis is more efficient because a trellis is a special kind of DAG, which obviates the "topological sort" required in the Dag-Shortest-Paths algorithm. Also, the Dag-Shortest-Paths algorithm does not appear to lend itself to the semiring generalization.
The conclusion from this comparison, as drawn by McEliece [82], is that the Viterbi algorithm is an algorithm on a trellis. Non-trellis algorithms, when specialized to trellises, are not as efficient as the Viterbi algorithm. Conversely, it is not fair to say that the Viterbi algorithm applies to structures more general than trellises, such as arbitrary digraphs, since highly efficient algorithms are already available for such problems.

4. The minimal trellis: properties and constructions
It is obvious that every trellis T represents a unique code, which can be easily determined by
reading the edge labels along each path in T. Indeed, the Viterbi algorithm can be used to
compute C(T), as in Example 3.2 of the previous section. However, we usually need to solve
the converse problem: given a code C over IF_q, we wish to construct a trellis T which represents C.
It is easy to see that there are always many non-isomorphic trellises representing the
same code. Hence, we would generally like to construct the "best" trellis for a given code C.
This problem has two important aspects. First, as discussed in the introduction, operations
on the time axis for a given code, such as permutations and sectionalizations, can lead
to a drastically different trellis representation. This problem is discussed in the next two
sections. In this section, we will assume throughout that the time axis is fixed. Still, there
are many non-isomorphic trellises that represent a given code C for each given order of
its time axis. For example, four different trellises that represent the binary linear code
{000, 011, 111, 100} are depicted in Figure 2. However, when the time axis is fixed, one of
the trellises representing a linear code C, namely the minimal trellis, is definitely the best!

We adopt the original definition of minimality due to Muder [87]. As we shall see later in
this section (cf. Theorem 4.25), the minimal trellis may be defined in a number of different
ways which, in most cases, are all equivalent to the following definition given in [87].

Definition 4.1. A trellis T for a code C of length n is minimal if it satisfies the following
property: for each i = 0, 1, ..., n, the number of vertices in T at time i is less than or equal
to the number of vertices at time i in any other trellis for C.

The defining property of the minimal trellis, namely simultaneous minimization of the number of
vertices at each time i, is a strong requirement. Given a code C, it is not at all obvious
that there exists a minimal trellis for C, since minimization of the number of vertices at one
time index may be incompatible with minimization of the number of vertices at another time
index. In fact, in the next subsection we will give an example of a code which does not admit
a minimal trellis representation. However, if C is a linear code, then the minimal trellis for C
not only exists but is also unique up to isomorphism. This remarkable result is established
in the next subsection. We also show in the next subsection that the same is true for the
more general class of rectangular codes, which includes the linear codes as a special case.

In a later subsection, we briefly survey several well-known constructions of the minimal trellis
for linear codes. Some of these constructions, due to Bahl, Cocke, Jelinek, and Raviv [2],
Massey [80], Forney [38], and Kschischang and Sorokine [70], will be presented without
proof. Although the constructions themselves are different, the fact that the minimal trellis
is unique implies that they all produce one and the same trellis, up to isomorphism.

Finally, in the last subsection, we investigate the relations between the dynamical properties
of a linear code C and the structural properties of trellises that represent C. As we shall see,
the minimal trellis invariably exhibits an extremal structure. In particular, we will prove that
the minimal trellis simultaneously minimizes all the trellis complexity measures introduced
in Section 2.3, among all possible trellis representations for a given code C.

4.1. Existence and uniqueness
We start by restricting our attention to proper trellises, as defined in Section 2.2. In particular,
we define the minimal proper trellis as follows.

Definition 4.2. Let T be a proper trellis for a code C of length n. We say that T is the
minimal proper trellis for C if it satisfies the following property: for each i = 0, 1, ..., n, the
number of vertices in T at time i is less than or equal to the number of vertices at time i in
any other proper trellis for C.

Restricting the definition of minimality to minimization over the set of proper trellises leads
to the following strong result, which holds for any block code, linear or not.

Theorem 4.1. Every block code has a minimal proper trellis, and any two minimal proper
trellises for the same code are isomorphic.

To prove Theorem 4.1, we will proceed by showing that every proper trellis T for C defines
a certain equivalence relation. We will use another equivalence relation to define a trellis
T* for C, and then show that this trellis is minimal among all proper trellises for the same
code. This approach follows the proof of Theorem 4.1 given by Muder in [87].

We note that similar results are well known in the system theory literature, since the work
of Willems [124, 125]. However, the translation of the results of Willems [124, 125] into the
language of block code trellises is not entirely obvious.
Let C be a code of length n over a finite alphabet A. For each i = 1, 2, ..., n−1, we define
two punctured versions of C as follows:

    P*_i = { (c_1, c_2, ..., c_i) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for some c_{i+1}, ..., c_n ∈ A }     (37)

    F*_i = { (c_{i+1}, c_{i+2}, ..., c_n) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for some c_1, ..., c_i ∈ A }     (38)

with P*_n = F*_0 = C and P*_0 = F*_n = ∅, by convention. The code P*_i, respectively F*_i, is
known [39, 70, 115] as the projection of C on the past, respectively future, at time i.

For each i, a proper trellis T for C defines an equivalence relation on the codewords of P*_i
as follows. Given a codeword c ∈ P*_i and a path P = e_1, e_2, ..., e_i starting at the root of T,
we say that P corresponds to c if c is the sequence of edge labels along P, namely if

    c = (λ(e_1), λ(e_2), ..., λ(e_i))     (39)

where λ(e) stands for the label of e. This correspondence between paths of length i in T and
codewords in P*_i is one-to-one for all i = 1, 2, ..., n if and only if T is proper. We say that
two codewords c, c′ ∈ P*_i are T-equivalent, and write c ≡_T c′, if the paths in T corresponding
to these codewords end at the same vertex. The number of equivalence classes thus defined is
obviously equal to the number of vertices at time i in T. Furthermore, there is a one-to-one
correspondence between T-equivalence classes and vertices in V_i.

If T is not proper, the relation defined in this way need not be transitive, and thus need not be an equivalence
relation. For example, in the improper trellis of Figure 5b, we have 00 ≡_T 10 ≡_T 11, but 00 ≢_T 11.
We can define another equivalence relation on the codewords of P*_i that is induced by the code
C itself rather than by any particular trellis for C. This is known [95, 125] as past-induced
future equivalence. Specifically, for each c ∈ P*_i, we define the future of c in C as follows:

    F(c) = { x ∈ A^{n−i} : (c, x) ∈ C }     (40)

where (·, ·) denotes string concatenation. We say that c, c′ ∈ P*_i are future-equivalent, and
write c ∼ c′, if F(c) = F(c′). It is easy to see that this is indeed an equivalence relation.

Proposition 4.2. Let T be a proper trellis for C. Then any two codewords c, c′ ∈ P*_i that
are T-equivalent are also future-equivalent.

Proof. Let V = V_0 ∪ V_1 ∪ ··· ∪ V_n be the partition of the vertex set of T, as defined in (1).
Given a vertex v ∈ V_i, we define the past of v in T and the future of v in T as follows:

    P_T(v) = { x ∈ A^i : x is a sequence of edge labels along a path in T ending at v }     (41)

    F_T(v) = { x ∈ A^{n−i} : x is a sequence of edge labels along a path in T starting at v }     (42)

Now suppose that the unique path in T corresponding to a codeword c ∈ P*_i ends at v ∈ V_i.
Since T represents C, it follows that F(c) = F_T(v). Thus if the paths in T corresponding to
c, c′ ∈ P*_i end at the same vertex v ∈ V_i, then F(c) = F(c′) = F_T(v).

We let |V*_i| denote the number of future-equivalence classes in P*_i. At this point, |V*_i| is just
an elaborate notation; the significance of this notation will become clear shortly.

Corollary 4.3. Let T = (V, E, A) be a proper trellis for C, and let V = V_0 ∪ V_1 ∪ ··· ∪ V_n
be the partition of the vertex set of T. Then |V_i| ≥ |V*_i| for all i = 1, 2, ..., n.

Proof. By Proposition 4.2, the number of equivalence classes induced by future-equivalence
cannot be greater than the number |V_i| of equivalence classes induced by T-equivalence.
In light of Corollary 4.3, if we could construct a trellis for C whose vertices are the equivalence
classes induced by future-equivalence, such a trellis would be a minimal proper trellis for C.
Specifically, for each i = 0, 1, ..., n, let V*_i denote the set of future-equivalence classes in P*_i.
Notice that each of P*_0 = ∅ and P*_n = C may be regarded as a single future-equivalence class
by convention, so that V*_0 and V*_n each consist of a single element. Now let

    V* = V*_0 ∪ V*_1 ∪ ··· ∪ V*_n

This is the vertex set of the minimal proper trellis T* = (V*, E*, A) for C. The edge set E*
is defined as follows: there is an edge from a vertex v ∈ V*_i to a vertex v′ ∈ V*_{i+1} if and only if
there is a codeword (c_1, c_2, ..., c_n) ∈ C, such that (c_1, c_2, ..., c_i) ∈ v and (c_1, c_2, ..., c_{i+1}) ∈ v′.
The label of this edge is c_{i+1} ∈ A. For example, the future-equivalence classes for the nonlinear
code C = {000, 100, 101, 111} at times i = 1 and i = 2 are given by

    P*_1 = {0} ∪ {1}
    P*_2 = {00} ∪ {10} ∪ {11}

The codeword (100) ∈ C, for instance, implies that there is an edge labeled 0 from {1} to {10}
in the minimal proper trellis T* for C. This trellis is depicted in Figure 5a.
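For small codes, the construction of T* just described can be carried out mechanically. The following Python sketch is ours (the function names are illustrative): it computes the future F(p) of every past per (40), identifies each vertex of V*_i with a future-equivalence class, and lists the edges of T* exactly as defined above.

    def futures(code, i):
        # map each past p in P*_i to its future F(p), as in (40)
        fut = {}
        for c in code:
            fut.setdefault(c[:i], set()).add(c[i:])
        return fut

    def minimal_proper_trellis(code):
        # codewords are strings over some alphabet; the vertex of a past
        # at time i is identified with its future, frozen as a set
        n = len(next(iter(code)))
        vertex = [{p: frozenset(f) for p, f in futures(code, i).items()}
                  for i in range(n + 1)]
        V = [set(vertex[i].values()) for i in range(n + 1)]
        E = {(i, vertex[i][c[:i]], vertex[i + 1][c[:i + 1]], c[i])
             for c in code for i in range(n)}
        return V, E

    # Example 4.1: C = {000, 100, 101, 111}
    V, E = minimal_proper_trellis({"000", "100", "101", "111"})
    print([len(Vi) for Vi in V])    # prints [1, 2, 3, 1]

The printed vertex counts 1, 2, 3, 1 agree with the trellis of Figure 5a.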
The following proposition shows that, in general, the trellis T* defined in the previous paragraph
is a proper trellis that represents C.

Proposition 4.4. The trellis T* is a minimal proper trellis for C.

Proof. We first prove that T* is a proper trellis. Assume to the contrary the existence of
two distinct but identically-labeled edges starting at the same vertex of T*, say e_1 = (v, v′, a)
and e_2 = (v, v″, a) with v ∈ V*_i. By construction of T*, this assumption implies the existence
of two codewords c, c′ ∈ C, such that (c_1, c_2, ..., c_i) ∈ v and (c′_1, c′_2, ..., c′_i) ∈ v, while

    (c_1, c_2, ..., c_i, c_{i+1}) ∈ v′        (c′_1, c′_2, ..., c′_i, c′_{i+1}) ∈ v″     (43)

and c_{i+1} = c′_{i+1} = a. By the definition of V*_i we have (c_1, c_2, ..., c_i) ∼ (c′_1, c′_2, ..., c′_i), which
further implies that

    (c_1, c_2, ..., c_i, a) ∼ (c′_1, c′_2, ..., c′_i, a)     (44)

Since the vertices v′, v″ ∈ V*_{i+1} are future-equivalence classes, it follows from (43) and (44)
that v″ = v′, contrary to our original assumption. Thus T* is proper by Definition 2.3b.

We next prove that T* represents C. Since the codewords of C were used to define the edges
of T*, it is clear that C ⊆ C(T*). To prove that C(T*) ⊆ C, we show by induction on i that
every path of length i starting at the root in T* corresponds to some codeword of P*_i. Since
P*_n = C, the fact that C(T*) ⊆ C follows as the special case of this statement for i = n.

As an induction hypothesis, assume that the statement is true for all paths of length i, and
consider a path P = e_1, e_2, ..., e_{i+1} starting at the root of T*. By the induction hypothesis,
there is a codeword x = (x_1, x_2, ..., x_n) ∈ C, such that (x_1, x_2, ..., x_i) ∈ P*_i corresponds to the
first i edges of P. Now consider the last edge e_{i+1} = (v, v′, a). By construction of T*, there exists
a codeword c = (c_1, c_2, ..., c_n) ∈ C, possibly distinct from x, such that (c_1, c_2, ..., c_i) ∈ v,
(c_1, c_2, ..., c_i, c_{i+1}) ∈ v′, and c_{i+1} = a. Since the paths in T* corresponding to (x_1, x_2, ..., x_i)
and (c_1, c_2, ..., c_i) end at the same vertex v, we have (x_1, x_2, ..., x_i) ≡_{T*} (c_1, c_2, ..., c_i). In
view of Proposition 4.2, this further implies that

    (x_1, x_2, ..., x_i) ∼ (c_1, c_2, ..., c_i)     (45)

Thus any (n−i)-tuple in the future of (c_1, c_2, ..., c_i) is also in the future of (x_1, x_2, ..., x_i).
In particular, it follows from (45) that (x_1, ..., x_i, c_{i+1}, ..., c_n) is a codeword of C. Since
c_{i+1} = a, this further implies that (x_1, x_2, ..., x_i, a) is a codeword of P*_{i+1}.

The above establishes the induction step and proves that T* is indeed a trellis for C. Finally,
the fact that T* is the minimal proper trellis for C follows immediately from Corollary 4.3.


The foregoing proposition establishes the existence of the minimal proper trellis. To complete
the proof of Theorem 4.1, we have to establish its uniqueness.

Proposition 4.5. Any minimal proper trellis for C is isomorphic to T*.

Proof. Let T be a minimal proper trellis for C, with vertex set V = V_0 ∪ V_1 ∪ ··· ∪ V_n.
Given a codeword c ∈ P*_i, let v(c) be the T-equivalence class of c. Let v*(c) be the T*-equivalence
class of c, which is also its future-equivalence class. By Proposition 4.2, we have that
v(c) ⊆ v*(c) for all c ∈ P*_i. Since T is minimal, it follows that |V_i| = |V*_i| and the total number
of equivalence classes in P*_i induced by T and T* is the same. This implies that v(c) cannot be
smaller than v*(c), and hence v(c) = v*(c). The latter equality leads to a natural one-to-one
correspondence φ : V → V*. For each v ∈ V_i, choose a codeword c ∈ P*_i such that the unique
path in T corresponding to c ends at v, and define φ(v) = v*(c), with v*(c) being interpreted
as a vertex of V*. It is now easy to verify that φ(·) is an isomorphism between T and T*.

Having proved Theorem 4.1, we note that a similar result is known in automata theory [54]
as the Myhill-Nerode theorem, which says that every finite-state automaton is equivalent
to a unique minimal deterministic finite-state automaton. Proper trellises for block codes
are the counterparts of deterministic finite-state automata in formal language theory. In
other fields, such as symbolic dynamics, system theory, and the study of Markov chains,
a proper trellis would be called, respectively, right-resolving, past-induced, or unifilar;
see the multilingual dictionary [41] for more details. It appears that results analogous to
Theorem 4.1 were developed more or less independently in each of these fields.

It is natural to ask whether the minimal proper trellis remains minimal under minimization
over all trellises for C, not only the proper ones. The following example, due to Muder [87],
shows that in general this is not the case.
Example 4.1. Consider the nonlinear binary code C = {000, 100, 101, 111}. The unique
minimal proper trellis for this code is depicted in Figure 5a.
Figure 5. Minimal proper trellis and improper minimal trellis for the same code
However, the improper trellis in Figure 5b represents the same code and has fewer vertices. It
is easy to see that this improper trellis is minimal, according to Definition 4.1. ◊
Example 4.1 is just the tip of the iceberg of the various difficulties that arise in constructing
minimal trellises for general nonlinear codes. Kschischang and Sorokine [70] give a series
of examples which show that if C is a nonlinear code then: the minimal trellis for C may
be unobservable, C may have several non-isomorphic minimal trellises, or C may not admit
a minimal trellis representation at all. To describe these examples here, we now briefly discuss
some of the formalism introduced by Kschischang and Sorokine [70].
With the past and future projections P*_i and F*_i defined in (37),(38), it is clear that every
code C is a subset of the Cartesian product P*_i × F*_i. Thus one can think of C as a relation
between codeword pasts and codeword futures: we say that a codeword future f ∈ F*_i follows
a codeword past p ∈ P*_i if (p, f) ∈ C. The past/future relation induced by C at time i may be
represented by a Cartesian array A_i whose rows and columns are indexed by the codewords
of P*_i and F*_i, respectively. The entries of a Cartesian array are either blank or ★. There is a ★
in row p and column f in A_i if and only if (p, f) ∈ C. The total number of ★ in A_i is equal
to the number of codewords, for all i. For example, the Cartesian arrays at times i = 1
and i = 2 for the past/future relations in the code C = {000, 100, 101, 111} of Example 4.1
are shown below:

         00   01   11                  0    1
    0    ★                       00    ★
    1    ★    ★    ★             10    ★    ★          (46)
                                 11         ★

Now let T = (V, E, A) be a trellis for C, and consider a vertex v ∈ V_i. With the past P_T(v)
and future F_T(v) of v as defined in (41),(42), it is clear that

    P_T(v) × F_T(v) ⊆ C

In other words, every future in F_T(v) follows every past in P_T(v). This means that every
vertex of V_i corresponds to a complete rectangle in the Cartesian array A_i, possibly up to
a permutation of rows and columns. Furthermore, since each codeword of C corresponds to
some path in T, and this path must pass through some vertex v ∈ V_i, we have

    C = ⋃_{v ∈ V_i} P_T(v) × F_T(v)   for i = 0, 1, ..., n     (47)

It follows that the collection of rectangles corresponding to all the vertices of V_i must cover
all the ★ in the Cartesian array A_i, for all i = 0, 1, ..., n. For example, the trellis in Figure 5b
is equivalent to a covering of the Cartesian arrays in (46) by rectangles; in particular, its two
vertices at time i = 2 correspond to the two rectangles {00, 10} × {0} and {10, 11} × {1}
(note that the rectangles in such a covering may share rows or columns).

Remark. In general, it follows from (47) that the problem of minimizing the number of
vertices in a trellis T for C is equivalent to the problem of covering the Cartesian array A_i
by the minimum number of rectangles [70]. The latter problem is also equivalent to the
problem of covering the edges of an arbitrary bipartite graph by the minimum number of
complete bipartite subgraphs, or bicliques [70, 92, 93]. This computational task was shown
to be NP-hard by Orlin [88]; see also Garey and Johnson [48, p. 194, Problem GT18].
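Tabulating Cartesian arrays is mechanical for small codes. A short Python sketch of ours (the function name and the formatting are illustrative); called with the code of Example 4.1 at i = 1 and i = 2, it reproduces the two arrays in (46).

    def cartesian_array(code, i):
        # rows = pasts in P*_i, columns = futures in F*_i,
        # with a * wherever (p, f) is a codeword of C
        pasts = sorted({c[:i] for c in code})
        futs = sorted({c[i:] for c in code})
        width = max(len(f) for f in futs) + 2
        print(" " * (i + 2) + "".join(f.ljust(width) for f in futs))
        for p in pasts:
            row = ["*" if p + f in code else " " for f in futs]
            print(p.ljust(i + 2) + "".join(s.ljust(width) for s in row))

    cartesian_array({"000", "100", "101", "111"}, 1)
    cartesian_array({"000", "100", "101", "111"}, 2)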
Having established the formalism of Cartesian arrays and their relation to trellises, we are
ready to discuss the examples of Kschischang and Sorokine [70].

Example 4.2. This example shows that the minimal proper trellis for a nonlinear code C
may have exponentially more vertices than the minimal trellis. Vertices in the minimal proper
trellis correspond to groups of equal rows in the Cartesian array: two pasts p_1, p_2 ∈ P*_i are
in the same future-equivalence class if and only if the rows of A_i indexed by p_1 and p_2 are
identical. Now consider the code C_m of length n = 2 with |C_m| = m 2^{m−1}, defined as the
subset of {1, 2, ..., 2^m − 1} × {1, 2, ..., m} for which the Cartesian array at time i = 1 consists
of all the 2^m − 1 distinct non-blank rows. For instance, the Cartesian array shown below

         1    2    3
    1              ★
    2         ★
    3         ★    ★
    4    ★
    5    ★         ★
    6    ★    ★
    7    ★    ★    ★

defines the code C_3 = {13, 22, 32, 33, 41, 51, 53, 61, 62, 71, 72, 73}. Since all the 2^m − 1 rows
in the Cartesian array are distinct, the minimal proper trellis for C_m has 2^m − 1 vertices at
time i = 1. On the other hand, an improper trellis obtained by grouping the columns of the
Cartesian array has only m vertices at time i = 1. In fact, this trellis corresponds to the
minimal proper trellis for the time-reversed version of C_m. ◊
Example 4.3. All the minimal trellises we have seen so far were one-to-one. This example
shows that, in general, the minimal trellis does not have to be one-to-one. Consider the
ternary code C = {00, 01, 10, 11, 12, 21, 22} of length n = 2.

Figure 6. Cartesian array and the corresponding minimal trellis

The corresponding Cartesian array, together with its covering by rectangles, is shown in
Figure 6a, and the resulting minimal trellis T for C is shown in Figure 6b; the covering consists
of the two rectangles {0, 1} × {0, 1} and {1, 2} × {1, 2}, which overlap in the entry (1, 1). It
is easy to see that this minimal trellis for C is unique. Observe that the codeword (1, 1) ∈ C
corresponds to two distinct paths in T, so that the unique minimal trellis for C is not one-to-one. ◊
Example 4.4. In the foregoing two examples, the minimal trellis for C was unique, up to
isomorphism. However, this example shows that, even for binary codes of length n = 2,
the minimal trellis need not be unique. Indeed, consider the code C = {00, 10, 11}. The
Cartesian array for the past/future relation induced by C at time i = 1 is shown below:

         0    1
    0    ★
    1    ★    ★

As we can see, this array admits two distinct minimal coverings, namely {0, 1} × {0} together
with {1} × {1}, and {0} × {0} together with {1} × {0, 1}. These coverings correspond to two
non-isomorphic minimal trellises for C. ◊
For all the codes encountered so far, we were able to construct at least one minimal trellis,
and it is natural to ask whether every block code admits a minimal trellis representation.
The next example answers this question in the negative by exhibiting a code which does not
have a minimal trellis. It is presented without proof; we refer the reader to Kschischang
and Sorokine [70] for a detailed treatment.
Example 4.5. This example shows that, in general, minimizing the number of vertices in
a trellis at one time index may be incompatible with minimizing the number of vertices at
another time index. Consider the code

    C = {115, 122, 123, 213, 214, 215, 222, 223, 224, 313, 314, 316, 321, 324, 326, 414, 416, 421, 426}

This is a 19-element subset of the set of 3-tuples over Z_6. The Cartesian array A_1 for the
past/future relation induced by C at time i = 1 is given by

         15   22   23   13   24   14   16   21   26
    1    ★    ★    ★
    2    ★    ★    ★    ★    ★    ★                         (48)
    3                   ★    ★    ★    ★    ★    ★
    4                             ★    ★    ★    ★

The covering of all the ★ in (48) by three rectangles, namely {1, 2} × {15, 22, 23},
{2, 3} × {13, 24, 14}, and {3, 4} × {14, 16, 21, 26}, implies that there exists a trellis T_1
for C with three vertices at time i = 1. Furthermore, it is easy to see that this is the unique
way to cover A_1 with only three rectangles. On the other hand, the Cartesian array for the
past/future relation induced by C at time i = 2 is given by

         5    2    3    4    6    1
    11   ★
    12        ★    ★
    22        ★    ★    ★
    21   ★         ★    ★                                    (49)
    31             ★    ★    ★
    41                  ★    ★
    32                  ★    ★    ★
    42                       ★    ★

The covering in (49) implies that there exists a trellis T_2 for C with only five vertices at
time i = 2. It is not difficult to see that the coverings in (48) and (49) are incompatible: they
cannot correspond to vertices of the same trellis. Kschischang and Sorokine [70] furthermore
show that the covering in (48) forces a trellis for C with at least six vertices at time i = 2.
This implies that a minimal trellis for C does not exist. ◊

None of the diculties illustrated in the foregoing examples is encountered if the past/future
relation at each time is rectangular. A relation is said to be rectangular if the correspond-
ing Cartesian array can be arranged, possibly under row and column permutations, into
a collection of complete non-overlapping rectangles with no rows or columns in common.
A Cartesian array for a rectangular relation is depicted schematically below:

A code C is rectangular if the past/future relation induced by C is rectangular at each time


index. This property is expressed succinctly in the following denition.
Denition 4.3. A code C is said to be rectangular if (a c) (a d) (b c) 2 C implies (b d) 2 C ,
for all choices of a b c d.
For example, the quaternary code C = f000 111 112 211 212 323 333g is rectangular, and
the Cartesian arrays for the past/future relations induced by C are depicted in Figure 7.
The following proposition shows that there are many more rectangular codes. It is a simple
observation, which is nonetheless key to the proof of our main result in this subsection.

Proposition 4.6. Every linear code is rectangular.

Proof. Suppose that (a, c), (a, d), (b, c) are codewords in a linear code C, for some a, b, c, d.
Then (b, d) = (b, c) − (a, c) + (a, d) is also a codeword of C.

The class of rectangular codes is considerably more general than the class of linear codes. The
code C = {000, 111, 112, 211, 212, 323, 333} is a specific example of a nonlinear rectangular
code. The Nordstrom-Robinson code is another such example [110, 92]. In general, the class
of rectangular codes includes all group codes of [42] and many notoriously 'nonlinear' codes,
such as shells of constant norm in the integer lattice, permutation codes, and constant-weight
subcodes of a linear code. For more examples of rectangular codes, see Kschischang [68].
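Definition 4.3 translates directly into a brute-force test that can be run on small codes. The following Python sketch is ours (the function name is illustrative, and the search is cubic in the code size); it confirms that the quaternary code above is rectangular, while the code of Example 4.1 is not.

    def is_rectangular(code):
        # check Definition 4.3 at every time index i = 1, ..., n-1:
        # whenever (a,c), (a,d), (b,c) are in C, require (b,d) in C
        n = len(next(iter(code)))
        for i in range(1, n):
            for x in code:            # x = (a, c)
                a, c = x[:i], x[i:]
                for y in code:        # y = (a, d), same past
                    if y[:i] != a:
                        continue
                    d = y[i:]
                    for z in code:    # z = (b, c), same future
                        if z[i:] != c:
                            continue
                        if z[:i] + d not in code:
                            return False
        return True

    print(is_rectangular({"000", "111", "112", "211", "212", "323", "333"}))  # True
    print(is_rectangular({"000", "100", "101", "111"}))                       # False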

Figure 7. Cartesian arrays for the rectangular code {000, 111, 112, 211, 212, 323, 333}
Rectangular codes have the remarkable property that for all c_1, c_2 ∈ P*_i, the futures F(c_1)
and F(c_2) are either equal or disjoint. Similarly, for each x ∈ F*_i, we can define the past of
x in C, in a manner analogous to (40), as follows:

    P(x) = { c ∈ A^i : (c, x) ∈ C }     (50)

Then for all x_1, x_2 ∈ F*_i, the pasts P(x_1) and P(x_2) are also either equal or disjoint, provided
C is rectangular.

Proposition 4.7. If C is a rectangular code, then F(c_1) ∩ F(c_2) ≠ ∅ implies F(c_1) = F(c_2),
and P(x_1) ∩ P(x_2) ≠ ∅ implies P(x_1) = P(x_2).

Proof. Suppose that F(c_1) and F(c_2) are not disjoint, and let x ∈ F(c_1) ∩ F(c_2). Then
for any a ∈ F(c_2), we have (c_2, a), (c_2, x), (c_1, x) ∈ C, which implies that (c_1, a) ∈ C, if C is
rectangular. Thus a ∈ F(c_1). By a similar argument, any b ∈ F(c_1) also belongs to F(c_2).
Hence F(c_1) = F(c_2). The case P(x_1) ∩ P(x_2) ≠ ∅ is completely similar.
Using Proposition 4.7, we can prove the following result for rectangular codes. Although
this result is analogous to Theorem 4.1, it is actually considerably stronger, as the examples
given earlier in this subsection demonstrate.

Theorem 4.8. Every rectangular code has a minimal trellis, and any two minimal trellises
for the same rectangular code are isomorphic.

To prove Theorem 4.8, we will show that if C is a rectangular code then the minimal proper
trellis for C, constructed in Proposition 4.4, is also the minimal trellis for C, and furthermore
any minimal trellis for C is proper. The uniqueness of the minimal trellis then follows
immediately from Theorem 4.1.

First, we need to generalize the notion of T-equivalence to arbitrary, not necessarily proper,
trellises. In the following definition, T is an arbitrary trellis for an arbitrary block code.

Definition 4.4. We say that two codewords c_1, c_2 ∈ P*_i are T-adjacent, and write c_1 ↔_T c_2,
if there is a path P_1 in T corresponding to c_1 and a path P_2 in T corresponding to c_2, such
that P_1, P_2 start at the root and end at the same vertex of T.

Definition 4.4 is analogous to the definition of T-equivalence for proper trellises. However,
T-adjacency is not necessarily an equivalence relation if T is not proper. For example, in the
improper trellis of Figure 5b, we see that 00 ↔_T 10 and 10 ↔_T 11, but 00 and 11 are not
adjacent. Thus the ↔_T relation is reflexive and symmetric, but not necessarily transitive.

Nevertheless, the notion of T-adjacency can be extended to an equivalence relation on the
codewords of P*_i as follows. Consider a graph G_i^T whose vertices are the codewords of P*_i. This
graph is not labeled and not directed. The edge set of G_i^T is defined by the T-adjacency
relation: there is an edge between c_1 ∈ P*_i and c_2 ∈ P*_i in G_i^T if and only if c_1 ↔_T c_2.

Definition 4.5. We say that two codewords c_1, c_2 ∈ P*_i are T-equivalent, and write c_1 ≡_T c_2,
if there is a path from c_1 to c_2 in G_i^T.

It is obvious that T-equivalence, as defined above, is an equivalence relation for any trellis T,
proper or not. Let ω_i denote the number of equivalence classes in P*_i thus defined, which is
equal to the number of connected components in G_i^T.
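For a concrete trellis, ω_i can be computed exactly as Definition 4.5 prescribes: build the graph G_i^T and count its connected components. Below is a Python sketch of ours (the names and the input format are assumptions), using a union-find structure in which all pasts ending at a common vertex are merged into one component.

    def omega(path_ends):
        # path_ends: one (past, vertex) pair for every path of length i
        # from the root, where past is the codeword the path spells and
        # vertex is the vertex it ends at; returns the number of
        # T-equivalence classes, i.e. connected components of G_i^T
        parent = {}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        def union(x, y):
            parent.setdefault(x, x)
            parent.setdefault(y, y)
            parent[find(x)] = find(y)
        for past, v in path_ends:
            union(("past", past), ("vertex", v))
        return len({find(("past", p)) for p, _ in path_ends})

    # improper trellis of Figure 5b at time i = 2: pasts 00 and 10 share
    # a vertex, 10 and 11 share another, so omega_2 = 1 < |V_2| = 2
    print(omega([("00", "a"), ("10", "a"), ("10", "b"), ("11", "b")]))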


Proposition 4.9. Let V = V_0 ∪ V_1 ∪ ··· ∪ V_n be the partition of the vertex set of T. Then

    |V_i| ≥ ω_i   for all i = 1, 2, ..., n     (51)

Equality holds for all i = 1, 2, ..., n if and only if T is proper.

Proof. Each vertex v ∈ V_i defines a clique in G_i^T, since the codewords of P*_i that correspond
to those paths in T that end at v are all T-adjacent to each other. The edge set of G_i^T
is the union of the edge sets of all such cliques. Thus the number of connected components
in G_i^T cannot be larger than the number of vertices in V_i.

Equality holds at time i if and only if the cliques in G_i^T defined by the vertices of V_i are
not connected to each other. A simple inductive argument then shows that equality in (51)
holds for all i = 1, 2, ..., n if and only if T is proper. Indeed, suppose that the paths
P_1 = e_1, e_2, ..., e_i and P_2 = e′_1, e′_2, ..., e′_i in T correspond to the same codeword of P*_i.
If P_1 and P_2 end at distinct vertices of V_i, then the cliques defined by these vertices are
connected and |V_i| > ω_i. Similarly, if the paths of length i − 1 obtained from P_1, P_2, namely
e_1, e_2, ..., e_{i−1} and e′_1, e′_2, ..., e′_{i−1}, end at distinct vertices of V_{i−1}, then |V_{i−1}| > ω_{i−1}. Equality
in (51) at times i and i − 1 thus implies that e_i = e′_i. Continuing in this manner, we conclude
that e_1, e_2, ..., e_i = e′_1, e′_2, ..., e′_i. Thus P_1 = P_2, and T is proper by Definition 2.3a.

Our proof of Proposition 4.9 shows that Definition 4.5 indeed reduces to the simpler notion
of T-equivalence introduced earlier in the special case of proper trellises. Using the more
general form of Definition 4.5, along with Proposition 4.7, we can now prove the general form
of Proposition 4.2 for arbitrary trellis representations of rectangular codes.

Proposition 4.10. Let T be a trellis for a rectangular code C. Then any two codewords
c_1, c_2 ∈ P*_i that are T-equivalent are also future-equivalent.

Proof. This follows from the fact that if C is rectangular and two codewords x_1, x_2 ∈ P*_i
are T-adjacent, then F(x_1) = F(x_2) in C. Indeed, suppose that there are paths P_1, P_2 in T
that correspond to x_1, x_2, respectively, and end at the same vertex v ∈ V_i. By the definition
of a trellis, there is a path P from v to the toor. The sequence of edge labels along this
path P is in the future of both x_1 and x_2. Thus F(x_1) ∩ F(x_2) ≠ ∅. Proposition 4.7 then
implies that F(x_1) = F(x_2), and x_1, x_2 are future-equivalent. Now suppose that c_1, c_2 ∈ P*_i
are T-equivalent, and let

    c_1 ↔_T x_k ↔_T x_l ↔_T ··· ↔_T x_m ↔_T c_2

be a T-adjacency path from c_1 to c_2 in G_i^T. By the foregoing argument, we conclude that
F(c_1) = F(x_k) = F(x_l) = ··· = F(x_m) = F(c_2), and hence c_1, c_2 are future-equivalent.

We are now ready to complete the proof of Theorem 4.8. Let C be a rectangular code.
Let T* = (V*, E*, A) be the minimal proper trellis for C whose vertices are the future-equivalence
classes, as discussed in Proposition 4.4. Let T = (V, E, A) be any other trellis
for C. Combining Propositions 4.9 and 4.10, we conclude that

    |V_i| ≥ ω_i ≥ |V*_i|   for all i = 1, 2, ..., n     (52)

This implies that T* is the minimal trellis for C, according to Definition 4.1. Furthermore,
by Proposition 4.9, we have |V_i| = ω_i for all i if and only if T is proper. In view of (52),
this implies that any minimal trellis for C must be proper. The fact that any minimal trellis
for C is isomorphic to T* now follows from Proposition 4.5.

Remark. An obvious corollary of Proposition 4.6 and Theorem 4.8 is that every linear code
has a minimal trellis, which is unique up to isomorphism.
Having proved Theorem 4.8, it is natural to ask whether the converse is also true. Namely, is
it true that any block code that has a unique minimal trellis is necessarily rectangular? The
answer to this question turns out to be negative. One counterexample is the ternary code
C = {00, 01, 10, 11, 12, 21, 22} discussed in Example 4.3, which has a unique minimal trellis
depicted in Figure 6b, although it is not rectangular. Complete characterization of the class
of codes that admit a unique minimal trellis representation remains an open problem.
4.2. Constructions of the minimal trellis
The minimal trellis T* was constructed in the previous subsection by identifying the vertices
of T* with equivalence classes induced by future-equivalence. Although this approach works
for any rectangular code C, in this subsection we concentrate on the important special case
of linear codes, and describe several alternative constructions of the minimal trellis.

We point out that all these constructions are really alternative ways of defining the sets of
vertices and edges in the minimal trellis: they might be called constructions by a mathematician,
but should not be construed as 'constructions' in the sense usually assigned to this word
in computer science. The constructions we describe provide useful insight: given a parity-check
or generator matrix for a linear code C, they make it possible to readily determine the
properties of the minimal trellis for C. None of them, however, leads to an algorithm that
explicitly constructs the minimal trellis for a general linear code in polynomial time. As we
shall see in the next section, such an algorithm does not exist: the number of vertices in
the minimal trellis for a linear code of length n grows exponentially with n in most cases.

We describe the constructions due to Bahl, Cocke, Jelinek, and Raviv [2], Massey [80],
Forney [38], and Kschischang-Sorokine [70], in chronological order. Given a linear code C, each
construction specifies a trellis T for C. Thus, in general, one needs to prove that T indeed
represents C and that it is minimal. We will establish minimality for all four constructions.
However, we will prove representation only for the BCJR trellis and the Kschischang-Sorokine
trellis. We refer the reader to [68, 70, 82, 87], where some of the other proofs may be found.
Bahl, Cocke, Jelinek, Raviv construction. Let C be a linear code of length n over IF_q.
Let H = [h_1, h_2, ..., h_n] be a parity-check matrix for C. The BCJR trellis T = (V, E, IF_q) for C
is constructed by identifying the vertices in V_i with partial codeword syndromes, taken with
respect to the first i columns of H. Specifically, the set of vertices at time i is given by

    V_i = { c_1 h_1 + ··· + c_i h_i : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for some c_{i+1}, ..., c_n ∈ IF_q }     (53)

with V_0 = {root} = {0} by convention. Since the syndrome of each codeword is 0 by definition,
we also have V_n = {toor} = {0}. There is an edge e ∈ E_i from a vertex v ∈ V_{i−1} to a vertex
v′ ∈ V_i if and only if there exists a codeword (c_1, c_2, ..., c_n) ∈ C, such that

    c_1 h_1 + c_2 h_2 + ··· + c_{i−1} h_{i−1} = v
    c_1 h_1 + ··· + c_{i−1} h_{i−1} + c_i h_i = v′

The label of this edge is λ(e) = c_i. Let H_i, respectively G_i, denote the matrix consisting
of the first i columns of H, respectively G, where G is a generator matrix for C. Then it is
obvious from the definition of V_i in (53) that

    V_i = column-space H_i G_i^T     (54)

where ^T denotes transposition. Thus the set of vertices at time i is a linear space for all i.
Indeed, V_i is the image of C under the linear mapping σ_i : C → V_i defined by

    c = (c_1, c_2, ..., c_n) ∈ C  ↦  σ_i(c) = c_1 h_1 + ··· + c_{i−1} h_{i−1} + c_i h_i     (55)

The edge set E_i is also a linear space for all i. It is the image of C under the linear mapping
c = (c_1, c_2, ..., c_n) ∈ C ↦ (σ_{i−1}(c), σ_i(c), c_i). Thus when T is the minimal trellis for a linear
code C, it makes sense to consider vertex-spaces (or state-spaces) and edge-spaces (or branch-spaces).
We let s_i = dim V_i = log_q |V_i| and b_i = dim E_i = log_q |E_i| denote the dimensions of
these vector spaces, as in equations (7), (13) of Section 2.
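For small codes, definition (53) can be checked mechanically: enumerate the codewords and collect the partial syndromes. The following Python sketch is ours and is only a brute-force illustration for the binary case (the function name and input format are assumptions; the enumeration is exponential in k, consistent with the remark above that no general polynomial-time construction exists). The matrices used are those of Example 4.6 below.

    import math
    from itertools import product

    def bcjr_vertices(G, H_cols):
        # V_i of the BCJR trellis for the binary code generated by G,
        # per (53): all partial syndromes c_1 h_1 + ... + c_i h_i
        k, n = len(G), len(G[0])
        code = [[sum(u[j] * G[j][l] for j in range(k)) % 2 for l in range(n)]
                for u in product((0, 1), repeat=k)]
        V = []
        for i in range(n + 1):
            V.append({tuple(sum(c[l] * h[r] for l, h in enumerate(H_cols[:i])) % 2
                            for r in range(len(H_cols[0])))
                      for c in code})
        return V

    # the (7,3,3) code of Example 4.6 below: H_cols are the columns of (56)
    H_cols = [(1,0,1,1), (1,1,0,0), (1,0,0,0), (0,1,0,0),
              (0,0,1,1), (0,0,1,0), (0,0,0,1)]
    G = [(1,0,1,0,0,1,1), (1,0,1,0,1,0,0), (0,1,1,1,0,0,0)]
    print([int(math.log2(len(Vi))) for Vi in bcjr_vertices(G, H_cols)])
    # prints the state-complexity profile [0, 1, 2, 2, 1, 1, 1, 0]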
Example 4.6. Consider the (7, 3, 3) binary linear code C, defined by the following parity-check
matrix:

        [ 1 1 1 0 0 0 0 ]
    H = [ 0 1 0 1 0 0 0 ]     (56)
        [ 1 0 0 0 1 1 0 ]
        [ 1 0 0 0 1 0 1 ]

Then by the definition in (53) we have V_0 = V_7 = {0}, while V_1, V_2, V_3, V_4, V_5, V_6 can be
represented according to (54) as the column-spaces of the following matrices

    [1 1 0]  [1 1 1]  [0 0 0]  [0 0 0]  [0 0 0]  [0 0 0]
    [0 0 0]  [0 0 1]  [0 0 1]  [0 0 0]  [0 0 0]  [0 0 0]
    [1 1 0]  [1 1 0]  [1 1 0]  [1 1 0]  [1 0 0]  [0 0 0]     (57)
    [1 1 0]  [1 1 0]  [1 1 0]  [1 1 0]  [1 0 0]  [1 0 0]

respectively. Notice that in order to compute (57), we assumed that the (7, 3, 3) code C defined
by (56) is generated by the following matrix

        [ 1 0 1 0 0 1 1 ]
    G = [ 1 0 1 0 1 0 0 ]
        [ 0 1 1 1 0 0 0 ]

If we were to assume a different generator matrix for C, the matrices in (57) might be
different, but their column-spaces would be the same.
Figure 8. The minimal trellis for C resulting from the BCJR construction
A straightforward inspection of (57) readily shows that the state-complexity profile is given
by {s_0, s_1, s_2, s_3, s_4, s_5, s_6, s_7} = {0, 1, 2, 2, 1, 1, 1, 0}. The resulting BCJR trellis for C is
shown in Figure 8. The edge-complexity profile can also be obtained by inspection of (57)
and Figure 8, as follows: {b_1, b_2, b_3, b_4, b_5, b_6, b_7} = {1, 2, 2, 2, 2, 1, 1}. ◊
We now show that the BCJR trellis T = (V, E, IF_q) indeed represents the linear code C whose
parity-check matrix is H = [h_1, h_2, ..., h_n]. Since codewords of C define the edge set of T, it
is obvious that C ⊆ C(T). It remains to show that C(T) ⊆ C, or in other words that every
path from the root to the toor in T corresponds to a codeword of C. Let P = e_1, e_2, ..., e_n be
such a path. Notice that for each edge e = (v, v′, a) ∈ E_i in T, we have v′ = v + a h_i by the
construction of T. A simple inductive argument now shows that if e_i = (v_{i−1}, v_i, λ(e_i)) is the
i-th edge in P, then

    v_i = v_0 + λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_i) h_i     (58)

In particular, taking i = n in (58) we obtain v_n = v_0 + λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_n) h_n.
Since v_n = toor = 0 and v_0 = root = 0, it follows that λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_n) h_n = 0.
Thus the sequence of edge labels along P has syndrome 0 with respect to the parity-check
matrix H, and therefore corresponds to a codeword of C.


Theorem 4.11. The BCJR construction produces the minimal trellis.

Proof. Referring to Proposition 4.4 and Proposition 4.10 of the foregoing subsection, to
establish the minimality of the BCJR trellis T it would suffice to prove that if two codewords
of P*_i are future-equivalent then they are also T-equivalent, for all i = 1, 2, ..., n. Thus let
c, c′ ∈ P*_i be future-equivalent, and let x = (x_{i+1}, x_{i+2}, ..., x_n) be any element of the common
future F(c) = F(c′). Then H(c, x)^T = H(c′, x)^T = 0. This obviously implies that

    c_1 h_1 + ··· + c_i h_i = −(x_{i+1} h_{i+1} + ··· + x_n h_n) = c′_1 h_1 + ··· + c′_i h_i

It now follows from the definition of V_i in (53) that the paths in T corresponding to c and c′
end at the same vertex of V_i. Hence c and c′ are T-equivalent.

The fact that the BCJR trellis is minimal was first established in [128] and [82]. However,
the proof of Theorem 4.11 presented here differs from the argument of [82, 128].

Massey construction. We begin with some necessary definitions. Given a nonzero vector
x = (x_1, x_2, ..., x_n) over IF_q, we let L(x) denote the smallest integer i such that x_i ≠ 0. We
call L(x) the left index of x. Given a k × n matrix M = [x_{ij}] over IF_q, we say that M is in
row-reduced echelon form if the rows x_1, x_2, ..., x_k of M are such that

    L(x_1) < L(x_2) < ··· < L(x_k)

and the k columns found at positions L(x_1), L(x_2), ..., L(x_k) in M are all of weight one: if
j = L(x_i), then x_{ij} ≠ 0 is the only nonzero entry in the j-th column of M.

Let C be a linear code of length n and dimension k over IF_q, and let G be a k × n generator
matrix for C. Without loss of generality, we assume that G is in row-reduced echelon form
and denote the left indices of its rows by ℓ_1, ℓ_2, ..., ℓ_k. This implies that ℓ_1 < ℓ_2 < ··· < ℓ_k,
and that the k positions ℓ_1, ℓ_2, ..., ℓ_k form an information set for C. Thus if

    (c_1, c_2, ..., c_n) = (u_1, u_2, ..., u_k) G

then (c_{ℓ_1}, c_{ℓ_2}, ..., c_{ℓ_k}) = (u_1, u_2, ..., u_k) may be called the information symbols. We refer to
the remaining n − k symbols in (c_1, c_2, ..., c_n) as parity symbols. One may think of the parity
symbols in each codeword as being determined by the information symbols in that codeword.
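Both notions are easy to exercise in code. A small Python sketch of ours (the function names are illustrative) extracts the left indices and checks the row-reduced echelon condition; run on the matrix (62) of Example 4.7 below, it also recovers the information set.

    def left_index(x):
        # L(x): 1-based position of the first nonzero entry of a nonzero vector
        return next(j + 1 for j, xj in enumerate(x) if xj != 0)

    def is_row_reduced_echelon(M):
        # check L(x_1) < ... < L(x_k) and that each column at a
        # left-index position has exactly one nonzero entry
        L = [left_index(row) for row in M]
        if any(a >= b for a, b in zip(L, L[1:])):
            return False
        return all(sum(1 for row in M if row[j - 1] != 0) == 1 for j in L)

    G = [(1,0,1,0,0,1,1), (0,1,1,1,0,0,0), (0,0,0,0,1,1,1)]   # matrix (62)
    print(is_row_reduced_echelon(G), [left_index(r) for r in G])
    # True [1, 2, 5] -- the information set of Example 4.7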
The Massey trellis T = (V, E, IF_q) for C is constructed by identifying the vertices in V_i with
the parity symbols that are yet to be observed at time i, as determined by the information
symbols that have been already observed at time i, assuming that all the other information
symbols are zero. More precisely, let m be the largest integer such that ℓ_m ≤ i. Then

    V_i = { (c_{i+1}, c_{i+2}, ..., c_n) : (c_1, c_2, ..., c_n) = (u_1, u_2, ..., u_m, 0, ..., 0) G }     (59)

where u_1, u_2, ..., u_m ∈ IF_q range over all the q^m possible values. We have V_0 = {0}, while
V_n = {ε} by convention, where ε is the empty string. The edge set of T is defined as follows,
distinguishing between two cases. If i > ℓ_m, then there is an edge e ∈ E_i from a vertex
v ∈ V_{i−1} to a vertex v′ ∈ V_i if and only if there exists a codeword (c_1, c_2, ..., c_n) ∈ C, such that

    (c_i, c_{i+1}, ..., c_n) = v        (c_{i+1}, ..., c_n) = v′

The label of this edge is λ(e) = c_i. Notice that in this case, for each vertex v ∈ V_{i−1}, there
is exactly one edge that begins at v. On the other hand, if i = ℓ_m, then there is an edge
e ∈ E_i from a vertex v ∈ V_{i−1} to a vertex v′ ∈ V_i if and only if there exists a pair of codewords
c = (c_1, c_2, ..., c_n) and c′ = (c′_1, c′_2, ..., c′_n) in C, such that

    (c_i, c_{i+1}, ..., c_n) = v        (c′_{i+1}, ..., c′_n) = v′

and either c′ = c, or α(c′ − c) equals the m-th row of G for some nonzero constant α ∈ IF_q.
The label of this edge is λ(e) = c′_i. In this case, each vertex v ∈ V_{i−1} will have out-degree q.

Theorem 4.12. The Massey construction produces the minimal trellis.

Proof. It is obvious that the set of vertices at time i in the Massey trellis is again a linear
space for all i. Let C′ be the subcode of C generated by the first m rows of G. Then V_i is
the linear code obtained by puncturing out the first i positions of C′. This code is generated
by the matrix G_{m,n−i} consisting of the first m rows and the last n−i columns of G. Hence

    V_i = row-space G_{m,n−i}     (60)

Now let H be a parity-check matrix for C, and let Ḡ_{n−i}, H̄_{n−i} denote the matrices consisting
of the last n − i columns of G and H, respectively. Then it is easy to see that

    rank G_{m,n−i} = rank H̄_{n−i} Ḡ_{n−i}^T = rank H_i G_i^T     (61)

It follows that the Massey trellis and the BCJR trellis have the same number of vertices at
all times. Since the BCJR trellis is minimal, then so is the Massey trellis.
Example 4.7. Consider again the (7, 3, 3) binary linear code C of Example 4.6. Starting
with the generator matrix for C given in Example 4.6, we obtain the generator matrix

        [ 1 0 1 0 0 1 1 ]
    G = [ 0 1 1 1 0 0 0 ]     (62)
        [ 0 0 0 0 1 1 1 ]

in row-reduced echelon form by means of elementary row operations (it is well known [25, 81]
that every linear code has a generator matrix in row-reduced echelon form). The left indices
of the rows in (62) are given by ℓ_1 = 1, ℓ_2 = 2, and ℓ_3 = 5. The Massey trellis for C can
now be constructed as follows. According to (59) and (60), we set V_0 = {0}, V_7 = {ε}, and
identify the sets of vertices V_1, V_2, V_3, V_4, V_5, V_6 with the row-spaces of the following matrices:

    [ 0 1 0 0 1 1 ],   [ 1 0 0 1 1 ]   [ 0 0 1 1 ]   [ 0 1 1 ]   [ 1 1 ]   [ 1 ]
                       [ 1 1 0 0 0 ],  [ 1 0 0 0 ],  [ 0 0 0 ],  [ 0 0 ],  [ 0 ]
                                                                 [ 1 1 ]   [ 1 ]

respectively. The resulting trellis for C is depicted in Figure 9. It is easy to see that this
trellis is isomorphic to the BCJR trellis in Figure 8.
Figure 9. The minimal trellis for C resulting from the Massey construction
Thus the state-complexity profiles, as well as all other measures of trellis complexity, are the
same, although the vertices and edges in Figures 9 and 8 have different interpretations. ◊

Forney construction. Let C be a linear code of length n over IF_q. Recall that the projections
of C on the past and future at time i were defined in (37),(38) as punctured versions
of C. We now define, for each i = 1, 2, ..., n−1, two shortened versions of C as follows:

    P_i = { (c_1, c_2, ..., c_i) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for c_{i+1} = ··· = c_n = 0 }     (63)

    F_i = { (c_{i+1}, c_{i+2}, ..., c_n) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for c_1 = ··· = c_i = 0 }     (64)

with P_n = F_0 = C and P_0 = F_n = {0}, by convention. Thus P_i consists of those codewords
of C whose support lies entirely in the past, while F_i consists of those codewords of C whose
support lies entirely in the future at time i. The codes P_i and F_i are known [38, 39, 74, 82]
as the past subcode and the future subcode of C at time i, respectively.

Evidently, the direct sum P_i ⊕ F_i is a linear subcode of C. The Forney trellis T = (V, E, IF_q)
for C is constructed by identifying the vertices in V_i with the cosets of P_i ⊕ F_i in C, namely:

    V_i = C/(P_i ⊕ F_i)   for i = 0, 1, ..., n     (65)

Since F_0 = P_n = C, we obviously have P_0 ⊕ F_0 = P_n ⊕ F_n = C. It follows that V_0 and V_n
both consist of a single coset, which is C itself. The edge set of T is defined as follows.

There is an edge e ∈ E_i from a vertex v ∈ V_{i−1} to a vertex v′ ∈ V_i if and only if there exists
a codeword c = (c_1, c_2, ..., c_n) ∈ C, such that c ∈ v ∩ v′. The label of this edge is λ(e) = c_i.
Notice that although the intersection v ∩ v′ may in general contain several codewords, all
these codewords coincide in the i-th position and correspond to the single edge e = (v, v′, c_i),
unless C contains a codeword of weight one whose nonzero entry is at the i-th position. In
the latter case, there will be q distinctly labeled edges from v to v′.

Theorem 4.13. The Forney construction produces the minimal trellis.

Proof. Let H be a parity-check matrix for C, and recall the linear mapping σ_i(·) defined
in (55). Given c ∈ C, it is easy to see that σ_i(c) = 0 if and only if c ∈ P_i ⊕ F_i. This is so
because c_1 h_1 + ··· + c_i h_i = 0 if and only if c = a + b, where a = (c_1, ..., c_i, 0, ..., 0) ∈ P_i
and b = (0, ..., 0, c_{i+1}, ..., c_n) ∈ F_i. Thus P_i ⊕ F_i is precisely the kernel of σ_i(·), and we have

    dim σ_i(C) = dim C − dim (P_i ⊕ F_i)     (66)

It follows that the number of elements in σ_i(C) is equal to the number of cosets of P_i ⊕ F_i
in C. This means that the Forney trellis and the BCJR trellis have the same number of
vertices at all times. Since the BCJR trellis is minimal, then so is the Forney trellis.
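For small codes, the vertex counts of the Forney trellis can be obtained by brute force directly from (63)-(65). Below is a Python sketch of ours for the binary case (the function name and input format are assumptions); applied to the (7, 3, 3) code of the running examples, it prints the vertex-cardinality profile |V_0|, ..., |V_7|.

    from itertools import product

    def forney_profile(code):
        # |V_i| = |C| / |P_i (+) F_i| for a binary linear code given as
        # a set of 0/1 tuples, per (63)-(65)
        n = len(next(iter(code)))
        sizes = []
        for i in range(n + 1):
            P = [c for c in code if all(x == 0 for x in c[i:])]  # past subcode
            F = [c for c in code if all(x == 0 for x in c[:i])]  # future subcode
            ds = {tuple((a + b) % 2 for a, b in zip(p, f)) for p in P for f in F}
            sizes.append(len(code) // len(ds))
        return sizes

    # the (7,3,3) code of Examples 4.6-4.8
    G = [(1,0,1,0,0,1,1), (1,0,1,0,1,0,0), (0,1,1,1,0,0,0)]
    code = {tuple(sum(u[j] * G[j][l] for j in range(3)) % 2 for l in range(7))
            for u in product((0, 1), repeat=3)}
    print(forney_profile(code))    # [1, 2, 4, 4, 2, 2, 2, 1]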
Example 4.8. Consider again the (7, 3, 3) binary linear code C of Example 4.6 and Example 4.7.
The past and future subcodes of C can be easily found by inspection of (62). We have
P_0 = P_1 = P_2 = P_3 = {0}, P_4 = {0, 0111000}, P_5 = P_6 = {0, 0111000, 1010100, 1101100},
and P_7 = C; similarly, F_0 = C, F_1 = {0, 0111000, 0000111, 0111111}, F_2 = F_3 = F_4 = {0, 0000111},
and F_5 = F_6 = F_7 = {0}. The direct-sum subcodes P_i ⊕ F_i are thus given by:

    P_1 ⊕ F_1 = {0, 0111000, 0000111, 0111111}
    P_2 ⊕ F_2 = {0, 0000111}
    P_3 ⊕ F_3 = {0, 0000111}
    P_4 ⊕ F_4 = {0, 0111000, 0000111, 0111111}
    P_5 ⊕ F_5 = {0, 0111000, 1010100, 1101100}
    P_6 ⊕ F_6 = {0, 0111000, 1010100, 1101100}

This determines the coset structure C/(P_i ⊕ F_i) at all times. In particular, we can conclude
by inspection of the above that V_1 = V_4, V_2 = V_3, and V_5 = V_6. Explicitly, we have:

    V_1 = { {0, 0111000, 0000111, 0111111}, {1010100, 1101100, 1010011, 1101011} }
    V_2 = { {0, 0000111}, {1010100, 1010011}, {0111000, 0111111}, {1101100, 1101011} }
    V_3 = { {0, 0000111}, {1010100, 1010011}, {0111000, 0111111}, {1101100, 1101011} }
    V_4 = { {0, 0111000, 0000111, 0111111}, {1010100, 1101100, 1010011, 1101011} }
    V_5 = { {0, 0111000, 1010100, 1101100}, {0000111, 0111111, 1010011, 1101011} }
    V_6 = { {0, 0111000, 1010100, 1101100}, {0000111, 0111111, 1010011, 1101011} }

The resulting trellis for C is depicted in Figure 10. The vertices in Figure 10 are labeled by the
representatives of the corresponding cosets in C/(P_i ⊕ F_i). For each v ∈ V, we have chosen the
coset representative as the first vector that appears in the description of v in the above expressions
for V_1, V_2, ..., V_6. For example, the vertex {1010100, 1101100, 1010011, 1101011} ∈ V_1
is labeled by ⟨1010100⟩, while the vertex {0111000, 0111111} ∈ V_2 is labeled by ⟨0111000⟩.

Figure 10. The minimal trellis for C resulting from the Forney construction
Thus the Forney trellis for C is isomorphic to the BCJR trellis and to the Massey trellis,
although the 'meaning' of the edges and vertices in the Forney trellis is completely different. ◊

Forney observes in [38] that there are two alternative ways to define the vertices in the
minimal trellis for C in terms of past and future subcodes of C. Indeed, it is shown in [38,
39, 82] that the following quotient groups are isomorphic:

    P*_i/P_i ≅ F*_i/F_i ≅ C/(P_i ⊕ F_i)     (67)

This means that it is also possible to think of the vertices in the Forney trellis for C either
as cosets of P_i in P*_i or as cosets of F_i in F*_i. Similar results are known in linear system
theory [42, 95, 125] as past-induced and future-induced canonical realizations. Notice that
the cosets of P_i in P*_i are precisely the future-equivalence classes defined in the foregoing
subsection. This, then, constitutes another proof of the fact that the Forney trellis is minimal.

Kschischang-Sorokine construction. This construction is different from the previous
three in that it specifies a variety of non-isomorphic trellises for C, only one of which is
minimal. The main idea is to represent C as a sum of certain elementary subcodes, and then
construct a trellis for C as a product of the minimal trellises for these elementary subcodes.

To this end, we first need to define the trellis product operation. This operation takes two
trellises T′ and T″ of depth n, and produces a trellis T = T′ × T″ with the following property:
if C_1 = C(T′) and C_2 = C(T″), then the product trellis T represents the code

    C = C_1 + C_2 = { c_1 + c_2 : c_1 ∈ C_1 and c_2 ∈ C_2 }     (68)

We will usually assume that C_1 and C_2 are linear codes of length n over IF_q, in which case
the + in (68) is the ordinary vector addition in IF_q^n and C is a linear code. Kschischang

and Sorokine [70] point out, however, that the trellis product operation works in a more
general setting. To start with, either C_1, or C_2, or both can be nonlinear codes. For
example, the nonlinear Nordstrom-Robinson code N_16 over IF_2 can be written as C_1 + C_2,
where C_1 is the (16, 5, 8) first-order Reed-Muller code and C_2 is a nonlinear code with
8 codewords [49, 79, 110]. Thus a trellis for N_16 can be constructed as a product of the
trellises for C_1 and C_2. Even more generally, the codes C_1 and C_2 can be defined over a non-abelian
semigroup, in which case the + in (68) is the componentwise semigroup operation.
This is how the trellis product T = T′ × T″ is defined. Let T′ = (V′, E′, A) and T″ = (V″, E″, A)
be trellises of depth n. Then the set of vertices at time i in T is the Cartesian product:

    V_i = V′_i × V″_i = { (v′, v″) : v′ ∈ V′_i and v″ ∈ V″_i }     (69)

There is an edge e ∈ E_i in T from a vertex (v′_1, v″_1) ∈ V_{i−1} to a vertex (v′_2, v″_2) ∈ V_i if and only
if (v′_1, v′_2, a′) is an edge in E′_i for some a′ ∈ A, and (v″_1, v″_2, a″) is an edge in E″_i for some a″ ∈ A.
The label of this edge e ∈ E_i is the sum λ(e) = a′ + a″. In other words, the set of edges at
time i in T can also be interpreted as the Cartesian product:

    E_i = { ((v′_1, v″_1), (v′_2, v″_2), a′ + a″) : (v′_1, v′_2, a′) ∈ E′_i and (v″_1, v″_2, a″) ∈ E″_i }     (70)

We see from (69) and (70) that the state-cardinality and edge-cardinality profiles of T′ × T″
are the componentwise products of the corresponding profiles of T′ and T″, namely

    |V_i| = |V′_i| · |V″_i|   for i = 0, 1, ..., n     (71)
    |E_i| = |E′_i| · |E″_i|   for i = 1, 2, ..., n     (72)

Notice, however, that the product trellis T is not necessarily the minimal trellis for C_1 + C_2,
even if both T′ and T″ are minimal trellises for C_1 and C_2, respectively.
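The product operation itself is only a few lines of code. A Python sketch of ours (the per-section edge-list format is an assumption), which also reproduces the edge counts of the product trellis of Example 4.9 below:

    def trellis_product(T1, T2):
        # product of two depth-n trellises over IF_2, per (69),(70);
        # each trellis is a list of n sections, and each section is a
        # list of (start_vertex, end_vertex, label) triples; vertices
        # of the product are pairs, labels are added mod 2
        return [[((u1, u2), (v1, v2), (a1 + a2) % 2)
                 for (u1, v1, a1) in s1
                 for (u2, v2, a2) in s2]
                for s1, s2 in zip(T1, T2)]

    # Example 4.9: repetition code {000, 111} times single-parity-check code
    T_rep = [[(0, 0, 0), (0, 1, 1)],
             [(0, 0, 0), (1, 1, 1)],
             [(0, 0, 0), (1, 0, 1)]]
    T_spc = [[(0, 0, 0), (0, 1, 1)],
             [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)],
             [(0, 0, 0), (1, 0, 1)]]
    prod = trellis_product(T_rep, T_spc)
    print([len(s) for s in prod])    # edge counts per section: [4, 8, 4]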

Example 4.9. A trellis T′ for the (3, 1, 3) binary repetition code C_1 and a trellis T″ for the
(3, 2, 2) binary single-parity-check code C_2 are depicted in Figures 11a and 11b, respectively.

Figure 11. Two trellises and their product

Their product T = T′ × T″, depicted in Figure 11c, represents the (3, 3, 1) code IF_2^3 = C_1 + C_2.
It is not difficult to see that T′ and T″ are minimal trellises for C_1 and C_2, respectively.
However, their product T is certainly not the minimal trellis for C_1 + C_2. ◊

It is obvious that the trellis product operation is associative. Since the order of the vertex
labels in (69) has no significance, the trellis product operation is commutative if and only if
the addition operation + in (70) is abelian, as we hereafter assume. It follows that an
expression of the form

    T = T_1 × T_2 × ··· × T_k

is well-defined, providing only that the trellises T_1, T_2, ..., T_k all have the same depth. Furthermore,
multiplying T_1, T_2, ..., T_k in any order produces trellises isomorphic to T.

Now let C be a linear code of length n and dimension k over IF_q, and let G be a generator matrix
for C. The rows of G, hereafter called generators and denoted x_1, x_2, ..., x_k, form a basis
for C. Each row x_i generates a one-dimensional subcode of C, which we denote by ⟨x_i⟩. Thus

    C = ⟨x_1⟩ + ⟨x_2⟩ + ··· + ⟨x_k⟩     (73)

It follows from (73) that if T_1, T_2, ..., T_k are trellises for ⟨x_1⟩, ⟨x_2⟩, ..., ⟨x_k⟩, respectively,
then their product represents C. We denote the minimal trellis for ⟨x_i⟩ by T_{x_i}, and define

    T_G = T_{x_1} × T_{x_2} × ··· × T_{x_k}     (74)

The trellis T_G is known as the Kschischang-Sorokine trellis for C, based on the generator
matrix G = (x_1, x_2, ..., x_k). Even though each of T_{x_1}, T_{x_2}, ..., T_{x_k} is the minimal trellis
for the corresponding one-dimensional subcode of C by definition, their product T_G is not
necessarily the minimal trellis for C. On the other hand, an appropriate choice of the
generators x_1, x_2, ..., x_k does make T_G minimal. To find these generators, we first need to
analyze the structure of the elementary trellises T_{x_1}, T_{x_2}, ..., T_{x_k}.


Given a nonzero codeword x = (x_1, x_2, ..., x_n) ∈ C, the structure of T_x depends solely on the
support interval of x, called the span of x in the trellis literature. The notion of span is
a very simple concept, which nonetheless turns out to be ubiquitous in the study of trellises.
We therefore pause to give a precise definition. Recall that the left index L(x) of x was
defined as the smallest integer i such that x_i ≠ 0. Similarly, we define the right index R(x)
of x as the largest integer i such that x_i ≠ 0.

Definition 4.6. The span of a nonzero codeword x ∈ C, denoted [x], is the non-empty interval

    [x] = [L(x), R(x)] ⊆ {1, 2, ..., n}

We say that x starts at L(x), ends at R(x), and is active in the interval [L(x), R(x) − 1].
The span of 0 is taken as the empty interval [ ] by convention. Further, if L(x) = R(x) then
wt(x) = 1 and x is never active. The length of [x] counts the number of times during which
x is active; it is defined as s[x] = R(x) − L(x).

The minimal trellis T_x for the binary code ⟨x⟩ generated by a codeword x with span [a, b]
is shown in Figure 12. It is not difficult to see that this trellis is indeed the minimal trellis
for ⟨x⟩. Furthermore, the trellis in Figure 12 can be easily modified to accommodate nonbinary
codes: if ⟨x⟩ were a code over IF_q, then T_x would have q vertices at times a, a+1, ..., b−1,
corresponding to the q different multiples 0·x, 1·x, ..., (q−1)·x of the generator x, with each such
vertex having a single predecessor and a single successor. If the generator x has span [a, a],
then T_x will have just one vertex at each time, with q distinctly labeled edges connecting
the vertex at time a − 1 with the vertex at time a.
Figure 12. The minimal trellis for ⟨x⟩


The structure of the elementary trellis T_x in Figure 12, along with (71), makes it easy to count
vertices in the Kschischang-Sorokine trellis. The number of vertices at time i in the trellis
T_G = (V, E, IF_q) is given by |V_i| = q^{s_i}, where s_i is precisely the number of generators in G
that are active at time i. This further implies that

    σ = log_q ∏_{i=0}^{n} |V_i| = s_0 + s_1 + ··· + s_n = s[x_1] + s[x_2] + ··· + s[x_k]     (75)

where σ is the total span (hence the name) of the trellis, as defined in (9). This suggests
that to minimize the number of vertices in the Kschischang-Sorokine trellis, we need to find
a "short" basis for the code, that is, a set of generators whose spans are as short as possible.
Denition 4.7. A generator matrix G for a linear code C of length n and dimension k is
said to be in minimal span form if the total span of G is as small as possible, namely:
n o
sx ] + sx ] +    + sxk ] = x0 xmin
1 2
0 :::x0
s  x ] + s x ] +    + s  xk ] 0
1
0
2
0

1 2 k

where fx  x  : : :  xk g is the set of rows of G, and the minimum in the above expression is
taken over all the (qn;1)(qn;q)    (qn ; qk ) bases x  x  : : :  xk for C .
1 2
;1 0 0 0
1 2

Intuitively, if the lengths of x ] x ] : : :  xk ] are small, each generator will be active over
1 2

a short period of time only, and hence will contribute as little as possible to the vertex count.
As pointed out in 70], the notion that \shortest generators" determine minimal trellises for
linear codes has been repeatedly rediscovered 42, 91, 94].
Theorem 4.14. The Kschischang-Sorokine trellis TG based on a generator matrix G is the
minimal trellis for C if and only if G is in minimal span form.
Proof. (() Suppose that G is in minimal span form. Then if x  x  : : :  xk are the rows 1 2

of G, no two of them can end at the same position. To see that this is so, assume to the
contrary that R(xi) = R(xj ) for some xi and xj , and w.l.o.g. suppose that L(xi)  L(xj ).
Then clearly xj ]  xi ], and replacing xi with an appropriate linear combination of xi and xj
43
produces a generator matrix for C whose total span is strictly less than the total span of G,
contradicting Denition 4.7. It follows that we can arrange the rows of G in such a way that
R(x ) < R(x ) <    < R(xk )
1 2 (76)
This makes it clear that the dimension of the past subcode Pi, as dened in (63), is equal
to the number of rows of G that end at time i or earlier. Indeed, dene pi as the largest
integer such that R(xpi )  i. Then obviously x  x  : : :  xpi 2 Pi, so that dim Pi  pi. Now
assume to the contrary that there exists a codeword c 2 Pi which is not a linear combination
1 2

of x  x  : : :  xpi . Then R(c) = R(xj ) for some j  pi + 1 in view of (76). Furthermore,


R(xj )  R(xpi ) > i, which is a contradiction. Hence dim Pi = pi, as claimed. By a similar
1 2

+1

argument, no two rows of G can start at the same position. This means that we can rearrange
the rows of G, possibly in a way dierent from (76), such that
L(x ) < L(x ) <    < L(xk )
1 2 (77)
This rearrangement makes it clear that the dimension of the future subcode Fi , as dened
in (64), is equal to the number of rows of G that start at time i + 1 or later. It now follows
from Denition 4.6 that the number of rows of G that are active at time i is given by
si = k ; dim Pi ; dim Fi (78)
Recall that jVi j = qsi in the Kschischang-Sorokine trellis. This means that the number of
vertices in TG is equal to the number of vertices in the Forney trellis at all times. Since the
Forney trellis is minimal, so is TG . ()) The `only if' part is straightforward. The minimal
trellis certainly has the minimal possible total span, by denition. Hence if G is not in
minimal span form, then TG cannot be minimal in view of (75).
The question now arises as to how to produce a generator matrix for a given code which is in
minimal span form. It turns out that this question has a simple and elegant answer. First,
the following theorem makes it possible to easily recognize a matrix in minimal span form.
Theorem 4.15. A generator matrix is in minimal span form if and only if it does not contain
rows that start at the same position or end at the same position.
The ()) part of Theorem 4.15 was established as a by-product in the proof of Theorem 4.14.
We refer the reader to 82] and 70] for a complete proof of Theorem 4.15. The conversion of an
arbitrary set of generators x  x  : : :  xk to the minimal span form can be now accomplished
1 2

by a greedy sequence of elementary row operations, as follows:


/* Greedy conversion to minimal span form */
 
while 9xi xj such that L(xi) = L(xj ) or R(xi) = R(xj ) do (79)
f
if xj ]  xi ] then xi := xi + xj
else xj := xj + xi
g
The foregoing piece of pseudo-code assumes operations over IF , to simplify the notation.
2

It is clear that the algorithm extends in the obvious way to non-binary codes. The algorithm
necessarily terminates in at most kn steps, since the total span sx ] +    + sxk ] strictly
1

44
decreases at each step. In fact, Kschischang and Sorokine 70] show that O(k ) steps are 2

sucient. In any case, when the algorithm does terminate, the condition of while() in (79)
fails to holds, and the generators x  x  : : :  xk are in minimal span form by Theorem 4.15.
1 2

Example 4.10. We return, for the nal time, to the (7 3 3) binary linear code C studied in
Examples 4.6, 4.7, and 4.8. Consider the generator matrix G for C in row-reduced echelon 1

form, which is given in (62). The corresponding trellis TG1 is shown in Figure 13.
0 0 0 0 0 0 0

1 0 0 0 0 0 1 0 0 0 0 0 0 0

1 1
0 0 0 0 0 0 0 0 1 0 0 1
1 1 1 1
1 1 1 1 1
1 1 1 1
0 0 0 0 0 0 0 0 0

1 1 1

Figure 13. The Kschischang-Sorokine trellis for C based on row-reduced echelon matrix
Evidently, G is not in minimal span form, as the rst row x and the third row x both end
at time i = 7. Since x ]  x ], we replace x by x + x . This produces the matrix
1 1 3

2 3
3 1 1 1 3

1 0 1 0 1 0 0
G = 64 0 1 1 1 0 0 0 75 (80)
0 0 0 0 1 1 1
2

The spans of the rows of G , shown in boldface in (80), are 1 5], 2 4], and 5 7]. Thus G is
2 2

in minimal span form by Theorem 4.15. The corresponding trellis TG2 is shown in Figure 14.
0 0 0 0 0 0 0

1 0 1 0 1 0 0 0 0 0 0 0

1 1
0 1 0 0 1
0 0 0 0 0 0 0
1 1
1
1 1 1
1 1
0
0 0 0 0 0 0 0

1 1 1

Figure 14. The minimal trellis for C resulting from the Kschischang-Sorokine construction
It is again apparent that this trellis is isomorphic to the BCJR, Massey, and Forney trellises.
Further observe that the state-complexity prole fs  s  s  s  s  s  s g = f1 2 2 1 1 1 0g
1 2 3 4 5 6 7

simply counts the number of rows of G that are active at each time. 2
}
45
It is well-known 25, 56, 81] that a generator matrix in row-reduced echelon form is unique.
It is natural to ask whether the same is true for the minimal span form: given two generator
matrices G  G in minimal span form, are the rows of G necessarily a permutation of the
1 2 2

rows of G ? The answer to this question is negative. For example, the following two matrices:
1

21 1 1 1 0 0 0 03 21 1 1 1 0 0 0 03
6 7 6 7
G = 664 00 00 10 10 11 11 10 10 775
1 G = 664 00 00 10 10 11 11 10 10 775 (81)
2

0 1 1 0 0 1 1 0 0 1 0 1 1 0 1 0
generate the same (8 4 4) extended Hamming code, and are both in minimal span form. The
corresponding Kschischang-Sorokine trellis is depicted in Figure 3. It is shown in 70, 82],
however, that any two generator matrices for the same code that are in minimal span form
have the same set of row spans. These row spans are thus uniquely determined by the code.
They are called atomic spans by Kschischang and Sorokine in 70]. For example, both matri-
ces G and G in (81) have the same atomic row spans 1 4] 3 6] 5 8] 2 7]. It is not dicult
1 2

to see that the minimal span form is unique if and only if the atomic spans x ] x ] : : :  xk ]
form an antichain: xi] is not a proper subset of xj ] for all i 6= j .
1 2

Finally, we note that a generator matrix in minimal span form is also called a trellis-oriented
generator matrix in some papers 5, 7, 38, 39, 128]. This terminology is natural since, as we
have seen, the state-complexity prole of the minimal trellis for C can be read-o directly
from a trellis-oriented generator matrix. We shall hereafter use the terms \trellis-oriented
generator matrix" and \generator matrix in minimal span form" interchangeably.

We have shown in Theorems 4.11, 4.12, 4.13, and 4.14 that the trellises resulting from the
BCJR 2], Massey 80], Forney 38], and Kschischang-Sorokine 70] constructions are minimal.
It therefore follows from Theorem 4.8 that all the four constructions produce one and the
same trellis up to isomorphism. Each construction, however, provides a dierent insight into
its properties. These are investigated in more detail in the next subsection.
We point out that several alternative constructions of the minimal trellis can be found in the
literature. Notably, Forney and Trott 42] extend the approach of (65) to the general class of
group codes, thereby establishing a connection to the results of Willems 125] in behavioral
system theory. Still further generalizations along these lines can be found in 43, 68, 78, 117].
On the other hand, Laerty and Vardy 72] give a construction of the minimal trellis through
a step-by-step merging algorithm that mimics the construction of ordered binary decision
diagrams for Boolean functions, due to Bryant 14]. The construction of 72] thus establishes
an interesting connection between minimal trellises for binary codes and decision diagrams
for Boolean functions, a topic extensively studied in the computer engineering literature.
See 15, 16] for a recent survey of results on ordered binary decision diagrams.

46
4.3. Properties of the minimal trellis
The constructions described in the previous subsection make it possible to readily deter-
mine the structural properties of the minimal trellis for a linear code C , such as state-
complexity and edge-complexity proles, the expansion index, the in-degree and out-degree
of each vertex, and so forth. We will express all these in terms of the dimensions of the
subcodes Pi  Fi  Pi Fi dened in (37),(38),(63),(64), respectively.
 

We introduce the following notation: the dimensions of the past and future subcodes will
be denoted by pi = dim Pi and fi = dim Fi , while the dimensions of the past and future
projections will be denoted by pi = dim Pi and fi = dim Fi , with p = fn = 0 by convention.
     

It is well-known 39, 82, 87] and obvious that the sequences p  p  : : :  pn and p  p  : : :  pn
0
  
1 2

are non-decreasing, while the sequences f  f  : : :  fn and f  f  : : :  fn are non-increasing.


1 2
  
1 2 1 2

Each of the four sequences has a simple interpretation in terms of a trellis-oriented generator
matrix for C , as summarized in the following proposition.
Proposition 4.16. Let C be a linear code of length n over IFq . If G is a generator matrix
for C in minimal span form, then:
dim Pi = # of rows of G that end at time i or earlier (82)
dim Pi = # of rows of G that start at time i or earlier

(83)
dim Fi = # of rows of G that start at time i + 1 or later (84)
dim Fi = # of rows of G that end at time i + 1 or later

(85)
Furthermore, if (82),(83) or (82),(84) or (83),(85) or (84),(85) hold for all i = 1 2 : : :  n;1,
then G is necessarily in minimal span form.
Proof. Equations (82) and (84) were already established in the proof of Theorem 4.14.
Equations (83) and (85) follow from (77) and (76) by a similar argument. If (82),(84) hold for
all i, then G is in minimal span form by (78) and (75). The other cases are all equivalent.
Example 4.11. For the (8 4 4) extended binary Hamming code, we can start with the
trellis-oriented generator matrix G in (81) and read-o the dimensions of the past and
future subcodes and projections Pi Pi  Fi  Fi as follows:
1
 

2 3
1 1 1 1 0 0 0
66 0 0 1 1 1 1 0
0
64 0 777
0 0 0 0 1 1 1 15
0 1 1 0 0 1 1 0
pi : 0 0 0 1 1 2 3 4
(86)
pi : 1 2 3 3 4 4 4

4
fi : 3 2 1 1 0 0 0 0
fi : 4 4 4 3 3 2 1

0

The sum of the entries in the second and third row in (86) is always equal to dim C = 4.
Similarly, for the rst and fourth rows. We shall see shortly that this is not a coincidence. }
47
Indeed, one can partition the rows of G according to whether they start at time  i as in (83)
or at time  i + 1 as in (84). Thus Proposition 4.16 establishes the following useful relations
k = pi + fi = pi + fi

for i = 0 1 : : :  n

(87)
The second equality in (87) follows from (82) and (85), by partitioning the k rows of G into
those that end at time  i and those that end at time  i + 1. We conclude that, given the
dimension k, any two of pi pi  fi  fi , except pi  fi and pi fi , determine the rest.
   

The following theorem counts the number of vertices and edges in the minimal trellis for C .
The proof makes essential use of the constructions presented in the foregoing subsection.
Theorem 4.17. Let T = (V E IFq ) be the minimal trellis for a linear code C over IFq . Then
jVij = qk pi fi = qpi pi = qfi fi
; ; ;
for i = 0 1 : : :  n
;
(88)
j Ei j = q k p
;i ; fi1;
= qi p p i ; ; f
= q i;1 fi for i = 1 2 : : :  n
1; (89)
Proof. Think of T = (V E IFq ) as the Forney trellis for C . Then (88) follows immediately
from (67). To prove (89), it is most convenient to think of T as the BCJR trellis for C . Then
Ei may be regarded as the image of C under the linear mapping
c = (c  c  : : :  cn) 2 C !
1 2 7 i (c) = ( i (c) i(c) ci)

(90)
;1

where i(c) is the BCJR mapping dened in (55), as we have already observed in the foregoing
subsection. It follows that the dimension of the edge-space is given by
dim Ei = dim C ; dim ker i (C ) 
(91)
It is obvious from (90) that ker i (C ) = ker i (C ) \ ker i(C ) \ C i , where C i is the set of


all (c  c  : : :  cn) 2 C such that ci = 0. But ker i(C ) = Pi  Fi as we have found in (66).
;1

Since the past and future subcodes of C are nested, namely Pi  Pi and Fi  Fi for
1 2

all i = 1 2 : : :  n, we have (Pi  Fi ) \ (Pi  Fi ) = Pi  Fi. Furthermore, it is easy to


;1 ;1

see that (Pi  Fi)  C i . This implies that


;1 ;1 ;1

;1


ker i (C ) = ker i (C ) \ ker i(C ) \ C i = Pi  Fi
;1 ;1

Hence dim Ei = k ; dim(Pi  Fi ) = k ; pi ; fi in view of (91), as claimed. Another


;1 ;1

way to establish both (88) and (89) is to consider the Kschischang-Sorokine construction.
Referring to Figure 12, we see that a generator with span a b] contributes to vertex-space
dimension at times a a + 1 : : :  b ; 1 and to edge-space dimension at times a a + 1 : : :  b.
The rest easily follows from Theorem 4.14 and Proposition 4.16.
Theorem 4.17 makes it possible to compute the state complexity and the edge complexity,
as dened in (8) and (14), from p  p  : : :  pn and f  f  : : :  fn. These are given by
1 2 1 2

s = k ; mini f pi + fi g 2I

b = k ; mini f pi + fi g 2I ;1

where I is the time axis for the trellis. All the other measures of trellis complexity introduced
in Section 2.3 can be also computed from the past and future proles using Theorem 4.17.
48
Now let P (v) denote the total number of paths from the root  to a given vertex v in the
#

minimal trellis, with P () = 1 by convention. We have the following proposition.


#

Proposition 4.18. Let T = (V E IFq ) be the minimal trellis for a linear code C over IFq .
Then P (v) is the same for all v 2 Vi, and we have:
#

P (v) = qpi # for i = 0 1 : : :  n (92)


X
P (v) = qpi # for i = 1 2 : : :  n (93)
v Vi
2

Similarly, the number of paths from a vertex v 2 Vi to the toor ' is qfi , while the total
number of paths from all the vertices in Vi to the toor is given by qfi .
Proof. Referring to the Forney construction, every vertex v 2 Vi in the minimal trellis can
be thought of as a coset of Pi  Fi in C . Thus, with the past PT (v) and the future FT (v)
of v as dened in (41) and (42), respectively, we have
PT (v)  FT (v) = x + (Pi  Fi)
for some x 2 C . It follows that PT (v) is a coset of Pi for all v 2 Vi, and hence jPT (v)j = jPij.
Now recall that the minimal trellis for C is proper by Theorem 4.8. This means that the
number of codewords in PT (v) is precisely equal to P (v), and (92) follows. #

To establish (93), let us dene the partial trellis T ji as the trellis obtained from T by deleting
all the vertices in Vi
Vi
  
Vn and all the edges that are incident upon these vertices.
Then the left-hand side of (93) counts the total number of paths in T ji . The claim of (93)
+1 +2

now follows immediately by observing that if T is a proper trellis for C , then T ji represents
the past projection Pi in a one-to-one manner.


The following theorem deals with the degrees of the vertices in the minimal trellis. We
distinguish between the in-degree deg (v) which counts the number of edges that end at v
and the out-degree deg (v) which counts the number of edges that begin at v.
in

out

Theorem 4.19. Let T = (V E IFq ) be the minimal trellis for a linear code C over IFq . Then
all the vertices v 2 Vi have the same in-degree and the same out-degree, given by:
deg (v) = qpi pi;1 = qfi;1 fi
in
;
for i = 1 2 : : :  n
;
(94)
deg (v) = qfi fi+1 = qpi+1 pi
out
;
for i = 0 1 : : :  n;1
;
(95)
Proof. We rst prove that all the vertices in Vi have the same in-degree and the same out-
degree. One way to show this is through the Kschischang-Sorokine construction. Indeed,
observe that the Cartesian product operation in (69) and (70) multiplies the degrees: if
(v  v ) is a vertex in the trellis product T  T then
0  0 

deg (v  v ) = deg (v ) deg (v )


in
0 
in
0
in
(96)


deg (v  v ) = deg (v ) deg (v )


out
0 
out
0
out
(97)

The elementary trellises Tx1  Tx2  : : :  Txk in (74) certainly have the property that the in-
degree and the out-degree is the same for all vertices at time i, as illustrated in Figure 12.
49
In view of (96) and (97), this property is preserved under the trellis product | it is therefore
inherited by the Kschischang-Sorokine trellis T = Tx1  Tx2    Txk , whether it is minimal
or not. Given that the in-degrees and the out-degrees are all equal, we have
deg (v) = jEi j deg (v) = jEi j +1
in
jVij j Vi j
out

for every vertex v 2 Vi. The expressions for deg (v) and deg (v) in (94) and (95) then
follow immediately from the expressions for jVij and jEi j obtained in Theorem 4.17.
in out

We now dene %pi = pi ; pi and %fi = fi ; fi, with %p = %f = 0 by convention. Due


;1 ;1 0 0

to the monotonicity of the sequences p  p  : : :  pn and f  f  : : :  fn , both %pi and %fi take
on values in the set f0 1g. In conjunction with Theorem 4.19, this implies that there are only
0 1 0 1

four possible ways to connect the vertices of Vi to the vertices of Vi in the minimal trellis.
The four possible values of the pair (%pi %fi) determine the in-degree of each vertex v 2 Vi
;1

and the out-degree of each vertex v 2 Vi . The values of (%pi %fi) thus specify one of four
0
;1

fundamental types of trellis structure. These four types of trellis structure are summarized
in Table 1 below for binary linear codes the extension to codes over IFq is straightforward.

Edge Edge and state Vertex structure


%fi %pi structure complexity proles Vi Vi
;1

>; ;<
0 0 bi = si = si
;1
;; ;;
HHH
HH >; >;
0 1
   bi = si = si + 1
;1
;; ><


HH
 ;< ;<
1 0
HHH bi = si + 1 = si
;1
>< ;;
QQQ ;< >;
1 1
 QQ
bi = si + 1 = si + 1
;1
>< ><
Table 1. The four fundamental types of trellis structure
According to Theorem 4.19, at each time i = 1 2 : : :  n, the minimal trellis for a linear
code C exhibits one of the four fundamental structures shown in Table 1. We will often use
the mnemonics =, >, <, and ./, introduced in 74, 62], to refer to these structures. Notice
50
that the buttery structure ./ in Table 1 could be degenerate. A non-degenerate buttery ./
involves q vertices of Vi , q vertices of Vi, and has the structure of the biclique Kqq with
q edges. A degenerate buttery occurs if C contains a codeword with span i i]. It still
;1
2

involves q vertices of Vi , q vertices of Vi, and q edges of Ei . However, the degenerate


;1
2

buttery has the structure of a matching: it is a q-partite graph, with q distinctly labeled
edges connecting each pair of vertices.
A useful observation from Table 1 is that the total number of < structures in the minimal
trellis is equal to the total number of > structures. This follows from the fact that
%p + %p +    + %pn = %f + %f +    + %fn = k
1 2 1 2

Another way to see this is to observe that si = si + 1 whenever there is a < in the trellis,
si = si ; 1 whenever there is a > in the trellis, and si = si at all other times. As s = sn,
;1

;1 ;1 0

the > and < structures must appear the same number of times in the minimal trellis.
Finally, observe that the sequences %p  %p  : : :  %pn and %f  %f  : : :  %fn also have an in-
1 2 1 2

teresting interpretation in terms of a trellis-oriented generator matrix. Recall that the spans
of the rows in this matrix are uniquely determined by the code. Thus the following sets
L(C ) = f L(x ) L(x ) : : :  L(xk ) g
1 2 (98)
R(C ) = f R(x ) R(x ) : : :  R(xk ) g
1 2 (99)
where x  x  : : :  xk is any basis for C in minimal span form, are well dened. With this
notation, it follows immediately from Proposition 4.16 that %pi = 1 if and only if i 2 R(C ),
1 2

and %pi = 0 otherwise. Similarly %fi = 1 if and only if i 2 L(C ), and %fi = 0 otherwise.

There is a number of remarkable relations between the minimal trellis for a linear code C and
the minimal trellis for its dual code C . The most important of these relations is summarized
?

in the following theorem, rst established by Forney in 38].


Theorem 4.20. The minimal trellis T = (V E IFq ) for a linear code C and the minimal
trellis T = (V  E  IFq ) for its dual code C have the same number of vertices at all times.
? ? ? ?

Proof. The easiest way to prove this is to consider the BCJR construction of the minimal
trellis. In other words, we will think of Vi as the column-space of the BCJR matrix HiGTi
dened in (54). For the dual code C , all we have to do is to interchange the roles of the
?

generator martix and the parity-check matrix: hence Vi is just the column-space of Gi HiT .
?

But the columns of GiHiT are precisely the rows of HiGTi , and therefore dim Vi = dim Vi ?

follows directly from the \row rank = column rank" theorem of linear algebra.
Notice that the edge-complexity proles of T and T are not necessarily the same. For in-
?

stance, the minimal trellises for the (3 1 3) binary repetition code C and the (3 2 2) binary
single-parity-check code C are depicted in Figure 11 we see that jE j = 2 while jE j = 4.
? ?

On the other hand, the edge-complexity proles fb  b  : : :  bn g and fb  b  : : :  bn g cannot


1 1
? ? ?
1 2

dier too much. An inspection of Table 1 shows that bi = si + %pi. Since the state-comp-
1 2

lexity proles are the same by Theorem 4.20, it follows that bi ; bi 2 f0 1 ;1g.?

51
Still more intriguing connections between the trellises T and T follow from a closer examina- ?

tion of the matrices Hi Gi  H$ n i G$ n i, dened in (54) and (61). With Pi Pi  Fi  Fi dened  

as before, we let Pi  Pi  Fi  Fi denote the corresponding past and future subcodes and
; ;
? ? ? ?

projections for the dual code C . Then it is obvious that


?

Gi = generator matrix for Pi and parity-check matrix for Pi



(100) ?

Hi = parity-check matrix for Pi and generator matrix for Pi (101) ?

G$ n i = generator matrix for Fi and parity-check matrix for Fi


;

(102) ?

H$ n i = parity-check matrix for Fi and generator matrix for Fi


; (103) ?

It follows from (100) and (87) that pi = i ; pi = (i +fi) ; k. Similarly, it follows from (102)
? 

and (87) that fi = (n ; i) ; fi = (n ; k) ; (i ; pi). Hence we have


? 

%pi = pi ; pi = fi ; fi + 1 = 1 ; %fi
? def ? ?
;1 ;1 (104)
%fi = fi ; fi = pi ; pi + 1 = 1 ; %pi
? def ?
;1
?
;1 (105)
In other words %pi is the binary complement of %fi, and %fi is the complement of %pi.
? ?

In conjunction with (98) and (99), this implies that R(C ) is the complement of L(C ), and?

L(C ) is the complement of R(C ), in the set f1 2 : : :  ng. This means that the time axis
?

I = f1 2 : : :  ng can be partitioned into the left and right indices for C and C , as follows ?

L(C )
R(C ) = R(C )
L(C ) = f1 2 : : :  ng
? ?
(106)
Similar results were obtained by Wei 121] in the context of generalized Hamming weight
hierarchy of linear codes. We shall see in the next section that this is not a coincidence.
Referring to Table 1, another consequence of (104),(105) is that the four fundamental types
of trellis structure in the minimal trellises for C and C interchange in the following way.
?

Theorem 4.21. If T is the minimal trellis for C and T is the minimal trellis for C , then
? ?

./ at time i in T corresponds to = at time i in T and vice versa. At all other times, the
?

trellises T and T have the same structure.


?

An interesting corollary of Theorem 4.21 is that the minimal trellis for a self-dual code C
cannot contain the = and ./ structures. Indeed, for self-dual codes it follows from (104),(105)
that %pi = 1 ; %fi , or equivalently %pi 6= %fi. Thus the minimal trellis for a self-dual code
necessarily consists of n=2 structures of type < and n=2 structures of type >.

The minimal trellis has one more important property which we now establish: it is biproper.
That in itself is not surprising this should be obvious from any one of the four constructions
described in Section 4.2. What is more interesting is that a trellis for a linear code C is
minimal if and only if it is biproper. Furthermore, this is true not only for linear codes
but for the general class of rectangular codes dened in Section 4.1. Moreover, it can be
shown 68] that the class of rectangular codes is precisely characterized by this property:
a code is rectangular if and only if it admits a biproper trellis representation.
52
We will prove these results in a roundabout manner, which involves the notion of mergeable
and non-mergeable trellises. The reason for doing so is that we will be able to establish
as a corollary to our proof that the minimal trellis minimizes all the measures of trellis
complexity introduced in Section 2.3, most of them uniquely. We start with a denition.
Denition 4.8. Let T = (V E A) be a trellis representing a code C of length n. Two distinct
vertices v and v in V are said to be mergeable if concatenating the past of one with the future
0

of the other produces no strings that are not codewords, namely if PT (v)  FT (v )  C and 0

PT (v )  FT (v)  C . The trellis T is said to be mergeable if it contains mergeable vertices

otherwise it is said to be non-mergeable.


We next establish three lemmas, which taken together will lead to a proof that the notions of
non-mergeable trellis, biproper trellis, and minimal trellis all coincide for rectangular codes.
Lemma 4.22. If a trellis T for a block code C is minimal, then it is non-mergeable.
Proof. A mergeable trellis can be transformed into a smaller trellis for the same code
by merging the mergeable vertices. If v  v 2 V are mergeable, we can replace v and v by
1 2 1 2

a single vertex v which inherits all of the edges of v and v : namely, all the edges originally
1 2

incident to/from v or v are taken as being incident to/from v. Obviously, this procedure
1 2

decreases the number of vertices in T . Thus if T is mergeable it cannot be minimal.


Lemma 4.23. If a trellis T for a rectangular code C is non-mergeable, then it is biproper.
Proof. We assume that T = (V E A) is not biproper, and show that it contains mergeable
vertices. Recall that a trellis is not biproper if and only if it is either not proper or not co-
proper. If T is not proper, then it has two distinct edges (v v  a) and (v v  a) with the
same label a starting at the same vertex v 2 V . Let x be any element of PT (v). Then the
1 2
0

distinct vertices v and v have the past x = (x  a) in common. We let F (x) denote the
0

future of x in C , as dened in (40). Obviously


1 2

FT (v )  F (x) and FT (v )  F (x)


1 2 (107)
Now let x be any element of PT (v ). Then the futures F (x ) and F (x) both contain FT (v ).
Since C is rectangular, this implies that F (x ) = F (x) by Proposition 4.7. In view of (107),
1 1 1 1

this further implies that FT (v )  F (x ). As this is true for all x 2 PT (v ), it follows that
1

2 1 1 1

PT (v )  FT (v )  C
1 2

By a similar argument, we have PT (v )  FT (v )  C . Hence the vertices v and v are


2 1 1 2

mergeable by Denition 4.8. If T is not co-proper, then a similar argument can be used to
show that there exist distinct vertices v and v in T with a common future (a x ). Such 0

vertices are again easily shown to be mergeable if C is rectangular.


1 2

Lemma 4.24. If a trellis T for a rectangular code C is biproper, then it is minimal.


Proof. Let T be a biproper trellis for C . Referring to Proposition 4.4 and Proposition 4.10,
all we need to establish the minimality of T is to prove that if two codewords of Pi are future- 

equivalent then they are also T -equivalent, for all i = 1 2 : : :  n. Thus let x  x 2 Pi be 1 2


53
future-equivalent, that is F (x1) = F (x2) in C . Since T is proper, each of x1 x2 corresponds
to a unique path in T , as observed in (39). Let P1 P2 denote these two paths, and assume
to the contrary that P1 P2 end at distinct vertices v1 v2 2 Vi, so that F (x1) = FT (v1) and
F (x2 ) = FT (v2 ). Since T is also co-proper, the paths from all the vertices in Vi to the toor
are labeled distinctly. This implies that
F (x1 ) \ F (x2) = FT (v1 ) \ FT (v2 ) = ?
which is a contradiction, since F (x1) = F (x2) by assumption. Hence P1 P2 must end at the
same vertex of T , and x1 x2 are T -equivalent.
Given a code C , we can de
ne a partial order on the set of trellises for C as follows. Following
Kschischang 68], we will say that T  T if T can be obtained from T by a sequence (possibly
0 0

empty) of vertex merges. It is easy to see that the minima in the resulting partially ordered
set T (C ) are precisely the non-mergeable trellises for C . In general, the poset T (C ) thus
de
ned may have several non-isomorphic minima. However, the following theorem shows
that if C is a rectangular code, then T (C ) contains a unique minimum up to isomorphism.
Theorem 4.25. If C is a rectangular code, then the following statements are equivalent:
 a trellis T for C is minimal
 a trellis T for C is biproper
 a trellis T for C is non-mergeable.
Furthermore, such a trellis exists and is unique, up to isomorphism. In particular, any two
non-mergeable trellises for C are minimal and isomorphic to each other.
Proof. The fact that the properties of being minimal, biproper, and non-mergeable are
equivalent is a corollary to Lemmas 4.22, 4.23, and 4.24. The existence and uniqueness then
follow from the existence and uniqueness of the minimal trellis, established in Theorem 4.8.
With Theorem 4.25 at hand, it is not dicult to prove that the minimal (or the biproper, or
the non-mergeable) trellis minimizes all the complexity measures introduced in Section 2.3
Theorem 4.26. Let T = (V E A) be the minimal trellis for a rectangular code C of length n,
and let T = (V  E  A) be any other trellis for C . Then:
0 0 0

jVi j  jVi j for all i = 0 1 : : :  n


0
(108)
jEi j  jEi j for all i = 1 2 : : :  n
0
(109)
jV j = jV0 j +    + jVn j  jV0 j +    + jVn j = jV j
0 0
(110)0

jE j = jE1 j +    + jEn j  jE1 j +    + jEn j = jE j


0 0
(111) 0

Vmax = maxi jVi j  maxi jVi j = Vmax


2I 2I
0 0
(112)
Emax = maxi jEi j  maxi jEi j = Emax
2I 2I
0 0
(113)
E = jE j ; jV j + 1  E = jE j ; jV j + 1
0 0 0
(114)
D = 2jE j ; jV j + 1  D = 2jE j ; jV j + 1
0 0 0
(115)
Furthermore, if any one of the inequalities (108), (109), (110), (111), (115) holds with equality,
then all of (108) { (115) hold with equality, and T is isomorphic to T .
0

54
Proof. If T is minimal then T is isomorphic to T by Theorem 4.8, and all of (108) { (115)
0 0

hold with equality. If T is not minimal, then it is necessarily mergeable by Theorem 4.25.
0

We can successively merge the mergeable vertices in T to obtain a sequence of trellises


0

T ! T1 ! T2 !    ! T 0
(116) 00

until there are no more mergeable vertices. The trellis T is thus non-mergeable, and hence
00

isomorphic to T by Theorem 4.25. Since at each vertex merge in (116), the number of vertices
strictly decreases while the number of edges does not increase, this proves (108) { (113).
However, an arbitrary sequence of vertex merges { as in (116) { is not sucient to prove
(114) and (115). Indeed, if two mergeable vertices v1  v2 are merged into a single vertex v,
the vertex count decreases by one, and the expansion index might increase if the edge count
remains the same. Hence, a more careful argument is required to establish (114) and (115).
The key observation is that if T is not isomorphic to T , then it is also not biproper by
0

Theorem 4.25. This means that it is either not proper, or not co-proper, or both. If T is not 0

proper, then it has two distinct edges (v v1  a) and (v v2  a) with the same label a starting
at the same vertex v 2 V . As shown in the proof of Lemma 4.23, the vertices v1 and v2 are
0

mergeable. In addition, we now observe that merging v1 and v2 into a single vertex v creates 0

two identical edges of the type (v v  a). Deleting one of the two identical edges results in
0

a trellis T = (V  E  A) with jV j = jV j ; 1 and jE j  jE j ; 1 (a strict inequality


00 00 00 00 0 00 0

jE j < jE j ; 1 would apply if there are more than two identical edges created by the merge
00 0

of v1 and v2). This vertex merging process is illustrated in Figure 15.

v1
a
a
v v v
a

v2

Figure 15. Vertex merging in a non-proper trellis for a rectangular code


We proceed merging vertices and deleting identical edges in this manner until a proper
trellis is obtained. Since a similar argument applies in the case T is not co-proper, we 0

can eventually transform T into a biproper trellis, which has to be isomorphic to T by


0

Theorem 4.25. Since at each step of this procedure, the expansion index does not increase,
this proves (114). As D = jE j + E , inequality (115) also follows. Furthermore, the number
of vertices and the number of edges strictly decreases at each merge. Hence if any one of
(108), (109), (110), (111), (115) holds with equality, the sequence of vertex merges required
to transform T into a trellis isomorphic to T must be empty. In other words, T itself must
0 0

be isomorphic to T , and all of (108) { (115) must hold with equality.


55
An important consequence of Theorem 2.26 is that the minimal trellis not only minimizes
all the conceivable measures of trellis complexity, but is also the unique trellis to do so, for
most of these measures. In particular, the minimal trellis is the only trellis for a rectangular
code C , up to graph isomorphism, that minimizes the total number of vertices jV j, or the
total number of edges jE j, or the Viterbi decoding complexity D = 2jE j ; jV j + 1.
Having established this remarkable property of the minimal trellis, we conclude our discus-
sion of minimal trellises for a
xed time axis. The art of trellis decoding has to do with
manipulations of the time axis, which is the topic of the next two sections.

Notes on Section 4: Theorem 4.1 is due to Muder 87]. Our proof of this theorem, through
Propositions 4.3, 4.4, and 4.5, follows the exposition in 87]. All the examples dealing with
non-rectangular codes in Section 4.1 are from Kschischang and Sorokine 70]. Rectangular
codes were introduced in 70], and further studied in 68, 97, 98, 115]. Given a nonlinear
rectangular code C , it is not clear whether a permutation of C is still rectangular | see 98]
for more on this problem. Theorem 4.8 was
rst proved by Muder 87] for linear codes, and
extended by Kschischang and Sorokine 70] to rectangular codes. The proof of Theorem 4.8
presented here, including the notion of T -adjacency, is new. As we have already mentioned,
results analogous to Theorem 4.8 are known in linear system theory 125], the theory of

nite-state automata 54], symbolic dynamics 77], and computer engineering 14].
The constructions of BCJR 2], Massey 80], Forney 38], and Kschischang-Sorokine 70] date
back to 1974, 1978, 1988, and 1995, respectively. It took a long time to realize that all these
constructions produce one and the same trellis, up to isomorphism. The minimality of the
Forney trellis was established by Muder 87]. Kot and Leung 65] claimed without proof that
the BCJR, Massey, and Forney trellises are isomorphic. The fact that the BCJR trellis is min-
imal, and hence isomorphic to the Forney trellis, was proved by Zyablov and Sidorenko 128]
and by McEliece 82]. Our proof of the minimality of the Massey trellis is new. There is
an interesting connection between the (enumeration of) minimal trellises and rook polyno-
mials 70]. To
nd out how to construct minimal trellises for group codes see 42], for lattices
see 40, 103, 105], for Boolean functions see 72], for codes over
nite abelian groups see 117].
The properties of the minimal trellis discussed in Section 4.3 were
rst discovered in 38,
39, 70, 74, 82]. Table 1 is from Lafourcade and Vardy 74], but see also 70, 62]. Theo-
rems 4.20 and 4.21 are from 38] and 70], respectively. McEliece 82] was the
rst to show
that the minimal trellis minimizes the edge count at each time, while inequalities (114),(115)
in Theorem 4.26 were
rst established in 115]. Our proof of Theorem 4.26 follows Vardy and
Kschischang 115]. Theorem 4.25 is also from 115], although the proof presented here is new.

56
5. The permutation problem
Considering the comprehensive theory developed in the foregoing section, it is fair to say that
minimal trellises for linear codes are by now well understood, and most questions pertaining
to the minimal trellis for a
xed time axis have been already answered. On the other
hand, the innocuous operation of permuting the symbols in each codeword seems to assume
a fundamental signi
cance in the context of trellis complexity, and leads to a number of
challenging problems. It turns out that a permutation of coordinates can drastically change
the number of vertices in the minimal trellis representation of a given code C , often by an
exponential factor. Let us illustrate this point by a simple example.
Example 5.1. Consider the (6 3 2) linear code C , generated by (100001), (010100), and
(001010). It is easy to see (cf. Theorem 4.15) that this basis for C is in minimal span form,
and the corresponding minimal trellis is shown in Figure 16a.
0 0

0 0
1
1
0 0
1 1 0
0 0
0 0 0 0 0
1 1

0 0
1 1 1 1 1 1
1 0 0 1
1 1
b.
0
0
1 1

1 1

a.
Figure 16. Minimizing trellis complexity via permutations for a simple code
However, re-ordering the time axis I = f1 2 3 4 5 6g for C according to the permutation
 = (1)(4)(5)(2 3 6) maps (100001), (010100), (001010) into (110000), (001100), (000011),
respectively. The corresponding minimal trellis is shown in Figure 16b. }
Indeed, for larger { and less trivial { codes, the reduction in trellis complexity that may be
achieved by permutations of the time axis is even more dramatic. The problem of minimiz-
ing the trellis complexity of a code via coordinate permutations, termed the \art of trellis
decoding" by Massey 80], has attracted a lot of interest in the coding theory literature. In
this section, we briey survey the present state of knowledge on this problem.
We point out that, given a linear code C , there is no guarantee that there exists a time axis
for C that simultaneously minimizes all the various measures of trellis complexity. Hence,
in the context of minimizing the complexity of a trellis via permutations of the time axis,
it is important to specify precisely which measure of complexity one is trying to minimize.
57
In this section, we will usually concentrate on minimizing the trellis state complexity s, as
de
ned in equation (8) of Section 2. This choice is consistent with most of the literature on
the subject. However, see 62, 82] for a diering viewpoint.
It is intuitively clear that
nding a permutation that minimizes s for a given linear code is
a hard problem. It would be nice if this could be also shown in rigorous terms, using the
well-established language of complexity theory 48]. In particular, is this problem NP-hard?
Although this question is still open, there has been much progress recently on closely related
problems, and we describe these results in the next subsection.
Since
nding the optimal permutation of the time axis for a general linear code appears to
be intractable, most of the work in the literature is concerned with upper and lower bounds
on the trellis complexity that can be achieved under all possible permutations. The upper
bounds are discussed in Section 5.2. These are usually obtained by
nding speci
c `good'
permutations for speci
c codes. In particular, we will exhibit in Section 5.2 an ordering of
the time axis for the binary Reed-Muller codes 60] which turns out to be optimal compo-
nentwise, or uniformly ecient in the terminology of 62]. We will also construct uniformly
ecient permutations for the binary (24 12 8) Golay code 39, 87], the (48 24 12) binary
quadratic-residue code 7, 24], and some other codes. Finally, we will describe a technique
for constructing reasonably good permutations for primitive binary BCH codes 114].
Section 5.3 is concerned with lower bounds on trellis complexity. In particular, we discuss the
bound of 39, 87], based on the notion of a dimension-length pro
le (DLP), and establish the
connection 60, 114] between the trellis complexity of a code and its generalized Hamming
weight hierarchy. Another lower bound on trellis complexity is the span bound s R(d ; 1).
We will show in Section 5.3 that the DLP bound and the span bound are, in fact, extreme
special cases of a general lower bound on s, derived in 74] and based on partitioning the
time axis into a number of sections of varying lengths. We will also consider the extension
of these results to lower bounds on edge complexity, and to nonlinear codes. Finally, we
discuss the notion of entropy-length pro
les (ELP) for nonlinear codes, introduced in 92].
Section 5.4 contains a table of bounds on the trellis state complexity of binary linear codes of
length n  24, compiled by Petra Schuurman 96]. In contrast, Section 5.5 is concerned with
asymptotic bounds on the relative trellis complexity & = s=n. We will show that for n ! 1,
all the complexity measures introduced in Section 2.3 coincide, so that & may be regarded
as the single asymptotic measure of trellis complexity. We will discuss upper and lower
bounds on & due to Kudryashov-Zakharova 71], Zyablov-Sidorenko 128], and Lafourcade-
Vardy 74]. In particular, we will prove that the number of vertices in the minimal trellis
grows exponentially fast with the length n, in any asymptotically good sequence of codes.
Regretfully, it would not be possible to include a self-contained proof for all of these results
in a single section. Thus some of the theorems below will be given without proof. However,
in all such cases, we provide a reference to the original work where a proof can be found.
Remark on terminology: In what follows, we will often refer to codes that dier by a per-
mutation of coordinates { usually called equivalent in the coding theory literature { as one
and the same code under two dierent time axes (cf. Example 5.1).

58
5.1. Complexity of the permutation problem
Given a general linear code C , how hard is it to
nd a permutation of the time axis for C
that minimizes the trellis state complexity? In this subsection, we analyze the computa-
tional complexity of this task. To do so, we
rst need to pose the problem of
nding such
a permutation as a rigorous decision problem, in the style of Garey and Johnson 48].
In this regard, it would be convenient to introduce the notion of width of a matrix, as de
ned
in 57], also called the partition rank of a matrix in 55]. Let M be an m n matrix over IFq .
As before, we let Mi and Mn i denote the matrices consisting of the
rst i columns of M
;

and the last n ; i columns of M , respectively. For each i = 1 2 : : :  n;1, we de


ne
i (M ) def
= rank Mi + rank Mn i ; rank M
; (117)
with 0(M ) = n(M ) = 0 by convention. Thus i(M ) is the dimension of a linear space
which consists of the intersection of the column-space of Mi with the column-space of Mn i. ;

The width of M is then de


ned as width M = maxi i(M ), where I = f0 1 : : :  ng.
2I

Lemma 5.1. Let C be a linear code of length n over IFq , and let T = (V E IFq ) be the
minimal trellis for C . If H is a parity-check matrix for C and G is a generator matrix for C ,
then the dimension of the vertex-space Vi is given by:
si = i(H ) = i(G) for i = 0 1 : : :  n
Consequently, the state complexity s = maxi si is precisely the width of a parity-check matrix
for C , which is also equal to the width of a generator matrix for C .
Proof. By Theorem 4.17, we have si = k ; dim Pi ; dim Fi . We know from (101),(103)
that Hi is a parity-check matrix for Pi and Hn i is a parity-check matrix for Fi. Hence
;

dim Pi = i ; rank Hi
dim Fi = (n ; i) ; rank Hn i;

In conjunction with (117), this shows that si = i (H ). A similar argument for the dual code
of C shows that si = (n ; k) ; dim Pi ; dim Fi = i(G). The lemma now follows by
? ? ?

observing that si = si for all i, in view of Theorem 4.20.


?

As a consequence of Lemma 5.1, we conclude that the problem of


nding a permutation of
the time axis that minimizes the state complexity is equivalent to the problem of
nding
a column permutation that minimizes the width of a matrix. The corresponding decision
problem can be posed as follows:
Problem: Trellis State-Complexity
Instance: A binary m n matrix M , and a positive integer s.
Question: Is there a permutation that takes M into a matrix M with width M  s?
0 0

Notice that it does not matter whether M is viewed as a parity-check matrix or as a generator
matrix for C in this problem. We strongly believe that the Trellis State-Complexity
problem is NP-complete. The NP-completeness of Trellis State-Complexity was con-
59
jectured repeatedly in 55, 57, 69, 111, 112]" however, a proof of this conjecture remains elu-
sive. On the other hand, the following closely related problem is known to be NP-complete.
Problem: State-Complexity Profile
Instance: A binary m n matrix M , and positive integers s and i, with i  n.
Question: Is there a permutation that takes M into a matrix M with i(M )  s?
0 0

Horn and Kschischang proved in 55] that State-Complexity Profile is NP-complete


using an ingenious and elaborate transformation from Simple Max Cut 48, p.210], which
spans over
ve pages. Herein, we will present a much simpler proof of a somewhat stronger
result. Our proof is based on reduction from the following problem.
Problem: Minimum Distance
Instance: A binary (n;k) n matrix H and an integer w > 0.
Question: Is there a nonzero vector x 2 IF2n of weight  w, such that Hxt = 0?
The NP-completeness of Minimum Distance was conjectured by Berlekamp, McEliece, and
van Tilborg 8], and this conjecture was recently proved in 111, 112]. The reduction from
Minimum Distance is based on the following observation.
Lemma 5.2. Let C be an (n k d) linear code over IFq , and let d denote the dual distance
?

of C . Then, under all possible permutations of the time axis I for C , we have
si = i for i = 1 2 : : :  min(d d ) ; 1
?

Similarly, si = n ; i for all i > n ; min(d d ), and for all permutations of I . On the other
?

hand, if i min(d d ) then there exists a permutation of I such that si < i.


?

Proof. Let H be a parity-check matrix for C , and let G be a generator matrix for C . It
follows from Theorem 4.17 in conjunction with (100) and (101) that
si = dim Pi ; dim Pi = rank Gi + rank Hi ; i

(118)
If i < min(d d ) then every i columns of G and every i columns of H are linearly independent.
?

Hence (118) implies that si = 2i ; i = i for all possible permutations of the time axis I .
On the other hand, if i d then we can
nd a permutation of I , such that the
rst d columns
of H are dependent. For this permutation, we conclude from (118) that si  rank Hi < i.
If i d then there exists a permutation of the columns of G, such that si  rank Gi < i.
?

It follows from Lemma 5.2 that if we can answer the question of State-Complexity Pro-
file in polynomial time, then we can compute min(d d ) in polynomial time. Since comput-
?

ing the minimum distance of a binary linear code is NP-hard 112], it remains to distinguish
between the distance and the dual distance.
Theorem 5.3. State-Complexity Profile is NP-complete.
Proof. It is obvious that State-Complexity Profile is in NP, since given a putative
permutation, we can verify that i (M )  s in polynomial time. Now suppose that C is
0

an (n k d) binary linear code whose minimum distance we would like to determine. In other
words, C is the code whose parity-check matrix H is at the input to Minimum Distance.
60
Given H , we
rst construct a binary linear Reed-Muller code C of length 2m and order r, 0

where m = 2dlog2 ne + 1 and r = dlog2 ne. Then C is an (n  k  d ) self-dual code, where:


0 0 0 0

n = 22 log2 n +1  8n2
0 d e

k = n =2  4n2
0 0

d = 2m r = 2 log2 n +1 2n
0 ; d e

We then use the well-known Kronecker product construction 79, p.568] to obtain a generator
matrix for the product code C = C  C , where C is the dual code of C generated by H .
 ? 0 ?

Evidently, the length of C is n = nn  8n3, and its minimum distance is


  0

d = d d 2nd n > d
 ? 0 ?

Furthermore, it is easy to see that the dual distance of C is the minimum of the dual 

distances of C and C , namely min(d d ) = d. We can therefore query an oracle for State-
? 0 0

Complexity Profile, with the input M being a generator matrix for C , for the existence 

of a permutation such that w (M )  w ;1. In view of Lemma 5.2, such a permutation exists
0

if and only if the answer to the question of Minimum Distance is \Yes."


The proof of Theorem 5.3 shows that State-Complexity Profile remains NP-complete
if the input is restricted to s = i;1. In other words, even if all we want to know is whether
si 6= i for some permutation, the computational task of determining this is still NP-hard.
This is a somewhat stronger result than the one reported by Horn and Kschischang in 55].
We also observe that although the statement of the State-Complexity Profile problem
refers speci
cally to binary codes, the proof of Theorem 5.3 straightforwardly extends to
codes over an arbitrary
nite
eld of
xed order q.
Jain, M#andoiu, and Vazirani 57] recently considered a dierent variation of the Trellis
State-Complexity problem. They were able to show that this problem becomes NP-
complete over large
elds. That is, the order of the
eld is variable in the formulation of 57],
and is allowed to grow as part of the input, although the characteristic of the
eld may be
kept constant, say p = 2. The corresponding decision problem can be phrased as follows:
Problem: State-Complexity over Large Fields
Instance: An integer
> 0, an m n matrix M over GF(2), and an integer s > 0.
Question: Is there a permutation that takes M into a matrix M with width M  s? 0 0

To prove that this problem is NP-complete, Jain, M#andoiu, and Vazirani 57] use a polyno-
mial transformation from a modi
ed version of the following problem:
Problem: MDS Code
Instance: Positive integers k n and
, and an (n;k) n matrix H over GF(2).
Question: Is there a nonzero vector x of length n over GF(2), such that Hxt = 0
and wt(x)  n ; k?
The fact that MDS Code is NP-complete was established in 112]. Jain, M#andoiu, and Vazi-
rani prove in 57] that this problem remains NP-complete if the input is restricted to n = 2k.
61
That is, determining whether a given linear code of rate 0:5 over a
eld of characteristic 2 is
MDS is still NP-hard. Using this fact, they establish the following theorem.
Theorem 5.4. State-Complexity over Large Fields is NP-complete.
Proof. Let H be a k 2k matrix at the input to MDS Code. If H indeed de
nes an MDS
code, then every k columns of H are linearly independent and width H = k for any matrix
0

H obtained by means of a permutation of columns from H . On the other hand, suppose that
0

the answer to the question of MDS Code is \Yes," which means that H contains a set of k
linearly dependent columns. Then one can construct a matrix H by listing these k columns
0

rst and the remaining k columns of H last. It is obvious that width H  k ; 1. Hence, if
0

we take M = H and s = k;1 as the instance of State-Complexity over Large Fields,


it will be a \Yes" instance if and only if H is a \Yes" instance of MDS Code.
Jain, M#andoiu, and Vazirani 57] further extend these ideas to show that if the order of the

eld grows as part of the input, then each one of the following computational tasks:
 Finding a permutation that minimizes the trellis state complexity s"
 Finding a permutation that minimizes the total number of vertices jV j
 Finding a permutation that minimizes the total number of edges jE j
becomes NP-hard. All these results suggest that
nding the optimal solution to the permu-
tation problem, regardless of the speci
c trellis complexity measure one wishes to minimize,
is likely to be intractable for the general class of linear codes.
As for many other intractable problems, several heuristics for the permutation problem have
been proposed in the literature. For instance, Kschischang and Horn 69] consider a gradient-
type search based on iteratively transposing columns in a trellis-oriented generator matrix for
the code. They apply this procedure to lexicographic codes and to BCH codes, with limited
success. Other heuristics for the permutation problem can be found in 24, 30, 107, 120].

5.2. Upper bounds on trellis complexity


The following general upper bound on the state complexity of linear codes is a trivial corollary
to the properties of the minimal trellis described in the previous section.
Theorem 5.5. Let C be a linear code of length n and dimension k over IFq , and let T be the
minimal trellis for C . Then the state complexity of T is upper bounded by s  min(k n;k).
Proof. We know from Theorem 4.17 that si = k ; pi ; fi  k for all i, and hence s  k.
As C and C have the same state complexity by Theorem 4.20, this also implies s  n ; k.
?

The simple result of Theorem 5.5 is known as the Wolf bound on trellis complexity, as it was

rst observed by Wolf in 126]. This bound holds for any (n k d) linear code C and any
permutation of the time axis for C . Thus, in a sense, Theorem 5.5 is an upper bound on the
62
state complexity of the trellis resulting from the worst possible permutation. In this sense,
the Wolf bound is exact for all MDS codes 39, 87], all cyclic codes 61], and many other linear
codes. However, the bound of Theorem 5.5 is also tight in the sense that there exist codes
for which the state complexity in the best possible permutation is given by min(k n;k).
We shall see examples of such codes shortly.
The Wolf upper bound of Theorem 5.5 can be re
ned in a number of ways. One option is
to consider subcodes with low contraction index. A non-negative integer is said to be the
contraction index of a linear code C of dimension k if a maximal set of pairwise linearly
independent columns in a generator matrix for C has k + elements. For more details on
codes and subcodes with prescribed contraction index see 116]. Suppose that an (n k d)
linear code C contains a subcode of dimension and contraction index , while the dual
code C contains a subcode of dimension and contraction index . Berger and Be'ery 5]
? ? ?

show that there exists a permutation of the time axis for C , such that
n o
s  min k ; ( ; ) + 1 (n;k) ; ( ; ) + 1
? ?
(119)
in the corresponding minimal trellis. To minimize the bound of (119), one needs to
nd sub-
codes of high dimension and low contraction index in a given code. Indeed, the most useful
special case of (119) is when = 0. In this case, and simply count the largest number of
?

codewords with disjoint supports in C and C , respectively. See 113] for more details on this.
?

Example 5.2. Consider the (63 18 21) cyclic binary BCH code C , with roots at  3 5 7,
9 11 13  15, where is a primitive element of GF(26). The polynomial
c(x) = 1 + x3 + x6 +    + x60 = xx3 ;; 11
63

is a codeword of C , since it vanishes at all the roots of unity except 0 21 , and 42. The
three cyclic shifts c(x), xc(x), and x2c(x) have disjoint supports. These three codewords thus
generate a subcode of C of dimension = 3 and contraction index = 0. It now follows
from (119) that there exists a permutation of the time axis for C , such that s  k ; +1 = 16.
Indeed, this is the permutation that takes i = 3a + b into (i) = 21b + a, and makes the
spans of c(x), xc(x), and x2c(x) disjoint. Under this permutation, at most one of the three
generators (c) (xc) (x2c) is active at each time i, and hence si  k ; 2 = 16 for all i. }
Another variation of the Wolf bound extends it to an upper bound on the entire state-
complexity profile. The state-complexity profile of the minimal trellis for C is given by:

    si = rank Gi + rank Gn−i − rank G    for i = 0, 1, ..., n        (120)

where G is a generator matrix for C, and Gi and Gn−i denote the restrictions of G to its
first i and last n−i columns, respectively. Since rank Gn−i ≤ rank G and rank Gi ≤ i, we conclude
that si ≤ i for all i and for all possible permutations of the time axis for C. On the other
hand, we have rank Gi ≤ rank G and rank Gn−i ≤ n − i, so that si ≤ n − i. Combining this
with the Wolf bound of Theorem 5.5, we obtain:

    si ≤ min{ i, k, n−k, n−i }    for i = 0, 1, ..., n        (121)

This bound is illustrated in Figure 17 for both low-rate and high-rate codes. By Lemma 5.2,
the bound of (121) holds with equality for every linear code whenever i or n−i is strictly less
than min(d, d⊥). It is not difficult to show that (121) correctly predicts the state-complexity
profile of MDS codes at all times. What is perhaps more surprising is that the converse is
also true: if (121) holds with equality for all i and under all possible permutations of the
time axis, then C is necessarily an MDS code.
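
Equation (120) is straightforward to evaluate by Gaussian elimination over IFq. The following
Python sketch (ours; it assumes q = 2 and represents matrix rows as integer bit masks)
computes the state-complexity profile of a binary linear code from any generator matrix.

    # Sketch (ours): evaluating equation (120) over IF2.

    def gf2_rank(vectors):
        """Rank over GF(2); each vector is an integer bit mask."""
        basis = {}                       # leading-bit position -> basis vector
        for v in vectors:
            while v:
                p = v.bit_length() - 1
                if p not in basis:
                    basis[p] = v
                    break
                v ^= basis[p]            # cancel the leading bit and continue
        return len(basis)

    def state_profile(G):
        """si = rank(Gi) + rank(Gn-i) - rank(G); G is a list of 0/1 row lists."""
        n = len(G[0])
        pack = lambda row, cols: sum(row[c] << c for c in cols)
        r_G = gf2_rank([pack(r, range(n)) for r in G])
        return [gf2_rank([pack(r, range(i)) for r in G])         # first i columns
                + gf2_rank([pack(r, range(i, n)) for r in G])    # last n-i columns
                - r_G
                for i in range(n + 1)]

    # The (6,2,4) code generated by 111100 and 001111 (discussed below) yields
    # [0, 1, 1, 2, 1, 1, 0], so s = 2 under this time axis.
    print(state_profile([[1,1,1,1,0,0], [0,0,1,1,1,1]]))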
Proposition 5.6. Let C be an (n, k, d) linear code over IFq. Then C is MDS if and only if
si = min{i, k, n−k, n−i} for all i and for all permutations of the time axis for C.

Proof. (⇒) Suppose that C is MDS. Then the fact that (121) holds with equality for
all i, regardless of the specific time axis for C, follows immediately from (120) by observing
that every k columns in a generator matrix G for C are linearly independent. (⇐) Assume
w.l.o.g. that k ≤ n − k. Then equality in (121) implies in particular that sk = k = rank Gk.
If this is true under all possible permutations of the time axis for C, then every k columns
of G must be linearly independent, and C is an MDS code.

low-rate codes high-rate codes


si si
n–k
k

k n–k n time n–k k n time

Figure 17. Wolf bound on the state-complexity prole


Notice that MDS codes also attain the original Wolf bound s = min(k, n−k) under all possi-
ble permutations, but the converse is not true. For example, the (6, 2, 4) binary linear code
generated by (111100) and (001111) has state complexity s = 2 regardless of the permutation
of the time axis, although it is clearly not MDS.

Given a linear code C, we can define a partial order on the permutations of the time axis
for C as follows. Let s1, s2, ..., sn and s′1, s′2, ..., s′n be the state-complexity profiles of the
minimal trellises for π(C) and π′(C), corresponding to the permutations π and π′, respec-
tively. Following [62], we say that π uniformly dominates π′ if si ≤ s′i for all i = 1, 2, ..., n.
A permutation π and the corresponding minimal trellis are said to be componentwise optimal,
or uniformly efficient, if π uniformly dominates every other permutation. Thus a componen-
twise optimal permutation simultaneously minimizes the number of vertices in the minimal
trellis for C at each time i = 1, 2, ..., n. This is a strong requirement, akin to the definition
of the minimal trellis in the previous section. Determining whether there exists a uniformly
efficient permutation of the time axis for a given code appears to be a difficult problem.
There are very few codes for which componentwise optimal permutations are known. Among
them, the MDS codes constitute an extreme special case. For MDS codes, every permutation
is trivially uniformly efficient (or, equally, uniformly inefficient). Proposition 5.6 shows that the trellis
complexity of an (n, k, n−k+1) MDS code C is invariant under permutations: the minimal
trellis for C has the largest possible number of vertices at each time i, among all codes of
length n and dimension k over the same field, regardless of the ordering of the time axis.

Binary Reed-Muller codes constitute the only other infinite family of nontrivial codes for
which uniformly efficient permutations are known. It turns out that the standard binary
order is componentwise optimal for the Reed-Muller codes. This remarkable result is due
to Kasami, Takata, Fujiwara, and Lin [60]. We now elaborate on this. Let L(r, m) denote
the set of polynomials f(x1, x2, ..., xm) in m variables over IF2 that have degree at most r.
Then the binary Reed-Muller code of length 2^m and order r can be defined as follows:

    R(r, m) = { ( f(v1), f(v2), ..., f(v_{2^m}) ) : f(·) ∈ L(r, m) and v1, v2, ..., v_{2^m} ∈ IF2^m }

The standard binary order for R(r, m) is obtained by listing v1, v2, ..., v_{2^m} in the above
expression in lexicographic order: that is, the binary m-tuples v1, v2, ..., v_{2^m} are just the
radix-2 representations of the integers 0, 1, ..., 2^m − 1, respectively. It is well-known [79]
that R(r, m) may be obtained from Reed-Muller codes of length 2^{m−1} using the u|u+v
construction of [79, p. 76], also called the squaring construction in [38], as follows:

    R(r, m) = { (u, u + v) : u ∈ R(r, m−1) and v ∈ R(r−1, m−1) }        (122)

Each of R(r, m−1) and R(r−1, m−1) can in turn be obtained by the u|u+v construction
from Reed-Muller codes of length 2^{m−2}, and so forth. It is not difficult to see that pursuing
this recursion m times also produces the time axis in standard binary order for R(r, m).
We present the following theorem without proof.
Theorem 5.7. The standard binary order is componentwise optimal for binary Reed-Muller
codes. The state complexity of the resulting minimal trellis for R(r, m) is given by:

    s = (m−1 choose r) + (m−3 choose r−1) + (m−5 choose r−2) + ···

with the number of terms in the above summation being the minimum of r + 1 and m − r.

The optimality of the standard binary order was established in [60], and the resulting state
complexity was computed in [5]. The state complexities of the componentwise optimal mini-
mal trellises for Reed-Muller codes of length up to 256 are summarized in Table 2.
Example 5.3. The (8, 4, 4) extended binary Hamming code is the first-order Reed-Muller
code R(1, 3). We can obtain a basis for R(1, 3) by evaluating the monomials

    f0(x1, x2, x3) = 1,   f1(x1, x2, x3) = x1,   f2(x1, x2, x3) = x2,   f3(x1, x2, x3) = x3

in L(1, 3) at all the points (x1, x2, x3) of IF2³. Evaluating each monomial in the lexicographic
order f(0,0,0), f(0,0,1), ..., f(1,1,1) produces the familiar basis (11111111), (00001111),
(00110011), and (01010101) for R(1, 3). Notice that this basis is not in minimal span form;
however, it can be easily brought into such form by elementary row operations, as discussed
in Section 4.2. This produces one of the two generator matrices for R(1, 3) given in (81).
The corresponding minimal trellis for R(1, 3) is depicted in Figure 3. Theorem 5.7 further
tells us that this trellis is not only minimal, but also optimal componentwise. ◊

    Order                       Length 2^m, m =
      r        1    2    3    4    5    6    7    8
      0        1    1    1    1    1    1    1    1
      1             1    3    4    5    6    7    8
      2                  1    4    9   14   20   27
      3                       1    5   14   29   49
      4                            1    6   20   49
      5                                 1    7   27
      6                                      1    8
      7                                           1

    Table 2. State complexities of binary Reed-Muller codes
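
The summation in Theorem 5.7 is easily programmed; the short sketch below (ours, not from
the chapter) reproduces the entries of Table 2.

    from math import comb

    def rm_state_complexity(r, m):
        """Theorem 5.7: s = C(m-1, r) + C(m-3, r-1) + C(m-5, r-2) + ...,
        with min(r+1, m-r) terms in the sum."""
        return sum(comb(m - 1 - 2*j, r - j) for j in range(min(r + 1, m - r)))

    # For example, R(2,5) and R(3,8):
    print(rm_state_complexity(2, 5), rm_state_complexity(3, 8))   # 9 and 49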
In addition to the families of MDS codes and binary Reed-Muller codes, there are also several
sporadic examples of codes for which componentwise optimal permutations are known. The
most notable of these examples is the (24, 12, 8) binary Golay code G24. A componentwise
optimal permutation of the time axis for G24 was obtained by Forney in [38, 39], from the
Turyn a+x | b+x | a+b+x construction [79, p. 588] for the Golay code:

    G24 = { (a + x, b + x, a + b + x) : a, b ∈ R(1, 3) and x ∈ R*(1, 3) },

where R(1, 3) is the (8, 4, 4) Hamming code in the standard binary order and R*(1, 3) is the
same code in reverse order. Here is the resulting generator matrix in minimal span form:

    [ 1111 1111 0000 0000 0000 0000 ]
    [ 0000 1111 1111 0000 0000 0000 ]
    [ 0000 0000 1111 1111 0000 0000 ]
    [ 0000 0000 0000 1111 1111 0000 ]
    [ 0000 0000 0000 0000 1111 1111 ]
    [ 0110 0110 0110 0110 0000 0000 ]
    [ 0011 0011 1100 1100 0000 0000 ]        (123)
    [ 0000 0101 0011 1001 1010 0000 ]
    [ 0000 0011 0110 1010 1100 0000 ]
    [ 0000 0000 0110 0110 0110 0110 ]
    [ 0000 0000 0011 0011 1100 1100 ]
    [ 0001 0001 0001 1110 1000 1000 ]

We notice that this matrix also conforms to the standard Miracle Octad Generator (MOG)
coordinates for the Golay code [19, p. 303]. The standard MOG order is thus componentwise
optimal for the Golay code! The state-complexity profile

    {0, 1, 2, 3, 4, 5, 6, 7, 6, 7, 8, 9, 8, 9, 8, 7, 6, 7, 6, 5, 4, 3, 2, 1, 0}        (124)

for G24 can be found at a glance from (123). We will show in the next subsection, using the
DLP lower bound, that this state-complexity profile is optimal componentwise.
Another short code for which a uniformly efficient permutation of the time axis is known is
the (16, 7, 6) lexicode L16. Here is the generator matrix for L16 in minimal span form:

    [ 1111110000000000 ]
    [ 0101101110000000 ]
    [ 0011010101100000 ]
    [ 0000110110110000 ]        (125)
    [ 0000011011001100 ]
    [ 0000000111011010 ]
    [ 0000000000111111 ]

This matrix was found in [69] by computer search, using the heuristics mentioned in the
foregoing subsection. The componentwise optimal state-complexity profile is given by:

    {0, 1, 2, 3, 3, 4, 4, 4, 5, 4, 4, 4, 3, 3, 2, 1, 0}        (126)

The (16, 9, 4) dual code L16⊥ has the same state-complexity profile by Theorem 4.20; the time
axis in (125) is obviously componentwise optimal for the dual code as well.
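
Given a generator matrix in minimal span form, the state-complexity profile can be read off
by counting active spans: a row whose span runs from its first nonzero position a to its last
nonzero position b is active at time i whenever a ≤ i < b. The following sketch (ours, not
from the chapter) recovers the profile (126) from the matrix (125).

    # Sketch (ours): state-complexity profile from a minimal-span-form matrix.

    def msf_profile(rows):
        # span of each row: (first nonzero position, last nonzero position)
        spans = [(r.index('1') + 1, len(r) - r[::-1].index('1')) for r in rows]
        n = len(rows[0])
        return [sum(1 for a, b in spans if a <= i < b) for i in range(n + 1)]

    L16 = ["1111110000000000", "0101101110000000", "0011010101100000",
           "0000110110110000", "0000011011001100", "0000000111011010",
           "0000000000111111"]
    print(msf_profile(L16))
    # [0, 1, 2, 3, 3, 4, 4, 4, 5, 4, 4, 4, 3, 3, 2, 1, 0], matching (126)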
Our third and final example is the (48, 24, 12) quadratic-residue code Q48. Two different
uniformly efficient permutations of the time axis for Q48 were recently reported in [7, 24].
Here is one of the resulting generator matrices for Q48 in minimal span form:

    [ 111111 111111 000000 000000 000000 000000 000000 000000 ]
    [ 011010 011010 001101 001110 000000 000000 000000 000000 ]
    [ 001001 001001 011011 011011 000000 000000 000000 000000 ]
    [ 000111 000111 111000 111000 000000 000000 000000 000000 ]
    [ 000011 010100 100111 000001 111000 000000 000000 000000 ]
    [ 000001 011001 110111 011100 110000 110000 000000 000000 ]
    [ 000000 111111 111111 000000 000000 000000 000000 000000 ]
    [ 000000 010001 000011 111010 111010 001001 110000 000000 ]
    [ 000000 001100 110101 001000 110010 001010 000000 000000 ]
    [ 000000 000110 101011 011001 100011 000000 000000 000000 ]
    [ 000000 000011 110000 101001 101010 010010 110110 000000 ]
    [ 000000 000000 111001 111101 010000 011000 000000 000000 ]        (127)
    [ 000000 000000 010100 111110 111101 100010 010100 000000 ]
    [ 000000 000000 001111 111110 001110 000000 000000 000000 ]
    [ 000000 000000 000110 000010 101111 100111 000000 000000 ]
    [ 000000 000000 000000 110001 100110 110101 011000 000000 ]
    [ 000000 000000 000000 011100 011111 111100 000000 000000 ]
    [ 000000 000000 000000 000000 000000 111111 111111 000000 ]
    [ 000000 000000 000011 011111 010001 000111 100110 100000 ]
    [ 000000 000000 000000 000111 100000 111001 001010 110000 ]
    [ 000000 000000 000000 000000 000111 111000 000111 111000 ]
    [ 000000 000000 000000 000000 110110 110110 100100 100100 ]
    [ 000000 000000 000000 000000 011100 101100 010110 010110 ]
    [ 000000 000000 000000 000000 000000 000000 111111 111111 ]
This matrix was obtained by Berger and Be'ery [7] using a recursive `twisted squaring con-
struction' that is somewhat similar to (122). Another generator matrix for Q48, which is
also componentwise optimal, was found earlier in [24] by computer search. However, the
time axis of (127) has the additional nice property of making Q48 reversible. In both cases,
the state-complexity profile is the same. Since Q48 is a self-dual code with d = d⊥ = 12, we
know from Lemma 5.2 that si = s48−i = i for i = 0, 1, ..., 11, under all permutations of the
time axis. The remaining 25 entries in the state-complexity profile are given by:

    { ..., 10, 11, 12, 13, 14, 15, 14, 15, 16, 15, 16, 15, 14, 15, 16, 15, 16, 15, 14, 15, 14, 13, 12, 11, 10, ... }        (128)

Notice that this profile is symmetric: namely, si = s_{n−i} for all i. It is not difficult to see that
this is always true for uniformly efficient permutations. Also notice that si ≠ s_{i−1} for all i.
Again, this is always true for self-dual codes, because the minimal trellis cannot contain the
‖ and ⋈ structures by Theorem 4.21. The fact that the profile in (128), and hence also the
generator matrix in (127), is optimal componentwise follows from the DLP lower bounds
on trellis complexity, discussed in the next subsection.
In addition to L16, G24, and Q48, uniformly efficient permutations were constructed in [62] for
the so-called minimal-span codes, and certain codes obtained by augmenting minimal-span
codes. Unfortunately, all these codes are trivial. In fact, the minimal-span codes of [62]
are precisely the codes with contraction index λ ≤ 1 of [5, 116]. Less trivial examples were
obtained by Encheva [27]. For instance, for each k ≥ 4 and a ≥ 1, Encheva [27] constructs
an (a·2^{k−1} + a, k, a·2^{k−2}) code together with a uniformly efficient permutation for this code.
However, none of these examples is as interesting as L16, G24, and Q48.

The work of [39, 60, 61] and in particular [62] contains a detailed analysis of the properties
of uniformly efficient permutations. For example: an ordering of the time axis is uniformly
efficient for C if and only if it is also uniformly efficient for the dual code C⊥; if a given
ordering of I is uniformly efficient then so is the reverse ordering, and so forth. Finally,
we note that codes for which a uniformly efficient permutation exists are said to satisfy
the double-chain condition in some papers [39, 27, 64]. For an extensive treatment of the
double-chain condition and codes that satisfy it, we refer the reader to [27, 28, 29, 64].

In some cases, it is possible to use the structure of a code in order to obtain reasonably
good, although not componentwise optimal, permutations of the time axis. We give just one
concrete example: we will show how this approach works for primitive binary BCH codes.
Let C be an extended primitive binary narrow-sense BCH code of length n = 2^m, dimen-
sion k, and designed distance δ. As usual, we label the coordinates of C with the elements
{0, 1, α, α², ..., α^{n−2}} of the finite field GF(2^m), where α is a primitive element of GF(2^m). Thus

    H = [ 1   1   1          1              ···   1
          0   1   α          α²             ···   α^{n−2}
          0   1   α²         α⁴             ···   α^{2(n−2)}
          ⋮   ⋮   ⋮          ⋮                    ⋮
          0   1   α^{δ−1}    α^{2(δ−1)}     ···   α^{(δ−1)(n−2)} ]

is a parity-check matrix for C. Notice that if the time axis for C is chosen in this way,
then C becomes an extended cyclic code, and the state complexity of the resulting minimal
trellis for C is just one less than the Wolf bound min(k, n−k). We can do much better, as
follows. Given a subset V ⊆ GF(2^m), let C(V) denote the subcode of C consisting of all the
codewords whose support is confined to those positions that have labels in V.

                              Bounds on s
         Code              Upper bound   Lower bound   Reference
     1.  BCH [8,4,4]            3             3           [60]
     2.  BCH [16,11,4]          4             4           [60]
     3.  BCH [16,7,6]           6             6           [127]
     4.  BCH [16,5,8]           4             4           [60]
     5.  BCH [32,26,4]          5             5           [60]
     6.  BCH [32,21,6]         10            10           [127]
     7.  BCH [32,16,8]          9             9           [60]
     8.  BCH [32,11,12]        10            10           [127]
     9.  BCH [32,6,16]          5             5           [60]
    10.  BCH [64,57,4]          6             6           [60]
    11.  BCH [64,51,6]         12            12           [127]
    12.  BCH [64,45,8]         14            12           [74]
    13.  BCH [64,39,10]        20            13           [74]
    14.  BCH [64,36,12]        19            15           [114]
    15.  BCH [64,30,14]        21            16           [114]
    16.  BCH [64,24,16]        16            14           [114]
    17.  BCH [64,18,22]        17            17           [127]
    18.  BCH [64,16,24]        15            15           [127]
    19.  BCH [64,10,28]         9             9           [114]
    20.  BCH [64,7,32]          6             6           [60]

    Table 3. Bounds on state complexity for primitive binary BCH codes


Now let us view GF(2^m) as the vector space IF2^m over GF(2). Then we can choose V as
a subspace of IF2^m of dimension ν ≤ m − 1, and partition the space into 2^{m−ν} additive cosets
V1, V2, ..., V_{2^{m−ν}} of this subspace, with V1 = V. Let a = dim C(V). It is shown in [114] that

    dim C(V1) = dim C(V2) = ··· = dim C(V_{2^{m−ν}}) = a

Since the supports of C(V1), C(V2), ..., C(V_{2^{m−ν}}) are disjoint by construction, we can re-
order the time axis for C so that the spans of the codewords in C(V1), C(V2), ..., C(V_{2^{m−ν}})
also become disjoint. Namely, we list first all the positions with labels in V1, then all the
positions with labels in V2, and so on. In this ordering, the generators of at most one of the
subcodes C(V1), C(V2), ..., C(V_{2^{m−ν}}) are active at each time, and hence the state complexity
of the resulting minimal trellis is upper bounded by

    s ≤ k − (2^{m−ν} − 1) a

The situation here is somewhat analogous to that of Example 5.2. The resulting generator
matrix for C is said to have direct-sum structure in [114]. This terminology is natural, since

    C(V1) ⊕ C(V2) ⊕ ··· ⊕ C(V_{2^{m−ν}})

constitutes a direct-sum subcode of C of dimension 2^{m−ν} a. We can further pursue this idea re-
cursively: namely, we can partition V itself into additive cosets of a smaller subspace V′ ⊂ V,
thereby exhibiting direct-sum structure in each of the subcodes C(V1), C(V2), ..., C(V_{2^{m−ν}}),
and so forth. Notice that the standard binary order, that is, the lexicographic ordering of the
elements in IF2^m, has this kind of recursive structure (see the sketch below).
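
The reordering described above is easy to generate mechanically. The following Python sketch
(ours; the function name and the integer representation of field elements are our own choices)
lists the 2^m coordinate labels coset by coset.

    # Sketch (ours): order the 2^m labels of IF2^m by the additive cosets of a
    # subspace V, with V = V1 itself listed first. Labels are m-bit integers.

    def coset_order(m, basis):
        V = {0}
        for b in basis:                  # close V under addition of each basis vector
            V |= {v ^ b for v in V}
        order, seen = [], set()
        for x in range(2 ** m):          # coset representatives in increasing order
            if x not in seen:
                coset = sorted(v ^ x for v in V)
                order.extend(coset)
                seen.update(coset)
        return order

    # With V = span{1, 2} in IF2^3 this reproduces the standard binary order,
    # illustrating its recursive coset structure:
    print(coset_order(3, [1, 2]))        # [0, 1, 2, 3, 4, 5, 6, 7]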
Some of the upper bounds on the state complexity of BCH codes obtained using these
techniques are listed in Table 3. The lower bounds are also included in Table 3 for comparison.
To find out how these bounds were obtained, see the next subsection and the references
provided in Table 3. Notice that the upper and lower bounds on state complexity coincide
for all but five codes in Table 3, demonstrating the utility of the direct-sum structure.

We point out that the approach discussed above, based on the direct-sum structure, is by no
means the only way to find good permutations. For example, the work of [114] also considers
the so-called concurring-sum structure, which leads to useful bounds on state complexity.
Other methods of constructing good permutations for BCH codes were developed by Kasami,
Takata, Fujiwara, and Lin in [59, 60, 61]. These methods were extended to Euclidean
geometry codes and generalized-concatenated codes in [86]. Cyclic codes of composite length
were considered in [6, 114]. For such codes, it is often possible to obtain good permutations
by partitioning the time axis into multiplicative, rather than additive, cosets; see [114] for
more details. Berger and Be'ery [7] consider codes whose automorphism group contains the
general affine group GA(m) or the projective special linear group PSL2(p). The former type
of codes includes all the primitive binary BCH codes of length 2^m, while the latter includes
all the quadratic-residue codes of length p. It is shown in [7] that good permutations for
such codes can be obtained using the `twisted' squaring construction of [38].

The conclusion from all this is that the structure of a code is key to finding good upper
bounds on trellis complexity. Thus, to some extent, the art of trellis decoding consists of
identifying the appropriate structure and using it to establish an ordering of the time axis.

5.3. Lower bounds on trellis complexity
A variety of powerful lower bounds on the trellis complexity of linear codes follow directly from
Theorem 4.17, which counts the number of vertices and edges in the minimal trellis in terms
of the dimensions of the past and future subcodes, defined in (63) and (64) respectively.
Perhaps the earliest known lower bound of this type is due to Muder [87].

Theorem 5.8. Let C be an (n, k, d) linear code over IFq. Then under all permutations of
the time axis I for C, the state complexity of the minimal trellis for C is lower bounded by:

    s ≥ k − min_{i∈I} { K(i, d) + K(n−i, d) }

where K(n, d) denotes the largest possible dimension of a linear code of length n and mini-
mum Hamming distance d over IFq.

Proof. We know from Theorem 4.17 that si = k − (pi + fi), where pi and fi are the
dimensions of the past and future subcodes Pi and Fi, respectively. Thus the theorem follows
immediately by observing that the minimum distance of both Pi and Fi is at least d.
The foregoing argument can be easily extended to a lower bound on the entire state-comp-
lexity profile. In fact, one can just as easily produce a similar lower bound on the edge-
complexity profile. It follows directly from Theorem 4.17 that:

    si ≥ k − K(i, d) − K(n−i, d)        (129)
    bi ≥ k − K(i−1, d) − K(n−i, d)        (130)

for all i = 1, 2, ..., n, and under all possible permutations of the time axis. Notice that
these bounds are based on retaining a single essential piece of information about the past
and future subcodes: that their distance is at least d. This leaves room for improvement.
Indeed, the bounds (129) and (130) were improved upon by several authors [60, 114, 39], who
noticed that instead of just specifying a lower bound on the minimum distance of Pi and Fi,
it might be better to characterize Pi and Fi as subcodes of C of support size i and n − i,
respectively. This leads to the notion of dimension-length profile (DLP) and establishes an
interesting connection between the trellis complexity of a code and its generalized Hamming
weight (GHW) hierarchy. We start with some definitions.
We define the support of a code C of length n, denoted σ(C), as the set of all positions i such
that there exist codewords (c1, c2, ..., cn), (c′1, c′2, ..., c′n) ∈ C with ci ≠ c′i. Notice that this
definition of σ(·), introduced in [74], applies to both linear and nonlinear codes; for linear
codes, it coincides with the usual notion of support as the set of nonzero positions.

Definition 5.1. Let C be a linear code of length n and dimension k over IFq. Then the i-th
generalized Hamming weight of C is defined as:

    di(C) := min_D |σ(D)|    for i = 1, 2, ..., k        (131)

where the minimum is taken over all linear subcodes D ⊆ C such that dim D = i. The
sequence d1(C), d2(C), ..., dk(C) is called the generalized Hamming weight hierarchy of C.
Generalized Hamming weights were first studied in 1977 by Helleseth, Kløve, and Mykkeltveit
in [51], where they were called support weights. The current terminology was introduced by
Wei [121], following the work of Ozarow and Wyner [90] on codes for the wire-tap channel.
This terminology arises from the observation that the first component d1(C) = d in the GHW
hierarchy is just the minimum weight of a nonzero codeword. Since the work of Wei [121], the
study of generalized Hamming weights has attracted considerable interest, and at least partial
results on the GHW hierarchy of a variety of codes are now available [7, 18, 51, 52, 121, 122].
We provide a more extensive bibliography on this subject in Section 7.
For our purposes, a somewhat different sequence will be more convenient. We will see shortly
that this sequence is essentially equivalent to the GHW hierarchy.

Definition 5.2. Let C be a linear code of length n and dimension k over IFq. We define κi(C)
as the dimension of the largest subcode of C of support size i, namely:

    κi(C) := max_D dim D    for i = 1, 2, ..., n        (132)

where the maximum is taken over all linear subcodes D ⊆ C such that |σ(D)| = i. The se-
quence κ1(C), κ2(C), ..., κn(C) is called the dimension-length profile of C.

The DLP was introduced in [60, 114, 39] and later studied in [24, 74] and other works.
It is obvious from (131) and (132) that both the GHW hierarchy and the DLP are non-
decreasing sequences. The DLP and the GHW hierarchy are equivalent, in the sense that
either sequence is uniquely determined by the other, as follows:

    di(C) = min{ j : κj(C) ≥ i }    for i = 1, 2, ..., k        (133)
    κi(C) = max{ j : dj(C) ≤ i }    for i = 1, 2, ..., n        (134)

where d0(C) = κ0(C) = 0 by convention. For example, since d1(C) = d, we can conclude
from (134) that κi(C) = 0 for i = 1, 2, ..., d−1. This should be clear from Definition 5.2
directly. It is also easy to see from Definition 5.2 that κ_{n−i}(C) = k − i for i = 0, 1, ..., d⊥−1.
Thus (133) implies that the last few terms in the GHW hierarchy are n−d⊥+2, ..., n−1, n.
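
The conversions (133) and (134) are simple enough to state in code. The sketch below (ours,
not part of the chapter) converts between the two sequences, using the DLP of the lexicode
L16 (established in Example 5.4 below) as a test case.

    def ghw_from_dlp(kappa):
        """(133): d_i = min{ j : kappa_j >= i }; kappa = [kappa_0, ..., kappa_n]."""
        k = kappa[-1]
        return [min(j for j, x in enumerate(kappa) if x >= i)
                for i in range(1, k + 1)]

    def dlp_from_ghw(d, n):
        """(134): kappa_i = max{ j : d_j <= i }, with d_0 = 0 by convention."""
        d = [0] + list(d)
        return [max(j for j, x in enumerate(d) if x <= i) for i in range(n + 1)]

    kappa_L16 = [0,0,0,0,0,0,1,1,1,2,2,3,4,4,5,6,7]
    print(ghw_from_dlp(kappa_L16))                    # [6, 9, 11, 12, 14, 15, 16]
    print(dlp_from_ghw([6,9,11,12,14,15,16], 16) == kappa_L16)   # True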

Theorem 5.9. Let C be an (n, k, d) linear code over IFq. Then, under all permutations of
the time axis I for C, the state complexity of the minimal trellis for C is lower bounded by:

    s ≥ k − min_{i∈I} { κi(C) + κ_{n−i}(C) }

Moreover, for all i = 1, 2, ..., n, the state-complexity profile and the edge-complexity profile
of the minimal trellis are lower bounded by:

    si ≥ k − κi(C) − κ_{n−i}(C)        (135)
    bi ≥ k − κ_{i−1}(C) − κ_{n−i}(C)        (136)

under all permutations of the time axis. Furthermore, these bounds hold with equality when-
ever i < min(d, d⊥) or i > n − min(d, d⊥) + 1, where d⊥ denotes the dual distance.

Proof. It is obvious from the definition of the dimension-length profile that pi ≤ κi(C)
and fi ≤ κ_{n−i}(C) for all i, and for all permutations of the time axis. Thus all the bounds in
Theorem 5.9 follow directly from Theorem 4.17. If i < min(d, d⊥), then (135) and (136) reduce
to si ≥ i and bi ≥ i, respectively, and equality holds by Lemma 5.2. For i > n − min(d, d⊥) + 1,
the bounds in (135) and (136) reduce to si ≥ n − i and bi ≥ n − i + 1, respectively, and equality
follows from Lemma 5.2 and Table 1.
The bounds of Theorem 5.9 are known as the DLP bounds on trellis complexity. If C is an
(n, k, d) linear code, then clearly κi(C) ≤ K(i, d) for all i. The sequence K(1, d), K(2, d), ...
is thus an upper bound on the dimension-length profile of any code with minimum distance d.
This simple observation is known [39] as the distance bound on the dimension-length profile.
The sequence K(1, d), K(2, d), ... can be estimated using any of the techniques described
in [79, Chapter 17] or, if the code parameters are small enough, the required values can be
simply looked up in the tables of Brouwer [12, 13]. Inferring the dimension-length profile
from the distance bound reduces Theorem 5.9 to the Muder bound of Theorem 5.8. In some
cases, this is sufficient to completely determine the trellis complexity of a code.
Example 5.4. Referring to the table of [13], the distance upper bound on the dimension-
length profile of a binary linear (n, k, 6) code is given by:

    {0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 14, 15, 16, ...}        (137)

This profile is attained by the (6, 1, 6) repetition code, by the (15, 6, 6) Kasami code [53],
and by the (16, 7, 6) lexicode L16, discussed in the previous subsection. The corresponding
lower bound on the GHW hierarchy of an (n, k, 6) binary code is given by:

    {6, 9, 11, 12, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 27, ...}        (138)

This bound is derived from (133) as follows: the sequence in (138) consists of those integers i
for which K(i, 6) > K(i−1, 6) in (137). Of course, this can also be obtained directly from [13].
Now consider the (16, 7, 6) lexicode L16. Taking the first sixteen terms in (137) gives an upper
bound on κi(L16); flipping the resulting sequence around gives an upper bound on κ_{16−i}(L16);
substituting all this into equation (135) of Theorem 5.9 produces a lower bound on the state-
complexity profile of L16, as follows:

    i :                0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    κi(L16) ≤          0  0  0  0  0  0  1  1  1  2  2  3  4  4  5  6  7
    κ16−i(L16) ≤       7  6  5  4  4  3  2  2  1  1  1  0  0  0  0  0  0
    si ≥               0  1  2  3  3  4  4  4  5  4  4  4  3  3  2  1  0

The state-complexity profile in (126) attains this bound at each position. Hence the time
axis for L16 given by the generator matrix in (125) is optimal componentwise. ◊
Example 5.5. Referring once again to the table of [13], the distance bound on the dimension-
length profile of a binary linear (n, k, 8) code is given by:

    {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 13, 14, 15, 16, ...}        (139)

This profile is attained by the (8, 1, 8) repetition code, by the (16, 5, 8) first-order Reed-Muller
code, and by the (24, 12, 8) binary Golay code G24, but not by any (32, 16, 8) code.

As in the foregoing example, the distance bound on DLP suffices to completely determine
the trellis complexity of the Golay code. Taking the first 24 terms in (139), we obtain the
following bounds on κi(G24), κ24−i(G24), and the state-complexity profile of the Golay code:

    i :             0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    κi(G24) ≤       0  0  0  0  0  0  0  0  1  1  1  1  2  2  3  4  5  5  6  7  8  9 10 11 12
    κ24−i(G24) ≤   12 11 10  9  8  7  6  5  5  4  3  2  2  1  1  1  1  0  0  0  0  0  0  0  0
    si ≥            0  1  2  3  4  5  6  7  6  7  8  9  8  9  8  7  6  7  6  5  4  3  2  1  0

The state-complexity profile in (124) attains this bound at each position. It follows that the
generator matrix in (123) and the standard MOG order are optimal componentwise. ◊
In contrast to the situation in the foregoing two examples, the distance bound on DLP is
not enough to determine the trellis complexity of the (48, 24, 12) quadratic-residue code Q48.
In fact, the Muder bound of Theorem 5.8 does not even show that the state complexity of
the minimal trellis for Q48 is at least s ≥ 16 under all permutations of the time axis. Since
we already know that the state-complexity profile in (128) is attainable, and si ≤ 15 for
i ≠ 20, 22, 26, 28 in (128), the only positions where one could hope to show that si ≥ 16
are i = 20, 22, 26, 28. However, all we get from the Muder bound (129) at these positions is:

    s20, s28 ≥ k − K(20, 12) − K(28, 12) = 24 − 2 − 7 = 15
    s22, s26 ≥ k − K(22, 12) − K(26, 12) = 24 − 3 − 6 = 15

Thus to prove that the state-complexity profile in (128) is componentwise optimal for Q48,
we have to make essential use of dimension-length profiles and the bound of Theorem 5.9.
In particular, we will need the following DLP duality theorem.
Theorem 5.10. The dimension-length profiles of an (n, k, d) linear code C and of its dual
(n, n−k, d⊥) code C⊥ are related to each other as follows:

    κi(C⊥) = i − k + κ_{n−i}(C)    for i = 1, 2, ..., n        (140)

Thus the dimension-length profile of C⊥ is determined by the dimension-length profile of C,
and vice versa.
Proof. In Section 4, we have established the relation p⊥i = i − k + fi between the dimensions
of the past and future subcodes of C⊥ and C. Evidently p⊥i ≤ κi(C⊥) and fi ≤ κ_{n−i}(C) for
all i and for all permutations of the time axis. Furthermore, there exists some permutation π
of the time axis, such that fi = κ_{n−i}(C). For this permutation, we have:

    κi(C⊥) ≥ p⊥i = i − k + fi = i − k + κ_{n−i}(C)        (141)

On the other hand, there exists some permutation of the time axis, possibly different from π,
such that p⊥i = κi(C⊥). For this permutation, we have:

    κi(C⊥) = p⊥i = i − k + fi ≤ i − k + κ_{n−i}(C)        (142)

The key observation is that the dimension-length profiles of C and C⊥ are invariant under
coordinate permutations. Hence (141) and (142) imply equality in (140) for all i.
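
For small codes, the duality relation (140) can be verified by brute force. The sketch below
(ours, not from the chapter) computes the DLP of a binary code as the maximum, over
coordinate windows J of size i, of dim{c ∈ C : σ(c) ⊆ J}, and checks (140) on the (6, 2, 4)
code from Section 5.2 and its dual; the dual basis given in the code is our own computation.

    from itertools import combinations

    def gf2_rank(vectors):
        basis = {}
        for v in vectors:
            while v:
                p = v.bit_length() - 1
                if p not in basis:
                    basis[p] = v
                    break
                v ^= basis[p]
        return len(basis)

    def dlp(G, n):
        """Brute-force DLP of the code generated by G (rows as n-bit integers)."""
        k = gf2_rank(G)
        full = (1 << n) - 1
        prof = [0]
        for i in range(1, n + 1):
            # dim{c : supp(c) within J} = k - rank of G restricted outside J
            prof.append(max(k - gf2_rank([g & (full ^ sum(1 << j for j in J))
                                          for g in G])
                            for J in combinations(range(n), i)))
        return prof

    n, k = 6, 2
    C  = [0b111100, 0b001111]                        # the (6,2,4) code
    Cd = [0b110000, 0b001100, 0b000011, 0b101010]    # a basis for its dual
    kC, kCd = dlp(C, n), dlp(Cd, n)
    print(kC)                                        # [0, 0, 0, 0, 1, 1, 2]
    print(all(kCd[i] == i - k + kC[n - i] for i in range(n + 1)))   # True, as in (140)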
Theorem 5.10 is equivalent to the duality theorem of Wei [121] relating the GHW hierarchies
of a code and its dual. Let us define the inverse GHW hierarchy by the relation:

    d̃i(C) := n − di(C) + 1    for i = 1, 2, ..., k

Then Wei [121] shows that the inverse GHW hierarchy of C and the GHW hierarchy of its
dual code C⊥ partition the time axis:

    { d̃1(C), d̃2(C), ..., d̃k(C) } ∪ { d1(C⊥), d2(C⊥), ..., d_{n−k}(C⊥) } = {1, 2, ..., n}        (143)

In other words, every positive integer i ≤ n is either a generalized Hamming weight of C⊥,
or else n − i + 1 is a generalized Hamming weight of C, but not both. This striking duality
result closely resembles the partition of the time axis into the left and right indices of C
and C⊥ established in equation (106) of Section 4.
Indeed, we now show that both partitions of the time axis essentially follow from the same
relation p⊥i = i − k + fi between the past and future subcodes of C⊥ and C. It is clear from
Definition 5.2 that the DLP sequence κ0(C), κ1(C), ..., κn(C) increases from zero to k, in
k distinct unit steps. We can define the inverse DLP sequence by the relation:

    κ̃i(C) := k − κ_{n−i}(C)    for i = 0, 1, ..., n

This inverse DLP sequence κ̃0(C), κ̃1(C), ..., κ̃n(C) also increases from zero to k, in k distinct
unit steps, and Theorem 5.10 can be re-phrased in terms of the inverse DLP as follows:

    κi(C⊥) = i − κ̃i(C)    for i = 0, 1, ..., n        (144)

Thus it follows from Theorem 5.10 that the dual DLP sequence κ0(C⊥), κ1(C⊥), ..., κn(C⊥)
increases from zero to n − k, in n − k distinct unit steps, which occur precisely when there is
no increase in the inverse DLP sequence κ̃0(C), κ̃1(C), ..., κ̃n(C). In conjunction with (133),
this establishes the duality theorem of Wei [121] in (143).


Wei suggested in [121] that the GHW duality relations (143) and (144) should have something
to do with the MacWilliams identities [79, Chapter 5]. Although no immediate connection of
this type is apparent, Kløve [63] and Simonis [99] have been able to prove generalized Mac-
Williams identities for "support weight distributions," which determine GHW hierarchies.
We now return to the problem of bounding the trellis complexity of the quadratic-residue
code Q48. The key point is that Theorem 5.10 implies a stronger bound on the dimension-
length profile sequence than the distance bound κi(C) ≤ K(i, d), namely:

    κi(C) ≤ min{ K(i, d),  K(n−i, d⊥) + i − (n−k) }        (145)

For self-dual codes, the minimum in (145) usually coincides with the second term for i ≥ k.
For example, for the (48, 24, 12) quadratic-residue code Q48, we obtain:

    K(i, 12):                5  6  7  7  8  9 10 11 11 12 13 14 15 16 17 18 19 20 20 21 22 23 23 24
    K(48−i, 12) + i − 24:    5  5  6  6  7  8  8  9 10 11 12 13 13 14 15 16 17 18 19 20 21 22 23 24

for i = 25, 26, ..., 48, where we have again made use of the table of [13]. Substituting the
resulting bounds on κi(Q48) into Theorem 5.9 shows that the state-complexity profile in (128)
is optimal componentwise. Furthermore, this also implies that the bound (145) holds with
equality for all i = 1, 2, ..., 48 in the case of the (48, 24, 12) quadratic-residue code, and
completely determines the dimension-length profile and the GHW hierarchy of this code.
The latter property is true in general: a componentwise optimal permutation of the time
axis for a linear code C always determines the DLP and the GHW hierarchy of C.

A useful general bound on the dimension-length profile and/or GHW hierarchy of any (n, k, d)
linear code over IFq is the well-known [79, p. 547] Griesmer bound:

    di(C) ≥ d + ⌈d/q⌉ + ⌈d/q²⌉ + ··· + ⌈d/q^{i−1}⌉        (146)

Using this bound along with Theorem 5.9 and (134), we will establish a simple lower bound
on the state complexity of low-rate binary codes with large minimum distance. The following
theorem shows that the upper bound s ≤ k of Theorem 5.5 cannot be improved by more
than 1 or 2 units, when d/n is greater than about 0.4 or 0.33, respectively.
Theorem 5.11. Let C be an (n, k, d) binary linear code. Then, under all permutations of
the time axis, the state complexity of the minimal trellis for C is lower bounded by:

    s ≥ k − 1    if d ≥ (2/5)(n + 2)        (147)
    s ≥ k − 2    if d ≥ (1/3)(n + 2)        (148)

Proof. The Griesmer bound (146) implies that the second generalized Hamming weight
is lower bounded by d + ⌈d/2⌉ for binary linear codes. If d ≥ (2/5)(n + 2), then this means that
d2(C) > n − d + 1 and hence κ_{n−d+1}(C) ≤ 1. The DLP bound of Theorem 5.9 now shows that:

    s ≥ k − κ_{d−1}(C) − κ_{n−d+1}(C) ≥ k − 1

By a similar argument, if d ≥ (1/3)(n + 2) then d2(C) > ⌈n/2⌉ and κ_{⌈n/2⌉}(C) ≤ 1. Thus (148)
follows again from the DLP bound (135) of Theorem 5.9, evaluated at i = ⌈n/2⌉.
The argument of Theorem 5.11 can be easily pursued further, using more and more terms in
the Griesmer bound, to show that s ≥ k − 3 if d/n is greater than about 4/13 ≈ 0.308, and
s ≥ k − 4 if d/n is greater than about 2/7 ≈ 0.286, and so on. However, the proof of all this
soon becomes tedious. In general, we shall see in Section 5.5 that the upper bound s ≤ k
cannot be improved upon by more than a constant if and only if d/n ≥ 0.25.
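
The Griesmer-type bound (146) and the threshold argument of Theorem 5.11 are easy to
reproduce numerically. The sketch below is ours, and the parameters n = 48, d = 20 are
a hypothetical illustration chosen so that d = (2/5)(n + 2) exactly.

    from math import ceil

    def griesmer_ghw(d, q, i):
        """The bound (146): d_i(C) >= d + ceil(d/q) + ... + ceil(d/q^(i-1))."""
        return sum(ceil(d / q**j) for j in range(i))

    # For binary codes with d >= (2/5)(n+2), the second GHW exceeds n-d+1,
    # which forces s >= k-1 as in (147):
    n, d = 48, 20
    print(griesmer_ghw(d, 2, 2), '>', n - d + 1)   # 30 > 29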
Lower bounds on s similar to Theorem 5.11 were established by Ytrehus [127]. However, in
contrast to all the lower bounds discussed so far, which are based on the GHW hierarchy and
dimension-length profiles, Ytrehus [127] uses the distance set of a code. Given a code C
of length n, we define the distance set of C as follows:

    D(C) := { 0 ≤ w ≤ n : there exist c, c′ ∈ C such that d(c, c′) = w }

Thus for a linear code C, the distance set D(C) is just the set of integers that occur as
Hamming weights of the codewords of C. The one important property that is shared by the
distance set and the DLP is that both are obviously invariant under permutations of the
time axis. The following theorem is due to Ytrehus [127]. It is presented without proof.

Theorem 5.12. Let C be an (n, k, d) binary linear code, and let s denote the state com-
plexity of the minimal trellis for C. If the distance set of C satisfies

    D(C) ⊆ {0} ∪ {d, d + 1, ..., 2d − 1} ∪ {n}        (149)

then s ≥ k − 1. If furthermore n ∈ D(C) then s = k − 1. On the other hand, if D(C) satisfies
(149) with the additional restriction that n, 2d−1, 2d−2 ∉ D(C), then s = k.
Notice that if a binary linear code C is self-complementary, namely if n ∈ D(C), then the
condition (149) is satisfied whenever d/n > 1/3. It follows that s = k − 1 for all such codes by
Theorem 5.12. In particular, the dual codes of extended double-error-correcting binary BCH
codes of length 2^m are self-complementary, and d = 2^{m−1} − 2^{⌊m/2⌋} > 2^m/3 for all m ≥ 5.
Thus the state complexity of these codes and their duals is s = 2m by Theorem 5.12.

Lower bounds on the state complexity of the (16, 7, 6), (32, 21, 6), (32, 11, 12), (64, 51, 6),
(64, 18, 22), and (64, 16, 24) BCH codes in Table 3 are derived from Theorem 5.12, by consid-
ering the distance sets of these codes and/or of their duals.

Another lower bound on the state complexity of the minimal trellis is known [17, 62, 73] as
the span bound. This bound has a simple proof, which does not involve the notions of past
and future subcodes, dimension-length profiles, and so forth.

Theorem 5.13. Let C be an (n, k, d) linear code over IFq. Then, under all permutations of
the time axis, the state complexity of the minimal trellis for C is lower bounded by:

    s ≥ ⌈ k(d − 1)/n ⌉ = ⌈ R(d − 1) ⌉

Proof. Let x1, x2, ..., xk be a basis for C in minimal span form. We have observed in
equation (75) of Section 4 that the total span τ of the minimal trellis for C is given by:

    τ = s0 + s1 + ··· + sn = s[x1] + s[x2] + ··· + s[xk]

Since wt(xi) ≥ d for all i, it is obvious that s[xi] ≥ d − 1 for all i, and for all permutations of
the time axis. Hence τ ≥ k(d − 1). The theorem now follows by observing that the maximum
value s of the sequence s1, s2, ..., sn is lower bounded by its average value τ/n.
Since the proof of Theorem 5.13 makes essential use of linearity (it assumes the existence
of a basis in minimal span form), it is somewhat surprising that the lower bound

    logq Vmax ≥ R(d − 1)        (150)

holds also for nonlinear codes, where the rate R has the usual meaning R = logq |C| / n.
Loosely speaking, the proof of this result is based on dividing the time axis for the code into
⌈n/(d−1)⌉ sections, each of length ≤ d−1, and using the fact that no two paths can start
and end at the same vertices in a trellis section of length less than d. One can then relate
the maximum number of vertices in the trellis to the total number of paths, which in turn
must be greater than or equal to the number of codewords in the code. For a detailed proof
of the span bound on trellis complexity of nonlinear codes, see Lafourcade and Vardy [73].
The span bound of Theorem 5.13 is usually a weak bound. For instance, for L16, G24, and Q48,
we conclude from the span bound that the state complexity of the minimal trellis is at least
3, 4, and 6, respectively, whereas the DLP bound of Theorem 5.9 establishes the true values
of 5, 9, and 16. However, we shall see in Section 5.5 that asymptotically the span bound
becomes stronger than the DLP bound for high-rate codes.
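
For reference, the three span-bound values just quoted are obtained as follows (our own
sketch, not part of the chapter):

    from math import ceil

    def span_bound(n, k, d):
        """Theorem 5.13: s >= ceil(k(d-1)/n)."""
        return ceil(k * (d - 1) / n)

    for name, (n, k, d) in {"L16": (16, 7, 6), "G24": (24, 12, 8),
                            "Q48": (48, 24, 12)}.items():
        print(name, span_bound(n, k, d))   # 3, 4 and 6, respectively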
Moreover, the span bound holds essentially without change for tail-biting trellises [17, 67],
and can be further extended to more general representations of a code by various graphs [66].
On the other hand, the DLP bound does not appear to lend itself to such generalizations.

Although the span bound of Theorem 5.13 and the DLP bound of Theorem 5.9 seem to be
genuinely dissimilar, we now derive a powerful lower bound on the trellis complexity of linear
codes that includes Theorem 5.8, Theorem 5.9, and Theorem 5.13 as special cases. This
bound is obtained by partitioning the time axis for C into several (that is, generally more
than two) sections of varying lengths, and counting the number of paths between vertices of
the minimal trellis that lie on section boundaries in two different ways. In some cases, this
results in substantially more accurate estimates of trellis complexity than the DLP bound.

We start with some notation. Let T = (V, E, IFq) be the minimal trellis for an (n, k, d) linear
code C over IFq. Given a vertex v ∈ Vi and a positive integer j ≤ n − i, let P(v; j) denote the
set of all paths of length j in T starting at v. Further, given a particular vertex v′ ∈ V_{i+j},
we denote by P(v, v′) the subset of P(v; j) consisting of all the paths (of length j) in T that
start at v and end in v′. We have the following lemma.
Lemma 5.14. For any two vertices v ∈ Vi and v′ ∈ V_{i+j},

    logq |P(v, v′)| ≤ κj(C)        (151)
Proof. First consider the unique path in T corresponding to the all-zero codeword, and
assume that both v and v′ lie on this path. Let J = {i+1, i+2, ..., i+j} and let

    C_J := { (c1, c2, ..., cn) ∈ C : cl = 0 for all l ∉ J }

denote the shortened subcode of C whose support σ(C_J) is confined to J. Clearly, if v, v′ lie
on the path corresponding to the all-zero codeword, then the sequence of edge-labels along
any path in P(v, v′) can be completed to a codeword of C_J. Since |σ(C_J)| ≤ |J| = j, we
have dim C_J ≤ κj(C) by the definition of the dimension-length profile. As the trellis T is proper,
all the paths in P(v, v′) must be labeled distinctly, and hence |P(v, v′)| ≤ |C_J|. Thus

    logq |P(v, v′)| ≤ dim C_J ≤ κj(C)

as claimed. Now let v, v′ be arbitrary vertices in Vi and V_{i+j}, respectively. We will distinguish
between two cases. If there is no path from v to v′ in T, then P(v, v′) = ∅ and (151) holds vac-
uously. Otherwise, let x = (x1, x2, ..., xn) be a codeword of C such that (x_{i+1}, x_{i+2}, ..., x_{i+j})
corresponds to a path from v to v′ in T. A trellis T′ for C − x may be obtained from the trellis
T = (V, E, IFq) by subtracting xi from the label of each edge in Ei, for all i = 1, 2, ..., n.
It is obvious that this does not alter the structure of the trellis, and in particular the number
of paths from v to v′. It is also obvious that the vertices v and v′ lie on the all-zero path
in T′. In other words, the labels of the paths in P(v, v′) correspond to a coset of C_J, provided
there is a path from v ∈ Vi to v′ ∈ V_{i+j}. Hence logq |P(v, v′)| ≤ κj(C − x) = κj(C) as before,
where the second equality follows by the linearity of C.
By the definition of P(v; j) and P(v, v′), we have P(v; j) = ∪_{v′ ∈ V_{i+j}} P(v, v′). Thus the fore-
going lemma establishes an upper bound on the total number of paths in P(v; j) in terms of
the dimension-length profile of C and the number of vertices at time i+j, as follows:

    |P(v; j)| = Σ_{v′ ∈ V_{i+j}} |P(v, v′)| ≤ |V_{i+j}| · q^{κj(C)}        (152)

On the other hand, one can also relate |P(v; j)| to the dimensions of the future subcodes of C
at times i and i+j. We have shown in Theorem 4.19 that the out-degree of each vertex v ∈ Vi
is given by degout(v) = q^{fi − f_{i+1}} for all i = 0, 1, ..., n−1. Therefore

    |P(v; j)| = Π_{l=i}^{i+j−1} q^{f_l − f_{l+1}} = q^{fi − f_{i+j}}        (153)

Using these results, we can now prove the following lower bound on the state complexity of
linear codes. Later on, we shall see that a similar bound holds for nonlinear codes as well.
Theorem 5.15. Let C be an (n, k, d) linear code over IFq. Then, under all permutations of
the time axis, the state complexity of the minimal trellis for C is lower bounded by:

    s ≥ ⌈ ( k − κ_{l1}(C) − κ_{l2}(C) − ··· − κ_{lL}(C) ) / (L − 1) ⌉        (154)

where κ1(C), κ2(C), ..., κn(C) is the dimension-length profile of C, and l1, l2, ..., lL is any
set of positive integers such that l1 + l2 + ··· + lL = n.

Proof. We first observe that the enumeration of paths in P(v; j) obtained in (152)
and (153) produces a lower bound on the number of vertices at time i+j in T, as follows:

    s_{i+j} = logq |V_{i+j}| ≥ logq |P(v; j)| − κj(C) = fi − f_{i+j} − κj(C)        (155)

Next, we partition the minimal trellis T into L sections of lengths l1, l2, ..., lL, respectively.
Let hj = l1 + l2 + ··· + lj denote the boundary of the j-th section in this partition. Set h0 = 0
by convention. Then (155) implies that

    s_{h1} + s_{h2} + ··· + s_{hL} ≥ Σ_{j=1}^{L} ( f_{h_{j−1}} − f_{h_j} ) − Σ_{j=1}^{L} κ_{lj}(C)        (156)

Recall that at time i = 0 the entire code lies in the future, while at time i = n the entire
code lies in the past: in other words, f0 = k and fn = 0. But the first sum on the right-hand
side of (156) telescopes precisely to the difference f0 − fn = k. It follows that the expression
on the right-hand side of (156) can be rewritten as:

    s_{h1} + s_{h2} + ··· + s_{hL} ≥ k − κ_{l1}(C) − κ_{l2}(C) − ··· − κ_{lL}(C)        (157)

Furthermore, if T is the minimal trellis for C, then Vn consists of the single toor vertex,
and hence s_{hL} = sn = 0. Thus the summation on the left-hand side of (157) contains at most
L − 1 nonzero terms. To complete the proof, it remains to observe that s ≥ s_{hj} for all j.
It is easy to see that the DLP bound of Theorem 5.9 is a special case of Theorem 5.15, obtained
by setting L = 2 and then maximizing over all partitions of the type l1 = i and l2 = n − i.
The span bound s ≥ ⌈R(d − 1)⌉ is also a special case of Theorem 5.15, obtained by taking
L = ⌈n/(d − 1)⌉ and letting l1, l2, ..., lL ≤ d − 1. In this case, we have:

    κ_{l1}(C) = κ_{l2}(C) = ··· = κ_{lL}(C) = 0

and Theorem 5.15 reduces to Theorem 5.13. For many codes, Theorem 5.15 is stronger than
both the span bound and the DLP bound, as the following example demonstrates.
Example 5.6. Consider the (64, 39, 10) BCH code. The DLP of this code is not known, and
the dual distance is d⊥ = 8. In this case, the DLP and Muder bounds coincide, giving:

    s ≥ 39 − K(31, 10) − K(33, 10) ≥ 39 − (13 + 14) = 12

On the other hand, partitioning the time axis for this code into three sections of lengths
l1 = 22, l2 = 20, and l3 = 22, we conclude from Theorem 5.15 and the table of [13] that:

    s ≥ ⌈ (39 − K(22, 10) − K(20, 10) − K(22, 10)) / 2 ⌉ = ⌈ (39 − 5 − 4 − 5) / 2 ⌉ = 13

As another example, consider the (127, 85, 13) BCH code. Using the table of [13] and (145)
to evaluate the DLP bound for this code, we get s ≥ 14. On the other hand, substituting
the partition l1 = l2 = l3 = 30 and l4 = 37 into Theorem 5.15 produces

    s ≥ ⌈ (85 − 3K(30, 13) − K(37, 13)) / 3 ⌉ ≥ ⌈ (85 − 3·6 − 12) / 3 ⌉ = 19

Notice that a partition of the time axis into four sections is necessary in this case: all
partitions with L ≠ 4 yield bounds that are below 19. ◊
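
Evaluating (154) for a given partition requires nothing more than upper bounds on the
κ_{lj}(C). The sketch below (ours, not from the chapter) reproduces the two bounds of
Example 5.6, using the values of K(·, d) quoted above from the table of [13].

    from math import ceil

    def partition_bound(k, kappa_bounds):
        """Theorem 5.15: s >= ceil((k - sum of kappa_{l_j}(C)) / (L - 1)),
        given upper bounds on kappa_{l_j}(C) for the L sections."""
        L = len(kappa_bounds)
        return ceil((k - sum(kappa_bounds)) / (L - 1))

    print(partition_bound(39, [5, 4, 5]))       # 13 for the (64,39,10) BCH code
    print(partition_bound(85, [6, 6, 6, 12]))   # 19 for the (127,85,13) BCH code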
In general, one is interested in finding a partition l1, l2, ..., lL of the time axis that yields
the highest lower bound on s when substituted in (154). Since the total number of possible
partitions is 2^{n−1}, it might appear that a complete evaluation of (154) requires a search pro-
cedure whose complexity is exponential in n. However, we shall see in the next section that
this is not so: the `best' partition of the time axis can be easily found in time O(n³).
This observation makes it possible to automate the evaluation of the lower bound of Theo-
rem 5.15, given the code parameters n, k, and d. This is precisely what we have done, and
our programs may be accessed by electronic mail at trellis@golay.csl.uiuc.edu. In par-
ticular, we have applied the lower bound of Theorem 5.15 to all the 8128 best-known binary
linear codes of length ≤ 128 in the table of Brouwer [12]. The resulting table is currently
available via anonymous ftp at ftp.csl.uiuc.edu:/pub/trellis/table-s.gz.

                                      Bounds on s
    Code                   DLP bound   Partition l1, ..., lL                LV bound
     1.  Hamming [9,3,4]*      1       3, 3, 3          (0, 0, 0)               2
     2.  Hamming [13,7,4]      2       5, 3, 5          (1, 0, 1)               3
     3.  Hamming [41,32,4]     3       9, 9, 5, 9, 9    (4, 4, 1, 4, 4)         4
     4.  BCH [64,39,10]       12       22, 20, 22       (5, 4, 5)              13
     5.  BCH [70,45,9]        11       24, 22, 24       (7, 6, 7)              13
     6.  BCH [73,46,10]*      11       25, 23, 25       (7, 6, 7)              13
     7.  BCH [76,44,11]       11       27, 22, 27       (7, 4, 7)              13
     8.  BCH [76,50,9]        10       24, 28, 24       (7, 11, 7)             13
     9.  Goppa [97,62,12]*    12       28, 41, 28       (7, 19, 7)             15
    10.  Goppa [105,56,16]    13       34, 37, 34       (6, 8, 6)              18
    11.  Goppa [109,61,14]*    7       26, 26, 26, 31   (3, 3, 3, 6)           16
    12.  BCH [127,57,23]      22       42, 43, 42       (3, 3, 3)              24
    13.  BCH [127,64,21]      22       40, 45, 42       (3, 6, 4)              26
    14.  BCH [127,71,19]      21       43, 41, 43       (7, 6, 7)              26
    15.  BCH [127,78,15]      14       40, 42, 45       (11, 13, 15)           20
    16.  BCH [127,85,13]      14       30, 30, 30, 37   (6, 6, 6, 12)          19
    17.  BCH [127,92,11]      13       42, 43, 42       (20, 21, 20)           16
    18.  BCH [127,99,9]       12       24, 32, 47, 24   (7, 14, 28, 7)         15

    Table 4. Lower bounds on state complexity for some binary linear codes

Herein, we provide a small representative table of lower bounds on s for 18 codes selected
from the table of [13]. The values listed in parentheses next to the section lengths li represent
upper bounds on κ_{li}(C), which are also deduced from the table of [13]. The asterisk
denotes shortening.
The argument of Theorem 5.15 can be easily modified to produce a lower bound on the edge
complexity b of the minimal trellis. It is obvious that for linear codes b = s or b = s + 1, and
any bound on s is also a bound on b. The next theorem, however, gives a lower bound on b
which is often tighter than the trivial statement b ≥ s.

Theorem 5.16. Let C be an (n, k, d) linear code over IFq. Then, under all permutations of
the time axis, the edge complexity of the minimal trellis for C is lower bounded by:

    b ≥ ⌈ ( k − κ_{l1}(C) − κ_{l2}(C) − ··· − κ_{lL}(C) ) / (L − 1) ⌉        (158)

where κ1(C), κ2(C), ..., κn(C) is the dimension-length profile of C, and l1, l2, ..., lL is any
set of positive integers such that l1 + l2 + ··· + lL = n − L + 1.

The proof of Theorem 5.16 is similar to that of Theorem 5.15, and is omitted. We refer the
reader to Lafourcade and Vardy [74] for a detailed proof. The lower bound of Theorem 5.16
has also been evaluated for all the 8128 best-known binary linear codes of length ≤ 128
in the table of Brouwer [12], and the results are currently available via anonymous ftp at
ftp.csl.uiuc.edu:/pub/trellis/table-b.gz.

Some of the aforementioned lower bounds on s and b can be converted into bounds on the
total number of vertices |V| and the total number of edges |E| in the minimal trellis. For
instance, the DLP bounds (135) and (136) of Theorem 5.9 immediately imply that:

    |V| ≥ Σ_{i=0}^{n} q^{k − κi(C) − κ_{n−i}(C)}        (159)

    |E| ≥ Σ_{i=1}^{n} q^{k − κ_{i−1}(C) − κ_{n−i}(C)}        (160)
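
Given a DLP, or an upper bound on it, the sums (159) and (160) are immediate to compute.
The sketch below (ours, not from the chapter) evaluates them for the Golay code G24, whose
exact DLP was established in Example 5.5; since the profile (124) attains the DLP bound at
every time, these counts should be exact for the minimal trellis of G24 in the MOG order.

    def vertex_edge_bounds(kappa, q=2):
        """The DLP bounds (159) and (160) on |V| and |E|."""
        n, k = len(kappa) - 1, kappa[-1]
        V = sum(q ** (k - kappa[i] - kappa[n - i]) for i in range(n + 1))
        E = sum(q ** (k - kappa[i - 1] - kappa[n - i]) for i in range(1, n + 1))
        return V, E

    kappa_G24 = [0,0,0,0,0,0,0,0,1,1,1,1,2,2,3,4,5,5,6,7,8,9,10,11,12]
    print(vertex_edge_bounds(kappa_G24))   # (2686, 3452)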
Theorems 5.15 and 5.16 often lead to tighter lower bounds on |V| and |E|, respectively. The
derivation of these bounds requires some explanation. Let J = {j1, j2, ..., jL} be a subset
of I = {1, 2, ..., n}. Assume w.l.o.g. that j1 < j2 < ··· < jL and define the functions:

    Fs(J; C) := k − κ_{j1}(C) − κ_{j2−j1}(C) − ··· − κ_{jL−j_{L−1}}(C) − κ_{n−jL}(C)
    Fb(J; C) := k − κ_{j1−1}(C) − κ_{j2−j1−1}(C) − ··· − κ_{jL−j_{L−1}−1}(C) − κ_{n−jL}(C)

We have shown in (157) that for every subset J ⊆ I, the corresponding values of the state-
complexity profile satisfy the linear constraint s_{j1} + s_{j2} + ··· + s_{jL} ≥ Fs(J; C). We can
therefore set up a nonlinear integer programming problem with linear constraints, as follows:

    Minimize  V(s0, s1, ..., sn) := q^{s0} + q^{s1} + ··· + q^{sn}        (161)
    subject to  Σ_{j∈J} sj ≥ Fs(J; C)  for all J ⊆ I

This problem may be solved using standard (nonlinear) constrained optimization techniques;
see, for instance, Bertsekas [9] and references therein. In many cases, most of the constraints
in (161) are redundant, and the optimal solution can be found using elementary methods
described in [74]. We shall see an example of this situation shortly.
Likewise, it is shown in [74] that for every subset J ⊆ I, the corresponding values of the edge-
complexity profile b1, b2, ..., bn satisfy the linear constraint b_{j1} + b_{j2} + ··· + b_{jL} ≥ Fb(J; C).
This leads to a similar integer programming problem with linear constraints:

    Minimize  E(b1, b2, ..., bn) := q^{b1} + q^{b2} + ··· + q^{bn}        (162)
    subject to  Σ_{j∈J} bj ≥ Fb(J; C)  for all J ⊆ I
It is easy to see that the optimal solutions to the two problems that we have set up constitute
lower bounds on |V| and |E|, respectively. Thus we have the following theorem.

Theorem 5.17. Let C be a linear code of length n over IFq. Then the total number of vertices
and the total number of edges in the minimal trellis for C are lower bounded by:

    |V| ≥ V*(s0, s1, ..., sn)
    |E| ≥ E*(b1, b2, ..., bn)

under all permutations of the time axis, where V*(s0, s1, ..., sn) and E*(b1, b2, ..., bn) denote
the optimal solutions to the minimization problems (161) and (162), respectively.
Notice that the DLP bounds (159) and (160) are essentially special cases of Theorem 5.17,
which result by retaining only the constraints of the type J = {j} in problems (161) and (162),
respectively. The definitions of Fs(J; C) and Fb(J; C) then reduce to:

    Fs(j; C) = k − κj(C) − κ_{n−j}(C)        (163)
    Fb(j; C) = k − κ_{j−1}(C) − κ_{n−j}(C)        (164)

and, since these n constraints involve disjoint variables, it is obvious that the optimal
solutions to (161) and (162) are given by the right-hand sides of (159) and (160), respectively.
In general, however, we have 2^n different constraints, so that complete evaluation of the lower
bounds of Theorem 5.17 appears to be computationally intractable. Nevertheless, most of
these 2^n constraints are either redundant or do not improve upon the DLP bound, which
usually leaves only a small number of `useful' constraints in addition to (163) and (164).
Example 5.7. Consider again the (64, 39, 10) binary BCH code C, with dual distance d⊥ = 8.
We will show how equation (160) and Theorem 5.17 can be used to establish a lower bound
on the number of edges |E| in the minimal trellis for C. The DLP bound (136) on the
edge-complexity profile of the minimal trellis is given by:

    bi ≥ 39 − κ_{i−1}(C) − κ_{64−i}(C)    for i = 1, 2, ..., 64        (165)

Thus, using equation (145) to estimate the dimension-length profile of the (64, 39, 10) BCH
code, we conclude from (160) and (165) that |E| ≥ 161,020. On the other hand, a simple com-
puter search produces 324 additional useful constraints for the minimization problem (162),
corresponding to partitions of the time axis into 3 and 4 sections. All the other constraints
in (162) are subsumed by (165). Most of these 324 constraints are redundant, so that they
can be further reduced to the following system of only 20 inequalities:

    b23 + b43 ≥ 26      b39 + b19 ≥ 26      b38 + b15 ≥ 25
    b24 + b44 ≥ 26      b23 + b49 ≥ 25      b39 + b16 ≥ 25
    b25 + b45 ≥ 26      b24 + b50 ≥ 25      b20 + b40 ≥ 26
    b26 + b46 ≥ 26      b25 + b51 ≥ 24      b21 + b41 ≥ 26        (166)
    b27 + b47 ≥ 26      b26 + b52 ≥ 24      b22 + b42 ≥ 26
    b37 + b17 ≥ 25      b27 + b53 ≥ 23      b28 + b48 ≥ 25
    b38 + b18 ≥ 26      b37 + b14 ≥ 24

This system, augmented by the 64 inequalities in (165), can be easily solved using elementary
techniques described in [74]. The solution produces the values of b1, b2, ..., b64 that satisfy
the constraints of (166) and (165), while minimizing the objective function:

    E(b1, b2, ..., b64) = 2^{b1} + 2^{b2} + ··· + 2^{b64}

These values of b1, b2, ..., b64 are listed below, with the entries exceeding the DLP bound
of (165) set in boldface:

    i :    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    bi =   1  2  3  4  5  6  7  7  8  9  9  9 10 11 12 12 12 13 13 13 13 13 13 13

    i :   25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    bi =  13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 13 13 13        (167)

    i :   45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
    bi =  13 13 13 12 12 12 11 11 10  9  9  8  7  7  6  5  4  3  2  1

Notice that these values do not constitute lower bounds on the edge-complexity profile of
the minimal trellis. For example, there exists a time axis for C such that b14 = 10, despite
the fact that the corresponding entry in (167) is 11. Nevertheless, Theorem 5.17 allows us to
deduce from (167) a lower bound on the total number of edges in the minimal trellis, under
all permutations of the time axis. The resulting bound is |E| ≥ 274,172. ◊

All the lower bounds discussed so far pertain to linear codes. Indeed, the proofs of Theorems
5.9, 5.13, and 5.15 make essential use of linearity. Nevertheless, we will now show that
most of these results extend to nonlinear codes as well. This extension will furthermore
establish interesting relations between the trellis structure of linear and nonlinear codes and
information-theoretic measures, such as entropy and mutual information. Here is a typical
example: the logarithm of the number of vertices at time i in any trellis T for C cannot
be smaller than the mutual information between the past and the future at time i, under
a uniform probability distribution on the codewords of C. See [82, 92] for a proof of this result.
Our first goal is to generalize the notion of dimension-length profiles to nonlinear codes. In
what follows, we describe two different ways to do so. One simple generalization of this type,
introduced in [74], is known as the cardinality-length profile, or CLP.

Definition 5.3. Let C be a code of length n over an alphabet A of size q. We define γi(C)
as the logarithm of the cardinality of the largest subcode of C of support size i, namely:

    γi(C) := max_D logq |D|    for i = 1, 2, ..., n        (168)

where the maximum is taken over all subcodes D ⊆ C such that |σ(D)| = i. The sequence
γ1(C), γ2(C), ..., γn(C) is called the cardinality-length profile of C.
It is obvious that the cardinality-length profile reduces to the DLP if C is a linear code. In
general, however, the alphabet A in Definition 5.3 need not have a group structure or contain
a special zero element. Thus the notion of support σ(·) in Definition 5.3 is defined in terms
of variation (i ∈ σ(C) if there exist c, c′ ∈ C with ci ≠ c′i), exactly as at the beginning of
this subsection. The following theorem employs cardinality-length profiles to generalize the
lower bound of Theorem 5.15 to nonlinear codes.
Theorem 5.18. Let C be a code of length n, with M codewords, over an alphabet of size q. Then the state complexity of any trellis T for C is lower bounded by:

    s = log_q V_max ≥ ( log_q M − μ_{l_1}(C) − μ_{l_2}(C) − ··· − μ_{l_L}(C) ) / (L − 1)

where μ_1(C), μ_2(C), ..., μ_n(C) is the cardinality-length profile of C, and l_1, l_2, ..., l_L is any set of positive integers such that l_1 + l_2 + ··· + l_L = n.
We refer the reader to Lafourcade and Vardy [74] for a detailed proof of this theorem. Herein, we observe that the corresponding generalizations of the DLP bound (Theorem 5.9) and the span bound (Theorem 5.13) can be obtained as special cases of Theorem 5.18.
Reuven and Be'ery [92], following upon an elliptic observation of McEliece [82, Theorem 4.5], developed a more interesting generalization of dimension-length profiles to nonlinear codes. To describe the results of [92], we first need to set up the appropriate notation. Given a code C of length n (linear or nonlinear) and a subset J = {i_1, i_2, ..., i_m} of the time axis I = {1, 2, ..., n} for C, we define the projection of a codeword x ∈ C on J by the mapping:

    x = (x_1, x_2, ..., x_n) ∈ C  ↦  P_J(x) := (x_{i_1}, x_{i_2}, ..., x_{i_m})

The image P_J(C) of the entire code under this mapping is called the projection of C on J. This generalizes the projections P_i, F_i of C on the past and future at time i, as defined in (37) and (38) respectively. Thus if J = {1, 2, ..., i} then P_i = P_J(C) and F_i = P_{I\J}(C), where I\J = {i+1, i+2, ..., n} denotes the complement of J in I.
Now suppose that C has M codewords. We will convert C into a uniform probability space by assigning each codeword a probability of 1/M. Thus our sample space Ω = {x_1, x_2, ..., x_M} is the set of all codewords of C, and the probability measure is p(x) = 1/M for all x ∈ Ω.
Given a subset J ⊆ I, we define a discrete random variable X_J which takes on values in the set P_J(C) with probabilities induced by the uniform distribution on the codewords of C. Thus if |J| ≤ d′ − 1, where d′ is the dual distance of C in the sense of [79, p. 139], then X_J is a uniformly distributed random variable. In general, however, X_J is not a uniform random variable. Given two subsets J_1, J_2 ⊆ I, one can straightforwardly deduce the joint probability mass function of the random variables X_{J_1}, X_{J_2}, or the conditional probability mass function of X_{J_1} given X_{J_2}, from the uniform probability measure on the codewords of C. In particular, the conditional distribution of X_J given X_{I\J} is well defined.
For general background on the information-theoretic concepts used below, we refer the reader to any textbook on information theory; see, for instance, Gallager [46]. Here, we briefly review the essential definitions. If X is a discrete random variable taking m values with probabilities p_1, p_2, ..., p_m, the entropy of X is defined as:

    H(X) := p_1 log(1/p_1) + p_2 log(1/p_2) + ··· + p_m log(1/p_m)        (169)

Given two discrete random variables X and Y, the conditional entropy H(X|Y) is defined in a similar fashion, using a weighted average of conditional probability distributions. The base of the logarithms in (169) is arbitrary in principle. However, when discussing codes over an alphabet of size q, we will always take all logarithms to base q.
Definition 5.4. Let C be a code of length n over an alphabet of size q. We define η_i(C) as the minimum possible entropy of i positions along the time axis for C, namely:

    η_i(C) := min_J H(X_J)    for i = 1, 2, ..., n        (170)

where the minimum is taken over all subsets J ⊆ {1, 2, ..., n} with |J| = i. The sequence η_1(C), η_2(C), ..., η_n(C) is called the entropy-length profile of C.
Definition 5.5. Let C be a code of length n over an alphabet of size q. We define ξ_i(C) as the maximum conditional entropy of any i positions given the other n−i positions, namely:

    ξ_i(C) := max_J H(X_J | X_{I\J})    for i = 1, 2, ..., n        (171)

where the maximum is taken over all subsets J ⊆ {1, 2, ..., n} with |J| = i. The sequence ξ_1(C), ξ_2(C), ..., ξ_n(C) is called the conditional entropy-length profile of C.
It is not immediately clear what the entropy-length profile and the conditional entropy-length profile, as defined above, have to do with the dimension-length profiles as defined in (132). However, Reuven and Be'ery [92] establish the following result.

Proposition 5.19. For a linear code C, the conditional entropy-length profile reduces to the DLP of C, and the entropy-length profile reduces to the inverse DLP of C.

Proposition 5.19 is interesting for several reasons. First, this result shows that the conditional entropy-length profile is a natural generalization of the notion of DLP to nonlinear codes. This generalization is essentially different from the cardinality-length profile defined by (168). The following example illustrates this point. Using this example, we will also establish a number of general relations between the CLP, the ELP, and the conditional ELP.
Example 5.8. To demonstrate the differences between the CLP and the ELP, let us consider two nonlinear binary codes C1 = {0000, 0110, 1100, 1111} and C2 = {0000, 0010, 1100, 1111}. The CLP and the ELP, conditional and otherwise, of the two codes can be easily determined by inspection. For instance, the cardinality-length profiles of C1 and C2 are given by:

    i :        1   2   3        4
    μ_i(C1) =  0   1   log2 3   2
    μ_i(C2) =  1   1   log2 3   2

We see that the cardinality-length profiles of the two codes are very similar. For example, since μ_2(C1) = μ_2(C2) = 1, lower bounds based on the CLP would predict the same number of vertices at time i = 2 in both cases. The ELP and the conditional ELP, given by:

    i :        1                 2    3    4
    η_i(C1) =  2 − (3/4) log2 3  3/2  2    2
    η_i(C2) =  2 − (3/4) log2 3  1    3/2  2

    i :        1    2    3               4
    ξ_i(C1) =  0    1/2  (3/4) log2 3    2
    ξ_i(C2) =  1/2  1    (3/4) log2 3    2

are considerably more informative. In particular, both the ELP and the conditional ELP distinguish between the two codes at time i = 2. Also observe that for both codes:

    ξ_i(C) ≤ μ_i(C)    for i = 1, 2, ..., n        (172)

Referring to Definition 5.3 and Definition 5.5, it is not difficult to see that this must be true for any code. Another interesting observation is that

    ξ_i(C) + η_{n−i}(C) = log_q |C|    for i = 1, 2, ..., n

for both codes. Again, it is easy to see from (170), (171), and the well-known [46] properties of the entropy function that this must be true in general. Thus the ELP and the conditional ELP sequences determine each other. In general, if C is a nonlinear code, then each one of these sequences contains 'more information' than the cardinality-length profile of C. }
Another conclusion from Proposition 5.19 is that the various bounds that we have already established for linear codes in terms of the DLP have an interesting information-theoretic interpretation. This was first observed by McEliece [82] in a somewhat different context. Reuven and Be'ery [92] use the ELP to extend most of the known DLP bounds to nonlinear codes. Herein, we present just two of the main results of [92], both without proof.
Theorem 5.20. Let C be a code of length n, with M codewords, over an alphabet of size q. Then the state complexity of any trellis T for C is lower bounded by:

    s = log_q V_max ≥ log_q M − min_{i∈I} { ξ_i(C) + ξ_{n−i}(C) }

Moreover, for all i = 1, 2, ..., n and under all permutations of the time axis, the number of vertices at time i in T is lower bounded by:

    log_q |V_i| ≥ log_q M − ξ_i(C) − ξ_{n−i}(C) = η_i(C) + η_{n−i}(C) − log_q M

where η_1(C), η_2(C), ..., η_n(C) is the entropy-length profile of C, and ξ_1(C), ξ_2(C), ..., ξ_n(C) is the conditional entropy-length profile of C.
The foregoing theorem generalizes the DLP bounds of Theorem 5.9 in terms of entropy-length profiles. The following theorem is the corresponding generalization of Theorem 5.15.
Theorem 5.21. Let C be a code of length n, with M codewords, over an alphabet of size q. Then the state complexity of any trellis T for C is lower bounded by:

    s = log_q V_max ≥ ( log_q M − ξ_{l_1}(C) − ξ_{l_2}(C) − ··· − ξ_{l_L}(C) ) / (L − 1)

where ξ_1(C), ξ_2(C), ..., ξ_n(C) is the conditional entropy-length profile of C, and l_1, l_2, ..., l_L is any set of positive integers such that l_1 + l_2 + ··· + l_L = n.
The foregoing lower bounds on trellis complexity of nonlinear codes are not smaller than the corresponding CLP bounds. This follows immediately from the fact that ξ_i(C) ≤ μ_i(C) for all i, as we have observed in (172). In fact, the ELP bounds are often stronger than the CLP bounds. Reuven and Be'ery [92] attribute this to the fact that the entropy-length profile captures the inherent asymmetry of a nonlinear code, while the CLP does not.
Example 5.9. The Nordstrom-Robinson code N16 is a remarkable nonlinear binary code of length 16 with 256 codewords [79, p. 73]. This code is rectangular under all permutations of the time axis [98]. The cardinality-length profile of N16 was computed in [74], while the (conditional) entropy-length profile of N16 was found in [92]. These profiles are given by:

    i :         1  2  3  4  5   6    7   8     9               10                11 12 13 14 15 16
    μ_i(N16) =  0  0  0  0  0   1    1   1   log2 3        1 + log2 3            3  4  5  6  7  8
    ξ_i(N16) =  0  0  0  0  0  1/4  1/4  1  (3/4) log2 3   1 + (3/4) log2 3      3  4  5  6  7  8

We see that the conditional ELP improves upon the CLP in four cases. The resulting lower bounds on the state-cardinality profile of the Nordstrom-Robinson code are given by:

    i :       0  1  2  3   4   5   6   7   8   9  10  11  12  13  14  15  16
    |V_i| ≥   1  2  4  8  16  32  48  95  64  95  48  32  16   8   4   2   1        (173)

All these bounds follow directly from Theorem 5.20. Using the distance spectrum of N16, it is possible to improve the lower bound on |V_i| to 96 vertices at times i = 7 and i = 9. See [74] for a proof of this result. No further componentwise improvements on (173) are possible. Forney [38] and Vardy [74, 110] construct a trellis for N16 that attains the lower bound of (173) at all times except i = 6, 7, 9, 10. On the other hand, Reuven and Be'ery [92] construct another trellis for N16, corresponding to a different permutation of the time axis, which attains the bound of (173) at all times except i = 7, 8, 9, and has 96 vertices at times i = 7 and i = 9. However, the maximum number of vertices in both trellises is 128, at time i = 8 in the trellis of [92] and at times i = 7, 9 in the trellis of [38, 74]. It is currently not known whether this is the best possible state complexity for N16 under all permutations of the time axis. The total number of edges is |E| = 764 in both trellises, and we conjecture that this is the best possible. It is shown in [74] that |E| ≥ 700 in any trellis for N16. }
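As a sanity check, the entries of (173) can be reproduced from the conditional ELP of N16 in a few lines; this sketch (ours) simply evaluates the second bound of Theorem 5.20 and rounds up.

    from math import log2, ceil

    log2M = 8                       # N16 has 256 codewords
    l3 = log2(3)
    # conditional ELP of N16 as listed above, padded so that xi[i] = xi_i, xi_0 = 0
    xi = [0, 0, 0, 0, 0, 0, 0.25, 0.25, 1, 0.75*l3, 1 + 0.75*l3, 3, 4, 5, 6, 7, 8]

    # Theorem 5.20: log2 |V_i| >= log2 M - xi_i - xi_{n-i}
    print([ceil(2 ** (log2M - xi[i] - xi[16 - i]) - 1e-9) for i in range(17)])
    # [1, 2, 4, 8, 16, 32, 48, 95, 64, 95, 48, 32, 16, 8, 4, 2, 1]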
This concludes our discussion of lower bounds on trellis complexity of linear and nonlinear codes. We refer the reader to [39, 62, 73, 74, 92, 96, 127, 128] for more details on this subject.
5.4. Table of bounds for short codes

Petra Schuurman [96] compiled a table of upper and lower bounds on the state complexity of minimal trellises for binary linear codes of length n ≤ 24. The table of Schuurman [96] is included in this subsection as Table 5, in a slightly different format.
Ideally, it would be nice to have a three-dimensional table, in a format similar to that of the tables of Brouwer [12, 13]. For each fixed length n, dimension k, and minimum distance d, the table should specify the best known upper and lower bounds on the smallest possible state complexity s of a trellis for an (n, k, d) binary linear code, under all permutations of the time axis. Alternatively, one could fix any three of the four parameters n, k, d, s and provide upper and lower bounds on the remaining parameter.
Unfortunately, it is not possible to print a three-dimensional table. Hence we fix only two parameters: length n and dimension k. The values of n and k are thus used to index the entries in Table 5. These entries consist of several ordered pairs of positive integers. Each pair s,d listed in row n and column k of the table has the following meaning.

Condition A: There exists an (n, k, d) binary linear code C and a permutation of the time axis for C, such that the state complexity of the resulting minimal trellis for C is s.

For instance, the pair 9,8 listed in row n = 24 and column k = 12 means that there exists a (24, 12, 8) binary linear code C and a minimal trellis for C whose state complexity is s = 9. Indeed, this is the binary Golay code G24 in the componentwise optimal order given by (123). As we are interested in codes that have minimum distance as large as possible and state complexity as small as possible, we do not list every pair of integers that satisfies Condition A. Instead, we compile only those pairs s,d which also satisfy the following two conditions.
Condition B: For every (n, k, d) binary linear code C, the state complexity of the minimal trellis for C is at least s, under all permutations of the time axis.

Condition C: If there exists an (n, k, d′) binary linear code C′ and a minimal trellis for C′ whose state complexity is s, then d′ ≤ d.
For instance, the pair 5,6 listed in row n = 16 and column k = 7 implies that for every (16, 7, 6) binary linear code, the state complexity of the minimal trellis is at least s ≥ 5. This fact was established in Example 5.4 of the previous subsection, where we observed that the distance bound on the DLP of the (16, 7, 6) lexicode L16 implies that s ≥ 5. Since the distance bound on the DLP depends only on n, k, and d, this must be true for any (16, 7, 6) code. The same listing 5,6 also implies that if there exists a (16, 7, d) binary linear code C and a permutation of the time axis for C such that the state complexity of the resulting minimal trellis is s = 5, then the minimum distance of C is at most d ≤ 6. The latter implication is trivial in this case, since the (16, 7, 6) lexicode L16 is known to be optimal.

Finally, we observe that in some cases the value of s that satisfies Conditions A, B, and C for a given n, k, and d is not known exactly. In such cases we list upper and lower bounds on this value of s, separated by a hyphen. For example, the entry 7-9,6 in row n = 24 and column k = 14 means that the best possible state complexity of the minimal trellis for a (24, 14, 6) binary linear code (the Wagner code) is at least 7 and at most 9.
n\k     1      2      3      4      5      6      7      8      9      10

 2      1,2
 3      1,3    1,2
 4      1,4    1,2    1,2
 5      1,5    1,3    1,2    1,2
 6      1,6    1,3    1,2    1,2    1,2
        2,4    2,3
 7      1,7    1,4    1,3    1,2    1,2    1,2
        3,4    3,3
 8      1,8    1,4    1,3    1,2    1,2    1,2    1,2
        2,5    2,4    2,3
        3,4
 9      1,9    1,5    1,3    1,3    1,2    1,2    1,2    1,2
        2,6    2,4    3,4    2,3
10      1,10   1,5    1,4    1,3    1,2    1,2    1,2    1,2    1,2
        2,6    2,5    2,4    2,3    3,3
        3,4
11      1,11   1,6    1,4    1,3    1,3    1,2    1,2    1,2    1,2    1,2
        2,7    2,5    2,4    3,4    2,3    3,3
        3,6    3,5    3,4
12      1,12   1,6    1,4    1,3    1,3    1,2    1,2    1,2    1,2    1,2
        2,8    2,6    2,4    2,4    2,3    2,3    3,3
        3,5    3,4    3,4
        4,6
13      1,13   1,7    1,5    1,4    1,3    1,3    1,2    1,2    1,2    1,2
        2,8    2,6    2,5    2,4    3,4    2,3    3,3    4,3
        3,7    3,6    3,5    3,4    4,4
14      1,14   1,7    1,5    1,4    1,3    1,3    1,2    1,2    1,2    1,2
        2,9    2,7    2,5    2,4    2,4    2,3    2,3    3,3    4,3
        3,8    3,6    3,5    4,5    3,4    3,4    4,4
        4,7    4,6
15      1,15   1,8    1,5    1,4    1,3    1,3    1,3    1,2    1,2    1,2
        2,10   2,7    2,6    2,5    2,4    3,4    2,3    2,3    3,3
        3,8    3,7    3,6    3,5    4,5    3,4    3,4    4,4
        4,8    4,7    4,6
16      1,16   1,8    1,6    1,4    1,4    1,3    1,3    1,2    1,2    1,2
        2,10   2,8    2,6    2,5    2,4    2,4    2,3    2,3    3,4
        3,8    3,6    3,5    4,5    3,4    3,4
        4,8    4,6    5,6    5,5
17      1,17   1,9    1,6    1,5    1,4    1,3    1,3    1,3    1,2    1,2
        2,11   2,8    2,6    2,5    2,4    2,4    3,4    2,3    2,3
        3,9    3,8    3,6    3,6    3,5    4,5    3,4    3,4
        4,8    5,7    4,6    5,6    5,5

Table 5. Bounds on the state complexity of short binary linear codes
n\k     11     12     13     14     15     16     17     18     19     20     21     22     23

12      1,2
13      1,2    1,2
14      1,2    1,2    1,2
15      1,2    1,2    1,2    1,2
        4,3
16      1,2    1,2    1,2    1,2    1,2
        3,3
        4,4
17      1,2    1,2    1,2    1,2    1,2    1,2
        3,3    3,3
        4,4
18      1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,3    3,3    4,3
        3,4    4,4
19      1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,3    3,4    3,3    4,3
        3,4    4,4
20      1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,3    2,3    3,4    3,3    4,3
        3,4    3,4    4,4
        5-9,5
21      1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,3    2,3    2,3    3,3    3,3    4,3
        3,4    3,4    3,4    4,4    4,4
        5-7,5  6-9,5
        6-9,6
22      1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,3    2,3    2,3    3,4    3,3    3,3    4,3
        3,4    3,4    3,4    4,4    4,4
        4-6,5  5-8,5  6-9,5
        5-8,6  6-9,6
        8-9,7
23      1,3    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        3,4    2,3    2,3    2,3    3,4    3,3    4,4    4,3
        4-5,5  3,4    3,4    3,4    4,4
        5-7,6  4-7,5  5-8,5  6-9,5
        7-8,7  6-8,6  7-9,6
        9,8    9,7
24      1,3    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2    1,2
        2,4    2,3    2,3    2,3    2,3    3,4    3,3    4,3    4,3
        3-5,5  3,4    3,4    3,4    3,4    4,4    4,4
        4-6,6  4-7,5  5-8,5  6-9,5
        7-8,7  5-7,6  6-8,6  7-9,6
        8,8    8-9,7
        9,8

Table 5. Bounds on the state complexity of short binary linear codes (continued)
n\k     1      2      3      4      5      6      7      8      9      10

18      1,18   1,9    1,6    1,5    1,4    1,3    1,3    1,3    1,2    1,2
        2,12   2,9    2,7    2,6    2,5    2,4    2,4    2,3    2,3
        3,10   3,8    3,7    3,6    3,5    4,5    5,5    3,4
        4,8    4,7    4,6    5,6    6,6
        5,8    6,7
19      1,19   1,10   1,7    1,5    1,4    1,4    1,3    1,3    1,3    1,2
        2,12   2,9    2,7    2,6    2,5    2,4    2,4    3,4    2,3
        3,10   3,9    3,7    3,6    3,6    3,5    4-5,5  3,4
        4,8    4,7    5,7    4,6    5-6,6  5-6,5
        5,8    6,8    6,7
20      1,20   1,10   1,7    1,5    1,4    1,4    1,3    1,3    1,3    1,2
        2,13   2,10   2,8    2,6    2,5    2,5    2,4    2,4    2,3
        3,11   3,9    3,8    3,6    3,6    3,5    3-4,5  3,4
        4,10   4,9    4,8    5,8    4,6    5,6    4-5,5
        6,8    7,7    6-7,6
21      1,21   1,11   1,7    1,6    1,5    1,4    1,3    1,3    1,3    1,3
        2,14   2,10   2,8    2,7    2,6    2,5    2,4    2,4    3,4
        3,12   3,10   3,8    3,7    3,6    3,6    3,5    4-5,5
        4,9    4,8    4,7    5-6,7  4,6    5-6,6
        5,10   5,8    6,8    6-7,7  8,7
        7,8
22      1,22   1,11   1,8    1,6    1,5    1,4    1,4    1,3    1,3    1,3
        2,14   2,11   2,8    2,7    2,6    2,5    2,4    2,4    2,4
        3,12   3,10   3,8    3,7    3,6    3,6    3,5    3-4,5
        4,11   4,10   4,8    4,7    5,7    3-4,6  4-6,6
        5,9    5,8    6,8    6-7,7  7-8,7
        7,8    8,8
23      1,23   1,12   1,8    1,6    1,5    1,4    1,4    1,3    1,3    1,3
        2,15   2,11   2,9    2,7    2,6    2,5    2,5    2,4    2,4
        3,12   3,11   3,9    3,8    3,7    3,6    3,6    3-4,5
        4,12   4,10   4-5,9  4-5,8  4-5,7  5-6,7  4,6
        5,11   5,10   5-7,9  5-6,8  6-7,8  6-7,7
        7-8,8
24      1,24   1,12   1,8    1,6    1,5    1,4    1,4    1,3    1,3    1,3
        2,16   2,12   2,9    2,8    2,6    2,6    2,5    2,4    2,4
        3,13   3,12   3,9    3,8    3,7    3,6    3,6    3,5
        4,10   4-5,9  4,8    4,7    5-6,7  4,6
        4-5,11 5,10   5-7,9  5,8    6,8    6-7,7
        5,12   6-7,10 7,8

Table 5. Bounds on the state complexity of short binary linear codes (continued)
Due to space limitations, we do not provide a list of references to the entries of Table 5. Some of these references may be found in the original table of [96]. Herein, we point out that all the bounds in Table 5 were obtained by Petra Schuurman, using the techniques described in this section as well as various other methods described in [96].
5.5. Asymptotic bounds on trellis complexity

We now investigate the asymptotic behavior of the upper and lower bounds on trellis complexity developed earlier in this section. In particular, we will be interested in the relative trellis state complexity σ = s/n as n → ∞. It is a simple but important observation that all the relative measures of trellis complexity coincide with σ as n → ∞.
Consider for instance the quantity b/n, where b = max_i log_q |E_i| is the edge complexity, as defined in (14). If T is a proper trellis for a code C over an alphabet of size q, then obviously s ≤ b ≤ s + 1. This is so because each vertex in T, except the toor, is the initial vertex for at least one edge but not more than q distinctly labeled edges. Thus

    σ ≤ b/n ≤ σ + 1/n

and the relative edge complexity coincides with the relative state complexity as n → ∞. As another example, consider the total number of vertices in the trellis. It is obvious that q^s ≤ |V| ≤ n q^s, and therefore

    σ ≤ (log_q |V|)/n ≤ σ + (log_q n)/n

Thus the relative measure of the total number of vertices in the trellis asymptotically reduces to the relative state complexity σ, defined in terms of the maximum number of vertices in the trellis. It is easy to see that the same is true for the total number of edges in the trellis, the expansion index, and the Viterbi decoding complexity: when appropriately normalized, all these measures of trellis complexity coincide with σ as n → ∞. Thus σ = s/n may be regarded as the single asymptotic measure of the complexity of a trellis.
The question now arises as to how the trellis complexity σ trades off against the conventional asymptotic parameters: rate R = log_q |C| / n and relative minimum distance δ = d/n. The following theorem provides a basis for this investigation. It shows that the number of vertices in the minimal trellis grows (exponentially) without bound with the length n in any asymptotically good sequence of codes.
Theorem 5.22. Let C_1, C_2, ... be an infinite sequence of distinct codes over an alphabet of size q, of length n_i, rate R_i, and minimum distance d_i, respectively. Let s be a fixed positive integer. If for all i = 1, 2, ... there exists a trellis for C_i with state complexity at most s, then either R_i → 0 or d_i/n_i → 0 as i → ∞.

Proof. This follows immediately from the span bound log_q V_max ≥ R(d − 1), established in Theorem 5.13 and Theorem 5.18. Suppose that lim inf_{i→∞} R_i = ρ > 0 and that the state complexity log_q V_max of the (minimal proper) trellis for C_i does not exceed s for all i. Then the span bound implies that d_i ≤ 1 + s/ρ for all i. Thus d_i is bounded by a constant, and therefore d_i/n_i → 0 as i, and hence also n_i, tend to infinity.
The proof of Theorem 5.22, based on the span bound, shows that the relative trellis complexity σ is strictly greater than zero for any asymptotically good sequence of codes with rate fixed at R and relative minimum distance d/n fixed at δ. It is somewhat surprising
that the asymptotic form of the DLP bound of Theorem 5.9 does not suffice to establish this fact, although the DLP bound is usually stronger than the span bound for short to moderate lengths. It is easy to see that the span bound asymptotically translates into:

    σ ≳ δR        (174)

for n → ∞, so that σ is always bounded away from zero if δR > 0. (The notation ≳ and ≲ is employed herein to denote inequalities that hold asymptotically for n → ∞; thus f(n) ≳ g(n) means f(n) ≥ g(n)(1 + o(1)), where o(1) is a function of n that tends to zero as n → ∞.) On the other hand, Zyablov and Sidorenko [128] derived the asymptotic form of the DLP bound, which coincides with the asymptotic form of the Muder bound of Theorem 5.8. Both bounds show that:

    σ ≳ R − R_max(2δ)        (175)

for n → ∞. Here R_max(δ) is the maximum possible asymptotic rate of codes with relative distance d/n = δ. At the time of writing, the best known upper bound on R_max(δ) is the McEliece-Rodemich-Rumsey-Welch [84] bound (or the JPL bound, in the terminology of [19, p. 247]), which for binary codes is given by:

    R_max(δ) ≤ min_{0≤u≤1−2δ} { 1 + H2(1/2 − (1/2)√(1 − u²)) − H2(1/2 − (1/2)√(1 − u² − 2δu − 2δ)) }        (176)

for 0 ≤ δ ≤ 0.5, where H2(x) is the binary entropy function. It is also known that R_max(δ) = 0 for δ ≥ 0.5, and R_max(δ) ≥ 1 − H2(δ) for binary codes. The former result is called the Plotkin bound, while the latter result is known [79, p. 557] as the Gilbert-Varshamov bound.
[Figure 18 here: plot of σ (0 to 0.5) versus R (0 to 1), showing the Wolf bound, the DLP bound, and the span bound.]

Figure 18. Asymptotic form of the span bound and the DLP bound for binary codes
The span bound (174) and the DLP bound (175) are plotted in Figure 18 for binary codes meeting the Gilbert-Varshamov bound. We have used (176) to evaluate the asymptotic DLP bound of (175). The problem with this bound is that for many asymptotically good codes R_max(2δ) ≥ 1 − H2(2δ) ≥ R. In this case, the lower bound of (175) reduces to the vacuous statement σ ≥ 0. This happens, for example, for the entire family of Justesen [58] codes.
On the other hand, the fact that R_max(2δ) = 0 if δ ≥ 0.25 for binary codes shows that the asymptotic DLP bound is often exact. Indeed, if R_max(2δ) = 0 then the lower bound of (175) reduces to σ ≳ R. But the Wolf bound s ≤ k implies that σ ≤ R for any linear code C. It is easy to see that σ ≤ R also for (the minimal proper trellis for) nonlinear codes. Thus it follows that the asymptotic DLP bound is exact in this case! For binary codes:

    σ ≃ R    if δ ≥ 0.25        (177)

This is precisely the asymptotic version of Theorem 5.11. Equation (177) shows that for binary codes, the upper bound s ≤ k cannot be improved upon by more than o(1) if d/n ≥ 0.25.
We now derive the asymptotic equivalent of Theorems 5.15 and 5.18. In doing so, we will restrict our attention to partitions of the time axis into sections of equal length. It is shown in [74] that such partitions are indeed asymptotically optimal, provided the function R_max(δ) is ∪-convex everywhere. The following theorem holds for both linear and nonlinear codes.
Theorem 5.23.

    σ ≳ ( R − R_max(Lδ) ) / (L − 1)    for all L = 2, 3, 4, ...

Proof. Let C be a code of length n with M codewords, and let μ_1(C), μ_2(C), ..., μ_n(C) be the cardinality-length profile of C. It follows from Theorem 5.18 that:

    σ ≥ log_q M / (n(L−1)) − (1/(n(L−1))) ∑_{j=1}^{L} μ_{l_j}(C) = ( R − (1/n) ∑_{j=1}^{L} μ_{l_j}(C) ) / (L − 1)

for all integers L ≥ 2 and all l_1 + l_2 + ··· + l_L = n. Thus it would suffice to exhibit a partition l_1, l_2, ..., l_L of the time axis for C, such that:

    (1/n) ∑_{j=1}^{L} μ_{l_j}(C) ≲ R_max(Lδ)        (178)

Denote λ = ⌊n/L⌋ and choose the section lengths so that l_j ≤ λ + 1 for all j = 1, 2, ..., L. Then, using the fact that μ_{l_j}(C) ≤ μ_{l_j − 1}(C) + 1, we have:

    (1/n) ∑_{j=1}^{L} μ_{l_j}(C) ≤ (1/n) ( L log_q A_q(λ, d) + L ) = (L/n) log_q A_q(λ, d) + L/n        (179)

where A_q(n, d) is the largest possible number of codewords in a code of length n and minimum distance d over an alphabet of size q. Evidently:

    (log_q A_q(λ, d)) / λ ≲ R_max(d/λ) ≤ R_max(Lδ)

where the second inequality follows from the fact that R_max(δ) is a nonincreasing function. Now, for any fixed L, the term L/n in (179) obviously vanishes as n → ∞. This establishes (178) and completes the proof of the theorem.
Theorem 5.23 produces a countably infinite family of lower bounds on σ. As might be expected, the DLP bound (175) and the span bound (174) are extreme members of this family, corresponding to L = 2 and L ≃ 1/δ respectively. In the latter case, Theorem 5.23 reduces to σ ≳ δ(R − R_max(1)) = δR. For binary codes, this bound can now be improved to

    σ ≳ 2δR        (180)

by taking L ≃ 1/2δ and observing again that R_max(u) = 0 for u ≥ 0.5 by the Plotkin bound. The relationship between the lower bounds on σ corresponding to L = 2, 3, 4, 5 and L ≃ 1/2δ is illustrated in Figure 19 for binary codes meeting the Gilbert-Varshamov bound.
[Figure 19 here: plot of σ (0 to 0.5) versus R (0 to 1), showing the Wolf bound together with the lower bounds of Theorem 5.23.]

Figure 19. Asymptotic bounds on trellis complexity of binary codes meeting the Gilbert-Varshamov bound. Curves labeled 2, 3, 4, 5 arise from Theorem 5.23 with L = 2, 3, 4, 5, respectively. The curve labeled ∞ corresponds to σ ≳ 2δR, obtained by taking L ≃ 1/2δ.
Although this is not apparent from Figure 19, we note that there exist values of R and δ on the curve R = 1 − H2(δ) described by the Gilbert-Varshamov bound, for which the lower bound σ ≳ 2δR is stronger than the bound obtained by taking any fixed value of L in Theorem 5.23. We omit the proof of this statement, but observe that these values of R, δ lie in the neighborhood of the point R = 1 and δ = 0. Thus the infinite family of bounds in Theorem 5.23 converges to (180) as R → 1, and coincides with the DLP bound as R → 0.
The best known lower bound on the trellis complexity of binary codes meeting the Gilbert-Varshamov bound is the "envelope" of all the curves in Figure 19, given by:

    σ ≳ max_{L=2,3,...} ( R − R_max(Lδ) ) / (L − 1)

This bound takes the form of a highly irregular curve, illustrated in Figure 20, that is not differentiable at a countably infinite number of points. We conjecture that the bound of Theorem 5.23 holds for all rational numbers L ≥ 2. If this conjecture is true, the resulting "envelope" would be a smooth curve improving upon the bounds shown in Figures 19 and 20.
[Figure 20 here: plot of σ versus R, showing the Wolf bound, the DLP bound, the lower bound of Theorem 5.23, and the upper bound of Theorem 5.24, with a shaded region between the latter two.]

Figure 20. Asymptotic trellis complexity of the best binary codes meeting the Gilbert-Varshamov bound
We now discuss asymptotic upper bounds on trellis complexity. The Wolf bound of Theorem 5.5 trivially implies that σ ≤ min(R, 1−R) for any code C and any permutation of the time axis for C. This bound on σ is illustrated in Figures 18, 19, and 20. On the other hand, Kudryashov and Zakharova [71], following upon the earlier work of Dumer [26], establish the following existence bound on the trellis complexity of binary codes.

Theorem 5.24. There exist (sequences of) binary linear codes that attain the Gilbert-Varshamov bound R = 1 − H2(δ), and whose relative trellis complexity satisfies:

    σ ≲ 1 − H2(δ)          if 0 ≤ R ≤ 1 − H2(0.25)
    σ ≲ H2(2δ) − H2(δ)     if 1 − H2(0.25) ≤ R ≤ 1        (181)

Notice that the first case in (181) is precisely (177). This must be true for any binary code with R = 1 − H2(δ). The nontrivial statement σ ≲ H2(2δ) − H2(δ) for R ≥ 1 − H2(0.25) is
proved in [71] using an ingenious construction of partially tail-biting trellises for binary convolutional codes. Kudryashov and Zakharova [71] consider the ensemble of block codes obtained by such tail-biting termination of convolutional codes with prescribed trellis complexity σ, and show that there exist codes in this ensemble that attain the Gilbert-Varshamov bound. The resulting upper bound on σ is illustrated in Figure 20. It follows from Theorem 5.23 and Theorem 5.24 that the "best" binary codes, in terms of the tradeoff between the three asymptotic parameters R, δ, and σ, lie somewhere in the shaded area of Figure 20.
We conclude with the following observation. A long-standing conjecture [50] in coding theory says that the Gilbert-Varshamov bound is asymptotically tight for binary codes, namely:

    R_max(δ) ≲ 1 − H2(δ)        (182)

It is a curious fact, first observed in [128], that if this conjecture is true then the DLP lower bound of (175) coincides with the upper bound of Theorem 5.24. Thus a proof of (182) would establish the best possible tradeoff not only between R and δ, but between the three asymptotic parameters: rate R, relative distance δ, and relative trellis complexity σ.
Notes on Section 5: The "permutation problem" for trellises was first posed by Massey [80] in 1978, but was not studied in much detail until recently. Results on the computational complexity of this problem, described in Section 5.1, are from [55, 57, 69, 111, 112]. The main results of Section 5.2 are from [5, 7, 39, 60, 62, 69, 114]. The term "uniformly efficient" permutation was introduced in [39, 62]; see [62] for a related notion of uniformly concise codes. The discussion of BCH codes in Section 5.2 follows Vardy and Be'ery [114].

Muder [87] was the first to consider general lower bounds on trellis state complexity. He not only proved Theorem 5.8, but also established a generalization of this result to nonlinear codes, which is now subsumed by Theorem 5.18. The connection between trellis complexity and generalized Hamming weights was discovered in [60, 114]. The term "dimension-length profile" was coined by Forney [39], who also proved the DLP duality theorem (Theorem 5.10). Theorem 5.11 is due to Vardy and Be'ery [114]. The span bound of Theorem 5.13 and the derivative asymptotic results of Theorem 5.22 and (174) are from Lafourcade and Vardy [73]. Further generalizations of the span bound may be found in [62]. Theorem 5.15, Theorem 5.17, Theorem 5.18, and Theorem 5.23 are all from Lafourcade and Vardy [74]. All the results involving entropy-length profiles are due to Reuven and Be'ery [92]. Reuven and Be'ery [92] also prove an upper bound on trellis complexity in terms of entropy-length profiles, which we did not discuss here. Table 5 of Section 5.4 is due to Petra Schuurman [96], and is included here by permission. We did not verify all the bounds in this table.
6. The sectionalization problem

The foregoing section was devoted to minimizing the complexity of a trellis over all possible permutations of the time axis. In this section we consider another operation on the time axis, called sectionalization, which can also drastically change the structure and the complexity of a trellis. For example, it is easy to verify by inspection that the two trellises in Figure 21 represent the same (8, 4, 4) extended binary Hamming code. Both trellises conform to the same order of the time axis, namely the componentwise optimal standard binary order (cf. Theorem 5.7). Yet, it is clear that the trellis in Figure 21b is simpler than the trellis in Figure 21a.
[Figure 21 here: (a) a trellis for the code with binary edge labels, and (b) an equivalent trellis for the same code in which pairs of consecutive bits are grouped into sections of length two.]

Figure 21. Two trellises for the (8, 4, 4) extended binary Hamming code
In general, by a sectionalization we mean the choice of symbol alphabet at each time index. For a given order of the time axis I, the sectionalization effectively shrinks I at the expense of increasing the code alphabet [38, 87, 109]. For example, a binary code of length 2n may be thought of as a quaternary code of length n if pairs of consecutive bits are grouped together, as in Figure 21. A wide variety of such granularity adjustments [42] is possible, and each may substantially affect the number of vertices, the number of edges, and the decoding complexity of the minimal trellis for a given code. Thus, the problem at hand is this: given a code C and the minimal (proper) trellis T for C, find the optimal sectionalization of this trellis.

Let us state this somewhat more precisely. For a given code C of length n and a given order of its time axis I = {0, 1, ..., n}, a specific sectionalization of the minimal trellis T for C is determined by the set {h_0, h_1, ..., h_ν} ⊆ I of section boundaries, where:

    0 = h_0 < h_1 < h_2 < ··· < h_{ν−1} < h_ν = n

Clearly, there are 2^{n−1} possible ways to choose the section boundaries, and the sectionalization problem consists of finding the optimal choice among the 2^{n−1} possibilities. Notice that we have not yet specified what exactly optimality means. We will leave this matter open for a while, because there is a wide range of conceivable optimality criteria. For example, the total number of edges in the trellis and/or the Viterbi decoding complexity are natural criteria for optimality. However, we shall see shortly that the key to the sectionalization problem lies precisely in not restricting one's attention to such narrow definitions of optimality.
In this section we present a complete solution to the general sectionalization problem, as stated above. Namely, we describe a polynomial-time algorithm which produces an optimal sectionalization of the minimal trellis for a given linear code C, when presented with a generator matrix for C. In fact, this sectionalization algorithm of Lafourcade and Vardy [75] is developed in a considerably more general setting; it therefore works for both linear and nonlinear codes, and easily accommodates a variety of optimality criteria.

Following [75], we will define the operations of composition and amalgamation of trellises. This will enable us to consider a class of functions, defined on the set of trellises, that satisfy a certain linearity property with respect to the composition operation. We will then seek a sequence of amalgamations and compositions that minimizes the value of an arbitrary given function from this class. We will show that finding such a sequence is equivalent to finding the minimum-weight path in a certain weighted digraph. Once this level of abstraction is reached, the solution to our sectionalization problem will become apparent.
6.1. Functions and operations on trellises

Trellis notation and terminology used in this section differs slightly from how it was used earlier in this chapter. We now briefly describe these differences.

In the previous two sections, we have identified the depth of a trellis T with the length n of the code it represents. Herein, we will usually denote the depth of a trellis by τ, reserving n for the length of a trellis, which will be defined shortly. All the trellises considered so far had a single root vertex and a single toor vertex. In contrast, most trellises in this section will have multiple root and toor vertices. Finally, recall the decomposition (3) of the label alphabet as A = A_1 × A_2 × ··· × A_τ, where A_i is the set of edge-labels for edges in E_i. We have so far assumed that A_1 = A_2 = ··· = A_τ = A throughout this chapter. Herein, we will drop this assumption. Instead, we postulate the existence of a primary alphabet A, such that

    A_i = A × A × ··· × A    (l_i times)

for all i = 1, 2, ..., τ. The integer l_i is said to be the length of the i-th section in the trellis, as the label λ(e) of each edge e ∈ E_i may be regarded as a sequence of length l_i over the primary alphabet A. The length of a trellis T is then defined as n = l_1 + l_2 + ··· + l_τ.
The rest of this subsection is concerned with definitions that are needed to set the stage for the results that follow in the next two subsections. Here is the first definition.

Definition 6.1. Two trellises T and T′ of length n are said to be equivalent if they represent the same code. We use T ≃ T′ to denote equivalence.

Notice that this equivalence of trellises should not be confused with equivalence of codes (which has to do with permutations of the time axis, as discussed in the previous section). Equivalent codes usually have distinct sets of codewords, and hence non-equivalent trellises.
We can now define the operations of composition and amalgamation of trellises. Given a trellis T = (V, E, A) of depth τ and a trellis T′ = (V′, E′, A′) of depth τ′, such that V_τ = V′_0, we can "glue" them together to form a trellis of depth τ + τ′. Here is the formal definition.

Definition 6.2. A trellis T″ = (V″, E″, A″) of depth τ + τ′ is said to be the composition of T = (V, E, A) and T′ = (V′, E′, A′) if the set of vertices at time i in T″ is given by:

    V″_i = V_i         for i = 0, 1, ..., τ
    V″_i = V′_{i−τ}    for i = τ+1, ..., τ+τ′

the set of edges of T″ is given by E″ = E ∪ E′, and the set of edge-labels at time i in T″ is given by A″_i = A_i for i = 1, 2, ..., τ and A″_i = A′_{i−τ} for i = τ+1, τ+2, ..., τ+τ′.

We use T″ = T ∘ T′ to denote composition. For example, if T is the trellis in Figure 22a and T′ is the trellis in Figure 22b, then their composition T″ = T ∘ T′ is the trellis depicted in Figure 22c. Composing trellises is easy!
[Figure 22 here: two trellises T and T′ (panels a and b), their composition T ∘ T′ (panel c), and their amalgamation T ∗ T′ (panel d), a depth-one trellis whose sixteen edges are labeled by 5-tuples, such as (0,0,0,0,0) and (0,1,1,0,1), corresponding to the sixteen paths in T ∘ T′.]

Figure 22. Example of composition and amalgamation of trellises


It is obvious that although the composition operation is not commutative, it is associative.
That is (T1 T2) T3 = T1 (T2 T3) = T1 T2 T3. Furthermore, any trellis T of depth 
may be uniquely decomposed as T = T1 T2    T , where T1  T2 : : :  T each have depth
one. The trellises T1 T2  : : :  T are called the sections of T . If, in addition, the length of
a section Ti in the above decomposition is one, we say that Ti is a unit section.
101
Instead of composing the two trellises T and T′ to produce a trellis of depth τ + τ′, we can amalgamate them to produce a trellis of depth one. This is done by identifying the edges in the amalgamated trellis with the paths in T ∘ T′. Here is the formal definition.

Definition 6.3. A trellis T″ = (V″, E″, A″) of depth one is said to be the amalgamation of T and T′, if V″_0 = V_0, V″_1 = V′_{τ′}, the edge-label alphabet is given by:

    A″ = A_1 × A_2 × ··· × A_τ × A′_1 × A′_2 × ··· × A′_{τ′}

and E″ is the set of paths from V_0 to V′_{τ′} in T ∘ T′. That is, there is an edge (v_1, v_2, β) in E″ if and only if there is a path labeled β from v_1 to v_2 in T ∘ T′.

We use T″ = T ∗ T′ to denote amalgamation. For example, if T is the trellis in Figure 22a and T′ is the trellis in Figure 22b, then the amalgamation T ∗ T′ is the trellis in Figure 22d. The sixteen edges in this trellis correspond to the sixteen paths in the composition trellis T ∘ T′ depicted in Figure 22c. Notice that T ∘ T′ and T ∗ T′ have the same length.
In fact, it is obvious from the foregoing definitions that T ∘ T′ and T ∗ T′ always represent the same code. Thus, in our notation, we have

    T ∘ T′ ≃ T ∗ T′        (183)

It follows from (183) that, given a decomposition of the form T = T_1 ∘ T_2 ∘ ··· ∘ T_τ, we can replace any number of consecutive compositions with consecutive amalgamations to obtain an equivalent trellis. For instance:

    T_1 ∘ T_2 ∘ T_3 ∘ T_4 ∘ T_5 ∘ T_6 ≃ T_1 ∘ (T_2 ∗ T_3 ∗ T_4) ∘ (T_5 ∗ T_6)

This brings us closer to our sectionalization problem: given a decomposition of T into unit sections, we seek a sequence of consecutive amalgamations of these sections that produces an equivalent trellis which is optimal in some respect. It remains to define "optimality."
To this end, we need to introduce an objective function, and require that it satisfies a certain property. Let 𝒯 be the set of trellises, and consider functions from 𝒯 into the set of nonnegative integers ℕ = {0, 1, ...}. For instance:

    F_1(T) = the total number of edges in T        (184)
    F_2(T) = the minimum distance of the code represented by T        (185)
    F_3(T) = the number of operations required for Viterbi decoding of T        (186)

are examples of such functions. All the results herein apply to functions from 𝒯 into more general sets; however, ℕ will suffice for our purposes.

Definition 6.4. A function F : 𝒯 → ℕ is said to be decomposition-linear if, for all T ∈ 𝒯 and for any decomposition T = T_1 ∘ T_2 of T, we have F(T_1 ∘ T_2) = F(T_1) + F(T_2).

For example, it is easy to see that the function F_1(·) in (184) is decomposition-linear, but the function F_2(·) in (185) is not. The function F_3(·) in (186) is decomposition-linear if the Viterbi decoding complexity is defined by D = 2|E| − |V| + |V_0| as in (17), as well as under more general definitions of trellis decoding complexity that take into account the operations needed to compute the edge metrics. For more details on this, we refer the reader to [75].
6.2. The sectionalization algorithm

Suppose that we are given a generator or a parity-check matrix for a linear code C of length n over F_q. The minimal trellis T = (V, E, A) for C can be constructed using any of the techniques described in Section 4.2. All the constructions in Section 4.2 also produce a natural decomposition of T into n unit sections of the form:

    T = T_1 ∘ T_2 ∘ ··· ∘ T_n        (187)

with each T_i being a length-one trellis over A = F_q. This decomposition corresponds to the so-called [38, 61] full "unsectionalized" trellis for C, although it would be more appropriate to refer to the trellis T in (187) as a trellis sectionalized to the extreme. In general, with the notation of the foregoing subsection, we can establish a one-to-one correspondence between the 2^{n−1} possible sectionalizations of T, with section boundaries at h_0 = 0, h_1, h_2, ..., h_ν = n, and decompositions of the form:

    T′ = (T_1 ∗ ··· ∗ T_{h_1}) ∘ (T_{h_1+1} ∗ ··· ∗ T_{h_2}) ∘ ··· ∘ (T_{h_{ν−1}+1} ∗ ··· ∗ T_n)        (188)

Now let F : 𝒯 → ℕ be a given objective function. Observe that T′ ≃ T, regardless of the choice of section boundaries, in view of (183). That is, all the 2^{n−1} decompositions in (188) produce trellises equivalent to T. However, these trellises are not equal, and the value of the objective function F(·) could be different for different decompositions. The sectionalization algorithm iteratively finds a decomposition T* of type (188) which minimizes the value of the objective function F(·). Here is a pseudo-code description of this algorithm.
    /* Sectionalization algorithm */

    T*_n := T_n                                          /* initialization */

    for i = n−1 down to 1 do
    {
        /* find optimal sectionalization T*_i of T_i ∘ T_{i+1} ∘ ··· ∘ T_n */

        aux  := min_{j=i,i+1,...,n−1} { F(T_i ∗ T_{i+1} ∗ ··· ∗ T_j) + F(T*_{j+1}) }        (189)

        jmin := arg min_{j=i,i+1,...,n−1} { F(T_i ∗ T_{i+1} ∗ ··· ∗ T_j) + F(T*_{j+1}) }    (190)

        if aux ≤ F(T_i ∗ T_{i+1} ∗ ··· ∗ T_n)
            then T*_i := (T_i ∗ T_{i+1} ∗ ··· ∗ T_{jmin}) ∘ T*_{jmin+1}
            else T*_i := T_i ∗ T_{i+1} ∗ ··· ∗ T_n
    }
    return T*_1

Notice that for j = i the expression T_i ∗ T_{i+1} ∗ ··· ∗ T_j in (189) and (190) should be understood simply as T_i. It is clear that the complexity of the sectionalization algorithm is O(n²): there are n − 1 iterations, each requiring us to compute and compare at most n values on line (189).
It is not difficult to prove directly that the sectionalization algorithm indeed produces an optimal sectionalization of a trellis, provided F(·) is decomposition-linear. The following indirect proof appears to be more insightful, however.
[Figure 23 here: the sectionalization digraph for a trellis of length 4, with vertices 0, 1, 2, 3, 4 and, for every i < j, a directed edge from i to j of weight F(T_{i+1} ∗ ··· ∗ T_j); for example, the edge from 0 to 3 has weight F(T_1 ∗ T_2 ∗ T_3).]

Figure 23. The sectionalization digraph G = (V, E, ℕ) for a trellis of length 4
We define an edge-labeled sectionalization digraph G = (V, E, ℕ) as follows. The vertices of G are the integers from 0 to n, that is, V = {0, 1, ..., n}. Between every pair of vertices i, j ∈ V with i < j, there is a weighted directed edge e = (i, j, ω_{ij}) whose weight is given by:

    ω(e) = ω_{ij} := F(T_{i+1} ∗ ··· ∗ T_j) ∈ ℕ

For example, the sectionalization digraph corresponding to a trellis of length 4 is depicted in Figure 23. Given a directed path P = e_1, e_2, ..., e_m in G, the weight of P is defined, as usual, as the sum of the edge-weights, namely wt(P) = ω(e_1) + ω(e_2) + ··· + ω(e_m).
Proposition 6.1. There is a one-to-one correspondence between the values of the objective function F(·) for the 2^{n−1} sectionalizations in (188) and the weights of the 2^{n−1} paths from 0 to n in G, provided F(·) is decomposition-linear.

Proof. Given a sequence of section boundaries h_0 < h_1 < h_2 < ··· < h_ν and the corresponding trellis T′ as in (188), we construct the directed path P in G from h_0 = 0 to h_1, to h_2, and so forth, until we reach h_ν = n. For this path P, we have:

    F(T′) = F(T_1 ∗ ··· ∗ T_{h_1}) + F(T_{h_1+1} ∗ ··· ∗ T_{h_2}) + ··· + F(T_{h_{ν−1}+1} ∗ ··· ∗ T_n)
          = ω_{h_0 h_1} + ω_{h_1 h_2} + ··· + ω_{h_{ν−1} h_ν} = wt(P)

if the objective function F(·) is decomposition-linear. Thus solving the sectionalization problem is tantamount to finding the minimum-weight path in the sectionalization digraph G.
Having reduced the sectionalization problem to finding the minimum-weight path in a directed graph, we observe that various efficient algorithms for this purpose are known [47]. (See also a discussion of this at the end of Section 3.) In particular, we refer the reader to [10, 21] for a description of the Bellman-Ford and the Dijkstra algorithms.

It is now easy to see that our sectionalization algorithm is essentially a variant of the Dijkstra algorithm. We have modified the original Dijkstra algorithm [23] slightly to exploit the structure of the sectionalization digraph G. The Dijkstra algorithm applies to any digraph, tacitly assuming that the digraph is complete; it requires n(n−1)/2 additions and 2n(n−1) comparisons for a graph with n vertices [21, p. 296]. The sectionalization algorithm described in this subsection requires n(n−1)/2 additions and only n(n−1)/2 comparisons for a graph with n + 1 vertices. We have been able to reduce the number of comparisons in the Dijkstra algorithm by a factor of 4 due to the fact that the sectionalization digraph G is not at all complete: there is a directed edge between vertices i and j if and only if i < j.
It is known [20, 21] that the Dijkstra algorithm still works in a more general scenario. That is, the weight of a directed path P = e_1, e_2, ..., e_m in G does not have to be equal to the sum of the edge-weights. Instead, we could have:

    wt(P) := ω(e_1) ⊗ ω(e_2) ⊗ ··· ⊗ ω(e_m)

where ⊗ is any associative binary operation. It follows that our sectionalization algorithm works for a more general class of objective functions. We will say that a function F : 𝒯 → ℕ is decomposition-associative if for all T ∈ 𝒯 and for every decomposition T_1 ∘ T_2 ∘ T_3 of T,

    F(T) = F(T_1 ∘ T_2 ∘ T_3) = F(T_1) ⊗ F(T_2) ⊗ F(T_3)

for some associative operation ⊗ on ℕ. One can readily verify that the sectionalization algorithm can be used with any decomposition-associative objective function, essentially without change, by replacing the additions in (189) and (190) with ⊗.
Example 6.1. Suppose we are given a trellis T for C and wish to construct a sectionalization of T such that the in-degree of every vertex in the resulting trellis is 8. (Let's say we have a stockpile of 1-out-of-8 comparators, and want to implement a Viterbi decoder for C using a minimum number of these comparators.) Define the function F : 𝒯 → {0, 1} by:

    F(T) = 0    if the in-degree of every vertex in T is equal to 8
    F(T) = 1    otherwise

It is easy to see that F(T_1 ∘ T_2) = max{F(T_1), F(T_2)}. Hence F(·) is not decomposition-linear, but it is decomposition-associative, since max{·,·} is an associative binary operation. Thus we can use the sectionalization algorithm to find an optimal sectionalized trellis T*. If F(T*) = 0, then T* has the desired property. Otherwise, no such sectionalization exists. }
We note that the sectionalization algorithm may be further generalized in various ways. For instance, one might be interested in finding an optimal sectionalization into a prescribed number L of sections. For this purpose, a variant of the Bellman-Ford algorithm [10, p. 396] may be applied to the sectionalization digraph G. The resulting complexity of finding the optimal L-section sectionalizations, for all L = 2, 3, ..., n, is O(n³).
6.3. Dynamics of optimal sectionalizations

In practice, the objective function one would usually like to minimize is the one that counts the total number of operations required for Viterbi decoding of a given trellis T = (V, E, A). If this trellis consists of n unit sections, then the Viterbi decoding complexity of T is given by:

    D(T) := 2|E| − |V| + |V_0|        (191)

as we have shown in Theorem 3.2 of Section 3. If the trellis T contains sections of length strictly greater than one, then we also need to compute the edge-labels for these sections from the log-likelihoods of the individual symbols (cf. Section 3.2). There are various efficient ways to do so; we refer the reader to [45, 75] for a comprehensive treatment of this subject. Herein, we will not be concerned with the details of this computation. All we need to know is that the function M(T) which counts the total number of operations (additions and comparisons of real values) required to compute the edge-labels has the following properties. First, this function is decomposition-linear, so that M(T_1 ∘ T_2) = M(T_1) + M(T_2). Furthermore, it is also ∪-convex with respect to the amalgamation operation, namely:

    M(T_1 ∗ T_2) ≥ M(T_1) + M(T_2)        (192)
It is shown in [75] that the function M(T) has these properties regardless of the particular method employed to compute the edge-labels for Viterbi decoding. In this subsection, we will be interested in optimal sectionalizations with respect to the objective function:

    F(T) := D(T) + M(T)        (193)

Since D(T) and M(T) are decomposition-linear, so is F(T). Thus an optimal sectionalization with respect to F(·) can be readily found using the sectionalization algorithm of the foregoing subsection. However, this provides little insight into the structure of the resulting trellis. Our goal in this subsection is to establish several enlightening relations between the section boundaries of the optimal sectionalization with respect to F(·) and the past and future profiles of a linear code C. For many codes, these relations make it possible to determine the optimal sectionalization "at a glance" from the sequences p_0, p_1, ..., p_n and f_0, f_1, ..., f_n. To simplify the terminology, we hereafter use the term "optimal sectionalization" to refer to a sectionalization that minimizes the objective function F(·) in (193). We now describe a key observation that leads to most of the results in this subsection.
Lemma 6.2. Suppose that a section T = (V, E, A) can be represented as an amalgamation of two shorter sections T′ = (V′, E′, A′) and T″ = (V″, E″, A″) such that |E′| + |E″| ≤ |E|. Then T = T′ ∗ T″ cannot be a section in an optimal sectionalization.

Proof. This follows immediately from (191) and (192). Given the ∪-convexity of M(·) with respect to amalgamation, it is easy to see that

    F(T) = F(T′ ∗ T″) > F(T′ ∘ T″) = F(T′) + F(T″)

This is so because D(T) > D(T′) + D(T″) if |E| ≥ |E′| + |E″|, in view of (191). Thus cutting T = T′ ∗ T″ into two subsections T′ and T″ reduces the value of the objective function F(·).
Let T = T_1 ∘ T_2 ∘ ··· ∘ T_n be the minimal trellis for a linear code C of length n and dimension k over F_q, decomposed into n unit sections. In what follows, we will be concerned specifically with sectionalizations of T. A section in such a sectionalization is the trellis:

    T_{h,h′} := T_{h+1} ∗ T_{h+2} ∗ ··· ∗ T_{h′}

where h and h′ are integers such that 0 ≤ h < h′ ≤ n. Lemma 6.2 shows that counting edges in T_{h,h′} is important, and the following theorem establishes a lower bound on the number of edges in T_{h,h′} in terms of the past and future profiles of C.
Theorem 6.3. If T_{h,h′} = T_{h+1} ∗ T_{h+2} ∗ ··· ∗ T_{h′} is a section in any sectionalization of T, then the number of edges in T_{h,h′} is lower bounded by:

    |E| ≥ max_{i=h+1,h+2,...,h′} |E_i| = max_{i=h+1,h+2,...,h′} q^{k − p_{i−1} − f_i}        (194)

Furthermore, if T_{h,h′} = T_{h+1} ∗ T_{h+2} ∗ ··· ∗ T_{h′} is a section in the optimal sectionalization of T, then this lower bound holds with equality.

Proof. Notice that q^{k − p_{i−1} − f_i} = |E_i| is the number of edges in the i-th unit section T_i, as we have established in Theorem 4.17. The total number of edges in T_{h,h′} is given by:

    |E| = q^{s_h + f_h − f_{h′}} = q^{k − p_h − f_{h′}}        (195)

Indeed, the number of edges in T_{h,h′} is equal to the total number of paths in the composition trellis T_{h+1} ∘ T_{h+2} ∘ ··· ∘ T_{h′}, which we have counted in equation (153) of Section 5. Thus

    |E| / |E_i| = q^{p_{i−1} − p_h} · q^{f_i − f_{h′}}        (196)

and the lower bound |E| ≥ |E_i| for i = h+1, h+2, ..., h′ follows by observing that the sequence p_0, p_1, ..., p_n is nondecreasing while the sequence f_0, f_1, ..., f_n is nonincreasing.

Now suppose that T_{h,h′} = T_{h+1} ∗ T_{h+2} ∗ ··· ∗ T_{h′} is a section in the optimal sectionalization of T. We will assume a strict inequality in the lower bound of (194) and obtain a contradiction. In view of (196), strict inequality in (194) implies that

    p_{i−1} + f_i > p_h + f_{h′}    for i = h+1, h+2, ..., h′        (197)

Substituting i = h′ in (197), we obtain p_{h′−1} > p_h. We let ℓ denote the smallest integer in the set {h+1, h+2, ..., h′−1} satisfying p_ℓ = p_h + 1. Since p_{h′−1} > p_h, such an ℓ exists. We will cut T_{h,h′} into two subsections of shorter length at time i = ℓ. Namely, we define:

    T′ := T_{h,ℓ} = T_{h+1} ∗ T_{h+2} ∗ ··· ∗ T_ℓ
    T″ := T_{ℓ,h′} = T_{ℓ+1} ∗ T_{ℓ+2} ∗ ··· ∗ T_{h′}

Equation (195) shows that the total number of edges in T′ and T″ is given by |E′| = q^{k − p_h − f_ℓ} and |E″| = q^{k − p_ℓ − f_{h′}}, respectively. It follows that:

    |E| / |E′| = q^{k − p_h − f_{h′}} / q^{k − p_h − f_ℓ} = q^{f_ℓ − f_{h′}}
    |E| / |E″| = q^{k − p_h − f_{h′}} / q^{k − p_ℓ − f_{h′}} = q^{p_ℓ − p_h}        (198)

We first deal with the second ratio |E|/|E″| in (198). This is straightforward: by the definition of ℓ, we have p_ℓ − p_h = 1, and therefore |E|/|E″| = q. The first ratio |E|/|E′| in (198) requires a bit more tinkering. Since we chose ℓ to be the smallest integer with p_ℓ = p_h + 1, it follows that p_{ℓ−1} = p_h. Invoking (197) with i = ℓ, we obtain that p_{ℓ−1} + f_ℓ > p_h + f_{h′}. Since p_{ℓ−1} = p_h, this further implies that f_ℓ ≥ f_{h′} + 1, and therefore |E|/|E′| ≥ q.

Considering both ratios in (198), we can now conclude that |E| ≥ |E′| + |E″|. It follows that T_{h,h′} = T_{h,ℓ} ∗ T_{ℓ,h′} cannot be a section in the optimal sectionalization, by Lemma 6.2.
Theorem 6.3 shows that the maximum edge-complexity of T cannot decrease under sectionalization. Furthermore, it remains invariant under optimal sectionalizations. The next theorem uses this fact to establish our main result in this subsection: a relation between the past and future profiles of a linear code C and the optimal sectionalization of its trellis.
Theorem 6.4. Let h and h′ be consecutive section boundaries in the optimal sectionalization. Then for each i = h+1, h+2, ..., h′, either p_i = p_h, or f_i = f_{h′}, or both.

Proof. Assume to the contrary that there exists a position i ∈ {h+1, h+2, ..., h′} such that p_i > p_h and f_i > f_{h′}. Then for all j ∈ {h+1, h+2, ..., i}, we have:

    p_{j−1} ≥ p_h    and    f_j ≥ f_i > f_{h′}

since the sequence p_0, p_1, ..., p_n is nondecreasing while the sequence f_0, f_1, ..., f_n is nonincreasing. Therefore |E| = q^{k − p_h − f_{h′}} is strictly greater than |E_j| = q^{k − p_{j−1} − f_j} for all such j. For j ∈ {i+1, i+2, ..., h′}, we conclude by a similar argument that:

    p_{j−1} ≥ p_i > p_h    and    f_j ≥ f_{h′}

and again |E| > |E_j| for all such j. It follows that |E| is strictly greater than |E_j| for all positions j in {h+1, h+2, ..., h′}. This is a contradiction to Theorem 6.3.
Theorem 6.4 provides a means to determine at least some of the section boundaries in the optimal sectionalization of T from the past and future profiles of C. For instance, if for a certain position i ∈ I we have p_i > p_{i−1} and f_i > f_{i+1}, then by Theorem 6.4 this position i cannot be properly within the set of positions {h+1, h+2, ..., h′} of any section T_{h,h′} in the optimal sectionalization of T. This proves the following corollary to Theorem 6.4.

Corollary 6.5. If p_i > p_{i−1} and f_i > f_{i+1} for some i ∈ I, then i is necessarily a section boundary in the optimal sectionalization of T.
Corollary 6.5 is reminiscent of the incremental past and future profiles Δp_i = p_i − p_{i−1} and Δf_i = f_{i−1} − f_i defined in Section 4. Herein, it will be more convenient to use:

    ∇f_i := f_i − f_{i+1} = Δf_{i+1}

Then the condition of Corollary 6.5 reduces to (Δp_i, ∇f_i) = (1, 1). A more careful examination of Theorem 6.4 shows that within a section of the optimal sectionalization, we can have (Δp_i, ∇f_i) = (0, 1) followed by (Δp_j, ∇f_j) = (1, 0), but not vice versa.

Corollary 6.6. For all i, j ∈ I with j > i, if (Δp_i, ∇f_i) = (1, 0) and (Δp_j, ∇f_j) = (0, 1), then the optimal sectionalization has at least one section boundary in the set {i, i+1, ..., j}.
We observe that the values of ("pi rfi ) completely determine the vertex structure at time i,
just as the values of the pair ("pi "fi ) determine the edge structure at time i (cf. Table 1).
108
For example ("pi rfi ) = (1 1) simply means that all the vertices in Vi are of type ><.
Thus an optimal sectionalization always cuts through such vertices. Similarly, the condition
of Corollary 6.6 corresponds to a sequence of vertex structures of the type:
>; ;;  ;; ;< (199)
An optimal sectionalization always cuts somewhere along this sequence. Furthermore, it
does not matter where we cut | it can be shown that all the cuts through (199) yield the
same value for the objective function F (). Using these rules, it is often possible to determine
all the section boundaries of the optimal sectionalization just by looking at the trellis!
Example 6.2. Consider again the (24, 12, 8) binary Golay code G24. The past and future
profiles for G24, in the componentwise optimal ordering (123) of the time axis for G24 found
in the previous section, are given by:

i:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
p_i:   0  0  0  0  0  0  0  0  1  1  1  1  2  2  3  4  5  5  6  7  8  9 10 11 12
f_i:  12 11 10  9  8  7  6  5  5  4  3  2  2  1  1  1  1  0  0  0  0  0  0  0  0
Δp_i:  0  0  0  0  0  0  0  0  1  0  0  0  1  0  1  1  1  0  1  1  1  1  1  1  1
∇f_i:  1  1  1  1  1  1  1  0  1  1  1  0  1  0  0  0  1  0  0  0  0  0  0  0  0

We see that there are exactly three positions satisfying the condition of Corollary 6.5, namely
(Δp_i, ∇f_i) = (1, 1). These are 8, 12, and 16. The condition of Corollary 6.6 does not
occur; this is always true for self-dual codes, in view of Theorem 4.21. Thus, according to
Corollaries 6.5 and 6.6, examination of the past and future profiles of G24 suggests that the
optimal sectionalization should have boundaries at {0, 8, 12, 16, 24}. It can be readily verified
that this sectionalization indeed minimizes the function F(·) defined in (193).
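The boundary-detection rule of Corollary 6.5 is easy to mechanize. The following minimal
sketch (ours, not from the chapter; the function name and the list representation of the
profiles are illustrative) computes the incremental profiles Δp_i and ∇f_i from the past and
future profiles of a code and reports every position with (Δp_i, ∇f_i) = (1, 1). Running it
on the profiles of G24 tabulated above reproduces the forced boundaries 8, 12, and 16.

    # Sketch in Python. Given the past profile p_0..p_n and future profile
    # f_0..f_n of a linear code, flag every interior position i with
    # (Delta p_i, Nabla f_i) = (1, 1); by Corollary 6.5 each such i must be
    # a section boundary in the optimal sectionalization.

    def forced_boundaries(p, f):
        n = len(p) - 1                  # p and f both have n+1 entries
        boundaries = []
        for i in range(1, n):           # interior positions 1, ..., n-1
            dp = p[i] - p[i - 1]        # incremental past profile (Delta p_i)
            nf = f[i] - f[i + 1]        # incremental future profile (Nabla f_i)
            if dp == 1 and nf == 1:     # the condition of Corollary 6.5
                boundaries.append(i)
        return boundaries

    # Profiles of the (24, 12, 8) Golay code G24 from Example 6.2:
    p = [0,0,0,0,0,0,0,0,1,1,1,1,2,2,3,4,5,5,6,7,8,9,10,11,12]
    f = [12,11,10,9,8,7,6,5,5,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]

    print(forced_boundaries(p, f))      # prints [8, 12, 16]

Together with the endpoints 0 and n = 24, which are always section boundaries, this yields
the sectionalization {0, 8, 12, 16, 24} of Example 6.2.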
For many other codes, the rules described above also suffice to find all the boundaries in
the sectionalization that minimizes the objective function (193). In fact, we have not yet
encountered a code for which these rules fail to produce an optimal sectionalization.

7. Guide to the literature
In this section, we compile a comprehensive bibliography of papers on trellis structure and
complexity of codes. These papers are roughly classified into nine categories. We also provide
a list of references for several closely related topics that were not discussed in this chapter:
trellis complexity of lattices, trellis decoding algorithms, trellises for group and convolutional
codes, generalized Hamming weights, and representation of codes by general graphs.
The papers within each category are arranged more or less in chronological order, to show
the historical development of ideas in the field. In some cases, key papers that are relevant
to more than one topic are listed under several categories.

A. Historical background and rst papers


A1] G.D. Forney, Jr., \Final report on a coding system design for advanced solar missions,"
Contract NAS2{3637, NASA Ames Research Center, CA, December 1967.
A2] G.D. Forney, Jr., \The Viterbi algorithm," Proceedings IEEE, vol. 61, pp. 268{278, 1973.
A3] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, \Optimal decoding of linear codes for mini-
mizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284{287, 1974.
A4] J.K. Wolf, \Efficient maximum-likelihood decoding of linear block codes using a trellis,"
IEEE Trans. Inform. Theory, vol. 24, pp. 76{80, 1978.
A5] J.L. Massey, \Foundation and methods of channel encoding," Proc. Int. Conf. Informa-
tion Theory and Systems, vol. 65, pp. 148{157, NTG-Fachberichte, Berlin, 1978.
A6] I.I. Dumer, \On complexity of maximum-likelihood decoding of the best concatenated
codes," in Proc. 8-th All-Union Conference on Coding Theory and Information Theory,
Moscow-Kuibishev, pp. 66-69, 1981, (in Russian).
A7] G.D. Forney, Jr., \Coset codes II: Binary lattices and related codes," IEEE Trans. Inform.
Theory, vol. 34, pp. 1152{1187, 1988.
A8] D.J. Muder, \Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34,
pp. 1049{1053, 1988.

B. The minimal trellis for a fixed time axis: properties and constructions
B1] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, \Optimal decoding of linear codes for mini-
mizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284{287, 1974.
B2] J.L. Massey, \Foundation and methods of channel encoding," Proc. Int. Conf. Informa-
tion Theory and Systems, vol. 65, pp. 148{157, NTG-Fachberichte, Berlin, 1978.
B3] G.D. Forney, Jr., \Coset codes II: Binary lattices and related codes," IEEE Trans. Inform.
Theory, vol. 34, pp. 1152{1187, 1988.
B4] D.J. Muder, \Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34,
pp. 1049{1053, 1988.
B5] V.V. Zyablov and V.R. Sidorenko, \Bounds on complexity of trellis decoding of linear
block codes," Problemy Peredachi Informatsii, vol. 29, pp. 3{9, 1993, (in Russian).
B6] A.D. Kot and C. Leung, \On the construction and dimensionality of linear block code
trellises," in Proc. IEEE Int. Symp. Inform. Theory, p. 291, San Antonio, TX., 1993.
B7] F.R. Kschischang and V. Sorokine, \On the trellis structure of block codes," IEEE Trans.
Inform. Theory, vol. 41, pp. 1924{1937, 1995.
B8] U. Dettmar, R. Raschofer, and U. Sorger, \On the trellis complexity of block and convo-
lutional codes," Problemy Peredachi Informatsii, vol. 32, pp. 10{21, 1996.
B9] R.J. McEliece, \On the BCJR trellis for linear block codes," IEEE Trans. Inform. Theory,
vol. 42, pp. 1072{1092, 1996.
B10] A. Vardy and F.R. Kschischang, \Proof of a conjecture of McEliece on the expansion index
of the minimal trellis," IEEE Trans. Inform. Theory, vol. 42, pp. 2027{2033, 1996.
B11] F.R. Kschischang, \The trellis structure of maximal fixed-cost codes," IEEE Trans. In-
form. Theory, vol. 42, pp. 1828{1838, 1996.
B12] V.V. Vazirani, H. Saran, and B. Sundar Rajan, \An efficient algorithm for constructing
minimal trellises for codes over finite abelian groups," IEEE Trans. Inform. Theory,
vol. 42, pp. 1839{1854, 1996.
B13] V.R. Sidorenko, G. Markarian, and B. Honary, \Minimal trellis design for linear block
codes based on the Shannon product," IEEE Trans. Inform. Theory, vol. 42, pp. 2048{
2053, 1996.
B14] V.R. Sidorenko, \The Euler characteristic of the minimal code trellis is maximum," Prob-
lemy Peredachi Informatsii, vol. 33, pp. 87-93, 1997, (in Russian).
B15] V.R. Sidorenko, I. Martin, and B. Honary, \On separability of nonlinear block codes,"
IEEE Trans. Inform. Theory, to appear, 1998.
B16] J.D. Lafferty and A. Vardy, \Ordered binary decision diagrams and minimal trellises,"
unpublished manuscript, 1998.

C. Operations on the time axis and trellis complexity


C1] J.L. Massey, \Foundation and methods of channel encoding," Proc. Int. Conf. Informa-
tion Theory and Systems, vol. 65, pp. 148{157, NTG-Fachberichte, Berlin, 1978.
C2] D.J. Muder, \Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34,
pp. 1049{1053, 1988.
C3] B.D. Kudryashov and T.G. Zakharova, \Block codes from convolutional codes," Problemy
Peredachi Informatsii, vol. 25, pp. 98{102, 1989, (in Russian).
C4] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \Trellis diagram construction for some
BCH codes," IEEE Int. Symp. Inform. Theory and Appl., Honolulu, Hawaii, 1990.
C5] V.V. Zyablov and V.R. Sidorenko, \Bounds on complexity of trellis decoding of linear
block codes," Problemy Peredachi Informatsii, vol. 29, pp. 3{9, 1993.
C6] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \On the optimum bit orders with respect
to the state complexity of trellis diagrams for binary linear codes," IEEE Trans. Inform.
Theory, vol. 39, pp. 242{245, 1993.
C7] Y. Berger and Y. Be'ery, \Bounds on the trellis size of linear block codes," IEEE Trans.
Inform. Theory, vol. 39, pp. 203{209, 1993.
C8] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \On complexity of trellis structure of linear
block codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1057{1064, 1993.
C9] B. Honary, G. Markarian, and P.G. Farrell, \Generalised array codes and their trellis struc-
ture," IEE Electronics Lett., vol. 29, pp.541{542, 1993.
C10] Y.-Y. Wang and C.-C. Lu, \The trellis complexity of equivalent binary (17, 9) quadratic
residue code is five," in Proc. IEEE Int. Symp. Inform. Theory, San Antonio, TX., 1993.
C11] A. Vardy and Y. Be'ery, \Maximum-likelihood soft decision decoding of BCH codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 546{554, 1994.
C12] S. Dolinar, L. Ekroot, A.B. Kiely, R.J. McEliece, and W. Lin, \The permutation trellis
complexity of linear block codes," in Proc. 32-nd Allerton Conference on Comm., Control,
and Computing, Monticello, IL., pp. 60{74, September 1994.
C13] F.R. Kschischang and G.B. Horn, \A heuristic for ordering a linear block code to minimize
trellis state complexity," in Proc. 32-nd Allerton Conference on Comm., Control, and
Computing, Monticello, IL., pp. 75{84, September 1994.
C14] G.D. Forney, Jr., \Dimension/length profiles and trellis complexity of linear block codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 1741{1752, 1994.
C15] A. Lafourcade and A. Vardy, \Asymptotically good codes have infinite trellis complexity,"
IEEE Trans. Inform. Theory, vol. 41, pp. 555{559, 1995.
C16] Ø. Ytrehus, \On the trellis complexity of certain binary linear block codes," IEEE Trans.
Inform. Theory, vol. 40, pp. 559{560, 1995.
C17] Y. Berger and Y. Be'ery, \Trellis-oriented decomposition and trellis-complexity of com-
posite length cyclic codes," IEEE Trans. Inform. Theory, vol. 41, pp. 1185{1191, 1995.
C18] A. Lafourcade and A. Vardy, \Lower bounds on trellis complexity of block codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 1938{1954, 1995.
C19] C.-C. Lu and S.H. Huang, \On bit-level trellis complexity of Reed-Muller codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 2061{2063, 1995.
C20] A. Lafourcade and A. Vardy, \Optimal sectionalization of a trellis," IEEE Trans. Inform.
Theory, vol. 42, pp. 689{703, 1996.
C21] S.B. Encheva, \On the binary linear codes which satisfy the two-way chain condition,"
IEEE Trans. Inform. Theory, vol. 42, pp. 1038{1047, 1996.
C22] A.B. Kiely, S. Dolinar, R.J. McEliece, L. Ekroot, and W. Lin, \Trellis decoding complexity
of linear block codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1687{1697, 1996.
C23] Y. Berger and Y. Be'ery, \The twisted squaring construction, trellis complexity and gen-
eralized Hamming weights of BCH and QR codes," IEEE Trans. Inform. Theory, vol. 42,
pp. 1817{1827, 1996.
C24] R.J. McEliece and W. Lin, \The trellis complexity of convolutional codes," IEEE Trans.
Inform. Theory, vol. 42, pp. 1855{1864, 1996.
C25] P. Schuurman, \A table of state complexity bounds for binary linear codes," IEEE Trans.
Inform. Theory, vol. 42, pp. 2034{2042, 1996.
C26] G.B. Horn and F.R. Kschischang, \On the intractability of permuting a block code to
minimize trellis complexity," IEEE Trans. Inform. Theory, vol. 42, pp. 2042{2048, 1996.
C27] S.B. Encheva, \On self-dual codes over GF(3) and their coordinate ordering," Reports in
Informatics, No. 128, Department of Informatics, University of Bergen, December, 1996.
C28] S.B. Encheva, \On repeated-root cyclic codes and the two-way chain condition," Lecture
Notes Comp. Sci., vol. 1255, pp. 78{87, Springer-Verlag 1997.
C29] A. Trachtenberg and A. Vardy, \Lexicographic codes: constructions, bounds, and trellis
complexity," in Proc. 31-st Annual Conf. on Inform. Sciences and Systems, Princeton,
NJ., pp. 521{526, March 1997.
C30] S.B. Encheva and G.D. Cohen, \Self-orthogonal codes and their coordinate ordering,"
IEICE Trans. Fundamentals, vol. E80{A, pp. 2256{2259, 1997.
C31] A. Vardy, \The intractability of computing the minimum distance of a code," IEEE Trans.
Inform. Theory, vol. 43, pp. 1757{1766, 1997.
C32] I. Reuven and Y. Be'ery, \Entropy/length profiles and bounds on trellis complexity of
nonlinear codes," IEEE Trans. Inform. Theory, vol. 44, pp. 580{598, 1998.
C33] K. Jain, I. Măndoiu, and V.V. Vazirani, \The \art of trellis decoding" is computationally
hard – for large fields," IEEE Trans. Inform. Theory, to appear, 1998.
C34] T. Kløve, \On codes satisfying the double chain condition," Discr. Math., to appear, 1998.
C35] R. Morelos-Zaragoza, T. Fujiwara, T. Kasami, and S. Lin, \Constructions of generalized
concatenated codes and their trellis-based decoding complexity," IEEE Trans. Inform.
Theory, to appear, 1998.
C36] Y.-Y. Wang and C.-C. Lu, \Theory and algorithms for optimal equivalent codes with
absolute trellis size," unpublished manuscript, 1998.
C37] A. Engelhart, M. Bossert, and J. Maucher, \Heuristic algorithms for ordering a linear code
to reduce the number of nodes of the minimal trellis," unpublished manuscript, 1998.
D. Trellis structure and complexity of lattices
D1] G.D. Forney, Jr., \Coset codes II: Binary lattices and related codes," IEEE Trans. Inform.
Theory, vol. 34, pp. 1152{1187, 1988.
D2] G.D. Forney, Jr., \Density/length proles and trellis complexity of lattices," IEEE Trans.
Inform. Theory, vol. 40, pp. 1752{1772, 1994.
D3] I.F. Blake and V. Tarokh, \On the trellis complexity of the densest lattice packings in IR^n,"
SIAM J. Discrete Math., vol. 9, pp. 597{601, 1996.
D4] V. Tarokh and I.F. Blake, \Trellis complexity versus the coding gain of lattices I," IEEE
Trans. Inform. Theory, vol. 42, pp. 1796{1807, 1996.
D5] V. Tarokh and I.F. Blake, \Trellis complexity versus the coding gain of lattices II," IEEE
Trans. Inform. Theory, vol. 42, pp. 1808{1816, 1996.
D6] V. Tarokh and A. Vardy, \Upper bounds on trellis complexity of lattices," IEEE Trans.
Inform. Theory, vol. 43, pp. 1294-1300, 1997.
D7] A.H. Banihashemi and I.F. Blake, \Trellis complexity and minimal trellis diagrams of
lattices," IEEE Trans. Inform. Theory, to appear, 1998.
D8] A. Méhes, Z. Nagy, and K. Zeger, \Improved upper bounds on trellis complexity of lat-
tices," IEEE Trans. Inform. Theory, to appear, 1998.
D9] A.H. Banihashemi and I.F. Blake, \On the trellis complexity of root lattices and their
duals," IEEE Trans. Inform. Theory, submitted for publication, 1997.

E. Trellis decoding: structures and algorithms


E1] G.D. Forney, Jr., \The Viterbi algorithm," Proceedings IEEE, vol. 61, pp. 268{278, 1973.
E2] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, \Optimal decoding of linear codes for mini-
mizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284{287, 1974.
E3] J.K. Wolf, \Ecient maximum-likelihood decoding of linear block codes using a trellis,"
IEEE Trans. Inform. Theory, vol. 24, pp. 76{80, 1978.
E4] G.D. Forney, Jr., R.G. Gallager, G.R. Lang, F. Longstaff, and S.U. Qureshi, \Efficient mod-
ulation for band-limited channels," IEEE J. Sel. Areas Comm., vol. 2, pp. 632{646, 1984.
E5] A. Vardy and Y. Be'ery, \Bit-level soft decision decoding of Reed-Solomon codes," IEEE
Trans. Commun., vol. 39, pp. 440{445, 1991.
E6] R.C. Davis and H.-A. Loeliger, \A nonalgorithmic maximum-likelihood decoder for trellis
codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1450{1453, 1993.
E7] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \On structural complexity of the L-section
minimal trellis diagrams for binary linear block codes," IEICE Trans. Fundamentals,
vol. E76{A, pp. 1411{1421, 1993.
E8] B. Honary and G. Markarian, \Low complexity trellis decoding of Hamming codes," IEE
Electronics Lett., vol. 29, pp. 1114{1116, 1993.
E9] B. Honary, G. Markarian, M. Darnell, and L. Kaya, \Maximum likelihood decoding of
array codes with trellis structure," IEE Proceedings, vol. I{140, pp. 340{345, 1993.
E10] A. Vardy and Y. Be'ery, \Maximum-likelihood soft decision decoding of BCH codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 546{554, 1994.
E11] V. Sorokine, F.R. Kschischang, and V. Durand, \Trellis-based decoding of binary linear
block codes," in Lecture Notes in Comput. Sci., vol. 793, pp. 270{286, Springer, 1994.
E12] Y. Berger and Y. Be'ery, \Soft trellis-based decoder for linear block codes," IEEE Trans.
Inform. Theory, vol. 40, pp. 764{773, 1994.
E13] V.R. Sidorenko and V.V. Zyablov, \Decoding of convolutional codes using a syndrome
trellis," IEEE Trans. Inform. Theory, vol. 40, pp. 1663{1666, 1994.
E14] T. Kasami, T. Fujiwara, Y. Desaki, and S. Lin, \On branch labels of parallel components
of the L-section minimal trellis diagrams for binary linear block codes," IEICE Trans.
Fundamentals, vol. E77{A, pp. 1058{1068, 1994.


E15] G. Markarian and B. Honary, \Trellis decoding technique for the block RLL/ECC codes,"
IEE Proceedings, vol. I{141, pp. 297{302, 1994.
E16] G. Markarian, B. Honary, and M. Blaum, \Maximum likelihood trellis decoding technique
for balanced codes," IEE Electronics Lett., vol. 31, pp. 447{448, 1995.
E17] B. Honary, G. Markarian, and M. Darnell, \Low complexity trellis decoding of linear block
codes," IEE Proc. Communications, vol. 142, pp. 201{209, 1995.
E18] A. Lafourcade and A. Vardy, \Optimal sectionalization of a trellis," IEEE Trans. Inform.
Theory, vol. 42, pp. 689{703, 1996.
E19] R.J. McEliece, \On the BCJR trellis for linear block codes," IEEE Trans. Inform. Theory,
vol. 42, pp. 1072{1092, 1996.
E20] R.C.-K. Lee and F.R. Kschischang, \Non-minimal trellises for linear block codes," in Lec-
ture Notes in Comput. Sci., vol. 1133, pp. 111{129, Springer-Verlag, 1996.
E21] I.I. Dumer, \Suboptimal decoding of linear codes: partition technique," IEEE Trans. In-
form. Theory, vol. 42, pp. 1971{1986, 1996.
E22] H.T. Moorthy, S. Lin, and G.T. Uehara, \Good trellises for IC implementation of Viterbi
decoders for linear block codes," IEEE Trans. Comm., vol. 45, pp. 52{63, 1997.
E23] T. Fujiwara, H. Yamamoto, T. Kasami, and S. Lin, \A trellis-based maximum likelihood de-
coding algorithm for linear block codes," IEEE Trans. Inform. Theory, to appear, 1998.

F. Related topics: generalized Hamming weights
F1] T. Helleseth, T. Kløve, and J. Mykkelveit, \The weight distribution of irreducible cyclic
codes with block lengths n1((q^l − 1)/N)," Discrete Math., vol. 18, pp. 179{211, 1977.

F2] V.K. Wei, \Generalized Hamming weights for linear codes," IEEE Trans. Inform. Theory,
vol. 37, pp. 1412{1418, 1991.
F3] G.L. Feng, K.K. Tzeng, and V.K. Wei, \On the generalized Hamming weights of several
classes of cyclic codes," IEEE Trans. Inform. Theory, vol. 38, pp. 1125{1130, 1992.
F4] T. Helleseth, T. Kløve, and Ø. Ytrehus, \Generalized Hamming weights of linear codes,"
IEEE Trans. Inform. Theory, vol. 38, pp. 1133{1140, 1992.
F5] T. Kløve, \Minimum support weights of binary codes," IEEE Trans. Inform. Theory,
vol. 39, pp. 648{654, 1993.
F6] V.K. Wei and K. Yang, \On the generalized Hamming weights of product codes," IEEE
Trans. Inform. Theory, vol. 39, pp. 1709{1713, 1993.
F7] J. Simonis, \The effective length of subcodes," Applicable Algebra in Engineering, Com-
munication and Computing, vol. 5, pp. 371{377, 1994.
F8] G. van der Geer and M. van der Vlugt, \On generalized Hamming weights of BCH codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 543{546, 1994.
F9] K. Yang, P.V. Kumar, and H. Stichtenoth, \On the weight hierarchy of geometric Goppa
codes," IEEE Trans. Inform. Theory, vol. 40, pp. 913{920, 1994.
F10] G. Cohen, S. Litsyn, and G. Zémor, \Upper bounds on generalized distances," IEEE
Trans. Inform. Theory, vol. 40, pp. 2090{2092, 1994.
F11] T. Helleseth and P.V. Kumar, \The weight hierarchy of the Kasami codes," Discrete
Math., vol. 145, pp. 133{143, 1995.
F12] K. Yang, T. Helleseth, P.V. Kumar, and A.G. Shanbhag, \On the weight hierarchy of Ker-
dock codes over Z4," IEEE Trans. Inform. Theory, vol. 42, pp. 1587{1593, 1996.

G. Related topics: trellises for group and convolutional codes


G1] J.C. Willems, \System theoretic models for the analysis of physical systems," Ricerche
di Automatica, vol. 10, pp. 71{106, 1979.
G2] J.C. Willems, \Models for dynamics," in Dynamics Reported, Volume 2, U. Kirchgraber
and H.O. Walther, (Editors), pp. 171{269, New York: Wiley, 1989.
G3] G.D. Forney, Jr. and M.D. Trott, \The dynamics of group codes: state spaces, trellises
and canonical encoders," IEEE Trans. Inform. Theory, vol. 39, pp. 1491{1513, 1993.
G4] H.-A. Loeliger, G.D. Forney, Jr., T. Mittelholzer, and M.D. Trott, \Minimality and observ-
ability of group systems," Linear Alg. Appl., vols. 205{206, pp. 937{963, 1994.
G5] H.-A. Loeliger and T. Mittelholzer, \Convolutional codes over groups," IEEE Trans. In-
form. Theory, vol. 42, pp. 1660{1687, 1996.
G6] F.R. Kschischang, \The trellis structure of maximal fixed-cost codes," IEEE Trans. In-
form. Theory, vol. 42, pp. 1828{1838, 1996.
G7] R.J. McEliece and W. Lin, \The trellis complexity of convolutional codes," IEEE Trans.
Inform. Theory, vol. 42, pp. 1855{1864, 1996.
G8] G.D. Forney, Jr. and M.D. Trott, \The dynamics of group codes: dual group codes and
systems," unpublished manuscript, 1998.

H. Related topics: representing a code by a graph


H1] R.G. Gallager, Low Density Parity-Check Codes, Cambridge: M.I.T. Press, 1962.
H2] G. Solomon and H.C.A. van Tilborg, \A connection between block and convolutional codes,"
SIAM J. Appl. Math., vol. 37, pp. 358{369, 1979.
H3] R.M. Tanner, \A recursive approach to low-complexity codes," IEEE Trans. Inform. Theory,
vol. 27, pp. 533{547, 1981.
H4] H.H. Ma and J.K. Wolf, \On tail biting convolutional codes," IEEE Trans. Commun.,
vol. 34, pp. 104{111, 1986.
H5] N. Wiberg, H.-A. Loeliger, and R. Kötter, \Codes and iterative decoding on general graphs,"
Euro. Trans. Telecommun., vol. 6, pp. 513{526, 1995.
H6] N. Wiberg, \Codes and decoding on general graphs," Ph.D. dissertation, University of
Linköping, Sweden, 1996.
H7] G.D. Forney, Jr., \The forward-backward algorithm," in Proc. 34-th Allerton Conference on
Comm., Control, and Computing, Monticello, IL., pp. 432{446, October 1996.
H8] J.F. Cheng and R.J. McEliece, \Near capacity codecs for the Gaussian channel based on
low-density generator matrices," in Proc. 34-th Annual Allerton Conference on Comm.,
Control, and Computing, Monticello, IL., pp. 494{503, October 1996.
H9] M. Sipser and D.A. Spielman, \Expander codes," IEEE Trans. Inform. Theory, vol. 42,
pp. 1710{1722, 1996.
H10] D.A. Spielman, \Linear-time encodable and decodable codes," IEEE Trans. Inform. Theory,
vol. 42, pp. 1723{1731, 1996.
H11] D.J.C. MacKay and R.M. Neal, \Near Shannon limit performance of low-density parity-
check codes," IEE Electronics Lett., vol. 32, pp. 1645{1646, 1996.
H12] B.J. Frey, F.R. Kschischang, H.-A. Loeliger, and N. Wiberg, \Factor graphs and algorithms,"
in Proc. 35-th Allerton Conference on Comm., Control, and Computing, Monticello, IL.,
September 1997.
H13] B.J. Frey, Graphical Models for Machine Learning and Digital Communication, Cambridge,
MA: MIT Press, 1998.
H14] B.J. Frey and F.R. Kschischang, \Iterative decoding of compound codes by probability prop-
agation in graphical models," IEEE J. on Selected Areas in Communications, to appear,
February 1998.
H15] D.J.C. MacKay, \Good error-correcting codes based on very sparse matrices," IEEE Trans.
Inform. Theory, to appear, 1998.
H16] A.R. Calderbank, G.D. Forney, Jr., and A. Vardy, \Minimal tail-biting trellises: the Golay
code and more," IEEE Trans. Inform. Theory, to appear, 1998.
H17] B.J. Frey, F.R. Kschischang, H.-A. Loeliger, and N. Wiberg, \Factor graphs and the sum-
product algorithm," unpublished manuscript, 1998.
H18] R. Kötter and A. Vardy, \Factor graphs: classification, bounds, and constructions," unpub-
lished manuscript, 1998.

I. Miscellaneous
I1] J. Feigenbaum, G.D. Forney Jr., B.H. Marcus, R.J. McEliece, and A. Vardy, Special issue
on \Codes and Complexity," IEEE Trans. Inform. Theory, vol. 42, November 1996.
I2] B. Honary and G. Markarian, Trellis Decoding of Block Codes: A Practical Approach, Bos-
ton: Kluwer Academic, 1997.
I3] S. Lin, T. Kasami, T. Fujiwara, and M. Fossorier, Trellises and Trellis-Based Decoding
Algorithms for Linear Block Codes, Boston: Kluwer Academic, 1998, to appear.

In this `guide to the literature' we have attempted to provide a comprehensive list of
references for subjects A, B, C, and D. On the remaining subjects, the bibliography in this
section is admittedly sketchy. As these subjects are of peripheral relevance to our chapter,
we reference only those papers that are key to the subject and/or are most closely related to
subjects A, B, C, and D. For the most part, we did not reference in this section conference
papers that were later subsumed by more extensive journal publications.

About the cover: The top left figure is part of the semi-infinite trellis for a simple rate-1/2
convolutional code. Trellises of this kind were first studied over 30 years ago. The top right
figure is the minimal trellis for the (8, 4, 4) binary Hamming code, in the componentwise
optimal order of the time axis, sectionalized into four sections of length 2. This simple trellis
thus embodies most of what is known about trellises today, as discussed in Sections 4, 5,
and 6 of this chapter. The bottom figure is a factor graph that represents the (24, 12, 8)
binary Golay code G24. Each of the 24 edges in this factor graph corresponds to a trellis
section; the state complexity of the resulting representation is only s = 3, as compared to
s = 9 in the best possible conventional trellis for G24. Precious little is known about such
representations today, but we will surely know more 30 years from now.

Acknowledgement. My work on trellis structure of codes in [17, 66, 72, 73, 74, 75, 105,
107, 111, 112, 113, 114, 115, 116] would not have been possible without my co-authors:
Yair Be'ery, Robert Calderbank, David Forney, John Lafferty, Alec Lafourcade, Ralf Kötter,
Frank Kschischang, Jakov Snyders, Vahid Tarokh, and Ari Trachtenberg. Collaborating with
each and every one of them was a pleasure. Specifically for their contributions to this chapter,
I would like to thank Petra Schuurman for providing the table included in Section 5.4 and
Bob McEliece for letting me follow his work [82] so closely in Section 3.1. Frank Kschischang
communicated to me many thoughtful and stimulating remarks on this chapter, which are
much appreciated. Yair Be'ery, Sylvia Encheva, David Forney, Aaron Kiely, Shu Lin, and
Vladimir Sidorenko kindly provided comments on the list of references compiled in Section 7.
All the figures in this chapter are due to the artwork of Robert F. MacFarlane. I am grateful
to Vera Pless and Cary Huffman, the Editors of this Handbook, for their encouragement
and their patience. Finally, I am deeply indebted to my best friend Hagit Itzkowitz; without
her invaluable help this chapter would have never been written.

Bibliography
1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Al-
gorithms, Reading, MA: Addison-Wesley, 1974.
2] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, \Optimal decoding of linear codes for
minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284{287, 1974.
3] A.H. Banihashemi and I.F. Blake, \Trellis complexity and minimal trellis diagrams of
lattices," IEEE Trans. Inform. Theory, to appear, 1998.
4] R. Bellman, Dynamic Programming, Princeton, NJ: Princeton University Press, 1957.
5] Y. Berger and Y. Be'ery, \Bounds on the trellis size of linear block codes," IEEE Trans.
Inform. Theory, vol. 39, pp. 203{209, 1993.
6] Y. Berger and Y. Be'ery, \Trellis-oriented decomposition and trellis-complexity of com-
posite length cyclic codes," IEEE Trans. Inform. Theory, vol. 41, pp. 1185{1191, 1995.
7] Y. Berger and Y. Be'ery, \The twisted squaring construction, trellis complexity and
generalized weights of BCH and QR codes," IEEE Trans. Inform. Theory, vol. 42,
pp. 1817{1827, 1996.
8] E.R. Berlekamp, R.J. McEliece, and H.C.A. van Tilborg, \On the inherent intractability
of certain coding problems," IEEE Trans. Inform. Theory, vol. 24, pp. 384{386, 1978.
9] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, New York:
Academic Press, 1982.
10] D.P. Bertsekas and R.G. Gallager, Data Networks, Englewood Cliffs: Prentice-Hall,
2nd Edition, 1991.
11] I.F. Blake and V. Tarokh, \On the trellis complexity of densest lattice packings in IR^n,"
SIAM J. Discrete Math., vol. 9, pp. 597{601, 1996.


12] A.E. Brouwer, \Bounds on the size of linear codes," to appear in the Handbook of
Coding Theory, V.S. Pless, W.C. Huffman, and R.A. Brualdi, (Editors), Amster-
dam: Elsevier, 1998.
13] A.E. Brouwer and T. Verhoeff, \An updated table of minimum-distance bounds for
binary linear codes," IEEE Trans. Inform. Theory, vol. IT-39, pp. 662{677, 1993.
14] R.E. Bryant, \Graph-based algorithms for boolean function manipulations," IEEE
Trans. Computers, vol. 35, pp. 677{691, 1986.
15] R.E. Bryant, \Symbolic Boolean manipulation with ordered binary decision diagrams,"
ACM Computing Surveys, vol. 24, pp. 293{318, 1992.
16] R.E. Bryant, \Binary decision diagrams and beyond: Enabling technologies for formal
verification," in Proc. International Conf. Computer-Aided Design, November 1995.
17] A.R. Calderbank, G.D. Forney, Jr., and A. Vardy, \Minimal tail-biting trellises: the
Golay code and more," IEEE Trans. Inform. Theory, submitted for publication, 1997.
18] G. Cohen, S. Litsyn, and G. Zémor, \Upper bounds on generalized distances," IEEE
Trans. Inform. Theory, vol. 40, pp. 2090{2092, 1994.
19] J.H. Conway and N.J.A. Sloane, Sphere Packings, Lattices and Groups, New York:
Springer-Verlag, 1988.
20] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, Cambridge,
MA: MIT Press and McGraw-Hill, 1990.
21] N. Deo, Graph Theory with Applications to Engineering and Computer Science, Engle-
wood Cliffs: Prentice-Hall, 1974.
22] A. Dholakia, Introduction to Convolutional Codes, Boston: Kluwer Academic, 1994.
23] E.W. Dijkstra, \A note on two problems in connection with graphs," Numerische
Math., vol. 1, pp. 269{271, 1959.
24] S. Dolinar, L. Ekroot, A.B. Kiely, R.J. McEliece, and W. Lin, \The permutation trellis
complexity of linear block codes," in Proc. 32-nd Allerton Conference on Comm.,
Control, and Computing, Monticello, IL., pp. 60{74, September 1994.
25] L.L. Dornhoff and F.E. Hohn, Applied Modern Algebra, New York: Macmillan Publish-
ing Co., 1978.
26] I.I. Dumer, \On complexity of maximum-likelihood decoding of the best concatenated
codes," in Proc. 8-th All-Union Conf. on Coding Theory and Information Theory,
Moscow-Kuibishev, pp. 66-69, 1981, (in Russian).
27] S.B. Encheva, \On the binary linear codes which satisfy the two-way chain condition,"
IEEE Trans. Inform. Theory, vol. 42, pp. 1038{1047, 1996.
28] S.B. Encheva, \On repeated-root cyclic codes and the two-way chain condition," Lec-
ture Notes Comp. Sci., vol. 1255, pp. 78{87, Springer 1997.
29] S.B. Encheva and G.D. Cohen, \Self-orthogonal codes and their coordinate ordering,"
IEICE Trans. Fundamentals, vol. E80{A, pp. 2256{2259, 1997.
30] A. Engelhart, M. Bossert, and J. Maucher, \Heuristic algorithms for ordering a linear
code to reduce the number of nodes of the minimal trellis," unpublished manuscript.
31] R.M. Fano, \A heuristic discussion of probabilistic decoding," IEEE Trans. Inform.
Theory, vol. 9, pp. 64{73, 1963.
32] J. Feigenbaum, G.D. Forney, Jr., B. Marcus, R.J. McEliece, and A. Vardy, Special issue
on \Codes and Complexity," IEEE Trans. Inform. Theory, vol. 42, November 1996.
33] G.D. Forney, Jr., \Final report on a coding system design for advanced solar missions,"
Contract NAS2{3637, NASA Ames Research Center, CA, December 1967.
34] G.D. Forney, Jr., \Convolutional codes I: Algebraic structure," IEEE Trans. Inform.
Theory, vol. 16, pp. 720{738, 1970.
35] G.D. Forney, Jr., \The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268{278, 1973.
36] G.D. Forney, Jr., \Convolutional codes III: Sequential decoding," Inform. Control,
vol. 25, pp. 267{297, 1974.
37] G.D. Forney, Jr., \Coset codes I: Introduction and geometrical classification," IEEE
Trans. Inform. Theory, vol. 34, pp. 1123{1151, 1988.
38] G.D. Forney, Jr., \Coset codes II: Binary lattices and related codes," IEEE Trans.
Inform. Theory, vol. 34, pp. 1152{1187, 1988.
39] G.D. Forney, Jr., \Dimension/length profiles and trellis complexity of linear block
codes," IEEE Trans. Inform. Theory, vol. 40, pp. 1741{1752, 1994.
40] G.D. Forney, Jr., \Density/length profiles and trellis complexity of lattices," IEEE
Trans. Inform. Theory, vol. 40, pp. 1752{1772, 1994.
41] G.D. Forney, Jr., B.H. Marcus, N.T. Sindhushayana, and M.D. Trott, \Multilingual Dic-
tionary," in Dierent Aspects of Coding Theory, A.R. Calderbank (Editor), vol. 50,
pp. 109{138, Proc. Symp. Applied Mathematics, Providence, RI: AMS Press, 1995.
42] G.D. Forney, Jr. and M.D. Trott, \The dynamics of group codes: state spaces, trellises
and canonical encoders," IEEE Trans. Inform. Theory, vol. 39, pp. 1491{1513, 1993.
43] G.D. Forney, Jr. and M.D. Trott, \The dynamics of group codes: dual group codes and
systems," unpublished manuscript, 1998.
44] B.J. Frey, F.R. Kschischang, H.-A. Loeliger, and N. Wiberg, \Factor graphs and the
sum-product algorithm," in preparation, 1997.
45] T. Fujiwara, H. Yamamoto, T. Kasami, and S. Lin, \A trellis-based recursive maximum-
likelihood decoding algorithm for binary linear block codes," IEEE Trans. Inform.
Theory, vol. 44, pp. 714{729, 1998.
46] R.G. Gallager, Information Theory and Reliable Communication, Boston: Wiley, 1968.
47] G.S. Gallo and S. Pallotino, \Shortest path algorithms," Annals Oper. Research, vol. 7,
pp. 3{79, 1988.
48] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory
of NP-Completeness. San Francisco, CA: Freeman, 1979.
49] J.-M. Goethals, \On the Golay perfect binary code," J. Combin. Theory, vol. 11,
pp. 178{186, 1971.
50] V.D. Goppa, \Bounds for codes," Doklady Akademii Nauk, vol. 333, p. 423, 1993.
51] T. Helleseth, T. Kløve, and J. Mykkelveit, \The weight distribution of irreducible cyclic
codes with block lengths n1((q^l − 1)/N)," Discrete Math., vol. 18, pp. 179{211, 1977.
52] T. Helleseth, T. Kløve, and Ø. Ytrehus, \Generalized Hamming weights of linear
codes," IEEE Trans. Inform. Theory, vol. 38, pp. 1133{1140, 1992.
53] T. Helleseth and P.V. Kumar, \The weight hierarchy of the Kasami codes," Discrete
Math., vol. 145, pp. 133{143, 1995.
54] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and
Computation, Reading, MA: Addison-Wesley, 1979.
55] G. Horn and F.R. Kschischang, \On the intractability of permuting a block code to min-
imize trellis complexity," IEEE Trans. Inform. Theory, vol. 42, pp. 2042{2048, 1996.
56] T.W. Hungerford, Algebra, New York: Holt, Rinehart and Winston, 1974.
57] K. Jain, I. Măndoiu, and V.V. Vazirani, \The \art of trellis decoding" is computation-
ally hard – for large fields," IEEE Trans. Inform. Theory, to appear, 1998.
58] J. Justesen, \A class of constructive asymptotically good algebraic codes," IEEE Trans.
Inform. Theory, vol. 18, pp. 652{656, 1972.
59] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \Trellis diagram construction for some
BCH codes," IEEE Int. Symp. Inform. Theory and Appl., Honolulu, Hawaii, 1990.
60] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \On the optimum bit orders with respect
to the state complexity of trellis diagrams for binary linear codes," IEEE Trans. Inform.
Theory, vol. 39, pp. 242{245, 1993.
61] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, \On complexity of trellis structure of
linear block codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1057{1064, 1993.
62] A.B. Kiely, S. Dolinar, R.J. McEliece, L. Ekroot, and W. Lin, \Trellis decoding comple-
xity of linear block codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1687{1697, 1996.
63] T. Kløve, \Support weight distribution of linear codes," Discrete Math., vol. 107,
pp. 311{316, 1992.
64] T. Kløve, \On codes satisfying the double chain condition," Discrete Math., to appear.
65] A.D. Kot and C. Leung, \On the construction and dimensionality of linear block code
trellises," in Proc. IEEE Int. Symp. Inform. Theory, p. 291, San Antonio, TX., 1993.
66] R. Kötter and A. Vardy, \Factor graphs: classification, bounds, and constructions,"
unpublished manuscript, 1998.
67] R. Kötter and A. Vardy, \Theory of tail-biting trellises," unpublished manuscript, 1998.
68] F.R. Kschischang, \The trellis structure of maximal fixed-cost codes," IEEE Trans.
Inform. Theory, vol. 42, pp. 1828{1838, 1996.
69] F.R. Kschischang and G.B. Horn, \A heuristic for ordering a linear block code to mini-
mize trellis state complexity," in Proc. 32-nd Allerton Conference on Comm., Control,
and Computing, Monticello, IL., pp. 75{84, September 1994.
70] F.R. Kschischang and V. Sorokine, \On the trellis structure of block codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 1924{1937, 1995.
71] B.D. Kudryashov and T.G. Zakharova, \Block codes from convolutional codes," Prob-
lemy Peredachi Informatsii, vol. 25, pp. 98{102, 1989, (in Russian).
72] J.D. Lafferty and A. Vardy, \Ordered binary decision diagrams and minimal trellises,"
unpublished manuscript, 1998.
73] A. Lafourcade and A. Vardy, \Asymptotically good codes have infinite trellis complex-
ity," IEEE Trans. Inform. Theory, vol. 41, pp. 555{559, 1995.
74] A. Lafourcade and A. Vardy, \Lower bounds on trellis complexity of block codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 1938{1954, 1995.
75] A. Lafourcade and A. Vardy, \Optimal sectionalization of a trellis," IEEE Trans. In-
form. Theory, vol. 42, pp. 689{703, 1996.
76] S. Lin and D.J. Costello, Jr., Error Control Coding: Fundamentals and Applications,
Englewood Cliffs: Prentice-Hall, 1983.
77] D. Lind and B.H. Marcus, An Introduction to Symbolic Dynamics and Coding, New
York: Cambridge University Press, 1995.
78] H.-A. Loeliger, G.D. Forney, Jr., T. Mittelholzer, and M.D. Trott, \Minimality and ob-
servability of group systems," Linear Alg. Appl., vols. 205{206, pp. 937{963, 1994.
79] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, New
York: North-Holland, 1977.
80] J.L. Massey, \Foundation and methods of channel encoding," Proc. Int. Conf. Infor-
mation Theory and Systems, vol. 65, pp. 148{157, NTG-Fachberichte, Berlin, 1978.
81] R.J. McEliece, Theory of Information and Coding, Reading: Addison-Wesley, 1977.
82] R.J. McEliece, \On the BCJR trellis for linear block codes," IEEE Trans. Inform.
Theory, vol. 42, pp. 1072{1092, 1996.
83] R.J. McEliece, \The algebraic theory of convolutional codes," to appear in the Hand-
book of Coding Theory, V.S. Pless, W.C. Huffman, and R.A. Brualdi, (Editors),
Amsterdam: Elsevier, 1998.
84] R.J. McEliece, E.R. Rodemich, H.C. Rumsey, and L.R. Welch, \New upper bounds on
the rate of a code via the Delsarte-MacWilliams inequalities," IEEE Trans. Inform.
Theory, vol. 23, pp. 157{166, 1977.
85] H.T. Moorthy, S. Lin, and G.T. Uehara, \Good trellises for IC implementation of
Viterbi decoders for linear block codes," IEEE Trans. Comm., vol. 45, pp. 52{63, 1997.
86] R. Morelos-Zaragoza, T. Fujiwara, T. Kasami, and S. Lin, \Constructions of generalized
concatenated codes and their trellis-based decoding complexity," preprint, 1998.
87] D.J. Muder, \Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34,
pp. 1049{1053, 1988.
88] J. Orlin, \Contentment in graph theory: covering graphs with cliques," unpublished
manuscript, 1976.
89] J.P. Odenwalder and A.J. Viterbi, \Overview of existing and projected uses of coding in
military satellite communications," NTC Conference Records, Los Angeles, CA., 1977.
90] L.H. Ozarow and A.D. Wyner, \Wire-tap channel II," Bell Labs Tech. J., vol. 63,
pp. 2135{2157, 1984.
91] Ph. Piret, Convolutional Codes: An Algebraic Approach, Cambridge: MIT Press, 1988.
92] I. Reuven and Y. Be'ery, \Entropy/length profiles, bounds on the minimal covering
of bipartite graphs, and trellis complexity of nonlinear codes," IEEE Trans. Inform.
Theory, vol. 44, pp. 580{598, 1998.
93] B. Reznick, P. Tiwari, and D.B. West, \Decomposition of product graphs into complete
bipartite subgraphs," Discrete Math., vol. 57, pp. 179{183, 1985.
94] C. Roos, \On the structure of convolutional and cyclic convolutional codes," IEEE
Trans. Inform. Theory, vol. 25, pp. 676{683, 1979.
95] E.J. Rossin, N.T. Sindhushayana, and C.D. Heegard, \Trellis group codes for the Gaus-
sian channel," IEEE Trans. Inform. Theory, vol. 41, pp. 1217{1245, 1995.
96] P. Schuurman, \A table of state complexity bounds for binary linear codes," IEEE
Trans. Inform. Theory, vol. 42, pp. 2034{2042, 1996.
97] V.R. Sidorenko, \The Euler characteristic of the minimal code trellis is maximum,"
Problemy Peredachi Informatsii, vol. 33, pp. 87-93, 1997, (in Russian).
98] V.R. Sidorenko, I. Martin, and B. Honary \On separability of nonlinear block codes,"
IEEE Trans. Inform. Theory, to appear, 1998.
99] J. Simonis, \The eective length of subcodes," Applicable Algebra in Engineering,
Communication and Computing, vol. 5, pp. 371{377, 1994.
100] G. Solomon and H.C.A. van Tilborg, \A connection between block and convolutional
codes," SIAM J. Appl. Math., vol. 37, pp. 358{369, 1979.
101] V. Sorokine, F.R. Kschischang, and V. Durand, \Trellis-based decoding of binary linear
block codes," in Lecture Notes in Comput. Sci., vol. 793, pp. 270{286, Springer 1994.
102] R.M. Tanner, \A recursive approach to low-complexity codes," IEEE Trans. Inform.
Theory, vol. 27, pp. 533{547, 1981.
103] V. Tarokh and I.F. Blake, \Trellis complexity versus the coding gain of lattices I," IEEE
Trans. Inform. Theory, vol. 42, pp. 1796{1807, 1996.
104] V. Tarokh and I.F. Blake, \Trellis complexity versus the coding gain of lattices II,"
IEEE Trans. Inform. Theory, vol. 42, pp. 1808{1816, 1996.
105] V. Tarokh and A. Vardy, \Upper bounds on trellis complexity of lattices," IEEE Trans.
Inform. Theory, vol. 43, pp. 1294-1300, 1997.
106] V. Tarokh, A. Vardy, and K. Zeger, \Sequential decoding of lattice codes," unpublished
manuscript, 1997.
107] A. Trachtenberg and A. Vardy, \Lexicographic codes: constructions, bounds, and trellis
complexity," in Proc. 31-st Annual Conf. on Inform. Sciences and Systems, Princeton,
NJ., pp. 521{526, March 1997.
108] G. Ungerboeck, \Channel coding with multilevel/phase signals," IEEE Trans. Inform.
Theory, vol. 28, pp. 55{67, 1982.
109] A. Vardy, \Dynamical structure of block codes," IEEE Inform. Theory Workshop on
Coding, System Theory, and Symbolic Dynamics, Mansfield, MA, October 1993.
110] A. Vardy, \The Nordstrom-Robinson code: representation over GF(4) and efficient
decoding," IEEE Trans. Inform. Theory, vol. 40, pp. 1686{1693, 1994.
111] A. Vardy, \Algorithmic complexity in coding theory and the minimum distance prob-
lem," in Proc. ACM Symp. Theory of Computing, pp. 92{109, El Paso, TX., 1997.
112] A. Vardy, \The intractability of computing the minimum distance of a code," IEEE
Trans. Inform. Theory, vol. 43, pp. 1757{1766, 1997.
113] A. Vardy and Y. Be'ery, \On the problem of finding zero-concurring codewords," IEEE
Trans. Inform. Theory, vol. 37, pp. 180{187, 1991.
114] A. Vardy and Y. Be'ery, \Maximum-likelihood soft decision decoding of BCH codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 546{554, 1994.
115] A. Vardy and F.R. Kschischang, \Proof of a conjecture of McEliece regarding the expan-
sion index of the minimal trellis," IEEE Trans. Inform. Theory, vol. 42, pp. 2027{2033, 1996.
116] A. Vardy, J. Snyders, and Y. Be'ery, \Bounds on the dimension of codes and subcodes
with prescribed contraction index," Linear Algebra Appl., vol. 142, pp. 237{261, 1990.
117] V.V. Vazirani, H. Saran, and B. Sundar Rajan, \An efficient algorithm for constructing
minimal trellises for codes over finite abelian groups," IEEE Trans. Inform. Theory,
vol. 42, pp. 1839{1854, 1996.
118] A.J. Viterbi, \Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm," IEEE Trans. Inform. Theory, vol. 13, pp. 260{269, 1967.
119] Y.-Y. Wang and C.-C. Lu, \The trellis complexity of equivalent binary (17, 9) quadratic
residue code is five," Proc. IEEE Int. Symp. Inform. Theory, San Antonio, TX., 1993.

120] Y.-Y. Wang and C.-C. Lu, \Theory and algorithms for optimal equivalent codes with
absolute trellis size," unpublished manuscript, 1996.
121] V.K. Wei, \Generalized Hamming weights for linear codes," IEEE Trans. Inform. The-
ory, vol. 37, pp. 1412{1418, 1991.
122] V.K. Wei and K. Yang, \On the generalized Hamming weights of product codes," IEEE
Trans. Inform. Theory, vol. 39, pp. 1709{1713, 1993.
123] N. Wiberg, H.-A. Loeliger, and R. Kötter, \Codes and iterative decoding on general
graphs," Euro. Trans. Telecommun., vol. 6, pp. 513{526, 1995.
124] J.C. Willems, \System theoretic models for the analysis of physical systems," Ricerche
di Automatica, vol. 10, pp. 71{106, 1979.
125] J.C. Willems, \Models for dynamics," in Dynamics Reported, vol. 2, U. Kirchgraber
and H.O. Walther (Editors), pp. 171{269, New York: Wiley, 1989.
126] J.K. Wolf, \Efficient maximum-likelihood decoding of linear block codes using a trellis,"
IEEE Trans. Inform. Theory, vol. 24, pp. 76{80, 1978.
127] Ø. Ytrehus, \On the trellis complexity of certain binary linear block codes," IEEE
Trans. Inform. Theory, vol. 40, pp. 559{560, 1995.
128] V.V. Zyablov and V.R. Sidorenko, \Bounds on complexity of trellis decoding of linear
block codes," Problemy Peredachi Informatsii, vol. 29, pp. 3{9, 1993, (in Russian).

Index
T-adjacency, 32, 33 Boolean semiring, 14, 16
T-adjacency graph bounds on trellis complexity
cliques in, 32 asymptotic form of, 93{98
connected components of, 32 CLP bound, 85{88, 95
definition of, 32 DLP bound, 72{76, 78, 80{85, 94{96
T-equivalence, 23, 32, 33, 36 ELP bound, 87, 88, 98
classes of, 23, 32 for BCH codes, 69, 77
definition of, 23, 33 for nonlinear codes, 84{88
Δf_i incremental future profile, 50, 51, 108 integer programming bound, 82, 83
for the dual code, 52 Kudryashov-Zakharova bound, 97, 98
Δp_i incremental past profile, 50, 51, 108 Lafourcade-Vardy bound, 79{82, 85
for the dual code, 52 Muder bound, 71, 73, 74, 80, 94
∇f_i incremental future profile, 108, 109 span bound, 77, 78, 80, 85, 93, 94, 96
a+x|b+x|a+b+x construction, 66 tables for short codes, 89{92
u|u+v construction, 65 Vardy-Be'ery bound, 76, 95
Wolf bound, 4, 62{64, 95, 97
Ytrehus bound, 77
branch-space of a trellis, 35
amalgamation of trellises, 100{102 branches of a trellis, 8
antichain, 46
art of trellis decoding, 4, 56, 57
asymptotic trellis complexity, 93{98 canonical realization
atomic spans, 46, 51 future-induced, 40
past-induced, 40
cardinality-length profile (CLP), 85{88, 95
BCJR algorithm, 1 Cartesian array
BCJR mapping as a bipartite graph, 27
definition of, 34 as improper trellis, 28
kernel of, 39, 48 as minimal proper trellis, 28
BCJR matrix, 34, 37, 51 covering by rectangles, 27
BCJR trellis construction, 3, 34, 35, 48, 51 definition of, 27
minimality of, 36 for a rectangular code, 30
Bellman-Ford algorithm, 105 Cartesian product of trellises, 41, 49, 50
biclique, 27, 51 chain condition, 68
bifurcations in a trellis, 12 channel
binary decision diagram, 46 input alphabet, 18
bipartite graph memoryless and discrete, 18, 19
covering by bicliques, 27 output alphabet, 18
matching, 51 CLP bound, 85{88, 95
biproper trellis, 9, 52{55 co-proper trellis, 9, 53, 55

code constructions of a trellis
as a probability space, 85{88 BCJR construction, 3, 34, 35, 48, 51
as past/future relation, 27 minimality of, 36
asymptotically good, 93{98 Forney construction, 4, 38, 39
BCH, 62, 63, 68{70, 77, 80, 83 minimality of, 39, 40
constant-weight, 31 Kschischang-Sorokine construction, 5,
convolutional, 1, 3, 98 40, 42, 43, 45, 49, 50
cyclic, 63, 68, 70 minimality of, 43
distance set of, 76, 77 Massey construction, 4, 36, 37
dual, 51, 52, 74 minimality of, 37
Euclidean geometry, 70 Muder construction, 24
lexicographic, 62 contraction index, 63, 68
linear, 31, 51 convolutional codes, 98
MDS, 62{66 encoder for, 1, 3
Nordstrom-Robinson N16, 31, 41, 88 state-transition diagram for, 1, 2
quadratic-residue, 70 trellis for, 2, 3
rectangular, 30{33, 52{56, 88 covering graphs by bicliques, 27
Reed-Muller, 41, 61, 65, 66, 73
represented by a trellis, 3, 9
satisfying double-chain condition, 68 decomposition-associative function, 105
self-complementary, 77 decomposition-linear function, 102
self-dual, 52, 68, 75, 109 depth of a trellis, 7, 100
the (16, 7, 6) lexicode L16, 67, 73, 89 depth of a vertex in a trellis, 1
the (24, 12, 8) binary Golay G24, 66, 73, Dijkstra algorithm, 21, 105
74, 89, 109, 119 dimension-length profile (DLP), 71{79, 82,
the (48, 24, 12) quadratic-residue Q48, 85, 86, 89, 98
67, 68, 74, 75 as a special case of ELP, 86
the (8, 4, 4) binary Hamming, 3, 12, 47, distance bound upon, 73{75
65, 99, 119 duality theorem, 74
uniformly concise, 98 equivalent to the GHW hierarchy, 72
weight-enumerator of, 14, 18 Griesmer bound upon, 76
codeword future, 27, 53 invariant under permutations, 74, 77
codeword past, 27, 53 inverse, 75, 86
commutative monoid, 13 direct-sum structure, 70
componentwise optimal permutation, 64, direct-sum subcode P_i ⊕ F_i, 38, 39, 48
65, 67, 68, 73, 74, 76 directed path, 7
composition of trellises, 100{102 directed walk, 7
concurring-sum structure, 70 distance set of a code, 76, 77
conditional entropy, 86 DLP bound, 72{76, 78, 80{83, 85, 94{96
conditional entropy-length profile, 86{88 as a special case of LV bound, 80, 96
constant-weight code, 31 asymptotic form, 94, 95
constructions of a code on the total number of edges, 82, 83
u|u+v, 65 on the total number of vertices, 82, 83
Turyn a+x|b+x|a+b+x, 66 double-chain condition, 68
twisted squaring, 65, 67, 70 dynamic programming, 13, 14
edge complexity of a trellis, 11, 82 tail-biting trellis, 6, 16, 78
edge set of a trellis, 7 Tanner graph, 6
as a linear space, 35 generalized Hamming weight hierarchy, 52,
in the BCJR trellis, 34, 48 71{73, 75, 76
in the Forney trellis, 39 duality theorem, 75
in the Massey trellis, 37 equivalent to the DLP, 72
number of edges in, 48, 82, 84 Griesmer bound upon, 76
edge-cardinality profile, 11, 48 inverse, 75
edge-complexity profile, 11, 48, 50, 51, 71, generator for a linear code
72, 82, 84 activity interval of, 42, 48
edge-space of a trellis, 35 ends at, 42
elementary trellis, 42, 43, 49 starts at, 42
ELP bound, 87, 88, 98 trellis for, 43
entropy, 84, 86 generator matrix, 52, 59, 60, 63
entropy-length profile (ELP), 86{88 direct-sum structure in, 70
equivalence of trellises, 100, 102, 103 in BCJR construction, 34
expansion index of a trellis, 11, 54, 55 in Kschischang-Sorokine construction,
40, 42, 43
in Massey construction, 36
factor graph, 6, 119 in minimal span form, 43{47, 51, 67
flows on a trellis, 13{18 in row-reduced echelon form, 36, 37
Floyd-Warshall algorithm, 21 Gilbert-Varshamov bound, 94{98
Forney trellis construction, 4, 38, 39 Griesmer bound, 76
minimality of, 39, 40 group codes, 46
function of a trellis
decomposition-associative, 105
decomposition-linear, 102, 104 heuristics for the permutation problem, 62
examples of, 102 history of trellises, 1{5
minimized by sectionalization, 103
future of a vertex, 24, 27, 49, 53, 54
future profile, 47, 48, 106, 108, 109 improper trellis, 26
future subcode, 38{40, 47 information symbols, 36
dimension of, 44, 47 integer programming bound
in the dual code, 52, 74 on the number of edges, 83
minimum distance of, 71 on the number of vertices, 82
nested property of, 48 isomorphic trellises, 9, 54, 55
support size of, 71
future-equivalence, 24, 28, 33
classes of, 24, 40 JPL bound, 94
definition of, 24
Kschischang-Sorokine trellis construction,
general affine group GA(m), 70 40, 42, 43, 45, 49, 50
generalizations of a trellis minimality of, 43
factor graph, 6, 119 Kudryashov-Zakharova bound, 97, 98

Lafourcade-Vardy bound, 79{82, 85, 95, 97 for the dual code, 51, 52
asymptotic form, 95, 97 for the lexicode L16, 67, 73
for nonlinear codes, 85 for the quadratic-residue code Q48, 67
on edge complexity, 82 minimizes trellis complexity, 53{56
tables for speci c codes, 80, 82 Muder construction of, 24
language of a trellis, 14, 17 nonexistence of, 29
left index L(·), 36, 42, 51 not one-to-one, 28
partition of the time axis, 52 not proper, 26, 28
length of a section in a trellis, 100 not unique, 29
length of a trellis, 100 number of edges in, 48
linear constraints number of vertices in, 48
on edge-complexity profile, 83 uniqueness of, 22, 25, 26, 29, 32, 33, 54
on state-complexity profile, 82 minimal-span codes, 68
lowest-cost path in a trellis, 18{20 minimal-span generator matrix, 44
definition of, 43
determines past and future profiles, 47
Massey trellis construction, 4, 36, 37 for the binary Golay code G24, 66
minimality of, 37 for the lexicode L16, 67
maximum-likelihood decoder, 19 for the quadratic-residue code Q48, 67
maximum-likelihood decoding on a trellis, greedy conversion algorithm for, 44, 45
13, 14, 18{20 non-uniqueness of, 46
MDS codes, 62{66 rows end at different times, 44, 51
merge index of a trellis, 11 rows start at different times, 44, 51
mergeable trellis, 53{55 minimum distance of a code, 60
mergeable vertices, 53{55 Miracle Octad Generator, 66, 74
min-sum semiring, 14, 18, 19 monoid, 13
minimal proper trellis, 23{26, 28, 32, 95 most likely codeword, 19
as covering of a Cartesian array, 28 Muder bound, 71, 73, 74, 80, 94
existence of, 23 Muder trellis, 24, 25
uniqueness of, 23, 25 mutual information, 84, 87
minimal trellis, 4, 12, 26, 48, 49 between past and future, 84
as a binary decision diagram, 46 Myhill-Nerode theorem, 26
as BCJR trellis, 34{36, 48, 51 MDS Code problem, 61, 62
as biproper trellis, 53{55 Minimum Distance problem, 60, 61
as Forney trellis, 38{40
as Kschischang-Sorokine trellis, 40{45
as Massey trellis, 36, 37 non-mergeable trellis, 53{55
as non-mergeable trellis, 53{55
componentwise optimal, 64
definition of, 22 objective function
enumeration of, 56 \-convex under amalgamation, 106
existence of, 32 decomposition-associative, 105
for Reed-Muller codes, 65 decomposition-linear, 102
for self-dual code, 52 for Viterbi decoding, 106
for the binary Golay code G24, 66, 74 minimized by sectionalization, 103
observable trellis, 8 on the future, 23, 27, 40, 47, 52, 74
one-to-one trellis, 8, 28 on the past, 23, 27, 40, 47, 49, 52, 74
optimal sectionalization, 6, 99{109 projective special linear group PSL2(p), 70
determined by vertex structure, 108 proper trellis, 8, 9, 23{26, 28, 32, 33, 95
dynamical structure of, 106{109
for Viterbi decoding, 106{109
sections of, 107 rectangular code, 30{33, 52{56, 88
rectangular relation, 30
reduced trellis, 7
parity symbols, 36 relative trellis complexity, 93{98
parity-check matrix, 52, 59, 60 trade-off with rate and distance, 98
for a BCH code, 68 representation of a code by a trellis, 3, 9
in BCJR construction, 34 right index R(·), 42, 51
partial order of permutations, 64 partition of the time axis, 52
partial order of trellises, 54 rook polynomials, 56
partial syndrome, 34 root vertex in a trellis, 1, 7, 15
partial trellis, 49 row-reduced echelon form, 36, 37, 45, 46
partition rank of a matrix, 59{63
past of a vertex, 24, 27, 49, 53
past profile, 47, 48, 106, 108, 109 section boundaries in a trellis, 99, 104, 108
past subcode, 38{40, 47 section of a trellis, 101
dimension of, 44, 47 in the optimal sectionalization, 106, 107
in the dual code, 52, 74 sectionalization algorithm, 100, 104
minimum distance of, 71 compared to Dijkstra algorithm, 105
nested property of, 48 complexity of, 103, 105
support size of, 71 decomposition-associative variation, 105
past/future relation, 27 pseudo-code description of, 103
for rectangular code, 31 sectionalization digraph, 100, 104, 105
path flows on a trellis, 13{18 sectionalization of a trellis
path in a trellis, 1, 7 as amalgamation sequence, 102
permutation dynamical structure of, 108
componentwise optimal, 64{68, 73, 74 for Viterbi decoding, 106{109
uniformly dominating, 64 increases symbol alphabet, 99
uniformly efficient, 64{68, 73, 74, 76 into unit sections, 103
uniformly inefficient, 65 optimal, 6, 99{109
permutation codes, 31 optimality criteria for, 99
permutation problem, 4, 22, 57 reduces the state complexity, 11
heuristics for, 62, 67 shrinks the time axis, 99
NP-hardness of, 58, 59, 61 sectionalization problem, 6, 22, 99{109
Plotkin bound, 94, 96 self-dual codes, 52, 68, 75, 109
primary alphabet of a trellis, 100 semiring
product of trellises, 40, 41, 49, 50 addition operation, 13
associativity of, 42 Boolean, 14, 16
projection of a code definition of, 13
on a subset of the time axis, 78, 85 distributive law, 13
    min-sum, 14, 18, 19
    product operation, 13
span, 42, 43, 46, 51, 63, 77
    atomic, 46, 51
    length of, 42
    of a codeword, 42
    of a matrix, 43
span bound on trellis complexity, 77, 78, 80, 85, 93, 94, 96
    as a special case of LV bound, 80, 96
    asymptotic form, 94
    for a tail-biting trellis, 78
standard binary order
    for Reed-Muller codes, 65
state complexity of a trellis, 10, 58, 59, 71, 72, 76, 77, 79, 87–92
    for BCH codes, 69
    for Reed-Muller codes, 65, 66
    reduced by sectionalization, 11
state-cardinality profile, 10, 48
state-complexity profile, 10, 48, 50, 60, 63, 64, 71, 72, 75
    expressed as width, 59
    for the binary Golay code G24, 66, 74
    for the lexicode L16, 67, 73
    for the quadratic-residue code Q48, 68
    of the dual code, 51
state-space of a trellis, 35
support, 85
    definition of, 71
support weights, 72, 75
survivor edge, 20
survivor path in a trellis, 18–20
State-Complexity Profile problem, 60
State-Complexity over Large Fields problem, 61, 62

tail-biting trellis, 6, 16, 78
Tanner graph, 6
time axis
    for a code, 22
    for a trellis, 1
    lexicographic order of, 65, 70
    partition of, 52, 75
    permutations of, 57, 64, 74
    sectionalization of, 80, 99–109
    standard binary order of, 65, 70
toor vertex in a trellis, 7
total span of a trellis, 10, 43, 77
trellis
    alphabet, 7
    applications in practice, 13
    as a binary decision diagram, 46
    as a covering by bicliques, 27
    as a graph-theoretic object, 1, 7
    biproper, 9, 52–55
    branches of, 8
    co-proper, 9, 53, 55
    componentwise optimal, 64
    construction of
        BCJR, 3, 34–36, 48, 51
        Forney, 4, 38–40
        Kschischang-Sorokine, 40, 42, 43, 45
        Massey, 4, 36, 37
    decomposition of the alphabet, 7
    definition of, 1, 7
    depth of, 7, 100
    edge complexity of, 11, 82
    edge set of, 7
    edge structure of, 50–52
    edge-cardinality profile of, 11, 48
    edge-complexity profile of, 11, 48, 50, 51, 71, 72, 82
    elementary, 42, 43, 49
    equivalent, 100, 102
    expansion index of, 11, 54, 55
    for a block code, 3, 9
    for a convolutional code, 2, 3
    for a group code, 46
    for BCH codes, 69
    for nonlinear codes, 26, 86–88
    for Reed-Muller codes, 65
    for self-dual code, 52
    for the binary Hamming code, 3, 12, 99
    for the dual code, 51, 52
    for the Golay code G24, 66, 74, 109
    for the lexicode L16, 67, 73
    for the Nordstrom-Robinson code, 88
    for the quadratic-residue code Q48, 67
    hardware implementation of, 10
    history of, 1–5
    improper, 26
    invention of, 1, 13
    isomorphic, 9
    language of, 14, 17
    length of, 100
    merge index of, 11
    minimal, 22, 25, 26, 29, 32, 33, 54
    minimal proper, 23–26, 28, 32, 95
    non-mergeable, 53–55
    observable, 8
    one-to-one, 8, 28
    partial, 49
    partition of the edge set, 7
    partition of the vertex set, 7, 14
    product operation, 40–42, 49, 50
    proper, 8, 9, 26, 33, 78
    reduced, 7
    representing a code, 3, 9
    sectionalization of, 99–109
    state complexity of, 10, 58, 59, 65, 71, 72, 76, 77, 79, 87–92
    state-cardinality profile of, 10, 48
    state-complexity profile of, 10, 48, 50, 51, 59, 60, 63, 64, 71, 72
    states of, 8
    tail-biting, 6, 16, 78
    temporal notation for, 1, 8
    time axis for, 1, 64
    time invariant, 3
    total span of, 10, 43
    unsectionalized, 103
    vertex set of, 7
    vertex structure of, 108
    Viterbi decoding complexity of, 11, 20, 54–56
trellis codes, 3
trellis complexity measures
    coincide asymptotically, 12, 93
    computation of, 48
    edge complexity, 11
    edge-cardinality profile, 11
    edge-complexity profile, 11
    expansion index, 11
    maximum number of edges, 11
    maximum number of states, 10
    minimized by minimal trellis, 53–56
    state complexity, 10
    state-cardinality profile, 10
    state-complexity profile, 10
    total edge span, 11
    total number of edges, 11
    total number of states, 10
    total span, 10
    Viterbi decoding complexity, 11, 106
trellis poset, 54
trellis representation of a code, 3, 9
trellis structure
    =, >, <, ⋈ types of, 50–52
    butterfly ⋈, 51, 52, 68
    degenerate butterfly, 51
    of self-dual codes, 52
    of the dual code, 52
    of vertices, 108
trellis structure of lattices, 6
trellis-adjacency, 32, 33
trellis-equivalence, 23, 32, 33, 36
trellis-oriented generator matrix, 46, 47, 51, 66, 67, 77
Turyn construction, 66
twisted squaring construction, 65, 67, 70
Trellis State-Complexity problem, 59

uniformly concise codes, 98
uniformly efficient permutation, 64–68, 73, 74, 76
uniformly inefficient permutation, 65
unit section of a trellis, 101–103
unsectionalized trellis, 103

vertex in a trellis
    degree of, 20, 49, 50
    depth of, 1
    future of, 24, 27, 49, 53, 54
    lying on a path, 7
    past of, 24, 27, 49, 53
vertex merging in a trellis, 55
vertex set of a trellis, 7
    as a linear space, 35
    in the BCJR trellis, 34, 48
    in the Forney trellis, 38
    in the Massey trellis, 37
    number of vertices in, 48, 82
vertex structure of a trellis, 108
vertex-space of a trellis, 35
Viterbi algorithm, 1, 13–21
    compared to other algorithms, 21
    complexity of, 20, 21
    correctly computes flows, 15
    maximum-likelihood decoding with, 18
    objective of, 14
    pseudo-code description of, 15, 20
    trace-back stage of, 20
Viterbi decoding complexity of a trellis, 11, 20, 54–56

weight of a path, 104, 105
weight-enumerator polynomial, 14, 18
width of a matrix, 59–63
wire-tap channel, 72
Wolf bound, 4, 62–64, 95, 97

Ytrehus bound, 77