Abstract—This tutorial paper begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds with the development of the maximum likelihood decoder. The powerful tool of generating function analysis is demonstrated to yield for arbitrary codes both the distance properties and upper bounds on the bit error probability for communication over any memoryless channel. Previous results on code ensemble average error probabilities are also derived and extended by these techniques. Finally, practical considerations concerning finite decoding memory, metric representation, and synchronization are discussed.

I. INTRODUCTION

…form block codes of the same order of complexity, there remains to date a lack of acceptance of convolutional coding and decoding techniques on the part of many communication technologists. In most cases, this is due to an incomplete understanding of convolutional codes, whose cause can be traced primarily to the sizable literature in this field, composed largely of papers which emphasize details of the decoding algorithms rather than the more fundamental unifying concepts, and which, until recently, have been divided into two nearly disjoint subsets. This malady is shared by the block-coding literature.
…greatly outperform earlier versions of sequential decoders both in theory and practice. Meanwhile the feedback decoding advocates were encouraged by the burst-error correcting capabilities of the codes which render them quite useful for channels with memory.

To add to the confusion, yet a third decoding technique emerged with the Viterbi decoding algorithm [9], which was soon thereafter shown to yield maximum likelihood decisions (Forney [12], Omura [17]). Although this approach is probabilistic and emerged primarily from the sequential-decoding oriented discipline, it leads naturally to a more fundamental approach to convolutional code representation and performance analysis. Furthermore, by emphasizing the decoding-invariant properties of convolutional codes, one arrives directly at the maximum likelihood decoding algorithm and from it at the alternate approaches which lead to sequential decoding on the one hand and feedback decoding on the other. This decoding algorithm has recently found numerous applications in communication systems, two of which are covered in this issue (Heller and Jacobs [24], Cohen et al. [25]). It is particularly desirable for efficient communication at very high data rates, where very low error rates are not required, or where large decoding delays are intolerable.

Foremost among the recent works which seek to unify these various branches of convolutional coding theory is that of Forney [12], [21], [22], et seq., which includes a three-part contribution devoted, respectively, to algebraic structure, maximum likelihood decoding, and sequential decoding. This paper, which began as an attempt to present the author's original paper [9] to a broader audience, is another such effort at consolidating this discipline.¹

It begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds to a natural development of the maximum likelihood decoder. The relative distances among codewords are then determined by means of the generating function (or transfer function) of the code state diagram. This in turn leads to the evaluation of coded communication system performance on any memoryless channel. Performance is first evaluated for the specific cases of the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel with biphase (or quadriphase) modulation, and finally generalized to other memoryless channels. New results are obtained for the evaluation of specific codes (by the generating function technique), rather than the ensemble average of a class of codes, as had been done previously, and for bit error probability, as distinguished from event error probability.

The previous ensemble average results are then extended to bit error probability bounds for the class of time-varying convolutional codes by means of a generalized generating function approach; explicit results are obtained for the limiting case of a very noisy channel and compared with the corresponding results for block codes. Finally, practical considerations concerning finite memory, metric representation, and synchronization are discussed. Further and more explicit details on these problems and detailed results of performance analysis and simulation are given in the paper by Heller and Jacobs [24].

While sequential decoding is not treated explicitly in this paper, the fundamentals and techniques presented here lead naturally to an elegant tutorial presentation of this subject, particularly if, following Jelinek [18], one begins with the recently proposed stack sequential decoding algorithm, proposed independently by Jelinek and Zigangirov [7], which is far simpler to describe and understand than the original sequential algorithms. Such a development, which proceeds from maximum likelihood decoding to sequential decoding, exploiting the similarities in performance and analysis, has been undertaken by Forney [22]. Similarly, the potentials and limitations of feedback decoders can be better understood with the background of the fundamental decoding-invariant convolutional code properties previously mentioned, as demonstrated, for example, by the recent work of Morrissey [15].

¹ This material first appeared in unpublished form as the notes for the Linkabit Corp. "Seminar on convolutional codes," Jan. 1970.

II. CODE REPRESENTATION

A convolutional encoder is a linear finite-state machine consisting of a K-stage shift register and n linear algebraic function generators. The input data, which is usually, though not necessarily, binary, is shifted along the register b bits at a time. An example with K = 3, n = 2, b = 1 is shown in Fig. 1.

The binary input data and output code sequences are indicated on Fig. 1. The first three input bits, 0, 1, and 1, generate the code outputs 00, 11, and 01, respectively. We shall pursue this example to develop various representations of convolutional codes and their properties. The techniques thus developed will then be shown to generalize directly to any convolutional code.

It is traditional and instructive to exhibit a convolutional code by means of a tree diagram as shown in Fig. 2.

If the first input bit is a zero, the code symbols are those shown on the first upper branch, while if it is a one, the output code symbols are those shown on the first lower branch. Similarly, if the second input bit is a zero, we trace the tree diagram to the next upper branch, while if it is a one, we trace the diagram downward. In this manner all 32 possible outputs for the first five inputs may be traced.

From the diagram it also becomes clear that after the first three branches the structure becomes repetitive. In fact, we readily recognize that beyond the third branch the code symbols on branches emanating from the two nodes labeled a are identical, and similarly for all the
VITERBI : CONVOLUTIONAL CODES 753
Fig. 1. Convolutional coder for K = 3, n = 2, b = 1.

Fig. 2. Tree-code representation for coder of Fig. 1.

Fig. 3. Trellis-code representation for coder of Fig. 1.

identically labeled pairs of nodes. The reason for this is obvious from examination of the encoder. As the fourth input bit enters the coder at the right, the first data bit falls off on the left end and no longer influences the output code symbols. Consequently, the data sequences 100xy··· and 000xy··· generate the same code symbols after the third branch and, as is shown in the tree…

…pond merely to the last two input bits to the coder, we may use these bits to denote the nodes or states of this diagram.

We observe finally that the state diagram can be drawn directly by observing the finite-state machine properties of the encoder and particularly the fact that a four-state directed graph can be used to represent uniquely the input-output relation of the eight-state machine. For the nodes represent the previous two bits while the present bit is indicated by the transition branch; for example, if the encoder (machine) contains 011, this is represented in the diagram by the transition from state b = 01 to state d = 11 and the corresponding branch indicates the code symbol outputs 01.

III. MINIMUM DISTANCE DECODER FOR BINARY SYMMETRIC CHANNEL
Referring first to the tree diagram, this implies that we should choose that path in the tree whose code sequence differs in the minimum number of symbols from the received sequence. However, recognizing that the transmitted code branches remerge continually, we may equally limit our choice to the possible paths in the trellis diagram of Fig. 3. Examination of this diagram indicates that it is unnecessary to consider the entire received sequence (which conceivably could be thousands or millions of symbols in length) at one time in deciding upon the most likely (minimum distance) transmitted sequence. In particular, immediately after the third branch we may determine which of the two paths leading to node or state a is more likely to have been sent. For example, if 010001 is received, it is clear that this is at distance 2 from 000000 while it is at distance 3 from 111011 and consequently we may exclude the lower path into node a. For, no matter what the subsequent received symbols will be, they will affect the distances only over subsequent branches after these two paths have remerged and consequently in exactly the same way. The same can be said for pairs of paths merging at the other three nodes after the third branch. We shall refer to the minimum distance path of the two paths merging at a given node as the "survivor." Thus it is necessary only to remember which was the minimum distance path from the received sequence (or survivor) at each node, as well as the value of that minimum distance. This is necessary because at the next node level we must compare the two branches merging at each node level, which were survivors at the previous level for different nodes; e.g., the comparison at node a after the fourth branch is among the survivors of comparisons at nodes a and c after the third branch. For example, if the received sequence over the first four branches is 01000111, the survivor at the third node level for node a is 000000 with distance 2 and at node c it is 110101, also with distance 2. In going from the third node level to the fourth, the received sequence agrees precisely with the survivor from c but has distance 2 from the survivor from a. Hence the survivor at node a of the fourth level is the data sequence 1100, which produced the code sequence 11010111, which is at (minimum) distance 2 from the received sequence.

In this way we may proceed through the received sequence and at each step for each state preserve one surviving path and its distance from the received sequence, which is more generally called metric. The only difficulty which may arise is the possibility that in a given comparison between merging paths, the distances or metrics are identical. Then we may simply flip a coin as is done for block codewords at equal distances from the received sequence. For even if we preserved both of the equally valid contenders, further received symbols would affect both metrics in exactly the same way and thus not further influence our choice.

This decoding algorithm was first proposed by Viterbi [9] in the more general context of arbitrary memoryless channels. Another description of the algorithm can be obtained from the state-diagram representation of Fig. 4. Suppose we sought that path around the directed state diagram, arriving at node a after the kth transition, whose code symbols are at a minimum distance from the received sequence. But clearly this minimum distance path to node a at time k can be only one of two candidates: the minimum distance path to node a at time k − 1 and the minimum distance path to node c at time k − 1. The comparison is performed by adding the new distance accumulated in the kth transition by each of these paths to their minimum distances (metrics) at time k − 1.

It appears thus that the state diagram also represents a system diagram for this decoder. With each node or state we associate a storage register which remembers the minimum distance path into the state after each transition, as well as a metric register which remembers its (minimum) distance from the received sequence. Furthermore, comparisons are made at each step between the two paths which lead into each node. Thus four comparators must also be provided.

There remains only the question of truncating the algorithm and ultimately deciding on one path rather than four. This is easily done by forcing the last two input bits to the coder to be 00. Then the final state of the code must be a = 00 and consequently the ultimate survivor is the survivor at node a, after the insertion into the coder of the two dummy zeros and transmission of the corresponding four code symbols. In terms of the trellis diagram this means that the number of states is reduced from four to two by the insertion of the first zero and to a single state by the insertion of the second. The diagram is thus truncated in the same way as it was begun.

We shall proceed to generalize these code representations and optimal decoding algorithm to general convolutional codes and arbitrary memoryless channels, including the Gaussian channel, in Sections V and VI. However, first we shall exploit the state diagram further to determine the relative distance properties of binary convolutional codes.

IV. DISTANCE PROPERTIES OF CONVOLUTIONAL CODES

We continue to pursue the example of Fig. 1 for the sake of clarity; in the next section we shall easily generalize results. It is well known that convolutional codes are group codes. Thus there is no loss in generality in computing the distance from the all zeros codeword to all the other codewords, for this set of distances is the same as the set of distances from any specific codeword to all the others.

For this purpose we may again use either the trellis diagram or the state diagram. We first of all redraw the trellis diagram in Fig. 5, labeling the branches according to their distances from the all zeros path. Now consider all the paths that merge with the all zeros for the first time at some arbitrary node j.
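The encoder and the minimum distance (trellis) decoding procedure just described can be sketched in a few lines of code. This is an illustrative sketch, not the paper's own program: the generator connections (taps 111 and 101) are an assumption chosen to reproduce the stated example (inputs 0, 1, 1 giving outputs 00, 11, 01), since Fig. 1 itself is not reproduced here.

```python
# Sketch of the K = 3, n = 2, b = 1 coder of Fig. 1 and the minimum
# distance decoder of Section III.  Assumed generator taps: 111 and 101
# (consistent with the text's example: inputs 0,1,1 -> 00, 11, 01).

def branch_output(prev2, prev1, bit):
    """Code symbols produced when `bit` enters a register whose two
    most recent earlier bits are prev1 (newer) and prev2 (older)."""
    return (bit ^ prev1 ^ prev2, bit ^ prev2)

def encode(bits):
    prev1 = prev2 = 0
    out = []
    for b in bits:
        out.extend(branch_output(prev2, prev1, b))
        prev2, prev1 = prev1, b
    return out

def viterbi(received):
    """Hard-decision minimum distance decoding over the 4-state trellis.
    `received` is a list of (r1, r2) branch pairs.  Returns, for each
    state (second most recent bit, most recent bit), the surviving
    input sequence and its metric (Hamming distance)."""
    survivors = {(0, 0): ([], 0)}      # start in state a = 00
    for r1, r2 in received:
        new = {}
        for (p2, p1), (path, metric) in survivors.items():
            for bit in (0, 1):
                o1, o2 = branch_output(p2, p1, bit)
                m = metric + (o1 != r1) + (o2 != r2)
                s = (p1, bit)
                if s not in new or m < new[s][1]:   # keep the survivor
                    new[s] = (path + [bit], m)
        survivors = new
    return survivors

# Worked example from the text: received sequence 01 00 01 11.
surv = viterbi([(0, 1), (0, 0), (0, 1), (1, 1)])
print(surv[(0, 0)])    # -> ([1, 1, 0, 0], 2): data 1100 at distance 2
```

The survivor at node a after four branches is the data sequence 1100 with metric 2, matching the worked example in the text.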
V. GENERALIZATION TO ARBITRARY CONVOLUTIONAL CODES

The generalization of these techniques to arbitrary binary-tree (b = 1) convolutional codes is immediate. That is, a coder with a K-stage shift register and n mod-2 adders will produce a trellis or state diagram with 2^{K-1} nodes or states and each branch will contain n code symbols. The rate of this code is then

R = 1/n bits/code symbol.

The example pursued in the previous sections had rate R = 1/2. The primary characteristic of the binary-tree codes is that only two branches exit from and enter each node.

If rates other than 1/n are desired we must make b > 1, where b is the number of bits shifted into the register at one time. An example for K = 2, b = 2, n = 3, and consequently rate R = 2/3, is shown in Fig. 8 and its state diagram is shown in Fig. 9. It differs from the binary-tree codes only in that each node is connected to four other nodes, and for general b it will be connected to 2^b nodes. Still all the preceding techniques, including the trellis and state-diagram generating function analysis, are applicable. It must be noted, however, that the minimum distance decoder must make comparisons among all the paths entering each node at each level of the trellis and select one survivor out of four (or out of 2^b in general).

Fig. 9. State diagram for code of Fig. 8.

VI. GENERALIZATION OF OPTIMAL DECODER TO ARBITRARY MEMORYLESS CHANNELS

Fig. 10 exhibits a communication system employing a convolutional code. The convolutional encoder is precisely the device studied in the preceding sections. The data sequence is generally binary (a_j = 0 or 1) and the code sequence is divided into subsequences, where x_j represents the n code symbols generated just after the input bit a_j enters the coder: that is, the symbols of the jth branch. In terms of the example of Fig. 1, a_3 = 1 and x_3 = 01. The channel output or received sequence is similarly denoted: y_j represents the n symbols received when the n code symbols of x_j were transmitted. This model includes the BSC, wherein the y_j are binary n vectors each of whose symbols differs from the corresponding symbol of x_j with probability p and is identical to it with probability 1 − p.

For completely general channels it is readily shown [6], [14] that if all input data sequences are equally likely, the decoder which minimizes the error probability is one which compares the conditional probabilities, also called likelihood functions, P(y | x^(m)), where y is the overall received sequence and x^(m) is one of the possible transmitted sequences, and decides in favor of the maximum. This is called a maximum likelihood decoder. The likelihood functions are given or computed from the specifications of the channel. Generally it is more convenient to compare the quantities log P(y | x^(m)), called the log-likelihood functions, and the result is unaltered since the logarithm is a monotonic function of its (always positive) argument.

To illustrate, let us consider again the BSC. Here each transmitted symbol is altered with probability p < 1/2. Now suppose we have received a particular N-dimensional binary sequence y and are considering a possible transmitted N-dimensional code sequence x^(m) which differs in d_m symbols from y (that is, the Hamming distance between x^(m) and y is d_m). Then since the channel is memoryless (i.e., it affects each symbol independently of all the others), the probability
that this was transformed to the specific received sequence y at distance d_m from it is

P(y | x^(m)) = p^{d_m} (1 - p)^{N - d_m}.

…stant in each case. Furthermore, since we may assume p < 1/2 (otherwise the roles of 0 and 1 are simply interchanged at the receiver), we may express this, for each code path x^(m), as

log P(y | x^(m)) = -α d_m - β   (3)

where α and β are positive constants and d_m is the (positive) distance between x^(m) and y.

Fig. 11. Modem for additive white Gaussian noise PSK modulated memoryless channel.
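The equivalence just established, that maximizing log P(y | x^(m)) over a BSC is the same as minimizing Hamming distance, can be checked numerically. A minimal sketch with assumed values (p = 0.1 and the two candidate codewords from the example of Section III):

```python
import math

# For a BSC with p < 1/2, log P(y|x) = -alpha*d - beta with alpha, beta > 0,
# so maximizing the log-likelihood is the same as minimizing the
# Hamming distance d between y and the candidate codeword x.
def log_likelihood(y, x, p):
    d = sum(a != b for a, b in zip(y, x))       # Hamming distance
    return d * math.log(p) + (len(y) - d) * math.log(1 - p)

y = [0, 1, 0, 0, 0, 1]                          # received 010001
candidates = [[0, 0, 0, 0, 0, 0],               # distance 2
              [1, 1, 1, 0, 1, 1]]               # distance 3
p = 0.1
best_ml = max(candidates, key=lambda x: log_likelihood(y, x, p))
best_md = min(candidates, key=lambda x: sum(a != b for a, b in zip(y, x)))
assert best_ml == best_md     # same decision either way
```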
…the AWGN channel with biphase phase-shift keying (PSK) modulation. The modulator and optimum demodulator (correlator or integrate-and-dump filter) for this channel are shown in Fig. 11.

We use the notation that x_jk is the kth code symbol for the jth branch. Each binary symbol (which we take here for convenience to be ±1) modulates the carrier by ±π/2 radians for T seconds. The transmission rate is, therefore, 1/T symbols/second or b/(nT) = R/T bit/s. The quantity ε_s is the energy transmitted for each symbol; the energy per bit is, therefore, ε_b = ε_s/R. The white Gaussian noise is a zero-mean random process of one-sided spectral density N_0 W/Hz, which affects each symbol independently. It then follows directly that the channel output symbol y_jk is a Gaussian random variable whose mean is √ε_s x_jk (i.e., +√ε_s if x_jk = 1 and −√ε_s if x_jk = −1) and whose variance is N_0/2. Thus the conditional probability density (or likelihood) function of y_jk given x_jk is

p(y_jk | x_jk) = (π N_0)^{-1/2} exp[-(y_jk - √ε_s x_jk)^2 / N_0].

The likelihood function for the jth branch of a particular path x_j^(m) is then the product of n such densities, so that its log-likelihood may be written

ln p(y_j | x_j^(m)) = C \sum_{k=1}^{n} y_jk x_jk^(m) + D

where C and D are independent of m, and we have used the fact that x_jk^2 = 1. Similarly, the log-likelihood function for any path is the sum of the log-likelihood functions for each of its branches.

We have thus shown that the maximum likelihood decoder for the memoryless AWGN biphase (or quadriphase) modulated channel is one which forms the inner product between the received (real number) sequence and the code sequence (consisting of ±1) and chooses the path corresponding to the greatest. Thus the metric for this channel is the inner product (5), as contrasted with the distance metric used for the BSC.
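That the inner product is an equivalent metric can be illustrated numerically. A sketch under assumed illustrative values of ε_s and N_0, comparing the ranking of candidate ±1 sequences by exact log-likelihood and by inner product:

```python
import math, random

# For the AWGN channel with antipodal (+1/-1) symbols, ln p(y|x) equals
# (2*sqrt(es)/N0) * sum(y*x) plus terms independent of the path, so ranking
# candidate code sequences by log-likelihood or by inner product is identical.
es, N0 = 1.0, 2.0          # assumed illustrative values

def log_likelihood(ys, xs):
    return sum(-0.5 * math.log(math.pi * N0)
               - (y - math.sqrt(es) * x) ** 2 / N0 for y, x in zip(ys, xs))

def inner_product(ys, xs):
    return sum(y * x for y, x in zip(ys, xs))

random.seed(1)
sent = [1, -1, -1, 1, 1, -1]
received = [math.sqrt(es) * x + random.gauss(0, math.sqrt(N0 / 2))
            for x in sent]
paths = [sent, [1, 1, 1, 1, 1, 1], [-1, 1, -1, 1, -1, 1]]
by_ll = max(paths, key=lambda xs: log_likelihood(received, xs))
by_ip = max(paths, key=lambda xs: inner_product(received, xs))
assert by_ll == by_ip       # identical decision, as shown in the text
```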
For convolutional codes the structure of the code paths was described in Sections II–V. In Section III the optimum decoder was derived for the BSC. It now becomes clear that if we substitute the inner product metric \sum_k y_jk x_jk^(m) for the distance metric \sum_k d_jk^(m) used for the BSC, all the arguments used in Section III for the latter apply equally to this Gaussian channel. In particular, the optimum decoder has a block diagram represented by the code state diagram. At step j the stored metric for each state (which is the maximum of the metrics of all the paths leading to this state at this time) is augmented by the branch metrics for branches emanating from this state. The comparisons are performed among all pairs of (or in general sets of 2^b) branches entering each state and the maxima are selected as the new most likely paths. The history (input data) of each new survivor must again be stored and the decoder is now ready for step j + 1.

Clearly, this argument generalizes to any memoryless channel and we must simply use the appropriate metric ln P(y | x^(m)), which may always be determined from the statistical description of the channel. This includes, among others, AWGN channels employing other forms of modulation.

In the next section, we apply the analysis of convolutional code distance properties of Section IV to determine the error probabilities of specific codes on more general memoryless channels.

VII. PERFORMANCE OF CONVOLUTIONAL CODES ON MEMORYLESS CHANNELS

In Section IV we analyzed the distance properties of convolutional codes employing a state-diagram generating function technique. We now extend this approach to obtain tight upper bounds on the error probability of such codes. We shall consider the BSC, the AWGN channel, and more general memoryless channels, in that order. We shall obtain both the first-event error probability, which is the probability that the correct path is excluded (not a survivor) for the first time at the jth step, and the bit error probability, which is the expected ratio of bit errors to total number of bits transmitted.

A. Binary Symmetric Channel

The first-event error probability is readily obtained from the generating function T(D) [(1) for the code of Fig. 1, which we shall again pursue for demonstrative purposes]. We may assume, without loss of generality, since we are dealing with group codes, that the all zeros path was transmitted. Then a first-event error is made at the jth step if this path is excluded by selecting another path merging with the all zeros at node a at the jth level.

Now suppose that the previous-level survivors were such that the path compared with the all zeros at step j is the path whose data sequence is 00···0100, corresponding to nodes a ··· a a b c a (see Fig. 4). This differs from the correct (all zeros) path in five symbols. Consequently an error will be made in this comparison if the BSC caused three or more errors in these particular five symbols. Hence the probability of an error in this specific comparison is

P_5 = \sum_{e=3}^{5} \binom{5}{e} p^e (1-p)^{5-e}.

On the other hand, there is no assurance that this particular distance 5 path will have previously survived so as to be compared with the correct path at the jth step. If either of the distance 6 paths were compared instead, then four or more errors in the six different symbols will definitely cause an error in the survivor decision, while three errors will cause a tie which, if resolved by coin flipping, will result in an error only half the time. Then the probability of error, if this comparison is made, is

P_6 = (1/2) \binom{6}{3} p^3 (1-p)^3 + \sum_{e=4}^{6} \binom{6}{e} p^e (1-p)^{6-e}.

Similarly, if the previously surviving paths were such that a distance k path is compared with the correct path at the jth step, the resulting error probability is

P_k = \sum_{e=(k+1)/2}^{k} \binom{k}{e} p^e (1-p)^{k-e},   k odd

P_k = (1/2) \binom{k}{k/2} [p(1-p)]^{k/2} + \sum_{e=k/2+1}^{k} \binom{k}{e} p^e (1-p)^{k-e},   k even.   (8)

Now at step j, since there is no simple way of determining previous survivors, we may overbound the probability of a first-event error by the sum of the error probabilities for all possible paths which merge with the correct path at this point. Note this union bound is indeed an upper bound because two or more such paths may both have distance closer to the received sequence than the correct path (even though only one has survived to this point) and thus the events are not disjoint. For the example with generating function (1) it follows that the first-event error probability is bounded by

P_E < P_5 + 2P_6 + 4P_7 + ··· + 2^k P_{k+5} + ···   (9)

where P_k is given by (8).

In Section VII-C it will be shown that (8) can be upper bounded by (see (39))

P_k < 2^k p^{k/2} (1-p)^{k/2}

so that

P_E < \sum_{k=5}^{∞} 2^{k-5} 2^k p^{k/2} (1-p)^{k/2}.
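The comparison error probabilities (8) and the union bound (9) are straightforward to evaluate numerically. A sketch (the crossover probability p and the truncation depth of the series are illustrative choices):

```python
from math import comb

# Comparison error probabilities (8) for the BSC, and a truncation of the
# union bound (9) for the Fig. 1 code (distances 5, 6, 7, ... with
# multiplicities 1, 2, 4, ...).
def P(k, p):
    if k % 2:                                   # k odd
        return sum(comb(k, e) * p**e * (1 - p)**(k - e)
                   for e in range((k + 1) // 2, k + 1))
    # k even: a tie (exactly k/2 errors) is resolved by coin flipping,
    # so it counts as an error half the time
    return (0.5 * comb(k, k // 2) * (p * (1 - p)) ** (k // 2)
            + sum(comb(k, e) * p**e * (1 - p)**(k - e)
                  for e in range(k // 2 + 1, k + 1)))

p = 0.01                                        # illustrative value
first_event = sum(2**j * P(j + 5, p) for j in range(12))   # truncated (9)
print(P(5, 0.1))     # ≈ 0.00856: three or more errors among five symbols
```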
T(D) = \sum_{k=d} a_k D^k   (12)

the first-event error probability is bounded by the generalization of (9),

P_E < \sum_{k=d} a_k P_k   (13)

where P_k is given by (8), and more loosely upper bounded by the generalization of (11),

P_E < T(D) |_{D = 2\sqrt{p(1-p)}}.   (14)

Whenever a decision error occurs, one or more bits will be incorrectly decoded. Specifically, those bits in which the path selected differs from the correct path will be incorrect. If only one error were ever made in decoding an arbitrarily long code path, the number of bits in error in this incorrect path could easily be obtained from the augmented generating function T(D, N) (such as given by (2) with factors in L deleted). For the exponents of the N factors indicate the number of bit errors for the given incorrect path arriving at node a at the jth level.

After the first error has been made, the incorrect paths no longer will be compared with a path which is overall correct, but rather with a path which has diverged from the correct path over some span of branches (see Fig. 12). If the correct path x has been excluded by a decision error at step j in favor of path x′, the decision at step j + 1 will be between x′ and x″. Now the (first-event) error probability of (13) or (14) is for a comparison, at any step, between path x and any other path merging with it at that step, including path x″ in this case. However, since the metric for path x′ is greater than the metric for x, for on this basis the correct path was excluded at step j, the probability that path x″ metric exceeds path x′ metric at step j + 1 is less than the probability that path x″ metric exceeds the (correct) path x metric at this point. Consequently, the probability of a new incorrect path being selected after a previous error has occurred is upper bounded by the first-event error probability at that step.

Moreover, when a second error follows closely after a first error, it often occurs (as in Fig. 12) that the erroneous bit(s) of path x″ overlap the erroneous bit(s) of path x′. With this in mind, we now show that for a binary-tree code, if we weight each term of the first-event error probability bound at any step by the number of erroneous bits for each possible erroneous path merging with the correct path at that node level, we upper bound the bit error probability. For, a given step decision corresponds to decoder action on one more bit of the transmitted data sequence; the first-event error probability union bound, with each term weighted by the corresponding number of bit errors, is an upper bound on the expected number of bit errors caused by this action. Summing the expected number of bit errors over L steps, which as was just shown may result in overestimating through double counting, gives an upper bound on the expected number of bit errors in L branches for arbitrary L. But since the upper bound on expected number of bit errors is the same at each step, it follows, upon dividing the sum of L equal terms by L, that this expected number of bit errors per step is just the bit error probability P_B for a binary-tree code (b = 1). If b > 1, then we must divide this expression by b, the number of bits encoded and decoded per step.

To illustrate the calculation of P_B for a convolutional code, let us consider again the example of Fig. 1. Its transfer function in D and N is obtained from (2), letting L = 1, since we are not now interested in the lengths of incorrect paths, to be

T(D, N) = D^5 N / (1 - 2DN)
        = D^5 N + 2D^6 N^2 + ··· + 2^k D^{k+5} N^{k+1} + ···.   (15)

The exponents of the factors in N in each term determine the number of bit errors for the path(s) corresponding to that term. Since T(D) = T(D, N) |_{N=1} yields the first-event error probability P_E, each of whose terms must be weighted by the exponent of N to obtain P_B, it follows that we should first differentiate T(D, N) at N = 1 to obtain

dT(D, N)/dN |_{N=1} = D^5 / (1 - 2D)^2.   (16)
Then from this we obtain, as in (9), that for the BSC

P_B < P_5 + 2·2P_6 + 3·4P_7 + ··· + (k+1) 2^k P_{k+5} + ···.   (17)

…higher metric than the correct path, i.e.,

\sum_i \sum_{j=1}^{n} x′_ij y_ij ≥ \sum_i \sum_{j=1}^{n} x_ij y_ij.
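The closed-form bounds can be checked against partial sums of the corresponding series. A sketch using the looser estimate P_k < D^k with D = 2√(p(1−p)), under which (9) and (17) become geometric-type series with closed forms T(D) = D^5/(1 − 2D) and dT/dN = D^5/(1 − 2D)^2 for the Fig. 1 code:

```python
from math import sqrt

# Closed forms (14) and (16)-(17) for the Fig. 1 code, evaluated at the
# BSC value D = 2*sqrt(p*(1-p)), checked against partial series sums.
def first_event_bound(p):            # (14): T(D) = D^5 / (1 - 2D)
    D = 2 * sqrt(p * (1 - p))
    return D**5 / (1 - 2 * D)

def bit_error_bound(p):              # dT/dN|_{N=1} = D^5 / (1 - 2D)^2
    D = 2 * sqrt(p * (1 - p))
    return D**5 / (1 - 2 * D) ** 2

p = 0.001                            # illustrative value
D = 2 * sqrt(p * (1 - p))
series = sum(2**k * D**(k + 5) for k in range(200))               # as in (9)
weighted = sum((k + 1) * 2**k * D**(k + 5) for k in range(200))   # as in (17)
assert abs(series - first_event_bound(p)) < 1e-12
assert abs(weighted - bit_error_bound(p)) < 1e-12
```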
data sequences, the minimumerrorprobability
chooses thepath whichmaximizes
decoder
the log-likelihood
whence the bound of (24), using (27), becomes
funct.ion (metric)
In P(y I x("'))
PB < 2
k=d
CkPk wherc xijO")is a code symbol of the ,mth path, y i j is the
corresponding received (demodulated)symbol, j runs
where ck are thecoefficients of
over the n symbols of each branch, and i runs over the
branchesin t,he given path.Thisincludesthe special
cases considered in Sections VII-A and -B.
The decoder isthesameasforthe BSC except for
Thus following the came arguments which led from (24) using this morc general metric. Decisions are made after
to (28) we have for a binary-tree code each set of new branch metrics have been added to the
previouslystoredmetrics. Toanalyzeperformance, we
must merely evaluate PIC,the pairwise error probability
for an incorrect path which differs in k symbols from the
(31)
correctpath,aswasdoneforthe specialchannels of
For b > 1, this expression must be divided by b. SectionsVII-Aand -B. Proceeding as in (22), letting
To illustrate the application of this result we consider xij and xi/ denotesymbols of the correct and incorrect
the code of Fig. 1 withparameters K = 3, R = 1,/2, paths, respectively, we obtain
whose transferfunction isgivenby (15). For this case
since R = 1j2 and E~ = 1/2 Eb, we obtain Pdx, x')
firstinequalityisvalidbecause we are multiplying the where we have used (41) and x$ = xi2 = 1. The product
summandbyaquantitygreaterthanunity,"andthe of these k identical terms is, therefore,
secondbecause we are merely extendingthesum of
positive terms over a larger set. Finally we may break
up the k-dimensional sum over y into. IC one-dimensional P, < exp (2)
summationsover yl, yz, . . , vk, respectively,andthis
+
1-1
x,)1/2P(y, I and using (25)and(30)yieldsthe boundonfirst-
event error probability and bit error probability.
=
k
7-1
c P(y, I xJ1~2P(YrI
I,
(37)
ut-0
where13
and the product (37) of k identical factors is
Do A P(y, I Z~)''~P(~~
I X,')''' < 1. (46)
P, = 2k p k / 2 (1 - p y 2 (39) Ilr
for all pairs of correct and incorrect paths. This was used in Section VII-A to obtain the bounds (11) and (21). For the AWGN channel of Section VII-B we showed

P_k < exp(-k ε_s/N_0).

While this bound on P_k is valid for all such channels, clearly it depends on the actual values assumed by the symbols x_r and x′_r of the correct and incorrect path, and these will generally vary according to the pairs of paths x and x′ in question. However, if the input symbols are binary, x and x̄, whenever x_r = x then x′_r = x̄,

¹¹ This would be the set of all 2^k k-dimensional binary vectors for the BSC, and Euclidean k-space for the AWGN channel. Note also that the bound of (36) may be improved for asymmetric channels by changing the two exponents of 1/2 to s and 1 − s, respectively, where 0 ≤ s < 1.

¹² The square root of a quantity greater than one is also greater than one.

¹³ For an asymmetric channel this bound may be improved by changing the two exponents 1/2 to s and 1 − s, respectively, where 0 < s < 1.
so that for any input-binary memoryless channel (46) becomes

D_0 = \sum_{y} P(y | x)^{1/2} P(y | x̄)^{1/2}   (47)

and consequently

P_k < D_0^k   (49)

where D_0 is given by (47). Other examples of channels of this type are FSK modulation over the AWGN (both coherent and noncoherent) and Rayleigh fading channels.
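The parameter D_0 of (47) can be evaluated directly for the two channels treated earlier; the ε_s and N_0 below are assumed illustrative values. For the BSC, (47) reduces to 2√(p(1−p)), in agreement with (39); for the binary-input AWGN channel, numerical integration recovers exp(−ε_s/N_0):

```python
from math import sqrt, exp, pi

# (47) for the BSC: D0 = sqrt(p(1-p)) + sqrt((1-p)p) = 2 sqrt(p(1-p)),
# so (49), P_k < D0^k, reproduces the BSC bound (39).
def D0_bsc(p):
    return sqrt(p * (1 - p)) + sqrt((1 - p) * p)

assert abs(D0_bsc(0.1) - 2 * sqrt(0.1 * 0.9)) < 1e-15

# Binary-input AWGN channel (assumed illustrative values): outputs are
# Gaussian with means +/- sqrt(es) and variance N0/2.  Numerically
# integrating sqrt(f(y|+1) * f(y|-1)) over y recovers D0 = exp(-es/N0).
es, N0 = 1.0, 2.0

def density(y, mean):
    var = N0 / 2
    return exp(-(y - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

step = 0.001
d0 = sum(sqrt(density(y, sqrt(es)) * density(y, -sqrt(es))) * step
         for y in (i * step - 20.0 for i in range(40000)))
assert abs(d0 - exp(-es / N0)) < 1e-3
```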
TABLE I
MAXIMUM-MINIMUM FREE DISTANCE^a

K    Systematic    Nonsystematic
2        3              3
3        4              5
4        4              6
5        5              7

^a We have excluded catastrophic codes (see Section IX); R = 1/2.

The term systematic convolutional code refers to a code on each of whose branches one of the code symbols
X. PERFORMANCE BOUNDS FOR BEST CONVOLUTIONAL CODES FOR GENERAL MEMORYLESS CHANNELS AND COMPARISON WITH BLOCK CODES

We begin by considering the path structure of a binary-tree14 (b = 1) convolutional code of any constraint length K, independent of the specific coder used. For this purpose we need only determine T(L), the generating function for the state diagram with each branch labeled merely by L, so that the exponent of each term of the infinite series expansion of T(L) determines the length over which an incorrect path differs from the correct path before merging with it at a given node level. (See Fig. 7 and (2) with D = N = 1.)

After some manipulation of the state-transition matrix of the state diagram of a binary-tree convolutional code of constraint length K, it is shown in Appendix I15 that

    T(L) = L^K (1 - L) / (1 - 2L + L^K) < L^K / (1 - 2L)
         = L^K (1 + 2L + 4L^2 + ... + 2^k L^k + ...)    (50)

where the inequality indicates that more paths are being counted than actually exist. The expression (50) indicates that of the paths merging with the correct path at a given node level there is no more than one of length K, no more than two of length K + 1, no more than three of length K + 2, etc.

We have purposely avoided considering the actual code or coder configuration so that the preceding expressions are valid for all binary-tree codes of constraint length K. We now extend our class of codes to include time-varying convolutional codes. A time-varying coder is one in which the tap positions may be changed after each shift of the bits in the register. We consider the ensemble of all possible time-varying codes, which includes as a subset the ensemble of all fixed codes, for a given constraint length K. We further impose a uniform probabilistic measure on all codes in this ensemble by flipping an unbiased coin for each potential connection between a stage of the register and an adder: if the outcome is a head we connect the particular stage to the particular adder; if it is a tail we do not. Since this is repeated for each new branch, the result is that for each branch of the trellis the code sequence is a random binary n-dimensional vector. Furthermore, it can be shown that the distribution of these random code sequences is the same for each branch at each node level except for the all zeros path, which must necessarily produce the all zeros code sequence on each branch. To avoid treating the all zeros path differently, we ensure statistical uniformity by requiring further that after each shift a random binary n-dimensional vector be added to each branch16 and that this also be reselected after each shift. (This additional artificiality is unnecessary for input-binary channels but is required to prove our result for general memoryless channels.) Further details of this procedure are given in Viterbi [9].

We now seek a bound on the average error probability of this ensemble of codes relative to the measure (random-selection process) imposed. We begin by considering the probability that after transmission over a memoryless channel the metric of one of the fewer than 2^k paths merging with the correct path after differing in K + k branches is greater than the correct metric. Let x_i be the correct (transmitted) sequence and x_i' an incorrect sequence for the ith branch of the two paths. Then following the argument which led to (37) we have that the probability that the given incorrect path may cause an error is bounded by

    P_{K+k}(x, x') <= Pi_{i=1}^{K+k} Sum_{y_i} P(y_i | x_i)^{1/2} P(y_i | x_i')^{1/2}    (51)

where the product is over all K + k branches in the path. If we now average over the ensemble of codes constructed above we obtain

    P-bar_{K+k} <= Pi_{i=1}^{K+k} Sum_{x_i} Sum_{x_i'} Sum_{y_i} q(x_i) P(y_i | x_i)^{1/2} q(x_i') P(y_i | x_i')^{1/2}
                = ( Sum_y [ Sum_x q(x) P(y | x)^{1/2} ]^2 )^{K+k} = 2^{-(K+k) n R_0}    (53)

14 Although for clarity all results will be derived for b = 1, the extension to b > 1 is direct and the results will be indicated at the end of this Section.
15 This generating function can also be used to obtain error bounds for orthogonal convolutional codes all of whose branches have the same weight, as is shown in Appendix I.
16 The same vector is added to all branches at a given node level.
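The exponent R_0 appearing in (53) is determined per channel symbol by 2^{-R_0} = Sum_y [Sum_x q(x) P(y|x)^{1/2}]^2 and is easily evaluated numerically. The small sketch below is my own illustration (the function name and the BSC example are assumptions, not the paper's):

```python
import math

def cutoff_rate(P, q):
    """R0 defined by 2**(-R0) = sum_y (sum_x q[x]*sqrt(P[x][y]))**2 per symbol.
    P[x][y] are channel transition probabilities, q[x] is the input measure."""
    s = 0.0
    for y in range(len(P[0])):
        inner = sum(q[x] * math.sqrt(P[x][y]) for x in range(len(P)))
        s += inner * inner
    return -math.log2(s)

# BSC with crossover probability p and equiprobable inputs:
p = 0.01
R0 = cutoff_rate([[1 - p, p], [p, 1 - p]], [0.5, 0.5])
# closed form for this case: 1 - log2(1 + 2*sqrt(p*(1 - p)))
```

For the BSC the sum collapses to the familiar closed form noted in the comment, which the numerical value matches.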
The bounds of (56) and (57) are finite only for rates R < R_0, and R_0 can be shown to be always less than the channel capacity17

    C = (1/n) I(X^n; Y^n).

766 IEEE TRANSACTIONS ON COMMUNICATIONS TECHNOLOGY, OCTOBER 1971
    E-hat(R) = { R_0/R - 1,       0 < R < R_0
               { E_0(rho)/R - rho,  R_0 <= R < C,  0 < rho <= 1.    (66)

To minimize the numerators of (63) and (64) for R > R_0 we should choose rho as large as possible, since E_0(rho) is a nondecreasing function of rho. However, we are limited by the necessity of making delta(R) > 0 to keep the denominator from becoming zero. On the other hand, as the constraint length K becomes very large we may choose delta(R) = delta very small. In particular, as delta approaches 0, (65) approaches

    lim_{delta->0} E(R) = { R_0,      0 <= R < R_0
                          { E_0(rho), with R = E_0(rho)/rho,  R_0 < R < C,  0 < rho <= 1

which for a very noisy channel reduces to

    lim_{delta->0} E(R) = { C/2,    0 <= R <= C/2
                          { C - R,  C/2 <= R <= C.    (69)

This limiting form of E(R) is shown in Fig. 17.

The bounds (63) and (64) are for the average error probabilities of the ensemble of codes relative to the measure induced by random selection of the time-varying coder tap sequences. At least one code in the ensemble must perform better than the average. Thus the bounds (63) and (64) hold for the best time-varying binary-tree convolutional coder of constraint length K. Whether there exists a fixed convolutional code with this performance is an unsolved problem. However, for small K the results of Section VII seem to indicate that these bounds are valid also for fixed codes.

To determine the tightness of the upper bounds, it is useful to have lower bounds for convolutional code error probabilities. It can be shown [9] that for all R < C the error probability is lower bounded by an exponential in K with exponent E_L(R)[1 + o(K)], and o(K) -> 0 as K -> infinity. Comparison of the parametric expressions shows that both E_b(R) and E_Lb(R) are functions of R which for all R > 0 are less than the corresponding convolutional code exponents E(R) and E_L(R).

17 C can be made equal to the channel capacity by properly choosing the ensemble measure q(x). For an input-binary channel the random binary convolutional coder described above achieves this. Otherwise a further transformation of the branch sequence into a smaller set of nonbinary sequences is required [9].
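For a very noisy channel the limiting exponents take simple closed forms, and the comparison with block codes can be carried out numerically. The sketch below is my own (capacity normalized to C = 1; the block exponent E_b(R) = C/2 - R for R <= C/4 and (sqrt(C) - sqrt(R))^2 above it is the standard very-noisy form assumed here, not quoted from this excerpt):

```python
import math

# Illustrative sketch (my own normalization): limiting error exponents for a
# very noisy channel with capacity C = 1.
# Convolutional: E(R) = C/2 for R <= C/2, C - R for C/2 <= R <= C.
# Block:         Eb(R) = C/2 - R for R <= C/4, (sqrt(C) - sqrt(R))**2 above.
def E_conv(R, C=1.0):
    return C / 2 if R <= C / 2 else C - R

def E_block(R, C=1.0):
    return C / 2 - R if R <= C / 4 else (math.sqrt(C) - math.sqrt(R)) ** 2

# Decoder-memory condition M/K > E(R)/Eb(R), evaluated at R = C/2:
ratio = E_conv(0.5) / E_block(0.5)   # about 5.83, i.e., M > (5.8)K suffices
```

At R = C/2 the ratio evaluates to roughly 5.83, which is the source of the "M > (5.8)K" figure in the memory-truncation discussion.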
To minimize this bound we should maximize the exponent E_0(rho)/R - rho with respect to rho on the unit interval. But this yields exactly E_b(R), the upper bound exponent of (73) for block codes. Thus the memory truncation error probability is bounded by (76), an exponential whose exponent is determined by E_b(R), the block coding exponent.

We conclude therefore that the memory truncation error is less than the bit error probability bound without truncation, provided the bound of (76) is less than the bound of (64). This will certainly be assured if

    M/K > E(R)/E_b(R).

For very noisy channels we have from (69) and (74), or Fig. 17, that

    E(R)/E_b(R) = { 1/(1 - 2R/C),                       0 <= R <= C/4
                  { (1/2)/(1 - (R/C)^{1/2})^2,          C/4 <= R <= C/2
                  { (1 - R/C)/(1 - (R/C)^{1/2})^2,      C/2 < R < C.

For example, at R = C/2 this indicates that it suffices to take M > (5.8)K.

Another problem faced by a system designer is the amount of storage required by the metrics (or log-likelihood functions) for each of the 2^K paths. For a BSC this poses no difficulty since the metric is just the Hamming distance, which is at most n, the number of code symbols per branch. For the AWGN, on the other hand, the optimum metric is a real number, the analog output of a correlator, matched filter, or integrate-and-dump circuit. Since digital storage is generally required, it is necessary to quantize this analog metric. However, once the components y_jk of the optimum metric of (5), which are the correlator outputs, have been quantized to Q levels, the channel is no longer an AWGN channel. For biphase modulation, for example, it becomes a binary-input Q-ary-output discrete memoryless channel, whose transition probabilities are readily calculated as a function of the energy-to-noise density and the quantization levels. The optimum metric is not obtained by replacing y_jk by its quantized value Q(y_jk) in (5); rather it is the log-likelihood function log P(y | x^(m)) for the binary-input Q-ary-output channel.

Nevertheless, extensive simulation [24] indicates that for 8-level quantization even use of the suboptimal metric Sum_k Q(y_jk) x_jk^(m) results in a degradation of no more than 0.25 dB relative to the maximum likelihood decoder for the unquantized AWGN, and that use of the optimum metric is only negligibly superior to this. However, this is not the case for sequential decoding, where the difference in performance between optimal and suboptimal metrics is significant [11].

In a practical system, considerations other than error performance for a given degree of decoder complexity often dictate the selection of a coding system. Chief among these are often the synchronization requirements. Convolutional codes utilizing maximum likelihood decoding are particularly advantageous in that no block synchronization is ever required. For block codes, decoding cannot begin until the initial point of each block has been located, and practical systems often require more complexity in the synchronization system than in the decoder. On the other hand, as we have by now amply illustrated, a maximum likelihood decoder for a convolutional code does not require any block synchronization because the coder is free running (i.e., it performs identical operations for each successive input bit and does not require that K bits be input before generating an output). Furthermore, the decoder does not require knowledge of past inputs to start decoding; it may as well assume that all previous bits were zeros. This is not to say that initially the decoder will operate as well, in the sense of error performance, as if the preceding bits of the correct path were known. On the other hand, consider a decoder which starts with an initially known path but makes an error at some point and excludes the correct path. Immediately thereafter it will be operating as if it had just been turned on with an unknown and incorrectly chosen previous path history. That this decoder will recover and stop making errors within a finite number of branches follows from our previous discussions, in which it was shown that, other than for catastrophic codes, error sequences are always finite. Hence our initially unsynchronized decoder will operate just like a decoder which has just made an error and will thus always achieve synchronization and generally will produce correct decisions after a limited number of initial errors. Simulations have demonstrated that synchronization generally takes no more than four or five constraint lengths of received symbols.

Although, as we have just shown, branch synchronization is not required, code symbol synchronization within a branch is necessary. Thus, for example, for a binary-tree rate R = 1/2 code, we must resolve the two-way ambiguity as to where each two-code-symbol branch begins. This is called node synchronization. Clearly if we make the wrong decision, errors will constantly be made thereafter. However, this situation can easily be detected, because the mismatch will cause all the path metrics to be small; in fact there will not be any correct path in this case. We can thus detect this event and change our decision as to node synchronization (cf. Heller and Jacobs [24]). Of course, for an R = 1/n code, we may have to repeat our choice n times, once for each of the symbols on a branch; but since n represents the redundancy factor or bandwidth expansion, practical systems rarely use n > 4.
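The transition probabilities of the binary-input Q-ary-output channel described above are obtained by integrating the Gaussian noise density between quantizer thresholds. The following sketch is my own construction, not the paper's: it assumes uniform thresholds spaced in units of the noise standard deviation, and all names and parameter choices are mine.

```python
import math

def quantized_awgn(es_n0_db, Q=8, step=0.5):
    """Transition probabilities P[x][j] for biphase (antipodal) signaling on
    the AWGN channel followed by a uniform Q-level quantizer.

    Thresholds sit at (j - Q/2)*step, j = 1..Q-1, in units of the noise
    standard deviation (an arbitrary but common choice)."""
    a = math.sqrt(2 * 10 ** (es_n0_db / 10))      # signal amplitude / sigma
    edges = [(j - Q / 2) * step for j in range(1, Q)]
    def Phi(z):                                    # standard Gaussian CDF
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    P = []
    for s in (+a, -a):                             # the two channel inputs
        cdf = [0.0] + [Phi(e - s) for e in edges] + [1.0]
        P.append([cdf[j + 1] - cdf[j] for j in range(Q)])
    return P

P = quantized_awgn(2.0)                            # 2-dB Es/N0, 8-level quantizer
```

Each row sums to one, and the channel is symmetric in the sense that reversing the output order swaps the roles of the two inputs.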
APPENDIX I
GENERATING FUNCTION FOR THE STRUCTURE OF A BINARY-TREE CONVOLUTIONAL CODE FOR ARBITRARY K AND ERROR BOUNDS FOR ORTHOGONAL CODES

The state diagram of a binary-tree convolutional code of constraint length K, with each branch labeled by L and with N marking the branches generated by input ones, yields a set of 2^{K-1} - 1 linear state equations, written in matrix form as (78). The coefficient matrix has unit diagonal, and its only other nonzero entries are -L and -NL. This pattern can easily be seen to generalize to a (2^{K-1} - 1)-dimensional square matrix of this form for any binary-tree code of constraint length K, and in general the generating function is

    T(L, N) = L X_{100...0}    (79)

where 100...0 contains (K - 2) zeros.

From this general pattern it is easily shown that the matrix can be reduced to a dimension of 2^{K-2}. First, combining adjacent rows, from the second to the last, pairwise, one obtains the set of 2^{K-2} - 1 relations

    N X_{j1 j2 ... j_{K-2} 0} = X_{j1 j2 ... j_{K-2} 1}    (80)

where j1, j2, ..., j_{K-2} runs over all binary vectors except for the all zeros. Substitution of (80) into (78) yields a 2^{K-2}-dimensional matrix equation; the result for K = 4 is the system (81), which after one further reduction of the same type becomes (83).

Since in all respects, except these two, the matrix after this sequence of reductions is the same as the original but with its dimension reduced corresponding to a reduction of K by unity, we may proceed to perform this sequence of reductions again. The steps will be the same except that now in place of (80) we have

    N X_{j1 j2 ... j_{K-3} 01} = X_{j1 j2 ... j_{K-3} 11}    (80')

and in place of (82)

    X_{00...01} = NL X_{100...01} + X_{0...111}    (82')

while in place of (81) the right-of-center term of the first row is -(L + L^2) and the first component on the right side is N^2 L^2. Similarly, in place of (83) the center term of the first row is -N(L + L^2 + L^3) and the first component on the right side is N^3 L^3.

Performing this sequence of reductions K - 2 times in all, but omitting the last step (leading from (81) to (83)) in the last reduction, the original 2^{K-1} - 1 equations are reduced in the general case to two equations in X_{00...01} and X_{11...1}, whence

    X_{11...1} = (NL)^{K-1} / [1 - N(L + L^2 + ... + L^{K-1})].    (85)

Applying (79) and the K - 2 extensions of (80) and (80') we find

    T(L, N) = L X_{100...00} = L N^{-1} X_{100...01} = L N^{-2} X_{100...011} = ... = L N^{-(K-2)} X_{11...1}
            = N L^K / [1 - N(L + L^2 + ... + L^{K-1})]
            = N L^K (1 - L) / [1 - L(1 + N) + N L^K].    (86)

If we require only the path length structure, and not the number of bit errors corresponding to any incorrect path, we may set N = 1 in (86) and obtain

    T(L) = L^K / [1 - (L + L^2 + ... + L^{K-1})] = L^K (1 - L) / (1 - 2L + L^K).    (87)

If we denote as an upper bound an expression which is the generating function of more paths than exist in our state diagram, we have

    T(L) < L^K / (1 - 2L).    (88)

As an additional application of this generating function technique, we now obtain bounds on P_E and P_B for the class of orthogonal convolutional (tree) codes introduced by Viterbi [10]. For this class of codes, to each of the 2^K branches of the state diagram there corresponds one of 2^K orthogonal signals. Given that each signal is orthogonal to all others in n >= 1 dimensions, corresponding to n channel symbols or transmission times (as, for example, if each signal consists of n different pulses out of 2n possible positions), then the weight of each branch is n. Consequently, if we replace L, the path length enumerator, by D^n in (86) we obtain for orthogonal codes

    T(D, N) = N D^{nK} (1 - D^n) / [1 - D^n (1 + N) + N D^{nK}].    (89)

Then using (48) and (49), the first-event error probability for orthogonal codes is bounded by

    P_E < D_0^{nK} (1 - D_0^n) / (1 - 2D_0^n + D_0^{nK})    (90)

and the bit error probability bound is

    P_B < D_0^{nK} (1 - D_0^n)^2 / (1 - 2D_0^n + D_0^{nK})^2 < D_0^{nK} (1 - D_0^n)^2 / (1 - 2D_0^n)^2    (91)

where D_0 is a function of the channel transition probabilities or energy-to-noise ratio and is given by (46).

ACKNOWLEDGMENT

The author gratefully acknowledges the considerable stimulation he has received over the course of writing the several versions of this paper from Dr. J. A. Heller, whose recent work strongly complements and enhances this effort, for numerous discussions and suggestions and for his assistance in its presentation at the Linkabit Corporation "Seminars on Convolutional Codes." This tutorial approach owes part of its origin to Dr. G. D. Forney, Jr., whose imaginative and perceptive reinterpretation of my original work has aided immeasurably in rendering it more comprehensible. Also, thanks are due to Dr. J. K. Omura for his careful and detailed reading and correction of the manuscript during his presentation of this material in the UCLA graduate course on information theory.

REFERENCES

[1] P. Elias, "Coding for noisy channels," in 1955 IRE Nat. Conv. Rec., vol. 3, pt. 4, pp. 37-46.
[2] J. M. Wozencraft, "Sequential decoding for reliable communication," in 1957 IRE Nat. Conv. Rec., vol. 5, pt. 2, pp. 11-25.
[3] J. L. Massey, Threshold Decoding. Cambridge, Mass.: M.I.T. Press, 1963.
[4] R. M. Fano, "A heuristic discussion of probabilistic decoding," IEEE Trans. Inform. Theory, vol. IT-9, Apr. 1963, pp. 64-74.
[5] R. G. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inform. Theory, vol. IT-11, Jan. 1965, pp. 3-18.
[6] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[7] K. S. Zigangirov, "Some sequential decoding procedures," Probl. Peredach. Inform., vol. 2, no. 4, 1966, pp. 13-25.
[8] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, "Lower bounds to error probability for coding on discrete memoryless channels," Inform. Contr., vol. 10, 1967, pt. I, pp. 65-103; pt. II, pp. 522-552.
[9] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, Apr. 1967, pp. 260-269.
[10] ---, "Orthogonal tree codes for communication in the presence of white Gaussian noise," IEEE Trans. Commun. Technol., vol. COM-15, Apr. 1967, pp. 238-242.
[11] I. M. Jacobs, "Sequential decoding for efficient communication from deep space," IEEE Trans. Commun. Technol., vol. COM-15, Aug. 1968, pp. 492-501.
[12] G. D. Forney, Jr., "Coding system design for advanced solar missions," submitted to NASA Ames Res. Ctr. by Codex Corp., Watertown, Mass., Final Rep., Contract NAS2-3637, Dec. 1967.
[13] J. L. Massey and M. K. Sain, "Inverses of linear sequential circuits," IEEE Trans. Comput., vol. C-17, Apr. 1968, pp. 330-337.
[14] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[15] T. N. Morrissey, "Analysis of decoders for convolutional codes by stochastic sequential machine methods," Univ. Notre Dame, Notre Dame, Ind., Tech. Rep. EE-682, May 1968.
[16] R. W. Lucky, J. Salz, and E. J. Weldon, Principles of Data Communication. New York: McGraw-Hill, 1968.
772 IEEE TRANSACTIONS ON COMMUNICATIONS TECHNOLOGY, VOL. COM-19, NO. 5, OCTOBER 1971
[17] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-15, Jan. 1969, pp. 177-179.
[18] F. Jelinek, "Fast sequential decoding algorithm using a stack," IBM J. Res. Dev., vol. 13, no. 6, Nov. 1969, pp. 675-685.
[19] E. A. Bucher and J. A. Heller, "Error probability bounds for systematic convolutional codes," IEEE Trans. Inform. Theory, vol. IT-16, Mar. 1970, pp. 219-224.
[20] J. P. Odenwalder, "Optimal decoding of convolutional codes," Ph.D. dissertation, Dep. Syst. Sci., Sch. Eng. Appl. Sci., Univ. California, Los Angeles, 1970.
[21] G. D. Forney, Jr., "Coding and its application in space communications," IEEE Spectrum, vol. 7, June 1970, pp. 47-58.
[22] ---, "Convolutional codes I: Algebraic structure," IEEE Trans. Inform. Theory, vol. IT-16, Nov. 1970, pp. 720-738; "II: Maximum likelihood decoding," and "III: Sequential decoding," IEEE Trans. Inform. Theory, to be published.
[23] W. J. Rosenberg, "Structural properties of convolutional codes," Ph.D. dissertation, Dep. Syst. Sci., Sch. Eng. Appl. Sci., Univ. California, Los Angeles, 1971.
[24] J. A. Heller and I. M. Jacobs, "Viterbi decoding for satellite and space communication," this issue, pp. 835-848.
[25] A. R. Cohen, J. A. Heller, and A. J. Viterbi, "A new coding technique for asynchronous multiple access communication," this issue, pp. 849-855.

Andrew J. Viterbi (S'54-M'58-SM'63) was born in Bergamo, Italy, on March 9, 1935. He received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1957, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1962.

While attending M.I.T., he participated in the cooperative program at the Raytheon Company. In 1957 he joined the Jet Propulsion Laboratory where he became a Research Group Supervisor in the Communications Systems Research Section. In 1963 he joined the faculty of the University of California, Los Angeles, as an Assistant Professor. In 1965 he was promoted to Associate Professor and in 1969 to Professor of Engineering and Applied Science. He was a cofounder in 1968 of Linkabit Corporation of which he is presently Vice President.

Dr. Viterbi is a member of the Editorial Boards of the Proceedings of the IEEE and of the journal Information and Control. He is a member of Sigma Xi, Tau Beta Pi, and Eta Kappa Nu and has served on several governmental advisory committees and panels. He is the coauthor of a book on digital communication and author of another on coherent communication, and he has received three awards for his journal publications.
Abstract-The purpose of this paper is to organize and clarify the work of the past decade on burst-correcting codes. Our method is, first, to define an idealized model, called the classic bursty channel, toward which most burst-correcting schemes are explicitly or implicitly aimed; next, to bound the best possible performance on this channel; and, finally, to exhibit classes of schemes which are asymptotically optimum and serve as archetypes of the burst-correcting codes actually in use. In this light we survey and categorize previous work on burst-correcting codes. Finally, we discuss qualitatively the ways in which real channels fail to satisfy the assumptions of the classic bursty channel, and the effects of such failures on the various types of burst-correcting schemes. We conclude by comparing forward-error-correction to the popular alternative of automatic repeat-request (ARQ).

INTRODUCTION

MOST work in coding theory has been addressed to efficient communication over memoryless channels. While this work has been directly applicable to space channels [1], it has been of little use on all other real channels, where errors tend to occur in bursts. The use of interleaving to adapt random-error-correcting codes to bursty channels is frequently proposed, but turns out to be a rather inefficient method of burst correction.

Of the work that has gone into burst-correcting codes, the bulk has been devoted to finding codes capable of correcting all bursts of length B separated by guard spaces of length G. We call these zero-error burst-correcting codes. It has been realized in the past few years that this work too has been somewhat misdirected; for on channels for which such codes are suited, called in this paper classic bursty channels, much more efficient communication is possible if we require only that practically all bursts of length B be correctible.

The principal purpose of this paper is tutorial. In order to clarify the issues involved in the design of burst-correcting codes, we examine an idealized model, the classic bursty channel, on which bursts are never longer than B nor guard spaces shorter than G. We see that the inefficiency of zero-error codes is due to their operating at the zero-error capacity of the channel, approximately (G - B)/(G + B), rather than at the true capacity, which is more like G/(G + B). Operation at the true capacity is possible, however, if bursts can be treated as erasures; that is, if their locations can be identified. By the construction of some archetypal schemes in which short Reed-Solomon (RS) codes are used with interleavers, we arrive at asymptotically optimal codes of

Paper approved by the Communication Theory Committee of the IEEE Communication Technology Group for publication without oral presentation. Manuscript received May 10, 1971. The author is with Codex Corporation, Newton, Mass. 02195.