


Convolutional Codes and Their Performance in Communication Systems

ANDREW J. VITERBI, SENIOR MEMBER, IEEE

Abstract-This tutorial paper begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds with the development of the maximum likelihood decoder. The powerful tool of generating function analysis is demonstrated to yield for arbitrary codes both the distance properties and upper bounds on the bit error probability for communication over any memoryless channel. Previous results on code ensemble average error probabilities are also derived and extended by these techniques. Finally, practical considerations concerning finite decoding memory, metric representation, and synchronization are discussed.

Paper approved by the Communication Theory Committee of the IEEE Communication Technology Group for publication without oral presentation. Manuscript received January 7, 1971; revised June 11, 1971. The author is with the School of Engineering and Applied Science, University of California, Los Angeles, Calif. 90024, and the Linkabit Corporation, San Diego, Calif.

I. INTRODUCTION

ALTHOUGH convolutional codes, first introduced by Elias [1], have been applied over the past decade to increase the efficiency of numerous communication systems, where they invariably outperform block codes of the same order of complexity, there remains to date a lack of acceptance of convolutional coding and decoding techniques on the part of many communication technologists. In most cases, this is due to an incomplete understanding of convolutional codes, whose cause can be traced primarily to the sizable literature in this field, composed largely of papers which emphasize details of the decoding algorithms rather than the more fundamental unifying concepts, and which, until recently, have been divided into two nearly disjoint subsets. This malady is shared by the block-coding literature, wherein the algebraic decoders and probabilistic decoders have been at odds for a considerably longer period.

The convolutional code dichotomy owes its origins to the development of sequential (probabilistic) decoding by Wozencraft [2] and of threshold (feedback, algebraic) decoding by Massey [3]. Until recently the two disciplines flourished almost independently, each with its own literature, applications, and enthusiasts. The Fano sequential decoding algorithm [4] was soon found to

greatly outperform earlier versions of sequential decoders both in theory and practice. Meanwhile the feedback decoding advocates were encouraged by the burst-error-correcting capabilities of the codes, which render them quite useful for channels with memory.

To add to the confusion, yet a third decoding technique emerged with the Viterbi decoding algorithm [9], which was soon thereafter shown to yield maximum likelihood decisions (Forney [12], Omura [17]). Although this approach is probabilistic and emerged primarily from the sequential-decoding oriented discipline, it leads naturally to a more fundamental approach to convolutional code representation and performance analysis. Furthermore, by emphasizing the decoding-invariant properties of convolutional codes, one arrives directly at the maximum likelihood decoding algorithm and from it at the alternate approaches which lead to sequential decoding on the one hand and feedback decoding on the other. This decoding algorithm has recently found numerous applications in communication systems, two of which are covered in this issue (Heller and Jacobs [24], Cohen et al. [25]). It is particularly desirable for efficient communication at very high data rates, where very low error rates are not required, or where large decoding delays are intolerable.

Foremost among the recent works which seek to unify these various branches of convolutional coding theory is that of Forney [12], [21], [22], et seq., which includes a three-part contribution devoted, respectively, to algebraic structure, maximum likelihood decoding, and sequential decoding. This paper, which began as an attempt to present the author's original paper [9] to a broader audience,¹ is another such effort at consolidating this discipline.

¹ This material first appeared in unpublished form as the notes for the Linkabit Corp. "Seminar on Convolutional Codes," Jan. 1970.

It begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds to a natural development of the maximum likelihood decoder. The relative distances among codewords are then determined by means of the generating function (or transfer function) of the code state diagram. This in turn leads to the evaluation of coded communication system performance on any memoryless channel. Performance is first evaluated for the specific cases of the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel with biphase (or quadriphase) modulation, and finally generalized to other memoryless channels. New results are obtained for the evaluation of specific codes (by the generating function technique), rather than the ensemble average of a class of codes, as had been done previously, and for bit error probability, as distinguished from event error probability.

The previous ensemble average results are then extended to bit error probability bounds for the class of time-varying convolutional codes by means of a generalized generating function approach; explicit results are obtained for the limiting case of a very noisy channel and compared with the corresponding results for block codes. Finally, practical considerations concerning finite memory, metric representation, and synchronization are discussed. Further and more explicit details on these problems and detailed results of performance analysis and simulation are given in the paper by Heller and Jacobs [24].

While sequential decoding is not treated explicitly in this paper, the fundamentals and techniques presented here lead naturally to an elegant tutorial presentation of this subject, particularly if, following Jelinek [18], one begins with the stack sequential decoding algorithm proposed independently by Jelinek and Zigangirov [7], which is far simpler to describe and understand than the original sequential algorithms. Such a development, which proceeds from maximum likelihood decoding to sequential decoding, exploiting the similarities in performance and analysis, has been undertaken by Forney [22]. Similarly, the potentials and limitations of feedback decoders can be better understood with the background of the fundamental decoding-invariant convolutional code properties previously mentioned, as demonstrated, for example, by the recent work of Morrissey [15].

II. CODE REPRESENTATION

A convolutional encoder is a linear finite-state machine consisting of a K-stage shift register and n linear algebraic function generators. The input data, which is usually, though not necessarily, binary, is shifted along the register b bits at a time. An example with K = 3, n = 2, b = 1 is shown in Fig. 1.

Fig. 1. Convolutional coder for K = 3, n = 2, b = 1.

The binary input data and output code sequences are indicated on Fig. 1. The first three input bits, 0, 1, and 1, generate the code outputs 00, 11, and 01, respectively. We shall pursue this example to develop various representations of convolutional codes and their properties. The techniques thus developed will then be shown to generalize directly to any convolutional code.

It is traditional and instructive to exhibit a convolutional code by means of a tree diagram as shown in Fig. 2.

Fig. 2. Tree-code representation for coder of Fig. 1.

If the first input bit is a zero, the code symbols are those shown on the first upper branch, while if it is a one, the output code symbols are those shown on the first lower branch. Similarly, if the second input bit is a zero, we trace the tree diagram to the next upper branch, while if it is a one, we trace the diagram downward. In this manner all 32 possible outputs for the first five inputs may be traced.

From the diagram it also becomes clear that after the first three branches the structure becomes repetitive. In fact, we readily recognize that beyond the third branch the code symbols on branches emanating from the two nodes labeled a are identical, and similarly for all the
identically labeled pairs of nodes. The reason for this is obvious from examination of the encoder. As the fourth input bit enters the coder at the right, the first data bit falls off on the left end and no longer influences the output code symbols. Consequently, the data sequences 100xy··· and 000xy··· generate the same code symbols after the third branch and, as is shown in the tree diagram, both nodes labeled a can be joined together.

This leads to redrawing the tree diagram as shown in Fig. 3. This has been called a trellis diagram [12], since a trellis is a tree-like structure with remerging branches. We adopt the convention here that code branches produced by a "zero" input bit are shown as solid lines and code branches produced by a "one" input bit are shown dashed.

Fig. 3. Trellis-code representation for coder of Fig. 1.

The completely repetitive structure of the trellis diagram suggests a further reduction in the representation of the code to the state diagram of Fig. 4. The "states" of the state diagram are labeled according to the nodes of the trellis diagram. However, since the states correspond merely to the last two input bits to the coder, we may use these bits to denote the nodes or states of this diagram.

Fig. 4. State-diagram representation for coder of Fig. 1.

We observe finally that the state diagram can be drawn directly by observing the finite-state machine properties of the encoder and particularly the fact that a four-state directed graph can be used to represent uniquely the input-output relation of the eight-state machine. For the nodes represent the previous two bits while the present bit is indicated by the transition branch; for example, if the encoder (machine) contains 011, this is represented in the diagram by the transition from state b = 01 to state d = 11 and the corresponding branch indicates the code symbol outputs 01.
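Since Fig. 1 itself is not reproduced in this transcription, a short sketch may help make the encoder concrete. The tap connections assumed below (first adder on all three stages, second adder on the first and third) are a hypothesis chosen to be consistent with the stated outputs 00, 11, 01 for inputs 0, 1, 1:

```python
# Hypothetical sketch of the Fig. 1 encoder (K = 3, n = 2, b = 1).
# Tap connections are assumed, not taken from the (missing) figure.

def encode(bits):
    s1 = s2 = 0                      # the two older register stages (the "state")
    out = []
    for u in bits:                   # shift one input bit in at a time (b = 1)
        c1 = u ^ s1 ^ s2             # first mod-2 adder: stages 1, 2, 3
        c2 = u ^ s2                  # second mod-2 adder: stages 1, 3
        out.append((c1, c2))         # n = 2 code symbols per branch
        s1, s2 = u, s1               # shift the register
    return out

print(encode([0, 1, 1]))             # -> [(0, 0), (1, 1), (0, 1)]
```

Note that only the last two input bits constitute the state, which is exactly why the four-state diagram of Fig. 4 represents the eight-state machine.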

III. MINIMUM DISTANCE DECODER FOR BINARY SYMMETRIC CHANNEL

On a BSC, errors which transform a channel code symbol 0 to 1 or 1 to 0 are assumed to occur independently from symbol to symbol with probability p. If all input (message) sequences are equally likely, the decoder which minimizes the overall error probability for any code, block or convolutional, is one which examines the error-corrupted received sequence $y_1 y_2 \cdots y_j \cdots$ and chooses the data sequence corresponding to the transmitted code sequence $x_1 x_2 \cdots x_j \cdots$ which is closest to the received sequence in the sense of Hamming distance; that is, the transmitted sequence which differs from the received sequence in the minimum number of symbols.

Referring first to the tree diagram, this implies that we should choose that path in the tree whose code sequence differs in the minimum number of symbols from the received sequence. However, recognizing that the transmitted code branches remerge continually, we may equally limit our choice to the possible paths in the trellis diagram of Fig. 3. Examination of this diagram indicates that it is unnecessary to consider the entire received sequence (which conceivably could be thousands or millions of symbols in length) at one time in deciding upon the most likely (minimum distance) transmitted sequence. In particular, immediately after the third branch we may determine which of the two paths leading to node or state a is more likely to have been sent. For example, if 010001 is received, it is clear that this is at distance 2 from 000000 while it is at distance 3 from 111011, and consequently we may exclude the lower path into node a. For, no matter what the subsequent received symbols will be, they will affect the distances only over subsequent branches after these two paths have remerged, and consequently in exactly the same way. The same can be said for pairs of paths merging at the other three nodes after the third branch. We shall refer to the minimum distance path of the two paths merging at a given node as the "survivor." Thus it is necessary only to remember which was the minimum distance path from the received sequence (or survivor) at each node, as well as the value of that minimum distance. This is necessary because at the next node level we must compare the two branches merging at each node level, which were survivors at the previous level for different nodes; e.g., the comparison at node a after the fourth branch is among the survivors of comparisons at nodes a and c after the third branch. For example, if the received sequence over the first four branches is 01000111, the survivor at the third node level for node a is 000000 with distance 2 and at node c it is 110101, also with distance 2. In going from the third node level to the fourth, the received sequence agrees precisely with the survivor from c but has distance 2 from the survivor from a. Hence the survivor at node a of the fourth level is the data sequence 1100, which produced the code sequence 11010111, which is at (minimum) distance 2 from the received sequence.

In this way we may proceed through the received sequence and at each step for each state preserve one surviving path and its distance from the received sequence, which is more generally called the metric. The only difficulty which may arise is the possibility that in a given comparison between merging paths the distances or metrics are identical. Then we may simply flip a coin, as is done for block codewords at equal distances from the received sequence. For even if we preserved both of the equally valid contenders, further received symbols would affect both metrics in exactly the same way and thus not further influence our choice.

This decoding algorithm was first proposed by Viterbi [9] in the more general context of arbitrary memoryless channels. Another description of the algorithm can be obtained from the state-diagram representation of Fig. 4. Suppose we sought that path around the directed state diagram, arriving at node a after the kth transition, whose code symbols are at a minimum distance from the received sequence. But clearly this minimum distance path to node a at time k can be only one of two candidates: the minimum distance path to node a at time k - 1 and the minimum distance path to node c at time k - 1. The comparison is performed by adding the new distance accumulated in the kth transition by each of these paths to their minimum distances (metrics) at time k - 1.

It appears thus that the state diagram also represents a system diagram for this decoder. With each node or state we associate a storage register which remembers the minimum distance path into the state after each transition, as well as a metric register which remembers its (minimum) distance from the received sequence. Furthermore, comparisons are made at each step between the two paths which lead into each node. Thus four comparators must also be provided.

There remains only the question of truncating the algorithm and ultimately deciding on one path rather than four. This is easily done by forcing the last two input bits to the coder to be 00. Then the final state of the code must be a = 00 and consequently the ultimate survivor is the survivor at node a, after the insertion into the coder of the two dummy zeros and transmission of the corresponding four code symbols. In terms of the trellis diagram this means that the number of states is reduced from four to two by the insertion of the first zero and to a single state by the insertion of the second. The diagram is thus truncated in the same way as it was begun.

We shall proceed to generalize these code representations and the optimal decoding algorithm to general convolutional codes and arbitrary memoryless channels, including the Gaussian channel, in Sections V and VI. However, first we shall exploit the state diagram further to determine the relative distance properties of binary convolutional codes.
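The survivor-and-metric bookkeeping just described fits in a few lines. The following is a minimal sketch, not the register-and-comparator organization of the text: it assumes the same tap connections as the encoder sketch of Section II and breaks metric ties arbitrarily rather than by coin flipping.

```python
# Minimal minimum-distance (Viterbi) decoder for the K = 3, R = 1/2
# code of Fig. 1 on the BSC. State = last two input bits.

def viterbi_bsc(received):           # received: list of (bit, bit) branch pairs
    def branch(u, s1, s2):           # same assumed taps as the encoder sketch
        return (u ^ s1 ^ s2, u ^ s2)
    metric = {(0, 0): 0}             # start in state a = 00
    path = {(0, 0): []}
    for y in received:
        new_metric, new_path = {}, {}
        for (s1, s2), m in metric.items():
            for u in (0, 1):         # two branches leave every state
                c = branch(u, s1, s2)
                d = m + (c[0] ^ y[0]) + (c[1] ^ y[1])   # Hamming metric
                nxt = (u, s1)
                if nxt not in new_metric or d < new_metric[nxt]:
                    new_metric[nxt], new_path[nxt] = d, path[(s1, s2)] + [u]
        metric, path = new_metric, new_path
    best = min(metric, key=metric.get)      # survivor with minimum distance
    return path[best], metric[best]

# The example of the text: received 01 00 01 11 over four branches.
print(viterbi_bsc([(0, 1), (0, 0), (0, 1), (1, 1)]))  # -> ([1, 1, 0, 0], 2)
```

Run on the received sequence of the example, it returns the data sequence 1100 at distance 2, the survivor cited in the text.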

IV. DISTANCE PROPERTIES OF CONVOLUTIONAL CODES

We continue to pursue the example of Fig. 1 for the sake of clarity; in the next section we shall easily generalize the results. It is well known that convolutional codes are group codes. Thus there is no loss in generality in computing the distance from the all zeros codeword to all the other codewords, for this set of distances is the same as the set of distances from any specific codeword to all the others.

For this purpose we may again use either the trellis diagram or the state diagram. We first of all redraw the trellis diagram in Fig. 5, labeling the branches according to their distances from the all zeros path. Now consider all the paths that merge with the all zeros for the first time at some arbitrary node j.

Fig. 5. Trellis diagram labeled with distances from all zeros path.

It is seen from the diagram that of these paths there will be just one path at distance 5 from the all zeros path, and this diverged from it three branches back. Similarly there are two at distance 6 from it, one which diverged 4 branches back and the other which diverged 5 branches back, and so forth. We note also that the input bits for the distance 5 path are 00···0100 and thus differ in only one input bit from the all zeros, while the distance 6 paths are 00···01100 and 00···010100 and thus each differs in 2 input bits from the all zeros path. The minimum distance, sometimes called the minimum "free" distance, among all paths is thus seen to be 5. This implies that any pair of channel errors can be corrected, for two errors will cause the received sequence to be at distance 2 from the transmitted (correct) sequence, but it will be at least at distance 3 from any other possible code sequence. It appears that with enough patience the distance of all paths from the all zeros (or any arbitrary) path can be so determined from the trellis diagram.

However, by examining instead the state diagram we can readily obtain a closed form expression whose expansion yields directly and effortlessly all the distance information. We begin by labeling the branches of the state diagram of Fig. 4 either $D^2$, $D$, or $D^0 = 1$, where the exponent corresponds to the distance of the particular branch from the corresponding branch of the all zeros path. Also we split open the node a = 00, since circulation around this self-loop simply corresponds to branches of the all zeros path, whose distance from itself is obviously zero. The result is Fig. 6. Now, as is clear from examination of the trellis diagram, every path which arrives at state a = 00 at node level j must have at some previous node level (possibly the first) originated at this same state a = 00. All such paths can be traced on the modified state diagram. Adding branch exponents, we see that path a b c a is at distance 5 from the correct path, paths a b d c a and a b c b c a are both at distance 6, and so forth, for the generating functions of the output sequence weights of these paths are $D^5$ and $D^6$, respectively.

Fig. 6. State diagram labeled according to distance from all zeros path.

Now we may evaluate the generating function of all paths merging with the all zeros at the jth node level simply by evaluating the generating function of all the weights of the output sequences of the finite-state machine.² The result in this case is

$$T(D) = \frac{D^5}{1-2D} = D^5 + 2D^6 + 4D^7 + \cdots + 2^k D^{k+5} + \cdots \qquad (1)$$

This verifies our previous observation and in fact shows that among the paths which merge with the all zeros at a given node there are $2^k$ paths at distance k + 5 from the all zeros.

² Alternatively, this can be regarded as the transfer function of the diagram regarded as a signal flow graph.

Of course, (1) holds for an infinitely long code sequence; if we are dealing with the jth node level, we must truncate the series at some point. This is most easily done by considering the additional information indicated in the modified state diagram of Fig. 7.

The L terms will be used to determine the length of a given path; since each branch has an L, the exponent of the L factor will be augmented by one every time a branch is passed through. The N term is included only if that branch transition was caused by an input data "one," corresponding to a dotted branch in the trellis diagram. The generating function of this augmented state diagram is then

$$T(D,L,N) = \frac{D^5L^3N}{1 - DL(1+L)N} = D^5L^3N + D^6L^4(1+L)N^2 + D^7L^5(1+L)^2N^3 + \cdots + D^{k+5}L^{k+3}(1+L)^kN^{k+1} + \cdots \qquad (2)$$

Thus we have verified that of the two distance 6 paths, one is of length 4 and the other is of length 5, and both differ in 2 input bits from the all zeros.³ Also, of the distance 7 paths, one is of length 5, two are of length 6, and one is of length 7; all four paths correspond to input sequences with three ones. If we are interested in the jth node level, clearly we should truncate the series such that no terms of power greater than $L^j$ are included.

³ Thus if the all zeros was the correct path and the noise causes us to choose one of the incorrect paths, two bit errors will be made.

We have thus fully determined the properties of all paths in the convolutional code. This will be useful later in evaluating error probability performance of codes used over arbitrary memoryless channels.

Fig. 7. State diagram labeled according to distance, length, and number of input ones.

V. GENERALIZATION TO ARBITRARY CONVOLUTIONAL CODES

The generalization of these techniques to arbitrary binary-tree (b = 1) convolutional codes is immediate. That is, a coder with a K-stage shift register and n mod-2 adders will produce a trellis or state diagram with $2^{K-1}$ nodes or states, and each branch will contain n code symbols. The rate of this code is then

$$R = \frac{1}{n}\ \text{bits/code symbol.}$$

The example pursued in the previous sections had rate R = 1/2. The primary characteristic of the binary-tree codes is that only two branches exit from and enter each node.

If rates other than 1/n are desired we must make b > 1, where b is the number of bits shifted into the register at one time. An example for K = 2, b = 2, n = 3, and consequently rate R = 2/3, is shown in Fig. 8, and its state diagram is shown in Fig. 9. It differs from the binary-tree codes only in that each node is connected to four other nodes, and for general b it will be connected to $2^b$ nodes. Still, all the preceding techniques, including the trellis and state-diagram generating function analysis, are applicable. It must be noted, however, that the minimum distance decoder must make comparisons among all the paths entering each node at each level of the trellis and select one survivor out of four (or out of $2^b$ in general).

Fig. 8. Coder for K = 2, b = 2, n = 3, and R = 2/3.

Fig. 9. State diagram for code of Fig. 8.

VI. GENERALIZATION OF OPTIMAL DECODER TO ARBITRARY MEMORYLESS CHANNELS

Fig. 10 exhibits a communication system employing a convolutional code. The convolutional encoder is precisely the device studied in the preceding sections. The data sequence is generally binary ($a_j = 0$ or 1) and the code sequence is divided into subsequences, where $\mathbf{x}_j$ represents the n code symbols generated just after the input bit $a_j$ enters the coder: that is, the symbols of the jth branch. In terms of the example of Fig. 1, $a_3 = 1$ and $\mathbf{x}_3 = 01$. The channel output or received sequence is similarly denoted: $\mathbf{y}_j$ represents the n symbols received when the n code symbols of $\mathbf{x}_j$ were transmitted. This model includes the BSC, wherein the $\mathbf{y}_j$ are binary n-vectors each of whose symbols differs from the corresponding symbol of $\mathbf{x}_j$ with probability p and is identical to it with probability 1 - p.

Fig. 10. Communication system employing convolutional codes.

For completely general channels it is readily shown [6], [14] that if all input data sequences are equally likely, the decoder which minimizes the error probability is one which compares the conditional probabilities, also called likelihood functions, $P(\mathbf{y}\,|\,\mathbf{x}^{(m)})$, where $\mathbf{y}$ is the overall received sequence and $\mathbf{x}^{(m)}$ is one of the possible transmitted sequences, and decides in favor of the maximum. This is called a maximum likelihood decoder. The likelihood functions are given or computed from the specifications of the channel. Generally it is more convenient to compare the quantities $\log P(\mathbf{y}\,|\,\mathbf{x}^{(m)})$, called the log-likelihood functions, and the result is unaltered since the logarithm is a monotonic function of its (always positive) argument.

To illustrate, let us consider again the BSC. Here each transmitted symbol is altered with probability p < 1/2. Now suppose we have received a particular N-dimensional binary sequence $\mathbf{y}$ and are considering a possible transmitted N-dimensional code sequence $\mathbf{x}^{(m)}$ which differs in $d_m$ symbols from $\mathbf{y}$ (that is, the Hamming distance between $\mathbf{x}^{(m)}$ and $\mathbf{y}$ is $d_m$). Then since the channel is memoryless (i.e., it affects each symbol independently of all the others), the probability
that this was transformed to the specific received $\mathbf{y}$ at distance $d_m$ from it is

$$P(\mathbf{y}\,|\,\mathbf{x}^{(m)}) = p^{d_m}(1-p)^{N-d_m}$$

and the log-likelihood function is thus

$$\log P(\mathbf{y}\,|\,\mathbf{x}^{(m)}) = -d_m\log\frac{1-p}{p} + N\log(1-p).$$

Now if we compute this quantity for each possible transmitted sequence, it is clear that the second term is constant in each case. Furthermore, since we may assume p < 1/2 (otherwise the roles of 0 and 1 are simply interchanged at the receiver), we may express this as

$$\log P(\mathbf{y}\,|\,\mathbf{x}^{(m)}) = -\alpha d_m - \beta \qquad (3)$$

where $\alpha$ and $\beta$ are positive constants and $d_m$ is the (positive) distance. Consequently, it is clear that maximizing the log-likelihood function is equivalent to minimizing the Hamming distance $d_m$. Thus for the BSC, to minimize the error probability we should choose that code sequence at minimum distance from the received sequence, as we have indicated and done in the preceding sections.

We now consider a more physical, practical channel: the AWGN channel with biphase⁴ phase-shift keying (PSK) modulation. The modulator and optimum demodulator (correlator or integrate-and-dump filter) for this channel are shown in Fig. 11.

⁴ The results are the same for quadriphase PSK with coherent reception. The analysis proceeds in the same way if we treat quadriphase PSK as two parallel independent biphase PSK channels.

Fig. 11. Modem for additive white Gaussian noise PSK modulated memoryless channel.

We use the notation that $x_{jk}$ is the kth code symbol for the jth branch. Each binary symbol (which we take here for convenience to be ±1) modulates the carrier by ±π/2 radians for T seconds. The transmission rate is, therefore, 1/T symbols/second, or b/(nT) = R/T bit/s. The quantity $\varepsilon_s$ is the energy transmitted for each symbol. The energy per bit is, therefore, $\varepsilon_b = \varepsilon_s/R$. The white Gaussian noise is a zero-mean random process of one-sided spectral density $N_0$ W/Hz, which affects each symbol independently. It then follows directly that the channel output symbol $y_{jk}$ is a Gaussian random variable whose mean is $\sqrt{\varepsilon_s}\,x_{jk}$ (i.e., $+\sqrt{\varepsilon_s}$ if $x_{jk} = 1$ and $-\sqrt{\varepsilon_s}$ if $x_{jk} = -1$) and whose variance is $N_0/2$. Thus the conditional probability density (or likelihood) function of $y_{jk}$ given $x_{jk}$ is

$$p(y_{jk}\,|\,x_{jk}) = \frac{\exp[-(y_{jk}-\sqrt{\varepsilon_s}\,x_{jk})^2/N_0]}{\sqrt{\pi N_0}} \qquad (4)$$

The likelihood function for the jth branch of a particular code path $\mathbf{x}_j^{(m)}$ is $\prod_{k=1}^{n} p(y_{jk}\,|\,x_{jk}^{(m)})$, since each symbol is affected independently by the white Gaussian noise, and thus the log-likelihood function for the jth branch is⁵

$$\ln p(\mathbf{y}_j\,|\,\mathbf{x}_j^{(m)}) = \sum_{k=1}^{n}\ln p(y_{jk}\,|\,x_{jk}^{(m)}) = C\sum_{k=1}^{n} y_{jk}\,x_{jk}^{(m)} - D \qquad (5)$$

where C and D are independent of m, and we have used the fact that $(x_{jk}^{(m)})^2 = 1$. Similarly, the log-likelihood function for any path is the sum of the log-likelihood functions for each of its branches.

⁵ We have used the natural logarithm here, but obviously a change of base results merely in a scale factor.

We have thus shown that the maximum likelihood decoder for the memoryless AWGN biphase (or quadriphase) modulated channel is one which forms the inner product between the received (real number) sequence and the code sequence (consisting of ±1) and chooses the path corresponding to the greatest. Thus the metric for this channel is the inner product (5), as contrasted with the distance⁶ metric used for the BSC.

⁶ Actually it is easily shown that maximizing an inner product is equivalent to minimizing the Euclidean distance between the corresponding vectors.
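To summarize Sections III and VI in executable form, the two branch metrics compare as follows; this is a minimal sketch in which the constants α, β of (3) and C, D of (5) are dropped, since they do not affect comparisons.

```python
# Branch metrics for the two channels discussed above. For the BSC the
# metric is (the negative of) Hamming distance, per (3); for the AWGN
# biphase channel it is the inner product (5) of the +/-1 code symbols
# with the real received values.

def bsc_metric(code_bits, received_bits):
    return -sum(c ^ y for c, y in zip(code_bits, received_bits))

def awgn_metric(code_symbols, received):     # code_symbols in {+1, -1}
    return sum(x * y for x, y in zip(code_symbols, received))

print(bsc_metric([0, 0], [0, 1]))            # -> -1 (distance 1)
print(awgn_metric([+1, -1], [0.9, -1.2]))    # -> 2.1
```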

For convolutional codes the structure of the code paths was described in Sections II-V. In Section III the optimum decoder was derived for the BSC. It now becomes clear that if we substitute the inner product metric $\sum_k y_{jk}x_{jk}^{(m)}$ for the distance metric used for the BSC, all the arguments used in Section III for the latter apply equally to this Gaussian channel. In particular, the optimum decoder has a block diagram represented by the code state diagram. At step j the stored metric for each state (which is the maximum of the metrics of all the paths leading to this state at this time) is augmented by the branch metrics for branches emanating from this state. The comparisons are performed among all pairs (or in general sets of $2^b$) of branches entering each state, and the maxima are selected as the new most likely paths. The history (input data) of each new survivor must again be stored, and the decoder is then ready for step j + 1.

Clearly, this argument generalizes to any memoryless channel, and we must simply use the appropriate metric $\ln P(\mathbf{y}\,|\,\mathbf{x}^{(m)})$, which may always be determined from the statistical description of the channel. This includes, among others, AWGN channels employing other forms of modulation.⁷

⁷ Although more elaborate modulators, such as multiple FSK or multiphase modulators, might be employed, Jacobs [11] has shown that the most effective as well as the simplest system for wide-band space and satellite channels is the binary PSK modulator considered in the example of this section. We note again that the performance of quadriphase modulation is the same as for biphase modulation, when both are coherently demodulated.

In the next section, we apply the analysis of convolutional code distance properties of Section IV to determine the error probabilities of specific codes on more general memoryless channels.

VII. PERFORMANCE OF CONVOLUTIONAL CODES ON MEMORYLESS CHANNELS

In Section IV we analyzed the distance properties of convolutional codes employing a state-diagram generating function technique. We now extend this approach to obtain tight upper bounds on the error probability of such codes. We shall consider the BSC, the AWGN channel, and more general memoryless channels, in that order. We shall obtain both the first-event error probability, which is the probability that the correct path is excluded (not a survivor) for the first time at the jth step, and the bit error probability, which is the expected ratio of bit errors to total number of bits transmitted.

A. Binary Symmetric Channel

The first-event error probability is readily obtained from the generating function T(D) [(1) for the code of Fig. 1, which we shall again pursue for demonstrative purposes]. We may assume, without loss of generality, since we are dealing with group codes, that the all zeros path was transmitted. Then a first-event error is made at the jth step if this path is excluded by selecting another path merging with the all zeros at node a at the jth level.

Now suppose that the previous-level survivors were such that the path compared with the all zeros at step j is the path whose data sequence is 00···0100, corresponding to nodes a···a a b c a (see Fig. 4). This differs from the correct (all zeros) path in five symbols. Consequently an error will be made in this comparison if the BSC caused three or more errors in these particular five symbols. Hence the probability of an error in this specific comparison is

$$P_5 = \sum_{e=3}^{5}\binom{5}{e}p^e(1-p)^{5-e}. \qquad (6)$$

On the other hand, there is no assurance that this particular distance 5 path will have previously survived so as to be compared with the correct path at the jth step. If either of the distance 6 paths were compared instead, then four or more errors in the six different symbols will definitely cause an error in the survivor decision, while three errors will cause a tie which, if resolved by coin flipping, will result in an error only half the time. Then the probability, if this comparison is made, is

$$P_6 = \tfrac{1}{2}\binom{6}{3}p^3(1-p)^3 + \sum_{e=4}^{6}\binom{6}{e}p^e(1-p)^{6-e}. \qquad (7)$$

Similarly, if the previously surviving paths were such that a distance k path is compared with the correct path at the jth step, the resulting error probability is

$$P_k = \begin{cases}\displaystyle\sum_{e=(k+1)/2}^{k}\binom{k}{e}p^e(1-p)^{k-e}, & k\ \text{odd}\\[2ex] \displaystyle\tfrac{1}{2}\binom{k}{k/2}p^{k/2}(1-p)^{k/2}+\sum_{e=k/2+1}^{k}\binom{k}{e}p^e(1-p)^{k-e}, & k\ \text{even.}\end{cases} \qquad (8)$$

Now at step j, since there is no simple way of determining previous survivors, we may overbound the probability of a first-event error by the sum of the error probabilities for all possible paths which merge with the correct path at this point. Note this union bound is indeed an upper bound, because two or more such paths may both have distance closer to the received sequence than the correct path (even though only one has survived to this point) and thus the events are not disjoint. For the example with generating function (1) it follows that the first-event error probability⁸ is bounded by

$$P_E < P_5 + 2P_6 + 4P_7 + \cdots + 2^kP_{k+5} + \cdots \qquad (9)$$

where $P_k$ is given by (8).

⁸ We are ignoring the finite length of the path, but the expression is still valid since it is an upper bound.

In Section VII-C it will be shown that (8) can be upper bounded by (see (39))

$$P_k < 2^kp^{k/2}(1-p)^{k/2}. \qquad (10)$$

Using this, the first-event error probability bound (9)

can be more loosely bounded by

$$P_E < \sum_{k=5}^{\infty}2^{k-5}\,2^kp^{k/2}(1-p)^{k/2} = T(D)\big|_{D=2\sqrt{p(1-p)}} \qquad (11)$$

where T(D) is just the generating function of (1).

It follows easily that for a general binary-tree (b = 1) convolutional code with generating function

$$T(D) = \sum_{k=d}^{\infty}a_kD^k \qquad (12)$$

the first-event error probability is bounded by the generalization of (9)

$$P_E < \sum_{k=d}^{\infty}a_kP_k \qquad (13)$$

where $P_k$ is given by (8), and more loosely upper bounded by the generalization of (11)

$$P_E < T(D)\big|_{D=2\sqrt{p(1-p)}}. \qquad (14)$$

Whenever a decision error occurs, one or more bits will be incorrectly decoded. Specifically, those bits in which the path selected differs from the correct path will be incorrect. If only one error were ever made in decoding an arbitrarily long code path, the number of bits in error in this incorrect path could easily be obtained from the augmented generating function T(D, N) (such as given by (2) with factors in L deleted), for the exponents of the N factors indicate the number of bit errors for the given incorrect path arriving at node a at the jth level. After the first error has been made, the incorrect paths no longer will be compared with a path which is overall correct, but rather with a path which has diverged from the correct path over some span of branches (see Fig. 12).

Fig. 12. Example of decoding decision after initial error has occurred.

If the correct path x has been excluded by a decision error at step j in favor of path x', the decision at step j + 1 will be between x' and x''. Now the (first-event) error probability of (13) or (14) is for a comparison, at any step, between path x and any other path merging with it at that step, including path x'' in this case. However, since the metric⁹ for path x' is greater than the metric for x, for on this basis the correct path was excluded at step j, the probability that the path x'' metric exceeds the path x' metric at step j + 1 is less than the probability that path x'' exceeds the (correct) path x metric at this point. Consequently, the probability of a new incorrect path being selected after a previous error has occurred is upper bounded by the first-event error probability at that step.

⁹ Negative distance from the received sequence for the BSC, but clearly this argument generalizes to any memoryless channel.

Moreover, when a second error follows closely after a first error, it often occurs (as in Fig. 12) that the erroneous bit(s) of path x'' overlap the erroneous bit(s) of path x'. With this in mind, we now show that for a binary-tree code, if we weight each term of the first-event error probability bound at any step by the number of erroneous bits for each possible erroneous path merging with the correct path at that node level, we upper bound the bit error probability. For a given step decision corresponds to decoder action on one more bit of the transmitted data sequence; the first-event error probability union bound with each term weighted by the corresponding number of bit errors is an upper bound on the expected number of bit errors caused by this action. Summing the expected number of bit errors over L steps, which as was just shown may result in overestimating through double counting, gives an upper bound on the expected number of bit errors in L branches for arbitrary L. But since the upper bound on the expected number of bit errors is the same at each step, it follows, upon dividing the sum of L equal terms by L, that this expected number of bit errors per step is just the bit error probability $P_B$ for a binary-tree code (b = 1). If b > 1, then we must divide this expression by b, the number of bits encoded and decoded per step.

To illustrate the calculation of $P_B$ for a convolutional code, let us consider again the example of Fig. 1. Its transfer function in D and N is obtained from (2), letting L = 1, since we are not now interested in the lengths of incorrect paths, to be

$$T(D,N) = \frac{D^5N}{1-2DN} = D^5N + 2D^6N^2 + \cdots + 2^kD^{k+5}N^{k+1} + \cdots \qquad (15)$$

The exponents of the factors in N in each term determine the number of bit errors for the path(s) corresponding to that term. Since $T(D) = T(D,N)|_{N=1}$ yields the first-event error probability $P_E$, each of whose terms must be weighted by the exponent of N to obtain $P_B$, it follows that we should first differentiate T(D, N) at N = 1 to obtain

$$\frac{dT(D,N)}{dN}\bigg|_{N=1} = \sum_{k=0}^{\infty}(k+1)2^kD^{k+5} = \frac{D^5}{(1-2D)^2}. \qquad (16)$$

Then from this we obtain, as in (9), that for the BSC

$$P_B < P_5 + 2\cdot 2P_6 + 3\cdot 4P_7 + \cdots + (k+1)2^kP_{k+5} + \cdots \qquad (17)$$

where $P_k$ is given by (8).

If for $P_k$ we use the upper bound (10) we obtain the weaker but simpler bound

$$P_B < \sum_{k=5}^{\infty}(k-4)2^{k-5}[4p(1-p)]^{k/2} = \frac{dT(D,N)}{dN}\bigg|_{N=1,\,D=2\sqrt{p(1-p)}} \qquad (18)$$

More generally, for any binary-tree (b = 1) code used on the BSC, if

$$\frac{dT(D,N)}{dN}\bigg|_{N=1} = \sum_{k=d}^{\infty}c_kD^k \qquad (19)$$

then, corresponding to (17),

$$P_B < \sum_{k=d}^{\infty}c_kP_k \qquad (20)$$

and, corresponding to (18), we have the weaker bound

$$P_B < \frac{dT(D,N)}{dN}\bigg|_{N=1,\,D=2\sqrt{p(1-p)}}. \qquad (21)$$

For a nonbinary-tree code (b ≠ 1), all these expressions must be divided by b.

The results of (14) and (18) will be extended to more general memoryless channels, but first we shall consider one more specific channel of particular interest.
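As a numerical illustration (not part of the original text), the sketch below evaluates the bit error bound for the Fig. 1 code on a BSC both ways: the term-by-term series (17) with the exact pairwise probabilities (8), and the looser closed form (18). The truncation depth is an arbitrary choice; the series converges since 4√(p(1-p)) < 1 for small p.

```python
# Evaluate the BSC bit error bounds (17) and (18) for the K = 3, R = 1/2 code.

from math import comb, sqrt

def P(k, p):                                  # exact pairwise probability, (8)
    if k % 2:
        return sum(comb(k, e) * p**e * (1 - p)**(k - e)
                   for e in range((k + 1) // 2, k + 1))
    return (0.5 * comb(k, k // 2) * (p * (1 - p))**(k // 2)
            + sum(comb(k, e) * p**e * (1 - p)**(k - e)
                  for e in range(k // 2 + 1, k + 1)))

def pb_series(p, terms=40):                   # (17): sum of (k+1) 2^k P_{k+5}
    return sum((k + 1) * 2**k * P(k + 5, p) for k in range(terms))

def pb_closed(p):                             # (18): D^5/(1-2D)^2 at D = 2 sqrt(p(1-p))
    D = 2 * sqrt(p * (1 - p))
    return D**5 / (1 - 2 * D)**2

p = 0.01
print(pb_series(p), pb_closed(p))             # the series bound is the tighter one
```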
B. AWGN Biphase-Modulated Channel

As was shown in Section VI, the decoder for this channel operates in exactly the same way as for the BSC, except that instead of Hamming distance it uses the metric

$$\sum_i\sum_{j=1}^{n}x_{ij}y_{ij}$$

where $x_{ij} = \pm 1$ are the transmitted code symbols, $y_{ij}$ the corresponding received (demodulated) symbols, and j runs over the n symbols of each branch while i runs over all the branches in a particular path. Hence, to analyze its performance we may proceed exactly as in Section VII-A, except that the appropriate pairwise-decision error probabilities $P_k$ must be substituted for those of (6) to (8).

As before, we assume, without loss of generality, that the correct (transmitted) path x has $x_{ij} = +1$ for all i and j (corresponding to the all zeros if the input symbols were 0 and 1). Let us consider an incorrect path x' merging with the correct path at a particular step, which has k negative symbols ($x_{ij}' = -1$) and the remainder positive. Such a path may be incorrectly chosen only if it has a
higher metric than the correct path, i.e.,

$$\sum_i\sum_{j=1}^{n}x_{ij}'y_{ij} \ge \sum_i\sum_{j=1}^{n}x_{ij}y_{ij}$$

or

$$\sum_i\sum_{j=1}^{n}(x_{ij}-x_{ij}')\,y_{ij} \le 0 \qquad (22)$$

where i runs over all branches in the two paths. But since, as we have assumed, the paths x and x' differ in exactly k symbols, wherein $x_{ij} = 1$ and $x_{ij}' = -1$, the pairwise error probability is just the probability that the sum of the k received symbols $y_r$, where r runs over the symbols in which the two paths differ, is negative. Now it was shown in Section VI that the $y_{ij}$ are independent Gaussian random variables of variance $N_0/2$ and mean $\sqrt{\varepsilon_s}\,x_{ij}$, where $x_{ij}$ is the actually transmitted code symbol. Since we are assuming that the (correct) transmitted path has $x_{ij} = +1$ for all i and j, it follows that each $y_r$ has mean $\sqrt{\varepsilon_s}$ and variance $N_0/2$. Therefore, since the k variables $y_r$ are independent and Gaussian, the sum $z = \sum_{r=1}^{k}y_r$ is also Gaussian with mean $k\sqrt{\varepsilon_s}$ and variance $kN_0/2$. Consequently,

$$P_k = \Pr(z \le 0) = \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{k\varepsilon_s}{N_0}} = \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{kR\varepsilon_b}{N_0}} \qquad (23)$$

We recall from Section VI that $\varepsilon_s$ is the symbol energy, which is related to the bit energy by $\varepsilon_s = R\varepsilon_b$, where R = b/n. The bound on $P_E$ then follows exactly as in Section VII-A, and we obtain the same general bound as (13)

$$P_E < \sum_{k=d}^{\infty}a_kP_k \qquad (24)$$

where $a_k$ are the coefficients of

$$T(D) = \sum_{k=d}^{\infty}a_kD^k \qquad (25)$$

and where d is the minimum distance between any two paths in the code. We may simplify this procedure considerably, while loosening the bound only slightly for this channel, by observing that for $x \ge 0$, $y \ge 0$,

$$\mathrm{erfc}\sqrt{x+y} \le \mathrm{erfc}(\sqrt{x})\,e^{-y}. \qquad (26)$$

Consequently, for $k \ge d$, letting $l = k - d$, from (23)

$$P_k \le \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{dR\varepsilon_b}{N_0}}\,\exp\!\left(-\frac{lR\varepsilon_b}{N_0}\right) \qquad (27)$$

whence the bound of (24), using (27), becomes

$$P_E < \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{dR\varepsilon_b}{N_0}}\;e^{dR\varepsilon_b/N_0}\;T(D)\big|_{D=\exp(-R\varepsilon_b/N_0)} \qquad (28)$$

The bit error probability can be obtained in exactly the same way. Just as for the BSC [(19) and (20)], we have that for a binary-tree code

$$P_B < \sum_{k=d}^{\infty}c_kP_k \qquad (29)$$

where $c_k$ are the coefficients of

$$\frac{dT(D,N)}{dN}\bigg|_{N=1} = \sum_{k=d}^{\infty}c_kD^k. \qquad (30)$$

Thus, following the same arguments which led from (24) to (28), we have for a binary-tree code

$$P_B < \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{dR\varepsilon_b}{N_0}}\;e^{dR\varepsilon_b/N_0}\;\frac{dT(D,N)}{dN}\bigg|_{N=1,\,D=\exp(-R\varepsilon_b/N_0)} \qquad (31)$$

For b > 1, this expression must be divided by b.

To illustrate the application of this result we consider the code of Fig. 1 with parameters K = 3, R = 1/2, whose transfer function is given by (15). For this case, since R = 1/2 and $\varepsilon_s = \varepsilon_b/2$, we obtain

$$P_B < \tfrac{1}{2}\,\mathrm{erfc}\sqrt{\frac{5\varepsilon_b}{2N_0}}\;e^{5\varepsilon_b/2N_0}\;\frac{D^5}{(1-2D)^2}\bigg|_{D=\exp(-\varepsilon_b/2N_0)} \qquad (32)$$

Since the number of states in the state diagram grows exponentially with K, direct calculation of the generating function becomes unmanageable for K > 4. On the other hand, a generating function calculation is basically just a matrix inversion (see Appendix I), which can be performed numerically for a given value of D. The derivative at N = 1 can be upper bounded by evaluating the first difference $[T(D,1+\epsilon)-T(D,1)]/\epsilon$ for small $\epsilon$. A computer program has been written to evaluate (31) for any constraint length up to K = 10 and all rates R = 1/n as well as R = 2/3 and R = 3/4. Extensive results of these calculations are given in the paper by Heller and Jacobs [24], along with the results of simulations of the corresponding codes and channels. The simulations verify the tightness of the bounds.

In the next section, these bounding techniques will be extended to more general memoryless channels, from which (28) and (31) can be obtained directly, but without the first two factors. Since the product of the first two factors is always less than one, the more general bound is somewhat weaker.
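For concreteness, (32) is easy to evaluate numerically. The sketch below is a hypothetical evaluation (not the computer program mentioned above); it is valid only where D = exp(-Eb/2N0) < 1/2, i.e., above roughly 1.4 dB.

```python
# Numerical evaluation of the bit error bound (32) for the K = 3, R = 1/2
# code on the biphase-modulated AWGN channel.

from math import erfc, exp, sqrt

def pb_bound(eb_no):                          # eb_no = Eb/N0 as a ratio
    D = exp(-eb_no / 2)                       # D = exp(-R Eb/N0) with R = 1/2
    return 0.5 * erfc(sqrt(2.5 * eb_no)) * exp(2.5 * eb_no) * D**5 / (1 - 2 * D)**2

for db in (4, 5, 6, 7):
    r = 10 ** (db / 10)
    print(f"{db} dB: Pb < {pb_bound(r):.2e}")
```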
C. General Memoryless Channels

As was indicated in Section VI, for equally likely input data sequences, the minimum error probability decoder chooses the path which maximizes the log-likelihood function (metric) $\ln P(\mathbf{y}\,|\,\mathbf{x}^{(m)})$ over all possible paths $\mathbf{x}^{(m)}$. If each symbol is transmitted (or modulates the transmitter) independently of all preceding and succeeding symbols, and the interference corrupts each symbol independently of all the others, then the channel, which includes the modem, is said to be memoryless¹⁰ and the log-likelihood function is

$$\ln P(\mathbf{y}\,|\,\mathbf{x}^{(m)}) = \sum_i\sum_{j=1}^{n}\ln p(y_{ij}\,|\,x_{ij}^{(m)})$$

where $x_{ij}^{(m)}$ is a code symbol of the mth path, $y_{ij}$ is the corresponding received (demodulated) symbol, j runs over the n symbols of each branch, and i runs over the branches in the given path. This includes the special cases considered in Sections VII-A and -B.

¹⁰ Often more than one code symbol in a given branch is used to modulate the transmitter at one time. In this case, provided the interference still affects succeeding branches independently, the channel can still be treated as memoryless, but now the symbol likelihood functions are replaced by branch likelihood functions and (33) is replaced by a single sum over i.

The decoder is the same as for the BSC except for using this more general metric. Decisions are made after each set of new branch metrics has been added to the previously stored metrics. To analyze performance, we must merely evaluate $P_k$, the pairwise error probability for an incorrect path which differs in k symbols from the correct path, as was done for the special channels of Sections VII-A and -B. Proceeding as in (22), letting $x_{ij}$ and $x_{ij}'$ denote symbols of the correct and incorrect paths, respectively, we obtain

$$P_k(\mathbf{x},\mathbf{x}') = \Pr\left\{\sum_r\ln\frac{p(y_r\,|\,x_r')}{p(y_r\,|\,x_r)} \ge 0\right\} \qquad (33)$$

where r runs over the k code symbols in which the paths differ. This probability can be rewritten as

$$P_k(\mathbf{x},\mathbf{x}') = \sum_{\mathbf{y}\in Y_k}P(\mathbf{y}\,|\,\mathbf{x}) \qquad (34)$$

where $Y_k$ is the set of all vectors $\mathbf{y} = (y_1, y_2, \cdots, y_r, \cdots, y_k)$ for which

$$P(\mathbf{y}\,|\,\mathbf{x}') \ge P(\mathbf{y}\,|\,\mathbf{x}). \qquad (35)$$

Then

$$P_k(\mathbf{x},\mathbf{x}') = \sum_{\mathbf{y}\in Y_k}P(\mathbf{y}\,|\,\mathbf{x}) \le \sum_{\mathbf{y}\in Y_k}P(\mathbf{y}\,|\,\mathbf{x})^{1/2}P(\mathbf{y}\,|\,\mathbf{x}')^{1/2} \le \sum_{\mathbf{y}}P(\mathbf{y}\,|\,\mathbf{x})^{1/2}P(\mathbf{y}\,|\,\mathbf{x}')^{1/2} \qquad (36)$$

The first inequality is valid because we are multiplying the summand by a quantity greater than unity,¹² and the second because we are merely extending the sum of positive terms over a larger set.¹¹ Finally, we may break up the k-dimensional sum over $\mathbf{y}$ into k one-dimensional summations over $y_1, y_2, \cdots, y_k$, respectively, and this yields

$$P_k(\mathbf{x},\mathbf{x}') \le \sum_{y_1}\cdots\sum_{y_k}\prod_{r=1}^{k}p(y_r\,|\,x_r)^{1/2}p(y_r\,|\,x_r')^{1/2} = \prod_{r=1}^{k}\sum_{y_r}p(y_r\,|\,x_r)^{1/2}p(y_r\,|\,x_r')^{1/2} \qquad (37)$$

¹¹ This would be the set of all $2^k$ k-dimensional binary vectors for the BSC, and Euclidean k-space for the AWGN channel. Note also that the bound of (36) may be improved for asymmetric channels by changing the two exponents of 1/2 to s and 1 - s, respectively, where 0 < s < 1.

¹² The square root of a quantity greater than one is also greater than one.

To illustrate the use of this bound we consider the two specific channels treated above. For the BSC, $y_r$ is either equal to $x_r$, the transmitted symbol, or to $\bar{x}_r$, its complement. Now $y_r$ depends on $x_r$ through the channel statistics. Thus

$$P(y_r = x_r) = 1-p, \qquad P(y_r = \bar{x}_r) = p. \qquad (38)$$

For each symbol in the set r = 1, 2, ···, k, by definition $x_r \ne x_r'$. Hence for each term in the sum, if $x_r = 0$ then $x_r' = 1$, or vice versa. Hence, whatever $x_r$ and $x_r'$ may be,

$$\sum_{y_r}p(y_r\,|\,x_r)^{1/2}p(y_r\,|\,x_r')^{1/2} = 2\sqrt{p(1-p)}$$

and the product (37) of k identical factors is

$$P_k \le 2^kp^{k/2}(1-p)^{k/2} \qquad (39)$$

for all pairs of correct and incorrect paths. This was used in Section VII-A to obtain the bounds (11) and (21).

For the AWGN channel of Section VII-B we showed that the likelihood functions (probability densities) were

$$p(y\,|\,x) = \frac{\exp[-(y-\sqrt{\varepsilon_s}\,x)^2/N_0]}{\sqrt{\pi N_0}} \qquad (40)$$

where $x = +1$ or $-1$, and the sums in (36) and (37) are replaced by integrals. But if this is the case, then

$$\int_{-\infty}^{\infty}p(y\,|\,x_r)^{1/2}p(y\,|\,x_r')^{1/2}\,dy = e^{-\varepsilon_s/N_0} \qquad (41)$$

where we have used (40) and $x_r^2 = x_r'^2 = 1$. The product of these k identical terms is, therefore,

$$P_k < \exp\!\left(-\frac{k\varepsilon_s}{N_0}\right) \qquad (42)$$

for all pairs of correct and incorrect paths. Inserting these bounds in the general expressions (24) and (29), and using (25) and (30), yields the bounds on first-event error probability and bit error probability

$$P_E < T(D)\big|_{D=\exp(-R\varepsilon_b/N_0)}, \qquad P_B < \frac{dT(D,N)}{dN}\bigg|_{N=1,\,D=\exp(-R\varepsilon_b/N_0)} \qquad (43),\ (44)$$

which are somewhat (though not exponentially) weaker than (28) and (31).

A characteristic feature of both the BSC and the AWGN channel is that they affect each symbol in the same way, independent of its location in the sequence. Any memoryless channel has this property provided it is stationary (statistically time invariant). For a stationary memoryless channel (37) reduces to

$$P_k \le D_0^{\,k} \qquad (45)$$

where¹³

$$D_0 \triangleq \sum_{y}p(y\,|\,x_r)^{1/2}p(y\,|\,x_r')^{1/2} < 1. \qquad (46)$$

¹³ For an asymmetric channel this bound may be improved by changing the two exponents 1/2 to s and 1 - s, respectively, where 0 < s < 1.

While this bound on $P_k$ is valid for all such channels, clearly it depends on the actual values assumed by the symbols $x_r$ and $x_r'$ of the correct and incorrect path, and these will generally vary according to the pairs of paths x and x' in question. However, if the input symbols are binary, x and $\bar{x}$, whenever $x_r = x$, then $x_r' = \bar{x}$,
so that for any input-binary memoryless channel (46) becomes

$$D_0 = \sum_{y}p(y\,|\,x)^{1/2}p(y\,|\,\bar{x})^{1/2} \qquad (47)$$

and consequently

$$P_E < T(D)\big|_{D=D_0} \qquad (48)$$

$$P_B < \frac{dT(D,N)}{dN}\bigg|_{N=1,\,D=D_0} \qquad (49)$$

where $D_0$ is given by (47). Other examples of channels of this type are FSK modulation over the AWGN channel (both coherent and noncoherent) and Rayleigh fading channels.
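A short sketch of the channel parameter (47) for the two input-binary channels above: the Bhattacharyya-type sum for the BSC, and the corresponding integral, already evaluated in closed form in (41), for the biphase AWGN channel. Any code's bounds then follow by substituting D0 into T(D) as in (48)-(49).

```python
# D0 of (47) for the BSC and the biphase-modulated AWGN channel.

from math import exp, sqrt

def d0_bsc(p):
    # sum over y in {0, 1} of sqrt(P(y|x) P(y|x_bar)) = 2 sqrt(p(1-p))
    return sqrt(p * (1 - p)) + sqrt((1 - p) * p)

def d0_awgn(es_no):                           # es_no = symbol energy / N0
    return exp(-es_no)                        # closed form of the integral (41)

print(d0_bsc(0.01), d0_awgn(1.0))
```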
VIII. SYSTEMATIC CONVOLUTIONAL CODES

The term systematic convolutional code refers to a code on each of whose branches one of the code symbols is just the data bit generating that branch. Thus a systematic coder will have its stages connected to only n - 1 adders, the nth being replaced by a direct line from the first stage to the commutator. Fig. 13 shows an R = 1/2 systematic coder for K = 3.

Fig. 13. Systematic convolutional coder for K = 3 and R = 1/2.

It is well known that for group block codes, any nonsystematic code can be transformed into a systematic code which performs exactly as well. This is not the case for convolutional codes. The reason for this is that, as was shown in Section VII, the performance of a code on any channel depends largely on the relative distances between codewords and particularly on the minimum free distance d, which is the exponent of D in the leading term of the generating function. Eliminating one of the adders results in a reduction of d. For example, the maximum free distance code for K = 3 is that of Fig. 13, and this has d = 4, while the nonsystematic K = 3 code of Fig. 1 has minimum free distance d = 5. Table I shows the maximum minimum free distance for systematic and nonsystematic codes for K = 2 through 5. For large constraint lengths the results are even more widely separated. In fact, Bucher and Heller [19] have shown that for asymptotically large K, the performance of a systematic code of constraint length K is approximately the same as that of a nonsystematic code of constraint length K(1 - R). Thus for R = 1/2 and very large K, systematic codes have the performance of nonsystematic codes of half the constraint length, while requiring exactly the same optimal decoder complexity. For R = 3/4, the constraint length is effectively divided by 4.

TABLE I
MAXIMUM-MINIMUM FREE DISTANCE*

    K    Systematic    Nonsystematic
    2        3               3
    3        4               5
    4        4               6
    5        6               7

* We have excluded catastrophic codes (see Section IX); R = 1/2.

IX. CATASTROPHIC ERROR PROPAGATION IN CONVOLUTIONAL CODES

Massey and Sain [13] have defined a catastrophic error as the event that a finite number of channel symbol errors causes an infinite number of data bit errors to be decoded. Furthermore, they showed that a necessary and sufficient condition for a convolutional code to produce catastrophic errors is that all of the adders have tap sequences, represented as polynomials, with a common factor.

In terms of the state diagram, it is easily seen that catastrophic errors can occur if and only if any closed loop path in the diagram has a zero weight (i.e., the exponent of D for the loop path is zero). To illustrate this, we consider the example of Fig. 14.

Fig. 14. Coder displaying catastrophic error propagation.

Assuming that the all zeros is the correct path, the incorrect path a b d d ··· d c a has exactly 6 ones, no matter how many times we go around the self loop d. Thus for a BSC, for example, four channel errors may cause us to choose this incorrect path and consequently make an arbitrarily large number of bit errors (equal to two plus the number of times the self loop is traversed). Similarly, for the AWGN channel this incorrect path, with arbitrarily many corresponding bit errors, will be chosen with probability $\tfrac{1}{2}\,\mathrm{erfc}\sqrt{6\varepsilon_s/N_0}$.

Another necessary and sufficient condition for catastrophic error propagation, recently found by Odenwalder [20], is that any nonzero data path in the trellis or state diagram produces K - 1 consecutive branches with all zero code symbols.

We observe also that for binary-tree (R = 1/n) codes, if each adder of the coder has an even number of connections, then the self loop corresponding to the all ones (data) state will have zero weight, and consequently the code will be catastrophic.

The main advantage of a systematic code is that it can never be catastrophic, since each closed loop must contain at least one branch generated by a nonzero data bit and thus having a nonzero code symbol. Still, it can be shown [23] that only a small fraction of nonsystematic codes is catastrophic (in fact, $1/(2^n - 1)$ for binary-tree R = 1/n codes). We note further that if catastrophic errors are ignored, nonsystematic codes with even larger free distance than those of Table I exist.
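The Massey-Sain condition is directly checkable by computing a polynomial greatest common divisor over GF(2). A minimal sketch, with tap polynomials encoded as integer bitmasks (bit i holding the coefficient of x^i):

```python
# Massey-Sain test: a fixed R = 1/n code is catastrophic exactly when its
# tap polynomials share a common factor of degree >= 1 over GF(2).

def gf2_mod(a, b):                     # remainder of a divided by b over GF(2)
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def catastrophic(*gens):
    g = gens[0]
    for h in gens[1:]:
        g = gf2_gcd(g, h)
    return g.bit_length() > 1          # nontrivial common factor

print(catastrophic(0b111, 0b101))      # assumed Fig. 1 taps (x^2+x+1, x^2+1): False
print(catastrophic(0b011, 0b101))      # 1+x and (1+x)^2 share 1+x: True
```

The second call also illustrates the even-tap observation above: both of those generators have an even number of taps, hence are divisible by 1 + x, and the code is catastrophic.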

X. PERFORMANCE BOUNDS FOR BEST CONVOLUTIONAL CODES FOR GENERAL MEMORYLESS CHANNELS AND COMPARISON WITH BLOCK CODES

We begin by considering the path structure of a binary-tree¹⁴ (b = 1) convolutional code of any constraint length K, independent of the specific coder used. For this purpose we need only determine T(L), the generating function for the state diagram with each branch labeled merely by L, so that the exponent of each term of the infinite series expansion of T(L) determines the length over which an incorrect path differs from the correct path before merging with it at a given node level. (See Fig. 7 and (2) with D = N = 1.)

¹⁴ Although for clarity all results will be derived for b = 1, the extension to b > 1 is direct and the results will be indicated at the end of this section.

After some manipulation of the state-transition matrix of the state diagram of a binary-tree convolutional code of constraint length K, it is shown in Appendix I¹⁵ that

$$T(L) = \frac{L^K(1-L)}{1-2L+L^K} < \frac{L^K}{1-2L} = L^K(1 + 2L + 4L^2 + \cdots + 2^kL^k + \cdots) \qquad (50)$$

where the inequality indicates that more paths are being counted than actually exist. The expression (50) indicates that of the paths merging with the correct path at a given node level there is no more than one of length K, no more than two of length K + 1, no more than three of length K + 2, etc.

¹⁵ This generating function can also be used to obtain error bounds for orthogonal convolutional codes, all of whose branches have the same weight, as is shown in Appendix I.

where To improve on these bounds when R > R,, we must


improveontheunionboundapproachbyobtaininga
single boundontheprobabilitythatanyone of the
fewer than 2' paths whichdifferfrom the correct path
in K -t k branches has a metric higher than the correct,
Note that the random vectors x and y are n dimensional. path a,t a given node level. This bound, first derived by
If each symbol is transmitted independently on a memory-Gallager [5] far block codes, is always less than 2 k times
lesschannel,such as mas the case inthechannels of the bound for each individual path. Letting QK+k L& P r
Sections VII-A and -B, (54) is reduced further to (anyone of 2' incorrectpathmetrics > correct path
cc metric), Gallager [5] has shown that its ensemble average
a0 = - log2 { v z dX):)t.'(YI x)1/2121 (55) for the code ensemble is bounded by
&K+k < ~kp2-(K+k)nEo(p) (5%
where x and y are now scalar random variables associated
witheach codesymbol.Note 'also that because of the where
statisticaluniformity of the code, theresultsare in-
dependent of which path wastransmittedand which
incorrect path we are considering.
Proceeding as in Section VII, it follows that, a union
bound on the ensembleaverage of the 'first-eventerror 0 <p 5 1 (59)
probability is obtainedbysubstituting pKckfor LK+k where p is an arbitrary parameter which we shall choose
in (50). Thus to minimize the bound. It is easily seen that Eo(0) = 0,
while E,(l) = R,, in which case = a k P P , + k , the
ordinaryunionbound of (56). Webound the overall
ensemble first-event error probability by the probability
of the union of these composite eventsgivenby (58).
Thus we find

where we have used the fact that since b = 1, R = l / n


bits/symbol.
T o bound the bit, errorprobability we must weight Clearly (60) reduces to (56) when p = 1.
each term of (56) by the number of bit errors for the Todeterminethebiterrorprobability using this
corresponding incorrect path. This could be done by eval- approach, we must recognize that refers to 2k
uating the transfer function T(L,N ) as in Section VI1 differentincorrect paths,eachwithadifferentnumber
(seealsoAppendix' I ) , but.asimpler approach, which of incorrect bit,s.However,justas wasobserved in
yieldsasimplerboundwhich is nearlyastight, is to deriving (57), an incorrect path whichdiffersfrom the
recognize that an incorrectly chosen path which merges correct path in K + k branchesprior to merging can
with the correct path after K + IC branches can produce produce a t most k + 1 bit errors. Hence weighting the
nomore IC + 1 biterrors. For, any path whichmerges kth term of (60) by k + 1, we obtain
with the correct pathat a givenlevel must begen-
erated by data whichcoincideswith the correct path
data over the lastK - 1 branches prior to merging, since
only in this way can the coder register be filled with the
same bits as t.he correct path, which is the condition for
merging. Hencethenumber of incorrectbitsduetoa Clearly (61) reduces t.o (57) when p = 1.
path whichdiffers fromthecorrectpath in K + IC Before we can interpret the results of (56) , (57), (60),
branchescanbe no greater than K + k - ( K - 1) = and (61) i t isessential that we establishsome of the
k + 1. properties of Eo(p)(0 < p 5 1) definedby (59). It can
Hence we mayoverbound p B by weighting thekth beshown [5], [14] thatforany memorylesschannel,
term of (56) by k + 1, .which results in E o ( p ) is a concave monotonic nondecreasing function as
shown in Fig. 15 with E,(O) = 0 and Eo(1) = e,.
Where the derivative E,,'(p) exists, it decreases with p
and it follows easily from the definition that

(57)
The bounds of (56) 8nd (57) arefinite onlyforrates
R < Ro, and Ro canbeshown.to be always less than 1
the channel capacity. = - I(Xnl Y") 4
n
c
766 IEEE TRANSACTIONS O N COMMUNICATIONS TECHNOLOGY, OCTOBER 1971
LIM EIR)

6-01

Fig. 15. Example of E&) function for general


memoryless R, e R
channel.
Fig. 16. Typical limiting value of exponent of (67).

the mutual information of the channells where Snand Y n


are the channel input and output 'spaces, respectively, for Fig. 15 demonstratesthegraphicaldetermination of
eachbranphsequence.Consequently, i t follows t h a t t o lim6-,oE(R) from Eo(p).
minimize the bounds (60) and ( 6 i ) , we must make p' i 1 It follows frorn the properties of E,,(p) described, that
as 'large as possible t o maximize the exponent of the for R > Rot lim6-o E ' ( R ) decreases from Ro to 0 as R
numerator, but at the same'time'we must ensure that increases from Ro t o C, but that it remains positive for
all rates less than C. The function is shown for a typical
channel in Fig. 16.
It is particularly instruct.ive to obtain specific. bounds,
in the limiting' case, .for the class of "very noisy!' chan-
inorder t o keepthedenominatorpositive.Thussince nels,whichincludes the BSC with p = 1/2 - y where
E O ( l )= R,) 'and E,,(p) < Ro, for p < 1, i t follows that 171 << 1 'and the
biphasemodulated
AWGN with
for R < Ro and sufficientlylarge K we shouldchoose . c ~ / N , )<< 1. Forthisclass of channels i t can be shown
p = 1, or equivalently use the bounds ( 5 6 ) and (57). We [ 5 ] that
may thus combine all the above bounds into the expres-
sions

and consequently R,, = E o (1) = C/2.(FortheBSC,


C = y2/2 In 2 while for the AWGN, C = E ~ / I VIn, , 2.)
For thevery noisychannel,suppose we let p = C/
P R - 1, so that using (68) we obtain Eo(,,) = C - R .
P, <
11 - 2- 6 ( R )]2 Then in the limit 'as 6 -j 0 ( 6 5 ) becomes for a very noisy
channel
where

E(R) = p 0 ,

(Eo(p),
OIR<R,
R, < R < C, 0 <p 5 1
lim $(R) =
'6 -0
{y:
R,
0 S R 5 C/2
C/2 5 R 5 C .
(69)

p/R
This limiting form of E ( R ) is shown in Fig. 17.
Thebounds (63) and (64) arefortheaverageerror
&(a) = - 1, O<R<R, (66) probabilities of the ensemble of codesrelat.ive tothe
Eo(p)/R - P , Ro 5 R < C, 0 <P 5 1. measure induced by random selection of the time-varying
coder t a p sequences. At least.onecode in the ensemble
To minimize the numerators of (63) and (64) for R > Ro must perform better than $he average. Thus the bounds
we should choose p as large as possible, since E j o ( p ) is a (63) and (64) hold for t.hebest time-varyingbinary-
nondecreasing function of p . However, we are limited by tree convolutional coder of constraint length K . Whether
t.he necessity of making S ( R ) > 0 t o keep the denomi- there exists a fixed convolutional code with this perform-
natorfrom.becomingzero. On theotherhand, as the ance is an unsolved problem. However, for small K the
constraint length K becomes very large we may choose results of Section VI1 seem to indicate that these bounds
. .
S(R) = 6 very small. In particular, as 8 approaches 0, are valid also forfixed codes.
(65) approaches ' T o determine'the tightness' of the upper bounds, it. is
useful to have lower bounds for convolutional code error
probabilities, It canbeshown [9] that for all R < C

17 C canbemadeequaltothechannelcapacitybyproperly
choosing the ensemble measure q ( x ) . For an input-binary channel
the random binary convolutional coder described above achieves
this.Otherwise ' further transformation of the branch sequence
a

into a smaller set, of nonbinary sequences is required 191. and o ( K ) + 0 as K+ w. Comparison of the parametric
VITERBICONVOLUTIONALCODES 767
LIM EIRI
Both Eb( R ) and E L b ( R )arefunctions of R which for
all R > 0 are less than the exponents E ( R ) and E L ( R )

‘. for convolutional codes [SI. In particular, for very noisy


channels they both become [5]

c/2

Fig. 17. Limiting values of E ( R ) for very noisy channels

This is plotted as a dotted curve in Fig. 17.


equations (67) with (71), shows that
Thus it is clear by comparing the magnitudes of the
E L @ ) = 1im8-,”E(R) negat.ive exponents of (73) and‘ (64) that, at least for
very noisy channels, a convolutional code performs much
for R > R,, but is greater for low rates. better asymptotically than the corresponding block code
For very noisy channels, i t follows easilyfrom (71) of the same order of complexity. In particular at R =
and (68) that C / 2 , theratio of exponents is 5.8, indicatingthatto
achieve equivalent performance asymptotically theblock
E,(R) = C - R, 0 5 R 5 C. length must be over five times the const.raint length of
Actually, however, tighter lowerbounds for R < C/2 the convolutional code. Similar degrees of relative per-
(Viterbi 191) show that for very noisy channels forma,nce can beshown formore generalmemoryless
channels [ 91.
More significant from a practical viewpoint, for short
constraint lengths also, convolutional codes considerably
outperform block codes of the same order of complexity.
which is precisely theresult of (69) or of Fig. 17. It
follows that, at least,forvery noisy channels,the ex-
ponential bounds are asymptotically exact. XI. PATHMEMORY TRUNCATION METRICQUANTIZATION
AND SYNCHRONIZATION
All the result,s derived in this section can be extended
directly to nonbinary ( b > 1) codes. It iseasily shown A major problem which arises in the implementat.ion
(Viterbi [ 9 ] ) that the same results hold with R = b / n , of a maximumlikelihooddecoder is the length of the
R,, and E o ( p ) multiplied by b , and all event probability path history which must be stored. In our previous dis-
upper bounds multiplied by 2b - 1, and bit probability cussion weignored t.his importantpointandtherefore
upper bounds multipliedby (ab - l ) / b . implicitlyassumed that all past data would be stored.
Clearly, the ensemble of codes considered here is non- Afinal decision was made by forcing the coder intoa
systematic. However, by a modification of the arguments known (all zeros) state. We now remove this impractical
used here, Bucher and Heller [19] restricted the ensem- condit.ion. Suppose we truncate the path memories after
ble to systematic t.ime-varying convolutional codes (i.e., M bits(branches)have been accumulat.ed, bycompar-
codes for which b code symbols of eachbranchcorre- ing all 2K metrics for a maximum and deciding on the
spond to the data which generates the branch) and ob- bit corresponding to that path (out of 2 K ) with the high-
tained all the above results modified only to the extent est metric M branches forward. If M is several times as
that the exponents E ( E ) and ET,( R ) are multiplied by large as K , the additional bit errors introduced in this
1 - R. (See also Section VIII.) way are very few, as. we shall now demonstrate using the
Finally, it. is most revealing to compare the asymptotic asymptotic results of the last section.
’ resultsforthe best convolutional codes of a given con- An additionalbiterrormay occur dueto memory
straint length with the corresponding asymptotic results truncation after M branches, if the bit selected is from
for the best block codes of a given block length. Suppose an incorrect path which differed from the correct path M
t.hat K bits are coded into a block code of length N so branches back and which has a higher metric, but which
that R = K / N bits/code symbol. Then i t can be shown would ultimately be eliminated by the maximum likeli-
(Gallager [5J , Shannon e t al. [8] ) that for the best block hood decoder. But for a binary-tree code there can be no
code, t.he bit error probability is bounded above and be- more 1,han 2$‘ distinct paths which differ from the correct
low by path M branchesback. Of these we need concern our-
selves only with those which have not merged with the
correct path in the intervening nodes. As was originally
where shown byForney [ 1 2 ] , usingthe ensemble arguments
of Section X we may bound the average probability of
this event by [see (58)]
768 IEEE TRANSACTIONS ON COMMUNICATIONS TECHNOLOGY. OCTOBER 1971

T o minimize this bound we shouldmaximize the expo- in performance between optimal and suboptimal metrics
nent E o ( p ) / R - p with respect to p on the unit interval. is significant [ 111.
But this yields exactly E , ( R ) ,the upper bound exponent In a practical system other considerations than error
of (73) for block codes. Thus performancefora given degree of decodercomplexity
oftendictatethe selection of acoding system. Chief
among these are often the synchronization requirements.
Convolutional codes utilizingmaximumlikelihoodde-
where E , ( R ) is the blockcodingexponent. coding areparticularlyadvantageousinthat noblock
Weconclude thereforethatthe memorytruncation synchronization is ever required. For block codes,de-
error is less than the bit error probability bound without coding cannot begin until the initial point of each block
truncation,providedthe bound of (76) is less than the has been located. Practical systems often require more
bound of (64). This will certainly beassured if complexity in the synchronizationsystemthaninthe
decoder. On the other hand, as we have by now amply
illustrated, a maximum likelihood decoder for a convolu-
tional code doesnot’require any blocksynchronization
because the coder is free running (i.e., it performs identi-
Forvery noisychannels we havefrom (69) and (74) cal operations for each successive input bit and does not
or Fig. 17, that require that I< bits be input before generating an out-
put). Furthermore, the decoder does not require knowl-
edge of past inputs to start decoding; it may as well as-
0 I tl _< c/4
sume that all previous bits were zeros. This is not to say
thatinitiallythe decoder will operateas well, inthe
sense of error performance, as if the preceding bits of the

I 1 - R/C
(1 - ’
C/2 < R < C
correct path were known. On the other hand, consider a
decoderwhich startswithaninitiallyknownpathbut
makes an error at some point and excludes the correct
path. Immediately thereafter it will be operating as if it
For example, at R = C / 2 this indicates that it suffices hadjust been turned on withanunknownandincor-
to take M > (5.8)K. rectly chosen previous path history. That this decoder
Anotherproblemfacedby a systemdesigner is the will recover and stop making errors within a finite num-
amount of storage required by the metrics (or log-likeli- ber of branches follows from our previous discussions in
hoodfunctions)foreach of the ZK paths. Fof. a BSC which itwas shown that-, otherthan forcatastrophic
this poses no difficultysince themetric is justthe codes, error sequences are always finite. Hence our ini-
Hammingdistance which is at most n, thenumber of tiallyunsynchronized decoder will operatejustlikea
code symbols, per branch. For the AWGN, on the other decoder which has just made an error and will thus al-
hand, the optimum metric is a real number, the analog ways achieve synchronization and generally will produce
output of a correlator,matchedfilter, or integrate-and- correct decisions after a limited number of initial errors.
dump circuit. Since digital storage is generally required, Simulations have demonstrated that synchronization gen-
it is necessary t o quantize this analog metric. However, erally takes no more than four or five constraint lengths
once the components yjk of the optimum metric of (5), of received symbols.
whicharethecorrelatoroutputs,havebeenquantized Alt.hough, as we have just shown, branch synchroniza-
to Q levels, the channel is no longer an AWGN channel. tion is not required, code symbol synchronization within
For biphase modulation, for example, it becomes a binary a branch is necessary. Thus, for example, for a binary-
input Q-ary output discretememorylesschannel,whose tree rate R = 1/2 code, we must resolve the two-way
transition probabilities are readily calculated asa function ambiguity asto whereeachtwocode-symbol branch
of the energy-to-noise density and
the
quantization begins. This iscalled node synchronization. Clearly if
levels. The optimum metric is not obtained by replacing we make the wrong decisions, errors will constantly be
yi, by its quantized value &(yjk) in (5) but rather it is madethereafter.However,thissituationcaneasily be
the log-likelihood function log P ( y I x c m ) )for the binary- detected because the mismatch will cause all the path
input Q-ary-output channel. metrics to be small, sincein fact there will not be any
Nevertheless,extensivesimulation [24] indicates that correct path in this case. We can thus detect this event
for 8-level quantization even use of the suboptimal metric and change our decision as to node synchronization (cf.
ck Q ( ~ J , ~ ) Zresults
~ ~ (in
~ ) a degradat,ion of nomore Heller and Jacobs [24]). Of course, for an R .= l / n code,
than 0.25 dB relative to the maximumlikelihood decoder we may have to repeat our choice n times, once for each
for the unquantized AWGN, and that use of the optimum of the symbols on a branch, but since n represents the
metric isonlynegligiblysuperior to this. However, t.his redundancy factor or bandwidth expansion, practical sys-
is not the case for sequential decoding,where the difference tems rarely use n > 4.
VITERBI : CONVOLUTIONAL CODES 769

XII. OTHERDECODING FOR CONVOLU-


ALGORITHMS quantization (8 or more levels-3 or more bits). On the
TIONAL CODES other hand, with maximum likelihood decoding, by em-
ploying
parallel
a implementation, short
constraint
This paper has treated primarily maximum likelihood length codes ( K
decoding of convolutional codes. The reason for this was
< 6 ) can be decoded a t very high data
rates (10 to 100 Mbits/s) even with soft quantiz at'ion.
two-fold: 1) maximumlikelihooddecodingisclosely Inaddition,theinsensitivitytometricaccuracyand
relatedtothestructure of convolutional codes andits simplicity of synchronization render maximum likelihood
considerationenhancesourunderstanding of theulti- decoding generally preferable when moderate error prob-
mate capabilities,performance,andlimitation of these abilities are sufficient. In particular, since sequential de-
codes; 2) forreasonablyshortconstraintlengths ( K < coding is limited by the overflow problem to operate at
10) its implementation is quite feasible'* and worthwhile code rates somewhat below E o , it appears that for the
because of itsoptimality.Furthermorefor K 5 6 , the AWG'N the crossover point above which maximum like-
complexity of maximum likelihood decoding is sufficiently lihood decoding is preferable to sequential decoding oc-
limit,ed that a completely parallel implementation (sepa- curs a t values of P, somewhere between and
rate metric calculators) ispossible. This minimizes the depending on the transmitted data rate. As the data rate
decoding time per bit and affords the possibility of ex- increases the P, crossover point decreases.
tremely high decoding speeds [24]. A third technique for decoding convolutional codes is
Longerconstraintlengthsarerequiredforextremely known as feedbackdecoding, withthresholddwoding
low errorprobabilities a t high rates. Since thestorage [3] asasubclass.Afeedbackdecoderbasicallymakes
andcomputationalcomplexityareproportionalto 2R, a decision on a particular bit or branch in the decoding
maximumlikelihooddecoders become impracticalfor tree or trellis based on the received symbols for a limited
K > 10. At this point sequential decoding [2], [ 4 ] , [ 6 . ] number of branches beyond t.his point. Even though the
becomes attractive. This is an algorithm which sequen- decision isirrevocable,forlimitedconstraintlengths
tially searches the code tree in an at.telnpt to find a path (which are appropriate considering the limitednumber
whose metric rises faster than some predetermined, but of branches involved in a decision) errors will propagate
variable, threshold. Since the difference between the cor- only for moderate lengths. When transmission is over a
rect path metric and any incorrect path metric increases binary symmetric channel, by employing only codes with
with constraint length, for large I< generally the correct certain algebraic (orthogonal) properties, the decision on
path will be foundby this algorithm. The main draw- a given branch can be based on a linear function of the
back is that the number of incorrect path branches, and receivedsymbols,called the syndrome, whose dimen-
consequently the computationcomplexity,isarandom sionality is equal to the number of branches involved in
variabledepending on thechannel noise. For R < Rot the decision. One particularlysimple decision criterion
it is shown that the average numberof incorrect branches based on this syndrome, referred t o as threshold decod-
searched per decoded bit is bounded [ 6 ] ,while for R > ing, is mechanizable in a very inexpensive manner. How-
R,, it is not; hence R,) is called the computat,ional cutoff ever, feedback decoders in general, and threshold decod-
rate. To make storage requirements reasonable, it is nec- ersinparticular,haveanerror-correctingcapability
essary to makethe decodingspeed (branches/s) some- equivalenttoveryshortconstraint'length codes and
whatlargerthanthe bit. rate,thussomewhatlimiting consequently do not compare favorably with the perform-
the maximum bit rate capability. Also, even though the ance of maximum likelihood or sequential decoding.
.average number of branches searched per bit is finite, i t However, feedback
decoders are
particularly well
may sometimes become very large, resulting in a storage suited to correcting error bursts which may occur in fad-
overflow and consequently relatively long sequences be- ing channels. Burst errors are generally best handled by
ing erased. The stack sequential decoding algorithm [ 7 ], usinginterleavedcodes: that is,employing L convolu-
1181 provides a very simple and elegant presentation
thekey concepts insequentialdecoding,althoughthe
of tional codes so that the jth, (L + +
j ) t h (2L j)th, etc.,
bitsare encoded into one code foreach j = 0, 1, . * ,
Fano algorithm [4] is generally preferable practically. L - 1. This will cause any burst of length less than L
For a number of reasons, including buffer size require- to be broken up into random errors for the L independ-
ments, comput.ation speed, and metric sensitivity,sequen- entlyoperating decoders. Interleavingcan be achieved
tial decoding of data transmitted at rates above about bysimplyinserting L - 1 stagedelay linesbetween
100 K bits/s is practical only for hard-quantized binary stages of the convolutional encoder; the resulting single
received data (that is, for channels in which a hard deci- encoder then generat.es the L interleaved codes. The sig-
sion -0 or 1- is made for each demodulated symbol). nificant advantage of a feedback or threshold decoder is
For thebiphase modulat.ed AWGNchannel, of course, that the same technique can be employed in the decoder
hardquantization (2 levels or 1 bit)resultsinan effi- resulting in a single (time-shared) decoder rather than L
ciency loss of approximately 2 dB comparedwithsoft decoders,providingfeasibleimplementations forhard-
quantizedchannels, even forprotectionagainsterror
18 Performing metric calculations and comparisons serially. bursts of thousands of bits. Details of feedback decoding
770 IEEE TRANSACTIONS ON COMMUNICATIONS TECHNOLOGY, OCTOBER 1971

aretreatedextensivelyinMassey[3],Gallager [ 141, times the first), we obtain finally a 2"' - 1 dimensional


and Lucky e t al. 1161. matrix equation, which for K = 4 is

GENERATING
FUNCTIONFOR STRUCTURE
CONVOLUTIONAL CODE
BOUNDS
APPENDIXI

OF A BINARY-TREE
K A N D ERROR
FOR ARBITRARY
FOR ORTHOGONAL CODES L
-L

L
1
0
-L
1 - NL
] * P I n 1 ] = [

X,,,
:1. (83)

Wederivehere t.he distance-invariant (D = 1) gen- Notethat (83) is thesameas(78)for K reducedby


erat,ingfunction T ( L , N ) for anybinarytree ( b = 1) unity, but with modifications in two places, both in the
convolutionalcode of arbitrary constraint length K . It first row; namely, the first component on the right side
is most convenient in the general case to begin with the is squared, and the middle term of the first rowis re-
finite-state machine state-transition matrix for the linear ducedby an amount NL'. Although we have given the
equations among t.he state (node) variables. We exhibit explicit result only for K = 4, it is easily seen t o be valid
this in terms of N and L for a I< = 4 code as follows: for any K .

1 0 0 -NL 0 0 0 1

-L 1 0 0 -L 0 0
-NL 0 1 0 -NL 0 0

JT-'J=l,]'
0 -L' 0 1 0 -L
0 1
0
0
-NL
0
O
-0L
-NL
0
0
0
0
1
- N0L :
-L
1-NL

Thispatterncan be easilyseen t o generalizetoa Since in all respects, except these two, the matrix after
2K-1 - 1 dimensional square matrix of t,his form for any thissequence of reductionsisthesame as the original
binary-tree code of constraint length K , and in general butwithitsdimensionreducedcorrespondingtoare-
the generating function duction of K by unity, we may proceed t o perform this
sequence of reductions again. The steps will be the same
T ( L ,N ) = LXloo...o, except that now in place of (go), we have
where 100 . . 0 contains ( K - 2) zeros.
+ (79)
N X i , i ,...j K _ , O l = Xil;;...;rc-rll (80')
Fromthisgeneralpatternitiseasilyshownthatthe
matrix can be reduced to a dimension of ZX-'. First. com- and in place of (82)
bining adjacent rows, from the second to the last, pair-
wise, one obtains the set of ZK-' - 1 relations X"00 ...01 = NLX'on ...o 1 + Xon...111 (82')

NX;l;a...iK-20
= X;l;l...;h.-21 (80) while in place of (81) the right of center term of the first.
where jl,j., . . . , j K - 2 runs over all binary vectors except
+
row is - ( L L 2 ) and the first component on the right
side is N'L'. Similarly in place of (83) the center term
for t,he all zeros. Subhtution of (80) into. (78) yields a
2fi--'-dimensional matrix equation. The result for R = 4
of the first row is - N ( L L' + +
L3) and the first com-
ponent on the right side is N3L3.
is Performingthissequence of reductions K - 2 times
,,,1 TNLl in all, but omitting the last step-leadingfrom (81) to
(83)-in the last reduction, the original 2K-1 - 1 equa-
tions are reduced in the general case tot.he two equations

xoo-01

Defining the new variable 1 - NL

X'on...o1 = N L Xnn...o1 + Xnn...11 (82)


(whichcorresponds toaddingthe second row to NL
VITERBI : CONVOLUTlONAL CODES 771
whence i t follows that

(NT,)"-'
Xll...1 = (135) - DcK(1 - D,")' < D,"x(l -
1, - N(L + + L 2 * * + LK-') -
(1 - 2D: +D,"K)z (1 - 20,")' (91)
Applying (79) andthe K - 2 extensionsof (80) and where Do isafunction of the channel transition prob-
(80') we find abilities or energy-to-noise ratio and is given by (46).
ACKNOWLEDGMENT
T(L, N ) = LXloo...oo
= LN-lXloo...o,
Theauthorgratefullyacknowledgestheconsiderable
= LN-2Xloo...oll= . * = LN-'"-2'Xll..., stimulationhehas receivedover the course of writing
the several versions of this paper from Dr. J. A. Heller,
-
- NLK whose recentworkstronglycomplementsandenhances
1 - N(L + + L2 * * + this effort, for numerous discussions and suggestion8 and
fqrhisassistanceinitspresentation attheLinkabit
-
- N L ~ (-
I L) Corporation"Seminars on Convolutional Codes." This
1 - L(l +
N ) + NL" t<torial approach owes part of its origin to Dr. G. D.
Forney, Jr., whose imaginative and perceptive reinterpre-
If we require only the path length structure, and not tation of myoriginalwork hsls aidedimmeasurslbly in
the number of bit errors corresponding to any incorrect rendering it more comprehensible. Also, thanks are due
' path, we may set N = 1 in (86) and obtain to Dr. J. K. Omura for his careful and detailed reading
and correction of the manuscript during his presentation
L" -
- LK(l - L) of this material in the UCLA graduate course on infor-
T(L) =
1 - (. L +. L2 + + LK-! 1 - 2L+ LK mation theory.
(87) REFERENCES
If we denote as an upper bound an expression which is [ I ] P. Elias,"Coding for' noisychannels," in 1055 I R E N a t .
the generating function of more paths than exist in our Conv. Rec., vol. 3, pt.4, pp; 37-46.
121 J. M. Wozencraft,"Sequentlal decoding for reliable com-
state diagram, we have munication," in 1957 I R E N a t . Conv. Record, vol. 5, pt.
2, pp.11-25.
[31 J . L. Massey, Threshold Decoding. Cambridge,Mass.:
L" M.I.T. Press, 1963.
T(L) < *-
1 - 2L [41 R. M. Fano, "A heuristic discussion of probabilisticdecod-
ing," I E h E Trans.Inform.Theory, vol. IT-9, Apr. 1963,
p p .64-74.
As an additional application of thisgeneratjngfunc- [51 R. G. Gallager, "A simple derivation of the coding theorem
tion technique, we now obtain bounds on PE and PB for andsome applications," IEEE Trans. Inform. Theory, vol.
rl'-11, Jan. 1965, pp: 3-18.
the class of orthogonal convolutional (tree) codes ,intra- 161 J. M. Wozencraft and I. M. Ja,cobs, Principles of Communi-
duced by Viterbi [ 101. For this class of codes, to'each of cation Engineering. New York: Wiley, 1965.
[71 K. S. Zigangirov,"Some sequential decoding proced'ures,"
the 2 K .branches of the K-state diagram there corresponds Probl. Peredach Inform.,vol. 2, no. 4, 1966, pp. 13-25.
one of 2R orthogonalsignals.Given that eachsignalis [SI C . E,. Shannon, R . G. Gallager, and E. R. Berlekamp,
''Lower boundstoerrorprobabilityfor coding on discrete
orthogonal t0 all others in n 2 1 dimensions, correspond- memoryless channels," Inform. Contr., vol. 10, 1967,'pt..I, pp.
ing to n channel symbols or transmission times (as, for 6!$-103, pt. 11,pp. 522-552.
[91 A . J. Viterbi, "Error bounds for convolutional codes and an
example, if each signal consists of n different pulses out asymptoticallyoptimum decodingalgorithm," IEEE Trans.
of 2% possible positions), then the weightof each branch Inform. Theory,vol. IT-13, Apr. 1967, pp. 260-269.
[lo1 ---, "Orthogonal tree codes for Communlcation inthe
is n. Consequently, ifwe replace L , thepathlength presence of whiteGaussian noise," I E E E Tvans. 8Commun..
enumerator,by D" in (86) we obtainfororthogonal Technol., vol. COM-15, April 1967, pp. 238-242.
E111 I. M. Jacobs, "Sequentid decoding fo'r efficient communica-
codes tionfromdeep space, IEEE'Trans.Commun.Techm'l.,
V O ~ .COM-15, Aug. 1968, pp. 492-M1.
[I21 G-. D. Forney,Jr., "Coding system design for advanced
N D " ~ (-
I 0") solar missions," submittedto NASA Ames Res.Ctr.by
T(D, N ) =
1 - Dn(l + N ) + NDnK (89) Codex Corp., Watertown, Mass., Final Rep., Contract NAS2-
3637, Dec. 1967.
[I31 J . L. Massey and M. K. Sain, "Inverses of linear sequential
Then using (48) and (49) , the first-event error prob- circuits," I E E E Trans.Cornput., vol. C-17, Apr. 1968, pp.
ability for orthogonal codes is bounded by 330437.
[I41 R.. G. Gallager, InformationTheoryandReliableCom-
' m.unicatwn. New York: Wiley, 1968.
1151 T. N. Morrissey, "Analysis of decoders for convolutional
codes bystochasticsequential machinemethods," Univ.
NotreDame, Not.re Dame,Ind.,Tech.Rep. EE-682, May
1968.
[I61 R. W. Lucky, J. Salz, and E. J . Weldon, Principles of Data
and the bit error probability bound is Communication. New York: McGraw-Hill, 1968.
772 IEEE TRANSACTICINS ON COMMUNICATIONS TECHNOLOGY, VOL. COM-19, NO. 5, OCTOBER 1971
[17] J. K. Omura, “On theViterbidecodingalgorithm,” IEEE Andrew J. Viterbi (S’54-M’58SM’63) w&s
Trans. Inform. Theory, vol. IT-15, Jan. 1969, pp. 177-179. born in Bergamo, Italy, on March 9, 1935.
[181 FI .Jelinek,“Fast.sequential decodingalgorithmusinga He received the B.S. and M.S.degrees in
stack,” I B M J. Res. Dev., vol. 13, no. 6, Nov. 1969, pp. electricalengineering fromthe Massachu-
675-685. settsInstitute of Technology,Cambridge,
[191 E. A. Bucherand J. A. Heller, “E;ror robabilitybounds in 1957, and the Ph.D. degreeinelectrical
for systematicconvolutionalcodes, IEEE Trans. Inform. engineering from the University of Southern
Theory, vol. IT-16, Mar. 1970, pp. 219-224. California, Los Angeles, in 1962.
[201 J. P. Odenwalder, “Optimal decoding of convolutional While attending M.I.T., he participated in
codes,’’ Ph.D.dissertation,Dep.Syst. Sci., Sch.Eng.Appl.
Sci., Univ. California, Los ‘Angeles,1970. thecooperativeprogram attheRaytheon
[211 G. D. Forney, Jr., “Codinganditsapplicationin space Company. In 1957 he joined the Jet Propul-
communlcatlons,” IEEE Spectrum, vol. 7, June 1970, pp. sion Laboratory where he became a Research Group Supervisor in
47-58. the Commnnications Systems Research Section. I n 1963 he joined
[221 -, “Convolutional codes I: Algebraic structure,” ZEEE the faculty of the University of California, Los Angeles, as an As-
Trans. Inform. Theory, vol. IT-16, Nov. 1970, pp. 720-738; sistant Professor. In 1965 he was promoted to Associate Professor
“I1: Maximumlikelihood decoding,’’ and “111: Sequential and in 1969 to Professor of Engineering and Applied Science. He
decoding,” IEEE Trans. Inform. Theory, tobepublished. was a cofounderin 1968 of Linkabit Corporation of which he is
[231 W. J. Rosenberg,“Structuralproperties of convolutional presently Vice President.
codes,”Ph.D.dissertation,Dep. Syst.. Sci., Sch.Eng.Appl. Dr. Viterbi is a member of the EditorialBoardsof thePRocmmNGs
Sci., Univ. California, Los Angeles,1971. OF THE IEEE and of the journal Information and Control. He is a
[241 J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite member of Sigma Xi,Tau Beta Pi, and E t a Kappa Nu and has served
and space com,munication,” this issue, pp. 835-848. on several governmental advisory committees and panels. He is the
[251 A. R. Cohen, J. A.Heller,and A. J. Viterbi, “A new cod- coauthor of a book on digital cornmurkation and authorof another
ing technique for asynchronous multiple access communica- on coherent communication, and he has received three awards for
tion,,’ this issue, pp. 849-855. his journal publications.

Burst-Correcting Codes for the Classic Bursty Channel

Abstract-The purpose of this paper is to organizeand clarify posed, but turns out to be a rather inefficient method of
the work of the past decade on burst-correcting codes. Our method burst correction.
is, first,todefine an idealizedmodel,called the classicbursty Of the work that has gone into burst-correcting codes,
channel, toward which most burst-correcting schemes are explicitly
or implicitly aimed; next, to b o y d the best possible performance thebulkhas been devotedtofinding codes capable of
on this channel; and, finally, to exhibit classes of schemes which correcting all bursts of length B separatedbyguard
are asymptotically optimum and serve as archetypes of the burst- spaces of length G. Wecallthese zero-error burst-
correcting codes actually in use. In this light we survey and cat- correcting codes. It has beenrealizedinthepast few
egorize previous work on burst-correcting codes. Finally, we discuss years that this work too has been somewhat misdirected ;
qualitatively theways in whichreal channels failto satisfy’ the
assumptions of the classic bursty channel, and the effects of such for on channels for whichsuchcodes are suited, called
failqreson the various types of burst-correcting schemes.We in this paper classic bursty channels, much more efficient
concludeby comparing forward-error-correction to the popular communication is possiblk if we require pnly that practi-
alternative of automatic repeat-request (ARQ). cally all bursts of length B be correctible.
The principal purpose of this paper is tutorial. In order
INTRODUCTION toclarifythe issues involvedinthedesign of burst-
correctingcodes, we examine an idealizedmodel,the
OST WORK in coding theory has been addressed classic bursty channel, on which bursts are never longer
to efficient communication over
memoryless than B nor guard spaces shorter than G. We see that the
channels.Whilethisworkhas been directly inefficiency of zero-error codes is due to their operating
applicable to space channels [ 13, it has been of little use at the zero-error capacity of the channel, approximately
on all other real channels, where errors tend to occur in ( G - B ) / (G + B ) , rather than at the true capacity,
bursts. The use of interleaving to adapt random-error-
correcting codes toburstychannelsisfrequently pro-
which i s morelike G / ( G + B ) . Operation a t thetrue
capacity is possible, however, if bursts can be treated as
erasures; that is, if their locations can be identified. By
Paper approved by the Communicatioq Theory Committee of theconstruction of some archetypalschemes in which
theIEEECommunication TechnologyGroupforpublication short Reed-Solomon (RS) codes are used withinter-
without oral presentation. Manuscript received May 10,1971.
The author is with Codex Corporatioq, Newton, Mass., 02195. leavers, we arriveatasymptoticallyoptimal codes of

You might also like