Convolutional Codes and Their Performance in Communication Systems


ANDREW J. VITERBI, SENIOR MEMBER, IEEE

Abstract: This tutorial paper begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds with the development of the maximum likelihood decoder. The powerful tool of generating function analysis is demonstrated to yield for arbitrary codes both the distance properties and upper bounds on the bit error probability for communication over any memoryless channel. Previous results on code ensemble average error probabilities are also derived and extended by these techniques. Finally, practical considerations concerning finite decoding memory, metric representation, and synchronization are discussed.

I. INTRODUCTION

ALTHOUGH convolutional codes, first introduced by Elias [1], have been applied over the past decade to increase the efficiency of numerous communication systems, where they invariably outperform block codes of the same order of complexity, there remains to date a lack of acceptance of convolutional coding and decoding techniques on the part of many communication technologists. In most cases, this is due to an incomplete understanding of convolutional codes, whose cause can be traced primarily to the sizable literature in this field, composed largely of papers which emphasize details of the decoding algorithms rather than the more fundamental unifying concepts, and which, until

Paper approved by the Communication Theory Committee of the IEEE Communication Technology Group for publication without oral presentation. Manuscript received January 7, 1971; revised June 11, 1971. The author is with the School of Engineering and Applied Science, University of California, Los Angeles, Calif. 90024, and the Linkabit Corporation, San Diego, Calif.

recently, have been divided into two nearly disjoint subsets. This malady is shared by the block-coding literature, wherein the algebraic decoders and probabilistic decoders have been at odds for a considerably longer period. The convolutional code dichotomy owes its origins to the development of sequential (probabilistic) decoding by Wozencraft [2] and of threshold (feedback, algebraic) decoding by Massey [3]. Until recently the two disciplines flourished almost independently, each with its own literature, applications, and enthusiasts. The Fano sequential decoding algorithm [4] was soon found to


greatly outperform earlier versions of sequential decoders both in theory and practice. Meanwhile the feedback decoding advocates were encouraged by the burst-error correcting capabilities of the codes, which render them quite useful for channels with memory.

To add to the confusion, yet a third decoding technique [9], which emerged with the Viterbi decoding algorithm, was soon thereafter shown to yield maximum likelihood decisions (Forney [12], Omura [17]). Although this approach is probabilistic and emerged primarily from the sequential-decoding oriented discipline, it leads naturally to a more fundamental approach to convolutional code representation and performance analysis. Furthermore, by emphasizing the decoding-invariant properties of convolutional codes, one arrives directly at the maximum likelihood decoding algorithm and from it at the alternate approaches which lead to sequential decoding on the one hand and feedback decoding on the other. This decoding algorithm has recently found numerous applications in communication systems, two of which are covered in this issue (Heller and Jacobs [24], Cohen et al. [25]). It is particularly desirable for efficient communication at very high data rates, where very low error rates are not required, or where large decoding delays are intolerable.

Foremost among the recent works which seek to unify these various branches of convolutional coding theory is that of Forney [12], [21], [22], et seq., which includes a three-part contribution devoted, respectively, to algebraic structure, maximum likelihood decoding, and sequential decoding. This paper, which began as an attempt to present the author's original paper [9] to a broader audience,¹ is another such effort at consolidating this discipline. It begins with an elementary presentation of the fundamental properties and structure of convolutional codes and proceeds to a natural development of the maximum likelihood decoder. The relative distances among codewords are then determined by means of the generating function (or transfer function) of the code state diagram. This in turn leads to the evaluation of coded communication system performance on any memoryless channel. Performance is first evaluated for the specific cases of the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel with biphase (or quadriphase) modulation, and finally generalized to other memoryless channels. New results are obtained for the evaluation of specific codes (by the generating function technique), rather than the ensemble average of a class of codes, as had been done previously, and for bit error probability, as distinguished from event error probability. The previous ensemble average results are then extended to bit error probability bounds for the class of
1 This material first appeared in unpublished form as notes for the Linkabit Corp. seminar on convolutional codes, Jan. 1970.

time-varying convolutional codes by means of a generalized generating function approach; explicit results are obtained for the limiting case of a very noisy channel and compared with the corresponding results for block codes. Finally, practical considerations concerning finite memory, metric representation, and synchronization are discussed. Further and more explicit details on these problems and detailed results of performance analysis and simulation are given in the paper by Heller and Jacobs [24].

While sequential decoding is not treated explicitly in this paper, the fundamentals and techniques presented here lead naturally to an elegant tutorial presentation of this subject, particularly if, following Jelinek [18], one begins with the stack sequential decoding algorithm proposed independently by Jelinek and Zigangirov [7], which is far simpler to describe and understand than the original sequential algorithms. Such a development, which proceeds from maximum likelihood decoding to sequential decoding, exploiting the similarities in performance and analysis, has been undertaken by Forney [22]. Similarly, the potentials and limitations of feedback decoders can be better understood with the background of the fundamental decoding-invariant convolutional code properties previously mentioned, as demonstrated, for example, by the recent work of Morrissey [15].

II. CODE REPRESENTATION


A convolutional encoder is a linear finite-state machine consisting of a K-stage shift register and n linear algebraic function generators. The input data, which is usually, though not necessarily, binary, is shifted along the register b bits at a time. An example with K = 3, n = 2, b = 1 is shown in Fig. 1. The binary input data and output code sequences are indicated on Fig. 1. The first three input bits, 0, 1, and 1, generate the code outputs 00, 11, and 01, respectively. We shall pursue this example to develop various representations of convolutional codes and their properties. The techniques thus developed will then be shown to generalize directly to any convolutional code.

It is traditional and instructive to exhibit a convolutional code by means of a tree diagram as shown in Fig. 2. If the first input bit is a zero, the code symbols are those shown on the first upper branch, while if it is a one, the output code symbols are those shown on the first lower branch. Similarly, if the second input bit is a zero, we trace the tree diagram to the next upper branch, while if it is a one, we trace the diagram downward. In this manner all 32 possible outputs for the first five inputs may be traced. From the diagram it also becomes clear that after the first three branches the structure becomes repetitive. In fact, we readily recognize that beyond the third branch the code symbols on branches emanating from the two nodes labeled a are identical, and similarly for all the

Fig. 1. Convolutional coder for K = 3, n = 2, b = 1.

Fig. 2. Tree-code representation for coder of Fig. 1.

Fig. 3. Trellis-code representation for coder of Fig. 1.

Fig. 4. State-diagram representation for coder of Fig. 1.

identically labeled pairs of nodes. The reason for this is obvious from examination of the encoder. As the fourth input bit enters the coder at the right, the first data bit falls off on the left end and no longer influences the output code symbols. Consequently, the data sequences 100xy... and 000xy... generate the same code symbols after the third branch and, as is shown in the tree diagram, both nodes labeled a can be joined together. This leads to redrawing the tree diagram as shown in Fig. 3. This has been called a trellis diagram [12], since a trellis is a tree-like structure with remerging branches. We adopt the convention here that code branches produced by a zero input bit are shown as solid lines and code branches produced by a one input bit are shown dashed. The completely repetitive structure of the trellis diagram suggests a further reduction in the representation of the code to the state diagram of Fig. 4. The states of the state diagram are labeled according to the nodes of the trellis diagram. However, since the states corres-

pond merely to the last two input bits to the coder, we may use these bits to denote the nodes or states of this diagram. We observe finally that the state diagram can be drawn directly by observing the finite-state machine properties of the encoder and particularly the fact that a four-state directed graph can be used to represent uniquely the input-output relation of the eight-state machine. For the nodes represent the previous two bits while the present bit is indicated by the transition branch; for example, if the encoder (machine) contains 011, this is represented in the diagram by the transition from state b = 01 to state d = 11 and the corresponding branch indicates the code symbol outputs 01.
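For concreteness, the following short Python sketch simulates a coder of this form. The tap connections (111 and 101 for the two adders, octal 7 and 5) are an assumption, since Fig. 1 itself is not reproduced here; they are the standard K = 3 generators and they reproduce the code outputs 00, 11, 01 quoted above for the input bits 0, 1, 1.

def convolutional_encode(bits, taps=((1, 1, 1), (1, 0, 1))):
    """Encode a bit sequence with a K=3, n=2, b=1 convolutional coder.

    Each input bit is shifted into a 3-stage register; each of the n=2
    adders outputs the mod-2 sum of the stages selected by its taps.
    (The taps 7 and 5 are assumed, not taken from the paper's figure.)
    """
    register = [0, 0, 0]                    # K = 3 stages, initially zero
    out = []
    for bit in bits:
        register = [bit] + register[:-1]    # shift the new bit in at the left
        for tap in taps:                    # one output symbol per adder
            out.append(sum(t * r for t, r in zip(tap, register)) % 2)
    return out

# The first three input bits 0, 1, 1 give branches 00, 11, 01 as in the text.
print(convolutional_encode([0, 1, 1]))      # -> [0, 0, 1, 1, 0, 1]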

III. MINIMUM DISTANCE DECODER FOR BINARY SYMMETRIC CHANNEL

On a BSC, errors which transform a channel code symbol 0 to 1 or 1 to 0 are assumed to occur independently from symbol to symbol with probability p. If all input (message) sequences are equally likely, the decoder which minimizes the overall error probability for any code, block or convolutional, is one which examines the error-corrupted received sequence y_1 y_2 ... y_j ... and chooses the data sequence corresponding to the transmitted code sequence x_1 x_2 ... x_j ... which is closest to the received sequence in the sense of Hamming distance; that is, the transmitted sequence which differs from the received sequence in the minimum number of symbols.


Referring first to the tree diagram, this implies that we should choose that path in the tree whose code sequence differs in the minimum number of symbols from the received sequence. However, recognizing that the transmitted code branches remerge continually, we may equally limit our choice to the possible paths in the trellis diagram of Fig. 3. Examination of this diagram indicates that it is unnecessary to consider the entire received sequence (which conceivably could be thousands or millions of symbols in length) at one time in deciding upon the most likely (minimum distance) transmitted sequence. In particular, immediately after the third branch we may determine which of the two paths leading to node or state a is more likely to have been sent. For example, if 010001 is received, it is clear that this is at distance 2 from 000000 while it is at distance 3 from 111011, and consequently we may exclude the lower path into node a. For, no matter what the subsequent received symbols will be, they will affect the distances only over subsequent branches after these two paths have remerged, and consequently in exactly the same way. The same can be said for the pairs of paths merging at the other three nodes after the third branch. We shall refer to the minimum distance path of the two paths merging at a given node as the survivor. Thus it is necessary only to remember which was the minimum distance path from the received sequence (or survivor) at each node, as well as the value of that minimum distance. This is necessary because at the next node level we must compare the two branches merging at each node level, which were survivors at the previous level for different nodes; e.g., the comparison at node a after the fourth branch is among the survivors of comparisons at nodes a and c after the third branch. For example, if the received sequence over the first four branches is 01000111, the survivor at the third node level for node a is 000000 with distance 2 and at node c it is 110101, also with distance 2. In going from the third node level to the fourth the received sequence agrees precisely with the survivor from c but has distance 2 from the survivor from a. Hence the survivor at node a of the fourth level is the data sequence 1100, which produced the code sequence 11010111, which is at (minimum) distance 2 from the received sequence. In this way we may proceed through the received sequence and at each step for each state preserve one surviving path and its distance from the received sequence, which is more generally called the metric. The only difficulty which may arise is the possibility that in a given comparison between merging paths the distances or metrics are identical. Then we may simply flip a coin, as is done for block codewords at equal distances from the received sequence. For even if we preserved both of the equally valid contenders, further received symbols would affect both metrics in exactly the same way and thus not further influence our choice. This decoding algorithm was first proposed by Viterbi [9] in the more general context of arbitrary memoryless

channels. Another description of the algorithm can be obtained from the state-diagram representation of Fig. 4. Suppose we sought that path around the directed state diagram, arriving at node a after the kth transition, whose code symbols are at a minimum distance from the received sequence. But clearly this minimum distance path to node a at time k can be only one of two candidates: the minimum distance path to node a at time k - 1 and the minimum distance path to node c at time k - 1. The comparison is performed by adding the new distance accumulated in the kth transition by each of these paths to their minimum distances (metrics) at time k - 1. It appears thus that the state diagram also represents a system diagram for this decoder. With each node or state we associate a storage register which remembers the minimum distance path into the state after each transition, as well as a metric register which remembers its (minimum) distance from the received sequence. Furthermore, comparisons are made at each step between the two paths which lead into each node. Thus four comparators must also be provided. There remains only the question of truncating the algorithm and ultimately deciding on one path rather than four. This is easily done by forcing the last two input bits to the coder to be 00. Then the final state of the code must be a = 00 and consequently the ultimate survivor is the survivor at node a, after the insertion into the coder of the two dummy zeros and transmission of the corresponding four code symbols. In terms of the trellis diagram this means that the number of states is reduced from four to two by the insertion of the first zero and to a single state by the insertion of the second. The diagram is thus truncated in the same way as it was begun. We shall proceed to generalize these code representations and the optimal decoding algorithm to general convolutional codes and arbitrary memoryless channels, including the Gaussian channel, in Sections V and VI. However, first we shall exploit the state diagram further to determine the relative distance properties of binary convolutional codes.
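As a sketch of the procedure just described (hard decisions, Hamming metric, and the same assumed taps 7 and 5 as in the encoder sketch above), a minimal Python decoder carries one survivor and one metric per state and resolves the pairwise comparison at every node level. Run on the received sequence 01000111 of the example, it recovers the survivor data 1100 at distance 2.

TAPS = ((1, 1, 1), (1, 0, 1))    # assumed generators, as in the encoder sketch

def branch_output(state, bit):
    """Code symbols for one transition; state = (last two input bits)."""
    register = (bit,) + state
    return tuple(sum(t * r for t, r in zip(tap, register)) % 2 for tap in TAPS)

def viterbi_decode(received, nbranches):
    """Minimum-distance (maximum likelihood) decoding on the BSC."""
    # metrics[state] = (Hamming distance of survivor, survivor data bits)
    metrics = {(0, 0): (0, [])}
    for j in range(nbranches):
        r = received[2 * j: 2 * j + 2]
        new = {}
        for state, (metric, path) in metrics.items():
            for bit in (0, 1):
                nxt = (bit, state[0])
                d = metric + sum(a != b
                                 for a, b in zip(branch_output(state, bit), r))
                if nxt not in new or d < new[nxt][0]:    # keep the survivor
                    new[nxt] = (d, path + [bit])
        metrics = new
    return min(metrics.values())    # best (metric, data) over final states

# Example from the text: received 01000111 over four branches -> (2, [1,1,0,0]).
print(viterbi_decode([0, 1, 0, 0, 0, 1, 1, 1], 4))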

IV. DISTANCE PROPERTIES OF CONVOLUTIONAL CODES


We continue to pursue the example of Fig. 1 for the sake of clarity; in the next section we shall easily generalize the results. It is well known that convolutional codes are group codes. Thus there is no loss in generality in computing the distance from the all zeros codeword to all the other codewords, for this set of distances is the same as the set of distances from any specific codeword to all the others. For this purpose we may again use either the trellis diagram or the state diagram. We first of all redraw the trellis diagram in Fig. 5, labeling the branches according to their distances from the all zeros path. Now consider all the paths that merge with the all zeros for the first time at some arbitrary node j.


Fig. 5. Trellis diagram labeled with distances from all zeros path.

It is seen from the diagram that of these paths there will be just one path at distance 5 from the all zeros path, and this diverged from it three branches back. Similarly there are two at distance 6 from it, one which diverged 4 branches back and the other which diverged 5 branches back, and so forth. We note also that the input bits for the distance 5 path are 00...0100 and thus differ in only one input bit from the all zeros, while the distance 6 paths are 00...01100 and 00...010100 and thus each differs in 2 input bits from the all zeros path. The minimum distance, sometimes called the minimum free distance, among all paths is thus seen to be 5. This implies that any pair of channel errors can be corrected, for two errors will cause the received sequence to be at distance 2 from the transmitted (correct) sequence, but it will be at least at distance 3 from any other possible code sequence. It appears that with enough patience the distance of all paths from the all zeros (or any arbitrary) path can be so determined from the trellis diagram. However, by examining instead the state diagram we can readily obtain a closed form expression whose expansion yields directly and effortlessly all the distance information. We begin by labeling the branches of the state diagram of Fig. 4 either D^2, D, or D^0 = 1, where the exponent corresponds to the distance of the particular branch from the corresponding branch of the all zeros path. Also we split open the node a = 00, since circulation around this self-loop simply corresponds to branches of the all zeros path whose distance from itself is obviously zero. The result is Fig. 6. Now, as is clear from examination of the trellis diagram, every path which arrives at state a = 00 at node level j must have at some previous node level (possibly the first) originated at this same state a = 00. All such paths can be traced on the modified state diagram. Adding branch exponents we see that path a b c a is at distance 5 from the correct path, paths a b d c a and a b c b c a are both at distance 6, and so forth, for the generating functions of the output sequence weights of these paths are D^5 and D^6, respectively. Now we may evaluate the generating function of all paths merging with the all zeros at the jth node level simply by evaluating the generating function of all the weights of the output sequences of the finite-state machine.² The result in this case is

T(D) = D^5 / (1 - 2D)
     = D^5 + 2D^6 + 4D^7 + ... + 2^k D^(k+5) + ...    (1)

Fig. 6. State diagram labeled according to distance from all zeros path.

This verifies our previous observation and in fact shows that among the paths which merge with the all zeros at a given node there are 2^k paths at distance k + 5 from the all zeros. Of course, (1) holds for an infinitely long code sequence; if we are dealing with the jth node level, we must truncate the series at some point. This is most easily done by considering the additional information indicated in the modified state diagram of Fig. 7. The L terms will be used to determine the length of a given path; since each branch has an L, the exponent of the L factor will be augmented by one every time a branch is passed through. The N term is included only if that branch transition was caused by an input data one, corresponding to a dotted branch in the trellis diagram. The generating function of this augmented state diagram is then

T(D, L, N) = D^5 L^3 N / [1 - DL(1 + L)N]
           = D^5 L^3 N + D^6 L^4 (1 + L) N^2 + D^7 L^5 (1 + L)^2 N^3 + ...
             + D^(5+k) L^(3+k) (1 + L)^k N^(1+k) + ...    (2)

Thus we have verified that of the two distance 6 paths one is of length 4 and the other is of length 5, and both differ in 2 input bits from the all zeros.³ Also, of the distance 7 paths, one is of length 5, two are of length 6, and one is of length 7; all four paths correspond to input sequences with three ones. If we are interested in the jth node level, clearly we should truncate the series such that no terms of power greater than L^j are included. We have thus fully determined the properties of all paths in the convolutional code. This will be useful later in evaluating the error probability performance of codes used over arbitrary memoryless channels.
2 Alternatively, this can be regarded as the transfer function of the diagram regarded as a signal flow graph.
3 Thus if the all zeros was the correct path and the noise causes us to choose one of the incorrect paths, two bit errors will be made.
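The expansion (1) is easy to verify mechanically. The sketch below walks the split-open state diagram of Fig. 6 (branch weights as derived for the assumed taps 7 and 5; the terminal node label 'A' is an artifact of the sketch) and tallies the output weight of every path from a back to a; it reproduces the coefficients 1, 2, 4, ... of D^5, D^6, D^7, ...

# Branch weights (exponents of D) in the split-open state diagram of Fig. 6,
# derived from the assumed taps 7 and 5; 'A' is the remerged terminal node.
EDGES = {
    'a': [('b', 2)],
    'b': [('c', 1), ('d', 1)],
    'c': [('A', 2), ('b', 0)],
    'd': [('c', 1), ('d', 1)],
}

def weight_spectrum(max_weight):
    """Count paths a -> ... -> A of each total weight up to max_weight."""
    counts = [0] * (max_weight + 1)
    frontier = [('a', 0)]
    while frontier:                       # every cycle has positive weight,
        node, w = frontier.pop()          # so this search terminates
        for nxt, dw in EDGES[node]:
            if w + dw > max_weight:
                continue
            if nxt == 'A':
                counts[w + dw] += 1
            else:
                frontier.append((nxt, w + dw))
    return counts

print(weight_spectrum(9))   # coefficients of D^0..D^9: 1 at D^5, 2 at D^6, 4 at D^7, ...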


Fig. 7. State diagram labeled according to distance, length, and number of input ones.

Fig. 8. Coder for K = 2, b = 2, n = 3, and R = 2/3.

V. GENERALIZATION TO ARBITRARY CONVOLUTIONAL CODES


The generalization of these techniques to arbitrary binary-tree (b = 1) convolutional codes is immediate. That is, a coder with a K-stage shift register and n mod-2 adders will produce a trellis or state diagram with 2^(K-1) nodes or states, and each branch will contain n code symbols. The rate of this code is then

R = 1/n bits/code symbol.

The example pursued in the previous sections had rate R = 1/2. The primary characteristic of the binary-tree codes is that only two branches exit from and enter each node. If rates other than 1/n are desired we must make b > 1, where b is the number of bits shifted into the register at one time. An example for K = 2, b = 2, n = 3, and consequently rate R = 2/3, is shown in Fig. 8 and its state diagram is shown in Fig. 9. It differs from the binary-tree codes only in that each node is connected to four other nodes, and for general b it will be connected to 2^b nodes. Still, all the preceding techniques, including the trellis and state-diagram generating function analysis, are applicable. It must be noted, however, that the minimum distance decoder must make comparisons among all the paths entering each node at each level of the trellis and select one survivor out of four (or out of 2^b in general).

Fig. 9. State diagram for code of Fig. 8.

VI. GENERALIZATION OF OPTIMAL DECODER TO ARBITRARY MEMORYLESS CHANNELS

Fig. 10 exhibits a communication system employing a convolutional code. The convolutional encoder is precisely the device studied in the preceding sections. The data sequence generally is binary (a_i = 0 or 1) and the code sequence is divided into subsequences, where x_j represents the n code symbols generated just after the input bit a_j enters the coder: that is, the symbols of the jth branch. In terms of the example of Fig. 1, a_3 = 1 and x_3 = 01. The channel output or received sequence is similarly denoted: y_j represents the n symbols received when the n code symbols of x_j were transmitted. This model includes the BSC, wherein the y_j are binary n-vectors each of whose symbols differs from the corresponding symbol of x_j with probability p and is identical to it with probability 1 - p.

Fig. 10. Communication system employing convolutional codes.

For completely general channels it is readily shown [6], [14] that if all input data sequences are equally likely, the decoder which minimizes the error probability is one which compares the conditional probabilities, also called likelihood functions, P(y | x^(m)), where y is the overall received sequence and x^(m) is one of the possible transmitted sequences, and decides in favor of the maximum. This is called a maximum likelihood decoder. The likelihood functions are given or computed from the specifications of the channel. Generally it is more convenient to compare the quantities log P(y | x^(m)), called the log-likelihood functions, and the result is unaltered since the logarithm is a monotonic function of its (always positive) argument.

To illustrate, let us consider again the BSC. Here each transmitted symbol is altered with probability p < 1/2. Now suppose we have received a particular N-dimensional binary sequence y and are considering a possible transmitted N-dimensional code sequence x^(m) which differs in d_m symbols from y (that is, the Hamming distance between x^(m) and y is d_m). Then since the channel is memoryless (i.e., it affects each symbol independently of all the others), the probability


that this x^(m) was transformed to the specific received y at distance d_m from it is

P(y | x^(m)) = p^(d_m) (1 - p)^(N - d_m)

and the log-likelihood function is thus

log P(y | x^(m)) = -d_m log [(1 - p)/p] + N log (1 - p).

Now if we compute this quantity for each possible transmitted sequence, it is clear that the second term is constant in each case. Furthermore, since we may assume p < 1/2 (otherwise the roles of 0 and 1 are simply interchanged at the receiver), we may express this as

log P(y | x^(m)) = -α d_m - β    (3)

where α and β are positive constants and d_m is the (positive) distance. Consequently, it is clear that maximizing the log-likelihood function is equivalent to minimizing the Hamming distance d_m. Thus for the BSC, to minimize the error probability we should choose that code sequence at minimum distance from the received sequence, as we have indicated and done in the preceding sections.

Fig. 11. Modem for the additive white Gaussian noise PSK modulated memoryless channel.

We now consider a more physical, practical channel: the AWGN channel with biphase⁴ phase-shift keying (PSK) modulation. The optimum modulator and demodulator (correlator or integrate-and-dump filter) for this channel are shown in Fig. 11. We use the notation that x_jk is the kth code symbol for the jth branch. Each binary symbol (which we take here for convenience to be ±1) modulates the carrier by ±π/2 radians for T seconds. The transmission rate is, therefore, 1/T symbols/second or b/(nT) = R/T bit/s. ε_s is the energy transmitted for each symbol. The energy per bit is, therefore, ε_b = ε_s/R. The white Gaussian noise is a zero-mean random process of one-sided spectral density N_0 W/Hz which affects each symbol independently. It then follows directly that the channel output symbol y_jk is a Gaussian random variable whose mean is √ε_s x_jk (i.e., +√ε_s if x_jk = +1 and -√ε_s if x_jk = -1) and whose variance is N_0/2. Thus the conditional probability density (or likelihood) function of y_jk given x_jk^(m) is

p(y_jk | x_jk^(m)) = exp [-(y_jk - √ε_s x_jk^(m))^2 / N_0] / √(πN_0).    (4)

Since each symbol is affected independently by the white Gaussian noise, the likelihood function for the jth branch of a particular code path x_j^(m) is the product of the symbol likelihoods (4), and thus the log-likelihood⁵ function for the jth branch is

ln p(y_j | x_j^(m)) = Σ_{k=1}^{n} ln p(y_jk | x_jk^(m))
                    = C Σ_{k=1}^{n} y_jk x_jk^(m) + D    (5)

where C = 2√ε_s/N_0 and D are independent of m, and we have used the fact that (x_jk^(m))^2 = 1. Similarly, the log-likelihood function for any path is the sum of the log-likelihood functions for each of its branches. We have thus shown that the maximum likelihood decoder for the memoryless AWGN biphase (or quadriphase) modulated channel is one which forms the inner product between the received (real number) sequence and the code sequence (consisting of ±1) and chooses the path corresponding to the greatest. Thus the metric for this channel is the inner product (5), as contrasted with the distance⁶ metric used for the BSC.
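In code, the change from the BSC to this channel is confined to the branch metric; everything else in the decoder is untouched. A minimal Python sketch (assuming antipodal ±1 symbols and Gaussian noise samples, as in the text):

import math, random

def hamming_metric(branch_bits, received_bits):
    """BSC branch metric: negative Hamming distance (larger is better)."""
    return -sum(a != b for a, b in zip(branch_bits, received_bits))

def correlation_metric(branch_symbols, received_reals):
    """AWGN branch metric (5): inner product of +/-1 symbols with samples."""
    return sum(x * y for x, y in zip(branch_symbols, received_reals))

# A +/-1 symbol with energy e_s arrives as a Gaussian with mean
# sqrt(e_s)*x and variance N0/2; the decoder simply correlates.
e_s, N0 = 1.0, 2.0
sent = (1, -1)
received = [math.sqrt(e_s) * x + random.gauss(0.0, math.sqrt(N0 / 2))
            for x in sent]
print(correlation_metric(sent, received), correlation_metric((-1, 1), received))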

4 The results are the same for quadriphase PSK with coherent reception. The analysis proceeds in the same way if we treat quadriphase PSK as two parallel independent biphase channels.
5 We have used the natural logarithm here, but obviously a change of base results merely in a scale factor.
6 Actually it is easily shown that maximizing an inner product is equivalent to minimizing the Euclidean distance between the corresponding vectors.


For convolutional codes the structure of the code paths was described in Sections II-V. In Section III the optimum decoder was derived for the BSC. It now becomes clear that if we substitute the inner product metric Σ_k y_jk x_jk^(m) for the distance metric used for the BSC, all the arguments used in Section III for the latter apply equally to this Gaussian channel. In particular, the optimum decoder has a block diagram represented by the code state diagram. At step j the stored metric for each state (which is the maximum of the metrics of all the paths leading to this state at this time) is augmented by the branch metrics for branches emanating from this state. The comparisons are performed among all pairs of (or in general sets of 2^b) branches entering each state and the maxima are selected as the new most likely paths. The history (input data) of each new survivor must again be stored and the decoder is now ready for step j + 1. Clearly, this argument generalizes to any memoryless channel and we must simply use the appropriate metric ln P(y | x^(m)), which may always be determined from the statistical description of the channel. This includes, among others, AWGN channels employing other forms of modulation.⁷ In the next section, we apply the analysis of convolutional code distance properties of Section IV to determine the error probabilities of specific codes on more general memoryless channels.

VII. PERFORMANCE OF CONVOLUTIONAL CODES ON MEMORYLESS CHANNELS

In Section IV we analyzed the distance properties of convolutional codes employing a state-diagram generating function technique. We now extend this approach to obtain tight upper bounds on the error probability of such codes. We shall consider the BSC, the AWGN channel, and more general memoryless channels, in that order. We shall obtain both the first-event error probability, which is the probability that the correct path is excluded (not a survivor) for the first time at the jth step, and the bit error probability, which is the expected ratio of bit errors to the total number of bits transmitted.

A. Binary Symmetric Channel

The first-event error probability is readily obtained from the generating function T(D) [(1) for the code of Fig. 1, which we shall again pursue for demonstrative purposes]. We may assume, without loss of generality, since we are dealing with group codes, that the all zeros path was transmitted. Then a first-event error is made at the jth step if this path is excluded by selecting another path merging with the all zeros at node a at the jth level. Now suppose that the previous-level survivors were such that the path compared with the all zeros at step j is the path whose data sequence is 00...0100, corresponding to nodes a ... a a b c a (see Fig. 4). This differs from the correct (all zeros) path in five symbols. Consequently an error will be made in this comparison if the BSC caused three or more errors in these particular five symbols. Hence the probability of an error in this specific comparison is

P_5 = Σ_{e=3}^{5} C(5, e) p^e (1 - p)^(5-e).    (6)

On the other hand, there is no assurance that this particular distance 5 path will have previously survived so as to be compared with the correct path at the jth step. If either of the distance 6 paths were compared instead, then four or more errors in the six different symbols will definitely cause an error in the survivor decision, while three errors will cause a tie which, if resolved by coin flipping, will result in an error only half the time. Then the probability of error if this comparison is made is

P_6 = (1/2) C(6, 3) p^3 (1 - p)^3 + Σ_{e=4}^{6} C(6, e) p^e (1 - p)^(6-e).    (7)

Similarly, if the previously surviving paths were such that a distance k path is compared with the correct path at the jth step, the resulting error probability is

P_k = Σ_{e=(k+1)/2}^{k} C(k, e) p^e (1 - p)^(k-e),    k odd

P_k = (1/2) C(k, k/2) [p(1 - p)]^(k/2) + Σ_{e=k/2+1}^{k} C(k, e) p^e (1 - p)^(k-e),    k even.    (8)
7 Although more elaborate modulators, such as multiple FSK or multiphase modulators, might be employed, Jacobs [11] has shown that the most effective as well as the simplest system for wide-band space and satellite channels is the binary PSK modulator considered in the example of this section. We note again that the performance of quadriphase modulation is the same as for biphase modulation, when both are coherently demodulated.

Now at step j, since there is no simple way of determining the previous survivors, we may overbound the probability of a first-event error by the sum of the error probabilities for all possible paths which merge with the correct path at this point. Note this union bound is indeed an upper bound because two or more such paths may both have distance closer to the received sequence than the correct path (even though only one has survived to this point), and thus the events are not disjoint. For the example with generating function (1) it follows that the first-event error probability⁸ is bounded by

P_E < P_5 + 2P_6 + 4P_7 + ... + 2^k P_(k+5) + ...    (9)

where P_k is given by (8). In Section VII-C it will be shown that (8) can be upper bounded by (see (39))

P_k < 2^k p^(k/2) (1 - p)^(k/2).    (10)

Using this, the first-event error probability bound (9)

8 We are ignoring the finite length of the path, but the expression is still valid since it is an upper bound.

can be more loosely bounded by

P_E < Σ_{k=5}^{∞} 2^(k-5) [2√(p(1 - p))]^k = T(D) |_{D=2√(p(1-p))}    (11)

where T(D) is just the generating function of (1). It follows easily that for a general binary-tree (b = 1) convolutional code with generating function
T(D) = Σ_{k=d}^{∞} a_k D^k    (12)

the first-event error probability is bounded by the generalization of (9),

P_E < Σ_{k=d}^{∞} a_k P_k    (13)

where P_k is given by (8), and more loosely upper bounded by the generalization of (11),

P_E < T(D) |_{D=2√(p(1-p))}.    (14)

Whenever a decision error occurs, one or more bits will be incorrectly decoded. Specifically, those bits in which the path selected differs from the correct path will be incorrect. If only one error were ever made in decoding an arbitrarily long code path, the number of bits in error in this incorrect path could easily be obtained from the augmented generating function T(D, N) (such as given by (2) with the factors in L deleted). For the exponents of the N factors indicate the number of bit errors for the given incorrect path arriving at node a at the jth level.

Fig. 12. Example of decoding decision after initial error has occurred.

After the first error has been made, the incorrect paths no longer will be compared with a path which is overall correct, but rather with a path which has diverged from the correct path over some span of branches (see Fig. 12). If the correct path x has been excluded by a decision error at step j in favor of path x', the decision at step j + 1 will be between x' and x''. Now the (first-event) error probability of (13) or (14) is for a comparison, at any step, between path x and any other path merging with it at that step, including path x'' in this case. However, since the metric⁹ for path x' is greater than the metric for x, for on this basis the correct path was excluded at step j, the probability that the path x'' metric exceeds the path x' metric at step j + 1 is less than the probability that path x'' exceeds the (correct) path x metric at this point. Consequently, the probability of a new incorrect path being selected after a previous first error has occurred is upper bounded by the first-event error probability at that step.

Moreover, when a second error follows closely after a first error, it often occurs (as in Fig. 12) that the erroneous bit(s) of path x'' overlap the erroneous bit(s) of path x'. With this in mind, we now show that for a binary-tree code, if we weight each term of the first-event error probability bound at any step by the number of erroneous bits for each possible erroneous path merging with the correct path at that node level, we upper bound the bit error probability. For a given step decision corresponds to decoder action on one more bit of the transmitted data sequence; the first-event error probability union bound with each term weighted by the corresponding number of bit errors is an upper bound on the expected number of bit errors caused by this action. Summing the expected number of bit errors over L steps, which as was just shown may result in overestimating through double counting, gives an upper bound on the expected number of bit errors in L branches for arbitrary L. But since the upper bound on the expected number of bit errors is the same at each step, it follows, upon dividing the sum of L equal terms by L, that this expected number of bit errors per step is just the bit error probability P_B for a binary-tree code (b = 1). If b > 1, then we must divide this expression by b, the number of bits encoded and decoded per step.

To illustrate the calculation of P_B for a convolutional code, let us consider again the example of Fig. 1. Its transfer function in D and N is obtained from (2), letting L = 1, since we are not now interested in the lengths of incorrect paths, to be

T(D, N) = D^5 N / (1 - 2DN)
        = D^5 N + 2D^6 N^2 + ... + 2^k D^(k+5) N^(k+1) + ...    (15)

The exponents of the factors in N in each term determine the number of bit errors for the path(s) corresponding to that term. Since T(D) = T(D, N) |_{N=1} yields the first-event error probability P_E, each of whose terms must be weighted by the exponent of N to obtain P_B, it follows that we should differentiate T(D, N) at N = 1 to obtain

dT(D, N)/dN |_{N=1} = D^5 / (1 - 2D)^2.    (16)

9 Negative distance from the received sequence for the BSC, but clearly this argument generalizes to any memoryless channel.


Then from this we obtain, as in (9), that for the BSC

P_B < P_5 + 2·2P_6 + 3·4P_7 + ... + (k + 1) 2^k P_(k+5) + ...    (17)

where P_k is given by (8). If for P_k we use the upper bound (10), we obtain the weaker but simpler bound

P_B < Σ_{k=5}^{∞} (k - 4) 2^(k-5) [4p(1 - p)]^(k/2).    (18)

More generally, for any binary-tree (b = 1) code used on the BSC, if

dT(D, N)/dN |_{N=1} = Σ_{k=d}^{∞} c_k D^k    (19)

then corresponding to (17)

P_B < Σ_{k=d}^{∞} c_k P_k    (20)

where P_k is given by (8), and corresponding to (18) we have the weaker bound

P_B < dT(D, N)/dN |_{N=1, D=2√(p(1-p))}.    (21)

For a non-binary-tree code (b ≠ 1), all these expressions must be divided by b. The results of (14) and (18) will be extended to more general memoryless channels, but first we shall consider one more specific channel of particular interest.
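For the code of Fig. 1 both bounds are closed-form, since T(D) = D^5/(1 - 2D) and dT(D, N)/dN |_{N=1} = D^5/(1 - 2D)^2, and (18) sums to the latter evaluated at D = 2√(p(1-p)). A small numerical sketch of (14) and (18):

def bsc_bounds(p):
    """First-event and bit error bounds (14), (18) for T(D) = D^5/(1-2D)."""
    D = 2.0 * (p * (1.0 - p)) ** 0.5      # D = 2*sqrt(p(1-p))
    assert D < 0.5, "series diverges unless 2*sqrt(p(1-p)) < 1/2"
    P_E = D ** 5 / (1.0 - 2.0 * D)        # T(D), bound (14)
    P_B = D ** 5 / (1.0 - 2.0 * D) ** 2   # dT/dN at N=1, bound (18)
    return P_E, P_B

for p in (0.001, 0.01):
    print(p, bsc_bounds(p))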
B. AWGN Biphase-Modulated Channel

As was shown in Section VI, the decoder for this channel operates in exactly the same way as for the BSC, except that instead of Hamming distance it uses the metric

Σ_i Σ_{j=1}^{n} y_ij x_ij^(m)

where the transmitted code symbols x_ij = ±1, the y_ij are the corresponding received (demodulated) symbols, and j runs over the n symbols of each branch while i runs over all the branches in a particular path. Hence, to analyze its performance we may proceed exactly as in Section VII-A except that the appropriate pairwise decision error probabilities P_k must be substituted for those of (6) to (8). As before we assume, without loss of generality, that the correct (transmitted) path x has x_ij = +1 for all i and j (corresponding to the all zeros if the input symbols were 0 and 1). Let us consider an incorrect path x' merging with the correct path at a particular step, which has k negative symbols (x'_ij = -1) and the remainder positive. Such a path may be incorrectly chosen only if it has a higher metric than the correct path, i.e.,

Σ_i Σ_{j=1}^{n} x'_ij y_ij ≥ Σ_i Σ_{j=1}^{n} x_ij y_ij    (22)

where i runs over all branches in the two paths. But since, as we have assumed, the paths x and x' differ in exactly k symbols, wherein x_ij = +1 and x'_ij = -1, the pairwise error probability is just

P_k(x, x') = Pr (Σ_{r=1}^{k} y_r ≤ 0)

where r runs over the k symbols wherein the two paths differ. Now it was shown in Section VI that the y_ij are independent Gaussian random variables of variance N_0/2 and mean √ε_s x_ij, where x_ij is the actually transmitted code symbol. Since we are assuming that the (correct) transmitted path has x_ij = +1 for all i and j, it follows that each y_r has mean √ε_s and variance N_0/2. Therefore, since the k variables y_r are independent and Gaussian, the sum z = Σ_{r=1}^{k} y_r is also Gaussian with mean k√ε_s and variance kN_0/2. Consequently,

P_k = Pr (z ≤ 0) = (1/2) erfc √(kε_s/N_0),    erfc(x) ≜ (2/√π) ∫_x^∞ e^(-t^2) dt.    (23)

We recall from Section VI that ε_s is the symbol energy, which is related to the energy per bit by ε_s = R ε_b, where R = b/n. The bound on P_E then follows exactly as in Section VII-A, and we obtain the same general bound as (13),

P_E < Σ_{k=d}^{∞} a_k P_k    (24)

where the a_k are the coefficients of

T(D) = Σ_{k=d}^{∞} a_k D^k    (25)

and where d is the minimum distance between any two paths in the code. We may simplify this procedure considerably, while loosening the bound only slightly for this channel, by observing that for x ≥ 0, y ≥ 0,

erfc √(x + y) ≤ erfc (√x) e^(-y).    (26)

Consequently, for k ≥ d, letting l = k - d, we have from (23) and (26)

P_k = (1/2) erfc √((d + l)ε_s/N_0) ≤ (1/2) erfc √(dε_s/N_0) exp (-lε_s/N_0)    (27)

whence the bound of (24), using (27), becomes

P_E < (1/2) erfc √(dε_s/N_0) exp (dε_s/N_0) T(D) |_{D=exp(-ε_s/N_0)}.    (28)

The bit error probability can be obtained in exactly the same way. Just as for the BSC [(19) and (20)], we have that for a binary-tree code

P_B < Σ_{k=d}^{∞} c_k P_k    (29)

where the c_k are the coefficients of

dT(D, N)/dN |_{N=1} = Σ_{k=d}^{∞} c_k D^k.    (30)

Thus, following the same arguments which led from (24) to (28), we have for a binary-tree code

P_B < (1/2) erfc √(dε_s/N_0) exp (dε_s/N_0) dT(D, N)/dN |_{N=1, D=exp(-ε_s/N_0)}.    (31)

For b > 1, this expression must be divided by b. To illustrate the application of this result we consider the code of Fig. 1 with parameters K = 3, R = 1/2, whose transfer function is given by (15). For this case, since R = 1/2 and ε_s = ε_b/2, we obtain

P_B < (1/2) erfc √(5ε_b/2N_0) / [1 - 2 exp (-ε_b/2N_0)]^2.    (32)

Since the number of states in the state diagram grows exponentially with K, direct calculation of the generating function becomes unmanageable for K > 4. On the other hand, a generating function calculation is basically just a matrix inversion (see Appendix I), which can be performed numerically for a given value of D. The derivative at N = 1 can be upper bounded by evaluating the first difference [T(D, 1 + ε) - T(D, 1)]/ε for small ε. A computer program has been written to evaluate (31) for any constraint length up to K = 10 and all rates R = 1/n as well as R = 2/3 and R = 3/4. Extensive results of these calculations are given in the paper by Heller and Jacobs [24], along with the results of simulations of the corresponding codes and channels. The simulations verify the tightness of the bounds. In the next section, these bounding techniques will be extended to more general memoryless channels, from which (28) and (31) can be obtained directly, but without the first two factors. Since the product of the first two factors is always less than one, the more general bound is somewhat weaker.
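As a sketch of that numerical procedure for the K = 3 example (state equations read off Fig. 7 with L = 1 and the assumed taps 7 and 5; a genuine 2^(K-1)-state code would use a matrix solve in place of the hand substitution used here):

def T(D, N):
    """T(D, N) for the K=3 example, by solving the state equations
       Wb = D*Wc + D*N*Wd,  Wc = D^2 + N*Wb,  Wd = D*Wc + D*N*Wd,
       T  = D^2 * N * Wb.
    """
    # Wd = D*Wc/(1 - D*N), so Wb = a*Wc with:
    a = D + D * D * N / (1.0 - D * N)
    # Wc = D^2 + N*Wb = D^2 + N*a*Wc  =>  Wc = D^2 / (1 - N*a)
    Wc = D * D / (1.0 - N * a)
    return D * D * N * a * Wc

def bit_error_bound(D, eps=1e-6):
    """Upper bound dT/dN at N=1 by the first difference, as in the text."""
    return (T(D, 1.0 + eps) - T(D, 1.0)) / eps

p = 0.01
D = 2.0 * (p * (1 - p)) ** 0.5
print(T(D, 1.0), bit_error_bound(D))   # agrees with D^5/(1-2D), D^5/(1-2D)^2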

C. General Memoryless Channels

As was indicated in Section VI, for equally likely input data sequences, the minimum error probability decoder chooses the path which maximizes the log-likelihood function (metric)

ln P(y | x^(m))

over all possible paths x^(m). If each symbol is transmitted (or modulates the transmitter) independently of all preceding and succeeding symbols, and the interference corrupts each symbol independently of all the others, then the channel, which includes the modem, is said to be memoryless,¹⁰ and the log-likelihood function is

ln P(y | x^(m)) = Σ_i Σ_{j=1}^{n} ln P(y_ij | x_ij^(m))

where x_ij^(m) is a code symbol of the mth path, y_ij is the corresponding received (demodulated) symbol, j runs over the n symbols of each branch, and i runs over the branches in the given path. This includes the special cases considered in Sections VII-A and -B. The decoder is the same as for the BSC except for using this more general metric. Decisions are made after each set of new branch metrics has been added to the previously stored metrics. To analyze performance, we must merely evaluate P_k, the pairwise error probability for an incorrect path which differs in k symbols from the correct path, as was done for the special channels of Sections VII-A and -B. Proceeding as in (22), letting x_r and x'_r denote the symbols of the correct and incorrect paths in the positions where they differ, we obtain

P_k(x, x') = Pr [Σ_{r=1}^{k} ln P(y_r | x'_r) ≥ Σ_{r=1}^{k} ln P(y_r | x_r)]    (33)

where r runs over the k code symbols in which the paths differ. This probability can be rewritten as

P_k(x, x') = Σ_{y ∈ Y_k} P(y | x)    (34)

where Y_k is the set of all vectors y = (y_1, y_2, ..., y_k) for which

10 Often more than one code symbol in a given branch is used to modulate the transmitter at one time. In this case, provided the interference still affects succeeding branches independently, the channel can still be treated as memoryless, but now the symbol likelihood functions are replaced by branch likelihood functions and (33) is replaced by a single sum over i.


Σ_{r=1}^{k} ln P(y_r | x'_r) ≥ Σ_{r=1}^{k} ln P(y_r | x_r).    (35)

But if this is the case, then

∏_{r=1}^{k} P(y_r | x'_r)^(1/2) / P(y_r | x_r)^(1/2) ≥ 1

so that

P_k(x, x') ≤ Σ_{y ∈ Y_k} ∏_{r=1}^{k} P(y_r | x_r)^(1/2) P(y_r | x'_r)^(1/2)
           ≤ Σ_y ∏_{r=1}^{k} P(y_r | x_r)^(1/2) P(y_r | x'_r)^(1/2).    (36)

The first inequality is valid because we are multiplying the summand by a quantity greater than unity,¹² and the second because we are merely extending the sum of positive terms over a larger set.¹¹ Finally we may break up the k-dimensional sum over y into k one-dimensional summations over y_1, y_2, ..., y_k, respectively, and this yields

P_k(x, x') ≤ Σ_{y_1} Σ_{y_2} ... Σ_{y_k} ∏_{r=1}^{k} P(y_r | x_r)^(1/2) P(y_r | x'_r)^(1/2)
           = ∏_{r=1}^{k} Σ_{y_r} P(y_r | x_r)^(1/2) P(y_r | x'_r)^(1/2).    (37)

To illustrate the use of this bound we consider the two specific channels treated above. For the BSC, y_r is either equal to x_r, the transmitted symbol, or to x̄_r, its complement. Now y_r depends on x_r through the channel statistics. Thus

P(y_r = x_r) = 1 - p,    P(y_r = x̄_r) = p.

For each symbol in the set r = 1, 2, ..., k, by definition x_r ≠ x'_r. Hence for each term in the sum, if x_r = 0 then x'_r = 1, or vice versa. Hence, whatever x_r and x'_r may be,

Σ_{y_r} P(y_r | x_r)^(1/2) P(y_r | x'_r)^(1/2) = 2 p^(1/2) (1 - p)^(1/2)    (38)

and the product of k identical factors is

P_k ≤ 2^k p^(k/2) (1 - p)^(k/2)    (39)

for all pairs of correct and incorrect paths. This was used in Section VII-A to obtain the bounds (11) and (21).

For the AWGN channel of Section VII-B we showed that the likelihood functions (probability densities) were

p(y_r | x_r) = exp [-(y_r - √ε_s x_r)^2 / N_0] / √(πN_0)    (40)

where x_r = +1 or -1. Since for each r the symbols differ, x'_r = -x_r, and the sum over y_r in (37) becomes an integral:

∫_{-∞}^{∞} p(y | x_r)^(1/2) p(y | x'_r)^(1/2) dy = exp (-ε_s/N_0)    (41)

where we have used (40) and x_r^2 = (x'_r)^2 = 1. The product of these k identical terms is, therefore,

P_k < exp (-kε_s/N_0)    (42)

for all pairs of correct and incorrect paths. Inserting these bounds in the general expressions (24) and (29), and using (25) and (30), yields the bounds on first-event error probability and bit error probability

P_E < T(D) |_{D=exp(-ε_s/N_0)}    (43)

P_B < dT(D, N)/dN |_{N=1, D=exp(-ε_s/N_0)}    (44)

11 This would be the set of all 2^k k-dimensional binary vectors for the BSC, and Euclidean k-space for the AWGN channel. Note also that the bound of (36) may be improved for asymmetric channels by changing the two exponents of 1/2 to s and 1 - s, respectively, where 0 < s < 1.
12 The square root of a quantity greater than one is also greater than one.
which are somewhat (though not exponentially) weaker than (28) and (31). A characteristic feature of both the BSC and the AWGN channel is that they affect each symbol in the same way independent of its location in the sequence. Any memoryless channel has this property provided it is stationary (statistically time invariant). For a stationary memoryless channel (37) reduces to

P_k < D_0^k    (45)

where¹³

D_0 ≜ Σ_y P(y | x)^(1/2) P(y | x')^(1/2) < 1.    (46)

While this bound on P_k is valid for all such channels, clearly it depends on the actual values assumed by the symbols x_r and x'_r of the correct and incorrect paths, and these will generally vary according to the pairs of paths x and x' in question. However, if the input symbols are binary, x and x̄, whenever x_r = x then x'_r = x̄,

13 For an asymmetric channel this bound may be improved by changing the two exponents 1/2 to s and 1 - s, respectively, where 0 < s < 1.

so that for any input-binary memoryless channel (46) becomes

D_0 = Σ_y P(y | x)^(1/2) P(y | x̄)^(1/2)    (47)

and consequently

P_E < T(D) |_{D=D_0}    (48)

P_B < dT(D, N)/dN |_{N=1, D=D_0}    (49)

where D_0 is given by (47). Other examples of channels of this type are FSK modulation over the AWGN (both coherent and noncoherent) and Rayleigh fading channels.

Fig. 13. Systematic convolutional coder for K = 3 and R = 1/2.
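Both channels above give D_0 in closed form: (47) evaluates to 2√(p(1-p)) for the BSC, and the integral (41) gives exp(-ε_s/N_0) for the biphase AWGN channel. A small sketch of (47)-(49) for the code of Fig. 1:

import math

def D0_bsc(p):
    """(47) for the BSC: sum over y of sqrt(P(y|x) P(y|x_bar))."""
    return 2.0 * math.sqrt(p * (1.0 - p))

def D0_awgn(es_over_N0):
    """(41): the integral over y gives exp(-e_s/N0) for antipodal symbols."""
    return math.exp(-es_over_N0)

def bounds_for_fig1_code(D0):
    """Bounds (48), (49) with T(D) = D^5/(1-2D) for the K=3 example."""
    assert D0 < 0.5
    return D0 ** 5 / (1 - 2 * D0), D0 ** 5 / (1 - 2 * D0) ** 2

print(bounds_for_fig1_code(D0_bsc(0.01)))
print(bounds_for_fig1_code(D0_awgn(2.0)))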
TABLE I
MAXIMUM MINIMUM FREE DISTANCE

K    Systematic    Nonsystematic*
2        3              3
3        4              5
4        4              6
5        5              7

* We have excluded catastrophic codes (see Section IX); R = 1/2.
VIII. SYSTEMATIC CONVOLUTIONAL CODES

The term systematic convolutional code refers to a code on each of whose branches one of the code symbols is just the data bit generating that branch. Thus a systematic coder will have its stages connected to only n - 1 adders, the nth being replaced by a direct line from the first stage to the commutator. Fig. 13 shows an R = 1/2 systematic coder for K = 3.

It is well known that for group block codes, any nonsystematic code can be transformed into a systematic code which performs exactly as well. This is not the case for convolutional codes. The reason for this is that, as was shown in Section VII, the performance of a code on any channel depends largely on the relative distances between codewords and particularly on the minimum free distance d, which is the exponent of D in the leading term of the generating function. Eliminating one of the adders results in a reduction of d. For example, the maximum free distance code for K = 3 is that of Fig. 13 and this has d = 4, while the nonsystematic K = 3 code of Fig. 1 has minimum free distance d = 5. Table I shows the maximum minimum free distance for systematic and nonsystematic codes for K = 2 through 5. For large constraint lengths the results are even more widely separated. In fact, Bucher and Heller [19] have shown that for asymptotically large K, the performance of a systematic code of constraint length K is approximately the same as that of a nonsystematic code of constraint length K(1 - R). Thus for R = 1/2 and very large K, systematic codes have the performance of nonsystematic codes of half the constraint length, while requiring exactly the same optimal decoder complexity. For R = 3/4, the constraint length is effectively divided by 4.

IX. CATASTROPHIC ERROR PROPAGATION IN CONVOLUTIONAL CODES

Massey and Sain [13] have defined a catastrophic error as the event that a finite number of channel symbol errors causes an infinite number of data bit errors to be decoded. Furthermore, they showed that a necessary and sufficient condition for a convolutional code to produce catastrophic errors is that all of the adders have tap sequences, represented as polynomials, with a common factor.

In terms of the state diagram it is easily seen that catastrophic errors can occur if and only if some closed loop path in the diagram has zero weight (i.e., the exponent of D for the loop path is zero). To illustrate this, we consider the example of Fig. 14. Assuming that the all zeros is the correct path, the incorrect path a b d d ... d c a has exactly 6 ones, no matter how many times we go around the self loop d. Thus for a BSC, for example, four channel errors may cause us to choose this incorrect path and consequently make an arbitrarily large number of bit errors (equal to two plus the number of times the self loop is traversed). Similarly, for the AWGN channel this incorrect path, with arbitrarily many corresponding bit errors, will be chosen with probability (1/2) erfc √(6ε_s/N_0).

Another necessary and sufficient condition for catastrophic error propagation, recently found by Odenwalder [20], is that a nonzero data path in the trellis or state diagram produces K - 1 consecutive branches with all zero code symbols. We observe also that for binary-tree (R = 1/n) codes, if each adder of the coder has an even number of connections, then the self loop corresponding to the all ones (data) state will have zero weight and consequently the code will be catastrophic.

The main advantage of a systematic code is that it can never be catastrophic, since each closed loop must contain at least one branch generated by a nonzero data bit and thus having a nonzero code symbol. Still it can be shown [23] that only a small fraction of nonsystematic codes is catastrophic (in fact, 1/(2^n - 1) for binary-tree R = 1/n codes). We note further that if catastrophic errors are ignored, nonsystematic codes with even larger free distance than those of Table I exist.


Fig. 14. Coder displaying catastrophic error propagation.
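The Massey-Sain condition is directly machine-checkable: represent each adder's tap sequence as a polynomial over GF(2) (bit i of an integer holding the coefficient of x^i) and test whether the generators share a nontrivial common factor. A sketch:

from functools import reduce

def gf2_mod(a, b):
    """Remainder of polynomial division a mod b over GF(2); ints as bit masks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def gf2_gcd(a, b):
    """Euclid's algorithm on GF(2) polynomials."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(generators):
    """Massey-Sain: catastrophic iff all tap polynomials share a common factor."""
    return reduce(gf2_gcd, generators) != 1

print(is_catastrophic([0b111, 0b101]))  # taps 7, 5 (assumed Fig. 1 code): False
print(is_catastrophic([0b011, 0b110]))  # both divisible by 1 + x: True

Note that a generator with an even number of taps is divisible by 1 + x, so if every adder has an even number of connections the code is catastrophic, matching the observation above.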

X. PERFORMANCE BOUNDS FOR BEST CONVOLUTIONAL CODES FOR GENERAL MEMORYLESS CHANNELS AND COMPARISON WITH BLOCK CODES

We begin by considering the structure of a binary-tree¹⁴ (b = 1) convolutional code of any constraint length K, independent of the specific coder used. For this purpose we need only determine T(L), the generating function for the state diagram with each branch labeled merely by L, so that the exponent of each term of the infinite series expansion of T(L) determines the length over which an incorrect path differs from the correct path before merging with it at a given node level. (See Fig. 7 and (2) with D = N = 1.) After some manipulation of the state-transition matrix of the state diagram of a binary-tree convolutional code of constraint length K, it is shown in Appendix I¹⁵ that

T(L) = L^K (1 - L) / (1 - 2L + L^K) < L^K / (1 - 2L)
     = L^K (1 + 2L + 4L^2 + ... + 2^k L^k + ...)    (50)

where the inequality indicates that more paths are being counted than actually exist. The expression (50) indicates that of the paths merging with the correct path at a given node level there is no more than one of length K, no more than two of length K + 1, no more than four of length K + 2, etc.

We purposely have avoided considering the actual code or coder configuration so that the preceding expressions are valid for all binary-tree codes of constraint length K. We now extend our class of codes to include time-varying convolutional codes. A time-varying coder is one in which the tap positions may be changed after each shift of the bits in the register. We consider the ensemble of all possible time-varying codes, which includes as a subset the ensemble of all fixed codes, for a given constraint length K. We further impose a uniform probabilistic measure on all codes in this ensemble by randomly reselecting each tap position after each shift of the register. This can be done by hypothetically flipping a coin nK times after each shift, once for each stage of the register and for each of the n adders. If the outcome is a head we connect the particular stage to the particular adder; if it is a tail we do not. Since this is repeated for each new branch, the result is that for each branch of the trellis the code sequence is a random binary n-dimensional vector. Furthermore, it can be shown that the distribution of these random code sequences is the same for each branch at each node level except for the all zeros path, which must necessarily produce the all zeros code sequence on each branch. To avoid treating the all zeros path differently, we ensure statistical uniformity by requiring further that after each shift a random binary n-dimensional vector be added to each branch¹⁶ and that this also be reselected after each shift. (This additional artificiality is unnecessary for input-binary channels but is required to prove our result for general memoryless channels.) Further details of this procedure are given in Viterbi [9].

We now seek a bound on the average error probability of this ensemble of codes relative to the measure (random selection process) imposed. We begin by considering the probability that after transmission over a memoryless channel the metric of one of the fewer than 2^k paths merging with the correct path after differing in K + k branches is greater than the correct metric. Let x_i be the correct (transmitted) sequence and x'_i an incorrect sequence for the ith branch of the two paths. Then following the argument which led to (37) we have that the probability that the given incorrect path may cause an error is bounded by

P_(K+k)(x, x') ≤ ∏_{i=1}^{K+k} Σ_{y_i} P(y_i | x_i)^(1/2) P(y_i | x'_i)^(1/2)    (51)

where the product is over all K + k branches in the path. If we now average over the ensemble of codes constructed above we obtain

P̄_(K+k) ≤ ∏_{i=1}^{K+k} Σ_{x_i} Σ_{x'_i} Σ_{y_i} q(x_i) q(x'_i) P(y_i | x_i)^(1/2) P(y_i | x'_i)^(1/2)    (52)

where q(x) is the measure imposed on the code symbols of each branch by the random selection, and because of the statistical uniformity of all branches we have

14 Although for clarity all results will be derived for b = 1, the extension to b > 1 is direct and the results will be indicated at the end of this Section.
15 This generating function can also be used to obtain error bounds for orthogonal convolutional codes, all of whose branches have the same weight, as is shown in Appendix I.

PK+k

<

(
Y

[
X

q(x)P(y I X)1/2]2)KCk 2 - ( K + k ) n R o(53) =

16

level.

The samevectorisadded

to allbranches a t a givennode
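As a check on (50), the coefficients of the exact generating function can be produced numerically. The following sketch is not part of the original paper; the choices K = 4 and twelve terms are arbitrary. It expands T(L) = L^K(1 - L)/(1 - 2L + L^K) in a power series and compares each coefficient with the corresponding bound 2^k of (50).

    # Power-series expansion of T(L) = L^K (1 - L) / (1 - 2L + L^K),
    # the generating function of (50), for an illustrative K.
    K = 4
    TERMS = 12  # path lengths K, K+1, ..., K+TERMS-1 to examine

    # Invert the denominator 1 - 2L + L^K term by term: if D(L) S(L) = 1,
    # then s_m = -(sum over j >= 1 of d_j s_{m-j}), with s_0 = 1.
    den = [0] * (K + 1)
    den[0], den[1], den[K] = 1, -2, 1

    s = [0] * (K + TERMS)
    s[0] = 1
    for m in range(1, len(s)):
        s[m] = -sum(den[j] * s[m - j] for j in range(1, min(m, K) + 1))

    # Multiplying by the numerator L^K (1 - L), the coefficient of
    # L^(K+k) in T(L) is s_k - s_(k-1).
    for k in range(TERMS):
        paths = s[k] - (s[k - 1] if k > 0 else 0)
        print(f"length K+{k}: {paths} paths; bound of (50): {2 ** k}")

For K = 4 the expansion begins 1, 1, 2, 4, 7, ..., each term falling at or below the bound 1, 2, 4, 8, 16, ... counted by (50).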


where

    R₀ = -(1/n) log₂ { Σ_Y [ Σ_X q(X) P(Y | X)^{1/2} ]² }.   (54)

Note that the random vectors X and Y are n-dimensional. If each symbol is transmitted independently on a memoryless channel, as was the case in the channels of Sections VII-A and -B, (54) reduces further to

    R₀ = -log₂ { Σ_y [ Σ_x q(x) P(y | x)^{1/2} ]² }   (55)

where x and y are now scalar random variables associated with each code symbol. Note also that because of the statistical uniformity of the code, the results are independent of which path was transmitted and of which incorrect path we are considering.

Proceeding as in Section VII, it follows that a union bound on the ensemble average of the first-event error probability is obtained by substituting \bar{P}_{K+k} for L^{K+k} in (50). Thus

    \bar{P}_E < Σ_{k=0}^∞ 2^k \bar{P}_{K+k} = 2^{-KnR₀} / [1 - 2^{-n(R₀-R)}]   (56)

where we have used the fact that, since b = 1, R = 1/n bits/symbol. To bound the bit error probability we must weight each term of (56) by the number of bit errors for the corresponding incorrect path. This could be done by evaluating the transfer function T(L, N) as in Section VII (see also Appendix I), but a simpler approach, which yields a bound which is nearly as tight, is to recognize that an incorrectly chosen path which merges with the correct path after K + k branches can produce no more than k + 1 bit errors. For any path which merges with the correct path at a given level must be generated by data which coincide with the correct path data over the last K - 1 branches prior to merging, since only in this way can the coder register be filled with the same bits as the correct path, which is the condition for merging. Hence the number of incorrect bits due to a path which differs from the correct path in K + k branches can be no greater than K + k - (K - 1) = k + 1. Hence we may overbound \bar{P}_B by weighting the kth term of (56) by k + 1, which results in

    \bar{P}_B < Σ_{k=0}^∞ (k + 1) 2^k \bar{P}_{K+k} = 2^{-KnR₀} / [1 - 2^{-n(R₀-R)}]².   (57)

The bounds of (56) and (57) are finite only for rates R < R₀, and R₀ can be shown to be always less than the channel capacity.

To improve on these bounds when R > R₀, we must improve on the union bound approach by obtaining a single bound on the probability that any one of the fewer than 2^k paths which differ from the correct path in K + k branches has a metric higher than the correct path at a given node level. This bound, first derived by Gallager [5] for block codes, is always less than 2^k times the bound for each individual path. Letting Q_{K+k} ≜ Pr (any one of the 2^k incorrect path metrics > correct path metric), Gallager [5] has shown that its ensemble average for this code ensemble is bounded by

    \bar{Q}_{K+k} < 2^{kρ} 2^{-(K+k)nE₀(ρ)}   (58)

where

    E₀(ρ) = -log₂ Σ_y [ Σ_x q(x) P(y | x)^{1/(1+ρ)} ]^{1+ρ},   0 < ρ ≤ 1   (59)

and ρ is an arbitrary parameter which we shall choose to minimize the bound. It is easily seen that E₀(0) = 0, while E₀(1) = R₀, in which case \bar{Q}_{K+k} = 2^k \bar{P}_{K+k}, the ordinary union bound of (56). We bound the overall ensemble first-event error probability by the probability of the union of these composite events given by (58). Thus we find

    \bar{P}_E < Σ_{k=0}^∞ \bar{Q}_{K+k} < 2^{-KnE₀(ρ)} / [1 - 2^{-n(E₀(ρ)-ρR)}].   (60)

Clearly (60) reduces to (56) when ρ = 1. To determine the bit error probability using this approach, we must recognize that \bar{Q}_{K+k} refers to 2^k different incorrect paths, each with a different number of incorrect bits. However, as was just observed in deriving (57), an incorrect path which differs from the correct path in K + k branches prior to merging can produce at most k + 1 bit errors. Hence, weighting the kth term of (60) by k + 1, we obtain

    \bar{P}_B < Σ_{k=0}^∞ (k + 1) \bar{Q}_{K+k} < 2^{-KnE₀(ρ)} / [1 - 2^{-n(E₀(ρ)-ρR)}]².   (61)

Clearly (61) reduces to (57) when ρ = 1.

Before we can interpret the results of (56), (57), (60), and (61) it is essential that we establish some of the properties of E₀(ρ), 0 < ρ ≤ 1, defined by (59). It can be shown [5], [14] that for any memoryless channel E₀(ρ) is a concave monotonic nondecreasing function, as shown in Fig. 15, with E₀(0) = 0 and E₀(1) = R₀. Where the derivative E₀′(ρ) exists, it decreases with ρ, and it follows easily from the definition that

    lim_{ρ→0} E₀′(ρ) = (1/n) I(Xⁿ; Yⁿ) ≜ C   (62)

the mutual information of the channel,¹⁷ where Xⁿ and Yⁿ are the channel input and output spaces, respectively, for each branch sequence.
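The behavior of E₀(ρ) is easily examined numerically. The sketch below is illustrative only: the crossover probability p = 0.05 and the uniform measure q(0) = q(1) = 1/2 are arbitrary assumptions. It evaluates (59) for a BSC and confirms that E₀(1) equals R₀ of (55) and that the slope at the origin approaches the channel capacity, as asserted above.

    import math

    def e0(rho, p):
        # E_0(rho) of (59) for a BSC with crossover probability p and
        # uniform input measure q(0) = q(1) = 1/2.
        e = 1.0 / (1.0 + rho)
        inner = 0.5 * p ** e + 0.5 * (1 - p) ** e   # same for both outputs
        return -math.log2(2 * inner ** (1 + rho))

    p = 0.05
    r0 = -math.log2((math.sqrt(p) + math.sqrt(1 - p)) ** 2 / 2)   # (55)
    cap = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)       # BSC capacity

    print("E_0(1)        =", e0(1.0, p))        # equals R_0
    print("R_0 from (55) =", r0)
    print("E_0'(0)       ~", e0(1e-6, p) / 1e-6)  # slope at 0
    print("capacity C    =", cap)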


Fig. 15. Example of E₀(ρ) function for general memoryless channel.

Fig. 16. Typical limiting value of exponent of (67).

Consequently, it follows that to minimize the bounds (60) and (61) we must make ρ ≤ 1 as large as possible, to maximize the exponent of the numerator; but at the same time we must ensure that

    δ(R) ≜ E₀(ρ)/R - ρ > 0

in order to keep the denominator positive. Thus, since E₀(1) = R₀ and E₀(ρ) < R₀ for ρ < 1, it follows that for R < R₀ and sufficiently large K we should choose ρ = 1, or equivalently use the bounds (56) and (57). We may thus combine all the above bounds into the expressions

    \bar{P}_E < 2^{-KnE(R)} / [1 - 2^{-nRδ(R)}]   (63)

    \bar{P}_B < 2^{-KnE(R)} / [1 - 2^{-nRδ(R)}]²   (64)

where

    E(R) = { R₀,      0 ≤ R < R₀
           { E₀(ρ),   R₀ ≤ R < C,  0 < ρ ≤ 1   (65)

    δ(R) = { R₀/R - 1,       0 ≤ R < R₀
           { E₀(ρ)/R - ρ,    R₀ ≤ R < C,  0 < ρ ≤ 1.   (66)

To minimize the numerators of (63) and (64) for R > R₀ we should choose ρ as large as possible, since E₀(ρ) is a nondecreasing function of ρ. However, we are limited by the necessity of making δ(R) > 0 to keep the denominator from becoming zero. On the other hand, as the constraint length K becomes very large we may choose δ(R) = δ very small. In particular, as δ approaches 0, (65) approaches

    lim_{δ→0} E(R) = { R₀,      0 ≤ R ≤ R₀
                     { E₀(ρ),   with ρ chosen so that E₀(ρ)/ρ = R,  R₀ < R < C.   (67)

Fig. 15 demonstrates the graphical determination of lim_{δ→0} E(R) from E₀(ρ). It follows from the properties of E₀(ρ) described above that for R > R₀, lim_{δ→0} E(R) decreases from R₀ to 0 as R increases from R₀ to C, but that it remains positive for all rates less than C. The function is shown for a typical channel in Fig. 16.

It is particularly instructive to obtain specific bounds in the limiting case for the class of "very noisy" channels, which includes the BSC with p = 1/2 - γ, where γ ≪ 1, and the biphase modulated AWGN with ε_s/N₀ ≪ 1. For this class of channels it can be shown [5] that

    E₀(ρ) = ρC/(1 + ρ)   (68)

and consequently R₀ = E₀(1) = C/2. (For the BSC, C = 2γ²/ln 2, while for the AWGN, C = ε_s/(N₀ ln 2).)
For the very noisy channel, suppose we let ρ = C/R - 1, so that using (68) we obtain E₀(ρ) = C - R; note that the requirement 0 < ρ ≤ 1 restricts this choice to C/2 ≤ R < C. Then in the limit as δ → 0, (65) becomes, for a very noisy channel,

    lim_{δ→0} E(R) = { C/2,     0 ≤ R ≤ C/2
                     { C - R,   C/2 ≤ R ≤ C.   (69)

This limiting form of E(R) is shown in Fig. 17.
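The accuracy of the very noisy approximation (68) is easily checked against the exact E₀(ρ) of (59). The sketch below is illustrative; γ = 0.01 is an arbitrary small value, and the comparison is for a BSC with p = 1/2 - γ.

    import math

    def e0_exact(rho, p):
        # Exact E_0(rho) of (59) for a BSC with uniform input measure.
        e = 1.0 / (1.0 + rho)
        inner = 0.5 * p ** e + 0.5 * (1 - p) ** e
        return -math.log2(2 * inner ** (1 + rho))

    gamma = 0.01
    p = 0.5 - gamma
    C = 2 * gamma ** 2 / math.log(2)   # very noisy BSC capacity, as in the text

    for rho in (0.25, 0.5, 1.0):
        approx = rho * C / (1 + rho)   # the very noisy form (68)
        print(f"rho = {rho:4.2f}: exact = {e0_exact(rho, p):.3e}, (68) = {approx:.3e}")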

¹⁷ C can be made equal to the channel capacity by properly choosing the ensemble measure q(x). For an input-binary channel the random binary convolutional coder described above achieves this. Otherwise a further transformation of the branch sequence into a smaller set of nonbinary sequences is required [9].

The bounds (63) and (64) are for the average error probabilities of the ensemble of codes relative to the measure induced by random selection of the time-varying coder tap sequences. At least one code in the ensemble must perform better than the average. Thus the bounds (63) and (64) hold for the best time-varying binary-tree convolutional coder of constraint length K. Whether there exists a fixed convolutional code with this performance is an unsolved problem. However, for small K the results of Section VII seem to indicate that these bounds are valid also for fixed codes.

To determine the tightness of the upper bounds, it is useful to have lower bounds for convolutional code error probabilities. It can be shown [9] that for all R < C

    P_E > 2^{-Kn[E_L(R) + o(K)]}   (70)

where

    E_L(R) = E₀(ρ),   with R = E₀(ρ)/ρ,   0 < ρ < ∞   (71)

and o(K) → 0 as K → ∞.
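The parametric forms (67) and (71) are easy to trace numerically. The sketch below is illustrative (a BSC with p = 0.05 and uniform input measure, as before): it sweeps ρ and prints the rate R = E₀(ρ)/ρ together with the exponent E₀(ρ). Points with ρ ≤ 1 trace lim E(R) of (67) for rates above R₀, while points with ρ > 1 continue the lower-bound exponent of (71) into the low-rate region.

    import math

    def e0(rho, p):
        # E_0(rho) of (59) for a BSC with uniform input measure.
        e = 1.0 / (1.0 + rho)
        inner = 0.5 * p ** e + 0.5 * (1 - p) ** e
        return -math.log2(2 * inner ** (1 + rho))

    p = 0.05
    for rho in (0.25, 0.5, 1.0, 2.0, 4.0):
        r = e0(rho, p) / rho   # the rate at which this rho applies
        print(f"rho = {rho:4.2f}: R = {r:.3f}, exponent = {e0(rho, p):.3f}")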


Fig. 17. Limiting values of E(R) for very noisy channels.

Comparison of the parametric equations (67) with (71) shows that E_L(R) = lim_{δ→0} E(R) for R > R₀, but that E_L(R) is greater for low rates. For very noisy channels it follows easily from (71) and (68) that

    E_L(R) = C - R,   0 ≤ R ≤ C.   (72)

Actually, however, tighter lower bounds for R < C/2 (Viterbi [9]) show that for very noisy channels the lower-bound exponent is C/2 for 0 ≤ R ≤ C/2 and C - R for C/2 ≤ R ≤ C, which is precisely the result of (69) or of Fig. 17. It follows that, at least for very noisy channels, the exponential bounds are asymptotically exact.

All the results derived in this section can be extended directly to nonbinary (b > 1) codes. It is easily shown (Viterbi [9]) that the same results hold with R = b/n, with R₀ and E₀(ρ) multiplied by b, with all event probability upper bounds multiplied by 2^b - 1, and with all bit probability upper bounds multiplied by (2^b - 1)/b.

Clearly, the ensemble of codes considered here is nonsystematic. However, by a modification of the arguments used here, Bucher and Heller [19] restricted the ensemble to systematic time-varying convolutional codes (i.e., codes for which b code symbols of each branch correspond to the data which generates the branch) and obtained all the above results modified only to the extent that the exponents E(R) and E_L(R) are multiplied by 1 - R. (See also Section VIII.)

Finally, it is most revealing to compare the asymptotic results for the best convolutional codes of a given constraint length with the corresponding asymptotic results for the best block codes of a given block length. Suppose that K bits are coded into a block code of length N, so that R = K/N bits/code symbol. Then it can be shown (Gallager [5], Shannon et al. [8]) that for the best block code the bit error probability is bounded above and below by

    2^{-N[E_Lb(R) + o(N)]} < P_B < 2^{-NE_b(R)}.   (73)

Both E_b(R) and E_Lb(R) are functions of R which for all R > 0 are less than the exponents E(R) and E_L(R) for convolutional codes [9]. In particular, for very noisy channels they both become [5]

    E_b(R) ≈ E_Lb(R) = { C/2 - R,       0 ≤ R ≤ C/4
                       { (√C - √R)²,    C/4 ≤ R ≤ C.   (74)

This is plotted as a dotted curve in Fig. 17. Thus it is clear, by comparing the magnitudes of the negative exponents of (73) and (64), that at least for very noisy channels a convolutional code performs much better asymptotically than the corresponding block code of the same order of complexity. In particular, at R = C/2 the ratio of exponents is 5.8, indicating that to achieve equivalent performance asymptotically the block length must be over five times the constraint length of the convolutional code. Similar degrees of relative performance can be shown for more general memoryless channels [9]. More significant from a practical viewpoint, for short constraint lengths also, convolutional codes considerably outperform block codes of the same order of complexity.
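The asymptotic comparison can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: C is normalized to 1, and the sample rates are arbitrary. It evaluates the very noisy convolutional exponent of (69) and the block exponent of (74) and prints their ratio, which at R = C/2 is the factor 5.8 quoted above.

    import math

    C = 1.0  # capacity; only the ratio R/C matters

    def conv_exponent(r):
        # Limiting convolutional exponent (69) for very noisy channels.
        return C / 2 if r <= C / 2 else C - r

    def block_exponent(r):
        # Block coding exponent (74) for very noisy channels.
        return C / 2 - r if r <= C / 4 else (math.sqrt(C) - math.sqrt(r)) ** 2

    for frac in (0.25, 0.5, 0.75, 0.9):
        r = frac * C
        print(f"R = {frac:.2f}C: E/E_b = {conv_exponent(r) / block_exponent(r):.2f}")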


XI. PATH MEMORY TRUNCATION, METRIC QUANTIZATION, AND SYNCHRONIZATION

A major problem which arises in the implementation of a maximum likelihood decoder is the length of the path history which must be stored. In our previous discussion we ignored this important point and therefore implicitly assumed that all past data would be stored, a final decision being made by forcing the coder into a known (all zeros) state. We now remove this impractical condition. Suppose we truncate the path memories after M bits (branches) have been accumulated, by comparing all 2^{K-1} metrics for a maximum and deciding on the bit corresponding to the path (out of the 2^{K-1}) with the highest metric M branches forward. If M is several times as large as K, the additional bit errors introduced in this way are very few, as we shall now demonstrate using the asymptotic results of the last section.

An additional bit error may occur, due to memory truncation after M branches, if the bit selected is from an incorrect path which differed from the correct path M branches back and which has a higher metric, but which would ultimately be eliminated by the maximum likelihood decoder. For a binary-tree code there can be no more than 2^M distinct paths which differ from the correct path M branches back, and of these we need concern ourselves only with those which have not merged with the correct path in the intervening nodes. As was originally shown by Forney [12], using the ensemble arguments of Section X we may bound the average probability of this event by [see (58)]

    2^{Mρ} 2^{-MnE₀(ρ)},   0 < ρ ≤ 1.   (75)

To minimize this bound we should maximize the exponent E₀(ρ)/R - ρ with respect to ρ on the unit interval. But this yields exactly E_b(R), the upper bound exponent of (73) for block codes. Thus the truncation error probability is bounded by

    2^{-MnE_b(R)}   (76)

where E_b(R) is the block coding exponent. We conclude therefore that the memory truncation error is less than the bit error probability bound without truncation, provided the bound of (76) is less than the bound of (64). This will certainly be assured if

    M E_b(R) > K E(R).   (77)

For very noisy channels we have, from (69) and (74) or Fig. 17,

    M/K > E(R)/E_b(R) = { 1/(1 - 2R/C),             0 ≤ R ≤ C/4
                        { (1/2)/(1 - √(R/C))²,      C/4 ≤ R ≤ C/2
                        { (1 - R/C)/(1 - √(R/C))²,  C/2 ≤ R < C.   (78)

For example, at R = C/2 this indicates that it suffices to take M > (5.8)K.

Another problem faced by a system designer is the amount of storage required by the metrics (or log-likelihood functions) for each of the 2^{K-1} paths. For a BSC this poses no difficulty, since the metric is just the Hamming distance, which is at most n, the number of code symbols per branch. For the AWGN, on the other hand, the optimum metric is a real number, the analog output of a correlator, matched filter, or integrate-and-dump circuit. Since digital storage is generally required, it is necessary to quantize this analog metric. However, once the components y_{jk} of the optimum metric of (5), which are the correlator outputs, have been quantized to Q levels, the channel is no longer an AWGN channel. For biphase modulation, for example, it becomes a binary-input Q-ary-output discrete memoryless channel whose transition probabilities are readily calculated as a function of the energy-to-noise density ratio and the quantization levels. The optimum metric is then not obtained by replacing y_{jk} by its quantized value Q(y_{jk}) in (5), but rather is the log-likelihood function log P(y | x^{(m)}) for the binary-input Q-ary-output channel. Nevertheless, extensive simulation [24] indicates that for 8-level quantization even use of the suboptimal metric Σ_k Q(y_{jk}) x_{jk}^{(m)} results in a degradation of no more than 0.25 dB relative to the maximum likelihood decoder for the unquantized AWGN, and that the optimum metric is only negligibly superior to this. However, this is not the case for sequential decoding, where the difference in performance between optimal and suboptimal metrics is significant [11].

In a practical system, considerations other than error performance for a given degree of decoder complexity often dictate the selection of a coding system. Chief among these are often the synchronization requirements. Convolutional codes utilizing maximum likelihood decoding are particularly advantageous in that no block synchronization is ever required. For block codes, decoding cannot begin until the initial point of each block has been located, and practical systems often require more complexity in the synchronization system than in the decoder. On the other hand, as we have by now amply illustrated, a maximum likelihood decoder for a convolutional code does not require any block synchronization because the coder is free running (i.e., it performs identical operations for each successive input bit and does not require that K bits be input before generating an output). Furthermore, the decoder does not require knowledge of past inputs to start decoding; it may as well assume that all previous bits were zeros. This is not to say that initially the decoder will operate as well, in the sense of error performance, as if the preceding bits of the correct path were known. On the other hand, consider a decoder which starts with an initially known path but makes an error at some point and excludes the correct path. Immediately thereafter it will be operating as if it had just been turned on with an unknown and incorrectly chosen previous path history. That this decoder will recover and stop making errors within a finite number of branches follows from our previous discussions, in which it was shown that, other than for catastrophic codes, error sequences are always finite. Hence our initially unsynchronized decoder will operate just like a decoder which has just made an error; it will thus always achieve synchronization and will generally produce correct decisions after a limited number of initial errors. Simulations have demonstrated that synchronization generally takes no more than four or five constraint lengths of received symbols.

Although, as we have just shown, branch synchronization is not required, code symbol synchronization within a branch is necessary. Thus, for example, for a binary-tree rate R = 1/2 code, we must resolve the two-way ambiguity as to where each two-code-symbol branch begins. This is called node synchronization. Clearly, if we make the wrong decision, errors will constantly be made thereafter. However, this situation can easily be detected, because the mismatch will cause all the path metrics to be small, since in fact there will not be any correct path in this case. We can thus detect this event and change our decision as to node synchronization (cf. Heller and Jacobs [24]). Of course, for an R = 1/n code, we may have to repeat our choice n times, once for each of the symbols on a branch; but since n represents the redundancy factor or bandwidth expansion, practical systems rarely use n > 4.
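The transition probabilities of the quantized channel described above are straightforward to compute. The following sketch is illustrative only: the value ε_s/N₀ = 0.5, the unit noise variance normalization, and the uniform 8-level quantizer thresholds are all arbitrary assumptions, not taken from the paper.

    import math

    def phi(z):
        # Standard normal distribution function via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    es_n0 = 0.5                   # illustrative symbol energy-to-noise ratio
    amp = math.sqrt(2.0 * es_n0)  # mean of the normalized correlator output
    # Uniform 8-level quantizer: seven thresholds, symmetric about zero.
    edges = ([-math.inf]
             + [amp * t for t in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
             + [math.inf])

    # P(output level j | biphase symbol +1): unit-variance Gaussian with
    # mean +amp integrated between consecutive thresholds; by symmetry the
    # probabilities for symbol -1 are the same list reversed.
    p_plus = [phi(edges[j + 1] - amp) - phi(edges[j] - amp) for j in range(8)]
    p_minus = list(reversed(p_plus))

    for j in range(8):
        print(f"level {j}: P(j|+1) = {p_plus[j]:.4f}, P(j|-1) = {p_minus[j]:.4f}")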


XII. OTHER DECODING ALGORITHMS FOR CONVOLUTIONAL CODES

This paper has treated primarily maximum likelihood decoding of convolutional codes. The reason for this was twofold: 1) maximum likelihood decoding is closely related to the structure of convolutional codes, and its consideration enhances our understanding of the ultimate capabilities, performance, and limitations of these codes; 2) for reasonably short constraint lengths (K < 10) its implementation is quite feasible¹⁸ and worthwhile because of its optimality. Furthermore, for K ≤ 6 the complexity of maximum likelihood decoding is sufficiently limited that a completely parallel implementation (separate metric calculators) is possible. This minimizes the decoding time per bit and affords the possibility of extremely high decoding speeds [24].

¹⁸ Performing metric calculations and comparisons serially.

Longer constraint lengths are required for extremely low error probabilities at high rates. Since the storage and computational complexity are proportional to 2^K, maximum likelihood decoders become impractical for K > 10. At this point sequential decoding [2], [4], [6] becomes attractive. This is an algorithm which sequentially searches the code tree in an attempt to find a path whose metric rises faster than some predetermined, but variable, threshold. Since the difference between the correct path metric and any incorrect path metric increases with constraint length, for large K the correct path will generally be found by this algorithm. The main drawback is that the number of incorrect path branches searched, and consequently the computational complexity, is a random variable depending on the channel noise. For R < R₀ it is shown that the average number of incorrect branches searched per decoded bit is bounded [6], while for R > R₀ it is not; hence R₀ is called the computational cutoff rate. To make storage requirements reasonable, it is necessary to make the decoding speed (branches/s) somewhat larger than the bit rate, thus somewhat limiting the maximum bit rate capability. Also, even though the average number of branches searched per bit is finite, it may sometimes become very large, resulting in a storage overflow and consequently in relatively long sequences being erased. The stack sequential decoding algorithm [7], [18] provides a very simple and elegant presentation of the key concepts in sequential decoding, although the Fano algorithm [4] is generally preferable practically.

For a number of reasons, including buffer size requirements, computation speed, and metric sensitivity, sequential decoding of data transmitted at rates above about 100 kbits/s is practical only for hard-quantized binary received data (that is, for channels in which a hard decision, 0 or 1, is made for each demodulated symbol). For the biphase modulated AWGN channel, of course, hard quantization (2 levels or 1 bit) results in an efficiency loss of approximately 2 dB compared with soft quantization (8 or more levels, i.e., 3 or more bits). On the other hand, with maximum likelihood decoding, by employing a parallel implementation, short constraint length codes (K ≤ 6) can be decoded at very high data rates (10 to 100 Mbits/s) even with soft quantization. In addition, the insensitivity to metric accuracy and the simplicity of synchronization render maximum likelihood decoding generally preferable when moderate error probabilities are sufficient. In particular, since sequential decoding is limited by the overflow problem to operate at code rates somewhat below R₀, it appears that for the AWGN the crossover point above which maximum likelihood decoding is preferable to sequential decoding occurs at values of P_B which depend on the transmitted data rate; as the data rate increases, the P_B crossover point decreases.

A third technique for decoding convolutional codes is known as feedback decoding, with threshold decoding [3] as a subclass. A feedback decoder basically makes a decision on a particular bit or branch in the decoding tree or trellis based on the received symbols for a limited number of branches beyond this point. Even though the decision is irrevocable, for limited constraint lengths (which are appropriate considering the limited number of branches involved in a decision) errors will propagate only for moderate lengths. When transmission is over a binary symmetric channel, by employing only codes with certain algebraic (orthogonal) properties, the decision on a given branch can be based on a linear function of the received symbols, called the syndrome, whose dimensionality is equal to the number of branches involved in the decision. One particularly simple decision criterion based on this syndrome, referred to as threshold decoding, is mechanizable in a very inexpensive manner. However, feedback decoders in general, and threshold decoders in particular, have an error-correcting capability equivalent to that of very short constraint length codes, and consequently do not compare favorably with the performance of maximum likelihood or sequential decoding.

Feedback decoders are, however, particularly well suited to correcting the error bursts which may occur in fading channels. Burst errors are generally best handled by using interleaved codes: that is, by employing L convolutional codes so that the jth, (L + j)th, (2L + j)th, etc., bits are encoded into one code for each j = 0, 1, ..., L - 1. This will cause any burst of length less than L to be broken up into random errors for the L independently operating decoders. Interleaving can be achieved by simply inserting (L - 1)-stage delay lines between stages of the convolutional encoder; the resulting single encoder then generates the L interleaved codes. The significant advantage of a feedback or threshold decoder is that the same technique can be employed in the decoder, resulting in a single (time-shared) decoder rather than L decoders and providing feasible implementations for protection against error bursts of thousands of bits. Details of feedback decoding are treated extensively in Massey [3], Gallager [14], and Lucky et al. [16].
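The interleaving just described can be visualized with a toy model. In the sketch below (hypothetical; the degree L = 5 and the burst location are arbitrary choices), channel position i is assigned to code i mod L, so that a burst shorter than L strikes each of the L independently decoded streams at most once.

    L = 5
    burst_start, burst_len = 17, 4   # an illustrative burst of length < L

    errors_per_code = [0] * L
    for i in range(40):              # forty consecutive channel positions
        if burst_start <= i < burst_start + burst_len:
            errors_per_code[i % L] += 1

    for j, e in enumerate(errors_per_code):
        print(f"code {j}: {e} error(s) from the burst")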


APPENDIX I


GENERATING FUNCTION STRUCTURE OF A BINARY-TREE CONVOLUTIONAL CODE FOR ARBITRARY K AND ERROR BOUNDS FOR ORTHOGONAL CODES

We derive here the distance-invariant (D = 1) generating function T(L, N) for any binary-tree (b = 1) convolutional code of arbitrary constraint length K. It is most convenient in the general case to begin with the finite-state machine state-transition matrix for the linear equations among the state (node) variables, the states being labeled by the last K - 1 input bits, oldest first. We exhibit this in terms of N and L for a K = 4 code as follows:

    [ 1     0     0    -NL    0     0     0   ] [X_001]   [NL]
    [-L     1     0     0    -L     0     0   ] [X_010]   [0 ]
    [-NL    0     1     0    -NL    0     0   ] [X_011]   [0 ]
    [ 0    -L     0     1     0    -L     0   ] [X_100] = [0 ]   (78)
    [ 0    -NL    0     0     1    -NL    0   ] [X_101]   [0 ]
    [ 0     0    -L     0     0     1    -L   ] [X_110]   [0 ]
    [ 0     0    -NL    0     0     0    1-NL ] [X_111]   [0 ]

This pattern can easily be seen to generalize to a 2^{K-1} - 1 dimensional square matrix of this form for any binary-tree code of constraint length K, and in general the generating function is

    T(L, N) = L X_{100···0}   (79)

where 100···0 contains K - 2 zeros.

From this general pattern it is easily shown that the matrix can be reduced to a dimension of 2^{K-2}. First, combining adjacent rows, from the second to the last, pairwise, one obtains the set of 2^{K-2} - 1 relations

    X_{j₁j₂···j_{K-2}1} = N X_{j₁j₂···j_{K-2}0}   (80)

where j₁, ..., j_{K-2} runs over all binary vectors except the all zeros. Substitution of (80) into (78) yields a 2^{K-2}-dimensional matrix equation. The result for K = 4 is

    [ 1     0    -NL    0   ] [X_001]   [NL]
    [-L     1    -NL    0   ] [X_010]   [0 ]
    [ 0    -L     1    -L   ] [X_100] = [0 ]   (81)
    [ 0    -NL    0    1-NL ] [X_110]   [0 ]

Combining the first two rows of (81) (adding L times the first row to the second) eliminates X_{001} and yields

    X_{010} = NL² + NL(1 + L) X_{100},   (82)

whence we obtain finally a 2^{K-2} - 1 dimensional matrix equation, which for K = 4 is

    [ 1       -NL(1+L)    0   ] [X_010]   [NL²]
    [-L         1        -L   ] [X_100] = [0  ]   (83)
    [-NL        0        1-NL ] [X_110]   [0  ]

Note that (83) is the same as (78) for K reduced by unity, but with modifications in two places, both in the first row; namely, the power of L in the first component on the right side is increased by one, and the middle term of the first row is reduced by an amount NL². Although we have given the explicit result only for K = 4, it is easily seen to be valid for any K.

Since in all respects except these two the matrix after this sequence of reductions is the same as the original, but with its dimension reduced corresponding to a reduction of K by unity, we may proceed to perform this sequence of reductions again. The steps will be the same, except that now in place of (80) we have the corresponding relations among the reduced variables (80′), and in place of (82) the combined first row (82′); at each further reduction the powers of L grow by one, so that after the second reduction the middle term of the first row is reduced by the amount N(L² + L³) and the first component on the right side becomes NL³, and so forth. Performing this sequence of reductions K - 2 times in all, but omitting the last step (leading from (81) to (83)) in the last reduction, the original 2^{K-1} - 1 equations are reduced in the general case to the two equations

    X_{010···0} - NL(1 + L + ··· + L^{K-3}) X_{100···0} = NL^{K-2}
    -L X_{010···0} + (1 - NL) X_{100···0} = 0   (84)

whence it follows that

    X_{11···1} = (NL)^{K-1} / [1 - N(L + L² + ··· + L^{K-1})].   (85)

Applying (79) and the K - 2 extensions of (80) and (80′), we find

    T(L, N) = L X_{100···00} = L N⁻¹ X_{100···01} = L N⁻² X_{100···011} = ··· = L N^{-(K-2)} X_{11···1}
            = N L^K (1 - L) / [1 - L(1 + N) + N L^K].   (86)
If we require only the path length structure, and not the number of bit errors corresponding to an incorrect path, we may set N = 1 in (86) and obtain

    T(L) = L^K (1 - L) / (1 - 2L + L^K).   (87)

If we take as an upper bound an expression which is the generating function of more paths than exist in our state diagram, we have

    T(L) < L^K / (1 - 2L).   (88)

As an additional application of this generating function technique, we now obtain bounds on P_E and P_B for the class of orthogonal convolutional (tree) codes introduced by Viterbi [10]. For this class of codes, to each of the 2^K branches of the state diagram there corresponds one of 2^K orthogonal signals. Given that each signal is orthogonal to all the others in n dimensions, corresponding to n channel symbols or transmission times (for example, when each signal consists of a distinct set of pulse positions), the weight of each branch is n. Consequently, if we replace L, the path length enumerator, by D^n in (86), we obtain for orthogonal codes

    T(D, N) = N D^{nK}(1 - D^n) / [1 - D^n(1 + N) + N D^{nK}].   (89)

Then, using (48) and (49), the first-event error probability for orthogonal codes is bounded by

    P_E < D₀^{nK}(1 - D₀^n) / (1 - 2D₀^n + D₀^{nK})   (90)

and the bit error probability is bounded by

    P_B < D₀^{nK}(1 - D₀^n)² / (1 - 2D₀^n + D₀^{nK})² < D₀^{nK} / (1 - 2D₀^n)²   (91)

where D₀ is a function of the channel transition probabilities or energy-to-noise ratio and is given by (46).

ACKNOWLEDGMENT

The author gratefully acknowledges the considerable stimulation he has received, over the course of writing the several versions of this paper, from Dr. J. A. Heller, whose recent work strongly complements and enhances this effort, for numerous discussions and suggestions and for assistance in its presentation at the Linkabit Corporation "Seminars on Convolutional Codes." This tutorial approach owes part of its origin to Dr. G. D. Forney, Jr., whose imaginative and perceptive reinterpretation of my original work has aided immeasurably in rendering it more comprehensible. Also, thanks are due to Dr. J. K. Omura for his careful and detailed reading and correction of the manuscript during his presentation of this material in the UCLA graduate course on information theory.

REFERENCES

[1] P. Elias, "Coding for noisy channels," in 1955 IRE Nat. Conv. Rec., vol. 3, pt. 4, pp. 37-46.
[2] J. M. Wozencraft, "Sequential decoding for reliable communication," in 1957 IRE Nat. Conv. Rec., vol. 5, pt. 2, pp. 11-25.
[3] J. L. Massey, Threshold Decoding. Cambridge, Mass.: M.I.T. Press, 1963.
[4] R. M. Fano, "A heuristic discussion of probabilistic decoding," IEEE Trans. Inform. Theory, vol. IT-9, Apr. 1963, pp. 64-74.
[5] R. G. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inform. Theory, vol. IT-11, Jan. 1965, pp. 3-18.
[6] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[7] K. S. Zigangirov, "Some sequential decoding procedures," Probl. Peredach. Inform., vol. 2, no. 4, 1966, pp. 13-25.
[8] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, "Lower bounds to error probability for coding on discrete memoryless channels," Inform. Contr., vol. 10, 1967, pt. I, pp. 65-103; pt. II, pp. 522-552.
[9] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, Apr. 1967, pp. 260-269.
[10] A. J. Viterbi, "Orthogonal tree codes for communication in the presence of white Gaussian noise," IEEE Trans. Commun. Technol., vol. COM-15, Apr. 1967, pp. 238-242.
[11] I. M. Jacobs, "Sequential decoding for efficient communication from deep space," IEEE Trans. Commun. Technol., vol. COM-15, Aug. 1967, pp. 492-501.
[12] G. D. Forney, Jr., "Coding system design for advanced solar missions," submitted to NASA Ames Res. Ctr. by Codex Corp., Watertown, Mass., Final Rep., Contract NAS2-3637, Dec. 1967.
[13] J. L. Massey and M. K. Sain, "Inverses of linear sequential circuits," IEEE Trans. Comput., vol. C-17, Apr. 1968, pp. 330-337.
[14] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[15] T. N. Morrissey, "Analysis of decoders for convolutional codes by stochastic sequential machine methods," Univ. Notre Dame, Notre Dame, Ind., Tech. Rep. EE-682, May 1968.
[16] R. W. Lucky, J. Salz, and E. J. Weldon, Principles of Data Communication. New York: McGraw-Hill, 1968.
[17] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-15, Jan. 1969, pp. 177-179.
[18] F. Jelinek, "Fast sequential decoding algorithm using a stack," IBM J. Res. Dev., vol. 13, no. 6, Nov. 1969, pp. 675-685.
[19] E. A. Bucher and J. A. Heller, "Error probability bounds for systematic convolutional codes," IEEE Trans. Inform. Theory, vol. IT-16, Mar. 1970, pp. 219-224.
[20] J. P. Odenwalder, "Optimal decoding of convolutional codes," Ph.D. dissertation, Dep. Syst. Sci., Sch. Eng. Appl. Sci., Univ. California, Los Angeles, 1970.
[21] G. D. Forney, Jr., "Coding and its application in space communications," IEEE Spectrum, vol. 7, June 1970, pp. 47-58.
[22] G. D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Trans. Inform. Theory, vol. IT-16, Nov. 1970, pp. 720-738; "II: Maximum likelihood decoding," and "III: Sequential decoding," IEEE Trans. Inform. Theory, to be published.
[23] W. J. Rosenberg, "Structural properties of convolutional codes," Ph.D. dissertation, Dep. Syst. Sci., Sch. Eng. Appl. Sci., Univ. California, Los Angeles, 1971.
[24] J. A. Heller and I. M. Jacobs, "Viterbi decoding for satellite and space communication," this issue, pp. 835-848.
[25] A. R. Cohen, J. A. Heller, and A. J. Viterbi, "A new coding technique for asynchronous multiple access communication," this issue, pp. 849-855.

Andrew J. Viterbi (S'54-M'58-SM'63) was born in Bergamo, Italy, on March 9, 1935. He received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1957, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1962.

While attending M.I.T., he participated in the cooperative program at the Raytheon Company. In 1957 he joined the Jet Propulsion Laboratory, where he became a Research Group Supervisor in the Communications Systems Research Section. In 1963 he joined the faculty of the University of California, Los Angeles, as an Assistant Professor. In 1965 he was promoted to Associate Professor and in 1969 to Professor of Engineering and Applied Science. He was a cofounder in 1968 of Linkabit Corporation, of which he is presently Vice President.

Dr. Viterbi is a member of the Editorial Boards of the Proceedings of the IEEE and of the journal Information and Control. He is a member of Sigma Xi, Tau Beta Pi, and Eta Kappa Nu and has served on several governmental advisory committees and panels. He is the coauthor of a book on digital communication and author of another on coherent communication, and he has received three awards for his journal publications.

Burst-Correcting Codes for the Classic Bursty Channel

Abstract-The purpose of this paper is to organize and clarify the work of the past decade on burst-correcting codes. Our method is, first, to define an idealized model, called the classic bursty channel, toward which most burst-correcting schemes are explicitly or implicitly aimed; next, to bound the best possible performance on this channel; and, finally, to exhibit classes of schemes which are asymptotically optimum and serve as archetypes of the burst-correcting codes actually in use. In this light we survey and categorize previous work on burst-correcting codes. Finally, we discuss qualitatively the ways in which real channels fail to satisfy the assumptions of the classic bursty channel, and the effects of such failures on the various types of burst-correcting schemes. We conclude by comparing forward error correction to the popular alternative of automatic repeat-request (ARQ).

INTRODUCTION
MOST WORK in coding theory has been addressed to efficient communication over memoryless channels. While this work has been directly applicable to space channels [1], it has been of little use on all other real channels, where errors tend to occur in bursts. The use of interleaving to adapt random-error-correcting codes to bursty channels is frequently proposed, but turns out to be a rather inefficient method of burst correction.

Paper approved by the Communication Theory Committee of the IEEE Communication Technology Group for publication without oral presentation. Manuscript received May 10, 1971. The author is with Codex Corporation, Newton, Mass. 02195.

Of the work that has gone into burst-correcting codes, the bulk has been devoted to finding codes capable of correcting all bursts of length B separated by guard spaces of length G. We call these zero-error burst-correcting codes. It has been realized in the past few years that this work too has been somewhat misdirected; for on channels for which such codes are suited, called in this paper classic bursty channels, much more efficient communication is possible if we require only that practically all bursts of length B be correctible.

The principal purpose of this paper is tutorial. In order to clarify the issues involved in the design of burst-correcting codes, we examine an idealized model, the classic bursty channel, on which bursts are never longer than B nor guard spaces shorter than G. We see that the inefficiency of zero-error codes is due to their operating at the zero-error capacity of the channel, approximately (G - B)/(G + B), rather than at the true capacity, which is more like G/(G + B). Operation at the true capacity is possible, however, if bursts can be treated as erasures; that is, if their locations can be identified. By the construction of some archetypal schemes in which short Reed-Solomon (RS) codes are used with interleavers, we arrive at asymptotically optimal codes of
