Parallel Viterbi Algorithm Implementation: Breaking The ACS-Bottleneck

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 37, NO. 8, AUGUST 1989, p. 785

Gerhard Fettweis and Heinrich Meyr
Abstract: The central unit of a Viterbi decoder is a data-dependent feedback loop which performs an add-compare-select (ACS) operation. This nonlinear recursion is the only bottleneck for a high-speed parallel implementation.

This paper presents a solution to implement the Viterbi algorithm by parallel hardware for high data rates. For a fixed processing speed of given hardware it allows a linear speedup in the throughput rate by a linear increase in hardware complexity. A systolic array implementation is presented.

The method described here is based on the underlying finite state feature. Thus it is possible to transfer this method to other types of algorithms which contain a data-dependent feedback loop and have a finite state property.

Paper approved by the Editor for Coding Theory and Applications of the IEEE Communications Society. Manuscript received August 12, 1987; revised May 1, 1988. This paper was presented in part at ICC'88, Philadelphia, PA, June 12-15, 1988.
The authors are with Aachen University of Technology, Templergraben 55, 5100 Aachen, West Germany.
IEEE Log Number 8929111.
0090-6778/89/0800-0785$01.00 © 1989 IEEE

I. INTRODUCTION

To boost the achievable throughput rate of an implementation of an algorithm, parallel and/or pipelined architectures can be used. For high-speed implementations of an algorithm, architectures are desired that at maximum lead to a linear increase in hardware complexity for a linear speedup in the throughput rate if the limit of the computational speed of the hardware is reached. An architecture that achieves this linear dependency is referred to as a linear scale solution. It can be derived for a number of algorithms such as those of the plain feedforward type. Also, for algorithms containing linear feedback loops a linear scale solution can be found [1]. However, a linear scale solution has not yet been achieved for algorithms containing a data-dependent decision feedback. An algorithm of the latter type is the Viterbi algorithm (VA), which is related to dynamic programming [2].

In this paper, a linear scale solution (architecture) is presented which allows the implementation of the VA despite the fact that the VA contains a data-dependent decision feedback loop. In Section II the VA and its application are described. Section III introduces the new method which achieves the linear scale solution. The add-compare-select (ACS) unit of the VA can be implemented with this new method as a systolic array as shown in Section IV. Investigations concerning the implementation of the survivor memory are found in Section V. Conclusions form the contents of the summarizing Section VI.

II. PROBLEM DEFINITION

In 1967, the VA was presented as a method of decoding convolutional codes [3]. In the meantime it has been proven to be a solution for a variety of digital estimation problems. The VA is an efficient realization of optimum sequence estimation of a finite state discrete-time Markov process where the optimality can be achieved by criteria such as maximum-likelihood or maximum a posteriori. For a tutorial on the VA see [4]. Below, the VA is explained only briefly to introduce the notation used.

The underlying discrete-time Markov process has a number of $N_z$ states $z_i$. At time $(n + 1)T$ a transition takes place from the state of time $nT$ to the new state of time $(n + 1)T$. The transitions are independent (Markov process) and occur on a memoryless (noisy) channel.¹ The transition dynamics can be described by a trellis diagram, see Fig. 1. Note that parallel transition branches can also exist, as shown in Fig. 1 for a transition leaving $z_1$. To simplify the notation, we assume $T = 1$ and the transition probabilities to be time invariant.

¹ For certain problems the VA has proven to be an efficient solution even for channels with memory (intersymbol interference [5], [6]).

The VA estimates (reconstructs) the path the Markov process has taken through the trellis recursively (sequence estimation). At each new time instant $n$ and for every state the VA calculates the optimum path which leads to that state, and discards all other paths already at time $n$ as nonoptimal. This is accomplished by summing a probability measure called state metric $\Gamma_{n,z_i}$ for each state $z_i$ at every time instant $n$. At the next time instant $n + 1$, depending on the newly observed transition, a transition metric $\lambda_{n,z_k \to z_i}$ is calculated for all possible transition branches of the trellis.

The algorithm for obtaining the updated $\Gamma_{n+1,z_i}$ can be described in the following way. It is called the add-compare-select (ACS) unit of the VA. For each state $z_i$ and all its predecessor states $z_k$ choose that path as optimum according to the following decision:

$$\Gamma_{n+1,z_i} := \max_{\text{all possible } z_k \to z_i} \left( \Gamma_{n,z_k} + \lambda_{n,z_k \to z_i} \right).$$

The surviving path has to be updated for each state and has to be stored in an additional memory called survivor memory. For a sufficiently large number of observed transitions (survivor depth $B$) it is highly probable that all $N_z$ paths merge when they are followed back. Hence, the number $B$ of transitions which have to be stored as the path leading to each state is finite, which allows the estimated transition of time instant $n - B$ to be determined.

Note, when parallel branches [(a) and (b)] exist, one can find the maximum of their transition metrics before the ACS procedure is performed, since

$$\max\left( \Gamma_{n,z_k} + \lambda^{(a)}_{n,z_k \to z_i},\; \Gamma_{n,z_k} + \lambda^{(b)}_{n,z_k \to z_i} \right) = \Gamma_{n,z_k} + \max\left( \lambda^{(a)}_{n,z_k \to z_i},\; \lambda^{(b)}_{n,z_k \to z_i} \right).$$

Therefore, the notation used here assumes that the maximum metric of each set of parallel branches is found prior to the ACS operation being performed. It is the one referred to as $\lambda_{n,z_k \to z_i}$.

An implementation of the VA is called a Viterbi decoder.
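The ACS recursion of Section II can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout `branches[(k, i)]` for the transition metric of branch $z_k \to z_i$, the function names, and the traceback helper are assumptions made here for clarity.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

# Assumed layout: branches[(k, i)] is the transition metric of z_k -> z_i.
# State pairs that are absent from the dictionary have no branch.
Branches = Dict[Tuple[int, int], float]


def merge_parallel(parallel: Dict[Tuple[int, int], List[float]]) -> Branches:
    """Reduce every set of parallel branches to its maximum metric."""
    return {edge: max(metrics) for edge, metrics in parallel.items()}


def acs_step(gamma: List[float], branches: Branches) -> Tuple[List[float], List[int]]:
    """One add-compare-select step over all states.

    Returns the updated state metrics and, per state, the selected
    predecessor (the survivor decision). Ties keep the first candidate.
    """
    new_gamma = [NEG_INF] * len(gamma)
    decision = [-1] * len(gamma)
    for (k, i), lam in branches.items():
        candidate = gamma[k] + lam        # add
        if candidate > new_gamma[i]:      # compare
            new_gamma[i] = candidate      # select
            decision[i] = k
    return new_gamma, decision


def traceback(decisions: List[List[int]], end_state: int) -> List[int]:
    """Follow stored survivor decisions back from end_state."""
    path = [end_state]
    for dec in reversed(decisions):
        path.append(dec[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical 2-state trellis with one pair of parallel branches 0 -> 0.
    branches = merge_parallel({(0, 0): [1.0, 3.0], (0, 1): [2.0],
                               (1, 0): [0.5], (1, 1): [4.0]})
    gamma, history = [0.0, 0.0], []
    for _ in range(2):
        gamma, dec = acs_step(gamma, branches)
        history.append(dec)
    print(gamma, traceback(history, end_state=1))
```

The loop in `acs_step` makes the data dependency visible: the metrics of step $n + 1$ cannot be formed before the compare-select of step $n$ has finished, which is the feedback loop the paper sets out to break.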
The Viterbi decoder (VD) can be modeled as shown in Fig. 2, where it is broken up into its three basic pipelined components: the computation unit of the transition metrics, the add-compare-select (ACS) unit, and the survivor memory.

Fig. 1. General example of a trellis.

Fig. 2. Pipeline structure of a Viterbi decoder (input, transition metric unit, ACS unit with feedback of the updated state metrics $\Gamma$, survivor memory, output).

III. THE HIGH-SPEED PARALLEL IMPLEMENTATION OF THE VITERBI ALGORITHM

A high speed implementation of the VA can only be achieved by increasing the speed of computation of all its three units. This can be done conventionally for the transition metric unit, as it is of simple feedforward type which can easily be implemented with parallel and pipelined architecture, thus allowing a linear scale solution. The survivor memory follows, as a slave unit, the decision feedback of the predecessor master unit (ACS). Since the ACS unit is much more complex, it is the bottleneck which limits the throughput rate. Consequently, we omit the survivor memory in the discussion below, but discuss it later in Section V.

The ACS procedure has to be computed independently for each individual state. Thus, a parallel number of $N_z$ computation cells, called ACS cells, can be implemented in the ACS unit to perform the ACS operation separately for every state (then called shuffle exchange ACS unit). When requiring a very high decoding speed this approach is limited by the maximum achievable computation speed of an ACS cell [7]. Since the ACS feedback loop contains data-dependent multiplexing, it seems that the above-mentioned approach provides the maximum parallelism that can be achieved for the ACS unit. The nonlinear ACS feedback loop does not allow any linear algebraic methods to be used to obtain highly parallel/pipelined architecture as is done for linear feedback loops, e.g., in [1]. However, a linear scale solution can be found.

A. Introducing the M-Step Trellis

The underlying Markov process is time discrete with rate $1/T$, i.e., transitions take place in intervals of length $T$ between $nT$ and $(n + 1)T$. Consequently, the trellis describing the process is also time discrete with rate $1/T$. However, the same Markov process can also be viewed in time intervals of length $MT$, i.e., when observing the transitions from $nMT$ to $(n + 1)MT$. Thus, the trellis describing the process by this lower rate is time discrete with rate $1/(MT)$. Since this trellis combines $M$ transitions of the original trellis we refer to it as the M-step trellis (Ms trellis, Ms transition, Ms-VA, etc.), while the original trellis from now on is referred to as the 1-step trellis (1s trellis, 1s-VA, etc.). An illustration for a simple example with $M = 3$ is given in Fig. 3. Now the Ms trellis can be used for Viterbi decoding the same process, allowing the Ms-ACS loop to be computed $M$ times slower.

Fig. 3. Principle of the M-step trellis shown for a simple example ($M = 3$).

But the number of transition branches in the Ms trellis increases exponentially as $M$ increases linearly. Also states are connected by transition branches which were not connected in the 1s trellis.

B. Linear Scale Solution

As was mentioned in Section II, to achieve a fast Ms-ACS unit parallel branches of the trellis should be eliminated prior to the Ms-ACS unit using them. This has to be done to simplify the Ms-ACS procedure as far as possible and to minimize the time required by the comparison and feedback. It is the actual part where the exponential increase in implementation effort arises in the Ms trellis. However, the search for the optimum Ms transition branch of each set of parallel branches can be achieved by the VA using the 1s trellis. This can be explained as follows. Let us calculate the Ms transition metrics for the simple example shown in Fig. 3, e.g., from state $z_1$ to all $N_z = 3$ states $z_1$, $z_2$, and $z_3$. The exponential increase in branches of the Ms trellis is illustrated by the rooted tree shown in Fig. 4. However, this rooted tree can be redrawn as a trellis as shown in Fig. 5 which we refer to as a rooted 1s trellis in contrast to a 1s trellis as shown in Fig. 3. Hence, to estimate the optimum transition metric of the Ms trellis from state $z_1$ at time $nT$ to all three states at time $(n + M)T$, the VA can be used based on this rooted 1s trellis. This has to be done for all states (see Fig. 5).

In general, the maximum transition metric out of every set of parallel branches can be computed and selected by applying the VA to decode each (of $N_z$) rooted 1s trellis. The most important aspect of this approach, which allows the breaking of the ACS feedback bottleneck, is that the length (number of transitions or steps) of each rooted 1s trellis equals $M$. Thus, the computational complexity of each 1s-VA is asymptotically linearly dependent on $M$. Furthermore, the 1s-VA's are independent of each other and independent of the state metrics $\Gamma$ of the Ms-VA. They can therefore be computed with pipelined and/or parallel 1s-VD's. Note that the parallel rooted 1s trellises, as e.g., shown in Fig. 5, use the transition metrics based on the original trellis. Hence, the implementation of the transition metric unit is not affected by the parallel ACS implementation.

C. Linear Scale Solution: Verification

We recall that the bottleneck in a high speed implementation of the VA is the ACS unit, containing a maximum number of $N_z$ ACS cells in its parallel version (shuffle exchange ACS unit). Thus, for a discussion of time and hardware scaling we need to introduce the two total cycle times of the ACS feedback loops: first for the Ms-ACS unit of the Ms-VD, $\tau$, and second for the 1s-ACS unit of the 1s-VD, $\theta$.

The Ms trellis in general has a much higher connectivity than the 1s trellis.
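The two claims made so far, that the Ms transition metrics can be obtained by running a 1s-VA on each rooted 1s trellis and that the Ms trellis is more densely connected than the 1s trellis, can be checked numerically. The sketch below is an illustration under assumptions made here: the matrix layout `lam[k][i]` for branch $z_k \to z_i$, the function names, and the 3-state connectivity used in the demo are hypothetical and are not the example of Fig. 3. Integer metrics are used so that the comparison is exact.

```python
import random
from typing import List

NEG_INF = float("-inf")
Matrix = List[List[float]]  # lam[k][i]: metric of z_k -> z_i, NEG_INF if no branch


def acs(gamma: List[float], lam: Matrix) -> List[float]:
    """One ACS step for all states (metrics only, decisions omitted)."""
    n = len(gamma)
    return [max(gamma[k] + lam[k][i] for k in range(n)) for i in range(n)]


def rooted_trellis_metrics(root: int, lams: List[Matrix]) -> List[float]:
    """1s-VA over the rooted 1s trellis of state `root`, spanning M = len(lams) steps."""
    state = list(lams[0][root])   # "load": first transition metrics become state metrics
    for lam in lams[1:]:          # the remaining M - 1 ACS operations
        state = acs(state, lam)
    return state                  # best M-step metric from root to every end state


def m_step_matrix(lams: List[Matrix]) -> Matrix:
    """Ms transition metrics, one independent rooted 1s trellis per start state."""
    return [rooted_trellis_metrics(root, lams) for root in range(len(lams[0]))]


def run_one_step_va(gamma: List[float], lams: List[Matrix]) -> List[float]:
    """Reference: M consecutive 1s ACS steps on the original trellis."""
    for lam in lams:
        gamma = acs(gamma, lam)
    return gamma


if __name__ == "__main__":
    rng = random.Random(0)
    mask = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]   # hypothetical 1s connectivity
    lams = [[[rng.randint(-9, 9) if mask[k][i] else NEG_INF for i in range(3)]
             for k in range(3)] for _ in range(3)]          # M = 3
    gamma0 = [0, 2, -1]
    ms = m_step_matrix(lams)
    print(acs(gamma0, ms) == run_one_step_va(gamma0, lams))
    print(sum(v != NEG_INF for row in ms for v in row), "Ms branches vs",
          sum(map(sum, mask)), "1s branches")
```

Each call to `rooted_trellis_metrics` depends only on the transition metrics, not on the state metrics of the Ms-VA, which is why the rooted trellises can be processed in parallel outside the feedback loop. Only the single `acs(gamma0, ms)` step remains inside it.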
Because many more states are connected by transition branches in the Ms trellis than in the 1s trellis, the minimum achievable $\tau$ is in general greater than $\theta$. Another time parameter which has already been introduced is the rate of transition $1/T$ of the underlying Markov process. Thus, given $\tau$ and $T$ this directly implies the minimum $M$ of the Ms trellis by

$$M \ge \frac{\tau}{T}. \tag{1}$$

Fig. 4. Rooted tree of Ms-transitions leaving state $z_1$ for $M = 3$ of the example of Fig. 3.

Fig. 5. The $N_z$ rooted 1s trellises of the example of Fig. 3 ($M = 3$).

Since the 1s-VA is based on a rooted trellis, the first step of the 1s-VA is simply a "load" operation where transition metrics are loaded as state metrics. And because each 1s-VA is computed over a limited interval of $M$ transitions only, (at maximum) a number of $M - 1$ add-compare-select operations have to be performed by each 1s-VD while the "missing" $M$th ACS operation is inherently performed by the Ms-VD.² Thus each 1s-ACS unit needs the finite time $(M - 1)\theta$ to perform one 1s-VA, and therefore can be time-multiplexed to perform the 1s-VA's, see Fig. 6. Then the number $L$ of 1s-ACS units needed for each (of $N_z$) rooted 1s-trellis is given by

$$L \ge \frac{(M - 1)\,\theta}{M\,T}. \tag{2}$$

² Note that one ACS operation comprises up to $N_z$ ACS computations, one for each state.

During the time interval $L\tau$ (the Ms-VD being clocked at rate $1/\tau = 1/(MT)$) the 1s-ACS unit has to have finished a complete computation over $M$ transitions of a 1s trellis to be multiplexed to its next 1s trellis (see Fig. 6). In other words, each 1s-ACS unit carries out a 1s-VA which is based on a rooted 1s trellis of each $L$th Ms transition. The computations which are carried out on the $L N_z$ 1s-ACS units have to be synchronized in such a way that their outputs (ready after every $M - 1$ 1s-ACS operations) form the sequence of transition metrics which is needed for the Ms-VD. The resulting block diagram of the parallel VD is given in Fig. 7.

Fig. 6. Timing diagram of the decoding cycles of the Ms-VD and the 1s-ACS units given the time scale of the Ms transitions.

Fig. 7. Multiplexed structure of a Ms/1s-VD implementation.

Equation (1) implies that for a given $\tau$ the minimum $M$ depends linearly on the rate $1/T$ required, but $M$ does not influence the amount of ACS-hardware needed. The factor that indicates the complexity of hardware is given by $L$ ($L N_z$ 1s-ACS units), and (2) shows a linear dependency of $L$ on the rate $1/T$ for a given $\theta$. Therefore, an implementation of the ACS procedure of the VA is found which is a linear scale solution. Note that for a given $T$ the achievable $\tau$ only implies the $M$ needed, but the influence of $M$ on the required $L$ given in (2) is negligible. Therefore, this linear scale solution is independent of the Ms-VD, i.e., for a desired speedup of $1/T$ only additional 1s-ACS cells are necessary and no additional Ms-VD hardware is required. The complexity of the implementation only depends on $\theta$ (leading to the required $L$). Thus, the Ms-VD can be computed with a long cycle time $\tau$ without influencing the amount of $L$-fold ACS-hardware required. This is a very important result since the Ms-trellis in general has a much greater connectivity and therefore the Ms-ACS unit has many more additions and comparisons to perform than the 1s-ACS unit.

The parallel VD implementation, which we refer to as Ms/1s-VD, requires an additional multiplicity factor of 1s-ACS units by the number of states $N_z$ and the speedup $L$. Now, each 1s-ACS unit in its fully parallel shuffle exchange implementation comprises $N_z$ ACS cells. Thus, the linear scale solution presented here is linear assuming a given trellis, i.e., a given Markov process, but depends on the number of states at least by $O(N_z^2)$. However, various implementation architectures can be chosen for a 1s-ACS unit [8]. Each is characterized by its complexity $A$ and its decoding cycle time $\theta$. When a Ms/1s-VD is composed of such 1s-ACS units a speedup by $L$ leads to a complexity $C$ proportional to

$$C \sim A\,L\,N_z. \tag{3}$$

By (2) the speedup $L$ is proportional to $\theta/T$ ($L \sim \theta/T$), which combined with (3) yields the (complexity) × (cycle-time) measure

$$C\,T \sim A\,\theta\,N_z. \tag{4}$$

This states that the additional multiplicity factor by $N_z$ also arises in the $CT$ measure of the parallel Ms/1s-VD when compared to the $A\theta$ measure [8] of a corresponding single implementation. We mention the fact that another new linear scale solution for the VD with proportionality between $CT$ and $A\theta$ is outlined in [9], [10].

IV. SYSTOLIC ARRAY IMPLEMENTATION

The newly derived parallel VD can easily be implemented by a simple multiplexed structure as shown in Fig. 7.
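The sizing and time-multiplexing of such a structure can be sketched as follows. This is an illustration under stated assumptions: the rule $L \cdot MT \ge (M - 1)\theta$ is how (2) is read here (each unit is busy for $(M - 1)\theta$ per 1s-VA and a new Ms transition starts every $MT$), the round-robin assignment `j % L` and all function names are hypothetical, and $M = 8$ is chosen arbitrarily. The cell speed (20 MHz), baud rate (120 MHz) and $N_z = 4$ are the figures quoted in the design example of Section VI. Times are exact fractions in picoseconds.

```python
import math
from fractions import Fraction


def min_m(tau, T) -> int:
    """Smallest integer M satisfying M >= tau / T, as in (1)."""
    return math.ceil(Fraction(tau) / Fraction(T))


def units_per_trellis(M: int, theta, T) -> int:
    """Smallest integer L with L * M * T >= (M - 1) * theta (assumed reading of (2))."""
    return max(1, math.ceil((M - 1) * Fraction(theta) / (M * Fraction(T))))


def round_robin_ok(num_transitions: int, M: int, L: int, theta, T) -> bool:
    """Check that unit j % L is idle again when Ms transition j is handed to it."""
    busy_until = [Fraction(0)] * L
    for j in range(num_transitions):
        unit = j % L
        start = j * M * Fraction(T)           # Ms transitions arrive every M*T
        if start < busy_until[unit]:
            return False                      # unit still computing its last 1s-VA
        busy_until[unit] = start + (M - 1) * Fraction(theta)
    return True


def acs_cells(L: int, n_states: int) -> int:
    """L units per rooted trellis, N_z rooted trellises, N_z cells per unit."""
    return L * n_states * n_states


if __name__ == "__main__":
    theta = 50_000                  # ps, one ACS cell at 20 MHz
    T = Fraction(25_000, 3)         # ps, 120 MHz transition rate
    M = 8                           # hypothetical choice, not from the paper
    L = units_per_trellis(M, theta, T)
    print("L =", L, "cells =", acs_cells(L, 4))
    print("schedule feasible:", round_robin_ok(100, M, L, theta, T))
    print("one unit fewer:", round_robin_ok(100, M, L - 1, theta, T))
```

With these inputs the ratio $(M - 1)/M$ stays close to one, so $L$ tracks $\theta/T$ and is nearly insensitive to the particular $M$, which is the point made in the text about (2).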
The 1s-ACS cells have to be clocked in a way that the Ms-VD receives their results in the correct time slots, which means that the rate of multiplexing has to be equal to the clock rate of the Ms-VD, $1/\tau = 1/(MT)$. Now, for $\theta = \tau = MT$ this leads to an overall synchronous system, in which at each time instant $l\tau$ ($l$ as time index) the computation of a set of $N_z$ parallel 1s-VA's is started and the same number of 1s-VA's are completed. Therefore, instead of implementing a set of parallel multiplexed 1s-ACS units one can also implement a pipeline structure of these 1s-ACS units. The pipeline has the length $M - 1$ ($\theta = \tau = MT \Rightarrow L = M - 1$) and the computation of each 1s-VA is pipelined through this implementation. At the end of this pipeline the results of the last iteration of the 1s-VA's can simply be fed to the Ms-ACS unit.

This is shown in Fig. 8 for an example with $N_z = 3$, $M = 9$, and $\tau = \theta = MT$ ($\Rightarrow L = M - 1$). The systolic array is clocked at the time instants $l\tau$ which results in a throughput rate of $1/T = M/\theta = (L + 1)/\theta$. Each column of the array computes the $(M - 1)$-fold ACS procedure based on one rooted 1s trellis. Therefore, a parallel number of $N_z$ columns has to be implemented. As a result this systolic array implementation consists of a number of $N_z$ independent parallel columns, each made up of cells which communicate only in the top-down direction. Since the input (transition metrics) to all ACS units of one row is the same, only one conventional 1s transition metric unit has to be implemented for each row. To minimize the interconnection wiring between the rows of each column of the array the methods presented in [11]-[13] and/or of the cascade processor presented in [8] can be applied (here as a pure feedforward implementation). The systolic array can be transferred to a wavefront array solution which can be easier to clock in case of a large array (clock skew).

Fig. 8. Systolic array solution of the Ms/1s-VD, clocked at time instances $l\theta = l\tau$ (rate $1/\theta$, $l$ is the time index). Here, $\lambda_n$ is the complete set of 1s transition metrics of time instant $n$. (Figure labels: $N_z = 3$ columns of 1s-ACS units, $L = M - 1$ rows, input of state metrics, 1-step-ACS unit, M-step-VD, updated state metrics.)

For any implementation (systolic/wavefront array or multiplexed version) the 1s-ACS units can be divided into a set of $P$ pipelined (latched) parts, e.g., for $P = 3$ into three parts with part 1: add, part 2: compare and part 3: select. Therefore, depending on the number $P$ of pipelined parts, $P$ 1s-VA's can be interleaved in one ACS unit. An especially interesting pipelined architecture can be derived for the systolic array solution of Fig. 8, since the whole 1s-array is of simple feedforward structure. If one column is pipelined by $P = N_z$, then the processing performed by all $N_z$ columns can be pipeline interleaved [15] in this one column, see Fig. 9. Hence, this new array is clocked at rate $P/\theta = N_z/\theta$. The main advantage of this pipelined systolic array is the better exploitation of processing hardware and the reduced amount of wiring required. The wiring is reduced in particular between the 1s-array and the Ms-VD. Here the simple array supplies the Ms-VD in parallel with $N_z^2$ Ms-transition metrics (equal to 1s-state metrics) where the pipelined array supplies the Ms-VD serially $N_z$ times in a row with $N_z$ metrics. This allows the Ms-VD to carry out a serial processing of its ACS procedure which is another major advantage of the pipelined array.

Fig. 9. $N_z$-fold pipelined and interleaved systolic Ms/1s-VD. Here, $\lambda_n$ is the complete set of 1s transition metrics of time instant $n$. The sets of $\lambda$ are fed in $P = N_z$ times in a row. Therefore, the index $l$ is incremented every $N_z$ clock cycles. The array is clocked by rate $P/\theta = N_z/\theta$.

V. SURVIVOR MEMORY

By introducing the Ms/1s approach a linear scale solution was presented for the ACS unit of a parallel high-speed Ms/1s-VD. Also a linear scale solution can easily be found for the transition metric unit. However, such a linear scale solution cannot be found for the total survivor memory needed.

The size of the survivor memory of each 1s-VD is linearly
dependent on $M$ (the length of each rooted 1s trellis). Since $L N_z$ 1s-VD's are implemented, the total size of the survivor memory of the 1s-VD's is a linear function of $M L N_z$. A speedup of the transition rate $1/T$ by a factor $b$ leads to an increase in $L$ and $M$ by a factor $b$, see (1), (2). Therefore, the speedup results in an increase of required memory by $b^2$ which does not lead to a linear scale solution. However, one possible solution is given as follows. Since the 1s-VA is carried out only over a limited 1s trellis which consists of $M$ 1s transitions, the decisions of the 1s-ACS units can simply be stored in RAM's. Then, according to the decision of the Ms-VD, only the optimum path has to be decoded by reading the contents of the RAM (e.g., with the low rate $1/\theta$) and tracing back the path wanted. Thus, because the survivor memory can be implemented with RAM, its realization is not a bottleneck.

Another possible solution is not to implement a 1s survivor memory at all for the $N_z$ parallel 1s-VD's, but to store the 1s transition metrics in a RAM. Then, after the corresponding Ms transition has been decoded by the Ms-VD its beginning and ending states are known (coarse grain decoded). Therefore, a simple additional 1s-ACS structure can be implemented to decode the fine grain 1s transitions of the correct Ms transition (with the help of its stored 1s metrics). Note that the RAM space required here for the 1s transitions again depends on $ML$ and therefore increases by $b^2$.

As was pointed out in Section II the survivor only has to be implemented for a finite depth $B$. Since the Ms trellis always unites a set of $M$ 1s transitions to one Ms transition the survivor depth of the Ms-VD decreases when $M$ is increased. For $M > B$ it then takes on the minimum value of two M-steps. Thus the survivor memory of the Ms-VD does not lead to any implementation problems.

VI. CONCLUSIONS

The presented method of implementing the VA allows the use of hardware with a limited processing speed to achieve a very high throughput rate, i.e., rate of decoding desired. It is a linear scale solution.

The approach presented here is based on the principal idea of introducing two hierarchies of trellises. However, in general this can also be extended to additional hierarchies, see Fig. 10. Thus a whole variety of VD-systems can be developed. However, in most cases this leads to a larger hardware complexity.

Fig. 10. Schematic view of the hierarchical order of trellises.

The compare-select feedback procedure based on a finite state process is not limited to the VA. Generally, it is a well-known element of dynamic programming. Thus, the method described here for the special case of dynamic programming, the VA, may also be a solution or be of help in finding new high-speed implementations of related algorithms.

To show that the method described is of practical interest a view on our design is given here; we examine the VLSI implementation of a VD with the help of 1.5 µm CMOS standard and macrocell ASIC's [14]. One 6-bit ACS cell as a standard cell block takes up about 0.4 mm² chip area and operates at 20 MHz. For $N_z = 4$, a speedup by a factor of $\theta/T = 6$ to achieve 120 MHz baud rate requires $L = 6$. This leads to a number of 96 1s-ACS cells ($L N_z = 24$ units of $N_z = 4$ cells each), which yields a chip-area of approximately 40 mm² (shuffle exchange 1s-ACS unit). With the help of pipelining and interleaving the number of ACS units and thus the chip-area can be reduced (e.g., to half). Even when considering an on-chip overhead, this example clearly shows the practicability of the method described in this paper.

REFERENCES

[1] M. Bellanger et al., "TDM/FDM transmultiplexer: Digital polyphase and FFT," IEEE Trans. Commun., vol. COM-23, pp. 1199-1205, Sept. 1974.
[2] R. E. Bellman and S. E. Dreyfus, Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962.
[3] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, pp. 260-269, Apr. 1967.
[4] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, Mar. 1973.
[5] G. D. Forney, "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
[6] G. Ungerboeck, "Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems," IEEE Trans. Commun., vol. COM-22, pp. 624-636, May 1974.
[7] J. Snyder, "High speed Viterbi decoding of high rate codes," in Proc. 7th ICDSC, Phoenix, AZ, 1983, Conf. Rec. pp. XII16-XII23.
[8] P. G. Gulak and T. Kailath, "Locally connected VLSI architectures for the Viterbi algorithm," IEEE J. Select. Areas Commun., vol. SAC-6, pp. 527-537, 1988.
[9] G. Fettweis and H. Meyr, "A modular variable speed Viterbi decoding implementation for high data rates," Conf. Rec. EUSIPCO-88, Grenoble, France, Sept. 1988.
[10] G. Fettweis and H. Meyr, "Verfahren zur Ausführung des Viterbi-Algorithmus mit Hilfe parallelverarbeitender Strukturen," German Pat. pend. No. P37 21 884.0, July 1987.
[11] H. Burkhardt and L. C. Barbosa, "Contributions to the application of the Viterbi algorithm," IEEE Trans. Inform. Theory, vol. IT-31, pp. 626-634, Sept. 1985.
[12] D. J. Coggins, D. J. Skellern, and B. S. Vucetic, "A partitioning scheme based on state relabelling for an 8Mbps single chip Viterbi decoder," in Proc. 10th SITA, Enoshima Island, Japan, Nov. 1987, vol. 2, ED2-1, pp. 643-648.
[13] C. M. Rader, "Memory management in a Viterbi decoder," IEEE Trans. Commun., vol. COM-29, pp. 1399-1401, Sept. 1981.
[14] E. Hörbst, C. Müller-Schloer, and H. Schwärtzel, Design of VLSI Circuits Based on VENUS. New York: Springer-Verlag, 1987.
[15] K. K. Parhi and D. G. Messerschmitt, "Concurrent cellular VLSI adaptive filter architectures," IEEE Trans. Circuits Syst., vol. CAS-34, pp. 1141-1151, Oct. 1987.

Gerhard Fettweis (S'84) was born in Wilrijk (Antwerpen), Belgium, on March 16, 1962. He received the Dipl.-Ing. degree in electrical engineering from the Aachen University of Technology in 1986. During 1986 he was with the communications group of the Brown Boveri Corporation research center, Baden, Switzerland, to work on his diploma thesis. He is currently working towards the Ph.D. degree in electrical engineering at the Aachen University of Technology, West Germany.
His interests are in digital communications, especially the interaction between algorithm and architecture for high-speed parallel VLSI implementations.
Heinrich Meyr (M'75-SM'83-F'86) received the Dipl.-Ing. and Ph.D. degrees from the Swiss Federal Institute of Technology (ETH), Zurich, in 1967 and 1973, respectively.
From 1968 to 1970 he held research positions at Brown Boveri Corporation, Zurich, and the Swiss Federal Institute for Reactor Research. From 1970 to the summer of 1977 he was with Hasler Research Laboratory, Bern, Switzerland. His last position at Hasler was Manager of the Research Department. During 1974 he was a Visiting Assistant Professor with the Department of Electrical Engineering, University of Southern California, Los Angeles. Since the summer of 1977 he has been Professor of Electrical Engineering at the Aachen University of Technology (RWTH), Aachen, West Germany. His research focuses on synchronization, digital signal processing, and in particular, on algorithms and architectures suitable for VLSI implementation. In this area he is frequently in demand as a consultant to industrial concerns. He has published work in various fields and journals and holds over a dozen patents.
Dr. Meyr served as a Vice Chairman for the 1978 IEEE Zurich Seminar and as an International Chairman for the 1980 National Telecommunications Conference, Houston, TX. He served as Associate Editor for the IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING from 1982 to 1985, and as Vice President for International Affairs of the IEEE Communications Society.
