
VITERBI DECODER SYSTEM WITH TRACE-BACK IMPLEMENTATION

BACKGROUND OF THE INVENTION

The invention relates to a Viterbi decoder system as recited in the
preamble of Claim 1. Systems based on the Viterbi algorithm are in general usage for solving
various maximum likelihood estimation problems. Without restriction, the disclosure hereinafter
is directed to their use for decoding error protective convolutional codes in communication
systems. A general treatise of the Viterbi algorithm has been presented in G. D. Forney, The
Viterbi Algorithm, Proc. IEEE Vol 61 (1973) pp 268-278. The trace-back implementation of the
Viterbi decoding approach is used in particular if the truncation length L and the constraint
length K are relatively large, and if the required processing speed is relatively high. Now,
the decoding is executed on successive intervals that each have the decoding truncation length L,
and it will present the best path estimate in a reverse order as compared with the sequence of
receiving the encoded data. In prior art, the procedure will necessitate four random access
memories, each with a storage depth that equals the truncation length L, and which memories
will collectively occupy a large fraction of the overall decoding chip area, and will in
consequence bring about a relatively large part of the chip cost. It would be preferable to
diminish this area.
SUMMARY OF THE INVENTION

In consequence, amongst other things, it is an object of the
present invention to improve a system as recited through diminishing the storage requirements,
and which object has in particular been arrived at by arranging the system for updating of the
state pointers in a forward direction as received, by copying the state pointer of the surviving
state to the current state. Such state pointers for each associated current state indicate with
respect to an interval of predetermined length a preceding state to be reached through a trace-
back over the interval in question. Now therefore, according to one of its aspects the invention is
characterized according to the characterizing part of Claim 1.
Further advantageous aspects of the invention are recited in dependent Claims.
BRIEF DESCRIPTION OF THE DRAWING

These and further aspects and advantages of the
invention will be discussed more in detail hereinafter with reference to the disclosure of
preferred embodiments, and in particular with reference to the appended Figures and Tables that
show:
Figure 1, a top level architecture of a Viterbi decoder;
Figure 2, the merging of survivor paths;
Figure 3, a known Trace Back Algorithm for a particular code;
Figure 4, operations of the Trace Back Algorithm for the various memory modules;
Figure 5, a Radix-2 Multi-Butterfly embodiment;
Figure 6, updating of the State Pointers according to the present invention;
Figures 7A-7C, Pre Trace Back in combinatorial logic;
Figures 8A-8C, PTB in combinatorial logic, depth L/2;
Figure 9, an exemplary architecture of a Survivor Memory Unit;
Figure 10, the functional sequence for the prior art Trace Back operation;
Figure 11, the improved functional sequence without Pre Trace Back;
Figure 12, the further improved functional sequence of a Survivor Memory Unit with depth L/2.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Figure 1 shows a top level
architecture of a Viterbi decoder. The present invention may be applied in various different
contexts, but has in particular been conceived for a specific error protective convolutional code
for data communication, with a constraint length K, and with a rate R that is less than unity. The
code itself has been described in ETSI Digital Broadcasting Systems for Television, Sound and
Data Services, Frame Structure, Channel Coding, and Modulation for 11-12 GHz Satellite
Receivers, December 1994, ETS 300421. For decoding, the present invention uses a Viterbi
decoder algorithm with a trace-back implementation. Another implementation would have been
the so-called Register Exchange Algorithm, which for a great truncation length, and in
consequence, many states, would however require too much hardware.
In Figure 1, the stream of error-protected input data arrives on input 20, in the form of a bit
stream that has been subjected to a hard decision or soft decision procedure. The Branch Metric
Unit (BMU) 22 computes the distance of the received data on input 20 to the respective edges of
the trellis that spans the various states of the receiving system. The Add- Compare-Select (Add-
Compare-Select) unit 26 determines for each path the cumulative error or metric, and for each
trellis state the associated optimum path. As shown by retrocoupling means 28, this cumulation
is a progressive procedure that steps for each data symbol or data unit received. The output of
Add-Compare-Select 26 consists of the 2^(K-1) survivors (30) of the trellis at that instant. The embodiment
uses a decoding constraint length K=7 and in consequence, 64 individual states. Finally, the
Survivor Memory Unit (SMU) 32 uses these survivors to determine the decoded data. The
present invention deals with raising the area-efficiency of the SMU 32.
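For hard-decision inputs, a branch metric of the kind computed by BMU 22 is commonly the Hamming distance between the received bits and the code bits labelling a trellis edge. The sketch below is a generic illustration of such a metric; the function and variable names are ours, not taken from the disclosure:

```python
def branch_metric(received, expected):
    """Hard-decision branch metric: Hamming distance between the
    received code bits and the bits labelling a trellis edge."""
    return sum(r != e for r, e in zip(received, expected))

# For a rate-1/2 code each edge carries two bits; e.g. receiving (1, 0)
# against an edge labelled (1, 1) gives a branch metric of 1.
```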
The survivor bits that arrive from Add-Compare-Select 26 collectively form a survivor word and
represent pointers that link each respective state with its associated surviving predecessor state
from the previous stage or symbol cell. Searching back from a particular state will let the system
retrieve the best path. Note that each current state has its own respective best path. With a high
probability, these best paths will merge at a certain depth back in time. The depth that is actually
used for the maximum likelihood decoding is called the truncation length L; the value of L will
thus be chosen equal to or larger than this "certain depth". At depth L, all oldest survivors will
have merged, thereby allowing the system to decode the associated oldest bit. Figure 2 shows
this merging of survivor paths in a direction to the left. At the right-hand edge there is a column
of 64 possible paths (fewer shown for reasons of clarity) that at a depth of L_minimum stages to
the left will all have merged to a single path. As stated earlier and also shown, the length L
itself may extend further to the left.
Figure 3 represents a known Trace Back Algorithm that is based on 64 states, and in
consequence, uses a six-bit state pointer that points at one position along the survivor word. The
survivor word has been shown as a column consisting of 64 bits. Successive columns have been
formed from a sequence of L successive survivor words that had arrived from Add-Compare-
Select 26. The actual survivor word is written into a RAM that is part of SMU 32. When L
survivor words will have been received, an arbitrary state pointer such as 000 000 is assumed
and loaded in a register, upper row at right in the Figure. For simplicity, data and control path of
this register have not been shown in detail. Now, this pointer addresses a bit in the most recent
column of survivor bits. In the example, this bit has the value "1" as shown, and is shifted into the
most significant position of the pointer register that now will read 100 000 as shown, second row
at right in the Figure. The state pointer, with a digital value of 32, will now address a single bit in
the second survivor word column from the right, that has a "1" value. This bit will also shift into
the register, third row at right, giving the state pointer a value of 48. The procedure is continued
and will next lead to a state pointer of 011 000, and so on, until eventually attaining a depth of L
columns. At this truncation depth, the correct path will with a high probability have been
reached. This is the pre-trace-back procedure. The next bit obtained by accessing the memory at
left from this trace-back will therefore be a real decoded bit in the final trace-back procedure,
which is invariant under any change in the initial state pointer value.
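The pointer-register update just described can be sketched as follows; this is a minimal Python rendering of the Figure 3 procedure, in which the survivor bit addressed by the pointer is shifted into the most significant position of the pointer register. Function and variable names are ours, not from the disclosure:

```python
K = 7  # constraint length; the state pointer register holds K-1 = 6 bits

def trace_back(columns, start_state=0):
    """Trace back over a list of survivor-word columns, most recent
    first. Each column is a list of 2**(K-1) survivor bits. Returns
    the final state pointer."""
    pointer = start_state
    for column in columns:
        bit = column[pointer]                     # bit addressed by the pointer
        pointer = (bit << (K - 2)) | (pointer >> 1)  # shift bit into the MSB
    return pointer
```

Starting from pointer 000000 with survivor bits 1, 1 and 0 in the three most recent columns reproduces the example sequence 100000 (32), 110000 (48) and 011000 (24).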
This traceback over L stages should be done for every arriving symbol, which procedure has
conventionally been implemented through the four RAMs 40-46 symbolized in Figure 4. In each
sequence of intervals of L cycles each, a set of 64 survivor words from Add-Compare-Select 26
is written into one of the RAMs, which in the Figure has been symbolized for RAM 46. One
such interval after the writing, a pre-trace-back as described with reference to Figures 2 and 3 is
executed, which in the Figure has been symbolized with respect to RAM 44.
After one further such interval, the Pre Trace Back is complete, which in the Figure has been
symbolized with respect to RAM 42. During the fourth interval, the final state of the pointer is
used to repeat the traceback once more, which now will at each step produce a decoded output
bit, so that a real-time decoder will be realized. The latter in the Figure has been symbolized with
respect to RAM 40. Therefore, always one of the four partial RAMs will be idle in each interval.
In the example each interval represents L=144 clock cycles. The output of SMU 32 consists of L
bits that will be produced in reverse temporal order. The amount of RAM may be reduced
through using RAMs that can read and write in one cycle, and/or a larger number of smaller
RAMs. The latter however need more overhead area for address generation. The operation as
described has been illustrated in Figure 10.
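One plausible reading of the four-RAM rotation of Figure 4 is sketched below. The role names, and in particular the placement of the idle slot within the cycle, are our assumptions for illustration; the disclosure only fixes that the four roles rotate and that one RAM is always idle:

```python
# Assumed per-RAM lifecycle over successive intervals of L cycles each.
LIFECYCLE = ["write", "pre-trace-back", "idle", "final-trace-back"]

def ram_role(ram_index, interval):
    """Role of RAM number `ram_index` (0..3) during interval number
    `interval`, assuming RAM 0 is written during interval 0."""
    return LIFECYCLE[(interval - ram_index) % 4]
```

In every interval the four RAMs then cover the four roles exactly once, matching the statement that one of the four partial RAMs is always idle.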
Figure 5 shows the Radix-2 Multi-Butterfly structure that is used in the present embodiment.
In the Figure, the number of states has been contemplated as 8 only, with K=4. Each of the two
states i and (i + 2^(K-2)) at left is connected to states 2i and (2i+1) at right, modulo 2^(K-1).
example, state 0 at left is connected to states 0 and 1 at right, just as is state 4 at left. For
illustration, this single butterfly has been indicated in doubled lines. Such butterfly structure is
now used to indicate the interconnection pattern between the states, for conveniently updating
the state pointers. In fact, for each current state the state pointer grows incrementally whilst
pointing to a state that is 1, 2, ..., L stages back.
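The butterfly interconnection pattern can be expressed as simple index arithmetic. The helper functions below are an illustrative sketch (names ours); the default K=4 matches the eight-state example of Figure 5:

```python
def predecessors(state, K=4):
    """The two left-hand states feeding `state` in one butterfly:
    i and i + 2**(K-2), where i = state >> 1."""
    i = state >> 1
    return i, i + (1 << (K - 2))

def successors(state, K=4):
    """States i and i + 2**(K-2) both connect to states 2i and 2i+1,
    modulo 2**(K-1)."""
    n = 1 << (K - 1)
    i = state % (1 << (K - 2))
    return (2 * i) % n, (2 * i + 1) % n
```

For K=4 this reproduces the doubled-line butterfly of the Figure: states 0 and 4 at left both connect to states 0 and 1 at right.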
The improved schedule joins combinatorial logic with aspects of the earlier register-exchange
structure, that itself could not be used because of its great hardware requirements. The
combination however, can perform the Pre Trace Back operation in parallel with the writing. This
differs radically from the earlier setup, wherein the Pre Trace Back could only commence
after L columns had been written, due to the mutually inverse directions of the writing versus the
traceback. Therefore, the Final Trace Back can start a whole interval of L cycles earlier, and the
number of RAMs can be reduced from four to three. The scheduling has been shown in Table 2.
As an example, suppose that the write operation in RAM 1 has just finished and the writing into
RAM2 is started. At time instant 0 of the second time slot, for each of the trellis states, the
associated state is determined that would have resulted from tracing back over one branch, that
is, to the final column of RAM1. The new state pointer itself is calculated in the same manner as in
Figure 3, but now the calculation is effected for all 2^(K-1) state pointers in parallel. Next, each
of the 2^(K-1) states now has its associated state pointer stored in a respective register of K-1 bits.
The hardware for the embodiment therefore needs 64 of these registers. These registers have a
similar setup as those shown in Figure 3 at right. The above initialization step differs however
from the subsequent steps.
In the next time instant after the initialization, for each of the states the associated new survivor
bit is received. These survivors are used to update the various registers that store the associated
state pointers. For the states 2i and 2i+1, the system copies the state pointer from state i if the
survivor equals 0, and from state i + 2^(K-2) if the survivor equals 1. The new contents of the various
registers now point to the state that would result from a trace back over exactly two branches,
and therefore, again to the final column of RAM1. It should be clear that as long as the procedure
continues within the truncation length L, this pointing back distance will incrementally grow up
to L stages, and may be considered a continual resetting procedure.
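The copy rule stated above can be sketched as a single parallel update step; this is our Python rendering, with K=7 giving the embodiment's 64 pointer registers:

```python
def update_pointers(pointers, survivors, K=7):
    """One forward update: every state copies the state pointer of its
    surviving predecessor. pointers[s] points back to a state at the
    start of the current block; survivors[s] is the new survivor bit
    of state s."""
    half = 1 << (K - 2)                 # 2**(K-2)
    new = [0] * len(pointers)
    for s, bit in enumerate(survivors):
        i = s >> 1                      # states 2i and 2i+1 share a butterfly
        pred = i + half if bit else i   # surviving predecessor of state s
        new[s] = pointers[pred]
    return new
```

After each call, every register points one branch further back, so after L calls the registers hold the result of a Pre Trace Back over the full truncation length.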
Figure 6 shows accordingly an example of the updating of the State Pointers.
By way of a simplified example, Figure 6 uses a trellis with eight states only, and shows the
updating of the state pointers over only three successive clock cycles. Note that, as opposed to
the prior art procedure, the various state pointers will be updated in the forward direction.
Now, the first cycle shown is the initialization step, wherein the state pointers are calculated in
the same way as according to the state of the art. For clarity, only the updating of the uppermost
and lowermost butterflies, respectively, have been shown by indicating the connections between
the associated states. The solid lines indicate the survivor paths, whereas the dashed lines
indicate paths that will be discarded. During each subsequent clock cycle, the state pointers are
updated by copying the state pointer of the previous surviving state to the current state. In
contradistinction, the known Add-Compare-Select unit copies the path metric of the
previous surviving state to the current state.
Continuing accordingly, at each instant t for each of the states the system determines the
respectively associated state that would result from tracing back over t branches, so to the final
column of RAM 1. This is advantageously done by for each state copying the state pointer of its
surviving predecessor. After L clock cycles, RAM2 will be filled with new data, and we will also
have the result of a Pre-Trace-Back from each of the available states to the start of RAM1. As
can be seen from Figure 6, a particular state pointer value that has ceased to be a survivor for any of
the states will never reappear in the list.
Hence, the values of all state pointers will eventually merge into a single value when attaining
the truncation length. With high probability, the resulting state pointer will therefore be the same
for all states. Hence, we can start a Final Trace Back from the state in question. When the next
timeslot starts, all PTB values will be reset, and the processing is started for finding the starting
state for the Final Trace Back in RAM2. The above concept allows the memory facilities to be reduced
to only three RAMs of depth L, instead of four, without needing a complicated address
scheduling or the use of several smaller RAMs. The operation schedule has also been shown in
Figure 11. If the available RAM modules have been provided with a Read-Modify-Write facility,
the number of necessary modules may be lowered still more to only two: in this situation, the
Final Trace Back may be executed immediately while writing, since once the information has
been read, it will not be needed anymore thereafter.
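Detecting the merge that permits the Final Trace Back amounts to testing whether all 2^(K-1) pointer registers hold the same value; a trivial sketch of that test (our construction, not from the disclosure):

```python
def merged_state(pointers):
    """Return the common state pointer if all per-state Pre-Trace-Back
    registers agree (the merge expected at the truncation length),
    else None."""
    return pointers[0] if len(set(pointers)) == 1 else None
```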
Figures 7A-7C indicate the Pre Trace Back in combinatorial logic, for a logic depth of L. For
simplicity, an organization with a relatively reduced number of state pointers has been shown,
the actual value of this number being less relevant for the qualitative illustration. At time instant 0,
the Pre-Trace Back starts from the left-hand edge of RAM2 and results in a rather large number
of result paths of which a single pair has been shown as actually merging. At an intermediate
time instant t, various ones of the paths have merged already, which has been symbolized by the
actual merging inside the RAM in question, so that at the left hand edge of RAM2 only four of
these resulting paths are still present, of which one pair has been shown to actually merge at the
right hand side of RAM 1. At time instant L, all paths have merged to a single path.
Figures 8A-8C show a Pre-Trace-Back in combinatorial logic amended for the usage of RAM
modules that have a depth of only L/2, instead of operating on blocks of depth L. We may in this
manner even further reduce the amount of memory and silicon area.
Inasmuch as the truncation length should remain equal to L, we need to store the results of a Pre
Trace Back of depth L/2 until the Pre Trace Back over the next L/2 columns will have finished.
Then we may find the starting state for the Final Trace Back as a concatenation of the two Pre
Trace Back operations. This has been shown in Figures 8A-8C, that distinguish the span of the
Pre Trace Back over a full L columns, combined with the modularization of the RAM in
modules of L/2 columns each. In the Figures, the uppermost representation corresponds to that of
Figure 7A. Similarly, the other two representations also have their respective parallels in Figures
7B, 7C, respectively. The heavy vertical lines indicate that the process has actually arrived at the
edge of the RAM module in question.
As can be seen, when starting from an arbitrary state, the correct path at a depth of L is still
found. We now need four memories of depth L/2 each, of which only two are active in each
respective time slot. This fact may be used by combining two of those memories into a single
memory of depth L, as shown in Figure 12, infra. In this Figure, RAM0_1 and RAM0_2 together
constitute a single RAM of depth L, and similarly for RAM1_1 and RAM1_2. Upon arriving at a
depth of L/2, the activity states of the modules are exchanged between reading and writing.
In principle, it would be possible to further reduce the amount of memory, by modularizing the
memory down to modules of depth L/4, which would yield an overall memory requirement of
1.5L. However, inasmuch as we would now need four columns of Pre-Trace-Back results, the
increase in logic complexity could outweigh the reduction in memory storage. The presently
preferred embodiment with four memories of a logic depth of L/2 each has been found most
effective for the DVB system with a 64 state trellis and a truncation length of L=144. It is
recognized that systems with other values for the number of states and the truncation length
could lead to another trade-off optimization. The division into smaller memory modules than L/2
appears less advantageous however, due to the fast rise in logic cost, as compared to the
diminishing returns on memory decrease.
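The memory totals quoted above (3L for full-depth modules under the improved schedule, 2L for modules of depth L/2, 1.5L for depth L/4) are consistent with needing d+2 modules of depth L/d. The formula below is our inference from those three data points, not something stated in the disclosure:

```python
def total_memory(L, d):
    """Assumed total survivor-memory depth when the RAM is split into
    modules of depth L/d: d+2 modules are needed (inferred from the
    d=1, d=2 and d=4 figures quoted in the text)."""
    modules = d + 2
    return modules * (L / d)
```

For the DVB embodiment with L=144 this gives 432 (=3L) for d=1, 288 (=2L) for d=2, and 216 (=1.5L) for d=4, illustrating the diminishing memory returns against rising logic cost.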
Figure 9 shows an architecture embodiment of a Survivor Memory Unit. The Pre-Trace-Back
Unit 56 receives the survivors and contains the recursive computations of the starting state for
the Final Trace Back. This unit comprises a first set of 2^(K-1) registers of K-1 bits each that are
updated in each clock cycle through a first control signal "update bank", and a second set of 2^(K-1)
registers of K-1 bits each that are updated in each clock cycle through a second control
signal "storage bank" to store the signals of the previous PTB operation. Each time the "new
PTB" signal from the SMU control unit 54 goes high, the results of the current PTB operation are
shifted to the storage bank, and a new PTB operation is started in the update bank. The result of
the concatenation of two PTB results is sent to the control unit 54 as representing the starting state for
the Final Trace Back. Furthermore, the control unit 54 decides which one of the two RAMs
50, 52 will be read and which one will be written, through the R/W control signal pair. The
memory addresses are calculated as starting from the state (FTB) that is being received from the
PTB-unit 56, as a starting point for again going backwards as discussed with reference to Figure
3 supra. Finally, the decoded data is run through a LIFO memory arrangement, not shown for
clarity, and output on output 60 for further usage. The timing in the SMU-control unit takes
into account that one clock cycle is necessary for the sending of the address to memory, and the
reception of the associated read data.
The new algorithm as presented is useful to reduce the overall chip area that is necessary for
implementing the survivor memory of a Viterbi decoder. The algorithm performs the Pre-Trace-
Back operation in combinatorial logic, and in parallel with the writing of the survivors. In this
manner, the results of the Pre-Trace-Back will be available one time slot earlier than before, and
hence the Final Trace Back can also start one time slot earlier.
This decreases both the memory size for data storage, and also the overall latency of the decoder.
Depending on the number of states of the trellis and the necessary truncation length, a trade-off
may be made between the usage of RAM, and the increased amount of combinatorial logic that
is necessary for using smaller RAMs.
On the one hand, RAM circuitry is denser than combinatorial logic, but on the other hand,
logic is easier to route during the layout process. The use of relatively more combinatorial
logic will therefore allow a designer more flexibility in the positioning of the various functional
blocks. In the preferred DVB embodiment, a size reduction of the SMU unit of about 25% has
been obtained. Inasmuch as the SMU unit often consumes the larger part of the decoder overall
area, this represents an appreciable gain.
Furthermore, using the improved algorithm decreases the temporal latency of the overall
decoder, which feature will often allow circuit complexity to be decreased in other parts of the
decoder. For example, the LIFO store that is needed for reordering the decoded bits may be
reduced in size by 50%, just as the storage of the hard decision bits used for a BER
estimation.
Finally, in a system with a trellis-coded modulation (TCM), it is often necessary to delay or
transiently store certain unencoded bits during the time that the decoder is still busy with
determining the results for the encoded bits. Therefore, a decoder system with a decreased
latency will present smaller memory requirements for the associated delay line.
Furthermore, the synchronizing of the next block in the communication system may start earlier
along with the decreasing of the Viterbi decoder latency.
