
An FPGA Implementation of a Soft-in Soft-out Decoder for Block Codes

Abdul-Rafeeq Abdul-Shakoor, Ron Kerr, John Lodge, and Valek Szwarc


Communications Research Centre Canada
3701 Carling Ave., Box 11490, Station H
Ottawa, Ontario, K2H 8S2, Canada.

ABSTRACT

This paper presents an FPGA implementation of the Vector SISO algorithm for the (64, 57) extended Hamming code (EH) and the (64, 51) extended Bose, Chaudhuri, and Hocquenghem code (EBCH). The decoder architecture is defined in VHDL and the circuit is implemented on a Xilinx XC2VP100-1704ff–5 FPGA device. To achieve the required throughput, a pipelined data path architecture operating off a master clock was selected. To reduce gate count, the dynamic range of intermediate results was limited through use of saturation arithmetic. The decoder functionality was verified by means of a test bench that compared the decoded bit stream with error-free transmitted signals. SISO decoder design choices that impact the bit error rate (BER) are also presented.

Keywords: Vector SISO decoder, BCH and EH codes.

1. INTRODUCTION

The Bose, Chaudhuri, and Hocquenghem (BCH) class of linear block codes has been effectively used to correct for channel-induced errors in wireless communications. Throughput, power, and gate count considerations dictate that, for efficient realization of decoders, both algorithm and architecture need to be optimized to meet design and performance requirements. Recently a number of soft-decision decoding algorithms [1-7] have been proposed for decoding linear block codes. The optimal soft-in soft-out (SISO) decoder is based upon maximum a posteriori probability (APP). A suboptimal SISO approach that is numerically less complex is the max-log-APP algorithm [8]. However, full max-log-APP decoding is computationally too intensive to be practical for many block codes. Here, we present a design and implementation of an algorithm for linear block code decoding [4] that is closely related to the max-log-APP algorithm but of lower realization complexity. The application of the algorithm can be extended to the decoding of more powerful composite code structures such as Turbo Product Codes (TPC), where the overall code structure is decoded in an iterative manner based on SISO decoding of the individual constituent codes.

2. BACKGROUND

The Vector SISO algorithm has been applied to a number of linear block codes, both binary [4, 10] and non-binary [9]. Here, an FPGA implementation of a binary version is presented. For the sake of completeness, some of the details of the algorithm provided in [4] are repeated here.

Here we consider general linear codes for which codewords can be generated from c = iG, where i is a row vector of k information bits, c is a row n-vector of coded bits, and G is a k x n generator matrix. A parity check matrix, H, is any (n-k) x n matrix of rank (n-k) for which GH^T = 0. A parity check matrix is referred to as being in pseudo-systematic form relative to a set of n-k bit positions if, by moving the corresponding n-k columns and appropriately ordering them, the matrix can be put into the form H = [I H'], where I denotes the identity matrix. If we are given bit values for the k positions corresponding to the non-identity portion of the pseudo-systematic parity check matrix, it is easy to form the entire codeword by selecting the remaining n-k bits to satisfy the parity constraints. We refer to forming codewords in this manner as "recoding".

The Vector SISO algorithm uses a reliability vector based on the best decision bits, vbest, and the soft input values to the decoder, y. The vector y is the input to the decoder, where the sign of each value represents the logical value and the magnitude is assumed to be proportional to the reliability of the sample. Here, decision bits are bipolar values +1 and -1 representing binary '0' and '1' respectively. The reliability vector is given as the element-by-element product of the two vectors, vbest and y. In this implementation, we have taken the best decision vector at the input to be the sign of the soft-input values y. Given this assumption, the reliability vector can be computed by taking the absolute value of the elements of y.

The next step is to modify the current parity check matrix to a pseudo-systematic form based on the n-k least reliable bit positions. In doing so, we divide the bit positions into two classes, most-reliable and least-reliable, similar to [11]. This method ensures that the least reliable bit (lrb) positions are linearly independent. This is accomplished by row reduction of the parity check matrix. The algorithm selects the first parity check equation from the parity check matrix. In this equation, the algorithm finds the bit position with the minimum value in the reliability vector among the bits involved in the equation. This corresponds to the position of the least reliable bit for this equation. The algorithm then carries out row reduction to remove the effect of this position from any of the other n-k-1 parity equations in the parity check matrix. The algorithm then
moves to the next unprocessed equation and repeats the process of finding the least reliable bit position and removing it from the other equations. This process continues until all n-k equations have been processed. This step is referred to as the row reduction step and, after it is finished, the parity check matrix is in pseudo-systematic form relative to the n-k lrb positions.

The parity check matrix is used to form extrinsic information for the most-reliable bits (mrb). An approximation for the extrinsic information, for a binary single parity check equation, consists of the minimum magnitude within the equation with the sign that satisfies the parity equation [12]. Extrinsic information for an mrb in the jth position of the codeword is the sum of the extrinsic values from all of the parity equations in which it is involved, i.e.

$$e_j = \sum_{m} d'_{j,m}\, y_m \qquad (1)$$

where e_j is the extrinsic value for the jth position in the codeword, d'_{j,m} is the decision for the jth bit from equation m that satisfies the parity equation constraints, and y_m is the minimum value found for equation m.

The soft-input information is combined with the extrinsic information for the mrb positions. A decision vector is based on the signs of the composite information vector. The k decisions on the mrb positions are used to recode the lrb positions.

At this point the current decision bits on the mrb and the recoded lrb bits form a new best decision vector based on the code information and the soft-input values.

The new decision bits and values from the composite vector are used to create the soft-output information for the lrb positions. Similar to the computation of the mrb extrinsic, the lrb extrinsic has the sign that satisfies the parity equation and a magnitude equal to the minimum magnitude of the composite information with the contribution of the lrb value subtracted. In this way, the soft-output values of the lrb take advantage of the extrinsic and soft-input values of the mrb positions.

3. SISO ALGORITHM

i) Form an initial "reliability" vector by taking the absolute values of the elements of y.

ii) Select the n-k linearly independent bit positions that are the least reliable in the sense that they correspond to the smallest elements in the reliability vector. We will refer to these as "least reliable bits" (lrb). Similarly, the remaining k bit positions are referred to as "most reliable bits" (mrb).

iii) Using row reduction techniques, put H into pseudo-systematic form such that the identity portion of the parity check matrix corresponds to the lrb. Let d' denote the vector of best decisions found to the given point in the algorithm, where "best" means the code word with the best metric. Initially, d' will be the code word obtained by performing hard decisions on y for the mrb and then, using the pseudo-systematic parity check matrix, selecting the lrb values to satisfy the parity constraints.

iv) Compute the extrinsic information for the mrb. For the ith mrb, a reasonable approximation to the extrinsic information is to take the metric difference between the best decision code word, d', and the recoding solution with all of the mrb remaining the same except for the ith position, which is flipped, resulting in code word d_j'. Thus, the extrinsic information from Equation (1) becomes

$$\widehat{llr}^{\,e}_{i} = d'_i \sum_{\substack{m \neq i \\ d'_m \neq d'_{m,j}}} d'_m\, y_m , \qquad (2)$$

where each of the indices in the sum, m, corresponds to an lrb and multiplication by d'_i accounts for the possibility that the ith bit may not be a +1 in the vector of best decisions.

v) Compute the mrb metric differences by summing the extrinsic information and the intrinsic information, and compare with the signs of the mrb in the current best code word. If any of the signs differ, then a better code word than the current best code word has been found. Consequently, there is a new best code word. If the signs differ in several locations, the location with the highest magnitude composite value (intrinsic plus extrinsic) is the one considered (as flipping this bit position will make the biggest improvement in the code word metric). If there are no sign differences, go to STEP (vii). Otherwise, form the new best code word by first changing the sign of the appropriate mrb and then computing the lrb using recoding.

vi) Form the new reliability vector by performing element-by-element multiplication between the new best decision vector and y. Note that the reliability vector is now a signed vector with the "smallest" elements being negative (i.e., where the intrinsic information and the "best" decision differ in sign). Go back to STEP (ii).

vii) Compute the soft outputs for the lrbs. Recall that each row in the pseudo-systematic parity check matrix represents a parity equation that includes only a single lrb, with the remainder of the entries being mrbs. Furthermore, the magnitude of the composite information (intrinsic plus extrinsic) for each of the mrbs represents the difference between the metrics for the best code word and the recoded code word with the given mrb flipped. Each of these mrb composite information values is a candidate for the composite information value (with the appropriate sign adjustment) for the lrb, since the lrb will always be among the bits whose sign differs between the pair of code words involved in the metric difference. The best choice is the one with the smallest magnitude because this is the one with the best pair of metrics (one of the pair is always the metric for the best path d').

Computational procedures have been developed for all of the above steps which translate into very efficient architectures or software realizations for vector processing. As will be demonstrated by the examples, the basic algorithm described above performs very well as a SISO decoder for the constituent codes in a composite turbo-like
code such as a turbo product code [3]. It also performs well as a soft-in decoder for relatively small distance block codes such as extended Hamming codes and low-distance BCH codes. However, its performance is quite sub-optimal for fairly powerful codes. Some additional processing can be used to improve the performance for such codes. Here we describe one ad hoc approach that shows promise.

For codes with a large minimum distance, it is possible to test d' to determine with fairly high reliability whether or not it is the correct code word [4]. This can be done by comparing the normalized metric (i.e., the inner product of the code word and the received vector divided by the sum of the absolute values of the received vector) to a threshold value. If the threshold is exceeded, then we accept d' as the decoded code word. If not, then we do some additional processing to find a better code word. Given that the basic algorithm only changes the sign of one mrb at a time, a possible approach is to change the signs of groups of 2 or more of the mrbs, perform recoding, and compute the corresponding metrics. However, the number of possible combinations grows very quickly for groupings of greater than 2 bits. Here we take a different approach. If the best code word found by the basic algorithm is not accepted, then we subtract a small scaled version of it from the received vector in order to bias the results away from the already found solution and repeat the basic algorithm. This procedure can be repeated until a code word is found that passes the threshold test or until the maximum number of allowed attempts is reached.

4. DECODER ARCHITECTURE AND ITS IMPLEMENTATION

Based on the vector SISO decoding algorithm of Sect. 3, hardware decoders for the single-error-correcting EH (64, 57) and double-error-correcting EBCH (64, 51) linear codes are presented here. The decoders process 64-symbol input blocks simultaneously and, with a latency of 5 clock cycles, output a decoded 64-symbol block every clock cycle. The decoder architecture depicted in Fig. 1 consists of three processing modules identified as the Row Reduction Block (RRB), Most Reliable Bit Block (MRB), and Least Reliable Bit Block (LRB). In terms of the algorithmic description in Sect. 3, the RRB, MRB, and LRB blocks execute steps (i-iii), (iv-v), and (vi-vii) respectively.

[Fig. 1: SISO decoder block diagram. Labels in the diagram: soft inputs, RRB, MRB, LRB, soft outputs, buses B0, B1, B2, B3, and clock.]

[Fig. 2: Row Reduction Block (i, ii, iii refer to SISO algorithm steps in Sect. 3). Flowchart boxes, top to bottom: soft inputs; y => N symbol register; |y| => N symbol buffer register (i); parity check (H) matrix row 1 to row (N-K) (ii); index register (ii); minimum search algorithm (ii); row reduced new H matrix (iii); RRB(out). Side label: "Repeated (N-K) times."]

To achieve high throughput, the data path architecture was fine-tuned to minimize the delay variations in the pipelined stages. To reduce the gate count, the dynamic range of the intermediate results was limited through the use of saturation arithmetic [13]. Thus data corresponding to 64 symbols in a block are simultaneously operated on and routed between the blocks depicted in Fig. 1. The soft inputs to the decoder's RRB section at the B0 bus consist of 8 bits per symbol for both decoder realizations. The RRB hardware shown in Fig. 2 operates on the parity check (H) matrix and, in the process, the soft inputs are stored in a single register with 8 bits per symbol. Since only positive numbers or symbol magnitudes need to be forwarded to the MRB block, the most significant bits (MSB) are truncated and output bus B1 consists of only 7 bits per symbol, resulting in area savings.

Extrinsic computation in the MRB block depicted in Fig. 3 involves further operation on the updated H matrix to compute the most reliable bits. The LRB block in Fig. 4 computes least reliable bits from the received vector. The MRB intermediate values increase to 10 bits per symbol in conformance with the Matlab simulation results in Sect. 5 [4]. Hence buses B2 and B3 consist of 10 bits per symbol. The (N-K) rows of the N-column H matrix are referred to as the parity check equations in the current algorithm implementation. The received N-symbol codeword processed along with the H matrix is also referred to as a soft input. From the soft inputs and the 1st parity equation (i.e., the 1st row of the H matrix), a most reliable bit is generated, the most reliable bit position is found, and the parity check matrix is updated with this information. Likewise, all the (N-K) rows of the parity check matrix are processed and a new matrix is generated and output by the row-reduction block. As depicted in Fig. 2, the RRB block takes 2 clock cycles to process the data and generate the pseudo-systematic matrix by way of row reduction. The circuit block diagrams in Figs. 3 and 4 depict the computation of the most and least reliable bits respectively from the received vector. Each computation is
completed in 2 clock cycles and the soft-output results are provided at the output of the LRB block.

[Fig. 3: Most Reliable Bit Block (iv and v refer to SISO algorithm steps in Sect. 3). Flowchart boxes, top to bottom: RRB(out); updated H matrix from RR block; compute new MRB indices for each row (iv); compute signs of soft inputs (iv) and compute product of signs (iv); compute error arrays for each row producing a single row of extrinsic matrix (v); total extrinsic = soft inputs + error matrix (v); MRB(out). Side label: "Repeated (N-K) times."]

[Fig. 4: Least Reliable Bit Block (vi, vii refer to SISO algorithm steps in Sect. 3). Flowchart boxes, top to bottom: soft inputs y and MRB(out); soft inputs => N symbol register; for each soft input < 0, index register stores -1, otherwise stores +1 (vi); composite vector = soft inputs + MRB(out) (vi); threshold vector (Vth) = composite vector - first row of MRB(out) extrinsic matrix (vii); take absolute of Vth (vii); product of signs of Vth (vii); compute the LRB extrinsic (vii); decoded bits; LRB(out). Side label: "Repeated (N-K) times."]

5. MATLAB AND HARDWARE SIMULATION RESULTS

As one of the decoder functions is to act as a component decoder for a TPC application, the input values were quantized to 8-bit word size. The word size was chosen to allow for growth of input values when used in an iterative decoder. In such a decoder, the soft input can be the combination of reliability information derived from the received channel samples and the soft outputs from previous decoding operations. The values tend to grow with multiple iterations, so an 8-bit word size for the input was selected. The intermediate values within the decoder were limited to 10-bit word size through use of saturation arithmetic to prevent overflow. If a value in the decoder saturates at the maximum value, it signifies that the corresponding bit is reliable, as it has a large magnitude. In a TPC application, a normalization routine is necessary to ensure that the values remain within a range that prevents too many values from reaching saturation. Such a normalization routine was indeed implemented in the hardware.

To verify the performance of the EH (64,57) and EBCH (64,51) decoders with additive white Gaussian noise, the algorithm with 8-bit word input and 10-bit internal word size with saturation arithmetic was implemented and simulated in Matlab. A nominal value of one from the channel was scaled to a quantized level of 64. In Figs. 5 and 6, the codeword error rate versus the signal-to-noise ratio Eb/N0 is shown for binary phase shift keying for the two codes, along with an ideal hard decision decoder for comparison. The ideal decoders correct either 1 or 2 errors for EH (64,57) or EBCH (64,51) respectively. As seen in the figures, for a codeword error rate of 10^-2, gains of 1 dB and 0.8 dB are provided for the EH (64,57) and EBCH (64,51) codes respectively. The hardware implementation results in Table 1 and simulation results in Figs. 5 and 6 show that code performance, area, and throughput are a function of the code configuration, namely the values of N and K, and consequently register sizes and extent of parallel processing. The throughput results show that the two decoders process the 64-symbol data blocks at master clock rates of 20.4 and 15.3 MHz.

The Table 1 results show that the EBCH (64,51) decoder requires 25% more CLB slices than the EH (64,57) decoder, and this corresponds to the difference of their respective H matrices, with 13 rows for the former and 7 for the latter.

[Fig. 5: Codeword error rate for EH (64,57) on the AWGN channel. An ideal 1-error correcting hard decision decoder is shown for comparison. Axes: codeword error rate (10^0 down to 10^-4) versus Eb/N0 (dB), 0 to 8. Curves: ideal 1-error hard decision decoder; Vector SISO, 8 bit input, 10 bit internal.]
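
The word-size choices described in Sect. 5 (a nominal channel value of one scaled to 64, 8-bit inputs, 10-bit saturating intermediate values) can be illustrated with a minimal Python sketch. The function names, the use of signed two's-complement ranges, and round-to-nearest quantization are assumptions made for illustration; the paper does not state which rounding rule or number format the Matlab model or the hardware uses.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the signed two's-complement range of `bits` bits.

    Assumption: a signed two's-complement format (e.g. -128..127 for 8 bits).
    """
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def quantize_input(sample: float, scale: int = 64, bits: int = 8) -> int:
    """Map a channel sample to an integer so that 1.0 becomes `scale`.

    Assumption: round to nearest, then saturate to the input word size.
    """
    return saturate(int(round(sample * scale)), bits)


def sat_add(a: int, b: int, bits: int = 10) -> int:
    """Add two intermediate values and saturate to the internal word size."""
    return saturate(a + b, bits)


if __name__ == "__main__":
    # A nominal +1 sample lands at level 64; a strong sample saturates.
    print(quantize_input(1.0), quantize_input(3.0))
    # Intrinsic plus extrinsic values that would overflow 10 bits are clamped.
    print(sat_add(500, 100))
```

With these assumptions, a saturated value simply marks a very reliable bit, which matches the interpretation given in the text; a normalization step (not sketched here) would keep too many values from reaching the clamp in an iterative TPC decoder.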
[Fig. 6: Codeword error rate for EBCH (64,51) on the AWGN channel. An ideal 2-error correcting hard decision decoder is shown for comparison. Axes: codeword error rate (10^0 down to 10^-4) versus Eb/N0 (dB), 0 to 8. Curves: ideal 2-error hard decision decoder; Vector SISO, 8 bit input, 10 bit internal.]

Table 1: Hardware implementation and simulation results for (64, 57) and (64, 51) codes.

Code (N,K)      CLB Slices    Clock Speed
EH (64,57)      12,200        20.4 MHz
EBCH (64,51)    15,266        15.3 MHz

The number of parity check equations (or rows in H matrix) directly affects the hardware requirements of the vector SISO decoder. The differences in parity check matrix dimensionality also impacts on the master clock frequency and decoder throughput as confirmed by results in Table 1. Design efforts to lower the gate count by reducing the intermediate word sizes showed that this would result in performance degradation.

The implementations presented here are based on the Xilinx XC2VP100-1704ff–5 FPGA device. The decoder implementations incorporated a self-test circuit consisting of a pseudo-random sequence generator, a comparator, delay circuits, and multiplexers to route the signals. In normal operation, the SISO decoder operates independently. The self-test circuit enables on-chip at-speed verification of the decoders.

6. SUMMARY

The SISO decoder implementation has resulted in a high throughput circuit design. The VHDL code has been tested and verified by comparing its results with that of the Matlab based function models presented in [4]. The decoder architecture is applicable to a wide range of linear block codes such as Hamming codes, BCH codes, and binary images of linear codes defined over GF(2^m). Furthermore, the decoder is sufficiently flexible so that it can be used in a TPC decoder to provide improved bandwidth efficiency at relatively low implementation cost [14].

REFERENCES

[1] F. Therattil and A. Thangaraj, "A Low Complexity Soft-decision Decoder for Extended BCH and RS-like Codes," Proc. IEEE Intl. Symposium on Info. Theory, vol. v, pp. 1320-1324, Sept. 2005.
[2] M.P.C. Fossorier and S. Lin, "Soft-decision Decoding of Linear Block Codes Based on Ordered Statistics," IEEE Trans. Info. Theory, vol. 41, no. 5, pp. 1379-1396, Sept. 1995.
[3] E. Piriou, C. Jego, P. Adde, and M. Jezequel, "A Flexible Architecture for Block Turbo Decoders Using BCH or Reed-Solomon Components Codes," Proc. of IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architecture, vol. v, pp. 430-431, March 2006.
[4] J. Lodge and R. Kerr, "Vector Soft-in-soft-out Decoding of Linear Block Codes," Proc. 22nd Biennial Symposium on Comm., Kingston, Ontario, pp. 373-375, May 2004.
[5] M. Zwolinski and J.S. Reeve, "Behavioral Synthesis of an Adaptive Viterbi Decoder," Proc. IEE Signal Processing Professional Network and EURASIP, pp. 1-4, Sept. 2005.
[6] E.H. Lu, Y.C. Cheng and P.C. Lu, "Fast Decoder for Triple-error-correcting Primitive Binary BCH Codes with Odd m," IEE Proc. Comm., vol. 145, no. 2, pp. 60-64, April 1998.
[7] Efficient Channel Coding, Inc., Product Brief, "Advanced Turbo Product Code - Intellectual Property Core," Ohio, USA.
[8] P. Robertson and P. Hoeher, "Optimal and Sub-optimal Maximum A Posteriori Algorithms Suitable for Turbo Decoding," European Trans. Telecom., pp. 119-125, Mar. 1997.
[9] R. Kerr and J. Lodge, "Vector Soft-in-soft-out Decoding Applied to Non-binary Linear Block Codes," Proc. 22nd Biennial Symp. on Comm., Kingston, Ontario, pp. 376-378, May 2004.
[10] R. Kerr and J. Lodge, "Near ML Performance for Linear Block Codes Using an Iterative Vector SISO Decoder," 4th Intl. Symp. on Turbo Codes & Related Topics, Munich, Germany, April 2006.
[11] M.P.C. Fossorier and S. Lin, "Soft-decision Decoding of Linear Block Codes Based on Ordered Statistics," IEEE Trans. Inf. Theory, vol. 41, no. 5, pp. 1379-1396, Sept. 1995.
[12] G. Battail, "Building Long Codes by Combinations of Simple Ones, Thanks to Weighted-output Decoding," in Proc. URSI ISSSE (Erlangen, Germany), pp. 634-637, Sept. 1989.
[13] B. Shim and J. C. Suh, "Pipelined VLSI Architecture of the Viterbi Decoder for IMT-2000," Global Telecommunications Conference GLOBECOM, vol. 1a, pp. 158-162, 1999.
[14] D. G. Williams, "Turbo Product Codes and Their Bandwidth Efficiency," Advanced Hardware Architecture Inc., pp. 6/1-6/7, Nov. 1999.
