
Kwon and Shin: An Area-Efficient VLSI Architecture of a Reed-Solomon DecoderEncoder for Digital VCRs

1019

AN AREA-EFFICIENT VLSI ARCHITECTURE OF A REED-SOLOMON DECODER/ENCODER FOR DIGITAL VCRs


Sunghoon Kwon and Hyunchul Shin Dept. of Electronics Engineering, Hanyang University, Korea

Abstract: A new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder has been developed for digital VCRs. The new architecture, which targets reduced circuit size and decoding latency, has the following three features. First, high area-efficiency has been achieved by sharing a functional block for encoding, modified syndrome computation, and erasure locator polynomial evaluation. Second, circuit size and decoding latency have been reduced by using a new architecture to implement the modified Euclid's algorithm. Third, by doubling the internal clock speed (from 18 MHz to 36 MHz), the decoding latency and hence the memory size can be reduced. The decoder/encoder designed by using the proposed method uses about 30% fewer gates than the one based on the conventional architectures.

I. Introduction
The Reed-Solomon (RS) coding is one of the most powerful and standardized techniques for error and erasure correction. Owing to its excellent capability for correcting burst errors, it has been widely used for digital communication systems and storage devices such as digital VCRs and disk drives [1]. In particular, for error correction coding (ECC), the digital VCRs employ RS product codes. RS codes over finite fields are maximum distance separable. The properties and theoretical analysis of RS codes are well-known [2]. However, the design of a high-bit-rate RS decoder is not straightforward. The complexity of an RS decoder/encoder circuit is dependent on the data rate, the code length, and the error correction capability. Therefore, the algorithm and the architecture should be customized for each specific application to achieve high efficiency.

RS decoding techniques can be classified into two categories (time-domain and frequency-domain). Time-domain techniques seem to outperform frequency-domain techniques both in area and in delay [3]. Most of the published time-domain techniques use one of the following algorithms to evaluate error/erasure locator and evaluator polynomials.
* Berlekamp-Massey algorithm: [4]
* (Modified) Euclid's algorithm: [5, 6, 7]
* Matrix calculation: [8]

A VLSI design of a pipeline RS decoder using a systolic array was also presented in [5]. By the use of a multiplexing and recursive technique, the modified Euclid's algorithm was implemented with a significant reduction of cells. The RS decoder in [7] can decode three types of codes (inner/outer/subcode) flexibly. One of the features of this design is to share the encoder and the syndrome computation block to reduce hardware size. In [9], it is shown that any RS decoder which corrects both errors and erasures can also be used as an encoder for the RS code. However, power consumption can increase because all of the blocks of the decoder circuit are driven during encoding.

To satisfy the decoding/encoding requirements for digital VCRs, we have developed a new flexible and area-efficient RS product-code decoder/encoder architecture. Our decoder/encoder can decode and encode three types of codes for audio and video signals over GF(256). The three main features of the proposed decoder/encoder are (1) sharing a hardware functional block to evaluate three different functions, (2) developing a new architecture for the modified Euclid's algorithm, and (3) doubling the internal clock speed to reduce the latency. The proposed RS decoder/encoder has been implemented by using about 30% fewer gates than the one based on conventional architectures.

In Section II, we describe the overview of the proposed RS decoder/encoder. The flexible decoder/encoder architecture is presented in Section III. The experimental results are described in Section IV. Finally, the conclusions are given in Section V.

Contributed Paper. Manuscript received June 9, 1997.
0098 3063/97 $10.00 1997 IEEE
IEEE Transactions on Consumer Electronics, Vol. 43, No. 4, NOVEMBER 1997

II. New Features of the Decoder/Encoder


The new flexible RS decoder circuit based on the modified Euclid's algorithm can correct errors and erasures for product codes (two-dimensional array codes). The decoding is performed in two steps, inner (row) and outer (column) decoding [10]. The inner decoder performs error correction and error detection. The inner code protects individual codes, primarily against random errors. The same code, RS (85, 77), is used for both the audio and video signals. If a decoder failure is declared by the inner decoder, then all symbols in the corresponding row are erased. The outer decoder works essentially as an error and erasure corrector. The outer RS code protects a complete video or audio sector, primarily against burst errors spanning a few codes. If the outer decoder cannot correct the errors and erasures, a decoding failure occurs and error concealment is applied.

Our decoder/encoder is flexible in that it can decode the following three types of codes over GF(256) [11]:
* (85, 77) audio/video inner code
* (149, 138) video outer code
* (14, 9) audio outer code

The maximum numbers of errors which can be corrected are 4, 5, and 2, respectively, for the three codes. For the outer codes, the maximum numbers of erasures which can be corrected are 11 and 5, respectively, for the (149, 138) and (14, 9) codes. In the proposed architecture, the (85, 77) and (14, 9) codes are also encoded and decoded by using the same (149, 138) code processing block. Suppose e errors and f erasures occur. Then, in general, the following condition, which expresses the maximum correction capability, should be satisfied in (n, k) RS codes.

$$ 2e + f \le n - k \tag{1} $$

where n is the code length and k is the information length. Furthermore, 2t = n - k and t is the error correction capability.

The proposed RS decoder/encoder is shown in Fig. 1. The circuit is pipelined and consists of six functional blocks, which are the syndrome computation, encoder and polynomial expansion computation, modified Euclid's algorithm processing, Chien search, error correction and verification, and FIFO memory blocks. The general decoding algorithm is as follows.
1. Compute the syndrome polynomial, S(x).
2. Compute the erasure locator polynomial, Λ(x), and the modified syndrome polynomial, T(x).
3. Perform the modified Euclid's algorithm.
4. Evaluate the error/erasure locator polynomial and the error/erasure evaluator polynomial (Chien search algorithm).
5. Correct and verify the errors.
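As a quick check of condition (1), the following Python sketch tabulates the correction limits of the three codes listed above. The function names are illustrative only and do not come from the original design.

```python
# The three (n, k) codes listed in the text.
CODES = {
    "audio/video inner": (85, 77),
    "video outer": (149, 138),
    "audio outer": (14, 9),
}

def max_errors(n, k):
    """Largest e satisfying 2e <= n - k (no erasures)."""
    return (n - k) // 2

def max_erasures(n, k):
    """Largest f satisfying f <= n - k (no errors)."""
    return n - k

def is_correctable(n, k, errors, erasures):
    """Condition (1): 2e + f <= n - k."""
    return 2 * errors + erasures <= n - k

for name, (n, k) in CODES.items():
    print(name, (n, k), "errors:", max_errors(n, k), "erasures:", max_erasures(n, k))
```

For example, `is_correctable(149, 138, 3, 5)` holds because 2*3 + 5 = 11, whereas four errors together with four erasures would exceed n - k = 11.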

Fig. 1. Proposed RS decoder/encoder. (Recoverable diagram labels: Interior Regions; Modified Euclid's Algorithm.)


The proposed decoder/encoder architecture has at least three major features which improve the area-efficiency significantly when compared to conventional architectures.
1. Three different functions share a hardware functional block. The encoder and the polynomial expansions of T(x) and Λ(x) share a functional block by using MUXs as shown in Fig. 2. The modified syndrome polynomial, T(x), and the erasure locator polynomial, Λ(x), can be computed by using the same hardware by setting different initial values. Each function is performed in a different time period. For faster (simpler) erasure information transfer during this process, we use erasure addresses of 8 bits instead of erasure flags of 1 bit.
2. Decoder size is reduced by using a new circuit architecture for finding the error/erasure locator and error/erasure evaluator polynomials. The new architecture implementing the modified Euclid's algorithm is shown in Fig. 3. In this architecture, the coefficients of each polynomial are grouped into two (upper and lower) parts and separately stored in the Ri, Qi, Li, and Ui registers. The two parts are calculated in parallel. Owing to the parallel processing, decoding latency is significantly reduced.
3. The clock speed is doubled in the internal decoder blocks to reduce latency and memory size. The data rate in our specification is 18 Mbytes/sec at the primary I/O and the base clock speed is 18 MHz. However, present VLSIs can easily accommodate a 36 MHz clock. Therefore, the three blocks in the interior regions marked by the dotted box in Fig. 1 may operate at 36 MHz. (Other blocks operate at 18 MHz.) This allows sharing of the Chien search block, shown in Fig. 4, by the error/erasure locator and error/erasure evaluator polynomials.

These techniques reduce both the decoder/encoder size and the decoding latency.

III. RS Decoder/Encoder Architectures

In this Section, we first explain the overall behavior of the proposed decoder/encoder and then describe the proposed architectures and their timing diagrams in detail.

3.1 Overall Behavior of the Decoder/Encoder

Encoding for (n, k) RS codes is usually performed by (n - k)-stage linear feedback shift registers [1]. In our design, encoding is performed in the Encoder and Polynomial Expansion block, to share hardware. In Fig. 1, decoding is performed in two steps for audio and video signals, respectively. For audio signals, (85, 77) inner decoding is followed by (14, 9) outer decoding. Similarly, for video signals, (85, 77) inner and (149, 138) outer decodings are performed. In inner or outer decoding, errors can be detected and corrected. However, erasures are detected during the inner decoding and corrected during the outer decoding. The proposed decoder/encoder consists of six blocks, similar to the general decoding architecture. Let $\{r_{n-1}, \ldots, r_1, r_0\}$ be the received symbols of 8 bits and let

$$ r(x) = r_{n-1}x^{n-1} + \cdots + r_1 x + r_0 \tag{2} $$

The generator polynomial of an (n, k) code is represented by the following equation over GF(256).

$$ g(x) = (x - \alpha^0)(x - \alpha^1)\cdots(x - \alpha^{n-k-1}) \tag{3} $$

where α is a primitive element in GF(256). The behavior of each block of the decoder/encoder in Fig. 1 can be summarized as follows.

1. The Syndrome Computation block calculates the syndrome polynomial, S(x). The coefficients of S(x) are calculated as

$$ S(x) = \sum_{i=0}^{n-k-1} s_i x^i \tag{4} $$

where $s_i = r(\alpha^i)$.

2. The Encoder and Polynomial Expansion block calculates the erasure locator polynomial, Λ(x), by using the erasure addresses of 8 bits. In other approaches, the erasure locations are frequently represented by one-bit flags. However, we use erasure addresses of 8 bits instead of erasure flags. By using these addresses, Λ(x) can be calculated in advance and passed to the next modified Euclid's algorithm block. The modified syndrome polynomial, T(x), is calculated by using the following equation.

$$ T(x) = S(x)\,\Lambda(x) \bmod x^{2t} \tag{5} $$


[Fig. 2 diagram. Recoverable labels: Erasure (8 bits); Encoding output. Legend: ⊗: 8-bit multiplier over GF(256); ⊕: 8-bit adder over GF(256).]

Coefficients of S(x) are stored in registers to be used for the calculation of the modified syndrome. Switch S is ON (OFF) if the erasure value is non-zero (0).
Fig. 2. Encoder and polynomial expansion block.

3. The Modified Euclid's Algorithm block computes the error/erasure locator polynomial, σ(x), and the error/erasure evaluator polynomial, ω(x), by using T(x) and Λ(x) as initial conditions. A new architecture for this block implementation has been developed and will be described later in detail.

4. The Chien Search block evaluates $\sigma(\alpha^{-i})$, $\sigma_{odd}(\alpha^{-i})$, and $\omega(\alpha^{-i})$ for i = 0, ..., n-1. The roots of these polynomials are used to compute error/erasure locations and values.

5. The Error Correction and Verification block computes the error/erasure value, $e_i$, from

$$ e_i = \omega(\alpha^{-i}) / \sigma_{odd}(\alpha^{-i}), \quad i = 0, \ldots, n-1 \tag{6} $$

where $\sigma_{odd}(\alpha^{-i})$ is the sum of the odd-degree terms of σ(x). Then, a corrected symbol is obtained by adding the error $e_i$ to the received erroneous symbol $r_i$.

$$ \hat{r}_i = r_i + e_i, \quad i = 0, \ldots, n-1 \tag{7} $$

The corrected word $\hat{r}(x)$ is then used to verify the correctness of the corrected values, i.e., to check whether S(x) = 0 or not. If S(x) = 0, $\hat{r}(x)$ is accepted. Otherwise, the correction in the inner decoding is not successful and thus the row address is saved for the outer decoding. For erasure decoding, the received data stored in the FIFO memory block is used.

6. The FIFO memory block is used to delay the received data by 2n + δ, where 2n is the delay required for the syndrome computation, Chien search, and verification, and δ is the processing time delay of the other blocks (the Encoder and Polynomial Expansion block and the modified Euclid's algorithm block).

3.2 Hardware Sharing for Encoding and Polynomial Expansion

Fig. 2 shows the detailed architecture of the Encoder and Polynomial Expansion block. Since encoding and the polynomial expansions of Λ(x) and T(x) can be computed by using the architecture shown in Fig. 2, the hardware is time-shared for the three computations.

Now we explain how encoding of a (149, 138) code is performed. Switch S is used to control signal transfer. Initially, all the switches are ON and the registers R0 - R10 are reset to 0. Constants g0 - g10 (the coefficients of the generator polynomial in eq. (3)) are selected by MUXs, i.e., MUX0 - MUX10. Then, systematic encoding is performed. For the first 138 clock cycle times, the encoding input is selected by MUX11 and MUX12. For the remaining 11 clock cycle times, the resultant parities computed in the registers are selected by MUX11.

Λ(x) is computed as follows. Assume that Λ is the set of $\alpha^{-i}$'s, where $\alpha^{-i} \in \Lambda$ implies that the location of $r_i$ is an erasure. Then Λ(x) can be written as

$$ \Lambda(x) = \prod_{\alpha^{-i} \in \Lambda} (x - \alpha^{-i}) \tag{8} $$

During inner decoding, Λ(x) = 1, since there are no erasures. In the case of a (149, 138) outer code, the maximum degree of Λ(x) is 2t = 11. R11 is only used to store the coefficient of the maximum degree 11.


Initially, R0 = 1 and all the R1 - R11 values are 0. After the polynomial expansion, the i-th degree coefficient of Λ(x) is stored in the Ri register. If $\alpha^{-i} \notin \Lambda$, switch S is off and thus signals cannot be passed to the registers. Similarly, the modified syndrome polynomial T(x) shown in equation (5) can be computed by using the circuit. T(x) uses the coefficients of S(x) instead of the initial register values (i.e., R0 = 1): the R0 - R10 registers initially store the coefficients of S(x). For (85, 77) and (14, 9) code processing, the (R3 - R11) and (R6 - R11) registers are used, respectively.
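The three computations that time-share the block of Fig. 2 (systematic encoding, expansion of the erasure locator polynomial, and the modified syndrome of eq. (5)) can be mimicked in software. The following is a minimal sketch rather than a model of the register-level circuit. It assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1, which the text does not specify, and all function names are hypothetical.

```python
# GF(256) log/antilog tables. ASSUMPTION: field polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1); the text does not state which one is used.
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11D

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, x):
    """Evaluate p (coefficients lowest degree first) at x by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc

def times_linear(p, root):
    """Multiply p(x) by (x - root); subtraction is XOR in GF(2^m)."""
    out = [0] * (len(p) + 1)
    for i, c in enumerate(p):
        out[i + 1] ^= c
        out[i] ^= gf_mul(c, root)
    return out

def generator_poly(n, k):
    """g(x) of eq. (3): roots alpha^0 ... alpha^(n-k-1)."""
    g = [1]
    for i in range(n - k):
        g = times_linear(g, EXP[i])
    return g

def encode(msg, n, k):
    """Systematic encoding with an (n-k)-stage LFSR; msg is highest degree first."""
    g = generator_poly(n, k)
    reg = [0] * (n - k)
    for m in msg:                       # k clock cycles of message input
        fb = m ^ reg[-1]
        for j in range(n - k - 1, 0, -1):
            reg[j] = reg[j - 1] ^ gf_mul(fb, g[j])
        reg[0] = gf_mul(fb, g[0])
    return list(msg) + reg[::-1]        # message followed by n-k parities

def syndromes(word, n, k):
    """s_i = r(alpha^i), i = 0 ... n-k-1; word is highest degree first."""
    r = word[::-1]
    return [poly_eval(r, EXP[i]) for i in range(n - k)]

def erasure_locator(erased):
    """Eq. (8): product of (x - alpha^-i) over the erased symbol indices i."""
    lam = [1]                           # plays the role of R0 = 1, others 0
    for i in erased:
        lam = times_linear(lam, EXP[(255 - i) % 255])
    return lam

def modified_syndrome(s, lam, two_t):
    """Eq. (5): T(x) = S(x) * Lambda(x) mod x^2t."""
    t_poly = [0] * two_t
    for i, a in enumerate(s):
        for j, b in enumerate(lam):
            if i + j < two_t:
                t_poly[i + j] ^= gf_mul(a, b)
    return t_poly

n, k = 149, 138
msg = [(7 * i + 3) % 256 for i in range(k)]     # arbitrary test data
codeword = encode(msg, n, k)
print(syndromes(codeword, n, k))                # syndromes of a clean codeword
lam = erasure_locator([3, 40, 100])             # erasures at r_3, r_40, r_100
received = list(codeword)
for i in (3, 40, 100):
    received[n - 1 - i] = 0                     # erased symbols replaced by 0
print(modified_syndrome(syndromes(received, n, k), lam, n - k))
```

In this sketch the word lists are ordered highest degree first, matching the order in which symbols would enter a shift register, while polynomial coefficient lists are lowest degree first.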

3.3 An Efficient Architecture to Perform the Modified Euclid's Algorithm

To determine the error/erasure locator polynomial σ(x) and the error/erasure evaluator polynomial ω(x), the modified Euclid's algorithm [5] is applied. The algorithm is summarized in the following procedure.
1. Initial value setting
2. Recursive computation and stop condition checking

The initial values of the Euclid's algorithm are

$$ R_0(x) = x^{2t}, \quad Q_0(x) = T(x) \tag{9} $$
$$ L_0(x) = 0, \quad U_0(x) = \Lambda(x) \tag{10} $$

Then, recursive computation is performed as follows until the stop condition, i.e., $\deg(L_i(x)) > \deg(R_i(x))$, is satisfied. If the stop condition is satisfied, then $\omega(x) = R_i(x)$ and $\sigma(x) = L_i(x)$.

$$ R_{i+1}(x) = [\sigma_i b_i R_i(x) + \bar{\sigma}_i a_i Q_i(x)] - x^{|l_i|}[\sigma_i a_i Q_i(x) + \bar{\sigma}_i b_i R_i(x)] \tag{11} $$
$$ Q_{i+1}(x) = \sigma_i Q_i(x) + \bar{\sigma}_i R_i(x) \tag{12} $$
$$ L_{i+1}(x) = [\sigma_i b_i L_i(x) + \bar{\sigma}_i a_i U_i(x)] - x^{|l_i|}[\sigma_i a_i U_i(x) + \bar{\sigma}_i b_i L_i(x)] \tag{13} $$
$$ U_{i+1}(x) = \sigma_i U_i(x) + \bar{\sigma}_i L_i(x) \tag{14} $$

where $(a_i, b_i)$ are the leading coefficients of $R_i(x)$ and $Q_i(x)$, and

$$ l_i = \deg(R_i(x)) - \deg(Q_i(x)) \tag{15} $$
$$ \text{if } l_i \ge 0,\ \sigma_i = 1,\ \text{else } \sigma_i = 0 \tag{16} $$

Fig. 3 (a), (b), (c) and (d) show the proposed new architecture. In this architecture, we compute the upper coefficients and the lower coefficients of the polynomials in parallel. Since multiplication and addition are performed for each coefficient of the polynomials, the coefficients can be computed in parallel. In addition, a double-speed clock is used in this block. Since ⌈6 · 5 / 14 / 2 / 2⌉ = 1, the (14, 9) outer code requires only 1 cell. As a result, the number of cells is reduced from 3 to 1 when compared to a straightforward implementation of [5].

We now describe briefly the hardware operation for the modified Euclid's algorithm. Fig. 3 (a) and (b) show the recursive computation blocks of Ri(x), Qi(x) and Li(x), Ui(x). Initially, the coefficients of these polynomials are set in the Ri, Qi, Li and Ui registers, divided into the upper and lower coefficient groups, respectively. The "Qi(x) and Ui(x) register value transfer control" block is composed of MUXs. In Fig. 3 (c), $l_i$ and the stop condition are calculated. According to $l_i$, the "Qi(x) and Ui(x) register value transfer control" block shifts the coefficients or swaps the polynomials {Ri(x), Qi(x)} and {Li(x), Ui(x)}. Owing to the polynomial computations in (11) - (16), polynomial swapping is needed if $l_i < 0$. Fig. 3 (d) shows the arithmetic blocks used in (a) and (b). In particular, arithmetic unit (3) is used to compute the initial condition for the next "Chien Search" block. This computation is necessary for $n \ne 2^m - 1$ in GF($2^m$).

As an example, we describe decoding of a (149, 138) code by using the circuit in Fig. 3 (a). Note that the maximum number of coefficients of the Ri(x) polynomial is 12. Therefore, the upper degree coefficients of Ri(x) are stored in the R6 - R11 registers in block 3, marked by the dotted box. Similarly, the lower degree coefficients are stored in the R0 - R5 registers of block 1 and block 2. Then, the modified Euclid's algorithm is performed recursively. At this time, MUX1 and MUX3 select R2 and R5, respectively, and MUX2 selects the output of the arithmetic unit (1) of block 3. In the case of a (85, 77) code, the R5, R10 and R11 registers are initially set to 0. The (14, 9) code is processed by using blocks 1 and 2. Similarly, the 6 coefficients of the (14, 9) code are divided into two parts and stored in the (R0 - R2) and (R3 - R5) registers in blocks 1 and 2, respectively.


[Fig. 3 panels: (a) Ri(x), Qi(x) computation block. (b) Li(x), Ui(x) computation block. (c) Degree computation block, with inputs deg(Ri(x)) and deg(Qi(x)). (d) Arithmetic units, including Arith. unit (3); (ai, bi) = leading coefficients of Ri(x), Qi(x).]
Fig. 3. Modified Euclid's algorithm processing block.

The modified Euclid's algorithm is processed by using the proposed architecture in parallel. Owing to this parallel processing, the decoding latency can be significantly reduced and as a consequence, the number of cells and the FIFO memory sizes are reduced.
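The recurrences (9) - (16) can be exercised in software. The sketch below is a serial illustration that uses true polynomial degrees and handles the errors-only case (Λ(x) = 1, T(x) = S(x)) in its demonstration. It does not model the upper/lower register split or the double-speed clock. The field polynomial 0x11D and all function names are assumptions made for this example.

```python
# GF(256) log/antilog tables. ASSUMPTION: field polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1); the text does not specify it.
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11D

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def deg(p):
    """True degree of p (lowest degree first); -1 for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i]:
            return i
    return -1

def evaluate(p, x):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, x) ^ c
    return acc

def combine(c1, p1, c2, p2, shift):
    """Return c1*p1(x) - x^shift * c2*p2(x); subtraction is XOR in GF(2^m)."""
    out = [0] * max(len(p1), len(p2) + shift)
    for i, c in enumerate(p1):
        out[i] ^= mul(c1, c)
    for i, c in enumerate(p2):
        out[i + shift] ^= mul(c2, c)
    return out

def modified_euclid(T, Lam, two_t):
    """Return (sigma, omega), each up to a common non-zero scalar factor."""
    if deg(T) < 0:                        # zero modified syndrome: nothing to solve
        return list(Lam), [0]
    R, Q = [0] * two_t + [1], list(T)     # eq. (9):  R0 = x^2t, Q0 = T
    L, U = [0], list(Lam)                 # eq. (10): L0 = 0,    U0 = Lambda
    while deg(L) <= deg(R):               # stop when deg(L) > deg(R)
        l = deg(R) - deg(Q)               # eq. (15)
        a, b = R[deg(R)], Q[deg(Q)]       # leading coefficients a_i, b_i
        if l >= 0:                        # sigma_i = 1: Q and U are kept
            R, L = combine(b, R, a, Q, l), combine(b, L, a, U, l)
        else:                             # sigma_i = 0: polynomials are swapped
            R, Q, L, U = (combine(a, Q, b, R, -l), R,
                          combine(a, U, b, L, -l), L)
    return L, R

def syndromes_of(errors, two_t):
    """s_j = r(alpha^j) for the all-zero codeword plus errors {index: value}."""
    s = []
    for j in range(two_t):
        acc = 0
        for pos, val in errors.items():
            acc ^= mul(val, EXP[(j * pos) % 255])
        s.append(acc)
    return s

errors = {2: 0x1B, 57: 0x80, 140: 0x05}   # three symbol errors, (149, 138) code
sigma, omega = modified_euclid(syndromes_of(errors, 11), [1], 11)
roots = [i for i in range(149) if evaluate(sigma, EXP[(255 - i) % 255]) == 0]
print(roots)                               # indices i with sigma(alpha^-i) = 0
```

Because every step multiplies by leading coefficients instead of dividing, no field inversion is needed inside the loop, which is the property that makes the recurrence attractive for hardware.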

3.4 Chien Search Implementation Block


In general, $\sigma(\alpha^{-i})$, $\sigma_{odd}(\alpha^{-i})$ and $\omega(\alpha^{-i})$ are computed in two blocks of the same architecture. We have developed a new architecture capable of sharing the hardware blocks. The architecture is simple, as shown in Fig. 4. Hardware sharing is possible by using the double-speed clock of 36 MHz. In this block, the roots of the error/erasure locator polynomial and the error/erasure evaluator polynomial are computed alternately. During even clocks, $\sigma(\alpha^{-i})$ and $\sigma_{odd}(\alpha^{-i})$ are computed, while during odd clocks, $\omega(\alpha^{-i})$ is computed. The output is adjusted to the original clock speed, which is 18 MHz.

[Fig. 4 diagram. Recoverable labels: $\sigma_{odd}(\alpha^{-i})$; summation by XOR tree.]
Fig. 4. Chien search implementation block.
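The evaluation performed by the Chien search block, together with the error value of eq. (6), can be sketched as follows. This is an illustrative serial model: in the shared hardware the σ pass would occupy the even 36 MHz clocks and the ω pass the odd ones, whereas here the two passes simply run one after the other. The field polynomial 0x11D, the function names, and the way the demonstration builds σ(x) directly from known error positions are assumptions made for the example.

```python
# GF(256) log/antilog tables. ASSUMPTION: field polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1); the text does not specify it.
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11D

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    """Multiplicative inverse of a non-zero field element."""
    return EXP[255 - LOG[a]]

def chien_forney(sigma, omega, n):
    """Return {i: e_i} for each i in 0..n-1 with sigma(alpha^-i) = 0, per eq. (6)."""
    s_regs, w_regs = list(sigma), list(omega)    # register j holds coeff_j * alpha^(-i*j)
    found = {}
    for i in range(n):
        s_val = s_odd = w_val = 0
        for j, c in enumerate(s_regs):           # XOR-tree summation
            s_val ^= c
            if j % 2:
                s_odd ^= c                       # odd-degree terms only
        for c in w_regs:
            w_val ^= c
        if s_val == 0 and s_odd != 0:
            found[i] = mul(w_val, inv(s_odd))    # e_i = omega / sigma_odd
        # Advance to i + 1: the degree-j term is multiplied by alpha^(-j).
        s_regs = [mul(c, EXP[(255 - j) % 255]) for j, c in enumerate(s_regs)]
        w_regs = [mul(c, EXP[(255 - j) % 255]) for j, c in enumerate(w_regs)]
    return found

# Demonstration on the (149, 138) code: all-zero codeword plus three errors.
n, two_t = 149, 11
errors = {2: 0x1B, 57: 0x80, 140: 0x05}
s = [0] * two_t                                  # s_j = r(alpha^j)
for j in range(two_t):
    for pos, val in errors.items():
        s[j] ^= mul(val, EXP[(j * pos) % 255])
sigma = [1]                                      # product of (1 + alpha^pos * x)
for pos in errors:
    nxt = sigma + [0]
    for i, c in enumerate(sigma):
        nxt[i + 1] ^= mul(c, EXP[pos])
    sigma = nxt
omega = [0] * two_t                              # omega = S * sigma mod x^2t
for i, a in enumerate(s):
    for j, b in enumerate(sigma):
        if i + j < two_t:
            omega[i + j] ^= mul(a, b)
print(chien_forney(sigma, omega, n))
```

Adding each returned value to the received symbol with the same index, as in eq. (7), would restore the all-zero codeword in this demonstration.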

3.5 RS Decoder/Encoder Timing Diagram


Fig. 5 shows the timing diagram for the (n, k) RS decoder/encoder. In particular, the latency of the Euclid's algorithm block is {(⌈(2t + 1) / 2⌉ + 1) · 2t + (2t + 1)} / 2. In this latency, (⌈(2t + 1) / 2⌉ + 1) · 2t clock cycle times are needed to complete the modified Euclid's algorithm. It takes (2t + 1) clock cycle times to compute the initial values used in the Chien search block. Of course, by using the double-speed clock, the decoding latency is reduced by half.

The error correction and verification is initially performed for (n + 3) clock cycle times. As a result, the overall decoding latency is given by

$$ \text{Decoding latency} = n + \{2t + ((\lceil (2t+1)/2 \rceil + 1) \cdot 2t + (2t+1))\}/2 + n + 3. $$

IV. Experimental Results

We have implemented the proposed RS decoder/encoder using VHDL. During the hardware synthesis, circuit delays have been considered by restricting the longest path delays of the decoder/encoder. Table 1 shows the implementation results based on several design methods, i.e., [5], [7], an industry design, and our proposed method. A gate corresponds to a 2-input NAND gate. The total number of gates is taken from the synthesized results, excluding the FIFO memories. The new proposed architecture has been implemented by using about 35,000 gates, which is about 30% smaller than the other design results. The FIFO memory sizes have been reduced since the latency has been reduced. In the industry result, the Berlekamp-Massey algorithm has been used. In the others, the modified Euclid's algorithm has been used. Note that in [7], the design specification (code length, error correction capability, etc.) is different from the others and therefore a direct comparison is not possible. We have verified the correctness of the designed decoder/encoder by using extensive simulation with randomly generated data and errors.
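The decoding-latency expression of Section 3.5 can be evaluated numerically for the three codes. The sketch below applies the expression as written, counting cycles of the 18 MHz base clock. The function names are illustrative, and since the text does not say how fractional half-cycles are rounded, the values are left unrounded.

```python
from math import ceil

def euclid_block_cycles(two_t):
    """Double-speed clock cycles used by the Euclid's algorithm block:
    (ceil((2t+1)/2) + 1) * 2t for the algorithm itself, plus (2t+1)
    to compute the initial values for the Chien search block."""
    return (ceil((two_t + 1) / 2) + 1) * two_t + (two_t + 1)

def decoding_latency(n, k):
    """Overall latency: n + {2t + Euclid block cycles}/2 + n + 3."""
    two_t = n - k
    return n + (two_t + euclid_block_cycles(two_t)) / 2 + n + 3

for n, k in [(85, 77), (149, 138), (14, 9)]:
    print((n, k), decoding_latency(n, k))
```

For the (149, 138) code, 2t = 11, so the Euclid block takes 7 * 11 + 12 = 89 double-speed cycles and the overall expression gives 149 + (11 + 89)/2 + 149 + 3 = 351 base-clock cycles.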


Table 1. Implementation results. [Table body not recoverable; surviving entry: Modified Euclid's algorithm*]

(Notes): '-' in [7] means unknown, i.e., the results have not been published.

V. Conclusions
In this paper, we have proposed a new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder for digital VCRs. The proposed architecture has three major features. First, area-efficiency has been significantly improved by sharing a functional block. Second, a new architecture implementing the modified Euclid's algorithm has been developed to reduce the circuit size and latency. Third, by doubling the internal clock speed, the decoding latency and hence the memory size have been reduced. The correctness of the proposed decoder/encoder has been verified by extensive simulation.

References
[1] S. B. Wicker and V. K. Bhargava, "Reed-Solomon Codes and Their Applications", New York, IEEE Press, 1994.
[2] S. B. Wicker, "Error Control Systems for Digital Communication and Storage", Prentice Hall, 1995.
[3] Y. Jeong and W. Burleson, "High-Level Estimation of High-Performance Architectures for Reed-Solomon Decoding," ISCAS, vol. 1, pp. 720-723, May 1995.
[4] J. M. Hsu and C. L. Wang, "An Area-Efficient VLSI Architecture for Decoding of Reed-Solomon Codes," ICASSP, vol. 6, pp. 3291-3294, May 1996.
[5] H. M. Shao and I. S. Reed, "On the VLSI Design of a Pipeline Reed-Solomon Decoder Using Systolic Arrays," IEEE Trans. on Computers, vol. 37, no. 10, pp. 1273-1280, Oct. 1988.
[6] H. M. Shao, W. Hills, T. K. Truong, I. S. Hsu, both of Pasadena, L. J. Deutsch, Sepulveda, all of Calif., "Architecture for Time or Transform Domain Decoding of Reed-Solomon Codes," U. S. Patent 4,868,828, Sep. 19, 1989.
[7] G. Y. Lee, B. H. Kwuan, S. W. Lee, J. W. Jung, S. H. Nam, Y. S. Chun, D. I. Han, K. S. Park, Y. D. Choi, D. I. Cho, and J. K. Lee, "A VLSI Design of RS CODEC for Digital Data Recorder," ASIC Design Workshop, pp. 115-124, 1996.
[8] T. Iwaki, T. Tanaka, E. Yamada, T. Okuda, and T. Sasada, "Architecture of A High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, vol. 40, no. 1, pp. 75-81, Feb. 1994.
[9] C. C. Hsu, I. S. Reed, and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-way Digital Communications and Storage Systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 1, pp. 91-92, Feb. 1994.
[10] S. H. Kim and S. W. Kim, "An Error-Control Coding Scheme for Multispeed Play of Digital VCR", IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, no. 3, pp. 243-247, June 1995.
[11] "Specifications of Digital VCR for Consumer-Use", SD; Standard Definition, NTSC, PAL, SECOM, HD-Digital VCR Conference, DRAFT IEC document, Confidential, June 1994.

Biographies
Sunghoon Kwon received the B.S. degree and the M.S. degree in electronics engineering from Han-Yang University, in 1991 and 1993, respectively. He is working toward the Ph.D. degree in electronics engineering at Han-Yang University. His research interests include system design and synthesis of VLSIs.

Hyunchul Shin (S'78-M'80-SM'96) received the B.S. degree in electronics engineering from Seoul National University and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology in 1978 and 1980, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1987. From 1980 to 1983, he was with the Department of Electronics Engineering, Kun-Oh Institute of Technology, Korea. In 1983, he received a Fulbright Scholarship. From 1987 to 1989, he was a Member of the Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ. Since 1989, he has been with the Department of Electronics Engineering, Han-Yang University, Korea. His research interests include design and synthesis of integrated circuits and systems.
