Abstract: A new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder has been developed for digital VCRs. The new architecture targets smaller circuit size and shorter decoding latency, and has the following three features. First, high area efficiency has been achieved by sharing a functional block for encoding, modified syndrome computation, and erasure locator polynomial computation. Second, circuit size and decoding latency have been reduced by using a new architecture to implement the modified Euclid's algorithm. Third, by doubling the internal clock speed from 18 MHz to 36 MHz, the decoding latency, and hence the memory size, can be reduced. The decoder/encoder designed by using the proposed method uses about 30% fewer gates than one based on conventional architectures.
I. Introduction
The Reed-Solomon (RS) code is one of the most powerful and widely standardized techniques for error and erasure correction. Owing to its excellent capability for correcting burst errors, it has been widely used in digital communication systems and in storage devices such as digital VCRs and disk drives [1]. In particular, digital VCRs employ RS product codes for error correction coding (ECC). RS codes over finite fields are maximum distance separable, and their properties and theoretical analysis are well known [2]. However, the design of a high-bit-rate RS decoder is not straightforward. The complexity of an RS decoder/encoder circuit depends on the data rate, the code length, and the error correction capability. Therefore, the algorithm and the architecture should be customized for each specific application to achieve high efficiency.

RS decoding techniques can be classified into two categories: time-domain and frequency-domain. Time-domain techniques appear to outperform frequency-domain techniques in both area and delay [3]. Most of the published time-domain techniques use one of the following algorithms to evaluate the error/erasure locator and evaluator polynomials: the Berlekamp-Massey algorithm [4], the (modified) Euclid's algorithm [5, 6, 7], or matrix calculation [8]. A VLSI design of a pipelined RS decoder using a systolic array was presented in [5]; by using a multiplexing and recursive technique, the modified Euclid's algorithm was implemented with a significant reduction in the number of cells. The RS decoder in [7] can flexibly decode three types of codes (inner, outer, and subcode). One feature of that design is that the encoder and the syndrome computation block are shared to reduce hardware size. In [9], it is shown that any RS decoder that corrects both errors and erasures can also be used as an encoder for the RS code. However, power consumption can increase because all blocks of the decoder circuit are driven during encoding.

To satisfy the decoding/encoding requirements of digital VCRs, we have developed a new flexible and area-efficient RS product-code decoder/encoder architecture. Our decoder/encoder can decode and encode three types of codes for audio and video signals over GF(256). The three main features of the proposed decoder/encoder are:
1. sharing a hardware functional block to evaluate three different functions,
2. a new architecture for the modified Euclid's algorithm, and
3. doubling the internal clock speed to reduce the latency.
The proposed RS decoder/encoder has been implemented with about 30% fewer gates than one based on conventional architectures. In Section II, we give an overview of the proposed RS decoder/encoder. The flexible decoder/encoder architecture is presented in Section III. Experimental results are described in Section IV. Finally, conclusions are given in Section V.

Contributed Paper. Manuscript received June 9, 1997.
corrected are 4, 5, and 2, respectively, for the three codes. For the outer codes, the maximum numbers of erasures that can be corrected are 11 and 5 for the (149, 138) and (14, 9) codes, respectively. In the proposed architecture, the (85, 77) and (14, 9) codes are also encoded and decoded by using the same block that processes the (149, 138) code. Suppose e errors and f erasures occur. Then, in general, the following condition, given by the maximum correction capability, must be satisfied for an (n, k) RS code:

$$2e + f \le n - k \qquad (1)$$

where n is the code length and k is the information length. Furthermore, 2t = n - k, where t is the error correction capability.

The proposed RS decoder/encoder is shown in Fig. 1. The circuit is pipelined and consists of six functional blocks: syndrome computation, encoder and polynomial expansion, modified Euclid's algorithm processing, Chien search, error correction and verification, and FIFO memory. The general decoding algorithm is as follows.
1. Compute the syndrome polynomial, S(x).
2. Compute the erasure locator polynomial, Λ(x), and the modified syndrome polynomial, T(x).
3. Perform the modified Euclid's algorithm.
4. Evaluate the error/erasure locator polynomial and the error/erasure evaluator polynomial (Chien search algorithm).
5. Correct and verify the errors.
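Condition (1) can be checked mechanically. The sketch below encodes (1) and reproduces the error-correction capabilities quoted above for the three codes. The function names `correctable` and `max_errors` are hypothetical and do not come from the paper. Because n - k is odd for the (149, 138) and (14, 9) codes, the sketch takes t as the floor of (n - k)/2.

```python
def correctable(n: int, k: int, errors: int, erasures: int) -> bool:
    """Condition (1): e errors and f erasures are correctable when 2e + f <= n - k."""
    return 2 * errors + erasures <= n - k


def max_errors(n: int, k: int) -> int:
    """Largest number of errors correctable without erasures: floor((n - k) / 2)."""
    return (n - k) // 2


# The three codes handled by the decoder/encoder.
for n, k in [(85, 77), (149, 138), (14, 9)]:
    print((n, k), "max errors:", max_errors(n, k))

# With no errors, an outer code can absorb up to n - k erasures.
print(correctable(149, 138, 0, 11), correctable(14, 9, 0, 5))
```

Errors cost twice as much as erasures in (1) because the decoder must find both the location and the value of an error. For an erasure, the location is already known from its address.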
Kwon and Shin: An Area-Efficient VLSI Architecture of a Reed-Solomon Decoder/Encoder for Digital VCRs
The proposed decoder/encoder architecture has three major features that significantly improve area efficiency compared with conventional architectures.
1. Three different functions share one hardware functional block. The encoder and the polynomial expansions of T(x) and Λ(x) share a functional block by using MUXs, as shown in Fig. 2. The modified syndrome polynomial, T(x), and the erasure locator polynomial, Λ(x), are computed by the same hardware with different initial values, and each function is performed in a different time period. For faster and simpler transfer of erasure information during this process, we use 8-bit erasure addresses instead of 1-bit erasure flags.
2. Decoder size is reduced by a new circuit architecture for finding the error/erasure locator and error/erasure evaluator polynomials. The new architecture implementing the modified Euclid's algorithm is shown in Fig. 3. In this architecture, the coefficients of each polynomial are grouped into two parts (upper and lower) and stored separately in the R_i, Q_i, L_i, and U_i registers. The two parts are calculated in parallel, which significantly reduces the decoding latency.
3. The clock speed is doubled in the internal decoder blocks to reduce latency and memory size. The data rate in our specification is 18 Mbytes/s at the primary I/O, and the base clock speed is 18 MHz. However, present VLSI technology can easily accommodate a 36 MHz clock. Therefore, the three blocks in the interior region marked by the dotted box in Fig. 1 operate at 36 MHz, while the other blocks operate at 18 MHz. This allows the Chien search block, shown in Fig. 4, to be shared by the error/erasure locator and error/erasure evaluator polynomials.
Together, these techniques reduce both the decoder/encoder size and the decoding latency.
$$g(x) = \prod_{i=0}^{n-k-1} (x - \alpha^{i}) = (x - \alpha^{0})(x - \alpha^{1}) \cdots (x - \alpha^{n-k-1}) \qquad (3)$$

where g(x) is the generator polynomial and α is a primitive element of GF(256). The behavior of each block of the decoder/encoder in Fig. 1 can be summarized as follows.
1. The Syndrome Computation block calculates the syndrome polynomial, S(x). The coefficients of S(x) are calculated as

$$S(x) = \sum_{i=0}^{n-k-1} s_i x^{i}, \qquad s_i = r(\alpha^{i}) \qquad (4)$$

where r(x) is the received polynomial.
2. The Encoder and Polynomial Expansion block calculates the erasure locator polynomial, Λ(x), by using 8-bit erasure addresses. Other approaches frequently represent erasure locations by 1-bit flags. Using 8-bit addresses instead allows Λ(x) to be calculated in advance and passed to the subsequent Modified Euclid's Algorithm block. The modified syndrome polynomial, T(x), is calculated by using the following equation.
(5)
Fig. 2. Encoder and polynomial expansion block. Coefficients of S(x) are stored in registers to be used for the calculation of the modified syndrome. Switch S is ON (OFF) if the erasure value is non-zero (zero). [Figure labels: 8-bit erasure input; encoding output.]
3. The Modified Euclid's Algorithm block computes the error/erasure locator polynomial, σ(x), and the error/erasure evaluator polynomial, ω(x), by using T(x) and Λ(x) as initial conditions. A new architecture for this block has been developed and is described in detail later.
4. The Chien Search block evaluates σ(α^{-i}), σ_odd(α^{-i}), and ω(α^{-i}). The values of these polynomials are used to compute the error/erasure locations and values.
5. The Error Correction and Verification block computes the error/erasure value, e_i, from

$$e_i = \frac{\omega(\alpha^{-i})}{\sigma_{\mathrm{odd}}(\alpha^{-i})}, \qquad i = 0, \ldots, n-1 \qquad (6)$$

where σ_odd(x) is the sum of the odd-degree terms of σ(x). Equation (6) is applied at the locations where σ(α^{-i}) = 0; elsewhere e_i = 0. A corrected symbol is then obtained by adding the error value e_i to the received symbol r_i:

$$\hat{c}_i = r_i + e_i, \qquad i = 0, \ldots, n-1 \qquad (7)$$

The corrected codeword ĉ(x) is then used to verify the correctness of the corrected values, i.e., to check whether its syndrome S(x) = 0. If S(x) = 0, ĉ(x) is accepted. Otherwise, the correction in the inner decoding has failed, and the row address is saved for the outer decoding. For erasure decoding, the received data stored in the FIFO memory block is used.
6. The FIFO memory block delays the received data by 2n + δ. Here, 2n is the delay required for the syndrome computation, Chien search, and verification, and δ is the processing delay of the other blocks (the Encoder and Polynomial Expansion block and the Modified Euclid's Algorithm block).
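The following sketch illustrates how items 1 and 5 fit together. It encodes a message systematically with g(x) from (3), computes syndromes as in (4), corrects errors at known positions with (6) and (7), and then performs the verification check S(x) = 0. The errata positions are supplied directly, so this sketch skips the modified Euclid's algorithm and the Chien search. The GF(256) field polynomial 0x11D is an assumption because the text does not state one. The helper names (`encode`, `syndromes`, `correct_at`) are hypothetical.

```python
# GF(256) built from x^8 + x^4 + x^3 + x^2 + 1 (0x11D); an assumed field polynomial.
PRIM = 0x11D
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gdiv(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def apow(e):
    """alpha**e for any integer e, with alpha = 0x02."""
    return EXP[e % 255]

def peval(p, x):
    """Evaluate p (p[j] is the coefficient of x**j) at x by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, x) ^ c
    return acc

def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gmul(a, b)
    return out

def encode(msg, n):
    """Systematic encoding with g(x) = (x - alpha^0)...(x - alpha^(n-k-1)), eq. (3)."""
    nk = n - len(msg)
    g = [1]
    for i in range(nk):
        g = pmul(g, [apow(i), 1])          # '-' equals '+' (XOR) in GF(2^8)
    rem = [0] * nk + list(msg)             # m(x) * x^(n-k)
    for i in range(n - 1, nk - 1, -1):     # long division by the monic g(x)
        coef = rem[i]
        if coef:
            for j, gj in enumerate(g):
                rem[i - nk + j] ^= gmul(coef, gj)
    return rem[:nk] + list(msg)            # parity symbols, then message symbols

def syndromes(r, nk):
    """s_i = r(alpha^i) for i = 0..n-k-1, as in (4)."""
    return [peval(r, apow(i)) for i in range(nk)]

def correct_at(r, positions, nk):
    """Apply (6) and (7) at errata positions that are assumed to be known already."""
    s = syndromes(r, nk)
    sigma = [1]
    for p in positions:                    # prod(1 + alpha^p x) is a scalar multiple of
        sigma = pmul(sigma, [1, apow(p)])  # prod(x - alpha^-p); the scale cancels in (6)
    omega = pmul(s, sigma)[:nk]            # omega(x) = S(x) sigma(x) mod x^(n-k)
    sigma_odd = [c if j % 2 else 0 for j, c in enumerate(sigma)]
    fixed = list(r)
    for p in positions:
        x = apow(-p)
        fixed[p] ^= gdiv(peval(omega, x), peval(sigma_odd, x))   # (6), then (7)
    return fixed

n, k = 149, 138
codeword = encode([(7 * i + 3) % 256 for i in range(k)], n)
received = list(codeword)
for pos, err in [(3, 0x55), (77, 0x01), (140, 0xA7)]:
    received[pos] ^= err
fixed = correct_at(received, [3, 77, 140], n - k)
print(fixed == codeword, any(syndromes(fixed, n - k)))   # verification: is S(x) = 0?
```

Because the first root of g(x) is α^0, Forney's formula reduces exactly to the ratio ω(α^{-i})/σ_odd(α^{-i}) in (6). In characteristic 2, xσ'(x) equals σ_odd(x), so the derivative never has to be formed explicitly.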
Λ is the set of α^{-i} such that α^{-i} ∈ Λ implies that the location of r_i is an erasure. Λ(x) can then be written as

$$\Lambda(x) = \prod_{\alpha^{-i} \in \Lambda} (x - \alpha^{-i}) \qquad (8)$$

During inner decoding, Λ(x) = 1, since there are no erasures. For a (149, 138) outer code, the maximum degree of Λ(x) is 2t = 11. Register R11 is used only to store the coefficient of degree 11.
Initially, R0 = 1 and all of the R1–R11 values are 0. After the polynomial expansion, the i-th degree coefficient of Λ(x) is stored in register Ri. If α^{-i} ∉ Λ, switch S is off, and thus signals cannot be passed to the registers. Similarly, the modified syndrome polynomial T(x) of equation (5) can be computed by using the same circuit. For T(x), registers R0–R10 initially store the coefficients of S(x) instead of the initial register value used for Λ(x) (R0 = 1). For (85, 77) and (14, 9) code processing, registers R3–R11 and R6–R11 are used, respectively.
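The sharing described above can be modeled as one routine applied to two different initial register contents. The sketch below is a hypothetical software model of the Fig. 2 registers, not the paper's circuit. Each 8-bit erasure address p multiplies the register contents by (x - α^{-p}). Starting from R0 = 1 yields Λ(x) as in (8). Starting from S(x) in R0–R10 yields T(x), assuming the common definition T(x) = S(x)Λ(x) mod x^{n-k}. That definition is consistent with the register description but is not spelled out in the text above. The field polynomial 0x11D and the name `expand` are assumptions.

```python
PRIM = 0x11D  # assumed GF(256) field polynomial
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def apow(e):
    return EXP[e % 255]

def peval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, x) ^ c
    return acc

def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gmul(a, b)
    return out

def expand(initial, erasure_addresses, width=12):
    """Multiply registers R0..R(width-1) by (x - alpha^-p) once per erasure address p.

    Coefficients pushed past R(width-1) are dropped, i.e. the result is mod x^width.
    """
    regs = (list(initial) + [0] * width)[:width]
    for p in erasure_addresses:
        root = apow(-p)
        # new R_j = old R_(j-1) + alpha^-p * old R_j  ('-' equals '+' in GF(2^8))
        regs = [(regs[j - 1] if j else 0) ^ gmul(root, regs[j]) for j in range(width)]
    return regs

erasures = [10, 57, 100]
lam = expand([1], erasures)                  # R0 = 1 initially: Lambda(x), eq. (8)
s = [(5 * i + 1) % 256 for i in range(11)]   # arbitrary stand-in for S(x), 2t = 11
t_mod = expand(s, erasures, width=11)        # R0..R10 preloaded with S(x): T(x)
print(lam[:4], t_mod == pmul(s, lam)[:11])
```

Only the register initialization differs between the two uses. This is why a MUX in front of the registers is enough to share the block, as the text describes.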
Fig. 3 (a), (b), (c), and (d) show the proposed new architecture. In this architecture, the upper and lower coefficients of the polynomials are computed in parallel. This is possible because multiplication and addition are performed separately for each coefficient of the polynomials. In addition, a double-speed clock is used in this block. Since ⌈6 × 5 / 14 / 2 / 2⌉ = 1, the (14, 9) outer code requires only one cell. As a result, the number of cells is reduced from 3 to 1 compared with a straightforward implementation of [5].

We now briefly describe the hardware operation for the modified Euclid's algorithm, whose recursion is

$$R_{i+1}(x) = [\sigma_i b_i R_i(x) + \bar{\sigma}_i a_i Q_i(x)] - x^{|\ell_i|}[\sigma_i a_i Q_i(x) + \bar{\sigma}_i b_i R_i(x)] \qquad (11)$$
$$Q_{i+1}(x) = \sigma_i Q_i(x) + \bar{\sigma}_i R_i(x) \qquad (12)$$
$$L_{i+1}(x) = [\sigma_i b_i L_i(x) + \bar{\sigma}_i a_i U_i(x)] - x^{|\ell_i|}[\sigma_i a_i U_i(x) + \bar{\sigma}_i b_i L_i(x)] \qquad (13)$$
$$U_{i+1}(x) = \sigma_i U_i(x) + \bar{\sigma}_i L_i(x) \qquad (14)$$

where (a_i, b_i) are the leading coefficients of R_i(x) and Q_i(x), respectively, and

$$\ell_i = \deg(R_i(x)) - \deg(Q_i(x)) \qquad (15)$$
$$\sigma_i = \begin{cases} 1, & \ell_i \ge 0 \\ 0, & \ell_i < 0 \end{cases} \qquad (16)$$

Fig. 3 (a) and (b) show the recursive computation blocks of R_i(x), Q_i(x) and of L_i(x), U_i(x). Initially, the coefficients of these polynomials are set in the R_i, Q_i, L_i, and U_i registers. The "Q_i(x) and U_i(x) register value transfer control" block is composed of MUXs. In Fig. 3 (c), ℓ_i and the stop condition are calculated. According to ℓ_i, the "Q_i(x) and U_i(x) register value transfer control" block shifts the coefficients or swaps the polynomial pairs {R_i(x), Q_i(x)} and {L_i(x), U_i(x)}. Because of the polynomial computations in (11)–(16), polynomial swapping is needed if ℓ_i < 0. Fig. 3 (d) shows the arithmetic units used in (a) and (b). In particular, arithmetic unit (3) is used to compute the initial condition for the following Chien Search block. This computation is necessary when n ≠ 2^m - 1 in GF(2^m).

As an example, we describe the decoding of a (149, 138) code by using the circuit in Fig. 3 (a). Note that R_i(x) has at most 12 coefficients. Therefore, the upper-degree coefficients of R_i(x) are stored in registers R6–R11 in block 3, marked by the dotted box. Similarly, the lower-degree coefficients are stored in registers R0–R5 of blocks 1 and 2. The modified Euclid's algorithm is then performed recursively. At this time, MUX1 and MUX3 select R2 and R5, respectively, and MUX2 selects the output of arithmetic unit (1) of block 3. For an (85, 77) code, registers R9, R10, and R11 are initially set to 0. The (14, 9) code is processed by using blocks 1 and 2. Its 6 coefficients are divided into two parts and stored in registers R0–R2 and R3–R5 of blocks 1 and 2, respectively.
[Fig. 3 (c) labels: ℓ_i = deg(R_i(x)) - deg(Q_i(x)); (a_i, b_i) = leading coefficients of R_i(x), Q_i(x).]
The modified Euclid's algorithm is processed in parallel by the proposed architecture. This parallel processing significantly reduces the decoding latency and, consequently, the number of cells and the FIFO memory size.
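The recursion (11)–(16) can be exercised in software. In the sketch below, σ_i in (16) is a swap flag and is unrelated to σ(x). The text above does not spell out the initial values or the stop rule, so the sketch makes the following assumptions:

- Initial values are R_0 = x^{2t}, Q_0 = T(x), L_0 = 0, and U_0 = Λ(x).
- The recursion stops once deg R_i < (2t + v)/2, where v is the number of erasures. This is the usual choice for errors-and-erasures Euclidean decoding.
- If T(x) already satisfies the stop rule (erasures only), the sketch returns Λ(x) and T(x) directly. This early exit is our own addition.

The field polynomial 0x11D and all function names are also assumptions. The demo uses the all-zero codeword of the (149, 138) code with 2 errors and 3 erasures, so 2e + f = 7 ≤ 11.

```python
PRIM = 0x11D  # assumed GF(256) field polynomial
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gdiv(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def apow(e):
    return EXP[e % 255]

def deg(p):
    """Index of the highest nonzero coefficient; -1 for the zero polynomial."""
    for j in range(len(p) - 1, -1, -1):
        if p[j]:
            return j
    return -1

def peval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, x) ^ c
    return acc

def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gmul(a, b)
    return out

def combine(c1, p1, c2, p2, shift):
    """c1*p1(x) + c2*x^shift*p2(x); subtraction equals addition (XOR) in GF(2^8)."""
    out = [0] * max(len(p1), len(p2) + shift)
    for j, a in enumerate(p1):
        out[j] ^= gmul(c1, a)
    for j, a in enumerate(p2):
        out[j + shift] ^= gmul(c2, a)
    return out[: max(deg(out), 0) + 1]        # trim high-order zeros

def modified_euclid(T, lam, two_t):
    """Recursion (11)-(16); initial values and stop rule are assumptions."""
    v = deg(lam)
    if 2 * deg(T) < two_t + v:                # erasures only (our addition)
        return list(lam), list(T)
    R, Q = [0] * two_t + [1], list(T)         # R_0 = x^2t, Q_0 = T(x)
    L, U = [0], list(lam)                     # L_0 = 0, U_0 = Lambda(x)
    while 2 * deg(R) >= two_t + v:           # stop once deg R_i < (2t + v) / 2
        l = deg(R) - deg(Q)                   # (15)
        a, b = R[deg(R)], Q[deg(Q)]           # leading coefficients of R_i, Q_i
        if l >= 0:                            # sigma_i = 1: reduce R_i by Q_i
            R = combine(b, R, a, Q, l)        # (11)
            L = combine(b, L, a, U, l)        # (13)
        else:                                 # sigma_i = 0: reduce Q_i by R_i, then swap
            R, Q = combine(a, Q, b, R, -l), R     # (11), (12)
            L, U = combine(a, U, b, L, -l), L     # (13), (14)
    return L, R                               # sigma(x) = L_i(x), omega(x) = R_i(x)

n, two_t = 149, 11
errata = {5: 0x3C, 60: 0x81, 10: 0x00, 100: 0xF0, 120: 0x1B}
erasures = [10, 100, 120]                     # 3 erasures plus 2 errors (5 and 60)
r = [0] * n                                   # all-zero codeword plus errata
for p, e in errata.items():
    r[p] ^= e
S = [peval(r, apow(i)) for i in range(two_t)]
lam = [1]
for p in erasures:
    lam = pmul(lam, [apow(-p), 1])            # Lambda(x) = prod(x - alpha^-p), eq. (8)
T = pmul(S, lam)[:two_t]                      # assumed T(x) = S(x) Lambda(x) mod x^2t
sigma, omega = modified_euclid(T, lam, two_t)
print([p for p in range(n) if peval(sigma, apow(-p)) == 0])
```

No division is needed: each step cross-multiplies by the two leading coefficients, which is why (11) and (13) carry the factors a_i and b_i. The resulting σ(x) and ω(x) are therefore scaled by a common nonzero constant. That constant cancels in the ratio (6), so the roots of σ(x) should coincide with the five errata positions.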
… are computed in two blocks of the same architecture. We have developed a new architecture capable of sharing the hardware blocks. The architecture is simple, as shown in Fig. 4. The double-speed clock of 36 MHz makes this hardware sharing possible. In this block, the error/erasure locator polynomial (whose roots give the error/erasure locations) and the error/erasure evaluator polynomial are evaluated alternately. During even clocks, σ(α^{-i}) and σ_odd(α^{-i}) are computed, while ω(α^{-i}) is computed during odd clocks.

Fig. 4. Chien search implementation block (summation by XOR tree).

… is adjusted to the original clock speed of 18 MHz.
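The time-sharing described above can be modeled with one register-based Chien datapath that alternately serves two register banks. The sketch below is a hypothetical behavioral model, not the paper's circuit. Even 36 MHz half-cycles step the σ(x) bank, and odd half-cycles step the ω(x) bank. Symbols are assumed to be evaluated in the order i = n - 1, ..., 0. Under that assumption, register j must start at c_j α^{-j(n-1)}. For a full-length code (n = 255), this constant equals α^{j}, the same constant the per-step multiplier already applies, so no separate initialization is needed. For the shortened codes used here the constants differ. This is our reading of why arithmetic unit (3) precomputes initial values when n ≠ 2^m - 1. The field polynomial 0x11D is also an assumption.

```python
PRIM = 0x11D  # assumed GF(256) field polynomial
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def apow(e):
    return EXP[e % 255]

def peval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, x) ^ c
    return acc

def chien_init(poly, n):
    """Register j starts at c_j * alpha^(-j(n-1)); the first output is poly(alpha^-(n-1))."""
    return [gmul(c, apow(-j * (n - 1))) for j, c in enumerate(poly)]

def chien_step(regs):
    """One pass of the shared datapath: XOR-tree sums, then register j times alpha^j."""
    total = odd = 0
    for j, r in enumerate(regs):
        total ^= r
        if j % 2:
            odd ^= r
    return total, odd, [gmul(r, apow(j)) for j, r in enumerate(regs)]

def chien_shared(sigma, omega, n):
    """Even half-cycles serve sigma(x), odd half-cycles serve omega(x)."""
    banks = [chien_init(sigma, n), chien_init(omega, n)]
    out = []
    for half in range(2 * n):                   # 2n cycles at 36 MHz = n cycles at 18 MHz
        which = half % 2
        total, odd, banks[which] = chien_step(banks[which])
        if which == 0:
            s_val, s_odd = total, odd           # sigma(alpha^-i), sigma_odd(alpha^-i)
        else:
            i = n - 1 - half // 2
            out.append((i, s_val, s_odd, total))    # total is omega(alpha^-i)
    return out

sigma = [1, 0x12, 0x34, 0x56, 0x78, 0x9A]       # arbitrary example coefficients
omega = [0xAB, 0xCD, 0xEF, 0x01, 0x23]
rows = chien_shared(sigma, omega, 149)
print(rows[0][0], rows[-1][0])
```

One (i, σ, σ_odd, ω) tuple is produced per 18 MHz cycle, so the output leaves the shared block at the base clock rate.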
The error correction and verification is initially performed for (n + 3) clock cycles. As a result, the overall decoding latency is given by

$$\text{Decoding latency} = n + \frac{2t + \left(\left\lceil \frac{2t+1}{2} \right\rceil + 1\right) \cdot 2t + (2t + 1)}{2} + n + 3$$

where (⌈(2t + 1)/2⌉ + 1) · 2t clock cycles are needed to complete the modified Euclid's algorithm, and (2t + 1) clock cycles are needed to compute the initial values used in the Chien search block. The division by 2 reflects the double-speed clock. The terms in the fraction are counted in 36 MHz cycles, so their latency in base-clock cycles is halved.
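The latency formula can be evaluated directly for the three codes, taking 2t = n - k as defined earlier. The sketch below is a hypothetical helper, `decoding_latency`, that returns the latency in base (18 MHz) clock cycles. It uses exact fractions because the bracketed sum is odd for two of the codes.

```python
from fractions import Fraction


def decoding_latency(n, k, double_clock=True):
    """Decoding latency in 18 MHz clock cycles from the formula above, with 2t = n - k."""
    two_t = n - k
    euclid = ((two_t + 2) // 2 + 1) * two_t     # (ceil((2t + 1) / 2) + 1) * 2t
    chien_init = two_t + 1                      # initial values for the Chien search
    internal = Fraction(two_t + euclid + chien_init)
    if double_clock:
        internal /= 2                           # these terms are counted at 36 MHz
    return n + internal + n + 3


for code in [(149, 138), (85, 77), (14, 9)]:
    print(code, decoding_latency(*code), decoding_latency(*code, double_clock=False))
```

For the (149, 138) code, the bracketed terms total 100 cycles, and the double-speed clock brings them down to 50. For (85, 77) and (14, 9), the bracketed sum is odd, so the formula yields half cycles. The text does not say how the hardware rounds these values.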
Note: '-' in [7] means unknown, i.e., the results have not been published.
V. Conclusions
In this paper, we have proposed a new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder for digital VCRs. The proposed architecture has three major features. First, area efficiency has been significantly improved by sharing a functional block. Second, a new architecture implementing the modified Euclid's algorithm has been developed to reduce the circuit size and latency. Third, by doubling the internal clock speed, the decoding latency, and hence the memory size, have been reduced. The correctness of the proposed decoder/encoder has been verified by extensive simulation.
References
[1] S. B. Wicker and V. K. Bhargava, "Reed-Solomon Codes and Their Applications," New York: IEEE Press, 1994.
[2] S. B. Wicker, "Error Control Systems for Digital Communication and Storage," Prentice Hall, 1995.
[3] Y. Jeong and W. Burleson, "High-Level Estimation of High-Performance Architectures for Reed-Solomon Decoding," ISCAS, vol. 1, pp. 720-723, May 1995.
[4] J. M. Hsu and C. L. Wang, "An Area-Efficient VLSI Architecture for Decoding of Reed-Solomon Codes," ICASSP, vol. 6, pp. 3291-3294, May 1996.
[5] H. M. Shao and I. S. Reed, "On the VLSI Design of a Pipeline Reed-Solomon Decoder Using Systolic Arrays," IEEE Trans. on Computers, vol. 37, no. 10, pp. 1273-1280, Oct. 1988.
[6] … of Pasadena, L. J. Deutsch, Sepulveda, all of Calif., "Architecture for Time or Transform Domain Decoding of Reed-Solomon Codes," U.S. Patent 4,868,828, Sep. 19, 1989.
[7] G. Y. Lee, B. H. Kwuan, S. W. Lee, J. W. Jung, S. H. Nam, Y. S. Chun, D. I. Han, K. S. Park, Y. D. Choi, D. I. Cho, and J. K. Lee, "A VLSI Design of RS CODEC for Digital Data Recorder," ASIC Design Workshop, pp. 115-124, 1996.
[8] T. Iwaki, T. Tanaka, E. Yamada, T. Okuda, and T. Sasada, "Architecture of A High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, vol. 40, no. 1, pp. 75-81, Feb. 1994.
[9] C. C. Hsu, I. S. Reed, and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-way Digital Communications and Storage Systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 1, pp. 91-92, Feb. 1994.
[10] S. H. Kim and S. W. Kim, "An Error-Control Coding Scheme for Multispeed Play of Digital VCR," IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, no. 3, pp. 243-247, June 1995.
[11] "Specifications of Digital VCR for Consumer-Use," SD: Standard Definition, NTSC, PAL, SECAM, HD-Digital VCR Conference, DRAFT IEC document, Confidential, June 1994.
Biographies
Sunghoon Kwon received the B.S. degree and the M.S. degree in electronics engineering from Han-Yang University, in 1991 and 1993, respectively. He is working toward the Ph.D. degree
in electronics engineering at Han-Yang University. His research interests include system design and VLSI synthesis.