Kwon and Shin: An Area-Efficient VLSI Architecture of a Reed-Solomon Decoder/Encoder for Digital VCRs

AN AREA-EFFICIENT VLSI ARCHITECTURE OF A REED-SOLOMON DECODER/ENCODER FOR DIGITAL VCRs
Sunghoon Kwon and Hyunchul Shin
Dept. of Electronics Engineering, Hanyang University, Korea

Abstract: A new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder has been developed for digital VCRs. The new architecture, targeted at reducing the circuit size and decoding latency, has the following three features. First, high area-efficiency has been achieved by sharing a functional block for encoding, modified syndrome computation, and erasure locator polynomial evaluation. Second, circuit size and decoding latency have been reduced by using a new architecture to implement the modified Euclid's algorithm. Third, by doubling the internal clock speed (from 18 MHz to 36 MHz), the decoding latency and hence the memory size can be reduced. The decoder/encoder designed by using the proposed method uses about 30% fewer gates than the one based on conventional architectures.

I. Introduction
Reed-Solomon (RS) coding is one of the most powerful and standardized techniques for error and erasure correction. Owing to its excellent capability for correcting burst errors, it has been widely used in digital communication systems and storage devices such as digital VCRs and disk drives [1]. In particular, digital VCRs employ RS product codes for error correction coding (ECC). RS codes over finite fields are maximum distance separable. The properties and theoretical analysis of RS codes are well known [2]. However, the design of a high-bit-rate RS decoder is not straightforward. The complexity of an RS decoder/encoder circuit depends on the data rate, the code length, and the error correction capability. Therefore, the algorithm and the architecture should be customized for each specific application to achieve high efficiency.

RS decoding techniques can be classified into two categories (time-domain and frequency-domain). Time-domain techniques seem to outperform frequency-domain techniques both in area and in delay [3]. Most of the published time-domain techniques use one of the following algorithms to evaluate the error/erasure locator and evaluator polynomials.
* Berlekamp-Massey algorithm: [4]
* (Modified) Euclid's algorithm: [5, 6, 7]
* Matrix calculation: [8]

A VLSI design of a pipeline RS decoder using systolic arrays was also presented in [5]. By the use of a multiplexing and recursive technique, the modified Euclid's algorithm was implemented with a significant reduction of cells. The RS decoder in [7] can decode three types of codes (inner/outer/subcode) flexibly. One of the features of this design is that the encoder and the syndrome computation block are shared to reduce hardware size. In [9], it is shown that any RS decoder which corrects both errors and erasures can also be used as an encoder for the RS code. However, power consumption can increase because all of the blocks of the decoder circuit are driven during encoding.

To satisfy the decoding/encoding requirements for digital VCRs, we have developed a new flexible and area-efficient RS product-code decoder/encoder architecture. Our decoder/encoder can decode and encode three types of codes for audio and video signals over GF(256). The three main features of the proposed decoder/encoder are
1. sharing a hardware functional block to evaluate three different functions,
2. developing a new architecture for the modified Euclid's algorithm, and
3. doubling the internal clock speed to reduce the latency.
The proposed RS decoder/encoder has been implemented by using about 30% fewer gates than the one based on conventional architectures.

In Section II, we describe the overview of the proposed RS decoder/encoder. The flexible decoder/encoder architecture is presented in Section III. The experimental results are described in Section IV. Finally, the conclusions are given in Section V.

Contributed Paper. Manuscript received June 9, 1997.
IEEE Transactions on Consumer Electronics, Vol. 43, No. 4, NOVEMBER 1997
0098-3063/97 $10.00 © 1997 IEEE

II. New Features of the Decoder/Encoder
The new flexible RS decoder circuit based on the modified Euclid's algorithm can correct errors and erasures for product codes (two-dimensional array codes). The decoding is performed in two steps, inner (row) and outer (column) decoding [10]. The inner code protects individual codewords, primarily against random errors. The outer RS code protects a complete video or audio sector, primarily against burst errors spanning a few codewords. The inner decoder performs error correction and error detection. If a decoder failure is declared by the inner decoder, then all the symbols in the corresponding row are erased. The outer decoder works essentially as an error and erasure corrector. If the outer decoder cannot correct the errors and erasures, a decoding failure occurs and error concealment is applied.

Our decoder/encoder is flexible in that it can decode the following three types of codes over GF(256) [11]:
* (85, 77) audio/video inner code
* (149, 138) video outer code
* (14, 9) audio outer code
The same code, RS (85, 77), is used for both the audio and video signals. The maximum numbers of errors which can be corrected are 4, 5, and 2, respectively, for the three codes. For the outer codes, the maximum numbers of erasures which can be corrected are 11 and 5, respectively.

Suppose $e$ errors and $f$ erasures occur. Then, in general, the following condition, determined by the maximum correction capability, should be satisfied in $(n, k)$ RS codes:
$$2e + f \le n - k \tag{1}$$
where $n$ is the code length, $k$ is the information length, $2t = n - k$, and $t$ is the error correction capability.

The general decoding algorithm is as follows.
1. Compute the syndrome polynomial, $S(x)$.
2. Compute the erasure locator polynomial, $\Lambda(x)$, and the modified syndrome polynomial, $T(x)$.
3. Perform the modified Euclid's algorithm.
4. Evaluate the error/erasure locator polynomial and the error/erasure evaluator polynomial (Chien search algorithm).
5. Correct and verify the errors.

The proposed RS decoder/encoder is shown in Fig. 1. The circuit is pipelined and consists of six functional blocks, which are the syndrome computation, encoder and polynomial expansion computation, modified Euclid's algorithm processing, Chien search, error correction and verification, and FIFO memory blocks. Furthermore, in the proposed architecture, the (85, 77) and (14, 9) codes are also encoded and decoded by using the same (149, 138) code processing block.

[Fig. 1: block diagram of the proposed RS decoder/encoder. Recoverable labels: "Interior Regions", "Modified Euclid's Algorithm".]
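The capability condition (1) can be checked numerically for the three codes. The following is a minimal sketch; the function names `capability` and `correctable` are illustrative and do not come from the paper.

```python
# Sketch of condition (1): 2e + f <= n - k for an (n, k) RS code.
# The code parameters are those listed in Section II; names are hypothetical.

CODES = {
    "audio/video inner": (85, 77),
    "video outer": (149, 138),
    "audio outer": (14, 9),
}


def capability(n, k):
    """Return (max errors with no erasures, max erasures with no errors)."""
    parity = n - k
    return parity // 2, parity


def correctable(n, k, errors, erasures):
    """True if the pattern satisfies 2e + f <= n - k."""
    return 2 * errors + erasures <= n - k


if __name__ == "__main__":
    for name, (n, k) in CODES.items():
        max_errors, max_erasures = capability(n, k)
        print(f"({n}, {k}) {name}: errors <= {max_errors}, erasures <= {max_erasures}")
```

For the (149, 138) outer code, for example, three errors together with five erasures use all eleven parity symbols, while one more erasure exceeds the bound.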

The proposed decoder/encoder architecture has at least three major features which improve the area-efficiency significantly when compared to conventional architectures.
1. Three different functions share a hardware functional block. The encoder and the polynomial expansions of $T(x)$ and $\Lambda(x)$ share a functional block by using MUXs as shown in Fig. 2. Each function is performed in a different time period. In other approaches, the erasure locations are frequently represented by one-bit flags. However, to share hardware, we use erasure addresses of 8 bits instead of erasure flags of 1 bit. By using these addresses, the modified syndrome polynomial, $T(x)$, and the erasure locator polynomial, $\Lambda(x)$, can be computed by using the same hardware by setting different initial values.
2. Decoder size is reduced by using a new circuit architecture in which the error/erasure locator and error/erasure evaluator polynomials are found. The new architecture implementing the modified Euclid's algorithm is shown in Fig. 3. In this architecture, the coefficients of each polynomial are grouped into two (upper and lower) parts and separately stored in the $R_i$, $Q_i$, $L_i$, $U_i$ registers. The two parts are calculated in parallel. Owing to the parallel processing, decoding latency is significantly reduced.
3. The clock speed is doubled in the internal decoder blocks to reduce latency and memory size. The data rate in our specification is 18 Mbytes/sec at the primary I/O and the base clock speed is 18 MHz. However, present VLSIs can easily accommodate a 36 MHz clock. Therefore, the three blocks in the interior regions marked by the dotted box in Fig. 1 may operate at 36 MHz. (Other blocks operate at 18 MHz.) This allows sharing of the Chien search block.
These novel techniques have the obvious benefits of reducing the decoder/encoder size and the decoding latency.

III. RS Decoder/Encoder Architectures
In this Section, we first explain the overall behavior of the proposed decoder/encoder and then describe the proposed architectures and their timing diagrams in detail.

3.1 Overall Behavior of the Decoder/Encoder
The proposed decoder/encoder consists of six blocks similar to the general decoding architecture. In Fig. 1, decoding is performed in two steps for audio and video signals. For audio signals, (85, 77) inner decoding is followed by (14, 9) outer decoding. Similarly, for video signals, (85, 77) inner and (149, 138) outer decodings are performed, respectively. In inner or outer decoding, errors can be detected and corrected. However, erasures are detected during the inner decoding and corrected during the outer decoding. For faster (simpler) erasure information transfer during this process, we use erasure addresses of 8 bits instead of erasure flags. In our design, by using the erasure addresses of 8 bits, $\Lambda(x)$ can be calculated in advance and passed to the next "modified Euclid's algorithm" block.

The behavior of each block of the decoder/encoder in Fig. 1 can be summarized as follows.
1. The "Syndrome Computation" block calculates the syndrome polynomial, $S(x)$. Let $\{r_{n-1}, \ldots, r_1, r_0\}$ be the received symbols of 8 bits and let
$$r(x) = r_{n-1}x^{n-1} + \cdots + r_1 x + r_0 \tag{2}$$
The coefficients of polynomial $S(x)$ are calculated as
$$s_i = r(\alpha^i)$$
2. The "Encoder and Polynomial Expansion" block calculates the erasure locator polynomial, $\Lambda(x)$, and the modified syndrome polynomial, $T(x)$. In Fig. 1, encoding is also performed in the "Encoder and Polynomial Expansion" block. Encoding for $(n, k)$ RS codes is usually performed by $(n-k)$-stage linear feedback shift registers [1]. The generator polynomial of an $(n, k)$ code is represented by the following equation over GF(256):
$$g(x) = \prod_{i=0}^{n-k-1}(x - \alpha^i) = (x - \alpha^0)(x - \alpha^1)\cdots(x - \alpha^{n-k-1}) \tag{3}$$
where $\alpha$ is a primitive element in GF(256). The modified syndrome polynomial, $T(x)$, is calculated by using the following equation.
$$T(x) = S(x)\,\Lambda(x) \bmod x^{2t} \tag{5}$$
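The generator polynomial of eq. (3), LFSR-style systematic encoding, and the syndrome evaluation $s_i = r(\alpha^i)$ can be sketched in software as follows. This is only an illustration under stated assumptions: the field polynomial `0x11D` ($x^8+x^4+x^3+x^2+1$) is assumed because the text does not state which primitive polynomial is used, and the helper names are hypothetical.

```python
# Sketch: GF(256) arithmetic, g(x) of eq. (3), LFSR-like encoding, syndromes.
PRIM = 0x11D  # ASSUMPTION: field polynomial, not given in the text

EXP = [0] * 510  # EXP[i] = alpha^i, table doubled to avoid a modulo in gf_mul
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(256) elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(nparity):
    """g(x) = prod (x - alpha^i), i = 0..nparity-1, lowest degree first."""
    g = [1]
    for i in range(nparity):
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= gf_mul(c, EXP[i])  # c * alpha^i (minus equals plus here)
            nxt[j + 1] ^= c              # c * x
        g = nxt
    return g


def encode(msg, nparity):
    """Systematic encoding with an nparity-stage shift register.
    msg[0] is the highest-degree symbol; returns msg followed by parities."""
    g = generator_poly(nparity)
    reg = [0] * nparity
    for m in msg:
        fb = m ^ reg[-1]  # feedback symbol
        for j in range(nparity - 1, 0, -1):
            reg[j] = reg[j - 1] ^ gf_mul(fb, g[j])
        reg[0] = gf_mul(fb, g[0])
    return list(msg) + reg[::-1]


def syndromes(received, nparity):
    """s_i = r(alpha^i) by Horner's rule; received[0] is the highest degree."""
    out = []
    for i in range(nparity):
        acc = 0
        for sym in received:
            acc = gf_mul(acc, EXP[i]) ^ sym
        out.append(acc)
    return out
```

A valid codeword gives all-zero syndromes, and any single corrupted symbol makes every $s_i$ nonzero, which is the property the "Syndrome Computation" block relies on.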

3. The "Modified Euclid's Algorithm" block computes the error/erasure locator polynomial, $\sigma(x)$, and the error/erasure evaluator polynomial, $\omega(x)$, by using $T(x)$ and $\Lambda(x)$ as initial conditions. A new architecture for this block implementation has been developed and will be described later in detail.
4. The "Chien Search" block evaluates $\sigma(\alpha^{-i})$, $\sigma_{odd}(\alpha^{-i})$ and $\omega(\alpha^{-i})$ for $i = 0, 1, \ldots, n-1$. The roots of these polynomials are used to compute error/erasure locations and values.
5. The "Error Correction and Verification" block computes the error/erasure value, $e_i$, from
$$e_i = \frac{\omega(\alpha^{-i})}{\sigma_{odd}(\alpha^{-i})}, \quad \text{for } i = 0, 1, \ldots, n-1 \tag{6}$$
where $\sigma_{odd}(\alpha^{-i})$ is the sum of the odd-degree terms of $\sigma(x)$ evaluated at $\alpha^{-i}$. Then, a corrected symbol is obtained by adding the error $e_i$ to the received erroneous symbol $r_i$, i.e.,
$$\hat{r}_i = r_i + e_i, \quad \text{for } i = 0, 1, \ldots, n-1 \tag{7}$$
Here, the received data stored in the "FIFO memory" block is used. The $\hat{r}(x)$ is then used to verify the correctness of the corrected values, i.e., to check whether $S(x) = 0$ or not. If $S(x) = 0$, $\hat{r}(x)$ is accepted. Otherwise, the correction in the inner decoding is not successful and thus the row address is saved for the outer decoding.
6. The "FIFO memory" block is used to delay the received data by $2n + \delta$, where $2n$ is the delay required for the syndrome computation, Chien search, and verification and $\delta$ is the processing time delay of the other (the "Encoder and Polynomial Expansion" and the "modified Euclid's algorithm") blocks.

3.2 Hardware Sharing for Encoding and Polynomial Expansion
Fig. 2 shows the detailed architecture of the "Encoder and Polynomial Expansion" block. Since encoding and the polynomial expansions of $\Lambda(x)$ and $T(x)$ can be computed by using the architecture shown in Fig. 2, the hardware is time-shared for the three computations. Switch S is used to control signal transfer. Switch S is ON (OFF) if the erasure value is non-zero (0).

[Fig. 2. Encoder and polynomial expansion block. Legend: ⊗ is an 8-bit multiplier over GF(256); ⊕ is an 8-bit adder over GF(256). Recoverable labels: erasure input (8 bits), encoding output. Note in figure: coefficients of $S(x)$ are stored in registers to be used for the calculation of the modified syndrome.]

Now we explain how encoding of a (149, 138) code is performed. Initially, all the switches are ON and the $R_0 \sim R_{10}$ registers are reset to 0. Constants $g_0 \sim g_{10}$ (the coefficients of the generator polynomial in eq. (3)) are selected by $MUX_0 \sim MUX_{10}$. For the first 138 clock cycle times, the encoding input is selected by $MUX_{11}$ and $MUX_{12}$. As a result, systematic encoding is performed. For the remaining 11 clock times, the resultant parities computed in the registers are selected by $MUX_{11}$.

For erasure decoding, $\Lambda(x)$ is computed as follows. Assume that $A$ is the set of $\alpha^{-i}$'s, where $\alpha^{-i} \in A$ implies that the location of $r_i$ is an erasure. Then $\Lambda(x)$ can be written as
$$\Lambda(x) = \prod_{\alpha^{-i} \in A} (x - \alpha^{-i}) \tag{8}$$
During inner decoding, since there are no erasures, $\Lambda(x) = 1$. In the case of a (149, 138) outer code, the maximum degree of $\Lambda(x)$ is $2t = 11$. $R_{11}$ is only used to store the coefficient of the maximum degree 11.
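The idea that one circuit expands both $\Lambda(x)$ of eq. (8) and $T(x)$ of eq. (5), differing only in the initial register values, can be illustrated in software. The sketch below is not a model of Fig. 2: it assumes one erasure address is absorbed per step, assumes the field polynomial `0x11D`, and uses hypothetical names (`expand`, `poly_eval`, `poly_mul`).

```python
# Sketch: shared expansion of Lambda(x) and T(x) from erasure addresses.
PRIM = 0x11D  # ASSUMPTION: field polynomial, not given in the text

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def expand(init, erasure_positions):
    """Multiply the register contents by (x - alpha^-i) for each erased
    position i. Registers hold coefficients, lowest degree first; terms
    shifted past the last register are dropped (reduction mod x^len)."""
    regs = list(init)
    for pos in erasure_positions:
        root = EXP[(255 - pos) % 255]  # alpha^-pos
        for j in range(len(regs) - 1, 0, -1):
            regs[j] = regs[j - 1] ^ gf_mul(root, regs[j])
        regs[0] = gf_mul(root, regs[0])
    return regs


def poly_eval(p, x):
    """Evaluate a lowest-degree-first polynomial at x (Horner's rule)."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def poly_mul(p, q):
    """Reference product, used only to cross-check expand()."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


if __name__ == "__main__":
    erased = [2, 9, 40]                       # illustrative erasure addresses
    lam = expand([1] + [0] * 11, erased)      # start from the constant 1
    s = list(range(1, 12))                    # stand-in syndrome coefficients
    t = expand(s, erased)                     # same routine, different start
    print(lam, t)
```

Starting from the constant polynomial 1 yields $\Lambda(x)$; starting from the eleven syndrome coefficients and letting overflow fall off the top register yields $S(x)\Lambda(x) \bmod x^{2t}$.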

Initially, $R_0 = 1$ and all the $R_1 \sim R_{11}$ values are 0. If $\alpha^{-i} \notin A$, switch S is off and thus signals cannot be passed to the registers. After the polynomial expansion, the $i$'th degree coefficient of $\Lambda(x)$ is stored in the $R_i$ register. Similarly, the modified syndrome polynomial $T(x)$ shown in equation (5) can be computed by using the circuit. $T(x)$ uses the coefficients of $S(x)$ instead of the initial register values, i.e., the $R_0 \sim R_{10}$ registers initially store the coefficients of $S(x)$.

3.3 An Efficient Architecture to Perform the Modified Euclid's Algorithm
To determine the error/erasure locator polynomial $\sigma(x)$ and the error/erasure evaluator polynomial $\omega(x)$, the modified Euclid's algorithm [5] is applied. The algorithm is summarized in the following procedure.
1. Initial value setting
2. Recursive computation and stop condition checking
The initial values of the Euclid's algorithm are
$$R_0(x) = x^{2t}, \quad Q_0(x) = T(x) \tag{9}$$
$$L_0(x) = 0, \quad U_0(x) = \Lambda(x) \tag{10}$$
Then, recursive computation is performed as follows until the stop condition, $\deg(L_i(x)) > \deg(R_i(x))$, is satisfied.
$$R_{i+1}(x) = [\sigma_i b_i R_i(x) + \bar{\sigma}_i a_i Q_i(x)] - x^{|l_i|}[\sigma_i a_i Q_i(x) + \bar{\sigma}_i b_i R_i(x)] \tag{11}$$
$$Q_{i+1}(x) = \sigma_i Q_i(x) + \bar{\sigma}_i R_i(x) \tag{12}$$
$$L_{i+1}(x) = [\sigma_i b_i L_i(x) + \bar{\sigma}_i a_i U_i(x)] - x^{|l_i|}[\sigma_i a_i U_i(x) + \bar{\sigma}_i b_i L_i(x)] \tag{13}$$
$$U_{i+1}(x) = \sigma_i U_i(x) + \bar{\sigma}_i L_i(x) \tag{14}$$
where $(a_i, b_i)$ are the leading coefficients of $R_i(x)$ and $Q_i(x)$, respectively, $\bar{\sigma}_i$ is the complement of $\sigma_i$, and
$$l_i = \deg(R_i(x)) - \deg(Q_i(x)) \tag{15}$$
$$\sigma_i = 1 \text{ if } l_i \ge 0, \text{ else } \sigma_i = 0 \tag{16}$$
If the stop condition is satisfied, then $\omega(x) = R_i(x)$ and $\sigma(x) = L_i(x)$. Owing to the polynomial computations in (11) ~ (16), polynomial swapping is needed if $l_i < 0$. Since multiplication and addition are performed for each coefficient of the polynomials, the coefficients can be computed in parallel.

Fig. 3 (a), (b), (c) and (d) show the proposed new architecture. Fig. 3 (a) and (b) show the recursive computation blocks of $\{R_i(x), Q_i(x)\}$ and $\{L_i(x), U_i(x)\}$, respectively. In Fig. 3 (c), $l_i$ and the stop condition are calculated. Fig. 3 (d) shows the arithmetic blocks used in (a) and (b). In this architecture, we compute the upper coefficients and the lower coefficients of the polynomials in parallel. As a result, the number of cells is reduced from 3 to 1 when compared to a straightforward implementation of [5]. In addition, the double-speed clock is used in this block.

We now describe briefly the hardware operation for the modified Euclid's algorithm. Initially, the coefficients of these polynomials are set in the $R_i$, $Q_i$, $L_i$ and $U_i$ registers, respectively. The "$Q_i(x)$ $R_i(x)$ register value transfer control" block is composed of MUXs. According to $l_i$, the "$Q_i(x)$ $R_i(x)$ register value transfer control" block shifts the coefficients or swaps the polynomials.

As an example, we describe decoding of a (149, 138) code by using the circuit in Fig. 3. Note that the maximum number of coefficients of the $R_i(x)$ polynomial is 12. In Fig. 3 (a), the coefficients are divided into the upper and lower coefficient groups, i.e., the upper degree coefficients of $R_i(x)$ are stored in the $R_6 \sim R_{11}$ registers in block 3 and the lower degree coefficients are stored in the $R_0 \sim R_5$ registers of block 1 and block 2. In the case of a (85, 77) code, the ($R_3 \sim R_{11}$) and (… $\sim R_{11}$) registers are used. The $R_{10}$ and $R_{11}$ registers are initially set to 0. Similarly, the 6 coefficients of the (14, 9) code are divided into two parts and stored in the ($R_0 \sim R_2$) and ($R_3 \sim R_5$) registers in blocks 1 and 2, respectively. In particular, the (14, 9) code is processed by using blocks 1 and 2. MUX1 and MUX3 select $R_2$ and $R_5$, respectively. At this time, MUX2 selects the output of the arithmetic unit (1) of block 3. Since $\lceil 6 \cdot 5 / 14 / 2 / 2 \rceil = 1$, the (14, 9) outer code requires only 1 cell. Then, arithmetic unit (3), marked by the dotted box, is used to compute the initial condition for the next "Chien Search" block. This computation is necessary for $n \ne 2^m - 1$ in GF($2^m$).
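The recursion (9) ~ (16) can be followed step by step in software. The sketch below works on whole polynomials with their true degrees, so it does not model the register blocks, the upper/lower coefficient split, or the cycle counts of Fig. 3. Two things are assumptions of this sketch rather than statements from the paper: the field polynomial `0x11D`, and the early return when $\deg T(x) < \deg \Lambda(x)$ (no errors beyond the erasures), which is added so that the loop is only entered when $Q_0(x)$ has a usable leading coefficient.

```python
# Sketch: modified Euclid's algorithm, eqs. (9)-(16), on coefficient lists
# stored lowest degree first. Subtraction equals addition (XOR) in GF(2^m).
PRIM = 0x11D  # ASSUMPTION: field polynomial, not given in the text

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def deg(p):
    """True degree of p; -1 for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i]:
            return i
    return -1


def scale(p, c):
    return [gf_mul(c, v) for v in p]


def shift(p, s):
    """Multiply p(x) by x^s."""
    return [0] * s + list(p)


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a ^ b for a, b in zip(p, q)]


def poly_eval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def modified_euclid(T, Lam, two_t):
    """Return (sigma, omega) = (L_i, R_i) at the stop condition."""
    if deg(T) < deg(Lam):            # guard added in this sketch
        return list(Lam), list(T)
    R = [0] * two_t + [1]            # R_0(x) = x^{2t}
    Q = list(T)                      # Q_0(x) = T(x)
    L = [0]                          # L_0(x) = 0
    U = list(Lam)                    # U_0(x) = Lambda(x)
    while deg(L) <= deg(R):          # stop when deg(L_i) > deg(R_i)
        a, b = R[deg(R)], Q[deg(Q)]  # leading coefficients a_i, b_i
        l = deg(R) - deg(Q)          # l_i of eq. (15)
        if l >= 0:                   # sigma_i = 1: no swap
            R, L = (add(scale(R, b), shift(scale(Q, a), l)),
                    add(scale(L, b), shift(scale(U, a), l)))
        else:                        # sigma_i = 0: swap, then reduce
            R, Q, L, U = (add(scale(Q, a), shift(scale(R, b), -l)), R,
                          add(scale(U, a), shift(scale(L, b), -l)), L)
    return L, R
```

Each pass cancels the leading term of the higher-degree polynomial by cross-multiplying with the two leading coefficients, so no field division is needed inside the loop; this is the property that makes the recursion attractive for a cell-based hardware implementation.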

The architecture is simple. The modified Euclid's algorithm is processed by using the proposed architecture in parallel. Owing to this parallel processing, the decoding latency can be significantly reduced and, as a consequence, the number of cells and the FIFO memory sizes are reduced.

[Fig. 3. Modified Euclid's algorithm processing block. (a) $R_i(x)$, $Q_i(x)$ computation block. (b) $L_i(x)$, $U_i(x)$ computation block. (c) Degree computation block. (d) Arithmetic units. Recoverable labels: $\deg(R_i(x))$, $\deg(Q_i(x))$; $(a_i, b_i)$ = leading coefficients of $R_i(x)$, $Q_i(x)$; arithmetic unit (3).]

3.4 Chien Search Implementation Block
In general, the roots of the error/erasure locator and error/erasure evaluator polynomials, $\sigma(\alpha^{-i})$ and $\omega(\alpha^{-i})$, are computed in two blocks of the same architecture. We have developed a new architecture capable of sharing the hardware blocks, as shown in Fig. 4. In this block, $\sigma(\alpha^{-i})$, $\sigma_{odd}(\alpha^{-i})$ and $\omega(\alpha^{-i})$ are computed. Hardware sharing is possible by using the double-speed clock of 36 MHz. The error/erasure locator and error/erasure evaluator polynomials are computed alternately. During even clocks, $\sigma(\alpha^{-i})$ is computed, while $\omega(\alpha^{-i})$ is computed during odd clocks.
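A software analogue of the Chien search combined with the error value formula of eq. (6) is sketched below. It keeps one running register per coefficient and multiplies register $j$ by $\alpha^{-j}$ at every step, so step $i$ yields the evaluations at $\alpha^{-i}$. The even/odd clock sharing of Fig. 4 is not modelled, the search is assumed to start at $i = 0$, the field polynomial `0x11D` is an assumption, and the function names are hypothetical.

```python
# Sketch: Chien search over i = 0..n-1 with e_i = omega / sigma_odd, eq. (6).
PRIM = 0x11D  # ASSUMPTION: field polynomial, not given in the text

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """a / b in GF(256); b must be nonzero."""
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def chien_search(sigma, omega, n):
    """Return {i: e_i} for every i in 0..n-1 with sigma(alpha^-i) == 0.
    Polynomials are coefficient lists, lowest degree first."""
    s_regs = list(sigma)
    w_regs = list(omega)
    found = {}
    for i in range(n):
        s_val = 0   # sigma(alpha^-i): XOR of all sigma registers
        s_odd = 0   # sigma_odd(alpha^-i): XOR of odd-degree registers only
        for j, v in enumerate(s_regs):
            s_val ^= v
            if j & 1:
                s_odd ^= v
        w_val = 0   # omega(alpha^-i)
        for v in w_regs:
            w_val ^= v
        if s_val == 0 and s_odd != 0:
            found[i] = gf_div(w_val, s_odd)
        # advance every register by one position: multiply by alpha^-j
        s_regs = [gf_mul(v, EXP[(255 - j) % 255]) for j, v in enumerate(s_regs)]
        w_regs = [gf_mul(v, EXP[(255 - j) % 255]) for j, v in enumerate(w_regs)]
    return found
```

The per-position XOR over all registers corresponds to the XOR-tree summation indicated in Fig. 4, and the odd-degree partial sum is obtained from the same registers at no extra multiplication cost.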

The output of the Chien search block is adjusted to the original clock speed, which is 18 MHz.

[Fig. 4. Chien search implementation block. Recoverable labels: $\sigma_{odd}(\alpha^{-i})$; summation by XOR tree.]

3.5 RS Decoder/Encoder Timing Diagram
Fig. 5 shows the timing diagram for an $(n, k)$ RS decoder/encoder. $(\lceil (2t+1)/2 \rceil + 1) \cdot 2t$ clock cycle times are needed to complete the modified Euclid's algorithm. It takes $(2t+1)$ clock cycle times to compute the initial values used in the Chien search block. As a result, by using the double-speed clock, the latency of the Euclid's algorithm block is $\{(\lceil (2t+1)/2 \rceil + 1) \cdot 2t + (2t+1)\}/2$, i.e., the decoding latency of this block is reduced by half. The error correction and verification is initially performed for $(n+3)$ clock cycle times. In particular, the overall decoding latency is given by
$$\text{Decoding latency} = n + \left\{2t + \left(\left(\lceil (2t+1)/2 \rceil + 1\right) \cdot 2t + (2t+1)\right)\right\}/2 + n + 3.$$

IV. Experimental Results
We have implemented the proposed RS decoder/encoder using VHDL. During the hardware synthesis, circuit delays have been considered by restricting the longest path delays of the decoder/encoder. We have verified the correctness of the designed decoder/encoder by using extensive simulation with randomly generated data and errors.

Table 1 shows the implementation results based on several design methods: [5], [7], an industry design, and our proposed method. The number of total gates is from the synthesized results excluding the FIFO memories. A gate corresponds to a 2-input NAND gate. Note that in [7], the design specification (code length, error correction capability, etc.) is different from the others and therefore, direct comparison is not possible. In the industry result, the Berlekamp-Massey algorithm has been used. In the others, the modified Euclid's algorithm has been used. The new proposed architecture has been implemented by using about 35,000 gates, which is about 30% smaller than the other design results. The FIFO memory sizes have been reduced since the latency has been reduced.
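The latency expression of Section 3.5 can be evaluated for the three codes as a quick arithmetic check. The sketch below follows the formula as reconstructed from the text, with $2t = n - k$ and results in base-clock (18 MHz) cycles; treating the leading $2t$ inside the braces as the polynomial expansion time is an assumption, and the function names are illustrative.

```python
# Sketch: evaluating the decoding latency formula of Section 3.5.
import math


def euclid_cycles(two_t):
    """Double-speed cycles for the modified Euclid's algorithm:
    (ceil((2t + 1) / 2) + 1) * 2t."""
    return (math.ceil((two_t + 1) / 2) + 1) * two_t


def decoding_latency(n, k):
    """n + {2t + (euclid_cycles + (2t + 1))} / 2 + (n + 3), base-clock cycles."""
    two_t = n - k
    # ASSUMPTION: the leading 2t is the polynomial expansion time.
    internal = two_t + euclid_cycles(two_t) + (two_t + 1)
    return n + internal / 2 + (n + 3)


if __name__ == "__main__":
    for n, k in [(85, 77), (149, 138), (14, 9)]:
        print((n, k), decoding_latency(n, k))
```

Only the bracketed internal term is halved by the 36 MHz clock; the two terms proportional to $n$ run at the base clock, which is why the FIFO delay is dominated by $2n$.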

[Table 1. Implementation results; the table body is not recoverable. Recoverable entries and notes: "Modified Euclid's algorithm*"; (Notes): '-' in [7] means unknown; the results have not been published.]

V. Conclusions
In this paper, we have proposed a new flexible and area-efficient VLSI architecture of a Reed-Solomon product code decoder/encoder for digital VCRs. The proposed architecture has three major features. First, area-efficiency has been significantly improved by sharing a functional block. Second, a new architecture implementing the modified Euclid's algorithm has been developed to reduce the circuit size and latency. Third, by doubling the internal clock speed, the decoding latency and hence the memory size have been reduced. The correctness of the proposed decoder/encoder has been verified by extensive simulation.

References
[1] S. B. Wicker and V. K. Bhargava, "Reed-Solomon Codes and Their Applications", IEEE Press, New York, 1994.
[2] S. B. Wicker, "Error Control Systems for Digital Communication and Storage", Prentice Hall, 1995.
[3] Y. Jeong and W. Burleson, "High-Level Estimation of High-Performance Architectures for Reed-Solomon Decoding," ISCAS, pp. 720-723, 1995.
[4] …, "Architecture for Time or Transform Domain Decoding of Reed-Solomon Codes," U.S. Patent 4,868,828, Sep. 19, 1989.
[5] H. M. Shao and I. S. Reed, "On the VLSI Design of a Pipeline Reed-Solomon Decoder Using Systolic Arrays," IEEE Trans. on Computers, vol. 37, no. 10, pp. 1273-1280, Oct. 1988.
[6] H. …, "An Area-Efficient VLSI Architecture for Decoding of Reed-Solomon Codes," ICASSP, pp. 3291-3294, May 1996.
[7] G. …, "A VLSI Design of RS CODEC for Digital Data Recorder," ASIC Design Workshop, …
[8] T. Iwaki, T. Tanaka, E. Yamada, T. Okuda, and T. Sasada, "Architecture of A High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, vol. 40, no. 1, pp. 75-81, Feb. 1994.
[9] C. C. Hsu, I. S. Reed, and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-way Digital Communications and Storage Systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 1, pp. 91-92, Feb. 1994.
[10] S. …, "An Error-Control Coding Scheme for Multispeed Play of Digital VCR", IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, …, pp. 243-247, June 1995.
[11] "Specifications of Digital VCR for Consumer-Use", NTSC, PAL, SECAM, Standard Definition (SD), Confidential, DRAFT IEC document, HD-Digital VCR Conference, 1994.

Biographies
Sunghoon Kwon received the B.S. degree and the M.S. degree in electronics engineering from Han-Yang University, in 1991 and 1993, respectively. He is working toward the Ph.D. degree in electronics engineering at Han-Yang University, Korea. His research interests include system design and synthesis of VLSIs.

Hyunchul Shin (S'78-M'80-SM'96) received the B.S. degree in electronics engineering from Seoul National University, the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology in 1978 and 1980, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1987. From 1980 to 1983, he was with the Department of Electronics Engineering, Kum-Oh Institute of Technology, Korea. In 1983, he received a Fulbright Scholarship. From 1987 to 1989, he was a Member of the Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ. Since 1989, he has been with the Department of Electronics Engineering, Han-Yang University, Korea. His research interests include design and synthesis of integrated circuits and systems.
