Kwon and Shin: An Area-Efficient VLSI Architecture of a Reed-Solomon Decoder/Encoder for Digital VCRs

1019

AN AREA-EFFICIENT VLSI ARCHITECTURE OF A REED-SOLOMON DECODER/ENCODER FOR DIGITAL VCRs
Sunghoon Kwon and Hyunchul Shin
Dept. of Electronics Engineering, Hanyang University, Korea

Abstract : A new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder has been developed for digital VCRs. The new architecture, targeted at reducing the circuit size and decoding latency, has the following three features. First, high area-efficiency has been achieved by sharing a functional block for encoding, modified syndrome computation, and erasure locator polynomial evaluation. Second, circuit size and decoding latency have been reduced by using a new architecture to implement the modified Euclid's algorithm. Third, by doubling the internal clock speed (from 18 MHz to 36 MHz), the decoding latency and hence the memory size can be reduced. The decoder/encoder designed by using the proposed method uses about 30% fewer gates than the one based on the conventional architectures.

I. Introduction
Reed-Solomon (RS) coding is one of the most powerful and standardized techniques for error and erasure correction. Owing to its excellent capability for correcting burst errors, it has been widely used for digital communication systems and storage devices such as digital VCRs and disk drives [1]. In particular, for error correction coding (ECC), digital VCRs employ RS product codes. RS codes over finite fields are maximum distance separable. The properties and theoretical analysis of RS codes are well known [2]. However, the design of a high-bit-rate RS decoder is not straightforward. The complexity of an RS decoder/encoder circuit depends on the data rate, the code length, and the error correction capability. Therefore, the algorithm and the architecture should be customized for each specific application to achieve high efficiency. RS decoding techniques can be classified into two categories (time-domain and frequency-domain). Time-domain techniques seem to outperform
Contributed Paper. Manuscript received June 9, 1997.

frequency-domain techniques both in area and in delay [3]. Most of the published time-domain techniques use one of the following algorithms to evaluate error/erasure locator and evaluator polynomials.
Berlekamp-Massey algorithm: [4]
(Modified) Euclid's algorithm: [5, 6, 7]
Matrix calculation: [8]
A VLSI design of a pipeline RS decoder using systolic arrays was also presented in [5]. By the use of a multiplexing and recursive technique, the modified Euclid's algorithm was implemented with a significant reduction of cells. The RS decoder in [7] can flexibly decode three types of codes (inner/outer/subcode). One of the features of this design is to share the encoder and the syndrome computation block to reduce hardware size. In [9], it is shown that any RS decoder which corrects both errors and erasures can also be used as an encoder for the RS code. However, power consumption can increase because all of the blocks of the decoder circuit are driven during encoding.
To satisfy the decoding/encoding requirements for digital VCRs, we have developed a new flexible and area-efficient RS product-code decoder/encoder architecture. Our decoder/encoder can decode and encode three types of audio and video signals over GF(256). The three main features of the proposed decoder/encoder are
1. sharing a hardware functional block to evaluate three different functions,
2. developing a new architecture for the modified Euclid's algorithm, and
3. doubling the internal clock speed to reduce the latency.
The proposed RS decoder/encoder has been implemented by using about 30% fewer gates than the one based on conventional architectures. In Section II, we describe the overview of the proposed RS decoder/encoder. The flexible

0098-3063/97 $10.00 © 1997 IEEE

1020 IEEE Transactions on Consumer Electronics, Vol. 43, No. 4, NOVEMBER 1997

decoder/encoder architecture is presented in Section III. The experimental results are described in Section IV. Finally, the conclusions are given in Section V.

II. New Features of the Decoder/Encoder
The new flexible RS decoder circuit based on the modified Euclid's algorithm can correct errors and erasures for product codes (two-dimensional array codes). The outer RS code protects a complete video or audio sector, primarily against burst errors spanning a few codes. The inner code protects an individual code, primarily against random errors. The decoding is performed in two steps, inner (row) and outer (column) decoding [10]. The inner decoder performs error correction and error detection. If a decoder failure is declared by the inner decoder, then all the symbols in the corresponding row are erased. The outer decoder works essentially as an error and erasure corrector. If the outer decoder cannot correct the errors and erasures, a decoding failure occurs and an error concealment is applied.
Our decoder/encoder is flexible in that it can decode the following three types of codes over GF(256) [11]:
* (85, 77) audio/video inner code
* (149, 138) video outer code
* (14, 9) audio outer code
The same code, (85, 77), is used for both the audio and video signals. In the proposed architecture, (14, 9) codes are also encoded and decoded by using the same (149, 138) code processing block.
Suppose e errors and f erasures occur. Then, in general, the following condition, which is set by the maximum correction capability, should be satisfied for (n, k) RS codes:
$$2e + f \le n - k \qquad (1)$$
where n is the code length and k is the information length. Here $2t = n - k$, and t is the error correction capability. The maximum numbers of errors which can be corrected are 4, 5, and 2, respectively, for the three codes. Furthermore, for outer codes, the maximum numbers of erasures which can be corrected are 11 and 5, respectively, for the (149, 138) and (14, 9) codes.
The general decoding algorithm is as follows.
1. Compute a syndrome polynomial, $S(x)$.
2. Compute the erasure locator polynomial, $\Lambda(x)$, and the modified syndrome polynomial, $T(x)$.
3. Perform a modified Euclid's algorithm.
4. Evaluate the error/erasure locator polynomial and the error/erasure evaluator polynomial (Chien search algorithm).
5. Correct and verify the errors.
The proposed RS decoder/encoder is shown in Fig. 1. The circuit is pipelined and consists of six functional blocks, which are syndrome computation, encoder and polynomial expansion computation, modified Euclid's algorithm processing, Chien search, error correction and verification, and FIFO memory blocks.
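The correction limits quoted above follow directly from condition (1). The short Python sketch below recomputes them for the three codes; the function names `capability` and `correctable` are illustrative only and do not come from the paper.

```python
def capability(n, k):
    """Return (max errors with no erasures, max erasures with no errors)
    for an (n, k) RS code, using 2e + f <= n - k."""
    redundancy = n - k
    return redundancy // 2, redundancy


def correctable(n, k, errors, erasures):
    """True if the pattern of e errors and f erasures satisfies condition (1)."""
    return 2 * errors + erasures <= n - k


# The three codes named in the text.
CODES = {
    "audio/video inner": (85, 77),
    "video outer": (149, 138),
    "audio outer": (14, 9),
}

for name, (n, k) in CODES.items():
    t, f_max = capability(n, k)
    print(f"({n}, {k}) {name}: up to {t} errors, or up to {f_max} erasures")
```

For example, under condition (1) the (149, 138) outer code can handle 3 errors together with 5 erasures (2*3 + 5 = 11), but not 4 errors together with 4 erasures.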

The proposed decoder/encoder architecture has at least three major features which improve the area-efficiency significantly when compared to conventional architectures.
1. Three different functions share a hardware functional block. Encoding, the modified syndrome polynomial, $T(x)$, and the erasure locator polynomial, $\Lambda(x)$, can be computed by using the same hardware by setting different initial values. Each function is performed in different time periods. The encoder and polynomial expansions of $T(x)$ and $\Lambda(x)$ share a functional block by using MUXs as shown in Fig. 2.
2. Decoder size is reduced by using a new circuit architecture in which error/erasure locator and error/erasure evaluator polynomials are found. The new architecture implementing the modified Euclid's algorithm is shown in Fig. 3. In this architecture, the coefficients of each polynomial are grouped into two (upper and lower) parts and separately stored in R, Q, L, and U registers. The two parts are calculated in parallel. Owing to the parallel processing, decoding latency is significantly reduced.
3. The clock speed is doubled in internal decoder blocks to reduce latency and memory size. The data rate in our specification is 18 Mbytes/sec at the primary I/O and the base clock speed is 18 MHz. However, present VLSIs can easily accommodate a 36 MHz clock. Therefore, the three blocks in the interior regions marked by the dotted box in Fig. 1 may operate at 36 MHz (other blocks operate at 18 MHz). This allows sharing of the Chien search block.
These novel techniques have the obvious benefits of reducing the decoder/encoder size and the decoding latency.

III. RS Decoder/Encoder Architectures
In this Section, we first explain the overall behavior of the proposed decoder/encoder and then describe the proposed architectures and their timing diagrams in detail.

3.1 Overall Behavior of the Decoder/Encoder
Encoding for (n, k) RS codes is usually performed by (n - k)-stage linear feedback shift registers [1]. However, to share hardware, encoding is performed in the "Encoder and Polynomial Expansion" block. The proposed decoder/encoder consists of six blocks similar to the general decoding architecture, shown in Fig. 1.
Decoding is performed in two steps for audio and video signals. For audio signals, (85, 77) inner decoding is followed by (14, 9) outer decoding. Similarly, for video signals, (85, 77) inner and (149, 138) outer decodings are performed. In inner or outer decoding, errors can be detected and corrected by using the error/erasure locator and error/erasure evaluator polynomials. However, erasures are detected during the inner decoding and corrected during the outer decoding. For faster (simpler) erasure information transfer during this process, we use erasure addresses of 8 bits instead of erasure flags of 1 bit.
The behavior of each block of the decoder/encoder in Fig. 1 can be summarized as follows.
1. The "Syndrome Computation" block calculates the syndrome polynomial, $S(x)$. Let $\{r_{n-1}, \ldots, r_1, r_0\}$ be the received symbols of 8 bits and let
$$r(x) = r_{n-1}x^{n-1} + \cdots + r_1 x + r_0 \qquad (2)$$
The generator polynomial of an (n, k) code is represented by the following equation over GF(256).
$$g(x) = (x - \alpha^0)(x - \alpha^1)\cdots(x - \alpha^{n-k-1}) = \prod_{i=0}^{n-k-1}(x - \alpha^i) \qquad (3)$$
where $\alpha$ is a primitive element in GF(256). The coefficients of polynomial $S(x)$ are calculated as
$$s_i = r(\alpha^i) \qquad (4)$$
2. The "Encoder and Polynomial Expansion" block calculates the erasure locator polynomial, $\Lambda(x)$, and the modified syndrome polynomial, $T(x)$. In other approaches, the erasure locations are frequently represented by one-bit flags. In our design, we use erasure addresses of 8 bits instead of erasure flags. By using these addresses, $\Lambda(x)$ can be calculated in advance and passed to the next "modified Euclid's algorithm" block. The modified syndrome polynomial, $T(x)$, is calculated by using the following equation.
$$T(x) = S(x)\,\Lambda(x) \bmod x^{2t} \qquad (5)$$

3. The "Modified Euclid's Algorithm" block computes the error/erasure locator polynomial, $\sigma(x)$, and the error/erasure evaluator polynomial, $\omega(x)$, by using $T(x)$ and $\Lambda(x)$ as initial conditions. The roots of these polynomials are used to compute error/erasure locations and values. A new architecture for this block implementation has been developed and will be described later in detail.
4. The "Chien Search" block evaluates $\sigma(\alpha^{-i})$, $\sigma_{odd}(\alpha^{-i})$ and $\omega(\alpha^{-i})$ for i = 0, 1, ..., n-1.
5. The "Error Correction and Verification" block computes the error/erasure value, $e_i$, from
$$e_i = \frac{\omega(\alpha^{-i})}{\sigma_{odd}(\alpha^{-i})}, \quad i = 0, 1, \ldots, n-1 \qquad (6)$$
where $\sigma_{odd}(\alpha^{-i})$ is the sum of the odd terms of $\sigma(x)$. Then, a corrected symbol is obtained by adding the error $e_i$ to the received erroneous symbol $r_i$, i.e.,
$$\hat{c}_i = r_i + e_i, \quad i = 0, 1, \ldots, n-1 \qquad (7)$$
The $\hat{c}$ is then used to verify the correctness of the corrected values, i.e., to check whether $S(x) = 0$ or not. If $S(x) = 0$, $\hat{c}$ is accepted. Otherwise, the correction in the inner decoding is not successful and thus the row address is saved for the outer decoding.
6. The "FIFO memory" block is used to delay the received data by $2n + \delta$, where 2n is the delay required for the syndrome computation, Chien search, and verification and $\delta$ is the processing time delay of the other (the "Encoder and Polynomial Expansion" and the "modified Euclid's algorithm") blocks. For erasure decoding, the received data stored in the "FIFO memory" block is used.

3.2 Hardware Sharing for Encoding and Polynomial Expansion
Fig. 2 shows the detailed architecture of the "Encoder and Polynomial Expansion" block. Since encoding and polynomial expansions of $\Lambda(x)$ and $T(x)$ can be computed by using the architecture shown in Fig. 2, the hardware is time-shared for the three computations.

Fig. 2. Encoder and polynomial expansion block. (⊗: 8-bit multiplier over GF(256); ⊕: 8-bit adder over GF(256). Coefficients of $S(x)$ are stored in registers to be used for the calculation of the modified syndrome. Switch S is used to control signal transfer; it is ON (OFF) if the erasure value is non-zero (0).)

Now we explain how encoding of a (149, 138) code is performed. Initially, all the switches are ON and the R0 to R10 registers are reset to 0. Constants g0, ..., g10 (the coefficients of the generator polynomial in eq. (3)) are selected by MUX0 to MUX10. For the first 138 clock cycle times, the encoding input is selected by MUX11 and MUX12. For the remaining 11 clock times, the resultant parities computed in the registers are selected by MUX11, i.e., systematic encoding is performed.
Assume that $\Lambda$ is the set of $\alpha^{-i}$'s where $\alpha^{-i} \in \Lambda$ implies that the location of $r_i$ is an erasure. Then $\Lambda(x)$ can be written as
$$\Lambda(x) = \prod_{\alpha^{-i} \in \Lambda} (x - \alpha^{-i}) \qquad (8)$$
During inner decoding, since there are no erasures, $\Lambda(x) = 1$. In the case of a (149, 138) outer code, the maximum degree of $\Lambda(x)$ is 2t = 11. R11 is only used to store the coefficient of the maximum degree 11.
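As an illustration of Eqs. (5) and (8), the sketch below expands the erasure locator polynomial from a list of 8-bit erasure addresses and forms the modified syndrome in software. It is a minimal model under stated assumptions, not the authors' circuit: the field polynomial 0x11D is an assumption (the text only says GF(256)), and the names `gf_mul`, `poly_eval`, `erasure_locator` and `modified_syndrome` are hypothetical. Polynomials are lists with index j holding the coefficient of x^j, loosely mirroring registers R0 to R11.

```python
# GF(256) log/antilog tables. ASSUMPTION: field polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the paper does not state one.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(256) elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    """Evaluate p at x by Horner's rule (p[j] is the coefficient of x^j)."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def erasure_locator(addresses, num_regs=12):
    """Expand Lambda(x) = prod (x - alpha^-i) over the erasure addresses i.

    Starts from R0 = 1 and all other registers 0, then multiplies by one
    factor per erasure. Subtraction equals addition (XOR) in GF(2^m).
    """
    regs = [1] + [0] * (num_regs - 1)
    for i in addresses:
        root = EXP[(255 - i) % 255]  # alpha^-i
        regs = [(regs[j - 1] if j > 0 else 0) ^ gf_mul(root, regs[j])
                for j in range(num_regs)]
    return regs


def modified_syndrome(S, lam, two_t):
    """T(x) = S(x) * Lambda(x) mod x^(2t), as in Eq. (5)."""
    T = [0] * two_t
    for i, s in enumerate(S):
        for j, c in enumerate(lam):
            if i + j < two_t:
                T[i + j] ^= gf_mul(s, c)
    return T


lam = erasure_locator([2, 7, 11])
print("Lambda coefficients:", lam)
print("Lambda(alpha^-7) =", poly_eval(lam, EXP[255 - 7]))
```

With no erasure addresses the function returns the constant polynomial 1, which matches the inner-decoding case described above.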

$\Lambda(x)$ is computed as follows. Initially, R0 = 1 and all the R1 to R11 values are 0. If $\alpha^{-i} \notin \Lambda$, switch S is off and thus signals cannot be passed to the registers. As a result, the i'th degree coefficient of $\Lambda(x)$ is stored in the Ri register. Similarly, for (85, 77) and (14, 9) code processing, the (R3 to R11) and (R6 to R11) registers are used, respectively.
Similarly, the modified syndrome polynomial $T(x)$ shown in equation (5) can be computed by using the circuit. $T(x)$ uses the coefficients of $S(x)$, instead of the initial register values. The R0 to R10 registers initially store the coefficients of $S(x)$. The R10 and R11 registers are initially set to 0. After the polynomial expansion, the modified Euclid's algorithm is performed.

3.3 An Efficient Architecture to Perform the Modified Euclid's Algorithm
To determine the error/erasure locator polynomial $\sigma(x)$ and the error/erasure evaluator polynomial $\omega(x)$, the modified Euclid's algorithm [5] is applied. The algorithm is summarized in the following procedure.
1. Initial value setting
2. Recursive computation and stop condition checking
The initial values of the Euclid's algorithm are
$$R_0(x) = x^{2t}, \quad Q_0(x) = T(x) \qquad (9)$$
$$L_0(x) = 0, \quad U_0(x) = \Lambda(x) \qquad (10)$$
Then, recursive computation is performed as follows until the stop condition, $\deg(L_i(x)) > \deg(R_i(x))$, is satisfied.
$$R_{i+1}(x) = [\sigma_i b_i R_i(x) + \bar{\sigma}_i a_i Q_i(x)] - x^{|l_i|}[\sigma_i a_i Q_i(x) + \bar{\sigma}_i b_i R_i(x)] \qquad (11)$$
$$Q_{i+1}(x) = \sigma_i Q_i(x) + \bar{\sigma}_i R_i(x) \qquad (12)$$
$$L_{i+1}(x) = [\sigma_i b_i L_i(x) + \bar{\sigma}_i a_i U_i(x)] - x^{|l_i|}[\sigma_i a_i U_i(x) + \bar{\sigma}_i b_i L_i(x)] \qquad (13)$$
$$U_{i+1}(x) = \sigma_i U_i(x) + \bar{\sigma}_i L_i(x) \qquad (14)$$
where $a_i$ ($b_i$) is the leading coefficient of $R_i(x)$ ($Q_i(x)$),
$$l_i = \deg(R_i(x)) - \deg(Q_i(x)) \qquad (15)$$
$$\sigma_i = 1 \ \text{if}\ l_i \ge 0, \ \text{else}\ \sigma_i = 0 \qquad (16)$$
and $\bar{\sigma}_i = 1 - \sigma_i$. If the stop condition is satisfied, then $\omega(x) = R_i(x)$ and $\sigma(x) = L_i(x)$.
We now describe briefly the hardware operation for the modified Euclid's algorithm. Fig. 3 (a), (b), (c) and (d) show the proposed new architecture. Fig. 3 (a) and (b) show the recursive computation blocks of {$R_i(x)$, $Q_i(x)$} and {$L_i(x)$, $U_i(x)$}, respectively. In Fig. 3 (c), $l_i$ and the stop condition are calculated. Fig. 3 (d) shows the arithmetic blocks used in (a) and (b). Owing to the form of the polynomial computations in (11) to (16), the coefficients can be computed in parallel. Since multiplication and addition are performed for each coefficient of the polynomials, we compute the upper coefficients and the lower coefficients of the polynomials in parallel. In this architecture, {$R_i(x)$, $Q_i(x)$} and {$L_i(x)$, $U_i(x)$}, divided into the upper and lower coefficient groups, are computed recursively.
As an example, we describe decoding of a (149, 138) code by using the circuit in Fig. 3 (a). Initially, the coefficients of these polynomials are set in the Ri, Qi, Li and Ui registers. Note that the maximum number of coefficients of the $R_i(x)$ polynomial is 12. The lower degree coefficients are stored in the R0 to R5 registers of block 1 and block 2. Similarly, the upper degree coefficients of $R_i(x)$ are stored in the R6 to R11 registers in block 3. MUX2 selects the output of the arithmetic unit (1) of block 3. According to $l_i$, the "$Q_i(x)$ $R_i(x)$ register value transfer control" block shifts the coefficients or swaps polynomials. In particular, polynomial swapping is needed if $l_i < 0$. The "$Q_i(x)$ $R_i(x)$ register value transfer control" block is composed of MUXs. In addition, arithmetic unit (3) is used to compute the initial condition for the next "Chien Search" block. This computation is necessary for $n \ne 2^m - 1$ in GF($2^m$). In addition, the double-speed clock is used in this block.
The (14, 9) code is processed by using blocks 1 and 2, marked by the dotted box. The 6 coefficients of the (14, 9) code are divided into two parts and stored in the (R0 to R2) and (R3 to R5) registers in blocks 1 and 2, respectively. At this time, MUX1 and MUX3 select R2 and R5, respectively. Since $\lceil 6 \cdot 5 / 14 / 2 / 2 \rceil = 1$, the (14, 9) outer code requires only 1 cell. Therefore, the number of cells is reduced from 3 to 1 when compared to a straightforward implementation of [5].
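The recursion in Eqs. (9) to (16) can be sketched in software as follows. This is an illustrative, sequential model written for readability; it is not the authors' parallel upper/lower register architecture. The field polynomial 0x11D and all function names (`gf_mul`, `deg`, `comb`, `modified_euclid`, `poly_eval`, `syndromes`) are assumptions introduced for this example. Polynomials are lists with index j holding the coefficient of x^j.

```python
# ASSUMPTION: GF(256) built from x^8 + x^4 + x^3 + x^2 + 1 (0x11D);
# the paper does not state the field polynomial.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def deg(p):
    """Degree of p, or -1 for the zero polynomial."""
    for d in range(len(p) - 1, -1, -1):
        if p[d]:
            return d
    return -1


def comb(p, cp, q, cq, shift):
    """Return cp*p(x) + x^shift * cq*q(x); minus equals plus in GF(2^m)."""
    out = [0] * max(len(p), len(q) + shift)
    for j, v in enumerate(p):
        out[j] ^= gf_mul(cp, v)
    for j, v in enumerate(q):
        out[j + shift] ^= gf_mul(cq, v)
    return out


def modified_euclid(T, lam, two_t):
    """Iterate Eqs. (11)-(16) from the initial values (9)-(10).

    Returns (sigma, omega) = (L_i, R_i) once deg(L_i) > deg(R_i).
    """
    R = [0] * two_t + [1]  # R_0(x) = x^(2t)
    Q = list(T)            # Q_0(x) = T(x)
    L = [0]                # L_0(x) = 0
    U = list(lam)          # U_0(x) = Lambda(x)
    if deg(Q) < 0:         # T(x) = 0: nothing to solve for
        return U, [0]
    while deg(L) <= deg(R):
        a, b = R[deg(R)], Q[deg(Q)]   # leading coefficients a_i, b_i
        l = deg(R) - deg(Q)           # Eq. (15)
        if l >= 0:                    # sigma_i = 1
            R = comb(R, b, Q, a, l)
            L = comb(L, b, U, a, l)
        else:                         # sigma_i = 0: polynomials swap roles
            R, Q = comb(Q, a, R, b, -l), R
            L, U = comb(U, a, L, b, -l), L
    return L, R


def poly_eval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes(error_map, two_t):
    """s_j = r(alpha^j) for an all-zero codeword plus {position: value} errors."""
    out = []
    for j in range(two_t):
        s = 0
        for pos, val in error_map.items():
            s ^= gf_mul(val, EXP[(pos * j) % 255])
        out.append(s)
    return out
```

A possible use, for an errors-only case with 2t = 8 as in the (85, 77) code: build `S = syndromes({3: 0x5A, 40: 0x01, 77: 0xC3}, 8)`, call `modified_euclid(S, [1], 8)`, and check that the returned locator vanishes at alpha^-3, alpha^-40 and alpha^-77, which is what a Chien search would look for.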

Fig. 3. Modified Euclid's algorithm processing block. (a) Ri(x), Qi(x) computation block. (b) Li(x), Ui(x) computation block. (c) Degree computation block. (d) Arithmetic units.

The modified Euclid's algorithm is processed by using the proposed architecture in parallel. Owing to this parallel processing, the decoding latency can be significantly reduced and, as a consequence, the number of cells and the FIFO memory sizes are reduced.

3.4 Chien Search Implementation Block
In general, the roots of the error/erasure locator polynomial, $\sigma(\alpha^{-i})$, and the error/erasure evaluator polynomial, $\omega(\alpha^{-i})$, are computed in two blocks of the same architecture. We have developed a new architecture capable of sharing the hardware blocks, as shown in Fig. 4. Hardware sharing is possible by using the double-speed clock of 36 MHz. In this block, $\sigma(\alpha^{-i})$ and $\omega(\alpha^{-i})$ are computed alternately. During even clocks, $\sigma(\alpha^{-i})$ and $\sigma_{odd}(\alpha^{-i})$ are computed, while $\omega(\alpha^{-i})$ is computed during odd clocks. The architecture is simple. The output is adjusted to the original clock speed, which is 18 MHz.

Fig. 4. Chien search implementation block.

3.5 RS Decoder/Encoder Timing Diagram
Fig. 5 shows the timing diagram for the (n, k) RS decoder/encoder. $(\lceil (2t+1)/2 \rceil + 1) \cdot 2t$ clock cycle times are needed to complete the modified Euclid's algorithm. It takes (2t+1) clock cycle times to compute the initial values used in the Chien search block. Of course, by using the double-speed clock, decoding latency is reduced by half. As a result, the latency of the Euclid's algorithm block is
$$\{(\lceil (2t+1)/2 \rceil + 1) \cdot 2t + (2t+1)\}/2.$$
The error correction and verification is initially performed for (n+3) clock cycle times. In particular, the overall decoding latency is given by
$$\text{Decoding latency} = n + \{2t + ((\lceil (2t+1)/2 \rceil + 1) \cdot 2t + (2t+1))\}/2 + n + 3.$$
The FIFO memory sizes have been reduced since the latency has been reduced.

IV. Experimental Results
We have implemented the proposed RS decoder/encoder using VHDL. We have verified the correctness of the designed decoder/encoder by using extensive simulation with randomly generated data and errors. During the hardware synthesis, circuit delays have been considered by restricting the longest path delays of the decoder/encoder.
Table 1 shows the implementation results based on several design methods, i.e., [5], [7], an industry design, and our proposed method. A gate corresponds to a 2-input NAND gate. The number of total gates is from the synthesized results excluding the FIFO memories. Note that in [7], the design specification (code length, error correction capability, etc.) is different from the others and therefore, direct comparison is not possible. In the industry result, the Berlekamp-Massey algorithm has been used. In the others, the modified Euclid's algorithm has been used. The new proposed architecture has been implemented by using about 35,000 gates, which is about 30% smaller than the other design results.
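As a worked check of the latency expression in Section 3.5, the sketch below evaluates it for the three codes. The grouping of terms is an assumption based on one reading of the printed formula (the 2t term and the Euclid/Chien terms are all halved by the double-speed clock, and results are in base 18 MHz clock cycles); the function names are illustrative only.

```python
from math import ceil


def euclid_block_latency(two_t):
    """{(ceil((2t+1)/2) + 1) * 2t + (2t+1)} / 2, in base-clock cycles."""
    euclid_cycles = (ceil((two_t + 1) / 2) + 1) * two_t
    chien_init_cycles = two_t + 1
    return (euclid_cycles + chien_init_cycles) / 2


def decoding_latency(n, k):
    """n + {2t + Euclid terms}/2 + n + 3, under the assumed grouping."""
    two_t = n - k
    return n + two_t / 2 + euclid_block_latency(two_t) + n + 3


for n, k in [(85, 77), (149, 138), (14, 9)]:
    print((n, k), "->", decoding_latency(n, k), "base-clock cycles")
```

Under this reading, the two terms of size n dominate the total, which is consistent with the FIFO delay of 2n plus a smaller processing delay described in Section 3.1.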

Table 1. (Notes): '-' in [7] means unknown. *: Confidential, i.e., the results have not been published.

V. Conclusions
In this paper, we have proposed a new flexible and area-efficient VLSI architecture of a Reed-Solomon product-code decoder/encoder for digital VCRs. The proposed architecture has three major features. First, area-efficiency has been significantly improved by sharing a functional block. Second, a new architecture implementing the modified Euclid's algorithm has been developed to reduce the circuit size and latency. Third, by doubling the internal clock speed, the decoding latency and hence the memory size have been reduced. The correctness of the proposed decoder/encoder has been verified by extensive simulation.

References
[1] S. B. Wicker and V. K. Bhargava, "Reed-Solomon Codes and Their Applications", IEEE Press, New York, 1994.
[2] S. B. Wicker, "Error Control Systems for Digital Communication and Storage", Prentice Hall, 1995.
[3] Y. Jeong and W. Burleson, "High-Level Estimation of High-Performance Architectures for Reed-Solomon Decoding," ISCAS, pp. 720-723, 1995.
[4] J. M. Hsu and C. L. Wang, "An Area-Efficient VLSI Architecture for Decoding of Reed-Solomon Codes," ICASSP, pp. 3291-3294, May 1996.
[5] H. M. Shao and I. S. Reed, "On the VLSI Design of a Pipeline Reed-Solomon Decoder Using Systolic Arrays," IEEE Trans. on Computers, vol. 37, no. 10, pp. 1273-1280, Oct. 1988.
[6] H. M. Shao, T. K. Truong, I. S. Hsu, and L. J. Deutsch, all of Calif., "Architecture for Time or Transform Domain Decoding of Reed-Solomon Codes," U.S. Patent 4,868,828, Sep. 19, 1989.
[7] G. ..., "A VLSI Design of RS CODEC for Digital Data Recorder," ASIC Design Workshop, pp. 115-124, 1996.
[8] T. Iwaki, T. Tanaka, E. Yamada, T. Okuda, and T. Sasada, "Architecture of A High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, vol. 40, no. 1, pp. 75-81, Feb. 1994.
[9] C. C. Hsu, I. S. Reed, and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-way Digital Communications and Storage Systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 1, pp. 91-92, Feb. 1994.
[10] S. ..., "An Error-Control Coding Scheme for Multispeed Play of Digital VCR", IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, no. 3, pp. 243-247, June 1995.
[11] "Specifications of Digital VCR for Consumer-Use", HD-Digital VCR Conference, DRAFT IEC document, SD (Standard Definition), NTSC, PAL, SECAM, June 1994.

Biographies
Sunghoon Kwon received the B.S. degree and the M.S. degree, both in electronics engineering, from Han-Yang University, Korea, in 1991 and 1993, respectively. He is working toward the Ph.D. degree in electronics engineering at Han-Yang University. His research interests include system design and synthesis of VLSIs.

Hyunchul Shin (S'78-M'80-SM'96) received the B.S. degree in electronics engineering from Seoul National University and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology in 1978 and 1980, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1987. From 1980 to 1983, he was with the Department of Electronics Engineering, Kun-Oh Institute of Technology, Korea. In 1983, he received a Fulbright Scholarship. From 1987 to 1989, he was a Member of the Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ. Since 1989, he has been with the Department of Electronics Engineering, Han-Yang University, Korea. His research interests include design and synthesis of integrated circuits and systems.
