IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 10, OCTOBER 2003

Parallel CRC Realization
Giuseppe Campobello, Giuseppe Patanè, and Marco Russo

Abstract: This paper presents a theoretical result in the context of realizing high-speed hardware for parallel CRC checksums. Starting from the serial implementation widely reported in the literature, we have identified a recursive formula from which our parallel implementation is derived. In comparison with previous works, the new scheme is faster and more compact and is independent of the technology used in its realization. In our solution, the number of bits processed in parallel can be different from the degree of the polynomial generator. Last, we have also developed high-level parametric codes that are capable of generating the circuits autonomously when only the polynomial is given.

Index Terms: Parallel CRC, LFSR, error-detection, VLSI, FPGA, VHDL, digital logic.

1 INTRODUCTION
Cyclic Redundancy Check (CRC) [1], [2], [3], [4], [5] is widely used in data communications and storage devices as a powerful method for dealing with data errors. It is also applied to many other fields such as the testing of integrated circuits and the detection of logical faults [6]. One of the more established hardware solutions for CRC calculation is the Linear Feedback Shift Register (LFSR), consisting of a few flip-flops (FFs) and logic gates. This simple architecture processes bits serially. In some situations, such as high-speed data communications, the speed of this serial implementation is absolutely inadequate. In these cases, a parallel computation of the CRC, where successive units of w bits are handled simultaneously, is necessary or desirable.

Like any other combinational circuit, parallel CRC hardware could be synthesized with only two levels of gates. This is defined by laws governing digital logic. Unfortunately, this implies a huge number of gates. Furthermore, the minimization of the number of gates is an NP-hard optimization problem. Therefore, when complex circuits must be realized, one generally uses heuristics or seeks customized solutions.

This paper presents a customized, elegant, and concise formal solution for building parallel CRC hardware. The new scheme generalizes and improves previous works. By making use of some mathematical principles, we will derive a recursive formula that can be used to deduce the parallel CRC circuits. Furthermore, we will show how to apply this formula and to generate the CRC circuits automatically. As in modern synthesis tools, where it is possible to specify the number of inputs of an adder and automatically generate the necessary logic, we developed the necessary parametric codes to perform the same tasks with parallel CRC circuits. The compact representation proposed in the new scheme provides the possibility of significantly saving hardware and reaching higher frequencies in comparison with previous works. Finally, in our solution, the degree of the polynomial generator, m, and the number of bits processed in parallel, w, can be different.

The article is structured as follows: Section 2 illustrates the key elements of CRC. In Section 3, we summarize previous works on parallel CRCs to provide appropriate background. In Section 4, we derive our logic equations and present the parallel circuit. In addition, we illustrate the performance by some examples. Finally, in Section 5, we evaluate our results by comparing them with those presented in previous works. The codes implemented are included in the Appendix.

2 CYCLIC REDUNDANCY CHECK

G. Campobello and G. Patanè are with the Department of Physics, University of Messina, Contrada Papardo, Salita Sperone 31, 98166 Messina, Italy. E-mail: {gcampo, gpatane}@ai.unime.it.
M. Russo is with the Department of Physics and Astronomy, University of Catania, 64, Via S. Sofia, I-95123 Catania, Italy. E-mail: marco.russo@ct.infn.it.
Manuscript received 20 Oct. 2000; revised 26 Feb. 2002; accepted 17 Dec. 2002.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number 113030.
0018-9340/03/$17.00 © 2003 IEEE

As already stated in the introduction, CRC is one of the most powerful error-detecting codes. Briefly speaking, CRC can be described as follows: Let us suppose that a transmitter, T, sends a sequence, $S_1$, of k bits, $\{b_0, b_1, \ldots, b_{k-1}\}$, to a receiver, R. At the same time, T generates another sequence, $S_2$, of m bits, $\{b'_0, b'_1, \ldots, b'_{m-1}\}$, to allow the receiver to detect possible errors. The sequence $S_2$ is commonly known as a Frame Check Sequence (FCS). It is generated so that the complete sequence, $S = S_1 \cup S_2$, obtained by concatenating $S_1$ and $S_2$, is divisible (following a particular arithmetic) by some predetermined sequence P, $\{p_0, p_1, \ldots, p_m\}$, of m + 1 bits. After T sends S to R, R divides the received S (i.e., the message and the FCS) by P, using the same particular arithmetic. If there is no remainder, R assumes there was no error. Fig. 1 illustrates how this mechanism works.

A modulo 2 arithmetic is used in the digital realization of the above concepts [3]: The product operator is accomplished by a bitwise AND, whereas both the sum and subtraction are accomplished by bitwise XOR operators. In this case, a CRC circuit (modulo 2 divisor) can be easily realized as a special shift register, called LFSR. Fig. 2 shows a typical architecture. It can be used by both the transmitter and the receiver.
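The division described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware discussed in the paper: the function names `mod2_remainder` and `make_fcs`, the example message, and the choice to store bits most significant first are assumptions of this sketch.

```python
def mod2_remainder(dividend, divisor):
    """Remainder of modulo-2 long division (subtraction is XOR).

    Both arguments are lists of bits, most significant bit first,
    and divisor[0] must be 1.
    """
    work = list(dividend)
    m = len(divisor) - 1              # degree of the divisor
    for i in range(len(work) - m):
        if work[i]:                   # leading bit set: XOR the divisor in
            for j, bit in enumerate(divisor):
                work[i + j] ^= bit
    return work[len(work) - m:]       # the last m bits are the remainder


def make_fcs(message, divisor):
    """FCS (S2) of a message (S1): remainder of S1 followed by m zeros."""
    m = len(divisor) - 1
    return mod2_remainder(message + [0] * m, divisor)


# Hypothetical example values, chosen only for illustration.
divisor = [1, 1, 0, 0, 1]             # x^4 + x^3 + 1, leading coefficient first
message = [1, 1, 0, 1, 0, 1, 1, 0]    # S1, k = 8 bits
fcs = make_fcs(message, divisor)      # S2, m = 4 bits
frame = message + fcs                 # S = S1 followed by S2

# Receiver side: a zero remainder means that no error was detected.
assert not any(mod2_remainder(frame, divisor))
```

Flipping any single bit of `frame` leaves a nonzero remainder, which is how the receiver detects the error.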
Published by the IEEE Computer Society

In the case of the transmitter, the dividend is the sequence $S_1$ concatenated with a sequence of m zeros to the right. The divisor is P. In the simpler case of a receiver, the dividend is the received sequence and the divisor is the same P.

In Fig. 2, we show that m FFs have common clock and clear signals. The input $x'_i$ of the ith FF is obtained by taking an XOR of the (i-1)th FF output and a term given by the logical AND between $p_i$ and $x_{m-1}$. The signal $x'_0$ is obtained by taking an XOR of the input d and $x_{m-1}$. If $p_i$ is zero, only a shift operation is performed (i.e., the XOR related to $x'_i$ is not required); otherwise, the feedback $x_{m-1}$ is XOR-ed with $x_{i-1}$. We point out that the AND gates in Fig. 2 are unnecessary if the divisor P is time-invariant.

The sequence $S_1$ is sent serially to the input d of the circuit starting from the most significant bit, $b_0$. Let us suppose that the k bits of the sequence $S_1$ are an integral multiple of m, the degree of the divisor P. The process begins by clearing all FFs. Then, all k bits are sent, one per clock cycle. Finally, m zero bits are sent through d. In the end, the FCS appears at the output end of the FFs.

Another possible implementation of the CRC circuit [7] is shown in Fig. 3. In this paper, we will call it LFSR2. In this circuit, the outputs of FFs (after k clock periods) are the same FCS computed by LFSR. It should be mentioned that, when LFSR2 is used, no sequence of m zeros has to be sent through d. So, LFSR2 computes FCS faster than LFSR. In practice, the message length is usually much greater than m, so LFSR2 and LFSR have similar performance.

Fig. 1. CRC description. S is the sequence for error detecting, P is the divisor, and Q is the quotient. $S_1$ is the original sequence of k bits to transmit. $S_2$ is the FCS of m bits.

Fig. 2. One of the possible LFSR architectures.

Fig. 3. LFSR2 architecture.

3 RELATED WORKS

Parallel CRC hardware is attractive because, by processing the message in blocks of w bits each, it is possible to reach a speed-up of w with respect to the time needed by the serial implementation. Here, we report the main works in the literature. Later, in Section 5, we compare our results with those presented in the literature.

As stated by Albertengo and Sisto [7] in 1990, previous works [8], [9], [10] "dealt empirically with the problem of parallel generation of CRCs." Furthermore, "the validity of the results is in any case restricted to a particular generator polynomial." Albertengo and Sisto [7] proposed an interesting analytical approach. Their idea was to apply the digital filter theory to the classical CRC circuit. They derived a method for determining the logic equations for any generator polynomial. Their formalization is based on a z-transform. To obtain logic equations, many polynomial divisions are needed. Thus, it is not possible to write a synthesizable VHDL code that automatically generates the equations for parallel CRCs. The theory they developed is restricted to cases where the number of bits processed in parallel is equal to the polynomial degree (w = m).

In 1996, Braun et al. [11] presented an approach suitable for FPGA implementation. A very complex analytical proof is presented. They developed a special heuristic logic minimization to compute CRC checksums on FPGA in parallel. Their main results are a precise formalism and a set of proofs to derive the parallel CRC computation starting from the bit-serial case. Their work is similar to our work, but their proofs are more complex.

Later, McCluskey [12] developed a high-speed VHDL implementation of CRC suitable for FPGA implementation. He also proposed a parametric VHDL code accepting the polynomial and the input data width, both of arbitrary length. This work is similar to ours, but the final circuit has worse performance.

In 2001, Sprachmann [13] implemented parallel CRC circuits of LFSR2. He proposed interesting VHDL parametric codes. The derivation is valid for any polynomial and data-width w, but equations are not so optimized.

In the same year, Shieh et al. [14] proposed another approach based on the theory of the Galois field. The theory they developed is quite general, like those presented in our paper (i.e., w may differ from m). However, their hardware implementation is strongly based on lookahead techniques [15], [16]; thus, their final circuits require more area and elaboration time. The possibility of using several smaller look-up tables (LUTs) is also shown, but the critical path of the final circuits grows substantially. Their derivation method is similar to ours, but, as in [13], equations are not optimized (see Section 5).

4 PARALLEL CRC COMPUTATION

Starting from the circuit represented in Fig. 2, we have developed our parallel implementation of the CRC. In the following, we assume that the degree of polynomial generator (m) and the length of the message to be processed (k) are both multiples of the number of bits to be processed in parallel (w). This is typical in data transmission where a message consists of many bytes and the polynomial generator, like the desired parallelism, consists of a few nibbles. In the final circuit that we will obtain, the sequence $S_1$ plus the zeros are sent to the circuit in blocks of w bits each. After $(k+m)/w$ clock periods, the FF outputs give the desired FCS.

From linear systems theory [17], we know that a discrete-time, time-invariant linear system can be expressed as follows:

$$\begin{cases} X(i+1) = F\,X(i) + G\,U(i) \\ Y(i) = H\,X(i) + J\,U(i), \end{cases} \tag{1}$$

where X is the state of the system, U the input, and Y the output. We use F, G, H, J to denote matrices and use X, Y, and U to denote column vectors. The solution of the first equation of the system (1) is:

$$X(i) = F^{i} X(0) + [F^{i-1}G \;\cdots\; FG \;\; G]\,[U(0) \;\cdots\; U(i-1)]^T. \tag{2}$$

We can apply (2) to the LFSR circuit (Fig. 2). In fact, if we use $\oplus$ to denote the XOR operation, the symbol $\cdot$ to denote bitwise AND, and B to denote the set $\{0, 1\}$, it is easy to demonstrate that the structure $\{B, \oplus, \cdot\}$ is a ring with identity (Galois Field GF(2) [18]). From this consideration, the solution of the system (1) (expressed by (2)) is valid even if we replace multiplication and addition with the AND and XOR operators, respectively. In order to point out that the XOR and AND operators must be also used in the product of matrices, we will denote their product by $\otimes$.

Let us consider the circuit shown in Fig. 2. It is just a discrete-time, time-invariant linear system for which the input $U(i)$ is the ith bit of the input sequence, the state X represents the FFs output, and the vector Y coincides with X, i.e., H and J are the identity and zero matrices, respectively. Matrices F and G are chosen according to the equations of serial LFSR. So, we have:

$$X = [x_{m-1} \;\cdots\; x_1 \;\; x_0]^T, \qquad U = d,$$

$$H = I_m \ \text{(the identity matrix of size } m \times m\text{)}, \qquad J = [0 \;\; 0 \;\cdots\; 0]^T,$$

$$F = \begin{bmatrix} p_{m-1} & 1 & 0 & \cdots & 0 \\ p_{m-2} & 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ p_1 & 0 & 0 & \cdots & 1 \\ p_0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \qquad G = [0 \;\; 0 \;\cdots\; 1]^T,$$

where $p_i$ are the bits of the divisor P (i.e., the coefficients of the generator polynomial).
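The serial model above can be simulated directly. The following Python sketch is illustrative only: the helper names `build_F_G`, `lfsr_step`, and `serial_fcs` are assumptions of this sketch, and the divisor is passed in the order $\{p_0, \ldots, p_m\}$ used in the text, with the state stored as $[x_{m-1}, \ldots, x_0]$.

```python
def build_F_G(p):
    """Matrices F and G of the serial LFSR for divisor p = [p_0, ..., p_m]."""
    m = len(p) - 1
    F = [[0] * m for _ in range(m)]
    for r in range(m):
        F[r][0] = p[m - 1 - r]        # first column: p_{m-1}, ..., p_0
        if r + 1 < m:
            F[r][r + 1] = 1           # shift: x'_i takes x_{i-1}
    G = [0] * (m - 1) + [1]           # the input d enters at x_0
    return F, G


def lfsr_step(F, G, X, d):
    """One clock period: X(i+1) = F X(i) + G d, with AND as product, XOR as sum."""
    nxt = []
    for r in range(len(X)):
        bit = G[r] & d
        for c in range(len(X)):
            bit ^= F[r][c] & X[c]
        nxt.append(bit)
    return nxt


def serial_fcs(p, message):
    """Clear the FFs, send the k message bits, then m zero bits."""
    m = len(p) - 1
    F, G = build_F_G(p)
    X = [0] * m
    for d in message + [0] * m:
        X = lfsr_step(F, G, X, d)
    return X


P = [1, 0, 0, 1, 1]                   # small example divisor, p_0 first
F, G = build_F_G(P)
for row in F:
    print(row)
print(serial_fcs(P, [1, 0, 0, 0]))
```

Feeding the message followed by its FCS through `lfsr_step` (without extra zeros) returns the all-zero state, which is the receiver-side check.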

When i coincides with w, the solution derived from (2) with substitution of the operators is:

$$X(w) = F^{w} \otimes X(0) \oplus [0 \;\cdots\; 0 \mid d(0) \;\cdots\; d(w-1)]^T, \tag{3}$$

where $X(0)$ is the initial state of the FFs. Considering that the system is time-invariant, we obtain a recursive formula:

$$X' = F^{w} \otimes X \oplus D, \tag{4}$$

where, for clarity, we have indicated with $X'$ and $X$, respectively, the next state and the present state of the system, and $D = [d_{m-1} \;\cdots\; d_1 \;\; d_0]^T$ assumes the following values: $[0 \;\cdots\; 0 \mid b_0 \;\cdots\; b_{w-1}]^T$, $[0 \;\cdots\; 0 \mid b_w \;\cdots\; b_{2w-1}]^T$, etc., where $b_i$ are the bits of the sequence $S_1$ followed by a sequence of m zeros.

This result implies that it is possible to calculate the m bits of the FCS by sending the k + m bits of the message $S_1$ plus the zeros, in blocks of w bits each. So, after $(k+m)/w$ clock periods, X is the desired FCS.

Now, it is important to evaluate the matrix $F^w$. There are several options, but it is easy to show that the matrix can be constructed recursively when i ranges from 2 to w:

$$F^{i} = \left[\; F^{i-1} \otimes \begin{bmatrix} p_{m-1} \\ \cdots \\ p_1 \\ p_0 \end{bmatrix} \;\middle|\; \text{the first } m-1 \text{ columns of } F^{i-1} \;\right]. \tag{5}$$

This formula permits an efficient VHDL code to be written, as we will show later. If we indicate with $P'$ the vector $[p_{m-1} \;\cdots\; p_1 \;\; p_0]^T$, from (5), we have:

$$F^{w} = \left[\; F^{w-1} \otimes P' \;\middle|\; \cdots \;\middle|\; F \otimes P' \;\middle|\; P' \;\middle|\; \begin{matrix} I_{m-w} \\ 0 \end{matrix} \;\right], \tag{6}$$

where $I_{m-w}$ is the identity matrix of order m - w. Furthermore, we have:

$$F^{m} = [\, F^{m-1} \otimes P' \mid \cdots \mid F \otimes P' \mid P' \,]. \tag{7}$$

So, we can obtain $F^w$ when $F^m$ is already available. $F^w$ may be obtained from $F^m$ as follows: The first w columns of $F^w$ are the last w columns of $F^m$. The upper right part of $F^w$ is $I_{m-w}$ and the lower right part must be filled with zeros.

Let us suppose, for example, that we have $P = \{1, 0, 0, 1, 1\}$. It follows that:

$$F = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$$

Then, if w = m = 4, after applying (5), we obtain:

$$F^{4} = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

Finally, if we use $[x'_3 \; x'_2 \; x'_1 \; x'_0]^T$, $[x_3 \; x_2 \; x_1 \; x_0]^T$, and $[d_3 \; d_2 \; d_1 \; d_0]^T$ to denote the three column vectors $X'$, $X$, and $D$ in (4), respectively, we have:

$$\begin{aligned} x'_3 &= x_2 \oplus x_1 \oplus x_0 \oplus d_3 \\ x'_2 &= x_3 \oplus x_2 \oplus d_2 \\ x'_1 &= x_3 \oplus x_2 \oplus x_1 \oplus d_1 \\ x'_0 &= x_3 \oplus x_2 \oplus x_1 \oplus x_0 \oplus d_0. \end{aligned}$$

As indicated above, having $F^4$ available, a power of F of lower order is immediately obtained. So, for example:

$$F^{2} = \left[\begin{array}{cc|cc} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{array}\right].$$

The same procedure may be applied to derive equations for the parallel version of LFSR2. In this case, the matrix G is $G = P'$ and (4) becomes:

$$X' = F^{w} \otimes (X \oplus D), \tag{8}$$

where D assumes the values $[b_0 \;\cdots\; b_{w-1} \mid 0 \;\cdots\; 0]^T$, $[b_w \;\cdots\; b_{2w-1} \mid 0 \;\cdots\; 0]^T$, etc.

4.1 Hardware Realization

A parallel implementation of the CRC can be derived from the above considerations. Yet again, it consists of a special register. In this case, the inputs of the FFs are the exclusive sum of some FF outputs and inputs. Fig. 4 shows a possible implementation. The signals $e_{r,c}$ (Enables) are taken from $F^w$. More precisely, $e_{r,c}$ is equal to the value in $F^w$ at the rth row and cth column. Even in the case of Fig. 4, if the divisor P is fixed, then the AND gates are unnecessary. Furthermore, the number of FFs remains unchanged. We recall that, if w < m, then inputs $d_{m-1} \cdots d_w$ are not needed. Inputs $d_{w-1} \cdots d_0$ are the bits of the dividend (Section 2) sent in groups of w bits each. As to the realization of the LFSR2, by considering (8), we have a circuit very similar to that in Fig. 4 where inputs d are XORed with FF outputs and results are fed back.

In the Appendix, the interested reader can find the two listings that generate the correct VHDL code for the CRC parallel circuit we propose here. Actually, for synthesizing the parallel CRC circuits, using a Pentium II 350 MHz with 64 MB of RAM, less than a couple of minutes are necessary in the cases of CRC-12, CRC-16, and CRC-CCITT. For CRC-32, several hours are required to evaluate $F^m$ by our synthesis tool; thus, we have also written a MATLAB code that is able to generate a VHDL code. The code produces logic equations of the desired CRC directly; thus, it is synthesized much faster than the previous VHDL code.

4.2 Examples

Here, our results, applied to four commonly used CRC polynomial generators, are reported. As we stated in the previous paragraph, $F^w$ may be derived from $F^m$, so we report only the $F^m$ matrix.
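The recursion (5), the extraction of $F^w$ from $F^m$ described by (6) and (7), and the resulting logic equations can be sketched as follows. This is an illustrative Python model, not the VHDL or MATLAB code of the Appendix; the function names and the textual form of the equations (with `^` standing for XOR) are assumptions of this sketch.

```python
def mat_vec(M, v):
    """Matrix-vector product over GF(2): AND as product, XOR as sum."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


def build_F(p):
    """Matrix F of the serial LFSR for divisor p = [p_0, ..., p_m]."""
    m = len(p) - 1
    return [[p[m - 1 - r]] + [int(c == r + 1) for c in range(1, m)]
            for r in range(m)]


def power_of_F(p, i):
    """F^i built with (5): [F^(i-1) (x) P' | first m-1 columns of F^(i-1)]."""
    m = len(p) - 1
    p_prime = [p[m - 1 - r] for r in range(m)]      # P' = [p_{m-1} ... p_0]^T
    Fi = build_F(p)
    for _ in range(i - 1):
        col = mat_vec(Fi, p_prime)
        Fi = [[col[r]] + Fi[r][:-1] for r in range(m)]
    return Fi


def F_w_from_F_m(Fm, w):
    """Last w columns of F^m, then I_(m-w) on top of a block of zeros."""
    m = len(Fm)
    return [Fm[r][m - w:] + [int(c == r) for c in range(m - w)]
            for r in range(m)]


def equations(Fw, w):
    """Next-state equations of (4), one string per flip-flop."""
    m = len(Fw)
    lines = []
    for r, row in enumerate(Fw):
        i = m - 1 - r                               # row r drives x'_i
        terms = [f"x{m - 1 - c}" for c, e in enumerate(row) if e]
        if i < w:                                   # only d_{w-1} ... d_0 exist
            terms.append(f"d{i}")
        lines.append(f"x'{i} = " + " ^ ".join(terms))
    return lines


P = [1, 0, 0, 1, 1]                                 # the example divisor, p_0 first
F4 = power_of_F(P, 4)
for line in equations(F4, 4):
    print(line)
assert F_w_from_F_m(F4, 2) == power_of_F(P, 2)
```

For the example divisor, the printed equations coincide with the four equations given above for w = m = 4.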

Fig. 4. Parallel CRC architecture.

In order to improve the readability, matrices $F^m$ are not reported as matrices of bits. They are reported as a column vector in which each element is the hexadecimal representation of the binary sequence obtained from the corresponding row of $F^m$, where the first bit is the most significant. For the example reported above, we have $F^4 = [7 \;\, C \;\, E \;\, F]^T$.

- CRC-12: P = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1}
  $F^{12}_{\text{CRC-12}}$ = [CFF 280 140 0A0 050 028 814 40A 205 DFD A01 9FF]^T.
- CRC-16: P = {1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1}
  $F^{16}_{\text{CRC-16}}$ = [DFFF 3000 1800 0C00 0600 0300 0180 00C0 0060 0030 0018 000C 8006 4003 7FFE BFFF]^T.
- CRC-CCITT: P = {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1}
  $F^{16}_{\text{CRC-CCITT}}$ = [0C88 0644 0322 8191 CC40 6620 B310 D988 ECC4 7662 3B31 9110 C888 6444 3222 1911]^T.
- CRC-32: P = {1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1}
  $F^{32}_{\text{CRC-32}}$ = [FB808B20 7DC04590 BEE022C8 5F701164 2FB808B2 97DC0459 B06E890C 58374486 AC1BA243 AD8D5A01 AD462620 56A31310 2B518988 95A8C4C4 CAD46262 656A3131 493593B8 249AC9DC 924D64EE C926B277 9F13D21B B409622D 21843A36 90C21D1B 33E185AD 627049F6 313824FB E31C995D 8A0EC78E C50763C7 19033AC3 F7011641]^T.

5 COMPARISONS

Albertengo and Sisto [7] based their formalization on the z-transform. In their approach, many polynomial divisions are required to obtain logic equations. This implies that it is not possible to write synthesizable VHDL codes that automatically generate CRC circuits. Their work is based on the LFSR2 circuit. As we have already shown in Section 4, our theory can be applied to the same circuit. However, (8) shows that, generally speaking, one more level of XOR is required with respect to the parallel LFSR circuit we propose. This implies that our proposal is, generally, faster.
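The relation between the two parallel update rules, (4) for LFSR and (8) for LFSR2, can be checked numerically. The Python sketch below is an assumption-laden software model (the function names, the example message, and the small degree-4 divisor are chosen for illustration); it applies both rules block by block and compares the resulting FCS values. In (8), the XOR with D happens before the multiplication by $F^w$, which is where the additional XOR level comes from.

```python
def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


def mat_mul(A, B):
    """Matrix product over GF(2)."""
    n = len(A)
    return [[sum(A[r][k] & B[k][c] for k in range(n)) % 2 for c in range(n)]
            for r in range(n)]


def F_power(p, w):
    """F^w for divisor p = [p_0, ..., p_m], by repeated multiplication."""
    m = len(p) - 1
    F = [[p[m - 1 - r]] + [int(c == r + 1) for c in range(1, m)]
         for r in range(m)]
    Fw = F
    for _ in range(w - 1):
        Fw = mat_mul(Fw, F)
    return Fw


def blocks(bits, w):
    return [bits[i:i + w] for i in range(0, len(bits), w)]


def parallel_lfsr(p, message, w):
    """Rule (4): X' = F^w X xor D, message followed by m zeros."""
    m = len(p) - 1
    Fw, X = F_power(p, w), [0] * m
    for blk in blocks(message + [0] * m, w):        # (k + m) / w clock periods
        D = [0] * (m - w) + blk                     # [0 ... 0 | b ... b]
        X = [a ^ b for a, b in zip(mat_vec(Fw, X), D)]
    return X


def parallel_lfsr2(p, message, w):
    """Rule (8): X' = F^w (X xor D), no trailing zeros needed."""
    m = len(p) - 1
    Fw, X = F_power(p, w), [0] * m
    for blk in blocks(message, w):                  # k / w clock periods
        D = blk + [0] * (m - w)                     # [b ... b | 0 ... 0]
        X = mat_vec(Fw, [a ^ b for a, b in zip(X, D)])
    return X


P = [1, 0, 0, 1, 1]                                 # p_0 first, m = 4
message = [1, 1, 0, 1, 0, 1, 1, 0]                  # k = 8, hypothetical data
for w in (1, 2, 4):
    assert parallel_lfsr(P, message, w) == parallel_lfsr2(P, message, w)
```

With w = 1, `parallel_lfsr` reduces to the serial circuit, so the loop also confirms that the parallel versions agree with the serial result for this message.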

Further considerations can be made if FPGA is chosen as the target technology. Large FPGAs are, generally, based on look-up tables (LUTs). An LUT is a little SRAM which usually has more than two inputs (typically four or more). In the case of high speed LFSR2, there are many two-input XOR gates (see [7, p. 68]). This implies that, if the CRC circuit is realized using FPGAs, many LUTs are not completely utilized. This phenomenon is less critical in the case of LFSR. As a consequence, parallel LFSR realizations are cheaper than LFSR2 ones. In order to give some numerical results to confirm our considerations, we have synthesized the CRC-32 in both cases. With LFSR, we needed 162 LUTs to obtain a critical path of 7.3 ns, whereas, for LFSR2, 182 LUTs and 10.8 ns are required.

The main difference between our work and Braun et al.'s [11] is the dimension of the matrices to be dealt with and the complexity of the proofs. Our results are simpler, i.e., we work with smaller matrices and our proofs are not so complex as those presented in [11].

McCluskey [12] developed a high speed VHDL implementation of CRC suitable for FPGA implementation. The results he obtained are similar to ours (i.e., he started from the LFSR circuit and derived an empirical recursive equation). But he deals with matrices a little greater than the ones we use. Even in this case, only XORs are used in the final representation of the formula. Accurate results are reported in the paper dealing with two different possible FPGA solutions. One of them offers us the possibility of comparing our results with those presented by McCluskey. The solution suitable for our purpose is related to the use of the ORCA 3T30-6 FPGA. This kind of FPGA uses a technology of 0.3-0.35 μm containing 196 Programmable Function Units (PFUs) and 2,436 FFs. Each PFU has eight LUTs, each with four inputs, and 10 FFs. One LUT introduces a delay of 0.9 ns. There is a flexible input structure inside the PFUs and a total of 21 inputs per PFU. The PFU structure permits the implementation of a 21-input XOR in a single PFU.

We have at our disposal the MAX+PLUSII ver. 10.0 software (available at www.altera.com) to synthesize VHDL using ALTERA FPGAs. We have synthesized a 21-input XOR and have realized that one Logic Array Block (LAB) is required for synthesizing both small and fast circuits. In short, each LAB contains some FFs and eight LUTs, each with four inputs. We have synthesized our results regarding the CRC-32. The technology target was the FLEX-10KATC144-1. It is smaller than ORCA3T30. The process technology is 0.35 μm and the typical LUT delay is 0.9 ns. When m = w = 32, the synthesized circuit needed 162 logic cells (i.e., a total of 162 LUTs) with a maximum operating frequency of ≈ 137 MHz. McCluskey, in this case, required 36 PFUs (i.e., a total of 288 LUTs) with a speed of 105.8 MHz. So, our final circuit requires less area (≈ 60 percent) and has greater speed (≈ 30 percent). The better results obtained with our approach are due to the fact that matrices are different (i.e., different starting equations) and, what is more, McCluskey's matrix is larger.

We have compiled VHDL codes reported in [13] using our Altera tool. The implementation of our parallel circuit usually requires less area (70-90 percent) and has higher speed (a speedup of ≈ 4 is achieved). For details, see Table 1. There are two main reasons that explain these results: The former is the same mentioned at the beginning of this section regarding the differences between LFSR and LFSR2 FPGA implementation. The latter is that our starting equations are optimized. More precisely, in our equation of $x'_i$, term $x_j$ appears only once or not at all, while, in the starting equation of [13], $x_j$ may appear more (up to m times), as is possible to observe in Fig. 4 in [13]. Optimizations like $x_j \oplus x_j = 0$ and $x_j \oplus x_j \oplus x_j = x_j$ must be processed from a synthesis tool. When m grows, many expressions of this kind are present in the final equations. Even if very powerful VHDL synthesis tools are used, it is not certain that they are able to find the most compact logical form. Even when they are able to, more synthesis time is necessary with respect to our final VHDL code.

TABLE 1
Performance Evaluation for a Variety of Parallel CRC Circuits
Results are for w = m. LUTs is the number of look-up tables used in FPGA. CPD is the critical path delay. t1 and t2 are the synthesis times. For [13], t1 is the synthesis time of the VHDL code reported in this paper using our synthesis tool. The first time refers to the crcgen.vhd and the second time refers to the VHDL generated with MATLAB.

In [14], a detailed performance evaluation of the CRC-32 circuit is reported. For comparison purposes, we take results from [14] when m = w = 32. For the matrix realization, they start from a requirement of 448 2-input XORs and a critical path of 15 levels of XORs. After Synopsys optimization, they obtain 408 2-input XORs and seven levels of gates. We evaluated the number of required 2-input XORs starting from the matrix $F^w$, counting the ones and supposing the realization of XORs with more than two inputs with binary tree architectures.

So, for example, to realize an 8-input XOR, three levels of 2-input XORs are required with a total of seven gates. This approach gives 452 2-input XORs and only five levels of gates before optimization. This implies that our approach produces faster circuits, but the circuits are a little bit larger. We do not have at our disposal the Synopsys tool, so we do not know which is the automatic optimization achievable starting from the 452 initial XORs. However, with some simple manual tricks, it is possible to obtain hardware savings. For example, identifying common XOR subexpressions and realizing them only once, the number of required gates decreases to 370. With other smart tricks it is possible to obtain more compact circuits.

Fig. 5. The package: crcpack.vhd.

Fig. 6. The code: crcgen.vhd.

Fig. 7. The matlab file: crcgen.m.
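The gate-count estimate used in Section 5 (counting the ones of $F^w$ and building every wide XOR as a binary tree of 2-input XORs) can be sketched as follows. The Python code is only an illustration: the names `tree_cost` and `xor_cost` are assumptions of this sketch, as is the reading that each $x'_i$ has one input per one in its row plus one data input $d_i$ when w = m. The hexadecimal rows are those reported for CRC-32 in Section 4.2.

```python
# Rows of F^32 for CRC-32 as hexadecimal words (first bit most significant).
F32_ROWS_HEX = [
    "FB808B20", "7DC04590", "BEE022C8", "5F701164",
    "2FB808B2", "97DC0459", "B06E890C", "58374486",
    "AC1BA243", "AD8D5A01", "AD462620", "56A31310",
    "2B518988", "95A8C4C4", "CAD46262", "656A3131",
    "493593B8", "249AC9DC", "924D64EE", "C926B277",
    "9F13D21B", "B409622D", "21843A36", "90C21D1B",
    "33E185AD", "627049F6", "313824FB", "E31C995D",
    "8A0EC78E", "C50763C7", "19033AC3", "F7011641",
]


def tree_cost(n_inputs):
    """(gates, levels) of an n-input XOR built as a binary tree of 2-input XORs."""
    if n_inputs < 2:
        return 0, 0
    return n_inputs - 1, (n_inputs - 1).bit_length()


def xor_cost(rows_hex):
    """Total 2-input XORs and worst-case depth for X' = F^w X xor D with w = m."""
    total_gates, max_levels = 0, 0
    for word in rows_hex:
        ones = bin(int(word, 16)).count("1")    # FF outputs feeding this x'_i
        gates, levels = tree_cost(ones + 1)     # plus the data input d_i
        total_gates += gates
        max_levels = max(max_levels, levels)
    return total_gates, max_levels


print(tree_cost(8))             # 8-input XOR: seven gates, three levels
print(xor_cost(F32_ROWS_HEX))   # estimate for the parallel CRC-32, w = m = 32
```

Under these assumptions, the count for the reported CRC-32 matrix comes out at 452 gates and five levels, the figures quoted in the text before optimization.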

APPENDIX A

A.1 VHDL Code

In Figs. 5 and 6, we present two VHDL listings, named, respectively, crcpack.vhd and crcgen.vhd. They are the codes we developed describing our parallel realization of both LFSR and LFSR2 circuits. It is necessary to assign to the constant CRC the divisor P, in little endian form, to CRCDIM the value of m, and to DATAWIDTH the desired processing parallelism (w value). The codes reported are ready to generate the hardware for the CRC-16 where m = w = 16.

A.2 Matlab Code

In Fig. 7, we report the Matlab code, named crcgen.m, used to directly produce the VHDL listing of the desired CRC where only the logical equations of the CRC are present. This code is synthesized much faster than the previous one. In order to work correctly, the crcgen.m file needs another file, named crcgen.txt; this file contains the first 34 rows of crcgen.vhd.

ACKNOWLEDGMENTS

The authors wish to thank Chao Yang of the ILOG, Inc. for his useful suggestions during the revision process. The authors also wish to thank the anonymous reviewers for their work, a valuable aid that contributed to the improvement of the quality of the paper.

REFERENCES

[1] W.W. Peterson and D.T. Brown, "Cyclic Codes for Error Detection," Proc. IRE, Jan. 1961.
[2] A.S. Tanenbaum, Computer Networks. Prentice Hall, 1981.
[3] W. Stallings, Data and Computer Communications. Prentice Hall, 2000.
[4] T.V. Ramabadran and S.S. Gaitonde, "A Tutorial on CRC Computations," IEEE Micro, Aug. 1988.
[5] N.R. Saxena and E.J. McCluskey, "Analysis of Checksums, Extended Precision Checksums and Cyclic Redundancy Checks," IEEE Trans. Computers, July 1990.
[6] M.J.S. Smith, Application-Specific Integrated Circuits. Addison-Wesley Longman, Jan. 1998.
[7] G. Albertengo and R. Sisto, "Parallel CRC Generation," IEEE Micro, Oct. 1990.
[8] R. Lee, "Cyclic Codes Redundancy," Digital Design, July 1977.
[9] A. Perez, "Byte-Wise CRC Calculations," IEEE Micro, June 1983.
[10] A.K. Pandeya and T.J. Cassa, "Parallel CRC Lets Many Lines Use One Circuit," Computer Design, Sept. 1975.
[11] M. Braun et al., "Parallel CRC Computation in FPGAs," Proc. Workshop Field Programmable Logic and Applications, 1996.
[12] J. McCluskey, "High Speed Calculation of Cyclic Redundancy Codes," Proc. 1999 ACM/SIGDA Seventh Int'l Symp. Field Programmable Gate Arrays, p. 250, 1999.
[13] M. Sprachmann, "Automatic Generation of Parallel CRC Circuits," IEEE Design and Test of Computers, May 2001.
[14] M.D. Shieh et al., "A Systematic Approach for Parallel CRC Computations," J. Information Science and Eng., May 2001.
[15] G. Griffiths and G. Carlyle Stones, "The Tea-Leaf Reader Algorithm: An Efficient Implementation of CRC-16 and CRC-32," Comm. ACM, July 1987.
[16] D.V. Sarwate, "Computation of Cyclic Redundancy Checks via Table Look-Up," Comm. ACM, Aug. 1988.
[17] G.F. Franklin et al., Feedback Control of Dynamic Systems. Addison Wesley, 1994.
[18] J. Borges and J. Rifà, "A Characterization of 1-Perfect Additive Codes," Pirdi-1/98, 1998.

Giuseppe Campobello received the "Laurea" (MS) degree in electronic engineering from the University of Messina, Italy, in 2000. Since 2001, he has been a PhD student in information technology at the University of Messina, Italy. His primary interests are reconfigurable computing, VLSI design, microprocessor architectures, computer networks, distributed computing, and soft computing.

Giuseppe Patanè received the "Laurea" and the PhD degrees in electronic engineering from the University of Catania, Italy, in 1997, and the University of Palermo, Italy, in 2001, respectively. In 2001, he joined the Department of Physics at the University of Messina, Italy, where, currently, he is a researcher in computer science. His primary interests are soft computing, VLSI design, optimization techniques, and distributed computing.

Marco Russo received the MA and PhD degrees in electronic engineering from the University of Catania, Catania, Italy, in 1992 and 1996, respectively. From 1998 through 2002, he was with the Department of Physics, University of Messina, Italy, as an associate professor of computer science. Since 2003, he has been with the Department of Physics and Astronomy at the University of Catania. His primary interests are soft computing, VLSI design, optimization techniques, and distributed computing. He has more than 90 technical publications appearing in international journals, books, and conferences. He is coeditor of the book Fuzzy Learning and Applications (CRC Press, Boca Raton, Florida). Currently, he is in charge of research at the National Institute for Nuclear Physics (INFN).
