
A Low Power Hard Decision Decoder for BCH Codes
2021 10th International Conference on Advances in Computing and Communications (ICACC) | 978-1-6654-3919-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICACC-202152719.2021.9708303

Pranitha Garlapati, B. Yamuna* and Karthi Balasubramanian
Department of Electronics and Communication Engineering,
Amrita School of Engineering, Coimbatore,
Amrita Vishwa Vidyapeetham, India.
*Email: b_yamuna@cb.amrita.edu

Abstract—In decoding Bose-Chaudhuri-Hocquenghem (BCH) codes, Peterson's algorithm is more efficient for codes with single, double and triple error correcting capabilities. Numerous methods have been proposed to reduce the hardware complexity caused by the inversion operation in Peterson's algorithm. In this paper, a low-complexity hardware BCH decoder based on the inversion-less Peterson's algorithm from the literature is designed, and its performance is verified against Matlab results. A low-power version of this decoder is then designed. It replaces the decoder's parallel Chien search architecture with the two-step p-parallel Chien search approach, which was originally used in the literature with the Berlekamp-Massey algorithm. The two-step p-parallel Chien search architecture has been modified for use with the inversion-less Peterson's algorithm. The resulting decoder shows a power reduction of up to 42% with a moderate increase in area of 10%.

Index Terms—BCH codes, Hard decision decoding, Low-complexity decoder, BER performance.

I. INTRODUCTION
Error correcting codes are an integral part of most digital communication systems. Bose-Chaudhuri-Hocquenghem (BCH) codes, turbo codes and convolutional codes are some of the widely used error correcting codes [1]. Binary BCH codes belong to a class of multiple-error-correcting cyclic codes. They are constructed from a generator polynomial whose roots lie in an extension Galois Field (GF). BCH codes with single, double and triple error correcting capability find wide application in optical communication, digital storage systems and random access memories [2]–[5].

Algorithms and architectures for Hard Decision Decoding (HDD) and Soft Decision Decoding (SDD) of BCH codes have been proposed in the literature [6], [7]. The Berlekamp-Massey algorithm (BMA) and Peterson's algorithm are the two commonly used HDD algorithms. Both algorithms generate an error location polynomial and solve it to determine the positions of the errors. In the BCH decoding process, determining the error locations by solving the error location polynomial demands high hardware complexity, which in turn affects the speed of the decoder. Peterson's decoder is used for determining the error location polynomial when the error correcting capability is small. The algorithm employs GF multiplications and inversions [8], [9]. Its throughput and complexity depend mainly on the GF inversion, since the inversion circuit is almost double the size of the multiplier [9]. Jurgen Freudenberger et al. [10] have proposed an inversion-less Peterson's algorithm to reduce the complexity of the decoding process.

Solving the error location polynomial involves searching for its roots. The Chien search algorithm is commonly used for finding the roots of the error location polynomial. This root-searching step is the most power-consuming step in the decoding process. Power-efficient Chien search approaches have been proposed to reduce the power consumption, such as the early termination scheme [11], [12] and polynomial order reduction [13]. These two techniques save power significantly when the errors are located at the beginning of the codeword. However, they fail to do so when the errors are located elsewhere in the codeword. To overcome this drawback, a two-step Chien search approach has been proposed in [14], which saves power regardless of the location of the errors.

In this work, we propose a low-power version of the low-complexity BCH decoder introduced by Jurgen Freudenberger et al. [10]. The proposed low-power architecture uses the two-step p-parallel Chien search approach proposed in [14] in place of the parallel Chien search architecture of the decoder in [10]. The two-step p-parallel Chien search approach involves two steps, where f is the field dimension:
- the first step processes only q of the f bits of each evaluation in every clock cycle;
- the second step processes the remaining f − q bits only when the first step finds those q bits to be zero.

This reduces the number of computations and hence saves power.

The paper is organized as follows. Sections II and III give a brief introduction to the BCH encoder and decoder, respectively. Section IV briefly describes the low-complexity BCH decoder using the inversion-less Peterson's algorithm and its hardware architecture. The proposed low-power architecture is presented in Section V, followed by the simulation results in Section VI. The conclusions are summarized in Section VII.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 07:55:31 UTC from IEEE Xplore. Restrictions apply.
II. BCH ENCODER
For any positive integer $f \ge 3$ and $t < 2^{f-1}$, there exists an (n, k, t) binary BCH code [15]. Here t is the error correcting capability, n is the length of the codeword output by the BCH encoder, and k is the length of the input message to the encoder. For the primitive codes considered in this paper, $n = 2^f - 1$.

The BCH encoder uses the generator polynomial g(x) to produce a codeword of length n with error correcting capability t from k message bits. The hardware implementation of the BCH encoder proposed in [15] uses a Linear Feedback Shift Register (LFSR), as shown in Fig. 1. The coefficients of the generator polynomial set the feedback connections of the LFSR. To generate the codeword, the k message bits are fed into the LFSR, which generates the parity bits. The output of the LFSR is the BCH codeword.

Fig. 1. An (n, k) BCH encoder [15]
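The LFSR encoding described above computes the remainder of $x^{n-k} m(x)$ divided by g(x), where m(x) is the message polynomial. The Python sketch below models that register bit by bit for a systematic encoder.

Several details are illustrative assumptions, not parameters taken from the paper:
- the (15, 7, 2) code;
- its generator polynomial $g(x) = x^8 + x^7 + x^6 + x^4 + 1$, the product of the minimal polynomials of α and α^3 for the primitive polynomial $x^4 + x + 1$;
- the message-then-parity bit ordering.

```python
# Sketch of a systematic LFSR BCH encoder (illustrative parameters, not from the paper).
# Example code: (15, 7, 2) binary BCH, g(x) = x^8 + x^7 + x^6 + x^4 + 1,
# stored as coefficients [g_0, g_1, ..., g_8].
G_15_7 = [1, 0, 0, 0, 1, 0, 1, 1, 1]


def lfsr_encode(message, g):
    """Encode k message bits (highest-order coefficient first).

    The (n - k)-stage register divides x^(n-k) * m(x) by g(x); after all k
    shifts it holds the parity bits. Returns the message bits followed by the
    parity bits, highest-order coefficient first.
    """
    r = len(g) - 1                   # number of parity bits, n - k
    reg = [0] * r                    # reg[i] = coefficient of x^i in the running remainder
    for bit in message:
        feedback = bit ^ reg[r - 1]  # incoming bit XOR the top register stage
        for i in range(r - 1, 0, -1):
            reg[i] = reg[i - 1] ^ (feedback & g[i])  # feedback taps where g_i = 1
        reg[0] = feedback & g[0]
    return list(message) + reg[::-1]


if __name__ == "__main__":
    print(lfsr_encode([1, 0, 0, 1, 1, 0, 1], G_15_7))
```

As a quick sanity check, encoding the message polynomial m(x) = 1, i.e. `[0, 0, 0, 0, 0, 0, 1]`, returns the coefficients of g(x) itself: `[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]`.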

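III. BCH DECODER


The three main steps in HDD of BCH codes are syndrome computation, generation of the error location polynomial, and solving the error location polynomial. Fig. 2 shows the block diagram of the hard decision BCH decoder [10].

Fig. 2. Block diagram of the BCH decoder [10]

The input to the syndrome generator is the received word, from which the most likely transmitted codeword is to be decoded. The syndrome computation module generates the syndromes $S_1, S_2, \ldots, S_{2t-1}$. The generated syndromes depend only on the error vector. Peterson's algorithm is more efficient than the BMA for decoding BCH codes with low error correcting capability [10], [16]. The Chien search algorithm determines the roots of the error location polynomial, and these roots give the decoded error positions.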
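As a sketch of the syndrome computation step, the Python code below evaluates the received polynomial at successive powers of a primitive element α, i.e. $S_i = r(\alpha^i)$, using log/antilog tables for GF(2^f).

The following choices are assumptions made for illustration:
- the primitive polynomial $x^4 + x + 1$;
- the convention that `received[j]` is the coefficient of $x^j$;
- computing all of $S_1, \ldots, S_{2t}$.

For binary codes the even-indexed syndromes are squares of earlier ones ($S_{2i} = S_i^2$), so a decoder need not compute them separately.

```python
# Sketch: syndrome computation S_i = r(alpha^i) over GF(2^f).
# Assumed: primitive polynomial x^4 + x + 1 for f = 4; received[j] is the coefficient of x^j.


def gf_tables(f=4, prim_poly=0b10011):
    """Antilog (exp) and log tables for GF(2^f) generated by a primitive polynomial."""
    n = (1 << f) - 1
    exp = [0] * (2 * n)
    log = [0] * (n + 1)
    x = 1
    for i in range(n):
        exp[i] = x
        log[x] = i
        x <<= 1                      # multiply by alpha
        if x & (1 << f):
            x ^= prim_poly           # reduce modulo the primitive polynomial
    for i in range(n, 2 * n):
        exp[i] = exp[i - n]          # duplicated so exponent sums need no modulo
    return exp, log


def syndromes(received, t, f=4, prim_poly=0b10011):
    """Return [S_1, ..., S_2t], where S_i sums alpha^(i*j) over positions j with received[j] = 1."""
    exp, _ = gf_tables(f, prim_poly)
    n = (1 << f) - 1
    result = []
    for i in range(1, 2 * t + 1):
        s = 0
        for j, bit in enumerate(received):
            if bit:
                s ^= exp[(i * j) % n]  # GF(2^f) addition is bitwise XOR
        result.append(s)
    return result
```

All syndromes of a codeword are zero. With a single error at position e, $S_1 = \alpha^e$. For example, flipping bit 5 of a codeword of the (15, 7, 2) code makes the first syndrome equal `exp[5]`.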

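IV. LOW-COMPLEXITY BCH DECODER
The conventional Peterson's algorithm uses inversion operations to generate the error location polynomial. The inversion-less Peterson's algorithm is used to avoid the complexity caused by the inversion operation. To achieve inversion-less operation, all the f-bit coefficients of the error location polynomial are multiplied by a common nonzero factor, which does not change the roots of the polynomial. Fig. 3 shows the pipelined hardware architecture of the inversion-less Peterson's algorithm in [10], which increases the speed of the low-complexity decoder.

Fig. 3. Hardware architecture for the inversion-less Peterson's algorithm [10]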
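The double-error-correcting case shows why scaling by a common nonzero factor removes the inversion. It is simpler than the three-error decoder of [10] and is not necessarily that paper's exact formulation. Peterson's solution for t = 2 is $\sigma(x) = 1 + S_1 x + \frac{S_3 + S_1^3}{S_1} x^2$. Multiplying every coefficient by $S_1$ gives the inversion-less locator $\Lambda(x) = S_1 + S_1^2 x + (S_3 + S_1^3) x^2$, which has the same roots whenever $S_1 \neq 0$.

The Python sketch below builds Λ(x) from the syndromes using only GF multiplications and additions, then checks its roots by brute-force evaluation. The GF(16) field generated by $x^4 + x + 1$ and the helper names are assumptions.

```python
# Sketch: inversion-less error locator for t = 2 (illustrative, not the exact design of [10]).
# Assumed field: GF(16) generated by x^4 + x + 1.


def gf_tables(f=4, prim_poly=0b10011):
    """Antilog (exp) and log tables for GF(2^f)."""
    n = (1 << f) - 1
    exp, log, x = [0] * (2 * n), [0] * (n + 1), 1
    for i in range(n):
        exp[i], log[x] = x, i
        x <<= 1
        if x & (1 << f):
            x ^= prim_poly
    for i in range(n, 2 * n):
        exp[i] = exp[i - n]
    return exp, log


def gf_mul(a, b, exp, log):
    """Multiply two GF(2^f) elements via the log tables."""
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]


def inversionless_locator_t2(s1, s3, exp, log):
    """Coefficients [L0, L1, L2] of L(x) = S1 + S1^2 x + (S3 + S1^3) x^2; no inversion used."""
    s1_sq = gf_mul(s1, s1, exp, log)
    s1_cu = gf_mul(s1_sq, s1, exp, log)
    return [s1, s1_sq, s3 ^ s1_cu]


def poly_eval(coeffs, x, exp, log):
    """Evaluate a polynomial (lowest-degree coefficient first) at x with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x, exp, log) ^ c
    return acc


if __name__ == "__main__":
    exp, log = gf_tables()
    n = 15
    # Errors at positions 2 and 9: S1 = a^2 + a^9, S3 = a^6 + a^27 (= a^6 + a^12).
    s1 = exp[2] ^ exp[9]
    s3 = exp[6] ^ exp[27 % n]
    lam = inversionless_locator_t2(s1, s3, exp, log)
    # A root at alpha^i marks an error at position (n - i) mod n.
    print(sorted((n - i) % n for i in range(n) if poly_eval(lam, exp[i], exp, log) == 0))
```

The example prints `[2, 9]`. For two errors with locators $X_1, X_2$, one can check that $\Lambda(x) = S_1(1 + X_1 x)(1 + X_2 x)$, so the scale factor $S_1$ leaves the roots untouched. For a single error, $S_3 = S_1^3$ and the $x^2$ coefficient vanishes.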
The error positions are determined by solving the error location polynomial with the Chien search root-finding algorithm. To increase the decoder throughput, the parallel Chien search architecture [2] shown in Fig. 4 has been employed in place of the conventional Chien search architecture. In every computation involved in solving the error location polynomial, all f bits of each coefficient are accessed.

The low-complexity decoder design proposed in [10] uses the inversion-less Peterson's algorithm to generate the error location polynomial and the parallel Chien search approach to determine the error positions. However, the power consumption can still be reduced further, which would yield more power-efficient decoders.
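The p-parallel Chien search can be modelled behaviourally as follows:
- at clock cycle y, each coefficient register holds $\lambda_j \alpha^{jyp}$;
- the p lanes evaluate $\lambda(\alpha^{yp+k})$ for k = 1, ..., p by multiplying the registers by the constants $\alpha^{jk}$;
- every register is then advanced by $\alpha^{jp}$.

This Python sketch rests on three assumptions:
- GF(16) generated by $x^4 + x + 1$;
- an error locator written directly from its roots;
- a root at $\alpha^i$ mapping to error position $(n - i) \bmod n$.

It is not a gate-level copy of the architecture in Fig. 4.

```python
# Behavioural sketch of a p-parallel Chien search (assumptions: GF(16), x^4 + x + 1).


def gf_tables(f=4, prim_poly=0b10011):
    """Antilog (exp) and log tables for GF(2^f)."""
    n = (1 << f) - 1
    exp, log, x = [0] * (2 * n), [0] * (n + 1), 1
    for i in range(n):
        exp[i], log[x] = x, i
        x <<= 1
        if x & (1 << f):
            x ^= prim_poly
    for i in range(n, 2 * n):
        exp[i] = exp[i - n]
    return exp, log


def gf_mul(a, b, exp, log):
    return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]


def parallel_chien(lam, p, exp, log, n):
    """Find every i in 1..n with lam(alpha^i) == 0, evaluating p values per clock cycle.

    lam: locator coefficients, lowest degree first. Returns (roots, cycles).
    """
    regs = list(lam)                         # regs[j] = lam_j * alpha^(j*y*p) at cycle y
    roots, cycles = [], 0
    for base in range(0, n, p):              # base = y * p
        cycles += 1
        for k in range(1, p + 1):            # the p parallel lanes
            i = base + k
            if i > n:
                break
            s = 0
            for j, reg in enumerate(regs):   # constant multipliers alpha^(j*k), all f bits used
                s ^= gf_mul(reg, exp[(j * k) % n], exp, log)
            if s == 0:
                roots.append(i)
        regs = [gf_mul(reg, exp[(j * p) % n], exp, log) for j, reg in enumerate(regs)]
    return roots, cycles


if __name__ == "__main__":
    exp, log = gf_tables()
    n = 15
    x1, x2 = exp[2], exp[9]                          # error locators for positions 2 and 9
    lam = [1, x1 ^ x2, gf_mul(x1, x2, exp, log)]     # (1 + x1 x)(1 + x2 x)
    roots, cycles = parallel_chien(lam, 4, exp, log, n)
    print(sorted((n - i) % n for i in roots), cycles)
```

With p = 4, the 15 candidate positions are covered in 4 cycles instead of 15, and the example prints `[2, 9] 4`. Every lane reads all f bits of every term, which is the cost that the two-step approach in Section V targets.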

Fig. 4. Conventional parallel Chien search architecture [2]

V. PROPOSED LOW POWER DECODER ARCHITECTURE
The proposed work attempts to reduce the power consumed by the low-complexity BCH decoder presented in [10]. The Chien search module is the most power-consuming step in the decoding process. The proposed BCH decoder architecture uses the two-step p-parallel Chien search architecture proposed by Hoyoung Yoo et al. [14] in place of the parallel Chien search architecture of the low-complexity decoder [10].

The BCH decoder presented in [14] uses the Berlekamp-Massey algorithm to generate the error location polynomial, whereas the proposed design uses the inversion-less Peterson's algorithm, as in [10]. Hence, the hardware architecture for the two-step p-parallel Chien search used in [14] is modified to match the error location polynomial generated by the inversion-less Peterson's algorithm, as shown in Fig. 5.

Fig. 5. Modified two-step p-parallel Chien search architecture

In the two-step p-parallel Chien search approach, each computation is divided into two steps:
- In the first step, the occurrence of an error is determined by accessing only q of the f bits instead of all f bits.
- In the second step, the remaining f − q bits are accessed only when the output of the first step is zero.

That is, the f − q least significant bits $\lambda(\alpha^{yp+j})_{(f-q-1:0)}$ are accessed only when the q most significant bits $\lambda(\alpha^{yp+j})_{(f-1:f-q)}$ are zero. In this notation:
- j ranges from 1 to p;
- $\lambda(x)$ is the error location polynomial;
- $y_j$ denotes the registers used to store the intermediate values obtained during the computation;
- p denotes the number of $\alpha^j$ computations in every clock cycle.

Power is reduced in this two-step approach because the second step is performed only when the output of the first step is zero.

Each computation in the two-step approach increases the critical path length. To resolve this issue, delay elements are inserted so that the computations work in a pipelined manner. The hardware requirement of the proposed decoder is almost the same as that of the low-complexity BCH decoder in [10]. The only difference is that the two-step p-parallel Chien search architecture requires an additional p one-bit registers and (p − 1)t buffers of length f bits.
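Because GF(2^f) addition is a bitwise XOR, the q most significant bits of $\lambda(\alpha^{yp+j})$ depend only on the q most significant bits of each term. The first step can therefore rule out most positions without touching the remaining f − q bits.

The Python sketch below is a behavioural model of that idea, not the gate-level design of Fig. 5. For simplicity it computes each term in full and counts how often the second step is needed. That count is only a rough proxy for the dynamic power that the paper measures by synthesis. The primitive polynomials, the `locator_from_positions` demo helper and all function names are assumptions. The lane index is k here rather than the text's j, and j indexes coefficients.

```python
# Behavioural model of the two-step p-parallel Chien search (illustrative assumptions only).
PRIM_POLY = {4: 0b10011, 6: 0b1000011, 8: 0b100011101}  # x^4+x+1, x^6+x+1, x^8+x^4+x^3+x^2+1


def gf_tables(f):
    n = (1 << f) - 1
    exp, log, x = [0] * (2 * n), [0] * (n + 1), 1
    for i in range(n):
        exp[i], log[x] = x, i
        x <<= 1
        if x & (1 << f):
            x ^= PRIM_POLY[f]
    for i in range(n, 2 * n):
        exp[i] = exp[i - n]
    return exp, log


def gf_mul(a, b, exp, log):
    return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]


def locator_from_positions(positions, exp, log):
    """Demo helper: lam(x) = product of (1 + alpha^e x), lowest-degree coefficient first."""
    lam = [1]
    for e in positions:
        nxt = lam + [0]
        for j in range(1, len(nxt)):
            nxt[j] ^= gf_mul(lam[j - 1], exp[e], exp, log)
        lam = nxt
    return lam


def two_step_chien(lam, p, q, f, exp, log):
    """Return (roots, second_steps); step 2 runs only when the q MSBs of the sum are zero."""
    n = (1 << f) - 1
    hi_mask = ((1 << q) - 1) << (f - q)      # bits f-1 .. f-q  (first step)
    lo_mask = (1 << (f - q)) - 1             # bits f-q-1 .. 0  (second step)
    regs, roots, second_steps = list(lam), [], 0
    for base in range(0, n, p):
        for k in range(1, p + 1):
            i = base + k
            if i > n:
                break
            terms = [gf_mul(r, exp[(j * k) % n], exp, log) for j, r in enumerate(regs)]
            hi = 0
            for term in terms:
                hi ^= term & hi_mask         # step 1: XOR only the q most significant bits
            if hi:
                continue                     # not a root, so the f - q low bits are never read
            second_steps += 1
            lo = 0
            for term in terms:
                lo ^= term & lo_mask         # step 2: the remaining f - q bits
            if lo == 0:
                roots.append(i)
        regs = [gf_mul(r, exp[(j * p) % n], exp, log) for j, r in enumerate(regs)]
    return roots, second_steps


if __name__ == "__main__":
    f, n = 8, 255
    exp, log = gf_tables(f)
    lam = locator_from_positions([3, 100, 200], exp, log)
    for q in range(1, f):
        roots, second = two_step_chien(lam, 8, q, f, exp, log)
        print(q, sorted((n - i) % n for i in roots), f"step 2 ran {second} of {n} times")
```

Every q finds the same error positions. The number of second-step activations can only fall as q grows, because a larger q requires more zero bits before the second step is triggered. This count alone does not fix the best q, since the added registers and delay elements also cost power. The optimal q reported in Section VI comes from the synthesized designs rather than from a count like this one.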
VI. EXPERIMENTAL RESULTS
The BCH encoder is designed in Matlab and used to encode input data of 1,000,000 bits. The encoded bits are modulated using BPSK and sent through an AWGN channel. The resulting received words are decoded using the low-complexity BCH decoder [10] designed in Matlab. The hardware designs of both the low-complexity BCH decoder and the proposed low-power decoder are implemented in Verilog. Bit Error Rate (BER) simulations are carried out for both decoders using the same test vectors generated from Matlab. The codes simulated are the (15,5,3), (15,7,2), (15,11,1), (63,45,3), (63,51,2), (63,57,1), (255,231,3), (255,239,2) and (255,247,1) BCH codes.

Two decoders are compared for field dimensions of 4, 6 and 8 and error correcting capabilities of one, two and three:
- the proposed low-power BCH decoder, based on the inversion-less Peterson's algorithm and the two-step p-parallel Chien search approach;
- the low-complexity decoder, based on the inversion-less Peterson's algorithm and the parallel Chien search approach.

For the above BCH codes, the BER simulation results of both hardware designs are compared with those of the decoder designed in Matlab and are presented in Figs. 6 to 14. No difference is observed between the Matlab and Verilog outputs in any of the cases.

To analyse the resource utilization of the decoders, the designs were implemented on a Basys 3 Artix-7 FPGA trainer board. Tables I and II show the resource utilization summaries for the low-complexity and the proposed low-power BCH decoders, respectively, for field dimensions f = 4, 6 and 8.
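The modulation and channel part of the simulation chain described at the start of this section can be sketched in Python as follows. It maps bit 0 to +1 and bit 1 to −1, adds white Gaussian noise and makes hard decisions.

The following are assumptions, because the paper does not state its SNR definition:
- the $E_b/N_0$ convention;
- the scaling of the noise variance by the code rate;
- the seed and the function names.

No decoder is included here.

```python
import math
import random


def bpsk_awgn_hard(bits, ebn0_db, rate, rng):
    """BPSK (0 -> +1, 1 -> -1) over AWGN followed by hard decisions.

    The noise standard deviation assumes unit symbol energy and Es = rate * Eb.
    """
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * rate * ebn0))
    decided = []
    for b in bits:
        y = (1.0 - 2.0 * b) + rng.gauss(0.0, sigma)
        decided.append(0 if y >= 0 else 1)   # hard decision on the sign
    return decided


def raw_ber(n_bits, ebn0_db, rate=1.0, seed=1):
    """Channel bit error rate before decoding, estimated by Monte Carlo."""
    rng = random.Random(seed)
    bits = [rng.randrange(2) for _ in range(n_bits)]
    rx = bpsk_awgn_hard(bits, ebn0_db, rate, rng)
    return sum(a != b for a, b in zip(bits, rx)) / n_bits


def uncoded_theory(ebn0_db, rate=1.0):
    """Q(sqrt(2 * rate * Eb/N0)), written with math.erfc."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(rate * ebn0))


if __name__ == "__main__":
    for snr in (0, 2, 4, 6):
        print(snr, raw_ber(200_000, snr), uncoded_theory(snr))
```

Feeding these hard decisions to a BCH decoder and comparing the decoded bits with the message gives coded BER curves like those in Figs. 6 to 14. The raw channel curve from `uncoded_theory` serves as a reference line.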

Fig. 6. Comparative BER performance of (15,5,3) BCH decoders (low-complexity and low-power)

Fig. 7. Comparative BER performance of (15,7,2) BCH decoders (low-complexity and low-power)

Fig. 8. Comparative BER performance of (15,11,1) BCH decoders (low-complexity and low-power)

Fig. 9. Comparative BER performance of (63,45,3) BCH decoders (low-complexity and low-power)

TABLE I
FPGA RESOURCE UTILIZATION OF THE LOW-COMPLEXITY BCH DECODER

BCH decoder     LUTs utilized   Flip-flops utilized   Throughput (codewords per second)
(15,5,3)        180             263                   90 × 10^6
(15,7,2)        121             207                   111 × 10^6
(15,11,1)       94              159                   125 × 10^6
(63,45,3)       445             636                   83 × 10^6
(63,51,2)       287             478                   91 × 10^6
(63,57,1)       181             248                   95 × 10^6
(255,231,3)     934             651                   69 × 10^6
(255,239,2)     798             586                   71 × 10^6
(255,247,1)     630             539                   76 × 10^6

TABLE II
FPGA RESOURCE UTILIZATION OF THE PROPOSED LOW-POWER BCH DECODER

BCH decoder     LUTs utilized   Flip-flops utilized   Throughput (codewords per second)
(15,5,3)        198             289                   82 × 10^6
(15,7,2)        133             227                   104 × 10^6
(15,11,1)       105             175                   112 × 10^6
(63,45,3)       485             690                   74 × 10^6
(63,51,2)       315             525                   83 × 10^6
(63,57,1)       189             272                   87 × 10^6
(255,231,3)     1027            715                   62 × 10^6
(255,239,2)     880             640                   64 × 10^6
(255,247,1)     696             590                   70 × 10^6
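The overheads summarized in Section VII can be checked against Tables I and II with a few lines of Python. The numbers below are copied from the two tables, with throughput in units of 10^6 codewords per second. The function names are illustrative.

```python
# Values copied from Tables I and II: (LUTs, flip-flops, throughput in 10^6 codewords/s).
LOW_COMPLEXITY = {
    "(15,5,3)": (180, 263, 90), "(15,7,2)": (121, 207, 111), "(15,11,1)": (94, 159, 125),
    "(63,45,3)": (445, 636, 83), "(63,51,2)": (287, 478, 91), "(63,57,1)": (181, 248, 95),
    "(255,231,3)": (934, 651, 69), "(255,239,2)": (798, 586, 71), "(255,247,1)": (630, 539, 76),
}
LOW_POWER = {
    "(15,5,3)": (198, 289, 82), "(15,7,2)": (133, 227, 104), "(15,11,1)": (105, 175, 112),
    "(63,45,3)": (485, 690, 74), "(63,51,2)": (315, 525, 83), "(63,57,1)": (189, 272, 87),
    "(255,231,3)": (1027, 715, 62), "(255,239,2)": (880, 640, 64), "(255,247,1)": (696, 590, 70),
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def relative_changes():
    """Percentage change (low-power vs. low-complexity) in LUTs, flip-flops and throughput."""
    return {
        code: tuple(pct_change(a, b) for a, b in zip(LOW_COMPLEXITY[code], LOW_POWER[code]))
        for code in LOW_COMPLEXITY
    }


if __name__ == "__main__":
    for code, (lut, ff, thr) in relative_changes().items():
        print(f"{code:>12}  LUTs {lut:+6.2f}%  flip-flops {ff:+6.2f}%  throughput {thr:+6.2f}%")
```

Across the nine codes, the changes are:
- LUT count: +4.4% to +11.7%;
- flip-flop count: +8.5% to +10.1%;
- throughput: −6.3% to −10.8%.

The 10% figures quoted in the conclusions are therefore approximate.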

To analyse the power consumed by the proposed low-power approach and the original design, the decoders were synthesised in Synopsys using 90 nm CMOS technology. Fig. 15 shows the percentage reduction in the power dissipation of the proposed architecture with respect to two parameters: the bit width of the first stage of the two-step p-parallel Chien search approach, and the field dimension f.

It can be seen that the power reduction increases with the field dimension. It is also observed that the maximum power reduction is obtained when a smaller bit width is used in the first stage of the two-step p-parallel Chien search approach. The optimal first-step width q, which yields the maximum power saving, is:
- q = 2 for the (15,5,3), (15,7,2) and (15,11,1) BCH codes;
- q = 3 for the (63,45,3), (63,51,2) and (63,57,1) BCH codes;
- q = 3 for the (255,231,3), (255,239,2) and (255,247,1) BCH codes.

The corresponding maximum power reductions are 20.40%, 31.98% and 41.62%, respectively.

Fig. 10. Comparative BER performance of (63,51,2) BCH decoders (low-complexity and low-power)

Fig. 11. Comparative BER performance of (63,57,1) BCH decoders (low-complexity and low-power)

Fig. 12. Comparative BER performance of (255,231,3) BCH decoders (low-complexity and low-power)

Fig. 13. Comparative BER performance of (255,239,2) BCH decoders (low-complexity and low-power)

VII. CONCLUSIONS
This paper proposes a hardware decoder that uses the two-step p-parallel Chien search architecture in the low-complexity BCH decoder to reduce power consumption. Simulation results show a power saving of up to 42% compared with the low-complexity BCH decoder designed in [10]. This saving comes at the cost of an increase of about 10% in resource utilization and a decrease of about 10% in throughput. The power savings of the designed architecture improve as the field dimension increases. It is envisaged that future work on high-speed and area-efficient architectures will be able to overcome the speed and area trade-offs incurred by the proposed low-power architecture.

Fig. 14. Comparative BER performance of (255,247,1) BCH decoders (low-complexity and low-power)

Fig. 15. Percentage of power savings versus bit width of the first stage for field dimensions f = 4, 6 and 8

REFERENCES
[1] M. Vinodhini, N. S. Murty, and T. K. Ramesh, "Transient error correction coding scheme for reliable low power data link layer in NoC," IEEE Access, vol. 8, pp. 174614–174628, 2020.
[2] D. Strukov, "The area and latency tradeoffs of binary bit-parallel BCH decoders for prospective nanoelectronic memories," in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers. IEEE, 2006, pp. 1183–1187.
[3] P. Amato, C. Laurent, M. Sforzin, S. Bellini, M. Ferrari, and A. Tomasoni, "Ultra fast, two-bit ECC for emerging memories," in 2014 IEEE 6th International Memory Workshop (IMW). IEEE, 2014, pp. 1–4.
[4] C. Yang, M. Mao, Y. Cao, and C. Chakrabarti, "Cost-effective design solutions for enhancing PRAM reliability and performance," IEEE Transactions on Multi-Scale Computing Systems, vol. 3, no. 1, pp. 1–11, 2016.
[5] S. Krishnan T., A. Chalil, and K. Sreehari, "VLSI implementation of Reed-Solomon codes," in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 280–284.
[6] V. Sudharsan and B. Yamuna, "Support vector machine based decoding algorithm for BCH codes," Journal of Telecommunications and Information Technology, 2016.
[7] B. Yamuna and T. Padmanabhan, "Reliability level list-based decoding of multilevel modulated block codes," International Journal of Information and Communication Technology, 2016.
[8] E.-H. Lu, S.-W. Wu, and Y.-C. Cheng, "A decoding algorithm for triple-error-correcting binary BCH codes," Information Processing Letters, vol. 80, no. 6, pp. 299–303, 2001.
[9] X. Zhang and Z. Wang, "A low-complexity three-error-correcting BCH decoder for optical transport network," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 10, pp. 663–667, 2012.
[10] J. Freudenberger, M. Rajab, and S. Shavgulidze, "A low-complexity three-error-correcting BCH decoder with applications in concatenated codes," in SCC 2019; 12th International ITG Conference on Systems, Communications and Coding. VDE, 2019, pp. 1–5.
[11] K. Lee, S. Lim, and J. Kim, "Low-cost, low-power and high-throughput BCH decoder for NAND flash memory," pp. 413–415, 2012.
[12] Y. Wu, "Low power decoding of BCH codes," in 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), vol. 2, 2004, pp. II–369.
[13] S.-Y. Wong, C. Chen, and Q. M. J. Wu, "Low power Chien search for BCH decoder using RT-level power management," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, pp. 338–341, 2011.
[14] H. Yoo, Y. Lee, and I.-C. Park, "Low-power parallel Chien search architecture using a two-step approach," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 3, pp. 269–273, 2016.
[15] J. Massey, "Review of 'Theory and Practice of Error Control Codes' (Blahut, R. E.; 1983)," IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 553–554, 1985.
[16] W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Transactions on Information Theory, vol. 6, no. 4, pp. 459–470, 1960.

