
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2961884, IEEE Transactions on Communications

A Novel Graph Expansion and a Decoding Algorithm for NB-LDPC Codes

V. B. Wijekoon†, Emanuele Viterbo†, Yi Hong†, Rino Micheloni‡, Alessia Marelli‡
† Monash University, Australia
‡ Microsemi Corporation, Milan, Italy

Abstract—Non-binary low-density parity-check (NB-LDPC) codes are known to offer several advantages over their binary counterparts, but the higher complexity and the resource-hungry nature of their decoding algorithms have so far restricted their practical usage. In this paper, we propose a new decoding algorithm for NB-LDPC codes over finite fields of characteristic 2, based on a novel binary expansion of the Q-ary Tanner graph. While it offers substantial complexity gains, simulation results demonstrate that the performance loss of the new algorithm, in comparison to the best known decoder, is quite small. Furthermore, since it is based on a binary graph, it is particularly attractive for hardware implementations. We also suggest a simplified version of the algorithm, which offers even higher gains in complexity.

Index Terms—Non-binary LDPC codes, Graph expansion, Iterative decoding

I. INTRODUCTION

First introduced by Gallager in 1962 [1], low-density parity-check (LDPC) codes have become quite popular due to their capacity-approaching performance [2], and are now used in many practical applications such as Ethernet, digital television, and Wi-Fi. Non-binary counterparts of these codes (NB-LDPC), introduced in [3], have been shown to perform even better at short-to-moderate code lengths. Their performance is also superior to that of binary LDPC codes in the presence of burst errors [4]. While many algebraic structures could be used to construct NB-LDPC codes, they are most often defined over finite fields of characteristic 2 [3], i.e., GF(2^p).

The sum-product algorithm (SPA) for decoding binary LDPC codes can be generalized to NB-LDPC codes [3], in which case it is popularly referred to as the Q-ary sum-product algorithm (QSPA). While the performance of NB-LDPC codes under QSPA is very good, the decoding complexity is quite high, of the order O(q^2), where q is the size of the field over which the code is defined. The algorithm also requires substantial hardware resources, since probability vectors of size q have to be handled. In [5], a version of QSPA was introduced that uses the fast Fourier transform (FFT-QSPA) to decrease the order of complexity to O(q log q) without any performance loss. Log-domain implementations of QSPA [6] and FFT-QSPA [4] were also considered to make the decoding algorithms more suitable for hardware implementation.

Simplification of QSPA using an approach similar to min-sum decoding of binary LDPC codes was first attempted in [6]. It was further modified, resulting in the extended min-sum algorithm (EMS) [7], which uses only a subset of the q probability values. Another simplification of QSPA, the min-max algorithm, was introduced in [8]. While these simplifications provide some gains in complexity at a small performance loss, those gains have not been enough for NB-LDPC codes to be used widely in practical applications.

An alternative approach to reducing the decoding complexity of NB-LDPC codes is to treat them as binary codes and employ bit-level decoding schemes. This approach is particularly attractive since NB-LDPC codewords are often transmitted as sets of bits over binary-input channels. A primary concern with bit-level decoding is the binary representation of the non-binary parity-check matrix. One option in this regard is to use the binary image of the matrix, which results in a p times larger binary matrix for a code over GF(2^p). This technique is successfully employed in [9] for majority-logic decoding (MLgD), and substantial gains over other MLgD algorithms are reported. The ideas in [9] are developed further in [10], where redundant rows are added to the binary image, resulting in significant performance improvements. A different representation, referred to as the 'extended binary representation', was first proposed in [11]. The extended binary representation of a code over GF(2^p) results in a (2^p − 1) times larger binary matrix. The technique is used for decoding over the binary erasure channel in [11], [12], and is generalized to other channel models in [13], [14]. Still, the schemes proposed in [13] and [14] do not fully realize the complexity advantage offered by the binary representation. In particular, the scheme proposed in [13] is from the same family as FFT-QSPA [5], and the soft decoding scheme highlighted in [14] requires transmission of an additional set of parity bits.

This paper introduces a novel decoding algorithm for NB-LDPC codes over GF(2^p), with substantial gains in complexity and hardware implementation costs. The algorithm is based on a new binary expansion of the Q-ary Tanner graph, which produces a larger binary graph. The expansion uses the idea that membership of a symbol in the different cosets of size 2^(p−1) in GF(2^p) can be interpreted as a set of binary random variables. The resulting binary graph is (2^p − 1) times larger than the Q-ary graph, but consists only of binary nodes. Although the graph expansion shares many similarities with the extended binary representation [11], the two expansions are derived from different theoretical backgrounds. Our interpretation allows us to propose a novel decoding algorithm that succeeds in realizing the significant complexity advantage offered by the binary representation. Since the algorithm operates on a binary graph, it is very attractive in terms of hardware implementation. Simulations show that in

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Southeast University. Downloaded on March 11,2020 at 03:52:48 UTC from IEEE Xplore. Restrictions apply.

comparison with QSPA, the new algorithm loses very little in performance, and, for certain codes, outperforms the best known simplifications. Furthermore, simplification techniques used in decoding of binary LDPC codes can be directly applied to the new algorithm for higher gains in complexity.

We explain the novel graph expansion in detail in Section II, and present the new decoding algorithm, along with its simplified version, in Section III. Section IV presents the simulation results, while Section V analyzes the complexity of the algorithms. Section VI concludes the paper.

II. NOVEL GRAPH EXPANSION

Given a random variable in the additive group K = {GF(2^p), +} and a subset of K, the event of the variable belonging to the subset can be treated as another random variable. In other words, we can define how likely it is for a certain random variable to belong to a given subset. The graph expansion presented in this paper is based on this idea: treating 'belonging to a subset' as a random variable. The subsets we consider for the expansion are the proper cosets of the additive subgroups H of order 2^(p−1) in K. In the following, we first present the distinctive properties of these cosets which make the graph expansion possible, and then present the expansion in detail.

A. Cosets of K

Proper cosets of additive subgroups may be defined as follows.

Definition 1. Let H denote a subgroup of order 2^(p−1) of K, and let γ be an element of K that does not belong to H. The proper coset CH of H is:

CH = {h + γ | h ∈ H}

CH is a set of 2^(p−1) elements that is disjoint from H, hence H ∪ CH = K. Therefore, only one distinct proper coset CH exists for any H, and the quotient group QH = K/H consists of only two elements: H and CH.

As noted in [15], QH is isomorphic to Z2, and an analysis of its additive properties leads to some interesting observations. In particular, the sum of any two elements in CH always belongs to H, and the sum of any element in CH with any element in H belongs to CH. This behaviour means that given three elements β1, β2, β3 ∈ K that are related as β1 + β2 = β3, and a specific CH, either none of the three, or exactly two, belong to CH. This observation will be used to significantly simplify the check node operations in decoding NB-LDPC codes, as elaborated on later in this section. In the following, in the interest of brevity, we may sometimes refer to CH simply as 'coset', and to the trivial coset H as 'subgroup'.

The number of distinct cosets CH is equal to the number of distinct subgroups H of order 2^(p−1).

Lemma 1. There are (2^p − 1) distinct subgroups of order 2^(p−1) in K.

Proof: K is a binary vector space of dimension p, with modulo-2 addition. Similarly, a subgroup of order 2^(p−1) is a binary vector subspace of dimension (p − 1). Then the number of distinct subgroups is equal to the number of distinct subspaces of dimension (p − 1) in a binary vector space of dimension p.

First we count the ways such a subspace can be constructed, which will be denoted by nw. To construct a subspace of dimension (p − 1), one simply picks (p − 1) linearly independent vectors from the original vector space. For the first choice, any vector other than the all-zero vector is valid. For the second, vectors linearly dependent on the first pick have to be disregarded; since the field is binary, the only such vector is the first pick itself. In the third choice, the vector resulting from the addition of the first two has to be left out as well. Continuing in this manner, we find

nw = (2^p − 1)(2^p − 2) ... (2^p − 2^(p−2))

This count includes the same subspace several times. Therefore, in order to find the number of distinct subspaces, nw has to be divided by the number of different bases of a (p − 1)-dimensional vector space, which we denote by nb. Following the same approach as before,

nb = (2^(p−1) − 1)(2^(p−1) − 2) ... (2^(p−1) − 2^(p−2))

Now, the number of distinct subspaces, which is equal to the number of distinct additive subgroups, is simply nw/nb:

nw/nb = [(2^p − 1)(2^p − 2) ... (2^p − 2^(p−2))] / [(2^(p−1) − 1)(2^(p−1) − 2) ... (2^(p−1) − 2^(p−2))]
      = (2^p − 1) 2^(p−2) / (2^(p−1) − 2^(p−2)) = 2^p − 1

Now we consider H and CH as subsets of GF(2^p) in order to observe their behavior under the multiplication operation in the field.

Lemma 2. Multiplying all the elements of a subgroup by some β ∈ GF(2^p), where β ≠ 0 (the additive identity), results in another distinct subgroup.

Proof: Let H denote an additive subgroup of order 2^(p−1) in GF(2^p). It is evident that multiplying all the elements of H by some β ∈ GF(2^p) preserves the additive property. Therefore βH is also a subgroup.

In order to prove that each different β results in a different subgroup, we first assume that the subgroups βH are not distinct. That is, for some β1 and β2, β1H = β2H. This results in β1β2^(−1)H = H.

We denote β1β2^(−1) = α^d, where α is a primitive element of GF(2^p). We also denote the set of powers of α of all the elements in H, except the additive identity, by H*. H* is a subset of the ring of integers modulo (2^p − 1), with (2^(p−1) − 1) elements. Multiplying all the elements of H by α^d transforms H* into d + H*, where the addition is modulo (2^p − 1).

Since α^d H = H, d + H* should also result in the set H*. Without loss of generality, we assume H* = {x0, x1, ..., x_(2^(p−1)−2)} is an ordered set, and therefore, d + H* should be some i'th cyclic shift of H*. This results in the


following set of congruence relations.

x_j + d ≡ x_(j⊕i) mod (2^p − 1);  j = 0, 1, ..., 2^(p−1) − 2

⊕ denotes addition modulo 2^(p−1) − 1. Expressing the above as equations,

x_(j⊕i) − x_j = d;  j = 0, 1, ..., 2^(p−1) − 2 − i

x_(j⊕i) − x_j + (2^p − 1) = d;  j = 2^(p−1) − 1 − i, ..., 2^(p−1) − 2

Now, summation of all the 2^(p−1) − 1 equations above gives

d = i(2^p − 1) / (2^(p−1) − 1)

Note that d has to be an integer, and that (2^p − 1) and (2^(p−1) − 1) share no common factors. Thus, i = (2^(p−1) − 1), and d = (2^p − 1). This results in β1β2^(−1) = α^(2^p − 1) = 1, and β1 = β2. Therefore, βH for each different β results in a different subgroup.

Corollary 1. Multiplying a proper coset CH by some β ≠ 0 always results in another distinct proper coset.

Corollary 1, along with the additive property briefly overviewed at the start of the section, enables us to formulate a binary expansion for NB-LDPC codes that significantly reduces the complexity at check nodes. Traditionally, check node operations are viewed as a combination of two distinct steps: permutation of symbol probabilities, and convolution of permuted probability vectors [5]. The main reason for the high decoding complexity of NB-LDPC codes is this convolution of probability vectors [7], [8].

For the purpose of explaining how the aforementioned properties are related to decoding, we assume that instead of symbol probabilities, probabilities of belonging to each specific coset are available at check nodes, and that the output of check node operations should be the same. It is interesting to note that the two sets of probabilities are of similar size: in GF(2^p), there are 2^p symbol probability values, and from Lemma 1, it can be seen that there are (2^p − 1) 'coset probability' values.

The permutation step is necessary in decoding NB-LDPC codes due to the fact that a symbol turns into a different one when multiplied by a non-binary edge weight. From Lemma 2, it is evident that this behavior is the same with proper cosets as well; a coset turns into another when multiplied by the edge weight. Thus, similar to symbol probabilities, coset probabilities also have to be permuted at check nodes. Therefore, there would be no discernible change in the permutation step of check node operations. But the impact on the convolution step is more significant, and requires a closer look.

Consider the case of a check node with three neighboring variable nodes, v1, v2, v3. Assuming edge weights equal to 1, these are related as v1 + v2 = v3. When calculating symbol probabilities for one variable node, the probability vectors of the other two have to undergo a convolution operation, because the probability of a specific output symbol is affected by those of all the input symbols. But this is not the case when it comes to coset probabilities. The output probability of belonging to a given coset is only impacted by the two input probabilities of belonging to the same coset. Using the additive properties highlighted at the start of this section, one may see that given the input probabilities of v1 and v2 belonging to the k'th coset, p_(k,v1)^(i/p) and p_(k,v2)^(i/p), the output probability p_(k,v3)^(o/p) can be calculated as

p_(k,v3)^(o/p) = p_(k,v1)^(i/p) (1 − p_(k,v2)^(i/p)) + (1 − p_(k,v1)^(i/p)) p_(k,v2)^(i/p)

This equation actually represents the operation at the check nodes of binary LDPC codes [2]. Thus, in the case of calculating coset probabilities, a single non-binary check node operates as a set of binary check nodes. We explain how this observation can be used to formulate a novel expansion of a non-binary Tanner graph in the following.

B. Graph Expansion

From the observations on how the permutation and convolution operations are impacted by using coset probabilities, it seems straight-forward to expand a non-binary Tanner graph into a binary one by replacing each non-binary node in the original graph with a set of binary nodes, one each for the (2^p − 1) cosets. A binary variable node would represent the probability of the original non-binary symbol belonging to a particular coset, whereas a binary check node would calculate estimates of the said probability. As evident from Lemma 2, how the binary variable and check nodes are to be connected is defined by the non-binary edge weights of the original graph.

As an example, consider a parity-check equation ρ of a code over GF(2^3), involving two code symbols v1 and v2. The primitive polynomial p(x) = x^3 + x + 1 has been used to generate GF(2^3), and α denotes a primitive element.

ρ ⇒ α v1 + α^6 v2 = 0    (1)

Now assume that v1 and v2 transmit the set of coset probabilities to the check node ρ. Referring to the cosets listed in Table I, one may observe that CH1 is turned into CH7 upon multiplication by α, and into CH3 when multiplied by α^6. All the other cosets behave similarly. Therefore, coset probabilities have to be permuted, and the edges in the expanded graph, between the binary variable nodes and check nodes, are defined according to this permutation.

TABLE I: Subgroups and Proper Cosets in GF(2^3)

  Subgroup                     Coset
  H1: {0, α^1, α^2, α^4}       CH1: {α^0, α^3, α^5, α^6}
  H2: {0, α^0, α^2, α^6}       CH2: {α^1, α^3, α^4, α^5}
  H3: {0, α^0, α^1, α^3}       CH3: {α^2, α^4, α^5, α^6}
  H4: {0, α^1, α^5, α^6}       CH4: {α^0, α^2, α^3, α^4}
  H5: {0, α^3, α^4, α^6}       CH5: {α^0, α^1, α^2, α^5}
  H6: {0, α^0, α^4, α^5}       CH6: {α^1, α^2, α^3, α^6}
  H7: {0, α^2, α^3, α^5}       CH7: {α^0, α^1, α^4, α^6}

Analyzing the permutation from the point of view of the parity-check matrix, one may observe that it is akin to replacing each non-binary value with a (2^p − 1) × (2^p − 1) permutation matrix M, where M(i, j) is 1 if the j'th coset turns into the i'th coset upon multiplication by that value,


and 0 otherwise. The expanded matrix Hexp1 for the parity-check equation given by (1) would take the form of (2).

H = [ α | α^6 ]

         [ 0 0 1 0 0 0 0 | 0 0 0 0 0 0 1 ]
         [ 0 0 0 1 0 0 0 | 0 0 1 0 0 0 0 ]
         [ 0 1 0 0 0 0 0 | 1 0 0 0 0 0 0 ]
Hexp1 =  [ 0 0 0 0 0 1 0 | 0 1 0 0 0 0 0 ]    (2)
         [ 0 0 0 0 0 0 1 | 0 0 0 0 0 1 0 ]
         [ 0 0 0 0 1 0 0 | 0 0 0 1 0 0 0 ]
         [ 1 0 0 0 0 0 0 | 0 0 0 0 1 0 0 ]

The straight-forward approach would work if the coset probabilities of a given variable node were independent of each other. But obviously, this is not the case. For example, consider the cosets CH1, CH2 and CH7, listed in Table I. If it is known that a symbol is in CH1, but not in CH2, then we can safely come to the conclusion that the symbol must be in CH7.

In order to capture the dependencies between coset probabilities, we present an alternate binary representation of GF(2^p) symbols. For this purpose, we first define a set of auxiliary binary random variables as follows. Given some symbol β and a coset Ci, both in GF(2^p):

aβ,i = 1;  β ∈ Ci
aβ,i = 0;  β ∉ Ci

Using these auxiliary variables, each symbol in GF(2^p) can be uniquely represented as a binary vector of length (2^p − 1), which we call the 'coset representation' of the field. For example, again consider GF(2^3). Table II shows how each symbol can be represented, using the cosets given in Table I.

TABLE II: Coset Representation of GF(2^3)

  β        aβ,1  aβ,2  aβ,3  aβ,4  aβ,5  aβ,6  aβ,7
  α^(−∞)    0     0     0     0     0     0     0
  α^0       1     0     0     1     1     0     1
  α^1       0     1     0     0     1     1     1
  α^2       0     0     1     1     1     1     0
  α^3       1     1     0     1     0     1     0
  α^4       0     1     1     1     0     0     1
  α^5       1     1     1     0     1     0     0
  α^6       1     0     1     0     0     1     1

Observing Table II, one may see that the 8 binary vectors for the symbols of GF(2^3) form a (7, 3) binary linear code. In fact, the coset representation of GF(2^p) constitutes the 2^p codewords of a (2^p − 1, p) linear code. The parity-check matrix of this linear code, which we will call the 'local code', can easily be utilized to capture the dependencies between coset probabilities. The said matrix is a (2^p − p − 1) × (2^p − 1) matrix, and the (2^p − p − 1) parity-check equations obtained from its rows are sufficient to capture the dependencies. Continuing the example with GF(2^3), the parity-check matrix of the (7, 3) code (Horg23) is given below.

          [ 1 0 1 1 0 0 0 ]
Horg23 =  [ 1 1 1 0 1 0 0 ]    (3)
          [ 0 1 1 0 0 1 0 ]
          [ 1 1 0 0 0 0 1 ]

From the point of view of the new binary graph, the parity-check equations of the local code represent an additional set of binary check nodes. For each non-binary variable node (of the original graph), (2^p − p − 1) binary check nodes have to be added to the expanded graph. This is the only modification necessary to the straight-forward approach described at the start of this section. The new set of check nodes captures all the dependencies between the binary variable nodes. These new check nodes are only connected to a subset of the set of binary nodes of a single non-binary variable node, and although their number may seem quite large, it should be noted that they are of very low degree. In the matrix representation, these additional check nodes translate to new parity-check equations, and continuing with the example of (1), the new expanded binary matrix, Hexp2, would now be given by (4), where 4 additional parity-check equations have been added to Hexp1 for each of the 2 variable nodes.

         [             Hexp1             ]
         [ 1 0 1 1 0 0 0 | 0 0 0 0 0 0 0 ]
         [ 1 1 1 0 1 0 0 | 0 0 0 0 0 0 0 ]
         [ 0 1 1 0 0 1 0 | 0 0 0 0 0 0 0 ]
Hexp2 =  [ 1 1 0 0 0 0 1 | 0 0 0 0 0 0 0 ]    (4)
         [ 0 0 0 0 0 0 0 | 1 0 1 1 0 0 0 ]
         [ 0 0 0 0 0 0 0 | 1 1 1 0 1 0 0 ]
         [ 0 0 0 0 0 0 0 | 0 1 1 0 0 1 0 ]
         [ 0 0 0 0 0 0 0 | 1 1 0 0 0 0 1 ]

When it comes to message passing decoding, it is considered desirable not to have short cycles in the graph [2]. Short cycles may create graphical structures that are detrimental to iterative decoding, such as stopping sets and trapping sets (also called near-codewords) [18], [19]. For these reasons, construction methods of LDPC codes attempt to increase the girth, the length of the shortest cycle in the code [20].

So, even though the parity-check matrix of the local code does help in capturing all the dependencies, an additional set of properties, such as girth, has to be considered to evaluate the suitability of the matrix for message passing decoding. For example, from (3) it can be observed that Horg23 has a girth of only 4, and that it contains 3 cycles of that length. This means that Horg23 is unsuitable for message passing decoding. In order to amend this, we propose carrying out row operations on the local parity-check matrix until a matrix with girth at least 6 is obtained. This approach of creating a suitable parity-check matrix has been used in iterative decoding of Reed-Solomon codes as well [21]. Since non-binary LDPC codes are not usually considered over fields larger than GF(2^8) [22], the largest parity-check matrix possible for the local code is 247 × 255, and constructing a better matrix is quite feasible. For GF(2^3), the matrix obtained through this method (H23) is given


below.

        [ 1 0 1 1 0 0 0 ]
H23 =   [ 1 0 0 0 1 1 0 ]    (5)
        [ 0 1 1 0 0 1 0 ]
        [ 1 1 0 0 0 0 1 ]

H23 is obtained simply by replacing the second row of Horg23 with the binary summation of that row and the third. An additional advantage of H23 is that it is a bit sparser than Horg23, and it has a constant row weight of 3.

Observing Table II closely, it can be seen that the codewords listed comprise the codebook of the dual code of the (7, 4) Hamming code, also called the simplex code [16]. In fact, the (2^p − 1, p) code arising from the coset representation of any field GF(2^p) is the dual code of the Hamming code of length (2^p − 1). Since Hamming codes are guaranteed to have a minimum distance of 3, it should always be possible to find a parity-check matrix with a constant row weight of 3 for the local code of any field GF(2^p). Such a matrix is also guaranteed to have no cycles of length 4, as two weight-3 binary Hamming codewords can never have more than a single common bit.

Thus, the new graph expansion we introduce can be summarized in the following three distinct steps.
1) Expand each GF(2^p) node into (2^p − 1) binary nodes
2) Obtain a parity-check matrix with no short cycles, and row weight 3, for the local code
3) Add (2^p − p − 1) 'local' check nodes to each set of (2^p − 1) binary nodes resulting from a variable node over GF(2^p)

Fig. 1 presents the expansion for the parity-check equation over GF(2^3) given by (1). In the figure, squares represent check nodes, circles variable nodes, and the extra check nodes added (local check nodes) are represented with hexagons. The shaded graph at the top is the original, non-binary Tanner graph.

Fig. 1: Graph Expansion for GF(2^3)

The graph expansion presented shares similarities with the extended binary representation first proposed in [11]. Both expansions initially result in binary matrices (2^p − 1) times larger than the original, and in both cases, the additional binary check nodes added are the constraints of a simplex code. But the extended binary representation is derived based on an endomorphism of the set of integers {0, 1, ..., 2^p − 1} and an isomorphism between that set and the vector space of dimension p [14], whereas the proposed expansion is developed using the algebraic properties of GF(2^p). As noted earlier as well, our approach leads to a novel decoding algorithm which offers attractive complexity gains. The algorithm is presented in detail, along with its distinctive features, in Section III.

III. COSET PROBABILITY BASED DECODING ALGORITHM

The decoding algorithm we present is implemented on the expanded, binary graph. Since the graph is now binary, it seems straight-forward to use the binary sum-product algorithm (SPA) [2], or its well-known simplification, min-sum (MS) decoding [17]. But there are certain special characteristics of the expanded graph which require some modifications to SPA. The new expansion replaces each GF(2^p) variable node, which corresponds to a single GF(2^p) symbol, with (2^p − 1) binary nodes. One GF(2^p) symbol carries only p bits of information and, therefore, only p of the binary nodes can receive channel estimates. Initializing the remaining set of nodes is a concern.

Two different types of check nodes are present in the expanded graph. The first type are the ones which replaced the check nodes in the original graph. Then some additional check nodes had to be added in order to capture the relationships between the coset probabilities of a single non-binary node. We will call the first category 'global' check nodes and the second 'local' check nodes.

Global check nodes can be satisfied even if the different sets of binary variable nodes do not represent symbols in GF(2^p) (in other words, codewords of the local code). Local check nodes are concerned only with the local code. The set of local checks added for a single non-binary node will be satisfied when the set of binary nodes which replaced that node represents some symbol in GF(2^p). For the local checks, it is irrelevant whether the decoder has converged to an actual codeword of the NB-LDPC code or not. These observations suggest that the two types need to be treated differently.

In the following, we first describe the steps of SPA for the sake of completeness, and then elaborate on how it should be modified to operate on the expanded graph, taking into account the aforementioned unique characteristics. An LDPC code over GF(2^p) with an m × n parity-check matrix, transmitted over the binary-input additive white Gaussian noise (BI-AWGN) channel, is considered. The description of SPA is presented step-wise: initialization, check node operations, and variable node operations.

A. Initialization

Each GF(2^p) variable node has now been replaced with (2^p − 1) binary nodes, and in the initialization step, channel estimates, called a priori probability values, should be assigned to each of these. But the set of binary nodes corresponds to only p bits of information transmitted across the channel. These p bits are represented by the message bits of the local code, and that particular set of nodes may be initialized with channel log-likelihood ratios (LLRs).

Li,t = 2 ri,t / σ^2    (6)
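The structural claims made about the local code of GF(2^3) — that the coset representation vectors of Table II are annihilated by both (3) and (5), that H23 has constant row weight 3, and that the row operation removes all length-4 cycles present in Horg23 — can be checked numerically. The following Python sketch is ours, purely illustrative, and simply hard-codes the tables and matrices given in the text:

```python
# Coset representation of GF(2^3): one length-7 binary vector per field
# symbol (rows of Table II, in the order 0, a^0, a^1, ..., a^6).
codewords = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 1],
]

H_org = [  # parity-check matrix of the (7, 3) local code, eq. (3)
    [1, 0, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
]

H_23 = [   # row-reduced version with girth at least 6, eq. (5)
    [1, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
]

def annihilates(H, cw):
    """True if every row of H is orthogonal (mod 2) to the codeword cw."""
    return all(sum(h * c for h, c in zip(row, cw)) % 2 == 0 for row in H)

def has_4_cycle(H):
    """A length-4 cycle exists iff two columns share more than one row."""
    cols = list(zip(*H))
    return any(sum(a & b for a, b in zip(cols[i], cols[j])) > 1
               for i in range(len(cols)) for j in range(i + 1, len(cols)))

# Both matrices define the same (7, 3) local code ...
assert all(annihilates(H_org, cw) for cw in codewords)
assert all(annihilates(H_23, cw) for cw in codewords)
# ... but only H_23 has constant row weight 3 and no 4-cycles.
assert all(sum(row) == 3 for row in H_23)
assert has_4_cycle(H_org) and not has_4_cycle(H_23)
```

The `has_4_cycle` test encodes the column-overlap criterion: a 4-cycle in the Tanner graph corresponds exactly to two columns of the parity-check matrix sharing ones in two (or more) rows.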


Li,t is the initial estimate, and ri,t the value received, for the t'th bit (binary node) of the i'th variable node, with t = 1, ..., p. As usual, σ^2 represents the noise variance.

The remaining (2^p − p − 1) binary nodes represent the parity bits of the local code. This means that each of those bits can be written as a linear combination of the p message bits which were initialized with channel values. The equations of the original local parity-check matrix (not modified for a large girth; given by (3) for GF(2^3)) represent these relationships. Utilizing the channel LLR values of the p message bits, initial estimates for the (2^p − p − 1) parity bits can be calculated as follows.

Li,t = 2 tanh^(−1) [ ∏_(j ∈ Et) tanh(Li,j / 2) ]    (7)

Et represents the set of message bits involved in deriving the t'th parity bit, and t = p + 1, ..., 2^p − 1.

After initialization, the binary nodes send all initial estimates Li,t to their neighboring check nodes.

B. Check Node Operations

Both global and local check nodes operate similarly to a check node of a binary LDPC decoder [2]. In the k'th decoder iteration, the j'th check node calculates a probability estimate (called a posteriori probability) for the i'th variable node in its neighborhood Mj, in the form of an LLR:

r_(j→i)^(k) = 2 tanh^(−1) [ ∏_(i′ ∈ Mj; i′ ≠ i) tanh( r_(i′→j)^(k−1) / 2 ) ]    (8)

where r_(i→j)^(k−1) is the LLR from the i'th variable node to the j'th check node in the previous iteration.

C. Variable Node Operations

Similarly to a binary LDPC decoder [2], variable nodes first need to combine the a posteriori probability values received from neighboring check nodes with the a priori probability from the channel. Since the decoder operates in the LLR domain, this is done by simple addition of the values. Combining the probability estimates at variable node i in the k'th decoding iteration is carried out as follows.

R_i^(k) = L_i + Σ_(j ∈ N_i^g ∪ N_i^l) r_(j→i)^(k)    (9)

Here L_i denotes the initial estimate of binary node i, and N_i^g and N_i^l denote the sets of global and local check nodes neighboring node i.

Then the number of allowed decoding iterations is checked. If that number has been reached, the decoder terminates with failure; or else, extrinsic messages are calculated for another iteration, as given in equation (10).

r_(i→j)^(k+1) = R_i^(k) − r_(j→i)^(k);  j ∈ N_i^g ∪ N_i^l    (10)

D. Modifications to SPA

Using the sum-product algorithm as highlighted in the previous sub-sections does not address the peculiarities of the expanded binary graph which were briefly overviewed at the start of this section. The major concerns raised there were:
1) Calculation of initial estimates for the (2^p − p − 1) bits out of the (2^p − 1) bits that represent a non-binary symbol
2) The need to distinguish between the estimates of global and local check nodes, which focus on different aspects of the expansion

As detailed in subsection A, an initial estimate for the bits that are not transmitted through the channel can be calculated using the channel LLR values of the p transmitted bits. But there is no guarantee on the accuracy of those LLRs, and errors there may propagate into the calculated initial estimates, which can affect the overall performance of the decoder. Also, these estimates do not reflect a priori probabilities in the true sense. Therefore, the calculated initial estimates can be considered less reliable than the ones received from the channel. To reflect this, we suggest using a scaling factor β (0 < β ≤ 1) when calculating the estimates. The optimal value of β has to be found through simulations for each code. This coefficient is only used in initialization, and because of that, the impact of the scaling operation on decoding complexity is negligible. Equation (7) is slightly modified to represent the scaling operation as follows.

Li,t = 2β tanh^(−1) [ ∏_(j ∈ Et) tanh(Li,j / 2) ]    (11)

The variable node operations of SPA, presented in subsection C, do not distinguish between estimates from global check nodes and those from local check nodes. But, as discussed earlier, there is a need for this, since each type of check node deals with different code constraints; global check nodes handle the constraints resulting from the original code, and local check nodes deal with the constraints enforced by the coset representation. The
X (k) X (k)
Ri = L i + rj →
−i+ rj →
−i (9)
obvious way of differentiating between the two types is to
j∈Nig j∈Nil
scale the estimates with different weighting factors, wl for the
Nig and Nil denote global and local check nodes in the ones from local check nodes and wg for global check nodes
neighborhood of variable node i. Li is its a priori probability (0 < wl , wg ≤ 1). We suggest setting wg = 1, and finding the
estimate. appropriate value of wl through simulations.
(k)
Based on Ri values, some ‘tentative’ decision needs to A set of binary variable nodes, those which represent parity
be taken to check whether the decoder has converged. Only bits in the local code, do not really have a priori probability
the p message bits of the local code need to be considered for estimates. They are initialized with a value calculated as in
this step, since they alone form a symbol in GF (2p ). Decision (9), which is then sent to check nodes as the a priori estimate.
rule for the bit value is the same as in a binary decoder [2], This value is calculated using some of the constraints of the
where only a comparison against zero is necessary. GF (2p ) local code, and the local check nodes also represent a set of
symbols are obtained as length p binary vectors. Syndrome these. Whether the constraints used in the calculation and those
check is carried out with these, and if the syndrome is zero, represented by the local checks have any in common depends
which means the decoder has converged to a codeword of the on the parity-check matrix used for the local code (step 2
NB-LDPC code, it terminates with success. If not, maximum of the expansion, detailed in Section II-B). Still, sending a

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Southeast University. Downloaded on March 11,2020 at 03:52:48 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2961884, IEEE
Transactions on Communications

priori estimates calculated using the local code back to check parameters for the two versions of the algorithm. Therefore,
nodes which represent the same code seem to be in contrast we suggest using the min-sum simplification only for equation
to the extrinsic principle of message passing, and could create (8), which is then modified as;
undesirable correlations between messages from the very first
decoding iteration. This should also be reflected when setting (k)
Y (k−1) (k−1)
rj →
−i = sign(ri0 →
− j ). min |ri0 →
−j| (14)
a value for wl . i0 ∈Mj ;i0 6=i
i0 ∈Mj ;i0 6=i
Equation (9) is slightly modified as follows to represent the
usage of weighting factor wl . In the case of binary codes, it has been identified that the
(k)
X (k) X (k) deterioration in decoding performance resulting from min-sum
Ri = Li + rj →
− i + wl rj →
−i (12) simplification is due to the a posteriori probabilities calculated
j∈Nig j∈Nil by check node being overestimates of the actual value [23].
Equation (10) is not modified for the global check nodes (j ∈ Various methods to ‘correct‘ this overestimation error has been
Nig ), but for local check nodes, it now takes the following suggested in the literature [23], [24], [25], among which the
form. simplest is to use a constant normalization factor δ (0<δ≤1)
(k+1) (k) (k) l
ri→− j = Ri − wl .rj →− i ; j ∈ Ni (13) [23]. We also follow this method to improve performance
of the simplified version of the algorithm, and in this case,
As explained, scaling a posteriori probability estimates equation (14) gets slightly modified as follows.
received from local check nodes is a necessary operation in
each decoder iteration. Therefore, in the interest of keeping
(k) (k−1) (k−1)
Y
the complexity low, we suggest using values of the form rj →
− i = δ. − j ).
sign(ri0 → min −j|
|ri0 → (15)
i0 ∈Mj ;i0 6=i
1/2r for wl . Then the scaling operation simply becomes a i0 ∈Mj ;i0 6=i
set of bit shifts, which are of almost negligible complexity at
hardware level. Optimal value for r, has to be obtained through IV. S IMULATION R ESULTS
simulations for each code. For certain codes, marginal gains
In this section, we evaluate performance of the new algo-
in performance (around 0.1dB) were observed with scaling
rithm, ‘Coset Probability based Decoding (CPbD)’, against
factors not of the form 1/2r , but for the majority, there were
well-known decoding algorithms for NB-LDPC codes. We as-
no observable gains.
sess performance of three versions of CPbD; without any mod-
Comparing the decoding schemes presented in [13] and [14]
ifications to SPA, as highlighted in Section III-A-C (CPbD),
with the algorithm proposed here, we note some important
with the modifications suggested in Section III-D (CPbD-
differences. [13] presents an algorithm from the family of FFT-
M), and then with both the modifications and the min-sum
QSPA [5], which means the complexity advantages one might
simplification (CPbD-MS). Existing algorithms considered in-
expect from using a binary expansion are not available. In [14],
clude the optimum decoding algorithm QSPA in its LLR-
all (2p −1) bits are transmitted when using the expansion with
domain implementation [6], and two simplifications, min-max
soft decoding, thereby lowering the code rate, and both types
algorithm [8] and max-log-SP algorithm [6]. We note that
of constraints are treated similarly. In contrast, the algorithm
max-log-SP algorithm is a special case of extended min-sum
we proposed employs bit-level decoding, thereby gaining
(EMS) algorithm [7], where the two parameters nm and nc
significant complexity advantages, and distinguishes between
are set to the maximum values possible, respectively the size
the two types of constraints for additional performance gains.
of the field (2p ) and check-node degree (dc ).
Furthermore, since the algorithm requires only p bits to be
Simulations were done over the BI-AWGN channel, with
transmitted per symbol, there is no loss of rate, and the normal
BPSK modulation. Optimum values of the two parameters of
transmission setup requires no modification. We also note that
CPbD-M, β and wl , and the normalization factor δ of CPbD-
the matrix used for decoding in [14] is not the extended binary
MS, were found through simulations.
representation of the original non-binary matrix, but a matrix
obtained from optimizing the extended representation for girth.
100

E. Simplifications
10-1
In the interest of further reducing decoding complexity,
certain modifications could be applied to the decoding algo- 10-2

rithm. In particular, calculation of tanh and tanh−1 functions,


FER

required in equations (7) and (8), is a complex operation at 10-3

hardware level. In binary LDPC decoders, these operations


10-4
are simplified to a level where only comparisons are required qspa
min-max
[17]. This version of SPA is known popularly as ‘min-sum‘ 10-5
max-log-sp
cpbd
decoding, and although there is some performance loss, it is cpbd-m
cpbd-ms
widely used in practice due to complexity savings. 10-6
2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4 4.2 4.4
Since equation (7) is only required during initialization, its Eb /N0 (dB)
impact on total decoding complexity is minimal. Also, mod-
ifying that equation would be akin to setting different initial Fig. 2: FER Perf. with a (816,408) code over GF (23 ) (C1 )
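For concreteness, the update rules of Section III — the initialization (7)/(11), the check and variable node operations (8)–(13), and the (normalized) min-sum simplification (14)/(15) — can be sketched in a few lines. This is an illustrative Python sketch under our own naming and per-edge list representation; it is not the authors' implementation (which, per Section V, was written in C).

```python
import math

def tanh_check(incoming):
    """SPA check node rule (8): for each edge, combine all other
    incoming LLRs as 2*atanh(prod(tanh(r/2)))."""
    out = []
    for i in range(len(incoming)):
        prod = 1.0
        for k, r in enumerate(incoming):
            if k != i:
                prod *= math.tanh(r / 2.0)
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)  # numerical guard
        out.append(2.0 * math.atanh(prod))
    return out

def minsum_check(incoming, delta=1.0):
    """Min-sum rule (14); delta < 1 gives the normalized version (15)."""
    out = []
    for i in range(len(incoming)):
        sign, mag = 1.0, math.inf
        for k, r in enumerate(incoming):
            if k == i:
                continue
            sign *= -1.0 if r < 0.0 else 1.0
            mag = min(mag, abs(r))
        out.append(delta * sign * mag)
    return out

def init_parity_llr(msg_llrs, beta=1.0):
    """Initial estimate (7)/(11) of a local parity bit from the channel
    LLRs of the message bits in its defining set E_t."""
    prod = 1.0
    for L in msg_llrs:
        prod *= math.tanh(L / 2.0)
    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
    return 2.0 * beta * math.atanh(prod)

def combine(L, r_global, r_local, w_l=1.0):
    """A posteriori combining (9)/(12) at a binary variable node:
    local-check estimates are scaled by w_l."""
    return L + sum(r_global) + w_l * sum(r_local)

def extrinsic(R, r_from_check, w_l=1.0, local=False):
    """Extrinsic messages (10)/(13); the message from a local check
    node is subtracted after scaling by w_l."""
    return R - (w_l * r_from_check if local else r_from_check)
```

For a w_l of the form 1/2^r, as suggested above, the multiplication by w_l in `combine` and `extrinsic` reduces to a bit shift in a fixed-point hardware implementation.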


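As a concrete illustration of the simulation setup described in Section IV (BI-AWGN channel with BPSK modulation), the a priori channel LLRs fed to the decoders can be generated as follows. The mapping L = 2r/σ² is the standard one for this channel; the function name and interface below are our own, not part of the paper.

```python
import math, random

def bpsk_awgn_llrs(bits, ebn0_db, rate):
    """BPSK over the BI-AWGN channel: transmit x = 1 - 2b, receive
    r = x + n with n ~ N(0, sigma^2), where sigma^2 = 1/(2*R*Eb/N0)
    for code rate R; the channel LLR of each bit is 2r/sigma^2."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * ebn0)
    sigma = math.sqrt(sigma2)
    llrs = []
    for b in bits:
        r = (1.0 - 2.0 * b) + random.gauss(0.0, sigma)
        llrs.append(2.0 * r / sigma2)
    return llrs
```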
Fig. 2 shows the FER performance of the algorithms with C1, a rate 1/2 code over GF(2³), 816 symbols in length. The code was generated through random re-labeling of a regular binary LDPC code of column weight 5, obtained from [26]. Optimum parameters were found to be β = 0.5, w_l = 0.25 and δ = 0.7. The maximum number of decoding iterations was set to 50 for all algorithms.

It can be seen that the performance of CPbD without the modifications is quite unsatisfactory. It has a gap of around 1dB with QSPA, at a FER of 10⁻⁴. But with the usage of optimized scaling factors, as suggested in Section III-D, performance improves significantly. In fact, CPbD-M is the algorithm which performs closest to QSPA. The gap between the two is less than 0.3dB at a FER of 10⁻⁴. At the same FER level, CPbD-M outperforms the min-max algorithm by around 0.1dB, and the max-log-SP algorithm by close to 0.5dB. With the min-sum simplification, the performance of CPbD-M deteriorates by about 0.3dB, and the gap with the optimum decoder is now less than 0.6dB. Even though the min-max algorithm outperforms CPbD-MS, the latter is still better than the max-log-SP algorithm, which can be considered as the direct application of min-sum decoding to NB-LDPC codes [7].

Fig. 3 shows the performance of the algorithms with C2, a rate 0.89 code over GF(2⁴), 1998 symbols in length. Again, the code was constructed through random re-labeling of a regular binary graph obtained from [26]. In this case, the code has a column weight of 4. Optimum parameters were found to be β = 0.7, w_l = 0.25 and δ = 0.9, and the maximum number of decoding iterations was set to 50.

For this code, the min-max algorithm performs closest to QSPA. At a FER of 10⁻⁴, the gap between min-max and QSPA is less than 0.1dB. At the same level, CPbD-M has a gap of about 0.35dB with QSPA. It should be noted that this gap increases to around 0.9dB at a FER of 10⁻³ if CPbD is used without the scaling factors. The min-sum simplification of CPbD-M loses a further 0.1dB, which makes its gap with QSPA less than 0.5dB. Once more, the max-log-SP algorithm is outperformed by CPbD-MS, although in this case the gap between the two is very small, about 0.02dB.

[Figure 3: FER vs Eb/N0 (dB) curves for qspa, min-max, max-log-sp, cpbd, cpbd-m and cpbd-ms]
Fig. 3: FER Perf. with a (1998,1776) code over GF(2⁴) (C2)

FER performance of the proposed algorithms with a rate 0.5 code over GF(2⁴), 32 symbols in length (C3), is presented in Fig. 4. This code was originally proposed in [27] based on an irregular protograph, and is also presented as a candidate for short block length transmissions in [28]. Here, we relax the condition on w_l in the interest of better performance, which was achieved at w_l = 0.3. Optimum values for β and δ were 0.2 and 0.85 respectively. The maximum number of decoder iterations was set to 100.

[Figure 4: FER vs Eb/N0 (dB) curves for qspa, min-max, cpbd-m, cpbd-ms, b-ldpc, b-polar(L=8) and b-polar(L=32)]
Fig. 4: FER Perf. with a (32,16) code over GF(2⁴) (C3)

It can be seen that in this case, the min-max algorithm performs within 0.15dB of QSPA, at a FER of 10⁻⁵. CPbD-M shows a higher gap of around 0.5dB at the same FER level. With the min-sum simplification, a further loss of 0.07dB is observed. However, it does offer considerable complexity savings, which are presented in Section V. The new algorithm without modifications was not considered for this code due to poor performance, while the performance of max-log-SP was quite close to that of min-max decoding. The respective curves were removed in the interest of a clearer figure.

We also evaluate the performance of binary LDPC and polar codes of the same rate and bit length. The binary LDPC code used is the (128,64) code specified in [28]. Performance of polar codes is evaluated with CRC-aided successive cancellation list (SCL) decoding, with two list sizes, 8 and 32. Performance with list size 32 is from [29]. The polar code outperforms all LDPC codes in both settings, but interestingly, when the list size is 8, the gap between the polar code and NB-LDPC using QSPA diminishes with error rate, going from around 0.45dB at a FER of 10⁻¹ to 0.15dB at 10⁻⁴, and a performance crossover can be expected at higher SNRs. When the list size is 32 though, a consistent gap of around 0.75dB with QSPA can be observed. It should be noted that this configuration of SCL is four times as complex as the one where the list size is 8. The binary LDPC code with SPA performs worst of the lot, and a gap of 0.8dB with QSPA can be observed. CPbD-M, which is also a binary decoder, has a gain of 0.3dB over the binary LDPC code.

When it comes to decoding complexity, binary SPA has an edge over SCL, particularly since SPA is highly parallelizable, yet with a significant performance loss of close to 1dB, which increases to around 1.6dB when the list size is larger. NB-LDPC codes close this gap to just 0.15dB for list size 8, and 0.75dB for list size 32. They even outperform polar codes at slightly longer lengths, but the highly complex nature of QSPA is a considerable disadvantage. In contrast, the proposed algorithm has a complexity close to that of SPA, but with less performance loss. Thus, it manages to reduce the gap between binary LDPC and polar codes significantly, while not losing the complexity advantage of a highly parallelizable binary decoder.

Results show that when used with the proposed modifications, the new algorithm, CPbD-M, performs quite close to the best known decoder for NB-LDPC codes, QSPA. Its simplified version does not lose much performance, and it can be further improved using an advanced method for correcting the min-sum approximation error. In particular, the methods discussed in [24] and [25] report substantial gains over the factor correction method used in the simulations. Since using the new algorithm without scaling factors results in substantial performance losses, we do not consider that version of the algorithm further in this paper, and refer to the version with scaling factors simply as 'CPbD'. Both CPbD and CPbD-MS offer substantial advantages in terms of complexity and hardware implementation, which are discussed in detail in Section V.

Interestingly, from simulations carried out for CPbD with a number of different codes, it was observed that the gap it has with QSPA is somewhat related to the column weight of the code. With increasing column weight, the gap seems to reduce. This phenomenon can be explained from the point-of-view of a 'parity bit' of the local code. These variable nodes do not receive a priori estimates from the channel, and are initialized with calculated values, which makes their initial estimates less reliable. When the column weight is high, variable nodes receive more a posteriori estimates, which reduces the negative impact of the unreliable initial estimate in the following iterations. A similar idea is used in [30] to explain the improved performance of irregular LDPC codes over regular ones.

From analyzing the impact of 'calculated' initial estimates, one may obtain an idea of how to choose the scaling factor β. If the code has a low row weight, then the negative impact of an erroneous a priori estimate could be higher, since check node calculations would be impacted more by that estimate. As explained earlier, calculated initial estimates are not as reliable as channel estimates, and, as such, they have a higher chance of being in error. Therefore, for codes with lower row weights, a smaller value for β is advisable. Since low rate LDPC codes usually have a low row weight, the optimum value for β would generally be smaller for those. This can be seen from the results presented in this paper as well. For the rate 1/2 codes, β is 0.5 and 0.2, while for the rate 0.89 code, β is 0.7.

V. COMPLEXITY AND RESOURCE REQUIREMENTS

In the following, we compare the decoding complexities of CPbD and CPbD-MS with those of the two popular versions of QSPA, LLR-QSPA [6] and FFT-QSPA [5], and min-max decoding [8]. An NB-LDPC code over GF(2^p), transmitted over the BI-AWGN channel, is considered for the comparison. The complexities of the two major steps of message passing decoding, check node computations and variable node computations, are compared separately. We consider a single node of each type for the comparison. Since the new graph expansion replaces each node over GF(2^p) with (2^p − 1) binary nodes, the complexity of that many nodes is considered together for CPbD and CPbD-MS. Also, as explained in Section II, for each GF(2^p) variable node, (2^p − p − 1) binary check nodes are added to the new graph. Because of that, in order to keep the comparison fair, the complexity of one such set of nodes is added to that of the variable node operations.

Table III lists the complexities of the check node operations of each algorithm, and Table IV considers the variable node operations. We use d_c and d_v to denote, respectively, the average degree of a check node and a variable node in the original Q-ary graph. d_lc denotes the average degree of a local check node. This would depend on the parity-check matrix used for the local code, but as explained in Section II-B, it is always possible to find a matrix with d_lc = 3.

At hardware level, apart from the number of operations, the type of operation is another consideration for complexity. Operations such as multiplications are considered more complex than, say, comparisons in hardware [31]. Therefore, the number of operations of each type, comparisons (Comp), additions (Add), multiplications (Mult) and table look-ups (LUT), are given in the tables. It has been assumed that the max* operation of LLR-QSPA [6], the transformation between log and probability domains necessary in FFT-QSPA [5], and the tanh, tanh⁻¹ calculations for CPbD are carried out with the help of look-up tables. For the check node operations of LLR-QSPA and min-max decoding, it has also been assumed that the forward-backward approach is used [6], [8].

TABLE III: Check Node Complexity

Algorithm | Comp | Add | Mult | LUT
FFT-QSPA | 0 | 2·d_c·2^p·p | 2·d_c·2^p | 2·d_c·2^p
LLR-QSPA | 0 | 3·d_c·2^{2p} | 0 | 3·d_c·2^{2p}
Min-Max | 3·d_c·2^{2p} | 0 | 0 | 0
CPbD | 0 | 0 | 2·d_c·2^p | 2·d_c·2^p
CPbD-MS | 2·d_c·2^p | 0 | 0 | 0

Observing Table III, it can be seen that CPbD and FFT-QSPA require the same number of multiplications and table look-ups for check node operations. But CPbD does not require the large number of additions necessary for FFT-QSPA, and therefore has an advantage in complexity. LLR-QSPA requires a much larger number of additions and table look-ups than FFT-QSPA, and thus it is also more complex than CPbD. Although the min-max algorithm requires a large number of comparisons, in fact more comparisons than the total number of operations required by CPbD, comparisons are the only operation necessary for the algorithm. Taking into account the fact that comparisons are simpler than multiplications or table look-ups, we can expect CPbD and min-max to be of a similar complexity level. As expected, CPbD-MS has the lowest check node complexity. It requires only a 1/2^p fraction of the comparisons necessary for min-max decoding. Calculation of the sign in the min-sum simplification [17] is disregarded since it can be done with simple XOR operations, which are of negligible complexity.
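The formulas of Table III are easy to evaluate for a concrete code. The sketch below is our own layout of those counts, with an example field GF(2³) and an illustrative check-node degree d_c = 6 (the degree value is ours, chosen only for the example):

```python
def check_node_ops(p, dc):
    """Evaluate the Table III formulas: per-check-node counts of
    (comparisons, additions, multiplications, table look-ups)
    for a code over GF(2^p) with check-node degree dc."""
    q = 2 ** p  # field size
    return {
        "FFT-QSPA": (0, 2 * dc * q * p, 2 * dc * q, 2 * dc * q),
        "LLR-QSPA": (0, 3 * dc * q * q, 0, 3 * dc * q * q),
        "Min-Max":  (3 * dc * q * q, 0, 0, 0),
        "CPbD":     (0, 0, 2 * dc * q, 2 * dc * q),
        "CPbD-MS":  (2 * dc * q, 0, 0, 0),
    }

# e.g. GF(2^3) with dc = 6: CPbD-MS needs 96 comparisons per check
# node, against 1152 for min-max decoding.
```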


For variable node operations, all the algorithms need the same number of additions. CPbD is at a disadvantage here since it is the only algorithm which requires multiplications and table look-ups (for the 'local' check node operations). But since d_lc, as explained in Section II-B, can always be set to 3, the additional number of operations is not large enough to eclipse the complexity advantage gained at check nodes. Both CPbD and CPbD-MS have the slight advantage of requiring just p comparisons to obtain a tentative hard decision, whereas all the others need to compare 2^p values. It should be noted that min-max decoding is slightly more complex than the two versions of QSPA at variable nodes, since it requires a much larger number of comparisons. CPbD-MS uses roughly the same number of comparisons and additions as min-max decoding. Therefore, it is also slightly more complex than the two QSPA versions at variable nodes.

TABLE IV: Variable Node Complexity

Algorithm | Comp | Add | Mult | LUT
FFT-QSPA | 2^p | 2·d_v·2^p | 0 | 0
LLR-QSPA | 2^p | 2·d_v·2^p | 0 | 0
Min-Max | d_v·2^p | 2·d_v·2^p | 0 | 0
CPbD | p | 2·d_v·2^p | 2·d_lc·(2^p − p) | 2·d_lc·(2^p − p)
CPbD-MS | p + 2·d_lc·(2^p − p) | 2·d_v·2^p | 0 | 0

When compared with the two major steps discussed, the impact of initialization on decoding complexity is much smaller. But CPbD and CPbD-MS offer some advantages there as well. When the channel in use is a binary input channel, all the standard decoders of NB-LDPC codes have to construct symbol probabilities from bit probabilities, which requires some real number operations. When using the binary expansion, this is not necessary. For the 'message' bits of the local code, bit probabilities can be simply assigned, and calculations have to be done only for the parity bits. Since the number of parity bits is (2^p − p − 1), and there are 2^p symbol probabilities, fewer operations are required in the initialization of CPbD and CPbD-MS.

It is worth noting here that the soft decoding scheme proposed in [13] based on the extended binary representation [11] is of the same complexity order as FFT-QSPA. The hybrid decoding scheme proposed in [14] based on the same representation, 'Hybrid Parallel Decoding' (HPD), relies on transmitting an additional set of parity bits, and thus cannot be fairly compared with the algorithms we have so far considered. Nevertheless, we note that the soft decoding part in HPD is of roughly the same complexity order as CPbD at check nodes, and as FFT/LLR-QSPA at variable nodes. In addition, a number of hard decoding iterations are also carried out.

Apart from check node and variable node complexities, the number of iterations required to converge is an important factor in determining the total complexity of decoding algorithms. Table V reports the decoding thresholds of the algorithms, along with the number of iterations, and the total number of operations (of each type) required in decoding one codeword of C1, the rate 1/2 code over GF(2³) used in Section IV. Decoding thresholds and numbers of iterations were derived from analyzing BER plots obtained through Monte Carlo simulations for different numbers of iterations.

TABLE V: Decoding Thresholds, Iterations, and Number of Operations Required

Alg. | Dec. Thr. | Itr. | Comp (×10⁵) | Add (×10⁵) | Mult (×10⁵) | LUT (×10⁵)
FFT-QSPA | 2.9dB | 20 | 1.326 | 52.22 | 13.06 | 13.06
LLR-QSPA | 2.9dB | 20 | 1.326 | 169.8 | 0 | 156.7
Min-Max | 3.3dB | 17 | 138.7 | 11.1 | 0 | 0
CPbD | 3.2dB | 20 | 0.49 | 13.05 | 17.95 | 17.95
CPbD-MS | 3.4dB | 20 | 15.75 | 13.05 | 0 | 0

Summarizing the results in Tables III, IV, and V, and the insights on initialization, we can conclude that LLR-QSPA and FFT-QSPA are the two most complex algorithms, and that CPbD-MS is the least complex. Min-max decoding and CPbD are at the same level of complexity, in-between the two extremes. This conclusion is further validated by Table VI, which lists the average time to complete one iteration with three separate LDPC codes, for all the algorithms. The codes are the ones considered in Section IV, and the settings are the same as the ones used there. All the algorithms were implemented in C, and tested on the same machine with a 2.4-GHz CPU and 8 GB of RAM.

TABLE VI: Decoding Latency per Iteration

Algorithm | (816,408) ov. GF(2³) | (1998,1776) ov. GF(2⁴) | (32,16) ov. GF(2⁴)
FFT-QSPA | 9.5ms | 44.16ms | 0.166ms
LLR-QSPA | 7.65ms | 47.45ms | 0.189ms
Min-Max | 4.8ms | 32.31ms | 0.147ms
CPbD | 5.1ms | 26.4ms | 0.122ms
CPbD-MS | 1.66ms | 8.9ms | 0.054ms

Table VI presents some interesting observations. First, FFT-QSPA appears more complex than LLR-QSPA for the code over the smaller field. The complexity advantage FFT-QSPA offers is at check node operations. Comparing the complexity orders of the two algorithms, the O(2^p·p) of FFT-QSPA is more of an advantage over the O(2^{2p}) of LLR-QSPA when the field is larger. Thus, with an increase in field order, FFT-QSPA should become less complex than LLR-QSPA, and we see this in Table VI.

A similar argument may be used to analyze the complexity advantages offered by CPbD and CPbD-MS. Observing Table IV, it seems that the two new algorithms have a slight complexity disadvantage at variable node operations, due to the additional check nodes added to the graph. But there is a big gain at check nodes, where the complexity order is now O(2^p). Comparing this with the complexity orders of the other algorithms, which are O(2^{2p}) and O(2^p·p), it becomes apparent that the gains should get more prominent with increasing field order.

Comparing the decoding latency of CPbD with the fastest QSPA version, it can be seen that gains of 33% for C1, 40% for C2, and 26% for C3 are achieved. For CPbD-MS, the gains are 78%, 80% and 67% respectively, which translate to speed-ups of around 3-5 times. It should also be noted that the complexity gains offered by min-max decoding are 37%, 27%, and 11% for the three codes.

Apart from the complexity advantages discussed, CPbD and CPbD-MS have features which make them attractive for hardware implementations. As [31] notes, the finite field operations at check nodes for handling the permutations of probability vectors due to non-binary edge weights are only feasible at hardware level when the field size is small. But with the new expansion, these operations are not strictly necessary. Given the code to be used, it is now possible to hard-wire the connections between binary nodes, similar to the architecture suggested for FFT-QSPA in [31]. Also, since the new algorithms only use LLR domain operations, which have better numerical stability, they are more suited for hardware implementations [31]. Since the expansion adds (2^p − 1) binary nodes for every GF(2^p) node, CPbD and CPbD-MS do not offer any advantage in memory. Still, it should be noted that the messages passed along the edges of the new graph are just single values, not length-2^p vectors as in existing decoders.

Moreover, although the new algorithms are for decoding NB-LDPC codes, the operations utilized are the ones used in binary LDPC decoders. Therein lies the major advantage in terms of hardware implementations. With CPbD, all the technologies developed and optimized for binary LDPC codes over the years can be used for decoding NB-LDPC codes.

VI. CONCLUSIONS

In this paper, we proposed a new decoding algorithm for NB-LDPC codes over GF(2^p), which is based on a novel binary expansion of non-binary Tanner graphs. While the algorithm offers better complexity-performance trade-offs than the existing decoders for NB-LDPC codes, its simplified version is able to achieve speed-ups of several factors with

[3] M. C. Davey and D. J. C. MacKay, "Low-density parity check codes over GF(q)", IEEE Comm. Lett., vol. 2, no. 6, pp. 165-167, June 1998
[4] H. Song and J. R. Cruz, "Reduced complexity decoding of Q-ary LDPC codes for magnetic recording", IEEE Trans. on Magnetics, vol. 39, no. 2, pp. 1081-1087, Mar. 2003
[5] L. Barnault and D. Declercq, "Fast decoding algorithm for LDPC over GF(2^q)", Proc. of IEEE ITW, Paris, France, Apr. 2003
[6] H. Wymeersch et al., "Log-domain decoding of LDPC codes over GF(q)", Proc. of IEEE ICC, Paris, France, Jun. 2004
[7] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)", IEEE Trans. on Comm., vol. 55, no. 4, pp. 633-643, Apr. 2007
[8] V. Savin, "Min-Max decoding for non binary LDPC codes", Proc. of IEEE ISIT, Toronto, Canada, July 2008
[9] Q. Huang and S. Yuan, "Bit reliability-based decoders for non-binary LDPC codes", IEEE Trans. on Comm., vol. 64, no. 1, pp. 38-48, Jan. 2016
[10] M. Zhang et al., "On bit-level decoding of nonbinary LDPC codes", IEEE Trans. on Comm., vol. 66, no. 9, pp. 3736-3748, Sep. 2018
[11] V. Savin, "Binary linear-time erasure decoding for NB-LDPC codes", Proc. of IEEE ITW, Taormina, Italy, Oct. 2009
[12] L. P. Sy et al., "Extended non-binary low-density parity-check codes over erasure channels", Proc. of ISWCS, Aachen, Germany, Nov. 2011
[13] V. Savin, "Fourier domain representation of non-binary LDPC codes", Proc. of IEEE ISIT, Cambridge, USA, Aug. 2012
[14] Y. Yu et al., "Generalized binary representation for the nonbinary LDPC code with decoder design", IEEE Trans. on Comm., vol. 62, no. 9, pp. 3070-3083, Sep. 2014
[15] J. J. Rotman, "Advanced Modern Algebra", 1st ed. Prentice Hall, 2003, pp. 82-94
[16] F. J. MacWilliams and N. J. A. Sloane, "The Theory of Error-Correcting Codes", 1st ed. Amsterdam: North-Holland, 1977
[17] M. Fossorier et al., "Reduced complexity iterative decoding of low-density parity check codes based on belief propagation", IEEE Trans. on Comm., vol. 47, no. 5, pp. 673-680, May 1999
[18] T. J. Richardson and R. L. Urbanke, "Efficient encoding of low-density parity-check codes", IEEE Trans. on Inf. Th., vol. 47, no. 2, pp. 638-656, Feb. 2001
[19] C. Di et al., "Finite-length analysis of low-density parity-check codes on the binary erasure channel", IEEE Trans. on Inf. Th., vol. 48, no. 6, pp. 1570-1579, June 2002
[20] Y. Wang et al., "Hierarchical and high-girth QC LDPC codes", IEEE Trans. on Inf. Th., vol. 59, no. 7, pp. 4553-4583, July 2013
[21] J. Jiang and K. R. Narayanan, "Iterative soft-input soft-output decoding of Reed-Solomon codes by adapting the parity-check matrix", IEEE Trans. on Inf. Th., vol. 52, no. 8, pp. 3746-3756, Aug. 2006
[22] J. Sayir, "Non-binary LDPC decoding using truncated messages in the Walsh-Hadamard domain", Proc. of IEEE ISITA, Melbourne, Australia, Oct. 2014
[23] J. Chen and M. Fossorier, "Near optimum universal belief propagation based decoding of low-density parity check codes", IEEE Trans. on Comm., vol. 50, no. 3, pp. 406-414, Aug. 2002
[24] V. Savin, "Self-corrected min-sum decoding of LDPC codes", Proc. of IEEE ISIT, Toronto, Canada, July 2008
[25] X. Wu et al., "Adaptive-normalized/offset min-sum algorithm", IEEE Comm. Letters, vol. 14, no. 7, pp. 667-669, July 2010
[26] D. J. C. Mackay, “Encyclopedia of Sparse Graph Codes”, [Online].
a minimal performance loss. Both versions of the algorithm Available: http://www.inference.org.uk/mackay/codes/data.html.
have features which make them very attractive for hardware [27] L. Dolcheck et al, “Non-binary protograph-based LDPC Codes: Enu-
implementations. New expansion has features which may offer merators, analysis, and designs”, IEEE Trans. on Inf. Th., vol. 60, no. 7,
pp. 3913-3941, Apr. 2014
advantageous in applications different to decoding NB-LDPC [28] Consultative Committee for Space Data Systems, “Short block length
codes as well. LDPC codes for TC synchronization and channel coding”, Experimental
Specification, CCSDS 231.1-O-1, Apr. 2015
[29] G. Liva et al, “Code design for short blocks: A survey”, arXiv preprint,
ACKNOWLEDGMENT 2016, [Online]. Available: https://arxiv.org/abs/1610.00873
The authors would like to thank Mr. Mohommad Rowshan [30] M. C. Davey, “Error Correction Using Low-Density Parity-Check
Codes”, Ph.D Thesis, Cambridge Uni., UK, Dec. 1999
for help with the polar code results. [31] C. Spagnol et al, “Hardware implementation of GF (2m ) LDPC de-
coders”, IEEE Trans. Circuits Syst. I, vol. 56, no. 12, pp. 2609-2620,
R EFERENCES Mar. 2009

[1] R. G. Gallager, “Low-density parity-check codes”, IRE Trans. on Inf. Th.,


vol. IT-8, pp. 21-28, Jan. 1962
[2] D. J. C. MacKay, “Good error-correcting codes based on very sparse
matrices”, IEEE Trans. on Inf. Th., vol. 46, no. 2, pp. 399-431, Mar.
1999
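Section V notes that the expansion trades (2^p − 1) binary nodes per GF(2^p) symbol for single-valued LLR messages, and that CPbD reuses the operations of binary LDPC decoders. The sketch below is ours, not from the paper: the function names and storage accounting are illustrative, and the check-node routine shown is the generic binary min-sum update, not CPbD-MS itself.

```python
def expansion_stats(p, num_symbol_nodes):
    """Storage comparison for GF(2^p): a conventional NB-LDPC decoder
    passes a length-2^p message vector per edge, while the binary
    expansion uses (2^p - 1) binary nodes per symbol node but only a
    single scalar LLR per edge."""
    q = 2 ** p
    return {"binary_nodes": (q - 1) * num_symbol_nodes,
            "values_per_edge_nb": q,    # vector message, NB decoder
            "values_per_edge_bin": 1}   # scalar LLR, expanded graph

def min_sum_check_update(in_llrs):
    """Standard binary min-sum check-node update (assumes nonzero LLRs):
    outgoing LLR i = (product of the signs of the other inputs)
                     * (minimum magnitude of the other inputs)."""
    sign = lambda x: 1.0 if x > 0 else -1.0
    total_sign = 1.0
    for l in in_llrs:
        total_sign *= sign(l)
    mags = [abs(l) for l in in_llrs]
    i_min = min(range(len(mags)), key=mags.__getitem__)   # overall minimum
    min2 = min(m for i, m in enumerate(mags) if i != i_min)  # second minimum
    # For edge i_min the "minimum of the others" is min2; elsewhere it is min1.
    return [total_sign * sign(l) * (min2 if i == i_min else mags[i_min])
            for i, l in enumerate(in_llrs)]

# Example: GF(16) (p = 4) code with 100 symbol nodes
print(expansion_stats(4, 100))
print(min_sum_check_update([2.0, -1.5, 0.5, 3.0]))  # [-0.5, 0.5, -1.5, -0.5]
```

For GF(16) with 100 symbol nodes the expansion yields 1500 binary nodes, so there is no memory saving on the node side, but each edge then carries one LLR instead of a 16-entry vector, which matches the trade-off stated in the text.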

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Southeast University. Downloaded on March 11,2020 at 03:52:48 UTC from IEEE Xplore. Restrictions apply.


V. B. Wijekoon (S’18) received the BSc degree in electronic and telecommunication engineering from University of Moratuwa, Sri Lanka. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Systems Engineering, Monash University, Australia. His research interests include classical and modern coding theory, and its applications in communications and storage systems.

Emanuele Viterbo (M’95–SM’04–F’11) is currently a professor in the ECSE Department and an Associate Dean in Graduate Research at Monash University, Melbourne, Australia. He received his Ph.D. in 1995 in Electrical Engineering from the Politecnico di Torino, Torino, Italy. From 1990 to 1992 he was with the European Patent Office, The Hague, The Netherlands, as a patent examiner in the field of dynamic recording and error-control coding. Between 1995 and 1997 he held a post-doctoral position in the Dipartimento di Elettronica of the Politecnico di Torino. In 1997-98 he was a post-doctoral research fellow in the Information Sciences Research Center of AT&T Research, Florham Park, NJ, USA. From 1998-2005, he worked as Assistant Professor and then Associate Professor in the Dipartimento di Elettronica at Politecnico di Torino. From 2006-2009, he worked in DEIS at University of Calabria, Italy, as a Full Professor. Prof. Emanuele Viterbo has been an ISI Highly Cited Researcher since 2009. He is Associate Editor of IEEE Transactions on Information Theory, European Transactions on Telecommunications and Journal of Communications and Networks, and Guest Editor for IEEE Journal of Selected Topics in Signal Processing: Special Issue Managing Complexity in Multiuser MIMO Systems. Prof. Emanuele Viterbo was awarded a NATO Advanced Fellowship in 1997 from the Italian National Research Council. His main research interests are in lattice codes for the Gaussian and fading channels, algebraic coding theory, algebraic space-time coding, digital terrestrial television broadcasting, digital magnetic recording, and irregular sampling.

Yi Hong (S’00–M’05–SM’10) is currently a Senior Lecturer at the Department of Electrical and Computer Systems Eng., Monash University, Melbourne, Australia. She obtained her Ph.D. degree in Electrical Engineering and Telecommunications from the University of New South Wales (UNSW), Sydney, and received the NICTA-ACoRN Early Career Researcher Award at the Australian Communication Theory Workshop, Adelaide, Australia, 2007. She currently serves on the Australian Research Council College of Experts (2018-2020). Dr. Hong was an Associate Editor for IEEE Wireless Communication Letters and Transactions on Emerging Telecommunications Technologies (ETT). She was the General Co-Chair of IEEE Information Theory Workshop 2014, Hobart; the Technical Program Committee Chair of Australian Communications Theory Workshop 2011, Melbourne; and the Publicity Chair at the IEEE Information Theory Workshop 2009, Sicily. She was a Technical Program Committee member for many IEEE leading conferences. Her research interests include communication theory, coding and information theory with applications to telecommunication engineering.

Rino Micheloni (M’98–SM’04) is Vice-President and Fellow at Microsemi-Microchip Corporation, where he currently runs the Flash Signal Processing Labs in Milan, Italy, with special focus on NAND Flash, Error Correction Codes, and Machine Learning. Prior to joining Microsemi, he was Fellow at PMC-Sierra, working on NAND Flash characterization, LDPC, and NAND Signal Processing as part of the team developing Flash controllers for PCIe SSDs. Before that, he was with IDT (Integrated Device Technology) as Lead Flash Technologist, driving the architecture and design of the BCH engine in the world’s first PCIe NVMe SSD controller. Early in his career, he led NAND design teams at STMicroelectronics, Hynix, and Infineon/Qimonda; during this time, he developed the industry’s first MLC NOR device with embedded ECC technology and the industry’s first MLC NAND with embedded BCH. Dr. Micheloni is an IEEE Senior Member, has co-authored more than 80 publications, and holds 291 patents worldwide (including 137 US patents). He received the STMicroelectronics Exceptional Patent Award in 2003 and 2004, and the Infineon/Qimonda IP Award in 2007. Dr. Micheloni has published the following books with Springer: Inside Solid State Drives – 2nd edition (2018), Solid-State-Drives (SSDs) Modeling (2017), 3D Flash Memories (2016), Inside Solid State Drives (2013), Inside NAND Flash Memories (2010), Error Correction Codes for Non-Volatile Memories (2008), Memories in Wireless Systems (2008), and VLSI-Design of Non-Volatile Memories (2005).

Alessia Marelli is Technical Leader at Microsemi. She took her degree in applied mathematics in 2003 and now has 15 years of experience in error correction codes applied to storage, flash characterization and data mining. Before Microsemi, she was part of the characterization team in PMC-Sierra and in the ECC team in IDT. Before IDT, she worked in Infineon for 2 years and in STMicroelectronics for 5 years as senior digital designer. She is co-author of several patents in the US related to ECC applied to storage and of 5 books edited by Springer regarding ECC, NAND Flash and Solid State Drives.

