Advances in Error Control Coding Techniques
Guest Editors: Yonghui Li, Jinhong Yuan,
Andrej Stefanov, and Branka Vucetic
EURASIP Journal on
Wireless Communications and Networking
Copyright © 2008 Hindawi Publishing Corporation. All rights reserved.
This is a special issue published in volume 2008 of “EURASIP Journal on Wireless Communications and Networking.” All articles are
open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Editor-in-Chief
Luc Vandendorpe, UCL, Belgium
Associate Editors
Thushara D. Abhayapala, Australia
Farid Ahmed, USA
Mohamed H. Ahmed, Canada
Alagan Anpalagan, Canada
Carles Anton-Haro, Spain
Anthony C. Boucouvalas, Greece
Lin Cai, Canada
Yuh-Shyan Chen, Taiwan
Biao Chen, USA
Pascal Chevalier, France
Chia-Chin Chong, South Korea
Huaiyu Dai, USA
Soura Dasgupta, USA
Ibrahim Develi, Turkey
Petar M. Djurić, USA
Mischa Dohler, Spain
Abraham O. Fapojuwo, Canada
Michael Gastpar, USA
Alex B. Gershman, Germany
Wolfgang Gerstacker, Germany
David Gesbert, France
Fary Ghassemlooy, UK
Christian Hartmann, Germany
Stefan Kaiser, Germany
G. K. Karagiannidis, Greece
Chi Chung Ko, Singapore
Visa Koivunen, Finland
Nicholas Kolokotronis, Greece
Richard Kozick, USA
S. Lambotharan, UK
Vincent Lau, Hong Kong
David I. Laurenson, UK
Tho Le-Ngoc, Canada
Wei Li, USA
Tongtong Li, USA
Zhiqiang Liu, USA
Steve McLaughlin, UK
Sudip Misra, India
Ingrid Moerman, Belgium
Marc Moonen, Belgium
Eric Moulines, France
Sayandev Mukherjee, USA
Kameswara Rao Namuduri, USA
Amiya Nayak, Canada
Claude Oestges, Belgium
A. Pandharipande, The Netherlands
Phillip Regalia, France
A. Lee Swindlehurst, USA
Sergios Theodoridis, Greece
George S. Tombras, Greece
Lang Tong, USA
Athanasios Vasilakos, Greece
Ping Wang, Canada
Weidong Xiang, USA
Yang Xiao, USA
Xueshi Yang, USA
Lawrence Yeung, Hong Kong
Dongmei Zhao, Canada
Weihua Zhuang, Canada
Contents
Advances in Error Control Coding Techniques, Yonghui Li, Jinhong Yuan,
Andrej Stefanov, and Branka Vucetic
Volume 2008, Article ID 574783, 3 pages
Structured LDPC Codes over Integer Residue Rings, Elisa Mo and Marc A. Armand
Volume 2008, Article ID 598401, 9 pages
Diﬀerentially Encoded LDPC Codes—Part I: Special Case of Product Accumulate Codes,
Jing Li (Tiﬀany)
Volume 2008, Article ID 824673, 14 pages
Diﬀerentially Encoded LDPC Codes—Part II: General Case and Code Optimization,
Jing Li (Tiﬀany)
Volume 2008, Article ID 367287, 10 pages
Construction and Iterative Decoding of LDPC Codes Over Rings for Phase-Noisy Channels,
Sridhar Karuppasami and William G. Cowley
Volume 2008, Article ID 385421, 9 pages
New Technique for Improving Performance of LDPC Codes in the Presence of Trapping Sets,
Esa Alghonaim, Aiman ElMaleh, and Mohamed Adnan Landolsi
Volume 2008, Article ID 362897, 12 pages
Distributed Generalized Low-Density Codes for Multiple Relay Cooperative Communications,
Changcai Han and Weiling Wu
Volume 2008, Article ID 852397, 9 pages
Reed-Solomon Turbo Product Codes for Optical Communications: From Code Optimization
to Decoder Design, Raphaël Le Bidan, Camille Leroux, Christophe Jego, Patrick Adde,
and Ramesh Pyndiah
Volume 2008, Article ID 658042, 14 pages
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) without Using Syndromes,
Ning Chen and Zhiyuan Yan
Volume 2008, Article ID 843634, 11 pages
Eﬃcient Decoding of Turbo Codes with Nonbinary Belief Propagation, Charly Poulliat,
David Declercq, and Thierry Lestable
Volume 2008, Article ID 473613, 10 pages
Space-Time Convolutional Codes over Finite Fields and Rings for Systems with Large
Diversity Order, Mario de Noronha-Neto and B. F. Uchôa-Filho
Volume 2008, Article ID 624542, 7 pages
Joint Decoding of Concatenated VLEC and STTC System, Huijun Chen and Lei Cao
Volume 2008, Article ID 890194, 8 pages
Average Throughput with Linear Network Coding over Finite Fields: The Combination Network Case,
Ali AlBashabsheh and Abbas Yongacoglu
Volume 2008, Article ID 329727, 7 pages
MacWilliams Identity for Codes with the Rank Metric, Maximilien Gadouleau and Zhiyuan Yan
Volume 2008, Article ID 754021, 13 pages
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 574783, 3 pages
doi:10.1155/2008/574783
Editorial
Advances in Error Control Coding Techniques
Yonghui Li,¹ Jinhong Yuan,² Andrej Stefanov,³ and Branka Vucetic¹
¹ School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia
² School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW 2052, Australia
³ Department of Electrical and Computer Engineering, Polytechnic University, 6 Metrotech Center, Brooklyn, NY 11201, USA
Correspondence should be addressed to Yonghui Li, lyh@ee.usyd.edu.au
Received 9 September 2008; Accepted 9 September 2008
Copyright © 2008 Yonghui Li et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the past decade, significant progress has been made in the field of error control coding. In particular, the invention of turbo codes and the rediscovery of LDPC codes have been recognized as two major breakthroughs in this field. The distinct features of these capacity-approaching codes have led to their being widely proposed and/or adopted in existing wireless standards. Furthermore, the invention of space-time coding significantly increased the capacity of wireless systems, and these codes have been widely applied in broadband communication systems. Recently, new coding concepts exploiting the distributed nature of networks have been developed, such as network coding and distributed coding techniques. They have great potential for application in wireless, sensor, and ad hoc networks. Despite recent advances, many challenging problems remain. This special issue is intended to present state-of-the-art results in the theory and applications of coding techniques.
This special issue received twenty-six submissions, from which thirteen papers were finally selected after a rigorous review process. They reflect recent advances in the area of error control coding.
In the first paper, "Structured LDPC codes over integer residue rings," Mo and Armand designed a new class of low-density parity-check (LDPC) codes over integer residue rings. The codes are constructed from regular Tanner graphs by using Latin squares over a multiplicative group of a Galois ring, rather than a finite field. The proposed approach is suitable for the design of codes with a wide range of rates. A notable feature of these codes is that their minimum pseudocodeword weights are equal to their minimum Hamming distances.
The next two-part series of papers, "Differentially encoded LDPC codes—Part I: special case of product accumulate codes" and "Differentially encoded LDPC codes—Part II: general case and code optimization," by J. Tiffany Li, study the theory and practice of differentially encoded low-density parity-check (DE-LDPC) codes in the context of noncoherent detection. Part I studies a special class of DE-LDPC codes, product accumulate codes. The more general case of DE-LDPC codes, where the LDPC part may take arbitrary degree profiles, is studied in Part II. The analysis reveals that a conventional LDPC code is not well suited to differential coding, and does not in general deliver desirable performance when detected noncoherently. Through extrinsic information transfer (EXIT) analysis and a modified "convergence-constraint" density evolution (DE) method, a characterization of the suitable LDPC degree profiles is provided. The convergence-constraint method provides a useful extension of the conventional "threshold-constraint" method, and can match an outer LDPC code to any given inner code while taking the imperfectness of the inner decoder into consideration.
In the fourth paper, "Construction and iterative decoding of LDPC codes over rings for phase-noisy channels," Karuppasami and Cowley propose a design and decoding method for LDPC codes over channels with phase noise. The new code applies blind or turbo estimators to provide signal phase estimates over each observation interval. It is resilient to phase rotations of $2\pi/M$, where $M$ is the number of phase symmetries in the signal set, and it estimates the phase ambiguity in each observation interval.
A novel approach for enhancing decoder performance in the presence of trapping sets, based on a new concept called trapping set neutralization, is proposed in the fifth paper, "New technique for improving performance of LDPC codes in the presence of trapping sets," by E. Alghonaim et al. The effect of a trapping set can be eliminated by setting the intrinsic and extrinsic values of its variable nodes to zero. After a
trapping set is neutralized, the estimated values of variable
nodes are aﬀected only by external messages from nodes
outside the trapping set. The most harmful trapping sets are identified by means of simulation. To neutralize the identified trapping sets, a simple algorithm is introduced to store trapping set configuration information in the variable and check nodes.
The design of efficient distributed coding schemes for cooperative communication networks has recently attracted significant attention. A distributed generalized low-density (GLD) coding scheme for multiple-relay cooperative communications is developed by Han and Wu in the sixth paper, "Distributed generalized low-density codes for multiple relay cooperative communications." By using the partial error detecting and error correcting capabilities of the GLD code, each relay node decodes and forwards some of the constituent codes of the GLD code to cooperatively form a distributed GLD code. The scheme works effectively and keeps a fixed overall code rate as the number of relay nodes varies. Furthermore, partial decoding at the relays is allowed, and a progressive processing procedure is proposed to reduce complexity and adapt to source-relay channel variations. Simulation results verify that distributed GLD codes with various numbers of relay nodes obtain significant performance gains in quasi-static fading channels compared with the noncooperative strategy.
Since the early 1990s, the progressive introduction of in-line optical amplifiers and the advent of wavelength division multiplexing (WDM) have accelerated the use of FEC in optical fiber communications to reduce system costs and improve margins against various line impairments, such as beam noise, channel crosstalk, and nonlinear dispersion. In contrast to the first and second generations of FEC codes for optical communications, which are based on Reed-Solomon (RS) codes and concatenated codes with hard-decision decoding, third-generation FEC codes with soft-decision decoding are attractive for reducing costs by relaxing the requirements on expensive optical devices in high-capacity systems. In this regard, the seventh paper, "Reed-Solomon turbo product codes for optical communications: from code optimization to decoder design," by Le Bidan et al. investigates the use of turbo product codes with Reed-Solomon component codes for 40 Gb/s transmission over optical transport networks and 10 Gb/s transmission over passive optical networks. Code design issues and a novel ultra-high-speed parallel decoding architecture are developed. The complexity and performance trade-off of the scheme is also carefully addressed in this paper.
Recently, there has been renewed interest in decoding Reed-Solomon (RS) codes without using syndromes. In the eighth paper, "Complexity analysis of Reed-Solomon decoding over GF($2^m$) without using syndromes," Chen and Yan investigated the complexity of a type of syndromeless decoding for RS codes and compared it to that of syndrome-based decoding algorithms. The complexity analysis in their paper mainly focuses on RS codes over characteristic-2 fields, for which some multiplicative FFT techniques are not applicable. Their findings show that for high-rate RS codes, syndromeless decoding algorithms require more field operations and have higher hardware costs and lower throughput than syndrome-based decoding algorithms. They also derived tighter bounds on the complexities of fast polynomial multiplication based on Cantor's approach and of the fast extended Euclidean algorithm.
In the ninth paper, "Efficient decoding of turbo codes with nonbinary belief propagation," by Poulliat et al., a new approach to decoding turbo codes with a nonbinary belief propagation algorithm is proposed. The approach consists of representing groups of turbo code binary symbols on a nonbinary Tanner graph and applying group belief iterative decoding. The parity-check matrices of the turbo codes need to be preprocessed to ensure good topological properties of the code. This preprocessing introduces additional diversity, which is exploited to improve the decoding performance.
The tenth paper, "Space-time convolutional codes over finite fields and rings for systems with large diversity order," by Uchôa-Filho and Noronha-Neto, proposes a convolutional encoder over the finite ring of integers to generate a space-time convolutional code (STCC). Under this structure, the paper proves three interesting properties of the generator matrix of the convolutional code that can be used to simplify the code search procedure for STCCs over the finite ring of integers. The properties establish equivalences among STCCs, so that many convolutional codes can be discarded in the code search without loss.
Providing high-quality multimedia services has become an attractive application area in wireless communication systems. In the eleventh paper, "Joint decoding of concatenated VLEC and STTC system," Chen and Cao proposed a joint source-channel coding scheme for wireless fading channels, which combines variable length error correcting codes (VLECs) and space-time trellis codes (STTCs) to provide bandwidth-efficient data compression, as well as coding and diversity gains. At the receiver, an iterative joint source and space-time decoding algorithm is developed that utilizes the redundancy in both the STTC and the VLEC to improve overall decoding performance. In their paper, various issues, such as the inseparable systematic information at the symbol level, the asymmetric trellis structure of the VLEC, information exchange between the bit and symbol domains, and rate allocation between the STTC and the VLEC, have been investigated.
In the twelfth paper, “Average throughput with linear
network coding over ﬁnite ﬁelds: the combination network
case,” AlBashabsheh and Yongacoglu extend the average
coding throughput measure to include linear coding over
arbitrary ﬁnite ﬁelds. They characterize the average linear
network coding throughput for the combination network
with min-cut 2 over an arbitrary finite field, and provide
a network code, which is completely speciﬁed by the ﬁeld
size and achieves the average coding throughput for the
combination network.
The MacWilliams identity and related identities for linear codes with the rank metric are derived in the thirteenth paper, "MacWilliams identity for codes with the rank metric," by Gadouleau and Yan. It is shown that, similar to the MacWilliams identity for the Hamming metric, the rank weight distribution of any linear code can be expressed as a functional transformation of that of its dual code, and the
rank weight enumerator of the dual of any vector depends
only on the rank weight of the vector and is related to the
rank weight enumerator of a maximum rank distance code.
Yonghui Li
Jinhong Yuan
Andrej Stefanov
Branka Vucetic
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 598401, 9 pages
doi:10.1155/2008/598401
Research Article
Structured LDPC Codes over Integer Residue Rings
Elisa Mo and Marc A. Armand
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576
Correspondence should be addressed to Marc A. Armand, eleama@nus.edu.sg
Received 31 October 2007; Revised 31 March 2008; Accepted 3 June 2008
Recommended by Jinhong Yuan
This paper presents a new class of low-density parity-check (LDPC) codes over $\mathbb{Z}_{2^a}$ represented by regular, structured Tanner graphs. These graphs are constructed using Latin squares defined over a multiplicative group of a Galois ring, rather than a finite field. Our approach yields codes for a wide range of code rates and, more importantly, codes whose minimum pseudocodeword weights equal their minimum Hamming distances. Simulation studies show that these structured codes, when transmitted using matched signal sets over an additive white Gaussian noise channel, can outperform their random counterparts of similar length and rate.
Copyright © 2008 E. Mo and Marc A. Armand. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
The study of nonbinary LDPC codes over GF($q$) was initiated by Davey and MacKay [1]. However, the symbols of a nonbinary code over a finite field cannot be matched to any signal constellation. In other words, it is not possible to obtain a geometrically uniform code (wherein every codeword has the same error probability) from a nonbinary, finite field code. The subject of geometrically uniform codes has been well studied by various authors, including Slepian [2] and Forney Jr. [3]. More recently, Sridhara and Fuja [4] introduced geometrically uniform, nonbinary LDPC codes over certain rings, including integer residue rings. Their codes are, however, unstructured. Structured LDPC codes, which include the family of finite geometry (FG) codes [5] and balanced incomplete block design (BIBD) codes [6], are favored over their random counterparts due to the reduction in storage space for the parity-check matrix and the ease of performance analysis they provide, while achieving relatively similar performance. The structured nonbinary LDPC codes that have been proposed thus far, however, are constructed over finite fields, for example, [1, 7], and therefore cannot be geometrically uniform.
This paper therefore addresses the problem of designing structured, geometrically uniform, nonbinary LDPC codes over integer residue rings. Motivated by the fact that short nonbinary LDPC codes can outperform their binary counterparts [8–10], we focus our investigation on codes of short length. Studies of the so-called pseudocodewords arising from finite covers of a Tanner graph, for example, [11–13], have revealed that while a code's performance under maximum-likelihood (ML) decoding is dictated by its (Hamming) weight distribution, its performance under iterative decoding is dictated by the weight distribution of the pseudocodewords associated with its Tanner graph. More specifically, the presence of pseudocodewords of low weight, particularly those of weight less than the minimum Hamming distance of the code, is detrimental to a code's performance under iterative decoding. We therefore adopt the Latin-squares-based approach of Kelley et al. [14] to construct structured codes, as their method aims at maximizing the minimum pseudocodeword weight of a code. While we maintain the pseudocodeword framework used there, our work nevertheless differs from [14] primarily because our construction relies on an extension of the notion of Latin squares to multiplicative groups of a Galois ring—a key contribution of this paper.
We note that codes based on Latin squares were also studied in [7, 15–17]. However, the authors of these works did not do so in the pseudocodeword framework. Codes constructed using other combinatorial approaches, such as those presented in [6, 18, 19], were similarly not investigated using
the notion of pseudocodewords. Speciﬁcally, these related
works focused on the optimization of design parameters such
as girth, expansion, diameter, and stopping sets. Our work
therefore diﬀers from these earlier studies in this regard.
For practical reasons, we only consider linear codes over $\mathbb{Z}_{2^a}$. In the next section, we provide an overview of codes over $\mathbb{Z}_{2^a}$ and their natural mapping to a matched signal constellation, that is, the $2^a$-PSK constellation. Section 3 introduces the notion of Latin squares over finite fields, followed by our extension of Latin squares to multiplicative groups of a Galois ring. A method to construct Tanner graphs using Latin squares (over a multiplicative group of a Galois ring) is presented in Section 4. We show that from these graphs, a wide range of code rates may be obtained. We further derive in the same section certain properties of the corresponding codes and, in particular, show that their minimum pseudocodeword weights equal their minimum Hamming distances. This is one of our main results. Finally, Section 5 presents computer simulations which demonstrate that our codes, when mapped to matched signal sets and transmitted over the additive white Gaussian noise (AWGN) channel, outperform their random counterparts of similar length and rate.
2. CODES OVER $\mathbb{Z}_{2^a}$
2.1. An overview
Let $C$ be a $\mathbb{Z}_{2^a}$-submodule of the free $\mathbb{Z}_{2^a}$-module $\mathbb{Z}_{2^a}^n$. Its $n_G \times n$ generator matrix $G$ can be expressed in the form [20]

$$G = \begin{bmatrix} 2^{\lambda_1} \mathbf{g}_1 \\ 2^{\lambda_2} \mathbf{g}_2 \\ \vdots \\ 2^{\lambda_{n_G}} \mathbf{g}_{n_G} \end{bmatrix}, \tag{1}$$

where $0 \le \lambda_i \le a - 1$ for $i = 1, 2, \ldots, n_G$ and $\{\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_{n_G}\} \subset \mathbb{Z}_{2^a}^n$ is a set of linearly independent elements. The rate of $C$ is

$$r = \frac{1}{n} \sum_{i=1}^{n_G} \frac{a - \lambda_i}{a} = \frac{n_G}{n} - \frac{\sum_{i=1}^{n_G} \lambda_i}{an}. \tag{2}$$
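As a quick illustration of (2) (our own sketch; the paper gives only the formula), the rate can be computed directly from the exponents $\lambda_i$ of the generator rows:

```python
from fractions import Fraction

def code_rate(n, a, lambdas):
    """Rate of a length-n code over Z_{2^a} whose generator rows are
    scaled by 2^{lambda_i}, per Eq. (2)."""
    assert all(0 <= l <= a - 1 for l in lambdas)
    return Fraction(sum(a - l for l in lambdas), a * n)

# A hypothetical length-7 code over Z_4 (a = 2) with row factors 2^0, 2^0, 2^1.
r = code_rate(7, 2, [0, 0, 1])
# Both forms of Eq. (2) agree: n_G/n - (sum lambda_i)/(a n) = 3/7 - 1/14 = 5/14.
assert r == Fraction(3, 7) - Fraction(1, 14)
```

When every $\lambda_i = 0$ (the free-module case of Remark 1), the rate reduces to the familiar $n_G/n$.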
The dual code $C^\perp$ is generated by the $n_H \times n$ parity-check matrix of $C$, which can be expressed in the form

$$H = \begin{bmatrix} 2^{\mu_1} \mathbf{h}_1 \\ 2^{\mu_2} \mathbf{h}_2 \\ \vdots \\ 2^{\mu_{n_H}} \mathbf{h}_{n_H} \end{bmatrix}, \tag{3}$$

where $0 \le \mu_i \le a - 1$ for $i = 1, 2, \ldots, n_H$ and $\{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{n_H}\} \subset \mathbb{Z}_{2^a}^n$ is a set of linearly independent elements. The rate of $C$ can also be obtained from

$$r = 1 - \frac{1}{n} \sum_{i=1}^{n_H} \frac{a - \mu_i}{a} = 1 - \frac{n_H}{n} + \frac{\sum_{i=1}^{n_H} \mu_i}{an}. \tag{4}$$

If $G$ (or $H$) is not already in the form of (1) (or (3)), one can perform Gaussian elimination, without dividing any row by a zero divisor, to obtain the $n_G$ (or $n_H$) linearly independent rows.
Remark 1. $C$ is a free $\mathbb{Z}_{2^a}$-submodule if $\lambda_i = 0$ for $i = 1, 2, \ldots, n_G$. This also implies that $\mu_i = 0$ for $i = 1, 2, \ldots, n_H$.
2.2. The matched signal set
The $2^a$-PSK signal set contains $2^a$ points that are equidistant from the origin and maximally spread apart in a two-dimensional space. Projecting one dimension onto the real axis and the other onto the imaginary axis, a symbol $x \in \mathbb{Z}_{2^a}$ is mapped to the signal point $s_x = \sqrt{E_s}\,\exp(j 2\pi x / 2^a)$, where $E_s$ is the energy assigned to each symbol [4].

The $2^a$-PSK constellation is matched to $\mathbb{Z}_{2^a}$ because for any $x, y \in \mathbb{Z}_{2^a}$,

$$d_E^2\big(s_x, s_y\big) = d_E^2\big(s_{x-y}, s_0\big), \tag{5}$$

where $d_E^2(s_x, s_y)$ denotes the squared Euclidean distance between $s_x$ and $s_y$ [21].
Let $\mathbf{c}_x, \mathbf{c}_y \in C$, where $\mathbf{c}_x = [x_1, x_2, \ldots, x_n]$ and $\mathbf{c}_y = [y_1, y_2, \ldots, y_n]$. They are mapped symbol by symbol to $[s_{x_1}, s_{x_2}, \ldots, s_{x_n}]$ and $[s_{y_1}, s_{y_2}, \ldots, s_{y_n}]$, respectively. The squared Euclidean distance between these two signal vectors is

$$d_E^2\big([s_{x_1}, \ldots, s_{x_n}], [s_{y_1}, \ldots, s_{y_n}]\big) = \sum_{i=1}^{n} d_E^2\big(s_{x_i}, s_{y_i}\big) = \sum_{i=1}^{n} d_E^2\big(s_{x_i - y_i}, s_0\big) = d_E^2\big([s_{x_1 - y_1}, \ldots, s_{x_n - y_n}], [s_0, \ldots, s_0]\big). \tag{6}$$

Observe that the Hamming distance between two codewords maps proportionally to the Euclidean distance between their corresponding signal vectors.
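The matched property (5) is straightforward to verify numerically. The following sketch (our own illustration, not from the paper) checks, for 8-PSK over $\mathbb{Z}_8$ ($a = 3$), that the squared distance between $s_x$ and $s_y$ depends only on $x - y \bmod 2^a$:

```python
import cmath
import math

def psk(x, a, Es=1.0):
    """Map a symbol x in Z_{2^a} to the 2^a-PSK point sqrt(Es)*exp(j*2*pi*x/2^a)."""
    return math.sqrt(Es) * cmath.exp(2j * math.pi * x / 2 ** a)

def d2(u, v):
    """Squared Euclidean distance between two constellation points."""
    return abs(u - v) ** 2

a = 3
q = 2 ** a
for x in range(q):
    for y in range(q):
        # Matched property, Eq. (5): distance depends only on x - y mod 2^a.
        assert math.isclose(d2(psk(x, a), psk(y, a)),
                            d2(psk((x - y) % q, a), psk(0, a)))
```

The same loop passes for any $a$, which is exactly why the symbol-wise mapping preserves geometric uniformity.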
3. LATIN SQUARES
3.1. Definition and application to Galois fields
The following definition and example are taken from [22, Chapter 17].

Definition 1. A Latin square of order $q$ is denoted $(R, C, S; L)$, where $R$, $C$, and $S$ are sets of cardinality $q$ and $L$ is a mapping $L(i, j) = k$, with $i \in R$, $j \in C$, and $k \in S$, such that given any two of $i$, $j$, and $k$, the third is unique.

A Latin square can be written as a $q \times q$ array in which the cell in row $i$ and column $j$ contains the symbol $L(i, j)$. Two Latin squares with mapping functions $L$ and $L'$ are orthogonal if the pair $(L(i, j), L'(i, j))$ is unique for each $(i, j)$. Further, a complete family of $q - 1$ mutually orthogonal Latin squares (MOLS) exists for $q = p^s$, where $p$ is prime.

The notion of Latin squares can be readily applied to Galois fields by setting $R = C = S = \mathrm{GF}(p^s)$ and using the mapping functions $L_\beta(i, j) = i + \beta j$ for $\beta \in \mathrm{GF}(p^s) \setminus \{0\}$.
Example 1. Let $R = C = S = \mathrm{GF}(2^2) = \{0, 1, \alpha, \alpha^2\}$. The mapping functions $L_1(i, j) = i + j$, $L_\alpha(i, j) = i + \alpha j$, and $L_{\alpha^2}(i, j) = i + \alpha^2 j$ yield a complete family of three MOLS:

$$M_1 = \begin{bmatrix} 0 & 1 & \alpha & \alpha^2 \\ 1 & 0 & \alpha^2 & \alpha \\ \alpha & \alpha^2 & 0 & 1 \\ \alpha^2 & \alpha & 1 & 0 \end{bmatrix}, \quad M_\alpha = \begin{bmatrix} 0 & \alpha & \alpha^2 & 1 \\ 1 & \alpha^2 & \alpha & 0 \\ \alpha & 0 & 1 & \alpha^2 \\ \alpha^2 & 1 & 0 & \alpha \end{bmatrix}, \quad M_{\alpha^2} = \begin{bmatrix} 0 & \alpha^2 & 1 & \alpha \\ 1 & \alpha & 0 & \alpha^2 \\ \alpha & 1 & \alpha^2 & 0 \\ \alpha^2 & 0 & \alpha & 1 \end{bmatrix}, \tag{7}$$

respectively.
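To make Example 1 concrete, here is a small numerical check (our own illustration, not from the paper) that represents GF(4) elements as the integers 0–3 with $\alpha = 2$ and $\alpha^2 = 3$, addition as XOR, and multiplication modulo $x^2 + x + 1$, and verifies that the three mapping functions give mutually orthogonal Latin squares:

```python
def gf4_mul(u, v):
    """Multiply two GF(4) elements encoded as 2-bit polynomials over GF(2)."""
    r = 0
    for bit in range(2):          # schoolbook multiply over GF(2)
        if v & (1 << bit):
            r ^= u << bit
    if r & 0b100:                 # reduce x^2 -> x + 1
        r ^= 0b111
    return r

elems = [0, 1, 2, 3]
# L_beta(i, j) = i + beta*j; addition in GF(4) is XOR.
squares = {b: [[i ^ gf4_mul(b, j) for j in elems] for i in elems]
           for b in [1, 2, 3]}

def is_latin(M):
    return all(sorted(row) == elems for row in M) and \
           all(sorted(col) == elems for col in zip(*M))

assert all(is_latin(M) for M in squares.values())
# Orthogonality: the pairs (L_b(i,j), L_b'(i,j)) cover all 16 combinations.
for b in [1, 2]:
    for b2 in range(b + 1, 4):
        pairs = {(squares[b][i][j], squares[b2][i][j])
                 for i in elems for j in elems}
        assert len(pairs) == 16
```

The first row of `squares[1]` is `[0, 1, 2, 3]`, matching the first row $0, 1, \alpha, \alpha^2$ of $M_1$ in (7).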
3.2. Extension to multiplicative groups of a Galois ring
Extending the notion of Latin squares to integer residue rings is not trivial. Setting $R = C = S = \mathbb{Z}_{2^s}$ with mapping functions $L_\beta(i, j) = i + \beta j$ for $\beta \in \mathbb{Z}_{2^s} \setminus \{0\}$ does not yield a complete family of $2^s - 1$ MOLS.
Example 2. Let $R = C = S = \mathbb{Z}_{2^2} = \{0, 1, 2, 3\}$ and let the mapping functions be $L_1(i, j) = i + j$, $L_2(i, j) = i + 2j$, and $L_3(i, j) = i + 3j$. The matrices

$$M_1 = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 0 \\ 2 & 3 & 0 & 1 \\ 3 & 0 & 1 & 2 \end{bmatrix}, \quad M_2 = \begin{bmatrix} 0 & 2 & 0 & 2 \\ 1 & 3 & 1 & 3 \\ 2 & 0 & 2 & 0 \\ 3 & 1 & 3 & 1 \end{bmatrix}, \quad M_3 = \begin{bmatrix} 0 & 3 & 2 & 1 \\ 1 & 0 & 3 & 2 \\ 2 & 1 & 0 & 3 \\ 3 & 2 & 1 & 0 \end{bmatrix} \tag{8}$$

are obtained, respectively. Since the elements in each row of $M_2$ are not unique, $M_2$ is not a Latin square. Therefore, we do not have a complete family of three MOLS.
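The failure in Example 2 is easy to reproduce numerically. The following sketch (ours) tests the Latin-square property of $M_\beta(i, j) = i + \beta j \bmod 4$ for each $\beta$; the zero divisor $\beta = 2$ in $\mathbb{Z}_4$ is exactly what breaks it:

```python
q = 4  # Z_4

def square(beta):
    """Build the array M_beta(i, j) = i + beta*j mod 4 from Example 2."""
    return [[(i + beta * j) % q for j in range(q)] for i in range(q)]

def is_latin(M):
    full = list(range(q))
    return all(sorted(row) == full for row in M) and \
           all(sorted(col) == full for col in zip(*M))

assert is_latin(square(1))       # M_1 is a Latin square
assert not is_latin(square(2))   # M_2 repeats symbols: 2 is a zero divisor in Z_4
assert is_latin(square(3))       # M_3 is a Latin square
```

Only the units $\beta \in \{1, 3\}$ produce Latin squares, so at most two of the required three MOLS exist.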
Hence, we propose an alternative way of constructing Latin squares over integer residue rings. Consider the extension ring $R = \mathrm{GR}(2^a, s) = \mathbb{Z}_{2^a}[y]/\langle \phi(y) \rangle$, where $\phi(y)$ is a degree-$s$ basic irreducible polynomial over $\mathbb{Z}_{2^a}$. Embedded in $R$ is a multiplicative group $G_{2^s-1}$ of units of order $2^s - 1$. Further, we let $a' < a$, define $\bar{z} = z \bmod 2^{a'}$ for $z \in R$, and extend this notation to $n$-tuples and matrices over $R$.
Example 3. Let $R = \mathrm{GR}(2^2, 2) = \mathbb{Z}_4[y]/\langle y^2 + y + 3 \rangle$. Embedded in $R$ is $G_3 = \{1, \alpha, \alpha^2\} = \{1, y + 2, 3y + 1\}$, generated by $\alpha = y + 2$. Let $R = C = G_3 \cup \{0\}$. The mapping functions $L_1(i, j) = i + j$, $L_\alpha(i, j) = i + \alpha j$, and $L_{\alpha^2}(i, j) = i + \alpha^2 j$ yield the matrices

$$M_1 = \begin{bmatrix} 0 & 1 & y+2 & 3y+1 \\ 1 & 2 & y+3 & 3y+2 \\ y+2 & y+3 & 2y & 3 \\ 3y+1 & 3y+2 & 3 & 2y+2 \end{bmatrix}, \quad M_\alpha = \begin{bmatrix} 0 & y+2 & 3y+1 & 1 \\ 1 & y+3 & 3y+2 & 2 \\ y+2 & 2y & 3 & y+3 \\ 3y+1 & 3 & 2y+2 & 3y+2 \end{bmatrix},$$
$$M_{\alpha^2} = \begin{bmatrix} 0 & 3y+1 & 1 & y+2 \\ 1 & 3y+2 & 2 & y+3 \\ y+2 & 3 & y+3 & 2y \\ 3y+1 & 2y+2 & 3y+2 & 3 \end{bmatrix}, \tag{9}$$

respectively. Since $G_3 \cup \{0\}$ is not closed under $R$-addition, $S \subset R$ with $|S| \neq |R| = |C| = 2^s$. Thus, none of the three matrices is a Latin square.
To overcome this problem, the mapping functions have to be altered slightly so that they map $i \in R$ and $j \in C$ uniquely to $L_\beta(i, j) \in S$ with $|R| = |C| = |S|$.

Definition 2. $L_\beta^{(a)}(i, j) = \big((i)^{1/2^{a-1}} + (\beta j)^{1/2^{a-1}}\big)^{2^{a-1}}$, where $i, j \in G_{2^s-1} \cup \{0\}$ and $\beta \in G_{2^s-1}$.
Theorem 1. $L_\beta^{(a)}(i, j) \in G_{2^s-1} \cup \{0\}$.

Proof. It is apparent that $(i)^{1/2^{a-1}}, (\beta j)^{1/2^{a-1}} \in G_{2^s-1} \cup \{0\}$. Since $G_{2^s-1} \cup \{0\}$ is not closed under $R$-addition, $(i)^{1/2^{a-1}} + (\beta j)^{1/2^{a-1}} = u + 2v$, where $u \in G_{2^s-1} \cup \{0\}$ and $v \in R$. Using the binomial expansion, the mapping function can be expressed as

$$L_\beta^{(a)}(i, j) = (u + 2v)^{2^{a-1}} = \sum_{x=0}^{2^{a-1}} \binom{2^{a-1}}{x} u^{2^{a-1}-x} (2v)^x. \tag{10}$$

Observe that $\binom{2^{a-1}}{x} u^{2^{a-1}-x} (2v)^x \equiv 0 \pmod{2^a}$ for $x = 1, 2, \ldots, 2^{a-1}$. Thus, $L_\beta^{(a)}(i, j) = u^{2^{a-1}} \in G_{2^s-1} \cup \{0\}$.
Theorem 2. Consider two multiplicative groups $G_{2^s-1} \subset \mathrm{GR}(2^a, s) = \mathbb{Z}_{2^a}[y]/\langle \phi(y) \rangle$ and $G'_{2^s-1} \subset \mathrm{GR}(2^{a'}, s) = \mathbb{Z}_{2^{a'}}[y]/\langle \phi(y) \rangle$, where $\phi(y)$ is a degree-$s$ basic irreducible polynomial over $\mathbb{Z}_{2^a}$. Let $i, j \in G_{2^s-1} \cup \{0\}$ and $\beta \in G_{2^s-1}$; then $\bar{i}, \bar{j} \in G'_{2^s-1} \cup \{0\}$ and $\bar{\beta} \in G'_{2^s-1}$. Then, $L_{\bar{\beta}}^{(a')}(\bar{i}, \bar{j}) = \overline{L_\beta^{(a)}(i, j)}$.

Proof. Using the binomial expansion,

$$\overline{L_\beta^{(a)}(i, j)} = \sum_{x=0}^{2^{a-1}} \binom{2^{a-1}}{x} \big((i)^{1/2^{a-1}}\big)^{2^{a-1}-x} \big((\beta j)^{1/2^{a-1}}\big)^{x} \bmod 2^{a'}. \tag{11}$$
Now, observe that

$$\binom{2^{a-1}}{x} \bmod 2^{a'} = \begin{cases} \dbinom{2^{a'-1}}{y}, & x = y \cdot 2^{a-a'}, \text{ where } y \text{ is an integer}, \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

Thus,

$$\overline{L_\beta^{(a)}(i, j)} = \sum_{y=0}^{2^{a'-1}} \binom{2^{a'-1}}{y} \big((i)^{1/2^{a-1}}\big)^{2^{a-1} - y \cdot 2^{a-a'}} \big((\beta j)^{1/2^{a-1}}\big)^{y \cdot 2^{a-a'}} \bmod 2^{a'} = \sum_{y=0}^{2^{a'-1}} \binom{2^{a'-1}}{y} \big((\bar{i})^{1/2^{a'-1}}\big)^{2^{a'-1} - y} \big((\bar{\beta}\bar{j})^{1/2^{a'-1}}\big)^{y} = L_{\bar{\beta}}^{(a')}(\bar{i}, \bar{j}). \tag{13}$$
Remark 2. When $a' = 1$, the mapping function $L_{\bar{\beta}}^{(1)}(\bar{i}, \bar{j}) = \bar{i} + \bar{\beta}\bar{j}$ coincides with the mapping function applied to Galois fields. Since $L_{\bar{\beta}}^{(1)}(\bar{i}, \bar{j}) = \overline{L_\beta^{(a)}(i, j)}$ (from Theorem 2), $L_\beta^{(a)}(i, j)$ is unique for a given pair $(i, j)$. It follows that two Latin squares constructed from $L_{\beta_0}^{(a)}(i, j)$ and $L_{\beta_1}^{(a)}(i, j)$, where $\beta_0, \beta_1 \in G_{2^s-1}$ and $\beta_0 \neq \beta_1$, are orthogonal.
Let $R = C = S = G_{2^s-1} \cup \{0\}$. A complete family $\{(R, C, S; L_\beta^{(a)}) : \beta \in G_{2^s-1}\}$ of MOLS is obtained by defining $L_\beta^{(a)}(i, j) = \big((i)^{1/(2^{a-1})} + (\beta j)^{1/(2^{a-1})}\big)^{2^{a-1}}$.
Example 4. Let $R = C = S = G_3 \cup \{0\} \subset \mathrm{GR}(2^2, 2)$ and consider the mapping functions $L_1^{(2)}(i, j) = ((i)^{1/2} + (j)^{1/2})^2$, $L_\alpha^{(2)}(i, j) = ((i)^{1/2} + (\alpha j)^{1/2})^2$, and $L_{\alpha^2}^{(2)}(i, j) = ((i)^{1/2} + (\alpha^2 j)^{1/2})^2$. The resultant MOLS are

$$M_1 = \begin{bmatrix} 0 & 1 & \alpha & \alpha^2 \\ 1 & 0 & \alpha^2 & \alpha \\ \alpha & \alpha^2 & 0 & 1 \\ \alpha^2 & \alpha & 1 & 0 \end{bmatrix}, \quad M_\alpha = \begin{bmatrix} 0 & \alpha & \alpha^2 & 1 \\ 1 & \alpha^2 & \alpha & 0 \\ \alpha & 0 & 1 & \alpha^2 \\ \alpha^2 & 1 & 0 & \alpha \end{bmatrix}, \quad M_{\alpha^2} = \begin{bmatrix} 0 & \alpha^2 & 1 & \alpha \\ 1 & \alpha & 0 & \alpha^2 \\ \alpha & 1 & \alpha^2 & 0 \\ \alpha^2 & 0 & \alpha & 1 \end{bmatrix}, \tag{14}$$

respectively. A complete family of three MOLS is obtained. In addition, the mapping function $L_0(i, j) = i$ yields the matrix

$$M_0 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \alpha & \alpha & \alpha & \alpha \\ \alpha^2 & \alpha^2 & \alpha^2 & \alpha^2 \end{bmatrix}, \tag{15}$$

which is orthogonal to each Latin square in the complete family of MOLS.

Figure 1: Portion of the parity-check matrix constructed in each step.
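As an illustrative check of Definition 2 and Example 4 (our own sketch, not from the paper), the following code does arithmetic in $\mathrm{GR}(4, 2) = \mathbb{Z}_4[y]/\langle y^2 + y + 3 \rangle$, representing elements as coefficient pairs $(c_0, c_1)$ for $c_0 + c_1 y$, and verifies that each $L_\beta^{(2)}$ table is a Latin square. Square roots within $G_3 \cup \{0\}$ exist and are unique because squaring permutes $G_3$:

```python
M = 4  # coefficients mod 4

def add(u, v):
    return ((u[0] + v[0]) % M, (u[1] + v[1]) % M)

def mul(u, v):
    c0 = u[0] * v[0]
    c1 = u[0] * v[1] + u[1] * v[0]
    c2 = u[1] * v[1]                           # coefficient of y^2
    return ((c0 + c2) % M, (c1 + 3 * c2) % M)  # reduce y^2 -> 3y + 1

zero, one = (0, 0), (1, 0)
alpha = (2, 1)                 # y + 2, generator of G_3
alpha2 = mul(alpha, alpha)     # 3y + 1
S = [zero, one, alpha, alpha2] # G_3 union {0}

def sqrt(x):
    """Unique square root of x within G_3 union {0}."""
    return next(u for u in S if mul(u, u) == x)

def L(beta, i, j):
    """Definition 2 with a = 2: L_beta(i, j) = (i^(1/2) + (beta*j)^(1/2))^2."""
    t = add(sqrt(i), sqrt(mul(beta, j)))
    return mul(t, t)

for beta in [one, alpha, alpha2]:
    table = [[L(beta, i, j) for j in S] for i in S]
    # Latin-square property: every row and column is a permutation of S.
    assert all(sorted(row) == sorted(S) for row in table)
    assert all(sorted(col) == sorted(S) for col in zip(*table))
```

For instance, $L_1^{(2)}(1, 1) = (1 + 1)^2 = 2^2 = 0$ in $\mathbb{Z}_4$, matching the second-row, second-column entry of $M_1$ in (14).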
4. STRUCTURED LDPC CODES OVER $\mathbb{Z}_{2^a}$
4.1. Construction of graphs using Latin squares
The construction method proposed in [14, Section IV-A] can be generalized to construct graphs for different values of $a$ and $s$ by altering the mapping functions according to the value of $a$. The procedure is stated here for easy reference in Theorem 3, which follows. The graph is a tree with three layers enumerated from its root: the root is a variable node, the first layer has $2^s + 1$ check nodes, the second layer has $2^s(2^s + 1)$ variable nodes, and the third layer has $2^{2s}$ check nodes. Thus there are $2^{2s} + 2^s + 1$ variable nodes and the same number of check nodes. The nodes are connected in the following steps.

(1) The root variable node is connected to each of the check nodes in the first layer.

(2) Each check node in the first layer is connected to $2^s$ consecutive variable nodes in the second layer.

(3) Each of the first $2^s$ variable nodes in the second layer is connected to $2^s$ consecutive check nodes in the third layer.

(4) For $i, j, k, \beta \in G_{2^s-1} \cup \{0\}$, label the remaining variable nodes in the second layer $(\beta, i)$ and all check nodes in the third layer $(j, k)$. If $\beta = 0$, variable node $(0, i)$ is connected to check node $(j, i)$. If $\beta \in G_{2^s-1}$, variable node $(\beta, i)$ is connected to check node $(j, L_\beta^{(a)}(i, j))$. The tree is complete once all possible combinations of $(i, j, k, \beta)$ are exhausted.
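As a sanity check on the steps above, the following sketch builds the $a = 1$, $s = 2$ graph, where $L_\beta^{(1)}(i, j) = i + \beta j$ in GF(4), and verifies the node count $2^{2s} + 2^s + 1 = 21$ per side and the degree-$(2^s + 1)$ regularity claimed below. The concrete indexing of the "consecutive" nodes in steps (2) and (3), and the identification of GF(4) elements with the integers 0–3, are our own assumptions; the paper fixes the construction only up to such relabelings.

```python
from collections import defaultdict

def gf4_mul(u, v):
    """GF(4) multiplication with elements encoded as integers 0-3."""
    r = 0
    for bit in range(2):
        if v & (1 << bit):
            r ^= u << bit
    if r & 0b100:
        r ^= 0b111  # reduce x^2 -> x + 1
    return r

s = 2
q = 2 ** s
elems = list(range(q))
edges = set()

# Step 1: root variable node to each first-layer check node.
for m in range(q + 1):
    edges.add((('root',), ('c1', m)))
# Step 2: each first-layer check node to 2^s second-layer variable nodes
# (check 0 takes the "first 2^s" nodes; checks 1..2^s take the (beta, i) nodes).
for i in elems:
    edges.add((('c1', 0), ('v2', i)))
for b_idx, beta in enumerate(elems):
    for i in elems:
        edges.add((('c1', b_idx + 1), ('v2b', beta, i)))
# Step 3: first 2^s variable nodes to 2^s consecutive third-layer checks.
for i in elems:
    for k in elems:
        edges.add((('v2', i), ('c3', i, k)))
# Step 4: remaining variable nodes connected via the Latin squares (a = 1).
for beta in elems:
    for i in elems:
        for j in elems:
            k = i if beta == 0 else i ^ gf4_mul(beta, j)
            edges.add((('v2b', beta, i), ('c3', j, k)))

deg = defaultdict(int)
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
assert len(deg) == 2 * (q * q + q + 1)        # 21 variable + 21 check nodes
assert all(d == q + 1 for d in deg.values())  # degree-(2^s + 1) regular
```

The Latin-square property in step (4) is what gives every third-layer check node exactly one edge per $\beta$, which is where the regularity comes from.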
Let T(a, s) denote the resultant tree constructed using the complete family of MOLS derived from G_{2^s−1} ∪ {0} ⊆ R. T(a, s) is a degree-(2^s + 1) regular tree. Reading the variable (check) nodes as columns (rows) of a matrix H(a, s) ∈ Z_{2^a}^{(2^{2s}+2^s+1)×(2^{2s}+2^s+1)} in a top-to-bottom, left-to-right manner, while setting the edge weights to be randomly chosen units from Z_{2^a}, the portion of H(a, s) constructed at each step is illustrated in Figure 1. The null space of H(a, s) yields an LDPC code C(a, s) over Z_{2^a}.
Example 5. Let a = 2 and s = 2. The Latin squares are shown in Example 4. Steps (1)–(3) are illustrated in Figure 2(a). As observed, this can be perceived as the nonrandom portion of the parity-check matrix. Step (4), on the other hand, generates the pseudorandom portion that is commonly seen in most LDPC parity-check matrices. The resultant tree is shown in Figure 2(b).
4.2. Properties of C(a, s)

C(a, s) is a length-n(s) = 2^{2s} + 2^s + 1 regular LDPC code represented by H(a, s) (or T(a, s)). We denote the minimum distance of C(a, s) by d_min(a, s). Following the definition given in [14], w_min(a, s) denotes the minimum pseudocodeword weight that arises from the Tanner graph of C(a, s) for the 2^a-ary symmetric channel.
Theorem 3. The graph obtained by reducing, mod 2^{a′}, all edge weights of T(a, s) is T(a′, s); that is, H(a′, s) = H(a, s) mod 2^{a′}.

Proof. First, the connection procedure does not depend on a in steps (1)–(3), and similarly for β = 0 in step (4). Since L_β^{(a′)}(i, j) = L_β^{(a)}(i, j) (from Theorem 2), the edge ((β, i), (j, L_β^{(a)}(i, j))) in T(a, s) is equivalent to the edge ((β, i), (j, L_β^{(a′)}(i, j))) in T(a′, s).
Remark 3. The graphs constructed by setting a = 1 yield binary codes that are the same as those in [14, Section IV-A]. Further, it has also been shown that these codes are the binary projective geometry (PG) LDPC codes introduced in [5]. Thus, it is known that d_min(1, s) = 2^s + 2.
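This value can be checked numerically for s = 2, again under the assumed MOLS stand-in L_β(i, j) = βj + i over GF(4) (the paper derives its MOLS from a Galois ring): build H(1, 2), extract a basis of its binary null space, and exhaustively enumerate the 2^11 codewords. The expected minimum weight is 2^2 + 2 = 6.

```python
# Numerical check of d_min(1, 2) = 2^2 + 2 = 6 for the binary code C(1, 2).
# Assumption: MOLS taken as L_beta(i, j) = beta*j + i over GF(4).
GF4_MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

def build_rows(s=2):
    """Return H(1, s) as a list of integers (bit c of rows[r] = H[r][c])."""
    q = 2 ** s
    n = q * q + q + 1
    rows = [0] * n
    for t in range(q + 1):
        rows[t] |= 1                                  # step (1)
        for i in range(q):
            rows[t] |= 1 << (1 + q * t + i)           # step (2)
    for i in range(q):
        for k in range(q):
            rows[q + 1 + q * i + k] |= 1 << (1 + i)   # step (3)
    for beta in range(q):
        for i in range(q):
            v = 1 + q + q * beta + i                  # step (4)
            for j in range(q):
                rows[q + 1 + q * j + (GF4_MUL[beta][j] ^ i)] |= 1 << v
    return rows

def nullspace_gf2(rows, n):
    """Basis of {x : Hx = 0 over GF(2)}, rows and vectors as n-bit integers."""
    pivots = {}                                       # pivot column -> reduced row
    for r in rows:
        for c, pr in list(pivots.items()):            # reduce r by existing pivots
            if (r >> c) & 1:
                r ^= pr
        if r:
            c = r.bit_length() - 1                    # new pivot column
            for cc in pivots:                         # keep rows mutually reduced
                if (pivots[cc] >> c) & 1:
                    pivots[cc] ^= r
            pivots[c] = r
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = 1 << f                                    # free variable set to 1
        for c, pr in pivots.items():
            if (pr >> f) & 1:
                v |= 1 << c
        basis.append(v)
    return basis

def min_distance(basis):
    """Exhaustive minimum Hamming weight over all nonzero codewords."""
    best = None
    for m in range(1, 1 << len(basis)):
        cw = 0
        for b, vec in enumerate(basis):
            if (m >> b) & 1:
                cw ^= vec
        w = bin(cw).count("1")
        best = w if best is None else min(best, w)
    return best
```

The null space has dimension 11, matching the a = 1, s = 2 rate 11/21 ≈ 0.5238 in Table 1.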
Before deriving d_min(a, s), we state two relationships between the codewords of C(a, s) and C(a′, s).

Corollary 1. (i) If c ∈ C(a, s), then c mod 2^{a′} ∈ C(a′, s).
(ii) If c ∈ C(a, s) can be expressed as c = 2^{a−a′} c′, where c′ ∈ Z_{2^{a′}}^n, then c′ ∈ C(a′, s) and is unique.
Proof. Corollary 1(i) is a simple consequence of Theorem 3, while for Corollary 1(ii),

2^{a−a′} c′ H^T(a, s) = 0 mod 2^a
⟹ c′ H^T(a, s) = 0 mod 2^{a′}
⟹ c′ H^T(a′, s) = 0 mod 2^{a′}  (from Theorem 3).   (16)

The uniqueness of c′ follows from the natural group embedding GR(2^{a′}, s) → R : r ↦ 2^{a−a′} r.
Theorem 4. d_min(a, s) = d_min(1, s).

Proof. Let d_c be the Hamming weight of c ∈ C(a, s) \ {0}.

Case 1. c contains at least one unit. From Corollary 1(i) with a′ = 1, the reduction c̄ = c mod 2 ∈ C(1, s) and is nonzero. Further, d_c ≥ d_{c̄}. Since d_{c̄} ≥ d_min(1, s), d_c ≥ d_min(1, s).

Case 2.1. c can be expressed as c = 2^{a−a′} c′, where c′ contains at least one unit of Z_{2^{a′}}. From Corollary 1(ii), c′ ∈ C(a′, s). Further, d_c = d_{c′}, and from Case 1, d_{c′} ≥ d_min(1, s). When a′ = 1, c = 2^{a−1} c′ and c′ ∈ C(1, s); if d_{c′} = d_min(1, s), then d_c = d_min(1, s).

Case 2.2. c can be expressed as c = 2^{a−a′} c′, where c′ does not contain any unit of Z_{2^{a′}}. Similarly, from Corollary 1(ii), c′ ∈ C(a′, s). Therefore, d_c = d_{c′}, and the bounds on d_{c′} follow Case 2.1.

Thus, d_min(a, s) = d_min(1, s).
It has already been shown in [14, Section IV-A] that w_min(1, s) = d_min(1, s). The following theorem states the relationship between w_min(a, s) and d_min(a, s).

Theorem 5. w_min(a, s) = d_min(a, s).
Proof. Since T(1, s) = T(a, s) mod 2 (from Theorem 3 with a′ = 1) and all edge weights in T(a, s) are units of Z_{2^a}, w_min(a, s) and w_min(1, s) share the same tree bound [14]; that is, w_min(a, s) ≥ 2^s + 2 for all a. Further, d_min(a, s) = d_min(1, s) = 2^s + 2 (from Theorem 4). Thus,

2^s + 2 ≤ w_min(a, s) ≤ d_min(a, s) = 2^s + 2
⟹ w_min(a, s) = d_min(a, s) = 2^s + 2.   (17)
The code rate r(a, s) is computed by first reducing H(a, s) to the form discussed in Section 2.1. r(a, s) is bounded by

(2^{2s} + 2^s − 3^s) / (a(2^{2s} + 2^s + 1)) ≤ r(a, s) ≤ (2^{2s} + 2^s − 3^s) / (2^{2s} + 2^s + 1),   (18)
where the upper bound corresponds to the code rates of the binary PG-LDPC codes [5]. We observe that by setting the edge weights of T(a, s) to randomly chosen units from Z_{2^a}, r(a, s) tends to the lower bound, which results in codes suitable for low-rate applications. On the other hand, by setting all edge weights to unity, r(a, s) increases significantly. The corresponding codes can thus be deployed in moderate-rate applications. Table 1 compiles the properties of C(a, s) for various values of a and s.
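As a quick consistency check, the bounds in (18) reproduce the rate columns of Table 1; a minimal sketch:

```python
def rate_bounds(a, s):
    """Lower and upper bounds on r(a, s) from (18)."""
    n = 2 ** (2 * s) + 2 ** s + 1          # codelength n(s)
    k = 2 ** (2 * s) + 2 ** s - 3 ** s     # numerator shared by both bounds
    return k / (a * n), k / n

# The upper bound is the binary PG-LDPC rate, and the two bounds meet at a = 1.
```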
5. SIMULATION RESULTS
Figures 3 and 4 show the bit error rate (BER) and symbol
error rate (SER) performance of our structured codes over
[Figure 2: Tree constructed for a = 2, s = 2 after (a) steps (1)–(3), and (b) step (4) (the final structure). The second-layer variable nodes and third-layer check nodes are labelled by pairs (β, i), (j, k) with entries from {0, 1, α, α²}.]
the AWGN channel. In Figure 3(a), the corresponding edge weights of the codes simulated are randomly chosen units of Z_4, while those in Figures 3(b) and 4 are set to unity. The codewords are transmitted using the matched signals discussed in Section 2.2. The received signals are decoded using the sum-product algorithm. The performance of random, near-regular LDPC codes with a constant variable node degree of 3 is also shown. These codes have codelengths and rates similar to those of the structured codes. For each data point, 10^4 error bits are collected, with a maximum of 100 iterations allowed for decoding each received signal vector.
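The fixed-error-count stopping rule described above can be sketched as a generic Monte-Carlo harness. Here uncoded BPSK over AWGN stands in for the actual encoder/decoder pair; this is purely an illustrative simplification.

```python
import math
import random

def awgn_bpsk_ber(ebn0_db, target_errors=400, seed=7):
    """Monte-Carlo BER loop in the spirit of the setup above: run until a
    fixed number of bit errors is collected, so every plotted point carries
    comparable statistical confidence. Uncoded BPSK over AWGN is used as a
    stand-in for the structured-LDPC encoder/decoder pair."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))   # noise std per real dimension (Es = Eb = 1)
    errors = bits = 0
    while errors < target_errors:
        tx = 1 if rng.random() < 0.5 else -1
        rx = tx + rng.gauss(0, sigma)
        errors += (rx > 0) != (tx > 0)  # hard decision at zero
        bits += 1
    return errors / bits
```

At E_b/N_0 = 0 dB this measured BER should sit near the theoretical Q(√2) ≈ 0.079 for uncoded BPSK.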
Figure 3(a) shows our structured Z_4 code outperforming the random code when the codelength is small, that is, 42 bits. On the other hand, Figure 3(b) shows our structured code performing worse than its random counterpart when the codelength is much larger, specifically, 2114 bits. At a glance, it therefore appears that our structured codes are only better than random codes at short codelengths. To get a clearer picture of how our codes fare in comparison to their random counterparts, we turn to Figures 4(a) and 4(b), which summarize the BER performance of random and structured codes over Z_4 and Z_8, respectively, for increasing codelengths of 21, 146, and 546 bits (respectively, 63, 219, and 819 bits). From these empirical results, we conclude that our codes significantly outperform their random counterparts over a wide BER range for very small codelengths, that is, less than 100 bits. On the other hand, for larger codelengths, random codes perform better in the higher BER region while our structured codes are superior at lower BERs, specifically, 10^{−4} and below for codelengths close to 1000 bits, and 10^{−6} and below for larger codelengths exceeding 2000 bits. This phenomenon may be attributed to the fact that the minimum distance of our codes grows linearly with the square root of
Table 1: Properties of C(a, s).

a  s  n(s)  Degree of T(a,s)  d_min(a,s) = w_min(a,s)  r(a,s) (lower bound)  r(a,s) (unity edge weights)
1  2    21         5                    6                    0.5238                0.5238
2  2    21         5                    6                    0.2619                0.4762
3  2    21         5                    6                    0.1746                0.3175
4  2    21         5                    6                    0.1309                0.2381
1  3    73         9                   10                    0.6164                0.6164
2  3    73         9                   10                    0.3082                0.5548
3  3    73         9                   10                    0.2055                0.4932
4  3    73         9                   10                    0.1541                0.3699
1  4   273        17                   18                    0.6996                0.6996
2  4   273        17                   18                    0.3498                0.6337
3  4   273        17                   18                    0.2332                0.5653
4  4   273        17                   18                    0.1749                0.4982
1  5  1057        33                   34                    0.7692                0.7692
2  5  1057        33                   34                    0.3846                0.7053
3  5  1057        33                   34                    0.2564                0.6367
4  5  1057        33                   34                    0.1923                0.5669
[Figure 3: Performance of structured and random LDPC codes over Z_4 with QPSK signaling over the AWGN channel: structured and random BER and SER versus E_b/N_0 (dB). (a) a = 2, s = 2, random edge weights; (b) a = 2, s = 5, unity edge weights.]
their codelength. On the other hand, from [23, Theorem 26], we have that the minimum distance of a random, regular LDPC code with a constant variable node degree of 3 grows linearly with its codelength with high probability. As the random codes considered here are near-regular, we believe that they have superior minimum distances compared to our structured codes.
6. CONCLUSION

To summarize, we have extended the notion of Latin squares to multiplicative groups of a Galois ring. Using the generalized mapping function, we have constructed Tanner graphs representing a family of structured LDPC codes over Z_{2^a} spanning a wide range of code rates. In addition, we
[Figure 4: Performance of structured and random LDPC codes transmitted using matched signals over the AWGN channel: BER versus E_b/N_0 (dB) for s = 2, 3, 4. (a) a = 2, unity edge weights, transmitted using QPSK signaling; (b) a = 3, unity edge weights, transmitted using 8-PSK signaling.]
have shown that the minimum pseudocodeword weight of these codes is equal to their minimum Hamming distance, a desirable attribute under iterative decoding. Finally, our simulation results show that these codes, when transmitted by matched signal sets over the AWGN channel, can significantly outperform their random counterparts of similar length and rate at BERs of practical interest.
ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their helpful comments, which led to significant improvements in Sections 1 and 5 of this paper. The authors also gratefully acknowledge financial support from the Ministry of Education ACRF Tier 1 Research Grant no. R263000361112.
REFERENCES
[1] M. C. Davey and D. J. C. MacKay, “Low density parity check codes over GF(q),” IEEE Communications Letters, vol. 2, no. 5, pp. 159–166, 1998.
[2] D. Slepian, “Group codes for the Gaussian channel,” Bell
System Technical Journal, vol. 47, pp. 575–602, 1968.
[3] G. D. Forney Jr., “Geometrically uniform codes,” IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1241–1260, 1991.
[4] D. Sridhara and T. E. Fuja, “LDPC codes over rings for PSK
modulation,” IEEE Transactions on Information Theory, vol.
51, no. 9, pp. 3209–3220, 2005.
[5] Y. Kou, S. Lin, and M. P. C. Fossorier, “Low-density parity-check codes based on finite geometries: a rediscovery and new results,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2711–2736, 2001.
[6] B. Vasic and O. Milenkovic, “Combinatorial constructions of low-density parity-check codes for iterative decoding,” IEEE Transactions on Information Theory, vol. 50, no. 6, pp. 1156–1176, 2004.
[7] I. B. Djordjevic and B. Vasic, “Nonbinary LDPC codes for
optical communication systems,” IEEE Photonics Technology
Letters, vol. 17, no. 10, pp. 2224–2226, 2005.
[8] A. Bennatan and D. Burshtein, “Design and analysis of nonbinary LDPC codes for arbitrary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 549–583, 2006.
[9] X.-Y. Hu and E. Eleftheriou, “Binary representation of cycle Tanner-graph GF(2^b) codes,” in Proceedings of IEEE International Conference on Communications (ICC ’04), vol. 1, pp. 528–532, Paris, France, June 2004.
[10] C. Poulliat, M. Fossorier, and D. Declercq, “Using binary image of nonbinary LDPC codes to improve overall performance,” in Proceedings of IEEE International Symposium on Turbo Codes, Munich, Germany, April 2006.
[11] C. A. Kelley, D. Sridhara, and J. Rosenthal, “Pseudocodeword
weights for nonbinary LDPC codes,” in Proceedings of IEEE
International Symposium on Information Theory (ISIT ’06), pp.
1379–1383, Seattle, Wash, USA, July 2006.
[12] R. Koetter and P. O. Vontobel, “Graph-covers and iterative decoding of finite length codes,” in Proceedings of the 3rd IEEE International Symposium on Turbo Codes and Applications, pp. 75–82, Brest, France, September 2003.
[13] N. Wiberg, Codes and decoding on general graphs, Ph.D. thesis, Linköping University, Linköping, Sweden, 1996.
[14] C. A. Kelley, D. Sridhara, and J. Rosenthal, “Tree-based construction of LDPC codes having good pseudocodeword weights,” IEEE Transactions on Information Theory, vol. 53, no. 4, pp. 1460–1478, 2007.
[15] I. B. Djordjevic and B. Vasic, “MacNeish-Mann theorem based iteratively decodable codes for optical communication systems,” IEEE Communications Letters, vol. 8, no. 8, pp. 538–540, 2004.
[16] O. Milenkovic and S. Laendner, “Analysis of the cycle structure of LDPC codes based on Latin squares,” in Proceedings of IEEE International Conference on Communications (ICC ’04), vol. 2, pp. 777–781, Paris, France, June 2004.
[17] B. Vasic, I. B. Djordjevic, and R. K. Kostuk, “Low-density parity check codes and iterative decoding for long-haul optical communication systems,” Journal of Lightwave Technology, vol. 21, no. 2, pp. 438–446, 2003.
[18] I. B. Djordjevic and B. Vasic, “Iteratively decodable codes from
orthogonal arrays for optical communication systems,” IEEE
Communications Letters, vol. 9, no. 10, pp. 924–926, 2005.
[19] O. Milenkovic, N. Kashyap, and D. Leyba, “Shortened array
codes of large girth,” IEEE Transactions on Information Theory,
vol. 52, no. 8, pp. 3707–3722, 2006.
[20] G. Caire and E. M. Biglieri, “Linear block codes over cyclic
groups,” IEEE Transactions on Information Theory, vol. 41, no.
5, pp. 1246–1256, 1995.
[21] H. A. Loeliger, “Signal sets matched to groups,” IEEE Trans
actions on Information Theory, vol. 37, no. 6, pp. 1675–1682,
1991.
[22] J. H. van Lint and R. M. Wilson, A Course in Combinatorics,
Cambridge University Press, Cambridge, UK, 2nd edition,
2001.
[23] G. Como and F. Fagnani, “Average spectra and minimum distances of low density parity check codes over cyclic groups,” http://calvino.polito.it/~fagnani/groupcodes/ldpcgroupcodes.pdf.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 824673, 14 pages
doi:10.1155/2008/824673
Research Article
Differentially Encoded LDPC Codes—Part I:
Special Case of Product Accumulate Codes
Jing Li (Tiffany)
Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
Correspondence should be addressed to Jing Li (Tiffany), jingli@ece.lehigh.edu
Received 19 November 2007; Accepted 6 March 2008
Recommended by Yonghui Li
Part I of a two-part series investigates product accumulate codes, a special class of differentially encoded low-density parity-check (DE-LDPC) codes with high performance and low complexity, on flat Rayleigh fading channels. In the coherent detection case, Divsalar’s simple bounds and iterative thresholds using density evolution are computed to quantify the code performance at finite and infinite lengths, respectively. In the noncoherent detection case, a simple iterative differential detection and decoding (IDDD) receiver is proposed and shown to be robust for different Doppler shifts. Extrinsic information transfer (EXIT) charts reveal that, with pilot symbol assisted differential detection, the widespread practice of inserting pilot symbols to terminate the trellis actually incurs a loss in capacity, and a more efficient way is to separate the pilots from the trellis. Through analysis and simulations, it is shown that PA codes perform very well with both coherent and noncoherent detection. The more general case of DE-LDPC codes, where the LDPC part may take arbitrary degree profiles, is studied in Part II (Li, 2008).
Copyright © 2008 Jing Li (Tiffany). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION

The discovery of turbo codes and the rediscovery of low-density parity-check (LDPC) codes have renewed the research frontier of capacity-achieving codes [1, 2]. They have also revolutionized coding theory by establishing a new soft iterative paradigm, where long, powerful codes are constructed from short, simple codes and decoded through iterative message exchange and successive refinement between component decoders. Compared to turbo codes, LDPC codes boast lower decoding complexity, a richer variety of code constructions, and freedom from patents.
One important application of LDPC codes is wireless communications, where sender and receiver communicate through, for example, a no-line-of-sight land-mobile channel characterized by the Rayleigh fading model. It is well recognized that LDPC codes perform remarkably well on Rayleigh fading channels, that is, assuming the carrier phase is perfectly synchronized and coherent detection is performed; but what if otherwise?
It should be noted that, due to practical issues such as complexity, acquisition time, sensitivity to tracking errors, and phase ambiguity, coherent detection may become expensive or infeasible in some cases. In the context of noncoherent detection, the technique of differential encoding becomes immediately relevant. Differential encoding admits simple noncoherent differential detection, which resolves phase ambiguity and requires only frequency synchronization (often more readily available than phase synchronization). Viewed from the coding perspective, performing differential encoding is essentially concatenating the original code with an accumulator, that is, a recursive convolutional code of the form 1/(1 + D).
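Viewed operationally, the 1/(1 + D) accumulator and its differential inverse are one-liners; the sketch below is illustrative, not the paper's implementation:

```python
def accumulate(bits):
    """Differential encoding viewed as the rate-1 recursive code 1/(1 + D):
    y_k = x_k XOR y_{k-1}, with the running state initialized to 0."""
    y, out = 0, []
    for x in bits:
        y ^= x
        out.append(y)
    return out

def differential_decode(y_bits):
    """Inverse filter (1 + D): x_k = y_k XOR y_{k-1}. Each decision needs
    only the previous symbol as a reference, which is what makes simple
    noncoherent differential detection possible."""
    prev, out = 0, []
    for yk in y_bits:
        out.append(yk ^ prev)
        prev = yk
    return out
```

Encoding followed by differential decoding recovers the original bit stream exactly.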
In this two-part series, we investigate the theory and practice of LDPC codes with differential encoding. We start with a special class of differentially encoded LDPC (DE-LDPC) codes, namely, product accumulate (PA) codes (Part I), and then move to the general case where an arbitrary (random) LDPC code is concatenated with an accumulator (Part II) [3].
Product accumulate codes, proposed in [4] and depicted in Figure 1, are a class of serially concatenated codes, where the inner code is a differential encoder, and the outer code is a parallel concatenation of two branches of single-parity-check (SPC) codes, or equivalently a structured LDPC code comprising degree-1 and degree-2 variable nodes. Since the accumulator can also be described using a sparse bipartite graph, a PA code is, overall, an LDPC code. Alternatively, it may also be regarded as a differentially encoded LDPC code, to emphasize the impact of the inner differential encoder.

[Figure 1: PA codes. (a) Code structure. (b) Graph representation. The outer TPC/SPC code feeds, through a random interleaver π, the inner 1/(1 + D) code; y denotes the observation from the channel, and x the input bit to 1/(1 + D), that is, the output bit from TPC/SPC.]

The reasons to study PA codes are multifold. First, PA codes exhibit an interesting threshold property and remarkable performance, and are well established as a class of “good” codes with rates ≥ 1/2 and performance within a few tenths of a dB from the Shannon limit [4]. Here, “good” is in the sense defined by MacKay [2]. Second, PA codes are desirable for their simplicity. They are simple to describe, simple to encode and decode, and simple enough to allow rigorous theoretical analysis [4]. Comparatively, a random LDPC code can be expensive to describe and expensive to implement in VLSI (due to the difficulty of routing and wiring). Finally, PA codes are intrinsically differentially encoded, which naturally permits noncoherent differential detection without needing additional components.
The primary interest is the noncoherent detection case, but for completeness of investigation and for comparison, we also include the case of coherent detection. Under the assumption that phase information is known, we compute Divsalar’s simple bounds to benchmark the performance of PA codes at finite code lengths [5], and we evaluate iterative thresholds using density evolution (DE) to benchmark the performance of PA codes at infinite code lengths. The asymptotic thresholds reveal that PA codes are about 0.6 to 0.7 dB better than regular LDPC codes, but 0.5 dB worse than optimal irregular LDPC codes (whose maximal left degree is 50) on Rayleigh fading channels with coherent detection. Simulations at fairly long block lengths show good agreement with the analytical results.
When phase information is unavailable, the decoder/detector either proceeds without phase information (completely blind) or entails some (coarse) estimation and compensation in the decoding process. We regard either case as noncoherent detection. The presence of a differential encoder in the code structure readily lends PA codes to noncoherent differential detection. Conventional differential detection (CDD) operates on two symbol intervals and recovers the information by subtracting the phase of the previous signal sample from that of the current signal sample. It is cheap to implement, but suffers as much as 4 to 5 dB in bit error rate (BER) performance [6]. Closing the gap between CDD and differentially encoded coherent detection generally requires extending the observation window beyond two symbol intervals. The result is multisymbol differential detection (MSDD), exemplified by maximum-likelihood (ML) multisymbol detection, trellis-based multisymbol detection with per-survivor processing, and their variations [7, 8]. MSDD performs significantly better than CDD, at the cost of a considerably higher complexity which increases exponentially with the window size. To preserve the simplicity of PA codes, here we propose an efficient iterative differential detection and decoding (IDDD) receiver which is robust against various Doppler spreads and can perform, for example, within 1 dB of coherent detection on fast fading channels.
We investigate the impact of pilot spacing and filter lengths, and we show that the proposed PA IDDD receiver requires a very moderate number of pilot symbols compared to, for example, turbo codes [6]. It is to be expected that the percentage of pilots directly affects the performance, especially on very fast fading channels; much less expected is that how these pilot symbols are inserted also makes a large difference. Through extrinsic information transfer (EXIT) analysis [9], we show that the widespread practice of inserting pilot symbols to periodically terminate the trellis of the differential encoder [6, 7] inevitably incurs a loss in code capacity. We attribute this to what we call the “trellis segmentation” effect, namely, error events are made much shorter in the periodically terminated trellis than otherwise. We propose that pilot symbols be separated from the trellis structure, and simulation confirms the efficiency of the new method.
From analysis and simulation, it is fair to say that PA codes perform well with both coherent and noncoherent detection. In Part II of this series, we will show that conventional LDPC codes, such as regular LDPC codes with a uniform column weight of 3 and the optimized irregular ones reported in the literature, actually perform poorly with noncoherent differential detection. We will discuss why, how, and how much we can change the situation.
The rest of the paper is organized as follows. Section 2 introduces PA codes and the channel model. Section 3 analyzes coherently detected PA codes on fading channels using Divsalar’s simple bounds and iterative thresholds. Section 4 discusses noncoherent detection and decoding of PA codes and performs EXIT analysis. Finally, Section 5 summarizes the paper.
2. PA CODES AND CHANNEL MODEL

2.1. Channel model
We consider binary phase shift-keying (BPSK) signaling (0 → +1, 1 → −1) over flat Rayleigh fading channels. Assuming proper sampling of the outputs of the matched filter, the received discrete-time baseband signal can be modeled as r_k = α_k e^{jθ_k} s_k + n_k, where s_k is the BPSK-modulated signal and n_k is i.i.d. complex AWGN with zero mean and variance σ² = N_0/2 in each dimension. The fading amplitude α_k is modeled as a normalized Rayleigh random variable with E[α_k²] = 1 and pdf p_A(α_k) = 2α_k exp(−α_k²) for α_k > 0, and the fading phase θ_k is uniformly distributed over [0, 2π].
For fully interleaved channels, the α_k’s and θ_k’s are independent for different time indexes k. For insufficiently interleaved channels, they are correlated. We use Jakes’ isotropic scattering land-mobile Rayleigh channel model to describe the correlated Rayleigh process, which has autocorrelation R_k = (1/2) J_0(2kπ f_d T_s), where f_d T_s is the normalized Doppler spread and J_0(·) is the zeroth-order Bessel function of the first kind.
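A common way to generate such a correlated Rayleigh process is a sum-of-sinusoids approximation of the Jakes/Clarke model. The sketch below is illustrative only; the number of paths and the random generation of arrival angles are assumptions, not taken from the paper.

```python
import cmath
import math
import random

def jakes_fading(n_samples, fd_ts, n_paths=32, seed=1):
    """Correlated Rayleigh fading gain alpha_k * exp(j*theta_k) via a
    sum-of-sinusoids approximation of the isotropic-scattering (Jakes/Clarke)
    model. fd_ts is the normalized Doppler spread f_d * T_s."""
    rng = random.Random(seed)
    # Random arrival angles and initial phases for each scattered path.
    angles = [2 * math.pi * rng.random() for _ in range(n_paths)]
    phases = [2 * math.pi * rng.random() for _ in range(n_paths)]
    gains = []
    for k in range(n_samples):
        # Each path contributes a Doppler shift fd_ts * cos(angle).
        s = sum(cmath.exp(1j * (2 * math.pi * fd_ts * k * math.cos(a) + p))
                for a, p in zip(angles, phases))
        gains.append(s / math.sqrt(n_paths))   # normalize so E[alpha^2] = 1
    return gains
```

Averaged over many independent realizations, the gain power E[α²] is 1, matching the normalized Rayleigh amplitude above.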
Throughout the paper, θ_k is assumed perfectly known to the receiver/decoder in the coherent detection case, and unknown (and to be worked around) in the noncoherent detection case. Further, the receiver is said to have channel state information (CSI) if α_k is known (irrespective of θ_k), and no CSI otherwise.
2.2. PA codes and decoding analysis

A product accumulate code, as illustrated in Figure 1(a), consists of an accumulator (or differential encoder) as the inner code and a parallel concatenation of two branches of single-parity check codes as the outer code. PA codes are decoded through a soft iterative process in which soft extrinsic information is exchanged between component decoders, conforming to the turbo principle. The outer code, modeled as a structured LDPC code, is decoded using the message-passing algorithm. The inner code, taking the convolutional form of 1/(1 + D), may be decoded either using the trellis-based BCJR algorithm or a graph-based message-passing algorithm. The latter, thanks to the cycle-free code graph of 1/(1 + D), performs as optimally as the BCJR algorithm but at several times lower complexity [4, 10]. Thus, the entire code can be efficiently decoded through a unified message-passing algorithm, driven by the initial log-likelihood ratio (LLR) values extracted from the channel [4].
For Rayleigh fading channels with perfect CSI, that is, α_k known for all k, the initial channel LLRs are computed as

L_ch^CSI(s_k) = (4α_k / N_0) r_k,   (1)

and for Rayleigh fading channels without CSI,

L_ch^NCSI(s_k) = (4E[α_k] / N_0) r_k,   (2)

where E[α] = √π/2 ≈ 0.8862 is the mean of α. Due to space limitations, we omit the details of the overall message-passing algorithm and refer readers to [4].
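Equations (1) and (2) differ only in whether the true fading amplitude or its mean E[α] = √π/2 scales the received sample; a minimal sketch:

```python
import math

def channel_llr(r_k, alpha_k=None, n0=1.0):
    """Initial LLR for a BPSK symbol received over Rayleigh fading.
    With CSI (alpha_k known): L = 4 * alpha_k * r_k / N0, as in (1).
    Without CSI (alpha_k=None): alpha_k is replaced by its mean
    E[alpha] = sqrt(pi)/2, as in (2)."""
    a = alpha_k if alpha_k is not None else math.sqrt(math.pi) / 2
    return 4.0 * a * r_k / n0
```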
3. COHERENT DETECTION

This section investigates the coherent detection case on Rayleigh fading channels. We employ Divsalar’s simple bounds and the iterative threshold to analyze the ensemble-average performance of PA codes, and we simulate individual PA codes at short and long lengths.
3.1. Simple bounds

Union bounds are simple to compute but rather loose at low SNRs. Divsalar’s simple bound is possibly one of the best closed-form bounds [5]. Like many other tight bounds, the simple bound is based on Gallager’s second bounding technique [1]. By using numerical integration instead of a Chernoff bound, and by reducing the number of codewords included in the bound, Divsalar was able to tighten the bound to overcome the cutoff rate limitation. Since the simple bound requires knowledge of the distance spectrum, a hard-to-attain property especially for concatenated codes, it has not seen wide application. Here, the simplicity of PA codes permits an accurate computation of the ensemble-average distance spectrum (whose details can be found in [4]), and thus enables the exploitation of the simple bound.
The technique of the simple bound allows for the computation of either a maximum-likelihood (ML) threshold in the asymptotic sense [4, 5] or a performance upper bound with respect to a given finite length. Divsalar derived the general form of the simple bound on independent Rayleigh fading channels with perfect CSI. Following a similar line of reasoning, below we extend it to the no-CSI case.
3.1.1. Gallager’s second bounding technique

Gallager’s second bounding technique sets the base for many tight bounds, including the simple bound [1]. It states that

Pr(error) ≤ Pr(error, r ∈ R) + Pr(r ∉ R),   (3)

where r = γαs + n is the received codeword (an N-dimensional noise-corrupted vector), s is the transmitted codeword vector, n is the noise vector whose components are i.i.d. Gaussian random variables with zero mean and unit variance, γ is a known constant (set by the modulation), α is the N × N matrix of fading coefficients (α is the identity matrix for AWGN channels), and R denotes a region of the observation space around the transmitted codeword. To get a tight bound, optimization and integration are usually needed to determine a meaningful R.
3.1.2. Divsalar’s simple bound for independent Rayleigh fading channels with CSI

For Rayleigh fading channels, the decision metric is based on minimizing the norm ‖r − γαs‖, where s, r, and α are the transmitted signal, received signal, and fading amplitudes in vector form, respectively, and γ is the amplitude of the transmitted signal such that γ²/2 = E_s/N_0.

For a good approximation of the error probability using (3), and for computational simplicity, the decision region R was chosen as an N-dimensional hypersphere centered at ηγαs with radius √N R, where η and R are the parameters to be optimized [5].
When perfect CSI is available, the effect of fading can be compensated through a linear transformation on γαs. In particular, a rotation e^{jϕ} and a rescaling ζ have been shown to yield a good and analytically feasible solution [5]:

R = { r : ‖r − ζ e^{jϕ} γαs‖² ≤ NR² },   (4)
which leads to the upper bound on the error probability of an (N, K, R) code [5]:

P(e) ≤ Σ_{h=2}^{2√(N−K+1)} min{ e^{−N E(c, δ, ρ, β, κ, φ)},  e^{N γ_N(δ)} (1/π) ∫_0^{π/2} [ sin²θ / (sin²θ + c) ]^h dθ },   (5)
where

E(c, δ, ρ, β, κ, φ) = −ρ γ_N(δ) + (ρ/2) log(β/ρ) + ((1−ρ)/2) log((1−β)/(1−ρ))
  + ρδ log(1 + c(1 − 2κφ)) + ρ(1−δ) log[ 1 + c( 1 − 2κφ − (1−κ²)ρ/β ) ]
  + (1−ρ) log[ 1 + c( (1 − ρ(1−2κφ))/(1−ρ) − (1 − ρ(1−κ))² / ((1−ρ)(1−β)) ) ],
   (6)
c = γ²/2 = E_s/N_0 = R E_b/N_0,   (7)

δ = h/N,   (8)

γ_N(δ) = γ_N(h/N) = (1/N) log(A_h)  for the word error rate,
γ_N(δ) = (1/N) log( Σ_w (w/K) A_{w,h} )  for the bit error rate.   (9)
3.1.3. Extension of the simple bound to the case of no CSI

Another simple and reasonable choice of the decision region is an ellipsoid centered at ηγs, which can be obtained by rescaling each coordinate of r so as to compensate for the effect of fading:

R = { r : ‖α^{−1} r − ηγ s‖² ≤ NR² },   (10)
where η and R are optimized. For independent Rayleigh channels without CSI, since accurate information on α is unavailable, we resort to the expectation of the fading coefficient and replace α^{−1} by (1/0.8862) I in (10), where I is the identity matrix. By replicating the computations described in [5], we obtain the upper bound on the bit error rate for independent Rayleigh channels without CSI:

P(e) ≤ Σ_{h=2}^{2√(N−K+1)} min{ e^{−N E(c, δ, ρ)},  exp( hυ N γ_N(δ) / c ) [ 1 − 2/( √(1 + 2/υ) + 1 ) ]^h },   (11)
where

E(c, δ, ρ) = −(1/2) log[ 1 − ρ + ρ e^{2γ_N(δ)} + c( 1 + ((1−δ)/δ)(1 + ((1−ρ)/ρ) e^{−2γ_N(δ)}) ) ]^{−1},

ρ = [ 1 + ((1−β)/β) e^{2γ_N(δ)} ]^{−1},

β = { 2c ((1−δ)/δ) (1 − e^{−2γ_N(δ)})^{−1} + ((1−δ)/δ)² [ (1+c)² − 1 ] }^{1/2} − (1+c)(1−δ)/δ,

c = E²[α] γ²/2 = 0.8862² R E_b/N_0,

υ = (γ²/2)² − 1 = (R E_b/N_0)² − 1,   (12)

and δ and γ_N(δ) are the same as in (8) and (9).
Please note that the above extension to the fading case with no CSI slightly loosens the simple bound, but it preserves the computational simplicity. A more sophisticated transformation could possibly yield tighter bounds, but not necessarily a feasible analytical expression.
Figure 2 plots the simulated BER performance and the simple bound of a (1024, 512) PA code on independent Rayleigh fading channels with and without CSI. Since an optimal ML decoder is assumed and the ensemble-average distance spectrum is used in the computation, the simple bound represents the best ensemble-average performance and may not accurately reflect the individual PA code being simulated. Nevertheless, we see that the bound is fairly tight. It provides a useful indication of the code performance at SNRs below the cutoff rate and, at high SNRs, it joins the union bound to predict the error floor.
3.2. Threshold computation via the iterative analysis
The ML performance bound evaluated in the previous
subsection factors in the ﬁnite length of a PA code ensemble,
Jing Li (Tiﬀany) 5
Figure 2: Divsalar simple bounds for R = 0.5 PA codes. K = 512, independent fading; BER versus E_b/N_0 (dB); curves: simulation and Divsalar bound, each with and without CSI.
but the assumption of an ML decoder may be optimistic.
Below we account for the iterative nature of the practical
decoder and compute an asymptotic iterative threshold using
the renowned method of density evolution [11].
A useful tool for analyzing the iterative decoding process
of sparse-graph codes, density evolution examines the
probability density function (pdf) of the messages exchanged
in each step and can, in principle, track the entire
decoding process. In general, we are more interested in the
asymptotic SNR threshold, η, defined as the critical
channel condition required for the decoding process
to converge to the correct decision:
\[
\eta\,(\mathrm{dB}) = \min\Bigl\{ \mathrm{SNR} : \lim_{l\to\infty} \int_{-\infty}^{0} f^{(l)}_{L_y}(\zeta)\, d\zeta = 0 \Bigr\},
\tag{13}
\]
where y = ±1 is the BPSK-modulated signal, and f^{(l)}_{L_y}
denotes the pdf of the LLR information on y after the lth
decoding iteration.
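Since the threshold in (13) is the smallest SNR at which decoding converges, and convergence is monotone in SNR, it can be located by bisection once a density-evolution convergence test is in hand. A minimal sketch, with a stand-in convergence predicate in place of the actual density-evolution recursion:

```python
def snr_threshold(converges, lo_db, hi_db, tol_db=1e-3):
    """Bisection search for the smallest SNR (in dB) at which the
    density-evolution recursion converges; `converges(snr_db)` is
    assumed to be monotone (False below threshold, True above)."""
    assert not converges(lo_db) and converges(hi_db)
    while hi_db - lo_db > tol_db:
        mid = 0.5 * (lo_db + hi_db)
        if converges(mid):
            hi_db = mid
        else:
            lo_db = mid
    return hi_db

# Stand-in predicate: pretend the decoder converges iff Eb/N0 >= 2.42 dB
# (the rate-1/2, perfect-CSI PA threshold reported in Table 1).
thr = snr_threshold(lambda snr_db: snr_db >= 2.42, 0.0, 6.0)
```

In an actual implementation the predicate would run the discretized density evolution of the Appendix to a large iteration count and test the error probability in (13).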
Tracking the density of the messages requires the
computation of the initial pdf of the LLR messages from
the channel, and the transformation of the message pdfs
in each step of the decoding process. Although the Gaussian
approximation is reported to incur only very little inaccuracy
on AWGN channels [12, 13], the deviation is larger on fading
channels, since the pdf of the initial LLRs from a fading
channel differs visibly from a Gaussian distribution. Hence,
exact density evolution is used here to preserve accuracy.
3.2.1. Initial LLR pdf from the channel
Hou et al. showed in [14] that the pdf of the LLRs from
independent Rayleigh channels with perfect CSI is given
by (assuming BPSK signaling and that the all-zero sequence is
transmitted)
\[
f^{\mathrm{CSI}}_{L_{ch},y}(\zeta)
= \int_0^\infty \mathcal{N}\Bigl(\frac{4\alpha^2}{N_0},\, \frac{8\alpha^2}{N_0}\Bigr)\, p(\alpha)\, d\alpha
= \sqrt{\frac{N_0}{4\pi}}\,\exp\Bigl( -\frac{\zeta\bigl(\sqrt{N_0+1}-1\bigr)}{2} \Bigr)
\times \int_0^\infty \exp\Bigl( -\frac{\bigl( \zeta N_0/(4\alpha) - \alpha\sqrt{N_0+1} \bigr)^2}{N_0} \Bigr)\, d\alpha.
\tag{14}
\]
Using integrals from [15], we further simplify (14) to
\[
f^{\mathrm{CSI}}_{L_{ch},y}(\zeta) = \frac{N_0}{4\sqrt{1+N_0}}\,
\exp\Bigl( \frac{\zeta - |\zeta|\sqrt{1+N_0}}{2} \Bigr).
\tag{15}
\]
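As a sanity check of the closed form in (15), one can verify numerically that it integrates to one and satisfies the consistency (symmetry) condition f(ζ) = f(−ζ)e^ζ discussed later in Section 4.2.1. A short sketch:

```python
import math

def f_csi(z, N0):
    """LLR pdf of (15): (N0 / (4*sqrt(1+N0))) * exp((z - |z|*sqrt(1+N0)) / 2)."""
    s = math.sqrt(1.0 + N0)
    return N0 / (4.0 * s) * math.exp((z - abs(z) * s) / 2.0)

def integrate(f, a, b, n=200000):
    # Simple trapezoidal rule; adequate for this smooth, fast-decaying pdf.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

N0 = 1.0
mass = integrate(lambda z: f_csi(z, N0), -80.0, 80.0)   # should be close to 1
# Consistency condition: f(z) = f(-z) * exp(z)
consistent = all(
    abs(f_csi(z, N0) - f_csi(-z, N0) * math.exp(z)) < 1e-12
    for z in (0.5, 1.7, 3.0)
)
```

Both checks pass for any N0 > 0, which is what makes (15) usable as an initial density in the exact density evolution described next.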
For the case when CSI is not available at the receiver,
we assume that the Rayleigh-faded and AWGN-corrupted
signals follow a Gaussian distribution in the most probable
region. The pdf of the initial messages is then derived as
\[
f^{\mathrm{NCSI}}_{L_{ch},y}(\zeta) = \frac{\Delta^2}{N_0\,\kappa}\sqrt{\frac{N_0}{\pi}}
\Bigl( \kappa + \sqrt{2}\,\Delta\zeta\, Q\Bigl( -\frac{\Delta\zeta}{\sqrt{\pi}} \Bigr) \Bigr),
\tag{16}
\]
where \(\Delta = N_0/\bigl(2(N_0+1)\bigr)\), \(\kappa = \exp(-\Delta^2\zeta^2/2\pi)\), and
\(Q(x) = (1/\sqrt{2\pi})\int_x^\infty e^{-z^2/2}\,dz\).
3.2.2. Evolution of LLR pdf in the decoder
Tracking the evolution of the pdfs along the iterative
process can be done either by Monte Carlo simulation or,
more accurately and more efficiently, analytically
through discretized density evolution. The latter is possible
because of the simplicity of the code structure and of the decoding
algorithm of PA codes. To keep the discussion self-contained,
we summarize the major steps of the discretized density
evolution of PA codes in the Appendix; for details, please
refer to [4].
Using (15) for the perfect-CSI case or (16) for the no-CSI
case (i.e., substituting them into (A.4) and (A.5) in the
Appendix), the thresholds of PA codes on Rayleigh channels
can be computed through (A.3) to (A.12) in the Appendix.
The computed thresholds are a good indication of the
performance limit as the code length and the number of
iterations increase without bound.
Figure 3 plots the thresholds as well as the simulation
results of PA codes on independent Rayleigh channels with
and without CSI. We see that the analytical results are
consistent with the simulation results for fairly large block
sizes. Here, simulations are evaluated after the 50th iteration.
As the block size and the number of iterations continue to
increase, we expect the actual performance to converge to the
thresholds.
Table 1 compares the thresholds of PA codes with those
of LDPC codes for several code rates. The ergodic capacity
of the independent Rayleigh fading channel is also listed as
reference. We see that the thresholds of PA codes are about
0.6 dB from the channel capacity, and simulations of fairly
Table 1: Thresholds (E_b/N_0 in dB) of PA codes on Rayleigh channels ((3, ρ) LDPC data by courtesy of Hou et al. [14]).

              Flat Rayleigh, CSI              Flat Rayleigh, no CSI
Rate    Capacity    PA      LDPC        Capacity    PA      LDPC
0.5       1.8      2.42     3.06          2.6      3.33     4.06
0.6       3.0      3.56      —            3.8      4.48      —
2/3       3.7      4.34     4.72          4.4      5.15     5.74
Figure 3: Thresholds computed using density evolution, and simulations (data block size K = 64 K). BER versus E_b/N_0 (dB); curves: R = 1/2 with CSI, R = 1/2 without CSI, R = 2/3 with CSI.
large block sizes are about 0.3 to 0.4 dB from the thresholds.
Compared to the thresholds of LDPC codes reported in
[14], rate-1/2 PA codes are about 0.6 to 0.7 dB better
(asymptotically) than (3, 6)-regular LDPC codes, but are
about 0.5 dB worse (asymptotically) than irregular LDPC
codes. It should be noted that these irregular LDPC codes are
specifically optimized for Rayleigh fading channels and have a
maximum variable node degree of 50. It is fair to say that
PA codes perform on par with LDPC codes (using coherent
detection).
3.3. Simulation with coherent detection
To benchmark the performance of coherently detected PA
codes, several PA conﬁgurations are simulated on correlated
and independent Rayleigh fading channels. In each global
iteration (i.e., iteration between the inner decoder and the
outer decoder), two local iterations of the outer decoding are
performed. This scheduling is found to strike the best trade
oﬀ between complexity and performance (with coherent
detection).
3.3.1. Coherent BPSK on independent Rayleigh channels
Figure 4 shows the performance of rate-1/2 PA codes on
independent Rayleigh fading channels with and without
channel state information, respectively. Bit error rates after
20, 30, and 50 (global) iterations are plotted, and data
block sizes from short to large (512, 1 K, 4 K, and 64 K)
are evaluated to demonstrate the interleaving gain. For
comparison purposes, the corresponding channel capacities
are also shown. The simulated performance degradation due
to the lack of CSI is about 0.9 dB, which is consistent with the
gap between the respective channel capacities.
Compared to the (3, 6)-regular LDPC codes reported in
[14], the performance of this rate-1/2, codeword length
N = 128 × 1024 ≈ 1.3 × 10^5 PA code is about 0.4 and 0.25 dB
better than regular LDPC codes of length N = 10^5 and 10^6 on
independent Rayleigh channels. It is possible that optimized
irregular LDPC codes will outperform PA codes (as indicated
by their thresholds), but among regular codes, PA codes seem
to be one of the best.
3.3.2. Coherent BPSK on correlated Rayleigh channels
Figure 5 shows the performance of PA codes on correlated
fading channels. Perfect CSI is assumed available to the
receiver, and an interleaver exists between the PA code and
the channel (to partially break up the correlation between the
neighboring bits). Short PA codes with rates 1/2 and 3/4 are
simulated on two common fading scenarios with normalized
Doppler spreads f_dT_s = 0.01 and 0.001, respectively.
As expected, the performance deteriorates rapidly as f_dT_s
decreases, since a slower Doppler rate brings a smaller diversity
order. Due to the interleaver between the PA code and the
channel, the impact of a slow Doppler rate is less severe for
larger block sizes than for smaller ones. Whereas the K = 1 K PA
code loses about 7 dB at BER = 10^-4 as f_dT_s changes from
0.01 to 0.001, the loss with the K = 4 K PA code is less than 5 dB.
To illuminate how well short PA codes perform on
correlated channels, we compare them with turbo codes
(which are among the best-known codes at short code lengths)
in Figure 5. The competing turbo code has 16-state component
convolutional codes whose generator polynomial is
(1, 35/23)_oct and which are decoded using the log-domain BCJR
algorithm. The code rate is 0.75, the data block size is 4 K, and
S-random interleavers are used in both codes to lower the
possible error floors. Curves plotted are for PA codes at
the 10th iteration and turbo codes at the 6th iteration. We
observe that turbo codes perform about 0.6 and 0.7 dB better
than PA codes for f_dT_s = 0.001 and 0.01, respectively.
However, it should be noted that this performance gain
comes at the price of considerably higher complexity. While
the message-passing decoding of a rate-0.75 PA code at the
10th iteration requires about 267 operations per data bit [4],
the log-domain BCJR decoding of a rate-0.75 turbo code at
the 6th iteration requires as many as 9720 operations per data
Figure 4: Performance of PA codes on independent Rayleigh fading channels. Code rate 0.5, data block size 512, 1 K, 4 K, 64 K. (a) With
CSI; (b) without CSI.
2 4 6 8 10 12 14
E
b
/N
0
(dB)
10
−6
10
−5
10
−4
10
−3
10
−2
10
−1
10
0
B
E
R
R = 1/2, f
d
T
s
= 0.01
R = 1/2, f
d
T
s
= 0.001
R = 3/4, f
d
T
s
= 0.01
Turbo, R = 3/4, f
d
T
s
= 0.01
R = 3/4, f
d
T
s
= 0.001
Turbo, R = 3/4, f
d
T
s
= 0.001
Correlated fading, K = 4 K, f
d
T
s
= 0.01, 0.001
Figure 5: Performance of PA codes on correlated Rayleigh fading
channels with CSI. Data block length 4 K, normalized Doppler rate
f
d
T
s
= 0.01, 0.001, rate of PA codes 0.5 and 0.75, rate of turbo codes
0.75, component codes of the turbo code (1, 35/23)
oct
, 10 iterations
for PA codes, and 6 iterations for turbo codes.
bit, a complexity some 35 times larger. Hence, PA codes are still
attractive for providing good performance at low cost.
4. NONCOHERENT DETECTION OF PA CODES
This section considers noncoherent detection. The channel
model of interest is a Rayleigh fading channel with correlated
fading coeﬃcients.
4.1. Iterative differential detection and decoding
PA codes are inherently differentially encoded, which makes
them convenient for noncoherent differential detection. Although
multiple-symbol differential detection is possible, for
complexity reasons we consider a simple iterative differential
detection and decoding (IDDD) receiver, whose structure is shown
in Figure 6. The IDDD receiver consists of a conventional
differential detector with a 2-symbol observation window (the
current and the previous symbol), a phase-tracking filter, and the
original PA decoder (the one used in coherent detection [4]).
A trellis structure is employed to assist the detection and
decoding of the inner differential code 1/(1 + D), but,
unlike the case of multiple-symbol detection, the trellis is
not expanded and has only 2 states. Soft information is
passed back and forth among the different parts of the receiver
conforming to the turbo principle. Let x denote the input
to the inner differential encoder, that is, the output from the
outer code, and let y denote the output from the differential
encoder, that is, the symbol to be put on the channel (see
Figure 6). The differential encoder implements y_k = x_k y_{k-1}
for x_k, y_k ∈ {±1} (BPSK signal mapping 0 → +1, 1 → -1).
The channel reception is given by r_k = α_k e^{jθ_k} y_k + n_k,
where the channel amplitudes (α_k's) and phases (θ_k's) are
correlated, and the complex white Gaussian noise samples
(n_k's) are independent.
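The differential encoding rule y_k = x_k y_{k-1} and the conventional detector statistic u_k = Real(r_k r*_{k-1}) used in the first IDDD iteration (Section 4.1.1) can be sketched as follows. On a noiseless channel with a constant but unknown phase, the sign of u_k recovers x_k exactly, which is the point of differential detection:

```python
import cmath

def diff_encode(x, y0=1):
    """y_k = x_k * y_{k-1}, with x_k in {+1, -1} (BPSK mapping 0 -> +1, 1 -> -1)."""
    y, prev = [], y0
    for xk in x:
        prev = xk * prev
        y.append(prev)
    return y

def diff_detect(r):
    """u_k = Re(r_k * conj(r_{k-1})); the hard decision on x_k is sign(u_k)."""
    return [(r[k] * r[k - 1].conjugate()).real for k in range(1, len(r))]

x = [1, -1, -1, 1, -1, 1, 1, -1]
y = diff_encode(x)
# Noiseless channel with an unknown, (nearly) constant phase rotation:
phase = cmath.exp(1j * 0.9)
r = [sym * phase for sym in [1] + y]       # prepend the reference symbol y_0 = 1
u = diff_detect(r)
xhat = [1 if uk >= 0 else -1 for uk in u]
```

Because u_k = Re(y_k y_{k-1} e^{jθ} e^{-jθ}) = x_k when the phase is common to both symbols, xhat equals x here; fading and noise are what make the soft LLR computation of Section 4.1.2 necessary.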
In theory, differential decoding does not require pilot
symbols. In practice, however, pilot symbols are inserted
periodically, even with multiple-symbol detection, to avoid
catastrophic error propagation in differential decoding. This
is particularly so in the fast fading case, where the phases (θ_k)
change rapidly (as will be shown later). Hence, some of the
r_k's (and y_k's) in the received sequence are pilot symbols.
We use L to denote the LLR information, superscript (q)
to denote the qth (global) iteration, and subscripts i, o, ch,
and e to denote the quantities associated with the inner
Figure 6: Structure of the iterative differential detection and decoding receiver. The channel output r (= αe^{jθ}y + n) feeds a conventional differential detector and a 1/(1 + D) inner decoder (detector); a channel estimator (filter) and the outer decoder exchange soft information with them through the interleaver π and deinterleaver π^{-1}.
code, the outer code, the fading channel, and “the extrinsic”,
respectively.
4.1.1. IDDD receiver
Here is a sketch of how the proposed IDDD receiver operates.
In the first iteration, the switch in Figure 6 is flipped up.
The samples of the received symbols, r_k, are fed into the
conventional differential detector, which computes u_k =
Real(r_k r*_{k-1}) and subsequently the soft LLR L_ch(x_k) from u_k.
Here * denotes the complex conjugate. L_ch(x_k) is then treated as
L^{(1)}_{e,i}(x_k) and fed into the outer decoder, which, in return,
generates L^{(1)}_{e,o}(x_k) and passes it to the inner decoder for use
in the next detection/decoding iteration. Starting from the
second iteration, the switch in Figure 6 is flipped down, and
channel estimation for α_k and θ_k is performed before the
"coherent" detection and decoding of the inner and outer
codes. After Q iterations, a decision is made by combining
the extrinsic information from both the inner and the outer
decoders: x̂_k = sign(L^{(Q)}_{e,i}(x_k) + L^{(Q)}_{e,o}(x_k)). In the above
discussion, we have ignored the existence of the random
interleaver, but it is understood that proper interleaving and
deinterleaving are performed wherever needed.
4.1.2. Conventional differential detector for the ﬁrst
decoding iteration
With the assumption that the carrier phases are nearly
constant between two neighboring symbols, the conventional
differential detector (in the first iteration) computes u_k ≜
Real(r_k r*_{k-1}). A hard decision on x_k is obtained by simply
checking the sign of u_k. Computing the soft information L_ch(x_k)
from u_k requires knowledge of the pdf of u_k. The
conditional pdf of u_k given α_k and x_k is [16]
\[
f_{U\mid\alpha,X}(u \mid \alpha, x) =
\begin{cases}
\dfrac{1}{2N_0}\exp\Bigl(\dfrac{xu-\alpha^2/2}{N_0}\Bigr), & -\infty < xu \le 0,\\[2ex]
\dfrac{1}{2N_0}\exp\Bigl(\dfrac{xu-\alpha^2/2}{N_0}\Bigr)\,
Q\Bigl(\sqrt{\dfrac{\alpha^2}{N_0}},\, \sqrt{\dfrac{4xu}{N_0}}\Bigr), & 0 < xu < \infty,
\end{cases}
\tag{17}
\]
where Q(a, b) is the Marcum Q-function. It is then possible
to get the true pdf of u_k using
\[
f_{U\mid X}(u \mid x) = \int_0^\infty f_{U\mid\alpha,X}(u \mid \alpha, x)\, f_\alpha(\alpha)\, d\alpha
= 2\int_0^\infty f_{U\mid\alpha,X}(u \mid \alpha, x)\, \alpha e^{-\alpha^2}\, d\alpha.
\tag{18}
\]
Since the computation of the Marcum Q-function is slow
and does not always converge for large arguments, an exact
evaluation of (18), and hence the computation of L_ch(x_k),
can be difficult. We propose a simple approximation which
evaluates (17) with α replaced by its mean E[α]. This leads
to
\[
f_{U\mid X}(u \mid x) \approx
\begin{cases}
\dfrac{1}{2N_0}\exp\Bigl(\dfrac{xu-\pi/8}{N_0}\Bigr), & -\infty < xu \le 0,\\[2ex]
\dfrac{1}{2N_0}\exp\Bigl(\dfrac{xu-\pi/8}{N_0}\Bigr)\,
Q\Bigl(\sqrt{\dfrac{\pi}{4N_0}},\, \sqrt{\dfrac{4xu}{N_0}}\Bigr), & 0 < xu < \infty.
\end{cases}
\tag{19}
\]
The corresponding LLR from the channel can then be
computed by
\[
L_{ch}(x_k) = \log\frac{\Pr(u_k \mid x_k = +1)}{\Pr(u_k \mid x_k = -1)}
\approx \operatorname{sign}(u_k)\Bigl( \frac{2|u_k|}{N_0}
+ \log Q\Bigl( \sqrt{\frac{\pi}{4N_0}},\, \sqrt{\frac{4|u_k|}{N_0}} \Bigr) \Bigr).
\tag{20}
\]
An even more convenient compromise is to assume that u_k
is Gaussian distributed, as is done in [17] and a few other
papers. Under this Gaussian assumption, we get
\[
f_{U\mid X}(u \mid x) \approx \mathcal{N}\bigl(x,\; 2N_0+N_0^2\bigr),
\tag{21}
\]
\[
L_{ch}(x_k) \approx \frac{2u_k}{2N_0+N_0^2}.
\tag{22}
\]
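A minimal sketch of (22), where u_k is the differential-detector output and N_0 the noise level; the scaling is the only computation involved, which is what makes the Gaussian approximation attractive:

```python
def llr_gaussian_approx(u_k, N0):
    """L_ch(x_k) ~= 2*u_k / (2*N0 + N0^2), per the Gaussian approximation (22),
    where u_k is the differential-detector output Re(r_k * conj(r_{k-1}))."""
    return 2.0 * u_k / (2.0 * N0 + N0 * N0)

L = llr_gaussian_approx(1.0, 0.5)    # with N0 = 0.5 the scaling factor is 2/1.25
```

Note that the mapping is linear and odd in u_k, so the hard decision sign(L_ch) always coincides with sign(u_k); only the confidence scaling depends on N_0.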
Alternatively, instead of using conventional differential
decoding in the first iteration, channel estimation
followed by the decoding of the inner 1/(1 + D) code can
Figure 7: Distribution of u_k = Re{r_k r*_{k-1}} in conventional differential detection at E_s/N_0 = 6 dB (assuming "+1" transmitted): the true pdf f(u) (Monte Carlo), f(u | α) with E[α] substituted for α, and the Gaussian approximation N(1, 2N_0 + N_0²).
be used, which makes the first iteration exactly the same
as the subsequent iterations. This third option then leads to
pilot-symbol-assisted modulation (PSAM), which has a slightly
higher complexity than using differential detection in the
first iteration.
To see how accurate the above treatments are, we plot in
Figure 7 several curves approximating the pdf of u_k. From
the sharpest and most asymmetric to the least sharp and
most symmetric, these curves are the exact pdf f_{U|X}(u |
x = +1) from Monte Carlo simulations (a histogram, which can
be regarded as a numerical evaluation of (18)), the
"mean-α approximated" pdf from (19), and the Gaussian-approximated
pdf from (21). From the figure, the Gaussian
approximation does not reflect the true pdf well, but this
inaccuracy turns out not to severely affect the overall IDDD
performance. As shown later in Figure 12, all three treatments
(Gaussian approximation, mean-α approximation,
and PSAM) result in very similar decoding performance. We
attribute this to the fact that the inaccuracy affects mostly the
first iteration, and subsequent iterations help mitigate
the loss. Thus, the Gaussian approximation still presents itself
as a simple and viable approach for noncoherent differential
decoding.
4.1.3. Channel estimator
The channel estimator in the IDDD receiver (Figure 6) may
be implemented in several ways. Here we use a linear filter of
(2L + 1) taps to estimate the α_k's and θ_k's in the qth iteration:
\[
\hat{\alpha}^{(q)}_k e^{\,j\hat{\theta}^{(q)}_k} = \sum_{l=-L}^{L} p_l\, \hat{y}^{(q-1)}_{k-l}\, r_{k-l},
\tag{23}
\]
where p_l denotes the coefficient of the lth filter tap, and
ŷ^{(q-1)}_k denotes the estimate of y_k from the feedback of the previous
iteration. For soft feedback, ŷ^{(q-1)}_k is computed as ŷ^{(q-1)}_k =
tanh(L^{(q-1)}_{e,i}(y_k)/2), and for hard feedback, ŷ^{(q-1)}_k = sign
(L^{(q-1)}_{e,i}(y_k)). The LLR message L^{(q-1)}_{e,i}(y_k) is generated together
with L^{(q-1)}_{e,i}(x_k) by the inner decoder in the (q - 1)th
decoding iteration (please refer to [4] for the step-by-step
message-passing decoding algorithm of the 1/(1 + D) code). In
the first iteration, the L^{(0)}_{e,i}(y_k)'s are initialized to zero for coded
bits and to a large positive number (i.e., +∞) for pilot symbols.
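The two feedback rules can be sketched directly. Note that the soft rule shrinks unreliable symbols toward zero, which limits error propagation in the channel estimate, while pilots (initialized with L = +∞) are fed back as exactly +1 under either rule:

```python
import math

def soft_feedback(llr):
    """Soft feedback y_hat = tanh(L/2): the magnitude reflects reliability
    and approaches +/-1 as |L| grows."""
    return math.tanh(llr / 2.0)

def hard_feedback(llr):
    """Hard feedback y_hat = sign(L)."""
    return 1.0 if llr >= 0 else -1.0

vals = [soft_feedback(L) for L in (0.0, 2.0, 20.0)]   # increasingly confident LLRs
```

An erased bit (L = 0) therefore contributes nothing to the soft-feedback channel estimate in (23), whereas hard feedback would commit it fully to one sign.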
Regarding the choice of the filter, we take a Wiener filter,
since it is known to be optimal for estimating the channel gain in
the minimum mean-square-error (MMSE) sense when the
correlation of the fading process, the R_k's, is known [18]. The
filter coefficients, p_{-L}, p_{-L+1}, ..., p_L, are obtained from the
Wiener-Hopf equation
\[
\begin{pmatrix}
R_0 - N_0 & R_1 & \cdots & R_{L-1}\\
R_1 & R_0 - N_0 & \cdots & R_{L-2}\\
\vdots & \vdots & \ddots & \vdots\\
R_{L-1} & R_{L-2} & \cdots & R_0 - N_0
\end{pmatrix}
\cdot
\begin{pmatrix}
p_{-L}\\ p_{-(L-1)}\\ \vdots\\ p_{L}
\end{pmatrix}
=
\begin{pmatrix}
R_{-L}\\ R_{-(L-1)}\\ \vdots\\ R_{L}
\end{pmatrix},
\tag{24}
\]
where R_k = (1/2)J_0(2kπ f_dT_s). Since the computation of the
p_l's from (24) involves a matrix inversion (a one-time job),
it may not be computable when the matrix
becomes (near) singular, which occurs when the channel
fading is very slow. In such cases, a low-pass filter or a simple
"moving average" can be used [6].
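A sketch of the tap computation, solving (24) in pure Python; the diagonal term is taken as printed in (24) (R_0 - N_0), and the series-based J0 is adequate for the small arguments 2πk f_dT_s that arise at these Doppler rates:

```python
import math

def j0(x, terms=40):
    """Bessel J0 via its power series sum_m (-1)^m (x^2/4)^m / (m!)^2;
    accurate for the small arguments used here."""
    s, t = 0.0, 1.0
    for m in range(terms):
        s += t
        t *= -(x * x / 4.0) / ((m + 1) ** 2)
    return s

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def wiener_taps(L, fdTs, N0):
    """(2L+1) filter taps from the Wiener-Hopf system (24), with
    R_k = 0.5 * J0(2*pi*k*fd*Ts); the N0 term on the diagonal follows
    (24) as printed in the text."""
    R = lambda k: 0.5 * j0(2.0 * math.pi * abs(k) * fdTs)
    n = 2 * L + 1
    A = [[R(i - j) - (N0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [R(i - L) for i in range(n)]
    return solve(A, b)

p = wiener_taps(2, 0.01, 0.1)   # a 5-tap example at fdTs = 0.01
```

The near-singularity mentioned above is visible here: as f_dT_s → 0, every R_k approaches R_0 and the Toeplitz matrix loses rank, at which point the simpler filters are preferable.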
4.2. Analysis of pilot insertion through EXIT charts
4.2.1. EXIT charts
We perform EXIT analysis [9] to generate further insights
into PA codes and the proposed noncoherent IDDD receiver.
In EXIT charts, the exchange of extrinsic information is
visualized as a decoding/detection trajectory, allowing the
prediction of the decoding convergence and thresholds [9].
Several quantities, such as the bit error rate, the mean of
the extrinsic LLR information, and the equivalent SNR
value, have previously been used to depict the characteristics
and relations of the component decoders, but mutual
information has been shown to be the most robust of all [9].
The mutual information between a binary bit y_k and its
corresponding LLR value is defined as
\[
I\bigl(Y, L(Y)\bigr) \triangleq \frac{1}{2}\sum_{y=\pm 1} \int_{-\infty}^{\infty}
f_{L(Y)}(\eta \mid Y = y)\,
\log_2 \frac{2\, f_{L(Y)}(\eta \mid Y = y)}{f_{L(Y)}(\eta \mid Y = +1) + f_{L(Y)}(\eta \mid Y = -1)}\, d\eta
\]
\[
= \int_{-\infty}^{\infty} f_{L(Y)}(\eta \mid Y = +1)\,
\log_2 \frac{2\, f_{L(Y)}(\eta \mid Y = +1)}{f_{L(Y)}(\eta \mid Y = +1) + f_{L(Y)}(-\eta \mid Y = +1)}\, d\eta
\]
\[
= 1 - \int_{-\infty}^{\infty} f_{L(Y)}(\eta \mid Y = +1)\, \log_2\bigl(1 + e^{-\eta}\bigr)\, d\eta,
\tag{25}
\]
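The last line of (25) suggests a simple Monte Carlo estimator of the mutual information from LLR samples conditioned on Y = +1. The sketch below exercises it on "consistent" Gaussian LLRs (mean μ, variance 2μ), a standard stand-in distribution satisfying the symmetry condition discussed next:

```python
import math, random

def mutual_info_mc(llr_samples):
    """I(Y; L) = 1 - E[log2(1 + exp(-L))] over samples drawn given Y = +1,
    per the last line of (25), estimated by Monte Carlo."""
    return 1.0 - sum(math.log2(1.0 + math.exp(-l)) for l in llr_samples) / len(llr_samples)

random.seed(1)
def consistent_gaussian_llrs(mu, n=20000):
    # "Consistent" Gaussian LLRs: mean mu, variance 2*mu.
    sd = math.sqrt(2.0 * mu) if mu > 0 else 0.0
    return [random.gauss(mu, sd) for _ in range(n)]

I_low = mutual_info_mc(consistent_gaussian_llrs(0.0))    # uninformative LLRs: I = 0
I_mid = mutual_info_mc(consistent_gaussian_llrs(2.0))
I_high = mutual_info_mc(consistent_gaussian_llrs(40.0))  # near-perfect LLRs: I -> 1
```

In the EXIT-chart construction, estimates of exactly this kind (computed from simulated decoder outputs rather than closed-form pdfs) give the points on the I_{e,i}/I_{a,o} curves.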
Figure 8: Trellis diagram of binary diﬀerential PSK with pilot insertion. (a) Pilot symbols periodically terminate the trellis. (b) Pilot symbols
are separated from the trellis structure.
where L(Y) is either the a priori information L_a(Y) or the
extrinsic information L_e(Y), and f_{L(Y)}(η | Y = y) is the
conditional pdf. The second equality holds when the channel
is output-symmetric, such that f_{L(Y)}(η | Y = -y) =
f_{L(Y)}(-η | Y = y), and the third equality holds when
the received messages satisfy the consistency condition (also
known as the symmetry condition): f_{L(Y)}(η | Y = y) =
f_{L(Y)}(-η | Y = y) e^{yη} [11]. Note that the consistency
condition is an invariant of the message-passing process on
a number of channels, including the AWGN channel and
the independent Rayleigh fading channel with perfect CSI;
but it is not preserved on fading channels without CSI or
with estimated (thus imperfect) CSI, since the initial density
function evaluated in the latter cases is only an approximation
of the actual pdf of the LLR messages. Thus, (25) should
be used to compute the mutual information in those cases.
We use the X-axis to represent the mutual information into
the inner code (a priori) or from the outer code (extrinsic),
denoted I_{a,i}/I_{e,o}, and the Y-axis to represent the mutual
information from the inner code or into the outer code,
denoted I_{e,i}/I_{a,o}.
4.2.2. Pilot symbol insertion
A practicality issue about noncoherent detection is pilot
insertion. The number of pilot symbols inserted should be
suﬃcient to attain a reasonable track of the channel, but not
in excess. Many researchers have reported that excessive pilot
symbols not only cause wasteful bandwidth expansion, but
actually degrade the overall performance, since the energy
compensation for the rate loss due to excessive pilot more
than outweighs the gain that can be obtained by a ﬁner
channel tracking. This tradeoﬀ issue has long been noted
in literature, but little attention has been given to another
issue of no less importance, namely, how pilots should be
inserted when diﬀerential encoding or other trellisbased
coding/modulation frontend is used.
There exist at least two ways to insert pilot symbols in
a diﬀerential encoder. The widespread approach is to peri
odically terminate the trellis [6, 7], as shown in Figure 8(a),
such that pilot symbols are used to estimate the channel
and at the same time participate in the trellis decoding.
Seemingly plausible, this turns out to be a bad strategy,
since segmenting the trellis into small chunks signiﬁcantly
increases the number of short error events, and consequently
incurs a loss in performance.
The negative effect of trellis segmentation is best illustrated
by the EXIT chart in Figure 9.

Figure 9: The effect of pilot symbols segmenting the trellis on the performance of the differential decoder. Outer-code EXIT curves (I_{a,i}/I_{e,o} versus I_{e,i}/I_{a,o}) for R = 0.5 and R = 0.75 PA codes; inner-code curves at E_s/N_0 = 4.75 dB and 0.5 dB, each with 0, 4, 10, and 20% pilots. Normalized Doppler rate f_dT_s = 0.01, perfect CSI.

EXIT curves corresponding to the differential decoder with 0%, 4%, 10%, and
20% pilot insertion are plotted for two different SNR values.
To eliminate the impact of other factors, the four curves
in each SNR set are given the same energy per transmitted
symbol, and perfect knowledge of the fading phase and
amplitude is provided to all the decoders (irrespective of the
number of pilot symbols). Thus the difference between the
curves in each family is due only to the difference in pilot
spacing. At the left end of the curves (where the input mutual
information is small), a larger number of pilot symbols
corresponds to better performance (higher output mutual
information). This is because, when little information is
provided by the outer code, pilot symbols are the
primary contributor of a priori information. However, the
situation is completely reversed toward the right end of
the EXIT curves. There, more pilot symbols actually
degrade the performance: given sufficient
information from the outer code, pilot symbols no
longer constitute the key source of a priori information;
instead, they segment the trellis and shorten error
events, rendering an effect opposite to spectrum thinning
and thus deteriorating the performance. The performance
loss is more severe when more pilot symbols are inserted
and when the code operates at a relatively low SNR
level. It is worth noting, for example, that with 20% pilot
insertion (a pilot spacing of 5), even provided with perfect
mutual information from the outer code (I_{a,i} = 1, but
the channel remains noisy), the trellis decoder nevertheless
fails to produce sufficient output mutual information I_{e,i}. As
such, the inner EXIT curve is bound to intersect the outer
EXIT curve at a rather early stage of the iterative process,
causing the iterative decoder to fail at a high BER level
(not to mention that this EXIT curve consumes 20% more
energy than the no-pilot case).
The implication of this EXIT analysis is that the
widespread approach of inserting pilot symbols as part of
the trellis can cause a deficiency for differential encoding
(and other serially concatenated schemes with inner trellis
codes). Specifically, unless the outer code is itself a capacity-achieving
code at some SNR, the inner and outer EXIT
curves will intersect, resulting in convergence failure and causing
error floors. We observe that the more pilot symbols, the
higher the error floor; and the lower the code rate (lower
SNR), the more severe the impact. It is therefore particularly
important to keep the number of pilot symbols in such
schemes minimal, so that error floors do not occur too early.
This analysis also suggests an alternative, and potentially
better-performing, way of pilot insertion, namely, separating the
pilots from the trellis so that they do not affect error events; see
Figure 8(b).
It should be pointed out that the level of the impact
caused by trellis segmentation may be very different for
different outer codes. Many (outer) codes, including single
parity check codes, block turbo codes (i.e., turbo product
codes), and convolutional codes, will see a large impact, since
these (outer) codes require sufficient input information in
order to produce perfect output information; put another
way, these codes alone are not "good" codes (good in the
sense MacKay defined in [2]). However, "good" codes like
LDPC codes will likely see a much smaller impact. This is
because an ideal LDPC code has an EXIT curve shaped like a
box (e.g., see [3, Figure 3]), which can produce perfect output
information as long as the input information is above some
threshold (without requiring I_{a,i} = 1). Alternatively, one may
also interpret this as: ideal LDPC codes have large minimum
distances and are capable of correcting short error events,
including those caused by the segmentation effect.
To verify the analytical results, we simulate the performance
of a rate-1/2, data block size K = 32 K PA code with
different pilot insertion strategies; see Figure 10. The normalized
Doppler spread is f_dT_s = 0.01, and error rates are evaluated
after 10 decoding iterations. Solid lines represent the
cases where perfect channel knowledge is available to the
receiver, and dashed lines represent the case where noncoherent
detection is used. Comparing the solid curves, we see that a
drastic performance gap results from the different pilot insertion
strategies. In this specific case, by segmenting the trellis
every 10 symbols, trellis-segmented pilot insertion loses
more than 3 dB at a BER of 10^-4 compared with the separated
strategy. The dashed curve corresponds to the same PA code
noncoherently detected via the IDDD receiver discussed before, where 10%
pilot symbols are inserted using the strategy in Figure 8(b)
and where an 81-tap Wiener filter is used to estimate the
channel. It is interesting to note that if one overlooks
the impact of pilot insertion strategies, one might arrive
Figure 10: Performance of PA codes with different pilot insertion strategies. Normalized Doppler rate f_dT_s = 0.01, code rate 0.5, data block size 32 K, 0% or 10% pilot insertion, 10 iterations. Curves: 0% pilots (ideal CSI); 10% pilots separated from the trellis (ideal CSI); 10% pilots terminating the trellis (ideal CSI); 10% pilots separated (IDDD).
at the paradoxical result that noncoherent detection (dashed
line) performs noticeably better than coherent detection
(rightmost solid line)!
4.3. Impact of the pilot symbol spacing
and ﬁlter length
We now investigate how the number of pilot symbols and
the length of the estimation filter affect the performance
of noncoherent detection. Figure 11 illustrates the impact
of different pilot spacings on the BER performance over fast
fading channels with normalized Doppler spreads
f_dT_s = 0.05, 0.02, and 0.01. We observe the following. (1) The
IDDD receiver is rather robust across different Doppler rates.
(2) Small pilot spacing, such as fewer than 6 symbols, is undesirable:
the additional energy it consumes more than outweighs
any gain it may bring. (3) The code performance at
high Doppler rates is more sensitive to pilot spacing than that
at lower Doppler rates. At the normalized Doppler rate of
0.01 (already fast fading), noncoherently detected PA codes
tolerate pilot spacings from as small as 6 symbols to as large
as 45 to 50 symbols (bandwidth issues aside); but
at the very fast Doppler rate of 0.05, pilot spacing beyond 7 to 9
symbols soon causes drastic performance degradation.
For comparison, we also plot the case where pilot symbols
periodically terminate the trellis (dashed line), which, due
to trellis segmentation, shows inferior performance
when the pilot spacing is small. Compared to differentially
encoded turbo codes [6], PA codes appear to require fewer
pilot symbols (we note that in the study of differentially
encoded turbo codes in [6], the authors terminated the trellis
periodically with pilot symbols, which may have made the
Figure 11: Effect of the number of pilot symbols on the performance of noncoherently detected PA codes on correlated Rayleigh channels. BER versus pilot spacing at E_b/N_0 = 10 dB for f_dT_s = 0.05, 0.02, and 0.01, plus f_dT_s = 0.01 with trellis-segmenting pilots. Code rate 0.75, data block size 1 K, filter length 65, 10 (global) iterations, 4 (local) iterations within the outer code of PA codes.
Figure 12: Comparison of BER performance for several noncoherent receiver strategies on correlated Rayleigh channels with f_dT_s = 0.01. Code rate 0.75, data block size 1 K, 4% bandwidth expansion, filter length 65, 10 (global) iterations, each with 4 (local) iterations for the outer decoding. Curves: IDDD-1 (soft feedback, Gaussian approximation); IDDD-2 (soft feedback, mean-α approximation); IDDD-3 (soft feedback, PSAM); IDDD-4 (hard feedback, PSAM).
tolerant range of pilot spacing (at the small-spacing end) smaller than it would be otherwise).
The impact of the length of the channel tracking filter is also studied. We observe that while the filter length affects the overall performance, its impact is limited compared to that of pilot spacing. This is consistent with what has been reported in other studies [6] and is not a new discovery; hence, we omit the plot.
4.4. Simulation results of noncoherent detection
The performance of noncoherently detected PA codes on fast Rayleigh fading channels is presented below. Unless otherwise indicated, the BER curves shown are after 10 global iterations, and in each global iteration, 4 to 6 local iterations of the outer code are performed. We have chosen these parameters on the basis of a set of simulations, trading off performance against complexity.
4.4.1. Noncoherent detection of PA codes with different
receiver strategies
We compare the BER performance of four types of IDDD strategies for a K = 1K, R = 3/4 PA code on an f_d T_s = 0.01 Rayleigh fading channel in Figure 12. "IDDD-1" uses conventional differential detection with the Gaussian approximation (22) to compute L_ch(x_k) in the first iteration, and soft feedback of y_k in all iterations to assist channel estimation; "IDDD-2" uses conventional differential detection with the "mean-α" approximation (20) in the first iteration and soft feedback in all iterations; "IDDD-3" is PSAM with soft feedback; and "IDDD-4" is PSAM with hard feedback. In all cases, 4% pilot symbols are inserted and the curves shown are after 10 iterations. Different decoding strategies in the first iteration do not affect the performance much, and the performance is not very sensitive to hard or soft feedback either. Although not shown, simulations of a long PA code (K = 48K) of the same (high) rate (R = 3/4) reveal a similar phenomenon. It is possible, however, that other codes may be more sensitive to the difference in decoding strategies, especially the difference in the feedback information [6].
4.4.2. Comparison of noncoherent detection with
coherent detection
Figure 13 shows the performance of rate-3/4 PA codes after 10 iterations on fast Rayleigh fading channels with Doppler rate f_d T_s = 0.01. A short block size of 1K and a large block size of 48K are evaluated. In each case, a family of 5 BER versus E_b/N_0 curves, accounting for the rate loss due to pilot insertion, is plotted. The three leftmost curves are the ideal coherent case with knowledge of fading amplitudes and phases provided to the receiver, and the two rightmost curves are the noncoherent case where IDDD is used to track amplitudes and phases. In both the coherent and the noncoherent case, trellis segmentation incurs a small performance loss, but since the pilot spacing is not very small (every 25 symbols), the effect is not as drastic as in Figure 10. The noncoherent cases are about 1 dB and 0.55 dB away from the ideal coherent case at a BER of 10^-4 for block sizes of 48K and 1K, respectively. This satisfying performance is achieved with only 4% pilot insertion and a very low-complexity IDDD receiver.
Jing Li (Tiﬀany) 13
Figure 13: Comparison of BER performance for several transmission/reception strategies for PA codes of large and small block sizes on correlated Rayleigh channels with f_d T_s = 0.01. Code rate 0.75, data block sizes 48K and 1K, 4% bandwidth expansion, filter length 65, 10 (global) iterations each with 4 (local) iterations for the outer decoding. Curves: 0% pilots, ideal; 4%, ideal, pilots separated from the trellis; 4%, ideal, pilots terminating the trellis; 4%, IDDD, pilots separated from the trellis; 4%, IDDD, pilots terminating the trellis.
5. CONCLUSION
Previous work has established product accumulate codes as a class of provably "good" codes on AWGN channels, with low linear-time complexity and performance close to the Shannon limit. This paper performs a comprehensive study of product accumulate codes on Rayleigh fading channels with both coherent and noncoherent detection. Useful analytical tools, including Divsalar's simple bounds, density evolution, and EXIT charts, are employed, and extensive simulations are conducted. It is shown that PA codes not only perform remarkably well with coherent detection, but the embedded differential encoder also makes them naturally suitable for noncoherent detection. A simple iterative differential detection and decoding (IDDD) strategy allows PA codes to perform only 1 dB away from the coherent case. Another useful finding reveals that the widespread practice of inserting pilot symbols to terminate the trellis actually incurs a performance loss compared to inserting pilot symbols separately from the trellis.
We conclude by proposing product accumulate codes as a promising low-cost candidate for wireless applications. The advantages of PA codes include: (i) they perform very well with coherent and noncoherent detection (especially at high rates); (ii) their performance is comparable to turbo and LDPC codes, yet PA codes require much less decoding complexity than turbo codes and much less encoding complexity and memory than random LDPC codes; and (iii) the regular structure of PA codes makes low-cost hardware implementation possible.
APPENDIX
DISCRETIZED DENSITY EVOLUTION FOR PA CODES
Using message-passing decoding, the relevant operations on the messages (in LLR form) include the sum in the real domain and the tanh operation (also known as the check or ⊞ operation). For independent messages to add together, the resulting pdf of the sum is the discrete convolution (denoted by ∗) of the component pdf's, which can be efficiently implemented using a fast Fourier transform (FFT). For the tanh operation on messages, define

γ = α ⊞ β ≜ Q(2 tanh^{-1}(tanh(α/2) tanh(β/2))),

where α, β, and γ are quantized messages, and Q denotes the quantization operation. The pdf of γ, f_γ, can be computed using

f_γ[k] = Σ_{(i,j): kΔ = iΔ ⊞ jΔ} f_α[i] · f_β[j],   (A.1)

where Δ is the quantization interval. To simplify the notation, we denote the operation (A.1) as f_γ = R(f_α, f_β), and, using induction on the above equation, we further denote

R^k(f_α) ≜ R(f_α, R(f_α, . . . , R(f_α, f_α) · · · )),   (A.2)

where R is applied k − 1 times.
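As a minimal illustration of (A.1) and (A.2) (our own sketch, not the paper's implementation), the quantized check operation R and its k-fold extension can be coded directly on a uniform LLR grid iΔ, i = −N, …, N; the grid size and step below are assumptions:

```python
import numpy as np

def boxplus(a, b):
    """Check (boxplus) operation on LLRs: 2*atanh(tanh(a/2)*tanh(b/2))."""
    t = np.tanh(a / 2.0) * np.tanh(b / 2.0)
    return 2.0 * np.arctanh(np.clip(t, -1.0 + 1e-12, 1.0 - 1e-12))

def R(f_a, f_b, delta):
    """Pdf of gamma = Q(alpha boxplus beta), as in (A.1).

    f_a, f_b are pmfs over the quantized LLR grid i*delta, i = -N..N
    (arrays of length 2N+1); the result is a pmf on the same grid,
    with out-of-range values clipped to the end bins.
    """
    n = (len(f_a) - 1) // 2
    grid = delta * np.arange(-n, n + 1)
    f_g = np.zeros_like(f_a)
    for i, alpha in enumerate(grid):
        for j, beta in enumerate(grid):
            k = int(np.clip(np.round(boxplus(alpha, beta) / delta), -n, n))
            f_g[k + n] += f_a[i] * f_b[j]
    return f_g

def R_k(f_a, k, delta):
    """R^k(f_a) of (A.2): R applied k-1 times, R(f_a, R(f_a, ...))."""
    out = f_a
    for _ in range(k - 1):
        out = R(f_a, out, delta)
    return out
```

Each call enumerates all grid pairs, so a practical implementation would exploit the symmetry of ⊞ or use table lookups; the FFT mentioned above applies to the sum (convolution) step, not to R.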
The following notations are also used:
(i) f_{Lch,y}: pdf of the messages of the received signals y obtained from the channel (see Figure 1(b));
(ii) f^{(k)}_{Lo,x}: pdf of the (a priori) messages of the input x to the inner 1/(1 + D) code in the kth iteration (obtained from the outer code in the (k − 1)th iteration) (see Figure 1(b));
(iii) f^{(k)}_{Le,x}: pdf of the (extrinsic) messages passed from the inner code to the outer code in the kth iteration;
(iv) f^{(k)}_{Le1,(·)} and f^{(k)}_{Le2,(·)}: pdf's of the extrinsic information computed from the upper and lower branches of the outer code in the kth iteration, respectively. Subscripts d and p denote data and parity bits, respectively. Obviously, f^{(0)}_{Le2,d} = f^{(0)}_{Le2,p} = δ(0), the Kronecker delta function.
The discretized density evolution of a rate t/(t + 2) PA code can then be summarized as follows [4]:

initialization:
f^{(0)}_{Lo,x} = f^{(0)}_{Le,y} = f^{(0)}_{Le1,d} = f^{(0)}_{Le2,d} = δ(0),   (A.3)

inner code:
f^{(k)}_{Le,y} = R(f^{(k−1)}_{Lo,x}, f_{Lch,y} ∗ f^{(k−1)}_{Le,y}),   (A.4)
f^{(k)}_{Le,x} = R^2(f_{Lch,y} ∗ f^{(k)}_{Le,y}),   (A.5)

inner-to-outer:
f^{(k)}_{Lo,d} = f^{(k)}_{Le,x},   (A.6)
f^{(k)}_{Lo,p} = f^{(k)}_{Le,x},   (A.7)

outer code:
f^{(k)}_{Le1,d} = R(f^{(k)}_{Lo,p}, R^{(t−1)}(f^{(k)}_{Lo,d} ∗ f^{(k−1)}_{Le2,d})),   (A.8)
f^{(k)}_{Le1,p} = R^t(f^{(k)}_{Lo,d} ∗ f^{(k−1)}_{Le2,d}),   (A.9)
f^{(k)}_{Le2,d} = R(f^{(k)}_{Lo,p}, R^{(t−1)}(f^{(k)}_{Lo,d} ∗ f^{(k)}_{Le1,d})),   (A.10)
f^{(k)}_{Le2,p} = R^t(f^{(k)}_{Lo,d} ∗ f^{(k)}_{Le1,d}),   (A.11)
outer-to-inner:
f^{(k+1)}_{Lo,x} = (t/(t + 2)) · (f^{(k)}_{Le1,d} + f^{(k)}_{Le2,d})/2 + (2/(t + 2)) · (f^{(k)}_{Le1,p} + f^{(k)}_{Le2,p})/2.   (A.12)
Although the outer code of PA codes can be viewed as an LDPC code, it is desirable to adopt a serial update procedure as described above rather than a parallel one as in a conventional LDPC code, since this allows the checks corresponding to the two SPC branches to take turns updating, which leads to faster convergence [4].
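The ∗ (pdf convolution) steps in (A.4)–(A.12) are where the FFT mentioned in the appendix applies. A minimal sketch of that step (our own; the rule of folding out-of-range probability mass into the edge bins is an assumption, as the paper does not specify its grid handling):

```python
import numpy as np

def pdf_add(f_a, f_b):
    """Pdf of the sum of two independent quantized LLR messages.

    Linear convolution via FFT, then truncation back to the original
    -N..N grid, with overflow mass folded into the end bins.
    """
    n = (len(f_a) - 1) // 2                       # grid covers -n..n
    m = 4 * n + 1                                 # linear-convolution length
    full = np.fft.irfft(np.fft.rfft(f_a, m) * np.fft.rfft(f_b, m), m)
    # full[k] is the probability of lag k - 2n, for lags -2n..2n
    out = full[n:3 * n + 1].copy()                # keep lags -n..n
    out[0] += full[:n].sum()                      # saturate low overflow
    out[-1] += full[3 * n + 1:].sum()             # saturate high overflow
    return out
```

Zero-padding both pmfs to length 4N+1 makes the circular FFT product equal to the linear convolution, so no wrap-around aliasing occurs before the truncation step.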
ACKNOWLEDGMENTS
This research work was supported in part by the National Science Foundation under Grants CCF-0430634 and CCF-0635199, and by the Commonwealth of Pennsylvania through the Pennsylvania Infrastructure Technology Alliance (PITA).
REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, Mass, USA, 1963.
[2] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, 1999.
[3] J. Li, "Differentially encoded LDPC codes—part II: general case and code optimization," to appear in EURASIP Journal on Wireless Communications and Networking.
[4] J. Li, K. R. Narayanan, and C. N. Georghiades, "Product accumulate codes: a class of codes with near-capacity performance and low decoding complexity," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 31–46, 2004.
[5] D. Divsalar and E. Biglieri, "Upper bounds to error probabilities of coded systems beyond the cutoff rate," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '00), p. 288, Sorrento, Italy, June 2000.
[6] M. C. Valenti and B. D. Woerner, "Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1697–1705, 2001.
[7] P. Hoeher and J. Lodge, ""Turbo DPSK": iterative differential PSK demodulation and channel decoding," IEEE Transactions on Communications, vol. 47, no. 6, pp. 837–843, 1999.
[8] M. Peleg and S. Shamai, "Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK," Electronics Letters, vol. 33, no. 12, pp. 1018–1020, 1997.
[9] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737, 2001.
[10] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and suboptimal MAP decoding algorithms operating in the log domain," in Proceedings of the IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[11] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[12] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.
[13] K. Xie and J. Li, "On accuracy of Gaussian assumption in iterative analysis for LDPC codes," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '06), pp. 2398–2402, Seattle, Wash, USA, July 2006.
[14] J. Hou, P. H. Siegel, and L. B. Milstein, "Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924–934, 2001.
[15] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic Press, New York, NY, USA, 1980.
[16] G. L. Stüber, Principles of Mobile Communication, Kluwer Academic Publishers, Norwell, Mass, USA, 1996.
[17] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels, John Wiley & Sons, New York, NY, USA, 2000.
[18] P. Hoeher, S. Kaiser, and P. Robertson, "Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), pp. 1845–1848, Munich, Germany, April 1997.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 367287, 10 pages
doi:10.1155/2008/367287
Research Article
Differentially Encoded LDPC Codes—Part II:
General Case and Code Optimization
Jing Li (Tiffany)
Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
Correspondence should be addressed to Jing Li (Tiﬀany), jingli@ece.lehigh.edu
Received 19 November 2007; Accepted 6 March 2008
Recommended by Yonghui Li
This two-part series of papers studies the theory and practice of differentially encoded low-density parity-check (DE-LDPC) codes, especially in the context of noncoherent detection. Part I showed that a special class of DE-LDPC codes, product accumulate codes, performs very well with both coherent and noncoherent detection. The analysis here reveals that a conventional LDPC code, however, is not well suited to differential coding and does not, in general, deliver a desirable performance when detected noncoherently. Through extrinsic information transfer (EXIT) analysis and a modified "convergence-constraint" density evolution (DE) method developed here, we provide a characterization of the type of LDPC degree profiles that work in harmony with differential detection (or a recursive inner code in general), and demonstrate how to optimize these LDPC codes. The convergence-constraint method provides a useful extension to the conventional "threshold-constraint" method, and can match an outer LDPC code to any given inner code with the imperfectness of the inner decoder taken into consideration.
Copyright © 2008 Jing Li (Tiﬀany). This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
With the increasingly mature status of sparse-graph coding technology in a theoretical context, the pervasive scope of its well-proven practical applications, and the wide-scale availability of software radio, low-density parity-check (LDPC) codes have become, and continue to be, a favored coding strategy for researchers and practitioners. Their superb performance on various channel models and with various modulation schemes has been documented in many papers. While the existing literature has shed great light on the theory and practice of LDPC codes, investigation was largely carried out from a pure coding perspective, where the prevailing assumption is that synchronization and channel estimation are handled perfectly by the front-end receiver.
In wireless communications, accurate phase estimation may in many cases be very expensive or infeasible, which calls for noncoherent detection. Practical noncoherent detection is generally performed in one of two ways: inserting pilot symbols directly in the coded and modulated sequence to help track the channel (it is possible to insert either pilot tones or pilot symbols, but the latter is found to be more effective and is what is of relevance to this paper), or employing differential coding. Considering that the former may result in a nontrivial expansion of bandwidth, especially on fast-changing channels, many wireless systems adopt the latter, including satellite and radio-relay communications.
The problem we wish to investigate is: LDPC codes perform remarkably well with coherent detection, but how about their performance with noncoherent detection, and noncoherent differential detection in particular? This two-part series of papers aims to generate useful insight and engineering rules. In Part I of the series [1], we considered a special class of differentially encoded LDPC (DE-LDPC) codes, product accumulate (PA) codes [2]. The outer code of a (p(t + 2), pt) PA code is a simple, structured LDPC code with left (variable) degree profile λ(x) = 1/(t + 1) + (t/(t + 1))x and right (check) degree profile ρ(x) = x^t, and the inner code is a differential encoder 1/(1 + D). We showed that, despite their simplicity, PA codes perform quite well with coherent detection as well as noncoherent differential detection [1].
This motivates us, in Part II of this series, to study the general case of differentially encoded LDPC codes. The question of how LDPC codes perform with differential coding is a worthy one [3–6], and directly relates to other
interesting problems. For example, what is the best strategy for applying LDPC codes in noncoherent detection: should differential coding be used or not? Modulation schemes such as minimum phase shift keying (MPSK) have equivalent realizations in recursive and nonrecursive forms; is one form preferred over the other in the context of LDPC coding? What other DE-LDPC configurations, besides PA codes, are good for differential coding, and how can we find them?
Since the conventional differential detector (CDD) operating on two symbol intervals incurs a nontrivial performance loss [7], and since multiple symbol differential detectors (MSDD) [8] have a rather high complexity that increases exponentially with the window size, we developed, in Part I of this series, a simple iterative differential detection and decoding (IDDD) receiver, whose structure is shown in [1, Figure 6]. The IDDD receiver comprises a CDD with a 2-symbol observation window (the current and the previous symbol), a phase-tracking Wiener filter, a message-passing decoder for the accumulator 1/(1 + D) [2], and a message-passing decoder configured for the (outer) LDPC code. The CDD, coupled with the phase-tracking unit and the 1/(1 + D) decoder, acts as the front end, or inner decoder, of the serially concatenated system, and the succeeding LDPC decoder acts as the outer decoder. Soft reliability information in the form of log-likelihood ratios (LLR) is exchanged between the inner and the outer decoders to successively refine the decision. In the sequel, unless otherwise stated, we take the IDDD receiver as the default noncoherent receiver in our discussion of DE-LDPC codes.
We study the convergence property of IDDD for a general DE-LDPC code through extrinsic information transfer (EXIT) charts [9, 10]. A somewhat unexpected finding is that, while a high-rate PA code yields desirable performance with noncoherent (differential) detection, a general DE-LDPC code does not. We attribute this to the mismatch of the convergence behavior between a conventional LDPC code and a differential decoder. This suggests that conventional LDPC codes, while an excellent choice for coherent detection, are not as desirable for noncoherent detection. It also raises the question of which special LDPC codes, possibly performing poorly in the conventional scenario (such as the outer code of the PA code), may turn out to be right for differential modulation and detection.
One remarkable property of LDPC codes is the possibility of designing their degree profiles, through density evolution [11], to match a specific channel or a specific inner code [12–15]. To make LDPC codes work in harmony with the noncoherent differential decoder of interest, here we develop a convergence-constraint density evolution method. The conventional threshold-constraint method [11, 16] targets the best asymptotic threshold, while the new method effectively captures the interaction and convergence between the inner and the outer EXIT curves through a set of "sample points." In this way, it becomes possible to optimize LDPC codes to match an (arbitrary) inner code/modulation with the imperfectness of the inner decoder/demodulator taken into account. Our study reveals that LDPC codes may be divided into two groups. Those having a minimum left degree of ≥2 are generally suitable for a nonrecursive inner code/modulator but not for a differential detector or any recursive inner code. On the other hand, the LDPC codes that perform well with a recursive receiver always have degree-1 (and degree-2) variable nodes. Further, when the code rate is high, these degree-1 and degree-2 nodes become dominant. This also explains why high-rate PA codes, whose outer code has degree-1 and degree-2 nodes only, perform remarkably well with (noncoherent) differential detection [1].
The channel model of interest here is the flat Rayleigh fading channel with additive white Gaussian noise (AWGN), the same as discussed in Part I [1]. Let r_k be the noisy signal at the receiver, let s_k be the binary phase shift keying (BPSK) modulated signal at the transmitter, let n_k be the i.i.d. complex AWGN with zero mean and variance σ² = N_0/2 in each dimension, and let α_k e^{jθ_k} be the fading coefficient with Rayleigh-distributed amplitude α_k and uniformly distributed phase θ_k. We have r_k = α_k e^{jθ_k} s_k + n_k. Throughout the paper, θ_k is assumed known perfectly to the receiver/decoder in the coherent detection case, and unknown (and needing to be worked around) in the noncoherent detection case. Further, the receiver is said to have channel state information (CSI) if α_k is known (irrespective of θ_k), and no CSI otherwise.
We consider correlated channel fading coefficients (so that noncoherent detection is possible). Applying Jakes' isotropic scattering land mobile Rayleigh channel model, the autocorrelation of α_k is characterized by the 0th-order Bessel function of the first kind,

R_k = (1/2) J_0(2kπ f_d T_s),   (1)

and the power spectrum density (PSD) is given by

S(f) = (P/π) · 1/√(1 − (f/f_d)²),  for |f| < f_d,   (2)

where f_d T_s is the normalized Doppler spread, f is the frequency, k is the discrete-time lag, and P is a constant that depends on the average received power given a specific antenna and the distribution of the angles of the incoming power.
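A correlated fading process with approximately the autocorrelation (1) can be generated with a sum-of-sinusoids (Clarke/Jakes-type) simulator. The sketch below is our own illustration, not the paper's simulator; the function name, the number of paths, and the random arrival angles are assumptions:

```python
import numpy as np

def jakes_fading(n_samples, fd_ts, n_paths=64, rng=None):
    """Correlated Rayleigh fading via a sum-of-sinusoids model.

    Returns the complex gain g_k = alpha_k * exp(j*theta_k) at symbol
    instants k = 0..n_samples-1; the ensemble autocorrelation
    approximates J0(2*pi*fd_ts*k), matching (1) up to normalization.
    """
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0, 2 * np.pi, n_paths)  # angles of arrival
    phi = rng.uniform(0, 2 * np.pi, n_paths)    # initial phases
    k = np.arange(n_samples)[:, None]
    # each path contributes a Doppler-shifted sinusoid; unit average power
    return np.sum(np.exp(1j * (2 * np.pi * fd_ts * k * np.cos(theta) + phi)),
                  axis=1) / np.sqrt(n_paths)
```

A received BPSK sequence would then be formed as r_k = g_k s_k + n_k, with n_k complex AWGN as defined above.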
The rest of the paper is organized as follows. Section 2 evaluates the performance of a conventional LDPC code with noncoherent detection and compares it with that of PA codes. Section 3 proposes the convergence-constraint method to optimize LDPC codes to match a given inner code, in particular a differential detector. Section 4 concludes the paper.
2. CODES MATCHED TO DIFFERENTIAL CODING
Part I showed that PA codes, a special class of DE-LDPC codes, perform quite well with coherent detection as well as noncoherent detection [1]. This section examines whether this also holds for general DE-LDPC codes and, more subtly, why or why not.
The analysis makes essential use of EXIT charts [9, 10], which are obtained through a repeated application of density evolution at different decoding stages. Although they were initially proposed solely as a visualization tool, recent studies have revealed surprisingly elegant and useful properties of EXIT charts. Specifically, the convergence property states that, in order for the iterative decoder to converge successfully, the outer EXIT curve should stay strictly below the inner EXIT curve, leaving an open tunnel between the two curves. The area property states that the area under the EXIT curve, A = ∫_0^1 I_e dI_a, corresponds to the rate of the code [10], where I_a and I_e denote the a priori (input) mutual information to, and the extrinsic (output) mutual information from, a particular subdecoder, respectively. When the auxiliary channel is an erasure channel and the subdecoder is an optimal one, the relation is exact; otherwise, it is a good approximation [10]. The immediate implication of these properties is that, to fully harness the capacity (achievable rate) provided by the (noncoherent) inner differential decoder, the outer code must have an EXIT curve closely matched in shape and in position to that of the inner code.
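The area A = ∫_0^1 I_e dI_a of a sampled EXIT curve is straightforward to evaluate numerically; the following sketch (ours, with a hypothetical sample curve in the test) uses the trapezoidal rule:

```python
import numpy as np

def exit_area(i_a, i_e):
    """Area A = integral of I_e over I_a in [0, 1] (trapezoidal rule).

    By the area property, A approximates the rate of the matching code;
    the sampled curve passed in is whatever the EXIT analysis produced.
    """
    i_a = np.asarray(i_a, dtype=float)
    i_e = np.asarray(i_e, dtype=float)
    return float(np.sum(0.5 * (i_e[1:] + i_e[:-1]) * np.diff(i_a)))
```

Comparing the areas of the inner and (flipped) outer curves gives a quick estimate of how much achievable rate an open but wide decoding tunnel leaves on the table.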
With this in mind, we evaluate a few examples of (DE-)LDPC codes. (The computation of EXIT charts specific to DE-LDPC codes with the IDDD receiver is discussed in [1].) We consider two configurations of the inner code:
(1) a differential decoder for 1/(1 + D); and
(2) a direct detector, that is, a BPSK detector;
and three configurations of the outer code:
(1) the outer code of a PA code, which has degree profile
λ(x) = 1/7 + (6/7)x,  ρ(x) = x^6;   (3)
(2) a (3,12)-regular LDPC code; and
(3) an optimized irregular LDPC code reported in [17], whose threshold is 0.6726 (about 0.0576 dB away from the AWGN capacity) and whose degree profile is
γ(x) = 0.1510x + 0.1978x^2 + 0.2201x^6 + 0.0353x^7 + 0.3958x^29,  ρ(x) = x^20.   (4)
All three outer codes have rate 3/4, and the channel is a correlated Rayleigh fading channel with AWGN and a normalized Doppler rate of f_d T_s = 0.01.
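As a quick consistency check (our own), the standard edge-perspective design-rate formula R = 1 − (Σ_j ρ_j/j)/(Σ_i λ_i/i), with an exponent e in λ(x), γ(x), or ρ(x) read as node degree e + 1, gives rate 3/4 for all three outer profiles; note that for the PA outer code this requires check degree 7, that is, ρ(x) = x^6:

```python
def ldpc_rate(lam, rho):
    """Design rate from edge-perspective degree distributions.

    lam, rho map node degree -> fraction of edges attached to nodes of
    that degree; R = 1 - (sum_j rho_j/j) / (sum_i lam_i/i).
    """
    return 1.0 - (sum(f / d for d, f in rho.items())
                  / sum(f / d for d, f in lam.items()))
```

The irregular profile (4) lands within about 2e-6 of 3/4, which also supports reading its fourth term as 0.0353x^7.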
The EXIT curves, plotted in Figure 1, demonstrate that the outer code of the PA code and the differential decoder match quite well, but a conventional LDPC code, regular or irregular, will either intersect the differential decoder curve, causing decoder failure, or leave a huge area between the curves, causing a capacity loss. On the other hand, LDPC codes, especially the (optimized) irregular ones, agree very well with the direct detector. This suggests that (conventional) LDPC codes perform better as a single code than when concatenated with a recursive inner code. Put another way, an LDPC code that is optimal in the usual sense, for example, for BPSK modulation and memoryless channels, may become quite suboptimal when operated together with a recursive inner code or a recursive modulation, such
Figure 1: EXIT curves of LDPC codes, the outer code of PA codes, the differential decoder, and the direct detector of Rayleigh channels (axes: I_{a,i}/I_{e,o} versus I_{e,i}/I_{a,o}). Normalized Doppler rate 0.01, E_b/N_0 = 5.32 dB, code rate 3/4, (3,12)-regular LDPC code, and optimized irregular LDPC code with ρ(x) = x^20 and γ(x) = 0.1510x + 0.1978x^2 + 0.2201x^6 + 0.0353x^7 + 0.3958x^29.
as a differential encoder. On the other hand, not using differential coding generally requires more pilot symbols in order to track the channel well, especially in fast-fading environments. Hence, it is fair to say that (conventional) LDPC codes that boast outstanding performance under coherent detection may not be nearly as advantageous under noncoherent detection, since they either suffer a performance loss (with differential encoding) or incur a large bandwidth expansion (without differential encoding). In comparison, PA codes can make use of their (intrinsic) differential code for noncoherent detection, and therefore present a better choice for bandwidth-limited wireless applications.
Before providing simulations to confirm our findings, we note that the EXIT curves of both inner codes in Figure 1 are computed using perfect knowledge of the fading coefficients. We used this genie-aided case in the discussion to remove the artifacts of coarse channel estimation and to better contrast the differences between the recursive differential detector and the nonrecursive direct detector. If the amplitude and phase information is to be estimated and handled by the inner code, as in actual noncoherent detection, then the EXIT curve of the direct detector will show a small rising slope at the left end instead of being a flat straight line all the way through, and the EXIT curve of the differential decoder will also exhibit a steeper slope at the left end.
Figure 2 plots the BER performance curves of the same three codes specified in Figure 1 on Rayleigh channels with noncoherent detection. All the codes have data block size K = 1K and code rate 3/4. Soft feedback is used in IDDD, the normalized Doppler spread is 0.01, and 2% or 4% pilot symbols are inserted to help track the channel. The two LDPC codes are evaluated either with or without a differential inner code. From the most power efficient to the least power efficient, the curves shown are (i) the PA code with 4% pilot symbols, (ii) the PA code with 2% pilot symbols, (iii) the BPSK-coded irregular LDPC code with 4% pilot symbols, (iv) the BPSK-coded regular LDPC code with 4% pilot symbols, (v) the BPSK-coded irregular LDPC code with 2% pilot symbols, and (vi) the differentially encoded irregular LDPC code with 4% pilot symbols. It is evident that (conventional) LDPC codes suffer from a differential inner code. For example, with 4% bandwidth expansion, the BPSK-coded irregular and regular LDPC codes perform about 0.5 and 1 dB worse than PA codes at a BER of 10^-4, respectively, but the differentially encoded irregular LDPC code falls short by more than 2.2 dB. Further, while the irregular LDPC code (not differentially coded) is moderately (0.5 dB) behind the PA code with 4% pilot symbols, the gap becomes much more significant when the pilot symbols are reduced by half. For PA codes, 2% pilot symbols remain adequate to support a desirable performance, but they become insufficient to track the channel for non-differentially encoded LDPC codes, causing a considerable performance loss and an error floor as high as a BER of 10^-3. Thus, the advantages of PA codes over (conventional) LDPC codes are rather apparent, especially when noncoherent detection is required and only limited bandwidth expansion is allowed.
3. CODE DESIGN FROM THE CONVERGENCE PROPERTY
3.1. Problem formulation
EXIT analysis and computer simulations in the previous section show that a conventional LDPC code does not fit differential coding, but special cases such as the outer code of PA codes do. This raises more interesting questions: what other (special) LDPC codes are also in harmony with differential encoding? What degree profiles do they have? Is it possible to characterize and optimize the degree profiles, and how?
The fundamental tool for addressing these questions lies in convex optimization. In [11], the optimization of irregular LDPC degree profiles on AWGN channels was formulated as a duality-based convex optimization problem, and an iterative method termed density evolution was proposed to solve it. In [16], a Gaussian approximation was applied to the density evolution method, which reduces the problem to a linear optimization problem. Density evolution has since been exploited, in different flavors and possibly combined with differential evolution [18], to design good LDPC ensembles for a variety of communication channels and modulation schemes,
Figure 2: Comparison of PA codes and LDPC codes on fast-fading Rayleigh channels with noncoherent detection and decoding. Solid lines: PA codes; dashed lines: LDPC codes. Code rate 0.75, data block size 1K (K = 1K), filter length 65, normalized Doppler spread 0.01, 10 global iterations, and 4 (local) iterations within the LDPC codes or the outer code of PA codes inside each global iteration. Curves: PA, 2% pilots; PA, 4%; irregular LDPC, 4%; regular LDPC, 4%; irregular LDPC, 2%; irregular LDPC, 4%, with differential decoding.
see, for example, [12–15] and the references therein. The results reported in these previous papers are excellent, but they almost exclusively aim at the asymptotic threshold; namely, their cost functions were set to minimize the SNR threshold for a target code rate or, equivalently, to maximize the code rate for a target SNR threshold. This is well justified, since in these papers the primary involvement of the channel is to provide the initial LLR information to trigger the start of the density evolution process.
However, the problem we consider here is somewhat different. Our goal is to design codes that can fully achieve the capacity provided by the given inner receiver, the noncoherent differential decoder in particular. Considering that the inner receiver, due to the lack of channel knowledge or other practical constraints, may not be an optimal receiver, it is of paramount importance to control the interaction between the inner and the outer code, that is, the convergence behavior as reflected in the matching of shape and position of the corresponding EXIT curves. To emphasize the difference, we hereafter refer to the conventional density evolution method as the "threshold-constraint" method, and propose a "convergence-constraint" method as a useful extension to it.
Jing Li (Tiffany) 5

The key idea of the proposed method is to sample
the inner EXIT curve and design an (outer) EXIT curve
that matches these sample points, or "control points."
Suppose we choose a set of M control points in the EXIT
plane, denoted (v_1, w_1), (v_2, w_2), ..., (v_M, w_M). Let T_o(·)
be the input-output mutual information transfer function
of the outer LDPC code (whose exact expression will be
defined later in (17)). The optimization problem is
formulated as

\max_{\sum_{i=1}^{D_v}\lambda_i=1,\ \sum_{j=2}^{D_c}\rho_j=1} R = 1 - \frac{\sum_{j=2}^{D_c}\rho_j/j}{\sum_{i=1}^{D_v}\lambda_i/i},
\qquad \text{s.t.}\ T_o(w_k) \ge v_k,\ \ k = 1, 2, \ldots, M, \tag{5}
where R denotes the code rate of the outer LDPC code, and
λ_i and ρ_i denote the fractions of edges that connect to variable
nodes and check nodes of degree i, respectively.
The formulation in (5) assumes that the LLR messages
at the input of the inner and the outer decoder
are Gaussian distributed, and that the output extrinsic
mutual information (MI) of an irregular LDPC code
corresponds to a linear combination of the extrinsic MI
from a set of regular codes. As reported in the literature,
the Gaussian assumption for LLR messages is not
far from reality on AWGN channels but less accurate
on Rayleigh fading channels [12]. Nevertheless, the Gaussian
assumption is used for several reasons. The first reason
is simplicity and tractability: tracking and optimizing the
exact message pdf's involves tedious computation, which
is exacerbated by the fact that the proposed new method is
governed by a set of control points, rather than a single
control point as in the conventional method. Second, recall
that computing EXIT curves inevitably uses the Gaussian
approximation; it therefore seems acceptable to adopt
the same approximation when shaping and positioning an
EXIT curve. Finally, characterizing and representing EXIT
curves using mutual information helps stabilize the process
and alleviate the inaccuracy caused by the Gaussian approximation
and other factors. As confirmed by many previous
papers as well as this one, the optimization generates very
good results in spite of the use of the Gaussian approximation.
3.2. The optimization method
Below we detail the convergence-constraint design method
formulated in (5). We conform to the notation and the
graphic framework presented in [16]. Let

\lambda(x) = \sum_{i=1}^{D_v} \lambda_i x^{i-1}, \qquad \rho(x) = \sum_{i=2}^{D_c} \rho_i x^{i-1}

be the degree profiles from the edge perspective, where D_v and D_c
are the maximum variable node and check node degrees, and λ_i and ρ_i
are the fractions of edges incident to variable nodes and check nodes
of degree i. Similarly, let

\tilde\lambda(x) = \sum_{i=1}^{D_v} \tilde\lambda_i x^{i-1}, \qquad \tilde\rho(x) = \sum_{i=2}^{D_c} \tilde\rho_i x^{i-1}

be the degree profiles from the node perspective. Let R be the
code rate. The following relations hold:

\tilde\lambda_i = \frac{\lambda_i/i}{\sum_{j=1}^{D_v}\lambda_j/j}, \qquad
\tilde\rho_i = \frac{\rho_i/i}{\sum_{j=2}^{D_c}\rho_j/j}, \qquad
R = 1 - \frac{\sum_{j=2}^{D_c}\rho_j/j}{\sum_{i=1}^{D_v}\lambda_i/i}. \tag{6}
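The edge-to-node conversion and the rate formula in (6) are easy to sanity-check numerically. The following Python sketch uses an arbitrary illustrative profile (half the edges on degree-2 and half on degree-3 variable nodes, all checks of degree 6), not a profile taken from the paper:

```python
def node_perspective(edge_profile):
    """Convert an edge-perspective profile {degree: fraction} to node perspective."""
    total = sum(f / d for d, f in edge_profile.items())
    return {d: (f / d) / total for d, f in edge_profile.items()}

def code_rate(lam, rho):
    """R = 1 - (sum_j rho_j / j) / (sum_i lambda_i / i), per (6)."""
    return 1.0 - sum(f / j for j, f in rho.items()) / sum(f / i for i, f in lam.items())

lam = {2: 0.5, 3: 0.5}   # illustrative: half the edges on degree-2, half on degree-3
rho = {6: 1.0}           # all check nodes of degree 6

lam_node = node_perspective(lam)   # 60% of the variable nodes have degree 2
R = code_rate(lam, rho)            # 0.6
```

Note that a profile concentrated on low variable degrees from the edge perspective is even more concentrated from the node perspective, since low-degree nodes contribute fewer edges each.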
Let superscript (l) denote the lth LDPC decoding iteration,
and subscripts v and c denote the quantities pertaining to
variable nodes and check nodes, respectively. Further, define
two functions that will be useful in the discussion:

I(x) = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{4\pi x}}\, e^{-(z-x)^2/4x} \log_2\!\left(1 + e^{-z}\right) dz, \tag{7}

\phi(x) = \begin{cases} 1 - \dfrac{1}{\sqrt{4\pi x}} \displaystyle\int_{-\infty}^{\infty} \tanh\frac{z}{2}\, e^{-(z-x)^2/4x}\, dz, & x > 0, \\ 1, & x = 0. \end{cases} \tag{8}

Function I(x) maps the message mean x to the corresponding
mutual information (under the Gaussian assumption),
and φ(x) helps describe how the message mean evolves in the
tanh(y/2) operation, where y follows a Gaussian distribution
with mean x and variance 2x.
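Neither I(x) nor φ(x) in (7)-(8) has a closed form, but both are one-dimensional integrals that are cheap to evaluate by quadrature. A minimal Python sketch using the midpoint rule (the integration window and resolution here are assumptions chosen for illustration, not values from the paper):

```python
import math

def _gauss(z, x):
    # pdf of N(x, 2x) evaluated at z
    return math.exp(-(z - x) ** 2 / (4.0 * x)) / math.sqrt(4.0 * math.pi * x)

def I(x, n=20000, half_width=40.0):
    """Mutual information of a Gaussian LLR with mean x and variance 2x, per (7)."""
    if x <= 0.0:
        return 0.0
    lo = x - half_width
    dz = 2.0 * half_width / n
    s = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * dz
        s += _gauss(z, x) * math.log2(1.0 + math.exp(-z))
    return 1.0 - s * dz

def phi(x, n=20000, half_width=40.0):
    """phi(x) = 1 - E[tanh(z/2)] with z ~ N(x, 2x), per (8)."""
    if x == 0.0:
        return 1.0
    lo = x - half_width
    dz = 2.0 * half_width / n
    s = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * dz
        s += _gauss(z, x) * math.tanh(z / 2.0)
    return 1.0 - s * dz
```

As expected from their definitions, I(x) increases from 0 toward 1 with the message mean, while φ(x) decreases from 1 toward 0.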
The complete design process uses a dual-constraint
optimization that progressively optimizes the variable
node degree profile λ(x) and the check node degree profile ρ(x)
based on each other. Despite the duality in the formulation
and the steps, optimizing λ(x) is far more critical to the
code performance than optimizing ρ(x), largely because the
optimal check node degree profile is shown to follow the
concentration rule [16]:

\rho(x) = \Delta x^{k} + (1 - \Delta) x^{k+1}. \tag{9}

It is therefore common practice to preset ρ(x) according to
(9) and the code rate R, and to optimize λ(x) only. For this reason,
below we focus our discussion on optimizing λ(x) for a given
ρ(x). Interested readers can formulate the optimization of
ρ(x) in a similar way.
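Presetting ρ(x) per the concentration rule (9) amounts to picking two consecutive check degrees around a target average and solving a one-unknown linear equation for Δ. A small sketch (the target of a harmonic-mean check degree of 6.5 is an illustrative assumption, not a value from the paper):

```python
import math

def concentrated_rho(target):
    """Pick k and Delta so that Delta/(k+1) + (1-Delta)/(k+2) == target,
    where target = sum_j rho_j/j as used in the rate formula (6)."""
    d = 1.0 / target              # harmonic-mean check degree implied by the target
    k = int(math.floor(d)) - 1    # rho(x) = Delta*x^k + (1-Delta)*x^(k+1): degrees k+1, k+2
    lo, hi = 1.0 / (k + 2), 1.0 / (k + 1)
    delta = (target - lo) / (hi - lo)
    return k, delta

k, delta = concentrated_rho(1.0 / 6.5)   # aim for harmonic-mean check degree 6.5
```

When the target degree is an integer, the formula returns Δ = 1, that is, a single-degree check profile such as ρ(x) = x^5.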
3.2.1. Threshold-constraint method (optimizing λ(x))
Under the assumption that the messages passed along all
the edges are i.i.d. and Gaussian distributed, the average
messages variable nodes receive from their neighboring
check nodes follow a mixed Gaussian distribution. From the
(l−1)th to the lth local iteration (in the LDPC decoder),
the mean of the messages associated with the variable nodes,
m_v, evolves as

m_v^{(l)} = \sum_{i=2}^{D_v} \lambda_i\, \mathcal{N}\!\left(m_{v,i}^{(l)},\, 2m_{v,i}^{(l)}\right) \tag{10}

= \sum_{i=2}^{D_v} \lambda_i\, \phi\!\left(m_0 + (i-1)\sum_{j=2}^{D_c} \rho_j\, \phi^{-1}\!\left(1 - \left(1 - m_v^{(l-1)}\right)^{j-1}\right)\right), \tag{11}

where m_0 denotes the mean of the initial messages received
from the inner code (or the channel). Let us define
h_i(m_0, r) \triangleq \phi\!\left(m_0 + (i-1)\sum_{j=2}^{D_c} \rho_j\, \phi^{-1}\!\left(1 - (1-r)^{j-1}\right)\right),
\qquad
h(m_0, r) \triangleq \sum_{i=2}^{D_v} \lambda_i\, h_i(m_0, r). \tag{12}

Then (11) can be rewritten as

r_l = h\!\left(m_0, r_{l-1}\right) = \sum_{i=2}^{D_v} \lambda_i\, h_i\!\left(m_0, r_{l-1}\right). \tag{13}
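The recursion (13) is straightforward to iterate numerically. The sketch below replaces the exact integral φ(x) with the commonly used exponential fit φ(x) ≈ e^{−0.4527x^{0.86}+0.0218} from the Gaussian-approximation literature (extended beyond its nominal fitting range 0 < x < 10 for simplicity), and uses an illustrative, not optimized, degree profile; a profile "converges" at mean m_0 if r is driven below a small tolerance:

```python
import math

def phi(x):
    # exponential fit of the Gaussian-approximation function phi(x)
    return math.exp(-0.4527 * x ** 0.86 + 0.0218) if x > 0 else 1.0

def phi_inv(y):
    # inverse of the fit above, valid for 0 < y <= 1
    return ((0.0218 - math.log(y)) / 0.4527) ** (1.0 / 0.86)

def h(m0, r, lam, rho):
    """h(m0, r) from (12); lam and rho map degree -> edge fraction."""
    inner = sum(rj * phi_inv(1.0 - (1.0 - r) ** (j - 1)) for j, rj in rho.items())
    return sum(li * phi(m0 + (i - 1) * inner) for i, li in lam.items())

def converges(m0, lam, rho, iters=100, tol=1e-6):
    """Iterate r_l = h(m0, r_{l-1}) from r_0 = phi(m0); True if r falls below tol."""
    r = phi(m0)
    for _ in range(iters):
        r = min(h(m0, r, lam, rho), 1.0)  # clamp: the fit can slightly exceed 1 near x = 0
        if r < tol:
            return True
    return False

lam = {2: 0.4, 3: 0.6}   # illustrative edge-perspective profile
rho = {6: 1.0}           # rho(x) = x^5
```

At a high initial mean the iteration collapses to the zero-error state within a few steps, while at a low initial mean it stalls at a nonzero fixed point, which is exactly the behavior the threshold search exploits.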
6 EURASIP Journal on Wireless Communications and Networking
The conventional threshold-constraint density evolution
guarantees that the degree profile converges asymptotically
to the zero-error state at the given initial message mean m_0.
This is achieved by enforcing [16]

r > h\!\left(m_0, r\right), \quad \forall r \in \left(0, \phi(m_0)\right). \tag{14}

Viewed from the EXIT chart, the threshold-constraint
method has implicitly used a single control point (v, w) =
(1, I(m_0)), such that the resultant EXIT curve stays below it.
3.2.2. Convergence-constraint method (optimizing λ(x))
The proposed convergence-constraint method extends the
conventional threshold-constraint method by introducing
a set of control points, which may be placed at arbitrary
positions in the EXIT plane, to control the shape and the
position of the EXIT curve. Each control point (v, w) ∈
[0, 1]^2 ensures that the EXIT curve will, at input a priori
mutual information w, produce extrinsic mutual information
greater than v. This is reflected in the optimization
process by changing (14) to

r > h\!\left(m_0, r\right), \quad \forall r \in \left(r^*, \phi(m_0)\right), \tag{15}

where r^* (≥ 0) is the threshold value that satisfies T_o(w) ≥ v.
We can reformulate the problem as follows: for a given check
node degree profile ρ(x) and a control point (v, w) in the
EXIT chart, where 0 ≤ v, w ≤ 1,
\max_{\{\lambda_i\}} \sum_{i=1}^{D_v} \frac{\lambda_i}{i},

\text{subject to: (i)}\ \sum_{i=1}^{D_v} \lambda_i = 1,

\text{(ii)}\ \sum_{i=1}^{D_v} \lambda_i \left(h_i(m_0, r) - r\right) < 0, \quad \forall r \in \left(r^*, \phi(m_0)\right), \tag{16}
where m_0 = I^{-1}(w) and r^* satisfies

T_o(w) \triangleq \sum_{i=1}^{D_v} \lambda_i\, I\!\left(i \sum_{j=2}^{D_c} \rho_j\, \phi^{-1}\!\left(1 - \left(1 - r^*\right)^{j-1}\right)\right) \ge v. \tag{17}
Apparently, when v = 1 we get r^* = 0, and the case reduces
to that of the conventional threshold-constraint design.
Hence, given a set of M control points (v_1, w_1),
(v_2, w_2), ..., (v_M, w_M), where 0 ≤ v_1 < v_2 < ··· < v_M ≤ 1
and 0 ≤ w_1 ≤ w_2 ≤ ··· ≤ w_M ≤ 1, one can combine
the constraints associated with each individual control point
and perform a joint optimization to control the shape and the
position of the resulting EXIT curve. Specifically, when the
set of control points are proper samples from the inner EXIT
curve, the resultant EXIT curve represents an optimized
LDPC ensemble that matches the inner code.
3.2.3. Linear programming
The basic idea of the convergence-constraint design, as discussed
above, is simple. Complication arises from the fact that
constraint (ii) in (16) is a nonlinear function of the λ_i's. Furthermore,
observe that determining the optimization
range, that is, computing r^* from (17), requires
knowledge of λ(x), which is yet to be optimized. One possible
approach to overcome this chicken-and-egg dilemma is
to use an approximate λ(x) in (17) to compute r^*.
Specifically, we propose accounting for the two lowest-degree
variable nodes λ_{i_1} and λ_{i_2}, and approximating the degree
profile as

\hat\lambda(x) = \lambda_{i_1} x^{i_1-1} + \lambda_{i_2} x^{i_2-1} + O\!\left(\lambda_{i_2+1} x^{i_2}\right) \approx \lambda_{i_1} x^{i_1-1} + \left(1 - \lambda_{i_1}\right) x^{i_1} \tag{18}

in (17). First, this approximate λ(x) is only used in (17) to
tentatively determine r^*, so that the optimization process
can get started; the exact λ(x) in (16), (i) and (ii), is still to
be optimized. Second, the values of i_1 and λ_{i_1} in the
approximate λ(x) are calculated in one of the following two
ways.
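As a small illustration of the resulting linear program, restrict λ(x) to degrees {2, 3}; then each grid sample of constraint (ii) in (16) gives a closed-form cap on λ_2 (since h_2 > h_3, the constraint λ_2(h_2 − r) + (1 − λ_2)(h_3 − r) < 0 is equivalent to λ_2 < (r − h_3)/(h_2 − h_3)), and maximizing the objective reduces to taking the minimum cap. The φ fit, the grid resolution, and the operating point m_0 = 3 below are all assumptions for illustration; a general profile would use an off-the-shelf LP solver instead:

```python
import math

def phi(x):
    # exponential fit of phi(x), extended beyond its nominal range for simplicity
    return math.exp(-0.4527 * x ** 0.86 + 0.0218) if x > 0 else 1.0

def phi_inv(y):
    return ((0.0218 - math.log(y)) / 0.4527) ** (1.0 / 0.86)

def h_i(i, m0, r, rho):
    """h_i(m0, r) from (12)."""
    inner = sum(rj * phi_inv(1.0 - (1.0 - r) ** (j - 1)) for j, rj in rho.items())
    return phi(m0 + (i - 1) * inner)

def max_lambda2(m0, rho, r_star, n_grid=200):
    """Largest lambda_2 satisfying constraint (ii) of (16) sampled on a grid
    over (r_star, phi(m0)), with only variable degrees 2 and 3 present."""
    cap, hi_r = 1.0, phi(m0)
    for k in range(1, n_grid):
        r = r_star + (hi_r - r_star) * k / n_grid
        h2, h3 = h_i(2, m0, r, rho), h_i(3, m0, r, rho)
        if h2 > h3:
            # lambda_2*(h2-r) + (1-lambda_2)*(h3-r) < 0  <=>  lambda_2 < (r-h3)/(h2-h3)
            cap = min(cap, (r - h3) / (h2 - h3))
    return max(0.0, min(1.0, cap))

lam2 = max_lambda2(m0=3.0, rho={6: 1.0}, r_star=0.0)  # m0 = 3.0 is an assumed operating point
```

At a generous operating point the constraint stops binding and the cap saturates at 1, which is consistent with the LP being feasible for any profile at a sufficiently high initial mean.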
Case 1. A conventional LDPC ensemble has i_1 = 2, that is,
no degree-1 variable nodes. This is because the outbound
messages from degree-1 variable nodes do not improve over
the message-passing process. In that case, we consider only
degree-2 and degree-3 nodes (λ_{i_1=2} and λ_{i_2=3}), upper bound the
percentage of degree-2 nodes by λ_2^*, and treat all the rest
as degree-3 nodes. The stability condition [11, 16] states
that there exists a value ξ > 0 such that, given an initial
symmetric message density P_0 satisfying \int_{-\infty}^{0} P_0(x)\,dx < ξ,
the necessary and sufficient condition for the density
evolution to converge to the zero-error state is \lambda'(0)\rho'(1) <
e^{\gamma}, where \gamma \triangleq -\log\!\left(\int_{-\infty}^{\infty} P_0(x)\, e^{-x/2}\, dx\right). Applying the stability
condition to Gaussian messages with initial mean value
m_0, we get \gamma = m_0/4 and \lambda_2^* = e^{m_0/4} / \sum_{j=2}^{D_c} (j-1)\rho_j, or
equivalently,

\lambda_2^*(w) = \frac{e^{I^{-1}(w)/4}}{\sum_{j=2}^{D_c} (j-1)\rho_j}. \tag{19}
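Evaluating the cap in (19) requires inverting I(x), which has no closed form; a sketch computes I(x) by quadrature as in (7) and inverts it by bisection. Here ρ(x) = x^5 and the control-point abscissa w = 0.7 are illustrative choices, not values tied to a specific design in the paper:

```python
import math

def I(x, n=4000, half_width=40.0):
    """Quadrature evaluation of the mutual-information function I(x) in (7)."""
    if x <= 0.0:
        return 0.0
    lo = x - half_width
    dz = 2.0 * half_width / n
    s = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * dz
        s += (math.exp(-(z - x) ** 2 / (4.0 * x)) / math.sqrt(4.0 * math.pi * x)
              * math.log2(1.0 + math.exp(-z)))
    return 1.0 - s * dz

def I_inv(w, lo=1e-6, hi=100.0, iters=60):
    """Invert the monotone I(x) by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if I(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda2_star(w, rho):
    """Stability cap (19) on the degree-2 edge fraction."""
    return math.exp(I_inv(w) / 4.0) / sum((j - 1) * rj for j, rj in rho.items())

cap = lambda2_star(0.7, {6: 1.0})
```

The cap grows quickly with w: control points placed at higher input mutual information allow a larger share of degree-2 edges before stability is threatened.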
It should be noted that not all values of w_k from
the M preselected control points are suitable for computing
λ_2^* in (19). Since the stability condition ensures
asymptotic convergence to the zero-error state for a given
input message, λ_2 ≤ λ_2^*(w^*) is valid and required only
when the output mutual information approaches 1 at the
input mutual information w^*. What this implies for sampling
the inner EXIT curve is that at least one control point,
say the rightmost point (v_M, w_M), should roughly satisfy the
requirement (v_M, w_M) ≈ (1, w_M). This value of w_M is then
used in (19) to compute λ_2^* = λ_2^*(w_M), which is subsequently
used in \hat\lambda(x) ≈ λ_2^* x + (1 − λ_2^*) x^2 to compute r^* from (17). This r^*
is then applied to all the control points from 1 to M.
Figure 3: Defect for λ_1 > 1 − R. When the four bits associated with
the solid circles flip altogether, another valid codeword results, and
the decoder is unable to tell (undetectable error). (The figure shows
the check and bit nodes of the LDPC code and of the differential
encoder, with the two degree-1 positions labeled p and q.)
It is also worth mentioning that when a Gaussian
approximation is used on the message pdf's, the stability
condition reduces to

\lambda_2^*(w) = \frac{e^{I^{-1}(w)/4}}{\sum_{j=2}^{D_c} (j-1)\rho_j}, \tag{20}

which is a weaker condition than (19). Since we use the Gaussian
approximation primarily for the purpose of complexity
reduction, unnecessary application of it is avoided;
thus (19) rather than (20) is used in our design process.
Case 2. Consider the case when an LDPC code is iteratively
decoded together with a differential encoder, or another
recursive inner code or modulation with memory. Since
the inner code imposes another level of checks on all the
variable nodes, degree-1 variable nodes in the outer LDPC
code will get extrinsic information from the inner code, and
their estimates will improve with decoding iterations. Thus,
without loss of generality, we let the first and the second
nonzero λ_i's be λ_1 and λ_2. No analytical bounds on λ_1 or
λ_2 have been reported in the literature for this case. We propose to
bound λ_1 by λ_1 ≤ 1 − R, where R is the code rate (the exact
code rate depends on the optimization result, and may
differ slightly from the target code rate). The rationale
is that, if λ_1 > 1 − R, then there exist at least two degree-1
variable nodes, say the pth node and the qth node, which
connect to the same check. When the LDPC code operates
alone, these two variable nodes are apparently useless and
wasteful, and can be removed altogether. When the LDPC
code is combined with an inner recursive code, as shown
in Figure 3, these two degree-1 variable nodes will cause a
minimum distance of 4 for the entire codeword, irrespective
of the code length. Using this empirical bound on λ_1, we can
employ the approximation \hat\lambda(x) = (1 − R) + Rx in (17), which
leads to the computation of (a lower bound for) r^*. Code
optimization as formulated by the convergence-constraint
method can thus be solved using linear programming.
As is to be expected, the choice of the control points
directly affects the optimization results. The set of control
points need not be large; in fact, an excessive number of
control points makes the optimization process converge
slowly and at times poorly. We suggest choosing
3 to 5 control points that reasonably characterize the
shape of the inner EXIT curve. Our experiments show that
the proposed method generates EXIT curves whose shape
matches very well what we desire, but whose position
is slightly lower, indicating that the resultant code rate is
slightly pessimistic. This can be compensated by presetting
the control points slightly higher than we actually want them
to be.
3.3. Optimization results and useful findings
For complexity reasons, instead of performing the dual optimization,
we apply the concentration theorem in (9) and
preselect a ρ(x) that makes the average column weight
approximately 3. The left degree profile λ(x) is then optimized
through the convergence-constraint method discussed
in the previous subsection. We now discuss some observations
and findings from our optimization experiments.
First, the LDPC ensemble optimal for differential coding
always contains degree-1 and degree-2 variable nodes. For
high rates above 0.75, these nodes are dominant and,
in some cases, are the only types of variable nodes in the
degree profile. For medium rates around 0.5, there is also
a good portion of high-degree variable nodes. Considering
that the outer code of a PA code has only degree-1 and
degree-2 variable nodes, λ(x) = (1 − R)/(1 + R) + (2R/(1 +
R))x, where R ≥ 1/2 is the code rate, it is fair to say
that PA codes are (near-)optimal at high rates, but less optimal
at medium rates (the optimized LDPC ensemble contains
a slightly different degree distribution than that of the PA code,
but the difference is very small in either asymptotic thresholds or
finite-length simulations). This is actually well reflected in
the EXIT charts. At rate 3/4 (see Figure 1), the area between
the outer code of the PA code and the inner differential
code is very small, leaving little room for improvement.
In comparison, at rates around 0.5 (see Figure 4), the area
becomes much bigger, indicating that an optimized outer
code could acquire a higher information rate for the same SNR
threshold or, for the same information rate, achieve a better
SNR threshold.
The optimization result for a target rate of 0.5 is shown in
Figure 4. We consider an inner differential code, operating
at 0.25 dB on a f_dT_s = 0.01 Rayleigh fading channel, and
decoded using the noncoherent IDDD receiver with the help
of 10% pilot symbols. The optimized LDPC ensemble
has code rate R = 0.5037 and degree profile

\lambda(x) = 0.0672 + 0.4599x + 0.0264x^{8} + 0.0495x^{9} + 0.0720x^{10} + 0.0828x^{11} + 0.0855x^{12} + 0.0807x^{13} + 0.0760x^{14},
\qquad \rho(x) = x^{5}. \tag{21}
(21)
We see that the two EXIT curves match very well with
each other. Here the inner EXIT curve is computed through
Monte Carlo simulations, when the sequences are taken in
blocks of N = 10^6 bits, and the power penalty due to the pilot
symbols is also compensated for.

Figure 4: EXIT chart of a rate-0.5 LDPC ensemble optimized
using convergence-constraint evolution for differential coding
(I_{e,i}/I_{a,o} versus I_{a,i}/I_{e,o}; inner curves at 1.26 dB and 0.25 dB, each
with 10% pilots). Normalized Doppler rate is 0.01, and 10% pilot
symbols are assumed to assist noncoherent differential detection.
Degree profile of the optimized LDPC ensemble (R = 0.5037):
λ(x) = 0.0672 + 0.4599x + 0.0264x^8 + 0.0495x^9 + 0.0720x^{10} +
0.0828x^{11} + 0.0855x^{12} + 0.0807x^{13} + 0.0760x^{14}, ρ(x) = x^5.
Also shown: the R = 0.5 outer code of PA codes.
The optimized LDPC ensemble requires 0.25 −
10 log_{10}(0.5037) = 3.2283 dB asymptotically in order for the
iterative process to converge successfully. Compared to a rate-0.50
PA code, which requires 1.26 − 10 log_{10}(0.5) = 4.2703 dB
(Figure 4), the optimized LDPC ensemble is about 1.04 dB
better asymptotically. However, as the tunnel between the
inner and the outer EXIT curves becomes narrower, the
message-passing decoder takes a larger number of iterations
to arrive at the zero-error state. The increased computational
complexity and processing time are the price we pay for
reaching out to the limit.
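The threshold arithmetic above is simply an Es/N0-to-Eb/N0 conversion that removes the code rate; as a one-line check:

```python
import math

def ebn0_dB(esn0_dB, rate):
    """Convert an Es/N0 operating point to Eb/N0 by removing the code rate."""
    return esn0_dB - 10.0 * math.log10(rate)

opt_ldpc = ebn0_dB(0.25, 0.5037)   # about 3.23 dB
pa_code = ebn0_dB(1.26, 0.5)       # about 4.27 dB
gain = pa_code - opt_ldpc          # about 1.04 dB
```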
The optimized LDPC ensemble is good in the asymptotic
sense, that is, with infinite or very long code lengths. In
practice, we are also concerned with finite-length implementations,
or individual code realizations. According to the
concentration rule, at long lengths all code realizations
perform close to one another, and they all tend to converge
to the asymptotic threshold as the length increases without bound.
At short lengths, however, the concentration rule fails and
the performance may vary rather noticeably from one code
realization to another. Good realizations have better
neighborhood conditions than others, including a larger
girth (achieved, e.g., through the progressive edge growth
algorithm), a smaller number of short cycles, or smaller
trapping sets.
Figure 5 shows simulations of the optimized rate-0.5037 LDPC
code with differential encoding and noncoherent differential
decoding.

Figure 5: Simulations of the optimized LDPC code with differential
coding and iterative differential detection and decoding (BER versus
E_b/N_0 in dB; (64K, 32K) optimized LDPC, f_dT_s = 0.01). Code rate
0.5037, normalized Doppler rate 0.01, 10% pilot insertion, degree
profile λ(x) = 0.0672 + 0.4599x + 0.0264x^8 + 0.0495x^9 + 0.0720x^{10} +
0.0828x^{11} + 0.0855x^{12} + 0.0807x^{13} + 0.0760x^{14} and ρ(x) = x^5, 15
(global) iterations each with 6 (local) iterations in the outer LDPC
decoding. Curves shown: optimized LDPC with ideal detection (10%,
15 iterations); optimized LDPC with differential decoding (10%; 5, 10,
and 15 iterations); PA (10%, 15 iterations); conventional LDPC without
differential decoding (10%, 15 iterations); and the analytical threshold
(gaps of 0.75 dB, 1.4 dB, and 1.5 dB).

The Rayleigh channel and the inner differential
decoder (the IDDD receiver) are the same as discussed in
Figure 4. We chose a long codeword length of N = 64K
to test how well the simulation agrees with the analytical
threshold. As mentioned before, a large number of iterations
(e.g., 100 iterations) is preferred to fully harness the coding
gain, but considering the complexity and delay affordable in
a practical system, we simulated only 15 iterations. In the
figure, the leftmost curve corresponds to the optimized DE-LDPC
code using ideal detection (perfect knowledge of the
fading phases and amplitudes), but with 10% pilot symbols.
These wasteful pilot symbols are included in the coherent
detection case to offset the curve, and to compare fairly with
all the other noncoherently detected curves with 10% pilot
insertion. The three circled curves to the right of this ideal-detection
curve correspond to the noncoherent performance
using iterative differential detection and decoding at the 5th,
10th, and 15th iteration. We see that the optimized
differentially encoded LDPC code performs
only about 0.3 dB worse than with coherent detection, which
is very encouraging. Further, the simulated performance
is only 0.75 dB away from the asymptotic threshold of
3.23 dB (discussed before), showing good agreement between
theory and practice.

For reference, we also plot in Figure 5 the performance of
a PA code and a conventional LDPC code without differential
coding (recall that conventional LDPC codes perform worse
with differential coding than without), both having code rates
around 0.5 and both noncoherently detected. We see that the
PA code outperforms the conventional LDPC code by 1.5 dB,
but the optimized DE-LDPC code outperforms the PA code
by another 1.4 dB!
4. CONCLUSION

Part I of this two-part series of papers [1] studied product
accumulate codes, a special case of differentially encoded
LDPC codes, with coherent detection and especially noncoherent
detection. It showed that PA codes perform very well
in both cases. Here in Part II, we generalize the study from
PA codes to arbitrary differentially encoded LDPC codes.

The remarkable performance of LDPC codes with coherent
detection has been extensively studied, but much less
work has been carried out on noncoherently detected LDPC
codes. In general, a noncoherently detected system may or
may not employ differential encoding. The former leads to a
differential encoding and noncoherent differential detection
architecture, and the latter requires the insertion of (many)
pilot symbols in order to track the (fast-changing) channel
well. A rather unexpected finding here is that a conventional
LDPC code actually suffers in either case: in the former
because of an EXIT mismatch between the (outer)
LDPC code and the (inner) differential code, and in the latter
because of the large bandwidth expansion. Here, by
conventional we mean an LDPC code that delivers superb
performance in the usual setting with coherent detection and
possibly channel state information.

Further investigation shows that it is not only possible,
but highly beneficial, to optimize an LDPC code to match
a differential decoder. The optimization is achieved through
the new convergence-constraint density evolution method
developed here. The resultant optimized degree profiles are
rather unconventional, as they contain (many) degree-1
and degree-2 variable nodes. This is in sharp contrast to the
conventional LDPC case (i.e., coherent detection), where
degree-1 variable nodes are deemed highly undesirable.

The effectiveness of the new DE method is confirmed
by the fact that the optimized DE-LDPC code brings an
additional 1.4 dB and 2.9 dB gain, respectively, over the existing
PA code and the conventional LDPC code (when noncoherent
detection is used). The proposed DE optimization
procedure is very useful: it provides a practical way to tune
the shape and the position of an EXIT curve, and can
therefore match an LDPC code to virtually any front-end
processor, with the imperfections of the processor taken into
explicit consideration.

We conclude by stating that LDPC codes can, after all,
perform very well with differential encoding (or any other
recursive inner code or modulation), but the degree profiles
need to be carefully (re)designed, using, for example, the
convergence-constraint density evolution developed here,
and one should expect the optimized degree profile to
contain many degree-1 (and degree-2) variable nodes.
ACKNOWLEDGMENTS

This research work was supported in part by the National
Science Foundation under Grants no. CCF-0430634 and
CCF-0635199, and by the Commonwealth of Pennsylvania
through the Pennsylvania Infrastructure Technology Alliance
(PITA).
REFERENCES
[1] J. Li, "Differentially-encoded LDPC codes: part I—special case of product accumulate codes," to appear in EURASIP Journal on Wireless Communications and Networking.
[2] J. Li, K. R. Narayanan, and C. N. Georghiades, "Product accumulate codes: a class of codes with near-capacity performance and low decoding complexity," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 31–46, 2004.
[3] V. T. Nam, P.-Y. Kam, and Y. Xin, "LDPC codes with BDPSK and differential detection over flat Rayleigh fading channels," in Proceedings of the 50th IEEE Global Telecommunications Conference (GLOBECOM '07), pp. 3245–3249, Washington, DC, USA, November 2007.
[4] H. Tatsunami, K. Ishibashi, and H. Ochiai, "On the performance of LDPC codes with differential detection over Rayleigh fading channels," in Proceedings of the 63rd IEEE Vehicular Technology Conference (VTC '06), vol. 5, pp. 2388–2392, Melbourne, Victoria, Australia, May 2006.
[5] M. Franceschini, G. Ferrari, R. Raheli, and A. Curtoni, "Serial concatenation of LDPC codes and differential modulations," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1758–1768, 2005.
[6] J. Mitra and L. Lampe, "Simple concatenated codes using differential PSK," in Proceedings of the 49th IEEE Global Telecommunications Conference (GLOBECOM '06), pp. 1–6, San Francisco, Calif, USA, November 2006.
[7] M. C. Valenti and B. D. Woerner, "Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1697–1705, 2001.
[8] M. Peleg and S. Shamai, "Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK," Electronics Letters, vol. 33, no. 12, pp. 1018–1020, 1997.
[9] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737, 2001.
[10] A. Ashikhmin, G. Kramer, and S. ten Brink, "Extrinsic information transfer functions: model and erasure channel properties," IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2657–2673, 2004.
[11] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[12] J. Hou, P. H. Siegel, and L. B. Milstein, "Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924–934, 2001.
[13] A. Shokrollahi and R. Storn, "Design of efficient erasure codes with differential evolution," in Proceedings of the IEEE International Symposium on Information Theory, p. 5, Sorrento, Italy, June 2000.
[14] S. ten Brink, G. Kramer, and A. Ashikhmin, "Design of low-density parity-check codes for modulation and detection," IEEE Transactions on Communications, vol. 52, no. 4, pp. 670–678, 2004.
[15] R.-R. Chen, R. Koetter, U. Madhow, and D. Agrawal, "Joint noncoherent demodulation and decoding for the block fading channel: a practical framework for approaching Shannon capacity," IEEE Transactions on Communications, vol. 51, no. 10, pp. 1676–1689, 2003.
[16] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.
[17] http://lthcwww.epfl.ch/research/.
[18] R. Storn and K. Price, "Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 385421, 9 pages
doi:10.1155/2008/385421

Research Article

Construction and Iterative Decoding of LDPC Codes Over Rings for Phase-Noisy Channels

Sridhar Karuppasami and William G. Cowley

Institute for Telecommunications Research, University of South Australia, Mawson Lakes, SA 5095, Australia

Correspondence should be addressed to Sridhar Karuppasami, sridhar.karuppasami@postgrads.unisa.edu.au

Received 1 November 2007; Revised 7 March 2008; Accepted 27 March 2008

Recommended by Branka Vucetic

This paper presents the construction and iterative decoding of low-density parity-check (LDPC) codes for channels affected by phase noise. The LDPC code is based on integer rings and designed to converge under phase-noisy channels. We assume that phase variations are small over short blocks of adjacent symbols. A part of the constructed code is inherently built with this knowledge and is hence able to withstand phase rotations of 2π/M radians, where M is the number of phase symmetries in the signal set, that occur at different observation intervals. Another part of the code estimates the phase ambiguity present in every observation interval. The code makes use of simple blind or turbo phase estimators to provide phase estimates over every observation interval. We propose an iterative decoding schedule that applies the sum-product algorithm (SPA) on the factor graph of the code for its convergence. To illustrate the new method, we present the performance results of an LDPC code constructed over Z_4 with quadrature phase-shift keying (QPSK) modulated signals transmitted over a static channel, but affected by phase noise, which is modeled by a Wiener (random-walk) process. The results show that the code can withstand phase noise of 2° standard deviation per symbol with small loss.

Copyright © 2008 S. Karuppasami and W. G. Cowley. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION

In the past decade, a great deal of work has been done on the
construction and decoding of LDPC codes [1]. In general, the
code construction techniques have been motivated by
reduced encoding complexity and better bit-error rate (BER)
performance. The channels considered are generally either
additive white Gaussian noise (AWGN) or binary erasure channels.
However, many real systems are affected by phase noise
(e.g., DVB-S2). The severity of the phase noise depends
on the quality of the local oscillators and the symbol rate.
Hence the performance of codes on channels with phase
disturbances is of practical significance.
Over the past few years, iterative decoding for channels
with phase disturbances has received a lot of attention [2–7].
In [2, 3], the authors proposed algorithms that operate
on a factor graph model incorporating the phase
noise process. They used canonical distributions to deal
with the continuous phase probability density functions. In
particular, their approach based on the Tikhonov distribution
yields good performance. In [4], the authors developed
algorithms for noncoherent decoding of turbo-like codes for
phase-noisy channels. These schemes make use of pilot
symbols for either estimation or decoding. In [5], the authors
showed the rotational robustness of certain codes under a
constant-phase-offset channel with cycle slips present
only during the initial part of the codeword.

In [6], the authors used smaller observation intervals
to tackle varying frequency offset in the context of serially
concatenated convolutional codes (SCCCs). They used blind
and turbo phase estimators to provide a phase estimate for
every sub-block. Since the phase estimates obtained from
the blind phase estimator (BPE) are phase ambiguous, each
sub-block is affected by an ambiguity of 2π/M radians.
By differentially encoding the sub-blocks independently, the
authors tackled the phase ambiguity. However, using an
inner differential encoder along with an LDPC code incurs
a loss in performance, and the degree distributions of the
LDPC code need to be optimized [7].
The concept of smaller observation intervals in the
presence of phase disturbances is attractive and offers low
complexity as well. Intuitively, as the observation interval gets
smaller, more phase variation may be tackled. On the other
hand, phase estimators produce poorer estimates with smaller
observation intervals. However, if the phase estimation error
is smaller than π/M, the decoder may be able to converge
correctly.
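As a rough illustration of the small-drift assumption (parameters taken loosely from the abstract: Wiener phase noise with 2° standard deviation per symbol, QPSK so M = 4; the interval lengths and trial count are assumptions for illustration), one can estimate how often the accumulated phase drift within an observation interval stays below π/M:

```python
import math
import random

def fraction_within(n_trials=2000, interval=20, step_deg=2.0, M=4, seed=1):
    """Fraction of intervals whose random-walk phase drift never exceeds pi/M."""
    rng = random.Random(seed)
    step = math.radians(step_deg)
    limit = math.pi / M
    ok = 0
    for _ in range(n_trials):
        phase = 0.0
        within = True
        for _ in range(interval):
            phase += rng.gauss(0.0, step)   # Wiener (random-walk) phase noise
            if abs(phase) > limit:
                within = False
                break
        ok += within
    return ok / n_trials

short = fraction_within(interval=20)
long_ = fraction_within(interval=200)
```

With a 20-symbol interval the drift essentially never crosses the QPSK decision limit, while over a 200-symbol interval a noticeable fraction of intervals drift beyond it, which is the trade-off the sub-block sizing navigates.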
In our earlier work [8], we used sub-blocks in a binary
LDPC-coded receiver to tackle residual frequency offset. The
received symbol vector was split into many sub-blocks, and a
BPE was used to provide a phase estimate across every sub-block.
We introduced the concept of "local check nodes"
(LCNs) to resolve the phase ambiguity created by the BPE
on the sub-blocks. Local check nodes are odd-degree check
nodes connected to the variable nodes present within a
single sub-block. In (1), the local check nodes correspond
to the top four rows of the parity-check matrix, in which
the bottom (dotted) part is connected according to a random
construction. In this small example, the LCN degree d_c^L is
three; if the sub-block size N_b is six symbols, the parity-check
matrix provides N_b/d_c^L = 2 LCNs to resolve the phase
ambiguity in each sub-block:
H =
⎡111000000000⎤
⎢000111000000⎥
⎢000000111000⎥
⎢000000000111⎥
⎣     ⋮      ⎦ .   (1)
The phase-ambiguity-resolved vector is decoded by an LDPC
decoder. Turbo phase/frequency estimates (e.g., [9]) are
obtained during iterations to facilitate convergence.
The quality of the phase ambiguity estimate improves with
more LCNs. Hence, with reduced subblock sizes, the phase
ambiguity estimate is less reliable and the code suffers
performance degradation.
Following [6, 8], but with a different perspective, we
addressed the problem of phase noise for BPSK signals in a
binary LDPC-coded system [10]. In particular, we
incorporated the observation that, even under large phase
disturbances, the variation in phase over adjacent symbols
is normally small. We created a set of check nodes called
"global check nodes" (GCNs) that converge irrespective of
phase rotations (0 or π radians) in any subblock. We
used a BPE or TPE to provide a phase estimate in each subblock.
After the convergence of the GCNs, we used only one
LCN per subblock to resolve the phase ambiguity present
in the subblock. We found that even under relatively large
phase noise and observation intervals, the method provided
good performance for BPSK signals. We did not make use of
pilot symbols, and the complexity is low. However, we found
that the extension of the above approach to higher-order
modulations was very difficult with a binary LDPC code.
In particular, with a binary LDPC code, constructing global
check nodes that converge irrespective of a phase rotation
(a multiple of 2π/M radians) in the subblocks was difficult.
This paper addresses the problem of extending the above
code construction technique to higher-order signal constellations
based on integer rings. Specifically, we construct LDPC
codes over rings with certain constraints on the placement of
edges and edge gains such that they, along with subblock
phase estimation techniques, provide good performance
under phase-noisy channels with low complexity. Under a
noiseless channel, we present edge constraints based on integer
rings, generalized for any phase-symmetric modulation
scheme, under which the convergence of the global check
nodes is guaranteed in the presence of phase ambiguities
in any subblock. Similarly, we present generalized edge
constraints for the local check nodes such that they are
able to resolve the phase ambiguity in the subblock. To
illustrate the concepts discussed in this paper under a phase-noisy
channel, we show the performance of an LDPC code
constructed over Z_4 with codewords mapped onto QPSK
modulation, where the transmitted symbol s_k ∈ {s_k^m =
e^{j((π/2)m + π/4)}}, m ∈ {0, 1, 2, 3}.
The remainder of the paper is organized as follows. In
Section 2, we discuss the channel model considered for our
simulations. In Section 3, we address the effects of phase
ambiguity on the check nodes and discuss the construction
of global and local check nodes. In Section 4, we explain
the code construction and present a matrix inversion technique
to obtain the generator matrix. In Section 5, we explain
the receiver architecture and detail the iterative decoding
for the convergence of these codes. We also show the
additional computational complexity required due to the
phase estimation process. In Section 6, we discuss the BER
performance of the proposed receiver under phase noise
conditions using the code constructed over Z_4 for the QPSK
signal set. In Section 7, we discuss the benefits of the blind
phase estimator in reducing the computational complexity
involved with the turbo phase estimation and also show the
BER performance of the low-complexity iterative receiver
with the Z_4 code under phase noise conditions. We conclude
in Section 8 by summarizing the results of this paper.
2. CHANNEL MODEL
An information sequence is encoded by an (N, K) nonbinary
LDPC code constructed over an integer ring Z_M, where N
and K represent the length and dimension of the code, and
Z_M denotes the integers {0, 1, 2, ..., M − 1} under addition
modulo M. The alphabet of Z_M is mapped
onto complex symbols s using phase shift keying (PSK)
modulation with M phase symmetries. The complex symbols
are transmitted over a channel affected by carrier phase
disturbance and complex additive white Gaussian noise.
Ideal timing and frame synchronization are assumed
and, henceforth, all the simulations assume one sample per
symbol. At the receiver, after matched filtering and ideal
sampling, we have

    r_k = s_k e^{jθ_k} + n_k,   k = 0, 1, ..., N_s − 1,   (2)

where s_k, r_k, θ_k, and n_k are the kth components of the vectors
s, r, θ, and n, each of length N_s. The noise samples
n_k contain uncorrelated real and imaginary parts with zero
mean and two-sided power spectral density (PSD) of N_0/2.
S. Karuppasami and W. G. Cowley 3
The phase noise process θ_k is generated using the Wiener
(random-walk) model described by

    θ_k = θ_{k−1} + Δ_k,   k = 1, 2, ..., N_s − 1,   (3)

where Δ_k is a white real Gaussian process with a standard
deviation of σ_Δ, and θ_0 is drawn from a uniform distribution
over (−π, π).
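The channel of (2)-(3) can be sketched as follows. This is a minimal simulation sketch for illustration only; the function and variable names, and the E_s = 1 normalization, are our assumptions rather than details given in the paper:

```python
import numpy as np

def wiener_phase(n_symbols, sigma_delta_deg, rng):
    """Wiener (random-walk) phase of (3): theta_k = theta_{k-1} + Delta_k."""
    theta0 = rng.uniform(-np.pi, np.pi)                    # theta_0 ~ U(-pi, pi)
    steps = rng.normal(0.0, np.radians(sigma_delta_deg), n_symbols - 1)
    return theta0 + np.concatenate(([0.0], np.cumsum(steps)))

def channel(symbols, theta, ebn0_db, rate, bits_per_symbol, rng):
    """Rotate by the phase process and add complex AWGN as in (2), Es = 1."""
    n0 = 1.0 / (rate * bits_per_symbol * 10 ** (ebn0_db / 10))
    noise = np.sqrt(n0 / 2) * (rng.standard_normal(symbols.size)
                               + 1j * rng.standard_normal(symbols.size))
    return symbols * np.exp(1j * theta) + noise

rng = np.random.default_rng(0)
m = rng.integers(0, 4, 3000)                               # Z_4 code symbols
s = np.exp(1j * (np.pi / 2 * m + np.pi / 4))               # QPSK mapping s_k^m
theta = wiener_phase(s.size, 2.0, rng)                     # sigma_Delta = 2 deg/symbol
r = channel(s, theta, 2.0, 0.5, 2, rng)                    # Eb/N0 = 2 dB, R = 0.5
```

The same `r` vector, split into subblocks, is what the approximate model (4) operates on.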
Let us divide the received symbol vector r of length
N_s into B subblocks of length N_b. Assuming small phase
variations over adjacent symbols, we may approximate the
phase variation on the symbols in the lth subblock by a
mean phase offset θ̄_l ∈ (−π, π). Similar to (2), the received
sequence can be expressed as

    r_{k′} = s_{k′} e^{jθ̄_l} + n_{k′},   l = 0, 1, ..., B − 1,   (4)

where k′ = N_b l + k, k = 0, 1, ..., N_b − 1. While the
channel model in (2) is used in our simulations, we use
the approximate model in (4) for the code construction
and receiver-side processing. The approximate phase offset
over the lth subblock, θ̄_l ∈ (−π, π), can be represented as the
sum of an ambiguous phase offset φ_l ∈ (−π/M, π/M)
and the phase ambiguity α_l ∈ {0, 2π/M, 4π/M, ..., 2(M − 1)π/M}.
The proposed receiver tackles modest to high levels of
phase noise. For instance, the phase noise considered in this
paper (Wiener model with σ_Δ of 1° and 2°) is several times
larger than the phase noise mentioned in the European Space
Agency model (Wiener model with σ_Δ = 0.3° per symbol
[2]). However, due to the assumptions made in (4), the
proposed receiver will not be able to tackle large amounts
of phase noise, such as the Wiener model with σ_Δ = 6° per
symbol in [2, 3].
3. EFFECT OF PHASE AMBIGUITIES ON THE CHECK NODES
In this section, we address the effect of phase rotations that
are multiples of 2π/M radians on the global and local check
nodes of an LDPC code constructed over Z_M. Let H_{i,j} be the
elements of the parity-check matrix participating in the ith
check node such that

    ∑_{j=1}^{d_c} H_{i,j} x_j = 0 (mod M),   (5)

where d_c is the degree of the check node, x_j is the jth symbol
participating in the ith check node, and the value of H_{i,j} is
chosen from the nonzero elements of Z_M. In the remaining
subsections, we denote the degrees of the GCN and LCN as
d_c^G and d_c^L, respectively.
3.1. Global check nodes
Unlike local check nodes, the edges of a GCN are spread
across many subblocks. Let p be the number of global check
node edges connected to symbols present within one subblock,
and suppose all symbols in that subblock are rotated by 2πt/M
radians, where t ∈ {0, 1, ..., M − 1}. As a result, the check
equation in (5) becomes

    ∑_{j=1}^{p} H_{i,j}(x_j + t) + ∑_{j=p+1}^{d_c^G} H_{i,j} x_j
        = ∑_{j=1}^{p} H_{i,j} t + ∑_{j=1}^{d_c^G} H_{i,j} x_j
        = t ∑_{j=1}^{p} H_{i,j}.   (6)

Thus, for an arbitrary integer t, (6) becomes zero only if

    ∑_{j=1}^{p} H_{i,j} = 0 (mod M).   (7)

In the case of a binary LDPC code, p should be even in
order to satisfy (7). For LDPC codes over higher-order rings,
p can be either odd or even depending on the values of H_{i,j}.
In this work, we select the values of H_{i,j} from the set of
nonzero divisors of Z_M ({1, 3} in the case of Z_4) to avoid problems
during matrix inversion. As a result, p becomes even in the
case of LDPC codes over integer rings, which in turn makes
d_c^G even as well.
Example 1. Assume an LDPC code constructed over Z_4 with
B = 4 subblocks. Consider a degree-8 GCN whose edges
are connected to two symbols per subblock (p = 2), and let the
corresponding edge gains be g = [1, 3, 1, 3, 3, 1, 1, 3]. One set
of symbols that satisfies this check is x = [3, 2, 3, 1, 1, 3, 0, 1].
Let us assume that subblocks one and four are rotated by π/2
and π radians, respectively. The subblock-rotated
version of x is then x_r = [0, 3, 3, 1, 1, 3, 2, 3]. It can be seen that
x_r still satisfies the parity-check equation with the same g.
Note that each subblock has one edge with value "1" and
another with "3," whose sum is 0 (mod 4) as required by (7).
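The invariance claimed in Example 1 can be checked numerically. The short sketch below (our illustration) evaluates the ring check equation (5) before and after the subblock rotations, using the gains and symbols from the example:

```python
# Example 1: degree-8 GCN over Z_4, p = 2 edges per subblock, B = 4 subblocks.
M = 4
g = [1, 3, 1, 3, 3, 1, 1, 3]          # edge gains: one "1" and one "3" per subblock
x = [3, 2, 3, 1, 1, 3, 0, 1]          # symbols satisfying the check

def check(gains, symbols, M):
    """Evaluate the ring check equation (5): sum of H_ij * x_j mod M."""
    return sum(a * b for a, b in zip(gains, symbols)) % M

# Rotate subblock 1 by pi/2 (t = 1) and subblock 4 by pi (t = 2).
t = [1, 0, 0, 2]                       # per-subblock rotation in multiples of 2*pi/M
x_r = [(xj + t[j // 2]) % M for j, xj in enumerate(x)]

assert check(g, x, M) == 0             # original symbols satisfy the check
assert x_r == [0, 3, 3, 1, 1, 3, 2, 3] # matches the rotated vector in Example 1
assert check(g, x_r, M) == 0           # GCN still satisfied: per-subblock gains sum to 0 mod 4
```

The last assertion holds for any per-subblock rotation precisely because 1 + 3 = 0 (mod 4), which is condition (7).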
3.2. Local check nodes
Local check nodes resolve the phase ambiguity present in a
subblock. Let the elements H_{i,j} participating in check i be
selected from a single subblock such that

    ∑_{j=1}^{d_c^L} H_{i,j} ≠ 0 (mod 2).   (8)

Alternatively, (8) states that the element ∑_{j=1}^{d_c^L} H_{i,j} is
chosen from the set of nonzero divisors of Z_M, which
is achieved by performing the summation modulo 2
rather than modulo M. If modulo M is used, the check node will not
resolve certain phase ambiguities, as explained below.
If all the symbols x_j participating in the ith local check node
are rotated by 2πt/M radians, then using (5) and (8), we can
show that for every t there exists a distinct residue (mod M)
which provides a solution for the phase ambiguity present on
the participating symbols x_j. With all the operations
below taken modulo M,

    ∑_{j=1}^{d_c^L} H_{i,j}(x_j + t)
        = ∑_{j=1}^{d_c^L} H_{i,j} x_j + ∑_{j=1}^{d_c^L} H_{i,j} t
        = t ∑_{j=1}^{d_c^L} H_{i,j}.   (9)
Hence t can be written as

    t = [∑_{j=1}^{d_c^L} H_{i,j}(x_j + t)] × [∑_{j=1}^{d_c^L} H_{i,j}]^{−1} (mod M).   (10)
In the case where ∑_{j=1}^{d_c^L} H_{i,j} does not have a multiplicative
inverse in Z_M (say ∑_{j=1}^{d_c^L} H_{i,j} equals a zero divisor), then (9)
is satisfied for any t ∈ {zero divisors in Z_M} and hence
the phase ambiguity estimate is not unique. Thus, choosing
∑_{j=1}^{d_c^L} H_{i,j} with a multiplicative inverse in Z_M ensures phase
ambiguity resolution. Further, by selecting the edge gains of
the LCN from the nonzero divisors of Z_M, which are odd
integers less than M, we require an odd number of edges to
satisfy (8). Hence the degree of the local check node, d_c^L, is
always taken to be odd in this work.
Example 2. Let us consider the code and rotations as in
Example 1. Let the code include a degree-3 LCN whose edges,
with gains [1, 3, 1], are connected to the first subblock. A set
of symbols that satisfies this check is x = [3, 0, 1]. Due to the
rotation of π/2 radians in the first subblock, x_r = [0, 1, 2].
Using (10), we can evaluate that t = 1, which corresponds to
π/2 radians.
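Equation (10) and Example 2 can be reproduced directly. The helper below (our naming) computes the syndrome of the rotated hard decisions and multiplies by the inverse of the gain sum in Z_4:

```python
M = 4
gains = [1, 3, 1]                      # LCN edge gains (units of Z_4); sum = 5 ≡ 1 (mod 4)
x_r = [0, 1, 2]                        # hard decisions after a pi/2 rotation of the subblock

def resolve_ambiguity(gains, x_r, M):
    """Recover t via (10): t = [sum g_j (x_j + t)] * [sum g_j]^(-1) mod M."""
    syndrome = sum(g * x for g, x in zip(gains, x_r)) % M
    s = sum(gains) % M
    inv = pow(s, -1, M)                # inverse exists because s is a unit of Z_M
    return (syndrome * inv) % M

t = resolve_ambiguity(gains, x_r, M)
assert t == 1                          # t = 1 corresponds to a 2*pi/4 = pi/2 rotation
```

Had the gain sum been a zero divisor (e.g., 2 in Z_4), `pow(s, -1, M)` would raise an error, mirroring the non-uniqueness argument above.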
4. NONBINARY CODE CONSTRUCTION
We apply the above set of principles in constructing codes
that are beneficial in dealing with phase noise channels.
Similar to [11], we construct a binary code and choose the
nonzero divisors of Z_M as edge gains such that the check
conditions described in Section 3 are satisfied.
4.1. Code construction
Following Section 2, let us say we have B subblocks of
length N_b. A binary parity-check matrix H of size (N − K) × N
is constructed such that it involves two parts:

    H = [ H_resolving ; H_converging ].   (11)

The upper (B × N) part of the matrix, called H_resolving,
involves B local check nodes, in contrast to the N_b/d_c LCNs of
our previous method [8], which are used to resolve the
phase ambiguity in the B subblocks. The lower ((N − K − B) × N)
part of the matrix, called H_converging, contains N − K − B
check nodes whose neighbours are selected such that their
convergence is independent of the phase ambiguities in the
subblocks. We assume the degrees of all the local (global)
check nodes to be equal to d_c^L (d_c^G). The codes are designed
to be check biregular (i.e., with two different check degrees, d_c^L and
d_c^G). However, there is no constraint on the variable node
degree.
We construct the code as per the following procedure.
(1) Construction of local check nodes: the edges of each local
check node are connected to the first d_c^L symbols of the
subblock for which it resolves the phase ambiguity.
For example, assuming d_c^L = 3, let H_{i,j} = 1 where
j corresponds to the first 3 columns of each subblock.
However, we can arbitrarily choose the set of d_c^L
symbols from any part of the subblock.
(2) Construction of global check nodes: for every symbol,
the parity checks in which the symbol participates
are randomly chosen based on its degree and (7). As
in Example 1, every global check node participates in
only two symbols from a subblock. Care was taken to
avoid short cycles after constructing every column.
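The two-step procedure above can be sketched as follows. This is a simplified illustration under our own assumptions (fixed p = 2 with gains 1 and 3 per subblock, random edge placement, and no cycle-avoidance, which the authors' construction does include):

```python
import random

def build_h(B, Nb, dcL, gcn_rows, M=4, seed=1):
    """Sketch of the two-part construction (11): B local check rows
    followed by global check rows with p = 2 edges per subblock."""
    rng = random.Random(seed)
    N = B * Nb
    H = []
    # (1) Local check nodes: gains on the first dcL symbols of each subblock,
    #     drawn from the units {1, 3} so the gain sum is odd (condition (8)).
    for b in range(B):
        row = [0] * N
        for j in range(dcL):
            row[b * Nb + j] = rng.choice([1, 3])
        H.append(row)
    # (2) Global check nodes: two edges per subblock with gains 1 and 3,
    #     so each subblock contributes 1 + 3 = 0 (mod 4), satisfying (7).
    for _ in range(gcn_rows):
        row = [0] * N
        for b in range(B):
            i, j = rng.sample(range(Nb), 2)
            row[b * Nb + i], row[b * Nb + j] = 1, 3
        H.append(row)
    return H

H = build_h(B=4, Nb=6, dcL=3, gcn_rows=4)
# Every global row satisfies (7) within each subblock:
for row in H[4:]:
    for b in range(4):
        assert sum(row[b * 6:(b + 1) * 6]) % 4 == 0
# Every local row has an odd gain sum (condition (8)):
for row in H[:4]:
    assert sum(row) % 2 == 1
```

The assertions confirm that, by construction, rotations by multiples of 2π/M leave every global check satisfied while each local check retains a unit gain sum.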
To illustrate the local and global check nodes, a small
parity-check matrix H is shown in (12). The first four rows,
corresponding to the local check nodes (H_resolving), are shown
at the top. The two rows below the local check nodes are
connected globally and also have p = 2 edges connected
to the symbols of each subblock. The restriction of two edges
per subblock provides better connectivity in the code.
The same technique is continued to construct the remaining
global check nodes in the dotted part of the matrix. The local
and global check nodes shown in the first and fifth rows
of the H matrix are used in the previous examples. A portion
of the Tanner graph of the H matrix in (12) is shown in
Figure 1. Local check nodes (shaded checks) and their edges
(solid lines) are distinguished from the global check nodes
and their edges (dash-dotted lines):
H =
⎡131000000000000000000000⎤
⎢000000111000000000000000⎥
⎢000000000000333000000000⎥
⎢000000000000000000313000⎥
⎢························⎥
⎢100300010300030100001030⎥
⎢010300031000003100130000⎥
⎣           ⋮            ⎦ .   (12)
4.2. Some comments on encoding
We used the Gaussian elimination (GE) approach to obtain
a systematic generator matrix. Even though the edge gains of
the parity-check matrix are nonzero divisors, we encountered
zero divisors ({2} in the case of Z_4) during GE in the
diagonal part of the matrix. To avoid this problem, we
interchanged columns across the parity-check matrix such
that we obtain a generator matrix G corresponding to
the column-interchanged parity-check matrix H′. Since
we wanted to use the original H matrix instead of H′,
we created a permutation table P to record the columns
that were interchanged during inversion. Alternative inversion
techniques may avoid the use of the permutation table P.
[Figure 1: Tanner graph of the H matrix in (12), illustrating local
and global check nodes, with B = 4 subblocks, LCNs of degree
d_c^L = 3 (shaded checks, solid edges), and GCNs of degree d_c^G = 8
with p = 2 edges per subblock (dash-dotted edges).]
A summary of the communication system used in the
simulations is given in Figure 2. The message is encoded by
the generator matrix G to produce the codeword c. The
codeword c undergoes inverse permutation to produce c′,
which is transmitted through the composite channel.
Since the permuted encoded symbols are codewords of
the original code H, the decoder decodes them directly. The
decoded codeword c′ is again permuted to give the original
codeword c.
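Why GE stalls on zero divisors can be seen from the unit structure of Z_4. The small sketch below is our illustration (not the authors' encoder) of which pivots are invertible and how a 2 can appear on the diagonal even when all edge gains are units:

```python
from math import gcd

def unit_inverse(a, M):
    """Return the multiplicative inverse of a in Z_M, or None when a is a
    zero divisor (gcd(a, M) > 1), in which case GE stalls on that pivot."""
    return pow(a, -1, M) if gcd(a % M, M) == 1 else None

# In Z_4 only 1 and 3 (the units) are invertible; 2 is a zero divisor.
assert unit_inverse(1, 4) == 1
assert unit_inverse(3, 4) == 3       # 3 * 3 = 9 ≡ 1 (mod 4)
assert unit_inverse(2, 4) is None

# Even with all edge gains drawn from {1, 3}, row operations can place a 2
# on the diagonal, e.g. (3 + 3) % 4 == 2; this is why a column interchange,
# recorded in the permutation table P, is needed to restore a unit pivot.
assert (3 + 3) % 4 == 2
```

This is exactly the situation the column-interchange bookkeeping in Section 4.2 is designed to work around.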
5. RECEIVER ARCHITECTURE AND ITERATIVE DECODING SCHEDULE
The receiver architecture to tackle large phase disturbances
is shown in Figure 3. We used the sum-product algorithm (SPA)
for LDPC codes over rings, similar to [12]. In the case of
an AWGN channel, the SPA may be applied over the
entire code for convergence. However, in the presence of
phase disturbances, phase estimators provide an ambiguous
phase estimate and hence the SPA is applied only over the
rotationally invariant part of the factor graph, that is, the
graph involving global check nodes only.
This section discusses the application of the SPA on the factor
graph of the code with a phase offset on every subblock, such
that the benefits of local and global check nodes are achieved.
Thus we split the decoding into three phases, as described
below.
(1) Converging phase.
(a) The likelihood vector, of length M, for the kth
variable node is initialized with the channel likelihoods,
p(r_k | s_k = s_k^m) = (1/(2πσ^2)) exp{−|r_k − s_k^m|^2/(2σ^2)},
where m = {0, 1, ..., M − 1}, k = {0, 1, ..., N_s − 1},
and σ^2 is the noise variance.
(b) The SPA is applied over the H_converging part of the
code alone. Local check nodes are not used; the
messages coming from these nodes are assigned
to be equiprobable.
(c) After every d iterations, the turbo phase estimator
(TPE) [9] estimates the phase offset φ̂_l, which
is given by

    φ̂_l = arg{ ∑_{k′} r_{k′} a*_{k′} },   (13)

where k′, as defined in (4), is the kth component
in the lth subblock and a*_{k′} is the complex
conjugate of the soft symbol estimate. The soft
symbol estimate a_{k′} of the symbol s_{k′} is given by

    a_{k′} = ∑_{m=0}^{M−1} s_{k′}^m p(s_{k′} = s_{k′}^m | r̄_{k′}),   (14)

where p(s_{k′} = s_{k′}^m | r̄_{k′}) is the a posteriori probability
that symbol s_{k′} = s_{k′}^m. The received
symbol vector corresponding to the lth subblock is
corrected using the turbo phase estimate φ̂_l.
(d) The likelihoods are recalculated from r̄ after
phase correction and are used to update the messages
that are passed on to the global check nodes.
(e) Steps (b)–(d) are repeated until all the global
check nodes are satisfied.
(2) Resolving phase.
(a) As the symbol a posteriori probabilities at the
variable nodes are good enough at the end
of the converging phase, a hard decision is taken
on the symbols, which corresponds to (x_j + t)
in (10). These hard decisions are used to
evaluate the subblock phase ambiguity estimates
α̂_l = 2πt̂/M using the local check nodes as in (10),
which are further used to correct the received
symbol values, giving r̄′. In general, the decoder
converges at the end of this stage.
(3) Final phase.
(a) If required, the SPA is continued over the entire
code, involving both H_resolving and H_converging, until
either the syndrome (H ĉ^T = 0) is satisfied or a
specified number of iterations is reached. Turbo
phase estimation or phase ambiguity resolution
is not required in this phase.
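Steps 1(c) and 1(d) of the converging phase can be sketched as follows. This is a minimal illustration under our own assumptions (QPSK with M = 4, a noiseless subblock, and one-hot a posteriori probabilities), in which case the TPE of (13) recovers the subblock offset exactly:

```python
import numpy as np

M = 4
constellation = np.exp(1j * (np.pi / 2 * np.arange(M) + np.pi / 4))  # s^m

def soft_symbols(app):
    """Soft symbol estimates of (14): a_k' = sum_m s^m p(s_k' = s^m | r_k').
    app has shape (n_symbols, M) with rows summing to one."""
    return app @ constellation

def turbo_phase_estimate(r_sub, a_sub):
    """Subblock TPE of (13): arg of the correlation of r with conj(a)."""
    return np.angle(np.sum(r_sub * np.conj(a_sub)))

rng = np.random.default_rng(3)
m = rng.integers(0, M, 100)             # one subblock of N_b = 100 symbols
s = constellation[m]
phi = 0.2                               # true residual offset, |phi| < pi/M
r = s * np.exp(1j * phi)                # noiseless for illustration

app = np.zeros((100, M))
app[np.arange(100), m] = 1.0            # one-hot a posteriori probabilities
a = soft_symbols(app)                   # equals s exactly in this case
phi_hat = turbo_phase_estimate(r, a)
assert abs(phi_hat - phi) < 1e-9        # subblock is then corrected by exp(-j * phi_hat)
```

In the real decoder, the a posteriori probabilities come from the SPA and are not one-hot, so the estimate improves gradually over iterations.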
[Figure 2: Communication system. The message is encoded by G
to produce c, inverse-permuted (P^{-1}) to c′, passed through the
mapper and channel model of (2), decoded by the LDPC receiver
of Figure 3, and permuted (P) back to c.]
[Figure 3: Proposed LDPC receiver architecture. The received
vector r is corrected per subblock by the turbo phase estimate
e^{−jφ̂_l} every d iterations; once all GCNs are satisfied, the phase
ambiguity resolver corrects each subblock by e^{−jα̂_l}.]
[Figure 4: Evolution of turbo phase estimates over subblocks
during convergence, for constant phase offsets θ = 0°, 15°, 30°,
60°, 75°, and 90° over 25 iterations.]
5.1. Comments on turbo phase estimation
In general, turbo phase estimation can provide a phase estimate
in the range (−π, π). However, during the converging
phase of this code, the decoder converges to a codeword
which is rotationally equivalent to the transmitted codeword.
Hence the turbo phase estimator provides a phase estimate
whose range lies within (−π/M, π/M). This is illustrated
in Figure 4, which shows the mean trajectories of the turbo
phase estimates over a subblock of 100 symbols at
E_b/N_0 = 2 dB under a constant phase offset θ.
5.2. Computational complexity
The computational complexity of the proposed LDPC
receiver can be evaluated as the sum of the complexities
of the LDPC decoder and the phase estimator/ambiguity
resolver. The computational complexity of the nonbinary
LDPC decoder is dominated by the check node decoder, with
O(M^2) operations. Reducing the computational complexity
of the nonbinary LDPC decoder is an active area of research
[13, 14]. In this paper, we concentrate only on the additional
complexity introduced in the receiver by the turbo phase
estimation in (13) and the ambiguity resolution.
Since the decoding algorithm works in the probability
domain, the a posteriori probabilities of the symbols p(s_{k′} =
s_{k′}^m | r̄_{k′}) are directly available from the decoder. Given
the a posteriori probability vector of length M for the
kth symbol, the soft symbol estimate of the symbol s_{k′}
can be calculated according to (14). To compute N_s soft
symbol estimates, we require 2(M − 1)N_s real additions and
2MN_s real multiplications. Given the soft symbol estimates,
the evaluation of the turbo phase estimates for B subblocks
requires an additional 4N_s real multiplications, 2(N_s − B)
real additions, and B lookup table (LUT) accesses for evaluating
the arg function. Correcting every symbol by the turbo
phase estimate requires 4 real multiplications and 2 real
additions. Thus, the total per-symbol, per-iteration complexity
of estimating and correcting the phase offset using a turbo
phase estimator (O_TPE) is given as

    O_TPE = [2M + 8]_× + [2M + 4 − 2B/N_s]_+ + [B/N_s]_LUT,   (15)

where [·]_×, [·]_+, and [·]_LUT correspond to the numbers of
real multiplications, real additions, and lookup table accesses,
respectively. The complexity involved in resolving the phase
ambiguity per symbol is very small. Also, phase ambiguity
resolution is required only once per decoding.
Thus, the additional complexity of the receiver, mainly
due to turbo phase estimation, is relatively small. In the case
of the LDPC code described in Section 6, the additional complexity
per symbol per iteration is approximately equivalent
to ([16]_× + [12]_+) operations, assuming d = 1.
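The per-symbol cost of (15) is straightforward to evaluate; the sketch below (our helper) reproduces the operation counts quoted for the code of Section 6:

```python
def tpe_complexity(M, B, Ns):
    """Per-symbol, per-iteration TPE cost per (15):
    (real multiplications, real additions, LUT accesses)."""
    mults = 2 * M + 8
    adds = 2 * M + 4 - 2 * B / Ns
    luts = B / Ns
    return mults, adds, luts

# Code of Section 6: Z_4 (M = 4), B = 30 subblocks, Ns = 3000 symbols.
mults, adds, luts = tpe_complexity(M=4, B=30, Ns=3000)
assert mults == 16                  # matches the [16]x figure in the text
assert round(adds) == 12            # matches [12]+; the 2B/Ns term (0.02) is negligible
```

With d = 10 (Section 7), these per-iteration costs are incurred only every tenth iteration.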
6. BER PERFORMANCE OF THE PROPOSED RECEIVER
We constructed a binary LDPC code with N = 3000, K =
1500, R = 0.5 for a subblock size of N_b = 100 symbols.
Through simulations, we found that the code with a subblock
size of 100 symbols gives the best BER performance for the
amounts of phase noise considered in this paper. The degree
distributions of this binary code were obtained through EXIT
charts [15] such that they converge at an E_b/N_0 of 1.3 dB.
The variable node and check node distributions, from a node
perspective, were λ(x) = 0.8047x^3 + 0.0067x^4 + 0.1887x^8
and ρ(x) = 0.02x^3 + 0.98x^8, respectively. The code
corresponds to B = 30 subblocks over the codeword.
We replaced the edge gains of this code with nonzero
divisors of Z_4 such that they follow the constraints discussed
in Section 3. Turbo phase estimation was done after every
iteration (d = 1), only during the converging phase.
Iterations are performed until the codeword converges, or
up to a maximum of 200 iterations. However, we found that,
on average, fewer than 40 iterations are required for
convergence in the waterfall region. Simulations are performed
either until 100 codeword errors are found or up to 500,000
transmissions.
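As a quick consistency check of the reported parameters (our sanity check, not part of the authors' procedure), the degree-distribution fractions should sum to one, up to the four-decimal rounding in the text, and the subblocks should tile the codeword:

```python
# Node-perspective degree distributions reported for the N = 3000 code.
lam = {3: 0.8047, 4: 0.0067, 8: 0.1887}      # variable node fractions by degree
rho = {3: 0.02, 8: 0.98}                     # check node fractions by degree

assert abs(sum(lam.values()) - 1.0) < 1e-3   # 1.0001: consistent up to rounding
assert abs(sum(rho.values()) - 1.0) < 1e-9   # exactly one
assert 30 * 100 == 3000                      # B subblocks of N_b symbols cover N
```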
Simulation results in Figure 5 show the performance of
the receiver of Figure 3 under phase noise conditions. For a
constant phase offset, there is a small degradation of around
0.3 dB from the coherent performance at a BER of 10^−5. This
loss is due to the proposed SPA scheduling on the code, which
does not include the local check nodes during the converging
phase, and to the degraded performance of the turbo phase
estimator with reduced subblock size. Thereafter, with a
small further loss, the code is able to tolerate phase noise
with σ_Δ = 2° per symbol.
7. LOWER COMPLEXITY ITERATIVE RECEIVER
In this section, we show that the computational complexity
involved with the turbo phase estimation can be reduced by
using a blind phase estimator just once, before the iterative
receiver proposed in Figure 3.
7.1. Comments on initial phase estimation
[Figure 5: Performance of the proposed receiver in Figure 3 with
QPSK and the Wiener phase model; BER versus E_b/N_0 (dB) for
AWGN and σ_Δ = 0°, 1°, and 2°.]
[Figure 6: Convergence improvement due to an initial blind phase
estimator; probability of decoder convergence versus number of
iterations for BPE + TPE (d = 10), TPE (d = 10), and TPE (d = 1).]
The performance of the LCN-based phase ambiguity resolution
(PAR) algorithm degrades with the amount of phase
offset present on the symbols participating in the
LCN. Hence, in our earlier work [8], we used a BPE to
provide a phase estimate for every subblock of symbols before
resolving the PAR using the local check nodes. However, in
the current work, we are able to delay the PAR on the
subblocks, since the code can converge with the phase-ambiguous
estimates obtained from the TPE alone. Hence
the proposed architecture does not require the use of a
blind phase estimator. However, by employing an initial
BPE for coarse phase estimation and correction of the subblocks,
the number of iterations required for convergence
can be reduced. Figure 6 illustrates the benefit of blind phase
estimation at E_b/N_0 = 2.1 dB with Wiener phase noise
of 1° standard deviation per symbol.
It also shows that the computational complexity due
to the TPE can be reduced, by approximately a factor of 10, by
using the BPE once before the iterative receiver and then
periodically using the turbo phase estimator.
[Figure 7: Performance of the low-complexity receiver discussed in
Section 7 under phase noise with σ_Δ = 2° per symbol; BER versus
E_b/N_0 (dB) for AWGN, TPE (d = 1), BPE + TPE (d = 10),
TPE (d = 10), and TPE (d = 1 until the 10th iteration, then d = 10).]
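The paper does not spell out the internals of its BPE. A common blind estimator with exactly this M-fold ambiguity is the M-th power (Viterbi-and-Viterbi style) estimator, sketched below as an assumption rather than the authors' exact method:

```python
import numpy as np

def vv_blind_phase(r_sub, M, rotation=np.pi / 4):
    """M-th power blind estimate over one subblock. Raising MPSK symbols
    to the M-th power removes the data; the known constellation rotation
    (pi/4 for the QPSK set used here) contributes e^{j*M*rotation}, which is
    divided out. The result is ambiguous by multiples of 2*pi/M -- exactly
    the ambiguity the local check nodes are designed to resolve."""
    z = np.sum(r_sub ** M) * np.exp(-1j * M * rotation)
    return np.angle(z) / M

rng = np.random.default_rng(7)
M = 4
m = rng.integers(0, M, 100)
s = np.exp(1j * (np.pi / 2 * m + np.pi / 4))
r = s * np.exp(1j * 0.15)                        # constant subblock offset, noiseless

est = vv_blind_phase(r, M)
assert abs(est - 0.15) < 1e-9                    # recovered within (-pi/M, pi/M)
assert abs(vv_blind_phase(r * np.exp(1j * np.pi / 2), M) - est) < 1e-9  # pi/2-ambiguous
```

The second assertion makes the ambiguity explicit: a π/2 rotation of the whole subblock leaves the estimate unchanged, which is why the LCN-based PAR stage is still needed after any such blind correction.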
7.2. BER performance
The code described in Section 6 was used to simulate the BER
performance of the iterative receiver with low computational
complexity. The blind phase estimator was used to estimate
and correct the phase disturbance present in each subblock
of the received symbol vector, following which the phase-corrected
symbol vector was fed into the iterative receiver
of Figure 3. During the converging phase, turbo phase
estimates were obtained once every 10 iterations (d = 10). At
σ_Δ = 2° per symbol, Figure 7 shows the advantage of a blind
phase estimator in terms of BER performance. The result
compares three distinct cases against the normal receiver, where
turbo phase estimation is performed in every iteration.
The presence of the blind phase estimator allows us to invoke
the turbo phase estimator only once every 10 iterations
with a small loss of 0.05 dB. However, without the blind phase
estimator, performing turbo phase estimation only once
every 10 iterations shows significant degradation. As shown,
the performance can be improved by including turbo phase
estimation for more iterations, particularly in the early stages of
decoding, during which the LDPC decoder provides a lot
of new information about the symbols.
8. CONCLUSION
In this paper, we addressed the problem of LDPC-code-based
iterative decoding under phase noise channels from
a code perspective. We proposed the construction of ring-based
codes for higher-order modulations that work well with
subblock phase estimation techniques of low complexity. The
code was constructed using the new constraints outlined in
Section 3, such that it not only converges under subblock
phase rotations, but also estimates them. We also showed, in
a generalized manner, the behaviour of ring-based check nodes
in the presence of phase ambiguity based on their edge gains.
As part of our future work, we are looking at ways
to construct codes without explicitly constructing local check
nodes for PAR. The subblock size used in the simulation
results shown earlier has not been optimized, and we believe
that the method can be extended to adjust the observation
interval and phase model depending on the amount of phase
noise.
ACKNOWLEDGMENTS
The authors wish to acknowledge helpful discussions with
Dr. Steven S. Pietrobon on this topic and also thank the
reviewers for their useful comments.
REFERENCES
[1] R. Gallager, "Low density parity-check codes," IEEE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.
[2] G. Colavolpe, A. Barbieri, and G. Caire, "Algorithms for iterative decoding in the presence of strong phase noise," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1748–1757, 2005.
[3] A. Barbieri, G. Colavolpe, and G. Caire, "Joint iterative detection and decoding in the presence of phase noise and frequency offset," IEEE Transactions on Communications, vol. 55, no. 1, pp. 171–179, 2007.
[4] I. Motedayen-Aval and A. Anastasopoulos, "Polynomial-complexity noncoherent symbol-by-symbol detection with application to adaptive iterative decoding of turbo-like codes," IEEE Transactions on Communications, vol. 51, no. 2, pp. 197–207, 2003.
[5] R. Nuriyev and A. Anastasopoulos, "Rotationally invariant and rotationally robust codes for the AWGN and the noncoherent channel," IEEE Transactions on Communications, vol. 51, no. 12, pp. 2001–2010, 2003.
[6] W. G. Cowley and M. S. C. Ho, "Transmission design for Doppler-varying channels," in Proceedings of the 7th Australian Communications Theory Workshop (AusCTW '06), pp. 110–113, Perth, Australia, February 2006.
[7] M. Franceschini, G. Ferrari, R. Raheli, and A. Curtoni, "Serial concatenation of LDPC codes and differential modulations," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1758–1768, 2005.
[8] S. Karuppasami and W. G. Cowley, "LDPC code-aided phase ambiguity resolution for QPSK signals affected by a frequency offset," in Proceedings of the 8th Australian Communications Theory Workshop (AusCTW '07), pp. 47–50, Adelaide, Australia, February 2007.
[9] N. Noels, V. Lottici, A. Dejonghe, et al., "A theoretical framework for soft-information-based synchronization in iterative (turbo) receivers," EURASIP Journal on Wireless Communications and Networking, vol. 2005, no. 2, pp. 117–129, 2005.
[10] S. Karuppasami, W. G. Cowley, and S. S. Pietrobon, "LDPC code construction and iterative receiver techniques for channels with phase noise," in Proceedings of the 67th IEEE Vehicular Technology Conference (VTC '08), Singapore, May 2008.
[11] D. Sridhara and T. E. Fuja, "LDPC codes over rings for PSK modulation," IEEE Transactions on Information Theory, vol. 51, no. 9, pp. 3209–3220, 2005.
[12] M. C. Davey and D. MacKay, "Low-density parity check codes over GF(q)," IEEE Communications Letters, vol. 2, no. 6, pp. 165–167, 1998.
[13] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Transactions on Communications, vol. 55, no. 4, pp. 633–643, 2007.
[14] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard, "Low-complexity, low-memory EMS algorithm for nonbinary LDPC codes," in Proceedings of IEEE International Conference on Communications (ICC '07), pp. 671–676, Glasgow, Scotland, UK, June 2007.
[15] S. ten Brink, G. Kramer, and A. Ashikhmin, "Design of low-density parity-check codes for modulation and detection," IEEE Transactions on Communications, vol. 52, no. 4, pp. 670–678, 2004.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 362897, 12 pages
doi:10.1155/2008/362897
Research Article
New Technique for Improving Performance of LDPC Codes in
the Presence of Trapping Sets
Esa Alghonaim,¹ Aiman El-Maleh,¹ and Mohamed Adnan Landolsi²
¹Computer Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Kingdom of Saudi Arabia
²Electrical Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Kingdom of Saudi Arabia
Correspondence should be addressed to Esa Alghonaim, esa.alg@gmail.com
Received 2 December 2007; Revised 18 February 2008; Accepted 21 April 2008
Recommended by Yonghui Li
Trapping sets are considered the primary factor degrading the performance of low-density parity-check (LDPC) codes in the
error-floor region. The effect of trapping sets on the performance of an LDPC code becomes worse as the code size decreases.
One approach to tackle this problem is to minimize trapping sets during LDPC code design. However, while trapping sets can
be reduced, their complete elimination is infeasible due to the presence of cycles in the underlying bipartite graph of the LDPC code.
In this work, we introduce a new technique based on trapping set neutralization to minimize the negative effect of trapping sets
under belief propagation (BP) decoding. Simulation results for random, progressive edge growth (PEG), and MacKay LDPC codes
demonstrate the effectiveness of the proposed technique. The hardware cost of the proposed technique is also shown to be minimal.
Copyright © 2008 Esa Alghonaim et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Forward error correcting (FEC) codes are an essential component of modern state-of-the-art digital communication
and storage systems. Indeed, in many recently developed standards, FEC codes play a crucial role in improving
the error performance of digital transmission over noisy and interference-impaired communication channels.
Low-density parity-check (LDPC) codes, originally introduced in [1], have recently been the subject of very
active research and are now widely considered one of the leading families of FEC codes. LDPC codes
demonstrate performance very close to the information-theoretic bounds predicted by Shannon theory, while at the
same time having the distinct advantage of low-complexity, near-optimal iterative decoding.
As with other codes decoded by iterative algorithms (such as turbo codes), LDPC codes can suffer
from undesirable error floors at increasing SNR levels, although these floors are relatively lower
than those encountered with turbo codes [2].
For LDPC codes, trapping sets [2–4] have been identified as one of the main causes of error floors
at high SNR values. The analysis of trapping sets and their
impact on LDPC codes has been addressed in [3, 5–9]. The
main approaches for mitigating the impact of trapping sets
on LDPC codes are based either on introducing algorithms
that minimize their presence during code design, as in [5, 7, 9],
or on enhancing decoder performance in the presence of
trapping sets, as in [3, 6, 8]. The main disadvantage of the
first approach, in addition to placing tight constraints on
code design, is that trapping sets cannot be totally eliminated,
owing to the unavoidable existence of cycles
in the underlying bipartite Tanner graphs, especially for
relatively short block length codes (the focus of this
work). In addition, LDPC codes designed to reduce trapping
sets may exhibit large interconnect complexity, increasing the
hardware implementation overhead. The second approach is
therefore more applicable for our purpose
and is the basis of the contributions presented in this
paper.
In order to enhance decoder performance in the presence
of (unavoidable) trapping sets, an algorithm is introduced
in [3] based on flipping the hard-decoded bits in trapping
sets. First, trapping sets are identified through BP decoding
simulation and stored in a lookup table. Whenever
the decoder fails, it uses the lookup table, indexed
by the unsatisfied parity checks, to determine whether a known
failure has occurred. If a match occurs, the decoder simply
flips the hard-decision values of the trapped bits. This approach
suffers from the following disadvantages: (1) the decoder has
to specify exactly the trapping set's variable nodes in order
to flip them; (2) extra time is needed to search the lookup
table for a trapping set; (3) the technique is not amenable to
practical hardware implementation.
In [6, 8], the concept of averaging partial results is used to
overcome the negative effect of trapping sets in the error-floor
region. The variable node message update of the conventional
BP decoder is modified to make it less sensitive
to oscillations in the messages received from the check nodes:
the variable node equation becomes the average of
the current and previous signal values received from the check
nodes. While this approach is effective in handling oscillating
error patterns, it does not improve decoder performance for
constant error patterns.
In this paper, we propose a novel approach for enhancing
decoder performance in the presence of trapping sets by
introducing a new concept called trapping set neutralization.
The effect of a trapping set can be eliminated by setting the
intrinsic and extrinsic values of its variable nodes to zero, that
is, by neutralizing them. After a trapping set is neutralized,
the estimated values of its variable nodes are affected only by
external messages from nodes outside the trapping set.
The most harmful trapping sets are identified by means of
simulation. To neutralize the identified trapping sets,
a simple algorithm is introduced that stores trapping set
configuration information in the variable and check nodes.
The remainder of this paper is organized as follows.
Section 2 gives an overview of LDPC codes and the BP
algorithm. Trapping set identification and neutralization are
introduced in Section 3. Section 4 presents the learning-based
trapping set neutralization algorithm. Experimental
results are given in Section 5, and Section 6 concludes the
paper.
2. OVERVIEW OF LDPC CODES
LDPC codes are a class of linear block codes that use a sparse,
random-like parity-check matrix H [1, 10]. The parity-check
matrix H represents the parity equations in linear form: any
codeword u satisfies u H^T = 0. Each
column of the matrix corresponds to a codeword bit, while each
row corresponds to a parity-check equation.
LDPC codes can also be represented by bipartite graphs,
usually called Tanner graphs, with two types of nodes:
variable nodes and check nodes, interconnected by an edge
whenever the corresponding codeword bit appears in the
corresponding parity-check equation, as shown in
Figure 1.
The properties of an (N, K) LDPC code specified by an
M × N H matrix can be summarized as follows.
– Block size: the number of columns (N) in the H matrix.
– Number of information bits: K = N − M.
– Rate: the ratio of the number of information bits to the block
size; it equals 1 − M/N, provided there are no linearly
dependent rows in the H matrix.
H =
[ 1 1 1 1 0 ]
[ 1 0 1 0 1 ]
[ 0 1 0 1 1 ]
Figure 1: The two representations of LDPC codes: graph form (variable
nodes v_1, ..., v_5 connected to check nodes c_1, c_2, c_3) and matrix form.
– Check node degree: the number of 1's in the corresponding
row of the H matrix. The degree of a check node c_j is
referred to as d(c_j).
– Variable node degree: the number of 1's in the corresponding
column of the H matrix. The degree of a
variable node v_i is referred to as d(v_i).
– Regularity: an LDPC code is said to be regular if
d(v_i) = p for 1 ≤ i ≤ N and d(c_j) = q for 1 ≤ j ≤ M.
In this case, the code is a (p, q) regular LDPC code.
Otherwise, the code is considered irregular.
– Code girth: the minimum cycle length in the Tanner
graph of the code.
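These parameters can be read directly off H. A minimal sketch (the function name is ours), applied here to the H matrix of Figure 1:

```python
def ldpc_properties(H):
    """Basic LDPC code parameters from a parity-check matrix H
    (list of 0/1 rows), assuming H has no linearly dependent rows."""
    M, N = len(H), len(H[0])
    chk_deg = [sum(row) for row in H]                       # d(c_j): ones per row
    var_deg = [sum(H[j][i] for j in range(M)) for i in range(N)]  # d(v_i): ones per column
    regular = len(set(var_deg)) == 1 and len(set(chk_deg)) == 1
    return {"N": N, "K": N - M, "rate": 1 - M / N,
            "check_degrees": chk_deg, "variable_degrees": var_deg,
            "regular": regular}
```

For the 3 × 5 matrix of Figure 1 this gives N = 5, K = 2, rate 0.4, check degrees (4, 3, 3), and all variable degrees 2, so the code is irregular.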
The iterative message-passing belief propagation (BP) algorithm
[1, 10] is commonly used for decoding LDPC codes
and is known to achieve optimum performance when the
underlying code graph is cycle-free. In the following, a
brief summary of the BP algorithm is given. Following
the notation and terminology used in [11], we define the
following.
(i) u_i: transmitted bit in a codeword, u_i ∈ {0, 1}.
(ii) x_i: transmitted channel symbol, with value
x_i = +1 when u_i = 0, and x_i = −1 when u_i = 1. (1)
(iii) y_i: received channel symbol, y_i = x_i + n_i, where n_i
is a zero-mean additive white Gaussian noise (AWGN)
random variable with variance σ².
(iv) For the jth row of the H matrix, the set of column
locations having 1's is R_j = {i : h_ji = 1}. The
set of column locations having 1's, excluding location
i, is R_j\i = {i′ : h_ji′ = 1} \ {i}.
(v) For the ith column of the H matrix, the set of row
locations having 1's is C_i = {j : h_ji = 1}. The
set of row locations having 1's, excluding location
j, is C_i\j = {j′ : h_j′i = 1} \ {j}.
Figure 2: (a) Variable-to-check message q_ij(b); (b) check-to-variable
message r_ji(b).
(vi) q_ij(b): message (extrinsic information) passed
from variable node v_i to check node c_j regarding
the probability that u_i = b, b ∈ {0, 1}, as shown in
Figure 2(a). It equals the probability that u_i = b given
the extrinsic information from all check nodes except
node c_j.
(vii) r_ji(b): message passed from check node c_j to
variable node v_i, which is the probability that the jth
check equation is satisfied given that bit u_i = b and
that the other bits have a separable (independent)
distribution given by {q_i′j}, i′ ≠ i, as shown in Figure 2(b).
(viii) Q_i(b): the probability that u_i = b, b ∈ {0, 1}.
(ix)
L(u_i) ≡ log [ Pr(x_i = +1 | y_i) / Pr(x_i = −1 | y_i) ]
       = log [ Pr(u_i = 0 | y_i) / Pr(u_i = 1 | y_i) ], (2)
where L(u_i) is usually referred to as the intrinsic
information for node v_i.
(x)
L(r_ji) ≡ log [ r_ji(0) / r_ji(1) ],   L(q_ij) ≡ log [ q_ij(0) / q_ij(1) ]. (3)
(xi)
L(Q_i) ≡ log [ Q_i(0) / Q_i(1) ]. (4)
The BP algorithm involves one initialization step and three
iterative steps, as shown below.

Initialization step

Set the initial value of each variable node signal as
L(q_ij) = L(u_i) = 2y_i/σ², where σ² is the variance of the noise in
the AWGN channel.
Iterative steps

The three iterative steps are as follows.

(i) Update the check nodes:
L(r_ji) = [ Π_{i′ ∈ R_j\i} α_i′j ] · φ( Σ_{i′ ∈ R_j\i} φ(β_i′j) ), (5)
where α_ij = sign(L(q_ij)), β_ij = |L(q_ij)|, and
φ(x) = −log[tanh(x/2)] = log[(e^x + 1)/(e^x − 1)]. (6)

(ii) Update the variable nodes:
L(q_ij) = L(u_i) + Σ_{j′ ∈ C_i\j} L(r_j′i). (7)

(iii) Compute the estimated variable node values:
L(Q_i) = L(u_i) + Σ_{j ∈ C_i} L(r_ji). (8)

Based on L(Q_i), the estimated value û_i of the received bit is
given by
û_i = 1 if L(Q_i) < 0, and û_i = 0 otherwise. (9)
During LDPC decoding, iterative steps (i) to (iii) are
repeated until one of the following two events occurs:
(i) the estimated vector û = (û_1, ..., û_N) satisfies the
check equations, that is, û H^T = 0;
(ii) the maximum number of iterations is reached.
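As a concrete illustration of equations (5)–(9), the following is a minimal log-domain sum-product decoder sketch in plain Python (dictionary-based messages; the function names are ours, and none of the optimizations a hardware decoder would use are attempted):

```python
import math

def phi(x):
    # phi(x) = -log(tanh(x/2)), Eq. (6); clamp to avoid log(0) / overflow
    x = min(max(x, 1e-12), 50.0)
    return -math.log(math.tanh(x / 2.0))

def bp_decode(H, y, sigma2, max_iter=50):
    """Log-domain sum-product BP following Eqs. (5)-(9).
    H: list of M parity-check rows (0/1), y: received symbols, sigma2: noise variance."""
    M, N = len(H), len(H[0])
    rows = [[i for i in range(N) if H[j][i]] for j in range(M)]    # R_j
    cols = [[j for j in range(M) if H[j][i]] for i in range(N)]    # C_i
    L_u = [2.0 * yi / sigma2 for yi in y]                          # intrinsic LLRs
    L_q = {(i, j): L_u[i] for j in range(M) for i in rows[j]}      # variable -> check
    L_r = {(j, i): 0.0 for j in range(M) for i in rows[j]}         # check -> variable
    for _ in range(max_iter):
        for j in range(M):                      # (i) check-node update, Eq. (5)
            for i in rows[j]:
                others = [L_q[(ip, j)] for ip in rows[j] if ip != i]
                sign = 1.0
                for v in others:
                    sign *= 1.0 if v >= 0 else -1.0
                L_r[(j, i)] = sign * phi(sum(phi(abs(v)) for v in others))
        for i in range(N):                      # (ii) variable-node update, Eq. (7)
            for j in cols[i]:
                L_q[(i, j)] = L_u[i] + sum(L_r[(jp, i)] for jp in cols[i] if jp != j)
        # (iii) a-posteriori LLRs and hard decisions, Eqs. (8)-(9)
        L_Q = [L_u[i] + sum(L_r[(j, i)] for j in cols[i]) for i in range(N)]
        u_hat = [1 if l < 0 else 0 for l in L_Q]
        if all(sum(H[j][i] * u_hat[i] for i in range(N)) % 2 == 0 for j in range(M)):
            return u_hat, True                  # syndrome u_hat * H^T = 0
    return u_hat, False
```

Applied to the code of Figure 1 with a noiselessly received codeword, the decoder converges in the first iteration.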
3. TRAPPING SETS
In BP decoding of LDPC codes, dominant decoding failures
are, in general, caused by a combination of multiple cycles
[4]. In [2], a combination of error bits that leads to a
decoder failure is defined as a trapping set. In [3], it is shown
that the dominant trapping sets are formed by combinations
of short cycles present in the bipartite graph.
In the following, we adopt the terminology and notation
for trapping sets originally introduced in [8]. Let
H be the parity-check matrix of an (N, K) LDPC code, and let
G(H) denote its corresponding Tanner graph.
Definition 1. A (z, w) trapping set T is a set of z variable
nodes for which the subgraph induced by the z variable nodes and
the check nodes directly connected to them contains
exactly w odd-degree check nodes.
The next example illustrates the behavior of trapping sets
and why they are harmful.
Example 2. Consider a regular (N, K) LDPC code with
degree (3, 6). Figure 3 shows a trapping set T(4, 2) in the code
graph. Assume that an all-zero codeword (u = 0) is sent
through an AWGN channel and all bits are received correctly
(i.e., have positive intrinsic values) except the 4 bits in the
trapping set T(4, 2); that is, L(u_i) < 0 for 1 ≤ i ≤ 4 and
L(u_i) > 0 for 4 < i ≤ N. (Recall that logic 0 is encoded as +1
and logic 1 as −1.)
Figure 3: Trapping set example of T(4, 2) (variable nodes v_1, ..., v_4
and check nodes c_1, ..., c_7).
Based on (8), the estimated value of a variable node is the
sum of its intrinsic information and the messages received from
its three neighboring check nodes. Therefore, the estimation
equation for each variable node contains four summation
terms: the intrinsic information and three information
messages. In this case, the estimated values of v_1 (and v_3)
will be incorrect, because all four summation terms
of the estimation equation are negative. For v_2 (and v_4),
three of the four summation terms in the estimation
equation have negative values; therefore, v_2 (and v_4) has a
high probability of being incorrectly estimated. In this case,
the decoder becomes trapped and will remain trapped
unless positive signals from c_1 and/or c_2 are strong enough
to change the polarities of the estimated values of v_2 and/or
v_4. This example illustrates a trapping set causing a constant
error pattern.
As a first step in investigating the effect of trapping sets on
LDPC code performance, extensive simulations of LDPC
codes over AWGN channels at various SNR values have
been performed. A frame is considered to be in error if the
maximum number of decoding iterations is reached without satisfying
the check equations, that is, the syndrome û H^T is nonzero.
Error frames are classified by observing the behavior of
the LDPC decoder at each decoding iteration: at the end of
each iteration, the bits in error are counted. On this basis, error
frames are classified into three patterns, described as follows.
(i) Constant error pattern: the bit error count
becomes constant after only a few decoding iterations.
(ii) Oscillating error pattern: the bit error count
follows a nearly periodic change between maximum
and minimum values. An important feature of this
error pattern is the high variation of the bit error count
as a function of the decoding iteration number.
(iii) Random-like error pattern: the bit error count
evolves randomly, characterized by a low variation range.
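A simple way to automate this classification from a per-iteration bit-error-count trace is sketched below. The variation thresholds are illustrative choices of ours, not values from the paper:

```python
def classify_error_pattern(err_counts, tail=20):
    """Heuristic classification of a per-iteration bit-error-count trace
    into the three pattern types above (thresholds are illustrative)."""
    tail_vals = err_counts[-tail:]          # look at the trace after settling
    lo, hi = min(tail_vals), max(tail_vals)
    if hi == lo:
        return "constant"                   # error count frozen: stuck in a trap
    if hi - lo > 0.5 * hi:
        return "oscillating"                # large periodic-looking swing
    return "random-like"                    # small, irregular variation
```

A real decoder would feed this with the error counts logged at the end of each iteration, as described above.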
Figure 4 shows one example of each of the three error patterns.
In a constant error pattern, the bit error count becomes
constant after several decoding iterations (10 iterations in the
Figure 4: Illustration of the three types of error patterns (number of
error bits versus decoding iteration, over 100 iterations).
Table 1: Percentages of error patterns in the error-floor region.
Code (size): Constant / Oscillating / Random-like
HE(1024,512): 59% / 38% / 3%
RND(1024,512): 95% / 4% / 1%
PEG(100,50): 90% / 5% / 5%
example of Figure 4). In this case, the decoder becomes stuck
due to the presence of a trapping set T(z, w): the number
of bits in error equals z, and all check nodes are satisfied
except w check nodes.
The major difference between a trapping set T(z, w)
causing a constant error pattern and a trapping set T(e, f)
causing the other patterns is the number of odd-degree check
nodes: based on extensive simulations, it is found that w ≤
f. This result can be interpreted as follows. If the variable
nodes of a trapping set are in error, only the odd-degree check
nodes send correct messages to the variable nodes of
the trapping set; therefore, as the number of odd-degree
check nodes decreases, the probability of breaking the trap
decreases. As an extreme example, a trapping set with no
odd-degree check nodes results in decoder convergence to
a codeword other than the transmitted one, and thus causes an
undetected decoder failure.
Table 1 shows the percentages of the three error
patterns for three LDPC codes, obtained by simulating the codes
in their error-floor regions. The first LDPC code, HE(1024,512)
[12], is constructed to be efficiently interconnected for
fully parallel hardware implementation. The RND(1024,512)
LDPC code is randomly constructed while avoiding cycles of size
4. The PEG(100,50) LDPC code is constructed using the PEG
algorithm [7], which maximizes the size of cycles in the code
graph. From Table 1, it is evident that constant error patterns
are significant in several LDPC codes, including short-length
codes.
This observation motivates the development of
a technique for enhancing decoder performance in the presence of
trapping sets of the constant-error-pattern type. For such
trapping sets, once a trap occurs, the
values of the check equations do not change in subsequent
iterations; a decoder trap can thus be detected from the check
equation results, and the unsatisfied check nodes are used to
reach the trapping set's variable nodes.
3.1. BP decoder trapping set detection

In order to eliminate the effect of trapping sets during
the iterations of the BP decoder, a mechanism is needed to
detect the presence of a trapping set. The proposed trapping
set detection technique is based on monitoring the state
of the check equation (syndrome) vector û H^T. At the end of each
decoding iteration, a new value of û H^T is computed.
If this value is nonzero and remains unchanged
(stable) for a predetermined number of iterations, a
decoder trap is detected. We call this number the stability
parameter d, and it is normally set to a small value; based
on experimental results, it is found that d = 3 is a good
choice. The implementation of trap detection is similar to the
implementation of valid-codeword detection, with some extra
logic in each check node. Figure 5 shows an implementation
of trapping set detection for a decoder with M check nodes.
The output s_i of a check node c_i is logic zero if the check
equation result is the same as in the previous iteration,
that is, if there is no change in the check
equation result. The output S is zero if no check equation changed
between the current and the previous iteration.
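The detection rule can be sketched in software as follows (an analogue of the circuit in Figure 5; the class name and interface are ours):

```python
class TrapDetector:
    """Flags a decoder trap when the syndrome is nonzero and has stayed
    unchanged for d consecutive iterations (stability parameter d)."""
    def __init__(self, d=3):
        self.d = d
        self.prev = None     # syndrome from the previous iteration
        self.stable = 0      # how many iterations it has stayed unchanged

    def update(self, syndrome):
        """Call once per BP iteration with the current syndrome vector;
        returns True when a trap is detected."""
        s = tuple(syndrome)
        if any(s) and s == self.prev:
            self.stable += 1
        else:
            self.stable = 0  # changed, or all-zero (valid codeword)
        self.prev = s
        return self.stable >= self.d
```

With d = 3, a syndrome that first freezes at iteration 10 triggers detection at iteration 13, matching the example discussed later in the text.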
3.2. Trapping set neutralization

In this section, we introduce a new technique to overcome
the detrimental effect of trapping sets during BP decoding.
To overcome the negative impact of a trapping set T(z, w),
the basic idea is to neutralize the z variable nodes in the
trapping set. Neutralizing a variable node involves setting
its intrinsic value and extrinsic message values to zero.
Specifically, neutralizing a variable node v_i involves the
following two steps:
(1) L(u_i) = 0;
(2) L(q_ij) = 0, 1 ≤ j ≤ d(v_i).
The neutralization concept is illustrated by the following
example.
Example 3. For the trapping set T(4, 2) in Example 2, it has
been shown that when all code bits are received correctly
except the T(4, 2) bits, the decoder fails to correct the codeword,
resulting in an error pattern of constant type.
Now consider neutralizing the trapping set's variable
nodes by setting their intrinsic and extrinsic values to zero. After
neutralization, the decoder converges to a valid codeword
within two iterations, as follows. In the first iteration after
neutralization, for v_2 and v_4, two extrinsic messages become
positive due to the positive messages from nodes c_1 and c_2,
which shifts the estimated values of v_2 and v_4 to the correct
positive values. For nodes v_1 and v_3, all extrinsic values are
zero and their estimated values remain zero. In the second
iteration after neutralization, for v_1 and v_3, two extrinsic
messages become positive due to the positive extrinsic messages
from nodes v_2 and v_4, which shifts the estimated values of v_1 and
v_3 to the correct positive values.
The proposed neutralization technique has three important
characteristics. (1) It is not necessary to determine
exactly the variable nodes in a trapping set, as required by the
trapping-set bit-flipping technique used in [3]; in the
previous example, if only 3 of the 4 trapping set variables
are neutralized, the decoder is still able to recover
from the trap. (2) If some nodes outside a trapping set are
neutralized (due to inexact identification of the trapping set),
their estimates are expected to recover quickly to correct values
thanks to correct messages from neighboring nodes, since most of the
extrinsic messages are correct in the error-floor region. (3)
Neutralization is performed during the BP decoding iterations as
soon as a trapping set is detected, which allows the decoder
to converge to a valid codeword within the allowed
maximum number of iterations.
As an example, for the near-constant error pattern in
Figure 4, a trap occurs at iteration 10 and is detected at
iteration 13 (assuming d = 3). In this case, the decoder
has plenty of time to neutralize the trapping set before
reaching the maximum of 100 iterations. In general, based on
our simulations, a decoder trap is detected during the early
decoding iterations.
4. BP DECODER WITH TRAPPING SET
NEUTRALIZATION BASED ON LEARNING
In this section, we introduce an algorithm to correct constant
error patterns (which cause error floors) in
LDPC BP decoding. The proposed algorithm involves two
parts: (1) a preprocessing phase called the learning phase and
(2) the actual decoding phase. The learning phase is an offline
computation process in which trapping sets are identified;
variable and check nodes are then configured according to
the identified trapping sets. In the actual decoding phase, the
proposed decoder runs as a standard BP decoder with the
ability to detect and neutralize trapping sets using the variable
and check node configuration information obtained during
the learning phase. When a trapping set is detected, the
decoder stops running BP iterations and switches to a
neutralization process in which the detected trapping set is
neutralized. Upon completion of the neutralization process,
the decoder resumes normal BP iterations. The
neutralization process involves forwarding messages between
the trapping set's check and variable nodes.
Before proceeding with the details of the proposed
decoder, we give an example of how variable and check
nodes are configured during the learning phase and how
this configuration is used to neutralize a trapping set during
actual decoding.
Figure 5: Decoder trap detection circuit (each check node c_i produces
a change indicator s_i; the indicators are combined into a global trap
detection signal S, monitored by a counter and control logic).
Figure 6: Tree structure for the trapping set T(4, 2) (link indices are
shown on the edges).
Example 4. Given the trapping set T(4, 2) of the previous
example, we show the following: (a) how the nodes of
this trapping set are configured; (b) how the neutralization
process is performed during the actual decoding phase.
(a) In the learning phase, the trapping set nodes {c_1, c_2,
c_3, c_4, c_5, c_6, c_7, v_1, v_2, v_3, v_4} are configured for neutralization.
First, a tree is built corresponding to the trapping set,
starting with the odd-degree check nodes as the first level of the
tree, as shown in Figure 6. The reason for starting from the odd-degree
check nodes is that they are the only gates leading
into a trapping set when the decoder is in a trap: when the
decoder is stuck due to a trapping set, all check nodes are
satisfied except the odd-degree check nodes of the trapping
set. Therefore, the odd-degree check nodes of a trapping set are
the keys to the neutralization process.
The degree-one check nodes of the trapping set (c_1 and
c_2 in this example) are configured to initiate messages to
their neighboring variable nodes requesting them to perform
neutralization; we call these messages neutralization initiation
messages. In Figure 6, arrows pointing out from a node
indicate that the node is configured to forward a neutralization
message to its neighbor. The task of neutralization
message forwarding in a trapping set is to deliver a neutralization
message to every variable node in the trapping set. In our
example, c_1 and c_2 are configured for neutralization message
initiation, while v_2, c_3, and c_6 are configured for neutralization
message forwarding. This configuration is enough
to forward neutralization messages to all variable nodes in
the trapping set. Another possible configuration is for c_1
and c_2 to be configured for neutralization message initiation
while v_4, c_4, and c_7 are configured for neutralization message
forwarding. Thus, in general, there is no need to configure all
trapping set nodes.
(b) Now, assume that the proposed decoder is running
BP iterations and falls into a trap due to T(4, 2). We
show how the preconfigured nodes neutralize the
trapping set T(4, 2). First, the decoder detects
a trap event, stops running BP iterations, and
switches to the neutralization process. The decoder runs the
neutralization process for a fixed number of cycles and then
resumes running BP iterations. In the first cycle of the
neutralization process, the unsatisfied check nodes initiate
neutralization messages according to the configuration stored
in them during the learning phase. Because the decoder
failure is due to T(4, 2), all check nodes in the decoder are
satisfied except the two check nodes c_1 and c_2; therefore,
only c_1 and c_2 initiate neutralization messages, to nodes
v_2 and v_4, respectively. In the second neutralization cycle,
variable nodes v_2 and v_4 receive the neutralization messages and
perform neutralization, and v_2 forwards the neutralization
message to c_3 and c_6. In the third neutralization cycle, c_3
and c_6 receive and forward the neutralization messages to
v_1 and v_3, respectively, which in turn perform neutralization
but do not forward further messages. After that, no
message forwarding is possible until the neutralization cycles
end. After the neutralization process, the decoder resumes
running BP iterations and converges to a
valid codeword within two iterations,
as previously shown in Example 3.
Before discussing the neutralization algorithm, we describe
the configuration parameters used in the variable and
check nodes, followed by an illustrative example. Each
variable node v_i is assigned a bit γ_i, and each check node c_j
is assigned a bit β_j^q and a word α_j^q for each of its links q. The
following is a description of these parameters.
γ_i: message forwarding configuration bit assigned to a
variable node v_i. When a variable node v_i receives a neutralization
message, it acts as follows: if γ_i = 1, then v_i forwards
the received neutralization message to all neighboring check
nodes except the one that sent the message; otherwise, it does
not forward the received message.
β_j^q: message initiation configuration bit assigned to a
link indexed q in a check node c_j, where 1 ≤ q ≤ d(c_j).
Inputs: LDPC code;
(γ_i, β_j^q, α_j^q): node configuration;
result of the check equation in each check node c_j;
nt_cycles: number of neutralization cycles.
Output: some variable nodes are neutralized.
1. For each check node c_j with an unsatisfied equation do:
   for 1 ≤ q ≤ d(c_j), if β_j^q = 1 then initiate
   a neutralization message through link q.
2. l = 1 // current neutralization cycle
3. While l ≤ nt_cycles do:
   For each variable node v_i that received a
   neutralization message do the following:
   – perform node neutralization on v_i;
   – if γ_i = 1, then forward the message to all neighbors.
   For every check node c_j that received a
   neutralization message through link p do the
   following:
   – for 1 ≤ q ≤ d(c_j), if the bit α_j^q(p) is set, then
   forward the message through link q.
   l = l + 1
Algorithm 1: Trapping set neutralization algorithm.
α_j^q: message forwarding configuration word assigned to
a link indexed q in a check node c_j, where 1 ≤ q ≤ d(c_j).
The size of α_j^q in bits equals d(c_j). If a check node c_j has to
forward a neutralization message received on the link indexed p
through the link indexed q, then α_j^q is configured by setting
bit p to 1, that is, α_j^q(p) = 1.
For example, if a degree-6 check node c_j has to forward a
neutralization message received on the link indexed 2 through
the link indexed 3, then α_j^3 is configured as (000010)_2, that is,
α_j^3(2) = 1.
The following example illustrates the variable and check
node configuration values for a given trapping set.
Example 5. Assume that the trapping set T(4, 2) in Figure 6
is identified in a regular (3, 6) LDPC code. Check node link
indices are indicated on the links; for example, at c_1, the (c_1, v_2)
link has index 5. The configuration for this trapping set is
shown in Table 2.
Algorithm 1 lists the proposed trapping set neutralization
algorithm. Since the decoder does not know how many
cycles are needed to neutralize a trapping set, it performs
neutralization and message forwarding cycles for a preset
number of cycles (nt_cycles). For example, two neutralization cycles
are needed to neutralize the trapping set shown in Figure 6.
The number of neutralization cycles is preset during the
learning phase to the maximum number of neutralization
cycles required over all trapping sets. Based on simulation
results, it is found that a small number of neutralization
cycles often suffices; for example, 5 neutralization cycles
are found sufficient to neutralize trapping sets of 20 variable
nodes.
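Algorithm 1 can be sketched in software as follows. For readability, this sketch identifies links by neighbor id rather than by the paper's link indices 1..d(c_j), compresses the variable-side and check-side forwarding of one cycle into a single loop iteration, and lets a variable forward to all neighboring checks including the sender (harmless when the sender has no forwarding entry). The names, data layout, and the test graph's exact connectivity are our assumptions:

```python
def neutralization_process(unsat_checks, var_nbrs, beta, gamma, alpha, nt_cycles=5):
    """Sketch of Algorithm 1.
    unsat_checks : check nodes whose equations are unsatisfied
    var_nbrs[v]  : check nodes adjacent to variable v
    beta[c]      : variables to which check c initiates a message (beta bits)
    gamma        : set of variables configured to forward messages (gamma bits)
    alpha[c]     : source variable -> destination variables forwarded by check c
    Returns the set of neutralized variable nodes."""
    # cycle 1: unsatisfied checks initiate messages per their beta configuration
    to_vars = {v for c in unsat_checks for v in beta.get(c, ())}
    neutralized = set()
    for _ in range(nt_cycles):
        to_checks = {}
        for v in to_vars:
            neutralized.add(v)              # zero v's intrinsic/extrinsic values
            if v in gamma:                  # gamma bit set: forward onward
                for c in var_nbrs[v]:
                    to_checks.setdefault(c, set()).add(v)
        # checks forward according to their alpha configuration
        to_vars = set()
        for c, sources in to_checks.items():
            for s in sources:
                to_vars |= set(alpha.get(c, {}).get(s, ()))
        if not to_vars:
            break
    return neutralized
```

Configured as in Example 4 (c_1 and c_2 initiate; v_2, c_3, c_6 forward), the process reaches and neutralizes all four variable nodes of T(4, 2).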
Inputs: LDPC code;
no_failures: number of processed decoder failures.
Output: TS_List.
1. TS_List = ∅, failures = 0
2. While failures ≤ no_failures do:
   u = 0, x = +1, y = x + n // transmit a codeword
   Decode y using the standard BP decoder.
   If û H^T = 0 then go to 2. // valid codeword
   failures = failures + 1
   Re-decode y, observing the trap detection indicator.
   If a decoder trap is not detected then go to 2.
   TS = list of variable nodes v_i in error (û_i = 1) and
   unsatisfied check nodes.
   If TS ∈ TS_List then increment the weight of TS;
   else add TS to TS_List and set its weight to 1.
Algorithm 2: Trapping set identification algorithm.
4.1. Trapping sets learning phase
The trapping sets learning phase involves two steps. First,
the trapping sets of a given LDPC code are identiﬁed.
Then, variable and check nodes are conﬁgured based on the
identiﬁed trapping sets.
4.1.1. Trapping sets identiﬁcation
Trapping sets can be identiﬁed based on two approaches.
(1) By performing decoding simulations and observing
decoder failures [2]. (2) By using graph search methods [3].
The ﬁrst approach is adopted in this work as it provides
information on the frequency of occurrence of each trapping
set, considered as its weight. This weight is computed based
on how many decoder failures occur due to that trapping
set and is used to measure its negative impact compared to
other trapping sets. The priority of conﬁguring nodes for a
trapping set is assigned according to its weight; more harmful
trapping sets are given higher conﬁguration priority.
Algorithm 2 lists the proposed trapping set identification algorithm. Decoding simulations of an all-zeros codeword over the AWGN channel are performed until a decoder failure is observed. Then, the received frame y that caused the decoding failure is identified, and the decoding iterations are redone while observing the trap detection indicator. If a trap is not detected, the decoding simulations continue in search of another decoder failure. If a trap is detected, the trapping set TS is identified as follows: the unsatisfied check nodes are taken as the odd-degree check nodes of TS, while the variable nodes with hard-decision errors (u_i = 1) are taken as its variable nodes. Finally, if the identified trapping set TS is already in the trapping sets list TS_List, its weight is incremented by one; otherwise, it is added to TS_List and its weight is set to one.
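The identification loop above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation; bp_decode and trap_detected are hypothetical stand-ins for the BP decoder and the trap detection indicator.

```python
import random

def syndrome_is_zero(H, u):
    """True if u satisfies every parity check (u*H^T = 0 over GF(2))."""
    return all(sum(row[i] * u[i] for i in range(len(u))) % 2 == 0 for row in H)

def identify_trapping_sets(H, no_failures, sigma, bp_decode, trap_detected):
    """Sketch of Algorithm 2: collect trapping sets and their weights.

    bp_decode(H, y) -> hard-decision vector u; trap_detected(H, y) -> bool.
    Both are assumed helpers standing in for the decoder machinery in the text.
    """
    ts_list = {}                      # trapping set -> weight (occurrence count)
    failures = 0
    n = len(H[0])
    while failures < no_failures:
        # all-zeros codeword; BPSK maps it to +1; AWGN with std dev sigma
        y = [1.0 + random.gauss(0.0, sigma) for _ in range(n)]
        u = bp_decode(H, y)
        if syndrome_is_zero(H, u):
            continue                  # valid codeword: not a decoder failure
        failures += 1
        if not trap_detected(H, y):
            continue                  # failure, but no detectable trap
        err_vars = frozenset(i for i, ui in enumerate(u) if ui == 1)
        unsat_checks = frozenset(
            j for j, row in enumerate(H)
            if sum(row[i] * u[i] for i in range(n)) % 2 == 1)
        ts = (err_vars, unsat_checks)
        ts_list[ts] = ts_list.get(ts, 0) + 1
    return ts_list
```

The weight of each set is simply its failure count, matching the text's use of occurrence frequency as a measure of harmfulness.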
8 EURASIP Journal on Wireless Communications and Networking
Table 2: Nodes configuration for T(4, 2).

Configuration        Meaning
β_1^5 = 1            c_1 initiates a message through link 5 (i.e., initiates a message to v_2).
β_2^3 = 1            c_2 initiates a message through link 3 (i.e., initiates a message to v_4).
γ_2 = 1              v_2 forwards incoming messages to all neighbors.
α_3^2 = (000001)_2   c_3 forwards incoming messages from link 1 to link 2 (i.e., from v_2 to v_1).
α_6^1 = (001000)_2   c_6 forwards incoming messages from link 4 to link 1 (i.e., from v_2 to v_3).
Inputs: TS_List, LDPC code of size (N, K)
Outputs: γ_i, 1 ≤ i ≤ N
         β_j^q, α_j^q, 1 ≤ j ≤ N − K, 1 ≤ q ≤ d(c_j)
1. γ_i = 0 for 1 ≤ i ≤ N
   β_j^q = 0 and α_j^q = 0 for 1 ≤ j ≤ N − K and 1 ≤ q ≤ d(c_j)
2. Sort TS_List by trapping set weight in descending order.
3. k = 1
4. While k ≤ size of TS_List do
     Update the configuration so that it includes TS_k
     Compute ω_j for 1 ≤ j ≤ k
     If ω_j ≤ T for 1 ≤ j ≤ k then accept the configuration update
     Else reject TS_k and reject the configuration update
     k = k + 1

Algorithm 3: Nodes configuration algorithm.
4.1.2. Nodes conﬁguration
The second step in the trapping sets learning phase is to
conﬁgure variable and check nodes in order for the decoder
to be able to neutralize identiﬁed trapping sets during
decoding iterations.
Before presenting the configuration algorithm, we examine the case in which two trapping sets share common nodes and its impact on the neutralization process. We then propose a solution to this problem, illustrated by the following example.
Example 6. Figure 7 shows partial nodes of two trapping sets TS_1 and TS_2 in a regular (3,6) LDPC code: {v_1, v_3, v_5} ⊂ TS_1 and {v_2, v_3, v_4} ⊂ TS_2, so v_3 is a common node between TS_1 and TS_2. The configuration values after configuring nodes for TS_1 and TS_2 are as follows:

α_1^3 = (000011)_2 (link 3 in c_1 forwards messages received from link 1 or link 2);
γ_3 = 1 (v_3 forwards messages to its neighbors);
α_2^2 = (000001)_2 (link 2 in c_2 forwards messages received from link 1);
α_3^2 = (000001)_2 (link 2 in c_3 forwards messages received from link 1).
Therefore, when the decoder performs a neutralization process due to TS_1, node v_4 will be neutralized although it does not belong to TS_1.

Figure 7: Example of common nodes between two trapping sets.

Similarly, performing a neutralization process due to TS_2 causes node v_5 (which does not belong to TS_2) to be neutralized. Fortunately, as mentioned in
Section 3.1, when the decoder is in a trap due to a trapping
set TS, the decoder converges to a valid codeword even if
some variable nodes outside TS have been unnecessarily neutralized. However, based on simulation results, neutralizing a
large number of variable nodes other than the desired nodes
leads to a decoder failure.
Having introduced the trapping sets common nodes problem, we now present the proposed solution. For each trapping set TS_j, define ω_j as the ratio of neutralized variable nodes outside the set TS_j to the total number of variable nodes N, and define T as the maximum allowed value of ω_j. The proposed solution is as follows: after configuring a trapping set TS_k, we compute ω_j for 1 ≤ j ≤ k. If ω_j ≤ T for 1 ≤ j ≤ k, the new configuration is accepted; otherwise, TS_k is rejected and the configuration is restored to its state before TS_k was configured.
Algorithm 3 lists the nodes configuration algorithm. Initially, the configurations of all variable and check nodes are set to zero (step 1), meaning that no node is allowed to initiate or forward a neutralization message. The sorting in step 2 is important to give more harmful trapping sets (those with greater weight) configuration priority over less harmful ones. Step 4 processes the trapping sets in TS_List one by one. For each trapping set TS_k, the nodes configuration is updated by setting the configuration parameters (γ_i, β_j^q, α_j^q) of the variable and check nodes in TS_k. Then, for each previously configured trapping set TS_j, 1 ≤ j ≤ k, we compute ω_j as follows: the check equations of all check nodes in the decoder are set as satisfied (i.e., assigned zero values) except the odd-degree check nodes in TS_j, and a neutralization process is performed as in Algorithm 1. The number of variable nodes neutralized outside the trapping set is divided by N (the code length) to obtain ω_j. If ω_j is less than or equal to the threshold T for all previously configured trapping sets, the new configuration is accepted; otherwise, TS_k is rejected (ignored) and the nodes configuration is restored to its state before the last update.

Esa Alghonaim et al. 9

Inputs: LDPC code,
        nodes configuration (γ_i, β_j^q, α_j^q),
        data received from the channel,
        max_iter: maximum number of iterations,
        nt_cycles: number of neutralization cycles
Output: decoded codeword
1. iter = 0, nt_done = 0
2. iter = iter + 1
3. Run a normal BP decoding iteration.
4. If u·H^T = 0 then stop // valid codeword
5. If iter = max_iter then stop // decoder failure
6. If a decoder trap is not detected then goto step 2
7. If (iter + nt_cycles < max_iter) and (nt_done = 0) then do:
   – Perform neutralization // Algorithm 1
   – iter = iter + nt_cycles
   – nt_done = 1
8. Goto step 2

Algorithm 4: The proposed learning-based decoder.
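The greedy accept/reject logic of Algorithm 3 can be sketched as follows. Here neutralization_ratio is a hypothetical helper standing in for the ω_j computation described above (running the Algorithm 1 neutralization with only the odd-degree checks of TS_j unsatisfied), and trapping sets are represented abstractly rather than as γ/β/α bit settings.

```python
def configure_nodes(ts_list, T, neutralization_ratio):
    """Sketch of Algorithm 3: greedily configure trapping sets by weight.

    ts_list: dict mapping a trapping set (any hashable token) to its weight.
    neutralization_ratio(config, ts): assumed helper returning omega for ts
    under the candidate configuration (fraction of variable nodes neutralized
    outside ts). T is the maximum allowed omega.
    """
    config = set()   # accepted trapping sets (abstracting the gamma/beta/alpha bits)
    # heavier (more harmful) trapping sets are configured first
    for ts in sorted(ts_list, key=lambda t: ts_list[t], reverse=True):
        candidate = config | {ts}
        # accept only if every configured set still neutralizes cleanly
        if all(neutralization_ratio(candidate, t) <= T for t in candidate):
            config = candidate        # accept the configuration update
        # otherwise ts is rejected and config is left as it was (restored)
    return config
```

Because rejection simply leaves config unchanged, the "restore to the previous state" step of the text is implicit in this sketch.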
4.2. The proposed learning-based decoder
The proposed learning-based decoder is listed in Algorithm 4. It is similar to the conventional BP decoding algorithm, with the addition of trapping set detection and neutralization. Note that if a trapping set is not detected during the decoding iterations, the proposed algorithm is identical to the conventional BP decoder. After each decoding iteration, the trap detection flag is checked (step 6). If a trap is detected, then
normal decoding iterations are paused, the decoder performs
a neutralization process based on Algorithm 1, the iteration
number is increased by the number of neutralization cycles
to compensate for the time spent in the neutralization
process, and ﬁnally the decoder resumes conventional BP
iterations. In step 7, before performing a neutralization
process, the decoder checks nt_done to make sure that no
neutralization process has been performed in the previous
iterations. This condition guarantees that the decoder will
not keep running into the same trap and perform the
neutralization process repeatedly. This may happen when a trap is re-detected before the decoder has escaped it. Upon trap detection, and before deciding to perform a neutralization process, the decoder must check another condition: enough decoding iterations must remain before the maximum is reached to complete a neutralization process (step 7). For example, consider a decoder with a maximum of 64 decoding iterations and 5 neutralization cycles. If a trapping set is detected at iteration 62, the decoder will not have enough time to complete the neutralization process.
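The control flow of Algorithm 4 can be sketched as follows; bp_iteration, is_valid, trap_detected, and neutralize are hypothetical stand-ins for one BP iteration, the syndrome check, the trap detection indicator, and the Algorithm 1 neutralization process, respectively.

```python
def learning_based_decode(y, max_iter, nt_cycles,
                          bp_iteration, is_valid, trap_detected, neutralize):
    """Sketch of Algorithm 4: BP decoding with one-shot trap neutralization.

    The helpers are assumed abstractions of the decoder machinery in the text.
    """
    state = y
    iters, nt_done = 0, 0
    while True:
        iters += 1
        state = bp_iteration(state)           # step 3: one BP iteration
        if is_valid(state):                   # step 4: valid codeword
            return state, True
        if iters >= max_iter:                 # step 5: decoder failure
            return state, False
        if not trap_detected(state):          # step 6: no trap, keep iterating
            continue
        # step 7: neutralize at most once, and only if enough iterations remain
        if iters + nt_cycles < max_iter and nt_done == 0:
            state = neutralize(state)         # Algorithm 1
            iters += nt_cycles                # account for neutralization time
            nt_done = 1
```

The nt_done flag implements the guarantee discussed above: the decoder never performs the neutralization process twice on the same frame.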
4.3. Hardware cost
The hardware cost for the proposed algorithm is considered
low. For trapping sets storage, we need to assign one bit for
each variable node (message forwarding bit). For each check
node c
i
, we need to assign one bit for message initiating and
one word of size d(c
i
) for message forwarding. Fortunately,
the communication links needed to forward neutralization
messages between check and variable nodes of the trapping
sets already exist as part of the BP decoding. Therefore,
no extra hardware cost is added for the communication
between trapping sets nodes. What is needed is a simple
control logic to decide to perform message initiation and
forwarding based on the stored forwarding information. The
decoder trap detection, shown in Figure 5, is implemented as
a logic tree similar to the tree of the valid codeword detection
implementation. The cost is low, as it mainly consists of a simple logic circuit within the check nodes, in addition to an OR-gate tree combining the logic outputs from the check nodes.
Using a simple multiplexer, valid code word detection logic
and trap detection logic can share most of their components.
It is worth emphasizing that it is not necessary to store
conﬁguration information for all variable and check nodes.
Only a subset included in the learned trapping sets is used,
which further reduces the required overhead.
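The storage estimate above (one forwarding bit per variable node, plus one initiation bit and a d(c_i)-bit forwarding word per check node) can be illustrated numerically; the regular (3,6) figures in the usage note below are an illustrative assumption, not taken from the paper.

```python
def config_storage_bits(N, check_degrees):
    """Configuration storage for the proposed decoder, as described above:
    one gamma forwarding bit per variable node, and, per check node c_i,
    one message-initiation bit plus a d(c_i)-bit forwarding word."""
    variable_bits = N                                # gamma_i bits
    check_bits = sum(1 + d for d in check_degrees)   # beta bit + alpha word
    return variable_bits + check_bits
```

For instance, for a regular (3,6) code with N = 1024 and 512 degree-6 check nodes, this gives 1024 + 512·7 = 4608 bits, i.e., under 600 bytes of configuration storage.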
5. EXPERIMENTAL RESULTS

To demonstrate the effectiveness of the proposed technique, extensive simulations have been performed on several LDPC code types and sizes over a BPSK-modulated AWGN channel. The maximum number of iterations is set to 64. Due to the CPU-intensive simulations required, especially at high SNR, a parallel computing simulation platform was developed to run the LDPC decoding simulations on 170 nodes of a departmental LAN [13].
The following is a brief description of the LDPC codes used in the simulations.
HE(1024,512): a nearly regular LDPC code of size (1024, 512) constructed to be interconnect-efficient for fully parallel hardware implementation [12].
RND(1024,512): a regular (3,6) LDPC code of size (1024, 512), randomly generated while avoiding cycles of size 4.
PEG(1024,512): an irregular LDPC code of size (1024, 512) generated by the PEG algorithm [7]. This algorithm maximizes graph cycle lengths and implicitly minimizes trapping sets of constant type.
Figure 8: Performance results for the RND(1024,512) LDPC code: frame error rate (FER) versus SNR (dB) for the conventional BP decoding algorithm, the average decoding algorithm, the proposed algorithm, and the proposed algorithm on top of average decoding.
PEG(100,50): similar to the previous code, but of size (100, 50).
MacKay(204,102): a regular LDPC code of size (204, 102) from MacKay's website [14], labeled 204.33.484.txt.
For each of the five codes, we compare the performance of the proposed algorithm with conventional BP decoding and with the average decoding algorithm proposed in [8]. The average decoding algorithm is a modified version of the BP algorithm in which messages are averaged over several decoding iterations in order to prevent sudden magnitude changes in the variable node messages. We also add a curve showing the performance of the proposed algorithm on top of the average decoding algorithm; this is identical to the proposed algorithm listed in Algorithm 4, except that in step 3 an average decoding iteration takes place instead of a normal BP decoding iteration. In the learning phase of each LDPC code, we set the trapping set detection parameter d to 3 and the threshold T to 10%.
Figure 8 shows the performance results for RND(1024, 512). It is evident that the proposed learning-based algorithm outperforms the average decoder in the error-floor region. In the low-SNR region, the average decoding algorithm is better than the proposed algorithm; the reason is the few occurrences of constant trapping sets at low SNR. As the SNR increases, constant error frames increase until they become dominant in the error-floor region. The proposed algorithm on top of average decoding shows the best results in all SNR regions, because it combines the advantages of the two algorithms, learning-based and average decoding, improving both constant and non-constant error patterns.
Figures 9 and 10 show the performance results for
the two LDPC codes, PEG(100,50) and PEG(1024,512).
Figure 9: Performance results for the PEG(100,50) LDPC code: frame error rate (FER) versus SNR (dB) for the conventional BP decoding algorithm, the average decoding algorithm, the proposed algorithm, and the proposed algorithm on top of average decoding.
Figure 10: Performance results for the PEG(1024,512) LDPC code (FER versus SNR for the same four decoding algorithms).
While there is a significant improvement for the proposed algorithm on PEG(100,50), there is almost no improvement on PEG(1024,512). The low improvement gain on PEG(1024,512) is due to the low percentage (not more than 8%) of trapping sets that cause constant error patterns. However, it is hard to implement PEG(1024,512) codes using fully parallel architectures. As can be seen from the PEG code construction algorithm [7], when a new connection is to be added to a variable node, the check node selected for the connection is the one at the farthest level of the tree originating
Figure 11: Performance results for the HE(1024,512) LDPC code (FER versus SNR for the same four decoding algorithms).
Table 3: Results after the learning phase of the HE(1024,512) LDPC code.

i    TS_i size   TS_i weight   ω_i
1    (8,2)       106           0%
2    (8,2)       49            0%
3    (12,2)      13            0%
4    (10,3)      9             1%
5    (8,3)       8             2%
6    (10,2)      7             0%
7    (7,3)       5             2%
8    (7,3)       5             3%
9    (7,3)       4             0%
10   (15,2)      3             0%
Table 4: Identified trapping sets and configuration percentages for different LDPC codes.

Code             #TS   %V       %C
HE(1024,512)     55    27.15%   13.46%
RND(1024,512)    50    18.46%   9.9%
PEG(1024,512)    8     6.74%    3.42%
PEG(100,50)      57    60%      31.67%
MacKay(204,102)  40    50%      27.94%
from the variable node. This results in interconnections even denser than those of pure random construction methods.
Figure 11 shows the performance of an interconnect-efficient LDPC code, HE(1024,512) [12], which has been implemented in a fully parallel hardware architecture. This LDPC code is designed to balance decoder throughput against error performance. The figure shows that the
Figure 12: Performance results for the MacKay(204,102) LDPC code (FER versus SNR for the same four decoding algorithms).
best performance is obtained using the proposed algorithm on top of the average decoding algorithm. The performance at 3.25 dB is not shown due to the excessive simulation time needed at that point.
Based on the results for all simulated codes, it is clearly demonstrated that applying the proposed algorithm on top of average decoding achieves significant performance improvements in comparison with conventional LDPC decoding. In particular, the performance improvements are most pronounced for LDPC codes with relatively low performance under the conventional LDPC decoder. This allows LDPC code design techniques to relax some of the design constraints and focus on reducing hardware complexity, for example by creating interconnect-efficient codes.
Table 3 lists some of the trapping sets identified during the learning phase of the HE(1024,512) LDPC code; the total number of identified trapping sets is 55. Note that the trapping sets with the highest weights have small numbers of variable and odd-degree check nodes. Table 4 shows the number of identified trapping sets and the percentage of check and variable nodes configured to forward neutralization messages. Clearly, only a subset of the variable and check nodes is configured, which further decreases the hardware cost.
6. CONCLUSION
In this paper, we have introduced a new technique to enhance the performance of LDPC decoders, especially in the error-floor region. The technique is based on identifying trapping sets with constant error patterns and reducing their negative impact by neutralizing them. In addition to enhancing performance, the proposed technique has a simple hardware architecture with reasonable overhead. Based on extensive
simulations on different LDPC code designs and sizes, it is shown that the proposed technique achieves significant performance improvements for (1) short LDPC codes and (2) LDPC codes designed under additional constraints, such as interconnect-efficient codes. It is also demonstrated that applying the proposed technique on top of average decoding achieves significant performance improvements over conventional LDPC decoding for all of the investigated codes. This makes LDPC codes even more attractive for adoption in various applications and enables the design of codes that optimize hardware implementation without compromising the required performance.
ACKNOWLEDGMENT
The authors would like to thank King Fahd University
of Petroleum & Minerals for supporting this work under
Project no. IN070376.
REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, Mass, USA, 1963.
[2] T. Richardson, “Error ﬂoors of LDPC codes,” in Proceedings
of The 41st Annual Allerton Conference on Communication,
Control, and Computing, Monticello, Ill, USA, October 2003.
[3] E. Cavus and B. Daneshrad, “A performance improvement
and error ﬂoor avoidance technique for belief propagation
decoding of LDPC codes,” in Proceedings of the 16th IEEE
International Symposium on Personal, Indoor and Mobile Radio
Communications (PIMRC ’05), vol. 4, pp. 2386–2390, Berlin,
Germany, September 2005.
[4] T. Tian, C. Jones, J. D. Villasenor, and R. D. Wesel, “Construction of irregular LDPC codes with low error floors,” in Proceedings of the IEEE International Conference on Communications (ICC ’03), vol. 5, pp. 3125–3129, Anchorage, Alaska, USA, May 2003.
[5] T. Tian, C. R. Jones, J. D. Villasenor, and R. D. Wesel, “Selective
avoidance of cycles in irregular LDPC code construction,”
IEEE Transactions on Communications, vol. 52, no. 8, pp. 1242–
1247, 2004.
[6] S. Gounai, T. Ohtsuki, and T. Kaneko, “Modified belief propagation decoding algorithm for low-density parity check code based on oscillation,” in Proceedings of the 63rd IEEE Vehicular Technology Conference (VTC ’06), vol. 3, pp. 1467–1471, Melbourne, Australia, May 2006.
[7] X.-Y. Hu, E. Eleftheriou, and D.-M. Arnold, “Progressive edge-growth Tanner graphs,” in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM ’01), vol. 2, pp. 995–1001, San Antonio, Tex, USA, November 2001.
[8] S. Ländner and O. Milenkovic, “Algorithmic and combinatorial analysis of trapping sets in structured LDPC codes,” in Proceedings of the IEEE International Conference on Wireless Networks, Communications and Mobile Computing (WirelessCom ’05), vol. 1, pp. 630–635, Maui, Hawaii, USA, June 2005.
[9] G. Richter and A. Hof, “On a construction method of irregular
LDPC codes without small stopping sets,” in Proceedings of the
IEEE International Conference on Communications (ICC ’06),
vol. 3, pp. 1119–1124, Istanbul, Turkey, June 2006.
[10] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, 1999.
[11] W. Ryan, “A Low-Density Parity-Check Code Tutorial, Part II—The Iterative Decoder,” Electrical and Computer Engineering Department, The University of Arizona, Tucson, Ariz, USA, April 2002.
[12] M. Mohiyuddin, A. Prakash, A. Aziz, and W. Wolf, “Synthesizing interconnect-efficient low density parity check codes,” in Proceedings of the 41st Annual Design Automation Conference (DAC ’04), pp. 488–491, San Diego, Calif, USA, June 2004.
[13] E. Alghonaim, A. El-Maleh, and M. Adnan Al-Andalusi, “Parallel computing platform for evaluating LDPC codes performance,” in Proceedings of the IEEE International Conference on Signal Processing and Communications (ICSPC ’07), pp. 157–160, Dubai, United Arab Emirates, November 2007.
[14] D. J. C. MacKay, online database of sparse graph codes, http://www.inference.phy.cam.ac.uk/mackay/codes/.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 852397, 9 pages
doi:10.1155/2008/852397
Research Article
Distributed Generalized LowDensity Codes for
Multiple Relay Cooperative Communications
Changcai Han and Weiling Wu
Key Laboratory of Information Processing and Intelligent Technology, Beijing University of Posts and Telecommunications,
Beijing 100876, China
Correspondence should be addressed to Changcai Han, cchan09@gmail.com
Received 1 November 2007; Revised 17 March 2008; Accepted 9 July 2008
Recommended by Yonghui Li
As a class of pseudorandom error-correcting codes, generalized low-density (GLD) codes exhibit excellent performance over both additive white Gaussian noise (AWGN) and Rayleigh fading channels. In this paper, distributed GLD codes are proposed for multiple-relay cooperative communications. Specifically, using the partial error detecting and error correcting capabilities of the GLD code, each relay node decodes and forwards some of the constituent codes of the GLD code to cooperatively form a distributed GLD code, which works effectively and keeps a fixed overall code rate when the number of relay nodes varies. Also, at the relay nodes, a progressive processing procedure is proposed to reduce complexity and adapt to source-relay channel variations. At the destination, the soft information from different paths is combined for the GLD decoder; thus, diversity gain and coding gain are achieved simultaneously. Simulation results verify that distributed GLD codes with various numbers of relay nodes obtain significant performance gains in quasi-static fading channels compared with the strategy without relays, and the performance is further improved when more relays are employed.
Copyright © 2008 C. Han and W. Wu. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Cooperative communications can increase achievable rates and decrease susceptibility to channel variations [1–3], and have potential practical applications in cellular systems, wireless ad hoc networks, and sensor networks. In cooperative communications, several relay protocols, such as amplify-and-forward (AF) [4], decode-and-forward (DF) [1], and coded cooperation [5, 6], have been proposed.
Based on these relay protocols, various coding strategies can be devised using rate-compatible punctured convolutional (RCPC) codes, product codes, or concatenated codes, and coding scheme design has become a hot topic in the literature [7–12]. Specifically, low-density parity-check (LDPC) codes [13] are employed for relay networks in [7–9], and distributed turbo codes are presented in [10–12].
Most of the cooperative strategies above are devised for the classical three-node relay channel model, that is, the network with only one relay. However, it has been theoretically shown that the diversity gain increases when more relays participate in cooperation [14]. Moreover, in wireless relay networks, the number of relays participating in cooperation may vary from time to time due to the random mobility of nodes [15]. Therefore, coding schemes should be easily adjustable when the relay number varies. Although distributed turbo codes can be extended to networks with different numbers of relays using multiple turbo codes [16], the overall code rate decreases as the relay number increases [11]. In this paper, based on the DF relay protocol, a novel coding scheme is proposed for cooperative relay networks using generalized low-density (GLD) codes [17–19], which has a fixed overall code rate.
GLD codes were first introduced by Tanner in [17] and were further investigated in [18–20]. GLD codes, which generalize Gallager's LDPC codes, are constructed by replacing the parity-check constraints in LDPC codes with block code constraints. Similar to LDPC codes, GLD codes can be iteratively decoded and exhibit excellent performance over both additive white Gaussian noise (AWGN) [18, 19] and Rayleigh fading channels [20].
Figure 1: Cooperative communication scenario (source s, destination d; phase 1 and phase 2 transmissions).

In the proposed scheme, each relay is only responsible for forwarding one or several constituent codes of the GLD code according to the number of available relays, using
the partial error detecting and error correcting capabilities
of GLD codes. Unlike distributed turbo codes in [11],
the overall code rate of distributed GLD codes is ﬁxed
when the relay number varies. Moreover, a progressive
processing strategy is proposed for relay nodes, which allows
partial decoding of the received codeword to reduce the
complexity in good source-relay channel conditions and
guarantees the robustness of the system in bad conditions.
At the destination, a combiner is added to collect the soft information from the different nodes for the GLD decoder, adding only a little complexity to the destination node. The significant performance of distributed GLD codes over quasi-static fading channels is further verified by simulations.
The remainder of this paper is organized as follows. Section 2 briefly describes the system model of cooperative communications with multiple relays in a cluster. In Section 3, distributed GLD codes are proposed, and the processing algorithms at the relays and the destination are presented. Section 4 gives the simulation results of distributed GLD codes. Conclusions are drawn in Section 5.
2. SYSTEM MODEL

In this paper, we investigate the scenario in which the source node transmits data to a distant destination aided by some nearby nodes, as depicted in Figure 1. We further assume that the source and relay nodes are located in a geographically small region, forming a transmit cluster, so that the quality of the channels from the source to the relays is usually good. This is approximately equivalent to the cooperative network with multiple relays presented in [15].
In this scenario, the cooperative relay group can be assigned by some central nodes or via other distributed protocols. For example, the source node may send a "hello" message to the surrounding nodes, and those nodes that respond properly, verified according to some criteria, form a transmit cluster, as introduced in [15]. Once a cluster is formed, the relay set R is given. Let L denote the number of available relays in R, that is, L equals the cardinality |R| of the set R. Note that the cluster needs to be re-formed after some time due to the random mobility of nodes; that is, the relay number L may vary from time to time in cooperative relay networks.
All the channels from nodes in the cluster to the distant destination are modeled as independent quasi-static Rayleigh fading channels. All the inter-node channels, that is, the channels between nodes in the same cluster, may be modeled as independent AWGN channels due to strong line-of-sight components [21], although this is not a critical element of the scheme. We assume that all the nodes transmit signals on orthogonal channels (e.g., CDMA, FDMA, or TDMA) and are constrained to half-duplex operation.
We assume the source s transmits signals to the destination d aided by the relay nodes in the set R. The cooperative relay protocol in this scenario usually consists of two phases, as illustrated in Figure 1. At the source node s, let c = [c_1, c_2, ..., c_N] denote the encoded bit vector, where N is the code length. It is modulated with the binary phase shift keying (BPSK) constellation to obtain x = [x_1, x_2, ..., x_N]. In the first phase, the source s broadcasts its data x_i, 1 ≤ i ≤ N, and the received signal at the distant destination d is given by

y_i^sd = √(P_s) h_i^sd x_i + η_i^sd,  (1)
where P_s is the power of each symbol from the source, h_i^sd is the Rayleigh fading coefficient of the channel from the source to the destination, and η_i^sd denotes the AWGN with variance σ².
Simultaneously, the broadcast data from the source can also be received by the relay nodes in the set R, and the received signal at relay l is

y_{i,l}^sr = √(P_s) h_{i,l}^sr x_i + η_{i,l}^sr,  (2)
where h_{i,l}^sr represents the fading gain of the path from the source node to relay node l and η_{i,l}^sr is the AWGN. In the scenario of this paper, the inter-node channels are modeled as AWGN channels; that is, for a pair of nodes belonging to the same cluster, h_{i,l}^sr = 1. Relay nodes decode the data, and error detecting codes such as cyclic redundancy check (CRC) codes or other linear block codes are used to verify the decoding results.
In the second phase, those relay nodes that decode the data correctly aid the source by forwarding data to the destination. Let x_i^l denote the signal transmitted by relay l, which is received at the destination as

y_{i,l}^rd = √(P_l^rd) h_{i,l}^rd x_i^l + η_{i,l}^rd,  (3)

where P_l^rd is the power of each symbol from relay l, h_{i,l}^rd is the Rayleigh fading coefficient of the channel from relay l to the destination, and η_{i,l}^rd denotes the AWGN with variance σ_l².
We further assume that all the fading coefficients, such as h_i^sd and h_{i,l}^rd, are constant during a transmit frame and vary from frame to frame, that is, quasi-static fading channels.
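The two-phase signal model of (1)-(3) can be sketched as follows, under the stated assumptions (BPSK symbols, quasi-static fading constant over a frame, unit-gain AWGN source-relay links); this helper is an illustration, not the authors' simulation code, and for simplicity the relay here forwards the same symbols it received.

```python
import math
import random

def transmit_frame(x, P_s, P_rd, h_sd, h_rd, sigma, sigma_l):
    """Sketch of the two-phase signal model (1)-(3): BPSK symbols x in {+1,-1},
    quasi-static fading (h_sd, h_rd constant over the frame), h_sr = 1."""
    # Phase 1: source broadcast as received at the destination, eq. (1)
    y_sd = [math.sqrt(P_s) * h_sd * xi + random.gauss(0.0, sigma) for xi in x]
    # Phase 1: the same broadcast as received at relay l, eq. (2) with h_sr = 1
    y_sr = [math.sqrt(P_s) * 1.0 * xi + random.gauss(0.0, sigma) for xi in x]
    # Phase 2: the relay's transmission as received at the destination, eq. (3)
    # (here the relay simply forwards x; in the scheme it sends x_i^l)
    y_rd = [math.sqrt(P_rd) * h_rd * xi + random.gauss(0.0, sigma_l) for xi in x]
    return y_sd, y_sr, y_rd
```

Setting the noise standard deviations to zero recovers the noiseless scaled symbols, which is a convenient sanity check on the model.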
Various cooperative coding schemes can be designed to achieve performance gains over quasi-static fading channels by designing the transmit signals of the source and relay nodes. For a fair comparison of different strategies, the total transmit power of each bit c_i is usually fixed as

P = P_s + Σ_{l∈R} P_l^rd.  (4)

C. Han and W. Wu 3

Figure 2: Structure of the parity-check matrix H of a GLD code (H consists of submatrices H_1, H_2, H_3, ..., with H_{j+1} = π_j(H_1), each built from copies of the constituent parity-check matrix H_0).
3. DISTRIBUTED GLD CODES FOR COOPERATIVE COMMUNICATIONS

3.1. GLD codes and distributed GLD codes
In this part, generalized low-density codes are introduced and distributed GLD codes are proposed for cooperative networks with multiple relays. Following the construction in [17–19], GLD codes are defined by a sparse matrix H constructed by replacing each row of a sparse parity-check matrix, which itself defines an LDPC code, with n − k rows comprising one copy of the parity-check matrix H_0 of the constituent code C_0(n, k). Here, C_0(n, k) is usually a block code of length n and information length k, such as a BCH or Reed-Solomon (RS) code.
For an (N, J, n) GLD code with code length N, the matrix H consists of J submatrices H_1, ..., H_J, where H_{j+1} = π_j(H_1) and π_j, j = 1, ..., J − 1, denote pseudorandom column permutations, that is, bit-level interleavers [19], as illustrated in Figure 2. Therefore, an (N, J, n) GLD code C can be considered the intersection of J supercodes C_1, ..., C_J, that is, C = ∩_{j=1}^J C_j, where C_1 = C_0 ⊕ ··· ⊕ C_0 and C_{j+1} = π_j(C_1). The code rate of (N, J, n) GLD codes is therefore R = 1 − J(1 − r), where r = k/n is the code rate of the constituent code C_0.
The parity-check matrix H of the GLD code is rearranged using Gaussian elimination to obtain the systematic form, from which the generator matrix G is derived; information bits are encoded using G. GLD codes can be iteratively decoded based on the soft-input soft-output (SISO) decoders of the constituent codes [22, 23]. Specifically, the first supercode C_1 is decoded using N/n SISO decoders executed in parallel, since it is composed of N/n constituent codes. Then the extrinsic messages of the coded bits are interleaved and fed to the decoder of the second supercode C_2 as a priori information. Excellent performance is obtained by iterating this process over the supercodes [19], that is, C_1 → C_2 → ··· → C_J → C_1 → ···.
In the following, distributed GLD codes are devised for
cooperative relay networks using (N, 2, n) GLD codes, for
the performance of GLD codes with J = 2 is asymptotically
good as shown in [19]. In order to make the description more
general, we still employ (N, J, n) to denote the GLD code in
the following.
In the proposed scheme, the source node encodes the
data using an (N, J, n) GLD encoder and then broadcasts
modulated symbols to the sink and simultaneously to all the
relay nodes in the ﬁrst phase. Then, the GLD code is decoded
and forwarded in a distributed manner by relay nodes using
its partial error detecting and error correcting capabilities.
Specifically, the protocol assigns n_l, 1 ≤ l ≤ L, different constituent codes of the GLD code to relay l, according to the number of relays L in the cluster. Since an (N, J, n) GLD code consists of J·N/n constituent codes, we configure the n_l to satisfy

\sum_{l=1}^{L} n_l = J·N/n. (5)
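The assignment in (5) can be sketched as follows. This is a hypothetical helper, not the paper's protocol: it simply distributes the J·N/n constituent codes over the L relays as evenly as possible so that the n_l sum to J·N/n:

```python
# Sketch of the constituent-code assignment in (5).  Parameter names follow
# the text: an (N, J, n) GLD code has J*N/n constituent codes, and relay l
# is assigned n_l of them, with sum(n_l) = J*N/n.
def assign_constituent_codes(N, J, n, L):
    total = J * N // n                      # number of constituent codes
    assert L <= total, "at most one relay per constituent code"
    base, extra = divmod(total, L)
    # The first `extra` relays take one extra code; allocation is near-uniform.
    n_l = [base + 1 if l < extra else base for l in range(L)]
    codes = list(range(total))
    assignment, start = [], 0
    for count in n_l:
        assignment.append(codes[start:start + count])
        start += count
    return n_l, assignment

# Example with the paper's (420, 2, 15) GLD code and L = 8 relays:
n_l, assignment = assign_constituent_codes(N=420, J=2, n=15, L=8)
```

Here each of the 8 relays is responsible for 7 of the 56 constituent codes, matching the uniform allocation described below.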
In order to use the transmit power efficiently, we assume L ≤ J·N/n and that each constituent code is allocated to only one relay in this scheme. Each relay then decodes the constituent codes for which it is responsible. The decoding results of the constituent codes that are decoded correctly are forwarded to the destination by the associated relays in the second phase. Note that the relay nodes do not re-encode the data, which reduces their complexity compared with the distributed turbo codes in [11].

In this way, all the constituent codes forwarded by the relays construct a distributed GLD code. If all the constituent codes are forwarded successfully, each code symbol x_i, 1 ≤ i ≤ N, is forwarded J times by J relays, which constitute a relay set R(c_i) ⊆ R for the associated code bit c_i, where |R(c_i)| = J. Therefore, the J copies of the bit c_i from the relays in R(c_i) can be combined with the copy from the source to achieve diversity gain at the destination.
One advantage of the proposed scheme is that it can adapt to variations in the relay number L by simply adjusting the n_l. Note that, for each code bit c_i, the total power consumed by the source and the associated relays in R(c_i) remains fixed at P as the relay number L varies. Moreover, contrary to distributed turbo codes [11], the overall code rate R of the system is independent of the relay number L: the source transmits each of the N code symbols once and the relays retransmit each symbol J times, so the GLD rate is divided by 1 + J, giving

R = \frac{1 − J(1 − r)}{1 + J}. (6)

Therefore, the scheme is well suited to cooperative networks in which the number of active relay nodes may vary from time to time. In contrast, distributed turbo codes may increase the traffic of the network when more relays are employed to improve the performance.
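The rate in (6) can be checked with exact arithmetic. With the (15, 11) BCH constituent code and J = 2 used later in the paper, it gives 7/45, independently of L:

```python
from fractions import Fraction

# Overall rate of the cooperative scheme from (6): R = (1 - J(1 - r)) / (1 + J),
# with r = k/n the constituent code rate.  Independent of the relay number L.
def overall_rate(k, n, J):
    r = Fraction(k, n)
    gld_rate = 1 - J * (1 - r)     # rate of the (N, J, n) GLD code itself
    return gld_rate / (1 + J)

print(overall_rate(11, 15, 2))     # prints 7/45
```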
Another advantage of the proposed scheme is that
each relay node is only responsible for relaying one or
[Figure 3 flow chart: after input, hard-decision decoding of C_0 is tried first and the syndrome check H_0 · Ĉ_0 = 0 is applied; on failure, MAP decoding of C_0 is tried with the same check; on failure again, iterative GLD decoding runs (i = 1, 2, ...) until the check passes or i = I_max, after which the decoder stops.]
Figure 3: Flow chart of progressive decoding for relay nodes.
several constituent codes to the destination, according to the assignment of the protocol. In this way, each relay consumes only a little energy to relay data, and significant diversity gain can be achieved at the destination, since the fading at different locations varies. In general, the constituent codes are allocated uniformly to the available relay nodes in the set R so as to balance the power consumption and data payload of each relay node; this also gives every bit the same quality of protection. It further suggests improving the system performance by allowing relays that happen to enjoy good channel conditions to the destination to forward more constituent codes than others under some adaptive protocol. Such adaptive strategies are not included in this paper. Other design aspects and advantages of distributed GLD codes are addressed in the following parts.
3.2. Progressive decoding for relay nodes
Generally speaking, the inter-node channels in the same cluster can be modeled as AWGN channels and are usually of high quality. For example, the channels between different receiving nodes in the same cluster are modeled as error-free channels in [24]. In order to reduce the decoding complexity and adapt well to channel variations, a progressive decoding strategy is proposed for relay nodes, as illustrated in Figure 3, using the partial error-detecting and error-correcting capabilities of the GLD code.
Let us take as an example the scenario in which one relay node forwards one constituent code C_0; the process can be summarized as follows. First, decode the constituent code C_0(n, k) with a hard-decision algorithm based on the received n symbols, obtaining the hard-decision estimate Ĉ_0 of the codeword. Then use the parity-check matrix H_0 to verify whether the codeword is correct: if H_0 · Ĉ_0 = 0, the decoder stops and Ĉ_0 is forwarded to the destination. Otherwise, the codeword is decoded with a maximum a posteriori (MAP) decoder, that is, the BCJR algorithm [22, 23], and the same check criterion is applied to the MAP decoding result. If H_0 · Ĉ_0 = 0, the relay stops decoding and forwards Ĉ_0 to the destination. Otherwise, the relay executes iterative decoding of the whole GLD code based on all N symbols received from the source. During the iterative decoding, the check criterion is applied after each iteration; the decoding process stops once the check passes or the iteration count reaches the maximum I_max.
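The control flow above can be sketched as follows. This is a hypothetical illustration only: the syndrome check is shown concretely (with a (7, 4) Hamming check matrix standing in for the paper's (15, 11) BCH code), while the hard-decision, MAP (BCJR), and iterative GLD decoders are passed in as placeholder callables rather than implemented:

```python
import numpy as np

def syndrome_ok(H0, c_hat):
    # Check criterion H0 . c_hat = 0 over GF(2).
    return not np.any(H0 @ c_hat % 2)

def progressive_decode(H0, r_hard, hard_dec, map_dec, gld_iter, I_max):
    c_hat = hard_dec(r_hard)
    if syndrome_ok(H0, c_hat):              # cheapest stage first
        return c_hat, "hard"
    c_hat = map_dec(r_hard)                 # fall back to MAP decoding
    if syndrome_ok(H0, c_hat):
        return c_hat, "map"
    for i in range(1, I_max + 1):           # last resort: iterate on whole GLD code
        c_hat = gld_iter(c_hat)
        if syndrome_ok(H0, c_hat):
            return c_hat, f"gld_iter_{i}"
    return c_hat, "failure"

# Toy run with a (7, 4) Hamming check matrix and identity stand-in decoders.
H0 = np.array([[1, 0, 1, 0, 1, 0, 1],
               [0, 1, 1, 0, 0, 1, 1],
               [0, 0, 0, 1, 1, 1, 1]])
ident = lambda x: x
codeword = np.zeros(7, dtype=int)           # the all-zero word is a valid codeword
word, stage = progressive_decode(H0, codeword, ident, ident, ident, I_max=10)
```

With a valid received word the decoder stops at the cheap hard-decision stage, which is exactly the point of the progressive structure.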
In principle, the check criterion can let undetected errors pass, but this is ignored in this paper because of the good inter-node channels within a cluster and the good error-detecting capability of the constituent code. In addition, the probability of decoding failure at the relays is expected to be very low thanks to the strong performance of GLD codes.
3.3. Processing at the destination

In the proposed scheme, several independent copies associated with one symbol are obtained from the source and relay nodes, and the signals from the different paths are combined before being input to the GLD decoder at the destination. With this scheme, coding gain and diversity gain are achieved with little additional complexity compared with strategies without relays.
For each bit c_i in a GLD codeword, its log-likelihood ratio (LLR) can be written (assuming equiprobable code bits) as

L(c_i) = \log \frac{P(c_i = 1 \mid y_i^{sd}, Y_i^{rd})}{P(c_i = 0 \mid y_i^{sd}, Y_i^{rd})} = \log \frac{P(y_i^{sd}, Y_i^{rd} \mid c_i = 1)}{P(y_i^{sd}, Y_i^{rd} \mid c_i = 0)}, (7)

where the set Y_i^{rd} = \{ y_{i,l}^{rd} \mid l \in R(c_i) \}. Since all the paths are independent, we have

L(c_i) = \log \frac{P(y_i^{sd} \mid c_i = 1)}{P(y_i^{sd} \mid c_i = 0)} + \sum_{l \in R(c_i)} \log \frac{P(y_{i,l}^{rd} \mid c_i = 1)}{P(y_{i,l}^{rd} \mid c_i = 0)} = L^{sd}(c_i) + \sum_{l=1}^{J} L_l^{rd}(c_i). (8)
In (8), L^{sd}(c_i) is the LLR from the source node to the destination, given by

L^{sd}(c_i) = \log \frac{P(y_i^{sd} \mid c_i = 1)}{P(y_i^{sd} \mid c_i = 0)}. (9)

The LLR from relay l, l \in R(c_i), to the destination is denoted by

L_l^{rd}(c_i) = \log \frac{P(y_{i,l}^{rd} \mid c_i = 1)}{P(y_{i,l}^{rd} \mid c_i = 0)}. (10)
Therefore, the receiver structure is depicted in Figure 4.
In the receiver, the fading factors and the Gaussian-noise parameters of each channel to the destination need to be estimated before the soft information is combined. If BPSK modulation is adopted in the system as described above, the LLR from the source to the destination is

L^{sd}(c_i) = \frac{2\sqrt{P_s}\, h_i^{sd}\, y_i^{sd}}{\sigma^2}, (11)

and the LLR from relay l, l \in R(c_i), to the destination is

L_l^{rd}(c_i) = \frac{2\sqrt{P_l^{rd}}\, h_{i,l}^{rd}\, y_{i,l}^{rd}}{\sigma_l^2}. (12)
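Under the BPSK mapping assumed here (bit 1 → +1, bit 0 → −1), the per-path LLRs (11)-(12) and their combination in (8) reduce to a few lines. The variable names below mirror the text, but the concrete gains and observations are illustrative only:

```python
import math

# Per-path LLRs for BPSK over Gaussian channels, as in (11) and (12).
def llr_sd(P_s, h_sd, y_sd, sigma2):
    return 2.0 * math.sqrt(P_s) * h_sd * y_sd / sigma2            # eq. (11)

def llr_rd(P_rd_l, h_rd_l, y_rd_l, sigma2_l):
    return 2.0 * math.sqrt(P_rd_l) * h_rd_l * y_rd_l / sigma2_l   # eq. (12)

def combined_llr(P_s, h_sd, y_sd, sigma2, relay_obs):
    # relay_obs: list of (P_rd_l, h_rd_l, y_rd_l, sigma2_l) for l in R(c_i);
    # independent paths simply add in the LLR domain, as in (8).
    L = llr_sd(P_s, h_sd, y_sd, sigma2)
    for P_rd_l, h_rd_l, y_rd_l, sigma2_l in relay_obs:
        L += llr_rd(P_rd_l, h_rd_l, y_rd_l, sigma2_l)
    return L

# Example: source plus two relays (J = 2), P/3 each, unit gains and noise.
L = combined_llr(1/3, 1.0, 0.9, 1.0,
                 [(1/3, 1.0, 1.1, 1.0), (1/3, 1.0, 0.8, 1.0)])
```

A positive combined LLR favors the decision c_i = 1; the combined value then feeds the iterative GLD decoder as described next.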
Then, the combined LLRs are sent to the GLD decoder
for iterative decoding. In this way, the diversity and coding
gain are achieved using distributed GLD codes with low
complexity at the destination. Specifically, the trellis-based
MAP algorithm [22, 23] can be employed to decode the
constituent codes in parallel for the GLD code. Compared
with multiple turbo codes [16], the decoding latency of
GLD codes is shorter due to the parallel decoding of
N/n constituent codes in each supercode. Moreover, in
the proposed scheme, the destination node always needs
to decode the (N, J, n) GLD code even when the relay
number L varies. However, in distributed turbo codes, the
receiver needs to decode the multiple turbo code with L + 1
constituent RSC codes. Here, the code length of the multiple
turbo code also increases with the relay number L, which
is usually much longer than the (N, J, n) GLD code length
N. Therefore, the complexity and decoding latency of the
proposed scheme can be greatly reduced compared with
distributed turbo codes especially when L is large.
In conclusion, Section 3 presents a novel strategy using distributed GLD codes for multiple-relay cooperative communications. Firstly, it is a flexible scheme that adapts well to cooperative networks with different numbers of relays. Secondly, the complexity of the relay nodes is low, since the progressive decoding algorithm allows partial decoding of the GLD code and no re-encoding is needed. When there are many relays, the power consumption can be approximately balanced across the relay nodes, which is essential especially in wireless sensor networks. Finally, diversity gain and coding gain are achieved with little additional complexity compared with the strategy without relays.
4. SIMULATION RESULTS
In this section, the performance of the proposed scheme
is simulated and compared with other cooperative coding
schemes. In the simulations, the (420, 2, 15) GLD code is
employed, which takes (15, 11) BCH codes as constituent
codes and has a code rate of R = 7/15.
Firstly, we evaluate the progressive processing at the relay nodes. In Figure 5, the bit error rate (BER) performance of the (15, 11) BCH code with hard-decision decoding, with MAP decoding, and of the (420, 2, 15) GLD code under different
[Figure 4: the combiner merges L^{sd}(c_i) and L_1^{rd}(c_i), ..., L_J^{rd}(c_i) into L(c_i), which is passed to the GLD decoder.]
Figure 4: The receiver structure at the destination.
[Figure 5: BER versus E_s/N_0 (dB) curves for BCH hard-decision decoding, BCH MAP decoding, and GLD decoding with 1, 2, 4, and 10 iterations.]
Figure 5: Performance of the progressive decoding at relays over the AWGN channel.
iterations over the AWGN channel is compared, since inter-node channels in a cluster are usually modeled as AWGN channels. Here, the horizontal axis E_s/N_0 denotes the signal-to-noise ratio (SNR) of the symbols after encoding and modulation. The results illustrate that an appropriate decoding scheme within the progressive processing may be chosen according to the source-relay channel conditions.
Secondly, the performance of distributed GLD codes is simulated under different conditions. In the following, we assume the inter-node channels in the same cluster are perfect, as in [11, 24]. The source and relay nodes face independent and identically distributed quasi-static Rayleigh fading channels toward the distant destination. Here, the source and each relay use the same energy to transmit each symbol. If the transmit power of each symbol is P, the source broadcasts the symbol using P/3 and the two relay nodes associated with this symbol share the remaining 2P/3, since each symbol is relayed twice by two relays in the designed distributed GLD code. If there is no relay node, that is, L = 0, all the transmit power P is allocated to the source node. The comparison is therefore fair, and the overall code rate of the system is R = 7/45. The GLD decoder iterates 10 times at the receiver. We simulate scenarios with and without a source-destination path.
[Figure 6: BER versus E_b/N_0 (dB) curves for L = 0, 2, 4, 8, 14, 28, and 56 relays.]
Figure 6: Performance of distributed GLD codes without source-destination path.
When the source-destination path does not exist, the BER performance of distributed GLD codes with different numbers of relays is illustrated in Figure 6. The horizontal axis in Figure 6 is E_b/N_0, denoting the SNR of the information bits; the horizontal axes in the performance figures below are the same. As seen from Figure 6, distributed GLD codes can significantly improve the system performance, and the BER improves further as the relay number increases. In particular, when L = 56, the distributed GLD code achieves about 35 dB gain at a BER of 10^{-5} over the scheme without relays. In the scheme with 56 relays, each relay only needs to forward a single (15, 11) BCH codeword, that is, 15 symbols. Therefore, only a little latency, complexity, and power consumption is incurred at each relay node.
When the source-destination path is included, Figure 7 shows the BER performance for different numbers of relay nodes. The performance again improves as the relay number increases. Take the scheme with two relays as an example, in which each relay in fact forwards one supercode of the GLD code: it achieves about 25 dB gain over the scheme without relays at a BER of 10^{-5}.
The performance of distributed GLD codes with and without the source-destination path is further compared in Figure 8. It is shown that the source-destination path improves the performance for a system with the same number of relay nodes, especially when L is small, such as L = 2, 4. However, as the relay number increases, the gap decreases. This may be because, when L is small, the diversity from the source-destination path is prominent in the overall performance.

Thirdly, the performance of the proposed scheme is further compared with two other cooperative coding schemes when the source-destination path exists. First, Figure 9
[Figure 7: BER versus E_b/N_0 (dB) curves for L = 0, 2, 4, 8, 14, 28, and 56 relays.]
Figure 7: Performance of distributed GLD codes with source-destination path.
[Figure 8: BER versus E_b/N_0 (dB) curves for L = 2, 4, 14, and 56 relays, with (SD) and without (No SD) the source-destination path.]
Figure 8: Performance comparison of distributed GLD codes with source-destination path (SD) and without source-destination path (No SD).
compares the performance of distributed GLD codes and distributed turbo codes. In the simulations, rate-1/2 (7, 5)_8 recursive systematic convolutional (RSC) codes are used at the source and all the relay nodes to construct the distributed turbo code following [11]. Specifically, the source broadcasts the RSC code with a code length of 420 bits, equal to the
[Figure 9: BER versus E_b/N_0 (dB) curves for the distributed turbo code with L = 4 and distributed GLD codes with L = 2, 4, and 56.]
Figure 9: Performance comparison of distributed GLD codes (R = 7/45) and distributed turbo codes (R = 1/6).
length of the GLD code, and each relay node transmits only the parity-check bits of its own RSC code. For a fair comparison, four relay nodes are used to construct the distributed turbo code with overall rate R = 1/6, which is approximately equal to the 7/45 of the distributed GLD code. The transmit power is allocated to the source and relay nodes according to the same rule as in the distributed GLD code. Considering the complexity of the receiver, we choose the soft-output Viterbi algorithm (SOVA) to decode each RSC code of the multiple turbo code at the destination.
Figure 9 illustrates that distributed GLD codes outperform the distributed turbo code. Here, the destination node for distributed GLD codes always decodes the (420, 2, 15) GLD code regardless of the relay number L. In distributed turbo codes, however, the receiver must decode a multiple turbo code of length 420 × (1 + L/2) bits, which consists of L + 1 constituent (7, 5)_8 RSC codes. The complexity and decoding latency of the proposed scheme can thus be greatly reduced compared with distributed turbo codes, especially when L is large. In addition, distributed GLD codes may be used to provide different qualities of service (QoS) by employing different numbers of relays while the network traffic stays constant, since they easily adapt to variations in the relay number while keeping a fixed overall code rate.
Then, Figure 10 compares the proposed scheme with another relay coding scheme that uses turbo codes in a manner similar to the GLD codes, called the turbo multiple relay (TMR) scheme in this paper. In the TMR scheme, the source node broadcasts a rate-1/2 turbo code using the rate-1/2 (7, 5)_8 RSC codes as constituent codes. The turbo code length is 420 bits, and the codeword is also forwarded twice by relay nodes to achieve the overall rate R = 1/6. It is observed that the TMR scheme exhibits slightly better
[Figure 10: BER versus E_b/N_0 (dB) curves for the TMR scheme and distributed GLD codes with L = 14 and 56.]
Figure 10: Performance comparison of distributed GLD codes (R = 7/45) and the turbo multiple relay scheme (TMR, R = 1/6).
performance in the waterfall region but is worse in the error-floor region. In fact, the TMR scheme has some flaws: for example, it is difficult for a relay to decode and check only part of the codeword, whereas the proposed scheme can exploit the intrinsic partial error-detecting and error-correcting capabilities of the GLD code.
Finally, different power allocation strategies for distributed GLD codes are investigated. In the simulations above, the powers allocated to the source and the two associated relays are P/3 and 2P/3, respectively, so each symbol from either the source or a relay has the same power level. In practical situations, the destination node is usually located far from the source, while the relay nodes surround the source in a cluster. When large-scale path loss is considered, the received power levels at the relay nodes may be much higher than at the destination. Thus, unequal power allocation (UPA) may be considered to further improve the performance of distributed GLD codes. Consider a network topology as illustrated in Figure 11. Here, the transmit cluster is limited to a region with a radius of 50 meters, and the destination is 250 meters away from the source node, similar to the configuration in [25]. Generally, relay nodes are uniformly distributed within the cluster. However, we simplify the situation by assuming that all the relay nodes in the cluster lie on a circle of radius 50 meters centered at the source, 250 meters away from the destination. We assume that the average large-scale path loss is a function of the separation distance with path loss exponent γ = 2. Therefore, we allocate transmit power 2P/25 to the source and let the two associated relays share the remaining 23P/25 for each symbol. In this way, the received E_b/N_0 at a relay node can still be about 6.4 dB higher than at the destination, and thus the reliability of the inter-node channels in the cluster can still be guaranteed.
[Figure 11: source s at the center of a 50 m-radius relay cluster, with destination d 250 m away.]
Figure 11: Topology example of the cooperative network.
[Figure 12: BER versus E_b/N_0 (dB) curves for L = 0 and for L = 2, 8, 14, and 56 relays, with (SD) and without (No SD) the source-destination path.]
Figure 12: Performance of distributed GLD codes under the UPA scheme with and without source-destination path.
Figure 12 shows the BER performance of distributed GLD codes with UPA. The source-destination path clearly still improves the performance for a system with the same number of relay nodes, especially when L is small, such as L = 2. However, compared with Figure 8, the improvement from the source-destination path under the UPA scheme is narrower, especially when L is large, such as L = 14, 56. For systems with many relays, the diversity of the source-destination path can almost be ignored.

When the source-destination path does not exist, the performance of distributed GLD codes with the two power allocation schemes is compared in Figure 13. The UPA strategy with P_s = 2P/25 improves the performance over the scheme with P_s = P/3 when the system has the same number of relays. Furthermore, for the different relay numbers L, the improvement of the UPA strategy is in all cases about 1.3 dB at a BER of 10^{-5}.
When the source-destination path is included, the BER performance of distributed GLD codes with the two power allocation strategies is compared in Figure 14. Interestingly, when L is small, such as 2 and 4, the UPA scheme suffers a performance loss compared with the scheme with P_s = P/3. However, when L is large, such
[Figure 13: BER versus E_b/N_0 (dB) curves for L = 2, 4, 8, and 56 relays under P_s = P/3 and P_s = 2P/25.]
Figure 13: Performance comparison between the two power allocation schemes without source-destination path.
[Figure 14: BER versus E_b/N_0 (dB) curves for L = 2, 4, 8, 14, and 56 relays under P_s = P/3 and P_s = 2P/25.]
Figure 14: Performance comparison between the two power allocation schemes with source-destination path.
as 14 and 56, the UPA scheme instead exhibits better performance than the scheme with P_s = P/3. This may be because, when L is small, the UPA scheme weakens the diversity contribution of the source-destination path, whereas for large L, such as 14 and 56, it strengthens the contribution of the relays.
5. CONCLUSION
Distributed generalized low-density codes are constructed for multiple-relay cooperative communications. The proposed scheme adapts well to variations in the relay number in a wireless relay network while the overall code rate of the system remains fixed. In the scheme, each relay is responsible for forwarding one or several constituent codes of the GLD code, so the complexity and power consumption of each relay node are quite limited. Moreover, a progressive decoding strategy is proposed for relay nodes to further reduce the complexity and adapt to source-relay channel variations. At the destination, the soft information is first combined and then iterative decoding of the GLD code is performed to achieve coding gain and diversity gain. The significant performance improvements have been verified by simulations over quasi-static fading channels.

In the future, the performance can be further improved by carefully allocating constituent codes and transmit power to the different relay nodes according to their distances to the destination and the channel variations.
ACKNOWLEDGMENTS
The authors thank the editors and reviewers for their valu
able comments and suggestions. This work was supported by
National Basic Research Program of China (2007CB310604)
and NSFC (60772108).
REFERENCES
[1] A. Sendonaris, E. Erkip, and B. Aazhang, “Increasing uplink
capacity via user cooperation diversity,” in Proceedings of IEEE
International Symposium on Information Theory (ISIT ’98), p.
156, Cambridge, Mass, USA, August 1998.
[2] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation
diversity—part I: system description,” IEEE Transactions on
Communications, vol. 51, no. 11, pp. 1927–1938, 2003.
[3] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation
diversity—part II: implementation aspects and performance
analysis,” IEEE Transactions on Communications, vol. 51, no.
11, pp. 1939–1948, 2003.
[4] J. N. Laneman, G. W. Wornell, and D. N. C. Tse, “An
eﬃcient protocol for realizing cooperative diversity in wireless
networks,” in Proceedings of IEEE International Symposium on
Information Theory (ISIT ’01), p. 294, Washington, DC, USA,
June 2001.
[5] T. E. Hunter and A. Nosratinia, “Cooperation diversity through coding,” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’02), p. 220, Lausanne, Switzerland, June-July 2002.
[6] T. E. Hunter and A. Nosratinia, “Diversity through coded
cooperation,” IEEE Transactions on Wireless Communications,
vol. 5, no. 2, pp. 283–289, 2006.
[7] J. Ezri and M. Gastpar, “On the performance of independently
designed LDPC codes for the relay channel,” in Proceedings
of IEEE International Symposium on Information Theory
(ISIT ’06), pp. 977–981, Seattle, Wash, USA, July 2006.
[8] P. Razaghi and W. Yu, “Bilayer low-density parity-check codes for decode-and-forward in relay channels,” IEEE Transactions on Information Theory, vol. 53, no. 10, pp. 3723–3739, 2007.
[9] A. Chakrabarti, A. de Baynast, A. Sabharwal, and B. Aazhang,
“Low density parity check codes for the relay channel,” IEEE
Journal on Selected Areas in Communications, vol. 25, no. 2,
pp. 280–291, 2007.
[10] B. Zhao and M. C. Valenti, “Distributed turbo coded diversity
for relay channel,” Electronics Letters, vol. 39, no. 10, pp. 786–
787, 2003.
[11] M. C. Valenti and B. Zhao, “Distributed turbo codes: towards
the capacity of the relay channel,” in Proceedings of the 58th
IEEE Vehicular Technology Conference (VTC ’03), vol. 1, pp.
322–326, Orlando, Fla, USA, October 2003.
[12] Y. Li, B. Vucetic, T. F. Wong, and M. Dohler, “Distributed
turbo coding with soft information relaying in multihop relay
networks,” IEEE Journal on Selected Areas in Communications,
vol. 24, no. 11, pp. 2040–2050, 2006.
[13] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, Mass, USA, 1963.
[14] K. Azarian, H. El Gamal, and P. Schniter, “On the achievable diversity-multiplexing tradeoff in half-duplex cooperative channels,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4152–4172, 2005.
[15] A. K. Sadek, W. Su, and K. J. R. Liu, “Clustered cooperative
communications in wireless networks,” in Proceedings of IEEE
Global Telecommunications Conference (GLOBECOM ’05), vol.
3, pp. 1–5, St. Louis, Mo, USA, November-December 2005.
[16] D. Divsalar and F. Pollara, “Multiple turbo codes,” in
Proceedings of IEEE Military Communications Conference
(MILCOM ’95), vol. 1, pp. 279–285, San Diego, Calif, USA,
November 1995.
[17] R. M. Tanner, “A recursive approach to low complexity codes,”
IEEE Transactions on Information Theory, vol. 27, no. 5, pp.
533–547, 1981.
[18] M. Lentmaier and K. Sh. Zigangirov, “Iterative decoding of generalized low-density parity-check codes,” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’98), p. 149, Cambridge, Mass, USA, August 1998.
[19] J. Boutros, O. Pothier, and G. Zémor, “Generalized low-density (Tanner) codes,” in Proceedings of IEEE International Conference on Communications (ICC ’99), vol. 1, pp. 441–445, Vancouver, Canada, June 1999.
[20] O. Pothier, L. Brunel, and J. Boutros, “A low complexity FEC
scheme based on the intersection of interleaved block codes,”
in Proceedings of the 49th IEEE Vehicular Technology Conference
(VTC ’99), vol. 1, pp. 274–278, Houston, Tex, USA, May 1999.
[21] M. Yuksel and E. Erkip, “Diversity-multiplexing tradeoff in cooperative wireless systems,” in Proceedings of the 40th Annual Conference on Information Sciences and Systems (CISS ’06), pp. 1062–1067, Princeton, NJ, USA, March 2006.
[22] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding
of linear codes for minimizing symbol error rate,” IEEE
Transactions on Information Theory, vol. 20, no. 2, pp. 284–
287, 1974.
[23] T. Matsumoto, “Decoding performance of linear block codes
using a trellis in digital mobile radio,” IEEE Transactions on
Vehicular Technology, vol. 39, no. 1, pp. 68–74, 1990.
[24] X. Li, T. F. Wong, and J. M. Shea, “Performance analysis
for collaborative decoding with least-reliable-bits exchange on
AWGN channels,” IEEE Transactions on Communications, vol.
56, no. 1, pp. 58–69, 2008.
[25] S. Yi, B. Azimi-Sadjadi, S. Kalyanaraman, and V. Subramanian, “Error control code combining techniques in cluster-based cooperative wireless networks,” in Proceedings of IEEE International Conference on Communications (ICC ’05), vol. 5, pp. 3510–3514, Seoul, South Korea, May 2005.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 658042, 14 pages
doi:10.1155/2008/658042
Research Article
Reed-Solomon Turbo Product Codes for Optical Communications: From Code Optimization to Decoder Design
Raphaël Le Bidan, Camille Leroux, Christophe Jego, Patrick Adde, and Ramesh Pyndiah
Institut TELECOM, TELECOM Bretagne, CNRS Lab-STICC, Technopôle Brest-Iroise, CS 83818, 29238 Brest Cedex 3, France
Correspondence should be addressed to Raphaël Le Bidan, raphael.lebidan@telecombretagne.eu
Received 31 October 2007; Accepted 22 April 2008
Recommended by Jinhong Yuan
Turbo product codes (TPCs) are an attractive solution to improve link budgets and reduce system costs by relaxing the requirements on expensive optical devices in high-capacity optical transport systems. In this paper, we investigate the use of Reed-Solomon (RS) turbo product codes for 40 Gbps transmission over optical transport networks and 10 Gbps transmission over passive optical networks. An algorithmic study is first performed in order to design RS TPCs that are compatible with the performance requirements imposed by the two applications. Then, a novel ultra-high-speed parallel architecture for turbo decoding of product codes is described. A comparison with binary Bose-Chaudhuri-Hocquenghem (BCH) TPCs is performed. The results show that high-rate RS TPCs offer a better complexity/performance tradeoff than BCH TPCs for low-cost Gbps fiber-optic communications.
Copyright © 2008 Raphaël Le Bidan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
The field of channel coding has undergone major advances over the last twenty years. With the invention of turbo codes [1], followed by the rediscovery of low-density parity-check (LDPC) codes [2], it is now possible to approach the fundamental limit of channel capacity within a few tenths of a decibel over several channel models of practical interest [3]. Although this has been a major step forward, there is still a need for improvement in forward error correction (FEC), notably in terms of code flexibility, throughput, and cost.

In the early 90's, coinciding with the discovery of turbo codes, the deployment of FEC began in optical fiber communication systems. For a long time, there was no real incentive to use channel coding in optical communications, since the bit error rate (BER) in lightwave transmission systems can be as low as 10^{-9}–10^{-15}. Then, the progressive introduction of in-line optical amplifiers and the advent of wavelength-division multiplexing (WDM) technology accelerated the use of FEC, up to the point that it is now considered almost routine in optical communications. Channel coding is seen as an efficient technique to reduce system costs and to improve margins against various line impairments such as beat noise, channel crosstalk, or nonlinear dispersion. On the other hand, the design of channel codes for optical communications poses remarkable challenges to the system engineer. Good codes are expected to provide at the same time low overhead (high code rate) and guaranteed large coding gains at very low BER [4]. Furthermore, the issue of decoding complexity should not be overlooked, since data rates have now reached 10 Gbps and beyond (up to 40 Gbps), calling for FEC devices with low power consumption.
FEC schemes for optical communications are commonly classified into three generations. The reader is referred to [5, 6] for an in-depth historical perspective of FEC for optical communication. First-generation FEC schemes mainly relied on the (255, 239) Reed-Solomon (RS) code over the Galois field GF(256), with only 6.7% overhead. In particular, this code was recommended by the ITU for long-haul submarine transmissions. Then, the development of WDM technology provided the impetus for moving to second-generation FEC systems, based on concatenated codes with higher coding gains [7]. Third-generation FEC based on soft-decision decoding is now the subject of intense research, since stronger FEC is seen as a promising way to reduce costs by relaxing the requirements on expensive optical devices in high-capacity transport systems.
Figure 1: Codewords of the product code P = C1 ⊗ C2: a K1 × K2 block of information symbols, bordered by checks on rows, checks on columns, and checks on checks, forming an N1 × N2 matrix.
First introduced in [8], turbo product codes (TPCs) based on binary Bose-Chaudhuri-Hocquenghem (BCH) codes are an efficient and mature technology that has found its way into several (either proprietary or public) wireless transmission systems [9]. Recently, BCH TPCs have received considerable attention for third-generation FEC in optical systems, since they show good performance at high code rates and have a high minimum distance by construction. Furthermore, their regular structure is amenable to very high data-rate parallel decoding architectures [10, 11]. Research on TPCs for lightwave systems culminated recently with the experimental demonstration of a record coding gain of 10.1 dB at a BER of 10^−13, using a (144, 128) × (256, 239) BCH turbo product code with 24.6% overhead [12]. This gain was measured using a turbo decoding very-large-scale integration (VLSI) circuit operating on 3-bit soft inputs at a data rate of 12.4 Gbps. LDPC codes are also considered as serious candidates for third-generation FEC. Impressive coding gains have notably been demonstrated by Monte-Carlo simulation [13]. To date, however, to the best of the authors' knowledge, no high-rate LDPC decoding architecture has been proposed to demonstrate the practicality of LDPC codes for Gbps optical communications.
In this work, we investigate the use of Reed-Solomon TPCs for third-generation FEC in fiber-optic communication. Two specific applications are envisioned, namely 40 Gbps line-rate transmission over optical transport networks (OTNs), and 10 Gbps data transmission over passive optical networks (PONs). These two applications have different requirements with respect to FEC. An algorithmic study is first carried out in order to design RS product codes for the two applications. In particular, it is shown that high-rate RS TPCs based on carefully designed single-error-correcting RS codes realize an excellent performance/complexity tradeoff for both scenarios, compared to binary BCH TPCs of similar code rate. In a second step, a novel parallel decoding architecture is introduced. This architecture allows decoding of turbo product codes at data rates of 10 Gbps and beyond. Complexity estimations show that RS TPCs offer a better tradeoff between area and throughput than BCH TPCs for full-parallel decoding architectures. An experimental setup based on field-programmable gate array (FPGA) devices has been successfully designed for 10 Gbps data transmission. This prototype demonstrates the practicality of RS TPCs for next-generation optical communications.
The remainder of the paper is organized as follows. Construction and properties of RS product codes are introduced in Section 2. Turbo decoding of RS product codes is described in Section 3. Product code design for optical communication and related algorithmic issues are discussed in Section 4. The challenging issue of designing a high-throughput parallel decoding architecture for product codes is developed in Section 5. A comparison of throughput and complexity between decoding architectures for RS and BCH TPCs is carried out in Section 6. Section 7 describes the successful realization of a turbo decoder prototype for 10 Gbps transmission. Conclusions are finally given in Section 8.
2. REED-SOLOMON PRODUCT CODES
2.1. Code construction and systematic encoding
Let C1 and C2 be two linear block codes over the Galois field GF(2^m), with parameters (N1, K1, D1) and (N2, K2, D2), respectively. The product code P = C1 ⊗ C2 consists of all N1 × N2 matrices such that each column is a codeword in C1 and each row is a codeword in C2. It is well known that P is an (N1 N2, K1 K2) linear block code with minimum distance D1 D2 over GF(2^m) [14]. The direct product construction thus offers a simple way to build long block codes with relatively large minimum distance using simple, short component codes with small minimum distance. When C1 and C2 are two RS codes over GF(2^m), we obtain an RS product code over GF(2^m). Similarly, the direct product of two binary BCH codes yields a binary BCH product code.
Starting from a K1 × K2 information matrix, systematic encoding of P is easily accomplished by first encoding the K1 information rows using a systematic encoder for C2. Then, the N2 columns are encoded using a systematic encoder for C1, thus resulting in the N1 × N2 coded matrix shown in Figure 1.
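The row-then-column encoding order described above can be sketched as follows. For brevity, the component codes here are (K+1, K) single parity-check codes over GF(2), used only as stand-ins for the RS component codes of the paper; the two-step encoding structure is the same.

```python
# Illustrative sketch of systematic product-code encoding P = C1 (x) C2:
# encode the K1 information rows, then all N2 resulting columns.
# The "checks on checks" corner falls out of the column-encoding step.

def spc_encode(word):
    """Systematic single parity-check encoding: append the XOR of the bits."""
    parity = 0
    for bit in word:
        parity ^= bit
    return word + [parity]

def product_encode(info):
    """Encode a K1 x K2 information matrix into an N1 x N2 codeword matrix."""
    # Step 1: encode the K1 information rows with the row code C2.
    rows = [spc_encode(r) for r in info]
    # Step 2: encode all N2 columns with the column code C1.
    cols = [spc_encode([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    # Transpose back to an N1 x N2 matrix.
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]

info = [[1, 0, 1],
        [0, 1, 1]]
cw = product_encode(info)   # 3 x 4 codeword matrix
```

Every row and every column of the resulting matrix is a codeword of its component code, as the product construction requires.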
2.2. Binary image of RS product codes
Binary modulation is commonly used in optical communication systems. A binary expansion of the RS product code is then required for transmission. The extension field GF(2^m) forms a vector space of dimension m over GF(2). A binary image P_b of P is thus obtained by expanding each code symbol in the product code matrix into m bits using some basis B for GF(2^m). The polynomial basis B = {1, α, ..., α^{m−1}}, where α is a primitive element of GF(2^m), is the usual choice, although other bases exist [15, Chapter 8]. By construction, P_b is a binary linear code with length m N1 N2, dimension m K1 K2, and minimum distance d at least as large as the symbol-level minimum distance D = D1 D2 [14, Section 10.5].
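The symbol-to-bit expansion can be sketched as below. Here, a sketch under the usual convention that a GF(2^m) symbol is stored as an integer whose base-2 digits are its coordinates in the polynomial basis; the function names are illustrative, not from the paper.

```python
# Minimal sketch of forming the binary image P_b: each GF(2^m) symbol is
# expanded into its m polynomial-basis coordinates (one bit per coordinate).

def symbol_to_bits(sym, m):
    """m-bit polynomial-basis expansion of one GF(2^m) symbol, LSB first."""
    return [(sym >> k) & 1 for k in range(m)]

def binary_image(matrix, m):
    """Expand an N1 x N2 symbol matrix into an N1 x (m*N2) bit matrix."""
    return [[b for sym in row for b in symbol_to_bits(sym, m)] for row in matrix]

# A 2 x 2 matrix of GF(32) symbols (m = 5) becomes 2 rows of 10 bits each:
M = [[3, 17],
     [0, 31]]
bits = binary_image(M, 5)
```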
3. TURBO DECODING OF RS PRODUCT CODES
Product codes usually have high dimension, which precludes maximum-likelihood (ML) soft-decision decoding. Yet the particular structure of the product code lends itself to an efficient iterative "turbo" decoding algorithm offering close-to-optimum performance at high-enough signal-to-noise ratios (SNRs).
Assume that a binary transmission has taken place over a binary-input channel. Let Y = (y_{i,j}) denote the matrix of samples delivered by the receiver front-end. The turbo decoder soft input is the channel log-likelihood ratio (LLR) matrix R = (r_{i,j}), with

    r_{i,j} = A ln [ f1(y_{i,j}) / f0(y_{i,j}) ].  (1)

Here A is a suitably chosen constant, and f_b(y) denotes the probability of observing the sample y at the channel output given that bit b has been transmitted.
Turbo decoding is realized by successively decoding the rows and columns of the channel matrix R using soft-input soft-output (SISO) decoders, and by exchanging reliability information between the decoders until a reliable decision can be made on the transmitted bits.
3.1. SISO decoding of the component codes
In this work, SISO decoding of the RS component codes is performed at the bit level using the Chase-Pyndiah algorithm. First introduced in [8] for binary BCH codes and later extended to RS codes in [16], the Chase-Pyndiah decoder consists of a soft-input hard-output Chase-2 decoder [17] augmented by a soft-output computation unit.
Given a soft-input sequence r = (r_1, ..., r_{mN}) corresponding to a row (N = N2) or column (N = N1) of R, the Chase-2 decoder first forms a binary hard-decision sequence y = (y_1, ..., y_{mN}). The reliability of the hard decision y_i on the ith bit is measured by the magnitude |r_i| of the corresponding soft input. Then, N_ep error patterns are generated by testing different combinations of 0 and 1 in the L_r least reliable bit positions. In general, N_ep ≤ 2^{L_r}, with equality if all combinations are considered. These error patterns are added modulo 2 to the hard-decision sequence y to form candidate sequences. Algebraic decoding of the candidate sequences returns a list with at most N_ep distinct candidate codewords. Among them, the codeword d at minimum Euclidean distance from the input sequence r is selected as the final decision.
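The candidate-generation step just described can be sketched as follows (the algebraic decoding of each candidate, done by the PGZ decoder of Section 4.2, is omitted here; the function name is illustrative).

```python
# Sketch of Chase-2 candidate generation: find the Lr least reliable
# positions of the soft input and form the N_ep = 2^Lr test sequences
# obtained by flipping every subset of those positions.
from itertools import product as iproduct

def chase2_candidates(r, Lr):
    """Return the hard-decision sequence and the 2^Lr test sequences."""
    y = [1 if v >= 0 else 0 for v in r]            # hard decisions
    # indices of the Lr least reliable bits (smallest |r_i|)
    weak = sorted(range(len(r)), key=lambda i: abs(r[i]))[:Lr]
    tests = []
    for flips in iproduct([0, 1], repeat=Lr):       # all 2^Lr error patterns
        cand = list(y)
        for pos, f in zip(weak, flips):
            cand[pos] ^= f                          # add pattern modulo 2
        tests.append(cand)
    return y, tests

r = [2.1, -0.2, 0.8, -1.7, 0.1]
y, tests = chase2_candidates(r, Lr=2)   # 4 test sequences, tests[0] == y
```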
Soft-output computation is then performed as follows. For a given bit i, the list of candidate codewords is searched for a competing codeword c at minimum Euclidean distance from r and such that c_i ≠ d_i. If such a codeword exists, then the soft output r′_i on the ith bit is given by

    r′_i = [ (‖r − c‖^2 − ‖r − d‖^2) / 4 ] × d_i,  (2)
Figure 2: Block diagram of the turbo decoder at the kth half-iteration. The SISO row/column decoder takes the channel matrix R and the input R_k = R + α_k W_k, and delivers the updated extrinsic matrix W_{k+1} and the hard decisions D_k.
where ‖·‖^2 denotes the squared norm of a sequence. Otherwise, the soft output is computed as follows:

    r′_i = r_i + β × d_i,  (3)

where β is a positive value, computed on a per-codeword basis, as suggested in [18]. Following the so-called "turbo principle," the soft input r_i is finally subtracted from the soft output r′_i to obtain the extrinsic information

    w_i = r′_i − r_i,  (4)

which will be sent to the next decoder.
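Rules (2)-(4) can be sketched as below. A sketch under two simplifying assumptions: bits and candidate codewords are in antipodal (±1) form, and β is a fixed constant here, whereas the paper computes it per codeword.

```python
# Sketch of the Pyndiah soft-output unit: for each bit, apply the
# Euclidean-distance rule (2) if a competing codeword exists, the
# fallback (3) otherwise, and return the extrinsic values (4).

def sqdist(a, b):
    """Squared Euclidean distance between two sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_output(r, d, candidates, beta=0.5):
    w = []
    for i in range(len(r)):
        # competing codewords disagreeing with the decision d on bit i
        comp = [c for c in candidates if c[i] != d[i]]
        if comp:
            c = min(comp, key=lambda c: sqdist(r, c))       # closest competitor
            rp = (sqdist(r, c) - sqdist(r, d)) / 4 * d[i]   # eq. (2)
        else:
            rp = r[i] + beta * d[i]                         # eq. (3)
        w.append(rp - r[i])                                 # extrinsic, eq. (4)
    return w
```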
3.2. Iterative decoding of the product code
The block diagram of the turbo decoder at the kth half-iteration is shown in Figure 2. A half-iteration stands for a row or column decoding step, and one iteration comprises two half-iterations. The input of the SISO decoder at half-iteration k is given by

    R_k = R + α_k W_k,  (5)

where α_k is a scaling factor used to attenuate the influence of extrinsic information during the first iterations, and where W_k = (w_{i,j}) is the extrinsic information matrix delivered by the SISO decoder at the previous half-iteration. The decoder outputs an updated extrinsic information matrix W_{k+1}, and possibly a matrix D_k of hard decisions. Decoding stops when a given maximum number of iterations has been performed, or when an early-termination condition (stop criterion) is met.
The use of a stop criterion can improve the convergence of the iterative decoding process and also reduce the average power consumption of the decoder by decreasing the average number of iterations required to decode a block. An efficient stop criterion taking advantage of the structure of product codes was proposed in [19]. Another simple and effective solution is to stop when the hard decisions do not change between two successive half-iterations (i.e., no further corrections are made).
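The iteration schedule of Figure 2, together with this simple stop criterion, can be sketched as follows. The SISO decoder is left as a placeholder (any row/column Chase-Pyndiah decoder returning an extrinsic matrix and hard decisions); the function names are illustrative.

```python
# Schematic turbo-decoding loop: alternate row and column half-iterations,
# feed each one R_k = R + alpha_k * W_k (eq. (5)), and stop early when the
# hard decisions no longer change between two successive half-iterations.

def turbo_decode(R, siso_decode, alphas, max_iters=8):
    W = [[0.0] * len(R[0]) for _ in R]      # extrinsic info, zero at start
    D_prev = None
    D = None
    for k in range(2 * max_iters):          # two half-iterations per iteration
        axis = 'rows' if k % 2 == 0 else 'cols'
        Rk = [[R[i][j] + alphas[k] * W[i][j] for j in range(len(R[0]))]
              for i in range(len(R))]       # eq. (5)
        W, D = siso_decode(Rk, axis)        # placeholder SISO half-iteration
        if D_prev is not None and D == D_prev:
            break                           # hard decisions stable: stop early
        D_prev = D
    return D
```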
4. RS PRODUCT CODE DESIGN FOR OPTICAL COMMUNICATIONS
Two optical communication scenarios have been identified as promising applications for third-generation FEC based on RS TPCs: 40 Gbps data transport over OTN, and 10 Gbps data transmission over PON. In this section, we first review the specific expectations of each application with respect to FEC. Then, we discuss the algorithmic issues that have been encountered and solved in order to design RS TPCs that are compatible with these requirements.
4.1. FEC design for data transmission over OTN and PON
40 Gbps transport over OTN calls for both high coding gains and low overhead (<10%). High coding gains are required in order to ensure high data integrity, with BER in the range 10^−13–10^−15. Low overheads limit the optical transmission impairments caused by bandwidth extension. Note that these two requirements usually conflict with each other to some extent. The complexity and power consumption of the decoding circuit are also important issues. A possible solution, proposed in [6], is to multiplex in parallel four powerful FEC devices at 10 Gbps. However, 40 Gbps low-cost line cards are a key to the deployment of 40 Gbps systems. Furthermore, the cost of line cards is primarily dominated by the electronics and optics operating at the serial line rate. Thus, a single low-cost 40 Gbps FEC device could compete favorably with the former solution if the loss in coding gain (if any) remains small enough.
For data transmission over PON, channel codes with low cost and low latency (small block size) are preferred to long codes (>10 kbits) with high coding gain. BER requirements are less stringent than for OTN and are typically of the order of 10^−11. High coding gains result in an increased link budget [20]. On the other hand, decoding complexity should be kept at a minimum in order to reduce the cost of the optical network units (ONUs) deployed at the end-user side. Channel codes for PON are also expected to be robust against burst errors.
4.2. Choice of the component codes
On the basis of the above-mentioned requirements, we have chosen to focus on RS product codes with less than 20% overhead. Higher overheads lead to larger signal bandwidth, thereby increasing in return the complexity of electronic and optical components. Since the rate of the product code is the product of the individual rates of the component codes, RS component codes with code rate R ≥ 0.9 are necessary. Such code rates can be obtained by considering multiple-error-correcting RS codes over large Galois fields, that is, GF(256) and beyond. Another solution is to use single-error-correcting (SEC) RS codes over Galois fields of smaller order (32 or 64). The latter solution has been retained in this work since it leads to low-complexity SISO decoders.
First, it is shown in [21] that 16 error patterns are sufficient to obtain near-optimum performance with the Chase-Pyndiah algorithm for SEC RS codes. In contrast, more sophisticated SISO decoders are required with multiple-error-correcting RS codes (e.g., see [22] or [23]), since the number of error patterns necessary to obtain near-optimum performance with the Chase-Pyndiah algorithm grows exponentially with mt for a t-error-correcting RS code over GF(2^m).
In addition, SEC RS codes admit low-complexity algebraic decoders. This feature further contributes to reducing the complexity of the Chase-Pyndiah algorithm. For multiple-error-correcting RS codes, the Berlekamp-Massey algorithm and the Euclidean algorithm are the preferred algebraic decoding methods [15], but they introduce unnecessary overhead computations for SEC codes. Instead, a simpler decoder is obtained from the direct decoding method devised by Peterson, Gorenstein, and Zierler (PGZ decoder) [24, 25]. First, the two syndromes S1 and S2 are calculated by evaluating the received polynomial r(x) at the two code roots α^b and α^{b+1}:

    S_i = r(α^{b+i−1}) = Σ_{ℓ=0}^{N−1} r_ℓ α^{(b+i−1)ℓ},  i = 1, 2.  (6)
If S1 = S2 = 0, r(x) is a valid codeword and decoding stops. If only one of the two syndromes is zero, a decoding failure is declared. Otherwise, the error locator X is calculated as

    X = S2 / S1,  (7)

from which the error location i is obtained by taking the discrete logarithm of X. The error magnitude E is finally given by

    E = S1 / X^b.  (8)
Hence, apart from the syndrome computation, at most two divisions over GF(2^m) are required to obtain the error position and value with the PGZ decoder (only one is needed when b = 0). The overall complexity of the PGZ decoder is usually dominated by the initial syndrome computation step. Fortunately, the syndromes need not be fully recomputed at each decoding attempt in the Chase-2 decoder. Rather, they can be updated in a very simple way by taking into account only the bits that are flipped between successive error patterns [26]. This optimization further alleviates the SISO decoding complexity.
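The PGZ steps (6)-(8) for a SEC RS code can be sketched as below. For compactness, a sketch over GF(16) (primitive polynomial x^4 + x + 1) rather than the GF(32)/GF(64) fields of the paper, with first root α^b for b = 0, for which the decoder simplifies to E = S1.

```python
# Sketch of the PGZ single-error decoder, eqs. (6)-(8), over GF(16).
# Field arithmetic uses exp/log tables built from the primitive polynomial.

M, POLY = 4, 0b10011                      # GF(2^4), x^4 + x + 1
EXP, LOG = [0] * 30, [0] * 16
x = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = x
    LOG[x] = i
    x <<= 1
    if x & 0x10:
        x ^= POLY

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def syndrome(r, j):
    """S_{j+1} = r(alpha^j), eq. (6) with b = 0."""
    s = 0
    for l, coef in enumerate(r):
        s ^= gf_mul(coef, EXP[(j * l) % 15])
    return s

def pgz_decode(r):
    """Correct at most one symbol error; return the corrected word."""
    S1, S2 = syndrome(r, 0), syndrome(r, 1)
    if S1 == 0 and S2 == 0:
        return r                          # valid codeword, decoding stops
    if S1 == 0 or S2 == 0:
        raise ValueError("decoding failure")
    X = EXP[(LOG[S2] - LOG[S1]) % 15]     # error locator, eq. (7)
    i = LOG[X]                            # error position = log of X
    r[i] ^= S1                            # magnitude E = S1 when b = 0, eq. (8)
    return r

# g(x) = (x - 1)(x - alpha) padded to length 15 is itself a codeword:
c = [2, 3, 1] + [0] * 12
```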
On the basis of the above arguments, two RS product codes have been selected for the two envisioned applications. The (31, 29)^2 RS product code over GF(32) has been retained for PON systems, since it combines a moderate overhead of 12.5% with a moderate code length of 4805 coded bits. This is only twice the code length of the classical (255, 239) RS code over GF(256). On the other hand, the (63, 61)^2 RS product code over GF(64) has been preferred for OTN, since it has a smaller overhead (6.3%), similar to the one introduced by the standard (255, 239) RS code, and also a larger coding gain, as we will see later.
4.3. Performance analysis and code optimization
RS product codes built from SEC RS component codes are very attractive from the decoding complexity point of view. On the other hand, they have a low minimum distance D = 3 × 3 = 9 at the symbol level. It is therefore crucial to verify that this low minimum distance does not introduce error flares in the code performance curve that would penalize the effective coding gain at low BER. Monte-Carlo simulations can be used to evaluate the code performance down to BER of 10^−10–10^−11 within a reasonable computation time. For lower BER, analytical bounding techniques are required.
In the following, binary on-off keying (OOK) intensity modulation with direct detection over an additive white Gaussian noise (AWGN) channel is assumed. This model was adopted here as a first approximation which simplifies the analysis and also facilitates the comparison with other channel codes. More sophisticated models of optical systems for the purpose of assessing the performance of channel codes are developed in [27, 28]. Under the previous assumptions, the BER of the RS product code at high SNRs and under ML soft-decision decoding is well approximated by the first term of the union bound:

    BER ≈ [d B_d / (2 m N1 N2)] erfc(Q √(d/2)),  (9)

where Q is the input Q-factor (see [29, Chapter 5]), d is the minimum distance of the binary image P_b of the product code, and B_d the corresponding multiplicity (number of codewords with minimum Hamming weight d in P_b). This expression shows that the asymptotic performance of the product code is determined by the bit-level minimum distance d of the product code, not by the symbol-level minimum distance D1 D2.
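The bound (9) is easy to evaluate numerically, as sketched below with the Table 1 parameters of the alternate (63, 61)^2 code (d = 14, B_d = 88,611,894, m = 6, N1 = N2 = 63). The conversion Q(dB) = 20 log10 Q is an assumption of this sketch, following the usual Q-factor convention.

```python
# Numerical sketch of the truncated union bound (9) for the post-decoding
# BER of an RS product code under ML soft-decision decoding.
import math

def union_bound_ber(q_db, d, B_d, m, N1, N2):
    Q = 10 ** (q_db / 20)     # Q-factor, assuming Q(dB) = 20*log10(Q)
    return (d * B_d) / (2 * m * N1 * N2) * math.erfc(Q * math.sqrt(d / 2))

# Estimated floor-region BER of the alternate (63,61)^2 code at Q = 9 dB:
ber = union_bound_ber(9.0, d=14, B_d=88611894, m=6, N1=63, N2=63)
```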
The knowledge of the quantities d and B_d is required in order to predict the asymptotic performance of the code in the high Q-factor (low BER) region using (9). These parameters depend in turn on the basis B used to represent the 2^m-ary symbols as bits, and are usually unknown. Computing the exact binary weight enumerator of RS product codes is indeed a very difficult problem. Even the symbol weight enumerator is hard to find, since it is not completely determined by the symbol weight enumerators of the component codes [30]. An average binary weight enumerator for RS product codes was recently derived in [31]. This enumerator is simple to calculate; however, simulations are still required to assess the tightness of the bounds for a particular code realization. A computational method that allows the determination of d and B_d under certain conditions was recently suggested in [32]. This method exploits the fact that product codewords with minimum symbol weight D1 D2 are readily constructed as the direct product of a minimum-weight row codeword with a minimum-weight column codeword. Specifically, there are exactly

    A_{D1 D2} = (2^m − 1) C(N1, D1) C(N2, D2)  (10)

distinct codewords with symbol weight D1 D2 in the product code C1 ⊗ C2, where C(n, k) denotes the binomial coefficient. They can be enumerated with the help of a computer, provided the number A_{D1 D2} of such codewords is not too large. Estimates d̂ and B_d̂ are then obtained by computing the Hamming weight of the binary expansion
Table 1: Minimum distance d and multiplicity B_d for the binary image of the (31, 29)^2 and (63, 61)^2 RS product codes as a function of the first code root α^b.

    Product code    | mK^2  | mN^2  | R     | b | d  | B_d
    (31, 29, 3)^2   | 4205  | 4805  | 0.875 | 1 | 9  | 217,186
                    |       |       |       | 0 | 14 | 6,465,608
    (63, 61, 3)^2   | 22326 | 23814 | 0.937 | 1 | 9  | 4,207,140
                    |       |       |       | 0 | 14 | 88,611,894
of those codewords. Necessarily, d ≤ d̂. If it can be shown that product codewords of symbol weight > D1 D2 necessarily have binary minimum distance > d̂ at the bit level (this is not always the case, depending on the value of d̂), then it follows that d = d̂ and B_d = B_d̂.
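As a quick check of how tractable the enumeration (10) is, the counts for the two codes of the paper (MDS component codes with D1 = D2 = 3) can be computed directly:

```python
# Count (10) of minimum-symbol-weight product codewords,
# A = (2^m - 1) * C(N1, D1) * C(N2, D2), for the two selected codes.
from math import comb

def A_min(m, N1, D1, N2, D2):
    return (2 ** m - 1) * comb(N1, D1) * comb(N2, D2)

a31 = A_min(5, 31, 3, 31, 3)   # (31,29)^2 over GF(32):  ~6.3e8 codewords
a63 = A_min(6, 63, 3, 63, 3)   # (63,61)^2 over GF(64):  ~9.9e10 codewords
```

Both counts are large but within reach of an exhaustive computer enumeration, consistent with the method of [32].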
This method has been used to obtain the binary minimum distance and multiplicity of the (31, 29)^2 and (63, 61)^2 RS product codes using narrow-sense component codes with generator polynomial g(x) = (x − α)(x − α^2). This is the classical definition of SEC RS codes that can be found in most textbooks. The results are given in Table 1. We observe that, in both cases, we are in the most unfavorable situation where the bit-level minimum distance d is equal to the symbol-level minimum distance D, and no greater. Simulation results for the two RS TPCs after 8 decoding iterations are shown in Figures 3 and 4, respectively. The corresponding asymptotic performance calculated using (9) is plotted in dashed lines. For comparison purposes, we have also included the performance of algebraic decoding of RS codes of similar code rate over GF(256). We observe that the low minimum distance introduces error flares at BER of 10^−8 and 10^−9 for the (31, 29)^2 and (63, 61)^2 product codes, respectively. Clearly, the two RS TPCs do not match the BER requirements imposed by the envisioned applications.
One solution to increase the minimum distance of the product code is to resort to code extension or expurgation. However, this approach increases the overhead. It also increases decoding complexity, since a higher number of error patterns is then required to maintain near-optimum performance with the Chase-Pyndiah algorithm [21]. In this work, another approach has been considered. Specifically, investigations have been conducted in order to identify code constructions that can be mapped into binary images with minimum distance larger than 9. One solution is to investigate different bases B. How to find a basis that maps a nonbinary code into a binary code with bit-level minimum distance strictly larger than the symbol-level designed distance remains a challenging research problem. Thus, the problem was relaxed by fixing the basis to be the polynomial basis, and studying instead the influence of the choice of the code roots on the minimum distance of the binary image. Any SEC RS code over GF(2^m) can be compactly described by its generator polynomial

    g(x) = (x − α^b)(x − α^{b+1}),  (11)
Figure 3: BER performance of the (31, 29)^2 RS product code as a function of the first code root α^b, after 8 iterations (uncoded OOK, RS(255, 223), RS(31, 29)^2 with b = 1, RS(31, 29)^2 with b = 0, and eBCH(128, 120)^2 shown for comparison).
where b is an integer in the range 0, ..., 2^m − 2. Narrow-sense RS codes are obtained by setting b = 1 (which is the usual choice for most applications). Note, however, that different values of b generate different sets of codewords, and thus different RS codes with possibly different binary weight distributions. In [32], it is shown that alternate SEC RS codes obtained by setting b = 0 have minimum distance d = D + 1 = 4 at the bit level. This is a notable improvement over classical narrow-sense (b = 1) RS codes, for which d = D = 3. This result suggests that RS product codes should preferably be built from two RS component codes with first root α^0. RS product codes constructed in this way will be called alternate RS product codes in the following.
We have computed the binary minimum distance d and multiplicity B_d of the (31, 29)^2 and (63, 61)^2 alternate RS product codes. The values are reported in Table 1. Interestingly, the alternate product codes have a minimum distance d as high as 14 at the bit level, at the expense of an increase of the error coefficient B_d. Thus, we get most of the gain offered by extended or expurgated codes (for which d = 16, as verified by computer search) but without reducing the code rate. It is also worth noting that this extra coding gain is obtained without increasing decoding complexity. The same SISO decoder is used for both narrow-sense and alternate SEC RS codes. In fact, the only modifications occur in (6)–(8) of the PGZ decoder, which actually simplify when b = 0. Simulated performance and asymptotic bounds for the alternate RS product codes are shown in Figures 3 and 4. A notable improvement is observed in comparison with the performance of the narrow-sense product codes, since the error flare is pushed down by several decades in both cases. By extrapolating the simulation results, the net coding gain (as defined in [5]) at a BER of 10^−13 is estimated to be
Figure 4: BER performance of the (63, 61)^2 RS product code as a function of the first code root α^b, after 8 decoding iterations (uncoded OOK, RS(255, 239), RS(63, 61)^2 with b = 1, RS(63, 61)^2 with b = 0, and eBCH(256, 247)^2 shown for comparison).
around 8.7 dB and 8.9 dB for the RS(31, 29)^2 and RS(63, 61)^2 codes, respectively. As a result, the two selected RS product codes are now fully compatible with the performance requirements imposed by the respective envisioned applications. More importantly, this achievement has been obtained at no cost.
4.4. Comparison with BCH product codes
A comparison with BCH product codes is in order, since BCH product codes have already found application in optical communications. A major limitation of BCH product codes is that very large block lengths (>60000 coded bits) are required to achieve high code rates (R > 0.9). On the other hand, RS product codes can achieve the same code rate as BCH product codes, but with a block size about 3 times smaller [21]. This is an interesting advantage since, as shown later in the paper, large block lengths increase the decoding latency and also the memory complexity in the decoder architecture. RS product codes are also expected to be more robust to error bursts than BCH product codes. Both coding schemes inherit burst-correction properties from the row-column interleaving in the direct product construction. But RS product codes also benefit from the fact that, in the most favorable case, m consecutive erroneous bits may cause a single symbol error in the received word.
A performance comparison has been carried out between the two selected RS product codes and extended BCH (eBCH) product codes of similar code rate: the eBCH(128, 120)^2 and the eBCH(256, 247)^2. Code extension has been used for BCH codes since it increases minimum distance without increasing decoding complexity or significantly decreasing the code rate, in contrast to RS codes. Both eBCH TPCs have minimum distance 16, with
Figure 5: BER performance of the (63, 61)^2 RS product code as a function of the number of quantization bits for the soft input (sign bit included): uncoded OOK, OOK + RS(255, 239), and OOK + RS(63, 61)^2 unquantized, with 3-bit, and with 4-bit quantization.
multiplicities 85344^2 and 690880^2, respectively. Simulation results after 8 iterations are shown in Figures 3 and 4. The corresponding asymptotic bounds are plotted in dashed lines. We observe that eBCH TPCs converge at lower Q-factors. As a result, a 0.3 dB gain is obtained at BER in the range 10^−8–10^−10. However, the large multiplicities of eBCH TPCs introduce a change of slope in the performance curves at lower BER. In fact, examination of the asymptotic bounds shows that alternate RS TPCs are expected to perform at least as well as eBCH TPCs in the BER range of interest for optical communication, for example, 10^−10–10^−15. Therefore, we conclude that RS TPCs compare favorably with eBCH TPCs in terms of performance. We will see in the next sections that RS TPCs have additional advantages in terms of decoding complexity and throughput for the target applications.
4.5. Soft-input quantization
The previous performance study assumed unquantized soft values. In a practical receiver, a finite number q of bits (sign bit included) is used to represent soft information. Soft-input quantization is performed by an analog-to-digital converter (ADC) in the receiver front-end. The very high bit rate in fiber-optic systems makes ADC a challenging issue. It is therefore necessary to study the impact of soft-input quantization on the performance. Figure 5 presents simulation results for the (63, 61)^2 alternate RS product code using q = 3 and q = 4 quantization bits, respectively. For comparison purposes, the performance without quantization is also shown. Using q = 4 bits yields virtually no degradation with respect to ideal (infinite) quantization, whereas q = 3 bits of quantization introduce a 0.5 dB penalty. Similar conclusions have been obtained with the (31, 29)^2 RS product code and also with various eBCH TPCs, as reported in [27, 33], for example.
5. FULL-PARALLEL TURBO DECODING ARCHITECTURE DEDICATED TO PRODUCT CODES
Designing turbo decoding architectures compatible with the very high line rates imposed by fiber-optic systems at reasonable cost is a challenging issue. Parallel decoding architectures are the only solution to achieve data rates above 10 Gbps. A simple architectural solution is to duplicate the elementary decoders in order to achieve the given throughput. However, this solution results in a turbo decoder with unacceptable cumulative area. Thus, smarter parallel decoding architectures have to be designed in order to better trade off performance and complexity under a high-throughput constraint. In the following, we focus on an (N^2, K^2) product code obtained from two identical (N, K) component codes over GF(2^m). For 2^m-ary RS codes, m > 1, whereas m = 1 for binary BCH codes.
5.1. Previous work
Many turbo decoder architectures for product codes have been proposed in the literature. The classical approach involves decoding all the rows or all the columns of a matrix before the next half-iteration. When an application requires high-speed decoders, an architectural solution is to cascade SISO elementary decoders for each half-iteration. In this case, memory blocks are necessary between each half-iteration to store channel data and extrinsic information. Each memory block is composed of four memories of mN^2 soft values. Thus, duplicating a SISO elementary decoder results in duplicating the memory block, which is very costly in terms of silicon area. In 2002, a new architecture for turbo decoding product codes was proposed [10]. The idea is to store several data at the same address and to perform semi-parallel decoding to increase the data rate. However, it is necessary to process these data by row and by column. Let us consider l adjacent rows and l adjacent columns of the initial matrix. The l^2 data constitute a word of the new matrix that has l^2 times fewer addresses. This data organization does not require any particular memory architecture. The results obtained show that the turbo decoding throughput is increased by l^2 when l elementary decoders processing l data simultaneously are used. Turbo decoding latency is divided by l. The area of the l elementary decoders is increased by l/2 while the memory is kept constant.
5.2. Full-parallel decoding principle

Figure 6: Full-parallel decoding of a product code matrix (N rows and N columns of N soft values; row decoders advance their index as i + 1 mod N, column decoders as j − 1 mod N).

All rows (or all columns) of a matrix can be decoded in parallel. If the architecture is composed of 2N elementary decoders, an appropriate treatment of the matrix allows the elimination of the matrix reconstruction between each half-iteration decoding step. Specifically, let i and j be the indices of a row and a column of the N × N matrix. In full-parallel processing, the row decoder i begins the
decoding by the soft value in the ith position. Moreover,
each row decoder processes the soft values by increasing the
index by one modulo N. Similarly, the column decoder j
begins the decoding by the soft value in the jth position.
In addition, each column decoder processes the soft values
by decreasing the index by one modulo N. In fact, full-
parallel decoding of turbo product codes is possible thanks
to the cyclic property of BCH and RS codes. Indeed, every
cyclic shift c' = (c_{N−1}, c_0, ..., c_{N−3}, c_{N−2}) of a codeword
c = (c_0, c_1, ..., c_{N−2}, c_{N−1}) is also a valid codeword in a cyclic
code. Therefore, only one clock period is necessary between
two successive matrix decoding operations. The full-parallel
decoding of an N × N product code matrix is described in
Figure 6. A similar strategy was previously presented in [34],
where memory access conflicts are resolved by means of an
appropriate treatment of the matrix.
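One way to see why this scheduling is conflict-free is to enumerate the accesses clock by clock. The check below is our reading of Figure 6, not code from the paper: at clock t, row decoder i touches cell (i, (i + t) mod N) and column decoder j touches cell ((j − t) mod N, j).

```python
N = 7  # illustrative code length of a cyclic code

for t in range(N):
    # Row decoder i processes cell (i, (i + t) % N): indices increase mod N.
    row_cells = {(i, (i + t) % N) for i in range(N)}
    # Column decoder j processes cell ((j - t) % N, j): indices decrease mod N.
    col_cells = {((j - t) % N, j) for j in range(N)}
    # At each clock, the row accesses form a permutation of the matrix
    # diagonal: one cell per row AND per column, so no two decoders
    # ever contend for the same soft value.
    assert len({c for _, c in row_cells}) == N
    assert len({r for r, _ in col_cells}) == N
    # Row and column decoders meet on the same N cells each cycle,
    # which is what allows the soft values to be exchanged through a
    # connection network instead of an intermediate memory block.
    assert row_cells == col_cells
```

Because each row decoder consumes a cyclically shifted version of the codeword, the cyclic property of BCH and RS codes guarantees that the shifted word is still a codeword of the same code.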
The elementary decoder latency depends on the structure
of the decoder (i.e., the number of pipeline stages) and the
code length N. Here, as the matrix reconstruction step is
removed, the latency between row and column decoding is
zero.
5.3. Full-parallel architecture for product codes
The major advantage of our full-parallel architecture is that it
enables the memory block of 4mN² soft values between each
half-iteration to be removed. However, the codeword soft
values exchanged between the row and column decoders have
to be routed. One solution is to use a connection network for
this task. In our case, we have chosen an Omega network. The
Omega network is one of several connection networks used
in parallel machines [35]. It is composed of log2(N) stages,
each having N/2 exchange elements. In fact, the Omega
network complexity in terms of number of connections and
of 2 × 2 switch transfer blocks is N × log2(N) and (N/2) log2(N),
respectively. For example, the equivalent gate complexity of
a 31 × 31 network can be estimated to be 200 logic gates
per exchanged bit. Figure 7 depicts a full-parallel architecture
for the turbo decoding of product codes. It is composed of
cascaded modules for the turbo decoder. Each module is
dedicated to one iteration. However, it is possible to process
several iterations with the same module. In our approach, 2N
elementary decoders and 2 connection blocks are necessary
for one module. A connection block is composed of 2 Omega
networks exchanging the R and R_k soft values. Since the
Omega network has low complexity, the full-parallel turbo
decoder complexity essentially depends on the complexity of
the elementary decoder.
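The stage structure of an Omega network can be made concrete with the classical destination-tag routing rule (each stage is a perfect shuffle followed by a 2 × 2 exchange element whose setting is one bit of the destination address). The paper does not detail the network control, so the sketch below is an illustrative model of a generic Omega network [35], with our own names.

```python
import math

def omega_route(src, dst, n):
    """Trace input src through an n-input Omega network to output dst.

    n must be a power of 2. Each of the log2(n) stages applies a
    perfect shuffle (rotate the position bits left by one) and then a
    2x2 exchange element that forces the low bit to the corresponding
    bit of the destination tag (destination-tag routing).
    """
    stages = int(math.log2(n))
    pos, path = src, []
    for s in range(stages):
        pos = ((pos << 1) | (pos >> (stages - 1))) & (n - 1)  # shuffle
        bit = (dst >> (stages - 1 - s)) & 1                   # exchange
        pos = (pos & ~1) | bit
        path.append(pos)
    return path

# Any single source reaches any destination in log2(n) stages.
n = 8
for src in range(n):
    for dst in range(n):
        assert omega_route(src, dst, n)[-1] == dst
```

Note that an Omega network is blocking for some simultaneous permutations; the regular, cyclic access pattern of full-parallel TPC decoding is what makes such a low-cost network sufficient here.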
5.4. Elementary SISO decoder architecture
The block diagram of an elementary SISO decoder is shown
in Figure 2, where k stands for the current half-iteration
number. R_k is the soft-input matrix computed from the
previous half-iteration, whereas R denotes the initial matrix
delivered by the receiver front-end (R_k = R for the 1st
half-iteration). W_k is the extrinsic information matrix.
α_k is a scaling factor that depends on the current half-
iteration and which is used to mitigate the influence of the
extrinsic information during the first iterations. The decoder
architecture is structured in three pipelined stages identified
as reception, processing, and transmission units [36]. During
each stage, the N soft values of the received word R_k are
processed sequentially in N clock periods. The reception
stage computes the initial syndromes S_i and finds the L_r
least reliable bits in the received word. The main function
of the processing stage is to build and then to correct the
N_ep error patterns obtained from the initial syndrome and
the combinations of the least reliable bits. Moreover, the
processing stage also has to produce a metric (the Euclidean
distance between the error pattern and the received word)
for each error pattern. Finally, a selection function identifies
the maximum-likelihood codeword d and the competing
codewords c (if any). The transmission stage performs
different functions: computing the reliability of each binary
soft value, computing the extrinsic information, and
correcting the received soft values. The N soft values of the
codeword are thus corrected sequentially. The decoding
process needs to access the R and R_k soft values during the
three decoding phases. For this reason, these words are
implemented in six random access memories (RAMs) of size
q × m × N controlled by a finite-state machine. In summary,
a full-parallel TPC decoder architecture requires
low-complexity decoders.
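The reception and processing stages follow a Chase-type flow [17]: take hard decisions, locate the L_r least reliable positions, and derive N_ep candidate error patterns from them. The sketch below illustrates only the binary test-pattern generation; the actual decoder works at the symbol level with syndrome-based algebraic correction, and the function and variable names here are ours, not the paper's.

```python
from itertools import product

def chase_test_patterns(llrs, n_lr):
    """Generate Chase-type test patterns from a soft word (list of LLRs).

    Returns the 2**n_lr candidate hard-decision words obtained by
    flipping every subset of the n_lr least reliable bits. Each
    candidate would then be algebraically decoded and scored by its
    Euclidean distance to the received word.
    """
    hard = [1 if l < 0 else 0 for l in llrs]              # hard decisions
    order = sorted(range(len(llrs)), key=lambda i: abs(llrs[i]))
    weak = order[:n_lr]                                   # least reliable
    patterns = []
    for flips in product([0, 1], repeat=n_lr):
        cand = hard[:]
        for pos, f in zip(weak, flips):
            cand[pos] ^= f
        patterns.append(cand)
    return patterns

llrs = [2.5, -0.3, 1.1, -4.0, 0.2]
pats = chase_test_patterns(llrs, 2)
assert len(pats) == 4                # N_ep = 2**L_r error patterns
assert pats[0] == [0, 1, 0, 1, 0]    # first pattern = hard decision
```

The hardware complexity of the processing stage grows with the number of patterns generated here, which is why N_ep directly trades decoding performance against silicon area.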
6. COMPLEXITY AND THROUGHPUT
ANALYSIS OF THE FULL-PARALLEL
REED-SOLOMON TURBO DECODERS
Increasing the throughput regardless of the turbo decoder
complexity is not relevant. In order to compare the through-
put and complexity of RS and BCH turbo decoders, we
propose to measure the efficiency η of a parallel architecture
by the ratio

η = T/C, (12)

where T is the throughput and C is the complexity of
the design. An efficient architecture is expected to have a
high η ratio, that is, a high throughput with low hardware
complexity. In this section, we determine and compare the
efficiency of TPC decoders based on SEC BCH and RS
component codes, respectively.
Raphaël Le Bidan et al. 9
Figure 7: Full-parallel architecture for decoding of product codes. (Each module, dedicated to one iteration, contains N elementary row decoders and N elementary column decoders separated by connection blocks; modules are cascaded.)
6.1. Turbo decoder complexity analysis
A turbo decoder of product codes corresponds to the cumu-
lative area of computation resources, memory resources, and
communication resources. In a full-parallel turbo decoder,
the main part of the complexity is composed of memory
and computation resources. Indeed, the major advantage
of our full-parallel architecture is that it enables the
memory blocks between each half-iteration to be replaced
by Omega connection networks. Communication resources
thus represent less than 1% of the total area of the turbo
decoder. Consequently, the following study will only focus
on memory and computation resources.
6.1.1. Complexity analysis of computation resources
The computation resources of an elementary decoder are
split into three pipelined stages. The reception and transmis-
sion stages have O(log(N)) complexity. For these two stages,
replacing a BCH code by an RS code of the same code length N
(at the symbol level) over GF(2^m) results in an increase of
both complexity and throughput by a factor m. As a result,
efficiency is constant in these parts of the decoder. However,
the hardware complexity of the processing stage increases
linearly with the number N_ep of error patterns. Consequently,
the increase in the local parallelism rate has no influence
on the area of this stage and thus increases the efficiency
of an RS SISO decoder. In order to verify these general
considerations, turbo decoders for the (15,13)², (31,29)²,
and (63,61)² RS product codes were described in HDL
language and synthesized. Logic syntheses were performed
using the Synopsys Design Compiler tool with an
STMicroelectronics 90 nm CMOS process. All designs were
clocked at 100 MHz. The complexity of BCH turbo decoders
was estimated thanks to a generic complexity model which
can deliver an estimation of the gate count for any code size
and any set of decoding parameters. Therefore, taking into
account the implementation and performance constraints,
this model can be used to select a code size N and a set
of decoding parameters [37]. In particular, the number of
error patterns N_ep and also the number of competing code-
Table 2: Computation resource complexity of selected TPC
decoders in terms of gate count.

Code            Rate   Elementary decoder   Full-parallel module
(32,26)² BCH    0.66   2,791                178,624
(64,57)² BCH    0.79   3,139                401,792
(128,120)² BCH  0.88   3,487                892,672
(15,13)² RS     0.75   3,305                99,150
(31,29)² RS     0.88   4,310                267,220
(63,61)² RS     0.94   6,000                756,000
words kept for soft-output computation directly affect both
the hardware complexity and the decoding performance.
Increasing these parameter values improves performance but
also increases complexity.

Table 2 summarizes some computation resource com-
plexities in terms of gate count for different BCH and
RS product codes. Firstly, the complexity of an elementary
decoder for each product code is given. The results clearly
show that RS elementary decoders are more complex than
BCH elementary decoders over the same Galois field.
Complexity results for a full-parallel module of the turbo
decoding process are also given in Table 2. As described
in Figure 7, a full-parallel module is composed of 2N
elementary decoders and 2 connection blocks for one
iteration. In this case, full-parallel modules composed of RS
elementary decoders are seen to be less complex than full-
parallel modules composed of BCH elementary decoders
when comparing eBCH and RS product codes of similar
code rate R. For instance, for a code rate R = 0.88, the
computation resource complexities in terms of gate count
are about 892,672 and 267,220 for the BCH (128,120)² and
RS (31,29)² codes, respectively. This is due to the fact that RS
codes need a smaller code length N (at the symbol level) to
achieve a given code rate, in contrast to binary BCH codes.
Considering again the previous example, only 31 × 2 decoders
are necessary in the RS case for full-parallel decoding
compared to 128 × 2 decoders in the BCH case. Similarly,
Figure 8: Comparison of computation resource complexity (computation logic gate count, in Mgates, versus degree of parallelism, for BCH and RS block turbo decoders).
Figure 8 gives the computation resource area of BCH and RS
turbo decoders for 1 iteration and different parallelism
degrees. We verify that a higher P (i.e., a higher throughput)
can be obtained with fewer computation resources using RS
turbo decoders. This means that RS product codes are more
efficient in terms of computation resources for full-parallel
architectures dedicated to turbo decoding.
6.1.2. Complexity analysis of memory resources
A half-iteration of a parallel turbo decoder contains N banks
of q × m × N bits. The internal memory complexity of a par-
allel decoder for one half-iteration can be approximated by

S_RAM ≈ γ × q × m × N², (13)

where γ is a technological parameter specifying the number
of equivalent gate counts per memory bit, q is the number
of quantization bits for the soft values, and m is the number
of bits per Galois field element. Using (17), it can also be
expressed as

S_RAM = γ × (P²/m) × q, (14)

where P is the parallelism degree, defined as the number of
generated bits per clock period (t0).
Let us consider a BCH code and an RS code of
similar code length N = 2^m − 1. For BCH codes, a symbol
corresponds to 1 bit, whereas it is made of m bits for RS
codes. Calculating the SISO memory area for both BCH and
RS codes gives the following ratio:

S_RAM(BCH)/S_RAM(RS) = m = log2(N + 1). (15)

This result shows that RS turbo decoders have lower memory
complexity for a given parallelism rate. This was confirmed
by the memory area estimation results shown in Figure 9.
The random access memory (RAM) areas of BCH and RS turbo
decoders for a half-iteration and different parallelism degrees
Figure 9: Comparison of internal RAM complexity (RAM gate count, in Mgates, versus degree of parallelism, for BCH and RS block turbo decoders).
are plotted using a memory area estimation model provided
by STMicroelectronics. We can observe that a higher P (i.e., a
higher throughput) can be obtained with less memory when
using an RS turbo decoder. Thus, full-parallel decoding of
RS codes is more memory-efficient than BCH code turbo
decoding.
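The memory ratio of eq. (15) can be checked numerically from eq. (14). The parameter values below are illustrative (γ in particular is a placeholder, not a figure from the paper):

```python
import math

def ram_gates(gamma, q, m, P):
    """Internal RAM gate count for one half-iteration, eq. (14):
    S_RAM = gamma * q * P**2 / m, with parallelism degree P = N * m."""
    return gamma * q * P ** 2 / m

# Illustrative parameters: gamma gates per memory bit, q-bit soft
# values, identical parallelism degree P for both decoders.
gamma, q, P = 1.5, 5, 155
m = 5                      # RS over GF(32): N = 31, so P = N * m = 155
N = 2 ** m - 1

bch = ram_gates(gamma, q, 1, P)   # binary BCH: 1 bit per symbol
rs = ram_gates(gamma, q, m, P)    # RS: m bits per symbol
assert bch / rs == m == math.log2(N + 1)   # eq. (15)
```

At equal parallelism, the RS decoder therefore needs m = log2(N + 1) times less internal RAM than its binary BCH counterpart.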
6.2. Turbo decoder throughput analysis

In order to maximize the data rate, decoding resources are
assigned to each decoding iteration. The throughput of a
turbo decoder can be defined as

T = P × R × f0, (16)

where R is the code rate and f0 = 1/t0 is the maximum fre-
quency of an elementary SISO decoder. Ultra-high through-
put can be reached by increasing these three parameters.

(i) R is a parameter that exclusively depends on the code
considered. Thus, using codes with a higher code rate (e.g.,
RS codes) provides a larger throughput.

(ii) In a full-parallel architecture, maximum through-
put is obtained by duplicating N elementary decoders
generating m soft values per clock period. The parallelism
degree can be expressed as

P = N × m. (17)

Therefore, an enhanced parallelism degree can be obtained by
using nonbinary codes (e.g., RS codes) with larger code
length N.

(iii) Finally, in a high-speed architecture, each elemen-
tary decoder has to be optimized in terms of working
frequency f0. This is accomplished by including pipeline
stages within each elementary SISO decoder. RS and BCH
turbo decoders of equivalent code size have equivalent
working frequency f0 since RS decoding is performed by
introducing some local parallelism at the soft value level.
This result was verified during logic syntheses. The main
drawback of pipelining elementary decoders is the extra
complexity generated by the internal memory requirement.
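Equations (16) and (17) can be combined into a one-line throughput estimate, reproducing the T column of Table 3:

```python
def throughput_bps(N, m, R, f0):
    """Turbo decoder throughput, eqs. (16)-(17): T = P * R * f0
    with full-parallel degree P = N * m (bits per clock period)."""
    P = N * m
    return P * R * f0

# (31,29)^2 RS product code: N = 31 symbols of m = 5 bits, R ~ 0.88.
T_rs = throughput_bps(31, 5, 0.88, 100e6)
assert round(T_rs / 1e9, 2) == 13.64       # matches Table 3 (Gbps)

# (128,120)^2 binary BCH at the same clock: P = 128, R = 0.88.
T_bch = throughput_bps(128, 1, 0.88, 100e6)
assert round(T_bch / 1e9, 2) == 11.26      # matches Table 3
```

At the same code rate and clock frequency, the RS code reaches a higher throughput simply because its parallelism degree P = N × m is larger.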
Table 3: Hardware efficiency of selected TPC decoders.

Code            R      P     T      C      η
(32,26)² BCH    0.66   32    2.11   201    10.5
(64,57)² BCH    0.79   64    5.06   508    9.97
(128,120)² BCH  0.88   128   11.26  1361   8.27
(15,13)² RS     0.75   60    4.5    128    35.0
(31,29)² RS     0.88   155   13.64  396    34.4
(63,61)² RS     0.94   378   35.5   1312   27
Since RS codes have a higher P and R for equivalent
f0, an RS turbo decoder can reach a higher data rate than
an equivalent BCH turbo decoder. However, the increase in
throughput cannot be considered regardless of the turbo
decoder complexity.
6.3. Turbo product code comparison:
throughput versus complexity

The efficiency η between the decoder throughput and the
decoder complexity can be used to compare eBCH and RS
turbo product codes. We have reported in Table 3 the code
rate R, the parallelism degree P, the throughput T (Gbps),
the complexity C (kgates), and the efficiency η (kbps/gate) for
each code. All designs have been clocked at f0 = 100 MHz for
the computation of the throughput T. An average ratio of 3.5
between RS and BCH decoder efficiency is observed.
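The efficiency column of Table 3 follows directly from eq. (12); the short check below recomputes it from the tabulated (T, C) pairs (unit conversion is ours: Gbps over kgates gives kbps/gate after a factor of 1000):

```python
def efficiency(T_gbps, C_kgate):
    """Hardware efficiency of eq. (12): eta = T / C in kbps/gate,
    with T in Gbps and C in kgates as reported in Table 3."""
    return T_gbps / C_kgate * 1e3

bch = [(2.11, 201), (5.06, 508), (11.26, 1361)]   # BCH rows of Table 3
rs = [(4.5, 128), (13.64, 396), (35.5, 1312)]     # RS rows of Table 3

eta_bch = [efficiency(T, C) for T, C in bch]
eta_rs = [efficiency(T, C) for T, C in rs]
assert round(eta_bch[0], 1) == 10.5               # Table 3: eta = 10.5
# Row by row, the RS decoders are roughly 3 to 4 times more efficient.
for eb, er in zip(eta_bch, eta_rs):
    assert 3.0 < er / eb < 4.0
```
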
The good compromise between performance, through-
put, and complexity clearly makes RS product codes good
candidates for next-generation PON and OTN. In particular,
the (31,29)² RS product code is compatible with the 10 Gbps
line rate envisioned for PON evolutions. Similarly, the
(63,61)² RS product code can be used for data transport over
OTN at 40 Gbps, provided the turbo decoder is clocked at a
frequency slightly higher than 100 MHz.
7. IMPLEMENTATION OF AN RS TURBO DECODER FOR
ULTRA-HIGH-THROUGHPUT COMMUNICATION
An experimental setup based on FPGA devices has been
designed in order to show that RS TPCs can effectively
be used in the physical layer of 10 Gbps optical access
networks. Based on the previous analysis, the (31,29)² RS
TPC was selected since it offers the best compromise between
performance and complexity for this kind of application.
7.1. 10 Gbps experimental setup
The experimental setup is composed of a board that includes
6 Xilinx Virtex-5 LX330 FPGAs [38]. A Xilinx Virtex-5
LX330 FPGA contains 51,840 slices that can emulate up to
12 million gates of logic. It should be noted that Virtex-5
slices are organized differently from previous generations.
Each Virtex-5 slice contains four lookup tables (LUTs)
and four flip-flops instead of the two LUTs and two flip-flops
in previous-generation devices. The board is hosted on a
64-bit, 66 MHz PCI bus that enables communication at
full PCI bandwidth with a computer. An FPGA-embedded
memory block containing 10 encoded and noisy product
code matrices is used to generate input data towards the
turbo decoder. This memory block exchanges data with a
computer thanks to the PCI bus.
One decoding iteration was implemented on each FPGA,
resulting in a 6-full-iteration turbo decoder as shown in
Figure 10. Each decoding module corresponds to a full-
parallel architecture dedicated to the decoding of a matrix
of 31 × 31 coded soft values. We recall here that a coded
soft value over GF(32) is mapped onto 5 LLR values, each
LLR being quantized on 5 bits. Besides, the decoding process
needs to access the 31 coded soft values from each of the
matrices R and R_k during the three decoding phases of a
half-iteration, as explained in Section 4. For these reasons,
31 × 5 × 5 × 2 = 1,550 bits have to be exchanged between the
decoding modules during each clock period (f0 = 65 MHz).
The board offers 200 chip-to-chip LVDS pairs for each FPGA-
to-FPGA interconnect. Unfortunately, this number of LVDS
pairs is insufficient to enable the transmission of all the bits
between the decoding modules. To solve this implementation
constraint, we have chosen to add SERializer/DESerializer
(SERDES) modules for the parallel-to-serial and serial-to-
parallel conversions in each FPGA. Indeed, a SERDES is a
pair of functional blocks commonly used in high-speed
communications to convert data between parallel and serial
interfaces in each direction. SERDES modules are clocked at
f1 = 2 × f0 = 130 MHz and operate at 8:1 serialization or
1:8 deserialization. In this way, all data can be exchanged
between the different decoding modules. Finally, the total
occupation rate of the FPGA that contains the most complex
design (decoding module + two SERDES modules + memory
block + PCI protocol module) is slightly higher than 66%.
This corresponds to 34,215 Virtex-5 slices. Note that the
decoding module represents only 37% of the total design
complexity. More details about this are given in the next
section.
Currently, a new design phase of the experimental setup
is in progress. The objective is to include a channel emulator
and BER measurement facilities in order to verify the
decoding performance of the turbo decoder by plotting BER
curves as in our previous experimental setup [37].
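A quick link-budget check makes the SERDES choice plausible. This is our back-of-the-envelope reading of the setup (the exact SERDES usage on the board may differ): with 8:1 serialization at f1 = 2 × f0, each LVDS pair carries 16 bits per decoder clock period.

```python
# FPGA-to-FPGA link budget, illustrative interpretation only.
f0 = 65e6              # decoder clock (Hz)
f1 = 2 * f0            # SERDES clock: 130 MHz, 8:1 serialization
bits_per_cycle = 31 * 5 * 5 * 2      # R and R_k words per f0 period
assert bits_per_cycle == 1550

lvds_pairs = 200
serial_rate = 8 * f1                           # bits/s on one LVDS pair
link_capacity = lvds_pairs * serial_rate / f0  # bits per f0 period
assert link_capacity >= bits_per_cycle         # 3200 >= 1550
```

Without serialization, the 200 pairs would carry only 200 bits per cycle, far short of the 1,550 required, which is exactly the constraint the SERDES modules resolve.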
7.2. Characteristics and performance of
the implemented decoding module
A decoding module for one iteration is composed of 31 ×
2 = 62 elementary decoders and 2 connection blocks.
Each elementary decoder uses information quantized on
5 bits with N_ep = 8 error patterns and only 1 competing
codeword. These reduced parameter values allow a decrease
in the required area for a performance degradation which
remains below 0.5 dB. Thus a (31,29) RS elementary
decoder occupies 729 slice LUTs, 472 slice flip-flops, and
3 BlockRAMs of 18 Kbits. A connection block occupies only
2,325 slice LUTs. The computation resources of a decoding
module take up 29,295 slice flip-flops and 49,848 slice
LUTs. This means that the occupation rates are about 14%
and 24% of a Xilinx Virtex-5 LX330 FPGA for slice registers
and slice LUTs, respectively. Besides, the memory resources for
Figure 10: 10 Gbps experimental setup for turbo decoding of the (31,29)² RS product code. (Six XC5VLX330 FPGAs are cascaded, each hosting one full-parallel decoding module of N elementary row decoders and N elementary column decoders with connection blocks; SERDES modules and 200 LVDS signals link the FPGAs, a BlockRAM feeds the chain, and the global clock is f0 = 65 MHz.)
the decoding module take up 186 BlockRAMs of 18 Kbits.
This represents 32% of the total BlockRAM available in the
Xilinx Virtex-5 LX330 FPGA. Note that one BlockRAM of
18 Kbits is allocated by the Xilinx ISE tool to memorize only
31 × 5 × 5 = 775 bits in our design. The occupation rate
of each BlockRAM of 18 Kbits is then only about 4%. Input
data are clocked at f0 = 65 MHz, resulting in a data rate of
T_in = 10 Gbps at the turbo decoder input. By taking into
account the code rate R = 0.87, the information rate becomes
T_out = 8.7 Gbps. In conclusion, the implementation results
showed that a turbo decoder dedicated to the (31,29)² RS
product code can effectively be integrated into the physical
layer of a 10 Gbps optical access network.
7.3. (63,61)² RS TPC complexity estimation for
a 40 Gbps transmission over OTN
A similar prototype based on the (63,61)² RS TPC can be
designed for 40 Gbps transmission over OTN. Indeed, the
architecture of one decoding iteration is the same for the
two RS TPCs considered in this work. For the (63,61)²
RS product code, a decoding module for one iteration is
now composed of 63 × 2 = 126 elementary decoders and
2 connection blocks. Logic syntheses were performed using
the Xilinx ISE tool to estimate the complexity of a (63,61)
RS elementary decoder. This decoder occupies 1,070 slice
LUTs, 660 slice flip-flops, and 3 BlockRAMs of 18 Kbits. These
estimations immediately give the complexity of a decoding
module dedicated to one iteration. The computation resources
of a (63,61)² RS decoding module take up 83,160 slice flip-
flops and 134,820 slice LUTs. The occupation rates are then
about 40% and 65% of a Xilinx Virtex-5 LX330 FPGA for
slice registers and slice LUTs, respectively. The memory resources
of a (63,61)² RS decoding module take up 378 BlockRAMs of
18 Kbits, which represents 65% of the total BlockRAM available
in the considered FPGA device. One BlockRAM of 18 Kbits is
allocated by the Xilinx ISE tool to memorize only 63 × 6 × 5 =
1,890 bits. For a (63,61) RS elementary decoder, the occupa-
tion rate of each BlockRAM of 18 Kbits is only about 10.5%.
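The module-level totals follow from the per-decoder synthesis figures. The check below reproduces them; the LX330 device totals we divide by (51,840 slices × 4 LUTs/flip-flops per slice, and 576 18-Kbit BlockRAMs) are our assumption about the device, not figures stated in the paper.

```python
# Resource totals for the (63,61)^2 RS decoding module, from the
# per-decoder synthesis results (1,070 LUTs, 660 FFs, 3 BlockRAMs).
decoders = 63 * 2
luts, ffs, brams = decoders * 1070, decoders * 660, decoders * 3
assert (luts, ffs, brams) == (134820, 83160, 378)

# Assumed Virtex-5 LX330 totals: 51,840 slices x 4 LUTs (and 4 FFs)
# per slice, 576 BlockRAMs of 18 Kbits.
total_luts = total_ffs = 51840 * 4
assert round(100 * luts / total_luts) == 65    # "about 65%"
assert round(100 * ffs / total_ffs) == 40      # "about 40%"
assert 65 <= 100 * brams / 576 <= 66           # "65%" in the text
```

Note that the connection blocks (2,325 LUTs each for the (31,29)² case) are not included in these totals, so the figures are a slight underestimate of the full module.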
8. CONCLUSION

We have investigated the use of RS product codes for
forward error correction in high-capacity fiber-optic trans-
port systems. A complete study considering all the aspects
of the problem, from code optimization to turbo product
code implementation, has been performed. Two specific
applications were envisioned: 40 Gbps line rate transmis-
sion over OTN and 10 Gbps data transmission over PON.
Algorithmic issues have been ordered and solved in order to
design RS turbo product codes that are compatible with the
respective requirements of the two transmission scenarios.
A novel full-parallel turbo decoding architecture has been
introduced. This architecture allows decoding of TPCs at
data rates of 10 Gbps and beyond. In addition, a comparative
study has been carried out between eBCH and RS TPCs
in the context of optical communications. The results have
shown that high-rate RS TPCs offer similar performance
at reduced hardware complexity. Finally, we have described
the successful realization of an RS turbo decoder prototype
for 10 Gbps data transmission. This experimental setup
demonstrates the practicality and also the benefits offered
by RS TPCs in lightwave systems. Although only fiber-optic
communications have been considered in this work, RS TPCs
may also be attractive FEC solutions for next-generation
free-space optical communication systems.
ACKNOWLEDGMENTS

The authors wish to acknowledge the financial support of
France Telecom R&D. They also thank Gérald Le Mestre
for his significant help during the experimental setup design
phase. This paper was presented in part at the IEEE International
Conference on Communications, Glasgow, Scotland, in June
2007.
REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo-codes (1)," in Proceedings of the IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] R. G. Gallager, "Low-density parity-check codes," IEEE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.
[3] D. J. Costello Jr. and G. D. Forney Jr., "Channel coding: the road to channel capacity," Proceedings of the IEEE, vol. 95, no. 6, pp. 1150–1177, 2007.
[4] S. Benedetto and G. Bosco, "Channel coding for optical communications," in Optical Communication: Theory and Techniques, E. Forestieri, Ed., chapter 8, pp. 63–78, Springer, New York, NY, USA, 2005.
[5] T. Mizuochi, "Recent progress in forward error correction for optical communication systems," IEICE Transactions on Communications, vol. E88-B, no. 5, pp. 1934–1946, 2005.
[6] T. Mizuochi, "Recent progress in forward error correction and its interplay with transmission impairments," IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 4, pp. 544–554, 2006.
[7] "Forward error correction for high bit rate DWDM submarine systems," International Telecommunication Union, ITU-T Recommendation G.975.1, February 2004.
[8] R. Pyndiah, A. Glavieux, A. Picart, and S. Jacq, "Near optimum decoding of product codes," in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '94), vol. 1, pp. 339–343, San Francisco, Calif, USA, November-December 1994.
[9] K. Gracie and M.-H. Hamon, "Turbo and turbo-like codes: principles and applications in telecommunications," Proceedings of the IEEE, vol. 95, no. 6, pp. 1228–1254, 2007.
[10] J. Cuevas, P. Adde, S. Kerouedan, and R. Pyndiah, "New architecture for high data rate turbo decoding of product codes," in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '02), vol. 2, pp. 1363–1367, Taipei, Taiwan, November 2002.
[11] C. Jégo, P. Adde, and C. Leroux, "Full-parallel architecture for turbo decoding of product codes," Electronics Letters, vol. 42, no. 18, pp. 1052–1054, 2006.
[12] T. Mizuochi, Y. Miyata, T. Kobayashi, et al., "Forward error correction based on block turbo code with 3-bit soft decision for 10-Gb/s optical communication systems," IEEE Journal of Selected Topics in Quantum Electronics, vol. 10, no. 2, pp. 376–386, 2004.
[13] I. B. Djordjevic, S. Sankaranarayanan, S. K. Chilappagari, and B. Vasic, "Low-density parity-check codes for 40-Gb/s optical transmission systems," IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 4, pp. 555–562, 2006.
[14] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, The Netherlands, 1977.
[15] R. E. Blahut, Algebraic Codes for Data Transmission, Cambridge University Press, Cambridge, UK, 2003.
[16] O. Aitsab and R. Pyndiah, "Performance of Reed-Solomon block turbo code," in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '96), vol. 1, pp. 121–125, London, UK, November 1996.
[17] D. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 170–182, 1972.
[18] P. Adde and R. Pyndiah, "Recent simplifications and improvements in block turbo codes," in Proceedings of the 2nd International Symposium on Turbo Codes and Related Topics, pp. 133–136, Brest, France, September 2000.
[19] R. Pyndiah, "Iterative decoding of product codes: block turbo codes," in Proceedings of the 1st International Symposium on Turbo Codes and Related Topics, pp. 71–79, Brest, France, September 1997.
[20] J. Briand, F. Payoux, P. Chanclou, and M. Joindot, "Forward error correction in WDM PON using spectrum slicing," Optical Switching and Networking, vol. 4, no. 2, pp. 131–136, 2007.
[21] R. Zhou, R. Le Bidan, R. Pyndiah, and A. Goalic, "Low-complexity high-rate Reed-Solomon block turbo codes," IEEE Transactions on Communications, vol. 55, no. 9, pp. 1656–1660, 2007.
[22] P. Sweeney and S. Wesemeyer, "Iterative soft-decision decoding of linear block codes," IEE Proceedings: Communications, vol. 147, no. 3, pp. 133–136, 2000.
[23] M. Lalam, K. Amis, D. Leroux, D. Feng, and J. Yuan, "An improved iterative decoding algorithm for block turbo codes," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '06), pp. 2403–2407, Seattle, Wash, USA, July 2006.
[24] W. W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IEEE Transactions on Information Theory, vol. 6, no. 4, pp. 459–470, 1960.
[25] D. Gorenstein and N. Zierler, "A class of error correcting codes in p^m symbols," Journal of the Society for Industrial and Applied Mathematics, vol. 9, no. 2, pp. 207–214, 1961.
[26] S. A. Hirst, B. Honary, and G. Markarian, "Fast Chase algorithm with an application in turbo decoding," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1693–1699, 2001.
[27] G. Bosco, G. Montorsi, and S. Benedetto, "Soft decoding in optical systems," IEEE Transactions on Communications, vol. 51, no. 8, pp. 1258–1265, 2003.
[28] Y. Cai, A. Pilipetskii, A. Lucero, M. Nissov, J. Chen, and J. Li, "On channel models for predicting soft-decision error correction performance in optically amplified systems," in Proceedings of the Optical Fiber Communications Conference (OFC '03), vol. 2, pp. 532–533, Atlanta, Ga, USA, March 2003.
[29] G. P. Agrawal, Lightwave Technology: Telecommunication Systems, John Wiley & Sons, Hoboken, NJ, USA, 2005.
[30] L. M. G. M. Tolhuizen, "More results on the weight enumerator of product codes," IEEE Transactions on Information Theory, vol. 48, no. 9, pp. 2573–2577, 2002.
[31] M. El-Khamy and R. Garello, "On the weight enumerator and the maximum likelihood performance of linear product codes," IEEE Transactions on Information Theory, arXiv:cs.IT/0601095 (preprint), January 2006.
[32] R. Le Bidan, R. Pyndiah, and P. Adde, "Some results on the binary minimum distance of Reed-Solomon codes and block turbo codes," in Proceedings of the IEEE International Conference on Communications (ICC '07), pp. 990–994, Glasgow, Scotland, June 2007.
[33] P. Adde, R. Pyndiah, and S. Kerouedan, "Block turbo code with binary input for improving quality of service," in Multiaccess, Mobility and Teletraffic for Wireless Communications, X. Lagrange and B. Jabbari, Eds., vol. 6, Kluwer Academic Publishers, Boston, Mass, USA, 2002.
[34] Z. Chi and K. K. Parhi, "High speed VLSI architecture design for block turbo decoder," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '02), vol. 1, pp. 901–904, Phoenix, Ariz, USA, May 2002.
[35] D. H. Lawrie, "Access and alignment of data in an array processor," IEEE Transactions on Computers, vol. C-24, no. 12, pp. 1145–1155, 1975.
[36] S. Kerouedan and P. Adde, "Implementation of a block turbo decoder on a single chip," in Proceedings of the 2nd International Symposium on Turbo Codes and Related Topics, pp. 243–246, Brest, France, September 2000.
[37] C. Leroux, C. Jégo, P. Adde, and M. Jezequel, "Towards Gb/s turbo decoding of product code onto an FPGA device," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '07), pp. 909–912, New Orleans, La, USA, May 2007.
[38] http://www.dinigroup.com/DN9000k10PCI.php.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 843634, 11 pages
doi:10.1155/2008/843634
Research Article
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) without Using Syndromes
Ning Chen and Zhiyuan Yan
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
Correspondence should be addressed to Zhiyuan Yan, yan@lehigh.edu
Received 15 November 2007; Revised 29 March 2008; Accepted 6 May 2008
Recommended by Jinhong Yuan
Recently there has been renewed interest in decoding Reed-Solomon (RS) codes without using syndromes. In this paper, we investigate the complexity of syndromeless decoding and compare it to that of syndrome-based decoding. Aiming to provide guidelines for practical applications, our complexity analysis focuses on RS codes over characteristic-2 fields, for which some multiplicative FFT techniques are not applicable. Because RS codes have moderate block lengths in practice, our analysis is complete, without big O notation. In addition to fast implementation using additive FFT techniques, we also consider direct implementation, which remains relevant for RS codes with moderate lengths. For high-rate RS codes, when compared to syndrome-based decoding algorithms, not only do syndromeless decoding algorithms require more field operations regardless of implementation, but decoder architectures based on their direct implementations also have higher hardware costs and lower throughput. We also derive tighter bounds on the complexities of fast polynomial multiplication based on Cantor's approach and the fast extended Euclidean algorithm.
Copyright © 2008 N. Chen and Z. Yan. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Reed-Solomon (RS) codes are among the most widely used error control codes, with applications in space communications, wireless communications, and consumer electronics [1]. As such, efficient decoding of RS codes is of great interest. The majority of applications of RS codes use syndrome-based decoding algorithms such as the Berlekamp-Massey algorithm (BMA) [2] or the extended Euclidean algorithm (EEA) [3]. Alternative hard-decision decoding methods for RS codes that do not use syndromes were considered in [4-6]. As pointed out in [7, 8], these algorithms belong to the class of frequency-domain algorithms and are related to the Welch-Berlekamp algorithm [9]. In contrast to syndrome-based decoding algorithms, these algorithms do not compute syndromes and avoid the Chien search and Forney's formula. This difference naturally raises the question of whether these algorithms offer lower complexity than syndrome-based decoding, especially when fast Fourier transform (FFT) techniques are applied [6].
The asymptotic complexity of syndromeless decoding was analyzed in [6], and in [7] it was concluded that syndromeless decoding has the same asymptotic complexity O(n log^2 n) (note that all logarithms in this paper are to base two) as syndrome-based decoding [10]. However, existing asymptotic complexity analysis is limited in several aspects. For example, for RS codes over Fermat fields GF(2^(2^r) + 1) and other prime fields [5, 6], efficient multiplicative FFT techniques lead to an asymptotic complexity of O(n log^2 n). However, such FFT techniques do not apply to characteristic-2 fields, and hence this complexity result does not carry over to RS codes over characteristic-2 fields. For RS codes over arbitrary fields, the asymptotic complexity of syndromeless decoding based on multiplicative FFT techniques was shown to be O(n log^2 n log log n) [6]. Although these techniques are applicable to RS codes over characteristic-2 fields, the complexity has large coefficients, and multiplicative FFT techniques are less efficient than fast implementations based on additive FFT for RS codes with moderate block lengths [6, 11, 12]. As such, asymptotic complexity analysis provides little help to practical applications.
In this paper, we analyze the complexity of syndromeless decoding and compare it to that of syndrome-based decoding. Aiming to provide guidelines for system designers, we focus on the decoding complexity of RS codes over GF(2^m). Since RS codes in practice have moderate lengths, our complexity analysis provides not only the coefficients of the most significant terms, but also the following terms. Due to these moderate lengths, our comparison is based on two types of implementations of syndromeless decoding and syndrome-based decoding: direct implementation and fast implementation based on FFT techniques. Direct implementations are often efficient when decoding RS codes with moderate lengths and have widespread applications; thus, we consider both computational complexities, in terms of field operations, and hardware costs and throughputs. For fast implementations, we consider only their computational complexities; their hardware implementations are beyond the scope of this paper. We use additive FFT techniques based on Cantor's approach [13], since this approach achieves small coefficients [6, 11] and hence is more suitable for moderate lengths. In contrast to some previous works [12, 14], which count field multiplications and additions together, we differentiate between the multiplicative and additive complexities in our analysis.
The main contributions of the paper are as follows.
(i) We derive a tighter bound on the complexity of fast polynomial multiplication based on Cantor's approach.
(ii) We also obtain a tighter bound on the complexity of the fast extended Euclidean algorithm (FEEA) for general partial greatest common divisor (GCD) computation.
(iii) We evaluate the complexities of syndromeless decoding based on different implementation approaches and compare them with their counterparts for syndrome-based decoding. Both errors-only and errors-and-erasures decoding are considered.
(iv) We compare the hardware costs and throughputs of direct implementations of syndromeless decoders with those of syndrome-based decoders.
The rest of the paper is organized as follows. To make this paper self-contained, Section 2 briefly reviews FFT algorithms over finite fields, fast algorithms for polynomial multiplication and division over GF(2^m), the FEEA, and syndromeless decoding algorithms. Section 3 presents both the computational complexity and the decoder architectures of direct implementations of syndromeless decoding, and compares them with their counterparts for syndrome-based decoding algorithms. Section 4 compares the computational complexity of fast implementations of syndromeless decoding with that of syndrome-based decoding. In Section 5, case studies on two RS codes are provided and errors-and-erasures decoding is discussed. Conclusions are given in Section 6.
2. BACKGROUND
2.1. Fast Fourier transform over finite fields

For any n (n | q − 1) distinct elements a_0, a_1, ..., a_{n−1} ∈ GF(q), the transform from f = (f_0, f_1, ..., f_{n−1})^T to F = (f(a_0), f(a_1), ..., f(a_{n−1}))^T, where f(x) = Σ_{i=0}^{n−1} f_i x^i ∈ GF(q)[x], is called a discrete Fourier transform (DFT), denoted by F = DFT(f). Accordingly, f is called the inverse DFT of F, denoted by f = IDFT(F). An asymptotically fast Fourier transform (FFT) algorithm over GF(2^m) was proposed in [15]. Reduced-complexity cyclotomic FFT (CFFT) was shown to be efficient for moderate lengths in [16].
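The DFT/IDFT pair above is just multipoint evaluation and interpolation. The following sketch illustrates this over the prime field GF(13) for readability (the paper's setting is GF(2^m)); the helper names and the choice of field are ours, not the paper's.

```python
# DFT over a finite field as multipoint evaluation; IDFT via naive Lagrange
# interpolation. Illustrative sketch over GF(13), not the paper's GF(2^m).
q = 13
n = 4
alpha = 5                                  # element of order 4 in GF(13)
points = [pow(alpha, i, q) for i in range(n)]   # a_i = alpha^i

def poly_eval(coeffs, x, q):
    """Horner's rule evaluation of f(x) = sum f_i x^i over GF(q)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def dft(f, points, q):
    return [poly_eval(f, a, q) for a in points]

def idft(F, points, q):
    """Recover the unique polynomial of degree < n through the n points."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(zip(points, F)):
        basis, denom = [1], 1              # build Lagrange basis l_i(x)
        for j, xj in enumerate(points):
            if j == i:
                continue
            denom = denom * ((xi - xj) % q) % q
            new = [0] * (len(basis) + 1)   # multiply basis by (x - xj)
            for k, b in enumerate(basis):
                new[k + 1] = (new[k + 1] + b) % q
                new[k] = (new[k] - b * xj) % q
            basis = new
        scale = yi * pow(denom, q - 2, q) % q
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * b) % q
    return coeffs

f = [3, 1, 4, 1]                           # f(x) = 3 + x + 4x^2 + x^3
F = dft(f, points, q)
assert idft(F, points, q) == f             # round trip: IDFT(DFT(f)) = f
```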
2.2. Polynomial multiplication over GF(2^m) by Cantor's approach

A fast polynomial multiplication algorithm using additive FFT was proposed by Cantor [13] for GF(q^(q^m)), where q is prime, and it was generalized to GF(q^m) in [11]. Instead of evaluating and interpolating over multiplicative subgroups as in multiplicative FFT techniques, Cantor's approach uses additive subgroups. Cantor's approach relies on two algorithms: multipoint evaluation (MPE) [11, Algorithm 3.1] and multipoint interpolation (MPI) [11, Algorithm 3.2].

Suppose the degree of the product of two polynomials over GF(2^m) is less than h (h ≤ 2^m); the product can then be obtained as follows. First, the two operand polynomials are evaluated using the MPE algorithm. The evaluation results are then multiplied pointwise. Finally, the product polynomial is obtained by applying the MPI algorithm to interpolate the pointwise products. The polynomial multiplication requires at most (3/2)h log^2 h + (15/2)h log h + 8h multiplications over GF(2^m) and (3/2)h log^2 h + (29/2)h log h + 4h + 9 additions over GF(2^m) [11]. For simplicity, henceforth in this paper, all arithmetic operations are over GF(2^m) unless specified otherwise.
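The evaluate, multiply pointwise, interpolate pattern can be sketched directly. Cantor's approach performs exactly these three steps over additive subgroups of GF(2^m) with fast MPE/MPI; the sketch below substitutes naive evaluation and Lagrange interpolation over the prime field GF(101) purely for illustration, so the helper names and field are our assumptions.

```python
# Polynomial multiplication by "evaluate, multiply pointwise, interpolate,"
# with naive O(h^2) evaluation/interpolation standing in for Cantor's fast
# additive-FFT MPE/MPI. Coefficient lists are low-to-high.
q = 101

def ev(f, x):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % q
    return acc

def interpolate(xs, ys):
    """Naive Lagrange interpolation over GF(q); returns coefficient list."""
    n = len(xs)
    out = [0] * n
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j == i:
                continue
            denom = denom * (xs[i] - xs[j]) % q
            # multiply basis by (x - xs[j])
            basis = [(s - xs[j] * t) % q
                     for s, t in zip([0] + basis, basis + [0])]
        scale = ys[i] * pow(denom, q - 2, q) % q
        for k in range(n):
            out[k] = (out[k] + scale * basis[k]) % q
    return out

def polymul(a, b):
    h = len(a) + len(b) - 1            # product has degree < h
    xs = list(range(h))                # h distinct evaluation points
    ys = [ev(a, x) * ev(b, x) % q for x in xs]   # pointwise products
    return interpolate(xs, ys)

# (1 + 2x)(3 + x) = 3 + 7x + 2x^2
assert polymul([1, 2], [3, 1]) == [3, 7, 2]
```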
2.3. Polynomial division by Newton iteration

Suppose a, b ∈ GF(q)[x] are two polynomials of degrees d_0 + d_1 and d_1 (d_0, d_1 ≥ 0), respectively. To find the quotient polynomial q and the remainder polynomial r satisfying a = qb + r, where deg r < d_1, a fast polynomial division algorithm is available [12]. With rev_h(a) = x^h a(1/x), the fast algorithm first computes the inverse of rev_{d_1}(b) mod x^{d_0+1} by Newton iteration. Then the reverse quotient is given by q* = rev_{d_0+d_1}(a) rev_{d_1}(b)^{−1} mod x^{d_0+1}. Finally, the actual quotient and remainder are given by q = rev_{d_0}(q*) and r = a − qb.

Thus, the complexity of dividing a polynomial a of degree d_0 + d_1 by a monic polynomial b of degree d_1, with remainder, is at most 4M(d_0) + M(d_1) + O(d_1) multiplications/additions when d_1 ≥ d_0 [12, Theorem 9.6], where M(h) stands for the number of multiplications/additions required to multiply two polynomials of degree less than h.
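The reversal-plus-Newton-iteration procedure can be sketched in a few lines. The version below works over the prime field GF(97) for readability (the paper's setting is GF(2^m)) and uses schoolbook multiplication instead of a fast one; all helper names are our assumptions.

```python
# Fast division sketch: invert rev(b) mod x^(d0+1) by Newton iteration
# g <- g(2 - b g), then recover the quotient by reversal. Over GF(97).
q = 97

def trunc(f, k):                       # f mod x^k
    return [c % q for c in f[:k]]

def pmul(f, g):                        # schoolbook product over GF(q)
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def inverse_mod_xk(b, k):
    """Newton iteration, doubling precision each step; needs b[0] != 0."""
    g = [pow(b[0], q - 2, q)]
    prec = 1
    while prec < k:
        prec = min(2 * prec, k)
        e = trunc(pmul(b, g), prec)
        two_minus = [(-c) % q for c in e]
        two_minus[0] = (2 - e[0]) % q
        g = trunc(pmul(g, two_minus), prec)
    return g

def fast_divmod(a, b):
    """a, b coefficient lists (low to high), b monic; returns (quot, rem)."""
    d0 = len(a) - len(b)               # deg a - deg b
    if d0 < 0:
        return [0], a
    ra, rb = a[::-1], b[::-1]          # rev(a), rev(b)
    inv = inverse_mod_xk(rb, d0 + 1)
    qstar = trunc(pmul(ra, inv), d0 + 1)
    quot = qstar[::-1]                 # rev_{d0}(q*)
    prod = pmul(quot, b)
    rem = [(a[i] - (prod[i] if i < len(prod) else 0)) % q
           for i in range(len(b) - 1)]
    return quot, rem

a = [5, 2, 0, 1]                       # x^3 + 2x + 5
b = [3, 1]                             # x + 3 (monic)
quot, rem = fast_divmod(a, b)
assert quot == [11, 94, 1] and rem == [69]   # x^2 - 3x + 11, remainder -28
```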
2.4. Fast extended Euclidean algorithm

Let r_0 and r_1 be two monic polynomials with deg r_0 > deg r_1, and assume s_0 = t_1 = 1, s_1 = t_0 = 0. Step i (i = 1, 2, ..., l) of the EEA computes ρ_{i+1} r_{i+1} = r_{i−1} − q_i r_i, ρ_{i+1} s_{i+1} = s_{i−1} − q_i s_i, and ρ_{i+1} t_{i+1} = t_{i−1} − q_i t_i, so that the polynomials r_i are monic with strictly decreasing degrees. If the GCD of r_0 and r_1 is desired, the EEA terminates when r_{l+1} = 0. For 1 ≤ i ≤ l, R_i = Q_i · · · Q_1 R_0, where Q_i = [0, 1; 1/ρ_{i+1}, −q_i/ρ_{i+1}] and R_0 = [1, 0; 0, 1] (rows separated by semicolons). It can then be easily verified that R_i = [s_i, t_i; s_{i+1}, t_{i+1}] for 0 ≤ i ≤ l. In RS decoding, the EEA stops when the degree of r_i falls below a certain threshold for the first time; we refer to this as the partial GCD.

The FEEA in [12, 17] costs no more than (22M(h) + O(h)) log h multiplications/additions when n_0 ≤ 2h [14].
2.5. Syndrome-based and syndromeless decoding

Over a finite field GF(q), suppose a_0, a_1, ..., a_{n−1} are n (n ≤ q) distinct elements, and let g_0(x) = Π_{i=0}^{n−1} (x − a_i). Consider an RS code over GF(q) with length n, dimension k, and minimum Hamming distance d = n − k + 1. A message polynomial m(x) of degree less than k is encoded to a codeword (c_0, c_1, ..., c_{n−1}) with c_i = m(a_i), and the received vector is given by r = (r_0, r_1, ..., r_{n−1}).

Syndrome-based hard-decision decoding consists of the following steps: syndrome computation, key equation solver, the Chien search, and Forney's formula. Further details are omitted; interested readers are referred to [1, 2, 18]. We also consider the following two syndromeless algorithms.
Algorithm 1 ([4, 5], [6, Algorithm 1]).
(1.1) Interpolation. Construct a polynomial g_1(x) with deg g_1(x) < n such that g_1(a_i) = r_i for i = 0, 1, ..., n − 1.
(1.2) Partial GCD. Apply the EEA to g_0(x) and g_1(x), and find g(x) and v(x) that maximize deg g(x) while satisfying v(x)g_1(x) ≡ g(x) mod g_0(x) and deg g(x) < (n + k)/2.
(1.3) Message recovery. If v(x) | g(x), the message polynomial is recovered by m(x) = g(x)/v(x); otherwise, output "decoding failure."
Algorithm 2 ([6, Algorithm 1a]).
(2.1) Interpolation. Construct a polynomial g_1(x) with deg g_1(x) < n such that g_1(a_i) = r_i for i = 0, 1, ..., n − 1.
(2.2) Partial GCD. Find s_0(x) and s_1(x) satisfying g_0(x) = x^{n−d+1} s_0(x) + r_0(x) and g_1(x) = x^{n−d+1} s_1(x) + r_1(x), where deg r_0(x) ≤ n − d and deg r_1(x) ≤ n − d. Apply the EEA to s_0(x) and s_1(x), and stop when the remainder g(x) has degree less than (d − 1)/2. Thus, we have v(x)s_1(x) + u(x)s_0(x) = g(x).
(2.3) Message recovery. If v(x) does not divide g_0(x), output "decoding failure"; otherwise, first compute q(x) = g_0(x)/v(x), and then obtain m'(x) = g_1(x) + q(x)u(x). If deg m'(x) < k, output m'(x); otherwise, output "decoding failure."
Compared with Algorithm 1, the partial GCD step of Algorithm 2 is simpler, but its message recovery step is more complex [6].
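The three steps of Algorithm 1 can be run end to end on a toy code. The sketch below uses a length-6, dimension-2 RS code over the prime field GF(13) (the paper's setting is GF(2^m); the field, parameters, and helper names are our assumptions) and corrects two injected errors.

```python
# Worked run of Algorithm 1 over GF(13): interpolate the received word,
# run the EEA on (g0, g1) until deg g < (n+k)/2, then divide g by v.
q = 13

def norm(f):
    while len(f) > 1 and f[-1] == 0: f = f[:-1]
    return f

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % q
    return norm(out)

def psub(a, b):
    out = [0] * max(len(a), len(b))
    for i, x in enumerate(a): out[i] = x
    for i, y in enumerate(b): out[i] = (out[i] - y) % q
    return norm(out)

def pdivmod(a, b):
    quo, rem = [0] * max(1, len(a) - len(b) + 1), a[:]
    inv = pow(b[-1], q - 2, q)
    while rem != [0] and len(rem) >= len(b):
        s = len(rem) - len(b)
        c = rem[-1] * inv % q
        quo[s] = c
        rem = psub(rem, pmul([0] * s + [c], b))
    return norm(quo), rem

def ev(f, x):
    acc = 0
    for c in reversed(f): acc = (acc * x + c) % q
    return acc

def interpolate(xs, ys):
    out = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, den = [1], 1
        for xj in xs[:i] + xs[i + 1:]:
            den = den * (xi - xj) % q
            basis = psub([0] + basis, pmul(basis, [xj]))
        c = yi * pow(den, q - 2, q) % q
        out = [(o + c * b) % q
               for o, b in zip(out, basis + [0] * (len(out) - len(basis)))]
    return norm(out)

n, k = 6, 2                                  # d = 5, corrects t = 2 errors
a = [1, 2, 3, 4, 5, 6]
m = [3, 2]                                   # message m(x) = 3 + 2x
c = [ev(m, ai) for ai in a]                  # codeword c_i = m(a_i)
r = c[:]; r[0] = (r[0] + 1) % q; r[3] = (r[3] + 5) % q   # two errors

g0 = [1]
for ai in a: g0 = pmul(g0, [(-ai) % q, 1])   # g0(x) = prod (x - a_i)
g1 = interpolate(a, r)                       # step (1.1)

r0, r1, t0, t1 = g0, g1, [0], [1]            # step (1.2): partial GCD
while len(r1) - 1 >= (n + k) // 2:
    qi, rem = pdivmod(r0, r1)
    r0, r1 = r1, rem
    t0, t1 = t1, psub(t0, pmul(qi, t1))
g, v = r1, t1                                # v(x)g1(x) = g(x) mod g0(x)

quo, rem = pdivmod(g, v)                     # step (1.3): m(x) = g(x)/v(x)
assert rem == [0] and quo == m               # recovered message: 3 + 2x
```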
3. DIRECT IMPLEMENTATION OF SYNDROMELESS DECODING
3.1. Complexity analysis
We analyze the complexity of direct implementations of Algorithms 1 and 2. For simplicity, we assume n − k is even and hence d − 1 = 2t.

First, g_1(x) in steps (1.1) and (2.1) is given by IDFT(r). Direct implementation of steps (1.1) and (2.1) follows Horner's rule and requires n(n − 1) multiplications and n(n − 1) additions [19].
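The n(n − 1) figure is easy to verify by instrumenting Horner's rule: evaluating a polynomial with n coefficients at one point costs n − 1 multiplications and n − 1 additions, hence n(n − 1) of each over n points. A quick counting check (field and helper names are ours):

```python
# Instrumented Horner evaluation over GF(13): count operations per point
# and confirm the n(n-1) totals for evaluation at n points.
q = 13

def horner_counted(f, x):
    mults = adds = 0
    acc = f[-1]
    for c in reversed(f[:-1]):
        acc = (acc * x + c) % q
        mults += 1
        adds += 1
    return acc, mults, adds

n = 6
f = [3, 1, 4, 1, 5, 9]                 # n coefficients
total_m = total_a = 0
for x in range(n):
    _, mlt, add = horner_counted(f, x)
    total_m += mlt; total_a += add
assert (total_m, total_a) == (n * (n - 1), n * (n - 1))
```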
Steps (1.2) and (2.2) both use the EEA. The Sugiyama tower (ST) [3, 20] is a well-known efficient direct implementation of the EEA. For Algorithm 1, the ST is initialized by g_1(x) and g_0(x), whose degrees are at most n. Since the number of iterations is 2t, step (1.2) requires 4t(n + 2) multiplications and 2t(n + 1) additions. For Algorithm 2, the ST is initialized by s_0(x) and s_1(x), whose degrees are at most 2t, and the number of iterations is at most 2t.

Step (1.3) requires one polynomial division, which can be implemented by k iterations of cross multiplications in the ST. Since v(x) is actually the error locator polynomial [6], deg v(x) ≤ t. Hence, this requires k(k + 2t + 2) multiplications and k(t + 2) additions. However, the result of the polynomial division is scaled by a nonzero constant; that is, cross multiplications yield μ(x) = a·m(x). To remove the scaling factor a, we can first compute 1/a = lc(g(x))/(lc(μ(x)) lc(v(x))), where lc(f) denotes the leading coefficient of a polynomial f, and then obtain m(x) = (1/a)μ(x). This process requires one inversion and k + 2 multiplications.
Step (2.3) involves one polynomial division, one polynomial multiplication, and one polynomial addition, whose complexities depend on the degrees of v(x) and u(x), denoted d_v and d_u, respectively. In the polynomial division, let the result of the ST be q̂(x) = a·q(x). The scaling factor is recovered by 1/a = 1/(lc(q̂(x)) lc(v(x))). Thus, obtaining q(x) requires one inversion, (n − d_v + 1)(n + d_v + 3) + n − d_v + 2 multiplications, and (n − d_v + 1)(d_v + 2) additions. The polynomial multiplication needs (n − d_v + 1)(d_u + 1) multiplications and (n − d_v + 1)(d_u + 1) − (n − d_v + d_u + 1) additions, and the polynomial addition needs n additions since g_1(x) has degree at most n − 1. The total complexity of step (2.3) is thus (n − d_v + 1)(n + d_v + d_u + 5) + 1 multiplications, (n − d_v + 1)(d_v + d_u + 2) + n − d_u additions, and one inversion. Consider the worst case for multiplicative complexity, in which d_v should be as small as possible. Since d_v > d_u, the multiplicative complexity is at most (n − d_u)(n + 2d_u + 6) + 1, which is maximized when d_u = (n − 6)/4; we also know d_u < d_v ≤ t. Let R denote the code rate. For RS codes with R > 1/2, the maximum complexity is n^2 + nt − 2t^2 + 5n − 2t + 5 multiplications, 2nt − 2t^2 + 2n + 2 additions, and one inversion. For codes with R ≤ 1/2, the maximum complexity is (9/8)n^2 + (9/2)n + 11/2 multiplications, (3/8)n^2 + (3/2)n + 3/2 additions, and one inversion.
Table 1 lists the complexity of direct implementations of Algorithms 1 and 2, in terms of operations in GF(2^m). The complexity of syndrome-based decoding is given in Table 2. The numbers for syndrome computation, the Chien search, and Forney's formula are from [21]. We assume that the EEA is used for the key equation solver, since it was shown to be equivalent to the BMA [22]. The ST is used to implement the EEA. Note that the overall complexity of syndrome-based decoding can be reduced by sharing computations between the Chien search and Forney's formula; however, this is not taken into account in Table 2.
3.2. Complexity comparison
For any application with fixed parameters n and k, the comparison between the algorithms is straightforward using the complexities in Tables 1 and 2. Below, we try to determine which algorithm is more suitable for a given code rate. The comparison between different algorithms is complicated by the three different types of field operations involved. However, the complexity is dominated by the number of multiplications: in hardware implementation, both multiplication and inversion over GF(2^m) require an area-time complexity of O(m^2) [23], whereas an addition requires an area-time complexity of O(m); the complexity due to inversions is negligible, since the required number of inversions is much smaller than that of multiplications; and the numbers of multiplications and additions are both O(n^2). Thus, we focus on the number of multiplications for simplicity.
Since t = (1/2)(1 − R)n and k = Rn, the multiplicative complexities of Algorithms 1 and 2 are (3 − R)n^2 + (3 − R)n + 2 and (1/2)(3R^2 − 7R + 8)n^2 + (7 − 3R)n + 5, respectively, while the complexity of syndrome-based decoding is (1/2)(5R^2 − 13R + 8)n^2 + (2 − 3R)n. It is easy to verify that in all these complexities, the quadratic and linear coefficients are of the same order of magnitude; hence, we consider only the quadratic terms. Considering only the quadratic terms, Algorithm 1 is less efficient than syndrome-based decoding when R > 1/5. If the Chien search and Forney's formula share computations, this threshold will be even lower. Comparing the highest terms, Algorithm 2 is less efficient than the syndrome-based algorithm regardless of R. It is easy to verify that the most significant term of the difference between Algorithms 1 and 2 is (1/2)(1 − R)(3R − 2)n^2. So when implemented directly, Algorithm 1 is less efficient than Algorithm 2 when R > 2/3. Thus, Algorithm 1 is more suitable for codes with very low rate, while syndrome-based decoding is the most efficient for high-rate codes.
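The rate thresholds above follow from comparing the quadratic coefficients as functions of R, which is easy to check numerically:

```python
# Sanity check of the rate thresholds via the quadratic coefficients of the
# three multiplicative complexities (coefficients taken from the text above).
def quad_alg1(R):  return 3 - R
def quad_alg2(R):  return 0.5 * (3 * R**2 - 7 * R + 8)
def quad_synd(R):  return 0.5 * (5 * R**2 - 13 * R + 8)

# Algorithm 1 vs syndrome-based: crossover at R = 1/5
assert abs(quad_alg1(0.2) - quad_synd(0.2)) < 1e-12
assert quad_alg1(0.9) > quad_synd(0.9)       # high rate: syndrome-based wins
assert quad_alg1(0.1) < quad_synd(0.1)       # low rate: Algorithm 1 wins

# Algorithm 1 vs Algorithm 2: crossover at R = 2/3
assert abs(quad_alg1(2 / 3) - quad_alg2(2 / 3)) < 1e-12
assert quad_alg1(0.9) > quad_alg2(0.9)

# Algorithm 2 never beats syndrome-based on the quadratic term
# (their difference is R(3 - R) >= 0 on [0, 1]).
assert all(quad_alg2(R / 100) >= quad_synd(R / 100) for R in range(0, 101))
```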
3.3. Hardware costs, latency, and throughput
We have compared the computational complexities of syndromeless decoding algorithms with those of syndrome-based algorithms. Now we compare these two types of decoding algorithms from a hardware perspective: we compare the hardware costs, latency, and throughput of decoder architectures based on direct implementations of these algorithms. Since our goal is to compare syndrome-based algorithms with syndromeless algorithms, we select our architectures so that the comparison is on a level field. Thus, among the various decoder architectures available for syndrome-based decoders in the literature, we consider the hypersystolic architecture in [20]. Not only is it an efficient architecture for syndrome-based decoders, but some of its functional units can also be easily adapted to implement syndromeless decoders. Decoder architectures for both types of decoding algorithms thus share the same structure, with some functional units in common; this allows us to focus on the differences between the two types of algorithms. For the same reason, we do not try to optimize the hardware costs, latency, or throughput using circuit-level techniques, since such techniques would benefit the architectures for both types of decoding algorithms in a similar fashion and hence do not affect the comparison.
The hypersystolic architecture [20] contains three functional units: the power sums tower (PST), computing the syndromes; the ST, solving the key equation; and the correction tower (CT), performing the Chien search and Forney's formula. The PST consists of 2t systolic cells, each of which comprises one multiplier, one adder, five registers, and one multiplexer. The ST has δ + 1 systolic cells (δ is the maximal degree of the input polynomials), each of which contains one multiplier, one adder, five registers, and seven multiplexers. The latency of the ST is 6γ clock cycles [20], where γ is the number of iterations. For the syndrome-based decoder architecture, δ and γ are both 2t. The CT consists of 3t + 1 evaluation cells and two delay cells, along with two joiner cells, which also perform inversions. Each evaluation cell needs one multiplier, one adder, four registers, and one multiplexer. Each delay cell needs one register. The two joiner cells altogether need two multipliers, one inverter, and four registers. Table 3 summarizes the hardware costs of the decoder architecture for syndrome-based decoders described above. For each functional unit, we also list the latency (in clock cycles), as well as the number of clock cycles it needs to process one received word, which is proportional to the inverse of the throughput. In theory, the computational complexities of the steps of RS decoding depend on the received word, and the total complexity is obtained by first computing the sum of the complexities of all the steps and then considering the worst-case scenario (cf. Section 3.1). In contrast, the hardware costs, latency, and throughput of every functional unit are dominated by the worst case; the numbers in Table 3 all correspond to the worst-case scenario. The critical path delay (CPD) is the same, T_mult + T_add + T_mux, for the PST, ST, and CT. In addition to the registers required by the PST, ST, and CT, the total number of registers in Table 3 also accounts for the registers needed by the delay line called Main Street [20].

Table 1: Direct implementation complexities of syndromeless decoding algorithms.

                                Multiplications                   Additions                       Inversions
Interpolation                   n(n − 1)                          n(n − 1)                        0
Partial GCD (Algorithm 1)       4t(n + 2)                         2t(n + 1)                       0
Partial GCD (Algorithm 2)       4t(2t + 2)                        2t(2t + 1)                      0
Message recovery (Algorithm 1)  (k + 2)(k + 1) + 2kt              k(t + 2)                        1
Message recovery (Algorithm 2)  n^2 + nt − 2t^2 + 5n − 2t + 5     2nt − 2t^2 + 2n + 2             1
Total (Algorithm 1)             2n^2 + 2nt + 2n + 2t + 2          n^2 + 3nt − 2t^2 + n − 2t       1
Total (Algorithm 2)             2n^2 + nt + 6t^2 + 4n + 6t + 5    n^2 + 2nt + 2t^2 + n + 2t + 2   1

Table 2: Direct implementation complexity of syndrome-based decoding.

                        Multiplications         Additions          Inversions
Syndrome computation    2t(n − 1)               2t(n − 1)          0
Key equation solver     4t(2t + 2)              2t(2t + 1)         0
Chien search            n(t − 1)                nt                 0
Forney's formula        2t^2                    t(2t − 1)          t
Total                   3nt + 10t^2 − n + 6t    3nt + 6t^2 − t     t
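The "Total" rows in Tables 1 and 2 can be cross-checked against the per-step rows by evaluating both sides at sample parameters (with k = n − 2t):

```python
# Verify that the Total rows of Tables 1 and 2 equal the sums of the
# per-step rows, at several (n, t) pairs with k = n - 2t.
for n in (15, 63, 255):
    for t in (1, 2, 4, 8):
        k = n - 2 * t
        # Table 1, Algorithm 1, multiplications
        steps = n * (n - 1) + 4 * t * (n + 2) + (k + 2) * (k + 1) + 2 * k * t
        assert steps == 2 * n**2 + 2 * n * t + 2 * n + 2 * t + 2
        # Table 1, Algorithm 2, multiplications
        steps = (n * (n - 1) + 4 * t * (2 * t + 2)
                 + n**2 + n * t - 2 * t**2 + 5 * n - 2 * t + 5)
        assert steps == 2 * n**2 + n * t + 6 * t**2 + 4 * n + 6 * t + 5
        # Table 2, syndrome-based, multiplications
        steps = (2 * t * (n - 1) + 4 * t * (2 * t + 2)
                 + n * (t - 1) + 2 * t**2)
        assert steps == 3 * n * t + 10 * t**2 - n + 6 * t
```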
Both the PST and the ST can be adapted to implement decoder architectures for syndromeless decoding algorithms. Similar to syndrome computation, interpolation in syndromeless decoders can be implemented by Horner's rule, and thus the PST can be easily adapted to implement this step. For the architectures based on syndromeless decoding, the PST contains n cells, and the hardware costs of each cell remain the same. The partial GCD is implemented by the ST. The ST can implement the polynomial division in message recovery as well. In step (1.3), the maximum polynomial degree in the polynomial division is k + t, and the number of iterations is at most k. As mentioned in Section 3.1, the degree of q(x) in step (2.3) ranges from 1 to t. In the polynomial division g_0(x)/v(x), the maximum polynomial degree is n and the number of iterations is at most n − 1. Given the maximum polynomial degree and iteration number, the hardware costs and latency of the ST can be determined as for the syndrome-based architecture.

The other operations of syndromeless decoders do not have corresponding functional units in the hypersystolic architecture, and we choose to implement them in a straightforward way. In the polynomial multiplication q(x)u(x), u(x) has degree at most t − 1 and the product has degree at most n − 1. Thus, it can be done by n multiply-and-accumulate circuits and n registers in t cycles (see, e.g., [24]). The polynomial addition in step (2.3) can be done in one clock cycle with n adders and n registers. To remove the scaling factor, step (1.3) is implemented in four cycles with at most one inverter, k + 2 multipliers, and k + 3 registers; step (2.3) is implemented in three cycles with at most one inverter, n + 1 multipliers, and n + 2 registers. We summarize the hardware costs, latency, and throughput of the decoder architectures based on Algorithms 1 and 2 in Table 4.
Now we compare the hardware costs of the three decoder architectures based on Tables 3 and 4. The hardware costs are measured by the numbers of various basic circuit elements. All three decoder architectures need only one inverter. The syndrome-based decoder architecture requires fewer multiplexers than the decoder architecture based on Algorithm 1 regardless of the rate, and fewer multipliers, adders, and registers when R > 1/2. The syndrome-based decoder architecture requires fewer registers than the decoder architecture based on Algorithm 2 when R > 21/43, and fewer multipliers, adders, and multiplexers regardless of the rate. Thus, for high-rate codes, the syndrome-based decoder has lower hardware costs than the syndromeless decoders. The decoder architecture based on Algorithm 1 requires fewer multipliers and adders than that based on Algorithm 2 regardless of the rate, but more registers and multiplexers when R > 9/17.

In these algorithms, each step starts with the results of the previous step. Due to this data dependency, the corresponding functional units have to operate in a pipelined fashion. Thus, the decoding latency is simply the sum of the latencies of all the functional units. The decoder architecture based on Algorithm 2 has the longest latency, regardless of the rate. The syndrome-based decoder architecture has shorter latency than the decoder architecture based on Algorithm 1 when R > 1/7.

All three decoders have the same CPD, so the throughput is determined by the number of clock cycles. Since the functional units in each decoder architecture are pipelined, the throughput of each decoder architecture is determined by the functional unit that requires the largest number of cycles. Regardless of the rate, the decoder based on Algorithm 2 has the lowest throughput. When R > 1/2, the syndrome-based decoder architecture has higher throughput than the decoder architecture based on Algorithm 1; at lower rates, they have the same throughput.

Hence, for high-rate RS codes, the syndrome-based decoder architecture requires less hardware and achieves higher throughput and shorter latency than the architectures based on syndromeless decoding algorithms.
Table 3: Decoder architecture based on syndrome-based decoding (CPD is T_mult + T_add + T_mux).

                        Multipliers  Adders   Inverters  Registers      Muxes     Latency   Throughput^−1
Syndrome computation    2t           2t       0          10t            2t        n + 6t    6t
Key equation solver     2t + 1       2t + 1   0          10t + 5        14t + 7   12t       12t
Correction              3t + 3       3t + 1   1          12t + 10       3t + 1    3t        3t
Total                   7t + 4       7t + 2   1          n + 53t + 15   19t + 8   n + 21t   12t

Table 4: Decoder architectures based on syndromeless decoding (CPD is T_mult + T_add + T_mux).

                                Multipliers      Adders           Inverters  Registers           Muxes               Latency            Throughput^−1
Interpolation                   n                n                0          5n                  n                   4n                 3n
Partial GCD (Algorithm 1)       n + 1            n + 1            0          5n + 5              7n + 7              12t                12t
Partial GCD (Algorithm 2)       2t + 1           2t + 1           0          10t + 5             14t + 7             12t                12t
Message recovery (Algorithm 1)  2k + t + 3       k + t + 1        1          6k + 5t + 8         7k + 7t + 7         6k + 4             6k
Message recovery (Algorithm 2)  3n + 2           3n + 1           1          7n + 7              7n + 7              6n + t − 2         6n
Total (Algorithm 1)             2n + 2k + t + 4  2n + k + t + 2   1          10n + 6k + 5t + 13  8n + 7k + 7t + 14   4n + 6k + 12t + 4  6k
Total (Algorithm 2)             4n + 2t + 3      4n + 2t + 2      1          12n + 10t + 12      8n + 14t + 14       10n + 13t − 2      6n

4. FAST IMPLEMENTATION OF SYNDROMELESS DECODING

In this section, we implement the three steps of Algorithms 1 and 2 (interpolation, partial GCD, and message recovery) by the fast algorithms described in Section 2 and evaluate their complexities. Since both polynomial division by Newton iteration and the FEEA depend on efficient polynomial multiplication, the decoding complexity relies on the complexity of polynomial multiplication. Thus, in addition to field multiplications and additions, the complexities in this section are also expressed in terms of polynomial multiplications.
4.1. Polynomial multiplication
We first derive a tighter bound on the complexity of fast polynomial multiplication based on Cantor's approach.

Let the degree of the product of two polynomials be less than n. The polynomial multiplication can be done by two FFTs and one inverse FFT if a length-n FFT is available over GF(2^m), which requires n | 2^m − 1. If n does not divide 2^m − 1, one option is to pad the polynomials to length n' (n' > n) with n' | 2^m − 1. Compared with fast polynomial multiplication based on multiplicative FFT, Cantor's approach uses additive FFT and does not require n | 2^m − 1, so it is more efficient than FFT multiplication with padding for most degrees. For n = 2^m − 1, their complexities are similar. Although asymptotically worse than Schönhage's algorithm [12], which has O(n log n log log n) complexity, Cantor's approach has small implicit constants and hence is more suitable for practical implementation of RS codes [6, 11]. Gao claimed an improvement on Cantor's approach in [6], but we do not pursue this due to a lack of details.
A tighter bound on the complexity of Cantor's approach is given in Theorem 1. Here we make the same assumption as in [11] that the auxiliary polynomials s_i and the values s_i(β_j) are precomputed. The complexity of precomputation was given in [11].
Theorem 1. By Cantor's approach, two polynomials a, b ∈ GF(2^m)[x] whose product has degree less than h (1 ≤ h ≤ 2^m) can be multiplied using less than (3/2)h log^2 h + (7/2)h log h − 2h + log h + 2 multiplications, (3/2)h log^2 h + (21/2)h log h − 13h + log h + 15 additions, and 2h inversions over GF(2^m).
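The improvement over the bound quoted in Section 2.2 is easy to quantify numerically (both expressions are taken from the text; logs are base 2):

```python
# Compare the multiplication-count bound from [11] quoted in Section 2.2
# with Theorem 1's tighter bound, over a range of product-degree bounds h.
from math import log2

def bound_ref11(h):
    return 1.5 * h * log2(h)**2 + 7.5 * h * log2(h) + 8 * h

def bound_thm1(h):
    return 1.5 * h * log2(h)**2 + 3.5 * h * log2(h) - 2 * h + log2(h) + 2

# Theorem 1 is strictly smaller for every power of two tested: the leading
# (3/2)h log^2 h term matches, but the lower-order terms shrink.
for p in range(1, 16):
    h = 2 ** p
    assert bound_thm1(h) < bound_ref11(h)
```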
Proof. There exists 0 ≤ p ≤ m satisfying 2^{p−1} < h ≤ 2^p. Since both the MPE and MPI algorithms are recursive, we denote the numbers of additions of the MPE and MPI algorithms for input i (0 ≤ i ≤ p) as S_E(i) and S_I(i), respectively. Clearly, S_E(0) = S_I(0) = 0. Following the approach in [11], it can be shown that for 1 ≤ i ≤ p,

S_E(i) ≤ i(i + 3)2^{i−2} + (p − 3)(2^i − 1) + i,   (1)
S_I(i) ≤ i(i + 5)2^{i−2} + (p − 3)(2^i − 1) + i.   (2)
Let M_E(h) and A_E(h) denote the numbers of multiplications and additions, respectively, that the MPE algorithm requires for polynomials of degree less than h. When i = p in the MPE algorithm, f(x) has degree less than h ≤ 2^p, while s_{p−1} is of degree 2^{p−1} and has at most p nonzero coefficients. Thus, g(x) has degree less than h − 2^{p−1}. Therefore, the numbers of multiplications and additions for the polynomial division in [11, Step 2 of Algorithm 3.1] are both p(h − 2^{p−1}), while r_1(x) = r_0(x) + s_{i−1}(β_i)g(x) needs at most h − 2^{p−1} multiplications and the same number of additions. Substituting the bound on M_E(2^{p−1}) in [11], we obtain M_E(h) ≤ 2M_E(2^{p−1}) + p(h − 2^{p−1}) + h − 2^{p−1}, and thus M_E(h) is at most (1/4)p^2 2^p − (1/4)p 2^p − 2^p + (p + 1)h. Similarly, substituting the bound on S_E(p − 1) in (1), we obtain A_E(h) ≤ 2S_E(p − 1) + p(h − 2^{p−1}) + h − 2^{p−1}, and hence A_E(h) is at most (1/4)p^2 2^p + (3/4)p 2^p − 4·2^p + (p + 1)h + 4.
Let M_I(h) and A_I(h) denote the numbers of multiplications and additions, respectively, which the MPI algorithm requires when the interpolated polynomial has a degree less than h. When i = p in the MPI algorithm, f(x) has a degree less than h ≤ 2^p. It implies that r_0(x) + r_1(x) has a degree less than h − 2^{p−1}. Thus, it requires at most h − 2^{p−1} additions to obtain r_0(x) + r_1(x) and h − 2^{p−1} multiplications for s_{i−1}(β_i)^{−1}(r_0(x) + r_1(x)). The numbers of multiplications and additions for the polynomial multiplication in [11, Step 3 of Algorithm 3.2] to obtain f(x) are both p(h − 2^{p−1}). Adding r_0(x) also needs 2^{p−1} additions. Substituting the bound on M_I(2^{p−1}) in [11], we have M_I(h) ≤ 2M_I(2^{p−1}) + p(h − 2^{p−1}) + h − 2^{p−1}, and hence M_I(h) is at most (1/4)p^2·2^p − (1/4)p·2^p − 2^p + (p + 1)h. Similarly, substituting the bound on S_I(p − 1) in (2), we have A_I(h) ≤ 2S_I(p − 1) + p(h − 2^{p−1}) + h + 1, and hence A_I(h) is at most (1/4)p^2·2^p + (5/4)p·2^p − 4·2^p + (p + 1)h + 5. The interpolation step also needs 2^p inversions.
Let M(h_1, h_2) be the complexity of multiplication of two polynomials of degrees less than h_1 and h_2. Using Cantor's approach, M(h_1, h_2) includes M_E(h_1) + M_E(h_2) + M_I(h) + 2^p multiplications, A_E(h_1) + A_E(h_2) + A_I(h) additions, and 2^p inversions, when h = h_1 + h_2 − 1. Finally, we replace 2^p by 2h as in [11].
Compared with the results in [11], our results have the same highest degree term but smaller terms for lower degrees.

By Theorem 1, we can easily compute M(h_1) ≜ M(h_1, h_1). A byproduct of the above proof is the bounds for the MPE and MPI algorithms. We also observe some properties of the complexity of fast polynomial multiplication that hold not only for Cantor's approach but also for other approaches. These properties will be used in our complexity analysis next. Since all fast polynomial multiplication algorithms have higher-than-linear complexities, 2M(h) ≤ M(2h). Also note that M(h + 1) is no more than M(h) plus 2h multiplications and 2h additions [12, Exercise 8.34]. Since the complexity bound is determined only by the degree of the product polynomial, we assume M(h_1, h_2) ≤ M((h_1 + h_2)/2). We note that the complexities of Schönhage's algorithm as well as Schönhage and Strassen's algorithm, both based on multiplicative FFT, are also determined by the degree of the product polynomial [12].
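For intuition about the operation the quantity M(h) counts, the following sketch multiplies two polynomials in GF(2)[x], encoding each polynomial as a Python integer whose bit i is the coefficient of x^i. This is the quadratic-time schoolbook method, not Cantor's algorithm; fast algorithms compute the same product within the quasi-linear bounds above. The helper name is ours, for illustration only.

```python
def gf2_poly_mul(a, b):
    """Schoolbook product in GF(2)[x]; polynomials are encoded as ints,
    bit i holding the coefficient of x^i (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1: add shifted a
            result ^= a    # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result
```

For instance, (x + 1)(x + 1) = x^2 + 1 over GF(2), since the cross terms cancel in characteristic 2.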
4.2. Polynomial division

Similar to [12, Exercise 9.6], in characteristic-2 fields, the complexity of Newton iteration is at most

Σ_{0≤j≤r−1} ( M(⌈(d_0 + 1)2^{−j}⌉) + M(⌈(d_0 + 1)2^{−j−1}⌉) ),   (3)

where r = ⌈log(d_0 + 1)⌉. Since ⌈(d_0 + 1)2^{−j}⌉ ≤ ⌊(d_0 + 1)2^{−j}⌋ + 1 and M(h + 1) is no more than M(h) plus 2h multiplications and 2h additions [12, Exercise 8.34], it requires at most Σ_{1≤j≤r} ( M(⌊(d_0 + 1)2^{−j}⌋) + M(⌊(d_0 + 1)2^{−j−1}⌋) ), plus Σ_{0≤j≤r−1} ( 2⌊(d_0 + 1)2^{−j}⌋ + 2⌊(d_0 + 1)2^{−j−1}⌋ ) multiplications and the same number of additions. Since 2M(h) ≤ M(2h), Newton iteration costs at most Σ_{0≤j≤r−1} (3/2)M(⌊(d_0 + 1)2^{−j}⌋) ≤ 3M(d_0 + 1), 6(d_0 + 1) multiplications, and 6(d_0 + 1) additions. The second step to compute the quotient needs M(d_0 + 1), and the last step to compute the remainder needs M(d_1 + 1, d_0 + 1) and d_1 + 1 additions. By M(d_1 + 1, d_0 + 1) ≤ M((d_0 + d_1)/2 + 1), the total cost is at most 4M(d_0) + M((d_0 + d_1)/2), 15d_0 + d_1 + 7 multiplications, and 11d_0 + 2d_1 + 8 additions. Note that this bound does not require d_1 ≥ d_0 as in [12].
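Newton iteration is the core of fast polynomial division: one inverts the power series of the reversed divisor, doubling the precision each round. The following minimal sketch works over GF(2), with schoolbook multiplication standing in for the fast multiplication M(h); in characteristic 2 the classical Newton update g ← g(2 − fg) collapses to g ← f g^2. Function names and the int encoding are ours, for illustration under these assumptions.

```python
def gf2_poly_mul(a, b):
    """Carry-less (GF(2)[x]) product; polynomials encoded as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_series_inverse(f, l):
    """Newton iteration for g with f * g = 1 (mod x^l) in GF(2)[[x]],
    assuming f(0) = 1.  Each round doubles the valid precision k via
    g <- f * g^2 (mod x^(2k)), the characteristic-2 Newton step."""
    assert f & 1, "constant term must be 1"
    g, k = 1, 1
    while k < l:
        k = min(2 * k, l)
        g = gf2_poly_mul(f, gf2_poly_mul(g, g)) & ((1 << k) - 1)
    return g
```

As a check, the inverse of 1 + x modulo x^4 is 1 + x + x^2 + x^3, since (1 + x)(1 + x + x^2 + x^3) = 1 + x^4.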
4.3. Partial GCD

The partial GCD step can be implemented in three approaches: the ST, the classical EEA with fast polynomial multiplication and Newton iteration, and the FEEA with fast polynomial multiplication and Newton iteration. The ST is essentially the classical EEA. The complexity of the classical EEA is asymptotically worse than that of the FEEA. Since the FEEA is more suitable for long codes, we will use the FEEA in our complexity analysis of fast implementations.
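For reference, the classical EEA route (stopped early, which is what "partial GCD" means here) can be sketched in a few lines. We work over GF(2) with polynomials encoded as ints purely for brevity; in RS decoding the arithmetic is over GF(2^m) and the stopping degree comes from the degree constraint on the remainder. Names and encoding are ours, not the paper's.

```python
def gf2_poly_mul(a, b):
    """Carry-less product in GF(2)[x] (polynomials as ints)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_poly_divmod(a, b):
    """Quotient and remainder of a by b in GF(2)[x]."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def partial_gcd(r0, r1, dstop):
    """Classical EEA on (r0, r1), stopped once deg(remainder) <= dstop.
    Returns (r, t) with t * r1 = r (mod r0), the key-equation form."""
    t0, t1 = 0, 1
    while r1 and r1.bit_length() - 1 > dstop:
        q, r = gf2_poly_divmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, t0 ^ gf2_poly_mul(q, t1)
    return r1, t1
```

For example, with r_0 = x^3 and r_1 = x^2 + x + 1, stopping at degree 1 yields r = 1 and t = x + 1, and indeed (x + 1)(x^2 + x + 1) = x^3 + 1 ≡ 1 (mod x^3).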
In order to derive a tighter bound on the complexity of the FEEA, we first present a modified FEEA in Algorithm 3. Let η(h) ≜ max{ j : Σ_{i=1}^{j} deg q_i ≤ h }, which is the number of steps of the EEA satisfying deg r_0 − deg r_{η(h)} ≤ h < deg r_0 − deg r_{η(h)+1}. For f(x) = f_n x^n + ··· + f_1 x + f_0 with f_n ≠ 0, the truncated polynomial f(x) ↾ h ≜ f_n x^h + ··· + f_{n−h+1} x + f_{n−h}, where f_i = 0 for i < 0. Note that f(x) ↾ h = 0 if h < 0.
Algorithm 3 (modified fast extended Euclidean algorithm)

Input: two monic polynomials r_0 and r_1, with deg r_0 = n_0 > n_1 = deg r_1, as well as an integer h (0 < h ≤ n_0).
Output: l = η(h), ρ_{l+1}, R_l, r_l, and r̂_{l+1}.

(3.1) If r_1 = 0 or h < n_0 − n_1, then return 0, 1, [1 0; 0 1], r_0, and r_1.
(3.2) h_1 = ⌈h/2⌉, r_0* = r_0 ↾ 2h_1, r_1* = r_1 ↾ (2h_1 − (n_0 − n_1)).
(3.3) (j − 1, ρ_j*, R*_{j−1}, r*_{j−1}, r̂*_j) = FEEA(r_0*, r_1*, h_1).
(3.4) [r_{j−1}; r̂_j] = R*_{j−1} [r_0 − r_0* x^{n_0−2h_1}; r_1 − r_1* x^{n_0−2h_1}] + [r*_{j−1} x^{n_0−2h_1}; r̂*_j x^{n_0−2h_1}], R_{j−1} = [1 0; 0 1/lc(r̂_j)] R*_{j−1}, ρ_j = ρ_j* lc(r̂_j), r_j = r̂_j / lc(r̂_j), n_j = deg r_j.
(3.5) If r_j = 0 or h < n_0 − n_j, then return j − 1, ρ_j, R_{j−1}, r_{j−1}, and r̂_j.
(3.6) Perform polynomial division with remainder as r_{j−1} = q_j r_j + r̂_{j+1}, ρ_{j+1} = lc(r̂_{j+1}), r_{j+1} = r̂_{j+1}/ρ_{j+1}, n_{j+1} = deg r_{j+1}, R_j = [0 1; 1/ρ_{j+1} −q_j/ρ_{j+1}] R_{j−1}.
(3.7) h_2 = h − (n_0 − n_j), r_j* = r_j ↾ 2h_2, r*_{j+1} = r_{j+1} ↾ (2h_2 − (n_j − n_{j+1})).
(3.8) (l − j, ρ*_{l+1}, S*, r*_{l−j}, r̂*_{l−j+1}) = FEEA(r_j*, r*_{j+1}, h_2).
(3.9) [r_l; r̂_{l+1}] = S* [r_j − r_j* x^{n_j−2h_2}; r_{j+1} − r*_{j+1} x^{n_j−2h_2}] + [r*_{l−j} x^{n_j−2h_2}; r̂*_{l−j+1} x^{n_j−2h_2}], S = [1 0; 0 1/lc(r̂_{l+1})] S*, ρ_{l+1} = ρ*_{l+1} lc(r̂_{l+1}).
(3.10) Return l, ρ_{l+1}, S R_j, r_l, and r̂_{l+1}.
It is easy to verify that Algorithm 3 is equivalent to the FEEA in [12, 17]. The difference between Algorithm 3 and the FEEA in [12, 17] lies in Steps (3.4), (3.5), (3.9), and (3.10): in Steps (3.5) and (3.10), two additional polynomials are returned, and they are used in the updates of Steps (3.4) and (3.9) to reduce complexity. The modification in Step (3.4) was suggested in [14], and the modification in Step (3.9) follows the same idea.

In [12, 14], the complexity bounds of the FEEA are established assuming n_0 ≤ 2h. Thus, we first establish a bound of the FEEA for the case n_0 ≤ 2h below in Theorem 2, using the bounds we developed in Sections 4.1 and 4.2. The proof is similar to those in [12, 14] and hence omitted; interested readers should have no difficulty filling in the details.
Theorem 2. Let T(n_0, h) denote the complexity of the FEEA. When n_0 ≤ 2h, T(n_0, h) is at most 17M(h) log h plus (48h + 2) log h multiplications, (51h + 2) log h additions, and 3h inversions. Furthermore, if the degree sequence is normal, T(2h, h) is at most 10M(h) log h, ((55/2)h + 6) log h multiplications, and ((69/2)h + 3) log h additions.
Compared with the complexity bounds in [12, 14], our bound not only is tighter but also specifies all terms of the complexity and avoids the big-O notation. The saving over [14] is due to the lower complexities of Steps (3.6), (3.9), and (3.10) as explained above. The saving for the normal case over [12] is due to the lower complexity of Step (3.9).

Applying the FEEA to g_0(x) and g_1(x) to find v(x) and g(x) in Algorithm 1, we have n_0 = n and h ≤ t since deg v(x) ≤ t. For RS codes, we always have n > 2t. Thus, the condition n_0 ≤ 2h for the complexity bound in [12, 14] is not valid. It was pointed out in [6, 12] that s_0(x) and s_1(x) as defined in Algorithm 2 can be used instead of g_0(x) and g_1(x), which is the difference between Algorithms 1 and 2. Although such a transform allows us to use the results in [12, 14], it introduces extra cost for message recovery [6]. To compare the complexities of Algorithms 1 and 2, we establish a more general bound in Theorem 3.
Theorem 3. The complexity of the FEEA is no more than 34M(⌈h/2⌉) log ⌈h/2⌉ + M(⌈n_0/2⌉) + 4M(n_0/2 − h/4) + 2M(⌈(n_0 − h)/2⌉) + 4M(h) + 2M(⌈(3/4)h⌉) + 4M(⌈h/2⌉), (48h + 4) log ⌈h/2⌉ + 9n_0 + 22h multiplications, (51h + 4) log ⌈h/2⌉ + 11n_0 + 17h + 2 additions, and 3h inversions.
The proof is also omitted for brevity. The main difference between this case and Theorem 2 lies in the top-level call of the FEEA. The total complexity is obtained by adding 2T(h, ⌈h/2⌉) and the top-level cost.

It can be verified that, when n_0 ≤ 2h, Theorem 3 presents a tighter bound than Theorem 2 since the saving on the top level is accounted for. Note that the complexity bounds in Theorems 2 and 3 assume that the FEEA solves s_{l+1} r_0 + t_{l+1} r_1 = r̂_{l+1} for both t_{l+1} and s_{l+1}. If s_{l+1} is not necessary, the complexity bounds in Theorems 2 and 3 are further reduced by 2M(⌈h/2⌉), 3h + 1 multiplications, and 4h + 1 additions.
4.4. Complexity comparison

Using the results in Sections 4.1, 4.2, and 4.3, we first analyze and then compare the complexities of Algorithms 1 and 2 as well as syndrome-based decoding under fast implementations.

In Steps (1.1) and (2.1), g_1(x) can be obtained by an inverse FFT when n | 2^m − 1 or by the MPI algorithm. In the latter case, the complexity is given in Section 4.1. By Theorem 3, the complexity of Step (1.2) is T(n, t) minus the complexity to compute s_{l+1}. The complexity of Step (2.2) is T(2t, t). The complexity of Step (1.3) is given by the bound in Section 4.2. Similarly, the complexity of Step (2.3) is readily obtained by using the bounds of polynomial division and multiplication.

All the steps of syndrome-based decoding can be implemented using fast algorithms. Both syndrome computation and the Chien search can be done by n-point evaluations. Forney's formula can be done by two t-point evaluations plus t inversions and t multiplications. To use the MPE algorithm, we choose to evaluate on all n points. By Theorem 3, the complexity of the key equation solver is T(2t, t) minus the complexity to compute s_{l+1}.
Note that to simplify the expressions, the complexities are expressed in terms of three kinds of operations: polynomial multiplications, field multiplications, and field additions. Of course, with our bounds on the complexity of polynomial multiplication in Theorem 1, the complexities of the decoding algorithms can be expressed in terms of field multiplications and additions.

Given the code parameters, the comparison among these algorithms is quite straightforward with the above expressions. As in Section 3.2, we attempt to compare the complexities using only R. Such a comparison is of course not accurate, but it sheds light on the comparative complexity of these decoding algorithms without getting entangled in the details. To this end, we make four assumptions. First, we take the complexity bounds on the decoding algorithms as approximate decoding complexities. Second, we use the complexity bound in Theorem 1 as the approximate polynomial multiplication complexity. Third, since the numbers of multiplications and additions are of the same degree, we compare only the numbers of multiplications. Fourth, we focus on the difference of the second highest degree terms since the highest degree terms are the same for all three algorithms. This is because the partial GCD steps of Algorithms 1 and 2, as well as the key equation solver in syndrome-based decoding, differ only in the top level of the recursion of the FEEA. Hence, Algorithms 1 and 2 as well as the key equation solver in syndrome-based decoding have the same highest degree term.
We first compare the complexities of Algorithms 1 and 2. Using Theorem 1, the difference between the second highest degree terms is given by (3/4)(25R − 13)n log^2 n, so Algorithm 1 is less efficient than Algorithm 2 when R > 0.52. Similarly, the complexity difference between syndrome-based decoding and Algorithm 1 is given by (3/4)(1 − 31R)n log^2 n. Thus, syndrome-based decoding is more efficient than Algorithm 1 when R > 0.032. Comparing syndrome-based decoding and Algorithm 2, the complexity difference is roughly −(9/2)(2 + R)n log^2 n. Hence, syndrome-based decoding is more efficient than Algorithm 2 regardless of the rate.
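The crossover rates quoted above follow directly from setting the coefficient differences to zero (25R − 13 = 0 and 1 − 31R = 0). A small sanity-check sketch (the names are ours, for illustration):

```python
# Coefficients of the second-highest-degree terms from the comparison
# above, as multiples of (3/4) n log^2(n).  A positive difference means
# the first algorithm of the pair is the more expensive one.
def alg1_minus_alg2(R):
    return 25 * R - 13          # Algorithm 1 cost minus Algorithm 2 cost

def syndrome_minus_alg1(R):
    return 1 - 31 * R           # syndrome-based cost minus Algorithm 1 cost

threshold_alg12 = 13 / 25       # = 0.52: Algorithm 2 wins above this rate
threshold_synd1 = 1 / 31        # ~ 0.032: syndrome-based wins above this rate
```

At the R = 0.87 rate of the case study in Section 5, both differences favor syndrome-based decoding over Algorithm 1 and Algorithm 2 over Algorithm 1.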
We remark that the conclusion of the above comparison is similar to those obtained in Section 3.2 except that the thresholds are different. Based on fast implementations, Algorithm 1 is more efficient than Algorithm 2 for low-rate codes, and syndrome-based decoding is more efficient than Algorithms 1 and 2 in virtually all cases.
N. Chen and Z. Yan 9
Table 5: Complexity of syndromeless decoding

(255, 223), direct implementation (Mult. / Add. / Inv. / Overall):
  Interpolation:  Alg. 1: 64770 / 64770 / 0 / 1101090;    Alg. 2: 64770 / 64770 / 0 / 1101090
  Partial GCD:    Alg. 1: 16448 / 8192 / 0 / 271360;      Alg. 2: 2176 / 1056 / 0 / 35872
  Msg recovery:   Alg. 1: 57536 / 4014 / 1 / 924606;      Alg. 2: 69841 / 8160 / 1 / 1125632
  Total:          Alg. 1: 138754 / 76976 / 1 / 2297056;   Alg. 2: 136787 / 73986 / 1 / 2262594

(255, 223), fast implementation (Mult. / Add. / Inv. / Overall):
  Interpolation:  Alg. 1: 586 / 6900 / 0 / 16276;         Alg. 2: 586 / 6900 / 0 / 16276
  Partial GCD:    Alg. 1: 8224 / 8176 / 16 / 140016;      Alg. 2: 1392 / 1328 / 16 / 23856
  Msg recovery:   Alg. 1: 3791 / 3568 / 1 / 64240;        Alg. 2: 8160 / 7665 / 1 / 138241
  Total:          Alg. 1: 12601 / 18644 / 17 / 220532;    Alg. 2: 10138 / 15893 / 17 / 178373

(511, 447), direct implementation (Mult. / Add. / Inv. / Overall):
  Interpolation:  Alg. 1: 260610 / 260610 / 0 / 4951590;  Alg. 2: 260610 / 260610 / 0 / 4951590
  Partial GCD:    Alg. 1: 65664 / 32768 / 0 / 1214720;    Alg. 2: 8448 / 4160 / 0 / 156224
  Msg recovery:   Alg. 1: 229760 / 15198 / 1 / 4150896;   Alg. 2: 277921 / 31680 / 1 / 5034276
  Total:          Alg. 1: 556034 / 308576 / 1 / 10317206; Alg. 2: 546979 / 296450 / 1 / 10142090

(511, 447), fast implementation (Mult. / Add. / Inv. / Overall):
  Interpolation:  Alg. 1: 1014 / 23424 / 0 / 41676;       Alg. 2: 1014 / 23424 / 0 / 41676
  Partial GCD:    Alg. 1: 32832 / 32736 / 32 / 624288;    Alg. 2: 5344 / 5216 / 32 / 101984
  Msg recovery:   Alg. 1: 14751 / 14304 / 1 / 279840;     Alg. 2: 31680 / 30689 / 1 / 600947
  Total:          Alg. 1: 48597 / 70464 / 33 / 945804;    Alg. 2: 38038 / 59329 / 33 / 744607
Table 6: Complexity of syndrome-based decoding

(255, 223) (Mult. / Add. / Inv. / Overall):
  Syndrome computation:  Direct: 8128 / 8128 / 0 / 138176;    Fast: 149 / 4012 / 0 / 6396
  Key equation solver:   Direct: 2176 / 1056 / 0 / 35872;     Fast: 1088 / 1040 / 16 / 18704
  Chien search:          Direct: 3825 / 4080 / 0 / 65280;     Fast: 586 / 6900 / 0 / 16276
  Forney's formula:      Direct: 512 / 496 / 16 / 8944;       Fast: 512 / 496 / 16 / 8944
  Total:                 Direct: 14641 / 13760 / 16 / 248272; Fast: 2335 / 12448 / 32 / 50320

(511, 447) (Mult. / Add. / Inv. / Overall):
  Syndrome computation:  Direct: 32640 / 32640 / 0 / 620160;   Fast: 345 / 16952 / 0 / 23162
  Key equation solver:   Direct: 8448 / 4160 / 0 / 156224;     Fast: 4224 / 4128 / 32 / 80736
  Chien search:          Direct: 15841 / 16352 / 0 / 301490;   Fast: 1014 / 23424 / 0 / 41676
  Forney's formula:      Direct: 2048 / 2016 / 32 / 39456;     Fast: 2048 / 2016 / 32 / 39456
  Total:                 Direct: 58977 / 55168 / 32 / 1117330; Fast: 7631 / 46520 / 64 / 185030
5. CASE STUDY AND DISCUSSIONS

5.1. Case study

We examine the complexities of Algorithms 1 and 2 as well as syndrome-based decoding for the (255, 223) CCSDS RS code [25] and a (511, 447) RS code, which have roughly the same rate R = 0.87. Again, both direct and fast implementations are investigated. Due to the moderate lengths, in some cases direct implementation leads to lower complexity, and hence in such cases, the complexity of direct implementation is used for both.
Tables 5 and 6 list the total decoding complexities of Algorithms 1 and 2 as well as syndrome-based decoding, respectively. In the fast implementations, cyclotomic FFT [16] is used for interpolation, syndrome computation, and the Chien search. The classical EEA with fast polynomial multiplication and division is used in the fast implementations since it is more efficient than the FEEA for these lengths. We assume a normal degree sequence, which represents the worst-case scenario [12]. The message recovery steps use long division in the fast implementation since it is more efficient than Newton iteration for these lengths. We use Horner's rule for Forney's formula in both direct and fast implementations.
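Horner's rule evaluates a degree-n polynomial with n multiplications and n additions, which is why it is the natural choice here. A generic sketch (the field operations are abstracted as callables so the same routine serves GF(2^m) arithmetic; names are ours):

```python
def horner_eval(coeffs, x, add, mul):
    """Evaluate c[0] + c[1]*x + ... + c[n]*x^n by Horner's rule, using
    n multiplications and n additions.  The field operations add and
    mul are supplied by the caller (e.g., GF(2^m) routines)."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = add(mul(acc, x), c)
    return acc
```

With ordinary integer arithmetic, evaluating 1 + 2x + 3x^2 at x = 2 gives 17.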
We note that for each decoding step, Tables 5 and 6 not only provide the numbers of finite field multiplications, additions, and inversions, but also list the overall complexities to facilitate comparisons. The overall complexities are computed based on the assumptions that multiplication and inversion are of equal complexity, and that, as in [15], one multiplication is equivalent to 2m additions. The latter assumption is justified by both hardware and software implementations of finite field operations. In hardware implementation, a multiplier over GF(2^m) generated by trinomials requires m^2 − 1 XOR and m^2 AND gates [26], while an adder requires m XOR gates. Assuming that XOR and AND gates have the same complexity, the complexity of a multiplier is 2m times that of an adder over GF(2^m). In software implementation, the complexity can be measured by the number of word-level operations [27]. Using the shift-and-add method as in [27], a multiplication requires m − 1 shift and m XOR word-level operations, respectively, while an addition needs only one XOR word-level operation. Hence, in software implementations the complexity of a multiplication over GF(2^m) is also roughly 2m times that of an addition. Thus, the total complexity of each decoding step in Tables 5 and 6 is obtained by N = 2m(N_mult + N_inv) + N_add, which is in terms of field additions.
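The formula N = 2m(N_mult + N_inv) + N_add is simple enough to reproduce the "Overall" columns of Tables 5 and 6 directly; the sketch below (function name ours) checks it against three table rows:

```python
def overall_complexity(m, n_mult, n_inv, n_add):
    """Overall cost in GF(2^m) additions, counting one multiplication
    or inversion as 2m additions (the equivalence argued above, [15])."""
    return 2 * m * (n_mult + n_inv) + n_add
```

For instance, the interpolation row of the (255, 223) code (m = 8) gives 2·8·64770 + 64770 = 1101090, and the direct Chien search of the (511, 447) code (m = 9) gives 301490, matching the tables.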
Comparisons between direct and fast implementations for each algorithm show that fast implementations considerably reduce the complexities of both syndromeless and syndrome-based decoding, as shown in Tables 5 and 6. The comparison between these tables shows that for these two high-rate codes, both direct and fast implementations of syndromeless decoding are not as efficient as their counterparts of syndrome-based decoding. This observation is consistent with our conclusions in Sections 3.2 and 4.4. For these two codes, hardware costs and throughput of decoder architectures based on direct implementations of syndrome-based and syndromeless decoding can be easily obtained by substituting the parameters in Tables 3 and 4; thus for these two codes, the conclusions in Section 3.3 apply.
5.2. Errors-and-erasures decoding

The complexity analysis of RS decoding in Sections 3 and 4 has assumed errors-only decoding. We extend our complexity analysis to errors-and-erasures decoding below.

Syndrome-based errors-and-erasures decoding has been well studied, and we adopt the approach in [18]. In this approach, the erasure locator polynomial and the modified syndrome polynomial are computed first. After the error locator polynomial is found by the key equation solver, the errata locator polynomial is computed and the error and erasure values are computed by Forney's formula. This approach is used in both direct and fast implementations.

Syndromeless errors-and-erasures decoding can be carried out in two approaches. Let us denote the number of erasures as ν (0 ≤ ν ≤ 2t); up to f = ⌊(2t − ν)/2⌋ errors can be corrected given ν erasures. As pointed out in [5, 6], the first approach is to ignore the ν erased coordinates, thereby transforming the problem into errors-only decoding of an (n − ν, k) shortened RS code. This approach is more suitable for direct implementation. The second approach is similar to the syndrome-based errors-and-erasures decoding described above, which uses the erasure locator polynomial [5]. In the second approach, only the partial GCD step is affected, while the same fast implementation techniques described in Section 4 can be used in the other steps. Thus, the second approach is more suitable for fast implementation.
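The trade-off f = ⌊(2t − ν)/2⌋ between errors and erasures is simple arithmetic, and can be sketched as (function name ours):

```python
def correctable_errors(t, nu):
    """Number of errors f correctable alongside nu erasures by an RS
    code that corrects t errors when nothing is erased:
    f = floor((2t - nu) / 2)."""
    assert 0 <= nu <= 2 * t
    return (2 * t - nu) // 2
```

For the (255, 223) code (t = 16), 5 erasures still leave 13 correctable errors, while 32 erasures leave none.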
We readily extend our complexity analysis for errors-only decoding in Sections 3 and 4 to errors-and-erasures decoding. Our conclusions for errors-and-erasures decoding are the same as those for errors-only decoding: Algorithm 1 is the most efficient only for very low rate codes, and syndrome-based decoding is the most efficient algorithm for high-rate codes. For brevity, we omit the details; interested readers will have no difficulty filling them in.
6. CONCLUSION

We analyze the computational complexities of two syndromeless decoding algorithms for RS codes using both direct implementation and fast implementation, and compare them with their counterparts based on syndrome-based decoding. With either direct or fast implementation, syndromeless algorithms are more efficient than the syndrome-based algorithms only for RS codes with very low rate. When implemented in hardware, syndrome-based decoders also have lower complexity and higher throughput. Since RS codes in practice are usually high-rate codes, syndromeless decoding algorithms are not suitable for these codes. Our case study also shows that fast implementations can significantly reduce the decoding complexity. Errors-and-erasures decoding is also investigated, although the details are omitted for brevity.
ACKNOWLEDGMENTS

This work was supported in part by Thales Communications Inc. and in part by a grant from the Commonwealth of Pennsylvania, Department of Community and Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA). The authors are grateful to Dr. Jürgen Gerhard for valuable discussions. The authors would also like to thank the reviewers for their constructive comments, which have resulted in significant improvements in the manuscript. The material in this paper was presented in part at the IEEE Workshop on Signal Processing Systems, Shanghai, China, October 2007.
REFERENCES

[1] S. B. Wicker and V. K. Bhargava, Eds., Reed–Solomon Codes and Their Applications, IEEE Press, New York, NY, USA, 1994.
[2] E. R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, New York, NY, USA, 1968.
[3] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa, "A method for solving key equation for decoding Goppa codes," Information and Control, vol. 27, no. 1, pp. 87–99, 1975.
[4] A. Shiozaki, "Decoding of redundant residue polynomial codes using Euclid's algorithm," IEEE Transactions on Information Theory, vol. 34, no. 5, part 1, pp. 1351–1354, 1988.
[5] A. Shiozaki, T. K. Truong, K. M. Cheung, and I. S. Reed, "Fast transform decoding of nonsystematic Reed–Solomon codes," IEE Proceedings: Computers and Digital Techniques, vol. 137, no. 2, pp. 139–143, 1990.
[6] S. Gao, "A new algorithm for decoding Reed–Solomon codes," in Communications, Information and Network Security, V. K. Bhargava, H. V. Poor, V. Tarokh, and S. Yoon, Eds., pp. 55–68, Kluwer Academic Publishers, Norwell, Mass, USA, 2003.
[7] S. V. Fedorenko, "A simple algorithm for decoding Reed–Solomon codes and its relation to the Welch–Berlekamp algorithm," IEEE Transactions on Information Theory, vol. 51, no. 3, pp. 1196–1198, 2005.
[8] S. V. Fedorenko, "Correction to 'A simple algorithm for decoding Reed–Solomon codes and its relation to the Welch–Berlekamp algorithm'," IEEE Transactions on Information Theory, vol. 52, no. 3, p. 1278, 2006.
[9] L. R. Welch and E. R. Berlekamp, "Error correction for algebraic block codes," US Patent 4633470, September 1983.
[10] J. Justesen, "On the complexity of decoding Reed–Solomon codes," IEEE Transactions on Information Theory, vol. 22, no. 2, pp. 237–238, 1976.
[11] J. von zur Gathen and J. Gerhard, "Arithmetic and factorization of polynomials over F_2," Tech. Rep. tr-rsfb-96-018, University of Paderborn, Paderborn, Germany, 1996, http://www-math.uni-paderborn.de/~aggathen/Publications/gatger96a.ps.
[12] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press, Cambridge, UK, 2nd edition, 2003.
[13] D. G. Cantor, "On arithmetical algorithms over finite fields," Journal of Combinatorial Theory, Series A, vol. 50, no. 2, pp. 285–300, 1989.
[14] S. Khodadad, Fast rational function reconstruction, M.S. thesis, Simon Fraser University, Burnaby, BC, Canada, 2005.
[15] Y. Wang and X. Zhu, "A fast algorithm for the Fourier transform over finite fields and its VLSI implementation," IEEE Journal on Selected Areas in Communications, vol. 6, no. 3, pp. 572–577, 1988.
[16] N. Chen and Z. Yan, "Reduced-complexity cyclotomic FFT and its application in Reed–Solomon decoding," in Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS '07), pp. 657–662, Shanghai, China, October 2007.
[17] S. Khodadad and M. Monagan, "Fast rational function reconstruction," in Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC '06), pp. 184–190, ACM Press, Genoa, Italy, July 2006.
[18] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, John Wiley & Sons, Hoboken, NJ, USA, 2005.
[19] J. J. Komo and L. L. Joiner, "Adaptive Reed–Solomon decoding using Gao's algorithm," in Proceedings of the IEEE Military Communications Conference (MILCOM '02), vol. 2, pp. 1340–1343, Anaheim, Calif, USA, October 2002.
[20] E. Berlekamp, G. Seroussi, and P. Tong, "A hypersystolic Reed–Solomon decoder," in Reed–Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds., pp. 205–241, IEEE Press, New York, NY, USA, 1994.
[21] D. Mandelbaum, "On decoding of Reed–Solomon codes," IEEE Transactions on Information Theory, vol. 17, no. 6, pp. 707–712, 1971.
[22] A. E. Heydtmann and J. M. Jensen, "On the equivalence of the Berlekamp–Massey and the Euclidean algorithms for decoding," IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2614–2624, 2000.
[23] Z. Yan and D. V. Sarwate, "New systolic architectures for inversion and division in GF(2^m)," IEEE Transactions on Computers, vol. 52, no. 11, pp. 1514–1519, 2003.
[24] T. Park, "Design of the (248, 216) Reed–Solomon decoder with erasure correction for Blu-ray disc," IEEE Transactions on Consumer Electronics, vol. 51, no. 3, pp. 872–878, 2005.
[25] "Telemetry Channel Coding," CCSDS Std. 101.0-B-6, October 2002.
[26] B. Sunar and Ç. K. Koç, "Mastrovito multiplier for all trinomials," IEEE Transactions on Computers, vol. 48, no. 5, pp. 522–527, 1999.
[27] A. Mahboob and N. Ikram, "Lookup table based multiplication technique for GF(2^m) with cryptographic significance," IEE Proceedings: Communications, vol. 152, no. 6, pp. 965–974, 2005.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 473613, 10 pages
doi:10.1155/2008/473613
Research Article
Efficient Decoding of Turbo Codes with Nonbinary Belief Propagation

Charly Poulliat,^1 David Declercq,^1 and Thierry Lestable^2

^1 ETIS Laboratory, UMR 8051, ENSEA/UCP/CNRS, Cergy-Pontoise 95014, France
^2 Samsung Electronics Research Institute, Communications House, South Street, Staines, Middlesex TW18 4QE, UK
Correspondence should be addressed to Charly Poulliat, charly.poulliat@ensea.fr
Received 31 October 2007; Revised 25 February 2008; Accepted 27 March 2008
Recommended by Branka Vucetic
This paper presents a new approach to decoding turbo codes using a nonbinary belief propagation decoder. The proposed approach can be decomposed into two main steps. First, a nonbinary Tanner graph representation of the turbo code is derived by clustering the binary parity-check matrix of the turbo code. Then, a group belief propagation decoder runs several iterations on the obtained nonbinary Tanner graph. We show in particular that it is necessary to add a preprocessing step on the parity-check matrix of the turbo code in order to ensure good topological properties of the Tanner graph and thereby good iterative decoding performance. Finally, by capitalizing on the diversity which comes from the existence of distinct efficient preprocessings, we propose a new decoding strategy, called decoder diversity, that intends to take benefit from the diversity through collaborative decoding schemes.

Copyright © 2008 Charly Poulliat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION

Turbo codes and low-density parity-check (LDPC) codes have long been recognized to belong to the family of modern error-correcting codes. Although often opponents in standards and applications, these two classes of codes share common properties, the most important one being that they have a sparse graph representation that allows them to be decoded efficiently and iteratively, using either the maximum a posteriori (MAP) algorithm [1] for turbo codes or the belief propagation (BP) algorithm for LDPC codes [2], as well as their low-complexity iterative variants.

Moreover, LDPC and turbo codes are two coding candidates which are often options within the same system [3, 4]. It is thus interesting to investigate a common architecture/algorithm at the receiver side to enable switching easily among them, whilst still maintaining reasonable cost and area size.
Even if turbo codes effectively exhibit a sparse factor graph representation for which the BP decoder is equivalent to the so-called turbo decoder [5, 6], this factor graph representation is composed of different types of nodes, both for variable and for function nodes, which are not reduced to parity-check constraints (see [5] for more details). Later, some researchers have tried to use a factor graph representation of the turbo code based only on parity-check equations [7]. We will refer to a factor graph with only parity-check constraints for the function nodes (binary or not) as a Tanner graph in the rest of the paper [8].

The classical BP algorithm (sometimes called sum-product) on the Tanner graph of a turbo code does not perform sufficiently well to compete with the turbo decoder performance [7]. This is mainly due to the inherent presence of many short cycles of length 4, which lead to a poor convergence behavior inducing loss of performance. In order to solve the problem of these short cycles, the authors of [9, 10] propose to use special convolutional codes as components of the turbo code, called low-density convolutional codes, for which an iterative decoder based on their Tanner graph experiences less statistical dependence, and therefore exhibits very good performance.
Our approach is different from [10] since we aim at having a generic BP decoder which performs close to the best performance, without imposing any constraint on the component code. In this paper, we present a new approach to decoding parallel turbo codes (i.e., binary, duo-binary, punctured or not, etc.) using a nonbinary belief propagation decoder. The generic structure of the proposed iterative decoder is illustrated in Figure 1. The general approach can be decomposed into two main steps: the first step consists
[Figure 1 shows a block diagram: the code parameters (or parity matrix H) feed a preprocessing block, then a clustering block of order p, and finally a group BP decoder operating on the channel likelihoods.]
Figure 1: Block representation of the generic turbo decoder based on group BP decoder.
in building a nonbinary Tanner graph of the turbo code using only parity-check nodes defined over a certain finite group, and symbol nodes representing groups of bits. The Tanner graph is obtained by a proper clustering of order p of the binary parity-check matrix of the turbo code, called the "binary image." However, the clustering of the commonly used binary representation of turbo codes appears not to be suitable for building a nonbinary Tanner graph representation that leads to good performance under iterative decoding. Thus, we will show in the paper that there exist some suitable preprocessing functions of the parity-check matrix (first block of Figure 1) for which, after the bit clustering (second block of Figure 1), the corresponding nonbinary Tanner graphs have good topological properties. This preliminary two-round step is necessary to have good Tanner graph representations that outperform the classical representations of turbo codes under iterative decoding. Then, the second step is a BP-based decoding stage (last block in Figure 1) and thus consists in running several iterations of group belief propagation (group BP), as introduced in [11], on the nonbinary Tanner graph. Furthermore, we will show that the decoder can also fully benefit from the decoding diversity that inherently arises from concurrent extended Tanner graph representations, leading to the general concept of decoder diversity. The proposed algorithms show very good performance, as opposed to the binary BP decoder, and serve as a first step toward viewing LDPC and turbo codes within a unified framework from the decoder point of view, which strengthens the idea of handling them with a common approach.
The remainder of the paper is organized as follows. In Section 2, we describe how to decode turbo codes with a group BP decoder. To this end, we review how to derive the binary representation of the parity-check matrix $\mathbf{H}_{tc}$ of a parallel turbo code. Then, we explain how to build the non-binary Tanner graph of a turbo code based on a clustering technique and describe the group BP decoding algorithm based on this representation. In Section 3, we discuss how to choose a posteriori good matrix representations and how to take advantage, in the decoding process, of the inherent diversity offered by concurrent preprocessings. To this end, we present some choices for the preprocessing of the matrix $\mathbf{H}_{tc}$ required before clustering to build a Tanner graph with good topological properties that performs well under group BP decoding. Then, we introduce in Section 4 the concept of decoder diversity and show how it can be used to further enhance performance. Finally, conclusions and perspectives are drawn in Section 5.
2. DECODING A TURBO CODE AS A NON-BINARY LDPC CODE
In this section, we present the different key elements that make it possible to decode turbo codes as non-binary LDPC codes defined over extended binary groups. First, we briefly review how to derive the binary representation of the parity-check matrix $\mathbf{H}_{tc}$ of a parallel turbo code from the parity-check matrix of a component code. Then, we explain how to build the non-binary Tanner graph of a turbo code based on a clustering technique and describe how the group BP decoding algorithm can be used to efficiently decode turbo codes based on this extended representation.
2.1. Binary parity-check matrix of a turbo code

The first step in our approach consists in deriving a binary parity-check matrix representation of the turbo code. In this paper, we focus only on parallel turbo codes with identical component codes.
2.1.1. Parity-check matrix of convolutional codes

The binary image of the turbo code is essentially based on the binary representation of the parity-check matrices of its component codes. Following the derivations presented in [12], the parity-check matrix for both feedforward convolutional encoders and their equivalent recursive systematic form is generally derived using the Smith decomposition of the polynomial generator matrix $G(D)$, where $G(D)$ is a $k \times n$ matrix that gives the transfer of the $k$ inputs into the $n$ outputs of the convolutional encoder, and $D$ is the delay operator (please refer to [12] for more details about this decomposition). From this decomposition, the polynomial syndrome former matrix $\mathbf{H}^T(D)$ [12], of dimensions $n \times (n-k)$, can be derived, and it can be expanded as
$$\mathbf{H}^T(D) = \mathbf{H}_0^T + \mathbf{H}_1^T D + \cdots + \mathbf{H}_{m_s}^T D^{m_s}, \qquad (1)$$
where $\mathbf{H}_i^T$, $0 \le i \le m_s$, is a matrix of dimensions $n \times (n-k)$, and $m_s$ is the maximum degree of the polynomials in $\mathbf{H}^T(D)$. For both feedforward convolutional encoders and their recursive systematic form, it is possible to derive the binary image from the semi-infinite matrix $\mathbf{H}^T$ given by
$$\mathbf{H}^T = \begin{pmatrix}
\mathbf{H}_0^T & \mathbf{H}_1^T & \cdots & \mathbf{H}_{m_s}^T & & \\
& \mathbf{H}_0^T & \mathbf{H}_1^T & \cdots & \mathbf{H}_{m_s}^T & \\
& & \ddots & \ddots & & \ddots
\end{pmatrix}. \qquad (2)$$
Charly Poulliat et al.

When direct truncation is used, it is possible to derive from $\mathbf{H}^T$ the finite-length binary parity-check matrix of dimension $(N-K) \times N$, given by
$$\mathbf{H} = \begin{pmatrix}
\mathbf{H}_0 & & & & \\
\mathbf{H}_1 & \mathbf{H}_0 & & & \\
\vdots & \vdots & \ddots & & \\
\mathbf{H}_{m_s} & \mathbf{H}_{m_s-1} & \cdots & \mathbf{H}_0 & \\
& \ddots & \ddots & & \ddots \\
& & \mathbf{H}_{m_s} & \cdots & \cdots & \mathbf{H}_0
\end{pmatrix}, \qquad (3)$$
where N and K are the codeword and information block
lengths, respectively.
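The banded structure of (3) is mechanical to reproduce: block row $i$ holds $\mathbf{H}_{i-j}$ in block column $j$ whenever $0 \le i-j \le m_s$, and zeros elsewhere. A minimal sketch in Python (the function name and the list-of-lists matrix representation are our own choices, not from the paper):

```python
def truncated_parity_matrix(blocks, num_block_rows):
    """Assemble the direct-truncation parity-check matrix of eq. (3).

    blocks = [H_0, H_1, ..., H_ms], each an (n-k) x n binary matrix
    given as a list of lists.  Block row i holds H_{i-j} in block
    column j whenever 0 <= i - j <= ms, zeros elsewhere.
    """
    ms = len(blocks) - 1
    r, n = len(blocks[0]), len(blocks[0][0])
    L = num_block_rows
    H = [[0] * (n * L) for _ in range(r * L)]
    for i in range(L):            # block row index
        for j in range(L):        # block column index
            d = i - j
            if 0 <= d <= ms:
                for a in range(r):
                    for b in range(n):
                        H[i * r + a][j * n + b] = blocks[d][a][b]
    return H
```

With $m_s = 1$ and $1 \times 2$ blocks, three block rows already show the staircase band of (3).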
Under some length restrictions for the recursive case [13, 14], it is also possible to derive the binary image of the parity-check matrix $\mathbf{H}_{tb}$ of the tail-biting code from the parity-check matrix $\mathbf{H}$ [15], for feedforward convolutional encoders and their recursive systematic form. Using the so-called "wrap-around" technique, it can finally be represented as follows:
$$\mathbf{H}_{tb} = \begin{pmatrix}
\mathbf{H}_0 & & & \mathbf{H}_{m_s} & \cdots & \mathbf{H}_1 \\
\mathbf{H}_1 & \mathbf{H}_0 & & & \ddots & \vdots \\
\vdots & \vdots & \ddots & & & \mathbf{H}_{m_s} \\
\mathbf{H}_{m_s} & & & \mathbf{H}_0 & & \\
& \ddots & & \vdots & \ddots & \\
& & \mathbf{H}_{m_s} & \mathbf{H}_{m_s-1} & \cdots & \mathbf{H}_0
\end{pmatrix}. \qquad (4)$$
Note that, in each case, both systematic and nonsystematic encoders give the same codewords and thus share the same parity-check matrix [12, 16].
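The wrap-around construction of (4) only changes the block index to a cyclic one: block $(i, j)$ holds $\mathbf{H}_{(i-j) \bmod L}$, so the first $m_s$ block rows pick up the wrapped blocks in the right-hand corner. A sketch under the same kind of list-of-lists conventions (function name ours, assuming $L > m_s$):

```python
def tailbiting_parity_matrix(blocks, L):
    """Assemble the tail-biting ("wrap-around") matrix of eq. (4).

    blocks = [H_0, ..., H_ms]; block (i, j) holds H_{(i-j) mod L}
    whenever that cyclic distance is at most ms.  Requires L > ms.
    """
    ms = len(blocks) - 1
    r, n = len(blocks[0]), len(blocks[0][0])
    H = [[0] * (n * L) for _ in range(r * L)]
    for i in range(L):
        for j in range(L):
            d = (i - j) % L       # cyclic block distance
            if d <= ms:
                for a in range(r):
                    for b in range(n):
                        H[i * r + a][j * n + b] = blocks[d][a][b]
    return H
```

For $m_s = 1$ the only difference from direct truncation is the $\mathbf{H}_1$ block wrapped into the top-right corner, which is exactly what destroys the tree condition discussed later.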
2.1.2. Parity-check matrix of turbo codes

For the recursive systematic convolutional codes of rate $k/(k+1)$ that mainly compose classical turbo codes in the standards, the matrix $\mathbf{H}^T(D)$ is simply given by [12]
$$\mathbf{H}^T(D) = \begin{pmatrix} h_1^T(D) \\ h_2^T(D) \\ \vdots \\ h_{k+1}^T(D) \end{pmatrix}, \qquad (5)$$
where $h_i^T(D)$, $1 \le i \le k$, are the feedforward polynomials and $h_{k+1}^T(D)$ is the feedback polynomial defining the recursive systematic convolutional code. Then, for this kind of component code, the binary parity-check matrix can simply be derived using (2)-(4).
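For a rate-1/2 recursive systematic code $(1, f(D)/g(D))$, the parity check reads $f(D)u(D) + g(D)v(D) \equiv 0$ over GF(2), so each block $\mathbf{H}_i$ of (1) is the $1 \times 2$ row $[f_i \; g_i]$. The following sketch derives these blocks from the octal notation, reading the bits MSB-first so that the leading bit is the coefficient of $D^0$ (one common convention; with it, the blocks obtained for $(1, 23_8/35_8)$ reproduce the rows of the example matrix $\mathbf{H}$ in (7)):

```python
def octal_to_coeffs(octal_str):
    """Binary coefficients of a polynomial in D from octal notation,
    MSB-first: '23' -> 10011 -> 1 + D^3 + D^4 (a common convention)."""
    return [int(b) for b in bin(int(octal_str, 8))[2:]]

def component_blocks(feedforward_oct, feedback_oct):
    """Blocks H_0..H_ms of eq. (1) for a rate-1/2 RSC code (1, f/g):
    the parity check f(D)u(D) + g(D)v(D) = 0 gives H_i = [f_i  g_i]."""
    f = octal_to_coeffs(feedforward_oct)
    g = octal_to_coeffs(feedback_oct)
    ms = max(len(f), len(g)) - 1
    f = f + [0] * (ms + 1 - len(f))   # pad to a common degree ms
    g = g + [0] * (ms + 1 - len(g))
    return [[[f[i], g[i]]] for i in range(ms + 1)]
```

Feeding these blocks into a banded construction such as (3) yields the binary image used in the example of Section 2.1.3.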
As the recursive component codes of turbo codes are systematic, the columns of the associated parity-check matrix $\mathbf{H}$ of dimension $(N-K) \times N$ can be assigned to information bits and to redundancy bits. Note that, when using the preceding expressions of $\mathbf{H}$, the output bits of the convolutional encoder are assumed to alternate within the codeword. After some column permutations, we can rewrite $\mathbf{H}$ as $\mathbf{H} = [\mathbf{H}_i \; \mathbf{H}_r]$, where $\mathbf{H}_i$ and $\mathbf{H}_r$ contain the columns of $\mathbf{H}$ relative to the information and redundancy bits, respectively. Using this notation, we can easily derive the parity-check matrix of a turbo code with two parallel component codes as follows [17, 18]:
$$\mathbf{H}_{tc} = \begin{pmatrix} \mathbf{H}_i & \mathbf{H}_r & \mathbf{0} \\ \mathbf{H}_i \Pi^T & \mathbf{0} & \mathbf{H}_r \end{pmatrix}, \qquad (6)$$
where $\Pi^T$ is the transpose of the interleaver permutation matrix at the input of the second component encoder. In that case, $\mathbf{H}_{tc}$ has dimensions $2(N-K) \times (2N-K)$. Of course, this technique can easily be generalized to more than two components.
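The assembly in (6) is pure block stacking. A toy sketch (the function name and the interleaver convention, `perm[b]` giving the column of $\mathbf{H}_i$ picked for position $b$, are our own modeling choices):

```python
def turbo_parity_matrix(Hi, Hr, perm):
    """Assemble H_tc = [[Hi, Hr, 0], [Hi.P^T, 0, Hr]] as in eq. (6).

    Hi, Hr: binary matrices (lists of lists) with the same row count.
    perm:   interleaver as a column reordering of Hi (a modeling
            choice; the exact convention depends on the interleaver
            definition used).
    """
    m = len(Hi)
    zr = [0] * len(Hr[0])                       # zero block row piece
    top = [Hi[a] + Hr[a] + zr for a in range(m)]
    Hip = [[Hi[a][perm[b]] for b in range(len(perm))] for a in range(m)]
    bot = [Hip[a] + zr + Hr[a] for a in range(m)]
    return top + bot
```

Even on a toy $2 \times 2$ identity $\mathbf{H}_i$ the block layout of (6) is visible: the redundancy columns of each component occupy disjoint column blocks, while the information columns are shared (directly, then interleaved).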
2.1.3. Example

To illustrate this section, we consider an $R = 1/3$ turbo code with two rate one-half component codes with octal parameters $(1, 23_8/35_8)$. Under direct truncation, the parity-check matrices of a component code and of the corresponding turbo code are given by the matrices $\mathbf{H}$ and $\mathbf{H}_{tc}$, respectively:
$$\mathbf{H} = \begin{pmatrix}
1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0\\
1&0&0&1&0&1&1&1&0&0&0&0&0&0&0&0\\
1&1&1&0&0&1&0&1&1&1&0&0&0&0&0&0\\
0&0&1&1&1&0&0&1&0&1&1&1&0&0&0&0\\
0&0&0&0&1&1&1&0&0&1&0&1&1&1&0&0\\
0&0&0&0&0&0&1&1&1&0&0&1&0&1&1&1
\end{pmatrix},$$

$$\mathbf{H}_{tc} = \begin{pmatrix}
1&0&0&0&0&0&0&0&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&0&0&0&0&0&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&1&0&0&0&0&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0\\
1&0&0&1&0&0&0&0&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
1&1&0&0&1&0&0&0&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0\\
0&1&1&0&0&1&0&0&0&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&1&1&0&0&1&0&0&0&1&0&1&1&1&0&0&0&0&0&0&0&0&0\\
0&0&0&1&1&0&0&1&0&0&0&1&0&0&1&1&0&0&0&0&0&0&0&0\\
0&0&1&0&0&0&0&0&0&0&0&0&0&0&0&0&1&0&0&0&0&0&0&0\\
0&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&1&1&0&0&0&0&0&0\\
0&0&0&1&0&0&0&0&0&0&0&0&0&0&0&0&1&1&1&0&0&0&0&0\\
0&0&1&0&1&0&0&0&0&0&0&0&0&0&0&0&0&1&1&1&0&0&0&0\\
0&1&1&0&0&1&0&0&0&0&0&0&0&0&0&0&1&0&1&1&1&0&0&0\\
1&1&0&1&0&0&0&0&0&0&0&0&0&0&0&0&0&1&0&1&1&1&0&0\\
0&0&0&1&1&0&0&1&0&0&0&0&0&0&0&0&0&0&1&0&1&1&1&0\\
0&0&0&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&1&0&1&1&1
\end{pmatrix}. \qquad (7)$$
2.2. Clustering and preprocessing

Once the parity-check matrix $\mathbf{H}$ of a turbo code has been derived, we obtain a non-binary Tanner graph by applying a clustering technique, which is essentially the same as the one described in [11]. The matrix $\mathbf{H}$ is decomposed into groups of $p$ rows and $p$ columns. Each group of $p$ rows represents a generalized parity-check node in the Tanner graph, defined in the finite group $G(2^p)$, and each group of columns represents a symbol node, built from the concatenation of $p$ bits ($p$-tuples) defining elements of $G(2^p)$.

A cluster is then defined as a $p \times p$ submatrix $\mathbf{h}_{ij}$ of $\mathbf{H}$, and each time a cluster contains nonzero values (ones in this case), an edge connecting the corresponding group of rows and group of columns is created in the Tanner graph. To each nonzero cluster is associated a linear function $f_{ij}(\cdot)$ from $G(2^p)$ to $G(2^p)$ which has $\mathbf{h}_{ij}$ as its matrix representation. Using this notation, the $i$th generalized parity-check equation defined over $G(2^p)$ can be written as
$$\sum_j f_{ij}(c_j) \equiv 0, \qquad (8)$$
where $c_j$ is the $j$th coordinate of a codeword whose symbols are defined in $G(2^p)$.
To illustrate the impact of clustering on the Tanner graph representation, and to give some insight that motivates extending the representation from the binary domain to a non-binary one, we consider as a simple example the clustering of the recursive systematic convolutional code with octal polynomial representation $(1, 5_8/7_8)$. We assume that 12 information bits have been sent using direct truncation. A $4 \times 4$ clustering is then applied to the binary parity-check matrix. Using the representation of (3), the resulting clustered matrix is given by
$$\mathbf{H} = \begin{pmatrix}
1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1
\end{pmatrix}. \qquad (9)$$
We are now able to associate with $\mathbf{H}$ a non-binary Tanner graph representation whose generalized parity-check constraints now apply to binary 4-tuples. The Tanner graph corresponding to our example is finally given in Figure 2(b), where it is compared with the Tanner graph associated with the binary image defined by $\mathbf{H}$ (Figure 2(a)). Through this example, we can see that, for convolutional codes, when using the representation given in (3), we can still ensure a sparse graph and even reach a tree representation when increasing the order of the representation. In fact, for rate one-half codes, it has been observed that there exists a minimum value of $p$ for which the graph is a tree. This implies that using a BP-like decoder
Figure 2: Comparison of different Tanner graph representations of the recursive systematic convolutional code with bit clustering of order p = 4 versus that of the binary image defined by H.
will lead to maximum a posteriori (MAP) symbol decoding; in that case, it has been verified that BP and the MAP decoder have the same performance. Unfortunately, this tree condition no longer holds when we use the alternative representation of the parity-check matrix of a convolutional code that appears in the turbo-code parity-check matrix, as can be seen in the Tanner graph representation of our previous example in Figure 2(c). This representation introduces cycles even in the extended representation of the convolutional code using bit clustering and, as a result, in the extended representation of the turbo code. Moreover, when tail biting is used, there is no way to ensure a tree condition, due to the nonzero elements in the right-hand corner of the tail-biting parity-check matrix of the component code. Thus, a remaining issue is how to derive a "good" extended Tanner graph representation. To this end, we will present in Section 3 how to overcome these problems and ensure fair performance under BP decoding by applying an efficient preprocessing of the parity-check matrix of the turbo code.
2.3. Non-binary group belief propagation decoding

The Tanner graph obtained by preprocessing and clustering the binary image does not correspond to a usual code defined over a finite field $GF(q = 2^p)$, but it can be defined over a finite group $G(2^p)$ of the same order (see [11] for more details). We will refer to the belief-propagation decoder on group codes as the group BP decoder. The group BP decoder is very similar in nature to regular BP over finite fields. The
Figure 3: Tanner graph of a non-binary LDPC code defined over a finite group G(q), showing the messages of size q (U_vp, U_pc, V_cp, V_pv) exchanged between the symbol nodes, the linear function nodes h_ji, and the parity-check nodes, together with the channel values, the information symbols c_i, and the interleaver Π.
only difference is that the nonzero values of a parity-check equation are replaced with more general linear functions from $G(2^p)$ to $G(2^p)$, defined by the binary matrices which form the clusters. In particular, it is shown in [11] that group BP can be implemented in the Fourier domain with a reasonable decoding complexity.

We briefly review the main steps of the group BP decoder and its application to the non-binary Tanner graph of a turbo code. The modified Tanner graph of an LDPC code over a finite group is depicted in Figure 3, in which we indicate the notations used for the vector messages. In addition to the classical variable and check nodes, we add function nodes to represent the effect of the linear transformations deduced from the clusters, as explained in the previous section. The group BP decoder has four main steps, which use $q = 2^p$-dimensional probability messages.
(i) Data node update: the output extrinsic message is obtained from the term-by-term product of all input messages, including the channel-likelihood message, except the one carried on the same branch of the Tanner graph.

(ii) Function node update: the messages are updated through the function nodes $f_{ij}(\cdot)$. This message update reduces to a cyclic permutation in the case of a finite-field code, but for a more general linear function $\beta = f_{ij}(\alpha)$ from $G(2^p)$ to $G(2^p)$, the update operation is
$$U_{pc}[\beta_j] = \sum_i U_{vp}[\alpha_i], \quad j = 0, \ldots, q-1, \quad \beta_j = f_{ij}(\alpha_i). \qquad (10)$$
(iii) Check node update: this step is identical to the BP decoder over finite fields and can be efficiently implemented using a fast Fourier transform; see, for example, [11, 19] for more details.

(iv) Inverse function node update: using the function $f_{ij}(\cdot)$ backwards, that is, by identifying the values $\alpha_i$ which have the same image $\beta_j$, the update equation is
$$V_{pv}[\alpha_i] = V_{cp}[\beta_j] \quad \forall \alpha_i : \beta_j = f_{ij}(\alpha_i). \qquad (11)$$
These four steps define one decoding iteration of a general parity-check code over a finite group, which is the case for a clustered convolutional or turbo code as described previously. Note that the function node update is simply a reordering of the values both in the finite-field case and when the cluster defining the function $f_{ij}(\cdot)$ is full rank. When the cluster has deficient rank $r < p$, which is often the case when clustering a turbo code, only $2^r$ entries of the message $U_{pc}$ are filled and the remaining entries are set to zero.

Note that we do not discuss decoding complexity issues in this paper; rather, we focus on the feasibility of decoding with a BP decoder. Of course, a non-binary BP decoder is naturally much more computationally intensive than a binary BP or a turbo decoder. However, reduced-complexity non-binary decoders have recently been proposed which exhibit a good complexity/performance tradeoff even compared to binary decoders [20]. The reduced-complexity decoder can easily be adapted to codes over finite groups, since the function node update is no more complex in the group case than in the field case.
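Steps (ii) and (iv) amount to pushing a q-ary probability vector through the map $\beta = f_{ij}(\alpha)$ and back. A sketch of the forward update (10) and the backward update (11); the bit-to-integer convention (LSB-first) is our own assumption, and the code illustrates how a rank-deficient cluster leaves only $2^r$ entries of the forward message nonzero:

```python
def apply_cluster(h):
    """Precompute beta = f(alpha) = h . alpha over GF(2) for all 2^p
    values of alpha (bits of alpha read LSB-first, an arbitrary
    convention for this sketch)."""
    p = len(h)
    images = []
    for alpha in range(1 << p):
        bits = [(alpha >> b) & 1 for b in range(p)]
        beta = 0
        for r in range(p):                       # row r -> bit r of beta
            if sum(h[r][b] & bits[b] for b in range(p)) % 2:
                beta |= 1 << r
        images.append(beta)
    return images

def function_node_forward(msg, images):
    """Eq. (10): accumulate U_vp[alpha] into U_pc[f(alpha)]."""
    out = [0.0] * len(msg)
    for alpha, pr in enumerate(msg):
        out[images[alpha]] += pr
    return out

def function_node_backward(msg, images):
    """Eq. (11): V_pv[alpha] = V_cp[f(alpha)] for every alpha."""
    return [msg[images[alpha]] for alpha in range(len(images))]
```

For a full-rank cluster, `images` is a permutation and both updates are pure reorderings, exactly as stated above.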
3. COMPARISON OF BINARY IMAGES OBTAINED WITH DIFFERENT PREPROCESSINGS
In this section, we discuss some issues relevant to improving the performance of the group BP decoder. We show in particular that some preprocessing functions lead to interesting Tanner graph topologies and good performance under iterative decoding.

3.1. Selection of preprocessing for an efficient sparse graph representation

It should be noted that the performance of the group BP decoder depends highly on the structure of the non-binary Tanner graph. In our framework, it is possible to apply specific transformations to the binary image $\mathbf{H}$ before the clustering operation so that the Tanner graph has desirable properties. Indeed, any row linear transformation $A$ and column permutation $\Pi$ applied to $\mathbf{H}$ do not change the code space but do change the topology of the clustered Tanner graph. Let us denote by $\mathbf{H}' = P_c(\mathbf{H}) = A \cdot \mathbf{H} \cdot \Pi$ the preprocessed
binary parity-check matrix. We propose in this paper two preprocessing techniques that we found attractive in terms of Tanner graph properties; they are described below and depicted in Figure 4.

$(P_{c1})$ This preprocessing is defined by alternating the information bits and the redundancy bits of the first convolutional code of the parallel turbo code. With this technique, we obtain two parts in the parity-check matrix, each with an upper triangular form and a diagonal (or near diagonal for the rectangular part of $\mathbf{H}'$), thereby reducing the number of nonzero clusters in the non-binary Tanner graph deduced from $\mathbf{H}'$. Note that a second preprocessing of this type can be considered by alternating the information bits and the redundancy bits of the second convolutional encoder.
$(P_{c2})$ This preprocessing is obtained by column permutations aiming at the most concentrated diagonal in the parity-check matrix, that is, minimizing the number of clusters created on the diagonal. This should be a good choice, since the clusters on the diagonal are the densest in the Tanner graph and are assumed to contribute the most to the performance degradation of the BP decoder when they participate in cycles. Indeed, we have verified by simulation on several turbo codes that preprocessing $P_{c2}$ yields fewer nonzero clusters of a given size on the diagonal than preprocessing $P_{c1}$. Note that, by properly choosing the columns to be permuted, several images of this type can be created.
Note that the two proposed preprocessing techniques are restricted to column permutations, that is, to the special case $A = \mathrm{Id}$, where $\mathrm{Id}$ is the identity transformation. This case is the simplest one: the transformation keeps the binary Tanner graph of the code unchanged, but the non-binary clustered Tanner graph is modified after preprocessing. We will show through simulations that this has an important impact on decoder performance. Although Figure 4 plots examples of rate $R = 1/3$ turbo codes, exactly the same preprocessing strategies can be applied to any type of turbo code, that is, any rate, for punctured and/or multibinary turbo codes.
3.2. Simulation results with different preprocessings

In this section, we apply the different preprocessing techniques presented in the previous section to duo-binary turbo codes with rate $R = 0.5$ and sizes $N = \{848, 3008\}$ coded bits taken from the DVB-RCS standard [21, 22]. The frame sizes used correspond to the ATM and MPEG frame sizes, with $K = \{53, 188\}$ information bytes, respectively. Note that these turbo codes have sizes which are not particularly well suited to clustering: a size of $N = 864$ would have been preferable for cluster size $p = 8$ to ensure a proper clustering of each part of the turbo-code parity-check matrix corresponding to each component code, but we wanted to
Figure 4: Three different binary representations of the same rate R = 1/3 turbo code TC(23, 35): (a) the natural representation (see (6)), with column blocks for the information bits, the two redundancy parts, and the interleaved information; (b) the representation corresponding to the preprocessing P_c1 (alternating bits for the first component code); (c) the representation corresponding to the preprocessing P_c2.
Figure 5: Frame error rate versus E_b/N_0 (dB) of the group BP decoding algorithm with different preprocessing functions (no preprocessing, P_1, and P_2) for the (R = 0.5, N = 848) duo-binary turbo code, compared with the turbo decoder used in the DVB standard and with binary BP.
keep the original interleaver and frame size from the standard [21, 22]. These codes have been terminated using tail biting, and their minimum distances are $d_{\min} = \{18, 19\}$. For both duo-binary turbo codes, $\mathbf{H}^T(D)$ is given by
$$\mathbf{H}^T(D) = \begin{pmatrix} 1 + D^2 + D^3 \\ 1 + D + D^2 + D^3 \\ 1 + D + D^3 \end{pmatrix}. \qquad (12)$$
In the following, we consider the additive white Gaussian noise (AWGN) channel for our simulations. For this channel, we compare the group decoder performance with various preprocessing functions, a clustering size of $p = 8$, and a floating-point implementation of the group BP decoder using shuffle scheduling [23]. As a reference, we simulated the turbo decoder based on MAP component decoders in floating-point precision, in order to obtain the best results one can achieve with a turbo decoding strategy. The curves plotted in Figure 5 relate to the $R = 1/2$ turbo code with parameter $N = 848$ and correspond to the natural representation of the code and two preprocessings (one of type $P_{c1}$ and one of type $P_{c2}$).
To illustrate the influence of the preprocessing on the non-binary factor graph, we have counted the number of nonzero clusters and the number of full-rank clusters for the two matrices tested in this section and for the two types of preprocessing, $P_{c1}$ and $P_{c2}$. The statistics are reported in Table 1. Remember that a nonzero cluster corresponds to an edge in the Tanner graph, that a full-rank cluster corresponds to a permutation function, and that a rank-deficient cluster corresponds to a projection. We can see that the number of nonzero clusters is much lower in the case of the proposed preprocessing,
Table 1: Cluster statistics on the turbo codes from the DVB standard, with a clustering size of p = 8.

Code                          | Preproc. | Total clusters | Nonzero clusters | Full-rank clusters
Turbo R = 1/2, N = 848 bits   | P_c1     | 5618           | 506              | 26
Turbo R = 1/2, N = 848 bits   | P_c2     | 5618           | 426              | 0
Turbo R = 1/2, N = 3008 bits  | P_c1     | 70688          | 1786             | 94
Turbo R = 1/2, N = 3008 bits  | P_c2     | 70688          | 1504             | 0
but also that there are no full-rank clusters. This indicates that the preprocessing $P_{c2}$ has concentrated the ones of the parity-check matrix $\mathbf{H}_b$ better than $P_{c1}$. Our simulation results show that this better concentration has a direct influence on the error-correction capability of the group BP decoder.
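Statistics such as those of Table 1 can be gathered by tiling $\mathbf{H}$ into $p \times p$ clusters and computing each cluster's rank over GF(2): a cluster is an edge if its rank is positive, and a permutation if its rank equals $p$. A sketch using the standard XOR-basis rank computation (function names ours; assumes the matrix dimensions are multiples of $p$):

```python
def gf2_rank(mat):
    """Rank of a binary matrix over GF(2) via an XOR basis of rows."""
    basis = []
    for row in mat:
        v = int("".join(str(b) for b in row), 2)
        for b in basis:
            v = min(v, v ^ b)       # reduce by the current basis
        if v:
            basis.append(v)
    return len(basis)

def cluster_stats(H, p):
    """(total, nonzero, full-rank) cluster counts, as in Table 1."""
    total = nonzero = fullrank = 0
    for i in range(0, len(H), p):
        for j in range(0, len(H[0]), p):
            sub = [row[j:j + p] for row in H[i:i + p]]
            total += 1
            r = gf2_rank(sub)
            if r > 0:
                nonzero += 1
            if r == p:
                fullrank += 1
    return total, nonzero, fullrank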
All group BP simulations used a maximum of 100 iterations, but the average number of iterations is as low as 34 for frame error rates below $10^{-3}$. Simulations were run until at least 100 frames were found in error. As expected, the preprocessing of type $P_{c2}$ is far better than the other preprocessings, which is explained by the fact that the corresponding Tanner graph has fewer nonzero clusters. It can be seen that, with a good preprocessing function, a turbo code can be efficiently decoded using a BP decoder, and can even slightly beat the turbo decoder in the waterfall region. The turbo decoder remains better in the error-floor region, because the group BP decoder has many more detected errors (due to decoder failures) in this region than the turbo decoder. Although we are aware that the group BP decoder is much more complex than the turbo decoder, this result is quite encouraging, since it was long thought that turbo codes could not be decoded using an LDPC-like decoder. As a drastic example, we have plotted the very poor performance of a binary BP decoder on the binary image of the turbo code, which does not converge at all for any of the SNRs under consideration.

We also simulated the same curves for a longer code with $N = 3008$ in order to show the robustness of our approach. The results are shown in Figure 6, and the same comments as for the $N = 848$ code apply, with an even larger performance gain when using the best preprocessing function.
4. IMPROVING PERFORMANCE BY CAPITALIZING ON THE PREPROCESSING DIVERSITY
As there exists more than one way to build a non-binary Tanner graph from the same code through different preprocessing functions, the question arises of whether it is possible to improve the decoding performance by using this diversity of graph representations. In fact, we have noticed that, for the same noise realization, the group BP decoder on a specific Tanner graph can either (i) converge to the right codeword, (ii) converge to a wrong codeword (undetected error), or (iii) diverge after a fixed maximum number of iterations. If we accept some additional complexity, using several instances of iterative decoding based on several preprocessing functions and a
Figure 6: Frame error rate versus E_b/N_0 (dB) of the group BP decoding algorithm with different preprocessing functions (no preprocessing, P_1, and P_2) for the (R = 0.5, N = 3008) duo-binary turbo code, compared with the turbo decoder used in the DVB standard and with binary BP.
proper results-merging strategy is likely to improve the error-correction performance.

In this paper, we do not address the problem of finding a good set of preprocessing functions; we restrict ourselves to $N_d = 5$ different Tanner graphs obtained with preprocessing functions of type $P_{c2}$. There are various possible methods for merging the outputs of the decoders, with associated performance-complexity tradeoffs. Aside from the two natural merging strategies described below, one can think of more elaborate choices.
Serial merging

The $N_d$ decoders are potentially used sequentially. Assuming that we check the value of the syndrome at each iteration, when a decoder fails to converge to either the right codeword or a wrong codeword after a given number of iterations, we switch to another decoder: another Tanner graph is computed with a different preprocessing, and we restart the decoder from scratch with the new graph and the permuted likelihoods. The process stops when one decoder converges to a codeword (whether or not it is the codeword that was sent).
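Serial merging can be sketched as a simple fallback loop; the decoder interface used here (a callable returning a candidate word and a convergence flag) is hypothetical:

```python
def serial_merging(decoders, channel_llr, max_iter=100):
    """Try each decoder in turn; stop at the first that converges to
    a codeword (zero syndrome), as signalled by its convergence flag.

    `decoders` is a list of callables decode(llr, max_iter) returning
    (candidate_word, converged) - a hypothetical interface.
    """
    for decode in decoders:
        word, converged = decode(channel_llr, max_iter)
        if converged:
            return word          # may still be a wrong codeword
    return None                  # all N_d decoders failed
```

Note that, consistent with the description above, the loop cannot tell a right codeword from a wrong one; it only detects convergence.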
Parallel merging

The $N_d$ decoders are used in parallel, and a maximum-likelihood (ML) decision is taken among those that have converged to a codeword. If $n_b$, with $n_b \le N_d$, is the number of decoders that have converged to a codeword within the maximum number of iterations, then the $n_b$ associated likelihoods are computed and the codeword with the maximum likelihood is selected. Note that the $n_b$ candidate codewords are not necessarily distinct.
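Parallel merging can be sketched the same way, with the ML decision implemented as a discrepancy metric on the channel LLRs (the decoder interface and the sign convention, LLR > 0 meaning bit 0, are our own assumptions):

```python
def parallel_merging(decoders, channel_llr, max_iter=100):
    """Run every decoder; among those that converged, keep the
    candidate with the highest likelihood.  Here likelihood is
    ranked by the sum of |LLR| over bits that disagree with the
    channel hard decision (lower is better) - one standard
    correlation-style metric, assumed for this sketch.
    """
    best, best_metric = None, None
    for decode in decoders:
        word, converged = decode(channel_llr, max_iter)
        if not converged:
            continue
        metric = sum(abs(l) for b, l in zip(word, channel_llr)
                     if (l < 0) != bool(b))     # penalty for disagreement
        if best is None or metric < best_metric:
            best, best_metric = word, metric
    return best                                  # None if nb == 0
```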
Lower bound on merging strategies

In order to study the potential of the decoder-diversity approach regardless of the merging strategy, we define the following lower bound. Among the $N_d$ decoders in the diversity set, we check whether at least one decoder converges to the right codeword. A decoder failure is declared if none of the $N_d$ decoders has converged after the maximum number of iterations. Note that this method does not exhibit any undetected errors. It is called a lower bound on merging strategies because it assumes that, whenever at least one Tanner graph converges to the right codeword, a smart procedure could select that graph. This is of course not always possible, especially if the codeword sent is not the ML codeword. The lower bound also gives a possibly tight estimate of the parallel-merging case without having to simulate all $N_d$ decoders.
The extra complexity induced by serial merging is negligible, since the other Tanner graphs are used only when the first one fails to converge; that is, at an FER of $10^{-3}$ for the first decoder, the decoder diversity is used only 0.1% of the time. Parallel merging is much more complex, since it uses $N_d$ times more computations, but one can argue that it is simpler to parallelize on a chip. We did not simulate parallel merging in this work. In the worst case, the extra latency of serial merging obviously grows linearly with the number $N_d$ of different Tanner graphs.
In Figures 7 and 8, we report simulation results on the AWGN channel for the two turbo codes studied in the previous section. Of course, the results with no diversity are similar to those observed in Figures 5 and 6 for the preprocessing of type $P_{c2}$, and we do not plot them in the new figures. If we focus on the maximum performance gain one can hope for, by looking at the lower-bound curves, it is clear that using several decoders can significantly improve the performance, in both the waterfall and error-floor regions. For the small code as well as for the longer code, group BP decoding with decoder diversity gains between 0.25 dB and 0.4 dB over the turbo decoder using MAP component decoders, which until now was considered the best decoder proposed for turbo codes. This result shows in particular that it is possible to build iterative decoders which are more powerful, and therefore closer to the maximum-likelihood decoder, than the classical turbo decoder.

Interestingly, serial merging, which is the most obvious merging strategy and also requires the least additional complexity, achieves the full decoder-diversity gain in the waterfall region, that is, above FER = $10^{-3}$. This is particularly useful for wireless standards which use ARQ-based transmission and, therefore, hardly require error correction below FER = $10^{-3}$. In the error-floor region, though, we can see in both Figures 7 and 8 that more elaborate merging solutions should be used to achieve the full diversity gain and obtain a substantial gain compared with the turbo decoder. Note, however, that with serial merging
Figure 7: Frame error rate versus E_b/N_0 (dB) when decoder diversity (5 group decoders, serial merging and lower bound) is applied to the (R = 0.5, N = 848) duo-binary turbo code, compared with the turbo decoder.
Figure 8: Frame error rate versus E_b/N_0 (dB) when decoder diversity (5 group decoders, serial merging and lower bound) is applied to the (R = 0.5, N = 3008) duo-binary turbo code, compared with the turbo decoder.
and for the N = 3008 turbo code, the results are better than those of the turbo decoder at all SNRs, even in the error-floor region.
5. CONCLUSION

In this paper, we have proposed a new approach to efficiently decode turbo codes using a non-binary belief-propagation decoder. It has been shown that this generic method is fully efficient if a preprocessing step on the parity-check matrix of the code is added to the decoding process, in order to ensure good topological properties of the Tanner graph and thus good iterative decoding performance. Using this extended representation, the proposed algorithm exhibits very good performance in both the waterfall and error-floor regions when compared to a classical turbo decoder. Moreover, using the inherent diversity induced by the existence of several concurrent extended Tanner graph representations, we show that the performance can be further improved, and we introduce the concept of decoder diversity. This study shows that this decoding strategy (i.e., the joint use of preprocessing, group BP, and diversity decoding) is a key step toward considering LDPC and turbo codes within a unified framework from the decoder point of view.
REFERENCES
[1] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding
of linear codes for minimizing symbol error rate,” IEEE
Transactions on Information Theory, vol. 20, no. 2, pp. 284–
287, 1974.
[2] R. G. Gallager, Low-Density Parity-Check Codes, no. 21 in Research Monograph Series, MIT Press, Cambridge, Mass, USA, 1963.
[3] IEEE 802.16-2004, "IEEE standard for local and metropolitan area networks, air interface for fixed broadband wireless access systems," October 2004.
[4] IEEE 802.16e, February 2006, IEEE standard for local and metropolitan area networks, air interface for fixed broadband wireless access systems, amendment 2: physical and medium access control layers for combined fixed and mobile operation in licensed bands and corrigendum 1.
[5] F. R. Kschischang and B. J. Frey, “Iterative decoding of
compound codes by probability propagation in graphical
models,” IEEE Journal on Selected Areas in Communications,
vol. 16, no. 2, pp. 219–230, 1998.
[6] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's "belief propagation" algorithm,"
IEEE Journal on Selected Areas in Communications, vol. 16, no.
2, pp. 140–152, 1998.
[7] L. Zhu, J. Wang, and S. Yang, “Factor graphs based iterative
decoding of turbo codes,” in Proceedings of the IEEE Inter
national Conference on Communications, Circuits and Systems
and West Sino Expositions (ICCCAS & WeSino Expo ’02), vol.
1, pp. 46–50, Chengdu, China, June-July 2002.
[8] R. Tanner, “A recursive approach to low complexity codes,”
IEEE Transactions on Information Theory, vol. 27, no. 5, pp.
533–547, 1981.
[9] K. Engdahl and K. Sh. Zigangirov, “On the statistical theory of
turbo codes,” in Proceedings of the 6th International Workshop
on Algebraic and Combinatorial Coding Theory (ACCT’98), pp.
108–111, Pskov, Russia, September 1998.
[10] A. J. Felström and K. Sh. Zigangirov, "Time-varying periodic convolutional codes with low-density parity-check matrix,"
IEEE Transactions on Information Theory, vol. 45, no. 6, pp.
2181–2191, 1999.
[11] A. Goupil, M. Colas, G. Gelle, and D. Declercq, "FFT-based BP
decoding of general LDPC codes over Abelian groups,” IEEE
Transactions on Communications, vol. 55, no. 4, pp. 644–649,
2007.
[12] R. Johannesson and K. Sh. Zigangirov, Fundamentals of Convolutional Coding, Digital and Mobile Communication series, chapter 12, IEEE Press, New York, NY, USA, 1999.
[13] P. Stahl, J. B. Anderson, and R. Johannesson, “A note on tail
biting codes and their feedback encoders,” IEEE Transactions
on Information Theory, vol. 48, no. 2, pp. 529–534, 2002.
[14] C. Weiss, C. Bettstetter, and S. Riedel, “Code construction
and decoding of parallel concatenated tailbiting codes,” IEEE
Transactions on Information Theory, vol. 47, no. 1, pp. 366–
386, 2001.
[15] H. Ma and J. Wolf, "Binary unequal error-protection block codes formed from convolutional codes by generalized tail biting," IEEE Transactions on Information Theory, vol. 32, no.
6, pp. 776–786, 1986.
[16] S. Lin and D. J. Costello, Error Control Coding, Prentice-Hall,
Englewood Cliﬀs, NJ, USA, 2nd edition, 2004.
[17] O. Pothier, Compound codes based on graphs and their iterative
decoding, Ph.D. thesis, ENST, Paris, France, January 2000.
[18] R. E. Blahut, Algebraic Codes for Data Transmission, Cam
bridge University Press, Cambridge, UK, 2003.
[19] D. Declercq and M. Fossorier, “Decoding algorithms for
nonbinary LDPC codes over GF(q),” IEEE Transactions on
Communications, vol. 55, no. 4, pp. 633–643, 2007.
[20] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard,
“Low complexity, low memory EMS algorithm for nonbinary
LDPC codes,” in Proceedings of the IEEE International Confer
ence on Communications (ICC’07), pp. 671–676, Glasgow, UK,
June 2007.
[21] C. Douillard and C. Berrou, "Turbo codes with rate-m/(m+1) constituent convolutional codes," IEEE Transactions on Communications, vol. 53, no. 10, pp. 1630–1638, 2005.
[22] Digital Video Broadcasting (DVB), “Interaction channel for
satellite distribution systems,” 2000, ETSI EN 301 790, v 1.2.2.
[23] J. Zhang and M. Fossorier, "Shuffled iterative decoding," IEEE
Transactions on Communications, vol. 53, no. 2, pp. 209–213,
2005.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 624542, 7 pages
doi:10.1155/2008/624542
Research Article
Space-Time Convolutional Codes over Finite Fields and Rings for Systems with Large Diversity Order
Mario de Noronha-Neto¹ and B. F. Uchôa-Filho²
¹ Telecommunications Systems Research and Development Group, Federal Center of Technological Education of Santa Catarina, 88103-310 São José, SC, Brazil
² Communications Research Group (GPqCom), Department of Electrical Engineering, Federal University of Santa Catarina, 88040-900 Florianópolis, SC, Brazil
Correspondence should be addressed to B. F. Uchôa-Filho, bart.uchoa@gmail.com
Received 26 October 2007; Revised 18 March 2008; Accepted 6 May 2008
Recommended by Yonghui Li
We propose a convolutional encoder over the finite ring of integers modulo p^k, Z_{p^k}, where p is a prime number and k is any positive integer, to generate a space-time convolutional code (STCC). Under this structure, we prove three properties related to the generator matrix of the convolutional code that can be used to simplify the code search procedure for STCCs over Z_{p^k}. Some STCCs of large diversity order (≥ 4), designed under the trace criterion for n = 2, 3, and 4 transmit antennas, are presented for various PSK signal constellations.
Copyright © 2008 M. de Noronha-Neto and B. F. Uchôa-Filho. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Since the discovery of space-time trellis codes (STTCs) by Tarokh et al. [1], much research has been done in this area. Some authors [2–5] have concentrated their efforts on generating STTCs through an encoding structure wherein the inputs are binary symbols, the encoding operations are realised modulo 2^k, where k is any positive integer, and the 2^k-ary outputs are matched to a 2^k-ary signal constellation. Although this encoding structure facilitates the code search procedure, the search becomes prohibitively complex as the number of transmit antennas, states, or modulation size increases.
In order to simplify the design of STTCs, Abdool-Rassool et al. [6] have proven two theorems that allow one to significantly reduce the computational effort of the code search. In [7], utilising an alternative structure, the authors have considered STTCs generated by a convolutional encoder over the Galois field GF(p) ≡ Z_p, p a prime, where the information symbols, the convolutional encoder tap gains, and the output symbols are elements of Z_p, allowing for a spectral efficiency of log_2(p) b/s/Hz. These codes are referred to as space-time convolutional codes (STCCs). Using the structure proposed in [7], Hong and Chung [8] and Noronha-Neto and Uchôa-Filho [9] have presented some new STCCs over GF(p) for two transmit antennas.
The design of good STTCs is based on the well-known rank and determinant criteria [1] or the trace criterion [2, 3], depending on the system's diversity order. If the diversity order is greater than or equal to 4, the trace criterion should be used in place of the determinant criterion, while the rank criterion may be relaxed.
In this paper, utilising a nonsystematic feedforward convolutional encoder over the finite ring of integers modulo p^k, Z_{p^k}, and inspired by the results in [6], we prove three properties related to the generator matrix of convolutional codes over Z_{p^k} that can simplify the code search procedure for STCCs over Z_{p^k}. Essentially, the properties establish equivalences among STCCs so that many convolutional codes can be discarded in the code search without losing anything. Herein we focus on systems with large diversity order, so only STCCs designed under the trace criterion are considered. By exploiting the structure of the convolutional encoder over Z_{p^k} and the simplifications provided by the properties, we obtain some good STCCs over finite fields (k = 1) and rings based on the trace criterion for 3-, 4-, 5-, 7-, 8-, and 9-PSK modulations, n = 2, 3, 4 transmit antennas, and encoder memories 1, 2, and 3.
[Figure: block diagram of the encoder. The input u_t feeds a shift register holding u_{t-1}, u_{t-2}, ..., u_{t-K}; the multiplier Ψ precedes the Kth memory element; tap gains g_{x,i} and modulo-p^k adders form the outputs v_t^1, v_t^2, ..., v_t^n.]
Figure 1: Rate 1/n convolutional encoder over Z_{p^k} with memory order K. The multiplier Ψ controls the number of encoder states.
We should mention the important work of Carrasco and Pereira [10], which considers nonbinary space-time convolutional codes. There are significant differences between the present work and [10]. First, Carrasco and Pereira considered a systematic feedback convolutional encoder, which has approximately the same number of nonbinary coefficients as the encoder in nonsystematic feedforward form proposed in this paper. However, our structure gives rise to the three properties mentioned above for code equivalences, by which we can reduce the code search effort. Another important difference between our work and the work of Carrasco and Pereira is that in [10] they consider the determinant criterion regardless of the number of receive antennas.
The remainder of this paper is organised as follows. In Section 2, we describe the proposed space-time coded system based on convolutional codes over Z_{p^k} and present the design criteria for obtaining good STCCs. In Section 3, we prove the three properties mentioned above and present guidelines for finding good STCCs over Z_{p^k}. The new STCCs found with the code search are tabulated in Section 4. Also provided in that section is the frame error rate (FER), obtained from computer simulations, for some of the new STCCs. Comparison results with existing STTCs are also given. Finally, in Section 5, we conclude the paper and make some final comments.
Throughout this paper, the conjugate, transpose, and Hermitian (conjugate transpose) of a matrix/vector A are denoted by A^*, A^T, and A^H, respectively.
2. THE SPACE-TIME CONVOLUTIONALLY CODED SYSTEM AND DESIGN CRITERIA
We consider a space-time coded system employing n transmit antennas and m receive antennas. In the transmitter, at each discrete time t, a Z_{p^k}-valued information symbol u_t is encoded by a rate 1/n convolutional encoder over Z_{p^k} with encoder memory K, shown in Figure 1. The encoder output at time t is a block of n coded symbols over Z_{p^k}, (v_t^1, v_t^2, ..., v_t^n), where

v_t^i ≡ [ Σ_{x=0}^{K} u_{t-x} g_{x,i} ] mod p^k,   (1)
for i = 1, ..., n. The encoder tap gain associated with transmit antenna i and memory depth x is denoted by g_{x,i}. The coded symbols are mapped into a complex p^k-PSK signal constellation and transmitted simultaneously via the n transmit antennas. A complex codeword c of length l of the space-time code is a sequence of blocks

c = ((c_t^1, c_t^2, ..., c_t^n)) = ((e^{j(2π/p^k)v_t^1}, e^{j(2π/p^k)v_t^2}, ..., e^{j(2π/p^k)v_t^n}))   (2)

for t = 1, 2, ..., l, where c_t^i is the signal transmitted from the ith antenna at time t. The set of all codewords c is called the STCC, and is denoted by C.
Note that in Figure 1 there is a multiplier Ψ between the (K − 1)th and the Kth memory depths. This multiplier, a positive integer that divides p^k, has the purpose of controlling the number of encoder states. A similar structure has been adopted for the Gaussian channel by Massey and Mittelholzer in [11]. For Ψ = 1, the number of encoder states is p^{kK}. But for Ψ > 1 the number of encoder states is reduced, due to the ring property that the product of two nonzero ring elements may be zero, which reduces the range of possible integer values that can be stored in the Kth encoder memory. We set the value of this multiplier to p^{k-z}, where z = 1, 2, ..., k − 1, to obtain encoders with an intermediate number of states between powers of p^k. The number of encoder states becomes p^{kK}/Ψ. For example, the encoders over Z_4 with (K = 1, Ψ = 1), (K = 2, Ψ = 2), and (K = 2, Ψ = 1) have 4, 8, and 16 states, respectively.
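The state-count rule p^{kK}/Ψ can be checked with a few lines (our illustration; the assertions reproduce the Z_4 examples above):

```python
def num_states(p, k, K, psi):
    """Number of encoder states p^{kK} / Psi; choosing Psi = p^{k-z}
    shrinks the range of values storable in the K-th memory element."""
    assert p ** (k * K) % psi == 0, "Psi must divide p^{kK}"
    return p ** (k * K) // psi

# Encoders over Z_4 (p = 2, k = 2) cited in the text:
assert num_states(2, 2, K=1, psi=1) == 4
assert num_states(2, 2, K=2, psi=2) == 8
assert num_states(2, 2, K=2, psi=1) == 16
```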
In the space-time coded system, the signal received by the jth antenna at time t, d_t^j, is given by

d_t^j = Σ_{i=1}^{n} α_{i,j} c_t^i √E_s + η_t^j,   (3)

where E_s is the average energy of the transmitted signal, η_t^j is a zero-mean complex white Gaussian noise with variance N_0/2 per dimension, and α_{i,j} denotes the flat fading coefficient of the channel from the ith transmit antenna to the jth receive antenna. Under the Rayleigh fading assumption, the α_{i,j}, for 1 ≤ i ≤ n and 1 ≤ j ≤ m, are modelled as independent samples of a zero-mean complex Gaussian random process with variance 0.5 per dimension.
In practice, to achieve independent fading, the antennas must be physically separated by a distance on the order of a few wavelengths. For the quasi-static, flat-fading channel, it is assumed that the fading coefficients remain constant during a frame and change independently from one frame to another.
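A minimal simulation sketch of the quasi-static channel model (3) follows (our code, not the authors'; it assumes the √E_s amplitude scaling shown in (3)):

```python
import numpy as np

def quasi_static_channel(C, m, Es, N0, rng):
    """Apply (3) to a frame: d_t^j = sum_i alpha_{i,j} c_t^i sqrt(Es) + eta_t^j.
    C has shape (l, n) with row t = (c_t^1, ..., c_t^n). The fading matrix
    alpha is drawn once per frame (quasi-static), variance 0.5 per dimension."""
    l, n = C.shape
    alpha = (rng.normal(0.0, np.sqrt(0.5), (n, m))
             + 1j * rng.normal(0.0, np.sqrt(0.5), (n, m)))
    noise = (rng.normal(0.0, np.sqrt(N0 / 2), (l, m))
             + 1j * rng.normal(0.0, np.sqrt(N0 / 2), (l, m)))
    return np.sqrt(Es) * (C @ alpha) + noise

rng = np.random.default_rng(0)
frame = np.ones((130, 3), dtype=complex)     # l = 130 symbols, n = 3 antennas
D = quasi_static_channel(frame, m=2, Es=1.0, N0=0.5, rng=rng)   # shape (130, 2)
```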
Also, we assume that the receiver perfectly knows the channel state information and that the Viterbi algorithm with the Euclidean metric is used in the decoder. Under these conditions, and for high signal-to-noise ratio (SNR), Tarokh et al. [1] have shown that the pairwise error probability is upper-bounded by

P(c → e) ≤ ( Π_{i=1}^{r} λ_i )^{-m} ( E_s/4N_0 )^{-rm},   (4)
where r is the rank of the difference matrix of complex codewords (arranged as a matrix):

B(c, e) ≜
⎡ e_1^1 − c_1^1   e_2^1 − c_2^1   ···   e_l^1 − c_l^1 ⎤
⎢ e_1^2 − c_1^2   e_2^2 − c_2^2   ···   e_l^2 − c_l^2 ⎥
⎢       ⋮               ⋮          ⋱         ⋮       ⎥
⎣ e_1^n − c_1^n   e_2^n − c_2^n   ···   e_l^n − c_l^n ⎦ ,   (5)
and λ_i, for i = 1, ..., r, are the nonzero eigenvalues of A(c, e) ≜ B(c, e)B(c, e)^H. To minimise P(c → e) in (4), we should maximise the minimum rank r of the matrix B(c, e) over all pairs of distinct complex codewords (rank criterion), and maximise the minimum geometric mean (η_det) of the nonzero eigenvalues of the matrix A(c, e) over all pairs of distinct complex codewords with minimum rank (determinant criterion).
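The rank and determinant design metrics can be computed directly from B(c, e); the sketch below is our illustration of the definitions, not code from the paper:

```python
import numpy as np

def rank_and_det_metric(C, E, tol=1e-9):
    """Return the rank r of B(c, e) = E - C and the geometric mean eta_det
    of the nonzero eigenvalues of A(c, e) = B B^H (rows = antennas,
    columns = time), as used by the rank and determinant criteria."""
    B = E - C
    A = B @ B.conj().T                     # Hermitian, positive semidefinite
    lam = np.linalg.eigvalsh(A)
    nz = lam[lam > tol]                    # nonzero eigenvalues lambda_i
    r = len(nz)
    eta_det = float(nz.prod()) ** (1.0 / r) if r else 0.0
    return r, eta_det

# Toy pair of 2-antenna, length-3 codewords, giving A(c, e) = 4*I
C0 = np.zeros((2, 3), dtype=complex)
E0 = np.array([[2, 0, 0], [0, 2, 0]], dtype=complex)
r, eta_det = rank_and_det_metric(C0, E0)   # r = 2, eta_det = 4
```

In a code search, these two metrics would be minimised over all pairs of distinct codewords and the code with the largest minima retained.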
As shown by Chen et al. [2, 3], the rank and the determinant criteria should be adopted whenever rm < 4. If rm ≥ 4, they have shown that the pairwise error probability is upper-bounded by

P(c → e) ≤ (1/4) exp( −m (E_s/4N_0) Σ_{i=1}^{n} Σ_{j=1}^{l} |e_j^i − c_j^i|^2 ),   (6)

which indicates that to minimise P(c → e) we should maximise the minimum squared Euclidean distance over all pairs of distinct complex codewords (trace criterion). It should be noted that the squared Euclidean distance between c and e is equal to the trace of A(c, e), denoted by η_tr. In this paper, we consider only systems with rm ≥ 4, but r need not be equal to n.
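The identity trace(A(c, e)) = Σ|e − c|^2, on which the trace criterion rests, is easy to confirm numerically (our sketch):

```python
import numpy as np

def eta_tr(C, E):
    """Trace of A(c, e) = B B^H with B = E - C; this equals the squared
    Euclidean distance between the complex codewords."""
    B = E - C
    return float(np.trace(B @ B.conj().T).real)

# Two 4-PSK codeword fragments (2 antennas x 2 time slots)
C0 = np.exp(2j * np.pi * np.array([[0, 1], [2, 3]]) / 4)
E0 = np.exp(2j * np.pi * np.array([[1, 1], [2, 0]]) / 4)
assert abs(eta_tr(C0, E0) - (np.abs(E0 - C0) ** 2).sum()) < 1e-12
```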
3. GUIDELINES FOR FINDING GOOD SPACE-TIME CONVOLUTIONAL CODES OVER Z_{p^k}
In this section, we prove three properties that can be used to reduce the code search procedure for STCCs over Z_{p^k}. But first, let us denote by G the n × (K + 1) scalar generator matrix of the rate 1/n convolutional encoder over Z_{p^k} of Figure 1, which is defined in this paper as

G ≜
⎡ g_{0,1}  g_{1,1}  ···  g_{K,1} ⎤
⎢ g_{0,2}  g_{1,2}  ···  g_{K,2} ⎥
⎢    ⋮        ⋮      ⋱      ⋮    ⎥
⎣ g_{0,n}  g_{1,n}  ···  g_{K,n} ⎦ .   (7)
The first property is based on a result in [6, Section 3.2] for STCCs generated by an encoder with binary input and 2^k-ary tap gains. Herein, this result is extended to the case of a convolutional encoder over Z_{p^k}.
Property 1. Consider an STCC C over Z_{p^k} generated by a generator matrix G with coefficients g_{x,i}, for x = 0, 1, ..., K and i = 1, 2, ..., n. Let C′ be the STCC over Z_{p^k} generated by the generator matrix G′ with coefficients g′_{x,i} = p^k − g_{x,i}, for x = 0, 1, ..., K and i = 1, 2, ..., n. Then, every pair of codewords c, e ∈ C is associated with a pair c′, e′ ∈ C′ such that A(c, e) = B(c, e)B(c, e)^H and A(c′, e′) = B(c′, e′)B(c′, e′)^H have the same rank, determinant, and trace. Therefore, the two STCCs C and C′ are entirely equivalent.
Proof. Consider that the output of the encoder shown in Figure 1 is as given in (1). Changing the encoder coefficients to p^k − g_{x,i} yields the following output:

v′_t^i ≡ Σ_{x=0}^{K} u_{t-x} (p^k − g_{x,i})   (mod p^k)
       ≡ Σ_{x=0}^{K} [u_{t-x} p^k − u_{t-x} g_{x,i}]   (mod p^k)
       ≡ Σ_{x=0}^{K} −u_{t-x} g_{x,i}   (mod p^k)
       ≡ −v_t^i   (mod p^k)
       ≡ p^k − v_t^i   (mod p^k).   (8)
Each element of B(c, e) is a difference of complex numbers of the form

b_{i,j} = exp(j2πv/p^k) − exp(j2πw/p^k).

The associated element of B(c′, e′) is

b′_{i,j} = exp(j2π(p^k − v)/p^k) − exp(j2π(p^k − w)/p^k)
         = exp(−j2πv/p^k) − exp(−j2πw/p^k)
         = b_{i,j}^*.   (9)
From (9), we can conclude that

A(c′, e′) = B(c′, e′)B(c′, e′)^H
          = B(c, e)^* [B(c, e)^*]^H
          = B(c, e)^* B(c, e)^T
          = [B(c, e)B(c, e)^H]^*
          = A(c, e)^*.   (10)
Table 1: New good STCCs over finite fields based on the trace criterion.

p^k  n  ϑ    G                                        rank  η_tr   η_det
3    3  3    [1 1 ; 1 2 ; 2 1]                        2     18     —
3    3  9    [1 1 1 ; 1 1 2 ; 1 2 1]                  3     27     3
3    3  27   [1 0 1 2 ; 1 1 1 1 ; 1 1 2 1]            3     33     4.32
3    4  3    [1 1 ; 1 1 ; 1 1 ; 1 2]                  2     24     —
3    4  9    [0 2 1 ; 1 1 1 ; 1 2 1 ; 2 2 1]          3     33     —
3    4  27   [2 1 2 2 ; 2 0 2 1 ; 1 1 2 2 ; 2 2 2 1]  4     45     3
5    3  5    [1 1 ; 1 2 ; 2 2]                        2     15     —
5    3  25   [1 1 1 ; 1 3 2 ; 2 3 1]                  3     21.38  1
5    4  5    [1 2 ; 1 2 ; 2 1 ; 2 1]                  2     20     —
7    3  7    [2 4 ; 3 5 ; 6 1]                        2     14     —
7    4  7    [1 1 ; 1 2 ; 2 3 ; 3 3]                  2     17.19  —
Since A(c, e) is Hermitian, A(c, e) and A(c, e)^* have the same rank, determinant, and trace.
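Property 1 can be spot-checked numerically: complementing the taps negates the coded symbols mod p^k, which conjugates B(c, e) and hence A(c, e). A small sketch (ours), using random coded outputs over Z_4:

```python
import numpy as np

q = 4                                          # p^k
psk = lambda V: np.exp(2j * np.pi * V / q)
rng = np.random.default_rng(1)
V = rng.integers(0, q, (3, 5))                 # coded outputs of c (antennas x time)
W = rng.integers(0, q, (3, 5))                 # coded outputs of e

B = psk(W) - psk(V)                            # B(c, e)
Bc = psk((q - W) % q) - psk((q - V) % q)       # outputs of the complemented code
A, Ac = B @ B.conj().T, Bc @ Bc.conj().T

assert np.allclose(Ac, A.conj())               # A(c', e') = A(c, e)^*
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ac)
assert np.isclose(np.trace(A), np.trace(Ac))
assert np.isclose(np.linalg.det(A), np.linalg.det(Ac))
```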
Note that by this property it is possible to reduce the number of STCCs to be tested by approximately one half, without any sacrifice in terms of finding the best code.
Now, we present the second property, which is also an
extension of a result in [6, Theorem 2] to the ring Z
p
k .
Property 2. Consider an STCC C over Z_{p^k} generated by a generator matrix G. Any STCC over Z_{p^k} generated by a generator matrix whose rows correspond to a permutation of the rows of G is entirely equivalent to C.
Proof. A permutation of the rows of G implies a permutation
of the encoder outputs in Figure 1, and also induces the same
permutation of the rows of B(c, e). It is easy to show that
the rank, determinant, and trace of the corresponding matrix A(c, e) are not affected by any permutation of the rows of B(c, e).
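A quick numerical check of Property 2 (our sketch): permuting the rows of B yields A′ = P A P^T, which preserves rank, determinant, and trace.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))   # stand-in for B(c, e)
P = np.eye(4)[[2, 0, 3, 1]]                                  # a row permutation
A = B @ B.conj().T
Ap = (P @ B) @ (P @ B).conj().T                              # = P A P^T

assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ap)
assert np.isclose(np.trace(A), np.trace(Ap))
assert np.isclose(np.linalg.det(A), np.linalg.det(Ap))
```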
Observe that with Property 2 it is possible to obtain a reduction of the code search space by a factor of approximately n!. In this paper, we utilised Properties 1 and 2 to reduce the code search effort under the trace criterion, but they can also be applied to the rank and the determinant criteria. The last property, presented next, applies to the trace criterion only.
Property 3. Consider an STCC over Z_{p^k} generated by a matrix G with coefficients g_{x,i}, for x = 0, 1, ..., K and i = 1, 2, ..., n. Changing the coefficients g_{x,i} of χ rows of G to p^k − g_{x,i}, where 1 ≤ χ ≤ n, does not affect the trace of the matrix A(c, e) for any pair of STCC codewords c and e.
Proof. Consider a rate R = 1/n convolutional encoder over Z_{p^k} with scalar generator matrix G. As proved in Property 1, if the coefficients g_{x,i}, where x = 0, 1, ..., K, of the ith row of the matrix G are changed to their corresponding complements modulo p^k, that is, p^k − g_{x,i}, where x = 0, 1, ..., K, then the ith row of the matrix B(c, e) changes to its complex conjugate. Since A = BB^H, the ith diagonal element a_{i,i} of the matrix A is the sum of the squared moduli of the elements of the ith row of B. Since |b_{i,j}|^2 = |b_{i,j}^*|^2, and the trace of a matrix is the sum of its diagonal elements, Property 3 is proved.
By utilising Property 3, it is possible to reduce the code search space by a factor of 2^n. Note that when all rows of G are changed to their corresponding complements modulo p^k, that is, when χ = n, this property becomes Property 1.
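Property 3 can likewise be spot-checked (our sketch): conjugating any χ rows of B leaves every diagonal entry of A = BB^H, and hence the trace, unchanged.

```python
import numpy as np

q = 8                                          # p^k
psk = lambda V: np.exp(2j * np.pi * V / q)
rng = np.random.default_rng(3)
V = rng.integers(0, q, (4, 6))
W = rng.integers(0, q, (4, 6))

B = psk(W) - psk(V)                            # B(c, e)
Bx = B.copy()
Bx[[0, 2]] = Bx[[0, 2]].conj()                 # complement chi = 2 rows of G

tr = np.trace(B @ B.conj().T).real
trx = np.trace(Bx @ Bx.conj().T).real
assert abs(tr - trx) < 1e-9                    # trace (eta_tr) is unaffected
```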
It is worth mentioning that the structure of convolutional encoders over Z_{p^k}, adopted in this paper, offers a reduced search space as compared to the structure based on binary inputs. For our structure, the number of possible codes is p^{kn(K+1)}, while for the structure with binary input (standard structure) this number is p^{k²n(K+1)}. This reduction is possible because the structure over Z_{p^k} yields a smaller number of coefficients. Of course, since we consider a smaller search space, it is possible that in some cases the standard structure will produce better codes. On the other hand, the code search based on the standard structure becomes prohibitive as the number of transmit antennas, states, or constellation size increases, and quite often only partial (nonexhaustive) search results are presented (see, e.g., [12]). The STCCs presented herein have, in many cases, the same performance parameters as the STCCs found with the standard structure for the same number of antennas and the same complexity. For the cases where the STCC is over GF(p), that is, k = 1, the structure utilised in this paper becomes the only option.
We should also mention that a computer program routine to discard the equivalent codes, according to the three properties, can be easily prepared, so the cut in the search effort is quite significant.
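The search-space counts, and the savings from the three properties, can be tabulated with a short helper (ours; the standard-structure exponent k²n(K+1) follows the count quoted above):

```python
from math import factorial

def search_space(p, k, n, K, standard=False):
    """Number of candidate generator matrices: p^{k n (K+1)} for the ring
    structure of this paper, p^{k^2 n (K+1)} for the binary-input standard
    structure."""
    e = (k * k if standard else k) * n * (K + 1)
    return p ** e

# 4-PSK (p = 2, k = 2), n = 2 transmit antennas, memory K = 1:
ring = search_space(2, 2, 2, 1)            # 2^8  = 256 candidates
std = search_space(2, 2, 2, 1, True)       # 2^16 = 65536 candidates
assert (ring, std) == (256, 65536)

# Properties 2 and 3 discard equivalent ring-structure codes, cutting the
# search by roughly a factor of n! * 2^n (Property 1 is the chi = n case):
reduction = factorial(2) * 2 ** 2          # = 8 for n = 2
```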
As a final consideration, we should note that although in this paper we utilise only PSK constellations, quadrature amplitude modulation (QAM) constellations could also be used. However, Properties 1 and 3 would not hold for QAM, and the search space reduction provided by these properties would be lost. On the other hand, Property 2 could still be used without any modification if QAM signal constellations were adopted. It is well known that QAM has better Euclidean distance properties than PSK. So, using
Table 2: New good STCCs over finite rings based on the trace criterion.

p^k  n  ϑ    G                                          rank  η_tr   η_det
4    2  4    [1 1 ; 1 2]                                2     10     2
4    2  8    [1 1 0 ; 2 1 1]                            2     12     3.46
4    2  16   [1 1 2 ; 2 1 3]                            2     16     3.46
4    2  64   [1 0 1 2 ; 1 1 2 1]                        2     18     5.29
4    3  4    [1 1 ; 1 1 ; 1 2]                          2     16     —
4    3  8    [3 3 0 ; 1 0 1 ; 1 3 1]                    2     18     —
4    3  16   [1 1 1 ; 1 2 2 ; 2 1 3]                    2     24     —
4    3  64*  [2 2 3 3 ; 1 2 1 3 ; 1 1 3 2]              3     32     2.88
4    4  4    [1 1 ; 1 1 ; 1 2 ; 1 2]                    2     20     —
4    4  8    [1 0 1 ; 1 1 0 ; 1 1 1 ; 1 3 1]            2     26     —
4    4  16   [1 1 1 ; 1 1 2 ; 1 2 2 ; 2 1 3]            3     32     —
4    4  64*  [1 3 2 3 ; 1 2 1 1 ; 2 2 1 2 ; 3 3 1 0]    4     40     2
8    2  8    [1 2 ; 4 3]                                2     7.17   1.41
8    2  16   [2 1 0 ; 3 0 1]                            2     8      2
8    2  64*  [5 1 6 ; 1 1 3]                            2     10.58  1.17
8    3  8    [1 1 ; 2 2 ; 3 4]                          2     12     —
8    4  8    [1 1 ; 1 2 ; 2 3 ; 3 4]                    2     16.52  —
9    3  9    [1 3 ; 6 4 ; 7 2]                          2     12     —
Table 3: Comparison of STCCs found with different encoder structures.

p^k  n  ϑ    η_tr [12]  η_det [12]  η_tr   η_det
4    2  4    10         2           10     2
4    2  8    12         2.82        12     3.46
4    2  16   16         2.82        16     3.46
4    2  64   18         4           18     5.29
4    3  4    16         —           16     —
4    3  8    20         —           18     —
4    3  16   24         —           24     —
4    3  64*  28         —           32     2.88
4    4  4    20         —           20     —
4    4  8    26         —           26     —
4    4  16   32         —           32     —
4    4  64*  38         —           40     2
8    2  8    7.17       1.41        7.17   1.41
8    2  16   8          0.82        8      2
8    3  8    12         —           12     —
8    4  8    16.58      —           16.58  —
the encoding structure proposed in this paper, it is possible that STCCs for a QAM constellation better than STCCs for a PSK constellation of the same size exist. However, since the demonstration of algebraic properties to reduce the code search effort constitutes an important part of this paper, QAM will not be considered herein.
4. CODE SEARCH AND SIMULATION RESULTS
In this section, we present some new STCCs generated by a rate 1/n convolutional encoder over Z_{p^k}, and show their performance on the quasi-static flat Rayleigh fading channel. Since we are considering large diversity order, the code search was based only on the trace criterion. Tables 1 and 2 show the search results for STCCs over finite fields and rings, respectively, with various p^k-PSK modulations, numbers of states (ϑ), and numbers of transmit antennas (n). In these tables, the STCCs marked with * are the result of a partial search. All other codes are optimal for the structure of Figure 1. In [12], we can find STCCs for the 4- and 8-PSK modulations. For the same number of states and number of transmit antennas, those codes in most cases have the same
[Figure: FER (10^0 down to 10^-3) versus SNR (0 to 14 dB); curves for 3-, 9-, and 27-state codes with m = 2 and m = 3.]
Figure 2: FER versus SNR for the STCCs over Z_3 for 3-PSK based on the trace criterion with n = 3, m = 2, 3, and 3, 9, and 27 states.
[Figure: FER (10^0 down to 10^-3) versus SNR (0 to 16 dB); curves for 4-, 8-, and 16-state codes with m = 2 and m = 3.]
Figure 3: FER versus SNR for STCCs over Z_4 for 4-PSK based on the trace criterion with n = 3, m = 2, 3, and 4, 8, and 16 states.
trace as the STCCs presented in Table 2. Table 3 compares STCCs for the 4- and 8-PSK modulations found with different structures. It can be seen that with the proposed structure we obtained an STCC with improved trace in two cases, a worse trace in one case, and the same trace in all other cases.
All the new STCCs in Table 1 and the STCC for 9-PSK in Table 2 have no corresponding competitors in the literature. The STCCs over GF(p) with two transmit antennas based on the trace criterion can be found in [9].
In Figures 2 and 3, we show the FER versus SNR (in decibels) curves for the STCCs over GF(3) and Z_4,
[Figure: FER (10^0 down to 10^-3) versus SNR (0 to 14 dB); curves for the 4-, 8-, and 64-state codes of [11] and the new 4-, 8-, and 64-state codes.]
Figure 4: Performance comparison of STCCs for 4-PSK obtained with different encoder structures for n = 3, m = 2, and 4, 8, and 64 states.
respectively, where we can observe the performance of the codes for different numbers of states and receive antennas. In Figure 4, we show the performance comparison of the STCCs for 4-PSK found with the encoder structure over Z_{p^k} and with the standard structure. For n = 3 transmit antennas, m = 2 receive antennas, and for 4, 8, and 64 states, we can observe that the performances of these codes are very similar, although the codes have been generated by different encoder structures and have different traces in the cases of 8 and 64 states. In all simulations presented in this section, we considered a frame length of l = 130 symbols.
5. CONCLUSION AND FINAL COMMENTS
In this paper, we have considered space-time convolutional codes over finite fields and rings for the quasi-static, flat Rayleigh fading channel. Based on this encoding structure, we proved three properties that can be used to simplify the code search based on the trace criterion. Good STCCs for n = 2, 3, 4 transmit antennas and various p^k-PSK constellations were presented. The resulting spectral efficiencies, namely, log_2(p^k) b/s/Hz, can serve a wide range of multimedia applications.
As the STCCs presented herein are designed by the trace criterion, they do not achieve the optimal diversity-multiplexing gain (DMG) tradeoff [13, 14] for systems with more than one receive antenna. Therefore, it is possible that STTCs constructed to achieve the optimum DMG tradeoff perform better than the codes in this paper, under the same spectral efficiency.
ACKNOWLEDGMENTS
This work was supported by CEFET/SC under a research grant and by the Brazilian National Council for Scientific and Technological Development (CNPq) under Grants no. 484391/2006-2 and 303938/2007-2.
REFERENCES
[1] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time
codes for high data rate wireless communication: performance
criterion and code construction,” IEEE Transactions on Infor
mation Theory, vol. 44, no. 2, pp. 744–765, 1998.
[2] Z. Chen, J. Yuan, and B. Vucetic, "Improved space-time trellis coded modulation scheme on slow Rayleigh fading channels," Electronics Letters, vol. 37, no. 7, pp. 440–441, 2001.
[3] Z. Chen, B. Vucetic, J. Yuan, and K. L. Lo, "Space-time trellis codes for 4PSK with three and four transmit antennas in quasi-static flat fading channels," IEEE Communications
Letters, vol. 6, no. 2, pp. 67–69, 2002.
[4] S. Baro, G. Bauch, and A. Hansmann, "Improved codes for space-time trellis-coded modulation," IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, 2000.
[5] R. S. Blum, "Some analytical tools for the design of space-time
convolutional codes,” IEEE Transactions on Communications,
vol. 50, no. 10, pp. 1593–1599, 2002.
[6] B. Abdool-Rassool, M. R. Nakhai, F. Heliot, L. Revelly, and H. Aghvami, "Search for space-time trellis codes: novel codes for
Rayleigh fading channels,” IEE Proceedings: Communications,
vol. 151, no. 1, pp. 25–31, 2004.
[7] M. de Noronha-Neto, R. D. Souza, and B. F. Uchôa-Filho, "Space-time convolutional codes over GF(p) achieving full 2-level diversity," in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '03), vol. 1, pp. 408–413, New Orleans, La, USA, March 2003.
[8] S. K. Hong and J.-M. Chung, "Prime valued space-time convolutional Z_w code achieving full 2-level diversity," Electronics Letters, vol. 40, no. 4, pp. 253–254, 2004.
[9] M. de Noronha-Neto and B. F. Uchôa-Filho, "Space-time convolutional codes over GF(p) for two transmit antennas," IEEE Transactions on Communications, vol. 56, no. 3, pp. 356–358, 2008.
[10] R. A. Carrasco and A. Pereira, "Space-time ring TCM codes
for QAM over fading channels,” IEE Proceedings: Communica
tions, vol. 151, no. 4, pp. 316–321, 2004.
[11] J. L. Massey and T. Mittelholzer, "Convolutional codes over rings," in Proceedings of the 4th Joint Swedish-Soviet International Workshop on Information Theory, pp. 14–18, Gotland, Sweden, August-September 1989.
[12] B. Vucetic and J. Yuan, Space-Time Coding, John Wiley & Sons, New York, NY, USA, 2003.
[13] L. Zheng and D. N. C. Tse, “Diversity and multiplexing:
a fundamental tradeoff in multiple-antenna channels," IEEE
Transactions on Information Theory, vol. 49, no. 5, pp. 1073–
1096, 2003.
[14] R. Vaze and B. S. Rajan, "On space-time trellis codes achieving optimal diversity-multiplexing tradeoff," IEEE Transactions on
Information Theory, vol. 52, no. 11, pp. 5060–5067, 2006.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 890194, 8 pages
doi:10.1155/2008/890194
Research Article
Joint Decoding of Concatenated VLEC and STTC System
Huijun Chen and Lei Cao
Department of Electrical Engineering, The University of Mississippi, University, MS 38677, USA
Correspondence should be addressed to Huijun Chen, chenhuijunapply@gmail.com
Received 1 November 2007; Revised 26 March 2008; Accepted 6 May 2008
Recommended by Jinhong Yuan
We consider the decoding of wireless communication systems with both source coding in the application layer and channel coding in the physical layer for high-performance transmission over fading channels. Variable length error correcting codes (VLECs) and space-time trellis codes (STTCs) are used to provide bandwidth-efficient data compression as well as coding and diversity gains. At the receiver, an iterative joint source and space-time decoding scheme is developed to utilize the redundancy in both the STTC and the VLEC to improve overall decoding performance. Issues such as the inseparable systematic information at the symbol level, the asymmetric trellis structure of the VLEC, and information exchange between the bit and symbol domains have been considered in the maximum a posteriori probability (MAP) decoding algorithm. Simulation results indicate that the developed joint decoding scheme achieves a significant decoding gain over separate decoding in fading channels, whether or not the channel information is perfectly known at the receiver. Furthermore, how rate allocation between the STTC and the VLEC affects the performance of the joint source and space-time decoder is investigated. Different systems with a fixed overall information rate are studied. It is shown that for a system with more redundancy dedicated to the source code and a higher-order modulation of the STTC, the joint decoding yields better performance, though with increased complexity.
Copyright © 2008 H. Chen and L. Cao. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Providing multimedia services has become an attractive application of wireless communication systems. Due to bandwidth limitations and harsh wireless channel conditions, reliable source transmission over wireless channels remains a challenging problem. Space time codes and variable length source codes are two key enabling techniques in the physical and application layers, respectively.
Tarokh et al. introduced space time trellis codes (STTCs) [1] for multiple-input multiple-output (MIMO) systems, which obtain a bandwidth efficiency four times higher than that of diversity systems without space time coding. While these STTCs are designed to achieve the maximum diversity in the space dimension, the coding gain in the time dimension may still be improved.
Variable length error correcting codes (VLECs) [2] are a family of error correcting codes used in source coding. A VLEC maps source symbols to codewords of variable length according to the source statistics. Compared with a Huffman code, which aims for high compression efficiency, a VLEC has inherent redundancy and some error-resilience capability. However, a VLEC is still sensitive to channel errors, and a single bit error may cause a run of source symbol partition errors due to the well-known synchronization problem.
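The synchronization problem can be illustrated with a small sketch. The snippet below uses the Huffman codebook from Table 1; the flipped bit position is an arbitrary illustrative choice. A single bit error desynchronizes the greedy prefix decoder and corrupts several subsequent symbols.

```python
# Prefix codebook for the 5-ary source of Table 1 (Huffman column).
CODE = {0: "00", 1: "11", 2: "10", 3: "010", 4: "011"}

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Greedy prefix decoding; emits a symbol whenever a codeword completes."""
    inv = {v: k for k, v in CODE.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return out

stream = encode([0, 1, 2, 3, 4])          # "001110010011"
corrupt = stream[:2] + ("0" if stream[2] == "1" else "1") + stream[3:]
print(decode(stream))    # [0, 1, 2, 3, 4]
print(decode(corrupt))   # [0, 4, 0, 2, 4] -- one bit error, three symbol errors
```

Note that the corrupted stream still parses cleanly into valid codewords, so the errors are invisible to the decoder itself.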
Shannon's classical separation theorem states that we can optimize the system by designing optimal source and channel codes separately. However, this theorem holds only for infinite packet sizes. Therefore, under delay and computation resource constraints, joint optimization of source and channel coding or decoding often yields better performance in realistic systems. Joint source channel decoding (JSCD) basically focuses on using the redundancy in the source-coded stream to improve the overall decoding performance. Constrained JSCD (CJSCD) is discussed in [3, 4], in which the output of the channel decoder is modeled as the output of a binary symmetric channel (BSC) and the source decoder exploits the statistical characteristics of the BSC as a constraint in the maximum a posteriori probability (MAP) algorithm. Integrated JSCD (IJSCD), proposed in [5, 6], merges the trellises of the source code and the channel code into one integrated trellis and carries out MAP decoding on the combined trellis. The drawback of IJSCD is that the decoding complexity increases dramatically with the number of states in the combined trellis. More recently, iterative JSCD [7, 8] adopts an iterative decoding structure with information exchange between the source decoder and the channel decoder. It has attracted increasing attention because of its relatively low decoding complexity.
Joint decoding schemes with space time components have also been considered recently. A concatenation of multilevel codes, trellis coded modulation (TCM), and STTC is proposed in [9] to provide unequal error protection for MPEG-4 streams. Variable length space time coded modulation (VL-STCM) is proposed in [10, 11] by concatenating VLC and BLAST in MIMO systems. An iterative detection structure is proposed in [12] for a concatenated system with a reversible variable length code (RVLC), TCM, and a diagonal block space time trellis code (DBSTTC). In this paper, we consider another type of system, in which recursive STTCs (Rec-STTCs) with full transmit diversity gain and some coding gain are concatenated with source VLECs. For this type of system, we design an iterative decoding scheme to fully utilize the redundancy in both the source code and the space time code. Modifications of the MAP decoding algorithms and the information exchange between the symbol and bit domains of the two component decoders are addressed. The iterative decoding is evaluated in both quasi static and rapid fading channels, with either perfect channel information or channel estimation errors. The results show significant decoding gain over noniterative decoding in the tested cases. Furthermore, we study the rate allocation issue: how to allocate the redundancy between the STTC and the VLEC for better decoding performance under overall bandwidth and transmission power constraints. We find that, at the cost of increased decoding complexity, the joint decoding performance can be improved by introducing more redundancy into the source code while using a higher-order modulation in the STTC.
The rest of the paper is organized as follows. The concatenation structure of VLEC and STTC is described in Section 2. The joint source and space time decoding algorithm is discussed in detail in Section 3. Performance with perfect channel estimation is provided in Section 4. Performance in the presence of channel estimation errors is presented in Section 5. The rate allocation issue is then investigated in Section 6. Finally, conclusions are drawn in Section 7.
2. SYSTEM WITH VLEC AND STTC
The encoder block diagram is depicted in Figure 1. We assume a_i, i = 0, 1, ..., K − 1, is one packet of digital source symbols, drawn from a finite alphabet set {0, 1, ..., N − 1}, where K is the packet length and N is the source alphabet size. The VLEC encoder maps each source symbol to a variable length codeword at a code rate R_VLEC = H/l, where l is the average VLEC codeword length and H is the entropy of the source. The generated bit sequence is b_j, j = 0, 1, ..., L − 1. A bit interleaver is inserted before the STTC for time diversity; in this paper, we use a random interleaver.
When 2^p-ary modulation is used, the bit stream is grouped every p bits and converted to a symbol stream c_t, t = 0, 1, ..., L/p − 1, as the input to the STTC encoder. The output of the STTC is N_T modulated symbol sequences
d_t^i, i = 0, ..., N_T − 1; t = 0, 1, ..., M − 1 (M = L/p), which are sent over the radio channel through the N_T transmit antennas. The overall effective information rate is pH/l bit/s/Hz.

Figure 1: Serial concatenation of VLEC and STTC.

Table 1: Examples of VLEC [8].

Symbol | Probability | Huffman | C1   | C2
0      | 0.33        | 00      | 00   | 11
1      | 0.3         | 11      | 11   | 001
2      | 0.18        | 10      | 101  | 0100
3      | 0.1         | 010     | 010  | 0101100
4      | 0.09        | 011     | 0110 | 0001010
E[l]   | (H = 2.14)  | 2.19    | 2.46 | 3.61
d_f    |             | 1       | 2    | 3
Suppose there are N_R antennas at the receiver. At time t, the signal on the jth receive antenna is

r_t^j = Σ_{i=0}^{N_T−1} √(E_s) f_t^{i,j} d_t^i + η_t^j,   (1)

where i = 0, ..., N_T − 1; j = 0, ..., N_R − 1; E_s is the average power of the transmitted signal; and f_t^{i,j} is the path gain between the ith transmit antenna and the jth receive antenna at time t. We consider two fading cases: the quasi static fading channel (also referred to as block fading), in which the path gain stays constant over one packet, and the rapid fading channel, in which the path gain changes from one symbol to the next. η_t^j is the additive complex white Gaussian noise on the jth receive antenna at time t, with zero mean and variance N_0/2 per dimension.
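As a concrete reading of (1), the sketch below computes the received samples at one time instant. The antenna counts, path gains, and symbols are illustrative values, not parameters from the paper's simulations.

```python
import random

def received(f, d, Es=1.0, noise_std=0.0, rng=None):
    """Eq. (1) at one time instant: r^j = sum_i sqrt(Es) * f[i][j] * d[i] + eta^j.
    f[i][j] is the path gain from transmit antenna i to receive antenna j,
    d[i] the modulated symbol on antenna i, eta complex Gaussian noise."""
    rng = rng or random.Random(0)
    n_t, n_r = len(d), len(f[0])
    r = []
    for j in range(n_r):
        signal = sum((Es ** 0.5) * f[i][j] * d[i] for i in range(n_t))
        eta = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        r.append(signal + eta)
    return r

# Noiseless check with N_T = N_R = 2 and identity path gains:
print(received([[1, 0], [0, 1]], [1 + 0j, 1j]))  # [(1+0j), 1j]
```

With identity gains and no noise, each receive antenna simply sees the symbol of the corresponding transmit antenna, which makes the superposition in (1) easy to verify.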
2.1. Variable length error correcting code
In [2], Buttigieg introduced the variable length error correcting code (VLEC). It is similar to a block error correcting code in that each source symbol is mapped to a codeword, but the codewords have different lengths: the more frequent symbols are assigned shorter codewords. The codewords are designed so that a minimum free distance is guaranteed. With a larger free distance, a VLEC has stronger error-resilience capability; in the meantime, however, more redundancy is introduced and the average length per symbol increases, which reduces the overall effective information rate. Table 1 gives examples of a Huffman code and two VLECs for the same source with different free distances, taken from [8].
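The trade-off shown in Table 1 can be checked numerically. The sketch below recomputes the source entropy H and the average codeword lengths E[l] from the listed probabilities and codeword lengths.

```python
import math

# Probabilities and codeword lengths from Table 1.
probs = [0.33, 0.3, 0.18, 0.1, 0.09]
lengths = {"Huffman": [2, 2, 2, 3, 3],
           "C1": [2, 2, 3, 3, 4],
           "C2": [2, 3, 4, 7, 7]}

H = -sum(p * math.log2(p) for p in probs)   # source entropy, bits/symbol
avg = {name: round(sum(p * l for p, l in zip(probs, L)), 2)
       for name, L in lengths.items()}
print(round(H, 2), avg)   # 2.14 {'Huffman': 2.19, 'C1': 2.46, 'C2': 3.61}
```

The Huffman code sits closest to the entropy, while C1 and C2 trade average length (and hence rate H/l) for free distance.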
Since a bit-based trellis representation was proposed for VLEC [13], the MAP decoding algorithm can also be adopted for bit-level VLEC decoding. Figure 2 gives the tree structure and the bit-level trellis representation of VLEC C1. Each interior node on the VLEC coding tree is represented by "I_i". The root node and the leaf nodes can be classified as terminal nodes and are denoted by the "T" states in the trellis. The branches in the trellis describe the state transitions at any bit instant along the source-coded sequence.

Figure 2: Example of VLEC tree structure and bit-level trellis [7].
2.2. Recursive space time trellis code
The recursive nature of the component encoders is critical to the excellent decoding performance of turbo codes. General rules for designing parallel and serially concatenated convolutional codes have been presented in [14, 15]; in both cases, a recursive convolutional code is required.
In [16], Tujkovic proposed recursive space time trellis codes (Rec-STTCs) with full diversity gain for parallel concatenated space time codes. Figure 3 gives examples of the Rec-STTCs in [16] for two transmit antennas. The upper part is a 4-state, QPSK modulated Rec-STTC (ST1) with bandwidth efficiency 2 bit/s/Hz, and the lower part is an 8-state, 8PSK modulated Rec-STTC (ST2) with bandwidth efficiency 3 bit/s/Hz. Each line represents a transition from the current state to the next state. The numbers on the left and right sides of the slashes are the corresponding input symbol and two output symbols, respectively.
3. JOINT VLEC AND SPACE TIME DECODER
Consider the serial concatenated source and space time coding system above. Conventionally, separate decoding stops after one round of STTC decoding followed by VLEC decoding. In this paper, we utilize both the redundancy in the VLEC and the error correcting ability of the STTC in the time dimension so that each aids the other's decoding through an iterative process, and hence improve the overall decoding performance.
Figure 4 illustrates the iterative joint source and space time decoding structure. Assume that the packet has been synchronized and that the side information of the packet length in bits after the VLEC encoder is known at the receiver. The soft-input soft-output MAP algorithm [17] is used in both the VLEC and STTC decoders.
Figure 3: Trellis graphs of QPSK and 8PSK recursive STTCs.
3.1. MAP in symbol and bit domains
The MAP decoder takes the received sequences and the a priori probability sequences as soft inputs, and outputs an optimal estimate of each symbol (or bit) in the sense of maximizing its a posteriori probability. The a posteriori probability is calculated through the coding constraints represented by the trellis.
Given the received streams

r = [ r_0^0, ..., r_t^0, ... ; ... ; r_0^{N_R−1}, ..., r_t^{N_R−1}, ... ],   (2)

with one row per receive antenna,
and assuming the channel information f = [f_t^{i,j}], i = 0, ..., N_T − 1, j = 0, ..., N_R − 1, is perfectly known at the receiver, then at each time index t the space time decoder generates symbol-domain log-likelihood values for all symbols in the signal constellation Q = {q}, q = 0, ..., 2^p − 1, as follows:

L(c_t = q | r) = ln Σ_{(s',s)⇒q} α_{t−1}(s') γ_t(s', s) β_t(s),   (3)
where (s', s) represents the state transition from time t − 1 to time t on the STTC trellis, and

γ_t(s', s) = P(r_t | (s', s)) P(s | s').   (4)
r_t = {r_t^j}, j = 0, ..., N_R − 1, is the array of received signals on the N_R receive antennas at index t. The first part on the right-hand side of (4) involves the channel information, given by
hand side of (4) involves channel information given by
lnP
¸
r
t
 (s
, s)
= −C
NR−1
¸
j=0
¸
¸
¸
¸
¸
r
j
t
−
NT−1
¸
i=0
f
i, j
t
d
i
t
¸
¸
¸
¸
¸
2
, (5)
where d_t^i, i = 0, ..., N_T − 1, are the transmitted signals associated with the transition branch (s', s) at time t, and C is a constant that depends on the channel condition at time t and is the same for all possible transition branches. P(s | s') is the a priori information and equals P(q : (s', s) ↔ q). Without any a priori information, every symbol in the constellation is considered equally likely and P(s | s') is set to 1/2^p.

Figure 4: Joint source space time decoder.
α_t(s) is the probability that the state at time t is s and the received signal sequences up to time t are r_{k<t+1}. It can be calculated by a forward pass as

α_t(s) = Σ_{s'} γ_t(s', s) α_{t−1}(s').   (6)
β_{t−1}(s') is the probability that the state at time t − 1 is s' and the received data sequences after time t − 1 are r_{k>t−1}. It can be calculated by a backward pass as

β_{t−1}(s') = Σ_s β_t(s) γ_t(s', s).   (7)
The initial values are α_0(0) = β_{N_s}(0) = 1 (N_s is the packet length in modulated symbols), assuming tail symbols are added to force the encoder registers back to the zero state.
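The recursions (6) and (7) can be sketched over a toy trellis as follows. The branch metrics γ are supplied as precomputed dictionaries, and the two-state example values are arbitrary; this is a minimal illustration, not the paper's decoder.

```python
def forward_backward(gammas, num_states):
    """Recursions (6)-(7): gammas[t] maps (s_prev, s) -> gamma_t(s_prev, s)."""
    T = len(gammas)
    alpha = [[0.0] * num_states for _ in range(T + 1)]
    beta = [[0.0] * num_states for _ in range(T + 1)]
    alpha[0][0] = 1.0   # encoder starts in the zero state
    beta[T][0] = 1.0    # tail symbols force it back to the zero state
    for t in range(1, T + 1):                      # forward pass, Eq. (6)
        for (sp, s), g in gammas[t - 1].items():
            alpha[t][s] += g * alpha[t - 1][sp]
    for t in range(T - 1, -1, -1):                 # backward pass, Eq. (7)
        for (sp, s), g in gammas[t].items():
            beta[t][sp] += beta[t + 1][s] * g
    return alpha, beta

# Toy two-state trellis, two trellis steps:
gammas = [{(0, 0): 0.5, (0, 1): 0.5}, {(0, 0): 0.9, (1, 0): 0.4}]
alpha, beta = forward_backward(gammas, 2)
print(round(alpha[2][0], 6), round(beta[0][0], 6))  # 0.65 0.65
```

A useful sanity check is that α at the terminating state equals β at the starting state: both give the total probability of all valid trellis paths.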
It needs to be pointed out that L(c_t = q | r) in (3) is a log-likelihood value, not the log-likelihood ratio of conventional MAP decoding, because multiple candidate symbols exist in the STTC constellation. Besides, the systematic and parity information can no longer be separated in (5), because the two output symbols of any trellis transition are sent through the two transmit antennas simultaneously: the received signal on any receive antenna is the additive effect of the two symbols and the noise. Equation (3) can be rewritten as

L(c_t = q | r) = L_{a,STTC} + L_{e,STTC},   (8)
where

L_{a,STTC} = ln P(c_t = q),
L_{e,STTC} = ln Σ_{(s',s)⇒q} α_{t−1}(s') P(r_t | (s', s)) β_t(s).   (9)

As a result, each symbol-domain log-likelihood value comprises only two parts: extrinsic information and a priori information. The extrinsic information of the STTC is sent to the VLEC decoder as a priori information.
The bit-indexed soft-input sequence Y to the VLEC decoder is the extrinsic information from the channel decoder in the first iteration. The VLEC MAP decoder calculates the bit-domain log-likelihood ratio for each coded bit u_k as

L(u_k | Y) = ln [ P(u_k = 1 | Y) / P(u_k = 0 | Y) ]
           = ln [ Σ_{(s',s)⇒u_k=1} α_{k−1}(s') γ_k(s', s) β_k(s) / Σ_{(s',s)⇒u_k=0} α_{k−1}(s') γ_k(s', s) β_k(s) ].   (10)
The forward and backward calculations of α and β are similar to those in STTC MAP decoding. Since this is a serially concatenated system without separable systematic information, Y is taken to be L_{p,STTC} minus the a priori information of the STTC decoding in the first iteration, and it remains the same for all iterations. The a priori information of the VLEC decoder (L_{a,VLEC}) in the following iterations is updated with the extrinsic information from the STTC decoder. The calculation of γ can be written as

γ_k(s', s) = P(Y_k | (s', s)) P(u_k),  u_k ∈ {0, 1},   (11)
where u_k is the output bit of the VLEC encoder associated with the transition from the previous state s' to the current state s at instant k along the trellis. Equation (10) can be further represented as

L(u_k | Y) = L_{a,VLEC} + L_{e,VLEC},   (12)
where

L_{a,VLEC} = ln [ P(u_k = 1) / P(u_k = 0) ],
L_{e,VLEC} = ln [ Σ_{(s',s)⇒u_k=1} α_{k−1}(s') P(Y_k | (s', s)) β_k(s) / Σ_{(s',s)⇒u_k=0} α_{k−1}(s') P(Y_k | (s', s)) β_k(s) ].   (13)
Therefore, once the VLEC log-likelihood ratio is calculated, the extrinsic information L_{e,VLEC} is extracted and sent to the STTC decoder as the new a priori information.
3.2. Iterative information exchange
The principle of iterative decoding is to update the a priori information of each component decoder with the extrinsic information from the other component decoder, back and forth. Through this iterative information exchange, the decoder can make full use of the coding gain in the trellises of the component codes to remove channel noise progressively. During the first iteration, the a priori probability to the Rec-STTC decoder, L_{a,STTC}, is set to be equally distributed over every possible symbol. The log-likelihood output of the space time decoder, L_{p,STTC}, is separated into two parts: soft information (including the systematic and extrinsic information, since systematic information is not separable in the space time coding scheme) and a priori information, which, in later iterations, is the extrinsic information from the VLEC decoder. The soft symbol information L_{e,STTC} is extracted and converted to a log-likelihood ratio in the bit domain. After deinterleaving, it is sent to the VLEC decoder as a priori information L_{a,VLEC}. The a posteriori output of the VLEC decoder, L_{p,VLEC}, consists of two parts: a priori information and extrinsic information L_{e,VLEC}. Only the extrinsic information is interleaved and converted to a priori information in the symbol domain for the Rec-STTC decoder in the next iteration. After the final iteration, Viterbi VLEC decoding is carried out on L_{p,VLEC} to estimate the source symbol sequence.
The conversion between the bit-domain log-likelihood ratio and the symbol-domain log-likelihood value is implemented based on the mapping method and the modulation mode. Each symbol q consists of p bits, q ↔ (w_0, w_1, ..., w_{p−1}), w_i ∈ {0, 1}. For a group of p bits y_0 y_1 ··· y_{p−1}, we derive the relation between L(q), q = 0, 1, ..., 2^p − 1, and the corresponding L(y_i), i = 0, ..., p − 1, as follows:

L(y_i) = ln [ P(y_i = 1) / P(y_i = 0) ]
       = ln [ Σ_{q:(w_i=1)∈q} P(q) / Σ_{q:(w_i=0)∈q} P(q) ]
       = ln [ Σ_{q:(w_i=1)∈q} e^{L(q)} / Σ_{q:(w_i=0)∈q} e^{L(q)} ],   (14)

L(q) = ln P(q) = ln Π_{i=0}^{p−1} P(w_i) = Σ_{i=0}^{p−1} ln [ e^{w_i L(y_i)} / (1 + e^{L(y_i)}) ],   (15)

where i = 0, ..., p − 1. In (15), we use the conversion pair between the LLR L(a) and the absolute probabilities P(a = 1) and P(a = 0):

P(a = 1) = e^{L(a)} / (1 + e^{L(a)}),   P(a = 0) = 1 / (1 + e^{L(a)}).   (16)
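The conversions (14) and (15) can be sketched as below. The bit-to-symbol mapping, taking bit i of q as (q >> i) & 1, is an assumed convention for illustration; the actual mapping depends on the modulation labeling.

```python
import math

def symbol_ll_from_bit_llrs(q, bit_llrs):
    """Eq. (15): L(q) = sum_i ln( e^{w_i L(y_i)} / (1 + e^{L(y_i)}) )."""
    return sum(((q >> i) & 1) * L - math.log1p(math.exp(L))
               for i, L in enumerate(bit_llrs))

def bit_llrs_from_symbol_lls(L_q, p):
    """Eq. (14): L(y_i) = ln( sum_{q: w_i=1} e^{L(q)} / sum_{q: w_i=0} e^{L(q)} )."""
    out = []
    for i in range(p):
        num = sum(math.exp(L_q[q]) for q in range(2 ** p) if (q >> i) & 1)
        den = sum(math.exp(L_q[q]) for q in range(2 ** p) if not (q >> i) & 1)
        out.append(math.log(num / den))
    return out

# Round trip for p = 2; with independent bits the conversion pair is exact:
llrs = [0.3, -1.2]
L_q = [symbol_ll_from_bit_llrs(q, llrs) for q in range(4)]
print([round(x, 6) for x in bit_llrs_from_symbol_lls(L_q, 2)])  # [0.3, -1.2]
```

In practice these sums are often evaluated in the log domain (e.g., with a max-log approximation) for numerical stability, which the sketch omits.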
4. PERFORMANCE OVER FADING CHANNELS
Throughout this paper, a MIMO system with two transmit antennas and two receive antennas is used to transmit the VLEC coded source stream. A symbol stream is first generated and fed to the source encoder. Each symbol is drawn from a 5-ary alphabet with the probability distribution shown in Table 1. Each input packet has 100 source symbols. We use the VLEC schemes (C1, C2) in Table 1 and the Rec-STTCs (ST1, ST2) with the signal constellations in Figure 3. The average transmitted signal power is set to one (E_s = 1) and the amplitudes of QPSK and 8PSK are both equal to one (√E_s = 1). The output bit stream from the VLEC encoder is padded with "0" if necessary so that its length is divisible by p. Tail symbols are added so that the Rec-STTC encoder registers return to the zero state. A random interleaver is used between the VLEC encoder and the Rec-STTC encoder. We adopt a Rayleigh distributed channel model for both the rapid fading case and the quasi static fading case. Following the iterative decoding and information conversion described in the previous section, the end-to-end system performance is measured by the symbol error rate (SER) after the VLEC SOVA decoder. SER is measured in terms of the Levenshtein distance [18], which is the minimum number of insertions, deletions, and substitutions required to transform one sequence into another.

Figure 5: SER performance of the joint source and space time decoder over Rayleigh fading channels (separate decoding vs. joint decoding with 4 and 8 iterations, C2+ST1, rapid and block fading).
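The Levenshtein distance used for the SER measurement can be computed with the standard dynamic program; a minimal sketch:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to transform sequence a into sequence b."""
    prev = list(range(len(b) + 1))            # distances from a[:0] to b[:j]
    for i, x in enumerate(a, 1):
        cur = [i]                             # distance from a[:i] to b[:0]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,       # delete x
                           cur[j - 1] + 1,    # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))              # 3
print(levenshtein([0, 1, 2, 3, 4], [0, 4, 0, 2, 4])) # 3
```

Working on sequences rather than strings matters here, since the decoded output is a sequence of source symbols whose length may differ from the transmitted packet.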
In this section, we study VLEC C2 concatenated with the QPSK modulated Rec-STTC ST1. The overall effective information rate is 1.1856 bit/s/Hz. Figure 5 shows the SER comparison between the joint VLEC and space time decoder and the separate space time and VLEC decoder over the quasi static (i.e., block) Rayleigh fading channel and the rapid Rayleigh fading channel. The joint source space time decoder achieves more than 2 dB gain in SER over separate decoding in the rapid fading channel and about 0.8 dB gain in the quasi static fading channel. Notably, at 6 dB in the rapid fading channel, after the 8th iteration, the SER drops to 10^{−3} of the SER of separate decoding.
We also observe that the concatenated VLEC and STTC system has a smaller performance gain in the quasi static fading channel than in the rapid fading channel, as shown in Figure 5. This is reasonable because rapid fading channels, also called interleaved fading channels, provide additional diversity gain compared with the quasi static channel.
5. PERFORMANCE IN PRESENCE OF CHANNEL ESTIMATION ERRORS
In this section, we evaluate the joint source and space time decoding in more realistic scenarios. Section 3 assumed that the channel state information (CSI) is perfectly known at the receiver. However, in real communication systems, regardless of the method used, there are always errors in the channel estimation. Here we examine how the joint source and space time decoder performs in the presence of channel estimation errors.
With imperfect channel estimation, the actual channel fading matrix f used to calculate the metric in (5) is replaced by the estimated channel fading matrix f̂. We model each estimated channel fading coefficient f̂_t^{i,j} between the ith transmit antenna and the jth receive antenna at time t as a noisy version of the actual channel fading coefficient f_t^{i,j}:

f̂_t^{i,j} = f_t^{i,j} + η_t^{i,j},   (17)
where η_t^{i,j} is the channel estimation error, modeled as a complex Gaussian random variable with zero mean and variance σ_η², independent of f_t^{i,j}. The correlation coefficient ρ between f_t^{i,j} and f̂_t^{i,j} is given by

ρ = 1 / (1 + σ_η²).   (18)
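The error model (17)-(18) can be sketched as follows. Splitting the complex variance σ_η² evenly between the real and imaginary parts is an assumption of the sketch, as is the particular ρ value used in the check.

```python
import random

def sigma2_from_rho(rho):
    """Invert Eq. (18): rho = 1 / (1 + sigma_eta^2)."""
    return 1.0 / rho - 1.0

def noisy_estimate(f, sigma2_eta, rng=None):
    """Eq. (17): f_hat = f + eta, with eta complex Gaussian, zero mean,
    variance sigma2_eta split evenly over the real and imaginary parts."""
    rng = rng or random.Random(0)
    std = (sigma2_eta / 2.0) ** 0.5
    return f + complex(rng.gauss(0, std), rng.gauss(0, std))

print(round(sigma2_from_rho(0.95), 4))   # 0.0526
print(noisy_estimate(1 + 0j, 0.0))       # (1+0j): no error when sigma^2 = 0
```

So the simulation cases ρ = 0.98 and ρ = 0.95 correspond to estimation error variances of roughly 0.02 and 0.053, respectively.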
We use VLEC C1 and Rec-STTC ST1 for this simulation; the other simulation parameters remain the same. Figure 6 shows the SER performance over the quasi static fading channel. When the channel information is accurately estimated (ρ = 1.0), the SER decreases through the iterations, with about 0.7 dB gain over separate VLEC and STTC decoding at the SER level of 10^{−3}. In both cases of channel estimation error (ρ = 0.98, case I, and ρ = 0.95, case II), the joint VLEC and STTC decoding still achieves iterative decoding gain. After 8 iterations, the joint decoding scheme achieves a performance gain of more than 0.7 dB at the SER level of 10^{−3} in case I, compared with separate decoding. In case II, a performance gain of 3.5 dB at the SER level of 10^{−2} is achieved after 8 iterations.
The decoding performance in case I and case II over the rapid fading channel, shown in Figure 7, is similar. Although channel estimation for rapid fading channels is not practical in real systems, the result provides some theoretical perspective on the joint VLEC and STTC decoding. Similar decoding gain is observed. After 8 iterations, the joint decoding scheme achieves a performance gain of 1.5 dB in SER at the level of 10^{−3} with perfect channel estimation, a gain of nearly 4 dB at the level of 10^{−2} in case I, and a gain of more than 5 dB at the level of 10^{−1} in case II, compared with separate VLEC and STTC decoding.

Figure 6: SER performance of joint source and space time decoding over the quasi static fading channel with channel estimation error (ρ = 1, 0.98, 0.95; separate decoding vs. 8 iterations).

Figure 7: SER performance of joint source and space time decoding over the rapid fading channel with channel estimation error (ρ = 1, 0.98, 0.95; separate decoding vs. 8 iterations).

It can be seen that in both the quasi static and the rapid fading channel, the decoding gain increases from ρ = 1 to ρ = 0.95. When the channel estimation is less accurate, the channel information fed to the space time decoder deviates more from the truth and causes more errors; the iterative decoder can still achieve significant improvement over separate decoding through the iterations. Therefore, the joint source space time decoder is robust to channel estimation errors to some extent. This result is also consistent with the decoder's convergence characteristics: after 6 iterations, the iterative decoding algorithm shows little further improvement for ρ = 1, while iterative gain is still observed for ρ = 0.95. However, we also ran simulations with ρ ≤ 0.65, that is, with very poor channel estimation, and did not find much improvement from iterative decoding. In this situation, the estimate does not reflect the actual channel, and the space time component decoder cannot work effectively enough to extract correct information for iterative use.

Figure 8: SER performance comparison between (C1+ST1) and (C2+ST2) over the rapid fading channel.
6. RATE ALLOCATION BETWEEN STTC AND VLEC
The frequency bandwidth available to a communication system is always limited, so the overall effective data rate that can be transmitted from the antennas is constrained. Power efficiency is measured by the energy required to transmit one bit: when communicating at a rate R with transmit power E, the power efficiency is defined as E/R. The overall effective data rate depends on both the modulation order of the Rec-STTC and the average codeword length of the VLEC. On one hand, for a source with given entropy H and a fixed power efficiency, the overall effective information rate is pH/l; it increases with the modulation order p of the Rec-STTC. However, the decoding performance decreases due to the smaller average Euclidean distance between pairs of signal points in the modulation constellation. On the other hand, a VLEC with a larger average length l helps to increase the error-resilience capability due to the extra redundancy introduced.
Figure 9: SER performance comparison between (C1+ST1) and (C2+ST2) over the quasi static fading channel.
However, this decoding performance is improved at the cost of a data rate loss, which must be compensated elsewhere, for example, by increasing the modulation order. As a result, an interesting question is: given the overall effective information rate and transmit power, does introducing more redundancy in the VLEC or reducing the modulation order of the Rec-STTC give more performance improvement? This question is partially answered by the following simulation.
We study the iterative source space time decoding performance of two different concatenated systems. System I concatenates VLEC C1 with the QPSK Rec-STTC ST1; system II concatenates VLEC C2 with the 8PSK Rec-STTC ST2. With a source entropy of 2.14, the average bit lengths per source symbol of C1 and C2 equal 2.46 and 3.61, and the bandwidth efficiencies of QPSK and 8PSK equal 2 bit/s/Hz and 3 bit/s/Hz, respectively. System II has a slightly higher overall effective information rate (1.7784 bit/s/Hz) than system I (1.7398 bit/s/Hz). Assigning unit power to each modulated symbol, system II also has a slightly better power efficiency (1/1.7784 = 0.5623/bit) than system I (1/1.7398 = 0.5748/bit), which means that system II uses less average power to transmit one bit of source information.
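The rate and power efficiency figures quoted above can be recomputed directly from H = 2.14 and the average codeword lengths in Table 1:

```python
H = 2.14   # source entropy, bits per symbol (Table 1)
# (modulation order p, average VLEC codeword length) per system
systems = {"I (C1 + QPSK ST1)": (2, 2.46), "II (C2 + 8PSK ST2)": (3, 3.61)}

for name, (p, avg_len) in systems.items():
    rate = p * H / avg_len                 # overall effective rate, bit/s/Hz
    print(f"system {name}: rate {rate:.4f} bit/s/Hz, "
          f"power efficiency {1 / rate:.4f}/bit")
# system I (C1 + QPSK ST1): rate 1.7398 bit/s/Hz, power efficiency 0.5748/bit
# system II (C2 + 8PSK ST2): rate 1.7784 bit/s/Hz, power efficiency 0.5623/bit
```

This makes the trade-off explicit: the extra redundancy of C2 (l = 3.61 vs. 2.46) is almost exactly offset by moving from QPSK to 8PSK, leaving the two systems with nearly the same effective rate.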
Figure 8 shows the SER performance comparison between system I and system II over the rapid fading channel; the simulation configuration is otherwise the same. System II outperforms system I by almost 4 dB at an SER of 7 × 10^{−5}. The performance comparison between system I and system II in the quasi static channel shows a similar result, as in Figure 9.
Therefore, given roughly the same overall information rate and power efficiency, allocating more redundancy to the source code gives the joint source and space time decoding more iterative decoding gain. It should be noted, however, that the better performance of system II is achieved at the cost of higher computational complexity, because the number of states in both the VLEC trellis and the STTC trellis increases: the complexity of system II is roughly 4 times that of system I in the STTC decoder and 2 times in the VLEC decoder. Also, unlike the rapid fading channel, the quasi static channel provides no additional diversity gain; as a result, system II has a smaller performance gain over system I in the quasi static fading channel.
7. CONCLUSIONS
In this paper, a joint decoder is proposed for a serially concatenated source and space time code. A VLEC and a Rec-STTC are employed, with redundancy in both codes. By iterative information exchange, the concatenated system achieves additional decoding gain without bandwidth expansion. Simulations show that the SER of the joint decoding scheme is greatly reduced compared with the separate decoding system in both quasi static and rapid fading channels. The proposed decoder is also shown to be effective in the presence of channel estimation errors. Finally, we find that, given a certain overall effective information rate and transmit power, introducing redundancy in the source code can provide more decoding gain than reducing the bandwidth efficiency of the STTC, though with increased decoding complexity.
REFERENCES
[1] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[2] V. Buttigieg and P. G. Farrell, "Variable-length error-correcting codes," IEE Proceedings: Communications, vol. 147, no. 4, pp. 211–215, 2000.
[3] N. Demir and K. Sayood, "Joint source/channel coding for variable length codes," in Proceedings of the Data Compression Conference (DCC '98), pp. 139–148, Snowbird, Utah, USA, March–April 1998.
[4] K. P. Subbalakshmi and J. Vaisey, "Optimal decoding of entropy coded memoryless sources over binary symmetric channels," in Proceedings of the Data Compression Conference (DCC '98), p. 573, Snowbird, Utah, USA, March–April 1998.
[5] A. H. Murad and T. E. Fuja, "Joint source-channel decoding of variable-length encoded sources," in Proceedings of the IEEE Information Theory Workshop (ITW '98), pp. 94–95, Killarney, Ireland, June 1998.
[6] Q. Chen and K. P. Subbalakshmi, "An integrated joint source-channel decoder for MPEG-4 coded video," in Proceedings of the 58th IEEE Vehicular Technology Conference (VTC '03), vol. 1, pp. 347–351, Orlando, Fla, USA, October 2003.
[7] R. Bauer and J. Hagenauer, "On variable length codes for iterative source/channel decoding," in Proceedings of the Data Compression Conference (DCC '01), pp. 273–282, Snowbird, Utah, USA, March 2001.
[8] A. Hedayat and A. Nosratinia, "Performance analysis and design criteria for finite-alphabet source-channel codes," IEEE Transactions on Communications, vol. 52, no. 11, pp. 1872–1879, 2004.
[9] S. X. Ng, J. Y. Chung, and L. Hanzo, "Turbo-detected unequal protection MPEG-4 wireless video telephony using multi-level coding, trellis coded modulation and space-time trellis coding," IEE Proceedings: Communications, vol. 152, no. 6, pp. 1116–1124, 2005.
[10] S. X. Ng, J. Wang, L.-L. Yang, and L. Hanzo, "Variable length space time coded modulation," in Proceedings of the 62nd IEEE Vehicular Technology Conference (VTC '05), pp. 1049–1053, Dallas, Tex, USA, September 2005.
[11] S. X. Ng, J. Wang, M. Tao, L.-L. Yang, and L. Hanzo, "Iteratively decoded variable length space-time coded modulation: code construction and convergence analysis," IEEE Transactions on Wireless Communications, vol. 6, no. 5, pp. 1953–1962, 2007.
[12] S. X. Ng, F. Guo, and L. Hanzo, "Iterative detection of diagonal block space time trellis codes, TCM and reversible variable length codes for transmission over Rayleigh fading channels," in Proceedings of the 60th IEEE Vehicular Technology Conference (VTC '04), vol. 2, pp. 1348–1352, Los Angeles, Calif, USA, September 2004.
[13] V. B. Balakirsky, "Joint source-channel coding with variable length codes," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '97), p. 419, Ulm, Germany, June–July 1997.
[14] S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated coding schemes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 409–428, 1996.
[15] D. Divsalar and F. Pollara, "Serial and hybrid concatenated codes with applications," in Proceedings of the International Symposium on Turbo Codes, pp. 80–87, Brest, France, September 1997.
[16] D. Tujkovic, "Recursive space-time trellis codes for turbo coded modulation," in Proceedings of the IEEE Global Communication Conference (GLOBECOM '00), vol. 2, pp. 1010–1015, San Francisco, Calif, USA, November–December 2000.
[17] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
[18] T. Okuda, E. Tanaka, and T. Kasai, "A method for the correction of garbled words based on the Levenshtein metric," IEEE Transactions on Computers, vol. 25, no. 2, pp. 172–178, 1976.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 329727, 7 pages
doi:10.1155/2008/329727
Research Article
Average Throughput with Linear Network Coding over
Finite Fields: The Combination Network Case
Ali Al-Bashabsheh and Abbas Yongacoglu
School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada K1N 6N5
Correspondence should be addressed to Ali Al-Bashabsheh, aalba059@site.uottawa.ca
Received 4 November 2007; Revised 17 March 2008; Accepted 27 March 2008
Recommended by Andrej Stefanov
We characterize the average linear network coding throughput, T_c^{avg}, for the combination network with min-cut 2 over an arbitrary finite field. We also provide a network code, completely specified by the field size, achieving T_c^{avg} for the combination network.
Copyright © 2008 A. Al-Bashabsheh and A. Yongacoglu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
For a set of sinks in a directed multicast network, it was shown in [1] that if the network can achieve a certain throughput to each sink individually, then it can achieve the same throughput to all sinks simultaneously by allowing coding at intermediate nodes. Such an argument is possible since information is an abstract entity rather than a physical one. Thus, in addition to repetition and forwarding, nodes can manipulate the symbols available from their in-edges and apply functions of such symbols to their out-edges. The collection of edge functions is referred to as the network code. The work in [1] shows that a natural bound on the achievable coding rate is the least of the min-cuts between the source and each of the sinks. We refer to a code achieving the min-cut rate as a solution.
It is known that every multicast network has a linear solution over a sufficiently large finite field [2]. Sanders et al. [3] showed for linear network coding that an alphabet of size O(|T|) is sufficient to achieve the min-cut rate, where |T| is the number of sinks in the network. Rasala Lehman and Lehman [4] indicated that such a bound is tight by devising a solvable multicast network requiring an alphabet of size Ω(√|T|) to achieve the min-cut rate. This shows that some multicast networks might require alphabets of huge sizes to achieve their min-cut throughputs. Coding over large alphabets is not always desirable since it introduces complexity and latency concerns. Hence, this motivates working below the min-cut rate (i.e., relaxing the constraint of operating at network capacity) but with significantly smaller alphabet sizes (if possible) [5].
Chekuri et al. [6] introduced the average routing throughput measure by relaxing the constraint that all sinks must receive the same rate. By decoupling the problem of maximizing the average rate from the problem of balancing rates toward the sinks, they showed that average routing rates can significantly exceed the maximum achievable common routing rate in the network. They also argued that the majority rather than the minority of multicast applications experience different rates at the receivers. In [7], the concept of average linear network coding throughput was introduced under the constraint that the source alphabet and linear network coding are restricted to the binary field. In this work, we extend the average coding throughput measure to linear coding over arbitrary finite fields. Such an extension is an important step toward practical network coding. To see this, we first remark that in [7] the motivation for restricting the alphabet to the binary field was to present a simple coding scheme where nodes are not required to perform operations over a field larger than F_2. This reduction in processing complexity came at the price of reducing the total network throughput. In practice, although nodes might not possess the capability to perform operations over the field necessary to achieve the min-cut throughput, they might still have computation capability beyond the binary field. In such situations, it is more reasonable to design codes compatible with the nodes' computation capability and thus get closer to the min-cut throughput.
In the literature, two different variations of the problem of nonuniform coding throughputs at the terminals have been considered. The general connection model [5, 8] considers the problem where each sink specifies its set of demanded messages. On the other hand, the nonuniform demand problem refers to the problem where each sink specifies only the size of its demand (i.e., the number of messages), and this demanded size might vary from one sink to another [9]. In this work, a sink specifies neither the identities nor the number of demanded messages. The objective is to maximize the average throughput achievable with linear network coding under the additional constraint that messages and network coding are restricted to the finite field F_q (where F_q might not be sufficiently large to achieve the min-cut throughput).
2. DEFINITIONS AND PROBLEM FORMULATION
In general, assume an h unit rate information source consists of h unit rate messages x_1, x_2, ..., x_h, where the messages are symbols from a finite field F_q. Also assume that the symbols carried by edges belong to the same field F_q. For the comparison between average throughputs and common throughputs to be fair and meaningful, it is important that the number of messages h at the source does not exceed the min-cut. In the more general case where the min-cuts from the source to the sinks are not equal, h can be set such that it does not exceed the smallest of such min-cuts.
A directed network N on V nodes and E links can be modeled as a directed graph G(V, E). In multicast networks, a node s ∈ V broadcasts a set of messages x_1, ..., x_h to a set of sinks T = {t_1, ..., t_n} ⊆ V \ {s}, where h is the smallest min-cut between s and t, for all t ∈ T. For any edge e ∈ E, we denote by δ_in(e) the set of edges entering the node from which e departs. In some parts of this work, we find it more convenient to deal with the valuation (defined below) induced by the network code rather than the network code itself.
Definition 1. Given a network code, C, defined by a collection of functions f_e, for all e ∈ E, such that

f_e : F_q^{|\delta_{in}(e)|} \to F_q.    (1)

The code valuation induced by C is a collection of functions

\bar{f}_e : F_q^h \to F_q,    (2)

where \bar{f}_e is the value of f_e as a function of x_1, ..., x_h.
For a multicast network N with a set of messages at the source whose size does not exceed the smallest of the min-cuts between the source and each sink, linear network coding over a sufficiently large field allows every sink t ∈ T to recover the entire set of messages. In this work, we somewhat reverse the story; that is, we restrict the field size and allow sinks to recover subsets of the set of messages. Since the sinks no longer experience the same throughput, the average throughput per sink becomes a natural measure to evaluate the performance of a given network code. Hence, the objective is to decide on a linear network code over the specified alphabet which maximizes the average throughput or, equivalently, the sum of the throughputs experienced by all sinks. More formally, we define the maximum average linear coding throughput over F_q as

T_c^{avg} = \frac{1}{|T|} \max_{Q \in \mathcal{Q}} \sum_{t \in T} T_c^t(Q),    (3)
where the maximization is over \mathcal{Q}, the set of all possible linear coding schemes over F_q, and T_c^t(Q) is the throughput at sink t under linear coding scheme Q ∈ \mathcal{Q}. In contrast, the maximum average routing throughput was defined in [6] and is repeated here for convenience:

T_i^{avg} = \frac{1}{|T|} \max_{P \in \mathcal{P}} \sum_{t \in T} T_i^t(P),    (4)

where, in this formulation, the maximization is over all possible integer routing schemes, \mathcal{P}, and T_i^t(P) is the integer routing throughput at sink t under routing scheme P ∈ \mathcal{P}.
In what follows, we restrict our attention to the family of combination networks with N intermediate nodes and min-cut 2. Such networks are sufficient to develop the ideas we need to present in this work.
3. COMPLEXITY VERSUS ALPHABET SIZE
Consider a multicast network that requires a field F_q of size q to achieve the min-cut throughput. Thus, all operations to compute edge functions must be done over the field F_q. In other words, each node in the network must have a memory of Θ(log_2(q)) bits to store and manipulate the received symbols. In practice, each edge can deliver a fixed number of bits per unit time. Hence, the assumption that edges can deliver one symbol from F_q per network use implies a latency of Θ(log_2(q)).
4. AVERAGE THROUGHPUT OF THE COMBINATION NETWORK WITH MIN-CUT 2
Consider a combination network N with min-cut 2 and messages x_1, x_2 ∈ F_q at the source. A combination network with min-cut 2 consists of three layers of nodes: the source s, a set of N intermediate nodes, and a set of \binom{N}{2} sinks. The source has an out-edge to each intermediate node. Finally, each distinct pair of intermediate nodes is connected to a unique sink via a pair of edges directed into the sink. In this section, we derive an expression for the maximum average throughput of N over an arbitrary finite field F_q. It is known that the network N is solvable when N ≤ q + 1, where q is the field size; hence, the average throughput is equivalent to the min-cut throughput in this case. On the other hand, for q = 2, the problem was solved in [7]. Therefore, in most of the mathematical treatment which follows, we assume 2 < q < N − 1. In spite of this assumption, whenever applicable, we use the previously obtained results for the binary field and the fact that N is solvable for q ≥ N − 1 to present the results for any q ≥ 2.
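The maximization in (3) can be carried out exhaustively for a tiny instance of the combination network just described. The sketch below (plain Python; our own illustration, not part of the paper's experiments) takes N = 4 intermediate nodes and q = 2, assigns one nonzero linear function to each source out-edge, and scores each of the \binom{4}{2} = 6 sinks by the number of messages recoverable from its two in-edges:

```python
from itertools import product, combinations

q = 2  # binary field
N = 4  # intermediate nodes; each pair feeds one sink, so C(4,2) = 6 sinks

# Nonzero linear functions f(x1, x2) = a*x1 + b*x2 over F_2, stored as (a, b)
funcs = [(1, 0), (0, 1), (1, 1)]

def span(f, g):
    """All F_2-linear combinations of the coefficient rows f and g."""
    return {((c1 * f[0] + c2 * g[0]) % q, (c1 * f[1] + c2 * g[1]) % q)
            for c1 in range(q) for c2 in range(q)}

def sink_throughput(f, g):
    """Number of messages x_i recoverable from the received pair (f, g)."""
    s = span(f, g)
    return ((1, 0) in s) + ((0, 1) in s)

best = 0
for code in product(funcs, repeat=N):          # one function per source out-edge
    total = sum(sink_throughput(code[i], code[j])
                for i, j in combinations(range(N), 2))
    best = max(best, total)

n_sinks = N * (N - 1) // 2
print(best / n_sinks)                          # → 1.8333333333333333 (= 11/6)
```

Since N = 4 > q + 1 = 3, the pigeonhole effect forces at least one repeated function, so the optimum falls short of the min-cut value 2; the value 11/6 agrees with the closed form of Corollary 2 below.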
Definition 2. Let F be a collection of functions such that

f : F_q \times F_q \to F_q,    (5)

for each f ∈ F. Then an average throughput, T_c^{avg}, is said to be achievable over F if there exists a network code valuation C = { f_e(x_1, x_2) : f_e(x_1, x_2) ∈ F for all e ∈ E} achieving T_c^{avg}.
Let G = {αx_1 + βx_2 : α, β ∈ F_q} be the set of all linear functions (combinations) of x_1 and x_2 over F_q. Also let F = {x_2} ∪ {x_1 + βx_2 : β ∈ F_q}.
Lemma 1. Let f(x_1, x_2) = αx_1 + βx_2 and f'(x_1, x_2) = α'x_1 + β'x_2 be two functions in G with α, β, α', β' ∈ F_q \ {0}. Then x_1 is recoverable from f and f' if and only if x_2 is recoverable from f and f' (i.e., either both messages are recoverable or neither of them is).
Proof. See the appendix.
Corollary 1. Let f(x_1, x_2) = x_1 + βx_2 and f'(x_1, x_2) = x_1 + β'x_2 be two functions in F with β, β' ∈ F_q \ {0}. Then x_1 is recoverable from f and f' if and only if x_2 is recoverable from f and f'.
Proof. The corollary follows from Lemma 1 and the fact that F ⊂ G.
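Corollary 1 is constructive: from y = x_1 + βx_2 and y' = x_1 + β'x_2 with β ≠ β', both messages follow by solving a 2 × 2 system, since x_2 = (y − y')/(β − β'). A minimal sketch over the prime field F_5 (the helper name `recover` is ours; the modular inverse uses Fermat's little theorem, which requires q prime):

```python
q = 5  # a prime, so a^(q-2) mod q is the multiplicative inverse of a

def recover(b1, b2, y1, y2):
    """Recover (x1, x2) from y1 = x1 + b1*x2 and y2 = x1 + b2*x2 over F_q, b1 != b2."""
    inv = pow((b1 - b2) % q, q - 2, q)   # inverse of (b1 - b2) in F_q
    x2 = ((y1 - y2) * inv) % q
    x1 = (y1 - b1 * x2) % q
    return x1, x2

# Exhaustive check over all message pairs and all distinct coefficient pairs
for x1 in range(q):
    for x2 in range(q):
        for b1 in range(q):
            for b2 in range(q):
                if b1 != b2:
                    y1, y2 = (x1 + b1 * x2) % q, (x1 + b2 * x2) % q
                    assert recover(b1, b2, y1, y2) == (x1, x2)
print("all pairs recovered")
```

Note that only β ≠ β' is needed for the inversion, which is exactly why a sink receiving two distinct functions from F recovers both messages.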
Lemma 2. An average throughput, T_c^{avg}, is achievable over F if and only if it is achievable over G.
Proof. See the appendix.
Remarks

(i) Lemma 2 suggests that it is sufficient to consider the set of functions F; there is no gain in considering G.

(ii) With slight modifications in the proofs, it is easy to show that Lemmas 1 and 2 remain valid even if G is the set of all affine functions of x_1 and x_2 over F_q.
From the definition of F, we see that |F| = q + 1, and with the aid of Corollary 1, it is straightforward to show that any sink receiving two distinct functions from F will be able to recover both messages. Thus, for N ≤ q + 1 the network is solvable [10], and the average throughput is equal to the min-cut throughput. For N > q + 1, a simple pigeonhole argument shows that some source edges must carry the same combination of messages. Thus, a receiver whose two in-edges carry the same function of the messages will recover either one of the messages or neither of them.
The next proposition determines how the functions f(x_1, x_2) ∈ F must be distributed among the source out-edges in order to maximize the average throughput (i.e., minimize the loss in throughput). Let m_i be the number of source edges carrying f_i(x_1, x_2) = x_1 + β_i x_2, for all 1 ≤ i ≤ q − 1, where β_i ∈ F_q \ {0} and β_i ≠ β_j for all i ≠ j. With such an assignment of functions to source out-edges, we still have N − \sum_{i=1}^{q-1} m_i unused source out-edges and two functions in F, namely f(x_1, x_2) = x_1 and f(x_1, x_2) = x_2. Let m_0 and m'_0 be the number of source edges carrying x_1 and x_2, respectively.
Since there is no preference for recovering one message over the other (both messages are equally important to each destination), a maximum average throughput achieving assignment must have the N − \sum_{i=1}^{q-1} m_i remaining edges equally divided between m_0 and m'_0, that is, m_0 = \lfloor (N − \sum_{i=1}^{q-1} m_i)/2 \rfloor and m'_0 = \lfloor (N − \sum_{i=1}^{q-1} m_i + 1)/2 \rfloor. Now, if a sink has both its in-edges carrying x_1, then it cannot recover x_2, and thus there are \binom{m_0}{2} sinks which cannot recover x_2. Similarly, there are \binom{m'_0}{2} destinations which cannot recover x_1. Finally, a destination receiving f_i(x_1, x_2) = x_1 + β_i x_2, β_i ≠ 0, on both of its in-edges will recover neither of the messages, and so there is a loss of 2\binom{m_i}{2}. Hence, the total loss in throughput is given by
L(m_1, \ldots, m_k) = \binom{\lfloor (N - \sum_{i=1}^{k} m_i)/2 \rfloor}{2} + \binom{\lfloor (N - \sum_{i=1}^{k} m_i + 1)/2 \rfloor}{2} + 2 \sum_{i=1}^{k} \binom{m_i}{2},    (6)
where k = q − 1, and the average throughput, as a function of m_1, ..., m_k, is given by

T_c^{avg}(m_1, \ldots, m_k) = \frac{1}{\binom{N}{2}} \left[ 2\binom{N}{2} - L(m_1, \ldots, m_k) \right].    (7)
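Equations (6) and (7) translate directly into code. The following sketch (helper names ours; Python's `math.comb` returns 0 when the top argument is below 2, which matches the convention needed here) evaluates the loss and the resulting average throughput for a given assignment m_1, ..., m_k:

```python
from math import comb

def loss(N, ms):
    """Equation (6): total throughput loss for edge counts ms = [m_1, ..., m_k]."""
    r = N - sum(ms)                 # edges left to carry the plain messages x1 and x2
    return comb(r // 2, 2) + comb((r + 1) // 2, 2) + 2 * sum(comb(m, 2) for m in ms)

def t_avg(N, ms):
    """Equation (7): average throughput per sink for the assignment ms."""
    return (2 * comb(N, 2) - loss(N, ms)) / comb(N, 2)

# Example: N = 10 intermediate nodes over F_4 (so k = q - 1 = 3)
print(t_avg(10, [2, 2, 2]))         # → 1.8222222222222222 (= 82/45)
```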
Before we present the proposition, we need the following
lemma.
Lemma 3. Let

A = \begin{pmatrix} a & 1 & \cdots & 1 \\ 1 & a & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & \cdots & 1 & a \end{pmatrix}    (8)

be a k × k matrix with a ∈ R, a ≠ 1 and a ≠ −(k − 1). Then

A^{-1} = \frac{1}{(a-1)(a+k-1)} \begin{pmatrix} a+(k-2) & -1 & \cdots & -1 \\ -1 & a+(k-2) & \cdots & -1 \\ \vdots & & \ddots & \vdots \\ -1 & \cdots & -1 & a+(k-2) \end{pmatrix}.    (9)
Proof. See the appendix.
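Lemma 3 can also be checked numerically. The sketch below (our own verification, using exact rational arithmetic via `fractions`; a = 5 is the value used later in the proof of Proposition 1) builds A and the claimed inverse and verifies their product:

```python
from fractions import Fraction

def lemma3_inverse(a, k):
    """Closed-form inverse from Lemma 3 for A = (a-1)I + J, J the all-ones matrix."""
    c = Fraction(1, (a - 1) * (a + k - 1))
    return [[c * ((a + k - 2) if i == j else -1) for j in range(k)] for i in range(k)]

def matmul(A, B):
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)] for i in range(k)]

k, a = 4, 5
A = [[a if i == j else 1 for j in range(k)] for i in range(k)]   # matrix (8)
I = [[int(i == j) for j in range(k)] for i in range(k)]
assert matmul(A, lemma3_inverse(a, k)) == I
print("inverse verified")
```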
Proposition 1. The average linear network coding throughput of the network N is maximized when m_1 = m_2 = · · · = m_k := m^*_{int}, where \lfloor (N + 1)/(k + 4) \rfloor ≤ m^*_{int} ≤ \lfloor (N + 1)/(k + 4) \rfloor + 1.
Proof. From (6), we can write

L(m_1, \ldots, m_k) = \frac{1}{2} \left[ \frac{N - \sum_{i=1}^{k} m_i - \delta}{2} \left( \frac{N - \sum_{i=1}^{k} m_i - \delta}{2} - 1 \right) + \frac{N - \sum_{i=1}^{k} m_i + \delta}{2} \left( \frac{N - \sum_{i=1}^{k} m_i + \delta}{2} - 1 \right) + 4 \sum_{i=1}^{k} \frac{m_i(m_i - 1)}{2} \right],    (10)

where δ = 0 if N − \sum_{i=1}^{k} m_i is even and δ = 1 if it is odd. This reduces to

L(m_1, \ldots, m_k) = \frac{1}{4} \left[ \left( N - \sum_{i=1}^{k} m_i \right)^2 - 2 \left( N - \sum_{i=1}^{k} m_i \right) + 4 \sum_{i=1}^{k} m_i(m_i - 1) + \delta \right].    (11)
For the moment, we relax the constraint that m_1, ..., m_k must be integer valued. We also relax the constraint that (N − \sum_{i=1}^{k} m_i)/2 and (N − \sum_{i=1}^{k} m_i + 1)/2 must be integers (note that with such a relaxation, δ disappears from (11)). Now, we compute the partial derivative of (11) with respect to m_j, for all 1 ≤ j ≤ k, and equate it to zero. Thus, we obtain

5m_j + \sum_{i \neq j} m_i = N + 1.    (12)
This can equivalently be written as

(m_1 \; m_2 \; \cdots \; m_k) A = (N + 1)(1 \; 1 \; \cdots \; 1),    (13)

where A = 4I + 1, I is the k × k identity matrix, and 1 is the all-ones k × k matrix. Thus,

(m_1 \; m_2 \; \cdots \; m_k) = (N + 1)(1 \; 1 \; \cdots \; 1) A^{-1},    (14)

and from Lemma 3 (using a = 5), we obtain

(m_1 \; m_2 \; \cdots \; m_k) = \frac{N + 1}{k + 4} (1 \; 1 \; \cdots \; 1),    (15)
that is, m_1 = m_2 = · · · = m_k = (N + 1)/(k + 4). Noting that L(m_1, ..., m_k) is a convex function of m_1, ..., m_k, we know that the integer value m^*_{int} of m_1, ..., m_k which minimizes L(m_1, ..., m_k) is bounded as

\left\lfloor \frac{N + 1}{k + 4} \right\rfloor \leq m^*_{int} \leq \left\lfloor \frac{N + 1}{k + 4} \right\rfloor + 1,    (16)

as required.
Since T_c^{avg} = \max_{m_1, \ldots, m_k} T_c^{avg}(m_1, \ldots, m_k), and T_c^{avg}(m_1, \ldots, m_k) is maximized by the choice of m_1, ..., m_k as in Proposition 1, T_c^{avg} is totally specified by m^*_{int}. The following proposition establishes a simple relation between m^*_{int} and the field size q.
Proposition 2. In a combination network with N intermediate nodes, the value m^*_{int} that maximizes the average linear network coding throughput over F_q is given by

m^*_{int} = \left\lfloor \frac{N + 2 + \lfloor q/2 \rfloor}{q + 3} \right\rfloor,    (17)

where 2 < q < N − 1.
Proof. From (7) and (11), and by substituting m_1 = m_2 = · · · = m_k := m, we obtain

T_c^{avg}(m) = \frac{1}{2N(N-1)} \left[ 4N(N-1) - (N - km)^2 + 2(N - km) - 4km(m - 1) - \delta \right],    (18)

where δ = 0 if N − km is even, and δ = 1 if N − km is odd. Since δ plays a role in the next derivations, we will write δ(N, m) to emphasize its dependence on N and m. From (18), we can write

T_c^{avg}(m) = \frac{1}{2N(N-1)} h(m),    (19)

where

h(m) = -\left[ (k^2 + 4k) m^2 - 2(N + 1)km \right] + N(3N - 2) - \delta(N, m).    (20)

Let m_a = \lfloor (N + 1)/(k + 4) \rfloor and m_b = m_a + 1 = \lfloor (N + 1)/(k + 4) \rfloor + 1. Then from Proposition 1 we know that m^*_{int} is either m_a or m_b. Thus, we can write

m^*_{int} = \arg\max_{m \in \{m_a, m_b\}} T_c^{avg}(m) = \arg\max_{m \in \{m_a, m_b\}} h(m).    (21)

In what follows, we assume k > 1 (the case k = 1 was solved in [7]).
Now consider the following two possibilities.

Possibility A (k is odd)

Note that in this case the field size, q, is even since k = q − 1. Thus, q = 2^n for some integer n > 1; that is, F_q is an extension field with characteristic 2. Depending on the parities of N and m_a (or equivalently m_b), the following four cases arise.
Case 1. Both N and m_a are even. Noting that for this case δ(N, m_a) = 0 and δ(N, m_b) = 1, from (20) and the fact that m_b = m_a + 1 we obtain

h(m_b) = h(m_a) - k(k + 4)(2m_a + 1) + 2(N + 1)k - 1.    (22)

Since m_a = \lfloor (N + 1)/(k + 4) \rfloor = (N + 1 - \ell)/(k + 4) for some \ell ∈ {0, 1, ..., k + 3}, we obtain

h(m_b) = h(m_a) + k(2\ell - (k + 4)) - 1.    (23)
From this and (21), we see that m^*_{int} = m_a if k(2\ell − (k + 4)) − 1 < 0, and m^*_{int} = m_b if k(2\ell − (k + 4)) − 1 > 0. But k(2\ell − (k + 4)) − 1 > 0 if and only if \ell > (k + 4)/2 + 1/(2k). Thus,

m^*_{int} = \begin{cases} m_a, & \ell < \frac{k + 4}{2} + \frac{1}{2k}, \\ m_b, & \ell > \frac{k + 4}{2} + \frac{1}{2k}. \end{cases}    (24)
Now, we impose more structure on \ell. Since m_a = (N + 1 − \ell)/(k + 4), and noting that m_a is even while N + 1 and k + 4 are odd, \ell must be odd; that is, \ell ∈ {1, 3, ..., k, k + 2}. From this and (24), we get

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 1, 3, 5, \ldots, \frac{k-1}{2}, \frac{k+3}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+7}{2}, \frac{k+11}{2}, \ldots, k, k+2 \right\}. \end{cases}    (25)
Case 2. Both N and m_a are odd. A result identical to (25) can be obtained.

Case 3. N is even, and m_a is odd. For this case, we have δ(N, m_a) = 1 and δ(N, m_b) = 0, and from (20) we can write

h(m_b) = h(m_a) + k(2\ell - (k + 4)) + 1.    (26)

Since m_a = (N + 1 − \ell)/(k + 4), and m_a, N + 1, and k + 4 are all odd, \ell must be even; that is, \ell ∈ {0, 2, ..., k + 1, k + 3}.
Now, k(2\ell − (k + 4)) + 1 > 0 if and only if \ell > (k + 4)/2 − 1/(2k). Thus,

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 0, 2, 4, \ldots, \frac{k-3}{2}, \frac{k+1}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+5}{2}, \frac{k+9}{2}, \ldots, k+1, k+3 \right\}. \end{cases}    (27)

Case 4. N is odd, and m_a is even. A result identical to (27) can be obtained.
Combining the previous four cases, we can write, for any odd k > 1,

m^*_{int} = \begin{cases} m_a = \left\lfloor \frac{N+1}{k+4} \right\rfloor, & \ell \in \left\{ 0, 1, 2, 3, \ldots, \frac{k+1}{2}, \frac{k+3}{2} \right\}, \\ m_b = \left\lfloor \frac{N+1}{k+4} \right\rfloor + 1, & \ell \in \left\{ \frac{k+5}{2}, \frac{k+7}{2}, \ldots, k+2, k+3 \right\}. \end{cases}    (28)

Or, more compactly,

m^*_{int} = \left\lfloor \frac{N + 2 + (k+1)/2}{k + 4} \right\rfloor.    (29)
Possibility B (k is even)

Note that in this case the field size, q, is odd. Thus, q = p^n for some prime p ≠ 2 and integer n ≥ 1. As in Possibility A, the following four cases arise.

Case 1. Both N and m_a are even. In this case, we have δ(N, m_a) = δ(N, m_b) = 0, and from (20) we can write

h(m_b) = h(m_a) + k(2\ell - (k + 4)).    (30)

Since m_a is even, N + 1 is odd, and k + 4 is even, we deduce that \ell must be odd. Combining this with (30), (21), and the fact that k(2\ell − (k + 4)) < 0 if and only if \ell < (k + 4)/2, we obtain, for k/2 even (i.e., k divisible by 4),

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 1, 3, 5, \ldots, \frac{k-2}{2}, \frac{k+2}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+6}{2}, \frac{k+10}{2}, \ldots, k+3 \right\}, \end{cases}    (31)
and, for k/2 odd,

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 1, 3, 5, \ldots, \frac{k}{2}, \frac{k+4}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+8}{2}, \frac{k+12}{2}, \ldots, k+3 \right\}. \end{cases}    (32)
Case 2. Both N and m_a are odd. Following the same steps as before, and noting that \ell is even in this case, we obtain, for k/2 even,

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 0, 2, 4, \ldots, \frac{k}{2}, \frac{k+4}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+8}{2}, \frac{k+12}{2}, \ldots, k, k+2 \right\}, \end{cases}    (33)
and, for k/2 odd,

m^*_{int} = \begin{cases} m_a, & \ell \in \left\{ 0, 2, 4, \ldots, \frac{k-2}{2}, \frac{k+2}{2} \right\}, \\ m_b, & \ell \in \left\{ \frac{k+6}{2}, \frac{k+10}{2}, \ldots, k, k+2 \right\}. \end{cases}    (34)
Case 3. N is even, and m_a is odd. It can be shown that m^*_{int} in this case is given by relations identical to (31) and (32).

Case 4. N is odd, and m_a is even. This case can be shown to be similar to Case 2; that is, m^*_{int} is given by (33) and (34).
Combining the previous four cases, we can write, for any even k > 1,

m^*_{int} = \begin{cases} m_a = \left\lfloor \frac{N+1}{k+4} \right\rfloor, & \ell \in \left\{ 0, 1, 2, 3, \ldots, \frac{k}{2}, \frac{k+2}{2}, \frac{k+4}{2} \right\}, \\ m_b = \left\lfloor \frac{N+1}{k+4} \right\rfloor + 1, & \ell \in \left\{ \frac{k+6}{2}, \frac{k+8}{2}, \ldots, k+2, k+3 \right\}. \end{cases}    (35)
This can also be written as

m^*_{int} = \left\lfloor \frac{N + 2 + k/2}{k + 4} \right\rfloor.    (36)

From (29) and (36), we obtain, for any 1 < k < N − 2,

m^*_{int} = \left\lfloor \frac{N + 2 + \lfloor (k+1)/2 \rfloor}{k + 4} \right\rfloor;    (37)

substituting k = q − 1, the proposition follows.
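The closed form (17) can be cross-checked by direct search. The sketch below (our own verification harness, not from the paper) maximizes the total throughput over all equal assignments m_1 = · · · = m_k = m, which suffices by Proposition 1, and compares the result against the formula for a range of N and q:

```python
from math import comb

def t_sum(N, k, m):
    """Total throughput 2*C(N,2) - L when m_1 = ... = m_k = m (equations (6), (7))."""
    r = N - k * m                       # edges left for x1 and x2
    if r < 0:
        return -1                       # infeasible assignment
    L = comb(r // 2, 2) + comb((r + 1) // 2, 2) + 2 * k * comb(m, 2)
    return 2 * comb(N, 2) - L

for q in [3, 4, 5, 7, 8, 9]:
    k = q - 1
    for N in range(q + 2, 60):          # the regime 2 < q < N - 1
        best = max(t_sum(N, k, m) for m in range(N // k + 1))
        formula_m = (N + 2 + q // 2) // (q + 3)   # equation (17)
        assert t_sum(N, k, formula_m) == best
print("formula (17) matches exhaustive search")
```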
From the work in [7] for q = 2 and the results presented in this work, we obtain the following corollary, which characterizes T_c^{avg} over any finite field.
Corollary 2. For a combination network with N intermediate nodes and min-cut 2, the maximum achievable average linear network coding throughput over F_q is given by

T_c^{avg} = \begin{cases} 2, & q \geq N - 1, \\ \dfrac{1}{\binom{N}{2}} \left[ 2\binom{N}{2} - L(m^*_q) \right], & 2 \leq q < N - 1, \end{cases}    (38)

where

L(m^*_q) = \binom{\lfloor (N - k m^*_q)/2 \rfloor}{2} + \binom{\lfloor (N - k m^*_q + 1)/2 \rfloor}{2} + 2k \binom{m^*_q}{2},

m^*_q = \begin{cases} \left\lfloor \dfrac{N + 2 + \lfloor q/2 \rfloor}{q + 3} \right\rfloor, & 2 < q < N - 1, \\ \left\lfloor \dfrac{N + 2}{5} \right\rfloor, & q = 2. \end{cases}    (39)
Proof. For q ≥ N − 1, the result is immediate since the network is solvable. For 2 < q < N − 1, the claim follows from (6) and (7), where from Proposition 1 we know that the average throughput is maximized when m_1, ..., m_k are equal. Hence, we substitute m^*_q for m_1, ..., m_k in (6), where m^*_q is the integer value which maximizes the average throughput and was obtained in Proposition 2 (where it was denoted m^*_{int}). Finally, for q = 2, the result was proven in [7].
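Corollary 2 pins down T_c^{avg} completely, and it evaluates in a few lines (function name ours). For q = 2 and N = 4 it returns 11/6, and for fixed N the value climbs toward the min-cut value 2 as q grows, which is the behavior plotted in Figure 1:

```python
from math import comb

def t_avg_c(N, q):
    """Maximum average linear coding throughput from Corollary 2."""
    if q >= N - 1:
        return 2.0                      # the network is solvable
    k = q - 1
    m = (N + 2) // 5 if q == 2 else (N + 2 + q // 2) // (q + 3)   # m*_q, eq. (39)
    r = N - k * m
    L = comb(r // 2, 2) + comb((r + 1) // 2, 2) + 2 * k * comb(m, 2)
    return (2 * comb(N, 2) - L) / comb(N, 2)

print(t_avg_c(4, 2))                    # → 1.8333333333333333 (= 11/6)
print(t_avg_c(50, 32))                  # close to, but below, the min-cut value 2
```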
Figure 1 shows T_c^{avg} as a function of N, the number of intermediate nodes, for different field sizes. It also shows the average integer routing throughput T_i^{avg}.
APPENDIX

PROOFS

Proof of Lemma 1. Assume x_1 is recoverable from f and f'. Then x_2 = β^{-1}(f − αx_1), where β^{-1} exists since β ≠ 0 and F_q \ {0} is a group under multiplication. Thus, x_2 is recoverable. The proof of the other direction is similar.
Proof of Lemma 2. The forward implication is obvious since F ⊂ G. To prove the reverse implication, assume that there exists a code valuation C = {g_e(x_1, x_2) : g_e(x_1, x_2) ∈ G, for all e ∈ E} achieving average throughput T_c^{avg}. Consider an indexing set I = {1, 2, ..., N} on the source out-edges (and equivalently on the intermediate nodes). Since each intermediate node in N has only one in-edge, intermediate nodes merely forward what they receive on their in-edges to their out-edges. Thus, C is uniquely specified by the functions carried by the source out-edges. Hence, we may consider C = {g_i(x_1, x_2) : g_i(x_1, x_2) ∈ G, for all i ∈ I}. For any i ∈ I, if g_i(x_1, x_2) = α_i x_1 + β_i x_2 ∈ C with α_i = β_i = 0,
[Figure 1: Average linear network coding throughput T_c^{avg} versus the number of intermediate nodes N, for field sizes q = 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 32, together with the average integer routing throughput T_i^{avg}.]
then g_i can be replaced with any function without reducing the average throughput. Thus, we can assume that α_i and β_i are not both zero. Now, let A be the subset of I whose elements are the indices of source edges carrying functions g_i(x_1, x_2) = α_i x_1, that is, β_i = 0 for all i ∈ A. Similarly, let B be the subset of I such that g_i = β_i x_2 for all i ∈ B. Obviously, A and B are disjoint since α_i and β_i are not both zero for any i ∈ I. Finally, let C be the subset of I whose elements are the indices of all source edges carrying functions of the form g_i = α_i x_1 + β_i x_2 with α_i ≠ 0 and β_i ≠ 0. Clearly, A, B, and C partition I.

Now, design a code over F such that f_i(x_1, x_2) = x_1 for all i ∈ A, f_i(x_1, x_2) = x_2 for all i ∈ B, and f_i(x_1, x_2) = x_1 + α_i^{-1} β_i x_2 for all i ∈ C. The existence of f_i in the given form for i ∈ C is guaranteed since α_i ≠ 0, β_i ≠ 0, and F_q \ {0} is a group under multiplication.

Now, for all i, j ∈ I, consider the sink t_{ij} ∈ T with incoming edges originating from intermediate nodes i, j ∈ I.

(i) If i, j ∈ A, then t_{ij} will be able to recover only x_1 from g_i and g_j, which is still the case if g_i and g_j are replaced by f_i and f_j. The same argument holds if i, j ∈ B, with x_2 replacing x_1.

(ii) If i ∈ A and j ∈ B, then f_i and f_j will make x_1 and x_2 available to t_{ij}, so both messages are recoverable and there is no loss in throughput due to replacing g_i and g_j with f_i and f_j.

(iii) If i ∈ A and j ∈ C, then f_i makes x_1 available to t_{ij}, which can be used with f_j to recover x_2. Thus, both messages are recoverable, and there is no loss in considering F instead of G. A similar argument holds for i ∈ B and j ∈ C.
(iv) If i, j ∈ C, then from g_i and g_j we can write γ x_2 = α_i g_j − α_j g_i, where γ = α_i β_j − α_j β_i. Hence, x_2 is not recoverable if and only if γ = 0. From this and Lemma 1 (since α_i, β_i, α_j, β_j ∈ F_q \ {0}), we know that both x_1 and x_2 are not recoverable if and only if γ = 0. Also, from f_i and f_j we can write γ' x_2 = f_j − f_i, where γ' = α_j^{-1} β_j − α_i^{-1} β_i, and the same argument for γ holds for γ'. Thus, we need to show that γ = 0 if and only if γ' = 0. To this end, note that

γ = 0 ⟺ α_i β_j = α_j β_i ⟺ β_j = α_i^{-1} α_j β_i ⟺ β_j = α_j α_i^{-1} β_i ⟺ α_j^{-1} β_j = α_i^{-1} β_i ⟺ γ' = 0.    (A.1)

Hence, there is no loss in considering F instead of G in any of the previous cases. The lemma follows by noting that the previous cases represent all possibilities for the pair of functions received by a sink.
Remarks

(i) With a slight modification in the last step of the proof, the lemma can be shown to remain true even if the alphabet is a finite division ring (a skew field) instead of a field.

(ii) It is possible to show that the lemma holds for any multicast network N with min-cut 2. The proof in this case can be of the same nature as the proof presented in [11] for the sufficiency of homogeneous functions.
Proof of Lemma 3. Note that A can be written as A = (a − 1)I + 1, where I is the k × k identity matrix and 1 is the all-ones k × k matrix (1_{ij} = 1 for all 1 ≤ i, j ≤ k). Let B be another k × k matrix such that AB = I. Assume that B can be written as B = (bI − 1)c, where b and c are scalars. Thus,

AB = [(a - 1)I + 1](bI - 1)c = [(a - 1)bI + b1 - (a - 1)1 - k1]c.    (A.2)

For the product AB to equal I, we need b1 − (a − 1)1 − k1 = 0, where 0 is the all-zero matrix. This can be satisfied by choosing

b = a + k - 1.    (A.3)

We also need

c(a - 1)b = c(a - 1)(a + k - 1) = 1.    (A.4)

Since a ≠ 1, −(k − 1), we obtain c = 1/((a − 1)(a + k − 1)). Thus,

B = \frac{1}{(a - 1)(a + k - 1)} [(a + k - 1)I - 1].    (A.5)

It is easy to check that BA = I; thus A^{-1} = B.
REFERENCES
[1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, 2000.
[2] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, vol. 49, no. 2, pp. 371–381, 2003.
[3] P. Sanders, S. Egner, and L. Tolhuizen, "Polynomial time algorithms for network information flow," in Proceedings of the 15th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '03), pp. 286–294, San Diego, Calif, USA, June 2003.
[4] A. Rasala Lehman and E. Lehman, "Complexity classification of network information flow problems," in Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '04), pp. 142–150, New Orleans, La, USA, January 2004.
[5] A. Rasala Lehman, "Network coding," Ph.D. dissertation, Department of Electrical Engineering and Computer Science, MIT, Cambridge, Mass, USA, 2005.
[6] C. Chekuri, C. Fragouli, and E. Soljanin, "On average throughput and alphabet size in network coding," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2410–2424, 2006.
[7] A. Al-Bashabsheh and A. Yongacoglu, "Average throughput with linear network coding over the binary field," in Proceedings of the IEEE Information Theory Workshop (ITW '07), pp. 90–95, Tahoe City, Calif, USA, September 2007.
[8] R. Dougherty, C. Freiling, and K. Zeger, "Insufficiency of linear coding in network information flow," IEEE Transactions on Information Theory, vol. 51, no. 8, pp. 2745–2759, 2005.
[9] Y. Cassuto and J. Bruck, "Network coding for nonuniform demands," in Proceedings of the International Symposium on Information Theory (ISIT '05), pp. 1720–1724, Adelaide, SA, Australia, September 2005.
[10] C. Fragouli and E. Soljanin, "Information flow decomposition for network coding," IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 829–848, 2006.
[11] R. Dougherty, C. Freiling, and K. Zeger, "Linearity and solvability in multicast networks," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2243–2256, 2004.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 754021, 13 pages
doi:10.1155/2008/754021
Research Article
MacWilliams Identity for Codes with the Rank Metric
Maximilien Gadouleau and Zhiyuan Yan
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
Correspondence should be addressed to Maximilien Gadouleau, magc@lehigh.edu
Received 10 November 2007; Accepted 3 March 2008
Recommended by Andrej Stefanov
The MacWilliams identity, which relates the weight distribution of a code to the weight distribution of its dual code, is useful in determining the weight distribution of codes. In this paper, we derive the MacWilliams identity for linear codes with the rank metric, and our identity has a different form than that by Delsarte. Using our MacWilliams identity, we also derive related identities for rank metric codes. These identities parallel the binomial and power moment identities derived for codes with the Hamming metric.
Copyright © 2008 M. Gadouleau and Z. Yan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
The MacWilliams identity for codes with the Hamming metric [1], which relates the Hamming weight distribution of a code to the weight distribution of its dual code, is useful in determining the Hamming weight distribution of codes. This is because if the dual code has a small number of codewords or equivalence classes of codewords under some known permutation group, its weight distribution can be obtained by exhaustive examination. It also leads to other identities for the weight distribution, such as the Pless identities [1, 2].
Although the rank has long been known, implicitly and explicitly, to be a metric (e.g., see [3]), the rank metric was first considered for error-control codes (ECCs) by Delsarte [4]. The potential applications of rank metric codes to wireless communications [5, 6], public-key cryptosystems [7], and storage equipment [8, 9] have motivated a steady stream of works [8–20] that focus on their properties. The majority of previous works focus on rank distance properties, code construction, and efficient decoding of rank metric codes, and the seminal works in [4, 9, 10] have made significant contributions to these topics. Independently in [4, 9, 10], a Singleton bound (up to some variations) on the minimum rank distance of codes was established, and a class of codes achieving the bound with equality was constructed. We refer to this class of codes as Gabidulin codes henceforth. In [4, 10], analytical expressions to compute the weight distribution of linear codes achieving the Singleton bound with equality were also derived. In [8], it was shown that Gabidulin codes are optimal for correcting crisscross errors (referred to as lattice-pattern errors in [8]). In [9], it was shown that Gabidulin codes are also optimal in the sense of a Singleton bound in crisscross weight, a metric considered in [9, 12, 21] for crisscross errors. Decoding algorithms were introduced for Gabidulin codes in [9, 10, 22, 23].
In [4], the counterpart of the MacWilliams identity, which relates the rank distance enumerator of a code to that of its dual code, was established using association schemes. However, Delsarte's work lacks an expression of the rank weight enumerator of the dual code as a functional transformation of the enumerator of the code. In [24, 25], Grant and Varanasi defined a different rank weight enumerator and established a functional transformation between the rank weight enumerator of a code and that of its dual code.
In this paper we show that, similar to the MacWilliams identity for the Hamming metric, the rank weight distribution of any linear code can be expressed as a functional transformation of that of its dual code. It is remarkable that our MacWilliams identity for the rank metric has a similar form to that for the Hamming metric. Similarly, an intermediate result of our proof is that the rank weight enumerator of the dual of any vector depends on only the rank weight of the vector and is related to the rank weight enumerator of a maximum rank distance (MRD) code. We also derive additional identities that relate moments of the rank weight distribution of a linear code to those of its dual code.
Our work in this paper differs from those in [4, 24, 25] in several aspects.
(i) In this paper, we consider a rank weight enumerator different from that in [24, 25], and solve the original problem of determining the functional transformation of rank weight enumerators between dual codes as defined by Delsarte.
(ii) Our proof, based on character theory, does not require the use of association schemes as in [4] or combinatorial arguments as in [24, 25].
(iii) In [4], the MacWilliams identity is given between the rank distance enumerator sequences of two dual array codes using the generalized Krawtchouk polynomials. Our identity is equivalent to that in [4] for linear rank metric codes, although our identity is expressed using different parameters which are shown to be the generalized Krawtchouk polynomials as well. We also present this identity in the form of a functional transformation (cf. Theorem 1). In such a form, the MacWilliams identities for both the rank and the Hamming metrics are similar to each other.
(iv) The functional transformation form allows us to derive further identities (cf. Section 4) between the rank weight distributions of linear dual codes. We would like to stress that the identities between the moments of the rank distribution proved in this paper are novel and were not considered in the aforementioned papers.
We remark that both the matrix form [4, 9] and the vector form [10] for rank metric codes have been considered in the literature. Following [10], in this paper the vector form over $\mathrm{GF}(q^m)$ is used for rank metric codes, although their rank weight is defined by their corresponding code matrices over $\mathrm{GF}(q)$ [10]. The vector form is chosen in this paper since our results and their derivations for rank metric codes can be readily related to their counterparts for Hamming metric codes.
The rest of the paper is organized as follows. Section 2 reviews some necessary background. In Section 3, we establish the MacWilliams identity for the rank metric. We finally study the moments of the rank distributions of linear codes in Section 4.
2. PRELIMINARIES
2.1. Rank metric, MRD codes, and rank weight enumerator
Consider an $n$-dimensional vector $\mathbf{x} = (x_0, x_1, \ldots, x_{n-1}) \in \mathrm{GF}(q^m)^n$. The field $\mathrm{GF}(q^m)$ may be viewed as an $m$-dimensional vector space over $\mathrm{GF}(q)$. The rank weight of $\mathbf{x}$, denoted as $\mathrm{rk}(\mathbf{x})$, is defined to be the maximum number of coordinates in $\mathbf{x}$ that are linearly independent over $\mathrm{GF}(q)$ [10]. Note that all ranks are with respect to $\mathrm{GF}(q)$ unless otherwise specified in this paper. The coordinates of $\mathbf{x}$ thus span a linear subspace of $\mathrm{GF}(q^m)$, denoted as $\mathfrak{S}(\mathbf{x})$, with dimension equal to $\mathrm{rk}(\mathbf{x})$. For all $\mathbf{x}, \mathbf{y} \in \mathrm{GF}(q^m)^n$, it is easily verified that $d_R(\mathbf{x}, \mathbf{y}) \stackrel{\mathrm{def}}{=} \mathrm{rk}(\mathbf{x} - \mathbf{y})$ is a metric over $\mathrm{GF}(q^m)^n$ [10], referred to as the rank metric henceforth. The minimum rank distance of a code $C$, denoted as $d_R(C)$, is simply the minimum rank distance over all possible pairs of distinct codewords. When there is no ambiguity about $C$, we denote the minimum rank distance as $d_R$.
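To make the definition concrete, here is a small sketch (ours, not from the paper) for $q = 2$: each coordinate of $\mathbf{x} \in \mathrm{GF}(2^m)^n$ is expanded over a $\mathrm{GF}(2)$-basis into a column of an $m \times n$ binary matrix, and the rank weight is the $\mathrm{GF}(2)$-rank of that matrix.

```python
def rank_weight(vec, m):
    """Rank weight of vec over GF(2), with coordinates of GF(2^m) encoded
    as integers whose bits are the coefficients over a fixed GF(2)-basis."""
    n = len(vec)
    # Row `bit` of the m x n expansion matrix, packed as an n-bit integer.
    rows = []
    for bit in range(m):
        r = 0
        for j, coord in enumerate(vec):
            if (coord >> bit) & 1:
                r |= 1 << j
        rows.append(r)
    # Gaussian elimination over GF(2).
    rank = 0
    for col in range(n):
        pivot = next((i for i in range(rank, m) if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(m):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank
```

For example, `rank_weight([1, 1, 1], 3)` is 1 (all coordinates lie in the same one-dimensional $\mathrm{GF}(2)$-subspace), while `rank_weight([1, 2, 4], 3)` is 3.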
Combining the bounds in [10, 26] and generalizing slightly to account for nonlinear codes, we can show that the cardinality $K$ of a code $C$ over $\mathrm{GF}(q^m)$ with length $n$ and minimum rank distance $d_R$ satisfies
$$K \le \min\{ q^{m(n-d_R+1)},\; q^{n(m-d_R+1)} \}. \quad (1)$$
In this paper, we call the bound in (1) the Singleton bound for codes with the rank metric, and refer to codes that attain the Singleton bound as maximum rank distance (MRD) codes. We refer to MRD codes over $\mathrm{GF}(q^m)$ with length $n \le m$ and with length $n > m$ as Class-I and Class-II MRD codes, respectively. For any given parameter set $n$, $m$, and $d_R$, explicit constructions for linear or nonlinear MRD codes exist. For $n \le m$ and $d_R \le n$, generalized Gabidulin codes [16] constitute a subclass of linear Class-I MRD codes. For $n > m$ and $d_R \le m$, a Class-II MRD code can be constructed by transposing a generalized Gabidulin code of length $m$ and minimum rank distance $d_R$ over $\mathrm{GF}(q^n)$, although this code is not necessarily linear over $\mathrm{GF}(q^m)$. When $n = lm$ ($l \ge 2$), linear Class-II MRD codes of length $n$ and minimum distance $d_R$ can be constructed as a Cartesian product $G^l \stackrel{\mathrm{def}}{=} G \times \cdots \times G$ of an $(m, k)$ linear Class-I MRD code $G$ [26].
For all $\mathbf{v} \in \mathrm{GF}(q^m)^n$ with rank weight $r$, the rank weight function of $\mathbf{v}$ is defined as $f_R(\mathbf{v}) = y^r x^{n-r}$. Let $C$ be a code of length $n$ over $\mathrm{GF}(q^m)$. Suppose there are $A_i$ codewords in $C$ with rank weight $i$ ($0 \le i \le n$). Then the rank weight enumerator of $C$, denoted as $W_C^R(x, y)$, is defined to be
$$W_C^R(x, y) \stackrel{\mathrm{def}}{=} \sum_{\mathbf{v} \in C} f_R(\mathbf{v}) = \sum_{i=0}^{n} A_i y^i x^{n-i}. \quad (2)$$
2.2. Hadamard transform
Definition 1 (see [1]). Let $\mathbb{C}$ be the field of complex numbers. Let $a \in \mathrm{GF}(q^m)$ and let $\{1, \alpha_1, \ldots, \alpha_{m-1}\}$ be a basis set of $\mathrm{GF}(q^m)$. We thus have $a = a_0 + a_1\alpha_1 + \cdots + a_{m-1}\alpha_{m-1}$, where $a_i \in \mathrm{GF}(q)$ for $0 \le i \le m-1$. Finally, letting $\zeta \in \mathbb{C}$ be a primitive $q$th root of unity, $\chi(a) \stackrel{\mathrm{def}}{=} \zeta^{a_0}$ maps $\mathrm{GF}(q^m)$ to $\mathbb{C}$.
Definition 2 (Hadamard transform [1]). For a mapping $f$ from $\mathrm{GF}(q^m)^n$ to $\mathbb{C}$, the Hadamard transform of $f$, denoted as $\hat{f}$, is defined to be
$$\hat{f}(\mathbf{v}) \stackrel{\mathrm{def}}{=} \sum_{\mathbf{u} \in \mathrm{GF}(q^m)^n} \chi(\mathbf{u} \cdot \mathbf{v}) f(\mathbf{u}), \quad (3)$$
where $\mathbf{u} \cdot \mathbf{v}$ denotes the inner product of $\mathbf{u}$ and $\mathbf{v}$.
2.3. Notations
In order to simplify notations, we will occasionally denote the vector space $\mathrm{GF}(q^m)^n$ as $F$. We denote the number of vectors of rank $u$ ($0 \le u \le \min\{m, n\}$) in $\mathrm{GF}(q^m)^n$ as $N_u(q^m, n)$. It can be shown that $N_u(q^m, n) = {n \brack u} \alpha(m, u)$ [10], where $\alpha(m, 0) \stackrel{\mathrm{def}}{=} 1$ and $\alpha(m, u) \stackrel{\mathrm{def}}{=} \prod_{i=0}^{u-1} (q^m - q^i)$ for $u \ge 1$. The ${n \brack u}$ term is often referred to as a Gaussian polynomial [27], defined as ${n \brack u} \stackrel{\mathrm{def}}{=} \alpha(n, u)/\alpha(u, u)$. Note that ${n \brack u}$ is the number of $u$-dimensional linear subspaces of $\mathrm{GF}(q)^n$. We also define $\beta(m, 0) \stackrel{\mathrm{def}}{=} 1$ and $\beta(m, u) \stackrel{\mathrm{def}}{=} \prod_{i=0}^{u-1} {m-i \brack 1}$ for $u \ge 1$. These terms are closely related to Gaussian polynomials: $\beta(m, u) = {m \brack u} \beta(u, u)$ and $\beta(m+u, m+u) = {m+u \brack u} \beta(m, m) \beta(u, u)$. Finally, $\sigma_i \stackrel{\mathrm{def}}{=} i(i-1)/2$ for $i \ge 0$.
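The quantities above translate directly into code. The helper functions below (our naming, not the paper's) compute $\alpha$, the Gaussian polynomial, $\beta$, and $N_u(q^m, n)$, and can be used to check, for instance, that the $N_u$ sum to $q^{mn}$:

```python
from math import prod

def sigma(i):
    return i * (i - 1) // 2            # sigma_i = i(i-1)/2

def alpha(m, u, q=2):
    # alpha(m, u) = prod_{i=0}^{u-1} (q^m - q^i); the empty product gives alpha(m, 0) = 1
    return prod(q**m - q**i for i in range(u))

def gauss(n, u, q=2):
    # Gaussian polynomial [n u] = alpha(n, u) / alpha(u, u); zero for u > n
    return 0 if u > n else alpha(n, u, q) // alpha(u, u, q)

def beta(m, u, q=2):
    # beta(m, u) = prod_{i=0}^{u-1} [m-i choose 1]
    return prod(gauss(m - i, 1, q) for i in range(u))

def N(u, m, n, q=2):
    # number of vectors of rank u in GF(q^m)^n
    return gauss(n, u, q) * alpha(m, u, q)
```

For example, with $q = 2$, $m = 3$, $n = 2$ one gets $N_0 = 1$, $N_1 = 21$, $N_2 = 42$, which indeed sum to $2^{mn} = 64$.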
3. MACWILLIAMS IDENTITY FOR THE RANK METRIC
3.1. q-product, q-transform, and q-derivative
In order to express the MacWilliams identity in polynomial form as well as to derive other identities, we introduce several operations on homogeneous polynomials.
Let $a(x, y; m) = \sum_{i=0}^{r} a_i(m) y^i x^{r-i}$ and $b(x, y; m) = \sum_{j=0}^{s} b_j(m) y^j x^{s-j}$ be two homogeneous polynomials in $x$ and $y$ of degrees $r$ and $s$, respectively, with coefficients $a_i(m)$ and $b_j(m)$, respectively. $a_i(m)$ and $b_j(m)$ for $i, j \ge 0$ in turn are real functions of $m$, and are assumed to be zero unless otherwise specified.
Definition 3 (q-product). The q-product of $a(x, y; m)$ and $b(x, y; m)$ is defined to be the homogeneous polynomial of degree $(r+s)$, $c(x, y; m) \stackrel{\mathrm{def}}{=} a(x, y; m) * b(x, y; m) = \sum_{u=0}^{r+s} c_u(m) y^u x^{r+s-u}$, with
$$c_u(m) = \sum_{i=0}^{u} q^{is} a_i(m) b_{u-i}(m-i). \quad (4)$$
We will denote the q-product by $*$ henceforth. For $n \ge 0$, the $n$th q-power of $a(x, y; m)$ is defined recursively: $a(x, y; m)^{[0]} = 1$ and $a(x, y; m)^{[n]} = a(x, y; m)^{[n-1]} * a(x, y; m)$ for $n \ge 1$.
We provide some examples to illustrate the concept. It is easy to verify that $x * y = yx$, $y * x = qyx$, $yx * x = qyx^2$, and $yx * (q^m - 1)y = (q^m - q)y^2x$. Note that $x * y \ne y * x$. It is easy to verify that the q-product is neither commutative nor distributive in general. However, it is commutative and distributive in some special cases as described below.
Lemma 1. Suppose $a(x, y; m) = a$ is a constant independent from $m$. Then $a(x, y; m) * b(x, y; m) = b(x, y; m) * a(x, y; m) = a\,b(x, y; m)$. Also, if $\deg[c(x, y; m)] = \deg[a(x, y; m)]$, then $[a(x, y; m) + c(x, y; m)] * b(x, y; m) = a(x, y; m) * b(x, y; m) + c(x, y; m) * b(x, y; m)$, and $b(x, y; m) * [a(x, y; m) + c(x, y; m)] = b(x, y; m) * a(x, y; m) + b(x, y; m) * c(x, y; m)$.
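The q-product lends itself to a direct implementation. In the sketch below (our representation, not the paper's), a homogeneous polynomial is a pair `(degree, coeffs)` where `coeffs[i]` is the function $m \mapsto a_i(m)$ multiplying $y^i x^{r-i}$, and `qprod` evaluates (4); the worked examples above, such as $y * x = qyx$, can then be checked numerically.

```python
def qprod(a, b, q=2):
    """q-product of homogeneous polynomials a = (r, [a_0, ..., a_r]) and
    b = (s, [b_0, ..., b_s]), coefficients being functions of m.
    Eq. (4): c_u(m) = sum_i q^(i*s) * a_i(m) * b_{u-i}(m - i)."""
    r, ac = a
    s, bc = b

    def coeff(u):
        return lambda m: sum(q**(i * s) * ac[i](m) * bc[u - i](m - i)
                             for i in range(u + 1) if i <= r and u - i <= s)

    return (r + s, [coeff(u) for u in range(r + s + 1)])

# Monomials x and y as degree-1 polynomials.
X = (1, [lambda m: 1, lambda m: 0])
Y = (1, [lambda m: 0, lambda m: 1])
```

For instance, with $q = 2$ the coefficient of $yx$ in `qprod(Y, X)` is $q = 2$, while in `qprod(X, Y)` it is 1, confirming that the q-product is not commutative.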
The homogeneous polynomials $a_l(x, y; m) \stackrel{\mathrm{def}}{=} [x + (q^m - 1)y]^{[l]}$ and $b_l(x, y; m) \stackrel{\mathrm{def}}{=} (x - y)^{[l]}$ are very important to our derivations below. The following lemma provides the analytical expressions of $a_l(x, y; m)$ and $b_l(x, y; m)$.
Lemma 2. For $l \ge 0$, $y^{[l]} = q^{\sigma_l} y^l$ and $x^{[l]} = x^l$. Furthermore,
$$a_l(x, y; m) = \sum_{u=0}^{l} {l \brack u} \alpha(m, u)\, y^u x^{l-u},$$
$$b_l(x, y; m) = \sum_{u=0}^{l} {l \brack u} (-1)^u q^{\sigma_u}\, y^u x^{l-u}. \quad (5)$$
Note that $a_l(x, y; m)$ is the rank weight enumerator of $\mathrm{GF}(q^m)^l$. The proof of Lemma 2, which goes by induction on $l$, is easy and hence omitted.
Definition 4 (q-transform). We define the q-transform of $a(x, y; m) = \sum_{i=0}^{r} a_i(m) y^i x^{r-i}$ as the homogeneous polynomial $\overline{a}(x, y; m) = \sum_{i=0}^{r} a_i(m)\, y^{[i]} * x^{[r-i]}$.
Definition 5 (q-derivative [28]). For $q \ge 2$, the q-derivative at $x \ne 0$ of a real-valued function $f(x)$ is defined as
$$f^{(1)}(x) \stackrel{\mathrm{def}}{=} \frac{f(qx) - f(x)}{(q-1)x}. \quad (6)$$
For any real number $a$, $[f(x) + a g(x)]^{(1)} = f^{(1)}(x) + a g^{(1)}(x)$ for $x \ne 0$. For $\nu \ge 0$, we will denote the $\nu$th q-derivative (with respect to $x$) of $f(x, y)$ as $f^{(\nu)}(x, y)$. The 0th q-derivative of $f(x, y)$ is defined to be $f(x, y)$ itself.
Lemma 3. For $0 \le \nu \le l$, $(x^l)^{(\nu)} = \beta(l, \nu) x^{l-\nu}$. The $\nu$th q-derivative of $f(x, y) = \sum_{i=0}^{r} f_i y^i x^{r-i}$ is given by $f^{(\nu)}(x, y) = \sum_{i=0}^{r-\nu} f_i \beta(r-i, \nu) y^i x^{r-i-\nu}$. Also,
$$a_l^{(\nu)}(x, y; m) = \beta(l, \nu)\, a_{l-\nu}(x, y; m),$$
$$b_l^{(\nu)}(x, y; m) = \beta(l, \nu)\, b_{l-\nu}(x, y; m). \quad (7)$$
The proof of Lemma 3, which goes by induction on $\nu$, is easy and hence omitted.
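A one-step q-derivative on coefficient lists makes Lemma 3 checkable. The sketch below (our helper, numeric coefficients only) maps each term $f_i y^i x^{r-i}$ to $f_i {r-i \brack 1} y^i x^{r-i-1}$, so applying it $\nu$ times reproduces the factor $\beta(l, \nu)$:

```python
def q_deriv(f, q=2):
    """One q-derivative with respect to x of a homogeneous polynomial
    f = (r, [f_0, ..., f_r]), where f_i is the numeric coefficient of
    y^i x^(r-i). Each term picks up a Gaussian factor [r-i choose 1];
    the y^r term (with x-power 0) is dropped."""
    r, coeffs = f
    g1 = lambda a: (q**a - 1) // (q - 1)   # Gaussian [a choose 1]
    return (r - 1, [coeffs[i] * g1(r - i) for i in range(r)])
```

Applying it twice to $x^5$ with $q = 2$ yields $\beta(5, 2)\, x^3 = {5 \brack 1}{4 \brack 1} x^3 = 465\, x^3$, matching $(x^l)^{(\nu)} = \beta(l, \nu) x^{l-\nu}$.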
Lemma 4 (Leibniz rule for the q-derivative). For two homogeneous polynomials $f(x, y)$ and $g(x, y)$ with degrees $r$ and $s$, respectively, the $\nu$th ($\nu \ge 0$) q-derivative of their q-product is given by
$$\left[ f(x, y) * g(x, y) \right]^{(\nu)} = \sum_{l=0}^{\nu} {\nu \brack l} q^{(\nu-l)(r-l)} f^{(l)}(x, y) * g^{(\nu-l)}(x, y). \quad (8)$$
The proof of Lemma 4 is given in Appendix A.
The $q^{-1}$-derivative is similar to the q-derivative.
Definition 6 ($q^{-1}$-derivative). For $q \ge 2$, the $q^{-1}$-derivative at $y \ne 0$ of a real-valued function $g(y)$ is defined as
$$g^{\{1\}}(y) \stackrel{\mathrm{def}}{=} \frac{g(q^{-1}y) - g(y)}{(q^{-1} - 1)y}. \quad (9)$$
For any real number $a$, $[f(y) + a g(y)]^{\{1\}} = f^{\{1\}}(y) + a g^{\{1\}}(y)$ for $y \ne 0$. For $\nu \ge 0$, we will denote the $\nu$th $q^{-1}$-derivative (with respect to $y$) of $g(x, y)$ as $g^{\{\nu\}}(x, y)$. The 0th $q^{-1}$-derivative of $g(x, y)$ is defined to be $g(x, y)$ itself.
Lemma 5. For $0 \le \nu \le l$, the $\nu$th $q^{-1}$-derivative of $y^l$ is $(y^l)^{\{\nu\}} = q^{\nu(1-l)+\sigma_\nu} \beta(l, \nu)\, y^{l-\nu}$. Also,
$$a_l^{\{\nu\}}(x, y; m) = \beta(l, \nu)\, q^{-\sigma_\nu} \alpha(m, \nu)\, a_{l-\nu}(x, y; m-\nu),$$
$$b_l^{\{\nu\}}(x, y; m) = (-1)^{\nu} \beta(l, \nu)\, b_{l-\nu}(x, y; m). \quad (10)$$
The proof of Lemma 5 is similar to that of Lemma 3 and is hence omitted.
Lemma 6 (Leibniz rule for the $q^{-1}$-derivative). For two homogeneous polynomials $f(x, y; m)$ and $g(x, y; m)$ with degrees $r$ and $s$, respectively, the $\nu$th ($\nu \ge 0$) $q^{-1}$-derivative of their q-product is given by
$$\left[ f(x, y; m) * g(x, y; m) \right]^{\{\nu\}} = \sum_{l=0}^{\nu} {\nu \brack l} q^{l(s-\nu+l)} f^{\{l\}}(x, y; m) * g^{\{\nu-l\}}(x, y; m-l). \quad (11)$$
The proof of Lemma 6 is given in Appendix B.
3.2. The dual of a vector
As an important step toward our main result, we derive the rank weight enumerator of $\langle \mathbf{v} \rangle^{\perp}$, where $\mathbf{v} \in \mathrm{GF}(q^m)^n$ is an arbitrary vector and $\langle \mathbf{v} \rangle \stackrel{\mathrm{def}}{=} \{a\mathbf{v} : a \in \mathrm{GF}(q^m)\}$. Note that $\langle \mathbf{v} \rangle$ can be viewed as an $(n, 1)$ linear code over $\mathrm{GF}(q^m)$ with a generator matrix $\mathbf{v}$. It is remarkable that the rank weight enumerator of $\langle \mathbf{v} \rangle^{\perp}$ depends on only the rank of $\mathbf{v}$.
Berger [14] has determined that linear isometries for the rank distance are given by the scalar multiplication by a nonzero element of $\mathrm{GF}(q^m)$, and multiplication on the right by a nonsingular matrix $B \in \mathrm{GF}(q)^{n \times n}$. We say that two codes $C$ and $C'$ are rank-equivalent if there exists a linear isometry $f$ for the rank distance such that $f(C) = C'$.
Lemma 7. Suppose $\mathbf{v}$ has rank $r \ge 1$. Then $L = \langle \mathbf{v} \rangle^{\perp}$ is rank-equivalent to $C \times \mathrm{GF}(q^m)^{n-r}$, where $C$ is an $(r, r-1, 2)$ MRD code and $\times$ denotes the Cartesian product.
Proof. We can express $\mathbf{v}$ as $\mathbf{v} = \bar{\mathbf{v}}B$, where $\bar{\mathbf{v}} = (v_0, \ldots, v_{r-1}, 0, \ldots, 0)$ has rank $r$, and $B \in \mathrm{GF}(q)^{n \times n}$ has full rank. Remark that $\bar{\mathbf{v}}$ is the parity check of $C \times \mathrm{GF}(q^m)^{n-r}$, where $C = \langle (v_0, \ldots, v_{r-1}) \rangle^{\perp}$ is an $(r, r-1, 2)$ MRD code. It can be easily checked that $\mathbf{u} \in L$ if and only if $\bar{\mathbf{u}} \stackrel{\mathrm{def}}{=} \mathbf{u}B^{T} \in \bar{\mathbf{v}}^{\perp}$. Therefore, $\bar{\mathbf{v}}^{\perp} = L B^{T}$, and hence $L$ is rank-equivalent to $\bar{\mathbf{v}}^{\perp} = C \times \mathrm{GF}(q^m)^{n-r}$.
We hence derive the rank weight enumerator of an $(r, r-1, 2)$ MRD code. Note that the rank weight distribution of linear Class-I MRD codes has been derived in [4, 10]. However, we will not use the result in [4, 10], and instead derive the rank weight enumerator of an $(r, r-1, 2)$ MRD code directly.
Proposition 1. Suppose $\mathbf{v}_r \in \mathrm{GF}(q^m)^r$ has rank $r$ ($0 \le r \le m$). The rank weight enumerator of $L_r = \langle \mathbf{v}_r \rangle^{\perp}$ depends on only $r$ and is given by
$$W_{L_r}^R(x, y) = q^{-m} \left\{ \left[ x + (q^m - 1)y \right]^{[r]} + (q^m - 1)(x - y)^{[r]} \right\}. \quad (12)$$
Proof. We first prove that the number of vectors with rank $r$ in $L_r$, denoted as $A_{r,r}$, depends only on $r$ and is given by
$$A_{r,r} = q^{-m} \left[ \alpha(m, r) + (q^m - 1)(-1)^r q^{\sigma_r} \right] \quad (13)$$
by induction on $r$ ($r \ge 1$). Equation (13) clearly holds for $r = 1$. Suppose (13) holds for $r - 1$.
We consider all the vectors $\mathbf{u} = (u_0, \ldots, u_{r-1}) \in L_r$ such that the first $r-1$ coordinates of $\mathbf{u}$ are linearly independent. Remark that $u_{r-1} = -v_{r-1}^{-1} \sum_{i=0}^{r-2} u_i v_i$ is completely determined by $u_0, \ldots, u_{r-2}$. Thus there are $N_{r-1}(q^m, r-1) = \alpha(m, r-1)$ such vectors $\mathbf{u}$. Among these vectors, we will enumerate the vectors $\mathbf{t}$ whose last coordinate is a linear combination of the first $r-1$ coordinates, that is, $\mathbf{t} = (t_0, \ldots, t_{r-2}, \sum_{i=0}^{r-2} a_i t_i) \in L_r$, where $a_i \in \mathrm{GF}(q)$ for $0 \le i \le r-2$.
Remark that $\mathbf{t} \in L_r$ if and only if $(t_0, \ldots, t_{r-2}) \cdot (v_0 + a_0 v_{r-1}, \ldots, v_{r-2} + a_{r-2} v_{r-1}) = 0$. It is easy to check that $\mathbf{v}(\mathbf{a}) = (v_0 + a_0 v_{r-1}, \ldots, v_{r-2} + a_{r-2} v_{r-1})$ has rank $r-1$. Therefore, if $a_0, \ldots, a_{r-2}$ are fixed, then there are $A_{r-1,r-1}$ such vectors $\mathbf{t}$. Also, suppose $\sum_{i=0}^{r-2} t_i v_i + v_{r-1} \sum_{i=0}^{r-2} b_i t_i = 0$. Hence $\sum_{i=0}^{r-2} (a_i - b_i) t_i = 0$, which implies $\mathbf{a} = \mathbf{b}$ since the $t_i$'s are linearly independent. That is, $\mathbf{v}(\mathbf{a})^{\perp} \cap \mathbf{v}(\mathbf{b})^{\perp} = \{\mathbf{0}\}$ if $\mathbf{a} \ne \mathbf{b}$. We conclude that there are $q^{r-1} A_{r-1,r-1}$ vectors $\mathbf{t}$. Therefore, $A_{r,r} = \alpha(m, r-1) - q^{r-1} A_{r-1,r-1} = q^{-m}[\alpha(m, r) + (q^m - 1)(-1)^r q^{\sigma_r}]$.
Denote the number of vectors with rank $p$ in $L_r$ as $A_{r,p}$. We have $A_{r,p} = {r \brack p} A_{p,p}$ [10], and hence $A_{r,p} = {r \brack p} q^{-m} [\alpha(m, p) + (q^m - 1)(-1)^p q^{\sigma_p}]$. Thus, $W_{L_r}^R(x, y) = \sum_{p=0}^{r} A_{r,p} x^{r-p} y^p = q^{-m}\{[x + (q^m - 1)y]^{[r]} + (q^m - 1)(x - y)^{[r]}\}$.
We comment that Proposition 1 in fact provides the rank weight distribution of any $(r, r-1, 2)$ MRD code.
Lemma 8. Let $C_0 \subseteq \mathrm{GF}(q^m)^r$ be a linear code with rank weight enumerator $W_{C_0}^R(x, y)$, and for $s \ge 0$, let $W_{C_s}^R(x, y)$ be the rank weight enumerator of $C_s \stackrel{\mathrm{def}}{=} C_0 \times \mathrm{GF}(q^m)^s$. Then $W_{C_s}^R(x, y)$ is given by
$$W_{C_s}^R(x, y) = W_{C_0}^R(x, y) * \left[ x + (q^m - 1)y \right]^{[s]}. \quad (14)$$
Proof. For $s \ge 0$, denote $W_{C_s}^R(x, y) = \sum_{u=0}^{r+s} B_{s,u} y^u x^{r+s-u}$. We will prove that
$$B_{s,u} = \sum_{i=0}^{u} q^{is} B_{0,i} {s \brack u-i} \alpha(m-i, u-i) \quad (15)$$
by induction on $s$. Equation (15) clearly holds for $s = 0$. Now assume (15) holds for $s - 1$. For any $\mathbf{x}_s = (x_0, \ldots, x_{r+s-1}) \in C_s$, we define $\mathbf{x}_{s-1} = (x_0, \ldots, x_{r+s-2}) \in C_{s-1}$. Then $\mathrm{rk}(\mathbf{x}_s) = u$ if and only if either $\mathrm{rk}(\mathbf{x}_{s-1}) = u$ and $x_{r+s-1} \in \mathfrak{S}(\mathbf{x}_{s-1})$, or $\mathrm{rk}(\mathbf{x}_{s-1}) = u-1$ and $x_{r+s-1} \notin \mathfrak{S}(\mathbf{x}_{s-1})$. This implies $B_{s,u} = q^u B_{s-1,u} + (q^m - q^{u-1}) B_{s-1,u-1} = \sum_{i=0}^{u} q^{is} B_{0,i} {s \brack u-i} \alpha(m-i, u-i)$.
Combining Lemma 7, Proposition 1, and Lemma 8, the rank weight enumerator of $\langle \mathbf{v} \rangle^{\perp}$ can be determined at last.
Proposition 2. For $\mathbf{v} \in \mathrm{GF}(q^m)^n$ with rank $r \ge 0$, the rank weight enumerator of $L = \langle \mathbf{v} \rangle^{\perp}$ depends on only $r$, and is given by
$$W_L^R(x, y) = q^{-m} \left\{ \left[ x + (q^m - 1)y \right]^{[n]} + (q^m - 1)(x - y)^{[r]} * \left[ x + (q^m - 1)y \right]^{[n-r]} \right\}. \quad (16)$$
3.3. MacWilliams identity for the rank metric
Using the results in Section 3.2, we now derive the MacWilliams identity for rank metric codes. Let $C$ be an $(n, k)$ linear code over $\mathrm{GF}(q^m)$, let $W_C^R(x, y) = \sum_{i=0}^{n} A_i y^i x^{n-i}$ be its rank weight enumerator, and let $W_{C^{\perp}}^R(x, y) = \sum_{j=0}^{n} B_j y^j x^{n-j}$ be the rank weight enumerator of its dual code $C^{\perp}$.
Theorem 1. For any $(n, k)$ linear code $C$ and its dual code $C^{\perp}$ over $\mathrm{GF}(q^m)$,
$$W_{C^{\perp}}^R(x, y) = \frac{1}{|C|} \overline{W}_C^R \left( x + (q^m - 1)y,\; x - y \right), \quad (17)$$
where $\overline{W}_C^R$ is the q-transform of $W_C^R$. Equivalently,
$$\sum_{j=0}^{n} B_j y^j x^{n-j} = q^{-mk} \sum_{i=0}^{n} A_i (x - y)^{[i]} * \left[ x + (q^m - 1)y \right]^{[n-i]}. \quad (18)$$
Proof. We have $\mathrm{rk}(\lambda\mathbf{u}) = \mathrm{rk}(\mathbf{u})$ for all $\lambda \in \mathrm{GF}(q^m)^*$ and all $\mathbf{u} \in \mathrm{GF}(q^m)^n$. We want to determine $\hat{f}_R(\mathbf{v})$ for all $\mathbf{v} \in \mathrm{GF}(q^m)^n$. By Definition 2, we can split the summation in (3) into two parts:
$$\hat{f}_R(\mathbf{v}) = \sum_{\mathbf{u} \in L} \chi(\mathbf{u} \cdot \mathbf{v}) f_R(\mathbf{u}) + \sum_{\mathbf{u} \in F \setminus L} \chi(\mathbf{u} \cdot \mathbf{v}) f_R(\mathbf{u}), \quad (19)$$
where $L = \langle \mathbf{v} \rangle^{\perp}$. If $\mathbf{u} \in L$, then $\chi(\mathbf{u} \cdot \mathbf{v}) = 1$ by Definition 1, and the first summation is equal to $W_L^R(x, y)$. For the second summation, we divide vectors into groups of the form $\{\lambda\mathbf{u}_1\}$, where $\lambda \in \mathrm{GF}(q^m)^*$ and $\mathbf{u}_1 \cdot \mathbf{v} = 1$. We remark that for $\mathbf{u} \in F \setminus L$ (see [1, Chapter 5, Lemma 9]):
$$\sum_{\lambda \in \mathrm{GF}(q^m)^*} \chi(\lambda\mathbf{u}_1 \cdot \mathbf{v}) f_R(\lambda\mathbf{u}_1) = f_R(\mathbf{u}_1) \sum_{\lambda \in \mathrm{GF}(q^m)^*} \chi(\lambda) = -f_R(\mathbf{u}_1). \quad (20)$$
Hence the second summation is equal to $(-1/(q^m - 1))\, W_{F \setminus L}^R(x, y)$. This leads to $\hat{f}_R(\mathbf{v}) = (1/(q^m - 1)) [q^m W_L^R(x, y) - W_F^R(x, y)]$. Using $W_F^R(x, y) = [x + (q^m - 1)y]^{[n]}$ and Proposition 2, we obtain $\hat{f}_R(\mathbf{v}) = (x - y)^{[r]} * [x + (q^m - 1)y]^{[n-r]}$, where $r = \mathrm{rk}(\mathbf{v})$.
By [1, Chapter 5, Lemma 11], any mapping $f$ from $F$ to $\mathbb{C}$ satisfies $\sum_{\mathbf{v} \in C^{\perp}} f(\mathbf{v}) = (1/|C|) \sum_{\mathbf{v} \in C} \hat{f}(\mathbf{v})$. Applying this result to $f_R(\mathbf{v})$ and using Definition 4, we obtain (17) and (18).
Also, the $B_j$'s can be explicitly expressed in terms of the $A_i$'s.
Corollary 1. It holds that
$$B_j = \frac{1}{|C|} \sum_{i=0}^{n} A_i P_j(i; m, n), \quad (21)$$
where
$$P_j(i; m, n) \stackrel{\mathrm{def}}{=} \sum_{l=0}^{j} {i \brack l} {n-i \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)} \alpha(m-l, j-l). \quad (22)$$
Proof. We have $(x - y)^{[i]} * [x + (q^m - 1)y]^{[n-i]} = \sum_{j=0}^{n} P_j(i; m, n) y^j x^{n-j}$. The result follows from Theorem 1.
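Equations (21)–(22) can be checked numerically. The sketch below (with our helper names, not code from the paper) evaluates $P_j(i; m, n)$ and (21), and exercises the trivial dual pair $C = \mathrm{GF}(q^m)^n$ versus $C^{\perp} = \{0\}$: feeding in the full-space distribution should return $B_0 = 1$ and $B_j = 0$ otherwise, and feeding in $\{0\}$ should return $B_j = N_j(q^m, n)$.

```python
from math import prod

def alpha(m, u, q):
    return prod(q**m - q**i for i in range(u))

def gauss(n, u, q):
    return 0 if u < 0 or u > n else alpha(n, u, q) // alpha(u, u, q)

def sigma(i):
    return i * (i - 1) // 2

def P(j, i, m, n, q):
    # generalized Krawtchouk polynomial P_j(i; m, n) of eq. (22)
    return sum(gauss(i, l, q) * gauss(n - i, j - l, q) * (-1)**l
               * q**(sigma(l) + l * (n - i)) * alpha(m - l, j - l, q)
               for l in range(j + 1))

def dual_distribution(A, code_size, m, n, q):
    # eq. (21): B_j = (1/|C|) * sum_i A_i P_j(i; m, n)
    return [sum(A[i] * P(j, i, m, n, q) for i in range(n + 1)) // code_size
            for j in range(n + 1)]
```

For $q = 2$, $m = 3$, $n = 2$, the full-space distribution is $[1, 21, 42]$ with $|C| = 64$, and `dual_distribution([1, 21, 42], 64, 3, 2, 2)` returns `[1, 0, 0]` as expected.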
Note that although the analytical expression in (21) is similar to that in [4, (3.14)], the $P_j(i; m, n)$ in (22) are different from the $P_j(i)$ in [4, (A10)] and their alternative forms in [29]. We can show the following.
Proposition 3. The $P_j(x; m, n)$ in (22) are the generalized Krawtchouk polynomials.
The proof is given in Appendix C. Proposition 3 shows that the $P_j(x; m, n)$ in (22) are an alternative form for the $P_j(i)$ in [4, (A10)], and hence our results in Corollary 1 are equivalent to those in [4, Theorem 3.3]. Also, it was pointed out in [29] that $P_j(x; m, n)/P_j(0; m, n)$ is actually a basic hypergeometric function.
4. MOMENTS OF THE RANK DISTRIBUTION
4.1. Binomial moments of the rank distribution
In this section, we investigate the relationship between moments of the rank distribution of a linear code and those of its dual code. Our results parallel those in [1, page 131].
Proposition 4. For $0 \le \nu \le n$,
$$\sum_{i=0}^{n-\nu} {n-i \brack \nu} A_i = q^{m(k-\nu)} \sum_{j=0}^{\nu} {n-j \brack n-\nu} B_j. \quad (23)$$
Proof. First, applying Theorem 1 to $C^{\perp}$, we obtain
$$\sum_{i=0}^{n} A_i y^i x^{n-i} = q^{m(k-n)} \sum_{j=0}^{n} B_j\, b_j(x, y; m) * a_{n-j}(x, y; m). \quad (24)$$
Next, we apply the q-derivative with respect to $x$ to (24) $\nu$ times. By Lemma 3 the left-hand side (LHS) becomes $\sum_{i=0}^{n-\nu} \beta(n-i, \nu) A_i y^i x^{n-i-\nu}$, while the RHS reduces to $q^{m(k-n)} \sum_{j=0}^{n} B_j \psi_j(x, y)$ by Lemma 4, where
$$\psi_j(x, y) \stackrel{\mathrm{def}}{=} \left[ b_j(x, y; m) * a_{n-j}(x, y; m) \right]^{(\nu)} = \sum_{l=0}^{\nu} {\nu \brack l} q^{(\nu-l)(j-l)} b_j^{(l)}(x, y; m) * a_{n-j}^{(\nu-l)}(x, y; m). \quad (25)$$
By Lemma 3, $b_j^{(l)}(x, y; m) = \beta(j, l)(x - y)^{[j-l]}$ and $a_{n-j}^{(\nu-l)}(x, y; m) = \beta(n-j, \nu-l)\, a_{n-j-\nu+l}(x, y; m)$. It can be verified that for any homogeneous polynomial $b(x, y; m)$ and for any $s \ge 0$, $(b * a_s)(1, 1; m) = q^{ms} b(1, 1; m)$. Also, for $x = y = 1$, $b_j^{(l)}(1, 1; m) = \beta(j, j)\delta_{j,l}$. We hence have $\psi_j(1, 1) = 0$ for $j > \nu$, and $\psi_j(1, 1) = {\nu \brack j} \beta(j, j) \beta(n-j, \nu-j) q^{m(n-\nu)}$ for $j \le \nu$. Since $\beta(n-j, \nu-j) = {n-j \brack \nu-j} \beta(\nu-j, \nu-j)$ and $\beta(\nu, \nu) = {\nu \brack j} \beta(j, j) \beta(\nu-j, \nu-j)$, then $\psi_j(1, 1) = {n-j \brack \nu-j} \beta(\nu, \nu) q^{m(n-\nu)}$. Applying $x = y = 1$ to the LHS and rearranging both sides using $\beta(n-i, \nu) = {n-i \brack \nu} \beta(\nu, \nu)$, we obtain (23).
Proposition 4 can be simplified if $\nu$ is less than the minimum distance of the dual code.
Corollary 2. Let $d_R$ be the minimum rank distance of $C^{\perp}$. If $0 \le \nu < d_R$, then
$$\sum_{i=0}^{n-\nu} {n-i \brack \nu} A_i = q^{m(k-\nu)} {n \brack \nu}. \quad (26)$$
Proof. We have $B_0 = 1$ and $B_1 = \cdots = B_\nu = 0$.
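Corollary 2 is easy to exercise numerically. A minimal check (helpers are ours) uses $C = \mathrm{GF}(q^m)^n$, whose dual $\{0\}$ contains no nonzero codeword, so every $\nu$ qualifies; then $A_i = N_i(q^m, n)$ and $k = n$:

```python
from math import prod

def alpha(m, u, q):
    return prod(q**m - q**i for i in range(u))

def gauss(n, u, q):
    return 0 if u > n else alpha(n, u, q) // alpha(u, u, q)

q, m, n = 2, 3, 3
# Rank distribution of the full space GF(q^m)^n (so k = n): A_i = N_i(q^m, n).
A = [gauss(n, i, q) * alpha(m, i, q) for i in range(n + 1)]
```

With these values, $\sum_{i=0}^{n-\nu} {n-i \brack \nu} A_i = q^{m(n-\nu)} {n \brack \nu}$ holds for every $\nu$, e.g. both sides equal 448 for $\nu = 1$.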
Using the $q^{-1}$-derivative, we obtain another identity.
Proposition 5. For $0 \le \nu \le n$,
$$\sum_{i=\nu}^{n} {i \brack \nu} q^{\nu(n-i)} A_i = q^{m(k-\nu)} \sum_{j=0}^{\nu} {n-j \brack n-\nu} (-1)^j q^{\sigma_j} \alpha(m-j, \nu-j)\, q^{j(\nu-j)} B_j. \quad (27)$$
The proof of Proposition 5 is similar to that of Proposition 4, and is given in Appendix D. Following [1], we refer to the LHS of (23) and (27) as binomial moments of the rank distribution of $C$. Similarly, when either $\nu$ is less than the minimum distance $d_R$ of the dual code, or $\nu$ is greater than the diameter (maximum distance between any two codewords) $\delta_R$ of the dual code, Proposition 5 can be simplified.
Corollary 3. If $0 \le \nu < d_R$, then
$$\sum_{i=\nu}^{n} {i \brack \nu} q^{\nu(n-i)} A_i = q^{m(k-\nu)} {n \brack \nu} \alpha(m, \nu). \quad (28)$$
For $\delta_R < \nu \le n$,
$$\sum_{i=0}^{\nu} {n-i \brack n-\nu} (-1)^i q^{\sigma_i} \alpha(m-i, \nu-i)\, q^{i(\nu-i)} A_i = 0. \quad (29)$$
Proof. Apply Proposition 5 to $C$, and use $B_1 = \cdots = B_\nu = 0$ to prove (28). Apply Proposition 5 to $C^{\perp}$, and use $B_\nu = \cdots = B_n = 0$ to prove (29).
4.2. Pless identities for the rank distribution
In this section, we consider the analogues of the Pless identities [1, 2], in terms of Stirling numbers. The q-Stirling numbers of the second kind $S_q(\nu, l)$ are defined [30] to be
$$S_q(\nu, l) \stackrel{\mathrm{def}}{=} \frac{q^{-\sigma_l}}{\beta(l, l)} \sum_{i=0}^{l} (-1)^i q^{\sigma_i} {l \brack i} {l-i \brack 1}^{\nu}, \quad (30)$$
and they satisfy
$$ {m \brack 1}^{\nu} = \sum_{l=0}^{\nu} q^{\sigma_l} S_q(\nu, l)\, \beta(m, l). \quad (31)$$
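The definition (30) and the inversion (31) can be verified numerically. The sketch below (our helpers; exact arithmetic via `Fraction`, since $S_q(\nu, l)$ is defined with a division) computes $S_q(\nu, l)$ and checks (31) for small parameters:

```python
from fractions import Fraction
from math import prod

def alpha(m, u, q):
    return prod(q**m - q**i for i in range(u))

def gauss(n, u, q):
    return 0 if u > n else alpha(n, u, q) // alpha(u, u, q)

def beta(m, u, q):
    return prod(gauss(m - i, 1, q) for i in range(u))

def sigma(i):
    return i * (i - 1) // 2

def S(nu, l, q):
    # q-Stirling number of the second kind, eq. (30)
    total = sum((-1)**i * q**sigma(i) * gauss(l, i, q) * gauss(l - i, 1, q)**nu
                for i in range(l + 1))
    return Fraction(total, q**sigma(l) * beta(l, l, q))
```

For example, with $q = 2$, $\nu = 2$, $m = 4$: ${4 \brack 1}^2 = 225$ equals $\sum_{l} q^{\sigma_l} S_q(2, l)\beta(4, l) = 15 + 2 \cdot 1 \cdot 105$.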
The following proposition can be viewed as a q-analogue of the Pless identity with respect to $x$ [2, $P_2$].
Proposition 6. For $0 \le \nu \le n$,
$$q^{-mk} \sum_{i=0}^{n} {n-i \brack 1}^{\nu} A_i = \sum_{j=0}^{\nu} B_j \sum_{l=0}^{\nu} {n-j \brack n-l} \beta(l, l)\, S_q(\nu, l)\, q^{-ml+\sigma_l}. \quad (32)$$
Proof. We have
$$\sum_{i=0}^{n} {n-i \brack 1}^{\nu} A_i = \sum_{i=0}^{n} A_i \sum_{l=0}^{\nu} q^{\sigma_l} S_q(\nu, l) {n-i \brack l} \beta(l, l) \quad (33)$$
$$= \sum_{l=0}^{\nu} q^{\sigma_l} \beta(l, l)\, S_q(\nu, l) \sum_{i=0}^{n} {n-i \brack l} A_i = \sum_{l=0}^{\nu} q^{\sigma_l} \beta(l, l)\, S_q(\nu, l)\, q^{m(k-l)} \sum_{j=0}^{l} {n-j \brack n-l} B_j \quad (34)$$
$$= q^{mk} \sum_{j=0}^{\nu} B_j \sum_{l=0}^{\nu} {n-j \brack n-l} q^{\sigma_l} \beta(l, l)\, S_q(\nu, l)\, q^{-ml},$$
where (33) follows from (31) and (34) is due to Proposition 4.
Proposition 6 can be simplified when $\nu$ is less than the minimum distance of the dual code.
Corollary 4. For $0 \le \nu < d_R$,
$$q^{-mk} \sum_{i=0}^{n} {n-i \brack 1}^{\nu} A_i = \sum_{l=0}^{\nu} \beta(n, l)\, S_q(\nu, l)\, q^{-ml+\sigma_l} \quad (35)$$
$$= q^{-mn} \sum_{i=0}^{n} {n-i \brack 1}^{\nu} {n \brack i} \alpha(m, i). \quad (36)$$
Proof. Since $B_0 = 1$ and $B_1 = \cdots = B_\nu = 0$, (32) directly leads to (35). Since the right-hand side of (35) is transparent to the code, without loss of generality we choose $C = \mathrm{GF}(q^m)^n$, and (36) follows naturally.
Unfortunately, a q-analogue of the Pless identity with respect to $y$ [2, $P_1$] cannot be obtained due to the presence of the $q^{\nu(n-i)}$ term in the LHS of (27). Instead, we derive its $q^{-1}$-analogue. We denote $p \stackrel{\mathrm{def}}{=} q^{-1}$ and define the functions $\alpha_p(m, u)$, ${n \brack u}_p$, $\beta_p(m, u)$ similarly to the functions introduced in Section 2.3, only replacing $q$ by $p$. It is easy to relate these $q^{-1}$-functions to their counterparts: $\alpha(m, u) = p^{-mu-\sigma_u} (-1)^u \alpha_p(m, u)$, ${n \brack u} = p^{-u(n-u)} {n \brack u}_p$, and $\beta(m, u) = p^{-u(m-u)-\sigma_u} \beta_p(m, u)$.
Proposition 7. For $0 \le \nu \le n$,
$$p^{mk} \sum_{i=0}^{n} {i \brack 1}_p^{\nu} A_i = \sum_{j=0}^{\nu} B_j\, p^{j(m+n-j)} \sum_{l=j}^{\nu} \beta_p(l, l)\, S_p(\nu, l)\, (-1)^l {n-j \brack n-l}_p \alpha_p(m-j, l-j). \quad (37)$$
The proof of Proposition 7 is given in Appendix E.
Corollary 5. For $0 \le \nu < d_R$,
$$p^{mk} \sum_{i=0}^{n} {i \brack 1}_p^{\nu} A_i = \sum_{l=0}^{\nu} \beta_p(n, l)\, S_p(\nu, l)\, \alpha_p(m, l)\, (-1)^l. \quad (38)$$
Proof. Note that $B_0 = 1$ and $B_1 = \cdots = B_\nu = 0$.
4.3. Further results on the rank distribution
For nonnegative integers $\lambda$, $\mu$, and $\nu$, and a linear code $C$ with rank weight distribution $\{A_i\}$, we define
$$T_{\lambda,\mu,\nu}(C) \stackrel{\mathrm{def}}{=} q^{-mk} \sum_{i=0}^{n} {i \brack \lambda}^{\mu} q^{\nu(n-i)} A_i, \quad (39)$$
whose properties are studied below. We refer to
$$T_{0,0,\nu}(C) \stackrel{\mathrm{def}}{=} q^{-mk} \sum_{i=0}^{n} q^{\nu(n-i)} A_i \quad (40)$$
as the $\nu$th q-moment of the rank distribution of $C$. We remark that for any code $C$, the 0th order q-moment of its rank distribution is equal to 1. We first relate $T_{\lambda,1,\nu}(C)$ and $T_{1,\mu,\nu}(C)$ to $T_{0,0,\nu}(C)$.
Lemma 9. For nonnegative integers $\lambda$, $\mu$, and $\nu$,
$$T_{\lambda,1,\nu}(C) = \frac{1}{\alpha(\lambda, \lambda)} \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{n(\lambda-l)} T_{0,0,\nu-\lambda+l}(C), \quad (41)$$
$$T_{1,\mu,\nu}(C) = (1-q)^{-\mu} \sum_{a=0}^{\mu} \binom{\mu}{a} (-1)^a q^{an} T_{0,0,\nu-a}(C). \quad (42)$$
The proof of Lemma 9 is given in Appendix F. We now consider the case where $\nu$ is less than the minimum distance of the dual code.
Proposition 8. For $0 \le \nu < d_R$,
$$T_{0,0,\nu}(C) = \sum_{j=0}^{\nu} {\nu \brack j} \alpha(n, j)\, q^{-mj} \quad (43)$$
$$= q^{-mn} \sum_{i=0}^{n} {n \brack i} \alpha(m, i)\, q^{\nu(n-i)} \quad (44)$$
$$= q^{-m\nu} \sum_{l=0}^{\nu} {\nu \brack l} \alpha(m, l)\, q^{n(\nu-l)}. \quad (45)$$
The proof of Proposition 8 is given in Appendix G. Proposition 8 hence shows that the $\nu$th q-moment of the rank distribution of a code is transparent to the code when $\nu < d_R$. As a corollary, we show that $T_{\lambda,1,\nu}(C)$ and $T_{1,\mu,\nu}(C)$ are also transparent to the code when $0 \le \lambda, \mu \le \nu < d_R$.
Corollary 6. For $0 \le \lambda, \mu \le \nu < d_R$,
$$T_{\lambda,1,\nu}(C) = q^{-mn} {n \brack \lambda} \sum_{i=\lambda}^{n} {n-\lambda \brack i-\lambda} q^{\nu(n-i)} \alpha(m, i),$$
$$T_{1,\mu,\nu}(C) = q^{-mn} \sum_{i=0}^{n} {i \brack 1}^{\mu} q^{\nu(n-i)} {n \brack i} \alpha(m, i). \quad (46)$$
Proof. By Lemma 9 and Proposition 8, $T_{\lambda,1,\nu}(C)$ and $T_{1,\mu,\nu}(C)$ are transparent to the code. Thus, without loss of generality we assume $C = \mathrm{GF}(q^m)^n$, and (46) follows.
4.4. Rank weight distribution of MRD codes
The rank weight distribution of linear Class-I MRD codes was given in [4, 10]. Based on our results in Section 4.1, we provide an alternative derivation of the rank distribution of linear Class-I MRD codes, which can also be used to determine the rank weight distribution of Class-II MRD codes.
Proposition 9 (rank distribution of linear Class-I MRD codes). Let $C$ be an $(n, k, d_R)$ linear Class-I MRD code over $\mathrm{GF}(q^m)$ ($n \le m$), and let $W_C^R(x, y) = \sum_{i=0}^{n} A_i y^i x^{n-i}$ be its rank weight enumerator. We then have $A_0 = 1$ and, for $0 \le i \le n - d_R$,
$$A_{d_R+i} = {n \brack d_R+i} \sum_{j=0}^{i} (-1)^{i-j} q^{\sigma_{i-j}} {d_R+i \brack d_R+j} \left( q^{m(j+1)} - 1 \right). \quad (47)$$
Proof. It can be shown that for two sequences of real numbers $\{a_j\}_{j=0}^{l}$ and $\{b_i\}_{i=0}^{l}$ such that $a_j = \sum_{i=0}^{j} {l-i \brack l-j} b_i$ for $0 \le j \le l$, we have $b_i = \sum_{j=0}^{i} (-1)^{i-j} q^{\sigma_{i-j}} {l-j \brack l-i} a_j$ for $0 \le i \le l$. By Corollary 2, we have $\sum_{i=0}^{j} {n-d_R-i \brack n-d_R-j} A_{d_R+i} = {n \brack n-d_R-j} (q^{m(j+1)} - 1)$ for $0 \le j \le n - d_R$. Applying the result above to $l = n - d_R$, $a_j = {n \brack n-d_R-j} (q^{m(j+1)} - 1)$, and $b_i = A_{d_R+i}$, we obtain
$$A_{d_R+i} = \sum_{j=0}^{i} (-1)^{i-j} q^{\sigma_{i-j}} {n \brack d_R+i} {d_R+i \brack d_R+j} \left( q^{m(j+1)} - 1 \right). \quad (48)$$
We remark that the above rank distribution is consistent with that derived in [4, 10]. Since Class-II MRD codes can be constructed by transposing linear Class-I MRD codes and the transposition operation preserves the rank weight, the weight distributions of Class-II MRD codes can be obtained accordingly.
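Proposition 9 translates directly into a short routine. The sketch below (our helper names) evaluates (47) and can be sanity-checked in two ways: the distribution must sum to $q^{mk}$, and for the degenerate MRD code $C = \mathrm{GF}(q^m)^n$ (where $d_R = 1$) it must match the full-space counts $N_i(q^m, n)$.

```python
from math import prod

def alpha(m, u, q):
    return prod(q**m - q**i for i in range(u))

def gauss(n, u, q):
    return 0 if u > n else alpha(n, u, q) // alpha(u, u, q)

def sigma(i):
    return i * (i - 1) // 2

def mrd_distribution(n, k, m, q):
    """Rank distribution of an (n, k, d_R = n-k+1) linear Class-I MRD code
    over GF(q^m), n <= m, via eq. (47)."""
    d = n - k + 1
    A = [0] * (n + 1)
    A[0] = 1
    for i in range(n - d + 1):
        A[d + i] = gauss(n, d + i, q) * sum(
            (-1)**(i - j) * q**sigma(i - j) * gauss(d + i, d + j, q)
            * (q**(m * (j + 1)) - 1) for j in range(i + 1))
    return A
```

For a $(4, 2)$ Class-I MRD code over $\mathrm{GF}(2^4)$ this gives $A = [1, 0, 0, 225, 30]$, which indeed sums to $2^{8} = q^{mk}$ and vanishes below $d_R = 3$.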
APPENDICES
The proofs in this section use some well-known properties of Gaussian polynomials [27]: ${n \brack k} = {n \brack n-k}$, ${n \brack k}{k \brack l} = {n \brack l}{n-l \brack n-k}$, and
$$ {n \brack k} = {n-1 \brack k} + q^{n-k} {n-1 \brack k-1} \quad \mathrm{(A.1)}$$
$$= q^k {n-1 \brack k} + {n-1 \brack k-1} \quad \mathrm{(A.2)}$$
$$= \frac{q^n - 1}{q^{n-k} - 1} {n-1 \brack k} \quad \mathrm{(A.3)}$$
$$= \frac{q^{n-k+1} - 1}{q^k - 1} {n \brack k-1}. \quad \mathrm{(A.4)}$$
A. PROOF OF LEMMA 4
We consider homogeneous polynomials $f(x, y; m) = \sum_{i=0}^{r} f_i y^i x^{r-i}$ and $u(x, y; m) = \sum_{i=0}^{r} u_i y^i x^{r-i}$ of degree $r$, as well as $g(x, y; m) = \sum_{j=0}^{s} g_j y^j x^{s-j}$ and $v(x, y; m) = \sum_{j=0}^{s} v_j y^j x^{s-j}$ of degree $s$. First, we need a technical lemma.
Lemma 10. If $u_r = 0$, then
$$\frac{1}{x} \left( u(x, y; m) * v(x, y; m) \right) = \frac{u(x, y; m)}{x} * v(x, y; m). \quad \mathrm{(A.5)}$$
If $v_s = 0$, then
$$\frac{1}{x} \left( u(x, y; m) * v(x, y; m) \right) = u(x, qy; m) * \frac{v(x, y; m)}{x}. \quad \mathrm{(A.6)}$$
Proof. Suppose $u_r = 0$. Then $u(x, y; m)/x = \sum_{i=0}^{r-1} u_i y^i x^{r-1-i}$. Hence
$$\frac{u(x, y; m)}{x} * v(x, y; m) = \sum_{k=0}^{r+s-1} \left[ \sum_{l=0}^{k} q^{ls} u_l(m) v_{k-l}(m-l) \right] y^k x^{r+s-1-k} = \frac{1}{x} \left[ u(x, y; m) * v(x, y; m) \right]. \quad \mathrm{(A.7)}$$
Suppose $v_s = 0$. Then $v(x, y; m)/x = \sum_{j=0}^{s-1} v_j y^j x^{s-1-j}$. Hence
$$u(x, qy; m) * \frac{v(x, y; m)}{x} = \sum_{k=0}^{r+s-1} \left[ \sum_{l=0}^{k} q^{l(s-1)} q^l u_l(m) v_{k-l}(m-l) \right] y^k x^{r+s-1-k} = \frac{1}{x} \left[ u(x, y; m) * v(x, y; m) \right]. \quad \mathrm{(A.8)}$$
We now give a proof of Lemma 4.

Proof. In order to simplify notations, we omit the dependence of the polynomials $f$ and $g$ on the parameter $m$. The proof goes by induction on $\nu$. For $\nu = 0$, the result is trivial. For $\nu = 1$, we have
\[
\begin{aligned}
\bigl(f(x,y) * g(x,y)\bigr)^{(1)}
&= \frac{1}{(q-1)x} \bigl[ f(qx,y) * g(qx,y) - f(qx,y) * g(x,y) + f(qx,y) * g(x,y) - f(x,y) * g(x,y) \bigr] \\
&= \frac{1}{(q-1)x} \bigl[ f(qx,y) * \bigl(g(qx,y) - g(x,y)\bigr) + \bigl(f(qx,y) - f(x,y)\bigr) * g(x,y) \bigr] \\
&= f(qx,qy) * \frac{g(qx,y) - g(x,y)}{(q-1)x} + \frac{f(qx,y) - f(x,y)}{(q-1)x} * g(x,y) &&\text{(A.9)}\\
&= q^{r} f(x,y) * g^{(1)}(x,y) + f^{(1)}(x,y) * g(x,y), &&\text{(A.10)}
\end{aligned}
\]
where (A.9) follows from Lemma 10.
Now suppose (8) is true for $\nu$. In order to further simplify notations, we omit the dependence of the various polynomials on $x$ and $y$. We have
\[
\begin{aligned}
(f * g)^{(\nu+1)}
&= \sum_{l=0}^{\nu} {\nu \brack l} q^{(\nu-l)(r-l)} \bigl( f^{(l)} * g^{(\nu-l)} \bigr)^{(1)} \\
&= \sum_{l=0}^{\nu} {\nu \brack l} q^{(\nu-l)(r-l)} \bigl[ q^{r-l} f^{(l)} * g^{(\nu-l+1)} + f^{(l+1)} * g^{(\nu-l)} \bigr] &&\text{(A.11)}\\
&= \sum_{l=0}^{\nu} {\nu \brack l} q^{(\nu+1-l)(r-l)} f^{(l)} * g^{(\nu-l+1)} + \sum_{l=1}^{\nu+1} {\nu \brack l-1} q^{(\nu+1-l)(r-l+1)} f^{(l)} * g^{(\nu-l+1)} \\
&= \sum_{l=1}^{\nu} \Bigl( {\nu \brack l} + q^{\nu+1-l} {\nu \brack l-1} \Bigr) q^{(\nu+1-l)(r-l)} f^{(l)} * g^{(\nu-l+1)} + q^{(\nu+1)r} f * g^{(\nu+1)} + f^{(\nu+1)} * g \\
&= \sum_{l=0}^{\nu+1} {\nu+1 \brack l} q^{(\nu+1-l)(r-l)} f^{(l)} * g^{(\nu-l+1)}, &&\text{(A.12)}
\end{aligned}
\]
where (A.11) follows from (A.10), and (A.12) follows from (A.1).
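The product rule (A.10), and hence the induction above, can be spot-checked numerically when the coefficients are taken independent of $m$ (as in this appendix, where the dependence on $m$ is suppressed). The sketch below implements the $*$-product and the $q$-derivative directly from their definitions; all names and the random test values are ours:

```python
from fractions import Fraction
import random

# A homogeneous polynomial sum_i u_i y^i x^{deg-i} is stored as its list of
# y-coefficients [u_0, ..., u_deg].
q = Fraction(3)

def star(u, v):
    # (u * v)_k = sum_l q^{l s} u_l v_{k-l}, where s is the degree of v
    r, s = len(u) - 1, len(v) - 1
    return [sum(q**(l * s) * u[l] * v[k - l]
                for l in range(max(0, k - s), min(k, r) + 1))
            for k in range(r + s + 1)]

def qderiv(u):
    # (u(qx, y) - u(x, y)) / ((q - 1) x): drops the x-free term and multiplies
    # the y^i x^{r-i} coefficient by the q-integer [r-i]_q
    r = len(u) - 1
    return [u[i] * (q**(r - i) - 1) / (q - 1) for i in range(r)]

random.seed(1)
f = [Fraction(random.randint(-5, 5)) for _ in range(4)]  # degree r = 3
g = [Fraction(random.randint(-5, 5)) for _ in range(3)]  # degree s = 2
r = len(f) - 1

# (A.10): (f*g)^(1) = q^r f * g^(1) + f^(1) * g
lhs = qderiv(star(f, g))
rhs = [a + b for a, b in zip([q**r * c for c in star(f, qderiv(g))],
                             star(qderiv(f), g))]
assert lhs == rhs
```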
B. PROOF OF LEMMA 6

We consider homogeneous polynomials $f(x,y;m) = \sum_{i=0}^{r} f_i y^i x^{r-i}$ and $u(x,y;m) = \sum_{i=0}^{r} u_i y^i x^{r-i}$ of degree $r$, as well as $g(x,y;m) = \sum_{j=0}^{s} g_j y^j x^{s-j}$ and $v(x,y;m) = \sum_{j=0}^{s} v_j y^j x^{s-j}$ of degree $s$. First, we need a technical lemma.
Lemma 11. If $u_0 = 0$, then
\[
\frac{1}{y}\bigl(u(x,y;m) * v(x,y;m)\bigr) = q^{s}\,\frac{u(x,y;m)}{y} * v(x,y;m-1). \tag{B.1}
\]
If $v_0 = 0$, then
\[
\frac{1}{y}\bigl(u(x,y;m) * v(x,y;m)\bigr) = u(x,qy;m) * \frac{v(x,y;m)}{y}. \tag{B.2}
\]
Proof. Suppose $u_0 = 0$. Then $u(x,y;m)/y = \sum_{i=0}^{r-1} u_{i+1}\, x^{r-1-i} y^{i}$. Hence
\[
\begin{aligned}
q^{s}\,\frac{u(x,y;m)}{y} * v(x,y;m-1)
&= q^{s} \sum_{k=0}^{r+s-1} \Biggl[\sum_{l=0}^{k} q^{ls}\, u_{l+1}\, v_{k-l}(m-1-l)\Biggr] x^{r+s-1-k} y^{k} \\
&= q^{s} \sum_{k=1}^{r+s} \Biggl[\sum_{l=1}^{k} q^{(l-1)s}\, u_{l}\, v_{k-l}(m-l)\Biggr] x^{r+s-k} y^{k-1} \\
&= \frac{1}{y}\bigl(u(x,y;m) * v(x,y;m)\bigr). &&\text{(B.3)}
\end{aligned}
\]
Suppose $v_0 = 0$. Then $v(x,y;m)/y = \sum_{j=0}^{s-1} v_{j+1}\, x^{s-1-j} y^{j}$. Hence
\[
\begin{aligned}
u(x,qy;m) * \frac{v(x,y;m)}{y}
&= \sum_{k=0}^{r+s-1} \Biggl[\sum_{l=0}^{k} q^{l(s-1)} q^{l}\, u_{l}\, v_{k-l+1}(m-l)\Biggr] x^{r+s-1-k} y^{k} \\
&= \sum_{k=1}^{r+s} \Biggl[\sum_{l=0}^{k-1} q^{ls}\, u_{l}\, v_{k-l}(m-l)\Biggr] x^{r+s-k} y^{k-1} \\
&= \frac{1}{y}\bigl(u(x,y;m) * v(x,y;m)\bigr). &&\text{(B.4)}
\end{aligned}
\]
We now give a proof of Lemma 6.

Proof. The proof goes by induction on $\nu$, and is similar to that of Lemma 4. For $\nu = 0$, the result is trivial. For $\nu = 1$, we can easily show, by using Lemma 11, that
\[
\bigl(f(x,y;m) * g(x,y;m)\bigr)^{\{1\}} = f(x,y;m) * g^{\{1\}}(x,y;m) + q^{s} f^{\{1\}}(x,y;m) * g(x,y;m-1). \tag{B.5}
\]
It is thus easy to verify the claim by induction on $\nu$.
C. PROOF OF PROPOSITION 3

Proof. It was shown in [29] that the generalized Krawtchouk polynomials are the only solutions to the recurrence
\[
P_{j+1}(i+1; m+1, n+1) = q^{j+1} P_{j+1}(i; m, n) - q^{j} P_{j}(i; m, n) \tag{C.1}
\]
with initial conditions $P_j(0; m, n) = {n \brack j}\alpha(m, j)$. Clearly, our polynomials satisfy these initial conditions. We hence show that the $P_j(i; m, n)$ satisfy the recurrence in (C.1). We have
\[
\begin{aligned}
P_{j+1}&(i+1; m+1, n+1) \\
&= \sum_{l=0}^{i+1} {i+1 \brack l}{n-i \brack j+1-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(m+1-l, j+1-l) \\
&= \sum_{l=0}^{i+1} {i+1 \brack l}{m+1-l \brack j+1-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l) \\
&= \sum_{l=0}^{i+1} \Bigl( q^{l} {i \brack l} + {i \brack l-1} \Bigr) \Bigl( q^{j+1-l} {m-l \brack j+1-l} + {m-l \brack j-l} \Bigr) (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l) &&\text{(C.2)}\\
&= \sum_{l=0}^{i} {i \brack l} q^{j+1} {m-l \brack j+1-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l) \\
&\quad + \sum_{l=0}^{i} q^{l} {i \brack l}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l) \\
&\quad + \sum_{l=1}^{i+1} {i \brack l-1} q^{j+1-l} {m-l \brack j+1-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l) \\
&\quad + \sum_{l=1}^{i+1} {i \brack l-1}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j+1-l), &&\text{(C.3)}
\end{aligned}
\]
where (C.2) follows from (A.2). Let us denote the four summations on the right-hand side of (C.3) by $A$, $B$, $C$, and $D$, respectively. We have $A = q^{j+1} P_{j+1}(i; m, n)$, and
\[
B = \sum_{l=0}^{i} {i \brack l}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j-l) \bigl(q^{n-i+l} - q^{j}\bigr), \tag{C.4}
\]
\[
\begin{aligned}
C &= \sum_{l=0}^{i} {i \brack l} q^{j-l} {m-l-1 \brack j-l} (-1)^{l+1} q^{\sigma_{l+1}} q^{(l+1)(n-i)}\, \alpha(n-i, j-l) \\
&= -q^{j+n-i} \sum_{l=0}^{i} {i \brack l}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j-l)\, \frac{q^{m-j}-1}{q^{m-l}-1}, &&\text{(C.5)}
\end{aligned}
\]
\[
D = -q^{n-i} \sum_{l=0}^{i} {i \brack l}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j-l)\, q^{l}\, \frac{q^{j-l}-1}{q^{m-l}-1}, \tag{C.6}
\]
where (C.5) follows from (A.3), and (C.6) follows from both (A.3) and (A.4). Combining (C.4), (C.5), and (C.6), we obtain
\[
\begin{aligned}
B + C + D &= \sum_{l=0}^{i} {i \brack l}{m-l \brack j-l} (-1)^l q^{\sigma_l} q^{l(n-i)}\, \alpha(n-i, j-l) \\
&\quad \times \Biggl( q^{n-i+l} - q^{j} - q^{n-i}\,\frac{q^{m}-q^{j}}{q^{m-l}-1} - q^{n-i}\,\frac{q^{j}-q^{l}}{q^{m-l}-1} \Biggr) \\
&= -q^{j} P_{j}(i; m, n). &&\text{(C.7)}
\end{aligned}
\]
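Writing $P_j(i;m,n) = \sum_{l} {i \brack l}{n-i \brack j-l}(-1)^l q^{\sigma_l} q^{l(n-i)} \alpha(m-l,j-l)$, the explicit form expanded at the start of this proof, both the initial conditions and the recurrence (C.1) can be checked numerically for small parameters (a Python sketch; exact rationals are used so that negative powers of $q$ cause no rounding):

```python
from fractions import Fraction
from math import prod

q = Fraction(2)

def gauss(n, k):
    # Gaussian binomial [n k]_q
    if k < 0 or k > n:
        return Fraction(0)
    return (prod(q**n - q**t for t in range(k))
            / prod(q**k - q**t for t in range(k)))

def alpha(m, u):
    return prod(q**m - q**t for t in range(u))

def sigma(i):
    return i * (i - 1) // 2

def P(j, i, m, n):
    # generalized Krawtchouk polynomial in the explicit form used above
    return sum(gauss(i, l) * gauss(n - i, j - l) * (-1)**l * q**sigma(l)
               * q**(l * (n - i)) * alpha(m - l, j - l) for l in range(j + 1))

for m in range(2, 5):
    for n in range(2, 5):
        for j in range(0, 3):
            # initial condition P_j(0; m, n) = [n j] alpha(m, j)
            assert P(j, 0, m, n) == gauss(n, j) * alpha(m, j)
            for i in range(0, n):
                # recurrence (C.1)
                assert (P(j + 1, i + 1, m + 1, n + 1)
                        == q**(j + 1) * P(j + 1, i, m, n) - q**j * P(j, i, m, n))
```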
D. PROOF OF PROPOSITION 5

Before proving Proposition 5, we need two technical lemmas.

Lemma 12. For all $m$, $\nu$, and $j$,
\[
\delta(m,\nu,j) \;\stackrel{\mathrm{def}}{=}\; \sum_{i=0}^{j} {j \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) = \alpha(\nu,j)\,\alpha(m-j,\nu-j)\, q^{j(m-j)}. \tag{D.1}
\]
Proof. The proof goes by induction on $j$. The claim trivially holds for $j = 0$. Let us suppose it holds for $j$. We have
\[
\begin{aligned}
\delta(m,\nu,j+1) &= \sum_{i=0}^{j+1} {j+1 \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) \\
&= \sum_{i=0}^{j+1} \Bigl( q^{i} {j \brack i} + {j \brack i-1} \Bigr) (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) &&\text{(D.2)}\\
&= \sum_{i=0}^{j} q^{i} {j \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) + \sum_{i=1}^{j+1} {j \brack i-1} (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) \\
&= \sum_{i=0}^{j} q^{i} {j \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-i,\nu) - \sum_{i=0}^{j} {j \brack i} (-1)^i q^{\sigma_{i+1}}\, \alpha(m-1-i,\nu) \\
&= \sum_{i=0}^{j} q^{i} {j \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-1-i,\nu-1)\, q^{m-1-i} \bigl(q^{\nu}-1\bigr) \\
&= q^{m-1}\bigl(q^{\nu}-1\bigr)\, \delta(m-1, \nu-1, j) \\
&= \alpha(\nu, j+1)\,\alpha(m-j-1, \nu-j-1)\, q^{(j+1)(m-j-1)},
\end{aligned}
\]
where (D.2) follows from (A.2).
Lemma 13. For all $n$, $\nu$, and $j$,
\[
\theta(n,\nu,j) \;\stackrel{\mathrm{def}}{=}\; \sum_{l=0}^{j} {j \brack l}{n-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j-l) = (-1)^j q^{\sigma_j} {n-j \brack n-\nu}. \tag{D.3}
\]
Proof. The proof goes by induction on $j$. The claim trivially holds for $j = 0$. Let us suppose it holds for $j$. We have
\[
\begin{aligned}
\theta(n,\nu,j+1) &= \sum_{l=0}^{j+1} {j+1 \brack l}{n-1-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j+1-l) \\
&= \sum_{l=0}^{j+1} \Bigl( {j \brack l} + q^{j+1-l} {j \brack l-1} \Bigr) {n-1-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j+1-l) &&\text{(D.4)}\\
&= \sum_{l=0}^{j} {j \brack l}{n-1-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j-l) \bigl(q^{\nu-l} - q^{j-l}\bigr) \\
&\quad + \sum_{l=1}^{j+1} q^{j-l+1} {j \brack l-1}{n-1-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j-l+1), &&\text{(D.5)}
\end{aligned}
\]
where (D.4) follows from (A.1). Let us denote the first and second summations on the right-hand side of (D.5) by $A$ and $B$, respectively. We have
\[
A = \bigl(q^{\nu} - q^{j}\bigr) \sum_{l=0}^{j} {j \brack l}{n-1-j \brack \nu-l} q^{l(n-1-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j-l) = \bigl(q^{\nu} - q^{j}\bigr)\, \theta(n-1, \nu, j) = \bigl(q^{\nu} - q^{j}\bigr) (-1)^j q^{\sigma_j} {n-1-j \brack n-1-\nu}, \tag{D.6}
\]
\[
\begin{aligned}
B &= \sum_{l=0}^{j} q^{j-l} {j \brack l}{n-1-j \brack \nu-1-l} q^{(l+1)(n-\nu)} (-1)^{l+1} q^{\sigma_{l+1}}\, \alpha(\nu-1-l, j-l) \\
&= -q^{j+n-\nu} \sum_{l=0}^{j} {j \brack l}{n-1-j \brack \nu-1-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-1-l, j-l) \\
&= -q^{j+n-\nu}\, \theta(n-1, \nu-1, j) = -q^{j+n-\nu} (-1)^j q^{\sigma_j} {n-1-j \brack n-\nu}. &&\text{(D.7)}
\end{aligned}
\]
Combining (D.4), (D.6), and (D.7), we obtain
\[
\begin{aligned}
\theta(n,\nu,j+1) &= (-1)^j q^{\sigma_j} \Biggl[ \bigl(q^{\nu} - q^{j}\bigr) {n-1-j \brack n-1-\nu} - q^{j+n-\nu} {n-1-j \brack n-\nu} \Biggr] \\
&= (-1)^{j+1} q^{\sigma_{j+1}} {n-1-j \brack n-\nu} \Biggl( -\frac{\bigl(q^{\nu-j}-1\bigr)\bigl(q^{n-\nu}-1\bigr)}{q^{\nu-j}-1} + q^{n-\nu} \Biggr) &&\text{(D.8)}\\
&= (-1)^{j+1} q^{\sigma_{j+1}} {n-1-j \brack n-\nu}, &&\text{(D.9)}
\end{aligned}
\]
where (D.8) follows from (A.4).
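Lemma 13 can likewise be confirmed numerically (a Python sketch, with our helper names):

```python
from fractions import Fraction
from math import prod

q = Fraction(2)

def gauss(n, k):
    if k < 0 or k > n:
        return Fraction(0)
    return (prod(q**n - q**t for t in range(k))
            / prod(q**k - q**t for t in range(k)))

def alpha(m, u):
    return prod(q**m - q**t for t in range(u))

def sigma(i):
    return i * (i - 1) // 2

def theta(n, nu, j):
    # left-hand side of (D.3)
    return sum(gauss(j, l) * gauss(n - j, nu - l) * q**(l * (n - nu))
               * (-1)**l * q**sigma(l) * alpha(nu - l, j - l)
               for l in range(j + 1))

for n in range(2, 7):
    for nu in range(0, n + 1):
        for j in range(0, nu + 1):
            assert theta(n, nu, j) == (-1)**j * q**sigma(j) * gauss(n - j, n - nu)
```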
We now give a proof of Proposition 5.

Proof. We apply the $q^{-1}$-derivative with respect to $y$ to (24) $\nu$ times, and then set $x = y = 1$. By Lemma 5, the LHS becomes
\[
\sum_{i=\nu}^{n} q^{\nu(1-i)+\sigma_\nu}\, \beta(i,\nu)\, A_i = q^{\nu(1-n)+\sigma_\nu}\, \beta(\nu,\nu) \sum_{i=\nu}^{n} {i \brack \nu} q^{\nu(n-i)} A_i. \tag{D.10}
\]
The RHS becomes $q^{m(k-n)} \sum_{j=0}^{n} B_j\, \psi_j(1,1)$, where
\[
\begin{aligned}
\psi_j(x,y) &\;\stackrel{\mathrm{def}}{=}\; \bigl( b_j(x,y;m) * a_{n-j}(x,y;m) \bigr)^{\{\nu\}} \\
&= \sum_{l=0}^{\nu} {\nu \brack l} q^{l(n-j-\nu+l)}\, b_j^{\{l\}}(x,y;m) * a_{n-j}^{\{\nu-l\}}(x,y;m-l) &&\text{(D.11)}\\
&= \sum_{l=0}^{\nu} {\nu \brack l} q^{l(n-j-\nu+l)} (-1)^l \beta(j,l)\,\beta(n-j,\nu-l)\, q^{-\sigma_{\nu-l}}\, b_{j-l}(x,y;m) * \alpha(m-l,\nu-l)\, a_{n-j-\nu+l}(x,y;m-\nu) \\
&= \beta(\nu,\nu)\, q^{-\sigma_\nu} \sum_{l=0}^{\nu} {j \brack l}{n-j \brack \nu-l} q^{l(n-j)} (-1)^l q^{\sigma_l}\, b_{j-l}(x,y;m) * \alpha(m-l,\nu-l)\, a_{n-j-\nu+l}(x,y;m-\nu), &&\text{(D.12)}
\end{aligned}
\]
where (D.11) and (D.12) follow from Lemmas 6 and 5, respectively. We have
\[
\begin{aligned}
\bigl( b_{j-l} * \alpha(m-l,\nu-l)\, a_{n-j-\nu+l} \bigr)(1,1;m-\nu)
&= \sum_{u=0}^{n-\nu} \Biggl[ \sum_{i=0}^{u} q^{i(n-j-\nu+l)} {j-l \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-i-l,\nu-l)\, {n-j-\nu+l \brack u-i} \alpha(m-\nu-i, u-i) \Biggr] \\
&= q^{(m-\nu)(n-\nu-j+l)} \sum_{i=0}^{j-l} {j-l \brack i} (-1)^i q^{\sigma_i}\, \alpha(m-l-i, \nu-l) \\
&= q^{(m-\nu)(n-\nu-j+l)}\, \alpha(\nu-l, j-l)\,\alpha(m-j, \nu-j)\, q^{(j-l)(m-j)}, &&\text{(D.13)}
\end{aligned}
\]
where (D.13) follows from Lemma 12. Hence
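The collapse of the double summation in the first step of (D.13) uses the fact that $\sum_{u} {N \brack u} \alpha(M,u) = q^{MN}$, since ${N \brack u}\alpha(M,u)$ counts the $M \times N$ matrices of rank $u$ over $\mathrm{GF}(q)$. This observation (our added remark, not spelled out in the proof) is easy to check numerically:

```python
from math import prod

def gauss(n, k, q):
    if k < 0 or k > n:
        return 0
    return prod(q**n - q**t for t in range(k)) // prod(q**k - q**t for t in range(k))

def alpha(m, u, q):
    return prod(q**m - q**t for t in range(u))

# [N u] alpha(M, u) counts the M x N matrices of rank u over GF(q),
# so summing over u recovers the total count q^(M N)
for q in (2, 3):
    for M in range(0, 5):
        for N in range(0, 5):
            assert sum(gauss(N, u, q) * alpha(M, u, q)
                       for u in range(min(M, N) + 1)) == q**(M * N)
```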
\[
\begin{aligned}
\psi_j(1,1) &= \beta(\nu,\nu)\, q^{m(n-\nu)+\nu(1-n)+\sigma_\nu}\, \alpha(m-j, \nu-j)\, q^{j(\nu-j)} \sum_{l=0}^{j} {j \brack l}{n-j \brack \nu-l} q^{l(n-\nu)} (-1)^l q^{\sigma_l}\, \alpha(\nu-l, j-l) \\
&= \beta(\nu,\nu)\, q^{m(n-\nu)+\nu(1-n)+\sigma_\nu}\, \alpha(m-j, \nu-j)\, q^{j(\nu-j)} (-1)^j q^{\sigma_j} {n-j \brack n-\nu}, &&\text{(D.14)}
\end{aligned}
\]
where (D.14) follows from Lemma 13. Incorporating this expression for $\psi_j(1,1)$ in the definition of the RHS and rearranging both sides, we obtain the result.
E. PROOF OF PROPOSITION 7

Proof. Equation (27) can be expressed in terms of the $\alpha_p(m,u)$ and ${n \brack u}_p$ functions as
\[
\sum_{i=\nu}^{n} {i \brack \nu}_p A_i = (-1)^{\nu} p^{-mk-\sigma_\nu} \sum_{j=0}^{\nu} {n-j \brack n-\nu}_p p^{j(m+n-j)}\, \alpha_p(m-j, \nu-j)\, B_j. \tag{E.1}
\]
We obtain
\[
\begin{aligned}
p^{mk} \sum_{i=0}^{n} {i \brack 1}_p^{\nu} A_i
&= p^{mk} \sum_{l=0}^{\nu} p^{\sigma_l}\, \beta_p(l,l)\, S_p(\nu,l) \sum_{i=l}^{n} {i \brack l}_p A_i &&\text{(E.2)}\\
&= \sum_{l=0}^{\nu} \beta_p(l,l)\, S_p(\nu,l)\, (-1)^l \sum_{j=0}^{l} {n-j \brack n-l}_p p^{j(m+n-j)}\, \alpha_p(m-j, l-j)\, B_j &&\text{(E.3)}\\
&= \sum_{j=0}^{\nu} B_j\, p^{j(m+n-j)} \sum_{l=j}^{\nu} \beta_p(l,l)\, S_p(\nu,l)\, (-1)^l {n-j \brack n-l}_p \alpha_p(m-j, l-j),
\end{aligned}
\]
where (E.2) and (E.3) follow from (31) and (E.1), respectively.
F. PROOF OF LEMMA 9

Proof. We first prove (41):
\[
\begin{aligned}
q^{-mk} \sum_{i=0}^{n} {i \brack \lambda} q^{\nu(n-i)} A_i
&= \frac{q^{-mk}}{\alpha(\lambda,\lambda)} \sum_{i=0}^{n} q^{\nu(n-i)} A_i \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{i(\lambda-l)} &&\text{(F.1)}\\
&= \frac{q^{-mk}}{\alpha(\lambda,\lambda)} \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{n(\lambda-l)} \sum_{i=0}^{n} q^{(\nu-\lambda+l)(n-i)} A_i \\
&= \frac{1}{\alpha(\lambda,\lambda)} \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{n(\lambda-l)}\, T_{0,0,\nu-\lambda+l}(C),
\end{aligned}
\]
where (F.1) follows from $\alpha(i,\lambda) = \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{i(\lambda-l)}$. We now prove (42): since
\[
{i \brack 1}^{\mu} = \left( \frac{1-q^{i}}{1-q} \right)^{\mu} = \frac{1}{(1-q)^{\mu}} \sum_{a=0}^{\mu} \binom{\mu}{a} (-1)^a q^{ia}, \tag{F.2}
\]
we obtain
\[
\begin{aligned}
T_{1,\mu,\nu}(C) &= \frac{q^{-mk}}{(1-q)^{\mu}} \sum_{i=0}^{n} q^{\nu(n-i)} A_i \sum_{a=0}^{\mu} \binom{\mu}{a} (-1)^a q^{ia} \\
&= \frac{q^{-mk}}{(1-q)^{\mu}} \sum_{a=0}^{\mu} \binom{\mu}{a} (-1)^a q^{an} \sum_{i=0}^{n} q^{(\nu-a)(n-i)} A_i \\
&= (1-q)^{-\mu} \sum_{a=0}^{\mu} \binom{\mu}{a} (-1)^a q^{an}\, T_{0,0,\nu-a}(C). &&\text{(F.3)}
\end{aligned}
\]
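The expansion $\alpha(i,\lambda) = \sum_{l=0}^{\lambda} {\lambda \brack l} (-1)^l q^{\sigma_l} q^{i(\lambda-l)}$ invoked for (F.1) is an instance of the $q$-binomial theorem, and can be checked numerically (a Python sketch, with our helper names):

```python
from math import prod

def gauss(n, k, q):
    if k < 0 or k > n:
        return 0
    return prod(q**n - q**t for t in range(k)) // prod(q**k - q**t for t in range(k))

def alpha(m, u, q):
    return prod(q**m - q**t for t in range(u))

def sigma(i):
    return i * (i - 1) // 2

# alpha(i, lam) = sum_l [lam l] (-1)^l q^{sigma_l} q^{i (lam - l)}
for q in (2, 3, 5):
    for i in range(0, 6):
        for lam in range(0, 5):
            expansion = sum(gauss(lam, l, q) * (-1)**l * q**sigma(l) * q**(i * (lam - l))
                            for l in range(lam + 1))
            assert expansion == alpha(i, lam, q)
```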
G. PROOF OF PROPOSITION 8

Proof. From [27, (3.3.6)], we obtain ${n-i \brack \nu} = \frac{1}{\alpha(\nu,\nu)} \sum_{l=0}^{\nu} {\nu \brack l} (-1)^{\nu-l} q^{\sigma_{\nu-l}} q^{l(n-i)}$, and hence
\[
\begin{aligned}
q^{-mk} \sum_{i=0}^{n} {n-i \brack \nu} A_i
&= q^{-mk} \sum_{i=0}^{n} A_i\, \frac{1}{\alpha(\nu,\nu)} \sum_{l=0}^{\nu} {\nu \brack l} (-1)^{\nu-l} q^{\sigma_{\nu-l}} q^{l(n-i)} \\
&= \frac{q^{-mk}}{\alpha(\nu,\nu)} \sum_{l=0}^{\nu} {\nu \brack l} (-1)^{\nu-l} q^{\sigma_{\nu-l}} \sum_{i=0}^{n} q^{l(n-i)} A_i \\
&= \frac{1}{\alpha(\nu,\nu)} \sum_{l=0}^{\nu} {\nu \brack l} (-1)^{\nu-l} q^{\sigma_{\nu-l}}\, T_{0,0,l}(C), &&\text{(G.1)}
\end{aligned}
\]
where (G.1) follows from (40). By Corollary 2, we have, for $\nu < d_R$, $\sum_{l=0}^{\nu} {\nu \brack l} (-1)^{\nu-l} q^{\sigma_{\nu-l}}\, T_{0,0,l}(C) = q^{-m\nu}\, \alpha(n,\nu)$, and we obtain
\[
\begin{aligned}
\sum_{j=0}^{\nu} {\nu \brack j} \alpha(n,j)\, q^{-mj}
&= \sum_{j=0}^{\nu} {\nu \brack j} \sum_{l=0}^{j} {j \brack l} (-1)^{j-l} q^{\sigma_{j-l}}\, T_{0,0,l}(C) \\
&= \sum_{l=0}^{\nu} T_{0,0,l}(C)\, {\nu \brack l} \sum_{j=0}^{\nu} {\nu-l \brack j-l} (-1)^{j-l} q^{\sigma_{j-l}} = T_{0,0,\nu}(C), &&\text{(G.2)}
\end{aligned}
\]
where (G.2) follows from $\sum_{j=0}^{\nu-l} {\nu-l \brack j} (-1)^j q^{\sigma_j} = \delta_{\nu,l}$, which in turn is a special case of [27, (3.3.6)]. This proves (43). Thus, $T_{0,0,\nu}(C)$ is transparent to the code, and (44) can be shown by choosing $C = \mathrm{GF}(q^m)^n$ without loss of generality.
Define $S(\nu,n,m) \stackrel{\mathrm{def}}{=} \sum_{j=0}^{\nu} {\nu \brack j} \alpha(n,j)\, q^{-mj}$. Then $S(\nu,n,m) = S(n,\nu,m)$ since ${\nu \brack j}\alpha(n,j) = {n \brack j}\alpha(\nu,j)$. Also, combining (43) and (44) yields $S(\nu,n,m) = q^{n(\nu-m)}\, S(n,m,\nu)$. Therefore, we obtain $S(\nu,n,m) = q^{\nu(n-m)}\, S(\nu,m,n)$, which proves (45).
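The symmetries of $S(\nu,n,m)$ claimed above can be verified numerically for small parameters (a Python sketch; exact rationals handle the negative powers of $q$):

```python
from fractions import Fraction
from math import prod

q = Fraction(2)

def gauss(n, k):
    if k < 0 or k > n:
        return Fraction(0)
    return (prod(q**n - q**t for t in range(k))
            / prod(q**k - q**t for t in range(k)))

def alpha(m, u):
    return prod(q**m - q**t for t in range(u))

def S(nu, n, m):
    return sum(gauss(nu, j) * alpha(n, j) * q**(-m * j) for j in range(nu + 1))

for nu in range(0, 5):
    for n in range(0, 5):
        for m in range(0, 5):
            assert S(nu, n, m) == S(n, nu, m)                      # symmetry in first two arguments
            assert S(nu, n, m) == q**(n * (nu - m)) * S(n, m, nu)  # (43) combined with (44)
            assert S(nu, n, m) == q**(nu * (n - m)) * S(nu, m, n)  # (45)
```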
ACKNOWLEDGMENTS

This work was supported in part by Thales Communications Inc. and in part by a grant from the Commonwealth of Pennsylvania, Department of Community and Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA). The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Nice, France, June 24–29, 2007.
REFERENCES

[1] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, The Netherlands, 1977.
[2] V. Pless, “Power moment identities on weight distributions in error correcting codes,” Information and Control, vol. 6, no. 2, pp. 147–152, 1963.
[3] L. Hua, “A theorem on matrices over a field and its applications,” Chinese Mathematical Society, vol. 1, no. 2, pp. 109–163, 1951.
[4] P. Delsarte, “Bilinear forms over a finite field, with applications to coding theory,” Journal of Combinatorial Theory A, vol. 25, no. 3, pp. 226–241, 1978.
[5] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[6] P. Lusina, E. M. Gabidulin, and M. Bossert, “Maximum rank distance codes as space-time codes,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2757–2760, 2003.
[7] E. M. Gabidulin, A. V. Paramonov, and O. V. Tretjakov, “Ideals over a noncommutative ring and their application in cryptology,” in Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques (EUROCRYPT ’91), vol. 547 of Lecture Notes in Computer Science, pp. 482–489, Brighton, UK, April 1991.
[8] E. M. Gabidulin, “Optimal codes correcting lattice-pattern errors,” Problems of Information Transmission, vol. 21, no. 2, pp. 3–11, 1985.
[9] R. M. Roth, “Maximum-rank array codes and their application to crisscross error correction,” IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 328–336, 1991.
[10] E. M. Gabidulin, “Theory of codes with maximum rank distance,” Problems of Information Transmission, vol. 21, no. 1, pp. 1–12, 1985.
[11] K. Chen, “On the nonexistence of perfect codes with rank distance,” Mathematische Nachrichten, vol. 182, no. 1, pp. 89–98, 1996.
[12] R. M. Roth, “Probabilistic crisscross error correction,” IEEE Transactions on Information Theory, vol. 43, no. 5, pp. 1425–1438, 1997.
[13] W. B. Vasantha and N. Suresh Babu, “On the covering radius of rank-distance codes,” Ganita Sandesh, vol. 13, pp. 43–48, 1999.
[14] T. P. Berger, “Isometries for rank distance and permutation group of Gabidulin codes,” IEEE Transactions on Information Theory, vol. 49, no. 11, pp. 3016–3019, 2003.
[15] E. M. Gabidulin and P. Loidreau, “On subcodes of codes in rank metric,” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’05), pp. 121–123, Adelaide, Australia, September 2005.
[16] A. Kshevetskiy and E. M. Gabidulin, “The new construction of rank codes,” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’05), pp. 2105–2108, Adelaide, Australia, September 2005.
[17] M. Gadouleau and Z. Yan, “Properties of codes with the rank metric,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’06), pp. 1–5, San Francisco, Calif, USA, November 2006.
[18] M. Gadouleau and Z. Yan, “Decoder error probability of MRD codes,” in Proceedings of IEEE Information Theory Workshop (ITW ’06), pp. 264–268, Chengdu, China, October 2006.
[19] M. Gadouleau and Z. Yan, “On the decoder error probability of bounded rank-distance decoders for maximum rank distance codes,” IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 3202–3206, 2008.
[20] P. Loidreau, “Properties of codes in rank metric,” http://arxiv.org/pdf/cs.DM/0610057/.
[21] M. Schwartz and T. Etzion, “Two-dimensional cluster-correcting codes,” IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 2121–2132, 2005.
[22] G. Richter and S. Plass, “Fast decoding of rank-codes with rank errors and column erasures,” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’04), p. 398, Chicago, Ill, USA, June–July 2004.
[23] P. Loidreau, “A Welch-Berlekamp like algorithm for decoding Gabidulin codes,” in Proceedings of the 4th International Workshop on Coding and Cryptography (WCC ’05), vol. 3969, pp. 36–45, Bergen, Norway, March 2005.
[24] D. Grant and M. Varanasi, “Weight enumerators and a MacWilliams-type identity for space-time rank codes over finite fields,” in Proceedings of the 43rd Allerton Conference on Communication, Control, and Computing, pp. 2137–2146, Monticello, Ill, USA, October 2005.
[25] D. Grant and M. Varanasi, “Duality theory for space-time codes over finite fields,” to appear in Advances in Mathematics of Communications.
[26] P. Loidreau, “Étude et optimisation de cryptosystèmes à clé publique fondés sur la théorie des codes correcteurs,” Ph.D. dissertation, École Polytechnique, Paris, France, May 2001.
[27] G. E. Andrews, The Theory of Partitions, vol. 2 of Encyclopedia of Mathematics and Its Applications, Addison-Wesley, Reading, Mass, USA, 1976.
[28] G. Gasper and M. Rahman, Basic Hypergeometric Series, vol. 96 of Encyclopedia of Mathematics and Its Applications, Cambridge University Press, New York, NY, USA, 2nd edition, 2004.
[29] P. Delsarte, “Properties and applications of the recurrence $F(i+1, k+1, n+1) = q^{k+1} F(i, k+1, n) - q^{k} F(i, k, n)$,” SIAM Journal on Applied Mathematics, vol. 31, no. 2, pp. 262–270, 1976.
[30] L. Carlitz, “q-Bernoulli numbers and polynomials,” Duke Mathematical Journal, vol. 15, no. 4, pp. 987–1000, 1948.
Contents

Advances in Error Control Coding Techniques, Yonghui Li, Jinhong Yuan, Andrej Stefanov, and Branka Vucetic
Volume 2008, Article ID 574783, 3 pages

Structured LDPC Codes over Integer Residue Rings, Elisa Mo and Marc A. Armand
Volume 2008, Article ID 598401, 9 pages

Differentially Encoded LDPC Codes—Part I: Special Case of Product Accumulate Codes, Jing Li (Tiffany)
Volume 2008

Differentially Encoded LDPC Codes—Part II: General Case and Code Optimization, Jing Li (Tiffany)
Volume 2008

Construction and Iterative Decoding of LDPC Codes over Rings for Phase-Noisy Channels, Sridhar Karuppasami and William G. Cowley
Volume 2008

New Technique for Improving Performance of LDPC Codes in the Presence of Trapping Sets, Esa Alghonaim, Aiman El-Maleh, and Mohamed Adnan Landolsi
Volume 2008

Distributed Generalized Low-Density Codes for Multiple Relay Cooperative Communications, Changcai Han and Weiling Wu
Volume 2008

Reed-Solomon Turbo Product Codes for Optical Communications: From Code Optimization to Decoder Design, Raphaël Le Bidan, Camille Leroux, Christophe Jego, Patrick Adde, and Ramesh Pyndiah
Volume 2008

Complexity Analysis of Reed-Solomon Decoding over GF(2^m) without Using Syndromes, Ning Chen and Zhiyuan Yan
Volume 2008

Efficient Decoding of Turbo Codes with Nonbinary Belief Propagation, Charly Poulliat, David Declercq, and Thierry Lestable
Volume 2008

Space-Time Convolutional Codes over Finite Fields and Rings for Systems with Large Diversity Order, Mario de Noronha-Neto and B. F. Uchôa-Filho
Volume 2008

Average Throughput with Linear Network Coding over Finite Fields: The Combination Network Case, Ali Al-Bashabsheh and Abbas Yongacoglu
Volume 2008, Article ID 329727, 7 pages

Joint Decoding of Concatenated VLEC and STTC System, Huijun Chen and Lei Cao
Volume 2008

MacWilliams Identity for Codes with the Rank Metric, Maximilien Gadouleau and Zhiyuan Yan
Volume 2008, Article ID 754021, 13 pages
3 pages doi:10. 6 Metrotech Center. and can match an outer LDPC code to any given inner code with the imperfectness of the inner decoder taken into consideration. The convergenceconstraint method provides a useful extension to the conventional “thresholdconstraint” method. The eﬀect of a trapping set can be eliminated by setting its variable nodes intrinsic and extrinsic values to zero. The next twopart series of papers “Diﬀerentially encoded LDPC codes—Part I: special case of product accumulate codes” and “Diﬀerentially encoded LDPC codes— Part II: general case and code optimization. where M is the number of phase symmetries in the signal set and estimates phase ambiguities in each observation interval. Tiﬀany Li. The new code applies blind or turbo estimators to provide signal phase estimates over each observation interval. The codes are constructed based on regular Tanner graphs by using Latin squares over a multiplicative group of a Galois ring. have been developed.1 Jinhong Yuan. Polytechnic University.usyd. a characterization of the type of LDPC degree proﬁles is provided.edu. and reproduction in any medium.3 and Branka Vucetic1 1 School of Electrical and Information Engineering. and ad hoc networks. NY 11201. One feature of this type of codes is that their minimum pseudocodeword weights are equal to their minimum Hamming distances.” by Karuppasami and Cowley. many challenging problems still remain. The analysis reveals that a conventional LDPC code is not ﬁtful for diﬀerential coding. provided the original work is properly cited. In the past decade. The special issue has received twenty six submissions. a signiﬁcant progresshas been reported in the ﬁeld of error control coding. The University of Sydney. It is resilient to phase rotations of 2π/M. USA 2 School Correspondence should be addressed to Yonghui Li. NSW 2052. Accepted 9 September 2008 Copyright © 2008 Yonghui Li et al. rather than a ﬁnite ﬁeld. 
such as network coding and distributed coding techniques. Furthermore.Hindawi Publishing Corporation EURASIP Journal on Wireless Communications and Networking Volume 2008. and among them. lyh@ee. the innovation ofturbo codes and rediscovery of LDPC codes have been recognized as two signiﬁcant breakthroughs in this ﬁeld. In particular. study the theory and practice of diﬀerentially encoded lowdensity paritycheck (DELDPC) codes in the context of noncoherent detection. new coding concepts. Brooklyn. Part I studies a special class of DELDPC codes. the invention of space time coding signiﬁcantly increased the capacity of wireless systems and these codes have been widely applied in broadband communication systems. which permits unrestricted use. This special issue is intended to present the stateoftheart results in the theory and applications of coding techniques. The distinct features of these capacity approaching codes have enabled them to be widely proposed and/or adopted in existing wireless standards. Through extrinsic information transfer (EXIT) analysis and a modiﬁed “convergence constraint” density evolution (DE) method.au Received 9 September 2008. where the LDPC part may take arbitrarydegree proﬁles. The more general case of DELDPC codes. The University of New South Wales. distribution. After a .1155/2008/574783 Editorial Advances in Error Control Coding Techniques Yonghui Li. “Construction and iterative decoding of LDPC codes over rings for phasenoisy channels. thirteen papers have been ﬁnally selected after a rigorous review process. Sydney. Australia 3 Department of Electrical and Computer Engineering. product accumulate codes. In the ﬁrst paper. Australia of Electrical Engineering and Telecommunications.2 Andrej Stefanov. 
A novel approach for enhancing decoder performance in presence of trapping sets by introducing a new concept called trapping set neutralization is proposed in the ﬁfth paper “New technique for improving performance of LDPC codes in the presence of trapping sets” by E. They reﬂect recent advances in the area of error control coding. NSW 2006. exploiting the distributed nature of networks.” by J.” Mo and Armand designed a new class of lowdensity paritycheck (LDPC) codes over integer residue rings. sensor. a design and decoding method for LDPC codes for channels with phase noise is proposed. Article ID 574783. Despite recent advances. is studied in Part II. and does not in general deliver a desirable performance when detected noncoherently. The proposed approach is suitable for the design of codes with a wide range of rates. Recently. This is an open access article distributed under the Creative Commons Attribution License. Alghonaim et al. In the fourth paper. “Structured LDPC codes over integer residue rings. They have great potential applications in wireless. Sydney.
the rank weight distribution of any linear code can be expressed as a functional transformation of that of its dual code. a new approach of decoding turbo codes by a nonbinary belief propagation algorithm is proposed. In the twelfth paper. In the eighth paper “Complexity analysis of ReedSolomon decoding over GF(2m ) without using syndromes. Since the early 1990s.” AlBashabsheh and Yongacoglu extend the average coding throughput measure to include linear coding over arbitrary ﬁnite ﬁelds. “Spacetime convolutional codes over ﬁnite ﬁelds and rings for systems with large diversity order” by UchoaFilho and NoronhaNeto.” Chen and Yan investigated the complexity of a type of syndromeless decoding for RS codes. the third generation FEC codes with softdecision decoding are attractive to reduce costs by relaxing the requirements on expensive optical devices in highcapacity systems. Design of eﬃcient distributed coding schemes for cooperative communications networks has recently attracted signiﬁcant attention. Recently. as well as coding and diversity gains. and provide a network code. for which some multiplicative FFT techniques are not applicable. This preprocessing introduces an additional diversity. a simple algorithm is introduced to store trapping sets conﬁguration information in variable and check nodes. Their ﬁndings show that for highrate RS codes. and a rate allocation between STTC and VLEC. The issues of code design and novel ultrahighspeed parallel decoding architecture are developed. In this regard. It can work eﬀectively and keep a ﬁxed overall code rate when the number of relay nodes varies. a progressive introduction of inline optical ampliﬁers and an advent of wavelength division multiplexing (WDM) accelerated the use of FEC in optical ﬁber communications to reduce the system costs and improve margins against various line impairments. and the trapping set is neutralized. 
so that many convolutional codes can be discarded in the code search without loosing anything. The tenth paper. It is shown that similar to the MacWilliams identity for the Hamming metric. the asymmetric trellis structure of VLEC. information exchange between bit and symbol domains. the seventh paper “ReedSolomon turbo product codes for optical communications: from code optimization to decoder design” by Bidan et al. and compared it to that of syndromebased decoding algorithms. Simulation results verify that distributed GLD codes with various number of relay nodes can obtain signiﬁcant performance gains in quasistatic fading channels compared with the strategy without cooperation. each relay node decodes and forwards some of the constituent codes of the GLD code to cooperatively form a distributed GLD code. when compared to syndromebased decoding algorithms. which combines variable length error correcting codes (VLECs) and space time trellis codes (STTCs) to provide bandwidth eﬃcient data compression. the estimated values of variable nodes are aﬀected only by external messages from nodes outside the trapping set. Under this structure. syndromeless decoding algorithms require more ﬁeld oper . A distributed generalized lowdensity (GLD) coding scheme for multiple relay cooperative communications is developed by Han and Wu in the sixth paper “Distributed generalized lowdensity codes for multiple relay cooperative communications. the paper has proved three interesting properties related to the generator matrix of the convolutional code that can be used to simplify the code search procedure for STCCs over the ﬁnite ring of integers. The complexity analysis in their paper mainly focuses on RS codes over characteristic2 ﬁelds. To be able to neutralize identiﬁed trapping sets. “Average throughput with linear network coding over ﬁnite ﬁelds: the combination network case. have been investigated. channel crosstalk. various issues. In the eleventh paper.. 
which is completely speciﬁed by the ﬁeld size and achieves the average coding throughput for the combination network. In contrast to the ﬁrst and second generations of FEC codes for optical communications. there has been renewed interest in decoding ReedSolomon (RS) codes without using syndromes. The MacWilliams identity and related identities for linear codes with the rank metric are derived in thethirteenth paper “MacWilliams identity for codes with the rank metric” by Gadouleau and Yan. an iterative joint source and space time decoding algorithm is developed to utilize redundancy in both STTC and VLEC to improve overall decoding performance. The approach consists in representing groups of turbo code binary symbols by a nonbinary Tanner graph and applying a group belief iterative decoding. The complexity and performance tradeoﬀ of the scheme is also carefully addressed in this paper.” Chen and Cao proposed a joint sourcechannel coding scheme for wireless fading channels.2 EURASIP Journal on Wireless Communications and Networking ations and have higher hardware costs and lower throughput. such as the inseparable systematic information in the symbol level. propose a convolutional encoder over the ﬁnite ring of integers to generate a spacetime convolutional code (STCC). “Joint decoding of concatenated VLEC and STTC system. The properties establish equivalences among STCCs. which are based on ReedSolomon (RS) codes and the concatenated codes with harddecision decoding.” By using partial error detecting and error correcting capabilities of the GLD code. In the ninth paper “Eﬃcient decoding of turbo codes with nonbinary belief propagation” by Poulliat et al. investigates the use of turboproduct codes with ReedSolomon codes as the components for 40 Gb/s over optical transport networks and 10 Gb/s over passive optical networks. such as beam noise. 
They also derived tighter bounds on the complexities of fast polynomial multiplications based on Cantor’s approach and the fast extended Euclidean algorithm. and nonlinear dispersion. At the receiver. The parity check matrices of turbo codes need to be preprocessed to ensure the code good topological properties. Most harmful trapping sets are identiﬁed by means of simulation. Providing highquality multimedia service has become an attractive application in wireless communication systems. which is exploited to improve the decoding performance. Furthermore. In their paper. They characterize the average linear network coding throughput for the combination network with mincut 2 over an arbitrary ﬁnite ﬁeld. the partial decoding at relays is allowed and a progressive processing procedure is proposed to reduce the complexity and adapt to the sourcerelay channel variations.
Yonghui Li Jinhong Yuan Andrej Stefanov Branka Vucetic 3 . rank weight enumerator of the dual of any vector depends only on the rank weight of the vector and is related to the rank weight enumerator of a maximum rank distance code.Yonghui Li et al.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 598401, 9 pages
doi:10.1155/2008/598401

Research Article
Structured LDPC Codes over Integer Residue Rings

Elisa Mo and Marc A. Armand

Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576

Correspondence should be addressed to Marc A. Armand, eleama@nus.edu.sg

Received 31 October 2007; Revised 31 March 2008; Accepted 3 June 2008

Recommended by Jinhong Yuan

This paper presents a new class of low-density parity-check (LDPC) codes over Z_{2^a} represented by regular, structured Tanner graphs. These graphs are constructed using Latin squares defined over a multiplicative group of a Galois ring, rather than a finite field. Our approach yields codes for a wide range of code rates and, more importantly, codes whose minimum pseudocodeword weights equal their minimum Hamming distances. Simulation studies show that these structured codes, when transmitted using matched signal sets over an additive-white-Gaussian-noise channel, can outperform their random counterparts of similar length and rate.

Copyright © 2008 E. Mo and M. A. Armand. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The study of nonbinary LDPC codes over GF(q) was initiated by Davey and MacKay [1]. Structured LDPC codes are favored over their random counterparts due to the reduction in storage space for the parity-check matrix and the ease in performance analysis they provide, while achieving relatively similar performance. Structured nonbinary LDPC codes that have been proposed thus far, however, which include the family of finite geometry (FG) codes [5] and balanced incomplete block design (BIBD) codes [6], are constructed over finite fields. The symbols of a nonbinary code over a finite field cannot be matched to any signal constellation; in other words, it is not possible to obtain a geometrically uniform code (wherein every codeword has the same error probability) from a nonbinary, finite field code. The subject of geometrically uniform codes has been well studied by various authors, including Slepian [2] and Forney Jr. [3]. More recently, Sridhara and Fuja [4] introduced geometrically uniform, nonbinary LDPC codes over certain rings, including integer residue rings. Their codes are, however, unstructured. This paper therefore addresses the problem of designing structured, nonbinary LDPC codes over integer residue rings.

Studies of the so-called pseudocodewords arising from finite covers of a Tanner graph [11-13] have revealed that while a code's performance under maximum-likelihood (ML) decoding is dictated by its (Hamming) weight distribution, its performance under iterative decoding is dictated by the weight distribution of the pseudocodewords associated with its Tanner graph. In particular, the presence of pseudocodewords of low weight, particularly those of weight less than the minimum Hamming distance of the code, is detrimental to a code's performance under iterative decoding.

Motivated by the fact that short nonbinary LDPC codes can outperform their binary counterparts [8-10], we focus our investigations on codes of short codelength. We adopt the Latin-squares-based approach of Kelley et al. [14] to construct structured codes, as their method aims at maximizing the minimum pseudocodeword weight of a code. While we maintain the pseudocodeword framework used there, our work nevertheless differs from [14] primarily because our construction relies on an extension of the notion of Latin squares to multiplicative groups of a Galois ring; this extension is a key contribution of this paper. We note that codes based on Latin squares were also studied in [7, 18, 19]. However, the authors of these works did not do so in the pseudocodeword framework. Codes constructed using other combinatorial approaches, such as those presented in [6, 15-17], were similarly not investigated using the notion of pseudocodewords.
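The matched mapping from Z_{2^a} to 2^a-PSK that motivates working over integer residue rings can be illustrated numerically. The following is a minimal Python sketch (the normalization E_s = 1 and the choice a = 3 are assumptions for illustration); it checks the distance-invariance property d_E^2(s_x, s_y) = d_E^2(s_{x-y}, s_0) that underlies geometric uniformity:

```python
import cmath
import math

a = 3                     # work over Z_8, matched to 8-PSK (illustrative choice)
M = 2 ** a
Es = 1.0                  # per-symbol energy, normalized for illustration

def psk(x):
    """Natural mapping of x in Z_{2^a} to the point sqrt(Es)*exp(j*2*pi*x/2^a)."""
    return math.sqrt(Es) * cmath.exp(2j * math.pi * x / M)

def d2(u, v):
    """Squared Euclidean distance between two signal points."""
    return abs(u - v) ** 2

# The Euclidean distance depends only on the ring difference x - y,
# so the Hamming structure over Z_{2^a} carries over to the signal space.
for x in range(M):
    for y in range(M):
        assert abs(d2(psk(x), psk(y)) - d2(psk((x - y) % M), psk(0))) < 1e-12
```

No such invariance holds when the symbols of a finite-field code of size greater than 2 are assigned to PSK points, which is the obstruction to geometric uniformity noted above.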
These related works focused instead on the optimization of design parameters such as girth, diameter, and stopping sets. Our work therefore differs from these earlier studies in this regard.

The remainder of this paper is organized as follows. In the next section, we provide an overview of codes over Z_{2^a} and their natural mapping to a matched signal constellation. Section 3 introduces the notion of Latin squares over finite fields, followed by our extension of Latin squares to multiplicative groups of a Galois ring. A method to construct Tanner graphs using Latin squares (over a multiplicative group of a Galois ring) is presented in Section 4. We show that from these graphs, a wide range of code rates may be obtained. We further derive in the same section certain properties of the corresponding codes and, in particular, show that their minimum pseudocodeword weights equal their minimum Hamming distances; this is one of our main results. Section 5 presents computer simulations which demonstrate that our codes, when mapped to matched signal sets and transmitted over the additive-white-Gaussian-noise (AWGN) channel, can outperform their random counterparts of similar length and rate. Finally, Section 6 concludes the paper.

2. CODES OVER Z_{2^a}

The 2^a-PSK signal set contains 2^a points that are equidistant from the origin while maximally spread apart on a two-dimensional space. Projecting one dimension on the real axis and the other on the imaginary axis, a symbol x in Z_{2^a} is mapped to the signal point s_x = sqrt(E_s) exp(j 2 pi x / 2^a), where E_s is the energy assigned to each symbol [4]. The 2^a-PSK signal set is matched to Z_{2^a} because, for any x, y in Z_{2^a},

    d_E^2(s_x, s_y) = d_E^2(s_{x-y}, s_0),                                   (1)

where d_E^2(s_x, s_y) denotes the squared Euclidean distance between s_x and s_y [21]. For practical reasons, we only consider linear codes over Z_{2^a}.

2.1. An overview

Let C be a Z_{2^a}-submodule of the free Z_{2^a}-module Z_{2^a}^n. Its n_G x n generator matrix G can be expressed in the form [20]

    G = [ 2^{lambda_1} g_1 ; 2^{lambda_2} g_2 ; ... ; 2^{lambda_{n_G}} g_{n_G} ],   (2)

where 0 <= lambda_i <= a - 1 for i = 1, ..., n_G and {g_1, g_2, ..., g_{n_G}} is a set of linearly independent elements of Z_{2^a}^n. C is a free Z_{2^a}-submodule if lambda_i = 0 for i = 1, ..., n_G. The rate of C is

    r = (1/n) * sum_{i=1}^{n_G} (a - lambda_i)/a = n_G/n - sum_{i=1}^{n_G} lambda_i/(a n).   (3)

The dual code of C is generated by the n_H x n parity-check matrix H of C, which can be expressed in the form

    H = [ 2^{mu_1} h_1 ; 2^{mu_2} h_2 ; ... ; 2^{mu_{n_H}} h_{n_H} ],        (4)

where 0 <= mu_i <= a - 1 for i = 1, ..., n_H and {h_1, h_2, ..., h_{n_H}} is a set of linearly independent elements of Z_{2^a}^n. The rate of C can also be obtained as

    r = 1 - (1/n) * sum_{i=1}^{n_H} (a - mu_i)/a = 1 - n_H/n + sum_{i=1}^{n_H} mu_i/(a n).   (5)

If G (or H) is not already in the form in (2) (or in (4)), one could perform Gaussian elimination, without dividing a row by a zero divisor, to obtain the n_G (or n_H) linearly independent rows.

2.2. The matched signal set

Let c_x, c_y in C, where c_x = [x_1, x_2, ..., x_n] and c_y = [y_1, y_2, ..., y_n]. They are mapped symbol by symbol to the signal vectors [s_{x_1}, s_{x_2}, ..., s_{x_n}] and [s_{y_1}, s_{y_2}, ..., s_{y_n}], respectively. The squared Euclidean distance between these two signal vectors is

    sum_{i=1}^{n} d_E^2(s_{x_i}, s_{y_i}) = sum_{i=1}^{n} d_E^2(s_{x_i - y_i}, s_0).   (6)

Remark 1. Observe that the Hamming distance between two codewords is mapped proportionally to the Euclidean distance between their corresponding signal vectors.

3. LATIN SQUARES

3.1. Definition and application to Galois fields

The following definition and example are taken from [22, Chapter 17].

Definition 1. A Latin square of order q is denoted as (R, C, S, L), where R, C, and S are sets of cardinality q and L is a mapping L(i, j) = k, with i in R, j in C, and k in S, such that given any two of i, j, and k, the third is unique. A Latin square can be written as a q x q array for which the cell in row i and column j contains the symbol L(i, j). Two Latin squares with mapping functions L and L' are orthogonal if the pair (L(i, j), L'(i, j)) is unique for each pair (i, j).

The notion of Latin squares can be easily applied to Galois fields by setting R = C = S = GF(p^s), where p is prime, with the mapping functions L_beta(i, j) = i + beta*j for beta in GF(p^s) \ {0}. In this way, a complete family of q - 1 mutually orthogonal Latin squares (MOLS) is obtained for q = p^s.
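The complete family of q - 1 MOLS over GF(p^s) just described can be verified programmatically. A small Python sketch for GF(4), with elements encoded as integers 0-3, addition as XOR, and multiplication reduced modulo the irreducible polynomial y^2 + y + 1 (the integer encoding is an implementation choice, not from the paper):

```python
import itertools

q = 4  # GF(2^2)

def gf4_mul(u, v):
    """Carry-less multiply of GF(4) elements, reduced mod y^2 + y + 1 (0b111)."""
    p = 0
    for k in range(2):
        if (v >> k) & 1:
            p ^= u << k
    for k in (3, 2):
        if (p >> k) & 1:
            p ^= 0b111 << (k - 2)
    return p

def L(beta, i, j):
    """Field mapping L_beta(i, j) = i + beta*j over GF(4)."""
    return i ^ gf4_mul(beta, j)

squares = {b: [[L(b, i, j) for j in range(q)] for i in range(q)]
           for b in range(1, q)}

for sq in squares.values():                      # each array is a Latin square
    for row in sq:
        assert sorted(row) == list(range(q))
    for col in zip(*sq):
        assert sorted(col) == list(range(q))

for b0, b1 in itertools.combinations(squares, 2):  # mutually orthogonal
    pairs = {(squares[b0][i][j], squares[b1][i][j])
             for i in range(q) for j in range(q)}
    assert len(pairs) == q * q
```

The orthogonality check exploits Definition 1 directly: superimposing two of the squares must produce all q^2 distinct ordered pairs.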
Example 1. Let R = C = S = GF(2^2) = {0, 1, α, α^2}. The mapping functions L_1(i, j) = i + j, L_α(i, j) = i + αj, and L_{α^2}(i, j) = i + α^2 j yield a complete family of three MOLS,

    M_1     = [ 0 1 α α^2 ; 1 0 α^2 α ; α α^2 0 1 ; α^2 α 1 0 ],
    M_α     = [ 0 α α^2 1 ; 1 α^2 α 0 ; α 0 1 α^2 ; α^2 1 0 α ],
    M_{α^2} = [ 0 α^2 1 α ; 1 α 0 α^2 ; α 1 α^2 0 ; α^2 0 α 1 ],

respectively.

3.2. Extension to multiplicative groups of a Galois ring

Extending the notion of Latin squares to integer residue rings is not trivial.

Example 2. Let R = C = S = Z_{2^2} = {0, 1, 2, 3} and let the mapping functions be L_1(i, j) = i + j, L_2(i, j) = i + 2j, and L_3(i, j) = i + 3j. The matrices

    M_1 = [ 0 1 2 3 ; 1 2 3 0 ; 2 3 0 1 ; 3 0 1 2 ],
    M_2 = [ 0 2 0 2 ; 1 3 1 3 ; 2 0 2 0 ; 3 1 3 1 ],
    M_3 = [ 0 3 2 1 ; 1 0 3 2 ; 2 1 0 3 ; 3 2 1 0 ]

are obtained. Since the elements in each row of M_2 are not unique, M_2 is not a Latin square. Thus, setting R = C = S = Z_{2^s} with mapping functions L_beta(i, j) = i + beta*j for beta in Z_{2^s} \ {0} does not yield a complete family of 2^s - 1 MOLS.

To overcome this problem, we propose an alternative way of constructing Latin squares over integer residue rings. Consider the extension ring R = GR(2^a, s) = Z_{2^a}[y]/<φ(y)>, where φ(y) is a degree-s basic irreducible polynomial over Z_{2^a}. Embedded in R is a multiplicative group G_{2^s-1} of units of order 2^s - 1. We let R = C = S = G_{2^s-1} ∪ {0} ⊂ GR(2^a, s), so that |R| = |C| = |S| = 2^s.

Example 3. Let R = GR(2^2, 2) = Z_4[y]/<y^2 + y + 3>. Embedded in R is G_3 = {1, α, α^2} = {1, y + 2, 3y + 1}, generated by α = y + 2. Let R = C = S = G_3 ∪ {0} and let the mapping functions be L_1(i, j) = i + j, L_α(i, j) = i + αj, and L_{α^2}(i, j) = i + α^2 j. The matrices

    M_1     = [ 0 1 y+2 3y+1 ; 1 2 y+3 3y+2 ; y+2 y+3 2y 3 ; 3y+1 3y+2 3 2y+2 ],
    M_α     = [ 0 y+2 3y+1 1 ; 1 y+3 3y+2 2 ; y+2 2y 3 y+3 ; 3y+1 3 2y+2 3y+2 ],
    M_{α^2} = [ 0 3y+1 1 y+2 ; 1 3y+2 2 y+3 ; y+2 3 y+3 2y ; 3y+1 2y+2 3y+2 3 ]

are obtained. Since G_3 ∪ {0} is not closed under R-addition, entries such as 2, y + 3, and 2y fall outside S; hence all three matrices are not Latin squares over S.

Since G_{2^s-1} ∪ {0} is not closed under R-addition, the mapping functions have to be altered slightly such that they map i in R and j in C uniquely to L_beta(i, j) in S. This motivates the following definition.

Definition 2. For i, j in G_{2^s-1} ∪ {0} and beta in G_{2^s-1},

    L_beta^{(a)}(i, j) = ( (i)^{1/2^{a-1}} + (beta*j)^{1/2^{a-1}} )^{2^{a-1}}.

Theorem 1. L_beta^{(a)}(i, j) in G_{2^s-1} ∪ {0}.

Proof. It is apparent that (i)^{1/2^{a-1}}, (beta*j)^{1/2^{a-1}} in G_{2^s-1} ∪ {0}. Write (i)^{1/2^{a-1}} + (beta*j)^{1/2^{a-1}} = u + 2v, where u in G_{2^s-1} ∪ {0} and v in R. Using the binomial expansion (with C(n, x) denoting the binomial coefficient),

    L_beta^{(a)}(i, j) = (u + 2v)^{2^{a-1}} = sum_{x=0}^{2^{a-1}} C(2^{a-1}, x) u^{2^{a-1}-x} (2v)^x  mod 2^a.

Observe that C(2^{a-1}, x) u^{2^{a-1}-x} (2v)^x = 0 mod 2^a for x = 1, ..., 2^{a-1}. Therefore, L_beta^{(a)}(i, j) = u^{2^{a-1}} in G_{2^s-1} ∪ {0}.

Theorem 2. Given any two of i, j, and L_beta^{(a)}(i, j), the third is unique; that is, each (R, C, S, L_beta^{(a)}) is a Latin square. Furthermore, two Latin squares constructed by L_{beta_0}^{(a)} and L_{beta_1}^{(a)}, where beta_0, beta_1 in G_{2^s-1} and beta_0 != beta_1, are orthogonal. Hence a complete family {(R, C, S, L_beta^{(a)}) : beta in G_{2^s-1}} of MOLS is obtained.
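Definition 2 and Theorems 1 and 2 can be checked numerically for the ring GR(4, 2) = Z_4[y]/<y^2 + y + 3> used in the examples. A Python sketch, with elements represented as coefficient pairs (c_0, c_1) for c_0 + c_1*y (an implementation choice); the square root on G_3 ∪ {0} is computed as x -> x^2, since squaring is a bijection on this set of odd multiplicative order:

```python
import itertools

MOD = 4  # base ring Z_4

def add(u, v):
    return ((u[0] + v[0]) % MOD, (u[1] + v[1]) % MOD)

def mul(u, v):
    # In GR(4, 2) = Z4[y]/<y^2 + y + 3>, reduce with y^2 = 3y + 1 (mod 4).
    a0, a1 = u
    b0, b1 = v
    return ((a0 * b0 + a1 * b1) % MOD,
            (a0 * b1 + a1 * b0 + 3 * a1 * b1) % MOD)

alpha = (2, 1)                              # alpha = y + 2 generates G_3
G = [(1, 0), alpha, mul(alpha, alpha)]      # {1, y + 2, 3y + 1}
assert mul(G[1], G[2]) == (1, 0)            # alpha^3 = 1
S = [(0, 0)] + G                            # index set G_3 union {0}

def sqrt_t(x):
    # On G_3 union {0}, (x^2)^2 = x^4 = x, so x^2 is the square root of x.
    return mul(x, x)

def L(beta, i, j):
    # Definition 2 with a = 2: L_beta(i, j) = (sqrt(i) + sqrt(beta*j))^2.
    t = add(sqrt_t(i), sqrt_t(mul(beta, j)))
    return mul(t, t)

squares = {b: [[L(b, i, j) for j in S] for i in S] for b in G}
for sq in squares.values():                 # Latin property (Theorems 1 and 2)
    for row in sq:
        assert sorted(row) == sorted(S)
    for col in zip(*sq):
        assert sorted(col) == sorted(S)
for b0, b1 in itertools.combinations(G, 2):  # mutual orthogonality
    pairs = {(squares[b0][i][j], squares[b1][i][j])
             for i in range(4) for j in range(4)}
    assert len(pairs) == 16
```

Reducing every entry mod 2 recovers the GF(4) MOLS of the field construction, consistent with the reduction property discussed next.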
Remark 2. For a' < a, let z-bar denote z mod 2^{a'}, and extend this notation to n-tuples and matrices over R. Observe that

    C(2^{a-1}, x) mod 2^{a'} = C(2^{a'-1}, y)  if x = y * 2^{a-a'} for an integer y,  and 0 otherwise.

Thus,

    L_beta^{(a)}(i, j) mod 2^{a'} = sum_{y=0}^{2^{a'-1}} C(2^{a'-1}, y) ((i-bar)^{1/2^{a'-1}})^{2^{a'-1}-y} ((beta-bar j-bar)^{1/2^{a'-1}})^{y} = L_{beta-bar}^{(a')}(i-bar, j-bar).

In particular, when a = 1, the mapping function L_beta^{(1)}(i, j) = i + beta*j coincides with the mapping function applied to Galois fields.

Example 4. Let R = C = S = G_3 ∪ {0} ⊂ GR(2^2, 2), and let the mapping functions be L_1^{(2)}(i, j) = ((i)^{1/2} + (j)^{1/2})^2, L_α^{(2)}(i, j) = ((i)^{1/2} + (αj)^{1/2})^2, and L_{α^2}^{(2)}(i, j) = ((i)^{1/2} + (α^2 j)^{1/2})^2. The resultant MOLS are

    M_1     = [ 0 1 α α^2 ; 1 0 α^2 α ; α α^2 0 1 ; α^2 α 1 0 ],
    M_α     = [ 0 α α^2 1 ; 1 α^2 α 0 ; α 0 1 α^2 ; α^2 1 0 α ],
    M_{α^2} = [ 0 α^2 1 α ; 1 α 0 α^2 ; α 1 α^2 0 ; α^2 0 α 1 ],

respectively; a complete family of three MOLS is obtained. In addition, the mapping function L_0(i, j) = i yields the matrix

    M_0 = [ 0 0 0 0 ; 1 1 1 1 ; α α α α ; α^2 α^2 α^2 α^2 ],

which is orthogonal to each Latin square in the complete family of MOLS.

4. STRUCTURED LDPC CODES OVER Z_{2^a}

4.1. Construction of graphs using Latin squares

The construction method proposed in [14, Section IV-A] can be generalized to construct graphs for different values of a and s by altering the mapping functions according to the value of a. The procedure is stated here for easy reference by Theorem 3 that follows.

The graph is a tree with three layers, enumerated from its root; the root is a variable node. The first layer has 2^s + 1 check nodes, the second layer has 2^s(2^s + 1) variable nodes, and the third layer has 2^{2s} check nodes. Thus there are 2^{2s} + 2^s + 1 variable nodes and the same number of check nodes. The connectivity of the nodes is executed in the following steps.

(1) The root variable node is connected to each of the check nodes in the first layer.
(2) Each check node in the first layer is connected to 2^s consecutive variable nodes in the second layer.
(3) Each of the first 2^s variable nodes in the second layer is connected to 2^s consecutive check nodes in the third layer. Label the remaining variable nodes in the second layer (beta, i) and the check nodes in the third layer (j, k), where beta, i, j, k in G_{2^s-1} ∪ {0}.
(4) For each i and j, variable node (beta, i) is connected to check node (j, L_beta^{(a)}(i, j)) for beta in G_{2^s-1}, while variable node (0, i) is connected to check node (j, i), corresponding to the mapping L_0(i, j) = i. The tree is completed once all possible combinations of (i, beta) are exhausted.

Figure 1: Portion of the parity-check matrix constructed in each step.
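For a = 1 and s = 2, the four steps yield the 21 x 21 parity-check matrix of a binary code of length 2^{2s} + 2^s + 1 = 21. The Python sketch below follows the steps with GF(4) arithmetic; the particular node orderings (which nodes count as "consecutive") are assumptions of the sketch, since any consistent ordering produces an equivalent graph. It verifies that every row and column has weight 2^s + 1 = 5 and that the graph is free of 4-cycles:

```python
import itertools

s = 2
q = 2 ** s                          # GF(4)
n = q * q + q + 1                   # 21 variable nodes and 21 check nodes

def gf4_mul(u, v):
    """Multiply GF(4) elements (integers 0-3) mod y^2 + y + 1."""
    p = 0
    for k in range(2):
        if (v >> k) & 1:
            p ^= u << k
    for k in (3, 2):
        if (p >> k) & 1:
            p ^= 0b111 << (k - 2)
    return p

# Node indexing (an implementation choice): columns are variable nodes,
# rows are check nodes, read off the tree layer by layer.
var_root = 0
def var_top(j):      return 1 + j                 # first q second-layer variables
def var_grp(b, i):   return 1 + q + b * q + i     # remaining variables (beta, i)
def chk_first(b):    return b                     # q + 1 first-layer checks
def chk_third(j, k): return (q + 1) + j * q + k   # q^2 third-layer checks (j, k)

H = [[0] * n for _ in range(n)]
for b in range(q + 1):                 # step (1): root joins every first-layer check
    H[chk_first(b)][var_root] = 1
for j in range(q):                     # steps (2) and (3) for the first group
    H[chk_first(q)][var_top(j)] = 1
    for k in range(q):
        H[chk_third(j, k)][var_top(j)] = 1
for b in range(q):                     # steps (2) and (4) for the groups (beta, i)
    for i in range(q):
        H[chk_first(b)][var_grp(b, i)] = 1
        for j in range(q):
            k = i ^ gf4_mul(b, j)      # L_beta(i, j) = i + beta*j; L_0(i, j) = i
            H[chk_third(j, k)][var_grp(b, i)] = 1

assert all(sum(row) == q + 1 for row in H)           # (2^s + 1)-regular rows
assert all(sum(col) == q + 1 for col in zip(*H))     # (2^s + 1)-regular columns
for r1, r2 in itertools.combinations(H, 2):          # any two rows overlap in at
    assert sum(x & y for x, y in zip(r1, r2)) <= 1   # most one column: girth >= 6
```

The pairwise row-overlap condition is exactly the incidence structure of a projective plane of order 2^s, consistent with the identification of the a = 1 codes as PG-LDPC codes below.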
Let T(a, s) denote the resultant tree constructed using the complete family of MOLS derived from G_{2^s-1} ∪ {0} ⊆ R, with the edge weights set to randomly chosen units of Z_{2^a}. By reading the variable (check) nodes as the columns (rows) of a matrix H(a, s) in Z_{2^a}^{(2^{2s}+2^s+1) x (2^{2s}+2^s+1)} in a top-bottom, left-right manner, the null space of H(a, s) yields an LDPC code C(a, s) over Z_{2^a}. Note that the connection procedure in steps (1)-(3) is regardless of a, and similarly for beta = 0 in step (4); this portion of H(a, s) can be perceived as the nonrandom portion of the parity-check matrix. Step (4) with beta in G_{2^s-1}, on the other hand, executes the pseudorandom portion of the parity-check matrix that is commonly seen in most LDPC parity-check matrices.

Example 5. Let a = 2 and s = 2. The Latin squares are shown in Example 4. Steps (1)-(3) are illustrated in Figure 2(a), and the resultant tree after step (4) is shown in Figure 2(b).

4.2. Properties of C(a, s)

C(a, s) is a regular LDPC code of length n(s) = 2^{2s} + 2^s + 1 represented by H(a, s). For a' < a, let T-bar(a, s) denote the graph resulting from reducing all edge weights of T(a, s) mod 2^{a'}.

Theorem 3. T-bar(a, s) = T(a', s); equivalently, H-bar(a, s) = H(a', s).

Proof. Since L_beta^{(a)}(i, j) mod 2^{a'} = L_{beta-bar}^{(a')}(i-bar, j-bar) (from Remark 2), the edge ((beta, i), (j, L_beta^{(a)}(i, j))) in T(a, s) is equivalent to the edge ((beta-bar, i-bar), (j-bar, L_{beta-bar}^{(a')}(i-bar, j-bar))) in T(a', s). Furthermore, the connection procedure is regardless of a in steps (1)-(3), and similarly for beta = 0 in step (4).

Remark 3. The graphs constructed by setting a = 1 yield binary codes that are the same as those in [14, Section IV-A]; when a = 1, T(1, s) is a degree-(2^s + 1) regular tree. It has also been shown there that these codes are the binary projective geometry (PG) LDPC codes introduced in [5].

Before deriving d_min(a, s), we state two relationships between the codewords in C(a, s) and C(a', s).

Corollary 1. (i) If c in C(a, s), then c-bar in C(a', s). (ii) If c in C(a', s), then 2^{a-a'}c in C(a, s), and this codeword is unique.

Proof. Corollary 1(i) is a simple consequence of Theorem 3. For (ii), c H^T(a', s) = 0 mod 2^{a'} implies (2^{a-a'}c) H^T(a, s) = 0 mod 2^a, that is, 2^{a-a'}c in C(a, s). The uniqueness of 2^{a-a'}c follows from the natural group embedding C(a', s) -> R^n : r -> 2^{a-a'}r.

We denote the minimum distance of C(a, s) by d_min(a, s). Following the definition given in [14], w_min(a, s) denotes the minimum pseudocodeword weight arising from the Tanner graph of C(a, s) for the 2^a-ary symmetric channel.

Theorem 4. d_min(a, s) = d_min(1, s) = 2^s + 2.

Proof. Let d_c denote the Hamming weight of c in C(a, s) \ {0}.
Case 1: c contains at least one unit of Z_{2^a}. Then c-bar in C(1, s) \ {0} from Corollary 1(i), and d_c >= d_{c-bar} >= d_min(1, s).
Case 2: c does not contain any unit of Z_{2^a}. Then c can be expressed as c = 2^{a-a'}c', where c' in C(a', s) contains at least one unit, so that d_c = d_{c'}, and from Case 1, d_c >= d_min(1, s).
Conversely, if c in C(1, s) with d_c = d_min(1, s), then from Corollary 1(ii), 2^{a-1}c in C(a, s) with the same Hamming weight; hence d_min(a, s) <= d_min(1, s). Since it is known that d_min(1, s) = 2^s + 2 [5], the theorem follows.

Theorem 5. w_min(a, s) = d_min(a, s) = 2^s + 2.

Proof. It has already been shown in [14, Section IV-A] that w_min(1, s) = 2^s + 2. Since T(a, s) and T(1, s) share the same tree bound [14], w_min(a, s) >= 2^s + 2. On the other hand, w_min(a, s) <= d_min(a, s) = 2^s + 2 from Theorem 4. Thus w_min(a, s) = d_min(a, s) = 2^s + 2.

The code rate r(a, s) has to be computed by first reducing H(a, s) to the form discussed in Section 2.1. It is bounded by

    (2^{2s} + 2^s - 3^s) / (a(2^{2s} + 2^s + 1)) <= r(a, s) <= (2^{2s} + 2^s - 3^s) / (2^{2s} + 2^s + 1),

where the upper bound corresponds to the code rates of the binary PG-LDPC codes [5]. With the edge weights of T(a, s) chosen as random units of Z_{2^a}, r(a, s) tends to the lower bound as a increases, which results in codes suitable for low-rate applications. On the other hand, by setting all edge weights to unity, r(a, s) increases significantly; the corresponding codes can thus be deployed in moderate-rate applications. Table 1 compiles the properties of C(a, s) for various values of a and s.

5. SIMULATION RESULTS

Figures 3 and 4 show the bit error rate (BER) and symbol error rate (SER) performance of our structured codes.
The codewords are transmitted using the matched signals discussed in Section 2.2 over the AWGN channel, and the received signals are decoded using the sum-product algorithm. For each data point, 10^4 error bits are obtained, with a maximum of 100 iterations allowed for decoding each received signal vector. The performance of random, near-regular LDPC codes with constant variable node degree of 3 is also shown. These codes have similar codelengths and rates to those of the structured codes. In Figure 3(a), the edge weights of the codes simulated are randomly chosen units of Z_4, while those in Figures 3(b) and 4 are set to unity.

Figure 2: Tree constructed for a = 2, s = 2 after (a) steps (1)-(3), and (b) step (4) (the final structure).

Figure 3(a) shows our structured Z_4 code outperforming the random code when the codelength is small, that is, less than 100 bits. On the other hand, Figure 3(b) shows our structured code performing worse than its random counterpart when the codelength is much larger, that is, 2114 bits. At a glance, it therefore appears that our structured codes are only better than random codes for short codelengths.

To get a clearer picture as to how our codes fare in comparison to their random counterparts, we turn to Figures 4(a) and 4(b), which summarize the BER performance of random and structured codes over Z_4 and Z_8, respectively, for increasing codelengths of 42, 146, and 546 bits (over Z_4) and 63, 219, and 819 bits (over Z_8). As observed, random codes perform better in the higher-BER region, while our structured codes are superior at lower BERs, specifically, 10^-4 and below for codelengths close to 1000 bits, and 10^-6 and below for larger codelengths. From these empirical results, we conclude that our codes significantly outperform their random counterparts over a wide BER range for very small codelengths. This phenomenon may be attributed to the fact that the minimum distance of our codes grows only linearly with the square root of their codelength.
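The codelengths and rates quoted in this section can be cross-checked against the construction parameters, since n(s) = 2^{2s} + 2^s + 1 and each Z_{2^a} symbol carries a bits. A quick numerical check in Python, using the rate bounds of Section 4.2:

```python
def n(s):
    """Code length in Z_{2^a} symbols: n(s) = 2^(2s) + 2^s + 1."""
    return 4 ** s + 2 ** s + 1

def rate_bounds(a, s):
    """Lower and upper bounds on r(a, s) from Section 4.2."""
    k = 4 ** s + 2 ** s - 3 ** s
    return k / (a * n(s)), k / n(s)

# Codelengths in bits (a bits per symbol) match those quoted for the simulations.
assert [2 * n(s) for s in (2, 3, 4, 5)] == [42, 146, 546, 2114]   # Z_4 codes
assert [3 * n(s) for s in (2, 3, 4)] == [63, 219, 819]            # Z_8 codes

# The a = 1 values of the bound reproduce the binary PG-LDPC rates of Table 1.
assert abs(rate_bounds(1, 2)[1] - 0.5238) < 1e-4
assert abs(rate_bounds(1, 5)[1] - 0.7692) < 1e-4
assert abs(rate_bounds(4, 2)[0] - 0.1309) < 1e-4
```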
6996 0. s) = wmin (a.2564 0.6164 0.1746 0.2619 0. we have constructed Tanner graphs representing a family of structured LDPC codes over Z2a spanning a wide range of code rates.5653 0. 6. from [23. s = 5. s) 7 r(a.7692 0.3846 0. On the other hand. s) dmin (a.5548 0. we have that the minimum distance of a random.6337 0.6367 0.3699 0.2381 0.4982 0.6996 0. we believe that they have superior minimum distances compared to our structured codes.3175 0.5 4 4.5238 0.1749 0. we . s) (Unity edge weights) 0. In addition. Using the generalized mapping function.4762 0. we have extended the notion of Latin squares to multiplicative groups of a Galois ring. s).4932 0.5669 100 10−1 10−2 Error rate 10−3 10−4 10−5 10−6 Error rate 1 2 3 4 5 6 Eb /N0 (dB) 7 8 9 100 10−1 10−2 10−3 10−4 1 1.5 Structured BER Structured SER Random BER Random SER Structured BER Structured SER Random BER Random SER (a) a = 2. unity edge weights Figure 3: Performance of structured and random LDPC codes over Z4 with QPSK signaling over the AWGN channel.5 2 2.2332 0. regular LDPC code with constant variable node degree of 3 grows linearly with its codelength with high probability.E. s = 2.2055 0.5 3 Eb /N0 (dB) 3. their codelength. Mo and MarcA.3498 0.1923 r(a. random edge weights (b) a = 2. a 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 5 1057 33 34 4 273 17 18 3 73 9 10 2 21 5 6 s n(s) Degree of T (a.7053 0. s) (Lower bound) 0. CONCLUSION To summarize. As the random codes considered here are near regular. Theorem 26]. Armand Table 1: Properties of C(a.1541 0.3082 0.1309 0.5238 0.7692 0.6164 0.
Milenkovic.” IEEE Communications Letters. “Nonbinary LDPC codes for optical communication systems. vol. “Binary representation of cycle Tannergraph GF(2b ) codes. Hu and E. 159–166. Eleftheriou. 6. Paris. no. 3209–3220. pp. Djordjevic and B. S. 51. B. Kelley. [4] D. and D.” IEEE Transactions on Information Theory. 1241–1260. Sridhara. our simulation results show that these codes. [12] R. 2007. The authors also gratefully acknowledge ﬁnancial support from the Ministry of Education ACRF Tier 1 Research Grant no. E. 2004. no. July 2006. 2004. “Group codes for the Gaussian channel. [3] G. pp. o o [14] C. C Mackay. J. Kelley.” IEEE Transactions on Information Theory. and J.” IEEE Transactions on Information Theory. pp. [9] X. [10] C. [6] B. Bennatan and D. 1. 1991. 2711–2736. 528–532.” IEEE Transactions on Information Theory. no.” IEEE Photonics Technology Letters. Slepian. “LDPC codes over rings for PSK modulation. unity edge weights. 8. 37. [7] I. 1968.” IEEE Communications Letters. 549–583. USA. ACKNOWLEDGMENTS The authors would like to thank the anonymous reviewers for their helpful comments which led to signiﬁcant improvements in Sections 1 and 5 of this paper. have shown that the minimum pseudocodeword weight of these codes are equal to their minimum Hamming distance—a desirable attribute under iterative decoding. France. “Pseudocodeword weights for nonbinary LDPC codes. C. Fossorier. 5. Kou. 47. Davey and D. P. pp. 50. D. Germany. “Lowdensity paritycheck codes based on ﬁnite geometries: a rediscovery and new results. 75–82. “Geometrically uniform codes. 5. no.” in Proceedings of IEEE International Symposium on Turbo Codes. 575–602. 17. Fossorier. Sridhara and T. 538– 540. pp. no. Vasic. 8. pp. vol. and J. 1998. Ph. 53.” Bell System Technical Journal. 52. Sweden. vol. [13] N. Sridhara. vol. Brest. 2. B. vol. June 2004. 1996.” in Proceedings of IEEE International Symposium on Information Theory (ISIT ’06). vol. 4. D. 
“Using binary image of nonbinary LDPC codes to improve overall performance.” IEEE Transactions on Information Theory. pp. Declercq. 2005. M. Rosenthal. at BERs of practical interest. vol. France. pp. unity edge weights. REFERENCES [1] M. 1379–1383. [8] A. April 2006. R263000361112. Vasic and O. 2006.” IEEE Transactions on Information Theory. 1460–1478. September 2003. Rosenthal. A. can signiﬁcantly outperform their random counterparts of similar length and rate. [2] D. C.8 100 10−1 10−2 EURASIP Journal on Wireless Communications and Networking 10−1 10−2 10−3 BER 10−3 10−4 10−5 10−6 s=4 s=3 1 2 3 4 5 Eb /N0 (dB) 6 7 8 10−6 7 8 9 s=2 10−5 s=3 10 Eb /N0 (dB) Structured Random (b) a = 3. Lin. “Design and analysis of nonbinary LDPC codes for arbitrary discretememoryless channels. “Combinatorial constructions of lowdensity paritycheck codes for iterative decoding. 2005. thesis. Vontobel. no. 2001. pp. 9. pp. “Graphcovers and iterative decoding of ﬁnite length codes. pp. “Low density parity check codes over GF(q). D. Koetter and P. pp. 2. Wiberg. Link¨ ping University. transmitted using 8PSK signaling 11 12 13 BER 10−4 s=4 s=2 Structured Random (a) a = 2. O. vol. Seattle. [15] I. 1156– 1176. no. 2224–2226. “MacNeishMann theorem based iteratively decodable codes for optical communication systems. Fuja. Link¨ ping. Forney Jr. Djordjevic and B. Wash. 47. A. no. when transmitted by matched signal sets over the AWGN channel. [11] C. 10. Finally. 7. “Treebased construction of LDPC codes having good pseudocodeword weights. Poulliat. and M.Y. no. Vasic. .” in Proceedings of IEEE International Conference on Communications (ICC ’04).” in Proceedings of the 3rd IEEE International Symposium on Turbo Codes and Applications. vol. [5] Y.D. Codes and decoding on general graphs. pp. Munich. Burshtein.. transmitted using QPSK signaling Figure 4: Performance of structured and random LDPC codes transmitted using matched signals over the AWGN channel. vol. vol.
2nd edition. [17] B. 2003. vol. no. 2. no. “Shortened array codes of large girth. B. van Lint and R. 3707–3722. no. M.E. Wilson. 2005. pp. and R. vol. 438–446. [23] G. Kostuk. “Lowdensity parity check codes and iterative decoding for longhaul optical communication systems. pp. “Analysis of the cyclestructure of LDPC codes based on Latin squares. B. Laendner. “Iteratively decodable codes from orthogonal arrays for optical communication systems. vol. Paris. pp. 1995. Cambridge.” http://calvino. I. 5. 6. vol.” in Proceedings of IEEE International Conference on Communications (ICC ’04). pp. 2. 9. and D.” IEEE Transactions on Information Theory. Armand [16] O.” Journal of Lightwave Technology. Djordjevic and B. Kashyap. “Average spectra and minimum distances of low density parity check codes over cyclic groups.” IEEE Transactions on Information Theory. Leyba. Vasic. 777–781. [19] O. “Signal sets matched to groups. Vasic. 41. 37. 8. Loeliger. Mo and MarcA. Fagnani. K. vol. 52. France. 21. M. “Linear block codes over cyclic groups.it/∼fagnani/groupcodes/ ldpcgroupcodes. UK. 924–926. [22] J. Caire and E. A Course in Combinatorics.pdf. N.” IEEE Communications Letters. A. 1246–1256. [20] G.polito. June 2004. Milenkovic and S. Biglieri. no. pp. Como an F. no.” IEEE Transactions on Information Theory. vol. 2006. Milenkovic. 1991. [21] H. 10. pp. H. [18] I. 2001. Djordjevic. Cambridge University Press. 9 . 1675–1682.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 824673, 14 pages
doi:10.1155/2008/824673

Research Article
Differentially Encoded LDPC Codes—Part I: Special Case of Product Accumulate Codes

Jing Li (Tiffany)
Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
Correspondence should be addressed to Jing Li (Tiffany), jingli@ece.lehigh.edu
Received 19 November 2007; Accepted 6 March 2008
Recommended by Yonghui Li

Part I of a two-part series investigates product accumulate (PA) codes, a special class of differentially encoded low-density parity-check (DE-LDPC) codes with high performance and low complexity, on flat Rayleigh fading channels. In the coherent detection case, Divsalar's simple bounds and iterative thresholds using density evolution are computed to quantify the code performance at finite and infinite lengths, respectively. In the noncoherent detection case, a simple iterative differential detection and decoding (IDDD) receiver is proposed and shown to be robust for different Doppler shifts. Extrinsic information transfer (EXIT) charts reveal that, with pilot symbol assisted differential detection, the widespread practice of inserting pilot symbols to terminate the trellis actually incurs a loss in capacity, and that a more efficient way is to separate pilots from the trellis. Through analysis and simulations, it is shown that PA codes perform very well with both coherent and noncoherent detections. The more general case of DE-LDPC codes, where the LDPC part may take arbitrary degree profiles, is studied in Part II.

Copyright © 2008 Jing Li (Tiffany). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The discovery of turbo codes and the rediscovery of low-density parity-check (LDPC) codes have renewed the research frontier of capacity-achieving codes [1, 2]. They also revolutionized coding theory by establishing a new soft-iterative paradigm, where long powerful codes are constructed from short simple codes and decoded through iterative message exchange and successive refinement between component decoders. Compared to turbo codes, LDPC codes boast a lower complexity in decoding, a richer variety in code construction, and the advantage of not being patented.

One important application of LDPC codes is wireless communications, where sender and receiver communicate through, for example, a no-line-of-sight land-mobile channel that is characterized by the Rayleigh fading model. It is well recognized that LDPC codes perform remarkably well on Rayleigh fading channels, assuming the carrier phase is perfectly synchronized and coherent detection is performed; but what if otherwise? It should be noted that, due to practical issues like complexity, acquisition time, sensitivity to tracking errors, and phase ambiguity, coherent detection may become expensive or infeasible in some cases. In such cases, the technique of differential encoding becomes immediately relevant. Differential encoding admits simple noncoherent differential detection, which solves phase ambiguity and requires only frequency synchronization (often more readily available than phase synchronization). Viewed from the coding perspective, performing differential encoding is essentially concatenating the original code with an accumulator, that is, a recursive convolutional code in the form of 1/(1 + D).

In this series of two-part papers, we investigate the theory and practice of LDPC codes with differential encoding. We start with a special class of differentially encoded LDPC (DE-LDPC) codes, namely, product accumulate (PA) codes (Part I), and then we move to the general case where an arbitrary (random) LDPC code is concatenated with an accumulator (Part II) [3].

Product accumulate codes, proposed in [4] and depicted in Figure 1, are a class of serially concatenated codes, where the inner code is a differential encoder and the outer code is a parallel concatenation of two branches of single-parity-check (SPC) codes, or, equivalently, a structured LDPC code comprising degree-1 and degree-2 variable nodes.
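The structure just described, an outer pair of single-parity-check branches followed by an interleaver and an inner accumulator 1/(1 + D), can be sketched in a few lines. This is a simplified toy illustration with made-up parameters (group size, random interleavers), not the exact PA construction of [4]:

```python
import random

def spc_parities(bits, t):
    """One even-parity bit per group of t bits (single parity check)."""
    return [sum(bits[i:i + t]) % 2 for i in range(0, len(bits), t)]

def accumulate(bits):
    """Inner 1/(1 + D) code over GF(2): y_k = x_k XOR y_{k-1}."""
    out, prev = [], 0
    for b in bits:
        prev ^= b
        out.append(prev)
    return out

def toy_pa_encode(data, t, rng):
    """Toy PA-style encoder: systematic data plus two SPC parity branches
    (the second branch works on a permuted copy of the data), followed by
    a random interleaver and the accumulator."""
    perm = list(range(len(data))); rng.shuffle(perm)
    x = data + spc_parities(data, t) + spc_parities([data[i] for i in perm], t)
    pi = list(range(len(x))); rng.shuffle(pi)
    return accumulate([x[i] for i in pi]), pi, perm

rng = random.Random(7)
data = [rng.randrange(2) for _ in range(16)]
y, pi, perm = toy_pa_encode(data, t=4, rng=rng)

# Sanity check: undoing the accumulator (x_k = y_k XOR y_{k-1}) and the
# interleaver recovers an outer codeword satisfying both SPC constraints.
x_il = [y[0]] + [y[k] ^ y[k - 1] for k in range(1, len(y))]
x = [0] * len(x_il)
for dst, src in enumerate(pi):
    x[src] = x_il[dst]
assert x[:16] == data
assert x[16:20] == spc_parities(data, 4)
assert x[20:24] == spc_parities([data[i] for i in perm], 4)
```

Note how the accumulator is trivially invertible by (1 + D), which is what makes the graph-based decoding of the inner code cheap.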
Since the accumulator can also be described using a sparse bipartite graph, a PA code is, overall, an LDPC code. Alternatively, it may also be regarded as a differentially encoded LDPC code, to emphasize the impact of the inner differential encoder.

Figure 1: PA codes: (a) code structure; (b) graph representation. The outer code is a two-branch parallel concatenation of single parity checks (TPC/SPC) producing parities p1 and p2; a random interleaver connects it to the inner 1/(1 + D) code. Here x denotes an input bit to 1/(1 + D) (an output bit from the TPC/SPC), and y denotes an observation from the channel; circles denote bit nodes and squares denote check nodes.

The reasons to study PA codes are multifold. First, PA codes exhibit an interesting threshold property and remarkable performance, and are well established as a class of "good" codes with rates ≥ 1/2 and performance within a few tenths of a dB from the Shannon limit [4]. Here, "good" is in the sense defined by MacKay [2]. Second, PA codes are desirable for their simplicity. They are simple to describe, simple to encode and decode, and simple enough to allow rigorous theoretical analysis [4]. Comparatively, a random LDPC code can be expensive to describe and expensive to implement in VLSI (due to the difficulty of routing and wiring). Finally, PA codes are intrinsically differentially encoded, which naturally permits noncoherent differential detection without needing additional components.

The primary interest is the noncoherent detection case, but for completeness of investigation and for comparison, we also include the case of coherent detection. Under the assumption that phase information is known, we compute Divsalar's simple bounds to benchmark the performance of PA codes at finite code lengths [5], and we evaluate iterative thresholds using density evolution (DE) to benchmark the performance of PA codes at infinite code lengths. The asymptotic thresholds reveal that PA codes are about 0.6 to 0.7 dB better than regular LDPC codes, but 0.5 dB worse than optimal irregular LDPC codes (whose maximal left degree is 50), on Rayleigh fading channels with coherent detection. Simulations of fairly long block lengths show a good agreement with the analytical results.

When phase information is unavailable, the decoder/detector will either proceed without phase information (completely blind), or entail some (coarse) estimation and compensation in the decoding process. We regard either case as noncoherent detection. The presence of a differential encoder in the code structure readily lends PA codes to noncoherent differential detection. Conventional differential detection (CDD) operates on two symbol intervals and recovers the information by subtracting the phase of the previous signal sample from that of the current signal sample. It is cheap to implement, but suffers as much as 4 to 5 dB of loss in bit error rate (BER) performance [6]. Closing the gap between CDD and coherent detection of differentially encoded sequences generally requires extending the observation window beyond two symbol intervals. The result is multisymbol differential detection (MSDD), exemplified by maximum-likelihood (ML) multisymbol detection, trellis-based multisymbol detection with per-survivor processing, and their variations [7, 8]. MSDD performs significantly better than CDD, at the cost of a considerably higher complexity, which increases exponentially with the window size. To preserve the simplicity of PA codes, here we propose an efficient iterative differential detection and decoding (IDDD) receiver which is robust against various Doppler spreads and can perform, for example, within 1 dB from coherent detection on fast fading channels. We investigate the impact of pilot spacing and filter lengths, and we show that the proposed PA IDDD receiver requires a very moderate number of pilot symbols, compared to, for example, turbo codes [6].

It is quite expected that the percentage of pilots directly affects the performance, especially on very fast fading channels; much less expected is that how these pilot symbols are inserted also makes a huge difference. Through extrinsic information transfer (EXIT) analysis [9], we show that the widespread practice of inserting pilot symbols to periodically terminate the trellis of the differential encoder [6, 7] inevitably incurs a loss in code capacity. We attribute this to what we call the "trellis segmentation" effect, namely, error events are made much shorter in the periodically terminated trellis than otherwise. We propose that pilot symbols be separated from the trellis structure, and simulation confirms the efficiency of the new method. From analysis and simulation, it is fair to say that PA codes perform well with both coherent and noncoherent detection. In Part II of this series of papers, we will show that conventional LDPC codes, such as regular LDPC codes with uniform column weight of 3 and optimized irregular ones reported in the literature, actually perform poorly with noncoherent differential detection. We will discuss why, how, and how much we can change the situation.

The rest of the paper is organized as follows. Section 2 introduces PA codes and the channel model. Section 3 analyzes the coherently detected PA codes on fading channels
using Divsalar's simple bounds and iterative thresholds. Section 4 discusses noncoherent detection and decoding of PA codes and performs EXIT analysis. Finally, Section 5 summarizes the paper.

2. PA CODES AND CHANNEL MODEL
2.1. Channel model

We consider binary phase-shift-keying (BPSK) signaling (0 → +1, 1 → −1) over flat Rayleigh fading channels. Assuming proper sampling of the outputs from the matched filter, the received discrete-time baseband signal can be modeled as r_k = α_k e^{jθ_k} s_k + n_k, where s_k is the BPSK-modulated signal and n_k is the i.i.d. complex AWGN with zero mean and variance σ² = N0/2 in each dimension. The fading amplitude α_k is modeled as a normalized Rayleigh random variable with E[α_k²] = 1 and pdf p_A(α_k) = 2α_k exp(−α_k²) for α_k > 0, and the fading phase θ_k is uniformly distributed over [0, 2π). For fully interleaved channels, the α_k's and θ_k's are independent for different time indexes k; for insufficiently interleaved channels, they are correlated. We use Jakes' isotropic-scattering land-mobile Rayleigh channel model to describe the correlated Rayleigh process, which has autocorrelation R_k = (1/2) J0(2kπ f_d T_s), where f_d T_s is the normalized Doppler spread and J0(·) is the 0th-order Bessel function of the first kind. Throughout the paper, θ_k is assumed known perfectly to the receiver/decoder in the coherent detection case, and unknown (and needing to be worked around) in the noncoherent detection case. Further, the receiver is said to have channel state information (CSI) if α_k is known (irrespective of θ_k), and no CSI otherwise.
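The channel model above is straightforward to sample for simulation purposes. The sketch below uses a sum-of-sinusoids (Clarke/Jakes-type) generator, one simple way to obtain correlated fading and not necessarily the generator used for the paper's simulations, and checks the normalization E[α_k²] = 1 and the coherence of a very slowly fading process:

```python
import cmath, math, random

def jakes_sample(n, fd_ts, M=32, seed=0):
    """Clarke/Jakes-type sum-of-sinusoids fading generator. Returns n complex
    gains g_k = alpha_k * exp(j*theta_k) with E[|g_k|^2] = 1, whose
    autocorrelation approaches (1/2)J0(2*pi*k*fd_ts) per quadrature as M grows."""
    rng = random.Random(seed)
    aoa = [2 * math.pi * rng.random() for _ in range(M)]  # angles of arrival
    phi = [2 * math.pi * rng.random() for _ in range(M)]  # random phases
    return [sum(cmath.exp(1j * (2 * math.pi * fd_ts * k * math.cos(a) + p))
                for a, p in zip(aoa, phi)) / math.sqrt(M)
            for k in range(n)]

# Normalization check: E[alpha^2] = 1, averaged over independent realizations.
mean_power = sum(abs(jakes_sample(1, 0.01, seed=s)[0]) ** 2
                 for s in range(400)) / 400
assert abs(mean_power - 1.0) < 0.2

# Very slow fading (small fd*Ts): adjacent gains are nearly equal.
g = jakes_sample(2, 1e-4, seed=1)
assert abs(g[0] - g[1]) < 0.01
```

For fully interleaved (independent) fading, one would instead draw each g_k as an independent zero-mean complex Gaussian of unit variance.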
2.2. PA codes and decoding analysis

A product accumulate code, as illustrated in Figure 1(a), consists of an accumulator (or differential encoder) as the inner code, and a parallel concatenation of two branches of single-parity-check codes as the outer code. PA codes are decoded through a soft iterative process where soft extrinsic information is exchanged between the component decoders conforming to the turbo principle. The outer code, modeled as a structured LDPC code, is decoded using the message-passing algorithm. The inner code, taking the convolutional form of 1/(1 + D), may be decoded either using the trellis-based BCJR algorithm or a graph-based message-passing algorithm. The latter, thanks to the cycle-free code graph of 1/(1 + D), performs as optimally as the BCJR algorithm, but consumes several times less complexity [4, 10]. Thus, the entire code can be efficiently decoded through a unified message-passing algorithm, driven by the initial log-likelihood ratio (LLR) values extracted from the channel [4]. For Rayleigh fading channels with perfect CSI, that is, with α_k known for all k, the initial channel LLRs are computed using

    L_ch^{CSI}(s_k) = 4 α_k r_k / N0,   (1)

and for Rayleigh fading channels without CSI,

    L_ch^{NCSI}(s_k) = 4 E[α_k] r_k / N0,   (2)

where E[α] = √π/2 ≈ 0.8862 is the mean of α. Due to space limitations, we omit the details of the overall message-passing algorithm and refer readers to [4].

3. COHERENT DETECTION
This section investigates the coherent detection case on Rayleigh fading channels. We employ Divsalar’s simple bounds and the iterative threshold to analyze the ensemble average performance of PA codes, and simulate individual PA codes at short and long lengths. 3.1. Simple bounds
Union bounds are simple to compute, but are rather loose at low SNRs. Divsalar's simple bound is possibly one of the best closed-form bounds [5]. Like many other tight bounds, the simple bound is based on Gallager's second bounding technique [1]. By using numerical integration instead of a Chernoff bound, and by reducing the number of codewords to be included in the bound, Divsalar was able to tighten the bound to overcome the cutoff rate limitation. Since the simple bound requires knowledge of the distance spectrum, a hard-to-attain property especially for concatenated codes, it has not seen wide application. Here, the simplicity of PA codes permits an accurate computation of the ensemble-average distance spectrum (whose details can be found in [4]), and thus enables the exploitation of the simple bound. The technique of the simple bound allows for the computation of either a maximum-likelihood (ML) threshold in the asymptotic sense [4, 5], or a performance upper bound with respect to a given finite length. Divsalar derived the general form of the simple bound for independent Rayleigh fading channels with perfect CSI. Following a similar line of reasoning, below we extend it to the case of no CSI.

3.1.1. Gallager's second bounding technique

Gallager's second bounding technique sets the base for many tight bounds, including the simple bound [1]. It states that

    Pr(error) ≤ Pr(error, r ∈ R) + Pr(r ∉ R),   (3)

where r = γαs + n is the received codeword (an N-dimensional noise-corrupted vector), s is the transmitted codeword vector, n is the noise vector whose components are i.i.d. Gaussian random variables with zero mean and unit variance, γ is a known constant (determined by the modulation), α is the N × N matrix containing the fading coefficients (α is an identity matrix for AWGN channels), and R denotes a region in the observation space around the transmitted codeword. To get a tight bound, optimization and integration are usually needed to determine a meaningful R.
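For intuition about the terms entering such bounds on independent Rayleigh fading with perfect CSI: the union-type side is assembled from pairwise error probabilities of weight-h error events, which take the MGF (moment generating function) integral form below. A quick numerical check against the known closed form for h = 1 (assumptions: BPSK, normalized Rayleigh fading with E[α²] = 1, and per-bit SNR c = R·Eb/N0):

```python
import math

def pep_rayleigh_csi(h, c, n=20000):
    """Pairwise error probability of a weight-h error event on an i.i.d.
    Rayleigh fading channel with perfect CSI (MGF form), midpoint rule:
    P_h = (1/pi) * Int_0^{pi/2} (sin^2(t) / (sin^2(t) + c))^h dt."""
    w = (math.pi / 2) / n
    acc = 0.0
    for i in range(n):
        s2 = math.sin((i + 0.5) * w) ** 2
        acc += (s2 / (s2 + c)) ** h
    return acc * w / math.pi

c = 1.7
p1 = pep_rayleigh_csi(1, c)
closed = 0.5 * (1.0 - math.sqrt(c / (1.0 + c)))   # known closed form for h = 1
assert abs(p1 - closed) < 1e-6
# larger Hamming weight h buys diversity: the PEP drops quickly with h
assert pep_rayleigh_csi(4, c) < pep_rayleigh_csi(2, c) < p1
```

Weighting such PEP terms by the (ensemble-average) distance spectrum gives the union-type term that the simple bound tightens at low SNR.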
3.1.2. Divsalar's simple bound for independent Rayleigh fading channels with CSI

For Rayleigh fading channels, the decision metric is based on the minimization of the norm ‖r − γαs‖, where s, r, and α are the transmitted signal, the received signal, and the fading amplitudes in vector form, respectively, and γ is the amplitude of the transmitted signal such that γ²/2 = E_s/N0. For a good approximation of the error using (3), and for computational simplicity, the decision region R was chosen as an N-dimensional hypersphere centered at ηγαs and with radius √N·R, where η and R are the parameters to be optimized [5]. When perfect CSI is available, the effect of fading can be compensated through a linear transformation on γαs. In particular, a rotation e^{jϕ} and a rescaling ζ have been shown to yield a good and analytically feasible solution [5]:

    R = { r : ‖r − ζe^{jϕ}γαs‖² ≤ NR² },   (4)

which leads to the upper bound on the error probability of an (N, K, R) code [5]:

    P(e) ≤ Σ_{h=2}^{N−K+1} min{ e^{−N E(c,δ,ρ,β,κ,φ)}, e^{N γ_N(δ)} (1/π) ∫_0^{π/2} [ sin²θ / (sin²θ + c) ]^h dθ },   (5)

where

    E(c, δ, ρ, β, κ, φ) = −ρ γ_N(δ) + (ρ/2) log(β/ρ) + ((1−ρ)/2) log((1−β)/(1−ρ))
                          + ρδ log(1 + c(1 − 2κφ)) + ρ(1−δ) log(1 + c(1 − 2κφ − (1−κ²)ρ/β))
                          + (1−ρ) log(1 + c(1 − ρ(1−2κφ))/(1−ρ)),   (6)

with ρ, β, κ, and φ = −(1 − ρ(1−κ))/((1−ρ)(1−β)) the optimization parameters of the bound [5], and

    c = γ²/2 = E_s/N0 = R E_b/N0,   (7)
    δ = h/N,   (8)
    γ_N(δ) = γ_N(h/N) = (1/N) log A_h for word error rate, and (1/N) log Σ_w (w/K) A_{w,h} for bit error rate,   (9)

where A_h (resp., A_{w,h}) denotes the (input-output) distance spectrum.

3.1.3. Extension of the simple bound to the case of no CSI

Another simple and reasonable choice of the decision region is an ellipsoid centered at ηγs, which can be obtained by rescaling each coordinate of r so as to compensate for the effect of fading:

    R = { r : ‖α^{−1} r − ηγs‖² ≤ NR² },   (10)

where η and R are optimized. For independent Rayleigh channels without CSI, since accurate information on α is unavailable, we resort to the expectation of the fading coefficients, α^{−1} ≈ E[α^{−1}] I = (1/0.8862) I in (10), where I is an identity matrix. By replicating the computations described in [5], we obtain the upper bound on the bit error rate for independent Rayleigh channels without CSI:

    P(e) ≤ Σ_{h=2}^{N−K+1} min{ e^{−N E(c,δ,ρ)}, exp( N γ_N(δ) + hυ/c ) ( 1 − 2/(√(1 + 2/υ) + 1) )^h },   (11)

where

    E(c, δ, ρ) = −(1/2) log(1 − ρ + ρ e^{2γ_N(δ)}) + c [ 1 + ((1−ρ)/ρ) ((1−δ)/δ) e^{−2γ_N(δ)} ]^{−1},   (12)

    ρ = [ 1 + ((1−β)/β) e^{2γ_N(δ)} ]^{−1},
    β = { [ 2c + ((1−δ)/δ)(1 − e^{−2γ_N(δ)}) ]² + ((1−δ)/δ)² [ (1+c)² − 1 ] }^{1/2} − (1+c)(1−δ)/δ,

c = E²[α] γ²/2 = 0.8862² R E_b/N0, υ = (γ²/2)² − 1 = (R E_b/N0)² − 1, and δ and γ_N(δ) are the same as in (8) and (9). Please note that the afore-discussed extension to the fading case with no CSI slightly loosens the simple bound, but it preserves the computational simplicity. It is possible for a more sophisticated transformation to yield tighter bounds, but not necessarily a feasible analytical expression.

Figure 2 plots the simulated BER performance and the simple bound of a (1024, 512) PA code on independent Rayleigh fading channels with and without CSI. Since an optimal ML decoder is assumed, and the ensemble-average distance spectrum is used in the computation, the simple bound represents the best ensemble-average performance, and may not accurately reflect the individual PA code being simulated. Nevertheless, we see that the bound is fairly tight. It provides a useful indication of the code performance at SNRs below the cutoff rate, and, at high SNRs, it joins with the union bound to predict the error floor.

3.2. Threshold computation via the iterative analysis

The ML performance bound evaluated in the previous subsection factors in the finite length of a PA code ensemble,
but the assumption of an ML decoder may be optimistic. Below, we account for the iterative nature of the practical decoder and compute an asymptotic iterative threshold using the renowned method of density evolution [11]. A useful tool for analyzing the iterative decoding process of sparse-graph codes, density evolution examines the probability density function (pdf) of the exchanged messages in each step and can, literally speaking, track the entire decoding process. Here, we are more interested in the asymptotic SNR thresholds, η, which are defined as the critical channel condition that is required for the decoding process to converge unanimously to the correct decision:

    η(dB) = min{ SNR : lim_{l→∞} ∫_{−∞}^{0} f_{L_y^{(l)}}(ζ) dζ = 0 },   (13)

where y = ±1 is the BPSK-modulated signal, and f_{L_y^{(l)}} denotes the pdf of the LLR information on y after the lth decoding iteration.

3.2.1. Initial LLR pdf from the channel

Tracking the density of the messages requires the computation of the initial pdf of the LLR messages from the channel. Hou et al. showed in [14] that the pdf of the LLRs from independent Rayleigh channels with perfect CSI is given by (assuming BPSK signaling and that the all-zero sequence is transmitted)

    f_{L_ch,y}^{CSI}(ζ) = ∫_0^∞ √( N0/(16πα²) ) exp( −(ζ − 4α²/N0)² N0/(16α²) ) p(α) dα.   (14)

Using integrals from [15], we further simplify (14) to

    f_{L_ch,y}^{CSI}(ζ) = ( N0/(4√(1 + N0)) ) exp( (ζ − |ζ| √(1 + N0)) / 2 ).   (15)

For the case when CSI is not available to the receiver, we assume that the Rayleigh-faded and AWGN-corrupted signals follow a Gaussian distribution in the most probable region. The pdf of the initial messages is then derived as

    f_{L_ch,y}^{NCSI}(ζ) = ( √(Δ² N0)/π ) κ^{N0} ( κ + 2Δζ Q(−Δζ/√π) ),   (16)

where Δ = √( N0/(2(N0 + 1)) ), κ = exp(−Δ²ζ²/(2π)), and Q(x) = (1/√(2π)) ∫_x^∞ e^{−z²/2} dz.

3.2.2. Evolution of the LLR pdf in the decoder

To track the evolution of the pdf's along the iterative process, one can either employ Monte Carlo simulation, or, more accurately and more efficiently, proceed analytically through discretized density evolution. The latter is possible due to the simplicity in the code structure and in the decoding algorithm of PA codes. As a self-contained discussion, we summarize the major steps of the discretized density evolution of PA codes in the Appendix; for details, please refer to [4]. Using (15) for the perfect-CSI case or (16) for the no-CSI case (i.e., substituting them in (A.3) to (A.5) in the Appendix), the thresholds of PA codes on Rayleigh channels can be computed through (A.12) in the Appendix. Although the Gaussian approximation is reported to incur only very little inaccuracy on AWGN channels [12, 13], the deviation is larger on fading channels, since the pdf of the initial LLRs from a fading channel looks different from a Gaussian distribution. Hence, exact density evolution is used to preserve accuracy.

Figure 2: Divsalar simple bounds for R = 0.5 PA codes, K = 512, independent fading; simulated BER and simple bounds are shown with and without CSI, with simulations evaluated after the 50th iteration.

Figure 3: Thresholds computed using density evolution and simulations (data block size K = 64K), for rates 1/2 and 2/3, with and without CSI.

The computed thresholds are a good indication of the performance limit as the code length and the number of iterations increase without bound. Figure 3 plots the thresholds as well as the simulation results of PA codes on independent Rayleigh channels with and without CSI. We see that the analytical results are consistent with the simulation results for fairly large block sizes; as the block size and the number of iterations continue to increase, we expect the actual performance to converge to the thresholds. Table 1 compares the thresholds of PA codes with those of LDPC codes for several code rates; the ergodic capacity of the independent Rayleigh fading channel is also listed as reference. We see that the thresholds of PA codes are about 0.6 dB from the channel capacity.
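The closed-form initial density can be sanity-checked numerically: it should integrate to one, and its negative tail should match the Monte Carlo probability that the channel LLR L = 4α_k r_k/N0 is negative under all-zero ("+1") transmission. The formula below is our reading of the CSI closed form, f(ζ) = N0/(4√(1+N0)) · exp((ζ − |ζ|√(1+N0))/2):

```python
import math, random

def f_llr_csi(z, N0):
    """Initial LLR pdf on an i.i.d. Rayleigh channel with perfect CSI,
    all-zero codeword (BPSK 0 -> +1); our reading of the closed form."""
    s = math.sqrt(1.0 + N0)
    return N0 / (4 * s) * math.exp((z - abs(z) * s) / 2)

N0 = 1.0
zs = [-80 + 0.01 * i for i in range(16001)]
area = sum(f_llr_csi(z, N0) for z in zs) * 0.01          # should be ~1
neg_tail = sum(f_llr_csi(z, N0) for z in zs if z < 0) * 0.01

rng = random.Random(11)
trials, neg = 200_000, 0
for _ in range(trials):
    # normalized Rayleigh amplitude (E[a^2] = 1) and AWGN, r = a*(+1) + n
    a = math.hypot(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
    r = a + rng.gauss(0, math.sqrt(N0 / 2))
    neg += (4 * a * r / N0) < 0
assert abs(area - 1.0) < 5e-3
assert abs(neg / trials - neg_tail) < 5e-3
```

The negative-tail mass is exactly the uncoded BER of coherent BPSK on this channel, which is why density evolution seeded with this pdf reproduces the correct starting point of the decoder.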
Table 1: Thresholds (Eb/N0 in dB) of PA codes on flat Rayleigh fading channels with and without CSI, compared against (3, ρ)-regular LDPC codes and the ergodic channel capacity, for several code rates ((3, ρ) LDPC data by courtesy of Hou et al. [14]).

3.3. Simulation with coherent detection

To benchmark the performance of coherently detected PA codes, several PA configurations are simulated on correlated and independent Rayleigh fading channels. Perfect CSI is assumed available to the receiver. In each global iteration (i.e., an iteration between the inner decoder and the outer decoder), two local iterations of the outer decoding are performed. This scheduling is found to strike the best tradeoff between complexity and performance (with coherent detection).

3.3.1. Coherent BPSK on independent Rayleigh channels

Figure 4 shows the performance of rate-1/2 PA codes on independent Rayleigh fading channels with and without channel state information. Data block sizes from short to large (512, 1K, 4K, and 64K) are evaluated to demonstrate the interleaving gain, and bit error rates after 20, 30, and 50 (global) iterations are plotted. The simulated performance degradation due to the lack of CSI is about 0.9 dB, which is consistent with the gap between the respective channel capacities. Simulations at large block sizes are about 0.4 dB from the thresholds. Compared to the thresholds of LDPC codes reported in [14], the performance of a rate-1/2, codeword length N = 128 × 1024 ≈ 1.3 × 10^5 PA code is from about 0.2 to 0.25 dB better than that of regular LDPC codes of length N = 10^5 and 10^6 on independent Rayleigh channels. It is possible that optimized irregular LDPC codes will outperform PA codes (as indicated by their thresholds), but it should be noted that these irregular LDPC codes are specifically optimized for Rayleigh fading channels and have a maximum variable node degree of 50. It is fair to say that PA codes perform on par with LDPC codes (using coherent detection).

Figure 4: Performance of PA codes on independent Rayleigh fading channels, R = 1/2, data block sizes K = 512, 1K, 4K, and 64K, after 20, 30, and 50 iterations: (a) with CSI; (b) without CSI. The respective Shannon limits are marked for reference.

3.3.2. Coherent BPSK on correlated Rayleigh channels

Figure 5 shows the performance of PA codes on correlated fading channels. Short PA codes with rates 1/2 and 3/4 are simulated on two common fading scenarios with normalized Doppler spreads f_d T_s = 0.01 and 0.001, respectively, and an interleaver exists between the PA code and the channel (to partially break up the correlation between the neighboring bits). As expected, the performance deteriorates rapidly as f_d T_s decreases, since a slower Doppler rate brings a smaller diversity order. Due to the interleaver between the PA code and the channel, the impact of a slow Doppler rate is less severe for larger block sizes than for smaller ones: whereas the K = 1K PA code loses about 7 dB at BER = 10^−4 as f_d T_s changes from 0.01 to 0.001, the loss with the K = 4K PA code is less than 5 dB.

To illuminate how well short PA codes perform on correlated channels, we compare them with turbo codes (which are the best-known codes at short code lengths) in Figure 5. The comparing turbo code has 16-state component convolutional codes whose generator polynomial is (1, 35/23)_oct and which are decoded using the log-domain BCJR algorithm. The code rate is 0.75, the data block size is 4K, S-random interleavers are used in both codes to lower the possible error floors, and the curves plotted are for PA codes at the 10th iteration and turbo codes at the 6th iteration. We observe that turbo codes perform about 0.6 and 0.7 dB better than PA codes for f_d T_s = 0.01 and 0.001, respectively. However, it should be noted that this performance gain comes at the price of a considerably higher complexity: while the message-passing decoding of a rate-0.75 PA code at the 10th iteration requires about 267 operations per data bit [4], the log-domain BCJR decoding of a rate-0.75 turbo code at the 6th iteration requires as many as 9720 operations per data bit, a complexity 35 times larger. In practice, for complexity concerns, PA codes are still attractive for providing good performance at low cost.

Figure 5: Performance of PA codes on correlated Rayleigh fading channels with CSI, compared with turbo codes; code rate 0.75, data block size 4K, f_d T_s = 0.01 and 0.001; 10 iterations for PA codes and 6 iterations for turbo codes.

4. NONCOHERENT DETECTION OF PA CODES

This section considers noncoherent detection. PA codes are inherently differentially encoded, which makes them convenient for noncoherent differential detection. The channel model of interest is a Rayleigh fading channel with correlated fading coefficients, where the channel amplitudes (α_k's) and phases (θ_k's) are correlated, and the complex white Gaussian noise samples (n_k's) are independent. The channel reception is given by r_k = α_k e^{jθ_k} y_k + n_k. The differential encoder implements y_k = x_k y_{k−1} for x_k, y_k ∈ {±1} (BPSK signal mapping 0 → +1, 1 → −1). In theory, differential decoding does not require pilot symbols. In practice, to avoid catastrophic error propagation in differential decoding, pilot symbols are inserted periodically even with multiple-symbol detection; this is particularly so for the fast fading case, where the phases (θ_k) change rapidly (as will be shown later). Hence, some of the r_k's (and y_k's) in the received sequence are pilot symbols.

4.1. Iterative differential detection and decoding

Although multiple-symbol differential detection is possible, for complexity concerns, we consider a simple iterative differential detection and decoding (IDDD) receiver, whose structure is shown in Figure 6. The IDDD receiver consists of a conventional differential detector with a 2-symbol observation window (the current and the previous symbol), a phase tracking filter, and the original PA decoder (the one used in coherent detection [4]). A trellis structure is employed to assist the detection and decoding of the inner differential code 1/(1 + D), but unlike the case of multiple-symbol detection, the trellis is not expanded and has 2 states only. Soft information is passed back and forth among the different parts of the receiver conforming to the turbo principle. Let x denote the input to the inner differential encoder, or the output from the outer code, and let y denote the output from the differential encoder, or the symbol to be put on the channel (see Figure 6). We use L to denote LLR information, superscript (q) to denote the qth (global) iteration, and subscripts i, o, ch, and e to denote the quantities associated with the inner code, the outer code, the fading channel, and the extrinsic information, respectively.
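A minimal sketch of this differential front end: with y_k = x_k y_{k−1} and a carrier phase that is unknown but, for illustration, constant and noiseless, the two-symbol detector u_k = Re(r_k r*_{k−1}) recovers x_k with no phase synchronization at all:

```python
import cmath, math, random

def diff_encode(x, y0=+1):
    """y_k = x_k * y_{k-1} with BPSK symbols in {+1, -1} (0 -> +1, 1 -> -1)."""
    y, prev = [], y0
    for s in x:
        prev *= s
        y.append(prev)
    return y

def cdd(r):
    """Conventional differential detection over a 2-symbol window:
    u_k = Re(r_k * conj(r_{k-1})), hard decision sign(u_k).
    r[0] serves as the reference symbol y_0."""
    return [+1 if (r[k] * r[k - 1].conjugate()).real >= 0 else -1
            for k in range(1, len(r))]

rng = random.Random(9)
x = [rng.choice((+1, -1)) for _ in range(200)]
y = [+1] + diff_encode(x)                    # prepend the reference symbol
theta = 2 * math.pi * rng.random()           # unknown carrier phase
r = [cmath.exp(1j * theta) * s for s in y]   # noiseless rotated reception
assert cdd(r) == x                           # phase ambiguity fully resolved
```

With noise and a time-varying phase these hard decisions degrade (the 4 to 5 dB CDD loss discussed earlier), which is the gap the iterative receiver is designed to close.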
and “the extrinsic”. N0 Lch (xk ) ≈ 2uk 2. Conventional differential detector for the ﬁrst decoding iteration With the assumption that the carrier phases are near constant between two neighboring symbols. We propose a simple approximation which evaluates (17) with α substituted by its mean E[α]. but it is understood that proper interleaving and deinterleaving is performed whenever needed. instead of using the conventional diﬀerential decoding in the ﬁrst iteration. an exact evaluation of (18) and hence the computation of Lch (xk ) can be diﬃcult. x) fα (α) dα (18) fU α. we have ignored the existence of the random interleaver. b) is the Marcum Qfunction. 2N0 + N0 0 < xu < ∞. Computing soft information Lch (xk ) from uk requires the knowledge of the pdf of uk .o in the next detection/decoding iteration. (17) Alternatively. are fed into the conventional diﬀerential detector which computes uk = ∗ Real(rk rk−1 ) and subsequently soft LLR Lch (xk ) from uk . code. Under this Gaussian assumption.1. 4N0 4xu . e. 4N0 4uk  N0 . the fading channel. (20) An even more convenient compromise is to assume uk is Gaussian distributed. The conditional pdf of uk given αk and xk is [16] ⎧ xu − α2 /2 ⎪ 1 ⎪ ⎪ exp . 2 ∞ 0 Since the computation of Marcum Qfunction is slow and does not always converge at large values. N0 (19) 0 < xu < ∞. ≈ sign (uk ) (21) (22) α2 .X (u  α. ⎪ ⎪2N0 N0 ⎪ ⎪ ⎪ ⎨ xu − α2 /2 fU α. rk .1. the switch in Figure 6 is ﬂipped up. Here ∗ denotes the complex conjugate. N0 4xu . 4. The samples of the received symbols. in return.X (u  α. as is used in [17] and a few other papers. respectively. we get 2 fU X (u  x) ≈ N x. detector 1/(1 + D) inner decoder (detector) Channel estimator (ﬁlter) x π −1 Outer decoder π Iterative diﬀerential detector and decoder Figure 6: Structure of iterative diﬀerential detection and decoding receiver. x) αe−α dα. which. Lch (xk ) is then treated as L(1) (xk ) and fed into the outer decoder. diﬀ. 
This leads to ⎧ xu − π/8 ⎪ 1 ⎪ .8 EURASIP Journal on Wireless Communications and Networking αe jθ n y Channel r Conv. It is then possible to get the true pdf of uk using fU X (u  x) = =2 ∞ 0 fU α. ⎪ ⎪ 2N exp ⎪ 0 N0 ⎪ ⎪ ⎪ ⎨ xu − π/8 fU X (u  x) ≈ ⎪ 1 Q ⎪ 2N exp ⎪ 0 N0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ −∞ < xu ≤ 0. Starting from the second iteration.2. 2N0 + N0 .1. the switch in Figure 6 is ﬂipped down. In the ﬁrst iteration. In the above e.X (u  α. a channel estimation followed by the decoding of the inner 1/(1 + D) code can . x) =⎪ 1 exp Q ⎪ ⎪2N0 N0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ −∞ < xu ≤ 0.i discussion. 4. where Q(a. the outer code.i generates L(1) (xk ) and passes it to the inner decoder for use e.o e. the conventional Δ diﬀerential detector (in the ﬁrst iteration) performs uk = ∗ Real (rk rk−1 ). Hard decision of xk is obtained by simply checking the sign of uk . IDDD receiver Here is a sketch of how the proposed IDDD receiver operates. a decision is made by combining the extrinsic information from both the inner and the outer decoders: xk = sign(L(Q) (xk ) + L(Q) (xk )). π . The corresponding LLR from the channel can then be computed by Lch (xk ) = log Pr (uk  xk = +1) Pr (uk  xk = −1) 2uk  + log Q N0 π2 . After Q iterations. and channel estimation for αk and θk is performed before the “coherent” detection and decoding of the inner and outer code.
or a simple “moving average” can be used [6].

Channel estimator. The channel estimator in the IDDD receiver (Figure 6) may be implemented in several ways. Here we use a linear filter of (2L + 1) taps to estimate the α_k's and θ_k's in the qth iteration:

  α_k^(q) e^{j θ_k^(q)} = Σ_{l=−L}^{L} p_l ŷ^{(q−1)}_{k−l} r_{k−l},    (23)

where p_l denotes the coefficient of the lth filter tap, and ŷ_k^{(q−1)} denotes the estimate of y_k obtained from the feedback of the previous iteration. For soft feedback, ŷ_k^{(q−1)} is computed using ŷ_k^{(q−1)} = tanh(L_{e,i}^{(q−1)}(y_k)/2), and for hard feedback, ŷ_k^{(q−1)} = sign(L_{e,i}^{(q−1)}(y_k)). The LLR message L_{e,i}^{(q−1)}(y_k) is generated together with L_{e,i}^{(q−1)}(x_k) by the inner decoder in the (q−1)th decoding iteration (please refer to [4] for the step-by-step message-passing decoding algorithm of the 1/(1 + D) code). In the first iteration, the L_{e,i}^{(0)}(y_k)'s are initiated as zeros for coded bits and as a large positive number (i.e., +∞) for pilot symbols, which makes the first iteration exactly the same as subsequent iterations. This third option then leads to pilot symbol assisted modulation (PSAM), which has slightly higher complexity than using differential detection in the first iteration.

Regarding the choice of the filter, we take a Wiener filter, since it is known to be optimal for estimating the channel gain in the minimum mean-square-error (MMSE) sense. The filter coefficients, p_{−L}, p_{−L+1}, ..., p_L, are obtained from the Wiener-Hopf equation

  (R_{−L}, R_{−L+1}, ..., R_L)^T = M (p_{−L}, p_{−L+1}, ..., p_L)^T,    (24)

where M is the (2L + 1) × (2L + 1) Toeplitz matrix with diagonal entries R_0 − N_0 and off-diagonal entries M_{ij} = R_{i−j}, and R_k = (1/2) J_0(2kπ f_d T_s). The computation of the p_l's from (24) involves an inverse operation on a matrix, a one-time job once the correlation of the fading process, the R_k's, is known [18]; it may not be computable, however, when the matrix becomes (near) singular, which occurs when the channel is very slow fading. In such cases, a low-pass filter or a simple moving average can be used instead.

To see how accurate the above treatments are, we plot in Figure 7 several curves approximating the pdf of u_k (which can be regarded as the numerical evaluation of (18)). From the most sharp and asymmetric to the least sharp and symmetric, these curves denote the exact pdf of f_{U|X}(u | x = +1) from Monte Carlo simulations (histogram), the “mean-α approximated” pdf from (19), and the Gaussian approximated pdf from (21). From the figure, the Gaussian approximation does not reflect the true pdf well, but this inaccuracy turns out not to severely affect the overall IDDD performance. We attribute this to the fact that the inaccuracy affects mostly the first iteration, and subsequent iterations can help mitigate the loss. Thus, the Gaussian approximation still presents itself as a simple and viable approach for noncoherent differential decoding. As shown later in Figure 13, all three treatments (Gaussian approximation, mean-α approximation, and PSAM) result in very similar decoding performance.

Figure 7: Distribution of u_k = Re{y_k y*_{k−1}} in a conventional differential detection (assume “+1” transmitted), E_s/N_0 = 6 dB. Curves shown: the true pdf f(u) (Monte Carlo), f(u | α) using E[α] for α, and the Gaussian approximation N(1, 2N_0 + N_0^2).

4. ANALYSIS OF PILOT INSERTION THROUGH EXIT CHARTS

4.1. EXIT charts

We perform EXIT analysis [9] to generate further insights into PA codes and the proposed noncoherent IDDD receiver. EXIT charts were previously used to depict the characteristics and relations of the component decoders, allowing the prediction of the decoding convergence and thresholds [9]. In EXIT charts, the exchange of extrinsic information is visualized as a decoding/detection trajectory. Several quantities, like the bit error rate, the mean of the extrinsic LLR information, and the equivalent SNR value, can be used to track the iterative process, but the mutual information is shown to be the most robust among all [9]. The mutual information between the binary bit y_k and its corresponding LLR values is defined as

  I(Y; L(Y)) ≜ (1/2) Σ_{y=±1} ∫_{−∞}^{+∞} f_{L(Y)}(η | Y = y) · log₂ [ 2 f_{L(Y)}(η | Y = y) / ( f_{L(Y)}(η | Y = +1) + f_{L(Y)}(η | Y = −1) ) ] dη
            = ∫_{−∞}^{+∞} f_{L(Y)}(η | Y = +1) · log₂ [ 2 f_{L(Y)}(η | Y = +1) / ( f_{L(Y)}(η | Y = +1) + f_{L(Y)}(−η | Y = +1) ) ] dη
            = 1 − ∫_{−∞}^{+∞} f_{L(Y)}(η | Y = +1) · log₂ (1 + e^{−η}) dη,    (25)
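The Wiener-Hopf solve in (24) is easy to sketch numerically. The snippet below is an illustration, not the paper's implementation: the function name `wiener_taps` and all parameter values are our own, and it uses the common MMSE convention of adding the noise term on the diagonal of the Toeplitz observation matrix (the sign convention in (24) may differ).

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.special import j0

def wiener_taps(L, fd_Ts, N0):
    """Solve a Wiener-Hopf system for a (2L+1)-tap channel interpolator.

    R_k = 0.5 * J0(2*pi*k*fd*Ts) is the Jakes fading autocorrelation.
    Observation matrix: Toeplitz(R_{i-j}) with N0 added on the diagonal
    (common MMSE form). Cross-correlation at lag 0: R_{-L}, ..., R_L.
    """
    lags = np.arange(-L, L + 1)
    r = 0.5 * j0(2 * np.pi * np.abs(lags) * fd_Ts)     # cross-correlation
    col = 0.5 * j0(2 * np.pi * np.arange(2 * L + 1) * fd_Ts)
    col[0] += N0                                       # noise on the diagonal
    return solve_toeplitz(col, r)                      # one-time (2L+1) solve

taps = wiener_taps(L=8, fd_Ts=0.01, N0=0.5)
```

As expected for a symmetric interpolation problem, the taps come out symmetric and peaked at the center; a near-singular Toeplitz matrix (very slow fading, tiny N0) is exactly the failure mode discussed above.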
4%.i /Ia. 4. 10%.8 Ie. 4. pilot symbols become the primary contributor to a priori information. since segmenting the trellis into small chunks signiﬁcantly increases the number of short error events. since the energy compensation for the rate loss due to excessive pilot more than outweighs the gain that can be obtained by a ﬁner channel tracking.5 0. such that pilot symbols are used to estimate the channel and at the same time participate in the trellis decoding. sponding to the diﬀerential decoder with 0%. 20% pilots Es /N0 = 0.75. We see that more pilot symbols actually degrade the performance. pilot symbols no longer constitute the key source of a priori information. the four curves in each SNR set are given the same energy per transmitted symbol and perfect knowledge on the fading phase and amplitude is provided to all the decoders (irrespective of the number of pilot symbols). perfect CSI. This is because when little information is provided from the outer code.75 dB. even provided with a perfect . 10. There exist at least two ways to insert pilot symbols in a diﬀerential encoder. since the initial density function evaluated in the latter cases is but an approximation of the actual pdf of the LLR messages. The number of pilot symbols inserted should be suﬃcient to attain a reasonable track of the channel. 20% pilots Out code of PA codes R = 0. 0. This tradeoﬀ issue has long been noted in literature. but it is not preserved on fading channels without CSI or with estimated (thus imperfect) CSI. but little attention has been given to another issue of no less importance. Es /N0 = 4. where L(Y ) is either the a priori information La (Y ) or the extrinsic information Le (Y ).o .o 0.75 dB and 0 dB. Pilot symbol insertion A practicality issue about noncoherent detection is pilot insertion. Thus the diﬀerence between the curves in each family is only due to the diﬀerence in pilot spacing. 
rendering an opposite eﬀect to spectrum thinning and thus deteriorating the performance. At the left end of the curves (when input mutual information is small).9 0. and 20% pilot insertion are plotted for two diﬀerent SNR values.7 0.6 0. The second equality holds when the channel is output symmetric such that fL(y) (η  Y = − y) = fL(y) (−η  Y = y). Thus. It is worth noting.5 0 0.i /Ie.o 0. Note that the consistency condition is an invariant in the messagepassing process on a number of channels including the AWGN channel and the independent Rayleigh fading channel with perfect CSI. 10.4 Ia.5 dB Es /N0 = 4. with 20% of pilot insertion (pilot spacing is 5). To eliminate the impact of other factors. 0.2.i /Ia.10 p p EURASIP Journal on Wireless Communications and Networking p p (a) (b) Figure 8: Trellis diagram of binary diﬀerential PSK with pilot insertion. how pilots should be inserted when diﬀerential encoding or other trellisbased coding/modulation frontend is used.75 Outer code of PA codes R = 0. and consequently incurs a loss in performance. and fL(Y ) (η  Y = y) is the conditional pdf.8 1 Figure 9: The eﬀect of pilot symbols segmenting the trellis on the performance of the diﬀerential decoder. for example. denoted as Ie.o . Normalized Doppler rate fd Ts = 0. but actually degrade the overall performance.2.2 0. denoted as Ia. this turns out to be a bad strategy. (a) Pilot symbols periodically terminate the trellis. 7].4 Eﬀect of pilots segmenting the trellis.01. EXIT curves corre 1 0. (25) should be used to compute the mutual information in those cases. the reason being. 0. and the Yaxis to represent the mutual information from the inner code or to the outer code.6 0. namely. However. 4. (b) Pilot symbols are separated from the trellis structure. as shown in Figure 8(a). Seemingly plausible. The negative eﬀect of trellis segmentation is best illustrated by the EXIT chart in Figure 9. 
a larger number of pilot symbols correspond to a better performance (a higher output mutual information). the situation is completely reversed toward the right end of the EXIT curves. The performance loss is more severe when more pilot symbols are inserted and when the code is operating at a relatively low SNR level. Many researchers have reported that excessive pilot symbols not only cause wasteful bandwidth expansion. given suﬃcient information provided by the outer code. they segment the trellis and shorten error events. We use the Xaxis to represent the mutual information to the inner code (a prior) or from the outer code (extrinsic).i /Ie. on the other hand.5 dB. Es /N0 = 4. but not in excess. The widespread approach is to periodically terminate the trellis [6. and the third equality holds when the received messages satisfy the consistency condition (also known as the symmetry condition): fL(y) (η  Y = y) = fL(y) (−η  Y = y)e yη [11].
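Under the consistency condition, the last line of (25) reduces the mutual information to I = 1 − E[log₂(1 + e^{−η})], which is straightforward to estimate by Monte Carlo. A minimal sketch (the function name and the sampled operating points are our own; we assume consistent Gaussian LLRs, whose mean equals half their variance):

```python
import numpy as np

def mutual_info_mc(mu, sigma, n=200_000, seed=0):
    """Monte-Carlo estimate of I(Y; L) = 1 - E[log2(1 + exp(-eta))],
    valid when the LLRs eta (given Y = +1) satisfy the consistency
    condition; here eta ~ N(mu, sigma^2) with mu = sigma^2 / 2."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(mu, sigma, n)
    # log2(1 + e^-eta) computed stably via logaddexp
    return 1.0 - np.mean(np.logaddexp(0.0, -eta)) / np.log(2.0)

# I grows monotonically from 0 toward 1 as the LLRs become more reliable
vals = [mutual_info_mc(s * s / 2, s) for s in (0.5, 1.0, 2.0, 4.0)]
```

This is the same mapping (often called the J-function) that underlies the EXIT curves plotted in this section.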
The implication of this EXIT analysis is that the widespread approach of inserting pilot symbols as part of the trellis could cause deficiency for differential encoding (and other serially concatenated schemes with an inner trellis code). In this specific case, due to trellis segmentation, the trellis decoder fails to produce sufficient output mutual information I_{e,i} even when provided with perfect mutual information from the outer code (I_{a,i} = 1, i.e., the input information is perfect but the channel remains noisy). Unless the outer code is itself a capacity-achieving code at some SNR, the inner EXIT curve is then bound to intersect the outer EXIT curve at a rather early stage of the iterative process, which will result in convergence failure and cause error floors; the iterative decoder fails at a high BER level (not to mention that this EXIT curve has 20% more energy consumption than the no-pilot case). It is therefore particularly important to keep the number of pilot symbols in such schemes minimal, so that error floors do not occur too early.

It should be pointed out that the level of the impact caused by trellis segmentation may be very different for different outer codes. Many (outer) codes, including single parity check codes, block turbo codes (i.e., turbo product codes), and convolutional codes, will see a large impact, since these (outer) codes require sufficient input information in order to produce perfect output information; these codes alone are not “good” codes (good in the sense MacKay defined in [2]). On the other hand, “good” codes like ideal LDPC codes will likely see a much smaller impact. This is because an ideal LDPC code has an EXIT curve shaped like a box (see, e.g., [3, Figure 3]), which can produce perfect output information as long as the input information is above some threshold (without requiring I_{a,i} = 1). Alternatively, one may also interpret it as: ideal LDPC codes have large minimum distances and are capable of correcting short error events, including those caused by the segmentation effect.

This analysis also suggests an alternative, and potentially better-performing, way of pilot insertion, namely, separating pilots from the trellis so that they do not affect error events; see Figure 8(b). To verify the analytical results, we simulate the performance of a rate-1/2, data block size K = 32 K PA code with different strategies of pilot insertion; see Figure 10. Solid lines represent the cases where perfect channel knowledge is known to the receiver (with 0% or 10% pilot insertion), and dashed lines represent the cases where noncoherent detection is used. The dashed curve corresponds to the same PA code noncoherently detected via the IDDD receiver discussed before, where 10% of pilot symbols are inserted using the strategy in Figure 8(b) and where an 81-tap Wiener filter is used to estimate the channel. For comparison, we also plot the case where pilot symbols periodically terminate the trellis. Comparing the curves, we see that a drastic performance gap results from the different strategies of pilot insertion: specifically, trellis-segmented pilot insertion loses more than 3 dB at a BER of 10^{−4} compared with separate insertion. It is interesting to note that, if one overlooks the impact of pilot insertion strategies, one might arrive at the paradoxical result that noncoherent detection (dashed line) performs (noticeably) better than coherent detection (rightmost solid line)!

Figure 10: Performance of PA codes with different pilot insertion strategies. Code rate 0.5, data block size 32 K, normalized Doppler rate f_d T_s = 0.01, 10 iterations. Curves: 0%, ideal; 10%, ideal; 10%, pilots terminate trellis; 10%, pilots separated.

4.3. Impact of the pilot symbol spacing and filter length

We now investigate how the number of pilot symbols and the length of the estimation filter affect the performance of noncoherent detection. Figure 11 illustrates the impact of different pilot spacing on the BER performance on fast fading channels, where several normalized Doppler spreads (e.g., f_d T_s = 0.01, 0.02, 0.05) are considered. We observe the following. (1) The IDDD receiver is rather robust for different Doppler rates. At the normalized Doppler rate of 0.01 (already fast fading), noncoherently detected PA codes tolerate pilot spacing as small as 6 symbols and as large as 45 to 50 symbols (putting aside the bandwidth issue), but at the very fast Doppler rate of 0.05, pilot spacing beyond 7–9 symbols will soon cause drastic performance degradation. (2) Smaller pilot spacing, such as <6 symbols, is undesirable, since the consumption of additional energy more than outweighs any gain it may bring; IDDD thus experiences inferior performance when pilot spacing is very small. (3) The code performance at high Doppler rates is more sensitive to pilot spacing than that at lower Doppler rates.

Compared to differentially encoded turbo codes [6], PA codes appear to require fewer pilot symbols (we note that in the study of differentially encoded turbo codes in [6], the authors terminated the trellis periodically with pilot symbols, by segmenting the trellis every 10 symbols, which may have made the tolerant range of pilot spacing, at the small-spacing end, smaller than otherwise).
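The intersection argument above can be mimicked numerically: a decoding tunnel exists only if the inner transfer curve stays strictly above the inverse of the outer curve over the whole range. The sketch below uses synthetic curves of our own design (not the measured curves of Figure 9) to illustrate the trellis-segmentation effect: pilots raise the inner curve at the left end but cap it below 1 at the right end, closing the tunnel.

```python
import numpy as np

def tunnel_open(inner, outer_inv, n=1000, eps=1e-9):
    """Check whether an EXIT 'decoding tunnel' exists: the inner-decoder
    curve must stay strictly above the inverted outer-code curve on [0, 1)."""
    I = np.linspace(0.0, 1.0, n, endpoint=False)
    return bool(np.all(inner(I) > outer_inv(I) + eps))

# Synthetic, illustrative transfer curves (hypothetical shapes):
inner_no_pilots = lambda I: 0.15 + 0.85 * I    # lower start, reaches 1 at I_a = 1
inner_many_pilots = lambda I: 0.30 + 0.55 * I  # higher start, capped at 0.85
outer_inv = lambda I: 0.88 * I                 # a weak outer code, axes swapped

open_no = tunnel_open(inner_no_pilots, outer_inv)      # tunnel stays open
open_many = tunnel_open(inner_many_pilots, outer_inv)  # curves cross before I = 1
```

With these shapes the pilot-heavy inner curve intersects the outer curve near the right end, exactly the failure mode the EXIT analysis predicts for trellis-segmenting pilots.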
The impact of the length of the channel tracking filter is also studied. We observe that, while the filter length affects the overall performance, the impact is limited compared to that of the pilot spacing; we therefore omit the plot. We have chosen the parameters used below on the basis of a set of simulations, trading off between performance and complexity.

Figure 11: Effect of the number of pilot symbols on the performance of noncoherently detected PA codes on correlated Rayleigh channels, E_b/N_0 = 10 dB.

4.4. Simulation results of noncoherent detection

The performance of noncoherently detected PA codes on fast Rayleigh fading channels is presented below. Unless otherwise indicated, the BER curves shown are after 10 global iterations, and in each global iteration, 4 to 6 local iterations of the outer code are performed.

4.4.1. Noncoherent detection of PA codes with different receiver strategies

We compare the BER performance of four types of IDDD strategies for a K = 1 K, R = 3/4 PA code on an f_d T_s = 0.01 Rayleigh fading channel in Figure 12, where a family of BER-versus-E_b/N_0 curves is plotted. “IDDD1” uses the conventional differential detection with Gaussian approximation (22) to compute L_ch(x_k) in the first iteration, and soft feedback of ŷ_k in all iterations to assist channel estimation. “IDDD2” uses conventional differential detection with the “mean-α” approximation (20) in the first iteration and soft feedback in all iterations. “IDDD3” is PSAM with soft feedback, and “IDDD4” is PSAM with hard feedback. We observe that different decoding strategies in the first iteration do not affect the performance much, and the performance is not very sensitive to hard or soft feedback either. Although not shown, simulations of a long PA code (K = 48 K) of the same (high) rate (R = 3/4) reveal a similar phenomenon. This is consistent with what has been reported in other studies [6] and is not a new discovery. It is possible, however, that other codes may be more sensitive to the difference in decoding strategies, especially the difference in the feedback information [6].

Figure 12: Comparison of BER performance for several noncoherent receiver strategies on correlated Rayleigh channels with f_d T_s = 0.01. Code rate 0.75, data block size 1 K, filter length 65, 10 (global) iterations each with 4 (local) iterations for the outer decoding. Curves: IDDD1, Gauss approximation, soft feedback; IDDD2, mean-α approximation, soft feedback; IDDD3, PSAM, soft feedback; IDDD4, PSAM, hard feedback.

4.4.2. Comparison of noncoherent detection with coherent detection

Figure 13 shows the performance of rate-3/4 PA codes after 10 iterations on fast Rayleigh fading channels with Doppler rate f_d T_s = 0.01. A short block size of 1 K and a large block size of 48 K are evaluated. In each case, 4% of pilot symbols are inserted (4% of bandwidth expansion) and the curves shown are after 10 iterations. The three leftmost curves are the ideal coherent case with knowledge of the fading amplitudes and phases provided to the receiver, and the two right curves are the noncoherent case where IDDD is used to track amplitudes and phases. In both the coherent and the noncoherent cases, trellis segmentation incurs a small performance loss, but since the pilot spacing is not very small (every 25 symbols), the effect is not as drastic as the case in Figure 10. The noncoherent cases are about 1 dB and 0.55 dB away from the ideal coherent case at a BER of 10^{−4} for block sizes of 48 K and 1 K, respectively, accounting for the rate loss due to pilot insertion. This satisfying performance is achieved with only 4% of pilot insertion and a very low-complexity IDDD receiver.
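The soft and hard feedback rules compared above (ŷ = tanh(L/2) versus ŷ = sign(L)) are one-liners; the sketch below (function name and sample values are our own) also shows that pilot positions, initialized with L = +∞, feed back exactly +1 under either rule.

```python
import numpy as np

def feedback_symbols(llrs, soft=True):
    """Re-modulated symbol estimates fed back to the channel estimator:
    soft feedback y_hat = tanh(L/2) (the conditional mean of y given L);
    hard feedback y_hat = sign(L). Pilot positions carry L = +inf, so
    both rules return exactly +1 there."""
    llrs = np.asarray(llrs, dtype=float)
    return np.tanh(llrs / 2.0) if soft else np.sign(llrs)

llrs = np.array([2.0, -0.5, np.inf, 0.1])   # third entry: a pilot symbol
soft = feedback_symbols(llrs, soft=True)    # fractional magnitudes in (-1, 1]
hard = feedback_symbols(llrs, soft=False)   # +/-1 decisions
```

Soft feedback automatically de-weights unreliable symbols in the Wiener filter input, which is one intuition for why the two rules perform so similarly once the decoder is reliable.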
Figure 13: Comparison of BER performance for several transmission/reception strategies for PA codes of large and small block sizes on correlated Rayleigh channels with f_d T_s = 0.01. Code rate 0.75, data block sizes 48 K and 1 K, filter length 65, 4% of bandwidth expansion, 10 (global) iterations each with 4 (local) iterations for the outer decoding. Curves: 0%, ideal; 4%, ideal; 4%, IDDD, pilots separated from trellis; 4%, IDDD, pilots terminate trellis.

5. CONCLUSION

This paper performs a comprehensive study of product accumulate codes on Rayleigh fading channels with both coherent and noncoherent detection. Useful analytical tools, including Divsalar's simple bounds, density evolution, and EXIT charts, are employed, and extensive simulations are conducted. Previous work has established product accumulate codes as a class of provenly “good” codes on AWGN channels, with low linear-time complexity and performances close to the Shannon limit. It is shown here that PA codes not only perform remarkably well with coherent detection, but the embedded differential encoder also makes them naturally suitable for noncoherent detection: a simple iterative differential detection and decoding (IDDD) strategy allows PA codes to perform only about 1 dB away from the coherent case. Another useful finding reveals that the widespread practice of inserting pilot symbols to terminate the trellis actually incurs performance loss compared to when pilot symbols are inserted as separate parts from the trellis. The advantages of PA codes include: (i) they perform very well with coherent and noncoherent detection (especially at high rates); (ii) the performance is comparable to turbo and LDPC codes, yet PA codes require much less decoding complexity than turbo codes and much less encoding complexity and memory than random LDPC codes; and (iii) the regular structure of PA codes makes low-cost implementation in hardware possible. We conclude by proposing product accumulate codes as a promising low-cost candidate for wireless applications.

APPENDIX

DISCRETIZED DENSITY EVOLUTION FOR PA CODES

Using message-passing decoding, the relevant operations on the messages (in LLR form) include the sum in the real domain and the tanh operation (also known as the check operation, or ⊞ operation). For independent messages to add together, the resulting pdf of the sum is the discrete convolution (denoted by ∗) of the component pdf's, which can be efficiently implemented using a fast Fourier transform (FFT). For the tanh operation on messages, define

  γ = α ⊞ β ≜ Q(2 tanh^{−1}(tanh(α/2) tanh(β/2))),    (A.1)

where α, β, and γ are quantized messages, and Q denotes the quantization operation with quantization interval Δ. The pdf of γ, denoted as f_γ = R(f_α, f_β), can be computed using

  f_γ[k] = Σ_{(i,j): kΔ = iΔ ⊞ jΔ} f_α[i] · f_β[j].    (A.2)

To simplify the notation, we further denote R^k(f_α) = R(f_α, R(f_α, ... R(f_α, f_α) ...)), with k − 1 applications of R, whose pdf follows by induction on the above equation. The following notations are also used:

(i) f_{L_ch,y}: the pdf of the messages of the received signals y obtained from the channel (see Figure 1(b));
(ii) f^{(k)}_{L_o,x}: the pdf of the (a priori) messages of the input x to the inner 1/(1 + D) code in the kth iteration (obtained from the outer code in the (k − 1)th iteration) (see Figure 1(b));
(iii) f^{(k)}_{L_e,x}: the pdf of the (extrinsic) messages passed from the inner code to the outer code in the kth iteration;
(iv) f^{(k)}_{L_e1,(·)} and f^{(k)}_{L_e2,(·)}: the pdf's of the extrinsic information computed from the upper and lower branches of the outer code in the kth iteration.

Subscripts d and p denote data and parity bits, respectively, and δ(0) denotes the Kronecker delta function.

The discretized density evolution of a rate t/(t + 2) PA code can then be summarized as follows [4]. Initialization:

  f^{(0)}_{L_o,x} = f^{(0)}_{L_e1,d} = f^{(0)}_{L_e2,d} = δ(0).    (A.3)

In the kth iteration, the inner 1/(1 + D) decoder combines the channel pdf f_{L_ch,y} with the a priori pdf f^{(k−1)}_{L_o,x} through the ∗ and R(·,·) operations to obtain the extrinsic pdf f^{(k)}_{L_e,x}, which the inner-to-outer step passes to the outer code as its a priori pdf. Each of the two single-parity-check branches of the outer code then updates its extrinsic pdf's: a data-bit update applies the check operation over the messages of the other t − 1 data bits and the parity bit of its check (an R^{t−1} combined with ∗), and a parity-bit update applies it over all t data-bit messages (an R^t). Finally, the outer-to-inner step combines the resulting data- and parity-position pdf's, weighted by their respective fractions t/(t + 2) and 2/(t + 2) of the inner input, into the a priori pdf f^{(k+1)}_{L_o,x} for the next iteration; the full set of update equations is given in [4].
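The two message operations used in the appendix, real-domain sums (pdf convolution) and the check operation (A.1), can be illustrated with a Monte-Carlo stand-in for the quantized pdf's. This is our own illustration: the paper evolves quantized densities (convolving them via FFT), whereas here samples play the role of the densities.

```python
import numpy as np

def box_op(a, b):
    """The check ('boxplus'/tanh) operation on LLR messages:
    a ⊞ b = 2 atanh(tanh(a/2) * tanh(b/2))."""
    t = np.tanh(a / 2.0) * np.tanh(b / 2.0)
    return 2.0 * np.arctanh(np.clip(t, -1 + 1e-12, 1 - 1e-12))

rng = np.random.default_rng(1)
n = 100_000
sigma2 = 4.0                                 # consistent Gaussian: mean = var/2
ch = rng.normal(sigma2 / 2, np.sqrt(sigma2), (3, n))   # i.i.d. LLR samples

var_out = ch[0] + ch[1]        # variable-node step: pdf of a sum = convolution (∗)
chk_out = box_op(ch[0], ch[1]) # check-node step: pdf given by the R(·,·) operation
```

The samples exhibit the two defining behaviors: means add under ∗, while the check operation can only shrink reliability, since |a ⊞ b| never exceeds min(|a|, |b|).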
Although the outer code of PA codes can be viewed as an LDPC code, it is desirable to take a serial update procedure, as described above, rather than a parallel one as in a conventional LDPC code, since this allows the checks corresponding to the two SPC branches to take turns to update, which leads to a faster convergence [4].

ACKNOWLEDGMENTS

This research work is supported in part by the National Science Foundation under Grants no. CCF0430634 and CCF0635199, and by the Commonwealth of Pennsylvania through the Pennsylvania Infrastructure Technology Alliance (PITA).

REFERENCES

[1] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, Mass, USA, 1963.
[2] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, 1999.
[3] J. Hou, P. H. Siegel, and L. B. Milstein, “Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924–934, 2001.
[4] J. Li, K. R. Narayanan, and C. N. Georghiades, “Product accumulate codes: a class of codes with near-capacity performance and low decoding complexity,” IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 31–46, 2004.
[5] D. Divsalar and E. Biglieri, “Upper bounds to error probabilities of coded systems beyond the cutoff rate,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT ’00), p. 288, Sorrento, Italy, June 2000.
[6] M. C. Valenti and B. D. Woerner, “Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1697–1705, 2001.
[7] P. Hoeher and J. Lodge, ““Turbo DPSK”: iterative differential PSK demodulation and channel decoding,” IEEE Transactions on Communications, vol. 47, no. 6, pp. 837–843, 1999.
[8] M. Peleg and S. Shamai, “Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK,” Electronics Letters, vol. 33, no. 12, pp. 1018–1020, 1997.
[9] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737, 2001.
[10] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proceedings of the IEEE International Conference on Communications (ICC ’95), pp. 1009–1013, Seattle, Wash, USA, June 1995.
[11] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[12] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.
[13] Xie and J. Li, “On accuracy of Gaussian assumption in iterative analysis for LDPC codes,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT ’06), Seattle, Wash, USA, July 2006.
[14] J. Li, “Differentially encoded LDPC codes—Part II: general case and code optimization,” EURASIP Journal on Wireless Communications and Networking, to appear.
[15] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series, and Products, Academic Press, New York, NY, USA, 1980.
[16] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels, John Wiley & Sons, New York, NY, USA, 2000.
[17] G. L. Stuber, Principles of Mobile Communication, Kluwer Academic Publishers, Norwell, Mass, USA, 1996.
[18] P. Hoeher, S. Kaiser, and P. Robertson, “Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’97), pp. 1845–1848, Munich, Germany, April 1997.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 367287, 10 pages
doi:10.1155/2008/367287

Research Article
Differentially Encoded LDPC Codes—Part II: General Case and Code Optimization

Jing Li (Tiffany)

Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA

Correspondence should be addressed to Jing Li (Tiffany), jingli@ece.lehigh.edu

Received 19 November 2007; Accepted 6 March 2008

Recommended by Yonghui Li

This two-part series of papers studies the theory and practice of differentially encoded low-density parity-check (DE-LDPC) codes, especially in the context of noncoherent detection. Part I showed that a special class of DE-LDPC codes, product accumulate (PA) codes, perform very well with both coherent and noncoherent detections. In Part II of this series of papers, we study the general case of differentially encoded LDPC codes. The analysis here reveals that a conventional LDPC code, in general, is not fitful for differential coding and does not, despite its simplicity, deliver a desirable performance when detected noncoherently. Through extrinsic information transfer (EXIT) analysis and a modified “convergence-constraint” density evolution (DE) method developed here, we provide a characterization of the type of LDPC degree profiles that work in harmony with differential detection (or a recursive inner code in general), and demonstrate how to optimize these LDPC codes. The convergence-constraint method provides a useful extension to the conventional “threshold-constraint” method, and can match an outer LDPC code to any given inner code with the imperfectness of the inner decoder taken into consideration.

Copyright © 2008 Jing Li (Tiffany). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

With an increasingly mature status of the sparse-graph coding technology in a theoretical context, the very pervasive scope of their well-proven practical applications, and the wide-scale availability of software radio, low-density parity-check (LDPC) codes have become and continue to be a favorable coding strategy for researchers and practitioners. Their superb performance on various channel models and with various modulation schemes has been documented in many papers. While the existing literature has shed great light on the theory and practice of LDPC codes, the investigation was largely carried out from a pure coding perspective, where the prevailing assumption is that the synchronization and channel estimation are handled perfectly by the front-end receiver.

In wireless communications, including satellite and radio-relay communications, accurate phase estimation may in many cases be very expensive or infeasible, which calls for noncoherent detection. Practical noncoherent detection is generally performed in one of two ways: inserting pilot symbols directly in the coded and modulated sequence to help track the channel (it is possible to insert either pilot tones or pilot symbols, but the latter is found to be more effective and is what is of relevance to this paper), or employing differential coding. Considering that the former may result in a nontrivial expansion of bandwidth, especially on fast-changing channels, many wireless systems adopt the latter.

The question of how LDPC codes perform with differential coding is a worthy one [3–6]. The problem we wish to investigate is: LDPC codes perform remarkably well with coherent detection, but how about their performance with noncoherent detection, and noncoherent differential detection in particular? This series of two-part papers aims to generate useful insight and engineering rules.

In Part I of the series [1], we considered a special class of differentially encoded LDPC (DE-LDPC) codes, product accumulate (PA) codes [2]. The outer code of a (p(t + 2), pt) PA code is a simple, structured LDPC code with left (variable) degree profile λ(x) = 1/(t + 1) + (t/(t + 1))x and right (check) degree profile ρ(x) = x^t, and the inner code is a differential encoder 1/(1 + D). We showed that, despite their simplicity, PA codes perform quite well with coherent detection as well as noncoherent differential detection [1]. This motivates us, in Part II of this series of papers, to study the general case of differentially encoded LDPC codes. The investigation directly relates to several other interesting problems, discussed next.
For example, what is the best strategy to apply LDPC codes in noncoherent detection—should differential coding be used or not? Modulation schemes such as minimum phase shift keying (MPSK) have equivalent realizations in recursive and nonrecursive forms; or, is one form preferred over the other in the context of LDPC coding? This also gives rise to the question of what special LDPC codes, possibly performing poorly in the conventional scenario (such as the outer code of the PA code), may turn out right for differential modulation and detection, and how to find them.

One remarkable property of LDPC codes is the possibility to design their degree profiles, through density evolution [11], to match to a specific channel or a specific inner code [12–15]. The conventional “threshold-constraint” method [11, 16] targets the best asymptotic threshold. To make LDPC codes work in harmony with the noncoherent differential decoder of interest, here we develop a convergence-constraint density evolution method, which effectively captures the interaction and convergence between the inner and the outer EXIT curves through a set of “sample points,” obtained through a repeated application of density evolution at different decoding stages. In that, it makes it possible to optimize LDPC codes to match to an (arbitrary) inner code/modulation with the imperfectness of the inner decoder/demodulator taken into account. The analysis makes essential use of EXIT charts [9, 10], which, although initially proposed solely as a visualization tool, here play an essential quantitative role.

Since the conventional differential detector (CDD) operating on two symbol intervals incurs a nontrivial performance loss [7], and since multiple symbol differential detectors (MSDD) [8] have a rather high complexity that increases exponentially with the window size, in Part I of this series of papers we developed a simple iterative differential detection and decoding (IDDD) receiver, whose structure is shown in [1, Figure 6]. The IDDD receiver comprises a CDD with a 2-symbol observation window (the current and the previous symbols), a phase-tracking Wiener filter, a message-passing decoder for the accumulator 1/(1 + D) [2], and a message-passing decoder configured for the (outer) LDPC code. The CDD, coupled with the phase-tracking unit and the 1/(1 + D) decoder, acts as the front-end, that is, the inner decoder of the serially concatenated system, and the succeeding LDPC decoder acts as the outer decoder. Soft reliability information in the form of the log-likelihood ratio (LLR) is exchanged between the inner and the outer decoders to successively refine the decision. In the sequel, unless otherwise stated, we take the IDDD receiver as the default noncoherent receiver in our discussion of DE-LDPC codes.

We study the convergence property of IDDD for a general DE-LDPC code. Our study reveals that LDPC codes may be divided into two groups: the LDPC codes that perform well with a recursive receiver always have degree-1 (and degree-2) variable nodes, while those having a minimum left degree of ≥2 are generally suitable for a nonrecursive inner code/modulator. A somewhat unexpected finding is that, while a high-rate PA code yields desirable performance with noncoherent (differential) detection, a general DE-LDPC code does not. We attribute the reason to the mismatch of the convergence behavior between a conventional LDPC code and a differential decoder: a conventional LDPC code, while an excellent choice for coherent detection, may become quite suboptimal when operated together with a recursive inner code or a recursive modulation. This also explains why high-rate PA codes, whose outer code has degree-1 and degree-2 nodes only, perform remarkably with (noncoherent) differential detection [1]: when the code rate is high, these degree-1 and degree-2 nodes become dominant.

The channel model of interest here is flat Rayleigh fading channels with additive white Gaussian noise (AWGN), the same as discussed in Part I [1]. We consider correlated channel fading coefficients (so that noncoherent detection is possible). Throughout the paper, let s_k be the binary phase shift keying (BPSK) modulated signal at the transmitter, let α_k e^{jθ_k} be the fading coefficient with Rayleigh distributed amplitude α_k and uniformly distributed phase θ_k, and let n_k be the i.i.d. complex AWGN with zero mean and variance σ² = N_0/2 in each dimension. Let r_k be the noisy signal at the receiver. We have

  r_k = α_k e^{jθ_k} s_k + n_k.    (1)

Applying Jakes' isotropic scattering land mobile Rayleigh channel model, the autocorrelation of α_k is characterized by the 0th-order Bessel function of the first kind,

  R_k = (1/2) J_0(2kπ f_d T_s),

where f_d T_s is the normalized Doppler spread and k is the lag parameter, and the power spectrum density (PSD) is given by

  S(f) = P / (π f_d √(1 − (f/f_d)²)),  for |f| < f_d,    (2)

where f is the frequency band and P is a constant that is dependent on the average received power given a specific antenna and the distribution of the angles of the incoming power. The receiver is said to have channel state information (CSI) if α_k is known (irrespective of θ_k), and no CSI otherwise. Further, θ_k is assumed known perfectly to the receiver/decoder in the coherent detection case, and unknown (and needing to be worked around) in the noncoherent detection case.

The rest of the paper is organized as follows. Section 2 evaluates the performance of a conventional LDPC code with noncoherent detection, and compares it with that of PA codes. Section 3 proposes the convergence-constraint method to optimize LDPC codes to match to a given inner code, and in particular a differential detector. Section 4 concludes the paper.
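The correlated fading process of (1)–(2) is commonly simulated with a sum-of-sinusoids (Jakes-type) generator. The sketch below is one such illustration; the generator design, oscillator count, and normalization are our own choices, not the paper's simulator.

```python
import numpy as np

def jakes_fading(n, fd_Ts, n_osc=32, seed=0):
    """Sum-of-sinusoids approximation of a Jakes flat-fading process:
    complex gains alpha_k * exp(j*theta_k) whose quadrature components are
    approximately Gaussian with autocorrelation ~ J0(2*pi*k*fd*Ts)."""
    rng = np.random.default_rng(seed)
    k = np.arange(n)
    angles = rng.uniform(0, 2 * np.pi, n_osc)        # random arrival angles
    ph_i = rng.uniform(0, 2 * np.pi, n_osc)          # random phases, I branch
    ph_q = rng.uniform(0, 2 * np.pi, n_osc)          # random phases, Q branch
    doppler = 2 * np.pi * fd_Ts * np.cos(angles)     # per-path Doppler shift
    gi = np.cos(np.outer(k, doppler) + ph_i).sum(axis=1)
    gq = np.cos(np.outer(k, doppler) + ph_q).sum(axis=1)
    return (gi + 1j * gq) / np.sqrt(n_osc)           # unit average power

g = jakes_fading(5000, fd_Ts=0.01)
# BPSK over the channel would then be r_k = g_k * s_k + n_k, as in (1)
```

At f_d T_s = 0.01 the gains decorrelate only over tens of symbols, which is precisely the correlation that both the Wiener-filter tracker and the differential detector exploit.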
This section reveals whether or not this also holds for general DELDPC codes. We have rk = αk e jθk sk + nk . 16] targets the best asymptotic threshold. The analysis makes essential use of the EXIT charts [9. through extrinsic information transfer (EXIT) charts [9. The rest of the paper is organized as follows.
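As a small numerical illustration of this system model (a sketch with our own toy parameters, not the paper's simulation setup: a constant unknown phase stands in for very slow fading, and the SNR and block length are arbitrary), the following snippet differentially encodes a BPSK stream through the accumulator 1/(1 + D), passes it through the channel of (1), and recovers the data with a 2-symbol CDD that never learns the phase; it also evaluates the Jakes autocorrelation J0(2πk fd Ts) for the first few lags:

```python
import numpy as np

def bessel_j0(x):
    """J0 via the integral representation (1/pi) * int_0^pi cos(x sin t) dt."""
    t = np.linspace(0.0, np.pi, 4001)
    v = np.cos(np.multiply.outer(np.asarray(x, dtype=float), np.sin(t)))
    dt = t[1] - t[0]
    return (v.sum(axis=-1) - 0.5 * (v[..., 0] + v[..., -1])) * dt / np.pi

# Fading autocorrelation R_k = J0(2*pi*k*fd*Ts) at the paper's Doppler rate:
fd_Ts = 0.01
R = bessel_j0(2.0 * np.pi * np.arange(5) * fd_Ts)   # R[0] = 1, then a slow decay

rng = np.random.default_rng(7)
N = 2000
bits = rng.integers(0, 2, N)
b = 1 - 2 * bits                     # BPSK mapping: 0 -> +1, 1 -> -1

# Differential encoding (accumulator 1/(1 + D)): s_k = s_{k-1} * b_k.
s = np.empty(N + 1)
s[0] = 1.0                           # reference symbol
for k in range(N):
    s[k + 1] = s[k] * b[k]

# Channel (1) with constant amplitude, a constant but UNKNOWN phase, and
# complex AWGN with sigma^2 = N0/2 per dimension.
theta = rng.uniform(0.0, 2.0 * np.pi)
sigma = 0.1
noise = sigma * (rng.standard_normal(N + 1) + 1j * rng.standard_normal(N + 1))
r = np.exp(1j * theta) * s + noise

# Conventional differential detector (CDD), 2-symbol observation window:
# the product r_k * conj(r_{k-1}) cancels the unknown phase.
b_hat = np.sign(np.real(r[1:] * np.conj(r[:-1])))
bit_errors = int(np.sum(((1 - b_hat.astype(int)) // 2) != bits))
```

At this (high) SNR the CDD recovers the data with no errors despite the unknown phase; the point is only to make the recursive 1/(1 + D) structure and the phase-cancelling detection concrete.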
Although they were initially proposed solely as a visualization tool, recent studies have revealed surprisingly elegant and useful properties of EXIT charts. The area property states that the area under the EXIT curve, A = ∫_0^1 Ie dIa, corresponds to the rate of the code [10], where Ia and Ie denote the a priori (input) mutual information to and the extrinsic (output) mutual information from a particular subdecoder, respectively. When the auxiliary channel is an erasure channel and the subdecoder is an optimal one, the relation is exact; otherwise, it is a good approximation [10]. The convergence property states that, in order for the iterative decoder to converge successfully, the outer EXIT curve should stay strictly below the inner EXIT curve, leaving an open tunnel between the two curves. The immediate implication of these properties is that, to fully harness the capacity (achievable rate) provided by the (noncoherent) inner differential decoder, the outer code must have an EXIT curve closely matched in shape and in position to that of the inner code; otherwise, an LDPC code that is optimal in the usual sense will either intersect with the curve of the differential decoder, causing decoder failure, or leave a huge area between the curves, causing a capacity loss.

With this in mind, we evaluate a few examples of (DE-)LDPC codes, regular or irregular. (The computation of EXIT charts specific to DE-LDPC codes with the IDDD receiver is discussed in [1].) We consider two configurations of the inner code: (1) a differential decoder for 1/(1 + D), and (2) a direct detector, that is, a BPSK detector; and three configurations of the outer code: (1) the outer code of a PA code, which has degree profile λ(x) = 1/7 + (6/7)x and ρ(x) = x^6; (2) a (3,12)-regular LDPC code; and (3) an optimized irregular LDPC code reported in [17], whose threshold is 0.6726 (about 0.0576 dB away from the AWGN capacity) and whose degree profile is γ(x) = 0.1510x + 0.1978x^2 + 0.2201x^6 + 0.0353x^7 + 0.3958x^29 with ρ(x) = x^20. All three outer codes have rate 3/4, and the channel is a correlated Rayleigh fading channel with AWGN and a normalized Doppler rate of fd Ts = 0.01.

Figure 1: EXIT curves of LDPC codes, the differential decoder, and the direct detector on Rayleigh fading channels. Code rate 3/4, normalized Doppler rate fd Ts = 0.01, Eb/N0 = 5.32 dB.

The EXIT curves, plotted in Figure 1, demonstrate that the outer code of the PA code and the differential decoder match quite well. In comparison, LDPC codes, especially the (optimized) irregular ones, agree very well with the direct detector. Hence, it is fair to say that (conventional) LDPC codes that boast outstanding performance under coherent detection may not be nearly as advantageous under noncoherent detection; put another way, (conventional) LDPC codes perform better as a single code than being concatenated with a recursive inner code. On the other hand, not using differential coding generally requires more pilot symbols in order to track the channel well, so conventional LDPC codes face an awkward tradeoff: they either suffer from performance loss (with differential encoding) or incur a large bandwidth expansion (without differential encoding). PA codes, in contrast, can make use of the (intrinsic) differential code for noncoherent detection, and therefore present a better choice for bandwidth-limited wireless applications, especially in fast-fading environments.

Before providing simulations to confirm our findings, we note that the EXIT curves of both inner codes in Figure 1 are computed using perfect knowledge of the fading coefficients. We use this genie-aided case in the discussion to remove the artifact of coarse channel estimation and to better contrast the differences between the recursive differential detector and the nonrecursive direct detector. If the amplitude and phase information is to be estimated and handled by the inner code, as in actual noncoherent detection, then the EXIT curve of the direct detector will show a small rising slope at the left end instead of being a flat straight line all the way through, and the EXIT curve of the differential decoder will also exhibit a steeper slope at the left end.
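The rate claims above can be double-checked from the degree profiles alone, since the design rate of an LDPC ensemble is R = 1 − (Σ_j ρ_j/j)/(Σ_i λ_i/i). A quick sketch (the helper function is ours; the profiles are the ones quoted above):

```python
def ldpc_rate(lam, rho):
    """Design rate from edge-perspective degree profiles.

    lam, rho: dicts mapping node degree -> fraction of edges incident
    to variable/check nodes of that degree.
    """
    return 1.0 - sum(f / d for d, f in rho.items()) / sum(f / d for d, f in lam.items())

# Outer code of a rate-3/4 PA code: lambda(x) = 1/7 + (6/7)x, rho(x) = x^6.
r_pa = ldpc_rate({1: 1 / 7, 2: 6 / 7}, {7: 1.0})

# (3,12)-regular LDPC code.
r_reg = ldpc_rate({3: 1.0}, {12: 1.0})

# Optimized irregular LDPC code of [17]: gamma(x) as quoted, rho(x) = x^20
# (a coefficient on x^(i-1) corresponds to variable-node degree i).
lam_irr = {2: 0.1510, 3: 0.1978, 7: 0.2201, 8: 0.0353, 30: 0.3958}
r_irr = ldpc_rate(lam_irr, {21: 1.0})
```

All three evaluate to (essentially) 3/4, confirming that the comparison in Figure 1 is between equal-rate outer codes.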
EURASIP Journal on Wireless Communications and Networking
Figure 2 plots the BER performance curves of the same three codes specified in Figure 1 on Rayleigh channels with noncoherent detection. All the codes have data block size K = 1 K and code rate 3/4. Soft feedback is used in the IDDD receiver, the normalized Doppler spread is 0.01, and 2% or 4% pilot symbols are inserted to help track the channel. The two LDPC codes are evaluated either with or without a differential inner code. From the most power efficient to the least power efficient, the curves shown are: (i) PA code with 4% pilot symbols, (ii) PA code with 2% pilot symbols, (iii) BPSK-coded irregular LDPC code with 4% pilot symbols, (iv) BPSK-coded regular LDPC code with 4% pilot symbols, (v) BPSK-coded irregular LDPC code with 2% pilot symbols, and (vi) differentially encoded irregular LDPC code with 4% pilot symbols. It is evident that (conventional) LDPC codes suffer from a differential inner code. For example, with 4% of bandwidth expansion, the BPSK-coded irregular and regular LDPC codes perform about 0.5 and 1 dB worse than PA codes at a BER of 10^-4, respectively, but the differentially encoded irregular LDPC code falls short by more than 2.2 dB. Further, while the irregular LDPC code (not differentially coded) is moderately (0.5 dB) behind the PA code with 4% pilot symbols, the gap becomes much more significant when the pilot symbols are reduced by half. For PA codes, 2% pilot symbols remain adequate to support a desirable performance, but they become insufficient to track the channel for nondifferentially encoded LDPC codes, causing a considerable performance loss and an error floor as high as a BER of 10^-3. Thus, the advantages of PA codes over (conventional) LDPC codes are rather apparent, especially in cases where noncoherent detection is required and only limited bandwidth expansion is allowed.

3. CODE DESIGN FROM THE CONVERGENCE PROPERTY
Figure 2: Comparison of PA codes and LDPC codes on fastfading Rayleigh channels with noncoherent detection and decoding. Solid line: PA codes, dashed lines: LDPC codes. Code rate 0.75, data block size 1 K, ﬁlter length 65, normalized Doppler spread 0.01, 10 global iterations, and 4 (local) iterations within LDPC codes or the outer code of PA codes inside each global iteration.
3.1. Problem formulation

EXIT analysis and computer simulations in the previous section show that a conventional LDPC code does not fit differential coding, but special cases such as the outer code of PA codes do. This raises more interesting questions: what other (special) LDPC codes are also in harmony with differential encoding? What degree profiles do they have? Is it possible to characterize and optimize the degree profiles, and how? The fundamental tool for answering these questions lies in convex optimization. In [11], the optimization of the irregular LDPC degree profiles on AWGN channels was formulated as a duality-based convex optimization problem, and an iterative method termed density evolution was proposed to solve it. In [16], a Gaussian approximation was applied to the density evolution method, which reduces the problem to a linear optimization problem. Density evolution has since been exploited, in different flavors and possibly combined with differential evolution [18], to design good LDPC ensembles for a variety of communication channels and modulation schemes,
see, for example, [12–15] and the references therein. The results reported in these previous papers are excellent, but they almost exclusively aimed at the asymptotic threshold; namely, their cost functions were set to minimize the SNR threshold for a target code rate or, equivalently, to maximize the code rate for a target SNR threshold. This is well justified, since in these papers the primary involvement of the channel is to provide the initial LLR information to trigger the start of the density evolution process. However, the problem we consider here is somewhat different. Our goal is to design codes that can fully achieve the capacity provided by the given inner receiver, and the noncoherent differential decoder in particular. Considering that the inner receiver, due to the lack of channel knowledge or other practical constraints, may not be an optimal receiver, it is of paramount importance to control the interaction between the inner and the outer code, that is, the convergence behavior as reflected in the matching of shape and position of the corresponding EXIT curves. To emphasize the difference, we hereafter refer to the conventional density evolution method as the "threshold-constraint" method, and propose a "convergence-constraint" method as a useful extension to the conventional method.

The key idea of the proposed method is to sample the inner EXIT curve and design an (outer) EXIT curve that matches these sample points, or "control points." Suppose we choose a set of M control points in the EXIT plane, denoted as (v_1, w_1), (v_2, w_2), ..., (v_M, w_M). Let To(·) be the input-output mutual information transfer function of the outer LDPC code (whose exact expression will be defined later in (17)). The optimization problem is formulated as

max R = 1 − (Σ_{j=2}^{Dc} ρ_j/j) / (Σ_{i=1}^{Dv} λ_i/i),
subject to Σ_{i=1}^{Dv} λ_i = 1, Σ_{j=2}^{Dc} ρ_j = 1,
To(w_k) ≥ v_k, k = 1, 2, ..., M, (5)

where R denotes the code rate of the outer LDPC code, and λ_i and ρ_i denote the fraction of edges that connect to variable nodes and check nodes of degree i, respectively. The formulation in (5) assumes that the LLR messages at the input of the inner and the outer decoder are Gaussian distributed, and that the output extrinsic mutual information (MI) of an irregular LDPC code corresponds to a linear combination of the extrinsic MI from a set of regular codes. As reported in the literature, the Gaussian assumption for LLR messages is not far from reality on AWGN channels but less accurate on Rayleigh fading channels [12]. Nevertheless, the Gaussian assumption is used here for several reasons. The first reason is simplicity and tractability: tracking and optimizing the exact message pdf's involves tedious computation, which is exacerbated by the fact that the proposed new method is governed by a set of control points, rather than a single control point as in the conventional method. Second, recall that computing EXIT curves inevitably uses the Gaussian approximation; thus, it seems well acceptable to adopt the same approximation when shaping and positioning an EXIT curve. Finally, characterizing and representing EXIT curves using mutual information helps stabilize the process and alleviate the inaccuracy caused by the Gaussian approximation and other factors. As confirmed by many previous papers as well as this one, the optimization generates very good results in spite of the use of the Gaussian approximation.

3.2. The optimization method

Below we detail the convergence-constraint design method formulated in (5). We conform to the notations and the graphic framework presented in [16]. Let λ(x) = Σ_{i=1}^{Dv} λ_i x^{i−1} and ρ(x) = Σ_{i=2}^{Dc} ρ_i x^{i−1} be the degree profiles from the edge perspective, where Dv and Dc are the maximum variable node and check node degrees, and λ_i and ρ_i are the fractions of edges incident to variable nodes and check nodes of degree i. Similarly, let λ̃(x) = Σ_{i=1}^{Dv} λ̃_i x^{i−1} and ρ̃(x) = Σ_{i=2}^{Dc} ρ̃_i x^{i−1} be the degree profiles from the node perspective. Let R be the code rate. The following relations hold:

λ̃_i = (λ_i/i) / Σ_{j=1}^{Dv} (λ_j/j), ρ̃_i = (ρ_i/i) / Σ_{j=2}^{Dc} (ρ_j/j), R = 1 − (Σ_{j=2}^{Dc} ρ_j/j) / (Σ_{i=1}^{Dv} λ_i/i). (6)

Let superscript (l) denote the lth LDPC decoding iteration, and let subscripts v and c denote quantities pertaining to variable nodes and check nodes, respectively. Further, define two functions that will be useful in the discussion:

I(x) = 1 − ∫_{−∞}^{∞} (1/√(4πx)) e^{−(z−x)^2/(4x)} log_2(1 + e^{−z}) dz, (7)

φ(x) = 1 − (1/√(4πx)) ∫_{−∞}^{∞} tanh(z/2) e^{−(z−x)^2/(4x)} dz for x > 0, and φ(x) = 1 for x = 0. (8)

Function I(x) maps the message mean x to the corresponding mutual information (under the Gaussian assumption), and φ(x) describes how the message mean evolves under the tanh(y/2) operation, where y follows a Gaussian distribution with mean x and variance 2x.

The complete design process takes a dual-constraint optimization process that progressively optimizes the variable node degree profile λ(x) and the check node degree profile ρ(x) based on each other. Despite the duality in the formulation and the steps, optimizing λ(x) is far more critical to the code performance than optimizing ρ(x), largely because the optimal check node degree profile is shown to follow the concentration rule [16]:

ρ(x) = Δ x^k + (1 − Δ) x^{k+1}. (9)

It is therefore common practice to preset ρ(x) according to (9) and the code rate R, and to optimize λ(x) only. For this reason, below we focus our discussion on optimizing λ(x) for a given ρ(x); interested readers can formulate the optimization of ρ(x) in a similar way.

3.2.1. Threshold-constraint method (optimizing λ(x))

Under the assumption that the messages passed along all the edges are i.i.d. and Gaussian distributed, the average messages variable nodes receive from their neighboring check nodes follow a mixed Gaussian distribution,

Σ_{i=2}^{Dv} λ_i N(m_{v,i}^{(l)}, 2 m_{v,i}^{(l)}). (10)

From the (l−1)th to the lth (local) iteration in the LDPC decoder, the mean of the messages associated with the variable nodes, m_v, evolves as

m_v^{(l)} = Σ_{i=2}^{Dv} λ_i φ( m_0 + (i − 1) Σ_{j=2}^{Dc} ρ_j φ^{−1}( 1 − (1 − m_v^{(l−1)})^{j−1} ) ), (11)

where m_0 denotes the mean of the initial messages received from the inner code (or the channel). Let us define

h_i(m_0, r) ≜ φ( m_0 + (i − 1) Σ_{j=2}^{Dc} ρ_j φ^{−1}( 1 − (1 − r)^{j−1} ) ),
h(m_0, r) ≜ Σ_{i=2}^{Dv} λ_i h_i(m_0, r). (12)

Then (11) can be rewritten as

r_l = h(m_0, r_{l−1}) = Σ_{i=2}^{Dv} λ_i h_i(m_0, r_{l−1}). (13)
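The recursion (11)-(13) is easy to run numerically. The sketch below (grid sizes, bisection ranges, the stopping tolerance, and the (3,6)-regular test ensemble are our own illustrative choices, not the paper's) implements φ, its inverse, and the one-dimensional density evolution, and shows the two qualitative behaviors: convergence to the zero-error state on a good channel, and a stall at a nonzero fixed point on a bad one:

```python
import numpy as np

def phi(x):
    """phi(x) of (8), computed as E[1 - tanh(z/2)] with z ~ N(x, 2x)."""
    if x < 1e-12:
        return 1.0
    s = np.sqrt(2.0 * x)
    z = np.linspace(x - 12.0 * s, x + 12.0 * s, 4001)
    w = np.exp(-(z - x) ** 2 / (4.0 * x)) / np.sqrt(4.0 * np.pi * x)
    vals = (1.0 - np.tanh(z / 2.0)) * w
    return float((vals.sum() - 0.5 * (vals[0] + vals[-1])) * (z[1] - z[0]))

def phi_inv(y, hi=600.0):
    """Inverse of the monotone decreasing phi, by bisection."""
    if y >= 1.0:
        return 0.0
    if phi(hi) >= y:
        return hi
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def density_evolution(m0, lam, rho, iters=100):
    """Iterate (13): r_l = sum_i lam_i * phi(m0 + (i-1) * mu(r_{l-1}))."""
    r = phi(m0)
    for _ in range(iters):
        mu = sum(rj * phi_inv(1.0 - (1.0 - r) ** (j - 1)) for j, rj in rho.items())
        r = sum(li * phi(m0 + (i - 1) * mu) for i, li in lam.items())
        if r < 1e-10:                 # effectively the zero-error state
            break
    return r

# (3,6)-regular ensemble; for BPSK on AWGN the channel LLR mean is m0 = 2/sigma^2.
good = density_evolution(2.0 / 0.7**2, {3: 1.0}, {6: 1.0})  # sigma = 0.7: converges
bad = density_evolution(2.0 / 1.2**2, {3: 1.0}, {6: 1.0})   # sigma = 1.2: stalls
```

For the (3,6)-regular code the Gaussian-approximation threshold is around sigma ≈ 0.87, so the first call drives r to (numerically) zero while the second settles at a sizable fixed point, which is exactly the dichotomy that condition (14) below formalizes.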
The conventional threshold-constraint density evolution guarantees that the degree profile converges asymptotically to the zero-error state at the given initial message mean m_0. This is achieved by enforcing [16]

r > h(m_0, r), for all r ∈ (0, φ(m_0)). (14)

Viewed from the EXIT chart, the threshold-constraint method has implicitly used a single control point (v, w) = (1, I(m_0)), such that the resultant EXIT curve stays below it.

3.2.2. Convergence-constraint method (optimizing λ(x))

The proposed convergence-constraint method extends the conventional threshold-constraint method by introducing a set of control points, which may be placed at arbitrary positions in the EXIT plane, to control the shape and the position of the EXIT curve. Each control point (v, w) ∈ [0, 1]^2 ensures that the EXIT curve will, at the input a priori mutual information w, produce extrinsic mutual information greater than v. This is reflected in the optimization process by changing (14) to

r > h(m_0, r), for all r ∈ (r*, φ(m_0)), (15)

where r* (≥ 0) is the threshold value that satisfies To(w) ≥ v. We can reformulate the problem as follows: for a given check node degree profile ρ(x) and a control point (v, w) in the EXIT chart, where 0 ≤ v, w ≤ 1,

max Σ_{i=1}^{Dv} λ_i/i,
subject to: (i) Σ_{i=1}^{Dv} λ_i = 1,
(ii) Σ_{i=1}^{Dv} λ_i h_i(m_0, r) − r < 0, for all r ∈ (r*, φ(m_0)), (16)

where m_0 = I^{−1}(w) and r* satisfies

To(w) ≜ Σ_{i=1}^{Dv} λ̃_i I( i Σ_{j=2}^{Dc} ρ_j φ^{−1}( 1 − (1 − r*)^{j−1} ) ) ≥ v. (17)

3.2.3. Linear programming

The basic idea of the convergence-constraint design, as discussed before, is simple. Complication arises from the fact that constraint (ii) in (16) is a nonlinear function of the λ_i's. Furthermore, observe that the determination of the optimization range, that is, the computation of r* from (17), requires the knowledge of λ(x), which is yet to be optimized. One possible approach to overcome this chicken-and-egg dilemma is to attempt an approximated λ(x) in (17) to compute r*. Specifically, we propose accounting for the two lowest-degree variable nodes λ_{i1} and λ_{i2}, and approximating the degree profile as

λ(x) = λ_{i1} x^{i1−1} + λ_{i2} x^{i2−1} + O(λ_{i2+1} x^{i2}) ≈ λ_{i1} x^{i1−1} + (1 − λ_{i1}) x^{i1} (18)

in (17). First, this approximated λ(x) is used only in (17), to tentatively determine r* so that the optimization process can get started; the exact λ(x) in (16), (i) and (ii), is to be optimized. Second, the value of i1 and λ_{i1} (or λ̃_{i1}) in the approximated λ(x) is calculated in one of the following two ways.

Case 1. A conventional LDPC ensemble has i1 = 2, that is, no degree-1 variable nodes. This is because the outbound messages from degree-1 variable nodes do not improve over the message-passing process. In this case, we consider only degree-2 and degree-3 nodes (λ_{i1=2} and λ_{i2=3}), upper bound the fraction of degree-2 nodes with λ2*, and treat all the rest as degree-3 nodes. The stability condition [11, 16] states that there exists a value ξ > 0 such that, given an initial symmetric message density P_0 satisfying ∫_{−∞}^{0} P_0(x) dx < ξ, the necessary and sufficient condition for density evolution to converge to the zero-error state is λ'(0) ρ'(1) < e^γ, where γ = −log( ∫_{−∞}^{∞} P_0(x) e^{−x/2} dx ). Applying the stability condition to Gaussian messages with initial mean value m_0, we get γ = m_0/4 and λ2* = e^{m_0/4} / Σ_{j=2}^{Dc} (j − 1) ρ_j or, equivalently,

λ2*(w) = e^{I^{−1}(w)/4} / Σ_{j=2}^{Dc} (j − 1) ρ_j. (19)
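Concretely, the threshold-constraint core of this optimization is a linear program, and the convergence-constraint variant only shrinks the constraint interval to (r*, φ(m_0)) and adds one inequality per control point. The sketch below sets up the threshold-constraint LP with our own toy parameters (ρ(x) = x^5, m_0 = 3.2, Dv = 20, a 39-point grid, margin delta) and solves it with SciPy's linprog; it illustrates the machinery, not the authors' exact solver or settings:

```python
import numpy as np
from scipy.optimize import linprog

def phi(x):
    """phi(x) of (8) as E[1 - tanh(z/2)], z ~ N(x, 2x)."""
    if x < 1e-12:
        return 1.0
    s = np.sqrt(2.0 * x)
    z = np.linspace(x - 12.0 * s, x + 12.0 * s, 4001)
    w = np.exp(-(z - x) ** 2 / (4.0 * x)) / np.sqrt(4.0 * np.pi * x)
    vals = (1.0 - np.tanh(z / 2.0)) * w
    return float((vals.sum() - 0.5 * (vals[0] + vals[-1])) * (z[1] - z[0]))

def phi_inv(y, hi=600.0):
    """Inverse of the monotone decreasing phi, by bisection."""
    if y >= 1.0:
        return 0.0
    if phi(hi) >= y:
        return hi
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho = {6: 1.0}                 # rho(x) = x^5: all check nodes have degree 6
m0 = 3.2                       # initial message mean, m0 = I^{-1}(w)
Dv = 20
degrees = list(range(2, Dv + 1))            # conventional case: no degree-1 nodes
r_grid = np.linspace(1e-4, phi(m0), 40)[:-1]  # samples of the interval (0, phi(m0))

# h_i(m0, r) of (12), tabulated for every grid point and variable degree.
mu = np.array([sum(rj * phi_inv(1.0 - (1.0 - r) ** (j - 1)) for j, rj in rho.items())
               for r in r_grid])
H = np.array([[phi(m0 + (i - 1) * m) for i in degrees] for m in mu])

# LP: maximize sum_i lam_i/i  (minimize the negative), subject to
# sum_i lam_i = 1, H @ lam <= r - delta (constraint (ii)), and the
# stability bound lam_2 <= exp(m0/4) / sum_j (j-1) rho_j from (19).
delta = 1e-5
lam2_max = np.exp(m0 / 4.0) / sum((j - 1) * rj for j, rj in rho.items())
c = [-1.0 / i for i in degrees]
var_bounds = [(0.0, lam2_max if i == 2 else 1.0) for i in degrees]
res = linprog(c, A_ub=H, b_ub=r_grid - delta,
              A_eq=[np.ones(len(degrees))], b_eq=[1.0],
              bounds=var_bounds, method="highs")
lam = dict(zip(degrees, res.x))
rate = 1.0 - sum(rj / j for j, rj in rho.items()) / sum(l / i for i, l in lam.items())
```

The stability bound (19) enters simply as an upper bound on the λ2 variable, and the resulting rate is at least 0.5 here because the (3,6)-regular profile is itself feasible at this m_0.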
Apparently, when v = 1, we get r* = 0, and the case reduces to that of the conventional threshold-constraint design. Hence, given a set of M control points (v_1, w_1), (v_2, w_2), ..., (v_M, w_M), where 0 ≤ v_1 < v_2 < ··· < v_M ≤ 1 and 0 ≤ w_1 ≤ w_2 ≤ ··· ≤ w_M ≤ 1, one can combine the constraints associated with each individual control point and perform a joint optimization to control the shape and the position of the resulting EXIT curve. Specifically, when the set of control points are proper samples from the inner EXIT curve, the resultant EXIT curve represents an optimized LDPC ensemble that is matched to the inner code.
It should be noted that not all values of w_k from the M preselected control points are suitable for computing λ2* in (19). Since the stability condition ensures asymptotic convergence to the zero-error state for a given input message mean, λ_2 ≤ λ2*(w*) is valid and required only when the output mutual information approaches 1 at the input mutual information w*. What this implies in sampling the inner EXIT curve is that at least one control point, say the rightmost point (v_M, w_M), should roughly satisfy the requirement (v_M, w_M) ≈ (1, w_M). This value of w_M is then used in (19) to compute λ2* = λ2*(w_M), which is subsequently used in λ(x) ≈ λ2* x + (1 − λ2*) x^2 to compute r* from (17). This r* is then applied to all the control points from 1 to M.
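To make (19) concrete, here is a small numerical sketch (the bisection inverse of I(·) and the example ρ(x) = x^5 are our own choices): it computes λ2*(w) for a few values of w and shows that the bound relaxes as the a priori mutual information grows:

```python
import numpy as np

def mi_from_mean(x):
    """I(x) of (7): mutual information of a symmetric Gaussian LLR, mean x, var 2x."""
    if x < 1e-12:
        return 0.0
    s = np.sqrt(2.0 * x)
    z = np.linspace(x - 12.0 * s, x + 12.0 * s, 4001)
    w = np.exp(-(z - x) ** 2 / (4.0 * x)) / np.sqrt(4.0 * np.pi * x)
    vals = np.log2(1.0 + np.exp(-z)) * w
    return 1.0 - float((vals.sum() - 0.5 * (vals[0] + vals[-1])) * (z[1] - z[0]))

def mean_from_mi(target, hi=100.0):
    """Numerical inverse of I(x), by bisection (I is monotone increasing)."""
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mi_from_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lam2_star(w, rho):
    """Stability bound (19): lam2* = exp(I^{-1}(w)/4) / sum_j (j-1) rho_j."""
    return np.exp(mean_from_mi(w) / 4.0) / sum((j - 1) * rj for j, rj in rho.items())

rho = {6: 1.0}                               # rho(x) = x^5, as in the design example
bounds_ = [lam2_star(w, rho) for w in (0.5, 0.7, 0.9)]
```

Since I^{−1}(w) grows with w, e^{I^{−1}(w)/4} grows too, so a better-converged input (larger w) permits a larger fraction of degree-2 edges.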
Case 2. Consider the case when an LDPC code is iteratively decoded together with a differential encoder, or another recursive inner code or modulation with memory. Since the inner code imposes another level of checks on all the variable nodes, degree-1 variable nodes in the outer LDPC code will get extrinsic information from the inner code, and their estimates will improve with the decoding iterations; degree-1 nodes are therefore permissible, and, without loss of generality, we let the first and the second nonzero λ_i's be λ_1 and λ_2. No analytical bounds on λ_1 or λ_2 were reported in the literature for this case. We propose to bound λ_1 by λ_1 ≤ 1 − R, where R is the code rate. The reason is that, if λ_1 > 1 − R, then there exist at least two degree-1 variable nodes, say the pth node and the qth node, which connect to the same check, as shown in Figure 3. When the four bits associated with the solid circles flip altogether, another valid codeword results, and the decoder is unable to tell (undetectable error); that is, these two degree-1 variable nodes will cause a minimum distance of 4 for the entire codeword, irrespective of the code length. (When the LDPC code operates alone, two such variable nodes are apparently useless and wasteful, and can be removed altogether.)

Figure 3: Defect for λ_1 > 1 − R.

Considering that the outer code of a PA code has only degree-1 and degree-2 variable nodes, with degree profile λ(x) = (1 − R)/(1 + R) + (2R/(1 + R))x, where R ≥ 1/2 is the code rate, and using the empirical bound on λ_1 above, we can employ the approximation λ(x) = (1 − R) + Rx in (17), which leads to the computation of (a lower bound for) r*. Code optimization as formulated by the convergence-constraint method can thus be solved using linear programming.

It is also worth mentioning that when a Gaussian approximation is used on the message pdf's, the stability condition reduces to

λ2*(w) = e^{I^{−1}(w)/4} / Σ_{j=2}^{Dc} (j − 1) ρ̃_j, (20)

which is a weaker condition than (19). Since we use the Gaussian approximation primarily for the purpose of complexity reduction, its unnecessary application is avoided; thus (19), rather than (20), is used in our design process.

3.3. Optimization results and useful findings

We now discuss some observations and findings from our optimization experiments. The set of control points need not be large; in fact, an excessive number of control points makes the optimization process converge slowly and at times converge poorly. We suggest choosing 3 to 5 control points that can reasonably characterize the shape of the inner EXIT curve. It is rather expected that the choice of the control points directly affects the optimization results. Our experiments show that the proposed method generates EXIT curves with a shape matching very well what we desire, but with a position slightly lower, indicating that the resultant code rate is slightly pessimistic (the exact code rate is dependent on the optimization result, and may be slightly different from the target code rate). This can be compensated for by presetting the control points slightly higher than we actually want them to be.

For complexity concerns, instead of performing the dual optimization, we apply the concentration theorem in (9) and preselect ρ(x) such that the average column weight is approximately 3. The left degree profile λ(x) is then optimized through the convergence-constraint method discussed in the previous subsection. We consider an inner differential code operating at 0.25 dB on an fd Ts = 0.01 Rayleigh fading channel, decoded using the noncoherent IDDD receiver with the help of 10% pilot symbols; the inner EXIT curve is computed through Monte Carlo simulations. The optimized LDPC ensemble has code rate R = 0.5037 and degree profile

λ(x) = 0.0672 + 0.4599x + 0.0264x^8 + 0.0495x^9 + 0.0720x^10 + 0.0828x^11 + 0.0855x^12 + 0.0807x^13 + 0.0760x^14, ρ(x) = x^5. (21)

The optimization result for a target rate of 0.5 is shown in Figure 4, where it can be seen that the two EXIT curves match very well with each other. Just like the outer code of PA codes, the LDPC ensemble optimal for differential coding always contains degree-1 and degree-2 variable nodes; for high-rate codes above 0.75 these nodes are dominant, but there also exists a good portion of high-degree variable nodes. This is well reflected in the EXIT charts: at rate 3/4 (see Figure 1), the area between the outer code of the PA code and the inner differential code is very small, leaving not much room for improvement, while for medium rates around 0.5 (see Figure 4) the area becomes much bigger, indicating that an optimized outer code could acquire more information rate for the same SNR threshold or, in some cases, achieve a better SNR threshold. Hence, it is fair to say that PA codes are (near-)optimal at high rates (for the same information rate, the difference from an optimized ensemble is very small in either asymptotic thresholds or finite-length simulations), but less optimal at medium rates, where the optimized LDPC ensemble contains a slightly different degree distribution than that of the PA code.
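As a quick arithmetic check (ours, using only the numbers quoted above), the reported rate, the λ_1 ≤ 1 − R bound, and the "average column weight approximately 3" remark can all be verified directly from the degree profile:

```python
# Optimized ensemble from the text: lambda(x) = 0.0672 + 0.4599x + ... , rho(x) = x^5.
# Keys are node degrees (a coefficient on x^(i-1) is the degree-i edge fraction).
lam = {1: 0.0672, 2: 0.4599, 9: 0.0264, 10: 0.0495, 11: 0.0720,
       12: 0.0828, 13: 0.0855, 14: 0.0807, 15: 0.0760}
rho = {6: 1.0}

edge_sum_v = sum(f / d for d, f in lam.items())   # sum_i lam_i / i
edge_sum_c = sum(f / d for d, f in rho.items())   # sum_j rho_j / j
R = 1.0 - edge_sum_c / edge_sum_v                 # rate formula of (6)

# Average variable-node (column) degree = total edges / total variable nodes.
avg_col_weight = 1.0 / edge_sum_v
```

The coefficients sum to one, the rate evaluates to roughly 0.504 (the reported 0.5037 up to rounding of the published coefficients), λ_1 = 0.0672 respects the bound 1 − R, and the average column weight comes out close to 3.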
Figure 4: EXIT chart of a rate-0.5 LDPC ensemble optimized using the convergence-constraint method for differential coding, together with the inner differential decoder. Degree profile of the optimized LDPC ensemble: λ(x) = 0.0672 + 0.4599x + 0.0264x^8 + 0.0495x^9 + 0.0720x^10 + 0.0828x^11 + 0.0855x^12 + 0.0807x^13 + 0.0760x^14 and ρ(x) = x^5. Normalized Doppler rate fd Ts = 0.01, 10% pilot symbols.

Figure 5: Simulations of the optimized LDPC code with differential coding and iterative differential detection and decoding. Code rate 0.5037, 10% pilot insertion, normalized Doppler rate 0.01, 15 (global) iterations, each with 6 (local) iterations in the outer LDPC decoding.

Figure 5 simulates the optimized rate-0.5037 LDPC code with differential encoding and noncoherent differential decoding. The Rayleigh channel and the inner differential decoder (the IDDD receiver) are the same as discussed in Figure 4, and 10% pilot symbols are assumed to assist the noncoherent differential detection. We chose a long codeword length of N = 64 K to test how well the simulation agrees with the analytical threshold. In the figure, the leftmost curve corresponds to the optimized DE-LDPC code using ideal detection (perfect knowledge of the fading phases and amplitudes), but with 10% pilot symbols; these wasteful pilot symbols are included in the coherent detection case to offset the curve, so as to compare fairly with all the other noncoherently detected curves with 10% pilot insertion, with the power penalty due to the pilot symbols also compensated for. The three circled curves to the right of this ideal-detection curve correspond to the noncoherent performance using iterative differential detection and decoding at the 5th, 10th, and 15th iteration. We see that the optimized differentially encoded LDPC code performs only about 0.3 dB worse than with coherent detection, which is very encouraging. In practice, a large number of iterations (e.g., 100 iterations) is preferred to fully harness the coding gain, since, as the tunnel between the inner and the outer EXIT curves becomes more narrow, the message-passing decoder takes a larger number of iterations to arrive at the zero-error state; the increased computing complexity and processing time are the price we pay for reaching out to the limit. Considering the complexity and delay affordable in a practical system, however, we simulated only 15 iterations. Even so, the simulated performance is only 0.75 dB away from the asymptotic threshold of 0.25 − 10 log10(0.5037) = 3.23 dB, showing a good theory-practice agreement.

The optimized LDPC ensemble is good in the asymptotic sense, that is, with infinite or very long code lengths. In practice, we are also concerned with finite-length implementation, or individual code realizations. According to the concentration rule, at long lengths all code realizations perform close to each other, and they all tend to converge to the asymptotic threshold as the length increases without bound. At short lengths, however, the concentration rule fails, and the performance may vary rather noticeably from one code realization to another. Good realizations have a better neighborhood condition than others, including a larger girth (achieved, e.g., through the progressive edge growth algorithm), a smaller number of short cycles, or a smaller trapping set.

For reference, we also plot in Figure 5 the performance of a PA code and a conventional LDPC code without differential coding (recall that conventional LDPC codes perform worse with differential coding than without), both having code rate around 0.5 and both noncoherently detected. We see that the PA code outperforms the conventional LDPC code, but the optimized DE-LDPC code outperforms the PA code by another 1.4 dB.
4. CONCLUSION

Part I of this two-part series of papers [1] studied product accumulate codes, a special case of differentially encoded LDPC codes, and showed that PA codes perform very well in both cases, with coherent detection and especially noncoherent detection. Here in Part II, we generalize the study from PA codes to an arbitrary differentially encoded LDPC code. The remarkable performance of LDPC codes with coherent detection has been extensively studied, but much less work has been carried out on noncoherently detected LDPC codes. In general, a noncoherently detected system may or may not employ differential encoding: the former leads to a differential encoding and noncoherent differential detection architecture, and the latter requires the insertion of (many) pilot symbols in order to track the (fast-changing) channel well. A rather unexpected finding here is that a conventional LDPC code actually suffers in either case: in the former because of an EXIT mismatch between the (outer) LDPC code and the (inner) differential code, and in the latter because of the large bandwidth expansion. (Here, by conventional we mean an LDPC code that delivers a superb performance in the usual setting, with coherent detection and possibly channel state information.)

Further investigation shows that it is not only possible, but highly beneficial, to optimize an LDPC code to match a differential decoder. The optimization is achieved through the new convergence-constraint density evolution method developed here. It provides a practical way to tune the shape and the position of an EXIT curve, with the imperfectness of the front-end processor taken into explicit consideration, and can therefore match an LDPC code to virtually any front-end processor. The resultant optimized degree profiles are rather nonconventional, as they contain (many) degree-1 and degree-2 variable nodes; this is in sharp contrast to the conventional (coherent detection) case, where degree-1 variable nodes are deemed highly undesirable. The effectiveness of the new method is confirmed by the fact that the optimized DE-LDPC code brings an additional gain of about 1.4 dB over the existing PA code and the conventional LDPC code (when noncoherent detection is used). We conclude by stating that LDPC codes can after all perform very well with differential encoding (or any other recursive inner code or modulation), but the degree profiles need to be carefully (re)designed, and one should expect the optimized degree profile to contain many degree-1 (and degree-2) variable nodes.

ACKNOWLEDGMENTS

This research work was supported in part by the National Science Foundation under Grants no. CCF-0430634 and CCF-0635199, and by the Commonwealth of Pennsylvania through the Pennsylvania Infrastructure Technology Alliance (PITA).

REFERENCES

[1] J. Li, "Differentially-encoded LDPC codes: part I—special case of product accumulate codes," to appear in EURASIP Journal on Wireless Communications and Networking.
[2] J. Li, K. R. Narayanan, and C. N. Georghiades, "Product accumulate codes: a class of codes with near-capacity performance and low decoding complexity," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 31–46, 2004.
[3] M. C. Valenti and B. D. Woerner, "Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1697–1705, 2001.
[4] H. Tatsunami, K. Ishibashi, and H. Ochiai, "On the performance of LDPC codes with differential detection over Rayleigh fading channels," in Proceedings of the 63rd IEEE Vehicular Technology Conference (VTC '06), pp. 2388–2392, Melbourne, Australia, May 2006.
[5] M. Franceschini, G. Ferrari, R. Raheli, and A. Curtoni, "Serial concatenation of LDPC codes and differential modulations," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1758–1768, 2005.
[6] J. Mitra and L. Lampe, "Simple concatenated codes using differential PSK," in Proceedings of the 49th IEEE Global Telecommunications Conference (GLOBECOM '06), pp. 1–6, San Francisco, Calif, USA, November 2006.
[7] "LDPC codes with BDPSK and differential detection over flat Rayleigh fading channels," in Proceedings of the 50th IEEE Global Telecommunications Conference (GLOBECOM '07), pp. 3245–3249, Washington, DC, USA, November 2007.
[8] M. Peleg and S. Shamai, "Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK," Electronics Letters, vol. 33, no. 12, pp. 1018–1020, 1997.
[9] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737, 2001.
[10] A. Ashikhmin, G. Kramer, and S. ten Brink, "Extrinsic information transfer functions: model and erasure channel properties," IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2657–2673, 2004.
[11] T. Richardson, M. A. Shokrollahi, and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[12] J. Hou, P. H. Siegel, and L. B. Milstein, "Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924–934, 2001.
[13] A. Shokrollahi and R. Storn, "Design of efficient erasure codes with differential evolution," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '00), Sorrento, Italy, June 2000.
[14] S. ten Brink, G. Kramer, and A. Ashikhmin, "Design of low-density parity-check codes for modulation and detection," IEEE Transactions on Communications, vol. 52, no. 4, pp. 670–678, 2004.
[15] R.-R. Chen, R. Koetter, U. Madhow, and D. Agrawal, "Joint noncoherent demodulation and decoding for the block fading channel: a practical framework for approaching Shannon capacity," IEEE Transactions on Communications, vol. 51, no. 10, 2003.
” IEEE Transactions on Communications. [16] S. 47. Richardson. T. Chung. pp. Storn and K. and R.ch/research/. [18] R. J. L. 51.” IEEE Transactions on Information Theory.epﬂ. no. Price. 657–670. pp. 4. 11. pp. “Analysis of sumproduct decoding of lowdensity paritycheck codes using a Gaussian approximation. 2.” Journal of Global Optimization. 1997. . 341– 359. 1676–1689. no. Urbanke. 2003. vol. vol. 2001. [17] http://lthcwww. no. “Diﬀerential evolution—a simple and eﬃcient heuristic for global optimization over continuous spaces.10 EURASIP Journal on Wireless Communications and Networking channel: a practical framework for approaching Shannon capacity.Y. vol. 10.
Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2008, Article ID 385421, 9 pages
doi:10.1155/2008/385421

Research Article
Construction and Iterative Decoding of LDPC Codes Over Rings for Phase-Noisy Channels

Sridhar Karuppasami and William G. Cowley

Institute for Telecommunications Research, University of South Australia, Mawson Lakes, SA 5095, Australia

Correspondence should be addressed to Sridhar Karuppasami, sridhar.karuppasami@postgrads.unisa.edu.au

Received 1 November 2007; Revised 7 March 2008; Accepted 27 March 2008

Recommended by Branka Vucetic

This paper presents the construction and iterative decoding of low-density parity-check (LDPC) codes for channels affected by phase noise. The LDPC code is based on integer rings and designed to converge under phase-noisy channels. We assume that phase variations are small over short blocks of adjacent symbols. A part of the constructed code is inherently built with this knowledge and hence able to withstand a phase rotation of 2π/M radians, where "M" is the number of phase symmetries in the signal set. Another part of the code estimates the phase ambiguity present in every observation interval. The code makes use of simple blind or turbo phase estimators to provide phase estimates over every observation interval. We propose an iterative decoding schedule to apply the sum-product algorithm (SPA) on the factor graph of the code for its convergence. To illustrate the new method, we present the performance results of an LDPC code constructed over Z_4 with quadrature phase shift keying (QPSK) modulated signals transmitted over a static channel, but affected by phase noise, which is modeled by the Wiener (random-walk) process. The results show that the code can withstand phase noise of 2 degrees standard deviation per symbol with small loss.

Copyright © 2008 S. Karuppasami and W. G. Cowley. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

In the past decade, plenty of work was done in the construction and decoding of LDPC codes [1]. In general, the code construction techniques were motivated to provide a reduced encoding complexity and better bit-error rate (BER) performance. The channels considered are generally either additive white Gaussian (AWGN) or binary erasure channels. However, many real systems are affected by phase noise (e.g., DVB-S2). The severity of the phase noise depends on the quality of the local oscillators and the symbol rate. Hence the performance of codes on channels with phase disturbances is of practical significance.

Over the past few years, iterative decoding for channels with phase disturbance has received lots of attention [2-7]. In [2, 3], the authors have proposed algorithms to apply over a factor graph model that involves the phase noise process. They used canonical distributions to deal with the continuous phase probability density functions. In particular, their approach based on the Tikhonov distribution yields a good performance. In [4], the authors developed algorithms for noncoherent decoding of turbo-like codes for phase-noisy channels. In [5], the authors showed the rotational robustness of certain codes under a constant phase offset channel with the presence of cycle slips only during the initial part of the codeword. In [6], the authors used smaller observation intervals to tackle varying frequency offset in the context of serially concatenated convolutional codes (SCCCs). They used blind and turbo phase estimators to provide a phase estimate for every subblock. Since the phase estimates obtained from the blind phase estimator (BPE) are phase ambiguous, the authors tackled the phase ambiguity by differentially encoding the subblocks independently. By differentially encoding the subblocks independently, each subblock is affected by an ambiguity of 2π/M radians, where "M" is the number of phase symmetries in the signal set. These schemes make use of pilot symbols for either estimation or decoding. However, using an inner differential encoder along with an LDPC code provides a loss in performance, and the degree distributions of the LDPC code need to be optimized [7].

The concept of smaller observation intervals in the presence of phase disturbances is attractive and offers low complexity as well. Intuitively, even under large phase disturbances the variation in phase over adjacent symbols is normally small. However, phase estimators produce poor estimates as the observation interval gets
smaller. In this paper, we propose an iterative decoding schedule under which the convergence of the global check nodes is guaranteed in the presence of phase ambiguities in any subblock.

In our earlier work [8], we used subblocks in a binary LDPC-coded receiver to tackle residual frequency offset, and in [10] we addressed the problem of phase noise for BPSK signals in the presence of a binary LDPC-coded system. Following [6, 8], the received symbol vector was split into many subblocks, but with a different perspective: a BPE or TPE was used to provide a phase estimate across every subblock, without making use of pilot symbols, and the complexity is low. We introduced the concept of "local check nodes" (LCNs) to resolve the phase ambiguity created by the BPE on the subblocks, and created a set of check nodes called "global check nodes" (GCNs) that converge irrespective of phase rotations (0 or π radians) in any subblock. Specifically, we incorporated the observation that, if the phase estimation error is smaller than π/M, the decoder may be able to converge correctly. Turbo phase/frequency estimates (e.g., [9]) are obtained during iterations to facilitate the convergence. After the convergence of the GCNs, the LCNs resolve the phase ambiguity, and the phase-ambiguity-resolved vector is decoded by an LDPC decoder. We found that, even under relatively large phase noise and observation intervals, the method provided a good performance for BPSK signals. However, the extension of the above approach to higher-order modulations was very difficult with a binary LDPC code: constructing global check nodes that converge irrespective of a phase rotation (a multiple of 2π/M radians) in the subblocks was difficult.

Local check nodes are odd-degree check nodes connected to the variable nodes present within a single subblock. In the small binary example below, the LCN degree (d_c^L) is three; if the subblock size (N_b) is six symbols, the parity-check matrix provides N_b/d_c^L = 2 LCNs to resolve the phase ambiguity in each subblock:

    H = [ 1 1 1 0 0 0  0 0 0 0 0 0
          0 0 0 1 1 1  0 0 0 0 0 0
          0 0 0 0 0 0  1 1 1 0 0 0
          0 0 0 0 0 0  0 0 0 1 1 1
          . . . . . .  . . . . . . ],   (1)

in which the bottom (dotted) part is connected according to random construction, and the local check nodes correspond to the top four rows of the parity-check matrix.

This paper addresses the problem of extending the above code construction technique to higher-order signal constellations based on integer rings. In particular, we construct LDPC codes over rings with certain constraints on the placement of edges and edge gains such that they, along with subblock phase estimation techniques, provide good performance under phase-noisy channels with low complexity. The quality of the phase ambiguity estimate is better with more LCNs; with reduced subblock sizes, however, the phase ambiguity estimate is less reliable and the code suffers performance degradation.

The remainder of the paper is organized as follows. In Section 2, we discuss the channel model considered for our simulations. In Section 3, we address the effects of phase ambiguity on the check nodes and discuss the construction of global and local check nodes; in particular, we present generalized edge constraints, based on integer rings and applicable to any phase-symmetric modulation scheme, such that the local check nodes are able to resolve the phase ambiguity in the subblock. In Section 4, we explain code construction and present a matrix inversion technique to obtain the generator matrix. In Section 5, we explain the receiver architecture and detail the iterative decoding for the convergence of these codes. In Section 6, to illustrate the concepts discussed in this paper under a phase-noisy channel, we show the performance of an LDPC code constructed over Z_4 with codewords mapped onto QPSK modulation, and discuss the BER performance of the proposed receiver under phase noise conditions. In Section 7, we discuss the benefits of the blind phase estimator in reducing the computational complexity involved with the turbo phase estimation, show the additional computational complexity required due to the phase estimation process, and present the BER performance of the low-complexity iterative receiver with the Z_4 code under phase noise conditions. We conclude in Section 8 by summarizing the results of this paper.

2. CHANNEL MODEL

An information sequence is encoded by an (N, K) nonbinary LDPC code constructed over integer rings (Z_M), where N and K represent the length and dimension of the code and Z_M denotes the integers {0, 1, ..., M - 1} under addition modulo M. The alphabets over Z_M are mapped onto complex symbols s using phase shift keying (PSK) modulation with M phase symmetries. The complex symbols are transmitted over a channel affected by carrier phase disturbance and complex additive white Gaussian noise. Ideal timing and frame synchronization are assumed, and henceforth all the simulations assume one sample per symbol. At the receiver, after matched filtering and ideal sampling, we have

    r_k = s_k e^{jθ_k} + n_k,   k = 0, 1, ..., N_s - 1,   (2)

where, for QPSK, the transmitted symbol s_k ∈ {s_m = e^{j((π/2)m + π/4)}}, m ∈ {0, 1, 2, 3}, and s_k, r_k, θ_k, and n_k are the kth components of the vectors s, r, θ, and n, respectively, of length N_s. The noise samples n_k contain uncorrelated real and imaginary parts with zero mean and two-sided power spectral density (PSD) of N_0/2.
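The channel model can be exercised with a short simulation. The sketch below is our own illustration, not code from the paper (function names and parameter values are our choices): it maps Z_4 symbols onto QPSK as s_m = e^{j((π/2)m + π/4)} and applies Wiener phase noise and AWGN as in (2) and (3).

```python
import cmath
import math
import random

def qpsk_map(symbols_z4):
    # s_m = exp(j((pi/2)m + pi/4)), m in Z4: the mapping used in the paper
    return [cmath.exp(1j * ((math.pi / 2) * m + math.pi / 4)) for m in symbols_z4]

def wiener_phase(n, sigma_delta_deg, rng):
    # theta_k = theta_{k-1} + Delta_k with Delta_k white Gaussian (eq. (3));
    # theta_0 drawn uniformly from (-pi, pi)
    sigma = math.radians(sigma_delta_deg)
    theta = [rng.uniform(-math.pi, math.pi)]
    for _ in range(1, n):
        theta.append(theta[-1] + rng.gauss(0.0, sigma))
    return theta

def channel(s, theta, n0, rng):
    # r_k = s_k e^{j theta_k} + n_k (eq. (2)); noise PSD N0/2 per dimension
    sd = math.sqrt(n0 / 2)
    return [sk * cmath.exp(1j * tk) + complex(rng.gauss(0, sd), rng.gauss(0, sd))
            for sk, tk in zip(s, theta)]

rng = random.Random(1)
s = qpsk_map([0, 1, 2, 3] * 25)          # 100 QPSK symbols
theta = wiener_phase(len(s), 2.0, rng)   # sigma_Delta = 2 degrees per symbol
r = channel(s, theta, 0.5, rng)
```

The received vector r would then be split into B subblocks for per-subblock phase estimation, as described next.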
The phase noise process θ_k is generated using the Wiener (random-walk) model described by

    θ_k = θ_{k-1} + Δ_k,   k = 1, 2, ..., N_s - 1,   (3)

where Δ_k is a white real Gaussian process with a standard deviation of σ_Δ, and θ_0 is generated uniformly from the distribution (-π, π). The proposed receiver tackles modest to high levels of phase noise. For instance, the phase noise considered in this paper (Wiener model with σ_Δ of 1 and 2 degrees per symbol) is several times larger than the phase noise mentioned in the European Space Agency model (Wiener model with σ_Δ = 0.3 degrees per symbol [2]). However, due to the assumptions made in (4), the proposed receiver will not be able to tackle large amounts of phase noise, such as the Wiener model with σ_Δ = 6 degrees per symbol in [2, 3].

Let us divide the received symbol vector r of length N_s into B subblocks of length N_b. Assuming small phase variations over adjacent symbols, we may approximate the phase variations on the symbols in the lth subblock by a mean phase offset θ_l ∈ (-π, π). Similar to (2), the received sequence can then be expressed as

    r_k' ≈ s_k' e^{jθ_l} + n_k',   (4)

where k' = N_b l + k, k = 0, 1, ..., N_b - 1, and l = 0, 1, ..., B - 1. The approximate phase offset over the lth subblock, θ_l ∈ (-π, π), can be represented as the summation of an ambiguous phase offset φ_l ∈ (-π/M, π/M) and the phase ambiguity α_l ∈ {0, 2π/M, 4π/M, ..., 2(M - 1)π/M}. While the channel model in (2) is used in our simulations, we use the approximate model in (4) for the code construction and receiver-side processing.

3. EFFECT OF PHASE AMBIGUITIES ON THE CHECK NODES

In this section, we address the effect of phase rotations that are multiples of 2π/M radians on the global and local check nodes of an LDPC code constructed over Z_M; in the case of a binary LDPC code, the check nodes will not resolve certain phase ambiguities, as explained below. All the operations below are modulo M. If a subblock is rotated by 2πt/M radians, where t ∈ {0, 1, ..., M - 1}, all symbols in that subblock are rotated by 2πt/M radians, that is, incremented by t (mod M). In the remaining subsections, we denote the degrees of the GCN and LCN as d_c^G and d_c^L, respectively.

3.1. Local check nodes

Local check nodes resolve the phase ambiguity present in a subblock. Let the elements H_{i,j} participating in check i be selected from a single subblock such that

    Σ_{j=1}^{d_c^L} H_{i,j} x_j = 0 (mod M),   (5)

where d_c^L is the degree of the check node, x_j is the jth symbol participating in the ith check node, and the value of H_{i,j} is chosen from the set of nonzero divisors of Z_M ({1, 3} from Z_4). If all the symbols x_j participating in the ith local check node are rotated by 2πt/M radians, the check equation in (5) becomes

    Σ_{j=1}^{d_c^L} H_{i,j}(x_j + t) = Σ_{j=1}^{d_c^L} H_{i,j} x_j + t Σ_{j=1}^{d_c^L} H_{i,j}.   (6)

By (5), the syndrome of the rotated symbols equals t Σ_{j=1}^{d_c^L} H_{i,j}; we can show that for every t there exists a distinct residue (mod M) which provides a solution for the phase ambiguity present on the participating symbols x_j, provided that Σ_{j=1}^{d_c^L} H_{i,j} has a multiplicative inverse in Z_M. This is achieved by performing the summation over modulo 2 rather than M, that is, by requiring

    Σ_{j=1}^{d_c^L} H_{i,j} ≠ 0 (mod 2).   (8)

Otherwise, for a nonzero t the rotated syndrome can vanish, since

    t Σ_{j=1}^{d_c^L} H_{i,j} = 0 (mod M)   (9)

can hold even for t ≠ 0.

3.2. Global check nodes

Unlike local check nodes, the edges of the GCN are spread across many subblocks. Let p be the number of global check node edges connected to symbols present within one subblock. If the symbols of one subblock are rotated by 2πt/M radians, the check sum in (6) changes by t Σ_{j=1}^{p} H_{i,j}; thus, for an arbitrary integer t, (6) becomes zero only if

    Σ_{j=1}^{p} H_{i,j} = 0 (mod M).   (7)

In general, p can either be odd or even depending on the values of H_{i,j}. Since each H_{i,j} is chosen from the set of nonzero divisors of Z_M, p should be even in order to satisfy (7); hence p becomes even in the case of an LDPC code over integer rings, which further makes d_c^G even as well.

Example 1. Assume an LDPC code constructed over Z_4 with B = 4 subblocks. Consider a degree-8 GCN whose edges are connected to two symbols per subblock (p = 2), and let the corresponding edge gains be g = [1, 3, 1, 3, 1, 3, 1, 3]. Note that each subblock has one edge with value "1" and another with "3," whose sum is 0 (mod 4) as required by (7). Under a noiseless channel, one set of symbols that satisfies this check is x = [3, 1, 3, 1, 3, 1, 3, 1]. Let us assume that subblocks one and four are rotated by π/2 and π radians, respectively. The subblock-rotated version of x is then x_r = [0, 2, 3, 1, 3, 1, 1, 3]. It can be seen that x_r still satisfies the parity check equation with the same g.
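Example 1 can be checked numerically. The short script below is our own illustration (names are our choices): it builds the degree-8 GCN with gains g = [1, 3, 1, 3, 1, 3, 1, 3] and p = 2 edges per subblock, and verifies that the check stays satisfied mod 4 after rotating subblocks one and four by π/2 and π (t = 1 and t = 2).

```python
M = 4
g = [1, 3, 1, 3, 1, 3, 1, 3]      # two edges per subblock; per-subblock gains sum to 0 mod 4
x = [3, 1, 3, 1, 3, 1, 3, 1]      # satisfies the global check under a noiseless channel

def check(gains, syms, M):
    # syndrome of one check node over Z_M
    return sum(h * s for h, s in zip(gains, syms)) % M

assert check(g, x, M) == 0

# rotate subblock 1 by pi/2 (t = 1) and subblock 4 by pi (t = 2):
# a rotation by 2*pi*t/M adds t to every Z_M symbol of that subblock
t_per_subblock = [1, 0, 0, 2]
xr = [(s + t_per_subblock[j // 2]) % M for j, s in enumerate(x)]

# the global check is still satisfied, because within each subblock the
# edge gains sum to 0 mod M (eq. (7)), so the added t * sum(gains) vanishes
assert check(g, xr, M) == 0
```

The same script reports a nonzero syndrome if a per-subblock gain pair is chosen that violates (7), which is the design constraint the construction enforces.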
Hence t can be written as

    t = (Σ_{j=1}^{d_c^L} H_{i,j})^{-1} × (Σ_{j=1}^{d_c^L} H_{i,j}(x_j + t)) (mod M).   (10)

Thus, choosing Σ_{j=1}^{d_c^L} H_{i,j} with a multiplicative inverse in Z_M ensures phase ambiguity resolution. In the case where the sum does not have a multiplicative inverse in Z_M (say it equals a zero divisor), then (9) is satisfied for any t ∈ {zero divisors in Z_M}, and hence the phase ambiguity estimate is not unique. Since the nonzero divisors of Z_M used as edge gains are odd integers less than M, we require an odd number of edges to satisfy (8). Hence the degree of the local check node d_c^L is always considered to be odd in this work.

Example 2. Let us consider the code and rotations as in Example 1. Let the code include a degree-3 LCN whose edges, with gains [1, 3, 1], are connected to the first subblock. A set of symbols that satisfies this check is x = [3, 1, 2]. Due to the rotation of π/2 radians in the first subblock, x_r = [0, 2, 3]. Using (10), we can evaluate that t = 1, which corresponds to π/2 radians.

4. NONBINARY CODE CONSTRUCTION

We apply the above set of principles in constructing codes that are beneficial in dealing with phase noise channels.

4.1. Code construction

Similar to [11], we construct a binary code and choose the nonzero divisors from Z_M as edge gains such that the check conditions described in Section 3 are satisfied. In this work, we select the values of H_{i,j} from the set of nonzero divisors of Z_M ({1, 3} in the case of Z_4) to avoid problems during matrix inversion. Following Section 2, let us say we have "B" subblocks of length N_b. A binary parity check matrix H_{(N-K)×N} is constructed such that it involves two parts:

    H = [ H_resolving
          · · · · · ·
          H_converging ].   (11)

The upper (B × N) part of the matrix, called H_resolving, involves B local check nodes, in contrast to N_b/d_c^L LCNs as in our previous method [8], which are used to resolve the phase ambiguity in the B subblocks. The lower ((N - K - B) × N) part of the matrix, called H_converging, contains N - K - B check nodes whose neighbours are selected such that their convergence is independent of the phase ambiguities in the subblocks. We construct the code as per the following procedure.

(1) Construction of local check nodes: the edges of the local check node are connected to the first d_c^L symbols of the subblock for which it resolves the phase ambiguity. Alternatively, we can arbitrarily choose the set of d_c^L symbols from any part of the subblock. For example, assuming d_c^L = 3, let H_{i,j} = 1 where j corresponds to the first 3 columns of each subblock.

(2) Construction of global check nodes: for every symbol, the parity checks in which the symbol participates are randomly chosen based on its degree and (7), with every global check node participating in only two symbols from a subblock (p = 2). The restriction of two edges per subblock provides a better connectivity in the code. Care was taken to avoid short cycles after constructing every column. The same technique is continued to construct the remaining global check nodes in the dotted part of the matrix.

The codes are designed to be check biregular, with two different degrees; we assume the degree of all the local (global) check nodes to be equal to d_c^L (d_c^G). However, there is no constraint on the variable node degree. To illustrate the local and global check nodes, a small parity check matrix H with B = 4 subblocks of length 6 is shown in (12):

    H = [ 1 3 1 0 0 0  0 0 0 0 0 0  0 0 0 0 0 0  0 0 0 0 0 0
          0 0 0 0 0 0  1 1 1 0 0 0  0 0 0 0 0 0  0 0 0 0 0 0
          0 0 0 0 0 0  0 0 0 0 0 0  3 3 3 0 0 0  0 0 0 0 0 0
          0 0 0 0 0 0  0 0 0 0 0 0  0 0 0 0 0 0  3 1 3 0 0 0
          · · · · · ·  · · · · · ·  · · · · · ·  · · · · · ·
          1 0 0 3 0 0  0 1 0 3 0 0  0 3 0 1 0 0  0 0 1 0 3 0
          0 1 0 3 0 0  0 3 1 0 0 0  0 0 3 1 0 0  1 3 0 0 0 0
          · · · · · ·  · · · · · ·  · · · · · ·  · · · · · · ].   (12)

The first four rows, corresponding to the local check nodes (H_resolving), are shown at the top. The two rows below the local check nodes are connected globally and also have p = 2 edges connected to symbols from each subblock. The local and the global check nodes shown in the first and fifth rows of the H matrix are used in the previous examples. A portion of the Tanner graph of the H matrix, assuming d_c^L = 3, is shown in Figure 1; local check nodes (shaded checks) and their edges (solid lines) are distinguished from the global check nodes and their edges (dash-dotted lines).

4.2. Some comments on encoding

We used the Gaussian elimination (GE) approach to obtain a systematic generator matrix. Even though the edge gains of the parity check matrix are nonzero divisors, we encountered zero divisors ({2} in the case of Z_4) during GE in the diagonal part of the matrix. To avoid this problem, we interchanged columns across the parity check matrix such that we obtain a generator matrix (G) corresponding to the column-interchanged parity check matrix (H'). Since we wanted to use the original H matrix instead of H', we created a permutation table (P) to record the columns that are interchanged during inversion.
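A minimal sketch of how a converged LCN yields the subblock ambiguity via (10), using the degree-3 LCN of Example 2 (gains [1, 3, 1]). This is our own illustration; the helper names are hypothetical, and the modular inverse uses Python's `pow(a, -1, M)`, available from Python 3.8.

```python
M = 4
h = [1, 3, 1]     # LCN edge gains: nonzero divisors of Z4; odd count keeps the sum invertible
x = [3, 1, 2]     # satisfies 1*3 + 3*1 + 1*2 = 8 = 0 mod 4, per eq. (5)

def inv_mod(a, M):
    # multiplicative inverse in Z_M (exists because sum(h) is a nonzero divisor)
    return pow(a, -1, M)

def resolve_t(h, x_rot, M):
    # eq. (10): t = (sum h)^-1 * (sum h*(x_j + t)) mod M,
    # using sum h*x_j = 0 for the unrotated codeword symbols
    return (inv_mod(sum(h) % M, M)
            * sum(hi * xi for hi, xi in zip(h, x_rot))) % M

t = 1                                  # subblock rotated by pi/2
x_rot = [(xi + t) % M for xi in x]     # hard decisions correspond to (x_j + t)
assert resolve_t(h, x_rot, M) == t     # recovers t = 1, i.e. alpha_l = pi/2
```

Replacing the gains with a set whose sum is a zero divisor (e.g., [1, 3] with sum 0 mod 4) makes `inv_mod` fail, which is exactly the non-uniqueness predicted by (9).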
Figure 1: Tanner graph of the H matrix in (12), illustrating local and global check nodes (LCN with d_c^L = 3, GCN with d_c^G = 8, p = 2, B = 4).

Alternate inversion techniques may avoid the use of the permutation table P. The message is encoded by the generator matrix G to produce the codeword c, which undergoes inverse permutation to produce c'. Since the permuted encoded symbols are codewords of the original code H, the decoder decodes the codeword using H. The codeword c' is transmitted through the composite channel, and the decoded codeword is again permuted to give the original codeword c. A summary of the communication system used in the simulations is given in Figure 2.

5. RECEIVER ARCHITECTURE AND ITERATIVE DECODING SCHEDULE

This section discusses the application of the SPA on the factor graph of the code, with a phase offset on every subblock, such that the benefits of the local and global check nodes are achieved. We used the SPA for LDPC codes over rings, similar to [12]. In general, in the presence of phase disturbances, phase estimators provide an ambiguous phase estimate, and hence the SPA is applied only over the rotationally invariant part of the factor graph, that is, the graph involving global check nodes only. Thus we split up the decoding into three phases, as described below.

(1) Converging phase.
(a) The likelihood vector, of length M, for the kth variable node is initialized with the channel likelihoods. In the case of an AWGN channel, p(r_k | s_k = s_m) = (1/2πσ^2) exp{-||r_k - s_m||^2 / 2σ^2}, where m = {0, 1, ..., M - 1}, k = {0, 1, ..., N_s - 1}, and σ^2 is the noise variance.
(b) The SPA is applied over the H_converging part of the code alone. Local check nodes are not used; the messages coming from these nodes are assigned to be equiprobable.
(c) After every d iterations, the turbo phase estimator (TPE) [9] estimates the phase offset φ_l, which is given by

    φ_l = arg ( Σ_{k'} r_{k'} a*_{k'} ),   (13)

where r_{k'} is the k'th component in the lth subblock and a*_{k'} is the complex conjugate of the soft symbol estimate. The received symbol vector corresponding to the lth subblock is corrected using the turbo phase estimate φ_l.
(d) The likelihoods are recalculated from r' after phase correction and are used to update the messages that are passed on to the global check nodes.
(e) Steps (a)-(c) are repeated until all the global check nodes are satisfied. In general, the decoder converges at the end of this stage.

(2) Resolving phase.
(a) As the symbol a posteriori probabilities at the variable nodes are good enough at the end of the converging phase, a hard decision is taken on the symbols, which corresponds to (x_j + t) in (10). These hard decisions are used to evaluate the subblock phase ambiguity estimates α_l = 2πt/M using local check nodes as in (10), which are further used to correct the received symbol values. Once the ambiguities are resolved, turbo phase estimation can provide a phase estimate in the range (-π, π).

(3) Final phase.
(a) If required, the SPA may be applied over the entire code for convergence, involving both H_resolving and H_converging, until either the syndrome (H c^T = 0) is satisfied or a specified number of iterations is reached. Turbo phase estimation or phase ambiguity resolution is not required at this phase.

In the above, the soft symbol estimate a_k of the symbol s_k is given by

    a_k = Σ_{m=0}^{M-1} s_m p(s_k = s_m | r_k),   (14)

where p(s_k = s_m | r_k) is the a posteriori probability that symbol s_k = s_m.
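Steps (c)-(d) of the converging phase can be sketched as follows. This is our own illustration, not the paper's code: the a posteriori probabilities would come from the SPA decoder, but here they are made-up values, and the helper names are our choices. It computes soft symbols via (14) and the subblock phase estimate via (13).

```python
import cmath
import math

# QPSK constellation s_m = exp(j((pi/2)m + pi/4)), m in Z4
QPSK = [cmath.exp(1j * ((math.pi / 2) * m + math.pi / 4)) for m in range(4)]

def soft_symbol(posteriors):
    # eq. (14): a_k = sum_m s_m * p(s_k = s_m | r_k)
    return sum(s * p for s, p in zip(QPSK, posteriors))

def turbo_phase_estimate(r_sub, post_sub):
    # eq. (13): phi_l = arg( sum_k r_k * conj(a_k) ) over one subblock
    acc = sum(rk * soft_symbol(pk).conjugate() for rk, pk in zip(r_sub, post_sub))
    return cmath.phase(acc)

# toy subblock: transmitted symbol m = 0, rotated by a residual offset of 0.1 rad
phi = 0.1
r_sub = [QPSK[0] * cmath.exp(1j * phi)] * 8
post_sub = [[0.97, 0.01, 0.01, 0.01]] * 8   # decoder confident in m = 0
est = turbo_phase_estimate(r_sub, post_sub)
assert abs(est - phi) < 1e-6                # recovers the 0.1 rad offset
```

Note that with uniform posteriors the soft symbols average to zero, so the estimate only becomes useful as the decoder gains confidence, which is why the TPE is run during (not before) the converging phase.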
Figure 2: Communication system (message -> G -> c -> P^{-1} -> c' -> mapper and channel model as in (2) -> LDPC receiver, see Figure 3 -> c' -> P -> c).

Figure 3: Proposed LDPC receiver architecture. The received vector r is corrected over subblocks by e^{-jφ_l} from the turbo phase estimator (run with a delay of d iterations) and passed to the LDPC decoder; once all GCNs are satisfied, the phase ambiguity resolver applies e^{-jα_l} over the subblocks.

5.1. Comments on turbo phase estimation

The receiver architecture to tackle large phase disturbances is shown in Figure 3. In general, in the presence of phase disturbances, the decoder converges to a codeword which is rotationally equivalent to the transmitted codeword. Since the decoding algorithm works in the probability domain, the a posteriori probabilities of the symbols p(s_k = s_m | r_k) are directly available from the decoder during the converging phase of this code. Given the a posteriori probability vector of length M for the kth symbol, the soft symbol estimate of the symbol s_k can be calculated according to (14). Hence the turbo phase estimator provides a phase estimate whose range lies between (-π/M, π/M). This is illustrated in Figure 4, which shows the mean trajectories of the turbo phase estimates over a subblock of 100 symbols at an E_b/N_0 = 2 dB under a constant phase offset (θ).

Figure 4: Evolution of turbo phase estimates over subblocks during convergence (mean turbo phase estimate in degrees versus number of iterations, for θ = 0, 15, 30, 60, 75, and 90 degrees).

5.2. Computational complexity

Reducing the computational complexity of the nonbinary LDPC decoder is an active area of research [13, 14]. The computational complexity of the nonbinary LDPC decoder is dominated by the check node decoder, with O(M^2) operations. The computational complexity of the proposed LDPC receiver can be evaluated as the summation of the complexities of the LDPC decoder and the phase estimator/ambiguity resolver. In this paper, we concentrate only on the additional complexity involved in the receiver due to the turbo phase estimation in (13) and the ambiguity resolution.

To estimate N_s soft symbol estimates, we require 2(M - 1)N_s real additions and 2MN_s real multiplications. Given the soft symbol estimates, the evaluation of the turbo phase estimates for B subblocks requires an additional 4N_s real multiplications, 2(N_s - B) real additions, and B lookup table (LUT) accesses for evaluating the arg function. Correcting every symbol by the turbo phase estimate requires 4 real multiplications and 2 real additions. Thus the total complexity involved in estimating and correcting a symbol for its phase offset using a turbo phase estimator per iteration (O_TPE) is given as

    O_TPE = [2M + 8]_× + [2M + 4 - 2B/N_s]_+ + [B/N_s]_LUT,   (15)
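The per-symbol, per-iteration operation counts in (15) can be tallied programmatically. The sketch below reflects our reading of the counts above (the function name and grouping of terms are our own):

```python
def tpe_cost_per_symbol(M, B, Ns):
    # eq. (15): O_TPE = [2M+8]x + [2M+4-2B/Ns]+ + [B/Ns]LUT
    mults = 2 * M + 8                 # soft symbols (2M) + estimate (4) + correction (4)
    adds = 2 * M + 4 - 2 * B / Ns     # soft symbols, products/accumulation, correction
    luts = B / Ns                     # one arg() lookup per subblock
    return mults, adds, luts

# parameters of the simulated code: M = 4, B = 30 subblocks, Ns = 3000 symbols
m, a, l = tpe_cost_per_symbol(M=4, B=30, Ns=3000)
assert m == 16                        # matches the [16]x quoted in Section 7
assert round(a, 2) == 11.98           # approximately the [12]+ quoted
```

Because B/N_s = 1/N_b is small, the LUT term is negligible and the cost is dominated by the 2M soft-symbol multiplications, which is why skipping TPE iterations (d > 1) pays off.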
we show that the computational complexity involved with the turbo phase estimation can be reduced by using a blind phase estimator just once. in the current work. However.0067x4 + 0. Thus the additional complexity of the receiver. thereafter with a small loss.4 Eb /N0 (dB) AWGN σ Δ = 0◦ σ Δ = 1◦ σ Δ = 2◦ Figure 5: Performance of the proposed receiver in Figure 3 with QPSK and the Wiener phase model.14 Probability of decoder convergence 0.02 0 0 20 40 60 80 100 120 140 Number of iterations 160 180 200 BPE + TPE (d = 10) TPE (d = 10) TPE (d = 1) Figure 6: Convergence improvement due to an initial blind phase estimator. We replaced the edge gains of this code from the nonzero divisors of Z4 such that they follow the constraints discussed in Section 3.12 0. 0.08 0. Simulations are performed either until 100 codeword errors are found or up to 500. 7. we are able to delay the PAR on the subblocks since the code can converge with the phase ambiguous estimates obtained from the TPE alone.2 1. assuming d = 1. less than 40 iterations are required for convergence. 6. [·]+ . LOWER COMPLEXITY ITERATIVE RECEIVER 1 1. by employing an initial BPE for coarse phase estimation and correction of the subblocks. The complexity involved in resolving phase ambiguity per symbol is very small. the code is able to tolerate a phase noise with σΔ = 2◦ per symbol.5 for a subblock size of Nb = 100 symbols.3 dB. BER PERFORMANCE OF THE PROPOSED RECEIVER 100 10−1 10−2 BER 10−3 10−4 10−5 10−6 7 We constructed a binary LDPC code of N = 3000.000 transmissions.2 2. respectively. The degree distributions of this binary code were obtained through EXIT charts [15] such that they converged at an Eb /No of 1. For a constant phase oﬀset. The code corresponds to B = 30 subblocks over the codeword. In the case of the LDPC code described in Section 6. Iterations are performed until the codeword converges. 
Simulation results in Figure 5 show the performance of our receiver in Figure 3 under phase noise conditions.1. real additions. is relatively small. the number of iterations required for convergence can be reduced. in terms of node perspective. 7.S. respectively. we used a BPE to provide phase estimate for every subblock of symbols before resolving PAR using the local check nodes. by . and lookup table access.1 dB with a Wiener phase noise of 1◦ standard deviation per symbol. we found that the code with subblock size of 100 symbols gives the best BER performance for the amounts of phase noise considered in this paper. Also phase ambiguity resolution is required only once per decoding. Hence in our earlier work [8].1887x8 and ρ(x) = 0.98x8 . the additional complexity per symbol per iteration is approximately equivalent to ([16]× + [12]+ ) operations. Figure 6 illustrates the beneﬁt of blind phase estimation at an Eb /N0 = 2. However. only during the converging phase. we found that on an average in the waterfall region. The variable node and check node distributions. or to a maximum of 200 iterations.8047x3 + 0. Through simulations. Turbo phase estimation was done after every iteration (d = 1). which did not include local check nodes during the convergence phase and the degraded performance of the turbo phase estimator with reduced subblock size. there is a small degradation of around 0. mainly due to turbo phase estimation. In this section. However. It also shows that the computational complexity due to TPE can be reduced.06 0. before the iterative receiver proposed in Figure 3. R = 0. G. Cowley where [·]× .02x3 + 0.4 1.6 1. This loss is due to the proposed application schedule of SPA on the code. K = 1500. approximately a factor of 10.1 0. Hence the proposed architecture does not require the use of a blind phase estimator.04 0. Karuppasami and W. 
7.1. Comments on initial phase estimation

The performance of the LCN-based phase ambiguity resolution (PAR) algorithm degrades with the amount of phase offset present on the symbols participating in the LCN. Hence, in our earlier work [8], we used a BPE to provide a phase estimate for every subblock of symbols before resolving PAR using the local check nodes. In the current work, the code can converge with the phase-ambiguous estimates obtained from the TPE alone, and we are therefore able to delay the PAR on the subblocks; hence the proposed architecture does not require the use of a blind phase estimator. However, by employing an initial BPE for coarse phase estimation and correction of the subblocks, the number of iterations required for convergence can be reduced.

Figure 6 illustrates the benefit of blind phase estimation at an Eb/N0 = 2.3 dB. It also shows that the computational complexity due to TPE can be reduced, by approximately a factor of 10, when turbo phase estimates are obtained only once in every 10 iterations (d = 10) after the initial BPE.
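A blind subblock phase estimate of this kind can be formed, for example, with a fourth-power (Viterbi–Viterbi-style) estimator; the paper does not specify its BPE, so the sketch below is only an illustration. Note the inherent π/2 ambiguity for QPSK, which is exactly why PAR is still required afterwards:

```python
import cmath
import math
import random

def fourth_power_bpe(subblock):
    """Blind QPSK phase estimate over one subblock via the fourth-power
    (Viterbi-Viterbi-style) method. Raising each sample to the 4th power
    strips the modulation ({1, j, -1, -j} ** 4 == 1), leaving e^{j*4*phi};
    the estimate is therefore ambiguous modulo pi/2."""
    acc = sum(r ** 4 for r in subblock)
    return cmath.phase(acc) / 4.0

# Illustration: a subblock of Nb = 100 QPSK symbols with a 0.2 rad offset.
rng = random.Random(0)
constellation = [1, 1j, -1, -1j]
phi = 0.2
rx = [rng.choice(constellation) * cmath.exp(1j * phi) for _ in range(100)]
phi_hat = fourth_power_bpe(rx)

# Rotating the whole subblock by a further 90 degrees yields the SAME
# estimate -- the pi/2 ambiguity that the LCN-based PAR must resolve.
rx2 = [s * cmath.exp(1j * math.pi / 2) for s in rx]
phi_hat2 = fourth_power_bpe(rx2)
```

The estimator costs only one complex fourth power and accumulation per symbol, which is consistent with using it once, up front, for coarse correction.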
7.2. BER performance

The code described in Section 6 was used to simulate the BER performance of the iterative receiver with low computational complexity. The blind phase estimator was used to estimate and correct the phase disturbance present in each subblock of the received symbol vector, following which the phase-corrected symbol vector was fed into the iterative receiver in Figure 3; turbo phase estimates were then obtained only once in every 10 iterations (d = 10).

[Figure 7: Performance of the low complexity receiver discussed in Section 7 under phase noise with σΔ = 2° per symbol. BER versus Eb/N0 (dB); curves for AWGN, TPE (d = 1), BPE + TPE (d = 10), TPE (d = 10), and TPE (d = 1 till the 10th iteration and then d = 10).]

Figure 7 shows the advantage of a blind phase estimator in terms of BER performance. The result compares three distinct cases with the normal receiver, in which turbo phase estimation was performed in every iteration. Without the blind phase estimator, performing turbo phase estimation only once in every 10 iterations shows significant degradation. The presence of the blind phase estimator allows us to include the turbo phase estimator only once in every 10 iterations with a small loss of 0.05 dB. However, during the convergence phase, particularly the early stages of the decoder, during which the LDPC decoder provides a lot of new information regarding the symbols, the performance can be improved by including turbo phase estimation for more iterations. The subblock size used in the simulation results shown earlier has not been optimized, and we believe that the method can be extended to adjust the observation interval and phase model depending on the amount of phase noise.
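The schedules compared above differ only in which iterations invoke the TPE. A toy tally of TPE invocations over a 40-iteration decode (our sketch; the function and variable names are ours) shows where the roughly factor-of-10 saving comes from:

```python
def tpe_iterations(n_iters, d, d_initial=None, switch_at=0):
    """Return the 1-based iteration indices at which turbo phase estimation
    runs. 'd' is the TPE period; optionally use period 'd_initial' for the
    first 'switch_at' iterations (the hybrid schedule in Figure 7)."""
    runs = []
    for it in range(1, n_iters + 1):
        period = d_initial if (d_initial is not None and it <= switch_at) else d
        if it % period == 0:
            runs.append(it)
    return runs

n = 40  # typical waterfall-region iteration count reported above
every_iter = tpe_iterations(n, d=1)                          # normal receiver
sparse = tpe_iterations(n, d=10)                             # TPE once per 10 iterations
hybrid = tpe_iterations(n, d=10, d_initial=1, switch_at=10)  # d = 1 till 10th, then d = 10
```

With d = 10 the TPE runs 4 times instead of 40, a tenfold reduction in TPE cost; the hybrid schedule spends extra TPE updates only in the information-rich early iterations.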
8. CONCLUSION

In this paper, we addressed the problem of LDPC code-based iterative decoding under phase noise channels from a code perspective. We proposed the construction of ring-based codes for higher-order modulations that work well with subblock phase estimation techniques of low complexity. The code was constructed using the new constraints outlined in Section 3 such that it not only converges under subblock phase rotations, but also estimates them. We also showed, in a generalized manner, the property of ring-based check nodes under the presence of phase ambiguity, based on their edge gains. As part of our future work, we are looking at ways to construct codes without explicitly constructing local check nodes for PAR.

ACKNOWLEDGMENTS

The authors wish to acknowledge helpful discussions with Dr. Steven S. Pietrobon on this topic and also thank the reviewers for their useful comments.
REFERENCES

[1] R. G. Gallager, "Low density parity-check codes," IEEE Transactions on Information Theory, vol. 8, pp. 21–28, 1962.
[2] G. Colavolpe, A. Barbieri, and G. Caire, "Algorithms for iterative decoding in the presence of strong phase noise," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1748–1757, 2005.
[3] A. Barbieri, G. Colavolpe, and G. Caire, "Joint iterative detection and decoding in the presence of phase noise and frequency offset," IEEE Transactions on Communications, vol. 55, no. 1, pp. 171–179, 2007.
[4] I. Motedayen-Aval and A. Anastasopoulos, "Polynomial-complexity noncoherent symbol-by-symbol detection with application to adaptive iterative decoding of turbo-like codes," IEEE Transactions on Communications, vol. 51, no. 2, pp. 197–207, 2003.
[5] R. Nuriyev and A. Anastasopoulos, "Rotationally invariant and rotationally robust codes for the AWGN and the noncoherent channel," IEEE Transactions on Communications, vol. 51, no. 12, pp. 2001–2010, 2003.
[6] W. G. Cowley and M. …, "Transmission design for Doppler-varying channels," in Proceedings of the 7th Australian Communications Theory Workshop (AusCTW '06), Perth, Australia, February 2006, pp. 47–50.
[7] M. Franceschini, G. Ferrari, R. Raheli, and A. Curtoni, "Serial concatenation of LDPC codes and differential modulations," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1758–1768, 2005.
[8] S. Karuppasami and W. G. Cowley, "LDPC code-aided phase-ambiguity resolution for QPSK signals affected by a frequency offset," in Proceedings of the 8th Australian Communications Theory Workshop (AusCTW '07), Adelaide, Australia, February 2007, pp. 110–113.
[9] N. Noels, V. Lottici, A. Dejonghe, et al., "A theoretical framework for soft-information-based synchronization in iterative (turbo) receivers," EURASIP Journal on Wireless Communications and Networking, vol. 2005, no. 2, pp. 117–129, 2005.
[10] S. Karuppasami and W. G. Cowley, "…," in Proceedings of the 67th IEEE Vehicular Technology Conference (VTC '08), Singapore, May 2008.
[11] A. Voicila, …, and P. …, "Low-complexity …," in Proceedings of IEEE International Conference on Communications (ICC '07), UK, June 2007, pp. 671–676.
[12] M. C. Davey and D. MacKay, "…," …, vol. 2, pp. 165–167, 1998.
[13] D. Sridhara and T. …, "…," …, no. 9, ….
[14] …, "Decoding algorithms for nonbinary LDPC codes over GF(q)," …, 2007.
[15] S. …, G. Kramer, and A. …, "…," IEEE Transactions on Communications, vol. 52, no. 4, 2004.