arXiv:cs.IT/0607064 v1 13 Jul 2006
How to Find Good Finite-Length Codes:
From Art Towards Science
Abdelaziz Amraoui, Andrea Montanari, and Ruediger Urbanke
Abstract—We explain how to optimize finite-length LDPC
codes for transmission over the binary erasure channel. Our
approach relies on an analytic approximation of the erasure
probability. This is in turn based on a finite-length scaling result
to model large scale erasures and a union bound involving
minimal stopping sets to take into account small error events. We
show that the performance of optimized ensembles, as observed
in simulations, is well described by our approximation. Although
we only address the case of transmission over the binary erasure
channel, our method should be applicable to a more general
setting.
I. INTRODUCTION
In this paper, we consider transmission using random el-
ements from the standard ensemble of low-density parity-
check (LDPC) codes defined by the degree distribution pair
(λ, ρ). For an introduction to LDPC codes and the standard
notation see [1]. In [2], one of the authors (AM) suggested that
the probability of error of iterative coding systems follows a
scaling law. In [3]–[5], it was shown that this is indeed true for
LDPC codes, assuming that transmission takes place over the
BEC. Strictly speaking, scaling laws describe the asymptotic
behavior of the error probability close to the threshold for
increasing blocklengths. However, as observed empirically
in the papers mentioned above, scaling laws provide good
approximations to the error probability also away from the
threshold and already for modest blocklengths. This is the
starting point for our finite-length optimization.
In [3], [5] the form of the scaling law for transmission over
the BEC was derived and it was shown how to compute the
scaling parameters by solving a system of ordinary differential
equations. This system was called covariance evolution in
analogy to density evolution. Density evolution concerns the
evolution of the average number of erasures still contained in
the graph during the decoding process, whereas covariance
evolution concerns the evolution of its variance. Whereas
This invited paper is an enhanced version of the work presented at the
4th International Symposium on Turbo Codes and Related Topics, Munich,
Germany, 2006.
The work presented in this paper was supported (in part) by the National
Competence Center in Research on Mobile Information and Communication
Systems (NCCR-MICS), a center supported by the Swiss National Science
Foundation under grant number 5005-67322.
AM has been partially supported by the EU under the integrated project
EVERGROW.

A. Amraoui is with EPFL, School of Computer and Communication Sci-
ences, Lausanne CH-1015, Switzerland, abdelaziz.amraoui@epfl.ch

A. Montanari is with Ecole Normale Supérieure, Laboratoire de Physique
Théorique, 75231 Paris Cedex 05, France, montanar@lpt.ens.fr

R. Urbanke is with EPFL, School of Computer and Communication Sci-
ences, Lausanne CH-1015, Switzerland, ruediger.urbanke@epfl.ch
Luby et al. [6] found an explicit solution to the density evolu-
tion equations, to date no such solution is known for the system
of covariance equations. Covariance evolution must therefore
be integrated numerically. Unfortunately, the dimension of this
system of ODEs ranges from hundreds to thousands for typical
examples. As a consequence, numerical integration can be
quite time consuming. This is a serious problem if we want
to use scaling laws in the context of optimization, where the
computation of scaling parameters must be repeated for many
different ensembles during the optimization process.
In this paper, we make two main contributions. First, we
derive explicit analytic expressions for the scaling parameters
as a function of the degree distribution pair and quantities
which appear in density evolution. Second, we provide an
accurate approximation to the erasure probability stemming
from small stopping sets and resulting in the erasure floor.
The paper is organized as follows. Section II describes
our approximation for the error probability, the scaling law
being discussed in Section II-B and the error floor in Sec-
tion II-C. We combine these results and give in Section II-D
an approximation to the erasure probability curve, denoted
by P(n, λ, ρ, ǫ), that can be computed efficiently for any
blocklength, degree distribution pair, and channel parameter.
The basic ideas behind the explicit determination of the
scaling parameters (together with the resulting expressions)
are collected in Section III. Finally, the most technical (and
tricky) part of this computation is deferred to Section IV.
As a motivation for some of the rather technical points to
come, we start in Section I-A by showing how P(n, λ, ρ, ǫ)
can be used to perform an efficient finite-length optimization.
A. Optimization
The optimization procedure takes as input a blocklength
n, the BEC erasure probability ǫ, and a target probability of
erasure, call it P_target. Either the bit or the block erasure probability can be
considered. We want to find a degree distribution pair (λ, ρ) of
maximum rate so that P(n, λ, ρ, ǫ) ≤ P_target, where P(n, λ, ρ, ǫ)
is the approximation discussed in the introduction.
Let us describe an efficient procedure to accomplish this
optimization locally (however, many equivalent approaches are
possible). Although providing a global optimization scheme
goes beyond the scope of this paper, the local procedure was
found empirically to converge often to the global optimum.
It is well known [1] that the design rate r(λ, ρ) associated
to a degree distribution pair (λ, ρ) is equal to
r(λ, ρ) = 1 − (Σ_i ρ_i/i) / (Σ_j λ_j/j).   (1)
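As a quick illustration, (1) is immediate to evaluate from the degree distribution coefficients. The Python sketch below is not from the paper; it simply assumes the edge-perspective coefficients are stored in dictionaries keyed by degree.

def design_rate(lam, rho):
    """Design rate r(lambda, rho) = 1 - (sum_i rho_i/i) / (sum_j lambda_j/j).

    lam, rho: dicts mapping a degree i to the edge-perspective
    coefficients lambda_i and rho_i (the coefficients of x^(i-1))."""
    num = sum(r / i for i, r in rho.items())
    den = sum(l / j for j, l in lam.items())
    return 1.0 - num / den

# Example: the (3,6)-regular ensemble, lambda(x) = x^2, rho(x) = x^5.
print(design_rate({3: 1.0}, {6: 1.0}))  # prints 0.5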
For “most” ensembles the actual rate of a randomly chosen
element of the ensemble LDPC(n, λ, ρ) is close to this design
rate [7]. In any case, r(λ, ρ) is always a lower bound. Assume
we change the degree distribution pair slightly by ∆λ(x) = Σ_i ∆λ_i x^{i−1}
and ∆ρ(x) = Σ_i ∆ρ_i x^{i−1}, where ∆λ(1) = 0 = ∆ρ(1), and assume that
the change is sufficiently small so that λ + ∆λ as well as ρ + ∆ρ are
still valid degree distributions (non-negative coefficients). A quick
calculation then shows
that the design rate changes by

r(λ + ∆λ, ρ + ∆ρ) − r(λ, ρ) ≈ (1 − r) (Σ_i ∆λ_i/i) / (Σ_j λ_j/j) − (Σ_i ∆ρ_i/i) / (Σ_j λ_j/j).   (2)
In the same way, the erasure probability changes (according
to the approximation) by
P(n, λ + ∆λ, ρ + ∆ρ, ǫ) − P(n, λ, ρ, ǫ) ≈ Σ_i ∆λ_i (∂P/∂λ_i) + Σ_i ∆ρ_i (∂P/∂ρ_i).   (3)
Equations (2) and (3) give rise to a simple linear program to
optimize locally the degree distribution: Start with some initial
degree distribution pair (λ, ρ). If P(n, λ, ρ, ǫ) ≤ P_target, then
increase the rate by a repeated application of the following
linear program.
LP 1: [Linear program to increase the rate]
max{ (1 − r) Σ_i ∆λ_i/i − Σ_i ∆ρ_i/i |
Σ_i ∆λ_i = 0; −min{δ, λ_i} ≤ ∆λ_i ≤ δ;
Σ_i ∆ρ_i = 0; −min{δ, ρ_i} ≤ ∆ρ_i ≤ δ;
Σ_i (∂P/∂λ_i) ∆λ_i + Σ_i (∂P/∂ρ_i) ∆ρ_i ≤ P_target − P(n, λ, ρ, ǫ) }.
Here, δ is a sufficiently small non-negative number, chosen to ensure
that the degree distribution pair changes only slightly at each
step, so that the changes of the rate and of the erasure probability
are accurately described by the linear approximation.
The value of δ is best adapted dynamically to ensure convergence:
one can start with a large value and decrease it as the procedure
approaches its final answer. The objective function in LP 1
is equal to the total derivative of the rate as a function of the
change of the degree distribution. Several rounds of this linear
program will gradually improve the rate of the code ensemble,
while keeping the erasure probability below the target (last
inequality).
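For concreteness, a single LP 1 step could be set up with scipy.optimize.linprog as in the Python sketch below. This is only an illustration under stated assumptions: the gradient entries ∂P/∂λ_i and ∂P/∂ρ_i, the current value P_now, and the rate r are assumed to be supplied by the approximation of Section II-D, and the function names are hypothetical.

import numpy as np
from scipy.optimize import linprog

def lp1_step(lam, rho, dP_dlam, dP_drho, P_now, P_target, r, delta=1e-3):
    """One LP 1 step: find (dlam, drho) maximizing the design-rate change
    subject to the constraints of LP 1.  lam/rho are dicts {degree: coeff};
    dP_dlam/dP_drho hold the partial derivatives of the erasure-probability
    approximation (assumed to be computed elsewhere)."""
    Il, Ir = sorted(lam), sorted(rho)
    nl, nr = len(Il), len(Ir)
    # linprog minimizes, so negate the objective (1-r)*sum dlam_i/i - sum drho_i/i.
    c = np.array([-(1.0 - r) / i for i in Il] + [1.0 / i for i in Ir])
    # Equality constraints: sum dlam = 0 and sum drho = 0.
    A_eq = np.zeros((2, nl + nr))
    A_eq[0, :nl] = 1.0
    A_eq[1, nl:] = 1.0
    b_eq = np.zeros(2)
    # Inequality: <grad P, (dlam, drho)> <= P_target - P_now.
    A_ub = np.array([[dP_dlam[i] for i in Il] + [dP_drho[i] for i in Ir]])
    b_ub = np.array([P_target - P_now])
    bounds = ([(-min(delta, lam[i]), delta) for i in Il]
              + [(-min(delta, rho[i]), delta) for i in Ir])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    dlam = dict(zip(Il, res.x[:nl]))
    drho = dict(zip(Ir, res.x[nl:]))
    return dlam, drho

LP 2 below has the same structure with the objective and the last constraint swapped, so the same setup can be reused.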
Sometimes it is necessary to initialize the optimization
procedure with degree distribution pairs that do not fulfill
the target erasure probability constraint. This is for instance
the case if the optimization is repeated for a large number
of “randomly” chosen initial conditions. In this way, we can
check whether the procedure always converges to the same
point (thus suggesting that a global optimum was found), or
otherwise, pick the best outcome of many trials. To this end we
define a linear program that decreases the erasure probability.
LP 2: [Linear program to decrease P(n, λ, ρ, ǫ)]
min{ Σ_i (∂P/∂λ_i) ∆λ_i + Σ_i (∂P/∂ρ_i) ∆ρ_i |
Σ_i ∆λ_i = 0; −min{δ, λ_i} ≤ ∆λ_i ≤ δ;
Σ_i ∆ρ_i = 0; −min{δ, ρ_i} ≤ ∆ρ_i ≤ δ }.
Example 1: [Sample Optimization] Let us show a sample
optimization. Assume we transmit over a BEC with channel
erasure probability ǫ = 0.5. We are interested in a blocklength
of n = 5000 bits, and the maximum variable and check degrees we allow
are l_max = 13 and r_max = 10, respectively.
We constrain the block erasure probability to be smaller than
P_target = 10^−4. We further count only erasures of size larger than or equal
to s_min = 6 bits. This corresponds to looking at an expurgated
ensemble, i.e., we are looking at the subset of codes of the
ensemble that do not contain stopping sets of sizes smaller than
6. Alternatively, we can interpret this constraint in the sense
that we use an outer code which “cleans up” the remaining
small erasures. Using the techniques discussed in Section
II-C, we can compute the probability that a randomly chosen
element of an ensemble does not contain stopping sets of size
smaller than 6. If this probability is not too small then we
have a good chance of finding such a code in the ensemble by
sampling a sufficient number of random elements. This can be
checked at the end of the optimization procedure.
We start with an arbitrary degree distribution pair:
λ(x) = 0.139976x + 0.149265x² + 0.174615x³ + 0.110137x⁴ + 0.0184844x⁵
+ 0.0775212x⁶ + 0.0166585x⁷ + 0.00832646x⁸ + 0.0760256x⁹
+ 0.0838369x¹⁰ + 0.0833654x¹¹ + 0.0617885x¹²,   (4)

ρ(x) = 0.0532687x + 0.0749403x² + 0.11504x³ + 0.0511266x⁴ + 0.170892x⁵
+ 0.17678x⁶ + 0.0444454x⁷ + 0.152618x⁸ + 0.160889x⁹.   (5)
This pair was generated randomly by choosing each coefficient
uniformly in [0, 1] and then normalizing so that λ(1) = ρ(1) =
1. The approximation of the block erasure probability curve of
this code (as given in Section II-D) is shown in Fig. 1. For this
[Figure: P_B versus ǫ; insets show rate/capacity = 40.58% and the contribution to the error floor from stopping sets of sizes 6 to 26.]
Fig. 1: Approximation of the block erasure probability for the
initial ensemble with degree distribution pair given in (4) and (5).
initial degree distribution pair we have r(λ, ρ) = 0.2029 and
P_B(n = 5000, λ, ρ, ǫ = 0.5) = 0.000552 > P_target. Therefore,
we start by reducing P_B(n = 5000, λ, ρ, ǫ = 0.5) (over the
choice of λ and ρ) using LP 2 until it becomes lower than
P_target. After a number of LP 2 rounds we obtain the degree
[Figure: P_B versus ǫ; insets show rate/capacity = 43.63% and the contribution to the error floor from stopping sets of sizes 6 to 26.]
Fig. 2: Approximation of the block erasure probability for the
ensemble obtained after the first part of the optimization (see
(6) and (7)). The erasure probability has been lowered below
the target.
distribution pair:
λ(x) = 0.111913x + 0.178291x² + 0.203641x³ + 0.139163x⁴ + 0.0475105x⁵
+ 0.106547x⁶ + 0.0240221x⁷ + 0.0469994x⁹ + 0.0548108x¹⁰
+ 0.0543393x¹¹ + 0.0327624x¹²,   (6)

ρ(x) = 0.0242426x + 0.101914x² + 0.142014x³ + 0.0781005x⁴ + 0.198892x⁵
+ 0.177806x⁶ + 0.0174716x⁷ + 0.125644x⁸ + 0.133916x⁹.   (7)
For this degree distribution pair we have P_B(n = 5000, λ, ρ, ǫ = 0.5) = 0.0000997 ≤ P_target and r(λ, ρ) =
0.218. We show the corresponding approximation in Fig. 2.
Now, we start the second phase of the optimization and optimize
the rate while ensuring that the block erasure probability
remains below the target, using LP 1. The resulting degree
distribution pair is:
λ(x) = 0.0739196x + 0.657891x² + 0.268189x¹²,   (8)
ρ(x) = 0.390753x⁴ + 0.361589x⁵ + 0.247658x⁹,   (9)

where r(λ, ρ) = 0.41065. The block erasure probability plot
for the result of the optimization is shown in Fig. 3.
[Figure: P_B versus ǫ for the optimized ensembles; insets show rate/capacity = 82.13% (respectively 86.79% for the more aggressively expurgated ensemble) and the contributions to the error floor.]
Fig. 3: Error probability curve for the result of the optimization
(see (8) and (9)). The solid curve is P_B(n = 5000, λ, ρ, ǫ = 0.5) while the small dots correspond to simulation
points. The dotted curve shows the result with a more aggressive
expurgation.
Each LP step takes on the order of seconds on a standard
PC. In total, the optimization for a given set of parameters
(n, ǫ, P_target, l_max, r_max, s_min) takes on the order of minutes.
Recall that the whole optimization procedure was based
on P_B(n, λ, ρ, ǫ), which is only an approximation of the true
block erasure probability. In principle, the actual performance
of the optimized ensemble could be worse (or better) than
predicted by P_B(n, λ, ρ, ǫ). To validate the procedure we
also computed the block erasure probability for the optimized
degree distribution by means of simulations and compared
the two. The simulation results are shown in Fig. 3 (dots
with 95% confidence intervals): analytical (approximate) and
numerical results are in almost perfect agreement!
How hard is it to find a code without stopping sets of
size smaller than 6 within the ensemble LDPC(5000, λ, ρ)
with (λ, ρ) given by Eqs. (8) and (9)? As discussed in more
detail in Section II-C, in the limit of large blocklengths the
number of small stopping sets has a joint Poisson distribution.
As a consequence, if Ã_i denotes the expected number of
minimal stopping sets of size i in a random element from
LDPC(5000, λ, ρ), the probability that it contains no stopping
set of size smaller than 6 is approximately exp{−Σ_{i=1}^{5} Ã_i}.
For the optimized ensemble we get exp{−(0.2073 + 0.04688 +
0.01676 + 0.007874 + 0.0043335)} ≈ 0.753, a quite large
probability. We repeated the optimization procedure with var-
ious different random initial conditions and always ended up
with essentially the same degree distribution. Therefore, we
can be quite confident that the result of our local optimization
is close to the global optimal degree distribution pair for the
given constraints (n, ǫ, P_target, l_max, r_max, s_min).
There are many ways of improving the result. For example, if we
allow higher degrees or apply a more aggressive expurgation,
we can obtain degree distribution pairs with higher rate. For instance,
for the choice l_max = 15 and s_min = 18 the resulting degree
distribution pair is

λ(x) = 0.205031x + 0.455716x² + 0.193248x¹³ + 0.146004x¹⁴,   (10)
ρ(x) = 0.608291x⁵ + 0.391709x⁶,   (11)

where r(λ, ρ) = 0.433942. The corresponding curve is depicted
in Fig. 3 as a dotted line. However, this time the
probability that a random element from LDPC(5000, λ, ρ)
has no stopping set of size smaller than 18 is approximately
6 · 10^−6. It will therefore be harder to find a code that fulfills
the expurgation requirement.
It is worth stressing that our results could be improved further
by applying the same approach to more powerful ensembles,
e.g., multi-edge type ensembles, or ensembles defined by
protographs. The steps to be accomplished are: (i) derive the
scaling laws and define scaling parameters for such ensembles;
(ii) find efficiently computable expressions for the scaling
parameters; (iii) optimize the ensemble with respect to its
defining parameters (e.g. the degree distribution) as above.
Each of these steps is a manageable task – albeit not a trivial
one.
Another generalization of our approach which is slated for
future work is the extension to general binary memoryless
symmetric channels. Empirical evidence suggests that scaling
laws should also hold in this case, see [2], [3]. How to prove
this fact or how to compute the required parameters, however,
is an open issue.
In the rest of this paper, we describe in detail the approxi-
mation P(n, λ, ρ, ǫ) for the BEC.
II. APPROXIMATION P_B(n, λ, ρ, ǫ) AND P_b(n, λ, ρ, ǫ)
In order to derive approximations for the erasure probability
we separate the contributions to this erasure probability into
two parts – the contributions due to large erasure events and
the ones due to small erasure events. The large erasure events
give rise to the so-called waterfall curve, whereas the small
erasure events are responsible for the erasure floor.
In Section II-B, we recall that the waterfall curve follows
a scaling law and we discuss how to compute the scaling
parameters. We denote this approximation of the waterfall
curve by P^W_{B/b}(n, λ, ρ, ǫ). We next show in Section II-C how
to approximate the erasure floor. We call this approximation
P^E_{B/b,s_min}(n, λ, ρ, ǫ). Here, s_min denotes the expurgation parameter,
i.e., we only count error events involving at least s_min
erasures. Finally, we collect in Section II-D our results and
give an approximation to the total erasure probability. We start
in Section II-A with a short review of density evolution.
A. Density Evolution
The initial analysis of the performance of LDPC codes
assuming that transmission takes place over the BEC is due to
Luby, Mitzenmacher, Shokrollahi, Spielman and Stemann, see
[6], and it is based on the so-called peeling algorithm. In this
algorithm we “peel-off” one variable node at a time (and all
its adjacent check nodes and edges) creating a sequence of
residual graphs. Decoding is successful if and only if the final
residual graph is the empty graph. A variable node can be
peeled off if it is connected to at least one check node which
has residual degree one. Initially, we start with the complete
Tanner graph representing the code and in the first step we
delete all variable nodes from the graph which have been
received (have not been erased), all connected check nodes,
and all connected edges.
From the description of the algorithm it should be clear that
the number of degree-one check nodes plays a crucial role.
The algorithm stops if and only if no degree-one check node
remains in the residual graph. Luby et al. were able to give
analytic expressions for the expected number of degree-one
check nodes as a function of the size of the residual graph
in the limit of large blocklengths. They further showed that
most instances of the graph and the channel follow closely this
ensemble average. More precisely, let r_1 denote the fraction of
degree-one check nodes in the decoder. (This means that the
actual number of degree-one check nodes is equal to n(1 − r)r_1,
where n is the blocklength and r is the design rate of
the code.) Then, as shown in [6], r_1 is given parametrically by

r_1(y) = ǫλ(y)[y − 1 + ρ(1 − ǫλ(y))],   (12)
where y is determined so that ǫL(y) is the fractional (with
respect to n) size of the residual graph. Here, L(x) = Σ_i L_i x^i =
(∫_0^x λ(u)du) / (∫_0^1 λ(u)du) is the node-perspective variable node
distribution, i.e., L_i is the fraction of variable nodes of degree i
in the Tanner graph. Analogously, we let R_i denote the fraction
of degree i check nodes, and set R(x) = Σ_i R_i x^i. With
an abuse of notation we shall sometimes denote the irregular
LDPC ensemble as LDPC(n, L, R).
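To make the parametric description concrete, the following Python sketch (an illustration, not code from the paper) evaluates r_1(y) of (12) and estimates the threshold by bisection on the condition r_1(y) > 0 for all y ∈ (0, 1]; the polynomial-as-dictionary convention and the grid resolution are assumptions of the sketch.

import numpy as np

def poly_eval(coeffs, x):
    """Evaluate a polynomial given as {power: coefficient}."""
    return sum(c * x**p for p, c in coeffs.items())

def r1(y, eps, lam, rho):
    """r_1(y) = eps*lambda(y)*[y - 1 + rho(1 - eps*lambda(y))], cf. (12)."""
    x = eps * poly_eval(lam, y)
    return eps * poly_eval(lam, y) * (y - 1.0 + poly_eval(rho, 1.0 - x))

def threshold(lam, rho, tol=1e-9):
    """Bisection for the largest eps with r_1(y) > 0 on (0, 1]."""
    ys = np.linspace(1e-6, 1.0, 2000)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        eps = 0.5 * (lo + hi)
        if np.all(r1(ys, eps, lam, rho) > 0):
            lo = eps   # decoding succeeds; try a larger erasure probability
        else:
            hi = eps
    return lo

# Example: lambda(x) = x^2, rho(x) = x^5 gives a threshold of about 0.4294.
print(threshold({2: 1.0}, {5: 1.0}))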
The threshold noise parameter ǫ* = ǫ*(λ, ρ) is the supremum
value of ǫ such that r_1(y) > 0 for all y ∈ (0, 1] (and
therefore iterative decoding is successful with high probability).
In Fig. 4, we show the function r_1(y) for the
ensemble with λ(x) = x² and ρ(x) = x⁵ at ǫ = ǫ*. As
[Figure: r_1(y) versus y; the region near y ≈ 0 is labeled "erasure floor erasures" and the region near the critical point y* is labeled "waterfall erasures".]
Fig. 4: r_1(y) for y ∈ [0, 1] at the threshold. The degree
distribution pair is λ(x) = x² and ρ(x) = x⁵ and the threshold
is ǫ* = 0.4294381.
the fraction of degree-one check nodes concentrates around
r_1(y), the decoder will fail with high probability only in two
possible ways. The first relates to y ≈ 0 and corresponds
to small erasure events. The second one corresponds to the
value y* such that r_1(y*) = 0. In this case the fraction of
variable nodes that cannot be decoded concentrates around
ν* = ǫ*L(y*).
We call a point y* where the function y − 1 + ρ(1 − ǫλ(y))
and its derivative both vanish a critical point. At threshold, i.e.
for ǫ = ǫ*, there is at least one critical point, but there may be
more than one. (Notice that the function r_1(y) always vanishes
together with its derivative at y = 0, cf. Fig. 4. However, this
does not imply that y = 0 is a critical point because of the
extra factor λ(y) in the definition of r_1(y).) Note that if an
ensemble has a single critical point and this point is strictly
positive, then the number of remaining erasures, conditioned
on decoding failure, concentrates around ν* = ǫ*L(y*).
In the rest of this paper, we will consider ensembles with a
single critical point and separate the two above contributions.
We will consider in Section II-B erasures of size at least nγν*
with γ ∈ (0, 1). In Section II-C we will instead focus on
erasures of size smaller than nγν*. We will finally combine
the two results in Section II-D.
B. Waterfall Region
It was proved in [3] that the erasure probability due to large
failures obeys a well defined scaling law. For our purpose it is
best to consider a refined scaling law which was conjectured
in the same paper. For convenience of the reader we restate it
here.
Conjecture 1: [Refined Scaling Law] Consider transmission
over a BEC of erasure probability ǫ using random
elements from the ensemble LDPC(n, λ, ρ) = LDPC(n, L, R).
Assume that the ensemble has a single critical point y* > 0
and let ν* = ǫ*L(y*), where ǫ* is the threshold erasure
probability. Let P^W_b(n, λ, ρ, ǫ) (respectively, P^W_B(n, λ, ρ, ǫ))
denote the expected bit (block) erasure probability due to
erasures of size at least nγν*, where γ ∈ (0, 1). Fix z :=
√n(ǫ* − βn^{−2/3} − ǫ). Then as n tends to infinity,

P^W_B(n, λ, ρ, ǫ) = Q(z/α) (1 + O(n^{−1/3})),
P^W_b(n, λ, ρ, ǫ) = ν* Q(z/α) (1 + O(n^{−1/3})),
where α = α(λ, ρ) and β = β(λ, ρ) are constants which
depend on the ensemble.
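The waterfall approximation of Conjecture 1 is straightforward to evaluate once the scaling parameters are known. Below is a minimal Python sketch (not from the paper); the values of ǫ*, α, β, and ν* are placeholders that would come from Lemma 1, Conjecture 2, and density evolution.

from math import erfc, sqrt

def Q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def waterfall_block(n, eps, eps_star, alpha, beta):
    """P^W_B(n, eps) ~ Q( sqrt(n)*(eps_star - beta*n**(-2/3) - eps) / alpha )."""
    z = sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps)
    return Q(z / alpha)

def waterfall_bit(n, eps, eps_star, alpha, beta, nu_star):
    """P^W_b(n, eps) ~ nu_star * Q(z / alpha)."""
    return nu_star * waterfall_block(n, eps, eps_star, alpha, beta)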
In [3], [5], a procedure called covariance evolution was defined
to compute the scaling parameter α through the solution of
a system of ordinary differential equations. The number of
equations in the system is equal to the square of (the number of
variable node degrees plus the largest check node degree minus
one). As an example, for an ensemble with 5 different variable
node degrees and r_max = 30, the number of coupled equations
in covariance evolution is (5 + 29)² = 1156. The computation
of the scaling parameter can therefore become a challenging
task. The main result in this paper is to show that it is possible
to compute the scaling parameter α without explicitly solving
covariance evolution. This is the crucial ingredient allowing
for efficient code optimization.
Lemma 1: [Expression for α] Consider transmission over a
BEC with erasure probability ǫ using random elements from
the ensemble LDPC(n, λ, ρ) = LDPC(n, L, R). Assume that
the ensemble has a single critical point y* > 0, and let
ǫ* denote the threshold erasure probability. Then the scaling
parameter α in Conjecture 1 is given by

α = [ (ρ(x̄*)² − ρ(x̄*²) + ρ'(x̄*)(1 − 2x*ρ(x̄*)) − x̄*²ρ'(x̄*²)) / (L'(1)λ(y*)²ρ'(x̄*)²)
    + (ǫ*²λ(y*)² − ǫ*²λ(y*²) − y*²ǫ*²λ'(y*²)) / (L'(1)λ(y*)²) ]^{1/2},

where x* = ǫ*λ(y*) and x̄* = 1 − x*.
The derivation of this expression is explained in Section III.
For completeness and the convenience of the reader, we
repeat here also an explicit characterization of the shift param-
eter β which appeared already (in a slightly different form) in
[3], [5].
Conjecture 2: [Scaling Parameter β] Consider transmission
over a BEC of erasure probability ǫ using random elements
from the ensemble LDPC(n, λ, ρ) = LDPC(n, L, R). Assume
that the ensemble has a single critical point y* > 0, and let
ǫ* denote the threshold erasure probability. Then the scaling
parameter β in Conjecture 1 is given by

β/Ω = [ ǫ*⁴ r₂*² ( λ'(y*)² r₂* − x*(λ''(y*) r₂* + λ'(y*) x*) )²
      / ( L'(1)² ρ'(x̄*)³ x*¹⁰ ( 2ǫ*λ'(y*)² r₃* − λ''(y*) r₂* x* ) ) ]^{1/3},   (13)
where x* = ǫ*λ(y*) and x̄* = 1 − x*, and for i ≥ 2

r_i* = Σ_{m≥j≥i} (−1)^{i+j} (j−1 choose i−1) (m−1 choose j−1) ρ_m (ǫ*λ(y*))^j.
Further, Ω is a universal (code independent) constant defined
in Ref. [3], [5].
We also recall that Ω is numerically quite close to 1. In the
rest of this paper, we shall therefore always approximate Ω
by 1.
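For completeness, the coefficients r_i* entering (13) are finite sums of binomials and are cheap to evaluate. A possible Python sketch (illustrative only; the check-degree coefficients ρ_m are assumed to be given as a dict):

from math import comb

def r_star(i, rho_coeffs, x_star):
    """r_i^* = sum_{m >= j >= i} (-1)^(i+j) C(j-1, i-1) C(m-1, j-1) rho_m * x_star^j,
    with x_star = eps_star * lambda(y_star)."""
    total = 0.0
    for m, rho_m in rho_coeffs.items():
        for j in range(i, m + 1):
            total += ((-1) ** (i + j) * comb(j - 1, i - 1)
                      * comb(m - 1, j - 1) * rho_m * x_star ** j)
    return total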
C. Error Floor
Lemma 2: [Error Floor] Consider transmission over a BEC
of erasure probability ǫ using random elements from an ensemble
LDPC(n, λ, ρ) = LDPC(n, L, R). Assume that the ensemble
has a single critical point y* > 0. Let ν* = ǫ*L(y*), where
ǫ* is the threshold erasure probability. Let P^E_{b,s_min}(n, λ, ρ, ǫ)
(respectively P^E_{B,s_min}(n, λ, ρ, ǫ)) denote the expected bit (block)
erasure probability due to stopping sets of size between s_min
and nγν*, where γ ∈ (0, 1). Then, for any ǫ < ǫ*,

P^E_{b,s_min}(n, λ, ρ, ǫ) = Σ_{s≥s_min} s Ã_s ǫ^s (1 + o(1)),   (14)
P^E_{B,s_min}(n, λ, ρ, ǫ) = 1 − e^{−Σ_{s≥s_min} Ã_s ǫ^s} (1 + o(1)),   (15)

where Ã_s = coef{log(A(x)), x^s} for s ≥ 1, with A(x) = Σ_{s≥0} A_s x^s and

A_s = Σ_e coef{ Π_i (1 + xy^i)^{nL_i}, x^s y^e } coef{ Π_i ((1 + x)^i − ix)^{n(1−r)R_i}, x^e } / (nL'(1) choose e).   (16)
Discussion: In the lemma we only claim a multiplicative error
term of the form o(1) since this is easy to prove. This weak
statement would remain valid if we replaced the expression for
A_s given in (16) with the explicit and much easier to compute
asymptotic expression derived in [1]. In practice however the
approximation is much better than the stated o(1) error term if
we use the finite-length averages given by (16). The hurdle in
proving stronger error terms is due to the fact that for a given
length it is not clear how to relate the number of stopping
sets to the number of minimal stopping sets. However, this
relationship becomes easy in the limit of large blocklengths.
Proof: The key in deriving this erasure floor expression
is in focusing on the number of minimal stopping sets. These
are stopping sets that are not the union of smaller stopping
sets. The asymptotic distribution of the number of minimal
stopping sets contained in an LDPC graph was already studied
in [1]. We recall that the distribution of the number of minimal
stopping sets tends to a Poisson distribution with independent
components as the length tends to infinity. Because of this
independence one can relate the number of minimal stopping
sets to the number of stopping sets – any combination of
minimal stopping sets gives rise to a stopping set. In the limit
of infinite blocklengths the minimal stopping sets are non-
overlapping with probability one so that the weight of the
resulting stopping set is just the sum of the weights of the
individual stopping sets. For example, the number of stopping
sets of size two is equal to the number of minimal stopping
sets of size two plus the number of stopping sets we get by
taking all pairs of (minimal) stopping sets of size one.
Therefore, define Ã(x) = Σ_{s≥1} Ã_s x^s, with Ã_s the expected
number of minimal stopping sets of size s in the graph.
Define further A(x) = Σ_{s≥0} A_s x^s, with A_s the expected
number of stopping sets of size s in the graph (not necessarily
minimal). We then have

A(x) = e^{Ã(x)} = 1 + Ã(x) + Ã(x)²/2! + Ã(x)³/3! + · · · ,

so that conversely Ã(x) = log(A(x)).
It remains to determine the number of stopping sets. As
remarked right after the statement of the lemma, any expres-
sion which converges in the limit of large blocklength to the
asymptotic value would satisfy the statement of the lemma
but we get the best empirical agreement for short lengths if
we use the exact finite-length averages. These averages were
already computed in [1] and are given in (16).
Consider now, e.g., the bit erasure probability. We first
compute A(x) using (16) and then Ã(x) by means of Ã(x) =
log(A(x)). Consider one minimal stopping set of size s. The
probability that its s associated bits are all erased is equal to
ǫ^s, and if this is the case this stopping set causes s erasures.
Since there are in expectation Ã_s minimal stopping sets of
size s, and minimal stopping sets are non-overlapping with
increasing probability as the blocklength increases, a simple
union bound is asymptotically tight. The expression for the
block erasure probability is derived in a similar way. Now we
are interested in the probability that a particular graph and
noise realization results in no (small) stopping set. Using the
fact that the distribution of minimal stopping sets follows a
Poisson distribution we get equation (15).
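The relation Ã(x) = log A(x) is easy to exploit numerically: truncate both series at the largest stopping-set size of interest and take the logarithm of the truncated power series. The Python sketch below assumes the average stopping-set enumerator coefficients A_s have already been computed (e.g., via (16)) and then evaluates the error-floor estimates (14) and (15); it is an illustration, not the authors' implementation.

import numpy as np

def minimal_from_total(A, smax):
    """Given A[s] = expected number of stopping sets of size s (with A[0] = 1),
    return the coefficients of A_tilde(x) = log(A(x)) up to degree smax."""
    A = np.asarray(A[: smax + 1], dtype=float)
    At = np.zeros(smax + 1)
    # Coefficient recursion for the log of a power series:
    # s*A[s] = sum_{k=1}^{s} k*At[k]*A[s-k]  (with A[0] = 1).
    for s in range(1, smax + 1):
        conv = sum(k * At[k] * A[s - k] for k in range(1, s))
        At[s] = A[s] - conv / s
    return At

def error_floor(A, eps, smin, smax):
    """Union-bound error floor, cf. (14) and (15), counting sizes >= smin."""
    At = minimal_from_total(A, smax)
    P_bit = sum(s * At[s] * eps ** s for s in range(smin, smax + 1))
    P_block = 1.0 - np.exp(-sum(At[s] * eps ** s for s in range(smin, smax + 1)))
    return P_bit, P_block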
D. Complete Approximation
In Section II-B, we have studied the erasure probability
stemming from failures of size bigger than nγν*, where
γ ∈ (0, 1) and ν* = ǫ*L(y*), i.e., ν* is the asymptotic
fractional number of erasures remaining after the decoding at
the threshold. In Section II-C, we have studied the probability
of erasures resulting from stopping sets of size between s_min
and nγν*. Combining the results of the two previous sections,
we get

P_B(n, λ, ρ, ǫ) = P^W_B(n, λ, ρ, ǫ) + P^E_{B,s_min}(n, λ, ρ, ǫ)
              = Q( √n(ǫ* − βn^{−2/3} − ǫ)/α ) + 1 − e^{−Σ_{s≥s_min} Ã_s ǫ^s},   (17)

P_b(n, λ, ρ, ǫ) = P^W_b(n, λ, ρ, ǫ) + P^E_{b,s_min}(n, λ, ρ, ǫ)
              = ν* Q( √n(ǫ* − βn^{−2/3} − ǫ)/α ) + Σ_{s≥s_min} s Ã_s ǫ^s.   (18)
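Putting the two pieces together, the complete approximation (17) is a single closed-form expression. A small, self-contained Python sketch (again only an illustration; the scaling parameters and the Ã_s coefficients are assumed to be available from the previous steps) could look as follows.

from math import erfc, exp, sqrt

def P_block(n, eps, eps_star, alpha, beta, A_tilde, smin):
    """Complete block-erasure approximation, cf. (17).

    A_tilde: list with A_tilde[s] = expected number of minimal stopping
    sets of size s; all parameters are assumed to be precomputed."""
    # Waterfall contribution, Conjecture 1.
    z = sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps)
    waterfall = 0.5 * erfc(z / (alpha * sqrt(2.0)))
    # Error-floor contribution, Lemma 2.
    floor = 1.0 - exp(-sum(A_tilde[s] * eps ** s
                           for s in range(smin, len(A_tilde))))
    return waterfall + floor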
Here we assume that there is a single critical point. If
the degree distribution has several critical points (at different
values of the channel parameter ǫ*_1, ǫ*_2, . . . ), then we simply
take a sum of terms P^W_B(n, λ, ρ, ǫ), one for each critical point.
Let us finally notice that summing the probabilities of
different error types provides in principle only an upper bound
on the overall error probability. However, for each given
channel parameter ǫ, only one of the terms in Eqs. (17), (18)
dominates. As a consequence the bound is actually tight.
III. ANALYTIC DETERMINATION OF α
Let us now show how the scaling parameter α can be
determined analytically. We accomplish this in two steps. We
first compute the variance of the number of erasure messages.
Then we show in a second step how to relate this variance to
the scaling parameter α.
A. Variance of the Messages
Consider the ensemble LDPC(n, λ, ρ) and assume that
transmission takes place over a BEC of parameter ǫ. Perform
ℓ iterations of BP decoding. Set μ_i^(ℓ) equal to 1 if the message
sent out along edge i from variable to check node is an erasure
and 0 otherwise. Consider the variance of these messages in
the limit of large blocklengths. More precisely, consider

V^(ℓ) ≡ lim_{n→∞} ( E[(Σ_i μ_i^(ℓ))²] − E[Σ_i μ_i^(ℓ)]² ) / (nL'(1)).
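As a sanity check on this definition, the quantity can also be estimated by simulation for moderate n: sample graphs and channel realizations, run ℓ BP iterations, and record the number of erased variable-to-check messages. The Python sketch below does this for a regular ensemble generated by a simple configuration model; it is a hedged illustration (the graph sampler and all parameter values are assumptions, not the paper's setup).

import numpy as np

rng = np.random.default_rng(0)

def sample_regular_graph(n, dv, dc):
    """Configuration-model (dv, dc)-regular bipartite graph; returns, for
    each edge, its variable node index and its check node index."""
    var_of_edge = np.repeat(np.arange(n), dv)
    chk_of_edge = np.repeat(np.arange(n * dv // dc), dc)
    rng.shuffle(chk_of_edge)
    return var_of_edge, chk_of_edge

def erased_messages(n, dv, dc, eps, ell):
    """Number of erased variable-to-check messages after ell BP iterations."""
    var_of_edge, chk_of_edge = sample_regular_graph(n, dv, dc)
    m = n * dv // dc                               # number of check nodes
    erased_rx = rng.random(n) < eps                # channel erasures
    c2v = np.ones(n * dv, dtype=bool)              # all c->v erased at time 0
    v2c = np.zeros(n * dv, dtype=bool)
    for _ in range(ell):
        # v->c erased iff received value erased and all other incoming c->v erased.
        known_in = np.bincount(var_of_edge, weights=(~c2v).astype(float), minlength=n)
        v2c = erased_rx[var_of_edge] & ((known_in[var_of_edge] - (~c2v)) == 0)
        # c->v erased iff at least one other incoming v->c message is erased.
        erased_in = np.bincount(chk_of_edge, weights=v2c.astype(float), minlength=m)
        c2v = (erased_in[chk_of_edge] - v2c) > 0
    return v2c.sum()

def variance_estimate(n=2048, dv=3, dc=6, eps=0.42, ell=5, trials=200):
    counts = np.array([erased_messages(n, dv, dc, eps, ell) for _ in range(trials)])
    return counts.var() / (n * dv)                 # normalize by nL'(1) = #edges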
Lemma 3 in Section IV contains an analytic expression for this
quantity as a function of the degree distribution pair (λ, ρ),
the channel parameter ǫ, and the number of iterations ℓ. Let
us consider this variance as a function of the parameter ǫ and
the number of iterations ℓ. Figure 5 shows the result of this
evaluation for the case (L(x) = (2/5)x² + (3/5)x³; R(x) = (3/10)x² + (7/10)x³).
The threshold for this example is ǫ* ≈ 0.8495897455.
[Figure: the variance V^(ℓ) versus ǫ for increasing ℓ, together with the limiting curve.]
Fig. 5: The variance as a function of ǫ and ℓ = 0, . . . , 9 for
(L(x) = (2/5)x² + (3/5)x³; R(x) = (3/10)x² + (7/10)x³).
This value is indicated as a vertical line in the figure. As we
can see from this figure, the variance is a unimodal function
of the channel parameter. It is zero for the extremal values
of ǫ (either all messages are known or all are erased) and
it takes on its maximum value for a value of ǫ which
approaches the critical value ǫ* as ℓ increases. Further, for
increasing ℓ the maximum value of the variance increases. The
limit of these curves as ℓ tends to infinity, V = lim_{ℓ→∞} V^(ℓ), is
also shown (bold curve): the variance is zero below threshold;
above threshold it is positive and diverges as the threshold
is approached. In Section IV we state the exact form of the
limiting curve. We show that for ǫ approaching ǫ* from above,

V = γ / (1 − ǫλ'(y)ρ'(x̄))² + O((1 − ǫλ'(y)ρ'(x̄))^{−1}),   (19)
where
γ = ǫ*²λ'(y*)² [ρ(x̄*)² − ρ(x̄*²) + ρ'(x̄*)(1 − 2x*ρ(x̄*)) − x̄*²ρ'(x̄*²)]
  + ǫ*²ρ'(x̄*)² [λ(y*)² − λ(y*²) − y*²λ'(y*²)].

Here y* is the unique critical point, x* = ǫ*λ(y*), and x̄* =
1 − x*. Since (1 − ǫλ'(y)ρ'(x̄)) = Θ(√(ǫ − ǫ*)), Eq. (19) implies
a divergence at ǫ*.
B. Relation Between γ and α
Now that we know the asymptotic variance of the edge
messages, let us discuss how this quantity can be related to
the scaling parameter α. Think of a decoder operating above
the threshold of the code. Then, for large blocklengths, it will
get stuck with high probability before correcting all nodes. In
Fig. 6 we show R_1, the number of degree-one check nodes,
as a function of the number of erasure messages for a few
decoding runs. Let V* represent the normalized variance of
[Figure: R_1 versus the number of erased messages Σ_i μ_i, with the density-evolution curve r_1(x) and the fluctuations ∆r_1 and ∆x around the critical point x* indicated.]
Fig. 6: Number of degree-one check nodes as a function of the
number of erasure messages in the corresponding BP decoder
for LDPC(n = 8192, λ(x) = x², ρ(x) = x⁵). The thin lines
represent the decoding trajectories that stop when r_1 = 0 and
the thick line is the mean curve predicted by density evolution.
the number of erased messages in the decoder after an infinite
number of iterations
V* ≡ lim_{n→∞} lim_{ℓ→∞} ( E[(Σ_i μ_i^(ℓ))²] − E[Σ_i μ_i^(ℓ)]² ) / (nL'(1)).

In other words, V* is the variance of the point at which the
decoding trajectories hit the R_1 = 0 axis.
This quantity can be related to the variance of the number
of degree-one check nodes through the slope of the density
evolution curve. Normalize all the quantities by nL'(1), the
number of edges in the graph. Consider the curve r_1(ǫ, x)
given by density evolution, representing the fraction of
degree-one check nodes in the residual graph, around the
critical point for an erasure probability above the threshold
(see Fig. 6). The real decoding process stops when hitting
the r_1 = 0 axis. Think of a virtual process identical to the
decoding for r_1 > 0 but that continues below the r_1 = 0 axis
(for a proper definition, see [3]). A simple calculation shows
that if the point at which the curve hits the x-axis varies by
∆x while keeping the minimum at x*, it results in a variation
of the height of the curve by

∆r_1 = ( ∂²r_1(ǫ, x)/∂x² )|_{x*} (x − x*) ∆x + o(x − x*).
Taking the expectation of the square on both sides and letting
ǫ tend to ǫ*, we obtain the normalized variance of R_1 at
threshold,

δ_{r_1,r_1}|* = lim_{ǫ→ǫ*} [ ( ∂²r_1(ǫ, x)/∂x² )² (x − x*)² V + o((x − x*)²) ]
            = ( x* / (ǫ*λ'(y*)) )² lim_{ǫ→ǫ*} (1 − ǫλ'(y)ρ'(x̄))² V*.
The transition between the first and the second line comes from the
relationship between ǫ and x, with r_1(ǫ, x) = 0 when ǫ
tends to ǫ*.
The quantity V* differs from V computed in the previous
paragraphs because of the different order of the limits n → ∞
and ℓ → ∞. However, it can be proved that the order does not
matter and V = V*. Using the result (19), we finally get
δ_{r_1,r_1}|* = ( x* / (ǫ*λ'(y*)) )² γ.
We conclude that the scaling parameter α can be obtained as

α = [ δ_{r_1,r_1}|* / ( L'(1) (∂r_1/∂ǫ)² ) ]^{1/2} = [ γ / ( L'(1) x*² λ'(y*)² ρ'(x̄*)² ) ]^{1/2}.

The last expression is equal to the one in Lemma 1.
IV. MESSAGE VARIANCE
Consider the ensemble LDPC(n, λ, ρ) and transmission over
the BEC of erasure probability ǫ. As pointed out in the
previous section, the scaling parameter α can be related to
the (normalized) variance, with respect to the choice of the
graph and the channel realization, of the number of erased
edge messages sent from the variable nodes. Although what
really matters is the limit of this quantity as the blocklength
and the number of iterations tend to infinity (in this order)
we start by providing an exact expression for a finite number of
iterations ℓ (at infinite blocklength). At the end of this section,
we shall take the limit ℓ →∞.
To be definite, we initialize the iterative decoder by setting
all check-to-variable messages to be erased at time 0. We let
x_i (respectively y_i) be the fraction of erased messages sent
from variable to check nodes (from check to variable nodes)
at iteration i, in the infinite blocklength limit. These values
are determined by the density evolution [1] recursions y_{i+1} =
1 − ρ(x̄_i), with x_i = ǫλ(y_i) (where we used the notation
x̄_i = 1 − x_i). The above initialization implies y_0 = 1. For
future convenience we also set x_i = y_i = 1 for i < 0.
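For reference, these density evolution recursions are one line of code each; the following Python sketch (illustrative, using the same polynomial-dictionary convention as the earlier sketches) tracks x_i and y_i for ℓ iterations.

def density_evolution(lam, rho, eps, ell):
    """Track (x_i, y_i): y_0 = 1, x_i = eps*lambda(y_i), y_{i+1} = 1 - rho(1 - x_i)."""
    evaluate = lambda poly, u: sum(c * u**p for p, c in poly.items())
    y = 1.0
    trajectory = []
    for i in range(ell + 1):
        x = eps * evaluate(lam, y)
        trajectory.append((x, y))
        y = 1.0 - evaluate(rho, 1.0 - x)
    return trajectory

# Example: (3,6)-regular ensemble below threshold; x_i decays towards 0.
print(density_evolution({2: 1.0}, {5: 1.0}, eps=0.40, ell=10)[-1])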
Using these variables, we have the following characterization
of V^(ℓ), the (normalized) variance after ℓ iterations.
Lemma 3: Let G be chosen uniformly at random from
LDPC(n, λ, ρ) and consider transmission over the BEC of
erasure probability ǫ. Label the nL'(1) edges of G in some
fixed order by the elements of {1, . . . , nL'(1)}. Assume that
the receiver performs ℓ rounds of Belief Propagation decoding
and let μ_i^(ℓ) be equal to one if the message sent at the end of
the ℓ-th iteration along edge i (from a variable node to a check
node) is an erasure, and zero otherwise. Then
V^(ℓ) ≡ lim_{n→∞} ( E[(Σ_i μ_i^(ℓ))²] − E[Σ_i μ_i^(ℓ)]² ) / (nL'(1))   (20)

= x_ℓ + x_ℓ (1, 0) [ Σ_{j=0}^{ℓ} V(ℓ) · · · C(ℓ − j) ] (1, 0)^T      (edges in T_1)

+ x_ℓ² ρ'(1) Σ_{i=0}^{ℓ−1} λ'(1)^i ρ'(1)^i      (edges in T_2)

+ x_ℓ (1, 0) [ Σ_{j=1}^{2ℓ} V(ℓ) · · · C(ℓ − j) ] (1, 0)^T      (edges in T_3)

+ (1, 0) [ Σ_{j=0}^{ℓ} ( y_{ℓ−j} U*(j, j) + (1 − y_{ℓ−j}) U⁰(j, j) )
+ Σ_{j=ℓ+1}^{2ℓ} V(ℓ) · · · C(2ℓ − j) ( y_{ℓ−j} U*(j, ℓ) + (1 − y_{ℓ−j}) U⁰(j, ℓ) ) ]      (edges in T_4)

− x_ℓ W(ℓ, 1) + Σ_{i=1}^{ℓ} F_i ( x_i W(ℓ, 1) − ǫ W(ℓ, y_i) )
− Σ_{i=1}^{ℓ} F_i ǫλ'(y_i) ( D(ℓ, 1) ρ(x̄_{i−1}) − D(ℓ, x̄_{i−1}) )
+ Σ_{i=1}^{ℓ−1} F_i ( x_ℓ + (1, 0) V(ℓ) · · · C(0) V(0) (1, 0)^T ) · (1, 0) V(i) · · · C(i − ℓ) (1, 0)^T
− Σ_{i=1}^{ℓ−1} F_i x_i ( x_ℓ + (1, 0) V(ℓ) · · · C(0) V(0) (1, 0)^T ) · (λ'(1) ρ'(1))^ℓ,
where we introduced the shorthand

V(i) · · · C(i − j) ≡ Π_{k=0}^{j−1} V(i − k) C(i − k − 1).   (21)

We define the matrices

V(i) = [ ǫλ'(y_i), 0 ; λ'(1) − ǫλ'(y_i), λ'(1) ],   (22)
C(i) = [ ρ'(1), ρ'(1) − ρ'(x̄_i) ; 0, ρ'(x̄_i) ],   for i ≥ 0,   (23)
V(i) = [ λ'(1), 0 ; 0, λ'(1) ],   (24)
C(i) = [ ρ'(1), 0 ; 0, ρ'(1) ],   for i < 0.   (25)
Further, U*(j, j), U*(j, ℓ), U⁰(j, j) and U⁰(j, ℓ) are computed
through the following recursion. For j ≤ ℓ, set

U*(j, 0) = ( y_{ℓ−j} ǫλ'(y_ℓ), (1 − y_{ℓ−j}) ǫλ'(y_ℓ) )^T,
U⁰(j, 0) = (0, 0)^T,

whereas for j > ℓ, initialize by

U*(j, j − ℓ) = (1, 0) V(ℓ) · · · C(2ℓ − j) (1, 0)^T ( ǫλ'(y_{2ℓ−j}), 0 )^T
             + (1, 0) V(ℓ) · · · C(2ℓ − j) (0, 1)^T ( ǫ(λ'(1) − λ'(y_{2ℓ−j})), λ'(1)(1 − ǫ) )^T,
U⁰(j, j − ℓ) = (1, 0) V(ℓ) · · · C(2ℓ − j) (0, 1)^T ( ǫλ'(1), (1 − ǫ)λ'(1) )^T.
The recursion is

U*(j, k) = M_1(j, k) C(ℓ − j + k − 1) U*(j, k − 1)
         + M_2(j, k) [ N_1(j, k − 1) U*(j, k − 1) + N_2(j, k − 1) U⁰(j, k − 1) ],   (26)
U⁰(j, k) = V(ℓ − j + k) [ N_1(j, k − 1) U*(j, k − 1) + N_2(j, k − 1) U⁰(j, k − 1) ],   (27)

with

M_1(j, k) = [ ǫλ'(y_{max{ℓ−k, ℓ−j+k}}), 0 ; 1_{j<2k} ǫ(λ'(y_{ℓ−k}) − λ'(y_{ℓ−j+k})), ǫλ'(y_{ℓ−k}) ],
M_2(j, k) = [ 1_{j>2k} ǫ(λ'(y_{ℓ−j+k}) − λ'(y_{ℓ−k})), 0 ; λ'(1) − ǫλ'(y_{min{ℓ−k, ℓ−j+k}}), λ'(1) − ǫλ'(y_{ℓ−k}) ],
N_1(j, k) = [ ρ'(1) − ρ'(x̄_{ℓ−k−1}), ρ'(1) − ρ'(x̄_{max{ℓ−k−1, ℓ−j+k}}) ; 0, 1_{j≤2k} (ρ'(x̄_{ℓ−j+k}) − ρ'(x̄_{ℓ−k−1})) ],
N_2(j, k) = [ ρ'(x̄_{ℓ−k−1}), 1_{j>2k} (ρ'(x̄_{ℓ−k−1}) − ρ'(x̄_{ℓ−j+k})) ; 0, ρ'(x̄_{min{ℓ−k−1, ℓ−j+k}}) ].
The coefficients F_i are given by

F_i = Π_{k=i+1}^{ℓ} ǫλ'(y_k) ρ'(x̄_{k−1}),   (28)
and finally

W(ℓ, α) = Σ_{k=0}^{2ℓ} (1, 0) V(ℓ) · · · C(ℓ − k) A(ℓ, k, α)
        + x_ℓ (αλ'(α) + λ(α)) ρ'(1) Σ_{i=0}^{ℓ−1} ρ'(1)^i λ'(1)^i,

with A(ℓ, k, α) equal to

( ǫαy_{ℓ−k} λ'(αy_{ℓ−k}) + ǫλ(αy_{ℓ−k}), αλ'(α) + λ(α) − ǫαy_{ℓ−k} λ'(αy_{ℓ−k}) − ǫλ(αy_{ℓ−k}) )^T   for k ≤ ℓ,
( αλ'(α) + λ(α), 0 )^T   for k > ℓ,
and

D(ℓ, α) = Σ_{k=1}^{2ℓ} (1, 0) V(ℓ) · · · C(ℓ − k + 1) V(ℓ − k + 1)
        · ( αρ'(α) + ρ(α) − αx̄_{ℓ−k} ρ'(αx̄_{ℓ−k}) − ρ(αx̄_{ℓ−k}), αx̄_{ℓ−k} ρ'(αx̄_{ℓ−k}) + ρ(αx̄_{ℓ−k}) )^T
        + x_ℓ (αρ'(α) + ρ(α)) Σ_{i=0}^{ℓ−1} ρ'(1)^i λ'(1)^i.
Proof: Expand V^(ℓ) in (20) as

V^(ℓ) = lim_{n→∞} ( E[(Σ_i μ_i^(ℓ))²] − E[Σ_i μ_i^(ℓ)]² ) / (nL'(1))
     = lim_{n→∞} Σ_j ( E[μ_j^(ℓ) Σ_i μ_i^(ℓ)] − E[μ_j^(ℓ)] E[Σ_i μ_i^(ℓ)] ) / (nL'(1))
     = lim_{n→∞} ( E[μ_1^(ℓ) Σ_i μ_i^(ℓ)] − E[μ_1^(ℓ)] E[Σ_i μ_i^(ℓ)] )
     = lim_{n→∞} ( E[μ_1^(ℓ) Σ_i μ_i^(ℓ)] − nL'(1) x_ℓ² ).   (29)
In the last step, we have used the fact that x_ℓ = E[μ_i^(ℓ)] for
any i ∈ {1, . . . , nL'(1)}. Let us look more carefully at the first
term of (29). After a finite number of iterations, each message
μ_i^(ℓ) depends upon the received symbols of a subset of the
variable nodes. Since ℓ is kept finite, this subset remains finite
in the large blocklength limit, and by standard arguments is a
tree with high probability. As usual, we refer to the subgraph
containing all such variable nodes, as well as the check nodes
connecting them, as the computation tree for μ_i^(ℓ).
It is useful to split the sum in the first term of Eq. (29) into
two contributions: the first contribution stems from edges i so
that the computation trees of μ_1^(ℓ) and μ_i^(ℓ) intersect, and the
second one stems from the remaining edges. More precisely,
we write

lim_{n→∞} ( E[μ_1^(ℓ) Σ_i μ_i^(ℓ)] − nL'(1) x_ℓ² )
= lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T} μ_i^(ℓ)]
+ lim_{n→∞} ( E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ)] − nL'(1) x_ℓ² ).   (30)
We define T to be that subset of the variable-to-check edge
indices such that if i ∈ T then the computation trees of μ_i^(ℓ) and
μ_1^(ℓ) intersect. This means that T includes all the edges whose
messages depend on some of the received values that are used
in the computation of μ_1^(ℓ). For convenience, we complete T
by including all edges that are connected to the same variable
nodes as edges that are already in T. T^c is the complement in
{1, . . . , nL'(1)} of the set of indices T.
The set of indices T depends on the number of iterations
performed and on the graph realization. For any fixed ℓ, T
is a tree with high probability in the large blocklength limit,
and admits a simple characterization. It contains two sets of
edges: the ones ‘above’ and the ones ‘below’ edge 1 (we call
this the ‘root’ edge and the variable node it is connected to,
the ‘root’ variable node). Edges above the root are the ones
departing from a variable node that can be reached by a non-reversing
path starting with the root edge and involving at most
[Figure: a neighborhood of the root edge; edges are labeled (a)–(g) and two variable nodes are labeled (A) and (B).]
Fig. 7: Graph representing all edges contained in T, for the
case of ℓ = 2. The small letters represent messages sent
along the edges from a variable node to a check node and
the capital letters represent variable nodes. The message μ_1^(ℓ)
is represented by (a).
ℓ variable nodes (not including the root one). Edges below the
root are the ones departing from a variable node that can be
reached by a non-reversing path starting with the opposite
of the root edge and involving at most 2ℓ variable nodes (not
including the root one). Edges departing from the root variable
node are considered below the root (apart from the root edge
itself).
We have depicted in Fig. 7 an example for the case of an
irregular graph with ℓ = 2. In the middle of the figure the edge
(a) ≡ 1 carries the message μ_1^(ℓ). We will call μ_1^(ℓ) the root
message. We expand the graph starting from this root node. We
consider ℓ variable node levels above the root. As an example,
notice that the channel output on node (A) affects μ_1^(ℓ) as well
as the message sent on (b) at the ℓ-th iteration. Therefore the
corresponding computation trees intersect and, according to
our definition, (b) ∈ T. On the other hand, the computation
tree of (c) does not intersect the one of (a), but (c) ∈ T
because it shares a variable node with (b). We also expand 2ℓ
levels below the root. For instance, the value received on node
(B) affects both μ_1^(ℓ) and the message sent on (g) at the ℓ-th
iteration.
We compute the two terms in (30) separately.
Define S = lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T} μ_i^(ℓ)] and
S^c = lim_{n→∞} ( E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ)] − nL'(1) x_ℓ² ).
1) Computation of S: Having defined T, we can further
identify four different types of terms appearing in S and write

S = lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T} μ_i^(ℓ)]
  = lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_1} μ_i^(ℓ)] + lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_2} μ_i^(ℓ)]
  + lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_3} μ_i^(ℓ)] + lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_4} μ_i^(ℓ)].
Fig. 8: Size of T. It contains ℓ layers of variable nodes above
the root edge and 2ℓ layers of variable nodes below the root
variable node. The gray area represents the computation tree of
the message μ_1^(ℓ), which contains ℓ layers of variable nodes below
the root variable node.
The subset T_1 ⊂ T contains the edges above the root variable
node that carry messages that point upwards (we include the
root edge in T_1). In Fig. 7, the message sent on edge (b)
is of this type. T_2 contains all edges above the root but pointing
downwards, such as (c) in Fig. 7. T_3 contains the edges below
the root that carry an upward message, like (d) and (f).
Finally, T_4 contains the edges below the root variable node
that point downwards, like (e) and (g).
Let us start with the simplest term, involving the messages
in T_2. If i ∈ T_2, then the computation trees of μ_1^(ℓ) and μ_i^(ℓ)
are with high probability disjoint in the large blocklength limit.
In this case, the messages μ_1^(ℓ) and μ_i^(ℓ) do not depend on any
common channel observation. The messages are nevertheless
correlated: conditioned on the computation graph of the root
edge, the degree distribution of the computation graph of edge
i is biased (assume that the computation graph of the root edge
contains an unusual number of high degree check nodes; then
the computation graph of edge i must contain in expectation
an unusually low number of high degree check nodes). This
correlation is however of order O(1/n) and since T only
contains a finite number of edges the contribution of this
correlation vanishes as n → ∞. We obtain therefore

lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_2} μ_i^(ℓ)] = x_ℓ² ρ'(1) Σ_{i=0}^{ℓ−1} ρ'(1)^i λ'(1)^i,

where we used lim_{n→∞} E[μ_1^(ℓ) μ_i^(ℓ)] = x_ℓ², and the fact that the
expected number of edges in T_2 is ρ'(1) Σ_{i=0}^{ℓ−1} λ'(1)^i ρ'(1)^i.
For the edges in T_1 we obtain

lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_1} μ_i^(ℓ)] = x_ℓ + x_ℓ (1, 0) [ Σ_{j=1}^{ℓ} V(ℓ)C(ℓ − 1) · · · V(ℓ − j + 1)C(ℓ − j) ] (1, 0)^T,   (31)
with the matrices V(i) and C(i) defined in Eqs. (22) and (23).
In order to understand this expression, consider the following
case (cf. Fig. 9 for an illustration). We are at the i-th iteration
of BP decoding and we pick an edge at random in the graph.
It is connected to a check node of degree j with probability
ρ_j. Assume further that the message carried by this edge
from the variable node to the check node (incoming message)
is erased with probability p and known with probability p̄.
We want to compute the expected numbers of erased and
known messages sent out by the check node on its other edges
(outgoing messages). If the incoming message is erased, then
the number of erased outgoing messages is exactly (j − 1).
Averaging over the check node degrees gives us ρ'(1). If the
incoming message is known, then the expected number of
erased outgoing messages is (j − 1)(1 − (1 − x_i)^{j−2}). Averaging
over the check node degrees gives us ρ'(1) − ρ'(1 − x_i). The
expected number of erased outgoing messages is therefore
pρ'(1) + p̄(ρ'(1) − ρ'(1 − x_i)). Analogously, the expected
number of known outgoing messages is p̄ρ'(x̄_i). This result
can be written using matrix notation: the expected number
of erased (respectively, known) outgoing messages is the first
(respectively, second) component of the vector C(i)(p, p̄)^T,
with C(i) being defined in Eq. (23).
The situation is similar if we consider a variable node
instead of the check node, with the matrix V(i)
replacing C(i). The result is generalized to several layers
of check and variable nodes by taking the product of the
corresponding matrices, cf. Fig. 9.
[Figure: a vector (p, p̄)^T entering a check node yields C(i)(p, p̄)^T, entering a variable node yields V(i)(p, p̄)^T, and passing a variable-check layer yields V(i + 1)C(i)(p, p̄)^T.]
Fig. 9: Number of outgoing erased messages as a function of
the probability of erasure of the incoming message.
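To illustrate the matrix bookkeeping, the following Python sketch (illustrative only; the values y_i and x̄_i would come from density evolution as sketched earlier) builds the matrices V(i) and C(i) of (22)–(23) and propagates a (p, p̄) vector through several layers, as in Fig. 9.

import numpy as np

def V_matrix(eps, dlam, y_i, lam_p1):
    """V(i) of (22); lam_p1 = lambda'(1), dlam evaluates lambda'(.)."""
    a = eps * dlam(y_i)
    return np.array([[a, 0.0],
                     [lam_p1 - a, lam_p1]])

def C_matrix(drho, xbar_i, rho_p1):
    """C(i) of (23); rho_p1 = rho'(1), drho evaluates rho'(.)."""
    b = drho(xbar_i)
    return np.array([[rho_p1, rho_p1 - b],
                     [0.0, b]])

def propagate(p_vec, layers):
    """Multiply (p, pbar)^T through a list of layer matrices, e.g.
    [C(i), V(i+1), C(i+1), ...]; returns expected (erased, known) counts."""
    v = np.asarray(p_vec, dtype=float)
    for M in layers:
        v = M @ v
    return v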
The contribution of the edges in T_1 to S is obtained by
writing

lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_1} μ_i^(ℓ)] = lim_{n→∞} P{μ_1^(ℓ) = 1} E[ Σ_{i∈T_1} μ_i^(ℓ) | μ_1^(ℓ) = 1 ].   (32)

The conditional expectation on the right hand side is given by

1 + Σ_{j=1}^{ℓ} (1, 0) V(ℓ) · · · C(ℓ − j) (1, 0)^T,   (33)

where the 1 is due to the fact that E[μ_1^(ℓ) | μ_1^(ℓ) = 1] = 1, and
each summand (1, 0)V(ℓ) · · · C(ℓ − j)(1, 0)^T is the expected
number of erased messages in the j-th layer of edges in T_1,
conditioned on the fact that the root edge is erased at iteration
ℓ (notice that μ_1^(ℓ) = 1 implies μ_1^(i) = 1 for all i ≤ ℓ). Now
multiplying (33) by P{μ_1^(ℓ) = 1} = x_ℓ gives us (31).
The computation is similar for the edges in T_3 and results
in

lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_3} μ_i^(ℓ)] = x_ℓ Σ_{j=1}^{2ℓ} (1, 0) V(ℓ) · · · C(ℓ − j) (1, 0)^T.

In this sum, when j > ℓ, we have to evaluate the matrices V(i)
and C(i) for negative indices using the definitions given in (24)
and (25). The meaning of this case is simple: if j > ℓ then the
observations in these layers do not influence the message μ_1^(ℓ).
Therefore, for these steps we only need to count the expected
number of edges.
In order to obtain S, it remains to compute the contribution
of the edges in T_4. This case is slightly more involved than
the previous ones. Recall that T_4 includes all the edges that
are below the root node and point downwards. In Fig. 7, edges
(e) and (g) are elements of T_4. We claim that

lim_{n→∞} E[μ_1^(ℓ) Σ_{i∈T_4} μ_i^(ℓ)]
= (1, 0) Σ_{j=0}^{ℓ} ( y_{ℓ−j} U*(j, j) + (1 − y_{ℓ−j}) U⁰(j, j) )   (34)
+ (1, 0) Σ_{j=ℓ+1}^{2ℓ} V(ℓ) · · · C(2ℓ − j) ( y_{ℓ−j} U*(j, ℓ) + (1 − y_{ℓ−j}) U⁰(j, ℓ) ).

The first sum on the right hand side corresponds to messages
μ_i^(ℓ), i ∈ T_4, whose computation tree contains the root variable
node. In the case of Fig. 7, where ℓ = 2, the contribution of
edge (e) would be counted in this first sum. The second term
in (34) corresponds to edges i ∈ T_4 that are separated from
the root edge by more than ℓ + 1 variable nodes. In Fig. 7,
edge (g) is of this type.
In order to understand the first sum in (34), consider the
root edge and an edge i ∈ T_4 separated from the root edge
by j + 1 variable nodes with j ∈ {0, . . . , ℓ}. For this edge in
T_4, consider two messages it carries: the message that is sent
from the variable node to the check node at the ℓ-th iteration
(this 'outgoing' message participates in our second moment
calculation), and the one sent from the check node to the
variable node at the (ℓ−j)-th iteration ('incoming'). Define the
two-component vector U*(j, j) as follows. Its first component
is the joint probability that both the root and the outgoing
messages are erased, conditioned on the fact that the incoming
message is erased, multiplied by the expected number of edges
in T_4 whose distance from the root is the same as for edge
i. Its second component is the joint probability that the root
message is erased and that the outgoing message is known,
again conditioned on the incoming message being erased, and
multiplied by the expected number of edges in T_4 at the
same distance from the root. The vector U⁰(j, j) is defined in
exactly the same manner except that in this case we condition
on the incoming message being known. The superscript ⋆
or 0 accounts respectively for the cases where the incoming
message is erased or known.
From these definitions, it is clear that the contribution
to S of the edges that are in T_4 and separated from the
root edge by j + 1 variable nodes with j ∈ {0, . . . , ℓ}, is
(1, 0) ( y_{ℓ−j} U*(j, j) + (1 − y_{ℓ−j}) U⁰(j, j) ). We still have to
evaluate U*(j, j) and U⁰(j, j). In order to do this, we define
the vectors U*(j, k) and U⁰(j, k) with k ≤ j, analogously
to the case k = j, except that this time we consider the root
edge and an edge i ∈ T_4 separated from the root edge by
k + 1 variable nodes. The outgoing message we consider is the
one at the (ℓ − j + k)-th iteration and the incoming message
we condition on is the one at the (ℓ − k)-th iteration. It is
easy to check that U*(j, j) and U⁰(j, j) can be computed in
a recursive manner using U*(j, k) and U⁰(j, k). The initial
conditions are
U*(j, 0) = ( y_{ℓ−j} ǫλ'(y_ℓ), (1 − y_{ℓ−j}) ǫλ'(y_ℓ) )^T,   U⁰(j, 0) = (0, 0)^T,
and the recursion for k ∈ {1, . . . , j} is the one given in
Lemma 3, cf. Eqs. (26) and (27). Notice that any received
value which is on the path between the root edge and the
edge in T_4 affects both the messages μ_1^(ℓ) and μ_i^(ℓ) on the
corresponding edges. This is why this recursion is slightly
more involved than the one for T_1. The situation is depicted
in the left side of Fig. 10.
[Figure: two configurations of the root edge and an edge in T_4, side by side.]
Fig. 10: The two situations that arise when computing the
contribution of T_4. On the left side we show the case where
the two edges are separated by at most ℓ + 1 variable nodes
and on the right side the case where they are separated by
more than ℓ + 1 variable nodes.
Consider now the case of edges in T_4 that are separated from
the root edge by more than ℓ + 1 variable nodes, cf. the right picture
in Fig. 10. In this case, not all of the received values along the
path connecting the two edges affect both messages. We
therefore have to adapt the previous recursion. We start from
the root edge and compute the effect of the received values
that only affect this message, resulting in an expression similar
to the one we used to compute the contribution of T_1. This
gives us the following initial condition

U*(j, j − ℓ) = (1, 0) V(ℓ) · · · C(2ℓ − j) (1, 0)^T ( ǫλ'(y_{2ℓ−j}), 0 )^T
             + (1, 0) V(ℓ) · · · C(2ℓ − j) (0, 1)^T ( ǫ(λ'(1) − λ'(y_{2ℓ−j})), λ'(1)(1 − ǫ) )^T,
U⁰(j, j − ℓ) = (1, 0) V(ℓ) · · · C(2ℓ − j) (0, 1)^T ( ǫλ'(1), (1 − ǫ)λ'(1) )^T.
We then apply the recursion given in Lemma 3 to the intersection
of the computation trees. We have to stop the recursion
at k = ℓ (end of the intersection of the computation trees). It
remains to account for the received values that only affect the
messages on the edge in T_4. This is done by writing

(1, 0) Σ_{j=ℓ+1}^{2ℓ} V(ℓ) · · · C(2ℓ − j) ( y_{ℓ−j} U*(j, ℓ) + (1 − y_{ℓ−j}) U⁰(j, ℓ) ),

which is the second term on the right hand side of Eq. (34).
2) Computation of S^c: We still need to compute S^c =
lim_{n→∞} ( E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ)] − nL'(1) x_ℓ² ). Recall that by
definition, all the messages that are carried by edges in T^c
at the ℓ-th iteration are functions of a set of received values
distinct from the ones μ_1^(ℓ) depends on. At first sight, one might
think that such messages are independent of μ_1^(ℓ). This is
indeed the case when the Tanner graph is regular, i.e. for the
degree distributions λ(x) = x^{l−1} and ρ(x) = x^{r−1}. We then
have

S^c = lim_{n→∞} ( E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ)] − nL'(1) x_ℓ² )
   = lim_{n→∞} ( |T^c| x_ℓ² − nL'(1) x_ℓ² )
   = lim_{n→∞} ( (nL'(1) − |T|) x_ℓ² − nL'(1) x_ℓ² )
   = −|T| x_ℓ²,

with the cardinality of T being |T| = Σ_{i=0}^{2ℓ} (l − 1)^i (r − 1)^i l
+ Σ_{i=1}^{ℓ} (l − 1)^{i−1} (r − 1)^i l.
Consider now an irregular ensemble and let G_T be the graph
composed of the edges in T and of the variable and check
nodes connecting them. Unlike in the regular case, G_T is
not fixed anymore and depends (in its size as well as in its
structure) on the graph realization. It is clear that the root
message μ_1^(ℓ) depends on the realization of G_T. We will see
that the messages carried by the edges in T^c also depend
on the realization of G_T. On the other hand, they are clearly
conditionally independent given G_T (because, conditioned on
G_T, μ_1^(ℓ) is just a deterministic function of the received symbols
in its computation tree). If we let j denote a generic edge in T^c
(for instance, the one with the lowest index), we can therefore
write
S^c = lim_{n→∞} ( E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ)] − nL'(1) x_ℓ² )
   = lim_{n→∞} ( E_{G_T}[ E[μ_1^(ℓ) Σ_{i∈T^c} μ_i^(ℓ) | G_T] ] − nL'(1) x_ℓ² )
   = lim_{n→∞} ( E_{G_T}[ |T^c| E[μ_1^(ℓ) | G_T] E[μ_j^(ℓ) | G_T] ] − nL'(1) x_ℓ² )
   = lim_{n→∞} ( E_{G_T}[ (nL'(1) − |T|) E[μ_1^(ℓ) | G_T] E[μ_j^(ℓ) | G_T] ] − nL'(1) x_ℓ² )
   = lim_{n→∞} ( nL'(1) E_{G_T}[ E[μ_1^(ℓ) | G_T] E[μ_j^(ℓ) | G_T] ] − nL'(1) x_ℓ² )
     − lim_{n→∞} E_{G_T}[ |T| E[μ_1^(ℓ) | G_T] E[μ_j^(ℓ) | G_T] ].   (35)
We need to compute E[μ_j^(ℓ) | G_T] for a fixed realization of
G_T and an arbitrary edge j taken from T^c (the expectation
does not depend on j ∈ T^c: we can therefore consider it as a
random edge as well). This value differs slightly from x_ℓ for
two reasons. The first one is that we are dealing with a fixed-size
Tanner graph (although taking later the limit n → ∞) and
therefore the degrees of the nodes in G_T are correlated with
the degrees of nodes in its complement G\G_T. Intuitively, if G_T
contains an unusually large number of high degree variable
nodes, the rest of the graph will contain an unusually small
number of high degree variable nodes, affecting the average
E[μ_j^(ℓ) | G_T]. The second reason why E[μ_j^(ℓ) | G_T] differs from
x_ℓ is that certain messages carried by edges in T^c which are
close to G_T are affected by messages that flow out of G_T.
The first effect can be characterized by computing the degree distribution of G\G_T as a function of G_T. Define V_i(G_T) (respectively C_i(G_T)) to be the number of variable nodes (check nodes) of degree i in G_T, and let V(x; G_T) = Σ_i V_i(G_T) x^i and C(x; G_T) = Σ_i C_i(G_T) x^i. We shall also need the derivatives of these polynomials: V′(x; G_T) = Σ_i i V_i(G_T) x^{i−1} and C′(x; G_T) = Σ_i i C_i(G_T) x^{i−1}. It is easy to check that if we take a bipartite graph with variable degree distribution λ(x) and remove a variable node of degree i, the variable degree distribution changes by

δ_i λ(x) = ( iλ(x) − i x^{i−1} ) / ( nL′(1) ) + O(1/n^2).
Therefore, if we remove G_T from the bipartite graph, the remaining graph will have a variable degree distribution that differs from the original one by

δλ(x) = ( V′(1; G_T) λ(x) − V′(x; G_T) ) / ( nL′(1) ) + O(1/n^2).
In the same way, when we remove G_T, the check degree distribution changes by

δρ(x) = ( C′(1; G_T) ρ(x) − C′(x; G_T) ) / ( nL′(1) ) + O(1/n^2).
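As a sanity check of these two perturbation formulas, the following Python sketch computes the coefficient-wise changes δλ and δρ. The inputs V_GT and C_GT (dictionaries holding the number of degree-i variable and check nodes of G_T) and the coefficient arrays for λ(x) and ρ(x) are hypothetical and only serve to fix conventions; the code is a direct transcription of the two expressions above, not a reference implementation.

```python
import numpy as np

def perturbed_degree_distributions(lam, rho, V_GT, C_GT, n, L1):
    """First-order changes of lambda(x) and rho(x) when G_T is removed.
    lam[k], rho[k]: numpy float arrays, coefficient of x^k (edge perspective).
    V_GT[i], C_GT[i]: number of degree-i variable / check nodes in G_T.
    L1 = L'(1), so n*L1 is the total number of edges."""
    d_lam = np.zeros_like(lam)
    d_rho = np.zeros_like(rho)
    Vp1 = sum(i * c for i, c in V_GT.items())   # V'(1; G_T)
    Cp1 = sum(i * c for i, c in C_GT.items())   # C'(1; G_T)
    for i, c in V_GT.items():                   # V'(x; G_T) has coefficient i*c at x^(i-1)
        d_lam[i - 1] -= i * c / (n * L1)
    d_lam += Vp1 * lam / (n * L1)
    for i, c in C_GT.items():
        d_rho[i - 1] -= i * c / (n * L1)
    d_rho += Cp1 * rho / (n * L1)
    return d_lam, d_rho
```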
If the degree distributions change by δλ(x) and δρ(x), the fraction x_ℓ of erased variable-to-check messages changes by δx_ℓ. To linear order we get

δx_ℓ = Σ_{i=1}^{ℓ} ( Π_{k=i+1}^{ℓ} ǫλ′(y_k) ρ′(x̄_{k−1}) ) [ ǫ δλ(y_i) − ǫλ′(y_i) δρ(x̄_{i−1}) ]
     = (1/Λ′(1)) Σ_{i=1}^{ℓ} F_i [ ǫ( V′(1; G_T) λ(y_i) − V′(y_i; G_T) ) − ǫλ′(y_i)( C′(1; G_T) ρ(x̄_{i−1}) − C′(x̄_{i−1}; G_T) ) ] + O(1/n^2),

with F_i defined as in Eq. (28).
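To make the quantities entering this expression concrete, here is a small Python sketch generating the density evolution sequences x_i, y_i (with y_0 = 1, x_i = ǫλ(y_i), y_{i+1} = 1 − ρ(1 − x_i), as used throughout this section) together with the coefficients F_i. The product form used for F_i is our reading of Eq. (28), which is not reproduced here, so it should be treated as an assumption; the example degree distributions are the (3,6)-regular pair used elsewhere in the paper.

```python
def de_sequences(lam, rho, eps, ell):
    """x_i = eps*lambda(y_i), y_{i+1} = 1 - rho(1 - x_i), starting from y_0 = 1."""
    x, y = [], [1.0]
    for i in range(ell + 1):
        x.append(eps * lam(y[i]))
        y.append(1.0 - rho(1.0 - x[i]))
    return x, y

def F_coefficients(lam_p, rho_p, eps, x, y, ell):
    """F_i = prod_{k=i+1}^{ell} eps*lambda'(y_k)*rho'(xbar_{k-1})
    (assumed reading of Eq. (28); empty product for i = ell gives F_ell = 1)."""
    F = {}
    for i in range(1, ell + 1):
        prod = 1.0
        for k in range(i + 1, ell + 1):
            prod *= eps * lam_p(y[k]) * rho_p(1.0 - x[k - 1])
        F[i] = prod
    return F

# Example for lambda(x) = x^2, rho(x) = x^5 and eps = 0.45:
x, y = de_sequences(lambda z: z**2, lambda z: z**5, 0.45, ell=10)
F = F_coefficients(lambda z: 2*z, lambda z: 5*z**4, 0.45, x, y, ell=10)
```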
Imagine now that we fix the degree distribution of G\G_T. The conditional expectation E[µ_j^{(ℓ)} | G_T] still depends on the detailed structure of G_T. The reason is that the messages that flow out of the boundary of G_T (both their number and their values) depend on G_T, and these messages affect messages in G\G_T. Since the fraction of such (boundary) messages is O(1/n), their effect can again be evaluated perturbatively.
Call B the number of edges forming the boundary of G_T (edges emanating upwards from the variable nodes that are ℓ levels above the root edge and emanating downwards from the variable nodes that are 2ℓ levels below the root variable node) and let B_i^⋆ be the number of erased messages carried by these edges at the i-th iteration. Let x̃_i be the fraction of erased messages, incoming to check nodes in G\G_T from variable nodes in G\G_T, at the i-th iteration. Taking into account the messages coming from variable nodes in G_T (i.e., corresponding to boundary edges), the overall fraction becomes x̃_i + δx̃_i, where

δx̃_i = ( B_i^⋆ − B x̃_i ) / ( nL′(1) ) + O(1/n^2).
This expression simply comes from the fact that at the i-th iteration we have (nL′(1) − |T|) = nL′(1)(1 + O(1/n)) messages in the complement of G_T, of which a fraction x̃_i is erased. In addition, B messages come in from the boundary, of which B_i^⋆ are erasures.
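In code the boundary correction is a one-liner; the sketch below (Python, with hypothetical arguments mirroring the symbols above) is included only to fix the normalization by the total number of edges.

```python
def delta_x_tilde(B, B_star_i, x_tilde_i, n, L1):
    """First-order correction to the fraction of erased messages in G\\G_T at
    iteration i: the number of erased boundary messages minus what the bulk
    fraction would predict for B edges, normalized by n*L'(1) edges."""
    return (B_star_i - B * x_tilde_i) / (n * L1)
```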
Combining the two above effects, we have for an edge j ∈ T^c

E[µ_j^{(ℓ)} | G_T] = x_ℓ + (1/Λ′(1)) Σ_{i=1}^{ℓ} F_i [ x_i V′(1; G_T) − ǫ V′(y_i; G_T) − ǫλ′(y_i)( C′(1; G_T) ρ(x̄_{i−1}) − C′(x̄_{i−1}; G_T) ) ]
    + (1/Λ′(1)) Σ_{i=1}^{ℓ−1} F_i ( B_i^⋆ − B x_i ) + O(1/n^2).
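Putting the pieces together, this conditional expectation can be evaluated as in the following Python sketch. The arguments F, Vp, Cp, B and B_star are hypothetical inputs standing in for the quantities F_i, V′(·; G_T), C′(·; G_T), B and B_i^⋆ defined above; the code only transcribes the formula.

```python
def cond_exp_mu_j(eps, lam_p, rho, x, y, F, Vp, Cp, B, B_star, ell, Lam1):
    """E[mu_j^{(ell)} | G_T] to first order: the unconditional value x_ell plus
    the degree-distribution correction and the boundary-message correction,
    both of order 1/Lambda'(1)."""
    dd_corr = 0.0
    for i in range(1, ell + 1):
        xb = 1.0 - x[i - 1]                       # xbar_{i-1}
        dd_corr += F[i] * (x[i] * Vp(1.0) - eps * Vp(y[i])
                           - eps * lam_p(y[i]) * (Cp(1.0) * rho(xb) - Cp(xb)))
    bd_corr = sum(F[i] * (B_star[i] - B * x[i]) for i in range(1, ell))
    return x[ell] + (dd_corr + bd_corr) / Lam1
```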
We can now plug this expression into (35) to obtain

S^c = lim_{n→∞} ( nL′(1) E_{G_T}[ E[µ_1^{(ℓ)} | G_T] E[µ_j^{(ℓ)} | G_T] ] − nL′(1) x_ℓ^2 ) − lim_{n→∞} E_{G_T}[ |T| E[µ_1^{(ℓ)} | G_T] E[µ_j^{(ℓ)} | G_T] ]
    = Σ_{i=1}^{ℓ} F_i ( x_i E[µ_1^{(ℓ)} V′(1; G_T)] − ǫ E[µ_1^{(ℓ)} V′(y_i; G_T)] )
      − Σ_{i=1}^{ℓ} F_i ǫλ′(y_i) ( E[µ_1^{(ℓ)} C′(1; G_T)] ρ(x̄_{i−1}) − E[µ_1^{(ℓ)} C′(x̄_{i−1}; G_T)] )
      + Σ_{i=1}^{ℓ−1} F_i E[µ_1^{(ℓ)} B_i^⋆] − Σ_{i=1}^{ℓ−1} F_i x_i E[µ_1^{(ℓ)} B] − x_ℓ E[µ_1^{(ℓ)} V′(1; G_T)],

where we took the limit n → ∞ and replaced |T| by V′(1; G_T).
It is clear what each of these values represents. For example, E[µ_1^{(ℓ)} V′(1; G_T)] is the expectation of µ_1^{(ℓ)} times the number of edges in G_T. Each of these terms can be computed through recursions that are similar in spirit to the ones used to compute S. These recursions are provided in the body of Lemma 3. We will just explain in further detail how the terms E[µ_1^{(ℓ)} B] and E[µ_1^{(ℓ)} B_i^⋆] are computed.
We claim that

E[µ_1^{(ℓ)} B] = ( x_ℓ + (1,0) V(ℓ) ··· C(0) V(0) (1,0)^T ) (λ′(1) ρ′(1))^ℓ.
The reason is that µ_1^{(ℓ)} depends only on the realization of its computation tree and not on the whole of G_T. From the definition of G_T, the boundary of G_T is on average (λ′(1)ρ′(1))^ℓ times as large as the boundary of the computation tree. Finally, the expectation of µ_1^{(ℓ)} times the number of edges in the boundary of its computation tree is computed analogously to what was done for the computation of S. The result is ( x_ℓ + (1,0) V(ℓ) ··· C(0) V(0) (1,0)^T ) (the term x_ℓ accounts for the root edge, the other term for the lower boundary of the computation tree). Multiplying this by (λ′(1)ρ′(1))^ℓ, we obtain the above expression.
The calculation of E[µ_1^{(ℓ)} B_i^⋆] is similar. We start by computing the expectation of µ_1^{(ℓ)} multiplied by the number of edges in the boundary of its computation tree. This number has to be multiplied by (1,0) V(i) C(i−1) ··· V(i−ℓ+1) C(i−ℓ) (1,0)^T to account for what happens between the boundary of the computation tree and the boundary of G_T. We therefore obtain

E[µ_1^{(ℓ)} B_i^⋆] = ( x_ℓ + (1,0) V(ℓ) ··· C(0) V(0) (1,0)^T ) · (1,0) V(i) ··· C(i−ℓ) (1,0)^T.
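The two expectations just derived are straightforward to evaluate once the matrices V(i) and C(i) of Lemma 3 are available. The Python sketch below assumes V and C are callables implementing those matrices (including the stated convention for negative indices); it builds the alternating products and applies the two formulas, and is only an illustration of the bookkeeping, not part of the original derivation.

```python
import numpy as np

def alt_product(V, C, i_top, i_bot):
    """Alternating product V(i_top) C(i_top-1) V(i_top-1) ... C(i_bot)."""
    M = np.eye(2)
    i = i_top
    while i - 1 >= i_bot:
        M = M @ V(i) @ C(i - 1)
        i -= 1
    return M

def exp_mu1_B(V, C, x_ell, lam_p1, rho_p1, ell):
    """E[mu_1^{(ell)} B] = (x_ell + (1,0) V(ell)...C(0)V(0) (1,0)^T) (lambda'(1) rho'(1))^ell."""
    e1 = np.array([1.0, 0.0])
    inner = x_ell + e1 @ (alt_product(V, C, ell, 0) @ V(0)) @ e1
    return inner * (lam_p1 * rho_p1) ** ell

def exp_mu1_B_star(V, C, x_ell, ell, i):
    """E[mu_1^{(ell)} B_i^*]: the same inner factor, multiplied by
    (1,0) V(i)...C(i-ell) (1,0)^T; V and C must accept negative indices."""
    e1 = np.array([1.0, 0.0])
    inner = x_ell + e1 @ (alt_product(V, C, ell, 0) @ V(0)) @ e1
    return inner * (e1 @ alt_product(V, C, i, i - ell) @ e1)
```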
The expression provided in the above lemma has been used to plot V^{(ℓ)} for ǫ ∈ (0, 1) and for several values of ℓ, in the case of an irregular ensemble, in Fig. 5. It remains to determine the asymptotic behavior of this quantity as the number of iterations tends to infinity.
Lemma 4: Let G be chosen uniformly at random from LDPC(n, λ, ρ) and consider transmission over the BEC of erasure probability ǫ. Label the nL′(1) edges of G in some fixed order by the elements of {1, . . . , nL′(1)}. Set µ_i^{(ℓ)} equal to one if the message along edge i from variable to check node, after ℓ iterations, is an erasure, and equal to zero otherwise. Then

lim_{ℓ→∞} lim_{n→∞} ( E[(Σ_i µ_i^{(ℓ)})^2] − E[Σ_i µ_i^{(ℓ)}]^2 ) / (nL′(1))
 = ǫ^2 λ′(y)^2 ( ρ(x̄)^2 − ρ(x̄^2) + ρ′(x̄)(1 − 2xρ(x̄)) − x̄^2 ρ′(x̄^2) ) / (1 − ǫλ′(y)ρ′(x̄))^2
 + ǫ^2 λ′(y)^2 ρ′(x̄)^2 ( ǫ^2 λ(y)^2 − ǫ^2 λ(y^2) − y^2 ǫ^2 λ′(y^2) ) / (1 − ǫλ′(y)ρ′(x̄))^2
 + ( (x − ǫ^2 λ(y^2) − y^2 ǫ^2 λ′(y^2))(1 + ǫλ′(y)ρ′(x̄)) + ǫ y^2 λ′(y) ) / (1 − ǫλ′(y)ρ′(x̄)).
The proof is a (particularly tedious) calculus exercise, and
we omit it here for the sake of space.
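As a rough numerical illustration, the Python sketch below transcribes the right-hand side of Lemma 4 and evaluates it for the (3,6)-regular ensemble (λ(x) = x^2, ρ(x) = x^5). It interprets x and y as the density evolution fixed point reached from y_0 = 1, which is zero below threshold and nonzero above it; this interpretation and the choice ǫ = 0.45 (slightly above the threshold, approximately 0.4294) are assumptions made for the example only.

```python
def limiting_variance(eps, lam, lam_p, rho, rho_p, iters=5000):
    """Right-hand side of Lemma 4, with (x, y) the density evolution fixed
    point reached from y_0 = 1. Diverges as eps approaches the threshold."""
    y = 1.0
    for _ in range(iters):                     # iterate y = 1 - rho(1 - eps*lambda(y))
        y = 1.0 - rho(1.0 - eps * lam(y))
    x, xb = eps * lam(y), 1.0 - eps * lam(y)
    D = 1.0 - eps * lam_p(y) * rho_p(xb)       # common denominator 1 - eps*lambda'(y)*rho'(xbar)
    t1 = eps**2 * lam_p(y)**2 * (rho(xb)**2 - rho(xb**2)
        + rho_p(xb) * (1 - 2 * x * rho(xb)) - xb**2 * rho_p(xb**2)) / D**2
    t2 = eps**2 * lam_p(y)**2 * rho_p(xb)**2 * (eps**2 * lam(y)**2
        - eps**2 * lam(y**2) - y**2 * eps**2 * lam_p(y**2)) / D**2
    t3 = ((x - eps**2 * lam(y**2) - y**2 * eps**2 * lam_p(y**2))
          * (1 + eps * lam_p(y) * rho_p(xb)) + eps * y**2 * lam_p(y)) / D
    return t1 + t2 + t3

# (3,6)-regular ensemble slightly above its threshold:
print(limiting_variance(0.45, lambda z: z**2, lambda z: 2*z,
                        lambda z: z**5, lambda z: 5*z**4))
```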
REFERENCES
[1] T. Richardson and R. Urbanke. Modern Coding Theory. Cambridge University Press, 2006. In preparation.
[2] A. Montanari. Finite-size scaling of good codes. In Proc. 39th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, 2001.
[3] A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke. Finite-length scaling for iteratively decoded LDPC ensembles. Submitted to IEEE Transactions on Information Theory, June 2004.
[4] A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke. Finite-length scaling and finite-length shift for low-density parity-check codes. In Proc. 42nd Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, 2004.
[5] A. Amraoui, A. Montanari, and R. Urbanke. Finite-length scaling of irregular LDPC code ensembles. In Proc. IEEE Information Theory Workshop, Rotorua, New Zealand, Aug.-Sept. 2005.
[6] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. A. Spielman, and V. Stemann. Practical loss-resilient codes. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 150-159, 1997.
[7] C. Méasson, A. Montanari, and R. Urbanke. Maxwell's construction: The hidden bridge between maximum-likelihood and iterative decoding. Submitted to IEEE Transactions on Information Theory, 2005.
ρi } ≤ ∆ρi ≤ δ}. ∂λi ∂ρi i i Hereby. Assume we change the degree distribution pair slightly by ∆λ(x) = i−1 and ∆ρ(x) = i ∆ρi xi−1 .0 rate/capacity contribution to error floor 10-3 1.5. If this probability is not too small then we have a good chance of finding such a code in the ensemble by sampling a sufficient number of random elements. i − min{δ. LP 1: [Linear program to increase the rate] max{(1 − r) i Example 1: [Sample Optimization] Let us show a sample optimization. i ∂P ∆ρi | ∂ρi − min{δ.0532687x + 0. ρ. ρ + ∆ρ) − r(λ.7 0. λ + ∆λ.. Therefore. 1. or otherwise.0 6 8 10 12 14 16 18 20 22 24 26 10-4 10-5 10-6 0.00832646x8 + 0. We are interested in a block length of n = 5000 bits and the maximum variable and check degree we allow are lmax = 13 and rmax = 10. the erasure probability changes (according to the approximation) by P(n. A quick calculation then shows that the design rate changes by r(λ + ∆λ.170892x + 0.e. In this way. (5) + 0.65 0. This corresponds to looking at an expurgated ensemble. ∆ρi = 0.3 0. Alternatively. ǫ) ≤ Ptarget . then increase the rate by a repeated application of the following linear program. The value δ is best adapted dynamically to ensure convergence. Assume we transmit over a BEC with channel erasure probability ǫ = 0. In any case. 1] and then normalizing so that λ(1) = ρ(1) = 1.2029 and PB (n = 5000. − min{δ.000552 > Ptarget . Sometimes it is necessary to initialize the optimization procedure with degree distribution pairs that do not fulfill the target erasure probability constraint. To this end we define a linear program that decreases the erasure probability. respectively.25 0. We constrain the block erasure probability to be smaller than Ptarget = 10−4 . we can compute the probability that a randomly chosen element of an ensemble does not contain stopping sets of size smaller than 6. The approximation of the block erasure probability curve of this code (as given in Section II-D) is shown in Fig.5) = 0.58 % 0.35 0. λ. ǫ) − P(n.4 0. The objective function in LP 1 is equal to the total derivative of the rate as a function of the change of the degree distribution. ρ.0444454x7 + 0.0833654x11 + 0.45 0.0511266x + 0. ρ) is close to this design rate [7]. λ. For this PB 10-1 10 -2 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 11 12 13 10 40. ρ). 1: Approximation of the block erasure probability for the initial ensemble with degree distribution pair given in (4) and (5). Several rounds of this linear program will gradually improve the rate of the code ensemble. we are looking at the subset of codes of the ensemble that do not contain stopping sets of sizes smaller than 6. where ∆λ(1) = 0 = i ∆λi x ∆ρ(1) and assume that the change is sufficiently small so that λ + ∆λ as well as ρ + ∆ρ are still valid degree distributions (non-negative coefficients).55 0.152618x8 + 0. λ. ρi } ≤ ∆ρi ≤ δ.75 ǫ Fig. ρ(x) =0.0617885x12. ρ) (1 − r) ∆ρi ∆λi ≃ − . We further count only erasures larger or equal to smin = 6 bits. If P(n.17678x 4 5 6 ∂P ∂P ∆λi + ∆ρi ≤ Ptarget − P(n. λi } ≤ ∆λi ≤ δ. ρ + ∆ρ. ≃ ∂λi ∂ρi i i (3) Equations (2) and (3) give rise to a simple linear program to optimize locally the degree distribution: Start with some initial degree distribution pair (λ. λ.0775212x 4 5 6 i ∆λi /i − i ∆ρi /i | (4) ∆λi = 0. while keeping the erasure probability below the target (last inequality). ρ. + 0. .0184844x + 0. Using the techniques discussed in Section II-C. λj λj i j j i i i j j LP 2: [Linear program to decrease P(n. − min{δ. 
This pair was generated randomly by choosing each coefficient uniformly in [0.0760256x9 + 0. ρ) = 0.139976x + 0.0749403x2 + 0. One can start with a large value and decrease it the closer we get to the final answer. ρ) is always a lower bound. i. This can be checked at the end of the optimization procedure. ǫ)] min{ ∂P ∆λi + ∂λi ∆λi = 0.5 0. i (2) In the same way. λi } ≤ ∆λi ≤ δ.6 0. This is for instance the case if the optimization is repeated for a large number of “randomly” chosen initial conditions. ρ. pick the best outcome of many trials.11504x3 + 0.0166585x7 + 0. r(λ. λ. ǫ = 0.160889x9.2 For “most” ensembles the actual rate of a randomly chosen element of the ensemble LDPC(n.0838369x10 + 0.110137x + 0. we can interpret this constraint in the sense that we use an outer code which “cleans up” the remaining small erasures.2 0. ρ. ǫ)}. we can check whether the procedure always converges to the same point (thus suggesting that a global optimum was found). λ. ǫ) ∂P ∂P ∆ρi ∆λi + .149265x2 + 0. δ is a sufficiently small non-negative number to ensure that the degree distribution pair changes only slightly at each step so that changes of the rate and of the probability of erasure are accurately described by the linear approximation. ∆ρi = 0. We start with an arbitrary degree distribution pair: λ(x) = 0. initial degree distribution pair we have r(λ.174615x3 + 0.

rmax . In principle. (iii) optimize the ensemble with respect to its defining parameters (e.25 0.5) (over the choice of λ and ρ) using LP 2 until it becomes lower than Ptarget . ρ.193248x13 + 0.75 ǫ Fig.0 10-5 18 20 22 24 26 28 30 32 34 36 38 10-6 0.6 0.0781005x + 0.0174716x7 + 0. ǫ = 0. The block erasure probability plot for the result of the optimization is shown in Fig 3.133916x9. smin ).178291x2 + 0.55 0. using LP 1.0 rate/capacity contribution to error floor 10-3 1. ǫ.753.0 6 8 10 12 14 16 18 20 22 24 26 10 -4 10-5 10-6 0. ρ.45 0. λ. in the limit of large blocklengths the number of small stopping sets has a joint Poisson distribution. To validate the procedure we computed the block erasure probability for the optimized degree distribution also by means of simulations and compare the two. The erasure probability has been lowered below the target. Each of these steps is a manageable task – albeit not a trivial one.218.7 0.177806x 4 5 6 (6) (7) + 0. this time the probability that a random element from LDPC(5000.0475105x5 + 0.04688+ 0. ρ) = 0. (11) (10) (8) (9) where r(λ.g. For the optimized ensemble we get exp{−(0.0 86. ρ) = 0. In total. ρ. λ.g.g. There are many ways of improving the result. if Ai denotes the expected number of minimal stopping sets of size i in a random element from LDPC(5000.35 0.4 0.0739196x + 0. multi-edge type ensembles.2 0. lmax . ǫ.0548108x10 + 0. 3: Error probability curve for the result of the optimization (see (8) and (9)). ρ(x) =0.106547x6 + 0.5 0.g. 4 5 9 2 12 Each LP step takes on the order of seconds on a standard PC.5) = 0. where r(λ. λ.45 0.6 0.146004x14 ρ(x) =0.2073+0. ρ) with (λ.41065.63 % 0.390753x + 0.0043335)} ≈ 0. It will therefore be harder to find a code that fulfills the expurgation requirement. Another generalization of our approach which is slated for future work is the extension to general binary memoryless . λ. we can obtain degree distribution pairs with higher rate. However.433942.13 % 0.4 0.657891x + 0.5 0. The resulting degree distribution pair is: λ(x) =0. For this degree distribution pair we have PB (n = 5000. Therefore. as a dotted line. ǫ) which is only an approximation of the true block erasure probability. E.0 6 8 10 12 14 16 18 20 22 24 26 2 3 4 5 6 7 8 9 1011121314151617181920 2 3 4 5 6 7 8 9 10 10-4 0.391709x6. After a number of LP 2 rounds we obtain the degree PB 10-1 10-2 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 11 12 13 10 43. the probability that it contains no stopping 5 ˜ set of size smaller than 6 is approximately exp{− i=1 Ai }. The corresponding curve is depicted in Fig 3.268189x . we start the second phase of the optimization and optimize the rate while insuring that the block erasure probability remains below the target.205031x + 0.0469994x9 + 0. Ptarget .0000997 ≤ Ptarget and r(λ. ǫ)..7 0.10−6 . Recall that the whole optimization procedure was based on PB (n.35 0. ǫ = 0.361589x + 0. smin ) takes on the order of minutes. the optimization for a given set of parameters (n.65 0.101914x2 + 0. 2.75 ǫ Fig.55 0. λ.198892x + 0. Now. ρ(x) =0. (ii) find efficiently computable expressions for the scaling parameters. ǫ = 0. the actual performances of the optimized ensemble could be worse (or better) than predicted by PB (n.247658x . a quite large probability.79 % rate/capacity contribution to error floor 1. λ.3 0. ˜ As a consequence.608291x5 + 0. ρ) has no stopping set of size smaller than 18 is approximately 6. rmax . 
It is worth stressing that our results could be improved further by applying the same approach to more powerful ensembles. Ptarget . The solid curve is PB (n = 5000.25 0.142014x3 + 0. 2: Approximation of the block erasure probability for the ensemble obtained after the first part of the optimization (see (6) and (7)). e. The steps to be accomplished are: (i) derive the scaling laws and define scaling parameters for such ensembles. we can be quite confident that the result of our local optimization is close to the global optimal degree distribution pair for the given constraints (n. ρ). We show the corresponding approximation in Fig.3 we start by reducing PB (n = 5000.5) while the small dots correspond to simulation points.455716x2. or ensembles defined by protographs.0242426x + 0.0327624x12. λ.01676 + 0. + 0. λ.139163x4 + 0. ρ) given by Eqs. ρ..0 rate/capacity contribution to error floor 10-3 1. ρ.65 0. The simulation results are shown in Fig 3 (dots with 95% confidence intervals) : analytical (approximate) and numerical results are in almost perfect agreement! How hard is it to find a code without stopping sets of size smaller than 6 within the ensemble LDPC(5000. lmax . if we allow higher degrees or apply a more aggressive expurgation.3 0.111913x + 0. the degree distribution) as above. PB 10-1 10 -2 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 11 12 13 10 82.007874 + 0.125644x8 + 0.0240221x7 + 0.203641x3 + 0. for the choice lmax = 15 and smin = 18 the resulting degree distribution pair is λ(x) =0. In dotted are the results with a more aggressive expurgation. E.2 0.0543393x11 + 0. We repeated the optimization procedure with various different random initial conditions and always ended up with essentially the same degree distribution. distribution pair: λ(x) =0. (8) and (9)? As discussed in more detail in Section II-C.. ρ) = 0.

Analogously. the decoder will fail with high probability only in two possible ways.7 0.8 0.) Note that if an ensemble has a single critical point and this point is strictly positive. we describe in detail the approximation P(n. i. Li is the fraction of variable nodes of degree i in the Tanner graph. (and therefore iterative decoding is successful with high probability). smin denotes the expurgation paB/b. 4: r1 (y) for y ∈ [0. In the rest of this paper. whereas the small erasure events are responsible for the erasure floor. ρ. We start in Section II-A with a short review of density evolution. however. we start with the complete Tanner graph representing the code and in the first step we delete all variable nodes from the graph which have been received (have not been erased). is an open issue.4294381.e. For our purpose it is best to consider a refined scaling law which was conjectured in the same paper. [3].01 0. We call a point y ∗ where the function y − 1 + ρ(1 − ǫλ(y)) and its derivative both vanish a critical point. Spielman and Stemann. Shokrollahi. cf. λ. However. We will consider in Section II-B erasures of size at least nγν ∗ with γ ∈ (0. From the description of the algorithm it should be clear that the number of degree-one check nodes plays a crucial role. that the erasure probability due to large failures obeys a well defined scaling law.) Then. ρ. Luby et al. Hereby. The first relates to y ≈ 0 and corresponds to small erasure events. λ.4 0.e.3 0.9 -0. we only count error events involving at least smin erasures. were able to give analytic expressions for the expected number of degree-one check nodes as a function of the size of the residual graph in the limit of large blocklengths. Density Evolution The initial analysis of the performance of LDPC codes assuming that transmission takes place of the BEC is due to Luby. as shown in [6]. L(x) = λ(u)du i 0 is the node perspective variable node 1 i Li x = λ(u)du 0 distribution. all connected check nodes. ρ) is the supremum value of ǫ such that r1 (y) > 0 for all y ∈ (0. ρ. Fig. In Section II-C we will instead focus on erasures of size smaller than nγν ∗ . Mitzenmacher. this does not imply that y = 0 is a critical point because of the extra factor λ(y) in the definition of r1 (y).s rameter.6 0. we show the function r1 (y) depicted for the ensemble with λ(x) = x2 and ρ(x) = x5 for ǫ = ǫ∗ . We denote this approximation of the water fall W curve by PB/b (n. ǫ) AND Pb (n. (12) where y is determined so that ǫL(y) is the fractional (with respect to n) xsize of the residual graph. L. . then the number of remaining erasures conditioned on decoding failure. In Section II-B. 4. the fraction of degree-one check nodes concentrates around r1 (y). λ. In Fig. The threshold noise parameter ǫ∗ = ǫ∗ (λ. A variable node can be peeled off if it is connected to at least one check node which has residual degree one.e. we will consider ensembles with a single critical point and separate the two above contributions. ǫ). we recall that the water fall curve follows a scaling law and we discuss how to compute the scaling parameters. In the rest of this paper. (This means that the actual number of degree-one check nodes is equal to n(1 − r)r1 . we let Ri denote the fraction i of degree i check nodes.1 0. ρ. λ. In this case the fraction of variable nodes that can not be decoded concentrates around ν ∗ = ǫ∗ L(y ∗ ).4 symmetric channels. R). How to prove this fact or how to compute the required parameters. 
we collect in Section II-D our results and give an approximation to the total erasure probability.5 0. where n is the blocklength and r is the design rate of the code. In this algorithm we “peel-off” one variable node at a time (and all its adjacent check nodes and edges) creating a sequence of residual graphs. and all connected edges. The large erasure events give rise to the so-called waterfall curve. concentrates around ν ∗ ǫ∗ L(y ∗ ). ǫ) for the BEC. ρ. λ. The algorithm stops if and only if no degree-one check node remains in the residual graph. 1]. see [6]. For convenience of the reader we restate it here. 1] at the threshold. II. i.0 y∗ 0. 1). and it is based on the so-called peeling algorithm. ǫ) In order to derive approximations for the erasure probability we separate the contributions to this erasure probability into two parts – the contributions due to large erasure events and the ones due to small erasure events. Initially. (Notice that the function r1 (y) always vanishes together with its derivative at y = 0. there is at least one critical point. The degree distribution pair is λ(x) = x2 and ρ(y) = x5 and the threshold is ǫ∗ = 0. As r1 (y) 0. Waterfall Region It was proved in [3]. Empirical evidence suggests that scaling laws should also hold in this case. Hereby.2 0. for ǫ = ǫ∗ . The second one corresponds to the value y ∗ such that r1 (y ∗ ) = 0. 4. ǫ).01 0. We will finally combine the two results in Section II-D B. r1 is given parametrically by r1 (y) = ǫλ(y)[y − 1 + ρ(1 − ǫλ(y))].02 erasure floor erasures waterfall erasures 0. More precisely. but there may be more than one. They further showed that most instances of the graph and the channel follow closely this ensemble average. With an abuse of notation we shall sometimes denote the irregular LDPC ensemble as LDPC(n. Finally. see [2]. A. A PPROXIMATION PB (n. We call this approximation PE min (n. let r1 denote the fraction of degree-one check nodes in the decoder. and set R(x) = i Ri x . At threshold. We next show in Section II-C how to approximate the erasure floor.0 y Fig. Decoding is successful if and only if the final residual graph is the empty graph.03 0. i.

Then the scaling parameter α in Conjecture 1 is given by α= + ρ(¯∗ )2 − ρ(¯∗2 ) + ρ′ (¯∗ )(1 − 2x∗ ρ(¯∗ )) − x∗2 ρ′ (¯∗2 ) x x x x ¯ x L′ (1)λ(y ∗ )2 ρ′ (¯∗ )2 x ǫ∗2 λ(y ∗ )2 − ǫ∗2 λ(y ∗2 ) − y ∗2 ǫ∗2 λ′ (y ∗2 ) L′ (1)λ(y ∗ )2 1/2 Further. xs y e × . ǫ)) B denote the expected bit (block) erasure probability due to erasures of size at least nγν ∗ . [5]. In [3]. ρ) = LDPC(n. where x∗ = ǫ∗ λ(y ∗ ). ρ. ǫ) b. However. ρ. λ. The main result in this paper is to show that it is possible to compute the scaling parameter α without explicitly solving covariance evolution.5 Conjecture 1: [Refined Scaling Law] Consider transmission over a BEC of erasure probability ǫ using random elements from the ensemble LDPC(n. ρ.s ˜ As ǫ s (1 + o(1)) . In the limit of infinity blocklenghts the minimal stopping sets are nonoverlapping with probability one so that the weight of the resulting stopping set is just the sum of the weights of the individual stopping sets. with A(x) = s s≥0 As x and As = e coef i i ((1 (1 + xy i )nLi . L. ρ. PW (n. This is the crucial ingredient allowing for efficient code optimization. and for i ≥ 2 ¯ ∗ ri = m≥j≥i (−1)i+j j−1 i−1 m−1 ρm (ǫ∗ λ(y ∗ ))j . ρ. z PW (n. Let PE min (n. For example. ρ) = LDPC(n. where γ ∈ (0. where γ ∈ (0. x∗ = 1 − x∗ . xe nL′ (1) e . for an ensemble with 5 different variable node degrees and rmax = 30. [5]. ρ. The asymptotic distribution of the number of minimal stopping sets contained in an LDPC graph was already studied in [1]. Ω is a universal (code independent) constant defined in Ref. ˜ where As = coef {log (A(x)) . C.s erasure probability due to stopping sets of size between smin and nγν ∗ .s s≥smin ˜ sAs ǫs (1 + o(1)) . R). where x∗ = ǫ∗ λ(y ∗ ) and x∗ = 1 − x∗ . L. Then. ǫ) (respectively. Conjecture 2: [Scaling Parameter β] Consider transmission over a BEC of erasure probability ǫ using random elements from the ensemble LDPC(n. xs } for s ≥ 1. Assume that the ensemble has a single critical point y ∗ > 0 and let ν ∗ = ǫ∗ L(y ∗ ). R). The hurdle in proving stronger error terms is due to the fact that for a given length it is not clear how to relate the number of stopping sets to the number of minimal stopping sets. (16) coef + x)i − ix)n(1−r)Ri . Let Pb (n. Let ν ∗ = ǫ∗ L(y ∗ ). λ. Then as n tends to infinity. ǫ) = ν ∗ Q b α where α = α(λ. ¯ The derivation of this expression is explained in Section III For completeness and the convenience of the reader. the number of coupled equations in covariance evolution is (5 + 29)2 = 1156. Then the scaling parameter β in Conjecture 1 is given by β/Ω = ∗2 ∗ ∗ ǫ∗4 r2 (ǫ∗ λ′ (y ∗ )2 r2 −x∗ (λ′′ (y ∗ )r2 +λ′ (y ∗ )x∗ ))2 ∗ ∗ L′ (1)2 ρ′ (¯∗ )3 x∗10 (2ǫ∗ λ′ (y ∗ )2 r3 −λ′′ (y ∗ )r2 x∗ ) x 1/3 (13) . we repeat here also an explicit characterization of the shift parameter β which appeared already (in a slightly different form) in [3]. a procedure called covariance evolution was defined to compute the scaling parameter α through the solution of a system of ordinary differential equations. PE min (n. Because of this independence one can relate the number of minimal stopping sets to the number of stopping sets – any combination of minimal stopping sets gives rise to a stopping set. λ. ρ) and β = β(λ. and let ǫ∗ denote the threshold erasure probability. The number of equations in the system is equal to the square of the number of variable node degrees plus the largest check node degree minus one. Assume that the ensemble has a single critical point y ∗ > 0. ρ) = LDPC(n. PW (n. for any ǫ < ǫ∗ . λ. λ.s (respectively PE min (n. 
we shall always adopt the approximate Ω by 1. In practice however the approximation is much better than the stated o(1) error term if we use the finite-length averages given by (16). Error Floor Lemma 2: [Error Floor] Consider transmission over a BEC of erasure probability ǫ using random elements from an ensemble LDPC(n. [3]. λ. B α z 1 + O(n−1/3 ) . where ǫ∗ is the threshold erasure probability. 1). ρ) are constants which depend on the ensemble. and let ǫ∗ denote the threshold erasure probability. ρ) = LDPC(n. The computation of the scaling parameter can therefore become a challenging task. λ. R). s≥smin (14) (15) PE min (n. λ. Assume that the ensemble has a single critical point y ∗ > 0. the number of stopping sets of size two is equal to the number of minimal stopping sets of size two plus the number of stopping sets we get by taking all pairs of (minimal) stopping sets of size one. λ. As an example. Proof: The key in deriving this erasure floor expression is in focusing on the number of minimal stopping sets. Lemma 1: [Expression for α] Consider transmission over a BEC with erasure probability ǫ using random elements from the ensemble LDPC(n. In the rest of this paper. L. λ. λ. ǫ)) denote the expected bit (block) B. R). this relationship becomes easy in the limit of large blocklengths. [5]. ǫ) = b. . ρ. We recall that the distribution of the number of minimal stopping sets tends to a Poisson distribution with independent components as the length tends to infinity. We also recall that Ω is numerically quite close to 1. Assume that the ensemble has a single critical point y ∗ > 0. ρ. These are stopping set that are not the union of smaller stopping sets. λ. Fix z := √ ∗ 2 n(ǫ − βn− 3 − ǫ). j−1 Discussion: In the lemma we only claim a multiplicative error term of the form o(1) since this is easy to prove. where ǫ∗ is the threshold erasure W probability. L. ǫ) = Q 1 + O(n−1/3 ) . This weak statement would remain valid if we replaced the expression for As given in (16) with the explicit and much easier to compute asymptotic expression derived in [1]. ǫ) =1 − e− B. 1).

Further. ρ. The limit of these curves as ℓ tends to infinity V = limℓ→∞ V (ℓ) is also shown (bold curve): the variance is zero below threshold. s Define further A(x) = s≥0 As x . · · · . ρ. Combining the results in the two previous sections. Variance of the Messages Consider the ensemble LDPC(n. Here we assume that there is a single critical point. Perform (ℓ) ℓ iterations of BP decoding.9 =Q + √ ∗ 2 n(ǫ − βn− 3 − ǫ) α − s≥smin PE min (n. we get PB (n. 1) and ν ∗ = ǫ∗ L(y ∗ ).s (17) Fig. λ. we have studied the probability of erasures resulting from stopping sets of size between smin and nγν ∗ . define A(x) = s≥1 As x .4 0. . V (ℓ) ≡ lim E[( i 12 (ℓ) (ℓ) 9 6 3 0 0. We accomplish this in two steps. ǫ) W = PB (n. consider µi )2 ] − E[( i µi )]2 . The probability that its s associated bits are all erased is equal to ǫs and if this is the case this stopping set causes s erasures.0 0. In Section IV we state the exact form of the limiting curve. the variance is a unimodal function of the channel parameter. only one of the terms in Eqs. for each given channel parameter ǫ.s √ ∗ 2 n(ǫ − βn− 3 − ǫ) ν∗Q (18) α + s≥smin ˜ sAs ǫs .e. λ.5 0. ) then we simply 2 1 take a sum of terms PW (n. λ. Complete Approximation In Section II-B. ǫ) Let us finally notice that summing the probabilities of different error types provides in principle only an upper bound on the overall error probability. λ.. ǫ). with As the expected number of stopping sets of size s in the graph (not necessarily minimal). If the degree distribution has several critical points (at different values of the channel parameter ǫ∗ . we have studied the erasure probability stemming from failures of size bigger than nγν ∗ where γ ∈ (0. 5 This value is indicated as a vertical line in the figure. We then have ˜ ˜ A(x)2 A(x)3 ˜ ˜ A(x) = eA(x) = 1 + A(x) + + + ··· . Then we show in a second step how to relate this variance to the scaling parameter α.1 0.g. ρ) and assume that transmission takes place over a BEC of parameter ǫ.2 0. and the number of iterations ℓ. (17). the expected number of minimal stopping sets of size s in the graph. B . ρ. ǫ∗ . (18) dominates. for increasing ℓ the maximum value of the variance increases. Using the fact that the distribution of minimal stopping sets follows a Poisson distribution we get equation (15). More precisely.6 ˜ s ˜ ˜ Therefore. We show that for ǫ approaching ǫ∗ from above γ V= + O((1 − ǫλ′ (y)ρ′ (¯))−1 ). We first compute the variance of the number of erasure messages. The threshold for this example is ǫ ≈ 0. It remains to determine the number of stopping sets. ǫ) = PW (n.. Consider the variance of these messages in the limit of large blocklengths. Now we are interested in the probability that a particular graph and noise realization results in no (small) stopping set. As we can see from this figure. ǫ) + PE min (n. Let us consider this variance as a function of the parameter ǫ and the number of iterations ℓ. the channel parameter ǫ. Figure 5 shows the result of this 3 3 evaluation for the case (L(x) = 2 x2 + 5 x3 . Consider now e. D. 9 for 3 3 7 (L(x) = 2 x2 + 5 x3 . Pb (n. ρ. λ. ρ. The expression for the block erasure probability is derived in a similar way. ǫ) b b. As remarked right after the statement of the lemma. ¯ Since there are in expectation As minimal stopping sets of size s and minimal stopping sets are non-overlapping with increasing probability as the blocklength increases a simple union bound is asymptotically tight. λ. ρ. λ.6 0. 
These average were already compute in [1] and are given as in (16). 5: The variance as a function of ǫ and ℓ = 0. A NALYTIC D ETERMINATION OF α Let us now show how the scaling parameter α can be determined analytically. ǫ) B. n→∞ nL′ (1) Lemma 3 in Section IV contains an analytic expression for this quantity as a function of the degree distribution pair (λ. ρ. Set µi equal to 1 if the message sent out along edge i from variable to check node is an erasure and 0. with As . ρ). . However. the bit erasure probability. above threshold it is positive and diverges as the threshold is approached. We first ˜ ˜ compute A(x) using (16) and then A(x) by means of A(x) = log (A(x)). As a consequence the bound is actually tight.8495897455. one for each critical point.8 0. III. (19) x ′ (y)ρ′ (¯))2 (1 − ǫλ x +1−e ˜ As ǫ s . R(x) = 10 x2 + 5 7 3 ∗ 10 x ). otherwise.3 0. i. R(x) = 10 x2 + 10 x3 ). It is zero for the extremal values of ǫ (either all messages are known or all are erased) and it takes on a maximum value for a parameter of ǫ which approaches the critical value ǫ∗ as ℓ increases. λ. A. 2! 3! ˜ so that conversely A(x) = log (A(x)). In Section II-C. ν ∗ is the asymptotic fractional number of erasures remaining after the decoding at the threshold.7 0. any expression which converges in the limit of large blocklength to the asymptotic value would satisfy the statement of the lemma but we get the best empirical agreement for short lengths if we use the exact finite-length averages. Consider one minimal stopping set of size s.

around the critical point for an erasure probability above the threshold (see Fig. for large blocklengths. This quantity can be related to the variance of the number of degree-one check nodes through the slope of the density evolution curve. ρ) and consider transmission over the BEC of erasure probability ǫ. let us discuss how this quantity can be related to the scaling parameter α. Then.6). see [3]). with xi = ǫλ(yi ) (where we used the notation x xi = 1 − xi ). The quantity V∗ differs from V computed in the previous paragraphs because of the different order of the limits n → ∞ and ℓ → ∞. Assume that the receiver performs ℓ rounds of Belief Propagation decoding (ℓ) and let µi be equal to one if the message sent at the end of the ℓ-th iteration along edge i (from a variable node to a check i Fig. x) (x − x∗ )∆x + o(x − x∗ ) ∂x2 ∗ . with respect to the choice of the graph and the channel realization. (19) implies x a divergence at ǫ∗ . x = ǫ λ(y ). we initialize the iterative decoder by setting all check-to-variable messages to be erased at time 0. we obtain the normalized variance of R1 at threshold δ r1 . and x = ¯ √ 1−x∗ . The above initialization implies y0 = 1.7 where γ = ǫ λ (y ) ∗2 ′ ∗ ∗2 ∗2 ′ ∗ 2 − x ρ (¯ )] + ǫ∗2 ρ′ (¯∗ )2 [λ(y ∗ )2 − λ(y ∗2 ) − y ∗2 λ′ (y ∗2 )] . Normalize all the quantities by nL′ (1). x The transition between the first and the second line comes the relationship between the ǫ and x. M ESSAGE VARIANCE Consider the ensemble LDPC(n. r1 (x) x∗ ∆x ∆r1 We conclude that the scaling parameter α can be obtained as α= δ r1 r1 |∗ ∂r1 2 ∂ǫ L′ (1) = γ L′ (1)x∗2 λ′ (y ∗ )2 ρ′ (¯∗ )2 x The last expression is equal to the one in Lemma 1. with r1 (ǫ. we finally get δ r1 . it results in a variation of the height of the curve by ∆r1 = ∂ 2 r1 (ǫ. in the infinite blocklength limit. ¯ x x Here y is the unique critical point. it will get stuck with high probability before correcting all nodes. nL′ (1)}. Think of a decoder operating above the threshold of the code. µi IV. At the end of this section. we shall take the limit ℓ → ∞. As pointed out in the previous section. Eq. ρ) and transmission over the BEC of erasure probability ǫ. Although what really matters is the limit of this quantity as the blocklength and the number of iterations tend to infinity (in this order) we start by providing an exact expression for finite number of iterations ℓ (at infinite blocklength). Since (1−ǫλ′ (y)ρ′ (¯)) = Θ( ǫ − ǫ∗ ). λ(x) = x2 . ρ(x) = x5 ). x) = 0 when ǫ tends to ǫ∗ . For ¯ future convenience we also set xi = yi = 1 for i < 0. Label the nL′ (1) edges of G in some fixed order by the elements of {1. of the number of erased edge messages sent from the variable nodes. Relation Between γ and α Now that we know the asymptotic variance of the edges messages. Using these variables. 6: Number of degree-one check nodes as a function of the number of erasure messages in the corresponding BP decoder for LDPC(n = 8192. λ. and representing the fraction of degree-one check nodes in the residual graph. at iteration i. The thin lines represent the decoding trajectories that stop when r1 = 0 and the thick line is the mean curve predicted by density evolution. λ. In other words. the scaling parameter α can be related to the (normalized) variance. Think of a virtual process identical to the decoding for r1 > 0 but that continues below the r1 = 0 axis (for a proper definition. the number of degree-one check nodes. 
the number of erased messages in the decoder after an infinite number of iterations V∗ ≡ lim lim E[( i n→∞ ℓ→∞ µi )2 ] − E[( nL′ (1) (ℓ) i µi )]2 (ℓ) . Let V∗ represent the normalized variance of R1 720 640 560 480 400 320 240 160 80 0 -80 0 2000 4000 6000 8000 10000 [ρ(¯ ) − ρ(¯ ) + ρ (¯ )(1 − 2x ρ(¯ )) x x x x ∗ ∗ ∗ ∗ ∗ 2 ∗2 ′ ∗ ∗ ∗ Taking the expectation of the square on both side and letting ǫ tend to ǫ∗ . · · · . the number of edges in the graph. x) given by density evolution. V∗ is the variance of the point at which the decoding trajectories hit the R1 = 0 axis. We let xi (respectively yi ) be the fraction of erased messages sent from variable to check nodes (from check to variable nodes). To be definite. we have the following characterization of V (ℓ) .r1 |∗ = lim∗ ǫ→ǫ ∂ 2 r1 (ǫ.r1 |∗ = x∗ ∗ λ′ (y ∗ ) ǫ 2 γ. B. A simple calculation shows that if the point at which the curve hits the x-axis varies by ∆x while keeping the minimum at x∗ . In Fig 6 we show R1 . However it can be proved that the order does not matter and V = V∗ . The real decoding process stops when hitting the r1 = 0 axis. as a function of the number of erasure messages for a few decoding runs. x) ∂x2 2 ǫ→ǫ∗ 2 ∗ (x − x∗ )2 V + o((x − x∗ )2 ) = x∗ ǫ∗ λ′ (y ∗ ) lim (1 − ǫλ′ (y)ρ′ (¯))2 V∗ . Using the result (19). the (normalized) variance after ℓ iterations. Consider the curve r1 (ǫ. Lemma 3: Let G be chosen uniformly at random from LDPC(n. These values are determined by the density evolution [1] recursions yi+1 = 1 − ρ(¯i ).

ℓ−j+k} ) x The coefficients Fi are given by ℓ i=1 where we introduced the shorthand j−1 Fi = k=i+1 ǫλ′ (yk )ρ′ (¯k−1 ). ℓ) + N2 (j. 0)V(ℓ) · · · C(0)V(0)(1. 0)V(i) · · · C(i − ℓ)(1. Then V (ℓ) ≡ lim E[( i n→∞ µi )2 ] − E[( nL′ (1) ℓ (ℓ) i µi )]2 (ℓ) (20) Further. i ≥ 0. k) = ρ′ (¯ℓ−k−1 ) 1{j>2k} (ρ′ (¯ℓ−k−1 ) − ρ′ (¯ℓ−j+k )) x x x 0 ρ′ (¯min{ℓ−k−1. yi )) Fi ǫλ (yi ) D(ℓ. . k) = 0 + N2 (j. k − 1)U (j. . . 1) ℓ + i=1 ℓ Fi (xi W (ℓ. 0) · (1. (21) and finally 2ℓ We We define the matrices V(i) = C(i) = ǫλ′ (yi ) 0 λ (1) − ǫλ′ (yi ) λ′ (1) ′ W (ℓ. − + i=1 ℓ−1 i=1 N1 (j. k)[N1 (j. i < 0. initialize by U ⋆ (j. j) V(ℓ) · · · C(2ℓ − j) − ⋆ 0 + j=ℓ+1 U (j. j − ℓ) =(1. α) ℓ−1 . 0) · (λ (1)ρ (1)) ′ ′ ℓ T T ′ ǫλ′ (ymax{ℓ−k. set U ⋆ (j. k) = . 0)V(ℓ) · · · C(2ℓ − j)(1. k) =V(ℓ − j + k)[N1 (j. ℓ) + (1 − yℓ−j )U 0 (j. 1) − ǫW (ℓ. x (28) V(i) · · · C(i − j) ≡ k=0 V(i − k)C(i − k − 1). U 0 (j. k − 1) ⋆ + (1. ℓ) are computed through the following recursion. j) + (1 − yℓ−j )U (j. (24) (25) ρ′ (1) 0 0 ρ′ (1) αλ′ (α) + λ(α) 0 . whereas for j > ℓ. (1 − yℓ−j )ǫλ′ (yℓ ))T . 0)T . U ⋆ (j. 0)V(ℓ) · · · C(2ℓ − j)(0. edges in T4 − xℓ W (ℓ. k)C(ℓ − j + k − 1)U ⋆ (j. 0) =(yℓ−j ǫλ′ (yℓ ). 0) j=0 ℓ−1 2 + xℓ ρ′ (1) V(ℓ) · · · C(ℓ − j) (1. 1)ρ(¯i−1 ) − D(ℓ. 0)T Fi xi xℓ + (1. 0)V(ℓ) · · · C(0)V(0)(1. ℓ). (27) 0 · yℓ−j U ⋆ (j. 0)T edges in T3 The recursion is U ⋆ (j. k − 1) + M2 (j. k − 1)]. k − 1)U (j. α) = k=0 (1. xi−1 x ¯ Fi xℓ + (1. + (1. 0)V(ℓ) · · · C(ℓ − k)A(ℓ. k − 1)U ⋆ (j. U 0 (j. x ρ′ (1) ρ′ (1) − ρ′ (¯i ) 0 ρ′ (¯i ) x λ (1) 0 ′ with A(ℓ. For j ≤ ℓ. k. 0)T ǫλ′ (y2ℓ−j ) 0 ′ ′ ǫ(λ (1) − λ (y2ℓ−j )) . j) and U 0 (j. (22) (23) + xℓ (αλ′ (α) + λ(α))ρ′ (1) i=0 ρ′ (1)i λ′ (1)i . . k≤ℓ k>ℓ . 0) j=1 ℓ V(ℓ) · · · C(ℓ − j) (1. U 0 (j. 0)T edges in T1 λ′ (1)i ρ′ (1)i i=0 2ℓ edges in T2 + xℓ (1.ℓ−j+k} ) x 1{j≤2k} (ρ′ (¯ℓ−j+k ) − ρ′ (¯ℓ−k−1 )) x x . j − ℓ) =(1.8 node) is an erasure. U ⋆ (j. k) =M1 (j. 0) =(0. k − 1)U 0 (j. k − 1)]. 1)T λ′ (1)(1 − ǫ) ǫλ′ (1) . k. k − 1) with M1 (j. 0)V(ℓ) · · · C(2ℓ − j)(0. k) = ρ′ (1) − ρ′ (¯ℓ−k−1 ) x 0 1{j>2k} ǫ(λ′ (yℓ−j+k ) − λ′ (yℓ−k )) 0 λ′ (1) − ǫλ′ (ymin{ℓ−k. α) equal to ǫαyℓ−k λ′ (αyℓ−k ) + ǫλ(αyℓ−k ) αλ (α) + λ(α) − ǫαyℓ−k λ′ (αyℓ−k ) − ǫλ(αyℓ−k ) ′ V(i) = C(i) = 0 λ′ (1) .ℓ−j+k} ) 0 1{j<2k} ǫ(λ′ (yℓ−k ) − λ′ (yℓ−j+k )) ǫλ′ (yℓ−k ) M2 (j. N2 (j. 1)T (1 − ǫ)λ′ (1) (26) =xℓ + xℓ (1. and zero otherwise.ℓ−j+k} ) λ′ (1) − ǫλ′ (yℓ−k ) ρ′ (1) − ρ′ (¯max{ℓ−k−1. 0) j=0 2ℓ yℓ−j U (j. ℓ−1 − . j). .

We consider ℓ variable node levels above the root. We define T to be that subset of the variable-to-check edge (ℓ) indices so that if i ∈ T then the computation trees µi and (ℓ) µ1 intersect. We have depicted in Fig. (ℓ) notice that the channel output on node (A) affects µ1 as well as the message sent on (b) at the ℓ-th iteration. We expand the graph starting from this root node. It contains two sets of edges: the ones ‘above’ and the ones ‘below’ edge 1 (we call this the ‘root’ edge and the variable node it is connected to. this subset remains finite in the large blocklength limit. Fig. Edges below the root are the ones departing from a variable node that can be reached by a non reversing path starting with the opposite of the root edge and involving at most 2ℓ variable nodes (not including the root one). for the case of ℓ = 2. The message µ1 is represented by (a). Λ′ (1)}. (ℓ) (ℓ) c = Define S = limn→∞ E[µ1 i∈T µi ] and S (ℓ) (ℓ) ′ 2 limn→∞ E[µ1 i∈Tc µi ] − nL (1)xℓ . the value received on node (ℓ) (B) affects both µ1 and the message sent on (g) at the ℓ-th iteration. the ‘root’ variable node). according to our definition (b) ∈ T. we complete T by including all edges that are connected to the same variable nodes as edges that are already in T. For any fixed ℓ. n→∞ (ℓ) nL′ (1) (ℓ) = lim E[µ1 n→∞ n→∞ µi ] − E[µ1 ]E[ µi ] − nL′ (1)x2 . This means that T includes all the edges whose messages depend on some of the received values that are used (ℓ) in the computation of µ1 . · · · . and admits a simple characterization. we refer to the subgraph containing all such variable nodes. As usual. Therefore the corresponding computation trees intersect and. the computation tree of (c) does not intersect the one of (a). nL′ (1)} of the set of indices T. (29) into two contributions: the first contribution stems from edges i so (ℓ) (ℓ) that the computation trees of µ1 and µi intersect. 7 an example for the case of an irregular graph with ℓ = 2. each message (ℓ) µi depends upon the received symbols of a subset of the variable nodes. but (c) ∈ T because it shares a variable node with (b). as well as the check nodes (ℓ) connecting them as to the computation tree for µi . 0)V(ℓ) · · · C(ℓ − k + 1)V(ℓ − k + 1) (c) (a) (A) (d) (e) (B) i D(ℓ. On the other hand. we can further identify four different types of terms appearing in S and write S = lim E[µ1 n→∞ (ℓ) i∈T µi ] µi ] + lim E[µ1 i∈T1 (ℓ) µi ] i∈T3 n→∞ (ℓ) (ℓ) i∈T2 (ℓ) (ℓ) = lim E[µ1 n→∞ (ℓ) lim E[µ1 n→∞ µi ]+ µi ] i∈T4 (ℓ) (ℓ) + (ℓ) lim E[µ1 n→∞ . For instance. More precisely. in (20) as µi ]2 (ℓ) µi )2 ] − E[ nL′ (1) (ℓ) (ℓ) . we write n→∞ lim E[µ1 (ℓ) i µi ] − nL′ (1)x2 ℓ (ℓ) (ℓ) = lim E[µ1 n→∞ (ℓ) i∈T µi ] (30) (ℓ) + lim n→∞ E[µ1 (ℓ) i∈Tc µi ] − nL′ (1)x2 ℓ . Edges departing from the root variable node are considered below the root (apart from the root edge itself). The small letters represent messages sent along the edges from a variable node to a check node and (ℓ) the capital letters represent variable nodes. It is useful to split the sum in the first term of Eq. · · · . T is a tree with high probability in the large blocklength limit. i (ℓ) = lim E[µ1 (29) (ℓ) In the last step. Edges above the root are the ones departing from a variable node that can be reached by a non reversing path starting with the root edge and involving at most ℓ variable nodes (not including the root one). We also expand 2ℓ levels below the root. For convenience. As an example. After a finite number of iterations. we have used the fact that xℓ = E[µi ] for any i ∈ {1. 
Let us look more carefully at the first term of (29). Since ℓ is kept finite. i µi ] (ℓ) = lim E[µj (ℓ) i (ℓ) i i µi ] − E[µj ]E[ (ℓ) (f ) (g) . The set of indices T depends on the number of iterations performed and on the graph realization. ℓ (ℓ) µi ]. 1) Computation of S: Having defined T. and by standard arguments is a tree with high probability. In the middle of the figure the edge (ℓ) (ℓ) (a) ≡ 1 carries the message µ1 . We will call µ1 the root message.9 and 2ℓ (b) (1. and the second one stems from the remaining edges. α) = k=1 · αρ′ (α) + ρ(α) − α(¯ℓ−k )ρ′ (α¯ℓ−k ) − ρ(α¯ℓ−k ) x x x α¯ℓ−k ρ′ (α¯ℓ−k ) + ρ(α¯ℓ−k ) x x x ℓ−1 + xℓ (αρ′ (α) + ρ(α)) Proof: Expand V V (ℓ) = lim E[( j i n→∞ i=0 (ℓ) (ℓ) ρ′ (1)i λ′ (1)i . We compute the two terms in (30) separately. Tc is the complement in {1. 7: Graph representing all edges contained in T.

p)T ¯ (p. (ℓ) (ℓ) (ℓ) =x2 ρ′ (1) ℓ ρ (1) λ (1) . 9 for an illustration). and the fact that the where we used ℓ ℓ−1 expected number of edges in T2 is ρ′ (1) i=0 λ′ (1)i ρ′ (1)i . ¯ with C(i) being defined in Eq. pρ′ (1) + p(ρ′ (1) − ρ′ (1 − xi )). T4 contains the edges below the root variable node that point downwards. such as (c) in Fig. 8: Size of T. (ℓ) (ℓ) In this case. 0 (ℓ) (33) (31) xℓ (1. then the expected number of erased outgoing messages is (j−1)(1−(1−xi )j−2 ). Fig. Averaging over the check node degrees gives us ρ′ (1) − ρ′ (1 − xi ). 9. T2 contains all edges above the root but point downwards. (22) and (23). 9: Number of outgoing erased messages as a function of the probability of erasure of the incoming message.10 ℓ µ1 (ℓ) ℓ 2ℓ Fig. If the incoming message is erased. the message sent on edge (b) is of this type. It contains ℓ layers of variable nodes above the root edge and 2ℓ layer of variable nodes below the root variable node. and µi are with high probability disjoint in the large blocklength limit. V(i + 1)C(i)(p. (23). The situation is similar if we consider a variable node instead of the check node with the matrix the matrix V(i) replacing C(i). We are at the i-th iteration of BP decoding and we pick an edge at random in the graph. known) outgoing messages is the first (respectively. like (d) and (f ). by taking the product of the corresponding matrices. then the computation graph of edge i must contain in expectation an unusual low number of high degree check nodes). p)T . This correlation is however of order O(1/n) and since T only contains a finite number of edges the contribution of this correlation vanishes as n → ∞. The subset T1 ⊂ T contains the edges above the root variable node that carry messages that point upwards (we include the root edge in T1 ). The contribution of the edges in T1 to S is obtained by writing n→∞ lim E[µ1 (ℓ) i∈T1 (ℓ) µi ] µi | µ1 = 1]. It contains ℓ layers of variable nodes below the root variable node. p)T ¯ Fig. T3 contains the edges below the root that carry an upward messages. where the 1 is due to the fact that E[µ1 | µ1 = 1] = 1. In Fig. 0)V(ℓ) · · · C(ℓ − j)(1. p)T ¯ (p. We obtain therefore ℓ−1 i=0 (ℓ) lim E[µ1 n→∞ (ℓ) µi ] i∈T2 with the matrices V(i) and C(i) defined in Eqs. Finally. Let us start with the simplest term. second) component of the vector C(i)(p. 7. is the expected number of erased messages in the j-th layer of edges in T1 . like (e) and (g). Analogously. The gray area represent the computation tree of (ℓ) the message µ1 . ′ i ′ i = lim P{µ1 = 1}E[ n→∞ i∈T1 (32) = x2 . The messages are nevertheless correlated: conditioned on the computation graph of the root edge the degree distribution of the computation graph of edge i is biased (assume that the computation graph of the root edge contains an unusual number of high degree check nodes. involving the messages (ℓ) (ℓ) in T2 . cf. If the incoming message is known. 0)T . If i ∈ T2 . The expected number of erased outgoing messages is therefore. and each summand (1. Averaging over the check node degrees gives us ρ′ (1). p)T ¯ (p. 0)V(ℓ) · · · C(ℓ − j) (ℓ) 1 . 0)T . the expected ¯ number of known outgoing messages is pρ′ (xi ). 7. the messages µ1 and µi do not depend on any common channel observation. The result is generalized to several layers of check and variable nodes. This result ¯ can be written using a matrix notation: the expected number of erased (respectively. consider the following case (cf. then the number of erased outgoing messages is exactly (j − 1). 
It is connected to a check node of degree j with probability ρj . For the edges in T1 we obtain n→∞ (ℓ) (ℓ) limn→∞ E[µ1 µi ] The conditional expectation on the right hand side is given by ℓ 1+ j=1 lim E[µ1  (ℓ) i∈T1 ℓ j=1 µi ] = xℓ +  (ℓ) (1. p)T ¯ C(i)(p. In order to understand this expression. conditioned on the fact that the root edge is erased at iteration . p)T ¯ V(i)(p. ¯ We want to compute the expected numbers of erased and known messages sent out by the check node on its other edges (outgoing messages). then the computation trees of µ1 . Fig. Assume further that the message carried by this edge from the variable node to the check node (incoming message) is erased with probability p and known with probability p. 0)  V(ℓ)C(ℓ − 1) · · · V(ℓ − j + 1)C(ℓ − j) (1.

0)V(ℓ) · · · C(2ℓ − j)(0. root edge root edge The first sum on the right hand side corresponds to messages (ℓ) µi . In this case. Now (ℓ) multiplying (33) by P{µ1 = 1} = xℓ gives us (31). k) and U 0 (j. do affect both messages. and multiplied by the expected number of edges in T4 at the same distance from the root. Recall that T4 includes all the edges that are below the root node and point downwards. analogously to the case k = j. the contribution of edge (e). Therefore.11 ℓ (notice that µ1 = 1 implies µ1 = 1 for all i ≤ ℓ). ℓ) . The initial conditions are yℓ−j ǫλ′ (yℓ ) 0 U ⋆ (j. The meaning of this case is simple: if j > ℓ then the (ℓ) observations in these layers do not influence the message µ1 . In order to understand the first sum in (34). we have to evaluate the matrices V(i) and C(i) for negative indices using the definitions given in (24) and (25). cf. edge in T4 edge in T4 Fig. 0) = . From these definitions. again conditioned on the incoming message being erased. consider two messages it carries: the message that is sent from the variable node to the check node at the ℓ-th iteration (this ‘outgoing’ message participates in our second moment calculation). ℓ) + (1 − yℓ−j )U 0 (j. and the one sent from the check node to the variable node at the (ℓ−j)-th iteration (‘incoming’). The superscript ⋆ or 0 accounts respectively for the cases where the incoming message is erased or known. 10: The two situations that arise when computing the contribution of T4 . edges (e) and (g) are elements of T4 . The second term in (34) corresponds to edges i ∈ T4 . k) with k ≤ j. U 0 (j. j − ℓ) =(1. 1)T (1 − ǫ)λ′ (1) . Eqs. 0)T ǫλ′ (y2ℓ−j ) 0 ǫ(λ′ (1) − λ′ (y2ℓ−j )) . consider the root edge and an edge i ∈ T4 separated from the root edge by j + 1 variable node with j ∈ {0. j). the case where they are separated by more than ℓ + 1 variable nodes. We start from the root edge and compute the effect of the received values that only affect this message resulting in a expression similar to the one we used to compute the contribution of T1 . 7. 0)V(ℓ) · · · C(ℓ − j) 1 . 0)V(ℓ) · · · C(2ℓ − j)(1. We still have to evaluate U ⋆ (j. 0) = . For this edge in T4 . not all of the received values along the path connecting the two edges. 0)V(ℓ) · · · C(2ℓ − j)(0. edge (g) is of this type. 0) j=ℓ+1 V(ℓ) · · · C(2ℓ − j) · yℓ−j U ⋆ (j. ℓ}. This gives us the following initial condition U ⋆ (j. is (1. In order to do this. is the one at the (ℓ − k)-th iteration. ℓ}. · · · . The outgoing message we consider is the one at the (ℓ − j + k)-th iteration and the incoming message we condition on. In Fig. Its first component is the joint probability that both the root and the outgoing messages are erased conditioned on the fact that the incoming message is erased. We therefore have to adapt the previous recursion. j − ℓ) =(1. Notice that any received value which is on the path between the root edge and the (ℓ) (ℓ) edge in T4 affects both the messages µ1 and µi on the corresponding edges. The vector U 0 (j. + (1. j) + (1 − yℓ−j )U 0 (j. It is easy to check that U ⋆ (j. j) and U 0 (j. This case is slightly more involved than the previous ones. k) and U 0 (j. 10. · · · . i ∈ T4 whose computation tree contains the root variable node. We claim that n→∞ lim E[µ1 ℓ (ℓ) i∈T4 µi ] (34) (ℓ) =(1. 0 In this sum. j) . In the left side we show the case where the two edges are separated by at most ℓ + 1 variable nodes and in the right side. right picture in Fig. U 0 (j. In Fig. 1)T λ′ (1)(1 − ǫ) ǫλ′ (1) . 
for these steps we only need to count the expected number of edges. The computation is similar for the edges in T3 and results in 2ℓ n→∞ (ℓ) (i) lim E[µ1 (ℓ) i∈T3 µi ] = (ℓ) xℓ j=1 (1. In order to obtain S. · · · . (26) and (27). cf. In the case of Fig. j} is the one given in Lemma 3. j) as follows. This is why this recursion is slightly more involved than the one for T1 . (1 − yℓ−j )ǫλ′ (yℓ ) 0 + (1. Consider now the case of edges in T4 that are separated from the root edge by more than ℓ+1 variable nodes. Its second component is the joint probability that the root message is erased and that the outgoing message is known. Define the two-components vector U ⋆ (j. 7. k). 10. except that this time we consider the root edge and an edge in i ∈ T4 separated from the root edge by k +1 variable nodes. multiplied by the expected number of edges in T4 whose distance from the root is the same as for edge i. 0) j=0 yℓ−j U ⋆ (j. that are separated from the root edge by more than ℓ + 1 variable nodes. 0) yℓ−j U ⋆ (j. The situation is depicted in the left side of Fig. it is clear that the contribution to S of the edges that are in T4 and separated from the root edge by j + 1 variable nodes with j ∈ {0. j) 2ℓ and the recursion is for k ∈ {1. would be counted in this first sum. j) can be computed in a recursive manner using U ⋆ (j. j) is defined in exactly the same manner except that in this case we condition on the incoming message being known. it remains to compute the contribution of the edges in T4 . 7. we define the vectors U ⋆ (j. j) and U 0 (j. when j > ℓ. j) + (1 − yℓ−j )U 0 (j. where ℓ = 2.

To the linear order we get ℓ ℓ δxℓ = i=1 k=i+1 ǫλ′ (yk )ρ′ (¯k−1 ) [ǫδλ(yi ) − ǫλ′ (yi )δρ(¯i−1 )] . On the other hand they are clearly conditionally independent given GT (because. We shall also i Vi (GT )x and C(x. their effect can be evaluated again perturbatively. GT ) = i iCi (GT )xi−1 . the rest of the graph will contain an unusually small number of high degree variable nodes affecting the average (ℓ) (ℓ) E[µj | GT ]. ℓ) + (1 − yℓ−j )U 0 (j. the variable degree distribution changes by δi λ(x) = iλ(x) − ixi−1 + O(1/n2 ). all the messages that are carried by edges in Tc at the ℓ-th iteration are functions of a set of received values (ℓ) distinct from the ones µ1 depends on. GT )) E[µ1 (ℓ) i∈Tc 2 µi ] − nL′ (1)xℓ (ℓ) 1 = ′ Λ (1) ℓ i=1 n→∞ EGT [E[µ1 EGT [|T c (ℓ) i∈Tc µi | (ℓ) 2 | GT ]] − nL′ (1)xℓ −ǫλ′ (yi )(C ′ (1. Define Vi (GT ) (respectively Ci (GT )) to be the number of variable nodes (check nodes) of degree i in GT . The reason is that the messages that flow out of the boundary of GT (both their number and value) depend on GT . The first effect can be characterized by computing the degree distribution on G\GT as a function of GT . It is easy to i iVi (GT )x check that if we take a bipartite graph having a variable degree distributions λ(x) and remove a variable node of degree i. Recall that by definition. for the degree distributions λ(x) = xl−1 and ρ(x) = xr−1 . GT )λ(yi ) − V ′ (yi . This is done by writing 2ℓ (1. GT ) = need the derivatives of these polynomials: V ′ (x. This is indeed the case when the Tanner graph is regular. is that certain messages carried by edges in Tc which are close to GT are affected by messages that flow out of GT . the fraction xℓ of erased variable-to-check messages changes by δxℓ . Consider now an irregular ensemble and let GT be the graph composed by the edges in T and by the variable and check nodes connecting them. 2) Computation of S c : We still need to compute S c = (ℓ) (ℓ) ′ 2 limn→∞ E[µ1 i∈Tc µi ] − nL (1)xℓ . the one with the lowest index). GT ) = i−1 and C ′ (x. x x Fi [ǫ(V ′ (1. We have to stop the recursion at k = ℓ (end of the intersection of the computation trees). It remains to account for the received values that only affect the messages on the edge in T4 . Intuitively. If we let j denote a generic edge in Tc (for instance. GT )ρ(¯i−1 ) − C ′ (¯i−1 . We then have S c = lim n→∞ E[µ1 (ℓ) i∈Tc µi ] − nL′ (1)x2 ℓ (ℓ) = lim (Λ′ (1) − |T|)x2 − Λ′ (1)x2 ℓ ℓ n→∞ 2 = lim |Tc |x2 − Λ′ (1)xℓ ℓ n→∞ We need to compute E[µj | GT ] for a fixed realization of GT and an arbitrary edge j taken from Tc (the expectation does not depend on j ∈ Tc : we can therefore consider it as a random edge as well). ℓ) . and these message affect messages in G\GT . nL′ (1) In the same way. if GT contains an unusually large number of high degree variable nodes. x x (1)x2 ℓ with Fi defined as in Eq. Unlike in the regular case. GT ))] + O(1/n2 ). nL′ (1) (ℓ) = − |T|x2 ℓ with the cardinality of T being |T| = (l − 1) (r − 1) l ℓ + i=1 (l − 1)i−1 (r − 1)i l. The first one is that we are dealing with a fixedsize Tanner graph (although taking later the limit n → ∞) and therefore the degrees of the nodes in GT are correlated with the degrees of nodes in its complement G\GT . (28). which is the second term on the right hand side of Eq.e. Imagine now that we ix the degree distribution of G\GT . µ1 is just a deterministic function of the received symbols in its computation tree). n→∞ (ℓ) (ℓ) (35) . 0) j=ℓ+1 V(ℓ) · · ·C(2ℓ − j) · yℓ−j U ⋆ (j. 
For the second effect, imagine now that we fix the degree distribution of G\GT. Let x̃i be the fraction of erased messages, incoming to check nodes in G\GT from variable nodes in G\GT, at the i-th iteration. Call B the number of edges forming the boundary of GT (edges emanating upwards from the variable nodes that are 2ℓ levels above the root edge and emanating downwards from the variable nodes that are 2ℓ levels below the root variable node), and let B⋆i be the number of erased messages carried at the i-th iteration by these edges. From the definition of GT, at the i-th iteration there are (nL′(1) − |T|) = nL′(1)(1 + O(1/n)) messages incoming to check nodes in G\GT from variable nodes in G\GT, of which a fraction x̃i is erased, plus the B messages incoming from the boundary of GT, of which B⋆i are erasures. Taking into account these messages coming from variable nodes in GT, the overall fraction of erasures entering the check nodes of G\GT becomes x̃i + δx̃i, where

    δx̃i = ( B⋆i − B x̃i ) / (nL′(1)) + O(1/n²).

This expression simply comes from counting, at the i-th iteration, the erased messages among the nL′(1)(1 + O(1/n)) messages that enter the check nodes of G\GT.

Combining the two above effects, we have for an edge j ∈ Tc

    E[µj^(ℓ) | GT] = xℓ + (1/Λ′(1)) Σ_{i=1}^{ℓ} Fi [ ǫ ( V′(1, GT) λ(yi) − V′(yi, GT) ) − ǫ λ′(yi) ( C′(1, GT) ρ(x̄i−1) − C′(x̄i−1, GT) ) ]
                         + (1/Λ′(1)) Σ_{i=1}^{ℓ−1} Fi ( B⋆i − B x̃i ) + O(1/n²),        (35)

with Fi defined as in Eq. (28). We can now use this expression (35) to obtain S^c: substituting it into

    S^c = lim_{n→∞} Λ′(1) ( E_GT[ E[µ1^(ℓ) | GT] E[µj^(ℓ) | GT] ] − xℓ² ) − lim_{n→∞} E_GT[ |T| E[µ1^(ℓ) | GT] E[µj^(ℓ) | GT] ],

i.e., into the two terms on the right-hand side of Eq. (34), expresses S^c in terms of the expectations E[µ1^(ℓ) V′(1, GT)], E[µ1^(ℓ) V′(yi, GT)], E[µ1^(ℓ) C′(1, GT)], E[µ1^(ℓ) C′(x̄i−1, GT)], E[µ1^(ℓ) B] and E[µ1^(ℓ) B⋆i]. It is clear what each of these values represents.
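As a small illustration of the two ingredients entering (35), the sketch below computes propagation weights Fi, here assumed to equal the products Π_{k>i} ǫλ′(yk)ρ′(x̄k−1) that appear in the linear-order expansion of δxℓ (Eq. (28) itself is not part of this excerpt), together with the boundary correction δx̃i = (B⋆i − B x̃i)/(nL′(1)). All numerical inputs are toy values.

eps, ELL = 0.45, 10

def lam(z):  return z ** 2
def dlam(z): return 2 * z
def rho(z):  return z ** 5
def drho(z): return 5 * z ** 4

# Density-evolution trajectory x_0, ..., x_ELL and y_1, ..., y_ELL.
xs = [eps]
for _ in range(ELL):
    xs.append(eps * lam(1 - rho(1 - xs[-1])))
ys = [1 - rho(1 - xs[i - 1]) for i in range(1, ELL + 1)]

def F(i):
    # Assumed propagation weight F_i = prod_{k=i+1}^{ELL} eps*lambda'(y_k)*rho'(xbar_{k-1}).
    prod = 1.0
    for k in range(i + 1, ELL + 1):
        prod *= eps * dlam(ys[k - 1]) * drho(1 - xs[k - 1])
    return prod

def delta_x_tilde(B_star_i, B, x_tilde_i, n_edges):
    # Boundary correction: delta xtilde_i = (B*_i - B * xtilde_i) / (n L'(1)).
    return (B_star_i - B * x_tilde_i) / n_edges

# Toy usage: a graph with 30000 edges, a boundary of 50 edges, 22 of them erased.
print(F(3), delta_x_tilde(B_star_i=22, B=50, x_tilde_i=xs[3], n_edges=30000))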

Each of these terms can be computed through recursions that are similar in spirit to the ones used to compute S (and to the ones provided in the body of Lemma 3); we omit the details here for the sake of space and just explain in further detail how the terms E[µ1^(ℓ) B] and E[µ1^(ℓ) B⋆i] are computed. For instance, E[µ1^(ℓ) V′(1, GT)] is the expectation of µ1^(ℓ) times the number of edges that are in GT.

Consider first E[µ1^(ℓ) B]. We claim that

    E[µ1^(ℓ) B] = xℓ + (1, 0) V(ℓ) ··· C(0) V(0) (1, 0)^T (λ′(1)ρ′(1))^ℓ.

We start by computing the expectation of µ1^(ℓ) multiplied by the number of edges in the boundary of its computation tree; the reason is that µ1^(ℓ) depends only on the realization of its computation tree and not on the whole of GT. This gives xℓ + (1, 0) V(ℓ) ··· C(0) V(0) (1, 0)^T (the term xℓ accounts for the root edge, which is itself a boundary edge, and the other one for the lower boundary of the computation tree). The boundary of GT is on average (λ′(1)ρ′(1))^ℓ times larger than the boundary of the computation tree; multiplying by (λ′(1)ρ′(1))^ℓ we obtain the above expression.

The calculation of E[µ1^(ℓ) B⋆i] is similar: the expectation of µ1^(ℓ) times the number of erased messages carried, at the i-th iteration, by the edges in the boundary of its computation tree is computed analogously to what has been done for the contribution of S. This quantity then has to be multiplied by (1, 0) V(i) ··· C(i−ℓ) (1, 0)^T to account for what happens between the boundary of the computation tree and the boundary of GT, and we therefore obtain

    E[µ1^(ℓ) B⋆i] = xℓ + (1, 0) V(i) C(i−1) ··· V(i−ℓ+1) C(i−ℓ) (1, 0)^T.
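All of the quantities entering S and S^c are averages over the graph ensemble and over the channel. They can be cross-checked by direct simulation: the Python sketch below builds a random (l, r)-regular bipartite multigraph, runs belief propagation over the BEC, and estimates the normalized variance of the number of erased variable-to-check messages by Monte Carlo. The (3, 6) ensemble, the erasure probability and all other parameters are arbitrary example choices, and the simple stub-matching construction may create multi-edges.

import numpy as np

rng = np.random.default_rng(0)

def regular_graph(n, l, r):
    # Random (l, r)-regular bipartite multigraph via a random matching of edge stubs.
    assert (n * l) % r == 0
    var_of = np.repeat(np.arange(n), l)            # variable endpoint of each edge
    chk_of = np.repeat(np.arange(n * l // r), r)   # check endpoint of each edge
    rng.shuffle(chk_of)
    return var_of, chk_of

def erased_messages(n, l, r, eps, ell):
    # Number of erased variable-to-check messages after ell BP iterations.
    var_of, chk_of = regular_graph(n, l, r)
    m = n * l // r
    received_erased = rng.random(n) < eps            # channel erasures
    m_vc = received_erased[var_of].astype(float)     # iteration-0 messages
    for _ in range(ell):
        # check-to-variable: erased iff some *other* incoming v->c message is erased
        cnt = np.bincount(chk_of, weights=m_vc, minlength=m)
        m_cv = ((cnt[chk_of] - m_vc) >= 0.5).astype(float)
        # variable-to-check: erased iff the received value is erased and all
        # *other* incoming c->v messages are erased
        known = np.bincount(var_of, weights=1.0 - m_cv, minlength=n)
        others_known = known[var_of] - (1.0 - m_cv)
        m_vc = (received_erased[var_of] & (others_known < 0.5)).astype(float)
    return m_vc.sum()

n, l, r, eps, ell, trials = 2000, 3, 6, 0.4, 10, 200
counts = np.array([erased_messages(n, l, r, eps, ell) for _ in range(trials)])
print("normalized variance estimate:", counts.var(ddof=1) / (n * l))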
It remains to determine the asymptotic behavior of this quantity as the number of iterations converges to infinity.

Lemma 4: Let G be chosen uniformly at random from LDPC(n, λ, ρ) and consider transmission over the BEC of erasure probability ǫ. Label the nL′(1) edges of G in some fixed order by the elements of {1, ..., nL′(1)}. Set µi^(ℓ) equal to one if the message along edge i from variable to check node, after ℓ iterations, is an erasure, and equal to zero otherwise. Then

    lim_{ℓ→∞} lim_{n→∞} ( E[(Σ_i µi^(ℓ))²] − E[Σ_i µi^(ℓ)]² ) / (nL′(1))
        = ǫ²λ′(y)² ( ρ(x̄)² − ρ(x̄²) + ρ′(x̄)(1 − 2xρ(x̄)) − x̄²ρ′(x̄²) ) / (1 − ǫλ′(y)ρ′(x̄))²
        + ǫ²λ′(y)²ρ′(x̄)² ( ǫ²λ(y)² − ǫ²λ(y²) − y²ǫ²λ′(y²) ) / (1 − ǫλ′(y)ρ′(x̄))²
        + ( (x − ǫ²λ(y²) − y²ǫ²λ′(y²))(1 + ǫλ′(y)ρ′(x̄)) + ǫy²λ′(y) ) / (1 − ǫλ′(y)ρ′(x̄)),

where x = lim_{ℓ→∞} xℓ is the fixed point of density evolution at erasure probability ǫ, x̄ = 1 − x, and y = 1 − ρ(x̄).

The proof is a (particularly tedious) calculus exercise, and we omit it here for the sake of space. The expression provided in the above lemma has been used to plot V^(ℓ) for ǫ ∈ (0, 1) and for several values of ℓ in the case of an irregular ensemble.
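The following sketch evaluates the limiting expression of Lemma 4, in the form reconstructed above, at the density-evolution fixed point of an example ensemble. The (3,6)-regular ensemble and ǫ = 0.5 are arbitrary choices, and the numerical value is only as reliable as the reconstruction of the formula.

# Illustrative evaluation of the limiting normalized message variance.
def lam(z):  return z ** 2          # lambda(z) = z^2
def dlam(z): return 2 * z
def rho(z):  return z ** 5          # rho(z) = z^5
def drho(z): return 5 * z ** 4

def de_fixed_point(eps, iters=5000):
    # Iterate x <- eps * lambda(1 - rho(1 - x)) to (numerical) convergence.
    x = eps
    for _ in range(iters):
        x = eps * lam(1 - rho(1 - x))
    return x

def limiting_variance(eps):
    x = de_fixed_point(eps)
    xb = 1 - x                        # x-bar
    y = 1 - rho(xb)
    denom = 1 - eps * dlam(y) * drho(xb)
    t1 = eps**2 * dlam(y)**2 * (rho(xb)**2 - rho(xb**2)
                                + drho(xb) * (1 - 2 * x * rho(xb))
                                - xb**2 * drho(xb**2)) / denom**2
    t2 = eps**2 * dlam(y)**2 * drho(xb)**2 * (eps**2 * lam(y)**2
                                              - eps**2 * lam(y**2)
                                              - y**2 * eps**2 * dlam(y**2)) / denom**2
    t3 = ((x - eps**2 * lam(y**2) - y**2 * eps**2 * dlam(y**2))
          * (1 + eps * dlam(y) * drho(xb)) + eps * y**2 * dlam(y)) / denom
    return t1 + t2 + t3

print(limiting_variance(0.5))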
