Error Correction and LDPC decoding

1
Error Correction in Communication Systems
2
 Error: Given the original frame k and the
received frame k’, how many corresponding bits
differ?
 Hamming distance (Hamming, 1950).
 Example:
 Transmitted frame: 1110011
 Received frame: 1011001
 Number of errors:

noise
Transmitter
Binary
information
Corrected
information
frame
Corrupted
frame
channel
Receiver
3
3
Error Detection and Correction
 Add extra information to the original data being
transmitted.
 Frame = k data bits + m bits for error control: n = k + m.
 Error detection: enough info to detect error.
 Need retransmissions.
 Error correction: enough info to detect and
correct error.
 Forward error correction (FEC).
Error Correction in Communication Systems
Encoder
(Adding
Redundancy)
Channel
Decoder
(Error Detection
and Correction
Noise
Binary
information
Corrected
information
Encoded
information
Corrupted
information
with noise
Reed Solomon
codes
Hamming
codes
LDPC
Introduced
Convolutional
codes
BCH codes
Renewed interest
in LDPC
Turbo
codes
1970
1960
1990 2000 1980
Practical
implementation
of codes
LDPC beats
Turbo and
convolutional
codes
Modulation

 The information format is changed *******
 Binary Phase Shift-Keying (BPSK) Modulation
 1-2X

5
signal power is two times noise power
Key terms
 Encoder : adds redundant bits to the sender's bit stream to
create a codeword.
 Decoder: uses the redundant bits to detect and/or correct as
many bit errors as the particular error-control code will allow.
 Communication Channel: the part of the communication
system that introduces errors.
 Ex: radio, twisted wire pair, coaxial cable, fiber optic cable, magnetic
tape, optical discs, or any other noisy medium
 Additive white Gaussian noise (AWGN)


 Larger noise makes the
distribution wider
6
Important metrics
 Bit error rate (BER): The probability of bit error.
 We want to keep this number small
 Ex: BER=10
-4
means if we have transmitted10,000 bits, there is 1 bit
error.
 BER is a useful indicator of system performance independent of error
channel
 BER=Number of error bits/ total number of transmitted bits
 Signal to noise ratio (SNR): quantifies how much a signal has
been corrupted by noise.
 defined as the ratio of signal power to the noise power corrupting the
signal. A ratio higher than 1:1 indicates more signal than noise
 often expressed using the logarithmic decibel scale:



 Important number: 3dB means
7
signal power is two times noise power
Error Correction in Communication Systems
 Goal: Attain lower BER at smaller SNR
 Error correction is a key component
in communication and storage
applications.
 Coding example: Convolutional,
Turbo, and Reed-Solomon codes
 What can 3 dB of coding gain buy?
 A satellite can send data with half the
required transmit power
 A cellphone can operate reliably with
half the required receive power
Signal to Noise Ratio (dB)
Figure courtesy of B. Nikolic, 2003
(modified)
B
i
t


E
r
r
o
r

P
r
o
b
a
b
i
l
i
t
y

10
0
10
-1
10
-2
10
-3
10
-4
0 1 2 3 4 5 6 7 8
3 dB

Convolutional
code
Uncoded system
noise
8
Information
k-bit
channel
Codeword
n-bit
Receivedword
n-bit
Decoder
(check parity,
detect error)
Encoder
(add parity)
Corrected
Information
k-bit
LDPC Codes and Their Applications
 Low Density Parity Check (LDPC) codes have superior
error performance
 4 dB coding gain over convolutional codes

 Standards and applications
 10 Gigabit Ethernet (10GBASE-T)
 Digital Video Broadcasting
(DVB-S2, DVB-T2, DVB-C2)
 Next-Gen Wired Home
Networking (G.hn)
 WiMAX (802.16e)
 WiFi (802.11n)
 Hard disks
 Deep-space satellite missions

Signal to Noise Ratio (dB)
B
i
t


E
r
r
o
r

P
r
o
b
a
b
i
l
i
t
y

10
0
10
-1
10
-2
10
-3
10
-4
0 1 2 3 4 5 6 7 8
4 dB

Conv. code
Uncoded
Figure courtesy of B. Nikolic, 2003
(modified)
9
Future Wireless Devices Requirements
 Increased throughput
 1Gbps for next generation of WiMAX
(802.16m) and LTE (Advanced LTE)
 2.2 Gbps WirelessHD UWB [Ref]
 Power budget likely
 Current smart phones require 100 GOPS
within 1 Watt [Ref]
 Required reconfigurability for
different environments
 A rate-compatible LDPC code is proposed
for 802.16m [Ref]
 Required reconfigurability for
different communication standards
 Ex: LTE/WiMax dual-mode cellphones
require Turbo codes (used in LTE) and
LDPC codes (used in WiMAX)
 Requires hardware sharing for silicon area
saving

Future Digital TV Broadcasting Requirements
 High definition television for
stationery and mobile users
 DTMB/DMB-TH (terrestrial/mobile),
ABS-S (satellite), CMMB
(multimedia/mobile)
 Current Digital TV (DTV)
standards are not well-suited for
mobile devices
 Require more sophisticated signal
processing and correction algorithms
 Require Low power
 Require Low error floor
 Remove multi-level coding
 Recently proposed ABS-S
LDPC codes (15,360-bit
code length, 11 code rates)
achieves FER < 10
-7

without concatenation [Ref]

Future Storage Devices Requirements
 Ultra high-density storage
 2 Terabit per square inch [ref]
 Worsening InterSymbol Interference (ISI) [ref]
 High throughput
 Larger than 5 Gbps [ref]
 Low error floor
 Lower than 10
-15
[ref]
 Remove multi-level coding
 Next generation of Hitachi IDRC read-channel technology [9]

Encoding Picture Example
H.V
i
T
=0
1 0 1 1 1 0 1 1 0 0 0 0 … 1 1 1 1 0 0 0 1 1 1 1 1
0 1 0 0 1 0 0 0 1 0 1 1 … 0 1 1 0 0 1 0 0 0 1 1 0
0 1 0 1 0 0 0 1 1 1 1 1 … 1 0 0 1 0 0 1 1 1 0 0 1
0 1 0 0 0 1 1 0 0 1 0 0 … 0 1 0 0 1 1 1 0 0 1 0 0
… ...
V =
1 0 0 0 0 0 0 0 0 0 . . .1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 0 0 . . .1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 0 . . .0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 0
0 0 0 1 0 0 0 0 0 0 . . .0 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1
… ...
H=
Parity Image
V=

Binary multiplication called syndrome check
Decoding Picture Example
20 40 60 80 100 120 140 160 180 200
50
100
150
200
250
20 40 60 80 100 120 140 160 180 200
50
100
150
200
250
50 100 150 200
50
100
150
200
250
20 40 60 80 100 120 140 160 180 200
50
100
150
200
250
Iterative message passing decoding
Receiver noise
Iteration 1
Transmitter
Iteration 5 Iteration 15
Iteration 16
channel
Ethernet cable,
Wireless,
or Hard disk
LDPC Codes—Parity Check Matrix
 Defined by a large binary matrix, called a parity check matrix or H matrix
 Each row is defined by a parity equation
 The number of columns is the code length

 Example: 6x 12 H matrix for a12-bit LDPC code
 No. of columns=12 (i.e. Receivedword (V) = 12 bit)
 No. of rows= 6
 No. of ones per row=3 (row weight)
 No. of ones per col= 2 (column weight)


15
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 0 1 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 1 0 1 0 0
H =
C1
V3 V4 V8 V1 V2 V5 V6 V7 V9
C2
C3
C4
C5
C6
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
V11 V10 V12
0
LDPC Codes—Tanner Graph
 Interconnect representation of H matrix
 Two sets of nodes: Check nodes and Variable nodes
 Each row of the matrix is represented by a Check node
 Each column of matrix is represented by a Variable node
 A message passing method is used between nodes to correct errors
(1) Initialization with Receivedword
(2) Messages passing until correct
Example:
V3 to C1, V5 to C1,
V8 to C1, V10 to C1
C2 to V1, C5 to V1
16
C 1 C 2 C 3 C 4 C 5 C 6
V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9 V 10 V 11 V 12
Check nodes
Variable nodes
Receivedword from channel
C 1 C 2 C 3 C 4 C 5 C 6
V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9 V 10 V 11 V 12
Check nodes
Variable nodes
C 1 C 2 C 3 C 4 C 5 C 6
V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9 V 10 V 11 V 12
Check nodes
Variable nodes
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 0 1 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 1 0 1 0 0
H =
C1
V3
V4
V8
V1
V2 V5
V6
V7
V9
C2
C3
C4
C5
C6
0 0
1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
V11 V10
V12
0
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
Transmission scenario


17
Message Passing: Variable node processing
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
λ is the original received information from the channel
18
α: message from check to
variable node
β: message from variable
to check node
j ij
h j
j
ij
Z ì o + =
¿
=
'
1 , '
'
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
ij j ij
Z o | ÷ =
Message Passing: Check node processing (MinSum)
( ) Sfactor sign
ij
j j h j
j j h j
ij MS ij
ij
ij
×
|
|
.
|

\
|
×
|
|
.
|

\
|
=
= =
= =
[ '
' , 1 , '
' , 1 , '
'
min
'
'
| | o
19
Sign
Magnitude
After check node
processing, the next
iteration starts with
another variable
node processing
(begins a new
iteration)
C 1 C 2 C 3 C 4 C 5 C 6
V 1 V 2 V 3
V
4 V 6 V 7 V 8 V 9 V 10 V 11 V 12 V6
Check nodes
Variable nodes
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
Code Estimation
 Based on your modulation
scheme (here BPSK)
estimate the transmitted
bits
20
^
V
^
V
Z
Syndrome Check
 Compute syndrome


 Ex:
21
H.V
i
T
=0 (Binary multiplication)
 If syndrome =0, terminate
decoding
 Else, continue another iteration

^
Example
Encoded information V= [1 0 1 0 1 0 1 0 1 1 1 1]
22
BPSK modulated= [-1 1 -1 1 -1 1 -1 1 -1 -1 -1 -1]

λ (Received data from channel)=
[ -9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
Estimated code=
V= 1 0 1 0 1 0 0 0 1 1 1 1

^
Information
k-bit
channel
Codeword (V)
n-bit
Receivedword(λ)
n-bit
Decoder
(iterative
MinSum)
Encoder
Corrected
Information
n-bit
BPSK
modulation
Ex: Variable node processing (iteration 1)
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
23
0
-9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8
=
12
| =
15
|
0
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
=
1
Z
λ=
Ex: Check node processing (Iteration 1)
24
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
1 ) 1 )( 1 )( 1 ( ) (
} 5 . 2 , 6 . 1 , 6 . 3 { | |
13
13
÷ = ÷ =
=
o
o
Sign
Min
=-9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8
=
=
) (
| |
14
14
o
o
Sign =
=
) (
| |
18
18
o
o
Sign =
=
) (
| |
110
110
o
o
Sign
|
1 = Sfactor
 Here assume
C 1 C 2 C 3 C 4 C 5 C 6
V 1 V 2 V 3
V
4 V 6 V 7 V 8 V 9 V 10 V 11 V 12 V6
Ex: Code Estimation (Iteration 1)
25
^
V
^
V
Z
= 1 0 1 0 1 0 0 0 1 1 1 1

Z=λ =
[ -9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8]
^
V
Ex: Syndrome Check (iteration 1)
 Compute syndrome


H.V
i
T
=0 (Binary multiplication)

^
26
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
¿
=
=
= e
i
i
h h j
j i
Syndrome Syndrom Sum
v XOR Syndrome
ij ij
0 |
) (

Sumsyndrome=2 Not ZERO => Error, continue decoding

1
0
1
0
1
0
0
0
1
1
1
1

=
x
0
0
1
1
1
0
0
=
Second iteration
 In variable node processing, compute β, α and Z based on the algorithm

27
Z= [-12.1 7.1 -4.5 7.7 -7.2 4.4 -4.2 7.2 -10.0 -7.7 -8.9 -8.1]

[ 1 0 1 0 1 0 1 0 1 1 1 1 ]

^
V=
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
-1.4
-1.6
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
λ= -9.1 4.9 -3.2 3.6 -1.4 3.1 0.3 1.6 -6.1 -2.5 -7.8 -6.8
λ=
Ex: Syndrome Check (iteration 2)
 Compute syndrome


H.V
i
T
=0 (Binary multiplication)

^
28
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
¿
=
=
= e
i
i
h h j
j i
Syndrome Syndrom Sum
v XOR Syndrome
ij ij
0 |
) (

Sumsyndrome= ZERO => corrected code Terminate Decoding

1
0
1
0
1
0
1
0
1
1
1
1

=
x
0
0
0
0
0
0
0
=
Full-Parallel Decoding
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Check nodes
 Every check node and variable
node is mapped to a processor
 All processors directly connected
based on the Tanner graph
 Very High throughput
 No large memory storage elements
(e.g. SRAMs)
 High routing congestion
 Large delay, area, and power caused
by long global wires

29
Chk
1
Chk
2
Chk
5
Var
1
Var
2
Var
3
Var
12
Chk
1
Chk
2
Chk
5
Var
1
Var
2
Var
3
Var
12
init: all α = 0
Chk
1
Chk
2
Chk
6
Var
1
Var
2
Var
3
Var
12
λ from channel
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Check nodes
Var
1
Var
2
Var
3
Var
12
Chk
1
Chk
2
Chk
6
Var
1
Var
2
Var
3
Var
12
Chk
1
Chk
2
Chk
6
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Check nodes
Var
1
Var
2
Var
3
Var
12
Chk
1
Chk
2
Chk
6
Variable nodes
Full-Parallel LDPC Decoder Examples
For all data in the plot:
 Same automatic place & route flow is used
 CPU: Quad Core, Intel Xeon 3.0GHz
 Ex 1: 1024-bit decoder, [JSSC 2002]
 52.5 mm
2
, 50% logic utilization, 160 nm CMOS
 Ex 2: 2048 bit decoder, [ISCAS 2009]
 18.2 mm
2
, 25% logic utilization, 30 MHz,
65 nm CMOS
 CPU time for place & route>10 days

10
5
10
6
10
7
0
100
200
300
Number of wire connections
C
P
U

t
i
m
e

(
h
o
u
r
s
)
512 Chk &
1024 Var
Proc.
384 Chk
& 2048
Var Proc.
Serial Decoder Example
C1 C2 C3 C4 C5 C6
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Check nodes
Variable nodes
Mem
Row Col
Mem
Row Col
Mem
Row Col
(2) compute V1
and store
V2 V3
V4 V5 V6
V7 V8 V9
V10 V11 V12
(1) initialize memory
(clear contents)
(3) …now
compute C1
and store
C2 C3
C4 C5 C6
Decoding Architectures
 Partial parallel decoders
 Multiple processing units
and shared memories
 Throughput: 100 Mbps-Gbps
 Requires Large memory
(depending on the size)
 Requires Efficient Control and
scheduling


Var Var Var Var
Mem Mem Mem Mem
Mem Mem Mem Mem
Mem Mem Mem Mem
Chk
Chk
=
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
1 0 0 0 1 1 0 0
H
0 0 1
0 1 0
0 1 0
1 0 0
0 0 1
1 0 0
0
Reported LDPC Decoder ASICs
2000 2002 2004 2006 2008 2010
10
1
10
2
10
3
10
4
10
5
Year
T
h
r
o
u
g
h
p
u
t

(
M
b
p
s
)
Partial-parallel Decoder
Full-parallel Decoder
10GBASE-T
802.16e
DVB-S2
802.11n
802.11a/g
Throughput Across Fabrication Technologies
 Existing ASIC implementations without early termination
 Full-parallel decoders have the highest throughput
65 90 130 160 180
10
1
10
2
10
3
10
4
CMOS Technology (nm)
T
h
r
o
u
g
h
p
u
t

(
M
b
p
s
)
Partial-parallel Decoder
Full-parallel Decoder
Energy per Decoded Bit in Different Technologies
 Existing ASIC implementations without early termination
 Full-parallel decoders have the lowest energy dissipation
65 90 130 160 180
10
-1
10
0
10
1
10
2
CMOS Technology (nm)
E
n
e
r
g
y

p
e
r

b
i
t

(
n
J
/
b
i
t
)
Partial-parallel Decoder
Full-parallel Decoder
Circuit Area in Different Technologies
 Full-parallel decoders have the largest area due to the high
routing congestion and low logic utilization
65 90 130 160 180
10
0
10
1
10
2
10
3
CMOS Technology (nm)
A
r
e
a

o
f

D
e
c
o
d
e
r

C
h
i
p

(
m
m
2
)
Partial-parallel Decoder
Full-parallel Decoder
Key optimization factors
 Architectural optimization
 Parallelism
 Memory
 Data path wordwidth (fixedpoint format)
37
Check
Node
+ + +
_
β
β
i α
λ
α
i
λ
α
i
i=1
λ +
Variable Node
clk
W
c
α
i
i=1
W
c
α
1
...α
W
c
Architectural optimization

38
BER performance versus quantization format

39
SNR(dB)
Check Node Processor
 Wr/inputs
 Log2(Wr/Spn) comp
stages
 Split-Row Threshold
 The same benefits as
Split-Row
 Added two comparators
and a few logic gates

Min1
Min2
β
1
β
Wr/Spn
| β
Wr/Spn
|
IndexMin1
β
2
| α
Wr/Spn
|
| α
1
|
β
n–1
β
n
β
Wr/Spn – 1
| β
2
|
| β
n–1
|
| β
n
|

Wr/Spn – 1
|
L = log2(Wr)
Comp
Comp
Comp
Comp
Comp
Comp
Sign (β
1
)
Sign (α
Wr/Spn
)
Sign (β
Wr/Spn
)
Sign (α
1
)
Sign Logic
Mag Logic

41
Variable Node Processor
 Based on the variable
update equation
 The same as the
original MinSum and
SPA algorithms
 Variable node
hardware complexity
is mainly reduced via
wordwidth reduction
j ij
h j
ij
ij
ì o | + =
¿
=
'
1 , '
'
+
+
+
3
λ
i

λ
i
+ αj
SM
to 2's
2's to
SM
2's to
SM
β
wc
SM
to 2's
j=1:wc
β
1
α
1

α
wc

+
+
+
+
+
+
-
-
SAT
SAT
8
6
5
8
7
7
7
seven 5-bit
inputs
Partial parallel decoder example

43
802.11ad LDPC code

44