
500Mb/s Soft Output Viterbi Decoder

Engling Yeo
Stephanie Augsburger, Wm. Rhett Davis, Borivoje Nikolić
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, CA 94720, USA

Engling Yeo University of California, Berkeley 1


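Turbo Encoder/Decoder

[Figure: serially concatenated encoder and iterative decoder.
 Encoder side: u1 → (11,13) encoder → x1 → π → u2 → EPR4 channel encoder → x2, with additive noise giving the channel output y.
 Decoder side: y → EPR4 decoder → û2 → π⁻¹ → x̂1 → (11,13) decoder → û1, with a feedback branch that passes refined estimates back through π to the EPR4 decoder.]

Example high-throughput application: Magnetic Disk-Drive Read Channel
EPR4: Enhanced Partial Response Class 4
π: Interleavers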
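The EPR4 target is conventionally written as the polynomial 1 + D − D² − D³, so each noiseless output depends on the current input and the three previous ones, which is where an 8-state trellis comes from. The following is a minimal sketch, assuming 0/1 input symbols and an all-zero initial state (both are illustration choices, not taken from the slides); `epr4_output` is a hypothetical helper name.

```python
def epr4_output(bits):
    """Noiseless EPR4 response y[k] = a[k] + a[k-1] - a[k-2] - a[k-3]."""
    taps = (1, 1, -1, -1)          # coefficients of 1 + D - D^2 - D^3
    history = [0, 0, 0]            # assumed all-zero initial channel state
    out = []
    for a in bits:
        window = [a] + history     # a[k], a[k-1], a[k-2], a[k-3]
        out.append(sum(t * x for t, x in zip(taps, window)))
        history = window[:3]       # shift the three-symbol channel memory
    return out


print(epr4_output([1, 0, 0, 0, 0]))  # impulse response of the channel
```

The three-symbol memory in `history` corresponds to the eight trellis states 000 to 111 shown on the later slides.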
Outline

• Soft Output Viterbi Algorithm (SOVA)
• Decoder Architecture
• Add-Compare-Select Structures
• Survivor Memory Traceback
• Design Flow and Testing


Viterbi Algorithm

[Figure: 8-state trellis (states 000 to 111) with the most-likely path α and the most-likely node m_k; time axis marked k−M, k, n, with the L-step VA window.]

• Trellis representation of convolutional code
• Selection between a pair of competing paths at each node
• Decision traceback
• Most-likely node, m_k
• The series of ML nodes forms the most-likely path, α

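The steps listed above (competing-path selection at each node, then decision traceback from the most-likely node) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the 8-state EPR4 trellis with 0/1 inputs, a squared-error branch metric, a known all-zero start state, and traceback over the whole block rather than the L-step window of the slides. The state encoding and the names `epr4_expected` and `viterbi_epr4` are hypothetical.

```python
def epr4_expected(a, state):
    """Noiseless EPR4 output for input a from state (a[k-1], a[k-2], a[k-3])."""
    a1, a2, a3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    return a + a1 - a2 - a3


def viterbi_epr4(received):
    """Hard-decision Viterbi detection with full-block traceback."""
    inf = float("inf")
    metrics = [0.0] + [inf] * 7          # assumed start in state 000
    history = []                         # one list of 8 decisions per step
    for y in received:
        new_metrics, step = [], []
        for nxt in range(8):
            a = nxt >> 2                 # input bit that leads into nxt
            best, choice = inf, 0
            for b in (0, 1):             # two competing predecessors
                prev = ((nxt & 3) << 1) | b
                cand = metrics[prev] + (y - epr4_expected(a, prev)) ** 2
                if cand < best:          # compare, then select the survivor
                    best, choice = cand, b
            new_metrics.append(best)
            step.append(choice)
        metrics = new_metrics
        history.append(step)
    state = min(range(8), key=lambda s: metrics[s])   # most-likely end node
    bits = []
    for step in reversed(history):       # decision traceback
        bits.append(state >> 2)
        state = ((state & 3) << 1) | step[state]
    return bits[::-1]


sent = [1, 0, 1, 1, 0, 0, 1, 0]
state, noiseless = 0, []
for a in sent:
    noiseless.append(epr4_expected(a, state))
    state = (a << 2) | (state >> 1)
print(viterbi_epr4(noiseless) == sent)
```

Storing one decision per state per step is what makes the traceback possible; the hardware on the later slides replaces this stored history with a register-exchange array.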
Soft Output Viterbi Algorithm (SOVA)

[Figure: 8-state trellis (states 000 to 111) with the most-likely path α, the competing path β, and the most-likely node m_k; time axis marked k−M, k, n, with the M-step SOVA window followed by the L-step VA window.]

• Next most-likely path (β)
• Assumes α and β share at least one common node
• The two ML paths provide complementary bit decisions
• Branching occurs at the node with the minimum difference in path metric
• Soft output = difference in path metric of α and β

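As a toy illustration of the last bullet, the sketch below compares just two candidate paths: the lower-metric one plays the role of α, the other plays β, and every bit on which they disagree receives the metric difference as its reliability. The squared-error metric, the use of infinity for bits with no competing decision, and the function names are assumptions made for this example, not details from the slides.

```python
def path_metric(received, expected):
    """Squared-Euclidean path metric (smaller is more likely)."""
    return sum((y - e) ** 2 for y, e in zip(received, expected))


def two_path_soft_output(received, path_1, path_2):
    """Each path is (bits, expected_channel_outputs).

    Returns (bit, reliability) pairs along the more likely path.
    """
    ranked = sorted((path_1, path_2), key=lambda p: path_metric(received, p[1]))
    (alpha_bits, alpha_out), (beta_bits, beta_out) = ranked
    delta = path_metric(received, beta_out) - path_metric(received, alpha_out)
    # Only complementary decisions are challenged by the competing path.
    return [(a, delta if a != b else float("inf"))
            for a, b in zip(alpha_bits, beta_bits)]


received = [0.9, 0.1, 0.8]
path_a = ([1, 0, 1], [1, 0, 1])
path_b = ([1, 1, 1], [1, 1, 1])
print(two_path_soft_output(received, path_a, path_b))
```

In a full decoder there is one competing path per node of α, and the reliability of a bit is the smallest such difference among the competitors that disagree on it.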
SOVA Decoder Architecture

[Figure: decoder block diagram, built up over three slides.
 • Viterbi decoder: Channel Input → Branch Metric Gen. → 8× Compare-Select-Add (CSA) → 8 × CSA decisions → Survivor Memory Unit (L-step SMU) → most-likely state (3 bits)
 • Memories: 8 × L-step FIFO (path metric differences) and 8 × L-step FIFO (CSA decisions)
 • Soft output evaluation: the Path Equivalence Detector (M-step PED) passes EQ_{î,1}, EQ_{î,2}, …, EQ_{î,M} to the Reliability Measure Unit (M-step RMU), which also receives Δ_î and produces the decoded soft output]

Micro Architecture: Compare Select Add

[Figure: SOVA decoder block diagram, repeated from the SOVA Decoder Architecture slide.]

Add-Compare-Select Structures

[Figure: ACS unit for state 0. Add: sm0(n) + bm00(n) and sm1(n) + bm10(n). Compare: the two sums are subtracted. Select: the chosen sum becomes sm0(n+1).]

• Traditional throughput bottleneck
• Previous implementations: radix-4 'loop unrolling'
  – 2.7× area increase (overall decoder) for 40% speedup
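Written out, the add, compare and select stages of the figure form the recursion below. This assumes that smaller metrics are better (a minimum-select convention), which the slide does not state; the decision bit d_0 and the difference Δ_0 are labels introduced here for illustration.

```latex
\begin{aligned}
a_0 &= sm_0(n) + bm_{00}(n), \qquad a_1 = sm_1(n) + bm_{10}(n) && \text{(add)} \\
\Delta_0(n) &= \lvert a_0 - a_1 \rvert, \qquad
d_0(n) = \begin{cases} 0, & a_0 \le a_1 \\ 1, & a_0 > a_1 \end{cases} && \text{(compare)} \\
sm_0(n+1) &= \min(a_0, a_1) && \text{(select)}
\end{aligned}
```

Because sm_0(n+1) feeds the add stage of the next step, all three operations sit inside one feedback loop, which is why the slide calls this structure the traditional throughput bottleneck.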


Retiming: Compare-Select-Add

[Figure: retimed CSA unit. Compare: the registered sums sm0(n) + bm00(n) and sm1(n) + bm10(n) are subtracted. Select: the chosen sum is sm0(n+1). Add: the next-stage branch metrics bm00(n+1) and bm01(n+1) are added to give sm0(n+1) + bm00(n+1) and sm0(n+1) + bm01(n+1).]

• Forward pipeline to include ADD operations of next stage.

[G. Fettweis, et al., "Reduced-complexity Viterbi detector architectures for partial response signaling," Proc. IEEE GLOBECOM, Nov. 1995.]
[I. Lee and J. L. Sonntag, "A new architecture for the fast Viterbi algorithm," Proc. IEEE GLOBECOM, Nov. 2000.]
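A quick way to see that the retiming changes only where the registers sit, not what is computed, is to run both forms side by side. The sketch below does this on a toy 2-state, fully connected trellis with integer metrics and minimum-select; the trellis, the metric convention, and the function names are assumptions for illustration, not the 8-state design of the slides.

```python
import random


def acs_step(sm, bm):
    """Add-compare-select: sm[i] are state metrics, bm[i][j] is branch i -> j."""
    return [min(sm[0] + bm[0][j], sm[1] + bm[1][j]) for j in range(2)]


def csa_step(sums, bm_next):
    """Compare-select-add on registered sums[i][j] = sm_i(n) + bm_ij(n)."""
    sm_next = [min(sums[0][j], sums[1][j]) for j in range(2)]   # compare, select
    # Add the NEXT stage's branch metrics before the register boundary.
    return [[sm_next[i] + bm_next[i][j] for j in range(2)] for i in range(2)]


def metrics_match(branch_metrics):
    """True if the CSA registers always imply the same state metrics as ACS."""
    sm = [0, 0]
    sums = [[sm[i] + branch_metrics[0][i][j] for j in range(2)] for i in range(2)]
    for n, bm in enumerate(branch_metrics):
        sm = acs_step(sm, bm)
        if [min(sums[0][j], sums[1][j]) for j in range(2)] != sm:
            return False
        if n + 1 < len(branch_metrics):
            sums = csa_step(sums, branch_metrics[n + 1])
    return True


rng = random.Random(0)
trial = [[[rng.randint(0, 15) for _ in range(2)] for _ in range(2)]
         for _ in range(50)]
print(metrics_match(trial))
```

The registers in the CSA form hold sums rather than state metrics, so the state metric itself never appears as a stored value; it exists only as the output of the select.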


Transformation of CSA

[Figure: transformed CSA unit. The comparison of sm0(n) + bm00(n) and sm1(n) + bm10(n), which produces Δ, runs in parallel with adders that add the next-stage branch metrics bm00(n+1) and bm01(n+1) to both candidate sums; the compare result then selects the outputs sm0(n+1) + bm00(n+1) and sm0(n+1) + bm01(n+1).]

• Parallel execution of Add and Compare
• 34% reduction in critical delay
• 22% area penalty (overall decoder)
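The transformation can be read as speculative addition: both candidate sums are extended with the next branch metrics before the comparison result is known, and the comparison only steers the final multiplexers. A minimal sketch of that data dependency, assuming minimum-select and using hypothetical function names (Python cannot model the hardware concurrency itself, only the fact that the adds no longer depend on the compare):

```python
def csa_sequential(s0, s1, bm_a, bm_b):
    """Compare and select first, then add the next-stage branch metrics."""
    selected = min(s0, s1)
    return selected + bm_a, selected + bm_b


def csa_parallel(s0, s1, bm_a, bm_b):
    """Add to both candidates while comparing, then select."""
    cand0 = (s0 + bm_a, s0 + bm_b)   # speculative adds for candidate 0
    cand1 = (s1 + bm_a, s1 + bm_b)   # speculative adds for candidate 1
    delta = s0 - s1                  # compare; does not use cand0 or cand1
    return cand0 if delta <= 0 else cand1


print(csa_parallel(7, 4, 2, 5), csa_sequential(7, 4, 2, 5))
```

The duplicated adders in `csa_parallel` are the kind of extra hardware that the area penalty on the slide pays for; the shorter dependency chain is what the reduced critical delay reflects.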


Micro Architecture: Survivor Memory Unit

[Figure: SOVA decoder block diagram, repeated from the SOVA Decoder Architecture slide.]

Survivor Memory Traceback

[Figure: register-exchange array of multiplexers and registers, one row per state, controlled by the ACS decisions dACS0 to dACS7.]

Traceback recovers the most-likely path.
– SRAM memory: slower, but low power and area.
– Register exchange: short critical path.
  Small increase in area and power, due to the small number of states in the design.
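One step of a register-exchange survivor memory can be sketched as follows: each state's register is overwritten with a copy of the selected predecessor's register (the multiplexer), with the newest decided bit shifted in. The shift-register state encoding, the newest-first bit ordering, and the name `register_exchange_step` are assumptions for illustration; the slides do not specify them.

```python
def register_exchange_step(registers, decisions, depth):
    """Advance all survivor registers by one trellis step.

    registers[s] -- survivor bits for state s, newest first
    decisions[s] -- ACS decision for state s (which predecessor survived)
    depth        -- number of bits kept per register (L in the slides)
    """
    updated = []
    for nxt in range(len(registers)):
        prev = ((nxt & 3) << 1) | decisions[nxt]   # multiplexer select
        bit = nxt >> 2                             # input bit that led into nxt
        updated.append(([bit] + registers[prev])[:depth])
    return updated


regs = [[] for _ in range(8)]
regs = register_exchange_step(regs, [0] * 8, depth=4)
regs = register_exchange_step(regs, [0] * 8, depth=4)
print(regs[6])
```

Every register moves in every cycle, which costs power and wiring, but no sequential traceback through a memory is needed; with only eight states that overhead stays small, as the slide notes.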
Micro Architecture: Path Equivalence Detector

[Figure: SOVA decoder block diagram, repeated from the SOVA Decoder Architecture slide.]

Path Equivalence Detection (PED)

[Figure: 8-state trellis (states 000 to 111) over traceback depths 0 to 4, built up over two slides, with two competing paths highlighted in yellow and red.]

SOVA requirement: ensure that branching off the ML path results in complementary decisions.
– Yellow path: equivalent
– Red path: complementary

Path Equivalence Detector

[Figure: register-exchange array with XOR gates across the multiplexer inputs, producing the equivalence signals EQ_{i,j}.]

• XOR gates test for equivalence between the inputs to each multiplexer
• ML decision from the SMU is used to select EQ_{î,j}, for j = 1, 2, …, M, to pass to the RMU
• EQ_{î,j}: equivalence test at ML node î with traceback length j
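The equivalence test described above amounts to comparing, bit by bit, the two survivor registers that feed each multiplexer. A minimal sketch, assuming the same shift-register state encoding as a register-exchange array with newest-first registers, EQ = 1 meaning the two inputs agree, and zero-based depth indices in place of the slide's j = 1, …, M; the function name is hypothetical.

```python
def path_equivalence(registers, depth):
    """Return eq[state][j]: 1 if both predecessor survivors agree at depth j."""
    eq = []
    for nxt in range(len(registers)):
        p0 = (nxt & 3) << 1          # first predecessor of nxt
        p1 = p0 | 1                  # second predecessor of nxt
        eq.append([1 - (registers[p0][j] ^ registers[p1][j])   # XOR, inverted
                   for j in range(depth)])
    return eq


registers = [[0, 1], [0, 0], [1, 1], [1, 1],
             [0, 0], [1, 0], [0, 1], [1, 0]]
eq = path_equivalence(registers, depth=2)
ml_state = 0                         # would come from the SMU
print(eq[ml_state])
```

Only the row belonging to the most-likely state is forwarded, which matches the slide's description of the SMU decision selecting EQ_{î,j}.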


Micro Architecture: Reliability Measure

[Figure: SOVA decoder block diagram, repeated from the SOVA Decoder Architecture slide.]

Reliability Measure Unit

[Figure: chain of RMU stages. Stage j compares Δ_î with r_{j−1} (Δ_î < r_{j−1}) and a multiplexer, with SEL driven by EQ_{î,j}, produces r_j; stage j+1 repeats this with EQ_{î,j+1} to produce r_{j+1}.]

• Determines the minimum difference in path metrics

Δ_î: difference in competing path metrics at ML node î
EQ_{î,j}: equivalence test at ML node î with traceback length j; j = 1, 2, …, M

• Recursion:

\[
r_j =
\begin{cases}
r_{j-1}, & EQ_{\hat{i},j} = 1 \\
\min\left(r_{j-1}, \Delta_{\hat{i}}\right), & EQ_{\hat{i},j} = 0
\end{cases}
\]
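Read literally as a chain over j for a single value of Δ_î, the recursion is a conditional running minimum. The sketch below does exactly that and nothing more: the starting value r_0 is not given on the slide and is assumed here to be +∞ by default, the registers between pipeline stages in the hardware are not modelled, and `rmu_chain` is a hypothetical name.

```python
def rmu_chain(delta, eq, r0=float("inf")):
    """Apply r_j = r_{j-1} if EQ_j == 1 else min(r_{j-1}, delta), j = 1..M.

    delta -- path metric difference at the ML node
    eq    -- list of equivalence flags EQ_1 .. EQ_M for that node
    r0    -- assumed initial reliability (not specified in the slides)
    """
    r = [r0]
    for flag in eq:
        if flag == 1:
            r.append(r[-1])                  # equivalent decisions: keep value
        else:
            r.append(min(r[-1], delta))      # complementary: take the minimum
    return r[1:]                             # r_1 .. r_M


print(rmu_chain(0.8, [1, 0, 1], r0=2.0))
```

A reliability is lowered only where the competing path would have produced a complementary decision, which is the condition the Path Equivalence Detector exists to check.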


Automated Design Flow

[Figure: design flow from a Simulink description and a Module Compiler description, linked by cross verification, to a netlist and then to layout.]

• Fixed-point cycle-true Simulink description
• Netlist generation from Module Compiler
• Automatic placement and routing
• Custom clock tree (75 ps skew, 120 ps rise/fall times)
• Built-in delay line for speed characterization

[W. R. Davis, et al., "An Automated Design Flow for High-Throughput Low-Power Dedicated Signal Processing Systems," IEEE JSSC, Mar. 2002.]

Technology and Physical Parameters

Technology:
• General-purpose 0.18 µm CMOS with 6 metal layers
• Dual threshold available (used only high-speed transistors)
• 1.8 V power supply

Implemented decoders:
• Soft Output Viterbi Algorithm
• Convolutional codes
  – 8-state octal (11,13) code
  – 8-state enhanced partial response class-4 (EPR4)
• Speed: 500 Mb/s
• Power: 400 mW
• Core area: 1 mm × 0.5 mm
• Transistor count: 170,000

Measured Performance

[Figure: plot of measured frequency (×100 MHz) and power (×100 mW).]


Summary

• 500 Mb/s soft-output Viterbi decoder
• 1.8 V, 0.18 µm CMOS technology
• Architectural transformations of add-compare-select structures
• Modified register exchange
• Automated design flow permits efficient analysis of bottleneck issues
• Performance of chip characterized from 1.2 V through 2 V supply

Die Micrograph

1.8 V, 0.18 µm CMOS with 6 metal layers

Decoder  | Speed    | Power  | Trans. Count
(11,13)  | 500 Mb/s | 400 mW | 174k
EPR4     | 500 Mb/s | 395 mW | 164k

• Acknowledgments:
  – T. Smilkstein, P. Pakzad and V. Anantharam of UC Berkeley for technical assistance.
  – ST Microelectronics for fabrication of the test chip.
  – Texas Instruments for support under the UC MICRO program.

END



Unrolled Iterative Decoder

[Figure: unrolled chain of SISO decoder stages D1 and D2, connected through interleavers π and de-interleavers π⁻¹, fed by the channel observations and exchanging intrinsic and extrinsic information.]

• Unrolled and pipelined decoder to achieve desired throughput rates (> 1 Gb/s)
• Complexity (linear increase)
• Latency (multi-sector)

[G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, "VLSI architectures for turbo codes," IEEE Trans. VLSI Systems, Sep. 1999, pp. 369-79.]


Comparison of SOVA decoder implementations

The last three columns are scaled to a 4-state decoder in 2.5 V, 0.5 µm technology.

Publication            | Process Tech.        | Speed (Mb/s) | Power  | Number of States | Area (mm²) | Scaled Area (mm²) | Scaled Speed (Mb/s) | Scaled Power (mW)
Yeo, et al., 2002      | 0.18 µm CMOS @ 1.8 V | 500          | 400 mW | 8                | 0.5        | 1.9               | 180                 | 386
Garrett & Stan, 2001   | 1.2 µm CMOS @ 3.3 V  | 2.5          | 23 mW  | 4                | 23         | 4.0               | 6.0                 | 13.2
Conway & Nelson, 1996  | 0.7 µm CMOS          | 75           | N.A.   | 4                | 14         | 7.1               | 105                 | N.A.
Joeressen & Meyr, 1994 | 1 µm CMOS @ 3.3 V    | 40           | 1.6 W  | 16               | 43         | 2.7               | 80                  | 230


Log-likelihood Operations

\[
P_{err} = \frac{1}{1 + \exp(\Delta)}, \qquad \Delta = M_\beta - M_\alpha
\]

Assumption: the path metrics M_α and M_β dominate over those of all other paths.

\[
\text{log-likelihood of error} = \log\left(\frac{1 - P_{err}}{P_{err}}\right) = \Delta = M_\beta - M_\alpha
\]
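The step between the two expressions on this slide follows directly from the definition of P_err; no symbols beyond those on the slide are needed:

```latex
\begin{aligned}
1 - P_{err} &= 1 - \frac{1}{1 + \exp(\Delta)} = \frac{\exp(\Delta)}{1 + \exp(\Delta)} \\
\frac{1 - P_{err}}{P_{err}} &= \frac{\exp(\Delta)}{1 + \exp(\Delta)} \cdot \bigl(1 + \exp(\Delta)\bigr) = \exp(\Delta) \\
\log\left(\frac{1 - P_{err}}{P_{err}}\right) &= \Delta = M_\beta - M_\alpha
\end{aligned}
```

For example, Δ = 0 gives P_err = 1/2 (the two paths are equally likely), and P_err falls toward zero as Δ grows.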


Chip Testing

• 4-layer PCB designed and fabricated with 75 discrete components.
• Logical verification at 50 MHz.
• Download and upload data with networked Logic Analyzer.
• Test vectors generated from Simulink.


Chip-in-a-Day Design Flow: User Perspective

[Figure: flow dependency graph.
 Inputs: ICMakefile, parameter files, Simulink MDL file, Module Compiler code, Simulink test vectors, Pillar floorplan.
 Tools: dfII layout hierarchy, Arcadia parasitic extraction, Spectre clock tree analysis, EPIC PowerMill simulation, EPIC PathMill simulation, Calibre DRC & LVS.]

• Allows regeneration and reanalysis of the design for small changes at the push of a button
• Uses flow dependency graphs to manage large projects

Clock Tree Generation

[Figure: clock tree with clocks CLK, CLK2 and CLK3, a variable delay element, and a separate supply vdd2.]

• < 75 ps skew
• < 120 ps transition times
• Built-in variable delay line for speed characterization
• 200 mW average CLK power at 500 MHz (simulation)