500Mb/s Soft Output Viterbi Decoder

Engling Yeo, Stephanie Augsburger, Wm. Rhett Davis, Borivoje Nikolić
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA

Engling Yeo

University of California, Berkeley

1

Turbo Encoder/Decoder
[Figure: turbo encoder/decoder block diagram. Transmit side: u1, (11,13) encoder, x1, interleaver π, u2, EPR4 channel (acting as encoder), x2, additive noise, y. Receive side: EPR4 decoder and (11,13) decoder exchanging estimates through π^-1 and π, producing û1.]

Example high-throughput application: Magnetic Disk-Drive Read Channel
EPR4: Enhanced Partial Response Class 4
π: Interleavers

Outline
• Soft Output Viterbi Algorithm (SOVA)
• Decoder Architecture
• Add-Compare-Select Structures
• Survivor Memory Traceback
• Design Flow and Testing

Viterbi Algorithm

[Figure: 8-state trellis (states 000 to 111) over times k-M, k and n, showing the most-likely path α, the most-likely node m_k and the L-step VA window.]

• Trellis representation of convolutional code
• Selection between pair of competing paths at each node
• Decision traceback
• Most likely node, m_k
• Series of ML nodes form most-likely path, α
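
The trellis search and decision traceback listed above can be sketched in Python. This is a minimal illustration, not the decoder described in these slides: it assumes a 4-state rate-1/2 feedforward code with octal generators (7,5), hard-decision Hamming branch metrics and an encoder that starts in state 0. The names step, encode and viterbi_decode are hypothetical.

```python
# Minimal hard-decision Viterbi decoder sketch.
# Assumptions (not from the slides): rate-1/2 code, generators (7,5) octal,
# 4 states, Hamming-distance branch metrics, encoder starts in state 0.

G = (0b111, 0b101)   # generator polynomials, constraint length 3
N_STATES = 4


def parity(x):
    return bin(x).count("1") & 1


def step(state, bit):
    """Return (next_state, output_pair) for one input bit."""
    reg = (bit << 2) | state          # newest bit in the MSB
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.append(pair)
    return out


def viterbi_decode(received):
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)
    history = []                       # per step: (prev_state, bit) per state
    for r in received:
        new_metric = [inf] * N_STATES
        choice = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue               # state not reachable yet
            for b in (0, 1):
                ns, out = step(s, b)
                bm = (out[0] != r[0]) + (out[1] != r[1])   # branch metric
                cand = metric[s] + bm                       # add
                if cand < new_metric[ns]:                   # compare
                    new_metric[ns] = cand                   # select
                    choice[ns] = (s, b)
        metric = new_metric
        history.append(choice)
    # Decision traceback from the most likely final state.
    state = min(range(N_STATES), key=lambda s: metric[s])
    decoded = []
    for choice in reversed(history):
        state, bit = choice[state]
        decoded.append(bit)
    return decoded[::-1]
```

Encoding a short message, flipping one early code bit and decoding the result should return the original message, since the assumed code tolerates an isolated channel error.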

Soft Output Viterbi Algorithm (SOVA)

[Figure: 8-state trellis (states 000 to 111) over times k-M, k and n, showing the most-likely path α, the next most likely path β, the most-likely node m_k, the L-step VA window and the M-step SOVA window.]

• Assumes α and β share at least one common node
• Two ML paths provide complementary bit decisions
• Branching occurs at node with minimum difference in path metric
• Soft output = difference in path metric of α and β

SOVA Decoder Architecture
[Figure: decoder block diagram, built up over three slides. Labeled blocks and signals:]
• Channel input
• Branch metric generator
• 8 × compare-select-add (CSA), producing 8 × CSA decisions and the path metric differences
• Survivor memory unit (L-step SMU), producing the most-likely state (3 bits)
• 8 × L-step FIFO (CSA decisions)
• 8 × L-step FIFO (path metric differences)
• Path equivalence detector (M-step PED), producing $EQ_{\hat{i},1}, EQ_{\hat{i},2}, \ldots, EQ_{\hat{i},M}$
• Reliability measure unit (M-step RMU), taking $\Delta_{\hat{i}}$ and producing the decoded soft output
• Groupings: Viterbi decoder, memories, soft output evaluation

Micro Architecture: Compare Select Add
[Figure: SOVA decoder block diagram, repeated to introduce the compare-select-add block.]

Add-Compare-Select Structures

[Figure: add-compare-select (ACS) unit. Two adders form sm0(n) + bm00(n) and sm1(n) + bm10(n); the compare and select stages pass one of the two sums on as sm0(n+1).]

• Traditional throughput bottleneck
• Previous implementations: Radix-4 'loop unrolling', with a 2.7× area increase (overall decoder) for a 40% speedup
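
A single add-compare-select step for state 0 can be written out as follows. This is a sketch under assumptions: the smaller path metric is taken to win, and the function also returns the path metric difference, which a soft-output decoder needs later. The function name acs and the return layout are hypothetical.

```python
def acs(sm0, bm00, sm1, bm10):
    """One add-compare-select step for state 0.

    Returns (new state metric, decision bit, path metric difference).
    Assumption: the smaller path metric wins; decision 0 picks the branch
    from state 0, decision 1 the branch from state 1.
    """
    a = sm0 + bm00                     # add
    b = sm1 + bm10                     # add
    decision = 0 if a <= b else 1      # compare
    new_sm = b if decision else a      # select
    return new_sm, decision, abs(a - b)
```

The add, compare and select operations form one serial chain inside the state metric feedback loop, which is why this unit limits throughput.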

Retiming: Compare-Select-Add

[Figure: retimed compare-select-add (CSA) unit. The stored sums bm00(n) + sm0(n) and bm10(n) + sm1(n) are compared and one is selected as sm0(n+1); adders then add bm00(n+1) and bm01(n+1) to give sm0(n+1) + bm00(n+1) and sm0(n+1) + bm01(n+1).]

Forward pipeline to include ADD operations of next stage.
[G. Fettweis, et al., “Reduced-complexity Viterbi detector architectures for partial response signaling,” Proc. IEEE GLOBECOM, Nov. 1995.]
[I. Lee and J. L. Sonntag, “A new architecture for the fast Viterbi algorithm,” Proc. IEEE GLOBECOM, Nov. 2000.]


Transformation of CSA
[Figure: transformed CSA unit. The stored sums sm0(n) + bm00(n) and sm1(n) + bm10(n) are compared while bm00(n+1) and bm01(n+1) are added to both candidates; the select stage outputs sm0(n+1) + bm00(n+1) and sm0(n+1) + bm01(n+1).]

• Parallel execution of add and compare
• 34% reduction in critical delay
• 22% area penalty (overall decoder)
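
The equivalence between the ACS and CSA orderings can be checked numerically. The sketch below assumes a hypothetical fully connected 2-state trellis with integer branch metrics, where bm[n][i][j] is the metric of the branch from state i to state j at time n and the smaller metric wins. The function names acs_run and csa_run are illustrative only.

```python
import random


def acs_run(bm, sm_init):
    """Reference add-compare-select recursion; returns decisions per step."""
    sm = list(sm_init)
    decisions = []
    for metrics in bm:
        cands = [[sm[i] + metrics[i][j] for i in (0, 1)] for j in (0, 1)]
        d = [0 if c[0] <= c[1] else 1 for c in cands]
        sm = [cands[j][d[j]] for j in (0, 1)]
        decisions.append(d)
    return decisions


def csa_run(bm, sm_init):
    """Retimed compare-select-add recursion on the same data."""
    # Registers hold p[i][j] = sm_i(n) + bm_ij(n), i.e. the sums themselves.
    p = [[sm_init[i] + bm[0][i][j] for j in (0, 1)] for i in (0, 1)]
    decisions = []
    for n in range(len(bm)):
        nxt = bm[n + 1] if n + 1 < len(bm) else [[0, 0], [0, 0]]
        d, new_p = [], []
        for j in (0, 1):
            # Compare the stored sums ...
            dj = 0 if p[0][j] <= p[1][j] else 1
            # ... while both candidates get the next branch metrics added.
            added = [[p[i][j] + nxt[j][k] for k in (0, 1)] for i in (0, 1)]
            # Select the pre-computed sums of the winner.
            new_p.append(added[dj])
            d.append(dj)
        p = new_p
        decisions.append(d)
    return decisions
```

Both functions produce the same decision sequence for the same inputs. In csa_run the comparison and the additions inside one step do not depend on each other, which is what allows them to run in parallel in hardware at the cost of extra adders.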

Micro Architecture: Survivor Memory Unit
[Figure: SOVA decoder block diagram, repeated to introduce the survivor memory unit.]

Survivor Memory Traceback
[Figure: register-exchange array. Registers and multiplexers are controlled by the ACS decisions dACS0 to dACS7.]

• Traceback recovers most-likely path.
  – SRAM memory: slower, low power and area.
  – Register exchange: short critical path.
• Small increase in area and power, due to the small number of states in the design.
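
A register-exchange update can be sketched as follows. The trellis connectivity is an assumption made for illustration: an 8-state shift-register trellis in which the next state is ((prev << 1) | bit) & 7, so that each state has two predecessors and the newest input bit is the LSB of the state. The names predecessors and register_exchange_step are hypothetical.

```python
N_STATES = 8


def predecessors(s):
    """Assumed shift-register trellis: next = ((prev << 1) | bit) & 7."""
    return (s >> 1), (s >> 1) | 4


def register_exchange_step(regs, decisions):
    """Advance all survivor registers by one trellis step.

    regs[s] is the list of input bits on the survivor path into state s
    (oldest first); decisions[s] is the ACS decision (0 or 1) for state s.
    """
    new_regs = []
    for s in range(N_STATES):
        pred = predecessors(s)[decisions[s]]        # multiplexer
        new_regs.append(regs[pred][1:] + [s & 1])   # copy, shift, append
    return new_regs
```

Every register is rewritten on every step, so no separate traceback pass is needed: the oldest bit of the register at the most-likely state is the decoded output.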

Micro Architecture: Path Equivalence Detector
[Figure: SOVA decoder block diagram, repeated to introduce the path equivalence detector.]

Path Equivalence Detection (PED)
[Figure: 8-state trellis (states 000 to 111) over traceback depths 0 to 4, built up over two slides, with two competing paths highlighted in yellow and red.]

SOVA requirement: ensure that branching off the ML path results in complementary decisions.
• Yellow path: equivalent
• Red path: complementary

Path Equivalence Detector
[Figure: register-exchange array with XOR gates on the multiplexer inputs, producing $EQ_{0,1}$, $EQ_{1,1}$, $EQ_{0,2}$, $EQ_{1,2}$, and so on.]

• XOR gates test for equivalence between the inputs to each multiplexer.
• The ML decision from the SMU selects $EQ_{\hat{i},j}$, for $j = 1, 2, \ldots, M$, to pass to the RMU.

$EQ_{\hat{i},j}$: equivalence test at ML node $\hat{i}$ with traceback length $j$
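
The equivalence test can be sketched as a bitwise comparison of the two survivor registers that feed one multiplexer. The polarity is an assumption: EQ = 1 is taken to mean that the two stored decisions agree, and EQ = 0 that they are complementary. The function names, and the predecessors callback that maps a state to its two predecessor states, are hypothetical.

```python
def path_equivalence(reg_a, reg_b):
    """EQ flags for two competing survivor registers of equal length.

    Returns a list with 1 where the stored decisions agree and 0 where
    they are complementary (assumed polarity).
    """
    return [1 - (a ^ b) for a, b in zip(reg_a, reg_b)]


def ped_for_ml_state(regs, predecessors, ml_state):
    """Select the EQ flags at the most-likely state, as the SMU output would."""
    p0, p1 = predecessors(ml_state)
    return path_equivalence(regs[p0], regs[p1])
```

In hardware all states compute their flags in parallel and the most-likely state from the SMU picks one set, which is what ped_for_ml_state imitates for a single state.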

Micro Architecture: Reliability Measure
[Figure: SOVA decoder block diagram, repeated to introduce the reliability measure unit.]

Reliability Measure Unit
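[Figure: chain of RMU stages. Stage j compares $\Delta_{\hat{i}}$ with $r_{j-1}$, and a multiplexer controlled by $EQ_{\hat{i},j}$ selects the value passed on as $r_j$; stage j+1 repeats this with $r_j$ and $EQ_{\hat{i},j+1}$.]

Determines minimum difference in path metrics.

$\Delta_{\hat{i}}$: difference in competing path metrics at ML node $\hat{i}$
$EQ_{\hat{i},j}$: equivalence test at ML node $\hat{i}$ with traceback length $j$; $j = 1, 2, \ldots, M$

Recursion:
$$
r_j =
\begin{cases}
r_{j-1}, & EQ_{\hat{i},j} = 1 \\
\min\left(r_{j-1}, \Delta_{\hat{i}}\right), & EQ_{\hat{i},j} = 0
\end{cases}
$$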
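
The recursion can be sketched as one clock cycle of an M-stage pipeline. The timing interpretation is an assumption: each stage is taken to hold the reliability of one bit, which moves one stage deeper per cycle and meets a new path metric difference each time. The name rmu_step and the initial value r0 (infinity, meaning "no competing path seen yet") are hypothetical.

```python
def rmu_step(r, delta, eq, r0=float("inf")):
    """One clock cycle of an M-stage reliability pipeline.

    r[j-1] holds r_j from the previous cycle (j = 1..M), delta is the path
    metric difference at the current ML node and eq[j-1] is EQ for
    traceback length j (1 = equivalent, 0 = complementary).
    """
    prev = [r0] + r[:-1]              # r_{j-1} feeding stage j
    return [p if e == 1 else min(p, delta) for p, e in zip(prev, eq)]
```

For example, starting from three empty stages, a cycle with delta = 5 and EQ flags [0, 1, 0] gives [5, inf, 5]; a following cycle with delta = 3 and flags [1, 0, 0] gives [inf, 3, 3]. Only stages whose competing decision is complementary can lower the stored reliability.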

Automated Design Flow
[Figure: design flow from Simulink description and Module Compiler description (cross verification) to netlist and layout.]

• Fixed-point cycle-true Simulink description
• Netlist generation from Module Compiler
• Automatic placement and routing
• Custom clock tree (75ps skew, 120ps rise/fall times)
• Built-in delay line for speed characterization

[W. R. Davis, et al., "An Automated Design Flow for High-Throughput Low-Power Dedicated Signal Processing Systems," IEEE JSSC, Mar 2002.]

Technology and Physical Parameters
Technology:
• General-purpose 0.18µm CMOS with 6 metal layers
• Dual threshold available (only high-speed transistors used)
• 1.8V power supply

Implemented decoders:
• Soft Output Viterbi Algorithm
• Convolutional codes
  – 8-state octal (11,13) code
  – 8-state enhanced partial response class-4 (EPR4)

Speed:             500Mb/s
Power:             400mW
Core area:         1mm × 0.5mm
Transistor count:  170,000

Measured Performance

[Figure: plot of measured frequency (×100MHz) and power (×100mW); the data points are not recoverable.]

Summary
• 500Mb/s soft-output Viterbi decoder
• 1.8V, 0.18µm CMOS technology
• Architectural transformations of add-compare-select structures
• Modified register exchange
• Automated design flow permits efficient analysis of bottleneck issues.
• Performance of chip characterized from 1.2V through 2V supply

Die Micrograph
1.8V, 0.18µm CMOS with 6 metal layers

Decoder   Speed     Power   Trans. count
(11,13)   500Mb/s   400mW   174k
EPR4      500Mb/s   395mW   164k

Acknowledgments:
• T. Smilkstein, P. Pakzad and V. Anantharam of UC Berkeley for technical assistance.
• ST Microelectronics for fabrication of the test chip.
• Texas Instruments for support under the UC MICRO program.

END


Unrolled Iterative Decoder
[Figure: unrolled iterative decoder. Channel observations feed a chain of SISO decoder stages D2 and D1, connected through deinterleavers π^-1 and interleavers π and exchanging intrinsic and extrinsic information.]

• Unrolled and pipelined decoder to achieve desired throughput rates (> 1Gbps)
• Complexity (linear increase)
• Latency (multi-sector)

[G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, “VLSI architectures for turbo codes,” IEEE Trans. VLSI Systems, Sep 1999, pp. 369-79.]

Comparison of SOVA decoder implementations
As published:

Publication              Process Tech.   Speed (Mb/s)   Power           Number of States   Area (mm2)
Yeo, et al., 2002        0.18 µm CMOS    500            400 mW @ 1.8V   8                  0.5
Garrett & Stan, 2001     1.2 µm CMOS     2.5            23 mW @ 3.3V    4                  23
Conway & Nelson, 1996    0.7 µm CMOS     75             N.A.            4                  14
Joeressen & Meyr, 1994   1 µm CMOS       40             1.6 W @ 3.3V    16                 43

Scaled to 4-state, 2.5V, 0.5µm technology:

Publication              Area (mm2)   Speed (Mb/s)   Power (mW)
Yeo, et al., 2002        1.9          180            386
Garrett & Stan, 2001     4.0          6.0            13.2
Conway & Nelson, 1996    7.1          105            N.A.
Joeressen & Meyr, 1994   2.7          80             230

Log-likelihood Operations
$$
P_{err} = \frac{1}{1 + \exp(\Delta)}, \qquad \Delta = M_\beta - M_\alpha
$$

Assumption: the values of the path metrics, $M_\alpha$ and $M_\beta$, dominate over those of other paths.

Log-likelihood of error:
$$
\log\left(\frac{1 - P_{err}}{P_{err}}\right) = \Delta = M_\beta - M_\alpha
$$
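
The identity between the log-likelihood ratio and the path metric difference can be checked numerically. This is a small sketch; the function names p_err and llr are hypothetical.

```python
import math


def p_err(delta):
    """Error probability of the ML decision for metric difference delta."""
    return 1.0 / (1.0 + math.exp(delta))


def llr(p):
    """Log-likelihood ratio log((1 - p) / p) for error probability p."""
    return math.log((1.0 - p) / p)
```

Since 1 - P_err = exp(Δ) / (1 + exp(Δ)), the ratio (1 - P_err) / P_err equals exp(Δ), so llr(p_err(delta)) returns delta up to floating-point rounding. This is why the decoder can output the metric difference directly as the soft value.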

Chip Testing
• 4-layer PCB designed and fabricated with 75 discrete components.
• Logical verification at 50MHz.
• Download and upload data with networked logic analyzer.
• Test vectors generated from Simulink.

Chip-in-a-Day Design Flow: User Perspective
[Figure: flow diagram centered on ICMakefile. Labeled items: Simulink MDL files, Module Compiler code, parameters file, Pillar floorplan, Simulink test vectors; dfII layout hierarchy; Calibre DRC & LVS; Arcadia parasitic extraction; Spectre clock tree analysis; EPIC PowerMill simulation; EPIC PathMill simulation.]

• Allows regeneration and reanalysis of the design for small changes at the push of a button
• Uses flow dependency graphs to manage large projects

Clock Tree Generation
[Figure: clock tree schematic with CLK, CLK2, CLK3, a variable delay element and supply vdd2.]

• < 75ps skew
• < 120ps transition times
• Built-in variable delay line for speed characterization
• 200mW average CLK power at 500MHz (simulation)