You are on page 1of 51

ECE 124A

VLSI Principles
Lecture 16

Prof. Kaustav Banerjee


Electrical and Computer Engineering
E-mail: kaustav@ece.ucsb.edu

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

A Generic Digital Processor


Lecture 15

Static, Dynamic Memory


Latch, Register
Timing, Stability

Interconnect

Input / Output

MEMORY

CONTROL

DATAPATH

Sequential Circuit

Today
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

DATAPATH
The

core of a digital Processor


Datapath consists
Logic Blocks
Combinational Logic Functions (AND, OR, XOR.)

Arithmetic Blocks
Addition
Multiplication
Comparison
Shift

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

g64
CARRYGEN
SUMSEL

node1

2-1 Mux

9-1 Mux

ck1

SUMGEN
+ LU

REG

5-1 Mux

9-1 Mux

An Intel Microprocessor

sum

sumb
to Cache

Intel Itanium
Integer Datapath

s0
s1

LU : Logical
Unit
1000um

Itanium has 6 integer execution units like this

Fetzer, Orton, ISSCC02


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Bit-Sliced Design

DATA OUT

Bit 32

Multiplexer

Shifter

Adder

Register

DATA IN

Control

Bit 0

Design a single bit datapath and repeat for all bits

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Adders

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Design an Adder
Fundamental

Arithmetic Building Block

Performance

Logic Level Optimization


Optimize Boolean Functions
Carry Lookahead

Circuit Level Optimization


Transistor Sizing
Power

Consumption

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Half-Adder Implementations
Half-Adder
X

Half-Adder

Carry

X
y

X
y

X
y

c
s

X
y

SUM

c
X
s
y
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Full-Adder
A
Cin

Full
adder

A
B

Cout

HA

Cin

Cout
HA

Sum

S = A B Cin = ABC in + ABC in + ABCin + ABCin


Cout = AB + BCin + ACin

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Express Sum and Carry


A

Ci

Co

status

Delete (D) = A B

Propagate (P) = A B

Generate (G) = AB

Truth Table for Full Adder

Lecture 16, ECE 124A, VLSI Principles

Co = G + PCi
S = P Ci

Kaustav Banerjee

Complimentary Static CMOS Full Adder


VDD
VDD
A

Ci

B
A

C0

Ci
A

Ci

S
VDD

Ci
S

A
Ci

VDD
A
Co

28 Transistors

Ci

S = A B Cin = ABCi + C0 ( A + B + Ci )
Co = AB + BCin + ACin
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

The Ripple-Carry Adder

Worst case delay: linear with the number of bits

td = O(N)

tadder = (N-1)tcarry + tsum

Propagation Delay is linearly proportional to N


tcarry dominates the propagation delay
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Inverting Property

S( A, B,Ci ) = S( A, B,C i )
C 0 ( A, B,Ci ) = C0 ( A, B,C i )
Inverting all inputs to FA results in inverted values for all outputs

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Minimize Critical Path by Reducing Inverting Stages


Even cell
A0

B0

Ci,0

A1

B1

Co,0

A2

Odd cell

B2

Co,1

A3

B3

Co,2

Co,3

FA

FA

FA

FA

S0

S1

S2

S3

Exploit Inversion Property

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Mirror Adder (1)


Carry

VDD

VDD
A

Ci

Co

status

VDD
B

Ci

Ci
Co

Ci

A
A

Truth Table for Full Adder

Lecture 16, ECE 124A, VLSI Principles

Ci
A

Ci

A
B

Kaustav Banerjee

Mirror Adder (2)


Sum
VDD
A

Ci

Co

status

VDD
B

Ci

B
A
Ci

Co

Ci

A
A

VDD

Ci
A

Ci

A
B

Truth Table for Full Adder

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Mirror Adder Implementation


Stick Diagram
VDD

Ci

A Ci

Co

Ci

Co
S
GND

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Mirror Adder Summary

24 Transistors

The NMOS and PMOS chains are completely symmetrical

A maximum of 2 series transistors in carry-generation circuitry

The transistors connected to Ci should be closest to the output

Carry-Stage transistors have to be optimized for speed

Sum-Stage transistors can be optimized for area

The most critical issue is to minimize the capacitance at Co

Co is composed of 4 diffusion capacitances, 2 internal gate


capacitances, and 6 gate capacitances in the connecting adder cell
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Transmission Gate Full Adder


P

VDD

Ci

P
A

P
B

VDD
Ci

A
P

Ci

VDD
S Sum Generation

Ci
P
B

VDD

A
P

Co Carry Generation

Ci
A
Setup

This implementation has similar sum and carry output delay


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Manchester Carry-Chain
Manchester Carry Gates
VDD
Pi

VDD
Pi

Co

Ci
Gi

Gi

Co

Ci
Di

Pi

Static Implementation

Dynamic Implementation

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

4-Bit Manchester Carry-Chain


VDD

P0

P1

P2

P3
C3

Ci,0

G1

G0

G3

G2

C0

C1

Lecture 16, ECE 124A, VLSI Principles

C2

C3

Kaustav Banerjee

Manchester Carry-Chain Implementation


Stick Diagram

Propagate/Generate Row
VDD
Pi
Ci - 1

Gi

Pi + 1

Gi + 1

Ci

Ci + 1

GND
Inverter/Sum Row
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Design for Long Word Length


Propagation

delay of chain style design


is quadratic in the number of bits

Chain

style design is NOT practical for


long word length (e.g. 32 bits)

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Carry-Bypass Adder
Carry-Skip Adder
G1

Ci,0

P0

G1

C o,0

P0

FA

P2

FA

G2

Co,1

FA

G3
Co,3

FA

G1

C o,0

P3
Co,2

FA

P0 G1

G2

C o,1

FA

Ci,0

P2

P3

G3

BP=P oP1 P2 P3

C o,2

FA

FA

Multiplexer

P0

Co,3

Idea: If (P0 and P1 and P2 and P3 = 1)


then C o3 = C0, else kill or generate.
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

16-bit Carry-Bypass Adder


Bit 03
Setup

Bit 47
tsetup

Setup

tbypass

Bit 811

Bit 1215

Setup

Setup

Carry
propagation

Carry
propagation

Carry
propagation

Carry
propagation

Sum

Sum

Sum

M bits

N=16

tsum

Sum

M=4

tadder = tsetup + Mtcarry + (N/M-1)tbypass + (M-1)tcarry + tsum


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Carry Ripple versus Carry Bypass


tp

Ripple Adder

Bypass Adder
For smaller N, bypass
adder is not preferred due
to the overhead of bypass
multiplexer

4~8

Depends on technology

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Linear Carry-Select Adder


Setup

Both situations are evaluated

P,G

Co,k-1

"0"

"0" Carry Propagation

"1"

"1" Carry Propagation

Multiplexer

C o,k+3
Carry Vector

Sum Generation

~ 30 % Hardware overhead

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

16-bit Linear Carry-Select Adder


Bit 03

Bit 47

Bit 811

Bit 1215

Setup

Setup

Setup

Setup

0-Carry

0-Carry

0-Carry

0-Carry

1-Carry

1-Carry

1-Carry

1-Carry

Ci,0

Multiplexer

Co,3

Multiplexer

Co,7

Multiplexer

Co,11

Multiplexer

Sum Generation

Sum Generation

Sum Generation

Sum Generation

S03

S47

S811

S1215

N=16

Co,15

M=4

tadder = tsetup + Mtcarry + (N/M)tmux + tsum


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Square-Root Carry-Select Adder


Bit 0-1

Bit 2-4

Setup

Setup

Bit 5-8

Bit 9-13

Setup

Setup

Bit 14-19

(1)
"0" Carry

"0"

"0"

"0" Carry

"0"

"0" Carry

"0"

"0" Carry

(1)
"1" Carry

"1"
(3)

"1" Carry
"1"
(4)

(3)
(4)
Multiplexer

"1" Carry
"1"
(5)
(5)

Multiplexer

"1" Carry
"1"
(7)

(6)
(6)

Multiplexer

(7)
Multiplexer

Mux

Ci,0

(8)
Sum Generation
S0-1

Sum Generation

Sum Generation

S2-4

S 5-8

Sum Generation
S9-13

N bits P stages First Stage has M bits


2
For example:
P
N = 2 + 3 + ...... + (P + 1)
2
M=2

Sum
S14-19 (9)

P = 2N

tadder = tsetup + Mtcarry + (N/M)tmux + tsum= tsetup + Mtcarry +Ptmux + tsum


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Adder Delays - Comparison


50
Ripple adder

tp (in unit delays)

40
30

Linear select
20
10
0

Square root select

20

40

60

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Look-Ahead - Basic Idea


A0 , B0

A1, B1

AN-1, BN-1

Expanding Lookahead equations:

C0,k=Gk+PkC0,k-1
C0,k=Gk+Pk(Gk-1+Pk-1C0,k-2)
Ci,0

P0 Ci,1

S0

P1

S1

Ci, N-1

PN-1

SN-1

All the way:

C0,k=Gk+Pk(Gk-1+Pk-1(+P1(G0+P0Ci,0))))))

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Carry Determination
Co,0=G0+P0Ci,0
Co,1=G1+P1Co,0=G1+P1(G0+P0Ci,0)=G1+P1G0+P1P0Ci,0
Co,2=G2+P2G1+P2P1G0+P2P1P0Ci,0
Co,3=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0Ci,0

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

4-bit Look-Ahead
VDD
G3
G2
G1
G0

Large stack

Ci,0
Co,3
P0

Poor Performance

P1
P2
P3

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Hierarchically Decomposition
Co,0=G0+P0Ci,0
Co,1=G1+P1Co,0
Co,2=G2+P2C0,1
Co,3=G3+P3Co,2

A0
A1

A2

A3

A4

A5

A6

A7

tp N

A0
A1

(g,p) (g,p)

A2
A3
F

g=g+gp
p=pp

A4
A5
A6
A7

tp log2(N)

(g,p)
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

S0
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15

(A0, B0)
(A1, B1)
(A2, B2)
(A3, B3)
(A4, B4)
(A5, B5)
(A6, B6)
(A7, B7)
(A8, B8)
(A9, B9)
(A10, B10)
(A11, B11)
(A12, B12)
(A13, B13)
(A14, B14)
(A15, B15)

Logarithmic Look-Ahead Adder


Kogge-Stone tree

t p log2 N

Lecture 16, ECE 124A, VLSI Principles


Kaustav Banerjee

Multipliers

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

The Binary Multiplication


1 0 1 0 1 0
x

1 0 1 1

Multiplicand
Multiplier

1 0 1 0 1 0
1 0 1 0 1 0
0 0 0 0 0 0
+

Partial products

1 0 1 0 1 0
1 1 1 0 0 1 1 1 0

Lecture 16, ECE 124A, VLSI Principles

Result

Kaustav Banerjee

Reduce Partial Products


Partial

products can be reduced by


multiplier transformation
(Booths Recording)

Reduce

number of partial products is


equivalent to reducing the number of
additions

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

4X4 Bit-Array Multiplier


X3

Z7

X2

X1

X0
Y1 Z0

X3

X2

X1

X0

HA

FA

FA

HA

X3

X2

X1

X0

FA

FA

FA

HA

X3

X2

X1

X0

FA

FA

FA

HA

Z6

Z5

Z4

Y3

Y2

Y0

Z1

Z2

Z3

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Critical Path

X2

X1

tand

X2

X1

X0

HA

FA

FA

HA

X3

X2

X1

X0

FA

FA

FA

HA

X3

X2

X1

X0

FA

FA

FA

HA

Z6

Z5

Z4

Y3

Y2

X0

Y0

Y1 Z0

X3

Z7

X3

Z1

N-1

Z2

Z3

For MXN Bit-Array Multiplier


tmultiplier = [ (M-1) + (N-2) ] tcarry + (N-1) tsum + tand
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Carry-Save Multiplier
HA

HA

HA

HA

FA

FA

FA

HA

FA

FA

FA

FA

FA

HA

HA

Carry is saved for the next adder stage


HA

Vector Merging Adder

tmultiplier = (N-1) tcarry + tand + tmerge


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Multiplier Floorplan
Carry-Save Multiplier
X3

X2

X1

X0

Y0
HA Multiplier Cell

Y1

C S

C S

C S

C S

Z0
FA Multiplier Cell

Y2
C S

Y3

C S

Z7

C S

C S

C S

C S

C S
Z1

Vector Merging Cell

Z2

X and Y signals are broadcasted


through the complete array.
(
)

C S

Z6

Z5

Z4

Z3

Rectangular shape is easy for integration


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Partial Products Transformation


Partial products
6

First stage
2

(a)

0 Bit position

(b)

Second stage
6

Final adder
3

FA

HA
(c)

(d)

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Wallace-Tree Multiplier
Partial products

x3y3

x3y2
x2y3

First stage

Second stage

FA

x2y2 x3y1 x1y2 x3y0 x1y1 x2y0 x0y1


x1y3
x0y3 x2y1 x0y2 x1y0 x0y0

HA

HA

FA

FA

FA

Final adder
z7 z6

z5

z4

Lecture 16, ECE 124A, VLSI Principles

z3

z2

z1

Kaustav Banerjee

z0

Wallace Tree Multiplier


Pros:

Substantial hardware saving


Reduce propagation delay
Cons:

Irregular structure
Customized layout

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Shifters

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

The Binary Shifter


Widely used for floating point units, multiplications by constant numbers
Right

nop

Left

Ai

Bi

Ai-1

Bi-1
Bit-Slice i

...

This binary shifter is extremely slow for multi-bit applications


Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

The Barrel Shifter


Area Dominated by Control Wiring
A3

B3

Sh1
A2
B2

Sh2

: Data Wire

A1
B1

: Control Wire

Sh3
A0
B0

Sh0

Sh1

Sh2

Sh3

Decoder

Lecture 16, ECE 124A, VLSI Principles

Control Signal

Kaustav Banerjee

4x4 barrel shifter


A3

A2

A1

A0

Sh0

Sh 1

S h2

Sh3

B uffer

Widthbarrel ~ 2 pm M
Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

Logarithmic Shifter
Sh1 Sh1

Sh2 Sh2

Sh4 Sh4

A3

B3

A2

B2

A1

B1

A0

B0

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

0-7 bit Logarithmic Shifter


A

Out3

Out2

Out1

Out0

Lecture 16, ECE 124A, VLSI Principles

Kaustav Banerjee

You might also like