You are on page 1of 54

Introduction to CMOS VLSI Design

Chapter 4
Delay
Delay Definitions
 tpdr: rising propagation delay
– From input to rising output
crossing VDD/2
 tpdf: falling propagation delay
– From input to falling output
crossing VDD/2
 tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
 tr: rise time
– From output crossing 0.2
VDD to 0.8 VDD
 tf: fall time
– From output crossing 0.8 Inverter
VDD to 0.2 VDD

Chapter 4 CMOS VLSI Design 2


Delay Definitions
 tcdr: rising contamination delay
– From input to rising output crossing VDD/2
 tcdf: falling contamination delay
– From input to falling output crossing VDD/2
 tcd: contamination delay
– tcd = (tcdr + tcdf)/2 ??
– tcd = min(tcdr , tcdf)

Chapter 4 CMOS VLSI Design 3


Simulated Inverter Delay
 Solving differential equations by hand is too hard
 SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
 But simulations take time to write, may hide insight
2.0

1.5

1.0
(V)
tpdf = 66ps tpdr = 83ps
Vin
Vout
0.5

0.0

0.0 200p 400p 600p 800p 1n


t(s)

Chapter 4 CMOS VLSI Design 4


Delay Estimation
 We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
 The step response usually looks like a 1st order RC
response with a decaying exponential.
 Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
 Characterize transistors by finding their effective R
– Depends on average current as gate switches

Chapter 4 CMOS VLSI Design 5


Effective Resistance
 Shockley models have limited value
– Not accurate enough for modern transistors
– Too complicated for much hand analysis
 Simplification: treat transistor as resistor
– Replace Ids(Vds, Vgs) with effective resistance R
• Ids = Vds/R
– R averaged across switching of digital gate
 Too inaccurate to predict current at any given time
– But good enough to predict RC delay

Chapter 4 CMOS VLSI Design 6


RC Delay Model
 Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R (lower carrier
mobility), capacitance C
 Capacitance proportional to width (k)
 Resistance inversely proportional to width
d
s
kC
kC
R/k
d 2R/k
d
g k g kC
g k g
s kC kC
kC s
s
d
Chapter 4 CMOS VLSI Design 7
RC Values
 Capacitance
– C = Cg = Cs = Cd = 2 fF/mm of gate width in 0.6 mm
– Gradually decline to 1 fF/mm in nanometer techs. (
(1 fF = 1.0E-15 F)
 Resistance
– R  6 KW*mm in 0.6 mm process
– Improves with shorter channel lengths
 Unit transistors
– May refer to minimum contacted device (4/2 l)
– Or maybe 1 mm wide device
– Doesn’t matter as long as you are consistent
Chapter 4 CMOS VLSI Design 8
Equivalent RC circuits

Chapter 4 CMOS VLSI Design Slide 9


Inverter Delay Estimate
 Estimate the delay of a fanout-of-1 inverter
 A=1, consider output capacitance
2C

2C 2C
2C 2C
2 Y 2
A Y
1 1 R C
C
R C C

C output
capacitance
d = 6RC
Chapter 4 CMOS VLSI Design 10
Delay Model Comparison
(Example 4.1, p.145)

Chapter 4 CMOS VLSI Design 11


Example: 3-input NAND
 Sketch a 3-input NAND with transistor widths chosen to
achieve effective rise and fall resistances equal to a unit
inverter (R).

2 2 2

3
3

Chapter 4 CMOS VLSI Design 12


3-input NAND Caps
 Annotate the 3-input NAND gate with gate and diffusion
capacitance.

2C 2C 2C
2C 2C 2C
2 2 2
2C 2C 2C

3 3C
3C
3C
3
3C
3C
3
3C
3C

Chapter 4 CMOS VLSI Design 13


3-input NAND Caps
 Annotate the 3-input NAND gate with gate and diffusion
capacitance.

2 2 2

9C
3
5C
3C
3
5C
3C
3
5C

Chapter 4 CMOS VLSI Design 14


2C 2C 2C
2C 2C 2C
2 2 2
2C 2C 2C

3 3C
3C
3C
3
3C
3C
3
3C 2 2 2
3C
9C
3
5C
3C
3
5C
3C
3
5C

CMOS VLSI Design


Elmore Delay
 ON transistors look like resistors (output capacitor)
 Pullup or pulldown network modeled as RC ladder
 Elmore delay of RC ladder
t pd  R
nodes i
i to  source Ci

 R1C1   R1  R2  C2  ...   R1  R2  ...  RN  CN


R1 R2 R3 RN

C1 C2 C3 CN

Chapter 4 CMOS VLSI Design 16


Example: 3-input NAND
 Estimate worst-case rising and falling delay of 3-input NAND
driving h identical gates.
2 2 2 Y
A 3 9C 5hC
n2
B 3 n1 3C
h copies
C 3 3C

A=1, B=1, C=0


A=1, B=1, C=1

t pdf   3C   R3    3C   R3  R3    9  5h  C   R3  R3  R3 
t pdr   9  5h  RC
 11  5h  RC
rising output (nạp tụ)
falling output (xả tụ)

Chapter 4 CMOS VLSI Design 17


Delay Components
 Delay has two parts
– Parasitic delay
• 9 or 11 RC
• Independent of load
– Effort delay
• 5h RC
• Proportional to load capacitance

Chapter 4 CMOS VLSI Design 18


Contamination Delay
 Best-case (contamination) delay (minimum delay) can be
substantially less than propagation delay.
 Ex: If all three inputs fall simultaneously (rising output)

2 2 2 Y
A 3 9C 5hC
n2
B 3 n1 3C
C 3 3C

A=0, B=0, C=0

R  5 
tcdr   9  5h  C      3  h  RC
3  3 
rising output (nạp tụ)

Chapter 4 CMOS VLSI Design 19


Diffusion Capacitance
 We assumed contacted diffusion on every s / d.
 Good layout minimizes diffusion area
 Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
2C 2C
Shared
Contacted
Diffusion Isolated
Contacted 2 2 2
Merged Diffusion
Uncontacted 3 7C
Diffusion 3 3C

3C 3C 3C 3 3C

Chapter 4 CMOS VLSI Design 20


Layout Comparison
 Which layout is better?

VDD VDD
A B A B

Y Y

GND GND

Chapter 4 CMOS VLSI Design 21


Logical Effort Review
 Logical Effort
 Delay in a Logic Gate
 Multistage Logic Networks
 Choosing the Best Number of Stages
 Example
 Summary

Chapter 4 CMOS VLSI Design 22


Introduction
 Chip designers face a bewildering array of choices
– What is the best circuit topology for a function?
– How many stages of logic give least delay?
– How wide should the transistors be?

 Logical effort is a method to make these decisions


– Uses a simple model of delay
– Allows back-of-the-envelope calculations
– Helps make rapid comparisons between alternatives
– Emphasizes remarkable symmetries

Chapter 4 CMOS VLSI Design 23


Example
 A memory designer for an embedded automotive processor.
Help design the decoder for a register file.
A[3:0] A[3:0]
32 bits

4:16 Decoder
 Decoder specifications:

16 words
16
Register File
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
 needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
Chapter 4 CMOS VLSI Design 24
Delay in a Logic Gate
 Express delays in process-independent unit d  d abs
 Delay has two components: d = f + p 
  3RC
 f: effort delay = gh (a.k.a. stage effort)
 3 ps in 65 nm process
– Again has two components 60 ps in 0.6 mm process
 g: logical effort
– Measures relative ability of gate to deliver current
– g  1 for inverter
 h: electrical effort (or fanout) = Cout / Cin
– Ratio of output to input capacitance
– Sometimes called fanout
 p: parasitic delay (normally ~1)
– Represents delay of gate driving no load
– Set by internal parasitic capacitance

Chapter 4 CMOS VLSI Design 25


Delay Plots
d =f+p 2-input
= gh + p 6
NAND Inverter
g = 4/3

Normalized Delay: d
5 p=2
 What about d = (4/3)h + 2
4 g=1
NOR2? p=1
3 d=h+1

2 Effort Delay: f

1
Parasitic Delay: p
0
0 1 2 3 4 5

Electrical Effort:
h = Cout / Cin

Chapter 4 CMOS VLSI Design 26


Computing Logical Effort
 DEF: Logical effort (g) is the ratio of the input
capacitance of a gate to the input capacitance of an
inverter delivering the same output current.
 Measure from delay vs. fanout plots
 Or estimate by counting transistor widths
2 2 A 4
Y
2 B 4
A 2
A Y Y
1 B 2 1 1

Cin = 3 Cin = 4 Cin = 5


g = 3/3 g = 4/3 g = 5/3

Chapter 4 CMOS VLSI Design 27


Catalog of Gates
 Logical effort of common gates

Gate type Number of inputs


1 2 3 4 n
Inverter 1
NAND 4/3 5/3 6/3 (n+2)/3
NOR 5/3 7/3 9/3 (2n+1)/3
Tristate / mux 2 2 2 2 2
XOR, XNOR 4, 4 6, 12, 6 8, 16, 16, 8

Chapter 4 CMOS VLSI Design 28


Catalog of Gates
 Parasitic delay of common gates
– In multiples of pinv (1)
Gate type Number of inputs
1 2 3 4 n
Inverter 1
NAND 2 3 4 n
NOR 2 3 4 n
Tristate / mux 2 4 6 8 2n
XOR, XNOR 4 6 8

Chapter 4 CMOS VLSI Design 29


Example: Ring Oscillator
 Estimate the frequency of an N-stage ring oscillator

Logical Effort: g=1 31 stage ring oscillator in


0.6 mm process has
Electrical Effort: h=1 frequency of ~ 200 MHz
Parasitic Delay: p=1
Stage Delay: d=2
Frequency: fosc = 1/(2*N*d) = 1/4N

Chapter 4 CMOS VLSI Design 30


Example: FO4 Inverter
 Estimate the delay of a fanout-of-4 (FO4) inverter
d

Logical Effort: g=1


Electrical Effort: h=4 The FO4 delay is about

Parasitic Delay: p=1 300 ps in 0.6 mm process

Stage Delay: d=5 15 ps in a 65 nm process

Chapter 4 CMOS VLSI Design 31


Multistage Logic Networks
 Logical effort generalizes to multistage networks
 Path Logical Effort G gi 
Cout-path
 Path Electrical Effort H
Cin-path
 Path Effort F   f i   gi hi

10
x z
y
20
g1 = 1 g2 = 5/3 g3 = 4/3 g4 = 1
h1 = x/10 h2 = y/x h3 = z/y h4 = 20/z

Chapter 4 CMOS VLSI Design 32


Multistage Logic Networks
 Logical effort generalizes to multistage networks
 Path Logical Effort G  gi
Cout  path
 Path Electrical Effort H
Cin  path
 Path Effort F   f i   gi hi

 Can we write F = GH?

Chapter 4 CMOS VLSI Design 33


Paths that Branch
 No! Consider paths that branch:
15
G =1 90
5
H = 90 / 5 = 18
GH = 18 15
90
h1 = (15 +15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH

Chapter 4 CMOS VLSI Design 34


Branching Effort
 Introduce branching effort
– Accounts for branching between stages in path
Con path  Coff path
b
Con path
B   bi
Note:

 h  BHi

 Now we compute the path effort


– F = GBH

Chapter 4 CMOS VLSI Design 35


Multistage Delays
 Path Effort Delay DF   f i

 Path Parasitic Delay P   pi

 Path Delay D   d i  DF  P

Chapter 4 CMOS VLSI Design 36


Designing Fast Circuits
D   d i  DF  P
 Delay is smallest when each stage bears same effort

fˆ  gi hi  F
1
N

 Thus minimum delay of N stage path is


1
D  NF  P
N

 This is a key result of logical effort


– Find fastest possible delay
– Doesn’t require calculating gate sizes

Chapter 4 CMOS VLSI Design 37


Gate Sizes
 How wide should the gates be for least delay?

fˆ  gh  g CCoutin
gi Couti
 Cini 

 Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
 Check work by verifying input cap spec is met.

Chapter 4 CMOS VLSI Design 38


Example: 3-stage path
 Select gate sizes x and y for least delay from A to B

y
x
45
A 8
x
y B
45

Chapter 4 CMOS VLSI Design 39


Example: 3-stage path
x

y
x
45
A 8
x
y B
45

Logical Effort G = (4/3)*(5/3)*(5/3) = 100/27


Electrical Effort H = 45/8
Branching Effort B=3*2=6
Path Effort F = GBH = 125
Best Stage Effort fˆ  3 F  5
Parasitic Delay P=2+3+2=7
Delay D = 3*5 + 7 = 22 = 4.4 FO4

Chapter 4 CMOS VLSI Design 40


Example: 3-stage path
 Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10

y
x
45
45
A P:
84 P:
x 4
N: 4 P:
y 12 B
B
N: 6 45
N: 3 45

Chapter 4 CMOS VLSI Design 41


Best Number of Stages
 How many stages should a path use?
– Minimizing number of stages is not always fastest
 Example: drive 64-bit datapath with unit inverter
Initial Driver 1 1 1 1

8 4 2.8

D = NF1/N + P 16 8

= N(64)1/N + N
23

Datapath Load 64 64 64 64

N: 1 2 3 4
f: 64 8 4 2.8
D: 65 18 15 15.3
Fastest

Chapter 4 CMOS VLSI Design 42


Derivation
 Consider adding inverters to end of path
– How many give least delay? N - n1 ExtraInverters
Logic Block:
n1 n1Stages

D  NF   pi   N  n1  pinv
1
N Path Effort F

i 1
D 1 1 1
  F N ln F N  F N  pinv  0
N
F
1
 Define best stage effort N

pinv   1  ln    0

Chapter 4 CMOS VLSI Design 43


Best Stage Effort
 pinv   1  ln    0 has no closed-form solution

 Neglecting parasitics (pinv = 0), we find  = 2.718 (e)


 For pinv = 1, solve numerically for  = 3.59

Chapter 4 CMOS VLSI Design 44


Sensitivity Analysis
 How sensitive is delay to using exactly the best
number of stages? 1.6
1.51

D(N) /D(N)
1.4
1.26
1.2 1.15
1.0

(=6) ( =2.4)

0.0
0.5 0.7 1.0 1.4 2.0

N/ N

 2.4 <  < 6 gives delay within 15% of optimal


– We can be sloppy!
– Harris uses  = 4, exact  is process dependent

Chapter 4 CMOS VLSI Design 45


Optimal Power?
 Switching vs. Crowbar

Chapter 4 CMOS VLSI Design Slide 46


Example, Revisited
 A memory designer for an embedded automotive processor.
Help design the decoder for a register file.
A[3:0] A[3:0]
32 bits

4:16 Decoder
 Decoder specifications:

16 words
16
Register File
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
 needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
Chapter 4 CMOS VLSI Design 47
Number of Stages
 Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B=8

 If we neglect logical effort (assume G = 1)


Path Effort: F = GBH = 76.8

Number of Stages: N = log4F = 3.1

 Try a 3-stage design (But G1 and 4)

Chapter 4 CMOS VLSI Design 48


Gate Sizes & Delay
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
Stage Effort: fˆ  F 1/ 3  5.36
Path Delay: D  3 fˆ  1  4  1  22.1
Gate sizes: z = 96*1/5.36 = 18 y = 18*2/5.36 = 6.7
A[3] A[3] A[2] A[2] A[1] A[1] A[0] A[0]

10 10 10 10 10 10 10 10

y z word[0]

96 units of wordline capacitance

y z word[15]

Chapter 4 CMOS VLSI Design 49


Comparison
 Compare many alternatives with a spreadsheet
 D = N(76.8 G)1/N + P
Design N G P D
NOR4 1 3 4 234
NAND4-INV 2 2 5 29.8
NAND2-NOR2 2 20/9 4 30.1
INV-NAND4-INV 3 2 6 22.1
NAND4-INV-INV-INV 4 2 7 21.1
NAND2-NOR2-INV-INV 4 20/9 6 20.5
NAND2-INV-NAND2-INV 4 16/9 6 19.7
INV-NAND2-INV-NAND2-INV 5 16/9 7 20.4
NAND2-INV-NAND2-INV-INV-INV 6 16/9 8 21.6

Chapter 4 CMOS VLSI Design 50


Review of Definitions
Term Stage Path
number of stages 1 N
logical effort g G   gi
H
Cout-path
electrical effort h  CCoutin Cin-path
Con-path Coff-path
branching effort b Con-path B   bi
effort f  gh F  GBH

effort delay f DF   f i

parasitic delay p P   pi
delay d f p D   di  DF  P

Chapter 4 CMOS VLSI Design 51


Method of Logical Effort
1) Compute path effort F  GBH
2) Estimate best number of stages N  log4 F
3) Sketch path with N stages
1
4) Estimate least delay D  NF  PN

5) Determine best stage effort ˆf  F N1

gi Couti
6) Find gate sizes Cini 

Chapter 4 CMOS VLSI Design 52


Limits of Logical Effort
 Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
 Simplistic delay model
– Neglects input rise time effects
 Interconnect
– Iteration required in designs with wire
 Maximum speed only
– Not minimum area/power for constrained delay

Chapter 4 CMOS VLSI Design 53


Summary
 Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about log4F FO4 inverter delays
– Inverters and NAND2 best for driving large caps
 Provides language for discussing fast circuits
– But requires practice to master

Chapter 4 CMOS VLSI Design 54

You might also like