You are on page 1of 47

|  

  |




David Harris

Harvey Mudd College


Spring 2004
 
© |ntroduction
© Delay in a Logic Gate
© Multistage Logic Networks
© Choosing the Best Number of Stages
© Example
© Summary


 a 


| 
© Chip designers face a bewildering array of choices
± What is the best circuit topology for a function?
± How many stages of logic give least delay?
???
± How wide should the transistors be?

© Logical effort is a method to make these decisions


± Uses a simple model of delay
± Allows back-of-the-envelope calculations
± Helps make rapid comparisons between alternatives
± Emphasizes remarkable symmetries


 a 



© Ben Bitdiddle is the memory designer for the Motoroil 68W86,
an embedded automotive processor. Help Ben design the
A[3:0] A[3:0]
decoder for a register file. 32 bits

© Decoder specifications:

:16 Decoder

16 words
16
Register File
± 16 word register file
± Each word is 32 bits wide
± Each bit presents load of 3 unit-sized transistors
± True and complementary address inputs A[3:0]
± Each input may drive 10 unit-sized transistors
© Ben needs to decide:
± How many stages to use?
± How large should each gate be?
± How fast can decoder operate?

 a 


 
© Express delays in process-independent unit
—  [ 3 C
   12 ps in 180 nm process
[
40 ps in 0.6 m process


 a 


 
© Express delays in process-independent unit
— 

[
© Delay has two components
  


 a 


 
© Express delays in process-independent unit
— 

[
© Delay has two components
  
© ! 
 üa.k.a.   )
± Again has two components


 a 


 
© Express delays in process-independent unit
— 

[
© Delay has two components
  
© Effort delay 
 üa.k.a. stage effort)
± Again has two components
© :   
± Measures relative ability of gate to deliver current
± A 1 for inverter

 a 


 
© Express delays in process-independent unit
— 

[
© Delay has two components
  
© Effort delay 
 üa.k.a. stage effort)
± Again has two components
© :   = Cout / Cin
± atio of output to input capacitance
± Sometimes called fanout

 a 


 
© Express delays in process-independent unit
— 

[
© Delay has two components
  
© Parasitic delay r
± epresents delay of gate driving no load
± Set by internal parasitic capacitance


 a 



 = +r 2-input
NAND |nverter
= + r 6
g=

NormalizedDelay:d
5 p=
d=
4 g=
p=
3 d=
2

0
0 1 2 3 4 5

ElectricalEffort:
h = Co u t / Cin


 a 



 = +r 2-input
NAND |nverter
= + r 6
g = 4/3

NormalizedDelay:d
5 p=2
d = ü4/3)h + 2
© What about 4 g=1
p=1
NO 2? 3 d = h +1
2 EffortDelay:f

1
Parasitic Delay: p
0
0 1 2 3 4 5

ElectricalEffort:
h = Co u t / Cin


 a 


 
© DE : O      r 
r   r r 
    r  .
© Measure from delay vs. fanout plots
© Or estimate by counting transistor widths
2 2 A 4
Y
2 B 4
A 2
A Y Y
1 B 2 1 1

Cin = 3 Cin = 4 Cin = 5


g = 3/3 g = 4/3 g = 5/3


 a 



© Logical effort of common gates

Gate type Number of inputs


1 2 3 4 n
|nverter 1
NAND 4/3 5/3 6/3 ün+2)/3
NO 5/3 7/3 9/3 ü2n+1)/3
Tristate / mux 2 2 2 2 2
XO , XNO 4, 4 6, 12, 6 8, 16, 16, 8


 a 



© Parasitic delay of common gates
± |n multiples of pinv ü 1)
Gate type Number of inputs
1 2 3 4 n
|nverter 1
NAND 2 3 4 n
NO 2 3 4 n
Tristate / mux 2 4 6 8 2n
XO , XNO 4 6 8


 a 


  
© Estimate the frequency of an N-stage ring oscillator

Logical Effort: g=
Electrical Effort: h=
Parasitic Delay: p=
Stage Delay: d=
requency: fosc =


 a 


  
© Estimate the frequency of an N-stage ring oscillator

Logical Effort: g=1 31 stage ring oscillator in


0.6 m process has
Electrical Effort: h=1 frequency of ~ 200 MHz
Parasitic Delay: p=1
Stage Delay: d=2
requency: fosc = 1/ü2*N*d) = 1/4N


 a 


 !| "
© Estimate the delay of a fanout-of-4 ü O4) inverter
d

Logical Effort: g=
Electrical Effort: h=
Parasitic Delay: p=
Stage Delay: d=


 a 


 !| "
© Estimate the delay of a fanout-of-4 ü O4) inverter
d

Logical Effort: g=1


Electrical Effort: h=4 The O4 delay is about

Parasitic Delay: p=1 200 ps in 0.6 m process

Stage Delay: d=5 60 ps in a 180 nm process


f/3 ns in an m process


 a 



#$%
© Logical effort generalizes to multistage networks
© * O !  à 
© * !! 
aV 
r
a
© * !   r   à r  à à

£
 


£ £  
   £
£  £         


 a 



#$%
© Logical effort generalizes to multistage networks
© * O !   à

© * !! 


aV   —
aà  —
© * !      à à à

© Can we write = GH?


 a 


&&' &
© No! Consider paths that branch:
15
G = 90
5
H =
GH = 15
90
h1 =
h2 =
= GH?


 a 


&&' &
© No! Consider paths that branch:
15
G =1 90
5
H = 90 / 5 = 18
GH = 18 15
90
h1 = ü15 +15) / 5 = 6
h2 = 90 / 15 = 6
= g1g2h1h2 = 36 = 2GH


 a 


' & 
© |ntroduce x   
± Accounts for branching between stages in path
aV  aV 
r
aV
Note:
r à
 à r
© Now we compute the path effort
± = GBH


 a 




© Path Effort Delay   à

© Path Parasitic Delay   à

© Path Delay   à  


 a 


    
r à r 
© Delay is smallest when each stage bears same effort
£
 à à  î

© Thus minimum delay of N stage path is


£
r î   î

© This is a key result of logical effort


± ind fastest possible delay
± Doesn¶t require calculating gate sizes


 a 


 (
© How wide should the gates be for least delay?

 r  r  aaVà
à aV à
aàà r

© Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
© Check work by verifying input cap spec is met.


 a 


)*&
© Select gate sizes x and y for least delay from A to B

y
x

A
x
y B


 a 


)*&
y

y B

Logical Effort G=
Electrical Effort H=
Branching Effort B=
Path Effort =
Best Stage Effort  r
Parasitic Delay P=
Delay D=


 a 


)*&
y
45
8
y B
45

Logical Effort G = ü4/3)*ü5/3)*ü5/3) = 100/27


Electrical Effort H = 45/8
Branching Effort B=3*2=6
Path Effort = GBH = 125
Best Stage Effort  r 6  r
Parasitic Delay P=2+3+2=7
Delay D = 3*5 + 7 = 22 = 4.4 O4


 a 


)*&
© Work backward for sizes
y=
x=

y
x

x
y


 a 


)*&
© Work backward for sizes
y = 45 * ü5/3) / 5 = 15
x = ü15*2) * ü5/3) / 5 = 10

45
:4
:4
:4 : 12
: 45
:3


 a 


'#+ 
© How many stages should a path use?
± Minimizing number of stages is not always fastest
© Example: drive 64-bit datapath with unit inverter
InitialDriver

D =

Datapath oad 64 64 64 64

: 4
f:
D:


 a 


'#+ 
© How many stages should a path use?
± Minimizing number of stages is not always fastest
© Example: drive 64-bit datapath with unit inverter
InitialDriver 1 1 1 1

D = N 1/N + P 16

= Nü64)1/N + N

Datapath oad 64 64 64 64

N: 1 4
f: 64 4
D: 6 1 1 1
astest


 a 


"
© Consider adding inverters to end of path
± How many give least delay? N - n1 Extra|nverters
Logic Block:
£ n1Stages
£
r î  î
à  Uî  £ à Path Effort

à r£
 £ £ £
   î î
 î
à

î
£
© Define best stage effort ur î

à  u U£   u r


 a 


' 
© à  u U£   u r
has no closed-form solution

© Neglecting parasitics üpinv = 0), we find u = 2.718 üe)


© or pinv = 1, solve numerically for u = 3.59


 a 


 ", 
© How sensitive is delay to using exactly the best
number of stages? 1.6
.

2 î /2 î
1.4
.
1. .

1.

üuå) üu å . )

.
.5 . 1. 1.4 .

î/ î

© 2.4 < u < 6 gives delay within 15% of optimal


± We can be sloppy!
± | like u = 4


 a 


-"
© Ben Bitdiddle is the memory designer for the Motoroil 68W86,
an embedded automotive processor. Help Ben design the
A[3:0] A[3:0]
decoder for a register file. 32 bits

© Decoder specifications:

:16 Decoder

16 words
16
Register File
± 16 word register file
± Each word is 32 bits wide
± Each bit presents load of 3 unit-sized transistors
± True and complementary address inputs A[3:0]
± Each input may drive 10 unit-sized transistors
© Ben needs to decide:
± How many stages to use?
± How large should each gate be?
± How fast can decoder operate?

 a 


#+ 
© Decoder effort is mainly electrical and branching
Electrical Effort: H=
Branching Effort: B=

© |f we neglect logical effort üassume G = 1)


Path Effort: =

Number of Stages: N=


 a 


#+ 
© Decoder effort is mainly electrical and branching
Electrical Effort: H = ü32*3) / 10 = 9.6
Branching Effort: B=8

© |f we neglect logical effort üassume G = 1)


Path Effort: = GBH = 76.8

Number of Stages: N = log4 = 3.1

© Try a 3-stage design


 a 


 (.
Logical Effort: G=
Path Effort: =
Stage Effort: 
Path Delay: 
Gate sizes: z= y=
     
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
       
       

y z
or[]

 

its of or li e ca acita ce

y z
or[ ]


 a 


 (.
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: = GBH = 154
Stage Effort:   £ 6 6
Path Delay: r 6   £    £ r £
Gate sizes: z = 96*1/5.36 = 18 y = 18*2/5.36 = 6.7
 
[3] [3] [2] [2] [1] [1] [ ] [ ]
       
1 1 1 1 1 1 1 1

y z
or[]


96 its of or li e ca acita ce

y z
or[15]


 a 



© Compare many alternatives with a spreadsheet


! " # 
NAND4-|NV 2 2 5 29.8
NAND2-NO 2 2 20/9 4 30.1
|NV-NAND4-|NV 3 2 6 22.1
NAND4-|NV-|NV-|NV 4 2 7 21.1
NAND2-NO 2-|NV-|NV 4 20/9 6 20.5
NAND2-|NV-NAND2-|NV 4 16/9 6 19.7
|NV-NAND2-|NV-NAND2-|NV 5 16/9 7 20.4
NAND2-|NV-NAND2-|NV-|NV-|NV 6 16/9 8 21.6


 a 


"$  
$%   #&
number of stages £ î
logical effort   à

aV 
electrical effort  aV 
a
r a
aV  aV 
branching effort r aV r à

effort  r   r 

effort delay   r  à

parasitic delay   r  à
delay r   r  à r  


 a 



&
1) Compute path effort  r 
2) Estimate best number of stages î r V  
3) Sketch path with N stages
£
4) Estimate least delay r î  î

£
5) Determine best stage effort  r î

à aV à
6) ind gate sizes aàà



 a 



© Chicken and egg problem
± Need path to compute G
± But don¶t know number of stages without G
© Simplistic delay model
± Neglects input rise time effects
© |nterconnect
± |teration required in designs with wire
© Maximum speed only
± Not minimum area/power for constrained delay


 a 



© Logical effort is useful for thinking of delay in circuits
± Numeric logical effort characterizes gates
± NANDs are faster than NO s in CMOS
± Paths are fastest when effort delays are ~4
± Path delay is weakly sensitive to stages, sizes
± But using fewer stages doesn¶t mean faster paths
± Delay of path is about log4 O4 inverter delays
± |nverters and NAND2 best for driving large caps
© Provides language for discussing fast circuits
± But requires practice to master


 a 



You might also like