Professional Documents
Culture Documents
Chih-Wei Liu
VLSI Signal Processing LAB
National Chiao Tung University
cwliu@twins.ee.nctu.edu.tw
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 1
Outline
RC Delay Estimation
Logical Effort
Delay in a Logic Gate
Multistage Logic Networks
Choosing the Best Number of Stages
Example
Summary
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 2
RC Delay Model
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 3
RC Values
Capacitance 假設nMOS/pMOS寄生電容與閘極電容相同
C = Cg = Cs = Cd = 2 fF/μm of gate width
Values similar across many processes
Resistance
R ≈ 6 KΩ*μm in 0.6um process
Improves with shorter channel lengths
Unit transistors
May refer to minimum contacted device (4/2 λ)
Or maybe 1 μm wide device
Doesn’t matter as long as you are consistent
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 4
Inverter Delay Estimate
2 Y 2
A
1 1
s
kC
2R/k
kC
g
kC
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 5
Inverter Delay Estimate
2C
2C
2 Y 2
A Y
1 1
C
R C
假設A電位提昇
C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 6
Inverter Delay Estimate
2C 2C
2C 2C
2 Y 2
A Y
1 1 R C
C
R C C
output
C
A=1
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 7
Inverter Delay Estimate
2C 2C
2C 2C
2 Y 2
A Y
1 1 R C
C
R C C
C tpd = 6RC
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 8
Example: 3-input NAND
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 9
Example: 3-input NAND
Step 1
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 10
3-input NAND Caps
We still assume C=Cg=Cdiff
2 2 2
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 11
3-input NAND Caps
We assume C=Cg=Cdiff
3C
Step 2 3
3C
3C
3
3C
3C
3
3C
3C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 12
3-input NAND Caps
Final result 2 2 2
3 9C
5C
3 3C
5C
3 3C
5C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 13
Elmore Delay Model
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 14
Example: 2-input NAND
2 2 Y
h copies
A 2
B 2x
2-input NAND
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 15
Example: 2-input NAND
2 2 Y
A 2 6C 4hC h copies
B 2x 2C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 16
Example: 2-input NAND
2 2 Y
A 2 6C 4hC h copies
B 2x 2C
Worst case rising: 只有一個pMOS開啟
R
Y t pdr = ( 6 + 4h ) RC
(6+4h)C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 17
Example: 2-input NAND
2 2 Y
A 2 6C 4hC h copies
B 2x 2C
Worst case falling:A已經為”1”;B的電位開始上升
R R R
x R/2 Y t pdf = ( )2C + ( + )[(6 + 4h)C ]
R/2 2C (6+4h)C 2 2 2
= (7 + 4h) RC
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 18
Delay Components
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 19
Contamination Delay
2 2 Y
A 2 6C 4hC
B 2x 2C
R
tcdr = ( )[(6 + 4h)C ]
R R 2
Y
(6+4h)C = (3 + 2h) RC
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 20
Contamination Delay
2 2 Y
A 2 6C 4hC
B 2x 2C
R R
tcdf = ( + )[(6 + 4h)C ]
2 2
= (6 + 4h) RC
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 21
Delay Components
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 22
Diffusion Capacitance 2 2 2
3 9C
5C
3 3C
5C
3 3C
5C
We assumed contacted diffusion on every s / d.
Good layout minimizes diffusion area Recall 3-input NAND
Ex: NAND3 layout shares one diffusion contact
Reduces output capacitance by 2C
Merged un-contacted diffusion might help too
Gate capacitance: 由電晶體的寬度決定
Diffusion capacitance: 與佈局有關
2C 2C
Shared
Contacted
Diffusion Isolated
Contacted 2 2 2
Merged Diffusion
Uncontacted 3 7C
Diffusion 3 3C
3C 3C 3C 3 3C
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 23
Layout Comparison
Y Y
GND GND
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 24
Introduction
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 25
Example
Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded
automotive processor. Help Ben design the decoder for a register file.
A[3:0] A[3:0]
32 bits
Decoder specifications:
4:16 Decoder
16 word register file
16 words
16
Each word is 32 bits wide Register File
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 26
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 27
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 28
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 29
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 30
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 31
Delay in a Logic Gate
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 32
Delay in NAND2
2x 2C
d= f+p B
Normalized delay representation
4hC
= +2
3C
註:Ideal Inverter (無寄生電容、
4 Single Fanout) τ = 3RC;g = 1
= ( ) h + 2 = gh + p
3
How to calculate the logical effort g ?
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 33
Delay Plots
d =f+p 2-input
6
NAND Inverter
= gh + p g=
Normalized Delay: d
5 p=
d=
4 g=
p=
3 d=
0
0 1 2 3 4 5
Electrical Effort:
h = Cout / Cin
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 34
Delay Plots
d =f+p 2-input
6
NAND Inverter
= gh + p g = 4/3
Normalized Delay: d
5 p=2
d = (4/3)h + 2
4 g=1
What about p=1
NOR2? 3 d=h+1
2 Effort Delay: f
1
Parasitic Delay: p
0
0 1 2 3 4 5
Electrical Effort:
h = Cout / Cin
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 35
Computing Logical Effort
2 2 A 4
Y
2 B 4
A 2
A Y Y
1 B 2 1 1
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 36
Catalog of Gates
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 37
Recall: 4:1 Multiplexer
D0
D2
大型多工器的速度與小型多工器
D3
的速度相當?
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 38
Computing Parasitic Delay
DEF:
Represents delay of gate driving no load
Set by internal parasitic capacitance
Measure from delay vs. fanout plots
Or estimate by counting total diffusion capacitances at
output
2 2 A 4
Y
2 B 4
A 2
A Y Y
1 B 2 1 1
增加電晶體大小會降低電阻,但卻會增加寄生電容,但較
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 39
寬的電晶體可以摺疊,使內部的寄生電容可能變小
Catalog of Gates
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 40
Example: FO4 Inverter
Logical Effort: g=
Electrical Effort: h=
Parasitic Delay: p=
Stage Delay: d=
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 41
Example1: FO4 Inverter
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 42
Example2: Ring Oscillator
如果g ≠ 1,那如何計算多級邏輯網路的延遲呢?
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 43
Multistage Logic Networks
Cout-path
Path Electrical Effort H= 與電晶體大小有關
Cin-path
Path Effort F = ∏ f i = ∏ gi hi
10
x z
y
20
g1 = 1 g2 = 5/3 g3 = 4/3 g4 = 1
h1 = x/10 h2 = y/x h3 = z/y h4 = 20/z
F=GH=(5/3)(4/3)(x/10)(y/x)(z/y)(20/z)=G(20/10)
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 44
Multistage Logic Networks
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 45
Branch Paths
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 46
Paths that Branch
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 47
Branching Effort
Branching effort
Accounts for branching between stages in path
∏h
Path branching effort
i = BH
Now we compute the path effort
F = GBH
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 48
Multistage Delays
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 49
Designing Fast Circuits
D = ∑ d i = DF + P
Delay is smallest when each stage bears same effort
1
fˆ = gi hi = F N
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 50
Choose Gate Sizes to Meet the Least
Delay Design
How wide should the gates be for least delay?
ˆf = gh = g Cout
Cin
gi Couti
⇒ Cini =
fˆ
Working backward, apply capacitance transformation to
find input capacitance of each gate given load it drives.
Check work by verifying input cap spec is met.
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 51
Example: 3-stage path
y
x
45
A 8
x
y B
45
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 52
Example: 3-stage path
x
y
x
45
A 8
x
y B
45
Logical Effort G=
Electrical Effort H=
Branching Effort B=
Path Effort F=
Best Stage Effort fˆ =
Parasitic Delay P=
Delay D=
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 53
Example: 3-stage path
x
y
x
45
A 8
x
y B
45
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 54
Example: 3-stage path
y
x
45
A 8
x
y B
45
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 55
Example: 3-stage path fˆ = gh = g CCoutin
gi Couti
⇒ Cini =
Work backward for sizes fˆ
y = 45 * (5/3) / 5 = 15
x = (15+15) * (5/3) / 5 = 10
45
A P: 4
P: 4
N: 4 P: 12 B
N: 6 45
N: 3
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 56
Example: 3-stage path
45
A P: 4
P: 4
N: 4 P: 12 B
N: 6 45
N: 3
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 57
Best Number of Stages
D =
DatapathLoad 64 64 64 64
N: 1 2 3 4
f:
D:
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 58
Best Number of Stages
8 4 2.8
D = NF1/N + P
= N(64)1/N + N
16 8
23
DatapathLoad 64 64 64 64
N: 1 2 3 4
f: 64 8 4 2.8
D: 65 18 15 15.3
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw Fastest 59
Derivation for Best Number of Stages
D = NF + ∑ pi + ( N − n1 ) pinv
1 n1Stages
N
Path Effort F
i =1
∂D 1 1 1
= − F ln F + F + pinv = 0
N N N
∂N
ρ=F
1
N
Define best stage effort
pinv + ρ (1 − ln ρ ) = 0
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 60
Best Number of Stages
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 61
Sensitivity Analysis
D(N) /D(N)
1.4
1.26
1.2 1.15
If pinv = 1
1.0
0.0
0.5 0.7 1.0 1.4 2.0
N/ N
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 62
Example, Revisited
Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded
automotive processor. Help Ben design the decoder for a register file.
A[3:0] A[3:0]
32 bits
Decoder specifications:
16 word register file
4:16 Decoder
16 words
16
Each word is 32 bits wide Register File
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 63
Possible 4:16 Decoder
10 10 10 10 10 10 10 10
y z word[0]
y z word[15]
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 64
Number of Stages
Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B=8
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 65
Gate Sizes & Delay
10 10 10 10 10 10 10 10
y z word[0]
y z word[15]
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 66
Comparison
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 67
Review of Definitions
logical effort g G = ∏ gi
H=
Cout-path
electrical effort h= Cout
Cin Cin-path
Con-path + Coff-path
branching effort b= Con-path B = ∏ bi
effort f = gh F = GBH
effort delay f DF = ∑ f i
parasitic delay p P = ∑ pi
delay d= f +p D = ∑ d i = DF + P
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 68
Method of Logical Effort
gi Couti
6) Find gate sizes Cini =
fˆ
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 69
Limits of Logical Effort
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 70
Summary
DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 71