You are on page 1of 42

Delay Modeling and Static Timing

Verification

Prof. Kurt Keutzer


Michael Orshansky
EECS
University of California
Berkeley, CA
1

RTL Synthesis Flow

HDL

RTL
Synthesis

netlist
a 0 d
q
b 1

Library s clk

logic
optimization
a 0 d
q

netlist b 1

s clk

physical
design

layout

2
Design Process

Design : specify and enter the


design intent

Verify:
verify the Implement:
correctness of refine the
design and design
implementation through all
phases

Implementation Verification
HDL

RTL manual
Synthesis design

netlist a

b
0
1
d
q
Is the
Library/
logic
s clk
implementation
module
generators optimization consistent
with the original
netlist
a

b
0

1
d
q design intent?
s clk

physical Is what I
design implemented
what I
layout
wanted?
4
Implementation verification for ASIC’s

HDL Apply gate-level


simulation (‘‘the
RTL manual golden
Synthesis design simulator’’) at
each step to verify
netlist a 0 d
q

functionality:
b 1

Library/
module logic
s clk
• 0-1 behavior on
generators optimization a 0 d
regression test
b 1
q
set
s clk

netlist and timing:


ASIC • maximum delay
physical signoff of circuit across
design critical paths

layout

Software Simulation

a
Simulation
Simulation
0 d

q
monitor
driver 1

(yes/no)
b

and
(vectors)
s
clk
speed

Advantages of gate-level simulation


z verifies timing and functionality simultaneously
z approach well understood by designers
Disadvantages of gate-level simulation
z computationally intensive - only 1 - 10 clock cycles of 100K gate design
per 1 CPU second
z incomplete - results only as good as your vector set - easy to overlook
incorrect timing/behavior

6
Alternative - Static Sign-off
HDL Use static
analysis
RTL manual techniques to
Synthesis design verify:

netlist a 0 d
q
functionality:
• formal
b 1

Library/
module logic
s clk
equivalence-
generators optimization a 0 d
checking
b 1
q
techniques – we’ll
s clk talk about this
netlist later
ASIC
physical signoff and timing:
design • use static timing
analysis
layout

Different Roles of Timing Analysis

Optimization is only relevant when there exists an objective


function
For many circuits the primary objective function is speed
Timing verification
z Before fabrication, ensure a chip meets its timing requirements

Timing-driven optimization – give fast accurate timing


information to guide tools as they evolve the chip
z Logic synthesis
z Placement
z Routing

8
Approach of Static Timing Verification

Combinational
Combinational Combinational logic
logic logic

clk
clk clk

• determine fastest permissible clock speed (e.g. 100MHz)


by determining delay (including set-up and hold time) of
longest path from register to register (e.g. 10ns.)

•largely eliminates need for gate-level simulation to verify the delay


of the circuit

Cycle Time - Critical Path Delay

Cycle time (T) cannot be smaller


than longest path delay (Tmax) cycle time
Longest (critical) path delay is a data
function of: Tclock1
Total gate, wire delays
Tmax ≤ T
z logic levels

Q1 Q2

Tclock1 critical path,


~5 logic levels Tclock2
clock
10
Cycle Time - Setup Time

For FFs to correctly latch data,


setup time
it must be stable during:
• Setup time (Tsetup) before
data
clock arrives Tclock1

Tmax + Tsetup ≤ T

Q1 Q2

Tclock1 critical path,


~5 logic levels Tclock2
clock
11

Cycle Time - Clock-skew

If clock network has unbalanced data


delay – clock skew
Tclock1
Cycle time is also a function of Tclock2
clock skew (Tskew) Q2
clock skew

Tmax + Tsetup + Tskew ≤ T

Q1 Q2

critical path,
Tclock1 ~5 logic levels Tclock2
clock
1212
Cycle Time - Clock to Q

Cycle time is also a function data


of propagation delay of Tclock1
FF (Tclk-to-Q)
Tclock2
Tclk-to-Q : time from arrival of Q2
clock signal till change at clock-to-Q
FF output)
Tmax + Tsetup + Tskew + Tclk − to −Q ≤ T

Q1 Q2

Tclock1 critical path,


~5 logic levels Tclock2
clock
13

Min Path Delay - Hold Time

For FFs to correctly latch data, hold time


data must be stable during:
data
• Hold time (Thold) after clock Tclock1
arrives
Determined by delay of shortest Tmin ≥ Thold + Tskew
path in circuit (Tmin) and
clock skew (Tskew)

Q1 Q2

Tclock1 short path, ~3


logic levels Tclock2
clock
14
One more time
cycle time

hold time –
D stable
after clock

set-up time – D stable


before clock

When signal
may change

Example of a single phase clock

15

Elements of Timing Verification

To verify circuit timing need


z Accurate delay calculation
z Timing analysis engine

Delay calculation
z Delay numbers for gates
z Delay numbers for wires

Timing analysis engine


z Circuit path analysis
z Integrating clock network and FF/latches

16
Delay Modeling and Delay Computation
Single path delay computation

tG1 tG2
tW1 tG3

D D
To simulate complex circuits, need accurate models of
z Gate delay
z Interconnect delay
Path delay = sum of 50%
Vdd
100% propagation delays

50%

tG1 tG2 tW1 tG3 Time 17

Gate Delay Modeling Requirements


Fast delay evaluation
z To enable full chip simulation
z Analytical models and look-up tables

Conservative delay models


z STA determines longest path delay under all possible
conditions

To enable fast tractable computation, have to give up on many


modeling details
z Input pattern dependencies
z Complex dynamic behavior is captured through tables

18
Gate Timing Characterization

A
CL
B D F

CL

“Extract” exact transistor characteristics from layout


z Transistor width, length, junction area and perimeter
z Local wire length and inter-wire distance

Compute all transistor and wire capacitances

19

Cell Timing Characterization

Delay tables generated using a detailed transistor-level


circuit simulator SPICE (differential-equations solver)

For a number of different input slews and load capacitances


simulate the circuit of the cell
z Propagation time (50% Vdd at input to 50% at output)
z Output slew (10% Vdd at output to 90% Vdd at output)

tslew
Vdd

tpd

Time
20
Scope of Variation

Inter-Die Variation
Intra-Die Variation

• As an ASIC vendor we
must ensure that our
%
N(µ,σ)
chips work across this
range of variation
• This is a very restrictive
condition and will be
responsible for many
conservative
assumptions throughout
timing analysis

21

Device Worst Case Model


Comparison Corner vs Real Data

FF FF: Fast NMOS


&
SF Fast PMOS

FS SF: Slow NMOS


&
SS Fast PMOS

3 corner Model:
TT, SS, FF
Corners from SPICE document
5 corner model:
Must use worst case assumptions for all
Slow/set-up and Fast/hold analysis 22
How Is Gate Delay Computed?
Non-linear effects reflected in tables
DG = f (CL, Sin) and Sout = f (CL, Sin)
z Non-linear

Interpolate between table entries


Interpolation error is usually below 10% of SPICE

Output Output
Capacitance Capacitance

Input Intrinsic Input Output


Slew Delay Slew Slew

Delay at the gate Resulting waveform


23

Conservatism of Gate Delay Modeling


True gate delay depends on input arrival time patterns
z STA will assume that only 1 input is switching
z Will use worst slope among several inputs

Vdd
A

D
A F
B tpd F
B CL

Time
Vdd
A
tpd F

Time 24
Elements of Timing Verification

To verify circuit timing need


z Accurate delay calculation
z Timing analysis engine

Delay calculation
z Delay numbers for gates
z Delay numbers for wires

Timing analysis engine


z Circuit path analysis
z Integrating clock network and FF/latches

25

Interconnect Impact on Chip

Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic
26
Wire-Dominated Chips
Wiring requirements grow dramatically

Number of interconnect layers (6-8)

Interconnect effects
z Large RC and RLC delays
z Inter-wire coupling

Interconnect becomes a
dominant factor in
limiting chip performance

TSMC Copper Process


27

Wire Delay Modeling

Lumped RC model
z Simple: R – total resistance, Vin R Vout
C – total capacitance
z Pessimistic and inaccurate C
z Spurious oscillations

Distributed RC model
z Required for longer interconnect lines
z Exact solution requires solving “diffusion equation”
z No closed-form solution - approximations

R1 R2 RN
C1 C2 CN

28
Wire Delay Modeling: Elmore Constant

Elmore Delay Constant


z Dominant time constant for step input
z Works for branch-less distributed RC networks
z No floating caps, grounded resistors

R1 R2 RN
C1 C2 CN

N i
τN = ∑ Ci ∑ R j = C1R1 + C2 (R1 + R2 ) + ... + Ci (R1 + R2 + ... + Ri )
i =1 j =1

29

Wire Delay Modeling: AWE


AWE (Asymptotic Waveform Evaluation) is a qth-order extension of the
Elmore delay for general RLC circuits

A generalized approach to linear RLC network response approximations


z Floating cap, grounded resistors, inductors
z Non-zero input transition times
z Initial conditions

Produces a reduced qth order model of transient response


z Trade-off between model order (complexity) and model accuracy

The q unknown time- constants, or poles, of the circuit, are


obtained through a moment matching technique (a Padé approximation)

A first-order AWE approximation reduces to RC tree methods


30
Constructing RC Network

Resistance estimation (extraction) is easy


Lw
Rw = ρ
W ⋅T

Capacitance estimation (extraction) is difficult


z Analytically
z Using electromagnetic 3D simulators / extractors

C12 S W

C11 C11 T
C10 H

31

Capacitance: The Parallel Plate Model

Current flow

W Electrical-field lines

tdi Dielectric Jan M. Rabaey


Anantha Chandrakasan
Borivoje Nikolic
Substrate

ε di S 1
cint = WL SCwire = =
t di S ⋅ SL SL
32
Interwire Capacitance

fringing parallel

33

Deriving Capacitance Information

Wire delays determined by layout S W


z Wire-to-wire spacing
T
z Wire-length
H

Problem: wire-to-wire capacitance and wire-length not known


until placement and routing of all circuit components

Two modes of evaluating wire delay


z Post P&R: 2, 2&1/2 or 3-D interconnect parasitic extraction (e.g.
capacitance and resistance), distributed RC delay models
z Synthesis: tabular data based on prior chips

34
Wire Load Models
Synthesis must produce circuits that meet the designer’s timing
constraints

Wire delay based on gathered empirical data from past chips or


estimated by typical (average) wirelength given by Rent’s rule – see
extra slides at the end of the lecture
z Function of FO
5
z Function of block size
4
fanout 3 cap
Lumped RC model used 2

1
100 500 1000 2000 3000
block size (gates)

35

Delay Modeling: Review

Components of cycle time


z Critical path delay, setup time, clock skew, clock-to-Q

Gate delay modeling


z Fast delay evaluation: look up tables
z Conservative models: worst-case assumptions
z Derived by running SPICE simulations of cells in the library

Wire delay modeling


More accurate
z Elmore model, RC tree, AWE
z Statistical wirelength modeling
Typically table look ups – or extracted values are used

36
Library (.lib) Embodies Gate/Wire Delay Info
Contains for each cell:
z Functional information: cell = a *b * c
z Timing information: function of
z input slew
z intrinsic delay

z output capacitance
Library
non-linear models used in tabular approach
z Physical footprint (area)
z Power characteristics

Wire-load models - function of


z Block size
z Fan-out

37

Elements of Timing Verification

To verify circuit timing need


z Accurate delay calculation
z Timing analysis engine

Delay calculation
z Delay numbers for gates
z Delay numbers for wires

Timing analysis engine


z Considering clock network and FF/latches
z Circuit path analysis

38
Elements of Static Timing Verification - 2

Combinational
Combinational Combinational logic
logic logic

clk
clk clk

Clocking issues
z Regimes: single-phase, two-phase, multi-phase
z overlapping, non-overlapping
z qualified clocks, clock skew

39

Clock Skew
The clock may be delivered to each FF at a different,
unsynchronized, time

From Dennis Sylvester


40
Elements of Static Timing Verification - 3

Combinational
Combinational Combinational logic
logic logic

clk
clk clk

Delay calculation procedure


z Longest graphical path
z Longest true delay
Method of calculation
z ``Batch mode’’
z Incremental
41

Typical Simplifications

Clocking issues
z clocking regime treated as orthogonal issue

Delay modeling
z gate - fixed pin-to-pin delays,
z interconnect - non-linear effects captured in tables
z Process-voltage-temperature - worst-case values used

Delay calculation
z ``extract’’ combinational logic from sequential circuit
z Choose for accuracy
z Simple ``longest-path analysis’’
z Boolean analysis excluding false paths

42
Elements of Timing Verification

To verify circuit timing need


z Accurate delay calculation
z Timing analysis engine

Delay calculation
z Delay numbers for gates
z Delay numbers for wires

Timing analysis engine


z Considering clock network and FF/latches
z Circuit path analysis
z Topologically/graphically based

z Including Boolean/functional pruning

43

Approach -reduce to combinational

Combinational
Combinational Combinational logic
logic logic

clk
clk clk
original circuit

Combinational
logic

extracted block
Each combinational block
.05 X
Arrival time in 0
green A .20
W 2
.10 1
.05 Z
Interconnect 0 C Y 2 f
delay in red .15
.20
2
B .20
1 .05
Gate delay in
blue
What’s the right
mathematical
object to use to
represent this
physical object?

45

Problem formulation - 1
Use a labeled .05 X
0
directed graph .20
W
A 2
G = <V,E> 1
.10 Z
.05
Vertices represent 0 C Y 2 f
gates, primary .15
.20
2
inputs and B .20
primary outputs 1 .05
Edges represent
A
0 X
wires 2
0 1 .20
Labels represent 0
C .1 .05 2 .15
W f
delays .2 Z
.20
Now what do we do 1 2
1 Y
with this? B
46
Problem formulation - Arrival Time
Arrival time A(v) for a node v is time when signal arrives at
node v
X
A(X) dx → z
A(Z)
Z
A(Y) dY → z
Y

A( υ) = max (A(u) + du →υ )
u∈FI( υ )

where dυ→u is delay from υ to u, FI(υ) = {X,Y}, and υ = {Z}.

47

Problem formulation - 2
Use a labeled .05 X
0
directed graph .20
W
A 2
G = <V,E> 1
.10 Z
.5
Enumerate all paths 0 C Y 2 f
- choose the .15
.20
2
longest? B .20
1 .05
A
0 X 2
0 1 .20
0
C .1 .5 2 .15
W f
.2 Z
.20
1
1 Y 2
B
48
Problem formulation - 3
0 A (Kirkpatrick 1966, IBM JRD)
0 X
2 Complexity?
0 1 .20
0
0 C0 .1 .5 2 .15 0
W f
.2 Z
Origin .1 0 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
49

Algorithm Execution
0 AO
0 X
2
0 1 .20
0
0 C0 .1 .5 2 .15 0
W f
.2 Z
Origin .1 0 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
50
Algorithm Execution
0 AO
0 X
2
0 1 .20
0 .15 0
0 0C .1 .5 2
0 W f
.2 Z
Origin .1 0 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
51

Algorithm Execution
0 AO
0 X
2
0 1 .20
0
0
0 C0 .1 .5 2 .15 0
W f
.2 Z
Origin .1 0 .1 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
52
Algorithm Execution
0 AO 2 X
0 2.0
0 1 .20
0
0
0 C0 .1 .5 2 .15 0
W f
.2 Z
Origin .1 0 .1 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
53

Algorithm Execution
AO
0 X
2
0 1 1.1 .20
0
0 C0 .1 .5 2 .15 0
W f
.2 Z
Origin .1 .1 .20
Y 2
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay (w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
54
Algorithm Execution
AO
0 X
2
0 1 1.1 .20
0
0 C0 .1 .5 2 .15 0
W f
2.2
.2 Z
Origin .1 .1 .20
1 Y 2
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
55

Algorithm Execution
AO
0 X
2
0 1 1.1 .20
0
0 C0 .1 .5 2 .15 0
W f
.2 3 Z
Origin .1 .1 .20
1 Y 2
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
56
Algorithm Execution
AO 2 X
0 3.6
0 1 1.1 .20
0
0 C0 .1 .5 2 .15 0
W f
.2 3 Z
Origin 1 0
2
.20
1 Y
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
57

Algorithm Execution
AO 2 X
0 3.6
0 1 1.1 .20
0
0 C0 .1 .5 2 5.8 .15
0
W f
2
.2 Z
Origin 1 0 .20
Y 3
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
58
Algorithm Execution
AO 2 X
0 3.6
0 1 1.1 .20
0
0 C0 .1 .5 2 5.8 .15
0
W f
2
.2 Z
Origin 1 0 .20
Y 3
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
59

Algorithm Execution
AO 2 X
0 3.6
0 1 1.1 .20 5.95
0
0 C0 .1 .5 2 5.8 .15
W f
2
.2 Z
Origin 1 0 .20 0
Y 3
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
60
Critical Path (sub-graph)
AO 2 X
0 3.6
0 1 1.1 .20 5.95
0
0 C0 .1 .5 2 5.8 .15
W f
2
.2 Z
Origin 1 0 .20 0
Y 3
1
B
Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
of the DAG)
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed
add w to W
}
Longest path(G)
Forward_prop(Origin)
}
61

Timing for Optimization: Extra


Requirements
Longest-path algorithm computes arrival times at each node
If we have timing constraints, need to propagate slack to
each node
z A measure of how much timing margin exists at each node
z Can optimize a particular branch
Can trade slack for power, area, robustness

clock
62
Required Time

Required time R(v) is the time before which a signal must


arrive to avoid a timing violation
Y
R(Y)
X dx → Y
R(X)
dX → z
R(Z)
Z

Required time is user defined at output:


R(v) = T - Tsetup

Then recursively
R( υ) = min (R(u) − dυ→u )
u∈FO( υ )

where FO( υ ) = {Y,Z} and υ = {X}


63

Required Time Propagation: Example

AO 2 X
0 3.6
0 1.45 1 1.1 .20 5.95
0 C 0 .1 .5 3.45 2 5.8 .15
0
0.95
W f
2 Z 5.65 5.80
-0.15 .2
Origin 0 .20 0
1 3
1 Y
0.45 3.45
B

Assume required time at output R(f) = 5.80


Propagate required times backwards

64
Timing Slack

From arrival and required time can compute slack. For each
node v:
S( υ) = R( υ) − A( υ)

Slack reflects criticality of a node


Positive slack
z Node is not on critical path. Timing constraints met.
Zero slack
z Node is on critical path. Timing constraints are barely met.
Negative slack
z There is a timing violation

Slack distribution is key for timing optimization!


65

Timing Slack Computation: Example

A
X
1.45
C -0.15
-0.15
W f
-0.15 Z -0.15 -0.15
Origin
Y
0.45 0.45
B

Compute slack at each node


S( υ) = R( υ) − A( υ)

66
Timing Slack Properties R. Nair et al, “Generation of Performance
Constraints for Layout”, TCAD 1989.

Path through a timing graph: ρ =< v1,v 2 ,...,vk >


k −1
Path slack: S(ρ) = R(vk ) − A(v1) − ∑ dv → v
i i+1
i =1
Path slack reflects slack at output assuming no other path in
circuit is active

Lemma 1: For any path ρ =< v1,v 2 ,...,vk > through circuit,
S(vi ) ≤ S(ρ), for 1 ≤ i ≤ k

Lemma 2: For each cell v ∈ V there is some path ρv


such that S(v) = S(ρv )

By increasing delay on only one cell in the path, can set path
slack to zero -> other cells also have zero slack.
67

Approach of Static Timing Verification

Combinational
Combinational Combinational logic
logic logic

clk
clk clk

• computing longest path delay


• full path enumeration - potentially exponential
• longest path algorithm on DAG (Kirkpatrick 1966, IBM JRD)
(O(v+e) or O(g + p))
• Currently in successful application on even the largest (>10M
gate) circuits
•has two challenges:
•asynchronous sub-circuits - limited gate-level simulation
• false paths - ubiquitous and problematic

68
Elements of Timing Verification

To verify circuit timing need


z Accurate delay calculation
z Timing analysis engine

Delay calculation
z Delay numbers for gates
z Delay numbers for wires

Timing analysis engine


z Considering clock network and FF/latches
z Circuit path analysis
z Topologically/graphically based

z Including Boolean/functional pruning

69

Interesting Example: Carry Bypass Adder

pi: carry propagate from state i


so s1
Pi: ai XORbi
mux

co c1 0 c2
Full Adder Full Adder
p1
p0 co 1

a0 b0 a1 b1
and

Lehman, Birla - IRETrans. Electron. Comput. , 1961


V. Oklobdzija - Jrnl. of VLSI Signal Processing, 1991

70
Inside Carry Bypass Adder - 1
late arriving
ci-1
ai
bi si+1

ai 1 ci+1
bi 0

ai+1
bi+1

ai+1
bi+1

Longest graphical/topological path runs along carry chain


from stage to stage
Longest path analysis would identify red path as critical

71

Inside Carry Bypass Adder - 2


late arriving
ci-1
ai
bi si+1

ai 1 needed 1 ci+1
bi 0
1 needed
ai+1
bi+1
1 created
ai+1
bi+1

To sensitize red path we need: ai⊕bi && ai+1⊕ bi+1


But: red path is false because when this condition is true MUX
selects “1” input, i.e. directly from ci-1
Instead shorter green paths are sensitized and red path is not the
critical path of the circuit
False paths first observed by V. Hrapcenko (Soviet Math. Dokl. 1978 )

72
Capturing Functional Behavior in Analysis

Failure to remove false paths leads to conservative delay


estimation

Looking for a path delay calculation approach that :


z Is conservative
z Is not (overly) pessimistic
z Incorporates functional (Boolean) behavior
z Eliminates false paths from analysis

Two approaches to false path elimination


z User-specified path exceptions (still mainstream)
z Automatic false-path detection

73

False Path Detection

Sensitization criterion: a condition under which a signal


transition at gate input will propagate to the gate output

Two-vector (transition mode) condition


z More complex: two-vector condition with (2^n * 2^n) vs. single
vector condition (2^n space)
z Does not satisfy monotone speedup property
z Path delay increased when gate delay decreases

One-vector (floating mode) condition


z Previous node value is (conservatively) indeterminate

74
Chen-Du Path Sensitization Condition

Path from a gate input to the gate output is true iff the input
gives
1. The earliest controlling value
2. The latest non-controlling value if all inputs are non-
controlling

a a
b b

Functional timing analysis


z Test each path for falsity
z Implicit path delay sensitization (PODEM)

75

Enhancements of STA – project ideas

Incremental timing

Incorporation of deep sub-micron effects - crosstalk

Incorporation of physical variation Æ timing variation –


statistical timing

76
Current Status of Static Timing Verification
Static timing verification is now the principal method for verifying
timing in the majority of digital circuits
z ASICs:
z Gate-level models from Semiconductor vendors
z Driven by fast interpolation from tables
z Estimated (wire-load model) or back-annotated
capacitances
z Custom (e.g. Intel microprocessor)/COT (e.g. nVidia, ATI graphics
chips)
z Mixture of above and transistor-level static-timing
verification
z Transistor level based on extracted device and wiring
attributes
z Simulation of each transistor network builds accurate
model

77

Extras

Rent’s rule
RC Models

78
Statistical Wire Length Estimation

Before P&R, need to predict wire


length for each net

Can predict average (typical)


wirelength based on empirical
statistical regularities

Early on data showed that there


exist strong correlation
(Bakoglu)
between number of IO
terminals and number of gates
in the block

79

Rent’s Rule

In 1960, E.F.Rent showed that


Np = K(Ng )β
where Np =no. of external signal connections on a block, Ng = no.
of logic gates in a block, K accounts for # of pins per gate, and
β is Rent’ s constant – is a measure of many of the gates in a
circuit block need to communicate with the outside world.

Rent’ s Rule is very circuit-fabric (architecture) specific

Chip Type Rent’s constant Proportionality Constant K

Static Memory 0.12 6

Microprocessor 0.45 0.82

Gate Array 0.5 1.9

Typical values of Rent’s Coefficient (Bakoglu)


80
Statistical Wirelength Estimation
Estimate average wiring length for point-to-point nets by
applying Rent’s rule recursively:
z Partition the chip into hierarchical divisions
z Estimate the connections between partitions by Rent’ s Rule
Lavg = f(Ng, β)
Wire length also depends on fan out of the net
Lavg (FO) = Lavg (1 + 0.4(FO − 1))

(J. Davis)
Wirelength (gate pitches) 81

Wire Delay Modeling: RC Trees

For RC trees (branching networks), delay can be estimated


using Rubinstein-Penfield Theorem 2

C2
R2
4
s R1 1
C4
R3 R4
C1
3

C3 Ri
i

Ci

If a single dominant time-constant exists, then


N
τi ∝ ∑ Ck Ri,k
k =1
where Ri,k = ∑ R j ⇒ (R j ∈ [path(i → s) I path(k → s)])

E.g. τi == C1R1 + C2R1 + C3 (R1 + R3 ) + C4 (R1 + R3 ) + Ci (R1 + R3 + Ri )


82
Wire Delay Modeling: RC Trees

The theorem also establishes precise bounds on voltage


waveforms

Same limitations as for Elmore constant


z RC- tree topologies (not applicable with inter-nodal capacitances)
z Instantaneous input transition times
83

You might also like