Digital VLSI Design Timing Analysis: Semester B, 2021-22 Lecturer: Zvika Webb 21 March 2022
Digital VLSI Design Timing Analysis: Semester B, 2021-22 Lecturer: Zvika Webb 21 March 2022
Lecture 5:
Timing Analysis
Semester B, 2021-22
Lecturer: Zvika Webb
21 March 2022
Disclaimer: This course was prepared, in its entirety, by Adam Teman and Zvi Webb. Many materials were copied from sources freely available on the internet. When possible, these sources
have been cited; however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be
cited or removed, please feel free to email webb@[Link] and I will address this as soon as possible.
Lecture Outline
• Sequential Clocking
• Static Timing Analysis (STA)
• Design Constraints
• Timing Reports
• Multi-Mode Multi-Corner
• Advance STA
2
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner
Sequential Clocking
Synchronous Design - Reminder
• The majority of digital designs are Synchronous
and constructed with Sequential Elements.
• Synchronous design eliminates races
(like a traffic light).
• Pipelining increases throughput.
• We will assume that all sequentials are
Edge-Triggered, using D-Flip Flops as registers.
• D-Flip Flops have three critical timing parameters:
• tcq – clock to output: essentially a propagation delay
• tsetup – setup time: the time the data needs to arrive before the clock
• thold – hold time: the time the data has to be stable after the clock
4
Timing Parameters - tcq
• tcq is the time from the clock edge until the data
appears at the output.
• The tcq for rising and falling outputs is different.
clk
clk
6
Timing Parameters - thold
• thold - Hold time is the time the data has to be stable after the clock
to ensure correct sampling.
clk
7
Timing Constraints
• There are two main problems that can arise in synchronous logic:
• Max Delay: The data doesn’t have enough time to pass
from one register to the next before the next clock edge.
• Min Delay: The data path is so short that it passes through
several registers during the same clock cycle.
8
Setup (Max) Constraint
• Let’s see what makes up our clock cycle:
• After the clock rises, it takes tcq for the data to propagate to point A.
• Then the data goes through the delay of the logic to get to point B.
• The data has to arrive at point B, tsetup before the next clock.
• In general, our timing path is a race:
• Between the Data Arrival, starting with the launching clock edge.
• And the Data Capture, one clock period later.
clk D Q Logic D Q
A B
D tcq
A clk
B tsetup
9
Setup (Max) Constraint
Launch Path
margin
Capture Path
clk
Logic
D A B
tcq
B clk
thold
11
Hold (Min) Constraint
Launch Path
margin
Capture Path
13
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner
• Disadvantages:
• Proper circuit functionality is NOT checked
• Must define timing requirements/exceptions
(garbage in garbage out!)
• Limitations:
• Only useful for synchronous design
• Cannot analyze combinatorial feedback loops
• e.g., a flip-flop created out of basic logic gates
• Cannot analyze asynchronous timing issues
• Such as clock domain crossing
• Will not check for glitching effects on asynchronous pins
• Combinatorial logic driving asynch (set/reset) pins of sequential elements will not be checked for glitching
15
Timing Paths
• A path is a route from a Startpoint to an Endpoint. D Q A S D Q
• Startpoint (SP)
• Clock pins of the flip flops Clk Clk
• Input ports , a.k.a Primary Inputs (PI)
• Endpoints (EP) D Q B Co D Q
16
Static Timing Analysis
• Four categories of timing paths
• Register to Register (reg2reg) • Input to Register (in2reg)
• Register to Output (reg2out) • Input to Output (in2out)
17
Goals of Static Timing Analysis
• Verify max delay and min delay constraints are met for all paths in a design.
• Start with a Gate-Level Netlist.
• Timing Models are provided for every gate in the library.
• Static Timing Analysis needs to report if any path violates
the max/min delay constraints.
• But is this enough?
• No!
• We want to know all the paths that violate the timing constraints.
• In fact, we want to know the timing of all paths reported in order of length.
• And we want to know where the problems are so we can go about fixing them.
• Let’s see the basic idea of how this can be done.
18
Some basic assumptions
• Our design is synchronous
• In addition, we will only be showing how to deal with combinational elements
and max delay constraints.
• We will assume a pin-to-pin delay model
• In other words, each gate has a single, constant delay from input to output.
• In the real world, gate delay is affected by many factors, such as gate type,
loading, waveform shape, transition direction, particular pin, and random
variation.
• We will see how a real design gets all this data in the next lecture.
• We will take a topological approach
• In other words, we disregard the logical functionality of the gates and therefore,
consider all paths, though some of them cannot logically happen.
• More on this later…
19
Simple path representation
a c
• Let’s say we have the following circuit: b e
d
• And the timing model of our AND gate is: 2
2
• We will build a graph:
• Vertices: Wires, 1 per gate output and 1 for a 2
2
each SP and EP. c
e
• Edges: Gates, input pin to output pin, b 2
1 edge per input with a delay for each edge. d 2
21
How do we compute ATs and RATs?
• Recursively!
• The Arrival Time at a node is just the maximum of the ATs at the predecessor
nodes plus the delay from that node.
• The Required Arrival Time to a node is just the minimum of the RATs at the
successor nodes minus the delay to that node.
0 n SRC
AT n max AT p p, n n SRC
ppred n
T n SNK
RAT n min RAT s n, s n SNK
ssucc n
22
So let’s try to understand AT, RAT, and Slack
If the signal arrives too late, we
get negative slack, which means
there is a timing violation.
Slack
Slack
RAT(n)
AT(n) RAT(n)
AT(n)
RAT: longest logic
AT: longest delay to the capture
logic delay edge of the clock
after launch (dependent on
of clock cycle time)
Launch Clock cycle time (T) Capture Clock cycle time (T)
23
Now let’s see an example
• Just look at this path and try to find the worst path.
• Does it meet a cycle time of T=12 ?
3 g
a 1
d 2
j
5 1
b 4 f
3
c 1 2 k
4
3 h
2 e 5 n
• Now let’s fill in the RAT, AT, and SLACK of each node and:
• Quickly find out if we meet timing
• Figure out what the worst path is
24
Now let’s see an example
• We’ll start by representing it as a
directed acyclic graph (DAG)
• Next, we’ll compute ATs from SRC to SNK
0 1 4 7
1 3 2
a d g j
0 5 1 0
0 0 6 12 15
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 3 15 0
5
2 10
c e n
25
Now let’s see an example
• And now RAT from SNK to SRC
0 -3 1 -2 4 10 7 12
1 3 2
a d g j
0 5 1 0
0 -3 0 -1 6 3 12 12 15 12
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 2 4 3 15 12 0
5
2 10 7
c e n
26
Now let’s see an example
• And finally, we can calculate the slack.
• And guess what – we found the critical path!
0 -3 -3 1 -2 -3 4 10 6 7 12 5
1 3 2
a d g j
0 5 1 0
0 -3 -3 0 -1 -1 6 3 -3 12 12 0 15 12 -3
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 2 2 4 2 3 15 12 -3 0
5
2 10 7 -3
c e n
27
False Paths
• We saw how to find the RAT, AT and Slack at every node.
• All of this can be done very efficiently and be adapted for min timing,
sequential elements, latch-based timing, etc.
• Even better, we can quickly report the order of the critical paths.
• However, this was all done topologically (i.e., without looking at logic).
• Let’s see why this is a problem
This is called a “False Path”
8 2 8 8
a d g 2 a d 2 8 g 2
0 0
f 1 2 j
1 1 1 2 1
b e h 1
2 b e 2 h
1 1 1 1
c i c
28
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner
Design Constraints
Timing Constraints
• “Stupid Question”:
• How does the STA tool know what the required clock period is?
• Obvious Answer…
• We have to tell it!
• We have to define constraints for the design.
• This is usually done using the
Synopsys Design Constraints (SDC) syntax,
which is a superset of TCL.
• Three main categories of timing constraints:
• Clock definitions
• Modeling the world external to the chip
• Timing exceptions
30
Collections
• So you think you know TCL, right?
• Well EDA tools sometimes use a different data
structure called a “collection”
• A collection is similar to a TCL list, but:
• The value of a collection is not a string, but rather a pointer, and we need to
use special functions to access its values.
• For example, if you were to run foreach on a collection, it would just have one
element (the pointer to the collection). Instead, use foreach_in_collection.
• I won’t go into the specifics here (see SynopsysCommandsReference), but
these are some of the collection accessing functions:
wire n1;
• Before starting with constraints, Pin
let’s look at some very useful built in commands: Cell Net
• Note that all of these return collections INVx1 U1 (.in(a),.out(n1));
and not TCL lists! Reference
NANDX3 U2
• These will only work after design elaboration! (.in1(n1),.in2(b),.out(out));
• “get” commands: endmodule
• [get_ports string] – returns all ports that match string.
• [get_pins string] – returns all cell/macro pins that match string.
• [get_nets string] – returns all nets that match string.
• Note that adding the –hier option will search hierarchically through the design.
• “all” commands:
• [all_inputs] – returns all the primary inputs (ports) of the block.
• [all_outputs] – returns all the primary outputs (ports) of the block.
• [all_registers] – returns all the registers in the block.
33
Clock Definitions
• To start, we must define a clock:
• Where does the clock come from? (i.e., input port, output of PLL, etc.)
• What is the clock period? (=operating frequency)
• What is the duty-cycle of the clock?
34
Clock Definitions (2)
• But during synthesis, we assume the clock is ideal, so:
set_ideal_network [get_ports clk]
35
I/O Constraints
• Now that the clock is defined, reg2reg paths are sufficiently constrained.
However, what about in2reg, reg2out, and in2out paths?
• First, what clock toggles an I/O port?
• And what about the time needed outside the chip?
• Note that a better methodology is to define a “virtual clock”, but let’s not confuse you too much at
this point…
36
I/O Constraint (2)
• An alternative approach is to define max delays to/from I/Os:
set_max_delay 5 \
–from [remove_from_collection [all_inputs] [get_ports clk]]
set_max_delay 5 –to [all_outputs]
37
I/O Constraint (3)
• Graphically, we can summarize the I/O constraints, as follows:
Input and
Output Delays
1 1
• In this case, we would define a false path: c
39
Timing Exceptions (2)
• Another common case of a false path is a clock
domain crossing through a synchronizer:
set_false_path –from F1/CP –to F2/D
40
Case Analysis
• A common case for designs is that some value should be assumed constant
• For example, setting a register for a certain operating mode.
• In such cases, many timing paths are false
• For example, if the constant sets a multiplexer selector.
• Or a ‘0’ is driven to one of the inputs of an AND gate.
• To propagate these constants through the design and disable irrelevant timing
arcs, a set_case_analysis constraint is used:
41
Design Rule Violations (DRV)
• You can set specific design rules that should be met, for example:
• Maximum transition through a net.
set_max_transition $MAX_TRAN_IN_NS
42
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner
Timing Reports
43
Check Types
• Throughout this lecture, we have
discussed the two primary timing checks:
• Setup (max) Delay
• Hold (min) Delay
• However, in practice, there are other
categories of timing checks that you will
encounter:
• Recovery
• Removal
• Clock Gating
• Min Pulse Width
• Data-to-Data
44
Recovery, Removal and MPW
• Recovery Check
• The minimum time that an asynchronous control
input pin must be stable after being deasserted and
before the next clock transition (active-edge)
• Removal Check
• The minimum time that an asynchronous control
input pin must be stable before being deasserted and
after the previous clock transition (active edge)
• Minimum Clock Pulse Width (MPW)
• The amount of time after the rising/falling edge of a
clock that the clock signal must remain stable.
45
Clock Gating Check
• Clock gating occurrences are any signals on the clock path
that block (gate) the clock from propagating.
• The enable path of the clock gate must arrive enough time before the clock
itself to ensure glitch-free functionality (and similarly hold after the edge).
Ex. 1: Gating signal should only change Ex. 2: Gating signal should only change
when the clock is in the low state when the clock is in high low state
46
Checking your design
• report_analysis_coverage checks that you have fully constrained your
design. pt_shell> report_analysis_coverage
Type of Check Total Met Violated Untested
-------------------------------------------------------------------------------
setup 6724 5366 ( 80%) 0 ( 0%) 1358 ( 20%)
hold 6732 5366 ( 80%) 0 ( 0%) 1366 ( 20%)
recovery 362 302 ( 83%) 0 ( 0%) 60 ( 17%)
Library
Timing Checks
removal 354 302 ( 85%) 0 ( 0%) 52 ( 15%)
min_pulse_width 4672 4310 ( 92%) 0 ( 0%) 362 ( 8%)
clock_gating_setup 65 65 (100%) 0 ( 0%) 0 ( 0%)
clock_gating_hold 65 65 (100%) 0 ( 0%) 0 ( 0%)
From out_setup 138 138 (100%) 0 ( 0%) 0 ( 0%)
set_output_delay out_hold 138 74 ( 54%) 64 ( 46%) 0 ( 0%)
-------------------------------------------------------------------------------
All Checks 19250 15988 ( 84%) 64 ( 0%) 3198 ( 16%)
48
Report Timing (Max Delay) report_timing
Data Required
Data
Arrival
Time
FF1/clk 5.1ns
1.1ns
Slack is the difference
FF2/D between data arrival and
required.
Data Hold
Required
50 FF2/clk
Report Timing (Min Delay)
report_timing –delay min
Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk)
Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)
Path Group: Clk
Path Type: min
52
Report Timing - Path Groups
• Path groups are categories of paths that are both
optimized and reported separately.
• Default path groups are per ends Clock.
• In addition, paths ending at clock gates are treated separately.
• To create specific path groups for your design, use the group_path command:
53
Report Timing Debugger
• A very good GUI option is to use the PT “Timing Status” tool.
• This tool lets you explore the
timing report interactively, even
showing path schematics, SDC,
and highlighting the path in the
layout.
54
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner
Multi-Mode Multi-Corner
Or how to deal with the corner crisis!
55
More than one operating mode
• During synthesis, we (usually) target timing for a worst-case scenario.
• But, what is “worst-case”?
• Intuitively, that would be a slow corner, (i.e., SS, VDD-10%, 125C)
• No need for hold checking, since clock is ideal (No skew = No hold)
• But, what if there is an additional operating mode?
Mode TEST_MODE FREQ
• For example, a test (scan) mode.
Functional 0 1 GHz
• Do we have to close timing
Test 1 10 MHz
at the same (high) clock speed?
• No problem, we’ll just deal with both modes separately
• Prepare an additional SDC and rerun STA/optimization.
58
Distributed Multi-Scenario Analysis (DMSA)
• DMSA provides efficient unified analysis of multiple PrimeTime analyses (or
scenarios) from a single master PrimeTime
• We can work with multiple analyses as easily as with a single PrimeTime analysis run!
59
Multi-Mode, Multi-Corner
• MMMC to the rescue!
• It’s implemented in a slightly (!) confusing way,
but it really simplifies things.
• The basic concept is that we create analysis views that can then be selected for
setup and hold (max and min) constraints.
Analysis View = “Turbo” Analysis View = “Low Power” Analysis View = “Turbo - Hold”
Advance STA
POCV, GBA vs PBA and SI
63
Yield-driven and Advanced STA
• There are many more concepts, approaches, and terminologies
used in timing analysis for high-yield signoff:
• On-chip Variation (OCV)
• Advanced On-Chip Variation (AOCV)
• Graph Based Analysis (GBA) , Path Based Analysis (PBA)
• Signal Integrity (SI)
• ECO
• and more and more…*
64
OCV -> AOCV -> POCV
65
Traditional Accounting for On Chip Variation (OCV) Effects
Min-Max Slew Cell delay based on
Propagation Min-Max slews
Analyzing for the
Worst case Timing Check Path
Launch
OCV effect Setup Slowest launch and Fastest capture
on timing U3 Path
Hold Fastest launch and Slowest capture
A U6 D Q
Y
D Q B
0.55 U5 Environment:
U4
Environment: “0.60” CP Min-Max Loads
U1 CP
Min-Max Drives
setup/hold
CLK
U2 0.21/0.08
0.62 “0.23/0.10”
One Process Corner “0.65” Capture
Path
Min-Max OCV
Libraries for On-chip variation (OCV) analysis is to recognize the intrinsic
that Corner variability of semiconductor processes and their impact on timing.
i.e, each library cell can have a variation between a maximum
and minimum delay value (for example) due to OCV effect.
Example: OCV Analysis with Timing Derate
# In the SLOW PVT corner: Models an 8% speed-up and a
set_timing_derate –early 0.92 4% slow-down from the SLOW
set_timing_derate –late 1.04 PVT corner
_RESET U6 U7 4% SLOWER
FF2 Slow
Launch Path corner SLOW
Delay 8% FASTER
FF1
CLK1 U1 U2
CRP Fast 7% SLOWER
Capture corner FAST
U4 U5
Path 4% SLOWER
Operating
CLK2 U3 Conditions
~ P / -V / T
Setup Hold
Launch uses LATE derate: Launch uses EARLY derate:
1.04 x SLOW corner delays 0.92 x SLOW corner delays
Capture uses EARLY derate: Capture uses LATE derate:
0.92 x SLOW corner delays 1.04 x SLOW corner delays
Timing Pessimism with OCV Analysis : TFU
The “min-max” OCV delay range of BF1 in the SLOW corner is 90-100 ps:
BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1
2. On actual silicon, the more realistic total delay for ten BF1 buffers will be
between 900ps and 1000ps. Therefore, OCV timing analysis is:
a. Optimistic
b. Pessimistic
4. Since process (P) variation is random, its effects are smaller in a path with:
a. 30 BF1 cells
b. 3 BF1 cells
68
AOCV Calculates Derating Based on Depth and Distance
Advanced On-Chip Variation provides more realistic variable derating by
taking path depth and distance into account
• Random P variation modeled as a Launch Cell Path Depth = 5
function of logical path depth
• Systemic V/T variation modeled as a
function of the maximum physical distance
between related timing path cells and nets
CRPR
CRPR
AOCV depth+distance variation table Common
Common
Point
Point Depth=0 Capture Cell Path Depth = 3
version: 1.0 Depth = 0
object_type: lib_cell
delay_type: cell
rf_type: rise
derate_type: late
object_spec: lib/MUX2_1 Depth 1 2 3 4 5 ...
depth: 1 2 3 4 5 … Variable
1.214 1.153 1.109 1.078 1.056 ...
distance: 100 500 Derate
table: 1.214 1.153 1.109 1.078 1.056 …\ Flat
1.271 1.201 1.145 1.121 1.100 … 1.15 1.15 1.15 1.15 1.15 ...
Derate
69
AOCV Pessimism
Modeling random P variation based on path depth in AOCV can be
pessimistic due to a cell belonging to multiple paths (and multiple depths)
Stage AOCV %
4 Count Derat Increase
e
1 1.160 16.0%
4 1.097 9.7%
6 1.072 7.2%
39 1.034 3.4%
• GBA-based AOCV uses the worst-case 9.7% derate for the shared “AND3”, based on the
shorter 4-stage path, for the 39-stage timing path
• PBA-based AOCV uses the actual path specific 3.4% derate, based on the actual 39-stage
count of the path
70 • GBA AOCV pessimism for the long path = 9.7% - 3.4% = 6.3%
Parametric On-Chip Variation Modeling of Random Variation
• POCV does not rely on path depths to model random variation – instead uses
a Gaussian Distribution model for individual cell delays
• Each cell has a nominal or mean (m) delay and a standard deviation (σ)
delay value
(σ is also called sensitivity)
71
• Eliminates GBA-based pessimism due to random variations
GBA - PBA
72
Fast Analysis != Final Analysis
A[1]
U0 Cell delay = f (i/p slew, o/p cap)
U3
U7
o/p slew = f(Driving cell’s i/p slew, …)
B[2] U10
U1
Two different
U8 input pin slews.
A[2] (Pin A: Slower, Pin B: Faster slew)
U4
B[0]
U12 Z[2]
U5 A
U9 U11
B[1]
B
U2 U6 Worst slew gets propagated by default
A[0]
to calculate U11 cell delay
• STA is fast [doesn’t require simulation]
• The default analysis in PT is called Graph based analysis (GBA)
• Analysis is faster -- Uses worst case slew (through pin U9/A) propagation to calculate U11 cell delay
• Analyzed path through pin U9/B is pessimistic due to U11 cell delay
• The GBA pessimism is OK for earlier PT runs [faster analysis]
• When approaching signoff, this pessimism has to be removed by using path based analysis (PBA)
73
Graph Based Analysis (GBA)
• The default analysis in PT is known as GBA which uses the worst input slew of
a cell to calculate that cell’s output transition
• The calculated output slew will be used to calculate the delays of downstream
cells
• This can lead to pessimism in the analysis of certain paths, ex: when FF2-FF3
path is analyzed using the U1/A pin slew.
(Worst slew)
U 1 /A
FF 1
U1 U2 U3
U 1 /B U1 /Z
FF 3
(late arriving path) (Best slew)
FF 2
74
SI
75
What is Crosstalk?
Crosstalk is the transfer of a voltage transition from multiple switching nets
(aggressors) to another static or switching net (victim) through a coupling
capacitance (Cc).
Switching Aggressor
net 1 Aggressor
Cc
net 2 Victim
Switching Aggressors
Switching Victims
D Q
Shortest
path
Clk
Longest
D Q path
net 1 Aggressor
Cc
net 2 Victim
Noise is a problem,
when the noise bump
crosses the logic
“Static noise” threshold of a down-
stream gate (called a
“logic failure”).
Unsafe noise bumps : Crossing the Logic
Thresholds
If the noise bump crosses the logic thresholds of the downstream gate, the
output of that gate may generate a logic swing Potential problem!
Logic threshold of
down-stream gate
VDD VDD
0.5 VDD 0.5 VDD
0V 0V
If the noise bump does not cross the logic threshold of the downstream
gate, there is no negative effect – the noise bump is considered safe!
80
Example Functional Failures due to Noise
Aggressor Noise on Data = Potential
wrong logic value captured!
Functional Failure
Constraints
• Layout (DEF) + Floorplan
PrimeTime SI Constraints (Tcl)
Coupled SPEF
• Ref Library + Tech Info (LEF)
STA/SI
• RC Parasitics with coordinates
(SPEF)
No • Standard cell spacing rules for
Violations? Signoff advanced technologies
list
82
Recommended ECO Guidance Flow in PrimeTime
Recommended Noise and
Power DRC Fixing Final leakage
ECO
recovery Timing Fixing recovery
guidance
swap HVT_BUF1X
HVT_BUF1X LVT_BUF1X
NVT_BUF1X
NVT_BUF1X
LVT_BUF1X HVT_BUF1X
NVT_BUF1X swap
HVT_BUF1X no swap
Changing
pattern • The changing pattern can be a prefix, suffix or in the middle
• Fixed pattern has to match for the swappable cells
84
Reporting Vt Cell Usage Example
report_cell_usage –pattern_priority {HVT LVT}
Before :
Pattern Combinational cell Sequential cell Total More LVT, Fewer HVT cells
--------------------------------------------------------------------
HVT 82081 ( 2%) 1078 ( 0%) 83159 ( 2%)
LVT 3908613 ( 83%) 696435 ( 15%) 4605048 ( 98%)
Others 4115 ( 0%) 532 ( 0%) 4647 ( 0%)
85
References
• Gil Rahav, BGU
• Gangadharan, Churiwala “Constraining Designs for Synthesis and Timing
Analysis: A Practical Guide to Synopsys Design Constraints (SDC)”, Springer,
2013
• Synopsys Solvnet (PT,Synthesis Quick Reference)
• Cadence Support (+Genus and Innovus Text Command References)
86