100% found this document useful (1 vote)
1K views86 pages

Digital VLSI Design Timing Analysis: Semester B, 2021-22 Lecturer: Zvika Webb 21 March 2022

The document provides an overview of static timing analysis (STA) for digital VLSI design. STA checks the worst case propagation delays for all timing paths in a design to analyze setup and hold constraints without simulation. Timing paths connect startpoints like flip-flop clock pins or inputs to endpoints like flip-flop inputs or outputs. STA is useful for synchronous designs but cannot check functionality or asynchronous timing issues. The analysis relies on accurately defined timing requirements and constraints.

Uploaded by

Shay Samia
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
1K views86 pages

Digital VLSI Design Timing Analysis: Semester B, 2021-22 Lecturer: Zvika Webb 21 March 2022

The document provides an overview of static timing analysis (STA) for digital VLSI design. STA checks the worst case propagation delays for all timing paths in a design to analyze setup and hold constraints without simulation. Timing paths connect startpoints like flip-flop clock pins or inputs to endpoints like flip-flop inputs or outputs. STA is useful for synchronous designs but cannot check functionality or asynchronous timing issues. The analysis relies on accurately defined timing requirements and constraints.

Uploaded by

Shay Samia
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 86

Digital VLSI Design

Lecture 5:
Timing Analysis
Semester B, 2021-22
Lecturer: Zvika Webb
21 March 2022

Disclaimer: This course was prepared, in its entirety, by Adam Teman and Zvi Webb. Many materials were copied from sources freely available on the internet. When possible, these sources
have been cited; however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be
cited or removed, please feel free to email webb@[Link] and I will address this as soon as possible.
Lecture Outline

• Sequential Clocking
• Static Timing Analysis (STA)
• Design Constraints
• Timing Reports
• Multi-Mode Multi-Corner
• Advance STA

2
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Sequential Clocking
Synchronous Design - Reminder
• The majority of digital designs are Synchronous
and constructed with Sequential Elements.
• Synchronous design eliminates races
(like a traffic light).
• Pipelining increases throughput.
• We will assume that all sequentials are
Edge-Triggered, using D-Flip Flops as registers.
• D-Flip Flops have three critical timing parameters:
• tcq – clock to output: essentially a propagation delay
• tsetup – setup time: the time the data needs to arrive before the clock
• thold – hold time: the time the data has to be stable after the clock

4
Timing Parameters - tcq
• tcq is the time from the clock edge until the data
appears at the output.
• The tcq for rising and falling outputs is different.

clk

tcqLH tcqHL tcqLH


5
Timing Parameters - tsetup
• tsetup - Setup time is the time the data has to arrive before the clock
to ensure correct sampling.

clk

tsu tsu tsu

Good! Good! BAD!

6
Timing Parameters - thold
• thold - Hold time is the time the data has to be stable after the clock
to ensure correct sampling.

clk

thold thold thold

Good! Good! BAD!


Q

7
Timing Constraints
• There are two main problems that can arise in synchronous logic:
• Max Delay: The data doesn’t have enough time to pass
from one register to the next before the next clock edge.
• Min Delay: The data path is so short that it passes through
several registers during the same clock cycle.

• Max delay violations are a result of a slow data path,


including the registers’ tsetup, therefore it is often called
the “Setup” path.
• Min delay violations are a result of a short data path,
causing the data to change before the thold has passed,
therefore it is often called the “Hold” path.

8
Setup (Max) Constraint
• Let’s see what makes up our clock cycle:
• After the clock rises, it takes tcq for the data to propagate to point A.
• Then the data goes through the delay of the logic to get to point B.
• The data has to arrive at point B, tsetup before the next clock.
• In general, our timing path is a race:
• Between the Data Arrival, starting with the launching clock edge.
• And the Data Capture, one clock period later.

clk D Q Logic D Q
A B
D tcq

A clk

B tsetup
9
Setup (Max) Constraint
Launch Path

margin

positive clock skew

Capture Path

T  tcq  tlogic  tsetup


Adding in clock skew and other guardbands:

T   skew  tcq  tlogic  tsetup   margin


Hold (Min) Constraint
• Hold problems occur due to the logic changing before thold has passed.
• This is not a function of cycle time – it is relative to a single clock edge!
• Let’s see how this can happen:
• The clock rises and the data at A changes after tcq.
• The data at B changes tpd(logic) later.
• Since the data at B had to stay stable for thold after the clock (for the second
register), the change at B has to be at least thold after the clock edge.

clk
Logic
D A B
tcq

B clk
thold
11
Hold (Min) Constraint
Launch Path

margin

positive clock skew

Capture Path

tcq  tlogic  thold triggered on same clock edge!

Adding in clock skew and other guardbands:

tcq  tlogic   margin  thold   skew


Summary
• For Setup constraints, the data has to propagate fast
enough to be captured by the next clock edge: tlaunch  T  tcapture
• This sets our maximum frequency.
• If we have setup failures, we can T   skew  tcq  tlogic  tsetup   margin
always just slow down the clock.

• For Hold constraints, the data path delay has to


be long enough so it isn’t accidentally captured
by the same clock edge: tlaunch  tcapture
• This is independent of clock period.
• If there is a hold failure,
you can throw your chip away!
tcq  tlogic   margin  thold   skew

13
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Static Timing Analysis


Or why and how to calculate slack.

This section is heavily based on Rob Rutenbar’s “From Logic to Layout”,


Lecture 12 from 2013. For a better  and more detailed explanation, do
yourself a favor and go see the original!
Static Timing Analysis (STA)
• STA checks the worst case propagation of all possible vectors for min/max delays.
• Advantages:
• Much faster than timing-driven, gate-level simulation
• Exhaustive, i.e., every (constrained) timing path is checked.
• Vector generation NOT required

• Disadvantages:
• Proper circuit functionality is NOT checked
• Must define timing requirements/exceptions
(garbage in  garbage out!)

• Limitations:
• Only useful for synchronous design
• Cannot analyze combinatorial feedback loops
• e.g., a flip-flop created out of basic logic gates
• Cannot analyze asynchronous timing issues
• Such as clock domain crossing
• Will not check for glitching effects on asynchronous pins
• Combinatorial logic driving asynch (set/reset) pins of sequential elements will not be checked for glitching

15
Timing Paths
• A path is a route from a Startpoint to an Endpoint. D Q A S D Q
• Startpoint (SP)
• Clock pins of the flip flops Clk Clk
• Input ports , a.k.a Primary Inputs (PI)
• Endpoints (EP) D Q B Co D Q

• Input pins of the flip flops


(except the clock pins) Clk Clk
• Output ports, a.k.a Primary Outputs (PO)
• Memories / Hard macros D Q Ci

• There can be:


• Many paths going to any one endpoint Clk

• Many paths for each start-point and end-point combination

16
Static Timing Analysis
• Four categories of timing paths
• Register to Register (reg2reg) • Input to Register (in2reg)
• Register to Output (reg2out) • Input to Output (in2out)

17
Goals of Static Timing Analysis
• Verify max delay and min delay constraints are met for all paths in a design.
• Start with a Gate-Level Netlist.
• Timing Models are provided for every gate in the library.
• Static Timing Analysis needs to report if any path violates
the max/min delay constraints.
• But is this enough?
• No!
• We want to know all the paths that violate the timing constraints.
• In fact, we want to know the timing of all paths reported in order of length.
• And we want to know where the problems are so we can go about fixing them.
• Let’s see the basic idea of how this can be done.

18
Some basic assumptions
• Our design is synchronous
• In addition, we will only be showing how to deal with combinational elements
and max delay constraints.
• We will assume a pin-to-pin delay model
• In other words, each gate has a single, constant delay from input to output.
• In the real world, gate delay is affected by many factors, such as gate type,
loading, waveform shape, transition direction, particular pin, and random
variation.
• We will see how a real design gets all this data in the next lecture.
• We will take a topological approach
• In other words, we disregard the logical functionality of the gates and therefore,
consider all paths, though some of them cannot logically happen.
• More on this later…

19
Simple path representation
a c
• Let’s say we have the following circuit: b e
d
• And the timing model of our AND gate is: 2

2
• We will build a graph:
• Vertices: Wires, 1 per gate output and 1 for a 2
2
each SP and EP. c
e
• Edges: Gates, input pin to output pin, b 2
1 edge per input with a delay for each edge. d 2

• Finally, add Source/Sink Nodes:


• 0-weight edge to each SP and from each EP. 0 a 2
2
• That way all paths start and end at a single node. SRC
0 c
b 2 e SNK
0
0 2
20 d
Node oriented timing analysis
• If we would enumerate every path, we would quickly get exponential explosion
in the number of paths.
• Instead, we will use node-oriented timing analysis
• For each node, find the worst delay to the node along any path.
• For this, we need to define two important values:
• Arrival Time at a node (AT): the longest path from the source to the node.
• Required Arrival Time at node (RAT): the latest time the signal is allowed to
leave the node to make it to the sink in time.
Slack at node n is defined as:
Slack(n) = RAT(n) – AT(n)

21
How do we compute ATs and RATs?
• Recursively!
• The Arrival Time at a node is just the maximum of the ATs at the predecessor
nodes plus the delay from that node.
• The Required Arrival Time to a node is just the minimum of the RATs at the
successor nodes minus the delay to that node.

 0 n  SRC
AT  n    max  AT p   p, n  n  SRC
 ppred n  
   

 T n  SNK
RAT  n    min  RAT s   n, s  n  SNK
 ssucc n  
   

22
So let’s try to understand AT, RAT, and Slack
If the signal arrives too late, we
get negative slack, which means
there is a timing violation.

Slack
Slack
RAT(n)
AT(n) RAT(n)
AT(n)
RAT: longest logic
AT: longest delay to the capture
logic delay edge of the clock
after launch (dependent on
of clock cycle time)

Launch Clock cycle time (T) Capture Clock cycle time (T)

23
Now let’s see an example
• Just look at this path and try to find the worst path.
• Does it meet a cycle time of T=12 ?
3 g
a 1
d 2
j
5 1
b 4 f
3
c 1 2 k
4
3 h
2 e 5 n
• Now let’s fill in the RAT, AT, and SLACK of each node and:
• Quickly find out if we meet timing
• Figure out what the worst path is

24
Now let’s see an example
• We’ll start by representing it as a
directed acyclic graph (DAG)
• Next, we’ll compute ATs from SRC to SNK
0 1 4 7
1 3 2
a d g j
0 5 1 0
0 0 6 12 15
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 3 15 0
5
2 10
c e n
25
Now let’s see an example
• And now RAT from SNK to SRC

0 -3 1 -2 4 10 7 12
1 3 2
a d g j
0 5 1 0
0 -3 0 -1 6 3 12 12 15 12
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 2 4 3 15 12 0
5
2 10 7
c e n
26
Now let’s see an example
• And finally, we can calculate the slack.
• And guess what – we found the critical path!

0 -3 -3 1 -2 -3 4 10 6 7 12 5
1 3 2
a d g j
0 5 1 0
0 -3 -3 0 -1 -1 6 3 -3 12 12 0 15 12 -3
0 4 3 0
SRC b f 4 k SNK
2
0 1 h
0 2 2 2 4 2 3 15 12 -3 0
5
2 10 7 -3
c e n
27
False Paths
• We saw how to find the RAT, AT and Slack at every node.
• All of this can be done very efficiently and be adapted for min timing,
sequential elements, latch-based timing, etc.
• Even better, we can quickly report the order of the critical paths.
• However, this was all done topologically (i.e., without looking at logic).
• Let’s see why this is a problem
This is called a “False Path”

8 2 8 8
a d g 2 a d 2 8 g 2
0 0
f 1 2 j
1 1 1 2 1
b e h 1
2 b e 2 h
1 1 1 1
c i c
28
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Design Constraints
Timing Constraints
• “Stupid Question”:
• How does the STA tool know what the required clock period is?
• Obvious Answer…
• We have to tell it!
• We have to define constraints for the design.
• This is usually done using the
Synopsys Design Constraints (SDC) syntax,
which is a superset of TCL.
• Three main categories of timing constraints:
• Clock definitions
• Modeling the world external to the chip
• Timing exceptions

30
Collections
• So you think you know TCL, right?
• Well EDA tools sometimes use a different data
structure called a “collection”
• A collection is similar to a TCL list, but:
• The value of a collection is not a string, but rather a pointer, and we need to
use special functions to access its values.
• For example, if you were to run foreach on a collection, it would just have one
element (the pointer to the collection). Instead, use foreach_in_collection.
• I won’t go into the specifics here (see SynopsysCommandsReference), but
these are some of the collection accessing functions:

foreach_in_collection filter_collection copy_collection


index_collection add_to_collection get_object_name
sizeof_collection compare_collections remove_from_collection
sort_collection
31
Design Objects
• Design: A circuit description that performs one or more logical functions (i.e Verilog module).
• Cell: An instantiation of a design within another design (i.e Verilog instance).
• Called an inst in Stylus Common UI.
• Reference: The original design that a cell "points to" (i.e Verilog sub-module)
• Called a module in Stylus Common UI. module foo (a,b,out);
• Port: The input, output or inout port of a Design. input a, b; Design
output out; Port
• Pin: The input, output or inout pin of a Cell in the Design.
wire n1; Pin
• Net: The wire that connects Ports to Pins Net
Cell (inst)
and/or Pins to each other.
INVx1 U1 (.in(a),.out(n1));
• Clock: Port of a Design or Pin of a Cell explicitly
NANDX3 U2 (.in1(n1),.in2(b),.out(out));
defined as a clock source.
• Called a clock_tree in Stylus Common UI. Reference
endmodule (module)
32
module foo (a,b,out);

SDC helper functions input a, b;


output out;
Design
Port

wire n1;
• Before starting with constraints, Pin
let’s look at some very useful built in commands: Cell Net
• Note that all of these return collections INVx1 U1 (.in(a),.out(n1));
and not TCL lists! Reference
NANDX3 U2
• These will only work after design elaboration! (.in1(n1),.in2(b),.out(out));
• “get” commands: endmodule
• [get_ports string] – returns all ports that match string.
• [get_pins string] – returns all cell/macro pins that match string.
• [get_nets string] – returns all nets that match string.
• Note that adding the –hier option will search hierarchically through the design.
• “all” commands:
• [all_inputs] – returns all the primary inputs (ports) of the block.
• [all_outputs] – returns all the primary outputs (ports) of the block.
• [all_registers] – returns all the registers in the block.
33
Clock Definitions
• To start, we must define a clock:
• Where does the clock come from? (i.e., input port, output of PLL, etc.)
• What is the clock period? (=operating frequency)
• What is the duty-cycle of the clock?

create_clock –period 20 –name my_clock [get_ports clk]

• Can there be more than one clock in a design?


• Yes, but be careful about clock domain crossings! (…more later)
• If a clock is produced by a clock divider, define a “generated clock”:

create_generated_clock –name gen_clock \


-source [get_ports clk] –divide_by 2 [get_pins FF1/Q]

34
Clock Definitions (2)
• But during synthesis, we assume the clock is ideal, so:
set_ideal_network [get_ports clk]

• However, for realistic timing, it should have some transition:


set_clock_transition 0.2 [get_clocks my_clock]

• And we may want to add some jitter, so:


set_clock_uncertainty 0.2 [get_clocks my_clock]

• Finally, after building a clock tree, we do not want


the clock to be ideal anymore, so:
set_propagated_clock [get_clocks my_clock]

35
I/O Constraints
• Now that the clock is defined, reg2reg paths are sufficiently constrained.
However, what about in2reg, reg2out, and in2out paths?
• First, what clock toggles an I/O port?
• And what about the time needed outside the chip?

• Define I/O constraints:


• Input and output delays model the length of the path
outside the block:

set_input_delay 0.8 –clock clk \


[remove_from_collection [all_inputs] [get_ports clk]]
set_output_delay 2.5 –clock clk [all_outputs]

• Note that a better methodology is to define a “virtual clock”, but let’s not confuse you too much at
this point…

36
I/O Constraint (2)
• An alternative approach is to define max delays to/from I/Os:
set_max_delay 5 \
–from [remove_from_collection [all_inputs] [get_ports clk]]
set_max_delay 5 –to [all_outputs]

• Additionally, we must model the transitions on the inputs:


set_driving_cell –cell [get_lib_cells MYLIB/INV4] –pin Z \
[remove_from_collection [all_inputs] [get_ports clk]]

• And capacitance of the outputs:


set_load $CIN_OF_INV [all_outputs]

37
I/O Constraint (3)
• Graphically, we can summarize the I/O constraints, as follows:

Input and
Output Delays

Input drive and


output cap modeling
38
Timing Exceptions
• There are several cases when we need to define exceptions that should be
treated differently by STA.
8
• For example, looking into the topology a d 2 8 g 2
0 0
of the network we saw earlier: 1 1 2 1
1
b e 2 h

1 1
• In this case, we would define a false path: c

set_false_path –through [get_pins mux1/I0] –through [get_pins mux2/I0]


set_false_path –through [get_pins mux1/I1] –through [get_pins mux2/I1]

39
Timing Exceptions (2)
• Another common case of a false path is a clock
domain crossing through a synchronizer:
set_false_path –from F1/CP –to F2/D

• Alternatively, this can be defined with:


set_clock_groups –logically_exclusive \
–group [get_clocks C1] –group [get_clocks C2]

• If an equal-phase (divided) slow clock is sending data


to a faster clock, a multi-cycle path may be appropriate:
set_multicycle_path –setup –from F1/CP –to F1/D 2
set_multicycle_path –hold –from F1/CP –to F1/D 1

40
Case Analysis
• A common case for designs is that some value should be assumed constant
• For example, setting a register for a certain operating mode.
• In such cases, many timing paths are false
• For example, if the constant sets a multiplexer selector.
• Or a ‘0’ is driven to one of the inputs of an AND gate.
• To propagate these constants through the design and disable irrelevant timing
arcs, a set_case_analysis constraint is used:

set_case_analysis 0 [get_ports TEST_MODE]

41
Design Rule Violations (DRV)
• You can set specific design rules that should be met, for example:
• Maximum transition through a net.
set_max_transition $MAX_TRAN_IN_NS

• Maximum Capacitive load of a net.


set_max_capacitance $MAX_CAP_IN_PF

• Maximum fanout of a gate.


set_max_fanout $MAX_FANOUT

42
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Timing Reports

43
Check Types
• Throughout this lecture, we have
discussed the two primary timing checks:
• Setup (max) Delay
• Hold (min) Delay
• However, in practice, there are other
categories of timing checks that you will
encounter:
• Recovery
• Removal
• Clock Gating
• Min Pulse Width
• Data-to-Data

44
Recovery, Removal and MPW
• Recovery Check
• The minimum time that an asynchronous control
input pin must be stable after being deasserted and
before the next clock transition (active-edge)
• Removal Check
• The minimum time that an asynchronous control
input pin must be stable before being deasserted and
after the previous clock transition (active edge)
• Minimum Clock Pulse Width (MPW)
• The amount of time after the rising/falling edge of a
clock that the clock signal must remain stable.

45
Clock Gating Check
• Clock gating occurrences are any signals on the clock path
that block (gate) the clock from propagating.
• The enable path of the clock gate must arrive enough time before the clock
itself to ensure glitch-free functionality (and similarly hold after the edge).

Ex. 1: Gating signal should only change Ex. 2: Gating signal should only change
when the clock is in the low state when the clock is in high low state
46
Checking your design
• report_analysis_coverage checks that you have fully constrained your
design. pt_shell> report_analysis_coverage
Type of Check Total Met Violated Untested
-------------------------------------------------------------------------------
setup 6724 5366 ( 80%) 0 ( 0%) 1358 ( 20%)
hold 6732 5366 ( 80%) 0 ( 0%) 1366 ( 20%)
recovery 362 302 ( 83%) 0 ( 0%) 60 ( 17%)
Library
Timing Checks
removal 354 302 ( 85%) 0 ( 0%) 52 ( 15%)
min_pulse_width 4672 4310 ( 92%) 0 ( 0%) 362 ( 8%)
clock_gating_setup 65 65 (100%) 0 ( 0%) 0 ( 0%)
clock_gating_hold 65 65 (100%) 0 ( 0%) 0 ( 0%)
From out_setup 138 138 (100%) 0 ( 0%) 0 ( 0%)
set_output_delay out_hold 138 74 ( 54%) 64 ( 46%) 0 ( 0%)
-------------------------------------------------------------------------------
All Checks 19250 15988 ( 84%) 64 ( 0%) 3198 ( 16%)

• check_timing Performs a variety of consistency and completeness checks


on the timing constraints specified for a design.
47
Report Timing (Max Delay) Analysis
• Perhaps the most important command in any synthesis or place and route tool
is report_timing.
• For convenience, we will look
at the PT syntax and
reports.
• For other tools, the concepts
are similar (perhaps easier).

48
Report Timing (Max Delay) report_timing

Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk)


Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)
Header Path Group: Clk
Path Type: max

Point Incr Path


-----------------------------------------------------------
clock Clk (rise edge) 0.00 0.00
clock network delay (propagated) 1.10 & 1.10
Data FF1/CLK (fdef1a15) 0.00 1.10 r
FF1/Q (fdef1a15) 0.50 & 1.60 r
arrival U2/Y (buf1a27) 0.11 & 1.71 r
0ns 4ns
U3/Y (buf1a27) 0.11 & 1.82 r
FF2/D (fdef1a15) 0.05 & 1.87 r
data arrival time 1.87

clock Clk (rise edge) 4.00 4.00


Data clock network delay (propagated) 1.00 & 5.00
FF2/CLK (fdef1a15) 5.00 r
required library setup time -0.21 4.79
data required time 4.79
------------------------------------------------------------
data required time 4.79
Slack data arrival time -1.87
------------------------------------------------------------
49
slack (MET) 2.92
Report Timing (Min Delay) Analysis
FF1 Data Arrival
FF2
Q
F1 U2
D
0ns 4ns U3 F1
CLK
Clk CLK

Data Required
Data
Arrival
Time

FF1/clk 5.1ns
1.1ns
Slack is the difference
FF2/D between data arrival and
required.
Data Hold
Required
50 FF2/clk
Report Timing (Min Delay)
report_timing –delay min
Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk)
Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)
Path Group: Clk
Path Type: min

Point Incr Path


----------------------------------------------------------
clock Clk (rise edge) 0.00 0.00
clock network delay (propagated) 1.10 & 1.10
FF1/CLK (fdef1a15) 0.00 1.10 r
FF1/Q (fdef1a15) 0.40 & 1.50 f
There are several U2/Y (buf1a27) 0.05 & 1.55 f
0ns 4ns
clues that this is a U3/Y (buf1a27) 0.05 & 1.60 f
FF2/D (fdef1a15) 0.01 & 1.61 f
hold time report. data arrival time 1.61

clock Clk (rise edge) 0.00 0.00


clock network delay (propagated) 1.00 & 1.00
FF2/CLK (fdef1a15) 1.00 r
library hold time 0.10 1.10
data required time 1.10
----------------------------------------------------------
data required time 1.10
data arrival time -1.61
----------------------------------------------------------
slack (MET) 0.51
51
Report Timing – Selecting Paths
• By default, report_timing shows you the most critical path
• i.e., the path with the worst negative slack (WNS)
• But sometimes, we want to analyze a specific path or set of paths.
• For example, I only want to see the paths that come from a primary input…
• Use the –from, – to, – through flags and their variants:
• -from: To select a start point (= input port or register/IP clock pin)
• -to: To select an endpoint (= output port or register/IP data pin)
• -through: to select any other pin
• You can specify direction (i.e., -through_rise), clock (i.e., -clock_from)
report_timing –from ff1/CK –through_fall mux1/I0 –to [all_outputs]

52
Report Timing - Path Groups
• Path groups are categories of paths that are both
optimized and reported separately.
• Default path groups are per ends Clock.
• In addition, paths ending at clock gates are treated separately.
• To create specific path groups for your design, use the group_path command:

group_path –from ff1/CLK –to ff2/D –name my_path

report_timing –group my_path

53
Report Timing Debugger
• A very good GUI option is to use the PT “Timing Status” tool.
• This tool lets you explore the
timing report interactively, even
showing path schematics, SDC,
and highlighting the path in the
layout.

54
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Multi-Mode Multi-Corner
Or how to deal with the corner crisis!

55
More than one operating mode
• During synthesis, we (usually) target timing for a worst-case scenario.
• But, what is “worst-case”?
• Intuitively, that would be a slow corner, (i.e., SS, VDD-10%, 125C)
• No need for hold checking, since clock is ideal (No skew = No hold)
• But, what if there is an additional operating mode?
Mode TEST_MODE FREQ
• For example, a test (scan) mode.
Functional 0 1 GHz
• Do we have to close timing
Test 1 10 MHz
at the same (high) clock speed?
• No problem, we’ll just deal with both modes separately
• Prepare an additional SDC and rerun STA/optimization.

set_case_analysis 1 [get_ports TEST_MODE]


create_clock –period [expr $TCLK/100] –name TEST_CLK [get_ports TEST_CLK]
56
Many, many, corners…
• But real SoCs are much more complex: Mode VDD1 FREQ1 VDD2 FREQ2

• Many operating modes. F1 1.2 V 2 GHz 0.8 V 500 MHz


F2 0.8 V 400 MHz 0.8 V 400 MHz
• Many voltage domains.
F3 Off Off 0.5 V 50 MHz
• With real clock, need to check hold.
TEST 1.2 V 10 MHz 1.2 V 10 MHz
• We easily get to hundreds of corners
• Setup and hold for every mode.
• Hold can be affected by SI 
check hold for all corners
• Temperature inversion – what is the worst case?
• RC Extraction – what is the worst case?
• Leakage – what is the worst case?
• Aaaaarrrrrgggghhhh!
57
The corner crisis
• Traditional approach not feasible

58
Distributed Multi-Scenario Analysis (DMSA)
• DMSA provides efficient unified analysis of multiple PrimeTime analyses (or
scenarios) from a single master PrimeTime
• We can work with multiple analyses as easily as with a single PrimeTime analysis run!

Functional Scan shift Scan JTAG


mode mode capture mode
mode
PT PT worst-case
host host (125ºC, 1.35V) Scenario #1 Scenario #2 Scenario #3 Scenario #4
DMSA network connection
MASTER PT
host typical Scenario #5 Scenario #6 Scenario #7 Scenario #8
(25ºC, 1.50V)
PT PT best-case
host host (0ºC, 1.65V)
Scenario #9 Scenario #10 Scenario #11 Scenario #12

59
Multi-Mode, Multi-Corner
• MMMC to the rescue!
• It’s implemented in a slightly (!) confusing way,
but it really simplifies things.
• The basic concept is that we create analysis views that can then be selected for
setup and hold (max and min) constraints.
Analysis View = “Turbo” Analysis View = “Low Power” Analysis View = “Turbo - Hold”

VDD=1.2V VDD=0.5V VDD=1.3V


Freq=2 GHz Freq=50 MHz Freq=2 GHz
Corner=SS Corner=SS Corner=FF
Temp=125 C Temp=125 C Temp= -40 C
OpMode=“Turbo” OpMode=“LP” OpMode=“Turbo”

Setup checks: “Turbo” and “Sleep” Modes


Hold checks: “Turbo – Hold” Mode
60
Multi-Mode, Multi-Corner - Summary
Timing
Condition
Timing1 Library Set 1
Library Set 2
Condition
Timing2 Library Set 3
Delay
Corner 1 Condition 3
Analysis View 1 Delay
Corner 1
Analysis View 2 Delay
Selected Corner 3 RC Corner 1
Setup Views RC Corner 2
Analysis View 3 RC Corner 3
Selected Hold Analysis View 4
Views
Analysis View 5
Constraint
Mode 1
Analysis View 6 Constraint
Mode 1
Constraint
Mode 3
61
…So you think that was complicated?
• What if I have multiple voltage domains?
• Now, for example, in a certain operating mode, one inverter is operated at
1.2V, while another one, only a few microns away, is at 0.6V.
• How do I define that library set?
• Even worse…
• What happens if I want to power down a certain module?
• What if I want to power down a module, but retain the state (i.e., the value
stored in the flip flops)?
• How do I transfer data between two voltage domains?
• Arrrrrgggghhhh!

• We’ll briefly discuss this next lecture…


62
2 5
1 3 4 6
Static Multi Mode
Sequential Design Timing Advance
Timing Multi
Clocking Constraints Reports STA
Analysis Corner

Advance STA
POCV, GBA vs PBA and SI

63
Yield-driven and Advanced STA
• There are many more concepts, approaches, and terminologies
used in timing analysis for high-yield signoff:
• On-chip Variation (OCV)
• Advanced On-Chip Variation (AOCV)
• Graph Based Analysis (GBA) , Path Based Analysis (PBA)
• Signal Integrity (SI)
• ECO
• and more and more…*

64
OCV -> AOCV -> POCV

• OCV – On-Chip Variation


• AOCV - Advanced On-Chip Variation
• POCV - Parametric On-Chip Variation

65
Traditional Accounting for On Chip Variation (OCV) Effects
Min-Max Slew Cell delay based on
Propagation Min-Max slews
Analyzing for the
Worst case Timing Check Path
Launch
OCV effect Setup Slowest launch and Fastest capture
on timing U3 Path
Hold Fastest launch and Slowest capture
A U6 D Q
Y
D Q B
0.55 U5 Environment:
U4
Environment: “0.60” CP Min-Max Loads
U1 CP
Min-Max Drives
setup/hold
CLK
U2 0.21/0.08
0.62 “0.23/0.10”
One Process Corner “0.65” Capture
Path
Min-Max OCV
Libraries for On-chip variation (OCV) analysis is to recognize the intrinsic
that Corner variability of semiconductor processes and their impact on timing.
i.e, each library cell can have a variation between a maximum
and minimum delay value (for example) due to OCV effect.
Example: OCV Analysis with Timing Derate
# In the SLOW PVT corner: Models an 8% speed-up and a
set_timing_derate –early 0.92 4% slow-down from the SLOW
set_timing_derate –late 1.04 PVT corner

_RESET U6 U7 4% SLOWER
FF2 Slow
Launch Path corner SLOW

Delay 8% FASTER
FF1
CLK1 U1 U2
CRP Fast 7% SLOWER
Capture corner FAST

U4 U5
Path 4% SLOWER
Operating
CLK2 U3 Conditions
~ P / -V / T

Setup Hold
Launch uses LATE derate: Launch uses EARLY derate:
1.04 x SLOW corner delays 0.92 x SLOW corner delays
Capture uses EARLY derate: Capture uses LATE derate:
0.92 x SLOW corner delays 1.04 x SLOW corner delays
Timing Pessimism with OCV Analysis : TFU
The “min-max” OCV delay range of BF1 in the SLOW corner is 90-100 ps:
BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1 BF1

1. Assume the above OCV is modelled by early/late derating:


If the launch and capture clock paths each contain ten BF1 buffers,
setup timing analysis assumes:
a. 900 ps for launch path; 1000 ps for capture path
b. 1000 ps for launch path; 900 ps for capture path

2. On actual silicon, the more realistic total delay for ten BF1 buffers will be
between 900ps and 1000ps. Therefore, OCV timing analysis is:
a. Optimistic
b. Pessimistic

3. The closer the cells are physically placed to each other:


a. The larger their systemic V/T variations can be
b. The smaller their systemic V/T variations can be

4. Since process (P) variation is random, its effects are smaller in a path with:
a. 30 BF1 cells
b. 3 BF1 cells
68
AOCV Calculates Derating Based on Depth and Distance
Advanced On-Chip Variation provides more realistic variable derating by
taking path depth and distance into account
• Random P variation modeled as a Launch Cell Path Depth = 5
function of logical path depth
• Systemic V/T variation modeled as a
function of the maximum physical distance
between related timing path cells and nets

CRPR
CRPR
AOCV depth+distance variation table Common
Common
Point
Point Depth=0 Capture Cell Path Depth = 3
version: 1.0 Depth = 0
object_type: lib_cell
delay_type: cell
rf_type: rise
derate_type: late
object_spec: lib/MUX2_1 Depth 1 2 3 4 5 ...
depth: 1 2 3 4 5 … Variable
1.214 1.153 1.109 1.078 1.056 ...
distance: 100 500 Derate
table: 1.214 1.153 1.109 1.078 1.056 …\ Flat
1.271 1.201 1.145 1.121 1.100 … 1.15 1.15 1.15 1.15 1.15 ...
Derate
69
AOCV Pessimism
Modeling random P variation based on path depth in AOCV can be
pessimistic due to a cell belonging to multiple paths (and multiple depths)

Stage AOCV %
4 Count Derat Increase
e
1 1.160 16.0%
4 1.097 9.7%
6 1.072 7.2%
39 1.034 3.4%

• GBA-based AOCV uses the worst-case 9.7% derate for the shared “AND3”, based on the
shorter 4-stage path, for the 39-stage timing path
• PBA-based AOCV uses the actual path specific 3.4% derate, based on the actual 39-stage
count of the path
70 • GBA AOCV pessimism for the long path = 9.7% - 3.4% = 6.3%
Parametric On-Chip Variation Modeling of Random Variation

• POCV does not rely on path depths to model random variation – instead uses
a Gaussian Distribution model for individual cell delays
• Each cell has a nominal or mean (m) delay and a standard deviation (σ)
delay value
(σ is also called sensitivity)

• The cumulative delay of a path is


determined by statistically adding the
delay distribution of each stage
a c f
e h
b d g m

71
• Eliminates GBA-based pessimism due to random variations
GBA - PBA

• GBA – Graph Based Analysis


• PBA – Path Based Analysis

72
Fast Analysis != Final Analysis
A[1]
U0 Cell delay = f (i/p slew, o/p cap)
U3
U7
o/p slew = f(Driving cell’s i/p slew, …)
B[2] U10
U1
Two different
U8 input pin slews.
A[2] (Pin A: Slower, Pin B: Faster slew)
U4
B[0]
U12 Z[2]
U5 A
U9 U11
B[1]
B
U2 U6 Worst slew gets propagated by default
A[0]
to calculate U11 cell delay
• STA is fast [doesn’t require simulation]
• The default analysis in PT is called Graph based analysis (GBA)
• Analysis is faster -- Uses worst case slew (through pin U9/A) propagation to calculate U11 cell delay
• Analyzed path through pin U9/B is pessimistic due to U11 cell delay
• The GBA pessimism is OK for earlier PT runs [faster analysis]
• When approaching signoff, this pessimism has to be removed by using path based analysis (PBA)

73
Graph Based Analysis (GBA)
• The default analysis in PT is known as GBA which uses the worst input slew of
a cell to calculate that cell’s output transition
• The calculated output slew will be used to calculate the delays of downstream
cells
• This can lead to pessimism in the analysis of certain paths, ex: when FF2-FF3
path is analyzed using the U1/A pin slew.

(Worst slew)

U 1 /A
FF 1
U1 U2 U3
U 1 /B U1 /Z
FF 3
(late arriving path) (Best slew)
FF 2

74
SI

• Delay – crosstalk impacts setup and hold requirements


• Noise – crosstalk impacts signals

75
What is Crosstalk?
Crosstalk is the transfer of a voltage transition from multiple switching nets
(aggressors) to another static or switching net (victim) through a coupling
capacitance (Cc).

Switching Aggressor
net 1 Aggressor
Cc
net 2 Victim

Static victim Switching victim

Every net is a potential aggressor to its neighboring nets Analysis is


 An aggressor net can affect multiple victim nets iterative!
Every net is a potential victim 
A victim net can have multiple aggressor nets
What Is a Stage Delay?
• A victim’s stage delay consists of
• The cell + net delay, calculated together
• 4 numbers - min/max rise/fall delays
• A stage delay is used to
• Determine each stage arrival windows
• Calculate data arrival and required time for setup/hold analysis
• A stage delay is calculated considering the worst case effects

Switching Aggressors

Switching Victims

Effective slow-down Effective speed-up


(positive delta delay) (negative delta delay)
max_rise min_rise
77
What Are Timing Windows?
• Each stage has a timing window, defined by the shortest and longest path
delays leading up to that net
Timing Window

D Q
Shortest
path

Clk

Longest
D Q path

Clk If the nets here speed up or slow


down due to crosstalk –this will
affect the timing window here!
What Is Crosstalk-Induced Noise?
Switching aggressor nets can create crosstalk-induced noise on static victim
nets, also called “static noise”.

net 1 Aggressor
Cc
net 2 Victim

Noise is a problem,
when the noise bump
crosses the logic
“Static noise” threshold of a down-
stream gate (called a
“logic failure”).
Unsafe noise bumps : Crossing the Logic
Thresholds
If the noise bump crosses the logic thresholds of the downstream gate, the
output of that gate may generate a logic swing  Potential problem!

Logic threshold of
down-stream gate

VDD VDD
0.5 VDD 0.5 VDD
0V 0V

Crosses logic threshold Output switches - “Logic


Failure”!
This is called “between-the-rails” noise bumps (below_high and above_low).

If the noise bump does not cross the logic threshold of the downstream
gate, there is no negative effect – the noise bump is considered safe!
80
Example Functional Failures due to Noise
Aggressor Noise on Data = Potential
wrong logic value captured!
 Functional Failure

Noise on Clock = Extra clock


C cycle!
c Vd  Functional Failure
d D Q
^
Clk
Victim R

report_noise –all_violators Noise on Reset =


report_noise –clock_pins Unintended reset!
report_noise –async_pins  Functional Failure
report_noise –data_pins
Signoff-driven Physically Aware ECO Flow
• PrimeTime inputs:
IC Compiler II StarRC • Netlist (Verilog)
Extraction
• Timing constraints (SDC)
Netlist & • Power Intent (UPF), if applicable
ECO Guidance change

Constraints
• Layout (DEF) + Floorplan
PrimeTime SI Constraints (Tcl)
Coupled SPEF
• Ref Library + Tech Info (LEF)
STA/SI
• RC Parasitics with coordinates
(SPEF)
No • Standard cell spacing rules for
Violations? Signoff advanced technologies
list

Yes (Encrypted Tcl)


ECO
• PrimeTime output:
• ASCII ECO file with coordinates
PrimeTime ECO Flow with IC Compiler II

82
Recommended ECO Guidance Flow in PrimeTime
Recommended Noise and
Power DRC Fixing Final leakage
ECO
recovery Timing Fixing recovery
guidance

Sizing DRC and noise Setup uses Hold uses Sizing


use buffer sizing buffer
Fix mechanism
insertion and insertion
sizing and sizing
Does not DRC and noise Setup Hold Does not introduce
introduce new alter setup and honors DRC, honors new timing or DRC
timing or DRC hold slack as alters hold if setup violations
Precedence rules
violations needed (fix noise needed slack and
after DRC and DRC
timing)
• Perform ECOs that result in more changes Final step after
Flow options • Implement the changes in IC Compiler before next timing closure
PrimeTime ECO run
• Supports full-chip STA, SI, AOCV GBA and PBA flows
Highlights
• Fixes across all scenarios
No negative 95% 70% 95% No negative impact
QoR impact
Leakage Recovery : fix_eco_power –
pattern_priority
fix_eco_power -pattern_priority {HVT NVT LVT}

Most preferred Least preferred


Fixed pattern

swap HVT_BUF1X
HVT_BUF1X LVT_BUF1X
NVT_BUF1X
NVT_BUF1X
LVT_BUF1X HVT_BUF1X
NVT_BUF1X swap

HVT_BUF1X no swap
Changing
pattern • The changing pattern can be a prefix, suffix or in the middle
• Fixed pattern has to match for the swappable cells

84
Reporting Vt Cell Usage Example
report_cell_usage –pattern_priority {HVT LVT}
Before :
Pattern Combinational cell Sequential cell Total More LVT, Fewer HVT cells
--------------------------------------------------------------------
HVT 82081 ( 2%) 1078 ( 0%) 83159 ( 2%)
LVT 3908613 ( 83%) 696435 ( 15%) 4605048 ( 98%)
Others 4115 ( 0%) 532 ( 0%) 4647 ( 0%)

More HVT, Fewer LVT cells


After :
Pattern Combinational cell Sequential cell Total
--------------------------------------------------------------------
HVT 3678276 ( 78%) 3491 ( 0%) 3681767 ( 78%)
LVT 312418 ( 7%) 694022 ( 15%) 1006440 ( 21%)
Others 4115 ( 0%) 532 ( 0%) 4647 ( 0%)

85
References
• Gil Rahav, BGU
• Gangadharan, Churiwala “Constraining Designs for Synthesis and Timing
Analysis: A Practical Guide to Synopsys Design Constraints (SDC)”, Springer,
2013
• Synopsys Solvnet (PT,Synthesis Quick Reference)
• Cadence Support (+Genus and Innovus Text Command References)

86

You might also like