Negative Skew For Setup: Propagation Delays Load Delays Interconnect Delays

While the output capacitor charges: V = Vdd * [1 - e^(-t/(Rdh * Cload))]
The product (Rdh * Cload) is called the RC time constant.

While the output capacitor discharges: V = Vdd * e^(-t/(Rdl * Cload))
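The two expressions above can be evaluated directly. The following is a minimal sketch; the supply voltage, resistance and capacitance values are hypothetical and chosen only for illustration.

```python
import math

def charge_voltage(t, vdd, r_dh, c_load):
    """Output voltage at time t while the load charges through Rdh."""
    return vdd * (1.0 - math.exp(-t / (r_dh * c_load)))

def discharge_voltage(t, vdd, r_dl, c_load):
    """Output voltage at time t while the load discharges through Rdl."""
    return vdd * math.exp(-t / (r_dl * c_load))

# Hypothetical values, not taken from any real library.
VDD = 1.0        # volts
R_DH = 1.0e3     # ohms, pull-up resistance
R_DL = 1.0e3     # ohms, pull-down resistance
C_LOAD = 10e-15  # farads
TAU = R_DH * C_LOAD  # RC time constant

# After one time constant the charging output reaches about 63% of Vdd,
# and the discharging output has fallen to about 37% of Vdd.
print(charge_voltage(TAU, VDD, R_DH, C_LOAD))
print(discharge_voltage(TAU, VDD, R_DL, C_LOAD))
```

Doubling either the resistance or the load capacitance doubles the time constant, which is why delay grows with load.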
The timing arc is positive unate if a rising transition on an input causes the output to rise (or not to change) and a falling transition on an
input causes the output to fall (or not to change)
An example of a false path is where the designer has explicitly placed clock synchronizer logic between the two clock domains.
In this case, even though there appears to be a timing path from one clock domain to the next, it is not a real timing path since the data is not
constrained to propagate through the synchronizer logic in one clock cycle. Such a path is referred to as a false path - not real - because the
clock synchronizer ensures that the data passes correctly from one domain to the next. Identifying which clock domain crossings
are real and which clock crossings are not real is an important part of the timing verification effort. This enables the designer to focus on
validating only the real timing paths.
Propagation Delay: Sum of Interconnect and Load delay. Often written as TPD.
Margin: The difference between the required value of a timing parameter and the actual value. A negative margin
means that there is a timing violation. A margin of zero means that the timing parameter is just satisfied: changing the timing of the signals
could violate the timing parameter
NB: Margin is often called "slack". Both terms are commonly used.
Negative Skew For Setup
Positive skew For Hold
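As an illustration of margin (slack) and of how skew enters the setup and hold checks, here is a minimal sketch. It assumes the common textbook single-cycle flop-to-flop model and defines skew as capture clock arrival minus launch clock arrival; the function names and all delay values are hypothetical.

```python
def setup_slack(period, clk_to_q, comb_delay, t_setup, skew=0.0):
    """Setup margin: required arrival time minus actual arrival time."""
    required = period + skew - t_setup
    arrival = clk_to_q + comb_delay
    return required - arrival

def hold_slack(clk_to_q, comb_delay, t_hold, skew=0.0):
    """Hold margin: actual arrival time minus earliest allowed arrival time."""
    arrival = clk_to_q + comb_delay
    required = t_hold + skew
    return arrival - required

# Hypothetical delays in ns.
print(setup_slack(2.0, 0.2, 1.5, 0.1))             # positive: setup met
print(setup_slack(2.0, 0.2, 1.5, 0.1, skew=-0.3))  # negative: setup violated
print(hold_slack(0.2, 0.05, 0.1))                  # positive: hold met
print(hold_slack(0.2, 0.05, 0.1, skew=0.2))        # negative: hold violated
```

Under this sign convention, negative skew reduces the setup margin and positive skew reduces the hold margin.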
Definition Clock Latency: The difference in arrival times for the same clock edge at different levels of interconnect along the clock tree.
(Intuitively “different points in the clock generation circuitry.”)
NOTE: Clock latency does not affect the limit on the minimum clock period.
Definition Clock Skew: The difference in arrival times for the same clock edge at different flip-flops. Clock skew is caused by the difference in
interconnect delays to different points on the chip.
Definition Clock Jitter: Difference between actual clock period and ideal clock period.
Clock jitter is caused by:
_ temperature and voltage variations over time
_ temperature and voltage variations across different locations on a chip
_ manufacturing variations between different parts
If setup time is violated, current input data will not be stored; input data from previous clock cycle might remain stored.
Propagation Delays
Load Delays
Delay is proportional to load capacitance
Interconnect Delays
Wires, also known as interconnect, have resistance, and there is a capacitance
between parallel wires. Both of these factors increase delay.
_ Wire resistance is dependent upon the material and geometry of the wire.
_ Wire capacitance is dependent on wire geometry, geometry of neighboring wires, and materials.
_ Shorter wires are faster.
_ Fatter wires are faster.
_ FPGAs have special routing resources for long wires.
_ CMOS processes use higher metal layers for long wires; these layers have wires with much larger cross sections than the lower metal layers.
The key to calculating setup and hold times of a latch, flop, etc is to identify:
1. how the data is stored when not connected to the input (often a pair of inverters in a loop)
2. the gate(s) that the clock uses to cause the stored data to drive the output (often a transmission gate or multiplexor)
3. the gate(s) that the clock uses to cause the input to drive the output (often a transmission gate or multiplexor)
The transistors were made larger.
Answer:
Resizing transistor to increase the width to length ratio decreases the resistance of the transistor, which makes it faster. This means that the
supply voltage can be reduced to save power while maintaining performance. However, increasing the width to length ratio increases the
capacitance. After a certain point, the capacitance increase becomes more significant than the reduction in supply voltage, causing power to
increase. Therefore, resizing is adjusting supply voltage and load capacitance to minimize their product in the switching power component.
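The trade-off can be sketched with the usual switching power expression P = activity * C * Vdd^2 * f. This is a minimal sketch; the capacitance and voltage pairs below are hypothetical, normalized numbers chosen only to show both sides of the trade-off.

```python
def switching_power(activity, c_load, vdd, freq):
    """Dynamic switching power: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq

ACTIVITY = 0.1  # hypothetical switching activity factor
FREQ = 1.0e9    # hypothetical clock frequency in Hz

# Hypothetical (capacitance, supply voltage) pairs, capacitance normalized to the baseline.
baseline = switching_power(ACTIVITY, 1.0, 1.2, FREQ)
resized = switching_power(ACTIVITY, 1.3, 1.0, FREQ)     # modest upsizing, lower Vdd
oversized = switching_power(ACTIVITY, 2.0, 0.95, FREQ)  # heavy upsizing, little Vdd gain

print(resized / baseline)    # below 1: the lower Vdd outweighs the extra capacitance
print(oversized / baseline)  # above 1: the extra capacitance dominates
```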
How do you decide the setup and hold margin in static timing analysis? How much margin will completely assure a successful tapeout?
It depends on the stage at which you are doing STA. For example, in the pre-layout stage a 10% margin for setup and 0% for hold will do.
Thanks a lot. But can you be sure a 10% margin is okay if you select another process? And how much margin should I keep at the post-layout stage?
Well, 10% is not a magic number. I use 15% most of the time. Also, using an aggressive wireload model in early synthesis takes some wiring delay into
account during synthesis, which makes post-layout timing closure somewhat easier.
But how do you gain 15%? With a bigger margin, might the design's timing end up no better?
Set your clock period 15% shorter than your target. A bigger margin may result in a slightly bigger block, but it makes the backend design easier.
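The arithmetic behind "set your clock period 15% shorter" is simple; a minimal sketch follows, with a hypothetical 10 ns target period.

```python
def overconstrained_period(target_period_ns, margin_fraction):
    """Clock period to constrain with, given a target period and a margin fraction."""
    return target_period_ns * (1.0 - margin_fraction)

TARGET_PERIOD_NS = 10.0  # hypothetical target (100 MHz)
MARGIN = 0.15            # the 15% margin mentioned above

period = overconstrained_period(TARGET_PERIOD_NS, MARGIN)
print(period)           # constrained period in ns
print(1000.0 / period)  # equivalent constrained frequency in MHz
```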
In our design we are generating a 1 MHz clock from an 8 MHz clock. There is some logic in the design where we send data from the 8 MHz domain to the
1 MHz domain. Do we need to set multicycle paths for the setup and hold checks?
If the clocks are from the same source (when you say you are generating the 1 MHz clock from the 8 MHz clock, these two clocks should belong to
the same domain), you need to define the 1 MHz clock as a derived (generated) clock and define the corresponding parameters. When you do this, the
tool will automatically check the timing of the signals that cross the frequency domains. There is no need to set multicycle paths. Once you have
declared the generated clock with the correct clock phase, STA will automatically determine the launch clock edge and the capture clock edge.
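To see which launch and capture edges end up forming the tightest setup check, here is a minimal sketch. It assumes both clocks are rising-edge triggered with aligned edges at time 0 and integer periods in ns; real tools apply their own edge-selection rules, so treat this only as an illustration.

```python
from math import gcd

def tightest_setup_relation(launch_period, capture_period):
    """Smallest positive distance from a launch edge to the next capture edge."""
    common = launch_period * capture_period // gcd(launch_period, capture_period)
    launch_edges = range(0, common, launch_period)
    capture_edges = range(capture_period, 2 * common + 1, capture_period)
    best = None
    for launch in launch_edges:
        # First capture edge strictly after this launch edge.
        next_capture = min(c for c in capture_edges if c > launch)
        distance = next_capture - launch
        if best is None or distance < best:
            best = distance
    return best

PERIOD_8MHZ_NS = 125
PERIOD_1MHZ_NS = 1000

print(tightest_setup_relation(PERIOD_8MHZ_NS, PERIOD_1MHZ_NS))  # 8 MHz to 1 MHz
print(tightest_setup_relation(PERIOD_1MHZ_NS, PERIOD_8MHZ_NS))  # 1 MHz to 8 MHz
```

Under these assumptions both directions are checked against one fast-clock period (125 ns), which is why the phase of the generated clock matters.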
In my design, we have multiple clocks with different frequencies, but the clocks have the same source.
(For example, clock A and clock B are both divided from clock S, and Frequency(A) = n*Frequency(B), where n is an integer.) There is some logic
between the two clock domains. How can I constrain the logic from clock A to clock B, and the logic from clock B to clock A?
Yes, you could define both as generated clocks using the same source clock. The STA tool will then calculate the setup/hold timing between these
clock domains. If you can confirm that clkA and clkB absolutely come from the same source, and the design requirement is that you do want to check
the timing between these domains, you can put these clocks in the same clock group. The tool will then check timing between these domains.
Since the two clocks are defined separately, and their frequencies are different too, how do I tell DC that the two clocks have the same source?
You could use this command: set_clock_group {} It defines the group of clocks.
What does the setup time of a flip-flop depend on?

Yes, it does depend on the clock transition time. If you look at the Liberty cell library, you can see that the setup time of a flip-flop depends on
two things:
a) the input (data) transition time of the D flip-flop, and
b) the clock transition time.
The setup-time table is indexed by these two. The reason becomes clear if you look at the CMOS transistor-level D flip-flop; understanding the
requirement for setup is important. Setup means to establish: we want the element driving the D flip-flop, which may be another flip-flop, to
establish the data before the setup time. If the clock transition time is longer, the D input gets that much extra time to set up. So the setup time
decreases as the clock transition time increases, and vice versa.
What is meant by a virtual clock definition, and why do I need it?
For paths going through a primary input port, the tool needs to know the frequency of the clock driving the signal in order to create a proper
timing path. Similarly, for output ports the tool needs to know the frequency of the clock of the flop capturing the signal. This is why we define
a virtual clock: to give a clock relationship to paths going through IO ports.
So why not use a real clock to constrain the IO instead of using a virtual clock?
Real clock constraints are defined at a clock generation point inside the design. IO signals are launched/captured by clocks outside the design,
which is why we define virtual clocks. If the clock also goes through the IO you may be able to define the clock as real at the IO port, but this
may not always be the case.
I have closed STA with SPEF in PT/PTSI and when doing back annotation with SDF I am seeing functional failures. What might be the
reason?
1) Bad timing constraints (false path in STA on a real path)
2) Asynchronous logic that STA would not catch
3) Missing xfilter on synchronizer
While doing STA with OCV and crosstalk ON using PrimeTime SI, is there any way to control the timing windows? When we turn on both OCV and
crosstalk we usually see a lot of timing violations (because of the pessimism introduced). To remove pessimism we can apply filters, such as
filtering out all coupling capacitors below some x femtofarads (fF), or those whose ratio of coupling capacitance to total capacitance is below
some threshold y. Likewise, is there any way to control or filter the timing windows of aggressor and victim nets?
The set_clock_latency command with early/late can be used for jitter margining, but I would keep this command for clock tree variations. The
reason is that when you move from ideal to propagated clocks, this command goes away and you will need to account for jitter some other way. It's
best to keep it consistent both pre and post CTS.
The set_clock_uncertainty command is applied the same way both pre and post CTS. One issue with using it to apply jitter is that multi-cycle paths
will not be accounted for properly. Say your frequency jitter can speed up your clock by 100ps per cycle, and you are timing a multi-cycle path of
4: set_clock_uncertainty will only give you 100ps of margin when you really need 400ps.
Another way to account for frequency jitter is to increase your clock frequency. This fixes the multi-cycle path problem, as the margin added
would then be the correct 400ps.
The increased-frequency method works for frequency jitter but does not account for half-cycle jitter. If your design has a lot of half-cycle paths
you can also use set_clock_uncertainty with rise/fall options to account for half-cycle jitter.
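The 100ps versus 400ps argument can be checked with a few lines of arithmetic. This is a minimal sketch; the 1000 ps period is a hypothetical value, while the 100 ps jitter and the multi-cycle factor of 4 are the numbers used in the text.

```python
def window_with_uncertainty(period_ps, cycles, uncertainty_ps):
    """Time available to an N-cycle path when jitter is one fixed uncertainty value."""
    return cycles * period_ps - uncertainty_ps

def window_with_shorter_period(period_ps, cycles, jitter_ps):
    """Time available to an N-cycle path when every cycle is shortened by the jitter."""
    return cycles * (period_ps - jitter_ps)

PERIOD_PS = 1000  # hypothetical nominal period
JITTER_PS = 100   # per-cycle frequency jitter from the example
CYCLES = 4        # multi-cycle path of 4

by_uncertainty = window_with_uncertainty(PERIOD_PS, CYCLES, JITTER_PS)
by_period = window_with_shorter_period(PERIOD_PS, CYCLES, JITTER_PS)

print(by_uncertainty)              # window when only 100 ps is subtracted
print(by_period)                   # window when all four cycles are shortened
print(by_uncertainty - by_period)  # jitter left uncovered by the uncertainty method
```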
Yes, PD tools use different wire load models at different stages. In the implementation stages we cannot use sign-off tools for extraction because
of runtime (or we may not need that accuracy), and we also do not yet have complete routing information. So physical design tools use wire load
models to model parasitics.
Then why are different wire load models used at different stages?
As we proceed through the flow we need a wire model as close as possible to the sign-off tool, so that we do not see any surprises when we run STA
with the sign-off tools (StarRC-XT, an industry sign-off parasitic extraction tool, and PrimeTime/PrimeTime SI, an industry sign-off STA tool).
Wire load model: Stage
1. CONSTANT or zero wire load model: used during the fix-time stage.
2. Global wire load model: used during the fix-cell/fix-wire stages (because global routing is first done at the fix-cell stage and again during
the fix-wire stage).
3. MANHATTAN: during CTS/Partial at the fix-wire stage (I am not sure).
4. FINAL: when detail routing is done, i.e. at the end of the fix-wire stage.
In our flow, we get an sdc from the frontend team and subsequently get additional margins.
1. My query is why does the SDC not have all the constraints in one file.
2. What is the concept of Deration(Setup/Hold) that needs to be applied?
3. What is OCV ? Again why is this included separately?
OCV is on-chip variation. SSTA (statistical static timing analysis) uses OCV. Derating is a coefficient of variation. If we use derating we can use
STA instead of SSTA.
1) Constraints do not all have to be in the same file. If you integrate multiple IPs into one design, you will have multiple constraint files that
you load in separately. You can always use the write_sdc command to write them out into one file.
2) Derating is simply another way of adding margin to the design. This allows you to scale all delays by a certain percentage to increase
margin. This is not an SDC constraint, but a variable to set within the tool run script.
3) OCV is a timing mode (like single or BC/WC). This allows you to use variation from the libraries in performing timing checks for ensuring
worst case scenarios. Not sure what you mean by 'included separately'. This is not a SDC constraint, but a timing mode that you set in your
tool run script.
How do you identify the multicycle paths and false paths in a design? Do we need to identify them after synthesis, or will the DC tool itself
recognize them and flag them as a warning or error?
At what stage in the ASIC flow are multicycle and false paths identified? How are they handled in the ASIC/FPGA flow, and how do they affect timing
closure and the slack of the design?
I feel the synthesis tool will not report multicycle/false paths on its own.
We need to identify the multicycle and false paths from the design and specify them to the tool.
Generally, paths crossing clock domains are specified as false paths so that the tool will not waste time analyzing those paths for timing. We can
specify multicycle paths that exist by design, which improves the tool's efficiency in analyzing the timing. We can also check whether any
critical timing paths are multicycle paths and specify them to the tool.
Yes it slows down the STA timer. Every timing exception is an additional rule that must be added to the basic STA algorithm. And each one
requires CPU time to check which slows down the tool. A handful more or less doesn't make any difference, but long complex lists of timing
exceptions are a drag for most tools. I cannot vouch that this is true for every last tool, but it is true for many EDA tools, including P&R.
Keep in mind that you must absolutely specify all significant timing exceptions or the optimization tool will focus on trying to fix spurious
timing problems. But it is not desirable nor possible to list all the exceptions exhaustively.
If a timing exception, for example a false path, does not cause any timing violations, then we don't care if the P&R tool does some minor bit of
unnecessary optimization on this false path.
So you only need to list those exceptions that would otherwise rise to near the top of the critical path list. Unfortunately these can usually only
be identified through trial and error.
Keep in mind that for most EDA tools, the more timing exceptions you specify , the slower the tool runs.
The practical solution that everybody uses is to list all the false/multicycle paths that you know of and then run a timing report. If timing is
OK, everything is fine. If there are timing failures, check to see if any of them are false/multicycle paths. If yes, then those timing failures are
false and you need to add them to the exception list. If none of your timing failures are timing exceptions, then you have a real timing
problem.
What are all the different corners available & why do we need to simulate it?.
If you mean postlayout simulation, we simulate to double check STA results (ensuring there were no bad constraints) and also to check what
STA does not cover (asynchronous paths). We can simulate at any corner STA is run, since we need SDF to be generated, but normally only a
subset of corners is simulated due to runtime.
"Corner" in the context of circuit simulation means "models" corresponding to extreme process conditions. These models include device
(Spice) models, which are called "fast" (low Vt, high Ion), "slow" (high Vt, low Ion), and their combinations corresponding to independent
variations of PMOS and NMOS device (fast-fast, fast-slow, etc.). Also, these models include BEOL (back-end-of-line) models - i.e. multilayer
metal/dielectric stack parameters (layer thicknesses, dielectric constants, resistivities, etc.) - in which case they are called by possible
combination of words "capacitance/resistance" and "best/worst" ("best" meaning lowest value of parasitic R and C etc.). "Corner" modeling is
supposed to "guardband" the design against process variations that happen due to manufacturing effects (i.e. variations over wafer or over
chip, as well as wafer-to-wafer and lot-to-lot variations) and due to microscopic physical effects (i.e. doping density variation in the channel of
MOSFETs due to discreteness of implants and due to polycrystalline gate structure, line-edge-roughness of poly-Si gates, and so on). "Corner"
modeling results in an over-guardbanding of chip design (i.e. sub-optimal performance).
Why is a latch so undesirable?
One reason I know is that since a latch is level sensitive, the output can change at any point during the active level. Hence it can lead to the
formation of glitches.
If one wants to fix the timing of a latch-based design by better balancing the logic among the pipe stages, one might have to do a more complex
analysis than required for a flop-based design.
I came across many definitions for clock latency...

Def 1:
The number of clock pulses required by the circuit to give out the first output.
Def 2:
The total time taken by the clock signal from the source to reach the input(clock pins) of the register.
The correct definition is Def 2, especially in the context of STA. However, you are right, sometimes people do refer to Def 1 when trying to
describe the latency of the circuit. I would say this is not really clock latency but the latency of the design: how many cycles after applying
the first input do you get a meaningful output. But usually, when you refer to clock latency, it is Def 2 that we are talking about.
Please can any one explain the below

what is min delay in max corner?
what is max delay in min corner?
Min and max corners refer to extraction corners that account for different PVT. Within the same corner there are min and max delays (I prefer to
call them early and late delays); this is to account for OCV.
So you can use a max-corner extraction and run STA with OCV at worst-case (late) and better-than-worst-case (early) operating conditions.
For a given max-time path, the delay on the data path should be late, the delay of the capture clock should be early, and the delay of the launch
clock should be late.
Max corner:
setup: PVT- Slow Low High
Hold: PVT- Slow High Low or Fast High Low
Min corner:
setup: PVT- Fast Low High or Slow Low High
Hold: PVT- Fast High Low
But when you are doing OCV analysis on a reg-to-reg path, the delay on the data path should be the worst delay, the delay of the capture clock
should be the best delay, and the delay of the launch clock should again be the worst.
The worst delay comes from the max corner and the best delay from the min corner.
This kind of analysis leads to overly pessimistic results. So for a given corner the STA tool instead creates two libraries: for the max delay
corner it creates a worst-of-worst delay library and a best-of-worst delay library, depending on the OCV derate values. For the same reg-to-reg
path, the delay on the data path will then be the worst-of-worst delay, the delay of the capture clock the best-of-worst delay, and the delay of
the launch clock the worst-of-worst delay.
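A setup check with OCV derates applied within a single corner can be sketched as follows. This is a minimal sketch under stated assumptions: the delays and derate factors are hypothetical, the late derate scales the launch clock and data path, the early derate scales the capture clock, and common clock path effects are ignored for simplicity.

```python
def ocv_setup_slack(period, launch_clk, data, capture_clk, t_setup,
                    late_derate, early_derate):
    """Setup slack with late launch clock and data, and early capture clock."""
    launch_late = launch_clk * late_derate
    data_late = data * late_derate
    capture_early = capture_clk * early_derate
    arrival = launch_late + data_late
    required = period + capture_early - t_setup
    return required - arrival

# Hypothetical max-corner delays in ns.
PERIOD = 2.0
LAUNCH_CLK = 0.5
DATA = 1.2
CAPTURE_CLK = 0.5
T_SETUP = 0.1

nominal = ocv_setup_slack(PERIOD, LAUNCH_CLK, DATA, CAPTURE_CLK, T_SETUP, 1.0, 1.0)
derated = ocv_setup_slack(PERIOD, LAUNCH_CLK, DATA, CAPTURE_CLK, T_SETUP, 1.0, 0.9)

print(nominal)  # slack with no on-chip variation
print(derated)  # smaller slack once the capture clock is made 10% earlier
```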
Clock latency is the delay of the clock signal from the clock source port to any clock pin in the circuit. Clock uncertainty is jitter, but jitter
and skew are two different terms. Jitter is the variation in the clock period (that is, the clock edge might not be at the required time). Jitter
could be caused by various on-chip variations. Jitter need not be expressed with respect to two nodes. Clock skew is the difference between the
clock arrival times at two different nodes.
The first important point is that there are two phases in the design of a clock signal. At first the clock is in "ideal mode" (e.g.: during RTL
design, during synthesis and during placement). An "ideal" clock has no physical distribution tree, it just shows up magically on time at all the
clock pins. The second phase comes when clock tree synthesis (CTS) inserts an actual tree of buffers into the design that carries the clock
signal from the clock source pin to the (thousands) of flip-flops that need to get it. CTS is done after placement and before routing. After CTS
is finished, the clock is said to be in "propagated mode".
Now we can get to your questions:
What is clock latency? Clock latency is an ideal mode term. It refers to the delay that is specified to exist between the source of the clock signal
and the flip-flop clock pin. This is a delay specified by the user - not a real, measured thing. (In fact there is 'clock source latency' and 'clock
network latency' - the difference is not important for this discussion). When the clock is actually created, then that same delay is now referred
to as the "insertion delay". Insertion delay (ID) is a real, measurable delay path through a tree of buffers. Sometimes the clock latency is
interpreted as a desired target value for the insertion delay.
What is clock uncertainty? In ideal mode the clock signal can arrive at all clock pins simultaneously. But in fact, that perfection is not
achievable. So, to anticipate the fact that the clock will arrive at different times at different clock pins, the "ideal mode" clock assumes a
clock uncertainty. For example, a 1 ns clock with a 100 ps clock uncertainty means that the next clock tick will arrive in 1 ns plus or minus 50
ps. A deeper question gets into *why* the clock does not always arrive exactly one clock period later. There are several possible reasons but I
will list 3 major ones:
(a) The insertion delay to the launching flip-flop's clock pin is different than the insertion delay to the capturing flip-flop's clock pin (one path
through the clock tree can be longer than another path). This is called clock skew.
(b) The clock period is not constant. Some clock cycles are longer or shorter than others in a random fashion. This is called clock jitter.
(c) Even if the launching clock path and the capturing clock path are absolutely identical, their path delays can still be different because of on-
chip variation. This is where the chip's delay properties vary across the die due to process variations or temperature variations or other
reasons. This essentially increases the clock skew.
The max transition time is one of the three design rules (max fanout, max transition, max capacitance). It is much more important than
setup/hold timing.
As we all know, in STA the delay of each standard cell is calculated by looking up the NLDM (non-linear delay model) tables defined in the
library. These tables are indexed by two factors: input transition time and output load. The table result is the delay of the cell for a given
input transition and output load. If the input transition or output load is within the table range but not one of the NLDM index values,
interpolation is used to calculate the delay.
If the input transition or output load is outside the NLDM range, extrapolation is used. Naturally the result is rather inaccurate, so the STA is
rather inaccurate and the timing analysis cannot be trusted.
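The table lookup can be sketched with plain bilinear interpolation. This is only a minimal sketch: the index values and delays are hypothetical, and real timing tools may use a different interpolation scheme. Points outside the table are extrapolated from the nearest table segment, which is the less trustworthy case described above.

```python
from bisect import bisect_right

def _segment(axis, x):
    """Index of the lower point of the axis segment used for x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def in_table_range(slews, loads, slew, load):
    """True if the lookup point lies inside the characterized table range."""
    return slews[0] <= slew <= slews[-1] and loads[0] <= load <= loads[-1]

def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation (or extrapolation) of a delay table."""
    i = _segment(slews, slew)
    j = _segment(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (table[i][j] * (1 - ts) * (1 - tl)
            + table[i][j + 1] * (1 - ts) * tl
            + table[i + 1][j] * ts * (1 - tl)
            + table[i + 1][j + 1] * ts * tl)

# Hypothetical table: rows are input transition (ns), columns are output load (pF).
SLEWS = [0.1, 0.5, 1.0]
LOADS = [0.01, 0.05, 0.10]
DELAYS = [[0.10, 0.20, 0.32],
          [0.15, 0.30, 0.45],
          [0.22, 0.40, 0.60]]

print(nldm_delay(SLEWS, LOADS, DELAYS, 0.3, 0.03))  # interpolated inside the table
print(in_table_range(SLEWS, LOADS, 2.0, 0.05))      # False: this point is extrapolated
print(nldm_delay(SLEWS, LOADS, DELAYS, 2.0, 0.05))  # extrapolated, less trustworthy
```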
Now you can understand how important max transition is. One more reason for fixing max transition violations is that a larger transition results
in higher DC power consumption.
A margin of 30% on max transition is allowed.
For example, if the max transition constraint is 1ns, then 1.3ns is allowed.
I want to determine the max frequency of a large design (coded in Verilog). I know that one can do static timing analysis, but how do I determine
which paths are false paths and which are multicycle paths? My design has no multipliers or anything like that which could help me identify
multicycle paths.
Secondly, if I determine the frequency using static timing analysis, I am not able to run post-synthesis timing simulation at this frequency, as I
get setup and hold timing violations. I have to increase the clock period by a factor of 10 or so to get rid of these violations. The question is
how to ensure multicycle constraints are being met during post-synthesis simulation.
How important is it to do post-synthesis gate-level simulation? I have found it to be essential for knowing that the synthesis tool has done its
job correctly and that your simulation results match the golden (pre-synthesis) functional simulation.
For the identification of false paths and multicycle paths, I have some comments:
1. Communicate with the logic designers. They have a better understanding of the design and will give you some valuable pointers.
2. After STA, sort out the worst 10-20% of paths and analyze whether any of them are timing exception paths.
3. I have heard that there are some tools that can identify false/multicycle paths.
For the highest frequency, I think you should communicate with your customer and your backend team in parallel. In my opinion, you can
determine your highest clock frequency using the slack margin. You can leave a 15%-20% positive slack relative to the clock period when you run
the zero wire load timing analysis.
As for gate-level simulation, I don't think it is essential before placement and routing are complete. We usually give the netlist and SDF file
back to the logic designers. They then annotate the SDF in the logic simulation tool to see whether there are any hold/setup violations in
gate-level simulation.
I often see the following STA constraints:
set_clock_uncertainty 200ps -setup
set_clock_uncertainty 100ps -hold
Why is the value used for setup checking set larger than the value used for hold checking?
You would see these constraints before place and route, before we have a clock tree for the block. Adding more uncertainty on the setup side
over-constrains the block. The same could be done by running the block at a higher frequency, e.g. 10% above your target frequency (just an
example; the percentage is up to the designer). Over-constraining the block may sometimes not produce the desired results.
1) To make the chip run as fast as specified in real silicon, we need more margin for setup in STA.
2) The sources of uncertainty include PLL jitter, clock skew (before CTS), OCV (before post-routing) and guard margin.
Setup uncertainty should include all of them, but we can ignore PLL jitter in hold uncertainty, and the OCV uncertainty for hold can be less than
for setup. In any case, hold uncertainty is always less than setup uncertainty in STA.
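The budget described above can be written out as a small calculation. This is a minimal sketch; the individual component values are hypothetical and were picked only so that the totals match the 200ps and 100ps figures in the question.

```python
def setup_uncertainty(pll_jitter, skew, ocv_setup, guard):
    """Setup uncertainty includes every listed source."""
    return pll_jitter + skew + ocv_setup + guard

def hold_uncertainty(skew, ocv_hold, guard):
    """Hold uncertainty leaves out PLL jitter and may use a smaller OCV term."""
    return skew + ocv_hold + guard

# Hypothetical pre-CTS budget in ps.
PLL_JITTER = 70
SKEW = 50
OCV_SETUP = 50
OCV_HOLD = 20
GUARD = 30

print(setup_uncertainty(PLL_JITTER, SKEW, OCV_SETUP, GUARD))  # 200
print(hold_uncertainty(SKEW, OCV_HOLD, GUARD))                # 100
```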
Glitch is a spurious output event due to physical characteristics of the circuit realization. If the circuit is synchronous, glitches in
combinational logic do not cause errors at the combinational logic's outputs, as the correct values will finally have been propagated
before the clock's arrival. However, such glitches consume power and if the design is analyzed correctly this power consumption can be
accurately computed. If the circuit is asynchronous, glitches *may* cause erroneous operation of the circuit, as there is no clock to "hide" this
behavior and every signal change is acknowledged. So, glitch analysis is crucial for asynchronous circuits in order to guarantee the correct
operation of the circuit.
Pre-layout simulation is what you use to check that your gate-level netlist (right after synthesis) is functionally correct. Post-layout
simulation (no longer in use) was meant to check that the gate-level netlist is still functionally equivalent to the pre-layout one, because you
have added the clock tree, buffers, scan chains, etc.
In today's physical design flow, designers use formal verification to check that the circuit function is correct after synthesis. It uses
mathematical formulations rather than dynamic simulation, and it is much faster and more mature now. They also use STA (static timing analysis)
to compensate for the fact that formal verification can't check the timing of the circuit.
There are two clocks in a design. When doing scan insertion, we add lockup latches to avoid problems caused by clock skew. Is there any method
other than lockup latches to deal with clock skew in a multiple-clock design?
If you take care of the skew between the two clocks while closing the test-mode timing, you may avoid the lockup latch. But I am not sure
about the silicon results.
Yes, this is rightly said. Balancing both scan clocks to the required skew limit should help, and validating test-mode STA for all modes,
corners and OCV (on-chip variation) should increase your confidence in the design.
What do you lose by having a lockup latch across the clocks?
Yes, another method besides the lockup latch is to mark the pin as a PRESERVE PIN in the clock tree specification file so that the clock tree
algorithm does its best to reduce skew.
Time borrowing refers to using a latch to borrow time, typically up to half a cycle.
Generally, time borrowing uses the latch's transparent behavior so that the slack of the previous or next stage is met, according to maximum
borrowing or balanced borrowing.
Retiming moves registers forward or backward through the combinational logic to balance the timing paths of every stage in the nearby
register chain.
STA stands for Static timing analysis
SSTA stands for Statistical Static timing analysis.
There are many uncertainties that we model during timing analysis, such as on-chip variation, wafer variation, process variation, channel length
variation, clock jitter, inverse temperature effect and IR drop.
How to ensure the timing targets?
Despite so many uncertainties in real operation, as designers we should have a mechanism in place to model them and ensure that the design will
meet its timing targets.
The traditional way to model these uncertainties is to add a fixed uncertainty value (usually a larger margin) to be on the safe side and close
timing, but there is a catch: this can delay tapeout and time to market.
So along came SSTA, which models these uncertainties in a statistical or probabilistic way to achieve the timing targets.
This way we can have a meaningful uncertainty number, thereby ensuring faster time to market.
one other thing I can add here is that signal skew usually builds up as one traverses the logic along clock trees. This always increases as we
move away from the clock source. It is a pre-layout number. It is a best guess estimate. It is a place holder for the actual post-layout delays and
as such is used only for pre-layout analysis. Jitter, on the other hand, is a lumped-sum number representing all other uncertainties which are
beyond our controls such as; PLL or crystal imperfections, cell and wire technology deviations and fluctuations across different PVT corners.
Jitter applies to both pre and post layout analysis as the others already mentioned.
The spatial variation in arrival time of a clock transition is commonly referred as a clock skew.
Clock jitter refers to the temporal variations of the clock period at a given point on the chip, that is the clock period can reduce or expand on a
cycle by cycle basis
1) For pre-layout designs:

clock uncertainty = skew + jitter
2) For post-layout designs:
clock uncertainty = jitter
Before CTS there is no way of knowing the delay in the clock tree, so an approximate value has to be used while performing pre-layout STA.
After CTS is performed (propagated clock mode), the delay values are propagated along the clock path, so skew is computed from the actual clock
tree and is dropped from the uncertainty.
Jitter: no PLL is perfect; the small variations in the clock signal generated by the PLL are known as jitter. Jitter will normally be a fixed
value.
How the clock latency influences timing in STA

Note that clock skew has 2 components:
- structural skew
- PVT skew
As you increase the clock tree delay and balance it, structural skew is okay but PVT skew becomes worse, i.e. more buffers in the clock path will
have more variation over PVT (process, voltage, temperature), and hence skew increases.
Some people call PVT skew OCV (on-chip variation).
1. In STA, suppose we find some cells that need to be upsized. If timing gets even worse after upsizing the cells, what may be the
reason?
2. While upsizing cells, what do we need to take care of?
Do we have to look at the previous cell's transition and fanout values?
3. What can we suggest to the layout team to fix a violation, i.e. when analyzing the timing reports in STA?
Can we also consider the following?
1. Checking skew
2. Cells to upsize
3. Net length between two cells.
1. The transition at the input pin would have worsened: upsizing the cell increases its input capacitance, which loads the previous cell
(low drive) and can cause a timing degradation.
2. We need to look at the input transition and the output load. The input transition could be high because the load on the previous cell is high, or because the
input transition on the previous cell is itself high, depending on the situation. Based on the situation you can apply the required fix.
3. During post-route, based on the scenario we can upsize, downsize, insert a buffer, or skew the clock. But one should also take
care of physical locations.
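The first answer can be illustrated with a rough Python sketch. It assumes a crude linear model in which a stage delay is (drive resistance x load capacitance), and in which upsizing a cell divides its drive resistance and multiplies its input capacitance by the same factor. The function name and all values are hypothetical.

```python
def chain_delay(r_prev, c_wire, r_cell, c_in_cell, c_load, size=1.0):
    """Delay of a weak previous stage driving a cell of relative size `size`.

    Crude linear RC model (an assumption): stage delay = R_drive * C_load.
    Upsizing divides the cell's drive resistance by `size` but multiplies
    its input capacitance by `size`.
    """
    prev_stage = r_prev * (c_wire + c_in_cell * size)  # previous cell sees a bigger load
    this_stage = (r_cell / size) * c_load              # upsized cell drives its load faster
    return prev_stage + this_stage


# Hypothetical values: kOhm and pF, so delays come out in ns.
params = dict(r_prev=8.0, c_wire=0.01, r_cell=2.0, c_in_cell=0.02, c_load=0.05)
x1 = chain_delay(size=1.0, **params)
x4 = chain_delay(size=4.0, **params)
print(f"X1 cell: {x1:.3f} ns, X4 cell: {x4:.3f} ns")
```

With these numbers the upsized cell is faster on its own stage, but the weak previous stage slows down by more than that, so the total path gets worse.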
Latches are fast and consume less power and area than flops, but glitches can come along with these advantages. That's why we go for flops.
A latch is not good for STA, since STA bases its timing checks on clock edges and a latch is level-sensitive. DFT also needs some special steps to
handle latches. A latch allows the data at its input to reflect at the output for the entire time the latch is enabled, i.e. when it is enabled it
is said to be transparent: the output follows the input, so if a glitch appears it will be reflected at the output.
With a flip-flop this is not the case: the output follows the input only at the clock edge, whether positive or negative. Thus any glitch
appearing at the input will not be transferred to the output unless it coincides with the clock edge. Also, latches are not DFT friendly, and it is more
difficult to perform static timing analysis with latches in your design.
The big problem with clock gates that you absolutely have to verify is that in the logical space the clock signal is "ideal". This means that it
magically shows up at all the clock pins at the same time.
In a real circuit, you have to synthesize a clock tree to distribute this clock signal and this involves real delays. This is called "propagated
mode". Because the ideal mode STA doesn't know what the clock insertion delay is to each flip-flop, it is impossible for it to really know if the
enable timing will work or not.
For example: You say the control signal comes from the FFs driven by the 50MHz signal. What if the delay from your 50MHz clock source pin
to the FFs is 23ns? Will your circuit still work? Check setup and hold times to make sure. So you need to specify target insertion delays and
target skews in the SDC that match what you think is feasible, and then re-simulate. Find out how much insertion delay you can tolerate
before it stops working.
Clock gating is a low-power technique whereby you switch off certain branches of the clock tree when that part of the logic is not needed. This
saves a lot of power by preventing needless switching of those FFs and clock buffers. A clock gate in its simplest form is an AND gate with pin
A connected to the clock and pin B connected to an 'enable' signal. If the enable is high then the clock passes through the AND. If the enable is
low then the clock signal is blocked.
In practice, a clock gate needs a transparent latch as well. Most semiconductor libraries include several Integrated Clock Gating (ICG) cells
that implement a complete clock gate circuit with a latch, test enable, regular enable and clock input/outputs.
SDF - contains delays for every gate and net in the design.
DSPF - contains RC parasitic data for every net segment in the design.
Multicycle path or False path ?

I think we need to use the set_false_path command in this case, because no matter how much delay there is between the different clock domains, we
need to specify that path as a false path.
For the paths between the clock domains we remove the timing constraints, which means we have to set a false path only. If you set a multicycle (2) path
instead, the tool will again try to optimize the path for 2 clock cycles, but we don't want it to optimize at all.
So far we are clear that if they are from same source and have a deterministic relationship, we can time them in STA, and make sure that there
is no setup/hold violation on the data transfer.
But if they are not coming from the same clock source, and we infer that these clock domains are asynchronous, we should not blindly false_path
them.
If it's an async crossing,
a) there should either be a synchronizer on this path (I can see a combo cloud in the figure, so this is not part of a synchronizer crossing for sure),
b) or it should be a data path qualified by a control path that is synchronized separately,
c) or it should be a static signal, which changes only once in a while, in such a manner that its new value is not important for a few cycles while it
settles down to the correct value.
how clock latency influences timing in STA

At first thought, you may think clock latency doesn't make timing worse or better, since both flip-flops' clocks get delayed by the same value. I
would agree with you ... but with some exceptions. Clock latency is specified to model the delay from the source of the clock to its destination.
It includes source latency and network latency. Please don't confuse these two latencies with clock skew, which measures the
uncertainty of the clock edge. That said, when does clock latency matter? Let's say your block has a flip-flop with its D input driven by an input port.
Normally you specify an input delay for the input port. Now you can see that clock latency matters here. The larger the clock latency, the easier it is for
this flip-flop to meet the setup time, but the harder it is to meet the hold time. You can do a similar analysis for an output flip-flop, and you will see that clock
latency also matters there: it affects the setup timing for the flip-flop downstream. So clock latency matters for the flip-flops at the
block boundary.
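A small Python sketch of the input-port case described above. It assumes the input delay is measured from an ideal clock edge at time 0 and that the clock latency applies only to the capturing flip-flop. The function name and the numbers are hypothetical.

```python
def boundary_slacks(period, latency, input_delay, path_delay, t_setup, t_hold):
    """Setup and hold slack (ns) for a flop whose D pin is fed from an input port.

    Assumption: data arrival at D = input_delay + path_delay, measured from
    an ideal clock edge at t = 0; the capturing flop sees its clock `latency` later.
    """
    arrival = input_delay + path_delay
    setup_slack = (period + latency - t_setup) - arrival  # checked against the next edge
    hold_slack = arrival - (latency + t_hold)             # checked against the same edge
    return setup_slack, hold_slack


for latency in (0.0, 0.5, 1.0):
    s, h = boundary_slacks(period=5.0, latency=latency, input_delay=0.4,
                           path_delay=0.3, t_setup=0.2, t_hold=0.1)
    print(f"latency={latency:.1f} ns  setup slack={s:+.2f} ns  hold slack={h:+.2f} ns")
```

As the latency grows, the setup slack improves and the hold slack shrinks, eventually going negative with these numbers.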
Is meeting skew important?

There is no problem. Skew is a means to an end, not an end in itself. If all your setup and hold times are OK, then everything is fine. That said,
many layout teams insist on achieving the specified skew target because they believe it makes the circuit more robust against variations.
What I feel is that setup and hold checks also depend on the skew value: skew changes will affect the setup and hold margins. You
have said there is no violation of the hold and setup time requirements, which implies that the skew value is not causing any problem.
I think there are no problems, but you may want to analyze the skew violations.
Let us say, to take your example, that you design your circuit with a skew margin of 50ps. This does not mean that every flop-to-flop delay is
at the critical edge of meeting timing (slack = 0). There are many many flop-to-flop delays that have plenty of positive slack, so they can
tolerate a lot of local (adjacent flop) skew and still work.
It is typically only the critical path that has zero slack, and there you want the skew from one flip-flop to the next to be less than 50ps, but the
global skew can be much bigger.
The global skew measures the difference between the earliest and the latest arrival times at any flop in the clock - even if there is never a signal
that goes between them!
So you can easily violate your global skew limit and still have a circuit that works perfectly well.
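The difference between global and local skew can be sketched in Python as follows. The flop names, clock arrival times, and list of connected flop pairs are all hypothetical.

```python
# Hypothetical clock arrival times (ns) at four flops.
arrival = {"ff_a": 1.00, "ff_b": 1.03, "ff_c": 1.20, "ff_d": 1.22}

# Launch -> capture pairs that actually have a data path between them.
paths = [("ff_a", "ff_b"), ("ff_c", "ff_d")]


def global_skew(arrival):
    """Earliest-to-latest arrival over all flops, connected or not."""
    return max(arrival.values()) - min(arrival.values())


def worst_local_skew(arrival, paths):
    """Largest arrival difference between flops that share a data path."""
    return max(abs(arrival[cap] - arrival[launch]) for launch, cap in paths)


print(f"global skew:      {global_skew(arrival):.2f} ns")
print(f"worst local skew: {worst_local_skew(arrival, paths):.2f} ns")
```

Here the global skew is 220 ps, but no pair of communicating flops sees more than 30 ps, so a 50 ps local skew budget is still met.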
STA can be run before the clock network has been created (= ideal clock) or after the clock network has been inserted (= propagated clock).
In ideal mode, you are correct that the clock is assumed to arrive everywhere with zero skew, but the SDC constraints do build in a margin for
clock skew called the clock uncertainty. So the amount of time available for a signal to go from the launching FF to the capturing FF is =
Clk_period - setup time - clk_uncertainty.
In propagated mode, however, all the clock arrival times at every FF can be calculated exactly and there are no assumptions made about skew.
The timing equation (for setup) becomes:
Clk_period >= Datapath_delay + Setup + Launch_clock_insertion_delay - Capture_clock_insertion_delay
Notice how there are no estimates or limits here - every delay can be exactly computed. If the equation (and the similar one for hold) is
satisfied then the timing works, no matter what the skew is.
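A hedged Python sketch of the setup check in both modes. It uses the convention that data arrives at (launch insertion delay + datapath delay) and must arrive before (clock period + capture insertion delay - setup time), so a later capture clock relaxes setup. Function names and numbers are hypothetical, and the datapath delay is assumed to include the launching flop's clock-to-Q delay.

```python
def setup_slack_ideal(period, datapath, t_setup, uncertainty):
    """Ideal-clock mode: skew is budgeted through the clock uncertainty."""
    return (period - t_setup - uncertainty) - datapath


def setup_slack_propagated(period, datapath, t_setup, launch_ins, capture_ins):
    """Propagated-clock mode: real insertion delays replace the skew estimate."""
    arrival = launch_ins + datapath
    required = period + capture_ins - t_setup
    return required - arrival


# Hypothetical numbers, all in ns.
print(setup_slack_ideal(period=10.0, datapath=8.5, t_setup=0.3, uncertainty=0.5))
print(setup_slack_propagated(period=10.0, datapath=8.5, t_setup=0.3,
                             launch_ins=2.0, capture_ins=2.2))
```

A non-negative result means the setup check passes for that path.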
PrimeTime is a static timing analyzer (STA). Internally it actually consists of 2 parts:
The first part is a "delay calculator". As the name suggests, this engine calculates the delay through a gate or the delay along a wire. In order to
calculate these delays, PrimeTime needs an electrical-equivalent model for the physical wire. This is generated by another tool called an RC-
extractor (e.g.: Star-RCXT from Synopsys). These "RC parasitics", as the wire model is called, are supplied to PrimeTime in the form of a SPEF
(Standard Parasitic Exchange Format) file. The resulting delay arcs produced by the delay calculator can be dumped out of PrimeTime in the
form of an SDF (Standard Delay Format) file
The second part of PrimeTime is the actual STA engine that computes the slack and the critical paths between all registers, and checks setup
times and hold times.
So, it is quite normal for PrimeTime to use a SPEF because PrimeTime cannot extract parasitics itself. But it is very unusual for PrimeTime to
take in an SDF. This would only happen if the user had some other delay calculator that he/she trusted more than PrimeTime's own delay
calculator.
Hold time is independent of clock speed, which makes it a potential design killer. No matter how much you slow your clock, you may still
have hold violations. They are usually fixed by inserting buffers or downsizing cells (while making sure not to create any setup violations). It is also a
good idea to add some extra hold margin.
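A short Python sketch of why slowing the clock does not help hold. The hold check compares data and clock at the same edge, so the clock period never appears in it. The function names, the buffer delay, and the numbers are hypothetical.

```python
import math


def hold_slack(datapath_min, t_hold, launch_ins, capture_ins):
    """Hold slack in ns. Note that the clock period is not an input."""
    return (launch_ins + datapath_min) - (capture_ins + t_hold)


def buffers_needed(slack, buffer_delay):
    """Buffers of a given (hypothetical) delay needed to remove negative hold slack."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay)


slack = hold_slack(datapath_min=0.15, t_hold=0.05, launch_ins=1.0, capture_ins=1.3)
print(f"hold slack: {slack:+.2f} ns, buffers to insert: {buffers_needed(slack, 0.08)}")
```

Each inserted buffer also lengthens the path for the setup check, which is why the setup slack has to be rechecked afterwards.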
The IR drop range you should care about is related to the library you are using. If your cells operate at a voltage outside the range the library covers, no data is valid
for your STA or anything else.
How do you know at which IR drop value you should reduce it and when you need not? These are constraints given to us by
the fab. If we exceed 5% of Vdd, the chip may not work properly after fabrication, which is the reason we have to
follow that limit.
Hope you got this concept.
Yes. We divide DRC into two types: timing DRC and layout DRC. Timing DRC includes max_transition and
max_capacitance (max load) violations. Timing DRC can be analyzed with respect to your design and the technology library values using STA tools like PT for
sign-off. Sometimes library characterization values are also fed into timing sign-off tools instead of technology library values.
Layout DRC, on the other hand, includes internal layer checks, wide metal checks, and layer-to-layer checks. Can any BE engineer explain how these
checks are done with the design and technology, and which tools are used for sign-off (other than Hercules)?
For standard cell based ASIC we design a number of basic primitives (like NAND,NOR, Inverter,Mux,etc). The layout and schematic design
part of this is called Library Development. For timing and power analysis you should have all the information regarding a particular cell (e.g,
for given input slew and output load how much delay it should have and power also).This part of the Library design procedure is called
characterization, in which you calculate delay and power for each library primitive for given input slew and output capacitance range.
Library Characterization may also involve the recharacterization of existing designs when there are shifts in the process flow. Typically, fabs
will update the spice models if the process shifts (as they do with time or new equipment or whatever -- especially if new process). So you will
need to recharacterize the libraries, otherwise the simulations you do with the primitives would be misleading or out of date or inaccurate -
take your pick.
Cap load is the more meaningful number, since the time a driver takes for its output to reach the new logical value depends on the time taken to
charge the capacitance at its output. However, due to the resistance between the output of the driver and the input of the next cell, the effective load of the receiving
cell as seen at the driver might be less (search for "resistive shielding").
Also, if you don't consider resistance, there is no slew degradation from the output of the driver to the input of the receiver, which again is not true.
In early pre-layout stages, while using wireload models, we do tend to ignore resistive load, as we are only approximating the parasitics.
Q1) Which library do we need to use for synthesis, and why?
ANS: We need to synthesize the RTL with slow.lib, since in the slow library all the delays are characterized at the worst-case
corner, so the delays are high compared to the remaining corners. The cell design itself is the same for all corners. The tool
uses slow.lib for the setup-time corner and fast.lib for hold-time calculations.
Q2) Why is the netlist different when we synthesize with slow.lib and with fast.lib?
ANS:
In any compiler, the tool follows this optimization priority:
1. Timing of the design.
2. Area of the design.
When the tool synthesizes with fast.lib, where the delays are smaller, it maps the design to cells with lower drive strength
(minimum area, since they do not have the extra amplifier stage of a high-strength cell). So you may expect a cell with X1 drive strength instead
of a cell with X2 drive strength. In this manner you may expect differences in the netlist, but if you compare the netlists in a formal verification tool
it will report that there is no mismatch between them. When you synthesize with fast.lib you may see hold-time
problems that are not present when you synthesize with slow.lib.
A WLM does not contain only R and C. It also contains area, slope, and fanout_length. I can create a WLM with R and C set to 0, but what
values should I put for area, slope, and fanout_length in that WLM?
You can do synthesis without a wire load model, only you would need to be more conservative with your timing constraints. That
can be done either by giving a lower value of clock period or a higher value of clock uncertainty. In all this, it is assumed that post-layout SDF
will be used for back-annotation at a later stage. So as a first step, or in cases where it is unavoidable, we use DC to compile the design without a
wireload model.
I generated 2 SPEF files:

1) without metal fill
2) with metal fill
both after routing. When I ran ETS (STA) for timing, I saw not much effect on timing for in2reg but a few ps on reg2out. Could
someone please tell me which timing is mainly affected by metal fill?
Metal fill is not limited to a certain path group in its timing impact; it can affect all timing groups. Generally, if your design is reasonably
utilized, then metal fill should not have a significant impact on timing, because on most layers there should already be a sufficient amount of metal,
so the addition of metal fill is minimized. If, however, your design has low utilization and thus a low amount of routing, then metal fill can
change timing dramatically.
Path-based STA is broadly used by current STA sign-off tools (PrimeTime, TimeCraft). Timing constraints are checked at the endpoint of the
timing path:
the AT (arrival time) of the endpoint should be less than its RT (required time).
Block-based STA:

Timing constraints are checked at each node of the timing path.
Each node has its own AT and RT. If its AT is less than its RT, timing is OK.
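The per-node AT/RT bookkeeping described for block-based STA can be sketched in Python. The timing graph, node names, delays, and the required time at the endpoint are all hypothetical. The sketch assumes a single start point at time 0 and a node list already in topological order.

```python
# Hypothetical timing graph: (from_node, to_node, delay_ns).
edges = [("in", "g1", 1.0), ("in", "g2", 2.0),
         ("g1", "g3", 2.0), ("g2", "g3", 0.5), ("g3", "out", 1.0)]
order = ["in", "g1", "g2", "g3", "out"]  # nodes in topological order


def arrival_times(edges, order):
    """Forward pass: AT of a node is the latest arrival over its fan-in."""
    at = {node: 0.0 for node in order}
    for node in order:
        for src, dst, delay in edges:
            if src == node:
                at[dst] = max(at[dst], at[src] + delay)
    return at


def required_times(edges, order, rt_end):
    """Backward pass: RT of a node is the earliest requirement over its fan-out."""
    rt = {node: rt_end for node in order}
    for node in reversed(order):
        for src, dst, delay in edges:
            if dst == node:
                rt[src] = min(rt[src], rt[dst] - delay)
    return rt


at = arrival_times(edges, order)
rt = required_times(edges, order, rt_end=5.0)
for node in order:
    print(f"{node:>3}: AT={at[node]:.1f}  RT={rt[node]:.1f}  slack={rt[node] - at[node]:+.1f}")
```

Every node gets its own slack (RT - AT), so a violating node can be spotted without enumerating paths one by one.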
Extraction can happen in different modes and different process corners.

From the fab, we get the RC data for different corners (for example slow/fast/nominal etc).
When we talk about modes, extraction for ASICs can happen during the design process in
1. Global mode (post global routing).
2. Final mode (post detailed routing).
In global mode, once RCs are extracted, the delay model that is used is Elmore.
Elmore is a distributed resistance and lumped capacitance model. It accounts for only one pole and is hence fast, but it can introduce some
inaccuracy.
In final mode, the delay model is AWE (asymptotic waveform evaluation). AWE is more accurate because it is a > 2 pole model.
Delay models are important because the delay calculator provides this delay information back to STA (static timing analysis) engine.
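As an illustration of the Elmore model, here is a minimal Python sketch for a simple RC ladder (a wire cut into series segments with no branches). Each resistance is multiplied by the total capacitance downstream of it. The function name and the segment values are hypothetical.

```python
def elmore_delay(segments):
    """Elmore delay (seconds) to the far end of an RC ladder.

    segments: list of (R_ohm, C_farad) from driver to sink, where each C sits
    at the downstream end of its R. Delay = sum of R_i * (capacitance at and
    beyond node i).
    """
    downstream = sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        delay += r * downstream
        downstream -= c  # this segment's cap is not seen by later resistors
    return delay


# Hypothetical wire: three segments of 100 ohm and 10 fF each.
wire = [(100.0, 10e-15)] * 3
print(f"Elmore delay: {elmore_delay(wire) * 1e12:.1f} ps")
```

This is a single-number estimate, which is why it is fast but less accurate than a multi-pole model.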
Extraction is so important because it affects STA (static timing analysis), which dictates whether you met your setup and hold times at every single
flop in your chip.
STA finds the longest (critical) path in your circuit, which determines your circuit frequency, and also checks whether you met all the hold times in
your circuit.
STA uses extraction data at the fast corner while calculating hold and at the slow corner while calculating setup, to be as pessimistic as possible so that your
chip doesn't fail after it comes back from the fab.
Extraction can also be classified as lumped or coupled. In lumped mode you basically try to reduce a large RC circuit into an equivalent smaller RC
circuit, taking just the dominant poles into account (without any coupling). This is important because you want to reduce large RC circuits to
smaller ones so that you can save on computer memory and run time.
In coupled mode, you extract the coupling capacitances, which are needed for analyzing crosstalk delay/noise effects on
your chip. If we are extracting at gate level, we could use an equivalent RC circuit representation for each gate, which represents the gate
input/output capacitances along with holding resistances, and plug these models into the RC network. But this is only for delay computation
purposes (especially for fast slew propagation), and it can be inaccurate. Hence we don't represent gates using RCs. Rather, we represent
them using NLDMs and .libs (Liberty format).
Reduction only happens mostly for passive components (wires) and not active elements. This reduction we term as model order reduction
(MOR).
We usually plug in .lib (Liberty) models for gates. Liberty models have tables that give gate delay as a function of input transition
time and output load. What we use at global and final mode for extraction is usually called a 2.5D extractor. It makes use of rules that are
generated using a 3D extractor.
A 3D extractor is the most accurate, as it uses Maxwell's equations (to be precise, Green's functions) to calculate the RCs of the various geometries that
your fabrication facility will most likely manufacture.
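The Liberty delay tables mentioned above are indexed by input transition time and output load. A rough Python sketch of such a lookup using bilinear interpolation follows. The table values, axis points, and function name are hypothetical, and real delay calculators may interpolate differently.

```python
from bisect import bisect_right


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by [slew][load].

    slews and loads are ascending axis values; table[i][j] is the delay at
    (slews[i], loads[j]). Points outside the axes are extrapolated linearly.
    """
    def bracket(axis, x):
        # Index of the lower grid point and the fractional position above it.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    return (table[i][j] * (1 - ts) * (1 - tl) + table[i][j + 1] * (1 - ts) * tl
            + table[i + 1][j] * ts * (1 - tl) + table[i + 1][j + 1] * ts * tl)


# Hypothetical 2x2 table: slew in ns, load in pF, delay in ns.
slews = [0.1, 0.5]
loads = [0.01, 0.05]
table = [[0.10, 0.20],
         [0.15, 0.25]]
print(f"delay at slew=0.3 ns, load=0.03 pF: {nldm_delay(slews, loads, table, 0.3, 0.03):.3f} ns")
```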
To find out the effect of temperature on clock speed we have to look into semiconductor physics. Semiconductors (p-type and n-type) may
exhibit either a positive or a negative temperature coefficient. When temperature increases, carrier mobility in MOS transistors decreases. This decrease in
mobility reduces the drive current of the transistor, which makes the transistor slower. Thus, an increase in temperature will generally decrease the clock speed of a
digital circuit.
Increasing the supply voltage increases the potential difference at the gate, and if the source-drain voltage difference is high, the
number of carriers injected may also increase. This increase in carrier injection tends to increase the switching speed of the transistor,
thereby increasing the achievable clock speed of the digital circuit.
Simulation: simulation is "dynamic timing analysis", meaning test benches/test vectors are used to sensitize each path and check its
timing. This is not the best way to verify timing, because it is not practically possible to sensitize each and every path in the design using test
vectors. Simulation is used to verify the functionality of the design, not its timing.
STA: instead of checking functionality, this breaks up each and every path in the design into timing arcs and verifies purely the timing
between 2 sequential elements.
STA tools divide the logic of the design into timing start points (SP) and timing end points (EP). Valid SPs are input ports and FF clock pins,
and valid EPs are FF data pins and output ports. After applying the clocks and your constraints, the STA tool checks each timing
path (a path runs from a valid SP to a valid EP). Assume your design has a single clock and runs at 111 MHz (9 ns). The tool checks whether every path
works at 111 MHz or not. It won't check beyond that (because your constraints say to check only for 111 MHz). Even though one path works at
150 MHz, it doesn't mean your design can run at 150 MHz.
Negative skew means the clock reaches the capture flop earlier than it reaches the launch flop, so you have a smaller timing window in which to meet timing. In other
words, the possibility of a setup violation increases, so ultimately your frequency will reduce. If your skew is negative, the required time is
smaller, so the probability of violating setup is higher. The larger the violation, the lower the achievable frequency of the circuit.
The analysis modes can be classified in different ways, based on how timing arcs are calculated from clock arrival and how slew is propagated across the
path. Let me explain with PT as the sign-off tool. It may be applicable to other tools too, but I have not yet worked with other timing sign-off tools.
To make this clear, let us consider single OC (operating condition) along with BC-WC and OCV.
Single OC:
At launch clock path: late clock, max delay in clock path, no derating, single OC.
At data path: max delay, single OC, no derating.
At capture clock path: early clock, min delay in clock path, single OC, no derating.
Slew propagation: max slew is propagated irrespective of setup or hold.
BC-WC:
At launch clock path: late clock, max delay in clock path, late derating, WC OC.
At data path: max delay, WC OC, late derating.
At capture clock path: early clock, min delay in clock path, WC OC, early derating.
Slew propagation: max slew is propagated during setup analysis and min slew is propagated during hold analysis.
OCV:
At launch clock path: late clock, max delay in clock path, late derating, WC OC.
At data path: max delay, WC OC, late derating.
At capture clock path: early clock, min delay in clock path, BC OC, early derating.
Slew propagation: during setup analysis, max slew for the data launch path and min slew for the data capture path. During hold analysis, min
slew for the data launch path and max slew for the data capture path.
Single OC is used before layout, since the design has not yet been extended with a clock tree structure. BC-WC is recommended to test both corners after
layout, but OCV adds more pessimism to account for PVT variations across the die.
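To make the effect of late and early derating concrete, here is a simplified Python sketch of a setup check with OCV-style derates. The derate factors and delays are hypothetical. The sketch applies one late factor to the whole launch clock and data path and one early factor to the capture clock path, and it ignores the removal of pessimism on any clock segment shared by both paths.

```python
def ocv_setup_slack(period, launch_clk, data, capture_clk, t_setup,
                    late=1.05, early=0.95):
    """Setup slack (ns) with late derating on launch + data and early on capture.

    Simplification (an assumption): no credit is given for clock-tree segments
    common to the launch and capture paths.
    """
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


nominal = ocv_setup_slack(5.0, 1.0, 3.5, 1.0, 0.2, late=1.0, early=1.0)
derated = ocv_setup_slack(5.0, 1.0, 3.5, 1.0, 0.2)
print(f"no derating: {nominal:+.3f} ns, with 5% derates: {derated:+.3f} ns")
```

The derated slack is smaller than the nominal one, which is the extra pessimism that OCV analysis adds.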
Hold has higher priority than setup. An FPGA cannot operate correctly if hold violations remain in the design.
Setup violations directly determine the best operating frequency of the FPGA (performance).
Setup violations are broadly classified based on synchronizer/gated logic, e.g. clock gating default setup violations, and recovery for
synchronizers.
In detail, setup can be fixed before/after clock tree synthesis and post-routing.
Most commonly used methods:
- Use the best physical placement of logic (timing-driven placement, force placement, net weights, good floorplanning, etc.)
- Upsize the cells driving long nets
- If possible, use high-speed cells in the logic path causing the setup violation
- Reduce logic depth by deleting cascaded buffers / restructuring.
- Pin swapping / cloning / buffering / moving, etc.
1. Too many logic levels. This is commonly caused by deeply nested "if else" in HDL code, a big adder tree with no pipelining, a big comparator, or
other arithmetic circuits synthesized into combinational logic. When the number of logic levels increases, the path delay increases, and beyond a certain point
setup time problems come up. One method to decrease logic levels is to use "case" instead of "if else". Also add pipelining to big arithmetic
circuits that would otherwise be synthesized into combinational logic.
2. Long route delay or big fan-out. A long route can cause a big propagation delay on a signal, and this delay eats into the setup margin. When
place and route is not done properly in the FPGA, this problem will come up. One way to combat it is to use the highest
P&R effort level. These options can be selected in the ISE tool menus: you can choose timing-driven mode, one-hot encoding of state machines in the synthesis attributes
and placement attributes of your design tool, and disable the resource sharing attribute. If the tool can't solve the problem at its high effort level, then you
have to do something yourself to reach the desired performance. You can also use a high-speed net to replace the long-delay net
by writing a UCF constraint in the UCF editor, and make proper location constraints. Another way to resolve a long net delay is to insert
several FFs on the long-delay net to divide it into several short ones. Big fanout can cause a long transition time on the signal edge, and the long
transition time also eats into the setup margin. Two methods can be used to resolve this problem. First, you can use your tool to fix the
problem automatically by restricting the fanout limit listed in the synthesis tool's property menu. Change it to a much smaller value if you
have found that the setup problem is caused by large fanout. Second, you can duplicate the FF that has a big fanout: use two or
more FFs to drive the signal that was driven by a single FF before.
3. Improperly designed asynchronous circuits can bring in setup problems too. Try your best to avoid asynchronous design; if it can't be
avoided, use a proper synchronizer circuit to make robust transitions between asynchronous clock domains.
4. Add input and output registers to your design to register data coming into and going out of the FPGA, then pack these registers into the IOBs
to minimize the input delay and output delay at the boundary of your FPGA.
5. Some advanced techniques can be used to improve your design performance too, such as multicycle paths, latches, and so on.
So, a few things you can try in the backend design:

1. Identify the worst negative slack path.
2. Check the physical placement and routing of the cells. Check whether the violation is due to bad placement or bad routing. If yes, fix this, extract
SPEF and rerun STA.
3. This is because, during synthesis, this path should have passed before the synthesis engineer handed the gate-level netlist to the backend.
4. If there is still a violation, then you can consider clock skew to improve your STA violation.
5. You can also check with the synthesis engineer on the slack for this path during synthesis.
Pre-layout Simulation:
1. RTL Simulation: to ensure that the design is functionally correct.

2. Gate-level Simulation: the RTL is now synthesized and we have a gate-level netlist.
We use this gate-level netlist to perform simulation.
To ensure functionality and that the design meets its specific timing requirements, we also perform static timing analysis on the gate-level netlist.
3. ATPG Simulation: we also take the gate-level netlist and perform zero-delay simulation and ATPG simulations.
Post-layout Simulation:
Now that we have performed layout (place and route), we have real physical design data for our design.
We perform extraction (to extract the resistance/capacitance values of the design) into the format called SPEF (Standard Parasitic
Exchange Format).
We use the place-and-route Verilog netlist and the extracted SPEF file in static timing analysis and generate the SDF (Standard Delay
Format) file.
1. Dynamic gate-level simulation: use the place-and-route Verilog netlist, the SDF file, and the test vectors in our design testbench, and
ensure that the design works after layout.
2. Static timing analysis: the place-and-route Verilog netlist, the SPEF file, and the SDC (design constraints) file are used to perform
timing analysis.
3. Power analysis: perform power simulations and ensure the design meets power requirements.
4. Noise analysis: perform crosstalk noise simulations and ensure the design is immune to noise.
The chip's designed voltage range is 2.5 V to 5.5 V. The real chip works great at Vdd = 3 V, but at Vdd = 5 V one routine gets stuck. If I reduce the
external clock from 14 MHz to 1 MHz, then it works fine. The chip works at room temperature, but the same routine gets stuck at high temperature,
like 100 C. Is that because of a timing violation? The digital block is rectangular; three sides of this block have thick power lines and the fourth
side is open. Will that cause a power drop problem?
The thing is, your chip is not showing consistent results:

Lowering the frequency makes it work: this means it is a setup problem.
Raising the voltage should then also make it work, but it is failing, which suggests a hold problem.
Raising the temperature makes it fail: again a setup problem is suspected.
So,
1. Make sure that you are varying only one parameter at a time.
2. Try to get consistent results: the same path is not likely to fail for both setup and hold. So even if the same routine is failing, there may be 2 paths
involved in the same routine, one failing due to setup and the other due to hold.
My recommendation:
Do a full-chip STA at different corners. You will surely see which path is failing. I don't think the power rail has anything to do with it.
Hope it helps.
26. How do we use ILMs for timing optimization?

A) ILM stands for Interface Logic Model. When faced with running STA on huge blocks, we use ILMs of child blocks to run top-level STA for
better run times. ILMs essentially contain just the input and output (interface) timing for a block and not the block's internal timing.
27. How do we partition the design?

A) It depends on a lot of factors, like size, logical hierarchy, timing, and aspect ratio. It varies from design to design.
29. What is AWE (Asymptotic Waveform Evaluation)?

A) It is used by timing engines in tools like PT and Astro to calculate delays. It is more accurate than Elmore.
30. Why is power planning done, and how? Which metal should we use for power and ground rings and stripes, and why?
A) To supply power to every part of the design. In general we use the top metal layers, since the resistance associated with those layers is lower.
31. How do we eliminate negative slack if it occurs during the first optimization stage (trial routing)?
A) During the estimation stage we estimate without net delays. If timing is not met, we need to analyze and change the SDC or report to the
front-end teams.
32. How do we calculate the die size from the cell count of our design?
A) We use (2-input NAND gate area x cell count) + macro area to calculate the die size.
33. Why is parasitic extraction done only for R and C, and why not L (inductance)?
A) Since digital ICs run at relatively low frequencies, inductance is not a big contributor. Hence we ignore it.
34. What are the output files after physical design?

A) GDS, SDF, netlist, etc.
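The die-size estimate from answer 32 can be written as a small Python sketch. The function name and the numbers are hypothetical, and the utilization factor is an added assumption that is not part of the original answer (set it to 1.0 to get the formula exactly as stated).

```python
def estimate_die_area(cell_count, nand2_area_um2, macro_area_um2, utilization=1.0):
    """Rough die area in um^2: NAND2 area x cell count + macro area.

    `utilization` (an assumption, not from the answer above) scales the
    result up to leave room for routing and power; 1.0 disables it.
    """
    core = cell_count * nand2_area_um2 + macro_area_um2
    return core / utilization


# Hypothetical design: 1M NAND2-equivalent cells of 1 um^2 each, 2 mm^2 of macros.
area = estimate_die_area(1_000_000, 1.0, 2_000_000.0)
area_70 = estimate_die_area(1_000_000, 1.0, 2_000_000.0, utilization=0.7)
print(f"raw estimate: {area / 1e6:.2f} mm^2, at 70% utilization: {area_70 / 1e6:.2f} mm^2")
```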
Static Timing Analysis is a methodology to analyze and validate timing on all the timing paths in a chip.
The various timing paths in a chip are:
1. Purely combinational path (starting from a chip input port and ending at a chip output port).
2. Path starting from an input port and ending at the data input of a register.
3. Path starting from the output of a register and ending at an output port of the chip.
4. Purely register-to-register path (reg-to-reg path).
The best part of static timing analysis is that it qualifies all the timing paths of your chip. The only disadvantage of this approach is that if
you have applied a false path or multicycle path exception by mistake, that path is not covered.
Dynamic timing analysis, on the other hand, is also called gate-level simulation with timing information. Here you validate with test
vectors (specific to your chip's application), so you are guaranteeing the timing of the chip only for the test cases you are interested in. The
quality of the simulation depends on the quality of your test cases. The main advantage or purpose of dynamic timing simulation is that it is a
mechanism to catch false or multicycle paths that you applied by mistake while performing static timing analysis.
Suppose there are 10 setup and 10 hold violations in a design. Your manager tells you that the design needs to be taped out tomorrow and
that as many violations as possible need to be fixed. How do you go about fixing these violations?
Which violations do you try to fix first, setup or hold, and why?
Hold violations are more critical than setup violations. I would fix hold first and then try setup, because to avoid setup problems you can
reduce the frequency of the design to make it work, but if a hold problem persists your chip fails functionally.
Pre P&R:
1) Fix setup before hold, because once the clock skew is known after P&R the hold violations will be fixed at that stage, and there is a
chance that some setup violations get fixed and some new ones appear.
Hold is fixed using delay cells or buffer cells at the P&R stage.
Post P&R:
1) Fix hold before setup. If a hold violation remains, the chip will not work; if a setup violation remains, the chip will still work at a
reduced frequency. So fix hold first.
2) Hold violations are fixed by inserting delay cells or basic buffers in the violating path.
3) After fixing hold, find the paths that cause the largest setup violations. Add buffers or adjust the drive strength of the slowest cells,
then rerun STA and check the violation report. I think this will remove most of the problems above; if it does not remove all of them,
repeat step 3 on the violating paths until all the violations that can be removed are removed.
In DFT, a lockup latch is added where one scan chain crosses different clock domains, to avoid hold time violations. But if the skew is
greater than half of the cycle, I think a lockup latch will not solve the hold time violation.
"In very deep-submicron VLSI, certain manufacturing steps, notably optical exposure, resist development and etch, chemical vapor
deposition and chemical-mechanical polishing (CMP), have varying effects on device and interconnect features depending on local
characteristics of the layout. To make these effects uniform and predictable, the layout itself must be made uniform with respect to certain
density parameters. Traditionally, only foundries have performed the post-processing needed to achieve this uniformity, via insertion
("filling") or partial deletion ("slotting") of features in the layout. Today, however, physical design and verification tools cannot remain
oblivious to such foundry post-processing. Without an accurate estimate of the filling and slotting, RC extraction, delay calculation, and
timing and noise analysis flows will all suffer from wild inaccuracies. Therefore, future place-and-route tools must efficiently perform filling
and slotting prior to performance analysis within the layout optimization loop."
I would like to mention that dummy metal and poly are added at the final stage of P&R, after STA. As technology scaling reaches 90nm,
65nm, etc., the parasitics due to this dummy metal have started to spoil the timing of the design, so EDA vendors are
working on how to place dummy metal in the chip without affecting the timing of the design.
A gate-level netlist is the representation of a circuit in terms of its gates and the interconnections between them.
How to decide the clock jitter

Clock jitter is a random effect due to VDD/GND supply noise, interference from other chips on the PCB, and the input reference
noise from the crystal oscillator of your chip. If you have a clock tree built, then the timing difference between the slowest D flip-flop and the
fastest D flip-flop in the same clock domain is your clock tree skew (people use skew instead of jitter when talking about the clock tree).
Therefore, the total clock uncertainty is equal to the sum of the clock tree skew and the PLL clock jitter, or the oscillator clock jitter if you
don't use a PLL.
Typically, clock tree synthesis using Astrx or silicon ensembxx does a very good job and the range is less than 0.5ns, so you can
safely set set_clock_uncertainty to 0.5ns for setup and hold time in STA. Usually the clock skew in the fast corner is less than in the slow
corner, since the delays are smaller in the fast corner.
The main jitter comes from random noise on VDD and VSS. There is a rough calculation of jitter: if the ripple voltage is dv=100mV, the gain of
the buffer is G=10, the rise/fall time of the buffer is tr=0.1ns (=100ps), and VCC=3.3V, consider the worst case, in which VCC is VCC-0.5dv
before the buffer and VCC+0.5dv after the buffer. There is a first-order approximation: (dv/G)/dt = VCC/tr ------ (1)
so the uncertainty is dt = tr*(dv/G)/VCC = 100ps*(100mV/10)/3.3V = 303fs
From eq. (1), to reduce jitter (uncertainty) you can do the following:
A) Increase G; but the higher the gain, the slower the speed.
B) Decrease tr; this holds only inside the IC. If the bonding wire is considered, this increases ground bounce, so dv gets
larger.
C) Increase VCC; but the higher the VCC, the more power is required.
D) Reduce dv with decoupling capacitors between VDD and VSS. Because most of the ripple comes from the ground bounce generated
by L*di/dt, a BGA package is always better than SSOP.
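As a quick check of the arithmetic in eq. (1), the estimate can be reproduced in Python. This is only a sketch of the first-order formula above; the function and parameter names are made up for the illustration.

```python
def jitter_estimate(tr_s, dv_v, gain, vcc_v):
    """First-order uncertainty from eq. (1): dt = tr * (dv / G) / VCC."""
    return tr_s * (dv_v / gain) / vcc_v


# Values from the text: tr = 100 ps, dv = 100 mV, G = 10, VCC = 3.3 V.
dt = jitter_estimate(tr_s=100e-12, dv_v=0.1, gain=10, vcc_v=3.3)
print(f"dt = {dt * 1e15:.0f} fs")  # prints "dt = 303 fs"
```

Doubling G or halving dv in the call halves the result, which matches options A and D in the list.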
An SDF file contains cell delays and interconnect delays. At the synthesis level this file is used for timing analysis, but its interconnect
delays are only an estimate, not the real interconnect delays. Only after complete place and route can we get the exact interconnect delays.
After P&R, the parasitics are extracted using tools, and that information is the SPEF file. This file is fed back to static timing analysis
tools to do timing analysis on the final P&R netlist.
A SPEF file describes the net connectivity and RC information.
After running LVS (and passing it), you can generate the SPEF file from an RC extraction tool (xCalibre or Calibre xRC from Mentor, Star-RCXT
from Synopsys) or from the APR tool.
SPEF has parasitic information. Only capacitance and resistance values are annotated on the nets when you use SPEF, and the cell delays come
from the library and are calculated based on the loading of each cell. With SDF, all net delays are annotated as absolute values, as calculated
and included in the SDF file. In short, SPEF is more accurate than SDF.
As for the SDF file, it describes cell delays (from the synthesis library) and interconnect delays (from the SPEF file).
You can generate it from the APR tool or from PrimeTime (STA tool).
If you want an accurate SDF file for STA, I recommend using an RC extraction tool to generate the SPEF file and loading it into PrimeTime
to generate the SDF file.
A false path within a single clock domain is definitely possible. One example is a circuit with two mutually exclusive modes of operation.
There could be physical paths from the logic of one mode to the logic of the other that are not logical paths.
One common example is test modes. For example, when BIST is running, you may not care about paths starting from your BIST controller and
flowing into your functional logic and vice versa.
To increase data throughput rates, both the rising and the falling clock edge are sometimes used for clocked elements. But this causes a
number of problems, in particular:
• An asymmetrical clock duty cycle can cause setup and hold violations.
• It is difficult to determine critical signal paths.
• Test methodologies such as scan-path insertion are difficult, as they rely on all flip-flops being activated on the same clock edge. If scan
insertion is required in a circuit with double-edged clocking, multiplexers must be inserted in the clock lines to change to single-edged
clocking in test mode.
It is safer to use a clock at twice the frequency of your source clock signal.
False paths are paths which you want to exclude from STA. Following are examples of false paths:
1. Paths between asynchronous clock domains: these are taken care of by demetastabilization (synchronizer) circuits and are ignored in STA.
2. Paths that exist in the circuit but that no combination of input vectors can exercise.
There is a major difference between a clock tree and a reset tree with regard to correct design practices.
1. The clock tree must always be skew balanced to avoid synchronous skips and races.
2. Reset trees, especially where the reset is asynchronous, may not need to be skew balanced (most of the time).
3. The reset tree can be more heavily loaded than the clock tree; e.g., a relaxed DRC rule can be set for the reset tree, since flip-flops are
less sensitive to slow slew rates on the reset input than on the clock input.
4. Some people consider synchronizing the reset input signal with the main system clock. While this is a correct practice to avoid metastability
at the trailing edge of reset, some skew problems may arise. For those cases, careful STA must be run to alert the designer.
In pre-layout STA, the interconnect delay is calculated via a wire load model, which computes the net delay according to the fanout number.
It is a statistical method.
All nets between modules use the wire load model inherited from the top design when wire_load_mode is set to top, or the one chosen from
the enclosing module when wire_load_mode is set to enclosed.
design specification
|
design architecture ( referring to other similar design)
|
RTL coding
|
RTL simulation (block-level and system level)
|
logic synthesis and DFT
|
pre-layout STA
|
floor-planning
|
Place & Routing, CTS
|
DRC & LVS
|
post-layout STA
|
formal check and gate-level simulation
SDF represents the worst timing arc, but RSPF can be used by the delay calculator to compute multiple timing arcs between two nodes,
especially for regions crossing different clock domains.
For STA, we should rely on parasitic extraction. The SDF is only for simulation.
(1) If you need accurate results, use the GDS to do the RC extraction, then use the
generated RC HSPICE netlist to do the transistor-level simulation.
(2) If you just need to run an HSPICE simulation without high accuracy requirements, you can use "nettran" (in Hercules) or "v2lvs" (in
Calibre) to transform the Verilog netlist into an HSPICE netlist. To preserve accuracy, some CShunt may be added according to the technology
library used.
When you do HSPICE simulation, ask your
place & route vendor for the CDL netlist. After
P&R, the layout tools can extract transistor
models and wire resistance/capacitance values.
These values are used for transient simulation.
A CDL netlist is different from a Verilog netlist.
If you use invclk instead of bufclk, the clock tree will be very long because of the fanout of the clock. So the tool decides where to use
inverters or buffers.
How important are transition time violations? I have got 600-700 max transition violations.
The clock slew affects the setup and hold time of a flop. The signal slew affects the delay of gates and wires, as well as the noise immunity of
certain circuits. Clock/signal slew mainly depends on your implementation methodology and technology node.
If you target a low transition value, the tool will insert more buffers and improve the slew. The slew value for a particular library
cell definitely depends on the technology node, as the output slew of a cell is a function of the input slew and the output capacitance,
which is technology dependent. Max transition time is one of the three design rules (max fanout, max transition, max capacitance).
It is much more important than setup/hold timing.
As we all know, in STA the delay of each standard cell is calculated by looking up the NLDM (non-linear delay model) tables defined in the
library. These tables are indexed by two factors: input transition time and output load. The table result is the delay of the cell for a given
input transition and output load.
If the input transition or output load is within the table range but not one of the NLDM index values, interpolation is used to calculate the delay.
If the input transition or output load is out of the NLDM range, extrapolation is used, and naturally the result is
rather inaccurate.
So the STA will be rather inaccurate, and the timing analysis cannot be trusted.
Now you can understand how important max transition is.
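The table lookup described above can be illustrated with a small Python sketch. It assumes plain bilinear interpolation between the four surrounding table points; real delay calculators may use a different scheme. The table values and all names are hypothetical. The function refuses to extrapolate, which mirrors the point that results outside the characterized range should not be trusted.

```python
from bisect import bisect_right


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by (input slew, output load).

    Raises ValueError outside the characterized range instead of extrapolating.
    """
    if not (slews[0] <= slew <= slews[-1] and loads[0] <= load <= loads[-1]):
        raise ValueError("operating point outside characterized table range")
    # Lower index of the bracketing interval on each axis.
    i = min(bisect_right(slews, slew) - 1, len(slews) - 2)
    j = min(bisect_right(loads, load) - 1, len(loads) - 2)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    # Interpolate along the load axis first, then along the slew axis.
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical table: rows = input slew (ns), columns = output load (pF).
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05]
DELAYS = [[0.10, 0.20],
          [0.20, 0.40]]

print(nldm_delay(SLEWS, LOADS, DELAYS, slew=0.3, load=0.03))
```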
Another effect of a max transition violation is that a larger transition results in higher DC power consumption. A margin of 30% of max
transition is allowed; for example, if the max transition constraint is 1ns, then 1.3ns is allowed.
You are talking about the design rule violations, which are max transition (slew violation), max fanout, and
max capacitance (load violation).
These occur if your design does not meet the constraints, so you have to fix them by adding buffers at the appropriate positions, guided by
the max transition, max capacitance and max fanout reports.
The question is how to locate a false path. What you answered is that once you have analyzed a path and found it to be false, you can
declare it with the set_false_path command. But the question remains: how do you first identify that the path is a false path, before
declaring it with set_false_path?
First, run the timing analysis tool without setting any false paths. The tool will report some paths with very large violations.
Then correlate each such path with the design spec to decide whether it needs to meet timing or not. That way you
will get all the false paths.
Your designer will give you some false paths; load those first, otherwise you will end up with a huge number of violations.
1. What is the reason for flipping the cell rows?

1. To save area, we abut two cell rows so that the cells share common power and ground rails. That is why we flip the cell rows.
2. When you need to leave a gap between the cell rows, how do you determine the height of the gap?
2. If the horizontal routing resources are not enough, we provide a gap between the cell rows to add some extra horizontal
routing resource in metal1. The gap is determined by the metal1 minimum spacing rule.
3. Why should core power pads not be connected to the core power rings at the end?
3. Core power pads should be connected to the core power rings during floorplanning.
4. If you are to use both vertical and horizontal stripes, what are the considerations for deciding which one should be added to the power
plan first?
5. How do you find out all the requirements of the clock tree?
5. The SDC obtained after synthesis should give you a proper idea about the clocks in the design.
6. What is the purpose of filler cells?

6. Filler cells fill gaps between pad cells and provide routing between them.
7. Filler cells usually have widths that are given as 1x, 2x, 4x, 8x, etc. Why does the next bigger filler cell always have its width doubled?
8. Why is it better to insert the filler cells after detailed routing?
8. Because at this point the cells are well placed and their locations are fixed; we are not going to change the cell placement, so we
know the exact locations for the filler cells. That is why we insert the filler cells after detailed routing.
8. Since fillers don't affect timing, there is no need to add them in earlier stages of routing.
9. Why can physical verification detect DRC and LVS violations that are not detected by the P&R tool?
9. Most P&R tools look only at the metal layers present in standard cells; they do not look at the base layers (OD, active or diffusion). With
only metal information visible, it may not be possible for them to catch all LVS issues.
10. STA passes but the simulation fails on the same logic path. Reason?
11. Simulation passes but STA fails on the same logic path. Reason?
12. The timing requirement of a design is met after the physical synthesis step. Clock tree synthesis is then performed and all the clock trees
meet the skew and latency specifications. However, STA shows that there are many timing paths with very poor timing slack. What is the
reason for the poor timing slack?
Yes, you can use a clk/2 clock. But internally generated clocks give rise to testability issues, because the logic driven by an internally
generated clock can't be
made part of the scan chain. Writing timing constraints for generated
clocks becomes more difficult as well.
Solution: Add test circuitry to bypass the internally generated clock.

For example, if you have a divide-by-2 clock, add a MUX to
select a primary input clock over the internally generated one
for test. The MUX select line should be controlled by a
test-mode signal coming from a primary input.
Static Timing Analysis Overview

Static timing analysis is a method of validating the timing performance of a design by checking all possible paths for timing violations. To
check a design for violations, PrimeTime breaks the design down into a set of timing paths, calculates the signal propagation delay along each
path, and checks for violations of timing constraints inside the design and at the input/output interface. Another way to perform timing
analysis is to use dynamic simulation, which determines the full behavior of the circuit for a given set of input stimulus vectors. Compared
with dynamic simulation, static timing analysis is much faster because it is not necessary to simulate the logical operation of the circuit.
• Path 1 starts at an input port and ends at the data input of a sequential element.
• Path 2 starts at the clock pin of a sequential element and ends at the data input of a sequential element.
• Path 3 starts at the clock pin of a sequential element and ends at an output port.
• Path 4 starts at an input port and ends at an output port.
Delay Calculation
After breaking down a design into a set of timing paths, PrimeTime calculates the delay along each path. The total delay of a path is the sum
of all cell and net delays in the path. After layout, an external tool can accurately determine the delays and write them to a Standard Delay
Format (SDF) file. PrimeTime can read the SDF file and back-annotate the design with the delay information for layout-accurate timing
analysis. PrimeTime can also accept a detailed description of parasitic capacitors and resistors in the interconnection network, and then
accurately calculate net delays based on that information.
Cell Delay
Cell delay is the amount of delay from input to output of a logic gate in a path. In the absence of back-annotated delay information from an
SDF file, PrimeTime calculates the cell delay from delay tables provided in the technology library for the cell.
Net Delay
Net delay is the amount of delay from the output of a cell to the input of the next cell in a timing path. This delay is caused by the parasitic
capacitance of the interconnection between the two cells, combined with net resistance and the limited drive strength of the cell driving
the net. PrimeTime can calculate net delays by the following methods:
• By using specific time values back-annotated from an SDF file
• By using detailed parasitic resistance and capacitance data back-annotated from file in RSPF, DSPF, SPEF, or SBPF format
• By estimating delays from a wire load model
A wire load model attempts to predict the capacitance and resistance of nets in the absence of back-annotated delay information or parasitic
data. The technology library provides statistical wire load models for estimation of parasitic resistance and capacitance based on the number
of fanout pins on each net. Using a wire load model is less accurate than using back-annotated delays or parasitic data.
Constraint Checking
After PrimeTime determines the timing paths and calculates the path delays, it can check for violations of timing constraints, such as setup
and hold constraints. The amount of time by which a violation is avoided is called the slack.
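To make the slack definition concrete, here is a minimal Python sketch of a setup and a hold check for a single register-to-register path. It assumes a single-cycle path with ideal edges and no uncertainty or derating; the function names and the numbers (in ps) are hypothetical and are not PrimeTime output.

```python
def setup_slack(clock_period, launch_clk_delay, capture_clk_delay,
                data_delay_max, t_setup):
    """Setup slack = data required time - data arrival time."""
    arrival = launch_clk_delay + data_delay_max
    required = clock_period + capture_clk_delay - t_setup
    return required - arrival


def hold_slack(launch_clk_delay, capture_clk_delay, data_delay_min, t_hold):
    """Hold slack = data arrival time - earliest allowed arrival time."""
    arrival = launch_clk_delay + data_delay_min
    required = capture_clk_delay + t_hold
    return arrival - required


# Hypothetical values in ps.
print(setup_slack(1000, 100, 120, 700, 50))  # 270: setup is met
print(hold_slack(100, 120, 30, 40))          # -30: hold is violated
```

A negative result is a violation; a positive result is the amount by which the violation is avoided.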
Setup and Hold Checking for Latches

Latch-based designs typically use two-phase, non-overlapping clocks to control successive registers in a data path. In these cases, PrimeTime
can use time borrowing to lessen the constraints on successive paths. To perform hold checking, PrimeTime considers the launch and capture
edges relative to the setup check. It verifies that data launched at the start point does not reach the endpoint too quickly, thereby ensuring that
data launched in the previous cycle is latched and not overwritten by the new data.
PrimeTime lets you specify the following types of timing exceptions:

• False path – A path that is never sensitized due to the logic configuration, expected data sequence, or operating mode.
• Multicycle path – A path designed to take more than one clock cycle from launch to capture.
• Minimum/maximum delay path – A path that must meet a delay constraint that you specify explicitly as a time value.
Reading the Design Data
The first step is to read in the gate-level design description and associated technology library information. PrimeTime accepts design
descriptions and library information in .db format and .ddc and gate-level netlists in Verilog and VHDL formats. The set_output_delay
command specifies the minimum and maximum amount of delay between the output port and the external sequential device that captures
data from that output port. This setting establishes the times at which signals must be available at the output port in order to meet the setup
and hold requirements of the external sequential element. The set_input_delay command specifies the minimum and maximum amount of
delay from a clock edge to the arrival of a signal at a specified input port.
Specifying the Environment and Analysis Conditions

PrimeTime allows you to specify the operating environment and the conditions for timing analysis. For example, you can do the following:
• Specify the process, temperature, and voltage operating conditions, as characterized in the technology library
• Apply case analysis and mode analysis to restrict the operating modes of the device under analysis
• Specify driving cells at input ports and loads at output ports
• Specify timing exceptions for paths that do not conform to the default behavior assumed by PrimeTime
• Specify the wire load model or back-annotated net information used to calculate net delays
PrimeTime offers three analysis modes with respect to operating conditions, called the single, best-case/worst-case, and on-chip variation
modes.
In the single operating condition mode, PrimeTime uses a single set of delay parameters for the whole circuit, based on one set of process,
temperature, and voltage conditions.
In the best-case/worst-case mode, PrimeTime simultaneously checks the circuit for the two extreme operating conditions, minimum and
maximum. For setup checks, it uses maximum delays for all paths. For hold checks, it uses minimum delays for all paths. This mode lets you
check both extremes in a single analysis run, thereby reducing overall runtime for a full analysis.
In the on-chip variation mode, PrimeTime performs a conservative analysis that allows both minimum and maximum delays to apply to
different paths at the same time. For a setup check, it uses maximum delays for the data path and minimum delays for the clock path. For a
hold check, it uses minimum delays for the data path and maximum delays for the clock path.
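The three analysis modes can be summarized as a small lookup, sketched here in Python. The mode labels "single", "bc_wc" and "ocv" and the function name are hypothetical shorthand for this illustration, not PrimeTime option values; the mapping itself follows the description above.

```python
def corner_selection(mode, check):
    """Return which delay set ('min', 'max' or 'single') the data and clock paths use."""
    if check not in ("setup", "hold"):
        raise ValueError(f"unknown check: {check}")
    if mode == "single":
        # One set of delay parameters for the whole circuit.
        return {"data": "single", "clock": "single"}
    if mode == "bc_wc":
        # Setup uses maximum delays everywhere, hold uses minimum delays everywhere.
        corner = "max" if check == "setup" else "min"
        return {"data": corner, "clock": corner}
    if mode == "ocv":
        # Min and max delays apply to different paths at the same time.
        if check == "setup":
            return {"data": "max", "clock": "min"}
        return {"data": "min", "clock": "max"}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("single", "bc_wc", "ocv"):
    for check in ("setup", "hold"):
        print(mode, check, corner_selection(mode, check))
```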
Driving Cells and Port Loads
The external driver that drives an input port has impedance and parasitic load characteristics that can affect the signal timing. To more
accurately take these effects into account, you can use the set_driving_cell command.
Use the set_load command to specify the amount of capacitance on a port or net, allowing PrimeTime to more accurately calculate the effects
of the load on port or net delay.
Timing Exceptions
For a valid timing analysis, you need to specify the paths that are not intended to operate according to the default setup/hold behavior
assumed by PrimeTime. These exceptions include false paths, multicycle paths, and paths that must conform to constraints that you
specify explicitly with the set_min_delay or set_max_delay command.
Wire Load Models and Back-Annotated Delay

To accurately calculate net delays, PrimeTime needs information about the parasitic loads of the wire interconnections. Before placement and
routing have been completed, PrimeTime estimates these loads by using wire load models provided in the technology library. The
set_wire_load_model command specifies which wire load model to use for the current analysis.
Checking the Design and Analysis Setup

Before you begin a full analysis, it is a good idea to check the characteristics of the design such as the hierarchy, library elements, ports, nets,
cells; and the analysis setup parameters such as clocks, wire load models, input delay constraints, and output delay constraints.
The check_timing command checks for constraint problems such as undefined clocking, undefined input data arrival times, and undefined
output data required times.
Fixing Timing Violations

When PrimeTime reports a timing violation, you should examine the violation report to determine whether it is a true violation, and not a
condition such as a false path or an incorrectly specified constraint. PrimeTime lets you temporarily change the design in certain ways,
without modifying the original netlist, so you can easily test the timing effects of those changes. The commands for making these changes are
insert_buffer, size_cell, and swap_cell. To fix a timing problem, you generally need to resynthesize part or all of the design using new timing
constraints. To resynthesize a submodule in the design, you can capture the timing environment of that submodule with the
characterize_context command. This command creates a script that can be used in Design Compiler to specify the timing
conditions for resynthesis.
Timing Paths
A timing path is a point-to-point sequence through a design that starts at a register clock pin
or an input port, passes through combinational logic elements, and ends at a register data
input pin or an output port.
Synthesis
Transforms program-like VHDL into a hardware design (netlist)
• Inputs
– HDL description
– Timing constraints (When outputs need to be ready, when inputs will be ready, data to estimate wire delay)
– Technology to map to (list of available blocks and their size/timing information)
– Information about design priorities (area vs. speed)
Different from other signal nets, clock and power are special routing problems
– For clock nets, need to consider clock skew as well as delay.
– For power nets, need to consider current density (IR drop).
• P/G routing is pretty regular
• It has high priority as well
– P/G routing resources are usually reserved
– When you do global and detailed routing for signal nets, you cannot use up all the routing resources on each metal layer
• Normally some design rules will be given (e.g., 40% of top metal layers are reserved for P/G)
• Routing resource
– Need to balance the routing resources for P/G, clock and signals
• Voltage drop
– Static (IR) and dynamic (L di/dt) voltage drops
– More voltage drop means more gate delay
– Usually less than 5-10% voltage drop is allowed
– So you may need to size P/G wires accordingly
• Electromigration
– Too large a current may cause electromigration (EM) problems
• P/G I/O pad co-optimization with classic physical design
• Decoupling capacitors can reduce P/G-related voltage drop
– Need to be planned together with floorplanning and placement
• Multiple voltage/frequency islands make the P/G problem and clock distribution more challenging
STA is an effective methodology for verifying the timing characteristics of a design without the use of test vectors.
Drawback of STA: it cannot determine timing errors related to logic operation because it does not perform functional simulation.
E.g., race conditions cannot be detected.
Advantage of STA: it covers all the possible timing paths in the design without having to generate test vectors.
Timing tools are most appropriate for purely synchronous designs.
STA is a technique used to determine all the possible timing violations, including setup, hold, recovery, removal and pulse width, in a design
for a specified clock, without applying test vectors. STA is done on true timing paths, i.e. where functional checks are done (synchronous
designs). After synthesis, timing analysis is done to generate a list of paths which violate the timing requirements.
Static Timing Analysis is a method for determining if a circuit meets timing constraints without having to simulate
• Much faster than timing-driven, gate-level simulation
• Proper circuit functionality is not checked
• Vector generation NOT required
Types of Timing Verification

Dynamic Timing Simulation
Advantages
• Can be very accurate (SPICE-level)
Disadvantages
• Analysis quality depends on stimulus vectors
• Non-exhaustive, slow
Examples:
VCS, Spice, ACE
Static Timing Analysis (STA)
Advantages
• Fast, exhaustive
• Better analysis checks against timing requirements
Disadvantages
• Less accurate
• Must define timing requirements/exceptions
• Difficulty handling asynchronous designs, false paths
The actual path delay is the sum of net and cell delays along the timing path
"Net Delay" refers to the total time needed to charge or discharge all of the parasitics of a given net
• Total net parasitics are affected by
– net length
– net fanout
• Net delay and parasitics are typically
– Back-annotated (post-layout) from data obtained from an extraction tool
– Estimated (pre-layout)

Pulse width
• It is the time between the active and inactive states of the same signal
Signal (Clock/Data) slew
• Amount of time it takes for a signal transition to occur
• Accounts for uncertainty in rise and fall times of the signal
• Slew rate is measured in volts/sec
[Figure: clock network in which CLK reaches CLKA, CLKB and CLKC through chains of INV and BUF cells; each cell is annotated with Rise=7 and Fall=4]
Input Arrival time

An arrival time defines the time interval during which a data signal can arrive at an input pin in relation to the nearest edge of the clock signal
that triggers the data transition
Output required time
Specifies the data required time on output ports.
False paths
• Paths that physically exist in a design but are not logic/functional paths
• These paths never get sensitized under any input conditions
Multi-cycle paths
• Data paths that require more than one clock period for execution
C2 skewed after C1: TW ≥ max TPFF + max tNET + tsu - min tINV
C2 skewed before C1: TW ≥ max TPFF + max tNET + tsu + max tINV
tPFF > th + tSK
TSU = tsu + max tNET - min tC
TH = th - min tNET + max tC
[Figure: signal X reaching flip-flop input D through a net; clock CLK driving the CK pin; output Q]
• In case of positive skew: T_MIN = T_CQ + T_COMB + T_SETUP - T_SKEW
• In case of negative skew: T_MIN = T_CQ + T_COMB + T_SETUP + T_SKEW
• In case of positive skew: T_SKEW(max) = T_CQ + T_COMB - T_HOLD
• In case of negative skew: T_SKEW(max) = T - (T_CQ + T_COMB + T_SETUP)
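The skew relations above can be checked with a short Python sketch. It assumes a signed skew convention in which positive skew means the capture clock arrives later than the launch clock, so one formula covers both the positive and the negative case of T_MIN; for the hold limit, T_COMB should be read as the minimum combinational delay. Function names and numbers (in ps) are hypothetical.

```python
def min_clock_period(t_cq, t_comb, t_setup, t_skew):
    """T_MIN = T_CQ + T_COMB + T_SETUP - T_SKEW, with signed skew.

    A negative t_skew adds its magnitude to the minimum period.
    """
    return t_cq + t_comb + t_setup - t_skew


def max_positive_skew(t_cq, t_comb_min, t_hold):
    """Largest positive skew before hold fails: T_CQ + T_COMB - T_HOLD."""
    return t_cq + t_comb_min - t_hold


def max_negative_skew(period, t_cq, t_comb, t_setup):
    """Largest negative skew magnitude before setup fails: T - (T_CQ + T_COMB + T_SETUP)."""
    return period - (t_cq + t_comb + t_setup)


# Hypothetical values in ps.
print(min_clock_period(100, 500, 50, 30))    # positive skew relaxes setup: 620
print(min_clock_period(100, 500, 50, -30))   # negative skew tightens setup: 680
print(max_positive_skew(100, 60, 40))        # 120
print(max_negative_skew(1000, 100, 500, 50)) # 350
```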
SDF (Standard Delay Format) data is used to back-annotate net and cell delays.
The synthesis tool uses a wire load model to estimate wire delay from the resistive and capacitive load of nets, based on fanout.
A wire load model is a table for estimating the capacitance, resistance and area of a net.
It is based on a statistical correlation between net fanout and net parasitics.
A wire load model is basically a set of tables: net fanout vs. load, net fanout vs. resistance, and net fanout vs. area.
How are WLMs Generated?
For a given area on a die, vendors take averages of resistance and capacitance for nets with different fanout
• A table of averages is then generated
wire_load_table ("WLM1") {
fanout_capacitance (1, 0.015);
fanout_resistance (1, 0.012);
}
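A fanout-based lookup on such a table can be sketched in Python. Only the fanout-1 values come from the table above; the other entries are hypothetical. Linear interpolation between entries and clamping outside the table are assumptions made for this illustration, not a description of how any particular tool behaves.

```python
def wlm_value(entries, fanout):
    """Look up a wire-load value by fanout.

    Interpolates linearly between table entries and clamps outside the table
    (both are assumptions of this sketch).
    """
    points = sorted(entries.items())
    if fanout <= points[0][0]:
        return points[0][1]
    if fanout >= points[-1][0]:
        return points[-1][1]
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        if f0 <= fanout <= f1:
            return v0 + (v1 - v0) * (fanout - f0) / (f1 - f0)


# Fanout 1 is taken from WLM1 above; fanout 2 and 4 are hypothetical.
FANOUT_CAPACITANCE = {1: 0.015, 2: 0.030, 4: 0.060}
FANOUT_RESISTANCE = {1: 0.012, 2: 0.024, 4: 0.048}

print(wlm_value(FANOUT_CAPACITANCE, 3))
print(wlm_value(FANOUT_RESISTANCE, 1))
```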
Design rules are electrical checks usually defined in the technology library for each gate
• Minimum and maximum limits for capacitance
• Minimum and maximum limits for transition times
• Minimum and maximum limits for fanout
pt_shell> set link_path "* tc6a.db"
pt_shell> read_db "BLOCKA.db BLOCKB.db"
pt_shell> read_verilog TOP.V
pt_shell> link_design TOP
If link_design cannot resolve a particular reference, PT creates a black box for it.
A black box is essentially an empty cell with no timing arcs.
list_designs # Lists all designs in PT memory
list_libraries # Lists all libraries in PT memory
remove_design # Removes designs from PT memory
remove_lib # Removes libraries from PT memory
get_cells # Create a collection of cells
get_clocks # Create a collection of clocks
get_designs # Create a collection of designs
get_lib_cells # Create a collection of library cells
get_nets # Create a collection of nets
get_pins # Create a collection of pins
all_clocks # Create a collection of all clocks in design
all_connected # Create a collection of objects connected to another
all_inputs # Create a collection of all input ports in design
all_instances # Create a collection of all instances in design
all_outputs # Create a collection of all output ports in design
all_registers # Create a collection of register cells or pins
all_fanin # Create a collection of all pins/ports or cells in the fanin of specified sinks
all_fanout # Create a collection of all pins/ports or cells in the fanout of specified sources
add_to_collection # Add object(s) to a collection
compare_collections # Compare two collections
copy_collection # Make a copy of a collection
filter_collection # Filter a collection, resulting in a new collection
foreach_in_collection # Iterate over a collection
index_collection # Extract an object from a collection
remove_from_collection # Remove object(s) from a collection
sizeof_collection # Number of objects in a collection
sort_collection # Create a sorted copy of a collection
pt_shell> set var1 [add_to_collection [get_ports "DATA*"] [get_ports "CTRL*"]]
create_clock # Create a clock object
create_generated_clock # Create a generated clock object
remove_clock # Remove a clock object
remove_generated_clock # Remove a generated_clock object
remove_propagated_clock # Remove a propagated clock spec
report_clock # Report clock info
get_clock # Get clocks
get_generated_clock # Select generated_clocks
set_propagated_clock # Specify propagated clock latency
Defining a Clock
User MUST define:
• Clock source (port or pin)
• Clock period
User may also define:
• Duty cycle
• Offset/skew
• Clock name
pt_shell> create_clock -period 10 [get_ports clk]
set_input_delay -min describes the fastest arrival time of the external logic to the input ports
set_input_delay -min 0.3 -clock Clk [get_ports A]
set_output_delay -min describes the hold time of the external logic on the output ports
What should you specify as the min output delay?
What is the minimum delay for the path through S?
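More on set_output_delay -min
Hold requirement at the output port = hold time of FF4 - min delay of cloud T
min output delay = min delay of cloud T - hold time of FF4
[Figure: external register FF4 and its capture edge]
set_output_delay -max models the setup requirement of the external logic
set_output_delay -min models the hold requirement of the external logic
Since the hold time is measured after the active clock edge, the hold requirement is entered as a negative number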
set_output_delay -min -0.2 -clock Clk [get_ports C]
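As a worked illustration of the two output delay values, the sketch below uses the common convention that the -max value is the longest external path delay plus the external setup time, and the -min value is the shortest external path delay minus the external hold time. The function name and all numbers are hypothetical; the hold time and zero external delay are chosen so that the -min value matches the -0.2 in the command above.

```python
def output_delays(t_setup_ext, t_hold_ext, t_comb_max, t_comb_min):
    """Return (max, min) output delay values for an output port.

    max = longest external path delay + setup time of the external register
    min = shortest external path delay - hold time of the external register
    """
    return t_comb_max + t_setup_ext, t_comb_min - t_hold_ext


# Hypothetical external logic (ns): 0.5 setup, 0.2 hold,
# 1.5 longest path and 0.0 shortest path to the external register.
d_max, d_min = output_delays(0.5, 0.2, 1.5, 0.0)
print("set_output_delay -max", d_max)
print("set_output_delay -min", d_min)
```

When the shortest external path is shorter than the hold time, the -min value comes out negative, which is why a hold requirement typically appears as a negative number.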
report_port -input_delay -output_delay Returns input and output delay constraints placed on all ports
report_clock Returns the source, waveform and period of all clock objects in the current_design
check_timing Reports any unconstrained timing paths in the current_design
remove_input_delay Removes input delay constraints from specified ports
remove_output_delay Removes output delay constraints from specified ports
remove_clock Removes clock objects
set in_ports [remove_from_collection [all_inputs] [get_ports Clk]] Removes the Clk port from a collection of all input ports
Specify input and output delay constraints for all ports and ensure that a clock reaches every register in the design. Use check_timing to verify this.
Environmental Attributes
• Which wire load model will PT use to estimate the pre-layout net parasitic data? set_wire_load_model
• How will the delays through the boundary cells be calculated? set_driving_cell, set_load
• Under what process, voltage, and temperature conditions will PT calculate the path delays? set_operating_conditions
In order to accurately calculate the timing of boundary cells, PT needs to know the external capacitive loading
set_load allows the user to specify the external capacitive load on ports
Use load_of lib/cell/pin to return the pin capacitance of a specific gate from the technology library:
set_load [load_of CBA/AN2/A] [get_ports OUT1]
In order to accurately calculate the timing of input boundary cells, PT also needs to know the driving cell
set_driving_cell allows the user to specify the external driving cell for input ports
• The default drive on input ports is 0
• The driving cell also imposes its design rules on the input port
• You may also specify min and max driving cells
Library cells are usually characterized using “nominal” voltage and temperature
What if the circuit is to operate at a voltage and/or temperature OTHER than nominal?
How is the delay through the net or cell affected?
set_operating_conditions -max "typ_120_4.50" -min "typ_-40_5.25"
set_wire_load_mode top
set_wire_load_model -name "tc6a120m2" [current_design]
With the mode top, the top-level wire load model is used for every net in the design
You may specify min and max wire load models as well
check_timing Reports some timing problems, like missing input delay or unconstrained endpoints
report_analysis_coverage Summarizes all constraints (met, violated, untested) for a design
report_constraint -all_violators -max_delay -min_delay Summary of setup and hold violations in the design
report_bottleneck
report_timing
report_constraint -max_capacitance Reports design rule violations, like max_capacitance
report_port -verbose Reports all port attributes
report_clock Reports all clock attributes
report_design Reports operating conditions and wire load models on the current_design
check_timing Reports (among other things) any unconstrained endpoints
write_script -output Writes a script file containing all attributes and constraints specified on the current_design
reset_design Removes all constraints and attributes on the current_design
setup check = (clock_edge - uncertainty - lib_setup)
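The setup check expression above gives the data required time; slack is that required time minus the data arrival time. A minimal sketch follows, with a hypothetical function name and made-up numbers in ns.

```python
def setup_slack(clock_edge, uncertainty, lib_setup, data_arrival):
    """Setup slack for one endpoint.

    required time = capture clock edge - clock uncertainty - library setup
    slack         = required time - data arrival time
    """
    required = clock_edge - uncertainty - lib_setup
    return required - data_arrival


# Hypothetical endpoint: 10 ns capture edge, 0.5 ns uncertainty,
# 0.25 ns library setup time.
print(setup_slack(10.0, 0.5, 0.25, 8.0))   # data arrives early: positive slack
print(setup_slack(10.0, 0.5, 0.25, 9.5))   # data arrives late: negative slack
```

A negative result corresponds to a setup violation, and increasing the uncertainty reduces the slack by the same amount.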

The transition time of ideal clocks is assumed to be zero
The clock-to-Q delays (and possibly the setup and hold times) of your registers will be optimistically small
set_clock_transition 0.2 [all_clocks]
Pre-Layout                Post-Layout
create_clock              create_clock
set_clock_uncertainty     set_propagated_clock
set_clock_latency
set_clock_transition
With set_propagated_clock, PT computes the clock network latency along the clock tree.
Use it when clock trees have been inserted and you have back-annotated delay and parasitic data.
read_sdf top.sdf
set_propagated_clock [all_clocks]
report_timing
set_propagated_clock overrides previous values set with the set_clock_latency and set_clock_transition commands
By default, uncertainty is still applied to propagated clocks
If PT is used to calculate flop-to-flop skew from the clock tree SDF, remove the clock uncertainty with remove_clock_uncertainty
Creating a Virtual Clock
It is the same as defining a clock, but you do not specify a clock pin or port. You must name your virtual clock, since there is no clock port to take the name from.
create_clock -name vTEMP_CLK -period 20
(The clock must be named; there is no source pin or port.)
set_false_path can be used to disable STA on a path-by-path basis
Useful for:
• Constraining asynchronous paths
• Constraining logically false paths
report_timing -path full_clock
Source Latency
current_design YOUR_DESIGN
create_clock -period 10 [get_ports CLK]
create_clock -period 10 -name VCLK
set_clock_latency -source 2 [get_clocks CLK]
set_clock_latency -source 1 [get_clocks VCLK]
# set_clock_latency 1 [get_clocks CLK]; # pre-layout
set_propagated_clock [all_clocks]; # post-layout
set_input_delay -max 0.4 -clock VCLK [get_ports A]
[Figure: YOUR_DESIGN with clock port CLK and input port A, driven by external logic clocked by VCLK; the latency from the origin of the clock is split into source latency and network latency]
Source Latency and Generated Clocks
PT automatically calculates source latency for generated clocks when the source clock is propagated
[Figure: TOP_LEVEL containing CLOCK_GEN (register U2 producing Int_clk from Ext_clk) and FUNC_CORE]
create_clock -period 20 [get_ports Ext_Clk]
create_generated_clock -name Int_Clk -source [get_ports Ext_Clk] \
-divide_by 2 [get_pins CLOCK_GEN/U2/Q]
set_propagated_clock [get_clocks Ext_Clk]
update_timing; # Force PT to calculate source latency
report_clock -skew -attributes; # Validate source latency
Parasitics along the clock tree may degrade clock pulse widths:
• Pulse width too small: the register may not "see" the clock
• Pulse width shrinks so much that it is not propagated through the tree
PrimeTime can be used to ensure a minimum clock pulse width on all flip-flops. This analysis is performed after the clock tree is defined and the clock is propagated.
Minimum pulse width checks may be included in library cell models
• The minimum pulse width check can be back-annotated from SDF
User can specify a minimum pulse width check on clocks, pins, or cells
set_min_pulse_width -high 1.5 [all_clocks]
set_min_pulse_width -low 1.0 [all_clocks]
report_min_pulse_width
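To see why propagated clock delays matter for this check, note that the high pulse at a register clock pin starts when the rising edge arrives and ends when the falling edge arrives. The sketch below is an illustration under that assumption only; the function names and latency numbers are hypothetical, and the 1.5 limit mirrors the -high value above.

```python
def high_pulse_width(nominal_high, rise_latency, fall_latency):
    """High pulse width seen at a clock pin.

    The pulse starts at the rising-edge arrival and ends at the
    falling-edge arrival, so unequal rise/fall latencies through the
    clock tree stretch or shrink it.
    """
    return nominal_high + fall_latency - rise_latency


def pulse_width_slack(actual_width, min_width):
    """Positive when the pulse is wide enough, negative on a violation."""
    return actual_width - min_width


# Hypothetical 4 ns clock with 50% duty cycle: 2.0 ns nominal high time.
ok = high_pulse_width(2.0, rise_latency=1.5, fall_latency=1.25)
bad = high_pulse_width(2.0, rise_latency=1.5, fall_latency=0.75)
print(ok, pulse_width_slack(ok, 1.5))
print(bad, pulse_width_slack(bad, 1.5))
```

In the second case the falling edge travels through the tree faster than the rising edge, so the high pulse shrinks below the limit.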
On-chip variation (OCV) analysis allows you to account for delay variations due to PVT changes across the die, providing more accurate delay estimates. When cells are common between clock and data paths and on-chip variation is used, those cells cannot have two different delays at the same time.
Bottleneck analysis identifies the cells that are involved in multiple violations. Use the results of bottleneck analysis to determine which designs to recompile or which cells to buffer or upsize.
Optimization with OCV
Exact CPPR calculations are expensive, because CPPR is specific to each startpoint-endpoint pair and tracing data paths takes time. Instead, you can use margins for optimization. The command config timing common_point slack_range 5p configures the search to include only fanins that have a slack no more than 5p greater than the slack of the data endpoint. The margin represents the worst slack difference on the endpoint as a result of OCV and CPPR effects. The margin approach calculates and implements margins on endpoints using one of three options to the config common point search command (all, relative_slack, and path), and it is pessimistic because startpoints are simplified.
The OCV margin accounts for clock skew induced by delay variation in the clock network and reflects the removal of common path pessimism. During fix clock, the hold buffering that is performed to repair timing violations can optimize for OCV and crosstalk delay. For setup, the OCV and crosstalk delay analyses should not be run concurrently. In crosstalk delay analysis for setup, unfriendly switching is assumed along the launch and data paths, and friendly switching is assumed along the capture path. Common path pessimism (CPP) should not be removed during crosstalk delay analysis, because the setup check is done between two edges and the two edges may well behave differently along the same portions of the clock tree. Therefore, the OCV margin calculation should be performed separately from crosstalk delay analysis.
4. Enable crosstalk delay analysis and report timing slack.
config timing crosstalk delay on
config timing crosstalk model $m $l -type static -miller_max 1.7 -miller_min 0.4 -timing_window off
config timing crosstalk convergence off
report timing path $m -summary -all -file reports/ocv+crosstalk.rpt
5. Turn off crosstalk delay analysis and calculate the margin for OCV.
config timing crosstalk delay off
run prepare margin ocv setup $m
6. Reset the configuration and enable OCV optimization and crosstalk delay analysis.
config timing crosstalk delay on
config timing margin ocv setup on
config condition on_chip_variation off
config condition case worst

The set_max_fanout command sets a maximum fanout load for specified output ports or designs. Setting a maximum fanout load on a port applies to the net connected to that port. The fanout load for a net is the sum of the fanout_load attributes for the input pins and output ports connected to the net.
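The fanout rule described above reduces to a sum and a comparison. A minimal sketch, assuming hypothetical fanout_load values and a hypothetical limit of 4.0:

```python
def net_fanout_load(fanout_loads):
    """Fanout load of a net: the sum of the fanout_load attributes of the
    input pins and output ports connected to it."""
    return sum(fanout_loads)


def max_fanout_slack(limit, fanout_loads):
    """Positive or zero when the net meets the limit, negative on a violation."""
    return limit - net_fanout_load(fanout_loads)


# Hypothetical net driving four loads.
loads = [1.0, 1.0, 2.0, 0.5]
print(net_fanout_load(loads))
print(max_fanout_slack(4.0, loads))
```

Here the total of 4.5 exceeds the limit, so the slack is negative and the net would be reported as a max_fanout violator.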
The set_min_capacitance command sets a minimum capacitance limit for the nets attached to specified ports or for the whole design.
The set_max_transition command sets a maximum limit on the transition time for the nets attached to specified ports, for a whole design, or
for pins within a specified clock domain. Within a clock domain, you can optionally restrict the constraint further to clock paths or data paths
only, and to rising or falling transitions only.
Static timing analysis can be performed on a gate-level netlist depending on:
i. How interconnect is modeled: ideal interconnect, wireload model, global routes with approximate RCs, or real routes with accurate RCs.
ii. How clocks are modeled: whether clocks are ideal (zero delay) or propagated (real delays).
iii. Whether the coupling between signals is included: whether any crosstalk noise is analyzed.
Some of the limitations of STA are:
i. Reset sequence: To check if all flip-flops are reset into their required logical values after an asynchronous or synchronous reset. This is something that cannot be checked using static timing analysis. The chip may not come out of reset. This is because certain declarations such as initial values on signals are not synthesized and are only verified during simulation.
ii. X-handling: The STA techniques only deal with the logical domain of logic-0 and logic-1 (or high and low), rise and fall. An unknown value X in the design causes indeterminate values to propagate through the design, which cannot be checked with STA.
iii. PLL settings: PLL configurations may not be loaded or set properly.
iv. Asynchronous clock domain crossings: STA does not check if the correct clock synchronizers are being used. Other tools are needed to ensure that the correct clock synchronizers are present wherever there are asynchronous clock domain crossings.
v. IO interface timing: It may not be possible to specify the IO interface requirements in terms of STA constraints only.
vi. Interfaces between analog and digital blocks: Since STA does not deal with analog blocks, the verification methodology needs to ensure that the connectivity between these two kinds of blocks is correct.
viii. FIFO pointers out of synchronization: STA cannot detect the problem when two finite state machines expected to be synchronous are actually out of synchronization. During functional simulations, it is possible that the two finite state machines are always synchronized and change together in lock-step.
ix. Clock synchronization logic: STA cannot detect the problem of clock generation logic not matching the clock definition. STA assumes that the clock generator will provide the waveform as specified in the clock definition.
x. Functional behavior across clock cycles: The static timing analysis cannot model or simulate functional behavior that changes across clock cycles.

