
TIMING FIXES

Setup time: It is the minimum time for which the data must be stable before
the capturing clock edge.

So the total time available for the data to propagate from the launch flop to the
capture flop = one time period (T) – Tsu (setup time of flip flop 2).

This is the required time (RT) for the data to travel from the launch flop to the
capture flop.

The time the data actually takes to arrive at the D pin of the capture flop is
= Tcq (clock-to-Q delay of FF1) + Tcomb (combinational delay). This is called
the arrival time (AT).
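So the condition for no setup violation is that the slack is not negative:

Slack = RT – AT

RT = Tcap + T (time period) – Tsu (setup time of flip flop 2)

AT = Tlaunch + Tcq + Tcomb + net delay

Here Tcap and Tlaunch are the clock arrival times at the capture and launch flops.

Slack = +ve (no violation)

= -ve (setup violation)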

Example 1: If setup time = 2ns, hold = 1ns, clock period = 8ns, clock-to-Q
delay = 3ns, Tcap = 3ns, Tlaunch = 2ns, Tcomb = 2ns, net delay = 0ns.

Setup slack = RT – AT

RT = Tcap + T (time period) – Tsu (setup time of flip flop2)

= 3ns +8ns -2ns = 9ns

AT = Tlaunch + Tcq + Tcomb + Net delay

= 2ns + 3ns + 2ns + 0ns = 7ns

Setup slack = RT – AT = 9ns – 7ns = 2ns (the value is positive, which
means the setup timing is met)

Example 2: If setup time =2ns, Hold = 1ns, Clock period = 8ns, clock to q
delay = 3ns, Tcap = 2ns, Tlaunch = 3ns, Tcomb = 2ns, Net delay = 2ns.

Setup slack = RT – AT
RT = Tcap + T (time period) – Tsu (setup time of flip flop2)
= 2ns + 8ns – 2ns =8ns

AT = Tlaunch + Tcq + Tcomb + Net delay

= 3ns + 3ns + 2ns +2ns =10ns

Setup slack = RT – AT = 8ns – 10ns = -2ns (the value is negative, which means setup is violated)
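The two setup examples above can be checked with a few lines of Python. This is only a sketch of the arithmetic in these notes; the function and parameter names are made up for illustration, and all times are in ns.

```python
def setup_slack(t_cap, period, t_su, t_launch, t_cq, t_comb, net_delay=0):
    """Return (required_time, arrival_time, slack) in ns for a setup check."""
    # RT = Tcap + T - Tsu
    required = t_cap + period - t_su
    # AT = Tlaunch + Tcq + Tcomb + net delay
    arrival = t_launch + t_cq + t_comb + net_delay
    return required, arrival, required - arrival


# Example 1: net delay taken as 0 ns
ex1 = setup_slack(t_cap=3, period=8, t_su=2, t_launch=2, t_cq=3, t_comb=2)
# Example 2: net delay of 2 ns
ex2 = setup_slack(t_cap=2, period=8, t_su=2, t_launch=3, t_cq=3, t_comb=2,
                  net_delay=2)

print(ex1)  # (9, 7, 2)   -> positive slack, setup met
print(ex2)  # (8, 10, -2) -> negative slack, setup violated
```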

Setup time fixes:


1. Replace a buffer with two inverters:

There are two reasons for the delay improvement from using inverters:

1. Compared to a buffer, the delay of an inverter cell is lower [a buffer is nothing but
two inverters connected back to back].

2. With two inverters placed along the net, the RC delay is divided further, which
improves transition and delay. Place the inverters at equal distances between the loads.

[Load ]--------------------------[ BUFFER]------------------------------>|[Load]

[Load ]------------[INVERTER]-------------[INVERTER]----------->|[Load]

2. Change HVT cells to LVT cells:

Another effective, but leaky, method to reduce the delay of a cell is to
swap a high threshold voltage (HVT) cell for a low threshold voltage (LVT) cell.
Refer to the diagram below.
The characteristic of an NMOS (or PMOS) device is such that the 'ON'
resistance is inversely proportional to (Vgs - Vt), so a lower Vt gives a lower
resistance and a lower delay. The side effect is that
low Vt cells are leakier, i.e. leakage power increases.

"Delay can be reduced by using low Vt cells, but the cost paid is high
leakage power"
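As a rough illustration of the (Vgs - Vt) relation above, the sketch below compares the relative 'ON' resistance of an HVT and an LVT cell. The voltage values are hypothetical and chosen only to show the trend; real library cells should be compared using their characterized timing data.

```python
def relative_on_resistance(vgs, vt):
    """ON resistance up to a constant factor: R_on is proportional to 1 / (Vgs - Vt)."""
    if vgs <= vt:
        raise ValueError("device is off: Vgs must exceed Vt")
    return 1.0 / (vgs - vt)


VGS = 0.80     # assumed gate drive in volts (hypothetical)
VT_HVT = 0.45  # assumed high-Vt threshold in volts (hypothetical)
VT_LVT = 0.30  # assumed low-Vt threshold in volts (hypothetical)

# Ratio below 1 means the LVT cell has the lower ON resistance.
ratio = relative_on_resistance(VGS, VT_LVT) / relative_on_resistance(VGS, VT_HVT)
print(f"LVT ON resistance is {ratio:.2f}x the HVT value")
```

With these assumed numbers the LVT device has roughly 0.7 times the ON resistance of the HVT device, which is why the swap reduces delay.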

3. Increase the drive strength: Another technique to modify the delay of a cell is
to 'upsize' or 'downsize' it, i.e. to vary the drive strength ('ON' resistance) of
the cell. This is captured in the figure below.

A high drive strength cell has a low 'ON' resistance. Due to the low
resistance, the time required to charge the output capacitance is lower, i.e. the
RC delay reduces. This technique is useful to fix a setup violation, where the delay
needs to reduce. The inverse (i.e. high 'ON' resistance) is useful for fixing a hold
violation, where the delay needs to increase.

"Delay varies by varying drive-strength (ON resistance) of the logic cell"


4. Path grouping: Assume that we have 50 violating paths. Of these, 10 paths violate
by a small margin and the other 40 violate by a large margin. Without path groups, the
tool may spend its optimization iterations on the less violating paths. By creating
path groups we tell the tool to concentrate on the more violating paths in the
design. It then spends more iterations on the more violating paths and reduces
their negative slack.

5. Magnet placement:

To improve congestion for a complex floorplan or to improve timing for the
design, we can use magnet placement to specify a fixed object as a magnet and
have the tool place all the standard cells connected to the magnet object close to
it. Fixed macro cells, pins of fixed macros or IO ports can be used as the magnet object.

For best results perform magnet placement before standard cell placement.

Command: magnet_placement

6. Placement bounds:

A placement bound is a constraint that controls the placement of groups of leaf cells and
hierarchical cells. It allows you to group cells to minimize wire length and place
the cells at the most appropriate locations. When timing is critical during
placement, we create a bound in the area where two communicating cells
are sitting far from one another. It is a fixed region in which we place a set of cells.
It comprises one or more rectangular or rectilinear shapes, which can be
abutted or disjoint. In general we specify the cells and ports to be included in
the bound. If a hierarchical cell is included, all cells in the sub-design belong to
the bound.

Types of bounds:

Soft move bound

Hard move bound

Exclusive move bound


Soft move bound:

The tool tries to place the cells of the move bound within the specified region;
however, there is no guarantee that the cells are placed inside the bound.

create_bound -name b0 -type soft -boundary {10 10 20 20} instance_1

# defines a soft bound for instance_1 with its lower-left corner at (10 10) and its upper-right corner at (20 20).

Hard move bound:

The tool must place the cells of the move bound within the specified region.

create_bound -name b1 -type hard -boundary {10 10 20 20} instance_2

Exclusive move bound:

The tool must place the cells of the move bound within the specified region,
and no other cells may be placed inside the bound.

create_bound -name b2 -exclusive -boundary {10 10 20 20} instance_1

7. If the net delay is large, break the net and add a buffer:

If a net is long, we insert a buffer to boost the signal. This decreases the transition
time, which decreases the wire delay. If the reduction in wire delay is greater than
the cell delay of the buffer, the overall delay
decreases.
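The break-even condition above can be illustrated with a simplified Elmore-style wire model, in which the delay of a distributed RC wire grows with the square of its length. This model is an assumption brought in for illustration (the notes argue in terms of transition time), it ignores driver resistance and buffer input capacitance, and every numeric value below is hypothetical.

```python
def wire_delay(r_per_um, c_per_um, length_um):
    """Elmore delay of a distributed RC wire: 0.5 * r * c * L^2 (seconds)."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


def delay_with_mid_buffer(r_per_um, c_per_um, length_um, buffer_delay):
    """Delay when one buffer splits the wire into two equal halves."""
    half = wire_delay(r_per_um, c_per_um, length_um / 2)
    return 2 * half + buffer_delay


R = 2.0         # wire resistance in ohm/um (hypothetical)
C = 0.2e-15     # wire capacitance in F/um (hypothetical)
LENGTH = 2000   # net length in um (hypothetical)
T_BUF = 30e-12  # buffer cell delay in seconds (hypothetical)

unbuffered = wire_delay(R, C, LENGTH)
buffered = delay_with_mid_buffer(R, C, LENGTH, T_BUF)

print(f"unbuffered: {unbuffered * 1e12:.0f} ps")
print(f"buffered:   {buffered * 1e12:.0f} ps")
print("buffer helps" if buffered < unbuffered else "buffer hurts")
```

In this model the buffer pays off only while the wire delay it removes (half of the unbuffered wire delay) exceeds its own cell delay, so on a short net the same buffer would make the path slower.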

Hold time: It is the minimum time for which the data must remain stable after the
clock edge.
So the condition for no hold violation is:

Hold slack = AT – RT

AT = Tlaunch + Tcq + Tcomb + net delay

RT = Tcap + Thold

If hold slack = +ve (no violation)

= -ve (hold violation)

If the arrival time is less than the required time, the data arrives too fast (too
early), so a hold violation occurs.

Example 1: If setup time = 2ns, hold = 1ns, clock period = 8ns, clock-to-Q
delay = 3ns, Tcap = 3ns, Tlaunch = 2ns, Tcomb = 2ns, net delay = 0ns.

Hold slack = AT - RT
RT = Tcap + Thold

=3ns + 1ns = 4ns

AT = Tlaunch + Tcq + Tcomb + Net delay

= 2ns + 3ns + 2ns + 0ns = 7ns

Hold slack = AT – RT = 7ns – 4ns = 3ns (the value is positive, which
means the hold timing is met)

Example 2: If setup time =2ns, Hold = 1ns, Clock period = 8ns, clock to q
delay = 0.3ns, Tcap =3ns, Tlaunch = 2ns ,Tcomb = 0.2ns, Net delay = 0.2ns.

Hold slack = AT – RT

RT = Tcap + Thold

=3ns + 1ns = 4ns

AT = Tlaunch + Tcq + Tcomb + Net delay

= 2ns + 0.3ns + 0.2ns + 0.2ns

= 2.7ns

Hold slack = AT – RT = 2.7ns – 4ns = -1.3ns (the value is negative, which means hold is violated)
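The hold examples above follow the same pattern as the setup check, with the sign of the comparison reversed. The sketch below reproduces both results; the function and parameter names are illustrative only and all times are in ns.

```python
def hold_slack(t_cap, t_hold, t_launch, t_cq, t_comb, net_delay=0):
    """Return (required_time, arrival_time, slack) in ns for a hold check."""
    # RT = Tcap + Thold
    required = t_cap + t_hold
    # AT = Tlaunch + Tcq + Tcomb + net delay
    arrival = t_launch + t_cq + t_comb + net_delay
    # Data must arrive no earlier than the required time.
    return required, arrival, arrival - required


# Example 1: slow data path, hold is met
ex1 = hold_slack(t_cap=3, t_hold=1, t_launch=2, t_cq=3, t_comb=2)
# Example 2: fast data path, hold is violated
ex2 = hold_slack(t_cap=3, t_hold=1, t_launch=2, t_cq=0.3, t_comb=0.2,
                 net_delay=0.2)

for name, (rt, at, slack) in (("Example 1", ex1), ("Example 2", ex2)):
    status = "met" if slack >= 0 else "violated"
    print(f"{name}: RT={rt:.1f} AT={at:.1f} slack={slack:.1f} ns ({status})")
```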

Hold time fixes:

1. Change LVT cells to HVT.

2. Decrease the drive strength.

3. Insert delay buffers.

Latency (or) clock network delay (or) insertion delay:

It is the time taken by the clock to travel from the clock source to the clock pin
of a flip flop. It is divided into two parts: clock source latency and clock
network latency.

Source latency: The delay from the clock source to the clock definition pin.
Network latency: The delay from the clock definition pin to the clock pin of the
flip flop.

OCV (On chip variation): The delay values of an IC vary under different
conditions, i.e. with changes in process, voltage and temperature (PVT). The delay
of an IC in cold conditions differs from its delay in hot conditions. In cold
conditions the metal in the IC contracts; in hot conditions it expands and its
resistance rises, so the delay increases. To account for this effect a flat derate
is applied in the circuit.

In simple words, OCV is a technique in which this flat derate is applied to make
a faster path faster and a slower path slower. So OCV adds some pessimism
in the common path of the launch and capture paths, i.e. the same cell gets two
delays, min and max.
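A minimal sketch of how a flat derate turns one nominal delay into a min and a max value, and what that does to setup slack. The derate factors and the path delays here are hypothetical and are not taken from these notes or from any library; sign-off tools apply derates through their own commands.

```python
EARLY_DERATE = 0.90  # hypothetical factor for paths that should be fast
LATE_DERATE = 1.10   # hypothetical factor for paths that should be slow


def derate(nominal_delays, factor):
    """Scale every nominal cell delay by a flat derate factor."""
    return [d * factor for d in nominal_delays]


def ocv_setup_slack(launch_path, capture_path, period, t_su):
    """Setup slack with late launch path and early capture path (ns)."""
    arrival = sum(derate(launch_path, LATE_DERATE))
    required = period + sum(derate(capture_path, EARLY_DERATE)) - t_su
    return required - arrival


# Nominal delays in ns (hypothetical path):
launch = [0.6, 0.6, 0.5, 3.0]  # common buffer, launch buffer, clock-to-Q, logic
capture = [0.6, 0.5]           # common buffer, capture buffer

nominal_slack = (8 + sum(capture) - 0.2) - sum(launch)
derated_slack = ocv_setup_slack(launch, capture, period=8, t_su=0.2)

print(f"nominal slack: {nominal_slack:.2f} ns")
print(f"derated slack: {derated_slack:.2f} ns")
```

Note that the common buffer (the first entry of both lists) is counted as slow on the launch side and fast on the capture side at the same time, which is the pessimism the text refers to.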

Sources of variations:
There are three major sources of variation: process, voltage and temperature.
These variations are collectively called PVT variations. We already do PVT
analysis and take care of these variations while designing an ASIC, so why do
we need to take care of OCV separately? The answer is that not all variations
can be covered by PVT analysis. Some of them are predictable and can be
modelled easily as the technology matures, but some of them are highly
unpredictable and cannot be modelled easily. Figure-3 shows the various
components of PVT and OCV variation together.

I. Process Variations:
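The drain current of an nMOS transistor in the linear region can be defined as

Id = μn (∈ox / tox) (W / L) [ (Vgs – Vt) Vds – Vds² / 2 ]

where Id is the drain current, μn is the mobility of electrons, ∈ox is the
permittivity of silicon oxide, tox is the oxide thickness, W is the width of the
transistor and L is the gate length of the transistor as shown in figure-4. Vgs and
Vds are the gate-to-source and drain-to-source voltages, and Vt is the threshold
voltage.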

In the drain current equation, the factors which are dependent on the fabrication
process are:
Gate Oxide Thickness (tox)

Length of the transistor (L)

and Threshold voltage of Transistor

So if any of the factors mentioned above varies during the fabrication process, it
will affect the drain current. The delay of a cell depends on the drain
current, so due to process variation the delay of a standard cell is going to vary.
Now let us see some examples of how these parameters can be affected during the
fabrication process. Figure-5 and Figure-6 show the length and width variation
associated with the photolithography process.
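To get a feel for how strongly process parameters move the drain current, the sketch below evaluates the standard long-channel linear-region expression, Id = μn (∈ox / tox) (W / L) [(Vgs – Vt) Vds – Vds² / 2], for a nominal device and for two process shifts. All device values are hypothetical and serve only to show the direction and size of the change.

```python
def linear_drain_current(mu_n, eps_ox, t_ox, width, length, vgs, vt, vds):
    """Id = mu_n * (eps_ox / t_ox) * (W / L) * ((Vgs - Vt) * Vds - Vds^2 / 2)."""
    return (mu_n * (eps_ox / t_ox) * (width / length)
            * ((vgs - vt) * vds - vds ** 2 / 2))


# Hypothetical nominal device (SI units).
nominal = dict(mu_n=0.04, eps_ox=3.45e-11, t_ox=2e-9,
               width=200e-9, length=50e-9, vgs=0.8, vt=0.3, vds=0.1)

i_nom = linear_drain_current(**nominal)
# Gate printed 10% shorter than drawn: more current, faster cell.
i_short = linear_drain_current(**{**nominal, "length": 0.9 * nominal["length"]})
# Oxide grown 10% thicker: less current, slower cell.
i_thick = linear_drain_current(**{**nominal, "t_ox": 1.1 * nominal["t_ox"]})

print(f"10% shorter L:    {100 * (i_short / i_nom - 1):+.1f}% current")
print(f"10% thicker tox:  {100 * (i_thick / i_nom - 1):+.1f}% current")
```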

Optical Proximity Correction (OPC) is a process which is applied to the layout
before mask generation in order to get better replication of the layout on the wafer.
In this process, generally, the corner edges of the layout are extended to get a better
yield. A general photolithography flow is shown in figure-6.
Photolithography is a non-ideal process and it is very hard to print the
exact layout on the silicon wafer. So there are differences between the dimensions of
the drawn layout and the geometry printed on the wafer.

Process variation generally includes:

Photolithography

Optical Proximity Correction (OPC)

Random Dopant Fluctuation (RDF)

Line Edge Roughness (LER)

Etching

Chemical Mechanical Polishing (CMP)

Oxide Thickness Variation (OTV)

So, in conclusion, there are many factors and a high chance of variation during the
fabrication of a chip, and these can cause the delay of the standard cells to vary.

II. Voltage Variations:

The external voltage variation is taken care of in PVT, but internal voltage
variation can occur in the chip depending on the design. IR drop in the power
delivery network may lead to variation in the
voltage available to operate a cell.

Power comes from the power pads/bumps and is distributed to all standard cells
inside the chip through the metal stripes and rails, which are collectively called
the power delivery network (PDN) or power grid. The distance between the power
pad and a standard cell is not the same for all standard cells. So the
available VDD varies from cell to cell depending on the
design. The delay of a cell depends on the available VDD: if VDD is lower, the delay
is higher.

III. Temperature Variations:

Transistor characteristics are strongly dependent on the junction temperature.
Ambient temperature is taken care of in PVT as per the application of the ASIC. But
junction temperature depends on the design of the chip. Power dissipation
inside the chip can raise the temperature of nearby junctions and it can
affect the performance of the entire chip.

Sometimes local hotspots also form, depending on the placement
density and power requirements of the cells, which affects the temperature of the
junctions and ultimately leads to variation in the current and delay of cells.
Junction temperature is the sum of the ambient temperature and the temperature
rise caused by the power dissipation of the cell. This is not predictable and
cannot be taken care of in PVT, so we have to take care of these variations in OCV.

CRPR (Clock reconvergence pessimism removal): In this concept
we remove the pessimism from the common path. Generally we add the derate delay to
every buffer in the process of OCV. But adding more delay also affects the
speed of the chip and may cause violations. To overcome this we remove
the derate from the common path in the process of CRPR.

In simple words, it is used to remove the pessimism
and penalty on the cells that are common to both the launch and capture flip flops.

Problem: In the figure, the three buffers, the flip flops and the combinational
circuit each have two delays: one is the min delay and the other is the max delay
after adding derating.
Consider a time period of 8ns, and Tsetup and Thold of 0.2ns each.

Without CRPR:
Setup slack = (required time) min – (arrival time) max
Arrival time = buf1 + buf2 + Tcq + Tcomb

Arrival time = 0.70 + 0.65 + 0.60 + 3.6 = 5.55ns

Required time = Tclock period + buf1 + buf3 – Tsu

Required time = 8 + 0.60 + 0.45 – 0.2 = 8.85ns

Setup slack = RT – AT = 8.85ns – 5.55ns = 3.3ns


Hold slack = (arrival time) min – (required time) max
Arrival time = buf1 + buf2 + Tcq + Tcomb

Arrival time = 0.60 + 0.55 + 0.48 + 2.5 = 4.13ns

Required time = buf1 + buf3 + Thold

Required time = 0.70 + 0.75 + 0.2 = 1.65ns

Hold slack = AT – RT = 4.13ns – 1.65ns = 2.48ns


With CRPR: In OCV analysis, the tool considers the max delay
for the launch path and the min delay for the capture clock path during setup
analysis, and the max delay for the capture clock path
and the min delay for the launch path during hold analysis. So a buffer placed in the
common path now has two values, i.e. max and min. As we know, a cell can't have
two different delays at a particular instant of time. Therefore we calculate the
value for the common buffer as:

CRPR = max value – min value


In the CRPR process we remove the derating from the common buffer. Here the
common buffer is buf1, so we take 0.70ns – 0.60ns = 0.10ns for buf1.

Setup slack = (required time) min – (arrival time) max


Arrival time = buf1 + buf2 + Tcq + Tcomb

Arrival time = 0.10 + 0.65 + 0.60 + 3.6 = 4.95ns

Required time = Tclock period + buf1 + buf3 – Tsu

Required time = 8 + 0.10 + 0.45 – 0.2 = 8.35ns

Setup slack = RT – AT = 8.35ns – 4.95ns = 3.4ns


Hold slack = (arrival time) min – (required time) max
Arrival time = buf1 + buf2 + Tcq + Tcomb

Arrival time = 0.10 + 0.55 + 0.48 + 2.5 = 3.63ns

Required time = buf1 + buf3 + Thold

Required time = 0.10 + 0.75 + 0.2 = 1.05ns

Hold slack = AT – RT = 3.63ns – 1.05ns = 2.58ns


Without CRPR the setup and hold slack values are 3.3ns and 2.48ns.
With CRPR the setup and hold slack values are 3.4ns and 2.58ns.
From the above results, it is clear that with CRPR both the setup and
hold slacks improve, here by 0.10ns each.
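The worked problem can be reproduced in a short script. The (min, max) pairs below are read off the arithmetic above, since the figure itself is not available here. The script uses the more common formulation, in which the common-path difference (max – min of buf1) is added to the slack as a credit; for this problem that gives the same numbers as substituting 0.10ns for buf1 on both sides.

```python
# (min, max) delays in ns, read off the worked problem above.
BUF1 = (0.60, 0.70)   # common to launch and capture clock paths
BUF2 = (0.55, 0.65)   # launch clock path only
BUF3 = (0.45, 0.75)   # capture clock path only
TCQ = (0.48, 0.60)
TCOMB = (2.5, 3.6)
PERIOD, T_SU, T_HOLD = 8.0, 0.2, 0.2
MIN, MAX = 0, 1

# Pessimism on the common path: it cannot be both slow and fast at once.
crpr_credit = BUF1[MAX] - BUF1[MIN]

# Setup: latest arrival against earliest capture clock.
setup_at = BUF1[MAX] + BUF2[MAX] + TCQ[MAX] + TCOMB[MAX]
setup_rt = PERIOD + BUF1[MIN] + BUF3[MIN] - T_SU
setup_no_crpr = setup_rt - setup_at

# Hold: earliest arrival against latest capture clock.
hold_at = BUF1[MIN] + BUF2[MIN] + TCQ[MIN] + TCOMB[MIN]
hold_rt = BUF1[MAX] + BUF3[MAX] + T_HOLD
hold_no_crpr = hold_at - hold_rt

setup_crpr = setup_no_crpr + crpr_credit
hold_crpr = hold_no_crpr + crpr_credit

print(f"setup slack: {setup_no_crpr:.2f} ns -> {setup_crpr:.2f} ns with CRPR")
print(f"hold slack:  {hold_no_crpr:.2f} ns -> {hold_crpr:.2f} ns with CRPR")
```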

Input Delay: Input delay is the time at which the data arrives at the input pin
of the block from the external circuit, with respect to the reference clock.

For example,

let’s say the clock period is 1ns.

And for setup analysis, the data required time for the path FF11 to FF1 is 850ps.

Suppose the maximum delay of the path from the clock pin of FF11 to CIN is
550ps.

Then at block level, for setup analysis, we have to close the remaining path,
that is from CIN to FF1, within 850 – 550 = 300ps.

The input delay path also has two parts: one is the clock-to-Q delay of FF11 and the
other is the combinational delay from Q to CIN. This path has a max and a min delay,
which are used separately in the setup and hold analysis. So when we apply
input delay we apply two delays, max input delay and min input delay. The
commands for applying these delays in the SDC file are as follows.

Setting Input Delay:


create_clock -name RCLK -period 1 [get_ports RCLK]

set_input_delay -max 0.55 -clock RCLK [get_ports CIN]

set_input_delay -min 0.45 -clock RCLK [get_ports CIN]

Output Delay: Output delay is the time required by the external circuit, i.e. how
long before the capturing clock edge the data has to arrive at the output pin of
the block, with respect to the reference clock.

For example,
let’s say the clock period is 1ns.
And for setup analysis, the data required time for the path FF2 to FF22 is
800ps.

Suppose the max delay of path-2 from COUT to FF22 is 250ps.

Then at block level, for setup analysis, we have to close the remaining path
from FF2 to COUT within 800 – 250 = 550ps.

In the SDC file we specify maximum and minimum output delays, which are used
separately for setup and hold analysis. The output delay is the delay from the
output pin to the next register.

Setting Output Delay:

create_clock -name RCLK -period 1 [get_ports RCLK]

set_output_delay -max 0.25 -clock RCLK [get_ports COUT]

set_output_delay -min 0.20 -clock RCLK [get_ports COUT]

The above set of SDC commands sets a maximum output delay of 250 ps
and a minimum output delay of 200 ps on the COUT output pin. We can imagine that
there is a virtual flop outside the block and that the delay from the COUT pin to that
virtual flop is the output delay of the COUT pin. Here output delay has been explained
with reference to setup analysis, but a similar concept applies to hold
analysis too.
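The budgeting arithmetic in the input delay and output delay examples can be summarised in one small helper. The function name is made up for illustration; the numbers are the ones used in the two examples, in ps.

```python
def block_budget_ps(required_ps, external_delay_ps):
    """Time left for the path inside the block after the external delay."""
    budget = required_ps - external_delay_ps
    if budget < 0:
        raise ValueError("external delay alone exceeds the required time")
    return budget


# Input side: FF11 -> CIN is outside the block, CIN -> FF1 is inside.
input_budget = block_budget_ps(required_ps=850, external_delay_ps=550)
# Output side: FF2 -> COUT is inside the block, COUT -> FF22 is outside.
output_budget = block_budget_ps(required_ps=800, external_delay_ps=250)

# The external part is what goes into the SDC commands, converted to ns
# because the clock period above is given as 1 (ns).
input_delay_max_ns = 550 / 1000
output_delay_max_ns = 250 / 1000

print(f"CIN -> FF1 budget:  {input_budget} ps (set_input_delay -max {input_delay_max_ns})")
print(f"FF2 -> COUT budget: {output_budget} ps (set_output_delay -max {output_delay_max_ns})")
```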

Timing constraints:
False path: A path that exists in the design but is never exercised functionally,
so it should be excluded from timing analysis. In the figure below, when enable is 0
the output is 8 (5 is active from the first block and 3 is active from the second
block, 5+3=8). If we do not declare the false paths, the tool will also time paths
that can never occur (like 5+5 or 3+3).

Multi cycle path: It specifies the number of clock cycles required to propagate
data from the start to the end of the path.
Min/Max delay: It overrides the default setup and hold constraints with
specific max and min time values.
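The effect of a multi cycle path on the setup check can be sketched by extending the setup slack formula used earlier in these notes, assuming that a setup multiplier of N moves the capturing edge N clock periods after the launch edge. The function name and the `cycles` parameter are illustrative; the numbers reuse setup Example 2.

```python
def setup_slack_multicycle(t_cap, period, t_su, t_launch, t_cq, t_comb,
                           net_delay=0, cycles=1):
    """Setup slack in ns when the data is given `cycles` clock periods."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    # RT = Tcap + N * T - Tsu
    required = t_cap + cycles * period - t_su
    arrival = t_launch + t_cq + t_comb + net_delay
    return required - arrival


# Setup Example 2 from above: violated as a single-cycle path.
single = setup_slack_multicycle(t_cap=2, period=8, t_su=2, t_launch=3,
                                t_cq=3, t_comb=2, net_delay=2, cycles=1)
# The same path constrained as a two-cycle path.
double = setup_slack_multicycle(t_cap=2, period=8, t_su=2, t_launch=3,
                                t_cq=3, t_comb=2, net_delay=2, cycles=2)

print(single)  # -2
print(double)  # 6
```

This relaxation is only valid when the design really samples the data every N cycles; otherwise the constraint hides a genuine violation.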
