Pddesign

WHAT IS PHYSICAL DESIGN
Physical design.
Physical design means --->> netlist (.v ) converted into GDSII form(layout
form)
logical connectivity of cells converted into physical connectivity.
During physical design, all design components are instantiated with their geometric
representations. In other words, all macros, cells, gates, transistors, etc., with fixed
shapes and sizes per fabrication layer, are assigned spatial locations (placement)
and have appropriate routing connections (routing) completed in metal layers.
Physical design directly impacts circuit performance, area, reliability, power,

and manufacturing yield. Examples of these impacts are discussed below.
1. Performance: long routes have significantly longer signal delays.
2. Area: placing connected modules far apart results in larger and slower chips.
3. Reliability: A large number of vias can significantly reduce the reliability of
the circuit.
4. Power: transistors with smaller gate lengths achieve greater switching
speeds at the cost of higher leakage current and manufacturing variability; larger
transistors and longer wires result in greater dynamic power dissipation.
5. Yield: wires routed too close together may decrease yield due
to electrical shorts occurring during manufacturing, but spreading gates too far apart
may also undermine yield due to longer wires and a higher probability of opens
Due to its high complexity, physical design is split into several key steps
(Fig. 1).
 Partitioning: breaks up a circuit into smaller sub-circuits or module which
can each be designed or analyzed individually.
 Floorplanning: determines the shapes and arrangement of sub-circuits or
modules, as well as the locations of external ports and IP or macro-blocks
 Power and ground routing (power planning): often intrinsic to
floorplanning, distributes power (VDD) and ground (GND) nets throughout the chip.
 Placement: finds the spatial locations of all cells within each block.
1
 Clock network synthesis: determines the buffering, gating (e.g.,
for power management) and routing of the clock signal to meet prescribed skew
and delay requirements
 Global routing: allocates routing resources that are used for connections;
example resources include routing tracks in the channel and in the switch box
 Detailed routing: assigns routes to specific metal layers and routing tracks
within the global routing resources.
 Timing closure: optimizes circuit performance by specialized placement or
routing techniques
**GDS- Graphical Data system
1. The physical design is the process of transforming a circuit description into

the physical layout, which describes the position of cells and routes for the
interconnections between them.
2. The main concern is the physical design of VLSI-chips is to find a layout
with minimal area, further the total wire length has to be minimized. For some
critical nets there are hard limitations for the maximal wire length.
2
ASIC FLOW AND PHYSICAL DESIGN FLOW
The main steps in the ASIC PHYSICAL DESIGN flow are:
1. Design netlist (after synthesis)

2. Floorplanning
3. Power planning
4. placement
5. Clock Tree Synthesis(CTS)
6. Routing
7. Physical verification
8. GDSII generation
3
Physical verification:
After physical design is completed, the layout must be fully verified to
ensure correct electrical and logical functionality. Some problems found during
physical verification can be tolerated if their impact on chip yield is negligible.
Therefore, at this stage, layout changes are usually performed manually by
experienced design engineers.
 Design rule checking (DRC): verifies that the layout meets all technology-
imposed constraints. DRC also verifies layer density for chemical-
mechanical polishing (CMP).
 Layout vs. schematic (LVS): checking verifies the functionality of the
design. From the layout, a netlist is derived and compared with the original netlist
produced from logic synthesis or circuit design.
 Parasitic extraction: derives electrical parameters of the layout elements
from their geometric representations; with the netlist, these are used to verify the
electrical characteristics of the circuit.
 Antenna rule checking: seeks to prevent antenna effects, which may
damage transistor gates during manufacturing plasma-etch steps through the
accumulation of excess charge on metal wires that are not connected to PN
junction node
 Electrical rule checking (ERC): verifies the correctness of power and
ground connections, and that signal transition times (slew), capacitive loads and
fan-outs are appropriately bounded
4
physical design flow
EDA Tools:
1.P&R
Synopsys: ICCI, ICCII, DC COMPILER
cadence: Encounter, innovus
2. TIMING ANALYSIS
Synopsys: Primetime
cadence: tempus
3. physical verification:
Synopsys:Hercules
cadence: Assura
mentor: calibre(mostly used)
5
4.RC Extraction:
Synopsys: StarRCXT
5.formal verification:
Synopsys: formality
INPUTS FOR PHYSICAL DESIGN
Name of Inputs File format Given by

Netlist .v (Verilog) synthesis team
Synopsys Design .sdc(written in TCL) synthesis team
Constraints (SDC)
Timing library/logical .lib(liberty file) vendors
library
Physical library .lef(layout exchange vendors
format)
Technology file .techlef/.tf foundry
TLU+(Table Look Up) .tlup foundry
Description of all inputs

Netlist :
 Textual description of circuits components (like logic gates, combinational
circuits, sequential circuits), so netlist is a collection of gates.
 It contains the logical connectivity of all the cells.
 It can also be a collection of resistors, capacitors or transistors.
example of netlist:
module and_gate(y,a,b);
input a,b;
6
output y;
AND2 U1(.Y(y), .A(a), .B(b));
endmodule
Synopsys Design Constraints(SDC) :

These are timing constraints and used to meet the timings.
constraints are :
 create clock definition

 generated clock definition
 Virtual clock
 input delay
 output delay
 max delay
 min delay
 max transition
 max capacitance
 max fanout
 clock latency
 clock uncertainty etc..
and clock exceptions are also present in SDC
 Multicycle Path
 False Path
 Half Cycle Path
 disable timing arcs
 case analysis
will explain each constraints in detail in further posts......
Timing library/logical library (.lib):
 It contains timing information of standard cells, soft macros, hard macros.

 It contains Functionality information of standard cells and soft macros.
7
 Timing information like cell delay setup, hold, recovery, removal are
present.
 Design rules like max tran, max cap, max fanout, min cap are present
 Contain power information.
 PVT corners are also present. for every PVT corner the timing of cells is
different. hence for every PVT corner there is a .lib file present
cell delay is a function of input transition and output load and is calculated based
on lookup tables.
cell delays are calculated by Nonlinear Delay Model(NLDM) and composite
current source (CCS) models.
CCS(Composite current source) NLDM (Non-Linear Delay Model)
It’s like Norton equivalent circuit It’s like Thevenin's equivalent circuits
current source used for driver modeling Voltage source used for driver
modeling
It 20 variables to account input transition It has only 2 variables
and output load
CCS is more accurate Less accurate
CCS file is 10x times larger than NLDM Smaller than CCS file
because of more numbers of variables
Runtime for CCS is more Runtime is less
in .lib file following units are present,
 time units
 voltage unit
 leakage power unit
 capacitive load unit
 slew rate
 rise and fall time
for each cell following attributes are present,
8
 area of cell
 leakage power
 capacitance
 rise and fall capacitance
 for each pin direction and their capacitance
Lookup tables are defined for different parameters like cell delay, hold, setup,
recovery, removal with different matrix
cell_fall (delay_template_6x6) {
index_1 (“0.015, 0.04, 0.08, 0.2, 0.4, 0.8”);
index_2 (“0.06, 0.18, 0.42, 0.6, 1.2, 1.8”);
values ( \
“0.0606, 0.0624, 0.0744, 0.0768, 0.09, 0.098”, \
“0.1146, 0.1152, 0.1164, 0.1212, 0.1314, 0.1514”, \
“0.201, 0.2004, 0.2052, 0.2058, 0.2148, 0.2168”, \
“0.48, 0.4806, 0.4812, 0.4824, 0.4866, 0.4889”, \
“0.9504, 0.9504, 0.9504, 0.951, 0.9534, 0.975” \
“0.6804, 0.6820, 0.6836, 0.6851, 0.6874, 0.6895" \);
# index_1 represents input transition.
#index_2 represents output load i.e output net capacitance.
Ques: what would be the cell_fall time if input_net_transition is 0.08 and the
output load is 0.6?
Ans: 0.2058
example of library:
cell(OR2_3) {
area : 6.00
power:
rise_time:
fall_time:
pin (O) {
direction : output;
timing () {
related_pin : "A";
9
rise_propagation() }
rie_transition()}
function : "(A||B)"; #functionality
max_cap:
min_cap: }
pin (A) {
direction: input;
cap: ;}
Physical Library(.lef) :
 It contains physical information of standard cells, macros, pads.

 Contain the name of the pin, pin location, pin layers, direction of pin(in,
out,inout), uses of pin (Signal, Power, Ground) site row, height and width of the
pin and cell.
 Contain the height of standard cell placement rows.
 Macros information like cell name, size, dimensions, layout,blockages and
capacitance are defined.
 Design rules, via definitions, metal layers and metal capacitances are
defined.
 For every technology the via and layer definition are different, so in physical
library defined the type of layer(routing/master slice/overlap), width/pitch and
spacing, direction, resistance, capacitance, and antenna factor are defined
 Contain preferred routing Directions, minimum width of the resolution
example of lef:
layer M2
type routing
width 0.50;
end M2
layer via
type cut
end via
macro AND_1
10
origin 0.000
size 4.5 by 12
symmetry x y;
site core;
pin A
dir input;
port
layer M2
end
lef contains two types of views
1. CELL view: it is a full layout of the block and used at the time of tape out.
2. FRAM view: this is an abstract view that has only the pins, metals, via and
blockages that are used in Placement & Route stages. this makes sure that the
interconnection between the pins can be routed automatically and the routing tool
will not route over existing metal/via areas otherwise any shorts will come into the
picture.
Technology File :
 Contain the number of metal layers and vias and their name and
conventions.
 Design rules for metal layers like the width of metal layer and spacing
between two metal layers.
 Metal layers resistance and capacitance as well as routing grid.
 Unit, precision, color, and pattern of metal layer and via.
 Maximum current density is also present in the tech file.
 Contains ERC rules, Extraction rules, LVS rules.
 Physical and electrical characteristics of each layer and via.
 It contains nwell,pwell, metal pitch.
tech file should be compatible with both physical & timing libraries
example of tech file:
/* specify units and unit values*/
technology {
name
unit
operating conditions
11
routing_rule_models
}
/* define six basic color used to create display colors*/
[ primary color {
primary_color_attributes
}]
/* define layer specific characteristics including display*/
/* characteristics and layer specific routing design rules*/
/* define layer and data types*/
/* defining vias used in the design */
/* define inter layer ruting design rules */
/* defining cell rows spacing rules */
/* defining density rules*/
/* defining via and slot rules*/
/* defining capacitance ,resistace and temperature coeff of the layer*/
TLU+(Table Lookup) :
 It is a table containing wire cap at diffrent net length and spacing.

 contain RC coeficients for specific technology.
 TLU+ files are extracted or generated from ITF(contains interconnect
details) file results.
 The main function of this files are--
[a] . Extracted R, C parasitics of metal per unit length
[b] . These RC parasitics are used for calculating net delays.
[c]. If TLU+ files are not present these R,C parasitics extracted from.ITF
files
[d]. For loading of TLU+ we have to load 3 files: 1. TLU+ 2. Min TLU+
3. Max TLU+
[e]. Map file maps the .itf file and .tf files of the layer and via names.
Milkyway.tf also contain parasitics model of wire as TLU+ contains. If you specify
in ICC the TLU+ files then ICC used TLU+ files and did not read parasitics from
.tf. if not specified by default ICC will use .tf.
advantage of TLU+
1.more accurate
2.different TLU+ for different RC corners and scenario.
12
disadvantage of Milkyway.tf ---It used only for one RC corner.
Interview Questions related to this topic:

1. What are the inputs for physical design?
2. What does lef and lib and .tf files contain?
3. How the cell is defined in the library?
4. What does sdc file contains?
5. What is cell delay and net delay and how it is defined and calculated?
6. What are different timing delays models available and what is WLM?
7. From where do you get the WLM? Do you create WLM? How do u specify?
What are the sanity checks before going to start physical design flow?
Sanity checks:
To ensure that the input received from the library team and synthesis team is
correct or not. If we are not doing these checks then it creates problems in later
stages of design.
Basically, we are checking following input files: and make sure that these files are
complete and not erroneous.
1. design/netlist checks
2. SDC checks
3. Library checks
Design checks:
Check if current design is consistent or not
It checks the quality of netlist and identifies:
1. Floating pins
2. Multidriven nets
3. Undriven input ports
4. Unloaded outputs
5. Unconstrained pins
6. Pin mismatch counts between an instance and its reference
13
7. Tristate buses with non-tristate drivers
8. Wire loops across hierarchies
ICC command: check_design:
Checks for multi driven nets, floating nets/pins, empty modules.
Pins mismatch, cells or instances without I/O pins/ports etc.
SDC Checks:
1. If any unconstrained paths exist in the design then PNR tool will not optimize
that path, so these checks are used to report unconstrained paths
2. Checks whether the clock is reaching to all the clock pin of the flip-flop.
3. Check if multiple clock are driving same registers
4. Check unconstrained endpoints
5. Port missing input/output delay.
6. Port missing slew/load constraints.
ICC command: check_timing
Library checks:
It validate the library i.e. it checks the consistency between logical and physical
libraries.
It checks the qualities of both libraries.
check_library: This command shows the name of the library, library type & its
version, units of time, capacitance, leakage power, and current. It shows the
number of cells missing, the number of metal or pins missing in the physical and
logical library.
Ques: what checks you do as you get the netlist before going to the floorplan.?
14
what is Floorplanning
Floor planning:
Floorplanning is the art of any physical design. A well and perfect floorplan leads
to an ASIC design with higher performance and optimum area.
Floorplanning can be challenging in that, it deals with the placement of I/O pads
and macros as well as power and ground structure.
Before we are going for the floor planning to make sure that inputs are used for
floorplan is prepared properly.
Inputs for floorplan:
1. Netlist (.v)
2. Technology file (techlef)
3. Timing Library files (.lib)
4. Physical library (.lef)
5. Synopsys design constraints (.sdc)
6. Tlu+
fig 1. ASIC design
15
After physical design database creation using imported netlist and
corresponding library and technology file, steps are
1. Decide core width and height for die size estimation.

2. IO pad sites are created for placement of IO pad placement.
3. Placement of macros.
4. The standard cell rows created for standard cell placement.
5. Power planning (pre routing)
6. Adding physical only cells
apart from this aspect ratio of the core, utilization of core area, cell orientation, and
core to IO clearance are also taken care of during the floorplan stages.
floorplan control parameter:

core area depends upon :
1. Aspect ratio: Aspect ratio will decide the size and shape of the chip. It is
the ratio between horizontal routing resources to vertical routing resources (or)
ratio of height and width. Aspect ratio = width/height
2. Core utilization:- Utilization will define the area occupied by the standard
cells, macros, and other cells.If core utilization is 0.8 (80%) that means 80% of the
core area is used for placing the standard cells, macros, and other cells, and the
remaining 20% is used for routing purposes.
core utilization = (macros area + std cell area +pads area)/ total core area
Pad placement:
In ASIC design three types of IO Pads. Generally pad placement and pin
placement is done by Top-Level people. It is critical to the functional operation of
an ASIC design to ensure that the pads have adequate power and ground
connections and are placed properly in order to eliminate electro-migration and
current-switching related problems.
1. Power
2. Ground
3. Signal
16
What is electromigration? We will discuss it later in depth.
Electro-migration is the movement or molecular transfer of electrons in the metal
from one area to another area that is caused by a high density electric current
inflows in the metal. High density current can create voids or hillocks, resulting in
increased metal resistance or shorts between wires and it can degrade the ASIC
performance.
One can determine the minimum number of ground pads required to satisfy the
current limit, and the required number of power pads equal to the number of
ground pads.
Where
Ngnd = Itotal/Imax
Ngnd = number of ground pads
Itotal = total current in design (sum of static and dynamic currents)
Imax= max EM current
Current switching noise is generated when there is a transition between states on
metal layers. This will cover in crosstalk topic.
Macro placement:
Macros may be memories, analog blocks. Proper placement of macros has a great
impact on the quality and performance of the ASIC design. Macro placement can
be manual or automatic.
Manual macro placement is more efficient when there are few macros to be placed.
Manual macro placement is done based on the connectivity information of macros
to IO pin/pads and macro to macro. Automatic macro placement is more
appropriate if the number of macros is large.
17
Types of macros:
 Hard macros: The circuit is fixed. We can’t see the functionality

information about macros. Only we know the timing information.
 Soft macros: The circuit is not fixed and we can see the functionality and
which type of gates are using inside it. Also we know the timing information.
Guidelines to place macros:
 Placement of macros are the based on the fly-lines ( its shows the
connectivity b/w macro to macro and macro to pins) so we can minimize the
interconnect length between IO pins and other cells.
 Place the macros around to the boundary of the core, leaving some space
between macro to core edge so that during optimization this space will be used for
buffer/inverter insertion and keeping large areas for placement of standard cells
during the placement stage.
 Macros that are communicating with pins/ports of core place them near to
core boundary.
 Place the macros of same hierarchy together.
 Keep the sufficient channel between macros
channel width = (number of pins * pitch )/ number of layers either horizontal
or vertical
Eg. Let’s assume If there are two macros having 50 pins and the pitch values are
0.6 and the total number of horizontal and vertical layers are 12. Means M0 M2
M4 M6 M8 M10 are horizontal layers and M1 M3 M5 M7 M9 M11are vertical
layers.
Channel width = ((50+50)*0.6)/6
= 10
 Avoids notches while placing macros, if anywhere notches is present then
use hard blockages in that area.
 Avoid crisscross connection of macro placement.
 Keep keep-out margin around the four sides of macros so no standard cells
will not sit near to Macro pins. This technique avoids the congestion.
 Keep placement blockages at the corners of macros.
18
 For pin side of macros keep larger separation and for non-pin side, we can
abut the macros with their halo so that area will be saved and Halo of two macros
can abut so that no standard cells are placed in between macros.
 Between two macros at least one pair of power straps (power and Ground)
should be present.
fig: placement of IO pads, standard cells and macros hierarchy wise
Fig. Abutted non-pin side macros with abutted halo
fig: pin side macros with halo
19
lots of iterations happen to get optimum floorplan. the designer takes care of the
design parameter such as power, area, timing and performance during
floorplanning.
types of floorplan techniques:

1. Abutted:- When the chip is divided into blocks in the abutted design there is
no gap between the blocks.
2. Non abutted:- In this design there is a gap between blocks. The connection
between the blocks is done through the routing nets.
3. The mix of both: This design is a combination of abutted and non- abutted.
outputs of floorplan:
1. get core and boundary area
2. IO ports/pins placed
3. macros placement done
4. floorplan def file
key terms related to floorplan:
standard cell row:
 The area allotted for the standard cells on the core is divided into
rows where standard cells are placed.
 The height of the row is equal to the height of the standard cell and width
varies. The height varies according to multiple standard cell row height. there may
be double-height cells, triple-height cells, etc.
 The standard cells will sit in the row with proper orientation.
20
fig: placement of standard cells in row
fly lines:
 macros are placed manually using fly lines. fly lines are a virtual connection
between macros and macros to IO pads.
 This helps the designer about the logical connection between macros and
pads.
 Fly lines act as guidelines to the designer to reduce the interconnect length
and routing resources.
 fly lines are of two types:
macros to IO pin:
macros to macros fly lines:
21
halo(keep out margin):
 This is the region around the fixed macros so that no other macros and
standard cell can be placed near to macros boundary
 the width of the keep out margin on each side of the fixed cell can be the
same or different depending on how you define keepout margin.
 keeping the placement of cells out of such regions avoids congestion and
produce better qor.
 Halo of two adjacent macros can be overlap.
 If the macros moved from one place to another place, the halo will also
move.
ICC command for keepout margin:

create_keepout_margin -outer {10 10 10 10} macro_name -type soft/hard
create_keepout_margin -inner {10 10 10 10} macro_name -type soft/hard
22
Blockages :
1. Blockages are the specific location where the placing of cells is blocked.
2. Blockages will not guide the tool but it allows not to place the standard cells
buffer and inverters at some particular area i.e by using blockages we blocked the
area so no standard cells and other cells won't be placed.
3. If the macros moved from one place to another place, blockages will not
move.
4. Blockages are of three types. a) soft b) hard c)partial
soft blockages:
 only buffer can be placed.

 prevents from the placement of std cell and hard macro within the specified
area during coarse placement but allows placement of buffer/inv during
optimization, legalization and clock tree synthesis.
 create_placement_blockages -boundary {{10 20} {100 200}} -name PB1 -
type soft
hard blockages:
 No standard cells, macros and buffer/inv can be placed within the specified
area during coarse placement, optimization, and legalization.
 Used to avoid routing congestion at macros corners.
 Control power rails generation at the macros.
 create_placement_blockages -boundry {{10 20} {100 200}} -name PB1 -
type hard
partial blockages:
 Partial blockages limit the cell density in the specified area.

 By default the blockage factor is 100% so no cells can be placed in that
region but if we want to reduce density without blocking 100% area, we can
change the blockage factor.
 create_placement_blockages -Boundry {10 20 100 200} -type partial -
blocked_percentage 40
Partial blockages with a maximum
allowed cell density of 60% (blocked % is 40) enclosed by a rectangle with corners
(10 20) & (100 200)
23
 To allow unlimited usage of a partial blockage area specify a blockage
percentage to zero.
ques. how can you say the floorplan is good?
ans: a good floorplan should meet the following constraints
1. minimize the total chip area
2. routing should be easy
interview question :
1. what is floorplanning? how can you say that your floorplan is good?
2. what are the input and outputs of the floorplan?
3. what are the floorplan control parameters?
4. how will you determine the distance between the two macros?
5. what are the guidelines for placing the macros?
6. what are the steps needed to be taken care of while during floorplanning?
7. what are the issues you will face if floorplan is bad? ... congestion and
timing
8. How can you estimate the area of the block?
9. how will you decide the pin location in block-level design?
10. what are blockages, halo, and keepout margins?
PHYSICAL ONLY CELLS
PHYSICAL ONLY CELLS:

These cells are not present in the design netlist. if the name of a cell is not
present in the current design, it will consider as physical only cells. they do not
appear on timing paths reports.they are typically invented for finishing the chip.
Tap cells:
 A tap cell is a special nonlogic cell with a well tie, substrate tie, or both.
 Tap cells are placed in the regular intervals in standard cell row and distance
between two tap cells given in the design rule manual.
24
 These cells are typically used when most or all of the standard cells in the
library contain no substrate or well taps.
 Generally, the design rules specify the maximum distance allowed between
every transistor in a standard cell and a well or substrate tap.
 Before global placement (during the floorplanning stage), you can insert tap
cells in the block to form a two-dimensional array structure to ensure that all
standard cells placed subsequently comply with the maximum diffusion-to-
tap distance limit.
 command in ICC: create_tap_cells, specify the name of the library cell to
use for tap cell insertion (-lib_cell option) and the maximum distance, in microns,
between tap cells (-distance option)
 icc2_shell>create_tap_cells -lib_cell myreflib/mytapcell -distance 30
fig1: TAP Cells and END CAP Cells

what is latch-up the problem: it is the condition when low impedance path gets
formed between VDD and GND terminal and there is direct current flow from
VDD to GND which might result in a complete failure of chip.while the formation
of CMOS INVERTER we saw the formation of PN junctions and because of these
PN junctions there may be formation of parasitics elements like diode and
transistors.
25
FIG: latch-up phenomenon
Transistors Q1(NPN) and Q2(PNP) are parasitics transistors that are getting
formed during the manufacturing of CMOS inverter. If these two parasitic
transistors are in on condition then current starts flowing from VDD to VSS and
creates a short circuit. while manufacturing these devices the designer made sure
that all PN junction should be in reverse bias so that no parasitic transistor will turn
on and hence the normal operation will not be affected. but sometimes what
happened because of external elements (like input and output) the parasitic
transistors get turned on. for parasitics transistor gets turned on there are two
scenarios::
1. when the input and output > VDD: PNP transistor in ON
condition: because now P region is more positive than N region in Nwell,
therefore Base-Emitter junction of PNP (Q2) transistor is in Forward biased and
now this transistor will turn on. Now if we see in the fig the collector of PNP
transistor is connected to the base of NPN transistor, because of this connection the
current is flowing from collector (PNP) to base (NPN) and then because of this
base current the NPN transistor gets turn on and the current flowing from VDD to
VSS through these two parasitics transistors. This current is flowing even if we
removed the external inputs and outputs and parasitic transistors make a feedback
path in which current is latched up and creates a short circuit path.
26
1. when input and output <VSS: NPN transistor in ON condition: Now N
region is more negative than P region in P substrate, therefore Base-Emitter
junction of NPN (Q1) transistor is in Forward biased and now this transistor will
turn on. Now if we see in the fig the Base of NPN transistor is connected to the
Collector of PNP transistor, because of this connection the current is flowing from
Base (NPN) to Collector (PNP) and then because of this Collector current the PNP
transistor gets turn on and current flowing from VDD to VSS through these two
parasitics transistors. This current is flowing even if we removed the external
inputs and outputs and parasitic transistors make a feedback path in which current
is latched up and creates a short circuit path.
In the fig shown above, the value of Rnwell and Rpsub resistance is quite high, if
the values of these resistances will reduced then what will happen? The current
flowing from the collector of PNP transistor will flow from these resistance paths,
i.e current find the low resistance path to reach from VDD to VSS and NPN
transistor will never get turn on and in this way, latchup problem will not occur.
Solution for latch-up problem:

Reducing the resistance values: tap the Nwell to VDD and Psubstrate to GND
externally.
fig: tapping the VDD with Nwell and VSS with Psub externally
27
Tie cells:
 These are special-purpose cells whose output is constant high or low. The
input needs to be connected to the gate of the transistor and there are only two
types of input logic 1 and logic 0, but we do not connect them directly to gate of
the transistor as with supply glitches can damage the transistor so we used tie high
and tie low cells (these are nothing but resistors to make sure that PG network
connected through them ) and output of these cells are connected to the gate of the
transistor.
 There will be floating nets because of unused inputs they should be tie with
some value either low or high to make them stable.
 Insert the tie cells manually also by command connect_tie_cells, this
command insert tie cells and connect them to specified cell ports.
why tie cells are inserted?
The gate oxide is very thin and it is very sensitive to voltage fluctuations. If the
Gate oxide is directly connected to the PG network, the gate oxide of the transistor
may get damaged due to voltage fluctuations in the power supply. To overcome
this problem tie cells are used.
How the circuit looks like and how it will work:
Tie high cells: initially we directly connect VDD to the gate of transistor now we
connect the output of these cells to the gate of the transistor if any fluctuations in
VDD due to ESD then PMOS circuit pull it back to the stable state. PMOS should
be ON always, the input of the PMOS transistor is coming from the output of
NMOS transistor and here in NMOS gate and drain are shorted and this is the
condition of saturation (NMOS) and NMOS will act as pull-down and always give
a low voltage at the gate of PMOS. now PMOS will on and gives stable high
output and this output is connected to the gate of transistor
28
fig: TIE HIGH CELLS
Tie low cells: initially we directly connect VDD to the gate of transistor now we
connect the output of these cells to the gate of the transistor, if any fluctuations in
VDD due to ESD (electrostatic discharge) then NMOS circuit pull down it back to
the constant low stable state.
in fig the gate and drain of PMOS transistor are shorted and hence this is on
saturation region and it acts as pull up resistor and it always gives high voltage to
the gate of NMOS transistor and because of this high voltage the NMOS transistor
will be ON all the time and we get stable low output because it acts as pull-down
transistor and this output is connected to the gate of the transistor.
fig: TIE LOW CELLS

Filler Cells
 To ensure that all power nets are connected, you can fill empty space in
the standard-cell rows with filler cells.
29
 Filler cells have no logical connectivity. these cells are provided continuity
in the rows for VDD and VSS nets and it also contains substrate nwell connection
to improve substrate biasing.
 Filler cell insertion is often used to add decoupling capacitors to improve the
stability of the power supply and discontinuity in power.
 The IC Compiler II tool supports filler cells with and without metal and
supports both single-height and multi-height filler cells.
fig2: Filler cells, Decap cells

why we need continuity for nwell and implantation :
If there is continuity b/w nwell and implant layer it is easier for foundry people to
generate them and the creation of a mask is a very costly process so it is better to
use only a single mask.
If nwell is discontinuous the DRC rule will tell that place cells further apart i.e
maintain the minimum spacing because there is a well proximity effect.
we know nwell is tap to VDD and P substrate is tap to VSS to prevent latchup
problem. now if there is a discontinuity in nwell it will not find well tap cells, so
we have placed well tap cells explicitly, therefore it will increase the area
explicitly, hence we have filler cells so no need to place well tap cells.
30
when we placed filler cells in the design:
when optimization of clock tree synthesis is completed i.e after timing has been
met because let's say if we want to place buffer/inv for optimization purpose we
can't place these cells because there is already placed filler cells, and enough area
is not there to present buffer/inv, so after timing has been met and routing
optimization is done then only placed the filler cells to fill the empty space.
Well proximity effect:

During the manufacturing of chips, we will get these types of problems, that's why
they are second-order effects. In this, the transistors that are close to the well edge
have different performance than ideally placed transistors, because of this effect
the transistor speed can vary by +- 10%.
The transistor placed to the well boundary so it will get many problems during ion
implantation. Implanted ion is coming to the well boundary and reflected/scatter
from the well boundary to transistors Q1 & Q5 boundary and ions are deposited on
the Q1 Q5 boundary. Ion particles are scattered/reflected due to photoresist on
both side of nwell wall. these ions are deposited only those transistors who are near
to the well boundary, so any one of the terminals of transistors gets affected by ion
implantation and the rest of the transistor will get uniform ions.
well proximity effect
31
Decap cells:
 These are typically poly gate transistors where source and drain are
connected to the ground rails and the gate is connected to the power rails. (fig2)
 The gate in a transistor consumes most of the power(dynamic) only at the
active edge of the clock at which most of the sequential elements are switching.
 No voltage source is perfect hence glitches are produced on the power line
due to this huge current drawn from the power grid at the active edge of the clock.
 We already know that power and ground rails are made up of metal and
metal have its own resistance and inductance, when current is flowing through
these metal wires there is some voltage drop.
 If the power source is far away from a flip flop the chances are that this flop
can go into the metastable state due to this IR drop because it will not get sufficient
voltage to work properly.
 Decap filler cells are small capacitors which are placed between VDD and
GND all over the layout when the logic circuit draw a high amount of current, this
capacitor provides extra charge to that circuit. when logic circuit not drawing any
current, the de-cap is charged up to maximum capacitance.
we know V= IR + L di/dt.
V= IR (voltage drop due to the metal layer)
V= Ldi/dt (voltage drop due to change in current with respect to time)
32
now if the voltage drop is greater than Noise margin(NM) of the inverter then it
might lead to improper functionality. Let's say NM is 0.4 now 1-0.4=0.6v
(anything less than 0.6 will consider as 0 by this inverter)
now let's say voltage drop is 0.5 i.e 1-0.5=0.5v now this will consider as 0
therefore it will lead to improper functionality. now we have seen voltage drop >
NM so here de-cap cells comes into picture so these cells are capacitor that holds
the reservoir of charges and these cells are placed near to the power-hungry cells,
so whenever these cells switch the de-cap cells starts discharging themselves and
provided the required voltage to the particular cells and keep the power supply
constant to all the elements in the design.
With increasing clock frequency and decreasing supply voltage as technology

scales, maintaining the quality of power supply becomes a critical issue. Typically,
decoupling capacitors (de-caps) are used to keep the power supply within a certain
percentage (e.g., 10%) of the nominal supply voltage. De-caps holds a reservoir of
charge and are placed close to the power pads and near any large drivers. When
large drivers switch, the de-caps provide instantaneous current to the drivers to
reduce IR drop and Ldi/dt effects, and hence keep the supply voltage relatively
constant. A standard de-cap is usually made from NMOS transistors in a CMOS
process. At the 90nm technology node, the oxide thickness of a transistor is
reduced to roughly 2.0nm. The thin oxide causes two new problems: possible
electrostatic discharge (ESD) induced oxide breakdown and gate tunneling
leakage. Potential ESD oxide breakdown increases the likelihood that an integrated
circuit (IC) will be permanently damaged during an ESD event and hence raises a
reliability concern. Higher gate tunneling leakage increases the total static power
consumption of the chip. As technology scales further down, with a thinner oxide,
the result is an even higher ESD risk and more gate leakage. The standard de-cap
design experiences these two problems for more please visit the link below.
33
Disadvantages of Decap cells:
 These are very leaky.
 The interaction of de-cap cells with package RLC network .since the die is
essentially a capacitor with small R, L and package having huge RL network, more
de-cap cells placed the more chances of tunning the circuit is into its resonance
frequency, since both VDD and GND will be oscillating, many of the designers are
fails because of this.
Boundary cells (End cap cells):
 Before placing the standard cells, we can add boundary cells to the block.
Boundary cells consist of end-cap cells, which are added to the ends of the cell
rows and around the boundaries of objects such as the core area, hard macros,
blockages, and voltage areas, and corner cells, which fill the empty space between
horizontal and vertical end-cap cells. (fig1)
 End-cap cells are typically nonlogic cells such as a decoupling capacitor for
the power rail. Because the tool accepts any standard cell as an end-cap cell, ensure
that you specify suitable end-cap cells.
 Boundary cells include both end-cap cells placed on the left, right, top, and
bottom boundaries, and inside and outside corner cells and.
 To know the end of the row and at the edges, endcap cells are placed to
avoid the cell's damages at the end of the row to avoid the wrong laser wavelength
for correct manufacturing.
 Boundary cells protect your design from external signals.
 These cells ensure that gaps do not occur between the well and implant layer
and to prevent from the DRC violations.
 When you insert these at the end of the placement row, these will make sure
that these cells properly integrated to the design and will have a clean unwell, other
DRC clean, hence next block will abut without any issues.
34
fig: end cap cells
POWER PLANNING
Before 2000 area, delay and performance were the most important
parameters, if anyone design circuit the main focus was on how much less area is
occupied by the circuit on the chip and what the speed is. Now situation is
changed, the performance and speed is a secondary concern. In all nanometer
(deep sub-micron) technology power becomes the most important parameter in the
design. Almost all portable devices run on battery power. Power consumption is a
very big challenge in modern-day VLSI design as technology is going to
shrinks Because of
1. Increasing transistors count on small chip
2. Higher speed of operations
3. Greater device leakage currents
Grid structure:
Power planning means to provide power to the every macros, standard cells,
and all other cells are present in the design. Power and Ground nets are usually laid
out on the metal layers. In this create power and ground structure for both IO pads
and core logic. The IO pads power and ground buses are built into the pad itself
and will be connected by abutment.
For core logic there is a core ring enclosing the core with one or more sets of
power and ground rings. The next consideration is to construct cell power and
ground that is internal to core logic these are called power and ground stripes that
repeat at regular intervals across the logic or specified region, within the design.
Each of these stripes run both vertically and horizontally at regular interval then
35
this is called power mesh.
Ring and core power ground

The total number of stripes and interval distance is solely dependent on the ASIC
core power consumption. As power consumption (static and dynamic) increases
the distance of power and ground straps interval increase to reduce overall voltage
drop, thereby improving performance.
In addition to the core power and ground ring, macro power and ground rings need
to be created using vertical and horizontal metal layers. A macro ring encloses one
or more macros, completely or partially with one or more sets of power and ground
rings.
It is strongly recommended to check for power and ground connectivity after
construction of the entire PG network.
Standard cell are placed in a row since they are placing side by side they are
touching each other, and horizontal routing across the cells already done so you
don’t have to anything it’s like a continuous line. That’s why we are placing filler
cells if any gap is thereafter optimization.
36
standard cells are placed in the rows
NOTE: as you are going to deep sub-micron technologies the numbers of metal
layers are increased like in 28 nm technology we have only 7 metal layer and for
7nm technology, we have 13 metal layers, and one more thing as we are going to
lower metal layers to higher metal layers the width of the layers is increasing. So
better to use higher layers for the power line. They can carry more current.
Power planning involves

 Calculating number of power pins required
 Number of rings, stripes
 Width of rings and stripes
 IR drop
The calculation depends on the power consumed by chip, total current
drawn, current rating of power pad (A/m), current rating of metal layers, and the
percentage of current flowing through metal layers.
Inputs of power planning

 Netlist(.v)
 SDC
 physical and logical libraries (.lef & .lib)
 TLU+
 UPF
37
Level of power distribution
1. Rings: carries VDD and VSS around the chip
2. Stripes: carries VDD and VSS from the ring
3. Rails: connect VDD and VSS to the standard cell’s VDD and VSS
POWER PLANNING MANAGEMENT

Core cell power management:
 In this power rings are formed around the core.
 If any macros/IP is power critical than create separate power ring for
particular macro/IP.
 Number of power straps are created based on the power requirement.
I/O cell power management:

In this power rings are formed for I/O cells and trunks are created between core
and power rings and power pads.
Ideal power distribution network has the following properties:

 Maintain a stable voltage with little noise
 Avoids wear out from EM and self-heating
 Consume little chip area and wiring
 Easy to layout
Real networks must balance these competing demands, meeting targets of noise
and reliability. The noise goal is typically ±10%; for example, a system with a
nominal voltage Vdd= 1.0 V may guarantee the actual supply remains within 0.9-
1.1V. The two fundamental sources of power supply noise are IR drop and L di/dt
noise.
IR Drop:
The power supply in the chip is distributed uniformly through metal layers
across the design and these metal layers have their finite amount of resistance.
When we applied the voltage the current starts flowing through these metal layers
and some voltage is dropped due to that resistance of a metal wire and current.
This drop is called IR drop. Because of IR drop, delay will increase and it violates
the timing. And this will increase noise and performance will be degraded.
38
The value of an acceptable IR drop will be decided at the start of the project
and it is one of the factors used to determine the derate value. If the value of the IR
drop is more than the acceptable value, it calls to change the derate value. Without
this change, the timing calculation becomes optimistic.
For example, a design needs to operate at 1.2 volts and it has tolerance
(acceptable IR drop limit) of ±0.4 volts, so we ensure that the voltage across VDD
and VSS should not fall below 0.8v and should not rise above 1.6v. If the drop will
be in this limit it does not affect timing and functionality
Power routes generally conduct a lot of current and due to this IR drop will
become into the picture so we are going to higher layers to make these routes less
resistive. For power routing top metal layers are preferred because they are thicker
and offer less resistance.
For example, if there are 15 metal layers in the project then top two or three
layers used by hierarchy people(top level or full chip) and metal 11 and metal 12
used for power planning purposes.
There are two types of IR drop

1. Static IR drop
2. Dynamic IR drop
Static IR drop:
This drop is independent of cell switching. And this is calculated with the help of
metal own resistance.
Methods to improve static IR drop

1. We can go for higher layers if available
2. Increase the width of the straps.
3. Increase the number of wires.
4. Check if any via is missing then add more via.
39
Dynamic IR drop:
This drop is calculated with the help of the switching of cells. When a cell is
switching at the active edge of the clock the cell requires large current or voltage to
turn on but due to voltage drop sufficient amount of voltage is not reached to the
particular cell and cell may be goes into metastable state and affect the timing and
performance.
Methods to improve dynamic IR drop

1. Use de-cap cells.
2. Increase the number of straps.
Tools used for IR drop analysis:

1. Redhawk from Apache
2. Voltage storm from cadence
Electromigration:
When a high density of current is flowing through metal layers, the atoms
(electron) in the metal layers are displaced from their original position causing
open and shorts in the metal layers. Heating also accelerates EM because higher
temperature cause a high number of metal ions to diffuse.
In higher technology nodes we saw the EM on power and clock metal layers but
now in lower nodes the signal metal layers are also needed to be analyzes due to an
increased amount of current density.
Clock nets are more prone to EM because they have high switching activity,
because of this only we are avoiding to use high drive strength clock buffers to
build the clock tree.
Methods to solve EM:

1. Increase the width of wire
2. Buffer insertion
3. Downsize the driver
4. Switch the net to higher metal layers.
5. Adding more vias
6. Keep the wire length short
40
Checklist after power planning:
1. All power/Ground supplies defined properly?
2. Is IR drop acceptable?
3. Is the power supply is uniform?
4. Are specials cell added? (Tap cells and end cap cells)
5. Any power related shorts/opens in the design?
UPF & special cells used for power planning
Unified Power Format (UPF):
UPF is an IEEE standard and developed by members of Accellera.UPF is

designed to reflect the power intent of a design at a relatively high level.
UPF scripts describe which power rails should be routed to individual blocks,
when blocks are expected to be powered up or shut down, how voltage levels
should be shifted as signals cross from one power domain to another and whether
measures should be taken to retain register and memory-cell contents if the
primary power supply to a domain is removed.
The backbone of UPF (Synopsys), as well as the similar Common Power
Format (CPF) (cadence), is the Tool Control Language (TCL). The TCL command
“create_power_domain”, used to define a power domain and its characteristics.
for example, if this command is used by UPF-aware tools to define a set of blocks
in the design that are treated as one power domain that is supplied differently to
other blocks on the same chip. The idea behind this type of command is that
power-aware tools read in the description of which blocks in a design can be
powered up and down independently.
Content in UPF:
 Power domains: some time design have more than one power domain like
multi Vdd design and sometimes only a single power domain design. In this group
of elements share a common set of power supply.
 Supply rails: distribution of power in supply nets, ports, supply sets, power
state
41
 Additional Protection By special cells: level shifters, isolation cells, power
switches, retention registers
Isolation cells:
Isolations cells always used between two domains active mode domains and
shut down mode domain.
Consider there are two power domains one D1 is in shutdown mode and D2
is in active mode, if data is passed through D1 to D2 this will not a valid data
received at D2. To prevent this condition we insert an isolation cell b/w these two
power domains to clamp a known value at the D2 otherwise it will give unknown
value.
Shut down domain outputs may be floating outputs and this could be a
problem when other active domains get these floating outputs as an input. This
could affect the proper functioning of the active domain.
These cells are also called “clamp” because these cells convert the invalid or
floating outputs to a known value or clamp it to some specified known value.
Level shifter:
The chip may be divided into multiple voltage domains, where each domain
is optimized for the needs of certain circuits. For example a system on a chip might
use a high supply voltage for memories to ensure cell stability, a medium voltage
42
for a processor and low voltage for IO peripherals. Some of the challenges in using
voltage domain include voltage levels for signals that cross domains, selecting
which circuits belongs in which domain
Whenever a signal is going from low domain to high domain there will not
be full output swing available at the output of high domain or vice versa. In this
condition we required level shifter to shift up or down the voltage level according
to the requirement.
Let us consider one example, there are two designs with different power
domains, one design is working on 1.5 v and the other is working on 1 v power
supply. Now if the signal is passed from 1.5v domain to 1 v domain then wrong
logic is interpreted by 1v domain. So to prevent this level sifter are inserted b/w
two domains. The main function of the level shifter is to shift the voltage level
according to the requirement of voltage levels.
Retention Registers:
To reduce power consumption when the devices are not in use, the power
domain is switched off for those devices. When the design blocks are in switched
off or sleep mode, data in all flip-flops contained within a block will be lost. If the
designer desires to retain this state then retention registers are used. Retention
register requires D flip-flop and latch and required always-on supply to retain the
data. Using the retention register area required more as compared to normal flop
because here we are using flip-flop with always-on supply. So the area required
more when we are using retention registers.
43
Always On cells:
These cells are always on irrespective of where they are placed. Generally
buffer and inverters are used for always ON cells. Always cells are also present in
UPF. They can be a special cell or regular buffer. If they are special cells they have
their own secondary power supply and placed anywhere in the design.
If they are regular buffer/inverters they require always-on cells and restrict
the placement of these types of cells in the specific region
For example if data needs to be routed through or from the sleep block
domain to active block domain and distance between both domains is very long or
driving load is also very large then buffer might be needed to drive the long nets
from one domain to another.
SOURCES OF POWER DISSIPATION IN CMOS
Sources of power dissipation:

Power dissipation in CMOS circuits comes from two components:
44
Dynamic dissipation due to:
 charging and discharging load capacitance as gate switch.
 "short circuit" current while both PMOS and NMOS are partially ON.
Pdynamic = Pswitching + Pshort circuit..........................................(1)
Static dissipation due to:

 subthreshold leakage through OFF transistors
 gate leakage through the gate dielectric
 junction leakage from source /drain diffusions
Pstatic = (Isub + Igate + Ijunction)Vdd.........................................(2)
So,
Ptotal = Pdynamic + Pstatic............................................................(3)
Dynamic power:
Whenever input signals in the circuit are changed its state with time, and it
causes some power dissipation. Dynamic power is required to charging and
discharging the load capacitance when transistor input switches
When the input switches from 1 to 0 the PMOS transistor (PULL UP network)
turns ON and charges the load to VDD. And that time energy stored in the
capacitance is
EC = ½ CL V2DD
The energy delivered from the power supply is
EC = CL V2DD
Observed that only half of the energy from the power supply is stored in the
capacitor. The other half is dissipated (converted to heat) in the PMOS transistor
because transistor has a voltage across it at the same time current flows through it.
The dissipated power depends on only the load capacitance not on transistor or
speed at which the gate switches.
When the input switches from 0 to 1, the PMOS transistor turns off and the
NMOS transistor is turned ON, and discharging the capacitor. The energy stored in
45
the capacitor is dissipated in the NMOS transistor. No energy is drawn from the
power supply in this case.
Depending upon the inputs at the gate of the transistor one gate is on the
other is off because of charging and discharging of the capacitor. The inverter is
sized for equal rise and fall times so we know that in one cycle we have rising and
falling transition.
On rising edge output change Q = CVDD is required to charge the output

node to VDD (i.e. cap is charged to VDD) and on falling edge the load capacitance is
discharged to GND.
Now suppose gate switches at some average frequency fsw (switching frequency).
Over the time period T, the load is charging and discharging T*fsw times. So
average power dissipation is
Pswitching = CV2DD fsw
This is called dynamic power because it arises from the switching of the
load. Because most gates don’t switch every clock cycle, so it is convenient to
express switching frequency as an activity factor(α) times the clock frequency f,
now power dissipation written as
Pswitching = α CV2DD f
Activity factor:
The activity factor is the probability that the circuit node transitions from 0
to 1 because that is only the time the circuits consume power. A cock has an
activity factor α = 1 because it rises and falls every cycle. The activity factor is
powerful and easy to use lever for reducing power.
If a circuit can be turned off entirely, the activity factor and dynamic power go to
zero. When a block is on the activity factor is 1. Glitches in the circuit can increase
the activity factor.
46
Techniques to reduce dynamic power:
we know
Pswitching = α C V2DD f
 Reduce α (clock gating, sleep mode)
 Reduce C (small transistors, short wires, smaller fan-out)
 Reduce Vdd (reduce supply voltage up to which circuit works correctly)
 Reduce f (reduce the frequency to the extent if possible power-down mode,
without sacrificing the performance beyond acceptable level)
Short circuit power:
Dynamic power includes a short circuit power component. It occurs in
CMOS when input of gate switches. When both pullup and pulldown networks are
conducting for a small duration and there is a direct path b/w VDD to VSS. during
this scenario spikes will be generated momentarily in the current as shown in fig
below. The current is flowing from VDD to VSS is also called cross-bar current.
This is normally less than 10% of the whole power so it can be estimated by
adding 10% to the switching power. This power is directly related to the frequency
of switching, as clock frequency increases the frequency of transition is also
increased thus short circuit power dissipation also increases.
SHORT CIRCUIT CURRENT PATH IN CMOS
47
when both transistor are ON momentarily the short circuit comes in the form
of spike
Clock gating:
Power dissipation is highly dependent on the signal transition activity in a
circuit and the clock is mainly responsible for signal activities. Clock is the brain
of the entire system so wherever clock transition takes place all the circuit works
synchronously. Sometimes we have not required that clock to some blocks if we
disable this clock to that particular block then switching activity reduced and
activity factor α will also reduce and power dissipation also reduced.
The inactive Registers are loaded by clock not directly but loaded through
the OR gate using enable signal. When we know we don’t require a functional unit
we will set the enable to 1 so that the output of the OR gate will be constant 1, and
the value of register will not change and therefore there will be no signal transition
into a functional unit. When we are switching off the clock signal from the
functional unit additional logic is required depending on the scenario that the
functional unit required or not. But clock signal might add some delay in the
48
critical path due to additional circuitry (OR gate and for enable different circuitry)
then skew analysis is required.
CLOCK GATING
Glitches:
We know in reality every gate has a finite amount of delay because of that
delay only glitch will occur.
Let input A B C changes from 010 to 111.
When input 0 1 0 then output O1=1 and O2= 1
Now the input is changed to 111 then output O1 = 0 but O2 =1 remains the same.
These both the ideal case when gate delays are zero.
Now what happened when we will consider the gate delays also?
The gate delay will be added on to both output because of the small amount
of delay the glitch will appear.
49
so how it will be reduced?
50
The output is the same for both and the number of gates is also the same and
we achieved reduced glitch power because the signal will reach at the same time in
figure 2 (output of first and second gate) so there is no delay difference. Another
advantage that the critical path delay is also reduced (in first figure it takes 3 gates
to get the outputs and in second figure output takes only 2 gates. So we can reduce
the glitch power dissipation by the realization of the circuit in balanced form
instead of cascaded form.
HOW TO REDUCE SUPPLY VOLTAGE (VDD) TO REDUCE DYNAMIC

POWER:
We know two design approach
Static approach: where the distribution of power supply voltage is fixed a priori
among various functional blocks.
Dynamic approach: where the power supply voltage is changed as required.
Static voltage reduction approach:

First, you analyze the circuit and we can reduce the supply voltage that will make
your circuit consumes less power but the circuit becomes may be slower so you
have to analyze the circuit and find out which part of the circuit is not critical in
terms of delay, you can possibly make that part little slower without touching the
overall performances. So identify those parts and reduce the power supply for that.
51
Suppose I have a circuit having three functional modules, let’s assume we
required the central block to be running fast but the other two blocks are running
slower. So we use pair of voltage rails one is for low supply voltage and the other
is for high supply voltage so the block which is supposed to run faster they are fed
by the high supply voltage and the other two feed by low supply voltages rails, so
our voltage is saved.
Here the distribution of the voltages is always fixed and here additional
circuitry is required between different power domains because signals are sending
from slow block to fast block and vice versa, some voltage translation is required.
Dynamic approach:
We adjust the operating voltage and frequency dynamically to match the
performance requirements.
The modern-day processors can have different power modes like our laptops
have different mode standby, sleep, hibernate, etc. Previously power is not the
issue now power becomes more important because everyone wants high-
performance mode with high battery mode i.e. system should run for a longer time.
This approach provides flexibility & doesn’t limit the performance. The penalty of
transition between two states can be high, it will take some time from one mode to
another mode so it should not be done frequently.
High-performance mode: higher VDD and f
Power saving mode: lower VDD and f
VOLTAGE ISLANDS (MULTI-VDD)
Cells are arranged in a row, there are two different voltage domains high and low.
So all the cells who are supposed to give the high performance will be placed in
high voltage domain and lower performance cells are sits in the low voltage
domain. This is allowed for both macros and standard cell voltage alignment.
sometimes times voltage island is in the same circuit row like some circuit has
performance and some have high but if we are doing this the problem of power
routing will become more difficult.
52
DIFFERENT POWER DOMAINS
Static power:
Static power is consumed even when a chip is not switching they leak a
small amount of current. CMOS has replaced NMOS processes because contention
current inherent to NMOS logic limited the number of transistors that could be
integrated on one chip. Static CMOS gates have no contention current.
In processes with feature size above 180nm was typically insignificant
except in very low power applications. In 90 and 65 nm processes, threshold
voltage has reduced to the point that subthreshold leakage reaches levels of 1 sec to
10 sec of nA per transistor, which is significant when multiplied by millions of
transistors on a chip.
In the 45 nm process, oxide thickness reduces to the point that gate leakage
becomes comparable to subthreshold leakage unless high-k dielectric is employed.
Overall, leakage has become an important design consideration in nanometer
processes.
LEAKAGE CURRENT PATHS
53
Static power sources:
1. Subthreshold leakage current: b/w source and drain. This is the dominant
source of static power
2. Gate leakage current: from gate to body.
3. Junction leakage current: from source to body and drain to the body.
Static power reduction techniques:
1. Power gating
2. Multiple threshold voltages
3. Variable threshold voltages
Subthreshold leakage current:

It is caused by the thermal emission of carriers over the potential barrier set
by a threshold. In a real transistor, current does not abruptly cut off below the
threshold but rather drops off exponentially as shown in fig:
When the gate voltage is high (Vgs>Vt), the transistor is strongly ON. When
the gate falls below Vt (Vgs < Vt) the exponential decline in current appears as a
straight line. This current increase with temperature. It also increased as Vt is
scaled-down along with the power supply for better performance.
Various mechanism affects the subthreshold leakage current
1. DIBL effect
2. Body effect
3. Narrow width affect
4. Effect of channel length
5. Effect of temperature
Gate leakage current:

It is a quantum mechanical effect caused by tunneling through an extremely
thin gate dielectric. Gate leakage occurs when carriers tunnel through a thin gate
dielectric when a voltage is applied across the gate (when the gate is ON). Gate
leakage is an extremely strong function of the dielectric thickness. Gate leakage
also depends on the voltage across the gate. For gate oxides thinner than 15-20 Å,
54
there is a nonzero probability that an electron in the gate will find itself on the
wrong side of the oxide, and it will get away through the channel. This effect of
carriers crossing a thin barrier is called tunneling and because of this leakage
current is flowing through the gate to the body.
Two physical mechanism for gate tunneling are:
1. Fowler- Nordheim tunneling: most important at high voltage and moderate

oxide thickness
2. Direct tunneling: most important at lower voltage with thin oxides and it is
the dominant leakage component.
Junction leakage current:
The p-n junction between diffusion and the substrate or well form diode as
shown in fig: The well-to-substrate junction is another diode. The substrate and
well are tied to GND or VDD to ensure these diodes do not become forward biased
in normal operation. However, reversed – biased diode still conducts a small
amount of current ID. Junction leakage occurs when a source or drain diffusion
region is at a different potential from the substrate.
SUBSTRATE TO DIFFUSION DIODES IN CMOS
Static power reduction techniques:

POWER GATING:
The easier way to reduce static current during sleep mode is to turn off the power
supply to the sleeping blocks. This technique is called power gating as shown in
fig:
55
POWER GATING
The logic block receives its power from a virtual VDD rail, VDDV. When the
block is active, the header switch transistors are ON and connecting V DDV to VDD.
When the block goes to sleep, the header switch turns OFF, allowing VDDV to
float and gradually sink towards zero (0). As this occurs, the output of the block
may take on voltage level in the forbidden or unknown state. The output
of isolations gates forces the output to a valid level during sleep so that they do
not cause problems in downstream logic.
Power gating introduces a number of design issues like header switch
requires careful sizing, it should add minimal delay to the circuit during active
operation and should have low leakage during sleep mode.
The transition between active and sleep modes takes some time and energy,
so power gating is only effective when a block is turned off long enough. When a
block is gated, the state must either saved or reset upon power-up. State retention
registers use a second power supply to maintain the state. The important registers
can be saved to memory so the entire block can be power-gated. The register must
be reloaded from memory when power is restored.
Power gating was originally proposed as multiple threshold
CMOS (MTCMOS) because it used low-vt transistors for logic and high-vt for
header and footer switch.
56
Power gating can be done externally with disable input to a voltage
regulator or internally with high–vt header and footer switches. External power
gating completely eliminates the leakage during sleep, nut it takes a long time and
energy because the power network may have 100s of nF decoupling capacitance to
discharge. On-chip power gating can use a PMOS header switch transistor or
NMOS footer switch transistors. NMOS transistors deliver more current per unit
width so they can be smaller.
On the other hand, if both internal and external power gating is used, it is
more consistent for both methods to cut off VDD.
HEADER AND FOOTER SWITCH

Power gating types:
1. Fine –grained power gating
2. Coarse-grained power gating
Fine-grained power gating: it is applied to individual logic gates, but placing this
in every cell has enormous area overhead.
Coarse-grained power gating: in this the switch is shared across an entire block.
57
INTERVIEW QUESTION RELATED TO PLACEMENT
interview questions related to placement stage:

1. What is placement and what are goals of placement?
2. What are the things to be checked before going to placement stage?
3. What are the inputs and outputs for placement stages?
4. After floorplan DEF is generated as input of placement stage what it
contains?
5. What are the challenges faced in placement stages?
6. What are reasons of congestion in the design?
7. What are the different methods of tackling congestion? Where do you find
major congestion in your design?
8. How do you tackle congestion in the core area? You have a region with pin
density high and the router is not able to route to these pins. How to resolve this
issue?
9. How to tackle cell density issue?
10. At what stage do you start looking at your timing reports in the PD flow?
11. How will you check your timing report? What are the things you will look
into it in case of timing violation?
12. What are the types of placement?
13. During placement stage how tool will know that your design is getting
congestion without doing the actual route?
14. What types of routing is carried out by the tool in placement stage global
routing or virtual routing?
15. What happens in global routing?
16. What are the switches in place_opt command and how they are useful?
17. What happens during congestion driven techniques? What are the care
should be taken using congestion driven option?
18. What are the types of congestion? And solution to resolve congestion?
19. How to modify physical constraints to reduce the congestion?
20. What happens if you over do keepout margin?
21. How many blockages are there in your design, how will you define the
blockages and keepout margins (halo)?
58
22. What do you mean by logic optimization techniques and give some
examples of logic optimization?
23. How do you used blockages techniques to effectively reduce congestion?
24. What is magnet placement?
25. What is multibit banking and why we use it and what are the advantages and
disadvantages of that?
26. What is fan-out? What is HFN synthesis why we do it in placement stage?
Can we consider clock in HFN during placement stages?
27. How global route will handle congested paths?
28. What do you means by scan chain reordering why we used it?
29. How to qualify the placement stage?
CTS (PART- I)
CLOCK TREE SYNTHESIS (CTS)
Clock is not propagated before CTS so after clock tree build in CTS stage
we consider hold timings and try to meet all hold violations.
After placement we have position of all standard cells and macros and in
placement we have ideal clock (for simplicity we assume that we are dealing with
a single clock for the whole design). At the placement optimization stage buffer
insertion and gate sizing and any other optimization techniques are used only for
data paths but in the clock path nothing we change.
CTS is the process of connecting the clocks to all clock pin of sequential
circuits by using inverters/buffers in order to balance the skew and to minimize the
insertion delay. All the clock pins are driven by a single clock source. Clock
balancing is important for meeting all the design constraints.
59
fig: before the clock tree is not build
Checklist before CTS:

 Before going to CTS it should meet the following requirements:
 The clock source are identified with the create_clock or
create_generated_clock commands.
 The placement of standard cells and optimization is done. {NOTE: use
check_legality –verbose command to verify that the placement is legalized. If cells
are not legalize the qor is not good and it might have long run time during CTS
stage}
 Power ground nets- pre-routed
 Congestion- acceptable
 Timing – acceptable
 Estimated max tran/cap – no violations
 High fan-out nets such as scan enable, reset are synthesized with buffers.
60
Inputs required for CTS:
 Placement def
 Target latency and skew if specify (SDC)
 Buffer or inverters for building the clock tree
 The source of clock and all the sinks where the clock is going to feed (all
sink pins).
 Clock tree DRC (max Tran, max cap, max fan-out, max no. of buffer levels)
 NDR (Nondefault routing) rules (because clock nets are more prone to cross-
talk effect)
 Routing metal layers used for clocks.
Output of CTS:
 CTS def
 Latency and skew report
 Clock structure report
 Timing Qor report
CTS target:
 Skew
 Insertion delay
CTS goal:
 Max Tran
 Max cap
 Max fan-out
 A buffer tree is built to balance the loads and minimize skew, there are
levels of buffer in the clock tree between the clock source and clock sinks.
Effect of CTS:
Clock buffers are added congestion may increase non-clock cells may have been
moved to less ideal locations can introduce timing and tran/cap violations.
61
fig: CTS structure after clock tree build
Checks after CTS:

 In latency report check is skew is minimum? And insertion delay is balanced
or not.
 In qor report check is timing (especially HOLD) met, if not why?
 In utilization report check Standard cell utilization is acceptable or not?
 Check global route congestion?
 Check placement legality of cells.
 Check whether the timing violations are related to the constrained paths or
not like not defining false paths, asynchronous paths, half-cycle paths, multi-cycle
paths in the design.
Clock Endpoints types:
When deriving the clock tree, the tool identifies two types of clock endpoints:
62
Sink pins (balancing pins):
Sink pins are the clock endpoints that are used for delay balancing. The tool assign
an insertion delay of zero to all sink pins and uses this delay during the delay
balancing.
During CTS, the tool uses sink pins in calculations and optimizations for both
design rule constraints for both design rule constraints and clock tree timing (skew
& insertion delay).
Sink pins are:
A clock pin on a sequential cell
A clock pin on a macro cell
Ignore pins:
These are also clock endpoints that are excluded from clock tree timing
calculations and optimizations. The tool uses ignore pins only in calculation and
optimizations for design rule constraints.
During CTS the tool isolate ignore pins from the clock tree by inserting a guide
buffer before the pin. Beyond the ignore pins the tool never performs skew or
insertion delay optimization but it does perform design rule fixing.
Ignore pins are:
Source pins of clock trees in the fanout of another clock
Non clock inputs pins of sequential cells
Output ports
Float pins: it is like stop pins but delay on the clock pin, macro internal delay.
Exclude pins: CTS ignores the targets and only fix the clock tree DRC (CTS
goals).
Nonstop pin: by these pin clock tree tracing the continuous against the default
behavior. Clock which are traversed through divider clock sequential elements
clock pins are considered as non-stop pins.
Why clock routes are given more priority than signal nets:
Clock is propagated after placement because the exact location of cells and
modules are needed for the clock propagation for the estimation of accurate delay,
63
skew and insertion delay. Clock is propagated before routing of signals nets and
clock is the only signal nets switches frequently which act as sources for dynamic
power dissipation.
CTS Optimization process:

 By buffer sizing
 Gate sizing
 Buffer relocation
 Level adjustment
 HFN synthesis
 Delay insertion
 Fix max transition
 Fix max capacitance
 Reduce disturbances to other cells as much as possible.
 Perform logical and placement optimization to all fix possible timing.
NOTE: MAINLY TRY TO IMPROVE SETUP SLACK IN
PREPLACEMENT, INPLACEMENT AND POSTPLACEMENT
OPTIMIZATION BEFORE CTS STAGES AND IN THESE STAGES
NEGLECTING THE HOLD SLACK.
IN POST PLACEMENT OPTIMIZATION AFTER CTS STAGES THE
HOLD SLACK IS IMPROVED. AS A RESULT OF CTS LOT OF BUFFERS
ARE ADDED.
The below topics will discuss in the CTS II part...If any topic left in CTS please
comment me then I will try to add in the second part of CTS
difference between clock buffer and normal buffer?

what is pulse width violation?
what are the clock tree optimization process in detail?
what is useful skew and clock pushing and pulling in detail?
what is the difference between HFNS and CTS?
what are clock tree types?
what are the NDR rules applied to the clock net?
crosstalk noise and delay and it affects on set up and hold timing.
64
how positive skew is good for setup and negative skew is good for hold.
how to reduce the crosstalk effect.
CTS (PART-II) (crosstalk and useful skew)

Kavita Sharma 3:38 AM Clock Tree Synthesis , CTS(PART-II) 14 Comments
crosstalk and useful skew
For crosstalk and useful skew, we have to know the basics of setup and hold
timing. Here I am going to write here some small concepts related to timing that
will be used for crosstalk and useful skew.
Setup time: The minimum time before the active edge of the clock, the input data
should be stable i.e. data should not be changed at this time.
Hold time: The minimum time after the active edge of the clock, the input data
should be stable i.e. data should not be changed at this time.
Capture edge: the edge of the clock at which data is captured by a captured flip
flop
Launch edge: the edge of the clock at which data is launched by a launch flip flop
65
Launch flip flop: this is a flop from where data is launched.
Captured flip flop: this is the flop on which data is captured.
For setup check

Setup slack check = (required time) min – (arrival time) max
Arrival time = Tlaunch + Tcq+ Tcomb
Required time = Tclk+Tcapture-Tsu
For setup time should not violate the required time should be greater than arrival
time.
For hold check

Hold slack check = (arrival time) min – (required time) max
Arrival time = Tlaunch + Tcq + Tcomb
Required time = Tcapture + Thold
For hold time should not violate the arrival time should be greater than the required
time.
#AT= arrival time
#RT = required time
66
Crosstalk noise: noise refers to undesired or unintentional effect between two or
more signals that are going to affect the proper functionality of the chip. It is
caused by capacitive coupling between neighboring signals on the die. In deep
submicron technologies, noise plays an important role in terms of functionality or
timing of device due to several reasons.
 Increasing the number of metal layers. For example, 28nm has 7 or 8 metal
layers and in 7nm it’s around 15 metal layers.
 Vertically dominant metal aspect ratio it means that in lower technology
wire are thin and tall but in higher technology the wire is wide and thin, thus a
greater the proportion of the sidewall capacitance which maps into wire to wire
capacitance between neighboring wires.
 Higher routing density due to finer geometry means more metal layers are
packed in close physical proximity.
 A large number of interacting devices and interconnect.
 Faster waveforms due to higher frequencies. Fast edge rates cause more
current spikes as well as greater coupling impact on the neighboring cells.
 Lower supply voltage, because the supply voltage is reduced it leaves a
small margin for noise.
 The switching activity on one net can affect on the coupled signal. The
effected signal is called the victim and affecting signals termed as aggressors.
There are two types of noise effect caused by crosstalk
Glitch: when one net is switching and another net is constant then switching signal
may cause spikes on the other net because of coupling capacitance (Cc) occur
between two nets this is called crosstalk noise.
67
In fig the positive glitch is induced by crosstalk from rising edge waveform at the
aggressor net. The magnitude of glitch depends on various factors.
 Coupling capacitance between aggressor and victim net: greater the coupling
capacitance, larger the magnitude of glitch.
 Slew (transition) of the aggressor net: if the transition is more so magnitude
of glitch also more. And we know the transition is more because of high output
drive strength.
 If Victim net grounded capacitance is small then the magnitude of glitch will
be large.
 If Victim net drive strength is small then the magnitude of glitch will be
large.
Types of glitches:
68
 Rise: when a victim net is low (constant 0) and the aggressor net is at a
rising edge.
 Fall: when a victim net is high (constant 1) and the aggressor net is at the
falling edge.
 Overshoot: when a victim net is high (constant 1)) and the aggressor net is
at a rising edge.
 Undershoot: when a victim net is low (constant 0) and the aggressor net is
at the falling edge.
Crosstalk delay: when both nets are switching or in transition state then switching
signal at the victim signal may have some delay or advancement in the transition
due to coupling capacitance (Cc) occur between two nets this is called crosstalk
delay.
Crosstalk delay depends on the switching direction of aggressor and victim net
because of this either transition is slower or faster of the victim net.
Types of crosstalk:
1. Positive crosstalk: the aggressor net has a rising transition at the same time
when the victim net has a falling transition. The aggressor net switching in the
opposite direction increases the delay for the victim. The positive crosstalk impacts
69
the driving cell, as well as the net, interconnect - the delay for both gets increased
because the charge required for the coupling capacitance Cc is more.
1. Negative crosstalk: the aggressor net is a rising transition at the same time
as the victim net. The aggressor's net switching in the same direction decrease
delay of the victim. The positive crosstalk impacts the driving cell, as well as the
net, interconnect - the delay for both gets decreased because the charge required for
the coupling capacitance Cc is less.
70
Crosstalk effect on timing analysis:
Consider crosstalk in data path:
If the aggressor transition in the same direction as the victim then victim transition
becomes fast because of this data will be arrive early means arrival time will be
less.
Setup = RT – AT(dec) this is good for setup #dec- decrease
Hold = AT(dec) – RT this is bad for hold
71
If the aggressor transition in a different direction as a victim then victim transition
becomes slow because of this data will be arrive late means arrival time will be
more.
Setup = RT – AT(inc) this is not good for setup

Hold = AT(inc) – RT this is good for hold #inc- increase
Consider crosstalk in the clock path:

If the aggressor transition in the same direction as the victim then victim transition
becomes fast because of this clock will be arrive early means Required time will be
less.
Setup = RT(dec) – AT this is not good for setup

Hold = AT – RT(dec) this is good for hold
If the aggressor transition in a different direction as a victim then victim transition

becomes slow because of this clock will be arrive late means the Required time
will be more.
Setup = RT(inc) – AT this is good for setup

Hold = AT – RT(inc) this is not good for hold
72
How to reduce the crosstalk:
 Wire spacing (NDR rules) by doing this we can reduce the coupling
capacitance between two nets.
 Increased the drive strength of victim net and decrease the drive strength of
aggressor net
 Jumping to higher layers (because higher layers have width is more)
 Insert buffer to split long nets
 Use multiple vias means less resistance then less RC delay
 Shielding: high-frequency noise is coupled to VSS or VDD since shielded
layers are connects to either VDD or VSS. The coupling capacitance remains
constant with VDD or VSS.
Useful skew:
When clock skew is intentionally add to meet the timing then we called it
useful skew.
In this fig the path from FF1 to FF2

Arrival time = 2ns + 1ns + 9ns = 12ns
Required time = 10 ns (clock period) + 2ns - 1ns = 11ns
Setup slack = required time – arrival time
= 11ns -12ns
= -1ns (setup violated)
= 11ns -8ns
= 3ns (setup not violated)
73
Let’s introduce some clock skew to path ff1 to ff2 to meet the timing. Here
we add 2ns extra skew in clock path but we have to make sure about the next path
timing violation.

= 13ns -12ns
=1ns (setup not violated)
Arrival time = 4ns + 1ns +5 ns = 10ns
= 11ns -10ns
= 1ns (setup not violated)
CTS (PART -III) CLOCK BUFFER AND MINIMUM PULSE WIDTH

VIOLATION
CLOCK BUFFER AND MINIMUM PULSE WIDTH VIOLATION
74
Transition (slew): A slew is defined as a rate of change. In STA analysis the
rising or falling waveforms are measured in terms of whether the transition(slew)
is fast or slow. Slew is typically measured in terms of transition time, i.e. the time
it takes for a signal to transition between two specific levels ( 1 to 0 or 0 to 1/ low
to high or high to low). Transition time is inverse of the slew rate- the larger the
transition time, the slower the slew and vice-versa.
In lib these transition is defined as:

#rising edge threshold:
Slew_lower_threshold_pct_rise : 20.0;
Slew_upper_threshold_pct_rise : 80.0;
#falling edge threshold:
Slew_upper_threshold_pct_fall : 80.0;
Slew_lower_threshold_pct_fall : 20.0;
These values are specified as a % of Vdd.
Rise time: The time required for a signal to transition from 20% of its (VDD)
maximum value to 80% of its maximum value.
Fall time: The time required for a signal to transition from 80% of its (VDD)
maximum value to 20% of its maximum value.
Propagation delay: The time required for the signal to change the inputs to its
state like 0 to 1 or 1 to 0.
75
Clock buffer and normal buffer
Clock net is a high fan-out net and most active signal in the design. Clock
buffer mainly used for clock distribution to make the clock tree. The main goal of
CTS to meet skew and insertion delay, for this we insert buffer in the clock path.
Now if the buffer has different rise and fall time it will affect the duty cycle with
this condition tool can do skew optimization but complicates the whole
optimization process as a tool has to deal with a clock with duty cycle at different
flop paths. If buffer delays are the same only thing the tool has to do balance the
delay by inserting buffer.
The clock buffers are designed with some special property like high drive
strength, equal rise and fall time, less delay and less delay variation with PVT and
OCV. Clock buffer has an equal rise and fall time. This prevents the duty cycle of
clock signal from changing when it passes through a chain of clock buffers.
A perfect clock tree is that gives minimum insertion delay and 50% duty
cycle for the clock. The clock can maintain the 50% duty cycle only if the rise and
the fall delays and transition of the tree cells are equal.
How to decide whether we need to used buffer or inverter for building a clock tree
in the clock tree synthesis stage?
This decision totally depends on the libraries which we are using. The main
factors which we consider to choose inverter or buffer are rise delay, fall delay,
drive strength and insertion delay (latency) of the cell. In most of the library files, a
buffer is the combination of two inverters so we can say that inverter will be
having lesser delay than buffer with the same drive strength. Also inverters having
more driving capacity than a buffer that’s why most of the libraries preferred
inverter over buffer for CTS.
Clock buffers sometimes have input and output pins on higher metal layers
much fewer vias are needed in the clock distribution root. Normal buffer has pins
on lower metal layers like metal1. Some lib also has clock buffers with input pins
on high metal layers and output pins on lower metal layers. Normally clock routing
is done into higher metal layers as compared to signal routing so to provide easier
76
access to clock pins from these layers clock buffer may have pins in higher metal
layers. And for normal buffer pins may be in lower metal layers.
Clock buffer are balanced i.e. rise and fall time almost the same. If these are
not equal then duty cycle distortion in the clock tree will occur and because of this
minimum pulse width violation comes into the picture. In clock buffer the size of
PMOS is greater than NMOS.
On the other hand normal buffer have not equal rise and fall time. In other
words they don’t need to have PMOS/NMOS size to 2:1 i.e. size of PMOS don’t
need to be bigger than the NMOS, because of this normal buffer is in a smaller size
as compared to clock buffer and clock buffer consumes more power.
The advantage of using an inverter-based tree is that it gives equal rise and
fall transition so due to that jitter (duty cycle jitter) get canceled out and we get
symmetrical high and low pulse width.
Buffer contain two inverters with unequal size in area and unequal drive
strength. First inverter is of small size having low drive strength and the second
buffer is of large size having high drive, strength are connected back to back as
shown in figure below.
So a load of these two inverters are unequal. The net length b/w two back to
back inverter is small so small wire capacitance will present here we can neglect
that but for the next stage the net length is more and because of net length the
capacitance is more by wire capacitance and next inverter input pin capacitance
and we get unequal rise and fall time so jitter will get added in clock tree with an
additional cost of more area than an inverter.
So mainly we are preferred inverter-based trees instead of the buffer based.
77
inverter based tree having equal rise and fall time
buffer based tree having unequal rise and fall time
78
Why PMOS is having bigger size than NMOS?
We know NMOS have majority charge carriers are electrons and PMOS
have majority charges carriers are holes. And we also know that electrons are very
much faster than holes.
Since electron mobility is greater than the hole mobility, so PMOS width
must be larger to compensate and make the pull-up network more stronger. If W/L
of PMOS is the same as NMOS the charging time of the output node would be
higher than the discharging time because discharging time is related to the
pulldown network.
So we make PMOS is of big size so that we can get equal rise and fall time.
Normal buffer are designed with W/L ratio such that sum of rise and fall time is
minimum.
Normally (R) PMOS > (R) NMOS
(R) PMOS =3*(R) NMOS

For making equal resistance of both transistor the size of PMOS is bigger than the
NMOS.
The duty cycle of clock:

It is the fraction of one period of the clock during which clock signal is in the high
(active) state. A period is the time it takes for a clock signal to complete an on-and-
off state. Duty cycle (D) is expressed in percentage (%).
79
Minimum Pulse width violation:
It is important for the clock signal to ensure the proper functionality of
sequential and combinational cells. Ensure that the width of the clock signal is
wide enough for the cell, internal operation i.e. minimum pulse width of the clock
has to be maintain for proper output otherwise, the cell will go into metastable state
and we will not get the correct output.
In other words clock pulse into the flop/latch must be wide enough so that it
does not interfere with the correct functionality of the cells.
Minimum pulse width violation checks are to ensure that the pulse width of
the clock signal for the high and low duration is more than the required value.
Basically this violation is based on what frequency of operation and

Technology we are working. If the frequency of design is 1 GHz then the time
period for each high and low pulse will be 0.5ns as if we consider the duty cycle is
50%.
Normally we saw that in most of design duty cycle always keep 50% for the
simplicity otherwise designer can face many issues like clock distortion and
minimum pulse width violation. If in our design is using half-cycle path means
data is launch at the positive edge and capturing at the negative edge and again
minimum pulse width as rising level and fall level will not be the same and if lots
of inverter and buffer will be in chain then it is possible that pulse can completely
vanish.
Normally for the clock path, we use clock buffer because they have equal
rise and fall delay of these buffer as compare to normal buffer having unequal
delay that’s why we have to check minimum pulse width.
Why the minimum pulse width violation occurs:
Due to unequal rise and fall delay of combinational cell. Let’s take an
example of buffer and clock signal having 1 GHz frequency (1ns period) is
entering into a buffer. So for example, if the rise delay is more than the fall delay
80
than the output of clock pulse width will have less width for high level than the
input clock pulse.
The difference b/w rise and fall time is: 0.007
High pulse: 0.5-0.006=0.494
Low pulse: 0.5+0.006=0.506
We can understand it with an example:-
Let’s there is a clock signal which is pass through more numbers of buffers with
different rise and fall delay time. We can calculate how it effects to the low or
high pulse of the clock signal. The width of clock signal is decreasing when buffer
delay is more than the pulse width.
As we know every buffer in the chain is taking more time to charge than to
discharge. When the clock signal is propagating through a long chain of buffers,
the pulse width is reduced as shown below.
81
We can understand by the calculation:-
High pulse width = half pulse width of clock signal– (rise delay –fall delay)
= 0.5 - (0.055-0.048) - (0.039-0.032) - (0.025-0.022) - (0.048-
0.043) - (0.058-0.054) = 0.474ns
Low Pulse width = half pulse width of clock signal + (rise delay –fall delay)
= 0.5 + (0.055–0.048) + (0.039–0.032) + (0.025–0.022) +
(0.048 – 0.043) + (0.058 – 0.054) = 0.526ns
Let’s required value of Min pulse width is 0.410ns, Uncertainty = 90ps

Then high pulse width = 0.474-0.090 = 0.384ns
The slack is 0.384-0.410= - 0.026ns
here we can see that we are getting min pulse width violation for high pulse as total
high pulse width is less than the required value.
If uncertainty we did not consider then violation will not occur in this scenario.
How to correct if violations are present in design:

We need to change the clock tree cells which have equal rise and fall delay
time or use those cells they have less difference between rise and fall delays.
What are the problems occurs if pulse width violation occurs:

 Sequential data might not be captured properly, and flop can go into a
metastable state.
 In some logic circuits the entire pulse could disappear and does not capture
any new data.
So it is required to ensure every circuit element always gets a clock pulse greater
than minimum pulse width required then only violation will not occur in the
design.
82
There are two types of minimum pulse width checks are performed:
Clock pulse width check at sequential devices
Clock pulse width check at combinational circuits
How to report:
report_timing –check_type pulse_width
83
How to define pulse width:
By liberty file (.lib):
By default all the registers in the design have a minimum pulse width defined in
.lib file as this is the format to convey the std cell requirement to the STA tool.
By convention min pulse width is defined for the clock signal and reset pins.
Command name: min_pulse_width
In SDC file (.sdc):

set_min_pulse_width -high 5 [get_clock clk1]
set_min_pulse_width -low 4 [get_clock clk1]
If high or low is not specified then constraints applied to both high and low pulses.
NOTE:
Balanced buffers means buffer having equal rise and fall time.
Unbalanced buffers means buffer having unequal rise & fall time
NEXT TOPICS
clock tree types

clock tree optimization process
difference between High fanout net synthesis and clock tree synthesis
Clock Tree routing Algorithms
Clock Tree Routing Algorithms

The main idea behind using these algorithms to minimize the skew. So how
we minimize the skew by using these algorithms. Distribute the clock signal in
such a way that the interconnections (routing wires) carrying the clock signal to the
other sub-blocks that are equal in length.
84
Several algorithms exist that are trying to achieve this goal (to minimize the
skew).
 H-Tree
 X-Tree
 Method of Mean and Median
 Geometric Matching Algorithms
 Zero skew clock routing
The first to four algorithm techniques are trying to make minimize the length and
the last one is to use the actual interconnect delay in making the skew is zero.
H-Tree
In this algorithm Clock routing takes place like the English letter H.
It is an easy approach that is based on the equalization of wire length.
In H tree-based approach the distance from the clock source points to each of the
clock sink points are always the same.
In H tree approached the tool trying to minimize skew by making
interconnection to subunits equal in length.
This type of algorithm used for the scenario where all the clock terminal
points are arranged in a symmetrical manner like as in gate array are arranged in
FPGAs.
In fig (a) all the terminal points are exactly 7 units from the reference point
P0 and hence skew is zero if we are not considering interconnect delays.
85
It can be generalized to 4i. When we are going to up terminals are increased
like 4, 16, and 64…and so on and regularly placed across the chip in H structure.
fig: H tree with 16 sink points

In this routing algorithm all the wires connected on the same metal layers,
we don’t need to move horizontal to vertical or vertical to horizontal on two layers.
H tree do not produce corner sharper than 900 and no clock terminals in the
H tree approach in close proximity like X tree.
86
Advantages:
Exact zero skew in terms of distance (here we are ignoring parasitic delay)
due to the symmetry of the H tree.
Typically used for very special structures like top-level clock level
distribution not for the entire clock then distributed to the different clock sinks.
Disadvantages:
Blockages can spoil the symmetry of the H tree because sometimes
blockages are present on the metal layers.
Non-uniform sink location and varying sink capacitance also complicate the
design of the H tree.
X-tree
If routing is not restricted to being rectilinear there is an alternative tree
structure with a smaller delay we can use. The X tree also ensures to skew should
be zero.
X-tree routing algorithm is similar to H-tree but the only difference is the
connections are not rectilinear in the X tree-based approach.
Although it is better than the H tree but this may cause crosstalk due to close
proximity of wires.
87
Like H tree this is also applicable for top-level tree and then feeding to the
next level tree.
Disadvantages:
Cross Talk due to adjacent wires
Clock Routing is not rectilinear
Both of the H Tree and X tree approach basically designed for a four array
tree structure. Each of the 4 nodes connected to the other 4 nodes in the next stages
so the number of terminal points or sink will grow as the power of 4 like 4,16 and
64 and so on.
These two methods basically did not consider the exact location of the clock
terminals it independently create the clock tree and produce a regular array of sink
locations across the surface of the chip.
But in other approaches, we did not ignore the exact location of the actual
clock terminal points so now the question is how we what these approaches will do
for exact location.
They look at where we required the clocks to be sent w.r.t location and try to build
the tree systematically and that tree does not look like the H tree and X tree.
Method of mean & median (MMM) algorithm:

Method of mean and median follows the strategy similar to the H-tree
algorithm, but it can handle sink location anywhere we want.
Step 1: It continuously partitions the set of terminals into two subsets of equal parts
(median) (As Fig.)
Step2: connects the center of mass of the whole set (module) to the center of
masses of the two partitioned subset (mean).
How the partitioning is done?

Let Lx denoted the list of clock points sorted accordingly to their x-coordinates
Let Px be the median in Lx
-assign points in the list to the left of Px and Lx
88
-assign the remaining points to Pr.
Next, we go for a horizontal partition where we partition a set of points into two
sets Pb &Pt
This process is repeated iteratively.
fig: MMM algorithm
This algorithm ignores the blockages and produces a non-rectilinear (not

regularly spaces) tree. Here some wire may also interact with each other.
It is a top-down approach as we are partitioning till each partition consist of a
single point.
89
Recursive geometric Matching Algorithm (RGM)
This is another binary tree-based routing algorithm in which clock routing is
achieved by constructing a binary tree using exclusive geometry matching.
Unlike the Method of mean & median (MMM) algorithm which is top-down and
this is bottom-up fashion. Here we used the concept of recursive matching.
To construct a clock tree by using recursive matching determines a

minimum cost geometric matching of n sink nodes.
The Center of each segment is called tapping point and the clock signal is provided
at this point then the signal will arrive at the two endpoints of the segment with
zero skew.
Find a set of n/2 line segments that match n endpoints and minimum total
length. After each matching step a balance or tapping point is found on each
matching segment to maintain zero skew to the related sinks. These set of n/2
tapping point then forms the input to the next matching step.
90
Fig:- RGM algorithm
This bottom-up approach gives a better result than a top-down approach.
91
STA - I STA,DTA,TIMING ARC, UNATENESS
What is Timing Analysis?
We checked whether the circuit meets all its timing requirements. The
timing analysis is used to refer to either of these two methods - static timing
analysis, or the timing simulation (dynamic timing analysis).
What is static Timing Analysis?
STA is the technique to verify the timing of a digital design. The STA
analysis is the static type and in this analysis of the design is carried out statically
and does not depend upon the data values being applied at the input pins.
The more important aspect of static timing analysis is that the entire design
(typically specified in hardware descriptive languages like VHDL or VERILOG) is
analyzed once and the required timing checks are performed for all possible timing
paths and scenarios related to the design. Thus, STA is a complete and exhaustive
method for verifying the timing of a design.
In STA whole design is divided into a set of timing paths having start and
endpoints and calculate the propagation delay for each path and check whether
there is any violation in the path and report it.
What is Dynamic Timing Analysis?

DTA is a simulation-based timing analysis where a stimulus is applied on
input signals, and resulting behavior is observed and verified using the Verilog test
bench, then time is advanced with new input, the stimulus applied, and the new
behavior is observed and verified and so on. It is an approach used to verify the
functionality as well as the timing of the design.
This analysis can only verify the portions of the design that get exercised by
stimulus (vectors). Verification through timing simulation is only as exhaustive as
the test vectors used. To simulate and verify all the timing paths and timing
conditions of a design with 10-100 million gates are very slow and the timing
cannot be verified completely. Thus, it is very difficult to do exhaustive
verification through simulation.
92
Why Static Timing Analysis?
 STA is a complete and exhaustive verification of all timing checks of a

design.
 STA provides a faster and simpler way of checking and analyzing all the
timing paths in a design for any timing violations.
 Day by day the complexity of ASIC design is increasing, which may contain
10 to 100 million gates, the STA has become a necessity to exhaustively verify the
timing of a design.
Design flow for Static timing Analysis:

In ASIC design, the static timing analysis can be performed at many stages
of the implementation. STA analysis first done at RTL level and at this stage more
important is to verify the functionality of the design not timing.
93
Once the design is synthesized from RTL to Gate – level, then STA analysis
is used for verifying the timing of the design. STA is also performing logic
optimization to identify the worst/critical timing paths. STA can be rerun after
logic optimization to see whether there are failing paths are still remaining that
need to be optimized or to identify the worst paths in the design.
At the start of physical design (PD) stages like floorplan and placement, the
clock is considered as an ideal which means the delay from clock to all the sink
pins of the flip flop is zero (i.e. clock is reaching to all the flip flop at the same
time). After placement, in the CTS stage clock tree is built and STA can be
performed to check the timing. During physical design, STA can be performed at
94
each and every stage to identify the worst paths.
In the logic design phase, interconnect is ideal since there is no physical

information related to the placement of Macros and standard cells. In this stage, to
estimate the length of interconnect we used WLM (wire load model) which
provides estimated RC interconnect length based on the fan-out of the cell.
In the physical design stage, we have the information about the placement of
macros and standard cells and these cells are connected by interconnect metal
traces. The parasitic RC of the metal affects the delay and power dissipation in the
design.
Before the routing is finalized this phase is called the Global route phase, the
implementation tool used to estimate the routing length and the routing estimates
are used to determine resistance and capacitance parasitic that are needed to
calculate the wire delays. Before the routing stages we are not focused on the effect
of coupling. After the detailed routing complete, actual RC values obtained from
the extraction tool (used to extract the detailed parasitic from the design) and the
effect of coupling also analyzed.
Limitations of STA:
1. If all the flip-flops are in reset mode into their required values after applying
synchronous or asynchronous rest this condition cannot be checked using static
timing analysis.
2. STA is dealing with only known values like logic-0 and logic 1 (or we can
say low and high). If any unknown value X in the design comes then this value will
not check by using STA.
3. Ensure that correct clock synchronizer is present whenever there are
asynchronous clock domain crossing is present in the design otherwise STA does
not check if the correct clock synchronizer is being used.
4. If the design having digital and analog blocks then the interface between
these two blocks will not handle by STA because STA does not deal with analog
blocks. Some verification methodologies are used to ensure the connectivity
between these kinds of blocks.
5. STA verifies the all the timing paths included that timing path also which
does not meet all the requirements and even though logic may never be able to
95
propagate through the path these timing paths are false paths. So we have to give
proper timing constraints for false path and multicycle paths then only STA qor
result will be better.
Standard cells:
Most of the complex functionality in the chip is designed using basic blocks
of AND, OR, NAND, NOR AOI, OAI cells and flip flops. These blocks are
predesigned and called standard cells.
The functionality and timing of these standard cells are pre-characterized

and available to the designer in the form of standard cell libraries and use these
blocks according to the requirement.
Propagation delay: https://www.physicaldesign4u.com/2020/03/cts-part-iii-

clock-buffer-and-minimum.html
Transition (slew): https://www.physicaldesign4u.com/2020/03/cts-part-iii-

clock-buffer-and-minimum.html
Timing arcs and unateness:
Timing arcs:
The timing arc means a path from each input to each output of the cell.
Every combinational logic cell has multiple timing arcs. Basically, it represents
how much time one input takes to reach up to output (eg. A to Y and B to Y). Like
if we see AND, OR, NAND, and NOR cell as shown in the figure. In sequential
cells such as flip flop have timing arcs from clock to the outputs and clock to data
input.
Timing arcs can be further divided into two categories – cell arcs and net
arcs.
Cell arcs: This arc is between an input pin and an output pin of a cell i.e. source
pin is an input pin of a cell and sink pin is the output pin of the same cell. Cell arcs
can be further divided into sequential and combinational arcs.
96
Combinational arcs are between an input and output pin of a combinational cell
or block.
Sequential arcs are between the clock pin and either input or output pin. Setup and
hold timing arcs are between the input data pin and clock pin of flip flop and are
termed as timing check arcs as they constrain a form of the timing relationship
between a set of signals. Sequential delay arc is between clock pin and output Q
pin of FF. An example of a sequential delay arc is clk to q is called delay arc and
clk to D input is called timing check arcs in sequential circuits
Net arcs: These arcs are between driver (cell) pin of a net and load pin of a net i.e.
the source pin is output pin of one cell and the sink pin is input pin of another cell.
Net arcs are always a delay timing arcs.
Unateness:
Each timing arcs has a timing sense that means how the output changes for
different types of transitions on input this is called unateness. Unateness is
important for timing as it specifies how the output is responding for the particular
input and how much time it will take.
Timing arc unateness are of three types:
Positive unate: If a rising transition on the input gives the output to rise and
falling transition on the input gives the output to fall i.e. there is no change in
transition of input and output then that timing arc is called positive unate.
Example: Buffer, AND, OR gate
97
Buffer: There are two-timing arc in buffer. First is rising input A to Y which gives
rising output Y and second is for falling input A to Y that gives falling output Y
i.e. what type of edge is giving to the input we got same at the output (output is
constant).
AND gate: There are four timing arcs.
Input A to output Y for rising edge
Input B to output Y for rising edge
Input A to output Y for falling edge
Input B to output Y for falling edge
The truth table of AND gate is shown in figure.
98
Check for rising edge, from the truth table we can see that
If A = 0, and B (is going from 0 to 1): Y is constant at 0 (no change)
If A = 1, and B (is going from 0 to 1): Y is changed from 0 to 1
If B = 0, and A (is going from 0 to 1): Y is constant at 0 (no change)
If B = 1, and A (is going from 0 to 1): Y is changed from 0 to 1.
Check for the falling edge, from the truth table we can see that
99
Negative unate: If a rising transition on the input gives the output to fall and
falling transition on the input gives the output to rise i.e. there is a change in
transition of input and output then that timing arc is called negative unate.
Example: inverter, NAND, NOR gate
Inverter: There are two-timing arc in inverter. First is rising input A to Y which
gives falling output Y and second is for falling input A to Y that gives rising output
Y i.e. what type edge is giving to the input we got opposite of that at the output
(output is inverted).
NAND gate: There are four timing arcs.
100
The truth table of NAND gate is shown in figure.
Check for the rising edge, from the truth table we can see that
101
Non-unate: The output transition cannot be determined by not only the direction
of an input but also depends on the state of the other inputs.
102
Example: XOR, XNOR gate
XOR gate: There are four timing arcs.
The truth table of XOR gate is shown in figure
Check for the rising edge, from the truth table we can see that
If A = 0, and B (is going from 0 to 1): Y is changed from 0 to 1.
103
From where we can get timing arcs:

We know that all the cell information related to timing functionality is
present in the library (.lib). The cell is a function of the input to output. For all the
combinations of input and outputs and rising and falling conditions of inputs are
104
defined in the .lib file. In case you have to read SDF, this delay is picked from SDF
(Standard Delay Format) file. The net arcs taken from the parasitic values that are
given in SPEF (Standard Parasitic Exchange Format) file, or SDF.
STA-II TRANSMISSION GATE,D LATCH, DFF,SETUP &HOLD
Before going to understand the setup and hold timing we should have to
know about D latch and D FF and D latch and D FF is made up of transmission
gate and inverters. So in this post, I will cover transmission gate, D LATCH, D FF,
setup up and hold time.
Transmission Gate:
The transmission gate is consists of a parallel connection of PMOS &
NMOS.
Two gate voltage of PMOS and NMOS are the complement of each other.
The effective resistance of the transmission gate is almost constant because of the
parallel connection of PMOS and NMOS.
It is a bidirectional circuit and it carries the current in either direction.
Transmission Gate
Truth table
105
NOTE: PMOS is on when gate input is 0.
NMOS is on when gate input is 1.
Working: When control is high (1) from the truth table we can see both transistors
are ON at the same time and whatever is applied to the input we got at the output.
When control is low (0) from the truth table we can see both transistors are OFF at
the same time and whatever is applied to the input is not reached to the output so
we got high impedance (Z) at the output.
D latch:
The latch is a level-sensitive device and it is transparent when the clock is
high if it is a positive level-sensitive latch and when the clock is low it is called
negative level-sensitive latch.
In latch the output (Q) is dependent only on the level of the clock (Clk). In this
latch D is control the output (Q).
106
Positive triggered D latch waveform
Positive D latch using transmission Gate:

It consists of two transmission gates and two inverters.
When Clk = high (1) T1 is ON and T2 is OFF, so output (Q) directly follows the
input (D).
107
When Clk = low (0) T1 is OFF and T2 is ON, now new data entering into the latch
is stopped and we get only previously-stored data at the output.
Negative D latch using transmission Gate:

It is also consist of two transmission gate and two inverters. It is working in an
exactly opposite manner of the positive level-sensitive D latch.
108
Negative triggered D latch waveform
When Clk = low (0) T1 is ON and T2 is OFF, so output (Q) directly follows the
input (D).
109
When Clk = high (1) T1 is OFF and T2 is ON, now new data entering into the latch
is stopped and we get only previously-stored data at the output.
D Flip flop:
A D flip flop is an edge-triggered device which means the output (Q)
follows the input (D) only at the active edge (for positive rising edge) of the clock
(for the positive edge-triggered) and retain the same value until the next rising edge
i.e. output does not change between two rising edges, it should be changed only at
the rising edge.
110
Positive edge-triggered D FF waveform
Negative edge-triggered D FF waveform
111
D Flip flop using a transmission gate:
It is a combination of negative level-sensitive latch and positive level-
sensitive latch that giving an edge-sensitive device. Data is change only at the
active edge of the clock.
Positive edge-triggered D FF using Transmission gate
when Clk= LOW (0) T1, T4 is ON and T2, T3 is OFF.

New data (D) is continuously entering through T1 and getting
stored till the edge of T2 (path is D-1-2-3-4 and at node 4 it stops) it cannot pass
through T2 and T3 transmission gate because they are off. This operation for the
master latch. For slave latch it keeps retaining the previously stored value of output
(Q) (path is 5-6-7-8-5).
112
when Clk is low
When Clk= HIGH (1) T2, T3 are ON and T1, T4 are OFF.
Now master latch did not allow new data to enter into the device because
T1 is OFF and the previously stored data at point 4 is going through the path 4-1-2-
5-6-Q and this same data is reflected at the output and this does not change until
the next rising edge and this same data is also going to the transmission gate T4
(path is 4-1-2-5-6-7-8 and stops because transmission gate T4 is OFF).
when Clk is high
113
Again if Clk is low the master latching circuit is enabled and there is no
change in the output. Any changes in input is reflected at node 4 which is reflected
at the output at the next positive edge of the clock.
So we can say that if D changes, the changes would reflect only at node 4
when the clock is low and it will appear at the output only when the Clk is high.
Setup time:
The minimum time for which the data (D) should be stable at the input
before the active edge of clock arrival.
The data is launched from FF1 and captured at the FF2 at the next clock
edge. The launched data should be present at the D pin of capture flop at least
setup time before the next active edge of the clock arrives.
114
So total time to propagate the data from launch to capture flop = one time
period (T) –Tsu
This is the required time for the data travel from launch to capture flop.
And how much time it does take data to arrive at the D pin of capture flop is =Tcq
(clock to Q delay of FF1) + Tcomb (combinational delay). This is called arrival
time.
So condition for setup timing to not violate
RT > AT
Slack = RT –AT
Slack =+ve (no violation)
=-ve (setup violation)
115
Now, what is Tsu (setup time)? How do we determine and how much should
be the setup time of flip flop and from where we can find this.
Now, what is Tsu (setup time)? How do we determine and how much should
be the setup time of flip flop and from where we can find this.
When the CLK is low the input (D) is following the path D-1-2-3-4 and it will take
some time to reach at the node 4 that time we will call setup time.
What happens if data (D) is not stable for the setup time before the next active
edge of the clock arrives?
So now when the clock turns high the data which has to be launched should
be present at node 4 but since the data is slow it would not get enough time to
travel till node 4 and the data (D) is still be present somewhere between node 2 and
3 (let's say) so we don’t know which data will be launched at the rising edge and
output will be indeterminate because data is not reached at node 4 yet i.e. data is
late.
116
If skew is present in the design:
The required time (RT) is = T – Tsu + Tskew
If there is a positive skew it means we are giving more time to data to arrive
at D pin of capture FF. so positive skew is good for setup but bad for hold.
Tskew is positive or negative depending on the capture clock it comes fast or

slow than the launch clock.
Positive skew: if the capture clock comes late than the launch clock.
Negative skew: if the capture clock comes early than the launch clock.
for more information about skew follow the below link

https://www.physicaldesign4u.com/2020/04/skew-latency-
uncertaintyjitter.html
Hold time:
The minimum time for which the data (D) should be stable at the input after the
active edge of clock has arrived.
117
Data is launched from FF1 at 0sec and this data should be captured at FFF2
after one time period (T). The hold time is to make sure that the current data (n)
which is being captured at the FF2 should not be corrupted by the next data (n+1)
which has been launched from the launch flop at the same edge.
Why do we check to hold at the same edge itself?

Because this same edge is going to both the flip flops if at this edge The
capturing flop FF2 is capturing the current data (n) at this same edge itself the
launch flop FF1 is launching the next data (n+1) so the whole check is to make
sure that this new data (n+1) which is being launched at the same edge from the
launch flop FF1 should not come so fast that it corrupts the current data (n) which
is being captured at the capture flop at the same edge.
118
The arrival time of this (n+1)th data should at least be greater
than the Thold time of capture flop FF2. Basically this current data (n) should be
held for enough time for it to be captured reliably, that enough time is called hold
time.
Arrival time Required time
Hold slack = AT – RT
If hold slack = +ve (No violation)
= -ve (hold violation)
If arrival time is less that means data coming is very fast (or early) so hold
violation occurs.
If positive skew is present:

It means the next data (n+1) will be launched early from the launch flop FF1
and till now the capture clock is not reached to the capture flop FF2 so the data (n)
also did not have to capture yet, but this nth data has to be stable at the capture
clock for Tskew+ Thold time otherwise data n will be corrupted. So we can say
+ve skew is bad for hold.
Positive skew
119
If negative skew is present:
it means the data (n) is being captured at captured flop FF2 early but by the time
(n+1) data will not be getting launched from the launched flop FF1, so the data (n)
got enough time to be held at the input for it to be captured reliability but till now
the launch flop did not launch (n+1) data. So negative skew is good for hold.
Negative skew
so the condition for hold time with skew,
Hold time (Thold) of flip flop:

For working of this structure please read DFF using the transmission gate.
120
The Clock is turning from low to high the T1 and T4 transmission gate is
turned off and stop entering the new data (D) into the device but transmission gate
T1 does not turn off the immediately it will take some time to become OFF
because the clock (Clk) is passes through many buffer and inverter then reached to
the transmission gate T1 so we can say that transmission gate also take some time
to get turned OFF so during the time when it is turning OFF from ON during that
time new data should not come and disturbed the current data which is being
captured.
Basically new data should not enter into the devices during that time also so
the hold time will be the time is to take the transmission gate to turn off completely
after the clock edge has arrived.
if there is any combination delay in the data path then the hold requirement
will change.
121
STA -III Global setup and hold time. Can setup and hold time of FF be negative??
Global setup and hold time:
Sometime we saw setup and hold time is negative in library file what is the
significance of that? Can setup and hold time be negative?
for understanding the working of D FF using transmission gate please read this
post
https://www.physicaldesign4u.com/2020/04/sta-ii-transmission-gated-latch-
dffsetup.html
Data path delay (Tcomb) = The Time taken for the data to reach at the input of
transmission gate or D pin of FF.
Clock path delay (Tclkint) = The Time taken for the clock
to reach from clock generation point (clock) to the clk pin (clk) of transmission
gate or Clk pin of FF.
Both setup and hold time are measured with respect to the active edge of
the clock. For a pure flip flop (containing no extra gate delays) setup and hold time
will always be a positive number and fixed once the chip is fabricated it can’t be
changed. But if we put some glue logic around the FF data path and clock path
then setup and hold requirement will be changed and we called this global setup
time (Tsunew) and global hold time (Tholdnew) and these components are
available as a part of the standard cell library.
Setup and hold time can be negative also depending on where we measure
the setup and hold. If we want to measure setup and hold time at the component
level then it may be negative.
First scenario: If some combinational logic is present between Data pin and
transmission gate T1 (data path) or D input of flip flop.
122
For setup:
Some delay will be induced due to combo logic now the data will take
more time to reach from D pin of FF to node 4. So we can say that setup time will
be increased because of combo logic.
Set up time is the time, when data reach to node 4 and now combinational logic is
present so the setup Time requirement will be increased.
Let us take delay of comb logic (Tcomb) is 1ns and the original setup time (Tsu) is
3ns.
Then new setup time (Tsunew) = Tsu + Tcomb
= 3ns +1 ns =4ns
Now data will take 4ns to reach at node 4.
NOTE: IF COMB DELAY IS PRESENT IN THE DATA PATH THEN THE

SETUP REQUIREMENT IS INCREASED.
For hold:
There is some comb logic sitting between the Data input & transmission
gate pin D so now what will happen during this time, the transmission gate T1 is
turning OFF till clock is not high and the new data comes at the input of the FF it
will have to travel through the comb logic delay before it goes inside and disturbed
our original data which is being captured so can we say that if the delay is added in
the data path our hold time will be decreased and new data have to wait until the
123
new active edge of the clock will not arrive at the transmission gate T1.
Let us take original hold time of FF (Thold) is 2ns it means the clock takes
2ns to reach from the generation point (clock) to the clock pin (clk) of FF or
transmission gate to turn off so during that time new data should not come into the
device and data should be stable for 2ns at the input after the rising edge of the
clock has arrived.
Then New hold time (Tholdnew) = Thold - Tcomb
= 2ns - 1 ns =1ns (positive hold time)
If Tcomb = 2ns
Tholdnew = Thold – Tcomb
= 2ns – 2ns = 0 ns (zero hold time)
If the comb logic is equal to internal clock delay then our hold time will be zero if
hold time is zero it means no need to hold the data after the clock edge has arrived.
If Tcomb = 3ns
= 2ns – 3ns = -1 ns (negative hold time)
Note: IF COMB DELAY IS PRESENT IN THE DATA PATH THEN THE

HOLD REQUIREMENT IS DECREASED.
124
Summary:
If comb logic present between data pin to the D pin of flop then,
Tsunew = Tsu + Tcomb

If Thold > Tcomb => Positive hold time

Thold < Tcomb => Negative hold time
Thold = Tcomb => Zero hold time
Second scenario: if clock delay is present in clock path (the clock is passed
through some buffer and inverter to reach at the T1)
125
For setup:
It means the active edge of the clock will take more time to reach at the
transmission gate from the clock generation point (clock) this is called internal
clock delay. So what will happen if this delay is present?
So we know that as soon as the active edge of clock has arrived, the
transmission gate T1 is turned off and it stops the data entering into the device. We
also know that clock is not reaching to the T1 instantly because of the internal
clock delay even though the active edge of clock arrived at the generation point
and the transmission gate T1 is still ON and the data still have some time to enter
into the device and reached at node 4, then we can say that if internal clock delay is
present in the design, the setup requirement will be decreased because the data will
get some extra time to enter into the device and stabilize at node 4 even though the
clock has arrived at the reference point of the flip flop.
Let us take internal clock delay (Tclkint) is 2ns and Tsu is 3ns.
Then new setup time (Tsunew) = Tsu - Tclkint
= 3ns -2ns
= 1ns (positive setup time)
126
if internal clock delay (Tclkint) is 3ns,
= 3ns - 3ns
= 0ns (zero setup time)
if internal clock delay (Tclkint) is 4ns

= 3ns -4ns
= -1ns (negative setup time)
It means the data can reach 1ns later than the active edge of clock because
internally clock will take more time to reach the transmission gate T1 and this data
even though it arrives 1 ns late it will still be able to make it to node 4 and get
stabilized.
For hold:
So we know that as soon as the active edge of clock has arrived, the
transmission gate T1 is turned off and it stops the data entering into the device and
data have to wait at the D pin of transmission gate T1 till the clock is going to low.
Then we can say that if internal clock delay is present in the design the hold
requirement will be increased because the data will have to wait at the D pin of T1
to enter into the device
let take if internal clock delay (Tclkint) is 2ns and hold time (Thold) of FF is
2ns Then
New hold time (Tholdnew ) = Thold + Tclkint
= 2ns +2 ns =4ns
127
Summary:
Tholdnew = Thold + Tclkint

Tsunew = Tsu - Tclkint
If Tsu > Tclkint => Positive setup time

Tsu < Tclkint => Negative setup time
Tsu = Tclkint => Zero setup time
Third scenario: if some comb logic is sitting b/w Data pin and the D pin of FF
and clock internal delay is present b/w clock generation point to the clk pin of
FF.
128
New setup time (Tsunew) = Tsu +Tcomb - Tclkint
New hold time (Tholdnew) = Thold –Tcomb + Tclkint
more precise value:
when the chip is fabricating all the components on the chip have different
propagation delay because of the PVT (process - voltage - temperature) conditions
so Every cell has three types of delay max, min, and typical delay.
for setup we consider worst (max) data path and min clock path so,
New setup time (Tsunew) = Tsu +Tcomb(max) - Tclkint(min)
For hold we consider worst (max) clock path and max clock path so,
New hold time (Tholdnew) = Thold – Tcomb(min) + Tclkint(max)
Setup and hold values can not be negative simultaneously but individually they
may be negative. so for the setup and hold checks to be consistent, the sum of
129
setup and hold values should be positive.
from where got the setup and hold values: library file
so the next post is related to how the setup and hold are defined for rise and fall
constraints in the library file (.lib).
How setup and hold checks are defined in the library
How the setup and hold checks are defined in the library?
Can both setup and hold be negative?
Sequential cells timing arcs:
Sequential cells timing arcs as shown in the figure.
For synchronous inputs such as D, SI, SE there are following timing arcs
 Setup check arc (rising and falling)

 Hold check arc (rising and falling)
130
For asynchronous inputs such as CDN there are following timing arcs
 Recovery check arc

 Removal check arc
Synchronous checks: setup and hold
The setup and hold timing checks are needed to check the proper
propagation of data through the sequential circuits. These timing checks are used to
verify the data input (D) is unambiguous at the active edge of the clock so the
proper data is latched at the active edge. These timing checks verify the data input
is stable before and after the clock.
131
The setup and hold timing constraints for the synchronous pin of a
sequential cell described in terms of two-dimensional table as shown in the figure.
The setup and hold timing constraints on the input pin D with respect to the rising
(active) edge of clock CK of the FF.
The two-dimensional models are in terms of data and clock transition time at
the constrained_pin (Data pin D) and the related_pin (clock pin CK) respectively.
Index 1 is showing data transition (D) at the rising edge and index 2 is showing the
clock transition (CK) at the rising edge. Thus with data D pin rise transition of
0.4ns and clock CK pin rise transition of 0.84ns so the setup constraint for the
rising edge of the D pin is 0.112ns (this value is read from the rise_constraint
table).
For the falling edge of the D pin, the setup constraint will tool from the
fall_constraint table of setup.
Rise_constraint and fall_constraint tables of the setup constraint refer to the

constrained_pin. The clock transition used is determined by the timing_type which
specifies whether the cell is rising edge-triggered or falling edge-triggered.
132
Negative values in setup and hold checks:
In the last post I mentioned the reason for negative values in the library. We
can consider a flip flop like a black box and we don't know what is inside the flop.
Just we can assume sometimes data reach early at the data pin than the clock
reaches at clock pin or sometimes clock reaches early at the clock pin than the data
at data pin.
We notice that some of the hold values in the fig are negative. This normally
happens when the path from the pin of the Flop to the internal latch point for the
data is longer than the corresponding path for the clock. Thus a negative hold
timing check implies that the data pin (D) of the flop can change before the clock
pin (CK) and still meet hold time check.
The setup values of a FF can also be negative and this means that at the pins
of the flip flop, the data can change after the clock pin and still, meet the setup
check.
Can both setup and hold be negative? This ques is related to the library file.
No, For the setup and hold checks to be consistent and proper, the sum of
setup and hold values should be positive. If there is one of the setup (or hold)
check contains negative values –then corresponding hold (or setup) should be
sufficiently positive so that the setup plus hold value is a positive quantity. so the
data will proper in the setup and hold window.
The setup plus hold time is the width of the region where the data signal is
required to be stable.
133
For flip flops, it is helpful to have a negative hold time on scan data input
pins. This gives the flexibility in terms of clock skew and can eliminate the need
for almost all the buffer insertion for fixing hold violations in scan mode (scan
mode is the one in which flip flop are tied serially forming a scan chain – output of
flip flop is typically connected to the scan data input pin of the next flip flop in
series, these connections are used for testability)
134
Time Borrowing concept in STA
Time borrowing in VLSI

Difference between latches and flip flop:
Flip flop (Register) Latch
Edge triggered and synchronous device. Level sensitive and asynchronous

device.
Flip Flops are of two types depending Latches are two types depending upon
upon the clock signal polarity i.e. enable signal polarity i.e. positive level
positive edge-triggered and negative latch and negative level latch
edge triggered.
Flip Flop are made up of latches (-ve Latches are made up of logic gates.
latch and +ve latch)
FF changes its state only when the Latches are transparent that means when
active edge of clock (either positive or they are enabled, output changes
negative depending on requirement) is immediately if input changes.
triggered.
Flip Flop based design are complex to Latch based designs are simpler to
design because they have clock signal design (because not based not edge of
and in FF to maintain the clock skew is clock) and have a small die size.
big problem.
Latch based design is more variation
tolerant and we get a better yield than
the flip flop based design
The operation of FF based design is The operation of latches is faster

slower as compare to latches based because latches don’t wait for the clock
design due to clock signal. edges.
135
The power requirement is more because Power requirement is less
of the clock signal.
There are two types of storage sequential circuits are flip flop (registers) and
latches.
Flip flop based system:
Data launches on one rising edge of the clock and must set up before the
next rising edge of the clock.
If the combinational logic delay present between two Flip Flops is very large
then arrives late at the capture flip flop and data may goes into metastability.
If the combinational logic delay present between two FF is less i.e. it arrives
early, so time is wasted in this case.
Latch based system
Data can pass through latch while the latch is transparent. If there is a large
delay comb logic is present in the design then it can borrow the time from the next
cycle. So latch based designs are used, where we want the high performance of the
design.
Time borrowing is applies only to latch based design and cycle stealing for flip
flop based design.
Time borrowing concept in latches:

It is the property of latch, a path ending at a latch can borrow time from the
next path in the pipeline such that the overall time of two paths remains the same.
STA applies a concept of time borrowing for latch based designs.
136
Whatever data launched from Flip Flop1 at ons it should be reached to Flip
Flop2 at next active edge i.e. 10ns (ideal case when setup hold time and skew and
clock delay all are zero). If data reaches at Flip Flop2 after 10ns will not be able to
capture the correct data. Similarly if we launched our data from Flip Flop2 at 10 ns
then it should be reached Flip Flop3 at 20ns (next active clock edge) without any
violation.
137
What if the combinational logic delay is large (greater than time period 10ns):
If comb delay is greater than 10 then Flip Flop2 did not capture the correct
data because Data is reaching at Flip Flop2 after 12ns and 0ns to 12ns Flip Flop
will check only one positive edge so here this is the problem of setup violation so
to avoid this we can use latch between positive levels.
What is latch: whenever the clock is enabled/high the latch will work for a half
period of time depending upon the polarity.
Now if the Flip Flop2 is replaced with a latch where the gate of the latch is
driven by the same clock line.
What is the benefit of using this latch?
But if data reaches at latch after 10ns then what will happen?
If it reaches to the latch input before 10ns, this data waits at the latch’s D
pin. This is a similar case as if we are using FF in place of latch.
Let say data is coming after the 12ns so there is a problem if we are using
flip flop but if we are using latch there is no problem because the latch is
transparent from 10 to 15, and no problem to receive the data so we can receive our
data by borrowing the time from the next cycle, this means latch provides the
advantage over Flip flop of 2ns time. The maximum time we can borrow is 5ns
138
from the latch. But this time is reduced for the latch to flip flop3 so Flip Flop3
must get the data at 20ns so the latch must send the data before 20ns.
So we are borrowing the time from latch because it is open for 10 to 15ns
and also this time is reduced for the next logic.
Now let’s look at the path from Latch to FF3:
The data comes out of Latch at 12ns will sample at Flip Flop3 at the time
20ns. Thus the path from latch2 to Flip Flop3 gets only 8ns time.
In the circuit which had all the flops, the second path had 10ns time however
in this circuit it gets 2ns less time.
So it should be noted that the path from Flip Flop1 to Flip Flop3 still there is
total 20ns which is not changed only distribution of time is changed.
So we can say that in flip flop based design the combinational delay should
not longer than the clock period except for some exceptions like multicycle path
and false path in static timing analysis.
139
In latch based design has larger combinational delay can be compensated by
a shorter combination delay path in logic states. So for high performance of circuit
we mostly used latch based design.
In simple term-time borrowing is the technique in which a longer path takes

to borrow the time from the next path of subsequent logic.
Time borrowing typically affects the setup since time borrowing is slowing
the data arrival time i.e. data arrival time is more. It does not affect the hold time
because in hold time data arrival time is more.
Time borrowing examples:
Example1: There are two flip flops and 2 combinational logics arranged
between flip flops. The clock period is 5ns.
140
Setup violation present in this scenario, because data coming to FF1
after 7ns and clock period is only 5ns. If we increase the clock period more
than 7ns then the timing can be met. But increasing the clock period affects
the performance of the design.
This timing violation issue can be resolved with the same clock
period 5ns using the timing borrowing concept. We can replace the flip flop1
with the positive sensitive latch. Latch opens at the same time like flip flop
at 0ns. This latch is open from 0ns to 2.5ns.
141
Ideally data from path1 should have arrived at ons but it is not reached. Path
1 borrowed 2ns from the latch if the latch is not present there would have been
timing violation at ons. Now o.5ns used by path2.
So path1 have an extra 2.5ns time to borrow from the next cycle. Since latch
closes at 2.5ns there is no timing violation for path1 because path1 arrives 0.5ns
before the latch is closed.
Output of latch is immediately available for comb path2. From the fig path2
start where path1 left off. Path 2 could have used up to 3ns (0.5ns from previous
stage + 2.5ns of half clock period of current cycle) but the given delay for path2 is
only 1ns.
Valid data is available for capture flip flop2 at 3ns since the rising edge of
capture Flip flop2 happens at 5ns. FLIP FLOP2 has extra 2ns this is +ve slack.
Time borrowing from the next cycle and using slack from the previous
cycle. Timing is met without changing the clock, but just by replacing flip flop to
latch. Time borrowing is used only for latch based designs.
Example 2: There are four positive level-sensitive latches and 4 combination

logic between latches. Latch1 and latch3 are controlled by CK1, latch2, and
latch4 are controlled by CK2. The relationship between ck1 and ck2 is shown
in fig.
142
For simplicity all four latches assumed to have 0ns propagation delay and 0ns
setup and hold time.
case1: Path1 delay = 6ns
Path2 delay = 1ns
Path3 delay = 8ns
Path4 delay = 1ns
Latch1 is opened at point A of ck1 and Latch2 is opened at point B of ck2.

Data from latch1 is available at latch2 at 6ns. Since latch2 is opened from 5ns to
10ns then path1 can able to borrow 1ns from the path2.
143
Similarly path3 has a delay of 8ns, borrowed 3ns from the next stage. In
both cases where we borrowed the time from the next stage is not used fully
because path2 and path4 has delay less than 5ns the half of the clock period.
Case2: Path1 delay = 6ns
Path2 delay = 1ns
Path3 delay = 2ns
Path4 delay = 1ns
Path3 has a delay of 2ns which is less than the 5ns (half clock period) it means NO
TIME BORROWING is required from the succeeding stage. This is the same case
as if there had been a flip flop instead of a latch.
NOTE: flip flop in place of latch4 stops time borrowing that we have seen from
latch1 to latch3 at launch edge of flip flop.
Case3:Path1 delay = 6ns
Path2 delay = 7ns

144
Path3 delay = 5ns
Path4 delay = 3ns
In this case there is 100% time borrowing because path1, path2, and path3 delays
are greater or equal than the 5ns (half clock period). Each stage is automatically
borrow the time from the succeeding stages until we get the output.
If the circuit made up of 4 flip flop instead of 4 latches, the 4 stage flip flop based
design would consume a total of 28ns time but if we are using latch the delay has
been reduced to 20ns. This is the advantage of latches over flip flops.
Time stealing and difference between Time borrowing and Time stealing
Time Stealing and Time Borrowing
Time borrowing:
https://www.physicaldesign4u.com/2020/05/time-borrowing-concept-in-
sta.html
Here some more about Time borrowing .....
145
The High-speed CMOS clocking design style are --- time borrowing in
Latches and time-stealing in Edge triggered flip flop.
Since in an edge-triggered system, the operation time of each pipeline

partition will never equal to others and the longest logic delay between two
registers will determine the maximum clock frequency of the system.
In any circuit, the concept of how to fit more combinational logic within
each logic partition (between flip flops). So every pipeline partition will want more
time than allocated. Time borrowing and time-stealing which allows one stage to
pass slack time from fast logics to neighboring slower logic then we will remedy
those problems.
Time borrowing by definition is permitting logic to automatically use slack

time (borrow the time) from a previous cycle. It always indicates the situation that
logic partition in a pipeline structure (flip flops and combinational delays are
present between flip flops) use leftover time from the previous stage and this
passing of slack time from one cycle to the next cycle is automatic without any
additional circuitry or clock adjustments.
This transparent nature allows latches to be used for high-performance

designs since they offer more flexibility than edge-triggered circuits in terms of the
minimum clock period achievable – a result of time borrowing.
Method of borrowing time from the previous stage allowing combinational

paths to have a delay of more than the clock period is referred to variously as time
borrowing or cycle borrowing.
Time borrowing happens due to only the latches because latches are level
sensitive. Since the use of an edge-triggered structure must require a clock arrival
time adjustment at the circuit and this will violate the definition of time borrowing.
So time borrowing is ideally suitable for static logic in a two-phase clocking
system latches (non-edge triggered).
146
The advantage of slack (time) borrowing is that it allows logic between
cycle boundaries to use more than one clock cycle while satisfying the cycle time
constraint. Mostly time borrowing is only allowed in exceptional cases, which are
carefully verified individually.
Time borrowing has been traditionally used to reduce the effect of clock
skew and jitter on the maximum clock frequency and it also has been widely used
in critical delay paths of high-speed circuits especially in high-speed adder designs.
Since time borrowing can automatically average out delay variations along a
path caused by process variation and inaccuracies, time borrowing is used to
alleviate the mean maximum clock frequency degradation caused by within-die
parameter variations.
Time Stealing:
Time stealing gains time by taking it from the next cycle. Time stealing
happens when some logical partition needs additional time for evaluation but
cannot use leftover time from previous cycles or phases like in time borrowing.
Therefore the most important difference between time stealing and time
borrowing is that time-stealing will not automatically use the leftover time but time
borrowing automatically use the leftover time.
It has to be forced to steal evaluation time from the subsequence cycle and
leave less time to that cycle or phase which is achieved by adjusting clock arrival
time. Since this additional time is obtained by adjusting clock arrival time, it is
usually used in edge-triggered logic.
Time Stealing can be used when a particular logic partition needs additional
time. The additional time required should be deterministic at the time of the design
and can adjust the clock phase of capture Flip Flop (FF2), so that data arrival time
at the capture edge of FF2, will not violate setup.
147
So if a dynamic logic needs more time to evaluate, it has to increase its
phase time by widening the active clock time and this can only be done by shifting
the rising edge earlier or falling edge later.
This means instead of using the symmetric 2 phase system with a 50%duty
cycle as shown in time borrowing, time-stealing has to use asymmetric duty cycle
to gain additional time. Time stealing is used to reduce leakage power.
Time Stealing can be used when a particular logic paths need additional
time. This additional time should be deterministic at the time of the design so we
can adjust the phase of capture Flip Flop2 clock, so that data arrival time at the
capture edge of Flip Flop2, should not violate setup time.
In the fig, there are 4 flip flops and 4 combination logic circuits having
path1,path2,path3 and path4 delays are 12ns, 1ns, 8ns, and 1ns respectively.
The combinational delay in the first stage between FF1 and FF2 is path1
having a delay of 12ns and it is higher than the clock time period (10ns). On the
other hand, we have a positive slack available in the second stage between FF2 and
FF3 because of the smaller combinational delay of path2 with a delay of 1ns. Here
we were able to support the higher combinational path delay (path1) without
increasing the time period by controlling the clock arrival time to FF2.
148
As shown in fig the path1 stole a time of 4ns (Ck2 offset, not the time
borrowed by path1) from path2 available time of 10ns, leaving path2 with 6ns.
Since path2 needs only 1ns, there is enough time for Flip Flop3 to capture data at
20ns.
The negative edge-triggered flip-flop is not used instead of latch, Why?
If we replace Latch1 with negative edge-triggered flip-flop as shown in Fig,

the path1 will still have that extra 2.5 ns (half of the clock cycle) to borrow from
the next clock cycle, just like in latch. So why we are not using Flip flops in place
of Latch.
149
On the input side, a negative edge triggered flip flop will behave just the
same way as a latch. The transparent nature of the latch will help to the succeeding
stage to use positive slack (leftover, if any) in the current stage OR pass on the
negative slack in the current stage to succeeding stage.
By comparing waveforms of both, we can understand the benefits of the LATCH

over negative edge triggered FLOP.
In the case of positive level-sensitive latch, data appears at the input of path2
(output of latch1) at time = 2ns, because of the transparent nature of the latch (in
case of negative edge flop data appeared at the input of path2 at t = 2.5ns) so in this
case, we have a positive slack of 2 ns (5 – 3) ns.
150
Now let’s assume path2 requires 2.7 ns, data arrival time at second
positive clock edge is 2ns + 2.7ns = 4.7ns. The second positive clock edge occurs
at t = 5 ns and in this case we have a positive slack of 0.3 ns (5 – 4.7) ns.
In fig, in case of negative edge flop, the data appears at the input of path2
(output of negative edge triggered FF) at time = 2.5 ns. With path2 consuming 1
ns, data arrives at the output of path2 at 3.5 ns (2.5ns + 1ns) and we have a positive
slack of 1.5 ns (5 ns -3.5 ns).
151
Now, assume a situation where path2 requires 2.7ns, instead of 1ns so
available time is 2.5 ns and Required time by path2 is 2.7ns then in this situation,
there is a timing violation at the edge of the clock at t = 5 ns, with negative slack
of 0.2 ns as shown in fig below.
So as compared to positive level-sensitive latch in negative edge triggered

flop the positive slack 0.5 ns (between t=2ns and t=2.5ns) available in path1 is
wasted because of the nature of flops. The transparent nature of the latch makes
use of the prior cycle’s positive slack of 0.5ns in the current cycle.
What is the Time stealing?
What is time borrowing?
What is the difference between Time Borrowing and Time stealing?
The negative edge-triggered flip-flop is not used instead of latch, Why?

152
Non Linear delay model (NLDM) in VLSI
Timing model in VLSI
1) Linear timing model
2) Nonlinear delay model (NLDM)
Cell Delay (Gate Delay):
Transistors within a gate take a finite time to switch. This means that a change in
the input of a gate takes a finite time to cause a change in the output.
Gate delay = f (input transition (slew) time, output load Cnet+Cpin).
Cnet-->Net capacitance
Cpin-->pin capacitance of the driven cell
Timing model:
Cell timing models are used to provide accurate timing for various instances
of the cells present in the design. The timing model normally obtained from
detailed circuit simulation of the cell to model the actual scenario of the cell
operation.
Let’s consider an example of a timing arc of a simple inverter. For the

inverter if a rising transition at the input causes a falling transition at the output and
vice versa. So there are two types of delay characterized for the cell first is output
rise delay and output fall delay.
153
Timing Arc delays for an Inverter Cell
These delays are measured based upon the threshold value defined in the
library which is typically 50% of the Vdd. Thus delay is measured from input
crossing its threshold to the output crossing its threshold.
The delay for the timing arc through the inverter depends on the two factors:
The output load i.e. capacitance load at the output and the transition time at the
input. The delay value is directly proportional to the load capacitance- larger the
load capacitance means larger the delay.
In most cases, the delay increases with increasing input transition time and
in some cases the delay through the cell may be showing non-monotonic behavior
with respect to the input transition time that means a larger input transition time
may produce a smaller delay especially if the output is highly loaded.
The transition time at the output is directly proportional to the output

capacitance i.e. output transition time increases with the output load. Thus a large
transition time at the input can improve the transition and a slow transition at the
input can deteriorate the transition at the output depending upon the cell type and
its output load. In the figure shown two cases where the transition time at the
output of a cell can improve or deteriorate depending on the load at the output.
154
Linear timing model:
In this the delay and the output transition time of cells are represented as
linear functions of the two-parameter: input transition time and output load
capacitance.
The general form of a linear delay model is
D = D0 + D1 * S + D2 * C
Where D0, D1, D2 are constant, S is the input transition time, and C is the output
load capacitance.
The linear delay models are not accurate over the range of input transition
time and output capacitance for deep submicron technologies so presently most of
the cell libraries use the more complex models like Non-linear Delay Model
(NLDM) and CCS model.
155
Non-linear Delay Model (NLDM):
Nowadays the non-linear delay model (NLDM) or the composite current

source timing model (CCS) based look-up table (LUT) is widely used for static
timing analysis (STA). In those LUTs, the characterization data such as cell delay
and transition time is indexed by a fixed number of input transition time and load
capacitance values.
A Synopsys Liberty (.lib) format file, also known as a timing library file
(Lib file), contains several kinds of LUTs for computing cell delay. Usually, the
Lib file is provided by the foundry, however, when a designer wants to have his
own cell library or to change some parameters in the process, he needs to generate
the lib file himself.
For doing this work, the designer should decide the characteristic parameters
input transition time and output load of the LUT and then get the timing
information by Hspice simulation or other EDA tools.
After completing the whole library’s characterization, the designer needs to

analyze the accuracy of the timing file for the future work by himself, as no EDA
tools can analyze it automatically. The accuracy of timing analysis depends on the
accuracy of the delay model used in the cell characterization.
The LUT only includes a fixed number of parameters, for example, the size
of 5x5 or 7x7 (input transition time and load) pairs, while the delays for the other
pairs are obtained using linear interpolation.
As a result, when making a standard cell library of one’s own, choosing

appropriate characteristic parameters of input transition time and load is the key
work for the accuracy of the timing library file.
NLDM is a highly accurate timing model as it is derived from SPICE

characterizations. Most of the cell libraries used table models to specify the delay
and timing checks for various timing arcs of the cell, the table model is referred to
as an NLDM and are used for calculating the delay, output slew, or other timing
checks.
156
The table gives the delay through the cell for various combinations of input
transition time at the cell input pin and total output capacitance at the cell
output.An NDLM delay model is represented in two-dimensional form, with two
independent variables are the input transition time and the output load capacitance
and entries in the table denoting the delay.
Some newer timing libraries also provide advanced timing models based on
current sources such as CCS, ECSM, etc. for deep submicron technologies.
Example of delay table for inverter
This is related to rising and falling delay models of inverter timing arc from
pin INP1 and OUT as well as max transition allowed time at pin OUT for This
separate timing model for the rise and fall delays for output pin and these are
denoted as cell_rise and cell_fall respectively.
pin (OUT) {
max_transition: 1.0;
timing () {
related_pin: "INP1";
timing_sense: negative_unate;
cell_rise (delay_template_3x3) {
index_1 ("0.1, 0.3, 0.7"); /* Input transition */
index_2 ("0.16, 0.35, 1.43"); /* Output capacitance */
values (/* 0.16 0.35 1.43 */ \
/* 0.1 */ "0.0513, 0.1537, 0.5280", \
/* 0.3 */ "0.1018, 0.2327, 0.6476", \
/* 0.7 */ "0.1334, 0.2973, 0.7252");
157
cell_fall (delay_template_3x3) {
values ( /* 0.16 0.35 1.43 */ \
/* 0.1 */ "0.0617, 0.1537, 0.5280", \
/* 0.3 */ "0.0918, 0.2027, 0.5676", \
/* 0.7 */ "0.1034, 0.2273, 0.6452");
In the lookup table template has two variable is specified, the first variable
in the table is the input transition time and the second variable is the output
capacitance but this can be in either i.e. the first variable can be output capacitance
but usually designer is consistent across all the templates in the library. This form
of representing delays in a lookup table is called the Non-Linear delay model
because of the nonlinear variations of delay with input transition time and load
capacitance.
The NLDM models are not used only for the output delay but also for used
for the output transition time which is characterized by the input transition time
and the output load. This also separates two-dimensional tables used for computing
the output rise and fall transition times of a cell. These are denoted by
rise_transition and fall_transition at the output.
pin (OUT) {
max_transition: 1.0;
timing () {
related_pin: "INP";
timing_sense: negative_unate;
rise_transition (delay_template_3x3) {
158
values (/* 0.16 0.35 1.43 */ \
/* 0.1 */ "0.0417, 0.1337, 0.4680", \
/* 0.3 */ "0.0718, 0.1827, 0.5676", \
/* 0.7 */ "0.1034, 0.2173, 0.6452");
fall_transition (delay_template_3x3) {
values (/* 0.16 0.35 1.43 */ \
/* 0.1 */ "0.0817, 0.1937, 0.7280", \
/* 0.3 */ "0.1018, 0.2327, 0.7676", \
/* 0.7 */ "0.1334, 0.2973, 0.8452");
...
...
159
An inverter cell with a Non-Linear Delay model has the following tables:
Rise delay
Fall delay
Rise transition
Fall transition
Where is that information which specifies that the cell is inverting?
This information is specified in the timing_sense field of the timing arc. For
the inverter, the timing arc is negative_unate i.e. the output pin direction is
opposite (negative) of the input pin direction. Thus the cell_rise table lookup
corresponds to the falling transition time at the input pin and cell_fall table lookup
corresponds to the rising transition time at the input pin.
Case1: When Input transition and output load values match with lookup table
indexes values.
Based upon the delay table an input fall transition time 0f 0.3ns and an
output load of 0.16pf will correspond to the rise delay of the inverter of 0.1018ns
because a falling transition at the input gives the inverter output rise.
fall_transition (delay_template_3x3) {
values (/* 0.16 0.35 1.43 */ \
/* 0.1 */ "0.0817, 0.1937, 0.7280", \
/* 0.3 */ "0.1018, 0.2327, 0.7676", \
/* 0.7 */ "0.1334, 0.2973, 0.8452");
160
Case2: When input transition and output load values doesn't match with table
index values
If any port/net on a timing path exceeds a value beyond the LUT’s range, the
STA tool extrapolates the desired value from existing LUT tables. However, the
linear interpolation method does not guarantee the accuracy of the extrapolated
values.
The example is related to where the lookup table does not correspond to any
entries available in the table. In this case two-dimensional interpolation is used to
provide the resulting delay value. The two nearest values are chosen for the table
interpolation in each dimension.
Consider the lookup table for the fall transition for the input transition of
0.14ns and the output capacitance of 1.15pF. The fall transition table given below
for two- dimensional interpolation is reproduced below.
fall_transition (delay_template_3x3)
index1 (“0.1, 0.3 . . .”);
index2 (“. . . 0.35, 1.43”);
values (\
“. . . 0.1937, 07280” \
“. . . 0.2327, 0.7676”
...
Input transition time 0.14ns lies between the 0.1ns and 0.3ns input transition
so we assume 0.1ns is x1, 0.3ns is x2 and 0.14ns is x0. Similarly output load
1.15pF lies between the 0.35 and 1.43 output load so we assume 0.35 is y1, 1.43ns
is y2 and 1.15 is y0 and corresponding delays entries are T11 = 0.1937,
T12=.07280, T21=.2327 and T22=0.7676.
161
If lookup table is required for (x0, y0), the delay value T00 is obtained by
interpolation and the formula is:
X1= 0.1, x2= 0.3 and x0 =0.14
y1= 0.35, y2=1.43 and y0= 1.15
T11 = 0.1937, T12=.7280, T21=.2327 and T22=0.7676.
xo1 = (x0 – x1) / (x2 – x1) = (0.14 – 0.1) / (0.3 – 0.1) = 0.04/0.2 = 0.2
x2o = (x2 – x0) / (x2 – x1) = (0.3 – 0.14) / (0.3 – 0.1) = 0.16/0.2 = 0.8
y01 = (y0 – y1) / (y2 – y1) = (1.15 – 0.35) / (1.43 – 0.35) = 0.8/1.08 =.7407
y20 = (y2 – y0) / (y2 – y1) = (1.43 – 1.15) / (1.43 – 0.35) = .28/1.08 =.3651
T00 = x20 * y20 * T11 + x20 * y01 * T12 + x01 * y20 * T21 + x01 * y01 * T21
Substituting the values
T00 = 0.8 * 0.3651 *0.1937 +0.8 * 0.7407 *.7280 + 0.2 * 0.3651 *.2327 + 0.2 *
0.7407 * 0.7676
= 0.056575 + 0.43138 + 0.01699 + 0.11371
= 0.6186
This equation is valid for interpolation as well as extrapolation that means

when the indices (x0, y0) lies outside the index1 and index2 (characterized
range of indices).
For example, for the lookup table with 0.05 for index1 and 1.7 for index2 then the
fall transition value obtained as:
X1= 0.1, x2= 0.3 and x0 =0.05
y1= 0.35, y2=1.43 and y0= 1.7
T11 = 0.1937, T12=.7280, T21=.2327 and T22=0.7676.
162
xo1 = (x0 – x1) / (x2 – x1) = (0.05 – 0.1) / (0.3 – 0.1) = - 0.05/0.2 = -0.25
x2o = (x2 – x0) / (x2 – x1) = (0.3 – 0.05) / (0.3 – 0.1) = 0.25/0.2 = 1.25
y01 = (y0 – y1) / (y2 – y1) = (1.7 – 0.35) / (1.43 – 0.35) = 1.35/1.08 =1.25
y20 = (y2 – y0) / (y2 – y1) = (1.43 – 1.7) / (1.43 – 0.35) = -0.27/1.08 = -0.25
T00 = x20 * y20 * T11 + x20 * y01 * T12 + x01 * y20 * T21 + x01 * y01 * T21
=1.25 * (-0.25)* 0.1937 + 1.25 * 1.25 * .7280 + (-0.25) * (-0.25) * 0.2327 +

(-0.25) * 1.25 * 0.7676
= 0.8516
Ques1: How to interpolate and extrapolate delay values from the library?
Ques2: What is standard Cell Characterization Concepts in the library?
It is a process of analyzing a circuit using static and dynamic methods to

generate models suitable for chip implementation flows.
No digital chip is possible without cell models. Characterization is necessary

for the use of Standard Cells and this is done on extracted netlists.
The Non-linear Delay Model (NLDM) is the most common delay model for
the characterization of standard cells. As we are going to deep sub-micron
technology it requires more accurate timing models like CCS (composite current
source)
Every digital chip implementation (RTL-to-GDSII) flow requires cell

models for analysis (logic simulation, verification, timing, power, noise, etc),
implementation (synthesis, test insertion, placement, clock tree synthesis, routing)
and fixing (engineering change order, rule fixing, etc).
Ques3: NDLM is related to voltage or current source?
163
Wire Load Model (WLM)
Wire Load Model (WLM)
How do you estimate the parasitics (RC) of a net before placement and
routing?
Prior to the Routing stage, net parasitics and delays cannot be accurately
determined we know only the fanout of net and the size of the block.
Before going for floorplanning or layout, wire load models (WLM) can be
used to calculate interconnect wiring delays (capacitance (C), resistance (R)), and
the area overhead (A) due to interconnect.
The wire load model is also used to estimate the length of a net-based upon
the number of its fanouts.
The wire load model depends upon the area of the block, and designs with
different areas may choose different wire load models.
The wire load model also maps the estimated length of the net into the
resistance, capacitance, and the corresponding area overhead due to routing. The
average wire length within a block increases as the block size is increased.
Generally, a number of wire-load models are present in the Synopsys

technology library, each representing a particular size block of the logic. These
models define the capacitance, resistance, and area factor of the net.
Typically a wire load model selection is based upon the chip area of the
block. However these WLM models can be modified or changed according to the
user’s requirement by the designers.
Figure shows different areas (chip or block size), having different wire load
models would typically be used in determining the parasitics (RC), so from the
figure, it is clear that the smaller sized block has a smaller capacitance.
164
Example of a wire load model
The wire load model specifies slope and fanout_length for the logic under
consideration along with Resistance, capacitance, and area overhead. The
fanout_length attribute specifies the value of the wire length that is associated with
the number of fanouts.
If any fan out number is not explicitly listed in the table, the interconnect
length is obtained by using linear extrapolation and interpolation with the specified
given slope.
wire_load (“wlm_conservative”) {
resistance: 6.0; # resistance per unit length of the interconnect
capacitance: 1.2; # capacitance per unit length of the interconnect
165
area: 0.07; # area overhead per unit length of the interconnect.
slope: 0.5; #extrapolation slope used for data points that are not specified in the
fan-out length table.
fanout_length (1, 2.6);
fanout_length (2, 2.9) means that for output pin with fanout equal 2, the wire
length will be 2.9. Then you need to multiply this wire length with capacitance and
resistance to calculate RC values for all wires.
These wire load models are based on statistical info (average wire length
among different designs) and doesn't requires cell placement info (so it is easy,
faster method, but not very accurate) i.e. just a rough estimate of the load that can
be caused by wires. Wire load model information will be in the liberty files (what
we call .libs) provided by the foundry.
Some companies create Custom WLM. - Just to have more accurate WLM
for their own designs. They did place (and route) of their design, collect info of all
wires (fanout and length for each net), and generate average length per fanout. For
example, Synopsys Jupiter has such capability.
Example1: for fan out of 8
Fanout 8 is not present in the table then we used linear extrapolation to

calculate the length of interconnect, Resistance, capacitance, and area overhead.
166
For extrapolation
Length = Length of Last fanout number given in the table + (The fanout number
we want – Last fanout number in WLM) * Slope
Capacitance = New calculated Length * Capacitance coefficient given in the table
Resistance = New calculated Length * Resistance coefficient given in the table
Area overhead due to interconnect = New calculated Length * Area coefficient

given in the table
So, Length = 5.6 + (8 - 7) * 0.5 = 6.1 units
Capacitance = 6.1 * 1.2 = 7.32 units
Resistance = 6.1 * 6 = 36.6 units
Area overhead due to interconnect = 6.1 * 0.07 = 0.427 units
Length = 5.6 + (12 - 7) * 0.5 = 8.1 units
Resistance = 8.1 * 6 = 48.6 units
Since it is between fanout numbers 4 and 6 so we calculate the length using

linear interpolation and linear interpolation between the two nearest pairs is used to
estimate any points that are not in the lookup table.
For interpolation:
Length = Average of fanout lengths = (Net length at fanout 4 + Net length at

fanout 6)/2
Capacitance = New calculated Length * Capacitance coefficient given in the table
167
Resistance = New calculated Length * Resistance coefficient given in the table
Area overhead due to interconnect = New calculated Length * Area coefficient

given in the table
Length = (4.1 +5.1)/2 = 4.6 units
Resistance = 4.6 * 6 = 27.6units
The units for the length, resistance, capacitance, and area are as specified in the
library.
Wire load Models Types:
Till now we saw that for a particular net we can estimate the RC values as per the
WLM. If a net crosses a hierarchical boundary, then different wire load models can
be applied to different parts of the net and these wire load modes are of three types:
 Top
 Enclosed
 Segmented
Top:
In this mode, all the nets within the hierarchy use the wire load model of the
top-level, and if any wire load models specified in lower-level blocks (sub-blocks)
then we ignored that WLM model. Only the top-level wire load model takes
precedence over all the WLM modes.
From the figure, the wlm_1 wire load model specified in block A1 is used
over all the other wire load models specified in blocks A2, A3, and A4.
168
Enclosed:
In this WLM mode, the wire load model of the block that fully encompasses
the net is used for the entire net.
From the figure, the net Net1 is included in block A2 and thus the wire
load model of block A2, wlm_2 is used for this net.
Other net that is fully contained in block A3 use the wlm_3 wire load model,
and net that is fully contained within block A4 use the wlm_4 wire load model.
Segmented:
In this WLM mode each segment of the net gets its wire load model from the
block that encompasses the net segment. Each portion of the net uses the
appropriate wire load model within that block.
169
Figure illustrates an example of a net Net1 that has segments in three blocks.
The interconnect within block, A3 uses the wire load model wlm_3, the segment of
the net within block A4 uses the wire load model wlm_4, and the segment within
block A2 uses the wire load model wlm_2.
Standard Parasitic Extraction Format (SPEF)
Standard Parasitic Extraction Format (SPEF)
SPEF allows the representation of parasitic information of a design(R, L,

and C) in an ASCII (American Standard Code for Information
Interchange exchange format). A user can read and check the values in a SPEF file.
Users would never create this file manually it is automatically generated by the
tool. It is mainly used to pass parasitic information from one tool to another.
Interconnect parasitics depends on the process. SPEF supports the

specification of all the cases like best-case, typical, and worst-case values. These
triplets (best, typical, and worst) are allowed for R, L, and C values, ports slows,
and loads. The units of the parasitics R, C, and inductance L are specified at the
beginning of the SPEF file.
170
The figure shows that SPEF can be generated by place-and-route tool or a
parasitic extraction tool, and then this SPEF is used by timing analysis tool for
checking the timing, in-circuit simulation or to perform crosstalk analysis.
Parasitics can be represented at many different levels. SPEF supports three models.
Distributed net model: In this model (D_NET), each segment of a net route has
its own R and C values.
Reduced net model: In this model (R_NET), on the load pins of the net have
single reduced R and C, and on the driver pin of the net a pie model (C-R-C) is
considered.
171
Lumped capacitance model: In this model, only a single capacitance is specified
for the entire net.
Parasitics extracted from a layout can be defined in three formats:
 Detailed Standard Parasitic Format (DSPF)

 Reduced Standard Parasitic Format (RSPF)
 Standard Parasitic Extraction Format (SPEF)
The SPEF is a compact format of the detailed parasitics. A SPEF file for a
design can be split across multiple files and it can also be hierarchical. SPEF is the
format of choice for representing the parasitics in a design due to its compactness
and completeness
An example of a net with two fanouts (*I *8 and *I *10) is given below.
*D_NET NET_27 0.77181
*CONN
*I *8: Q O *L 0 *D CELL1
*I *10: I I*L 12.3
172
*CAP
1 *9:0 0.00372945
2 *9:1 0.0206066
3 *9:2 0.035503
4 *9:3 0.0186259
5 *9:4 0.0117878
6 *9:5 0.0189788
7 *9:6 0.0194256
8 *9:7 0.0122347
9 *9:8 0.00972101
10 *9:9 0.298681
11 *9:10 0.305738
12 *9:11 0.0167775
*RES
1 *9:0 *9:1 0.0327394
2 *9:1 *9:2 0.116926
3 *9:2 *9:3 0.119265
4 *9:4 *9:5 0.0122066
5 *9:5 *9:6 0.0122066
6 *9:6 *9:7 0.0122066
7 *9:8 *9:9 0.142205
8 *9:9 *9:10 3.85904
9 *9:10 *9:11 0.142205

173
10 *9:12 *9:2 1.33151
11 *9:13 *9:6 1.33151
12 *9:1 *9:9 1.33151
13 *9:5 *9:10 1.33151
14 *9:12 *8: Q 0
15 *9:13 *10: I 0
*END
CONTENTS in SPEF FILE:
header_definition: contains basic information such as the SPEF version number,

design name, and units for R, L and C.
[name_map]: specifies the mapping of net names and instance names to indices.
[power_definition]: declares the power nets and ground nets
[external_definition]: defines the ports of the design.
[define_definition]: identifies instances, whose SPEF is described in additional

files
internal_definition: contains the guts of the file, which are the parasitics of the
design.
HEADER DEFINITION
*SPEF "IEEE 1481-1999" ------------> SPEF version
*DESIGN "ddrphy1" ------------------> Design name
*DATE "Fri Sep 21 00:49:32 2005"---> Timestamp when the file was created
*VENDOR "SGP Design Automation_1"---> Vendor tool
*VERSION "V2000.09” ----> version number of the program that was used to
generate the SPEF
174
*DESIGN_FLOW "PIN_CAP NONE" "NAME_SCOPE LOCAL" ---> specifies at
what stage the SPEF file was created. It describes information about the SPEF file
that cannot be derived by reading the file.
*DIVIDER / -----> specifies the hierarchy delimiter.
*DELIMITER: --> Delimiter between the pin and its instance.
*BUS_DELIMITER [ ] ---> specifies the prefix and suffix used to identify a bit of
a bus.
*T_UNIT 1.00000 NS---> NS | PS: specifies the time unit
*C_UNIT 1.00000 FF----> PF | FF: specifies the capacitance unit
*R_UNIT 1.00000 OHM-----> OHM | KOHM: specifies the resistance unit.
*L_UNIT 1.00000 HENRY-----> HENRY | MH | UH: specifies the inductance

unit.
A comment in a SPEF file can be specified in two forms.
// Comment - until end of line.
/* This comment can be extended to multiple lines */
NAME MAP:
A name map consisting of a map of net names and instance names to

indices, the SPEF file size is made effectively smaller, and more importantly, all
long names appearing in only one place. It specifies the mapping of names to
unique integer values (their indices).
Format of Name Map:
*NAME_MAP
*positive_integer name
*positive_integer name
175
Example of Name Map
*NAME_MAP
*1 memclk
*2 memclk_2x
*3 reset_
*4 refresh
*5 resync
*6 int_d_out[63]
*7 int_d_out[62]
*8 int_d_out[61]
*9 int_d_out[60]
*10 int_d_out[59]
*11 int_d_out[58]
*12 int_d_out[57]
...
*364 mcdll_write_data/write19/d_out_2x_reg_19
...
*14954 test_se_15_S0
*14955 wr_sdly_course_enc[0]_L0
*14956 wr_sdly_course_enc[0]_L0_1
*14957 wr_sdly_course_enc[0]_S0
176
This helps in reducing the file size by making all future references of the name by
the index. A name can be an instance name or net name.
The name map thus avoids repeating long names and their paths by using their
unique integer representation.
POWER DEFINITION SECTION:
Defines the power and ground nets.
*POWER_NETS VDDQ
*GROUND_NETS VSSQ
EXTERNAL DEFINITION SECTION:
It contains the definition of the physical & logical ports of the design.
Logical ports Format:
*PORTS
port_name direction {conn_attribute}
...
Example:
*PORTS
*1 I
*2 I
*3 I
*4 I
*5 I
*6 I
*7 I
177
*8 I
*9 I
*10 I
*11 I
...
*450 O
*451 O
*452 O
*453 O
*454 O
*455 O
*456 O
Here I is for input, O is for output and B is for bidirectional.
PHYSICAL PORTS
Format:
*PHYSICAL_PORTS
pport_name direction {conn_attribute}
DEFINE DEFINITION SECTION:
It defines entity instances that are referenced in the current SPEF.
*DEFINE instance_name {instance_name} entity_name
*PDEFINE physical_instance entity_name
*PDEFINE ---> Used when the entity instance is a physical partition instead of
logical.
178
example:
*DEFINE core/u1ddrphy core/u2ddrphy “ddrphy1”
This means that there would be another SPEF file with a *DESIGN value of
ddrphy1 - this file would contain the parasitics for the design ddrphy1, and
possible to have physical and logical hierarchy.
INTERNAL DEFINITION SECTION:
It describes the parasitics for the nets in the design. There are basically two forms:
1. Distributed net, D_NET

2. Reduced net, R_NET
Example of Distributed net parasitics for net *5426 (D_NET).
*D_NET *5426 0.899466
*CONN
*I *14212: D I *C 21.7150 79.2300
*I *14214: Q O *C21.495076.6000*DDFFQX1
*CAP 1
*5426:10278 *5290:8775 0.217446
2 *5426:10278 *16:3754 0.0105401
3 *5426:10278 *5266:9481 0.0278254
4 *5426:10278 *5116:9922 0.113918
5 *5426:10278 0.529736
*RES
1 *5426:10278 *14212: D 0.340000
2 *5426:10278 *5426:10142 0.916273
3 *5426:10142 *14214: Q 0.340000 *END
179
In the first line, *D_NET *5426 0.899466: *5426 is the net index and 0.899466 is
the total capacitance value on the net.
The capacitance value is the sum of all capacitances on the net like cross-
coupling capacitances that are assumed to be grounded, and load capacitances.
It may or may not include pin capacitances depending on the setting of

PIN_CAP in the *DESIGN_FLOW definition.
In the second line is the connectivity section which describes the drivers and
loads for the net. In:
*CONN
*I *14212: D I *C 21.7150 79.2300
*I *14214: Q O *C 21.4950 76.6000 *D DFFQX2
Where *I-----> internal pin
*14212: D --> instance of D pin.
*14212--> index number
I -------> load (input pin) on the net.
O ------> Driver (output pin) on the net.
*C -----> coordinates of the pin
*D-----> driving cell of the pin.
CAPACITANCE SECTION:
This section describes the capacitances of the distributed net.
1 *5426:10278 *5290:8775 0.217446
2 *5426:10278 *16:3754 0.0105401
3 *5426:10278 *5266:9481 0.0278254
180
4 *5426:10278 *5116:9922 0.113918
5 *5426:10278 0.529736
The first number is the capacitance identifier. In SPEF there are two forms
of capacitance specification.
First through fourth are of one formà (first through fourth) specifies the
cross-coupling capacitances between two nets. So in index 1 of capacitance is
0.217446 and it represents the cross-coupling capacitance between nets *5426
and*5290.
Fifth is of the second form à with index 5) specifies the capacitance to

ground and in capacitance index 5, the capacitance to ground is 0.529736.
Notice that the first node name is necessarily the net name for the D_NET
and here the name of D_NET is *5426. The positive integer 10278 in *5426:10278
specifies an internal node or junction point.
So capacitance index 4 states that there is a coupling capacitance between

net *5426 with internal node 10278 and net *5116 with internal node 9922, and the
value of this coupling capacitance is 0.113918.
RESISTANCE SECTION
Describes the resistances of the distributed net.
*RES
1 *5426:10278 *14212: D 0.340000
2 *5426:10278 *5426:10142 0.916273

181
3 *5426:10142 *14214: Q 0.340000
The first field is the resistance identifier.
The first index is between the internal node *5426:10278 to the D pin on
*14212 and the resistance value is 0.34.
The capacitance and resistance section can be better understood with the RC
network shown pictorially in fig.
Second example of distributed net:
This net has two loads and one driver and the total capacitance on the net is
2.69358.
*D_NET*5423 2.69358
*CONN
*I *14207: D I *C 21.7450 94.3150 ------> load
*I *14205: D I *C 21.7450 90.4900------>load
*I *14211: Q O *C 21.4900 83.8800 *D DFFQX1-----------> driver
*CAP
1 *5423:10107 *547:12722 0.202686
2 *5423:10107 *5116:10594 0.104195
3 *5423:10107 *5233:9552 0.208867
4 *5423:10107 *5265:9483 0.0225810
5 *5423:10107 *267:9668 0.0443454
6 *5423:10107 *5314:7853 0.120589
7 *5423:10212 *2109:996 0.0293744
8 *5423:10212 *5187:7411 0.526945
9 *5423:14640 *6577:10075 0.126929
182
10 *5423:10213 1.30707
*RES
1 *5423:10107 *5423:10212 2.07195
2 *5423:10107 *5423:10106 0.340000
3 *5423:10212 *5423:10211 0.340000
4 *5423:10212 *5423:14640 1.17257
5 *5423:14640 *5423:10213 0.340000
6 *5423:10213 *14207: D 0.0806953
7 *5423:10211 *14205: D 0.210835
8 *5423:10106 *14211: Q 0.0932139
*END
Figure RC network for D_NET *5423 corresponds to the distributed net

specification.
183
In general, an internal definition can comprise of the following specifications:
D_NET: Distributed RC network forms of a logical net
R_NET: Reduced RC network form of a logical net
D_PNET: Distributed form of a physical net
R_PNET: Reduced form of a physical net
*R_NET net_index total_cap [*V routing_confidence] [driver_reduction]
*D_PNET pnet_index total_cap [*V routing_confidence] [pconn_section]

[pcap_section] [pres_section] [pinduc_section]
*END
*R_PNET pnet_index total_cap [*V routing_confidence] [pdriver_reduction]
*END
Here is the syntax.
*D_NET net_index total_cap [*V routing_confidence]
[conn_section]
[cap_section]
[res_section]
[inductance_section]
*END
184
INDUCTANCE SECTION
This section used to define the inductances and the format is very similar to
the resistance section.
The *V is used to specify the accuracy of the parasitics of the net. The
accuracy can be specified individually with a net and globally using the
*DESIGN_FLOW statement with the ROUTING_CONFIDENCE value, such as:
*DESIGN_FLOW “ROUTING_CONFIDENCE 100” ---> means the

parasitics were extracted after final cell placement and final route and 3d extraction
was used.
The possible values of routing confidence are 10, 20, 30, 40, 50, 60, 70, 80, 90 and,
100.
REDUCED NET
It is the net that is the reduced from the distributed net. There is one driver
reduction section for each driver on a net. The example of a reduced net SPEF is
below
*R_NET *1200 2.995
*DRIVER *1201: Q ----> DRIVER pin_name
*CELL SEDFFX1
*C2_R1_C1 0.511 2.922 0.106 ----> shows the parasitics for the pie model on the
driver pin of the net
*LOADS *RC *1202: A 1.135
*RC *1203: A 0.946 --------> rc_value in the *RC construct is the Elmore delay
(R*C)
*END
185
And the pictorial view of RC network is given below.
LUMPED CAPACITANCE MODEL:
It can be define using either a *D_NET or a *R_NET construct with only the
total capacitance and no other information is given. The examples of lumped
capacitance are given below.
*D_NET *1 80.2096
*CONN
*I *2: Y O *L 0 *D CLKMX2X3
*P *1 O *L 0 *END
*R_NET *17 58.5204
*END
Values in a SPEF file can be in a triplet form that represents the process variations,
such as 0.243: 0.269: 0.300.
0.243 ----> best-case value
0.269 ----> typical value
0.300----> worst-case value.
186
OCV (On Chip Variation) and CRPR (Clock Reconvergence Pessimism
Removal)
What are OCV and CRPR?

How they are related?
CRPR (Clock Reconvergence Pessimism Removal) topic is comes after the

STA introduces OCV (On Chip Variation) analysis, So here I covered small
introduction of OCV so that we can understand the CRPR and How CRPR is
related to OCV.
we all know during the manufacturing of chips on the same die may suffer
from variations due to process, voltage, or temperature change, thus transistors can
be faster or slower in different dies.
Delays vary across a single die due to PVT (processor, voltage,

temperature). The delay value of IC in cold weather is different and hot weather is
different. In cold weather, the metals in IC will shrink so the delay will decrease.
In hot weather, the metal will expand so the delay will increase.
The variation may be random or deterministic. Random variations are oxide

thickness variation, implant doses, and metal or dielectric thickness variations.
Suppose there is two inverters having the same characteristics on the single-
chip but due to process, voltage, and temperature variation the delay of these two
has different delay propagation.
To compensate these variations, STA introduces a concept called On-Chip
Variation (OCV). In this concept, extra timing margins are added in timing
analysis. In OCV, all cells or nets in the launch clock path, data path, and capture
clock path will be added a fixed derate value, bringing more pessimism in timing
analysis and compensating the variation.
In simple words, OCV is a technique in which this flat derate is applied to

make a faster path more fast and slower path slower. so OCV adds some
187
pessimism in the common path of lauch and capture path i.e. for a same cell there
are two delays min and max.
In this concept, we remove the extra pessimism from common path.

Generally, we add the delay to every buffer in the process of OCV. But adding
more delay is also affects the speed of the chip and it may cause violations to
overcome this we are removing the delay to the common path in the process of
CRPR.
The delay difference along the common paths of the launching and capturing
clock paths is called CRPR.
Problem: - In the fig three buffers, flip flops, combinational circuit have two
delays one is min delay another is the delay after adding derating i.e. max
delay. Consider Time period 8ns and Tsetup and Thold are 0.2ns.
188
Let's consider a buffer that is placed in a common path (both data path and
clock path) for buf2 and buf3 buffer.
The tool calculates max. delays for setup calculation and min. delays for
hold (worst- and best-case analysis).
Without CRPR: -
Setup slack = (required time) min - (arrival time) max

Arrival time = 0.70 + 0.65 +0.60 + 3.6 = 5.55ns
Requited time = 8+ 0.60 + 0.45 -0.2 = 8.85ns
Setup slack = 8.85ns – 5.55ns = 3.3ns
Hold slack= = (arrival time) min - required time) max -

Arrival time = 0.60 +.55+.48+2.5 = 4.13ns
Requited time = .70 + .75+0.2 =1.65ns
Hold time = 4.13ns -1.65ns = 2.48ns
With CRPR:
When comes to OCV analysis, the tool further considers, max. for data path
and min. for clock path during setup analysis. max. for clock path and min. for data
path during hold analysis.
So, buffer placed in the common path now has 2 values i.e., max. and min. values.
As we know, a cell can't have two different values at a particular instant of time.
Thereby we calculate the buffer value as:
CRPR = Max. value - min. value

In the CRPR process we are removing the derating to common buffer. here
the common buffer buf1.so we are considering 0.70ns-.60ns =.10ns for buf1.
Setup slack = (required time) min - (arrival time) max

Arrival time = 0.10 + 0.65 +0.60 + 3.6 = 4.95ns
Requited time = 8+0.10 + 0.45 -0.2 = 8.35ns
Setup slack = 8.35ns – 4.95ns = 3.4ns
189
Hold slack= = (arrival time) min - required time) max -
Arrival time = 0.10 +.55+.48+2.5 = 3.63ns
Requited time = .10 + .75+0.2 =1.05ns
Hold time = 3.63ns -1.05ns = 2.58ns
Without CRPR the setup and hold values are: - 3.3ns, 2.48ns
With CRPR the setup and hold values are: - 3.4ns, 2.58ns
From the above results, it is clear that with the CRPR method both setup and hold
are benefited.
Some commands related to CRPR:

pt_shell> set_operating_conditions -analysis_type on_chip_variation -min MIN -
max MAX
pt_shell>set timing_remove_clock_reconvergence_pessimism TRUE
I will write about OCV (on chip variation) in some other post because this is a
separate topic.
PVT (Process, Voltage, Temperature)
What is the meaning of PVT corners & How these corners will affect the
Delay?
How OCV (On-Chip Variation)is related to PVT?
PVT:
PVT is the Process, Voltage, and Temperature. In order to make our chip to
work after fabrication in all the possible conditions, we simulate it at different
corners of process, voltage, and temperature. These conditions are called corners.
All these three parameters directly affect the delay of the cell.
190
Process:
There are millions of transistors on the single-chip as we are going to lower
nodes and all the transistors in a chip cannot have the same properties. Process
variation is the deviation in parameters of the transistor during the fabrication.
During manufacturing a die, the area at the center and at the boundary will
have different process variations. This happens because layers that will be getting
fabricated cannot be uniform all over the die.
Below are a few important factors which can cause the process variation;
 The wavelength of the UV light
 Manufacturing defects
1. Oxide thickness variation

2. Dopant and mobility fluctuation
3. Transistor width
4. RC Variation
5. channel length
6. doping concentration,
7. metal thickness
8. impurity concentration densities
9. diffusion depths
10. imperfections in the manufacturing process like mask print, etching
These variations will cause the parameters like threshold voltage and
threshold voltage depends on different parameters like doping concentration,
surface potential, channel length, oxide thickness, temperature, source-to-body
voltage, and implant impurities, etc.
The threshold voltage equals the sum of the flat band voltage, twice the bulk
potential, and the voltage across the oxide due to the depletion layer charge.
191
consider the drain current equation for NMOS;
Id = (1/2) μn Cox (W/L) (VGS – VT)2
So, the current flowing through the channel directly depends upon
 Mobility (μn), (mobility is depending upon temperature)
 Oxide capacitance Cox (and Cox = εox/ tox hence the thickness of oxide i.e.
tox)
 The ratio of width to length (W/L)
If any of these parameters change, it will change the current and change in
current will affect the delay of the circuit because The delay depends upon the R
and C values (Time constant RC) of the circuit. The relation between process and
delay shown in Figure. From the figure, we conclude that delay is more for slow
process MOSFETs and it is less for fast process MOSFETs.
The process of fabrication includes Oxidation, diffusion, Ion Implantation,

Deposition, Etching, Photolithography, drawing out of metal wires, gate drawing,
etc. The diffusion density and the width of metal wire are not uniform throughout
wafer and diffusion regions for all transistors will not have the same diffusion
concentrations. So, all transistors are expected to have different characteristics.
This introduces variations in the sheet resistance (Rs) and transistor parameters
such as threshold voltage (Vth) and because of this, it will causes (W/L) variations
in MOS transistors.
Process variation is different for different technologies but is more dominant

in lower node technologies because transistors are in millions on the chip. Process
variations are due to variations in the manufacturing conditions such as
temperature, pressure, and dopant concentrations. As a consequence, the different
transistors have different lengths throughout the chip. This makes the different
propagation delay everywhere in a chip because a smaller transistor is faster and
therefore the propagation delay is smaller.
192
Voltage:
As we are going to the lower nodes the supply voltage for a chip is also
going to less. Let’s say the chip is operating at 1.2V. So, there are chances that at
certain instances of time this voltage may vary. It can go to 1.5V or 0.8V. To take
care of this scenario, we consider voltage variation.
There are multiple reasons for voltage variation.
 IR drop is caused by the current flow over the power grid network.
 Supply noise caused by parasitic inductance in combination with resistance
and capacitance. when the current is flowing through parasitic inductance (L) it
will causes the voltage bounce.
The supply voltage is given to any chip either externally from the DC source
or some voltage regulator. The voltage regulator will not give the same voltage all
the time. It can go above or below to the expected voltage and hence if voltage
change it will change the current and making the circuit slower or faster than
earlier.
193
Power is distributed to all transistors on the chip with the help of a power
grid network. Throughout a chip, the power supply is not constant it will change
with the placement of cells. The power grid network is made up of metals and
metals have their own resistance and capacitance. So, there is a voltage drop along
the power grid.
The supply voltage reaching the power pins will not be the same for all
standard cells and macros because of the resistance variation of the metals.
Consider there are two cells, one which is placed closer to the DC power source,
and others placed far. As the interconnect length is more for the farther cell, it has
more resistance and results in a higher IR drop, and it reduces the supply voltage
reaching the farthest cell. As the voltage is less, this cell will take more delay to
power on than the cell which is placed closer. If nearer cells get higher voltage
then the cell is faster and hence the propagation delay is also reduced. That is the
reason because of which, there is variation in delays across the transistors.
The delay of a cell is depending on the saturation current and the saturation
current of a cell depends on the power supply. In this way, the power supply
affects the propagation delay of a cell.
The self-inductance of a supply line contributes also to a voltage drop. For

example, when a transistor is switching to high, it takes a current to charge up the
output load. This time-varying current (for a short period of time) causes an
opposite self-induced electromotive force. The amplitude of the voltage drop is
given by V=L*dI/dt, where L is the self-inductance and I is the current through the
line.
194
Temperature:
The transistor density is not uniform throughout the chip. Some regions of
the chip have higher density and higher switching, resulting in higher power
dissipation and Some regions of the chip have lower density and lower switching,
resulting in lower power dissipation Hence the junction temperature at these
regions may be higher or lower depending upon the density of transistors. Because
of the variation in temperature across the chip, it introduces different delays across
all the transistors.
The temperature variation is with respect to the junction and not ambient
temperature. The temperature at the junction inside the chip can vary within a big
range and that’s why temperature variation needs to be considered. Delay of a cell
increases with an increase in temperature. But this is not true for all technology
nodes. For deep sub-micron technologies, this behavior is contrary. This
phenomenon is called a temperature inversion.
When a chip is operating, the temperature can vary throughout the chip. This
is due to the power dissipation in the MOS-transistors. The power consumption in
195
the transistors is mainly due to switching, short-circuit, and leakage power
consumption.
The average switching power dissipation is due to the required energy to

charge up the parasitic and load capacitances and the short-circuit power
dissipation is due to the finite rise and fall times and leakage power consumption is
due to the reverse leakage and sub-threshold currents.
The biggest contribution to power consumption is switching. The dissipated

power will increase the temperature. Mobility depends on temperature.
mobility= temp^-m
we know that with an increase in temperature, the resistivity of a metal
wire(conductor) increases. The reason for this phenomenon is that with an increase
in temperature, thermal vibrations also increase. This gives rise to increased
electron scattering and electrons start colliding with each other more and the
mobility of the primary carriers decreases with an increase in temperature.
Similarly, for higher doping concentrations, the temperature is higher and

thermal vibrations are also increasing and the electrons and holes move slower i.e.
mobility decreases, then the propagation delay increases. Hence, the propagation
delay increases with increased temperature. The threshold voltage of a transistor
depends on the temperature. A higher temperature will decrease the threshold
voltage. A lower threshold voltage means a higher current and therefore a better
delay performance. This effect depends extremely on the power supply, threshold
voltage, load, and input slope of a cell. There is a competition between the two
effects and generally the mobility effect wins.
196
AMD interview Questions (Physical Design)
AMD interview Questions (Physical Design)

1. Tell me about your experience.
2. How will you make sure that your power structure is good?
3. Tell me about DRC and LVS fixes.
4. Why we are following certain guidelines for macros placement and what are
those guidelines?
5. What is the minimum space required in between macros if the channel is
there on the non-pin side of macros?
6. What is the distance between tap cells in your design?
7. What are the setup and hold edges for a 3-level multi-cycle path?
8. What are the setup and hold edges for the half-cycle path?
197
9. How you will perform cell spreading in placement if congestion is there?
10. Tell me about the 2-pass approach in placement.
11. Why we are not taking care of hold violations at the placement stage?
12. Tell me about multi-source CTS.
13. Tell me about Switching power, internal power, average power, peak power,
and IR Drop.
14. Tell me about CPPR.
15. What are the differences between OCV and POCV?
16. What are the setup and hold edges for the half-cycle path?
17. What are the setup and hold edges for the positive latch to negative flop?
18. Tell me about setup and hold violation fixes which are occurred in the same
path.`
19. How will you apply to derate?
20. Tell me about DPT.
21. Tell me about X-talk delta and X-talk noise.
22. What are the different ways to fix setup and hold? Which one is difficult to
fix setup? or hold?
23. Which violation you will fix first? Is it set up or hold?
24. Tell me about scan-chain reordering.
25. What are the timing arcs for flipflop when we have scan-chain reordering?
26. How will you improve your insertion delay?
27. And some other timing-related scenarios w.r.t. setup and hold fixes.
28. Explain sanity checks.
29. What check_design will report?
30. What is the issue, if inputs are floating?
31. Is there any issue in the case of outputs floating?
32. What check_timing will report?
33. What check_library will report?
34. On what basis, macros will be keeping inside the design?
198
35. What is the use of keep-out margin around the macros?
36. Is it compulsory to keep out margin around macros?
37. Explain the order of keeping preplace cells?
38. Tap cells information will be in which file?
39. On what basis, the distance between tap cells will be decided?
40. Format of keeping tap cells inside the core area.
41. How many std cells are being accommodated by each tap cell.
42. What is the use of keeping tap cells in checkerboard format rather than
keeping continuous?
43. What is the purpose of endcap cells?
44. Why can’t we keep endcaps on the top and bottom of macros?
45. What is isolation cell, retentions cell?
46. What are the checks after the floorplan?
47. How the tool will place std cells in the design.
48. How to fix congestion?
49. Prioritize timing DRCs, timing, DRC.
50. What is the purpose of IO buffers?
51. Among Max Trans, Max cap, Max fanout.....which one will be fixed first.
52. Difference between normal buffers and clock buffers.
53. Checks after placement.
54. What are the contents of the clock spec file?
55. What is NDR?
56. When we will enable NDR.
57. What are the inputs to PT?
199
Qualcomm Interview Question (Physical Design Engg)
1. Practical flow of the design?

2. How analog macro is placed?
3. Explain the power plan structure in your design?
4. What is the routing blockage for analog macro?
5. What are the checks after the floorplan?
6. What are the steps in the placement stage?
7. What is the block size, utilization, target skew, WNS of your design?
8. What are the corners in your design?
9. What are the corners considered in the placement stage? why?
10. What are the corners considered in the CTS stage? why?
11. What are the causes for congestion. And how to fix it?
12. What could be the reason for congestion, if there is neither cell density nor pin
density and
also, there is no much communication between nearest macro and std cells as well?
13. How to fix setup and hold issues?
14. Give the order of priority among various setup fixing methods? And reasons
for them?
15. How tran Violation will affect the setup?
16. In case, there are 10000 setup violations in the placement stage, what could be
the issue?
17. What kind of constraint will lead to so many setup violations in the placement
stage?
18. How to analyze timing reports. And how the approach will be for fixing slack?
19. If there are many shorts at one place in the routing stage, what could be the
possible reason?
20. What are the checks after CTS?
21. Why we check hold after CTS?
22. What are the checks after routing?
23. What are the different kinds of DRC checks?
24. In case there are many shorts, opens, setup, hold violations...what you will
address first?
25. What are tap cells? How much distance given in your design and why?
26. What is the use of tap cell?
200
27. If we don’t use tap cells, what error we will get?
28. What are low power techniques?
29. Explain isolation cell, retentions cell, power switches, and level shifters.
30. How to fix static and dynamic IR issues.
31. Assume there are 6 timing corners, if the hold is not able to fix in one corner,
how to fix it without
affecting the other corners. What is the better approach to fix it without affecting
others?
32. Write few commands from ICC2, INNOVUS?
interview questions related to placement stage:
1. What is placement and what are goals of placement?

2. What are the things to be checked before going to placement stage?
3. What are the inputs and outputs for placement stages?
4. After floorplan DEF is generated as input of placement stage what it
contains?
5. What are the challenges faced in placement stages?
6. What are reasons of congestion in the design?
7. What are the different methods of tackling congestion? Where do you find
major congestion in your design?
8. How do you tackle congestion in the core area? You have a region with pin
density high and the router is not able to route to these pins. How to resolve this
issue?
9. How to tackle cell density issue?
10. At what stage do you start looking at your timing reports in the PD flow?
11. How will you check your timing report? What are the things you will look
into it in case of timing violation?
12. What are the types of placement?
13. During placement stage how tool will know that your design is getting
congestion without doing the actual route?
14. What types of routing is carried out by the tool in placement stage global
routing or virtual routing?
15. What happens in global routing?
201
16. What are the switches in place_opt command and how they are useful?
17. What happens during congestion driven techniques? What are the care
should be taken using congestion driven option?
18. What are the types of congestion? And solution to resolve congestion?
19. How to modify physical constraints to reduce the congestion?
20. What happens if you over do keepout margin?
21. How many blockages are there in your design, how will you define the
blockages and keepout margins (halo)?
22. What do you mean by logic optimization techniques and give some
examples of logic optimization?
23. How do you used blockages techniques to effectively reduce congestion?
24. What is magnet placement?
25. What is multibit banking and why we use it and what are the advantages and
disadvantages of that?
26. What is fan-out? What is HFN synthesis why we do it in placement stage?
Can we consider clock in HFN during placement stages?
27. How global route will handle congested paths?
28. What do you means by scan chain reordering why we used it?
29. How to qualify the placement stage?
Power planning related questions for interview:
1. What are the challenge you will see in lower technology?

2. What are the inputs and outputs from the power analysis?
3. What are the checks after power planning is completed?
4. What are the power dissipation components? How to reduce them
5. Why float outputs are ignored but not float gates?
6. How do you calculate the core ring width?
7. What is IR drop? And how will you decrease this?
8. What are general power margins?
9. During power analysis, if you are facing IR drop problem, then how did you
avoid that.
10. What are the effects of IR drop?
11. How IR drop affects setup and hold timing?
202
12. Why high metal layers are preferred for VDD and VSS
13. How to find number of power pads and IO power pads. How the width of
metal and numbers of straps calculated for power and ground.
14. What is power gating?
15. CMOS power consumption details? Different types
16. How you make sure that power structure is good?
17. What is short circuit current and how will you overcome this problem?
18. What is difference between static IR drop and dynamic IR drop?
19. On what all parameter static IR drop and dynamic IR drop depends on?
20. What is the purpose of static IR drop?
21. How to reduce power/ground bounce?
22. How you will fix EM violations. What are the step to minimize
Electromigration?
23. What is clock gating?
24. Why is power planning done and how? Which metal should we use for
power and ground rings & straps and why?
25. What is the difference between level shifter and isolations cells?
26. What is isolations cells and its types?
27. How to find total power chip, what are the problems you can faced with
respect to timing?
28. How the numbers of power straps calculate.
29. How did you do floorplanning?
30. How to calculate core ring width, macro ring width and straps or trunk
width?
31. How do you reduce power dissipation using high VT and Low VT on your
design?
32. What are the various statistics available in IR drop reports?
33. What is the importance of IR DROP analysis?
34. What are low power techniques?
35. What is the difference between footer switch and header switch in power
gating?
36. What is EM self-heating?
37. How the cell modeled while power analysis?
38. How core ring length matters while deciding the core ring width?
39. After adding power straps if you have hot spot what to do?
203
40. How to calculate core ring and straps width?
41. How do you reduce standby (leakage) power?
42. How to do power planning for multi voltage design?
43. What is the tradeoff between dynamic power (current) and leakage power
(current)?
44. What are the power dissipation component? How to reduce them?
45. What are the different reason for high voltage drop in the design?
46. What is the need of UPF and what are the contents in UPF? Is always on cell
present in UPF or not?
47. If you have analog and digital (RAM) macros in your design how to do
floorplanning?
48. How many power domains are there in your project and how are they
interlinked with each other?
49. What is the issue if we see the design having current more than its defined
capacity?
50. How to reduce glitches power violations in the design?
51. Explain power routing structure in your design?
sort of question those I faced in my interviews some are from service based
company and some are from product based also.....
Interview 1
1. What are the inputs of Physical design?
2. What are the content in the .lib, .lef & .tlef files
3. What are the challenges you faced in your design? If you say congestion,
timing, latency then they will ask more question on these challenges.
4. What is value of Tran, cap, latency and skew in your design?
5. What are the OCV & AOCV? Why we go for POCV?
6. What are the commands for multicycle path and generated clock?
7. How you will confirm that netlist is proper?
8. How to decide channel width between macros?
204
Interview 2
1. What are ECO’s and what are the inputs need for ECO, why we need of it?
2. What are metal ECO and Base ECO?
3. What are the inputs of LVS?
4. What is input and output delay?
5. What is insertion delay?
6. What is the clock period in your design and how many levels of clock has
been built?
Interview 3
1. What is DPT layers? How many DPT layers in your design?
2. Why we are using DPT layers?
3. What is the difference between FINFET and CMOS? Fin is gate or
diffusion?
4. How fin is connected to metal layers and via?
5. Consider you have two design design1 with 10k instances and design2 with
1Million instances for both we requires same or different TLU+ files?
6. What are the effects of crosstalk on setup and hold timing?
7. How IR drop is affected timing?
8. IR drop is on power or signal net?
9. In PNR stage how will you handle the timing issues?
10. What is the difference between bounds and region?
11. What are the sanity checks before going for floorplanning?
12. What is the command for setup violating paths?
13. What are placement blockages?
14. What are DRC’s & how will you fix them?
Interview 4
1. What is NDM?
2. How much spacing you will give between macros?
3. What is dynamic power?
4. Is there any rules to put minimum number of straps between two macros?
205
5. What is clock gating and power gating?
6. What are the low power design techniques?
7. What are clock routing rules?
8. What is antenna effect and how will you remove this?
Interview 5
1. What are Physical design inputs in details?
2. What SDC files contains?
3. What is the generated clock and virtual clock?
4. The timing DRV’s (max Tran, max cap, max fan-out) in the SDC and in the
library file are same or different?
5. What is clock skew and its types?
6. What is the difference between clock skew and uncertainty?
7. What SDC contains related to CTS?
8. What are NDR rules and when we apply these rules?
9. What is Electro migration and how to reduce it?
10. What are the guidelines for doing floorplanning? Is there any specific
guidelines in 7nm technology for floorplanning?
11. What is useful skew, local skew and global skew?
12. What is latency? If you face latency issue in your design what will you do?
13. What are tap cells and end cap cells?
14. What problem you faced during placement stages (timing and congestion),
how will you tackle these problems?
15. What are timing DRV’s?
16. What is HVT, LVT, and ULVT cells?
17. What are EM and antenna violations and how to fix it?
18. How to fix Dynamic IR drop?
19. How to fix setup violations?
20. Can single path have both setup and hold violation? If yes why and if no
why?
21. In OCV how to derate the values?
22. What are clock synchronizers why we used them?
23. What are standard cells and how to improve the delay of standard cells?
24. What is path delay, on what factor it is depends?
206
25. What is the difference between crosstalk delay and crosstalk noise?
26. What are the guidelines for floorplanning?
27. How to find the setup for flip-flop? On what factors setup is depend?
Interview 6
1. Why shielding we used?
2. What are Blockage creation command?
3. Fence region and bound difference?
4. What is command to show only all setup violating paths?
5. If clock is not reaching to particular flops how will you report it?
6. What are the number of clocks in your design and how you balance the
skew?
7. How to reduce dynamic power?
8. What is Static and dynamic power and what are the sources of these powers
in CMOS and how to reduce?
9. In CMOS if you are changes the place of PMOS and NMOS with each other
what will happen and why?
10. What is the difference between ASIC and FPGA?
Interview 7
1. What are the inputs for synthesis?
2. How you resolved timing violations in your design?
3. How many metal layers are present in your design? On what basis the metals
are divided?
4. What is the flow for physical design, explain each part?
5. What types of DRC’s you saw in the design and how did you resolve?
6. In library file how particular cell is defined stepwise?
7. Why setup and hold violates in same path? Give reasons.
8. What is the skew if more positive skew which is violate (setup/hold)?
9. In latency what is the root buffer? Why we consider the latency? What is the
use of this?
10. What is LVS, where do you fix LVS & how many types of LVS issues are
there in the design and how you fixed?
207
Interview 8
1. What are the contents in UPF?
2. There is a reg to reg path in that setup is violating then how to fix setup if
you already applied all the techniques?
3. What is isolation cells, if we are not using this what will happened?
4. What is multibit banking and need of it?
5. What are the STA inputs, explain each?
6. What contains in SDC?
7. What is the frequency and time period of your design?
8. How many paths have you seen in your design and types of those paths?
9. What is CRPR and how it work in STA?
10. What is OCV analysis, how to give derate values to the paths?
11. Tell me about your design is power aware or not?
12. What is clock Tran and data Tran and how to fix these violation?
13. In routing stage what are the issues you faced?
14. What is pulse width violation if it is in design how to reduce?
15. What is difference between clock buffer and normal buffer?
Interview 9
1. Have you seen any End cap cells around the macros if yes why we are using
them?
2. What information are presents in DEF. In DEF there are special nets what is
the use of them?
3. What is Eletromigration?
4. What is antenna effect?
5. What type of DRC you seen in your design and how to fix them?
6. Have you checked DRC in ICC2 or only in caliber?
7. What are the NDR rules you follow in your design?
8. How power planning is done?
9. If two tap cells are overlapped what kind of DRC issue will be there?
10. Can we overlap the Macros with Tap cells?
11. Why end cap are present in the design?
12. On what factors you will decide distance between two macros?
208
13. On what basis you placed your macros in core area?
14. What is the flow of your project?
15. If congestion is not reduced by doing all techniques what will you do?
16. Consider two scenario like if you are reducing congestion timing is worst
and if you are doing on timing is good but now congestion is worst then what
should you do for that?
17. Why we are using NDR rules?
Questions related to floorplanning,physical only cells, & inputs of physical

design
Why we are following certain guidelines for macro placement and what are
those guidelines?
Floorplanning is the most important stage in physical design. Quality of your

chip implementation depands on how good id floorplan. A well oraganised
floorplan results in more efficient utilization of the core area thereby aiding the
placement of standard cells without causing issues related to congestion, timing,
signal integrity etc.
if the floorplan is bad, it affects the area, power, reliability of the chip and
requires more effort for closure and it can increase overall IC cost also. It can
create all kind of issues in the design like congestion , timing, IR, routing issues.
When placing large macros we must consider impacts on routing, timing and
power.
GUIDELINES TO PLACE MACROS:
 Placement of macros are the based on the fly-lines ( its shows the
connectivity b/w macro to macro and macro to pins) so we can minimize the
interconnect length between IO pins and other cells.
 Place the macros around to the boundary of core, left some space between
macro to core edge so that during optimization this space will be used for
buffer/inverter insertion and keeping large area for placement of standard cell
during placement stage.
209
 Macros that are communicating with pins/ports of core place them near to
core boundary.
 Place the macros of same hierarchy together.
 Keep the sufficient channel between macros
 channel width = (number of pins * pitch )/ number of layers either horizontal
or vertical
 Avoids notches while placing macros, if anywhere notches is present then
use hard blockages in that area.
 Avoid crisscross connection of macro placement.
 Keep keep-out margin around the four sides of macros so no standard cells
will not sit near to Macro pins. This technique avoids the congestion.
 Keep placement blockages at the corners of macros.
 For pin side of macros keep larger separation and for non-pin side we can
abut the macros with their halo so that area will be save and Halo of two macros
can abut so that no standard cell are placed in between macros.
 Between two macros at least one pair of power straps (power and Ground)
should be present.
 Sensitive blocks (PLL,ADC,DAC ) should be placed far from high
frequency blocks and high frequency IOs.
 Macros alignment and orientation is correct and pins are on the edges
what is the minimum distance required in between macros if channel is there
in non-pin side of macros?
Macro to macro spcing deciding factors are:
1. Pin density
2. Number of metal layers
3. Routing pitch
channel width = (number of pins * pitch )/ number of layers either horizontal
or vertical
Eg. Let’s assume If there is a two macros having 50 pins and the pitch values is 0.6
and the total number of horizontal and vertical layers are 12. Means M0 M2 M4
M6 M8 M10 are horizontal layers and M1 M3 M5 M7 M9 M11are vertical
layers.Channel width = ((50+50)*0.6)/6
210
= 10
what are the checks as you get netlist before going for floorplan?
1. Netlist uniqueness
2. Assignment statement
3. Setup timing check
4. SDC constraints (Clock frequency, uncertainty margins, exception path list
(false path and multicycle path)
1. Why we are following certain guidelines for macro placement and what
are those guidelines?
2. What is the minimum distance required in between macros if channel is
there in non-pin side of macros?
3. What are the checks as you get netlist before going for floorplan?
4. What are pads and what do they do? How the IO pad arrangement will
be done?
5. What floorplan checks do you do to freeze?
6. What are the floor planning steps?
7. What is objective of floorplan?
8. What are Goals of floor planning?
9. What are input and outputs for floor planning?
10. What are the constraint you consider for floor planning of Macros and
standard cells?
11. What are Fence, Guide and region?
12. What is Halo (Padding)?
13. What are Placement blockages? Explain each.
14. What are routing grids and manufacturing grid?
15. What is the different between hierarchical design and flat design?
16. What is the die size if standard cell area is 3 mm2 and macros area is 2
mm2?
17. Could you place the standard cells in core to IO REGION?
18. Why standard cell width is integer multiple of M2 pitch?
19. How much placement density allowed at floorplan stage?
20. What is floorplan and power plan?
21. What are the steps to be taken care while doing floor planning?
211
22. What is core and how will you decide W/H ratio for the core?
23. What is effective utilization and chip utilization?
24. How will you validate your floorplan?
25. What are the steps involved in designing an optimal pad rings?
26. What are the issues if floorplan is not good?
27. How much aspect ratio should be kept?
28. How will you decide pin location in block level design?
29. Why do you use alternate routing approach HVH/VHV?
30. What is the distance between tap cells in design?
31. What happens if you place macros at the center?
32. How do you place macros in a full chip design?
33. What are the parameter that differentiate chip design and block level
design?
34. What is the shape of your block?
35. What is core and standard cell utilization?
36. What are blockage explain each?
37. Can you rotate the macros?
38. What is the difference between standard cell and macros?
39. What are fly lines?
40. What kind of macros you had in the design?
41. What will you do if you have congestion between macros?
42. Your netlist area is grown much more than expected then what will you
do?
43. What are don’t use or don’t touch cells? Who will provide these cells?
44. How the IO pad arrangement will be done?
45. What are the different types of floor planning?
46. If there are too many Pins of the logic cells in one place within the core,
what kind of issue you face and how will you resolve.
47. What are the issues if you see floorplan is bad?
48. What are the standard cell rows?
49. What is difference between soft macros and hard macros?
50. What is partial floorplan?
51. What are the challenges seen as technology shrinks.
52. What are most challenging job in P&R
53. Explain netlist to GDSII flow?
212
54. What are the parameters you will consider while estimating die size?
55. How to decide number of pads in chip level design?
56. How do you use blockages techniques to reduce congestion?
57. Why we need .lib floor planning?
58. What is grid? Why we need and what are different types of grids?
59. What are the steps involved in designing an optimal pad rings?
60. How to decide number of pads in chip level?
61. How to decide number of routing layers?
62. How to decide full chip IO rings?
63. How to decide total number of pins/pads and locations?
64. How to decide design is pad limited or core limited?
65. What is wire bond and flip chip packages?
66. What is gridded and griddles routing?
67. Explain top level pin placement flow? What are the parameters to
decide?
68. What is major advantage of using flip chip over wire bond package?
69. What is need for sanity checks at floorplan stage?
70. What are physical only cells?
71. What corner cells contains?
72. What is the difference between ore filler cells a metal filler cells?
73. What are decap cells and what is the purpose of it? What are the
advantage and disadvantages of Decap cells?
74. Why filler cells are used? Why we need fill in decreasing order of filler
cell size.
75. Can I add spare cells instead of filler cells.so that we have many spare
cells for ECO?
76. Can I use FILL1 cells where i can use FILL32/64?
77. What is ESD (Electrostatic discharge)?
78. Why endcaps cells are used and internal structure how to differentiate
it with filler cells?
79. What are tie high and tie low cells and where it is used?
80. What are the inputs for physical design?
81. What does lef and lib and .tf files contains?
82. How the cell is defined in library?
83. What does sdc file contains?
213
84. What is cell delay and net delay and how it is defined and calculated?
85. What are different timing delays models available and what is WLM?
86. From where do you get the WLM? Do you create WLM? How do u
specify?
87. In which metal do you prefer the IO pins? How many metal layers
(HVH) will you select for the below shaped blocks?
88.How much space/area do you take while doing floor planning if 8*32 bit
bus talking from one macros to another macros?
89.Color region shows the routing congestion inside the chip? What is the
reason?
If I get more questions, I will write here in the sequence...
Ques:What are the advantage and disadvantages of Decap cells?

advantage:
1. stable voltage between power and ground when signal net switch
2. reduce dynamic IR drop
disadvantages:
1. consume chip area as they are bigger in size.
2. leakage power consumption
214
Ques: Why filler cells are used? why we need fill in decreasing order of filler
cell size.
 Special filler cells are required to obtain the final layout, because placed
standard cells can have free spaces between them that need to be filled. These
spaces are also multiple of technology pitch and therefore the filler cells widths are
chosen accordingly.
 If we do DRC check we can expect spacing violations like "NWELL
minimum spacing is not met", this is the condition where we have to use filler cells
for continuity of NWELL.
 If we are placed continuous low width filler cells it forms wide metal and
create metal slotting rule violation.
 The CMP process also introduce undesired side effects like dielectric
erosion and metal dishing.As the number of metal layers and the wafer size is
increases, the need of planarity across wafer becomes more crucial.chemical
mechanical planarization/polishing (CMP) is a process of smoothing surfaces with
the combination of chemical and chemical process.
To avoid metal slot rule violation always insert fat filler cells then thin filler cells.
for example use one 20 um pitch filler cell and one 10 um pitch filler cellls instead
of using 6 "5um pitch filler cells to fill 30 um spacing.
Metal slotting is a manufaturing issue, some process have special rules requiring
metal wires have wide slot(e.g> 10-40 um) .slots are long slits , on the order of
3um wide ,in the wire runninhg parallel to the direction of current flow. they
provide stress relief, help keep the wire in place and reduce the risk of
electromigration failure. these design rules vary widely between manufacturers.
fig: slots in wide metal power bus
215
Ques: can I add spare cells instead of filler cells.so that we have many spare
cells for ECO?
Ans : NO, because
1. more static power consumption of spare cells as compared to filler cells.
2. spare cells are not available in all sizes.
but we can replace filler cells with spare cells with logic.
Ques: Can I use FILL1 cells where i can use FILL32/64?

Ans: No, because following problems will comes
1. metal slotting
2. metal dishing
3. dielectric erosion
Ques: what is ESD(Electrostatic discharge)?
Ans:There is a growing interest in the effects of ESD on the performance of
semiconductor integrated circuits (ICs) because of the impact ESD has on
production yields and product quality. ESD problems are increasing in the
electronics industry because of the trends toward higher speed and smaller device
sizes. ESD is a major consideration in the design and manufacture of ICs.
Static charge is an unbalanced electrical charge at rest. Typically, it is created by
insulator surfaces rubbing together or pulling apart. One surface gains electrons,
while the other surface loses electrons. This results in an unbalanced electrical
condition known as static charge. When a static charge moves from one surface to
another, it becomes ESD. It can occur only when the voltage differential between
the two surfaces is sufficiently high to break down the dielectric strength of the
medium separating the two surfaces. When a static charge moves, it becomes a
current that damages or destroys gate oxide, metallization, and junctions.
ESD can occur in any one of four different ways: a charged body can touch an IC,
a charged IC can touch a grounded surface, a charged machine can touch an IC, or
an electrostatic field can induce a voltage across a dielectric sufficient to break it
down.
216
Floorplanning
What is floor planning?
 A floor planning is the process of placing blocks/macros in the chip/core

area.
 Floorplan determine the size of the design cell (or die), create the boundary
and core area and creates wire tracks for placement of standard cells. It is
also a process of positioning blocks or macros on the die.
 Floorplan is one the critical & important step in Physical design. Quality of
your Chip / Design implementation depends on how good is the Floorplan.
Inputs and Outputs of Floorplan:
Inputs
 Gate level netlist (.v)

 Logical (.db) and physical library(mW)
 Timing constraint (SDC)
 Technology file (.tf)
 RC co-efficient files (TLU+ files)
Outputs
 Design Database (Floorplan Completed)

 Die/Block area decided.
 I/O placed – Macros placed
 Standard cell placement area decided
 Blockages are defined.
 DEF file.
Types Of Floorplan
 Two type of the design are possible.

o Chip Level
o Block Level
217
 Block level design will be rectilinear and chip level will be rectangular in
shape.
Rectangular :- To define this only height and width of the die is required.
Rectilinear :- To define the size more coordinates are required.
218
Another types of the floorplan.
 Channeled floorplans
 Abutted floorplans
 Narrow-channel floorplans
Channeled floorplans
Figure A
 Channeled floorplans contain spacing between blocks for the placement of

top-level macro cells, as shown in Figure.
Abutted floorplans
Figure B
219
 In the abutted floorplan methodology, blocks are touching and the tool does
not allocate space for macro cell placement between blocks.
 Designs with abutted floorplans do not require top-level optimization, as all
logic is pushed down into the blocks. However, abutted floorplans might
require over-the-block routing to meet design requirements.
 This floorplan style also requires more attention to feedthrough
management. Routing congestion might also become an issue for abutted
floorplan designs. An example of an abutted floorplan is shown in Figure.
Narrow-channel Floorplans
Figure C
 The narrow-channel floorplan methodology provides a balance between the

channeled floorplan and abutted floorplan methodologies.
 You can abut certain blocks in the design where top-level cells are not
needed.
 In other areas, you can reserve a channel between certain blocks for top-
level clock routing or other special purpose cells. An example of a narrow-
channel floorplan is shown in Figure.
Floorplan Control Parameter

Aspect Ratio :-
220
 Aspect ratio is the ratio between vertical routing resources to horizontal
routing resources. If the ratio is 1.0 height and width are the same, therefore
the core is a square. If the ratio 3.0 the height is three times the width.
Core Utilization :-
 Utilization defines the area occupied by standard cell, macros and blockages.
In general 70% to 80% of the utilization is fixed because more number of
inverter and buffer will be add during the process of CTS in order to
maintain minimum skew.
Macro Placement Guidelines
 Using FLY line.

 Port communication.
 Macro grouping (Logical Hierarchy).
 Spacing between two macros.
 Macros alignment – orientation.
 Blockages.
 Notches avoiding.
 Macros are rotated as required to optimized wire length.
 Typically, macros are placed around EDGE of block, keeping are large main
area for STD cells.
 Make a HALO around the macros.
Spacing between two macros
221
Blockages
Placement Blockages
 In placement blockages there is a three types of blockages

o Hard (STD cell Blockages)
o Soft (Non-Buffer Blockages)
o Partial Blockages
 Hard (STD cell Blockages)
o Block all STD cell and buffer
 Soft (Non-Buffer Blockages)
o Only buffer can be placed and STD cell cannot be placed.
 Partial Blockages
o By default, a placement blockages has a blockages factor of 100%.
o You can set blockages percentage according to requirements.
Routing Blockages
 Routing blockages block routing resource on one or more layers.
Halo (Keep-Out Margin)
 It is region around the boundary of fixed macros in design in which no other

macros or std-cell can be placed.
222
 It allow placement of buffers and inverter in its area if keep-out margin is
soft.
 If we move the macro than halo also move.
After Floorplan Checklist
 Hard macro placement done according to guidelines?

 Proper blockages created over and around macros?
 Pins and pad placement done according to given constraints or flyline
analysis?
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
=*=*=*=*=*=*=
C------------------------
Powerplanning
What is power planning?
 In power planning power grid network is created to distribute power to each

part of the design equally.
 It is used to provide power to macros and standard cells within the IR-drop
limit.
 Power planning can be done manually as well as automatically through the
tool.
Inputs and Outputs of powerplanning
Inputs
 Database with valid floorplan.

 Power ring and power straps width.
 Spacing between VDD and VSS straps.
 Power layer selection.
223
Outputs
 Design with power structure.
Power Management
Three level of power distribution.
Ring:- Carries VDD and VSS around the chip.
Straps:- Carries VDD and VSS from the ring.
Rails:- Connect VDD and VSS to the standard cell’s VDD and VSS.
Power planning management can be divided in two category.
 Core cell power management

o Power ring are formed around the core.
o If any macro/IP is power critical than create separate power ring for
particular macro/IP.
o Number of straps are created based on power requirement.
 I/O cell power management
o IO cell power planning power rings are formed for I/O cells and
trunks are created between core power ring and power pads.
224
IR Drop Analysis
 The power supply in chip is distributed uniformly through metal layer (VDD
and VSS) across the design, these metal layer have finite amount of
resistance.
 When voltage is applied to this metal wires current start flowing through the
metal layer and some voltage is dropped due to that resistance of metal wire
and current. This drop is called as IR drop.
 This result in increase noise and poor performance.
225
 More IR drop = Delay increases.
Different Type of IR Analysis.
Static IR Drop:- Independent of the cell switching the drop is calculated with the
help of wire resistance.
 Technique to improve static IR drop

o Width of the wire increase.
o Increase the no. of wires.
Dynamic IR Drop:- IR drop is calculated with the help of the switching of the
cells.
 Technique to improve dynamic IR drop

o Placing DCAP cells.
o Increase the no. of straps.
Electromigration
 Electromigration (EM) is generally considered to be the result of momentum

transfer from the electrons due to high current density.
 Atoms get displaced from their original position causing voids(opens) &
hillocks(shorts) in the metal layer.
 Heating also accelerates EM because higher temperatures cause a higher
number of metal ions to diffuse.
Technique to solve EM
 Increase the width of the wire.
226
 Buffer insertion.
 Downsize the driver.
 Switch the net to higher metal layer.
After powerplan checklist
 All Power/Ground supplies are defined separately ?

 Is IR drop acceptable ?
 IS power supply uniform in the corners ?
 Are special cells added ?
 Any power related shorts/Opens in the design ?
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
=*=*=*=*=*=*=
Placement
What is Placement?
 In this stage all the standard cells are placed in the design.
 Tool determine the location of each of the standard cell on the die.
 Placement does not just place the standard cell available in the synthesized
netlist. It also optimize the design.
 Placement will be driven by different criteria like timing driven congestion
driven, power optimization.
Goal of placement
 Timing, Power, Area optimization.

 Routable design.
 No/minimal cell density, pin density.
 Minimal timing DRCs.
Inputs and Outputs of Placement
Inputs
 Netlist.
227
 Floorplanned design.
 Logical and physical libraries.
 Design constraint.
Outputs
 Physical layout information Cell

 placement location
Flow of placement
 Before the start of placement optimization all wire load model (WLM) are
removed.
 Placement uses RC values from virtual route (VR) to calculate timing.
 VR is the shortest Manhattan distance between two pins.
 VR RCs is more accurate than WLM RCs.
Placement is performed in four optimization phases.
 Pre-placement optimization.
 In placement optimization.
 Post Placement Optimization (PPO) before clock tree synthesis (CTS).
 PPO after CTS.
Pre-Placement optimization
 Pre-placement Optimization optimizes the netlist before placement, HFNs

are collapsed. It can also downsize the cells.
In-placement optimization
 In-placement optimization re-optimizes the logic based on VR. This can

perform cell sizing, cell moving, cell bypassing, net splitting, gate
duplication, buffer insertion, area recovery. Optimization performs iteration
of setup fixing, incremental timing and congestion driven placement.
228
Post placement optimization
 Post placement optimization before CTS performs netlist optimization with

ideal clocks. It can fix setup, hold, max trans/cap violations. It can do
placement optimization based on global routing. It re does HFN synthesis.
Post placement optimization
 Post placement optimization after CTS optimizes timing with propagated

clock. It tries to preserve clock skew.
Placement Stages
Placement perform in three phase.
 Global Placement
 Legalization
 Detailed Placement
Global Placement
 Global placement aims at generating a rough placement solution that my

violate some placement constraint while maintaining a global view of whole
netlist.
 Objective :- To minimize the interconnect wire length.
Legalization
229
 Legalization make the rough solution from global placement legal ( i.e. no
placement constraint violation ) by moving modules around locally.
Detailed Placement
 Detailed placement further improve the legalized placement solution in an

iterative manner by rearranging a small group of modules in a local region
while keeping all other modules fixed.
Timing Driven Placement
 Timing-driven placement tries to place cells along timing-critical path close

together to reduce net RCs and meet setup timing.
Congestion Driven Placement
 Congestion-driven placement tries to spread the cells along to reduce the

track usage.
230
Different task in placement
 HFNS (High Fan-out Net Synthesis )

 Scan chain re-ordering
 Special cell adding
HFNS (High Fan-Out Net Synthesis)
 High fan-out net synthesis (HFNS) is the process of buffering the high
fanout nets to balance the load.
Why HFNS?
 To balance the load HFNS is performed too many load affects delay
numbers and transition times. Because load is directly proportional to the
delay. By buffering the HFN the load can be balanced this is called as high
fan-out net Synthesis.
Where HFNS?
 Generally at placement step HFNS performed. And also be perform during

synthesis in design compiler.
Scan chain Re-ordering.
 IC-compiler can perform placement aware reordering of scan cells.

 The scan chain information (SCANDEF) from synthesis can be transferred
to IC compiler in two way:
o By loading the netlist in ddc format (embedded SCANDEF).
o By loading a SCANDEF file (generated after scan insertion) .
Scan Chain Re-Ordering.
 Scan chain not re-ordered. It will create congestion & degrade overall QoR.
231
Same flop placement with scan-chain reordered has better congestion & wire / net
length are reduce.
 Re-ordered scan chain. Reduced congestion
232
Optimization Technique
 Netlist restructuring only changes existing gates, does not change

functionality.
Optimization Technique
 Cloning
o Duplicating gates
 Gate Sizing
 Buffering
 Redesigning fanout trees can change delays on specific paths
 Swapping commutative pins can change the final delay.
Cloning
 Duplicating Gates
Gate Sizing
233
Buffering
Redesigning fanout trees can change delays on specific paths
234
Swapping commutative pins can change the final delay
Note:- To study in detail about above optimizing concept refer VLSI Physical
Design: From Graph Partitioning to Timing Closure book.
Checks After Placement
 Check legalization.
 Check PG connections of all the cells.
 Check congestion, place density & pin density maps, All these should be
under control.
 Timing QoR, There should not be any high WNS violations.
 Minimal max tran & max cap violations.
 Check whether all don’t touch cells & nets are preserved.
 What is the total utilization of the design after placement?
 Setup Timing violations Solved ?
 Congestion is Acceptable ?
 What is the maximum Overflow ?
235
CTS Clock Tree Synthesis
What is CTS (Clock Tree Synthesis)?
 CTS is the process to built a physical clock tree structure between clock
source to sink pins in design.
 CTS process is carried out after placement of macros and standard cells,
because only after placement of cells the exact physical location of cells can
be identified which is needed to establish the tree structure in the design.
 Clock Tree Synthesis is a process which makes sure that the clock gets
distributed evenly to all sequential elements in a design.
Goal Of CTS
Meet logical Design Rule Constraints (DRC)
 Maximum transition delay

 Maximum load capacitance
 Maximum fanout
Meet the clock tree targets
 Minimum skew
 Min/Max insertion delay
Inputs and Outputs of CTS
Inputs
 Detail placement database

 SDC, TLU+, .tf
236
 CTS constraint
Outputs
 Database with properly build clock tree in the design
Quality Check Parameter
 Clock skew
 Pulse width
 Duty cycle
 Clock latency
 Clock tree power
 Signal integrity and crosstalk
Clock skew
 Skew should be minimum. From figure T1-T2 = ~0
Pulse Width
 There should be no pulse width violation.
237
Duty Cycle
 Duty cycle should be equal
Clock Latency
 The latency of a clock is defined as the total time that a clock signal takes to
propagate from the clock source to a specific register clock pin inside the
design.
238
Advantages
 Need less no. of buffer.

 Need less no. of routing resources.
 Reduced clock power dissipation.
Clock Tree Power
 Basically clock tree power is a function of latency and transition.
Latency
 Low latency = Low number of buffer (Save Switching Power)
Transition
 Low transition is good.
239
Signal integrity and crosstalk.
Figure A
240
Figure
B=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
*=*=*=*=*=*=*=
Clock Tree Algorithms
 H-Tree
 X-Tree
 Method of Mean and Median
 Geometric Matching Algorithms
 Pi Configuration
H-Tree
 Minimize skew by making interconnection to subunits equal in length.
241
X-tree
 X-tree is similar to H-tree but only difference is the connections are not
rectilinear.
Difficulties
 Cross Talk
 Routing is not rectilinear
242
Method of Mean and Median (MMM)
 Method of mean and median follows the strategy similar to H-tree

algorithm, provided the H-tree shape is achieved by proper partitioning of
the module.
 It continuously partitions the system into two equal parts (As Fig.) and
connects the center of the whole circuit (module) to center of the two sub-
circuits (sub-module) and thus produces a non-linear tree.
 Here intersection of wires may be possible.
 Figure shows the exact flow of MMM and the process of partitioning.
 Partition keeps on continues until each division contains only one sub-
module.
243
Geometric Matching Algorithm (GMA)
 Geometric Matching Algorithm (GMA) In fig (A), the physical locations of

sub-modules are not symmetric.
 Developing H-tree among these sub-modules is practically not possible.
 At first two sub-modules are grouped together and those trees are named as
X-1, X-2, X-3 and X-4 (fig (B)).
 The optimal entry point may not be equidistant from the entry point of X-1
and X-2; buffer insertion can balance the delay because of un-equal net
length.
 Then two two-point trees are joined together to form a H like structure. The
resultant H-trees are named as X-12 and X-34 (fig (C)).
 As shown in fig (D), the tap points of both the H structure cannot be
connected by using rectilinear nets.
 In order to connect the two trees the geometrical position of one H-tree is
changed compatible to the other tree’s tap point and the resultant will be as
shown in fig (E).
244
Pi Configuration
 In pi configuration, the total number of buffers inserted along the clock path
is multiple of previous level.
 This type of structure uses the same number of buffers and geometrical
wires and relies on matching the delay components at each level of the
clock.
 The pi structure is clock tree is considered to be balanced. The
representation of pi tree is shown in fig.
245
Clock Tree Optimization Technique
 Buffer and Gate Sizing

 Buffer and Gate Relocation
 Level Adjustment
 Reconfiguration
 Delay insertion
 Dummy Load Insertion
Buffer and Gate Sizing
 Sizes up or down buffers and gates to improve both skew and insertion
delay.
246
Buffer and Gate Relocation
 Physical location of the buffer or gate is moved to reduce skew and insertion
delay.
247
Level Adjustment
 Adjust the level of the clock pins to a higher or lower part of the clock tree
hierarchy.
Reconfiguration
248
 Clustering of sequential logic.
 Buffer placement is performed after clustering.
 Longer run times.
Delay Insertion
 Delay is inserted for shortest paths.

 Delay cells can be user defined or can be extracted from by the tool.
 By adding new buffers to the clock path the clock tree hierarchy will change.
249
Dummy Load Insertion
 Uses load balancing to fine tune the clock skew by increasing the shortest
path delay.
 Dummy load cells can be user defined or can be extracted by the tool.
250
Non Default Routing (NDR)
 Double width.
 Double spacing.
 Shielding - It is the process of shield the clock net by VSS/VDD Nets for
reduce cross talk.
251
Effect Of CTS
 Clock buffer are added.

 Congestion may increase.
 Non-clock cell may have been moved to less ideal location.
Checklist After CTS
 Hold fix are done at CTS? How many buffer are added.
 What skew and latency is achieved after CTS?
 Check QoR report, setup and hold are fixed?
 Only clock buffer or pair of inverter used in the clock tree?
 Timing analysis with OCV and Crosstalk done?
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
*=*=*=*=*=*=*=*=*=
Routing
What is Routing?
 Routing is the process of creating physical connections based on logical

connectivity.
252
Inputs and Outputs Of Routing
Inputs
 Netlist with location of blocks and locations of pins (CTS Completed).

 Timing budget for critical net.
 Technology File
 TLU+ File
 SDC
Outputs
 Geometric layout of all nets (.GDS)

 .spef file
 .sdc file
Checklist Before Routing
 Placement completed
 CTS completed
 Power and ground nets routed
 Estimated congestion - acceptable
 Estimated timing – acceptable (~0 ns slack)
 Estimated max cap/trans – no violations
Routing Constraint
Design rule constraint
 Design rule constraints are often related with the manufacturing details
during fabrication. To improve the manufacturing yield, connections of nets
have to follow the rules provided by foundries. It also includes RC value of
each layer.
253
Types Of Routing
 Power routing.
 Clock routing.
 Signal routing.
 Critical routing.
Power Routing
 In the power routing VDD and VSS are routed in power routing.
Clock Routing
 After CTS, clock is routed.
254
Signal Routing
 In the routing stage all the signal pins are routed.
Routing operation stages
 Global Routing
 Track Assignment
 Detailed Routing
o Search and Repair
255
Global Routing
 first partitions the routing region into tiles/rectangles called global routing
cells (gcells) and decides tile-to-tile paths for all nets while attempting to
optimize some given objective function (e.g., total wire length and circuit
timing), but doesn’t make actual connections or assign nets to specific paths
within the routing regions.
 By default, the width of a gcells is same as the height of a standard cell and
is aligned with the standard cell rows.
 Every gcell having the a number of horizontal routing resources and vertical
routing resources.
 Global routing assigns nets (logical connectivity not metal connectivity) to
specific metal layers and global routing cells.
Track Assignment
 Track assignment is a stage wherein the routing tracks are assigned for each
global routes. The tasks that are performed during this stage are as follows
o Assigning tracks in horizontal and vertical partitions.
o Rerouting all overlapped wires.
 Track Assignment replaces all global routes with actual metal layers.
 Although all nets are routed (not very carefully), there will be many DRC, SI
and timing related violations, especially in regions where the routing
connects the pins. These violations are fixed in the succeeding stages.
256
Detail Routing
 The detailed router uses the routing plan laid by the router during the Global
Routing and Track Assignment and lays actually metal to logically connect
pins with nets and other pins in the design.
 The violations that were created during the Track Assignment stage are fixed
through multiple iterations in this stage.
 The main goal of detailed routing is to complete all the required interconnect
without leaving shorts or spacing violations.
 The detailed routing starts with the router dividing the block into specific
areas called switch boxes or Sbox, which are generally expressed in terms of
gcells.
Search and Repair
 The search-and-repair stage is performed during detailed routing after the

first iteration. In search-and-repair, shorts and spacing violations are located
and rerouting of affected areas to fix all possible violation is executed.
Types of Detail Routing
 Grid-Based Routing
 Grid-Less Routing
257
Grid-Based Routing
Grid-Based Routing
 For grid-based routing, a routing grid is superimposed on the routing region,

and then the detailed router finds routing paths in the grid.
 The space between adjacent grid lines is called wire pitch, which is defined
in the technology file and is larger than or equal to the sum of the minimum
width and spacing of wires.
 Switching from layer to layer is allowed only at the intersection of vertical
and horizontal grid lines. In this way, the wires with the minimum width
following the path in the grid would automatically satisfy the design rules.
 Therefore, grid-based detailed routing is much more efficient and easier for
implementation.
Grid-Less Routing
258
Grid-Less Routing
 A gridless detailed router does not follow the routing grid and thus can use
different wire widths and spacing.
 It can handle variable widths and spacing for wires and is, thus, more
suitable for interconnect tuning optimization, such as wire sizing.
 However, gridless detailed routing is generally much slower than the grid-
based one because of its higher complexity.
Routing Congestion
 Routing congestion occurs when too many routes need to go through an area
that does not have enough resources or routing tracks.
 High speed design having more signal integrity constraints such as extra
route spacing or wire routes.
Causes of Congestion
259
Floorplan Congestion
 Between memories & around corner of memories.
High placement density
 Timing is tightly constrained without taking wire impact into account or

optimized using excessive buffering or sizing.
Logic induced congestion
 High amount of connectivity contained within a small area. i.e. Large mux/
High pin density in small cell.
 Limited numbers of routing layer for cost reason can also cause global
congestion.
Solution For Congestion
Floorplan Congestion
 Creating more space around the macros.

 Placement blockages.
 Macro Orientation change.
High placement density
 Spreading of standard cells.

 Remove buffers and pairs of inverters before placement.
 Decreasing the overall target utilization.
Check List After Routing
 Special cells are inserted.

 Final utilization (total & cell row).
 Final congestion figure.
 Power analysis done?
 Total power consumption in design.
260
 Timing analysis for setup & hold slack.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
=*=*=*=*=*=*=
Basic Sign Off
What is sign off?
 Sign off is a process of logical and physical verification of our chip.
Three types of checks done in sign off stage
 Logical Checks
 Physical Checks
 Power Checks
Sign-off Checks
Logical Checks
 LEC ( Logical Equivalence Checks)

 Post Layout STA
Physical Checks
 LVS (Layout vs Schematics)

 DRC (Design Rule Checks)
 ERC (Electrical Rule Check)
 Antenna Check
Power Checks
 Dynamic IR
 EM (Electromigration)
261
Logical Checks
 LEC (Logical Equivalence Checks)

o Ensure the functional check between RTL and Netlist.
o LEC can be perform between any two representations of a design:
RTL vs Netlist OR Reference Netlist vs Golden netlist.
 Inputs: Library files, golden Netlist, reference netlist
Post Layout STA
 All timing checks are performed again using the actual parasitic extracted
after the routing of design.
 INPUTS: library files, constraints file, routed netlist, SPEF (SBEF/SDF)
 Tools: Prime Time from synopsys, tempes from cadence.
 Tool to generate RC extraction: starRC, QRC
 Input needed for starRC: .tf, .nxtgrdn(generated from .itf file using
grdgenxo tool), GDS-II or DEF/LEF
 Timing checks perform: setup checks , hold checks, DRC checks.
Physical Checks
LVS (Layout vs Schematics)
 Inputs: .v netlist of the design, GSD-layout database of the design, LVS

rule deck (.v and GDS should be of the same stage).
 LVS rule deck file contains the layer definition to identify the layers used in
layout file and to match it with the location of layer in GDS. It also contains
device structure definitions.
262
LVS check involve three steps:
 Extraction:
o The tool takes GDSII file containing all the layers and uses polygon
based approach to determine the components like transistors, diodes,
capacitors and resistors and also connectivity information between
devices presented in the layout by their layers of construction. All the
device layers, terminals of the devices, size of devices, nets, vias and
the locations of pins are defined and given an unique identification.
 Reduction:
o All the defined information is extracted in the form of netlist.
 Comparison:
o The extracted layout netlist is then compared to the netlist of the same
stage using the LVS rule deck. In this stage the number of instances,
nets and ports are compared. All the mismatches such as shorts and
opens, pin mismatch etc.. are reported. The tools also checks topology
and size mismatch.
LVS check includes following comparisons:
 Number of devices in schematic and its layout.

 Type of devices in schematic and its layout.
 Number of nets in schematic and its layout.
263
Typical errors which can occur during LVS checks are:
 Shorts: Shorts are formed, if two or more wires which should not be
connected together are connected.
 Opens: Opens are formed, if the wires or components which should be
connected together are left floating or partially connected.
 Component mismatch: Component mismatch can happen, if components of
different types are used (e.g, LVT cells instead of HVT cells).
 Missing components: Component missing can happen, if an expected
component is left out from the layout.
 Parameter mismatch: All components has it’s own properties, LVS tool is
configured to compare these properties with some tolerance. If this tolerance
is not met, then it will give parameter mismatch.
DRC (Design Rule Checks)
 Design rule checks are nothing but physical checks of metal width, pitch and
spacing requirement for the different layers with respect to different
manufacturing process.
 If we give physical connection to the components without considering the
DRC rules, then it will lead to failure of functionality of chip, so all DRC
violations has to be cleaned up.
 After the completion of physical connection, we check each and every
polygon in the design, based on the design rules and reports all the
violations. This whole process is called Design Rule Check.
Typical DRC rules are:
 Interior
 Exterior
 Enclosure
 Extension
264
Interior
Exterior
Enclosure
265
Extension
ERC (Electrical Rule Checks)
 ERC involves checking a design for all electrical connection.

 Checks such as:
o Well and subtract area for proper contact and spacing.
o Unconnected input or shorted output.
o Gates should not connect directly to supply (Must be connect through
TIE high/low cells only).
 Floating gate error:
o If any gate is unconnected, this could lead to leakage issues.
 VDD/VSS errors:
o The well geometries need to be connected to power/Ground and if the
PG connection is not complete or if the pins are not defined, the whole
layout can report errors like “NWELL not connected to VDD.
266
Antenna Check
 The antenna effect is caused by the charges collected on the floating

interconnects, which are connected to only a gate oxide. During the
metallization, long floating interconnects act as temporary capacitors and
store charges gained from the energy provided by fabrication steps such as
plasma etching and CMP. If the collected charges exceed a threshold, the
Fowler-Nordheim (F-N) tunneling current will discharge through the thin
oxide and cause gate damage. On the other hand, if the collected charges can
be released before exceeding the threshold through a low impedance path,
such as diffusion, the gate damage can be avoided.
 For example, considering the routing in Figure A, the interconnects are

manufactured in the order of poly, metal 1, and metal 2. After manufacturing
metal 1 (see Figure B), the collected charges on the right metal 1 pattern
may cause damage to the connected gate oxide. The discharging path is
constructed after manufacturing metal 2 (see Figure C), and thus the charges
can be released through the connected diffusion on the left side.
267
Solution for antenna violation
 There are three kinds of solutions to reduce the antenna effect.

o Jumper insertion: Break only signal wires with antenna violation
and route to the highest level by jumper insertion. This reduces the
charge amount for violated nets during manufacturing.
o Embedded protection diode: Add protection diodes on every input
port for every standard cell. Because these diodes are embedded and
fixed, they consume unnecessary area when there is no violation at the
connecting wire.
o Diode inserting after placement and routing: Fix those wires with
antenna violations that have enough room for “under-the-wire” diode
insertion. During wafer manufacturing, all the inserted diodes are
floating (or ground). One diode can be used to protect all input ports
that are connected to the same output ports. However, this approach
works only if there is enough room for diode insertion.
Power Checks
EM – electro migration
 EM leads to open circuits due to voids in wires or vias and leads to short
circuits due to extrusions or “hillocks” on wires. During older technology
nodes EM was considered only on power wires and clock wires, But now
signal wires also need to be considered due to increased current density in
them.
Method to fix EM
 Widen the wire to reduce current density.

 Keep the wire length short.
 Reduce buffer size in clock lines.
 Lower the supply voltage.
268
IR drop
 IR Drop can be defined as the voltage drop in metal wires constituting power
grids before it reaches the vdd pins of the cells.
 IR drop occurs when there are cells with high current requirement or high
switching regions.
 IR drop causes voltage drop which in-turn causes the delaying of the cells
causing setup and hold violations. Hold violations cannot be fixed once the
chip is fabricated.
Two types of IR drop
 Static IR drop analysis

 Dynamic IR drop analysis
Method to reduced IR drop
 Robust power mesh

 De-cap Cell
 Spacing
 Reducing load
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
=*=*=*=*=*=*=
269

Pddesign

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Pddesign

Uploaded by

Copyright:

Available Formats

WHAT IS PHYSICAL DESIGN

Physical design directly impacts circuit performance, area, reliability, power,

**GDS- Graphical Data system

1. The physical design is the process of transforming a circuit description into

The main steps in the ASIC PHYSICAL DESIGN flow are:

1. Design netlist (after synthesis)

INPUTS FOR PHYSICAL DESIGN

Name of Inputs File format Given by

Description of all inputs

Synopsys Design Constraints(SDC) :

 create clock definition

Timing library/logical library (.lib):

 It contains timing information of standard cells, soft macros, hard macros.

CCS(Composite current source) NLDM (Non-Linear Delay Model)

in .lib file following units are present,

 It contains physical information of standard cells, macros, pads.

 It is a table containing wire cap at diffrent net length and spacing.

Interview Questions related to this topic:

Inputs for floorplan:

fig 1. ASIC design

1. Decide core width and height for die size estimation.

floorplan control parameter:

 Hard macros: The circuit is fixed. We can’t see the functionality

fig: placement of IO pads, standard cells and macros hierarchy wise

Fig. Abutted non-pin side macros with abutted halo

fig: pin side macros with halo

types of floorplan techniques:

standard cell row:

macros to macros fly lines:

ICC command for keepout margin:

 only buffer can be placed.

 Partial blockages limit the cell density in the specified area.

PHYSICAL ONLY CELLS

PHYSICAL ONLY CELLS:

fig1: TAP Cells and END CAP Cells

Solution for latch-up problem:

How the circuit looks like and how it will work:

fig: TIE LOW CELLS

fig2: Filler cells, Decap cells

Well proximity effect:

well proximity effect

With increasing clock frequency and decreasing supply voltage as technology

Boundary cells (End cap cells):

Ring and core power ground

Power planning involves

Inputs of power planning

POWER PLANNING MANAGEMENT

I/O cell power management:

Ideal power distribution network has the following properties:

There are two types of IR drop

Methods to improve static IR drop

Methods to improve dynamic IR drop

Tools used for IR drop analysis:

Methods to solve EM:

UPF & special cells used for power planning

Unified Power Format (UPF):

UPF is an IEEE standard and developed by members of Accellera.UPF is

SOURCES OF POWER DISSIPATION IN CMOS

Sources of power dissipation:

Static dissipation due to:

On rising edge output change Q = CVDD is required to charge the output

Pswitching = CV2DD fsw