
1. MOSFET leakage power

2. The most effective way to reduce dynamic power is to
reduce the supply voltage. The trouble with lowering
VDD is that it tends to lower IDS, the on or drive
current of the transistor, resulting in slower speeds.
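A minimal sketch of why supply voltage is the strongest lever, assuming the usual switching-power model P = alpha * C * VDD^2 * f. The activity factor, capacitance and frequency below are made-up illustration values, not figures from these notes.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq

# Hypothetical block: 10% activity, 1 nF switched capacitance, 500 MHz.
p_high = dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.2, freq=500e6)
p_low = dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.9, freq=500e6)

# Power scales with the square of VDD, so the ratio is (0.9 / 1.2) ** 2.
ratio = p_low / p_high
print(f"power at 0.9 V is {ratio:.1%} of power at 1.2 V")
```

Because of the quadratic term, a 25% reduction in VDD removes a little under half of the switching power in this model, at the cost of the lower drive current described above.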
3. Multi-voltage design challenges
• Level shifters are unavoidable, and power rail design becomes complex.
• Cell libraries may not be characterized at the voltage being used, and STA becomes complex.
• Additional voltage regulators are required at board level.
• A careful power-up and power-down sequence is required to avoid deadlock.
• For voltages in a close range (0.9 V to 1.2 V), if level shifters are not used, an input signal may turn on both the PMOS and the NMOS and cause crowbar current. This may lead to rise/fall time degradation and timing failure.
• If the voltage changes during operation, level shifters pose a challenge.
• Care should be taken that a voltage domain always has the same swing relationship (high-to-low or low-to-high) with its neighboring domains.
• High-to-low level shifters require a single power rail.
• Low-to-high level shifters are more critical. Under-driven signals degrade rise/fall times and cause more crowbar current, higher switching current and CTS challenges. The power rail becomes complex, and these shifters add more delay.
• When two voltage domains sit inside another domain, the problem becomes more challenging when a signal travels from one domain to another through a third domain.
• STA of multi-voltage designs: timing constraints must be provided for each supply voltage level. Each domain may have a different clock frequency and different performance objectives.
4. Multi-voltage domain system design issues
• Power-up sequence: it is not practical to bring up all power supplies at precisely the same time.
• An explicit power sequence must be defined.
• All domains must be powered up completely before reset is released.
• The CPU has to wait until the rest of the chip is powered up before booting.
• Crystal oscillators and PLLs require technology-dependent stabilization/lock times, which begin only after the SoC is powered up.
• Voltage regulators are required to avoid voltage overshoot and undershoot.
POWER GATING CONSIDERATIONS
• Power gating the entire CPU provides a good power reduction.
• But the wake-up response time has a negative impact.
• Net power savings depend on the wake-up profile (how much energy is spent reloading state).
• The CPU must be powered off only after completing its current task, so that it can resume cleanly after wake-up.
• Outputs of a power-gated block may ramp down very slowly and cause crowbar current in the powered-on block. Isolation cells prevent this.
• Use retention cells in place of normal flops. Retention cells have shadow registers, which are slower than the main registers but have very low leakage current. When the retention enable signal is asserted, the contents are copied to the main registers.
• A fine-grain power switch is placed inside each standard cell and has a large area overhead.
• A coarse-grain power switch supplies power to the whole block and has little area overhead.
• In-rush current needs to be controlled to avoid excessive IR drop.
• Power gating challenges: power switching fabric, power gating controller, isolation, retention cells, impact on timing and area, clocks, resets, correct SDC and low-power verification.
• Daisy-chain connection of header switches introduces a certain delay before power-up completes.
• The power switching fabric contains AON (always-on) buffers and adds to power routing complexity.
• Power gating control signals must be bypassed during DFT.
• Isolation cells avoid crowbar current in the AON domain and may add delay in the critical path. Transistor-level pull-up/pull-down circuits can be used to clamp a signal high or low for isolation, but they suffer from metal migration problems and make DFT difficult.
• It is preferred to place isolation cells at the source.
• The isolation enable signal must be buffered only by AON buffers.
• Reusable IPs must be designed with isolation cells inside.
• For complex protocol signals, latched isolation cells are used so that the block restarts from the held state rather than from the reset state.
• Isolation control signals must be handled so that stuck-at faults are detected in test mode.
• Avoid isolation of clock signals.
State retention & restoration methods
• Software reads the registers and writes them back after power-on.
• Restoration with scan chains (RTL code must be written to debug the retention scan operation). The number of scan chains should match the memory data bus width. Saving and restoring retention flops costs overhead time and may cause IR drop issues: during scan, toggle activity is higher than in normal operation because every flop in the scan chain can potentially toggle on each clock. A separate scan chain is needed for each power domain.
• Typically all scan chains should have the same length; otherwise balancing flops have to be added.
• Functional simulation could be a challenge.
• Using retention registers: a shadow register holds the retention data and is powered from the AON Vdd. This costs 20-50% area overhead.
• Save and restore signals should be in the AON domain.
• To keep retention transparent to the RTL design, neither the clock nor the reset is active during retention.
• Retention must have priority over clock and reset.
• Retention library cells must ensure that the contents are not corrupted by floating clock or reset inputs.
• Partial state retention poses a challenge: non-retained registers (FIFO/memory/counter) must power up in legal, safe states.
• In a partial retention implementation, ensure the state machine has no dependency on non-retained registers.
• Retention controls must be controllable and observable during scan mode.
• There should be no X propagation after power-up, and only the non-retained registers should be reset. Use separate resets for retention and non-retention registers.
• Does clock gating pose a challenge?
• If both positive- and negative-edge flops are retained, it may not be possible to restore all the data correctly?
• Is scan testing of retention registers complicated?
POWER CONTROL SEQUENCE
1. Finish the current transaction
2. Stop the clock
3. Assert isolation
4. Assert retention
5. Assert reset to non-retained registers
6. Assert the power gating control signal to power down
7. De-assert the power gating control signal to power up
8. De-assert reset after power is stabilized
9. Assert retention restore
10. De-assert isolation
11. Resume the clock
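The ordering in the power control sequence above can be checked mechanically. The sketch below is only an illustration: the step names and the `out_of_order_steps` helper are hypothetical labels for the listed steps, not part of any tool or of UPF.

```python
# Hypothetical labels for the steps of the power control sequence, in order.
SEQUENCE = [
    "finish_transaction",
    "stop_clock",
    "assert_isolation",
    "assert_retention",
    "assert_reset_non_retained",
    "power_down",
    "power_up",
    "deassert_reset",
    "assert_retention_restore",
    "deassert_isolation",
    "resume_clock",
]

def out_of_order_steps(trace):
    """Return the steps in `trace` that occur after a step that should follow them."""
    position = {step: i for i, step in enumerate(SEQUENCE)}
    latest = -1
    late_steps = []
    for step in trace:
        idx = position[step]
        if idx < latest:
            late_steps.append(step)
        else:
            latest = idx
    return late_steps

# A trace that releases isolation before the retained state has been restored.
bad_trace = SEQUENCE[:8] + ["deassert_isolation", "assert_retention_restore", "resume_clock"]
print(out_of_order_steps(SEQUENCE))   # correct order
print(out_of_order_steps(bad_trace))  # restore happens too late
```

A check like this is the kind of property a power gating controller testbench might assert; real low-power verification would work from the UPF power state table rather than a fixed list.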

The power gating switch fabric must be designed to limit voltage spikes (which might corrupt retention registers). This is achieved by limiting the current during power-up, and thus limiting the rate at which the voltage rises to its final value.
To cope with these delays, a request/acknowledge handshake is used.
Power-up should not begin before the block is completely powered down.
IDDQ test? (IDDQ is the quiescent supply current; an alternative to DFX/DFT/DFM checks.) It is done to verify that the power switch turned off correctly.
For long-term leakage power savings, external power rail switching is used, but it has a significant turn-on delay.
On-chip power-on may take hundreds of clock cycles.
For signals crossing from a power domain that can be turned off, such corner conditions have to be handled by defining a power state table.
Low Power IP development
Multi vt (synthesis scripts)
Clock gating (RTL)
Power gating (UPF)
Voltage scaling (UPF)
5. Bidirectional level shifters are not used because of analog design issues.
6. An NMOS is ON when its gate is tied to logic high, and it passes a strong 0. It can charge the load capacitance only up to Vdd - Vt, because the NMOS turns off beyond this level. So an NMOS passes a weak 1.
7. A PMOS is ON when its gate is tied to logic low. It can charge the load capacitance all the way to Vdd while it is ON, so it passes a strong 1. When it is used to pull the load towards Gnd, it can discharge the capacitance only down to |Vt|, because the PMOS turns off beyond this level. So a PMOS passes a weak 0.
8. An NMOS is about twice as fast as an equal-size PMOS because electrons have roughly double the mobility of holes. So in a CMOS inverter the PMOS should be about twice as wide as the NMOS to get equal rise and fall times (e.g. clock buffers).
9. Substrate bias Vsb increases the threshold voltage (the body effect) and hence reduces leakage current.
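For reference, the standard textbook form of the body-effect relation is given below. It is assumed here to be the relation item 9 refers to; V_T0 is the zero-bias threshold voltage, gamma the body-effect coefficient and phi_F the Fermi potential.

```latex
V_T = V_{T0} + \gamma \left( \sqrt{\left|2\phi_F\right| + V_{SB}} - \sqrt{\left|2\phi_F\right|} \right)
```

With V_SB = 0 the bracket vanishes and V_T = V_T0; a positive V_SB makes the bracket positive, which raises V_T and lowers subthreshold leakage.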

10. Temperature inversion:
In general, when temperature rises, mobility decreases and so delay increases. At the same time, threshold voltage decreases as temperature rises, which reduces delay. At larger (older) technology nodes, cell delay increases with temperature because the mobility variation is dominant. At smaller technology nodes, the threshold voltage variation is dominant and cell delay decreases as temperature rises; this is known as temperature inversion.
11. A track is the path along which a net can be routed. A 12-track cell is taller and faster than a 9-track cell.
12. Each cell, macro and IO pad has an associated orientation: R0, MX, MY, R90, R180, R270, MX90, MY90, etc.
13. Manufacturing grid: the smallest resolution of the technology node. Any geometry created in the design must align to the grid to avoid DRC violations.
14. Physical cells: these cells have no functionality in the design. Examples are tap, endcap, decap, tie, filler and spare cells.
15. A track is a virtual line (guideline) for the PnR tool. For each metal layer in the design, tracks are defined for the preferred and non-preferred directions with a specific pitch and offset.

16. Pitch: if two parallel wires of width w are separated by spacing s, the pitch is w + s. Aspect ratio AR = t/w, where t is the wire thickness. In earlier technologies AR << 1; in modern processes AR is about 2 because the width has been reduced.
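A small sketch of the two definitions in item 16. The dimensions are made-up values chosen only to give an aspect ratio of about 2; they are not taken from any real process.

```python
def wire_pitch(width, spacing):
    """Pitch of parallel wires: width plus spacing."""
    return width + spacing

def aspect_ratio(thickness, width):
    """Wire aspect ratio AR = t / w."""
    return thickness / width

# Hypothetical wire: 0.05 um wide, 0.05 um spacing, 0.10 um thick.
pitch_um = wire_pitch(width=0.05, spacing=0.05)
ar = aspect_ratio(thickness=0.10, width=0.05)
print(pitch_um, ar)
```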

17. M1 is closest to the substrate and is used to connect transistors. M2-M4 are used for interconnect. M5-M6 are the thickest layers and are used for VDD, GND and CLK. Resistance increases in the order M6, M5, ..., M1, poly, diffusion.

18. Standard cells connect on the M1 layer. Interconnect takes place in the higher layers, with M2 routed horizontally and M3 routed vertically; connections between layers are made through vias.
19. Each macro cell typically has both a CEL view
and a FRAM view.
20. The FRAM view is an abstraction of the cell
containing only the information needed for placement
and routing
21. CEL view is used only for generating the final
stream of mask data for chip manufacturing
22. Partitioning is the hierarchical decomposition of a complex system/netlist into manageable subsystems (netlists), so that the partitions are similar in size and the numbers of interconnections between them are also similar.
23. Dead Space: Wasted space in the layout

24. Latchup in Bulk CMOS


A byproduct of the Bulk CMOS structure is a pair of
parasitic bipolar transistors. The collector of each BJT
is connected to the base of the other transistor in a
positive feedback structure. A phenomenon called
latchup can occur when (1) both BJT's conduct,
creating a low resistance path between Vdd and
GND and (2) the product of the gains of the two
transistors in the feedback loop, b1 x b2, is greater
than one. The result of latchup is at the minimum a
circuit malfunction, and in the worst case, the
destruction of the device.
25. Tap cell:
Usually, in most applications, the Body is connected
to the most positive supply in case of PMOS(Nwell)
devices and to the most negative(ground) in case of
NMOS (p substrate). This is to avoid leakage to the
substrate. This is done using TAP cells. These cells
“tap” into the nwell and psubstrate and make sure that
they are at the required potential. Since the area of
the nwell/substrate is large, a single tap point is not
good enough. Hence multiple tap cells are placed
around the devices to absorb noise and maintain a
constant bulk potential.
Tap cells are a special nonlogic cell with well and
substrate ties. These cells are typically used when
most or all of the standard cells in the library contain
no substrate or well taps.
Generally, the design rules specify the maximum
distance allowed between every transistor in a
standard cell and a well or the substrate ties.
NETLIST
Netlist: the format is .v
It contains the logical connectivity of all cells (std cells, macros).

1. .lib and .db carry the same information and are supplied by the vendor (TSMC). Cadence uses .lib and Synopsys uses .db.
2. .lef and .mw carry the same information and are supplied by the vendor (TSMC). Cadence uses .lef and Synopsys uses .mw (Milkyway).
3. We generate a Milkyway library of the cells of the design at the different phases (import, floorplan, power plan, etc.).
INPUTS TO PHYSICAL DESIGN

Library file      Extension / Vendor    Remarks
Logical library   .lib / .db, TSMC      Functionality, delay, PVT, transition, setup, hold, power, DRC
Physical library  .lef / .mw, TSMC      Pin, unit tile, dimension, antenna, routing blockage, cell & FRAM views
Technology file   .tf, TSMC             Metal layers, min width, area, height, current density, units of layer, via, wire spacing, min width between layer & via
TLU+              .tluplus, TSMC        RC parasitics
TLF               .tlf, TSMC            If TLU+ is not present, derive RC from .tlf & map files
TDF               .tdf                  Contains IO pad pin & port info
Netlist           .v, synthesis         Synthesis netlist
SDC               .sdc, synthesis       Timing constraints

SANITY CHECK:
Sanity checks are performed on the netlist.

ICC command      What it does
check_library    1. Physical library quality: pin direction, size, layer consistency between the tech file & the mw file (same names across different physical libraries)
                 2. Logical library quality: pin, area, delay, leakage numbers
                 3. Logical vs physical consistency
check_design     Dumps design stats (how many gates/macros/ports); multi-driven nets (shouldn't be there); uniqueness of modules (give different names to different instantiations of the same module); floating input pins & output ports. Floating inputs should be connected to known logic values to avoid metastability issues.
check_timing     Netlist vs SDC consistency check, netlist vs library consistency check; unconstrained IO ports, no input drive, no output load, clock not found/not reaching, combinational loops (the timing tool takes a long time to calculate delay)
report_timing    Timing violations

Macro: SRAM; IP: PLL

FLOOR PLAN:
Goals and tasks
Optimal size: size, shape, aspect ratio and utilization, where utilization = (std cell area + IP area + macro area) / design area, typically 60%
Min congestion
Better routability
Meet timing
Optimal data flow
More area is needed for optimization cells, special-purpose cells, routability space, blockage/halo cells and power density concerns.

FLOORPLAN
1. Size and shape
2. IO area creation, IO pin placement
3. Block-level IO pin placement is done based on the connectivity from the adjacent blocks
4. IP/macro placement
5. Placement blockages, keepouts
6. Die size: macro area + std cell area + blockage + IO area
Chip/die utilization = (std cell + macro + IP + IO area) / die area
Std cell utilization = std cell area / (die area - [IO + macro + IP area])
Core area = die area - IO area
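The three area formulas above translate directly into code (the "%" in the formulas is read here as division). This is a sketch with made-up area numbers in arbitrary units, and the function names are illustrative only.

```python
def chip_utilization(std_cell, macro, ip, io, die):
    """(std cell + macro + IP + IO area) / die area."""
    return (std_cell + macro + ip + io) / die

def std_cell_utilization(std_cell, macro, ip, io, die):
    """std cell area / (die area - [IO + macro + IP area])."""
    return std_cell / (die - (io + macro + ip))

def core_area(die, io):
    """Core area = die area - IO area."""
    return die - io

# Hypothetical areas (arbitrary units).
areas = dict(std_cell=36.0, macro=20.0, ip=10.0, io=10.0, die=100.0)
chip_util = chip_utilization(**areas)
cell_util = std_cell_utilization(**areas)
core = core_area(die=areas["die"], io=areas["io"])
print(f"chip {chip_util:.0%}, std cell {cell_util:.0%}, core area {core}")
```

With these numbers the standard cell utilization comes out at the typical 60% mentioned in the floorplan goals, while the chip utilization is higher because macros, IP and IO count as fully used area.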
IO port placement:
IO ports: power, ground, signal IO, clock, reset, feedthrough
Interface types: system-synchronous, source-synchronous (DDR), self-synchronous (SerDes)
Feedthroughs are defined for routing feasibility. A PLL has a reference clock input and a few control inputs, and generates clock outputs in phase with the reference clock. Place the PLL near the refclk port. Place the clock port in the center of the die.
IP/macro placement: the order of preference is IO-to-macro connections, then macro-to-macro, then macro-to-std-cell.
Memories have IO ports on one side and VDD/VSS on the other side. Memory power connection is on M4. Macros don't sit on the rows (i.e. they are not a multiple of the std cell size). Orientation of memories is very critical: poly must be unidirectional (the gate layer should be either horizontal or vertical all over the design). Std cells have vertical poly (see the vendor document), so macros should be oriented to have vertical poly. The floorplan impacts wire length.
Apply a keepout/halo around macros and IPs: a placement blockage (1 um) around macros for routability.
RAM stacking: keep some gap between stacks (groups) of macros based on their length. The stack spacing allows area for inserting buffers (optimization) on the wires routed between the memories; use a soft placement blockage there. Memories block a few routing layers, and base layers are present in the memory.
Blockage: keepouts/halos define blockages around macros to ease routing and avoid routing congestion. Halo regions of adjacent macros can overlap. If we move the macro, the halo moves along with it, whereas a blockage/keepout does not move with the macro.
Hard: no standard cells or buffers can be placed
Soft: only buffers are allowed to be placed
Partial: some % of blockage can be specified
Channel width = (number of pins x metal pitch) / (number of effective routing layers)
Effective routing layers = layers / 2
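A sketch of the channel width estimate above. The pin count, pitch and layer count are hypothetical; the layers / 2 rule is taken from the notes (presumably because only half of the layers run in the direction of the channel).

```python
def channel_width(num_pins, metal_pitch, num_layers):
    """Estimated channel width = pins * pitch / effective routing layers."""
    effective_layers = num_layers / 2  # rule of thumb from the notes
    return num_pins * metal_pitch / effective_layers

# Hypothetical channel: 200 macro pins, 0.2 um pitch, 6 routing layers.
width_um = channel_width(num_pins=200, metal_pitch=0.2, num_layers=6)
print(f"estimated channel width: {width_um:.2f} um")
```

The estimate covers signal routing only; room for the power stripes in the channel has to be added on top.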
Power stripe in channel: at least one pair of VDD and VSS stripes should run through each channel (the area between memories). The channel width should leave enough spacing for the power stripes.
Avoid criss-cross routes and orient memories for the shortest routes and better routability.
Avoid notches and create a continuous core area for std cells. Place memories towards the boundary, not in the center, to create a continuous area for std cells.
Make sure macros don't block IO port routing.
Fix (set the fixed attribute on) the memory placement.
Checklist after floorplan:
1. Memory overlaps are to be avoided
2. IO ports should not be blocked
3. RAM channels should accommodate at least one pair of power/ground stripes
4. All ports/memories/IPs should be placed and have the fixed attribute
5. All memory channels have soft blockages
6. All ports and components are on grid (manufacturing, track). Ports should be placed on track
Commands for checking placement violations: check_legality, check_fp_pin_assignment
Macros should be distributed evenly to avoid notches, which cause routing congestion.
The power ring is to be placed before placing standard cells.

POWER PLAN
1. Power consumption: Ps + Pd (Pint + Psw)
2. Power delivery (power mesh)
PPA: high power requires a heat sink (cost) and is limited by the battery.
Electromigration: when current flows through a wire, it displaces atoms of the wire from one part to another, making the wire thinner in some places and thicker in others, so its resistance varies. (The foundry gives data for 5/7/11 years.) The current should not exceed a limit, so that the resistance change stays below 10%. Limits are given per layer and width (M1, W1: Idc limit 1; M2, W2: Idc limit 2). Linear expansion affects width.
Copper has better electromigration behavior than aluminum.
IR drop & ground bounce:
Normally the highest metal layers are used for power routing because the resistance of the top layers is lower.
The power mesh is created so that no short between power and ground occurs after placement of the std cells.
set_keepout_margin -type hard -all_macros -outer {2 2 2 2}
cut_rows near macros removes the site rows over the macros so that standard cells can't be placed there.
A floating-shapes error is reported if there is no via available to connect to a power strap. These errors go away after adding filler cells.
Placement & Optimization:
Before placing standard cells, the following physical cells are placed:
1. Endcap cells are added at the start and end of each row to avoid well proximity effects.
2. Tap cells are placed at regular intervals to connect the well and substrate to power and ground, to avoid latchup (the back-to-back parasitic BJT connection), which shorts power to ground. Tap cells help maintain the IR drop requirement. In a standard cell the N-well and substrate are not connected to VDD or GND, as that would cost more area?
3. IO buffers are placed near the IO ports to strengthen the signals.
4. Spare cells are placed for ECOs, which reduces the mask cost if a re-tapeout is required.
5. Placement guidance: some cells, such as sync cells, are to be placed close together using bounds (soft/hard/exclusive) so that no other cells can be placed in the vicinity; blockages/keepouts can also be used.
After adding the above cells, standard cells are placed.
A. Coarse placement (timing- or congestion-driven)
B. Legal placement (right location and orientation)
C. Trial/global routing is performed
D. Optimization (congestion/timing) is performed
Post placement:
Tie cell addition and scan chain reordering
A tie cell contains a low-pass filter that filters high-frequency power/ground noise and fluctuations during circuit operation. Scan chain reordering is done to reduce scan wire length and congestion.
1. QoR is checked for routability (congestion), timing (DRV, DRC, setup), power and area.
2. Congestion is the measure of routability. It is reported as overcon, wire length and a congestion map.
3. Reasons for congestion: high global placement density, high local placement density, high pin density.
4. Congestion impact: routability (DRCs, shorts, timing deterioration and crosstalk).
5. The report contains horizontal and vertical overcon numbers (1000 cells 500 -> -2 track deficiency), e.g. 0.5 (H) and 0.2 (V). The ratio (number of routes) / (number of tracks) should be less than 1.
6. To analyze or mitigate congestion: open the congestion map and check the placement density and pin density maps.
7. Congestion resolving methods: tool switches (high congestion effort), magnet placement, bounds, placement blockages, max utilization, or re-floorplanning.
8. For global congestion, try max utilization. For high pin density, try keepout/halo, cell padding and partial blockage, and keep a larger channel width between high-pin-count macros.

CTS
The following sanity checks are done before CTS:
• Check legality.
• Check power stripes and standard cell rails, and verify PG connections.
• Timing QoR (setup should be under control).
• Timing DRVs.
• High-fanout nets (like scan enable or any static signal).
• Congestion (running CTS on a congested design, or a design with congestion hotspots, can create more congestion and other issues such as noise and IR drop).
• Remove the dont_use attribute on clock buffers and inverters.
• Check whether all pre-existing cells in the clock path are balanced cells (CK* cells).
• Check and qualify the dont_touch and dont_size attributes on clock components.
Preparations
• Understand the clock structure of the design and its balancing requirements. This helps in coming up with proper exceptions to build an optimum clock tree.
• Create non-default rules (check whether shielding is required).
• Set clock transition, capacitance and fan-out limits.
• Decide which cells are to be used for CTS (clock buffer / clock inverter).
• Handle clock dividers and other clock elements properly.
• Come up with exceptions.
• Understand latency (from the full-chip point of view) and skew targets.
• Take care of special balancing requirements.
• Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis (HFNS) & Clock Tree Synthesis:
CTS uses clock buffers and clock inverters with equal rise and fall times, whereas HFNS uses buffers and inverters with relaxed rise and fall times.
HFNS is used mostly for reset, scan enable and other static signals with high fan-out. There is no stringent requirement for balancing or power reduction.
Clock tree power is given special attention because the clock is a constantly switching signal. HFNS is mostly performed for static signals, so not much attention to power is needed.
Difference between clock buffer and normal buffer
Clock buffers have equal rise and fall times, so pulse width violations are avoided. In clock buffers the beta ratio is adjusted so that the rise and fall times match. This may make a clock buffer larger than a normal buffer.
Normal buffers may not have equal rise and fall times.
Clock buffers are usually designed so that an input signal with a 50% duty cycle produces an output with a 50% duty cycle.
CTS Goals
1. Meet the clock tree DRC.
2. Max. Transition.
3. Max. Capacitance.
4. Max. Fanout.
5. Meet the clock tree targets.
6. Minimal skew.
7. Minimum insertion delay.
Boundary cell insertions??
When we are working on a block-level design, we might
want to preserve the boundary conditions of the block’s
clock ports (the boundary clock pins).
A boundary cell is a fixed buffer that is inserted
immediately after the boundary clock pins to preserve the
boundary conditions of the clock pin.
When boundary cell insertion is enabled, buffer is inserted
from the clock tree reference list immediately after the
boundary clock pins. For multi-voltage designs, buffers are
inserted at the boundary in the default voltage area.
The boundary cells are fixed for clock tree synthesis after
insertion; it can’t be moved or sized. In addition, no cells
are inserted between a clock pin and its boundary cell.
Delay Insertion
If the delay is more, instead of adding many buffers we
can just add a delay cell of particular delay value.
Advantage is the size and also power reduction. But it has
high variation, so usage of delay cells in clock tree is not
recommended.
Clock Tree Design Rule Constraints
Max. Transition.
The Transition of the clock should not be too tight or too
relaxed.
If it is too tight then we need more number of buffers.
If it is too relaxed, then dynamic power is more.
Max. Capacitance.
Max. Fanout.
Clock Tree Exceptions
Non-Stop Pin
Nonstop pins are pins that the tool traces through, even though they would normally be considered endpoints of the clock tree.

Exclude Pin
Exclude pins are clock tree endpoints that are excluded from clock tree timing calculation and optimization.
Beyond an exclude pin, the tool never performs skew or insertion delay optimization, but it does perform design rule fixing.

Float Pin
Float pins are clock pins that have special insertion delay requirements; balancing is done according to that delay.

Stop Pin
Stop pins are the endpoints of the clock tree that are used for delay balancing.
During CTS, the tool uses stop pins in calculation and optimization for both DRC and clock tree timing.
Clock sinks are implicit stop pins.

Don't Touch Subtree
Don't Buffer Nets
Don't Size Cells
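On-Chip Variation (OCV)
Notation: TL = launch clock latency, TC2Q = clock-to-Q delay, TComb = combinational delay, Ts/Th = setup/hold time, TC = capture clock latency, SM/HM = setup/hold margin (all in ps).
Setup: TL + TC2Q + TComb + Ts + SM <= TCLK + TC
500 + 100 + 500 + 100 + SM = 1000 + 600
Setup margin = 1600 - 1200 = 400 ps
Hold: TL + TC2Q + TComb >= Th + HM + TC
500 + 100 + 500 >= 100 + HM + 600
Hold margin = 1100 - 700 = 400 ps
Apply a 10% derate for OCV.
Setup derate (launch and data paths late, capture path early): TL = 550 ps, TC = 540 ps, TComb = 550 ps
550 + 100 + 550 + 100 + SM <= 1000 + 540
Setup margin = 1540 - 1300 = 240 ps
Hold derate (launch and data paths early, capture path late): TL = 450 ps, TC = 660 ps, TComb = 450 ps
450 + 100 + 450 >= 100 + 660 + HM
Hold margin = 1000 - 760 = 240 ps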
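The worked numbers above can be reproduced with a few lines of code. This is a sketch under the same assumptions as the notes: a 1000 ps clock, 500 ps launch latency, 600 ps capture latency, 100 ps clock-to-Q, 500 ps combinational delay, 100 ps setup and hold times, and a flat 10% derate that leaves clock-to-Q, setup and hold untouched. The function names are illustrative.

```python
def setup_margin(t_clk, launch, c2q, comb, setup, capture):
    """Setup margin = (clock period + capture latency) - data arrival - setup time."""
    return (t_clk + capture) - (launch + c2q + comb + setup)

def hold_margin(launch, c2q, comb, hold, capture):
    """Hold margin = data arrival - (hold time + capture latency)."""
    return (launch + c2q + comb) - (hold + capture)

def derate(value_ps, percent):
    """Scale an integer delay in ps by (100 + percent) percent."""
    return value_ps * (100 + percent) // 100

T_CLK, LAUNCH, C2Q, COMB, SETUP, HOLD, CAPTURE = 1000, 500, 100, 500, 100, 100, 600

nominal_setup = setup_margin(T_CLK, LAUNCH, C2Q, COMB, SETUP, CAPTURE)
nominal_hold = hold_margin(LAUNCH, C2Q, COMB, HOLD, CAPTURE)

# Setup check: launch and data paths 10% late, capture path 10% early.
ocv_setup = setup_margin(T_CLK, derate(LAUNCH, 10), C2Q, derate(COMB, 10),
                         SETUP, derate(CAPTURE, -10))
# Hold check: launch and data paths 10% early, capture path 10% late.
ocv_hold = hold_margin(derate(LAUNCH, -10), C2Q, derate(COMB, -10),
                       HOLD, derate(CAPTURE, 10))

print(nominal_setup, nominal_hold, ocv_setup, ocv_hold)  # 400 400 240 240
```

In both checks the derate is applied in the pessimistic direction for that check, which is why each margin shrinks from 400 ps to 240 ps.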
ROUTING
Physical connectivity of logic cells is made during signal routing.
The tool performs routing in 4 stages (ICC user manual description):
1. Global (trial) route: assignment of nets to GRCs
2. Track assignment: which track goes to which net
3. Detailed route: actual net routing in multiple phases
4. Search & repair: fixes DRC/short/open errors in each phase
The tool then optimizes for DRV, setup, hold and crosstalk.
Routing options:
1. Top (M7) and bottom (M2) layers to be used (M9 + 1 LB layers): route between the M2 and M7 layers. All cell pins are on M1. M1 is not preferred for routing because it leads to DRC violations.
2. Route clock nets or any critical nets first.
3. Timing/SI effort: high/medium. To reduce SI, the tool spaces nets far apart.
4. Litho repair (at smaller nodes), DFM: to reduce sharp edges, add more metal and route smoother turns; avoid routing patterns that are not manufacturing friendly.
5. Redundant via effort: manufacturing vias is more difficult than manufacturing metal, and via failures are more common, so redundant vias are added (70% multi-cut vias). Check signal EM after routing.
6. Antenna diode: specify it to the tool. Antenna violation (process antenna effect, a short-term effect at manufacturing time): with Al, plasma dry etching removes the unwanted area; with Cu, CMP is used. The charge collected on a long metal ruptures the gate. The antenna ratio (AR) should be less than the fab requirement: AR = metal area / (gate area + diffusion area). Gate area is in the LEF file; diffusion area is on the output pin in the LEF. Add a diode to provide a discharge path for the metal charge.
7. Antenna violation fixes: add a diode, reduce the metal area, or use metal layer hopping (the tool does this by default).
8. Layers are manufactured in the order M1, M2, M3, ..., M9. M1 is manufactured first and may cause an antenna violation before M2 is manufactured. A higher metal layer acts like a diffusion (discharge) path for the lower metal layers; this is known as metal hopping.

9. After CTS, buffers and inverters are added to the netlist in the clock path; the percentage increase in gate count should be minimal.
10. Before routing, make sure legality is checked. Buffers newly added during CTS don't have power/ground connections, so their power/ground pins must be connected prior to routing. The core area height must be an integral multiple of the site row height.
No ideal nets should be present in the design before routing. Multi-cut vias.
If clock layers have spare routing resources, they can be used for signal routing. Power layers can't be used, as this may lead to shorts, and power vias cause higher resistance and delay on the routed signal.
During the routing stage we have the option to use higher metal layers (use higher metal layers to fix setup violations on critical nets).
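The antenna ratio from the routing options (AR = metal area / (gate area + diffusion area)) can be sketched as below. The areas and the limit of 400 are made-up illustration values; the real limit comes from the fab rules, and the function names are hypothetical.

```python
def antenna_ratio(metal_area, gate_area, diffusion_area=0.0):
    """AR = metal area / (gate area + diffusion area)."""
    return metal_area / (gate_area + diffusion_area)

def violates_antenna(metal_area, gate_area, diffusion_area, max_ratio):
    """True if the antenna ratio exceeds the allowed limit."""
    return antenna_ratio(metal_area, gate_area, diffusion_area) > max_ratio

MAX_RATIO = 400.0  # hypothetical fab limit

# Long wire on a small gate with no diffusion connected yet.
before = violates_antenna(metal_area=500.0, gate_area=1.0,
                          diffusion_area=0.0, max_ratio=MAX_RATIO)
# Same wire after adding a diode, modeled here as extra diffusion area.
after = violates_antenna(metal_area=500.0, gate_area=1.0,
                         diffusion_area=1.0, max_ratio=MAX_RATIO)
print(before, after)
```

Both fixes named in the notes show up in the formula: reducing metal area shrinks the numerator (layer hopping does this per layer), while a diode adds diffusion area to the denominator.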
verify_lvs --> opens/shorts/texture
verify_drc --> min spacing/min area/min width/min cut vias/antenna/via-to-via spacing
How to fix post-route DRC
DRV: max_cap, max_fanout, max_transition
DRC: min width, min area
Remove unnecessary vias. Move vias so that min spacing is achieved.
For fixing shorts, analyze and choose a different metal layer for routing so that the shorts are avoided.
SIGNOFF CHECKS

1. Latchup
If an NFET and a PFET sit close together (as the NMOS and PMOS of an inverter do), a parasitic PNPN structure can form; triggering it is called latchup. Parasitic PNP and NPN BJTs are formed in the MOSFET structure. Resistances are formed in the P-substrate and N-well (vertical and horizontal resistors). The result is a conductive path between VDD and ground through the positive-feedback BJT pair, and the device gets damaged by the short-circuit (crowbar) power. There is internal latchup and external latchup. Latchup is triggered by:
1. Voltage drop (ground bounce): internal latchup
2. Charge generation (heat impact/hot carriers): external latchup
For regeneration not to take place, Rpsub1 + Rpsub2 must be low and Rnwell1 + Rnwell2 must be small. More taps reduce the psub resistance. A shorter distance to the tap cell reduces the psub resistance. Tapping every cell (vias) costs more area.
Internal latchup risk is reduced by more cell taps and a shorter distance from diffusion to well taps (done in PD). Shallow trench isolation is done by the foundry.
External latchup is reduced by FDSOI (fully depleted silicon on insulator), guard rings and triple-well structures (cell design, not in PD scope).
Bulk technology vs FDSOI (a different structure): other methods to reduce latchup risk.
2. HFN synthesis
Clock, reset and scan enable are the high-fanout nets in the design and are dealt with separately.
1. place_opt takes care of buffering and transition (HFN).
If place_opt should not handle an HFN, use set_ideal_net.
2. Standalone HFN buffering is done with create_buffer_tree.
3. HFN synthesis similar to CTS: compile_clock_tree -hfn builds a clock-tree-like structure (no low skew/latency requirement and no exceptions such as exclude pins; only transition and capacitance must be met).
3. Uncertainty & jitter
Contributors are:
clock skew
clock jitter: absolute jitter, tracking jitter (if the input has jitter) and periodic jitter
Jitter is the variation of the active clock edge arrival time (about 2% of the clock period). The total slack impact is two times the jitter (worst case).
Signoff margins: use 30% extra setup/hold margins (foundry guideline).
IR drop: a drop in voltage increases the delay of clock/data path cells and causes setup/hold violations. Typically a 1% drop in voltage gives a 2.5% change in delay.
Tclk >= Tc2q + Tc + Tsu + signoff margin
Stage-wise margins: budget enough margin in the earlier stages (synthesis/pre-CTS/post-CTS: jitter + SOM + skew + IR drop), since it will be needed in the subsequent stages to meet timing at signoff.
Top-level impacts (crosstalk)
Uncertainty adds pessimism (extra slack margin is required to accommodate setup and hold uncertainty).
4. Virtual clock & update_io constraint
Virtual clock: an SDC contains clocks, generated clocks and virtual clocks.
A normal clock or generated clock has a source port/pin. A virtual clock has a clock name that is not associated with any physical pin/port. Virtual clocks are used to define input/output delay constraints. Apply latency on the virtual clock. After CTS is done, update_io_constraints changes only the clock latency; IO delays are not changed.
Purpose of defining a virtual clock: the
advantage of defining a virtual clock is
that we can specify the desired latency for
it. As mentioned above, a virtual clock is
used to time interface paths.
Figure 1 shows a scenario where it helps to
define a virtual clock. Reg-A is a flop
inside the block that sends data out of the
block through PORT. Since it is a
synchronous signal, we can assume it is
captured by a flop (Reg-B) sitting outside
the block. Within the block, the path to
PORT can be timed by specifying an output
delay for this port with a clock
synchronous to clock_in. We could specify
the delay with respect to clock_in itself,
but then the difficulty lies in specifying
the clock latency: if we specify a latency
for clock_in, it is applied to Reg-A as
well. Applying input/output delays with
respect to a real clock causes input ports
to get relaxed and output ports to get
tightened after the clock tree has been
built.
The solution to the problem is to define a
virtual clock and apply the output delay
with respect to it. Making the source
latency of the virtual clock equal to the
network latency of the real clock solves
the problem.
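
The effect can be checked with a simple
slack model of the Reg-A to PORT path. This
is a sketch only: the function name and all
delay values are assumed for illustration,
and the model ignores everything except the
latencies, clock-to-Q, data delay and the
output delay.

```python
def output_path_slack(period, launch_latency, capture_latency,
                      clk_to_q, data_delay, output_delay):
    """Setup slack of a Reg-A -> PORT path (all values in ns)."""
    arrival = launch_latency + clk_to_q + data_delay
    required = period + capture_latency - output_delay
    return required - arrival


# Assumed example values (ns).
PERIOD, CLK_TO_Q, DATA, OUT_DELAY = 10.0, 0.5, 4.0, 3.0
TREE_LATENCY = 1.5  # network latency of the real clock after CTS

# Pre-CTS, ideal clocks: no latency on either side.
pre_cts = output_path_slack(PERIOD, 0.0, 0.0, CLK_TO_Q, DATA, OUT_DELAY)

# Post-CTS, output delay referenced to the real clock: Reg-A sees the
# tree latency but the port-side reference does not, so the output
# path is tightened.
real_clock = output_path_slack(PERIOD, TREE_LATENCY, 0.0,
                               CLK_TO_Q, DATA, OUT_DELAY)

# Post-CTS, output delay referenced to a virtual clock whose source
# latency equals the real clock's network latency: the two cancel.
virtual_clock = output_path_slack(PERIOD, TREE_LATENCY, TREE_LATENCY,
                                  CLK_TO_Q, DATA, OUT_DELAY)

print(pre_cts, real_clock, virtual_clock)
```

With the virtual clock the post-CTS slack
matches the pre-CTS slack, while the real
clock reference loses the tree latency.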
DRC constraints:
1. set_max_fanout: each input of a cell has
a fanout_load attribute and each output of
a cell has a max_fanout attribute. BUFFD0
can't drive more than two cells of its own
kind.
2. set_max_capacitance: the BUFFD0 output
shouldn't be connected to any input if the
interconnect plus load pin capacitance is
more than 2.2.
3. set_max_transition: applies to the input
of BUFFD0. Any net which has a transition
value greater than 1.5 shouldn't be
connected to this input.
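
The three checks can be sketched as a small
function. The limits 2, 2.2 and 1.5 are the
BUFFD0 values quoted above; the function
name, the dictionary layout and the
fanout_load of 1.0 per driven input are
assumptions made for illustration.

```python
# BUFFD0 limits as quoted in the notes (units as in the library).
BUFFD0_LIMITS = {
    "max_fanout": 2.0,
    "max_capacitance": 2.2,
    "max_transition": 1.5,
}


def check_drc(fanout_loads, wire_cap, pin_caps, input_transition,
              limits=BUFFD0_LIMITS):
    """Return the list of design-rule names that are violated."""
    violations = []
    # Fanout: sum of fanout_load of all driven inputs vs max_fanout.
    if sum(fanout_loads) > limits["max_fanout"]:
        violations.append("max_fanout")
    # Capacitance: interconnect plus load pin capacitance.
    if wire_cap + sum(pin_caps) > limits["max_capacitance"]:
        violations.append("max_capacitance")
    # Transition seen at the cell input.
    if input_transition > limits["max_transition"]:
        violations.append("max_transition")
    return violations


clean = check_drc([1.0, 1.0], 0.5, [0.4, 0.4], 1.0)
dirty = check_drc([1.0, 1.0, 1.0], 1.5, [0.4, 0.4, 0.4], 1.8)
print(clean, dirty)
```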
5. NDR vs shielding (ground cap)
Non-default rule (net width/spacing etc.)
NDR: single width, double spacing for clock
nets reduces crosstalk
1. Capacitance: Carea + Cfringe (ground
ref) + Ccoup (nearby wire ref); psub is
held at 0 potential/gnd. Less.
2. Crosstalk: to reduce xtalk, net spacing
should be more
3. ID:
4. power
Shielding: a power or ground net is placed
next to the clock net.
1. Capacitance: increases
2. Crosstalk: reduces
3. ID (insertion delay): more, because of
more capacitance
4. Power: also more
6. Optimisation techniques
Timing
DRV (max tran, max cap), setup, hold
Max tran violation is fixed by Vt
swapping / upsizing / net buffering (for
long nets) / fanout splitting (for high
fanout)
Max cap violation is fixed by fanout
splitting (for high fanout) or net
buffering (for long nets)
Setup violation is fixed by Vt swap /
upsizing the driver / fanout split /
buffering. Clock tweaking (adding a buffer
in the clock path) is another option.
Hold violation is fixed by adding delay
cells in the data path near the endpoint,
close to the capture flop D pin (delay
cells have small area but more variation
across corners), OR by divergent buffering
Congestion is reduced by
cell padding, keepouts, soft/hard
blockages, lower overall placement density,
low-pin-count cells and a better floorplan.
Where cell density is high, put a partial
blockage to disperse the cells and relieve
local congestion.
Area
Downsize cells in +ve setup slack paths to
create some space
Power
Leakage power is reduced by Vt swapping on
+ve slack paths (optimise_power command)

Jitter has an impact on the setup check
(2Tj), as the setup check takes place
between different edges. Jitter has no
effect on the hold check, as it takes place
at the same edge.
For a half-cycle path, the setup check
takes place between a rising edge and the
next falling edge, so 2Tj of jitter impacts
setup. In this case the hold check also
takes place between two different edges, so
2Tj of jitter impacts the hold check too.
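
The four cases can be written as a lookup.
A minimal sketch; the function name and its
arguments are assumed for illustration, and
Tj is the jitter on one edge.

```python
def jitter_penalty(check, half_cycle_path, tj):
    """Jitter margin to budget for a timing check (same units as tj)."""
    if check not in ("setup", "hold"):
        raise ValueError("check must be 'setup' or 'hold'")
    if check == "setup":
        # Setup always compares two different clock edges.
        return 2 * tj
    # Hold uses the same edge on a full-cycle path, but two different
    # edges on a half-cycle path.
    return 2 * tj if half_cycle_path else 0


for check in ("setup", "hold"):
    for half in (False, True):
        print(check, "half-cycle" if half else "full-cycle",
              jitter_penalty(check, half, 10))
```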
OCV (On-Chip Variation)
All dies in a wafer are assumed to have the
same operating condition (SS/TT/FF).
Within a wafer every die still has some
variation, and not all MOSFETs in the same
die have the same voltage/temp/process.
These small PVT variations are local OCV.
Global variations are modelled in the
timing libs. Local variations are modelled
with derates.
An OCV derate of 10% is valid (per the fab)
at max_tran & max_cap
2. 10% of clock period
The DRM (Delay rule manual), the
implementation & signoff guidelines
document (derate values), is obtained from
the fab. AOCV is provided by the library
vendor.
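8. CPPR (common path pessimism removal)
For cells present in the common clock path,
STA uses the max delay on one side of the
check and the min delay on the other (late
launch and early capture for setup, the
reverse for hold). The same cell cannot
have both delays at once, so this adds
pessimism, which CPPR removes.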
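
A small numeric sketch of the idea follows.
The function names and all delay values are
assumptions for illustration: a common clock
segment of 1000 ps nominal is derated by
+/-10% to give its late and early delays,
and the difference is credited back to the
slack.

```python
def common_path_credit(common_late, common_early):
    """Pessimism from timing one common segment both late and early."""
    return common_late - common_early


def setup_slack(period, launch_late, capture_early, data_late, t_su,
                credit=0):
    """Setup slack in ps; 'credit' is the common-path pessimism removed."""
    required = period + capture_early - t_su
    arrival = launch_late + data_late
    return required - arrival + credit


# Assumed values (ps): common segment 1100 late / 900 early,
# plus separate launch and capture branches.
COMMON_LATE, COMMON_EARLY = 1100, 900
launch_late = COMMON_LATE + 200
capture_early = COMMON_EARLY + 180

pessimistic = setup_slack(2000, launch_late, capture_early, 1600, 50)
credit = common_path_credit(COMMON_LATE, COMMON_EARLY)
corrected = setup_slack(2000, launch_late, capture_early, 1600, 50,
                        credit)
print(pessimistic, credit, corrected)
```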
SIGN OFF
Finish routing & optimisation -> add decap
cells (larger MOSFETs) & filler cells
(smaller, keep the n-well continuous) ->
generate outputs. Signoff uses different
tools/tasks:
STA (PT), extractor (Quantus/StarRC)
A --> GDS --> dummy base & metal fill
(base, poly & metal min/max density rules)
--> filled GDS --> DRC + LVS + ERC + DFM
(LPC/pattern matching) + PERC
A - FV
A - Rail analysis, RedHawk (EM/power/IR
drop)
The design is signed off after all of these
have passed.
PBA (path-based analysis) mode takes a lot
of STA tool run time
Each cell has 3 physical views: LEF, GDS
and a SPICE netlist (used for LVS). LVS
compares the physical netlist against the
GDS.
Extraction, STA & signoff
Post-route -> decap & filler additions -->
signoff, where the checks below are done:
LEC (logical netlist vs synth netlist,
synth netlist vs place-opt/routed netlist,
RTL vs synth netlist), LVS (synth netlist
vs GDS), PDV, rail analysis, extraction &
STA
Extraction: RC parasitics as SPEF (cell +
wire) or DSPF. STA inputs: SPEF, SDC, lib,
DEF (for physically aware timing analysis)
STA checks: DRV (max tran / max cap / max
fanout, the logical DRCs), setup & hold
analysis including crosstalk impact, noise
analysis (glitches) & min pulse width
checks for clocks
SI: Crosstalk
Crosstalk (CT) increases as wires are
placed closer together and is higher for
thicker wires, because the coupling
capacitance is larger. CT impacts delay &
functionality between nets in the same
layer.
Aggressor & victim nets: each victim net
can have multiple aggressor nets. When the
aggressor & victim switch in the same
direction the victim transition improves;
when they switch in opposite directions it
worsens. Either way it impacts delay. The
impact depends on the coupling cap and on
the aggressor switching direction &
strength. If the victim net drives a cell
input on a timing path, this leads to
setup/hold worsening scenarios. CT affects
delay only if the timing windows of the
aggressor & victim nets overlap.
If the aggressor is switching and the
victim is static, it causes glitches
(overshoot/undershoot) that result in
noise/functional issues.
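
A common first-order way to see the delay
effect is to scale the coupling cap by a
Miller factor. This model is not from the
notes: the factors 0, 1 and 2 are the usual
textbook simplification, and the function
name and capacitance values are assumed.

```python
# Miller factor applied to the coupling capacitance, depending on what
# the aggressor does while the victim switches (textbook simplification).
MILLER_FACTOR = {
    "same": 0.0,      # same direction: coupling cap is not charged
    "quiet": 1.0,     # aggressor static: coupling cap acts like ground cap
    "opposite": 2.0,  # opposite direction: voltage swing across cap doubles
}


def effective_victim_cap(c_ground, c_coupling, aggressor):
    """Effective load seen by the victim driver (same units as inputs)."""
    return c_ground + MILLER_FACTOR[aggressor] * c_coupling


for mode in ("same", "quiet", "opposite"):
    print(mode, effective_victim_cap(10.0, 4.0, mode))
```

A smaller effective cap means a faster
victim transition (a risk for hold), a
larger one a slower transition (a risk for
setup).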
AOCV
OCV is pessimistic, as the same derate is
applied to all cells. In reality not all
cells will have the same worst derate;
there is an averaging of the OCV impact
across cells. The more cells in a data
path, the more the variation averages out &
the smaller the derate value (stage-based
OCV). There is also distance-based OCV.
The OCV derate is given by the foundry.
AOCV: lookup table based on path depth,
provided by the library vendor as a .aocv
file
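
The depth-based lookup can be sketched as
below. The table values are hypothetical
and only show the trend (deeper paths get a
smaller derate); they are not taken from
any real .aocv file. Real tools may also
interpolate between entries, while this
sketch simply takes the nearest lower
depth.

```python
import bisect

# Hypothetical late-derate table indexed by path depth (cell count).
AOCV_DEPTHS = [1, 2, 4, 8, 16]
AOCV_LATE_DERATE = [1.10, 1.08, 1.06, 1.04, 1.03]
FLAT_OCV_DERATE = 1.10  # one derate for every cell (plain OCV)


def aocv_derate(depth):
    """Derate of the largest table depth that is <= the path depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    index = bisect.bisect_right(AOCV_DEPTHS, depth) - 1
    return AOCV_LATE_DERATE[index]


def derated_path_delay(cell_delays, derate):
    """Sum of cell delays scaled by a single late derate."""
    return sum(cell_delays) * derate


path = [100.0] * 8  # eight cells of 100 ps each (assumed)
flat = derated_path_delay(path, FLAT_OCV_DERATE)
aocv = derated_path_delay(path, aocv_derate(len(path)))
print(flat, aocv)
```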
Rail Analysis: (CAD tools?)
1. Power dissipation (Psw + Pint +
Pleakage)
2. Electromigration: power EM / signal EM
3. IR drop: voltage drop
1. Avg/static IR drop analysis: based on
average power (activity factor)
2. Dynamic IR drop analysis: based on the
actual switching of cells (peak current
analysis)
Violations
Signal EM: reduce the load / add a buffer
Power EM: add another parallel power strap,
add missing vias.
IR drop: spread cells which have more
switching activity. Add DECAP cells for
dynamic IR drop. As a methodology, decaps
are pre-placed near clock buffers /
high-drive cells. What-if analysis helps in
choosing smaller/bigger decaps based on the
dynamic IR drop.
Power Gating & Multi-VDD
FINFET
FinFETs are used to mitigate short-channel
effects (DIBL): with higher leakage, a
short-channel planar MOSFET doesn't shut
off even after the gate voltage is reduced
below Vt. Planar to FinFET device (3-sided
gate, gate cap is more) from 16nm onwards.

Synth,latch based timing,lockup latch


ECO
DPT: Dual patterning
Manufacturing: photolithography
Light is passed through the mask (the mask
is prepared from the GDS layers) ->
reduction lens -> loaded wafer
Raw wafers, raw masks and the stepper are
inputs
Critical Dimension (CD) =
K x WL (lambda) / (RI (refractive index) x
NA (numerical aperture)), e.g. 120nm
CD is reduced by reducing the wavelength,
which requires a new stepper, or by
changing the RI by filling with another
medium. 76nm is achieved.
If a 38nm M2 pitch is wanted with the same
stepper, use two masks for the layer (M1
layer with Mask1, M1 layer with Mask2).
Patterning can be made beyond 5nm.
Signoff: colouring of GDS layers/tracks in
PD. Bottom layers (M1, M2, M3) need DPT at
16/14nm; layers with more pitch do not.
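
Colouring for dual patterning can be viewed
as two-colouring a conflict graph: tracks
closer than the single-mask pitch must go
on different masks. The sketch below is
only an illustration of that idea, not a
signoff flow; the function name is assumed,
and the 76nm and 38nm numbers are the ones
quoted above.

```python
from collections import deque


def assign_masks(track_positions, single_mask_min_pitch):
    """Assign mask 1 or 2 to each track; return None on a conflict."""
    count = len(track_positions)
    # Two tracks conflict if they are closer than one mask can print.
    neighbours = [[] for _ in range(count)]
    for i in range(count):
        for j in range(i + 1, count):
            gap = abs(track_positions[i] - track_positions[j])
            if gap < single_mask_min_pitch:
                neighbours[i].append(j)
                neighbours[j].append(i)

    mask = [0] * count  # 0 means not yet coloured
    for start in range(count):
        if mask[start]:
            continue
        mask[start] = 1
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbours[current]:
                if mask[other] == 0:
                    mask[other] = 3 - mask[current]  # the other mask
                    queue.append(other)
                elif mask[other] == mask[current]:
                    return None  # odd cycle: not two-colourable
    return mask


# Tracks on a 38nm pitch with a 76nm single-mask limit.
print(assign_masks([0, 38, 76, 114], 76))
```

Tracks on a 38nm pitch alternate between
the two masks, so each mask only has to
print a 76nm pitch.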
IDEAS:
1. Having voltage gradient (not discrete)
distribution across the die?
2. Remove ground connection to the FET to
reduce leakage current??

