1. .lib and .db contain the same timing library data and are supplied by the vendor (TSMC).
Cadence uses .lib; Synopsys uses .db (the compiled form).
2. .lef and Milkyway (.mw) contain the same physical library data and are supplied by the vendor (TSMC).
Cadence uses .lef; Synopsys uses Milkyway.
3. We generate a Milkyway library of the design's cells at different phases (import, floorplan, power plan, etc.).
INPUTS TO PHYSICAL DESIGN
SANITY CHECK:
Sanity check is performed on the netlist.
ICC command: what it does
check_library:
1. Physical library quality: pin direction, size, and layer consistency between the techfile and the Milkyway file (same names across different physical libraries)
2. Logical library quality: pin, area, delay, and leakage numbers
3. Logical vs physical consistency
check_design: dumps design stats (how many gates/macros/ports); reports multi-driven nets (there shouldn't be any); checks uniqueness of modules (different instantiations of the same module get different names); reports floating input pins and output ports. Floating inputs should be connected to known logic values to avoid undefined logic states.
check_timing: netlist vs SDC consistency, netlist vs library consistency, unconstrained IO ports, no input drive, no output load, clock not found / not reaching, combinational loops (the timing tool takes a long time to calculate delay)
report_timing: timing violations
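To make two of the check_design netlist checks concrete, here is a minimal Python sketch (not the tool's implementation) that flags multi-driven nets and floating inputs. The netlist format (a dict of net name to driver and load pin lists) and the pin names are hypothetical; an input is treated as floating if no net connects it or its net has no driver.

```python
def check_netlist(nets, input_pins):
    """Return (multi_driven_nets, floating_input_pins) for a toy netlist.

    nets: dict mapping net name -> {"drivers": [pins], "loads": [pins]}
    input_pins: every cell input pin in the design
    """
    # A net with more than one driver is a multi-driven net.
    multi_driven = sorted(name for name, net in nets.items() if len(net["drivers"]) > 1)

    # Input pins that sit on a net with at least one driver are not floating.
    driven_loads = {pin for net in nets.values() if net["drivers"] for pin in net["loads"]}

    # Everything else (unconnected, or on an undriven net) is floating.
    floating = sorted(set(input_pins) - driven_loads)
    return multi_driven, floating


nets = {
    "n1": {"drivers": ["U1/Z"], "loads": ["U2/A"]},
    "n2": {"drivers": ["U3/Z", "U4/Z"], "loads": ["U5/A"]},  # multi-driven
    "n3": {"drivers": [], "loads": ["U6/A"]},                # undriven net
}
print(check_netlist(nets, ["U2/A", "U5/A", "U6/A", "U7/A"]))
# (['n2'], ['U6/A', 'U7/A'])
```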
Macro example: SRAM; IP example: PLL
FLOOR PLAN:
Goals: optimal size, minimum congestion, better routability, meeting timing, optimal data flow.
Tasks: decide size, shape, and aspect ratio, and the utilization = (std cell area + IP area + macro area) / design area, typically 60%.
The remaining area is needed for optimization cells, special-purpose cells, routing space, and blockages/halos, and to keep power density under control.
FLOORPLAN
1. Size and shape
2. IO area creation and IO pin placement
3. Block-level IO pin placement, done based on connectivity to the adjacent blocks
4. IP/macro placement
5. Placement blockages and keepouts
6. Die size: macro area + std cell area + blockage area + IO area
Chip/die utilization: (std cell + macro + IP + IO area) / die area × 100%
Std cell utilization: std cell area / (die area - (IO + macro + IP area)) × 100%
Core area = die area - IO area
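The area definitions above can be checked with a few lines of Python. This is a sketch with hypothetical block areas (in um^2); it simply evaluates the core-area and utilization formulas as written.

```python
def floorplan_metrics(die_area, io_area, macro_area, ip_area, std_cell_area):
    """Core area plus the two utilization figures defined above (areas in um^2)."""
    core_area = die_area - io_area
    chip_util = 100.0 * (std_cell_area + macro_area + ip_area + io_area) / die_area
    std_cell_util = 100.0 * std_cell_area / (die_area - (io_area + macro_area + ip_area))
    return core_area, chip_util, std_cell_util


# Hypothetical block: 1 mm^2 die with an IO ring, macros, one IP and std cells.
core, chip_util, std_util = floorplan_metrics(
    die_area=1_000_000, io_area=100_000, macro_area=300_000,
    ip_area=50_000, std_cell_area=330_000,
)
print(core, chip_util, std_util)  # 900000 78.0 60.0
```

With these numbers the std cell utilization lands at the typical 60% target, leaving room for the optimization and special-purpose cells mentioned earlier.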
IO port placement:
IO ports: power, ground, signal IO, clock, reset, feedthrough.
Interface types: system-synchronous, source-synchronous (DDR), and self-synchronous (SerDes).
Feedthroughs are defined for routing feasibility. A PLL has a reference clock input and generates output clocks in phase with the reference clock, plus a few control inputs; place the PLL near the refclk port. Place the clock port in the center of the die.
IP/MACRO placement: the order of preference is IO-to-macro connections, then macro-to-macro, then macro-to-std-cell.
Memories have their IO ports on one side and VDD/VSS on the other side; memory power connects on M4. Macros don't sit on the rows (their sizes are not multiples of the std cell size). Memory orientation is very critical: poly (the gate layer) must be unidirectional, either horizontal or vertical, across the whole design. Std cells have vertical poly (per the vendor document), so macros should be oriented to have vertical poly. The floorplan impacts wire length.
Apply a keepout/halo around macros and IPs, i.e. a placement blockage (1 µm) around macros for routability.
RAM stacking: keep a gap between stacks (groups) of macros based on their length. The stack spacing leaves area for inserting buffers (optimization) on the wires routed past the memories; use a soft placement blockage there. Memories block a few routing layers, because the base (lower) metal layers are used inside the memory.
POWER PLAN
1. Power consumption: P = Ps + Pd, where Ps is static (leakage) power and Pd = Pint + Psw (internal + switching power)
2. Power delivery (power mesh)
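The power equation can be evaluated as below. The split P = Ps + Pd with Pd = Pint + Psw follows the text; the switching term uses the common α·C·V²·f form (α = fraction of cycles with a 0→1 transition), which is an assumption about how Psw is modelled here, and all numbers are hypothetical.

```python
def total_power(p_leak, p_internal, alpha, c_switched, vdd, freq):
    """P = Ps + Pd, with Pd = Pint + Psw and Psw = alpha * C * V^2 * f (SI units)."""
    p_switching = alpha * c_switched * vdd ** 2 * freq
    p_dynamic = p_internal + p_switching
    return p_leak + p_dynamic, p_switching


# Hypothetical numbers: 2 mW leakage, 10 mW internal, 1 nF switched capacitance,
# 10% activity, 0.9 V supply, 1 GHz clock.
total, p_sw = total_power(2e-3, 10e-3, alpha=0.1, c_switched=1e-9, vdd=0.9, freq=1e9)
print(f"Psw = {p_sw * 1e3:.1f} mW, P = {total * 1e3:.1f} mW")  # Psw = 81.0 mW, P = 93.0 mW
```

The V² term is why supply voltage is the strongest lever on dynamic power.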
PPA: high power requires a heat sink (cost) and is limited by battery life.
Electromigration: when current flows through a wire, it displaces atoms of the wire from one part to another, causing thinning in some places and thickening in others, so the resistance varies. The foundry gives EM data for 5/7/11-year lifetimes. The current must not exceed a limit that keeps the resistance change below 10%; the limit is given per layer and width (M1 of width W1 has DC limit Idc1, M2 of width W2 has DC limit Idc2). Linear expansion affects width.
Copper has better electromigration resistance than aluminum.
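A per-layer EM check in the spirit of "M1 W1 Idc limit1, M2 W2 Idc limit2" can be sketched as follows, assuming (as a simplification) that each layer's DC limit scales linearly with wire width. The limit table and wire list are hypothetical; real limits come from the foundry rules for the target lifetime and temperature.

```python
# Hypothetical per-layer DC limits in mA per um of wire width.
EM_LIMIT_MA_PER_UM = {"M1": 1.0, "M2": 1.2, "M8": 4.0}


def em_violations(wires):
    """wires: iterable of (name, layer, width_um, i_dc_ma).

    Returns the names of wires whose DC current exceeds the layer limit
    scaled by the wire width.
    """
    violations = []
    for name, layer, width_um, i_dc_ma in wires:
        limit_ma = EM_LIMIT_MA_PER_UM[layer] * width_um
        if i_dc_ma > limit_ma:
            violations.append(name)
    return violations


wires = [
    ("vdd_m1_a", "M1", 0.1, 0.05),
    ("vdd_m1_b", "M1", 0.1, 0.2),   # 0.2 mA on a 0.1 um M1 wire: over the limit
    ("vdd_m8", "M8", 2.0, 6.0),
]
print(em_violations(wires))  # ['vdd_m1_b']
```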
IR drop & ground bounce:
Normally the highest metal layers are used for power routing because they are thicker and have lower resistance.
The power mesh is created so that no power-to-ground short takes place after the standard cells are placed.
set_keepout_margin -type hard -all_macros -outer {2 2 2 2}
cut_rows near macros removes the site rows over the macros so that standard cells can't be placed there.
A floating shapes error is reported if there is no via available to connect to a power strap. These errors go away after filler cells are added.
Placement & Optimization:
Before placing standard cells, the following physical cells are placed:
1. EndCap cells are added at the start and end of each row to avoid well proximity effects.
2. Tap cells are placed at regular intervals to connect the n-well and substrate to power and ground, avoiding latchup (back-to-back parasitic BJTs), which shorts power to ground. Tap cells also help meet the IR drop requirement. In a standard cell, the n-well and substrate are not connected to VDD or GND, since a tap in every cell would cost more area.
3. IO buffers are placed near the IO ports to strengthen the signals.
4. Spare cells are placed for ECOs, which reduces mask cost if a re-tapeout is required.
5. Placement guidance: some cells, such as sync cells, must be placed close together using bounds (soft/hard/exclusive); an exclusive bound ensures no other cells are placed in its vicinity. Blockages/keepouts are also applied.
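Placing tap cells "at regular intervals" can be sketched as below. max_spacing is a hypothetical value that would be derived from the foundry's maximum diffusion-to-tap distance rule, and the sketch ignores macros and blockages that real placement tools must work around.

```python
import math


def tap_positions(row_start, row_end, max_spacing):
    """Evenly spaced tap-cell x-locations along one row (um).

    The first tap is at row_start, the last at row_end, and no two
    neighbouring taps are more than max_spacing apart.
    """
    length = row_end - row_start
    n_gaps = max(1, math.ceil(length / max_spacing))
    step = length / n_gaps
    return [row_start + i * step for i in range(n_gaps + 1)]


print(tap_positions(0, 100, 30))  # [0.0, 25.0, 50.0, 75.0, 100.0]
```

Spreading the taps evenly (25 um here instead of 30, 30, 30, 10) keeps every point of the row as close to a tap as possible, which is what lowers the substrate/well resistance.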
After adding the above cells, the standard cells are placed:
A. Coarse placement (timing- or congestion-driven)
B. Legal placement (right location and orientation)
C. Trial/global routing
D. Optimization (congestion/timing)
Post placement:
Tie cell addition and scan chain reordering.
A tie cell acts like a low-pass filter that keeps power/ground high-frequency noise and fluctuations during circuit operation off the gate inputs it ties. Scan chain reordering is done to reduce scan wire length and congestion.
1. QoR is checked for routability (congestion), timing (DRV, DRC, setup), power, and area.
2. Congestion is the measure of routability. It is reported as OVERCON (overflow), wire length, and a congestion map.
3. Reasons for congestion: high global placement density, high local placement density, high pin density.
4. Congestion impact on routability: DRCs, shorts, timing deterioration, and crosstalk.
5. The report contains horizontal and vertical overflow numbers (e.g. 1000 cells, 500 -> -2 track deficiency), e.g. 0.5 (H) and 0.2 (V). The ratio number of routes / number of tracks should be less than 1.
6. To analyze or mitigate congestion, open the congestion maps and check the placement density and pin density maps.
7. Congestion resolving methods: tool switches (high congestion effort), magnet placement, bounds, placement blockages, max utilization, or re-floorplan.
8. For global congestion, try max utilization. For high pin density, try keepout/halo, cell padding, partial blockages, and wider channels between high-pin-count macros.
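The overflow numbers and the "routes / tracks < 1" rule can be illustrated with a small sketch. Each global-routing cell (gcell) is given hypothetical horizontal and vertical demand (routes) and supply (tracks); overflow is demand exceeding supply, and any gcell whose ratio reaches 1 is listed as a hotspot.

```python
def congestion_report(gcells):
    """gcells: dict name -> (h_demand, h_supply, v_demand, v_supply).

    Returns total horizontal overflow, total vertical overflow, and the
    gcells whose routes/tracks ratio is not below 1 in either direction.
    """
    total_h = total_v = 0
    hotspots = []
    for name, (h_dem, h_sup, v_dem, v_sup) in gcells.items():
        total_h += max(0, h_dem - h_sup)
        total_v += max(0, v_dem - v_sup)
        if h_dem / h_sup >= 1 or v_dem / v_sup >= 1:
            hotspots.append(name)
    return total_h, total_v, hotspots


gcells = {"g0": (8, 10, 5, 10), "g1": (12, 10, 9, 10), "g2": (10, 10, 11, 10)}
print(congestion_report(gcells))  # (2, 1, ['g1', 'g2'])
```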
CTS
Following sanity checks are done before CTS
Check legality.
Check power stripes, standard cell rails & also
verify PG connections.
Timing QoR (setup should be under control).
Timing DRVs.
High Fanout nets (like scan enable / any static
signal).
Congestion (running CTS on congested design /
design with congestion hotspots can create more
congestion & other issues (noise / IR)).
Remove don’t_use attribute on clock buffers &
inverters.
Check whether all pre-existing cells in clock path
are balanced cells (CK* cells).
Check & qualify don’t_touch, don’t size attributes
on clock components.
Preparations
Understand the clock structure of the design and its balancing requirements. This helps in coming up with proper exceptions to build an optimal clock tree.
Creating non-default rules (check whether
shielding is required).
Setting clock transition, capacitance & fan-out.
Decide on which cells to be used for CTS
(clock buffer / clock inverter).
Handle clock dividers & other clock elements
properly.
Come up with exceptions.
Understand latency (from Full chip point of
view) & skew targets.
Take care of special balancing requirements.
Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis
(HFNS) & Clock Tree Synthesis:
CTS uses clock buffers and clock inverters with equal rise and fall times, whereas HFNS uses buffers and inverters with relaxed rise and fall times.
HFNS is used mostly for reset, scan enable, and other static signals with high fanout. There is no stringent requirement for balancing or power reduction.
Clock tree power is given special attention because the clock is a constantly switching signal. HFNS is mostly performed on static signals, so power needs little attention.
Difference between clock buffer and normal buffer
Clock buffers have equal rise and fall times, so pulse width violations are avoided. In clock buffers, the beta ratio (PMOS-to-NMOS width ratio) is adjusted so that rise and fall times match. This may make a clock buffer larger than a normal buffer.
Normal buffers may not have equal rise and fall times.
Clock buffers are usually designed such that an input signal with a 50% duty cycle produces an output with a 50% duty cycle.
CTS Goals
1. Meet the clock tree DRCs:
   max transition, max capacitance, max fanout.
2. Meet the clock tree targets:
   minimal skew, minimum insertion delay.
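The two clock-tree targets can be computed from per-sink clock arrival times, as in this sketch. The sink names and latencies are hypothetical, and skew here is global skew (max minus min over all sinks); tools also report local skew between related flops only.

```python
def clock_tree_stats(sink_latency):
    """sink_latency: dict sink pin -> clock arrival time from the root (ns).

    Returns (max insertion delay, global skew).
    """
    latencies = list(sink_latency.values())
    max_insertion_delay = max(latencies)
    global_skew = max_insertion_delay - min(latencies)
    return max_insertion_delay, global_skew


delay, skew = clock_tree_stats({"ff1/CK": 0.52, "ff2/CK": 0.48, "ff3/CK": 0.55})
print(f"insertion delay {delay:.2f} ns, skew {skew:.2f} ns")  # insertion delay 0.55 ns, skew 0.07 ns
```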
Boundary cell insertion
When working on a block-level design, we might want to preserve the boundary conditions of the block's clock ports (the boundary clock pins).
A boundary cell is a fixed buffer inserted immediately after a boundary clock pin to preserve the boundary conditions of that pin.
When boundary cell insertion is enabled, a buffer from the clock tree reference list is inserted immediately after each boundary clock pin. For multi-voltage designs, the buffers are inserted at the boundary in the default voltage area.
After insertion, the boundary cells are fixed for clock tree synthesis; they can't be moved or sized. In addition, no cells are inserted between a clock pin and its boundary cell.
Delay Insertion
If the required delay is large, instead of adding many buffers we can add a single delay cell of a particular delay value. The advantage is smaller area and lower power. However, delay cells have high variation, so using them in the clock tree is not recommended.
Clock Tree Design Rule Constraints
Max. Transition.
The clock transition should be neither too tight nor too relaxed.
If it is too tight, more buffers are needed.
If it is too relaxed, dynamic power increases.
Max. Capacitance.
Max. Fanout.
Clock Tree Exceptions
Non-Stop Pin
Nonstop pins let the tool trace through pins that are normally considered endpoints of the clock tree.
Exclude Pin
Exclude pins are clock tree endpoints that are excluded from clock tree timing calculation and optimization.
In the above figure, beyond the exclude pin the tool never performs skew or insertion delay optimization, but it does perform design rule fixing.
Float Pin
Float pins are clock pins that have special insertion delay requirements; balancing is done according to that delay.
Stop Pin
Stop pins are the endpoints of the clock tree that are used for delay balancing.
During CTS, the tool uses stop pins in calculation and optimization for both DRC and clock tree timing.
1. Latchup
If NFETs and PFETs are close together (e.g. the NMOS and PMOS of an inverter), a parasitic PNPN structure can form; this is called latchup. PNP and NPN BJTs are formed in the MOSFET structure, and resistances form in the p-substrate and n-well (vertical and horizontal resistors). The result is a conductive path between VDD and ground through the positive-feedback BJTs, and the device gets damaged by short-circuit (crowbar) current. There is internal latchup and external latchup. Latchup is triggered by:
1. Voltage drop (ground bounce): internal latchup
2. Charge generation (heat impact/hot carriers): external latchup
For regeneration not to take place, Rpsub1 + Rpsub2 must be low and Rnwell1 + Rnwell2 must be small. More tappings reduce the p-substrate resistance, and so does a shorter distance to the tap cells. Each tapping in a cell (vias) costs more area.
Internal latchup risk is reduced by more cell tappings and a lower distance from diffusion to well taps (done in PD). Shallow trench isolation is done by the foundry.
External latchup is reduced by FDSOI (fully depleted silicon on insulator), guard rings, and triple-well structures (cell design, not PD scope).
Bulk technology vs FDSOI (a different structure): other methods to reduce latchup risk.
2. HFN synthesis
Clock, reset, and scan enable are the high-fanout nets in the design and are dealt with separately.
1. During place_opt, the tool takes care of buffering and transition on HFNs. If place_opt should not handle an HFN, use set_ideal_net.
2. Standalone HFN buffering is done with create_buffer_tree.
3. HFN synthesis similar to CTS: compile_clock_tree -hfn builds a clock-tree-like structure (no low skew/latency requirement and no exceptions such as exclude pins; it only needs to meet transition and capacitance).
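The level-by-level buffering that HFN synthesis performs can be approximated by counting buffers so that no driver exceeds a fanout limit. This is a simplified sketch, not what create_buffer_tree or compile_clock_tree actually do (they also consider placement, transition and capacitance); the fanout limit is a hypothetical value.

```python
import math


def buffer_tree(fanout, max_fanout):
    """Buffers per level (level nearest the sinks first) so that every
    driver, including the original net driver, sees at most max_fanout loads."""
    levels = []
    loads = fanout
    while loads > max_fanout:
        buffers = math.ceil(loads / max_fanout)
        levels.append(buffers)
        loads = buffers  # these buffers become the loads of the next level up
    return levels


# e.g. a scan-enable net with 1000 sinks and a fanout limit of 16
levels = buffer_tree(1000, 16)
print(levels, sum(levels))  # [63, 4] 67
```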
3. Uncertainty & jitter
Contributors are:
clock skew
clock jitter: absolute jitter, tracking jitter (if the input has jitter), and periodic jitter. Jitter is the variation in arrival time of the active clock edge (about 2% of the clock period). The worst-case slack impact is two times the jitter.
Signoff margins: use 30% extra setup/hold margins (foundry guideline).
IR drop: a drop in voltage increases the delay of clock/data path cells, which can cause setup/hold violations. Typically a 1% voltage drop gives a 2.5% change in delay.
Tclk >= Tc2q + Tcomb + Tsu + signoff margin (Tcomb = combinational data path delay)
Stage-wise margins: budget enough margin in earlier stages (synthesis / pre-CTS / post-CTS: jitter + SOM + skew + IR drop), since it is consumed in the subsequent stages, so that timing is still met after signoff.
Top-level impacts (crosstalk).
Uncertainty adds pessimism (it requires extra slack margin to accommodate setup and hold uncertainty).
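The setup relation Tclk >= Tc2q + Tcomb + Tsu + margin, together with the uncertainty contributors listed above, can be turned into a small slack budget. In this sketch, which is an illustration rather than a signoff method, jitter is counted twice (worst case), the 2.5%-per-1% IR-drop rule of thumb derates only the data path, and all numbers are hypothetical.

```python
def setup_slack(t_clk, t_c2q, t_comb, t_su, skew, jitter, signoff_margin,
                ir_drop_pct=0.0, delay_pct_per_drop_pct=2.5):
    """Setup slack (ns) = Tclk - (data arrival + Tsu + uncertainty)."""
    # IR drop slows cells: each 1% supply drop adds ~2.5% delay (rule of thumb).
    derate = 1.0 + ir_drop_pct * delay_pct_per_drop_pct / 100.0
    data_arrival = (t_c2q + t_comb) * derate
    # Uncertainty: skew + worst-case jitter impact (2x) + signoff margin.
    uncertainty = skew + 2 * jitter + signoff_margin
    return t_clk - (data_arrival + t_su + uncertainty)


slack = setup_slack(t_clk=1.0, t_c2q=0.1, t_comb=0.6, t_su=0.05,
                    skew=0.05, jitter=0.02, signoff_margin=0.03, ir_drop_pct=2.0)
print(f"{slack:.3f} ns")  # 0.095 ns
```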
4. Virtual clock & update_io constraint
Virtual clock: the SDC contains clocks, generated clocks, and virtual clocks.
Normal and generated clocks have an originating port/pin. A virtual clock has a clock name that is not associated with any physical pin or port. Input/output delay constraints are defined with respect to the virtual clock, and latency is applied to the virtual clock. After CTS is done, update_io_constraints changes only the clock latency; IO delays are not changed.
Purpose of defining a virtual clock: the advantage of a virtual clock is that we can specify the desired latency for it. As mentioned above, a virtual clock is used to time interface paths.
Figure 1 shows a scenario where defining a virtual clock helps. Reg-A is a flop inside the block that sends data through PORT to outside the block. Since it is a synchronous signal, we can assume it is captured by a flop (Reg-B) sitting outside the block. Within the block, the path to PORT can be timed by specifying an output delay for this port with respect to a clock synchronous to clock_in. We could specify the delay with respect to clock_in itself, but then specifying the clock latency becomes difficult: if we specify latency for clock_in, it is applied to Reg-A as well. Applying output delay with respect to a real clock causes input ports to get relaxed and output ports to get tightened after the clock tree has been built.
The solution is to define a virtual clock and apply the output delay with respect to it. Making the source latency of the virtual clock equal to the network latency of the real clock solves the problem.
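The tightening effect can be seen with a simplified slack model of the Reg-A to PORT path. The model assumes, as the text describes, that after CTS the launch flop sees the real clock's network latency while the capture side of an output delay defined on the real clock gets no latency; with a virtual clock whose source latency equals that network latency, both sides move together. All numbers are hypothetical.

```python
def output_slack(period, launch_latency, capture_latency, t_c2q, t_data, output_delay):
    """Setup slack (ns) of a flop-to-output-port path."""
    arrival = launch_latency + t_c2q + t_data
    required = period + capture_latency - output_delay
    return required - arrival


NETWORK_LATENCY = 0.8  # hypothetical post-CTS latency of clock_in at Reg-A
args = dict(period=5.0, t_c2q=0.2, t_data=1.5, output_delay=2.0)

pre_cts = output_slack(launch_latency=0.0, capture_latency=0.0, **args)
real_clk = output_slack(launch_latency=NETWORK_LATENCY, capture_latency=0.0, **args)
virtual_clk = output_slack(launch_latency=NETWORK_LATENCY,
                           capture_latency=NETWORK_LATENCY, **args)
print(f"{pre_cts:.1f} {real_clk:.1f} {virtual_clk:.1f}")  # 1.3 0.5 1.3
```

With the real clock the output path loses exactly the network latency; with the virtual clock the slack stays at its pre-CTS value.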
DRC constraints:
1. set_max_fanout: each cell input has a fanout_load attribute, and each cell output has a max_fanout attribute. For example, BUFFD0 can't drive more than two cells of its own kind.
2. set_max_capacitance: BUFFD0's output shouldn't be connected to any input if the interconnect plus load pin capacitance is more than 2.2.
3. set_max_transition applies to the input of BUFFD0: any net with a transition greater than 1.5 shouldn't be connected to this input.
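The three DRC checks for a BUFFD0 can be sketched as below. The max_capacitance (2.2) and max_transition (1.5) values follow the text, in library units; the fanout_load of 1.0, max_fanout of 2.0 (so that it drives at most two BUFFD0 inputs) and the 0.5 pin capacitance are hypothetical.

```python
# Library attributes for BUFFD0: cap/transition limits from the text;
# fanout_load, max_fanout and pin_cap are assumed for illustration.
BUFFD0 = {"fanout_load": 1.0, "max_fanout": 2.0, "pin_cap": 0.5,
          "max_capacitance": 2.2, "max_transition": 1.5}


def drc_violations(cell, loads, wire_cap, input_transition):
    """cell drives loads over a net with wire_cap; input_transition is the
    transition arriving at cell's own input pin."""
    violations = []
    if sum(load["fanout_load"] for load in loads) > cell["max_fanout"]:
        violations.append("max_fanout")
    if wire_cap + sum(load["pin_cap"] for load in loads) > cell["max_capacitance"]:
        violations.append("max_capacitance")
    if input_transition > cell["max_transition"]:
        violations.append("max_transition")
    return violations


print(drc_violations(BUFFD0, [BUFFD0] * 3, wire_cap=0.8, input_transition=1.2))
# ['max_fanout', 'max_capacitance']
print(drc_violations(BUFFD0, [BUFFD0] * 2, wire_cap=0.8, input_transition=1.6))
# ['max_transition']
```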
5. NDR vs shielding (ground capacitance)
NDR (non-default rule): a custom net width/spacing rule, e.g. single width, double spacing for clock nets to reduce crosstalk.
1. Capacitance: Carea + Cfringe (referenced to ground; the p-substrate is held at 0 potential/GND) + Ccoup (referenced to nearby wires). Less with NDR.
2. Crosstalk: reduced, since net spacing is larger.
3. ID (insertion delay): less, because of less capacitance
4. Power: less
Shielding: a power or ground net is placed next to the clock net.
1. Capacitance increases
2. Crosstalk reduces
3. ID is more because of more capacitance
4. Power is also more
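A first-order comparison of the two options: coupling capacitance is modelled with a parallel-plate approximation (inversely proportional to spacing), and coupling to a quiet shield wire counts toward total capacitance but not as crosstalk. This is a simplification, the per-um values are hypothetical, and real extraction also changes fringe capacitance with spacing.

```python
def clock_wire_caps(c_ground, c_couple_min, spacing_factor=1.0, shielded=False):
    """Return (total cap, cap coupled to switching aggressors) per um.

    c_ground: area + fringe capacitance to the substrate (held at GND)
    c_couple_min: coupling to neighbouring wires at minimum spacing
    spacing_factor: spacing relative to minimum (2.0 = double-spacing NDR)
    shielded: neighbours are VDD/VSS shield wires at minimum spacing
    """
    c_couple = c_couple_min / spacing_factor  # parallel-plate approximation
    total = c_ground + c_couple
    aggressor = 0.0 if shielded else c_couple  # a quiet shield injects no crosstalk
    return total, aggressor


for label, kwargs in [("default", {}), ("NDR 2x space", {"spacing_factor": 2.0}),
                      ("shielded", {"shielded": True})]:
    total, aggressor = clock_wire_caps(0.10, 0.08, **kwargs)
    print(f"{label:13s} total={total:.2f} aggressor={aggressor:.2f}")
```

With these numbers the NDR gives the lowest total capacitance, while shielding keeps more capacitance than the NDR but removes the aggressor coupling entirely.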
6. Optimisation techniques
Timing
DRVs: max transition, max capacitance; timing: setup, hold.
Max transition violations are fixed by Vt swapping, upsizing, net buffering (for long nets), or fanout splitting (for high fanout).
Max capacitance violations are fixed by fanout splitting (for high fanout) or net buffering (for long nets).
Setup violations are fixed by Vt swapping, upsizing the driver, fanout splitting, buffering, or clock tweaking (adding buffers in the clock path).
Hold violations are fixed by adding delay cells in the data path at the endpoint, near the capture flop's D pin (delay cells have small area but more variation across corners), or by divergent buffering.
Congestion is reduced by padding, keepouts, soft/hard blockages, lower overall placement density, low-pin-count cells, and a better floorplan. If cell density is high, put partial blockages to disperse local congestion.
Area
Downsize cells on positive setup slack paths to create some space.
Power
Leakage power is reduced by Vt swapping on positive slack paths (optimise_power command).