Standard Cell Power Connections
5.3 Initialize Floorplan
We are now ready to proceed with CADENCE SOC ENCOUNTER.
Student Task 7:
From the menu select Floorplan → Specify Floorplan.... A large window will open.
Select the DIE SIZE BY: WIDTH AND HEIGHT option and make sure that both values are
1519.62.
Now we need to specify the I/O to core spacing by filling in the four values under the CORE
MARGINS BY: entry. There must be sufficient room for the power ring around the core area.
Larger values will reduce the area available to place the core cells, thereby increasing core
utilization.
As noted earlier, some iterations are usually required to find optimal values for a particular
design.
In this exercise we will assume that we will use one VCC and one GND line of maximum
width 20 µm. We need some extra space between the lines and, for the moment, we can
start with a distance of 50 µm for all sides and click on OK.
The floorplan should now look as shown in the screenshot below. Note that the pads are all placed
at their proper locations, as the I/O file used during design import specifies absolute locations and we
made sure that the die size stays fixed to the proper size during the initialize floorplan step.
Student Task 8:
Next we need to place the RAM macro-cell. Change the cursor mode to MOVE/RESIZE/RESHAPE
by selecting the appropriate icon (next to the ruler icon) or use the keyboard shortcut
SHIFT-R. Now you can select the RAM macro-cell and drag it to any location you like. The
blue lines displayed are so-called flightlines that show where the signal connections to the
block are.
You can change the orientation of the RAM by either using Floorplan → Edit Floorplan →
Flip/Rotate Instances... (or press r), or with the attribute editor (press q). Note that the
RAM macro will completely block Metal-1, Metal-2, Metal-3 and Metal-4. Only Metal-5 and
Metal-6 will be available for routing over the RAM macro-cell [20].
[20] By default, the internal structures within a cell or block are not displayed. You need to make Cell Blkg visible to
see the so-called blockages within a cell.
5.4 Power Planning
The next step is to create the power distribution network.
The Verilog netlist that we started with does not contain any power connections, therefore we need
to create this connectivity now. We have to connect the power/ground pins of all instances to the
respective global power/ground net that was specified on the DESIGN IMPORT form (category POWER
on the ADVANCED tab) [21].
This can be done using the Floorplan → Connect Global Nets... form or you can use the
globalnet.tcl script provided.
Student Task 9:
Execute the script provided by typing on the command line of CADENCE SOC ENCOUNTER
(not GUI):
enc> source scripts/globalnet.tcl
[21] There is also a special rule required if there are logic one/zero values 1'b1/1'b0 instead of TIE1/TIE0 cells in your
netlist. You should, however, not have such logic values in your netlist.
Next we will add the core power rings that distribute power all around the core.
Student Task 10:
Select the menu Power → Power Planning → Add Rings.... A large window will
appear. The NET(S) field at the top defines for which nets rings will be created. The default
is to create power VCC as well as ground GND rings.
In the RING CONFIGURATION section you can specify on what layers the ring segments will
be created. Select metal5 H for TOP and BOTTOM and metal6 V for LEFT and RIGHT.
Specify WIDTH as 20 µm, SPACING as 1.5 µm and OFFSET as 4 µm and click OK.
There are many alternative power distribution schemes that can be used. The one that we have
chosen here is a very simple one. We have selected the upper metal layers Metal-5 and Metal-6
for the ring, because in this technology Metal-6 is thicker and consequently has less parasitic
resistance which is desirable for power distribution.
For your own designs, you should perform a power analysis (topic of Training 2) to find out the best
power distribution approach that matches your design.
The width has been chosen as 20 µm for convenience. Basically, the wider the power connection,
the better. But as already mentioned earlier, in this technology, metal lines wider than 20 µm
need to be slotted (stress relief slots), which requires extra effort. As an alternative to slotting it is
also possible to create several smaller parallel rings, e.g. two VCC and two GND rings.
SPACING determines the distance between the two nets and OFFSET determines the distance between
the core area and the innermost ring.
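As a rough check that the rings fit into the core margin, the values from this exercise can be added up. The sketch below is only a simplified geometric model: it assumes the rings lie side by side between the core edge and the pads, and the function name is made up for illustration.

```python
def ring_footprint_um(width, spacing, offset, num_rings=2):
    """Distance from the core edge to the outer edge of the outermost ring.

    Assumed model: OFFSET, then num_rings wires of equal WIDTH,
    separated by SPACING.
    """
    return offset + num_rings * width + (num_rings - 1) * spacing


# Values used in this exercise: WIDTH 20, SPACING 1.5, OFFSET 4 (all in um)
footprint = ring_footprint_um(width=20.0, spacing=1.5, offset=4.0)
core_margin = 50.0  # core margin chosen in the floorplan step
print(f"ring footprint: {footprint} um, remaining margin: {core_margin - footprint} um")
```

With these numbers the VCC and GND rings occupy 45.5 µm of the 50 µm margin, which leaves 4.5 µm between the outer ring and the pads.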
We also need a (partial) ring around the macro-cell; you will see later why this is necessary.
Student Task 11:
Select the menu Power → Power Planning → Add Rings... just like before. This time
in the RING TYPE box, select BLOCK RING(S) AROUND. You can leave the selection at
EACH BLOCK since we have only one block anyway.
CADENCE SOC ENCOUNTER is usually smart enough to create wires only on the edges
where no power lines are yet, i.e. to not create new wires on top of the core ring.
If this fails you can specify the segments and connections you want on the ADVANCED tab.
Fill in values/settings similar to those used for the core rings and click on OK.
At any point, if you wish to delete part of the floorplan you can:
use the UNDO feature by simply pressing u
select and remove objects of a specific class (press d)
use the menu option Floorplan → Edit Floorplan → Clear Floorplan...
select an object and hit the Del key on the keyboard
Student Task 12:
Also, you can save or load (restore) your floorplan at any time using the menu Design →
Save → Floorplan... and Design → Load → Floorplan... respectively.
Save your floorplan to the save directory.
At this point, power reaches the standard cells only from the sides. Especially for fast designs, the
standard cells in the middle of a standard cell row will not receive sufficient power, so it is important to
add vertical stripes to improve the power distribution.
Student Task 13:
Select Power → Power Planning → Add Stripes....
The SET CONFIGURATION part of the window defines the properties of one stripe set.
The SET PATTERN part defines how many stripes will be added. We can either choose to
insert a fixed number of sets or only specify the distance between two sets (SET-TO-SET
DISTANCE).
In the FIRST/LAST STRIPE part, we select RELATIVE FROM CORE OR SELECTED AREA. Enter
values for X FROM LEFT and X FROM RIGHT that position the stripe sets in such a way that the
standard cell rows get divided into three equally long pieces. See the screenshot for width, spacing and
layer. Note: You can fine-tune this later by moving the stripe sets.
By default stripes will continue over macro cells. To prevent this, select the OMIT STRIPES
INSIDE BLOCK RINGS option in the STRIPE BREAKING section of the ADVANCED tab.
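The offsets that divide the rows into three equal pieces follow directly from the core width. The sketch below shows the arithmetic only: the core width of 1200 µm is a made-up value (read the real one from your floorplan), and the width of the stripe set itself is ignored.

```python
def stripe_offsets_um(core_width, num_pieces=3):
    """X positions, measured from the left core edge, that split the
    standard cell rows into num_pieces equally long pieces."""
    piece = core_width / num_pieces
    return [piece * i for i in range(1, num_pieces)]


core_width = 1200.0  # hypothetical core width in um, not taken from the exercise
positions = stripe_offsets_um(core_width)
x_from_left = positions[0]
x_from_right = core_width - positions[-1]
print(positions, x_from_left, x_from_right)
```

Two stripe sets at one third and two thirds of the core width give the same value for X FROM LEFT and X FROM RIGHT.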
It is rather easy to move wires in CADENCE SOC ENCOUNTER. Click on the move wires button (or
press m), select the wires you want to move, and drag them to their new location. CADENCE SOC
ENCOUNTER will make sure that electrical connections remain intact. If you want, you can use this to
fine-tune the stripe placement.
We still need to define a block halo for the RAM macro-cell. This is necessary to keep standard cells
from being placed too close to the RAM and also to avoid problems when routing the power lines of
the standard cell rows.
The figure below illustrates one common problem with the block halo.
[Figure: a macro-block surrounded by its block halo, with power rails and two standard cell rows; one row ends in a terminated power line (good), the other in a dangling power line (bad).]
In this figure, only two standard cell rows are shown. The block halo around the first row extends far
enough to cover the two power lines [22]. This is as it should be.
For the second row, the block halo does not cover the power rails, and when making the power
connections CADENCE SOC ENCOUNTER will try to extend the power connection past the power
rails as shown in the figure. This leaves a dangling power line [23]. While this will not render your chip
useless, it should be avoided.
Student Task 14:
From the menu select Floorplan → Edit Floorplan → Edit Halo.... A window will
appear, where you can specify a keep-out zone for routing and/or placement around the
macro-cell.
Usually we only need a Placement Halo. The size will depend on your power routing/floorplan.
Create an appropriate Placement Halo.
Notice that the I/O pads are placed with some distance between them [24]. At some point in the design
flow we need to close the gaps between the I/O pads in order to complete the supply rings that run
around the core (within the pad cells) and are required to supply the circuitry within the pad cells.
Student Task 15:
Instead of using wires, we will place so-called filler cells that completely fill the gaps and
establish the required connectivity.
There is a script that will automatically insert matching filler cells. Type the following in the
CADENCE SOC ENCOUNTER console window:
enc> source scripts/fillperi.tcl
[22] This is just for illustration. It is not possible to draw a block halo that has this (L) shape.
[23] Dangling wires of this sort are known as geometry antennas in Cadence SoC Encounter.
[24] This is due to the constraints set by the company that bonds the chips. They specify that the minimum distance
between two adjacent pads can be 90 µm. Since even a core-limited pad in this technology is roughly 60 µm wide, we
need to place them with gaps in between.
Now we need to finalize the power connections of the chip. The following connections still need to be
made:
The core ring needs to be connected to the core supply pads (VCC3IOD and GNDIOD).
All standard cells need to be connected to VCC and GND lines.
All macro-cells need to be connected to VCC and GND lines.
Student Task 16:
Select Route → Special Route... from the menu. SRoute is the special net router,
and is only used to make power connections.
The ROUTE: part contains the different connection types we have listed above. BLOCK
PINS are macro-cell power connections, PAD PINS are the connections from the core supply
pads to the core ring. We will not need PAD RINGS since we have already used filler cells to
complete these rings. STANDARD CELL PINS will add power lines to the standard cell rows.
Finally, if you still have stripes that are not connected to power (not very likely) you can use
the STRIPES (UNCONNECTED) option.
While it is possible to route all connections at the same time, it is strongly recommended to
do it one by one:
1. Start with PAD PINS. If nothing happens you have most likely forgotten to source the
globalnet.tcl script.
2. Route BLOCK PINS. Check the result, did the router connect the macro-cell the way
you wanted? If not you may need to study the ADVANCED tab of the SRoute window.
If all fails you can edit the connections manually.
3. Route the STANDARD CELL PINS. This should create many horizontal Metal-1 lines
that connect to the rings and stripes. Look for dangling wires around the block halo
(adjust the block halo if necessary).
We are now finished with floorplanning. Your floorplan should look similar to the following
screenshot.
6 Placement
We will now start with the placement of the standard cells in the core area. Placement is a very
computation intensive problem, and mostly heuristic algorithms are used for this purpose.
Student Task 17:
Select Place → Standard Cells....
We want to run a full placement and not an incremental or just the quick prototyping one.
INCLUDE PRE-PLACE OPTIMIZATION, however, is very useful as it removes all buffer/inverter
trees from the netlist, which will help us with timing analysis as you will see later.
To set advanced options click MODE. Set CONGESTION EFFORT to LOW and deselect RUN
TIMING DRIVEN PLACEMENT, as timing-driven placement takes much longer and might not help that
much to improve timing. There are several other options that you can set, but at this time
we will leave them as they are. Apply the changes by pressing OK.
You will come back to the placement window seen below, click OK to start placement. This
may take some time.
We have to warn you about the various performance related options such as CONGESTION EFFORT
and RUN TIMING DRIVEN PLACEMENT above. In the exercises sometimes we will advise you to use
certain settings for these options in order to reduce runtime, or because for this particular design
we have found out that a particular option gives better results. When you do your own designs, you
should consider evaluating which options are better suited rather than copying all options from this
exercise.
For each standard cell, the placement algorithm will try to find the optimum location so that there is a
feasible routing solution and the total length of the connections is minimized.
Examine the placement by using the design browser (switch to the physical view). You will notice that
standard cells within the same entity are mostly placed next to each other.
The available space and the placement of macro-cells and I/O pads can have a great influence on
the placement of standard cells. Even though more space seems to be a good idea, too much
space sometimes results in placements where the average distance between standard cells and
consequently the delays caused by wire capacitance/resistance become larger. Only experience and
several iterations will allow you to find a placement for your circuit that is close to optimal.
Note: Visibility of SPECIAL NET is turned off in the next screenshot.
The results for placement (and later routing) are strongly design dependent. For example, structures
with many interconnections such as look-up tables will usually need much more space than synthesis
predicted, as the cells need to be spread out in order to have enough space to route all the
interconnections. This is why generalizations for back-end design, such as "During back-end design, your
circuit area will increase by 10%", don't work very well.
Student Task 18:
Let us save the entire design with Design → Save Design As → SoCE. This will save the
configuration file, netlist, floorplan, special route, placement and routing files as well as the
current mode, options and preferences. A design saved in this way can be restored using
Design → Restore Design... → SoCE.
The space required is surprisingly small as most files are compressed and the library files
do not get saved along with the design.
Remember to save under the save directory.
Alternatively you could also just save the placement. Select Design → Save → Place....
During synthesis, SYNOPSYS DESIGN COMPILER assigns constant logic values to two special standard
cells named TIE0x and TIE1x, where x is a drive strength modifier. This creates a small
inconvenience, as often one of these cells is assigned to drive many inputs at the same time, creating
relatively long interconnections.
There is sufficient space on the chip to place several of these cells. We will use a script that first
removes all these cells. Then we will set the rules for placing these cells. The example script
scripts/tiehilo.tcl sets the maximum number of connections driven by a single cell to 20, and the maximum
distance between the pin and the tie cell to 250 µm. Finally, we insert the tie cells according to
the rules we have defined.
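To get a feeling for what these two rules imply, the sketch below estimates how many tie cells a constant net needs and checks whether a pin is within reach of a tie cell. It is only an illustration: the helper names are hypothetical, and the Manhattan distance is an assumption. The tool may measure distance differently, and the distance rule can force more cells than the fanout rule alone.

```python
import math


def min_tie_cells(num_pins, max_fanout=20):
    """Lower bound on the number of tie cells from the fanout rule alone."""
    return math.ceil(num_pins / max_fanout)


def within_reach(tie_xy, pin_xy, max_distance_um=250.0):
    """True if the pin is within max_distance_um of the tie cell
    (Manhattan distance, assumed here for illustration)."""
    dx = abs(tie_xy[0] - pin_xy[0])
    dy = abs(tie_xy[1] - pin_xy[1])
    return dx + dy <= max_distance_um


# Hypothetical example: 45 pins tied to logic 0 need at least 3 TIE0 cells
print(min_tie_cells(45))
print(within_reach((100.0, 100.0), (300.0, 120.0)))
```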
Student Task 19:
At the command line type:
enc> source scripts/tiehilo.tcl
7 Timing
The synthesis tools we currently use for HDL synthesis (SYNOPSYS DESIGN COMPILER) are not
aware of any instance placement information. Therefore the interconnects can only be estimated
based on a statistical model, i.e. the fanout of a net determines its length, capacitance, resistance and
area. Now that the placement and even trial-routing are available, the timing might differ considerably
from the numbers obtained from SYNOPSYS DESIGN COMPILER.
7.1 Analysis
CADENCE SOC ENCOUNTER has a practical timing analysis function, where you only have to specify
the state of the design (see below) and the ANALYSIS TYPE (Setup or Hold) you want to run.
Pre-Place: design is not placed
Pre-CTS: design is placed but clock tree is not yet inserted
Post-CTS: design is placed and the clock tree is inserted
Post-Route: design is placed and routed
Sign-Off: will use extra tools for even more precise analysis. We will not use this as these tools are
not installed/set up.
Depending on this state, trial route (a very simple, but fast routing) and/or parasitic extraction might
be run automatically prior to the timing analysis. This will improve the accuracy and help to avoid
unnecessary iterations.
Student Task 20:
Open Timing → Analyze Timing and make sure PRE-CTS and SETUP are selected.
Start the timing analysis by clicking OK.
Note: You could also do this from the command line with
enc> timeDesign -preCTS
As the design is not routed, CADENCE SOC ENCOUNTER will perform trial route and parasitic extrac-
tion before doing the timing analysis. A short summary will be displayed on the console (the actual
numbers may differ slightly):
+--------------------+---------+---------+---------+---------+---------+---------+
| Setup mode | all | reg2reg | in2reg | reg2out | in2out | clkgate |
+--------------------+---------+---------+---------+---------+---------+---------+
| WNS (ns):| -9.069 | -6.554 | -9.069 | -0.686 | -7.328 | N/A |
| TNS (ns):| -2684.3 | -1776.9 | -2392.1 | -1.172 | -43.761 | N/A |
| Violating Paths:| 861 | 732 | 454 | 7 | 6 | N/A |
| All Paths:| 1807 | 1342 | 817 | 18 | 6 | N/A |
+--------------------+---------+---------+---------+---------+---------+---------+
+----------------+-------------------------------+------------------+
| | Real | Total |
| DRVs +------------------+------------+------------------|
| | Nr nets(terms) | Worst Vio | Nr nets(terms) |
+----------------+------------------+------------+------------------+
| max_cap | 187 (187) | -3.774 | 188 (188) |
| max_tran | 368 (13826) | -8.333 | 387 (13867) |
| max_fanout | 0 (0) | 0 | 0 (0) |
+----------------+------------------+------------+------------------+
Density: 59.566%
Routing Overflow: 0.00% H and 0.25% V
------------------------------------------------------------
The summary gives a very good overview of the current design timing. Some explanations:
The analysis was run in setup mode, i.e. setup time checks were performed but no hold time
checks.
The columns contain numbers for all paths in the design (ALL) or for specific path groups, e.g.
reg2reg for all register to register paths.
Worst negative slack (WNS) reports the slack for the most critical path. Negative numbers
mean that the constraints are violated by this value.
Total negative slack (TNS) is the sum of the negative slacks of all violating paths. Together with the number
of violating paths this figure helps to see how severe the violations are.
Real/Total DRV show (electrical) design rule violations; some libraries have a maximum
transition time for all nets. The report above shows that 368 nets have a transition violation (the
signal takes too long to change from logic-1 to logic-0 or vice versa). In addition 187 nets have
a maximum capacitance violation (the total amount of capacitance driven by a net exceeds the
limit set by the design library). These violations are mostly related to excessive parasitic
capacitance due to interconnections, and generally cause timing violations as well. However, even if
a DRV does not cause a timing violation it needs to be fixed.
DENSITY and ROUTING OVERFLOW show the placement utilization and routing resources, i.e.
are a measure for the feasibility of the current floorplan/placement.
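The relation between WNS, TNS and the path counts can be illustrated with a few lines of Python. This is a minimal sketch with invented slack values. It treats TNS as the sum of the slacks of all violating paths and is not meant to reproduce exactly how the tool groups or counts paths.

```python
def summarize_slacks(slacks_ns):
    """Compute WNS, TNS and path counts from a list of path slacks (ns)."""
    violating = [s for s in slacks_ns if s < 0]
    return {
        "wns": min(slacks_ns),        # slack of the most critical path
        "tns": sum(violating),        # sum over violating paths only
        "violating": len(violating),
        "all": len(slacks_ns),
    }


# Invented slack values for four paths
summary = summarize_slacks([-0.5, -0.25, 0.75, 1.0])
print(summary)
```

A large TNS with a small WNS points to many paths that each miss timing by a little; a TNS close to the WNS points to a few isolated critical paths.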
Remark: Refer to exercise 4 of VLSI I [25] if you have problems with timing concepts.
The summary looks really terrible. Obviously we have many timing violations that we need to have a
closer look at, before we try to optimize the timing with CADENCE SOC ENCOUNTER.
Here are some important points to consider when doing so:
The timing depends entirely on the constraints you have specified in the file src/chip.sdc. The
most common mistake is to have errors in this file. Before you go any further make sure that
your timing constraints are correct.
Make sure to not accidentally use constraints that were written for the core level (chip without
pads) at the chip level (with pads) and vice versa. The pads affect the I/O timing quite a bit and
the drive capabilities of a standard cell and an output pad are entirely different, i.e. set_load
needs to be very different.
Inputs and outputs used for test and debugging may cause timing violations. Most of these
signals are not dynamic (they are not toggled during normal operation) and the timing paths
originating from these inputs or ending at these outputs should be ignored, i.e. left
unconstrained or explicitly disabled.
To speed up delay calculation CADENCE SOC ENCOUNTER does not compute the timing of
nets with a fanout above a certain limit but rather swaps in predefined values for delay,
capacitance and transition time. All these numbers are specified on the DESIGN IMPORT form on the
ADVANCED tab in the Delay Calculation category. As a result you will not see the real timing [26]
of these nets in timing analysis and furthermore optimization will not see (and therefore not fix)
violations [27] on these nets. However, this is usually the desired behavior as we give these nets
a special treatment anyway (with CTS).
[25] You can access the exercise descriptions, files, and solutions under /home/vlsi1/u4.
[26] To see the real timing you can change the limit on-the-fly from 1000 to a very high value in the console with
setUseDefaultDelayLimit 100000. More on this topic later.
[27] DRV violations will be fixed but no setup/hold violations. Clock nets are even more special, also no DRV fixing will
be done there.
Let's now examine the detailed reports that were generated by timing analysis and can be found in
the timingReports folder. Each analysis produces multiple files. Among these there are three files
dedicated to design rule violations (max capacitance: *.cap, max fanout: *.fanout, max transition
time: *.tran violations), and separate *.tarpt timing analysis report files for different path groups
(in2out, in2reg, reg2reg, reg2out).
Student Task 21:
Where do the violating paths in the in2out path category start?
Where do the violating paths in the in2reg path category start?
Do the paths in reg2out and reg2reg look like normal paths that should be optimized to
meet timing or is there something wrong?
Why are the reg2reg paths too slow? Look for large numbers in the Delay column and
check the drive strength of the corresponding cell.
There are several different problems in the .sdc file that we have used. First of all, two of our inputs
should not be considered for timing analysis [28]. We also have several nets (clock, reset and scan
enable) that we will take care of separately (using the clock tree synthesizer, which we will see later).
These nets will show up in the DRV reports. We do not want to solve timing related problems for
these nets (since they will be solved later anyway); the time and effort required to optimize these nets
could prevent other parts of the design from being optimized.
We can use the DEFAULT PIN LIMIT feature of CADENCE SOC ENCOUNTER to stop CADENCE SOC
ENCOUNTER from extracting timing information (and reporting timing violations) for the nets that we
will be optimizing later on. By default the pin limit of CADENCE SOC ENCOUNTER is set to 1000. In
our case this number is too high (we have slightly more than 400 flip-flops in our design).
Student Task 22:
Let us see the nets which have a large fanout. Report all nets with e.g. more than 400 pins.
Use the console command:
enc> report_net -min_fanout 400
Now set a suitable limit with the command
enc> setUseDefaultDelayLimit <number>
so that the high fanout nets will not be considered for timing. Also make the necessary
changes to the timing constraints file src/chip.sdc to disable the offending input
ports. Reload the timing constraints by selecting the menu Timing → Load Timing
Constraint....
Then rerun timing analysis.
If you have done everything correctly, the only setup violations should be in the path groups
register-to-register and register-to-out. There should no longer be pins that belong to the scan enable or reset
network in the transition time violation report.
[28] Cadence SoC Encounter provides a special timing calculation mode that is called Multi-Mode Multi-Corner
Analysis (MMMC). In this mode it is possible to define several scenarios (i.e. separate test and functional modes).
The setup for MMMC is slightly involved and will not be covered as part of this exercise.
7.2 Optimization
In order to (better) meet the constraints, CADENCE SOC ENCOUNTER can try to optimize the design
at every stage of the design process. In our case, the worst setup time violation is about 5.8 ns (for
an 8 ns period), although the netlist delivered by the synthesis tool had no timing violations. This is
due to differences in interconnect parasitics between the two tools. While the synthesis tool relies on
an estimate (based on a statistical model), CADENCE SOC ENCOUNTER can use the real placement and
(trial-)routing at hand. Consider the following line from a timing report (broken down over many lines
for readability):
Path 1: VIOLATED Setup Check with Pin i_filter_top/u_filter/u_filter_stage_5/
RegxDP_reg_42_/CK
Endpoint: i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/D () checked
with leading edge of ClkxCI
Beginpoint: i_filter_top/u_ram_wrapper/i_ram/DO5 ()
triggered by leading edge of ClkxCI
Path Groups: {reg2reg}
Other End Arrival Time 0.000
- Setup 0.149
+ Phase Shift 8.000
= Required Time 7.851
- Arrival Time 14.405
= Slack Time -6.554
Clock Rise Edge 0.000
= Beginpoint Arrival Time 0.000
Timing Path:
+----------------------------------------------------------------------------------------------------------+
| Instance | Arc | Cell | Slew | Load | Delay | Arrival |
| | | | | | | Time |
|-----------------------------------+---------------+--------------------+-------+-------+-------+---------|
| | ClkxCI | | 0.000 | 1.828 | | 0.000 |
| ClkxCI_PAD | I -> O | XMD | 0.000 | 0.000 | 0.000 | 0.000 |
| i_filter_top/u_ram_wrapper/i_ram | CK -> DO5 | SY180_2048X16X1CM8 | 0.130 | 0.033 | 1.750 | 1.750 |
| i_filter_top/u_ram_wrapper/i_test_| A -> O | MUX2 | 8.441 | 1.874 | 3.973 | 5.722 |
| bypass_mux5 | | | | | | |
The last line reports a standard cell instance MUX2 with low driving capability (2) that has to drive a
big load on its output (1.874 pF). The propagation delay is therefore huge (3.973 ns).
The timing of the same cell as reported by synthesis is: Delay: 0.15 ns, Slew: 0.09, Load: 0.01.
While this is an extreme case, it shows how wrong synthesis can be without knowing the actual
placement and wire loads.
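The slack in the header of the report above follows directly from the listed values. The sketch below redoes the arithmetic; the function and parameter names are chosen for illustration and simply mirror the labels in the report.

```python
def setup_slack_ns(other_end_arrival, setup, phase_shift, arrival):
    """Required time and slack of a setup check, as laid out in the report:
    required = other end arrival - setup + phase shift
    slack    = required - arrival
    """
    required = other_end_arrival - setup + phase_shift
    return required, required - arrival


# Numbers taken from the report for Path 1 (all in ns)
required, slack = setup_slack_ns(
    other_end_arrival=0.000, setup=0.149, phase_shift=8.000, arrival=14.405
)
print(f"required = {required:.3f} ns, slack = {slack:.3f} ns")
```

The data arrives at 14.405 ns but is required at 7.851 ns, so the path misses timing by 6.554 ns.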
Student Task 23:
Open the optimization form by selecting Timing → Optimize....
DESIGN STAGE needs to be set to the current design stage. Some options are only available
for certain stages, e.g. hold time optimization cannot be performed during PRE-CTS as it
doesn't make much sense.
Timing is not the only thing that can be optimized. Most technologies specify design rules
like maximum transition time, maximum capacitance driven by a certain cell or maximum
fanout.
After pressing the MODE button, within the THRESHOLDS section you can find options that
can be used to tighten the constraints in order to get some margin [a].
Set the options as shown in the figure below and hit OK. Watch the progress of the
optimization in the console window. CADENCE SOC ENCOUNTER is very verbose with its
actions.
[a] Cadence SoC Encounter will already automatically add a small margin on its own (internally).
During optimization CADENCE SOC ENCOUNTER can select different drive strengths for cells,
add/remove buffers and inverters, move instances or even restructure part of the logic (just like synthesis
does).
Optimization is done using iterations of timing analysis, optimization, trial-route and parasitic
extraction.
As a last step CADENCE SOC ENCOUNTER performs a timing analysis on the optimized design,
prints the summary to the console and writes the detailed reports to the timingReports directory.
Student Task 24:
Take a look at the summary and the final reports generated. There should be no violations
left.
But what happens if we cannot fix the violations with optimization? Again, first make sure to
understand what your constraints are and why they are violated. Often there are errors in converting the
design specifications to constraints (is the input delay really 3.5 ns? Also for this pin?) and describing
them properly with the commands available. If you still have problems, there are three levels where
you can reach a solution:
Optimization during backend design (CADENCE SOC ENCOUNTER)
CADENCE SOC ENCOUNTER can optimize the design at every stage of the design process. In
general, the earlier the stage, the more changes can be done, e.g. PRE-CTS optimization has
much more flexibility than POST-ROUTE optimization. At the PRE-CTS stage registers can be
moved and resized; this will no longer be possible after clock tree insertion. On the other hand,
the parasitic interconnect information is much more accurate with later stages of design, so the
timing information (and hence the optimization goals) will be more accurate.
We can (re)run the optimization at various stages, try a new placement or even start with a
new floorplan. It is impossible to give general guidelines; you will have to see what works best
for your design. If you are far from meeting your target (e.g. for a 10 ns clock, if after all
optimizations you still have a timing violation of 2 ns), you may need to go back to synthesis.
Optimization during synthesis
Once you have tried to place and route a netlist you will get a better idea about the relationship
between synthesis results and back-end results (area and timing wise). You may use this
information to adjust the timing constraints and re-synthesize the circuit.
Architectural optimizations
If nothing else helps, you will have to modify your architecture. During this iteration you will have
a much better idea about what is critical for your circuit.
If all of the above fails, you will have to see if the specifications could be changed.
Student Task 25:
Your design has changed considerably as the optimization algorithms have modified the
netlist and placement. Save it by using Design → Save Design As.
8 Clock Tree Insertion
The fan-out of a net refers to the number of inputs driven by a particular output. High fan-out nets
(that drive hundreds or even thousands of inputs) need to be handled differently from standard
interconnections. Note: For timing analysis we did adjust the pin limit (setUseDefaultDelayLimit) in
order to treat them differently.
Every synchronous circuit has at least one high fan-out net, namely the clock net. For most circuits
reset and scan-enable signals have to be distributed to each and every flip-flop as well.
The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each
driven input adds its own input capacitance to the total load capacitance and in addition, the
interconnection required to distribute the signal to all these inputs increases the load capacitance further.
There are three important parameters for such nets:
Transition time This is the time it takes to change the logic level of a node (e.g. 0 → 1). Basically,
the more load an output has to drive, the more time is required to charge this load. CMOS
drivers consume additional short circuit current during the transition, therefore long transition
times are not very welcome. Furthermore, noise on signals with long transition times can result
in glitching. Most libraries set an upper limit for the transition time (for the technology we are
using this is 1.79 ns for typical libraries). To lower the transition time, a tree of buffers can be
inserted so that the total load is shared between the buffers. The lower the desired transition
time, the more buffers are required.
Insertion delay The time required for the signal to travel from the driver to the end-points. This delay
is usually different for each end-point. Each level of buffers in the buffer tree will add a delay to
the signal.
Skew The difference between insertion delays of different end-points. To minimize skew, a balanced
buffer tree has to be built. Generally, the lower the desired skew the more buffers are required.
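The relation between insertion delay and skew can be illustrated with a few lines of Python. This is only a minimal sketch: the endpoint names and delay values are made up for illustration and do not come from a real CTS report.

```python
def tree_metrics(insertion_delays_ns):
    """Return (max insertion delay, skew) for a dict of per-endpoint delays in ns."""
    latest = max(insertion_delays_ns.values())
    earliest = min(insertion_delays_ns.values())
    # Skew is the difference between the slowest and the fastest end-point.
    return latest, latest - earliest


# Hypothetical per-endpoint insertion delays in ns (assumed values).
delays = {"ff_a/CK": 1.50, "ff_b/CK": 1.75, "ram/CK": 1.25}
max_delay, skew = tree_metrics(delays)
print(f"max insertion delay = {max_delay:.2f} ns, skew = {skew:.2f} ns")
```

Adding a buffer level shifts all delays up (larger insertion delay), while balancing the tree narrows the gap between the fastest and slowest end-point (smaller skew).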
Which parameters are most important depends on the type of net:
Clock: Our main concern is to reduce the skew, since it will affect our timing. The maximum skew
depends on the clock period. As an example, for a 20 MHz clock (50 ns period) a clock skew of 0.5 ns is
acceptable. But for a 200 MHz clock, the same skew equals 10% of the clock period and
would be too high.
If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion
delay will rise, which will affect your input and output timing. Therefore you will want to balance
the skew against insertion delay and the number of buffers. Constraining the maximum insertion
delay too tightly will usually degrade results.
Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so
you do not have to worry about that.
Reset: We are interested in propagating the reset within one clock cycle to all flip-flops in our design.
For designs with on-chip reset synchronization this is strictly required. The insertion delay
should therefore be less than the clock period, transition times should stay within the bounds imposed by
the technology, and skew does not matter at all.
Scan Enable: Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore
we can allow an even larger insertion delay. For transition time and skew the same holds true as
for the reset.
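The clock example above (0.5 ns of skew at 20 MHz versus 200 MHz) can be reproduced with a short calculation. The helper name below is chosen for illustration only; it is a sketch of the arithmetic, not part of any tool.

```python
def skew_fraction(clock_mhz, skew_ns):
    """Return the skew as a fraction of the clock period."""
    period_ns = 1000.0 / clock_mhz  # 1 / f, converted from MHz to ns
    return skew_ns / period_ns


for frequency_mhz in (20, 200):
    fraction = skew_fraction(frequency_mhz, 0.5)
    print(f"{frequency_mhz} MHz: 0.5 ns skew is {fraction:.0%} of the clock period")
```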
[Figure: buffer tree annotated with the CTS parameters AutoCTS Root Pin, Buf Tran, Sink Tran, Min Delay, Max Delay and Max Skew]
In CADENCE SOC ENCOUNTER, clock tree synthesis (CTS) is used to generate optimized buffer
trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.
Student Task 26:
A sample clock tree synthesis configuration file can be found under src/sample/chip.ctstch\
sample. The sample file contains three different configurations for a clock, a reset and a
scan enable signal.
Copy this file to the src directory and adapt the AutoCTSRootPin statements to match
your design.
For educational purposes, change the clock tree specifications as follows: max. skew
0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins
0.4 ns [a].
Take a closer look at the other two trees too.
[a] It is usually not a good idea to specify such a small max. insertion delay that it becomes a limiting factor for
CTS. Results may degrade significantly and for most designs the insertion delay is not very important anyway.
If the design employs a reset synchronization register (the example design has one), the source of
the reset tree must be the output of the synchronization register. Note that there is a special option
named SetASyncSRPinAsSync YES for the reset tree definition. This allows set and reset pins to
be considered as targets for the clock tree optimization.
The scan-enable signal is also a special case. Normally the clock tree synthesis algorithm starts at
the AutoCTSRootPin and traces through the netlist in order to find valid endpoints. By default,
the tracer passes through combinational gates and stops at the clock and asynchronous input pins of
sequential elements (flip-flops).
By specifying the NoGating rising option, we can make the tracer stop at the first gate encountered.
This is necessary since the scan enable signal is often connected to multiplexers and we want
their input pins to be endpoints. Once this option is in use, you need to specify the internal pin of
the pad driving the scan-enable signal, otherwise tracing will stop prematurely at the pad cell.
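The effect of stopping at the first gate can be sketched with a toy tracer. Everything here is an assumption made for illustration: the netlist layout, the instance names and the stop_at_gates flag (which only loosely mimics what the text describes for NoGating) are hypothetical and do not reflect how the tool is implemented.

```python
def trace_endpoints(netlist, root_net, stop_at_gates=False):
    """Collect the input pins at which tracing from root_net stops (toy model)."""
    endpoints = set()
    pending = [root_net]
    visited = set()
    while pending:
        net = pending.pop()
        if net in visited:
            continue
        visited.add(net)
        for instance, pin, kind, output_net in netlist.get(net, []):
            if kind == "sequential" or stop_at_gates:
                endpoints.add(f"{instance}/{pin}")  # tracing stops at this pin
            else:
                pending.append(output_net)  # trace through the combinational gate
    return sorted(endpoints)


# Hypothetical scan-enable net: one flip-flop SEL pin and one multiplexer select pin.
# Each entry is (instance, input pin, cell kind, output net).
netlist = {
    "scan_en": [("ff0", "SEL", "sequential", None),
                ("mux0", "S", "combinational", "mux0_out")],
    "mux0_out": [("pad_out", "A", "combinational", "chip_out")],
}

print(trace_endpoints(netlist, "scan_en"))
print(trace_endpoints(netlist, "scan_en", stop_at_gates=True))
```

In this toy model the default trace runs through the multiplexer and never registers its select pin, whereas stopping at the first gate makes mux0/S an endpoint.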
Student Task 27:
Read in the clock tree specification by selecting Clock → Design Clock ... from the
menu. Using the browser, select the clock tree specification file you have just modified.
Press LOAD SPEC. DO NOT PRESS OK yet [a]. You should now see a summary for all three
clock specifications on the console; check it.
Our netlist may have some buffers on the high fan-out nets we want to build trees on. We
need to remove them prior to CTS with the following command:
enc> deleteClockTree -all
[a] Pressing OK will start the clock tree insertion. We need to make sure that the clock tree specification is correct
before we go ahead with this step. If you accidentally pressed OK here, it is advisable to restart from the last
saved point.
A large number of errors can be discovered by analyzing the pins connected to these nets, even
before building a clock tree.
Student Task 28:
Select Clock → Trace Pre-CTS Clock Tree .... To start the trace, click on the icon
on the top left and accept the default trace file name. A summary will be displayed on the
console and the content of the trace file visualized in the GUI.
We can see what the trees currently look like and what pins are connected to them. Look also at the
trace file directly. Things to look for include:
• Clock, reset, or scan-enable connecting to unexpected input pins, e.g. the reset signal should
not connect to pins other than asynchronous set/reset pins of sequential elements.
• Unexpected latches on the clock tree can be discovered this way (G or GB pin).
• Discrepancy between the number of endpoints of clock, reset and scan trees. For our example
the numbers are as follows:
clock tree: 443 with 442 flip-flop CK pins + 1 RAM CK pin
reset tree: 441 flip-flop RB pins
scan tree: 447 with 441 flip-flop SEL pins + 6 mux S pins, to choose between the functional
and test (scan chain) output signal.
As we see, 442 flip-flops are clocked but only 441 receive a reset signal. This is due to the reset
synchronization register being connected to the external reset signal rather than the internal
reset tree. As the reset synchronization flip-flop is also not on the scan chain and we use full
scan otherwise, the 441 flip-flops on the scan tree match perfectly. You get the idea...
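The bookkeeping above is easy to script once the endpoint counts have been read from the trace file. The sketch below simply hard-codes the numbers quoted in the text; the dictionary layout is an assumption and not a format produced by the tool.

```python
# Endpoint counts for the example design, as quoted in the text above.
trees = {
    "clock": {"flip-flop CK": 442, "RAM CK": 1},
    "reset": {"flip-flop RB": 441},
    "scan": {"flip-flop SEL": 441, "mux S": 6},
}

totals = {name: sum(pins.values()) for name, pins in trees.items()}
print(totals)

# Flip-flops that are clocked but not on the internal reset tree or the scan chain.
not_reset = trees["clock"]["flip-flop CK"] - trees["reset"]["flip-flop RB"]
not_scanned = trees["clock"]["flip-flop CK"] - trees["scan"]["flip-flop SEL"]
print(f"clocked but not reset: {not_reset}, clocked but not scanned: {not_scanned}")
```

Both differences are 1, which matches the single reset synchronization flip-flop described above.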
Student Task 29:
Open the file chip.cts trace and search for Clock Tree to examine the leaf pins.
If everything looks OK we can proceed with clock synthesis. In the SYNTHESIZE CLOCK
TREE form press OK.
After a few minutes clock tree synthesis will be completed. Detailed reports will be generated under
the directory specified on the form (most likely clock report). This directory includes a simple report
file (clock.report).
A summary report is also displayed on the CADENCE SOC ENCOUNTER console. The first column
shows the achieved performance while the second column reports the target specified in the
configuration file.
Student Task 30:
Check your results (summary and detailed reports). How many buffers were added? How
many levels were created? What is the insertion delay? Are all constraints met?
Note 1: You will get a max transition time violation on ClkxCI_PAD/I which can safely be
ignored. As we have specified an input transition time of 800 ps on all primary inputs there
is no way CTS could fulfill the 600 ps requirement at this point.
Note 2: Unless the RouteClkNet YES option was used (more on this later), the
timing figures reported are only estimates and might change quite a bit with detailed routing.
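One way to answer "Are all constraints met?" is to compare the reported values against the targets from Student Task 26. The following is a minimal sketch: the dictionary keys are hypothetical names (not CTS keywords) and the achieved numbers are invented, with the buffer transition set to 0.8 ns to mimic the 800 ps input transition mentioned in Note 1.

```python
# Targets from Student Task 26, in ns. Key names are illustrative only.
TARGETS_NS = {"max_skew": 0.2, "max_delay": 4.0, "buf_tran": 0.6, "sink_tran": 0.4}


def violations(achieved_ns, targets_ns=TARGETS_NS):
    """Return {parameter: (achieved, target)} for every target that is exceeded."""
    return {
        name: (achieved_ns[name], limit)
        for name, limit in targets_ns.items()
        if achieved_ns[name] > limit
    }


# Hypothetical values as they might be read from the summary report (assumed).
achieved = {"max_skew": 0.15, "max_delay": 2.5, "buf_tran": 0.8, "sink_tran": 0.3}
print(violations(achieved))
```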
9 Timing Revisited
At this point we will have to go into some more detail about timing. During different stages of the
design flow, we have slightly different timing constraints (refer to the following figure for the differences
in the three stages).
a) synthesis: Initially the design does not contain any pads. The input delay t_idel and the output delay
t_odel should contain the contributions of the input pad (t_inpad) and the output pad (t_outpad).
b) pre-CTS: During the placement and routing phase, all required I/O pads and drivers are present.
At this stage there is no clock tree yet. The timing should be adjusted, as at this moment
the input delay t_idel and the output delay t_odel no longer include the pad delays.
c) post-CTS: Once the clock tree is inserted, the timing changes slightly again. Due to the clock
insertion delay t_di the internal clock will be slightly offset when compared to the external clock.
At the input, the data travelling towards the first flip-flop inside the chip will have more time,
since this flip-flop will be triggered by a clock signal that has been delayed by t_di. At the output,
however, the data coming from the chip will be launched with the internal clock, but will
have to be sampled by the external clock. Consequently there will be less time for this signal.
It should now be clear why it might be desirable to constrain the clock insertion delay
by specifying minimum and maximum values in the chip.ctstch file with the MinDelay and MaxDelay
parameters. The clock insertion delay can play an important part in the I/O delay. You may want to
keep the insertion delay within certain limits to ensure proper I/O timing.
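The shift in the I/O timing budget can be made concrete with a simplified model. The sketch below assumes that the external and internal clocks differ only by the insertion delay t_di and that all quantities use the same time unit; the function name and the numbers are illustrative and not taken from the example design.

```python
def io_budgets(t_clk, t_idel, t_odel, t_di=0.0):
    """Time left inside the chip for the input path and for the output path.

    The budgets still have to cover pad delay, logic delay and setup time.
    """
    input_budget = t_clk - t_idel + t_di   # capturing flip-flop is clocked t_di later
    output_budget = t_clk - t_odel - t_di  # data is launched t_di later
    return input_budget, output_budget


# Hypothetical numbers in ns: 10 ns clock period, 4 ns input and output delay.
print(io_budgets(10.0, 4.0, 4.0))            # ideal clock, no insertion delay
print(io_budgets(10.0, 4.0, 4.0, t_di=1.5))  # with 1.5 ns insertion delay
```

In this model the insertion delay moves time from the output budget to the input budget, which is why a large t_di can break the output timing even when the internal paths are fine.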
Design tools have different mechanisms to deal with these three different cases. The simple solution
is to use multiple constraint files for different stages. However, both SYNOPSYS DESIGN COMPILER
and CADENCE SOC ENCOUNTER accept several parameters to deal with this problem automatically.
In the following we will discuss how CADENCE SOC ENCOUNTER calculates delays in the
presence and absence of a clock tree. The following table summarizes the most important settings:
timing analysis mode    clock propagation mode    clock latency
(setAnalysisMode)       (set_propagated_clock)    (set_clock_latency)
-noSkew                 forced ideal              no effect
-skew -noClockTree      forced ideal              SDCs in effect
-skew -clockTree        SDCs in effect [a]        SDCs in effect [b]
[a] still ideal mode unless set_propagated_clock is set
[b] the set_clock_latency command is overridden by overlapping set_propagated_clock constraints
The timing analysis mode is automatically updated by CADENCE SOC ENCOUNTER to match the
design stage, i.e. before clock tree insertion it is set to -skew -noClockTree and afterwards to
-skew -clockTree. The analysis mode can also be changed manually with the setAnalysisMode
command.
The two Synopsys Design Constraints (SDC) set_propagated_clock and set_clock_latency
are usually specified by the designer in the chip.sdc file. Furthermore, CTS tries to add a
set_propagated_clock constraint on-the-fly (in memory), which can cause a number of problems:
• This constraint will only be added if the AutoCTSRootPin pin/port in chip.ctstch and the clock
waveform source pin/port (from the create_clock command in chip.sdc) are perfectly identical,
i.e. not port vs. instance pin etc.
• This constraint is never written to your chip.sdc file, so if you reload that file the constraint is
lost.
• Before CTS, only a pointer to your constraints file is saved along with the database. Now, if a
constraint was added by CTS, all loaded constraints (including the new one) will be saved along
with the database to a new file (*.pt). Restoring this database will then load this new constraints
file instead of the one in encounter/src/ that you might have expected.
Note: As soon as you manually (re-)load a constraints file, the behavior is reverted to the normal
one.
Now, as can be seen from the table above, to get the actual timing of the buffers/inverters on the
clock tree instead of ideal mode, setting both -skew -clockTree and set_propagated_clock
is required. Also note that set_propagated_clock gets overridden for all pre-CTS design stages
and could therefore be set right from the start (as already mentioned earlier).
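The rules in the table can be written down as a small decision function. This is only a sketch of the table as printed in this section, with the mode strings used as plain labels; it is not a model of the tool's internals.

```python
def clock_mode(analysis_mode, propagated_clock_set):
    """Return 'ideal' or 'propagated' according to the table in this section."""
    if analysis_mode in ("-noSkew", "-skew -noClockTree"):
        return "ideal"  # forced ideal, set_propagated_clock is overridden
    if analysis_mode == "-skew -clockTree":
        # SDCs are in effect: ideal unless set_propagated_clock is set.
        return "propagated" if propagated_clock_set else "ideal"
    raise ValueError(f"unknown analysis mode: {analysis_mode}")


for mode in ("-noSkew", "-skew -noClockTree", "-skew -clockTree"):
    for propagated in (False, True):
        print(f"{mode:20s} set_propagated_clock={propagated}: "
              f"{clock_mode(mode, propagated)}")
```

Only one of the six combinations yields real clock tree timing, which is the point made in the paragraph above.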
In ideal mode, the clock tree insertion delay is zero unless the set_clock_latency command
is used to specify a different number, preferably close to the delay of the real tree (that is still to
be inserted). While this placeholder delay has the advantage that the I/O timing does not change
between pre-CTS and post-CTS phases, it makes timing reports less transparent and is not
handled exactly the same across different tools. Therefore, do not use this command unless you
know what you are doing.
In conclusion, it is recommended to include set_propagated_clock right from the start, not to use
set_clock_latency, and to load modified timing constraints after CTS only if required, i.e. when the
I/O timing numbers (set_input_delay, set_output_delay) need to be adjusted to account for
the actual clock tree [29]. For this training we will modify and reload the constraints [30].
[29] For slower clock speeds and/or uncritical I/O timing this is often not required.
[30] It might be more convenient to keep a separate post-CTS constraint file rather than changing the numbers back and
forth when redoing the flow.
[Figure: timing in the three design stages a), b) and c). Each panel shows the paths a b c d e between flip-flops clocked by Clk, with the delays t_in2reg, t_reg2reg and t_reg2out over the clock period T_clk, the I/O delays t_idel and t_odel, the pad delays t_inpad and t_outpad, the flip-flop delays t_pd ff and t_su ff, and the gate delays t_pd a to t_pd e. Block labels: Top, Chip. The panel with the clock insertion delay t_di additionally shows Internal Clock, External Clock, "More time for input" and "Less time for output".]
The previous figure illustrates all three stages in some detail. Wherever possible the same naming
conventions as in the textbook have been used [31].
[31] Refer to page 235 "How to formulate timing constraints", and page 346 "How to achieve friendly input/output timing"
for more on this topic.
Student Task 31:
Copy your timing constraints file to filter_chip_postCTS.sdc and then modify the I/O
timing constraints to account for the insertion delay of the actual clock tree. Make sure
that the clock is set to PROPAGATED MODE and load the constraints (Timing → Load
Timing Constraint ... [a]).
Run timing analysis (make sure to select POST-CTS as design stage).
Examine the reports timingReports/chip postCTS*. You should now see the real timing on
the clock network.
If you have violations, run a POST-CTS (!) optimization with default settings. This should
fix all violations.
Save the entire design.
[a] Currently loaded constraints will be purged before the new ones get loaded.
10 Signal Routing
We will now route the signal nets. What you have seen so far are only trial-route nets that are not
DRC clean and can therefore not be manufactured.
Student Task 32:
There are two routing engines in CADENCE SOC ENCOUNTER. WRoute is the older one
and NanoRoute is supposed to be the latest and greatest. Start NanoRoute by selecting
Route → NanoRoute → Route .... A large window will open. Enable the INSERT DIODES
option (you can leave the DIODE CELL NAME field blank) and leave all other settings at their
defaults [a]. Click OK to start routing. You can observe the progress in the console window.
[a] On multi-CPU or multi-core machines you can increase the number of CPUs used by selecting Set Multiple
CPU. This gives an almost linear speedup.
The FIX ANTENNA and INSERT DIODE options cause the router to change layers and/or insert special
protection diodes in order to avoid damage that can happen during manufacturing due to charges
that accumulate on the wires and stress the gate oxide of input pins. Note that these are usually referred
to as PROCESS ANTENNAS, which are entirely different from geometrical antennas (which are related to
dangling wires).
Our example design should route without problems. This is not always the case and we might get
geometry violations. Geometry violations include shorts between nets and design rule violations (for
example metal lines are drawn too close to be manufactured as separate wires). Needless to say
that we must solve all these violations.
You should always closely examine the violations in order to find out what causes them. Sometimes
there is an unfortunate placement of macro-cells or power lines to blame and sometimes there is just
not enough space to route all connections. Solutions range from re-running routing to completely
reworking the floorplan.
Student Task 33:
Now that we have the real signal wiring we need to perform a postroute timing analysis to
see if we still meet all constraints. At this point not only a setup time analysis, but also a
hold time analysis needs to be run. Usually it is not necessary to deal with hold time until
this point.
Note that you have to do two separate runs, one for setup and one for hold, as it is not
possible to do this in one single step. Use the GUI (make sure to select POST-ROUTE) or type
the commands below to perform the two analyses.
enc> timeDesign -postRoute
enc> timeDesign -postRoute -hold
Inspect the two summaries and the report files written to the timingReports directory. You
will most likely have setup violations.
To fix violations or increase the hold margin we can now perform a postroute optimization. Internal
hold time violations need to be fixed in any case as, unlike internal setup violations, they cannot be
avoided later on (i.e. on the real chip) by lowering the clock speed [32].
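Why lowering the clock speed helps with setup but not with hold violations follows from the usual single-clock slack equations. The sketch below uses textbook formulas under simplifying assumptions; t_ho_ff (hold time) and the skew sign convention (capture clock arrival minus launch clock arrival) are assumptions introduced here, and all numbers are invented.

```python
def setup_slack(t_clk, t_pd_ff, t_logic_max, t_su_ff, skew=0.0):
    """Setup slack of a reg2reg path; grows with the clock period t_clk."""
    return t_clk + skew - (t_pd_ff + t_logic_max + t_su_ff)


def hold_slack(t_pd_ff, t_logic_min, t_ho_ff, skew=0.0):
    """Hold slack of a reg2reg path; does not depend on the clock period."""
    return t_pd_ff + t_logic_min - (t_ho_ff + skew)


# Hypothetical delays in ns.
print(setup_slack(5.0, 0.5, 5.0, 0.25))    # negative: setup violation at 5 ns
print(setup_slack(8.0, 0.5, 5.0, 0.25))    # positive: fixed by a slower clock
print(hold_slack(0.5, 0.0, 0.25, skew=0.5))  # negative at any clock period
```

Since t_clk does not appear in the hold equation, a hold violation in silicon cannot be cured by slowing the clock down.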
Further possibilities to improve timing include over-constraining the POST-CTS optimization and
enabling the TIMING DRIVEN option of NanoRoute. Earlier in the flow, TIMING DRIVEN PLACEMENT
might be worth a try. Please note that the biggest improvements are possible with Pre-CTS
optimization as the registers can be moved and resized at that stage. By default, clock tree insertion will
fix the registers to preserve the clock tree, i.e. they can no longer be moved or resized.
Student Task 34:
If you have large reg2reg setup violations, this step may take a very long time. During the
initial iterations of the design, it might be a good idea to use a more relaxed timing constraint (a
longer clock period) so that not much time is spent during the optimization.
Once you are satisfied with all other aspects of the design, you could revert to the original
timing constraints and let the optimizer try to achieve the timing.
Perform a postroute optimization (Timing → Optimize ...).
Optimization will delete and re-route all nets that are affected by the changes and run setup
and hold mode timing analyses at the very end. Once again, inspect the reports.
Student Task 35:
Now let us have a look at the postroute timing of our clock tree(s)
enc> reportClockTree -postRoute
This will print a summary on the console and write a couple of report files chip.ctsrpt* to
the encounter directory. There should be no (or only minor) violations of our clock tree
constraints.
Please note that the previous postCTS and postRoute setup (and hold) analyses already
consider clock skew as they time every single path from the clock root to the leaf pins
separately. Therefore, even a rather big skew reported here does not really matter as long
as the former analyses passed.
So far, the clock tree has been routed like any other signal net. This is usually good enough, but if you
want, for whatever reason, to further improve clock net timing, you can do the following (in CTS):
• In the clock tree constraint file, set RouteClkNet YES. This is a per-tree setting that instructs
CTS to call NanoRoute in order to route this clock net during clock tree insertion. The wires
get a status of FIXED and will therefore not be changed later during signal routing. While this
improves timing on the clock tree, overall routability gets worse.
• To further improve timing, you can tell NanoRoute to route this net not like an ordinary signal
net, but to create a balanced routing (by following the so-called RouteGuide computed by
CTS). To do so, set UseCTSRouteGuide YES in the clock constraint file [33].
[32] This does not necessarily hold true for multi-clock designs.
11 Timing Debug
To analyze timing violations, CADENCE SOC ENCOUNTER also offers a graphical interface (Timing →
Debug Timing) that visualizes paths and allows cross-probing with the layout. We will not explain
the tool in detail here, but rather make some important notes:
• This functionality is more or less standalone: it does not use results from the timeDesign command
but runs a new analysis that generates the file top.mtarpt. These paths are then visualized.
• If the above file already exists, it will usually simply be loaded. This means that whenever your
design has changed you have to regenerate this file in order to get up-to-date data. This can be
done with the GENERATE switch on the form that opens when you click the folder icon.
• When generating top.mtarpt, the current timing mode is relevant, i.e. to analyze hold paths
the timing mode has to be set to hold mode.
[33] This will persistently(!) alter the global CTS mode to setCTSMode -useCTSRouteGuide.
12 Finishing
We are almost done with the back-end design. There are only a few steps required to finish the layout and
verify that everything is correct.
12.1 Insert Filler Cells
Student Task 36:
Now that we do not need the additional space within the standard cell rows anymore, we
have to fill these gaps with filler cells. This is required for fabrication. In addition, some of
them contain capacitors between VCC and GND that filter spikes on the power lines.
enc> source scripts/fillcore.tcl
Note that your row utilization will be 100% after this step. This means that you will have no room
for further optimizations. Make sure to insert filler cells only after all optimizations have been completed.
Note: It is also possible to remove the filler cells with Place → Filler → Delete ... or by using
the script removefillcore.tcl.
12.2 Checking Connectivity and Geometry Violations
Now that we are completely finished with the layout, we should make sure that we have no connection
errors, i.e. all logic connections from the netlist are also present in the physical layout.
Student Task 37:
Select Verify → Verify Connectivity ... from the menu. A window will appear.
Run the analysis and check the console for the report summary. There should be no
violations.
In a similar way let us verify all geometrical shapes. Select Verify → Verify Geometry
... from the menu. Run the analysis and check the report on the console. You should get
no violations.
There is a script that will perform the last verification steps for you automatically. You can set a
variable DESIGNNAME to assign the base name for all the files generated by this script.
enc> set DESIGNNAME MyBeautifulChip
enc> source scripts/checkdesign.tcl
12.3 Evaluate the Physical Design
Take the time to examine the routing. This is the main feedback you need for a second back-end
iteration. Try to view all metal layers separately to see how congested your routing is. If you see a lot
of Metal-6 (orange) you are probably close to the density limit. In our design you should not notice
any congestion and Metal-6 will barely be used. If your design routed without problems and the
routing was rather sparse, then the next time you could assign a smaller core area and increase the
row utilization. On the other hand, if the design barely routed, you have found the limits. In a second
iteration you might consider assigning a little more core area, as timing degrades with congestion.
Check the connections of your macro-cells and pads, this may give you an idea how to place the
macro-cells the next time around. You need to get used to evaluating the result of different back-end
design runs.
12.4 Generate Output Files
Congratulations, you have completed the back-end design. That was not so hard now, was it?
Student Task 38:
Save your design using Design → Save Design As ... → SoCE to the save directory
and make sure that you use a name that shows this is a finished design (e.g. chip final.enc).
Finally we need to export all data needed for post layout simulation and physical verification
(DRC/LVS). There is a script that will write out all relevant files to the out/ directory [a].
enc> source scripts/exportall.tcl
[a] To get complete supply net connectivity in the Verilog netlist for LVS, the missing connections for the power
and ground pins (GNDIO/VCC3IO) of the pads are added and removed on-the-fly. We could also define and
handle these two nets in the same way as VCC/GND, but there are more drawbacks than benefits.
Similar to the checkdesign.tcl file, the variable DESIGNNAME will be used to assign the base name of
the files. If you do not specify a name, final will be used. After you complete this step you will have
the following files:
*.v This is the final netlist. Make sure to use this netlist for post layout simulations.
*.gds.gz The layout in GDSII (Graphic Design System II) format. This is the standard format for
exchanging layout data.
*.sdf.gz The SDF (Standard Delay Format) file to be used for post layout simulation.