Cadence Workshop Trainee

BD03: Digital Physical Design
Version 1.0
STUDENT
HANDOUT
June 16, 2008
Legal Notices
Copyright Notice
© 1990-2008 Cadence Design Systems, Inc. All rights reserved.
When printed on paper, this presentation qualifies as a STUDENT HANDOUT.
This course and the material in it is owned by Cadence Design Systems, Inc. (Cadence), 2655 Seely Avenue, San Jose, CA
95134, USA. Unless you have received express written approval directly from Cadence, you are not allowed to copy, scan,
replicate, disclose, distribute, or publish this document, or any part of it.
Confidentiality Notice
No part of this publication may be reproduced in whole or in part by any means (including photocopying or storage in an
information storage/retrieval system) or transmitted in any form or by any means without prior written permission from
Cadence Design Systems, Inc. (Cadence).
Information in this document is subject to change without notice and does not represent a commitment on the part of Cadence.
The information contained herein is the proprietary and confidential information of Cadence or its licensors, and is supplied
subject to, and may be used only by Cadence’s customer in accordance with, a written agreement between Cadence and its
customer. Except as may be explicitly set forth in such agreement, Cadence does not make, and expressly disclaims, any
representations or warranties as to the completeness, accuracy or usefulness of the information contained in this document.
Cadence does not warrant that use of such information will not infringe any third party rights, nor does Cadence assume any
liability for damages or costs of any kind that may result from use of such information.
RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the Government is subject to restrictions as set forth in
subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.
UNPUBLISHED This document contains unpublished confidential information and is not to be disclosed or used except as
authorized by written contract with Cadence. Rights reserved under the copyright laws of the United States.
6/16/08 BD03: Digital Physical Design 2

Copyrights and Trademarks
Cadence Trademarks
Trademarks and service marks of Cadence Design Systems, Inc. (Cadence) contained in this document are attributed to
Cadence with the appropriate symbol. For queries regarding Cadence’s trademarks, contact the corporate legal department at
the address above or call 800.862.4522 from the US or +1.408.943.1234 internationally.
Allegro® HDL-ICE® Silicon Ensemble®
Accelerating Mixed Signal Design® Incisive® Silicon Express™
Assura® InstallScape™ SKILL®
BuildGates® IP Gallery™ SoC Encounter™
Cadence® (brand and logo) NanoRoute® SourceLink® online customer support
CeltIC® NC-Verilog® Specman®
Conformal® NeoCell® Spectre®
Connections® NeoCircuit® Speed Bridge®
Diva® OpenBook® online documentation library UltraSim®
Dracula® OrCAD® Verifault-XL®
ElectronStorm® Palladium® Verification Advisor®
Encounter® Pearl® Verilog®
EU CAD® PowerSuite® Virtuoso®
Fire & Ice® PSpice® VoltageStorm®
First Encounter® SignalStorm® Xtreme®
HDL-ICE® Silicon Design Chain™
Other Trademarks
Open SystemC, Open SystemC Initiative, OSCI, SystemC, and SystemC Initiative are trademarks or registered
trademarks of Open SystemC Initiative, Inc. in the United States and other countries and are used with permission. All
other trademarks are the property of their respective holders.
Phase 1: Curriculum Map

[Figure: curriculum map of Phase 1 courses by discipline]
- General (all disciplines): BG01 and BG02 (figure labels: Business Processes; IC Process; Semiconductor and Devices), BG03 IC Packaging, BG04 Test and DFT
- Digital Discipline: BD01 Digital IC Architecture, BD02 Digital IC Design, BD03 Digital Physical Design, Tool Training, Lab Training
- Analog Discipline: BA01 Analog IC CMOS Design, BA02 Mixed-Signal IC Design or BA03 RF IC Design, BA04 Mixed-Signal Physical Implementation, Lab Training

Course Objectives
After taking this course, you will be able to
- Draw a complete flowchart of the digital design implementation flow and explain the steps in detail
- Describe how cell libraries and timing libraries are used for physical design and timing analysis
- Explain the steps involved in synthesis, floorplanning, placement, clock tree synthesis, routing, extraction, delay calculation, static timing analysis, and design optimization
- Contrast power consumption and power grid analysis, and apply power saving design techniques
- Explain the issues involved with signal integrity
- Describe how engineering change orders (ECOs) are performed, and how chips are physically verified and taped out
Course Policies
- It is important that you attend class. Your participation is essential.
  - Three or more absences will damage your grade.
  - Be on time.
- Be respectful of each other.
  - Conduct only one conversation at a time.
  - Turn off cell phones, pagers, and laptops.
- Get involved.
  - Come prepared to discuss the day’s assignment.
  - Volunteer.
  - Ask questions.
  - Share relevant ideas and observations.
  - Offer your own experiences.

Assignments and Grades
Assignments
- Assignments are discussed in class.
- Assignments are due on the date indicated in the syllabus.
- Keep a copy of all assignments you hand in.
Grades
- A. Outstanding achievement, exceeding course requirements
- B. Praiseworthy performance, meets course requirements and criteria
- C. Average, satisfactory performance
- D. Below average, marginal performance
Assignments and Grades (continued)

Assignment, percentage of grade, and due date:
- Homework 1 (15%, due 7/30/08): Describe the issues, changes in design flow, and considerations that design teams must take into account when designing for a deep submicron process (90 nm or less).
- Homework 2 (15%, due 8/6/08): Create a clock tree constraint file for automatic CTS based on a specification.
- Homework 3 (20%, due 8/13/08):
  - Part I. Given several scenarios, calculate static and dynamic power.
  - Part II. Given several IR-drop heat maps, discuss the potential problems and solutions.
  - Part III. Given a block diagram and several scenarios, discuss which possible low-power design methods can be used to reduce overall power.
- Formal Study Group Presentation (10%, 8/18/08 to 8/21/08)
- Final Exam (40%, 8/22/08)

Course Calendar
Each entry lists the week, day, module and topics, and assignments (due date).

Week 1, July 21: Introduction to Digital Physical Implementation
- Inputs
- Steps in Flow
Assignment: Design flowchart, activity in class.

Week 1, July 22: Introduction and Overview of Layout Technology
- Layout Layers
- Introduction to Physical Verification, DRC/LVS, DRC
- Cell Libraries, LEF Syntax
Assignments: LEF terms, activity in class. Homework Assignment 1 (7/30/08): Describe the issues, changes in design flow, and considerations that design teams must take into account when designing for a deep submicron process (90 nm or less).

Week 1, July 23: Timing Libraries and Constraint Files
- Concepts
- Libraries
- Constraint Files
Assignment: Create timing constraints, activity in class.

Week 2, July 28: Synthesis
- Logical Synthesis Optimization Steps
- Physical Synthesis Overview
Assignment: Review log file and optimization steps, activity in class.
Course Calendar (continued)

Week 2, July 29: Floorplanning and Placement
- Floorplanning Fundamentals
- Placement Fundamentals
Assignment: Examples of floorplans, activity in class.

Week 2, July 30: Clock Tree Synthesis
- Clock Trees and Clock Tree Synthesis
- Clock Tree Specification
- CTS Reports
- Low-Power Clocking Techniques
Assignment: Homework Assignment 2 (8/6/08): Create a clock tree constraint file for automatic CTS based on a specification.

Week 3, Aug 4: Routing
- Fundamentals
- Special Types of Routing
Assignment: Review routing log files, activity in class.

Week 3, Aug 5: Power Consumption and Power Grid Analysis
- Power Consumption
- Power Grid Analysis
- Low-Power Design Techniques
Assignment: Homework Assignment 3 (8/13/08):
- Part I. Given several scenarios, calculate static and dynamic power.
- Part II. Given several IR-drop heat maps, discuss the potential problems and solutions.
- Part III. Given a block diagram and several scenarios, discuss which possible low-power design methods can be used to reduce overall power.

Week 3, Aug 6: Extraction and Delay Calculation
- Extraction Models and SPEF Format
- Delay Calculation Fundamentals and SDF Format
Assignment: Flowchart with SPEF/SDF, activity in class.

Week 4, Aug 11: Static Timing Analysis and Signal Integrity Analysis
- Timing Constraints and Analysis
- Design Rule Verification
- Signal Integrity Fundamentals Analysis
Assignment: Timing and SI report analysis, activities in class.

Week 4, Aug 12: Design Optimization
- Fundamentals
- Types
Assignment: Review optimization cases, activity in class.

Week 4, Aug 13: Engineering Change Orders, Design Verification, and Tapeout
- ECO Types and Fundamentals
- Physical Verification Overview
- Tapeout Requirements
Assignment: ECO scenarios and tapeout requirements, activities in class.

Week 5, Aug 18 to Aug 21: Formal Study Group Presentation

Week 5, Aug 22: Final Exam
Recommended Text
Hennessy, John L. and Patterson, David A. Computer Architecture, Fourth Edition: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann. 2007.
ISBN-10: 0123704901
ISBN-13: 978-0123704900
Instructor Information
Instructor name:
Phone:
E-mail:
Office location:
Office hours:
Introduction to Digital Physical
Implementation
Module 1
June 16, 2008
The Life of a CMOS Inverter
Specification: “Device that outputs the inverse of its input with minimum size and power”

RTL:
    module invx1(a,z);
      input a;
      output z;
      assign z=!a;
    endmodule

Gates: [Figure: invx1 inverter symbol with input a and output z]

Transistor and Layout: [Figure: transistor schematic and layout of the inverter, with input a, output z, and supplies VDD and GND/VSS]
Design Implementation Flow
Much like the simple CMOS inverter, the general process of digital design implementation is the transformation of a design into various representations, eventually into physical hardware devices, just on a much BIGGER scale.
[Figure: SPEC -> RTL -> Gates -> Layout]
Module Objectives
In this module, you will be able to
- Draw a complete flowchart of the digital design implementation flow
Learning Activity
In this activity, you will
- Complete a flowchart of the digital design implementation flow
- Include the design flow steps
- Include the necessary inputs and outputs
15 minutes for activity
[Figure: blank flowchart from RTL through the design flow steps to GDSII]
Topics in This Module

- Overall design flow
- Basic implementation flow
- Example flow
Overall Design Flow
A design flow can be divided into three phases:
- System
- Logical
- Physical
In each phase, two main processes need to be performed:
- Implementation
- Verification
Overall Design Flow

[Figure: overall design flow, with implementation and verification steps in each phase]
- SYSTEM. Implementation: Specification -> Designer -> Microarchitecture -> Designer -> RTL. Verification: System Simulation.
- LOGICAL. Implementation: RTL -> Logic Synthesis -> Gates (synthesized netlist). Verification: Logic Simulation, Formal Verification, Gate Level Simulation.
- PHYSICAL. Implementation: Gates -> Place/Route -> Placed/Routed Design -> Layout (GDSII). Verification: Timing Signoff, Physical Verification.

Implementation Flow
[Figure: implementation flow by phase]
- SYSTEM: Specification -> Designer -> Microarchitecture -> Designer -> RTL
- LOGICAL: RTL -> Logic Synthesis -> Gates (synthesized netlist)
- PHYSICAL: Gates -> Place/Route -> Placed/Routed Design -> Layout (GDSII)
Implementation Flow
[Figure: implementation flow divided into front end and back end]
- FRONT-END: Specification -> Designer -> Microarchitecture -> Designer -> RTL -> Logic Synthesis -> Gates (synthesized netlist)
- BACK-END: Gates -> Place/Route -> Placed/Routed Design -> Layout (GDSII)

Front-end chip design definition: Processes in the overall chip design flow that involve system and logical design and verification

Back-end chip design definition: Processes in the overall chip design flow that involve physical design and verification

Back End or Physical Design

The terms “physical design,” “back end,” and “place/route” encompass many process steps, such as
- Floorplanning
- Placement
- Clock Tree Synthesis (CTS)
- Route
- Extraction
- Delay Calculation
- Static Timing Analysis (STA)
- Signal Integrity
- Design Optimization
- Physical Synthesis
- Design Verification
- Mask Prep
[Figure: Place/Route takes the synthesized netlist (Gates) and produces the placed/routed gates and GDSII]
Back-End Implementation Flow
[Figure: back-end implementation flow. The Place/Route box of the implementation flow expands into the following steps.]
- Floorplanning
- Placement, with Physical Synthesis and Scan Reorder
- Design Optimization, Pre-CTS
- CTS
- Design Optimization, Post-CTS
- Route
- Design Optimization, Post-Route
- Detail Routed Design (GDSII), followed by Design Verification and Mask Prep
Static Timing Analysis, Delay Calculation, Signal Integrity, and Extraction appear alongside the optimization steps.
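
The ordering of the back-end steps can be illustrated with a small Python sketch that walks the flow and checks that each step's input files already exist. The step list and file sets are a simplified reading of the input/output lists in this module; the names `START`, `FLOW`, and `check_flow` are hypothetical and do not correspond to any tool's interface.

```python
# Files assumed to be available before floorplanning starts (an assumption
# for illustration: netlist and SCANDEF from synthesis, plus libraries).
START = {"netlist", "SDC", "lib", "LEF", "TCL", "SCANDEF"}

# (step, required inputs, outputs produced), simplified from the slides.
FLOW = [
    ("Floorplanning", {"netlist", "SDC", "lib", "LEF", "TCL"}, {"DEF"}),
    ("Placement", {"netlist", "DEF", "SDC", "lib", "LEF", "TCL"}, {"DEF"}),
    ("Scan Reorder", {"netlist", "DEF", "SDC", "lib", "LEF", "SCANDEF"}, {"DEF"}),
    ("CTS", {"netlist", "DEF", "SDC", "lib", "LEF", "TCL"}, {"DEF"}),
    ("Route", {"netlist", "DEF", "SDC", "lib", "TCL"}, {"DEF"}),
    ("Extraction", {"DEF", "LEF", "TCL"}, {"SPEF"}),
    ("Delay Calculation", {"netlist", "DEF", "SPEF", "lib", "TCL"}, {"SDF"}),
]


def check_flow(flow, start):
    """Return the set of files available after the flow, or raise if a
    step needs a file that no earlier step has produced."""
    available = set(start)
    for step, needs, makes in flow:
        missing = needs - available
        if missing:
            raise ValueError(f"{step} is missing inputs: {sorted(missing)}")
        available |= makes
    return available


print(sorted(check_flow(FLOW, START)))
```

Swapping Extraction and Delay Calculation in `FLOW` makes `check_flow` raise, because no SPEF file exists yet when delay calculation runs.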

Flow Example
Let’s take a simple example through the implementation flow.
We will cover each step and highlight the following:
- Definition and step in the overall flow
- Inputs and outputs
- Formats
- Example per step
What Is a Specification?
Ideas begin with a specification, which can be a textual, graphical, or sometimes a software representation.

Definition: A specification is an explicit set of requirements to be satisfied by a material, product, or service.

Example: The specification for the latest chip specified a 250-MHz core clock with a serial interface, able to process 1 Mb of data per second at less than 10W total power.

What Is a Microarchitecture?
The microarchitecture is the step between the specification and RTL; it defines how the block will be implemented.

Definition: The microarchitecture implements the specification and defines specific mechanisms and structures for achieving that implementation.

Example: For Block A, the designer created a microarchitecture and partitioned the block into several smaller modules.
Specification and Microarchitecture: Input and Output, Format

Specification
- Input: Requirements from Marketing, CEO (Chief Executive Officer), CTO (Chief Technology Officer), etc.
- Output: Document or model in text/graphics or software (C++, SystemC, SystemVerilog, etc.) format
Microarchitecture
- Input: Specification + requirements from designer
- Output: Typically a document in text/graphics; could be software as well
[Figure: Specification -> Designer -> Microarchitecture]

Example: Specification
Let’s assume we have a specification, microarchitecture, and RTL.
We are designing a chip called “EX” with
- Three main partitions “A,” “B,” and “C”
- Memories in each partition
- Perimeter I/O
- 250-MHz clock
- 10W total power
- Die size not to exceed 10 mm x 10 mm due to custom package requirements
[Figure: EX block diagram with blocks A, B, and C, inputs din and clk, and output dout]
Example: Microarchitecture
For Block C
- 32-bit data bus interface to Block A
- 16-bit control interface from Block B
- Use 64 Mb of SRAM
- Duplicate datapath elements in a parallel implementation
- Limit of five clock cycles from data input processed to data output
[Figure: EX block diagram with a 32-bit bus between A and C and a 16-bit bus between B and C]
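
The numbers in the specification and the microarchitecture imply a timing budget for Block C. The short Python sketch below works it out; the variable names are hypothetical, and the bus throughput line assumes one 32-bit transfer per clock cycle, which the slides do not state.

```python
CLOCK_HZ = 250e6   # 250-MHz clock from the EX specification
MAX_CYCLES = 5     # latency limit from the Block C microarchitecture
BUS_BITS = 32      # data bus width between Block A and Block C

# Clock period in nanoseconds.
period_ns = 1e9 / CLOCK_HZ

# Longest allowed time from data input to data output.
latency_budget_ns = MAX_CYCLES * period_ns

# Peak bus throughput, ASSUMING one transfer per clock cycle.
bus_gbit_per_s = BUS_BITS * CLOCK_HZ / 1e9

print(f"clock period:   {period_ns:.1f} ns")
print(f"latency budget: {latency_budget_ns:.1f} ns")
print(f"bus throughput: {bus_gbit_per_s:.1f} Gbit/s")
```

With these inputs, the period is 4 ns and the five-cycle limit corresponds to 20 ns from input to output.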
What Is Logic Synthesis?
Definition: The process of parsing, translating, optimizing, and mapping RTL code into a specified standard cell library

Example: To determine the feasibility of the design, we need to synthesize the RTL code into gates and measure timing, power, and area.
Logic Synthesis: Input and Output, Format

Input
- RTL in the Verilog® language or other HDL
- Constraints in Synopsys Design Constraints (SDC) format
- Timing Libraries in Liberty (.lib) format
Output
- Gate Level Netlist in the Verilog language or other HDL
[Figure: RTL, SDC, and Library feed Logic Synthesis, which produces the synthesized netlist (Gates)]
Example: Logic Synthesis
We use the RTL for blocks A, B, and C to produce the following netlists:
- block_a.vg
- block_b.vg
- block_c.vg
At the top level EX, the modules are instantiated:

    // top.vg
    module ex (…);
      block_a u0 (…);
      block_b u1 (…);
      block_c u2 (…);
    endmodule

[Figure: RTL -> Logic Synthesis -> synthesized gates (block_a.vg, block_b.vg, block_c.vg, top.vg)]
What Is Floorplanning?
Definition: Process of deriving the die size, allocating space for soft blocks, planning power, and macro placement.

Example: The three blocks of the chip were floorplanned to minimize the distance between the I/Os of the blocks and their interfaces to the chip. This reduces the routing between the blocks and, thus, improves the timing and routability of the design.
Floorplanning: Input and Output, Format
Input
- Synthesized netlist in the Verilog language or other HDL
- Constraints in Synopsys Design Constraints (SDC) format
- Logical Timing Libraries in Liberty (.lib) format
- Physical Libraries in LEF format
- Floorplan constraints and script in TCL
Output
- Floorplanned design in the Verilog language (logical connectivity data) or other HDL + DEF (physical data)
[Figure: Gates, SDC, TCL, Logical Library, and Physical Library feed Floorplanning, which produces the floorplanned design (Gates + DEF)]
Example: Floorplanning
With a top level netlist, we can begin to floorplan the chip
- Set die size to 10 mm x 10 mm
- Assign the din, clk, and dout I/Os to the perimeter.
- Create hard blocks for A, B, and C
- Size the blocks A, B, and C
- Perform power planning
- Perform macro placement
- Check for early routing congestion
- Check for early block utilization
[Figure: 10 mm x 10 mm EX floorplan with blocks A, B, and C and perimeter I/Os din, clk, and dout]

Example: Floorplanning (continued)
For each block, we can also perform some early checks.
- Assign pins
- Place RAMs and macros
- Check power plan
- Check for early routing congestion
- Check for early block utilization
It is important to make sure the floorplan is routable and meets the utilization requirements with a given RAM and macro placement, pin assignment, etc.
[Figure: block C floorplan with RAM A0, RAM A1, and pins from_a, from_b, clk, and dout]
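
An early block utilization check can be sketched in a few lines of Python. This is a minimal illustration, assuming utilization is defined as standard cell area divided by the block area left over after macros are placed; the function name and all area values are hypothetical.

```python
def block_utilization(cell_area_um2, macro_areas_um2, width_um, height_um):
    """Return standard cell area as a fraction of the placeable area,
    where placeable area is the block area minus the macro area."""
    placeable = width_um * height_um - sum(macro_areas_um2)
    if placeable <= 0:
        raise ValueError("macros leave no room for standard cells")
    return cell_area_um2 / placeable


# Hypothetical numbers for a block like C with two RAMs.
util = block_utilization(
    cell_area_um2=490_000,
    macro_areas_um2=[150_000, 150_000],  # RAM A0 and RAM A1
    width_um=1_000,
    height_um=1_000,
)
print(f"utilization: {util:.0%}")
```

If the result is close to or above 100%, the cells cannot fit and the block needs to grow or the macro placement needs to change before placement is attempted.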
What Is Placement?
Definition: Process of placing the standard cells in a floorplanned design.

Example: After the chip was floorplanned, we performed placement and discovered the floorplan was too small to fit all of the cells and macros in the design.

Question: How can we avoid this problem?
What Is Physical Synthesis?
Definition: The combination of logical synthesis and placement.

Example: To meet timing, we ran physical synthesis which, in addition to upsizing and downsizing components, also ran logic restructuring.
Placement and Physical Synthesis: Input and Output, Format

Input
- Floorplanned design in the Verilog language or other HDL + DEF
- Constraints in Synopsys Design Constraints (SDC) format
- Logical Timing Libraries in Liberty (.lib) format
- Physical Libraries in LEF format
- Placement constraints and script in TCL
Output
- Placed design in the Verilog language or other HDL + DEF
[Figure: floorplanned design (Gates + DEF), SDC, TCL, Logical Library, and Physical Library feed Placement, which produces the placed design (Gates + DEF)]

Example: Placement
If the design is small enough (<300K instances), we can run standard cell placement “top down” at the EX level and place everything at once (Top-Down Placement).
Or we can place the standard cells for each of the blocks separately (Bottom-Up Placement).
[Figure: top-down placement of the EX chip, and bottom-up placement of block C with RAM A0, RAM A1, and standard cells u10 through u17]
What Is Scan Reorder?

Definition: Process of reconnecting the scan chains in a design to optimize for routing, timing, etc.

Example: Since logic synthesis arbitrarily connects the scan chain, we need to perform scan reorder after placement so that the scan chain routing will be optimal.

What is a scan chain?
A scan chain is the connection of the flip-flops in a design, such that test patterns can be scanned in and results scanned out during automated testing.

Scan Reorder: Input and Output, Format
Input
- Placed design in the Verilog language or other HDL + DEF
- Constraints in Synopsys Design Constraints (SDC) format
- Logical Timing Libraries in Liberty (.lib) format
- Physical Libraries in LEF format
- Scan chain information in SCANDEF format
Output
- Scan chain reordered design in the Verilog language or other HDL + DEF
[Figure: placed design (Gates + DEF), SDC, SCANDEF, Logical Library, and Physical Library feed Scan Reorder, which produces the scan chain reordered design (Gates + DEF)]
Example: Scan Reorder

Scan chains that were stitched in the logical netlist need to be reordered now that placement is done.
- Logical netlist was stitched numerically.
- Physical netlist is reordered based on placement.
[Figure: scan chain order. Logical netlist: SI -> DFF1 -> DFF2 -> DFF3 -> SO. Physical netlist before reorder: same connection order, with DFF3 placed out of sequence. Physical netlist after reorder: SI -> DFF1 -> DFF3 -> DFF2 -> SO.]
Example: Scan Reorder (continued)
The reordered scan chain requires far fewer routing resources in the example design.
[Figure: block C before and after scan reorder, showing dff1, dff2, dff3, RAM A0, and RAM A1]
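
The effect of scan reorder on routing can be illustrated with a small Python sketch. It uses a greedy nearest-neighbor ordering and Manhattan distance as a stand-in for wire length; this is a simplification chosen for illustration, not the algorithm any particular tool uses, and all coordinates are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chain_length(order, locs, scan_in, scan_out):
    """Total point-to-point length of SI -> flops in order -> SO."""
    points = [scan_in] + [locs[name] for name in order] + [scan_out]
    return sum(manhattan(a, b) for a, b in zip(points, points[1:]))


def reorder(locs, scan_in):
    """Greedy reorder: always hop to the closest flop not yet in the chain."""
    remaining = dict(locs)
    order, here = [], scan_in
    while remaining:
        name = min(remaining, key=lambda n: (manhattan(here, remaining[n]), n))
        order.append(name)
        here = remaining.pop(name)
    return order


# Hypothetical placement: dff3 sits between dff1 and dff2.
locs = {"dff1": (10, 10), "dff2": (30, 0), "dff3": (30, 10)}
scan_in, scan_out = (0, 10), (40, 0)

numeric = ["dff1", "dff2", "dff3"]      # order stitched by logic synthesis
placed = reorder(locs, scan_in)         # order based on placement

print(numeric, chain_length(numeric, locs, scan_in, scan_out))
print(placed, chain_length(placed, locs, scan_in, scan_out))
```

For this placement the numeric order has a length of 70 units, while the placement-based order dff1, dff3, dff2 has a length of 50.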
What Is Design Optimization?

Definition: Process of using automated algorithms to improve the quality of a digital design

Example: After initial placement, we run a pass of pre-CTS design optimization to fix timing violations that may show up now that the design is placed and we have delays based on estimated interconnect.
Pre-CTS Design Optimization: Input and Output, Format
Input
- Scan chain reordered design in the Verilog language or other HDL + DEF
- Constraints in Synopsys Design Constraints (SDC) format (ideal clocks)
- Logical Timing Libraries in Liberty (.lib) format
- Commands in TCL
Output
- Optimized placed design in the Verilog language or other HDL + DEF
[Figure: scan chain reordered design (Gates + DEF), SDC, TCL, Logical Library, and Physical Library feed Design Optimization Pre-CTS, which produces the optimized placed design (Gates + DEF)]
Example: Pre-CTS Design Optimization

Because logical synthesis uses wire load models (estimates of net delay), the design choices it makes can sometimes lead to sub-optimal results in placement.
Pre-CTS design optimization can clean up some of these issues by
- Upsizing or downsizing cells
- Buffering nets
- Re-synthesizing paths to improve timing, etc.
[Figure: block C placement with RAM A0, RAM A1, and cells u10 through u17; u11 and u16 are upsized]

Example: Pre-CTS Design Optimization (continued)
Cell u11 was driving several cells, and one of them, u20, was far away. In order to drive the long net and meet timing, the cell was upsized. Cell u16 was upsized for the same reason.
[Figure: block C placement showing the distant cell u20; u11 and u16 are upsized]
What Is Clock Tree Synthesis?

Definition: Process of inserting buffers in the clock path, with the goal of minimizing clock skew and latency to optimize timing

Example: We ran clock tree synthesis on the example block and saw a large clock skew due to bad clock constraints. We ended up re-running clock tree synthesis with better constraints to get an optimal result.
Clock Tree Synthesis: Input and Output, Format
Input
- Optimized design in the Verilog language or other HDL + DEF
- Constraints in Synopsys Design Constraints (SDC) format
- Logical Timing Libraries in Liberty (.lib) format
- Physical Libraries in LEF format
- Clock constraints and commands in TCL
Output
- Post-CTS design with clock trees inserted in the Verilog language or other HDL + DEF
[Figure: optimized placed design (Gates + DEF), SDC, TCL, Logical Library, and Physical Library feed CTS, which produces the placed design with clock trees inserted (Gates + DEF)]
Example: Clock Tree Synthesis

Up to now, the clocks in the design have been treated as ideal (no clock skew, no clock latency, ideal transition time, etc.). In CTS, we add buffers for the real clock tree in order to minimize
- Clock skew in the design
- Clock latency in the design
[Figure: block C placement after CTS; c0, c1, c2, and c3 clock buffers are added]
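
Clock latency and clock skew can be computed directly from the clock arrival time at each sink. The Python sketch below assumes the usual definitions (latency as the longest source-to-sink delay, skew as the spread between the latest and earliest sink); the function name and the arrival times are hypothetical.

```python
def clock_latency_and_skew(arrival_ps):
    """Given clock arrival time per sink in picoseconds, return
    (latency, skew): the latest arrival, and latest minus earliest."""
    arrivals = list(arrival_ps.values())
    latest, earliest = max(arrivals), min(arrivals)
    return latest, latest - earliest


# Hypothetical arrival times at the clock pins after CTS.
arrival_ps = {"dff1": 420, "dff2": 455, "ram_a0": 470, "ram_a1": 440}

latency, skew = clock_latency_and_skew(arrival_ps)
print(f"latency: {latency} ps, skew: {skew} ps")
```

A CTS report typically lists these two figures per clock, which makes it easy to see whether the clock constraints were met.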

Example: Clock Tree Synthesis (continued)
Buffers are added to the clock tree in our example design.
[Figure: netlist before CTS, netlist after CTS with clock buffers c0, c1, c2, and c3 inserted, and placement after CTS]
Example: Design Optimization, Post-CTS

Another round of design optimization takes place, since it is possible that CTS could have disturbed the timing of some of the paths in the design.
Post-CTS optimization can include
- Buffering
- Modifications to the clock tree itself
[Figure: block C placement with clock buffers c0, c1, c2, and c3]

What Is Route?
Definition: Process of connecting the pins of the standard cells, macros, and I/Os of a digital design to specific metal layers in the process technology to match the schematic

Example: We ran a preliminary route on the example block and saw that routing congestion was an issue. To fix it, we re-ran placement with a placement density screen to force a lower utilization in that area and allow for more routing resources.
Route: Input and Output, Format

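Input
- Placed design with clocks inserted in the Verilog language or other HDL + DEF
- Constraints in Synopsys Design Constraints (SDC) format
- Logical Timing Libraries in Liberty (.lib) format
- Route constraints and commands in TCL
Output
- Routed design in the Verilog language or other HDL + DEF
[Figure: placed design with clocks inserted (Gates + DEF), SDC, TCL, Logical Library, and Physical Library feed Route, which produces the routed design (Gates + DEF)]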

Example: Route
When the design has been fully placed with all of the clock tree buffers, it is time to perform routing. Routing connects all of the I/Os, standard cells, RAMs, and macros to their specific routing layers according to the synthesized netlist.
The router will try to minimize
- Route congestion
- Timing impact on critical paths
[Figure: routed block C; c0, c1, c2, and c3 clock buffers are routed]
Example: Design Optimization, Post-Route

Another round of design optimization takes place, since the timing is more realistic now that there are actual wires and not just estimates of wires.
Post-route optimization can include
- Buffering
- More advanced and aggressive modifications
[Figure: routed block C; c0, c1, c2, and c3 clock buffers are routed, u10 and u14 are upsized]

Discussion Questions
- What iterations take place when a design goes from logic synthesis through floorplanning, placement, CTS, and route?
- What precautions would you take if you were to take your design from RTL to placement?
What Is Extraction?
Definition: Process of calculating the parasitic resistance and capacitance of the interconnect of the physical design

Example: Extraction can be performed at various parts of the design with varying accuracy. The most accurate results are achieved when extraction is performed on a fully routed design, because all of the nets are of known metal type and length. There are no estimates for nets at this point.

Extraction: Input and Output, Format
Input
- Routed design in the Verilog language or other HDL + DEF or GDSII
- LVS verified netlist
- Extraction constraints and commands in TCL
Output
- Standard Parasitic Exchange Format (SPEF) file containing all of the RC information for the routed nets in the design
[Figure: routed design (DEF or GDSII), TCL, and Physical Library feed Extraction, which produces the parasitic file (SPEF)]
Example: Extraction
When the design has been routed, we can perform a detailed extraction of the resistance and capacitance of the routed nets in the design.
This RC data will give us a more accurate report of the timing and power of the design.
[Figure: routed block C; resistance and capacitance for each net is “extracted” and saved in a SPEF file]

What Is Delay Calculation?
Definition: Process of computing the delay of interconnect and standard cells in a digital design

Example: In the example design, delay calculation was performed after CTS and also after final route. Using the delay information, we were able to find several timing violations in the design.
Delay Calculation: Input and Output, Format

Input
- Routed design in the Verilog language or other HDL + DEF
- Parasitic extraction file (SPEF)
- Logical Timing Libraries in Liberty (.lib) format
- Constraints and commands in TCL
Output
- Standard Delay Format (SDF) file containing all of the delay information in the design
[Figure: routed design (Gates + DEF), SPEF, TCL, Logical Library, and Physical Library feed Delay Calculation, which produces the delay file (SDF)]

Example: Delay Calculation
We can perform delay calculation for all the cells and nets in the design and generate an SDF file. This can be done in a
- Separate delay calculator
- STA tool
The reason for generating an SDF file is to have consistency for all timing calculations throughout the flow. Once it is generated, all tools can access the same SDF file.
[Figure: routed block C; delay for each cell and net in the design is calculated]
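
To show how extracted RC data turns into a net delay, the Python sketch below applies the Elmore delay, a classic first-order estimate for an RC ladder. This is an illustrative stand-in and not the method of any particular delay calculator, which would also use the cell timing data from the .lib files; the function name and all values are hypothetical.

```python
def elmore_delay_fs(segments, load_ff=0.0):
    """Elmore delay to the far end of an RC ladder.

    segments: list of (resistance_ohm, capacitance_fF) per wire segment,
              ordered from driver to receiver.
    load_ff:  receiver pin capacitance in fF, added at the last node.
    Returns the delay in femtoseconds (ohm x fF = fs).
    """
    delay = 0.0
    upstream_r = 0.0
    last = len(segments) - 1
    for i, (r_ohm, c_ff) in enumerate(segments):
        upstream_r += r_ohm                      # resistance back to driver
        node_c = c_ff + (load_ff if i == last else 0.0)
        delay += upstream_r * node_c             # each cap sees upstream R
    return delay


# Hypothetical net: two 100-ohm, 10-fF segments driving a 20-fF pin.
net = [(100.0, 10.0), (100.0, 10.0)]
delay_fs = elmore_delay_fs(net, load_ff=20.0)
print(f"wire delay: {delay_fs / 1000:.1f} ps")
```

Here the first node contributes 100 x 10 = 1000 fs and the second contributes 200 x 30 = 6000 fs, for a total of 7 ps.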
What Is Signal Integrity?

Definition: Unintended effects on digital signals, caused by interconnect parasitic resistance or capacitance, that cause noise and/or change delays

Example: In our example design, we saw signal integrity (SI) effects such as noise-on-delay and glitches, due to long nets that were running in parallel.

What is noise-on-delay?
Crosstalk-induced delay or incremental delay due to coupling capacitance.

What is a glitch?
A glitch is a bump or change in value caused by a changing signal affecting a neighboring wire.

Signal Integrity: Input and Output, Format
Input
Routed Design in the Verilog language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Constraints and commands in TCL
Parasitic extraction file (SPEF)
Logical Timing Libraries in Liberty (.lib) format
Power rail IR-drop data
Tool specific SI libraries
Output
Incremental Standard Delay Format (SDF) file containing all of the delay
information in the design related to noise-on-delay
Reports for glitch nets
List of problem nets that need to be re-routed
[Figure: routed design (gates + DEF), SDC, TCL, SPEF, logical library, physical library, and tool-specific library feed Signal Integrity, which writes an incremental SDF delay file]
Example: Signal Integrity

We can run checks and produce data or reports to help us identify timing and reliability issues
due to SI. For submicron designs, closely coupled nets can produce
Crosstalk-induced delay
Noise
Power rail IR drop can cause
Weakened drivers
Increased delays
Lower noise margins
Incremental delay due to coupling capacitance is stored in an SDF file.
[Figure: example design with RAM A0, RAM A1, and instances u10 to u17 and c0 to c3]

What Is Static Timing Analysis?
STA is the preferred method for timing signoff since the majority of ASIC
vendors and foundries have adopted it.
Definition: Process of computing the timing of logically related paths for a
digital design without regard to large scale functional behavior
Example: To determine the timing of the design, we ran static timing analysis after
detail route, and saw several paths violating their setup time requirements.
[Figure: design flow chart]
Static Timing Analysis: Input and Output, Format

Input
Routed Design in the Verilog language or other HDL (Note: STA can be run on a
design at any stage of the back-end flow)
Constraints in Synopsys Design Constraints (SDC) format
Timing libraries in Liberty (.lib) format
Constraints and commands in TCL
SPEF, SDF, and incremental SDF
Output
Timing reports, including noise-on-delay effects
[Figure: routed design (gates), SDC, TCL, SPEF, SDF, incremental SDF, and logical library feed Static Timing Analysis, which writes reports]

Example: Static Timing Analysis
At the end of the physical implementation phase, we will need to run signoff STA to
make sure that all of the paths in our design meet timing.
STA can be used
During the implementation phase to check on timing, etc.
For signoff just before tapeout to ensure all paths meet timing
Full chip timing can now be run with routing and SI effects included.
[Figure: example design with RAM A0, RAM A1, and instances u10 to u17 and c0 to c3]
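The setup check behind "all paths meet timing" can be sketched in a few lines. This is a simplified illustration, not how any particular STA tool is implemented: the function names, the single-clock assumption, and the example path numbers are all hypothetical.

```python
def setup_slack(clock_period, launch_latency, capture_latency,
                data_delay, setup_time):
    """Setup slack for one register-to-register path (all values in ns).

    A negative result means the path violates its setup requirement.
    """
    # Time at which data arrives at the capturing flip-flop.
    arrival = launch_latency + data_delay
    # Latest time the data may arrive and still be captured one cycle later.
    required = clock_period + capture_latency - setup_time
    return required - arrival


def violating_paths(paths, clock_period):
    """Return the names of paths whose setup slack is negative."""
    return [name for name, launch, capture, delay, setup in paths
            if setup_slack(clock_period, launch, capture, delay, setup) < 0]


# Hypothetical paths: (name, launch latency, capture latency, data delay, setup)
paths = [
    ("u10 -> u12", 0.30, 0.35, 1.90, 0.10),
    ("u13 -> u15", 0.30, 0.30, 2.10, 0.10),
]
print(violating_paths(paths, clock_period=2.0))
```

In a real flow the data delay would come from SDF (or from SPEF plus the timing libraries), and the clock period and latencies from the SDC constraints.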
What Is Design/Physical Verification?

Definition: Layout versus schematic (LVS), design rule check (DRC), and power
(IR drop and EM) are signoff checks run to ensure the integrity, functionality,
and manufacturability of the chip.
LVS is a comparison of transistor-level SPICE netlist vs. GDSII to ensure the
connectivity of the design.
DRC is a detailed check of the physical design against the process technology
rules.
IR drop is a detailed check of the chip’s power plan to ensure that the supply
voltages do not drop below accepted levels.
EM is a detailed check to ensure that the current density in all parts of the
design does not exceed accepted levels.
[Figure: design flow chart, ending with GDSII passed on to Mask Prep]
What Is Mask Prep?
Process of creating the mask set from the GDSII database to allow chip
manufacturing
[Figure: design flow chart]
DRC: Input and Output, Format

Input
GDSII
Rule deck
Output
DRC reports
[Figure: GDSII and rule deck feed DRC, which writes reports]
LVS: Input and Output, Format
Input
Gate Level Netlist in the Verilog language
GDSII
Rule deck
SPICE libraries
Output
LVS reports
[Figure: gates, GDSII, rule deck, and SPICE libraries feed LVS, which writes reports]
Power Grid Analysis, IR Drop, and EM: Input and Output, Format
Input
Routed design in the Verilog language + DEF
Power characterized libraries in tool-specific format
Timing libraries in Liberty (.lib) format
Timing constraints in SDC format
Extraction data in SPEF format
Timing windows file (TWF)
Value-change-dump file (optional)
Output
IR drop reports
EM reports
[Figure: gates + DEF, VCD, SDC, TWF, SPEF, logical libraries, and power libraries feed Power Grid Analysis, which writes reports]

Mask Prep: Input and Output, Format
Input
GDSII
Technology Specific Files
Output
Optimized GDSII
[Figure: GDSII and tech files feed Mask Prep, which writes optimized GDSII]
Example: Physical Verification and Mask Prep

Physical verification involves power, LVS, and DRC checks to ensure the integrity of
the design.
When the design passes all of the PV checks, a GDSII is produced and mask prep
can begin. Mask prep involves complex processes such as lithography (the process
of creating the masks to create the layers for an integrated circuit) modifications,
etc.
Make sure power, LVS, DRC checks pass
Perform mask prep
[Figure: example design with RAM A0, RAM A1, and instances u10 to u17 and c0 to c3]

What are the main process steps in the physical design of a chip?
Which process steps can be done at multiple stages of the flow?
If you were to lead the design of a chip, how would you organize your
resources to handle the various tasks?
Summary
We have introduced all of the steps in the physical implementation flow:
Specification, microarchitecture, RTL, logic synthesis
Floorplanning, placement, clock tree synthesis, route
Extraction, delay calculation, static timing analysis, signal integrity
Design optimization, physical synthesis
Design verification, mask prep
Each step in the process, in and of itself, is very detailed, so we will spend
the rest of the course learning more about each step.
Testing Your Understanding
True or false
1. In creating a floorplan, we can gather information to see if our design
is routable.
2. If a design does not meet timing after synthesis, it is possible that it
can meet timing during placement.
3. When routing a design, it is best to avoid having long parallel routes.
4. Accurate SDC constraints are important to meet timing during static
timing analysis.
5. Errors in physical verification are simple to fix.
Learning Activity
Complete a flowchart of the digital design implementation flow
Include the design flow steps
Include the necessary inputs and outputs
Fill in the missing or wrong sections of the flowchart
10 minutes for debriefing
[Figure: partially completed flowchart from RTL through design flow steps to GDSII, with blanks marked “?”]
Terms and Definitions
Floorplanning: Process of deriving the die size, allocating space for soft blocks, planning power, and macro placement
Placement: Process of placing the standard cells in a floorplanned design
Clock Tree Synthesis: Process of inserting buffers in the clock tree of a digital design
Route: Process of connecting the pins of the standard cells, macros, and I/Os of a digital design to specific metal layers in the process technology to match the schematic
Extraction: Process of calculating the parasitic resistance and capacitance of the interconnect of the physical design
Delay Calculation: Process of computing the delay of interconnect and standard cells in a digital design
Static Timing Analysis: Process of computing the timing of logically related paths for a digital design without regard to large scale functional behavior
Signal Integrity: Unintended effects on digital signals caused by interconnect parasitic resistance or capacitance that causes noise and/or changes delays
Design Optimization: Process of using automated algorithms to improve the quality of a digital design
Physical Synthesis: Process of combining logic synthesis and placement to improve the accuracy of the physical implementation of a digital design
Design Verification: Process of physically verifying the design rules and backend checks of a design
Mask Prep: Process of creating the mask set from the GDSII database to allow chip manufacturing
Terms and Definitions (continued)

LEF: Library Exchange Format, Physical Library (metal and via routing rules)
DEF: Design Exchange Format, Physical (floorplanning, placement, routing) and Logical Representation (connectivity)
Liberty: Format for logical libraries, includes timing, area, and power information
SDC: Synopsys Design Constraints, includes clocks and timing constraints
Clock Skew: Delay difference between clock paths in a design
Clock Latency: Delay from clock source to destination in a design
SPEF: Standard Parasitic Exchange Format, standard format for representing capacitance and resistance for each net
SDF: Standard Delay Format, standard format for representing interconnect and cell delays
LVS: Layout vs. schematic, connectivity checking
DRC: Design Rule Check, physical rule checking
IR Drop: Voltage Drop, measure of power plan integrity
EM: Electromigration, term used to describe failures in wires due to high current
TWF: Timing Windows File, file used in signal integrity analysis to determine the overlap of signals
VCD: Value Change Dump, file used to provide toggle information to power analysis
GDSII: Graphic Data System, standard format for IC layout data exchange
Rule Deck: Technology specific information used by physical verification
Spice Deck: Format to represent circuits, cells, and macros in detail

Introduction and Overview of Layout
Technology
Module 2
June 16, 2008
Urban Planning
When civil engineers plan the layout of an urban settlement, they need to consider
Total population and population density of the settlement.
Locations of parks, apartments, shopping centers, etc.
Spacing of each building, tree, and street.
Integrated Circuit Layout
In a similar fashion, but on a micron scale, layout engineers must also decide where
to place parts of the circuit under design and follow spacing rules.
Module Objectives
Describe the fabrication of field effect transistors (FETs) and layout
technologies, and relate layout data to library exchange format
(LEF) syntax
Read a design rule manual (DRM) and interpret design rule check
(DRC) and layout versus schematic (LVS) errors
Describe how cell libraries are used
Introduction to layout describing layers, FETs, and logic gate layouts
DRC and reading a DRM
Layout versus schematic checking
LEF library format and syntax
Review
Introduction to Layout
After a design has been synthesized, it is time to start laying out the
design.
Usually, place and route tools more or less automate the process
using ready-made transistor and gate libraries provided by the
foundry.
Sometimes, when performance and design density are of primary
importance, the designer must lay out the design manually.
This approach, called “custom design,” leads to high production costs
and a long time to market.
There are usually three reasons to justify a custom design:
The block can be re-used many times, such as a library cell.
The product can be sold in a large volume, such as microprocessors.
Cost is not the primary concern, such as chips used in space.

Laying Out a Transistor
Since the transistor is the basic building block of circuits, we must
understand how a transistor is laid out on a chip.
Below are the symbol schematic and the 3D diagram for an n-channel
MOS transistor (NMOS).
Recall that the NMOS has four terminals:
The gate is made out of polysilicon.
The drain and source are made out of heavily doped n+ diffusion layers
(also called active area).
The bulk is made out of p-type substrate.
[Figure: NMOS symbol schematic (gate, drain, source, bulk) and 3D diagram (poly gate, n+ drain, n+ source, p-substrate bulk)]
Laying Out a Transistor (continued)

In a layout tool, you will simply draw the top-down view of the 3D diagram you
saw in the previous slide with a few additions.
Due to lithographic error margins, the polysilicon gate must be extended over the
diffusion layer according to design rules (discussed later).
Metal contacts must be added for the drain and source diffusion layers.
[Figure: top-down layout of an n-channel transistor showing the gate, source, drain, and Metal 1 contacts]

Laying Out a Transistor (continued)
A p-channel transistor is similar to the n-channel transistor except for a few
differences:
The drain and source of a p-channel MOS (PMOS) are made out of p+ diffusion
layer, and the substrate is composed of n-type material.
For consistency in this module, the wafer is created with a p-type process. This
means the default substrate is p-type material. Therefore, since the PMOS requires
n-type material, a well composed of n-type material must be built around the
diffusion layer.
In layout tools, this well is also called the select region.
[Figure: layout of a p-channel transistor showing the gate, source, and drain inside an n-well]
Laying Out an Inverter

Now that we have seen how a transistor is laid out, it is time to lay out
the simplest logic gate: an inverter.
Features to consider:
The inverter needs power and ground metal strips (rails). Recall that the
source of the PMOS is connected to the power rail, whereas the source of
the NMOS is connected to the ground rail.
The poly can be extended to connect together both gates.
Recall that for an inverter with equal drive strength for rising and falling
transitions, the PMOS is twice the width of the NMOS.
Since the substrate needs to be biased, we will also need to add substrate
contacts to both transistors (n-tap for n-type substrate contact, p-tap for
p-type substrate contact).
In digital circuits, the substrate of a transistor is usually biased to the same
voltage as the source of the transistor to avoid body effect.

Laying Out an Inverter (continued)
[Figure: inverter schematic and layout, showing VDD, the n-tap, the n-well, the PMOS and NMOS source, gate, and drain, the In and Out nodes, the p-tap, and Ground]
Note: In digital circuit schematics, the bulk node is usually not drawn because it
is assumed that the bulk is connected to the source.
Stick Diagram
Just as a writer writes a rough draft for an essay, layout engineers
can plan their layout on paper before diving into the tools.
A commonly preferred method for scratch work is a stick diagram.
A stick diagram is a way to visualize a layout without drawing the actual
dimensions.
Each object (that is, poly, metal strip, diffusion) is represented by a
dimensionless “stick.”
Below is a stick diagram for the inverter we just drew. Can you identify what
each stick represents?
[Figure: stick diagram of an inverter]
NAND Gate Layout
Let’s draw a stick diagram for a slightly more complicated, two-input NAND
gate.
Looking at the schematic, we see that there are two PMOS in parallel and two
NMOS in series.
Due to this fact, we do not need to draw separate diffusion layers for two
devices that share sources and drains.
[Figure: two-input NAND gate schematic and stick diagram with inputs A and B]
NAND Gate Layout (continued)

Now let’s translate that stick diagram into an actual layout with dimensions.
We will need to add the n-well, contacts, and substrate taps.
[Figure: two-input NAND gate layout with n-tap and VDD at the top, inputs A and B, and p-tap and Ground at the bottom]

General Layout Tips
Here are some fundamental guidelines to follow when laying out a design:
Always try to create a continuous diffusion layer or well.
If you must separate the wells (select areas), remember to place
substrate taps in each separate well.
All poly strips should run in one direction (usually vertical).
Keep metal jogs to a minimum, and use absolutely no diagonal wires,
since diagonal wires can cause problems with design rules
downstream.
Power and ground rails should be extra wide to allow a large amount
of current to flow into your device.
Placing additional contacts never hurts. They will give you more options
in terms of where to place the metal wire during routing.
Plan your layout on paper first; the stick diagram is your friend!
Technology Layers
In most layout tools today, you will have access to a layers palette
like the one from Cadence Virtuoso Layout Editor.
We have already encountered some of the layers on the previous
slides.
The table on the next slide will summarize commonly used
layers.
[Figure: layers palette from Cadence Virtuoso Layout Editor]
Technology Layers (continued)
Layer Name as Displayed in Palette | Description
Metal(1-8) | Metal layer used to connect together pins on a layout
Poly | Polysilicon material used for transistor gates
nactive | n+ diffusion layer for NMOS
pactive | p+ diffusion layer for PMOS
nselect | n well for PMOS
pselect | In a p-process, this represents an abstract boundary.
cc | Contact cut. This layer, in conjunction with metal layers, is used to create vias and contacts.
Class Exercise
Draw the stick diagram and layout for a two-input NOR gate.
[Figure: two-input NOR gate schematic with inputs A and B]
Design Rules
Today’s semiconductor manufacturing processes are extremely
complex. It is simply not possible to expect every layout engineer to
understand the intricacies of the fabrication process.
Layout engineers want tighter, smaller designs.
Process engineers want a reproducible and high-yield process.
Design rules act as an interface and a compromise between layout
and process engineers.
By understanding design rules, layout engineers can make their
design as compact as possible while ensuring that their design will
have a high yield.
Design Rule Manual
Usually, the layout engineer will have access to a document called the
design rule manual (DRM), which explains all the design rules that
need to be followed.
The information is also annotated into layout tools that automatically
check the design for violations as the design is being laid out.
We will take a brief look at a sample DRM.
To make the rules more readable, the rules in this manual are divided
into sections based on the different layers.
Design Rule Manual: N-Well Rules

N-Well Rules
Rule | Rule Description | Drawn
1A | Minimum width | 2.2 μm
1B | Minimum spacing in x and y, both N-wells biased at the same potential | 1.6 μm
1C | Minimum spacing, either or both N-wells not biased or biased to different potentials | 3.0 μm
1D | Minimum enclosure of p-active region | 1.5 μm
1E | Minimum spacing in x and y to an external n-active region | 2.0 μm
1F | Minimum spacing in x and y to an external p-active region | 1.5 μm
[Figure: N-well, N-active, and P-active regions annotated with rules 1A to 1F]
Note: These dimensions are not drawn to scale.

Design Rule Manual: N/P-Active Rules
N-Active Rules
Rule | Rule Description | Drawn
2A | Minimum width | 0.6 μm
2B | Minimum spacing over field | 0.8 μm
2C | Minimum spacing to p-active | 1.0 μm
P-Active Rules
Rule | Rule Description | Drawn
3A | Minimum width | 0.6 μm
3B | Minimum spacing over field | 0.8 μm
3C | Minimum spacing to n-active | 1.0 μm
[Figure: N-active and P-active regions annotated with rules 2A, 2B, and 2C]
The rules for 3A, 3B, 3C are graphically identical to 2A, 2B, 2C except they are for p-active regions.
The numbers for n-active and p-active are the same. It might not be so for every technology.
Design Rule Manual: Contact1 Rules
Contact 1 (Metal 1 Contacts) Rules
Rule | Rule Description | Drawn
4A | Required size (square) | 0.8 μm²
4B | Minimum spacing | 0.6 μm
4C | Minimum poly contact 1 spacing to any active region | 0.4 μm
4D | Minimum active region contact 1 to poly | 0.6 μm
4E | Minimum enclosure by any active region | 0.2 μm
4F | Minimum enclosure by poly | 0.2 μm
[Figure: contacts on poly and active regions annotated with rules 4A to 4F]
The rules for contacts belonging to other layers are similar; just the numbers are different.

Design Rule Manual: Metal1 Rules
Metal 1 Rules
Rule | Rule Description | Drawn
5A | Minimum width | 0.6 μm
5B | Minimum spacing | 0.8 μm
5C | Minimum overlap of contact 1 | 0.2 μm
[Figure: Metal 1 shapes annotated with rules 5A, 5B, and 5C]
The rules for contacts belonging to other layers are similar; just the numbers are different.
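To show what an automated check of rules 5A and 5B might look like, here is a minimal sketch that tests a list of axis-aligned Metal 1 rectangles. It is only an illustration under stated assumptions: real DRC tools work from a rule deck on merged polygons, the function names and rectangle data are hypothetical, and measuring corner-to-corner spacing as a Euclidean distance is a simplification.

```python
import math
from itertools import combinations

MIN_WIDTH = 0.6    # rule 5A, in microns
MIN_SPACING = 0.8  # rule 5B, in microns
EPS = 1e-9         # tolerance for floating-point comparisons


def width(rect):
    """Smaller side of a rectangle given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return min(x2 - x1, y2 - y1)


def spacing(a, b):
    """Distance between two rectangles; 0.0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def check_metal1(rects):
    """Return a list of (rule, rectangle indices) violations."""
    violations = []
    for i, r in enumerate(rects):
        if width(r) < MIN_WIDTH - EPS:
            violations.append(("5A", (i,)))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        gap = spacing(a, b)
        # Touching or overlapping shapes are treated as one connected shape.
        if 0.0 < gap < MIN_SPACING - EPS:
            violations.append(("5B", (i, j)))
    return violations


# Hypothetical Metal 1 wires: two placed too close, one too narrow.
wires = [(0.0, 0.0, 0.6, 5.0), (1.2, 0.0, 1.8, 5.0), (3.0, 0.0, 3.5, 5.0)]
print(check_metal1(wires))
```

The same structure extends to other layers by swapping in that layer's minimum width and spacing from the DRM.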
If we had design rule violations on METAL2 and METAL3 after detail
route, which sections of the DRM should we refer to?
Why is there a minimum spacing rule for specific metal layers?
Why is there a minimum width rule for specific metal layers?
Are the rules different per metal layer?
Layout vs. Schematic

After your layout is completed, how do you verify that it is functionally
correct?
A function called Layout versus Schematic (LVS) is found in most
tools, which checks your layout against a schematic netlist.
The tool first extracts a netlist from the layout by using some basic
rules:
A transistor is detected when poly overlaps active regions.
All poly, diffusion, and metal layers are conductive and are assumed to
route signals.
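The first extraction rule above, a transistor wherever poly overlaps an active region, can be sketched with simple rectangle intersection. This is an illustrative sketch only: the function names and the inverter coordinates are hypothetical, and a real extractor also traces connectivity, device sizes, and wells.

```python
def overlap(a, b):
    """Intersection of two rectangles (x1, y1, x2, y2), or None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def find_transistors(poly, nactive, pactive):
    """Report a device for every poly shape that crosses an active shape."""
    devices = []
    for kind, shapes in (("NMOS", nactive), ("PMOS", pactive)):
        for active in shapes:
            for gate in poly:
                channel = overlap(gate, active)
                if channel is not None:
                    devices.append((kind, channel))
    return devices


# Hypothetical inverter: one vertical poly strip crossing both diffusions.
poly = [(2.0, 0.0, 2.6, 10.0)]
nactive = [(0.0, 1.0, 5.0, 3.0)]
pactive = [(0.0, 6.0, 5.0, 10.0)]
for kind, channel in find_transistors(poly, nactive, pactive):
    print(kind, channel)
```

The overlap rectangle is the channel region; its dimensions give the device width and length that LVS later compares against the schematic.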
Layout vs. Schematic (continued)
[Figure: layout and schematic views of the same circuit, with transistors I1 to I4 (sizes 2/1 and 1/1), inputs A and B, and nets Net1, Net2, Net3, VDD, and GND labeled in both views]
General LVS Tips

Although different LVS tools contain different user interfaces,
commands, and options, they all share the same principle of
comparing a netlist against a layout.
These are some general tips when performing LVS:
Similar to Verilog® design, a bottom-up approach should be used when
performing LVS. If LVS does not pass, then the error can be narrowed
down to the interconnects, because the smaller blocks are already LVS
clean.
Label your layout. All pins and wires should be labeled exactly as they
appear in the netlist. This gives the tool a good chance to correctly identify a
mismatch.
If the device count between layout and netlist is the same, do not perform
any netlist reduction. If the count is different, check to see if the layout is
correct before performing netlist reduction, because this process attempts
to simplify logic and can potentially collapse nets.

General LVS Tips (continued)
The first goal of LVS should be to get a connectivity-clean LVS.
The electrical connections linking different devices in the front and back
end should be equivalent.
The two netlists should be topologically equivalent, meaning they have the
same type of devices.
Set your constraints to check only the above factors.
The second goal of LVS is to make sure that device parameters and
capacitance values are correct. This can only be done if the netlist
annotates such information.
Check the reports.
LVS reports usually consist of matching and non-matching nets and
devices.
Most tools have a cross-probing feature that will highlight the equivalent
object on both the layout and the schematic of the netlist if one is selected.
This is your best debugging friend!

Physical Libraries
After a standard cell is laid out, the information is encapsulated into a
Library Exchange Format (LEF) file.
The LEF provides a means to exchange layout information between
layout and routing tools in the IC flow (such as the Cadence®
Virtuoso® tools and the SoC Encounter® RTL-to-GDSII system).
The LEF contains only information on layout of metal layers inside a
standard cell.
This information includes the locations of I/O pins and also internal
metal routing so that the router knows where to route and where not to
route.
General Rules about LEF Files

A LEF file is limited to 2048 characters per line.
The unit of distance is in microns.
The precision for unit of distance is controlled by the UNITS
statement.
LEF statements end with a semicolon. A space must separate the last
character in the statement and the semicolon.
LEF information is usually divided into two files, a technology and a
cell LEF file.
LEF statements can be defined in any order. But data must be defined
before it is used. The following table is the typical format for LEF files.
Typical LEF Format
Statements for a tech LEF file.
[VERSION statement]
[BUSBITCHARS statement]
[DIVIDERCHAR statement]
[UNITS statement]
[MANUFACTURINGGRID statement]
[USEMINSPACING statement]
[CLEARANCEMEASURE statement ;]
[PROPERTYDEFINITIONS statement]
[LAYER(Nonrouting) statement
| LAYER(Routing) statement] ...
[SPACING statement ]
[MAXVIASTACK statement]
[VIA statement] ...
[VIARULE statement] ...
[VIARULE GENERATE statement] ...
[NONDEFAULTRULE statement] ...
[SITE statement] ...
[BEGINEXT statement] ...
[END LIBRARY]

Statements for a standard cell LEF file.
[VERSION statement]
[BUSBITCHARS statement]
[DIVIDERCHAR statement]
[VIA statement] ...
[SITE statement]
[MACRO statement
  [PIN statement] ...
  [OBS statement ...] ] ...
[BEGINEXT statement] ...
[END LIBRARY]
Technology LEF File

A technology LEF file contains information about a certain technology
process (for example, UMC 130 nm and IBM 65 nm).
The bulk of the technology LEF file describes the metal and via layers,
and their process rules (such as width, spacing, extension, minimum
area, and antenna area).
Metal layers are used to connect standard cells and macros, whereas
vias are used to connect different metal layers.
A via is a rectangular object that connects two routing layers together.
The via is usually composed of three layers: a cut layer sandwiched by
two routing layers.
LAYER Statement
Every layer in the technology is described with the LAYER statement.
There are four types of layers: CUT, Routing, Implant, and
Masterslice.
In this class, we will cover only the CUT and Routing layers, which are
responsible for creating metal routes and the vias.
Implant and Masterslice layers are beyond the scope of this module
and will not be discussed here.
Routing LAYER
Routing layers are responsible for creating metal routes between cells.
For each layer, there are many attributes to set. Below is a sample LEF file
describing the attributes for metal layer 1.
The important attributes will be described in detail on the next slide.

LAYER ME1
  TYPE ROUTING ;
  WIDTH 0.160 ;
  AREA 0.1024 ;
  SPACING 0.160 ;
  SPACING 0.26 RANGE 1.765 100000.0 ;
  PITCH 0.400 ;
  OFFSET 0.200 ;
  DIRECTION HORIZONTAL ;
  THICKNESS 0.320 ;
  HEIGHT 0.46 ;
  MINENCLOSEDAREA 0.3072 ;
  MINIMUMCUT 2 WIDTH 1.40 ;
  MAXWIDTH 25.00 ;
  CAPACITANCE CPERSQDIST 1.1012E-04 ;
  RESISTANCE RPERSQ 0.09100000 ;
  EDGECAPACITANCE 9.362E-05 ;
  MINIMUMDENSITY 20 ;
  MAXIMUMDENSITY 80 ;
  DENSITYCHECKWINDOW 200 200 ;
  DENSITYCHECKSTEP 100 ;
  FILLACTIVESPACING 0.8 ;
  ANTENNACUMAREARATIO 396 ;
  ANTENNACUMDIFFAREARATIO PWL ( ( 0 396 ) ( 0.102 396 ) ( 0.103 999999999 ) ( 1 999999999 ) ) ;
END ME1

Routing LAYER Attributes
Attribute | Description
Width | Minimum width of the routing wires
Area | Minimum area for a polygon of metal
Thickness | Thickness of the wire
Spacing | Minimum spacing between wires. You may specify different minimum spacing values for various ranges of widths. For example, SPACING 0.26 RANGE 1.765 100000.0 ; means the minimum spacing is 0.26 microns for wires with widths of 1.765 microns and beyond.
[Figure: 3D view of a wire showing thickness, and top-down view of two wires showing spacing and width]
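The width-dependent spacing rule can be sketched as a small lookup. The values come from the sample ME1 layer in this module; the function name and the choice to treat the RANGE bounds as inclusive are assumptions made for illustration, and real tools apply further conditions that are not modeled here.

```python
DEFAULT_SPACING = 0.160  # SPACING 0.160 ;
# (spacing, min_width, max_width) from SPACING 0.26 RANGE 1.765 100000.0 ;
RANGED_SPACING = [(0.26, 1.765, 100000.0)]


def min_spacing(wire_width):
    """Minimum spacing in microns required for a wire of the given width."""
    required = DEFAULT_SPACING
    for spacing, lo, hi in RANGED_SPACING:
        # Assumption: a width anywhere in [lo, hi] triggers the wider spacing.
        if lo <= wire_width <= hi:
            required = max(required, spacing)
    return required


print(min_spacing(0.16))  # a minimum-width wire
print(min_spacing(2.0))   # a wide wire, for example a power strap
```

Wide wires need more room around them, which is why the router has to know each wire's width before it can decide how close the neighboring track may be used.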
Routing LAYER Attributes (continued)

These attributes are used to calculate wire delays, cross talk, and other
physical verification parameters.
Attribute | Description
Capacitance | The capacitance per square unit of the wire-to-ground capacitance
Resistance | The resistance per square of the metal
EdgeCapacitance | The capacitance from the sidewall to the ground of the metal
[Figure: cross-section of a wire showing resistance, capacitance, and edge capacitance used in capacitance calculations]
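As a rough illustration of how these three attributes turn into numbers, the sketch below estimates the resistance and capacitance of one straight wire segment using the sample ME1 values. The formulas are a common first-order reading of the LEF attributes (resistance per square times the number of squares; area capacitance plus edge capacitance along the perimeter) and the units are assumed to be ohms per square, pF per square micron, and pF per micron. Treat both as assumptions; a signoff extractor uses far more detailed models.

```python
RPERSQ = 0.091            # RESISTANCE RPERSQ, assumed ohms per square
CPERSQDIST = 1.1012e-4    # CAPACITANCE CPERSQDIST, assumed pF per square micron
EDGECAP = 9.362e-5        # EDGECAPACITANCE, assumed pF per micron of edge


def wire_rc(length, width):
    """Estimate (resistance in ohms, capacitance in pF) of a wire segment.

    length and width are in microns.
    """
    # Number of squares along the wire is length / width.
    resistance = RPERSQ * length / width
    # Area term plus an edge term along the full perimeter of the segment.
    capacitance = CPERSQDIST * width * length + EDGECAP * 2 * (width + length)
    return resistance, capacitance


# A 100-micron ME1 wire at the minimum width of 0.16 microns.
r, c = wire_rc(100.0, 0.16)
print(f"R = {r:.3f} ohm, C = {c:.5f} pF")
```

For a narrow wire like this one, the edge term is much larger than the area term, which is one reason sidewall capacitance matters in small geometries.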
Routing LAYER Attributes (continued)
Most place and route tools have routing tracks. All metal routes
must be placed squarely on these tracks.
Attribute | Description
Offset | The distance of the first routing track from the edge of the chip
Pitch | The distance between each successive routing track
Direction | Each metal layer has a preferred direction that the auto router will route with. It is either vertical or horizontal. Diagonal tracks are usually not preferred.
[Figure: routing tracks showing the offset from the edge of the chip and the pitch between tracks]
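Offset and pitch together define where the tracks sit, which can be sketched as follows. The values are taken from the sample ME1 layer (OFFSET 0.200, PITCH 0.400); the function names and the 2-micron extent are hypothetical and only illustrate the arithmetic.

```python
OFFSET = 0.200  # distance of the first track from the edge, in microns
PITCH = 0.400   # distance between successive tracks, in microns


def track_positions(extent):
    """Coordinates of all routing tracks between 0 and extent (microns)."""
    if extent < OFFSET:
        return []
    count = int((extent - OFFSET) / PITCH + 1e-9) + 1
    # Round to hide floating-point noise such as 1.4000000000000001.
    return [round(OFFSET + k * PITCH, 6) for k in range(count)]


def snap_to_track(coordinate):
    """Coordinate of the routing track nearest to the given coordinate."""
    k = max(round((coordinate - OFFSET) / PITCH), 0)
    return round(OFFSET + k * PITCH, 6)


print(track_positions(2.0))
print(snap_to_track(0.95))
```

For a layer with DIRECTION HORIZONTAL, these coordinates would be the y positions of the tracks; for a vertical layer, the x positions.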
Vias
Vias are contacts that connect together different metal layers.
Vias usually have three layers: two routing layers and a CUT layer in
between.

Sample Via Definition
# The LAYER statements for metal 1 and 2 are defined on previous slides.
LAYER VI1
  TYPE CUT ;
  SPACING 0.20 ;
END VI1
VIA VI1_H DEFAULT
  RESISTANCE 4.0000e+00 ;
  LAYER ME1 ;
    RECT -0.16 -0.1 0.16 0.1 ;
  LAYER VI1 ;
    RECT -0.1 -0.1 0.1 0.1 ;
  LAYER ME2 ;
    RECT -0.16 -0.1 0.16 0.1 ;
END VI1_H

This via, which connects metal 1 and 2, has three layers: two routing layers and
a cut layer, with the sizes defined by the RECT statements.

Standard Cell LEF File
A standard cell LEF file contains the metal pin layout information for macros.

MACRO INVX10MTL
  CLASS CORE ;
  FOREIGN INVX10MTL 0.000 0.000 ;
  ORIGIN 0.000 0.000 ;
  SIZE 3.200 BY 2.800 ;
  SYMMETRY X Y ;
  SITE SAMPLEFSNSITE ;
  PIN Y
    DIRECTION OUTPUT ;
    PORT
      LAYER ME1 ;
      RECT 2.815 0.605 3.100 2.305 ;
      RECT 1.980 1.040 2.815 1.760 ;
      RECT 1.700 0.605 1.980 2.305 ;
      RECT 1.220 0.740 1.700 2.020 ;
    END
    ANTENNADIFFAREA 1.687 ;
  END Y
  PIN A
    DIRECTION INPUT ;
    PORT
      LAYER ME1 ;
      RECT 0.160 1.140 1.040 1.500 ;
    END
    ANTENNAGATEAREA 0.888 ;
  END A
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 2.540 -0.180 3.200 0.180 ;
      RECT 2.260 -0.180 2.540 0.680 ;
      RECT 1.460 -0.180 2.260 0.180 ;
      RECT 1.180 -0.180 1.460 0.580 ;
    END
  END VSS
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 2.540 2.620 3.200 2.980 ;
      RECT 2.260 2.070 2.540 2.980 ;
      RECT 1.460 2.620 2.260 2.980 ;
      RECT 1.180 2.180 1.460 2.980 ;
    END
  END VDD
END INVX10MTL
MACRO General Attributes

A MACRO in a LEF file can refer to any instantiated macro, standard cell, or
I/O pad.
We will focus on standard cells.
Attribute | Description
Class | This is the type of MACRO. For standard cells, the value is CORE.
Foreign | Specifies the name of the macro when seen in a tool. It specifies how the position and orientation would be translated when read into a layout tool.
Origin | Specifies the origin of the macro relative to a DEF COMPONENT placement point. Usually leave this as 0 0 to avoid confusion.
Size | Dimensions of the MACRO
MACRO Symmetry
A chip is divided into core rows in which standard cells are placed.
The rows are usually placed in a flipped and abutted pattern, with alternating north (N)
and flipped south (FS) orientations.
Standard cells are placed in the rows, in N or FS orientation, such that they share VDD
rails and VSS rails.
Cells in the N row have the N orientation, whereas those in the FS row have FS
orientation.
[Figure: an N row and an FS row flipped and abutted so that they share a VDD or VSS rail, with a VDD or VSS rail on each outer edge]
MACRO Symmetry (continued)

Cells can also be flipped about their y-axis.
Cells on the N row that are flipped about their y-axis have the FN orientation, whereas those flipped about their y-axis on the FS row have the S orientation.
[Figure: an N row holding N and FN cells and an FS row holding FS and S cells, sharing a VDD or VSS rail.]
The SYMMETRY statement (SYMMETRY X ;) tells the placer which orientations are allowed when placing cells in the rows.
Possible values include
X : N and FS orientations are allowed
Y : N and FN orientations are allowed
X Y : All orientations are allowed
R90 : Do not use this value for standard cells
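The SYMMETRY values listed on this slide can be restated as a small Python lookup. This is only a sketch of the list above; the names ALLOWED and allowed_orientations are hypothetical and do not belong to any tool.

```python
# Orientations permitted by each SYMMETRY value, as listed on the slide.
ALLOWED = {
    frozenset({"X"}): {"N", "FS"},
    frozenset({"Y"}): {"N", "FN"},
    frozenset({"X", "Y"}): {"N", "FS", "FN", "S"},
}

def allowed_orientations(symmetry: str) -> set:
    """Return the orientations a placer may use for a SYMMETRY string."""
    keys = frozenset(symmetry.replace(";", "").split())
    if keys not in ALLOWED:
        # R90 (or anything else) is not expected for standard cells.
        raise ValueError(f"unsupported SYMMETRY for a standard cell: {symmetry!r}")
    return set(ALLOWED[keys])

print(sorted(allowed_orientations("X Y ;")))
```

A cell declared with SYMMETRY X alone would therefore never be placed in the FN or S orientation.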

MACRO Pins
Now that we have seen some of the attributes of the standard cell itself, it is time to look at the most important components of a standard cell: its pins.
The pin DIRECTION specifies the direction of the pin. Values can be INPUT, OUTPUT, INOUT, TRISTATE, or FEEDTHRU.
The pin SHAPE specifies how the pin is connected. Values can be ABUTMENT, RING, or FEEDTHRU (used only for pins with special connection requirements, such as power/ground).
The pin USE specifies how the pin is used. Values can be ANALOG, CLOCK, GROUND, POWER, or SIGNAL.
LEF Code for Vdd Pin
PIN VDD
  DIRECTION INOUT ;
  USE POWER ;
  SHAPE ABUTMENT ;
  PORT
    LAYER ME1 ;
    RECT 2.540 2.620 3.200 2.980 ;
    RECT 2.260 2.070 2.540 2.980 ;
    RECT 1.460 2.620 2.260 2.980 ;
    RECT 1.180 2.180 1.460 2.980 ;
  END
END VDD
MACRO Pin Shape

The SHAPE statement specifies a pin with special connection requirements due to its shape. The values are
ABUTMENT: Pins that stretch across the cell, joining the same pin on adjacent cells without routing. (Power rails are a good example.)
RING: Pin on a large macro that forms a ring around the macro, allowing connection to any point on the ring (used for power on big macros such as RAMs).
FEEDTHRU: Pin with an irregular shape with a jog within the cell.
[Figure: examples of abutment, ring, and feedthrough pin shapes.]
MACRO Pin Port Block
The PORT statement begins a section that specifies the location of the metal and via geometries of the pin relative to the standard cell origin.
There can be more than one PORT block. All ports are electrically connected for that pin.
The LAYER statement specifies the layer of the metal or via geometry in the port. There can be more than one LAYER or VIA statement in each PORT.
The RECT statements give the dimensions of the port. (The first two numbers are the x and y coordinates of one corner, whereas the second two numbers are the x and y coordinates of the corner diagonally across from the first one. The convention is lower left, upper right for the two sets of coordinates.)
PIN A
  DIRECTION INPUT ;
  USE SIGNAL ;
  PORT
    LAYER ME1 ;
    RECT 0.000 0.000 1.000 1.000 ;
    RECT 1.000 0.000 2.000 2.000 ;
  END
END A
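To make the RECT convention concrete, the following Python sketch reads the two rectangles of the example port as (lower-left x, lower-left y, upper-right x, upper-right y) and derives each rectangle's size and the bounding box of the port. The helper names rect_size and bounding_box are assumptions made for illustration only.

```python
from typing import List, Tuple

# (x1, y1, x2, y2): lower-left corner, then upper-right corner.
Rect = Tuple[float, float, float, float]

def rect_size(rect: Rect) -> Tuple[float, float]:
    """Width and height of one RECT."""
    x1, y1, x2, y2 = rect
    return (x2 - x1, y2 - y1)

def bounding_box(rects: List[Rect]) -> Rect:
    """Smallest rectangle that encloses every RECT of a port."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )

# The two RECT statements from the PORT example on this slide.
PORT_RECTS = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 2.0)]

for rect in PORT_RECTS:
    print(rect, "size:", rect_size(rect))
print("bounding box:", bounding_box(PORT_RECTS))
```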

Review
Summary
Layout is the process of placing physical instances of a netlist onto a chip.
This process is primarily used for full custom designs or library cells.
The layout for a transistor consists of a polysilicon gate, a diffusion layer, and a substrate layer.
Metal contacts, vias, and substrate taps are needed as interconnects for your transistors.
It is a good idea to lay your design out on paper using a stick diagram before diving into a tool.
Stick diagrams ensure a continuous diffusion layer and consistent vertical poly strips.
Design rules allow layout engineers to produce a high-yield design without understanding the intricacies of the fabrication process.
The process engineer provides a document called the design rule manual, which contains all the pertinent design rules, to the layout engineer.
The manual contains minimum spacing requirements for all layers on a layout.
Summary (continued)
LVS is a tool to check the functional correctness of a layout by comparing the layout against the netlist for which it was designed.
The information from a completed layout is annotated into a LEF library file.
The LEF file is used in automatic place and route tools, giving the tools information about the routing layers for a certain technology process.
The technology LEF file contains design rules of all metal layers, whereas the standard cell LEF file contains the locations of all internal pins and routing inside the standard cells.
True or false
1. Laying out an entire chip manually is an easy process and is done routinely in the industry.
2. The diffusion layer of an NMOS is made out of heavily doped p-type material.
3. Stick diagrams contain no information about the dimensions of your layout.
4. Design rules must be strictly followed in order for the design to have a high yield.
5. The technology LEF file contains only the standard cell information about a certain technology process.
Learning Activity
Match the following LEF file terms with the corresponding diagram in the handout.
Present your results to the class.
Timing Libraries and Constraint Files
Module 3
June 16, 2008
One Verilog Source, Many Design Possibilities
[Figure: Verilog, a timing library, and constraints feed logic synthesis, which can produce Design 1, Design 2, Design 3, or Design 4.]

One Verilog Source, Many Design Possibilities (continued)
Using the same Verilog® design files, different variations of the same design can be made.
Design1 is the smallest and Design4 is the biggest, but all four designs perform the same logical function.
How were the designs made different?
The timing library and constraints made all the difference.
Timing library
  Guides which technology to target, for example 130 nm or 65 nm.
Constraints
  These define the rules based on which the design has to be made.
  If the rules are written well, the result is a better and smaller design.
  If the rules are written poorly, even with the best technology, the result is the worst and biggest design.
One Verilog Source, Many Design Possibilities (continued)

Let’s look at the process in detail to understand what we will learn today.
The designers write equivalent behavioral Verilog code, which has the same functionality as a digital circuit that is to be manufactured.
A synthesis tool is used to convert this behavioral code into structural code implementing the same functionality.
Structural Verilog consists of instantiated gates.
But from where does the synthesis tool get these gates?
Module Objectives
Identify the syntax of a timing library and describe how the numbers in the library are used for timing analysis
Create a constraint file based on timing specifications

Technology libraries
Constraints
  General-purpose and object-access constraints
  Timing constraints
  Environmental constraints
What Are Timing Libraries?
Every foundry has a list of gates with which it can build designs.
A list of such gates and cells is stored in a file generically called a library.
Cells are library representations of gates. You use a cell from a library to create a gate in your design.
One such file, which is used by a synthesis tool, is called a synthesis technology library.
For our presentation, we will refer to it as the library.
Different views (logical, physical, etc.) of the gates and cells are stored in different files. Together, these views are called a technology library.
Other library files also exist and contain information needed by the back-end tools (not discussed in this section).
What Are Timing Libraries? (continued)

The most common format used to write these libraries is the Liberty format, which uses a .lib extension.
A library file comprises not only a list of gates but also
  Their functional/logical definitions
  Power, energy, and timing characteristics
  Their physical characteristics such as area and footprint
If this library is given as an input to a synthesis tool along with the behavioral (RTL) code, the tool converts the code into an appropriate structural design.
Synthesis can replace cells with other cells of the same footprint without affecting logic function.
[Figure: RTL, SDC, and the library feed logic synthesis, which produces a synthesized gates netlist.]

What Is in a Library File?
A library file consists of two sections, header and body.
Library File
  Header
    General attributes
    Documentation attributes
    Unit attributes
    Operating conditions
    Threshold and default definitions
    Templates
    More attributes
    Voltage information
    Wire load definitions
  Body
    Cell name
    Physical description
    Pin information
    Power characteristics
    Timing characteristics
General, Documentation, and Unit Attributes

General attributes
  delay_model: Delay model used, lookup or calculated
  Other attributes not discussed
Documentation attributes
  revision: Revision number
  date: Date created
  comment: Any comments
Unit attributes
  time_unit: nano, pico, etc.
  voltage_unit: milli, micro, etc.
  current_unit: milli, micro, etc.
  pulling_resistance_unit: Ω, etc.
  leakage_power_unit: watt, etc.
  capacitive_load_unit: pico, femto, farad, etc.
Operating Conditions
Operating conditions are the conditions under which the chip will operate, including process, temperature, and voltage
  nom_process: 1, 2, etc.
  nom_temperature: 100, 120, etc.
  nom_voltage: 1, 0.9, etc.
operating_conditions
  process: 1, 2, etc.
  temperature: 100, 120, etc.
  voltage: 1, 0.9, etc.
  tree_type: balanced, etc.
Threshold and Default Definitions

Threshold definitions
  slew_lower_threshold_pct_rise: 10, 30, etc.
  slew_lower_threshold_pct_fall: 90, 70, etc.
  These indicate the points from where the slew should be calculated
  [Figure: a signal edge with the slew measurement points marked at 30% and 70%.]
Default definitions
  Contains attributes such as default_fanout_load, default_max_transition, etc.
  There are many more attributes that are not discussed

Templates
Templates: Different types of templates present are
  Power/energy template
  Timing template, etc.
A template shows how these characteristics would be described in the library
Let’s look at an example to understand better:
lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition;
  variable_2 : total_output_net_capacitance;
  index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
  index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
Example Template
Shown below is an example of a timing template. (Templates for other characteristics are similar and will not be discussed in this module.)
lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition; /* ex: 1, 2, 3, 4, 5, 6, 7 */
  variable_2 : total_output_net_capacitance; /* ex: 10, 20, 40, 80, 160, 320, 640 */
  index_1 ("100, 101, 102, 103, 104, 105, 106");
  index_2 ("100, 101, 102, 103, 104, 105, 106");
}
lu indicates that it is a lookup and not calculated.
7x7 indicates the size of the lookup to be 7 rows and 7 columns.
variable_1 indicates the factor for row indices.
variable_2 indicates the factor for column indices.
Example: Delay for an input_net_transition of 2 ps and total_output_net_capacitance of 80 pF is row 2 and column 4. From the table, we get this value to be 103 ps.

Voltage Information and Wire Load Definitions
There are more attributes defined in a library file, such as the pad attributes (I/O pads), which are not discussed here.
Voltage information
  Minimum, maximum, and other complementary MOS (CMOS) characteristics of the input and output voltages are described in this section.
In a digital circuit, not only the gates, but even wires have delays associated with them.
  They may be small compared to gate delay, but considering the amount of wiring in the latest chips, their delay can account for as much as 50%.
Example: Wire Load

A wire load or wire load model (WLM) is an estimate of the net delays in a netlist.
Many WLM choices are available in a timing library, and they are chosen based on the size of a design.
Example:
wire_load("wire_load_name") {
  resistance : 8.0e-8;
  capacitance : 1.2e-4;
  area : 0.7;
  slope : 66.667;
  fanout_length (200.0);
}
A custom wire load model (CWLM) is a user-generated model that can be used to more accurately estimate the net delays.
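As a rough illustration of how a tool might turn a wire load model into numbers, the Python sketch below estimates a net's length from its fanout and then scales the per-unit-length resistance and capacitance. The resistance, capacitance, and slope values come from the example above; the fanout_length entry (fanout 1, length 200.0) is an assumption, because the slide shows only the length, and the function names are hypothetical.

```python
# Values from the wire_load example on this slide.
RESISTANCE = 8.0e-8    # per unit length
CAPACITANCE = 1.2e-4   # per unit length
SLOPE = 66.667         # extra length per additional fanout beyond the table

# ASSUMPTION: the slide lists only "200.0"; fanout 1 is a guess for illustration.
FANOUT_LENGTH = [(1, 200.0)]

def wire_length(fanout: int) -> float:
    """Estimated net length: table lookup, then slope-based extrapolation."""
    table = dict(FANOUT_LENGTH)
    if fanout in table:
        return table[fanout]
    last_fanout, last_length = max(FANOUT_LENGTH)
    if fanout < last_fanout:
        raise ValueError("fanout below the table range is not modeled here")
    return last_length + SLOPE * (fanout - last_fanout)

def net_rc(fanout: int):
    """Estimated (resistance, capacitance) of a net with the given fanout."""
    length = wire_length(fanout)
    return (RESISTANCE * length, CAPACITANCE * length)

for fanout in (1, 2, 3):
    print(fanout, wire_length(fanout), net_rc(fanout))
```

The estimate depends only on fanout, which is why a wire load model is a pre-layout approximation rather than a measurement of real routing.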

Cell Name and Physical Description
Cell name: Indicates the name of the cell
Physical description
  cell_footprint: general name, for example and2, or2, etc.
  area: area of the cell, 20.8, 30.4, etc.
Example:
cell (ADDX1) {
  cell_footprint : add;
  area : 80.000;
Pin Information
Direction: Input/Output/Inout/Internal
Capacitance: Capacitance that is seen at this pin.
Output pins:
  Function: Value based on the inputs
  Example:
    Function: (in1 in2) → and gate
    Function: (in1 | in2) → or gate
Example:
pin(CI) {
  direction : input;
  capacitance : 0.004189;
}
pin(S) {
  direction : output;
  capacitance : 0.0;
  function : "(A ^ B ^ CI)";
}
Other characteristics of the pin include power and timing characteristics
Note: Power is not discussed in this module.
Timing Characteristics
The timing information is displayed for each output pin in relation to each input pin, in the form of a lookup table (if it is not calculated).
There are multiple lookup tables, one for each type of delay.
  Rise delay
  Rise transition
  Fall delay, etc.
It is displayed exactly as we saw in the timing template earlier, but in a little more detail.
Timing Characteristics Example

The two variables and the full 7x7 lookup table are displayed.
Example:
pin(Y) {
  direction : output;
  capacitance : 0.0;
  function : "(A B)";
  timing() {
    related_pin : "A";
    cell_rise(delay_template_7x7) {
      index_1 ("0.04, 0.07, 0.1, 0.2, 0.5, 1.0, 2");
      index_2 ("0.006, 0.030, 0.078, 0.174, 0.366, 0.749, 1.523");
      values ( \
        "0.07, 0.09, 0.13, 0.20, 0.35, 0.64, 1.23", \
        "0.08, 0.10, 0.13, 0.21, 0.35, 0.65, 1.24", \
        "0.09, 0.11, 0.15, 0.22, 0.37, 0.66, 1.25", \
        "0.11, 0.13, 0.17, 0.25, 0.39, 0.68, 1.28", \
        "0.14, 0.17, 0.20, 0.28, 0.42, 0.72, 1.31", \
        "0.18, 0.21, 0.25, 0.33, 0.47, 0.76, 1.35", \
        "0.23, 0.26, 0.31, 0.39, 0.54, 0.83, 1.42");
    }
  }
}

Timing Characteristics Example (continued)
index_1 represents input net transition, and index_2 represents total output net capacitance.
Depending on various values for these indexes, corresponding delay values are looked up.
Question: In the previous example, if the delay template was given as follows, what would the cell_rise be if the input_net_transition was 0.07 and the total_output_net_capacitance was 0.030?
lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition;
  variable_2 : total_output_net_capacitance;
  index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
  index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
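The lookup described on this slide can be worked through in Python using the cell_rise table from the previous example. Each quoted row of values corresponds to one index_1 entry (input net transition) and each column to one index_2 entry (total output net capacitance). The bilinear interpolation between grid points is an assumption added for illustration; the slide itself only describes looking values up, and the function names are hypothetical.

```python
from bisect import bisect_right

# Index and value data copied from the cell_rise example.
INDEX_1 = [0.04, 0.07, 0.1, 0.2, 0.5, 1.0, 2.0]               # input_net_transition
INDEX_2 = [0.006, 0.030, 0.078, 0.174, 0.366, 0.749, 1.523]   # total_output_net_capacitance
VALUES = [
    [0.07, 0.09, 0.13, 0.20, 0.35, 0.64, 1.23],
    [0.08, 0.10, 0.13, 0.21, 0.35, 0.65, 1.24],
    [0.09, 0.11, 0.15, 0.22, 0.37, 0.66, 1.25],
    [0.11, 0.13, 0.17, 0.25, 0.39, 0.68, 1.28],
    [0.14, 0.17, 0.20, 0.28, 0.42, 0.72, 1.31],
    [0.18, 0.21, 0.25, 0.33, 0.47, 0.76, 1.35],
    [0.23, 0.26, 0.31, 0.39, 0.54, 0.83, 1.42],
]

def _lower_index(axis, x):
    """Index i of the grid interval [axis[i], axis[i+1]] used for x."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def cell_rise(transition, capacitance):
    """Look up cell_rise, interpolating linearly between grid points."""
    i = _lower_index(INDEX_1, transition)
    j = _lower_index(INDEX_2, capacitance)
    t = (transition - INDEX_1[i]) / (INDEX_1[i + 1] - INDEX_1[i])
    u = (capacitance - INDEX_2[j]) / (INDEX_2[j + 1] - INDEX_2[j])
    row_i = VALUES[i][j] * (1 - u) + VALUES[i][j + 1] * u
    row_next = VALUES[i + 1][j] * (1 - u) + VALUES[i + 1][j + 1] * u
    return row_i * (1 - t) + row_next * t

# Transition 0.07 is the second index_1 entry and capacitance 0.030 is the
# second index_2 entry, so the result is the row 2, column 2 table value.
print(cell_rise(0.07, 0.030))
```

Because both inputs fall exactly on grid points, no interpolation is needed for the question above; the function simply returns the stored table entry.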
Library File: Summary

A library file is a file that contains basic information about cell functionality, timing, power, etc., for a given technology node.
One such file (the Liberty file), which is used by a synthesis tool, is called a synthesis technology library.
The library file is divided into two main parts:
  Header: Contains all the attributes and terminology used in the library
  Body: Contains characteristics of each cell that a foundry has for a specific technology
Synthesis tools use these files to generate structural Verilog files equivalent to behavioral (RTL) Verilog files given as inputs.
Next, we will see what else is given as inputs to a synthesis tool.

Discussion Question
Given the following circuit and .lib file examples, what is the path delay?
[Figure: circuit with two DFFX1 flip-flops and a BUFX1 buffer.]
cell (DFFX1) {
cell_footprint : dffx1;
area : 50.0;
pin(D) {
direction : input;
timing() {
related_pin : "CK";
timing_type : setup_rising;
rise_constraint(setup_template_3x3) {
index_1 ("0.05, 1.4, 4.5");
index_2 ("0.05, 1.4, 3.3");
values ( \
"0.156250, 0.070312, 0.113281", \
"0.246094, 0.140625, 0.175781", \
"0.203125, 0.093750, 0.128906");
}
pin(Q) {
direction : output;
timing() {
related_pin : "CK";
timing_type : rising_edge;
timing_sense : non_unate;
cell_rise(delay_template_7x7) {
index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.291957, 0.437181, 0.550916, 0.843878, 1.248819, 1.788431, 2.305442", \
"0.316264, 0.461499, 0.575227, 0.868187, 1.273127, 1.812741, 2.329752", \
"0.388358, 0.533648, 0.647351, 0.940318, 1.345271, 1.884899, 2.401920", \
"0.439033, 0.584292, 0.697982, 0.990937, 1.395897, 1.935540, 2.452571", \
"0.462183, 0.607445, 0.721146, 1.014067, 1.419031, 1.958683, 2.475723", \
"0.468653, 0.613990, 0.727660, 1.020554, 1.425521, 1.965184, 2.482228", \
"0.460997, 0.606314, 0.719968, 1.012831, 1.417787, 1.957454, 2.474507");
}

cell (BUFX1) {
cell_footprint : buf;
area : 13.0;
pin(A) {
direction : input;
}
pin(Y) {
direction : output;
function : "A";
internal_power() {
timing() {
related_pin : "A";
timing_sense : positive_unate;
cell_rise(delay_template_7x7) {
index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.094400, 0.235579, 0.351869, 0.653282, 1.070188, 1.625921, 2.158454", \
"0.116567, 0.257243, 0.373654, 0.675230, 1.092220, 1.647999, 2.180553", \
"0.156644, 0.301067, 0.417546, 0.719020, 1.136089, 1.691941, 2.224538", \
"0.165784, 0.318068, 0.434036, 0.735633, 1.152743, 1.708488, 2.241054", \
"0.149625, 0.311618, 0.428220, 0.729893, 1.147035, 1.702969, 2.235440", \
"0.117344, 0.289370, 0.407324, 0.710811, 1.128181, 1.684128, 2.216830", \
"0.067751, 0.250660, 0.370401, 0.676924, 1.096166, 1.652295, 2.184962");
}

lu_table_template(delay_template_7x7) {
variable_1 : input_net_transition;
variable_2 : total_output_net_capacitance;
index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}

lu_table_template(setup_template_3x3) {
variable_1 : constrained_pin_transition;
variable_2 : related_pin_transition;
index_1 ("1000, 1001, 1002");
index_2 ("1000, 1001, 1002");
}

Constraints
Timing constraints
Constraints
The rules that are written are referred to as constraints.
Constraints are essential to meet design goals in terms of area, timing, and power to obtain the best possible implementation of a circuit.
Constraints allow designers to control various aspects of synthesis.
Synthesis algorithms and heuristics are tuned to automatically find the optimal solution; however, sometimes they initially fail to reach the optimal result.
Defining Constraints
Every EDA tool has its own commands to define constraints for a design.
However, there is a common format, which is supported by almost all the EDA tools, to define the constraints.
This format is called the Synopsys Design Constraints (SDC) format.
The constraints are defined using special SDC commands.
The file is saved with an .sdc extension.
SDC Format
The SDC commands are divided into these broad categories:
  General-purpose commands
  Object-access commands
  Timing commands
  Environmental commands
There are more categories in SDC format, but we will be discussing only these for this course.

Constraints
Timing constraints
General-Purpose Commands
Following are general-purpose commands:
expr: Used to create simple expressions
  Syntax: expr arg1 arg2 arg3 … argn
  Example: expr 0.1 + 0.2 + 0.1
  The result is the sum of the three numbers, that is, 0.4.
set: Used to define variables
  Syntax: set variable_name value
  Example: set design design1
  The variable $design now contains the value design1. A list of values can also be defined as shown below (list items are separated by spaces):
  set list {item1 item2 item3 item4 … itemn}
There are other general-purpose commands that are not discussed here.
Object-Access Commands
These commands are used to get the location of an object in the design.
The object can be a cell, a block, a port, a pin, or anything else in the design.
The commands are as listed below, most of which are self-explanatory.
all_clocks: Returns a list of all clocks
all_inputs: Returns a list of all inputs within a clock domain
  Syntax: all_inputs -clock <clock_name>
all_outputs: Returns a list of all outputs within a clock domain
  Syntax: all_outputs -clock <clock_name>
get_cells: Searches for a cell with a particular naming pattern and returns its location if found
  Syntax: get_cells pattern

Object-Access Commands (continued)
get_clocks: Searches for a clock with a particular naming pattern and returns its location if found
  Syntax: get_clocks pattern
get_nets: Searches for a net (equivalent to a wire in the Verilog language) with a particular naming pattern and returns its location if found
  Syntax: get_nets pattern
get_pins: Searches for pins (ports of cells) with a particular naming pattern and returns their location if found
  Syntax: get_pins pattern
get_ports: Searches for a port (inputs/outputs to the design) with a particular naming pattern and returns its location if found
  Syntax: get_ports pattern

Constraints
Timing constraints
Timing Constraints
To model a clock in a design, use the create_clock and the create_generated_clock SDC commands.
Let’s look at a few definitions before we start modeling a clock.
Clock period is defined as the time difference between two consecutive rising or falling clock edges.
Duty cycle is defined as the ratio between the pulse duration (t) and the period (T) of a rectangular waveform.
[Figure: rectangular clock waveform showing rising and falling edges, pulse duration t, and clock period T. Duty cycle = t/T.]
create_clock
The create_clock command is used to model a clock waveform.
Syntax:
create_clock -period <period_value in nanoseconds> \
  -name <clock name> -waveform <edge list> source_objects
Example:
create_clock -name core_clock -period 10 \
  -waveform {4 10} [get_ports clock]
The clock waveform that would be modeled is as below:
[Figure: core_clock waveform with rising edges at 4, 14, 24, and 34 ns and falling edges at 10, 20, 30, and 40 ns. Pulse duration t = 6 ns, clock period T = 10 ns, duty cycle = t/T = 6/10 = 60%.]
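The arithmetic behind the create_clock example can be checked with a few lines of Python. The sketch takes the period (10) and the waveform edge list (rise at 4, fall at 10) from the example above; the function names duty_cycle and clock_edges are hypothetical helpers, not SDC commands.

```python
def duty_cycle(period, rise_edge, fall_edge):
    """Ratio of pulse duration t to period T for a rectangular clock."""
    pulse = (fall_edge - rise_edge) % period
    return pulse / period

def clock_edges(period, rise_edge, fall_edge, cycles):
    """(rise, fall) edge times for the first few clock cycles."""
    return [(rise_edge + n * period, fall_edge + n * period) for n in range(cycles)]

print(duty_cycle(10, 4, 10))      # pulse duration 6 over period 10
print(clock_edges(10, 4, 10, 4))  # edge times for four cycles
```

The edge times it lists match the tick marks in the waveform figure (4, 10, 14, 20, and so on).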
create_generated_clock
This command is used to generate the model of a clock from an existing clock model, such as from a PLL or dedicated clocking block.
Example:
core_clock is the base clock defined in the design below.
clock2 is derived by multiplying the core_clock by 2.
[Figure: an external clock port feeds PLL multiply-by-2 logic. core_clock is defined at the external clock port, and clock2 is defined at the output of the multiply-by-2 logic.]
create_generated_clock (continued)
If the definition of the core_clock changes, it is automatically reflected in the generated clock model.
Some of the arguments that can be passed to create_generated_clock are
  -name <clock_name>
  -source <master_pin>
  -divide_by <factor>
  -multiply_by <factor>
  -duty_cycle <percent>
  -invert
  -master_clock clock
  source_objects
Example:
create_generated_clock -name clock2 \
  -source [get_pins core_clock] -multiply_by 2 \
  -duty_cycle 50 -master_clock core_clock \
  [get_ports clock2]
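The effect of -multiply_by and -divide_by on the generated clock period can be sketched in Python. The 10 ns master period is taken from the earlier core_clock example; the function generated_period is a hypothetical helper used only to show the arithmetic.

```python
def generated_period(master_period, multiply_by=1, divide_by=1):
    """Period of a generated clock derived from a master clock.

    Multiplying the frequency by N divides the period by N;
    dividing the frequency by N multiplies the period by N.
    """
    return master_period * divide_by / multiply_by

clock2_period = generated_period(10.0, multiply_by=2)
clock2_high_time = clock2_period * 50 / 100  # -duty_cycle 50

print(clock2_period, clock2_high_time)
```

If core_clock were later redefined with a different period, rerunning the same calculation would give the new clock2 period, which mirrors how the generated clock tracks its master.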

create_generated_clock (continued)
The clock waveform that would be modeled by the command in the previous slide is as shown below.
[Figure: waveforms of core_clock and clock2.]
set_clock_transition
The create_clock command assumes an ideal clock with no rise and fall times.
To model some realistic values of rise and fall time, the set_clock_transition command is used.
Some of the arguments to this command are
set_clock_transition -rise -fall <transition> <clock_list>
Example:
set_clock_transition -rise 0.1 [get_clocks core_clock]
set_clock_transition -fall 0.1 [get_clocks core_clock]
The above two commands together model core_clock with a rise and fall time of 0.1 ns.
[Figure: an ideal clock edge compared with the same clock with clock transition applied, showing 0.1 ns rise and fall times.]
set_clock_uncertainty
A real clock is never perfect, and this must be accounted for in the clock model that is defined.
In SDC, this is achieved by the set_clock_uncertainty command.
Some of the arguments that can be given to it are
set_clock_uncertainty
  -from <from_clock>
  -to <to_clock>
  -setup
  -hold
  <uncertainty>
  <object_list>
Example:
set_clock_uncertainty 1.0 [get_ports clock2]
An uncertainty of 1.0 ns is set for clock2, as shown in the next slide.
set_clock_uncertainty (continued)
[Figure: clock2 waveform with the uncertainty window marked around the clock edge.]
The uncertainty value means that the clock edge can arrive up to 1 ns before or after the ideal clock edge.
Example:
set_clock_uncertainty -from core_clock -to clock2 3.0
This means that if there is a logical path that goes from the core_clock clock domain to the clock2 clock domain, the uncertainty for such paths is 3 ns.

set_disable_timing
This command marks a path to not be timed; that is, this path will be regarded as virtually nonexistent. Use this command as a last resort, when certain technology-specific cells or macros require it.
Care should be taken in using this command, because it renders not only the said path as not timed, but also any other path that passes through it.
set_disable_timing (continued)
When the path from in2 to out2, shown by the red arc, is disabled, all the other paths going through this arc are also disabled.
That is, by stating this one arc, two paths are disabled:
1. D1->Q1->in2->out2->D3->Q3 and
2. D2->in1->out1->in2->out2->D3->Q3
But if the intention was only to disable one path, say number 1 above, it should have been stated in a different way or using a different command, which we will see later.
Various arguments that can be used with this command are
set_disable_timing -from <from_pin_name> -to <to_pin_name> <cell_pin_list>
Example: set_disable_timing -from in2 -to out2
[Figure: flip-flops D1/Q1 and D2 feed combinational cells with pins sel, in1, out1, in2, and out2, which drive flip-flop D3/Q3. The arc from in2 to out2 is highlighted in red.]

set_false_path
This command is used to remove a particular path or set of paths from being timed; that is, it sets the path as false.
False path: A path that has no functional purpose, or a path that does not need to be timing constrained (for example, a path between two clock domains).
When a path is set as a false path, the synthesis tool only maps it to technology-specific gates.
The tool does not optimize or improve the timing of this path even if it does not meet timing.
Reasons for false path:
  Path is never exercised during circuit operation
  Path is only possible in special operation mode (test mode, etc.)
This command is different from set_disable_timing in the sense that
  Only the paths specified are set as false.
  Any other paths passing through a false path, but not sharing the same exact start-end pair, will not be affected.
set_false_path (continued)
Some of the arguments that can be passed to this command are
set_false_path -from <from_list> -to <to_list> -through <through_list>
The following command sets the path from F1 to F3 as false:
set_false_path -from [get_cells F1] -to [get_cells F3]
But if the intention is to set all the paths that pass through the red arc as false, this is how it can be done:
set_false_path -through [get_pins OR1/in2]
This will set both the paths from F1 and F2 to F3 as false.
[Figure: flip-flops F1 and F2 drive flip-flop F3 through cells M1, AND1, and OR1, with pins sel, in1, out1, in2, and out2. The arc from in2 to out2 is highlighted in red.]
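The difference between disabling a timing arc and declaring a false path can be illustrated by enumerating paths in a small graph. The Python sketch below uses a simplified, assumed topology in which both F1 and F2 reach F3 through the in2-to-out2 arc; the graph and the helper names are hypothetical and only mirror the two paths listed on the set_disable_timing slide.

```python
# ASSUMED topology: node -> fan-out nodes. Both start points reach F3
# through the arc in2 -> out2.
GRAPH = {
    "F1": ["in2"],
    "F2": ["in1"],
    "in1": ["out1"],
    "out1": ["in2"],
    "in2": ["out2"],
    "out2": ["F3"],
}

def all_paths(graph, start, end, prefix=()):
    """Return every path from start to end as a tuple of node names."""
    path = prefix + (start,)
    if start == end:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        found.extend(all_paths(graph, nxt, end, path))
    return found

def disable_arc(paths, arc):
    """Paths still timed after an arc is disabled (set_disable_timing style)."""
    return [p for p in paths if arc not in list(zip(p, p[1:]))]

def false_path(paths, start, end):
    """Paths still timed after one start/end pair is declared false."""
    return [p for p in paths if not (p[0] == start and p[-1] == end)]

PATHS = all_paths(GRAPH, "F1", "F3") + all_paths(GRAPH, "F2", "F3")
AFTER_DISABLE = disable_arc(PATHS, ("in2", "out2"))
AFTER_FALSE = false_path(PATHS, "F1", "F3")

print(len(PATHS), len(AFTER_DISABLE), len(AFTER_FALSE))
```

Disabling the arc removes both paths from timing, while the false path from F1 to F3 leaves the F2 path timed.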

I/O Delays
The I/O delays consist of input delay and output delay.
Input delay is the time it would take for the data to arrive at the input port of the design.
Output delay is the margin that the data going out of the output port should have.
It can be viewed as the input delay for the input port of another design that is connected to this output port.
The figure on the following slide illustrates this better.
I/O Delays (continued)

The clouds in the figure represent combinational logic.
The corresponding commands in the SDC format to model these delays are
  Input delay: set_input_delay
  Output delay: set_output_delay
[Figure: input delay and output delay shown against the clock period, with data arrival timing on the input side and data required timing on the output side.]

set_input_delay
Some of the arguments to set_input_delay are
set_input_delay
  -clock <clock_name>
  -max
  -min
  -add_delay
  <delay_value>
  <port_pin_list>
If an input delay has already been specified for a pin, then the -add_delay argument keeps the existing delay and adds the new delay as an additional constraint instead of overwriting it.
Example:
set_input_delay -max 3.0 [get_ports in1] \
  -clock [get_clocks core_clock]
This command assumes an input delay of 3.0 ns for the data coming in at the input port in1.
set_output_delay
Some of the arguments to set_output_delay are
set_output_delay
  -clock <clock_name>
  -max
  -min
  -add_delay
  <delay_value>
  <port_pin_list>
If an output delay has already been specified for a pin, then the -add_delay argument keeps the existing delay and adds the new delay as an additional constraint instead of overwriting it.
Example:
set_output_delay -max 3.0 [get_ports out1] \
  -clock [get_clocks core_clock]
This command assumes an output delay of 3.0 ns for the data going out of the output port out1.
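A quick way to see what these delays mean for the design is to compute how much of the clock period remains for logic inside the block. The Python sketch below assumes the 10 ns core_clock period from the create_clock example and deliberately ignores setup time, clock uncertainty, and clock skew; internal_budget is a hypothetical helper.

```python
def internal_budget(clock_period, external_delay):
    """Time left inside the design after the external delay is subtracted.

    Simplification: setup time, uncertainty, and skew are ignored.
    """
    return clock_period - external_delay

CLOCK_PERIOD = 10.0  # ns, from the core_clock example

# set_input_delay -max 3.0: data arrives 3.0 ns after the clock edge,
# so input-side logic has the remainder of the period.
print(internal_budget(CLOCK_PERIOD, 3.0))

# set_output_delay -max 3.0: 3.0 ns is reserved outside the design,
# so output-side logic has the remainder of the period.
print(internal_budget(CLOCK_PERIOD, 3.0))
```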

Logic Outside the Design
To get an accurate model, the virtual flops/cells that may exist outside your design, and the input transition from them, must be modeled as well.
[Figure: input delay and output delay shown against the clock period, with data arrival timing on the input side and data required timing on the output side.]
The same is true on the output side: the output load that is being driven must be modeled.
These modeling techniques fall under environmental modeling commands and will be covered after this section.
set_max_delay
max_delay is the maximum delay that a combinational path from the input port to the output port in the design must meet.
[Figure: a path from port In through the block Combinational 1 to port Out, with the max delay spanning the whole path.]

set_max_delay (continued)
Some of the arguments to the set_max_delay command are
set_max_delay
  -from <from_list>
  -to <to_list>
  -through <through_list>
  <delay_value>
Example:
set_max_delay 5.0 -from [get_ports IN] \
  -through [get_cells combinational1] \
  -to [get_ports OUT]
This command sets the maximum delay allowed through the combinational path shown in the previous figure to 5.0 ns.
set_multicycle_path
The figure below helps illustrate a multicycle path.
[Figure: two flip-flops (D1/Q1 and a second D/Q) with a clock waveform. Data is launched from D1 and captured at the first marked edge, not captured at the next edge, and captured again at the edge after that.]
The data in this example is captured every other clock cycle.
Specify in the SDC file that this particular path has a time period of two times that of the clock period.
This can be achieved by using the set_multicycle_path command.

set_multicycle_path (continued)
set_multicycle_path
  -start
  -end
  -from <from_list>
  -to <to_list>
  -through <through_list>
  <path_multiplier>
Example:
set_multicycle_path -from [get_pins D1] \
  -to [get_pins Q1] 2
This command sets the time period for this particular path as twice the clock period of that clock domain.
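The effect of the path multiplier can be expressed as simple arithmetic. This Python sketch assumes a 10 ns clock period, as in the earlier core_clock example, and ignores setup time and uncertainty; the function name allowed_path_time is hypothetical.

```python
def allowed_path_time(clock_period, path_multiplier=1):
    """Time available to a path that is captured every path_multiplier cycles."""
    return path_multiplier * clock_period

print(allowed_path_time(10.0))     # default single-cycle path
print(allowed_path_time(10.0, 2))  # path constrained with multiplier 2
```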

Constraints
Timing constraints
set_driving_cell
Remember the virtual logic outside the design when modeling I/O delays.
The set_driving_cell command specifies the cell type that is driving a particular input port.
[Figure: virtual logic outside the design driving the input port, with input delay and output delay shown against the clock period and data arrival timing.]
set_driving_cell (continued)
set_driving_cell
  -lib_cell <lib_cell_name>
  -library <lib_name>
  -pin <pin_name>
  -clock <clock_name>
  <port_list>
Example:
set_driving_cell -lib_cell AND2X \
  -library xyz_130nm -pin Y \
  -clock [get_clocks clk] [get_ports input1]
This command indicates that the output pin Y of an AND2X gate from the library xyz_130nm is connected to the input1 port in the clk clock domain.
set_input_transition
This command models the transition of the waveform at the input port.
Rise time: The time it takes for the waveform to rise from 5% to 95% of its final value.
Fall time: The time it takes for the waveform to fall from 95% to 5% of its base value.
[Figure: input transition waveform with the rise time and fall time marked.]
set_input_transition (continued)
set_input_transition
-rise
-fall
-clock <clock_name>
<transition>
<port_list>
Example: set_input_transition -rise 0.1 \
-clock [get_clocks "clk"] \
[get_ports "input1"]
set_input_transition -fall 0.2 \
-clock [get_clocks "clk"] \
[get_ports "input1"]
These commands model the waveform of input1 with a rise time of 0.1 ns
and a fall time of 0.2 ns.

set_load
This command models the load that an output port may see.
At the output, the concern is not which cell is being driven or which
transition it receives.
View this as the counterpart of the input port modeling that another
block, connected to this output port, would perform.
set_load -min
-max
<value>
<objects>
Example: set_load -max 20 [get_ports "output1"]
This command sets a maximum load of 20 fF on the output1 port
(assuming the library capacitance unit is fF).
set_case_analysis
In a large design, all of the logic in the design may not be active at the
same time.
Logic blocks may be activated based on the value of certain inputs.
This is done sometimes to save power.
Some designs themselves are configurable to perform different tasks
depending on certain input values.
[Figure: a two-bit enable EN[1:0] selects one of four blocks: Block1 (EN = 00), Block2 (EN = 01), Block3 (EN = 10), Block4 (EN = 11).]

set_case_analysis (continued)
The previous design is an example where different blocks are enabled
by an external enable signal.
To get accurate timing and power numbers for the entire design, it
should be timed with one block enabled at a time, because that is
essentially how the design would actually behave.
This can be achieved by setting the enable pins to constant values for
timing with the set_case_analysis command.
The arguments that can be passed to it are
set_case_analysis
<value (0 or 1)>
<port_or_pin_list>
Example: set_case_analysis 0 [get_ports "EN[0]"]
set_case_analysis 0 [get_ports "EN[1]"]
These commands set the value of EN[1:0] to binary 00 during timing analysis.
set_max_fanout
Fanout indicates the number of cells being driven by one cell.
If this number is very large, the synthesis tool increases the size of the
driving cell so that it can drive the large load.
A bigger cell means more area, and sometimes it is desirable to restrict
the size of cells.
This can be achieved by specifying the maximum number of loads a cell
can drive, and this is exactly what this command models.
The arguments to this command are
set_max_fanout <value>
<object_list>
Example: set_max_fanout 16 TOP_LEVEL
This command sets a limit of 16 on the number of loads for all the cells
in the design TOP_LEVEL.
[Figure: one cell driving a fanout of 16.]
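The kind of check implied by a fanout limit can be sketched as follows. This is a minimal Python illustration, not how any synthesis tool stores a netlist: the dictionary layout and the instance names (u_buf, u_and, u_reg...) are hypothetical.

```python
def fanout_violations(loads_by_driver, max_fanout):
    """Return {driver_pin: fanout} for every driver whose load count exceeds the limit."""
    return {
        driver: len(loads)
        for driver, loads in loads_by_driver.items()
        if len(loads) > max_fanout
    }

# Hypothetical netlist: one buffer drives 20 flop inputs, one AND gate drives 2.
netlist = {
    "u_buf/Y": [f"u_reg{i}/D" for i in range(20)],
    "u_and/Y": ["u_reg_a/D", "u_reg_b/D"],
}
violations = fanout_violations(netlist, max_fanout=16)
print(violations)
```

A tool that finds such a violation typically resolves it by buffering or duplicating the driver rather than simply reporting it.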

Summary
To make a good design, the technology choice and the constraints are
as important as the design itself.
Constraints guide the synthesis tool and tell it how to handle different
aspects of the design.
The libraries provide the synthesis tool with the building blocks for the
design itself.

Testing Your Understanding
1. Is it possible to give more than one generic technology library as input
to a synthesis tool? What would the outcome be?
A. No, it is not possible.
B. Yes, it is possible and will result in a better design.
C. Yes, it is possible, but will result in a worse design.
2. Write the SDC command to model a clock that has a frequency of 100
MHz. The port name is clk.
3. Write the SDC command to model rise and fall times of 100 ps for the
above mentioned clock.
Testing Your Understanding (continued)
4. Which of the following constitutes uncertainty?
A. Clock skew
B. Clock jitter
C. Wire load assumptions
D. Margin
E. All of the above
Learning Activity
Interpret the specifications for a given design
Create an SDC file based on the specifications given
Present your results to the class
Synthesis
Module 4
June 16, 2008
Which Method Would You Use …

… to design the logic circuitry for a one-million gate design?
[Figure: bar chart comparing design time in months for paper and pencil, schematic capture, and logic synthesis; the axis runs from .1 to 12.]
Logic synthesis has dramatically reduced the ASIC design cycle. You will
learn why in this module.

Module Objectives
Explain the optimization stages of the synthesis flow
Interpret the results in a timing report
What is logic synthesis?
What are the inputs and outputs to and from logic synthesis?
Logic synthesis
Introduction
Reading HDL source files
Elaborating design
Technology-independent (generic) mapping
Technology transformation
Technology-dependent optimizations
Scan chain insertion
Timing report analysis
Running logic synthesis
Physical synthesis
Fundamentals
Basic operation and flow
What Is Logic Synthesis?

Definition: The process of parsing, translating, optimizing, and mapping RTL
code into a specified standard cell library
Example: To determine the feasibility of the design, we need to synthesize the
RTL code into gates, and measure timing, power, and area.
[Figure: design flow diagram. Labels include Specification, Designer, Microarchitecture, RTL, Logic Synthesis, Synthesized Netlist Gates, Floorplanning, Placement, Physical Synthesis, Scan Reorder, Design Optimization (Pre-CTS, Post-CTS, Post-Route), Delay Calculation, Signal Integrity, Extraction, CTS, Route, Place/Route, Detail Routed Design, and GDSII.]
Logic Synthesis: Input and Output, Format
Input
RTL in the Verilog® language or other HDL
Timing libraries in Liberty (.lib) format
Output
Gate-level netlist in the Verilog language or other HDL
[Figure: RTL, SDC, and Library feed Logic Synthesis, which produces Synthesized Gates.]
Logic Synthesis Goals

Minimize area
In terms of cell count and cell size
Minimize power
In terms of switching activity in individual gates, deactivated circuit blocks
In terms of leakage power
Maximize performance
In terms of maximum clock frequency of synchronous systems, throughput for asynchronous
systems
Quickly produce accurate functional models
Gate-level model is functionally equivalent to RTL model
Gate-level model is produced in less time than is required by an experienced logic designer to
create the same model
Produce predictable and accurate results
Timing, area, and power consumption calculations should correspond with actual values
measured on physical device once manufactured.

Logic Synthesis Phases and Commands
Synthesis Phase                        Description                                      Command
Read RTL source files                  Parse source code, check syntax                  read_hdl
Elaboration                            Build data structures and infer registers        elaborate
Technology-independent mapping         Optimize data structures                         synthesize -to_generic
Technology transformation (mapping)    Map to specific technology gates                 synthesize -to_mapped
Technology-dependent optimization      Use optimized gates in the technology library    retime (optional)
Scan chain insertion                   Build the scan chain                             synthesize -to_mapped -incremental
Timing report analysis                 Create timing reports                            report_timing


Reading RTL Source Files
Reading the RTL source files performs two functions:
Source files undergo a lint check (syntax and structure check).
If the source files pass the lint check, they are loaded into memory for use
in the next phase.
Example
rc:/> read_hdl -v2001 my_design.v
Log Entries for Reading RTL

The read_hdl command loads my_design.v:
rc:/> read_hdl -v2001 my_design.v
Reading Verilog file 'my_design.v'
assign #1875 write_clk_int = ~clk;
|
Warning : Ignoring delay specifier. [VLOGPT-35]
: in file '/my_design.v' on line 373, column 14.
: A delay specifier, either in an assignment or as a separate statement, is
not synthesizable.
assign #1875 postamble_clk_int = ~clk;
The linting process has detected a problem in my_design.v.
Details of the problem are listed. In this case, my_design.v includes a Verilog
construct that is not synthesizable (the Verilog # delay construct).

Elaborating Design
Builds data structures and infers registers in the design
Function expansion (e.g., functions are in-line expanded)
Constant propagation
Detect operands driven by constant values and pre-compute the output
Original code
a = 0;
b = a + 1;
c = 2 * a;
Optimized code
b = 1;
c = 0;
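Constant propagation can be sketched in Python on the three statements above. The statement encoding (target, left operand, operator, right operand) and the function name are hypothetical and chosen only for this illustration; an elaborator works on its own internal data structures.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def propagate_constants(statements):
    """Fold every statement whose operands are all known constants.

    statements: list of (target, left, op, right); op and right are None
    for a plain assignment. Operands are ints or variable names.
    """
    known = {}
    folded = []
    for target, left, op, right in statements:
        left = known.get(left, left)      # substitute constants found so far
        right = known.get(right, right)
        if op is None and isinstance(left, int):
            known[target] = left
            folded.append((target, left))
        elif op is not None and isinstance(left, int) and isinstance(right, int):
            known[target] = OPS[op](left, right)
            folded.append((target, known[target]))
        else:
            known.pop(target, None)       # value is no longer a known constant
            folded.append((target, left, op, right))
    return folded

program = [("a", 0, None, None), ("b", "a", "+", 1), ("c", 2, "*", "a")]
folded = propagate_constants(program)
print(folded)
```

The sketch keeps the assignment to a; dropping it once nothing reads it is a separate step (dead code removal).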

Elaborating Design (continued)
Loop unrolling
for loops are replaced by as many instances of the loop body as the loop
would have iterated. This allows for greatest possible optimizations later.
Original loop
for (a=2; a >= 0; a = a -1)
z[a] = x[a] + y[2-a];
Unrolled loop
z[2] = x[2] + y[0]
z[1] = x[1] + y[1]
z[0] = x[0] + y[2]
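The unrolling of the loop above can be reproduced with a short Python sketch. The helper name and the string representation of statements are hypothetical; the point is only that each iteration becomes one copy of the body with the loop variable replaced by its value, so for a = 0 the index 2 - a evaluates to 2.

```python
def unroll(start, stop, step, body):
    """Expand 'for (a = start; a >= stop; a = a + step)' into one statement per iteration."""
    statements = []
    a = start
    while a >= stop:
        statements.append(body(a))
        a += step
    return statements

unrolled = unroll(2, 0, -1, lambda a: f"z[{a}] = x[{a}] + y[{2 - a}]")
for statement in unrolled:
    print(statement)
```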
Elaborating Design (continued)

Dead code removal
Dead code consists of operations that cannot be reached, or whose result
is never referenced elsewhere. Such operations are detected and
removed.
Original code
a = x
b = a + 1
c = 2 * a
Optimized code
b = x + 1
c = 2 * x
Dead code elimination removed
a = x
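A minimal Python sketch of the removal step follows. It assumes the value of a has already been forwarded into b and c (so both read x directly), and it assumes b and c are the only results used later; the statement encoding and function name are hypothetical.

```python
def remove_dead_code(statements, live_outputs):
    """Keep only statements whose target is needed by a live output.

    statements: list of (target, [variables read]) in program order.
    """
    live = set(live_outputs)
    kept = []
    for target, reads in reversed(statements):
        if target in live:
            kept.append((target, reads))
            live.discard(target)
            live.update(reads)    # whatever this statement reads is now needed
    kept.reverse()
    return kept

# After forwarding 'a = x' into its uses, nothing reads 'a' any more.
program = [("a", ["x"]), ("b", ["x"]), ("c", ["x"])]
kept = remove_dead_code(program, live_outputs={"b", "c"})
print(kept)
```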

Log Entries for Elaboration
The elaborate command builds my_design:
rc:/> elaborate my_design
Beginning of elaboration section:
Elaborating block my_design from file 'my_design.v'.
Warning : Removing unused register. [CDFG-508]
: Removing unused register 'doing_wr_r' in module 'my_design' in
file 'my_design.v' on line 155.
Info : Unused module input port. [CDFG-500]
: Input port 'p_clk' is not used in module 'my_design' in file
'my_design.v' on line 90.
End of elaboration section:
Done elaborating 'my_design'.


Technology-Independent Mapping
A design is technology-independent when the formula (function,
system) has no connection with the building blocks in the
implementation.
Technology-independent mapping and optimization techniques:
Carry save arithmetic optimization
Logic pruning
Resource sharing
Speculation
Implementation selection
Arithmetic optimization
Common sub-expression sharing
Logic speculation
Carry-Save Arithmetic Operations

Carry-save arithmetic (CSA) operations are functionally equivalent to their
carry-propagate counterparts.
The carry logic for the intermediate sums is saved until the very end, thus
saving area and possibly improving timing.
[Figure: two adder structures computing z from inputs a through f, one labeled Carry-propagate and one labeled Carry-save.]
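The idea of deferring carry propagation can be sketched in Python with a 3:2 compressor. This is a behavioral illustration on non-negative integers, not a description of any tool's internal datapath; the function names and the operand values are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word.

    No carry ripples between bit positions; a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word

def csa_sum(operands):
    """Add non-negative integers with carry-save reduction and one final carry-propagate add."""
    operands = list(operands)
    while len(operands) > 2:
        a, b, c = operands.pop(), operands.pop(), operands.pop()
        operands.extend(carry_save_add(a, b, c))
    return sum(operands)    # the only carry-propagate addition

values = [13, 7, 21, 4, 9, 30]   # stand-ins for inputs a through f
total = csa_sum(values)
print(total)
```

Each compressor stage has the delay of a single full adder regardless of word width, which is where the possible timing benefit comes from.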
Pruning Logic Driving Unused Pins
By default, logic that drives unused (unloaded) hierarchical pins is optimized
away.
In the example below, instances in red are deleted because they do not
transitively drive an output port.
[Figure: example netlist with inputs in1 through in4, flip-flops, a constant 1'b1, and output out1.]
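The test "does not transitively drive an output port" amounts to a reachability search. The following Python sketch shows one way to express it, assuming a hypothetical connectivity dictionary and instance names; it is not the algorithm of any particular tool.

```python
def prune_unloaded(drives, outputs):
    """Return the instances that do not transitively drive any output port.

    drives: {instance: [instances or ports it drives]}
    outputs: set of output port names
    """
    def reaches_output(node, seen):
        if node in outputs:
            return True
        if node in seen:            # already explored (also guards against loops)
            return False
        seen.add(node)
        return any(reaches_output(nxt, seen) for nxt in drives.get(node, []))

    return {inst for inst in drives if not reaches_output(inst, set())}

# Hypothetical connectivity: u1 -> u2 -> out1, while u3 -> u4 drives nothing.
connectivity = {"u1": ["u2"], "u2": ["out1"], "u3": ["u4"], "u4": []}
pruned = prune_unloaded(connectivity, outputs={"out1"})
print(sorted(pruned))
```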
Resource Sharing
A resource is any computational element, such as an add, shift, or "if/then" operation.
Each type of operator in the RTL description requires a unique resource type.
For instance, the + operator requires an adder and the > operator requires a comparator.
The maximum number of resources required for each operator type is the number of
times an operator is used in the RTL description.
Resources can be reduced, thus saving area, using the following techniques:
Some operators can be mapped to a common resource type. For instance, + and -
operators can be mapped to an add-subtract unit.
Operators in different clock cycles can share the same resource. This is determined by
analyzing if there are any data flow or control flow conflicts (discussed later).
Given the following HDL description:
if (select)
sum <= A + B;
else
sum <= C + D;
One possible implementation:
[Figure: two adders compute A + B and C + D; a MUX controlled by select drives sum.]
Another, more efficient implementation:
[Figure: two MUXes controlled by select choose between A and C and between B and D; a single adder drives sum.]
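The two implementations of the if/else example are functionally identical, which a short Python sketch can confirm by exhaustive comparison over a small range. The function names are hypothetical, and the model is purely behavioral.

```python
from itertools import product

def two_adders_then_mux(select, a, b, c, d):
    """Unshared form: both sums are computed, then a MUX picks one."""
    return (a + b) if select else (c + d)

def mux_then_one_adder(select, a, b, c, d):
    """Shared form: two MUXes pick the operands, then a single adder is used."""
    left = a if select else c
    right = b if select else d
    return left + right

equivalent = all(
    two_adders_then_mux(s, a, b, c, d) == mux_then_one_adder(s, a, b, c, d)
    for s, a, b, c, d in product([0, 1], range(4), range(4), range(4), range(4))
)
print(equivalent)
```

The shared form trades one adder for one extra MUX, which is the area saving; the MUX now sits in front of the adder, which is the timing cost.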

Sharing and Speculation
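The sharing and un-sharing (speculation) of resources trades off area versus
timing during logic synthesis.
if (Q = '0')
x = a + b;
else
y = c + d;
[Figure: Speculation: two adders compute A + B and C + D, and a MUX controlled by Q selects between results X and Y. Resource Sharing: MUXes controlled by Q select between A and C and between B and D, and a single adder produces the result.]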
Implementation Selection: ChipWare

Some advanced synthesis tools come with libraries of reusable designs.
Cadence Encounter® RTL Compiler (RC) has such a library, known as ChipWare.
The ChipWare (CW) library includes
Common combinational and sequential components
Arithmetic components (adders, subtractors, multipliers)
Memory components (flip flops, FIFOs)
Logic synthesis searches for operators in the RTL files it reads and automatically
maps those operators to CW components, if available.
CW components often have multiple architectural implementations that allow
logic synthesis to pick one according to design need.
[Figure: the HDL operator in the RTL file statement Z <= X + Y is bound through the operator definition add_op to ChipWare library components (ADD_SUB, ADD, ALU), which have implementations such as ripple, CLA, and proprietary.]

Implementation Selection: Architecture Tradeoff
Different implementations of the ChipWare components have different area
and timing characteristics.
Design constraints determine the appropriate ChipWare component.
[Figure: for the HDL operator + in Z <= A*B + C, the adder implementations range from fastest to smallest: Brent-Kung, Carry Look-Forward, Carry Look-Ahead, Ripple Carry.]
Arithmetic Optimization
SUM <= A + B + C + D
[Figure: three adder arrangements for SUM with inputs A, B, C, and D: Initial Order; Optimized For Speed (all inputs have equal delay); Optimized For Speed (input A is late arriving).]
Note: Operators cannot be rearranged if the initial order is overridden by the use of
parentheses in the HDL.
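Why the adder arrangement matters can be sketched with simple arrival-time arithmetic in Python. The structures modeled here (a left-to-right chain and a balanced tree) are the usual textbook arrangements and are an assumption about what the figure shows; the unit adder delay and the arrival times are hypothetical.

```python
def chain_arrival(arrivals, adder_delay):
    """Output arrival time of ((i0 + i1) + i2) + ... for inputs in the given order."""
    t = arrivals[0]
    for nxt in arrivals[1:]:
        t = max(t, nxt) + adder_delay
    return t

def balanced_arrival(arrivals, adder_delay):
    """Output arrival time of the balanced tree (i0 + i1) + (i2 + i3)."""
    a, b, c, d = arrivals
    return max(max(a, b), max(c, d)) + 2 * adder_delay

equal = [0, 0, 0, 0]          # A, B, C, D all arrive at time 0
late_a_first = [3, 0, 0, 0]   # A arrives at time 3 and enters the first adder
late_a_last = [0, 0, 0, 3]    # same inputs reordered so that A enters the last adder

print(chain_arrival(equal, 1), balanced_arrival(equal, 1))
print(chain_arrival(late_a_first, 1), balanced_arrival(late_a_first, 1), chain_arrival(late_a_last, 1))
```

With equal arrivals the balanced tree is fastest; with one late input, feeding that input into the last adder of a chain gives the earliest output.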

Common Sub-Expression Sharing
Consider the assignments:
SUM1 <= A + B + C
SUM2 <= A + B + D
SUM3 <= B + A + E
The "A+B" sub-expression could be shared, thus saving two adders in the process.
The order within the sub-expressions is not important, but the position must be the
same.
[Figure: without sharing, six adders compute SUM1, SUM2, and SUM3 from (A, B, C), (A, B, D), and (B, A, E). With sharing of sub-expressions, four adders compute the same sums from A, B, C, D, and E.]
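The adder count for the three assignments can be checked with a small Python sketch. It models only the rule stated above (the leading pair may be shared when it contains the same operands in any order); the function name and the list encoding of the expressions are hypothetical.

```python
def count_adders(expressions, share=True):
    """Count two-input adders for left-to-right sums.

    expressions: list of operand lists, e.g. ["A", "B", "C"] for A + B + C.
    With share=True, a leading operand pair seen before is reused.
    """
    seen_pairs = set()
    adders = 0
    for operands in expressions:
        adders += len(operands) - 1
        if share and len(operands) >= 2:
            pair = frozenset(operands[:2])   # order within the pair is ignored
            if pair in seen_pairs:
                adders -= 1                  # reuse the existing A+B adder
            else:
                seen_pairs.add(pair)
    return adders

sums = [["A", "B", "C"], ["A", "B", "D"], ["B", "A", "E"]]
unshared = count_adders(sums, share=False)
shared = count_adders(sums, share=True)
print(unshared, shared)
```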
Commands for Technology-Independent Mapping

In this stage, logic synthesis performs technology-independent
optimizations, including
Constant propagation
Resource sharing
Logic speculation
Multiplexor optimization
Carry-save arithmetic optimization
You can run this stage separately by using the following command:
synthesize -to_generic -effort <effort_level>
Log Entries for Technology Independent Mapping
Starts technology-independent optimization process:
rc:/> synthesize -to_generic
Logic pruning:
Deleting 2 sequential instances. They do not transitively
drive any primary outputs:
vpb/vpo/luma_sel_a1_reg[0], vpb/vpo/luma_sel_reg[0] (floating root)
Implementation selection:
Info : An implementation was inferred. [CWD-19]
: The implementation
'/hdl_libraries/GB/components/increment/implementations/very_fast' was
inferred through the binding 'b1' for the call to synthetic operator
'INCREMENT_CI_OP'.
Mux optimization:
Optimizing muxes in design 'my_design'
End of technology-independent optimization:
Synthesis succeeded.


Technology Transformation (Mapping)
Technology transformation or "technology mapping" is the phase of
logic synthesis when gates are selected from a technology library to
implement the circuit.
Technology mapping is normally done after technology-independent
optimization.
Why technology mapping?
Straight implementation may not be good. For example, F = abcdef as a
six-input AND gate may cause a long delay.
Gates in the library are pre-designed; they are usually optimized in terms
of area, delay, power, etc.
Fastest gates along the critical path, area-efficient gates (combination)
off the critical path.
Technology Mapping Stages

Target setting
Target timing goals (clock period) for each class and group of timing paths
are derived from the fastest arrival time.
Global mapping
Optimizes for area, timing, power, and maps the design while aiming for
the target clock frequency
Remapping
Evaluates every cell in the design and resizes as needed to improve area
and power consumption
Incremental optimization
Runs Design Rule Checks (DRCs), timing and area cleanup, and critical
region resynthesis (CRR) for timing optimization

Synthesis Stages: Target Setting
In this first phase of global mapping, logic synthesis performs tentative structuring
and computes the estimated arrival and required times for all the endpoints based
on the effort level you set. The result of this stage is the target for each cost group.
synthesize -to_mapped -effort <effort_level>
Starts technology-dependent optimization process:
rc:/> synthesize -to_mapped
Technology mapping:
Mapping my_design to gates.
Mapping 'my_design'...
Preparing the circuit
Target setting:
Structuring (delay-based) logic partition in alu_32...
Performing redundancy-removal...
Performing bdd-opto...
Performing redundancy-removal...
Done structuring (delay-based) logic partition in alu_32
End of target setting
Synthesis Stages: Global Mapping

In this second phase of global mapping, RC restructures paths and computes
delays based on the targets and the effort level you set. The goal of this phase is to
meet the target timing.
synthesize -to_mapped -effort <effort_level>
The following log entries indicate the beginning of global mapping:
Optimizing component cb_seq...
Restructuring (delay-based) cb_part_4...
Optimizing component cb_part_4...
Done restructuring (delay-based) cb_part_4
Restructuring (delay-based) cb_oseq_3...
Done restructuring (delay-based) cb_oseq_3
Optimizing component cb_oseq_3...
Restructuring (delay-based) cb_part...
Done restructuring (delay-based) cb_part
Optimizing component cb_part...

Synthesis Stages: Remapping
Several optimization routines are used during this stage of synthesis,
mainly to reduce the area of the design.
Global mapping status
=====================
                          Group
                          Total
                Total     Worst
Operation        Area    Slacks  Worst Path
-------------------------------------------------------------------------------
global_map     721782      -308  VIT_ACS10/NEW_reg[5]/CP --> VIT_ACS26/NEW_reg[1]/D
fine_map       514143      -372  VIT_ACS10/NEW_reg[6]/CP --> VIT_ACS8/NEW_reg[2]/D
area_map       512565      -344  VIT_ACS23/NEW_reg[4]/CP --> VIT_ACS31/NEW_reg[7]/D
area_map       498515      -345  VIT_ACS1/NEW_reg[5]/CP --> CS4/SELECT_REG_reg/D
Done mapping dtmf_chip
[Callout: Indicates the beginning of remapping]
Synthesis Stages: Incremental Synthesis

Incremental synthesis iterates on paths with a mix of strategies to improve timing,
area, etc.
Incremental optimization status
===============================
                          Group
                          Total     Total  - - - - DRC Totals - - -
                Total     Worst       Neg     Max     Max     Max
Operation        Area    Slacks     Slack   Trans     Cap  Fanout
-------------------------------------------------------------------
init_delay     498515      -345   -124671     414      18     229
Path: VIT_ACS1/NEW_reg[5]/CP --> VIT_ACS4/THREE_SELECT_REG_reg/D
incr_delay     502638      -301   -114125     129      69     276
Path: VIT_ACS16/NEW_reg[2]/CP --> VIT_ACS2/NEW_reg[6]/D
incr_delay     511982      -267   -100144       0      19     614
Path: VIT_ACS19/NEW_reg[6]/CP --> VIT_ACS13/NEW_reg[1]/D
incr_delay     515304      -221    -91064       0      34     614
VIT_ACS10/NEW_reg[0]/D
[Callout: Indicates the beginning of incremental synthesis]

Synthesis Stages: Incremental Synthesis (continued)
This report shows the localized algorithms (tricks) used in incremental
optimization, the corresponding number of attempts made by the synthesis
engine, and the number of times that the routine has been run to improve
the design goals.
Trick          Calls     Accepts   Attempts      Time
-------------------------------------------------------
crr_rsyn         389  (     215 /      300 )    79917
crr_glob          25  (     198 /      215 )     5324
crit_upsz       4746  (    2047 /     2117 )    31691
fopt             358  (       0 /        0 )       23
crit_dnsz        428  (      23 /       25 )     4970
dup              347  (       1 /        1 )      250
fopt            1076  (     261 /      336 )    22515
setup_dn         398  (      11 /       14 )      324
exp               25  (      23 /       64 )     3214
init_drc 522875 -235 -10660 0 31 537
VIT_ACS11/NEW_reg[0]/D
[Callout: Run time for each of these tricks must be small.]
[Callout: DRC fixing is done at the end of each pass.]


Technology-Dependent Optimization
A design is technology-dependent if the formula (function, circuit,
system) is implemented by one or more logic gates in a pre-designed
set of gates (called technology library or cell library).
Advantage: Gates in the cell library have a highly optimized,
pre-defined path to silicon, so that the area and delay parameters are
known and accurate.
Technology-Dependent Optimization Types

Boundary optimization
Register re-timing
Controlling Boundary Optimization
Examines input and output pin characteristics of a sub-design to try and optimize a
mapped netlist
Removes any gate that drives output ports that are not connected outside a
design
Considers swapping or merging of input ports to minimize logic.
Propagates constant values across hierarchical boundaries and eliminates
unnecessary logic.
[Figure: a constant a=0 crosses a hierarchical boundary into clocked logic blocks L1 and L2. Caption: With a being 0, the blocks L1 and L2 are equivalent and therefore optimized.]
Retiming
Retiming optimizes the register locations in the design to improve the results without
changing the combinational logic or latency through the chip or block. Use the
following attributes to control retiming on the design and sub-designs:
[Figure: before retiming, two stages of 6 ns and 4 ns against a required clock of 5 ns give WNS: -1 ns. After repositioning or combining flops, both stages are 5 ns against the same required clock and WNS: 0 ns.]
Retime for Delay
retime -min_delay
Retime for Area
retime -min_area
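The worst negative slack (WNS) values in the retiming figure follow from simple arithmetic, sketched here in Python. The function name is hypothetical, and the model ignores setup time, clock-to-output delay, and clock skew.

```python
def worst_negative_slack(stage_delays_ns, clock_period_ns):
    """Worst slack over all register-to-register stages; negative means a violation."""
    return min(clock_period_ns - delay for delay in stage_delays_ns)

before = worst_negative_slack([6, 4], clock_period_ns=5)   # unbalanced stages
after = worst_negative_slack([5, 5], clock_period_ns=5)    # flops repositioned
print(before, after)
```

The total combinational delay (10 ns) is unchanged; retiming only moves the register so that neither stage exceeds the clock period.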
Test Synthesis
Manufacturing defects in ASICs are detected using automated test
equipment (ATE), which sends special bit patterns known as test
vectors into the inputs of the ASIC and compares the output to
expected values. Any difference could mean the ASIC is not
functioning properly.
Improve testability by making every register in the design look like a
"virtual I/O."
Allows every flip-flop to be independently controlled and observed.
Allows every flip-flop to act like a combinational logic input.
Allows every flip-flop to act like a combinational logic output.
Test Synthesis (continued)
Test synthesis is the modification of a chip design to make both the
chip and the PC board system containing it more testable.
Coupled with this testability is the automatic test-pattern generation
(ATPG) of test vectors. Design for test (DFT) lets you modify a design
to make a circuit more testable. Test synthesis tools can assist in both
places.
The use of test synthesis for DFT techniques and ATPG reduces the time
to generate manufacturing test vectors from months to days.
Use the DFT features of RTL Compiler to improve your ability to
control and observe internal signal nodes. After RTL and logic
synthesis, test synthesis can perform full or partial internal-scan cell
insertion and boundary scan. An ASIC vendor often implements
special cells in the ASIC library to handle these tasks.
Test Synthesis (continued)

The use of internal scan cells enables ATPG tools to easily generate
nearly 100% fault coverage on the combinatorial logic.
Internal scan replaces latches and flip-flops with their scan-equivalent
latches and flip-flops. Each scan cell has a scan-data input (SDI), a
scan-data output (SDO), and a test-enable (TE) input. The tool
connects groups of these cells in chains of equal or similar length.
[Figure: combinational logic to be tested, with data_in, data_out, and clock.]
Muxed Scan
To convert a traditional flip-flop to a muxed-scan flip-flop, simply add a
multiplexer on the data input to the flop.
The shift_enable signal selects the normal functional data input or a new scan
data input to the flip-flop.
Scan inputs are chained to the output of other flip-flops.
The same clocks are used for both scan and functional operations.
[Figure: Scan-DFF built from a 2:1 multiplexer (scan input SI, select SE) in front of a DFF with outputs Q and QB.]
Muxed-Scan Hookup
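Add scan chains
Connect shift_enable
[Figure: scan flip-flops (pins D, SI, SE, CK, Q, QB) with the Q output of each flop connected to the SI input of the next, forming a chain from SI to SO; SE is connected to every flop.]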
Muxed-Scan Shift Cycle
Sequence: SE to active state, pulse clock “n” times to scan in/out data
[Figure: the scan chain from SI to SO during the shift cycle.]
Muxed-Scan Capture Cycle

Sequence: SE to inactive state, pulse the clock once to capture data in the registers
[Figure: the scan chain from SI to SO during the capture cycle.]
Muxed-Scan Shift Cycle
Sequence: SE to active state, pulse clock “n” times to scan in/out data
[Figure: the scan chain from SI to SO during the shift cycle.]
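The shift, capture, shift sequence can be sketched behaviorally in Python. This is a simplified model under stated assumptions: a single chain of four flops, an active-high shift enable, and hypothetical pattern and capture values; it does not model any particular library cell.

```python
def clock_pulse(state, scan_in, functional_inputs, shift_enable):
    """Apply one clock pulse to a muxed-scan chain.

    state[i] is the Q value of flop i; flop 0 is nearest the scan input (SI)
    and the last flop drives the scan output (SO). Returns (new_state, scan_out),
    where scan_out is the SO value seen before the clock edge.
    """
    scan_out = state[-1]
    si_values = [scan_in] + state[:-1]          # each SI is the previous flop's Q
    new_state = [si if shift_enable else d      # the 2:1 mux in front of each flop
                 for si, d in zip(si_values, functional_inputs)]
    return new_state, scan_out

state = [0, 0, 0, 0]
idle = [0, 0, 0, 0]

# Shift cycle: SE active, pulse the clock n = 4 times to load the pattern.
for bit in [1, 0, 1, 1]:
    state, _ = clock_pulse(state, bit, idle, shift_enable=1)
loaded = list(state)

# Capture cycle: SE inactive, one pulse captures the functional D inputs.
state, _ = clock_pulse(state, 0, [0, 0, 1, 1], shift_enable=0)
captured = list(state)

# Shift cycle again: the captured values appear at SO, last flop first.
shifted_out = []
for _ in range(4):
    state, so = clock_pulse(state, 0, idle, shift_enable=1)
    shifted_out.append(so)
print(loaded, captured, shifted_out)
```

In practice the second shift cycle also loads the next test pattern while the previous response is shifted out.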
RTL Top-down Design-for-Testability Flow

Read Target Libraries
Read HDL files
Elaborate Design
Set Timing and Design Constraints
Apply Optimization Directives
Setup for DFT Rule Checker
  Shift enable
  Test mode
  Prevent scan mapping of flops
  Internal clocks as test clocks
  DFT controllable constraints
  Abstract scan segments
Run DFT Rule Checker and Report Registers
Fix DFT Violations
Add Testability Logic
  Test-point insertion
  Shadow logic insertion
Synthesize Design and Map to Scan
Set up DFT Configuration Constraints and Preview Scan Chains
  Scan chains
  Number of scan chains
  Length of scan chains
  Control data lockup elements
Connect Scan Chains
Run Incremental Optimization
Analyze Design
Meet constraints?
  No: Modify constraints, Modify optimization directives
  Yes: Netlist, SDC, ScanDEF, ATPG, Abstraction Model

Reading Timing Reports

============================================================
Generated by: Encounter(r) RTL Compiler v07.10-p004_1
Generated on: Jul 23 2007 03:16:40 AM
Module: dtmf_chip
Technology libraries: slow_normal 1.0 slow_hvt 1.1 tpz973gtc 230 ram_128x16A 0.0
ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
============================================================
Pin                                   Type            Fanout    Load  Slew  Delay Arrival
                                                                (fF)  (ps)   (ps)    (ps)
----------------------------------------------------------------------------------
(clock m_clk)                         launch                                            0 R
                                      latency                               +4000    4000 R
DTMF_INST
TDSP_CORE_INST
DATA_BUS_MACH_INST
data_out_reg[0]/clk                                                      0           4000 R
data_out_reg[0]/q   (u)               unmapped_d_flop     19   155.1     0   +258    4258 R
DATA_BUS_MACH_INST/data_out[0]
TDSP_CORE_GLUE_INST/data_out[0]
TDSP_CORE_GLUE_INST/port_data_in[0]
PORT_BUS_MACH_INST/data_in[0]
PORT_BUS_MACH_INST/pad_data_out[0]
TDSP_CORE_INST/port_pad_data_out[0]
DTMF_INST/port_pad_data_out[0]
IOPADS_INST/tdsp_portO[0]
Ptdspop00/I                                                                    +0    4258
Ptdspop00/PAD                         PDO04CDG             1  6719.0  2038  +1648    5906 R
IOPADS_INST/tdsp_port_out[0]
port_pad_data_out[0]                  out port                                 +0    5906 R
(ou_del_1)                            ext delay                              +500    6406 R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(clock refclk)                        capture                                        6000 R
                                      uncertainty                            -250    5750 R
----------------------------------------------------------------------------------
Timing slack : -656ps (TIMING VIOLATION)
Start-point : DTMF_INST/TDSP_CORE_INST/DATA_BUS_MACH_INST/data_out_reg[0]/clk
End-point : port_pad_data_out[0]
Header includes library and module information.
Body includes arrival time calculation.
Footer includes timing slack calculation.

Reading Timing Reports: Header
============================================================
Generated by: Encounter(r) RTL Compiler v07.10-p004_1
Generated on: Jul 23 2007 03:16:40 AM
Module: dtmf_chip
Technology libraries: slow_normal 1.0 slow_hvt 1.1 tpz973gtc 230 ram_128x16A 0.0
ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
============================================================
Header includes library and module information.
In the header, the following information is given:
Tool-specific information
Timestamp
Module information
Wireload mode
Reading Timing Reports: Body

Pin                                   Type            Fanout    Load  Slew  Delay Arrival
                                                                (fF)  (ps)   (ps)    (ps)
----------------------------------------------------------------------------------
(clock m_clk)                         launch                                            0 R
                                      latency                               +4000    4000 R
DTMF_INST
TDSP_CORE_INST
DATA_BUS_MACH_INST
data_out_reg[0]/clk                                                      0           4000 R
data_out_reg[0]/q   (u)               unmapped_d_flop     19   155.1     0   +258    4258 R
DATA_BUS_MACH_INST/data_out[0]
TDSP_CORE_GLUE_INST/data_out[0]
TDSP_CORE_GLUE_INST/port_data_in[0]
PORT_BUS_MACH_INST/data_in[0]
PORT_BUS_MACH_INST/pad_data_out[0]
TDSP_CORE_INST/port_pad_data_out[0]
DTMF_INST/port_pad_data_out[0]
IOPADS_INST/tdsp_portO[0]
Ptdspop00/I                                                                    +0    4258
Ptdspop00/PAD                         PDO04CDG             1  6719.0  2038  +1648    5906 R
IOPADS_INST/tdsp_port_out[0]
port_pad_data_out[0]                  out port                                 +0    5906 R
(ou_del_1)                            ext delay                              +500    6406 R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(clock refclk)                        capture                                        6000 R
                                      uncertainty                            -250    5750 R
The body of the timing report includes the arrival time calculation and lists
Instance pins the timing path goes through
Fanout for each pin output
Load and slew for each pin output
Incremental delay for each cell
Cumulative delay (or arrival time) for each cell

Reading Timing Reports: Footer
----------------------------------------------------------------------------------
Timing slack : -656ps (TIMING VIOLATION)
Start-point : DTMF_INST/TDSP_CORE_INST/DATA_BUS_MACH_INST/data_out_reg[0]/clk
End-point : port_pad_data_out[0]
Footer includes timing slack calculation.
The footer section of the timing report shows the final calculation and includes
Timing slack
Start point
End point
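The slack in the footer can be reproduced from the numbers in the report body. The Python sketch below uses the values printed in the report; the function names are hypothetical, and the sign convention (required time minus arrival time) is the usual one for a setup check.

```python
def data_arrival(launch_latency_ps, incremental_delays_ps):
    """Arrival time at the endpoint: launch clock latency plus every incremental delay."""
    return launch_latency_ps + sum(incremental_delays_ps)

def timing_slack(capture_edge_ps, uncertainty_ps, arrival_ps):
    """Setup slack: required time (capture edge minus uncertainty) minus arrival time."""
    required = capture_edge_ps - uncertainty_ps
    return required - arrival_ps

# Values from the report: latency 4000, flop clk-to-q +258, pad cell +1648, ext delay +500.
arrival = data_arrival(4000, [258, 1648, 500])
slack = timing_slack(6000, 250, arrival)
print(arrival, slack)
```

The arrival time of 6406 ps against a required time of 5750 ps gives the reported slack of -656 ps.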
Given the timing report in the previous example:
What type of path is being checked?
input->reg, reg->reg, reg->output, or input->output?
What logic gates are involved in the timing path?
How many levels of logical hierarchy does the path go through?
Which clocks are launching and capturing the data?
What is the clock period of the design?
What is the output delay of the path?
What is the clock uncertainty?
What is the arrival time?
What is the required time?
Why does the path violate timing, and how can it be fixed?

Other Synthesis Reports
report area          Prints an exhaustive hierarchical area report
report datapath      Prints a datapath resources report
report design_rules  Prints design rule violations
report gates         Reports libcells used, total area, and instance count summary
report hierarchy     Prints a hierarchy report
report instance      Prints an instance report
report memory        Prints a memory usage report
report messages      Prints a summary of error messages that have been issued
report power         Prints a power report
report qor           Prints a quality of results report
report timing        Prints a timing report
report summary       Prints an area, timing, and design rules report


Starting RTL Compiler
The first step is to ensure that RTL Compiler has been properly installed on a
computer server to which you have access. Check with your system
administrator to ensure that RC is installed and working properly.
To invoke RC from the UNIX prompt, type the following:
unix% rc
A message similar to the following appears, as well as the rc_shell prompt.
Now you can enter commands directly at this prompt and synthesize a design.
Checking out license 'RTL_Compiler_Ultra'... (0 seconds elapsed)
License RTL_Compiler_Ultra checkout failed
Checking out license 'RTL_Compiler_Verification'... (0 seconds
elapsed)
Cadence Encounter(r) RTL Compiler
Version v06.20-s019_1 (64-bit), built Mar 8 2007
rc:/>
Viewing the Log File

Log files list every command issued to logic synthesis, as well as
the tool’s response to such commands.
The log file is a text document that can be viewed using any text editor
such as vi or emacs.
Any line that begins with rc:/> is a command issued to RC.
All other lines are RC’s response.
Example (rc.log):
Checking out license 'RTL_Compiler_Ultra'... (0 seconds elapsed)         <- RC license checkout info
License RTL_Compiler_Ultra checkout failed
Checking out license 'RTL_Compiler_Verification'... (1 seconds elapsed)
Cadence Encounter(r) RTL Compiler                                         <- RC version
Version v06.20-s019_1 (32-bit), built Mar 8 2007
========================================================================
Welcome to Encounter (TM) Encounter(r) RTL Compiler                       <- Welcome message
========================================================================
rc:/> source config/libraries_virage.tcl
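The rule above (lines starting with rc:/> are commands, everything else is a response) can be sketched in a few lines of Python. This is only an illustration: the function name split_log and the sample lines are assumptions made for the example, not part of RTL Compiler.

```python
def split_log(lines, prompt="rc:/>"):
    """Separate commands (lines starting with the prompt) from tool responses."""
    commands, responses = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if text.startswith(prompt):
            commands.append(text[len(prompt):].strip())
        else:
            responses.append(text)
    return commands, responses


# A few lines copied from the rc.log excerpt in this handout.
sample_log = [
    "Checking out license 'RTL_Compiler_Ultra'... (0 seconds elapsed)",
    "License RTL_Compiler_Ultra checkout failed",
    "Cadence Encounter(r) RTL Compiler",
    "rc:/> source config/libraries_virage.tcl",
]

commands, responses = split_log(sample_log)
print(commands)
```

A filter like this is a quick way to recover the list of commands that were run in a session from a long log.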

Physical Synthesis Fundamentals

Physical synthesis is the integration of logic synthesis and placement.
Logical synthesis and placement optimizations can be run
concurrently in a single executable.
Physical information, usually reserved for physical design
(place/route) tools, is used by the physical synthesis tool to optimize
the design.
Physical library
Floorplan information
Timing is more accurate, because the wiring estimations are based
on real placement, but run times are typically higher because of the
additional placement steps done.
[Figure: flow diagrams with the labels RTL, Timing Library, Logic Synthesis, Netlist, Floorplan, Physical Library, Placement, and Physical Synthesis]
RC Physical Methodology
Introduced by Cadence in RTL Compiler 7.1
Incorporates physical/process data into interconnect delay calculations
Consists of two RTL Compiler features:
QoS Prediction
Physical Layout Estimation (PLE)
Physical Layout Estimation

Physical layout estimation (PLE) is a method and model to calculate
interconnect delay as an alternative to wire load models (WLMs).
PLE uses a proprietary algorithm that takes design and vendor process data
into account.
Other significant differences between WLMs and PLE are summarized below:
Synthesis Flow with PLE Enabled
Inputs            Steps                          Commands
.lib              Read tech lib                  set_attr library
LEF, cap table    Read physical lib info         set_attr lef_library
                                                 set_attr cap_table_file
RTL               Read Verilog source files      read_hdl
                  Elaborate design               elaborate
DEF, .sdc         Apply constraints              read_sdc
                                                 set_attr def_file
                  Define DFT controls and DRC
                  Map (PLE driven)
Legend: Basic Flow, PLE Flow
Quality-of-Silicon Prediction
Quality of silicon (QoS) prediction
Targets the long nets PLE cannot estimate (the last 10-20% of nets).
Invokes the SoC Encounter® Silicon Virtual Prototyping (SVP) feature
from within the RC session.
Uses SVP to perform trial place and route, so loading on long nets can
be estimated properly.
Works in concert with PLE, and maximum predictability is achieved
when both features are enabled.
Synthesis Flow with QoS Prediction Enabled
Inputs            Steps                            Commands
DEF, LEF,         Read physical lib info           set_attr lef_library
cap table                                          set_attr cap_table_file
                                                   set_attr def_file
                  Map (PLE driven)                 synthesize -to_map
                  QoS prediction                   predict_qos
                  (silicon virtual prototyping)
                  Incremental optimization         synthesize -to_map -incr
                  Generate reports                 report timing, area, power, qor, etc.
Legend: Basic Flow, QoS Flow
Summary
We have discussed the following topics in this module:
Major phases of logic synthesis
Fundamental concepts of physical synthesis
Integration of logic synthesis and placement
Usage in the flow with PLE and QoS prediction
True or false
1. Technology-independent optimization takes place before technology-dependent optimization.
2. Boundary optimization takes place during technology mapping.
3. Two-level minimization generates smaller designs than multilevel minimizations.
4. A Boolean network is generated immediately after technology mapping.
5. Physical synthesis is the integration of floorplanning and placement.
Learning Activity
Study a log file after synthesis, including a timing report
Explain the optimization stages of the synthesis flow
Floorplanning and Placement
Module 5
June 16, 2008
An Apartment Building vs. a Chip

In many ways, an apartment building and a chip are alike.
How?
Built in layers from the ground up
Silicon
How? (continued)
Electrical wiring
Made up of building blocks
Bricks in the case of apartments
Silicon atoms, dopants, and metals in the case of microchips
How? (continued)
Built using a floorplan
“Rooms” have explicit functions
Module Objectives
Articulate the steps in floorplanning and power planning
Articulate the steps in timing-driven placement and scan chain re-ordering
Recall the flowchart diagram of the design flow steps required to take an
idea to product (chip).
In which part of the flow does floorplanning occur?
In which part of the flow does placement occur?
[Figure: blank design flow chart with Input/Output and Step boxes to be filled in]

Floorplanning
Power planning
Placement
Floorplanning
Definition
Implementation flow overview
DEF file
How to floorplan
Floorplanning inputs and outputs
Module constraint types
Pin placement
What Is Floorplanning?
Floorplanning is the process of deriving the die size, allocating space for soft
blocks, planning power, and macro placement.
Example:
[Figure: two arrangements of blocks A, B, C, D, E, F, and G]

Implementation Flow Overview
[Figure: implementation flow. RTL, Logic Synthesis, Gates, Floorplanning, Power Planning, Placement, Clock Tree Synthesis, Route, GDSII. Side labels: Timing Closure, Place and Route, Static Timing Analysis, Test.]
Floorplanning: Inputs and Outputs

Inputs
Gate-level netlist from output of logic synthesis
Constraints (SDC) are needed so that timing with STA can be
accurate and measured against the specifications of the design
Timing library (.lib) contains the timing information for each
discrete logic gate or macro
Physical library (LEF) contains information about the shape and
connectivity of the technology library cells
Outputs
Floorplan of the design, which is saved in the form of a DEF file
[Figure: Gates, Constraints, Tech Lib, and Phys Lib feed Floorplanning, which produces the Floorplan]
What Is Design Exchange Format (DEF)?
Definition: A specification for representing logical connectivity
and physical layout of an integrated circuit in ASCII format
Example: A DEF file is used to describe all the physical aspects
of a design, including die size, connectivity, and physical location
of cells and macros on the chip. It contains floorplanning information
such as standard cell rows, groups, placement and routing
blockages, placement constraints, and power domain boundaries. It
also contains the physical representation for pins, signal
routing, and power routing, including rings and stripes.
[Figure: Gates, Constraints, Tech Lib, and Phys Lib feed Floorplanning, which produces the DEF]
DEF File Example

# Header information
VERSION 5.6 ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
DESIGN DSP ;
UNITS DISTANCE MICRONS 2000 ;
PROPERTYDEFINITIONS
COMPONENTPIN designRuleWidth REAL ;
DESIGN FE_CORE_BOX_LL_X REAL 2.8 ;
DESIGN FE_CORE_BOX_UR_X REAL 1997.2 ;
DESIGN FE_CORE_BOX_LL_Y REAL 2.8 ;
DESIGN FE_CORE_BOX_UR_Y REAL 3997.2 ;
END PROPERTYDEFINITIONS
# Area and rows
DIEAREA ( 0 0 ) ( 4000000 8000000 ) ;
ROW CORE_ROW_0 UMC13FSNSITE 5600 5600 FS DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_1 UMC13FSNSITE 5600 11200 N DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_2 UMC13FSNSITE 5600 16800 FS DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_3 UMC13FSNSITE 5600 22400 N DO 4986 BY 1 STEP 800 0 ;
…
ROW CORE_ROW_1424 UMC13FSNSITE 5600 7980000 FS DO 4986 BY 1 STEP 800 0 ;
# Routing tracks and GCell information
TRACKS Y 1200 DO 5000 STEP 1600 LAYER ME8 ;
TRACKS X 1200 DO 2500 STEP 1600 LAYER ME8 ;
TRACKS X 500 DO 5000 STEP 800 LAYER ME7 ;
TRACKS Y 1200 DO 5000 STEP 1600 LAYER ME7 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME5 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME5 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME3 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME3 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME2 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME1 ;
GCELLGRID X 3992400 DO 2 STEP 7600 ;
GCELLGRID Y 7992400 DO 2 STEP 7600 ;
# Pins
PINS 765 ;
- ADC0[0] + NET ADC0[0] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7524700 ) E ;
- ADC0[1] + NET ADC0[1] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7528700 ) E ;
- TST_SEL + NET TST_SEL + DIRECTION INPUT + USE SIGNAL + LAYER ME4 ( -300 0 ) ( 300 2000 ) + FIXED ( 3501690 8000000 ) S ;
…
END PINS
# Special nets
SPECIALNETS 2 ;
- DVSS ( * VSS ) + USE GROUND ;
- DVDD ( * VDD ) + USE POWER ;
END SPECIALNETS
END DESIGN
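Coordinates in the DEF example are in database units, and the UNITS DISTANCE MICRONS 2000 statement gives the number of database units per micron. The short Python sketch below converts a few values from the example to microns; the helper name to_microns is an assumption made for illustration.

```python
DBU_PER_MICRON = 2000  # from "UNITS DISTANCE MICRONS 2000 ;"


def to_microns(dbu):
    """Convert DEF database units to microns."""
    return dbu / DBU_PER_MICRON


# DIEAREA ( 0 0 ) ( 4000000 8000000 )
die_width = to_microns(4000000)
die_height = to_microns(8000000)

# ROW CORE_ROW_0 ... 5600 5600 FS DO 4986 BY 1 STEP 800 0
row_start_x = to_microns(5600)
row_end_x = to_microns(5600 + 4986 * 800)  # origin plus 4986 sites of 800 units

print(die_width, die_height)   # die size in microns
print(row_start_x, row_end_x)  # horizontal extent of a core row in microns
```

The row extent computed this way lines up with the FE_CORE_BOX_LL_X and FE_CORE_BOX_UR_X properties (2.8 and 1997.2) in the header of the example.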

DEF File Syntax
[VERSION statement]
[DIVIDERCHAR statement]
[BUSBITCHARS statement]
DESIGN statement
[TECHNOLOGY statement]
[UNITS statement]
[HISTORY statement]
[PROPERTYDEFINITIONS section]
[DIEAREA statement]
[ROWS statement]
[TRACKS statement]
[GCELLGRID statement]
[VIAS statement]
[STYLES statement]
[NONDEFAULTRULES statement]
[REGIONS statement]
[COMPONENTS section]
[PINS section]
[PINPROPERTIES section]
[BLOCKAGE section]
[SLOTS section]
[FILLS section]
[SPECIALNETS section]
[NETS section]
[SCANCHAINS section]
[GROUPS section]
[BEGINEXT section]
END DESIGN statement
How to Floorplan
When the design is imported into the tool, a default die size is
calculated and displayed, and each module is assigned a physical
representation using a default placement density of 70% and aspect
ratio of 1.
Each unit represents a particular module in the design.
Floorplanning allocates position and area to each unit.
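To make the default density and aspect ratio concrete, the Python sketch below estimates a module outline from its total cell area. It assumes that placement density means cell area divided by outline area and that aspect ratio means height divided by width; the function name and the 7000 square micron cell area are hypothetical.

```python
import math


def module_outline(cell_area, density=0.70, aspect_ratio=1.0):
    """Return (width, height) of a module outline.

    Assumes density = cell_area / outline_area and
    aspect_ratio = height / width.
    """
    outline_area = cell_area / density
    width = math.sqrt(outline_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


# Hypothetical module with 7000 square microns of standard cells.
width, height = module_outline(7000.0)
print(round(width, 1), round(height, 1))  # roughly a 100 x 100 micron square
```

Lowering the density or changing the aspect ratio in this sketch shows how the same logic can be reshaped to fit the die during floorplanning.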
How to Floorplan (continued)
Position the modules and blocks in the die area. In general, position
the modules and blocks such that the area of the bounding rectangle
is minimized or meets the die size requirement. Try different
orientations, aspect ratios, and placement densities of the modules to
puzzle fit them into the die area.
The bounding rectangle represents the die area.

Identify modules that should be placed close together.
The tool shows flightlines (lines showing the number of connections) between the
modules. The more connections between two modules, the closer these
modules should be placed within the design.
Flightlines indicate how much communication occurs between two modules.
The diagram below shows how to floorplan optimally. The numbers
over the flightlines indicate the number of nets between corresponding
modules.
[Figure: two floorplans of modules A, B, C, D, and E with flightline net counts 121, 34, 57, 152, and 104]
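One way to compare candidate floorplans using flightline counts is to weight the distance between each pair of modules by the number of nets between them. The Python sketch below does this with Manhattan distance. The net counts are taken from the diagram, but which pair of modules each count belongs to, and the grid positions of both plans, are assumptions made up for the example.

```python
def weighted_wirelength(positions, net_counts):
    """Sum of (net count x Manhattan distance) over all connected module pairs."""
    total = 0
    for (a, b), nets in net_counts.items():
        (ax, ay), (bx, by) = positions[a], positions[b]
        total += nets * (abs(ax - bx) + abs(ay - by))
    return total


# Hypothetical assignment of the flightline counts to module pairs.
net_counts = {
    ("A", "B"): 121,
    ("A", "C"): 152,
    ("C", "E"): 104,
    ("B", "D"): 34,
    ("D", "E"): 57,
}

# Two hypothetical floorplans on a coarse grid (x, y).
plan_1 = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (2, 0), "E": (1, 1)}
plan_2 = {"A": (0, 0), "B": (2, 0), "C": (2, 1), "D": (1, 0), "E": (0, 1)}

cost_1 = weighted_wirelength(plan_1, net_counts)
cost_2 = weighted_wirelength(plan_2, net_counts)
print(cost_1, cost_2)  # 525 1054
```

The plan that keeps the heavily connected pairs adjacent has the lower cost, which matches the guideline of placing strongly connected modules close together.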
Example: The design below shows the flightlines between one of the
modules and its macro on the right side of the die area, as well as with
other modules that communicate with it.
Module Constraint Types

The size of the design and of each module is initially calculated by the tool
during design import and assigned one of the following constraints.
Type        Definition
None        Contents of module are placed without any constraint.
Guide       Module is placed in core design area. It guides placement
            of the module’s cells in the vicinity of the guide’s location.
Fence       Fence is a hard constraint in core design area. Design for
            the module is self-contained within the rigid outline of a
            fence.
Region      Same as a fence, except that instances from other modules
            can be placed within its physical outline.
Soft Guide  Similar to guide, except that there are no fixed locations.

Module Constraint Types (continued)
Pin Placement
There are two ways to handle pin placement, using a bottom-up or top-down approach.
Bottom up
Pins are initially placed along with the cells in a block to optimize their
placement with respect to that block.
The top-level floorplan is finished, and pin placement is re-optimized
considering both top-level goals and block timing.
Top down
The pins are initially placed in the top-level floorplan to optimize their
placement on a global level.
Then, their location is fixed within a block, and the block level cells are
placed.
Finally, the pin placement is re-optimized considering both top-level
goals and block timing.
Use bottom up if the top-level design is incomplete so progress can be
made at the block level. Use top down if the top-level design is near
complete so that you can account for the inter-block connections.
Pin Placement Goals
Identifying critical paths and making placement tradeoffs to optimize
the critical paths
Wire length reduction
Achieving timing by reducing the amount of block-to-block or IO-to-block
interconnect
Achieving via-free direct routes
Achieving accurate pin matching between hierarchical boundaries
Optimizing pin placement with respect to routing congestion
Pin spacing variation in congested areas

Power Planning
Definition
Goals
Need for power planning
Basics of power planning
Early planning for power
Types of power routing
Steps involved in power routing
Multiple supply voltages
What Is Power Planning?

Definition: The task of creating the global power plan for a design. The plan is
typically implemented as VDD/VSS rings and stripes.
Example:
What Are Voltage (IR) Drop and Electromigration?
Voltage (IR) drop is the voltage drop across a chip’s power network
caused by current and resistance associated with the power network.
Electromigration (EM) is the mechanical failure of metal wires because
of metal atoms migrating over a long period of time due to high current
densities, causing open circuits, short circuits, or unacceptable
increases in resistance.
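As a rough illustration of the IR drop definition, the Python sketch below applies Ohm's law (V = I x R) to a single power stripe, with the stripe resistance estimated as sheet resistance times length divided by width. All numeric values and function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def stripe_resistance(sheet_resistance, length, width):
    """Resistance in ohms of a wire: sheet resistance (ohms/square) x number of squares."""
    return sheet_resistance * length / width


def ir_drop(current, resistance):
    """Voltage drop in volts from Ohm's law: V = I x R."""
    return current * resistance


# Hypothetical stripe: 0.05 ohms/square, 1000 microns long, 10 microns wide.
resistance = stripe_resistance(0.05, 1000.0, 10.0)
drop = ir_drop(0.010, resistance)  # 10 mA flowing through the stripe
print(round(resistance, 3), round(drop, 3))  # about 5 ohms and 0.05 V

# Doubling the stripe width halves the resistance and therefore the drop.
wider_drop = ir_drop(0.010, stripe_resistance(0.05, 1000.0, 20.0))
```

The same relationship explains why wider wires and additional stripes are used where current demand is high.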
Power Planning Goals

To design a global power distribution network that supplies the
appropriate power and ground nets to all the instances of the design
To size the power wires and choose the metal layers necessary to
deliver the required power to different parts of the chip without causing
failure
Need for Power Planning
Power-related issues can
Affect chip timing due to excessive rail voltage drop (“IR-drop”) and
ground bounce
Lead to complete device failure due to electromigration effects
[Figure: EM failures as seen through a scanning electron microscope (SEM)]
The effects of IR-drop and other power-related issues can be limited by
Good power-grid design
Sufficient VDD and VSS pads
Basics of Power Planning

Ensure adequate power and ground connections by including the following
basic elements in the power network.
Power pads that supply power to the chip
Power rings around the periphery of the die that carry power to the
standard cells and macros
Rings are put on higher level routing layers leaving the lower
layers for signal routing
Power rails and trunks that cross the entire die or sections of the die

Early Planning for Power
Simulation of major power dissipation components
Quantification of chip power
Total chip power
Maximum power density
Total chip power fluctuations
Power grid analysis
Allocation and coordination of chip resources
Wiring tracks for power grid
Low Vt devices
Dynamic circuits
Clock gating
Placement and quantity of decoupling capacitors
Types of Power Planning

Trunks and rings
Used for upper level routing
Rings are placed around blocks to assure even power distribution within
the block
Uniform grid
Usually used inside lower level partitions
Trunks and Rings Methodology
Each block has its own ring structure
Each block has a trunk that connects the top level to the block
Rings can be shared between abutted blocks
Requires fewer routing resources
Changes in design may require changes to power structure
[Figure: blocks 1 to 5, each with a ring and a trunk, with V and G supply connections around the die]
Uniform Chip Grid Methodology

Robust and redundant power network
Seen in microprocessors and high-end large ASICs
Primary distribution through upper metal layers
Grids of different blocks need to align with each other
[Figure: blocks on a uniform power grid, with V and G supply connections around the die]

Power Planning
Power stripes
Specified and created by the chip designer, typically using a place/route tool
Distribute power vertically within a ring
Typical power routing routes horizontally in metal 1 (including standard cell row
power rails) and vertically in metal 2
[Figure: power ring, power stripe, and rows of cells, with metal 1 horizontal and metal 2 vertical]
Power Planning (continued)

Power mesh
Meshes are created to cover large areas of a chip
Created by layers of power straps going in alternate vertical and horizontal
directions
Distributes power across a chip so that IR drop and electromigration
targets are met
Example:
Steps Involved in Power Routing
Create core power rings
Connect core power pads to the core power rings
Add power rings around the macros
Add power rails to the power plan for standard cell area
Modify power rails for macro power rings, routing blockages, and other restrictions
Add vertical and horizontal stripes to reduce IR drop at power rails of cells and macros
Connect power rails to cell power pins and extend to the power rings and connect with vias
Power pins of macros are tapped to core rings or power stripes
Connecting Power Rings Around Core

Followpins are used to
Route power/ground along the standard cell rows
Follow the pins of each cell and stitch them together
Connect ring is used to
Connect these routes to power rings (and vertical stripes)
Connect dangling power routes to stripes/rings
Connect power rings to I/O power pads
Power Consumption
Power on a chip is consumed when it is active (dynamic power) as
well as inactive (leakage power).
Leakage power
Power consumed when cells are not switching
Main sources of leakage power are sub-threshold leakage currents, which
reduce linearly with supply voltage
Dynamic power
It is the power associated with switching of nets and cells
It is calculated as Power = f × C × V²
How can the power consumption on a chip be reduced?
Multiple Supply Voltages

Using multiple supply voltages is one method of reducing a chip’s
power consumption.
It aims at minimizing the supply voltage level wherever possible.
Instead of the chip operating from a single uniform supply voltage, a
range of supply voltages is assigned to different areas of the chip.
It also assigns separate power-nets to different blocks, and steps the
power-net voltages down wherever the chip and block performance
allow.
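Because dynamic power depends on the square of the supply voltage (Power = f × C × V²), lowering the voltage of a block gives a more than proportional saving. The Python sketch below compares the same block at two supply voltages; the frequency and capacitance values are hypothetical.

```python
def dynamic_power(frequency, capacitance, voltage):
    """Dynamic power from the formula in this handout: P = f x C x V^2."""
    return frequency * capacitance * voltage ** 2


frequency = 500e6    # hypothetical switching frequency in Hz
capacitance = 1e-9   # hypothetical switched capacitance in farads

power_high = dynamic_power(frequency, capacitance, 1.2)
power_low = dynamic_power(frequency, capacitance, 0.8)
ratio = power_low / power_high
print(round(ratio, 3))  # fraction of dynamic power remaining at 0.8 V versus 1.2 V
```

With these numbers the block at 0.8 V uses a little under half the dynamic power it would use at 1.2 V, since (0.8 / 1.2)² is about 0.44.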
Discussion Question
Assuming the following chip diagram, what considerations should be taken into
account when designing a power plan?
[Figure: chip containing Block1 at 1.0 V, Block2 at 0.8 V, and Block3 at 1.2 V]

Placement
Definition
Placement goals
Standard cell placement
Timing driven placement
ECO placement
Incremental placement
Boundary scan
Scan chain re-order
What Is Placement?
Definition: Process of placing the standard cells in a floorplanned design
Example: The diagram shows a die area with no cells (left), and the cells
placed within the die (right).
Placement Goals
Goals of the placement step are to
Guarantee that the router can complete the routing step
Minimize all the critical net delays by placing cells close to each other,
thus reducing interconnect lengths
Minimize the die size as much as possible
Reduce routing congestion, if any
Good placement is essential for meeting timing goals
Bad placement can lead to sub-optimal routes and cause paths to fail
timing
Standard Cell Placement

The core area of the die is defined by specifying the distance between the edge
of the layout and the core.
Standard cells are placed in rows that are drawn within the core area.
Placement should be legalized, meaning standard cells are placed correctly
on the placement grid, not overlapping, and power pins of standard cells are
aligned correctly.
Placement should be routable and meet timing requirements.
[Figure: a standard cell row, with a cell between VDD and GND rails, and standard cell rows in the core area]

Standard Cell Rows
Regular Orientation, Gap in Between Rows
[Figure: two rows of cells with regular orientation, each with VDD and GND rails, separated by a gap]
Regular + Flipped Orientation, Shared Rows
[Figure: a cell with regular orientation and a cell with flipped orientation sharing the GND rail between them]
Cell Row Placement

There are three ways to arrange cell rows:
Sometimes a technology allows rows to be flipped and abutted so the pairs
can share power and ground rails. This is the most common approach.
Second configuration is to flip every other cell row but leave a gap between
every two cell rows mainly for routing purposes. Creates larger power rails
and densely packed cell structure.
Last configuration is to leave a gap between every cell row and not flip the
rows. Useful when only two or three metal layers are available for routing.
The command to run placement is
placeDesign

Timing-Driven Placement
Placement of standard cells takes into account the timing constraints.
Placer balances importance of meeting setup-type timing constraints with
routability.
Placer identifies critical nets and performs placement to meet the
constraints. It pays less attention to meeting timing constraints on
non-critical nets, but more attention to enhancing routability.
Why do we need this?
Growing interconnect versus gate delay ratios
Higher levels of on-die functional integration make global interconnects
even longer
Increased chip operating frequencies that make timing closure tougher
Increased number of macros and standard cells for modern designs
Timing-Driven Placement (continued)

Timing-driven placement algorithms can be divided into two categories:
Path-based
Tries to minimize the longest path delay
Complexity is high since it maintains an accurate timing view during
optimization
Net-based
First transforms timing constraints into either length constraints or weights
on individual nets
This information is fed into a weighted wirelength minimization-based
placement engine, which obtains a new placement with better timing
Complexity is lower compared to path-based algorithms
Engineering Change Order Placement
Engineering change order (ECO) placement is used to place unplaced
cells in a partially or fully placed design.
In a partially placed design, unplaced cells are placed in timing-driven
mode followed by legalization (overlap removal).
In a fully placed design, only the legalization step takes place.
Make sure that ECO logic changes do not exceed the previously
imported design by 10%.
When ECO placement is run, it places only the cells that are unplaced.
It cannot move the cells that are fixed and makes only minor
modifications to cells already placed.
The command to run ECO placement is
ecoPlace
Incremental Placement
Incremental placement works on an already placed design to improve
overall quality and timing.
To use incremental placement, the following command should be run
before placing the design
placeDesign -incremental
The above command performs a two-pass placement flow.
Regular placement
Incremental placement
In addition to having placement information about all placed cells, it
maintains information about space available for adding new cells.

What Is Boundary-Scan Architecture?
Boundary-scan architecture
Is a method that enables the chip tester to test connectivity of the
I/O pins on the fabricated chip
Provides a means to test interconnects between integrated
circuits on a board without using physical test probes
Is synonymous with Joint Test Action Group (JTAG)
JTAG is the name used for the IEEE 1149.1 standard entitled
Standard Test Access Port and Boundary-Scan Architecture.
Boundary Scan
Boundary scan adds one or more memory elements, called
boundary-scan cells, to each I/O pin of the device, which can
selectively override the functionality of that pin.
The collection of boundary scan cells is configured into a
parallel-in, parallel-out shift register.
Test sequence is passed into the shift register, and the data coming
out is compared.
Boundary scan cells do not contribute to the functionality of
the internal core logic.
Test access port (TAP) controller is a state machine whose
transitions are controlled by a TMS signal.

Boundary Scan (continued)
The JTAG interface, collectively known as the test access port (TAP), uses the
following signals to support operation of boundary scan.
Test data is shifted around the shift register in serial mode from
input pin Test Data In (TDI).
Test data is shifted out at output pin Test Data Out (TDO).
Test Clock (TCK) synchronizes the internal state machine operation.
Test Reset (TRST) is an optional input pin to reset the TAP
controller’s state machine.
Test Mode Select (TMS) determines the next state.
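The TAP controller described above can be sketched as a lookup table in Python: on each TCK cycle the next state depends only on the current state and the value of TMS. The state names and transitions follow the 16-state machine of IEEE 1149.1; the helper names step and run are assumptions made for this illustration.

```python
# Next state for each TAP controller state, as (TMS = 0, TMS = 1).
TAP_TRANSITIONS = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance the TAP controller by one TCK cycle with the given TMS value (0 or 1)."""
    return TAP_TRANSITIONS[state][tms]


def run(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# From reset, TMS = 0, 1, 0, 0 reaches the state where data is shifted from TDI to TDO.
print(run("Test-Logic-Reset", [0, 1, 0, 0]))  # Shift-DR
```

A useful property of this state machine is that holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is why TRST can be optional.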
Boundary Scan (continued)

Chain integrity testing
Basic form of testing by JTAG (tests that the JTAG devices meant to
be in the chain exist).
Each JTAG compliant device contains an ID code.
By issuing a correct sequence of JTAG commands, the ID codes of all the
devices can be read out.
The ID codes read out from the JTAG chain are compared with the actual
ID codes of the devices. If they match, the JTAG chain is correctly
connected and the devices are in place.
Benefits of JTAG
Shorter test times
Higher test coverage
Increased diagnostic capability
Lower capital equipment cost

What Is a Scan Chain?
Scan chains are a technique used in Design for Test (DFT) to reduce
the time it takes on the tester to determine if a part is good or bad.
All the registers in the design are connected in one or more scan
chains so that their inputs can be controlled and their outputs can be
observed.
Flip-flops have an extra signal called scan enable.
When Scan Enable is de-asserted, the flip-flop behaves normally and
passes the data.
When Scan Enable is asserted, all the flip-flops are connected into a long
shift register, with one end of the chain as primary input and the other end
as primary output.
Scan Chain Re-Order

Testing is done by putting flops into this test mode, shifting in a test
vector, switching back to normal mode to clock (capture) the data, and
finally switching back to test mode to shift out the resulting flop values.
The resulting vector is compared with a known “good” vector to
determine if the chip is functioning correctly.
Why do we need to re-order the scan chain?
During placement, cells are placed to meet functional timing and minimize
congestion, and the scan chain connectivity is ignored.
This results in long, inefficient routing between flops in the chain and
causes routing congestion.
Re-ordering the scan chain reduces congestion by connecting the
cells based on their placement.
It may cause hold time violations in the chain, and buffers may need to be
inserted to fix them.
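The idea of connecting scan cells based on their placement can be illustrated with a simple greedy heuristic: start at one flop and repeatedly connect to the nearest flop that is not yet in the chain. This is only a sketch of the concept, not the algorithm used by any particular tool, and the flop coordinates below are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chain_length(placement, order):
    """Total Manhattan length of the scan connections for a given flop order."""
    return sum(manhattan(placement[a], placement[b]) for a, b in zip(order, order[1:]))


def reorder_scan_chain(placement, start):
    """Greedy nearest-neighbor ordering; ties are broken by instance name."""
    remaining = dict(placement)
    current = remaining.pop(start)
    order = [start]
    while remaining:
        name = min(remaining, key=lambda n: (manhattan(remaining[n], current), n))
        current = remaining.pop(name)
        order.append(name)
    return order


# Hypothetical flop locations after placement.
placement = {"U1": (0, 0), "U2": (9, 9), "U3": (1, 0), "U4": (9, 8), "U5": (2, 1)}

by_name = sorted(placement)                        # order left by logic synthesis
by_place = reorder_scan_chain(placement, "U1")     # order based on placement
print(chain_length(placement, by_name), chain_length(placement, by_place))  # 65 18
```

In this small example the placement-based order is much shorter than the alphanumeric order, which is the effect scan chain re-ordering aims for.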

Discussion Question
In the following example, the scan chain after logic synthesis was ordered
alphanumerically by instance name.
How would you reorder the scan chain after initial placement to get the optimal
result?
[Figure: placement of flip-flops DFF U1 through DFF U10 across the die]
Summary
The back-end flow starts with floorplanning. Here is where we get to
see the physical chip.
Floorplanning is a puzzle-fitting stage, where we have to fit modules
into the die area.
Plan the power network with a view to distribute power efficiently
throughout the chip and meet the current requirements.
Placement of the cells and macros into the core area is to be done
with the ultimate goal of meeting timing and reducing congestion.
Each step affects the overall goals of meeting timing and power
requirements. Quality time spent in floorplanning and power network
implementation reduces the number of iterations to achieving a
working chip that meets design specifications.

True or false
1. Boundary scan adds more functional logic to the existing internal logic.
2. Timing-driven placement reduces the number of iterations through the back-end flow.
3. The DEF file contains information on the standard cell library.
4. Timing-driven placement tries to first meet routability on the critical nets.
5. In floorplanning, a guide is considered to be a rigid constraint.
6. The DEF file is saved in binary format and can only be read in by the tool.
Learning Activity
Study several examples of bad floorplans
Identify the bad practices from each and how you would correct them
Clock Tree Synthesis
Module 6
June 16, 2008
What Is the Difference?
[Figure: two circuits built from flip-flops (FF), combinational logic 1, combinational logic 2, and a clock (CLK)]
Module Objectives
Explain a clock tree and why you need to create one
Write a clock tree constraint file based on a given specification
Describe the benefits of using useful skew versus classical zero skew
Discussion Question
Recall the diagram of the design flow
steps to take an idea to product (chip)
In what part of the flow does Clock Tree Synthesis (CTS) occur?
[Figure: blank design flow chart with Input/Output and Step boxes, one of which is CTS]
Clock trees and clock tree synthesis
Clock tree specification
Analyzing CTS reports
Low-power clocking techniques
What Is a Clock Tree?

In a synchronous digital system, a clock signal is used to define a time
reference for the movement of data within that system.
Definition: A network of buffers inserted into the clock signal path
in such a way that the overall delay from the generator to all
destinations is minimized.
Example: Instead of one electrical signal path being optimized, the
path in the design is broken up and strategically buffered to
minimize the delay. The resulting network resembles a tree in that
the central clock signal branches throughout the chip using these
buffers and ends up with the clock signal reaching all of the leaf cells.

Need for a Clock Tree
When complexity (i.e., number of gates in a design) increases, the
need to distribute clock signals in a controlled manner becomes more
important.
Reasons why we need to build a clock tree:
Large chip area
Different flop densities
Non-uniform distribution of flops
All flops need to get clock signal at the same time
Power budget
Clock routing: hard problem
The clock distribution network distributes the clock signal(s) from a
common point to all the elements that need it.
Ideal Clock
All flip-flops are clocked together
Simplifies clock analysis over hierarchical boundaries
Used prior to clock tree insertion and place and route for timing analysis
[Figure: Block1 and Block2 with data paths A, B, and C clocked by CLK1, CLK2, and CLK3 from an ideal CLK, with waveforms for CLK1, CLK2, CLK3, and the ideal CLK]
Note: This diagram assumes zero clock skew and insertion delay.
Propagated Clock
Clock delays are extracted from clock tree routing
Clock skew is correctly modeled using propagated delay
More accurate and used in final timing closure
[Figure: Block1 and Block2 with data paths A, B, and C clocked by CLK1, CLK2, and CLK3 from a propagated CLK, with waveforms for CLK1, CLK2, CLK3, and the propagated CLK]
Issues Involved in Clocking

Clock delivered to the memory elements from a single pin
Different net lengths mean different arrival times of the clock at each flip-flop
Delay and transition time affected by large number of elements connected to one
pin
[Figure: a clock source driving a four-by-four array of flip-flops (FF)]

Effects on Clock Signal
The factors that can cause harm to a clock signal are
Clock skew
Clock latency
Clock jitter
What Is Clock Skew?

The measure of the difference of delay between the minimum and
maximum time it takes the clock to reach different leaf cells
Typically hurts performance of the design, although in some cases
helps achieve timing targets (useful skew)
Caused by a clock tree with unbalanced branches occurring due to
Different types of buffers
Varying capacitance and resistance values of nets
Gating components
Off-chip or on-chip variations
[Figure: a clock source driving flip-flops through CTS-inserted buffers, with minimum and maximum insertion delay paths marked; waveforms show the same clock source arriving at FF1 and FF2 at different times]
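Following the definition above, skew can be computed as the difference between the maximum and minimum insertion delay over the leaf cells of a clock tree. The Python sketch below shows the calculation; the flip-flop names and delay values (in ns) are hypothetical.

```python
def clock_skew(insertion_delays):
    """Skew = maximum insertion delay minus minimum insertion delay."""
    delays = list(insertion_delays.values())
    return max(delays) - min(delays)


# Hypothetical insertion delays in ns from the clock source to each flip-flop clock pin.
insertion_delays = {"FF1": 1.20, "FF2": 1.45, "FF3": 1.30}

skew = clock_skew(insertion_delays)
print(round(skew, 2))  # 0.25
```

In this example the clock reaches FF2 0.25 ns later than it reaches FF1, so the tree has 0.25 ns of skew.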

What Is Zero Skew?
The conventional approach to clock tree generation is called the zero skew
or classical skew approach.
The clock tree is treated as ideal.
All combinational blocks must fit into the same fixed time period.
All registers are clocked at the same time.
You do not need knowledge of signal timing.
Clock skew is made as small as possible to take advantage of the full clock period.
A good classical skew minimization strategy does not necessarily correlate with good performance.
What Is Useful Skew?

Useful skew is a technique that takes advantage of the difference of arrival time at flip-flops to correct datapath timing violations.
Helps meet setup and hold time.
Increased latency but decreased clock period provides a net timing advantage.
Some combinational paths require more time than the allowed clock period.
Adjusting clock delays to registers allows allocation of more time to some paths and less to others.
Time is borrowed from neighboring paths that have positive slack.
Useful skew can be done pre-CTS or post-CTS.

Example of Useful Skew
[Figure: flip-flops separated by combinational delays of 5 ns and 2 ns, with a time period of 4 ns. Top: propagated clock. Bottom: propagated clock with useful skew, where a 1 ns margin is obtained by speeding up the source clock.]
Example of Useful Skew (continued)

[Figure: flip-flops separated by combinational delays of 5 ns and 2 ns. Top: propagated clock. Bottom: propagated clock with useful skew, where 1 ns is obtained by delaying the target clock.]
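The arithmetic behind the useful skew example can be sketched as follows. This is a simplified model, assuming setup slack = period + (capture clock latency - launch clock latency) - data delay and ignoring setup time, clock-to-Q delay, and uncertainty. The function and variable names are hypothetical; the 4 ns period and the 5 ns and 2 ns delays come from the example slides.

```python
def setup_slack(period_ns, data_delay_ns, launch_latency_ns=0.0, capture_latency_ns=0.0):
    """Simplified setup slack: ignores setup time, clock-to-Q delay, and uncertainty."""
    return period_ns + (capture_latency_ns - launch_latency_ns) - data_delay_ns


PERIOD_NS = 4.0

# Balanced clock: the 5 ns path fails by 1 ns, the 2 ns path has 2 ns to spare.
slack_long = setup_slack(PERIOD_NS, 5.0)
slack_short = setup_slack(PERIOD_NS, 2.0)

# Delay the clock of the middle flip-flop by 1 ns. It captures the 5 ns path
# later and launches the 2 ns path later, so 1 ns is borrowed from that path.
slack_long_skewed = setup_slack(PERIOD_NS, 5.0, capture_latency_ns=1.0)
slack_short_skewed = setup_slack(PERIOD_NS, 2.0, launch_latency_ns=1.0)

print(slack_long, slack_short, slack_long_skewed, slack_short_skewed)
```

The borrowed time only helps because the neighboring path starts with positive slack; if both paths were already failing, shifting the clock would move the violation rather than remove it.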
In a circuit with a clock period of 5 ns, there is a worst-case slack of -1 ns after clock tree synthesis.
Will useful skew always improve the timing path?
How could you check to see if useful skew could be of benefit?
What other ways could you improve the timing of this path?
What Is Insertion Delay?

Insertion delay is the time the clock signal (rise or fall) takes to propagate from the clock definition point (root) to a register clock pin (leaf cell).
Insertion delay is also known as clock network latency.
[Figure: a clock source drives flip-flops through CTS-inserted buffers; two paths are labeled minimum insertion delay and maximum insertion delay.]

What Is Clock Jitter?
Jitter is the undesired variation or fluctuation of a signal with respect to its ideal position in time.
The common sources of jitter are
Internal circuitry of the phase-locked loop (PLL)
Random thermal or mechanical noise from a crystal vibration
Other resonating devices
Signal transmitters
Crosstalk
VCC sag
Ground bounce
Electromagnetic interference from nearby devices
There are three types of clock jitter:
Period jitter
Cycle-to-cycle jitter
Long-term jitter
[Figure: clock waveform with the ideal clock period and the jitter marked.]
What Is Period Jitter?

The deviation in the output clock transition from the ideal position.
The deviation is either leading or lagging the ideal position.
Measured and expressed in time or frequency
Used to calculate timing margins in systems
[Figure: clock waveform marking the ideal clock period, the ideal clock edge location, and the period jitter.]
What Is Cycle-to-Cycle Jitter?
Change in a clock’s output transition from its corresponding position in the previous cycle.
Large cycle-to-cycle jitter can cause a system to fail.
Most difficult type of jitter to measure.
[Figure: an ideal clock period (T), a lesser clock period (T1), and another ideal clock period (T); Jitter = T - T1.]
What Is Long-Term Jitter?

Also known as phase jitter
Measures the maximum change in a clock’s output transition from its ideal position over a large number of cycles
[Figure: ideal clock period with the ideal clock edge locations at cycle 0 and cycle N, and the long-term jitter marked at cycle N.]
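The three jitter types can be illustrated with a small calculation over measured edge times. This is a sketch under assumed conventions (worst-case absolute values, with the first edge taken as the reference for the ideal edge grid); the function name and the edge times are hypothetical, and other measurement conventions exist.

```python
def jitter_metrics(edge_times_ns, ideal_period_ns):
    """Return worst-case period, cycle-to-cycle, and long-term jitter in ns."""
    if len(edge_times_ns) < 3:
        raise ValueError("need at least three edges")
    periods = [b - a for a, b in zip(edge_times_ns, edge_times_ns[1:])]
    # Period jitter: deviation of each measured period from the ideal period.
    period_jitter = max(abs(p - ideal_period_ns) for p in periods)
    # Cycle-to-cycle jitter: change between consecutive periods (T - T1).
    cycle_to_cycle = max(abs(q - p) for p, q in zip(periods, periods[1:]))
    # Long-term jitter: deviation of edge N from its ideal location N periods
    # after the first edge.
    first = edge_times_ns[0]
    long_term = max(
        abs(t - (first + n * ideal_period_ns)) for n, t in enumerate(edge_times_ns)
    )
    return {"period": period_jitter, "cycle_to_cycle": cycle_to_cycle, "long_term": long_term}


# Hypothetical rising-edge times (ns) for a clock with a 10 ns ideal period.
edges = [0.0, 10.0, 19.5, 29.5, 40.0]
metrics = jitter_metrics(edges, 10.0)
print(metrics)
```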
Types of Clock Trees
A clock tree can be implemented in the following styles:
Binary tree
H tree
Clock Trees: Binary Tree

Clock delay is identical for all elements.
Length of a to d = Length of a to g
Same length, so same delay.
Drawback
The branch effect: the clock signals from b to e and f contribute a capacitance that would actually increase the delay from a to g.
Results in a clock skew between the clock signals at d and g.
As the size of the clock distribution tree increases, the effects on the clock signal become worse.
[Figure: conceptual structure and physical structure of a binary clock tree with nodes a, b, c, d, e, f, and g.]

Clock Trees: H Tree
First two stages resemble the letter “H”
Maintains distributed interconnects
Provides equal propagation delays to each part of the chip
Minimizes skew by making connections to the memory elements in equal lengths
Drawback
Total wire length is much greater compared to a standard clock tree
Increased capacitance of the H-tree structure
[Figure: H-tree fed from the clock source.]
What Is Clock Tree Synthesis?

To ensure that the system will work correctly at the required clock frequency, a clock tree needs to be designed to synchronize memory elements such as RAMs and flops.
Definition: Process of inserting buffers in the clock path, with the goal of minimizing clock skew and latency to optimize timing
Example: We ran clock tree synthesis on the example block and saw a large clock skew due to bad clock constraints. We ended up rerunning clock tree synthesis with better constraints to get an optimal result.
Need for Clock Tree Synthesis
Clock signals are typically loaded with the greatest fanout.
Differences and uncertainty in the arrival times of the clock signals can severely limit the maximum performance of the entire system.
Clock signals need to operate at the highest speeds of any signal in the design.
Clock signals are affected by technology scaling (Moore’s law).
Long global interconnect lines become significantly more resistive as line dimensions are decreased.
Catastrophic race conditions can be created in which an incorrect data signal may latch within a register.
CTS: Inputs and Outputs

Inputs
Clock tree specification file
Verilog® netlist
Timing library, which contains the timing information for each discrete logic gate or macro
Physical library, which contains information about the shape and connectivity of the technology library cells
Placement information such as a DEF file
Outputs
Netlist with clock tree inserted
Reports on the results of the run in ASCII text or HTML format
Routing guide files for clock tree preroutes to be used during trial routing
Macro model files for partitions or modules
[Figure: the netlist, tech file, DEF file, physical library, and clock tree specification file feed CTS, which produces a netlist, reports, routing guides, and macro models.]

Where Does CTS Fit in the Implementation Flow?
[Figure: implementation flow from RTL through logic synthesis to gates, then floorplanning, power planning, placement, and route to GDSII, with static timing analysis, test, and timing closure shown alongside place and route.]
After CTS
Clock buffer tree is built to balance output loads and minimize clock skew.
Buffers can be added to the network to meet the minimum insertion delay.
[Figure: the clock source and the flip-flops (FF) after CTS.]
Goals of CTS
Deliver clock to all memory elements with
Acceptable skew
Least amount of insertion delay
Deliver clock edges with acceptable sharpness
Steps Involved in CTS

An initial placement of the logic cells should be completed.
This ensures that the timing performance of the core logic is met.
First define/understand scope or extent of the clock tree
This would include items such as total load, routing area, distance the clock has to travel, available routing layers, and routing restrictions.
[Figure: CTS flow: initial placement of core logic, scope of clock tree, define clock tree constraints, define clock tree topology, insert clock tree, routing of clock tree.]
Steps Involved in CTS (continued)
Define the constraints that the clock tree must satisfy.
Include minimum and maximum insertion delay and maximum skew
This is part of the clock tree specification file.

Define the way the clock tree topology will be generated, including
Number of levels or buffer stages in the tree
Type of buffers/inverters
Fanout limit at each level
The topology can be defined manually by the designer or automatically by a clock tree generator tool.
This is part of the clock tree specification file.

The clock tree is inserted, taking into account the location of the logic cells.
The buffers are placed or inserted in strategic places to minimize the clock delay and routing.

The routing is completed for all clock signals simultaneously along with optimization for meeting all timing goals.
This step is optional and can be done along with CTS or with the routing phase.
CTS Operation Modes
There are two modes for running CTS:
Manual CTS allows the user to control
Number of levels
Number of buffers
Types of buffer at each level
Automatic CTS automatically determines the number of levels and buffers
Numbers depend on the timing constraints in the clock tree specification file.
CTS traces the clock net through buffers, inverters, and gated elements.
In most cases, you would use automatic CTS. If you have issues with the clock tree (skew, etc.), you can specify the CTS manually. In some cases where the design is very regular or very high speed, an experienced designer will manually specify the CTS constraints to better control the output.

Clock trees and clock tree synthesis
Clock tree specification
Analyzing CTS reports
CTS Guidelines
There are two CTS modes for specifying the clock tree:
Manual CTS
Automatic CTS
Both modes require a clock specification file to create the clock tree.
In manual CTS mode, the clock tree structure has to be specified by the user.
In automatic CTS mode, the tool automatically creates the clock tree structure from the specification file.
Automatic CTS is the preferred method of creating the clock tree.
CTS Guidelines (continued)

A clock tree specification file can be created by one of three methods:
Using the Create Clock Tree Spec form (GUI)
Using the createClockTreeSpec command
Using the specifyClockTree command with the –template parameter. This method creates a basic clock tree specification template file, template.ctstch.
The methods are similar, and each lets the user easily create a clock tree specification. The first option uses the GUI, whereas the other two use the command line.
With the GUI form, the user fills in all of the values, and then a clock specification file is generated. With the commands, a template is created and the user must modify the values.

Contents of the Specification File
The sections of the clock tree specification file must appear in the order given below. Individual statements within each section can appear in any order. The contents of the specification file are
Timing constraint file (optional)
Naming attributes (optional)
Macro model data (optional)
Clock grouping data (optional)
Attributes used by the routing tool (optional)
Requirements for manual CTS or automatic, gated CTS
[Figure: CLOCK SPEC FILE sections in order: Timing Constraint File, Naming Attributes, Macro Model Data, Clock Grouping Data, Router Attributes, Requirements for manual/automatic and Gated CTS.]

Timing Constraint File
Defines the timing constraints for use during CTS
Must be the first statement of the clock tree specification file
Example
TimingConstraintFile /path/cts.tcl
Naming Attributes
Allows the user to customize the name delimiter that CTS uses when inserting buffers and updating clock root and net names
The UseSingleDelim command instructs CTS to use a single character, instead of multiple characters, for the given delimiter.
Default: clk__L3_I2
With UseSingleDelim YES: clk_L3_I2
Example
UseSingleDelim YES
NameDelimiter #
Macro Model Data

A macro model is a block with synthesized clock trees, so delays have to be specified for its pins.
Example
MacroModel pin m1/clk 20ps 18ps 20ps 18ps 30ff
Clock Grouping Data
Specifies two or more clock domains for which you want CTS to balance the skew
The arguments are the clock root pin names.
Example
ClkGroup
+ U1/CGEN_1
+ U2/CGEN_2
Router Attributes
Defines attributes that CTS passes to the router for routing the clock net.
Example
RouteTypeName CK1
NonDefaultRule rule1
PreferredExtraSpace 1
TopPreferredExtraSpace 1
BottomPreferredLayer 5
Requirements for Manual/Automatic CTS and Gated CTS
All of the “optional” clock tree specification sections were mentioned in the previous sections.
In the next few slides, we will discuss the requirements for the following
Manual CTS
Automatic CTS
Clock Gated CTS
CTS Operation Mode: Manual

Manual CTS specification file
ClockNetName CK
LevelNumber 2
LevelSpec 1 2 BUFX2
LevelSpec 2 16 BUFX3
PostOpt YES
OptAddBuffer YES
End
[Figure: clock net CK drives level 1 (2 BUFX2) and level 2 (16 BUFX3), which drive the flip-flops.]
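To make the manual specification concrete, the sketch below reads the LevelSpec lines and totals the buffers. It assumes the three LevelSpec fields are level number, buffer count, and buffer cell, which is how the figure labels read ("Level 1, 2 BUFX2"). The parser is a hypothetical helper, not part of any tool, and the even split of level-2 buffers across level-1 buffers is an assumption.

```python
def parse_level_specs(spec_text):
    """Map level number -> (buffer count, cell name) from LevelSpec lines."""
    levels = {}
    for line in spec_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "LevelSpec":
            levels[int(fields[1])] = (int(fields[2]), fields[3])
    return levels


spec = """\
ClockNetName CK
LevelNumber 2
LevelSpec 1 2 BUFX2
LevelSpec 2 16 BUFX3
PostOpt YES
OptAddBuffer YES
End
"""

levels = parse_level_specs(spec)
total_buffers = sum(count for count, _cell in levels.values())
# Assumed even split: level-2 buffers driven by each level-1 buffer.
level2_per_level1 = levels[2][0] // levels[1][0]
print(levels, total_buffers, level2_per_level1)
```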

CTS Operation Mode: Automatic
Automatic CTS specification file
AutoCTSRootPin clk_out/Y
MaxDelay 5ns
MinDelay 0ns
MaxFanout 30
SinkMaxTran 500ps
BufMaxTran 500ps
MaxSkew 600ps
NoGating NO
MaxDepth 10
RouteType CLK1_ROUTE
DetailReport YES
RouteClkNet YES
PostOpt YES
OptAddBuffer YES
Buffer BUFX2 BUFX4 BUFX8 INVX1 INVX2 INVX4
End
[Figure: from the AutoCTSRootPin clk_out/Y, CTS buffers 1 to 5 drive the flip-flops and the pins FPU/CORE/A and XPU/CAM/C (FPU/CORE can also be a std cell). Labels mark phase delay 1, phase delay 2, max skew, sink input transition time, and buffer input transition time.]
Example: Clock Specification File

AutoCTSRootPin clockRootPinName
Specifies the name of the clock root pin from which to start tracing
Automatic CTS specification file
AutoCTSRootPin clk_out/Y
MaxDelay 5ns
MinDelay 0ns
MaxFanout 30
SinkMaxTran 500ps
BufMaxTran 500ps
MaxSkew 600ps
NoGating NO
MaxDepth 10
DetailReport YES
RouteClkNet YES
PostOpt YES
OptAddBuffer YES
Buffer BUFX2 BUFX4 BUFX8 INVX1 INVX2 INVX4
End

Example: Clock Specification File (continued)
MaxDelay number{ns|ps}
Specifies the maximum insertion delay. If this statement is not specified, the tool automatically sets the delay to 10 ns.

MinDelay number{ns|ps}
Specifies the minimum insertion delay. If this statement is not specified, the tool automatically sets the delay to 0.0 ns.

MaxFanout integer
Limits the number of leaf cells connected to the clock buffer at the last stage of the clock tree.

SinkMaxTran number{ns|ps}
Specifies the maximum input transition time constraint for the sinks. The maximum value is 10,000 ns. The default value is 400 ps.

BufMaxTran number{ns|ps}
Specifies the maximum input transition time constraint for buffers. The maximum value is 10,000 ns. The default value is 400 ps.

MaxSkew number{ns|ps}
Specifies the maximum skew between sinks (clock pins). The default value is 300 ps.
The lower the skew, the better the clock tree, and hence the better overall timing performance for the design.

NoGating { rising | falling | NO}
Sets the criteria for tracing through logic gates
Rising: Stops tracing through a gate (including buffers and inverters) and treats the gate as a rising-edge-triggered flip-flop clock pin.
Falling: Stops tracing through a gate (including buffers and inverters) and treats the gate as a falling-edge-triggered flip-flop clock pin.
NO: Default behavior for gated-clock designs. Allows CTS to trace through clock gating logic.

MaxDepth number
Sets the maximum depth of clock tree tracing. The default value is 1024, i.e., CTS limits the number of levels of clock tree tracing to 1024.
Tracing is done by CTS (before inserting buffers) to understand the logical structure of the design and see that there are no feedback loops.

RouteType routeTypeName
Specifies the name of the clock whose routing attributes are being defined.

DetailReport YES | NO
Determines whether CTS provides a detailed report, which includes timing information for every component in the design. Default behavior is not to generate a detailed report.

RouteClkNet YES | NO
Determines whether CTS routes the clock nets. Default behavior is not to route the clock net.

PostOpt YES | NO
Specifies whether CTS runs optimization, i.e., it resizes buffers or inverters, refines placements, and corrects routing for signal and clock wires. Default: YES

OptAddBuffer YES | NO
Controls whether CTS adds buffers during optimization. Effective only if PostOpt YES is specified.
Tries to meet the trigger edge skew constraints as defined in the clock tree specification file.

Buffer cell1 cell2 cell3
Specifies the names of buffer cells to use during automatic gated CTS.
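The defaults quoted on the statement slides can be collected into a small reader that fills in any timing limit the specification leaves out. This is a hypothetical helper for illustration only (it is not part of the CTS tool), it handles only the statements whose defaults are stated above, and it assumes values are written as a number followed directly by ns or ps.

```python
import re

# Defaults as stated on the slides, converted to picoseconds.
DEFAULTS_PS = {
    "MaxDelay": 10_000.0,
    "MinDelay": 0.0,
    "SinkMaxTran": 400.0,
    "BufMaxTran": 400.0,
    "MaxSkew": 300.0,
}
UNIT_TO_PS = {"ns": 1000.0, "ps": 1.0}


def to_ps(value):
    """Convert a string such as '5ns' or '500ps' to picoseconds."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(ns|ps)", value)
    if match is None:
        raise ValueError(f"expected number followed by ns or ps, got {value!r}")
    return float(match.group(1)) * UNIT_TO_PS[match.group(2)]


def read_timing_limits(spec_text):
    """Return the timing limits in ps, using the stated default when a statement is absent."""
    limits = dict(DEFAULTS_PS)
    for line in spec_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in DEFAULTS_PS:
            limits[fields[0]] = to_ps(fields[1])
    if limits["MinDelay"] > limits["MaxDelay"]:
        raise ValueError("MinDelay exceeds MaxDelay")
    return limits


spec = """\
AutoCTSRootPin clk_out/Y
MaxDelay 5ns
MinDelay 0ns
SinkMaxTran 500ps
BufMaxTran 500ps
End
"""
limits = read_timing_limits(spec)
print(limits)
```

Because MaxSkew is omitted from this shortened specification, the reader falls back to the 300 ps default.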

What is clock tree synthesis
How to create a clock tree specification file
Analyzing a CTS report
CTS Report
After running CTS on a design, a report is created containing information about the clock tree constructed. The report contains several sections.
Library Information: The process information used to create the clock tree.
Example
#
#
#
Complete Clock Tree Timing Report
CLOCK: cgen/i_5/Y
#
#
#
Mode: preRoute
Library Name : slow
Operating Condition : slow
# Process : 1
# Voltage : 1.62
# Temperature : 125

CTS Report (continued)
Clock Tree Structure Information: Gives details on the number of buffers, subtrees, sinks, and levels
Example
Nr. of Subtrees : 1
Nr. of Sinks : 343
Nr. of Buffer : 9
Nr. of Level (including gates) : 2
Max trig. edge delay at sink(F):
TPRAM/mod1/CK 477.7(ps)
Min trig. edge delay at sink(R):
TPRAM/mod2/CK 459.6(ps)

Delay, skew, and transition information
Example
                        (Actual)            (Required)
Rise Phase Delay      : 459.6~477.7(ps)     0~5000(ps)
Fall Phase Delay      : 432.8~446.7(ps)     0~5000(ps)
Trig. Edge Skew       : 18.1(ps)            250(ps)
Rise Skew             : 18.1(ps)
Fall Skew             : 13.9(ps)
Max. Rise Buffer Tran : 238.5(ps)           550(ps)
Max. Fall Buffer Tran : 141.4(ps)           550(ps)
Max. Rise Sink Tran   : 366.2(ps)           550(ps)
Max. Fall Sink Tran   : 204.5(ps)           550(ps)
Min. Rise Buffer Tran : 120(ps)             0(ps)
Min. Fall Buffer Tran : 120(ps)             0(ps)
Min. Rise Sink Tran   : 340.6(ps)           0(ps)
Min. Fall Sink Tran   : 192(ps)             0(ps)
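The skew figures in this report follow directly from the phase delay ranges, which the short check below reproduces. The numbers are copied from the example report; the helper names are hypothetical, and treating the trigger edge skew as equal to the rise skew simply mirrors what this particular report shows.

```python
def spread(low_ps, high_ps):
    """Skew of a delay range, rounded to the report's 0.1 ps resolution."""
    return round(high_ps - low_ps, 1)


def within(actual_range, required_range):
    """True if the actual range lies inside the required range."""
    return required_range[0] <= actual_range[0] and actual_range[1] <= required_range[1]


rise_phase_delay = (459.6, 477.7)   # ps, from the report
fall_phase_delay = (432.8, 446.7)   # ps, from the report
required_delay = (0.0, 5000.0)      # ps, the (Required) column
required_skew = 250.0               # ps, the (Required) column

rise_skew = spread(*rise_phase_delay)
fall_skew = spread(*fall_phase_delay)
delay_ok = within(rise_phase_delay, required_delay) and within(fall_phase_delay, required_delay)
skew_ok = rise_skew <= required_skew

print(rise_skew, fall_skew, delay_ok, skew_ok)
```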

Maximum transition time violation
Example
***** Max Transition Time Violation *****
Pin Name          (Actual)            (Required)
-----------------------------------------------------------------
reg/CK            [406 353.5](ps)     400(ps)
reg2/CK           [406 353.4](ps)     400(ps)
clk0__L6_I2/A     [345.5 288.1](ps)   300(ps)
clk0__L7_I4/A     [346.2 296.3](ps)   300(ps)
clk0__L9_I11/A    [351.6 299.9](ps)   300(ps)
clk0__L9_I10/A    [361.5 305.9](ps)   300(ps)
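A violation table like this one can be scanned with a short script to see by how much each pin misses its limit. The sketch assumes each row has the form "pin [value value](ps) limit(ps)" and that a pin is listed when either bracketed value exceeds the limit; the slide does not say which bracketed value is rise and which is fall, so the code only takes the larger of the two. The function name is hypothetical.

```python
import re

ROW = re.compile(r"(\S+)\s+\[([\d.]+)\s+([\d.]+)\]\(ps\)\s+([\d.]+)\(ps\)")


def worst_overshoot(line):
    """Return (pin, overshoot in ps) for one table row, or None if the line is not a row."""
    match = ROW.fullmatch(line.strip())
    if match is None:
        return None
    first, second, required = (float(match.group(i)) for i in (2, 3, 4))
    return match.group(1), round(max(first, second) - required, 1)


rows = [
    "reg/CK [406 353.5](ps) 400(ps)",
    "clk0__L9_I11/A [351.6 299.9](ps) 300(ps)",
    "-----------------------------------------------------------------",
]
results = [r for r in (worst_overshoot(row) for row in rows) if r is not None]
print(results)
```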

Skew distribution information
Example
cgen/i_5/Y delay[0 0] ( CK__L1_I0/A )
********** Skew Distribution **********
LEVEL 1 Buffer:
Input Delay Range Nr of Buffers
[0.6 0.6] 1
(max, min, avg, skew) = (0.6(ps) 0.6(ps) 0.6(ps) 0(ps))
-------------------------------------------------------
Output Delay Range Nr of Buffers
[195.5 195.5] 1
(max, min, avg, skew) = (195.5(ps) 195.5(ps) 195.5(ps) 0(ps))
LEVEL 2 Buffer:
-------------------------------------------------------
Input Delay Range Nr of Buffers
[212.8 212.8] 1
(max, min, avg, skew) = (212.8(ps) 212.8(ps) 212.8(ps) 0(ps))

What is clock tree synthesis
How to create a clock tree specification file
Analyzing a CTS report
Need for Low-Power Clocking

Clock distribution network takes a significant fraction of the power consumed by a chip.
Significant power can be wasted in transitions within blocks, even when their output is not needed.
Low-Power Clocking Technique
Gated clocks
Involves adding logic gates to the clock distribution tree
Prevents switching in the areas of the chip not being used
Exact savings are very design dependent, but around 20-30% is often achievable.
[Figure: clock source driving flip-flops (FF), with one branch marked as the gated clock section.]
Summary
Clock signal is needed to synchronize all memory elements in a chip.
Clock tree has to be created to provide clock signal with the least amount of skew and insertion delay.
Skew affects the amount of clock period available for the combinational logic.
Useful skew takes advantage of the difference of arrival time at flip-flops to correct datapath timing violations.
Clock tree has to provide an acceptable input transition to all the flip-flops.
Low-power designs make use of gated clocks.
True or false
1. Clock tree adds more wire into the design as compared to a clock mesh.
2. Propagated clock signal arrives at all flip-flops within a design at the exact same time.
3. Skew can be good or bad for a design.
4. A clock tree specification file is needed in the case of both manual and automatic CTS.
5. Default behavior of clock tree synthesis is to only place the clock buffers into the design and not route them.
Routing
Module 7
June 16, 2008
What Is the Difference?
[Figure: two versions of a circuit built from DFF, BUF, NAND, NOR, INV, and CKBUF cells, shown for comparison.]
Module Objectives
Analyze benefits of timing-driven versus congestion-driven routing
Predict downstream detail routing issues by running trial routing
Explain how trial routing works and its benefits
Explain what is meant by incremental routing, process antenna, and SI fixing
Discussion Question
Recall the flowchart diagram of the design flow steps to take an idea to product (chip)
In what part of the flow does routing occur?
[Figure: design flow chart with steps and inputs/outputs marked "?", asking where routing fits.]
Routing
Various types of routing
Comparison of different types of routing
Routing
Definition
Routing inputs and outputs
Router
Types of routers
Routing tracks
Goals of routing
Steps involved in routing
Congestion
What Is Routing?
After placement of the individual standard cells and macros, the connections between the pins of the cells need to be formed using metal wires and vias. All wires connecting the placed components have to obey the design rules.
Definition: Process of connecting the pins of the standard cells, macros, and IOs of a digital design using specific metal layers in the process technology to match the schematic.
Routing: Inputs and Outputs

Inputs
Verilog® netlist
Timing library (.lib) contains the timing information for each discrete logic gate or macro
Technology library (LEF) contains information about the routing layers and their rules
Physical cell library (LEF) contains information about the shape and connectivity of the technology library cells
Placement information such as a DEF file
Routing guides from clock tree synthesis (CTS) (optional)
Outputs
Routed design
Congestion table
[Figure: the netlist, tech file, DEF file, and physical library feed routing, which produces the routed design and the congestion table.]

Where Does Routing Fit into the Implementation Flow?
[Figure: implementation flow from RTL through logic synthesis to gates, then floorplanning, power planning, placement, and route to GDSII, with static timing analysis, test, and timing closure shown alongside place and route.]
Router
To handle the various cost functions and constraints of deep submicron layouts, the router needs the capability to handle
Variable wire widths
Variable spacing requirements
Shielding and interleaving
Minimum area rules
Process antenna rules
Types of Routers: Grid-Based
Most commonly used router, because it is fast and mature
Performs well for flat designs of less than 3 million gates and for 130 nm and larger designs
Used for block-based designs
Relatively high-speed router
Superimposes a mesh-like grid running horizontally and vertically over the routing area
Each vertical and horizontal grid intersection point on the mesh is maintained as a pointer in memory
The larger the design grows or the smaller the process geometry, the more grid points need to be allocated in memory and the more time it takes for routing
A trial router is a type of grid-based router used to quickly perform global and detail routing to estimate congestion and timing at the early stages of the physical implementation flow
Types of Routers: Shape-Based

Limited to small designs or top-level designs with approximately 20,000 to 30,000 nets
Does not need to adhere to the concept of a grid and so is not limited by grid restrictions
Preferred solution for top-level routing and can handle complex and custom requirements
Types of Routers: Graph-Based
Combines the performance characteristics of a grid-based router with the flexibility of a shape-based router
Fast tool capable of handling all aspects of routing complex multimillion-gate designs, both at block level and top level
Views a design similarly to a grid-based router in that there are grid lines in both the vertical and horizontal directions; however, it considers these grids only as a guideline for routing
Does not require that every grid intersection on the design be allocated a pointer in memory; only the grid points in the vicinity of the routing task are considered as needed
Through efficient memory handling, graph-based routers can handle significantly larger design sizes
Types of Routers
[Figure: router types plotted by speed and capacity against flexibility.]
Graph-based router: super threading, multi-CPU; 100-million-gate SoC designs with hierarchy; 65-nm variable pitch
Grid-based routers: designs of one million standard cells; best for flat 130 nm and above
Shape-based routers: 60–80K nets; structured custom (top level)

Routing Tracks
Metal routes must meet minimum width and spacing design rules to prevent open and short circuits during fabrication
In grid-based routing systems, design rules determine the minimum center-to-center distance for each metal layer
Congestion occurs if there are more wires to be routed than available tracks
A detailed routing track is the track for actual wire locations
A global routing track is a coarser track for global routing
[Figure: detailed routing tracks and the coarser global routing tracks.]
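The relationship between design rules, tracks, and congestion can be sketched numerically. This simplified model assumes the center-to-center pitch of parallel minimum-width wires is the minimum width plus the minimum spacing (real pitches may be larger, for example to accommodate vias), and all names and dimensions below are hypothetical.

```python
def routing_pitch_um(min_width_um, min_spacing_um):
    """Center-to-center distance between adjacent minimum-width wires."""
    return min_width_um + min_spacing_um


def available_tracks(region_width_um, pitch_um):
    """Number of whole tracks that fit across a routing region."""
    return int(region_width_um // pitch_um)


def is_congested(wires_needed, tracks):
    """Congestion: more wires to route than tracks available."""
    return wires_needed > tracks


# Hypothetical metal layer: 0.25 um minimum width and 0.25 um minimum spacing.
pitch = routing_pitch_um(0.25, 0.25)
tracks = available_tracks(10.0, pitch)
print(pitch, tracks, is_congested(24, tracks))
```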
Goals of Routing
Responsible for functionally connecting all signal nets, power nets, and buses in a design
Route the design quickly and be free from design rule check (DRC), layout versus schematic (LVS), and signal integrity (SI) errors
Effectively meet design for manufacturability and overall timing specifications
Steps Involved in Routing
Global routing
Assigns nets to specific metal layers and global routing cells
Tries to avoid congested global cells while minimizing detours
Tries to avoid prerouted power and ground signals, placement blockages, and routing blockages
Track assignment
Assigns each net to a specific track
Tries to avoid large number of vias
Operates on the entire design at once
Detail routing
Tries to fix DRC violations using a fixed-size, small area known as an SBox
Traverses the whole design box by box until the entire routing pass is complete
Search and repair
Fixes any shorts or violations that are present
What Is Congestion?
Congestion occurs when
Design is densely routed
More wires are needed at a location than the number of available tracks
Congestion is shown as red diamond-shaped markers after an initial trial route
Analyzing Congestion
Actions to consider:
Block placements can be adjusted to make sure that connecting pins face each other.
Check for obstructions that may cause the congestion in the area.
A partial placement blockage can be used to lower the congestion in a specific area.
Read the log files for congestion information during global route, as well as violation and iteration information during detail route.
Congestion Analysis Table

NanoRoute Groute congestion analysis table from the encounter.log file.
# Congestion Analysis:
#
#            OverCon        OverCon       OverCon       OverCon
#            #Gcell         #Gcell        #Gcell        #Gcell        %Gcell
# Layer      (1-2)          (3-4)         (5-6)         (7-17)        OverCon
# ---------------------------------------------------------------------------
# Metal 1    1625(2.35%)    34(0.05%)     0(0.00%)      0(0.00%)      (2.40%)
# Metal 2    11546(16.7%)   6353(9.19%)   4728(6.84%)   3787(5.48%)   (38.2%)
# Metal 3    8500(12.3%)    904(1.31%)    37(0.05%)     1(0.00%)      (13.7%)
# Metal 4    14951(21.6%)   764(1.11%)    20(0.03%)     0(0.00%)      (22.8%)
# Metal 5    8473(12.3%)    37(0.05%)     0(0.00%)      0(0.00%)      (12.3%)
# Metal 6    854(1.24%)     0(0.00%)      0(0.00%)      0(0.00%)      (1.24%)
# ---------------------------------------------------------------------------
# Total      45949(11.1%)   8092(1.95%)   4785(1.15%)   3788(0.91%)   (15.1%)
#
# The worst congested Gcell OverCon (routing demand over resource in number of tracks) = 17
Worst case: on Metal2
Note: Overflow/OverCon = (Demand – Supply) per gcell
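To make the OverCon columns concrete, the sketch below computes overflow per gcell and groups the results into the same ranges as the table header. It is only an illustration: the gcell demand/supply pairs are made up, the clamp to zero for uncongested gcells is an assumption, and this is not how NanoRoute itself is implemented.

```python
from collections import Counter

# OverCon ranges taken from the table header above.
BUCKETS = [(1, 2), (3, 4), (5, 6), (7, 17)]


def overcon(demand, supply):
    """Overflow of one gcell on one layer, in tracks (assumed clamped at zero)."""
    return max(0, demand - supply)


def summarize(gcells):
    """Return (count per bucket, percent of gcells over-congested, worst overflow)."""
    counts = Counter()
    for demand, supply in gcells:
        over = overcon(demand, supply)
        for low, high in BUCKETS:
            if low <= over <= high:
                counts[(low, high)] += 1
    percent_over = 100.0 * sum(counts.values()) / len(gcells)
    worst = max(overcon(d, s) for d, s in gcells)
    return counts, percent_over, worst


# Hypothetical (demand, supply) pairs for six gcells on one layer.
gcells = [(10, 12), (14, 12), (15, 12), (20, 12), (12, 12), (29, 12)]
counts, percent_over, worst = summarize(gcells)
print(dict(counts), round(percent_over, 1), worst)
```

A layer whose gcells cluster in the higher buckets, as Metal 2 does in the table, is the first place to look for routability problems.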

Violation Trends
#start 19th optimization iteration ...
# completing 10% with 98611 violations
# completing 20% with 98648 violations
…
# completing 100% with 98663 violations
# number of violations = 98663
#Complete Detail Routing.
#Total wire length = 559006793 um.
#Total half perimeter of net bounding box = 471147199 um.
#Total wire length on LAYER MT1 = 18600362 um.
…
#Total number of vias = 31662170
#Total number of vias on LAYER MT1 to MT2 = 11200471
…
#Total number of DRC violations = 98663
#Total number of violations on LAYER MT1 = 61945
…
Steps
Run the search and repair up to the 19th iteration.
Check the log file to see on which layers the violations occur.
Check the violations graphically if there are lots of violations (>1000).
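Checking the log for violation trends can be scripted. The sketch below parses only the line shapes shown in the excerpt above; the regular expressions, function names, and the "converging" test are assumptions for illustration, and a real log may contain other formats.

```python
import re

# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
# completing 10% with 98611 violations
# completing 20% with 98648 violations
# completing 100% with 98663 violations
#Total number of DRC violations = 98663
#Total number of violations on LAYER MT1 = 61945
"""

PROGRESS_RE = re.compile(r"#\s*completing (\d+)% with (\d+) violations")
TOTAL_RE = re.compile(r"#Total number of DRC violations = (\d+)")
LAYER_RE = re.compile(r"#Total number of violations on LAYER (\S+) = (\d+)")


def parse_route_log(text):
    """Collect progress samples, the total violation count, and per-layer counts."""
    progress, per_layer, total = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        match = PROGRESS_RE.match(line)
        if match:
            progress.append((int(match.group(1)), int(match.group(2))))
            continue
        match = TOTAL_RE.match(line)
        if match:
            total = int(match.group(1))
            continue
        match = LAYER_RE.match(line)
        if match:
            per_layer[match.group(1)] = int(match.group(2))
    return progress, total, per_layer


def is_converging(progress):
    """True if the violation count fell between the first and last sample."""
    return len(progress) >= 2 and progress[-1][1] < progress[0][1]


progress, total, per_layer = parse_route_log(SAMPLE_LOG)
worst_layer = max(per_layer, key=per_layer.get)
print(total, worst_layer, is_converging(progress))
```

For the excerpt, the count rises slightly during the iteration, so the run is not converging, and MT1 carries most of the violations.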

Routing
Various Types of Routing
Trial route
Global routing
Detail routing
Timing-driven routing
Congestion-driven routing
Incremental routing
Process Antenna Effect (PAE)-aware routing
SI-aware routing
Clock routing
Super-threading routing
Diagonal routing
What Is Trial Route?

Performs quick global and detailed routing to
Estimate and view routing congestion: Produces a congestion map
that is viewed to get early feedback on whether a design is routable
Estimate parasitic values for optimization and timing analysis: Creates
actual wires to get good representation of RC and coupling
Trial Route Effort Level
Prototyping
Runs quickly to gauge the feasibility of the netlist
Medium effort
Default selection
Components in the design might not be routed at legal locations
High effort
For additional iteration to lower congestion
Low effort
For quick routing, and it completes without congestion detouring
At this effort level, you throw away the route information
This mode is typically used only when you partition the design
Use “Prototype” or “Low” effort if you want to have a very quick look at
the routability of the design. Use “Medium” effort in most cases, and
“High” effort if “Medium” is showing congestion.
Trial Route
Advantages
Routes the design quickly
Estimates congestion and parasitic data early in the design cycle
Creates a congestion table showing the amount of congestion in each metal
layer
Disadvantages
Does not fix DRC violations or give DRC clean routing results
Routes are only used to estimate parasitic values for timing analysis and
not signal integrity analysis
Cannot fix timing violations
What are the benefits of running trial route?
What issues can be predicted by running trial route?
What Is Global Routing?

Guides the detailed router in large designs
Creates a coarse routing plan for detailed router to follow
Does not create actual routing wires
May perform quick, initial detail routing
Commonly used in cell-based design, chip assembly, and datapath
Also used in floorplanning and placement
Global Routing Goals
Minimize the wire length
Total wire length calculated by global router should be within a few
percentage points of that estimated by the placer
Minimize worst congestion value
Congestion value is associated with each boundary crossing (edge)
between adjacent global routing cells (gcells) on a specific layer
Optimize routes for timing and signal integrity
Tries to meet hold and setup timing
Minimizes design rule violations
Global Routing Steps

Router breaks the routing portion of the design into rectangles called gcells
Router then assigns the signal nets to the gcells
Router attempts to find the shortest path through the gcells
No actual connections are made
No nets are assigned to specific tracks within the gcells
[Figure: grid of gcells with a net routed from Start to End]

Global Routing Steps (continued)
Tries to avoid assigning more nets to a gcell than the tracks can
accommodate
[Figure: grid of gcells with four nets, each routed from its Start to its End]
Global Routing Steps (continued)

Router then generates a map of the gcells (congestion map)
Congestion map uses colors to indicate whether there are too few, too many,
or the correct number of nets assigned to the gcells
gcells are marked over-congested if router assigns too many nets to a gcell
[Figure: grid of gcells with four nets routed from Start to End; one gcell edge
has 3 crossings and another has 2 crossings]
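The global routing steps above can be sketched as a shortest-path search over a grid of gcells in which crossing an already full edge costs extra. This is a teaching sketch only: the cost function, the `penalty` value, and the grid size are assumptions, and production global routers use far more elaborate algorithms.

```python
import heapq


def edge_key(a, b):
    """Canonical key for the boundary (edge) between two adjacent gcells."""
    return (a, b) if a <= b else (b, a)


def route_net(cols, rows, start, end, usage, capacity, penalty=10):
    """Dijkstra search through gcells; a full edge costs `penalty` extra.

    No wires are created: the result is only the list of gcells the net
    passes through, and the edge crossing counts in `usage` are updated.
    """
    best = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == end:
            break
        if cost > best[cell]:
            continue
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < cols and 0 <= nxt[1] < rows):
                continue
            full = usage.get(edge_key(cell, nxt), 0) >= capacity
            new_cost = cost + 1 + (penalty if full else 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = cell
                heapq.heappush(heap, (new_cost, nxt))
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    for a, b in zip(path, path[1:]):
        usage[edge_key(a, b)] = usage.get(edge_key(a, b), 0) + 1
    return path


def overfull_edges(usage, capacity):
    """Edges whose crossing count exceeds capacity, with their overflow."""
    return {e: n - capacity for e, n in usage.items() if n > capacity}


# Hypothetical 3x2 gcell grid, one crossing allowed per edge.
usage = {}
path1 = route_net(3, 2, (0, 0), (2, 0), usage, capacity=1)
path2 = route_net(3, 2, (0, 0), (2, 0), usage, capacity=1)
path3 = route_net(3, 2, (0, 0), (2, 0), usage, capacity=1)
print(path1, path2, path3, overfull_edges(usage, 1))
```

The first net takes the direct path, the second detours around the full edges, and the third has no uncongested option left, so two edges end up over capacity. Those are the edges a congestion map would highlight.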

Congestion Map GUI
Congestion maps from trial route and global route are displayed differently
[Figure: Trial Route View and Global Route View]
What Is Detail Routing?

Connects all pins in each net
Must understand most or all design rules
Necessary in all applications
Goal is to complete all of the required interconnects without violations
All nets will be routed, even if they contain violations (it is better to
have a route with a violation than no route at all)
Detail Routing Steps
Router divides the chip into areas called switch boxes (SBoxes)
SBoxes align with gcell boundaries
Router follows global routing plan
Lays down actual wires that connect the pins to the corresponding nets
Creates shorts or spacing violations rather than leave unconnected nets
Router runs search and repair
Locates the shorts and spacing violations
Reroutes affected areas to eliminate as many violations as possible
Runs post-route optimization
Runs rigorous search-and-repair steps
Stops once it cannot make further progress on routing the design
Timing-Driven Routing
Routing along the timing-critical path is given priority
Creates shorter and faster connections along the critical path
Non-critical paths are routed around critical paths
Reduces routing congestion problems for critical paths
Does not adversely impact timing of non-critical paths
Input files needed for timing-driven routing
Physical libraries in LEF
Timing library in .lib format
Timing constraints in .sdc format or a timing graph
Extended capacitance table
Verilog Netlist
Placed design in DEF

Congestion-Driven Routing
Router tries to reduce congestion
Routing occurs based on a cost function
Congestion reduction is given the highest priority
Nets that are in the congested area are spread apart and routed
through other areas
Why would you run timing-driven routing?
Why would you run congestion-driven routing?
What are the tradeoffs between the two?
What if your design is both congested and not meeting timing? Which
routing type would you run first and why?
What Is Incremental Routing?
Provides an incremental rip-up and reroute capability
Reroutes partial routes and nets without routes
Retains fully prerouted nets and pin-to-pin paths
Might use dangling paths to complete routes, but removes dangling
wires left over from global routing
Keeps connectivity within the bounding box, but does not constrain
layers or positions
The router might change the routing path of another net and route it on a
different layer or in a different position.
The router does not support re-routing of wires with the FIXED keyword.
Change FIXED to ROUTED to reroute these wires.
PAE-Aware Routing
During manufacturing, static charge builds up on metal traces.
Metal with static charge accumulated on it, when connected, will discharge
onto a gate, passing high current through it.
The discharge can damage the oxide that insulates the gate and cause
the chip to fail.
The antenna ratio is the ratio of metal area to gate area; the design rules
set its maximum allowable value.
The router calculates antenna ratio to determine the extent of PAE.
Process antenna violations are fixed when the router finds a net with
an antenna ratio for a specified layer that exceeds the maximum
allowed value.
Router fixes process antenna violations by
Inserting diodes to provide alternate path to discharge static charge and
protect the gate
Changing (jogging) the routing layers connected to a gate to decrease the
area of a metal layer connected to a gate to meet the antenna ratio
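The antenna check described above amounts to comparing a per-layer ratio against a limit. The sketch below shows that comparison under simplifying assumptions: the ratio is computed per layer as metal area divided by gate area, and the net names, areas, and the limit of 400 are all hypothetical. Real rule decks can define the ratio differently (for example, cumulatively across layers).

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Ratio of metal area connected to a gate to the area of that gate."""
    return metal_area_um2 / gate_area_um2


def find_violations(nets, max_ratio):
    """Return (net, layer, ratio) for every layer whose ratio exceeds the limit.

    nets maps a net name to {layer: (metal_area_um2, gate_area_um2)}.
    """
    violations = []
    for net, layers in nets.items():
        for layer, (metal_area, gate_area) in layers.items():
            ratio = antenna_ratio(metal_area, gate_area)
            if ratio > max_ratio:
                violations.append((net, layer, ratio))
    return violations


# Hypothetical nets and limit, for illustration only.
nets = {
    "n1": {"M1": (80.0, 1.0), "M2": (500.0, 1.0)},
    "n2": {"M1": (120.0, 1.0)},
}
violations = find_violations(nets, max_ratio=400.0)
print(violations)
```

Jogging part of n1 from M2 to a higher layer would reduce the M2 area seen by the gate, which is the second repair method listed above.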

Tradeoffs of Fixing PAE
Reverse biased diode insertion
Causes leakage
Increases area
Timing penalties
Bridging (breaking antenna by hopping to higher layer)
Extra wiring
Congestion
More vias are created
What Are SI Effects?

Nanometer designs (130 nm or less) suffer from increased sensitivity
to signal integrity (SI) effects such as
Crosstalk-induced delay changes
Functional failures caused by crosstalk glitches
Caused by
Coupling capacitance
Decreased interconnect pitch and feature size
Higher clock frequencies
Lower supply voltages
SI-Aware Routing
Crosstalk effects such as glitch and delay are measured after the
physical wires are made available.
Router tries to reduce crosstalk between wires.
Creates routes with reduced coupling capacitances by
Parallel wire minimization: Limiting the distance that two wires travel
adjacent to each other
Layer switching: Changing the track assignment for a wire so that potential
victim nets can be moved away from a strongly driven signal net
Net shielding: Using power and ground lines to shield critical high-speed
signals such as clocks
Track reassignment: Assign tracks to parallel wires that are further apart
with in-between tracks assigned to shorter, less noise-sensitive wires
Soft spacing: Making use of available free space to spread wire segments
apart
Why would you use incremental routing?
For process antennas, how is the router constrained to fix these?
How can a router make choices that will reduce the effect of signal
integrity?
Clock Routing
Usually routed same way as signals, but we can choose to route clock
nets by themselves before routing other nets.
Clock nets given priority during global routing.
Clock nets are routed as straight as possible.
When clock nets are routed, one track of spacing can be added
around these nets to reduce coupling capacitance.
Shielding can also be added to clock nets for additional signal integrity
protection.
Clock routes can be marked as fixed, so that post-route optimization
will not reroute the clocks and alter the clock skew, timing, etc.
Wide Wire Routing

Routing is done with wider wires for post-route yield optimization.
Router widens wires where resources are available.
Does not add DRC violations
Does not affect timing
Does not add antenna violations
Wire widening uses non-default rules.
Super-Threading Routing
Portions of the design flow can be accelerated using multiple-CPU
processing. There are three modes:
Multiple threading
Job is divided into several threads
Multiple processors in a single machine process each thread concurrently
Distributed processing
Job is processed by two or more networked computers running concurrently
Super threading
Combination of multithreading and distributed processing
Delivers scalable performance and capacity
[Figure: Job, Thread, and Processor diagrams for Multiple Threading,
Distributed Processing, and Super Threading]
Super-Threading Routing (continued)

Combines advantages of multi-threaded routing with flexibility of
distributed parallel routing
Boosts routing performance on 600K to 400M gate designs by up to 10X
Reduces design cycles significantly without sacrificing quality
Tasks are partitioned among different CPUs automatically by the router
Speedup is nearly linear as the number of CPUs grows
What Is Diagonal Routing?
Some routers take advantage of 45-degree “diagonal” routes on
certain metal layers.
M1 and M2 are “orthogonal” so that the connections to the standard cells
are preserved, while M7 and M8 (the top layers) are also orthogonal for
power grid creation.
The middle layers can be 45 degrees offset and alternate direction between
metal layers.
[Figure: layer stack, top to bottom]
M8 - Vertical
M7 – Horizontal
M6 – 45 Degree Left
M5 – 45 Degree Right
M3 – 45 Degree Right
M2 - Vertical
M1 - Horizontal
Diagonal Routing: Pros and Cons

Pros
Can achieve good routing quality, avoid routing congestion, and
improve timing
May decrease the overall area of the design because of the routing
efficiency
Cons
Must have a special library and vendor who will accept the diagonal routes
Must have special tools for place/route, as well as physical verification
[Figure: layer stack, top to bottom: M8 - Vertical, M7 – Horizontal,
M6 – 45 Degree Left, M5 – 45 Degree Right, M3 – 45 Degree Right,
M2 - Vertical, M1 - Horizontal]

Routing
Global Route vs. Detail Route
Global Route
Runs on the entire design.
Finds generalized pathways without laying down actual wires.
Iterative passes are made to optimize global routing, shorten wire length,
and minimize use of vias.
Congestion map is updated.
Detail Route
Can route the entire design, an area, or selected nets.
Lays down physical wires based on the global routing plan.
Fixes DRC violations during search and repair routing.
If antenna rules are included in the LEF, antenna repair will also be done
during detail routing.

Timing-Driven vs. Congestion-Driven Routing
Timing-Driven
Router routes critical nets to meet timing constraint.
A critical net will be forced to be routed as short as possible.
May create congestion if many critical nets have to be forced into a small
channel.
Congestion-Driven
Router routes nets keeping low congestion as a high priority.
Nets will be forced to be spread apart from a heavily congested area.
Summary
After placement, a trial route is run to get an estimate of congestion
and parasitic values.
Early detailed routing provides physical information necessary for
prevention of problems for physical synthesis.
Congestion is displayed as a red diamond after trial route, and colored
lines after global route.
Clock nets are routed first and fixed into position so that the router
does not alter them in subsequent runs.
The number of wires assigned to a gcell should not exceed the
number of tracks available.
True or false
1. Congestion map is created after running detail routing.
2. Shape based routers are limited to small designs and are the
preferred solution for top-level routing.
3. Global router provides guidance to the detailed router.
4. Detailed routing stops once the congestion map is created.
5. Super threading increases the runtime for the routing phase.
Learning Activity
Study metrics from several routing log files
Identify potential downstream issues
Power Consumption and
Power Grid Analysis
Module 8
June 16, 2008
How Has Power Influenced Technology?
Module Objectives
In this module, you will
Identify the inputs and outputs of power-consumption and power grid
analysis tools
Explain the three components of power (leakage, switching, internal)
Articulate the difference between static and dynamic power
consumption
Identify the types of power grid analysis, the difference between static
and dynamic power grid analysis, and what each is used for
Recognize low-power design issues and apply three power-saving
design techniques
What affects power consumption in a chip?
How does power affect the cost of a chip?
Power consumption and analysis (PowerMeter power calculation
functionality)
Low-power design
Power grid analysis (VoltageStorm® power and power rail verification)
Power Consumption and Analysis

What is power consumption?
Inputs and outputs for power consumption calculation
Static power consumption
Dynamic power consumption
What Is Power Consumption?
Power consumption is a critical design criterion. Today, for most
system-on-a-chip (SoC) designs, the power budget is one of the most important
design goals of the project.
Definition: Power consumption is the amount of energy over time that
must be supplied to a circuit to maintain normal operation. Power
consumption is measured in watts (W).
Example: The increasing speed and complexity of today’s
microprocessor chips have resulted in a significant increase in power
requirements, which determine the battery life in hours for portable
devices.
Why Is Power Consumption an Issue?

Exponential increase in chip density
Tens of millions of gates implemented on a reasonably small die
Increase in power density and total power dissipation
Limits of what packaging, cooling, and other infrastructure can support
exceeded
Battery life has declined as features have been added faster than power
(per feature) has been reduced.
Deep submicron technology, 90 nm and below
Leakage current is increasing dramatically.
Microprocessor chips can dissipate up to 100-150W of power.
Power density causes large number of local hot spots on the die.
Poses reliability problems (mean time to failure decreases exponentially with
temperature)
Timing degrades and leakage increases with temperature
These problems are all expected to get worse as we move to the next
technology nodes.
[Figure: Leakage Power (W) and Active Power (W) versus Technology (nm) at
250, 180, 130, 90, and 70; Power (W) axis from 0 to 250. Source: Intel]

Benefits of Reducing Power Consumption
Reducing system power consumption
Extends battery life in portable systems
Reduces system temperature
Improves timing
Reduces leakage
Reduces system fan noise (on some models)
Provides better reliability
Lowers cooling cost
Simplifies power supply and delivery
Components of Power Consumption

Static power component due to leakage
Leakage power: Power consumed when cells are not switching
Dynamic power component: Related to charging and discharging of load
capacitance and due to a path from Vdd to ground
Switching power: Power consumed through charge and discharge of gate
capacitance. The total gate capacitance consists of the sum of the capacitance of
internal gate nodes and capacitance of the gate output load.
Short circuit power: Power consumed when both N and P devices are ON at the
same time. Current path established from power rail to ground. It is a function of
output load and input slew.
Ptotal = Pstatic + Pdynamic
What Is a Power Library?

Definition: A power library is a collection of cells described in a particular
format to represent the power characteristics for those cells.
Example: The ASIC (Application Specific Integrated Circuit) vendor of a
design library (standard cells and macros) provides its customer with a Liberty
(.lib) version of their cells, which apart from timing information, contains power
information that the power analysis tool can use to calculate leakage and
active power consumption for the cells.
Example: power information in a .lib file
cell (INVXL) {
  cell_footprint : inv;
  area : 6.6528;
  pin(A) {
    direction : input;
    capacitance : 0.00270;
  }
  pin(Y) {
    direction : output;
    capacitance : 0.0;
    function : "(!A)";
    internal_power() {
      related_pin : "A";
      rise_power(energy_template_7x7) {
        index_1 ("0.0250, 0.0800, 0.3000, 0.7000, 1.2000, 1.7000, 2.3000");
        index_2 ("0.00018, 0.01050, 0.01925, 0.04200, 0.07350, 0.11550, 0.15575");
        values ( ……… );
      }
      fall_power(energy_template_7x7) {
        index_1 ("0.0250, 0.0800, 0.3000, 0.7000, 1.2000, 1.7000, 2.3000");
        index_2 ("0.00018, 0.01050, 0.01925, 0.04200, 0.07350, 0.11550, 0.15575");
        values ( ……… );
      }
    }
    timing() {
      …
    }
    max_capacitance : 0.15575;
  }
  cell_leakage_power : 0.0173;
}

Inputs and Outputs for Power Consumption Calculation
Power libraries provide power analysis tool with the following information:
Functional information (.lib)
Pin capacitances (.lib)
Leakage power (.lib)
Internal power tables (.lib)
Internal decoupling cap (.cl)
Physical size and location of power ports (.cl)
Internal power net resistance (.cl)
Tap currents (.cl)
Function of a power analysis tool
Calculates instance-based static and dynamic power consumption
Runs in two modes:
Vector driven: Use actual switching activity from a VCD file
Vector-less: Probabilistically project the activity throughout a design
Produce reports on the power consumed by each cell, cell type, or
hierarchical block in the design
We use the results of the power consumption tool to perform static and/or
dynamic power grid analysis
[Figure: .libs, .cl, SPEF, SDC, TWF, VCD feed the Power Consumption Tool,
which outputs Power Consumption to the Power Grid Analysis Tool]
What Is Switching Activity?

Switching activity (α) is the number of transitions (0-to-1 and 1-to-0)
for every net in a circuit when input stimuli are applied.
Within a given CLK period, how often will an input switch in
relationship to that period?
[Figure: waveforms of clock cycles and Net A]
In the example, net A switches two times but clock switches six times.
Activity of the clock is set at 1 since the clock is always switching.
Calculation would be 2/6 = .333 (net A’s activity).
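The 2/6 calculation above can be reproduced by counting transitions in sampled waveforms. The sample lists below are hypothetical stand-ins for the figure, chosen so that the clock makes six transitions and net A makes two.

```python
def count_transitions(samples):
    """Count 0-to-1 and 1-to-0 changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def activity(net_samples, clock_samples):
    """Switching activity of a net relative to the clock (clock activity = 1)."""
    return count_transitions(net_samples) / count_transitions(clock_samples)


# Hypothetical sampled waveforms over the same time window.
clock = [0, 1, 0, 1, 0, 1, 0]  # six transitions
net_a = [0, 0, 1, 1, 1, 0, 0]  # two transitions
alpha = activity(net_a, clock)
print(round(alpha, 3))
```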

Input and Output, Format
Input
Gate-level netlist in the Verilog® language and/or DEF (tool dependent)
Power characterized libraries in tool-specific format
Timing libraries in Liberty (.lib) format
Timing constraints in SDC format
Extraction data in SPEF format
Timing windows file (TWF)
Value-change-dump file (VCD)
Output
Textual output
Reports on the power consumed by each cell or block (.pwr file)
Graphical output
Instance-based power and power density
Power consumption of the clock distribution network
[Figure: Gates + SDC, SPEF, DEF, TWF, VCD, Logical Libraries, and Power
Libraries feed Power Analysis, which produces Reports]
Static Power Consumption
Pstatic = VDD x Ileakage
Silicon devices are not ideal switches.
Static power dissipation is the power that is lost while circuit signals are not
actively switching.
This power dissipation includes leakage and standby power dissipation (i.e.,
leakage power when voltage is applied even if circuit is not switching).
Static power consumption is the summation of leakage, state dependent
leakage, and averaging of internal and switching over time.
What Is Static Power Analysis?

Static power analysis is the calculation of leakage power
Computes average power consumption based upon various
assumptions
Much faster than simulation
It is a full-chip and instance-based power consumption analysis
Less accurate than simulation
Hard to model real delays
Probabilities model the environment in a less accurate way than
simulation vectors
How Is Static Power Analysis Done?
No simulation is done to determine actual net activity.
Vector-independent (probabilistic activity-based with optional VCD)
By understanding the logic functionality and the activity at
the input pins, the activity at the output pins is predicted.
Analysis types
Area-based
A power per unit area is assumed and multiplied by the total die area
Easy, but not very accurate and is used in floorplanning
Cell-based
Power for each cell is taken from the library entry
More accurate and is used by synthesis tools prior to place and route
Instance-based
Takes into consideration output load of each instance
Calculates power from library tables
Most accurate, but requires information from place and route
[Figure: .libs, .cl, SPEF, SDC, TWF, VCD feed the Power Consumption Tool,
which outputs Power Consumption and Static Power Analysis Reports]
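Predicting output activity from input activity, as described above, can be illustrated with a common textbook model: propagate the probability that each signal is 1 through the gate function, then estimate toggles per cycle. The sketch assumes statistically independent inputs and independent consecutive cycles, which real circuits often violate; it is not a description of how any particular tool computes activity, and the function names are hypothetical.

```python
def p_not(p):
    """Probability that an inverter output is 1."""
    return 1.0 - p


def p_and(a, b):
    """Probability that a 2-input AND output is 1 (independent inputs assumed)."""
    return a * b


def p_or(a, b):
    """Probability that a 2-input OR output is 1 (independent inputs assumed)."""
    return 1.0 - (1.0 - a) * (1.0 - b)


def toggle_rate(p):
    """Expected transitions per cycle if consecutive cycles are independent."""
    return 2.0 * p * (1.0 - p)


# Primary inputs a, b, c are each 1 half of the time (assumed default activity).
p_a = p_b = p_c = 0.5
p_y = p_and(p_a, p_b)  # y = a AND b
p_z = p_or(p_y, p_c)   # z = y OR c
print(p_y, toggle_rate(p_y), p_z, toggle_rate(p_z))
```

Even in this tiny example the AND output toggles less often than its inputs, which is why a single default activity applied to every net tends to misestimate power.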
Dynamic Power Consumption
Pdynamic = α x CL x VDD^2 x f
Where,
α – Switching activity
CL – Load capacitance
f – Operating frequency
A circuit does not draw constant current. Current draw increases when a cell
is switching.
Dynamic power consists of power dissipated inside a cell (mostly due to
short-circuit current during switching) and power dissipated to
charge/discharge net capacitance.
Dynamic power is a function of voltage, toggle rate, and net loading.
Dynamic power consumption is the power of each instance over time, taking
into account simultaneous switching activity.
Timing Window File (TWF) provides windows when nets are switching relative
to clock edges; default input activity or VCD provides switching activity (toggle
rate).
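The dynamic power formula above, together with Pstatic = VDD x Ileakage and Ptotal = Pstatic + Pdynamic from the earlier slides, can be evaluated directly. The numbers below are hypothetical values for a single net and cell, chosen only to show the units working out to watts.

```python
def dynamic_power(alpha, c_load_farads, vdd_volts, freq_hz):
    """Pdynamic = alpha x CL x VDD^2 x f, in watts."""
    return alpha * c_load_farads * vdd_volts ** 2 * freq_hz


def static_power(vdd_volts, i_leak_amps):
    """Pstatic = VDD x Ileakage, in watts."""
    return vdd_volts * i_leak_amps


def total_power(p_static, p_dynamic):
    """Ptotal = Pstatic + Pdynamic, in watts."""
    return p_static + p_dynamic


# Hypothetical values: activity 0.5, 10 fF load, 1.2 V supply, 500 MHz, 10 nA leakage.
p_dyn = dynamic_power(0.5, 10e-15, 1.2, 500e6)
p_stat = static_power(1.2, 10e-9)
p_tot = total_power(p_stat, p_dyn)
print(p_dyn, p_stat, p_tot)
```

Because VDD is squared, lowering the supply voltage reduces dynamic power faster than lowering frequency or activity by the same factor.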
What Is Dynamic Power Analysis?

Computes actual power consumption using actual net activity derived
from a simulation
Best analysis since it takes into account that not all nets are driven at the
same frequency
Dependent on the actual test vectors used to derive the net activity
Requires significant CPU time in simulation
Gate-level analysis
Net activity information from simulation vectors
Time-based input slew and output load for each cell
Cell power characterization from the library
Usually performed during analysis since it is faster, but not very accurate
Transistor-level analysis
Simulation vectors for at least the I/Os (such as running SPICE on a full
design)
Performed at signoff since it takes a long time, but is very accurate
What if vectors are not available for simulation?

How Is Dynamic Power Analysis Done?
Simulation with a representative set of vectors
Derived by designer
Vector based
Uses VCD for switching activity and timing and TWF for input slews
Most accurate solution if “right” vectors are provided by user
Vector-less
Uses TWF for input slews and timing
Best approach to obtain full-chip transient information
Transistor level
Very accurate
Much faster than SPICE
Gate level
Faster than transistor level
Still very accurate due to good modeling of power dissipation at cell level
[Figure: Simulation produces Toggle Rates for Simulation Driven Power
Analysis, which produces a Power Report that goes to the Power Grid Analysis
Tool]
Difference between Static and Dynamic Power Analysis
Static Power Analysis
It is the average power over time for each instance, resulting in one power
number.
Calculates average IR drop.
Static IR drop analysis is a first-order approximation that uses the total
power dissipation to calculate a constant current draw.
Is a fast process available early in the design phase and provides correct
information on power grid issues.
Dynamic Power Analysis
It is the average or peak power over time resulting in current waveforms,
i.e., at each time step across the simulation window.
Calculates the worst case IR drop transients.
Dynamic IR drop analysis deals with the voltage drop of current surges.
Provides visibility of simultaneous switching, decap optimization to control
leakage power, and the effect of packaging.

Static vs. Dynamic IR Drop Analysis
[Figure: Static (average) IR drop map compared with Dynamic (worst-case) IR
drop map; 17 mV increase in IR drop due to switching]
Review Questions
1. What are the three components of power consumption?
2. What is the purpose of a power library?
3. What is the difference between static power consumption and dynamic
power consumption?
Power consumption and analysis
Power grid analysis
Low-power design
Power Grid Analysis

What is power grid analysis?
Inputs and outputs of power grid analysis
Tasks of power grid analysis
What Is a Power Grid?
IC power distribution systems are designed to provide needed voltages and
currents to the transistors that perform the logic functions of a chip.
Definition: The power grid is the network of wires that distributes the
needed voltages and currents evenly throughout the chip to ensure correct
logic functioning.
Example: In our design, we over-engineered our power grid to avoid IR-drop
problems, but in doing so, we did not have enough resources to
properly route our design.
[Figure: chip floorplan with numbered blocks surrounded by alternating power
(V) and ground (G) pads]
What Is Power Grid Analysis?

Voltage drops occur in the power distribution network because of interconnect
resistance.
Power grid analysis evaluates how power is distributed from the voltage
source to the transistors and gates in the design.
It is the analysis of the power grid and not power consumption in a design.
[Figure: voltage source feeding VDD_1 and VDD_2 through the resistance of
the interconnect]

What Is the Purpose of a Power Grid Analysis?
Checks the integrity of the supply voltage
Detects voltage (IR) drops on VDD nets
Detects ground bounce on VSS nets
Reduces the effect of affected nets on a design's overall timing and
functionality
Reduces cause of silicon failure
Reduces electromigration (EM) effects
How Is Power Grid Analysis Done?

Power grid analysis at transistor-level
Transistors are modeled as current sources attached to the power grid.
A tap current (currents arising from transistor to power grid connection) data file
provides the details for each current source.
These currents are used to perform either a simple steady-state analysis or a
dynamic analysis of the power grid.
[Figure: transistors drawn as current sources attached to VDD]
How Is Power Grid Analysis Done? (continued)
Power grid analysis at cell/gate-level
The current distribution within a cell or a block is done on an instance-by-instance
basis.
An instance-based power consumption file or current data file supplies the power
consumed on an instance-by-instance basis in Watts.
Current source applied as black box or gray box.
[Figure: VDD connection shown in Port view and Detailed view]
Power Grid Analysis

Input
Gate-level netlist in the Verilog language + DEF
Power grid cell view library
Power consumption data
Output
Graphical display
Plots
Reports
[Figure: LEF/GDSII feeds the Library view Generator, which creates the Power
Grid View Library; .lib, SPEF, etc. feed the Power consumption analysis tool,
which creates Power Consumption data; both, together with DEF/GDSII, feed
the Hierarchical power-grid analysis tool, which produces Analysis Results]
Typical Sequence to Run Power Grid Analysis

Create the power grid view libraries or get them from your library provider
Create the top-level DEF/GDS of your design
Create power consumption data
Provide power consumed on per-instance basis
Provide power consumed on a per-cell basis
Area-based power distribution based on total number
More data for cells = more accurate power consumption data
Run power grid analysis
Link to the power grid view libraries
Load the power consumption data
Set up and run the analysis
[Figure: LEF/GDSII feeds the Library view Generator, which creates the Power
Grid View Library; .lib, SPEF, etc. feed the Power consumption analysis tool,
which creates Power Consumption data; both, together with DEF/GDSII, feed
the Hierarchical power-grid analysis tool, which produces Analysis Results]
Power Grid Analysis
Tasks of Power Grid Analysis

IR drop and ground bounce
Electromigration (EM)
Current density
Joule heating or wire self-heat (signal nets)
Hot electron effects
Callout on the slide: Neither of these two are handled by power grid analysis
but are factors that affect the overall power analysis
What Is Voltage (IR) Drop and Ground Bounce?
IR drop
Voltage drops caused by current flowing from the power source through the
resistive power network to the on-chip devices are called IR drop.
Ground bounce
Voltage spikes caused by current flowing from on-chip devices through the
resistive ground network to the ground pins (or bumps)
IR drop and ground bounce combine to impact silicon performance.
[Figure: CLK path with supply levels VDD = 1.20V, VDD = 1.17V, and
VDD = 1.1V at different points]
IR Drop Impacts on Setup and Hold Time

In the case where the IR drop occurs within the signal path, the signal is
slowed, potentially causing setup time violations for this signal path.
[Figure: Setup Time Violation. Latch-to-latch path with IR drop on the DATA
path; waveforms of CLK, DATA, and DATA + IR drop relative to the Setup window]
In the case where IR drop occurs on a clock buffer, the clock signal beyond
this buffer is slowed, potentially causing hold time violations for all signals
clocked by this clock branch.
[Figure: Hold Time Violation. Latch-to-latch path with IR drop on the CLK
path; waveforms of CLK, DATA, and CLK + IR drop relative to the Hold window]

How Does a Power Rail IR Drop Occur?
1. Input signal switches.
2. Load capacitance charges up.
3. Current through a resistor causes voltage drop (Ohm’s Law).
4. IR drop reduces operating voltage and impacts circuit performance.
[Figure: circuit drawing current from VDD, with the supply annotated 1.2V and 1.1V]
Example Colors for IR Drop
[Figure: IR drop color scale running from VDD (3.300 volts) to VSS (0.0 volts). Marked levels: 3.266 volts, 3.062 volts, and 3.000 volts. Bands: Color 8 (top) through Color 1 (bottom). Callouts: "Below transistor operating voltage" and "Incremental values for color 2 - 7 also".]

IR Drop Example for Chip

[Figure: chip-level IR drop plot with three callouts]
Abrupt IR drop color change shows locations where there is a discontinuity in the power grid.
These RAMs connect well to the power grid.
These RAMs do not connect well to the grid.
What Is Electromigration?
Electromigration is a wear-out mechanism of metal wires.
Metal atoms migrate over a period of time, causing open circuits, short circuits, or unacceptable increases in resistance.
There are two main causes of electromigration failure:
  High (DC) current densities
  Joule heating, which is caused by high alternating currents
These wear-out mechanisms can take extended periods of time.
[Figure: metal wire showing a void (open) and migrated ions (short hazard)]

Causes of Electromigration
Electromigration is mechanical failure in the wire caused by frequently varying thermal conditions.
As pulses go through the wire, the power dissipated by the wire causes it to heat above oxide temperature.
The difference in the thermal constants between the oxide and the wire causes mechanical stress, and the wire can eventually fail, resulting in chip failure in the field.
EM failures as seen through a scanning electron microscope (SEM)
[Figure: FESEM micrograph of aluminum lines exhibiting classic electromigration voiding. Hillocks formed in a Cu line during electromigration test. Source: www.nd.edu]
Electromigration Damages
[Figure: micrograph showing voids and a hillock. Source: www.diei.unipg.it/RICERCA/www_em/voidhill.gif]

High (DC) Current Densities
Physical migration of metal atoms due to “electron wind” can eventually create a break in a wire.
MTTF (mean time to failure) $\propto 1/J^2$, where $J$ = current density (Black's Equation)
Current density must not exceed specification -> wire $I_i / w_i < J_{spec}$
Specified as mA per μm wire width (e.g., 1 mA/μm) or mA per via cut
EM occurs both in signal (AC = bidirectional) and power wires (DC = unidirectional)
Much worse for DC than AC; DC occurs inside cells and in power buses
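The current-density rule and the MTTF proportionality above can be checked with a few lines of arithmetic. The following Python is a minimal sketch, not any tool's rule deck: the function names are hypothetical, the 1 mA/μm default is only the example limit quoted on the slide, and `relative_mttf` uses just the 1/J^2 proportionality (it ignores the temperature term of the full Black's equation).

```python
def current_density(current_ma, width_um):
    """Current per unit wire width, in mA/um."""
    return current_ma / width_um


def meets_em_spec(current_ma, width_um, j_spec_ma_per_um=1.0):
    """True when I/w stays below the (assumed) current density limit."""
    return current_density(current_ma, width_um) < j_spec_ma_per_um


def relative_mttf(j, j_ref):
    """MTTF relative to a reference wire, using MTTF proportional to 1/J^2."""
    return (j_ref / j) ** 2


if __name__ == "__main__":
    # A 0.5 mA wire that is 1 um wide passes; a 2 mA wire of the same width fails.
    print(meets_em_spec(0.5, 1.0), meets_em_spec(2.0, 1.0))
    # Doubling the current density cuts the expected lifetime to one quarter.
    print(relative_mttf(2.0, 1.0))
```

Widening a strap lowers J linearly, so under this proportionality the lifetime improves with the square of the width increase.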
Example: Current Density

[Figure: layout plot of a metal3 power grid strap feeding a RAM]
There is a high current density due to a narrow metal3 power grid strap connecting to the internal RAM.
A failure here is catastrophic.
What Is Joule Heating?
Wire Self-Heat (WSH)
  May also be called signal wire electromigration, or Joule heating, since it is related to the power that is dissipated into the interconnect.
  WSH is the rise in temperature due to the electron movement within a conductor, i.e., the wire heats above oxide temperature as pulses go through.
  Depends on metal composition, signal frequency, wire sizes, slew rates, and amount of capacitance driven
Self-heating = More EM
  Since self-heating (SH) increases temperature, self-heating on a metal line can aggravate EM effects.
  SH on a line can also increase EM effects on neighboring lines.
  Because self-heating contributes to electromigration, failures are typically labeled as EM, not SH.
[Figure: wire self-heat]
Hot Electron Effect (Short Channel Effect)

Caused by extremely high electric fields between source and drain
Occurs when voltages are not scaled as fast as dimensions
Electrons pick up speed in the channel
Fastest electrons damage the oxide and interface near the drain
Transistor threshold and mobility change over the life of the part, i.e., threshold eventually moves to a point where the device no longer meets specifications
[Figure: transistor cross-section (gate, N+ diffusion) with three callouts: "Electrons pick up speed in channel; “hot” electrons are the fastest of a statistically fast bunch." "Impact ionization occurs here." "Oxide and/or interface is damaged here."]

Power Grid Analysis
  What is Power Grid Analysis?
  Inputs and Outputs of Power Grid Analysis
  Tasks of Power Grid Analysis
  Static Power Grid Analysis
  Dynamic Power Grid Analysis
What Is Static Power Grid Analysis?

Simple approach providing comprehensive coverage without the requirement of extensive circuit simulations
Solves Ohm's and Kirchhoff's laws for a given power network while ignoring localized switching effects on the power grid
Detects and fixes major supply grid problems
Main challenge of the static approach is accuracy
  Local dynamic effects are not accounted for
How Is Static Power Grid Analysis Done?
Select the power grid view libraries to be used in the power-rail analysis.
The parasitic resistance of the power grid is extracted, and a resistor matrix of the power grid is built.
An average current for each transistor or gate connected to the power grid is calculated.
The average currents are distributed around the resistance matrix based on the physical location of the transistor or gate.
At every VDD I/O pin, a source of VDD is applied to the matrix.
A static matrix solve is then used to calculate the currents and IR drops throughout the resistance matrix.
An instance-based static power consumption is calculated; it contains the power-consumption data for all instances of each cell and block in the design.
[Figure: flow: Read in the power grid views -> Extract power grid parasitic information -> Create resistor matrix -> Calculate average current -> Calculate current and IR drop]
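The steps above can be sketched on a toy rail. This Python example is an illustration under stated assumptions, not the method of any particular analysis tool: it models one power rail as a chain of equal resistors fed from a single VDD pad, places an average tap current at each node, builds the conductance matrix from Kirchhoff's current law, and solves it with plain Gaussian elimination. The names `solve_linear` and `rail_voltages` and all numeric values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rail_voltages(vdd, r_segment, tap_currents):
    """Node voltages along a rail: pad -> R -> node0 -> R -> node1 -> ...

    Each node sinks its average tap current (amps) to ground.
    """
    n = len(tap_currents)
    g = 1.0 / r_segment
    cond = [[0.0] * n for _ in range(n)]
    rhs = [-i for i in tap_currents]  # KCL: current leaving each node
    for k in range(n):
        cond[k][k] += g  # segment toward the pad side
        if k == 0:
            rhs[k] += g * vdd  # the VDD source is applied at the pad
        else:
            cond[k][k - 1] -= g
        if k + 1 < n:
            cond[k][k] += g  # segment toward the far end
            cond[k][k + 1] -= g
    return solve_linear(cond, rhs)


if __name__ == "__main__":
    volts = rail_voltages(vdd=1.2, r_segment=0.5, tap_currents=[0.02, 0.02, 0.02])
    for node, v in enumerate(volts):
        print(f"node {node}: {v:.3f} V, IR drop {1.2 - v:.3f} V")
```

The node farthest from the pad sees the largest drop because every upstream segment carries the current of all downstream taps, which is why missing vias or narrow straps near the pad hurt the most.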
What Does Static Power Grid Analysis Find?

Static IR drop analysis finds power grid weakness caused by
  Missing vias
  Insufficient vias
  Missing power connections
  Insufficient power route widths
  Power planning decisions
Power grid electromigration analysis lets you do the following:
  Run a comprehensive analysis that is not vector dependent
  Find problems in both vias and routing
  Run checks against current density rules
  Run analysis checks against Black’s equation

Power Grid Analysis
What Is Dynamic Power Grid Analysis?

Involves a comprehensive dynamic circuit simulation of the power grid network, which includes localized switching effects
Analysis requires both resistance and capacitance of the power grid to be extracted
Localized dynamic and package inductance effects are taken into account
Results can be extremely accurate
How Is Dynamic Power Grid Analysis Done?
Select the power grid view libraries to be used in the power-rail analysis.
The parasitic resistance and capacitance of the power grid and the signal nets are extracted.
The dynamic tap currents are passed from the power calculation tool.
  Power calculator calculates the currents over time.
Rail analysis calculates where the current is varying over time based on the calculations of the power calculator.
[Figure: flow: Read in the power grid views -> Extract power grid and signal-net parasitic information -> Calculate dynamic current -> Perform rail analysis]
Purpose of Dynamic Power Grid Analysis

The purpose is to obtain a quantitative analysis, measured against vectors.
Some specific reasons for dynamic analysis are to
  Simulate a specific test vector
  Calculate the power grid characteristics over time
  Identify which specific test vector activated an implementation weakness
  Examine the time correlation of tap current
  Obtain a better estimation of the precise magnitude of IR drop
How do you identify the test vectors?
  In addition to using vectors in a dynamic power grid analysis, there are methods that do not require vectors, but use a timing window file (TWF) instead.

Analysis Output from Power Grid Analysis
[Figure: four result plots: IR drop, transistor device currents, current congestion, and electromigration]
Method of Reducing IR Drop

Adding decoupling capacitors makes a static approach more accurate. Decoupling capacitors act as a local charge source.
[Figure: waveforms of the input, the current drawn from VDD, and VDD (1.2V, 1.1V), showing IR drop with decoupling capacitors]

Method of Reducing IR Drop (continued)
[Figure: IR drop plot before the fix]
The red area means a voltage drop of more than 10% of the nominal supply voltage. The solution is to use wider power stripes or use more metal on higher levels.
[Figure: layout with added power stripes]
Additional power stripes are added to the design and are marked in cyan and magenta.
[Figure: IR drop plot after the fix]
This IR drop plot is made after an increase of the number of power stripes. This plot shows a very low voltage drop, which is required for a functional chip.
Review Questions
What is a power grid?
What are the tasks of power grid analysis?
What is the difference between static power grid analysis and dynamic power grid analysis?
Power consumption and analysis
Power grid analysis
Low-power design
Low-Power Design
  Need for low-power design
  Low-power design techniques
    Clock gating
    Multi-threshold logic
    Multi-voltage with shut-off
Need for Low-Power Design
Exponential increase in chip density.
In deep submicron technology (130 nm, 90 nm, and below), leakage current increases dramatically.
In some 65 nm designs, leakage current is nearly as large as dynamic current.
Where and When to Save Power

Power is a constraint like timing and area -> good optimization potential.
Switching intensive networking applications use 50% -> watch the clock tree and its sequential elements.
The earlier in the design process power consumption is addressed, the bigger the impact.
At higher levels of abstraction, there are more degrees of freedom for large changes to the design implementation.
Power Saving Techniques
Some of the low-power design techniques discussed today are
  Clock gating
  Multi-threshold logic
[Figure: low-power techniques by design stage. Stages: circuit and chip design, process, RTL, synthesis, floorplanning, physical implementation. Techniques: RTL clock-gating for dynamic, power grid planning for multi-voltage, IR drop and EM analysis, Multi-Vdd optimization, Dual-Vth optimization for leakage, physical clock gating.]
Low-Power Design
  Need for low power design
  Low-power design techniques
    Clock gating
Clock Gating
Clock distribution network contributes to a significant portion of total power consumption
  Clock buffers have the highest toggle rate, and often have a high drive strength to minimize clock delay
  Flip-flops with an active clock dissipate some dynamic power even if the inputs and outputs are unchanged.
Shut off the clock during periods of inactivity to avoid unnecessary power consumption
Clock-Gating Styles
Designer has the following control:
  Latch-based or latch-free gating style
  Which register banks to gate or exclude from gating
  Positive (AND) or negative (OR) gating logic
  Minimal bit-width of gated registers
[Figure: three gating circuits, each with inputs EN and CLK and output GCLK: Latch-free {OR}, Latch-free {INV NAND BUF}, and Latch-based {NAND INV}]
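The reason a latch-based style is often preferred can be shown with a small behavioral model. The Python below is a sketch under assumptions, not a description of any library cell: it assumes positive (AND-style) gating with a latch that is transparent while CLK is low, so EN is only sampled during the low phase and GCLK cannot glitch when EN changes while CLK is high. The function name `latch_based_gclk` is hypothetical.

```python
def latch_based_gclk(samples):
    """Return GCLK for a sequence of (clk, en) samples.

    The enable latch is transparent while CLK is 0 and holds while CLK is 1,
    and GCLK = CLK AND latched_enable.
    """
    latched_en = 0
    gclk = []
    for clk, en in samples:
        if clk == 0:
            latched_en = en  # latch follows EN only during the low phase
        gclk.append(clk & latched_en)
    return gclk


if __name__ == "__main__":
    # EN falls while CLK is high (third sample) and rises while CLK is high
    # (fifth sample); neither change produces a partial clock pulse.
    samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 1), (1, 0)]
    print(latch_based_gclk(samples))
```

A latch-free AND gate would pass those mid-phase EN changes straight through to GCLK, which is the glitch hazard the latch removes.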

Implementation of Clock Gating
Clock gating is a two-step process:
Step 1: Identify enable conditions
  Done using
    Simple combinational logic (output hold on a register)
    More complex sequential logic that spans multiple clocks
Step 2: Insert clock-gating cells into the clock path using the enable logic
  Commercially available synthesis tools accomplish the second task automatically.
[Figure: register examples with D_in, D_out, and clock-gating (CG) cells: non-optimized, with clock gate, combinational clock gating, and sequential clock gating]
Clock Gating Advantages and Disadvantages

Advantages
  Reduces the dynamic power consumption by the clock network
  Reduced internal power consumption at the clock-gated flip-flops
  No need for muxes to re-circulate the data for these flip-flops (saves power and area)
Disadvantages
  No effect on leakage
  May result in setup time or hold time violations
  Clock gating has to be inserted before clock tree synthesis (CTS) in most power design flows and hence presents design issues
  Affects testability by introducing multiple clock domains (solved if we use a latch-based design)
  Adding clock gating may not always be accompanied by reduced power
    Clock gating adds logic that consumes power

Multi-Threshold Logic
Using libraries with multiple VT has become a common way of reducing leakage current as geometries have shrunk (130 nm, 90 nm)
Sub-threshold leakage depends exponentially on VT.
Today, many libraries offer two or three versions of their cells: Low VT, Standard VT, and High VT.
The implementation tools can take advantage of these libraries to optimize timing and power simultaneously.
[Figure: Leakage vs. Delay at 90 nm. Bar chart of leakage and delay (0% to 100%) for LVt, SVt, and HVt cells]
Implementing a Multi-Threshold Logic

A “Dual VT” flow is common during synthesis.
Minimize total number of fast, leaky low VT transistors by deploying them only when required to meet timing.
Involves an initial synthesis targeting a primary library followed by an optimization step targeting additional libraries with differing thresholds.
Examples
  Goal: High performance
    Synthesize with the high-performance, high-leakage library first, and then relax any cells not on the critical path by swapping them for lower performing, lower leakage equivalents
  Goal: Minimum leakage
    Target the low-leakage library first and then swap in higher performing, high-leakage equivalents to meet timing in critical paths
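The "minimum leakage" flow can be illustrated with a toy model. This Python sketch is an assumption-laden illustration, not how a synthesis tool works: every cell has one fixed delay per threshold flavor (the values in `CELL_DELAY_NS` are invented), a path delay is the sum of its cell delays, and cells are swapped greedily in path order. Real tools re-time the whole design and weigh leakage per cell.

```python
# Hypothetical cell delays in ns for each threshold flavor.
CELL_DELAY_NS = {"HVT": 1.2, "LVT": 0.8}


def path_delay(path, vt_of):
    """Sum of cell delays along one timing path."""
    return sum(CELL_DELAY_NS[vt_of[cell]] for cell in path)


def minimize_leakage(paths, period_ns):
    """Start with every cell at high VT, then swap in low VT only where needed.

    paths: list of timing paths, each a list of cell names.
    Returns a dict mapping cell name to "HVT" or "LVT".
    """
    vt_of = {cell: "HVT" for path in paths for cell in path}
    for path in paths:
        for cell in path:
            if path_delay(path, vt_of) <= period_ns:
                break  # this path already meets timing
            vt_of[cell] = "LVT"  # faster but leakier equivalent
    return vt_of


if __name__ == "__main__":
    result = minimize_leakage([["a", "b", "c"], ["d"]], period_ns=3.0)
    print(result)
```

Only the cells on the failing path are swapped, so the non-critical cell `d` and the last cell on the repaired path stay at high VT and keep their low leakage.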

Multi-Threshold Logic: Advantages and Disadvantages
Advantages
  Can reduce leakage power without compromising performance.
  Delay has a much weaker dependence on VT.
Disadvantages
  Leakage current increases exponentially with VT reduction.
  In terms of cost, requires one additional mask.
  Reducing leakage power may compromise performance.
Multi-Voltage with Shut-Off

Because dynamic power is proportional to $V_{DD}^2$, lowering VDD on selected blocks helps reduce power significantly.
Different blocks have different performance objectives and constraints.
A lower supply rail means that the dynamic and static power will be lower for the cells on this rail.
Partition the internal logic of the chip into multiple voltage regions or power domains, each with its own supply.
  For example, a processor needs to run as fast as the semiconductor technology will allow; a high supply voltage is required.
  In a USB block run at a relatively slow frequency dictated by protocol, a lower supply rail may be sufficient for the block to meet its timing constraints.
[Figure: Multi-Voltage Architecture. SOC containing cache RAMs and a CPU, with supply labels 1.2V, 1.0V, and 0.9V]
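The quadratic dependence on supply voltage is easy to quantify. The Python below is a minimal sketch that assumes switching activity, capacitance, and frequency stay the same when the rail is lowered, so only the $V_{DD}^2$ term changes; the function name is hypothetical and the voltages are the example values from the figure.

```python
def dynamic_power_ratio(vdd_new, vdd_old):
    """Ratio of dynamic power after and before a supply change (P ~ VDD^2)."""
    return (vdd_new / vdd_old) ** 2


if __name__ == "__main__":
    ratio = dynamic_power_ratio(0.9, 1.2)
    print(f"0.9V rail uses {ratio:.1%} of the dynamic power of a 1.2V rail")
```

In practice the lower rail also slows the gates, so the saving is available only to blocks with timing slack, such as the USB example above.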
Techniques to Achieve Multi-Voltage
To achieve multi-voltage on a chip, the following techniques are implemented:
  Voltage scaling interfaces – level shifters
  Power gating
  Signal isolation cell
  State retention power gates
  Sleep transistors
Level Shifters (Voltage Scaling Interfaces)

Ensure signals going from one domain to another (e.g., 0.9V to 1.2V) will not turn on both the NMOS and PMOS networks, causing crowbar currents.
Domain gets the voltage swings (and rise- and fall-times) that it expects.
[Figure: logic on VDD1 driving logic on VDD2, with a common VSS]
High-to-low level shifter cells
  Implemented using two inverters in series
  [Figure: flip-flop (D, clk, Q) followed by a shifter with VDDL, VSS, and output OUTL; domains labeled 1.2V, 1.1V, and 0.9V]
Low-to-high level shifter cells
  More complex: implemented using a buffered and an inverted form of the lower voltage signal, which are used to drive a cross-coupled transistor structure running at the higher voltage
  [Figure: flip-flop (D, clk, Q) followed by a shifter with VDDL, VDDH, VSS, and output OUTH; domains labeled 1.2V, 0.9V, and 1.1V]
Power Gating
The technique used to turn off blocks that are not being used is known as power gating.
Reduces the overall leakage power of a chip.
Selectively powers down certain blocks in the chip while keeping other blocks powered up.
Goal: To maximize power savings while minimizing the impact on performance.
Activity Profile with Power Gating
[Figure: power versus time across alternating SLEEP and WAKE events. Dynamic power for Activity 1, Activity 2, and Activity 3 sits above a leakage power floor; the power axis is marked 200 mW, 20 mW, and 10 mW. A lower trace is labeled Activity 1 (e.g., Clock Gated) and Activity 2.]
SLEEP events: Initiate entry to the low power mode
WAKE events: Initiate return to active mode
Signal Isolation Cells

Powering down regions on a chip should not result in crowbar current or spurious behavior at the inputs of powered-up blocks.
Inputs to the power gated blocks can be driven to valid logic values by powered up blocks without creating electrical (or functional) problems in the powered down block.
The outputs of powered down blocks must be controlled by using an isolation cell to clamp the output to a specific, legal value.
[Figure: power-gated block with Vdd, power switch (Pwr Switch), isolation control (Iso), and isolation cell]
Three basic types of isolation cell
  Those that clamp the signal to “0” (use AND gate)
  Those that clamp it to “1” (use OR gate)
  Those that latch it to the most recent value
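The three isolation styles listed above can be modeled behaviorally. The Python below is a sketch under assumptions: the isolation control `iso` is taken to be active high, signals are single bits (0 or 1), and the names `isolate_clamp0`, `isolate_clamp1`, and `IsolateHold` are hypothetical rather than library cell names.

```python
def isolate_clamp0(signal, iso):
    """AND-style cell: output is forced to 0 while iso is asserted."""
    return signal & (iso ^ 1)


def isolate_clamp1(signal, iso):
    """OR-style cell: output is forced to 1 while iso is asserted."""
    return signal | iso


class IsolateHold:
    """Latch-style cell: output keeps the last value seen before isolation."""

    def __init__(self):
        self.held = 0

    def output(self, signal, iso):
        if not iso:
            self.held = signal  # track the signal while not isolated
        return self.held


if __name__ == "__main__":
    hold = IsolateHold()
    print(hold.output(1, 0))  # block is up: passes 1
    print(hold.output(0, 1))  # block is down: still drives the held 1
    print(isolate_clamp0(1, 1), isolate_clamp1(0, 1))
```

Which clamp value to use depends on what the receiving logic treats as the inactive state, for example clamping an active-high request to 0.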

State Retention Power Gates
Retention strategy prevents loss of state information when block is powered down.
  On power up, the state of the block must be restored from an external source or built up from the reset condition.
  Time and power requirement can be significant.
Methods of saving and restoring the internal state of a power gated block
  Software approach: Based on reading and writing registers (state info stored in processor memory)
  Scan-based approach: Based on using a dedicated set of scan chains to store state of chip
  Register-based approach: Uses retention registers (each contains a “shadow” register) to preserve the register's state during power down and restore it at power up
[Figure: SRPG cell (D, Q, Clk, Ret) supplied by switched Vdd through a power switch (Pwr Switch) and by the retention supply VRET, with Vss]
Sleep Transistors: Fine-Grain Power Gating

Switches are embedded inside cells/IP.
A power gating control signal “SLEEP” (or “SLEEPN”) controls the sleep transistor to switch on and off the power supply to the cell.
A PMOS sleep transistor is used to switch VDD supply and is called “header switch.” The NMOS sleep transistor controls VSS supply and is called “footer switch.”
[Figure: two cells between VDD and VSS, each with INPUTS and OUTPUTS*: one switched by SLEEP and one switched by SLEEPN]
Sleep Transistors: Coarse-Grain Power Gating
Dedicated cells that can switch off the entire power or ground network of a particular row of cells
A power gating control signal “SLEEP” controls the sleep transistors connected in parallel between permanent and virtual power networks
[Figure: sleep transistors controlled by SLEEP between VDD and the virtual supply VVDD, which powers cells with INPUTS and OUTPUTS*]
Sleep Transistors: Advantages and Disadvantages

Advantages
  Allows design functionality and performance that would not be achievable without multi-voltage
  Minimizes leakage, which provides greatest reduction in power
Disadvantages
  Lowering the voltage also increases the delay of the gates in the design.
  Mixing blocks at different VDD supplies adds some complexity to the design.
  Multiple power domains require more careful and detailed floorplanning.
  Power grids become more complex.
  Multi-voltage designs require additional resources on the board (additional regulators to provide the additional supplies)
  Power up and power down sequencing. There may be a required sequence for powering up the design to avoid deadlock.

Typical Design with Multi-Voltage
[Figure: 0.8v, 1.0v, and 1.2v regions, each with its own library (0.8v lib, 1.0v lib, 1.2v lib), connected through level shifters]
A general multi-voltage implementation showing libraries for the various power domains on the same chip.
[Figure: Power Domain 1 (Library Domain 1, 1.2 V), Power Domain 2 (Library Domain 2, 1.2V), Power Domain 3 (Library Domain 3, 1.0V, with memory), and Power Domain 4 (Library Domain 4, 0.8V), interfaced through isolation cells (Iso_cell) and level shifters (LS). Library flavors shown: Low Vt (high speed), Normal Vt, and High Vt (low leakage, lower speed).]
A more detailed block-level diagram showing the various elements that interface between the different power domains.
Summary Impact of Standard Low-Power Techniques
Technique      Power   Timing   Area     Impact:       Impact:  Impact:       Impact:
                       Penalty  Penalty  Architecture  Design   Verification  Place and Route
Clock Gating   Medium  Little   Little   Low           Low      None          Low
Multi Vt       Medium  Little   Little   Low           Low      None          Low
Multi-Voltage  Large   Little   Little   High          Medium   Low           Medium
Power Gating   Large   Little   Medium   High          Medium   Low           Medium
~ Large
6/16/08
Review Questions
What is clock gating?
How is multi-threshold logic implemented?
How is multi-voltage achieved?
Summary
The tasks of a power consumption tool are to calculate static (leakage) and dynamic (switching and internal) power for each instance in the design.
The tasks of a power grid analysis tool are to use the instance power (static) and current (dynamic) results to check for IR drop, ground bounce, and electromigration in a design.
The earlier in the design process power consumption is addressed, the bigger the impact, since there are more degrees of freedom for large changes to the design implementation.
Low-power design helps achieve significant power reduction at the cost of additional design complexity.
True or false
1. In a power library, look-up tables are implemented by creating multiple templates of common information that can be used to represent internal power.
2. The effect of IR drop on a signal path is that the signal path is slowed, thus causing a hold violation.
3. Wire electromigration is related to the power that is dissipated into the interconnect.
4. Dynamic power consists of power dissipated inside a cell and power dissipated to charge/discharge net capacitance.
5. By using multi-threshold logic, the implementation tool can take advantage of HVT/LVT/SVT libraries to optimize timing and power simultaneously.
6. A lower supply rail means that the dynamic and static power will be lower for the cells on this rail.
Sources
Power Library
  Library Compiler™ User Guide: Modeling Timing, Signal Integrity, and Power in Technology Libraries, version A-2007.12, December 2007
Low-Power Design
  Voltage Storm Data Prep Manual, version 6.1.2
  Low-Power Methodology Manual for System-on-Chip Design by Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, and Kaijian Shi
Reference: Formulae for Power Consumption Calculation
$P_{total} = P_{static} + P_{dynamic}$
$P_{static} = V_{DD} \times I_{leakage}$
$I_{leakage}$ = [Number of transistors (logic gates + memory array) * Average length of transistor] * [Subthreshold leakage + Gate leakage]
  Length of transistor is given in terms of its channel length, denoted by λ, where 1λ = 0.04 μm in this example; it must be converted to μm for calculation purposes.
$P_{dynamic} = \alpha \times C_L \times V_{DD}^2 \times f$
Where
  $\alpha$ – Switching activity
  $f$ – Operating frequency
  $C_L$ = [Number of transistors (logic gates + memory array) * Average length of transistor] * [Capacitance per unit length]
Reference: Example
Operating Voltage = 1.2V
Number of transistors = 200 million
Average logic transistor = 8λ (where 1λ = 0.04 μm)
Subthreshold Leakage = 30 nA/μm
Gate Leakage = 2 nA/μm
Static power dissipation:
  $P_{static} = I_{static} \times V_{DD}$
  Transistors:
    (200 × 10^6) × (8λ × 0.04 μm/λ) = 64 × 10^6 μm
  On average, half the transistors are OFF and contribute subthreshold leakage.
  Total static current is
    (64 × 10^6 μm) × [(30 nA/μm)/2 + (2 nA/μm)] = 1088 mA
    1088 mA × 1.2V = 1305.6 mW
Dynamic power dissipation:
  $P_{dynamic} = \alpha \times C \times V_{DD}^2 \times f$
  Transistors:
    200 × 10^6 × 8λ × 0.04 μm/λ × 2 fF/μm = 128 nF
  Dynamic Power Consumption per MHz or GHz:
    [(0.1 × 12.8 nF) + (0.05 × 25.6 nF)] × (1.2V)^2 ≈ 3.69 mW/MHz, or about 3.69 W at 1 GHz
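The arithmetic of this example can be reproduced in a few lines. The Python below is a sketch that simply re-evaluates the numbers given above; the function names and the unit bookkeeping are my own, and the two capacitance terms (12.8 nF and 25.6 nF) are taken as given from the example rather than derived.

```python
UM_PER_LAMBDA = 0.04  # 1 lambda = 0.04 um in this example


def static_power(n_transistors, avg_len_lambda, sub_na_per_um, gate_na_per_um, vdd):
    """Return (total transistor length in um, leakage current in mA, power in mW).

    Half of the transistors are assumed OFF and contribute subthreshold
    leakage; gate leakage applies to all of them.
    """
    total_um = n_transistors * avg_len_lambda * UM_PER_LAMBDA
    leak_na = total_um * (sub_na_per_um / 2 + gate_na_per_um)
    current_ma = leak_na * 1e-6  # nA -> mA
    return total_um, current_ma, current_ma * vdd


def dynamic_mw_per_mhz(terms, vdd):
    """terms: list of (switching activity, capacitance in nF).

    nF * V^2 is nJ per cycle, which equals mW per MHz.
    """
    switched_nf = sum(activity * cap_nf for activity, cap_nf in terms)
    return switched_nf * vdd ** 2


if __name__ == "__main__":
    total_um, i_ma, p_mw = static_power(200e6, 8, 30, 2, 1.2)
    print(f"{total_um:.3g} um, {i_ma:.1f} mA, {p_mw:.1f} mW static")
    print(f"{dynamic_mw_per_mhz([(0.1, 12.8), (0.05, 25.6)], 1.2):.4f} mW/MHz")
```

Multiplying the per-MHz figure by 1000 gives the power at 1 GHz in mW, which matches the watt-level figure quoted in the example.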

Extraction and Delay Calculation
Module 9
June 16, 2008
How Is Delay in a Circuit Estimated or Calculated?
During RTL Coding
  reg r1, r2;
  always @ (posedge clk)
    r2 <= !r1;
After Synthesis
  [Figure: register r1 driving gate u1, which drives register r2]
During Place/Route
  [Figure: placed and routed r1, u1, and r2]
Module Objectives
Articulate how extraction and delay calculation are run using standard parasitic and delay formats
Compare the different extraction models, including parallel plate, 2.5D, and 3D
State the various sections of a Standard Parasitic Exchange Format (SPEF) file
Describe the concepts of propagation delay, transition time, and slew
State the various sections of a Standard Delay Format (SDF) file
Describe how delays are annotated during various phases of the design flow
Topics In This Module

Parasitic extraction
Delay calculation

What is capacitance?
What is resistance?
What Is Capacitance?
Definition: Capacitance is a measure of the amount of electric charge stored between two plates for a potential difference (voltage) across the plates.
Capacitance (C) is proportional to the cross sectional area (A) of the plates, and inversely proportional to the distance (D) between them.
  C = K * A/D, where K is the dielectric value of the material between the plates
[Figure: conductor1 and conductor2 separated by a distance, with the capacitance between them and the cross-sectional area marked]
Example: The long wires in the design incurred a very large capacitance between them, and, therefore, the timing of the design was compromised.
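The relation C = K * A/D can be tried out numerically. The Python below is a minimal sketch that assumes K is the absolute permittivity of the dielectric, i.e. the vacuum permittivity times a relative permittivity; the function name, the wire dimensions, and the relative permittivity of 3.9 (a typical value for silicon dioxide) are illustrative assumptions, not values from this course.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(rel_permittivity, area_m2, distance_m):
    """C = K * A / D with K = EPSILON_0 * relative permittivity."""
    return EPSILON_0 * rel_permittivity * area_m2 / distance_m


if __name__ == "__main__":
    # Hypothetical wire: 100 um long, 0.14 um wide, 0.2 um above a plate.
    area = 100e-6 * 0.14e-6
    c_near = parallel_plate_capacitance(3.9, area, 0.2e-6)
    c_far = parallel_plate_capacitance(3.9, area, 0.4e-6)
    print(f"near: {c_near * 1e15:.2f} fF, far: {c_far * 1e15:.2f} fF")
```

Doubling the separation halves the capacitance, and doubling the wire length doubles it, which is why long parallel wires accumulate large capacitance.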

What Is Resistance?
Definition: Electrical resistance is a measure of the degree to which an object opposes an electric current through it.
Resistance (R) is proportional to the length (L) of the wire and inversely proportional to the cross-sectional area (A).
  R = K * L/A
Example: For our current technology, wire resistance is estimated with a factor measuring resistance per unit length.
[Figure: conductor1 with its resistance marked]
What Is Parasitic Extraction?

Definition: The process of extracting the capacitance and resistance values for all of the interconnects (wires) in a circuit.
Example: After routing, we ran parasitic extraction and examined the output files to make sure the resistance and capacitance values were below our maximum limit.
[Figure: conductor1 and conductor2 with the resistance of each and the capacitance between them marked]
Parasitic Extraction
  Extraction models
  SPEF file
  Correlation
Interconnects (Wires)
Extraction deals with the wires or connections in a design.
Interconnects (wires) in a given technology will have several rules and specifications associated with each metal layer.
Among the many rules
  Width (W)
  Pitch (P)
  Spacing (S)
  Resistance per square unit (RPSQ)
[Figure: two m2 wires with width (W), spacing (S), pitch (P), and RPSQ marked]
Interconnects (Wires) (continued)
The thickness of the wires in a given technology is assumed to be constant.
Resistance is characterized per square unit (RPSQ).
Most technologies have three different grades of interconnects:
  Internal cell routes
    M1
    Finest width, spacing
  Signal routes
    M2 to M(N-2)
    Medium width, spacing
  Global/power routes
    M(N-1) to MN
    Largest width, spacing
    Thick metal
TABLE OF WIRE VALUES FOR 90nm PROCESS
METAL LAYER   minimum width   pitch   spacing   RPSQ
M8            0.42            0.84    0.42      2.7500e-02
M7            0.42            0.84    0.42      2.7500e-02
M6            0.14            0.28    0.14      8.0600e-02
M5            0.14            0.28    0.14      8.0600e-02
M4            0.14            0.28    0.14      8.0600e-02
M3            0.14            0.28    0.14      8.0600e-02
M2            0.14            0.28    0.14      8.0600e-02
M1            0.12            0.28    0.12      1.3000e-01
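Because thickness is constant per layer, a wire's resistance follows from RPSQ and the number of squares (length divided by width). The Python below is a sketch that uses the table values; it assumes the widths are in μm and RPSQ is in ohms per square, which the table does not state, and the 100 μm wire length and the names `RPSQ`, `MIN_WIDTH`, and `wire_resistance` are hypothetical.

```python
# RPSQ and minimum width per layer, copied from the table above.
# Units are assumed: ohms per square and micrometers.
RPSQ = {"M8": 2.75e-02, "M7": 2.75e-02, "M6": 8.06e-02, "M5": 8.06e-02,
        "M4": 8.06e-02, "M3": 8.06e-02, "M2": 8.06e-02, "M1": 1.30e-01}
MIN_WIDTH = {"M8": 0.42, "M7": 0.42, "M6": 0.14, "M5": 0.14,
             "M4": 0.14, "M3": 0.14, "M2": 0.14, "M1": 0.12}


def wire_resistance(layer, length_um, width_um=None):
    """R = RPSQ * (length / width); defaults to the layer's minimum width."""
    if width_um is None:
        width_um = MIN_WIDTH[layer]
    return RPSQ[layer] * length_um / width_um


if __name__ == "__main__":
    for layer in ("M2", "M8"):
        print(f"{layer}: {wire_resistance(layer, 100.0):.2f} ohm per 100 um")
```

Under these assumptions a minimum-width M8 wire is several times less resistive than an M2 wire of the same length, which is consistent with the upper layers being used for global and power routes.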
Interconnects (Wires) Examples
[Figure: layout showing signal routes, internal cell routes, and power routes (VDD, GND)]

Resistance and Capacitance
Resistance calculations are typically simple:
  Single layer
  Vias and via arrays
Capacitance calculations can be very complex:
  Multi-layer
  Multi-dimension
  Coupling capacitances
    Line-to-ground (net to substrate)
    Line-to-line (nets on same layer)
    Crossover (nets on different layers)
[Figure: m1 and m2 wires joined by via12; m1 and m2 wires above the substrate]
Parallel Plate or 1D Model

Parallel plate simply models the “line-to-ground.”
  Very quick extraction and calculation
  Typically used in iterations during place/route
[Figure: wire B above the substrate]
Near Body Effects
Near body effects are coupling capacitances between adjacent layers of metal
There are several types:
  Area capacitance (Ca)
  Coupling capacitance (Cc)
  Fringe or sidewall capacitance (Cf)
  Crossover capacitance (Cr)
[Figure: wires with Ca, Cc, Cf, and Cr marked]
2D or 2.5D Model
2D or 2.5D models: Some of the “near-body” effects
  Much slower to extract capacitance vs. 1D model because there is more information.
  Much more accurate for crosstalk and noise effects because the coupling capacitances that contribute to crosstalk and noise are extracted.
  Used during detailed analysis during or after place/route.
[Figure: wires A, B, C, D, and E above the substrate]

3D Model
3D models: All of the “near-body” effects
  Very, very slow
  Extremely accurate
  Used for critical parts of a design, usually the high-speed areas in need of very accurate analysis
[Figure: wires A, B, C, D, E, F, and G above the substrate]

Input
  Routed design in the Verilog® language or other HDL + DEF or GDSII
  Physical libraries in LEF format
  Tool-specific libraries, map files, etc.
  Extraction constraints and commands in TCL
Output
  SPEF file containing all of the RC information for the routed nets in the design
[Figure: routed design (DEF or GDSII), physical library, and TCL feed extraction, which writes the SPEF parasitic file]

Parasitic Extraction in Flow
Extraction is performed during various stages of place/route.
  Rough estimates based on “virtual” routes after placement
  Detailed estimates based on “actual” routes after routing
Output of extraction (SPEF) is used in many other steps in the flow.
  Delay calculation for nets
  Signal integrity values for nets
  Delay values for static timing analysis
  Power and reliability analysis during physical verification
[Figure: design flow from specification, micro-architecture, RTL, and logic synthesis (gates) through physical synthesis (placement, scan reorder, design optimization, PostPlace, CTS, PostCTS, route, PostRoute) to design verification, mask prep, and GDSII, with delay calculation, signal integrity, and extraction alongside]
Extraction models
SPEF file
Correlation
What Is SPEF?
Definition: IEEE standard for representing parasitic data of wires in a chip in ASCII format
Example: In order to perform signoff, we ran parasitic extraction and wrote out a SPEF file, which contained all of the capacitance and resistance information of our design. We input the SPEF file into our timing and power analysis tools to finalize our specification for performance/Watt.
Note: SPEF also contains “inductance” information, which is used for advanced processes or highly detailed analysis. We will not discuss inductance in this course.
Sample SPEF file:
  *SPEF "IEEE 1481-1999"
  *DESIGN "Sample"
  *DATE "13:03:59 Monday December 18, 2007"
  *VENDOR "Sample Tool Vendor"
  *PROGRAM "Parasitics Generator"
  *VERSION "1.1.0"
  *DESIGN_FLOW "EXTERNAL_LOADS"
  *DIVIDER /
  *DELIMITER :
  *BUS_DELIMITER [ ]
  *T_UNIT 1 NS
  *C_UNIT 1 PF
  *R_UNIT 1 OHM
  *L_UNIT 1 HENRY

  *POWER_NETS VDD
  *GND_NETS VSS

  *PORTS
  CONTROL O *L 30 *S 0 0
  FARLOAD O *L 30 *S 0 0
  INVX1FNTC_IN I *L 30 *S 5 5
  NEARLOAD O *L 30 *S 0 0
  TREE O *L 30 *S 0 0

  *D_NET INVX1FNTC_IN 0.033
  …
IEEE Std 1481-1999

This is from the IEEE specification for SPEF.
9.1 Introduction
The Standard Parasitic Exchange Format (SPEF) provides a standard medium to pass parasitic information between EDA tools during any stage in the design process. Parasitics can be represented on a net-by-net basis in many different levels of sophistication, from a simple lumped capacitance, to a fully distributed RC tree, to a multiple pole AWE representation.
IEEE Std 1481-1999 (continued)
9.2 Targeted applications for SPEF
SPEF is suitable for use in many different tool combinations. Because parasitics can be represented in various levels of sophistication, SPEF_files can communicate parasitic information throughout the design flow process. A design can be distributed between multiple SPEF_files. The files can also communicate information such as slews and the “routing confidence” indicating at what stage of the design process and/or how the parasitics were generated. A diagram of how SPEF interfaces with various example applications is shown in Figure 15.
Where does SPEF come from?
Where is it used?
What’s in an SPEF File?
Here are the basic elements of an SPEF file:
Header: Contains all of the basic information of the SPEF file’s origin and specifications
Name map: Substitutes short index symbols for long net names
Power and ground nets: Names of the power and ground nets
Externals, ports: Specifies the port name, direction, coordinates, capacitive load, slew, etc.
Internals: Detailed or reduced view of signal and power nets in the design
Hierarchical entities: Used to reference instantiated components with a sub-module SPEF
What’s in the Header Section?

The header of the SPEF file includes origin information, design specifics, and unit definitions.
SPEF_version
design_name
date
vendor
program_name
program_version
unit_def
Pin/bus/hierarchy definitions
The SPEF version is important, since the syntax changes between versions and tools support different versions of SPEF.
Also, the program name and version are important for debugging problems, which may be caused by faulty tool versions.
What’s in the Name Map Section?
The name map section simply has aliases for long net names.
name_map ::= *NAME_MAP name_map_entry {name_map_entry}
name_map_entry ::= index mapped_item
index ::= *<pos_integer>
mapped_item ::= identifier | bit_identifier | path | name | physical_ref
Example:
*NAME_MAP
*1 NET_1
*2 NET_2
…
*20 NET_20
Name maps are optional and reduce the overall text in the SPEF.
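To make the aliasing concrete, here is a minimal Python sketch of how a reader might expand name map indices back into full names. The helper names `parse_name_map` and `expand` are hypothetical, the entries are borrowed from the name map in SPEF Example 3, and the parsing is deliberately simplified (it is not a full SPEF parser).

```python
import re

def parse_name_map(lines):
    """Collect '*<index> <name>' entries that follow a *NAME_MAP keyword."""
    name_map = {}
    in_map = False
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "*NAME_MAP":
            in_map = True
            continue
        if in_map and re.fullmatch(r"\*\d+", tokens[0]) and len(tokens) == 2:
            name_map[tokens[0]] = tokens[1]
        elif in_map:
            break  # the first line that is not a map entry ends the section
    return name_map

def expand(token, name_map):
    """Replace a leading '*<index>' in a token such as '*3:OUT2' with its mapped name."""
    match = re.match(r"\*\d+", token)
    if match and match.group(0) in name_map:
        return name_map[match.group(0)] + token[match.end():]
    return token

# Entries taken from SPEF Example 3 in this module.
spef_lines = ["*NAME_MAP", "*1 IN1", "*2 net1a", "*3 blk1", "*4 net3b", "*5 OUT1", "*PORTS"]
name_map = parse_name_map(spef_lines)
print(expand("*3:OUT2", name_map))
print(expand("*4", name_map))
```

Tokens that do not start with an index, such as `I104:I`, pass through unchanged.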
What’s in the Power and Ground Nets Section?

This section simply states the names of the power and ground nets.
Example:
*POWER_NETS VDD
*GND_NETS VSS
What’s in the Externals, Ports Section?
The externals and ports section describes the interfaces to the design, including name, direction (I or O), capacitive load (L), slew (S), and other timing information.
Example:
*PORTS
A O *L 30 *S 0.0 0.0
B O *L 30 *S 0.0 0.0
C O *L 30 *S 0.0 0.0
D O *L 30 *S 0.0 0.0
E I *L 30 *S 5000 5000
A,B,C,D,E = Port
I/O = Input or Output
L = Load
S = Slew
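As an illustration of the port fields listed above, the following Python sketch splits one `*PORTS` entry into its parts. The function name `parse_port` is hypothetical, and the sketch only understands the `*L` and `*S` attributes shown in the example; the two slew numbers are kept as a pair without further interpretation, and units come from the header (`*C_UNIT`, `*T_UNIT`).

```python
def parse_port(line):
    """Split one *PORTS entry into name, direction, load, and slew values."""
    tokens = line.split()
    port = {"name": tokens[0], "direction": tokens[1], "load": None, "slew": None}
    i = 2
    while i < len(tokens):
        if tokens[i] == "*L":
            port["load"] = float(tokens[i + 1])
            i += 2
        elif tokens[i] == "*S":
            port["slew"] = (float(tokens[i + 1]), float(tokens[i + 2]))
            i += 3
        else:
            i += 1  # ignore attributes this sketch does not model
    return port

port = parse_port("E I *L 30 *S 5000 5000")
print(port)
```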
What’s in the Internals Section?

Internals describe the signal and power nets in the design and can be of the following types:
d_net
r_net
d_pnet
r_pnet
d_net and r_net are the detailed and reduced representations for signal nets.
d_pnet and r_pnet are the detailed and reduced representations for power nets.
The d_net representations are detailed and have much more information, while the r_net representations are more compact and less accurate. Use the appropriate type for the part of the flow: d_net for signoff, r_net for intermediate analysis.
Internals
Syntax
internal_def ::= nets {nets}
nets ::= d_net | r_net | d_pnet | r_pnet
d_net ::=
*D_NET net_ref total_cap
[routing_conf] [conn_sec] [cap_sec] [res_sec] [induc_sec] *END
r_net ::=
*R_NET net_ref total_cap [routing_conf] {driver_reduc} *END
d_pnet ::=
*D_PNET pnet_ref total_cap
[routing_conf] [pconn_sec] [pcap_sec] [pres_sec] [pinduc_sec] *END
r_pnet ::=
*R_PNET pnet_ref total_cap [routing_conf] {pdriver_reduc} *END
We will show examples of “d_net” and “r_net” in the next few slides, and omit the “pnet” examples.
Internals: d_net
A d_net is a detailed description of a net in a design.
It is comprised of several sections, among them
*D_NET declaration
Net reference
Total capacitance
Connectivity (*CONN) section
Capacitance (*CAP) section
Resistance (*RES) section
In the case where a specific net has a very high capacitance, you can search through the section to see if the value is reasonable.
// d_net example for SPEF
*D_NET INVX1FNTC 2.033341
*CONN
*I FL_1281:X O *L 0.0
*I I1184:A I *L 0.343
*I FL_1000:A I *L 0.343
*I NL_1000:A I *L 0.343
*I TR_1000:A I *L 0.343
*CAP
216 FL_1000:A 0.346393
217 I1184:A 0.344053
218 INVX1FNTC_IN 0
219 INVX1FNTC_IN:10 0.0154198
220 INVX1FNTC_IN:11 0.0117827
…
*RES
152 INVX1FNTC_IN INVX1FNTC_IN:18 8.39117
153 INVX1FNTC_IN INVX1FNTC_IN:5 25.1397
154 INVX1FNTC_IN:11 INVX1FNTC_IN:20 4.59517
155 INVX1FNTC_IN:12 INVX1FNTC_IN:13 3.688
…
*END

Internals: r_net
An r_net is a reduced description of a net in a design.
It is comprised of several sections, among them
*R_NET declaration
Net reference
Total capacitance
driver information (*DRIVER)
pie_model (*C2_R1_C1)
load information (*LOADS)
RC information (*RC)
During timing analysis, you may need to inspect sections of the SPEF file, like the r_net section, to make sure the values are reasonable.
// r_net example for SPEF
*R_NET NE_794 2.67137
*DRIVER NL_1039:X
*CELL INVX
*C2_R1_C1 1.0039 367.972 1.66747
*LOADS
*RC NL_1040:A 1.25641
*RC NL_2039:A 714.176
*END
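One simple reasonableness check on the r_net example is to compare the two capacitors of the `*C2_R1_C1` model against the total capacitance on the `*R_NET` line. The Python sketch below is only an illustration: `parse_r_net` is a hypothetical helper, and the assumption that C2 + C1 should match the total capacitance is drawn from the numbers in this one example, not from a statement in the course text.

```python
import math

def parse_r_net(lines):
    """Return (total_cap, (c2, r1, c1)) from the lines of one *R_NET block."""
    total_cap = None
    pi_model = None
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == "*R_NET":
            total_cap = float(tokens[2])
        elif tokens and tokens[0] == "*C2_R1_C1":
            pi_model = tuple(float(t) for t in tokens[1:4])
    return total_cap, pi_model

r_net = [
    "*R_NET NE_794 2.67137",
    "*DRIVER NL_1039:X",
    "*CELL INVX",
    "*C2_R1_C1 1.0039 367.972 1.66747",
    "*LOADS",
    "*RC NL_1040:A 1.25641",
    "*RC NL_2039:A 714.176",
    "*END",
]
total_cap, (c2, r1, c1) = parse_r_net(r_net)
# Assumption for this sketch: the two model capacitors should add up to total_cap.
consistent = math.isclose(c2 + c1, total_cap, rel_tol=1e-6)
print(c2 + c1, total_cap, consistent)
```

For the example net, 1.0039 + 1.66747 agrees with the declared 2.67137, so the values look reasonable.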
What’s in the Hierarchical Entities Section?

Hierarchical entities are references to submodules that are instantiated in the given design and have their own local SPEF file.
Syntax
define_def ::= define_entry {define_entry}
define_entry ::= *DEFINE inst_name {inst_name} entity
| *PDEFINE physical_inst entity
entity ::= qstring
Example
*DEFINE blk1 "subBLOCK"

SPEF Example 1: Basic d_net File
// Header
*SPEF "IEEE 1481-1999"
*DESIGN "Sample"
*DATE "13:03:59 Monday December 18, 2007"
*VENDOR "Sample Tool Vendor"
*PROGRAM "Parasitics Generator"
*VERSION "1.1.0"
*DESIGN_FLOW "EXTERNAL_LOADS"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 OHM
*L_UNIT 1 HENRY
// Power and Ground Nets
*POWER_NETS VDD
*GND_NETS VSS
// Externals/Ports
*PORTS
CONTROL O *L 30 *S 0 0
FARLOAD O *L 30 *S 0 0
INVX1FNTC_IN I *L 30 *S 5 5
NEARLOAD O *L 30 *S 0 0
TREE O *L 30 *S 0 0
// Internals
*D_NET INVX1FNTC_IN 0.033
*CONN
*P INVX1FNTC_IN I
*I FL_1281:A *L 0.033
*END
*D_NET INVX1FNTC 2.033341
*CONN
*I FL_1281:X O *L 0.0
*I I1184:A I *L 0.343
*I FL_1000:A I *L 0.343
*I NL_1000:A I *L 0.343
*I TR_1000:A I *L 0.343
*CAP
216 FL_1000:A 0.346393
217 I1184:A 0.344053
218 INVX1FNTC_IN 0
219 INVX1FNTC_IN:10 0.0154198
220 INVX1FNTC_IN:11 0.0117827
…
240 NL_1000:A 0.344804
241 TR_1000:A 0.34506
*RES
152 INVX1FNTC_IN INVX1FNTC_IN:18 8.39117
153 INVX1FNTC_IN INVX1FNTC_IN:5 25.1397
154 INVX1FNTC_IN:11 INVX1FNTC_IN:20 4.59517
…
175 INVX1FNTC_IN:9 INVX1FNTC_IN:10 10.8533
176 INVX1FNTC_IN:9 INVX1FNTC_IN:11 1.05164
*END
*D_NET NE_794 1.98538
*CONN
*I NL_1039:X O *L 0 *D INVX
*I NL_2039:A I *L 0.343
*I NL_1040:A I *L 0.343
*CAP
3387 NE_794 0
3388 NE_794:1 0.0792492
…
3413 NL_1040:A 0.344453
3414 NL_2039:A 0.343427
*RES
2879 NE_794:1 NE_794:13 66.1953
2880 NE_794:1 NE_794:2 0.311289
…
2903 NL_1039:X NE_794:25 1.00317
2904 NL_2039:A NE_794:23 0.171175
*END
SPEF Example 2: Basic r_net File

// Header
*SPEF "IEEE 1481-1999"
*DESIGN "Sample"
*DATE "Fri Feb 9 15:29:56 2007"
*VENDOR "Sample Tool Vendor"
*PROGRAM "Parasitics Generator"
*VERSION "1.1.0"
*DESIGN_FLOW "EXTERNAL_LOADS" "EXTERNAL_SLEWS"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1.0 PS
*C_UNIT 1.0 PF
*R_UNIT 1.0 OHM
*L_UNIT 1.0 HENRY
// Power and Ground Nets
*POWER_NETS VDD
*GROUND_NETS VSS
// Externals/Ports
*PORTS
TREE O *L 30 *S 0.0 0.0
FARLOAD O *L 30 *S 0.0 0.0
NEARLOAD O *L 30 *S 0.0 0.0
CONTROL O *L 30 *S 0.0 0.0
INVX1FNTC_IN I *L 30 *S 5000 5000
// Internals
*R_NET NE_794 2.67137
*DRIVER NL_1039:X
*CELL INVX
*C2_R1_C1 1.0039 367.972 1.66747
*LOADS
*RC NL_1040:A 1.25641
*RC NL_2039:A 714.176
*END
*D_NET INVX1FNTC_IN 0.033
*CONN
*P INVX1FNTC_IN I
*I FL_1281:A *L 0.033
*END

SPEF Example 3: Top Level with Name Map
// Header
*SPEF "IEEE 1481-1999"
*DESIGN "topLevel"
*DATE "MON Sep 9 9:34:01 2008"
*VENDOR "Sample Tool Vendor"
*PROGRAM "ParasiticsGenerator"
*VERSION "1.0 ALPHA"
*DESIGN_FLOW "EXTERNAL_SLEWS" "EXTERNAL_LOADS"
*DIVIDER |
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1.0 PS
*C_UNIT 1.0 PF
*R_UNIT 1.0 OHM
*L_UNIT 1.0 UH
// Name Map
*NAME_MAP
*1 IN1
*2 net1a
*3 blk1
*4 net3b
*5 OUT1
// Externals/Ports
*PORTS
*5 O *L 0.05
*1 I *S 5000 5000
// Hierarchical Entity
*DEFINE *3 "subBLOCK"
// Internals
*D_NET *4 0.32429
*CONN
*I *3:OUT2 O
*I I104:I I *L 0.044
*CAP
1 *3:OUT2 0.011307
2 I104:I 0.128838
3 *4:1 0.140145
*RES
5 *3:OUT2 *4:1 7.128
6 *4:1 I104:I 2.55215
*END
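The d_net in Example 3 is small enough to check by hand, which makes it a convenient test case for the kind of sanity check described earlier. The Python sketch below sums the `*CAP` entries and the `*L` pin loads and compares them with the declared total capacitance. The helper `summarize_d_net` is hypothetical and handles only grounded capacitors (three tokens per line); that the pin load is part of the total is an observation about this example's numbers, not a rule taken from the course text.

```python
def summarize_d_net(lines):
    """Sum capacitors, pin loads, and resistors in one *D_NET block."""
    section = None
    summary = {"total_cap": None, "pin_load": 0.0, "cap_sum": 0.0, "res_sum": 0.0}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        head = tokens[0]
        if head == "*D_NET":
            summary["total_cap"] = float(tokens[2])
        elif head in ("*CONN", "*CAP", "*RES"):
            section = head
        elif head == "*END":
            break
        elif section == "*CONN" and "*L" in tokens:
            summary["pin_load"] += float(tokens[tokens.index("*L") + 1])
        elif section == "*CAP" and len(tokens) == 3:
            summary["cap_sum"] += float(tokens[2])  # grounded capacitor: id node value
        elif section == "*RES" and len(tokens) == 4:
            summary["res_sum"] += float(tokens[3])  # resistor: id node node value
    return summary

d_net = [
    "*D_NET *4 0.32429",
    "*CONN",
    "*I *3:OUT2 O",
    "*I I104:I I *L 0.044",
    "*CAP",
    "1 *3:OUT2 0.011307",
    "2 I104:I 0.128838",
    "3 *4:1 0.140145",
    "*RES",
    "5 *3:OUT2 *4:1 7.128",
    "6 *4:1 I104:I 2.55215",
    "*END",
]
summary = summarize_d_net(d_net)
print(summary)
```

Here the three capacitors add up to 0.28029, and adding the 0.044 pin load reproduces the declared 0.32429.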
Extraction models
SPEF file
Correlation
Extraction Correlation
There are two types of extraction that are run in the physical implementation flow.
Extraction during optimization
Extraction during optimization is done because it is much faster.
Extraction during signoff
Extraction during signoff is done because it is more accurate, but it is slower and requires special inputs, such as GDSII, tool-specific libraries, and mapping files.
[Figure: Netlist, Place/Route, Optimization Extraction, GDSII, Signoff Extraction, SPEF Parasitic File]
Running Extraction with QRC

QRC is Cadence’s extraction tool.
Steps
Create extraction libraries: QRC requires special libraries generated from technology-specific files.
Input routed design: DEF, GDSII, etc.
Create command file: Commands and directives for setup and extraction.
Run extraction: Runs the extraction algorithms with the options specified in the command file.
Generate output file: Generates SPEF or other format.
Running Extraction with QRC (continued)
Command line:
qrc -cmd script.cmd -log logfile.log
Command file includes many options, among them:
process_technology
setup commands (input, output, and extraction)
input_db
output_db
extract
global_nets
capacitance
filter commands
# QRC Command File : GDSII -> SPEF
process_technology \
    -technology_library_file assura_tech.lib \
    -technology_name tsmc13
output_setup \
    -net_name_space schematic \
    -temporary_directory_name QRCRun \
    -file_name QRC_coupled.spef
extraction_setup \
    -max_fracture_length infinite \
    -net_name_space layout \
    -max_fracture_length_unit micron
input_db \
    -type assura \
    -directory_name ../rundir \
    -run_name EngineX4 \
    -format GDS \
    -design_file ../routed1.gds \
    -design_cell_name EngineX4
output_db -type spef
extract -selection all -type rc_coupled
global_nets -nets VDD VSS
capacitance -decoupling_factor 1.0
filter_coupling_cap \
    -coupling_cap_threshold_absolute 0.01
filter_cap \
    -exclude_floating_nets true
filter_res \
    -remove_dangling_res true \
    -merge_parallel_res true
[Figure: TCL, Physical Library, Routed Design (GDSII), Extraction, SPEF Parasitic File]
In a SPEF header, why are the program_name and program_version important?
What is the difference between d_net and r_net? When are they used?
What is the difference between coupling cap and fringe cap? Which kind of capacitance do we need to be concerned about for signal integrity?
What do you do with an SPEF file?
Parasitic extraction
Delay calculation
Delay Calculation
Delay calculation fundamentals
SDF
Back-annotation and forward-annotation
What Is Propagation Delay?
Definition: The propagation delay is the time difference between the input signal crossing a voltage threshold and the output signal crossing a voltage threshold.
Example: The inverter had a propagation delay of 10 ps.
[Figure: voltage versus time for the input and output signals of an inverter (INV), each swinging between VL and VH; the propagation delay, tprop = 10 ps, is measured between the VTH_50 crossings of the two signals.]
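The definition above can be turned into a small calculation. The following Python sketch finds the time at which each waveform crosses the 50% threshold by linear interpolation and subtracts the two. The waveform samples, the 1.0 V supply, and the function name `crossing_time` are assumptions made up for this illustration; they are chosen so that the result matches the 10 ps in the slide's example.

```python
def crossing_time(times, volts, threshold):
    """Return the first time the waveform crosses threshold, by linear interpolation."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 == threshold:
            return t0
        if (v0 - threshold) * (v1 - threshold) < 0:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the threshold")

VDD = 1.0           # hypothetical supply voltage in volts
VTH_50 = 0.5 * VDD  # 50% threshold

# Hypothetical piecewise-linear waveforms of an inverter, times in ps.
t_in, v_in = [0.0, 20.0, 40.0], [0.0, 1.0, 1.0]                # rising input
t_out, v_out = [0.0, 10.0, 30.0, 40.0], [1.0, 1.0, 0.0, 0.0]   # falling output

t_prop = crossing_time(t_out, v_out, VTH_50) - crossing_time(t_in, v_in, VTH_50)
print(t_prop)
```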
What Is Slew/Transition Time?

Definition: The slew of a signal is the rate of its transition, typically measured in volts/ns. The transition time is the time it takes for the signal to pass between two specified voltage thresholds. The threshold points are usually defined as a certain percentage of the voltage swing.
Example: The slew of the output signal was 0.01 volt/ps, whereas the transition time to go from 10% of VDD to 90% of VDD was 10 ps.
[Figure: voltage versus time for a rising and a falling edge between VL and VH; the transition time is measured between the VTH_10 and VTH_90 thresholds, and the slew is the slope of each edge.]
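For an idealized linear ramp, slew and transition time are related by simple arithmetic: the transition time is the voltage difference between the two thresholds divided by the slew. The Python sketch below assumes a linear edge and uses a hypothetical 1.0 V supply and 0.08 V/ps slew; real edges are not perfectly linear, so treat this as a first-order illustration only.

```python
def transition_time(vdd, slew_rate, low_frac=0.1, high_frac=0.9):
    """Time for a linear ramp with the given slew rate to go between two thresholds.

    vdd is in volts and slew_rate in volts per picosecond, so the result is in ps.
    """
    return (high_frac - low_frac) * vdd / slew_rate

# Hypothetical values: 1.0 V supply, 0.08 V/ps slew, 10% to 90% thresholds.
t_tran = transition_time(vdd=1.0, slew_rate=0.08)
print(t_tran)
```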

Delays in a Timing Path
The delay of a timing path is the sum of the delays between a start point and an end point.
The delays can include
Cell delay
Interconnect delay
The start points and end points can include
Register clocks, inputs
Ports of the design
Pins of a macro inside the design
[Figure: path delay from the start point (register CK->Q) to the end point (register D), alternating cell delays and interconnect delays.]
What Is Cell Delay (tcell)?

The cell delay (propagation delay) is the delay through the cell as determined by
Cell’s “intrinsic” delay
Load on the cell
Slew of the input signal
tcell = tintrinsic + tload_slew
Intrinsic delay (tintrinsic): Cell delay with zero load
Load: The larger the load, the longer the delay
Slew: The larger the input slew, the longer the delay

Load and Slew: tload_slew
The load and slew dependent portion of the cell delay is calculated via tables in a technology timing library.
Library vendor characterizes each cell in the library for timing.
Table values serve as boundaries, so the delays can be estimated between the given table values.
# Table for load/slew dependent cell delay
# (one row of delay values per slew axis point, one column per load axis point)
Model(ioDelayRiseModel
(Spline
(Input_Slew_Axis 0.050 0.200 1.000 4.000 20.000)
(Load_Axis 0.0446 0.892 3.568 14.275)
data((0.7210 0.8471 1.2849 3.05673)
(0.8119 0.9380 1.3758 3.1475)
(0.9975 1.1236 1.5612 3.3322)
(1.4293 1.5552 1.9922 3.7609)
(3.3955 3.5204 3.9542 5.7101))
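To show how a delay can be estimated between table points, here is a Python sketch that looks up the table above with bilinear interpolation. This is an assumption made for illustration: the library entry is labeled `Spline`, and a real delay calculator may use a different interpolation scheme. The sketch also assumes that rows follow the slew axis and columns follow the load axis; the names `bracket` and `lookup` are hypothetical.

```python
from bisect import bisect_right

SLEW_AXIS = [0.050, 0.200, 1.000, 4.000, 20.000]
LOAD_AXIS = [0.0446, 0.892, 3.568, 14.275]
DELAY = [  # one row per slew value, one column per load value
    [0.7210, 0.8471, 1.2849, 3.05673],
    [0.8119, 0.9380, 1.3758, 3.1475],
    [0.9975, 1.1236, 1.5612, 3.3322],
    [1.4293, 1.5552, 1.9922, 3.7609],
    [3.3955, 3.5204, 3.9542, 5.7101],
]

def bracket(axis, x):
    """Index i of the axis interval [axis[i], axis[i+1]] to use for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup(slew, load):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i = bracket(SLEW_AXIS, slew)
    j = bracket(LOAD_AXIS, load)
    u = (slew - SLEW_AXIS[i]) / (SLEW_AXIS[i + 1] - SLEW_AXIS[i])
    v = (load - LOAD_AXIS[j]) / (LOAD_AXIS[j + 1] - LOAD_AXIS[j])
    low = DELAY[i][j] * (1 - v) + DELAY[i][j + 1] * v
    high = DELAY[i + 1][j] * (1 - v) + DELAY[i + 1][j + 1] * v
    return low * (1 - u) + high * u

print(lookup(0.200, 0.892))   # a table point
print(lookup(0.125, 0.0446))  # halfway between the first two slew points
```

Points outside the table range are extrapolated linearly from the nearest interval, which is a simplification.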
What Is Interconnect Delay?

The delay through the nets or interconnects of a design is calculated from the resistance and capacitance of the nets. These values can be
Estimated: Uses a “wire load model” (WLM) that estimates the net delay based on load and slew
Reduced: Uses a reduced SPEF (r_net) and annotates a lumped RC value to the net
Detailed: Uses a detailed SPEF (d_net) and annotates the detailed RC values to the net

Calculating the Path Delay
So, the path delay in our original example would be the sum of the cell and interconnect delays.
tpath = tc1 + ti1 + tc2 + ti2 + tc3 + ti3 + tc4, where
tc1 is the clock-to-q delay of the starting register
ti1, ti2, and ti3 are the interconnect delays
tc2 and tc3 are the cell delays of the logic in between the registers
tc4 is the setup time of the ending register
[Figure: tpath from the start point (register CK->Q) to the end point (register D), made up of tc1, ti1, tc2, ti2, tc3, ti3, and tc4.]
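As a worked instance of the formula above, the Python sketch below adds up the seven terms. All delay values are hypothetical numbers in nanoseconds, chosen only to illustrate the arithmetic.

```python
# Hypothetical delays in ns for the path in the example.
cell_delays = {"tc1": 0.30, "tc2": 0.20, "tc3": 0.25, "tc4": 0.10}  # tc4 is the setup time
interconnect_delays = {"ti1": 0.05, "ti2": 0.08, "ti3": 0.04}

def path_delay(cells, nets):
    """tpath is the sum of all cell and interconnect delays along the path."""
    return sum(cells.values()) + sum(nets.values())

t_path = path_delay(cell_delays, interconnect_delays)
print(round(t_path, 3))
```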
How do slew and load affect delay?
How does a library vendor get the timing data for its technology libraries?
Delay Calculation
Input
Parasitic extraction file (SPEF)
Logical timing libraries in Liberty format
Optional: Physical libraries in LEF format
Constraints and commands in TCL
Output
Standard Delay Format (SDF) file containing all of the delay information in the design
[Figure: Routed Design (Gates + DEF), TCL, SPEF, Logical Library, Physical Library, Delay Calculation, SDF Delay File]
Delay Calculation in Flow

Delay calculation is performed at all stages of the place/route flow, including logic synthesis.
Rough estimates based on wire load models in logic synthesis
Better estimates after floorplanning, placement, and CTS
Best estimates based on extracted parasitics after routing
Output of delay calculation (SDF) is used in many other steps in the flow.
Internally, it is used during logic synthesis during optimization.
In signal integrity, delay calculation creates incremental SDF for timing analysis, based on the SI parasitics.
In static timing analysis, the SDF file is used to annotate timing on cells and nets.
[Figure: physical implementation flowchart from Specification, Micro-Architecture, RTL, and Logic Synthesis through Floorplanning and Place/Route (Placement, Physical Synthesis, Scan Reorder, CTS, Route, with design optimization PostPlace, PostCTS, and PostRoute) to Design Verification, Mask Prep, and GDSII, with Delay Calculation, Signal Integrity, and Extraction alongside.]

Delay Calculation
SDF
What Is SDF?
Definition: An IEEE standard for the representation and interpretation of timing data for use at any stage of an electronic design process
Example: In our design flow, we have a standalone delay calculator that outputs SDF. We loaded the SDF into our static timing analysis tool to verify our design meets its performance requirements.
Example SDF File
(DELAYFILE
  (SDFVERSION "3.0")
  (DESIGN "BIGCHIP")
  (DATE "March 12, 1995 09:46")
  (VENDOR "Southwestern ASIC")
  (PROGRAM "Fast program")
  (VERSION "1.2a")
  (DIVIDER /)
  (VOLTAGE 5.5:5.0:4.5)
  (PROCESS "best:nom:worst")
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
  (CELL
    (CELLTYPE "BIGCHIP")
    (INSTANCE top)
    (DELAY
      (ABSOLUTE
        (INTERCONNECT mck b/c/clk (.6:.7:.9))
        (INTERCONNECT d[0] b/c/d (.4:.5:.6))
      )
    )
  )
  (CELL
    (CELLTYPE "AND2")
    (INSTANCE top/b/d)
    (DELAY
      (ABSOLUTE
        (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
        (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
      )
    )
  )
…

SDF Specification Version 3.0
This is from the SDF specification.
Introduction
The SDF file stores the timing data generated by EDA tools for use at any stage in the design process. The data in the SDF file is represented in a tool-independent way and can include
Delays: Module path, device, interconnect, and port
Timing checks: Setup, hold, recovery, removal, skew, width, period, and nochange
Timing constraints: Path, skew, period, sum, and diff
Incremental and absolute delays
Timing environment: Intended operating timing environment
Conditional and unconditional module path delays and timing checks
Design/instance-specific or type/library-specific data
Scaling, environmental, and technology parameters
Throughout a design process, you can use several different SDF files. Some of these files can contain pre-layout timing data. Others can contain path constraint or post-layout timing data.
What’s in an SDF File?

Here are the basic elements of an SDF file.
Header: Contains all of the basic information of the SDF file’s origin and specifications
Cell entries: Identifies a cell or macro that contains timing data to be applied. Within a cell entry, there can be delay, timing check, and timing environment entries
Delay entries: Identifies I/O paths, ports, and interconnects that contain timing data to be applied
Timing check entries: Associate timing check limit values with specific cell instances
Timing environment entries: Contains timing environment information, constraints, etc.

Header
The header contains basic information of the SDF file’s origin and specifications, including among others
SDF version
Design name
Vendor, program name, and version
Process information, timescale
Example:
(DELAYFILE
  (SDFVERSION "3.0")
  (DESIGN "MYCHIP")
  (DATE "December 30, 2007 12:08")
  (VENDOR "ASIC_vendor")
  (PROGRAM "SDF_program")
  (VERSION "2.4.1")
  (DIVIDER /)
  (VOLTAGE 1.5:1.3:1.1)
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
The SDF version, vendor, and program name are important to note for debug reasons. Also, process, temperature, voltage, and timescale information should be consistent with your timing analysis.
Cell Entries
Cell entries identify the cells and macros in a design with the following information:
Cell type
Cell instance name
Example:
(CELL
  (CELLTYPE "DFF")
  (INSTANCE u1/u2/u3_reg)
  …
    )
  )
)
The delay, timing check, and timing environment entries can be located inside of each cell entry.

Delay Entries
There are three types of delays in a delay entry:
Absolute: SDF delays overwrite existing delays during annotation.
Incremental: SDF delays are added to existing delays during annotation.
Pulse width: Pulse limits are set for specific points.
It is important to differentiate them. For a typical analysis with crosstalk, the absolute delays are first annotated, then the incremental delays due to crosstalk are annotated and added to the existing delays.
[Figure: example of incremental annotation (an existing 2 ns delay plus a 1 ns incremental SDF delay gives 3 ns) and of pulse width limits (limit1, limit2) on paths from in1 and in2 to out.]
Delay Entries (continued)

Delay entries associate delay values with the elements of its cell entry and can include the following delay types.
Absolute: SDF replaces existing delay values in the design during annotation.
Increment: SDF adds to existing delay values in the design during annotation.
Pathpulse: Pulse width limits
Examples:
(DELAY
  (ABSOLUTE
    (IOPATH (posedge clk) q (22:28:33) (25:30:37))
    (PORT clr (32:39:49) (35:41:47))
  )
)
(DELAY
  (INCREMENT
    (IOPATH (posedge clk) q (-4::2) (-7::5))
    (PORT clr (2:3:4) (5:6:7))
  )
)
(DELAY
  (PATHPULSE i1 o1 (13) (21))
)
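The difference between absolute and incremental annotation can be sketched in a few lines of Python. The example below applies the first IOPATH value from the ABSOLUTE entry above and then the matching INCREMENT value. The helpers `parse_triple` and `annotate` are hypothetical, and the sketch assumes that an omitted field in a triple (as in `(-4::2)`) leaves the existing value untouched.

```python
def parse_triple(text):
    """'(22:28:33)' -> (22.0, 28.0, 33.0); an omitted field such as in '(-4::2)' becomes None."""
    fields = text.strip("()").split(":")
    if len(fields) == 1:
        fields = fields * 3  # a single number applies to min, typ, and max
    return tuple(float(f) if f else None for f in fields)

def annotate(existing, kind, value):
    """Apply one SDF value to an existing (min, typ, max) delay."""
    if kind == "ABSOLUTE":   # replace existing values
        return tuple(e if v is None else v for e, v in zip(existing, value))
    if kind == "INCREMENT":  # add to existing values
        return tuple(e if v is None else e + v for e, v in zip(existing, value))
    raise ValueError("unknown delay type: " + kind)

delay = (0.0, 0.0, 0.0)                                       # nothing annotated yet
delay = annotate(delay, "ABSOLUTE", parse_triple("(22:28:33)"))
delay = annotate(delay, "INCREMENT", parse_triple("(-4::2)"))
print(delay)
```

After both steps the minimum and maximum values have moved by the increment, while the typical value, which the increment omits, is unchanged.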

Setup, Hold, Recovery, and Removal
Review on timing checks
Setup and hold
Recovery
Removal
Setup/Hold
Setup: Limit of time where data must remain stable before the clock edge
Hold: Limit of time where data must remain stable after the clock edge
Recovery
Limit of time between the removal of an asynchronous signal (not data) and an active clock edge
Removal
Limit of time between an active clock edge and the removal of an asynchronous signal (not data)
[Figure: clk and a_rstb waveforms showing the recovery and removal intervals.]
Timing Check Entries

Timing check entries associate timing check limit values with specific cell instances. Among the available types
Setup
Hold
Recovery
Removal
Example:
(TIMINGCHECK
  (SETUP din (posedge clk) (12))
)
(TIMINGCHECK
  (HOLD din (posedge clk) (9.5))
)
(TIMINGCHECK
  (RECOVERY (posedge clearbar) (posedge clk) (11.5))
)
(TIMINGCHECK
  (REMOVAL (posedge clearbar) (posedge clk) (6.3))
)

Timing Environment Entries
Timing environment entries associate constraint values with critical paths, as well as provide information about the environment in which the circuit will operate. Among the entries are
Constraints for path, period, skew, etc.
Time for arrival, departure, slack
Waveform
In most cases, all of this information is contained in the Synopsys Design Constraints (SDC) file.
SDF Example: Basic File

Example
// Header
(DELAYFILE
  (SDFVERSION "3.0")
  (DESIGN "BIGCHIP")
  (DATE "March 12, 1995 09:46")
  (VENDOR "Southwestern ASIC")
  (PROGRAM "Fast program")
  (VERSION "1.2a")
  (DIVIDER /)
  (VOLTAGE 5.5:5.0:4.5)
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
  // Cell 1 – Top with interconnects
  (CELL
    (CELLTYPE "BIGCHIP")
    (INSTANCE top)
    (DELAY
      (ABSOLUTE
        (INTERCONNECT mck b/c/clk (.6:.7:.9))
        (INTERCONNECT d[0] b/c/d (.4:.5:.6))
      )
    )
  )
  // Cell 2 – AND gate with delays
  (CELL
    (CELLTYPE "AND2")
    (INSTANCE top/b/d)
    (DELAY
      (ABSOLUTE
        (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
        (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
      )
    )
  )
  // Cell 3 – Register with delays, setup checks
  (CELL
    (CELLTYPE "DFF")
    (INSTANCE top/b/c)
    (DELAY
      (ABSOLUTE
        (IOPATH (posedge clk) q (2:3:4) (5:6:7))
        (PORT clr (2:3:4) (5:6:7))
      )
    )
    (TIMINGCHECK
      (SETUPHOLD d (posedge clk) (3:4:5) (-1:-1:-1))
      (WIDTH clk (4.4:7.5:11.3))
    )
  )
  // More Cells
  (CELL
    . . .
  )
)
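Because the example declares `(TIMESCALE 100 ps)`, each number in the file is a multiple of 100 ps, so the AND2 delay `1.5:2.5:3.4` stands for 150, 250, and 340 ps. The Python sketch below makes that conversion explicit for the simple IOPATH entries of the AND2 cell. The regular expressions and the function names are assumptions of this sketch; it does not handle edge-qualified ports such as `(posedge clk)` and is not a full SDF parser. The 1 ns fallback for a missing TIMESCALE entry is the default as the author of this sketch understands the SDF specification.

```python
import re

UNIT_TO_PS = {"s": 1e12, "ms": 1e9, "us": 1e6, "ns": 1e3, "ps": 1.0, "fs": 1e-3}
IOPATH = re.compile(r"\(IOPATH\s+(\S+)\s+(\S+)\s+\(([^)]*)\)\s+\(([^)]*)\)\s*\)")

def timescale_ps(sdf_text):
    """Return the TIMESCALE multiplier in picoseconds (1 ns assumed if the entry is absent)."""
    m = re.search(r"\(TIMESCALE\s+([\d.]+)\s*([a-z]+)\)", sdf_text)
    if m is None:
        return 1e3
    return float(m.group(1)) * UNIT_TO_PS[m.group(2)]

def iopath_delays_ps(sdf_text):
    """Map (input, output) to ((min, typ, max) rise, (min, typ, max) fall) in ps."""
    scale = timescale_ps(sdf_text)
    result = {}
    for src, dst, rise, fall in IOPATH.findall(sdf_text):
        to_ps = lambda triple: tuple(float(x) * scale for x in triple.split(":"))
        result[(src, dst)] = (to_ps(rise), to_ps(fall))
    return result

SDF_TEXT = """
(DELAYFILE
  (TIMESCALE 100 ps)
  (CELL (CELLTYPE "AND2") (INSTANCE top/b/d)
    (DELAY (ABSOLUTE
      (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
      (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
    ))
  )
)
"""
delays = iopath_delays_ps(SDF_TEXT)
print(delays[("a", "y")])
```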

Running Delay Calculation
SignalStorm® NDC is Cadence’s delay calculation tool.
Steps
Generate SignalStorm libraries: Generates libraries for more accurate delay calculation
Generate a SignalStorm design database: Imports netlist information
Import SPEF: Imports parasitics
Setup conditions: Sets up the boundary conditions for the design, including slew and load information
Calculate delay: Core algorithm to calculate delay for the design
Generate output file and reports: Generate SDF and reports
Running Delay Calculation (continued)

Command line:
sndc -S script.cmd -L logfile.log
Command file includes many options, among them
Create and open design database
Import SPEF
Import setup commands
Load and link design
Calculate delay
Write output files and reports
# SignalStorm NDC Command File : SPEF -> SDF
db_open demo
db_install -spef test.spef
db_setup -setup test.st -process worst
db_load TEST_CHIP
db_delay -process worst
db_xtk -process worst
db_report sdf -p worst -report test.sdf
    -design TEST_CHIP -xtk_min fast
    -xtk_max slow
db_close
[Figure: TCL, SPEF, Tech Libraries, Routed Design (DEF), Delay Calculation, SDF Delay File]
Delay Calculation
SDF
Back-Annotation
Delay calculation produces an SDF timing file, based on technology information, SPEF, and design information (netlist).
The analysis tool can now read in the SDF, as well as the design and technology information, to produce its reports.
Since the SDF is already created, all of the timing information can be used by all subsequent tools in the flow, thus ensuring consistency.
[Figure: SPEF, Tech Lib, Delay Calculator, SDF, Netlist, Tech Lib, Analysis Tool]

Forward-Annotation
An analysis tool can take the user constraints and technology library information and create an SDF file with more granular constraints to drive an implementation tool.
The implementation tool (synthesis or place/route) can use the information in the forward-annotated SDF file to more accurately constrain the design and possibly make better choices to meet its overall constraints.
[Figure: Tech Lib, User Constraints, Analysis Tool, SDF Constraints, Implementation Tool]
Summary
There are various types of extraction models, which trade accuracy against runtime. They include parallel plate, 2.5D, and 3D models.
The Standard Parasitic Exchange Format (SPEF) file is the IEEE standard to store parasitic information for a design. It has several sections, including header, externals, and internals.
Fundamentally, delay calculation is based on the concepts of propagation delay, transition time, and slew. We saw that cell delay is a function of load and input slew, among other variables.
The Standard Delay Format (SDF) file is the IEEE standard to store delay information. It has several sections, including header, cell, and delay entries.
SDF delay data can be back-annotated to analysis tools, whereas SDF constraint data can be forward-annotated to implementation tools.

True or false
1. 3D extraction models are used for quick and relatively inaccurate parasitic calculations.
2. In the header section of SPEF, there is a place to annotate the version of the tool.
3. Recovery is similar to a hold time check, and removal is similar to a setup time check.
4. In the “cell entry” of an SDF file, all of the relevant timing information for the cell is included within its boundaries.
5. One advantage of using a delay calculator’s SDF is that the timing calculations will be consistent throughout the entire flow.
Learning Activity
Study the physical implementation flowchart
Add SDF and SPEF files at the appropriate step in the flow
Sources
Standard Parasitic Exchange Format (SPEF), IEEE Standard 1481-1999
Standard Delay Format Specification Version 3.0, Open Verilog International: http://www.eda.org/sdf/sdf_3.0.pdf
Static Timing Analysis and
Signal Integrity Analysis
Module 10
June 16, 2008
How Do You Check if a Circuit Meets Timing?
Module Objective
In this module, you will learn to
Explain static timing and signal integrity (SI) analysis and identify problems
What determines the speed at which a circuit works?
How do you gauge if your circuit works correctly at the required speed?
Static timing analysis (STA)
Signal integrity analysis

Timing analysis
Timing constraints
Constraint checking and report timing: slacks and violations
Timing exceptions
Setup and hold timing violations
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
Design rule verification
Purpose for Timing Analysis
The goal of timing analysis is to verify that a design meets timing requirements under a specified set of timing constraints.
Timing analysis lets you determine how fast a design can run without incurring timing violations.
The results of timing analysis can be used to fine tune and debug the speed-limiting, critical paths in a design.
Types of Timing Analysis

Static timing analysis
Adds delays for all elements in a timing path together and compares with given timing constraints
Analyzes all possible timing paths in a short period of time
Ignores functionality of circuit, thus analyzing paths that cannot be exercised and must be eliminated by the designer
Preferred method for signoff
Dynamic timing analysis
Designer creates timing test vectors that are simulated using a gate-level netlist to verify timing
No false paths exist
Easy to miss paths by not including them in vectors
Requires a significant amount of CPU time to do simulations
This is a mandatory late-stage run to ensure that paths not tested by static timing analysis are checked
In this course, we will only cover static timing analysis.
What Is Static Timing Analysis?
The preferred method for timing signoff: computing the timing of logically related paths for a digital design without regard to large scale functional behavior
Example: To determine the timing of the design, we ran static timing analysis after detail route, and saw several paths violating their setup time requirements.
[Figure: physical implementation flowchart from Specification, Micro-Architecture, RTL, and Logic Synthesis (synthesized netlist) through Placement, Physical Synthesis, CTS, and Route (detail routed design) to GDSII, with Delay Calculation, Signal Integrity, and Extraction alongside.]

Input
Design in the Verilog® language or other HDL (Note: STA can be run on a design at any stage of the back-end flow.)
Constraints in Synopsys Design Constraints (SDC) format
Logical timing libraries in Liberty (.lib) format
Constraints and commands in TCL
SPEF, SDF, and incremental SDF (SI analysis)
Output
Timing reports, including noise-on-delay effects (SI analysis)
[Figure: Routed Design (Gates), SPEF, SDF, Incremental SDF, SDC, TCL, Logical Library, Static Timing Analysis, Reports]

Timing analysis
Timing constraints
Timing exceptions
Timing correlation
What Are Timing Constraints?

Timing constraints represent the performance goals for your designs. Software tools use the timing constraints to guide the timing-driven optimization tools in order to meet these goals.
Some of the timing constraints that an STA tool follows are
Clocks definition
Input delay/arrival time
Output delay/required time
What Is a Clock Definition?
Clock period: The time difference between two consecutive rising or falling clock edges when they cross a specific reference level
Duty cycle: The ratio between the pulse duration (t) and the period (T) of a rectangular waveform
Duty Cycle = t/T
[Figure: rectangular clock waveform with rising and falling edges, pulse duration t, and clock period T.]
Types of Clocks
Ideal clocks
To simplify clock analysis, we assume that under ideal conditions all flip-flops are clocked together at a time reference (time = 0 ns).
In ideal mode, the clock tree has zero insertion delay.
Propagated clocks
Insertion delay is the known delay of the clock tree to any given end point.
Clock uncertainty = clock skew + clock jitter; it is the unknown variation in clock delays.
Clock delays are calculated from clock tree routing and extracted delays.
Provides more accuracy and is used for final timing closure.
[Figure: ideal clock versus propagated clock, showing clock insertion delay and clock uncertainty (skew and jitter).]

Pre-CTS and Post-CTS Constraints
Pre-CTS
• Ideal clocks with uncertainty are used.
• Uncertainty consists of margin (extra delay the design team adds), clock skew, and clock jitter.
• Estimated latency is considered.
Post-CTS
• Propagated clocks are used.
• Uncertainty consists of margin and clock jitter.
• Propagated latency is considered.
[Figure: ideal clock (pre-CTS) and propagated clock (post-CTS), each showing the clock source, source latency, network latency, the clock pin, and the "C. logic" delay]
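As a rough sketch of how the uncertainty budget changes across CTS, the Python below adds up the components listed above. The function name and all numeric values are hypothetical; real budgets come from the design team and the technology.

```python
def clock_uncertainty(margin, jitter, skew=0.0):
    """Sum the uncertainty components (all in ns)."""
    return margin + jitter + skew


# Hypothetical budget values in ns.
MARGIN, JITTER, SKEW = 0.05, 0.03, 0.10

# Pre-CTS: ideal clocks, so skew has to be budgeted inside the uncertainty.
pre_cts = clock_uncertainty(MARGIN, JITTER, SKEW)

# Post-CTS: propagated clocks carry the real clock tree delays,
# so only margin and jitter remain in the uncertainty.
post_cts = clock_uncertainty(MARGIN, JITTER)

print(round(pre_cts, 3), round(post_cts, 3))
```

The post-CTS value is smaller because skew is no longer an estimate: it shows up directly in the propagated clock arrival times.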
What Are Arrival Time and Required Time?

The input delay time or the arrival time is the time that the data is presented to the inputs of the module or register, respectively.
The external delay time or the required time is the time determined by external logic before the next rising edge of the clock.
[Figure: data arrival timing across clock periods, marking the input delay and the output delay]

What Is an Operating Condition?
Integrated circuits display performance differences depending on the fabrication process, voltage, and temperature (PVT) characteristics.
Each wafer batch is made with a slightly different set of process parameters and thus, inherently, the die will run at different speeds. In fact, there can even be variations across a single die (on-chip variation, OCV), which will be discussed later.
This constraint describes the process, voltage, and temperature conditions of the design.
There are three conditions: worst, best, and typical.
Operating conditions can be set from a single set of libraries (min, typ, or max) or from multiple libraries (min and max), and used to perform setup and hold analysis.
The technology libraries contain information on how to scale the cell parameters with variation in process parameters and operating conditions, which can be used to calculate accurate cell delay.
[Figure: STD cell library operating corners, labeled as follows]
• WORST case, HIGH temperature, LOW voltage, BEST process
• WORST case, HIGH temperature, LOW voltage, WORST process
• BEST case, HIGH temperature, HIGH voltage, WORST process
• BEST case, LOW temperature, HIGH voltage, BEST process

Check Constraints
Constraint checking is run when a design is loaded into an STA tool and constraints are applied.
It checks for consistency and completeness of the timing constraints specified for the design.
The timing constraints should be complete before running a timing debug.
STA tools come with specific commands that run these checks.
[Figure: flowchart from CHECK Constraints to a Clean? decision with NO and YES branches; YES leads to TIMING]
Check Constraints (continued)

Some of the common checks are
• Connectivity checks for clock and data to ensure that clock and data signals are propagated
• Arrival time and required time for each clock in a multiple clock system
• Clock gating points
• Combinational loops in the design
• Constant collision/contradiction on a net connected to the pin
• Multiple clocks arriving at a leaf cell
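One of the checks above, finding combinational loops, amounts to cycle detection on the combinational part of the netlist. The sketch below uses a depth-first search over a hypothetical fanout map. It is not how any particular STA tool implements the check; the instance names and the `fanout` structure are assumptions for illustration.

```python
def find_combinational_loop(fanout):
    """Return one loop as a list of nodes, or None if there is no loop.

    `fanout` maps each combinational instance to the instances it drives.
    Registers are assumed to be left out of the map, so any cycle found
    is purely combinational.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in fanout.get(node, ()):
            nxt_state = state.get(nxt, UNSEEN)
            if nxt_state == ON_PATH:
                # Reached a node already on the current path: a loop.
                return path[path.index(nxt):] + [nxt]
            if nxt_state == UNSEEN:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        state[node] = DONE
        return None

    for start in list(fanout):
        if state.get(start, UNSEEN) == UNSEEN:
            loop = visit(start)
            if loop:
                return loop
    return None


# Hypothetical netlist: u3 feeds back into u1.
netlist = {"u1": ["u2"], "u2": ["u3"], "u3": ["u1", "u4"], "u4": []}
print(find_combinational_loop(netlist))  # ['u1', 'u2', 'u3', 'u1']
```

A loop matters to STA because a timing path through it has no well-defined start or end, so the tool must break the loop or flag it before analysis.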
What Is a Timing Report?
The timing report is a summary of the final timing information.
There are separate reports for setup time analysis and hold time analysis.
The report usually consists of the following parts:
Header
• Start point/end point pair for which timing is being calculated
• End point arrival time calculation
• Slack calculation
Body
Timing information for all paths from:
• An external input pin to an internal register
• An internal register (or input select pin) to an output pin
• An internal register to another internal register (C2C)
Example Timing Report

Header
Endpoint: data_out[4] (^) checked with leading edge of 'vclk1'
Beginpoint: DATA_BUS_MACH_INST/reg_4/Q (^) triggered by leading edge of 'vclk1'
Other End Arrival Time 0.000
+ Source Insertion Delay 3.000
- External Delay 2.000
+ Phase Shift 10.000
- Uncertainty 0.250
= Required Time 10.750
- Arrival Time 7.447
= Slack Time 3.303
Clock Rise Edge 0.000
+ Source Insertion Delay 4.000
= Beginpoint Arrival Time 4.000
Body
+-------------------------------------------------------------------------------------+
| Instance                   | Arc           | Cell      | Delay | Arrival | Required |
|                            |               |           |       | Time    | Time     |
|----------------------------+---------------+-----------+-------+---------+----------|
| i_150                      | Y ^           |           |       |   4.000 |    7.303 |
| DTMF_INST/m_clk__L1_I1     | A ^ -> Y v    | CLKINVX20 | 0.327 |   4.327 |    7.630 |
| DTMF_INST/m_clk__L2_I2     | A v -> Y ^    | CLKINVX20 | 0.278 |   4.604 |    7.908 |
| DATA_BUS_MACH_INST/reg_4   | CK ^ -> Q ^   | SDFFRHQX1 | 0.507 |   5.112 |    8.415 |
| TDSP_CORE_GLUE_INST/i_9712 | A ^ -> Y v    | INVXL     | 0.135 |   5.247 |    8.550 |
| TDSP_CORE_GLUE_INST/i_9713 | A v -> Y ^    | INVXL     | 0.101 |   5.348 |    8.651 |
| PORT_BUS_MACH_INST/i_9761  | A ^ -> Y v    | INVXL     | 0.095 |   5.443 |    8.747 |
| PORT_BUS_MACH_INST/i_9762  | A v -> Y ^    | INVX2     | 0.122 |   5.566 |    8.869 |
| FE_OFC1146_tdsp_portO_4_   | A ^ -> Y ^    | BUFX12    | 0.172 |   5.738 |    9.041 |
| IOPADS_INST/Ptdspop04      | I ^ -> PAD ^  | PDO04CDG  | 1.709 |   7.447 |   10.750 |
|                            | data_out[4] ^ |           | 0.000 |   7.447 |   10.750 |
+-------------------------------------------------------------------------------------+
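The header arithmetic of the example report can be replayed directly. The short Python sketch below uses only the numbers printed in the report; the dictionary keys are illustrative names, and `Decimal` is used so the three-decimal values add up exactly.

```python
from decimal import Decimal as D

# Values copied from the header of the example report (ns).
header = {
    "other_end_arrival": D("0.000"),
    "source_insertion_delay": D("3.000"),
    "external_delay": D("2.000"),
    "phase_shift": D("10.000"),
    "uncertainty": D("0.250"),
    "arrival_time": D("7.447"),
}

# Apply the signs shown in the report: +, -, +, - in that order.
required_time = (
    header["other_end_arrival"]
    + header["source_insertion_delay"]
    - header["external_delay"]
    + header["phase_shift"]
    - header["uncertainty"]
)
slack = required_time - header["arrival_time"]

print(required_time, slack)  # 10.750 3.303
```

The positive slack of 3.303 matches the report: the data arrives at the endpoint well before it is required.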

What Are Setup Time and Hold Time?

Synchronous inputs have setup/hold specifications relative to the clock.
Setup Time: The time a synchronous input must be stable before the active clock edge.
Hold Time: The time a synchronous input must be stable after the active clock edge.
[Figure: input data valid window around the clock edge, marking setup time and hold time]

Setup Time and Hold Time Violations
A setup time violation is when a signal arrives too late and misses the time when it should advance.
A hold time violation is when a signal arrives too early and advances one clock cycle before it should.
[Figure: input data and clock waveforms, marking a setup time violation and a hold time violation]
Timing Report for Setup Violations

Path 1: VIOLATED Setup Check with Pin reg_2/CK
Endpoint: reg_2/D (v) checked with leading edge of ’CLK1’
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
- Setup 0.167
+ Phase Shift 2.000
= Required Time 1.937
- Arrival Time 1.946
= Slack Time -0.009
Clock Rise Edge 0.000

| Instance | Arc         | Cell      | Delay | Arrival Time | Required Time |
|          | clk ^       |           |       | 0.000        | 0.088         |
| ck_0     | A ^ -> Y ^  | BUFX2     | 0.091 | 0.091        | 0.178         |
| ck_1     | A ^ -> Y ^  | BUFX2     | 0.097 | 0.188        | 0.275         |
| ck_2     | A ^ -> Y ^  | BUFX2     | 0.094 | 0.282        | 0.369         |
| ck_3     | A ^ -> Y ^  | BUFX2     | 0.092 | 0.374        | 0.462         |
| ck_4     | A ^ -> Y ^  | CLKAND2X2 | 0.150 | 0.524        | 0.612         |
| reg_1    | CK ^ -> Q v | DFFRHQX1  | 0.288 | 0.812        | 0.900         |
| t_1      | A ^ -> Y ^  | BUFX8     | 0.111 | 0.923        | 1.011         |
| t_2      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.015        | 1.103         |
| t_3      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.107        | 1.195         |
| t_4      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.199        | 1.287         |
| t_5      | A ^ -> Y ^  | BUFX4     | 0.132 | 1.331        | 1.379         |
| t_6      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.423        | 1.471         |
| t_7      | A ^ -> Y ^  | BUFX6     | 0.112 | 1.535        | 1.563         |
| t_8      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.627        | 1.655         |
| t_9      | A ^ -> Y ^  | BUFX4     | 0.128 | 1.755        | 1.747         |
| t_10     | A ^ -> Y ^  | BUFX8     | 0.088 | 1.843        | 1.835         |
| t_11     | B ^ -> Y ^  | NAND2X1   | 0.066 | 1.909        | 1.901         |
| t_12     | A ^ -> Y ^  | INVX1     | 0.037 | 1.946        | 1.937         |
| reg_2    | D v         | DFFRHQX1  | 0.000 | 1.946        | 1.937         |

[Figure: reg_1 (Q) drives reg_2 (D) through cells t_1 to t_12; clk reaches the CK pins through ck_0 to ck_4]

Timing Report for Hold Violations
Path 1: VIOLATED Hold Check with Pin reg_3/CK
Endpoint: reg_3/D (v) checked with leading edge of ’CLK1’
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
+ Hold 0.179
+ Phase Shift 0.000
= Required Time 1.152
Arrival Time 1.099
= Slack Time -0.053
Clock Rise Edge 0.000
= Beginpoint Arrival Time 0.000

| Instance | Arc         | Cell      | Delay | Arrival Time | Required Time |
|          | clk ^       |           |       | 0.000        | 0.088         |
| ck_0     | A ^ -> Y ^  | BUFX2     | 0.091 | 0.091        | 0.178         |
| ck_1     | A ^ -> Y ^  | BUFX2     | 0.097 | 0.188        | 0.275         |
| ck_2     | A ^ -> Y ^  | BUFX2     | 0.094 | 0.282        | 0.369         |
| ck_3     | A ^ -> Y ^  | BUFX2     | 0.092 | 0.374        | 0.462         |
| ck_4     | A ^ -> Y ^  | CLKAND2X2 | 0.150 | 0.524        | 0.612         |
| reg_1    | CK ^ -> Q v | DFFRHQX1  | 0.288 | 0.812        | 0.900         |
| t_1      | A ^ -> Y ^  | BUFX8     | 0.092 | 0.904        | 0.992         |
| t_2      | A ^ -> Y ^  | BUFX8     | 0.092 | 0.996        | 1.084         |
| t_15     | B ^ -> Y ^  | NAND2X1   | 0.066 | 1.062        | 1.115         |
| t_16     | A ^ -> Y ^  | INVX1     | 0.037 | 1.099        | 1.152         |
| reg_3    | D v         | DFFRHQX1  | 0.000 | 1.099        | 1.152         |
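The two violated reports differ in the direction of the comparison: a setup check fails when data arrives after the required time, a hold check fails when data arrives before it. A small Python sketch, using only the required and arrival times printed in the two reports (the helper names are illustrative):

```python
from decimal import Decimal as D


def setup_slack(required, arrival):
    """Setup: data must arrive no later than the required time."""
    return required - arrival


def hold_slack(required, arrival):
    """Hold: data must arrive no earlier than the required time."""
    return arrival - required


checks = {
    "setup reg_2/D": setup_slack(D("1.937"), D("1.946")),
    "hold reg_3/D": hold_slack(D("1.152"), D("1.099")),
}
for name, slack in checks.items():
    status = "VIOLATED" if slack < 0 else "MET"
    print(name, slack, status)
# setup reg_2/D -0.009 VIOLATED
# hold reg_3/D -0.053 VIOLATED
```

In both report styles a negative slack is the sign of a violation, whichever check is being run.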
Techniques to Reduce Timing Violations

To fix a setup violation, we need to speed up the path causing the violation by
• Increasing cell drive strength by upsizing the cell
• Adding buffers to optimize the critical path and reducing the load on complex gates with large fanout
[Figure: Upsize Cell; Insert Buffer]
To fix a hold violation, we need to slow down the signal path by
• Adding delay cells to slow the signal
• Reducing the drive strength of cells
[Figure: Insert Delay Cell; Downsize Cell]

What are the two types of timing analysis?
What constraints define a clock?
In reading a timing report, how do you know that the design has a timing violation?

What Are Timing Exceptions?
Paths that are given special consideration by the timing analysis tool
• False paths: Paths that are not exercised during operation
• Multicycle paths: Paths that take multiple cycles; the result is used every N clock cycles
[Figure: a false path and a multicycle path (N cycles), each drawn between DFF1 and DFF2]
Timing Exceptions: False Path

False path
• A path that has no functional purpose or a path that does not need to be timing constrained (e.g., a path between two clock domains).
Reasons for false paths
• Path is never exercised during circuit operation
• Path is only possible in special operation mode (test mode, etc.)
Blocking false paths
• Blocking of timing arcs
• Blocking the path itself
Examples: Multiplexed Logic in a Test Mode
[Figure: inputs A, B, C and select Sel; signals a_b and c_d feed an adder that produces A+B or C+D]

Timing Exceptions: Multicycle Path
Multicycle paths
• The paths that exist between two synchronous clock domains with integral multiples of clock frequency
[Figure: data passing between BlockA and BlockB, with clocks CLK1 and CLK2; waveforms of CLK1, DATA, and CLK2 with cycle times T and T/2]

Three Timing Analysis Modes
There are three different timing analysis modes.

| Timing Analysis Mode             | Description                                                                    |
| Single                           | Single operating condition used to scale delay value                           |
| Best case and worst case (BC-WC) | Analyzes off-chip variation for two extreme operating conditions               |
| On-chip variation (OCV)          | OCV is the small difference in the operating parameter value across the chip.  |

In this course, we will only cover the OCV mode.
What Is On-Chip Variation Analysis?

We cannot assume constant PVT across the die. It is essential to comprehend the impact of these variations in timing analysis.
In this analysis mode, the delay calculation for one path may be based on the maximum operating condition while the delay calculation for another path may be based on the minimum operating condition for setup and hold checks.
[Figure: On-Chip Variation Analysis. Data delay and clock delay, with worst-case and best-case values, showing the on-chip variations for the setup timing check and for the hold timing check]
STA Tools in OCV Analysis Mode
Computes min and max delays for cells and nets by multiplying annotated delay with min and max timing de-rate value, respectively.
Applies min and max delays to different paths simultaneously.
• For setup check, annotate worst-case SDF. Use max delay for launch path and min delay for capture path.
• For hold check, annotate best-case SDF. Use min delay for launch path and max delay for capture path.
[Figure: CLK1 waveform from 0 to 10 with leading (L) and trailing (T) ideal clock edges, marking early and late launch and capture edges and the phase shift (late)]
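The min/max pairing for setup and hold can be sketched in Python. This is a simplification under stated assumptions: one annotated delay per path segment is scaled by a single early and a single late de-rate factor, whereas the flow above also switches between worst-case and best-case SDF. All names and numbers are hypothetical.

```python
def setup_slack_ocv(launch_clk, data, capture_clk, period, setup, early, late):
    """Setup check: late (max) launch clock and data, early (min) capture clock."""
    arrival = (launch_clk + data) * late
    required = capture_clk * early + period - setup
    return required - arrival


def hold_slack_ocv(launch_clk, data, capture_clk, hold, early, late):
    """Hold check: early (min) launch clock and data, late (max) capture clock."""
    arrival = (launch_clk + data) * early
    required = capture_clk * late + hold
    return arrival - required


EARLY, LATE = 0.9, 1.1  # hypothetical min and max de-rate values

# Hypothetical delays in ns.
print(round(setup_slack_ocv(1.0, 3.0, 1.0, 5.0, 0.2, EARLY, LATE), 3))  # 1.3
print(round(hold_slack_ocv(1.0, 0.3, 1.0, 0.1, EARLY, LATE), 3))        # -0.03
```

Note that the short data path passes hold without de-rating (1.3 arrival against 1.1 required) and only fails once the early and late factors are applied in opposite directions.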
OCV Mode Setup

[Figure: from the clock root, the launch clock and data form the late path (max library) and the capture clock forms the early path (min library)]
For setup check, the timing delay values from the Max library are used for the data and the launch clock network delay.
The delay values from the Min library are used for the capturing clock network delay, assuming that the clocks are set in propagated mode.

OCV Mode Hold
[Figure: from the clock root, the launch clock and data form the early path (min library) and the capture clock forms the late path (max library)]
For hold check, the timing delay values from the Min library are used for the data arrival time and launch clock network delay.
The delay values from the Max library are used for the capturing clock network delay, assuming that the clocks are set in propagated mode.
What Are CPPR and CRPR?

Definition: Clock path pessimism removal (CPPR) and clock re-convergence pessimism removal (CRPR) both refer to the process of identifying and removing the pessimism introduced in the slack reports for clock paths when the clock paths have a segment in common.
Example: In the on-chip variation methodology, during setup checks, if both the launch clock late path and the capture clock early path share a portion of the clock network, then for the common clock network, a pessimism equal to the difference in maximum and minimum delay values is introduced in the slack values.
[Figure: launch clock (late path) and capture clock (early path) leaving the clock root through a common segment before they diverge]
CRPR: Pessimism Calculation in OCV Mode
[Figure: the clock root drives a common segment Dcommon (dc), which splits into a fast path d1 (x Mfast) to FF1 and a slow path d2 (x Mslow) to FF2]
No OCV:   path to FF1 = dc + d1
          path to FF2 = dc + d2
With OCV: path to FF1 = (dc + d1) x Mfast
          path to FF2 = (dc + d2) x Mslow
CRPR: The common path cannot be de-rated by two different values at the same time.
The slack calculation is too pessimistic.
The pessimism is P = dc x Mslow - dc x Mfast.
New slack = slack(w/o CRPR) + P.
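The pessimism formula can be checked with a few lines of Python. The delay and de-rate numbers below are hypothetical; only the two formulas come from the slide.

```python
def crpr_pessimism(dc, m_slow, m_fast):
    """P = dc x Mslow - dc x Mfast for the common clock segment."""
    return dc * m_slow - dc * m_fast


def slack_with_crpr(slack_without_crpr, pessimism):
    """New slack = slack(w/o CRPR) + P."""
    return slack_without_crpr + pessimism


# Hypothetical common-segment delay of 2.0 ns with +/-10% de-rates.
p = crpr_pessimism(dc=2.0, m_slow=1.1, m_fast=0.9)
print(round(p, 3))                          # 0.4
print(round(slack_with_crpr(-0.1, p), 3))   # 0.3
```

In this made-up case a path that looked violated by 0.1 ns is actually met once the shared segment stops being counted as both fast and slow at once.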
Timing Report with Clock Pessimism

Path 1: MET Setup Check with Pin reg_2/CK
Endpoint: reg_2/D (v) checked with leading edge of ’CLK1’
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
- Setup 0.167
+ Phase Shift 2.000
+ CPPR Adjustment 0.420
= Required Time 2.358
- Arrival Time 1.946
= Slack Time 0.412
Clock Rise Edge 0.000

| Instance | Arc         | Cell      | Delay | Arrival Time | Required Time |
|          | clk ^       |           |       | 0.000        | 0.508         |
| ck_0     | A ^ -> Y ^  | BUFX2     | 0.091 | 0.091        | 0.598         |
| ck_1     | A ^ -> Y ^  | BUFX2     | 0.097 | 0.188        | 0.695         |
| ck_2     | A ^ -> Y ^  | BUFX2     | 0.094 | 0.282        | 0.789         |
| ck_3     | A ^ -> Y ^  | BUFX2     | 0.092 | 0.374        | 0.882         |
| ck_4     | A ^ -> Y ^  | CLKAND2X2 | 0.150 | 0.524        | 1.032         |
| reg_1    | CK ^ -> Q v | DFFRHQX1  | 0.288 | 0.812        | 1.320         |
| t_1      | A ^ -> Y ^  | BUFX8     | 0.111 | 0.923        | 1.431         |
| t_2      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.015        | 1.523         |
| t_3      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.107        | 1.615         |
| t_4      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.199        | 1.707         |
| t_5      | A ^ -> Y ^  | BUFX4     | 0.132 | 1.331        | 1.799         |
| t_6      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.423        | 1.891         |
| t_7      | A ^ -> Y ^  | BUFX6     | 0.112 | 1.535        | 1.983         |
| t_8      | A ^ -> Y ^  | BUFX8     | 0.092 | 1.627        | 2.075         |
| t_9      | A ^ -> Y ^  | BUFX4     | 0.128 | 1.755        | 2.167         |
| t_10     | A ^ -> Y ^  | BUFX8     | 0.088 | 1.843        | 2.255         |
| t_11     | B ^ -> Y ^  | NAND2X1   | 0.066 | 1.909        | 2.321         |
| t_12     | A ^ -> Y ^  | INVX1     | 0.037 | 1.946        | 2.358         |
| reg_2    | D v         | DFFRHQX1  | 0.000 | 1.946        | 2.358         |

MMMC Design
Today’s chips include
• Multiple standards support (GPRS, EDGE, WCDMA)
• Multiple functionalities (MP3, Camera, Gaming)
• Multiple power profiles (Awake, Doze, Sleep)
• Multiple test modes (Scan, BIST, OPMISR)
This results in multiple constraint sets.
It becomes more difficult below 90 nm to
• Determine worst-case corner combinations
• Determine RC corners
• Determine constraint modes
MMMC (multi-mode, multi-corner) provides the ability to concurrently support multiple combinations of modes and corners.
Example: Cell phone chips typically need to be designed for 20 mode/corner scenarios.
[Figure: Mode 1 (functionality), Mode 2 (test), and Mode 3 (power), each with Min and Max corners and its own constraint file (SDC1, SDC2, SDC3)]

What Is MMMC Analysis?
In deep submicron processes
• Cell and wire delay behave differently depending on process variation
• Analysis needs to be done at more than just a single min corner and single max corner
• Identification of a single worst-case corner and fixing violations becomes difficult due to differing conditions
• Multi-corner capability enables you to analyze and optimize at all these corner cases.
Multi-mode timing analysis
• A design can have multiple modes of operation and each mode can have different, even conflicting, constraints
• Allows concurrent analysis and optimization of multiple modes, eliminating iterations for timing closure
Multi-corner timing analysis
• Used to resolve different timing problems that appear at different processes, voltages, and temperatures
[Figure: DFF1, DFF2, and DFF3 with Min and Max paths. Delay calculation corners (timing libs, cdB libs, PVT setting, de-rating, SDF) and the RC corner (RC controls) form a delay corner; constraint mode descriptions (clock defs, constants, exceptions, in SDC) form a constraint mode; analysis views hold delay corner and constraint mode pointers]
How Is MMMC Analysis Achieved?

To achieve multi-corner analysis and optimization
1. Set up the environment
2. Define the scenarios
3. Load the SDC file
4. Analyze the timing reports from multiple scenarios
5. Determine which scenario to optimize
To analyze by using the sequential multimode scenario
1. Define the current scenarios
2. Identify the critical scenario based on the timing report generated by the STA tool
3. Define the most critical scenario as the first scenario in the current scenario definition
4. Run optimizations such as clock tree optimization, post-placement optimization, or routing optimizations
Repeat steps 2 through 4 until timing is satisfactory
[Figure: SDC files from synthesis (normal, mode 1, mode 2) enter the STA tool flow: load design; per scenario, create scenario, set operating conditions, set constraints; identify most critical scenario; analyze; optimize]
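The scenario bookkeeping in the steps above can be pictured with a short Python sketch: each scenario pairs a constraint mode with a corner, and the most critical scenario is the one with the worst slack. The mode names, SDC labels, and slack values are hypothetical placeholders, not output from any tool.

```python
from itertools import product

# Hypothetical constraint modes (each with its own SDC file) and corners.
modes = {"functional": "SDC1", "test": "SDC2", "power": "SDC3"}
corners = ["min", "max"]

# One scenario (analysis view) per mode/corner combination.
scenarios = [f"{mode}_{corner}" for mode, corner in product(modes, corners)]


def most_critical(worst_slack_by_scenario):
    """Pick the scenario whose worst slack is the most negative."""
    return min(worst_slack_by_scenario, key=worst_slack_by_scenario.get)


# Hypothetical worst slack (ns) reported per scenario by the STA tool.
worst_slack = dict(zip(scenarios, [0.12, -0.08, 0.30, 0.05, 0.21, -0.02]))

print(len(scenarios))              # 6
print(most_critical(worst_slack))  # functional_max
```

With three modes and two corners there are already six scenarios, which is why the count grows quickly toward the 20 mode/corner scenarios mentioned for cell phone chips.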

What Is Correlation?
Synthesis, place and route, and the sign-off tools are different (usually).
Synthesis uses wire load models to estimate the physical design.
• Need to adjust wire load model coefficients.
Place and route uses more realistic numbers for the physical design.
• Timing becomes more accurate as the flow progresses.
Different timing engines used at different stages use different techniques to calculate timing.
• Do the optimizer and placer see the same worst paths as the static timer?
Correlation is an indication of the relationship between two variables.
[Figure: flow from design entry through synthesis, place, and route (each with its own timing engine) to RC extraction, SI analysis, and static timing analysis (sign-off)]

Why Correlate?
The majority of today's design flows utilize two timing-analysis tools.
• One for implementation, and a second for sign-off
Implementation tools have a built-in extraction tool, which differs from sign-off extraction tools.
• Extracted output will be different
Both tools should see the same information and provide the same results.
• Prevents additional work at the time of sign-off
At 130 and 90 nm, parasitic effects are small, and there is not that much that you need to correlate.
At 45 nm, correlation between different timing infrastructures is nearly impossible, based on the number of complex effects.
[Figure: delay variation using the sign-off tool and using the implementation tool, plotted against technology node (250 nm, 180 nm, 130 nm, 90 nm, 65 nm)]
How to Achieve Correlation

To correlate native extraction results with sign-off extraction
• Compare SPEF files from basic and sign-off extraction
• Generate the scaling factors or the de-rating factors for
  Total capacitance
  Cross-coupling capacitance
  Resistance
The timing scaling factors affect the path delay values generated in the timing reports.
Scaling factors are set for data paths, clock paths, minimum and maximum operating conditions.
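One simple way to derive a scaling factor, shown here only as a sketch, is the ratio of sign-off to native totals over the nets that both extractions report. The per-net capacitance values are hypothetical, and real flows read them from the two SPEF files and may fit factors per layer or per net class instead.

```python
def scaling_factor(native, signoff):
    """Ratio of sign-off to native totals over the nets both extractions report."""
    common = native.keys() & signoff.keys()
    native_total = sum(native[net] for net in common)
    signoff_total = sum(signoff[net] for net in common)
    return signoff_total / native_total


# Hypothetical total capacitance per net (fF) from each extraction.
native_cap = {"n1": 10.0, "n2": 20.0, "n3": 30.0}
signoff_cap = {"n1": 11.0, "n2": 22.0, "n3": 33.0}

factor = scaling_factor(native_cap, signoff_cap)
print(round(factor, 3))  # 1.1
```

The same helper could be applied separately to cross-coupling capacitance and resistance to get the three factors listed above.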
Post-CTS and Post-Route Correlation
Post-CTS
• Actual clock tree delays propagated
• Actual clock net delays used instead of estimates done at pre-CTS
Post-route
• All cells and nets have fixed location on design
• Generates a more realistic timing result
• Effects due to congestion are taken into account
• Effects due to signal integrity can be taken into account
• Account for mismatch between pre-route and post-route delays
[Figure: post-CTS and post-route clock paths, each showing the clock source, source latency, network latency, the clock pin, and the "C. logic" delay]

What Are Design Rule Constraints?
Design rule constraints are requirements that depend on the technology library.
Default constraints always exist implicitly because of the selected target libraries.
These rules are established by the library vendor for the proper functioning of the fabricated circuit; they must not be violated.
The user can set more restrictive values (the explicit values) but cannot remove implicit design rule constraint attributes.
[Figure: the .lib supplies the constraints Max Tran, Max fanout, and Max Cap]
Some Design Rule Constraints

What constraints are there for the outputs of logic gates?
The output of every gate usually has one or more of the following design rule constraints:
• Max transition
• Max fanout
• Max capacitance
What Is Maximum Transition Time?
The maximum transition time for a net is the longest time allowed for its driving pin to change logic values.
A violation is typically fixed by upsizing the driver or buffering the output of the driving gate.
[Figure: before optimization, a 1x driver violates the maximum transition rule; after optimization, with an upsized (2x) driver or added buffers, the rule is met]
What Is Maximum Fanout?

The fanout of a net is the number of logic gate inputs to which an output is connected.
To prevent routing congestion, as well as to help the synthesis tool meet maximum transition and capacitance constraints, we need to specify the maximum fanout limit for the design.
Most technology libraries place fanout restrictions on driving pins, creating an implicit fanout constraint for every driving pin in designs using that library.
What Is Maximum Capacitance?
Maximum capacitance specifies the maximum capacitance allowed on the output pin of a cell.
The maximum capacitance design rule constraint allows you to control the capacitance of nets directly.
The design rule constraints max_fanout and max_transition limit the actual capacitance of nets indirectly.
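A design rule check over the three constraints can be sketched as a comparison of measured values against limits. Everything here is hypothetical: the limit values, the pin name, and the `DriverPin` structure are illustrative, and real limits come from the technology library or explicit user settings.

```python
from dataclasses import dataclass


@dataclass
class DriverPin:
    name: str
    transition_ns: float
    fanout: int
    capacitance_pf: float


# Hypothetical limits for illustration only.
LIMITS = {"max_transition": 0.5, "max_fanout": 16, "max_capacitance": 0.2}


def drc_violations(pin, limits=LIMITS):
    """Return the names of the design rules this driving pin exceeds."""
    measured = {
        "max_transition": pin.transition_ns,
        "max_fanout": pin.fanout,
        "max_capacitance": pin.capacitance_pf,
    }
    return [rule for rule, limit in limits.items() if measured[rule] > limit]


pin = DriverPin("u7/Y", transition_ns=0.62, fanout=24, capacitance_pf=0.15)
print(drc_violations(pin))  # ['max_transition', 'max_fanout']
```

A user-supplied, more restrictive limit would simply replace an entry in `LIMITS` with a smaller value; the implicit library limit can be tightened but not removed.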
Learning Activity
Study a timing report with a critical path that fails to meet the timing requirement
Analyze the report and identify the problem
Decide which course of action is best suited to fix the critical path so that it meets timing
Present your findings to the class
20 minutes for activity

Signal integrity analysis
What Is Signal Integrity?

Definition: Unintended effects on digital signals, caused by interconnect parasitic resistance or capacitance, that cause noise and/or change delays
Example: In our example design, we saw SI effects such as noise-on-delay and glitches, due to long nets that were running in parallel.
[Figure: design flow from specification, RTL design, and logic synthesis (synthesized netlist) through floorplanning, placement, physical synthesis, pre-CTS optimization, CTS, post-CTS optimization, route, and post-route optimization to the detail routed design, with extraction, delay calculation, and signal integrity analysis, ending in GDSII and mask prep]

Signal integrity
Input
• Routed design in the Verilog language or other HDL + DEF
• Constraints in Synopsys Design Constraints (SDC) format
• Constraints and commands in TCL
• Parasitic extraction file (SPEF)
• Logical timing libraries in Liberty (.lib) format
• Physical libraries in LEF format
• Tool-specific SI libraries
Output
• Incremental SDF file containing all of the delay information in the design related to noise-on-delay
• Reports for glitch nets
• List of problem nets that need to be re-routed
[Figure: SPEF, the routed design (gates + DEF), SDC, TCL, and the logical, physical, and tool-specific libraries feed Signal Integrity, which produces the incremental SDF delay file]
SI Problems with Changing Process Technology
[Figure: SI problems plotted against process technology (0.15, 0.13, 0.09, 0.065)]

Why Now?
These SI effects have always existed, but they are worse at deep submicron sizes because of
• Finer geometries
• Greater wire and via resistance
• Higher electric fields (if supply voltage not scaled)
• Smaller spacing rules between wires
• More metal layers
• Higher ratio of cross coupling to grounded capacitance
Interconnect: Determining factor for performance, power, and yield
[Figure: total, gate, and interconnect delay (ps) versus shrinking process (0.65, 0.5, 0.35, 0.25, 0.18, 0.13, 0.09, 0.065)]

Crosstalk (cross coupling): noise on delay
Glitch on functionality
Noise library
ECO repair files
Hierarchical SI analysis: block noise model (XILM)
What Is Crosstalk Effect?
Crosstalk is caused by a transition on an adjoining signal: capacitive or inductive coupling between neighboring wires can lead to an unintended logic transition.
Victim net: Net on path affected by crosstalk
Aggressor net: Net that affects victim net
Switching window: Time interval when a signal transition may occur.
When coupled signals switch
• In opposite directions (aggressor), victim line signal delay increases.
• In the same direction (helper), victim line signal delay decreases.
[Figure: aggressor net coupled to the victim net, showing drive R, wire R, grounded C, coupling C, and the victim input noise tolerance]
Effect of Crosstalk on Delay

Crosstalk can change the delay of a net, which may lead to setup or hold time failures, when both attacker and victim are changing simultaneously.
Timing predictions become inaccurate.
Crosstalk can have two effects on victim nets:
• Crosstalk causes the signal to slow down.
• Crosstalk causes the signal to speed up.
[Figure: net in1 driving a flip-flop, with wire R, grounded C, and coupling C to other logic net(s) such as a1; one label marks the cell delay, another marks the wire, where the delay depends on the behavior of other nets]
Crosstalk Causes Signal to Slow Down
When attacker and victim are changing in opposite directions
• The cross coupling between the two nets causes the victim to slow down.
• This can affect the setup time requirement of the flip-flop if the signal arrives late.
[Figure: victim net in (wire R) coupled to attacker a1 through coupling C and driving a flip-flop; waveforms of In1 data against the clock show the setup time before and after the slowdown]
Crosstalk Causes Signal to Speed Up

When both attacker and victim are falling or rising simultaneously
• The cross coupling between the two nets causes the victim to speed up.
• This can affect the hold time requirement for a flip-flop if the signal arrives early.
[Figure: victim net in (wire R) coupled to attacker a1 through coupling C and driving a flip-flop; waveforms of In1 data against the clock show the hold time before and after the speedup]

Crosstalk Analysis
Steps
• Computes the timing windows and slew rates internally
• Uses timing windows and logic constraints to disallow specific simultaneous switching scenarios between victim and attacker nets
• Analyzes each valid overlapping attacker subset to determine the worst-case delay change
• Outputs either an incremental or full SDF file for all nets
[Figure: flow. Compute slew rate; disallow simultaneous switching; find victim with worst delay; generate incremental or full SDF]
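The timing-window filtering step can be illustrated with a simple interval overlap test: an attacker can only affect the victim if their switching windows share some time. This is a minimal sketch with hypothetical net names and window values; real tools also apply logic constraints and slew information.

```python
def windows_overlap(a, b):
    """True if two (earliest, latest) switching windows share any time."""
    return a[0] <= b[1] and b[0] <= a[1]


def active_attackers(victim_window, attacker_windows):
    """Keep only the attackers that can switch while the victim switches."""
    return [
        net
        for net, window in attacker_windows.items()
        if windows_overlap(victim_window, window)
    ]


# Hypothetical switching windows in ns.
victim = (2.0, 3.0)
attackers = {"a1": (2.5, 3.5), "a2": (4.0, 5.0), "a3": (1.0, 2.0)}

print(active_attackers(victim, attackers))  # ['a1', 'a3']
```

Attacker a2 is dropped because it can never switch at the same time as the victim, so it cannot change the victim's delay in this sketch.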

Impact of Noise on Functionality
Coupling noise can cause functional failures.
• Slew rate (dv/dt) and capacitance (C) set the glitch current (i): i = C dv/dt.
• Load impedance sets the glitch voltage.
• In the example, the attacker causes a significant glitch on the reset signal such that it resets the flip-flop and destroys the stored logic state.
• With lower transistor threshold voltages (Vtn and Vtp) for low power design, glitches can lead to unintended switching of transistors.
[Figure: an attacker net coupled through capacitance C to the victim net that drives the reset pin of a flip-flop (d, q, clk); i = C dv/dt]
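To get a feel for the magnitudes in i = C dv/dt, the Python sketch below plugs in hypothetical values. Treating the load as a plain resistance to turn the current into a voltage is a deliberate simplification, and none of the numbers come from the course design.

```python
def glitch_current(c_farads, dv_volts, dt_seconds):
    """i = C * dv/dt for an attacker swing of dv in time dt."""
    return c_farads * dv_volts / dt_seconds


def glitch_voltage(current_amps, load_ohms):
    """Simplified estimate: glitch voltage = glitch current * resistive load."""
    return current_amps * load_ohms


# Hypothetical values: 10 fF coupling, 1.0 V attacker swing in 50 ps, 1 kOhm load.
i = glitch_current(10e-15, 1.0, 50e-12)
v = glitch_voltage(i, 1000.0)

print(f"{i * 1e6:.1f} uA, {v * 1e3:.1f} mV")
```

Halving the attacker transition time doubles the injected current, which is why fast slews on long parallel nets are the usual suspects for glitch failures.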
Noise Analysis Flow

Steps
• Propagates the noise glitch to see if the noise glitch reaches a storage element (latch or flip-flop)
• This reduces the number of potential false alarms as it utilizes the inherent glitch filtering properties of CMOS logic
• Measures the height of the glitch after it has propagated to the receiver output
• Performs sensitivity analysis, which determines if a glitch will amplify or not
• If the glitch does not amplify, it cannot cause a functional failure
[Figure: flow. Propagate noise glitch; check if noise reached storage elements; measure glitch height at receiver output; perform sensitivity analysis]
Example Text Glitch Report
Generated with generate_report -sort_by rcvr_peak -slack
***************************************************************************
CeltIC Noise Report
Generated: Fri Aug 15 10:22:01 PDT 2007
***************************************************************************
Report Options:
---------------------------------------------------------------------------
Slack : yes
Sort by : noise (receiver input peak)
Threshold : 10.0 (mV)
Level : VH and VL
---------------------------------------------------------------------------
Peak(mV) Level TotalArea %AreaTillPeak Width(ps) VictimNet
1687.614 VL 1067.88 17.17 1265.55 U2DFF:CP {CLK2}
Receiver output peak:
Value ReceiverNet
1559.185 U2DFF/CP (DFQD1)
Constituents:
Source Peak(mV) Offset(ps) Slew(ps) Edge Net TraceBackNet(NoiseType)
Cpl: 1687.614 4950.000 50.000 R CLK1 -
Baselevel: 0.000 - - - - -
---------------------------------------------------------------------------
SI Repair Techniques for Crosstalk Glitch and Delay

Minimize disturbance to the existing place and route by
• Increasing the spacing between the affected nets
• Upsizing the victim driver so the effect of the aggressor is minimized
• Adding a shielding wire between the affected nets; the shield is usually VSS
Noise Library
Signal integrity analysis requires each cell in the circuit to be modeled (characterized) using a hierarchical model, such as
• UDN (user-defined noise)
• ECHO (hierarchical block)
• XILM (interconnect logic model) or cdB (block)
This pre-characterized information is stored in a noise library using the make_cdb utility.
The characterization determines the sensitivity of the cell library to noise glitches on the inputs.
Factors such as resistance, capacitance, noise tolerance, and output holding strength are taken into account during characterization.
What’s in a Noise Library?
Characterized gate-level data
• UDN portion
  Input characterization data
  Output characterization data
  Slew characterization data
• SPICE transistor description
  Copy of transistor-level cell
  Cell renamed to _CADMOS_<cellname>
[Figure: Internal Slew Characterization. Labels: cell input slew (-slews); characterized slew on input of last logic stage output (-rise, -fall); internal node (-rise_prop_to, -fall_prop_to) with capVal; output pin connected to internal node (-connNL)]
Generating a Noise Library

The make_cdb utility performs characterization and automatically extracts I/O port direction as follows:
• As specified in the Synopsys .lib (preferred approach)
• As specified by the set_port command
• Ports connected to gates are marked as inputs
• Ports connected to transistor channels are marked as outputs
• Channel-connected inputs or bidirects must be marked manually
It records the Vds-Ids curves for each Vgs connected to each cell output.
It calculates the noise threshold of each cell input and the I/O pin capacitance.
[Figure: the cell library SPICE netlist(s), the CMOS device model, the Synopsys library (.lib), and a command file (TCL) feed make_cdb, which produces the noise library (.cdb)]
What’s in a cdB File?
A block-level cdB contains a cell-level view and a transistor-level view.
The cell-level view contains pin capacitance, calibrated input noise threshold, and nonlinear output drive strength.
The transistor-level view contains an ECHO built with the cells and R/C network connected to each I/O pin. This is different from the .cdB created by make_cdb, which contains a UDN built with transistors, not cells.
cdB Structure
• SPICE transistor model(s)
• Characterized data for cell1
• subckt transistor description for cell1
• …
• Characterized data for cell N
• subckt transistor description for cell N
[Figure: cell-level view and transistor-level view, each showing a noise check on a UDN]

What Is Engineering Change Order Mode?
ECO mode is used in an SI analysis tool to
• Analyze both glitch and delay failures
• Fix propagated noise failures
• Output a tool-specific ECO command file
ECO Repair Files

An ECO repair file is a tool-specific output command file generated when the tool operates in the ECO mode.
The ECO mode uses the Liberty file (.lib) or user-defined cell equivalents.
The tool can fix glitch and incremental delay failures with the ECO option.
The tool automatically outputs the ECO repair file as a text file and in HTML format, showing the original noise and the new noise after swapping in a new cell.
Victim driver cells can be upsized. (Swapping victim driver cells will not fix the failure if the coupling is caused by a long wire.)
Noise-on-Delay Fixing
Options for ECO analysis on noise failures
Buffer: Buffer insertion
Resize: Driver resizing
Spacing: Wire spacing
Shieldnet: Shield net insertion
Nofix: Do not do ECO analysis for noise failures (Glitch + Delay)
Default option is spacing.
[Figure: flow diagram with the stages Place and Route, Extraction, Noise
Analysis, Static Timing Analysis, and ECO Repair File]
ECO Repair File Example

The ECO HTML file below, generated by CeltIC, contains a detailed table with
information on noise and delay ECOs. It is generated automatically when the ECO
is enabled.
[Figure: screenshot of the ECO HTML report]

Glitch on functionality and delay
Noise library
ECO repair files
Hierarchical Methodology
Design sizes and complexity are increasing
Longer turnaround time and capacity limitations when running designs in a flat
hierarchy
To handle complexity, block-based hierarchical design methodologies are
used
[Figure: hierarchical design containing a black box]

What Is an XILM?
XILM is an interconnect logic model that contains all the nets from the
boundary to the first latch or flip-flop and the cross-coupling capacitance for
noise analysis.
It is created using CeltIC NDC.
The XILM model is used for both hierarchical noise and timing analysis.
[Figure: a victim net running from a primary input through two flip-flops
(d, q, clk) to a primary output, with attacker nets coupling onto it and the
question "Propagated noise failure?"]
Advantages of Hierarchical Analysis

Reduction in turnaround time
Less likely to have a capacity limitation
Gives feedback earlier in the design cycle
Supports a continuous convergence methodology
Learning Activity
You will be given a handout of an SI report that contains violations.
Analyze the report and trace the cause of the problem.
Decide which strategy is best suited to fix the violation.
STA Summary
The goal of timing analysis is to verify that a design meets timing
requirements under a specified set of timing constraints.
STA ignores functionality of circuit and analyzes the timing for all
possible paths.
Timing constraints are used by the designer to guide the timing
optimization tools in order to meet the timing goals.
The timing reports provide a summary of the final timing information,
which reports timing failures (setup and hold) for all paths starting with
the worst failing path.
Timing exceptions are set on paths that are not designed to be
exercised during normal circuit operation.
Timing analysis modes, such as OCV mode, direct the tool so that it
takes into account the small difference in operating parameters across
the chip while analyzing the design.
Design rule constraints are the requirements established by the library
vendor for the proper functioning of the fabricated circuit.
Signal Integrity Summary
SI issues lead to failure in performance of a circuit due to errors
induced in the normal operation of a design through crosstalk and
glitches.
A noise library characterizes the cells in a design to determine their
sensitivity to noise glitches on their inputs.
An ECO repair file is a command file that provides information used to
repair nets that suffer from noise and that should be fixed in the
database available after place and route.
XILM is an interconnect logic model that defines the noise propagation
up to the first latch/flip-flop from the boundary pins.

True or false
In static timing analysis, the designer creates timing test vectors that are
simulated using a gate-level netlist to verify timing.
Operating conditions are always set from a single set of libraries and never
from multiple libraries.
The timing constraints should be complete before running a timing debug.
Design rule constraints are requirements depending on technology library.
Crosstalk is caused by transition on an adjoining signal having a capacitive
or inductive coupling between neighboring wires leading to an
unintended logic transition.
Coupling noise can cause functional failures.
Terminology
Term: Description
Constraint-related
Clock skew: The maximum difference in arrival times of the clock signal at any two latches/FFs fed by the clock network
Clock jitter: The maximum difference in phase of the clock between any two periods
Clock latency: Specifies the delay along the clock tree (source latency + clock network latency)
Slew rate: Represents the maximum rate of change of a signal at any point in a circuit
Path delay: Represents the time taken for a signal to propagate from one point to another
Timing report-related
Beginpoint: Flip-flop or port at which the signal is launched with respect to the clock
Endpoint: Flip-flop or port at which the launched signal is captured with respect to the clock
Other end arrival time: The capture clock path from the clock source to the capture flop register
Slack: Slack or timing margin is the difference between the “required arrival time” and the “actual arrival time”
Phase shift: The delay adjustment used to calculate the appropriate required time at the path end point
Instance: Master cell definition used multiple times in a design, each time with a unique name
Arc: Any signal path along a net from one start point to one end point
Operating mode-related
Launch clock: Clock signal at the starting flip-flop which launches the data
Capture clock: Clock signal at the ending flip-flop which captures the data
Early signal: Earliest time at which the value on a net/point can change from its previous cycle stable value
Late signal: Latest time at which the value on a net/point can settle to its final stable value for the current cycle
Design Optimization
Module 11
June 16, 2008
Optimization Process
Optimization is the successive refinement of a product or design.
Usually, it takes several iterations of optimization until a product or
design is complete.
The types of optimizations performed on the product or design depend on the
stage.
For example, to make lumber, trees are chopped down, cut into long strips,
sized, and sanded.
In digital design, we also see various optimizations as the design progresses
through the physical implementation flow.
[Figure: trees become lumber through “optimization”]

Module Objective
Explain the value of optimization at the various stages of the design
flow to meet timing

Optimization for timing, SI, power, and area
Inserting repeaters to optimize for timing
Pre-CTS, post-CTS, and post-routing optimization
What Is Optimization?
Unless you are an absolute genius, your design will not meet the
timing requirements on the first run.
Optimization is the process of iterating through a design such that it
meets timing, area, and power specifications.
In general, optimization can be broken down into the following areas:
Timing
Signal integrity
Power
Area
What Is Timing Closure?

A placed and routed design achieves timing closure when it meets its
timing specifications while also satisfying electrical, design rule, and
signal integrity constraints.
Timing closure is often one of the greatest causes of ASIC tapeout
schedule slips.
The problem lies in the discrepancy between front-end and back-end
designers’ concept of timing.
Front-end designers use wireload models to predict timing, and back-end
designers use a fully placed design, including its resistance and
capacitance (RC) values.
Who is more accurate?
What Are Wireload Models?
One of the most vexing problems traditional synthesis tools face is
how to predict interconnect parasitic values.
One approach is to develop a lookup table that ties the RC
values of a net to its fanout.
Tools calculate the appropriate wire load block for each net.
These values are derived from statistical analysis of ASIC foundry
data for a given process node.
Sample wireload model file
wire_load("sample_wl10") {
  resistance : 8.5e-8;
  capacitance : 1.5e-4;
  area : 0.7;
  slope : 66.667;
  fanout_length (1,66.667);
}
wire_load("sample_wl20") {
  resistance : 8.5e-8;
  capacitance : 1.5e-4;
  area : 0.7;
  slope : 133.334;
  fanout_length (1,133.334);
}
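As a rough illustration of how a tool might turn a wireload table like the sample into net parasitics, the sketch below estimates a wire length from the fanout and scales the per-unit-length resistance and capacitance. It assumes the common Liberty convention that fanouts beyond the last fanout_length entry are extrapolated using slope. The class and function names are hypothetical, and the area attribute is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class WireLoad:
    resistance: float   # resistance per unit length
    capacitance: float  # capacitance per unit length
    slope: float        # extra length per fanout beyond the table
    fanout_length: dict = field(default_factory=dict)  # fanout -> length


def estimate_net(model, fanout):
    """Return (length, resistance, capacitance) estimated for one net."""
    if fanout in model.fanout_length:
        length = model.fanout_length[fanout]
    else:
        # Assumed convention: extrapolate from the largest tabulated fanout.
        last = max(model.fanout_length)
        length = model.fanout_length[last] + model.slope * (fanout - last)
    return length, model.resistance * length, model.capacitance * length


# Values copied from the sample_wl10 block.
sample_wl10 = WireLoad(8.5e-8, 1.5e-4, 66.667, {1: 66.667})

length, res, cap = estimate_net(sample_wl10, fanout=3)
print(length, res, cap)
```

Note that the estimate depends only on fanout, not on where the cells are actually placed, which is the root of the inaccuracy discussed next.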
Drawbacks of Wire Load Models

Since these wireload model selections are based on discrete values of
the wire area, they are generally crude and inaccurate.
In process nodes of around 1 micron, the dominant component of net
delay is the I/O pin delay of standard cells. Therefore, the wireload
delay plays an insignificant role.
As device dimensions shrink, global routes get longer and smaller wire
widths mean more resistance.
The wire load can no longer be relied on to close timing.
A better replacement for wireload models is physical synthesis, where
synthesis and placement are combined to calculate the wire delay more
accurately, based on physical data.
Optimizing for Timing
There are many ways to reduce delay; we will cover some
fundamental techniques here.
Upsizing gates increases their drive strength and, thus, reduces the time it
takes for that gate to transition based on a given load.
Upsizing a gate increases its own input capacitance, giving its driver
higher capacitive load.
A technique called logical effort was invented to optimize the size of
gates along a path for minimal delay.
The tool will usually perform calculations for you.
Reduce wire capacitance
Usually involves shortening the wire lengths of critical paths by moving
cells or inserting buffers
Switching to a higher metal layer can also reduce capacitance
Optimizing for Timing (continued)

Often, your design will contain an adder or multiplier unit in the logic
path.
For a large number of bits, for example, a carry lookahead adder performs
much better than a ripple carry adder.
Physical synthesis tools optimize datapath elements to meet timing, while
balancing area and power.
If all else fails and the datapath contains too much combinational delay, it is
often viable to simply break the path and insert a register in between,
creating an extra pipeline stage.
An extra pipeline stage means more latency and more area.
Such a change usually requires changing the RTL itself.
Signal Integrity
As technology continues to scale, the aspect ratio of the horizontal-to-vertical
dimensions is reduced.
This results in increased ratios of coupling capacitance to substrate
capacitance.
The impact on the victim line is a strong function of the rise time of the
interfering signal and the strength of the gate driving line Y.
A voltage step on line X causes a transient step on Y that decays with
a time constant: $\tau_{XY} = R_Y (C_{XY} + C_Y)$
[Figure: line X, driven by V_X, is coupled through the coupling capacitance
C_XY to line Y, which has driver resistance R_Y and substrate capacitance C_Y]
Signal Integrity (continued)

Aside from delay issues, crosstalk can also cause functional failures.
When a voltage is applied on line X, there is also a change of voltage
on line Y equal to
$\Delta V_Y = \dfrac{C_{XY}}{C_Y + C_{XY}} \, \Delta V_X$
If this change in voltage is large enough, it can cause an erroneous
logic value at the load of line Y.
[Figure: line X, driven by V_X, is coupled through the coupling capacitance
C_XY to line Y, which has driver resistance R_Y and substrate capacitance C_Y]
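To get a feel for the magnitudes involved, the following sketch evaluates the capacitive-divider expression for the voltage change on line Y and the time constant R_Y (C_XY + C_Y) from the preceding slide. The component values are made-up illustrations, not data from any library, and the function names are hypothetical.

```python
def coupled_glitch(delta_vx, c_xy, c_y):
    """Voltage change on victim line Y for a step of delta_vx on line X."""
    return delta_vx * c_xy / (c_y + c_xy)


def decay_time_constant(r_y, c_xy, c_y):
    """Time constant with which the driver of line Y restores its level."""
    return r_y * (c_xy + c_y)


# Hypothetical values: 1.0 V aggressor swing, 20 fF coupling capacitance,
# 60 fF substrate capacitance, and a 2 kOhm victim driver.
glitch = coupled_glitch(1.0, 20e-15, 60e-15)
tau = decay_time_constant(2e3, 20e-15, 60e-15)
print(f"glitch = {glitch:.2f} V, tau = {tau * 1e12:.0f} ps")
```

With these numbers the victim sees a quarter of the aggressor swing. Halving C_XY (more spacing) or halving R_Y (a stronger driver) shows directly which term each fix acts on.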

Optimizing for Signal Integrity
There are a few ways to reduce the effects of crosstalk. Recall that the
equation for delay is $\tau_{XY} = R_Y (C_{XY} + C_Y)$
Reduce RY, which means upsizing the driver of line Y.
Insert a repeater in the line.
Reduce the capacitance, which means separating the wires or changing
metal layers.
[Figure: line Y with driver resistance R_Y, shown with an upsized driver and
with a repeater inserted]
Power
Power is a major issue in most chips, especially those that are used in
mobile devices where battery life is limited.
Recall that power is given by the equation $P = f \cdot C \cdot V_{dd}^2$ where f is the
operating frequency, C is the total capacitance of the circuit, and Vdd
is the supply voltage.
Most of the time, the voltage supply and the operating frequency of the
circuit are already determined long before the physical implementation
stage.
How can we reduce the power?
Optimizing for Power
To reduce power
Reduce capacitance.
Decrease size of standard cells. Power is also a linear function of the
driving current, and smaller gates output less current.
[Figure: an ANDX10 cell replaced by a smaller ANDX6 cell]
Leakage current is a dominant factor in today’s (90 nm and below)
chips and can account for as much as 30% of the power consumption.
To reduce leakage current, gates with a higher threshold voltage must be
used.
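The effect of the capacitance term can be checked with a few lines of arithmetic. The sketch below simply evaluates P = f * C * Vdd^2 for two hypothetical designs that differ only in total switched capacitance; the numbers are illustrative assumptions and leakage is not modeled.

```python
def dynamic_power(freq_hz, cap_farads, vdd_volts):
    """Dynamic power from P = f * C * Vdd^2."""
    return freq_hz * cap_farads * vdd_volts ** 2


# Hypothetical chip: 500 MHz, 1 nF of switched capacitance, 1.2 V supply.
baseline = dynamic_power(500e6, 1.0e-9, 1.2)
# Same frequency and supply after downsizing cells to cut capacitance by 20%.
downsized = dynamic_power(500e6, 0.8e-9, 1.2)

print(f"baseline:  {baseline:.3f} W")
print(f"downsized: {downsized:.3f} W")
```

Because f and Vdd are usually fixed by this stage, the saving scales linearly with whatever capacitance the physical implementation can remove.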
Power and Timing Tradeoff

As we were discussing power optimization, you may have noticed that
some of the techniques are in direct conflict with those in timing
optimization.
For example, downsizing gates leads to less power, but also more
delay.
This is an age-old problem in the development of ICs and there is not
one correct solution.
Every chip has its own priorities regarding power or delay.
For example, a mobile phone processor may not need to run at 2 GHz,
but it must consume as little power as possible.
Visualizing the Tradeoff
In the graphic below, the purple box represents the constraints for
energy (power) and delay (timing) put on your design.
The blue curve represents the highest possible efficiency your design
can have.
Your goal should be to move your design onto the blue curve.
Again, the exact desired location on the blue curve depends on the
application of your chip.
The derivation of this curve is highly theoretical and is beyond the
scope of this class.
[Figure: energy versus delay plot showing the purple constraint box and the
blue efficiency curve]
Optimizing for Area

The purpose of shrinking device dimensions from 90 nm to 65 nm to
45 nm is to fit more transistors on a die, giving the chip more
functionality for the same area.
Area is therefore a very important specification, especially for chips
used for medical purposes such as hearing aids and pacemakers.
The components that usually take up the most area on a chip are
RAMs and register files.
Shrinking the size of RAMs is an architectural issue and must be
settled with the RTL designer.
[Figure: chip floorplan dominated by an SRAM block]

Optimizing for Area (continued)
Downsizing gates also has a small effect, but comes at a cost of
reduced speed and signal integrity.
Utilization is defined as the percentage of the floorplan area that
has been taken up.
If the utilization is too high, the design may become congested,
making it difficult to route. Longer routes also make it harder to meet
timing.
[Figure: congestion map]

Inserting Repeaters
Recall that you may upsize gates to decrease the delay through a
path.
If the fanout of a gate is too high, then it is a viable option to insert
buffers to reduce fanout.
But why would inserting an extra stage in the path decrease the
overall delay?
Take for instance the following circuit; the input capacitance of the
buffer is Cg, and the value of its load is 16Cg.
[Figure: one buffer with input capacitance Cg driving a load of 16Cg]
Inserting Repeaters (continued)

Recall that the electrical fanout of a gate is defined as its loading
capacitance divided by its input capacitance.
For the previous circuit, the electrical fanout of the buffer is
$\dfrac{16 C_g}{C_g} = 16$
Since it is the only buffer in the circuit, the total fanout of the circuit is
also 16.
Let’s insert another buffer and size it to be twice as large as the first
buffer so that its input capacitance is 2Cg.
[Figure: two buffers in series with input capacitances Cg and 2Cg, driving a
load of 16Cg]
The total electrical fanout of the circuit is now
$\dfrac{2 C_g}{C_g} + \dfrac{16 C_g}{2 C_g} = 10$
Since the total delay of the circuit is roughly proportional to
the total electrical fanout of the circuit, we have effectively reduced the
delay of the path.
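The fanout arithmetic is easy to experiment with. The helper below is a hypothetical sketch that sums the per-stage electrical fanouts (load capacitance divided by input capacitance) along a buffer chain, with all capacitances expressed in units of Cg.

```python
def total_electrical_fanout(input_caps, load_cap):
    """Sum of per-stage electrical fanouts along a chain of buffers.

    input_caps lists the input capacitance of each buffer in order;
    each buffer drives the next buffer's input, and the last drives load_cap.
    """
    loads = list(input_caps[1:]) + [load_cap]
    return sum(load / cin for cin, load in zip(input_caps, loads))


# Capacitances in units of Cg.
print(total_electrical_fanout([1], 16))     # single buffer driving 16Cg
print(total_electrical_fanout([1, 2], 16))  # second buffer with input cap 2Cg
print(total_electrical_fanout([1, 4], 16))  # a larger second buffer, 4Cg
```

Trying other sizes for the second buffer shows that the total is smallest when both stages see the same fanout.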

What if we replaced that buffer with a larger one?
[Figure: two buffers in series with input capacitances Cg and 4Cg, driving a
load of 16Cg]
A quick calculation will show that the total electrical fanout is now 8 instead of
10.
How do we pick the optimal electrical fanout?
This problem can be solved by
Calculating the total delay for N stages of buffers and a total electrical fanout of
F (loading capacitance divided by input capacitance of the first buffer)
Taking the derivative with respect to N
Finding the zero of the derivative (call it N0). The optimal electrical fanout is then
equal to $\sqrt[N_0]{F}$

Although it would be a nice exercise, we will not perform the detailed
calculations here as there are many factors we did not take into
account, such as the intrinsic delay and loading of each buffer.
A numerical analysis of the problem reveals that the optimal electrical
fanout is roughly equal to 4.
This means to achieve optimal delay, every stage in the logic path
should have equal electrical fanout and equal delay.
The method of logical effort, which will not be explained in this class,
explains how to size logic gates of any type.
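A numerical analysis of this kind can be sketched under a deliberately simple model that is an assumption of this example, not something stated in the slides: each stage has a normalized delay of f + p, where f is the stage's electrical fanout and p is the buffer's intrinsic delay in the same units. For a total fanout F split over N equal stages, the path delay is N * (F^(1/N) + p), and minimizing it leads to the condition f * (ln f - 1) = p, which the code solves by bisection.

```python
import math


def optimal_stage_fanout(p, lo=1.0 + 1e-9, hi=20.0, iters=100):
    """Solve f * (ln f - 1) = p for the delay-optimal per-stage fanout f."""
    def g(f):
        return f * (math.log(f) - 1.0) - p

    # g is increasing for f > 1, so plain bisection converges.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def best_stage_count(total_fanout, p, max_stages=20):
    """Integer stage count N minimizing N * (F**(1/N) + p)."""
    return min(range(1, max_stages + 1),
               key=lambda n: n * (total_fanout ** (1.0 / n) + p))


print(optimal_stage_fanout(0.0))    # no intrinsic delay
print(optimal_stage_fanout(1.0))    # intrinsic delay equal to one unit
print(best_stage_count(16, 1.0))    # stages for the 16Cg example
```

Under this model the optimum is e (about 2.7) when intrinsic delay is ignored and about 3.6 when p = 1, which is in line with the rule of thumb of roughly 4. For the 16Cg example with p = 1, two stages with a fanout of 4 each come out best.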
Restructuring Logic
Logic gates with a high number of inputs are not desirable.
Usually, it is much more effective to restructure a wide gate into
smaller gates.
This allows more flexibility in terms of optimization for each of the
individual gates.
Optimization During the Design Flow
Now that you have learned all of these optimization techniques, where do you
use them?
Below is a typical back-end flow that you may be familiar with from past
lectures.
[Figure: back-end flow: Netlist, Floorplan, Power Plan, Placement, Routing]
Optimization During the Design Flow (continued)

Typically, tools have three stages of optimization within the flow:
Netlist
Floorplan
Power planning
Placement
Pre-CTS Optimization
Clock tree synthesis
Post-CTS Optimization
Routing
Post-Routing Optimization

Pre-CTS Optimization
The first optimization that takes place is right after the placement
stage.
It is here that we have the most freedom.
The techniques that are commonly used here include
Inserting buffers for high fanout nets
Upsizing and downsizing gates
Restructuring logic to meet timing
Since the metal routes are not in place yet, we cannot perform any
optimization by moving metal routes.
[Figure: flow of Netlist, Floorplan, Power planning, Placement, Clock tree
synthesis, and Routing, with this optimization stage after Placement]
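Post-CTS Optimization
When the clock network is put in place, a new element comes into
play called clock skew.
Skew arises because the clock needs to propagate from the center of
the clock tree toward the periphery, so it reaches different registers at
different times.
[Figure: a clock source driving Reg1 and Reg2, with the skew shown as the
difference between the clock arrival times at Reg1 and Reg2]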

Post-CTS Optimization (continued)
When harmful skew is added to the timing path, the path can
violate timing depending on the amount of the skew and the
nature of the path.
To mitigate the effects of skew, you can
Insert buffers in the clock tree to lessen the skew
Re-time and use any of the previously mentioned techniques
to fix timing
Once again, no metal routes have been placed, although the clock
signals are often routed during clock tree synthesis.
[Figure: flow of Netlist, Floorplan, Power planning, Placement, Pre-CTS
Optimization, Clock tree synthesis, and Routing, with this optimization stage
after Clock tree synthesis]
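Whether a given skew is harmful depends on its sign and on which check is being made. The sketch below uses the usual textbook setup and hold relations for a register-to-register path; the sign convention (skew is the capture clock arrival minus the launch clock arrival), the function names, and all of the numbers are assumptions chosen for illustration.

```python
def setup_slack(period, skew, clk_to_q, logic_delay, setup):
    """Setup slack; skew = capture clock arrival - launch clock arrival."""
    required = period + skew - setup
    arrival = clk_to_q + logic_delay
    return required - arrival


def hold_slack(skew, clk_to_q, logic_delay, hold):
    """Hold slack for the same path, using its shortest logic delay."""
    return (clk_to_q + logic_delay) - (hold + skew)


# Hypothetical path, all times in ns.
print(setup_slack(2.0, 0.0, 0.2, 1.5, 0.1))   # no skew: setup is met
print(setup_slack(2.0, -0.3, 0.2, 1.5, 0.1))  # capture clock early: setup fails
print(hold_slack(0.0, 0.2, 0.1, 0.05))        # no skew: hold is met
print(hold_slack(0.3, 0.2, 0.1, 0.05))        # capture clock late: hold fails
```

The same 0.3 ns of skew hurts setup in one direction and hold in the other, which is why the nature of the path matters as much as the amount of skew.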
Post-Routing Optimization
Now that the design is fully placed, routed, powered, and
clocked, it is time to undergo the final phase.
This is the stage to perform fixes on hold violations.
Note, however, that at this stage, there is usually not enough room
to do much modification.
Moving standard cells and macros may require intensive re-routing.
Therefore, the following techniques are usually used:
Changing metal layers
Moving metal routes
Resizing gates
[Figure: flow of Netlist, Floorplan, Power planning, Placement, Pre-CTS
Optimization, Clock tree synthesis, and Routing, with this optimization stage
after Routing]

Summary
Achieving timing closure is a difficult task and requires careful
negotiation between front-end and back-end designers.
Wireload models were originally used to determine timing in a design,
but are quickly becoming obsolete with shrinking device dimensions.
Timing can be improved by upsizing gates, shortening wire lengths,
etc.
Signal integrity issues are usually caused by coupling capacitance
between wires that are close to each other.
They can be solved by moving wires and upsizing drivers.
Power can be reduced by downsizing gates and using high-threshold
cells.
Summary (continued)
The power and timing tradeoff is always a critical consideration
depending on the application of the chip.
RAMs and register files should be used sparingly to optimize for area.
A high utilization makes the design difficult to route and
may cause congestion.
Buffers can be inserted to reduce delay in a pattern such that the
electrical fanout for each stage is approximately 4.
Physical implementation usually consists of three stages of
optimization: pre-CTS, post-CTS, and post-routing.
Each successive stage will have less freedom to optimize as metal
layers are being added.

Learning Activity
In this activity, the class will
Study several scenarios (given in the next few slides) within the
optimization flow diagram
Identify which problems can potentially occur as a result of the
scenario shown
Brainstorm the optimization steps that are necessary to mitigate
these problems.
Class Activity: Optimization Case 1

Here is a sample netlist in schematic form.
You run physical synthesis on the netlist and find that the gates
highlighted in red are violating timing.
What types of optimization would you perform on this netlist?
[Figure: schematic of the sample netlist with the violating gates highlighted
in red]
Class Activity: Optimization Case 2
Here is a design that has been through CTS.
What are some possible problems with this design?
How would you fix them?
[Figure: floorplan with a clock PLL, a clock buffer, a “long” clock net, an
SRAM, and three registers]

Class Activity: Optimization Case 3
Here is a design that has been through detailed routing.
What problems can you see in the routing?
How would you fix them?
[Figure: routed design with “long” signal nets]

True or false
1. As power increases on a chip, the delay decreases.
2. Clock skew is generally not desired and should be minimized through
optimization.
3. Buffers can be upsized arbitrarily to optimize delay.
4. Crosstalk optimization can only be performed after routing.
5. Post-route optimization has more options to modify the placement of
cells versus pre-CTS optimization.
Engineering Change Orders, Design
Verification, and Tapeout
Module 12
June 16, 2008
Design Changes
From specification to final implementation, a chip can undergo
changes at various stages.
Functional changes to the specification can require
A restart of the entire implementation process
Changes to the design during implementation
These functional changes impact
Schedule
Cost
Features of the product
[Figure: functional changes entering a flow of Specification, RTL Coding,
Logical Implementation, Physical Implementation, Design Verification and
Tapeout, and Final Implementation]

Module Objectives
Articulate what an Engineering Change Order (ECO) is and what ECO
techniques are used at the different stages of the flow
Articulate the various steps in verification as well as list the tape-out
requirements
What are some plans and projects in everyday life that you are
involved with?
Can you give examples of some “last-minute” changes that have
occurred in those plans and projects?
Can you give examples of a “checklist” or some type of process or
documentation to ensure that the plan or project is complete?
Engineering change orders (ECOs)
Design verification
Tapeout
What Is an ECO?
Definition: The process of inserting a logic change directly into the
netlist after it has already been processed by an automatic tool
Example: After our final netlist was created, our marketing person
informed the team of a must-have feature for the chip. To incorporate
the feature, we created and implemented an ECO.
ECOs
Implementing ECOs is one of the most challenging aspects of the design
process.
ECOs are necessary to implement important product features, but we must
do so with minimal impact on schedule and cost, while making sure
what we implement is correct.
In the next few slides, we will cover
ECO types
ECO implementation types
Using back-end tools to implement ECOs
ECO Types
Generally, there are two types of ECOs.
Functional ECOs
Changes to the specification to add or remove functionality to the design
The ECO’d netlist and the original RTL do not match functionally
RTL must be modified to match the ECO’d netlist
Timing ECOs
Changes to the netlist, typically late in place/route, that do not change the
function, but try to improve on the timing of the design
The ECO’d netlist and the RTL do match functionally
Functional ECOs
Steps
1. The specification calls to add or remove functionality.
2. The netlist is manually modified either in logical implementation or
physical implementation.
3. The RTL code is modified to match the functionality of the
ECO’d netlist and verified.
4. Once the ECO is verified, the rest of the tapeout process is
completed.
[Figure: flow of Specification, RTL Coding, Logical Implementation, Physical
Implementation, Design Verification and Tapeout, and Final Implementation,
with step 1 marked at the functional changes to the specification, step 2 at
logical and physical implementation, step 3 at RTL coding, and step 4 at
design verification and tapeout]
Functional ECOs (continued)

To save time, we skip the logical implementation or physical
implementation steps by reusing the information from previous runs.
The instance names (u1, u2, etc.) of the gates are preserved from
logic synthesis into placement and physical implementation.
If we modified the RTL and then re-synthesized the design, all of the
instance names would be different and our placement information
from the previous runs would be useless.
// RTL Code
always @ (posedge clk)
  q <= !((!c ? !b : a) || d);
[Figure: netlist after logic synthesis, with inputs a, b, c, and d and gates
u1 to u5, and the netlist after placement, with the same instances u1 to u5]

How Much Is Too Much?
For a functional ECO, how much logic can be implemented?
To simplify, there are three cases:
Easy: Easy to perform an ECO, just a few gates
Medium: ECO will be tough, not impossible
Difficult to impossible: Better off re-synthesizing the design
[Figure: difficulty versus amount of logic. ECO for 1-100 gates; ECO or
re-synthesize at ~1% of total gates; re-synthesize the design from RTL beyond
that]
Timing ECOs
Timing ECOs typically occur late in the physical implementation process.
Steps
1. Critical paths are analyzed with the place/route tool or a static
timing analysis tool.
2. Suggestions are made by the design engineer or the tool to
implement a timing ECO.
Example: Upsize “u4” to the next higher drive strength.
3. The ECO is done, and timing is re-analyzed.
4. This is iterated until all paths meet timing.
5. Once the design meets timing, the rest of the flow is completed.
[Figure: netlist after logic synthesis (inputs a, b, c, and d; gates u1 to
u5), netlist after placement, and netlist after the timing ECO, with the same
instances u1 to u5]
ECO Implementation Types
There are three ECO implementation types:
Spare gates
Metal fix
Focused ion beam (FIB)
What Are Spare Gates?

Definition: The purposeful insertion of extra logic in the netlist, just in
case an ECO is required. Spare gates can be used to modify or add
logic to an existing design.
Most design teams will have a strategy to include spare gates in their
design, just in case they are needed for ECOs.
Spare gates can be implemented before we tape out a design, and
typically during the physical implementation process.
There are several methods for including spare gates:
Randomly sprinkle gates where available
Instantiate a “pack” of ECO gates at various levels in the design hierarchy
Use “ECO bulk” cells

Random Spare Gates
The simplest method is to randomly add spare gates in the areas that are
available.
This can be a manual process or done using the place/route tool’s
utilities
Simple process
Can be difficult to find the proper gate with the right drive strength
Flip-flops typically are not connected to the clock-tree or
scan-chain, and must be connected if used
[Figure: netlist before random spare cell insertion (u1 to u5) and netlist
after random spare cell insertion (u1 to u5 plus spare cells s1 to s8)]
Instantiating an ECO Pack

A more proactive approach is to use an ECO pack.
Instantiate in various quantities throughout the design hierarchy.
The ECO pack can contain flops, muxes, and random gates.
The flops can be connected to the clock-tree and scan-chain during
the normal implementation process.
Design team has better control of the instantiation, contents, and
reuse of the spare gates.
// RTL Code
always @ (posedge clk)
  q <= !((!c ? !b : a) || d);
// Instantiate ECO Packs
eco_pack eco_u0 (…);
eco_pack eco_u1 (…);
eco_pack eco_u2 (…);

ECO Bulk Cells
Some vendors have special ECO cells called “bulk” cells.
ECO bulk cells are randomly placed throughout the design.
Can be “programmed” by adding a specific functional cell on top of
the bulk cell.
A single ECO bulk cell can become an inverter, nand, nor,
xor, etc., just by changing the functional connections on top of
the cell.
Gives a lot of flexibility for later stage ECOs.
[Figure: netlist after random ECO bulk cell insertion (u1 to u5 plus bulk
cells e1 to e8) and netlist after ECO bulk cell modification, in which e1
becomes e1*. Both e1 and e1* are drawn with pins a and z between VDD and VSS]
Implementing ECOs
If carefully planned, an ECO “pack” of cells would be located near every ECO
location.
Unfortunately, there is not enough room on most chips to do so.
Let’s say
The output of u4 is currently connected to the input of u5.
We need to invert the output of u4 and feed it to u5.
We do not have ECO “bulk” cells.
s1 is a spare inverter.
s2 is a spare 2-1 mux.
How do you choose the right cell to implement the ECO?
[Figure: placement showing u1 to u5 and the spare cells s1 and s2]
Spare gates
Metal fix
What Is Metal Fix?

Definition: Once the design has gone through detail route, a metal fix is an ECO
where only a few metal layers are modified in order to modify connectivity
between existing logic in the design.
Metal fix occurs after the design has taped out and is in the midst of
production.
To make changes at this point, it is always best to consider a metal fix or a
“metal-only” fix because we can reuse our previous work as much as possible.
Consider the “masks” for a tapeout:
Each mask represents a layer in our design.
If we make modifications, we would like to minimize the number of layers
changed, to minimize the number of masks changed.
[Figure: stack of masks, Mask 1 through Mask N]

What Is Metal Fix? (continued)
For example, let’s say we needed to create a simple change
Re-route the existing design to use an inverter instead of a buffer (u2).
Identify a spare cell close by to re-route (s8).
Implement the metal-only changes with as few layers as possible.
Change just two mask layers and continue with production.
[Figure: netlist before and after the metal-only fix, showing u1 to u5 and
the spare cells s1 to s8]

Spare gates
Metal fix
What Is a Focused Ion Beam?
Definition: Once the design has gone through the manufacturing process, a
focused ion beam (FIB) machine can be used to etch away or add connections
to a die in order to modify or add logic to an existing design.
An FIB is a specialized machine that can add or remove material with very
high accuracy.
After a chip has been produced, wire connections can be removed
or added to change functionality.
This is an expensive alternative and is done for one die.
This is usually done for prototype parts, etc.
[Figure: FIB sample image. Source: http://en.wikipedia.org/wiki/Image:Fib_tem_sample.jpg]
ECOs with Back-End Tools

In most back-end tools, we can choose to implement ECOs using
ECO by netlist
Create a modified Verilog® netlist and have the back-end tool
incorporate the new cells.
ECO by change list
Create a command file to add or remove cells and connections.
// Original Netlist
buf1x u2 (.a(n1), .z(n2));
// ECO Netlist
// buf1x u2 (.a(n1), .z(n2));
inv1x u2 (.a(n1), .z(n2));
// ECO change list
-buf1x u2
-n1 u2.a
-n2 u2.z
+inv1x u2
+n1 u2.a
+n2 u2.z
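To make the change-list idea concrete, here is a small sketch that applies a list in the illustrative +/- notation of the slide to an in-memory netlist. The notation is not the command syntax of any particular back-end tool, and the data structure and function name are hypothetical: a line with a dotted second field is read as a net-to-pin connection, and any other line as a cell instance.

```python
def apply_change_list(netlist, change_list):
    """Return a copy of netlist with the '+'/'-' edits applied.

    netlist maps instance name -> {"cell": cell type, "pins": {pin: net}}.
    """
    result = {name: {"cell": inst["cell"], "pins": dict(inst["pins"])}
              for name, inst in netlist.items()}
    for raw in change_list.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        op, (first, second) = line[0], line[1:].split()
        if "." in second:
            # Connection edit: "<net> <instance>.<pin>"
            inst, pin = second.split(".")
            if op == "+":
                result[inst]["pins"][pin] = first
            elif inst in result:
                result[inst]["pins"].pop(pin, None)
        elif op == "+":
            # Cell edit: "<cell type> <instance>"
            result[second] = {"cell": first, "pins": {}}
        else:
            result.pop(second, None)
    return result


original = {"u2": {"cell": "buf1x", "pins": {"a": "n1", "z": "n2"}}}
changes = """
-buf1x u2
-n1 u2.a
-n2 u2.z
+inv1x u2
+n1 u2.a
+n2 u2.z
"""
print(apply_change_list(original, changes))
```

Because the instance name u2 and the nets n1 and n2 are reused, the placement and most of the routing from the previous run remain valid after the swap.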

Situation
Assume you have a single-gate ECO to implement, changing a two-input AND gate to a two-input OR gate.
You have a spare 2-1 mux near the two-input AND gate.
You have a spare two-input OR gate far from the two-input AND gate.
Questions
How can a 2-1 mux behave like a two-input OR gate?
How would you implement this ECO?
What factors would you consider when choosing the mux or the OR gate?
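
One possible answer to the first question can be checked with a truth table. The sketch below assumes a mux whose output follows d1 when the select is 1 and d0 otherwise; the pin names sel, d0, and d1 are hypothetical and will differ by library. Input a drives both the select and d1, and input b drives d0.

```python
from itertools import product


def mux2(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def or_from_mux(a, b):
    """Wire a to the select and to d1, and b to d0."""
    return mux2(sel=a, d0=b, d1=a)


# Exhaustively compare against a real OR over all four input combinations.
for a, b in product((0, 1), repeat=2):
    assert or_from_mux(a, b) == (a | b)
    print(a, b, or_from_mux(a, b))
```

Tying d1 to a constant logic 1 gives the same function, at the cost of needing a tie cell nearby.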
Learning Activity
Study several scenarios of design at different stages of the
Decide which course of action is best suited for your scenario, including the implementation and verification of your ECO

Design verification
Tapeout
Design Verification
The design verification flow consists of
Formal verification or logic equivalence checking (LEC)
Physical verification
GDSII export to layout
Signoff LVS and DRC
[Figure: design verification flow. The original physical implementation and the ECO go through formal verification (LEC) and physical verification. GDSII is exported to the layout tool, followed by signoff LVS and DRC. The final GDSII for tapeout goes to mask prep and manufacturing.]
Logic Equivalence Checking
ECOs involve the manual edit of a netlist.
One task we must perform is to ensure the functionality is consistent between the RTL and the ECO’d netlist (functional ECO) or between the original netlist and the ECO’d netlist (timing ECO).
LEC, which is part of formal verification, can be used to verify these cases.
[Figure: Functional ECO: the design engineer edits the RTL into ECO’d RTL and edits the synthesized netlist into an ECO’d netlist; formal verification (LEC) compares the two. Timing ECO: the design engineer edits the synthesized netlist into an ECO’d netlist; formal verification (LEC) compares the original netlist with the ECO’d netlist.]
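
The two comparisons above can be illustrated with a toy equivalence check in Python. This is a sketch only: it enumerates every input vector, which works for tiny combinational cones, whereas production LEC tools rely on formal engines rather than exhaustive enumeration. All function names here are hypothetical.

```python
from itertools import product


def equivalent(f, g, num_inputs):
    """Return (True, None) if f and g agree on every input vector,
    otherwise (False, first_mismatching_vector)."""
    for vector in product((0, 1), repeat=num_inputs):
        if f(*vector) != g(*vector):
            return False, vector
    return True, None


# Timing ECO: a buffer is replaced by two inverters in series (same function).
def original_netlist(a):
    return a


def timing_eco_netlist(a):
    return 1 - (1 - a)


# Functional ECO: the buffer becomes an inverter in both RTL and netlist.
def old_rtl(n1):
    return n1


def ecod_rtl(n1):
    return 1 - n1


def ecod_netlist(n1):
    return 1 - n1


print(equivalent(original_netlist, timing_eco_netlist, 1))  # timing ECO check
print(equivalent(ecod_rtl, ecod_netlist, 1))                # functional ECO check
print(equivalent(old_rtl, ecod_netlist, 1))                 # RTL not updated
```

The last call shows why the RTL must be edited along with the netlist for a functional ECO: comparing the ECO’d netlist against the old RTL reports a mismatch.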
Verifying Functional ECOs
Edits so far
Manual edits to the netlist during physical implementation (1)
Corresponding edits to the RTL to reflect the functional changes (2)
To verify that the RTL code matches the netlist, we can either run
Simulation of the RTL code vs. the netlist and compare results (3)
Formal verification or equivalence checking of the RTL code vs. the ECO’d netlist (4)
[Figure: design flow from Specification through RTL Coding (2), Logical Implementation, Physical Implementation (1), and Design Verification and Tapeout to Final Implementation, annotated with Simulation (3) and Formal Verification (4)]
Verifying Timing ECOs
Edits so far
Manual edits to the netlist during physical implementation (1)
Since the functionality has not been changed, we can functionally verify the ECO’d netlist with the original netlist.
Simulation of the original netlist vs. the ECO’d netlist and compare results (2)
Formal verification or equivalence checking of the original netlist vs. the ECO’d netlist (3)
[Figure: design flow from Specification through RTL Coding, Logical Implementation, Physical Implementation (1: original and ECO), and Design Verification and Tapeout to Final Implementation, annotated with Simulation (2) and Formal Verification (3)]
Design Verification
The design verification flow consists of
Formal verification or LEC
Physical verification
Signoff LVS and DRC
[Figure: design verification flow with the stages Original Physical Implementation, ECO, Formal Verification (LEC), Physical Verification, GDSII to Layout, Layout Tool, GDSII, Sign-off LVS and DRC, and Mask Prep]
Physical Verification
Physical verification involves several checks from within the place/route environment before the GDSII generation.
These checks include
Connectivity
Geometry
Antenna
Manufacturability
What Are Connectivity Checks?
Verify the connectivity of your design to detect and report various conditions, including
Opens
Unconnected pins
Dangling wires
Loops
Partial routing
In the SOC Encounter® environment, use the command
verifyConnectivity
Then, view the resulting violations in the Violation Browser.
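
To make three of these conditions concrete, here is a minimal Python sketch that checks a single net for opens, unconnected pins, and dangling wires. It is not how verifyConnectivity is implemented. It assumes wire segments connect only at shared end points, ignores layers and vias, and does not look for loops. The check_net function and the sample data are hypothetical.

```python
def check_net(pins, segments):
    """Check one net.

    pins: dict of pin name -> (x, y) location.
    segments: list of ((x1, y1), (x2, y2)) wires that connect only at shared
    end points (a simplification of real routing geometry).
    """
    parent = {}

    def find(p):
        # Union-find lookup with path halving.
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    degree = {}
    for a, b in segments:
        parent[find(a)] = find(b)  # the two end points are connected
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    pin_points = set(pins.values())
    groups = {}
    for name, point in pins.items():
        groups.setdefault(find(point), []).append(name)

    return {
        # Pins that no wire touches at all.
        "unconnected_pins": sorted(
            n for n, p in pins.items() if degree.get(p, 0) == 0
        ),
        # The net is open if its pins fall into more than one connected group.
        "open": len(groups) > 1,
        # Wire ends that touch nothing else and do not land on a pin.
        "dangling_ends": sorted(
            p for p, d in degree.items() if d == 1 and p not in pin_points
        ),
    }


pins = {"u1.z": (0, 0), "u2.a": (4, 0), "u3.a": (4, 3)}
segments = [((0, 0), (2, 0)), ((2, 0), (4, 0)), ((2, 0), (2, 2))]
result = check_net(pins, segments)
print(result)
```

In this example the stub that ends at (2, 2) never reaches u3.a, so the net is reported as open, u3.a as unconnected, and (2, 2) as a dangling end.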

What Are Geometry Checks?
Like a design rule checker (DRC), this checks the physical layout of the design, including the following violations for nets.
Width
Length
Spacing
Area
Overlap
Enclosure
Wire extension
Via stacking
In the SOC Encounter environment, use the command
verifyGeometry
What Are Antenna Checks?
During fabrication, excess charge can build up in a long wire and break down the thin gate oxide of the load connected to it.
[Figure: circuit after fabrication, with the driver and load connected through M1, M2, and M1; circuit during fabrication, with only the M1 wires present and the load marked "Breakdown!"]
What Are Antenna Checks? (continued)
In each technology process, there are rules, per metal layer, which dictate how much area can be connected to a pin. If the area exceeds that, it is an antenna violation. To fix the violation, one can
Change metal layers so that the rule is met
Add a diode so that there is a discharge path for the excess charge
[Figure: circuit after metal layer change, with the driver-to-load route alternating between M1 and M2; circuit after diode insertion, with a diode connected to the route between driver and load]
What Are Antenna Checks? (continued)
Check the charge that builds up on pins caused by routing that does not have a discharge path to a gate.
Check for pin routing that violates the maximum antenna charge for the pins, and report violations on pins that have an antenna ratio larger than the maximum allowed antenna ratio specified for the routing layer.
Check for unconnected metal segments that violate the maximum area specified in the technology file.
verifyProcessAntenna
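
The antenna-ratio comparison can be sketched as follows. This is a simplified per-layer model, assumed here for illustration: the ratio is the metal area connected to the pin on a layer divided by the gate area of the pin. Real technology rules can be cumulative across layers or use sidewall area, and the numbers below are hypothetical, not taken from any process. The antenna_violations function is not a tool command.

```python
def antenna_violations(gate_area, metal_area_by_layer, max_ratio_by_layer):
    """Return {layer: ratio} for layers whose antenna ratio exceeds the limit.

    ratio = (metal area connected to the pin on that layer) / (gate area).
    """
    violations = {}
    for layer, metal_area in metal_area_by_layer.items():
        ratio = metal_area / gate_area
        if ratio > max_ratio_by_layer[layer]:
            violations[layer] = ratio
    return violations


# Hypothetical areas in square microns and hypothetical per-layer limits.
limits = {"M1": 400.0, "M2": 400.0}
long_m1_route = {"M1": 900.0, "M2": 0.0}         # one long M1 wire on the pin
after_layer_change = {"M1": 100.0, "M2": 100.0}  # route broken up by an M2 jumper

print(antenna_violations(2.0, long_m1_route, limits))
print(antenna_violations(2.0, after_layer_change, limits))
```

The first call reports M1 because 900 / 2 exceeds the limit of 400. The second call reports nothing, which mirrors the "change metal layers" fix on the previous slide.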
What Are Manufacturability Checks?
Alpha particles can cause problems during manufacture:
Via defects
Cell defects
Wire defects
[Figure: three panels. Via Defects: alpha particle blocks via. Cell Defects: alpha particle blocks a gate pin. Wire Defects: alpha particle causes a short.]
What Are Manufacturability Checks? (continued)
To improve the manufacturability of the design, design teams should consider the following:
Use redundant vias.
Use “yield-hardened” library cells.
Use thicker wires with more spacing on critical nets.
[Figure: three panels. Redundant Vias: improve via reliability. Yield Hardened Cells: cells are slower, but safer. Thicker wires + more spacing: wires and spacing take up more space, but are safer.]
Manufacturability
Calculates the probability of yield loss due to the following effects:
Cell failures
Via failures
Wire opens and shorts
These effects are caused by random particles that land on the die during fabrication, causing defects.
reportYield
Note: This accounts for only a portion of actual yield loss. There is also parametric yield loss due to RC variation or systematic yield loss due to lithography problems.
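
As a rough illustration of a random-defect yield estimate, the sketch below uses the textbook Poisson model, in which the probability of a die having no killer defects is exp(-expected faults). This is an assumption made for the example and is not claimed to be the model behind reportYield. The fault counts are hypothetical.

```python
import math


def poisson_yield(expected_faults):
    """Probability of zero killer defects when faults follow a Poisson law."""
    return math.exp(-expected_faults)


# Hypothetical expected fault counts per die for each mechanism.
faults = {"cells": 0.02, "vias": 0.05, "wire opens and shorts": 0.03}

# Independent mechanisms multiply, so their expected fault counts add.
total = poisson_yield(sum(faults.values()))

for name, lam in faults.items():
    print(f"{name}: yield loss {1 - poisson_yield(lam):.4f}")
print(f"random-defect yield: {total:.4f}")
```

As the note above says, parametric and systematic yield loss would have to be estimated separately.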
Design Verification
Physical verification
[Figure: design verification flow with the stages Original Physical Implementation, ECO, Formal Verification (LEC), Physical Verification, GDSII, Layout Tool, Signoff LVS and DRC, and Mask Prep]
GDSII Export to Layout
The GDSII exported from the place/route tool does not have all of the necessary information for manufacturing and final LVS/DRC sign-off.
A layout tool, such as Virtuoso, is required to produce the final GDSII for tapeout and final LVS/DRC sign-off.
[Figure: design verification flow with the stages Original Physical Implementation, ECO, Formal Verification (LEC), Physical Verification, GDSII to Layout, Layout Tool, GDSII, Signoff LVS and DRC, and Mask Prep]
What Are LVS and DRC?
Definition: Layout Versus Schematic (LVS) and Design Rule Check (DRC) are sign-off checks run to ensure the integrity, functionality, and manufacturability of the chip.
LVS is a comparison of the Verilog® netlist vs. GDSII to ensure the functionality of the design.
DRC is a detailed check of the routed design against the technology’s set of rules.
[Figure: design verification flow with the stages Original Physical Implementation, ECO, Formal Verification (LEC), Physical Verification, GDSII to Layout, Layout Tool, Signoff LVS and DRC, GDSII for Tapeout, and Mask Prep and Manufacturing]
LVS
Input
Gate-level netlist in the Verilog language
GDSII
Rule deck
SPICE libraries
Commands in Tcl
Output
LVS reports
[Figure: Verilog gates, GDSII, rule deck, SPICE libs, and Tcl commands feed LVS, which produces reports]
Input and Output, Format (continued)
DRC
Input
GDSII
Rule deck
Commands in Tcl
Output
DRC reports
[Figure: GDSII, rule deck, and Tcl commands feed DRC, which produces reports]
Layout vs. Schematic: LVS
Schematic refers to the gate-level netlist that once was a schematic diagram and now comes from the synthesis tool.
A flat layout is produced, and active devices and routing are determined from that layout.
Where poly overlaps diffusion, a transistor is assumed.
All poly, diffusion, and metal layers are conductive and are assumed to be able to route signals.
The netlist extracted from the layout is compared to the original gate-level netlist to verify that they are the same.
This is a double-check on the place and route process.
An LVS check should be done on the final layout of all ICs.
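
The device-recognition rule above (a transistor wherever poly overlaps diffusion) can be sketched with rectangle intersection. This is a minimal illustration that assumes axis-aligned rectangles on a flat layout. Real LVS decks also identify device type, terminals, and sizes. The shapes below are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def overlap(a, b):
    """Return the overlapping Rect of a and b, or None if they only touch or miss."""
    x1, y1 = max(a.x1, b.x1), max(a.y1, b.y1)
    x2, y2 = min(a.x2, b.x2), min(a.y2, b.y2)
    if x1 < x2 and y1 < y2:
        return Rect(x1, y1, x2, y2)
    return None


def recognize_transistors(poly_shapes, diffusion_shapes):
    """Assume one transistor gate wherever a poly shape overlaps a diffusion shape."""
    gates = []
    for poly in poly_shapes:
        for diff in diffusion_shapes:
            gate = overlap(poly, diff)
            if gate is not None:
                gates.append(gate)
    return gates


# Hypothetical inverter-like layout: one poly stripe crossing two diffusion regions.
poly = [Rect(4, 0, 5, 20)]
diffusion = [Rect(0, 2, 10, 6), Rect(0, 12, 10, 18)]
gates = recognize_transistors(poly, diffusion)
print(gates)
```

Two gate regions are found, one per diffusion region, which is what an inverter built from one NMOS and one PMOS device would produce.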

LVS Analysis Process
Steps:
• From Gates to Transistors
• Primary I/Os Identified
• Connectivity Traced
• Device Recognition
[Figure: three views of the same circuit, labeled Design Netlist, Design Layout, and Design Transistors, with nets IN1, Net1, Net2, Net3, O1, VDD, and GND and devices I1 to I4 (I1 and I3 sized 2/1, I2 and I4 sized 1/1)]
Design Rule Checks
Goals
Increase yield and reliability of ICs
Allow automated design checks
Simplify the design process for the designer
Typical design rules
Minimum width and spacing on each layer
Overlap of metal over via
Metal coverage/slotting rules
Rules are created by the foundry for each manufacturing process.
Most rules generated through process characterization
Some rules derived from consistent failure modes of ICs
Design Rule Checks (continued)
Design rules for the layout assure that the IC will work when it is manufactured.
Example design rules
Minimum width for all layers
Minimum spacing for all layers
Minimum spacing between layers: diffusion to well boundary
Overlap between layers for vias, contacts, and transistors
Percent coverage of a layer for metal layers
Design rule files are rules as drawn.
Photographic processes can be positive or negative.
Widths as manufactured may be larger or smaller than as drawn.
A DRC must be performed on the final layout.
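
Two of the example rules, minimum width and minimum spacing on one layer, can be sketched as below. This is an illustration only: it assumes axis-aligned rectangles measured as drawn, treats touching or overlapping shapes as connected rather than as spacing errors, and uses hypothetical rule values. A foundry rule deck contains many more rule types.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    name: str
    x1: float
    y1: float
    x2: float
    y2: float


def width(r):
    """Width is taken as the smaller side of the rectangle."""
    return min(r.x2 - r.x1, r.y2 - r.y1)


def spacing(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0)
    return (dx * dx + dy * dy) ** 0.5


def check_layer(shapes, min_width, min_spacing):
    """Return a list of violation strings for the shapes on one layer."""
    violations = []
    for r in shapes:
        if width(r) < min_width:
            violations.append(f"width: {r.name}")
    for a, b in combinations(shapes, 2):
        gap = spacing(a, b)
        # A gap of 0 means the shapes are joined, so only positive gaps are checked.
        if 0 < gap < min_spacing:
            violations.append(f"spacing: {a.name}-{b.name}")
    return violations


# Hypothetical Metal 1 wires and rule values, in microns.
metal1 = [
    Rect("w1", 0, 0, 10, 1.0),
    Rect("w2", 0, 1.5, 10, 2.5),
    Rect("w3", 0, 5, 10, 5.4),
]
violations = check_layer(metal1, min_width=0.5, min_spacing=1.0)
print(violations)
```

Here w3 is narrower than the minimum width, and w1 and w2 are closer together than the minimum spacing.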
Example Design Rules
[Figure: layout with dimension labels for Metal 1 width, Metal 1 to Metal 1 spacing, Metal 1 overlap of via, via width, Metal 2 width, and Metal 2 overlap of via]
Design verification
Tapeout

Tapeout
Tapeout checklist
Mask preparation
Chip manufacturing
Tapeout Checklist
Design teams need to have a checklist to ensure that all processes and procedures were covered during the design, implementation, and verification phases.
Important areas to check
RTL code and netlist information
All related timing, constraint, power, and signal integrity information
All related design-for-test (DFT) information
All related simulation information (RTL, gate)
All related vendor-specific requirements
All related package, board, software, and system information
All related sign-off criteria
Tapeout Checklist: Example
There are many tasks to track and record for tapeout. They include, among others,
RTL Code Freeze and Version Noted
Synthesis Netlist Version Noted
    Make sure starting RTL code and netlist are noted
Testbench Versions Noted
Functional Verification Passed
    Make sure simulations pass
Pre-Layout Timing Analysis
SDC Validity Checked
    Validate early timing and SDCs
Boundary Scan Checked
Memory BIST Checked
Scan Chain Insertion Checked
    Ensure all DFT processes are complete
Floorplan Version Noted
Power Grid Analysis Checked
Place/Route with Timing Closure Done
    Validate early place/route power and timing
Signoff
Physical Verification (LVS/DRC/Antenna)
Formal Verification
Static Timing Analysis
IR Drop
EM Check
    Ensure all sign-off criteria are met
ATPG Done
Gate-Level Verification Done
    Create ATPG vectors, and make sure gate-level simulations pass
What Are Masks, Wafers, and Photolithography?
Masks (or Photomasks)
Definition: An opaque plate with holes or transparencies that allow light to shine through in a defined pattern, commonly used in photolithography.
Wafers
Definition: Thin slice of semiconducting material, such as a silicon crystal, on which microcircuits are constructed.
Photolithography
Definition: A process used in the fabrication of integrated circuits to selectively remove parts of a thin film.
Example: In the fabrication of semiconductor devices, masks are used to create custom patterns of different materials on wafers using photolithography.
Mask Preparation
With advanced geometries, there are problems with the creation of masks due to the very small sizes of wires and gates.
The light sources used to create the masks themselves are not accurate enough, causing errors in the mask.
[Figure: a layout and its printed result at 0.25µ, 0.18µ, 0.13µ, 90 nm, and 65 nm. Figures courtesy Synopsys Inc.]
What Are OPC and PSMs?
Optical proximity correction (OPC)
Definition: A photolithography enhancement technique commonly used to compensate for image errors due to diffraction or process effects.
Phase Shifting Masks (PSMs)
Definition: Photomasks that take advantage of the interference generated by phase differences to improve image resolution in photolithography.
Example: OPC and PSM are used in advanced geometries to improve the printability of wires during mask creation.
Advanced Mask Technologies
To address these issues, several advanced mask technologies have been developed, including
OPC
PSM
OPC
OPC is the manipulation of the mask itself to create extra patterns to compensate for the errors due to the photolithographic process.
As technologies advance to smaller geometries, the wavelength of the light used in the photolithographic process is actually bigger than the mask shapes themselves, causing errors.
The extra shapes modify the mask to compensate for these effects.
[Figure: Optical Proximity Correction (OPC), comparing the design and the resulting wafer image with no OPC and with OPC]
PSMs
Like OPC, PSMs serve to compensate for errors in the photolithographic process.
PSM relies on the interference created by mask modifications to achieve its goal.
[Figure: Phase Shifting Masks (PSM): (a) regular mask, (b) alternating PSM mask, (c) attenuating PSM mask. Both (b) and (c) have the effect of improving the contrast on some parts of the wafer, which could improve the resolution, as is done with OPC.]
Chip Manufacturing
Masks and wafers are processed to create integrated circuits.
[Figure: masks and wafers go through chemical and other processing to produce processed wafers, which become integrated circuits]
Photolithography
Start with wafer at current step
Spin on a photoresist
Pattern photoresist with mask
Step specific processing: etch, implant, etc.
Wash off resist
Courtesy K. Yang, UCLA
Processed Wafers
Most processed wafers contain many copies of the same integrated circuit.
Some processed wafers contain one copy of many different integrated circuits, called a shuttle.
Shuttles are used for prototypes or test chips.
[Figure courtesy D. Bouldin, U. Tennessee]
Packaging Process
The last step in the fabrication of a semiconductor device is packaging.
Steps
Die cut: Each individual die is cut from the wafer
Die attachment: The die is mounted to the package or support structure
IC bonding: Interconnect the die I/O with the package I/O
IC encapsulation: Enclose the die with ceramic, plastic, or epoxy to prevent physical damage or corrosion
[Figure: processed wafer, die cut, integrated circuits, die attachment, IC bonding, IC encapsulation]
Learning Activity
List as many of the tapeout requirements as you can, without looking back at the lecture material or your notes
Summary
ECOs are a vital part of the design process. Design teams have to add critical functionality with least impact on schedule and cost using ECOs. They do this by carefully planning for ECOs up front.
Design verification involves several checks to ensure that the design functionality, integrity, and manufacturability of the chip are verified.
Tapeout involves ensuring all of the important steps in the overall process are accounted for, through mask preparation and the final manufacturing steps.
True or false
1. Design teams can plan for ECOs very early in the design process.
2. When using a register as a spare gate for an ECO, you can simply connect it up like a regular logic gate.
3. LVS and DRC are run on a netlist just after logic synthesis.
4. Alpha particles cause random errors during the manufacture of a chip.
5. When creating a tapeout checklist, it is important to note all of the sign-off criteria.
Sources
Gennari, Frank. Overview of OPC. http://www.cs.berkeley.edu/~ejr/GSI/cs267-s04/homework-0/results/gennari/
