Version 1.0
STUDENT
HANDOUT
Legal Notices
Copyright Notice
© 1990-2008 Cadence Design Systems, Inc. All rights reserved.
When printed on paper, this presentation qualifies as a STUDENT HANDOUT.
This course and the material in it is owned by Cadence Design Systems, Inc. (Cadence), 2655 Seely Avenue, San Jose, CA
95134, USA. Unless you have received express written approval directly from Cadence, you are not allowed to copy, scan,
replicate, disclose, distribute, or publish this document, or any part of it.
Confidentiality Notice
No part of this publication may be reproduced in whole or in part by any means (including photocopying or storage in an
information storage/retrieval system) or transmitted in any form or by any means without prior written permission from
Cadence Design Systems, Inc. (Cadence).
Information in this document is subject to change without notice and does not represent a commitment on the part of Cadence.
The information contained herein is the proprietary and confidential information of Cadence or its licensors, and is supplied
subject to, and may be used only by Cadence’s customer in accordance with, a written agreement between Cadence and its
customer. Except as may be explicitly set forth in such agreement, Cadence does not make, and expressly disclaims, any
representations or warranties as to the completeness, accuracy or usefulness of the information contained in this document.
Cadence does not warrant that use of such information will not infringe any third party rights, nor does Cadence assume any
liability for damages or costs of any kind that may result from use of such information.
RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the Government is subject to restrictions as set forth in
subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.
UNPUBLISHED This document contains unpublished confidential information and is not to be disclosed or used except as
authorized by written contract with Cadence. Rights reserved under the copyright laws of the United States.
the address above or call 800.862.4522 from the US or +1.408.943.1234 internationally.
Allegro® HDL-ICE® Silicon Ensemble®
Accelerating Mixed Signal Design® Incisive® Silicon Express™
Assura® InstallScape™ SKILL®
BuildGates® IP Gallery™ SoC Encounter™
Cadence® (brand and logo) NanoRoute® SourceLink® online customer support
CeltIC® NC-Verilog® Specman®
Conformal® NeoCell® Spectre®
Connections® NeoCircuit® Speed Bridge®
Diva® OpenBook® online documentation library UltraSim®
Dracula® OrCAD® Verifault-XL®
ElectronStorm® Palladium® Verification Advisor®
Encounter® Pearl® Verilog®
EU CAD® PowerSuite® Virtuoso®
Fire & Ice® PSpice® VoltageStorm®
First Encounter® SignalStorm® Xtreme®
HDL-ICE® Silicon Design Chain™
Other Trademarks
Open SystemC, Open SystemC Initiative, OSCI, SystemC, and SystemC Initiative are trademarks or registered
trademarks of Open SystemC Initiative, Inc. in the United States and other countries and are used with permission. All
other trademarks are the property of their respective holders.
[Figure: curriculum map. General courses: BG01 Semiconductor and Devices, BG02 IC Process, BG03 IC Packaging, BG04 Test and DFT. Business Processes. Digital Discipline: BD01 Digital IC Architecture, BD02 Digital IC Design, BD03 Digital Physical Design. Analog Discipline: BA01, BA02, BA03 RF IC Design, BA04. Each discipline is followed by tool and lab training.]
Draw a complete flowchart of the digital design implementation flow and explain the steps in detail
Describe how cell libraries and timing libraries are used for physical design and timing analysis
[...] clock tree synthesis, routing, extraction, delay calculation, static timing analysis, and design optimization
Contrast power consumption and power grid analysis, and apply power saving design techniques
Explain the issues involved with signal integrity
Course Policies
It is important that you attend class. Your participation is essential.
Three or more absences will damage your grade.
Be on time.
Conduct only one conversation at a time.
Turn off cell phones, pagers, and laptops.
Get involved.
Come prepared to discuss the day’s assignment.
Volunteer.
Ask questions.
Share relevant ideas and observations.
Offer your own experiences.
Assignments are discussed in class.
Keep a copy of all assignments you hand in.
Grades
A. Outstanding achievement, exceeding course requirements
C. Average, satisfactory performance
D. Below average, marginal performance
Homework 1: 15% 7/30/08
Describe the issues, changes in design flow, and considerations that design teams must take into account when designing for a deep submicron process (90 nm or less).
Homework 2: 15% 8/6/08
Create a clock tree constraint file for automatic CTS based on a specification.
Homework 3:
Part I. Given several scenarios, calculate static and dynamic power.
Part II. Given several IR-drop heat maps, discuss the potential problems and solutions.
Part III. Given a block diagram and several scenarios, discuss which possible low-power design methods can be used to reduce overall power.
Week 1, July 21: Introduction to Digital Physical Implementation
- Inputs
- Steps in Flow
Activity: Design flowchart, activity in class.

Week 1, July 22: Introduction and Overview of Layout Technology
- Layout Layers
- Introduction to Physical Verification, DRC/LVS, DRC
- Cell Libraries, LEF Syntax
Activity: LEF terms, activity in class.
Homework Assignment 1 (7/3/08): Describe the issues, changes in design flow, and considerations that design teams must take into account when designing for a deep submicron process in 90 nm or less.

Week 1, July 23: Timing Libraries and Constraint Files
- Concepts
- Libraries
- Constraint Files
Activity: Create timing constraints, activity in class.

Week 2, July 28: Synthesis
- Logical Synthesis Optimization Steps
- Physical Synthesis Overview
Activity: Review log file and optimization steps, activity in class.

Week 2, July 29: Floorplanning and Placement
- Floorplanning Fundamentals
- Placement Fundamentals
Activity: Examples of floorplans, activity in class.

Week 2, July 30: Clock Tree Synthesis
- Clock Trees and Clock Tree Synthesis
- Clock Tree Specification
- CTS Reports
- Low-Power Clocking Techniques
Homework Assignment 2 (7/10/08): Create a clock tree constraint file for automatic CTS based on a specification.

Week 3, Aug 4: Routing
- Fundamentals
- Special Types of Routing
Activity: Review routing log files, activity in class.

Week 3, Aug 5: Power Consumption and Power Grid Analysis
- Power Consumption
- Power Grid Analysis
- Low-Power Design Techniques
Homework Assignment 3 (7/17/08):
Part I. Given several scenarios, calculate static and dynamic power.
Part II. Given several IR-drop heat maps, discuss the potential problems and solutions.
Part III. Given a block diagram and several scenarios, discuss which possible low-power design methods can be used to reduce overall power.

Week 3, Aug 6: Extraction and Delay Calculation
- Extraction Models and SPEF Format
- Delay Calculation Fundamentals and SDF Format
Activity: Flowchart with SPEF/SDF, activity in class.

Week 4, Aug 11: Static Timing Analysis and Signal Integrity Analysis
- Timing Constraints and Analysis
- Design Rule Verification
- Signal Integrity Fundamentals Analysis
Activity: Timing and SI report analysis, activities in class.

Week 4, Aug 12: Design Optimization
- Fundamentals
- Types
Activity: Review optimization cases, activity in class.

Week 4, Aug 13: Engineering Change Orders, Design Verification, and Tapeout
- ECO Types and Fundamentals
- Physical Verification Overview
- Tapeout Requirements
Activity: ECO scenarios and tapeout requirements, activities in class.

Week 5, Aug 18: Formal Study Group Presentation
Week 5, Aug 19: Formal Study Group Presentation
Week 5, Aug 21: Formal Study Group Presentation
6/16/08 BD03: Digital Physical Design 12
Recommended Text
Hennessy, John L. and Patterson, David A. Computer Architecture, Fourth Edition: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann. 2007.
ISBN-10: 0123704901
ISBN-13: 978-0123704900
Instructor Information
Instructor name:
Phone:
E-mail:
Office location:
Office hours:
Introduction to Digital Physical
Implementation
Module 1
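“Device that outputs the inverse of its input with minimum size and power”

    module invx1(a,z);
      input a;
      output z;
      assign z=!a;
    endmodule

[Figure: invx1 gate symbol with input a and output z.]
Transistor and Layout
[Figure: transistor schematic and layout of the inverter between VDD and GND, with input a and output z.]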
Design Implementation Flow
Much like the simple CMOS inverter, the general process of digital design implementation is the transformation of a design into various representations, eventually into physical hardware devices, just on a much BIGGER scale.
SPEC -> RTL -> Gates -> Layout
Module Objectives
In this module, you will be able to
Draw a complete flowchart of the digital design implementation flow
Learning Activity
In this activity, you will
Complete a flowchart of the digital design implementation flow
Include the design flow steps
[...] outputs
[Figure: blank flowchart from RTL to GDSII in which each design flow step is marked "?".]
Basic implementation flow
Example flow
Overall Design Flow
A design flow can be divided into three phases:
System
Logical
Physical
In each phase, two main processes need to be performed:
Implementation
Verification
[Figure: overall design flow across the SYSTEM, LOGICAL, and PHYSICAL phases. Implementation: Designer -> Microarchitecture -> Designer -> RTL -> Logic Synthesis -> Synthesized Netlist (Gates) -> Place/Route -> Placed/Routed Design (Layout, GDSII). Verification: System Simulation, Logic Simulation, Formal Verification, Gate Level Simulation.]
Implementation Flow
Front-end chip design definition: Processes in the overall chip design flow that involve system and logical design and verification
Back-end chip design
[Figure: implementation flow from Specification through Microarchitecture, RTL, Logic Synthesis, Synthesized Netlist (Gates), and Place/Route to GDSII, with the FRONT-END and BACK-END portions marked.]
Basic physical implementation flow
Example flow
[...] steps, such as
Floorplanning
Placement
Clock Tree Synthesis (CTS)
Route
Extraction
Delay Calculation
Static Timing Analysis (STA)
Signal Integrity
Design Optimization
Physical Synthesis
Design Verification
Mask Prep
Back-End Implementation Flow
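[Figure: back-end implementation flow. Main path: Specification -> Designer -> Microarchitecture -> Designer -> RTL -> Logic Synthesis -> Synthesized Netlist (Gates) -> Place/Route -> Detail Routed Design -> Layout Design Verification -> GDSII -> Mask Prep. Place/Route expands into Floorplanning, Placement, Physical Synthesis, Scan Reorder, Design Optimization Pre-CTS, CTS, Design Optimization Post-CTS, Route, and Design Optimization Post-Route, supported by Extraction, Delay Calculation, Static Timing Analysis, and Signal Integrity.]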
Flow Example
Let’s take a simple example through the implementation flow.
We will cover each step and highlight the following:
Definition and step in the overall flow
Formats
What Is a Specification?
Ideas begin with a specification, which can be a textual, graphical, or sometimes a software representation.
Definition: A specification is an [...]
Example: The specification for the latest chip specified a 250-[...] interface, able to process 1 Mb of data per second at less than 10 W total power.
What Is a Microarchitecture?
[...] the block will be implemented.
Definition: The microarchitecture defines specific mechanisms and structures for achieving that [...]
[...] microarchitecture and partitioned [...] modules.
Specification
[...] Marketing, CEO (Chief Executive Officer), CTO (Chief Technology Officer), etc.
Output: Document or model in text/graphics or software (C++, SystemC, SystemVerilog, etc.) format
Microarchitecture
Input: Specification + requirement from designer
Output: Typically a document in text/graphics, could be software as well
Example: Specification
Three main partitions “A,” “B,” and “C”
Memories in each partition
Perimeter I/O
250-MHz clock
10 W total power
Die size not to exceed 10x10 mm² due to custom package requirements
[Figure: block diagram of EX with inputs din and clk, blocks A, B, and C, and output dout.]
Example: Microarchitecture
For Block C
32-bit data bus interface to Block A
16-bit control interface from Block B
Use 64 Mb of SRAM
Duplicate datapath elements in a parallel implementation
Limit of five clock cycles from data [...]
[Figure: block diagram of EX with a 32-bit bus between A and C and a 16-bit control bus from B to C.]
What Is Logic Synthesis?
Definition: The process of parsing, translating, optimizing, and mapping RTL code into a specified standard cell library
Example: To determine the feasibility of the design, we [...] timing, power, and area.
Logic Synthesis: Input and Output, Format
Input
[...] other HDL
Constraints in Synopsys Design Constraints (SDC) format
Timing Libraries in Liberty (.lib) format
Output
Gate Level Netlist in the Verilog language or other HDL
Example: Logic Synthesis
We use the RTL for blocks A, B, and C to produce the following netlists:
block_a.vg
block_b.vg
block_c.vg
At the top level EX, the modules are instantiated:

    // top.vg
    module ex (…);
      block_a u0 (…);
      block_b u1 (…);
      block_c u2 (…);
    endmodule

[Figure: RTL for Blocks A, B, and C goes through Logic Synthesis to produce block_a.vg, block_b.vg, block_c.vg, and top.vg.]
What Is Floorplanning?
Definition: Process of deriving the die size, allocating space for soft blocks, planning power, and macro placement.
Example: [...] chip were floorplanned to minimize the distance between the [...] reduces the routing between the [...]
Floorplanning: Input and Output, Format
Input
Gate Level Netlist in the Verilog language or other HDL
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Output
Floorplanned design in the Verilog language (logical connectivity data) or other HDL + DEF (physical data)
Example: Floorplanning
With a top level netlist, we can begin to floorplan the chip
Set die size to 10x10 mm²
Create hard blocks for A, B, and C
Size the blocks A, B, and C
Perform power planning
Check for early routing congestion
Check for early block utilization
[Figure: 10 mm x 10 mm die EX with blocks A, B, and C and pins din, clk, and dout.]
Check for early routing congestion
Check for early block utilization
It is important to make sure the floorplan is routable and meets the utilization requirements with a given RAM and macro placement, pin assignment, etc.
[Figure: block floorplan with RAM A0 and RAM A1 placed and pins from_a, from_b, clk, and dout assigned.]
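The early utilization check in the floorplanning example can be sketched as a simple area ratio. Only the 10x10 mm² die size comes from the example specification; the block areas and the helper name `utilization` are assumptions made up for illustration.

```python
def utilization(used_area_mm2: float, available_area_mm2: float) -> float:
    """Return the fraction of the available area that is occupied."""
    if available_area_mm2 <= 0:
        raise ValueError("available area must be positive")
    return used_area_mm2 / available_area_mm2


DIE_AREA_MM2 = 10.0 * 10.0  # die size limit from the example specification

# Hypothetical block areas in mm^2 (assumed for illustration only).
block_areas = {"A": 20.0, "B": 15.0, "C": 30.0}

chip_utilization = utilization(sum(block_areas.values()), DIE_AREA_MM2)
print(f"Top-level utilization: {chip_utilization:.0%}")
```

A high ratio at this stage is an early hint that routing congestion may appear later, which is why the check is done before placement.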
What Is Placement?
Definition: Process of placing the standard cells in a floorplanned design.
Example: After the chip was floorplanned, we performed placement and discovered the [...] design.
Question: How can we avoid this problem?
What Is Physical Synthesis?
Definition: The combination of logical synthesis and placement.
[...] ran physical synthesis which, in [...] ran logic restructuring.
Placement: Input and Output, Format
Input
[...] language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Placement constraints and script in TCL
Output
Placed design in the Verilog language or other HDL + DEF
Top-Down Placement
[...]
Or we can place the standard cells for each of the blocks separately.
Bottom-Up Placement
[Figure: chip EX with blocks A, B, and C next to block C alone, which contains RAM A0, RAM A1, and placed standard cells u10 through u17, with pins din, clk, and from_b.]
What Is Scan Reorder?
[...] design to optimize for routing, timing, etc.
[...] reorder after placement so that the scan chain routing will be optimal.
A scan chain is the connection of the flip-flops in a design, such that test patterns can be scanned in and results scanned out during automated testing.
Scan Reorder: Input and Output, Format
Input
[...] language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
[...] SCANDEF format
Output
Scan chain reordered design in the Verilog language or other HDL + DEF
Example: Scan Reorder
Logical netlist was stitched numerically.
[Figure: logical netlist with the scan chain stitched SI -> DFF1 -> DFF2 -> DFF3 -> SO, compared with the physical locations of DFF1, DFF2, and DFF3.]
Example: Scan Reorder (continued)
The reordered scan chain requires far fewer routing resources in the example design.
[Figure: block with RAM A0 and RAM A1, showing the scan chain through dff1, dff2, and dff3 before and after reordering.]
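The benefit of reordering can be illustrated with a small sketch that compares the wire length of a numerically stitched chain against one ordered by placement. The coordinates are invented, and the greedy nearest-neighbor ordering is a simple stand-in chosen for illustration; it is not claimed to be the algorithm any particular tool uses.

```python
def manhattan(p, q):
    """Manhattan distance, a common first-order estimate of routed wire length."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chain_length(order, positions, scan_in, scan_out):
    """Total estimated wire length of a scan chain visiting flops in `order`."""
    points = [scan_in] + [positions[name] for name in order] + [scan_out]
    return sum(manhattan(a, b) for a, b in zip(points, points[1:]))


def reorder_scan_chain(positions, scan_in):
    """Greedy nearest-neighbor ordering starting from the scan-in pin."""
    remaining = dict(positions)
    order, current = [], scan_in
    while remaining:
        # Pick the closest unvisited flop; break ties by name for determinism.
        name = min(remaining, key=lambda n: (manhattan(current, remaining[n]), n))
        current = remaining.pop(name)
        order.append(name)
    return order


# Hypothetical placement (arbitrary units): dff1 and dff3 sit close together,
# dff2 is placed far away, loosely mirroring the example figure.
positions = {"dff1": (1, 9), "dff2": (2, 1), "dff3": (3, 9)}
scan_in, scan_out = (0, 9), (4, 0)

numeric_order = ["dff1", "dff2", "dff3"]  # as stitched in the logical netlist
placed_order = reorder_scan_chain(positions, scan_in)

numeric_length = chain_length(numeric_order, positions, scan_in, scan_out)
placed_length = chain_length(placed_order, positions, scan_in, scan_out)
print(numeric_order, numeric_length)
print(placed_order, placed_length)
```

With these assumed coordinates the placement-aware order visits dff3 before dff2 and roughly halves the estimated chain length.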
What Is Design Optimization?
Definition: Process of using automated algorithms to improve the quality of a digital design
[...] fix timing violations that may show up now that the design is placed and we have delays based on estimated interconnect.
Pre-CTS Design Optimization: Input and Output, Format
Input
Scan chain reordered design in the Verilog language or other HDL + DEF
Constraints in Synopsys Design [...] clocks)
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Commands in TCL
Output
Optimized placed design in the Verilog language or other HDL + DEF
Pre-CTS design optimization can clean up some of these issues by
Buffering nets
[...]
[Figure: block placement with RAM A0, RAM A1, and standard cells u10 through u17; u11 and u16 are upsized.]
[...] the same reason.
[Figure: the same placement with cell u20 added.]
What Is Clock Tree Synthesis?
[...] path, with the goal of [...]
Example: We ran clock tree [...] block and saw a large clock skew due to bad clock constraints. We ended up re-running clock tree synthesis with better constraints to get [...]
Clock Tree Synthesis: Input and Output, Format
Input
Optimized design in the Verilog language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Clock constraints and commands in TCL
Output
Post-CTS design with clock trees inserted in the Verilog language or other HDL + DEF
[...] tree in order to minimize
Clock skew in the design
[...]
[Figure: block placement with RAM A0, RAM A1, and cells u10 through u17; clock buffers c0, c1, c2, and c3 are added.]
[Figure: netlist before CTS, netlist after CTS with clock buffers c0 through c3 inserted on the clock paths to DFF1 and DFF2, and the placement after CTS.]
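Clock skew, the quantity CTS tries to minimize, is the spread in clock arrival times across the sinks. The sketch below computes it for two sets of arrival times; the numbers and the function name `clock_skew` are assumptions for illustration and are not taken from the example design.

```python
def clock_skew(arrival_ns):
    """Skew = latest minus earliest clock arrival time among the sinks."""
    return max(arrival_ns.values()) - min(arrival_ns.values())


# Hypothetical clock arrival times (ns) at two flip-flops.
before_cts = {"dff1": 0.20, "dff2": 0.85}  # unbalanced clock net
after_cts = {"dff1": 0.60, "dff2": 0.62}   # balanced with inserted clock buffers

skew_before = clock_skew(before_cts)
skew_after = clock_skew(after_cts)
print(f"skew before CTS: {skew_before:.2f} ns, after CTS: {skew_after:.2f} ns")
```

Note that in this made-up case the arrival times after CTS are later overall: buffers add insertion delay while reducing skew.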
[...]
Upsizing or downsizing cells
[Figure: block placement after CTS with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
What Is Routing?
Definition: Process of connecting the pins of the standard cells, macros, and I/Os of a digital design to specific metal layers in the process technology to match the schematic.
Example: We ran a preliminary route on the [...] placement with a placement density screen to force a lower utilization in that area [...]
Route: Input and Output, Format
Input
[...] language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Route constraints and commands in TCL
Output
Routed design in the Verilog language or other HDL + DEF
[...] to their specific routing layers according to the synthesized netlist.
The router will try to minimize
Route congestion
[...]
[Figure: routed block with RAM A0, RAM A1, standard cells, and clock buffers.]
[...]
Upsizing or downsizing cells
[Figure: routed block with RAM A0 and RAM A1; the c0, c1, c2, and c3 clock buffers are routed, and u10 and u14 are upsized.]
What precautions would you take if you were to take your design from RTL to placement?
What Is Extraction?
Definition: Process of calculating the parasitic resistance and capacitance of the interconnect of the physical design
[...] the design with varying [...] fully routed design, because [...] metal type and length. There are no estimates for nets at this point.
Extraction: Input and Output, Format
Input
[...] language or other HDL + DEF or GDSII
LVS verified netlist
Physical Libraries in LEF format
Extraction constraints and commands in TCL
Output
Standard Parasitic Exchange Format (SPEF) file containing all of the RC information for the routed nets in the design
Example: Extraction
When the design has been routed, we can perform a detailed extraction of the resistance and capacitance of the routed nets in the design.
This RC data will give us a more accurate report of the timing and power of the design.
Resistance and capacitance for each net is “extracted” and saved in a SPEF file.
[Figure: routed block with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
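To first order, the resistance and capacitance of a routed wire follow from its metal layer and geometry. The sketch below uses the sheet-resistance and per-length capacitance approximations; the layer values, the `Layer` class, and `wire_rc` are hypothetical and not taken from any real process or extraction tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Per-layer parameters (hypothetical values, not from a real process)."""
    sheet_res_ohm_per_sq: float  # sheet resistance in ohms per square
    cap_ff_per_um: float         # total capacitance per micron of wire, in fF


def wire_rc(layer: Layer, length_um: float, width_um: float):
    """Return (resistance in ohms, capacitance in fF) for one wire segment."""
    squares = length_um / width_um  # number of squares along the wire
    resistance = layer.sheet_res_ohm_per_sq * squares
    capacitance = layer.cap_ff_per_um * length_um
    return resistance, capacitance


metal1 = Layer(sheet_res_ohm_per_sq=0.1, cap_ff_per_um=0.2)
r_ohm, c_ff = wire_rc(metal1, length_um=100.0, width_um=0.2)
print(f"R = {r_ohm:.1f} ohm, C = {c_ff:.1f} fF")
```

This is why extraction needs the fully routed design: both terms depend on the actual metal type and length of every segment.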
What Is Delay Calculation?
Definition: Process of computing the delay of interconnect and standard cells in a digital design
[...] design, delay calculation was [...] after final route. Using the delay information, we were able to find several timing violations in the design.
Delay Calculation: Input and Output, Format
Input
[...] language or other HDL + DEF
Parasitic extraction file (SPEF)
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Constraints and commands in TCL
Output
Standard Delay Format (SDF) file containing all of the delay information in the design
[...]
Separate delay calculator
STA tool
The reason for generating an SDF file is to have consistency for all timing calculations throughout the flow. Once it is generated, all tools can access the same SDF file.
Delay for each cell and net in the design is calculated
[Figure: routed block with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
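One widely used first-order model for interconnect delay is the Elmore delay of an RC ladder. The sketch below is only an illustration of that model with made-up values; it lumps each segment's capacitance at its far end, and it is not a description of how any specific delay calculator works.

```python
def elmore_delay(segments, driver_res_ohm=0.0, load_cap_f=0.0):
    """Elmore delay (seconds) of an RC ladder from driver to load.

    segments: list of (resistance_ohm, capacitance_farad) ordered from the
    driver to the load. Each segment's capacitance is lumped at its far end,
    so it sees all of the resistance upstream of that point.
    """
    delay = 0.0
    upstream_res = driver_res_ohm
    for res, cap in segments:
        upstream_res += res
        delay += upstream_res * cap
    # The receiver's input capacitance sees the full path resistance.
    delay += upstream_res * load_cap_f
    return delay


# Hypothetical net: 100-ohm driver, two 50-ohm / 10-fF segments, 5-fF load.
net_delay_s = elmore_delay(
    segments=[(50.0, 10e-15), (50.0, 10e-15)],
    driver_res_ohm=100.0,
    load_cap_f=5e-15,
)
print(f"net delay = {net_delay_s * 1e12:.2f} ps")
```

The R and C inputs here are the kind of per-net data that extraction writes to the SPEF file.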
What Is Signal Integrity?
Definition: Unintended effects on digital signals caused by interconnect parasitic resistance or capacitance that causes noise and/or changes delays
Example: [...] design, we saw signal integrity (SI) effects such as noise-on-delay and glitches, due to long nets that were running in parallel.
What is noise-on-delay?
Crosstalk-induced delay, or incremental delay due to coupling capacitance.
What is a glitch?
A glitch is a bump or change in value caused by a changing signal affecting a neighboring wire.
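Both effects can be approximated with simple capacitance models. The sketch below uses a Miller factor for noise-on-delay and a capacitive divider as an upper bound on glitch height for a weakly held victim. These are textbook first-order approximations with made-up values, not the method of any particular SI tool.

```python
# Miller factor applied to the coupling capacitance, by aggressor activity.
MILLER_FACTOR = {
    "same": 0.0,      # aggressor switches in the same direction as the victim
    "quiet": 1.0,     # aggressor does not switch
    "opposite": 2.0,  # aggressor switches in the opposite direction
}


def effective_cap(c_ground, c_couple, aggressor):
    """Capacitance the victim driver effectively sees while switching."""
    return c_ground + MILLER_FACTOR[aggressor] * c_couple


def glitch_upper_bound(vdd, c_ground, c_couple):
    """Capacitive-divider bound on the glitch coupled onto a quiet victim."""
    return vdd * c_couple / (c_couple + c_ground)


# Hypothetical victim net: 10 fF to ground, 5 fF coupling to one aggressor.
c_ground_ff, c_couple_ff, vdd_v = 10.0, 5.0, 1.0

caps = {mode: effective_cap(c_ground_ff, c_couple_ff, mode) for mode in MILLER_FACTOR}
glitch_v = glitch_upper_bound(vdd_v, c_ground_ff, c_couple_ff)
print(caps)
print(f"glitch upper bound = {glitch_v:.3f} V")
```

Long parallel routes increase the coupling term, which makes both the delay change and the glitch larger.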
Signal Integrity: Input and Output, Format
Input
[...] or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Constraints and commands in TCL
Parasitic extraction file (SPEF)
Logical Timing Libraries in Liberty (.lib) format
Physical Libraries in LEF format
Power rail IR-drop data
Tool-specific SI libraries
Output
Incremental Standard Delay Format (SDF) file containing all of the delay information in the design related to noise-on-delay
Reports for glitch nets
List of problem nets that need to be re-routed.
[...]
Crosstalk-induced delay
Noise
Power rail IR drop can cause
Weakened drivers
Increased delays
Incremental delay due to coupling capacitance is stored in an SDF file
[Figure: routed block with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
What Is Static Timing Analysis?
[...] vendors and foundries have adopted it.
Definition: Process of computing the timing of logically related paths for a digital design without regard to large scale functional behavior
Example: To determine the timing of the design, we ran [...] detail route, and saw several paths violating their setup time requirements.
Static Timing Analysis: Input and Output, Format
Input
[...] language or other HDL (Note: [...] any stage of the back-end flow)
Constraints in Synopsys Design Constraints (SDC) format
Logical Timing Libraries in Liberty (.lib) format
SPEF, SDF, and incremental SDF
Output
Timing reports, including noise-on-delay effects
[...]
During the implementation phase to check on timing, etc.
For signoff just before tapeout to ensure all paths meet timing
Full chip timing can now be run with routing and SI effects included
[Figure: routed block with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
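A setup check compares when data arrives at a capturing flip-flop with when it is required. The sketch below works through that arithmetic for the 250-MHz clock from the example specification; all delay values and the function name `setup_slack` are assumptions for illustration.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, data_delay, setup_time, uncertainty=0.0):
    """Setup slack in the same time unit as the inputs (negative = violation)."""
    # Data arrival: launch clock edge, through the launching flop and the logic.
    arrival = launch_latency + clk_to_q + data_delay
    # Data must be stable one setup time before the next capture clock edge.
    required = period + capture_latency - setup_time - uncertainty
    return required - arrival


PERIOD_NS = 1000.0 / 250.0  # 250-MHz clock from the example specification

# Hypothetical path delays in ns (assumed for illustration only).
common = dict(period=PERIOD_NS, launch_latency=0.60, capture_latency=0.62,
              clk_to_q=0.15, setup_time=0.10, uncertainty=0.10)

slack_fast_path = setup_slack(data_delay=3.50, **common)
slack_slow_path = setup_slack(data_delay=3.80, **common)
print(f"fast path slack: {slack_fast_path:+.2f} ns")
print(f"slow path slack: {slack_slow_path:+.2f} ns")
```

With these numbers the slower path has negative slack, which is what a timing report flags as a setup violation.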
What Is Design Verification?
[...] drop and EM) are signoff checks run to [...] manufacturability of the chip.
[...] level SPICE netlist vs. GDSII to ensure the connectivity of the design.
DRC is a detailed check of the physical design against the process [...]
IR drop is a detailed check of the chip’s power plan to ensure that the supply voltages do not drop below accepted levels.
EM is a detailed check to ensure that the current density in all parts of the design does not exceed accepted levels.
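The IR drop and EM checks reduce to Ohm's law and a current-density limit. The sketch below shows both checks on a single power strap; every number, limit, and function name is hypothetical, and real analysis tools solve the whole power grid rather than one wire.

```python
def ir_drop_v(current_a, resistance_ohm):
    """Voltage lost across a power-grid segment (Ohm's law)."""
    return current_a * resistance_ohm


def current_density(current_ma, width_um, thickness_um):
    """Current density in mA per square micron of wire cross-section."""
    return current_ma / (width_um * thickness_um)


# Hypothetical power strap feeding a block (all values assumed).
VDD_V = 1.0
MAX_DROP_FRACTION = 0.05   # assumed limit: at most 5% of VDD
MAX_DENSITY = 20.0         # assumed EM limit in mA/um^2

drop_v = ir_drop_v(current_a=0.05, resistance_ohm=0.8)
ir_ok = drop_v <= MAX_DROP_FRACTION * VDD_V

density = current_density(current_ma=2.0, width_um=0.2, thickness_um=0.2)
em_ok = density <= MAX_DENSITY

print(f"IR drop = {drop_v * 1000:.0f} mV, pass = {ir_ok}")
print(f"current density = {density:.0f} mA/um^2, pass = {em_ok}")
```

In this made-up case the strap passes the IR drop check but fails EM, so the usual remedy would be a wider wire or additional straps.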
What Is Mask Prep?
Process of creating the mask set from the GDSII database to allow chip manufacturing
DRC: Input and Output, Format
Input
GDSII
Rule deck
Output
DRC reports
LVS: Input and Output, Format
Input
Gate Level Netlist in the Verilog language
GDSII
Rule deck
SPICE libraries
Output
LVS reports
Power Grid Analysis, IR Drop, and EM: Input and Output, Format
Input
Gate Level Netlist in the Verilog language + DEF
Power characterized libraries in tool-specific format
Timing libraries in Liberty (.lib) format
Timing constraints in SDC format
Extraction data in SPEF format
Timing windows file (TWF)
Value-change-dump (VCD) file (optional)
Output
IR drop reports
EM reports
Mask Prep: Input and Output, Format
[...]
Output
Optimized GDSII
[Figure: Mask Prep uses tech files and produces an optimized GDSII.]
When the design passes all of the PV checks, a GDSII is produced and mask prep can begin. Mask prep involves complex processes such as lithography (the process of creating the masks to create the layers for an integrated circuit) modifications, etc.
Make sure power, LVS, DRC checks pass
Perform mask prep
[Figure: routed block with RAM A0, RAM A1, cells u10 through u17, and clock buffers c0 through c3.]
Which process steps can be done at multiple stages of the flow?
If you were to lead the design of a chip, how would you organize your resources to handle the various tasks?
Summary
We have introduced all of the steps in the physical implementation flow:
Specification, microarchitecture, RTL, logic synthesis
[...]
Extraction, delay calculation, static timing analysis, signal integrity
Design verification, mask prep
Each step in the process, in and of itself, is very detailed, so we will spend the rest of the course learning more about each step.
Testing Your Understanding
True or false
1. In creating a floorplan, we can gather information to see if our design
is routable.
2. If a design does not meet timing after synthesis, it is possible that it
can meet timing during placement.
3. When routing a design, it is best to avoid having long parallel routes.
4. Accurate SDC constraints are important to meet timing during static
timing analysis.
5. Errors in physical verification are simple to fix.
Learning Activity
In this activity, you will
Complete a flowchart of the digital design implementation flow
Include the design flow steps and outputs
Fill in the missing or wrong sections of the flowchart
10 minutes for debriefing
[Figure: flowchart from RTL to GDSII with blank design flow steps to fill in]
Terms and Definitions
Floorplanning: Process of deriving the die size, allocating space for soft blocks, planning power, and macro placement.
Placement: Process of placing the standard cells in a floorplanned design
Clock Tree Synthesis: Process of inserting buffers in the clock tree of a digital design
Route: Process of connecting the pins of the standard cells, macros, and I/Os of a digital design to specific metal layers in the process technology to match the schematic.
Extraction: Process of calculating the parasitic resistance and capacitance of the interconnect of the physical design
Delay Calculation: Process of computing the delay of interconnect and standard cells in a digital design
Static Timing Analysis: Process of computing the timing of logically related paths for a digital design without regard to large scale functional behavior
Signal Integrity: Unintended effects on digital signals caused by interconnect parasitic resistance or capacitance that causes noise and/or changes delays
Design Optimization: Process of using automated algorithms to improve the quality of a digital design
Physical Synthesis: Process of combining logic synthesis and placement to improve the accuracy of the physical implementation of a digital design
Design Verification: Process of physically verifying the design rules and backend checks of a design
Mask Prep: Process of creating the mask set from the GDSII database to allow chip manufacturing
(connectivity)
Liberty: Format for logical libraries, includes timing, area, and power information
SDC: Synopsys Design Constraints, includes clocks and timing constraints
Clock Skew: Delay difference between clock paths in a design
Clock Latency: Delay from clock source to destination in a design
SPEF: Standard Parasitic Exchange Format, standard format for representing capacitance and resistance for each net
SDF: Standard Delay Format, standard format for representing interconnect and cell delays
LVS: Layout vs. schematic, connectivity checking
DRC: Design Rule Check, physical rule checking
IR Drop: Voltage drop, measure of power plan integrity
EM: Electromigration, term used to describe failures in wires due to high current
TWF: Timing Windows File, file used in signal integrity analysis to determine the overlap of signals
VCD: Value Change Dump, file used to provide toggle information to power analysis
GDSII: Graphic Data System, standard format for IC layout data exchange
Rule Deck: Technology-specific information used by physical verification
Spice Deck: Format to represent circuits, cells, and macros in detail
Urban Planning
When civil engineers plan the layout of an urban settlement, they need to consider
Total population and population density of the settlement.
Locations of parks, apartments, shopping centers, etc.
Integrated Circuit Layout
In a similar fashion, but on a micron scale, layout engineers must also decide where
to place parts of the circuit under design and follow spacing rules.
Module Objectives
In this module, you will be able to
Describe the fabrication of field effect transistors (FETs) and layout
technologies, and relate layout data to library exchange format
(LEF) syntax
Read a design rule manual (DRM) and interpret design rule check
(DRC) and layout versus schematic (LVS) errors
Describe how cell libraries are used
Topics in This Module
Introduction to layout describing layers, FETs, and logic gate layouts
DRC and reading a DRM
Review
Introduction to Layout
After a design has been synthesized, it is time to start laying out the
design.
Usually, place and route tools more or less automate the process
using ready-made transistor and gate libraries provided by the
foundry.
Sometimes, when performance and design density are of primary
importance, the designer must lay out the design manually.
This approach, called “custom design,” leads to high production costs
and a long time to market.
There are usually three reasons to justify a custom design:
The block can be re-used many times, such as a library cell.
The product can be sold in large volume, such as microprocessors.
Cost is not the primary concern, such as chips used in space.
Below are the symbol schematic and the 3D diagram for an n-channel
MOS (NMOS) transistor.
The gate is made out of polysilicon.
The drain and source are made out of heavily doped n+ diffusion layers
(also called active area).
The bulk is made out of p-type substrate.
[Figure: symbol schematic (gate, drain, source, bulk) and 3D diagram (poly gate, n+ drain, n+ source, p-substrate bulk)]
Due to lithographic error margins, the polysilicon gate must be extended over the
diffusion layer according to design rules (discussed later).
Metal contacts must be added for the drain and source diffusion layers.
[Figure: layout of the n-channel transistor showing gate, source, drain, and Metal 1 contacts]
The drain and source of a p-channel MOS (PMOS) are made out of p+ diffusion
layers, and the substrate is composed of n-type material.
For consistency in this module, the wafer is created with a p-type process. This
means the default substrate is p-type material. Therefore, since the PMOS requires
n-type material, a well composed of n-type material must be built around the
diffusion layer.
In layout tools, this well is also called the select region.
[Figure: layout of p-channel transistor showing gate, source, drain, and n-well]
The inverter needs power and ground metal strips (rails). Recall that the
source of the PMOS is connected to the power rail, whereas the source of
the NMOS is connected to the ground rail.
The poly can be extended to connect together both gates.
Recall that for an inverter with equal drive strength for rising and falling
transitions, the PMOS is twice the width of the NMOS.
Since the substrate needs to be biased, we will also need to add substrate
contacts to both transistors (n-tap for n-type substrate contact, p-tap for
p-type substrate contact).
In digital circuits, the substrate of a transistor is usually biased to the same
voltage as the source of the transistor to avoid body effect.
[Figure: inverter schematic and layout showing In, Out, gate, sources, drains, n-well, n-tap, p-tap, and ground rail]
Note: In digital circuit schematics, the
bulk node is usually not drawn because it
is assumed that the bulk is connected to
the source.
Stick Diagram
Just as a writer drafts an essay before writing it out, layout engineers
can plan their layout on paper before diving into the tools.
A commonly preferred method for scratch work is a stick diagram.
A stick diagram is a way to visualize a layout without drawing the actual
dimensions.
Each object (that is, poly, metal strip, diffusion) is represented by a
dimensionless “stick.”
Below is a stick diagram for the inverter we just drew. Can you identify what
each stick represents?
[Figure: stick diagram of an inverter]
NAND Gate Layout
Let’s draw a stick diagram for a slightly more complicated, two-input NAND
gate.
Looking at the schematic, we see that there are two PMOS in parallel and two
NMOS in series.
Due to this fact, we do not need to draw separate diffusion layers for two
devices that share sources and drains.
[Figure: two-input NAND schematic with inputs A and B, and its stick diagram with n-tap, VDD, p-tap, and Ground]
Always try to create a continuous diffusion layer or well.
substrate taps in each separate well.
All poly strips should run in one direction (usually vertical).
Keep metal jogs to a minimum, and use absolutely no diagonal wires,
since diagonal wires can cause problems with design rules
downstream.
Power and ground rails should be extra wide to allow a large amount
of current to flow into your device.
Placing additional contacts never hurts. They will give you more options
in terms of where to place the metal wire during routing.
Plan your layout on paper first; the stick diagram is your friend!
Technology Layers
In most layout tools today, you will have access to a layers palette
like the one shown on the right from Cadence Virtuoso Layout Editor.
We have already encountered some of the layers on the previous slides.
The table on the next slide will summarize commonly used layers.
Technology Layers (continued)
Metal(1-8)
Poly
nactive
pactive
nselect: n well for PMOS
pselect: In a p-process, this represents an abstract boundary.
cc: Contact cut. This layer, in conjunction with metal layers, is used to create vias and contacts.
Class Exercise
Draw the stick diagram and layout for a two-input NOR gate.
[Figure: two-input NOR gate schematic with inputs A and B]
Topics in This Module
Introduction to layout describing layers, FETs, and logic gate layouts
DRC and reading a DRM
Review
Design Rules
Today’s semiconductor manufacturing processes are extremely
complex. It is simply not possible to expect every layout engineer to
understand the intricacies of the fabrication process.
Layout engineers want tighter, smaller designs.
Design rules act as an interface and a compromise between layout
engineers and process engineers.
By understanding design rules, layout engineers can make their
design as compact as possible while ensuring that their design will
have a high yield.
Design Rule Manual
Usually, the layout engineer will have access to a document called the
design rule manual (DRM), which explains all the design rules that
need to be followed.
The information is also annotated into layout tools that automatically
check the design for violations as the design is being laid out.
We will take a brief look at a sample DRM.
To make the rules more readable, the rules in this manual are divided
into sections based on the different layers.
N-well
Rule  Rule Description  Drawn
1A  Minimum width  2.2 μm
1B  Minimum spacing in x and y, both N-wells biased at the same potential  1.6 μm
1C  Minimum spacing, either or both N-wells not biased or biased to different potentials  3.0 μm
1D  Minimum enclosure of p-active region  1.5 μm
1E  Minimum spacing in x and y to an external n-active region  2.0 μm
[Figure: N-well layout annotated with rules 1A to 1F]
N-active
N-active
ce 2A
2B
Minimum width
0.8 μm
2B
P-active
2C
P-Active Rules
1.0 μm
3A  Minimum width  0.6 μm
3B  Minimum spacing over field  0.8 μm
3C  Minimum spacing to n-active  1.0 μm
Rule  Rule Description  Drawn
4A  Required size (square)  0.8 μm2
4B  Minimum spacing  0.6 μm
4C  Minimum poly contact 1 spacing to any active region  0.4 μm
4D  Minimum active region contact 1 to poly  0.6 μm
4E  Minimum enclosure by any active region  0.2 μm
4F  Minimum enclosure by poly  0.2 μm
[Figure: contact layout annotated with rules 4A to 4F]
Rule  Rule Description  Drawn
5A
5B  Minimum spacing  0.8 μm
5C  Minimum overlap of contact 1  0.2 μm
Note: These dimensions are not drawn to scale.
The rules for contacts belonging to other layers are similar; just the numbers are different.
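To make the idea of an automated rule check concrete, here is a minimal Python sketch that tests axis-aligned N-well rectangles against rules 1A (minimum width 2.2 μm) and 1B (minimum spacing 1.6 μm) from the sample DRM. The rectangle coordinates, the function names, and the use of straight-line distance for corner-to-corner spacing are assumptions made for illustration; a production DRC tool works from the rule deck and is far more general.

```python
from itertools import combinations

# Rule values copied from the sample DRM (N-well rules 1A and 1B), in microns.
MIN_WIDTH = 2.2
MIN_SPACING_SAME_POTENTIAL = 1.6


def width(rect):
    """Smaller side length of an axis-aligned rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return min(x2 - x1, y2 - y1)


def spacing(a, b):
    """Edge-to-edge distance between two rectangles; 0.0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5  # assumption: straight-line corner distance


def check_nwells(rects):
    """Return a list of (rule, index...) tuples for every violation found."""
    violations = []
    for i, r in enumerate(rects):
        if width(r) < MIN_WIDTH:
            violations.append(("1A", i))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        s = spacing(a, b)
        if 0.0 < s < MIN_SPACING_SAME_POTENTIAL:
            violations.append(("1B", i, j))
    return violations


# Hypothetical N-well shapes, all assumed to be biased at the same potential.
wells = [(0.0, 0.0, 3.0, 3.0), (4.0, 0.0, 7.0, 3.0), (0.0, 5.0, 2.0, 8.0)]
violations = check_nwells(wells)
print(violations)
```

In this made-up example, the third well is only 2.0 μm wide and the first two wells are only 1.0 μm apart, so one violation of each rule is reported.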
Discussion Questions
If we had design rule violations on METAL2 and METAL3 after detail
route, which sections of the DRM should we refer to?
Why is there a minimum spacing rule for specific metal layers?
Are the rules different per metal layer?
Topics in This Module
Introduction to layout describing layers, FETs, and logic gate layouts
DRC and reading a DRM
Review
A function called layout versus schematic (LVS), found in most
tools, checks your layout against a schematic netlist.
The tool first extracts a netlist from the layout by using some basic
rules:
A transistor is detected when poly overlaps active regions.
All poly, diffusion, and metal layers are conductive and are assumed to
route signals.
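The first extraction rule can be sketched in a few lines of Python: every place where a poly rectangle overlaps an active rectangle is reported as a transistor channel. The shapes and function names below are hypothetical, and real extractors also handle non-rectangular polygons, device types, and connectivity.

```python
def overlap(a, b):
    """Return the overlapping rectangle of a and b, or None if there is none."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def find_transistors(poly_rects, active_rects):
    """Report one channel region for each poly/active overlap."""
    found = []
    for p in poly_rects:
        for a in active_rects:
            channel = overlap(p, a)
            if channel is not None:
                found.append(channel)
    return found


# Hypothetical inverter-like layout: one vertical poly strip crossing two
# active regions (coordinates in microns, as (x1, y1, x2, y2)).
poly = [(2.0, 0.0, 2.6, 6.0)]
active = [(0.0, 1.0, 5.0, 2.0), (0.0, 3.5, 5.0, 5.5)]
transistors = find_transistors(poly, active)
print(len(transistors), "transistors detected")
```

Because a single poly strip crosses both active regions, two devices are found, which matches the way one poly line forms both gates of an inverter.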
Layout vs. Schematic (continued)
[Figure: schematic and extracted layout netlist for the same circuit, with inputs A and B, instances I1 to I4 (2/1 and 1/1 device sizes), nets Net1 to Net3, IN1, O1, VDD, and GND]
comparing a netlist against a layout.
These are some general tips when performing LVS:
Similar to Verilog® design, a bottom-up approach should be used when
performing LVS. If LVS does not pass, then the error can be narrowed
down to the interconnects, because the smaller blocks are already LVS
clean.
Label your layout. All pins and wires should be labeled exactly as they
appear in the netlist. It gives the tool a good chance to correctly identify a
mismatch.
If the device count between layout and netlist is the same, do not perform
any netlist reduction. If the count is different, check to see if the layout is
correct before performing netlist reduction, because this process attempts
to simplify logic and can potentially collapse nets.
The two netlists should be topologically equivalent, meaning they have the
same type of devices.
Set your constraints to check only the above factors.
The second goal of LVS is to make sure that device parameters and
capacitance values are correct. This can only be done if the netlist
annotates such information.
Check the reports.
LVS reports usually consist of matching and non-matching nets and
devices.
Most tools have a cross-probing feature that will highlight the equivalent
object on both the layout and the schematic of the netlist if one is selected.
This is your best debugging friend!
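The device-count tip above can be illustrated with a small Python sketch that tallies devices by type in two netlists and reports any type whose counts differ. The netlist representation (a list of name and type pairs) and all instance names are assumptions for this example; an actual LVS report is produced by the tool and also compares connectivity.

```python
from collections import Counter


def device_counts(netlist):
    """Count devices by type in a netlist given as (instance_name, device_type) pairs."""
    return Counter(dev_type for _name, dev_type in netlist)


def count_mismatches(schematic, layout):
    """Map each device type whose counts differ to (schematic_count, layout_count)."""
    s, l = device_counts(schematic), device_counts(layout)
    return {t: (s[t], l[t]) for t in sorted(set(s) | set(l)) if s[t] != l[t]}


# Hypothetical data: the schematic has four devices, but one NMOS was
# not extracted from the layout.
schematic = [("I1", "pmos"), ("I3", "pmos"), ("I2", "nmos"), ("I4", "nmos")]
layout = [("M0", "pmos"), ("M1", "pmos"), ("M2", "nmos")]

mismatches = count_mismatches(schematic, layout)
print(mismatches)
```

A count mismatch like this one is a cue to inspect the layout before trying any netlist reduction.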
DRC and reading a DRM
Review
Physical Libraries
After a standard cell is laid out, the information is encapsulated into a
Library Exchange Format (LEF) file.
The LEF provides a means to exchange layout information between
layout and routing tools in the IC flow (such as the Cadence®
Virtuoso® tools and the SoC Encounter® RTL-to-GDSII system).
The LEF contains only information on layout of metal layers inside a
standard cell.
This information includes the locations of I/O pins and also internal
metal routing so that the router knows where to route and where not to
route.
Distances are given in microns.
LEF statements end with a semicolon. A space must separate the last
character in the statement and the semicolon.
LEF information is usually divided into two files, a technology LEF file and a
standard cell LEF file.
LEF statements can be defined in any order, but data must be defined
before it is used. The following table is the typical format for LEF files.
Typical LEF Format
Statements for a tech LEF file:
[VERSION statement]
[BUSBITCHARS statement]
[DIVIDERCHAR statement]
[UNITS statement]
[MANUFACTURINGGRID statement]
[USEMINSPACING statement]
[CLEARANCEMEASURE statement ;]
[PROPERTYDEFINITIONS statement]
[LAYER(Nonrouting) statement
| LAYER(Routing) statement] ...
[SPACING statement ]
[MAXVIASTACK statement]
[VIA statement] ...
[VIARULE statement] ...
[VIARULE GENERATE statement] ...
[NONDEFAULTRULE statement] ...
[SITE statement] ...
[BEGINEXT statement] ...
[END LIBRARY]

Statements for a standard cell LEF file:
[VERSION statement]
[BUSBITCHARS statement]
[DIVIDERCHAR statement]
[VIA statement] ...
[SITE statement]
[MACRO statement
[PIN statement] ...
[OBS statement ...] ] ...
[BEGINEXT statement] ...
[END LIBRARY]
The bulk of the technology LEF file describes the metal and via layers,
and their process rules (such as width, spacing, extension, minimum
area, and antenna area).
Metal layers are used to connect standard cells and macros, whereas
vias are used to connect different metal layers.
A via is a rectangular object that connects two routing layers together.
The via is usually composed of three layers: a cut layer sandwiched by
two routing layers.
LAYER Statement
Every layer in the technology is described with the LAYER statement.
There are four types of layers: CUT, Routing, Implant, and
Masterslice.
In this class, we will cover only the CUT and Routing layers, which are
responsible for creating metal routes and the vias.
Implant and Masterslice layers are beyond the scope of this module
and will not be discussed here.
Routing LAYER
Routing layers are responsible for creating metal routes between cells.
For each layer, there are many attributes to set. Below is a sample LEF file
describing the attributes for metal layer 1.
The important attributes will be described in detail on the next slide.

LAYER ME1
  TYPE ROUTING ;
  WIDTH 0.160 ;
  AREA 0.1024 ;
  SPACING 0.160 ;
  SPACING 0.26 RANGE 1.765 100000.0 ;
  PITCH 0.400 ;
  OFFSET 0.200 ;
  DIRECTION HORIZONTAL ;
  THICKNESS 0.320 ;
  HEIGHT 0.46 ;
  MINENCLOSEDAREA 0.3072 ;
  MINIMUMCUT 2 WIDTH 1.40 ;
  MAXWIDTH 25.00 ;
  CAPACITANCE CPERSQDIST 1.1012E-04 ;
  RESISTANCE RPERSQ 0.09100000 ;
  EDGECAPACITANCE 9.362E-05 ;
  MINIMUMDENSITY 20 ;
  MAXIMUMDENSITY 80 ;
  DENSITYCHECKWINDOW 200 200 ;
  DENSITYCHECKSTEP 100 ;
  FILLACTIVESPACING 0.8 ;
  ANTENNACUMAREARATIO 396 ;
  ANTENNACUMDIFFAREARATIO PWL ( ( 0 396 ) ( 0.102 396 ) ( 0.103 999999999 ) ( 1 999999999 ) ) ;
END ME1
Attribute  Description
Width: Minimum width of the routing wires
Area: Minimum area for a polygon of metal
Thickness: Minimum thickness of wire
Spacing: Minimum spacing between wires. For example,
SPACING 0.26 RANGE 1.765 100000.0 ;
means the minimum spacing is 0.26 for wires with widths
between 1.765 and 100000.0 microns.
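The width-dependent spacing rule can be expressed as a short lookup. The following Python sketch uses the two SPACING values from the sample metal layer 1 definition and returns the spacing that applies to a single wire of a given width. The function name is hypothetical, and the sketch deliberately ignores the case of two neighboring wires with different widths.

```python
# Values taken from the sample ME1 layer definition, in microns.
DEFAULT_SPACING = 0.160          # SPACING 0.160 ;
RANGE_SPACING = 0.26             # SPACING 0.26 RANGE 1.765 100000.0 ;
RANGE_MIN_WIDTH = 1.765
RANGE_MAX_WIDTH = 100000.0


def min_spacing(wire_width):
    """Minimum spacing for a wire, using the wider rule inside the RANGE."""
    if RANGE_MIN_WIDTH <= wire_width <= RANGE_MAX_WIDTH:
        return RANGE_SPACING
    return DEFAULT_SPACING


for w in (0.160, 1.0, 2.0):
    print(f"width {w} um -> spacing {min_spacing(w)} um")
```

A minimum-width signal wire keeps the default spacing, whereas a wide wire such as a power strap falls inside the range and needs the larger spacing.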
[Figure: top-down view of two wires showing Spacing, Width, and Thickness]
Attribute  Description
Capacitance: The capacitance per square unit of the wire-to-ground capacitance
Resistance: The resistance per square of the metal
EdgeCapacitance: The capacitance from the sidewall to the ground of the metal
[Figure: capacitance calculations, showing Resistance, Capacitance, and EdgeCapacitance on a wire]
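As a rough illustration of how these three attributes might be combined, the Python sketch below estimates the resistance and capacitance of one straight wire segment from the sample metal layer 1 values. The formulas are assumptions made for this sketch: resistance is taken as per-square resistance times the number of squares (length divided by width), and capacitance as an area term plus an edge term applied to the full perimeter. The units of the results depend on the UNITS statement of the LEF file, which is not shown in the sample.

```python
# Values taken from the sample ME1 layer definition.
RPERSQ = 0.091                 # RESISTANCE RPERSQ (per square)
CPERSQDIST = 1.1012e-04        # CAPACITANCE CPERSQDIST (per square micron)
EDGECAPACITANCE = 9.362e-05    # EDGECAPACITANCE (per micron of edge)


def wire_resistance(length, width):
    """Resistance of a segment: per-square value times number of squares."""
    return RPERSQ * length / width


def wire_capacitance(length, width):
    """Capacitance of a segment: area term plus perimeter (edge) term.

    Assumption: the edge term is applied to the whole perimeter.
    """
    area_cap = CPERSQDIST * length * width
    edge_cap = EDGECAPACITANCE * 2.0 * (length + width)
    return area_cap + edge_cap


# Hypothetical segment: 100 um long at the minimum width of 0.160 um.
r = wire_resistance(100.0, 0.160)
c = wire_capacitance(100.0, 0.160)
print(f"R = {r:.3f}, C = {c:.6f}")
```

For a long, narrow wire the edge term dominates the area term, which is why sidewall capacitance matters so much on minimum-width routes.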
Routing LAYER Attributes (continued)
Most place and route tools have routing tracks. All metal routes
must be placed squarely on these tracks.
Attribute  Description
Offset: The distance of the first routing track from the edge of the chip
Direction: Each metal layer has a preferred direction that the auto router will route
with. It is either vertical or horizontal. Diagonal tracks are usually not
preferred.
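Routing tracks follow directly from the offset and the pitch: the first track sits one offset from the chip edge and each subsequent track is one pitch further on. The Python sketch below generates track coordinates from the PITCH and OFFSET values of the sample metal layer 1 and snaps an arbitrary coordinate to the nearest track. The function names and the 2.0 μm extent are hypothetical.

```python
# Values taken from the sample ME1 layer definition, in microns.
PITCH = 0.400
OFFSET = 0.200


def track_positions(extent, pitch=PITCH, offset=OFFSET):
    """Track coordinates between the chip edge (0) and `extent`, inclusive."""
    positions = []
    k = 0
    while offset + k * pitch <= extent + 1e-9:
        positions.append(round(offset + k * pitch, 3))
        k += 1
    return positions


def snap_to_track(coord, pitch=PITCH, offset=OFFSET):
    """Move a coordinate to the nearest routing track."""
    k = max(0, round((coord - offset) / pitch))
    return round(offset + k * pitch, 3)


tracks = track_positions(2.0)
print(tracks)
print(snap_to_track(0.95))
```

Because ME1 has DIRECTION HORIZONTAL in the sample, these values would be read as y coordinates of horizontal tracks.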
[Figure: routing tracks showing Offset and Pitch relative to the edge of the chip]
Vias
Vias are contacts that connect together different metal layers.
Vias usually have three layers: two routing layers and a CUT layer in
between.

Sample Via Definition
# The LAYER statements for metal 1 and 2 are defined on previous slides.
LAYER VI1
  TYPE CUT ;
  SPACING 0.20 ;
END VI1
VIA VI1_H DEFAULT
  RESISTANCE 4.0000e+00 ;
  LAYER ME1 ;
    RECT -0.16 -0.1 0.16 0.1 ;
  LAYER VI1 ;
    RECT -0.1 -0.1 0.1 0.1 ;
  LAYER ME2 ;
    RECT -0.16 -0.1 0.16 0.1 ;
END VI1_H
MACRO INVX10MTL
  CLASS CORE ;
  FOREIGN INVX10MTL 0.000 0.000 ;
  ORIGIN 0.000 0.000 ;
  SIZE 3.200 BY 2.800 ;
  SYMMETRY X Y ;
  SITE SAMPLEFSNSITE ;
  PIN Y
    DIRECTION OUTPUT ;
    PORT
      LAYER ME1 ;
      RECT 2.815 0.605 3.100 2.305 ;
      RECT 1.980 1.040 2.815 1.760 ;
      RECT 1.700 0.605 1.980 2.305 ;
      RECT 1.220 0.740 1.700 2.020 ;
    END
    ANTENNADIFFAREA 1.687 ;
  END Y
  PIN A
    DIRECTION INPUT ;
    PORT
      LAYER ME1 ;
      RECT 0.160 1.140 1.040 1.500 ;
    END
    ANTENNAGATEAREA 0.888 ;
  END A
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 2.540 -0.180 3.200 0.180 ;
      RECT 2.260 -0.180 2.540 0.680 ;
      RECT 1.460 -0.180 2.260 0.180 ;
      RECT 1.180 -0.180 1.460 0.580 ;
    END
  END VSS
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 2.540 2.620 3.200 2.980 ;
      RECT 2.260 2.070 2.540 2.980 ;
      RECT 1.460 2.620 2.260 2.980 ;
      RECT 1.180 2.180 1.460 2.980 ;
    END
  END VDD
END INVX10MTL
standard cell, and I/O pads.
We will focus on standard cells.
standard cells, the value is CORE.
Foreign: Specifies the name of the macro when seen in a tool. It specifies how
the position and orientation would be translated when read into a layout
tool.
Origin: Specifies the origin of the macro relative to a DEF COMPONENT
placement point. Usually leave this as 0 0 to avoid confusion.
Size: Dimensions of the MACRO
MACRO Symmetry
A chip is divided into core rows in which standard cells are placed.
The rows are usually placed in a flipped and abutted pattern, with alternating north (N)
and flipped south (FS) orientations.
Standard cells are placed in the rows, in N or FS orientation, such that they share VDD
rails and VSS rails.
Cells in the N row have the N orientation, whereas those in the FS row have FS
orientation.
[Figure: alternating N and FS rows sharing a VDD or VSS rail]
Cells on the N row that are flipped about their y-axis have the FN orientation,
whereas those flipped vertically on the FS row have the S orientation.
The SYMMETRY statement (SYMMETRY X ;) tells the placer which
orientations are allowed when placing cells in the rows.
Possible values include
X: N and FS orientations are allowed
Y: N and FN orientations are allowed
X Y: All orientations are allowed
R90: Do not use this value for standard cells
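The mapping from SYMMETRY values to allowed orientations can be written as a small table. This Python sketch encodes only the three cases listed above and treats "all orientations" as the four named in this module (N, FS, FN, and S); the function name and the decision to reject any other value are assumptions for illustration.

```python
# Orientations allowed for each SYMMETRY value, as listed in this module.
ALLOWED = {
    frozenset({"X"}): {"N", "FS"},
    frozenset({"Y"}): {"N", "FN"},
    frozenset({"X", "Y"}): {"N", "FS", "FN", "S"},
}


def allowed_orientations(symmetry_statement):
    """Return the orientations a placer may use for a LEF SYMMETRY statement."""
    tokens = symmetry_statement.replace(";", " ").split()
    if not tokens or tokens[0] != "SYMMETRY":
        raise ValueError("not a SYMMETRY statement")
    flags = frozenset(tokens[1:])
    if flags not in ALLOWED:
        # R90 and anything else is not expected for standard cells.
        raise ValueError(f"unsupported symmetry for a standard cell: {sorted(flags)}")
    return ALLOWED[flags]


print(sorted(allowed_orientations("SYMMETRY X Y ;")))
print(sorted(allowed_orientations("SYMMETRY X ;")))
```

The sample INVX10MTL macro declares SYMMETRY X Y, so under this mapping the placer may use any of the four orientations for it.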
itself, it is time to look at the most important components of a standard
cell: its pins.
The pin DIRECTION specifies the direction of the pin. Values can be
INPUT, OUTPUT, INOUT, TRISTATE, or FEEDTHRU.
The pin SHAPE specifies how the pin is connected. Values can be
ABUTMENT, RING, or FEEDTHRU (used only for pins with special
connection requirements, such as power/ground).
The pin USE specifies how the pin is used. Values can be
ANALOG, CLOCK, GROUND, POWER, or SIGNAL.

PIN VDD
  DIRECTION INOUT ;
  USE POWER ;
  SHAPE ABUTMENT ;
  PORT
    LAYER ME1 ;
    RECT 2.540 2.620 3.200 2.980 ;
    RECT 2.260 2.070 2.540 2.980 ;
    RECT 1.460 2.620 2.260 2.980 ;
    RECT 1.180 2.180 1.460 2.980 ;
  END
END VDD
ABUTMENT: Pins that stretch across the cell, joining the same pin on adjacent
cells without routing. (Power rails are a good example.)
RING: Pin on a large macro that forms a ring around the macro, allowing
connection to any point on the ring (used for power on big macros such as
RAMs).
FEEDTHRU: Pin with an irregular shape with a jog within the cell.
[Figure: Abutment, Ring, and Feedthrough pin shapes]
MACRO Pin Port Block
The PORT statement begins a section that specifies the location of the metal and via
geometries of the pin relative to the standard cell origin.
There can be more than one PORT block. All ports are electrically connected for that pin.
The LAYER statement specifies the layer of the metal or via geometry in the port. There
can be more than one LAYER or VIA statement in each PORT.
The RECT statements give the dimensions of the port. (The first two numbers are the x
and y coordinates of one corner, whereas the second two numbers are the x and y
coordinates of the corner diagonally across from the first one. The convention is lower
left, upper right for the two sets of coordinates.)

PIN A
  DIRECTION INPUT ;
  USE SIGNAL ;
  PORT
    LAYER ME1 ;
    RECT 0.000 0.000 1.000 1.000 ;
    RECT 1.000 0.000 2.000 2.000 ;
  END
END A
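To show how the RECT convention (lower-left corner, then upper-right corner) translates into geometry, the Python sketch below reads the two RECT lines of the example port and computes each rectangle's area and the overall bounding box of the pin. The parser is a deliberately simple assumption that looks only at lines starting with RECT; it is not a general LEF reader.

```python
def parse_rects(port_text):
    """Extract (x1, y1, x2, y2) tuples from the RECT lines of a PORT block."""
    rects = []
    for line in port_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "RECT":
            x1, y1, x2, y2 = (float(t) for t in tokens[1:5])
            rects.append((x1, y1, x2, y2))
    return rects


def bounding_box(rects):
    """Smallest rectangle that encloses all of the given rectangles."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


# Port geometry copied from the example pin.
port = """
LAYER ME1 ;
RECT 0.000 0.000 1.000 1.000 ;
RECT 1.000 0.000 2.000 2.000 ;
"""

rects = parse_rects(port)
areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in rects]
bbox = bounding_box(rects)
print(rects, areas, bbox)
```

The two rectangles share the edge at x = 1.000, so together they form one connected pin shape on ME1.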
DRC and reading a DRM
Review
Summary
Layout is the process of placing physical instances of a netlist onto a
chip.
This process is primarily used for full custom designs or library cells.
The layout for a transistor consists of a polysilicon gate, a diffusion layer,
and a substrate layer.
Metal contacts, vias, and substrate taps are needed as interconnects for
your transistors.
It is a good idea to lay your design out on paper using a stick diagram
before diving into a tool.
Stick diagrams ensure a continuous diffusion layer and consistent vertical
poly strips.
Design rules allow layout engineers to produce a high-yield design
without understanding the intricacies of the fabrication process.
The process engineer provides a document called the design rule manual,
which contains all the pertinent design rules to the layout engineer.
The manual contains minimum spacing requirements for all layers on a
layout.
Summary (continued)
LVS is a tool to check the functional correctness of a layout by
comparing the layout against the netlist for which it was designed.
The information from a completed layout is annotated into a LEF
library file.
The LEF file is used in automatic place and route tools, giving the tools
information about the routing layers for a certain technology process.
The technology LEF file contains design rules of all metal layers, whereas
the standard cell LEF file contains the locations of all internal pins and
routing inside the standard cells.
Testing Your Understanding
True or false
1. Laying out an entire chip manually is an easy process and is done
routinely in the industry.
2. The diffusion layer of an NMOS is made out of heavily doped p-type
material.
layout.
4. Design rules must be strictly followed in order for the design to have a
high yield.
5. The technology LEF file contains only the standard cell information
about a certain technology process.
Learning Activity
In this activity, you will
Match the following LEF file terms with the corresponding diagram in
the handout.
Present your results to the class.
15 minutes for activity
10 minutes for debriefing
Timing Libraries and Constraint Files
Module 3
[Figure: Verilog, Timing Library, and Constraints feed Logic Synthesis, which can produce Design 1, Design 2, Design 3, or Design 4]
Design1 is the smallest and Design4 is the biggest, but all four designs
perform the same logical function.
How were the designs made different?
The timing library and constraints made all the difference.
Timing library
Guides which technology to target, for example 130 nm or 65 nm.
Constraints
These define the rules based on which the design has to be made.
If the rules are written well, the result is a better and smaller design.
If the rules are written poorly, even with the best technology, the result is
the worst and biggest design.
The designers write behavioral Verilog code, which has the
same functionality as the digital circuit that is to be manufactured.
A synthesis tool is used to convert this behavioral code into
structural code implementing the same functionality.
Structural Verilog consists of instantiated gates.
But from where does the synthesis tool get these gates?
Module Objectives
In this module, you will be able to
Identify the syntax of a timing library and describe how the numbers in
the library are used for timing analysis
Create a constraint file based on timing specifications
Constraints
General-purpose and object-access constraints
Timing constraints
Environmental constraints
What Are Timing Libraries?
Every foundry has a list of gates with which it can build designs.
A list of such gates and cells is stored in a file generically called a
library.
Cells are library representations of gates. You use a cell from a library to
create a gate in your design.
One such file, used by a synthesis tool, is called a synthesis
technology library.
Different views (logical, physical, etc.) of the gates and cells are stored in
different files. Together, these views are called a technology library.
Other library files also exist and contain information needed by the
back-end tools (not discussed in this section).
format, which uses a .lib
extension.
A library file comprises not only a list of gates but also
Their functional/logical definitions
Power, energy, and timing characteristics
Their physical characteristics such as area and footprint
If this library is given as an input to a synthesis tool along with the
behavioral (RTL) code, it converts it into an appropriate structural
design.
Synthesis can replace cells with other cells of the same footprint
without affecting logic function.
[Figure: RTL, SDC, and Library feed Logic Synthesis, which produces the Synthesized Gates Netlist]
Library File
  Header
    General attributes
    Documentation attributes
    Unit attributes
    Operating conditions
    Threshold and default definitions
    Templates
    More attributes
    Voltage information
    Wire load definitions
  Body
    Cell name
    Physical description
    Pin information
    Power characteristics
    Timing characteristics
Header
General attributes
lookup, or calculated
Other attributes not discussed
Documentation attributes
revision: Revision number
date: Date created
comment: Any comments
Unit attributes
time_unit: nano, pico, etc.
voltage_unit: milli, micro, etc.
current_unit: milli, micro, etc.
pulling_resistance_unit: Ω, etc.
leakage_power_unit: watt, etc.
capacitive_load_unit: pico, femto, farad, etc.
Operating Conditions
Operating conditions are the conditions under which the chip will operate,
including process, temperature, and voltage
nom_process: 1, 2, etc.
operating_conditions
process: 1, 2, etc.
temperature: 100, 120, etc.
voltage: 1, 0.9, etc.
tree_type: balanced, etc.
10, 30, etc.
slew_lower_threshold_pct_fall: 90, 70, etc.
These indicate the points from where the slew should be calculated
[Figure: signal transition with slew measurement points marked at 30% and 70%]
Default definitions
Contains attributes such as default_fanout_load, default_max_transition, etc.
There are many more attributes that are not discussed
Power/energy template
Timing template, etc.
Template shows how these characteristics would be described in the library
Let’s look at an example to understand better:

lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition;
  variable_2 : total_output_net_capacitance;
  index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
  index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
Example Template
Shown below is an example of a timing template. (Templates for other
characteristics are similar and will not be discussed in this module.)

lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition; /* ex: 1, 2, 3, 4, 5, 6, 7 */
  variable_2 : total_output_net_capacitance; /* ex: 10, 20, 40, 80, 160, 320, 640 */
  index_1 ("100, 101, 102, 103, 104, 105, 106");
  index_2 ("100, 101, 102, 103, 104, 105, 106");
}

lu indicates that it is a lookup and not calculated.
variable_1 indicates the factor for row indices.
variable_2 indicates the factor for column indices.
Example: The delay for an input_net_transition of 2 ps and a
total_output_net_capacitance of 80 pF is found at row 2 and column 4. From the
table, we get this value to be 103 ps.
attributes (I/O pads), which are not discussed here.
Voltage information
Minimum, maximum, and other complementary MOS (CMOS) characteristics of the input and output voltages are described in this section.
Wire load definitions
In a digital circuit, not only the gates but also the wires have delays associated with them.
A wire delay may be small compared to a gate delay, but considering the amount of wiring in the latest chips, wire delay can account for as much as 50% of the delay.
Many wire load model (WLM) choices are available in a timing library, and they are chosen based on the size of a design.
Example:
wire_load("wire_load name") {
  resistance : 8.0e-8;
  capacitance : 1.2e-4;
  area : 0.7;
  slope : 66.667;
  fanout_length (200.0);
}
A custom wire load model (CWLM) is a user-generated model that can be used to more accurately estimate the net delays.
Physical description
cell_footprint: general name, e.g., and2, or2, etc.
area: area of the cell, e.g., 20.8, 30.4, etc.
Example:
cell (ADDX1) {
  cell_footprint : add;
  area : 80.000;
[Sidebar: Library File, Body: Cell name, Physical description, Pin information, Power characteristics, Timing characteristics]
Pin Information
Direction: Input/Output/Inout/Internal
Capacitance: Capacitance that is seen at this pin.
Output pins:
Function: Value based on the inputs
Example:
Function: (in1 in2) → AND gate
Function: (in1 | in2) → OR gate
Example:
pin(CI) {
  direction : input;
  capacitance : 0.004189;
}
pin(S) {
  direction : output;
  capacitance : 0.0;
  function : "(A ^ B ^ CI)";
Other characteristics of the pin include power and timing characteristics.
Note: Power is not discussed in this module.
6/16/08 BD03: Digital Physical Design 160
Timing Characteristics
The timing information is displayed for each output pin in relation to each input pin, in the form of a lookup table.
It is not calculated.
There are multiple lookup tables for each type of delay.
Rise delay
Rise transition
Fall delay, etc.
It is displayed exactly as we saw in the timing template earlier, but in a little more detail.
Example:
pin(Y) {
  direction : output;
  capacitance : 0.0;
  function : "(A B)";
  internal_power() { ... }
  timing() {
    related_pin : "A";
    cell_rise(delay_template_7x7) {
      index_1 ("0.04, 0.07, 0.1, 0.2, 0.5, 1.0, 2");
      index_2 ("0.006, 0.030, 0.078, 0.174, 0.366, 0.749, 1.523");
      values ( \
        "0.07, 0.09, 0.13, 0.20, 0.35, 0.64, 1.23", \
        "0.08, 0.10, 0.13, 0.21, 0.35, 0.65, 1.24", \
        "0.09, 0.11, 0.15, 0.22, 0.37, 0.66, 1.25", \
        "0.11, 0.13, 0.17, 0.25, 0.39, 0.68, 1.28", \
        "0.14, 0.17, 0.20, 0.28, 0.42, 0.72, 1.31", \
        "0.18, 0.21, 0.25, 0.33, 0.47, 0.76, 1.35", \
        "0.23, 0.26, 0.31, 0.39, 0.54, 0.83, 1.42");
    }
Depending on various values for these indexes, corresponding delay
follows, what would the cell_rise be if the input_net_transition was 0.07 and the total_output_net_capacitance was 0.030?
lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition;
  variable_2 : total_output_net_capacitance;
  index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
  index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
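The question above can be answered by reading the cell_rise table directly. The Python sketch below copies the index and value data from that example and looks up the delay. The function names (`lookup`, `_bracket`) are hypothetical, and the bilinear interpolation used for off-grid points is an assumption for illustration; the handout does not say how a tool treats values between index points.

```python
from bisect import bisect_right

# Data copied from the cell_rise(delay_template_7x7) example above.
INDEX_1 = [0.04, 0.07, 0.1, 0.2, 0.5, 1.0, 2.0]              # input_net_transition (rows)
INDEX_2 = [0.006, 0.030, 0.078, 0.174, 0.366, 0.749, 1.523]  # total_output_net_capacitance (columns)
VALUES = [
    [0.07, 0.09, 0.13, 0.20, 0.35, 0.64, 1.23],
    [0.08, 0.10, 0.13, 0.21, 0.35, 0.65, 1.24],
    [0.09, 0.11, 0.15, 0.22, 0.37, 0.66, 1.25],
    [0.11, 0.13, 0.17, 0.25, 0.39, 0.68, 1.28],
    [0.14, 0.17, 0.20, 0.28, 0.42, 0.72, 1.31],
    [0.18, 0.21, 0.25, 0.33, 0.47, 0.76, 1.35],
    [0.23, 0.26, 0.31, 0.39, 0.54, 0.83, 1.42],
]


def _bracket(axis, x):
    """Return (i, t): the lower grid index and the fractional position of x
    between axis[i] and axis[i + 1]. Values outside the grid reuse the first
    or last interval, so they are extrapolated linearly (an assumption)."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def lookup(transition, capacitance):
    """Bilinear interpolation of VALUES at (transition, capacitance)."""
    i, t = _bracket(INDEX_1, transition)
    j, u = _bracket(INDEX_2, capacitance)
    upper = VALUES[i][j] * (1 - u) + VALUES[i][j + 1] * u
    lower = VALUES[i + 1][j] * (1 - u) + VALUES[i + 1][j + 1] * u
    return upper * (1 - t) + lower * t


# Both inputs sit exactly on grid points: row 2, column 2 of the table.
print(lookup(0.07, 0.030))
```

Because 0.07 is the second entry of index_1 and 0.030 is the second entry of index_2, the lookup lands on the second row and second column of the values table, which is 0.10.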
One such file (the Liberty file), which is used by a synthesis tool, is called a synthesis technology library.
The library file is divided into two main parts:
Header: Contains all the attributes and terminology used in the library
Body: Contains characteristics of each cell that a foundry has for a specific technology
Synthesis tools use these files to generate structural Verilog files equivalent to the behavioral (RTL) Verilog files given as inputs.
Next, we will see what else is given as inputs to a synthesis tool.
[Figure: two DFFX1 cells and a BUFX1 cell]
cell (DFFX1) {
  cell_footprint : dffx1;
  area : 50.0;
  pin(D) {
    direction : input;
    timing() {
      related_pin : "CK";
      timing_type : setup_rising;
      rise_constraint(setup_template_3x3) {
        index_1 ("0.05, 1.4, 4.5");
        index_2 ("0.05, 1.4, 3.3");
        values ( \
          "0.156250, 0.070312, 0.113281", \
          "0.246094, 0.140625, 0.175781", \
          "0.203125, 0.093750, 0.128906");
      }
  pin(Q) {
    direction : output;
    timing() {
      related_pin : "CK";
      timing_type : rising_edge;
      timing_sense : non_unate;
      cell_rise(delay_template_7x7) {
        index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
        index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
        values ( \
          "0.291957, 0.437181, 0.550916, 0.843878, 1.248819, 1.788431, 2.305442", \
          "0.316264, 0.461499, 0.575227, 0.868187, 1.273127, 1.812741, 2.329752", \
          "0.388358, 0.533648, 0.647351, 0.940318, 1.345271, 1.884899, 2.401920", \
          "0.439033, 0.584292, 0.697982, 0.990937, 1.395897, 1.935540, 2.452571", \
          "0.462183, 0.607445, 0.721146, 1.014067, 1.419031, 1.958683, 2.475723", \
          "0.468653, 0.613990, 0.727660, 1.020554, 1.425521, 1.965184, 2.482228", \
          "0.460997, 0.606314, 0.719968, 1.012831, 1.417787, 1.957454, 2.474507");
      }
cell (BUFX1) {
  cell_footprint : buf;
  area : 13.0;
  pin(A) {
    direction : input;
  }
  pin(Y) {
    direction : output;
    function : "A";
    internal_power() { ... }
    timing() {
      related_pin : "A";
      timing_sense : positive_unate;
      cell_rise(delay_template_7x7) {
        index_1 ("0.05, 0.15, 0.6, 1.4, 2.3, 3.3, 4.5");
        index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
        values ( \
          "0.094400, 0.235579, 0.351869, 0.653282, 1.070188, 1.625921, 2.158454", \
          "0.116567, 0.257243, 0.373654, 0.675230, 1.092220, 1.647999, 2.180553", \
          "0.156644, 0.301067, 0.417546, 0.719020, 1.136089, 1.691941, 2.224538", \
          "0.165784, 0.318068, 0.434036, 0.735633, 1.152743, 1.708488, 2.241054", \
          "0.149625, 0.311618, 0.428220, 0.729893, 1.147035, 1.702969, 2.235440", \
          "0.117344, 0.289370, 0.407324, 0.710811, 1.128181, 1.684128, 2.216830", \
          "0.067751, 0.250660, 0.370401, 0.676924, 1.096166, 1.652295, 2.184962");
      }
lu_table_template(delay_template_7x7) {
  variable_1 : input_net_transition;
  variable_2 : total_output_net_capacitance;
  index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
  index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
lu_table_template(setup_template_3x3) {
  variable_1 : constrained_pin_transition;
  variable_2 : related_pin_transition;
  index_1 ("1000, 1001, 1002");
  index_2 ("1000, 1001, 1002");
}
e
Constraints
General-purpose and object-access constraints
Timing constraints
Environmental constraints
nc
d e
ca
6/16/08 BD03: Digital Physical Design 166
Constraints
The rules that are written are referred to as constraints.
Constraints are essential to meet design goals in terms of area, timing, and power to obtain the best possible implementation of a circuit.
Constraints allow designers to control various aspects of synthesis.
the most optimal result.
Defining Constraints
Every EDA tool has its own commands to define constraints for a design.
However, there is a common format, which is supported by almost all the EDA tools, to define the constraints.
This format is called the Synopsys Design Constraints (SDC) format.
The constraints are defined using special SDC commands.
SDC Format
The SDC commands are divided into these broad categories:
General-purpose commands
Object-access commands
Timing commands
Environmental commands
only these for this course.
General-Purpose Commands
Following are general-purpose commands:
expr: Used to create simple expressions
Syntax: expr arg1 arg2 arg3 … argn
Example: expr 0.1 + 0.2 + 0.1
The result is the sum of the three numbers, which is 0.4.
set: Used to define variables
Syntax: set variable_name value
Example: set design design1
The variable $design now contains the value design1. A list of values can also be defined as shown below (list items are separated by spaces, not commas):
set list {item1 item2 item3 item4 … itemn}
There are other general-purpose commands that are not discussed here.
Object-Access Commands
These commands are used to get the location of an object in the design.
The object can be a cell, a block, a port, a pin, or anything else in the design.
all_clocks: Returns a list of all clocks
all_inputs: Returns a list of all inputs within a clock domain
Syntax: all_inputs -clock <clock_name>
all_outputs: Returns a list of all outputs within a clock domain
Syntax: all_outputs -clock <clock_name>
get_cells: Searches for a cell with a particular naming pattern and returns its location if found
Syntax: get_cells pattern
get_clocks: Searches for a clock with a particular naming pattern and returns its location if found
Syntax: get_clocks pattern
get_nets: Searches for a net (equivalent to a wire in the Verilog language) with a particular naming pattern and returns its location if found
Syntax: get_nets pattern
get_pins: Searches for a pin (a port of a cell) with a particular naming pattern and returns its location if found
Syntax: get_pins pattern
get_ports: Searches for a port (an input or output of the design) with a particular naming pattern and returns its location if found
Syntax: get_ports pattern
Timing Constraints
To model a clock in a design, use the create_clock and the create_generated_clock SDC commands.
Let's look at a few definitions before we start modeling a clock.
Clock period is defined as the time difference between two consecutive rising or falling clock edges.
Duty cycle is defined as the ratio between the pulse duration (t) and the period (T) of a rectangular waveform.
Duty Cycle = t/T
[Figure: clock waveform showing rising edges, falling edges, pulse duration t, and clock period T]
create_clock
The create_clock command is used to model a clock waveform.
Syntax:
create_clock -period <period_value in nanoseconds> \
    -name <clock name> -waveform <edge list> source_objects
Example:
create_clock -name core_clock -period 10 \
    -waveform {4 10} [get_ports "clock"]
The clock waveform that would be modeled is as below:
[Figure: core_clock waveform with edges at 4, 10, 14, 20, 24, and 30 ns; pulse duration t = 6 ns; clock period T = 10 ns]
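The numbers in the create_clock example can be checked with a few lines of Python. This is only an arithmetic sketch of the definitions above; the function names `duty_cycle` and `edge_times` are hypothetical and are not SDC commands.

```python
def duty_cycle(period, rise_edge, fall_edge):
    """Duty cycle = pulse duration t divided by clock period T."""
    pulse_duration = (fall_edge - rise_edge) % period
    return pulse_duration / period


def edge_times(period, rise_edge, fall_edge, cycles):
    """List the rising and falling edge times for the first few cycles."""
    times = []
    for k in range(cycles):
        times.append(rise_edge + k * period)
        times.append(fall_edge + k * period)
    return times


# Values from the example: period 10 ns, rising edge at 4 ns, falling edge at 10 ns.
print(duty_cycle(10, 4, 10))     # 0.6, that is t = 6 ns out of T = 10 ns
print(edge_times(10, 4, 10, 3))  # [4, 10, 14, 20, 24, 30]
```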
create_generated_clock
This command is used to generate the model of a clock from an existing clock model, such as from a PLL or dedicated clocking block.
Example:
core_clock is the base clock defined in the design below.
clock2 is derived by multiplying the core_clock by 2.
[Figure: external clock port feeding PLL multiply-by-2 logic, with callouts marking the point where core_clock is defined and the point where clock2 is defined]
create_generated_clock (continued)
If the definition of the core_clock changes, it is automatically reflected in the generated clock model.
Some of the arguments that can be passed to it are
create_generated_clock
-name <clock_name>
-source <master_pin>
-divide_by <factor>
-multiply_by <factor>
-duty_cycle <percent>
-invert
-master_clock clock
source_objects
Example: create_generated_clock -name clock2
[Figure: core_clock and clock2 waveforms]
set_clock_transition
The create_clock command assumes an ideal clock with no rise and fall times.
To model realistic values of rise and fall time, the set_clock_transition command is used.
Some of the arguments to this command are
set_clock_transition -rise -fall <transition> <clock_list>
Example: set_clock_transition -rise 0.1 [get_clocks "clock_core"]
set_clock_transition -fall 0.1 [get_clocks "clock_core"]
The above two commands together model the clock clock_core to have a rise and fall time of 0.1 ns.
[Figure: ideal clock compared with a clock that has a clock_transition of 0.1 on each edge]
In SDC, this is achieved by the set_clock_uncertainty command
set_clock_uncertainty -from <from_clock>
-to <to_clock>
-setup
-hold
<uncertainty>
<object_list>
Example:
set_clock_uncertainty 1.0 [get_clocks "clock2"]
An uncertainty of 1.0 ns is set for clock2, as shown in the next slide.
set_clock_uncertainty (continued)
[Figure: clock2 waveform]
The uncertainty value means that the clock edge can occur up to 1 ns before or after the ideal clock edge.
Example:
set_clock_uncertainty -from core_clock -to clock2 3.0
This means that if there is a logical path that goes from the core_clock clock domain to the clock2 clock domain, the uncertainty for such paths is 3 ns.
when certain technology-specific cells or macros require it.
Care should be taken in using this command, because it renders not only the said path untimed, but also any other path that passes through it.
set_disable_timing (continued)
When the path from in2 to out2, shown by the red arc, is disabled, all the other paths going through this arc are also disabled.
That is, by stating this one arc, two paths are disabled:
1. D1->Q1->in2->out2->D3->Q3 and
2. D2->in1->out1->in2->out2->D3->Q3
But if the intention was to disable only one path, say number 1 above, it should have been stated in a different way or using a different command, which we will see later.
Various arguments that can be used with this command are
set_disable_timing -from <from_pin_name> -to <to_pin_name> <cell_pin_list>
Example: set_disable_timing -from in2 -to out2
[Figure: circuit with flops (D1/Q1, D2, D3/Q3) and pins in1, out1, in2, out2, and sel; the arc from in2 to out2 is shown in red]
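The effect described above, where disabling one arc removes every path through it, can be reproduced with a small path enumeration in Python. The graph below is an assumption reconstructed from the two paths listed in the text, and the helper names (`enumerate_paths`, `uses_arc`) are hypothetical.

```python
# Connectivity assumed from the two paths listed in the text.
EDGES = {
    "D1": ["Q1"], "Q1": ["in2"],
    "D2": ["in1"], "in1": ["out1"], "out1": ["in2"],
    "in2": ["out2"], "out2": ["D3"], "D3": ["Q3"], "Q3": [],
}


def enumerate_paths(graph, start, end, prefix=()):
    """Return every path from start to end as a tuple of node names."""
    prefix = prefix + (start,)
    if start == end:
        return [prefix]
    found = []
    for successor in graph.get(start, []):
        found.extend(enumerate_paths(graph, successor, end, prefix))
    return found


def uses_arc(path, arc):
    """True if the path traverses the (from, to) arc."""
    return arc in zip(path, path[1:])


all_paths = enumerate_paths(EDGES, "D1", "Q3") + enumerate_paths(EDGES, "D2", "Q3")
disabled = [p for p in all_paths if uses_arc(p, ("in2", "out2"))]

for path in disabled:
    print("->".join(path))
```

Both enumerated paths traverse the in2-to-out2 arc, so both appear in `disabled`. This is why disabling a single arc is a blunt tool when only one of the paths should be left untimed.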
False path: A path that has no functional purpose, or a path that does not need to be timing constrained (for example, a path between two clock domains).
When a path is set as a false path, the synthesis tool only maps it to technology-specific gates.
The tool does not optimize or improve the timing of this path even if it does not meet timing.
Reasons for false path:
Path is never exercised during circuit operation
Path is only possible in special operation mode (test mode, etc.)
This command is different from set_disable_timing in the sense that
Only the paths specified are set as false.
Any other paths passing through a false path, but not sharing the same exact start-end pair, will not be affected.
set_false_path (continued)
Some of the arguments that can be passed to this command are
set_false_path -from <from_list> -to <to_list> -through <through_list>
The following command sets the path from F1 to F3 as false:
set_false_path -from [get_cells "F1"] -to [get_cells "F3"]
But if the intention is to set all the paths that pass through the red arc as false, this is how it can be done:
set_false_path -through [get_pins "OR1/in2"]
[Figure: flops F1, F2, and F3 with cells M1, OR1, and AND1; pins in1, out1, in2, out2, and sel; the arc through OR1/in2 is shown in red]
Input delay is the time it would take for the data to arrive at the input port of the design.
Output delay is the margin that the data going out of the output port should have.
It can be viewed as the input delay for the input port of another design that is connected to this output port.
The figure on the following slide illustrates this better.
The corresponding commands in the SDC format to model these delays are
Input delay: set_input_delay
Output delay: set_output_delay
[Figure: input delay and output delay shown against clock periods, with data arrival timing and data required timing]
set_input_delay
-clock <clock_name>
-max
-min
-add_delay
<delay_value>
<port_pin_list>
If an input delay has already been specified for a pin, then the -add_delay argument keeps the existing delay and adds the new delay as an additional constraint instead of overwriting it.
Example: set_input_delay -max 3.0 [get_ports "in1"] -clock [get_clocks "core_clock"]
This command assumes an input delay of 3.0 ns for the data coming in at the input port in1.
set_output_delay
Some of the arguments to set_output_delay are
set_output_delay
-clock <clock_name>
-max
-min
-add_delay
<delay_value>
<port_pin_list>
If an output delay has already been specified for a pin, then the -add_delay argument keeps the existing delay and adds the new delay as an additional constraint instead of overwriting it.
Example: set_output_delay -max 3.0 [get_ports "out1"] -clock [get_clocks "core_clock"]
This command assumes an output delay of 3.0 ns for the data going out of the output port out1.
[Figure: input delay and output delay shown against clock periods, with data arrival timing and data required timing]
Same is true on the output side, the case being that the output load that is being driven must be modeled.
These modeling techniques fall under environmental modeling commands and will be covered after this section.
set_max_delay
max_delay is the maximum delay that a combinational path from the input port to the output port of the design should meet.
[Figure: path from In through Combinational 1 to Out, labeled Max Delay]
set_max_delay
-from <from_list>
-to <to_list>
-through <through_list>
<delay_value>
Example: set_max_delay 5.0 -from [get_ports "IN"] -through [get_cells "combinational1"] -to [get_ports "OUT"]
This command sets the maximum delay allowed through the combinational path shown in the previous figure to 5.0 ns.
set_multicycle_path
The figure below helps illustrate a multicycle path.
[Figure: two flops (D1, Q1) and a clock waveform marked "Data captured here; launch data from D1", "Data not captured here", and "Data captured here"]
The data in this example is captured every other clock cycle.
Specify in the SDC file that this particular path has a time period of two times the clock period.
This can be achieved by using the set_multicycle_path command.
-end
-from <from_list>
-to <to_list>
-through <through_list>
<path_multiplier>
Example: set_multicycle_path -from [get_pins "D1"] -to [get_pins "Q1"] 2
This command sets the time period for this particular path as twice the clock period of that clock domain.
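The effect of the path multiplier can be seen with simple arithmetic. The Python sketch below is a simplified model that ignores setup time, clock uncertainty, and clock skew; the function names and the 10 ns period are illustrative assumptions.

```python
def allowed_path_time(clock_period, path_multiplier=1):
    """Time available to the path: the clock period times the multiplier."""
    return clock_period * path_multiplier


def meets_timing(path_delay, clock_period, path_multiplier=1):
    """True if the path delay fits in the allowed time (simplified check)."""
    return path_delay <= allowed_path_time(clock_period, path_multiplier)


# A 15 ns path fails a single 10 ns cycle but passes as a 2-cycle path.
print(meets_timing(15, 10))     # False
print(meets_timing(15, 10, 2))  # True
```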
e
Constraints
General-purpose and object-access constraints
Timing constraints
n
Environmental constraints
c
d e
ca
6/16/08 BD03: Digital Physical Design 196
set_driving_cell
Remember the virtual logic outside the design when modeling I/O delays.
The set_driving_cell command specifies the cell type that is driving a
[Figure: virtual logic outside the design, with input delay and output delay shown against clock periods, data arrival timing, and data required timing]
set_driving_cell (continued)
Some of the arguments to this command are
set_driving_cell
-lib_cell lib_cell_name
-library <lib_name>
-pin <pin_name>
-clock <clock_name>
<port_list>
Example: set_driving_cell -lib_cell AND2X -library xyz_130nm -pin Y -clock [get_clocks "clk"] [get_ports "input1"]
This command indicates that the output pin Y of an AND2X gate from the library xyz_130nm is connected to the input1 port in the clk clock domain.
set_input_transition
This command models the transition of the waveform at the input port.
Rise time: The time it takes for the waveform to rise from 5% to 95% of its final value.
Fall time: The time it takes for the waveform to fall from 95% to 5% of
[Figure: input transition showing rise time and fall time]
set_input_transition (continued)
Some of the arguments to this command are
set_input_transition
-rise
-fall
-clock <clock_name>
<transition>
<port_list>
Example: set_input_transition -rise 0.1 -clock [get_clocks "clk"] [get_ports "input1"]
set_input_transition -fall 0.2 -clock [get_clocks "clk"] [get_ports "input1"]
These commands model the waveform of input1 with a rise time of 0.1 ns and a fall time of 0.2 ns.
At the output, one is not concerned about the cell that may be driven or the transition that it may receive.
View this as the input port modeling that another block, connected to this output port, would perform.
Some of the arguments to this command are
set_load -min
-max
<value>
<objects>
Example: set_load -max 20 [get_ports "output1"]
This command sets a maximum load of 20 fF on the output1 port.
set_case_analysis
In a large design, all of the logic in the design may not be active at the same time.
Logic blocks may be activated based on the value of certain inputs.
This is done sometimes to save power.
Some designs themselves are configurable to perform different tasks depending on certain input values.
[Figure: EN[1:0] selects among Block1 (EN = 00), Block2 (EN = 01), Block3 (EN = 10), and Block4 (EN = 11)]
To get accurate timing and power numbers of the entire design, it should be timed with one block enabled at a time, because that is essentially how the design would actually behave.
This can be achieved by setting the enable pin to a constant value for timing with the use of the set_case_analysis command.
The arguments that can be passed to it are
set_case_analysis
<value (0 or 1)>
<port_or_pin_list>
Example: set_case_analysis 0 [get_ports "EN[0]"]
set_case_analysis 0 [get_ports "EN[1]"]
These commands set the value of the EN pins to binary 00 during timing.
set_max_fanout
Fanout indicates the number of cells being driven by one cell.
If this number is very big, the size of the driving cell is increased by the synthesis tool to be able to drive this large load.
A bigger cell means more area, and sometimes it is desirable to restrict the size of cells.
This can be achieved by specifying the maximum number of loads a cell can drive, and this is exactly what this command models.
The arguments to this command are
set_max_fanout <value> object_list
Example: set_max_fanout 16 TOP_LEVEL
This command sets a limit of 16 on the number of loads to all the cells in the design TOP_LEVEL.
[Figure: fanout of 16]
ce
Constraints guide the synthesis tool and tell it how to handle different
The libraries provide the synthesis tool with the building blocks for the
design itself.
en
a d
6/16/08
c BD03: Digital Physical Design 205
ce
B. Yes, it is possible and will result in a better design.
C. Yes, it is possible, but will result in a worse design.
en
2. Write the SDC command to model a clock that has a frequency of 100
d
3. Write the SDC command to model rise and fall times of 100 ps for the
above mentioned clock.
ca
6/16/08 BD03: Digital Physical Design 206
Testing Your Understanding (continued)
4. Which of the following constitutes uncertainty?
e
A. Clock skew
B. Clock jitter
c
C. Wire load assumptions
n
D. Margin
E. All of the above
d e
ca
6/16/08 BD03: Digital Physical Design 207
Learning Activity
In this activity, you will
e
Interpret the specifications for a given design
c
Present your results to the class
n
e
20 minutes for activity
10 minutes for debriefing
a d
6/16/08
c BD03: Digital Physical Design 208
Synthesis
Module 4
12
Paper
and Pencil
ce
n
Months
d e
Schematic
Capture
a
.1 Logic Synthesis
6/16/08
c
Logic synthesis has dramatically reduced the ASIC design cycle. You will
learn why in this module.
e
Explain the optimization stages of the synthesis flow
nc
d e
ca
6/16/08 BD03: Digital Physical Design 211
Discussion Questions
What is logic synthesis?
e
What are the inputs and outputs to and from logic synthesis?
nc
d e
ca
6/16/08 BD03: Digital Physical Design 212
Topics in This Module
Logic synthesis
Introduction
Reading HDL source files
Elaborating design
ce
n
Technology-independent (generic) mapping
Technology transformation
d
Timing report analysis
e
Technology-dependent optimizations
a
Running logic synthesis
c
Physical synthesis
Fundamentals
Basic operation and flow
optimizing, and mapping RTL
standard cell library
Example: To determine the feasibility of the design, we need to synthesize the RTL code into gates, and measure timing, power, and area.
[Figure: design flow diagram; labels include Designer, Microarchitecture, RTL, Logic Synthesis, Synthesized Netlist (Gates), Physical Synthesis, Placement, Scan Reorder, Static Timing Analysis, Delay Calculation, Signal Integrity, Extraction, Design Optimization Pre-CTS, CTS, Design Optimization Post-CTS, Route, Design Optimization Post-Route, Detail Routed Design, GDSII Layout, and Design Verification]
Logic Synthesis: Input and Output, Format
Input
RTL in the Verilog® language or other HDL
Constraints in Synopsys Design Constraints (SDC) format
Timing libraries in Liberty (.lib) format
Output
Gate-level netlist in the Verilog language or other HDL
[Figure: RTL, SDC, and Library feed Logic Synthesis, which produces Synthesized Gates]
Minimize power
In terms of switching activity in individual gates, deactivated circuit blocks
In terms of leakage power
Maximize performance
In terms of maximum clock frequency of synchronous systems, throughput for asynchronous systems
Quickly produce accurate functional models
Gate-level model is functionally equivalent to RTL model
Gate-level model is produced in less time than is required by an experienced logic designer to create the same model
Produce predictable and accurate results
Timing, area, and power consumption calculations should correspond with actual values measured on the physical device once manufactured.
e
Read RTL source files Parse source code, check read_hdl
syntax
c
Elaboration Build data structures and elaborate
registers
Technology-independent
mapping
n
Optimize data structures
e
synthesize –to_generic
d
(mapping) gates
a
Technology-dependent Use optimized gates in the retime (optional)
optimization technology library
c
Scan chain insertion Build the scan chain synthesize –to_map
–incremental
Timing report analysis Create timing reports report_timing
If the source files pass the lint check, they are loaded into memory for use in the next phase.
Example
rc:/> read_hdl -v2001 my_design.v
Reading Verilog file 'my_design.v'    [loads my_design.v]
assign #1875 write_clk_int = ~clk;
|
Warning : Ignoring delay specifier. [VLOGPT-35]
: in file '/my_design.v' on line 373, column 14.
: A delay specifier, either in an assignment or as a separate statement, is not synthesizable.
assign #1875 postamble_clk_int = ~clk;
Linting process has detected a problem in my_design.v.
Details of the problem are listed. In this case, my_design.v includes a Verilog construct that is not synthesizable (Verilog # construct).
Elaborating Design
Builds data structures and infers registers in the design
Function expansion (e.g., functions are in-line expanded)
Constant propagation
Detect operands driven by constant values and pre-compute the output
Original code
a = 0;
b = a + 1;
c = 2 * a;
Optimized code
b = 1;
c = 0;
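The constant propagation step above can be sketched in Python. This is an illustration of the idea only, not how any synthesis tool implements it; the statement encoding and the function name `propagate_constants` are assumptions made for the example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def propagate_constants(statements):
    """Fold expressions whose operands are known constants.

    Each statement is (target, expr), where expr is an int constant or a
    tuple (op, left, right) whose operands are ints or variable names.
    """
    known = {}   # variables currently known to hold a constant
    folded = []
    for target, expr in statements:
        if isinstance(expr, tuple):
            op, left, right = expr
            left = known.get(left, left)     # substitute known constants
            right = known.get(right, right)
            if isinstance(left, int) and isinstance(right, int):
                expr = OPS[op](left, right)  # pre-compute the output
            else:
                expr = (op, left, right)
        if isinstance(expr, int):
            known[target] = expr
        else:
            known.pop(target, None)          # target is no longer constant
        folded.append((target, expr))
    return folded


# The original code from the slide: a = 0; b = a + 1; c = 2 * a;
program = [("a", 0), ("b", ("+", "a", 1)), ("c", ("*", 2, "a"))]
print(propagate_constants(program))  # [('a', 0), ('b', 1), ('c', 0)]
```

The slide's optimized code no longer lists a = 0. In this sketch that assignment is still present after folding; removing it is a separate step, because a is no longer referenced once b and c are constants.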
would have iterated. This allows for greatest possible optimizations later.
Original loop
z[a] = x[a] + y[2-a];
Unrolled loop
z[2] = x[2] + y[0]
z[1] = x[1] + y[1]
z[0] = x[0] + y[2]
is never referenced elsewhere. Such operations are detected and removed.
Original code
a = x
b = a + 1
c = 2 * a
Optimized code
b = x + 1
c = 2 * x
Dead code elimination removed
a = x
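The transformation above combines two steps: replacing a with x wherever a is used, and then dropping the assignment to a because nothing references it. A minimal Python sketch follows, assuming statements are stored as token lists and that b and c are the only values needed afterward; the function names are hypothetical.

```python
def copy_propagate(statements):
    """Replace uses of a variable that is a plain copy of another variable."""
    copies = {}
    result = []
    for target, tokens in statements:
        tokens = [copies.get(tok, tok) for tok in tokens]
        # Redefining target invalidates copies to or from it.
        copies = {k: v for k, v in copies.items() if k != target and v != target}
        if len(tokens) == 1 and tokens[0].isidentifier():
            copies[target] = tokens[0]
        result.append((target, tokens))
    return result


def eliminate_dead_code(statements, live_out):
    """Drop assignments whose target is never referenced afterward."""
    live = set(live_out)
    kept = []
    for target, tokens in reversed(statements):
        if target in live:
            kept.append((target, tokens))
            live.discard(target)
            live.update(tok for tok in tokens if tok.isidentifier())
    kept.reverse()
    return kept


# Original code from the slide: a = x; b = a + 1; c = 2 * a
program = [("a", ["x"]), ("b", ["a", "+", "1"]), ("c", ["2", "*", "a"])]
optimized = eliminate_dead_code(copy_propagate(program), live_out={"b", "c"})
for target, tokens in optimized:
    print(target, "=", " ".join(tokens))
```

Running the sketch prints b = x + 1 and c = 2 * x, matching the optimized code on the slide; the assignment a = x is removed because a is not live after propagation.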
elaborate command
rc:/> elaborate my_design    [builds my_design]
Elaborating block my_design from file 'my_design.v'.    [Beginning of elaboration section]
Warning : Removing unused register. [CDFG-508]
: Removing unused register 'doing_wr_r' in module 'my_design' in file 'my_design.v' on line 155.
Info : Unused module input port. [CDFG-500]
: Input port 'p_clk' is not used in module 'my_design' in file 'my_design.v' on line 90.
Done elaborating 'my_design'.    [End of elaboration section]
implementation.
Technology-independent mapping and optimization techniques:
Carry-save arithmetic optimization
Logic pruning
Resource sharing
Speculation
Implementation selection
Arithmetic optimization
Common sub-expression sharing
Logic speculation
The carry logic for the intermediate sums is saved until the very end, thus saving area and possibly timing.
[Figure: two adder structures that sum inputs a, b, c, d, e, and f into z; one is labeled Carry-propagate]
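The idea of saving the carries until the very end can be shown with integer bit operations in Python. This is a behavioral sketch of carry-save addition using 3:2 compressors, not a description of how a synthesis tool builds the hardware; the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word.

    No carry ripples between bit positions here; the carries are kept
    in a separate word, shifted left by one bit.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def add_many(operands):
    """Sum operands with carry-save stages and one final ordinary add."""
    terms = list(operands)
    while len(terms) > 2:
        s, c = carry_save_add(terms.pop(), terms.pop(), terms.pop())
        terms.extend([s, c])
    return sum(terms)  # the single carry-propagate addition at the end


print(carry_save_add(3, 5, 6))       # (0, 14), and 0 + 14 == 3 + 5 + 6
print(add_many([1, 2, 3, 4, 5, 6]))  # 21
```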
Pruning Logic Driving Unused Pins
By default, logic that drives unused (unloaded) hierarchical pins is optimized away.
In the example below, instances in red are deleted because they do not transitively drive an output port.
[Figure: circuit with inputs in1, in2, in3, and in4, output out1, flops, and multiplexers; the deleted instances are marked]
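The pruning rule above amounts to a reachability check that walks backward from the output ports. The Python sketch below uses a small hypothetical netlist (the instance names are invented for illustration and do not come from the slide figure).

```python
def instances_to_keep(fanin, outputs):
    """Return every node that transitively drives one of the output ports.

    fanin maps each node to the list of nodes that drive it.
    """
    keep = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(fanin.get(node, []))
    return keep


# Hypothetical netlist: mux1 drives out1; reg_a and reg_b drive nothing useful.
FANIN = {
    "out1": ["mux1"],
    "mux1": ["in3", "in4"],
    "reg_a": ["in1"],
    "reg_b": ["reg_a", "in2"],
}

kept = instances_to_keep(FANIN, ["out1"])
pruned = sorted(set(FANIN) - kept)
print(pruned)  # ['reg_a', 'reg_b']
```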
Resource Sharing
A resource is any computational element, such as an add, shift, or "if/then" operation.
Each type of operator in the RTL description requires a unique resource type.
For instance, the + operator requires an adder and the > operator requires a comparator.
Maximum number of resources required for each operator type is the number of times an operator is used in the RTL description.
Resources can be reduced, thus saving area, using the following techniques:
resource type. For instance, + and - operators can be mapped to an add-subtract unit.
Operators in different clock cycles can share the same resource. This is determined by analyzing if there are any data flow or control flow conflicts (discussed later).
Given the following HDL description:
if (select)
  sum <= A + B;
else
  sum <= C + D;
One possible implementation:
[Figure: two adders compute A + B and C + D; a MUX controlled by select picks sum]
Another, more efficient implementation:
[Figure: two MUXes controlled by select pick between A and C and between B and D; one adder produces sum]
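The two implementations of the if/else example can be compared behaviorally in Python. This sketch only checks that the one-adder version produces the same result as the two-adder version; the function names are hypothetical.

```python
from itertools import product


def two_adders(select, a, b, c, d):
    """Both sums are computed, then a MUX picks one (two adders)."""
    sum_ab = a + b
    sum_cd = c + d
    return sum_ab if select else sum_cd


def shared_adder(select, a, b, c, d):
    """MUXes pick the operands first, then one adder is used."""
    left = a if select else c
    right = b if select else d
    return left + right


# Exhaustive check over a small range of operand values.
equivalent = all(
    two_adders(s, a, b, c, d) == shared_adder(s, a, b, c, d)
    for s, a, b, c, d in product((0, 1), range(4), range(4), range(4), range(4))
)
print(equivalent)  # True
```

The shared version trades one adder for an extra MUX, which usually saves area because a MUX is smaller than an adder.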
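if (Q = '0')
  x = a + b;
else
  y = c + d;
[Figure: two implementations of the code above. Speculation: adders compute A + B and C + D, and a MUX controlled by Q selects between X and Y. Resource Sharing: MUXes controlled by Q select the operands, and a single adder is used.]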
Cadence Encounter® RTL Compiler (RC) has such a library known as ChipWare.
ChipWare (CW) library includes
Common combinational and sequential components
Arithmetic components (adders, subtractors, multipliers)
in RTL files it reads and automatically maps those operators to CW components, if available.
CW components often have multiple architectural implementations that allow logic synthesis to pick one according to design need.
[Figure: an operator in the RTL file (Z <= X + Y) becomes add_op, which maps to ChipWare Library components (ADD_SUB, ADD, ALU) with implementations such as ripple, CLA, and proprietary]
Design constraints determine the appropriate ChipWare component.
[Figure: for Z <= A*B + C, the HDL operator + can map to implementations ranging from fastest to smallest: Brent-Kung, Carry Look-Forward, Carry Look-Ahead, Ripple Carry]
Arithmetic Optimization
SUM <= A + B + C + D
[Figure: three adder arrangements for SUM. Initial Order. Optimized For Speed (all inputs have equal delay). Optimized For Speed (input A is late arriving).]
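The benefit of reordering the additions can be estimated by computing when SUM becomes valid for each adder arrangement. The Python sketch below assumes a unit delay per adder and assumes the three structures are a serial chain, a balanced tree, and a chain with the late input added last; these structures and the arrival times are illustrative assumptions, not values from the slide.

```python
ADDER_DELAY = 1.0  # assumed delay of one adder, in arbitrary time units


def arrival(expr, input_arrival):
    """Arrival time at the output of a nested (left, right) adder tree."""
    if isinstance(expr, str):
        return input_arrival[expr]
    left, right = expr
    return max(arrival(left, input_arrival), arrival(right, input_arrival)) + ADDER_DELAY


CHAIN = ((("A", "B"), "C"), "D")        # ((A + B) + C) + D
BALANCED = (("A", "B"), ("C", "D"))     # (A + B) + (C + D)
LATE_A_LAST = ((("B", "C"), "D"), "A")  # ((B + C) + D) + A

equal = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
late_a = {"A": 2.5, "B": 0.0, "C": 0.0, "D": 0.0}

print(arrival(CHAIN, equal), arrival(BALANCED, equal))  # 3.0 2.0
print(arrival(CHAIN, late_a), arrival(BALANCED, late_a), arrival(LATE_A_LAST, late_a))  # 5.5 4.5 3.5
```

With equal input arrival times the balanced tree is fastest. When A arrives late, adding A in the last stage hides most of its lateness behind the other additions.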
SUM1 <= A + B + C
SUM2 <= A + B + D
SUM3 <= B + A + E
The "A+B" sub-expression could be shared, thus saving two adders in the process.
The order within the sub-expressions is not important, but the position must be the same.
[Figure: Sharing of Sub-Expressions. Before sharing, each sum uses its own pair of adders; after sharing, one A + B adder feeds the adders for C, D, and E.]
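The adder count in the example above can be reproduced in Python. The sketch assumes that only the leading pair of operands in each sum is a candidate for sharing (the "same position" rule) and that operand order within the pair does not matter; the function names are hypothetical.

```python
SUMS = {
    "SUM1": ["A", "B", "C"],
    "SUM2": ["A", "B", "D"],
    "SUM3": ["B", "A", "E"],
}


def adders_without_sharing(sums):
    """An n-operand sum needs n - 1 two-input adders."""
    return sum(len(operands) - 1 for operands in sums.values())


def adders_with_sharing(sums):
    """Share the adder for the leading operand pair across all sums.

    frozenset makes A + B and B + A count as the same sub-expression.
    Assumes every sum has at least two operands.
    """
    leading_pairs = {frozenset(operands[:2]) for operands in sums.values()}
    remaining = sum(len(operands) - 2 for operands in sums.values())
    return len(leading_pairs) + remaining


print(adders_without_sharing(SUMS))  # 6
print(adders_with_sharing(SUMS))     # 4
```

The difference of two adders matches the saving stated in the text.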
Multiplexor optimization
Carry-save arithmetic optimization
You can run this stage separately by using the following command:
synthesize -to_generic -effort <effort_level>
Log Entries for Technology Independent Mapping
rc:/> synthesize -to_generic    [Starts technology-independent optimization process]
Deleting 2 sequential instances. They do not transitively drive any primary outputs:    [Logic pruning]
vpb/vpo/luma_sel_a1_reg[0], vpb/vpo/luma_sel_reg[0] (floating root)
Info : An implementation was inferred. [CWD-19]    [Implementation selection]
: The implementation '/hdl_libraries/GB/components/increment/implementations/very_fast' was inferred through the binding 'b1' for the call to synthetic operator 'INCREMENT_CI_OP'.
Optimizing muxes in design 'my_design'    [Mux optimization]
Synthesis succeeded.    [End of technology-independent optimization]
Technology-independent (generic) mapping
Technology transformation
Timing report analysis
Technology-dependent optimizations
Running logic synthesis
Physical synthesis
Fundamentals
Basic operation and flow
implement the circuit.
Technology mapping is normally done after technology-independent
optimization.
Why technology mapping?
Straight implementation may not be good. For example, implementing F = abcdef as a
single six-input AND gate causes a long delay.
Gates in the library are pre-designed; they are usually optimized in terms
of area, delay, power, etc.
Fastest gates along the critical path, area-efficient gates (combination)
off the critical path.
Global mapping
are derived from the fastest arrival time.
Optimizes for area, timing, power, and maps the design while aiming for
the target clock frequency
Remapping
Evaluates every cell in the design and resizes as needed to improve area
and power consumption
Incremental optimization
Runs Design Rule Checks (DRCs), timing and area cleanup, and critical
region resynthesis (CRR) for timing optimization
on the effort level you set. The result of this stage is the target for each cost group.
synthesize -to_mapped -effort <effort_level>
[Callout: Starts technology-dependent optimization process]
Mapping my_design to gates.
[Callout: Technology mapping]
Mapping 'my_design'...
Preparing the circuit
[Callout: Target setting]
Structuring (delay-based) logic partition in alu_32...
Performing redundancy-removal...
Performing bdd-opto...
Performing redundancy-removal...
Done structuring (delay-based) logic partition in alu_32
[Callout: End of target setting]
meet the target timing.
synthesize -to_mapped -effort <effort_level>
[Callout: Indicates the beginning of global mapping]
Restructuring (delay-based) cb_part_4...
Done restructuring (delay-based) cb_part_4
Restructuring (delay-based) cb_oseq_3...
Done restructuring (delay-based) cb_oseq_3
Optimizing component cb_oseq_3...
Restructuring (delay-based) cb_part...
Done restructuring (delay-based) cb_part
Optimizing component cb_part...
             Total    Total Worst
Operation    Area     Slacks        Worst Path
-------------------------------------------------------------------------------
global_map   721782   -308          VIT_ACS10/NEW_reg[5]/CP --> VIT_ACS26/NEW_reg[1]/D
fine_map     514143   -372          VIT_ACS10/NEW_reg[6]/CP --> VIT_ACS8/NEW_reg[2]/D
area_map     512565   -344          VIT_ACS23/NEW_reg[4]/CP --> VIT_ACS31/NEW_reg[7]/D
area_map     498515   -345          VIT_ACS1/NEW_reg[5]/CP --> CS4/SELECT_REG_reg/D
Done mapping dtmf_chip
[Callout: Indicates the beginning of remapping]
[Callout: Indicates the beginning of incremental synthesis]
             Total    Total     Total      - - - - DRC Totals - - -
             Total    Worst     Neg        Max     Max    Max
Operation    Area     Slacks    Slack      Trans   Cap    Fanout
-------------------------------------------------------------------
init_delay   498515   -345      -124671    414     18     229
  Path: VIT_ACS1/NEW_reg[5]/CP --> VIT_ACS4/THREE_SELECT_REG_reg/D
incr_delay   502638   -301      -114125    129     69     276
  Path: VIT_ACS16/NEW_reg[2]/CP --> VIT_ACS2/NEW_reg[6]/D
incr_delay   511982   -267      -100144    0       19     614
  Path: VIT_ACS19/NEW_reg[6]/CP --> VIT_ACS13/NEW_reg[1]/D
incr_delay   515304   -221      -91064     0       34     614
  Path: VIT_ACS30/NEW_reg[2]/CP --> VIT_ACS10/NEW_reg[0]/D
engine, and the number of times that the routine has been run to improve
the design goals.
tricks must be small.
DRC fixing is done at the end of each pass.
Trick        Calls    Accepts   Attempts    Time
-------------------------------------------------------
crr_rsyn       389  (     215 /      300 )  79917
crr_glob        25  (     198 /      215 )   5324
crit_upsz     4746  (    2047 /     2117 )  31691
fopt           358  (       0 /        0 )     23
crit_dnsz      428  (      23 /       25 )   4970
dup            347  (       1 /        1 )    250
fopt          1076  (     261 /      336 )  22515
setup_dn       398  (      11 /       14 )    324
exp             25  (      23 /       64 )   3214
system) is implemented by one or
more logic gates in a pre-designed
set of gates (called technology
library or cell library).
Advantage: Gates in the cell
library have a highly optimized,
pre-defined path to silicon, so that
the area and delay parameters are
known and accurate.
Register re-timing
Controlling Boundary Optimization
Examines input and output pin characteristics of a sub-design to try and optimize a
mapped netlist
Removes any gate that drives output ports that are not connected outside a
design
Propagates constant values across hierarchical boundaries and eliminates
unnecessary logic.
a being 0, the blocks L1 and L2
are equivalent and therefore
optimized.
[Figure: constant a=0 crossing a hierarchical boundary into blocks L1 and L2, clocked by clk]
Retiming
Retiming optimizes the register locations in the design to improve the results without
changing the combinational logic or latency through the chip or block. Use the
following attributes to control retiming on the design and sub-designs:
[Figure: retime -min_delay. Required Clock: 5ns. Register stages of 6ns and 4ns give WNS: -1 ns; flops are repositioned and combined, and the stage is labeled 5ns.]
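The numbers in the retiming figure can be reproduced with a short Python sketch. It is an idealized model, not the RC retiming engine: it assumes the combinational delay can be split at any point, so moving the registers simply evens out the stage delays. The function names are hypothetical.

```python
def worst_negative_slack(stage_delays_ns, clock_period_ns):
    """Smallest slack over all register-to-register stages (negative means a violation)."""
    return min(clock_period_ns - delay for delay in stage_delays_ns)


def ideal_retime(stage_delays_ns):
    """Idealized min-delay retiming: keep the stage count, spread total delay evenly.

    Assumes logic is infinitely divisible; real gates limit where registers can move.
    """
    total = sum(stage_delays_ns)
    count = len(stage_delays_ns)
    return [total / count] * count


if __name__ == "__main__":
    before = [6.0, 4.0]           # stage delays from the figure, in ns
    after = ideal_retime(before)  # [5.0, 5.0]
    print(worst_negative_slack(before, 5.0), worst_negative_slack(after, 5.0))
```

The total delay and the number of register stages are unchanged, which matches the statement that retiming does not change latency.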
Topics in This Module
Logic synthesis
Introduction
Reading HDL source files
Elaborating design
Technology-independent (generic) mapping
Technology transformation
Timing report analysis
Technology-dependent optimizations
Running logic synthesis
Physical synthesis
Fundamentals
Basic operation and flow
Test Synthesis
Manufacturing defects in ASICs are detected using automated test
equipment (ATE), which sends special bit patterns known as test
vectors into the inputs of the ASIC and compares the output to
expected values. Any difference could mean the ASIC is not
functioning properly.
Improve testability by making every register in the design look like a
"virtual I/O."
Allows every flip-flop to be independently controlled and observed.
Allows every flip-flop to act like a combinational logic input.
Allows every flip-flop to act like a combinational logic output.
Test Synthesis (continued)
Test synthesis is the modification of a chip design to make both the
chip and the PC board system containing it more testable.
Coupled with this testability is the automatic test-pattern generation
(ATPG) of test vectors. Design for test (DFT) lets you modify a design
to make a circuit more testable. Test synthesis tools can assist in both
places.
The use of test synthesis for DFT techniques and ATPG reduces from
months to days the time to generate manufacturing test vectors.
Use the DFT features of RTL Compiler to improve your ability to
control and observe internal signal nodes. After RTL and logic
synthesis, test synthesis can perform full or partial internal-scan cell
insertion and boundary scan. An ASIC vendor often implements
special cells in the ASIC library to handle these tasks.
Internal scan replaces latches and flip-flops with their scan-equivalent
latches and flip-flops. Each scan cell has a scan-data input (SDI), a
scan-data output (SDO), and a test-enable (TE) input. The tool
connects groups of these cells in chains of equal or similar length.
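Grouping scan cells into chains of equal or similar length can be sketched as follows. This is a minimal illustration using round-robin assignment; it is not how RC orders or partitions scan cells (real tools also consider clock domains and placement), and the register names are hypothetical.

```python
def balance_scan_chains(scan_cells, num_chains):
    """Distribute scan cells round-robin so chain lengths differ by at most one."""
    if num_chains < 1:
        raise ValueError("num_chains must be at least 1")
    chains = [[] for _ in range(num_chains)]
    for index, cell in enumerate(scan_cells):
        chains[index % num_chains].append(cell)
    return chains


if __name__ == "__main__":
    cells = [f"reg_{i}" for i in range(10)]  # hypothetical scan cell names
    for chain in balance_scan_chains(cells, 3):
        print(len(chain), chain)
```

Balanced chains matter because the shift time of a test pattern is set by the longest chain.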
shift_enable signal selects normal functional data input or a new scan
Scan inputs are chained to the output of other flip-flops.
Same clocks are used for both scan and functional operations.
[Figure: Scan-DFF built from a 2:1 mux and a DFF. Signals: data_in, SI, SE, clock, Q, QB.]
Muxed-Scan Hookup
Add scan chains
[Figure: scan flip-flops with D, SI, SE, CK, Q, and QB pins; scan path from SI to SO]
Connect shift_enable
[Figure: scan flip-flops with D, SI, SE, CK, Q, and QB pins; SE connected to the flip-flops; scan path from SI to SO]
Muxed-Scan Shift Cycle
Sequence: SE to active state, pulse clock “n” times to scan in/out data
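The shift cycle can be modeled in a few lines of Python. This is a behavioral sketch, not a gate-level model: it assumes SE is held active for the whole sequence, that index 0 of the list is the flop nearest SI, and that the last element drives SO.

```python
def scan_shift(chain_state, scan_in_bits):
    """Shift bits through a scan chain with SE active.

    chain_state[0] is the flop nearest SI; chain_state[-1] drives SO.
    Returns (new_chain_state, bits_observed_on_SO), one SO bit per clock pulse.
    """
    state = list(chain_state)
    scanned_out = []
    for bit in scan_in_bits:
        scanned_out.append(state[-1])  # SO shows the last flop before the clock edge
        state = [bit] + state[:-1]     # every flop captures its SI on the clock edge
    return state, scanned_out


if __name__ == "__main__":
    captured = [1, 0, 1, 1]  # values captured in functional mode
    new_state, observed = scan_shift(captured, [0, 0, 1, 0])
    print(new_state, observed)
```

After "n" pulses on an n-flop chain, the previously captured values have all appeared on SO (last flop first) and the chain holds the new pattern.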
[Figure: scan flip-flops with D, SI, SE, CK, Q, and QB pins; scan path from SI to SO]
[Flow diagram]
Elaborate Design
Set Timing and Design Constraints
Apply Optimization Directives
Setup for DFT Rule Checker: Shift enable, Test mode, Prevent scan mapping of flops, Internal clocks as test clocks, DFT controllable constraints, Abstract scan segments
Run DFT Rule Checker and Report Registers
Fix DFT Violations
Add Testability Logic: Test-point insertion, Shadow logic insertion
Synthesize Design and Map to Scan
Set up DFT Configuration Constraints and Preview Scan Chains: Scan chains, Number of scan chains, Length of scan chains, Control data lockup elements
Connect Scan Chains
Outputs: Netlist, SDC, ScanDEF, ATPG, Abstraction Model
Feedback paths: Modify constraints, Modify optimization directives
[Callout: Header includes library and module information.]
Technology libraries: slow_normal 1.0 slow_hvt 1.1 tpz973gtc 230 ram_128x16A 0.0
ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
============================================================
[Callout: Body includes arrival time calculation.]
Pin                                  Type             Fanout  Load    Slew  Delay  Arrival
                                                              (fF)    (ps)  (ps)   (ps)
----------------------------------------------------------------------------------
(clock m_clk)                        launch                                        0 R
                                     latency                                +4000  4000 R
DTMF_INST
TDSP_CORE_INST
DATA_BUS_MACH_INST
data_out_reg[0]/clk                                                   0            4000 R
data_out_reg[0]/q (u)                unmapped_d_flop  19      155.1   0     +258   4258 R
DATA_BUS_MACH_INST/data_out[0]
TDSP_CORE_GLUE_INST/data_out[0]
TDSP_CORE_GLUE_INST/port_data_in[0]
PORT_BUS_MACH_INST/data_in[0]
PORT_BUS_MACH_INST/pad_data_out[0]
TDSP_CORE_INST/port_pad_data_out[0]
DTMF_INST/port_pad_data_out[0]
IOPADS_INST/tdsp_portO[0]
Ptdspop00/I                                                                 +0     4258
Ptdspop00/PAD                        PDO04CDG         1       6719.0  2038  +1648  5906 R
IOPADS_INST/tdsp_port_out[0]
port_pad_data_out[0]                 out port                               +0     5906 R
(ou_del_1)                           ext delay                              +500   6406 R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(clock refclk)                       capture                                       6000 R
                                     uncertainty                            -250   5750 R
----------------------------------------------------------------------------------
[Callout: Footer includes timing slack calculation.]
Timing slack : -656ps (TIMING VIOLATION)
Start-point : DTMF_INST/TDSP_CORE_INST/DATA_BUS_MACH_INST/data_out_reg[0]/clk
End-point : port_pad_data_out[0]
Generated on: Jul 23 2007 03:16:40 AM
Module: dtmf_chip
Technology libraries: slow_normal 1.0 slow_hvt 1.1 tpz973gtc 230 ram_128x16A 0.0
ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
============================================================
[Callout: Header includes library and module information.]
In the header, the following information is given:
Tool-specific information
Timestamp
Module information
Technology libraries
Operating conditions
Wireload mode
The body of the timing report shows the arrival time calculation and includes
Instance pins the timing path goes through
Fanout for each pin output
Load and slew for each pin output
Incremental delay for each cell
Cumulative delay (or arrival time) for each cell
The footer section of the timing report shows the final calculation and includes
Timing slack
Start point
End point
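The slack in the report footer follows directly from the numbers in the body. The sketch below redoes that arithmetic in Python; the function and variable names are hypothetical, and the values are copied from the example report.

```python
def data_arrival_ps(launch_latency_ps, incremental_delays_ps):
    """Cumulative arrival time: clock latency plus each incremental delay on the path."""
    return launch_latency_ps + sum(incremental_delays_ps)


def timing_slack_ps(capture_edge_ps, uncertainty_ps, arrival_ps):
    """Slack = required time (capture edge minus uncertainty) minus arrival time."""
    required_ps = capture_edge_ps - uncertainty_ps
    return required_ps - arrival_ps


if __name__ == "__main__":
    # +258 (flop clk->q), +1648 (pad cell), +500 (external output delay)
    arrival = data_arrival_ps(4000, [258, 1648, 500])
    print(arrival, timing_slack_ps(6000, 250, arrival))
```

A negative result means the data arrives after the required time, which is why the report flags a timing violation.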
Discussion Questions
Given the timing report in the previous example:
What type of path is being checked?
input->reg, reg->reg, reg->output, or input->output?
What logic gates are involved in the timing path?
Which clocks are launching and capturing the data?
What is the clock period of the design?
What is the clock uncertainty?
What is the arrival time?
Why does the path violate timing, and how can it be fixed?
report datapath       Prints a datapath resources report
report design_rules   Prints design rule violations
report gates          Reports libcells used, total area, and instance count summary
report hierarchy
report instance       Prints an instance report
report memory         Prints a memory usage report
report messages       Prints a summary of error messages that have been issued
report power          Prints a power report
report qor            Prints a quality of results report
report timing         Prints a timing report
report summary        Prints an area, timing, and design rules report
administrator to ensure that RC is installed and working properly.
To invoke RC from the UNIX prompt, type the following:
unix% rc
A message appears similar to the following, as well as the rc_shell prompt.
Now you can enter commands directly at this prompt and synthesize a design.
Checking out license 'RTL_Compiler_Ultra'... (0 seconds elapsed)
License RTL_Compiler_Ultra checkout failed
Checking out license 'RTL_Compiler_Verification'... (0 seconds elapsed)
Cadence Encounter(r) RTL Compiler
Version v06.20-s019_1 (64-bit), built Mar 8 2007
rc:/>
Log file is a text document that can be viewed using any text editor
such as vi or emacs.
[Callout: RC license checkout info]
Checking out license 'RTL_Compiler_Ultra'... (0 seconds elapsed)
License RTL_Compiler_Ultra checkout failed
Checking out license 'RTL_Compiler_Verification'... (1 seconds elapsed)
[Callout: RC version]
Cadence Encounter(r) RTL Compiler
Version v06.20-s019_1 (32-bit), built Mar 8 2007
[Callout: Welcome message]
========================================================================
Welcome to Encounter (TM) Encounter(r) RTL Compiler
========================================================================
[Callout: Any line that begins with rc:/> is a command issued to RC. All other lines are RC's response.]
rc:/> source config/libraries_virage.tcl
[File: rc.log]
placement.
Logical synthesis and placement optimizations can be run
concurrently in a single executable.
Physical information, usually reserved for physical design
(place/route) tools, is used by the physical synthesis tool to optimize
the design.
Physical library
Floorplan information
Timing is more accurate, because the wiring estimations are based
on real placement, but run times are typically higher because of the
additional placement steps done.
[Figure labels: Timing Library, Netlist, Floorplan, Logic Synthesis, Placement]
RC Physical Methodology
Introduced by Cadence in RTL Compiler 7.1
Incorporates physical/process data into interconnect delay calculations
QoS Prediction
Physical Layout Estimation (PLE)
PLE uses a proprietary algorithm that takes design and vendor process data
into account.
Other significant differences between WLMs and PLE are summarized below:
Synthesis Flow with PLE Enabled
Inputs           Steps                         Commands
.lib             Read tech lib                 set_attr library
LEF, cap table   Read physical lib info        set_attr lef_library
                                               set_attr cap_table_file
RTL              Read Verilog source files     read_hdl
                 Elaborate design              elaborate
.sdc, DEF        Apply constraints             read_sdc
                                               set_attr def_file
                 Define DFT controls and DRC
[Flow label: Basic Flow]
Quality-of-Silicon Prediction
Quality of silicon (QoS) prediction
Targets the long nets PLE cannot estimate (the last 10-20% of nets).
Uses SVP to perform trial place and route, so loading on long nets can
be estimated properly.
Works in concert with PLE, and maximum predictability is achieved
when both features are enabled.
Synthesis Flow with QoS Prediction Enabled
Inputs                Steps                           Commands
DEF, LEF, cap table   Read physical lib info          set_attr lef_library
                                                      set_attr cap_table_file
                                                      set_attr def_file
                      QoS prediction                  predict_qos
                      (silicon virtual prototyping)
                      Incremental optimization        synthesize -to_map -incr
                      Generate reports                report_timing, area, power, qor, etc.
[Flow labels: Basic Flow, QoS Flow]
Summary
We have discussed the following topics in this module:
Major phases of logic synthesis
Technology-independent (generic) mapping
Technology transformation
Technology-dependent optimizations
Scan chain insertion
Fundamental concepts of physical synthesis
Integration of logic synthesis and placement
Usage in the flow with PLE and QoS prediction
Testing Your Understanding
True or false
1. Technology-independent optimization takes place before technology-
dependent optimization.
2. Boundary optimization takes place during technology mapping.
4. A Boolean network is generated immediately after technology
mapping.
5. Physical synthesis is the integration of floorplanning and placement.
Learning Activity
In this activity, you will
Study a log file after synthesis, including a timing report
Present your results to the class
20 minutes for activity
10 minutes for debriefing
Floorplanning and Placement
Module 5
How?
Built in layers from the ground up
Silicon
How? (continued)
Electrical wiring
Made up of building blocks
Bricks in the case of apartments
Silicon atoms, dopants, and metals in the case of microchips
How? (continued)
Built using a floorplan
"Rooms" have explicit functions
Module Objectives
In this module, you will be able to
Articulate the steps in floorplanning and power planning
Discussion Questions
Recall the flowchart diagram of the
design flow steps required to take an
idea to product (chip).
In which part of the flow does
floorplanning occur?
In which part of the flow does
placement occur?
[Figure: Design Flow diagram with Input/Output and Step boxes left blank, marked "?"]
Power planning
Placement
Floorplanning
Definition
Implementation flow overview
DEF file
How to floorplan
Floorplanning inputs and outputs
Module constraint types
Pin placement
What Is Floorplanning?
Floorplanning is the process of deriving the die size, allocating space for soft
blocks, planning power, and macro placement.
Example:
[Figure: two arrangements of blocks A through G]
[Figure labels: Logic Synthesis, Gates, Place and Route, Floorplanning, Power Planning, Placement, Route, Timing Closure, Static Timing Analysis, Test, GDSII]
logic synthesis
Constraints (SDC) are needed so
that timing with STA can be
accurate and measured against
the specifications of the design
Timing library (.lib) contains the
timing information for each
discrete logic gate or macro
Physical library (LEF) contains
information about the shape and
connectivity of the technology
library cells
Outputs
Floorplan of the design, which is
saved in the form of a DEF file
[Figure labels: Gates, Constraints, Tech Lib, Phys Lib, Floorplanning]
What Is Design Exchange Format (DEF)?
Definition: A specification for
representing logical connectivity
and physical layout of an
integrated circuit in ASCII format
connectivity, and physical location
of cells and macros on the chip. It
contains floorplanning information
such as standard cell rows,
groups, placement and routing
blockages, placement constraints,
and power domain boundaries. It
also contains the physical
representation for pins, signal
routing, and power routing,
including rings and stripes.
[Figure labels: Gates, Constraints, Tech Lib, Phys Lib, Floorplanning, DEF]
[Callout: Header information]
PROPERTYDEFINITIONS
COMPONENTPIN designRuleWidth REAL ;
DESIGN FE_CORE_BOX_LL_X REAL 2.8 ;
DESIGN FE_CORE_BOX_UR_X REAL 1997.2 ;
DESIGN FE_CORE_BOX_LL_Y REAL 2.8 ;
DESIGN FE_CORE_BOX_UR_Y REAL 3997.2 ;
END PROPERTYDEFINITIONS
[Callout: Area and rows]
ROW CORE_ROW_2 UMC13FSNSITE 5600 16800 FS DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_3 UMC13FSNSITE 5600 22400 N DO 4986 BY 1 STEP 800 0 ;
…
ROW CORE_ROW_1423 UMC13FSNSITE 5600 7974400 N DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_1424 UMC13FSNSITE 5600 7980000 FS DO 4986 BY 1 STEP 800 0 ;
ROW CORE_ROW_1425 UMC13FSNSITE 5600 7985600 N DO 4986 BY 1 STEP 800 0 ;
[Callout: Routing tracks and GCell information]
TRACKS Y 1200 DO 5000 STEP 1600 LAYER ME8 ;
TRACKS X 1200 DO 2500 STEP 1600 LAYER ME8 ;
TRACKS X 500 DO 5000 STEP 800 LAYER ME7 ;
TRACKS Y 1200 DO 5000 STEP 1600 LAYER ME7 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME6 ;
TRACKS X 500 DO 5000 STEP 800 LAYER ME6 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME5 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME5 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME4 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME4 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME3 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME3 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME2 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME2 ;
TRACKS X 400 DO 5000 STEP 800 LAYER ME1 ;
TRACKS Y 400 DO 10000 STEP 800 LAYER ME1 ;
GCELLGRID X 3992400 DO 2 STEP 7600 ;
GCELLGRID X 400 DO 500 STEP 8000 ;
GCELLGRID X 0 DO 2 STEP 400 ;
GCELLGRID Y 7992400 DO 2 STEP 7600 ;
GCELLGRID Y 400 DO 1000 STEP 8000 ;
GCELLGRID Y 0 DO 2 STEP 400 ;
[Callout: Pins]
PINS 765 ;
- ADC0[0] + NET ADC0[0] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7524700 ) E ;
- ADC0[10] + NET ADC0[10] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7564700 ) E ;
- ADC0[11] + NET ADC0[11] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7568700 ) E ;
- ADC0[1] + NET ADC0[1] + DIRECTION INPUT + USE SIGNAL + LAYER ME3 ( -1000 0 ) ( 1000 600 ) + FIXED ( 0 7528700 ) E ;
- TST_SEL + NET TST_SEL + DIRECTION INPUT + USE SIGNAL + LAYER ME4 ( -300 0 ) ( 300 2000 ) + FIXED ( 3501690 8000000 ) S ;
…
END PINS
END DESIGN
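As an illustration of how regular the DEF syntax is, the ROW statements above can be read with a short Python sketch. This is not a full DEF parser: it handles only the single-line `ROW name site x y orient DO n BY m STEP dx dy ;` form shown in the example, and the `x_end` calculation assumes the site width equals the X step, which the DEF file itself does not state (site dimensions come from the LEF). Values stay in database units because the UNITS statement is not shown.

```python
import re

# Matches: ROW <name> <site> <x> <y> <orient> DO <n> BY <m> STEP <dx> <dy> ;
ROW_PATTERN = re.compile(
    r"^ROW\s+(\S+)\s+(\S+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)"
    r"\s+DO\s+(\d+)\s+BY\s+(\d+)\s+STEP\s+(-?\d+)\s+(-?\d+)\s*;$"
)


def parse_row(statement):
    """Parse one DEF ROW statement into a dict; return None if it does not match."""
    match = ROW_PATTERN.match(statement.strip())
    if match is None:
        return None
    name, site, x, y, orient, num_x, num_y, step_x, step_y = match.groups()
    row = {
        "name": name, "site": site, "x": int(x), "y": int(y), "orient": orient,
        "num_x": int(num_x), "num_y": int(num_y),
        "step_x": int(step_x), "step_y": int(step_y),
    }
    # Assumption: each site is step_x wide, so the row ends at x + num_x * step_x.
    row["x_end"] = row["x"] + row["num_x"] * row["step_x"]
    return row


if __name__ == "__main__":
    print(parse_row("ROW CORE_ROW_1423 UMC13FSNSITE 5600 7974400 N DO 4986 BY 1 STEP 800 0 ;"))
```

In the example, consecutive rows alternate between the N and FS orientations and are 5600 database units apart in Y.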
[BUSBITCHARS statement]
DESIGN statement
[TECHNOLOGY statement]
[UNITS statement]
[HISTORY statement]
[PROPERTYDEFINITIONS section]
[DIEAREA statement]
[ROWS statement]
[TRACKS statement]
[GCELLGRID statement]
[VIAS statement]
[STYLES statement]
[COMPONENTS section]
[PINS section]
[PINPROPERTIES section]
[BLOCKAGE section]
[SLOTS section]
[FILLS section]
[SPECIALNETS section]
[NETS section]
[SCANCHAINS section]
[GROUPS section]
[BEGINEXT section]
How to Floorplan
When the design is imported into the tool, a default die size is
calculated and displayed, and each module is assigned a physical
representation using a default placement density of 70% and aspect
ratio of 1.
Each unit represents a particular module in the design.
Floorplanning allocates position and area to each unit.
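The default sizing can be illustrated with a small calculation. This sketch rests on two assumptions that the text does not spell out: the box area is the module's cell area divided by the placement density, and the aspect ratio is height divided by width. The function name is hypothetical.

```python
import math


def default_module_box(cell_area, density=0.70, aspect_ratio=1.0):
    """Return (width, height) of the box that represents a module.

    Assumes box area = cell_area / density and aspect_ratio = height / width.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in the range (0, 1]")
    if aspect_ratio <= 0.0:
        raise ValueError("aspect_ratio must be positive")
    box_area = cell_area / density
    width = math.sqrt(box_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


if __name__ == "__main__":
    # 700 square units of cells at 70% density need a 1000 square unit box.
    print(default_module_box(700.0))
    print(default_module_box(700.0, aspect_ratio=2.0))
```

Lowering the density enlarges the box and leaves more room for routing; changing the aspect ratio keeps the area but changes the shape.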
How to Floorplan (continued)
Position the modules and blocks in the die area. In general, position
the modules and blocks such that the area of the bounding rectangle
is minimum or meets the die size requirement. Try different
orientations, aspect ratios, and placement densities of the modules to
puzzle fit them into the die area.
The bounding rectangle represents the die area.
Flightlines indicate how much communication occurs between two
modules. The higher the number of flightlines between two modules, the closer these
modules will have to be within the design.
The diagram below shows how to floorplan optimally. The numbers
over the flightlines indicate the number of nets between corresponding
modules.
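One way to compare candidate floorplans is to weight the distance between each pair of modules by the number of nets between them. The Python sketch below is a simplified cost model, not the tool's algorithm: it uses Manhattan distance between module centers, and the module positions and net counts are hypothetical.

```python
def weighted_wirelength(positions, net_counts):
    """Sum of (nets between two modules) x (Manhattan distance between their centers)."""
    total = 0
    for (module_a, module_b), nets in net_counts.items():
        ax, ay = positions[module_a]
        bx, by = positions[module_b]
        total += nets * (abs(ax - bx) + abs(ay - by))
    return total


# Hypothetical connectivity: A talks mostly to C, only a little to B.
NET_COUNTS = {("A", "B"): 10, ("A", "C"): 100}

FLOORPLAN_1 = {"A": (0, 0), "B": (1, 0), "C": (5, 0)}  # B placed next to A
FLOORPLAN_2 = {"A": (0, 0), "C": (1, 0), "B": (5, 0)}  # C placed next to A

if __name__ == "__main__":
    print(weighted_wirelength(FLOORPLAN_1, NET_COUNTS))
    print(weighted_wirelength(FLOORPLAN_2, NET_COUNTS))
```

The second floorplan has the lower cost because the heavily connected pair sits closer together.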
[Figure: modules A through E shown in two arrangements, with flightlines labeled 121, 34, 57, 152, and 104]
How to Floorplan (continued)
Example: The design below shows the flightlines between one of the
modules and its macro on the right side of the die area, as well as with
other modules that communicate with it.
Type         Definition
None         Contents of module are placed without any constraint.
Guide        Module is placed in core design area. It guides placement
             of the module's cells in the vicinity of the guide's location.
Fence        Fence is a hard constraint in core design area. Design for
             the module is self-contained within the rigid outline of a
             fence.
Region       Same as a fence, except that instances from other modules
             can be placed within its physical outline.
Soft Guide   Similar to guide, except that there are no fixed locations.
Pin Placement
There are two ways to handle pin placement, using a bottom-up or top-down
approach.
Bottom up
Pins are initially placed along with the cells in a block to optimize their
placement with respect to that block.
The top-level floorplan is finished, and pin placement is re-optimized
considering both top-level goals and block timing.
Top down
The pins are initially placed in the top-level floorplan to optimize their
placement on a global level.
Then, their location is fixed within a block, and the block level cells are
placed.
Finally, the pin placement is re-optimized considering both top-level
goals and block timing.
Use bottom up if the top-level design is incomplete so progress can be
made at the block level. Use top down if the top-level design is near
complete so that you can account for the inter-block connections.
Pin Placement Goals
Identifying critical paths and making placement tradeoffs to optimize
the critical paths
Wire length reduction
Achieving timing by reducing the amount of block-to-block or IO-to-block
interconnect
Achieving via-free direct routes
Achieving accurate pin matching between hierarchical boundaries
Pin spacing variation in congested areas
Power planning
Placement
Power Planning
Definition
Goals
Types of power routing
Steps involved in power routing
Multiple supply voltages
Example:
What Are Voltage (IR) Drop and Electromigration?
Voltage (IR) drop is the voltage drop across a chip's power network
caused by current and resistance associated with the power network.
Electromigration (EM) is the mechanical failure of metal wires because
of metal atoms migrating over a long period of time due to high current
densities, causing open circuits, short circuits, or unacceptable
increases in resistance.
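The IR drop definition can be made concrete with Ohm's law applied along one power rail. The sketch below is a simplified DC model, not a sign-off analysis: it assumes a rail fed from one end, modeled as series resistances with one cell drawing a constant current at the end of each segment. All values are hypothetical.

```python
def rail_voltages(v_supply, segment_res_ohm, tap_currents_a):
    """Voltage seen at each tap of a power rail fed from one end.

    Segment i carries the current of every tap at or beyond it, so the drop
    across it is (sum of downstream tap currents) x (segment resistance).
    """
    if len(segment_res_ohm) != len(tap_currents_a):
        raise ValueError("one tap current is required per rail segment")
    voltages = []
    voltage = v_supply
    for index, resistance in enumerate(segment_res_ohm):
        downstream_current = sum(tap_currents_a[index:])
        voltage -= downstream_current * resistance  # V = I x R for this segment
        voltages.append(voltage)
    return voltages


if __name__ == "__main__":
    # Three 0.1 ohm segments, each tap drawing 0.1 A from a 1.0 V supply.
    print(rail_voltages(1.0, [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]))
```

The cell farthest from the supply sees the lowest voltage, which is why stripes and additional connections to the rings are added to shorten the resistive path.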
To size the power wires and choose the metal layers necessary to
deliver the required power to different parts of the chip without causing
failure
Need for Power Planning
Power-related issues can
Affect chip timing due to excessive rail voltage drop ("IR-drop") and
ground bounce
Lead to complete device failure due to electromigration effects
[Figure: EM failures as seen through a Scanning Electron Microscope (SEM)]
The effects of IR-drop and other power-related issues can be limited by
Good power-grid design
basic elements into the power network.
c
Power pads that supply power to
the chip
n
Power rings around the periphery
of the die that carry power to the
e
standard cells and macros
Rings are put on higher level
d
routing layers leaving the lower
layers for signal routing
ca
Power rails and trunks that cross
the entire die or sections of the die
Quantification of chip power
Total chip power
Maximum power density
Total chip power fluctuations
Allocation and coordination of chip resources
Wiring tracks for power grid
Low Vt devices
Dynamic circuits
Clock gating
Placement and quantity of decoupling capacitors
Rings are placed around blocks to assure even power distribution within
the block
Uniform grid
Usually used inside lower level partitions
Trunks and Rings Methodology
Each block has its own ring
structure
Each block has a trunk that
connects the top level to the block
Rings can be shared between
abutted blocks
Requires less routing resources
Changes in design may require
changes to power structure
[Figure: blocks 1 through 5, each with a Ring and a Trunk, between alternating G and V supply lines]
Primary distribution through upper
metal layers
align with each other
[Figure: blocks 1, 3, 4, and 5 under a grid of alternating G and V supply lines]
Distribute power vertically within a ring
Typical power routing routes horizontally in metal 1 (including standard cell row
power rails) and vertically in metal 2
[Figure labels: Metal 1, Metal 2, Power Stripe, Power Ring, Row of cells]
Created by layers of power straps going in alternate vertical and horizontal
directions
targets are met
Example:
Steps Involved in Power Routing
Create core power rings
Connect core power pads to the core power rings
Add power rings around the macros
Add power rails to the power plan for standard cell area
Modify power rails for macro power rings, routing blockages, and other restrictions
Add vertical and horizontal stripes to reduce IR drop at power rails of cells and macros
Connect power rails to cell power pins and extend to the power rings and connect with vias
Route power/ground along the standard cell rows
Follows the pins of each cell and stitches them together
Connects these routes to power rings (and vertical stripes)
Connect dangling power routes to stripes/rings
Connect power rings to I/O power pads
Power Consumption
Power on a chip is consumed when it is active (dynamic power) as
well as inactive (leakage power).
Leakage power
Power consumed when cells are not switching
Main sources of leakage power are sub-threshold leakage currents, which
reduce linearly with supply voltage
Dynamic power
It is the power associated with switching of nets and cells
It is calculated as Power = f x C x V^2
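The quadratic dependence on voltage is easiest to see with numbers. The sketch below evaluates the formula as given in the text; the frequency, capacitance, and voltage values are hypothetical, and the function name is an assumption.

```python
def dynamic_power_w(freq_hz, cap_f, vdd_v):
    """Dynamic power from the formula in the text: Power = f x C x V^2."""
    return freq_hz * cap_f * vdd_v ** 2


if __name__ == "__main__":
    nominal = dynamic_power_w(1e9, 1e-9, 1.0)  # 1 GHz, 1 nF switched, 1.0 V
    reduced = dynamic_power_w(1e9, 1e-9, 0.8)  # same block at 0.8 V
    print(nominal, reduced, reduced / nominal)
```

Dropping the supply from 1.0 V to 0.8 V cuts dynamic power to 64% of its original value, which is the motivation for lowering supply voltage wherever performance allows.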
How can the power consumption on a chip be reduced?
It aims at minimizing the supply voltage level wherever possible.
Instead of the chip operating from a single uniform supply voltage, a
range of supply voltages is assigned to different areas of the chip.
It also assigns separate power-nets to different blocks, and steps the
power-net voltages down wherever the chip and block performance
allow.
Discussion Question
Assuming the following chip diagram, what considerations should be taken into
account when designing a power plan?
Block1 1.0V
Block2 0.8V
Block3 1.2V
Power planning
Placement
Placement
Definition
Placement goals
ECO placement
Incremental placement
Boundary scan
Scan chain re-order
What Is Placement?
Definition: Process of placing the standard cells in a floorplanned design
Example: The diagram shows a die area with no cells (left), and the cells placed within the die (right).
Placement Goals
Goals of the placement step are to
Guarantee that the router can complete the routing step
Minimize all the critical net delays by placing cells close to each other, thus reducing interconnect lengths
Minimize the die size as much as possible
Reduce routing congestion, if any
Bad placement can lead to sub-optimal routes and cause paths to fail timing
Standard cells are placed in rows that are drawn within the core area.
aligned correctly.
Placement should be routable and meet timing requirements.
Diagram: a standard cell row with a CELL between VDD and GND rails
Diagram: cells with regular orientation, each between VDD and GND rails, with a gap between rows
Diagram: regular + flipped orientation, shared rows; a cell with regular orientation and a cell with flipped orientation share the GND rail between two VDD rails
to be flipped and abutted so the pairs can share power and ground rails. This is the most common approach.
Second configuration is to flip every other cell row but leave a gap between every two cell rows mainly for routing purposes. Creates larger power rails and densely packed cell structure.
Last configuration is to leave a gap between every cell row and not flip the rows. Useful when only two or three metal layers are available for routing.
The command to run placement is
placeDesign
Placer identifies critical nets and performs placement to meet the constraints. It pays less attention to meeting timing constraints on non-critical nets, but more attention to enhancing routability.
Why do we need this?
Growing interconnect versus gate delay ratios
Higher levels of on-die functional integration make global interconnects even longer
Increased chip operating frequencies that make timing closure tougher
Increased number of macros and standard cells for modern designs
Path-based
Tries to minimize the longest path delay
Complexity is high since it maintains an accurate timing view during optimization
Net-based
First transforms timing constraints into either length constraints or weights on individual nets
placement engine, obtains new placement with better timing
Complexity is lower compared to path-based algorithms
Engineering Change Order Placement
Engineering change order (ECO) placement is used to place unplaced cells in a partially or fully placed design.
In a partially placed design, unplaced cells are placed in timing-driven mode followed by legalization (overlap removal).
In a fully placed design, only the legalization step takes place.
Make sure that ECO logic changes do not exceed the previously
When ECO placement is run, it places only the cells that are unplaced. It cannot move the cells that are fixed and makes only minor modifications to cells already placed.
The command to run ECO placement is
ecoPlace
Incremental Placement
Incremental placement works on an already placed design to improve overall quality and timing.
To use incremental placement, the following command should be run
placeDesign -incremental
The above command performs a two-pass placement flow.
Regular placement
Incremental placement
In addition to having placement information about all placed cells, it maintains information about space available for adding new cells.
tester to test connectivity of the I/O pins on the fabricated chip
Provides a means to test interconnects between integrated circuits on a board without using physical test probes
Is synonymous with Joint Test Action Group (JTAG)
JTAG is the name used for the IEEE 1149.1 standard entitled Standard Test Access Port and Boundary-Scan Architecture to test access ports.
Boundary Scan
Boundary scan adds one or more memory elements, called boundary-scan cells, to each I/O pin of the device, which can selectively override the functionality of that pin.
The collection of boundary scan cells is configured into a parallel-in, parallel-out shift register.
Test sequence is passed into the shift register, and the data coming out is compared.
Boundary scan cells do not contribute to the functionality of the internal core logic.
Test access port (TAP) controller is a state machine whose transitions are controlled by a TMS signal.
following signals to support operation of boundary scan.
Test data is shifted around the shift register in serial mode from input pin Test Data In (TDI).
Test data is terminated at output pin Test Data Out (TDO).
Test Clock (TCK) synchronizes the internal state machine operation.
Test Reset (TRST) is an optional input pin to reset the TAP controller’s state machine.
Test Mode Select (TMS) determines the next state.
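To make the role of TMS concrete, here is a small sketch of the TAP controller as a lookup table, using the sixteen state names from IEEE 1149.1. The helper name tap_walk is hypothetical; on each TCK edge the value of TMS alone selects the next state.

```python
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

def tap_walk(state, tms_bits):
    """Apply one TMS value per TCK edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

# From reset, TMS = 0, 1, 0, 0 reaches the state where data shifts from TDI to TDO.
shift_dr = tap_walk("Test-Logic-Reset", [0, 1, 0, 0])
```

A useful property of this table is that holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is why TRST can be optional.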
Each JTAG compliant device contains an ID code.
By issuing a correct sequence of JTAG commands, the ID codes of all the devices can be read out.
The ID codes read out from the JTAG chain are compared with the actual ID codes of the devices. If they match, the JTAG chain is correctly connected and the devices are in place.
Benefits of JTAG
Shorter test times
Higher test coverage
Increased diagnostic capability
Lower capital equipment cost
All the registers in the design are connected in one or more scan chains so that their inputs can be controlled and their outputs can be observed.
Flip-flops have an extra signal called scan enable.
When Scan Enable is de-asserted, the flip-flop behaves normally and passes the data.
When Scan Enable is asserted, all the flip-flops are connected into a long shift register, with one end of the chain as primary input and the other end as primary output.
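The two modes of a scan flop can be sketched as a tiny behavioral model. This is an illustration only; the function name clock_edge and the three-flop chain are hypothetical.

```python
def clock_edge(state, d_inputs, scan_enable, scan_in):
    """Return the flop values after one clock edge of a scan chain."""
    if scan_enable:
        # Shift mode: the chain behaves as one long shift register.
        return [scan_in] + state[:-1]
    # Functional mode: every flop captures its own D input.
    return list(d_inputs)

state = [0, 0, 0]
for bit in (1, 0, 1):
    # Scan Enable asserted: shift a test vector in from the primary input.
    state = clock_edge(state, None, True, bit)
loaded = state

# Scan Enable de-asserted: one functional capture of hypothetical D values.
captured = clock_edge(loaded, [1, 1, 0], False, 0)
```

The last element of the list plays the role of the primary output end of the chain, so shifting again after the capture would stream the captured values out.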
finally switching back to test mode to shift out the resulting flop values.
The resulting vector is compared with a known “good” vector to determine if the chip is functioning correctly.
Why do we need to re-order the scan chain?
During placement, cells are placed to meet functional timing and minimize congestion, and the scan chain connectivity is ignored.
This results in long, inefficient routing between flops in the chain and causes routing congestion.
Re-ordering the scan chain reduces congestion by connecting the cells based on their placement.
It may cause hold time violations in the chain, and buffers may need to be inserted to fix them.
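One simple way to picture placement-based reordering is a greedy nearest-neighbor walk over the placed flops. This is a sketch under that assumption, not the algorithm used by any particular tool; the coordinates and instance names are hypothetical.

```python
def chain_length(order, xy):
    """Total Manhattan length of the scan connections in the given order."""
    return sum(abs(xy[a][0] - xy[b][0]) + abs(xy[a][1] - xy[b][1])
               for a, b in zip(order, order[1:]))

def reorder_nearest(start, xy):
    """Greedy reorder: always hop to the closest flop not yet in the chain."""
    order, left = [start], set(xy) - {start}
    while left:
        cx, cy = xy[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(left),
                  key=lambda n: abs(xy[n][0] - cx) + abs(xy[n][1] - cy))
        order.append(nxt)
        left.remove(nxt)
    return order

# Hypothetical (x, y) placement of four scan flops.
xy = {"U1": (0, 0), "U2": (9, 9), "U3": (1, 0), "U4": (9, 8)}
netlist_order = ["U1", "U2", "U3", "U4"]     # order inherited from synthesis
placed_order = reorder_nearest("U1", xy)     # order based on placement
```

With these coordinates the netlist order needs 51 units of wire while the reordered chain needs 18, which illustrates the congestion argument; the shorter hops are also why hold fixes may be needed afterward.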
How would you reorder the scan chain after initial placement to get the optimal result?
Diagram: scattered placement of DFF U1, U10, U6, U8, U5, U7, U3, U2, U4, and U9
Summary
The back-end flow starts with floorplanning. Here is where we get to see the physical chip.
Floorplanning is a puzzle-fitting stage, where we have to fit modules
throughout the chip and meet the current requirements.
Placement of the cells and macros into the core area is to be done with the ultimate goal of meeting timing and reducing congestion.
Each step affects the overall goals of meeting timing and power requirements. Quality time spent in floorplanning and power network implementation reduces the number of iterations needed to achieve a working chip that meets design specifications.
1. Boundary scan adds more functional logic to the existing internal logic.
2. Timing-driven placement reduces the number of iterations through the back-end flow.
4. Timing-driven placement tries to first meet routability on the critical nets.
5. In floorplanning, a guide is considered to be a rigid constraint.
6. The DEF file is saved in binary format and can only be read in by the tool.
Learning Activity
In this activity, you will
Study several examples of bad floorplans
Identify the bad practices from each and how you would correct them
Present your results to the class
20 minutes for activity
10 minutes for debriefing
Clock Tree Synthesis
Module 6
Diagram: flip-flops (FF) separated by combinational logic 1 and combinational logic 2, all driven by CLK
Module Objectives
In this module, you will be able to
Explain a clock tree and why you need to create one
Describe the benefits of using useful skew versus classical zero skew
Discussion Question
Recall the diagram of the design flow steps to take an idea to product (chip)
In what part of the flow does Clock Tree Synthesis (CTS) occur?
Diagram: design flow steps with their inputs and outputs; the position of CTS is left as a question
Topics in This Module
Clock trees and clock tree synthesis
Clock tree specification
reference for the movement of data within that system.
inserted into the clock signal path in such a way that the overall delay from the generator to all destinations is minimized.
Example: Instead of one electrical signal path being optimized, the path in the design was broken up and strategically buffered to minimize the delay. The resulting network resembled a tree in that the central clock signal branches throughout the chip using these buffers and ends up with the clock signal reaching all of the leaf cells.
important.
Reasons why we need to build a clock tree:
Large chip area
Different flop densities
Non-uniform distribution of flops
All flops need to get clock signal at the same time
Power budget
Clock routing: hard problem
The clock distribution network distributes the clock signal(s) from a common point to all the elements that need it.
Ideal Clock
All flip-flops are clocked together
Used prior to clock tree insertion and place and route for timing analysis
Diagram: Block1 and Block2; data A, B, and C are clocked by CLK1, CLK2, and CLK3, derived from the ideal CLK, with waveforms for CLK3, CLK2, CLK1, and the ideal CLK
Note: This diagram assumes zero clock skew and insertion delay.
Propagated Clock
Clock delays are extracted from clock tree routing
More accurate and used in final timing closure
Diagram: Block1 and Block2; data A, B, and C are clocked by CLK1, CLK2, and CLK3, derived from the propagated CLK, with waveforms for CLK3, CLK2, CLK1, and the propagated CLK
Different net lengths mean different arrival times of the clock at each flip-flop pin
Delay and transition time affected by large number of elements connected to one
Diagram: clock source and sixteen flip-flops (FF)
Clock skew
Clock latency
Clock jitter
maximum time it takes the clock to reach different leaf cells
Typically hurts performance of the design, although in some cases helps achieve timing targets (useful skew)
Caused by a clock tree with
Same clock source
due to
Different types of buffers
Varying capacitance and resistance values of nets
Gating components
Off-chip or on-chip variations
Diagram: clock source driving flip-flops FF1 and FF2 through CTS inserted buffers, with the minimum and maximum insertion delay paths marked
The clock tree is treated as ideal.
All combinational blocks must fit into the same fixed time period.
All registers are clocked at the same time.
You do not need knowledge of signal timing.
A good classical skew minimization strategy does not necessarily correlate with good performance.
Helps meet setup and hold time.
Increased latency but decreased clock period provides a net timing advantage.
Some combinational paths require more time than the allowed clock period.
Adjusting clock delays to registers allows allocation of more time to some paths and less on others.
Time is borrowed from neighboring paths that have positive slack.
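The borrowing can be worked through numerically. The sketch below assumes a simplified setup check that ignores setup time and clock-to-Q delay, and uses illustrative values: two back-to-back paths of 5 ns and 2 ns and a 4 ns clock period.

```python
def setup_slack(period, path_delay, launch_latency, capture_latency):
    """Setup slack in ns, ignoring setup time and clock-to-Q (a simplification)."""
    return period + (capture_latency - launch_latency) - path_delay

PERIOD = 4.0  # ns

# FF1 -> 5 ns logic -> FF2 -> 2 ns logic -> FF3, all clocks arriving together.
zero_skew = (setup_slack(PERIOD, 5.0, 0.0, 0.0),
             setup_slack(PERIOD, 2.0, 0.0, 0.0))

# Useful skew: delay the clock of FF2 by 1 ns. The first path gains 1 ns,
# the second path (which had positive slack) gives 1 ns away.
useful_skew = (setup_slack(PERIOD, 5.0, 0.0, 1.0),
               setup_slack(PERIOD, 2.0, 1.0, 0.0))
```

With zero skew the slacks are -1 ns and +2 ns; after the 1 ns shift they become 0 ns and +1 ns, so the failing path is fixed without changing the clock period.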
Diagram (shown on two slides): three flip-flops separated by combinational delays of 5 ns and 2 ns, with a time period of 4 ns; the propagated clock with useful skew shifts a clock edge by 1 ns, leaving a 1 ns margin; one label reads "obtained by speeding up source clock"
Discussion Questions
In a circuit after clock tree synthesis and a clock period of 5 ns, there is a -1 ns worst-case negative slack.
Will useful skew always improve the timing path?
What other ways could you improve the timing of this path?
Insertion delay is also known as clock network latency.
Diagram: clock source, CTS inserted buffers, and flip-flops (FF), with the minimum insertion delay marked
period
The common sources of jitter are
Internal circuitry of the phase-locked loop (PLL)
Random thermal or mechanical noise from a crystal vibration
Other resonating devices
Signal transmitters
Crosstalk
VCC sag
Ground bounce
Electromagnetic interference from nearby devices
There are three types of clock jitter:
Period jitter
Cycle-to-cycle jitter
Long-term jitter
period
The deviation is either leading or lagging the ideal position.
Measured and expressed in time or frequency
Used to calculate timing margins in systems
Diagram: period jitter shown as deviation from the ideal clock edge location
What Is Cycle-to-Cycle Jitter?
Change in a clock’s output transition from its corresponding position in the previous cycle.
Large cycle-to-cycle jitter can cause a system to fail.
Most difficult type of jitter to measure.
Diagram: an ideal clock period (T) followed by a lesser clock period (T1); Jitter = T - T1
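Period jitter and cycle-to-cycle jitter can both be computed from a list of measured clock periods. The sketch below assumes the common convention that cycle-to-cycle jitter is the difference between adjacent periods; the measurements are hypothetical.

```python
def period_jitter(periods, ideal):
    """Deviation of each measured period from the ideal period."""
    return [p - ideal for p in periods]

def cycle_to_cycle_jitter(periods):
    """Change in period between each cycle and the previous one."""
    return [b - a for a, b in zip(periods, periods[1:])]

# Hypothetical measured periods in ps for a clock with a 1000 ps ideal period.
periods = [1000, 990, 1010, 1000]

pj = period_jitter(periods, 1000)
c2c = cycle_to_cycle_jitter(periods)
worst_c2c = max(abs(j) for j in c2c)
```

Here no single period is off by more than 10 ps, yet two adjacent cycles differ by 20 ps, which shows why the two figures are tracked separately.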
a clock’s output transition from its ideal over a large number of cycles
Diagram: long-term jitter measured between the ideal clock edge locations at cycle 0 and cycle N
Types of Clock Trees
A clock tree can be implemented in the following styles:
Binary tree
H tree
Length of a to d = Length of a to g
Same length, so same delay.
Results in a clock skew between the clock signals at d and g.
Drawback
The branch effect: the clock signals from b to e and f contribute a capacitance that would actually increase the delay from a to g.
As the size of the clock distribution tree increases, the effects on the clock signal become worse.
Diagrams: conceptual structure and physical structure of a binary tree with nodes a through g
Provides equal propagation delays to each
the memory elements in equal lengths
Diagram: H tree fed from a central clock source
Drawback
Total wire length is much greater compared to a standard clock tree
Increased capacitance of the H-tree structure
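The equal-length property of an H tree can be checked with a short recursive sketch. The geometry below is an idealized assumption (each level halves the branch length, wires run in Manhattan fashion), and the function name h_tree is hypothetical.

```python
def h_tree(cx, cy, half, levels, length=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H tree."""
    if levels == 0:
        return [(cx, cy, length)]
    leaves = []
    for dx in (-half, half):        # horizontal bar of the H
        for dy in (-half, half):    # vertical legs of the H
            leaves += h_tree(cx + dx, cy + dy, half / 2, levels - 1,
                             length + abs(dx) + abs(dy))
    return leaves

# Two levels of H starting from a clock source at the origin.
leaves = h_tree(0.0, 0.0, 4.0, 2)
lengths = {round(wire, 9) for _, _, wire in leaves}
```

All sixteen leaves end up at the same wire distance from the source, which is the balanced-delay benefit; the cost is that every leaf pays the full length even when it sits close to the source.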
elements such as RAMs and flops.
Definition: Process of inserting buffers in the clock path, with the goal of minimizing clock skew and latency to optimize for timing
Example: We ran clock tree synthesis on the example block and saw a large clock skew due to bad clock constraints. We ended up re-running clock tree synthesis with better constraints to get an optimal result.
Need for Clock Tree Synthesis
Clock signals are typically loaded with the greatest fanout.
Differences and uncertainty in the arrival times of the clock signals can severely limit the maximum performance of the entire system.
Design needs to be operated at the highest speeds of any signal.
Clock signals are affected by technology scaling (Moore’s law).
Long global interconnect lines become significantly more resistive as line dimensions are decreased.
Catastrophic race conditions can be created in which an incorrect data signal may latch within a register.
Inputs
Verilog® netlist
Timing library, which contains the timing information for each discrete logic gate or macro
Physical library, which contains information about the shape and connectivity of the technology library cells
Placement information such as a DEF file
Outputs
Netlist with clock tree inserted
Reports on the results of the run in ASCII text or HTML format
Routing guide files for clock tree preroutes to be used during trial routing
Macro model files for partitions or modules
Diagram: Netlist, Tech File, Clock Tree Specification file, DEF File, and Phys Lib feed Clock Tree Synthesis, which produces Netlist, Reports, Routing Guides, and Macro Models
Diagram: design flow from RTL through Logic Synthesis to Gates, followed by Floorplanning, Power Planning, Placement, and Route to GDSII, with Timing Closure, Static Timing Analysis, and Test shown alongside Place and Route
After CTS
Clock buffer tree is built to balance output loads and minimize clock skew.
Buffers can be added to the network to meet the minimum insertion delay
Diagram: clock source and flip-flops (FF) after clock tree insertion
Goals of CTS
Deliver clock to all memory elements with
Acceptable skew
Least amount of insertion delay
Initial placement of core logic
This ensures that the timing performance of the core logic is met.
First define/understand scope or extent of the clock tree
This would include items such as total load, routing area, distance the clock has to travel, available routing layers, and routing restrictions.
Flow diagram: Initial placement of core logic, Scope of clock tree, Define clock tree constraints, Define clock tree topology, Insert clock tree
Steps Involved in CTS (continued)
Define the constraints that the clock tree must satisfy.
Include minimum and maximum insertion delay and maximum skew
This is part of the clock tree specification file.
Define the clock tree topology, including
Number of levels or buffer stages in the tree
Type of buffers/inverters
Fanout limit at each level
automatically by a clock tree generator tool.
This is part of the clock tree specification file.
logic cells.
The buffers are placed or inserted in strategic places to minimize the clock delay and routing.
with optimization for meeting all timing goals.
This step is optional and can be done along with CTS or with the routing phase.
CTS Operation Modes
There are two modes for running CTS:
Manual CTS allows user to control
Number of levels
Number of buffers
Types of buffer at each level
Numbers depend on timing constraint in the clock tree specification file.
CTS traces the clock net through buffers, inverters, and gated elements.
In most cases, you would use automatic CTS. In case you have issues with the clock tree (skew, etc.), you can specify the CTS manually. In some cases where the design is very regular or very high speed, an experienced designer will manually specify the CTS constraints to better control the output.
Clock tree specification
CTS Guidelines
There are two CTS modes for specifying the clock tree:
Manual CTS
Automatic CTS
Both modes require a clock specification file to create the clock tree.
In manual CTS mode, the clock tree structure has to be specified by the user.
In automatic CTS mode, the tool automatically creates the clock tree structure from the specification file.
Automatic CTS is the preferred method of creating the clock tree.
Using the Create Clock Tree Spec form (GUI)
Using the specifyClockTree command with the -template parameter. This method creates a basic clock tree specification template file, template.ctstch.
Each method is similar and will allow the user to easily create a clock tree specification. The first option uses the GUI, whereas the other commands use the command line.
The GUI command allows the user to fill in all of the values, and then a clock specification file is generated. In the other commands, a template is created and the user must modify the values.
order given below. Individual statements within each section can appear in any order. The contents in the specification file are
Timing constraint file
Naming attributes (optional)
Macro model data
Clock grouping data (optional)
Attributes used by the routing tool (optional)
Requirements for manual CTS or automatic, gated CTS
Diagram: CLOCK SPEC FILE containing Timing Constraint File, Naming Attributes, Macro Model Data, Clock Grouping Data, Router Attributes, and Requirements for manual/automatic and Gated CTS
Timing Constraint File
TimingConstraintFile /path/cts.tcl
Naming Attributes
Allows user to customize the name delimiter that CTS uses when inserting buffers and updating clock root and net names
The UseSingleDelim command instructs CTS to use a single character instead of two characters for the given delimiter.
Default: clk__L3_I2
With UseSingleDelim YES: clk_L3_I2
Example
UseSingleDelim YES
NameDelimiter #
has delays have to specified for
pins.
Example
MacroModel pin m1/clk 20ps 18ps 20ps 18ps 30ff
Clock Grouping Data
Specifies two or more clock domains for which you want CTS to balance the skew
The arguments are the clock root pin names.
Example
ClkGroup
+ U1/CGEN_1
+ U2/CGEN_2
Router Attributes
Defines attributes that CTS passes to the router for routing the clock net.
Example
RouteTypeName CK1
NonDefaultRule rule1
PreferredExtraSpace 1
TopPreferredExtraSpace 1
Requirements for Manual/Automatic CTS and Gated CTS
All of the “optional” clock tree specification sections were mentioned in the previous sections.
In the next few slides, we will discuss the requirements for the
Manual CTS
Clock Gated CTS
ClockNetName CK
LevelNumber 2
LevelSpec 1 2 BUFX2
LevelSpec 2 16 BUFX3
PostOpt YES
OptAddBuffer YES
End
Diagram: clock net CK drives 2 BUFX2 buffers at level 1 and 16 BUFX3 buffers at level 2, which drive the flip-flops
AutoCTSRootPin clk_out/Y
MaxDelay 5ns
MinDelay 0ns
MaxFanout 30
SinkMaxTran 500ps
BufMaxTran 500ps
MaxSkew 600ps
NoGating NO
MaxDepth 10
RouteType CLK1_ROUTE
DetailReport YES
RouteClkNet YES
PostOpt YES
OptAddBuffer YES
Buffer BUFX2 BUFX4 BUFX8 INVX1 INVX2 INVX4
End
Diagram: clock tree rooted at clk_out/Y driving CTS Buffers 1 to 5 toward the flip-flops and pins FPU/CORE/A and XPU/CAM/C (FPU/CORE can be a std cell), annotated with Phase Delay 1, Phase Delay 2, Max Skew, Sink Input Transition Time, and Buffer Input Transition Time
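For sanity-checking a specification section before running the tool, a keyword-per-line format like this is easy to read programmatically. The parser below is a hypothetical helper written for illustration; it is not part of any Cadence tool and handles only the simple statements shown in this example.

```python
SPEC = """\
AutoCTSRootPin clk_out/Y
MaxDelay 5ns
MinDelay 0ns
MaxFanout 30
SinkMaxTran 500ps
BufMaxTran 500ps
MaxSkew 600ps
NoGating NO
MaxDepth 10
Buffer BUFX2 BUFX4 BUFX8 INVX1 INVX2 INVX4
End
"""

UNITS_PS = {"ps": 1.0, "ns": 1000.0}

def parse_spec(text):
    """Map each statement keyword to its list of arguments, stopping at End."""
    spec = {}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "End":
            break
        spec[words[0]] = words[1:]
    return spec

def to_ps(value):
    """Convert a value such as '5ns' or '600ps' to picoseconds."""
    for unit, scale in UNITS_PS.items():
        if value.endswith(unit):
            return float(value[:-len(unit)]) * scale
    raise ValueError(f"no known unit in {value!r}")

spec = parse_spec(SPEC)
max_delay_ps = to_ps(spec["MaxDelay"][0])
min_delay_ps = to_ps(spec["MinDelay"][0])
skew_ps = to_ps(spec["MaxSkew"][0])
```

A quick check such as min_delay_ps <= max_delay_ps catches inconsistent constraints early, before a long CTS run.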
AutoCTSRootPin: pin name from which to start tracing.
MaxDelay: maximum delay. If this statement is not specified, the tool automatically sets the delay to 10 ns.
MinDelay: minimum delay. If this statement is not specified, the tool automatically sets the delay to 0.0 ns.
MaxFanout: maximum fanout connected to the clock buffer at the last stage of the clock tree.
SinkMaxTran: maximum transition time constraint for the sinks. The maximum value is 10,000 ns. The default value is 400 ps.
BufMaxTran: maximum transition time constraint for buffers. The maximum value is 10,000 ns. The default value is 400 ps.
MaxSkew: maximum skew between sinks (clock pins). The default value is 300 ps.
The lower the skew, the better the clock tree, and hence the better overall timing performance for the design.
NoGating: controls whether CTS traces through gates
Rising: Stops tracing through a gate (including buffers and inverters) and treats the gate as a rising-edge-triggered flip-flop clock pin.
Falling: Stops tracing through a gate (including buffers and inverters) and treats the gate as a falling-edge-triggered flip-flop clock pin.
NO: Default behavior for gated-clock designs. Allows CTS to trace through clock gating logic.
MaxDepth: maximum number of levels for clock tree tracing. The default value is 1024, i.e., CTS limits the number of levels of clock tree tracing to 1024.
Tracing is done by CTS (before inserting buffers) to understand the logical structure of the design and see that there are no feedback loops.
RouteType: the route type for which routing attributes are being defined.
DetailReport: generates a detailed report, which includes timing information for every component in the design. Default behavior is not to generate a detailed report.
RouteClkNet: routes the clock nets. Default behavior is not to route the clock net.
PostOpt: performs optimization, i.e., it resizes buffers or inverters, refines placements, and corrects routing for signal and clock wires. Default: YES.
OptAddBuffer: adds buffers during optimization. Effective only if PostOpt YES is specified.
Tries to meet the trigger edge skew constraints as defined in the clock tree specification file.
Buffer: buffers and inverters to use during automatic gated CTS.
How to create a clock tree specification file
CTS Report
After running CTS on a design, a report is created containing information about the clock tree constructed. The report contains several sections.
Library Information: The process information used to create the clock tree.
Example
#
#
#
Complete Clock Tree Timing Report
CLOCK: cgen/i_5/Y
#
#
#
Mode: preRoute
Library Name : slow
Operating Condition : slow
# Process : 1
# Voltage : 1.62
# Temperature : 125
Example
Nr. of Subtrees : 1
Nr. of Sinks : 343
Nr. of Buffer : 9
Nr. of Level (including gates) : 2
Max trig. edge delay at sink(F): TPRAM/mod1/CK 477.7(ps)
Min trig. edge delay at sink(R): TPRAM/mod2/CK 459.6(ps)
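The trigger edge skew follows directly from the two extreme sink delays in this example. The sketch below assumes skew is simply the largest minus the smallest sink delay; the 250 ps limit is the required skew value printed in the same report, and the function name is hypothetical.

```python
def trigger_edge_skew(sink_delays_ps):
    """Skew = largest minus smallest clock arrival time over all sinks."""
    return max(sink_delays_ps.values()) - min(sink_delays_ps.values())

# The two extreme sinks from the example report (delays in ps).
delays = {"TPRAM/mod1/CK": 477.7, "TPRAM/mod2/CK": 459.6}

skew = trigger_edge_skew(delays)      # about 18.1 ps
meets_target = skew <= 250.0          # compare against the required skew
```

Only the slowest and fastest sinks matter for this figure, which is why the report lists exactly those two pins.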
Example
(Actual) (Required)
Rise Phase Delay : 459.6~477.7(ps) 0~5000(ps)
Fall Phase Delay : 432.8~446.7(ps) 0~5000(ps)
Trig. Edge Skew : 18.1(ps) 250(ps)
Rise Skew : 18.1(ps)
Fall Skew : 13.9(ps)
Max. Rise Buffer Tran : 238.5(ps) 550(ps)
Max. Fall Buffer Tran : 141.4(ps) 550(ps)
Max. Rise Sink Tran : 366.2(ps) 550(ps)
Max. Fall Sink Tran : 204.5(ps) 550(ps)
Min. Rise Buffer Tran : 120(ps) 0(ps)
Min. Fall Buffer Tran : 120(ps) 0(ps)
Min. Rise Sink Tran : 340.6(ps) 0(ps)
Min. Fall Sink Tran : 192(ps) 0(ps)
Example
***** Max Transition Time Violation *****
Pin Name (Actual) (Required)
-----------------------------------------------------------------
reg/CK [406 353.5](ps) 400(ps)
reg2/CK [406 353.4](ps) 400(ps)
clk0__L6_I2/A [345.5 288.1](ps) 300(ps)
clk0__L7_I4/A [346.2 296.3](ps) 300(ps)
clk0__L9_I11/A [351.6 299.9](ps) 300(ps)
clk0__L9_I10/A [361.5 305.9](ps) 300(ps)
ca
6/16/08 BD03: Digital Physical Design 411
e
Example
cgen/i_5/Y delay[0 0] ( CK__L1_I0/A )
********** Skew Distribution **********
LEVEL 1 Buffer:
Input Delay Range Nr of Buffers
[0.6 0.6] 1
(max, min, avg, skew) = (0.6(ps) 0.6(ps) 0.6(ps) 0(ps))
-------------------------------------------------------
Output Delay Range Nr of Buffers
[195.5 195.5] 1
(max, min, avg, skew) = (195.5(ps) 195.5(ps) 195.5(ps) 0(ps))
LEVEL 2 Buffer:
-------------------------------------------------------
Input Delay Range Nr of Buffers
[212.8 212.8] 1
(max, min, avg, skew) = (212.8(ps) 212.8(ps) 212.8(ps) 0(ps))
Significant power can be wasted in transitions within blocks, even when their
Low-Power Clocking Technique
Gated clocks
Involves adding logic gates to the clock distribution tree
Prevents switching in the areas of the chip not being used
Exact savings are very design dependent, but around 20-30% is often achievable.
[Figure: clock source feeding flip-flops (FF) directly and through a gated clock section]
Summary
A clock signal is needed to synchronize all memory elements in a chip.
A clock tree has to be created to provide the clock signal with the least amount of skew and insertion delay.
Skew affects the amount of clock period available for the combinational logic.
flops to correct datapath timing violations.
The clock tree has to provide an acceptable input transition to all the flip-flops.
Low-power designs make use of gated clocks.
Testing Your Understanding
True or false
1. Clock tree adds more wire into the design as compared to a clock mesh.
2. Propagated clock signal arrives at all flip-flops within a design at the
4. A clock tree specification file is needed in the case of both manual
5. Default behavior of clock tree synthesis is to only place the clock buffers into the design and not route them.
Routing
Module 7
[Figure: cell layout showing DFF, BUF, NAND, NOR, INV, and CKBUF instances]
Module Objectives
In this module, you will be able to
Analyze benefits of timing-driven versus congestion-driven routing
Explain how trial routing works and its benefits
fixing
Discussion Question
Recall the flowchart diagram of the design flow steps to take an idea to product (chip).
In what part of the flow does routing occur?
[Figure: design flow chart in which the steps and inputs/outputs around Routing are left blank]
Topics in This Module
Routing
Various types of routing
Routing
Definition
Routing inputs and outputs
Router
Types of routers
Routing tracks
Goals of routing
Steps involved in routing
Congestion
What Is Routing?
After placement of the individual standard cells and macros, the connections between the pins of the cells need to be formed using metal wires and vias. All wires connecting the placed components have to obey the design rules.
Definition: Process of connecting the pins of the standard cells, macros, and IOs of a digital design, using specific metal layers in the process technology, to match the schematic.
Inputs
Timing library (.lib) contains the timing information for each discrete logic gate or macro
Technology library (LEF) contains information about the routing layers and their rules
Physical cell library (LEF) contains information about the shape and connectivity of the technology library cells
Placement information such as a DEF file
Routing guides from clock tree synthesis (CTS) (optional)
Outputs
Routed design
Congestion table
[Figure: Netlist, Tech File, DEF File, and Phys Lib feed Routing, which produces the Routed Design and the Congestion table]
[Figure: design flow from RTL through logic synthesis (gates), floorplanning, power planning, placement, and route to GDSII, with timing closure, static timing analysis, and test alongside]
Router
To handle the various cost functions and constraints of deep-submicron layouts, a router needs the capability to handle
Variable wire widths
Variable spacing requirements
Shielding and interleaving
Minimum area rules
Types of Routers: Grid-Based
Most commonly used router, because it is fast and mature
Performs well for flat designs less than 3 million gates and for 130 nm and larger designs
Used for block-based designs
Relatively high-speed router
Superimposes a mesh-like grid running horizontally and vertically over the routing area
Each vertical and horizontal grid intersection point on the mesh is maintained as a pointer in memory
The larger the design grows or the smaller the process geometry, the more grid points need to be allocated in memory and the more time it takes for routing
A trial-router is a type of grid-based router used to quickly perform
global and detail routing to estimate congestion and timing at the early
stages of the physical implementation flow
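The memory cost of a grid-based router can be estimated by counting grid intersections. The sketch below assumes a uniform pitch in both directions and a hypothetical 1 mm x 1 mm routing area; the numbers are illustrative only and are not taken from any real technology.

```python
def grid_points(width_nm, height_nm, pitch_nm):
    """Count the intersections of a uniform routing grid (integer nanometers)."""
    cols = width_nm // pitch_nm + 1
    rows = height_nm // pitch_nm + 1
    return cols * rows


# Hypothetical 1 mm x 1 mm area at two routing pitches.
coarse = grid_points(1_000_000, 1_000_000, 400)
fine = grid_points(1_000_000, 1_000_000, 200)
print(coarse, fine, round(fine / coarse, 2))
```

Halving the pitch roughly quadruples the number of grid points per layer, which is why memory use and run time grow quickly as the process geometry shrinks.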
Types of Routers: Shape-Based
Does not need to adhere to the concept of grid and so is not limited by grid restrictions
Preferred solution for top-level routing and can handle complex and custom requirements
Types of Routers: Graph-Based
Combines the performance characteristics of a grid-based router with the flexibility of a shape-based router
Fast tool capable of handling all aspects of routing complex multi-million gate designs, both at block level and top level
Views a design similar to a grid-based router in that there are grid lines in both the vertical and horizontal direction; however, it considers these grids only as a guideline for routing
Does not require that every grid intersection on the design be allocated a pointer in memory; only the grid points in the vicinity of the routing task are considered as needed
Through efficient memory handling, graph-based routers can handle significantly larger design sizes
Types of Routers
[Figure: router types plotted by speed and capacity versus flexibility]
Graph-based router: super threading, multi-CPU; 100-million-gate SoC designs with hierarchy; 65-nm variable pitch
Grid-based routers: designs of one million standard cells; best for flat 130 nm and above
Shape-based routers: 60-80K nets; structured custom (top level)
In grid-based routing systems, design rules determine the minimum center-to-center distance (pitch) for each metal layer
Congestion occurs if there are more wires to be routed than available tracks
A detailed routing track is a track for actual wire locations
A global routing track is a coarser track used for global routing
[Figure: detailed routing tracks and coarser global routing tracks]
Goals of Routing
Responsible for functionally connecting all signal nets, power nets, and buses in a design
Route the design quickly and be free from design rule check (DRC), layout versus schematic (LVS), and signal integrity (SI) errors
Effectively meet design for manufacturability and overall timing specifications
Steps Involved in Routing
Global routing
Assigns nets to specific metal layers and global routing cells
Tries to avoid congested global cells while minimizing detours
Tries to avoid prerouted power and ground signal, placement, and routing blockages
Track assignment
Assigns each net to a specific track
Tries to avoid large number of vias
Operates on the entire design at once
Detail routing
Tries to fix DRC violations using a fixed-size, small area known as SBox
Traverses the whole design box by box until entire routing pass is complete
Search and repair
Fixes any shorts or violations that are present
What Is Congestion?
Congestion occurs when
Design is densely routed
More wires are needed at a location than the number of available tracks
trial route
Analyzing Congestion
Actions to consider:
Block placements can be adjusted to make sure that connecting pins face each other.
Check for obstructions that may cause the congestion in the area.
Read the log files for congestion information during global route, as well as violation and iteration information during detail route.
#
#                OverCon        OverCon        OverCon        OverCon
#                #Gcell         #Gcell         #Gcell         #Gcell       %Gcell
# Layer          (1-2)          (3-4)          (5-6)          (7-17)       OverCon
# ---------------------------------------------------------------------------------
# Metal 1      1625(2.35%)      34(0.05%)       0(0.00%)       0(0.00%)    (2.40%)
# Metal 2     11546(16.7%)    6353(9.19%)    4728(6.84%)    3787(5.48%)    (38.2%)
# Metal 3      8500(12.3%)     904(1.31%)      37(0.05%)       1(0.00%)    (13.7%)
# Metal 4     14951(21.6%)     764(1.11%)      20(0.03%)       0(0.00%)    (22.8%)
# Metal 5      8473(12.3%)      37(0.05%)       0(0.00%)       0(0.00%)    (12.3%)
# Metal 6       854(1.24%)       0(0.00%)       0(0.00%)       0(0.00%)    (1.24%)
# ---------------------------------------------------------------------------------
# Total       45949(11.1%)    8092(1.95%)    4785(1.15%)    3788(0.91%)    (15.1%)
#
# The worst congested Gcell OverCon (routing demand over resource in number of tracks) = 17
Worst case is on Metal2.
Note: Overflow/OverCon = (Demand – Supply) per gcell
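The note above defines overflow as demand minus supply per gcell, and the table groups gcells into overflow ranges (1-2, 3-4, 5-6, 7-17). A minimal sketch of that bookkeeping follows. The demand and supply numbers are made up for illustration, and clamping negative values to zero is an assumption, on the reasoning that a gcell with spare tracks is not over-congested.

```python
from collections import Counter

# Overflow ranges copied from the column headings of the table above.
BUCKETS = [(1, 2), (3, 4), (5, 6), (7, 17)]


def gcell_overflow(demand_tracks, supply_tracks):
    """Overflow = demand - supply, clamped at zero (assumption)."""
    return max(0, demand_tracks - supply_tracks)


def bucket_overflows(gcells):
    """Count over-congested gcells per overflow range."""
    counts = Counter()
    for demand, supply in gcells:
        over = gcell_overflow(demand, supply)
        for low, high in BUCKETS:
            if low <= over <= high:
                counts[(low, high)] += 1
    return counts


# Hypothetical (demand, supply) pairs for four gcells on one layer.
sample = [(12, 10), (27, 10), (8, 10), (14, 10)]
histogram = bucket_overflows(sample)
print(dict(histogram))
```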
Run the search and repair up to the 19th iteration.
Check the log file to see on which layers the violations occur.
Check the violations graphically if there are lots of violations (>1000).
…
# completing 100% with 98663 violations
# number of violations = 98663
#Complete Detail Routing.
#Total wire length = 559006793 um.
#Total half perimeter of net bounding box = 471147199 um.
#Total wire length on LAYER MT1 = 18600362 um.
…
#Total number of vias = 31662170
#Total number of vias on LAYER MT1 to MT2 = 11200471
…
#Total number of DRC violations = 98663
#Total number of violations on LAYER MT1 = 61945
#Total number of violations on LAYER MT2 = 5161
#Total number of violations on LAYER MT3 = 280
#Total number of violations on LAYER MT4 = 258
#Total number of violations on LAYER MT5 = 124
#Total number of violations on LAYER MT6 = 414
#Total number of violations on LAYER MT7 = 30481
…
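Checking on which layers the violations occur can be scripted. The sketch below assumes the log lines have exactly the form shown above; the regular expression and the function name violations_by_layer are illustrative and are not part of any router.

```python
import re

# Excerpt of the detail-route log shown above.
LOG = """\
#Total number of DRC violations = 98663
#Total number of violations on LAYER MT1 = 61945
#Total number of violations on LAYER MT2 = 5161
#Total number of violations on LAYER MT3 = 280
#Total number of violations on LAYER MT4 = 258
#Total number of violations on LAYER MT5 = 124
#Total number of violations on LAYER MT6 = 414
#Total number of violations on LAYER MT7 = 30481
"""

PATTERN = re.compile(r"#Total number of violations on LAYER (\w+) = (\d+)")


def violations_by_layer(log_text):
    """Map each layer name to its violation count."""
    return {layer: int(count) for layer, count in PATTERN.findall(log_text)}


counts = violations_by_layer(LOG)
worst_layer = max(counts, key=counts.get)
print(worst_layer, counts[worst_layer], sum(counts.values()))
```

In this log the per-layer counts add up to the reported total, and MT1 and MT7 account for most of the violations, so those two layers are the natural place to start looking.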
Various Types of Routing
Trial route
Global routing
Detail routing
Timing-driven routing
Congestion-driven routing
Incremental routing
Process Antenna Effect (PAE)-aware routing
SI-aware routing
Clock routing
Super-threading routing
Diagonal routing
Estimate and view routing congestion: Produces a congestion map that is viewed to get early feedback on whether a design is routable
Estimate parasitic values for optimization and timing analysis: Creates actual wires to get good representation of RC and coupling
Trial Route Effort Level
Prototyping
Runs quickly to gauge the feasibility of the netlist
Medium effort
Default selection
Components in the design might not be routed at legal locations
High effort
For additional iteration to lower congestion
Low effort
For quick routing, and it completes without congestion detouring
At this effort level, you throw away the route information
This mode is typically used only when you partition the design
Use “Prototype” or “Low” effort if you want to have a very quick look at the routability of the design. Use “Medium” effort in most cases, and “High” effort if “Medium” is showing congestion.
Trial Route
Advantages
Routes the design quickly
Estimates congestion and parasitic data early in the design cycle
amount of congestion in each metal layer
Disadvantages
Does not fix DRC violations or give DRC clean routing results
Routes are only used to estimate parasitic values for timing analysis and not signal integrity analysis
Discussion Questions
What are the benefits of running trial route?
What issues can be predicted by running trial route?
Does not create actual routing wires
Commonly used in cell-based design, chip assembly, and datapath
Also used in floorplanning and placement
Global Routing Goals
Minimize the wire length
Total wire length calculated by global router should be within a few percentage points of that estimated by the placer
between adjacent global routing cells (gcells) on a specific layer
Optimize routes for timing and signal integrity
Tries to meet hold and setup timing
Minimizes design rule violations
nets to the gcells
Router attempts to find the shortest path through the gcells
No actual connections are made
No nets are assigned to specific
accommodate
[Figure: nets routed from Start to End through the gcell grid]
Congestion map uses colors to indicate whether there are too few, too many, or the correct number of nets assigned to the gcells
gcells are marked over-congested if router assigns too many nets to a gcell
[Figure: nets routed from Start to End across gcells; one gcell edge has 3 crossings and another has 2 crossings]
Trial Route View Global Route View
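The idea of finding a shortest path through the gcells and then counting how many nets cross each gcell edge can be sketched as follows. This is an illustration only, not how any production global router works: it uses a plain breadth-first search on an unobstructed grid, and the function names and the edge capacity are assumptions.

```python
from collections import Counter, deque


def shortest_gcell_path(start, end, cols, rows):
    """Breadth-first search over the gcell grid; returns the gcells visited."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            break
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < cols and 0 <= nxt[1] < rows and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    path, cell = [], end
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


def edge_crossings(nets, cols, rows):
    """Count how many nets cross each edge between adjacent gcells."""
    crossings = Counter()
    for start, end in nets:
        path = shortest_gcell_path(start, end, cols, rows)
        for a, b in zip(path, path[1:]):
            crossings[frozenset((a, b))] += 1
    return crossings


def overflowed_edges(crossings, capacity):
    """Return edges whose crossing count exceeds the assumed capacity."""
    return {edge: n - capacity for edge, n in crossings.items() if n > capacity}


# Three hypothetical nets that all run along the same row of three gcells.
nets = [((0, 0), (2, 0))] * 3
crossings = edge_crossings(nets, cols=3, rows=1)
print(overflowed_edges(crossings, capacity=2))
```

With three nets sharing each edge and an assumed capacity of two crossings, both edges are reported as over capacity by one, which is the situation a congestion map would highlight.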
Must understand most or all design rules
Goal is to complete all of the required interconnects without violations
All nets will be routed, even if they contain violations (It is better to have a route with a violation, than no route at all.)
Detail Routing Steps
Router divides the chip into areas called switch boxes (SBoxes)
SBoxes align with gcell boundaries
Router follows global routing plan
Lays down actual wires that connect the pins to the corresponding nets
Creates shorts or spacing violations rather than leave unconnected nets
Locates the shorts and spacing violations
Reroutes affected areas to eliminate as many violations as possible
Runs post-route optimization
Runs rigorous search-and-repair steps
Stops once it cannot make further progress on routing the design
Timing-Driven Routing
Routing along the timing-critical path is given priority
Creates shorter and faster connections along the critical path
Reduces routing congestion problems for critical paths
Does not adversely impact timing of non-critical paths
Input files needed for timing-driven routing
Physical libraries in LEF
Timing library in .lib format
Timing constraints in .sdc format or a timing graph
Extended capacitance table
Verilog Netlist
Placed design in DEF
Congestion-Driven Routing
Congestion reduction is given the highest priority
Nets that are in the congested area are spread apart and routed through other areas
Discussion Questions
Why would you run timing-driven routing?
Why would you run congestion-driven routing?
What if your design is both congested and not meeting timing? Which routing type would you run first and why?
What Is Incremental Routing?
Provides an incremental rip-up and reroute capability
Reroutes partial routes and nets without routes
Might use dangling paths to complete routes, but removes dangling wires left over from global routing
Keeps connectivity within the bounding box, but does not constrain layers or positions
The router might change the routing path of another net and route it on a different layer or in a different position.
The router does not support re-routing of wires with the FIXED keyword. Change FIXED to ROUTED to reroute these wires.
PAE-Aware Routing
During manufacturing, static charge builds up on metal traces
Metal with static charge accumulated on it, when connected, will discharge onto a gate, passing high current through it.
The discharge can damage the oxide that insulates the gate and cause the chip to fail.
Antenna ratio is the maximum allowable ratio of metal area to gate area.
The router calculates antenna ratio to determine the extent of PAE.
Process antenna violations are fixed when the router finds a net with an antenna ratio for a specified layer that exceeds the maximum allowed value.
Router fixes process antenna violations by
Inserting diodes to provide alternate path to discharge static charge and protect the gate
Changing (jogging) the routing layers connected to a gate to decrease the
area of a metal layer connected to a gate to meet the antenna ratio
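The antenna check described above reduces to a ratio comparison per layer. The sketch below uses made-up areas and a made-up limit of 400; real limits come from the technology's antenna rules, and the function names are hypothetical.

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Ratio of metal area connected to a gate to the gate area itself."""
    return metal_area_um2 / gate_area_um2


def has_antenna_violation(metal_area_um2, gate_area_um2, max_ratio):
    """True when the ratio exceeds the maximum allowed value for the layer."""
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


MAX_RATIO = 400.0  # assumed limit for one layer, for illustration only

# A long wire on one layer connected to a small gate (hypothetical areas).
before = has_antenna_violation(500.0, 1.0, MAX_RATIO)

# After jogging part of the wire to another layer, less metal on this
# layer is connected to the gate, so the ratio drops below the limit.
after = has_antenna_violation(150.0, 1.0, MAX_RATIO)
print(before, after)
```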
Bridging (breaking antenna by hopping to higher layer)
Extra wiring
Congestion
More vias are created
Crosstalk-induced delay changes
Functional failures caused by crosstalk glitches
Caused by
Coupling capacitance
Decreased interconnect pitch and feature size
Higher clock frequencies
Lower supply voltages
SI-Aware Routing
Crosstalk effects such as glitch and delay are measured after the physical wires are made available.
Router tries to reduce crosstalk between wires.
Parallel wire minimization: Limiting the distance that two wires travel adjacent to each other
Layer switching: Changing the track assignment for a wire so that potential victim nets can be moved away from a strongly driven signal net
Net shielding: Using power and ground lines to shield critical high-speed signals such as clocks
Track reassignment: Assign tracks to parallel wires that are further apart with in-between tracks assigned to shorter, less noise-sensitive wires
Soft spacing: Making use of available free space to spread wire segments apart
Discussion Questions
Why would you use incremental routing?
For process antennas, how is the router constrained to fix these?
How can a router make choices that will reduce signal integrity effects?
Clock Routing
Usually routed the same way as signals, but we can choose to route clock nets by themselves before routing other nets.
Clock nets are given priority during global routing.
When clock nets are routed, one track of spacing can be added around these nets to reduce coupling capacitance.
Shielding can also be added to clock nets for additional signal integrity.
Clock routes can be marked as fixed, so that post-route optimization will not reroute the clocks and alter the clock skew, timing, etc.
Router widens wires where resources are available.
Does not add DRC violations
Does not add antenna violations
Super-Threading Routing
Portions of the design flow can be accelerated using multiple-CPU processing. There are three modes:
Multiple threading
Job is divided into several threads
Multiple processors in a single machine process each thread concurrently
Distributed processing
Job is processed by two or more networked computers running concurrently
Super threading
Combination of multithreading and distributed processing
[Figure: a job split into threads and processors for multiple threading, distributed processing, and super threading]
Boosts routing performance on 600K to 400M gate designs by up to 10X
Reduces design cycles significantly without sacrificing quality
Tasks are partitioned among different CPUs automatically by the router
Speedup is nearly linear as the number of CPUs grows
What Is Diagonal Routing?
Some routers take advantage of 45-degree “diagonal” routes on certain metal layers.
M1 and M2 are “orthogonal” so that the connections to the standard cells are preserved, while M7 and M8 (the top layers) are also orthogonal for power grid creation.
The middle layers can be 45 degrees offset and alternate direction between metal layers.
M8 - Vertical
M7 - Horizontal
M6 - 45 Degree Left
M5 - 45 Degree Right
M4 - 45 Degree Left
M3 - 45 Degree Right
M2 - Vertical
M1 - Horizontal
the design because of the routing efficiency
Cons
Must have a special library and vendor who will accept the diagonal routes
Must have special tools for
[Figure: metal layer stack, M1 through M8, with the routing directions listed above]
Global Route
Runs on the entire design.
Finds generalized pathways without laying down actual wires.
Iterative passes are made to optimize global routing, shorten wire length, and
Congestion map is updated.
Detail Route
Can route the entire design, an area, or selected nets.
Lays down physical wires based on the global routing plan.
Fixes DRC violations during search and repair routing.
If antenna rules are included in the LEF, antenna repair will also be done during detail routing.
Timing-Driven
Router routes critical nets to meet timing constraint.
A critical net will be forced to be
May create congestion if many critical nets have to be forced into a small channel.
Congestion-Driven
Router routes nets keeping low congestion as a high priority.
Nets will be forced to be spread apart from a heavily congested area.
Summary
After placement, a trial route is run to get an estimate of congestion and parasitic values.
Early detailed routing provides physical information necessary for prevention of problems for physical synthesis.
Congestion is displayed as a red diamond after trial route, and colored lines after global route.
Clock nets are routed first and fixed into position so that the router does not alter them in subsequent runs.
The number of wires assigned to a gcell should not exceed the number of tracks available.
Testing Your Understanding
True or false
1. Congestion map is created after running detail routing.
2. Shape-based routers are limited to small designs and are the preferred solution for top-level routing.
3. Global router provides guidance to the detailed router.
4. Detailed routing stops once the congestion map is created.
Learning Activity
In this activity, you will
Study metrics from several routing log files
Present your results to the class
20 minutes for activity
10 minutes for debriefing
Power Consumption and
Power Grid Analysis
Module 8
Module Objectives
In this module, you will
Identify the inputs and outputs of power-consumption and power grid analysis tools
Explain the three components of power (leakage, switching, internal)
Identify the types of power grid analysis, the difference between static and dynamic power grid analysis, and what each is used for
Recognize low-power design issues and apply three power-saving design techniques
Discussion Questions
What affects power consumption in a chip?
How does power affect the cost of a chip?
Topics in This Module
Power consumption and analysis (PowerMeter power calculation functionality)
Low-power design
Power grid analysis (VoltageStorm® power and power rail verification)
Inputs and outputs for power consumption calculation
What Is Power Consumption?
Power consumption is a critical design criterion. Today, for most system-on-a-chip (SoC) designs, the power budget is one of the most important design goals of the project.
Definition: Power consumption is the amount of energy over time that must be supplied to a circuit to maintain normal operation. Power consumption is measured in watts (W).
Example: The increasing speed and complexity of today’s microprocessor chips have resulted in a significant increase in the power requirement, which determines the battery life in hours for portable devices.
reasonably small die
Limits of what packaging, cooling, and other infrastructure can support exceeded
Battery life has declined as features have been added faster than power (per feature) has been reduced.
[Figure: power dissipation (W, 0 to 100) versus technology node (250, 180, 130, 90, 70 nm). Source: Intel]
Deep submicron technology, 90 nm and below
Leakage current is increasing dramatically.
Microprocessor chips can dissipate up to 100-150W of power.
Power density causes large number of local hot spots on the die.
Poses reliability problems (mean time to failure decreases exponentially with temperature)
Timing degrades and leakage increases with temperature
These problems are all expected to get worse as we move to the next technology nodes.
Extends battery life in portable systems
Reduces system fan noise (on some models)
Simplifies power supply and delivery
Dynamic power component: Related to charging and discharging of load capacitance and due to a path from Vdd to ground
Switching power: Power consumed through charge and discharge of gate capacitance. The total gate capacitance consists of the sum of the capacitance of internal gate nodes and capacitance of the gate output load.
Short circuit power: Power consumed when both N and P devices are ON at the same time. Current path established from power rail to ground. It is a function of output load and input slew.
Example: The ASIC (Application Specific Integrated Circuit) vendor of a design library (standard cells and macros) provides its customer with a Liberty (.lib) version of their cells, which apart from timing information, contains power information that the power analysis tool can use to calculate leakage and active power consumption for the cells.
Example: power information in a .lib file
cell (INVXL) {
  cell_footprint : inv;
  area : 6.6528;
  pin(A) {
    direction : input;
    capacitance : 0.00270;
  }
  pin(Y) {
    direction : output;
    capacitance : 0.0;
    function : "(!A)";
    internal_power() {
      related_pin : "A";
      rise_power(energy_template_7x7) {
        index_1 ("0.0250, 0.0800, 0.3000, 0.7000, 1.2000, 1.7000, 2.3000");
        index_2 ("0.00018, 0.01050, 0.01925, 0.04200, 0.07350, 0.11550, 0.15575");
        values ( ……… );
      }
      fall_power(energy_template_7x7) {
        index_1 ("0.0250, 0.0800, 0.3000, 0.7000, 1.2000, 1.7000, 2.3000");
        index_2 ("0.00018, 0.01050, 0.01925, 0.04200, 0.07350, 0.11550, 0.15575");
        values ( ……… );
      }
    }
    timing() {
      …
    }
    max_capacitance : 0.15575;
  }
  cell_leakage_power : 0.0173;
}
Functional information (.lib)
Pin capacitances (.lib)
Leakage power (.lib)
Internal power tables (.lib)
Internal decoupling cap (.cl)
Physical size and location of power ports (.cl)
Internal power net resistance (.cl)
Tap currents (.cl)
Function of a power analysis tool
Calculates instance-based static and dynamic power consumption
Runs in two modes:
Vector driven: Use actual switching activity from a VCD file
Vector-less: Probabilistically project the activity throughout a design
We use the results of the power consumption tool to perform static and/or dynamic power grid analysis
Produce reports on the power consumed by each cell, cell type, or hierarchical block in the design
[Figure: .libs, .cl, SPEF, SDC, TWF, and VCD inputs feed the Power Consumption Tool, whose power consumption results go to the Power Grid Analysis Tool]
Within a given CLK period, how often will an input switch in
[Figure: waveforms of the clock cycles and Net A]
In the example, net A switches two times but clock switches six times.
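Counting toggles from sampled waveforms makes the comparison concrete. The sample lists below are invented so that they reproduce the counts quoted for the example (six clock transitions, two transitions on net A); expressing activity as the ratio of the two counts is an assumption made for illustration, not a definition taken from a specific tool.

```python
def count_transitions(samples):
    """Count value changes between consecutive samples of a signal."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


# Hypothetical sampled waveforms, one sample per half clock period.
clock = [0, 1, 0, 1, 0, 1, 0]
net_a = [0, 0, 1, 1, 1, 0, 0]

clock_toggles = count_transitions(clock)
net_toggles = count_transitions(net_a)

# Activity of net A relative to the clock (assumed definition).
relative_activity = net_toggles / clock_toggles
print(clock_toggles, net_toggles, round(relative_activity, 3))
```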
language and/or DEF (tool dependent)
Power characterized libraries in tool-specific format
Timing libraries in Liberty (.lib) format
Timing constraints in SDC format
Extraction data in SPEF format
Timing windows file (TWF)
Value-change-dump file (VCD)
Output
Textual output
Reports on the power consumed by each cell or block (.pwr) file.
Graphical output
Instance-based power and power density
Power consumption of the clock distribution network
[Figure: Gates + SDC, VCD, TWF, SPEF, DEF, logical libraries, and power libraries feed Power Analysis, which produces reports]
Static Power Consumption
Silicon devices are not ideal switches.
Static power dissipation is the power that is lost while circuit signals are not actively switching.
This power dissipation includes leakage and standby power dissipation (i.e., leakage power when voltage is applied even if circuit is not switching).
Static power consumption is the summation of leakage, state-dependent leakage, and averaging of internal and switching over time.
Computes average power consumption based upon various assumptions
It is a full-chip and instance-based power consumption analysis
Less accurate than simulation
Hard to model real delays
Probabilities model the environment in a less accurate way than simulation vectors
How Is Static Power Analysis Done?
No simulation is done to determine actual net activity.
Vector-independent (probabilistic activity-based with optional VCD)
By understanding the logic functionality and the activity at the input pins, the activity at the output pins is predicted.
Analysis types
Area-based
A power per unit area is assumed and multiplied by the total die area
Easy, but not very accurate and is used in floorplanning
Cell-based
Power for each cell is taken from the library entry
More accurate and is used by synthesis tools prior to place and route
Instance-based
Takes in consideration output load of each instance
Calculates power from library tables
Most accurate, but requires information from place and route
[Figure: .libs, .cl, SPEF, SDC, TWF, and VCD inputs feed the Power Consumption Tool, which produces power consumption data and static power analysis reports]
Dynamic Power Consumption
$P_{dynamic} = \alpha \times C_L \times V_{DD}^2 \times f$
Where,
α – Switching activity
C_L – Load capacitance
V_DD – Supply voltage
f – Operating frequency
A circuit does not draw constant current. Current draw increases when a cell is switching.
Dynamic power consists of power dissipated inside a cell (mostly due to short-circuit current during switching) and power dissipated to charge/discharge net capacitance.
Dynamic power is a function of voltage, toggle rate, and net loading.
Dynamic power consumption is the power of each instance over time, taking into account simultaneous switching activity.
Timing Window File (TWF) provides windows when nets are switching relative to clock edges; default input activity or VCD provides switching activity (toggle rate).
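The dynamic power relation (switching activity times load capacitance times supply voltage squared times frequency) can be evaluated directly. The values below are hypothetical, chosen only to show the units and the quadratic dependence on supply voltage; they do not describe any real cell.

```python
def dynamic_power(alpha, load_cap_f, vdd_v, freq_hz):
    """P = alpha * C_L * VDD^2 * f, in watts."""
    return alpha * load_cap_f * vdd_v ** 2 * freq_hz


# Hypothetical net: 20% switching activity, 10 fF load, 1.62 V, 100 MHz.
p_nominal = dynamic_power(0.2, 10e-15, 1.62, 100e6)

# Same net with the supply voltage halved.
p_half_vdd = dynamic_power(0.2, 10e-15, 0.81, 100e6)

print(f"{p_nominal:.4e} W, {p_half_vdd:.4e} W")
```

Because the supply voltage enters squared, halving it cuts this component to one quarter, whereas halving the activity or the frequency only halves it.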
Best analysis since it takes into account that not all nets are driven at the same frequency
Dependent on the actual test vectors used to derive the net activity
Requires significant CPU time in simulation
Gate-level analysis
Net activity information from simulation vectors
Time-based input slew and output load for each cell
Cell power characterization from the library
Usually performed during analysis since it is faster, but not very accurate
Transistor-level analysis
Simulation vectors for at least the I/Os (such as running SPICE on a full design)
Performed at signoff since it takes a long time, but is very accurate
What if vectors are not available for simulation?
Vector based
Uses VCD for switching activity and timing and TWF for input slews
Most accurate solution if “right” vectors are provided by user
Vector-less
Uses TWF for input slews and timing
Best approach to obtain full-chip transient information
Transistor level
Very accurate
Much faster than SPICE
Gate level
Faster than transistor level
Still very accurate due to good modeling of power dissipation at cell level
[Figure labels: Simulation; Toggle Rates; Simulation-Driven Power Analysis; Power Report; To Power Grid Analysis Tool]
Static analysis
It is the average power over time for each instance, resulting in one power number.
Calculates average IR drop.
Static IR drop analysis is a first-order approximation that uses the total power current draw.
Is a fast process available early in the design phase and provides correct information on power grid issues.
Dynamic analysis
It is the average or peak power over time resulting in current waveforms, i.e., at each time step across the simulation window.
Calculates the worst-case IR drop transients.
Dynamic IR drop analysis deals with the voltage drop of current surges.
Provides visibility of simultaneous switching, decap optimization to control leakage power, and the effect of packaging.
[Figure: worst-case IR drop plot, showing a 17 mV increase in IR drop due to switching]
Review Questions
1. What are the three components of power consumption?
2. What is the purpose of a power library?
3. What is the difference between static power consumption and dynamic power consumption?
Topics in This Module
Power consumption and analysis
Power grid analysis
Low-power design
Inputs and outputs of power grid analysis
What Is a Power Grid?
IC power distribution systems are designed to provide needed voltages and currents to the transistors that perform the logic functions of a chip.
Definition: The system that distributes power throughout the chip to ensure that correct logic functioning is achieved through a network of wires called the power grid.
Example: In our design, we over-engineered our power grid to avoid IR-drop problems, but in doing so, we did not have enough resources to properly route our design.
[Figure: chip floorplan of numbered blocks with V and G connections around and between them]
Power grid analysis evaluates how power is distributed from the voltage source to the transistors and gates in the design.
It is the analysis of the power grid and not power consumption in a design.
[Figure: voltage source feeding VDD_1 and VDD_2 through the resistance of the interconnect]
Detects ground bounce on VSS nets
functionality
Reduces cause of silicon failure
Reduces electromigration (EM) effects
A tap current data file provides the details for each current source. Tap currents are the currents arising from transistor-to-power-grid connections.
These currents are used to perform either a simple steady-state analysis or a dynamic analysis of the power grid.
How Is Power Grid Analysis Done? (continued)
Power grid analysis at cell/gate-level
The current distribution within a cell or a block is done on an instance-by-instance basis.
An instance-based power consumption file or current data file supplies the power consumed on an instance-by-instance basis in watts.
The current source is applied as a black box or gray box.
[Figure: VDD connection shown in port view and detailed view]
Inputs and outputs of power grid analysis
Input and Output, Format
Input
Gate-level netlist in the Verilog language + DEF
Power grid cell view library
Power consumption data
Output
Graphical display
Plots
Reports
[Figure labels: .lib; LEF/GDSII; SPEF, etc.; Library view generator; Power consumption analysis tool; Power Grid View Library; Power Consumption; DEF/GDSII; Hierarchical power-grid analysis tool; Analysis Results]
Create the top-level DEF/GDS of your design
Create power consumption data
Provide power consumed on a per-instance basis
Provide power consumed on a per-cell basis
Area-based power distribution based on total number
More data for cells = more accurate power consumption data
Run power grid analysis
Link to the power grid view libraries
Load the power consumption data
Set up and run the analysis
Power Grid Analysis
What is power grid analysis?
Inputs and outputs of power grid analysis
Electromigration (EM)
Current density
Joule heating or wire self-heat (signal nets)
Hot electron effects
Joule heating and hot electron effects are not handled by power grid analysis, but they are factors that affect the overall power analysis.
What Is Voltage (IR) Drop and Ground Bounce?
IR drop
Voltage drops caused by current flowing from the power source through the resistive power network to the on-chip devices are called IR drop.
Ground bounce
Voltage spikes caused by current flowing from on-chip devices through the resistive ground network to the ground pins (or bumps) are called ground bounce.
IR drop and ground bounce combine to impact silicon performance.
[Figure labels: VDD = 1.20V; VDD = 1.17V; VDD = 1.1V; CLK]
Setup Time Violation
[Figure: CLK and DATA waveforms with IR drop, showing the setup window]
In the case where IR drop occurs on a clock buffer, the clock signal beyond this buffer is slowed, potentially causing hold time violations for all signals clocked by this clock branch.
Hold Time Violation
[Figure: two latches with IR drop on the clock branch; CLK and DATA waveforms showing the hold window]
1. Input signal switches.
2. Load capacitance charges up.
causes voltage drop (Ohm’s Law).
voltage and impacts circuit performance.
[Figure: circuit drawing current from VDD; supply levels of 1.2V and 1.1V are marked]
[Figure: IR drop color legend spanning VDD (3.300 volts) to VSS (0.0 volts), with markers at 3.266, 3.062, and 3.000 volts, incremental values for Colors 2 to 7, and a band labeled "below transistor operating voltage"]
[Figure callouts: "is a discontinuity in the power grid."; "These RAMs do not connect well to the grid."]
What Is Electromigration?
Electromigration is a wear-out mechanism of metal wires.
Metal atoms migrate over a period of time, causing open circuits, short circuits, or unacceptable increases in resistance.
There are two main causes of electromigration failure:
High (DC) current densities
Joule heating, which is caused by high alternating currents
These wear-out mechanisms can take extended periods of time.
[Figure: a void (open) and migrated ions (short hazard) in a metal wire]
As pulses go through the wire, the power dissipated by the wire causes it to heat up.
The difference in the thermal constants between the oxide and the wire causes mechanical stress, and the wire can eventually fail, resulting in chip failure in the field.
EM failures as seen through a scanning electron microscope (SEM)
[Figure: FESEM micrograph of aluminum lines exhibiting classic electromigration voiding; hillocks formed in a Cu line during electromigration test. Source: www.nd.edu]
Electromigration Damages
[Figure: voids and a hillock. Source: www.diei.unipg.it/RICERCA/www_em/voidhill.gif]
MTTF (mean time to failure) ∝ 1/J^2, where J = current density (Black's equation)
Current density must not exceed specification → wire I_i/w_i < J_spec
Specified as mA per μm of wire width (e.g., 1 mA/μm) or mA per via cut
EM occurs both in signal (AC = bidirectional) and power wires (DC = unidirectional)
Much worse for DC than AC; DC occurs inside cells and in power buses
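The current-density check and the 1/J^2 dependence above can be sketched in a few lines. This is an illustrative model only: the wire names, currents, widths, and the 1 mA/μm limit are hypothetical, and a real EM rule deck has per-layer and temperature-dependent limits.

```python
def em_violations(wires, j_spec_ma_per_um):
    """Return the names of wires whose current per unit width exceeds the limit.

    wires maps a wire name to (current in mA, width in um).
    """
    return [name for name, (i_ma, w_um) in wires.items()
            if i_ma / w_um > j_spec_ma_per_um]


def relative_mttf(j, j_ref):
    """MTTF relative to a reference current density, using MTTF proportional to 1/J^2."""
    return (j_ref / j) ** 2


# Hypothetical straps: (current mA, width um).
straps = {"strap_a": (2.4, 3.0), "strap_b": (1.5, 1.0)}
print(em_violations(straps, j_spec_ma_per_um=1.0))
print(relative_mttf(j=2.0, j_ref=1.0))
```

In this sketch only the narrow strap is flagged, and doubling the current density reduces the relative lifetime to one quarter, which is why widening a strap is the usual first fix.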
[Figure callout: "due to a narrow metal3 power grid strap connecting to the internal RAM. A failure here is catastrophic."]
What Is Joule Heating?
Wire Self-Heat (WSH)
May also be called signal wire electromigration, or Joule heating, since it is related to the power that is dissipated into the interconnect.
WSH is the rise in temperature due to the electron movement within a conductor, i.e., the wire heats above oxide temperature as pulses go through.
Depends on metal composition, signal frequency, wire sizes, slew rates, and amount of capacitance driven
Self-heating = More EM
Since SH increases temperature, self-heating on a metal line can aggravate EM effects.
SH on a line can also increase EM effects on neighboring lines.
Because self-heating contributes to electromigration, failures are typically labeled as EM, not SH.
[Figure: wire self-heat]
Electrons pick up speed in the channel
Fastest electrons damage the oxide and interface near the drain
Transistor threshold and mobility change over the life of the part, i.e., threshold eventually moves to a point where the device no longer meets specifications
[Figure: transistor cross-section with gate and N+ diffusion. Callouts: "Electrons pick up speed in channel; “hot” electrons are the fastest of a statistically fast bunch." and "Oxide and/or interface is damaged here."]
Inputs and Outputs of Power Grid Analysis
Solves Ohm's and Kirchhoff's laws for a given power network while ignoring localized switching effects on the power grid
Detects and fixes major supply grid problems
Main challenge of the static approach is accuracy
How Is Static Power Grid Analysis Done?
Select the power grid view libraries to be used in the power-rail analysis.
The parasitic resistance of the power grid is extracted, and a resistor matrix is created.
An average current for each transistor or gate connected to the power grid is calculated.
The average currents are distributed around the resistance matrix based on the physical location of the transistor or gate.
At every VDD I/O pin, a source of VDD is applied to the matrix.
A static matrix solve is then used to calculate the currents and IR drops throughout the resistance matrix.
Calculation of an instance-based static power consumption is done, which contains the instance-based power-consumption data for all instances of each cell and block in the design.
[Flow: Read in the power grid views → Extract power grid parasitic information → Create resistor matrix → Calculate average current → Calculate current and IR drop]
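To make the static solve concrete, here is a deliberately simplified sketch: a single power rail fed from one pad, rather than the full two-dimensional resistor matrix a real tool solves. The function name, segment resistance, and tap currents are assumptions for illustration.

```python
def rail_ir_drop(vdd, seg_res_ohm, tap_currents_a):
    """Return the voltage at each tap along one rail fed from a single pad.

    tap_currents_a[k] is the average current drawn at node k; node 0 is
    nearest the pad, and every segment between neighbors has seg_res_ohm.
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents_a)   # current entering the first segment
    for i_tap in tap_currents_a:
        v -= remaining * seg_res_ohm  # Ohm's law across this segment
        voltages.append(v)
        remaining -= i_tap            # Kirchhoff: the tap current leaves the rail
    return voltages


# Hypothetical rail: 1.2 V pad, 0.5 ohm per segment, three taps.
volts = rail_ir_drop(1.2, 0.5, [0.01, 0.01, 0.02])
print([f"{v:.3f}" for v in volts])
print(f"worst-case drop: {(1.2 - min(volts)) * 1e3:.0f} mV")
```

The farthest tap sees the largest drop because every upstream segment carries its current. Adding pads or straps shortens that path, which is what the later IR drop plots with extra power stripes illustrate.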
Missing vias
Insufficient vias
Power planning decisions
Power grid electromigration analysis lets you do the following:
Run a comprehensive analysis that is not vector dependent
Find problems in both vias and routing
Inputs and outputs of power grid analysis
Analysis requires both the resistance and capacitance of the power grid to be extracted
Results can be extremely accurate
How Is Dynamic Power Grid Analysis Done?
Select the power grid view libraries to be used in the power-rail analysis.
The parasitic resistance and capacitance of the power grid and the signal nets are extracted.
The dynamic tap currents are passed from the power calculation tool.
Power calculator calculates the currents over time.
Rail analysis is performed over time based on the calculations of the power calculator.
[Flow: Read in the power grid views → Extract power grid and signal-net parasitic information → Calculate dynamic current → Perform rail analysis]
Calculate the power grid characteristics over time
Identify which specific test vector activated an implementation weakness
Examine the time correlation of tap current
Obtain a better estimation of the precise magnitude of IR drop
How do you identify the test vectors?
In addition to using vectors in a dynamic power grid analysis, there are methods that do not require vectors, but use a timing window file (TWF) instead.
[Figure labels: IR drop; Transistor device currents; Current congestion; Electromigration]
Adding decoupling capacitors makes a static approach more accurate. Decoupling capacitors act as a local charge source.
[Figure: current drawn from VDD and the resulting IR drop (1.2V to 1.1V), shown with and without decoupling capacitors]
is to use wider power stripes or use more metal on higher levels.
Additional power stripes are added to the design and are marked in cyan and magenta.
This IR drop plot is made after an increase of the number of power stripes.
This plot shows a very low voltage drop, which is required for a functional chip.
Review Questions
What is a power grid?
What are the tasks of power grid analysis?
What is the difference between static power grid analysis and dynamic power grid analysis?
Low-Power Design
Need for low-power design
Low-power design techniques
Clock gating
Multi-threshold logic
Multi-voltage with shut-off
Need for Low-Power Design
Exponential increase in chip density.
In deep submicron technology (130 nm, 90 nm, and below), leakage current increases dramatically.
In some 65 nm designs, leakage current is nearly as large as dynamic current.
Switching intensive networking applications use 50% -> watch the clock tree and its sequential elements.
The earlier in the design process power consumption is addressed, the bigger the impact.
At higher levels of abstraction, there are more degrees of freedom for large changes to the design implementation.
Power Saving Techniques
Some of the low-power design techniques discussed today are
Circuit and chip design
Clock gating
Multi-voltage with shut-off
Process
Multi-threshold logic
[Figure: design flow (RTL, Synthesis, Floorplanning, Physical Implementation) annotated with RTL clock-gating for dynamic, power grid planning for multi-voltage, IR drop and EM analysis, multi-Vdd optimization, dual-Vth optimization for leakage, and physical clock gating]
Clock Gating
The clock distribution network contributes a significant portion of total power consumption.
Clock buffers have the highest toggle rate, and often have a high drive strength to minimize clock delay.
Flip-flops with an active clock dissipate some dynamic power even if the inputs and outputs are unchanged.
Shut off the clock during periods of inactivity to avoid unnecessary power consumption.
Clock-Gating Styles
Designer has the following control:
Latch-based or latch-free gating style
gating logic
Minimal bit-width of gated registers
[Figure: latch-free {OR} and latch-based {NAND INV} gating cells, each with EN and CLK inputs and a GCLK output]
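The latch-based style can be illustrated with a small behavioral model. This is a sketch under assumptions, not a library cell: it treats the NAND plus inverter as a logical AND, assumes the latch is transparent while CLK is low, and samples the waveforms as lists of 0/1 values. All names are hypothetical.

```python
def gated_clock(clk, en):
    """Latch-based gating: EN is captured while CLK is low, GCLK = CLK AND latched EN."""
    latched = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:            # latch is transparent during the low phase
            latched = e
        gclk.append(c & latched)
    return gclk


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [1, 1, 0, 0, 0, 0, 1, 1]
gclk = gated_clock(clk, en)
print(gclk)
print(rising_edges(clk), rising_edges(gclk))
```

Because EN can only change the latched value while CLK is low, an enable that toggles during the high phase cannot shorten or glitch the current GCLK pulse. Fewer rising edges on GCLK means fewer clock-pin toggles at the gated registers.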
Done using
Step 2: inserting clock-gating cells into the clock path using the enable logic
Simple combinational logic (output hold on a register)
More complex sequential logic that spans multiple clocks
Commercially available synthesis tools accomplish the second task automatically.
[Figure: register with D_in and D_out before and after inserting a clock-gating (CG) cell]
Reduced internal power consumption at the clock-gated flip-flops
No need for muxes to re-circulate the data for these flip-flops (saves power and area)
Disadvantages
No effect on leakage
May result in setup time or hold time violations
Clock gating has to be inserted before clock tree synthesis (CTS) in most power design flows and hence presents design issues
Affects testability by introducing multiple clock domains (solved if we use a latch-based design)
Adding clock gating may not always be accompanied by reduced power
Clock gating adds logic that consumes power
Sub-threshold leakage depends exponentially on VT.
Today, many libraries offer two or three versions of their cells: Low VT, Standard VT, and High VT.
The implementation tools can take advantage of these libraries to optimize timing and power simultaneously.
[Figure: Leakage vs. Delay at 90 nm; leakage and delay (0% to 100%) for LVt, SVt, and HVt cells]
Minimize total number of fast, leaky low VT transistors by deploying them only when required to meet timing.
Involves an initial synthesis targeting a primary library followed by an optimization step targeting additional libraries with differing thresholds.
Examples
Goal: High performance
Synthesize with the high-performance, high-leakage library first and then relax back any cells not on the critical path by swapping them for lower performing, lower leakage equivalents
Goal: Minimum leakage
Target the low-leakage library first and then swap in higher performing, high-leakage equivalents to meet timing in critical paths
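The "minimum leakage" goal above can be sketched as a greedy swap on a single timing path. This is a toy model: the delay and leakage numbers per threshold flavor are invented for illustration, and a real implementation tool works from timing slack across the whole design rather than one path.

```python
# Hypothetical per-cell data: (delay in ns, leakage in nW) for each flavor.
LIB = {"HVT": (1.3, 1.0), "SVT": (1.0, 5.0), "LVT": (0.8, 25.0)}


def path_delay(path):
    return sum(LIB[vt][0] for vt in path)


def path_leakage(path):
    return sum(LIB[vt][1] for vt in path)


def minimize_leakage(num_cells, required_ns):
    """Start with all-HVT cells, then swap in faster flavors until timing is met."""
    path = ["HVT"] * num_cells
    for faster in ("SVT", "LVT"):
        for i in range(num_cells):
            if path_delay(path) <= required_ns:
                return path
            path[i] = faster
    return path  # may still miss timing if even all-LVT is too slow


result = minimize_leakage(num_cells=4, required_ns=4.5)
print(result, path_delay(result), path_leakage(result))
```

With these assumed numbers, three of the four cells are swapped to the standard-VT flavor and timing is met, so no low-VT cell is needed. Leakage stays well below what an all-LVT path would cost, which is the point of deploying fast, leaky cells only where timing requires them.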
Can reduce leakage power without compromising performance.
Disadvantages
Leakage current increases exponentially with VT reduction.
In terms of cost, requires one additional mask.
Different blocks have different performance objectives and constraints.
A lower supply rail means that the dynamic and static power will be lower for the cells on this rail.
Partition the internal logic of the chip into multiple voltage regions or power domains, each with its own supply.
For example, a processor needs to run as fast as the semiconductor technology will allow; a high supply voltage is required.
A USB block runs at a relatively slow frequency dictated by protocol, so a lower supply rail may be sufficient for the block to meet its timing constraints.
[Figure: Multi-Voltage Architecture; an SOC with cache RAMs and a CPU, with supplies of 1.2V, 1.0V, and 0.9V marked]
Techniques to Achieve Multi-Voltage
To achieve multi-voltage on a chip, the following techniques are implemented:
Voltage scaling interfaces – level shifters
Power gating
Signal isolation cell
State retention power gates
Sleep transistors
Ensure signals going from one domain to another (e.g., 0.9V to 1.2V) will not turn on both the NMOS and PMOS networks, causing crowbar currents.
Domain gets the voltage swings (and rise and fall times) that it expects.
Low-to-high level shifter cells
More complex: implemented using a buffered and an inverted form of the lower voltage signal, used to drive a cross-coupled transistor structure running at the higher voltage
[Figure: flip-flops on VDDL driving OUTL and OUTH (VDDH) through level shifters; 1.2V, 1.1V, and 0.9V domains are marked]
Power Gating
The technique used to turn off blocks that are not being used is known as power gating.
Reduces the overall leakage power of a chip.
Selectively powers down certain blocks in the chip while keeping other blocks powered up.
Goal: To maximize power savings while minimizing the impact on performance.
[Figure: Activity Profile with Power Gating; power over time alternating between SLEEP and WAKE periods, with a 200 mW level and the SLEEP events marked]
at the inputs of powered-up blocks.
Inputs to the power gated blocks can be driven to valid logic values by powered-up blocks without creating electrical (or functional) problems in the powered-down block.
The outputs of powered-down blocks must be controlled by using an isolation cell to clamp the output to a specific, legal value.
Three basic types of isolation cell
Those that clamp the signal to “0” (use an AND gate)
Those that clamp it to “1” (use an OR gate)
Those that latch it to the most recent value
[Figure: power switch on Vdd with isolation (Iso) cells on the block outputs]
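The three isolation styles can be modeled behaviorally as follows. This is a sketch under assumptions: the isolation enable `iso` is taken to be active high, signals are single bits, and the function and class names are hypothetical rather than taken from any cell library.

```python
def clamp_low(value, iso):
    """AND-style isolation: the output is forced to 0 while iso is asserted."""
    return value & (iso ^ 1)


def clamp_high(value, iso):
    """OR-style isolation: the output is forced to 1 while iso is asserted."""
    return value | iso


class HoldLast:
    """Latch-style isolation: the output keeps the most recent pre-isolation value."""

    def __init__(self, initial=0):
        self.held = initial

    def output(self, value, iso):
        if not iso:
            self.held = value   # transparent while the block is powered up
        return self.held


hold = HoldLast()
print(clamp_low(1, 0), clamp_low(1, 1))      # passes, then clamps to 0
print(clamp_high(0, 0), clamp_high(0, 1))    # passes, then clamps to 1
print(hold.output(1, 0), hold.output(0, 1))  # passes, then holds the last value
```

Which style to use depends on what the powered-up receiver treats as a safe idle value. For example, an active-low reset or interrupt line would typically be clamped high.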
On power up, the state of the block must be restored from an external source or built up from the reset condition.
Time and power requirement can be significant.
Methods of saving and restoring the internal state of a power gated block
Software approach: Based on reading and writing registers (state info stored in processor memory)
Scan-based approach: Based on using a dedicated set of scan chains to store the state of the chip
Register-based approach: Uses retention registers (each containing a “shadow” register) to preserve the register's state during power down and restore it at power up
[Figure: SRPG cell with D, Q, Clk, and Ret pins, connected to switched Vdd (power switch), VRET, and Vss]
A power gating control signal “SLEEP” (or “SLEEPN”) controls the sleep transistor to switch on and off the power supply to the cell.
A PMOS sleep transistor is used to switch the VDD supply and is called a “header switch.” The NMOS sleep transistor controls the VSS supply and is called a “footer switch.”
[Figure: header-switch and footer-switch cells with SLEEP/SLEEPN control, INPUTS, OUTPUTS*, VDD, and VSS]
Sleep Transistors: Coarse-Grain Power Gating
Dedicated cells that can switch off the entire power or ground network of a particular row of cells
A power gating control signal “SLEEP” controls the sleep transistors connected in parallel between permanent and virtual power networks
[Figure: sleep transistors controlled by SLEEP between VDD and the virtual supply VVDD, with INPUTS and OUTPUTS*]
Minimizes leakage, which provides greatest reduction in power
Multiple power domains require more careful and detailed floorplanning.
Power grids become more complex.
Multi-voltage designs require additional resources on the board (additional regulators to provide the additional supplies).
Power up and power down sequencing: there may be a required sequence for powering up the design to avoid deadlock.
[Figure: A general multi-voltage implementation showing libraries for the various power domains on the same chip (1.0v lib, 0.8v, 1.2v lib, level shifters).]
[Figure: A more detailed block-level diagram showing the various elements that interface between the different power domains: isolation cells (Iso_cell), level shifters (LS), and Low Vt (high speed), Normal Vt, and High Vt (low leakage, lower speed) libraries across power domains 1 to 4.]
Penalty Architecture Design Verification Place and Route
Clock Gating: Medium Little Little Low Low None Low
Multi Vt: Medium Little Little Low Low None Low
Multi-Voltage: Large Little Little High Medium Low Medium
Power Gating: Large Little Medium High Medium Low Medium
~ Large
Review Questions
What is clock gating?
How is multi-threshold logic implemented?
Summary
The tasks of a power consumption tool are to calculate static (leakage) and dynamic (switching and internal) power for each instance in the design.
The tasks of a power grid analysis tool are to use the instance power (static) and current (dynamic) results to check for IR drop, ground bounce, and electromigration in a design.
The earlier in the design process power consumption is addressed, the bigger the impact, since there are more degrees of freedom for large changes to the design implementation.
Low-power design helps achieve significant power reduction at the cost of additional design complexity.
Testing Your Understanding
True or false
1. In a power library, look-up tables are implemented by creating multiple templates of common information that can be used to represent internal power.
2. The effect of IR drop on a signal path is that the signal path is slowed, thus causing a hold violation.
3. Wire electromigration is related to the power that is dissipated into the interconnect.
4. Dynamic power consists of power dissipated inside a cell and power dissipated to charge/discharge net capacitance.
5. By using multi-threshold logic, the implementation tool can take advantage of HVT/LVT/SVT libraries to optimize timing and power simultaneously.
6. A lower supply rail means that the dynamic and static power will be lower for the cells on this rail.
Sources
Power Library
Library Compiler™ User Guide: Modeling Timing, Signal Integrity, and Power in Technology Libraries, version A-2007.12, December 2007
Low-Power Design
Voltage Storm Data Prep Manual, version 6.1.2
Low-Power Methodology Manual for System-on-Chip Design by Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, and Kaijian Shi
Reference: Formulae for Power Consumption Calculation
Ptotal = Pstatic + Pdynamic
Pstatic = VDD x Ileakage
Ileakage = [Number of transistors (logic gates + memory array) * Average length of transistor in meter] * [Subthreshold leakage + Gate leakage]
where 1λ = 0.04 μm/λ in this example and must be used in μm for calculation purposes.
Pdynamic = α x CL x VDD^2 x f
Where
α – Switching activity
f – Operating frequency
CL = [Number of transistors (logic gates + memory array) * Average length of transistor in meter] * Capacitance per unit length
Reference: Example
Operating Voltage = 1.2V
Number of transistors = 200 million
Average logic transistor = 8λ (where 1λ = 0.04 μm/λ)
Subthreshold Leakage = 30 nA/μm
Gate Leakage = 2 nA/μm
Static power dissipation:
P static = I static * VDD
Transistors:
[(200 * 10^6) * (8λ * 0.04 μm/λ)] = 64 * 10^6 μm
On average, half the transistors are OFF and contribute subthreshold leakage.
Total static current is
(64 * 10^6 μm) * [(30 nA/μm)/2 + (2 nA/μm)] = 1088 mA
1088 mA * 1.2V = 1305.6 mW
Dynamic power dissipation:
P dynamic = α * C * VDD^2 * f
Transistors:
200 * 10^6 * 8λ * 0.04 μm/λ * 2 fF/μm = 128 nF
Dynamic Power Consumption per MHz or GHz:
[(0.1 * 12.8nF) + (0.05 * 25.6nF)] * (1.2)^2 = 3.68 mW/MHz or 3.68W at 1 GHz
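The arithmetic in this example can be re-run as a short script. It is a sketch that only reproduces the numbers on the slide: the variable names are assumptions, and the 12.8 nF and 25.6 nF terms with their activity factors are taken as given, since the handout does not derive them.

```python
N_TRANSISTORS = 200e6
AVG_WIDTH_LAMBDA = 8          # average transistor size in lambda
LAMBDA_UM = 0.04              # um per lambda
VDD = 1.2                     # volts
I_SUB_A_PER_UM = 30e-9        # subthreshold leakage, A/um
I_GATE_A_PER_UM = 2e-9        # gate leakage, A/um
C_F_PER_UM = 2e-15            # capacitance, F/um

total_width_um = N_TRANSISTORS * AVG_WIDTH_LAMBDA * LAMBDA_UM

# Half the transistors are OFF and contribute subthreshold leakage.
i_static_a = total_width_um * (I_SUB_A_PER_UM / 2 + I_GATE_A_PER_UM)
p_static_w = i_static_a * VDD

c_total_f = total_width_um * C_F_PER_UM

# Activity-weighted capacitance terms exactly as written in the example.
energy_per_cycle_j = (0.1 * 12.8e-9 + 0.05 * 25.6e-9) * VDD ** 2

print(f"total width     : {total_width_um / 1e6:.0f} x 10^6 um")
print(f"static current  : {i_static_a * 1e3:.0f} mA")
print(f"static power    : {p_static_w * 1e3:.1f} mW")
print(f"total capacitance: {c_total_f * 1e9:.0f} nF")
print(f"dynamic power   : {energy_per_cycle_j * 1e6 * 1e3:.2f} mW/MHz")
```

Multiplying the energy per cycle by the clock frequency gives the dynamic power, so the same figure reads as mW per MHz or W per GHz.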
Module 9
During RTL Coding
reg r1, r2;
always @ (posedge clk)
r2 <= !r1;
After Synthesis
[Figure: registers r1 and r2 connected through gate u1]
During Place/Route
[Figure: r1, u1, and r2 as placed and routed cells]
Articulate how extraction and delay calculation are run using standard parasitic and delay formats
Compare the different extraction models, including parallel plate, 2.5D, and 3D
State the various sections of a Standard Parasitic Exchange Format (SPEF) file
State the various sections of a Standard Delay Format (SDF) file
Describe how delays are annotated during various phases of the design flow
Delay calculation
Discussion Questions
What is capacitance?
What is resistance?
What Is Capacitance?
Definition: Capacitance is a measure of the amount of electric charge stored between two plates for a potential difference (voltage) across the plates.
Capacitance (C) is proportional to the cross-sectional area (A) of the plates, and inversely proportional to the distance (D) between them.
C = K * A/D, where K is the dielectric value of the material between the plates
Example: The long wires in the design incurred a very large capacitance between them, and, therefore, the timing of the design was compromised.
[Figure: conductor1 and conductor2 separated by a distance, with the cross-sectional area and the capacitance between them marked]
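As a rough numeric illustration of C = K * A/D, the sketch below takes K as the permittivity of the dielectric (vacuum permittivity times a relative permittivity). The relative permittivity of 3.9 and the wire dimensions are assumptions for illustration, and the parallel-plate model ignores fringe and coupling effects.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity


def plate_capacitance(eps_r, area_m2, distance_m):
    """Parallel-plate estimate: C = (eps0 * eps_r) * A / D, in farads."""
    return EPS0_F_PER_M * eps_r * area_m2 / distance_m


# Hypothetical case: a 100 um long, 0.14 um wide plate overlap at 0.14 um spacing.
area = 100e-6 * 0.14e-6
c = plate_capacitance(eps_r=3.9, area_m2=area, distance_m=0.14e-6)
print(f"{c * 1e15:.2f} fF")

# Doubling the overlap area (a wire twice as long) doubles the capacitance.
print(plate_capacitance(3.9, 2 * area, 0.14e-6) / c)
```

This is why long parallel wires at minimum spacing are the worst case in the example above: area grows with length while the distance stays at its minimum.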
Definition: Resistance is a measure of the degree to which an object opposes an electric current through it.
Resistance (R) is proportional to the length (L) of the wire and inversely proportional to the cross-sectional area (A).
R = K * L/A
In a given technology, wire resistance is estimated with a factor measuring resistance per unit length.
[Figure: conductor1 with its resistance marked]
resistance values for all of the interconnects (wires) in a circuit.
Example: After routing, we ran parasitic extraction and examined the output files to make sure the resistance and capacitance values were below our maximum limit.
[Figure: conductor1 and conductor2 with the capacitance between them and the resistance along the wire]
Parasitic Extraction
Extraction models
SPEF file
Correlation
Interconnects (Wires)
Extraction deals with the wires or connections in a design.
Interconnects (wires) in a given technology will have several rules and specifications associated with each metal layer.
Among the many rules
Width (W)
Pitch (P)
Spacing (S)
Resistance per square unit (RPSQ)
[Figure: m2 wires annotated with width (W), spacing (S), pitch (P), and RPSQ]
Interconnects (Wires) (continued)
The thickness of the wires in a given technology is assumed to be constant.
Resistance is characterized per square unit (RPSQ).
Most technologies have three different grades of interconnects:
M1
Finest width, spacing
Signal routes
M2 to M(N-2)
Global/power routes
M(N-1) to MN
Largest width, spacing
Thick metal

TABLE OF WIRE VALUES FOR 90nm PROCESS
METAL LAYER   minimum width   pitch   spacing   RPSQ
M8            0.42            0.84    0.42      2.7500e-02
M7            0.42            0.84    0.42      2.7500e-02
M6            0.14            0.28    0.14      8.0600e-02
M5            0.14            0.28    0.14      8.0600e-02
M4            0.14            0.28    0.14      8.0600e-02
M3            0.14            0.28    0.14      8.0600e-02
M2            0.14            0.28    0.14      8.0600e-02
M1            0.12            0.28    0.12      1.3000e-01
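Because thickness is assumed constant, a wire's resistance follows from RPSQ and the number of "squares" in the wire (length divided by width). The sketch below uses the table values and assumes RPSQ is in ohms per square and widths are in μm, neither of which the table states explicitly; the 100 μm length is a hypothetical example.

```python
# (minimum width in um, RPSQ in ohms per square), taken from the 90nm table.
LAYERS = {
    "M8": (0.42, 2.75e-2),
    "M2": (0.14, 8.06e-2),
    "M1": (0.12, 1.30e-1),
}


def wire_resistance(layer, length_um, width_um=None):
    """R = RPSQ * (length / width); defaults to the layer's minimum width."""
    min_width, rpsq = LAYERS[layer]
    width = min_width if width_um is None else width_um
    return rpsq * length_um / width


for name in ("M1", "M2", "M8"):
    print(name, f"{wire_resistance(name, 100.0):.1f} ohm per 100 um")
```

Under these assumptions the thick upper metals come out more than an order of magnitude less resistive per unit length than M1 at minimum width, which matches their use for global and power routes.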
[Figure: layout showing signal routes, power routes (VDD, GND), and internal cell routes]
Capacitance calculations can be very complex:
Multi-layer
Multi-dimension
Coupling capacitances
Line-to-ground (net to substrate)
Line-to-line (nets on same layer)
[Figure: m1 and m2 wires connected by via12]
Typically used in iterations during place/route
[Figure: conductor B above the substrate]
Near Body Effects
Near body effects are coupling capacitances between adjacent layers of metal
Fringe or sidewall capacitance (Cf)
Crossover capacitance (Cr)
[Figure: capacitance components Cr, Cc, Ca, and Cf]
2D or 2.5D Model
2D or 2.5D models: Some of the “near-body” effects
Much slower to extract capacitance vs. 1D model because there is more information.
Much more accurate for crosstalk and noise effects because the coupling capacitances that contribute to crosstalk and noise are extracted.
Used during detailed analysis during or after place/route.
[Figure: conductors A, B, C, D, and E above the substrate]
Very, very slow
Extremely accurate
usually the high-speed areas in need of very accurate analysis
[Figure: conductors A, B, D, E, F, and G above the substrate]
Routed design in the Verilog® language or other HDL + DEF or GDSII
Physical libraries in LEF format
Tool-specific libraries, map files, etc.
Extraction constraints and commands in TCL
Output
SPEF file containing all of the RC information for the routed nets in the design
[Figure labels: TCL; DEF or GDSII; Physical Library; Extraction; SPEF Parasitic File]
Floorplanning
Rough estimates based on “virtual” routes after placement
Place/Route
Detailed estimates based on
Output of extraction (SPEF) is used in many other steps in the flow.
Delay calculation for nets
Signal integrity values for nets
Power and reliability analysis during physical verification
[Figure: design flow from Specification, Micro-Architecture, RTL, Logic Synthesis, and Design Verification through Placement, Physical Synthesis, Scan Reorder, CTS, Route, Design Optimization (PostPlace, PostCTS, PostRoute), Delay Calculation, Signal Integrity, and Extraction to Mask Prep and GDSII]
What Is SPEF?
Definition: IEEE standard for representing parasitic data of wires in a chip in ASCII format
Example: In order to perform signoff, we ran parasitic extraction and wrote out a SPEF file, which contained all of the capacitance and resistance information of our design. We input the SPEF file into our timing and power analysis tools to finalize our specification for performance/Watt.
Note: SPEF also contains “inductance” information, which is used for advanced processes or highly detailed analysis. We will not discuss inductance in this course.

*SPEF "IEEE 1481-1999"
*DESIGN "Sample"
*DATE "13:03:59 Monday December 18, 2007"
*VENDOR "Sample Tool Vendor"
*PROGRAM "Parasitics Generator"
*VERSION "1.1.0"
*DESIGN_FLOW "EXTERNAL_LOADS"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 OHM
*L_UNIT 1 HENRY

*POWER_NETS VDD
*GND_NETS VSS

*PORTS
CONTROL O *L 30 *S 0 0
FARLOAD O *L 30 *S 0 0
INVX1FNTC_IN I *L 30 *S 5 5
NEARLOAD O *L 30 *S 0 0
TREE O *L 30 *S 0 0

*D_NET INVX1FNTC_IN 0.033
…
e
9.1 Introduction
The Standard Parasitic Exchange Format (SPEF) provides a standard medium to pass parasitic information between EDA tools during any stage in the design process. Parasitics can be represented on a net-by-net basis in many different levels of sophistication, from a simple lumped capacitance, to a fully distributed RC tree, to a multiple pole AWE representation.
IEEE Std 1481-1999 (continued)
9.2 Targeted applications for SPEF
SPEF is suitable for use in many different tool combinations. Because parasitics can be represented in various levels of sophistication, SPEF_files can communicate parasitic information throughout the design flow process. A design can be distributed between multiple SPEF_files. The files can also communicate information such as slews and the “routing confidence” indicating at what stage of the design process and/or how the parasitics were generated. A diagram of how SPEF interfaces with various example applications is shown in Figure 15.
Discussion Questions
  Where does SPEF come from?
  Where is it used?
What’s in an SPEF File?
Here are the basic elements of an SPEF file:
Header
  Contains all of the basic information of the SPEF file’s origin and specifications
Name map
  Substitutes short index symbols for net names
Power and ground nets
  Names of the power and ground nets
Externals, ports
  Specifies the port name, direction, coordinates, capacitive load, slew, etc.
Internals
  Detailed or reduced view of signal and power nets in the design
Hierarchical entities
  Used to reference instantiated components with sub-module SPEF definitions
Header
  SPEF_version
  design_name
  date
  vendor
  program_name
  program_version
  unit_def
  Pin/bus/hierarchy definitions
The SPEF version is important, since the syntax changes between versions and tools support different versions of SPEF.
Also, the program name and version are important for debugging problems, possibly with faulty tool versions.
name_map ::= *NAME_MAP name_map_entry {name_map_entry}
name_map_entry ::= index mapped_item
index ::= *<pos_integer>
mapped_item ::= identifier | bit_identifier | path | name | physical_ref
Example:
*NAME_MAP
*1 NET_1
*2 NET_2
…
*20 NET_20
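As a minimal sketch of how a name map might be applied when reading a SPEF file, the Python below collects the `*<index> name` pairs and expands index references in other lines. The function names and the sample text are illustrative assumptions, not part of any extraction tool.

```python
import re

def parse_name_map(spef_text):
    """Collect '*<index> <name>' pairs that follow the *NAME_MAP keyword."""
    name_map = {}
    in_map = False
    for line in spef_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "*NAME_MAP":
            in_map = True
            continue
        if in_map:
            # An entry is '*' plus a positive integer, followed by the mapped name.
            if len(tokens) == 2 and re.fullmatch(r"\*[1-9]\d*", tokens[0]):
                name_map[tokens[0]] = tokens[1]
            else:
                in_map = False  # any other line ends the name map section
    return name_map

def expand_names(line, name_map):
    """Replace every '*<index>' reference in a line with its mapped name."""
    return re.sub(r"\*\d+", lambda m: name_map.get(m.group(0), m.group(0)), line)

# Hypothetical sample text, modeled on the example above
sample = """*NAME_MAP
*1 NET_1
*2 NET_2
*20 NET_20
*PORTS
*1 I *S 0 0
"""
names = parse_name_map(sample)
print(expand_names("*D_NET *20 0.5", names))
```

Keywords such as `*D_NET` are left untouched because the pattern only matches an asterisk followed directly by digits.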
Power and Ground Nets
Example:
*POWER_NETS VDD
*GROUND_NETS VSS
What’s in the Externals, Ports Section?
The externals and ports section describes the interfaces to the design, including name, direction (I or O), capacitive load (L), slew (S), and other timing information.
Example:
*PORTS
A O *L 30 *S 0.0 0.0
B O *L 30 *S 0.0 0.0
C O *L 30 *S 0.0 0.0
D O *L 30 *S 0.0 0.0
E I *L 30 *S 5000 5000
A,B,C,D,E = Port
I/O = Input or Output
L = Load
S = Slew
type:
  d_net
  r_net
  d_pnet
  r_pnet
d_net and r_net are detailed and reduced representations for signal nets.
d_pnet and r_pnet are detailed and reduced representations for power nets.
The d_net representations are detailed and have much more information, while the r_net representations are more compact and less accurate. Use the appropriate type for the part of the flow: d_net for signoff, r_net for intermediate analysis.
internal_def ::= nets {nets}
nets ::= d_net | r_net | d_pnet | r_pnet
d_net ::=
  *D_NET net_ref total_cap
  [routing_conf] [conn_sec] [cap_sec] [res_sec] [induc_sec] *END
r_net ::=
  *R_NET net_ref total_cap [routing_conf] {driver_reduc} *END
d_pnet ::=
  *D_PNET pnet_ref total_cap
  [routing_conf] [pconn_sec] [pcap_sec] [pres_sec] [pinduc_sec] *END
r_pnet ::=
  *R_PNET pnet_ref total_cap [routing_conf] {pdriver_reduc} *END
We will show examples of “d_net” and “r_net” in the next few slides, and omit the “pnet” examples.
Internals: d_net
A d_net is a detailed description of a net in a design.
It comprises several sections, among them
  *D_NET declaration
  Net reference
  Total capacitance
  Connectivity (*CONN) section
  Capacitance (*CAP) section
In the case where a specific net has a very high capacitance, you can search through the section to see if the value is reasonable.
// d_net example for SPEF
*D_NET INVX1FNTC 2.033341
*CONN
*I FL_1281:X O *L 0.0
*I I1184:A I *L 0.343
*I FL_1000:A I *L 0.343
*I NL_1000:A I *L 0.343
*I TR_1000:A I *L 0.343
*CAP
216 FL_1000:A 0.346393
217 I1184:A 0.344053
218 INVX1FNTC_IN 0
219 INVX1FNTC_IN:10 0.0154198
220 INVX1FNTC_IN:11 0.0117827
…
*RES
152 INVX1FNTC_IN INVX1FNTC_IN:18 8.39117
153 INVX1FNTC_IN INVX1FNTC_IN:5 25.1397
154 INVX1FNTC_IN:11 INVX1FNTC_IN:20 4.59517
155 INVX1FNTC_IN:12 INVX1FNTC_IN:13 3.688
…
*END
It comprises several sections, among them
  *R_NET declaration
  Net reference
  Total capacitance
  Driver information (*DRIVER)
  Pi model (*C2_R1_C1)
  Load information (*LOADS)
  RC information (*RC)
During timing analysis, you may need to inspect sections of the SPEF file, like the r_net section, to make sure the values are reasonable.
// r_net example for SPEF
*R_NET NE_794 2.67137
*DRIVER NL_1039:X
*CELL INVX
*C2_R1_C1 1.0039 367.972 1.66747
*LOADS
*RC NL_1040:A 1.25641
*RC NL_2039:A 714.176
*END
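One way to judge whether r_net values are reasonable is to compare the pi model against the total capacitance on the *R_NET line. The sketch below uses the numbers from the r_net example; the helper name is hypothetical, and the assumption that the two pi-model capacitors should add up to the total capacitance is drawn from this example only, not from a tool specification.

```python
import math

def pi_model_total(c2, r1, c1):
    """Total capacitance of a C2-R1-C1 pi model.

    The series resistance r1 does not contribute to capacitance; it is
    accepted only so the call mirrors the *C2_R1_C1 field order.
    """
    return c2 + c1

# Values from the example: *R_NET NE_794 2.67137
total_cap = 2.67137
c2, r1, c1 = 1.0039, 367.972, 1.66747  # *C2_R1_C1 1.0039 367.972 1.66747

pi_total = pi_model_total(c2, r1, c1)
consistent = math.isclose(pi_total, total_cap, rel_tol=1e-6)
print(f"pi-model capacitance = {pi_total:.5f}, matches total_cap: {consistent}")
```

For this net the two capacitors sum to the declared total, so the reduced model accounts for all of the net capacitance.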
… given design and have their own local SPEF file.
Syntax
define_def ::= define_entry {define_entry}
define_entry ::=
  *DEFINE inst_name {inst_name} entity
  | *PDEFINE physical_inst entity
entity ::= qstring
Example
// Header
*PROGRAM "Parasitics Generator"
*VERSION "1.1.0"
*DESIGN_FLOW "EXTERNAL_LOADS"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 OHM
*L_UNIT 1 HENRY
// Power and Ground Nets
*POWER_NETS VDD
*GROUND_NETS VSS
// Externals/Ports
*PORTS
CONTROL O *L 30 *S 0 0
FARLOAD O *L 30 *S 0 0
INVX1FNTC_IN I *L 30 *S 5 5
NEARLOAD O *L 30 *S 0 0
TREE O *L 30 *S 0 0
// Internals
*D_NET INVX1FNTC_IN 0.033
*CONN
*P INVX1FNTC_IN I
*I FL_1281:A *L 0.033
*END
*D_NET INVX1FNTC 2.033341
*CONN
*I FL_1281:X O *L 0.0
*I I1184:A I *L 0.343
*I FL_1000:A I *L 0.343
*I NL_1000:A I *L 0.343
*I TR_1000:A I *L 0.343
…
219 INVX1FNTC_IN:10 0.0154198
220 INVX1FNTC_IN:11 0.0117827
…
240 NL_1000:A 0.344804
241 TR_1000:A 0.34506
*RES
152 INVX1FNTC_IN INVX1FNTC_IN:18 8.39117
153 INVX1FNTC_IN INVX1FNTC_IN:5 25.1397
154 INVX1FNTC_IN:11 INVX1FNTC_IN:20 4.59517
…
175 INVX1FNTC_IN:9 INVX1FNTC_IN:10 10.8533
176 INVX1FNTC_IN:9 INVX1FNTC_IN:11 1.05164
*END
*D_NET NE_794 1.98538
*CONN
*I NL_1039:X O *L 0 *D INVX
*I NL_2039:A I *L 0.343
*I NL_1040:A I *L 0.343
*CAP
3387 NE_794 0
3388 NE_794:1 0.0792492
…
3413 NL_1040:A 0.344453
3414 NL_2039:A 0.343427
*RES
2879 NE_794:1 NE_794:13 66.1953
2880 NE_794:1 NE_794:2 0.311289
…
2903 NL_1039:X NE_794:25 1.00317
2904 NL_2039:A NE_794:23 0.171175
*END
// Header
*PROGRAM "Parasitics Generator"
*VERSION "1.1.0"
*DESIGN_FLOW "EXTERNAL_LOADS" "EXTERNAL_SLEWS"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1.0 PS
*C_UNIT 1.0 PF
*R_UNIT 1.0 OHM
*L_UNIT 1.0 HENRY
// Power and Ground Nets
*POWER_NETS VDD
*GROUND_NETS VSS
// Externals/Ports
*PORTS
TREE O *L 30 *S 0.0 0.0
FARLOAD O *L 30 *S 0.0 0.0
NEARLOAD O *L 30 *S 0.0 0.0
CONTROL O *L 30 *S 0.0 0.0
INVX1FNTC_IN I *L 30 *S 5000 5000
// Internals
*R_NET NE_794 2.67137
*DRIVER NL_1039:X
*CELL INVX
*C2_R1_C1 1.0039 367.972 1.66747
*LOADS
*RC NL_1040:A 1.25641
*RC NL_2039:A 714.176
*END
*D_NET INVX1FNTC_IN 0.033
*CONN
*P INVX1FNTC_IN I
*I FL_1281:A *L 0.033
*END
// Header
*PROGRAM "ParasiticsGenerator"
*VERSION "1.0 ALPHA"
*DESIGN_FLOW "EXTERNAL_SLEWS" "EXTERNAL_LOADS"
*DIVIDER |
*DELIMITER :
*BUS_DELIMITER [ ]
*T_UNIT 1.0 PS
*C_UNIT 1.0 PF
*R_UNIT 1.0 OHM
*L_UNIT 1.0 UH
// Name Map
*NAME_MAP
*1 IN1
*2 net1a
*3 blk1
*4 net3b
*5 OUT1
// Externals/Ports
*PORTS
*5 O *L 0.05
*1 I *S 5000 5000
// Hierarchical Entity
*DEFINE *3 "subBLOCK"
// Internals
*D_NET *4 0.32429
*CONN
*I *3:OUT2 O
*I I104:I I *L 0.044
*CAP
1 *3:OUT2 0.011307
2 I104:I 0.128838
3 *4:1 0.140145
*RES
5 *3:OUT2 *4:1 7.128
6 *4:1 I104:I 2.55215
*END
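When a d_net looks suspicious, a quick consistency check is to add up its capacitances and compare against the total on the *D_NET line. The sketch below does this for the net `*4` in the example above. The helper name is hypothetical, and whether a given tool includes pin loads in the total is an assumption; for this example the pin load plus the *CAP entries reproduces the declared total.

```python
import math

def check_d_net(total_cap, pin_loads, cap_entries, rel_tol=1e-6):
    """Return (computed, ok): the sum of pin loads and *CAP values, and
    whether that sum agrees with the total on the *D_NET line."""
    computed = sum(pin_loads) + sum(cap_entries)
    return computed, math.isclose(computed, total_cap, rel_tol=rel_tol)

# Values from the example: *D_NET *4 0.32429
pin_loads = [0.044]                            # *I I104:I I *L 0.044
cap_entries = [0.011307, 0.128838, 0.140145]   # *CAP entries 1, 2, and 3

computed, ok = check_d_net(0.32429, pin_loads, cap_entries)
print(f"computed = {computed:.5f}, consistent = {ok}")
```

A mismatch would not necessarily mean the file is wrong (coupling capacitances and tool conventions differ), but it is a reasonable first thing to inspect.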
Extraction Correlation
There are two types of extraction that are run in the physical implementation flow.
  Extraction during optimization is done because it is much faster.
  Extraction during signoff is slower and requires special inputs, such as GDSII, tool-specific libraries, and mapping files.
[Figure: the netlist goes through place/route, which uses optimization extraction; the resulting GDSII goes through signoff extraction, which writes the SPEF parasitic file]
Steps
Create extraction libraries
  QRC requires special libraries generated from technology-specific files.
Input routed design
Create command file
  Commands and directives for setup and extraction.
Run extraction
qrc -cmd script.cmd -log logfile.log
Command file includes many options, among them:
  process_technology
  setup commands (input, output, and extraction)
  input_db
  output_db
  extract
  global_nets
  capacitance
  filter commands
Example command file:
process_technology \
  -technology_library_file assura_tech.lib \
  -technology_name tsmc13
output_setup \
  -net_name_space schematic \
  -temporary_directory_name QRCRun \
  -file_name QRC_coupled.spef
extraction_setup \
  -max_fracture_length infinite \
  -net_name_space layout \
  -max_fracture_length_unit micron
input_db \
  -type assura \
  -directory_name ../rundir \
  -run_name EngineX4 \
  -format GDS \
  -design_file ../routed1.gds \
  -design_cell_name EngineX4
output_db -type spef
extract -selection all -type rc_coupled
global_nets -nets VDD VSS
capacitance -decoupling_factor 1.0
filter_coupling_cap \
  -coupling_cap_threshold_absolute 0.01
filter_cap \
  -exclude_floating_nets true
filter_res \
  -remove_dangling_res true \
  -merge_parallel_res true
[Figure: extraction reads the routed design (GDSII), the physical library, and the TCL command file, and writes the SPEF parasitic file]
Discussion Questions
  In a SPEF header, why are the program_name and program_version important?
  What is the difference between d_net and r_net? When are they used?
  What is the difference between coupling cap and fringe cap? Which kind of capacitance do we need to be concerned about for signal integrity?
Topics in This Module
  Parasitic extraction
  Delay calculation
Delay Calculation
  Delay calculation fundamentals
  SDF
What Is Propagation Delay?
Definition: The propagation delay is the time difference between the input signal crossing a voltage threshold and the output signal crossing a voltage threshold.
Example: The inverter had a propagation delay of 10 ps.
[Figure: input and output voltage waveforms of an inverter (INV) between VL and VH; the propagation delay, tprop = 10 ps, is measured between the VTH_50 crossings of the input signal and the output signal]
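The definition can be sketched numerically: find the time at which each waveform crosses the 50% threshold and subtract. The waveforms below are hypothetical straight-line edges chosen to reproduce the 10 ps example, and linear interpolation between samples is an assumption of this sketch.

```python
def crossing_time(times, volts, threshold):
    """Time at which a sampled waveform first crosses the threshold,
    using linear interpolation between neighboring samples."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 == threshold:
            return times[i]
        if (v0 - threshold) * (v1 - threshold) < 0:
            frac = (threshold - v0) / (v1 - v0)
            return times[i] + frac * (times[i + 1] - times[i])
    if volts[-1] == threshold:
        return times[-1]
    raise ValueError("waveform never crosses the threshold")

def propagation_delay(t_in, v_in, t_out, v_out, vl, vh):
    """Output 50% crossing time minus input 50% crossing time."""
    vth_50 = vl + 0.5 * (vh - vl)
    return crossing_time(t_out, v_out, vth_50) - crossing_time(t_in, v_in, vth_50)

# Hypothetical inverter waveforms (times in ps, voltages in volts)
t_in, v_in = [0.0, 20.0], [0.0, 1.0]      # rising input edge
t_out, v_out = [10.0, 30.0], [1.0, 0.0]   # falling output edge

tprop = propagation_delay(t_in, v_in, t_out, v_out, vl=0.0, vh=1.0)
print(tprop)  # 10.0
```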
… pass through two specified voltage thresholds. The threshold points are usually defined as a certain percentage of the voltage swing.
Example: The slew of the output signal was 0.01 volt/ps, whereas the transition time to go from 10% of VDD to 90% of VDD was 10 ps.
[Figure: voltage waveform between VL and VH; the transition time is measured between the VTH_10 and VTH_90 crossings, and the slew is the slope of each edge]
  Register clocks, inputs
  Ports of the design
  Pins of a macro inside the design
[Figure: path delay from a start point (register CK->Q) to an end point (register D)]
tcell = tintrinsic + tload_slew
  Intrinsic delay: Cell delay with zero load
  Load: The larger the load, the longer the delay
  Slew: The larger the input slew, the longer the delay
Library vendor characterizes each cell in the library for timing.
# Table for load/slew dependent cell delay
Model(ioDelayRiseModel
  (Spline
    (Input_Slew_Axis 0.050 0.200 1.000 4.000 20.000)
    (Load_Axis 0.0446 0.892 3.568 14.275)
    data((0.7210 0.8471 1.2849 3.05673)
         (0.8119 0.9380 1.3758 3.1475)
         (0.9975 1.1236 1.5612 3.3322)
         (1.4293 1.5552 1.9922 3.7609)
         (3.3955 3.5204 3.9542 5.7101))
[Figure: delay values plotted against the slew and load axes]
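To illustrate how a delay calculator might read such a table, the sketch below looks up a cell delay for an arbitrary slew and load. It uses plain bilinear interpolation, which is a simplifying assumption: the table is declared as a spline model, and a real tool may interpolate differently. The function names are hypothetical; the axis and data values are copied from the table above.

```python
from bisect import bisect_right

SLEW_AXIS = [0.050, 0.200, 1.000, 4.000, 20.000]
LOAD_AXIS = [0.0446, 0.892, 3.568, 14.275]
# One row per input slew value, one column per load value.
DELAY = [
    [0.7210, 0.8471, 1.2849, 3.05673],
    [0.8119, 0.9380, 1.3758, 3.1475],
    [0.9975, 1.1236, 1.5612, 3.3322],
    [1.4293, 1.5552, 1.9922, 3.7609],
    [3.3955, 3.5204, 3.9542, 5.7101],
]

def _segment(axis, x):
    """Index i of the axis segment [axis[i], axis[i+1]] used for x.
    Values outside the axis use the first or last segment."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def cell_delay(slew, load):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i = _segment(SLEW_AXIS, slew)
    j = _segment(LOAD_AXIS, load)
    ts = (slew - SLEW_AXIS[i]) / (SLEW_AXIS[i + 1] - SLEW_AXIS[i])
    tl = (load - LOAD_AXIS[j]) / (LOAD_AXIS[j + 1] - LOAD_AXIS[j])
    low = DELAY[i][j] * (1 - tl) + DELAY[i][j + 1] * tl
    high = DELAY[i + 1][j] * (1 - tl) + DELAY[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts

print(cell_delay(0.200, 0.892))   # a grid point of the table
print(cell_delay(0.125, 0.0446))  # halfway between the first two slew rows
```

Points outside the table are linearly extrapolated from the nearest segment in this sketch; a production flow would normally warn about that instead.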
  Estimated: Uses a “wire load model” (WLM) that estimates the net delay based on load and slew
  Reduced: Uses a reduced SPEF (r_net) and annotates a lumped RC value to the net
  Detailed: Uses a detailed SPEF (d_net) and annotates the detailed RC values to the net
tpath = tc1 + ti1 + tc2 + ti2 + tc3 + ti3 + tc4, where
  tc1 is the clock-to-q delay of the starting register
  ti1, ti2, and ti3 are the interconnect delays
  tc2 and tc3 are the cell delays of the logic in between the registers
  tc4 is the setup time of the ending register
[Figure: tpath from the start point (register CK->Q) to the end point (register D)]
Discussion Questions
  How do slew and load affect delay?
  How does a library vendor get the timing data for its technology libraries?
Input and Output, Format
Input
  Routed design in the Verilog language or other HDL + DEF
  Parasitic extraction file (SPEF)
  Logical timing libraries in Liberty format
  Optional: Physical libraries in LEF format
  Constraints and commands in TCL
Output
  Standard Delay Format (SDF) file containing all of the delay information in the design
[Figure: delay calculation reads the routed design (gates + DEF), SPEF, the logical and physical libraries, and TCL commands, and writes an SDF delay file]
… synthesis.
  Rough estimates based on wire load models in logic synthesis
  … placement, and CTS
  Best estimates based on extracted parasitics after routing
Output of delay calculation (SDF) is used in many other steps in the flow.
  … synthesis during optimization.
  In signal integrity, delay calculation creates incremental SDF for timing analysis, based on the SI parasitics.
  In static timing analysis, the SDF file is used to annotate timing on cells and nets.
[Figure: design flow from specification, RTL, and logic synthesis through floorplanning, placement, CTS, routing, and design verification to mask prep (GDSII); static timing analysis, delay calculation, signal integrity, and extraction support each design optimization step]
SDF
What Is SDF?
Definition: An IEEE standard for the representation and interpretation of timing data for use at any stage of an electronic design process
Example: In our design flow, we have a standalone delay calculator that outputs SDF. We loaded the SDF into our static timing analysis tool to verify our design meets its performance requirements.
Example SDF File
(DELAYFILE
  (SDFVERSION "3.0")
  (DESIGN "BIGCHIP")
  (DATE "March 12, 1995 09:46")
  (VENDOR "Southwestern ASIC")
  (PROGRAM "Fast program")
  (VERSION "1.2a")
  (DIVIDER /)
  (VOLTAGE 5.5:5.0:4.5)
  (PROCESS "best:nom:worst")
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
  (CELL
    (CELLTYPE "BIGCHIP")
    (INSTANCE top)
    (DELAY
      (ABSOLUTE
        (INTERCONNECT mck b/c/clk (.6:.7:.9))
        (INTERCONNECT d[0] b/c/d (.4:.5:.6))
      )
    )
  )
  (CELL
    (CELLTYPE "AND2")
    (INSTANCE top/b/d)
    (DELAY
      (ABSOLUTE
        (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
        (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
      )
    )
  )
  …
The SDF file stores the timing data generated by EDA tools for use at any stage in the design process. The data in the SDF file is represented in a tool-independent way and can include
  Delays: Module path, device, interconnect, and port
  Timing checks: Setup, hold, recovery, removal, skew, width, period, and nochange
  Timing constraints: Path, skew, period, sum, and diff
  Timing environment: Intended operating timing environment
  Design/instance-specific or type/library-specific data
  Scaling, environmental, and technology parameters
Throughout a design process, you can use several different SDF files. Some of these files can contain pre-layout timing data. Others can contain path constraint or post-layout timing data.
Header
  Contains all of the basic information of the SDF file’s origin and specifications
Cell entries
  Identifies a cell or macro that contains timing data to be applied
  Within a cell entry, there can be delay, timing check, and timing environment entries
Delay entries
  Identifies I/O paths, ports, and interconnects that contain timing data to be applied
Timing check entries
  Associate timing check limit values with specific cell instances
Timing environment entries
  Contains timing environment information, constraints, etc.
Header
  SDF version
  Design name
  Vendor, program name, and version
  Process information, timescale
Example:
(DELAYFILE
  (SDFVERSION "3.0")
  (DESIGN "MYCHIP")
  (DATE "December 30, 2007 12:08")
  (VENDOR "ASIC_vendor")
  (PROGRAM "SDF_program")
  (VERSION "2.4.1")
  (DIVIDER /)
  (VOLTAGE 1.5:1.3:1.1)
  (PROCESS "best:nom:worst")
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
The SDF version, vendor, and program name are important to note for debug reasons. Also, process, temperature, voltage, and timescale information should be consistent with your timing analysis.
Cell Entries
Cell entries identify the cells and macros in a design with the following information:
  Cell type
  Cell instance name
(CELL
  (CELLTYPE "DFF")
  (INSTANCE u1/u2/u3_reg)
  …
    )
  )
)
  Absolute
  Incremental
  Pulse width
It is important to differentiate them. For a typical analysis with crosstalk, the absolute delays are first annotated, then the incremental delays due to crosstalk are annotated and added to the existing delays.
  Absolute: SDF delays overwrite existing delays during annotation.
  Incremental: SDF delays are added to existing delays during annotation.
  Pulse width: Pulse limits are set for specific points.
[Figure: example delays of 2 ns on in1 and 1 ns on in2, and a pulse limit (limit1)]
Absolute
  SDF replaces existing delay values in the design during annotation.
Increment
  SDF adds to existing delay values in the design during annotation.
Pathpulse
  Pulse width limits
Examples:
(DELAY
  (ABSOLUTE
    (IOPATH (posedge clk) q (22:28:33) (25:30:37))
    (PORT clr (32:39:49) (35:41:47))
  )
)
(DELAY
  (INCREMENT
    (IOPATH (posedge clk) q (-4::2) (-7::5))
    (PORT clr (2:3:4) (5:6:7))
  )
)
(DELAY
  (PATHPULSE i1 o1 (13) (21))
)
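The difference between the two annotation modes can be sketched with the rise-delay triples from the examples above. The helper names are hypothetical, and treating an empty field such as the middle of `(-4::2)` as "leave the existing value unchanged" is an assumption of this sketch. It handles only three-field `(min:typ:max)` triples.

```python
def parse_triple(text):
    """Parse an SDF '(min:typ:max)' triple; an empty field becomes None."""
    fields = text.strip("()").split(":")
    return tuple(float(f) if f else None for f in fields)

def annotate(existing, kind, triple):
    """Apply one SDF delay triple to an existing (min, typ, max) delay.

    ABSOLUTE replaces each value; INCREMENT adds to each value.
    A None field leaves the existing value unchanged.
    """
    if kind == "ABSOLUTE":
        return tuple(e if n is None else n for e, n in zip(existing, triple))
    if kind == "INCREMENT":
        return tuple(e if n is None else e + n for e, n in zip(existing, triple))
    raise ValueError(f"unknown delay kind: {kind}")

# Rise delay of IOPATH (posedge clk) q from the examples above
delay = (0.0, 0.0, 0.0)
delay = annotate(delay, "ABSOLUTE", parse_triple("(22:28:33)"))
delay = annotate(delay, "INCREMENT", parse_triple("(-4::2)"))
print(delay)  # (18.0, 28.0, 35.0)
```

This mirrors the crosstalk flow described earlier: the absolute delays are annotated first, and the incremental delays are then added on top.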
Setup/Hold
  Setup: Limit of time where data must remain stable before the clock edge
  Hold: Limit of time where data must remain stable after the clock edge
Recovery
  Limit of time between the removal of an asynchronous signal (not data) and an active clock edge
Removal
  Limit of time between an active clock edge and the removal of an asynchronous signal (not data)
[Figure: clk and a_rstb waveforms showing the recovery and removal intervals]
  Setup
  Hold
  Recovery
  Removal
Example:
(TIMINGCHECK
  (SETUP din (posedge clk) (12))
)
(TIMINGCHECK
  (HOLD din (posedge clk) (9.5))
)
(TIMINGCHECK
  (RECOVERY (posedge clearbar) (posedge clk) (11.5))
)
(TIMINGCHECK
  (REMOVAL (posedge clearbar) (posedge clk) (6.3))
)
… well as provide information about the environment in which the circuit will operate.
Among the entries are
  Constraints for path, period, skew, etc.
  Time for arrival, departure, slack
  Waveform
… contained in the Synopsys Design Constraints (SDC) file
// Header
  (VENDOR "Southwestern ASIC")
  (PROGRAM "Fast program")
  (VERSION "1.2a")
  (DIVIDER /)
  (VOLTAGE 5.5:5.0:4.5)
  (PROCESS "best:nom:worst")
  (TEMPERATURE -40:25:125)
  (TIMESCALE 100 ps)
// Cell 1 – Top with interconnects
  (CELL
    (CELLTYPE "BIGCHIP")
    (INSTANCE top)
    (DELAY
      (ABSOLUTE
        (INTERCONNECT mck b/c/clk (.6:.7:.9))
        (INTERCONNECT d[0] b/c/d (.4:.5:.6))
      )
    )
  )
// Cell 2 – AND gate with delays
  (CELL
    (CELLTYPE "AND2")
    (INSTANCE top/b/d)
    (DELAY
      (ABSOLUTE
        (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
        (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
      )
    )
  )
// Cell 3 – Register with delays, setup checks
  (CELL
    (CELLTYPE "DFF")
    (INSTANCE top/b/c)
    (DELAY
      (ABSOLUTE
        (IOPATH (posedge clk) q (2:3:4) (5:6:7))
        (PORT clr (2:3:4) (5:6:7))
      )
    )
    (TIMINGCHECK
      (SETUPHOLD d (posedge clk) (3:4:5) (-1:-1:-1))
      (WIDTH clk (4.4:7.5:11.3))
    )
  )
// More Cells
  (CELL
    . . .
  )
)
Steps
Generate SignalStorm libraries
  Generates libraries for more accurate delay calculation
Generate a SignalStorm design database
  Imports netlist information
Import SPEF
  Imports parasitics
Setup conditions
  Sets up the boundary conditions for the design, including slew and load information
Calculate delay
  Core algorithm to calculate delay for the design
Generate output file and reports
  Generate SDF and reports
sndc -S script.cmd -L logfile.log
# QRC Command File : GDSII -> SPEF
# Create and open design database
db_open demo
# Import SPEF
db_install -spef test.spef
# Import setup commands
db_setup -setup test.st -process worst
# Load and link design
db_load TEST_CHIP
# Calculate delay
db_delay -process worst
db_xtk -process worst
db_report sdf -p worst -report test.sdf -design TEST_CHIP -xtk_min fast -xtk_max slow
db_close
[Figure: delay calculation reads the SPEF, the tech libraries, and the routed design (DEF), and writes the SDF delay file]
SDF
Back-Annotation
Delay calculation produces an SDF timing file, based on technology information, SPEF, and design information (netlist).
The analysis tool can now read in the SDF, as well as the design and technology information, to produce its reports.
Since the SDF is already created, all of the timing information can be … flow, thus ensuring consistency.
[Figure: the delay calculator reads the SPEF and the tech lib and writes SDF; the analysis tool reads the SDF, the netlist, and the tech lib]
… information and create an SDF file with more granular constraints to drive an implementation tool.
The implementation tool (synthesis, or place/route) can use the information in the forward-annotated SDF file to more … and possibly make better choices to meet its overall constraints.
[Figure: the analysis tool reads the tech lib and the user constraints and writes SDF constraints, which drive the implementation tool]
Summary
There are various types of extraction models, which trade accuracy against runtime. They include parallel plate, 2.5D, and 3D models.
The Standard Parasitic Exchange Format (SPEF) file is the IEEE standard to store parasitic information for a design. It has several sections, including header, externals, and internals.
Fundamentally, delay calculation is based on the concepts of propagation delay, transition time, and slew. We saw that delay is a function of load and slew, among other variables.
The Standard Delay Format (SDF) file is the IEEE standard to store delay information. It has several sections, including header, cell, and delay entries.
SDF delay data can be back-annotated to analysis tools, whereas SDF constraint data can be forward-annotated to implementation tools.
1. 3D extraction models are used for quick and relatively inaccurate parasitic calculations.
2. In the header section of SPEF, there is a place to annotate the version of the tool.
3. … setup time check.
4. In the “cell entry” of an SDF file, all of the relevant timing information for the cell is included within its boundaries.
5. One advantage of using a delay calculator’s SDF is that the timing calculations will be consistent throughout the entire flow.
Learning Activity
In this activity, you will
  Study the physical implementation flowchart
  Add SDF and SPEF files at the appropriate step in the flow
  Present your results to the class
20 minutes for activity
10 minutes for debriefing
Sources
  Standard Parasitic Exchange Format (SPEF), IEEE Standard 1481-1999
  Standard Delay Format Specification Version 3.0, Open Verilog International: http://www.eda.org/sdf/sdf_3.0.pdf
Static Timing Analysis and Signal Integrity Analysis
Module 10
Module Objective
In this module, you will learn to
  Explain static timing and signal integrity (SI) analysis and identify problems
Discussion Questions
  What determines the speed at which a circuit works?
  How do you gauge if your circuit works correctly at the required speed?
Topics in This Module
  Static timing analysis (STA)
  Signal integrity analysis
  Timing constraints
  Timing exceptions
  Setup and hold timing violations
  On-chip variation (OCV) and clock path pessimism removal (CPPR)
  Multi-mode multi-corner (MMMC) design
  Timing correlation
Purpose for Timing Analysis
The goal of timing analysis is to verify that a design meets timing requirements under a specified set of timing constraints.
Timing analysis lets you determine how fast a design can run without …
The results of timing analysis can be used to fine tune and debug the speed-limiting, critical paths in a design.
… given timing constraints
  Analyzes all possible timing paths in a short period of time
  Ignores functionality of circuit, thus analyzing paths that cannot be exercised and must be eliminated by the designer
  Preferred method for signoff

  Designer creates timing test vectors that are simulated using a gate-level netlist to verify timing
  No false paths exist
  Easy to miss paths by not including them in vectors
  Requires a significant amount of CPU time to do simulations
  This is a mandatory late-stage run to ensure that paths not tested by static timing analysis are checked
In this course, we will only cover static timing analysis.
What Is Static Timing Analysis?
The preferred method for timing signoff
Definition: Process of computing the timing of digital design without regard … behavior
… timing of the design, we ran static timing analysis after detail route, and saw several … time requirements.
[Figure: design flow from specification, architecture, and RTL through synthesis, floorplanning, placement, CTS, and routing to the detail routed design, design verification, and mask prep (GDSII); delay calculation, signal integrity, and extraction support each design optimization step]
Input
  Design in the Verilog® language or other HDL (Note: STA can be run on a design at any stage of the back-end flow.)
  Constraints in Synopsys Design Constraints (SDC) format
  Logical timing libraries in Liberty (.lib) format
  Constraints and commands in TCL
  SPEF, SDF, and incremental SDF (SI analysis)
Output
  Timing reports, including noise-on-delay effects (SI analysis)
[Figure: static timing analysis reads the gates, SPEF, SDF and incremental SDF, SDC, TCL, and the logical library, and writes reports]
Software tools use the timing constraints to guide the timing-driven optimization tools in order to meet these goals.
Some of the timing constraints that an STA tool follows are
  Clocks definition
  Input delay/arrival time
  Output delay/required time
  Operating conditions
What Is a Clock Definition?
Clock period: The time difference between two consecutive rising or falling clock edges when they cross a specific reference level
Duty cycle: The ratio between the pulse duration (t) and the period (T) of a rectangular waveform
Duty Cycle = t/T
[Figure: rectangular clock waveform marking the rising edges, the falling edges, the pulse duration t, and the clock period T]
Types of Clocks
Ideal clocks
  To simplify clock analysis, we assume that under ideal conditions all flip-flops are clocked together at a time reference (time = 0 ns).
  In ideal mode, the clock tree has zero insertion delay.
Propagated clocks
  Insertion delay is the known delay of the clock tree to any given end point.
  Clock uncertainty (clock skew + clock jitter) is the unknown variation in clock delays.
  Clock delays are calculated from clock tree routing and extracted delays.
  Provides more accuracy and is used for final timing closure.
[Figure: ideal clock compared with the propagated clock (Clk), showing the clock insertion delay and the skew]
… used.
  Uncertainty consists of margin (extra delay the design team adds), clock skew, and clock jitter.
  Estimated latency is considered.
Post-CTS
  Propagated clocks are used.
  Uncertainty consists of margin and clock jitter.
  Propagated latency is …
[Figure: clock source, source latency, and network latency (delay through the clock logic) up to the clock pin, shown for the ideal clock and for the propagated clock]
The external delay time or the required time is the time determined by external logic before the next rising edge of the clock.
[Figure: input delay (data arrival timing) and output delay (data required timing), each shown against the clock period]
Each wafer batch is made with a slightly different set of process parameters and thus, inherently, the die will run at different speeds. In fact, there can even be variations across a single die (OCV), which will be discussed later.
This constraint describes the process, voltage, and temperature conditions of the design.
There are three conditions: worst, best, and typical.
Operating conditions can be set from a single set of libraries (min, typ, or max) or from multiple libraries (min and max), and used to perform setup and hold analysis.
The technology libraries contain information on how to scale the cell parameters with variation in process parameters and operating conditions that can be used to calculate accurate cell delay.
[Figure: operating condition corners; the recoverable labels read “WORST case, HIGH temperature, LOW voltage, BEST process” and “WORST case, HIGH temperature, LOW voltage, WORST process”]
Check Constraints
When a design is loaded into an STA tool and constraints are applied
  Checks for consistency and completeness of the timing constraints specified for a design
  The timing constraints should be complete before running a timing debug
  STA tools come with specific commands that run these checks.
[Figure: flowchart: check constraints; if the result is not clean (NO), check again; if clean (YES), proceed to timing]
  Arrival time and required time for each clock in a multiple clock system
  Clock gating points
  Combinational loop in the design
  Constant collision/contradiction on a net connected to the pin
  Multiple clocks arriving at a leaf cell
What Is a Timing Report?
The timing report is a summary of the final timing information.
There are separate reports for setup time analysis and hold timing analysis.
The report usually consists of the following parts:
Header
• End point arrival time calculation
• Slack calculation
Body
• An external input pin to an internal register
• An internal register (or input select pin) to an output pin
• An internal register to another internal register (C2C)
Header
+ Source Insertion Delay        3.000
- External Delay                2.000
+ Phase Shift                  10.000
- Uncertainty                   0.250
= Required Time                10.750
- Arrival Time                  7.447
= Slack Time                    3.303
Clock Rise Edge                 0.000
+ Source Insertion Delay        4.000
= Beginpoint Arrival Time       4.000
Body
+-------------------------------------------------------------------------------------+
| Instance                   | Arc           | Cell      | Delay | Arrival | Required |
|                            |               |           |       | Time    | Time     |
|----------------------------+---------------+-----------+-------+---------+----------|
| i_150                      | Y ^           |           |       |   4.000 |    7.303 |
| DTMF_INST/m_clk__L1_I1     | A ^ -> Y v    | CLKINVX20 | 0.327 |   4.327 |    7.630 |
| DTMF_INST/m_clk__L2_I2     | A v -> Y ^    | CLKINVX20 | 0.278 |   4.604 |    7.908 |
| DATA_BUS_MACH_INST/reg_4   | CK ^ -> Q ^   | SDFFRHQX1 | 0.507 |   5.112 |    8.415 |
| TDSP_CORE_GLUE_INST/i_9712 | A ^ -> Y v    | INVXL     | 0.135 |   5.247 |    8.550 |
| TDSP_CORE_GLUE_INST/i_9713 | A v -> Y ^    | INVXL     | 0.101 |   5.348 |    8.651 |
| PORT_BUS_MACH_INST/i_9761  | A ^ -> Y v    | INVXL     | 0.095 |   5.443 |    8.747 |
| PORT_BUS_MACH_INST/i_9762  | A v -> Y ^    | INVX2     | 0.122 |   5.566 |    8.869 |
| FE_OFC1146_tdsp_portO_4_   | A ^ -> Y ^    | BUFX12    | 0.172 |   5.738 |    9.041 |
| IOPADS_INST/Ptdspop04      | I ^ -> PAD ^  | PDO04CDG  | 1.709 |   7.447 |   10.750 |
|                            | data_out[4] ^ |           | 0.000 |   7.447 |   10.750 |
+-------------------------------------------------------------------------------------+
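The header arithmetic of the report above can be replayed as a quick check. This is a minimal sketch: the function names are hypothetical, and the 0.000 capture clock edge is an assumption because the first header line is not visible in the report.

```python
from decimal import Decimal

def required_time(clock_edge, source_insertion, external_delay, phase_shift, uncertainty):
    """Required time at the endpoint, using the signs shown in the report header."""
    return clock_edge + source_insertion - external_delay + phase_shift - uncertainty

def slack(required, arrival):
    """Setup-style slack: a positive value means the path meets timing."""
    return required - arrival

# Values copied from the report header; the 0.000 capture clock edge is an assumption.
required = required_time(
    clock_edge=Decimal("0.000"),
    source_insertion=Decimal("3.000"),
    external_delay=Decimal("2.000"),
    phase_shift=Decimal("10.000"),
    uncertainty=Decimal("0.250"),
)
path_slack = slack(required, Decimal("7.447"))
print(required, path_slack)  # 10.750 3.303
```

A negative slack from the same calculation would indicate a timing violation on the path.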
Timing constraints
Timing exceptions
Setup and hold timing violations
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
Setup Time: The time a synchronous input must be stable before the active clock edge.
Hold Time: The time a synchronous input must be stable after the active clock edge.
[Figure: clock and input data waveforms showing the data-valid window, with the setup
time and the hold time around the active clock edge]
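The two definitions describe a window around the active clock edge in which the data must not change. A minimal sketch of that check follows; the function name and the example numbers are illustrative assumptions, not taken from any tool.

```python
def check_capture(data_change_time, clock_edge, setup, hold):
    """Classify a data transition relative to the active clock edge.

    The data must be stable from (clock_edge - setup) to (clock_edge + hold).
    """
    if clock_edge - setup < data_change_time <= clock_edge:
        return "setup violation"   # data changed too late before the edge
    if clock_edge < data_change_time < clock_edge + hold:
        return "hold violation"    # data changed too soon after the edge
    return "ok"

# Hypothetical numbers (ns): active edge at 10.0, setup 0.167, hold 0.179.
print(check_capture(9.0, 10.0, 0.167, 0.179))   # ok
print(check_capture(9.9, 10.0, 0.167, 0.179))   # setup violation
print(check_capture(10.1, 10.0, 0.167, 0.179))  # hold violation
```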
ce
A hold time violation is when a signal arrives too early and advances one
[Figure: input data and clock waveforms showing a setup time violation and a hold time
violation]
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
Other End Arrival Time        0.104
- Setup                       0.167
+ Phase Shift                 2.000
= Required Time               1.937
- Arrival Time                1.946
= Slack Time                 -0.009
Clock Rise Edge               0.000
= Beginpoint Arrival Time     0.000

Instance  Arc           Cell       Delay  Arrival  Required
                                          Time     Time
clk       ^                               0.000    0.088
ck_0      A ^ -> Y ^    BUFX2      0.091  0.091    0.178
ck_1      A ^ -> Y ^    BUFX2      0.097  0.188    0.275
ck_2      A ^ -> Y ^    BUFX2      0.094  0.282    0.369
ck_3      A ^ -> Y ^    BUFX2      0.092  0.374    0.462
ck_4      A ^ -> Y ^    CLKAND2X2  0.150  0.524    0.612
reg_1     CK^ -> Q v    DFFRHQX1   0.288  0.812    0.900
t_1       A ^ -> Y ^    BUFX8      0.111  0.923    1.011
t_2       A ^ -> Y ^    BUFX8      0.092  1.015    1.103
t_3       A ^ -> Y ^    BUFX8      0.092  1.107    1.195
t_5       A ^ -> Y ^    BUFX4      0.132  1.331    1.379
t_6       A ^ -> Y ^    BUFX8      0.092  1.423    1.471
t_7       A ^ -> Y ^    BUFX6      0.112  1.535    1.563
t_8       A ^ -> Y ^    BUFX8      0.092  1.627    1.655
t_9       A ^ -> Y ^    BUFX4      0.128  1.755    1.747
t_10      A ^ -> Y ^    BUFX8      0.088  1.843    1.835
t_11      B ^ -> Y ^    NAND2X1    0.066  1.909    1.901
t_12      A ^ -> Y ^    INVX1      0.037  1.946    1.937
[Figure: schematic with clk, clock buffers ck_0 through ck_4, registers reg_1 and reg_2
(CK pins), and data path cells t_1 through t_12]
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
Other End Arrival Time        0.933
+ Hold                        0.179
+ Phase Shift                 0.000
= Required Time               1.152
  Arrival Time                1.099
= Slack Time                 -0.053
Clock Rise Edge               0.000
= Beginpoint Arrival Time     0.000

Instance  Arc           Cell       Delay  Arrival  Required
                                          Time     Time
clk       ^                               0.000    0.088
ck_0      A ^ -> Y ^    BUFX2      0.091  0.091    0.178
ck_1      A ^ -> Y ^    BUFX2      0.097  0.188    0.275
ck_3      A ^ -> Y ^    BUFX2      0.092  0.374    0.462
reg_1     CK^ -> Q v    DFFRHQX1   0.288  0.812    0.900
t_2       A ^ -> Y ^    BUFX8      0.092  0.996    1.084
Increasing cell drivability by upsizing the cell
Adding buffers to optimize the critical path and reduce the load on
complex gates with large fanout
[Figure: Upsize Cell; Insert Buffer]
To fix a hold violation, we need to slow the signal path by
Adding delay cells to slow the signal
Reducing the drivability of cells
e
What constraints define a clock?
In reading a timing report, how do you know that the design has a
timing violation?
Timing constraints
Timing exceptions
Setup and hold timing violations
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
What Are Timing Exceptions?
Paths that are given special consideration by the timing analysis tool
False paths
Multicycle paths
False Paths: Paths that are not exercised during operation
Multicycle Paths: Paths that take multiple cycles
[Figure: a false path between DFF1 and DFF2; a multicycle path of N clock cycles between
DFF1 and DFF2]
ce
to be timing constrained (i.e., path between two clock domains).
Blocking of timing arcs
[Figure: adder example with labels A, B, C, D, a_b, c_d, A+B, C+D, and Sel]
Examples: Multiplexed Logic in a Test Mode
ce
multiples of clock frequency
[Figure: data passing between BlockA and BlockB, clocked by CLK1 and CLK2; waveforms of
CLK1, DATA, and CLK2 with T and T/2 cycle times]
Timing constraints
Timing exceptions
Setup and hold timing violations
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
Three Timing Analysis Modes
There are three different timing analysis modes.
Timing Analysis Mode               Description
Single                             Single operating condition used to scale delay value
Best case and worst case (BC-WC)   Analyzes off-chip variation for two extreme operating conditions
On-chip variation (OCV)            OCV is the small difference in the operating parameter value
                                   across the chip.
In this course, we will only cover the OCV mode.
In this analysis mode, the delay calculation for one path may be based on the
maximum operating condition while the delay calculation for another path may be
based on the minimum operating condition for setup and hold checks.
[Figure: on-chip variations for the setup timing check and for the hold timing check;
data delay and clock delay at worst case and best case]
Apply min and max delays to different paths simultaneously.
For setup check, annotate worst-case SDF. Use max delay for launch path and min
delay for capture path.
For hold check, annotate best-case SDF. Use min delay for launch path and max
delay for capture path.
[Figure: CLK1 launch and capture waveforms on a 0 to 10 time axis, showing early and
late launch and capture edges, the phase shift (late), and the ideal clock edges]
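The min/max pairing described above can be written out as two slack formulas. The sketch below is illustrative only: the function names and all delay values are hypothetical, and uncertainty and external delays are left out for brevity.

```python
from decimal import Decimal as D

def setup_slack(launch_clk_max, data_max, capture_clk_min, period, setup):
    """Setup check under OCV: late (max) launch path versus early (min) capture path."""
    arrival = launch_clk_max + data_max
    required = period + capture_clk_min - setup
    return required - arrival

def hold_slack(launch_clk_min, data_min, capture_clk_max, hold):
    """Hold check under OCV: early (min) launch path versus late (max) capture path."""
    arrival = launch_clk_min + data_min
    required = capture_clk_max + hold
    return arrival - required

# Hypothetical delays in ns.
s = setup_slack(D("0.600"), D("1.400"), D("0.500"), period=D("2.000"), setup=D("0.167"))
h = hold_slack(D("0.500"), D("0.300"), D("0.600"), hold=D("0.179"))
print(s, h)  # 0.333 0.021
```

Swapping which path gets the min or max delay would make both checks optimistic, which is why the pairing differs between setup and hold.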
[Figure: launch clock and capture clock paths from the clock root, with the late path,
the min library, and the max library labeled]
For setup check, the timing delay values from the Max library are used for the
data and the launch clock network delay.
The delay values from the Min library are used for the capturing clock network
delay assuming that the clocks are set in propagated mode.
[Figure: capture clock and launch clock paths from the clock root, with the late path,
the early path, the min library, and the max library labeled]
For hold check, the timing delay values from the Min library are used for the
data arrival time and launch clock network delay.
The delay values from the Max library are used for the capturing clock network
delay assuming that the clocks are set in propagated mode.
e
identifying and removing the pessimism introduced in the slack reports for
clock paths when the clock paths have a segment in common.
Example: In the on-chip variation methodology, during setup checks, if both
the launch clock late path and the capture clock early path share a portion of
the clock network, then for the common clock network, a pessimism equal to
the difference between the maximum and minimum delay values is introduced in the
slack values.
[Figure: launch clock and capture clock paths (early and late) sharing a common segment
from the clock root, Dcommon (dc), before reaching FF1 and FF2; d2]
CRPR: The common path cannot be de-rated by two different values at the
same time.
Without this correction, the slack calculation is too pessimistic.
The pessimism is P = dc x Mslow - dc x Mfast.
New slack = slack (without CRPR) + P.
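The pessimism formula can be evaluated directly. In this sketch, dc and the two multipliers are hypothetical values chosen to give a 0.420 adjustment; only the starting slack of -0.009 comes from the setup report shown earlier.

```python
from decimal import Decimal as D

def crpr_pessimism(dc, m_slow, m_fast):
    """P = dc x Mslow - dc x Mfast for the common clock segment."""
    return dc * m_slow - dc * m_fast

def corrected_slack(slack_without_crpr, pessimism):
    """New slack = slack (without CRPR) + P."""
    return slack_without_crpr + pessimism

# Hypothetical common-path delay and de-rating multipliers.
p = crpr_pessimism(dc=D("2.1"), m_slow=D("1.1"), m_fast=D("0.9"))
new_slack = corrected_slack(D("-0.009"), p)
print(p, new_slack)  # 0.42 0.411
```

A report that prints rounded values may differ from this result in the last digit.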
Beginpoint: reg_1/Q (v) triggered by leading edge of ’CLK1’
Other End Arrival Time        0.104
- Setup                       0.167
+ Phase Shift                 2.000
+ CPPR Adjustment             0.420
= Required Time               2.358
- Arrival Time                1.946
= Slack Time                  0.412
Clock Rise Edge               0.000
= Beginpoint Arrival Time     0.000

Instance  Arc           Cell       Delay  Arrival  Required
                                          Time     Time
clk       ^                               0.000    0.508
ck_0      A ^ -> Y ^    BUFX2      0.091  0.091    0.598
ck_1      A ^ -> Y ^    BUFX2      0.097  0.188    0.695
ck_2      A ^ -> Y ^    BUFX2      0.094  0.282    0.789
ck_3      A ^ -> Y ^    BUFX2      0.092  0.374    0.882
ck_4      A ^ -> Y ^    CLKAND2X2  0.150  0.524    1.032
reg_1     CK^ -> Q v    DFFRHQX1   0.288  0.812    1.320
t_1       A ^ -> Y ^    BUFX8      0.111  0.923    1.431
t_2       A ^ -> Y ^    BUFX8      0.092  1.015    1.523
t_3       A ^ -> Y ^    BUFX8      0.092  1.107    1.615
t_4       A ^ -> Y ^    BUFX8      0.092  1.199    1.707
t_5       A ^ -> Y ^    BUFX4      0.132  1.331    1.799
t_6       A ^ -> Y ^    BUFX8      0.092  1.423    1.891
t_7       A ^ -> Y ^    BUFX6      0.112  1.535    1.983
t_8       A ^ -> Y ^    BUFX8      0.092  1.627    2.075
t_9       A ^ -> Y ^    BUFX4      0.128  1.755    2.167
t_10      A ^ -> Y ^    BUFX8      0.088  1.843    2.255
t_11      B ^ -> Y ^    NAND2X1    0.066  1.909    2.321
t_12      A ^ -> Y ^    INVX1      0.037  1.946    2.358
reg_2     D v           DFFRHQX1   0.000  1.946    2.358
Timing constraints
Timing exceptions
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
MMMC Design
Today’s chips include
Multiple standards support (GPRS, EDGE, WCDMA)
Multiple functionalities (MP3, camera, gaming)
Multiple power profiles (awake, doze, sleep)
Multiple test modes (scan, BIST, OPMISR)
Results in multiple constraint sets
It becomes more difficult below 90 nm to
Determine worst-case corner combinations
Determine RC corners
Determine constraint modes
MMMC provides the ability to concurrently support multiple combinations of modes and
corners.
Example: Cell phone chips typically need to be designed for 20 mode/corner scenarios.
[Figure: Mode 1 (functionality), Mode 2 (test), and Mode 3 (power), each with Min and Max
corners and its own constraint set (SDC1, SDC2, SDC3)]
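The number of scenarios grows as the product of modes and corners. The sketch below enumerates that cross product for the three modes and two corners in the figure; the view naming scheme is a made-up convention, not a tool format.

```python
from itertools import product

# Constraint modes and their SDC files, as labeled in the figure.
modes = {"functionality": "SDC1", "test": "SDC2", "power": "SDC3"}
corners = ["min", "max"]

# One analysis view per (mode, corner) pair; the naming is hypothetical.
views = [
    {"name": f"{mode}_{corner}", "sdc": sdc, "corner": corner}
    for (mode, sdc), corner in product(modes.items(), corners)
]

for view in views:
    print(view["name"], view["sdc"], view["corner"])
print(len(views))  # 6
```

With more modes and more PVT or RC corners, the same product quickly reaches the tens of scenarios mentioned in the example.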
e
depending on process variation
Analysis needs to be done at more than just a single min corner and a single max corner
Identification of a single worst-case corner and fixing violations becomes difficult due to
differing conditions
Multi-corner capability enables you to analyze and optimize at all these corner cases.
Multi-mode timing analysis
A design can have multiple modes of operation and each mode can have different,
even conflicting, constraints
Allows concurrent analysis and optimization of multiple modes, eliminating iterations for
timing closure
Multi-corner timing analysis
Used to resolve different timing problems that appear at different processes, voltages, and
temperatures
[Figure: analysis views built from delay corner and constraint mode pointers. Delay
calculation corners: timing libs, cdB libs, PVT setting, de-rating, SDF. RC corner: RC
controls. Constraint mode descriptions (SDC): clock defs, constants, exceptions.]
4. Analyze the timing reports from multiple scenarios
5. Determine which scenario to optimize
To analyze by using the sequential multimode scenario
1. Define the current scenarios
2. Identify the critical scenario based on the timing report generated by the STA tool
3. Define the most critical scenario as the first scenario in the current scenario definition
4. Run optimizations such as clock tree optimization, post-placement optimization, or
routing optimizations
Repeat steps 2 through 4 until timing is satisfactory
[Figure: flow from synthesis into the STA tool: load design; per scenario, create scenario,
set operating conditions, set constraints; identify most critical scenario; analyze; optimize]
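Steps 2 and 3 of the sequential approach amount to ranking scenarios by how badly they fail. A small sketch, assuming the worst negative slack (WNS) of each scenario has already been read from its timing report; the scenario names and slack values are invented.

```python
def order_scenarios(wns_by_scenario):
    """Return scenario names with the most critical (lowest slack) first."""
    return sorted(wns_by_scenario, key=wns_by_scenario.get)

def timing_satisfactory(wns_by_scenario):
    """Stop iterating once no scenario has negative slack."""
    return all(wns >= 0 for wns in wns_by_scenario.values())

# Hypothetical worst slack per scenario, in ns.
wns = {"func_max": -0.120, "test_max": -0.015, "func_min": 0.040}
print(order_scenarios(wns))      # ['func_max', 'test_max', 'func_min']
print(timing_satisfactory(wns))  # False
```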
Timing constraints
Timing exceptions
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
What Is Correlation?
Synthesis, place and route, and the sign-off tools are (usually) different.
Synthesis uses wire load models to estimate the physical design.
Need to adjust wire load model coefficients.
Place and route uses more realistic numbers for the physical design.
Timing becomes more accurate as the flow progresses.
Different timing engines used at different stages use different techniques to calculate
timing.
Do the optimizer and placer see the same worst paths as the static timer?
Correlation is an indication of the relationship between two variables.
[Figure: flow from Design Entry through Synthesis, Place, and Route (each with its own
timing engine) to RC Extraction, SI Analysis, and Static Timing Analysis (Sign-off)]
Implementation tools have a built-in extraction tool, which is different from the sign-off
extraction tools.
Extracted output will be different
Both tools should see the same information and provide the same results.
Prevents additional work at the time of sign-off
At 130 and 90 nm, parasitic effects are small, and there is not that much that you need
to correlate.
At 45 nm, correlation between different timing infrastructures is nearly impossible,
based on the number of complex effects.
[Figure: delay variation between the sign-off tool and the implementation tool by
technology node (250 nm, 180 nm, 130 nm, 90 nm, 65 nm)]
Generate the scaling factors or the de-rating factors for
Total capacitance
Cross-coupling capacitance
Resistance
The timing scaling factors affect the path delay values generated in
the timing reports.
Scaling factors are set for data paths, clock paths, and minimum and
maximum operating conditions.
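One simple way to derive such a scaling factor is a least-squares fit through the origin between values extracted by the implementation tool and by the sign-off tool for the same nets. This is an illustrative choice of method, not a description of how any particular tool computes its factors, and the capacitance values are invented.

```python
def scaling_factor(implementation_values, signoff_values):
    """Least-squares factor k that best maps implementation values onto sign-off values."""
    numerator = sum(x * y for x, y in zip(implementation_values, signoff_values))
    denominator = sum(x * x for x in implementation_values)
    return numerator / denominator

# Hypothetical total capacitance per net (pF) from the two extraction tools.
impl_cap = [1.0, 2.0, 4.0]
signoff_cap = [1.1, 2.2, 4.4]
k = scaling_factor(impl_cap, signoff_cap)
print(round(k, 3))  # 1.1
```

The same fit could be repeated separately for cross-coupling capacitance and resistance.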
Post-CTS and Post-Route Correlation
Post-CTS
Actual clock tree delays propagated
Actual clock net delays used instead of estimates done at pre-CTS
Post-route
All cells and nets have fixed location on design
Generates a more realistic timing result
Effects due to congestion are taken into account
Effects due to signal integrity can be taken into account
Account for mismatch between pre-route and post-route delays
[Figure: post-CTS and post-route views of the clock source, source latency, network
latency, clock pin, C. logic, and delay]
Timing constraints
Timing exceptions
On-chip variation (OCV) and clock path pessimism removal (CPPR)
Multi-mode multi-corner (MMMC) design
Timing correlation
What Are Design Rule Constraints?
Design rule constraints are requirements that depend on the technology library.
These rules are established by the library vendor for the proper functioning of
the fabricated circuit; they must not be violated.
Users can set more restrictive (explicit) values but cannot remove the
implicit design rule constraint attributes.
[Figure: constraints: Max Tran, Max Cap]
The output of every gate usually has one or more of the following design
rule constraints:
Max transition
Max fanout
Max capacitance
What Is Maximum Transition Time?
The maximum transition time for a net is the longest time allowed for
its driving pin to change logic values.
Typically fixed by buffering the output of the driving gate.
[Figure: before optimization (maximum transition rule violation) and after optimization
(maximum transition rule met), with 1x and 2x cells]
To prevent routing congestion, and to help the synthesis tool meet
maximum transition and capacitance constraints, we need to specify the
maximum fanout limit for the design.
Most technology libraries place fanout restrictions on driving pins, creating an
implicit fanout constraint for every driving pin in designs using that library.
What Is Maximum Capacitance?
Maximum capacitance specifies the maximum capacitance allowed on
the output pin of a cell.
The maximum capacitance design rule constraint allows you to control
the capacitance of nets directly.
The design rule constraints max_fanout and max_transition limit the
actual capacitance of nets indirectly.
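A design rule check compares each driving pin against the three limits. The sketch below uses invented pin data and limits; the dictionary layout is a hypothetical convenience, not a tool report format.

```python
def design_rule_violations(pins, limits):
    """Return (pin, rule, value, limit) for every measurement that exceeds its limit."""
    violations = []
    for pin in pins:
        for quantity, limit in limits.items():
            value = pin[quantity]
            if value > limit:
                violations.append((pin["name"], f"max_{quantity}", value, limit))
    return violations

# Hypothetical limits (ns, pF, count) and measurements on two driving pins.
limits = {"transition": 0.5, "capacitance": 0.2, "fanout": 8}
pins = [
    {"name": "U1/Y", "transition": 0.32, "capacitance": 0.11, "fanout": 4},
    {"name": "U2/Y", "transition": 0.71, "capacitance": 0.25, "fanout": 12},
]
report = design_rule_violations(pins, limits)
for row in report:
    print(row)
```

Here only U2/Y is reported, once for each of the three rules it breaks.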
Learning Activity
In this activity, you will
Study a timing report that has a critical path failing to meet
the timing requirement
Analyze the report and identify the problem
Decide which course of action is best suited to fix the critical path so
that it meets timing
Present your findings to the class
20 minutes for activity
10 minutes for debriefing
e
Signal integrity analysis
caused by interconnect
noise and/or changes delays
design, we saw SI effects such as noise-on-delay and glitches, due to long nets that
were running in parallel.
[Figure: design flow. Designer: RTL, logic synthesis, synthesized gates netlist. Physical
synthesis: placement, design optimization pre-CTS, CTS, route, design optimization
post-route, detail routed design, with delay calculation, signal integrity, and extraction.
Layout design verification: GDSII, mask prep.]
Routed design in the Verilog language or other HDL + DEF
Constraints in Synopsys Design Constraints (SDC) format
Constraints and commands in TCL
Logical timing libraries in Liberty (.lib) format
Physical libraries in LEF format
Tool-specific SI libraries
Incremental SDF file containing all of the delay information in the design
related to noise-on-delay
Reports for glitch nets
List of problem nets that need to be rerouted
[Figure: SPEF, routed design (gates + DEF), SDC, TCL, logical library, physical library,
and tool-specific library feed signal integrity analysis, which produces the incremental
SDF output delay file]
[Figure: SI problems versus process technology (0.15, 0.13, 0.09, 0.065)]
Finer geometries
Greater wire and via resistance
Higher electric fields (if supply voltage not scaled)
Smaller spacing rules between wires
Higher ratio of cross coupling to grounded capacitance
Interconnect: Determining factor for performance, power, and yield
[Figure: delay (ps) versus shrinking process for total, gate, and interconnect delay]
e
Glitch on functionality
Noise library
nc
Hierarchical SI analysis: block noise model (XILM)
What Is Crosstalk Effect?
Crosstalk is caused by a transition on an adjoining signal: capacitive or
inductive coupling between neighboring wires leads to an unintended logic
transition.
Victim net: Net on path affected by crosstalk
Aggressor net: Net that affects victim net
Switching window: Time interval when a signal transition may occur.
When coupled signals switch
In opposite directions (aggressor), victim line signal delay increases.
In the same direction (helper), victim line signal delay decreases.
[Figure: aggressor net and victim net modeled with drive R, wire R, grounded C, and
coupling C; victim input noise tolerance]
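Whether an aggressor can change the victim's delay depends on whether their switching windows overlap and on the relative switching direction. A minimal sketch of that decision, with windows given as hypothetical (start, end) times in ns:

```python
def windows_overlap(window_a, window_b):
    """True if two (start, end) switching windows share any point in time."""
    return window_a[0] <= window_b[1] and window_b[0] <= window_a[1]

def crosstalk_effect(victim_window, aggressor_window, same_direction):
    """Classify the delay effect of one aggressor on a victim net."""
    if not windows_overlap(victim_window, aggressor_window):
        return "no delay change"
    return "victim delay decreases" if same_direction else "victim delay increases"

print(crosstalk_effect((1.0, 1.4), (1.2, 1.6), same_direction=False))  # victim delay increases
print(crosstalk_effect((1.0, 1.4), (1.2, 1.6), same_direction=True))   # victim delay decreases
print(crosstalk_effect((1.0, 1.4), (2.0, 2.3), same_direction=False))  # no delay change
```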
Crosstalk can have two effects on victim nets:
Crosstalk causes signal to slow down.
[Figure: net in1 driving a flip-flop (FF), with wire R, grounded C, and coupling C; the
delay here depends on the behavior of other nets]
The cross coupling between the two nets causes the victim to slow down.
This can affect the setup time requirement of the flip-flop if the
signal arrives late.
[Figure: nets in and a1 with coupling C; In1 data and clock waveforms showing the
setup time]
The cross coupling between the two nets causes the victim to speed up.
This can affect the hold time requirement for a flip-flop if the
signal arrives early.
[Figure: nets in and a1 with coupling C; In1 data and clock waveforms showing the
hold time]
rates internally
Uses timing windows and logic constraints to disallow specific
simultaneous switching scenarios between victim and attacker nets
Analyzes each valid overlapping attacker subset to determine the worst-
case delay change
Outputs either an incremental or full SDF file for all nets
[Figure: flow: compute slew rate; disallow simultaneous switching; find victim with
worst delay; generate incremental or full SDF]
e
Glitch on functionality
Noise library
nc
Hierarchical SI analysis: block noise model (XILM)
Impact of Noise on Functionality
Coupling noise can cause functional failures.
Slew rate (dv/dt) and capacitance (C) set glitch current (i).
Load impedance sets the glitch voltage.
The attacker causes a significant glitch on the reset signal such that it resets the
flip-flop and destroys the stored logic state.
With lower transistor threshold voltages (Vtn and Vtp) for low-power design, glitches
can lead to unintended switching of transistors.
[Figure: attacker net coupled through capacitance C to the victim reset net of a flip-flop
(d, q, clk, reset); i = C dv/dt]
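The relation i = C dv/dt gives a first-order estimate of the injected glitch current. The sketch below also multiplies that current by a load resistance to estimate the glitch voltage; treating the load impedance as a single resistor is a simplifying assumption, and all component values are invented.

```python
def glitch_current(coupling_capacitance, delta_v, delta_t):
    """i = C * dv/dt, with the attacker slew approximated as delta_v / delta_t."""
    return coupling_capacitance * (delta_v / delta_t)

def glitch_voltage(current, load_resistance):
    """First-order glitch height: the injected current across a resistive load."""
    return current * load_resistance

# Hypothetical values: 10 fF coupling, 1.0 V attacker swing in 50 ps, 1 kOhm load.
i = glitch_current(10e-15, 1.0, 50e-12)
v = glitch_voltage(i, 1_000.0)
print(f"{i * 1e6:.0f} uA, {v * 1e3:.0f} mV")  # 200 uA, 200 mV
```

A faster attacker edge or a larger coupling capacitance raises the current, and a weaker (higher impedance) victim driver raises the resulting voltage.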
noise glitch reaches a storage element (latch or flip-flop)
This reduces the number of potential false alarms as it utilizes the inherent
glitch filtering properties of CMOS logic
Measures the height of the glitch after it has propagated to the receiver output
Performs sensitivity analysis, which determines if a glitch will amplify or not
If the glitch does not amplify, it cannot cause a functional failure
[Figure: flow: propagate noise glitch; check if noise reached storage elements; measure
glitch height at receiver output]
Example Text Glitch Report
Generated with generate_report -sort_by rcvr_peak -slack
*******************************************************************************************
CeltIC Noise Report
Generated: Fri Aug 15 10:22:01 PDT 2007
***************************************************************************
Report Options:
---------------------------------------------------------------------------
Slack : yes
Sort by : noise (receiver input peak)
Threshold : 10.0 (mV)
Level : VH and VL
---------------------------------------------------------------------------
Peak(mV) Level TotalArea %AreaTillPeak Width(ps) VictimNet
1687.614 VL 1067.88 17.17 1265.55 U2DFF:CP {CLK2}
Value ReceiverNet
1559.185 U2DFF/CP (DFQD1)
Constituents:
Source Peak(mV) Offset(ps) Slew(ps) Edge Net TraceBackNet(NoiseType)
Cpl: 1687.614 4950.000 50.000 R CLK1 -
Baselevel: 0.000 - - - - -
---------------------------------------------------------------------------
Increasing the spacing between the affected nets
Adding a shielding wire between the affected nets; the shield is usually VSS
Signal Integrity Analysis
Crosstalk (cross coupling): noise on delay
e
Glitch on functionality
Noise library
nc
Hierarchical SI analysis: block noise model (XILM)
Noise Library
Signal integrity analysis requires each cell in the circuit to be modeled
(characterized) using a hierarchical model, such as
UDN (user-defined noise)
ECHO (hierarchical block)
XILM (interconnect logic model) or cdB (block)
This pre-characterized information is stored in a noise library using the
make_cdb utility.
The characterization determines the sensitivity of the cell library to
noise glitches on the inputs.
Factors such as resistance, capacitance, noise tolerance, and output
holding strength are to be taken into account during characterization.
Input characterization data
Output characterization data
Slew characterization data
SPICE transistor description
Copy of transistor-level cell
Cell renamed to _CADMOS_<cellname>
[Figure: internal slew characterization: cell input slew (-slews); internal node
(-rise_prop_to, -fall_prop_to); characterized slew on input of last logic stage output
(-rise, -fall); capVal]
As specified in the Synopsys .lib (preferred approach)
As specified by the set_port command
Ports connected to gates are marked as inputs
Ports connected to transistor channels are marked as outputs
Channel-connected inputs or bidirects must be marked manually
Records the Vds-Ids curves for each Vgs connected to each cell output
Calculates the noise threshold of each cell input and the I/O pin capacitance
[Figure: cell library SPICE netlist(s), CMOS device model, Synopsys library (.lib), and
command file (TCL) feed make_cdb, which produces the noise library (.cdb)]
What’s in a cdB File?
A block-level cdB contains a cell-level view and a transistor-level view.
The cell-level view contains pin capacitance, calibrated input noise threshold, and
nonlinear output drive strength.
The transistor-level view contains an ECHO built with the cells and R/C network
connected to each I/O pin. This is different from the .cdB created by make_cdb, which
contains a UDN built with transistors, not cells.
[Figure: cdB structure: characterized data and subckt transistor description for cell1
through cell N; cell-level view and transistor-level view, each with noise check and UDN]
e
Glitch on functionality
Noise Library
nc
Hierarchical SI analysis: block noise model (XILM)
What Is Engineering Change Order Mode?
ECO mode is used in an SI analysis tool to
Analyze both glitch and delay failures
Output a tool-specific ECO command file
The ECO mode uses the Liberty file (.lib) or user-defined cell equivalents.
The tool can fix glitch and incremental delay failures with the ECO
option.
The tool automatically outputs the ECO repair file in text and
HTML formats, showing the original noise and the new noise after
swapping in a new cell.
Victim driver cells can be upsized. (Swapping victim driver cells will
not fix the failure if the coupling is caused by a long wire.)
Noise-on-Delay Fixing
Options for ECO analysis on noise failures
Buffer: Buffer insertion
Resize: Driver resizing
Spacing: Wire spacing
Shieldnet: Shield net insertion
Nofix: Do not do ECO analysis for noise failures (glitch + delay)
Default option is spacing.
[Figure: loop of place and route, extraction, noise analysis, and static timing analysis,
with the ECO repair file fed back to place and route]
e
is enabled.
Signal Integrity Analysis
Crosstalk (cross coupling): noise on delay
e
Glitch on functionality and delay
Noise library
nc
Hierarchical SI analysis: block noise model (XILM)
Hierarchical Methodology
Design sizes and complexity are increasing
Longer turnaround time and capacity limitations when running designs in a flat
hierarchy
To handle complexity, block-based hierarchical design methodologies are
used
[Figure: hierarchical design containing a black box block]
noise analysis.
It is created using CeltIC NDC.
The XILM model is used for both hierarchical noise and timing analysis.
[Figure: block with flip-flops (d, q, clk) between primary input and primary output;
attacker nets couple onto the victim net and the propagated noise is checked for failure]
Less likely to have a capacity limitation
Supports a continuous convergence methodology
Learning Activity
In this activity, you will
Study a handout of an SI report that contains violations
Analyze the report and trace the cause of the problem
Decide which strategy is best suited to fix the violation
20 minutes for activity
10 minutes for debriefing
STA Summary
The goal of timing analysis is to verify that a design meets timing
requirements under a specified set of timing constraints.
STA ignores the functionality of the circuit and analyzes the timing for all
possible paths.
optimization tools in order to meet the timing goals.
The timing reports summarize the final timing information and list
timing failures (setup and hold) for all paths, starting with
the worst failing path.
Timing exceptions are set on paths that are not designed to be
exercised during normal circuit operation.
Timing analysis modes, such as OCV mode, direct the tool so that it
takes into account the small difference in operating parameters across
the chip while analyzing the design.
Design rule constraints are the requirements established by the library
vendor for the proper functioning of the fabricated circuit.
Signal Integrity Summary
SI issues lead to failures in the performance of a circuit due to errors
induced in the normal operation of a design through crosstalk and
glitches.
A noise library characterizes the cells in a design to determine their
sensitivity to noise glitches on their inputs.
An ECO repair file is a command file that provides information used to
repair nets that suffer from noise and that should be fixed in the
database available after place and route.
XILM is an interconnect logic model that defines the noise propagation
up to the first latch/flip-flop from the boundary pins.
In static timing analysis, the designer creates timing test vectors that are
simulated using a gate-level netlist to verify timing.
Operating conditions are always set from a single set of libraries and never
Design rule constraints are requirements depending on technology library.
Crosstalk is caused by transition on an adjoining signal having a capacitive
or inductive coupling between neighboring wires leading to an
unintended logic transition.
Terminology
Term                    Description
Constraint-related
Clock Skew              The maximum difference in arrival times of the clock signal at any two latches/FFs fed by the clock network
Clock Jitter            The maximum difference in phase of the clock between any two periods
Clock Latency           Specifies the delay along the clock tree (source latency + clock network latency)
Slew Rate               Represents the maximum rate of change of a signal at any point in a circuit
Path Delay              Represents the time taken for a signal to propagate from one point to another
Timing report-related
Beginpoint              Flip-flop or port at which the signal is launched with respect to the clock
Endpoint                Flip-flop or port at which the launched signal is captured with respect to the clock
Other End Arrival Time  Arrival time along the capture clock path from the clock source to the capture flip-flop
Slack                   Slack or timing margin is the difference between the “required arrival time” and the “actual arrival time”
Phase Shift             The delay adjustment used to calculate the appropriate required time at the path end point
Instance                Master cell definition used multiple times in a design, each with a unique name
Arc                     Any signal path along a net from one start point to one end point
Operating mode-related
Launch Clock            Clock signal at the starting flip-flop, which launches the data
Capture Clock           Clock signal at the ending flip-flop, which captures the data
Early                   Earliest time at which the value on a net/point can change from its previous cycle stable value
Late                    Latest time at which the value on a net/point can settle to its final stable value for the current cycle
Design Optimization
Module 11
Optimization Process
Optimization is the successive refinement of a product or design.
Usually, it takes several iterations of optimization until a product or
design is complete.
The types of optimizations performed on the product or design depend on the stage.
For example, to make lumber, trees are chopped down, cut into
long strips, sized, and sanded.
various optimizations as the design progresses through the
physical implementation flow.
[Figure: trees turned into lumber through “optimization”]
Explain the value of optimization at the various stages of the design
flow to meet timing
Inserting repeaters to optimize for timing
What Is Optimization?
Unless you are an absolute genius, your design will not meet the timing requirements on the first run.
Optimization is the process of iterating through a design such that it meets timing, area, and power specifications.
In general, optimization can be broken down into the following areas:
Timing
Signal integrity
Power
Area
signal integrity constraints.
Timing closure is often one of the greatest causes of ASIC tapeout schedule slips.
The problem lies in the discrepancy between front-end and back-end designers’ concepts of timing.
Front-end designers use wireload models to predict timing, and back-end designers use a fully placed design, including its resistance and capacitance (RC) values.
Who is more accurate?
What Are Wireload Models?
One of the most vexing problems traditional synthesis tools face is how to predict interconnect parasitic values.
One approach is to develop a lookup table that ties the RC values of a net to its fanout.
Tools calculate the appropriate wire load block for each net.
These values are derived from statistical analysis of ASIC foundry data for a given process node.
Sample wireload model file
wire_load("sample_wl10") {
  resistance : 8.5e-8;
  capacitance : 1.5e-4;
  area : 0.7;
  slope : 66.667;
  fanout_length (1,66.667);
}
wire_load("sample_wl20") {
  resistance : 8.5e-8;
  capacitance : 1.5e-4;
  area : 0.7;
  slope : 133.334;
  fanout_length (1,133.334);
}
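As a rough illustration of how a tool might turn a wireload block into net parasitics, the sketch below estimates wire length from fanout and scales it by the per-unit-length resistance and capacitance. It assumes Liberty-style semantics: `fanout_length` gives the estimated length for a listed fanout, and `slope` extrapolates the length for fanouts beyond the last listed entry. The class and method names are hypothetical, and real tools may handle units and interpolation differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class WireLoad:
    resistance: float                # resistance per unit length
    capacitance: float               # capacitance per unit length
    slope: float                     # extra length per fanout past the table
    fanout_length: Dict[int, float]  # fanout -> estimated wire length

    def length(self, fanout: int) -> float:
        """Look up the length, or extrapolate with slope past the last entry."""
        if fanout in self.fanout_length:
            return self.fanout_length[fanout]
        last = max(self.fanout_length)
        if fanout < last:
            raise ValueError("interpolation between entries is not sketched here")
        return self.fanout_length[last] + self.slope * (fanout - last)

    def net_rc(self, fanout: int) -> Tuple[float, float]:
        """Return the estimated (resistance, capacitance) of the net."""
        length = self.length(fanout)
        return self.resistance * length, self.capacitance * length


# Values taken from the sample_wl10 block above.
sample_wl10 = WireLoad(8.5e-8, 1.5e-4, 66.667, {1: 66.667})
r3, c3 = sample_wl10.net_rc(3)  # a net with a fanout of 3
```

Because the estimate depends only on fanout, two nets with the same fanout get the same parasitics regardless of where their cells are eventually placed.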
In process nodes of around 1 micron, the dominant component of net delay is the I/O pin delay of standard cells. Therefore, the wireload delay plays an insignificant role.
As device dimensions shrink, global routes get longer and smaller wire widths mean more resistance.
A better replacement for wireload models is physical synthesis, where synthesis and placement are combined to more accurately calculate the wire delay timing based on physical data.
Optimizing for Timing
There are many ways to reduce delay; we will cover some fundamental techniques here.
Upsizing gates increases their drive strength and, thus, reduces the time it takes for that gate to transition based on a given load.
Upsizing a gate increases its own input capacitance, giving its driver a higher capacitive load.
A technique called logical effort was invented to optimize the size of gates along a path for minimal delay.
The tool will usually perform calculations for you.
Reduce wire capacitance
Usually involves shortening the wire lengths of critical paths by moving cells or inserting buffers
Switching to a higher metal layer can also reduce capacitance
For a large number of bits, for example, a carry lookahead adder performs much better than a ripple carry adder.
Physical synthesis tools optimize datapath elements to meet timing, while balancing area and power.
If all else fails and the datapath contains too much combinational delay, it is often viable to simply break the path and insert a register in between, creating an extra pipeline stage.
An extra pipeline stage means more latency and more area.
Such a change usually requires changing the RTL itself.
Signal Integrity
As technology continues to scale, the aspect ratio of the horizontal-to-vertical dimensions is reduced.
This results in increased ratios of coupling capacitance to substrate capacitance.
The impact on the victim line is a strong function of the rise time of the interfering signal and the strength of the gate driving line Y.
A voltage step on line X causes a transient step on Y that decays with a time constant: $\tau_{XY} = R_Y (C_{XY} + C_Y)$
[Figure: line X, driven by $V_X$, runs next to line Y; line Y is driven through $R_Y$ and has substrate capacitance $C_Y$]
When a voltage is applied on line X, there is also a change of voltage on line Y equal to
$\Delta V_Y = \frac{C_{XY}}{C_Y + C_{XY}} \Delta V_X$
If this change in voltage is large enough, it can cause an erroneous logic value at the load of line Y.
[Figure: lines X and Y with coupling capacitance $C_{XY}$ between them; line Y is driven through $R_Y$ and has substrate capacitance $C_Y$]
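The two crosstalk relations, the capacitive-divider noise on the victim and the decay time constant, can be evaluated directly. The sketch below is illustrative only: the function name and the component values are hypothetical, and it applies the formulas from the slides as written, without modeling the aggressor rise time.

```python
from typing import Tuple


def crosstalk(delta_vx: float, c_xy: float, c_y: float, r_y: float) -> Tuple[float, float]:
    """Return (noise step on line Y in volts, decay time constant in seconds)."""
    delta_vy = delta_vx * c_xy / (c_y + c_xy)  # capacitive divider
    tau_xy = r_y * (c_xy + c_y)                # victim driver discharges both caps
    return delta_vy, tau_xy


# Hypothetical values: 1.0 V aggressor step, 20 fF coupling, 60 fF to substrate,
# and a 1 kOhm victim driver.
dv, tau = crosstalk(delta_vx=1.0, c_xy=20e-15, c_y=60e-15, r_y=1e3)
# dv is 0.25 V and tau is 80 ps for these values
```

Halving the coupling capacitance or halving the driver resistance shows the effect of the fixes discussed next: less coupling lowers the noise step, while a stronger driver shortens how long the noise persists.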
Reduce $R_Y$, which means upsizing the driver of line Y.
Insert a repeater in the line.
Reduce the capacitance, which means separating the wires or changing metal layers.
Power
Power is a major issue in most chips, especially those that are used in
mobile devices where battery life is limited.
Recall that power is given by the equation $P = f \cdot C \cdot V_{dd}^2$, where f is the operating frequency, C is the total capacitance of the circuit, and $V_{dd}$ is the supply voltage.
Most of the time, the supply voltage and the operating frequency of the circuit are already determined long before the physical implementation stage.
How can we reduce the power?
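To make the power relation concrete, the sketch below evaluates the equation as stated in the text (frequency times capacitance times supply voltage squared, with no switching-activity factor) for a few hypothetical operating points. The function name and the numbers are assumptions for illustration only.

```python
def dynamic_power(f_hz: float, c_farads: float, vdd_volts: float) -> float:
    """P = f * C * Vdd^2, as given in the text."""
    return f_hz * c_farads * vdd_volts ** 2


# Hypothetical chip: 500 MHz, 1 nF of total switched capacitance, 1.0 V supply.
base = dynamic_power(500e6, 1e-9, 1.0)           # about 0.5 W
less_cap = dynamic_power(500e6, 0.8e-9, 1.0)     # 20% less capacitance
lower_vdd = dynamic_power(500e6, 1e-9, 0.5)      # half the supply voltage
```

With frequency and supply voltage already fixed, capacitance is the term left for the physical implementation stage to reduce, and power scales linearly with it.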
Optimizing for Power
To reduce power
Reduce capacitance.
Decrease size of standard cells. Power is also a linear function of the driving current, and smaller gates output less current.
[Figure: an ANDX10 cell replaced by a smaller ANDX6 cell]
Leakage current is a dominant factor in today’s (90 nm and below) chips and can account for as much as 30% of the power consumption.
To reduce leakage current, gates with a higher threshold voltage must be used.
optimization.
For example, downsizing gates leads to less power, but also more delay.
This is an age-old problem in the development of ICs and there is not
For example, a mobile phone processor may not need to run at 2 GHz, but it must consume as little power as possible.
Visualizing the Tradeoff
In the graphic below, the purple box represents the constraints for energy (power) and delay (timing) put on your design.
The blue curve represents the highest possible efficiency your design can have.
Your goal should be to move your design onto the blue curve.
Again, the exact desired location on the blue curve depends on the
The derivation of this curve is highly theoretical and is beyond the scope of this class.
functionality for the same area.
Area is therefore a very important specification, especially for chips used for medical purposes such as hearing aids and pacemakers.
The components that usually take up the most area on a chip are
settled with the RTL designer.
[Figure: SRAM]
Utilization is defined as how much percentage of the floorplan area
making it difficult to route. Longer routes also make it harder to meet timing.
[Figure: Congestion]
Inserting Repeaters
Recall that you may upsize gates to decrease the delay through a path.
If the fanout of a gate is too high, then it is a viable option to insert a repeater.
But why would inserting an extra stage in the path decrease the overall delay?
Take for instance the following circuit; the input capacitance of the buffer is $C_g$, and the value of its load is $16C_g$.
[Figure: a buffer with input capacitance $C_g$ driving a load of $16C_g$]
For the previous circuit, the electrical fanout of the buffer is $\frac{16C_g}{C_g} = 16$.
Since it is the only buffer in the circuit, the total fanout of the circuit is also 16.
Let’s insert another buffer and size it to be twice as large as the first buffer so that its input capacitance is $2C_g$.
[Figure: two buffers with input capacitances $C_g$ and $2C_g$ driving a load of $16C_g$]
The total electrical fanout of the circuit is now $\frac{2C_g}{C_g} + \frac{16C_g}{2C_g} = 10$.
Since the total delay of the circuit is roughly proportional to the total electrical fanout of the circuit, we have effectively reduced the delay of the path.
[Figure: two buffers with input capacitances $C_g$ and $4C_g$ driving a load of $16C_g$]
A quick calculation will show that the total electrical fanout is now 8 instead of 10.
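The fanout totals quoted for the three buffer chains can be reproduced with a few lines of code. This is a minimal sketch in which capacitances are expressed in units of $C_g$; the function name is hypothetical.

```python
from typing import Sequence


def total_electrical_fanout(stage_caps: Sequence[float], load_cap: float) -> float:
    """Sum, over all stages, of (capacitance driven) / (stage input capacitance)."""
    caps = list(stage_caps) + [load_cap]
    return sum(caps[i + 1] / caps[i] for i in range(len(stage_caps)))


# Capacitances in units of Cg, driving a 16 Cg load.
one_buffer = total_electrical_fanout([1], 16)      # 16/1
two_to_one = total_electrical_fanout([1, 2], 16)   # 2/1 + 16/2
four_to_one = total_electrical_fanout([1, 4], 16)  # 4/1 + 16/4
```

The three results are 16, 10, and 8, matching the slides: the chain in which both stages see the same fanout (4 and 4) gives the smallest total.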
How do we pick the optimal electrical fanout?
This problem can be solved by
Calculating the total delay for N stages of buffers and a total electrical fanout of F (loading capacitance divided by input capacitance of the first buffer)
Taking the derivative with respect to N
Finding the zero of the derivative (call it $N_0$). The optimal electrical fanout is then equal to $\sqrt[N_0]{F}$.
account, such as the intrinsic delay and loading of each buffer.
A numerical analysis of the problem reveals that the optimal electrical fanout is roughly equal to 4.
This means to achieve optimal delay, every stage in the logic path should have equal electrical fanout and equal delay.
The method of logical effort, which will not be explained in this class, explains how to size logic gates of any type.
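The numerical analysis mentioned above can be sketched under a simple, assumed delay model that is not spelled out in the slides: a chain of N equal stages with total electrical fanout F has delay $D(N) = N (F^{1/N} + p)$, where p is the intrinsic (parasitic) delay of one stage in the same normalized units. Setting the derivative with respect to N to zero and writing $\rho = F^{1/N}$ gives $p + \rho (1 - \ln \rho) = 0$, which the code below solves by bisection. The function names and the choice of p are assumptions.

```python
import math


def best_stage_fanout(p: float) -> float:
    """Solve p + rho * (1 - ln(rho)) = 0 for the per-stage fanout rho."""
    def g(rho: float) -> float:
        return p + rho * (1.0 - math.log(rho))

    lo, hi = 1.0, 50.0  # g(lo) > 0 and g(hi) < 0 for modest values of p
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def path_delay(total_fanout: float, n_stages: int, p: float) -> float:
    """Normalized delay of n equal stages sharing the total fanout."""
    return n_stages * (total_fanout ** (1.0 / n_stages) + p)


rho_no_parasitic = best_stage_fanout(0.0)  # equals e when p = 0
rho_unit_parasitic = best_stage_fanout(1.0)
```

With no intrinsic delay the optimum per-stage fanout is e (about 2.7); with p = 1 it rises to about 3.6, which is consistent with the rule of thumb of roughly 4. For F = 16, `path_delay(16, 2, 1.0)` is lower than `path_delay(16, 1, 1.0)`, in line with the two-buffer example.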
Restructuring Logic
Logic gates with a high number of inputs are not desirable.
Usually, it is much more effective to restructure a wide gate into smaller gates.
This allows more flexibility in terms of optimization for each of the individual gates.
Optimization During the Design Flow
Now that you have learned all of these optimization techniques, where do you use them?
Below is a typical back-end flow that you may be familiar with from past lectures.
Netlist
Floorplan
Power Plan
Placement
Routing
Netlist
Floorplan
Power planning
Placement
Pre-CTS Optimization
Clock tree synthesis
Post-CTS Optimization
Routing
Post-Routing Optimization
Pre-CTS Optimization
stage.
It is here that we have the most freedom.
The techniques that are commonly used here include
Inserting buffers for high fanout nets
Upsizing and downsizing gates
Restructuring logic to meet timing
Since the metal routes are not in place yet, we cannot perform any optimization by moving metal layers.
Post-CTS Optimization
When the clock network is put in place, a new element comes into play called clock skew.
Skew arises because the clock needs to propagate from the center of the clock tree toward the periphery, so it reaches different registers at different times.
[Figure: a clock source driving Reg1 and Reg2; the clock edges at Reg1 and Reg2 are offset by the skew]
violate timing depending on the amount of the skew and the nature of the path.
To mitigate the effects of skew, you can
Insert buffers in the clock tree to lessen the skew
Re-time and use any of the previously mentioned techniques to fix timing
Once again, no metal routes have been placed, although the clock signals are often routed during clock tree synthesis.
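How skew interacts with "the nature of the path" can be illustrated with the usual textbook setup and hold relations, which are not stated in the slides and are assumed here. Skew is taken as the capture clock arrival minus the launch clock arrival; all names and numbers are hypothetical.

```python
def setup_slack(period: float, skew: float, t_cq: float, t_logic: float, t_setup: float) -> float:
    """Data launched at 0 must arrive before the next capture edge minus setup."""
    return (period + skew) - (t_cq + t_logic + t_setup)


def hold_slack(skew: float, t_cq: float, t_logic: float, t_hold: float) -> float:
    """Data must not change until the hold time after the same capture edge."""
    return (t_cq + t_logic) - (t_hold + skew)


# Long path (ns): negative skew (capture clock early) eats into setup margin.
long_ok = setup_slack(period=2.0, skew=0.0, t_cq=0.1, t_logic=1.7, t_setup=0.1)
long_bad = setup_slack(period=2.0, skew=-0.2, t_cq=0.1, t_logic=1.7, t_setup=0.1)

# Short path (ns): positive skew (capture clock late) eats into hold margin.
short_ok = hold_slack(skew=0.0, t_cq=0.1, t_logic=0.05, t_hold=0.05)
short_bad = hold_slack(skew=0.2, t_cq=0.1, t_logic=0.05, t_hold=0.05)
```

In this model the same amount of skew can break a long path's setup check or a short path's hold check depending on its sign, which is why the fix may be either rebalancing the clock tree or adjusting the data path.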
Post-Routing Optimization
Now that the design is fully placed, routed, powered, and clocked, it is time to undergo the final phase.
there is usually not enough room to do much modification.
Moving standard cells and macros may require intensive re-routing.
Therefore, the following techniques are usually used:
Changing metal layers
Moving metal layers
Resizing gates
Summary
Wireload models were originally used to determine timing in a design, but are quickly becoming obsolete with shrinking device dimensions.
Timing can be improved by upsizing gates, shortening wire lengths, etc.
Signal integrity issues are usually caused by coupling capacitance between wires that are close to each other.
They can be solved by moving wires and upsizing drivers.
Power can be reduced by downsizing gates and using high-threshold cells.
Summary (continued)
The power and timing tradeoff is always a critical consideration depending on the application of the chip.
RAMs and register files should be used sparingly to optimize for area.
A high amount of utilization makes the design difficult to route and may cause congestion.
Buffers can be inserted to reduce delay in a pattern such that the electrical fanout for each stage is approximately 4.
Physical implementation usually consists of three stages of optimization: pre-CTS, post-CTS, and post-routing.
Each successive stage will have less freedom to optimize as metal layers are being added.
Study several scenarios (given in the next few slides) within the optimization flow diagram
Identify which problems can potentially occur as a result of the scenario shown
these problems.
20 minutes for activity
10 minutes for debriefing
Class Activity: Optimization Case 1
Run physical synthesis on the netlist, and find that the gates highlighted in red are violating timing.
What types of optimization would you perform on this netlist?
Class Activity: Optimization Case 2
Here is a design that has been through CTS.
What are some possible problems with this design?
[Figure: a clock PLL driving a clock buffer and a “long” clock net that reaches several registers and an SRAM]
What problems can you see in the routing?
[Figure: routed design with “long” signal nets]
1. As power increases on a chip, the delay decreases.
3. Buffers can be upsized arbitrarily to optimize delay.
4. Crosstalk optimization can only be performed after routing.
cells versus pre-CTS optimization.
Engineering Change Orders, Design
Verification, and Tapeout
Module 12
Design Changes
From specification to final implementation, a chip can undergo changes at various stages.
Functional changes to the specification can require
A restart of the entire implementation process
Changes to the design during the implementation process
These functional changes impact
Schedule
Cost
Features of the product
[Figure: design flow from Specification through RTL Coding, Logical Implementation, Physical Implementation, and Design Verification and Tapeout to Final Implementation, with functional changes entering at the specification]
Articulate what an Engineering Change Order (ECO) is and what ECO techniques are used at the different stages of the flow
Articulate the various steps in verification as well as list the tape-out requirements
Discussion Questions
What are some plans and projects in everyday life that you are involved with?
Can you give examples of some “last-minute” changes that have occurred in those plans and projects?
Can you give examples of a “checklist” or some type of process or documentation to ensure that the plan or project is complete?
Topics in This Module
Engineering change orders (ECOs)
Design verification
Tapeout
What Is an ECO?
Definition: The process of inserting a logic change directly into the netlist after it has already been processed by an automatic tool
Example: After our final netlist was created, our marketing person informed the team of a must-have feature for the chip. To incorporate the feature, we created and implemented an ECO.
ECOs
Implementing ECOs is one of the most challenging aspects of the design process.
ECOs are necessary to implement important product features, but we must do so with minimal impact to schedule and cost, while making sure what we implement is correct.
In the next few slides, we will cover
ECO types
ECO implementation types
Using back-end tools to implement ECOs
ECO Types
Generally, there are two types of ECOs.
Functional ECOs
Changes to the specification to add or remove functionality in the design
The ECO’d netlist and the original RTL do not match functionally
RTL must be modified to match the ECO’d netlist
Timing ECOs
Changes to the netlist, typically late in place/route, that do not change the function, but try to improve the timing of the design
The ECO’d netlist and the RTL do match functionally
Functional ECOs
Steps
1. The specification calls for adding or removing functionality.
2. The netlist is manually modified either in logic implementation or physical implementation.
3. The RTL code is modified to match the functionality of the ECO’d netlist and verified.
4. Once the ECO is verified, the rest of the tapeout process is completed.
[Figure: design flow annotated with the steps: (1) functional changes to the specification, (2) logical and physical implementation, (3) RTL coding, (4) design verification and tapeout]
implementation steps by reusing the information from previous runs.
of the gates are preserved from logic synthesis into placement and physical implementation.
If we modified the RTL and then re-synthesized the design, all of the instance names would be different and our placement information from the previous runs would be useless.
always @ (posedge clk)
  q <= !((!c ? !b : a) || d);
[Figure: netlist after logic synthesis, with inputs a, b, c, d and gates u1 to u5, and netlist after placement with the same instances u1 to u5]
To simplify, there are three cases:
Easy: Easy to perform an ECO, just a few gates
Medium: ECO will be tough, not impossible
[Figure: ECO difficulty scale, with labels “ECO” (1-100 gates) and “ECO or Re-synthesize”]
Timing ECOs
Timing ECOs typically occur late in the physical implementation process.
Steps
1. Critical paths are analyzed with the place/route tool or a static timing analysis tool.
2. Suggestions are made by the design engineer or the tool to implement a timing ECO.
Example: Upsize “u4” to the next higher drive strength.
3. The ECO is done, and timing is re-analyzed.
4. This is iterated until all paths meet timing.
5. Once the design meets timing, the rest of the flow is completed.
[Figure: netlist after logic synthesis (inputs a, b, c, d; gates u1 to u5), netlist after placement, and netlist after the timing ECO]
ECO Implementation Types
There are three ECO implementation types:
Spare gates
Metal fix
Focused ion beam (FIB)
logic to an existing design.
Most design teams will have a strategy to include spare gates in their design, just in case they are needed for ECOs.
Spare gates can be implemented before we tape out a design, typically during the physical implementation process.
There are several methods for including spare gates:
Randomly sprinkle gates where available
Instantiate a “pack” of ECO gates at various levels in the design hierarchy
Use “ECO bulk” cells
available.
This can be a manual process or done using the place/route tool’s utilities.
Simple process
connected to the clock-tree or scan-chain, and must be connected if used
[Figure: netlist before random spare cell insertion (u1 to u5) and after insertion, with spare cells s1 to s8 placed among u1 to u5]
throughout the design hierarchy.
The ECO pack can contain flops, muxes, and random gates.
The flops can be connected to the clock-tree and scan-chain during the normal implementation process.
Design team has better control of the instantiation, contents, and reuse of the spare gates.
// RTL Code
always @ (posedge clk)
  q <= !((!c ? !b : a) || d);
// Instantiate ECO Packs
eco_pack eco_u0 (…);
eco_pack eco_u1 (…);
eco_pack eco_u2 (…);
ECO bulk cells are randomly placed throughout the design.
Can be “programmed” by adding a specific functional cell on top of the bulk cell.
A single ECO bulk cell can become an inverter, nand, nor, xor, etc., just by changing the functional connections on top of the cell.
Gives a lot of flexibility for later stage ECOs.
[Figure: netlist with ECO bulk cells e1 to e8 placed among u1 to u5; after ECO bulk cell modification, e1 becomes e1*, shown with input a, output z, and VDD/VSS connections]
Implementing ECOs
If carefully planned, an ECO “pack” of cells would be located near every ECO location.
Unfortunately, there is not enough room on most chips to do so.
Let’s say
The output of u4 is currently connected to the input of u5.
We need to invert the output of u4 and feed it to u5.
We do not have ECO “bulk” cells.
s1 is a spare inverter.
s2 is a spare 2-1 mux.
[Figure: placed cells u1 to u5 with spare cell s1]
where only a few metal layers are modified in order to modify connectivity between existing logic in the design.
out and is in the midst of production.
To make changes at this point, it is always best to consider a metal fix or a “metal-only” fix because we can reuse our previous work as much as possible.
Consider the “masks” for a tapeout:
Each mask represents a layer in our design.
If we make modifications, we would like to minimize the number of layers, to minimize the number of masks changed.
[Figure: stack of masks, Mask 1 through Mask 10 up to Mask N]
Re-route the existing design to use an inverter instead of a buffer (u2).
Identify a spare cell close by to re-route (s8).
Implement the metal-only changes with as few layers as possible.
Change just two mask layers and continue with production.
[Figure: netlist before and after the metal-only fix, with spare cells s1 to s8 among u1 to u5]
What Is a Focused Ion Beam?
Definition: Once the design has gone through the manufacturing process, a focused ion beam (FIB) machine can be used to etch away or add connections to a die in order to modify or add logic to an existing design.
After a chip has been produced, wire connections can be removed or added to change functionality.
This is an expensive alternative and is done for one die.
This is usually done for prototype parts, etc.
[Image: http://en.wikipedia.org/wiki/Image:Fib_tem_sample.jpg]
ECO by netlist
Create a modified Verilog® netlist and have the back-end tool incorporate the new cells.
// Original Netlist
buf1x u2 (.a(n1), .z(n2));
// ECO Netlist
// buf1x u2 (.a(n1), .z(n2));
inv1x u2 (.a(n1), .z(n2));
ECO by change list
Create a command file to add or remove cells and connections.
-n2 u2.z
+inv1x u2
+n1 u2.a
+n2 u2.z
Assume you have a single-gate ECO to implement, changing a two-input AND gate to a two-input OR gate.
You have a spare 2-1 mux near the two-input AND gate.
You have a spare two-input OR gate far from the two-input AND gate.
Questions
How can a 2-1 mux behave like a two-input OR gate?
How would you implement this ECO?
What factors would you consider when choosing the mux or the OR gate?
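One possible answer to the first question can be checked with a truth table. The sketch below models a 2-1 mux behaviorally and wires input a to both the select pin and the "1" data pin, with b on the "0" data pin. The function names and the pin naming are hypothetical, and other wirings (for example tying the "1" data pin high) work as well.

```python
from itertools import product


def mux2(sel: int, d0: int, d1: int) -> int:
    """Behavioral 2-1 mux: output d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def or_from_mux(a: int, b: int) -> int:
    # When a is 1 the mux passes a (which is 1); when a is 0 it passes b.
    return mux2(sel=a, d0=b, d1=a)


truth_table = {(a, b): or_from_mux(a, b) for a, b in product((0, 1), repeat=2)}
```

The table matches a two-input OR for all four input combinations, so the nearby spare mux is functionally sufficient; the remaining trade-off is its delay and drive strength versus the longer routes needed to reach the distant spare OR gate.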
Learning Activity
In this activity, you will
Study several scenarios of design at different stages of the implementation process
Decide which course of action is best suited for your scenario, including the implementation and verification of your ECO
Present your findings to the class
20 minutes for activity
10 minutes for debriefing
Design Verification
The design verification flow consists of
Formal verification or logic equivalence checking (LEC)
Physical verification
GDSII export to layout
Signoff LVS and DRC
[Figure: flow from Original Physical Implementation and ECO through Formal Verification (LEC), Physical Verification, GDSII to Layout, Layout Tool, GDSII for Tapeout, and Signoff LVS and DRC to Mask Prep and Manufacturing]
the RTL and the ECO’d netlist (functional ECO) or the original netlist to the
LEC, which is part of formal verification, can be used to verify these cases.
[Figure: Functional ECO: the design engineer's ECO’d RTL is synthesized to a netlist, and formal verification (LEC) compares it with the ECO’d netlist. Timing ECO: the RTL is synthesized to a netlist, and formal verification (LEC) compares that netlist with the design engineer's ECO’d netlist]
Corresponding edits to the RTL to reflect the functional changes (2)
netlist and compare results (3)
Formal verification or equivalence checking of the RTL code vs. the ECO’d netlist (4)
[Figure: design flow annotated with the ECO in physical implementation (1), RTL coding (2), simulation (3), and formal verification (4)]
Since the functionality has not been changed, we can functionally verify the ECO’d netlist against the original netlist.
Simulation of the original netlist vs. the ECO’d netlist and compare results (2)
Formal verification or equivalence checking of the original netlist vs. the ECO’d netlist (3)
[Figure: design flow annotated with the ECO in physical implementation (1), simulation (2), and formal verification (3)]
Geometry
Antenna
Manufacturability
Unconnected pins
Dangling wires
Loops
Partial routing
In the SOC Encounter® environment, use the command
verifyConnectivity
Then, view the resulting violations in the Violation Browser.
Length
Spacing
Area
Overlap
Enclosure
Wire extension
Via stacking
In the SOC Encounter environment, use the command
verifyGeometry
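[Figure: Circuit during fabrication. A driver and a load are connected through M1 and M2 segments; before the connection is complete, the M1 wire attached to the load causes breakdown at the load]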
violation. To fix the violation, one can
Change metal layers so that the rule is met
Add a diode so that there is a discharge path for the excess charge
[Figure: Circuit after metal layer change, with the driver-to-load route broken into shorter M1 segments joined through M2]
Check for pin routing that violates the maximum antenna charge for the pins, and report violations on pins that have an antenna ratio larger than the maximum allowed antenna ratio specified for the routing layer.
Check for unconnected metal segments that violate the maximum area specified in the technology file.
In the SOC Encounter environment, use the command
verifyProcessAntenna
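The idea of comparing an antenna ratio against a per-layer maximum can be sketched as follows. This is a simplified illustration, assuming the antenna ratio is the metal area connected to a gate pin during fabrication divided by the gate area; it is not the algorithm used by verifyProcessAntenna, and the function names, pin names, and limits are hypothetical.

```python
from typing import Dict, List, Tuple


def antenna_ratio(metal_area: float, gate_area: float) -> float:
    """Ratio of charge-collecting metal area to the gate area it connects to."""
    return metal_area / gate_area


def antenna_violations(pins: Dict[str, Tuple[float, float]], max_ratio: float) -> List[str]:
    """Return the pins whose antenna ratio exceeds the allowed maximum."""
    return [
        name
        for name, (metal_area, gate_area) in pins.items()
        if antenna_ratio(metal_area, gate_area) > max_ratio
    ]


# Hypothetical pins: (metal area, gate area) in square microns.
pins = {"u5/a": (500.0, 1.0), "u3/a": (120.0, 1.0)}
violations = antenna_violations(pins, max_ratio=400.0)
```

Only `u5/a` is reported here. Hopping to a higher metal layer reduces the metal area attached to the gate at the time the lower layer is etched, which is why the metal layer change described above can clear the violation.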
What Are Manufacturability Checks?
Alpha particles can cause problems during manufacture:
Via defects: Alpha particle blocks via
Cell defects: Alpha particle blocks a gate pin
Wire defects: Alpha particle causes a short
Use “yield-hardened” library cells.
Redundant vias: Improve via reliability
Yield-hardened cells: Cells are slower, but safer
Thicker wires + more spacing: Wires and spacing take up more space, but are safer
Manufacturability
Calculates the probability of yield loss due to the following effects:
Cell failures
Via failures
These effects are caused by random particles that land on the die during fabrication, causing defects.
In the SOC Encounter environment, use the command
reportYield
Note: This accounts for only a portion of actual yield loss. There is also parametric yield loss due to RC variation or systematic yield loss due to lithography problems.
necessary information for manufacturing and final LVS/DRC sign-off.
A layout tool, such as Virtuoso, is required to produce the final GDSII for tapeout and final LVS/DRC sign-off.
are sign-off checks run to ensure the integrity, functionality, and manufacturability of the chip.
Verilog® netlist vs. GDSII to ensure the functionality of the design.
DRC is a detailed check of the routed design against the technology’s set of rules.
Input and Output, Format
LVS
Input
Gate-level netlist in the Verilog language
GDSII
Rule deck
SPICE libraries
Commands in Tcl
Output
LVS reports
Input and Output, Format (continued)
DRC
Input
GDSII
Rule deck
Commands in Tcl
Output
DRC reports
A flat layout is produced, and active devices and routing are determined from that layout,
All poly, diffusion, and metal layers are conductive and are assumed to be
The netlist extracted from the layout is compared to the original gate-level netlist to verify that they are the same.
This is a double-check on the place and route process.
An LVS check should be done on the final layout of all ICs.
[Figure: Device recognition. The design layout (ports IN1 and O1, nets Net1 to Net3, instances I1 to I4, VDD and GND) is mapped to design transistors, with I1 and I3 sized 2/1 and I2 and I4 sized 1/1]
Simplify the design process for the designer
Typical design rules
Minimum width and spacing on each layer
Overlap of metal over via
Metal coverage/slotting rules
Rules are created by the foundry for each manufacturing process.
Most rules generated through process characterization
Some rules derived from consistent failure modes of ICs
Minimum width for all layers
Minimum spacing for all layers
Minimum spacing between layers: diffusion to well boundary
Overlap between layers for vias, contacts, and transistors
Percent coverage of a layer for metal layers
Photographic processes can be positive or negative.
Widths as manufactured may be larger or smaller than as drawn.
[Figure: design rule dimensions: via width, Metal 2 width, Metal 2 overlap of via, Metal 1 to Metal 1 spacing, Metal 1 width]
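Minimum width and minimum spacing checks can be illustrated on axis-aligned rectangles. This is a toy sketch, not how a production DRC tool or rule deck works: shapes are plain rectangles on one layer, coordinates are in nanometers, and the class name, function names, and rule values are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def width(r: Rect) -> float:
    """Width of a wire segment is its smaller dimension."""
    return min(r.x2 - r.x1, r.y2 - r.y1)


def spacing(a: Rect, b: Rect) -> float:
    """Distance between the closest edges of two rectangles (0 if they touch)."""
    dx = max(0.0, b.x1 - a.x2, a.x1 - b.x2)
    dy = max(0.0, b.y1 - a.y2, a.y1 - b.y2)
    return math.hypot(dx, dy)


def check_layer(shapes: List[Rect], min_width: float, min_spacing: float) -> List[Tuple]:
    """Report width violations per shape and spacing violations per pair."""
    violations: List[Tuple] = []
    for i, r in enumerate(shapes):
        if width(r) < min_width:
            violations.append(("width", i))
    for i in range(len(shapes)):
        for j in range(i + 1, len(shapes)):
            if 0.0 < spacing(shapes[i], shapes[j]) < min_spacing:
                violations.append(("spacing", i, j))
    return violations


# Three hypothetical Metal 1 wires, coordinates in nm.
metal1 = [Rect(0, 0, 10000, 200), Rect(0, 300, 10000, 500), Rect(0, 1000, 10000, 1100)]
report = check_layer(metal1, min_width=150, min_spacing=200)
```

For these shapes the report flags the third wire as too narrow and the first two wires as too close together.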
Tapeout
Tapeout checklist
Mask preparation
Chip manufacturing
Tapeout Checklist
Design teams need to have a checklist to ensure that all processes and procedures were covered during the design, implementation, and verification phases.
Important areas to check
RTL code and netlist information
All related design-for-test (DFT) information
All related vendor specific requirements
All related package, board, software, and system information
e
Make sure starting RTL code and netlist are noted:
  RTL Code Freeze and Version Noted
  Synthesis Netlist Version Noted
Make sure simulations pass:
  Testbench Versions Noted
  Functional Verification Passed
Validate early timing and SDCs:
  Pre-Layout Timing Analysis
  SDC Validity Checked
Ensure all DFT processes are complete:
  Boundary Scan Checked
  Memory BIST Checked
  Scan Chain Insertion Checked
Validate early place/route power and timing:
  Floorplan Version Noted
  Power Grid Analysis Checked
  Place/Route with Timing Closure Done
Ensure all sign-off criteria are met (Signoff):
  Physical Verification (LVS/DRC/Antenna)
  Formal Verification
  Static Timing Analysis
  IR Drop
  EM Check
Create ATPG vectors, and make sure gate-level simulations pass:
  ATPG Done
  Gate-Level Verification Done
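One way a team might track the checklist above is as a small data structure that blocks tapeout while any item is still open. The following is a minimal sketch, not a Cadence tool or flow: the `Item` class, the group labels, and the helper functions are all hypothetical names chosen for illustration; only the item names come from the checklist.

```python
from dataclasses import dataclass


@dataclass
class Item:
    group: str
    name: str
    done: bool = False


# Item names follow the checklist; the group labels are shorthand (assumed).
GROUPS = {
    "RTL and netlist": ["RTL Code Freeze and Version Noted",
                        "Synthesis Netlist Version Noted"],
    "Simulation": ["Testbench Versions Noted",
                   "Functional Verification Passed"],
    "Early timing": ["Pre-Layout Timing Analysis", "SDC Validity Checked"],
    "DFT": ["Boundary Scan Checked", "Memory BIST Checked",
            "Scan Chain Insertion Checked"],
    "Place and route": ["Floorplan Version Noted",
                        "Power Grid Analysis Checked",
                        "Place/Route with Timing Closure Done"],
    "Signoff": ["Physical Verification (LVS/DRC/Antenna)",
                "Formal Verification", "Static Timing Analysis",
                "IR Drop", "EM Check"],
    "Gate level": ["ATPG Done", "Gate-Level Verification Done"],
}


def new_checklist():
    """Build a fresh checklist with every item still open."""
    return [Item(group, name) for group, names in GROUPS.items() for name in names]


def open_items(items):
    """Return the items that have not been completed yet."""
    return [item for item in items if not item.done]


def ready_for_tapeout(items):
    """Tapeout is allowed only when nothing is left open."""
    return not open_items(items)


checklist = new_checklist()
for item in checklist:
    # Pretend everything except the EM check has been finished.
    item.done = item.name != "EM Check"

for item in open_items(checklist):
    print(f"OPEN [{item.group}] {item.name}")
print("Ready for tapeout:", ready_for_tapeout(checklist))
```

Keeping the group with each item makes it easy to report which area of the flow is holding up the tapeout.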
Mask: [...] shine through in a defined pattern, commonly used in photolithography.
Wafer: [...] crystal, on which microcircuits are constructed.
Photolithography: [...] selectively remove parts of a thin film.
Example: In the fabrication of semiconductor devices, masks are used to
create custom patterns of different materials on wafers using
photolithography.
Mask Preparation
With advanced geometries, there are problems with the creation of masks due to
the very small sizes of wires and gates.
The light sources used to create the masks themselves are not accurate enough,
[Figure panels: Layout, 0.25µ, 0.18µ, 0.13µ, 90 nm, 65 nm. Figures courtesy Synopsys Inc.]
OPC: [...] compensate for image errors due to diffraction or process effects.
PSM: [...] by phase differences to improve image resolution in photolithography.
Example: OPC and PSM are used in advanced geometries to
improve the printability of wires during mask creation.
PSM
OPC
Optical Proximity Correction (OPC) is the manipulation of the mask itself to create extra patterns that compensate for the errors due to the photolithographic process.
As technologies advance to smaller geometries, the wavelength of the light used in the photolithographic process is actually larger than the mask shapes themselves, causing errors.
The extra shapes modify the mask to compensate for these effects.
[Figure: Optical Proximity Correction (OPC). Design and resulting wafer pattern, shown with no OPC and with OPC.]
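The point about the wavelength being larger than the shapes can be made concrete with a little arithmetic. The sketch below uses the feature sizes listed on the mask preparation slide (0.25µ down to 65 nm). The exposure wavelengths are an assumption, since the source does not state them: 248 nm and 193 nm are typical values for these generations.

```python
# Feature sizes in nm, taken from the mask preparation slide.
FEATURE_NM = [250, 180, 130, 90, 65]


def assumed_wavelength_nm(feature_nm: int) -> int:
    """Assumed exposure wavelength per node (not given in the source)."""
    return 248 if feature_nm >= 180 else 193


def wavelength_ratio(feature_nm: int) -> float:
    """Feature size divided by the assumed exposure wavelength."""
    return feature_nm / assumed_wavelength_nm(feature_nm)


for feature in FEATURE_NM:
    ratio = wavelength_ratio(feature)
    note = "smaller than the wavelength" if ratio < 1 else ""
    print(f"{feature:>4} nm feature, {assumed_wavelength_nm(feature)} nm light, "
          f"ratio {ratio:.2f} {note}")

# Nodes where the drawn shape is smaller than the light used to print it.
subwavelength = [f for f in FEATURE_NM if wavelength_ratio(f) < 1]
```

Under these assumptions, every node from 0.18µ onward prints shapes smaller than the exposure wavelength, which is where corrections such as OPC become important.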
PSMs
Like OPC, phase shifting masks (PSMs) serve to compensate for errors in the photolithographic process.
PSM relies on the interference created by mask modifications to achieve its goal.
[Figure: Phase Shifting Masks (PSM). (a) Regular mask, (b) Alternating PSM mask, (c) Attenuating PSM mask.]
Both (b) and (c) have the effect of improving the contrast on some parts of the wafer, which could improve the resolution, as is done with OPC.
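The interference effect behind an alternating PSM can be sketched with a toy one-dimensional model. This is an illustration under strong assumptions, not a lithography simulation: each mask opening is modeled as a Gaussian field amplitude, the illumination is treated as fully coherent, and `SIGMA` and `CENTERS` are arbitrary values picked for the example.

```python
import math

SIGMA = 60.0             # assumed optical blur, arbitrary units
CENTERS = (-75.0, 75.0)  # two neighboring mask openings


def amplitude(x, center):
    """Gaussian stand-in for the blurred field from one opening."""
    return math.exp(-((x - center) ** 2) / (2 * SIGMA ** 2))


def intensity(x, phases):
    """Intensity is the squared magnitude of the summed complex fields."""
    field = sum(amplitude(x, c) * complex(math.cos(p), math.sin(p))
                for c, p in zip(CENTERS, phases))
    return abs(field) ** 2


def contrast(phases):
    """Contrast between the brightest point and the gap between openings."""
    peak = max(intensity(x, phases) for x in range(-150, 151, 5))
    gap = intensity(0, phases)
    return (peak - gap) / (peak + gap)


regular = contrast((0.0, 0.0))          # both openings in phase
alternating = contrast((0.0, math.pi))  # second opening shifted by 180 degrees

print(f"regular mask contrast:     {regular:.2f}")
print(f"alternating PSM contrast:  {alternating:.2f}")
```

With both openings in phase, the fields add in the gap and the two lines blur together. With a 180 degree shift on one opening, the fields cancel in the gap, so the gap stays dark and the contrast rises.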
Chip Manufacturing
Masks and wafers are processed to create integrated circuits.
[Figure: masks and wafers enter chemical and other processing, which yields processed wafers and then integrated circuits.]
Photolithography
Start with wafer at current step
Spin on a photoresist
Pattern photoresist with mask
Step-specific processing: etch, implant, etc.
Wash off resist
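The steps above can be sketched on a one-dimensional strip of wafer. This is a simplified model with hypothetical names (`litho_step`, `render`): the step-specific processing is assumed to be an etch, the film is either present or absent at each position, and the `positive_resist` flag reflects the earlier remark that photographic processes can be positive or negative.

```python
def litho_step(film, mask_open, positive_resist=True):
    """Apply one patterning step to a strip of thin film.

    film:      list of bool, True where the thin film is present
    mask_open: list of bool, True where the mask lets light through
    """
    if len(film) != len(mask_open):
        raise ValueError("film and mask must cover the same positions")

    # Spin on a photoresist: the resist initially covers the whole strip.
    resist = [True] * len(film)

    # Pattern photoresist with mask: a positive resist is removed where it
    # was exposed; a negative resist stays only where it was exposed.
    if positive_resist:
        resist = [r and not lit for r, lit in zip(resist, mask_open)]
    else:
        resist = [r and lit for r, lit in zip(resist, mask_open)]

    # Step-specific processing (assumed here to be an etch): the film is
    # removed wherever no resist protects it.
    etched = [f and r for f, r in zip(film, resist)]

    # Wash off resist: only the patterned film remains.
    return etched


def render(film):
    """Draw the strip: '#' where film remains, '.' where it was removed."""
    return "".join("#" if f else "." for f in film)


mask_open = [c == "o" for c in "..oo..oo"]  # 'o' marks a mask opening
film = [True] * len(mask_open)              # start with a continuous film

positive = render(litho_step(film, mask_open, positive_resist=True))
negative = render(litho_step(film, mask_open, positive_resist=False))
print("positive resist:", positive)
print("negative resist:", negative)
```

The same mask produces complementary patterns depending on the resist type, which is why the polarity of the process has to be known when the mask is made.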
Some processed wafers contain one copy of each of many different integrated circuits; such a wafer is called a shuttle.
Shuttles are used for prototypes or test chips.
Courtesy D. Bouldin, U. Tennessee
Packaging Process
The last step in the fabrication of a semiconductor device is packaging.
Steps
Die cut: Each individual die is cut from the wafer.
Die attachment: The die is mounted to the package or support structure.
IC bonding: The die I/O is interconnected with the package I/O.
IC encapsulation: The die is sealed with ceramic, plastic, or epoxy to prevent physical damage or corrosion.
[Figure: processed wafer, die cut, die attachment, IC bonding, IC encapsulation, integrated circuits.]
[...] the lecture material or your notes
10 minutes for activity
10 minutes for debriefing
Summary
ECOs are a vital part of the design process. Design teams have to add
critical functionality with the least impact on schedule and cost using
ECOs. They do this by carefully planning for ECOs up front.
Design verification involves several checks that verify the
functionality, integrity, and manufacturability of the chip.
Tapeout involves ensuring that all of the important steps in the overall
process are accounted for, through mask preparation and the final
manufacturing steps.
Testing Your Understanding
True or false
1. Design teams can plan for ECOs very early in the design process.
2. When using a register as a spare gate for an ECO, you can simply
connect it up like a regular logic gate.
3. LVS and DRC are run on a netlist just after logic synthesis.
4. Alpha particles cause random errors during the manufacture of a chip.
Sources
Gennari, Frank. Overview of OPC.
http://www.cs.berkeley.edu/~ejr/GSI/cs267-s04/homework-0/results/gennari/