You are on page 1of 25

Clock Concurrent Optimization

Paul Cunningham, Marc Swinnen, Steev Wilcox

Electronic Design Processes


April 10, 2009

© Azuro, Inc. 2009 1


The Clock Timing Gap

© Azuro, Inc. 2009 2


Traditional Design Flows
RTL

Synthesis

Initial Placement Chip speed measured using “ideal” clocks

Physical Optimization

TRADTIIONAL PURPOSE OF CTS:


CTS Make propagated clocks look like ideal clocks
by building “balanced” clock networks

Post-CTS Optimization

Routing Chip speed measured using “propagated” clocks

Post-Route Optimization

Final Layout

© Azuro, Inc. 2009 3


Reality Today
RTL (e.g. Verilog)

Synthesis

Initial Placement Ideal Timing


 Clock Gating
 Clock Muxing
Physical Optimization
 Clock Generators
– Especially for hold
MANY
ITERATIONS
CTS BIG DIFFERENCE!!  Complex Scan Chains
 OCV derates and CPPR
Post-CTS Optimization  Multi-corner
– Especially for hold
 Multi-mode
Routing
Propagated Timing

Post-Route Optimization

Final Layout

© Azuro, Inc. 2009 4


Technology Trends
Opening the Clock Timing Gap

© Azuro, Inc. 2009 5


Trends Driving the Clock Timing Gap
clock (T) clock (T)
Clock Timing
Gap CPPRBC
“Skew” does not include
? OCV effects CPPRAB
2X to 5X
skew OCV ± 10%
clock period

OCV ± 10%

A B C
D ≈ clock period
D

Traditional Optimization OCV

D < T - skew  OCV affects each pair of FFs differently (CPPR)

 OCV effect can be very big - e.g. 10% of 3T

 CTS cannot predict OCV impact

 So, “skew=0” does not mean FFs are really balanced

© Azuro, Inc. 2009 6


Trends Driving the Clock Timing Gap
clock (T) clock (T)
Clock Timing
Gap

“Skew “does not include


? enable

2T … 5T
gate offsets CG
skew

CG
offset

Traditional Optimization Clock Gating

D < T - skew  Clock gates are supposed to have a very big skew

 Traditional optimization tries to prevent this by


‘cloning’ the gates and pushing them down the tree

 Traditional approach cannot correctly optimize or time


CG enable paths

© Azuro, Inc. 2009 7


Trends Driving the Clock Timing Gap Clk-A

clock (T)

Clock Timing Clk-B


Gap 1,000 FFs

10,000 FFs
? “Skew “does not include
interclock skew
skew
Are all FFs
“balanced”?
Clk-C Clk-D
1,000 FFs
AOI2

D 2,000 FFs

Traditional Optimization Clock Complexity


500 FFs

D < T - skew  Clock balancing becomes very difficult, or even


theoretically impossible

 Requires extensive manual intervention

 Final clock implementation is very different from


original, ideal assumptions

© Azuro, Inc. 2009 8


The Clock Timing Gap is Growing
Pre-CTS Timing Report CTS Post-CTS Timing Report

Propagated clocks timing and ideal clocks timing are diverging

Number of Paths The clock timing gap is


growing exponentially

180nm, σ = 7% of T

65nm, σ = 27% of T

45nm, σ = 50% of T

Difference in Pre- to Post-CTS Timing (% of period T)

© Azuro, Inc. 2009 9


CMPLX

Ideal vs. Propagated Clocks Timing Gap


 Difference between ideal and propagated timing across 60 chips
– Top 10% worst violating paths
– Difference measured as a %age of clock period

60%

with ocv

inter-clock
50%
reg-to-cg

reg-to-reg

40%

30%

20%

10%

0%
180nm 130nm 65nm 45nm

© Azuro, Inc. 2009 10


Key Limitation of Traditional Flows
RTL

Synthesis

Initial Placement
“Ideal clocks”
Big decisions about chip
speed vs. area vs. power
Physical Optimization made here using ideal clocks

CTS Two worlds tearing apart


(more than 50% at 40nm!!)

Post-CTS Optimization

Downstream steps don’t have


the freedom to correct all “Propagated clocks”
Routing the mistakes made pre-CTS
in the flow

Post-Route Optimization

Final Layout

© Azuro, Inc. 2009 11


The Key Problems

 Physical timing optimization today is all based on ideal clocks timing


– Timing opt is based on wrong information (like wire load models in the past)
– Cannot see the real timing situation

 Clock balancing is not achievable, not necessary, and not helpful


– Even if CTS skew=0, Propagated timing ≠ Ideal timing
– Clock balancing imposes severe restrictions on timing optimization – for no benefit

© Azuro, Inc. 2009 12


Solution: Clock Concurrent Optimization
RTL

Synthesis
Pretend clocks
“Ideal clocks”
Initial Placement

Clock Concurrent Optimization Build clocks and optimize logic


at the same time

Real clocks
“Propagated clocks”
Routing

Post-Route Optimization

Final Layout

© Azuro, Inc. 2009 13


Clock Concurrent Optimization

© Azuro, Inc. 2009 14


Clock Concurrent Technology

Traditional Physical Optimization Clock Concurrent Optimization

clock T clock T

Extend physical
optimization into
the clocks
L C

skew

Gmax Gmax
More
degrees of
freedom
Gmax < T - skew L + Gmax < T + C

variable fixed fixed variable variable fixed variable

© Azuro, Inc. 2009 15


Time Borrowing in Clock Concurrent Opt.

clock

? ?

slack

Using CC-Opt, slack can flow across register boundaries

© Azuro, Inc. 2009 16


Logic Chains Limit Time Borrowing

clock

Looping Chain

IO Chain

© Azuro, Inc. 2009 17


Speed is Not Limited by the Critical Path

 The “critical path” does NOT limit the chip speed

 CC-Opt can easily move slack along a chain to where it is needed

critical path

slack

 CC-Opt will optimize “non-critical” paths to create spare slack

© Azuro, Inc. 2009 18


Speed is Limited by the Critical Chain

 The “CRITICAL CHAIN” is the focus of CC-Opt


– Critical chain is the chain with the longest delay/stage

Logic delay

11 13 8
Delay 11+9+19+8+13
= = 12
Stage 5
9 19

traditional critical path

15 11
Delay 15+16+11
= = 14
Stage 3
16

critical chain

© Azuro, Inc. 2009 19


CC-Opt Benefits

© Azuro, Inc. 2009 20


Summary of CC-Opt
RTL (e.g. Verilog)

 Build clocks directly for timing not


Synthesis skew balancing
Ideal – Consider setup and hold timing
Timing Initial Placement
– Understand OCV timing
– Understand clock gate timing
– Understand clock mux timing
– Understand clock generator timing
– Understand multi-corner
Clock Concurrent Optimization – Understand multi-mode

 Eliminate need to configure any


skew groups
– Skew groups are just a work-around for a
Routing broken flow!
Propagated
Timing
Post-Route Optimization

Final Layout

© Azuro, Inc. 2009 21


Key Benefits of CC-Opt.
 Up to 20% increase in clock speed
– Fundamentally more degrees of freedom during optimization
 All the benefits of useful skew and more!
– Directly targets propagated timing

 Accelerated timing closure


– No requirement to configure any skew groups
– Automatically handles clock muxing, clock gating, clock generators,
OCV, multi-corner (setup & hold), and multi-mode

 Reduced iterations to the frontend


– No need manually “retime” logic across register boundaries

 Reduced IR-drop
– Clocks are not balanced!

 Reduced power
– Clock buffers are only used where it is necessary for timing

© Azuro, Inc. 2009 22


Rubix™ - An Implementation

© Azuro, Inc. 2009 23


Rubix™ Flow and Key Features
RTL
 Full industry standard STA
– SDC constraint format
– Multi-corner and multi-mode
Synthesis
– OCV derates and CPPR

 Global routing
Placement – Ability to export “route guides”
Verilog Placed SDC
netlist DEF  Physical Optimization
Phys. Opt. – Timing-driven incremental placement
– Timing-driven high-fanout net buffering
– Cell sizing and logic transformations
CTS RUBIX™ – Legalization

 Clocks
Post-CTS Opt.
– Comprehensive skew group support
Verilog Placed
(can mix and overlap with timing windows
netlist DEF
based CTS)
Routing
 Multi-voltage
– Clock buffering and net buffering across
voltage islands
PRO
 Timing driven scan-chain
reordering
– Setup and hold aware
GDSII

© Azuro, Inc. 2009 24


Thanks!

For more information see CC-Opt White Paper at


www.azuro.com

© Azuro, Inc. 2009 25

You might also like