
Industrial Clock Synthesis

Pei-Hsin Ho
Implementation Group
Synopsys, Inc.
Outline

• Problems that our customers care about


• Existing solutions and plan-of-record solutions
• Improvements required

© 2009 Synopsys, Inc. (2)


Clock Network
• Clock network delivers the clock signal to synchronize
every sequential cell in the clock domain



Clock Network Metrics
• Power
• Insertion delay for a clock path
Usually longer than the clock period
Proxy for variation and power
Logic level
RC delay
• Skew under variation
False skew
Variation
• Multiple corners
• OCV (on-chip variation)


Power: Clock Major Culprit
• Power: biggest implementation issue today
Consumer electronics
Frequency limiter: 4GHz ceiling
• Biggest trouble maker
Clock
• 1/3 total power
• Many large and leaky buffers



Variation: Clock Most Susceptible
• Variation: biggest implementation issue tomorrow
Frequency limiter
• >40% dead cycle time
Consumer electronics
• Biggest trouble maker
Clock
• OCV impact: 2X logic variation margin
[Figure: clock cycle time split into logic time, clock variation margin, and clock skew]


Variation: Clock Most Susceptible
• MC CTS (multi-corner clock tree synthesis): how to minimize skew for all corners?
Variation ratios of cells and wires differ in different corners
• Variation ratios of wires differ in different layers
• Failed chips: hold violations caused by variable skews
[Figure: one clock tree annotated with stage delays in two corners. Skew = 5 - 5 = 0 in the first corner; skew = 5.7 - 5.6 = 0.1 in the second]
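A minimal sketch of why a tree balanced in one corner can show skew in another. The two paths, their cell/wire split, and the corner scale factors are hypothetical values chosen only to reproduce the skew figures quoted on the slide (0 and 0.1); they are not read from the original diagram.

```python
# Hypothetical two-sink clock tree: (cell delay, wire delay) per path
# in the nominal corner. Both paths sum to 5, so nominal skew is 0.
PATHS = {"i": (3.0, 2.0), "j": (4.0, 1.0)}

# Assumed per-corner scale factors for (cells, wires). Cells and wires
# scale differently, which is what unbalances the tree.
CORNERS = {"nominal": (1.0, 1.0), "slow": (1.1, 1.2)}


def insertion_delay(path, corner):
    """Root-to-sink delay of one path in one corner."""
    cell, wire = PATHS[path]
    k_cell, k_wire = CORNERS[corner]
    return cell * k_cell + wire * k_wire


def skew(corner):
    """Largest minus smallest insertion delay in the given corner."""
    delays = [insertion_delay(p, corner) for p in PATHS]
    return max(delays) - min(delays)


for name in CORNERS:
    print(name, round(skew(name), 3))
```

The path with more wire delay degrades more in the corner where wires scale by 1.2, so balancing total delay in a single corner is not enough.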


Performance

• Performance: biggest implementation challenge yesterday?
1 GHz ASICs
• Clock skew is still a big issue for performance today
More clocks, more modes and more IPs
Complex clock gating
Higher wire delays


Clock Is Key
• Not competitive in clock means not competitive in timing or power for 45 nm and beyond
• State-of-the-art clock-tree synthesis algorithms


Industrial Solutions for Power

• Clock gating
Conventional
Sequential
Physical clock gating
• Register clumping
• Register banking



Clock Gating Used by 90% of Synopsys Users
[Bar chart: survey results for 2006 and 2007, axis 0% to 100%; labeled values below]
• Clock Gating: 90%
• Gate-Level Power Optimization: 51%
• Multi-Voltage Design: 43%
• Multi-Threshold Design: 42%
• State Retention/MTCMOS: 18%
• Power Network Synthesis: 14%

Survey question: "Please check the techniques your team is using on your current project."
2007 N = 718; margin of error = +/- 4%


Clock Gating

[Figure: two flop circuits with signals D, Q, en, clk and gclk, one of them clocked through an ICG; the two cases are labeled low activity and high activity]

• Automatic clock gating
• always @(posedge clk) if (en) Q <= D;
• ICGs (Integrated Clock Gating cells)
• Fewer sizes and uneven loads make skew harder to balance
• Consume power
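A rough sketch of the saving that clock gating targets: counting the clock edges that reach flop clock pins with and without an ICG. The function name, the enable trace, and the flop count are illustrative assumptions, and the model ignores the power of the ICG itself.

```python
def clock_edges_at_flops(enable_trace, num_flops, gated):
    """Count rising clock edges delivered to flop clock pins.

    enable_trace: one 0/1 enable value per clock cycle (hypothetical data).
    """
    if gated:
        # An ICG passes a clock pulse only in cycles where en is high.
        active_cycles = sum(1 for en in enable_trace if en)
    else:
        # Without gating, every flop is clocked every cycle and the
        # enable is handled on the data side instead.
        active_cycles = len(enable_trace)
    return active_cycles * num_flops


trace = [1, 0, 0, 0, 1, 0, 0, 0]  # enable is high in 2 of 8 cycles
ungated = clock_edges_at_flops(trace, num_flops=32, gated=False)
gated = clock_edges_at_flops(trace, num_flops=32, gated=True)
print(ungated, gated)  # 256 64
```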


Physical Clock Gating
• Design Compiler inserts ICGs during logic synthesis
ICGs drive flops that are spread widely for datapath timing; ICG levels are unbalanced
Leaves power-saving opportunities on the table
[Figure: clock tree layout before physical clock gating; legend: flop, clock gate, macro, buffer]


ICG Merge
• Merge ICGs that share the same enable signal
[Figure: clock tree layout after ICG merge; legend: flop, clock gate, macro, buffer]
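The merge step can be pictured as grouping ICGs by enable signal. This is only a sketch: the data layout, the function name, and the enable names `en_a` and `en_b` are assumptions, and a real flow would also check timing, DRC and placement before merging.

```python
from collections import defaultdict


def merge_icgs_by_enable(icgs):
    """Group the flops of all ICGs that share an enable signal.

    icgs: dict mapping ICG name -> (enable signal, list of driven flops).
    Returns a dict mapping enable signal -> flops of the merged ICG.
    """
    merged = defaultdict(list)
    for name in sorted(icgs):  # sorted for a deterministic flop order
        enable, flops = icgs[name]
        merged[enable].extend(flops)
    return dict(merged)


# Hypothetical netlist fragment: i1 and i4 share an enable, i2 does not.
icgs = {
    "i1": ("en_a", ["r1", "r2"]),
    "i4": ("en_a", ["r5", "r6"]),
    "i2": ("en_b", ["r3"]),
}
merged = merge_icgs_by_enable(icgs)
print(merged)
```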


ICG Removal
• Remove ICGs that are ineffective (small fanout and mostly enabled) or that cause unbalanced ICG levels
[Figure: clock tree layout after ICG removal; legend: flop, clock gate, macro, buffer]
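One way to read "ineffective" is as a simple filter on fanout and enable probability. The thresholds below are placeholders picked for illustration, not values from the talk, and the level-balancing criterion is left out.

```python
def is_ineffective(fanout, enable_probability,
                   min_fanout=4, max_enable_probability=0.9):
    """True for an ICG with small fanout that is mostly enabled.

    Both thresholds are hypothetical tuning knobs.
    """
    return fanout < min_fanout and enable_probability > max_enable_probability


# Hypothetical ICGs: name -> (fanout, probability that enable is high).
candidates = {"i1": (2, 0.97), "i2": (64, 0.97), "i3": (2, 0.10)}
to_remove = sorted(name for name, (fo, p) in candidates.items()
                   if is_ineffective(fo, p))
print(to_remove)  # ['i1']
```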


ICG Splitting
• Split ICGs based on timing, DRC and placement

[Figure: clock tree layout after ICG splitting; legend: flop, clock gate, macro, buffer]


Issue: Higher ICGs

• Merging ICGs with the same enable may save more clock tree power
Can gate a larger subtree
• Splitting ICGs makes enable-signal timing easier to satisfy
[Figure: merge combines ICGs 1 and 2 (enable a) into a higher ICG 3 (enable a); split is the reverse]


Issue: Higher ICGs

• Does not always save power!
Higher ICGs restrict the sharing of the subtrees
May introduce enable timing violations
[Figure: merge combines ICGs 1 and 2 (enable a) into a higher ICG 3 (enable a); split is the reverse]


Issue: Multi-Layer ICGs

• Multi-layer ICGs may save more clock tree power
DC PwrC generates multi-layer ICGs by enable factoring
May gate a larger subtree
[Figure: factoring rewrites ICGs 1 and 2 with enables a&c and b&c as ICGs 1 and 2 with enables a and b under an added ICG 3 with enable c; removal is the reverse]


Issue: Multi-Layer ICGs

• But does not always save power!
Multi-layer ICGs restrict sharing of the subtrees
Extra ICG may consume more power than a very small gated subtree
[Figure: factoring rewrites ICGs 1 and 2 with enables a&c and b&c as ICGs 1 and 2 with enables a and b under an added ICG 3 with enable c; removal is the reverse]
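The trade-off can be illustrated with a toy switched-capacitance model. Everything here is an assumption made for the sketch: the enables a, b and c are treated as statistically independent, power is taken as proportional to capacitance times toggle probability, and all capacitance values are made up.

```python
def switched_cap(p_a, p_b, p_c, cap_sub1, cap_sub2,
                 cap_icg, cap_upper, factored):
    """Relative switched clock capacitance per cycle (toy model).

    p_a, p_b, p_c: probability that each enable is high (assumed independent).
    cap_upper: tree between the clock root and ICGs 1 and 2.
    """
    # The two gated subtrees toggle equally often in both structures.
    gated = p_a * p_c * cap_sub1 + p_b * p_c * cap_sub2
    # Upper tree plus the clock pins of ICGs 1 and 2.
    lower = cap_upper + 2 * cap_icg
    if factored:
        # The extra ICG (enable c) is clocked every cycle; everything
        # below it toggles only when c is high.
        return cap_icg + p_c * lower + gated
    return lower + gated


# Case 1: c is rarely high and gates a large upper tree.
args1 = dict(p_a=0.5, p_b=0.5, p_c=0.2, cap_sub1=5, cap_sub2=5,
             cap_icg=1, cap_upper=10)
# Case 2: c is almost always high and the gated upper tree is tiny.
args2 = dict(p_a=0.5, p_b=0.5, p_c=0.95, cap_sub1=5, cap_sub2=5,
             cap_icg=1, cap_upper=0.5)

for args in (args1, args2):
    flat = switched_cap(factored=False, **args)
    multi = switched_cap(factored=True, **args)
    print("factoring helps" if multi < flat else "factoring hurts")
```

In this model factoring pays off only when the capacitance it gates, weighted by how often c is low, exceeds the cost of the extra ICG.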


Flop Placement: Register Clumping
• Around 8% reduction in total power on average



Flop Placement: Register Banking
• Automatically place registers into banks
Reduce power
Reduce clock skew
Implemented as RP (relative placement) constraints
Routability may be an issue for some designs



Industrial Solutions for Variation

• Clock meshes
• OCV-aware clustering



Clock Meshes
• Good skew under variation
Tree above the mesh
Trees below the mesh to drive the flops
• Bad for power (~+30% clock power)
More wires
Can only gate the small clock trees below the mesh
• Few Synopsys customers mass-produce IC products with clock meshes
• Insight from clock mesh?
Regularity is good for variation
[Figure: clock mesh layout; legend: metal 8, metal 7, metal 6, plan group boundary, periphery IOs]


Dummy ICG Insertion
• Insert dummy ICGs to balance topology

[Figure: clock tree layout with dummy ICGs inserted; legend: flop, clock gate, macro, buffer]
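A small sketch of what balancing the topology means in terms of ICG levels: every sink should see the same number of ICGs on its path from the clock root. The per-sink level counts and the function name are hypothetical, and a real tool would likely share one dummy ICG across many sinks rather than pad per sink.

```python
def dummy_icgs_needed(icg_levels):
    """Number of dummy ICGs each sink's path needs to reach the deepest level.

    icg_levels: dict mapping sink name -> ICG count on its root-to-sink path.
    Sinks already at the deepest level are omitted from the result.
    """
    target = max(icg_levels.values())
    return {sink: target - level
            for sink, level in icg_levels.items() if level < target}


# Hypothetical sinks with 2, 1 and 0 ICGs above them.
levels = {"r1": 2, "r2": 1, "r3": 0}
padding = dummy_icgs_needed(levels)
print(padding)  # {'r2': 1, 'r3': 2}
```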


OCV-Aware Register Clustering
• Try to cluster registers that have critical timing paths between them within a 1st- or 2nd-level cluster, to minimize the impact of variation on timing
[Figure: clock tree layout with launch and capture flops i and j clustered together; legend: flop, clock gate, macro, buffer]
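The reasoning behind this clustering can be sketched with a crude OCV model: only the part of the two clock paths that is not shared can vary independently, so the skew margin grows with the non-common delay. The path representation, buffer names, and the 10% derate are assumptions made for illustration.

```python
def non_common_delay(path_a, path_b):
    """Sum of stage delays after the two root-to-sink paths diverge.

    Each path is a list of (buffer name, stage delay) from the clock root.
    """
    shared = 0
    for (name_a, _), (name_b, _) in zip(path_a, path_b):
        if name_a != name_b:
            break
        shared += 1
    return (sum(d for _, d in path_a[shared:])
            + sum(d for _, d in path_b[shared:]))


def ocv_skew_margin(path_a, path_b, derate=0.1):
    """Crude margin: an assumed derate applied to the non-common delay."""
    return derate * non_common_delay(path_a, path_b)


# Hypothetical paths: i and j share a leaf buffer, k sits in another branch.
path_i = [("root", 1.0), ("b1", 1.0), ("leaf1", 1.0)]
path_j = [("root", 1.0), ("b1", 1.0), ("leaf1", 1.0)]
path_k = [("root", 1.0), ("b2", 1.0), ("leaf7", 1.0)]

print(ocv_skew_margin(path_i, path_j))  # same cluster
print(ocv_skew_margin(path_i, path_k))  # different branches
```

Registers driven by the same low-level cluster share almost their entire clock path, which is why a critical path between them needs little OCV margin in this model.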


Register Clumping
• Place the registers driven by each leaf-level buffer or ICG closer together to minimize leaf-level net capacitance (>50% of total net cap)
[Figure: clock tree layout with clumped registers; legend: flop, clock gate, macro, buffer]
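To see why clumping reduces leaf-level net capacitance, the sketch below estimates net length by half-perimeter wire length (HPWL) and assumes capacitance is proportional to length. HPWL is a stand-in estimator chosen for this example, and the coordinates and capacitance per unit length are made up.

```python
def hpwl(points):
    """Half-perimeter wire length of the bounding box of the pins."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def leaf_net_cap(driver, flops, cap_per_unit=0.2):
    """Estimated wire capacitance of one leaf-level clock net.

    cap_per_unit is a hypothetical capacitance per unit length.
    """
    return cap_per_unit * hpwl([driver] + flops)


driver = (0, 0)
spread = [(10, 0), (0, 10), (10, 10)]  # flops placed for datapath timing
clumped = [(1, 0), (0, 1), (1, 1)]     # the same flops pulled toward the driver

print(leaf_net_cap(driver, spread), leaf_net_cap(driver, clumped))
```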


Register Banking
• Place registers into rectangular banks (more dramatic form of
register clumping)

[Figure: clock tree layout with registers placed in rectangular banks; legend: flop, clock gate, macro, buffer]


Regular Clock Tree Synthesis
• Synthesize regular buffer tree based on DRC and placement
Balanced buffer levels, ICG levels, fanout, wire length (by placement), metal layer
[Figure: regular clock tree layout; legend: flop, clock gate, macro, buffer]


Industrial Solutions for Timing

• Clock routing
• Useful skews
• Inter-clock delay balancing
• CTS for SoCs
• Multi-voltage-domain and multi-mode CTS



Clock Routing

• Signal routing mostly for routability (wire length) and timing
• Clock routing mostly for skew under variation and power (wire length)
• Snaking for skew under variation
• Selectively shield clock nets to control skew


Detail-Route Clocks with Wire Snaking
• Detail route the clock tree using minimal wire snaking (and
shielding) to fine-tune skew

[Figure: detail-routed clock tree layout; legend: flop, clock gate, macro, buffer]


Before Physical Synthesis

• Irregular gated clock topology is bad for variation and power
[Figure: clock tree layout before physical synthesis; legend: flop, clock gate, macro, buffer]


After Physical Synthesis

• Regular gated clock topology is good for variation and power
[Figure: clock tree layout after physical synthesis; legend: flop, clock gate, macro, buffer]


Useful Skew

• Clock skews can be used to fix timing violations


Setup: trigger the launcher sooner and/or the capturer later
Hold: trigger the launcher later and/or the capturer sooner
• Risk
Clock skew is hard to control under variation
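The setup and hold rules above follow from the usual slack equations. The sketch below uses the textbook single-cycle formulas; the function names and all numbers are assumptions chosen for illustration.

```python
def setup_slack(period, t_launch, t_capture, d_max, t_setup):
    """Setup slack: data must arrive before the next capture edge."""
    return period + (t_capture - t_launch) - d_max - t_setup


def hold_slack(t_launch, t_capture, d_min, t_hold):
    """Hold slack: data must not change too soon after the capture edge."""
    return (t_launch - t_capture) + d_min - t_hold


# Hypothetical path: 1.0 ns clock, 1.1 ns longest and 0.3 ns shortest delay.
period, d_max, d_min, t_setup, t_hold = 1.0, 1.1, 0.3, 0.05, 0.05

# Zero skew: the setup check fails.
print(setup_slack(period, 0.0, 0.0, d_max, t_setup))
# Trigger the capturer 0.2 ns later: setup passes, hold margin shrinks.
print(setup_slack(period, 0.0, 0.2, d_max, t_setup))
print(hold_slack(0.0, 0.2, d_min, t_hold))
```

Every picosecond of skew borrowed for setup is taken from the hold margin of the same register pair, which is why uncontrolled skew under variation is the main risk.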



Inter-Clock Delay Balancing
• If there are timing paths from one clock domain to another clock domain, the inter-clock delay must be balanced based on the timing constraints
• Extra insertion delay is bad for power and variation


CTS for SoCs

• SoC
Large number of IPs with known clock latencies
• Hard to balance skews
Large number of placement and/or routing blockages
• Hard to balance topology
Multiple voltage domains and multiple operation modes
• Multiple clocks
• Complex requirements
[Figure: SoC floorplan; labels: routing blockages, clock source, placement blockage, macro clock pins]


Multiple Voltage Domains and Multiple Modes
• Clock shared by multiple voltage domains
If there is no timing path between the domains, insert isolation cells near the top of the clock tree to save power
• Clock going through a voltage domain that may be turned off in an operation mode
Clock buffers powered by always-on power rail
• Implications for power rail synthesis
[Figure: SoC floorplan; labels: routing blockages, clock source, placement blockage, macro clock pins]


Summary

• Competitive clock synthesis technology is key for IC product differentiation
Power
Variation
Performance
• Academic research in this area will have a huge impact on the real world through the realization of cool, robust and fast ICs


Backup Slides



Pipelined Datapath with Bubbles

• Invalid data (bubbles) may go through the pipelined datapath and consume energy during computation
[Figure: pipelined datapath with +, * and & stages]


Conventional Clock Gating

• Gate clock upon invalid data
always @ (posedge clk)
if (vld) begin a1 <= a0; b1 <= b0; end
[Figure: pipelined datapath with +, * and & stages]


Sequential Clock Gating

• Introduce a valid-bit pipeline to track valid data and gate the clock to the datapath so that no pipeline stage computes for invalid data
[Figure: pipelined datapath with +, * and & stages]
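A behavioral sketch of the valid-bit pipeline idea: each stage register is clocked only in cycles where the data arriving at it is valid. The function name, stage count, and input trace are assumptions for illustration, and the valid bits themselves are modeled as shifting every cycle.

```python
def clocked_stage_cycles(valid_in, num_stages):
    """Count stage-register clock events with sequential clock gating.

    valid_in: one 0/1 flag per cycle saying whether new input data is valid.
    """
    valid = [False] * num_stages  # valid bit travelling with each stage
    clocked = 0
    for v in valid_in:
        # Data reaching stage k this cycle carries the valid bit that
        # stage k-1 held in the previous cycle.
        incoming = [bool(v)] + valid[:-1]
        # Only stages receiving valid data get a clock pulse.
        clocked += sum(incoming)
        valid = incoming
    return clocked


trace = [1, 0, 0, 1, 0, 0, 0, 0]  # two valid inputs, the rest are bubbles
stages = 3
ungated = stages * len(trace)
gated = clocked_stage_cycles(trace, stages)
print(ungated, gated)  # 24 6
```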
