Clock Gating for Power and CTS QoR
Agenda
Objective
Introduction to clock gating
Clock gating methodology
Overview
RTL synthesis
Physical synthesis
Clock tree synthesis
Summary of recommendations
Sample results
Planned enhancements
Summary
Objective
Describe the clock gating methodology to meet target skew, insertion delay, and power
[Figure: clock gating example with enable (EN), clock (CLK), and gated clock (gclk); annotations: high activity, low activity]
Area savings
Eliminating multiplexers saves area
Easy to implement
No RTL code change is required
Clock gating is automatically inserted by the tool
Technology independent
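The area and activity argument above can be illustrated with a small cycle-based model. This is a minimal sketch, not the tool's behavior: it assumes the ungated register bank holds its value through a feedback multiplexer, and the gated version only receives a clock edge when the enable is high. The function names and the sample data are hypothetical.

```python
def run_mux_enable(data, enables, init=0):
    """Register with a feedback mux: clocked every cycle, loads d only when en is 1."""
    q = init
    outputs = []
    clock_events = 0
    for d, en in zip(data, enables):
        clock_events += 1          # the register clock pin toggles every cycle
        q = d if en else q         # feedback multiplexer keeps the old value
        outputs.append(q)
    return outputs, clock_events


def run_gated_clock(data, enables, init=0):
    """Register behind a clock gate: it only sees a clock edge when en is 1."""
    q = init
    outputs = []
    clock_events = 0
    for d, en in zip(data, enables):
        if en:
            clock_events += 1      # gated clock pulses only on enabled cycles
            q = d                  # no multiplexer needed in the data path
        outputs.append(q)
    return outputs, clock_events


# Hypothetical stimulus: eight cycles, three of them enabled.
data = [5, 7, 7, 2, 9, 4, 4, 1]
enables = [1, 0, 0, 1, 0, 0, 0, 1]

mux_out, mux_events = run_mux_enable(data, enables)
gated_out, gated_events = run_gated_clock(data, enables)

print(mux_out == gated_out)        # same register contents in both styles
print(mux_events, gated_events)    # clock activity seen by the register
```

Both styles produce the same register values, while the gated version sees a clock edge only on the enabled cycles, which is the low-activity case in the figure.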
[Flow diagram; tools shown: IC Compiler, Physical Compiler, Astro]
Input RTL
Insert clock gating
Compile
Merge clock gates
Placement and placement optimization
Replicate clock gates
Clock tree synthesis
Detail routing
RTL Synthesis
Input RTL
Read in Verilog: read_verilog
Define the clocks: create_clock
Insert clock gating: insert_clock_gating
Compile: compile
Minimum bitwidth
This is the minimum bitwidth of register banks that will be gated
By default, the minimum bitwidth is 3
Register banks with a bitwidth less than 3 give no area or power benefit
[Figure: Module A and Module B under Top, with registers d1 and d2, signals a, b, EN, and clk, and clock gates (CG); annotation: extra ports added]
Power breakdown (row labels not preserved):
= 160.6544 mW (61%)
= 102.5581 mW (39%)
--------
= 263.2125 mW (100%)
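The percentages in the power breakdown follow from the two component values. A minimal check, assuming the first two figures are components and the third is their sum; the row labels were not preserved, so the variable names below are placeholders:

```python
# Component values in mW as given in the breakdown. The names are placeholders
# because the original row labels are not available.
component_a_mw = 160.6544
component_b_mw = 102.5581

total_mw = component_a_mw + component_b_mw


def share_percent(part_mw: float, whole_mw: float) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100.0 * part_mw / whole_mw)


share_a = share_percent(component_a_mw, total_mw)
share_b = share_percent(component_b_mw, total_mw)

print(f"total = {total_mw:.4f} mW")
print(f"shares = {share_a}% and {share_b}%")
```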
[Figure: waveforms of EN, CLK, and GCLK]
No glitches on gated clock
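Why the gated clock stays glitch free can be sketched with a sampled-waveform model. The sketch assumes the conventional latch-based arrangement (a latch that is transparent while CLK is low, followed by an AND gate), which matches the -sequential_cell latch style in the sample script but is not spelled out on this slide. The waveforms and function names are hypothetical.

```python
def gate_and_only(clk, en):
    """Gate the clock with a plain AND: GCLK = CLK & EN."""
    return [c & e for c, e in zip(clk, en)]


def gate_with_latch(clk, en):
    """Latch EN while CLK is low, then AND the latched enable with CLK."""
    latched = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e            # latch is transparent during the low phase
        gclk.append(c & latched)   # latched enable cannot change while CLK is high
    return gclk


def pulse_widths(wave):
    """Return the width, in samples, of every high pulse in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four clock cycles, 8 samples each: 4 samples low, then 4 samples high.
clk = ([0] * 4 + [1] * 4) * 4
# Hypothetical enable that rises and falls in the middle of a high phase.
en = [1 if 6 <= t < 21 else 0 for t in range(len(clk))]

and_widths = pulse_widths(gate_and_only(clk, en))
latch_widths = pulse_widths(gate_with_latch(clk, en))

print(and_widths)    # includes pulses narrower than the 4-sample high phase
print(latch_widths)  # only full-width pulses
```

With the plain AND gate, an enable edge during the high phase truncates the clock pulse; with the latch in front, every pulse on the gated clock has the full high-phase width.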
[Figure: ICGs annotated with the numbers 60, 60, 300, 30, 30, 27, 108, 27]
set_clock_gating_style \
-minimum_bitwidth value \
-max_fanout value
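The combined effect of the two options can be approximated with a simple model. This is only a sketch under stated assumptions: a register bank narrower than the minimum bitwidth is left ungated, and a gated bank is split evenly so that no clock gate drives more registers than the maximum fanout. The real tool also weighs enable conditions and other factors that are not modeled here, and the function name and example values are hypothetical.

```python
import math
from typing import Optional


def clock_gates_for_bank(bank_width: int,
                         minimum_bitwidth: int = 3,
                         max_fanout: Optional[int] = None) -> int:
    """Estimate how many clock gates a register bank receives.

    minimum_bitwidth defaults to 3 and max_fanout to unlimited (None),
    mirroring the defaults quoted in this deck.
    """
    if bank_width < minimum_bitwidth:
        return 0                                  # too narrow: bank is not gated
    if max_fanout is None:
        return 1                                  # unlimited fanout: one gate
    return math.ceil(bank_width / max_fanout)     # split to respect the limit


print(clock_gates_for_bank(2))                    # below the default minimum
print(clock_gates_for_bank(300))                  # unlimited fanout
print(clock_gates_for_bank(300, max_fanout=60))   # fanout-limited
print(clock_gates_for_bank(108, max_fanout=27))
```

The pairings 300/60 and 108/27 reuse numbers that appear in the preceding figure, but treating them as bank width and fanout limit is an assumption made for illustration.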
Physical Synthesis
Small fanout
To keep the clock gate and its register fanout together during placement, use
set physopt_disable_auto_bound_for_gated_clock false
Helps
Gate-level design
Identify clock gates: identify_clock_gates
Merge clock gates: merge_clock_gates
Placement optimization
[Annotation: Only required in a Verilog-based flow]
Clock Tree Synthesis
[Figure: Replicate clock gates. ICG tree annotated with the numbers 60, 60, 34, 300, 28, 31, 108, 28, 25, 25, 8]
[Figure: ICGs driven by EN1 and EN2, and an ICG labeled Active High]
To enable during RTL synthesis, use
set power_cg_all_registers true
[Figure: ICGs annotated with the numbers 25, 25, 20, 108, 31, 25, 32, 25]
Same engine used for clustering in clock tree synthesis and clock gate replication
Impact on power and area
Clock gates are larger than clock buffers and consume more power
[Flowchart nodes: Placed design; Unbalanced clock structure? (Yes/No); Replicate clock gates (only when needed); Clock tree synthesis; Meet target skew? (Yes/No); Detail routing; Check other factors]
Using split_clock_net in IC Compiler
split_clock_net
-objects object_list
-gate_sizing
-gate_relocation
Flow highlights
RTL synthesis: Insert clock gating; no max fanout constraint (default: unlimited); insert always active clock gating cells
Physical synthesis: No group bounds
Clock tree synthesis
Target skew: 150 ps
Results
Final skew: 141 ps
Final power: 27 mW
Flow highlights
RTL synthesis: Insert clock gating; no max fanout constraint (default: unlimited); insert always active clock gating cells
Physical synthesis: No group bounds
Clock tree synthesis: No replication of clock gates
Target skew: 100 ps
Results
Final skew: 91 ps
Final power: 16 mW
IC Compiler only
Use clock gate optimization to optimize the timing of the enable pin after CTS
Summary
Understand the power and CTS requirements of your design
Choose the clock gating methodology based on your design requirements
Use integrated clock gating
Process the clock structure based on your CTS and power requirements
Select the right fanout of clock gates during RTL synthesis
Use merge and replication of clock gates only if necessary
Appendix
Sample scripts
Summary of clock gating methodologies
Overview of clock gating methodology using ASCII interchange format
How to handle enable signal timing
Equivalence checking in Formality
Clock gating and design-for-test
Details on replicate clock gates
Additional considerations with discrete clock gating
Sample DC Script
#Set clock gating options, max_fanout default is unlimited
set_clock_gating_style \
-sequential_cell latch \
-positive_edge_logic {integrated} \
-control_point before \
-control_signal scan_enable
#Create a more balanced clock tree by inserting always enabled ICGs
set power_cg_all_registers true
set power_remove_redundant_clock_gates true
read_db design.gtech.db
current_design top
link
source design.cstr.tcl
#Insert clock gating
insert_clock_gating
compile
#Generate a report on clock gating inserted
report_clock_gating
-clock_tree Clk \
-max_capacitance 0.3 \
-max_transition 0.3
Example:
cell1 cell2
cell2 cell3
cell4 cell5
cell1,
Unlimited Clock Fanout at RTL
Balanced Clock Fanout at RTL
When?
Why?
Power is a priority.
CTS QoR, enable pin constraints more flexible.
Based on
[Flow diagram; tools shown: IC Compiler, Physical Compiler, Astro]
Input RTL
Insert clock gating
Compile
Identify clock gating cells
Merge clock gates
Placement and placement optimization
Replicate clock gates (astSplitClockNet)
Clock tree synthesis
Detail routing
Skew analysis
[Figure: CLK, clock gate (CG), and registers]
Formal Verification
The Synopsys formal verification tool, Formality, can perform equivalence checking when the design has inserted clock gating cells
The following command instructs Formality to account for clock gating logic
fm_shell> set verification_clock_gate_hold_mode any
[Figure: levels of design hierarchy with Data in, Data out, flip-flops (D, Di), CLK, enable logic (EN), and a latch (G) producing ENCLK for the flip-flops; annotation: Clock is not controllable; legend: not tested, partially tested, fully tested]
[Figure: the same structure with a scan_enable control point: levels of design hierarchy, Data in, Data out, flip-flops (D, Di), CLK, control logic, EN, latch (G), ENCLK, register bank; legend: not tested, partially tested, fully tested]
[Figure: the same structure with a test_mode control point: levels of design hierarchy, Data in, Data out, flip-flops (D, Di), CLK, enable logic, EN, latch (G), ENCLK, register bank; legend: not tested, partially tested, fully tested]
Complete Observability
[Figure: enables EN1, EN2, EN3 and other observability nodes feeding an observe flop (D, CLK, dataout) controlled by testmode; latch with EN and CLK; annotation: Unobservable point]
[Figure: scan enables SE1, SE2, SE3, clock gate CG1, and flip-flops (FF)]
Replication of ICG
Load on ICG: 2 pF
8 ICGs
Constraints
The replication of the specified instances is based on fixing DRC at the output of each instance
The DRC constraints considered are maximum fanout, maximum capacitance and maximum transition
The tool converts maximum fanout and maximum transition into equivalent capacitance values, and uses the tightest of the three capacitance values as the maximum capacitance constraint
Behavior
The tool splits the specified instance as many times as is necessary to fix the DRC on the output of each clock gate
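The constraint selection described above can be sketched as follows. The conversion formulas are assumptions made for illustration only: fanout is converted with a fixed per-sink pin capacitance, transition with a linear slope in ns per pF, and wire capacitance is ignored. The tool's actual conversion is not given in this deck, and all names, units, and parameter values below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DrcLimits:
    max_fanout: float
    max_capacitance_pf: float
    max_transition_ns: float


def effective_max_cap_pf(limits: DrcLimits, pin_cap_pf: float, ns_per_pf: float) -> float:
    """Convert each limit to a capacitance and return the tightest one."""
    cap_from_fanout = limits.max_fanout * pin_cap_pf              # assumed conversion
    cap_from_transition = limits.max_transition_ns / ns_per_pf    # assumed linear model
    return min(limits.max_capacitance_pf, cap_from_fanout, cap_from_transition)


def replicas_needed(num_sinks: int, limits: DrcLimits,
                    pin_cap_pf: float, ns_per_pf: float) -> int:
    """Number of clock gate copies so that each copy meets the tightest limit."""
    cap_limit = effective_max_cap_pf(limits, pin_cap_pf, ns_per_pf)
    # Small epsilon guards against floating-point round-off in the division.
    sinks_per_gate = max(1, math.floor(cap_limit / pin_cap_pf + 1e-9))
    return math.ceil(num_sinks / sinks_per_gate)


# Very loose capacitance and transition limits leave fanout as the tightest constraint.
fanout_1000 = DrcLimits(max_fanout=1000, max_capacitance_pf=10000, max_transition_ns=10000)
fanout_200 = DrcLimits(max_fanout=200, max_capacitance_pf=10000, max_transition_ns=10000)

print(replicas_needed(3000, fanout_1000, pin_cap_pf=0.01, ns_per_pf=1.0))
print(replicas_needed(2000, fanout_1000, pin_cap_pf=0.01, ns_per_pf=1.0))
print(replicas_needed(3000, fanout_200, pin_cap_pf=0.01, ns_per_pf=1.0))
```

Under these assumptions, a gate driving 3000 registers is split into 3 copies with a fanout limit of 1000 and into 15 copies with a limit of 200. When the capacitance limit is the small one instead, `effective_max_cap_pf` returns that value and the replica count rises accordingly.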
Solution
[Figure labels: 2000 registers, ~120 ICGs, 3000 registers]
Load on each ICG < 0.35 pF
Fanout of each ICG ~ 25
Solution
[Figure labels: ~80 ICGs, 1000 registers, 2000 registers, ~120 ICGs]
Solution
Set the following DRC constraints (specify a large maximum capacitance and maximum transition constraint, so that the tool chooses the maximum fanout constraint as the tightest constraint)
set_clock_tree_options \
-max_capacitance 10000 \
-max_transition 10000 \
-max_fanout 1000
split_clock_net -objects clk
[Figure labels: 1000 registers, 2 ICGs, 1000 registers, 3 ICGs, 2000 registers, 3000 registers]
Fanout of each ICG ~ 1000
Solution
Replicate the clock gate cg2 such that the fanout of each replicated instance is ~200
set_clock_tree_options \
-max_capacitance 10000 \
-max_transition 10000 \
-max_fanout 200
split_clock_net -objects cg2
[Figure labels: 1000 registers, 200 registers, ~15 ICGs, 1000 registers, 200 registers, 3000 registers, 195 registers, 195 registers]
Fanout of each ICG ~ 200
[Figure: waveforms of EN, EN1, CLK, CLK at points A and B, and GCLK, with a glitch marked on GCLK]
In Astro,
Place the latch and AND gates close together
Specify a large net weight on the net
Get the clock to go through the latch, that is, ignore the CLK pin of the latch as a sync pin
Use the astSetClockNonStop command
Refer to SolvNet article 003097