Distribution
Power dissipation in clock distribution, single driver
versus distributed buffers, buffers and device sizing
under process variations, zero skew vs. tolerable
skew, chip and package co-design of clock network
Introduction
• In synchronous systems, chip performance is directly
proportional to its clock frequency.
• Clock nets need to be routed with great precision, since the
actual length of the path of a net from its entry point to its
terminals determines the maximum clock frequency at which
a chip may operate.
• A clock router needs to take several factors into account,
including the resistance and capacitance of the metal layers,
the noise and cross talk in wires, and the type of load to be
driven.
• In addition, the clock signal must arrive simultaneously at all
functional units with little or no waveform distortion.
• Another important issue related to clock nets
is buffering, which is necessary to control
skew, delay and waveform distortion.
• However, buffering not only increases the
transistor count, it also significantly impacts
the power consumption of the chip. In some
cases, the clock can consume as much as 25% of
the total power and occupy 5-10% of the chip
area.
Clock Tree models
Nehalem clock distribution in Intel® Core™
i7/i5/i3 processors
Spine structure for clock distribution
A Scalable, Sub-1W, Sub-10ps Clock Skew, Global Clock Distribution
Architecture for Intel® Core™ i7/i5/i3 Microprocessors
where the second and third terms are the global and local
wiring capacitances, respectively, and α is an estimation factor
that depends on the algorithm used for local clock routing.
• Dynamic power dissipated by the clock increases as the number of clocked
devices and the chip dimensions increase.
• The global clock may account for up to 40% of the total system power
dissipation.
• For low power clock distribution, measures have to be taken to
reduce:
1. the clock terminal load,
2. the routing capacitance, and
3. the driver capacitance.
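The effect of the three capacitances listed above can be illustrated with the standard switching-power relation P = f · Vdd² · C, applied to a net that makes one full charge and discharge per cycle. The sketch below is only an illustration: the function name, the frequency, the supply voltage and all capacitance values are hypothetical and are not taken from the slides.

```python
def clock_dynamic_power(freq_hz, vdd, c_terminal, c_routing, c_driver):
    """Dynamic clock power in watts, assuming P = f * Vdd^2 * C_total.

    C_total is the sum of the clock terminal load, the routing
    (interconnect) capacitance and the driver capacitance, all in farads.
    """
    c_total = c_terminal + c_routing + c_driver
    return freq_hz * vdd ** 2 * c_total


# Hypothetical example values (assumptions, not measured data).
baseline = clock_dynamic_power(
    freq_hz=1e9, vdd=1.0,
    c_terminal=200e-12, c_routing=300e-12, c_driver=100e-12,
)

# Same design with the routing capacitance halved.
reduced_routing = clock_dynamic_power(
    freq_hz=1e9, vdd=1.0,
    c_terminal=200e-12, c_routing=150e-12, c_driver=100e-12,
)

print(f"baseline: {baseline:.3f} W, reduced routing: {reduced_routing:.3f} W")
```

Because the three terms simply add, reducing any one of them lowers the dynamic power in proportion to its share of the total capacitance.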
• Clock skew is the variation in delay from the clock source to the clock
terminals.
• To achieve the desired performance, clock skew has to be
kept within a very small or tolerable value.
• Clock phase delay, the longest delay from source to sinks, also has
to be controlled in order to maximize system
throughput.
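The two quantities defined above can be computed directly from a table of source-to-sink delays. The sketch below takes skew as the spread between the largest and smallest delay and phase delay as the largest delay; the sink names, delay values and limits are hypothetical.

```python
def clock_skew(delays):
    """Skew: difference between the largest and smallest source-to-sink delay."""
    return max(delays.values()) - min(delays.values())


def phase_delay(delays):
    """Phase delay: the longest delay from the clock source to any sink."""
    return max(delays.values())


def meets_constraints(delays, max_skew, max_phase_delay):
    """True if both the skew and the phase delay are within their limits."""
    return clock_skew(delays) <= max_skew and phase_delay(delays) <= max_phase_delay


# Hypothetical source-to-sink delays in picoseconds.
sink_delays_ps = {"ff_a": 410, "ff_b": 425, "ff_c": 418}

print(clock_skew(sink_delays_ps))    # 15
print(phase_delay(sink_delays_ps))   # 425
print(meets_constraints(sink_delays_ps, max_skew=20, max_phase_delay=450))  # True
```

A tolerable-skew design only needs `meets_constraints` to hold for the chosen limits, whereas a zero-skew design would require `clock_skew` to be exactly zero.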
• As technology advances into deep submicron, device
sizes are shrinking rapidly, which reduces the clock
terminal capacitances.
• The increase of chip dimensions also makes the
interconnect capacitance increasingly important.
• Under high frequency requirements, performance
driven clock tree construction methods such as
adjusting wire lengths or widths to reduce clock skew
increase the interconnect capacitance, making it a more
dominant part of the total load capacitance on the clock.
• Reducing the interconnect capacitance may
significantly reduce the overall system power
consumption.
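The trade-off behind wire width adjustment can be sketched with a first-order wire model. This is an assumed simplification, not a model taken from the slides: resistance is taken as sheet resistance times length over width, and capacitance as an area term proportional to width plus a fixed fringe term per unit length. All coefficient values are hypothetical.

```python
def wire_rc(length_um, width_um,
            r_sheet_ohm=0.05,            # hypothetical sheet resistance, ohm/square
            c_area_f_per_um2=0.04e-15,   # hypothetical area capacitance, F/um^2
            c_fringe_f_per_um=0.05e-15): # hypothetical fringe capacitance, F/um
    """Return (resistance in ohms, capacitance in farads) of a wire segment.

    First-order model: R = r_sheet * L / W and C = (c_area * W + c_fringe) * L.
    """
    resistance = r_sheet_ohm * length_um / width_um
    capacitance = (c_area_f_per_um2 * width_um + c_fringe_f_per_um) * length_um
    return resistance, capacitance


# Same 1000 um segment at two widths.
r_narrow, c_narrow = wire_rc(length_um=1000, width_um=1.0)
r_wide, c_wide = wire_rc(length_um=1000, width_um=2.0)

print(f"narrow: R = {r_narrow:.1f} ohm, C = {c_narrow * 1e15:.1f} fF")
print(f"wide:   R = {r_wide:.1f} ohm, C = {c_wide * 1e15:.1f} fF")
```

Under this model, widening a segment lowers its resistance, which is useful for tuning delay, but raises its capacitance, and lengthening a segment raises both. Either adjustment adds to the interconnect load the clock driver has to switch.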
• Low power systems with reduced supply
voltages require increasing the device sizes to
maintain the necessary speed, i.e. the sizes of
clock drivers are increased substantially to
ensure fast clock transitions.
• This exacerbates both the dynamic and
short-circuit power dissipated by the clock.
• Clock distribution is a multi-objective design
problem. Minimizing clock power consumption
has to be considered together with meeting
the constraints on clock skew and phase delay.
Single Driver vs. Distributed Buffers
1. Clock driving schemes
Single driver scheme
Distributed buffers scheme
2. Buffer insertion in clock Tree