A Thesis
Submitted for the Degree of
Master of Science (Engineering)
In the Faculty of Engineering
By
Kalpesh Shah
Acknowledgements
My sincere gratitude to both my guides - Prof S K Nandy and Dr. Vish Visvanathan. Prof
Nandy, thank you for your guidance right from the start of the MS curriculum till the end. I
would not have dreamt of the final chapters had it not been for your timely guidance. To
Vish, thank you for bearing with me and guiding me from the beginning till end, in your
busy schedule at office. You are the one who encouraged me from enrolling for this
program till end. Thank you for your valuable inputs and comments on the material. My
sincere thanks to IISc and specifically SERC staff who helped me through various
administrative work.
To my colleagues and managers at Texas Instruments, thank you for your cooperation; you are a team I am proud of. Thanks for your support and the camaraderie. A special
thanks to Harinath for approving my MS Program and Venugopal Puvvada, my manager
when most of this work happened. Discussions with him made this work relevant to multi-million gate designs and helped it find real application.
Thanks to the many friends with whom I discussed topics similar to my research
throughout this period: Ananth, Gokul, Mallik, Suravi, Saby, Bram, Ashish, Aishwarya and
Sumedha. A special thanks to Anjana Ghose for all that you did for me while I was not in
Bangalore.
Thanks to my family for having stood behind me like a rock. To my parents, thanks for
your support and affection; your unrelenting persistence helped me to complete the last
step. To Pratiksha, thank you for being my invisible strength. Your constant reassuring
presence and confidence in me drove me to this point in the journey. To Bhavesh and Deepti,
thank you for being my saviors at times of load at home. Without you folks, this thesis
would not have materialized. And finally, thanks to little Harsh, who came into this world
halfway through my MS, and Darsh, who saw my MS from the age of 1 year: you kept
giving me unasked but needed breaks and made everything so lively.
Table of Contents
Acknowledgements
Abstract
1 Introduction
1.1 Motivation
1.1.2 Power Supply Noise
1.1.3 MTCMOS Analysis
1.2 Terms
1.3 Thesis outline and Contribution
2.1 Overview
2.2 Toggle Activity Estimation
2.3 Multi-million gate solution
2.3.1 Deriving automatic toggle frequency values
2.3.2 Hierarchical Modeling
3 Power Estimation
3.1 Overview
3.2 Current approaches to Power Analysis
3.3 Power analysis Tools
3.7 Summary
4.1 Overview
4.2 Cell Characterization
4.6 Summary
5 Power Up Analysis
6 Conclusion
6.1 Summary
6.2 Scope of Future Work
References
Table of Figures
Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of logic circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for NAND gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Abstract
Power has become an important design closure parameter in today's ultra deep submicron
digital designs. The increase in power affects researchers across multiple disciplines, ranging
from power supply design, power converter or voltage regulator design, and system, board and
package thermal analysis to power grid design, signal integrity analysis, and minimization of
power itself. This work focuses on the challenges that the increase in power poses for power
grid design and analysis.
Challenges arising from smaller geometries and higher power are well-researched topics,
yet there is still a lot of scope for further work. Traditionally, designs go through average IR
drop analysis, which is highly dependent on the estimation of current dissipation.
This work proposes a vector-less probabilistic toggle estimation that extends one of
the approaches proposed in the literature. We have further used the toggles computed using this
approach to estimate the power of ISCAS89 benchmark circuits, which provides insight into the
quality of the toggles being generated. The power estimation work is further extended to compare
against various state-of-the-art methodologies, i.e. SPICE based power estimation, logic
simulation based power estimation, and commercially available tools. We finally
arrive at an optimum flow recommendation which can be used as per design need and schedule.
Today's design complexity (high frequencies, high logic densities and multiple levels of clock
and power gating) has forced the design community to look beyond average IR drop. A high rate of
switching activity induces power supply fluctuations at cells in the design, which is known as
instantaneous IR drop. However, there is no good analysis methodology in place to analyze this
phenomenon. Ad hoc decoupling planning and on-chip intrinsic decoupling capacitance help
to contain this noise, but there is no guarantee. This work also applies the average toggle
computation approach to instantaneous IR drop analysis of designs. Instantaneous IR
drop is also known as dynamic IR drop or power supply noise. We propose a cell
characterization methodology for standard cells. This data is used to build a power grid model of
the design. Finally, the power network is solved to compute instantaneous IR drop.
Leakage power minimization has forced design teams to adopt complex power gating, i.e.
multi-level MTCMOS usage in the Power Grid. This poses an additional analysis challenge for the
Power Grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state
of the art here and highlights some of the issues and trade-offs in using MTCMOS logic. It further
suggests a simple approach to quickly assess the impact of MTCMOS gates on the Power Grid in
terms of peak currents and IR drop. The suggested approach also helps in MTCMOS gate
optimization, and the early leakage optimization overhead can be computed using it.
1 Introduction
1.1 Motivation
The VLSI industry is facing one of the biggest challenges in its evolution: Power Integrity
closure, the next after the cross-talk induced integrity issues of the previous decade. Power
dissipation has increased phenomenally over the years, as shown in Figure 1.1, giving rise to this
challenge. Figure 1.2 shows the increase in power density due to aggressive scaling, which
increases the number of components packed into a unit area.
[Figure 1.1 Power Dissipation in CMOS designs: power (Watts, log scale) versus year, 1971 to 2008, for processors from the 4004 to the Pentium, with marked levels of 500W, 1.5KW, 5KW and 18KW]
[Figure 1.2 Power Density trend in CMOS designs: power density (log scale) versus year, 1970 to 2010, for processors from the 4004 to the Pentium and P6, with hot plate, nuclear reactor and rocket nozzle as reference levels]
Table 1.1 below consolidates the ITRS 2003 [1] predictions on power and its impact on
design and operating voltages.
Year: 2003, 2004 (90u), 2005, 2006, 2007 (65u), 2008, 2009, 2010 (45u), 2012
Vdd(High Perf): 1.2, 1.2, 1.1, 1.1, 1.1, 0.9
Vdd(Low Power): 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7
149, 158, 167, 180, 189, 200, 210, 218, 240
Battery Operated(W): 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8
PG Pads: 1700, 1800, 2000, 2100, 2200, 2300, 2400, 2400, 2600
Further, Figure 1.3 shows that both the leakage and the dynamic components of power are
continuously increasing, with leakage dominating dynamic power in newer technology nodes [2].
The next sections describe how these trends give rise to challenges in Power Grid analysis and
lead to the work done.
proposed in a few papers. In this technique, the use of probabilities was expanded to allow the
specification of probability waveforms. This approach assumed spatial independence, and was
not restricted only to synchronous circuits.
Another probabilistic approach was proposed by Najm [12], where the transition density measure
of circuit activity was introduced. An algorithm was also presented for propagating the
transition density into the circuit. This approach does not make a zero-delay assumption and
makes only the spatial independence assumption. As a result of this independence assumption,
the computed density values are insensitive to the internal circuit delays.
Yet another probabilistic approach was presented in [13] by A. Ghosh et al., where Binary
Decision Diagrams (BDDs) were used to take into account internal node correlations and
toggle power, at the cost of computation that can become expensive.
Apart from that, recent literature describes more accurate toggle estimation methods
based on Bayesian networks [14-16]. However, they are limited in their ability to handle high
gate count designs. All of the above probabilistic and statistical techniques are applicable only
to combinational circuits. They require the user to specify information on the activity at the
latch outputs.
This work addresses the toggle computation problem, or pattern dependence problem, for
multi-million gate designs by extending Najm's approach [12]. Using this, average power
estimation has been performed at various stages of the design.
1.1.2 Power Supply Noise
With the phenomenal rise in switching speed in VLSI circuits, the probability of a large
number of cells switching in a short period of time increases. A large number of simultaneous
switching events occurring in a short period of time can cause a considerable amount of noise in
the power supply network of a circuit. Power supply noise is the decrease in voltage seen across
the power and ground nodes of a cell. A schematic of the power network grid is shown in Figure 1.4.
The resistive parasitic R in the power distribution network is responsible for the resistive noise,
which is the IR voltage drop in the PG network. Apart from R, on-chip decoupling capacitance also
plays a big role. The switching noise in the power distribution network must be contained to a
tolerable level to ensure the reliability and performance of a circuit.
[Figure 1.4 Schematic of Power Grid in CMOS designs: power grid surrounded by IO pads and Vss pads]
Excessive voltage drops manifest themselves as glitches on the PG buses and cause:
According to a study on the Pentium 4 [26], power supply noise can reduce clock frequency by
6.5% at the 130 nm node and by 8% at the 90 nm node. All of this is handled through various
margins in the design flow, as there are no efficient solutions available to address the
dynamic V drop problem.
Some work has been done on estimating peak power as well as decoupling capacitance in this regard.
In [27], a pattern-independent, linear time algorithm is described that estimates the maximum
current waveforms at various contact points in the circuit. The algorithm is first demonstrated
for simple gate delay and current models. The expression for modeling the delays and current
waveforms for a general gate is derived and the way to extend the algorithm under more
general models is also described. The authors improved the work in [28]. In [29] measures of
peak power are proposed in the context of sequential circuits, and a procedure is presented to
obtain lower bounds on these measures, as well as providing the actual input vectors that attain
such bounds. Automatic generation of a functional vector loop for near-worst case power
consumption is attained.
power dissipation in VLSI circuits. The method is based on the theory of extreme order
statistics and its application to the probabilistic distributions of the cycle-by-cycle power
consumption, the maximum-likelihood estimation, and the Monte-Carlo simulation. It can be
used to predict the maximum power of a VLSI circuit in the set of constrained input vector
pairs as well as the complete set of all possible input vector pairs. The simulation-based nature
of the method avoids the limitations of a gate-level delay model and a gate-level circuit
structure. Also, the method produces maximum power estimates to satisfy user-specified error
and confidence levels. Experimental results show that this method typically produces maximum
power estimates within 5% of the actual value with a 90% confidence level by simulating
fewer than 2500 input vectors. Another technique, described in [31], computes the peak
power of a design while maintaining current waveform accuracy. It models logic gates by
breaking the gates into various nodes. It then models various currents in terms of these nodes,
which are evaluated quickly during logic simulation to measure power. However, since this is
based on logic simulation, it is extremely difficult to scale.
Chen and Ling [36] proposed an approach to estimate the power supply noise based on an
integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37]
proposed an analytical model to estimate the ground bounce caused by the switching in the
internal circuitry for sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a
Genetic Algorithm-based approach that considered the dependence of switching noise on input
patterns under a distributed RC model of the PG network. Zhao, Roy, and Kho proposed an
event-driven simulation based approach to calculate the worst case power supply noise under a
distributed RLC model [39].
There are still more challenges in this area where very little work has been done.
First, to analyze Power Ground (PG) noise, worst case vectors are required, with which the
parasitic network of the chip is simulated. Not only does the whole approach need a lot of data
and memory, but today's SPICE simulators are not able to handle such complexity in terms of
runtime and capacity. Many times (read: all the time), determining the worst case vectors is
not straightforward.
Second, today's designs have a huge PG network. It is known that the voltages seen at various
nodes in this network will vary. The resultant voltage across the power-ground bus for a macro
impacts the delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further,
the ratio of the change in delay to the change in voltage is even more non-linear than the delay
itself; this is very important to designers, as it can cause delay issues or design failures. Due
to the high dependency of delay on voltage, dynamic V-drop in the PG network is fast becoming a
critical concern for the
[Figure 1.5 Normalized delay and normalized delay to voltage ratio: rise delay, fall delay, and the rise and fall delay-to-voltage-change ratios plotted against voltage]
The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the
voltage across a cell decreases due to a sudden rise in switching activity, the delays change, and
hence so does the simultaneous switching. This in turn can reduce or increase the dynamic noise
issues: reduce, in the sense that the simultaneous switching may decrease altogether, or increase,
because a hot spot of the design can move to some other location. Handling this is not a trivial
task from an analysis perspective.
Fourth, design methodologies today expect analysis to meet predefined PG noise targets. In
reality, any voltage drop is acceptable if the required timing goals are met. However, this
is not done due to lack of analysis data.
Fifth, it has been found that devices often fail on testers due to excessive simultaneous
switching in SCAN testing. This creates serious testability issues, and hence dynamic V drop
needs to be analyzed not only for functional mode but also for other modes such as test.
This work addresses the dynamic PG noise problem, which is also described as the dynamic
V drop problem in some literature. Based on the above-mentioned issues, the goal is to address
the dynamic V drop problem with a runtime efficient enough for today's multi-million gate
designs. The goal is also to evaluate the impact of dynamic V drop on timing.
1.1.3 MTCMOS Analysis
Leakage power constitutes more than half of the total power in today's ultra deep submicron designs.
See Figure 1.6 below.
Leakage power control and power network integrity have become key areas of
interest for today's power sensitive designs. In comments on the power consumption problem at
the 2002 International Electron Devices Meeting, Intel chairman Andrew Grove cited off-state
current leakage in particular as a limiting factor in future microprocessor integration [72].
Designers have come up with innovative ways to reduce leakage power using various
techniques: reducing device power supply and frequency of operation [73], Multi-Vt transistor
usage [74-79], controlling input states [74], memory leakage reduction [75], using reverse body
bias [76], and using transistor stacks [77]. A detailed study of the sources of leakage power and
reduction techniques can be found in [82].
Several techniques are available to reduce leakage; gated power supply using power
switches is one of the most promising. Power switches consist of several PMOS
transistors and controlling signals, and are used to dynamically switch the power
supply to a specific region of the chip off or on. This work studies the challenges associated
with using power switches and proposes a fast analysis technique to estimate peak currents
during the power ramp-up of logic.
1.2 Terms
Generic terms used in this report are described below.
ASIC
Block
Also known as functional block or module. Any block within the design
hierarchy instantiated one or more times that will be laid out separately is
referred to as a block module. Block modules are defined divisions of a chip
based on functionality and can be worked on independently of other
functional blocks.
Netlist
A description of the circuit. The description can be at gate level or Register-Transfer
Level (RTL). It can also be in different languages like Verilog, VHDL or SPICE.
Physical Design
RTL
Characterization
CMOS
Die
Timing Window
power estimation as well as addresses the challenges of toggle estimation which has varied
applications like peak power estimation, power supply noise analysis and reliability analysis.
Second, Dynamic Power Supply Noise estimation. In this regard, a prototype flow is developed
in conjunction with the Prime Time STA flow and SPICE to measure power supply noise. The work
describes a gate characterization methodology that involves a one-time SPICE simulation, and how
the PG network is modeled using the characterized data.
The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work
focuses on MTCMOS analysis challenges and the key factors to focus on when a group of logic
turns ON from the OFF state. In this regard, a flow is developed to estimate peak currents or
optimize MTCMOS resistance and switches.
We restrict our scope to CMOS circuits mapped on a predefined cell library, and we follow the
two-step paradigm: library modeling, and analysis of the design using the modeled information.
Library modeling involves description of cells and their functional, structural or electrical
behavior as needed for block or design analysis, and happens once for all. Electrical behavior
modeling happens through characterization using a circuit simulator (e.g. SPICE [3]).
The document is organized as follows. The toggle estimation problem is addressed in Chapter 2.
Chapter 3 describes the various power estimation techniques and tools available in industry
and compares the power numbers with the above toggle estimation method. Chapter 4 describes
Power Supply Noise Estimation and Chapter 5 describes MTCMOS Power Up analysis. Finally,
an extensive list of publications is given at the end for further reference.
STATIC | DYNAMIC
approach. |
Vector-less approach. |
Modeling of certain element (hard macro/complex block) is difficult. | Since it is vector based, functional models can be
Lot of research into products for average power estimation. | Can give instantaneous power.
Synopsys has: Power Compiler |
This work describes the approach used for toggle frequency estimation and its limitations.
It further proposes solutions to these limitations, which make the approach usable for
big designs.
A few terms are defined below to clarify the discussion:
Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of
length T, then the transition density of x(t) is defined as:
D(x) = n(T)/T, where T is very large (ideally infinite)
For large T, D(x) becomes a time-invariant function and hence there is no need to account
for temporal correlation.
Toggle Frequency: If a node x toggles n(T) times over a time interval of length
T, then the toggle frequency F(x) is defined as:
F(x) = n(T)/(2*T), where T is very large (ideally infinite)
For example, if the node is switching at 20 MHz, the node is expected to switch 2
times in 50 ns. As can be seen, the toggle frequency can be converted to transition
density or switching activity by the following equation:
Toggle density = # of transitions/Period = Switching Activity
All three terms mentioned above are used interchangeably in this document.
It should be noted that the toggle frequency of a node has no direct relation to the clock
domain(s) in which the node (or logic) exists. We have used the clock domain frequency as
an upper bound on the toggle frequency calculated by our approach.
Signal Probability: The signal probability P(x) at a node x is defined as the average
fraction of the clock period in which the steady-state value of x is logic
high.
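As a quick numerical check of the definitions above, the following minimal Python sketch computes transition density and toggle frequency from a transition count over an observation window. The function names are illustrative assumptions and are not part of the thesis tooling.

```python
def transition_density(num_transitions: int, window_s: float) -> float:
    """D(x) = n(T) / T, in transitions per second."""
    if window_s <= 0:
        raise ValueError("observation window must be positive")
    return num_transitions / window_s


def toggle_frequency(num_transitions: int, window_s: float) -> float:
    """F(x) = n(T) / (2 * T), in Hz (two transitions make one full cycle)."""
    return transition_density(num_transitions, window_s) / 2.0


# Example from the text: 2 transitions observed in a 50 ns window.
density = transition_density(2, 50e-9)   # about 40e6 transitions per second
frequency = toggle_frequency(2, 50e-9)   # about 20e6 Hz, i.e. 20 MHz
```

Under these two definitions the transition density is twice the toggle frequency, so converting between them is a constant scale factor.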
$$\frac{\partial y}{\partial x} = y|_{x=1} \oplus y|_{x=0} \tag{1}$$
It was shown in [5] that, if the inputs x_i to a Boolean logic function are (spatially)
independent, then the density of its output y is given by:
$$D(y) = \sum_{i=1}^{n} P\left(\frac{\partial y}{\partial x_i}\right) D(x_i) \tag{2}$$
In (2), it is assumed that all inputs are independent. This can lead to inaccuracy where primary
inputs diverge and then reconverge on the way to primary outputs, since such signals are not
really spatially independent. However, at the block level, the primary inputs can be considered
largely independent, and hence the above approach can be modeled more accurately if the
Boolean difference of the whole block is computed.
Given the signal probability and toggle density values at the primary inputs of a logic circuit, a
single pass over the circuit, using (2), gives the density at every node. Note that apart from
estimating toggle densities at the output node, we also need to calculate output signal
probabilities in order to estimate toggle densities for the subsequent logic. This is simple for a
two-input AND gate:
P(Y) = P(A)*P(B)
or
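The per-gate step of the propagation in (2) can be sketched in Python as follows. This is a minimal illustration that assumes spatially independent inputs and enumerates every input assignment, which is only practical for small gates; the helper names are hypothetical and this is not the Toggle Frequency Calculator implementation.

```python
from itertools import product


def _weight(bits, probs):
    """Probability of one input assignment, assuming independent inputs."""
    w = 1.0
    for bit, p in zip(bits, probs):
        w *= p if bit else 1.0 - p
    return w


def output_probability(func, probs):
    """P(y = 1) by enumerating all input assignments of a small gate."""
    return sum(_weight(bits, probs)
               for bits in product((0, 1), repeat=len(probs))
               if func(*bits))


def boolean_difference_probability(func, probs, i):
    """P(dy/dx_i = 1): chance that flipping input i alone flips the output."""
    others = [p for k, p in enumerate(probs) if k != i]
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        low = bits[:i] + (0,) + bits[i:]    # assignment with x_i = 0
        high = bits[:i] + (1,) + bits[i:]   # same assignment with x_i = 1
        if func(*low) != func(*high):
            total += _weight(bits, others)
    return total


def output_density(func, probs, densities):
    """Equation (2): D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_probability(func, probs, i) * d
               for i, d in enumerate(densities))


and_gate = lambda a, b: a & b
xor_gate = lambda a, b: a ^ b

# Inputs with signal probability 0.5 and unit transition density.
p_and = output_probability(and_gate, [0.5, 0.5])
d_and = output_density(and_gate, [0.5, 0.5], [1.0, 1.0])
d_xor = output_density(xor_gate, [0.5, 0.5], [1.0, 1.0])
```

For the XOR gate the Boolean difference is 1 for every input, so the output density is the sum of the input densities, whereas the AND gate passes only the fraction weighted by the other input's signal probability.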
done hierarchically or there are reusable IPs in the design which do not have a netlist. The
approach described in the previous section was extended to handle such requirements.
We also came across several issues while applying this approach to some large designs (>5M
gates) and implementing the tool Toggle Frequency Calculator. In this section, we discuss
solutions that address each of these problems in detail.
2.3.1 Deriving automatic toggle frequency values
1. In the above case, the inputs Clk or D going to a block can be primary inputs. Unless the
user gives the toggle rate, it is very difficult to compute it. We used static timing analysis
[24][25] specifications to derive these inputs. They are:
Input Delay Specification: A constraint that specifies the minimum or maximum
amount of delay from a clock edge to the arrival of a signal at a
specified input port. The input delay specification is with respect to a clock
that triggers events on that signal.
Clock Specification: Specifies the characteristics of a clock, including the clock
name, source, period and waveform.
Mode Specification: Specifies the constant values applied on certain ports or pins
to drive timing analysis in a specific mode. This means that these pins
or ports are not toggling during the analysis. It also specifies the
constant value to which the port or pin is tied.
For clock inputs, we used the toggle rate given by the clock specification.
For non-clock inputs, we used the clock specified in the input delay specification.
For constant ports, we used a toggle rate of 0 and a static probability based on the tied
constant value, i.e. if it is constant 0, the static probability is 0, else it is 1.
A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file
is a collection of commands in Tcl format, so we have shown only the commands that are
primarily required.
2
Some Boolean gates did not reflect realistic scenarios: XOR/XNOR gates, mux
For such gates, Equations (1) and (2) can compute an output toggle rate higher than the clock toggle
rate, and the rate grows further if there are more such gates in the transitive fanout. We found
that this does not happen on actual designs and, in many cases, was not the intended
behavior. We identified such cells as exceptions and clipped their toggle rate to half of
the clock toggle rate.
In similar fashion, we identified mux cells as exceptions and assigned the output toggle
rate as the maximum toggle rate of all inputs.
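A minimal sketch of these two exceptions, assuming the toggle rate from Equations (1) and (2) has already been computed for the cell. The function name and the cell-type strings are hypothetical.

```python
def adjust_toggle_rate(cell_type, computed_rate, input_rates, clock_rate):
    """Apply the XOR/XNOR and mux exceptions to a computed toggle rate."""
    if cell_type in ("XOR", "XNOR"):
        # Clip to half of the clock toggle rate.
        return min(computed_rate, clock_rate / 2.0)
    if cell_type == "MUX":
        # Use the maximum toggle rate over all inputs.
        return max(input_rates)
    # All other cells keep the rate computed by the propagation equations.
    return computed_rate
```

For example, an XOR cell whose computed rate is 3.0 under a clock toggle rate of 2.0 would be clipped to 1.0.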
We made the gated elements transparent for toggle propagation. A clock gating cell is
handled like a buffer.
7
Some care is still needed despite all of the above solutions. For example,
toggle estimation must be done based on the targeted application, which drives certain
inputs used in items 1-6 above. In the implementation, we kept certain hooks to give this control
to the user.
2.3.2 Hierarchical Modeling
1. A huge portion of the design is occupied by memories; however, memory output switching
activity calculation is not straightforward.
2. Complex functionalities: hard macros.
3. Multi-million gate designs cannot afford flat analysis due to cycle time and the inherent
limitations of probabilistic approaches. We needed to devise a method to do hierarchical
analysis by modeling sub-blocks and using them as black boxes.
We used the timing modeling approach to handle (1), (2) and (3).
All standard library components are presently modeled in a Liberty file [69]. Static timing
analysis tools can generate a similar Liberty file for blocks after completing the analysis [25].
This file has the following information:
Setup and Hold constraints for the data input and clock input
Figure 2.4 shows the gate-level netlist of a design called simple. Figure 2.5 shows the timing
arcs that will be extracted by Prime Time, a leading industry timing analysis tool [25].
The timing arc information is used to compute the output toggle rate as explained below.
There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the output toggle rate at out2
is controlled by the same clock as i3 or i1. In this case, we assign the maximum of the i3 and i1
toggle rates to the output pin. The other timing arc is clk2->out1. In this case, out1 is assigned
the average switching activity of clk2.
Thus, using timing model information, we generate the output toggle rates of memories, complex
hard macros or blocks.
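The rule for the design simple can be sketched as follows. This is an illustrative reading of the text, not the tool's implementation: the arc tuple format, the arc kind labels, and the numeric rates are hypothetical.

```python
def output_toggle_rates(arcs, input_rates, clock_avg_activity):
    """Assign output toggle rates of a black-box block from its timing arcs.

    arcs:               list of (from_pin, to_pin, kind), kind is "comb" or "clock"
    input_rates:        {input_pin: toggle rate} for combinational arc sources
    clock_avg_activity: {clock_pin: average switching activity of that clock}
    """
    rates = {}
    for src, dst, kind in arcs:
        if kind == "comb":
            candidate = input_rates[src]
        else:
            candidate = clock_avg_activity[src]
        # When several arcs reach the same output, keep the maximum rate.
        rates[dst] = max(rates.get(dst, 0.0), candidate)
    return rates


simple_arcs = [("i3", "out2", "comb"), ("i1", "out2", "comb"), ("clk2", "out1", "clock")]
simple_rates = output_toggle_rates(
    simple_arcs,
    input_rates={"i1": 0.02, "i3": 0.05},
    clock_avg_activity={"clk2": 0.04},
)
```

With these made-up numbers, out2 takes the larger of the i1 and i3 rates and out1 takes the average activity associated with clk2.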
2.5 Summary
In this work, we address real issues faced by large designs. Automatic toggle generation
eases usability and improves accuracy. Hierarchical analysis supports hierarchical design,
which is a common methodology for handling design complexity.
3 Power Estimation
3.1 Overview
Accurate power estimates are necessary at various stages of the design in order to make correct
architectural, implementation and cost tradeoffs [61]. Architectural-level tradeoffs are made at a
higher level and involve software or instruction-level power modeling, or high-level activity numbers
for different blocks, to make implementation tradeoffs. Weighted averages are often used to
identify the best cost options [62-65]. Once the design is converted to a structural netlist and
physical design starts, power estimation mainly drives package design, PG network design
and lower-level power minimization. In this case, power dissipation is described as below.
Figure 3.1 defines the various components of power and their relationship or contribution to the total
power estimate.
[Figure 3.1: components of power, including internal power and cell leakage power summed over cells]
In this work, the above power components and their computation are studied extensively. To
address the problem in a systematic manner, power estimation has been simplified in the following
way. These assumptions are acceptable given the global analysis that we are considering.
Power supply and ground voltage levels throughout the chip are fixed, so that it becomes
simpler to compute the power by estimating the current drawn by every sub-circuit assuming a
given fixed power supply voltage. Note that this does not mean that different blocks cannot be
at different voltage levels. This allows pre-characterizing library components for the required
voltage points.
The circuit is built of logic gates and latches or reusable IPs, and has the popular and well-structured design style of a synchronous sequential circuit. In other words, it consists of flops
driven by a common clock and combinational logic blocks whose inputs (outputs) are derived
from flop outputs (inputs). It is also assumed that the flops are edge-triggered and that, with the use
of CMOS design technology, the circuit draws no steady-state supply current. This allows
breaking down the average power dissipation of the circuit into 2 components.
This chapter is organized as follows. The next section further explains cell-based
power analysis. The section after that briefly introduces the tools used to compare power estimates
against those obtained with the toggle computation described in the previous chapter. Validation and
results are described later.
$$P_{leakageTotal} = \sum_{Cell(i)} P_{CellLeakage}(i) \qquad (3)$$
Where PCellLeakage(i) is the leakage power dissipation of each cell. Technology library developers
annotate the library cells with the approximate total leakage power dissipated by each cell.
There is usually a single static power number per library cell, but sometimes leakage power can
depend on the logical condition of the cell. In this case, the library cell is annotated with a
state-dependent static power.
A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs as
modeled in the technology library:
$$P_{Internal} = \sum_{Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$
Where Ei is the internal energy of each pin. In practice, the internal energy of a pin is
characterized in the technology library and can be accessed by simple table look-up. Depending
on the required accuracy, different look-up tables can be provided by the library designers as
explained in Table 3.1.
Lookup Table        Pin Direction    Indices
One-dimensional     Input/Output
Two-dimensional     Output
Three-dimensional   Output
$$\sum_{Cell} \big(C_{load}(i) \cdot A(i) \cdot f(i)\big) \qquad (5)$$
Where Cload(i) is the capacitive load of net i. Without any physical information, the load
capacitance Cload(i) is calculated using the wire load model of the net and the fanout of the
driving pin. Usually, this approach achieves relative accuracy.
Apart from the approaches mentioned above, the following factors are also important for
accurate power estimation.
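The summations in Equations (3), (4) and (5) can be sketched in Python as below. The function names and tuple layouts are hypothetical, and because the leading factor of Equation (5) is not visible in the text, the last function returns only the summation term.

```python
def leakage_power(cell_leakages):
    """Equation (3): total leakage is the sum of per-cell leakage power."""
    return sum(cell_leakages)


def internal_power(pins):
    """Equation (4): sum of E_i * A(i) * f(i) over pins.

    pins: iterable of (internal_energy, activity, frequency) tuples.
    """
    return sum(energy * activity * freq for energy, activity, freq in pins)


def switching_sum(nets):
    """Summation term of Equation (5): sum of Cload(i) * A(i) * f(i).

    nets: iterable of (load_capacitance, activity, frequency) tuples.
    """
    return sum(cload * activity * freq for cload, activity, freq in nets)
```

In a real flow, the internal energy would come from the library look-up tables of Table 3.1 and the load capacitance from the wire load model; here they are plain numbers supplied by the caller.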
(P=C*V*V*f). This is true for CMOS technology also. If we model the CMOS
component as a capacitor, it is clear that power varies with the supply
voltage.
3. Power increases with the frequency of operation. In fact, many designs nowadays
have different modes of operation: a high-frequency mode when the device is
operational and a low-frequency mode when the device is in standby. The impact
of frequency on power estimation was already discussed in the previous section.
4. Nowadays, most designs have a significant share of flops or registers. According
to one statistic, flops account for around 40-50% of the logic in a design. If all the flops are
clocked throughout the operation, the clock network consumes almost 50% of total power.
It is sometimes helpful to analyze power consumption on the clock network. This work
analyzes the clock power contribution to total power.
5. Process corner also impacts the currents and power consumption. This is especially true
for leakage power. A typical VLSI process has a leakage power variation by a factor of roughly 4-6
from the worst process corner to the best.
Based on the power sensitivity and tool study analysis in this section, we propose a power
estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power
analysis varies from RTL design to pre-layout netlist to post-layout netlist.
[Figure 3.2: Proposed power estimation flow. Design stages: Architecture, RTL, Unplaced Netlist, Placed Netlist, Detailed Route Over. Estimation steps: Power Estimation (spreadsheet); Forward SAIF or Frequency Constraints; Toggle Frequency Calculator; Power Estimation in Power Compiler (wire load, global SPEF, detailed SPEF); Logic Simulation; PIF File Generation; RC SPICE Netlist; NanoSim; PrimePower. Legend: Recommended, Least Preferred.]
circuit nodes and propagates the same. It is a known fact that Power Compiler cannot estimate
good switching activity for sequential cells. It should also be noted that most ASIC vendors
base their cell power modeling on Synopsys Liberty syntax, so it is highly important to have
single-cell power estimates close to the Power Compiler numbers. The Synopsys reference manual on
Power Compiler [18] gives the basic power calculation theory and a description of the terms used
in its tools.
We used Power Compiler in two modes.
The first mode was to use Power Compiler as a complete solution for power estimation. In this
approach, we generated input switching activity from our vectors and supplied it to
Power Compiler. Power Compiler propagated the switching activity based on switching
probability and then calculated power. In this method, it used some assignment method
for sequential cells, and we went ahead with that because our aim was to verify the default
switching activity propagation algorithm of Power Compiler.
The second mode was to use Power Compiler just as a power calculation engine. In this
approach, we generated switching activity at all the nodes using the methodology
defined in Chapter 2 and used only the power calculation engine. As mentioned earlier, the
power calculation engine is quite accurate, so our aim was to use the resulting power
estimates to evaluate how accurately other methods determine switching activity.
3.3.2 Power Mill (or Nano Sim) [4][68]
Power Mill is a Synopsys tool (currently known as Nano Sim) with a fast SPICE engine at its core. It
was found to correlate well with SPICE for two of the single-cell circuits and one small design.
Power Mill is a dynamic simulation-based tool and hence requires patterns for
simulation.
We used Power Mill to calculate average and peak power. The main reason was the runtime
advantage of Power Mill compared to SPICE. It should be noted that Power Mill can
take a SPICE netlist as input, so any switching between Power Mill and SPICE is
transparent, if needed.
3.3.3 Prime Power [66]
Prime Power is another offering in the Synopsys power portfolio. It is a dynamic, vector-based
solution. The key difference from Power Mill is that Power Mill is a SPICE-based tool
whereas Prime Power is a logic-simulation-based tool. In other words, Power Mill is tuned more
for accuracy and analog designs, whereas Prime Power is tuned for digital and
specifically ASIC designs, with reasonably good accuracy. Prime Power has a PLI
interface to leading industry simulators, e.g. VCS, Modelsim, Verilog etc. While doing logic
verification with these simulators, if we instantiate one call/command, the PLI dumps binary
files. These binary files can be used in Prime Power for power estimation. It should be noted
that Prime Power can also do peak power analysis.
We used Prime Power for both average and peak power analysis. The simulator interface
used was VCS.
3.3.4 Other Tools
This project used VTRAN for converting vectors to SPICE stimulus. VTRAN is one of the
offerings from Synopsys and is a generic translator of vectors from one format to another. It
supports all major industry formats as well as the internal formats of many prominent
ASIC/EDA vendors.
VCS was used for logic simulation. There is no specific reason for using this simulator except
that it is a Synopsys offering and so works with Prime Power without major hurdles.
A few TI internal programs were used to set up an automated flow. They are listed below.
1. genFuncTDL: An internal utility to generate random vectors with a specified clock rate.
2. SimOut: A test constraint validation environment.
3. SDFAligner: Translates SDF from one simulator's format to another simulator-compatible
format.
4. SigProbGen: Converts vectors to input switching activity and calculates signal
probability.
5. DREPGEN: Generates data compatible with TFC.
6. A translator from ASCII benchmark data to Verilog netlist and SPICE netlist.
[Figure: Automated validation flow. Blocks shown: ISCAS89 circuits; translator to Verilog netlist and SPICE netlist; DC scripts and power estimation; DREPGEN (DREP file + data); genFuncTDL (random TDL); TFC (user frequency file); SigProbGen (switching activity file); VTRAN (VTRAN cmd, PWL file); SimOut (CFG, CMD); SPICE test bench; SDF; VCS PIF; Power Mill; PrimePower; comparison and report.]
The power numbers mainly reflect cell internal power and the switching power due only
to gate input capacitances, as no interconnect was assumed.
All experiments were done at the nominal operating point, i.e. nominal process, 25 °C
temperature and 1.2 V (nominal voltage).
Clock network power is about 50% of total dynamic power, but this is not true in all cases.
The run time reduction from the static approach is more than 1000 times.
Power reported by Prime Power is optimistic relative to PowerMill in many cases. This was not
expected and we are looking into it.
TFC is within 30% of PowerMill reported power. However, there are certain exceptions
where it reports power that is 30% optimistic or >50% pessimistic.
Design Name  IN    OUT   Flops  Boolean (gates+inv)
s111
s1196        14    14    18     388+141
s1238        14    14    18     428+80
s13207       31    121   669    2573+5378
s13207_1     62    152   638    2573+5378
s1423        17          74     490+167
s1488              19           550+103
s1494              19           558+89
s15850       14    87    597    3448+6324
s15850_1     77    150   534    3448+6324
s208_1       10                 66+38
s27                             8+2
s298                     14     75+44
s344               11    15     101+59
s349               11    15     104+57
s35932       35    320   1728   12204+3861
s382                     21     99+59
s38417       28    106   1636   8709+13470
s38584       12    278   1452   11448+7805
s38584_1     38    304   1426   11448+7805
s386                            118+41
s4
s400                     21     106+58
s420_1       18          16     140+78
s444                     21     119+62
s5                              1+0
s510         19                 179+32
s526                     21     141+52
s526n                    21     140+54
s5378        35    49    179    1004+1775
s641         35    24    19     107+272
s713         35    23    19     139+254
s820         18    19           256+33
s832         18    19           262+25
s838_1       34          32     288+158
s9234        19    22    228    2027+3570
s9234_1      36    39    211    2027+3570
s953         16    23    29     311+84
Design
23
S13207_1
24
S15850
25
S15850_1
26
S35932
250
189
S38584
205
S38584_1
212
Design Name  CLK Power  Total Power  %CLK/Total
s4           2.13       3.35         63.6
s27          6.39       10.91        58.61
s208_1       17.05      30.43        56.04
s298         29.84      54.12        55.14
s344         31.97      61.11        52.32
s349         31.97      61.14        52.29
s382         47.04      91.73        51.28
s386         12.79      32.28        39.62
s400         47.04      94.51        49.77
s420_1       34.1       53.75        63.46
s444         44.76      84.83        52.77
s510         12.79      29.43        43.46
s526n        44.76      85.94        52.08
s526         44.76      85.89        52.11
s641         40.5       117.38       34.5
s713         40.5       123.07       32.91
s820         10.66      72.29        14.74
s832         10.66      72.5         14.7
s838_1       68.21      99.96        68.24
s953         61.81      102.37       60.38
s1494        12.79      158.7        8.06
s1488        12.79      158.24       8.08
s1423        157.73     356.1        44.29
s1238        38.37      150.51       25.49
s1196        38.37      151.17       25.38
s5378        381.55     751.75       50.75
s9234_1      449.75     891.59       50.44
s9234        485.99     632.35       76.85
s13207_1     1359.9     1908.3       71.26
s13207       1426       1718         83
s15850       1272.5     1971.3       64.55
s15850_1     1138.2     2630.3       43.27
s38417       3289.1     4659.3       70.59
s35932       3450.5     9654         35.74
s38584_1     2920.7     8339.6       35.02
s38584       2966.3     8057.2       36.82
Design     Power     Proposed  Prime   Power    %new approach/  %power compiler/  %new approach/  %prime power/
Name       Compiler  Approach  Power   Mill     power compiler  PowerMill         PowerMill       PowerMill
s111       5.5       2.23              2.87     -59.42          91.62             -22.24          -100
s4         3.72      3.35      2.93    2.79     -9.95           33.43             20.16           4.95
s5         2.49      1.34      0.47    1.72     -46.12          44.66             -22.05          -72.61
s27        12.69     10.91     10.03   9.36     -14.01          35.54             16.55           7.14
s208_1     44.91     30.43     22.4    29.03    -32.25          54.7              4.81            -22.84
s298       67.33     54.12     40.05   41.42    -19.62          62.57             30.67           -3.31
s344       85.24     61.11     56.55   65.7     -28.31          29.74             -6.99           -13.93
s349       86.48     61.14     56.66   65.86    -29.3           31.31             -7.16           -13.97
s382       83.57     91.73     52.75   53.15    9.76            57.25             72.6            -0.75
s386       75.15     32.28     42.78   48.46    -57.05          55.07             -33.4           -11.73
s400       83.96     94.51     52.77   53.3     12.58           57.51             77.32           -1
s420_1     70.19     53.75     45.6    44.12    -23.43          59.11             21.83           3.37
s444       83.79     84.83     52.9    53.64    1.24            56.22             58.15           -1.38
s510       64.68     29.43     18.23   47.43    -54.51          36.36             -37.96          -61.57
s526n      85.2      85.94     53.54   53.89    0.87            58.1              59.48           -0.65
s526       85.41     85.89     53.67   54.08    0.57            57.93             58.83           -0.75
s641       159.77    117.38    72.37   93.34    -26.53          71.17             25.76           -22.46
s713       162.62    123.07    74.51   96.57    -24.32          68.41             27.44           -22.84
s820       119.02    72.29     47.96   73       -39.27          63.04             -0.98           -34.3
s832       119.18    72.5      48.03   73.34    -39.17          62.51             -1.14           -34.51
s838_1     126.27    99.96     93.41   75.78    -20.84          66.63             31.91           23.27
s953       159.75    102.37    85.98   88.5     -35.92          80.51             15.67           -2.85
s1494      187.71    158.7     98.28   136.47   -15.45          37.54             16.29           -27.99
s1488      203.99    158.24    98.16   135.83   -22.42          50.18             16.5            -27.73
s1423      406.56    356.1     244.9   278.03   -12.41          46.23             28.08           -11.92
s1238      302.45    150.51    128.2   151.55   -50.24          99.57             -0.69           -15.41
s1196      296.7     151.17    126.5   151.13   -49.05          96.33             0.03            -16.3
s5378      1041.2    751.75    584.3   688.62   -27.8           51.2              9.17            -15.15
s9234_1    1480.6    891.59    704.7   812.36   -39.78          82.26             9.75            -13.25
s9234      1300.4    632.35    508.2   472.82   -51.37          175.03            33.74           7.48
s13207_1   2853      1908.3    1533    1677.46  -33.11          70.08             13.76           -8.61
s13207     2572      1718      1436    1418.89  -33.2           81.27             21.08           1.21
s15850     2640.3    1971.3    1400    1361.52  -25.34          93.92             44.79           2.83
s15850_1   3272.6    2630.3    1539    1945.25  -19.63          68.24             35.22           -20.88
s38417     7654.6    4659.3    4352    4688.74  -39.13          63.26             -0.63           -7.18
s35932     17606     9654      6789    8513.75  -45.17          106.79            13.39           -20.26
s38584_1   12031.7   8339.6    5630    6738.36  -30.69          78.56             23.76           -16.45
s38584     10951.4   8057.2    4261    6235.13  -26.43          75.64             29.22           -31.66
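The percentage columns of the comparison table appear to be relative differences of one tool's power against a reference tool. The sketch below assumes the form (value - reference) / reference * 100, which reproduces the s4 entry for the proposed approach against Power Compiler; other entries agree only approximately, presumably because the tabulated power values are rounded. The function name is hypothetical.

```python
def percent_diff(value, reference):
    """Relative difference of value against reference, in percent."""
    return (value - reference) / reference * 100.0


# s4 row of the comparison table
s4 = {"power_compiler": 3.72, "proposed": 3.35, "prime_power": 2.93, "power_mill": 2.79}

s4_new_vs_pc = round(percent_diff(s4["proposed"], s4["power_compiler"]), 2)
s4_pc_vs_pm = percent_diff(s4["power_compiler"], s4["power_mill"])
```

A negative value means the first tool reports lower (more optimistic) power than the reference.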
We can approximate the average power for each cell based on toggle densities and approximate
the power or ground network as distributed or lumped R and C. By simulating this power
network in SPICE, one can estimate average power/ground bus currents [31].
3.6.2 Average power dissipation
As a direct consequence of the power estimation described above, it should be clear that the
analysis gives overall average power dissipation, summing over all circuit nodes.
3.6.3 Electromigration failures
Electromigration [93][94] is a major reliability problem caused by the transport of atoms in a
metal line due to electron flow. Under persistent current stress, this can cause deformations of
the metal, leading to either short or open circuits. Electromigration failure depends on the
average and root mean square (RMS) current densities in metal leads. The average current in
each metal lead can be estimated by the method described in this chapter, and thus potential
electromigration problems can be addressed in either the power network or signal leads.
3.6.4 Power Routing
It has been noticed that inaccurate power estimation is normally the root cause of over-design
of the power network. With an accurate power estimate, it is possible to have a dense power
grid on one block and a light power grid on another, thus also reducing the overall IR drop
problem.
3.7 Summary
Based on our validation flow and analysis of results, there is a way to
estimate a good power number with minimum run time, as shown in Table 3.3. However, the
toggle frequency calculation method has certain limitations: it is based
on probabilistic algorithms, has no timing information and does not do any logic
simulation. Some power designers may be interested in good accuracy at the cost of
run time. We have proposed a power estimation flow that caters to the needs of power-focused users as
well as normal users.
[Figure: supply voltage versus time, with the maximum voltage marked]
The dips in voltage are due to sudden changes in current during logic switching, since
inductance adds di/dt noise. In addition, in CMOS the currents during logic switching are higher
than the average currents used for average IR drop analysis. This causes an
additional i(t)*R drop, where R is the resistance of the power grid. The total drop seen at the
current sink is:
deltaV = L(di/dt) + i(t)*R
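As a worked illustration of the expression for deltaV, the sketch below evaluates it on a sampled current waveform, approximating di/dt by a backward difference between consecutive samples. The function name, the lumped L and R values, and the sample waveform are made-up illustration values, not data from this work.

```python
def voltage_drop(times_s, currents_a, inductance_h, resistance_ohm):
    """Evaluate deltaV = L*(di/dt) + i(t)*R at each sample after the first."""
    drops = []
    for k in range(1, len(times_s)):
        # Backward-difference estimate of di/dt at sample k.
        didt = (currents_a[k] - currents_a[k - 1]) / (times_s[k] - times_s[k - 1])
        drops.append(inductance_h * didt + currents_a[k] * resistance_ohm)
    return drops


# Current ramps from 0 to 10 mA in 1 ns, then stays flat (hypothetical values).
drops = voltage_drop([0.0, 1e-9, 2e-9], [0.0, 0.01, 0.01], 1e-9, 2.0)
```

During the ramp both terms contribute (about 10 mV inductive plus 20 mV resistive here); once the current is flat only the i(t)*R term remains.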
The most popular technique to control this IR drop is to insert decoupling capacitors in the design.
Figure 4.2 shows an electrical representation of the inductance and the dynamic switching of a cell that
cause power supply noise, and of the decoupling capacitors that help meet this instantaneous
current need.
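[Figure 4.2: Electrical model of a cell and its power/ground connections. Labeled elements: Vdd, Vdd Pin, Vdd Net, Vss, Vss Pin, Vss Net, Lpd, Rpd, Cpd, Lps, Rps, Cps, Rnd, Cnd, Rns, Cns, Cdecap, cell currents Idd and Iss.]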
This work focuses on computing the instantaneous IR drop (deltaV) or actual voltage (Vdd-deltaV)
at the cells' power/ground ports. Vdd is an ideal voltage source here and is constant over time. Here
also, our approach is focused on cell-based designs. The next section explains the cell
characterization and modeling needed for block-level analysis. Using this characterization, we
build a power grid network that can be simulated. This is discussed in Section 4.3. Section 4.4
explains the prototype flow we developed, and the chapter ends with validation results and
conclusions.
[Figure annotation] Output is rising. This alignment is preserved for better results during current waveform generation.
[Figure annotation] Output is rising. There is notable symmetry for rise/fall. This helps us to characterize only one current and do the analysis at the Power/Ground network. The same is true for output falling.
In this work, we have maintained the temporal relationship between power and ground current
waveforms and decoupled the simulations, i.e. they are simulated separately and the IR drop results
are merged.
We performed simulations and arrived at the following conclusions.
The shape of the current waveform remains the same across different frequencies if the
same patterns are used. Note that the overall simulation time decreases when
frequency increases for the same set of patterns. This is not a surprise, as the load being
charged and discharged during each transition is the same for the same slew and the
same set of patterns. For a CMOS gate, the shape of the current waveform remains the same
up to very high frequencies (period ~= 3 times the 0-100% slew). (Appendix C)
The slew or transition time (used interchangeably) plays a big role in determining the peak
power of cells. When the slew decreases, the width of the current spike
decreases and its peak increases. Figure 4.4 and Figure 4.5 show the peak power
variation for different input transition times. Note the variation of ~2x for the inverter and
~1.5x for the 2-input NAND gate.
Figure 4.5 Transition time vs. peak power for nand gate
Peak power varies with output load. The change is as expected, since the
increased capacitance together with the MOS resistance produces an exponential voltage ramp-up.
The peak depends largely on the MOS ON resistance as well as the initial voltage. Figure 4.6
and Figure 4.7 show the variation for an AND gate and an OR gate. Note that the
variation is ~1-3% across a wide range of load.
For cell characterization, pattern dependency is not critical. This is expected, as most of
the circuits have 1-2 levels of logic, where each pattern activates/deactivates most of
the transistors. However, as cells become larger, some logic may not get
activated during switching. In this case, it is important to choose useful patterns for cell
current characterization.
For cell characterization, transition direction matters for a given power supply. This means
that it is important to capture both output rise and output fall transitions during
characterization and to apply them appropriately during use. (Figure 4.3) In our case, we
capture the rise and fall transitions together and use them for analysis, making the proposed
approach direction independent.
Figure 4.8 State Dependency on cell switching
3. The temporal correlation between different inputs strongly influences the characterization
data. This is due to simultaneous switching. In our analysis, we have used the least affecting
combination, i.e. 0 skew between multiple inputs; this is also the worst case. (Figure 4.8)
4.2.2 Current Characterization Flow
Current source generation involves determining the time-variant current waveform for each cell.
This is the current waveform as seen at the VDD pin of the cell when the cell output is rising or falling.
The flow is shown in Figure 4.9. A sample SPICE deck is shown in Appendix D. The Perl program
that takes input from the SPICE simulation has the following options available. In our case, we took
the last option with 75 ps as the sampling interval.
1. full: The whole current data available in the punch file is given as output in two-column
format, the first column giving the simulation time and the second column giving the
current value at each simulation time instance.
2. fixed: The total simulation time is divided into 8192 points and the current value at
these 8192 time-values is obtained either directly, if available, or by interpolation.
3. interval filtered: An interval in picoseconds is specified and, according to that, the
program obtains the time-values for which data is expected. Again, the current data
corresponding to these time-values is obtained directly, if available, or by interpolation.
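The interval-filtered option can be sketched as follows. This is a Python illustration rather than the Perl program used in this work; it assumes linear interpolation between neighbouring simulation points (the text does not state the interpolation scheme) and strictly increasing time values. The function name is hypothetical.

```python
def resample_interval(times_ps, currents, interval_ps):
    """Sample a (time, current) waveform every interval_ps picoseconds.

    Values are taken directly when a sample time coincides with a simulation
    point and linearly interpolated otherwise. Needs at least two points.
    """
    samples = []
    seg = 0  # index of the segment [times_ps[seg], times_ps[seg + 1]]
    n = 0
    while True:
        t = times_ps[0] + n * interval_ps
        if t > times_ps[-1]:
            break
        # Advance to the segment that contains t.
        while seg < len(times_ps) - 2 and times_ps[seg + 1] < t:
            seg += 1
        t0, t1 = times_ps[seg], times_ps[seg + 1]
        i0, i1 = currents[seg], currents[seg + 1]
        samples.append((t, i0 + (i1 - i0) * (t - t0) / (t1 - t0)))
        n += 1
    return samples


# Triangular current spike sampled at the 75 ps interval used in this work.
spike = resample_interval([0, 100, 300], [0.0, 1.0, 0.0], 75)
```

The "fixed" option could reuse the same routine with the interval set to the total simulation time divided by the number of points.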
[Figure 4.9 Cell Characterization Flow: SPICE simulation @ 10 MHz, followed by Perl processing to sample VDD currents]
Using the above methodology, we characterized all the cells which were being instantiated in
ISCAS89 circuits.
Once the power grid is determined along with the capacitance and current source distribution, it
can be realized as a matrix data structure and solved to compute voltages at the desired
nodes, specifically the nodes where cell components are connected, as below.
V*Y=I
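To make the matrix formulation concrete, the sketch below solves a tiny resistive VDD rail by nodal analysis. It is a DC-only illustration with made-up values (two nodes, 1 ohm segments, 1.2 V supply); this work solves the full RC network transiently in SPICE. The Gaussian elimination routine and all names are hypothetical helpers.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(a[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (a[r][n] - partial) / a[r][r]
    return v


# Rail: Vdd --R-- node1 --R-- node2, with current sinks at both nodes.
vdd, g = 1.2, 1.0 / 1.0            # supply voltage, segment conductance
i1, i2 = 0.01, 0.02                # cell currents drawn at node1 and node2
Y = [[2 * g, -g],
     [-g, g]]                      # nodal admittance matrix
I = [g * vdd - i1, -i2]            # source and sink contributions
v1, v2 = solve_linear(Y, I)
```

Node 1 carries the current of both sinks through the first segment, so it sits 30 mV below Vdd, and node 2 a further 20 mV below node 1.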
In our work, we computed resistances and capacitances based on technology data for the 130nm
node. A sample program was written to realize the mesh structure shown in Figure 4.10 for the
VDD network, and VSS was taken as ideal ground. This is not an issue, since we can lump all
the VSS network elements into the VDD network. After determining the power grid current waveform,
we solved the network through SPICE simulations.
4.3.1 Power Grid Current Waveform Modeling
Power grid current waveform modeling involves the following steps:
1. Compute the toggle frequency for each instance in the design as proposed in Chapter 2.
2. Using the characterized current data for the cell, transform the current data to the
toggle frequency computed above.
3. Compute the input arrival time for each instance in the design. This is done using static
timing analysis. Compute the shift required in the current waveform with reference to the
clock edge. For simplicity, we have assumed 0 skew for the clock network.
4. Hook up the current sources and solve the PG network.
5. Determine the PG model simulation time.
These are explained further below.
1
Characterized data was transformed from the time domain to the frequency domain. The
sampling is done at a fixed frequency (much higher than common design frequency
values) of 1000/75 ~ 13.33 GHz, and [t, i(t)] are stored.
I(t) = i(0)d(0) + i(0+Ts)d(0+Ts) + i(0+2*Ts)d(0+2*Ts) + ... (N samples)
Where,
Ts is the sampling interval, in this case 75 ps (a sampling frequency of 13.33 GHz)
i(t) is the current value at time t
d(t) = 1 when t=n*Ts, else 0. n ranges over 1, ..., N
For computational efficiency, N may be chosen as a power of 2: N = 2 ** n (n is an integer)
Now, the Fourier transform of the samples has been performed:
I[k] = i[n]*
2
Model the current waveform for each Boolean gate at the computed toggle frequency.
A compression factor (M) is defined to meet the targeted frequency of the cell under
consideration.
M = targeted frequency/cell characterized frequency (10MHz in this work)
The transformation preserves the base of the current transients. This would not have
been possible in the time domain while scaling frequency; hence the need for frequency
domain transformation. Appendix E shows the waveform generated after transformation
from the 1 MHz waveform. As can be seen, the 1 GHz waveform is not as expected. This
is not an issue since, apart from clock cells, cells are not expected to switch at a 1
GHz average toggle frequency. Besides, this can be handled by characterizing clock cells
at a higher frequency.
When the data was transformed to the frequency domain and the frequency spectrum was
examined, the notable point was that there was a good share of lower-frequency components,
signifying the approximate triangles of the SPICE waveform, while most of the medium- to
high-frequency components were zero, signifying the zero or low-leakage portion of
the power waveform.
Attach the current waveform at the PG node where this cell's power or ground pin is
connected.
If all instances in the design are assigned their respective waveforms, the matrix solver gives
the peak voltage drop value from 0 to LCM (period of all gates).
In reality, we use a smaller simulation time than that to keep simulation time low and the
data more realistic. We computed the simulation time as below.
Tstop = f(minimum toggle frequency, max delay)
= Time period of minimum-frequency cell + maximum delay of all cell outputs
= 2000 ns (for a minimum frequency of 1 MHz and 1000 ns as worst delay)
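The Tstop arithmetic can be written out as a small sketch. The function name and the choice of MHz and ns as units are assumptions made for illustration.

```python
def stop_time_ns(min_toggle_freq_mhz, max_delay_ns):
    """Tstop = period of the minimum-frequency cell + maximum output delay."""
    period_ns = 1000.0 / min_toggle_freq_mhz  # 1 MHz corresponds to 1000 ns
    return period_ns + max_delay_ns


tstop = stop_time_ns(1.0, 1000.0)  # minimum frequency 1 MHz, worst delay 1000 ns
```

With the values quoted in the text this gives the same 2000 ns simulation time.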
Perform timing analysis and, based on input arrival time, shift the current waveforms
along the time axis. The purpose of timing analysis is to establish temporal correlation
between various nodes of the design, i.e. even though 2 or more nodes have the same toggle
frequency, all instances in the design will not switch simultaneously unless needed. In
this work, we have chosen to work with toggle frequency and delay instead of timing
windows [28][45]. The reasons are:
Not all circuit nodes switch in all clock cycles. Average activity computation
establishes the relative amount of switching among various nodes. This is possible because
activity estimation techniques consider circuit functionality. The average switching activity
for most nodes is believed to be 20% of the controlling clock frequency. In certain
solutions, the average switching activity for non-clock signals is assumed to be only
10%.
The timing window method uses classical path sensitization to identify the interval of
switching. Because of the inherent STA assumption that all activity on a path should finish within 1
clock period (unless specified explicitly using a multi-cycle path), the timing intervals for
all nodes lie within a clock period. This makes the whole approach of pseudo-dynamic
simulation pessimistic. (see results)
During timing analysis, we collected 2 sets of data: first, the sensitization edge of the node,
i.e. whether the node is rising or falling at that time, and second, the delay of the node from
the reference node.
Definition: Reference nodes are those nodes that can be considered as 0-delay
nodes. All flip-flop outputs are considered reference nodes in our
analysis. When the input clock to the flip-flop has some propagation
delay associated with it, the reference node will have that delay associated
with it.
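The time shift described above can be sketched as follows, assuming each current waveform is stored as a list of (time, current) points and that the total shift is the reference-node delay plus the node's delay from the reference node. The function name and data layout are hypothetical.

```python
def shift_waveform(pwl_points, reference_delay_ps, node_delay_ps):
    """Shift a piecewise-linear current waveform along the time axis.

    pwl_points:         list of (time_ps, current) points for one switching event
    reference_delay_ps: delay of the reference node (0 for an ideal clock)
    node_delay_ps:      delay of this node from its reference node
    """
    shift = reference_delay_ps + node_delay_ps
    return [(t + shift, i) for t, i in pwl_points]


# A cell whose output switches 300 ps after a flip-flop output,
# where the flip-flop clock arrives with 150 ps of propagation delay.
shifted = shift_waveform([(0, 0.0), (75, 1.0), (150, 0.0)], 150, 300)
```

Two cells with the same toggle frequency but different delays therefore draw their current peaks at different times, which is the temporal correlation the timing analysis is meant to capture.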
It can be seen that any frequency higher than 1 MHz will have at least some repetition in its
current signature, i.e. a node switching at 50 MHz (20 ns) will have 50 repetitions of its
current signature in a 1000 ns simulation.
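The repetition count can be illustrated with a short sketch that tiles one period of a current signature over the simulation window. The function name and the sample signature are hypothetical.

```python
def repeat_signature(signature, period_ns, window_ns):
    """Tile one period of a current signature across a simulation window.

    signature: list of (time_ns, current) points within a single period
    Returns the number of repetitions and the tiled list of points.
    """
    repetitions = int(window_ns // period_ns)
    tiled = []
    for r in range(repetitions):
        tiled.extend((t + r * period_ns, i) for t, i in signature)
    return repetitions, tiled


# A node switching at 50 MHz has a 20 ns period.
reps, tiled = repeat_signature([(0.0, 0.0), (0.075, 1.0), (0.15, 0.0)], 20.0, 1000.0)
```

For the 50 MHz node of the example this yields 50 copies of the signature in the 1000 ns window.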
By changing the minimum frequency, we can change the simulation time considerably. For
example, by changing minimum frequency to 50 MHz, we can ensure that all the current
sources with less than 50 MHz do not contribute (or contributes an average current) to dynamic
V drop analysis and in that case maximum simulation time can become only 20 ns. In all our
analysis we have assumed 1 MHz as minimum frequency.
The number of points in the piecewise linear current waveform depends on the sampling resolution
applied as the first step after reading the characterized data. Increasing or decreasing the sampling
frequency trades accuracy against runtime. In our analysis, we have assumed 75
ps as the sampling interval.
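The arithmetic behind the minimum frequency and the sampling interval can be illustrated with a short Python sketch. It is only an illustration: the function names are hypothetical and are not part of the flow described in this work. The numbers used (1 MHz minimum frequency, a 50 MHz node, 75 ps sampling interval) are the ones quoted above.

```python
def simulation_window_ns(min_freq_mhz):
    """Longest period to simulate: one period of the minimum frequency, in ns."""
    return 1000.0 / min_freq_mhz


def repetitions(node_freq_mhz, min_freq_mhz):
    """How many times a node's current signature repeats inside the window."""
    return node_freq_mhz / min_freq_mhz


def pwl_points(window_ns, sample_interval_ps):
    """Number of whole sampling intervals that fit in the simulation window."""
    return int(window_ns * 1000.0 / sample_interval_ps)


def sampling_freq_ghz(sample_interval_ps):
    """Sampling frequency in GHz that corresponds to an interval in ps."""
    return 1000.0 / sample_interval_ps


if __name__ == "__main__":
    window = simulation_window_ns(1.0)           # 1 MHz minimum frequency
    print(window, repetitions(50.0, 1.0))        # window length, repetitions at 50 MHz
    print(simulation_window_ns(50.0))            # window if the minimum is raised to 50 MHz
    print(pwl_points(window, 75.0), sampling_freq_ghz(75.0))
```

Raising the minimum frequency shortens the window in direct proportion, while a finer sampling interval increases the number of PWL points per current source in inverse proportion.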
The clock network toggles all the time. Also, many designs aim for small insertion delays as well
as near-zero skew. This makes the clock network one of the largest contributors to total current as
well as peak current.
of any node. Alternatively, frequency constraints can be generated from logic simulation or
functional patterns. The SDC file contains the timing constraints of the design. It is used in toggle
activity calculation as well as timing analysis. Timing information consists of the maximum delay for
paths converging on any node and the sensitization edge across that path. Current signatures for
each of the blocks (library macros as well as hierarchical blocks) are generated from current
models, timing information and activity estimation. This document explains the
processing steps (toggle calculation, timing measurement, current signature generation and
block modeling) in detail. Once the current signatures are hooked to the parasitic PG network, a
transient simulation is performed to measure the V-drop at each macro node, and the dynamic
transient current waveform is generated for the power-ground pins. The V-drop data is then
fed to the timing analysis engine to analyze the impact of V-drop on timing.
[Figure: analysis flow. Blocks: Netlist, Frequency Constraints, SDC, Current Char, Timing Analysis, PWL Generator, SPICE Simulation]
The next sections explain the Power Grid Generator, Timing Information Generation and SPICE
simulation details.
4.4.1 Timing Information Generation
Timing information was generated using PrimeTime. PrimeTime requires the Verilog netlist,
SDC and SPEF (Standard Parasitic Exchange Format) files as input. We also wrote a Tcl
script (PrimeTime supports the Tcl command language) to get arrival time information for all
nodes of the circuit. The PrimeTime flow is shown in Figure 4.12 below. The sample SDC file [24][25]
and SPEF used are shown in Appendix A and B.
Figure 4.12 PrimeTime flow for arrival time computation
[Diagram: the SDC file, Verilog netlist and SPEF are inputs to PrimeTime arrival time computation, which produces the timing report.]
[Figure: cell flow and analysis flow. Cell flow: the Toggle Frequency Calculator output and the timing report (delay information) are processed by Perl code (processes various inputs) and passed to a MATLAB program (compression factor M computed; M-based compression in frequency domain). Analysis flow: Perl code performs PG mesh generation and current PWL hookup to produce the PG network.]
A Perl program combines the toggle frequency values obtained using the Toggle Frequency Calculator (TFC) with the delay values for
the corresponding nodes, for all the nodes. The output file containing this information for all the
cells is given to MATLAB.
The MATLAB program is given two inputs. One is the current data at prototype frequencies
for all the gates. The other is a file containing the delay and average activity information for
all the cells of the circuit. Depending upon the activity, the prototype current data is
compressed, and this data is then shifted by an amount equal to the delay at that node. The same
procedure is repeated for all the cells. The resulting current data for all the cells is
stored in a file. The second input file describes the
VLSI circuit for which we have to obtain the power data.
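The compress-and-shift step performed by the MATLAB program can be sketched in Python as follows. This is a simplified time-domain stand-in, not the frequency-domain compression used in this work: it assumes that the compression factor M is the ratio of the node toggle frequency to the prototype frequency, that compression means scaling the time axis by 1/M and repeating the signature M times, and that the prototype samples cover one prototype period starting at time 0. The function name and arguments are hypothetical.

```python
def build_signature(proto_t_ns, proto_i_ma, proto_freq_mhz, node_freq_mhz, delay_ns):
    """Return (time_ns, current_mA) PWL points for one node.

    proto_t_ns / proto_i_ma: samples of the prototype current over one
    prototype period (half-open interval, starting at t = 0).
    """
    # Assumed compression factor M (hypothetical definition).
    m = round(node_freq_mhz / proto_freq_mhz)
    if m < 1:
        raise ValueError("node frequency is below the prototype frequency")
    node_period_ns = 1000.0 / node_freq_mhz
    points = []
    for k in range(m):                      # repeat the compressed signature M times
        for t, i in zip(proto_t_ns, proto_i_ma):
            # compress the time axis by M, then shift by the node delay
            points.append((delay_ns + k * node_period_ns + t / m, i))
    return points


if __name__ == "__main__":
    sig = build_signature([0.0, 250.0, 500.0, 750.0], [0.0, 2.0, 0.0, 0.0],
                          proto_freq_mhz=1.0, node_freq_mhz=2.0, delay_ns=1.5)
    for point in sig:
        print(point)
```

The resulting list of points has the shape expected by a piecewise linear current source, one list per cell.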
Based on the generated current signatures, a new PG network is created. After this, all the
macro instances are replaced with the corresponding current signatures. In our analysis, we
used a PG network with a uniform power grid and an ideal GND. We did not do any actual power
routing but attached the current sources at random locations. The result is compared with the actual SPICE circuits
for all macros in the same PG network at the same locations.
4.4.3 SPICE Simulation
Each cell is now replaced by a current source driven by its corresponding PWL data. Package R,
L and C are attached to the top-level power pins, and a SPICE simulation is performed. The voltage at
each node of the power mesh is recorded. The IR drop for each cell is calculated using a
CODAC (Characterization & Optimization of Digital & Analog Circuits) program (a TI internal
program), which subtracts the minimum voltage obtained at each node from the power supply voltage to
give the Peak Dynamic IR Drop at that node. This is done for all the nodes of the circuit. The
same CODAC program can be used to calculate the Average Dynamic IR Drop at each node of
the circuit.
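The post-processing described above is simple enough to sketch in Python. This is not the CODAC program, which is internal to TI; the function name is hypothetical, and the average drop is assumed here to be the supply voltage minus the arithmetic mean of uniformly sampled node voltages.

```python
def dynamic_ir_drop(vdd, node_voltages):
    """Return (peak_drop, average_drop) for one node of the power mesh.

    vdd: ideal supply voltage in volts.
    node_voltages: uniformly sampled simulated voltages at the node, in volts.
    """
    peak_drop = vdd - min(node_voltages)                       # worst-case instant
    avg_drop = vdd - sum(node_voltages) / len(node_voltages)   # assumed definition
    return peak_drop, avg_drop


if __name__ == "__main__":
    peak, avg = dynamic_ir_drop(1.2, [1.2, 1.1, 1.0, 1.1])
    print(peak, avg)
```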
We modeled the power grid by creating an n x m mesh. The resistance of each arm of the mesh was
derived from an Ohm/um number. We also assumed two such arms in parallel to account for the
multi-layer chip scenario.
A matrix solver was not developed as part of this work. Instead, we used available SPICE
simulators.
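A minimal Python sketch of such a mesh generator is given below. It assumes a uniform pitch between mesh nodes and emits one SPICE-style resistor line per arm; the node naming scheme (n<row>_<col>), the function name and the example numbers are hypothetical and chosen only for illustration.

```python
def mesh_resistors(n_rows, n_cols, pitch_um, ohm_per_um, parallel_arms=2):
    """Return SPICE-style resistor lines for an n x m power mesh."""
    # Resistance of one arm, with `parallel_arms` identical arms in parallel.
    r_arm = ohm_per_um * pitch_um / parallel_arms
    lines = []
    idx = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:   # horizontal arm to the right-hand neighbour
                idx += 1
                lines.append(f"R{idx} n{r}_{c} n{r}_{c + 1} {r_arm:g}")
            if r + 1 < n_rows:   # vertical arm to the neighbour below
                idx += 1
                lines.append(f"R{idx} n{r}_{c} n{r + 1}_{c} {r_arm:g}")
    return lines


if __name__ == "__main__":
    for line in mesh_resistors(3, 4, pitch_um=10.0, ohm_per_um=0.05):
        print(line)
```

An n x m mesh built this way has n(m-1) + m(n-1) arms; current sources and package R, L and C are then attached to the mesh nodes before simulation.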
We executed the flow as explained in the previous section. Instead of 1 MHz, we used 10 MHz for
characterization, to reduce the amount of data. We still sampled the cell data at 13.33 GHz.
4.5.1 Peak Power Results
Three small circuits were studied to stabilize the above approach. These three circuits are:
TWOAND: The circuit consists of two AND gates, one after the other.
ANDOR: The circuit consists of one AND gate followed by one OR gate.
2AND-1OR: This circuit has two AND gates at the first level. The outputs of these
AND gates are given to an OR gate whose output is the final output.
The peak power data was obtained for the three small circuits using the approach described in this
report and using SPICE simulation. The data obtained using the average switching activity
approach and SPICE for 100 MHz and 500 MHz input frequencies is given below in
Table 4.1.
Frequency   TWOAND                     AND-OR                     2AND-1OR
            Our Approach   SPICE       Our Approach   SPICE       Our Approach   SPICE
100 MHz     0.0016817      0.0016      0.0009409      0.0008421   0.0019253      0.0019
500 MHz     0.00168113     0.0016      0.0009410      0.00086539  0.00192531     0.0018
100 Inverter Chain: A chain of 100 inverters, with the output of each
inverter acting as the input of the next. The delay of the chain is longer than the clock period
of operation.
32 Bit Shift Register: This 32-bit shift register is a series/parallel shift register.
Depending upon the input and selection criteria, the input is shifted in a series or parallel
manner.
16 Bit Adder: This is a 16-bit binary adder. Carry Forward logic is used for addition.
The following points were taken into account while generating the netlists for these circuits.
A uniform mesh structure is used and all leaf cells are placed randomly onto it.
A reduced interconnect network was used, based on driving point admittance estimation for
power as well as signal lines.
The peak dynamic IR drop data was obtained using the Average Activity approach, the Timing Window
approach and SPICE simulation. The data obtained is shown in Table 4.2.
Circuit    %Drop in average activity    %Drop in Timing Window Approach    SPICE %Drop
1.65
17.5
40
12
16 Bit Adder
31
NA
19.16
It is clear that the accuracy of the Average Activity method is better than that of the Timing Window
method. To check the performance of this approach, the Average Activity method was applied to a
few industry-standard circuits. Table 4.3 below compares the maximum
dynamic IR drop in a circuit obtained using average switching activity and using PowerMill. PowerMill is a
SPICE-based transient analysis tool offered by Synopsys. It is now called NanoSim.
circuit   Average Activity   Power Mill   %Error
s27       4.5                5.8          -22.4138
s344      6.3                6.6          -4.54545
s349      6.2                7.5          -17.3333
s444      8.6                13.3         -35.3383
s1238     13.4               13.3         0.75188
s298      12.5               15           -16.6667
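The %Error values in the table are consistent with a relative error taken against the second value of each row. A short Python sketch of that calculation follows; it assumes (the column headers are not fully recoverable) that the first value in each row is the average activity result and the second is the Power Mill result. The names used are illustrative only.

```python
# (first value, second value) per circuit, copied from the table above.
# Assumption: first = average activity result, second = Power Mill result.
RESULTS = {
    "s27": (4.5, 5.8),
    "s344": (6.3, 6.6),
    "s349": (6.2, 7.5),
    "s444": (8.6, 13.3),
    "s1238": (13.4, 13.3),
    "s298": (12.5, 15.0),
}


def percent_error(estimate, reference):
    """Signed relative error of the estimate against the reference, in percent."""
    return (estimate - reference) / reference * 100.0


if __name__ == "__main__":
    for name, (estimate, reference) in RESULTS.items():
        print(name, round(percent_error(estimate, reference), 4))
```

A negative value means the estimate is below the reference simulation for that circuit.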
Power supply noise waveforms comparing the average activity approach with SPICE simulation of the actual
logic are shown in Figure 4.14 and Figure 4.15.
4.6 Summary
We proposed a novel PG network modeling technique. The approach involves average switching
activity calculation, transient current characterization of the basic Boolean gates of the library,
derivation of the PG network model and transient simulation of the PG model using a vectorless
approach. The results are derived from this simulation as desired. Further, our global
average switching activity calculation method ensures that we can consider the global timing
impact of global voltage drop without extra runtime. This reduces the need for
local maximum voltage drop analysis on timing [26]. Our approach also provides
detailed data on voltage drop across the chip/block, and based on this profile we can
place decoupling capacitors at the required locations. The results were validated
against a dynamic fast SPICE simulator (NanoSim), showing that the average
switching rate calculation gives results close to dynamic vector analysis. In addition,
average switching activity also gives an accurate analysis of
average V-drop. Hence the approach we are suggesting gives both average and dynamic PG
noise results simultaneously.
The approach is scalable to multimillion-gate designs by using the technique proposed by
Blaauw et al. [55]. There is further possibility to expand this work to understand decap
sensitivity, as well as to skew the analysis towards a certain end target, e.g. PG grid robustness or
Monte Carlo based analysis for higher accuracy and coverage.
5 Power Up Analysis
One of the popular techniques to reduce leakage is to use a gated power supply [74, 79, 80].
Shekhar [74] has highlighted a technique called sleep transistor and the challenges associated
with it. This technique gates the power supply using a high-threshold transistor when the supply is
not required, as shown in Figure 5.1. The sleep transistor, also known as a power switch, turns
off the power supply when a portion of the chip is idle, thus saving leakage current. Apart from
design challenges, the technique has additional design analysis challenges, as listed below.
1. When the power supply turns on from the off state, a huge capacitive load gets charged,
causing a large current surge and hence Power Supply Noise (PSN). This noise can couple
to signal lines, causing a state change or delay change. It can also remain within the supply
network but cause a large dynamic IR drop that in turn affects circuit performance. The
goal is to predict the surge and control it.
2. The transistor in series with the supply acts as a large resistor in the normal mode of
operation, causing additional IR drop. This in turn degrades performance. The IR drop
across the transistor can be as high as 5-20 mV. The goal is to do an average IR drop
analysis to assess the impact of the switch.
3. Optimization of switches to get the best leakage improvement. The optimization has
area penalty, IR drop or Power Supply Noise as cost parameters. For example, a low
number of switches gives good leakage improvement but high IR drop and Power
Supply Noise.
4. When the power supply goes down, all sequential logic in the virtual power domain loses
its state. This puts an extra constraint on overall system behavior. There is also a technique
where the state is preserved through retention flops [2, 81]. This technique needs
extra power routing to save state, as well as control logic. The timing analysis needs to
capture the mode switching.
5. Placement and routing of extra signals, special cells (such as retention flops) and the
virtual power network.
6. Trade-off between leakage and the number of power switches.
7. Power routing closes immediately after floorplanning, and the switches need to be placed by
this time. It is important to have an early power-up analysis flow to compute the required
number of switches that meets the peak current surge as well as the IR drop and
leakage needs.
Often, PSN is a non-negotiable parameter and the design-planning goal is to identify the total number of
switches that limits PSN to a user-defined level. This paper describes an analytical method to
determine the optimum number of power switches and the power-up glitch. Section II elaborates on
the switched PG network and the PSN problem. Section III outlines the approach to analyze such
networks. Section IV correlates the results we have achieved with SPICE and discusses the efficiency of the
algorithm.
Power Switch
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
A typical PG network with power switches can be represented as shown in Figure 5.4. Some of
the characteristics of this network are [87]:
There are two kinds of domains: a golden domain on the non-gated power supply, and multiple virtual
domains on the switched power supply.
The virtual domains are not connected to each other. They are connected to the golden domain
through the switch network.
The switch network consists of one or more kinds of switches for a given domain.
Control logic can turn any one or more virtual domains on or off at any time.
Figure 5.4 Typical PG network with Power Switches
[Diagram: VDD feeds the non-gated power network and its logic network directly, and feeds the virtual power network and its logic network through the switch network (SW). The zoomed view shows N switches (SW1, SW2, SW3, ...) in the parallel configuration and in the sequential configuration, where elements labeled D1 sit between consecutive switches.]
When the power supply is off and the virtual network is disconnected, the only current that passes
through is leakage current. If the leakage current of the virtual logic is significantly higher than
the leakage of the switch network, leakage current improves. When the switches are
turned on, i.e. when the power supply connects to the virtual power network, the loads in the virtual
power network start getting charged. Loads include interconnect capacitances, gate
capacitances and the circuit diffusion/diode capacitances. The amount of current drawn by
these capacitances depends on the ability of the switch network to provide charge in a given time. Due to
the fast current demand of the virtual power domain, L*di/dt noise is injected into the circuit,
which can affect normal functioning of the golden power domain. Note that although the capacitive
load dominates, the peak current is still limited by the saturation current of the switch, giving the current
profile shown in Figure 5.3.
The voltage at every node in the virtual power network has the same value at any time instant
during power ON if there is zero static IR drop.
The switch network is sequential. A parallel configuration essentially means one big switch, with all transistors forming a single switch whose characteristic is lumped into a single MOS.
The high-level flow for the analysis is shown in the block diagram of Figure 5.5.
Figure 5.5 Schematic Switch network Analysis Flow
[Diagram blocks recovered: Switch IV Characterization; Determination of required parameters]
But dq = C * dv
Hence dv = I * dt / C
[Figure: the switch network connects VDD to Vout, which drives the extracted total load capacitance Cload]
Equation 3 forms the basis of Algorithm 1 described in the next section. The delay between two
consecutive switches is used to predict the charge supplied by the switch to the virtual power
network domain. The IV table of the switch is used to predict the current by further dividing the delay
into infinitesimally small time steps, as shown in Figure 5.7. Based on the initial voltage and the
charge supplied, the voltage is derived at the moment when the next switch just starts turning on. This
process continues until either all switches are turned on or the specified voltage level is reached.
Further, the same method continues if all the switches are turned on but the voltage is lower
than the ideal voltage (VDD golden), to predict the maximum surge in current. The predicted
number of switches is used to predict the static IR drop across the switch network, as explained in
Algorithm 2. This is another important parameter that will not be discussed further in this
chapter.
Alternatively, voltage value that can be reached with given number of switches.
Maximum current surge that will happen given the number of switches.
The above algorithm is developed for the case where the delay between two consecutive switches in
the sequential switch network is the same. However, it is possible to extend it to scenarios with differing delays.
In that case, we need to use timing information from static timing analysis or simulations.
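The sequential charging procedure for the uniform-delay case can be sketched in Python as follows. This is a simplified illustration and not the exact algorithm of this work: the switch IV table is replaced by a hypothetical two-region model (a linear region with resistance r_on, capped at a saturation current i_sat), the loop stops as soon as the target voltage is reached, and the continuation after all switches are on is omitted. The update step is dv = I * dt / C with a lumped load capacitance. All names and numbers are illustrative.

```python
def switch_current(v_ds, i_sat, r_on):
    """Hypothetical stand-in for the switch IV table: linear region capped at saturation."""
    return min(v_ds / r_on, i_sat)


def power_up(vdd, c_load, n_switches, delay, steps, v_target, i_sat, r_on):
    """Turn switches on one by one, `delay` seconds apart, and integrate dv = I*dt/C.

    Returns (switches_used, final_voltage, peak_current, switch_count_at_peak).
    """
    v = 0.0
    peak_i, peak_switch = 0.0, 0
    dt = delay / steps                      # small time step within one switch delay
    for n_on in range(1, n_switches + 1):   # n_on switches are conducting
        for _ in range(steps):
            i_total = n_on * switch_current(vdd - v, i_sat, r_on)
            if i_total > peak_i:
                peak_i, peak_switch = i_total, n_on
            v = min(vdd, v + i_total * dt / c_load)
        if v >= v_target:                   # desired voltage level reached
            return n_on, v, peak_i, peak_switch
    return n_switches, v, peak_i, peak_switch


if __name__ == "__main__":
    print(power_up(vdd=1.0, c_load=1e-9, n_switches=100, delay=1e-9,
                   steps=10, v_target=0.9, i_sat=1e-3, r_on=100.0))
```

Replacing switch_current with an interpolation over characterized IV data, and passing per-switch delays instead of a single delay, would move the sketch closer to the flow described here.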
5.2.2.2 Algorithm for Static IR drop analysis across power switches:
{
Read switch characterization data for static IR drop; read the ON channel
resistance (RON)
Determine the total number of switches (N) required to reach the desired voltage level
(specified by the user), using the Algorithm for Power Switch
Network Analysis
Effective resistance of the N switches predicted above is: RON/N
Compute power consumption (Pavg) of the switched off or virtual power network using
any of the methods described in this work (methods outside this work can also be used)
Compute average current consumption of the virtual power network: Iavg =
Pavg/VDD
Static IR drop across the switch network is: Iavg*RON/N
}
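The last three steps of the algorithm above reduce to two lines of arithmetic, sketched here in Python. The function name and the example numbers are hypothetical; the formulas Iavg = Pavg/VDD and drop = Iavg*RON/N are the ones stated in the algorithm.

```python
def static_ir_drop(p_avg_w, vdd_v, r_on_ohm, n_switches):
    """Static IR drop (V) across N identical switches conducting in parallel."""
    i_avg = p_avg_w / vdd_v                 # Iavg = Pavg / VDD
    return i_avg * r_on_ohm / n_switches    # drop = Iavg * RON / N


if __name__ == "__main__":
    # Illustrative numbers only: 120 mW block, 1.2 V supply, 100 ohm switch, 1000 switches.
    print(static_ir_drop(0.12, 1.2, 100.0, 1000))
```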
gates in the virtual network or more. This will take weeks to simulate even with the fast SPICE
simulators available in the market. It is also very late in the design cycle.
Alternatively, we can reduce the virtual power network by modeling the interconnect load and
gate capacitance with a large distributed capacitance, and the on-channel transistor resistance with an
effective resistance in series with each distributed C, to reduce the number of active elements,
and then simulate the reduced power network using SPICE (Figure 5.8). This approach gives orders of magnitude
of improvement in simulation time, but the runtime is still days. It can be done
during design planning or after detailed design is over.
The technique we presented in the last section is static in nature; it reduces the runtime to a few
minutes and gives very good correlation with the techniques described above. The algorithms
described above were analyzed with switches designed in TI's 90 nm node. All the results
below are for a 1M equivalent gate block. 1M gates could not be simulated using SPICE along
with the switches, so the simplified model described in the previous paragraph was employed to get
SPICE accuracy data while keeping the switch network intact. We employed a switch network
with two kinds of switches for this analysis [87]. One set of switches took the virtual domain
up to a specific voltage level, and a second kind of switch with higher capacity was turned on in a
sequential manner to measure the surge in current.
Table 5.1 shows the prediction of switches for a given voltage. When the number of switches is
large, the algorithm gives results within 1% of SPICE-based simulation, whereas
when the number of switches is small, the inaccuracy is within 10%. In other words, the predicted
number is close to the actual number, with an accuracy of 1-10%. This table also shows the
current surge prediction and the switch number whose turn-on causes the maximum peak.
Essentially, along with the surge, we predict the switch at which the maximum surge occurs. This
helps to further optimize the second type of switch network. Table 5.2 shows voltage prediction
given the number of switches.
The advantage of the whole solution comes from the large runtime improvement, which
enables early analysis and tradeoffs in the design (Table 5.3). The runtime clearly outweighs
the small inaccuracy in switch prediction or voltage prediction. Note that the runtime does not
include switch IV characterization time since it is a one-time effort. In static analysis, we can
dump much more information quickly, as needed, to understand certain behavior for tradeoff
analysis. We can also predict the time domain behavior of voltage and current using the approach
described in this work. Figure 5.9 compares the predicted voltage over time with a few arbitrary nodes
simulated in SPICE. Figure 5.10 compares the predicted current over time with the current measured at
VDD. The agreement is good considering that the analysis is targeted at early tradeoff analysis.
Table 5.1
Vdesired (mV)   Actual #Switches   Switches by Algorithm   Current Surge (mA)   Current Surge after #switches
20              380                403                     950                  123
69              760                771                     881                  114
271             1560               1554                    749                  100
583             2340               2328                    467                  97
869             2964               2971                    266                  81
1170            4368               4308                    24                   43

Table 5.2
# Switches   Simulated Voltage (mV)   Voltage by Algorithm   Surge Current (mA)   Surge Current after switch #   %Error in voltages
780          63                       70.54                  892                  101                            11
1560         280                      273.53                 784                  94                             -0.2
2340         587                      589.26                 546                  78                             0.38
3120         926                      927.7                  263                  64                             0.18
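Table 5.3 (the headers and units of the two runtime columns are not recoverable from the source)
No. of switches   [header lost]   [header lost]
780               ~1.5            <1
1560              ~4              <1
2340              ~5              <1
2940              ~6              <1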
[Figure 5.9: predicted voltage over time compared with SPICE at node1 and node2. Y-axis: voltage in mV (0 to 1400); x-axis: time.]
[Figure 5.10: predicted current over time compared with SPICE. Y-axis: current in mA (0 to 1000); x-axis: time.]
5.4 Summary
There are various techniques to improve the leakage power of a design. The gated power supply,
also called sleep transistor or switched power network, is one of the efficient methods to reduce
leakage power. The analysis techniques described in this work help in providing quick data for
architecture-level decisions when using the switched network technique. The runtime is a few
seconds, and hence the design team can do many iterations to arrive at the optimum number of
switches. The analytical method to calculate the total number of switches is fast since it involves only a one-time
SPICE simulation (the IV characteristic of the switch), and the rest of the analysis is performed
using static analysis. Using the same method, we have also analyzed the power-on glitch
that contributes to Power Supply Noise during power up. All the results closely match
SPICE simulation.
6 Conclusion
6.1 Summary
The power grid analysis challenges faced by CMOS technology are discussed in this thesis.
For a robust power grid, designs need to go through the following analyses:
5. Power-up analysis for MTCMOS-based digital designs. The methodology is validated
using a prototype flow and gives a large runtime improvement compared to SPICE.
The methodology also helps in MTCMOS gate optimization.
decoupling capacitors, both intrinsic (due to NWELL, non-switching gates and RAMs) and
intentional (distributed by the user). Decoupling capacitor estimation, characterization and
what-if impact analysis on instantaneous IR drop are an important area for further research.
Fifth, the MTCMOS analysis approach proposed in this work is useful early in design planning to
make efficient tradeoffs between MTCMOS switches and noise tolerance levels in the design. In this work,
we have modeled the switched power network with a lumped capacitance. This does not model the time
domain behavior of the PG network due to PG resistance. A more accurate approach can be
developed that models distributed RC for the PG network once placement and power routing are
done. It is our belief that this will give quick and accurate analysis of the actual network compared to
SPICE-like simulations.
7 References
1. Semiconductor Industry Assoc., International Technology Roadmap for Semiconductors, 2003 Update, http://public.itrs.net/Files/2003ITRS/Home2003.htm
2. Nam Sung Kim, David Blaauw et al, Leakage Current: Moore's Law Meets Static Power, IEEE Computer, Dec 2003.
3.
4. Rabe, D; Jochens, G.; Kruse, L.; Nebel, W, Power-simulation of cell based ASICs: accuracy- and performance trade-offs, Proceedings
of Design automation and test in Europe, Feb 1998
5. F. Najm, A survey of power estimation techniques in VLSI circuits, IEEE Trans. VLSI System., vol. 2, pp. 446-455, Dec. 1994.
6. C. Y. Tsui, M. Pedram, and A. Despain, Efficient estimation of dynamic power dissipation under a real delay model, in Proc. IEEE Int.
Conf. Computer-Aided Design, 1993, pp. 224-228
7. B. J. George et al., Power analysis and characterization for semi custom design, in Proc. Int. Workshop Low Power Design, 1994, pp.
215-218.
8. J.-Y. Lin et al., A cell-based power estimation in CMOS combinational circuits, in Proc. IEEE Int. Conf. Computer-Aided Design,
1994, pp. 304-309.
9. H. Sarin and A. McNelly, A power modeling and characterization method for logic simulation, in Proc. IEEE Custom Integrated
Circuits Conf., 1995, pp. 363-366.
21. F.N. Najm, R.Burch, P. Yang, and I.N. Hajj. Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits. IEEE
Transactions on CAD, 9(4):439-450, April 1990.
22. Randal L. Schwartz, Tom Phoenix and brian d foy, Learning Perl, 4th Edition, O'Reilly & Associates, ISBN 0596101058
23. Matlab Tutorial, http://www.math.ufl.edu/help/matlab-tutorial/
24. Synopsys, Inc, Using the Synopsys Design Constraints Format, Application Note, Sept 2005.
25. Himanshu Bhatnagar, Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and Primetime, 2nd
Edition, Kluwer Academic Publishers, ISBN: 0792376447.
26. Martin Saint-Laurent, Swaminathan, "Impact of Power Supply Noise on Timing In High Frequency Microprocessors", IEEE Trans on
Advanced Packaging, pp. 135-144, Feb 2004
27. Kriplani, H.; Najm, F.; Hajj, I, Improved Delay and Current Models for Estimating Maximum Currents in CMOS VLSI Circuits,
ISCAS 94, pp. 435-438, June 1994.
28. Kriplani, H.; Najm, F.N.; Hajj, I.N, Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI
Circuits: Algorithms, Signal Correlations, and Their Resolution, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 998-1012, Aug 1995.
29. Hsiao, M.S.; Rudnick, E.M.; Patel, J.H., Peak Power Estimation of VLSI Circuits: New Peak Power Measures, IEEE Trans on VLSI
Systems, pp. 435-439, Aug 2000
30. Qing Wu; Qinru Qiu; Pedram, M, Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme
Order Statistics, IEEE Trans on CAD of integrated Circuits and Systems, pp. 942-956, Aug 2001.
31. Bogliolo, A.; Benini, L.; de Micheli, G.; Ricco, B., Gate-level power and current simulation of CMOS integrated circuits, Very Large
Scale Integration (VLSI) Systems, pp. 473-488, Dec 1997
32. Anantha Chandrakasan's Home Page: http://www-mtl.mit.edu/~anantha/publications.html,
http://www.fetchbook.info/search_Anantha_Chandrakasan/searchBy_Author.html
33. FFT Tutorial, http://www.ele.uri.edu/~hansenj/projects/ele436/fft.pdf
34. Jeff Tranter and Paul Raines, Tcl/Tk in a Nutshell, O'Reilly & Associates, ISBN 1565924339
35. Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, Discrete Time Signal Processing, 2nd Edition, Prentice Hall, ISBN 0137549202
36. Chen, H.H.; Ling, D.D, Power Supply Analysis Methodology for Deep-Submicron VLSI Chip Design, DAC, pp. 638-643, June 1997.
37. Yi-Shing Chang; Gupta, S.K.; Breuer, M.A, Analysis of Ground Bounce in Deep-Submicron Circuits, VLSI Test Symposium, pp. 110-116, May 1997
38. Yi-Min Jiang; Kwang-Ting Cheng; An-Chang Deng, Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs,
International sym on low power electronics and design, pp. 233-238, Aug 1998.
39. Zhao, S.; Roy, K.; Koh, C.-K, Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-Micron
CMOS Circuits, International conference on Computer Design, pp. 65-72, Sept 2000.
40. S. Bobba, I.N.Hajj, Maximum voltage variation in the power distribution network of VLSI circuits with RLC Models, Proc of ISLPED,
Aug2001
41. Bai, G.; Bobba, S.; Hajj, I.N, "Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits",
DAC, pp. 295-300, 2001.
42. G. Steele, et al., Full-Chip Verification Methods for DSM Power Distribution Systems, Proc. Of DAC, pp. 744-749, 1998
43. R. Chaudhry, D. Blaauw, R. Panda and T. Edwards, Current Signature Compression For IR-Drop Analysis, Proc. Design Automation
Conference, pp. 162-167, 2000
44. S. Bobba and I. N. Hajj, Estimation of maximum current envelope for power bus analysis and design, Proc. of ISPD, pp 141-146, Apr
1998
45. Rishi Bhooshan (TI) et.al, A Unique Method For Dynamic Voltage Drop Analysis and Decoupling Capacitance Estimation,, VDAT
2003
46. Cirit, M.A., Characterizing a VLSI standard cell library, Digital Object Identifier 10.1109/CICC, pp.25.7.2-25.7.4, May 1991
47. Debnath, S.P.; Sukumar, J.; Udaykumar, H, A methodology for fast vector based power supply and substrate noise analyses,
International conference on VLSI Design, pp. 808-811, Jan 2005.
48. Dalal, A.; Lev, L.; Mitra, S.; Design of an efficient power distribution network for the UltraSPARC-I microprocessor, IEEE conference
on Computer Design: VLSI in computers and processors, pp. 118-123, Oct 1995
49. Chen, H.H.; Schuster, S.E.; On-chip decoupling capacitor optimization for high-performance VLSI design, VLSI Technology, Systems
and Applications, pp. 99-103, June 1995.
50. Larsson, P, Power supply noise in future IC's: a crystal ball reading, Custom Integrated Circuits, pp. 467-474, May 1999.
51. Sotman, M.; Popovich, M.; Kolodny, A.; Friedman, E, Leveraging symbiotic on-die decoupling capacitance, Electrical Performance of
Electronic Packaging, pp. 111-114, Oct 2005
52. Larsson, P, Resonance and damping in CMOS circuits with on-chip decoupling capacitance, IEEE Transactions on Circuits and
Systems-I, vol 45, pp. 849-858, Aug 1998
53. Larsson, P, Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance, IEEE Journal of Solid State Circuits,
vol 32, pp 574-576, Apr 1997
54. Chaudhry, R.; Panda, R.; Edwards, T.; Blaauw, D, Design and analysis of power distribution networks with accurate RLC models,
International conference on VLSI Design, pp. 151-155, Jan 2000
55. Min Zhao; Panda, R.V.; Sapatnekar, S.S.; Edwards, T.; Chaudhry, R.; Blaauw, D, Hierarchical analysis of power distribution networks,
DAC, pp. 150-155, June 2000
56. IBM Methodology for Power Supply Noise - http://www.research.ibm.com/da/nova.html
57. R. Heald et. al, Implementation of a 3rd Generation Sparc V9 64b Microprocessor, Proc IEEE ISSCC, pp. 412-413, 2000
58. Yi-Min Jiang Kwang-Ting Cheng, Analysis of Performance Impact Caused by Power Supply Noise in Deep Submicron Devices, DAC,
June 1999
59. Apache Design Solutions, Reshaping Nanometer Flows with Physical Power Integrity, http://www.apache-da.com, White Paper, May
2003.
60. Anthony Ralston, Philip Rabinowitz, A First course in Numerical Analysis, 2nd Edition, Dover Publications, ISBN 048641454X.
61. Kalpesh Shah, SNUG 2006 Panel Discussion
62. H. Mehta, R.M.Owens, M.J.Irwin, Energy Characterization Based on Clustering, 33rd Design Automation Conference, June 1996.
63. D. Brooks, V. Tiwari, and M. Martonosi, Wattch: A framework for Architectural-Level Power Analysis and Optimizations, Proc of
International Symposium on Computer Architecture, pp. 83-94, June 2000
64. V. Tiwari, S. Malik, and A. Wolfe, Power Analysis of Embedded Software: A First Step toward software power minimization, IEEE
Trans VLSI Systems, vol2, no. 4, pp 437-445, 1994
65. E. Macii, M. Pedram and F. Somenzi, High Level Power Modeling and Estimation, IEEE Transactions on Computer Aided Design of
Integrated Circuits and Systems, vol 17, November 1998.
66. Synopsys Prime Power - http://www.synopsys.com/products/power/primepower_ds.pdf
67. Synopsys Power Compiler - http://www.synopsys.com/products/power/power_ds.pdf
68. Synopsys Nanosim - http://www.synopsys.com/products/mixedsignal/nanosim/nanosim.html
69. Synopsys Liberty Format - http://www.synopsys.com/partners/tapin/lib_info.html
70. M Horowitz and R Gonzalez, Energy dissipation in general purpose Microprocessors, IJSSC, vol31, Sept 1996.
71. Brglez, F. Bryan, D. Kozminski, K. , Combinational profiles of sequential benchmark circuits, ISCAS, vol 3, pp. 1929-1934, May
1989.
72. R. Wilson and D. Lammers, Grove Calls Leakage Chip Designers Top Problem, EE Times, 13 Dec 2002;
www.eetimes.com/story/OEG20021213S0040.
73. Intel SpeedStep technology, http://www.intel.com
74. Y.Ye, S Borkar, V. De, A New Technique for Standby Leakage Reduction in High-Performance Circuits, 1998 Symposium on VLSI
Circuits, June 1998.
75. M. Powell et al., Reducing Leakage in a High Performance Deep-Submicron Instruction Cache, IEEE Trans. VLSI, Feb 2001, pp 77-89
76. Ali K., Charles H. et al., Effect of reverse body bias for low power CMOS circuits
77. Kaushik R, Mark C.J., Dinesh S., leakage control with efficient use of transistor stacks in single threshold CMOS
78. Shekhar Borkar, Low Power Design Challenges for the Decade, 2001.
79. Kumagai, K.; Iwaki, H.; Yoshida, H.; Suzuki, H.; Yamada, T.; Kurosawa, S.; A Novel Powering Down Scheme for low Vt CMOS
Circuits, 1998 Symposium on , 11-13 June 1998, pp. 44-45
80. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J., 1V high-speed digital circuit technology with 0.5μm multi-threshold
CMOS, IEEE ASIC Conference, 1993.
81. Akamatsu, H.; Iwata, T.; Yamamoto, H.; Hirata, T.; Yamauchi, H.; Kotani, H.; Matsuzawa, A.; A low power data holding circuit with
an intermittent power supply scheme for sub-1V MT-CMOS LSIs, VLSI Circuits, 1996. Digest of Technical Papers., 1996 Symposium
on , 13-15 June 1996. Pages: 14-15
82. Ye, Y.; Borkar, S.; De, V. , A new technique for standby leakage reduction in high-performance circuits, Symposium on VLSI Circuits,
June 1998. Page(s): 40-41
83. Das, K.K.; Joshi, R.V.; Chuang, C.T.; Cook, P.W.; Brown, R.B., New digital circuit techniques for total standby leakage reduction in
Nano-scale SOI technology, pp. 309-312, ISSCC, Sept 2003.
84. Wenxin Wang; Anis, M.; Areibi, S, Fast techniques for standby leakage reduction in MTCMOS circuits, ISOCC, pp. 21-24, Sept 2004
85. Fei Li; Lei He; Saluja, K.K.; Estimation of maximum power-up current, DAC, pp. 51-56, Jan 2002
86. Calhoun, B.H.; Honore, F.A.; Chandrakasan, A.P, A leakage reduction methodology for distributed MTCMOS, JSSC, pp. 818-826,
May 2004
87. Royannez, P.; Mair, H.; Dahan, F.; Wagner, M.; Streeter, M.; Bouetel, L.; Blasquez, J.; Clasen, H.; Semino, G.; Dong, J.; Scott, D.; Pitts,
B.; Raibaut, C.; Uming Ko, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC, pp. 138-139, Feb 2005.
88. R. Heald, et al., Implementation of a 3rd Generation SPARC V9 64b Microprocessor, Proc. IEEE ISSCC, pp 412-413, 2000
89. P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, High Performance Microprocessor Design, IEEE Journal of Solid
State Circuits, vol 33, no 5, pp. 676-686, Apr 1998.
90. J. Darnauer, D. Chengson, B. Schmidt, and E. Priest, Electrical Evaluation of Flip-Chip package Alternatives for Next Generation
Microprocessor, Electronic Components and Technology Conference, pp. 666-673, 1998
91. S. Borkar, Low Power Design Challenges for the Decade, Proc. of ISLPED, 2000
92. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel and F. Baez, Reducing Power in High performance Microprocessors, Proc. of
Design Automation Conference, 1997
93. Wachnik, R.A.; Filippi, R.G.; Shaw, T.M.; Lin, P.C, Practical benefits of the electromigration short-length effect, including a new design
rule methodology and an electromigration resistant power grid with enhanced wireability, Sym on VLSI Technology, pp. 220-221, June
2000.
94. J. Kitchin, Statistical Electromigration Budgeting for Reliable Design and Verification in a 300-MHz Microprocessor, Symposium on
VLSI Circuits Digests, pp. 115-116, 1995
95. T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, PHI
96. Chapra, S.C.; Canale, R.P., Numerical Methods for Engineers, 3rd Ed., McGraw-Hill, 1998.
97. Rabaey, Digital Integrated Circuits Design, Pearson Education, Second Edition, 2003
*PORTS
G17 O *L 0.1
G3 I *S 0.1 0.1
G2 I *S 0.1 0.1
G1 I *S 0.1 0.1
G0 I *S 0.1 0.1
PREZ I *S 0.1 0.1
CLK I *S 0.1 0.1

*D_NET G2 0.1
*CONN
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END

*D_NET G1 0.1
*CONN
*I NO210_2:A I *L 0.1 *D NO210
*P G1 I *L 0.1
*CAP
0 G1 0.1
1 NO210_2:A 0.1
2 G1:0 0.1
*RES
0 G1 G1:0 0.1
1 NO210_2:A G1:0 0.1
*END

*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END

*D_NET G0 0.1
*CONN
*I IV110_0:A I *L 0.1 *D IV110
*P G0 I *L 0.1
*CAP
0 G0 0.1
1 IV110_0:A 0.1
2 G0:0 0.1
*RES
0 G0 G0:0 0.1
1 IV110_0:A G0:0 0.1
*END

*D_NET G3 0.1
*CONN
*I OR210_1:A I *L 0.1 *D OR210
*P G3 I *L 0.1
*CAP
0 G3 0.1
1 OR210_1:A 0.1
2 G3:0 0.1
*RES
0 G3 G3:0 0.1
1 OR210_1:A G3:0 0.1
*END

*CAP
0 PREZ 0.1
1 DTP10J_0:PREZ 0.1
2 DTP10J_1:PREZ 0.1
3 DTP10J_2:PREZ 0.1
4 PREZ:0 0.1
*RES
0 PREZ PREZ:0 0.1
1 DTP10J_0:PREZ PREZ:0 0.1
2 DTP10J_1:PREZ PREZ:0 0.1
3 DTP10J_2:PREZ PREZ:0 0.1
*END

*D_NET G5 0.1
*CONN
*I DTP10J_0:Q O *L 0.1 *D DTP10J
*I NO210_1:A I *L 0.1 *D NO210
*CAP
0 DTP10J_0:Q 0.1
1 NO210_1:A 0.1
2 G5:0 0.1
*RES
0 DTP10J_0:Q G5:0 0.1
1 NO210_1:A G5:0 0.1
*END

*RES
0 CLK CLK:0 0.1
1 DTP10J_0:CLK CLK:0 0.1
2 DTP10J_1:CLK CLK:0 0.1
3 DTP10J_2:CLK CLK:0 0.1
*END

*D_NET G6 0.1
*CONN
*I DTP10J_1:Q O *L 0.1 *D DTP10J
*I AN210_0:B I *L 0.1 *D AN210
*CAP
0 DTP10J_1:Q 0.1
1 AN210_0:B 0.1
2 G6:0 0.1
*RES
0 DTP10J_1:Q G6:0 0.1
1 AN210_0:B G6:0 0.1
*END

G10:0 0.1
G10:0 0.1
tech="voltage 1.2v"
"vdd 0 1.2 0.01"
"vss 0 0 0.01"
"invoke spice3 %input %output"
* spice options
.inc /user/kalpu/cloc/autochar/userware/spice_options noprint
* temperature = 25
.temp 25
.inc ../user_data/models_strong noprint
*.inc /db/pdk/1233c035a/current/models/current/tis/model.paths.strong noprint
.inc /user/kalpu/cloc/autochar/subckt/sr40/an210h noprint
PVDD 1.2
vvdd vdd 0 PVDD
RVDD VDD VDD_inv1 1000
RVSS VSS_inv1 0 1000
xinv1 A B Y VSS_inv1 vdd_inv1 an210h
*10 MHz
VA A 0 PULSE 0 PVDD 1n pslew pslew pslew 50n 100n
Vb B 0 PVDD
Pslew 0.01n
pload 50ff
CY Y 0 pload
.tran 0.01ns 250ns
.MEASURE TR AVGPWR AVG P(Vvdd) FROM=20ns TO=60ns
.punch tr V(Vdd_inv1 vss_inv1)
.punch tr I(VVDD)
.punch tr I(rvdd)
.punch tr V(A B Y)
*.punch tr I(rvdd rvss)
.end