Abstract: We propose a multiple supply voltage scaling algorithm for low power designs. The algorithm combines a greedy approach and an iterative improvement optimization approach. In phase I, it simultaneously scales down as many gates as possible to lower supply voltages. In phase II, a multiway partitioning algorithm is applied to further refine the supply voltage assignment of gates and reduce the total power consumption. During both phases, the timing correctness of the circuit is maintained, and level converters (LCs) are inserted or removed according to the local connectivity of gates driven by different supply voltages. Experimental results show that the proposed algorithm can effectively convert the unused slack of gates into power savings. We use two of the ISPD2001 benchmarks and all of the ISCAS89 benchmarks as test cases, with the TSMC 0.13-μm CMOS library. On average, the proposed algorithm reduces the power consumption of the original design by 42.5% with a 10.6% overhead in the number of LCs. Our study shows that the key factor in achieving power savings is including the most suitable supply voltage in the scaling process.
Index Terms: Algorithms, low power, multiple voltage assignment, partition, power optimization, voltage scaling.
I. INTRODUCTION
Scaling down the supply voltage of a gate increases its gate delay. To maintain timing correctness, only gates along noncritical paths are assigned to a lower supply voltage, converting their unused slack into power savings. Fig. 1 shows the average distribution of gates with different slacks for 16 MCNC91 benchmarks, as presented in [1]; the slack of each gate is normalized to the longest path delay. The figure shows that gates on critical paths (i.e., gates with zero or close-to-zero slack) account for only about 14% of the total number of gates, whereas gates with a normalized slack larger than 0.2 make up more than 60% of the total. Hence, there is plenty of room for power reduction by applying lower supply voltages to gates with large slack.
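For readers reproducing a distribution like Fig. 1, the following is a minimal sketch of how per-gate slack normalized to the longest path delay can be obtained with a standard forward/backward static timing pass. It assumes a combinational gate-level DAG with a fixed delay per gate (net delays folded into the gate delay); the function name `normalized_slacks` and its input format are hypothetical, and this is not the timing engine used in [1] or in this paper.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def normalized_slacks(delay, fanout):
    """Per-gate slack divided by the longest path delay.

    delay:  {gate: propagation delay}, one entry per gate in the DAG
    fanout: {gate: [gates it drives]}; gates with no fan-out may be omitted
    """
    fanin = defaultdict(list)
    for g, outs in fanout.items():
        for h in outs:
            fanin[h].append(g)
    # graphlib wants {node: predecessors}; static_order() yields fan-ins first.
    order = list(TopologicalSorter({g: fanin[g] for g in delay}).static_order())

    # Forward pass: arrival time at each gate output.
    arrival = {}
    for g in order:
        arrival[g] = max((arrival[p] for p in fanin[g]), default=0.0) + delay[g]
    longest = max(arrival.values())  # longest path delay (cycle time)

    # Backward pass: latest time each gate output may switch.
    required = {}
    for g in reversed(order):
        required[g] = min((required[s] - delay[s] for s in fanout.get(g, [])),
                          default=longest)
    return {g: (required[g] - arrival[g]) / longest for g in order}

# a -> c and b -> c; the path through a is critical.
# Expected values: a and c have slack 0.0, b has normalized slack 1/3.
print(normalized_slacks({"a": 2.0, "b": 1.0, "c": 1.0}, {"a": ["c"], "b": ["c"]}))
```

Gates with normalized slack near zero are the critical ones; a histogram of these values over all gates gives a plot of the kind shown in Fig. 1.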
However, in a voltage-scaled circuit, if a lower supply voltage gate (a VDDL gate) drives a higher supply voltage gate (a VDDH gate), a level converter (LC) must be inserted as a bridge between the two gates [2]. Otherwise, the output signal of the VDDL gate causes a static current to flow from VDD to VSS in the VDDH gate. An example is shown in Fig. 2, where a VDDH inverter is driven by a VDDL gate. Since the voltage of the inverter's input signal does not exceed VDDL even when the input is at the HIGH level, the pMOS in this inverter may not be cut off if $V_{DDH} - V_{DDL} > |V_{tp}|$, where $V_{tp}$ represents the threshold voltage of the pMOS. This causes a static current to flow from VDD to VSS through the pMOS and nMOS. Thus, an LC is needed between a VDDL gate and a VDDH gate to prevent static current. However, the LC itself consumes power, introduces a timing delay, and increases the chip area. No LC is required when a VDDH gate drives a VDDL gate. The number of LCs in a voltage-scaled
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 6, JUNE 2007
CHI et al.: GATE LEVEL MULTIPLE SUPPLY VOLTAGE ASSIGNMENT ALGORITHM FOR POWER OPTIMIZATION UNDER TIMING CONSTRAINT
(1)
$$P_i = \sum_{j=1}^{n_i} f\,\alpha_j\,P_{i,j} + \sum_{k=1}^{m_i} \frac{1}{2}\, f\,\alpha_k\,C_k\,V_i^2 \qquad (2)$$
where $n_i$ and $m_i$ are the numbers of input and output pins of gate $i$. $P_{i,j}$ is the power consumption of the $j$th input pin; its value may be extracted from the lookup table in the library according to the total capacitance of the fan-out load. $f$ represents the frequency of the circuit, and $\alpha_j$ and $\alpha_k$ represent the switching activity of the $j$th input pin and the $k$th output pin, respectively. $C_k$ is the loading capacitance on the $k$th output pin; its value is the sum of the capacitance of the fan-out net and the capacitances of the pins driven by the net, where the capacitance of each net is estimated by applying a wire load model. $V_i$ represents the supply voltage at gate $i$: if $i$ is a VDDH (VDDL) gate, then $V_i$ equals VDDH (VDDL). The power consumption of an LC is also calculated by (2), and the supply voltage of each LC is assigned as VDDH.
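The total power consumption of the circuit is calculated by
$$P_{\text{total}} = \sum_{i} P_i \qquad (3)$$
where the sum runs over all gates in the circuit, including LCs.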
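As an illustration of how (2) and (3) can be evaluated, the sketch below assumes that (2) sums an internal term $f\alpha_j P_{i,j}$ over input pins (treating the library value as energy per transition) and a switching term $\tfrac12 f\alpha_k C_k V_i^2$ over output pins. This exact form, including the ½ factor, is an assumption rather than a quotation of the paper's equation, and the `Gate` record and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    """Hypothetical per-gate record; an LC is a Gate with vdd = VDDH."""
    vdd: float                  # supply voltage V_i of the gate
    pin_power: list[float]      # per-input-pin internal energy from the library LUT
    in_activity: list[float]    # switching activity alpha_j of each input pin
    out_load: list[float]       # C_k: fan-out net cap + driven pin caps (F)
    out_activity: list[float]   # switching activity alpha_k of each output pin

def gate_power(g: Gate, freq: float) -> float:
    """Assumed form of (2): internal term plus 1/2*alpha*C*V^2 switching term."""
    internal = sum(a * p for a, p in zip(g.in_activity, g.pin_power))
    switching = sum(0.5 * a * c * g.vdd ** 2
                    for a, c in zip(g.out_activity, g.out_load))
    return freq * (internal + switching)

def total_power(gates: list[Gate], freq: float) -> float:
    """(3): sum of gate_power over all gates, LCs included."""
    return sum(gate_power(g, freq) for g in gates)

# Same load and activity, supply halved from 1.2 V to 0.6 V.
hi = Gate(1.2, [0.0], [1.0], [1e-12], [1.0])
lo = Gate(0.6, [0.0], [1.0], [1e-12], [1.0])
print(gate_power(lo, 1e9) / gate_power(hi, 1e9))  # about 0.25
```

Under this assumed form, halving the supply of a gate with unchanged load and activity cuts its switching term to one quarter, which is the effect voltage scaling exploits. The internal term is not voltage-scaled in this sketch; a real library would supply a separate table per supply voltage.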
Fig. 4. Example of the insertion and removal of an LC. (a) Original netlist of a four-terminal net. (b) When gate A is scaled down to a VDDL gate, an LC is inserted at the output of gate A. (c) Netlist after gate D is subsequently scaled down to a VDDL gate. (d) Netlist after gate A is scaled back up to a VDDH gate and the LC is removed.
the gates such that the total power consumption is reduced while
maintaining the same cycle time of the design.
In the voltage scaling process, the netlist must be adjusted by inserting or removing LCs according to the local connectivity of the voltage-scaled gates. An example with two voltage domains, VDDL and VDDH, is shown in Fig. 4. Fig. 4(a) shows the original connection of a four-terminal net. First, when the driving gate of the net is scaled down to a VDDL gate, an LC is inserted at its output, and the net is divided into two nets, as shown in Fig. 4(b). Next, one of the fan-out gates is also scaled down to a VDDL gate, and the netlist becomes Fig. 4(c). Finally, when the driving gate is scaled back up to a VDDH gate, the LC is removed and the netlist becomes Fig. 4(d). Because LCs are inserted into and removed from the netlist, the netlist changes dynamically, and this change is taken into account in the timing analysis. In the example of Fig. 4(b), the delay from an input pin of the driving gate to the input pin of a gate driven by the LC is the sum of the delays of the driving gate, the net between it and the LC, the LC itself, and the net between the LC and that gate.
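The LC bookkeeping of Fig. 4 can be sketched as a per-net rule: sinks whose supply is not higher than the driver's are connected directly, and all higher-voltage sinks are fed through one LC placed at the driver output. The single shared LC per net, the helper name `lc_split`, and the sink names B, C, and D (chosen to mimic a four-terminal net driven by gate A) are assumptions for illustration only.

```python
VDDH, VDDL = 1.2, 0.6  # the dual supplies used in the experiments

def lc_split(driver_vdd, sinks):
    """Split the sinks of one net into directly driven and LC-driven groups.

    sinks maps a sink gate name to its supply voltage. A sink with a higher
    supply than the driver needs a level-converted input; all such sinks are
    assumed to share one LC at the driver output. Returns (direct, via_lc).
    """
    direct = sorted(s for s, v in sinks.items() if v <= driver_vdd)
    via_lc = sorted(s for s, v in sinks.items() if v > driver_vdd)
    return direct, via_lc

# Replay of the Fig. 4 sequence with hypothetical sink names.
sinks = {"B": VDDH, "C": VDDH, "D": VDDH}
print("(a)", lc_split(VDDH, sinks))   # (['B', 'C', 'D'], []): no LC
print("(b)", lc_split(VDDL, sinks))   # ([], ['B', 'C', 'D']): LC inserted
sinks["D"] = VDDL
print("(c)", lc_split(VDDL, sinks))   # (['D'], ['B', 'C'])
print("(d)", lc_split(VDDH, sinks))   # (['B', 'C', 'D'], []): LC removed
```

An LC exists on the net exactly when `via_lc` is non-empty, so re-running the rule after every move is enough to decide whether an LC must be inserted or removed.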
A. Algorithm Overview
At the beginning, we apply the timing analysis procedure to calculate the cycle time of the circuit and the slack of each gate. This cycle time is used as the timing constraint of the design in order to maintain the timing correctness of the circuit. The algorithm then proceeds in two phases. In phase I, we apply a greedy approach that scales down the supply voltages of as many gates as possible. Because all gates along the timing-loose paths are scaled down simultaneously, fewer LCs are needed on these paths. The gates assigned to VDDL in phase I are fixed at the lowest voltage, and the remaining gates are referred to as scalable gates. Only the supply voltages of scalable gates are reassigned in phase II. In phase II, we use the multiway partitioning technique to perform the voltage assignment, treating different voltage domains as different partitions. We refer to a voltage assignment as a move. Each scalable gate is moved to the voltage domain with the maximum power gain. The iterative optimization process is executed until the total power of the circuit can no longer be reduced. During
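$$G(i, V) = \text{Total Power}_{\text{before}} - \text{Total Power}_{\text{after}} \qquad (4)$$
where $\text{Total Power}_{\text{before}}$ and $\text{Total Power}_{\text{after}}$ are the total power of the circuit before and after gate $i$ is moved to voltage domain $V$. If the move requires the insertion or removal of a level converter, then the power associated with that level converter is added to or subtracted from the Total Power.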
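During the scaling process, only the power consumption of the scaled gate and of its fan-in gates (together with any LC that is inserted or removed) is affected. Thus, the power gain of a gate $i$ may be simplified as
$$G(i, V) = \sum_{j \in S(i)} P_j - \sum_{j \in S(i)} P'_j \qquad (5)$$
where $S(i)$ denotes gate $i$ together with its fan-in gates, and $P_j$ and $P'_j$ represent the power values of gate $j$ before and after gate $i$ is scaled, respectively; the power of an LC that is removed (inserted) is counted only before (after) the move. The values of $P_j$ and $P'_j$ may be calculated by (2). By applying (5), the power gain of each gate may be updated incrementally. An example is shown in Fig. 7, in which three gates are VDDH gates and one is a VDDL gate; the caption of Fig. 7 gives the resulting power gain when gate B is scaled to VDDL.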
According to the definition of power gain, each gate has one power gain for each of the $T$ voltage domains, denoted as $G(i, V_t)$ for $t = 1, \ldots, T$. We then define the maximum gain value of gate $i$, denoted as $G_{\max}(i)$, as the maximum value among $G(i, V_t)$ for $t = 1, \ldots, T$:
$$G_{\max}(i) = \max_{1 \le t \le T} G(i, V_t) \qquad (6)$$
After a gate is scaled, the $G_{\max}$ values of the affected gates are updated.
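The incremental gain of (5) and the maximum gain of (6) reduce to two small helpers, sketched below under the assumption that the caller has already evaluated, with (2), the power of every gate and LC affected by a move, both before and after it. The function names are hypothetical, and the numbers in the usage lines are made-up illustrative values, not data from the paper.

```python
def power_gain(before, after):
    """Gain of one move in the spirit of (5).

    before/after map each affected element (the scaled gate, its fan-in
    gates, and any LC) to its power before/after the move. A removed LC
    appears only in `before`; an inserted LC appears only in `after`.
    """
    return sum(before.values()) - sum(after.values())

def max_gain(gains):
    """(6): given {voltage: gain} for one gate, return (best_voltage, G_max)."""
    best_v = max(gains, key=gains.get)
    return best_v, gains[best_v]

# Illustrative values only: moving B to VDDL removes an LC and lowers B's power.
g_low = power_gain({"B": 4.0, "D": 2.0, "LC": 1.0}, {"B": 1.0, "D": 1.5})
print(g_low)                                         # 4.5
print(max_gain({1.2: 0.0, 0.9: 2.5, 0.6: g_low}))    # (0.6, 4.5)
```

Because only the affected set is summed, the cost of updating a gain is proportional to the local fan-in size rather than to the circuit size.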
2) Proposed Algorithm: First, we mark every scalable gate as unlocked and calculate the power gains of each gate. Then we select the unlocked gate with the largest $G_{\max}$ and move it to its best voltage domain, inserting or removing level converters according to the local connectivity. The moved gate is then locked, and the slacks of all affected gates are recalculated. When a gate is scaled, the arrival times of the gates on the downstream paths of its fan-in gates and the required times of the gates on its upstream paths change, so the slacks of these gates are updated. If any slack becomes negative, the move causes a timing violation and is rejected. If the move is accepted, we update the power gains of all connected gates.
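The move loop just described can be sketched as follows. This is a minimal, hedged rendering rather than the authors' implementation: the callbacks `best_move`, `assign`, and `timing_ok` are hypothetical hooks standing in for the gain computation of (6), the netlist update with LC insertion/removal, and the incremental slack check. Gains are re-queried on every iteration to mimic the updating of gains of connected gates.

```python
def move_pass(gates, best_move, assign, timing_ok):
    """One pass of lock-based voltage moves (sketch).

    best_move(g) -> (voltage, gain): best domain for g per (6), computed on
                    the current netlist so neighbour gains stay up to date
    assign(g, v) -> previous voltage; also inserts/removes LCs (hypothetical)
    timing_ok()  -> False if any affected gate now has negative slack
    Returns accepted moves as (gate, voltage, gain) in the order made.
    """
    unlocked = set(gates)
    accepted = []
    while unlocked:
        # Unlocked gate with the largest maximum gain G_max.
        g = max(unlocked, key=lambda x: best_move(x)[1])
        v, gain = best_move(g)
        unlocked.discard(g)          # lock the gate once it has been tried
        old = assign(g, v)
        if timing_ok():
            accepted.append((g, v, gain))
        else:
            assign(g, old)           # timing violation: reject and restore
    return accepted
```

Each gate is tried at most once per pass, so a pass terminates after as many iterations as there are scalable gates. Moves with negative gain are still accepted when timing allows, which is why the running total of gains can fluctuate.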
We use the same example as shown in Fig. 7. If gate B has the maximum gain among all unlocked gates, we select gate B and scale it to a VDDL gate, and a level converter is inserted between the corresponding pair of gates. If the slacks of all affected gates are positive or zero, the move is accepted, and the power gains of the affected gates are updated. In this process, the output load of one gate changes: before scaling, its load is the capacitance of a two-terminal net plus the input capacitance of one gate, and after scaling, it is the capacitance of a two-terminal net plus the input capacitance of a different gate. Therefore, both the delay and the power consumption of that gate must be updated. The
sum. We calculate the partial sum of the power gains after each move. Because a power gain may be negative, the partial sum fluctuates. During the gate-move process, we keep the largest partial sum of power gain and record the corresponding state. When an iteration terminates, the state with the largest partial sum is identified.
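Selecting the state with the largest partial sum resembles the prefix search used in Fiduccia–Mattheyses partitioning. A sketch, assuming the per-move gains of one iteration are recorded in order and that keeping zero moves (partial sum 0) is an allowed outcome; the helper name `best_prefix` is hypothetical.

```python
from itertools import accumulate

def best_prefix(gains):
    """Return (k, best_sum): keeping the first k moves of an iteration gives
    the largest partial sum of power gain. k == 0 means keep no move."""
    best_k, best_sum = 0, 0.0
    for k, s in enumerate(accumulate(gains), start=1):
        if s > best_sum:          # strict: prefer the shorter prefix on ties
            best_k, best_sum = k, s
    return best_k, best_sum

print(best_prefix([3.0, -1.0, 2.0, -5.0, 1.0]))  # (3, 4.0)
```

Moves after position k are then undone, so a negative-gain move survives only when later moves within the kept prefix more than compensate for it.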
Fig. 7. Example of the calculation of power gain when gate B is scaled. The power consumption of gate A stays the same when gate B is scaled, so its power values before and after the move are equal. Due to the removal of the LC, the loading capacitance of gate D changes, which changes the power consumption of gate D. Therefore, the power gain G(B, VDDL) of gate B equals the sum of three power values before the move minus the sum of the corresponding three power values after the move.
comparison. The experimental platform is a SUN Blade 1000 running Solaris 9 with 8 GB of RAM. Two ISPD2001 circuit benchmarks, Mac1 and Mac2 [16], and the ISCAS89 benchmarks [17] are used as test cases. The TSMC 0.13-μm CMOS library is used in our experiments. To reduce the complexity of the problem, we use only one type of LC, which converts an input signal within the range (0.6 V, 1.2 V) to a 1.2-V output signal. As in previous research [1], [7], we assume that the switching activity of all nets is constant. Because this value does not affect the results of the voltage scaling, we assign 1 to the switching activity to reduce
TABLE I
COMPARISON OF THE RESULTS OF OUR ALGORITHM AFTER PHASE I AND PHASE II
sively scales gates from high voltage to low voltage such that the slacks of gates are decreased. We find that, originally, the slacks are heavily clustered between 7 and 11 ns. After phase I, the slacks shift to the interval [4 ns, 9 ns], so they are generally reduced. After phase II, 25% of the slacks are less than 1 ns, and about 1100 gates have slacks close to zero. The cycle time of the design is 12.2 ns. This shows that the algorithm utilizes the slack of each gate to scale down gate voltages. However, some gates still have large slacks after phase II, because these gates lie on paths whose delays are much shorter than the cycle time. Even though all gates along these paths have been scaled to the low supply voltage, their slacks remain much larger than those of other gates. This implies that the supply voltages of these gates could be scaled down further.
Next, we compared the results of the proposed algorithm with dual supply voltages against those of GECVS [7]. The two voltage domains used in this experiment are 1.2 and 0.6 V. We implemented GECVS without backoff on the timing of the circuit; thus, the total power of GECVS might be reduced further if a backoff delay were allowed. The experimental results are listed in Table II. Column 2 gives the number of gates, and column 3 gives the original power of each circuit. Columns 4 through 7 show the results of our algorithm: column 4 is the number of LCs as a percentage of the total number of gates (including LCs), column 5 is the total LC power as a percentage of the total circuit power, column 6 is the power saved relative to the original power, and column 7 is the CPU time. The corresponding results of GECVS are shown in columns 8 through 11.
On average, our algorithm reduces total power consumption by 37.2% with an 11.9% overhead in the total number of LCs, whereas GECVS reduces total power consumption by only 29.7% with a 16.9% overhead. An interesting effect is observed in test case s1488: as shown in Table II, our algorithm uses more LCs but consumes less LC power than GECVS. This is because the power consumption of each LC depends on its output load.
Fig. 9. Slack distribution of s13207 scaled by the proposed algorithm using dual supply voltages (1.2 and 0.6 V). (a) Before scaling. (b) After phase I. (c) After phase II.
Consequently, two LCs do not necessarily consume the same power. We calculated the total loading capacitance of all LCs in s1488: it is 2.047 pF with our algorithm and 3.259 pF with GECVS. Therefore, a netlist with more LCs can consume less LC power.
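A quick worked check of why more LCs can still mean less LC power: if, as an assumption, the internal-power part of (2) is neglected and every LC runs at VDDH with the constant switching activity used in these experiments, the LC switching power is proportional to the total LC load, so the s1488 loads give

```latex
\frac{P_{\mathrm{LC}}^{\mathrm{ours}}}{P_{\mathrm{LC}}^{\mathrm{GECVS}}}
  \approx \frac{\tfrac{1}{2} f \alpha V_{DDH}^{2} \, C_{\mathrm{LC}}^{\mathrm{ours}}}
               {\tfrac{1}{2} f \alpha V_{DDH}^{2} \, C_{\mathrm{LC}}^{\mathrm{GECVS}}}
  = \frac{2.047\ \mathrm{pF}}{3.259\ \mathrm{pF}} \approx 0.628
```

that is, roughly 37% less LC switching power under these assumptions, even though our algorithm places more LCs in s1488.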
Table III compares the improvements our algorithm provides over GECVS in the number of level converters, power consumption, and CPU time. Column 5 gives the number of scalable gates in phase II as a percentage of the total number of gates. The table shows that our algorithm saves more power and uses fewer level converters. On average, our algorithm reduces the number of level converters by 34.2% and the power consumption by 11.0%, while using 27.0% less CPU time. The same table shows that, on average, the percentage of scalable gates in phase II of our algorithm is only 73.8% of that of GECVS. Thus, the problem size of our algorithm is much smaller than that of GECVS, so even though our algorithm is an iterative optimization process, it still uses less CPU time.
We have also explored the impact of using multiple voltage domains on power savings by comparing the results of using two and four voltage domains in the scaling process. The results of the proposed algorithm with four voltage domains are listed in Table IV. Column 2 shows the number of LCs as a percentage of the total number of gates (including LCs), column 3 shows the total LC power as a percentage of the total power, column 4 shows the power saved relative to the original power (listed in column 3 of Table II), and column 5 shows the CPU time. Comparing Table IV with Table II, the average power saving with four voltage domains is 5.3 percentage points higher than with two voltage domains. Fig. 10 compares the power savings of three programs: GECVS, our algorithm with dual voltage domains, and our algorithm with four voltage domains.
These experiments show that using more voltage domains can save more power. For Mac1, using four voltage domains in our algorithm saves 12.1% more power than using dual supply voltages. Therefore, we use Mac1 as the test case to further study how the number of voltage domains affects power savings. The experimental results are shown in Table V, where column 1 gives the number of voltage domains, column 2 the original power of the circuit, columns 3 and 4 the power saved after phase I and phase II, respectively, and column 5 the voltage domains used in each experiment.
As shown in Table V, when dual supply voltages (1.2 and 0.6 V) are used, the power saving is 40.3%. When the supply voltage 0.8 V is added to the voltage domains, the power savings in rows 3 and 4 are the same. When the supply voltage 0.7 V is used, the power savings in rows 5, 6, and 7 are very close. This suggests that 0.6 V is not a good choice for the lower bound of the voltage domains. Therefore, we use the same test case to study the impact of different lower bounds of the voltage domains on power savings. The experimental results are shown in Table VI.
Table VI shows that, after phase II, the largest power saving is obtained when 1.2 and 0.7 V are used. Tables V and VI also show comparable power savings whenever 0.7 V is included in the voltage domains. Therefore, including the most suitable supply voltage is the main factor in determining the total power savings. The most suitable supply voltage of a circuit is determined by the timing tightness of the paths in that circuit.
Although using more voltage domains saves more power, it requires more voltage islands, and the designer must make extra efforts to accommodate them. Therefore, for the proposed algorithm, if the most suitable supply voltage for each case is known, using dual supply voltages is a good choice when both power and design effort are considered.
In the previous experiments, the switching activity of each net was set to 1. In this experiment, we assign different switching activity values to different nets. Reference [18] described a patternless method to estimate the switching activity of simple
TABLE II
EXPERIMENTAL RESULTS OF OUR ALGORITHM AND THE GECVS [7] ALGORITHM. THE TWO SUPPLY VOLTAGES USED ARE 1.2 AND 0.6 V
TABLE III
COMPARISON OF THE IMPROVEMENTS OUR ALGORITHM
PROVIDES COMPARED TO GECVS [7] IN THE NUMBER OF
LEVEL CONVERTERS, POWER CONSUMPTION, AND CPU TIME
Fig. 10. Power saving of the three programs, GECVS, our algorithm with dual
supply voltages, and our algorithm with four supply voltages.
TABLE V
COMPARISON OF THE RESULTS OF USING DIFFERENT NUMBERS OF
VOLTAGE DOMAINS ON MAC1
TABLE IV
RESULTS OF OUR ALGORITHM USING FOUR VOLTAGE DOMAINS
TABLE VI
RESULTS OF OUR ALGORITHM USING DIFFERENT LOWER
BOUNDS OF VOLTAGE DOMAINS ON MAC1
TABLE VII
EXPERIMENTAL RESULTS OF THE PROPOSED ALGORITHM WITH ESTIMATED SWITCHING ACTIVITY
domains are used, more voltage islands are needed and designers are burdened with the extra voltage islands in their designs. Thus, using dual voltage domains is a good choice both for saving power and for limiting design effort. We also studied the impact of considering switching activity on total power consumption; the results show that the algorithm reduces total power consumption by 39.5% compared with the original circuit. Low power design is always an important issue for VLSI designs. By applying lower supply voltages to non-timing-critical gates, the total power consumption can be greatly reduced.
REFERENCES
[1] C. Chen, A. Srivastava, and M. Sarrafzadeh, "On gate level power optimization using dual-supply voltages," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 5, pp. 616–629, Oct. 2001.
[2] K. Usami and M. Horowitz, "Clustered voltage scaling technique for low-power design," in Proc. Int. Symp. Low Power Design, 1995, pp. 3–8.
[3] K. Usami, T. Ishikawa, M. Kanazawa, and H. Kotani, "Low-power design technique for ASICs by partially reducing supply voltage," in Proc. 9th Annu. IEEE Int. ASIC Conf., 1996, pp. 301–304.
[4] K. Usami, M. Igarashi, F. Minami, M. Ishikawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol. 33, no. 3, pp. 463–472, 1998.
[5] M. Igarashi, "A low-power design method using multiple supply voltages," in Proc. Int. Symp. Low Power Design, 1997, pp. 36–41.
[6] Y. J. Yeh and S. Y. Kuo, "An optimization-based low-power voltage scaling technique using multiple supply voltages," in Proc. IEEE Int. Symp. Circuits Syst., 2001, pp. 535–538.
[7] S. N. Kulkarni, A. N. Srivastava, and D. Sylvester, "A new algorithm for improved VDD assignment in low power dual VDD systems," in Proc. Int. Symp. Low Power Design, 2004, pp. 200–205.
[8] D. Kang, M. C. Johnson, and K. Roy, "Multiple-Vdd scheduling/allocation for partitioned floorplan," in Proc. 21st Int. Conf. Comput. Design, 2003, pp. 412–418.
[9] D. Kang, M. C. Johnson, and K. Roy, "Simultaneous multiple-Vdd scheduling and allocation for partitioned floorplan," in Proc. 5th Int. Symp. Quality Electron. Design, 2004, pp. 98–103.
[10] A. Manzak and C. Chakrabarti, "A low power scheduling scheme with resources operating at multiple voltages," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 1, pp. 6–14, Feb. 2002.
[11] S. P. Mohanty and N. Ranganathan, "Simultaneous peak and average power minimization during datapath scheduling," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 52, no. 6, pp. 1157–1165, Jun. 2005.
[12] A. Srivastava and D. Sylvester, "Minimizing total power by simultaneously Vdd/Vth assignment," in Proc. Asia South Pacific Design Autom. Conf., 2003, pp. 400–403.
Hung Hsie Lee received the B.S. degree in information and computer engineering from Chung Yuan
Christian University, Taiwan, R.O.C., in 2006.
Currently, he is an Engineer in the SoC Technology Center, Industrial Technology Research
Institute, Hsinchu, Taiwan, R.O.C. His research interests include design automation for low-power VLSI IC design.
Sung Han Tsai received the B.S. degree in information and computer engineering from Chung Yuan
Christian University, Taiwan, R.O.C., in 2006.
Currently, he is an Engineer with Springsoft Inc.,
Hsinchu, Taiwan, R.O.C. His research interests include the general area of VLSI CAD.