An Efficient Approach to Low-leakage Power VLSI Design
using Variable Body Biasing

A thesis Presented
By
Md. Asif Jahangir Chowdhury
Student Id. 0606047
&
Md. Shahriar Rizwan
Student Id. 0606072

In partial fulfillment of the
Requirements for the B.Sc in
Electrical and Electronics Engineering




Department of Electrical and Electronics Engineering,
BUET, Bangladesh
March 2012




Bangladesh University of Engineering
and Technology

CERTIFICATE
This is to certify that the thesis report entitled “An Efficient Approach to Low-
leakage Power VLSI Design using Variable Body Biasing” submitted by Md. Asif
Jahangir Chowdhury, (student id. 0606047), and Md. Shahriar Rizwan, (student id.
0606072) in partial fulfillment of the requirements for the award of B.Sc degree in
Department of Electrical and Electronics Engineering in Bangladesh University of
Engineering and Technology is an authentic work under my supervision and guidance.
To the best of my knowledge, the matter embodied in the thesis has not been submitted
to any other University / Institute for the award of any Degree or Diploma.

Approved by:
Dr. Md. Shafiqul Islam
Professor
Department of Electrical And Electronics Engineering
BUET.









Dedicated
To Our Parents








ACKNOWLEDGEMENTS

We would like to express our sincere gratitude and appreciation to everyone who made
this thesis possible. Most of all, we would like to thank our advisor, Professor Dr. Md.
Shafiqul Islam for giving us the opportunity to work under him and lending every
support at every stage of this thesis. We are deeply indebted to his esteemed guidance,
constant encouragement and fruitful suggestions from the beginning to the end of this
thesis. His trust and support inspired us in the most important moments of making right
decisions and we are delighted to work under his supervision.
We would also like to express our gratitude to our beloved parents who inspired us in
each and every step of our lives.

THE AUTHORS






TABLE OF CONTENTS


DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS OR ABBREVIATIONS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 CONTRIBUTIONS
1.3 THESIS ORGANIZATION
CHAPTER 2: MOTIVATION
CHAPTER 3: NOTATION AND BACKGROUND
3.1 LEAKAGE POWER
3.2 SRAM CELL LEAKAGE PATHS
3.3 SWITCHING POWER AND DELAY TRADEOFFS
3.4 CIRCUIT PERFORMANCE ESTIMATION
CHAPTER 4: PREVIOUS WORKS
4.1 STATIC POWER REDUCTION VLSI RESEARCH
4.1.1 Static power reduction research for generic logic circuits
4.1.1.1 Sleep transistor
4.1.1.2 Forced Stack
4.1.1.3 Sleepy Stack
4.1.1.4 Sleepy keeper
4.1.1.5 Dual Sleep
4.1.1.6 Dual Stack
4.1.2 Static power reduction research for SRAM
4.1.2.1 Sleep transistor
4.1.2.2 Dual Sleep
4.1.2.3 Dual Stack
4.1.2.4 Sleepy Keeper in SRAM
CHAPTER 5: VARIABLE BODY BIASING TECHNIQUE
5.1 VARIABLE BODY BIASING APPROACH
5.2 VARIABLE BODY BIASING STRUCTURE
5.3 VARIABLE BODY BIASING OPERATION
5.4 ANALYSIS OF SUBTHRESHOLD LEAKAGE REDUCTION
5.5 ESTIMATION OF DELAY FOR VARIABLE BODY BIASING TECHNIQUE
CHAPTER 6: EXPERIMENTAL RESULTS
6.1 EXPERIMENTAL RESULTS FOR GENERAL LOGIC CIRCUITS
6.1.1 Experimental results for CO4
6.1.2 Experimental results for FA
6.2 EXPERIMENTAL RESULTS FOR SRAM
6.3 COMPARISON WITH PREVIOUS METHODS
CHAPTER 7: CONCLUSION
7.1 CONCLUSION
7.2 SUGGESTIONS FOR FUTURE WORK
APPENDIX
A. AREA ESTIMATION
B. CIRCUIT DIAGRAMS
BIBLIOGRAPHY



LIST OF TABLES


Table 1 Power and area results from [15]
Table 2 Energy consumption scenario of a cell phone (0.07µ) from [15]
Table 3 Leakage model parameters (0.5µ tech)
Table 4 Chosen technology and Vdd value
Table 5 Static power data for chain of 4 inverters (nanowatt)
Table 6 Dynamic power data for chain of 4 inverters (microwatt)
Table 7 Propagation delay data for chain of 4 inverters (picoseconds)
Table 8 Power delay product data for chain of 4 inverters (femtojoule)
Table 9 Area delay data for chain of 4 inverters (µm²)
Table 10 Static power data for 1 bit full adder (nanowatt)
Table 11 Dynamic power data for 1 bit full adder (microwatt)
Table 12 Data of propagation delay for 1 bit full adder (nanosecond)
Table 13 Power delay product data for 1 bit full adder (femtojoule)
Table 14 Area data for 1 bit full adder (µm²)
Table 15 Static power data for SRAM (nanowatt)
Table 16 Dynamic power data for SRAM (microwatt)
Table 17 Data of propagation delay for SRAM (nanosecond)
Table 18 Power delay product data for SRAM (femtojoule)
Table 19 Area data for SRAM (µm²)
Table 20 Comparison of VBB Approach for a Chain of Four Inverters (for 90 nm process)
Table 21 Comparison of VBB Approach for a 1 bit full adder (for 90 nm process)
Table 22 Comparison of VBB Approach for a SRAM (for 90 nm process)



LIST OF FIGURES

Figure 1 Sub-threshold leakage of an nFET
Figure 2 (a) A single transistor (left) and (b) stacked transistors (right)
Figure 3 SRAM cell leakage paths
Figure 4 Logical efforts of basic logic gates
Figure 5 Sleep transistor
Figure 6 “Forced Stack”
Figure 7 Sleepy stack
Figure 8 Sleepy keeper
Figure 9 Dual Sleep
Figure 10 Dual Stack
Figure 11 SLEEP TRANSISTOR IN SRAM
Figure 12 “DUAL SLEEP” IN SRAM
Figure 13 “DUAL STACK” IN SRAM
Figure 14 “SLEEPY KEEPER” IN SRAM
Figure 15 An Inverter with (a) Sleepy Keeper (left) (b) Variable body biasing structure (right)
Figure 16 (a) Sleep transistor without body biasing transistor
Figure 17 (a) Inverter with VBB Technique (left) (b) Inverter of equal strength (right)
Figure 18 Static power consumption (CO4)
Figure 19 Dynamic power consumption (CO4)
Figure 20 Propagation delay (CO4)
Figure 21 Power delay product
Figure 22 Area comparison (CO4)
Figure 23 Static power consumption for FA
Figure 24 Dynamic power consumption for FA
Figure 25 Propagation delay comparison in FA
Figure 26 Power Delay Product for FA
Figure 27 Area Comparison for FA
Figure 28 Static power consumption for SRAM cell
Figure 29 Dynamic power consumption for SRAM
Figure 30 Propagation delay comparison for SRAM
Figure 31 Power Delay Product of SRAM
Figure 32 Area comparison for SRAM
Figure 33 SLEEP TRANSISTOR
Figure 34 “FORCED STACK” METHOD
Figure 35 “SLEEPY KEEPER” METHOD
Figure 36 “DUAL SLEEP” METHOD
Figure 37 “DUAL STACK” METHOD
Figure 38 “VARIABLE BODY BIASING TECHNIQUE”
Figure 39 “SLEEPY KEEPER” (FA)
Figure 40 “DUAL SLEEP” (FA)
Figure 41 “DUAL STACK” (FA)
Figure 42 “VBB” (FA)

LIST OF SYMBOLS OR ABBREVIATIONS

6-T 6 Transistors.
CMOS Complementary Metal Oxide Semiconductor.
CO4 Chain of 4 Inverters.
DIBL Drain Induced Barrier Lowering.
FBB Forward-Body Bias.
FA Full Adder
ITRS International Technology Roadmap for Semiconductors.
MTCMOS Multi-Threshold-voltage CMOS.
RBB Reverse-Body Bias.
SRAM Static Random Access Memory.
VLSI Very Large Scale Integration
ZBB Zero Body Bias
VBB Variable Body Bias




ABSTRACT

The ubiquitous era of emerging portable devices demands long battery lifetime as a
primary design goal. Subthreshold circuit design can reduce energy per cycle by an
order of magnitude relative to circuits operating at nominal voltage by scaling the
power supply voltage (Vdd) below the device threshold voltage. However, it
significantly lowers circuit performance as a penalty. The stringent energy budgets and
moderate speed requirements of ultra-low-power systems in the market may not be best
satisfied just by scaling a single supply voltage. Optimized circuits with dual supply
voltages provide an opportunity to meet these demands.
The primary focus of this thesis is to provide more efficient low-power solutions for
Very Large Scale Integration (VLSI) designers. In particular, we concentrate on leakage
power reduction. Although leakage power was negligible at 0.18µ technology and
above, in nano-scale technology, such as 0.07µ, leakage power is almost equal to
dynamic power consumption.
In this thesis we present a new CMOS circuit design technique called “VARIABLE
BODY BIASING”. This structure dramatically reduces leakage. It tries to combine the
good features of the sleep transistor technique, sleepy keeper technique, dual sleep
technique and dual stack technique. The sleep transistor technique can achieve ultra-
low leakage power consumption, but loses logic state during sleep mode. The sleepy
keeper, dual sleep and dual stack techniques can retain state, but their static power
consumption is not satisfactory. To obtain satisfactory leakage power, dual-Vth is a
must for these techniques. Body biasing allows us to control the threshold voltage: if a
reverse-bias voltage difference is created between the body and source of a MOSFET,
the threshold voltage increases. As the threshold voltage increases, leakage current
decreases, resulting in a decrease in static (leakage) power. However, if high-Vth MOS
transistors are used in the circuit (as in sleepy keeper, dual sleep and dual stack), delay
increases. So if the Vth of the sleep transistors can be kept low during active mode and
made high in sleep mode, both leakage power and delay can be kept within desired
limits. Our design provides this arrangement.
The sleep transistor technique uses two sleep transistors, one PMOS and one NMOS. In
our design we have added another two transistors, one PMOS and one NMOS, in such a
way that the drain of each added transistor is connected to the body of the corresponding
sleep transistor, while the gates and sources are connected in parallel. With this
arrangement we also added two MOS transistors, as in the sleepy keeper technique, to
save state. When the circuit is in active mode, there is no voltage difference between the
body and source of the sleep transistors, as the added transistors are on and offer no
resistance between the body and source (that is, body and source are shorted). When the
circuit is in sleep mode, there is a high resistance between body and source, resulting in
a higher Vth. Consequently, the leakage power consumed is reduced. Moreover, due to
the low Vth during active mode, delay remains in a reasonable range.
One of the advantages of the “Variable Body Biasing” technique is that it saves state.
Therefore, the “Variable Body Biasing” technique is applicable to memory design,
e.g., Static Random Access Memory (SRAM). When we apply the “Variable Body
Biasing” technique to SRAM cell design, we can observe new Pareto points which have
not been presented prior to the research in this thesis. Although the “Variable Body
Biasing” technique incurs some area overhead, the “Variable Body Biasing” SRAM cell
can achieve ultra-low leakage power consumption while suppressing two main leakage
paths in an SRAM cell. When compared to a high-Vth SRAM cell, which is the best
prior state-saving SRAM cell.






CHAPTER 1

INTRODUCTION

Ultra-low power applications such as micro-sensor networks, pacemakers, and many
portable devices operate under extreme energy constraints to achieve long battery
lifetime. Subthreshold operation presents an opportunity for such energy-constrained
applications with its very low energy consumption [1-6]. Subthreshold circuits offer a
promising solution for implementing highly energy-constrained systems in clock ranges
of low to medium frequencies for remote or mobile applications.
As the power supply voltage (Vdd) is scaled below the device threshold voltage (Vth),
the subthreshold current charges and discharges nodes only very slowly to perform the
circuit's logic function [4]. This weak driving current inherently limits performance, but
minimum energy operation of the circuit is achieved with reduced dynamic and leakage
power, resulting in long battery life [7-9].
In the past decades, subthreshold circuit design was not well recognized in the area of
digital circuits as high performance demand was a major concern. Lately, however,
portability has become a trend in the electronics market place. Low energy per
operation is a primary design parameter in such applications. Without the performance
requirement, a subthreshold circuit can operate at its minimum energy operating point
that is only slightly above the absolute minimum voltage [10] that would guarantee the
correct logic function. Even for applications requiring high peak performance, ultra-
dynamic voltage scaling (UDVS) [11] can provide an opportunity for subthreshold
circuit design that would switch between a nominal voltage high performance mode
and an energy efficient subthreshold mode according to the system work load.
Even before the mobile era, power consumption was a fundamental problem. To solve
the power dissipation problem, many researchers have proposed different ideas from
the device level to the architectural level and above. However, there is no universal way
to avoid tradeoffs between power, delay and area, and thus designers are required to
choose appropriate techniques that satisfy application and product needs. Power
consumption of CMOS consists of dynamic and static components. Dynamic power is
consumed when transistors are switching, and static power is consumed regardless of
transistor switching. Dynamic power consumption was previously (at 0.18µ
technology and above) the single largest concern for low-power chip designers since
dynamic power accounted for 90% or more of the total chip power. Therefore, many
previously proposed techniques, such as voltage and frequency scaling, focused on
dynamic power reduction. However, as the feature size shrinks, e.g., to 0.09µ and
0.065µ, static power has become a great challenge for current and future technologies.
Based on the International Technology Roadmap for Semiconductors (ITRS) [12], Kim
et al. report that subthreshold leakage power dissipation of a chip may exceed dynamic
power dissipation at the 65nm feature size [13]. One of the main reasons for the
increase in leakage power is the increase in subthreshold leakage power. When
technology feature size scales down, supply voltage and threshold voltage also scale
down. Subthreshold leakage power increases exponentially as threshold voltage
decreases. Furthermore, the structure of the short channel device lowers the threshold
voltage even further. In addition to subthreshold leakage, another contributor to leakage
power is gate-oxide leakage power due to the tunneling current through the gate-oxide
insulator. Since gate-oxide thickness will be reduced as the technology feature size
decreases, in nano-scale technology, gate-oxide leakage power may be comparable to
subthreshold leakage power if not handled properly. However, we assume other
techniques will address gate-oxide leakage; for example, high-k dielectric gate
insulators may provide a solution to reduce gate leakage [13]. Therefore, this thesis
focuses on reducing subthreshold leakage power consumption.
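
As a rough illustration of the exponential dependence described above, the following Python sketch uses the standard first-order model in which the off-state subthreshold current scales as exp(-Vth / (n·vT)), where vT is the thermal voltage. The function names and the slope factor n = 1.5 are illustrative assumptions and are not values taken from this thesis.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage vT = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(vth_drop, n=1.5, temp_kelvin=300.0):
    """Factor by which off-state subthreshold current grows when the
    threshold voltage is lowered by vth_drop volts.

    First-order model: I_off is proportional to exp(-Vth / (n * vT)).
    The slope factor n is an assumed, technology-dependent value.
    """
    return math.exp(vth_drop / (n * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    for drop_mv in (50, 100, 200):
        ratio = leakage_ratio(drop_mv / 1000.0)
        print(f"Vth lowered by {drop_mv} mV -> leakage x{ratio:.1f}")
```

Under these assumptions, lowering the threshold voltage by 100 mV raises the subthreshold leakage by roughly an order of magnitude, which is why threshold scaling makes leakage grow so quickly.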
Quite a few static power reduction methods exist at present. Most of these
try to strike a balance between power and delay by implementing different
techniques. One of the most effective dynamic power reduction techniques is lowering
the supply voltage of CMOS transistors, because the dynamic power consumption of
CMOS transistors is proportional to the square of the supply voltage. However,
lowering the supply voltage incurs an increase in transistor switching delays. Therefore,
designing CMOS circuits typically necessitates tradeoffs between performance (in
terms of delay) and power consumption. In this thesis, we provide a circuit
structure named variable body biasing as a new remedy for designers in terms of static
power. With almost 95% reduction of static power, the variable body biasing method
does not degrade the delay or dynamic power consumption of the circuit, which makes
this approach a very attractive one for circuit designers.

1.1 Problem Statement
This research work addresses new low power approaches for Very Large Scale
Integration (VLSI) logic and memory. Power dissipation is one of the major concerns
when designing a VLSI system. Until recently, dynamic power was the only concern.
However, as the technology feature size shrinks, static power, which was negligible
before, becomes an issue as important as dynamic power. Since static power increases
dramatically (indeed, even exponentially) in nano-scale silicon VLSI technology, the
importance of reducing leakage power consumption cannot be overstressed. A well-
known previous technique called the sleep transistor technique cuts off the Vdd and/or
Gnd connections of transistors to save leakage power consumption. However, when
transistors are allowed to float, a system may have to wait a long time to reliably
restore lost state and thus may experience seriously degraded performance. Therefore,
retaining state is crucial for a system that requires fast response even while in an
inactive state. Our research provides new VLSI techniques that achieve ultra-low
leakage power consumption while maintaining logic state, and thus can be used for a
system with long inactive times but a fast response time requirement.

1.2 Contributions
The following items are the main contributions of this research:
• Design of the variable body biasing technique for logic circuits
The “VARIABLE BODY BIASING” technique is applied to generic logic circuits, and
we achieve orders of magnitude leakage power reduction compared to the best prior
state-saving techniques we could find (namely, sleepy stack [14,15], sleepy keeper [16],
dual sleep [17], dual stack [18]).
• Design of a variable body biasing SRAM cell
Static Random Access Memory (SRAM) is a power hungry component in a VLSI chip.
Therefore, we apply the “VARIABLE BODY BIASING” technique to SRAM design.
We provide new Pareto points that can be used by designers who want extremely low
leakage power consumption.
1.3 Thesis organization
The thesis is organized into seven chapters:
CHAPTER 1: INTRODUCTION. This chapter introduces power consumption issues in
VLSI. This chapter also summarizes the contributions of this thesis. Finally, this
chapter explains organization of the thesis.
CHAPTER 2: MOTIVATION. This chapter addresses our motivation for this research.
CHAPTER 3: NOTATION AND BACKGROUND. This chapter explains important
notation and background used throughout this dissertation.
CHAPTER 4: PREVIOUS WORK. This chapter describes previous work in power
reduction research and explains key differences between our solutions and previous
work.
CHAPTER 5: VARIABLE BODY BIASING TECHNIQUE APPLICATION. This
chapter introduces the “VARIABLE BODY BIASING” technique. First the structure of
the circuit is explained followed by a detailed explanation of the circuit operation. An
analytical model of the circuit is derived and compared to the previous techniques.
CHAPTER 6: EXPERIMENTAL RESULTS. This chapter discusses the experimental
results from various applications of the technique. The “variable body biasing”
technique is empirically compared to well-known previous approaches. The
comparisons are assessed in terms of static power, dynamic power, delay, power delay
product and area occupied while changing numerous VLSI and CMOS circuit
parameters.
CHAPTER 7: CONCLUSION. This chapter summarizes the major accomplishments of
this thesis.
CHAPTER 2

MOTIVATION

Subthreshold circuit design is well suited to emerging portable applications
that need extremely low energy operation. The limitation of this technique is its very
slow speed of operation due to the extremely scaled down supply voltage. Despite
very high energy efficiency, subthreshold design has been applied only in niche
markets due to its low performance. Depending upon the application, size, weight and
cost can be as important as performance. Low power is especially significant for
remote, portable and mobile applications. Reduced power consumption makes
the circuits lighter, reduces or eliminates cooling subsystems, and reduces the weight
and extends the life of the energy source.
The multi-Vdd technique has been widely implemented for two supply voltages [19].
The dual-Vdd design is best suited for exploiting the time slack in a subthreshold circuit
as well. Although the gate delay depends exponentially on Vdd in the subthreshold
region, it may be possible to find an optimal lower supply voltage for the available time
slack in the circuit. A DC to DC voltage converter [20] will then allow the voltage
management.
Historically, in the 1980s CMOS technology took over the mainstream of VLSI
design because CMOS consumes far less power than its predecessors (NMOS, bipolar,
etc.). Although this advantage still holds, power dissipation of CMOS has nonetheless
become a problem. For a long time, dynamic power accounted for more than 90%
(typically, over 99%) of total chip power, and thus was frequently used as the metric
for total power consumption for technologies 0.18µ and above. However, as technology
scales down to tens of nanometers, leakage power becomes as important as dynamic
power. Therefore, many ideas have been proposed to tackle the leakage power problem.
Although cutting off transistors from power rails, e.g., using the sleep transistor
technique, is one of the possible solutions, losing state during inactive mode incurs long
wake-up time and thus may not be appropriate for a system that requires fast response
times.
To provide a motivational scenario that illustrates the possible impact of this thesis, let
us consider the impact of static (leakage) power consumption in the context of a cell
phone example. We assume that, in general, the cell phone we consider is always on
(i.e., 24 hours a day). However, the actual usage time of the cell phone is very limited.
If we assume a 500 minute calling plan with 500 minutes total used per month, the cell
phone is active only 1.15% (500 min / (30 days × 24 hours × 60 min)) of the total
on-time. This means that during the rest of the time (98.85%) the cell phone is
non-active; however, due to static power consumption, during this standby time the cell
phone still consumes energy and reduces battery life. In technology such as 0.07µ, the
impact of leakage power is huge. Let us consider an energy consumption scenario of a
cell phone predicted based on experimental results from [15]. Specifically, Table 1
shows some specific results from [15] for 0.07µ technology at 25°C. Table 2 shows a
hypothetical energy consumption scenario.

Table 1 Power and area results from [15]

                         Forced Stack                         Sleepy Stack
               Active      Leakage     Area         Active      Leakage     Area
               Power (W)   Power (W)   (µm²)        Power (W)   Power (W)   (µm²)
4 inverters    1.25E-06    9.81E-10    5.97E+00     1.09E-06    4.56E-12    9.03E+00
512B SRAM      5.22E-04    5.39E-06    2.00E+01     5.80E-04    3.24E-07    3.66E+01




Table 2 Energy consumption scenario of a cell phone (0.07µ) from [15]

                                        Forced stack                                 Sleepy stack
                           Active     Leakage    Area       Energy (J)    Active     Leakage    Area       Energy (J)
                           power (W)  power (W)  (µm²)      (Month)       power (W)  power (W)  (µm²)      (Month)
Processor logic circuits   1.38E-01   1.02E-01   6.61E+05   2.65E+05      1.47E-01   5.74E-04   1.21E+06   5.87E+03
32KB SRAM                  5.54E-03   4.15E-02   6.61E+05   1.06E+05      6.09E-03   2.44E-03   1.21E+06   6.44E+03
TOTAL                      1.43E-01   1.43E-01   1.32E+06   3.72E+05      1.53E-01   3.01E-03   2.42E+06   1.23E+04

First, we assume a single chip containing an embedded processor core in 0.07µ
technology. The chip largely consists of logic circuits and a 32KB SRAM; note that
we exclude I/Os and the pad frame. Furthermore, we only consider here the digital
chip; i.e., the liquid crystal display, Radio Frequency (RF) circuitry, etc., are all
ignored. Second, we assume that SRAM and logic circuits each occupy half of the
digital chip area, respectively. We estimate 32KB SRAM area based on SRAM cell
area which we will present in Chapter 5 – note that in all cases we exclude test, e.g., our
SRAM does not include Built-In Self Test (BIST). The forced stack 32KB SRAM area
is 6.61 × 10⁵ µm², and the sleepy stack SRAM area is 1.21 × 10⁶ µm². Then we estimate
that the processor logic gates occupy the same amount of area as the 32KB SRAM as
shown in the area columns of Table 2.
Third, we also assume that at 0.07µ technology leakage power consumption is as much
as active power consumption when we use the forced stack technique. We multiply
forced stack leakage values from Table 1 by a factor (specifically, 939), so that forced
stack leakage power becomes the same as forced stack active power, i.e., 143mW. Then
we apply the same factor (939) to the sleepy stack leakage power from Table 1,
resulting in sleepy stack leakage power of 3.01mW. In other words, while Table 1 is
based on Berkeley Predicted Technology Model (BPTM) [21], we instead assume a
scenario where leakage power equals active power (which is, we believe, a hypothetical
situation we may possibly see in the future.) Now, recalling that our cell phone is active
500 minutes per month and thus inactive 42700 minutes per month, we calculate forced
stack digital chip energy per month as follows:


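$$\mathrm{Energy}_1 = 143\,\mathrm{mW} \times (500 \times 60\,\mathrm{s}) + 143\,\mathrm{mW} \times (42700 \times 60\,\mathrm{s}) \approx 372\,\mathrm{kJ}$$
Similarly, we calculate the sleepy stack digital chip energy per month as follows:
$$\mathrm{Energy}_2 = 153\,\mathrm{mW} \times (500 \times 60\,\mathrm{s}) + 3.01\,\mathrm{mW} \times (42700 \times 60\,\mathrm{s}) \approx 12.3\,\mathrm{kJ}$$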
The result predicts that the ultra-low leakage power technology, i.e., sleepy stack,
consumes about 30X less total energy than the best prior work, i.e., forced
stack. Therefore, potentially, the ultra-low leakage power technique can extend the
cell phone battery life by 30X in this motivational example. There is a cost for this 30X
saving, however: the overall area increases by 83% (from 1.32 mm² to 2.42 mm²; see
Table 2).
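The monthly energy estimate above is simple enough to check by script. The following is a minimal sketch in Python that uses only the power totals of Table 2 and the 500/42700 minute split stated in the text; the function and variable names are our own choice for illustration.

```python
# Monthly energy of the digital chip, using the power totals of Table 2.
ACTIVE_MIN = 500      # minutes per month the phone is active (from the text)
STANDBY_MIN = 42700   # minutes per month the phone is inactive (from the text)

def monthly_energy_joules(active_power_w, leakage_power_w):
    """Energy per month: active power while in use plus leakage power while idle."""
    return (active_power_w * ACTIVE_MIN + leakage_power_w * STANDBY_MIN) * 60

forced_stack = monthly_energy_joules(143e-3, 143e-3)
sleepy_stack = monthly_energy_joules(153e-3, 3.01e-3)
print(f"forced stack: {forced_stack:.3g} J")
print(f"sleepy stack: {sleepy_stack:.3g} J")
print(f"ratio: {forced_stack / sleepy_stack:.1f}x")
```

With the rounded powers of Table 2 the script gives roughly 3.7 × 10⁵ J and 1.2 × 10⁴ J, in line with the Energy columns of Table 2 and with the 30X ratio quoted above.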
Although there already exist many low-leakage techniques, the best prior low-leakage
technique in terms of leakage power reduction, the sleep transistor technique, loses
logic state during sleep mode. Therefore, the sleep transistor technique requires
non-negligible time to wake up the device from sleep mode. If we consider an
emergency call made from a cell phone, this wake-up time may not be acceptable.
Therefore, an ultra-low-leakage technique that can save state even in non-active mode
can be quite important in nano-scale technology VLSI.
In this dissertation, we use circuit-based techniques to reduce leakage power
consumption. In particular, our technique retains logic state, and thus a fast response
time can be achieved even from non-active mode. Because it retains state, the technique
is applicable to generic logic circuits as well as memory, i.e., SRAM.
In this chapter, some motivation for the importance of this research was provided. In the
next chapter, we explain expressions, notation and background important for this thesis.


CHAPTER 3
NOTATION AND BACKGROUND

In this chapter, we explain important notation and VLSI background used in this
dissertation. First, we introduce subthreshold leakage power consumption on which our
research focuses. Next, we explain the background underlying a particular leakage
power model able to explain the stack effect. We then explain the body-bias effect,
which is an important leakage reduction factor in our research. Furthermore, we explain
subthreshold leakage power consumption of a conventional 6 Transistor (6-T) SRAM
cell. Finally, we explain switching power and delay tradeoffs of CMOS circuits and
some key terms of circuit performance estimations.

3.1 Leakage power
In this section, we explain notation and background relevant to leakage power
consumption.











Figure 1 Sub-threshold leakage of an nFET

Although dynamic power is dominant for technologies at 180nm and above, leakage
(static) power consumption becomes another dominant factor for 130nm and below.
One of the main contributors to static power consumption in CMOS is the subthreshold
leakage current shown in Figure 1, i.e., the drain-to-source current when the gate
voltage is smaller than the transistor threshold voltage. Since subthreshold current
increases exponentially as the threshold voltage decreases, nano-scale technologies with
scaled-down threshold voltages will severely suffer from subthreshold leakage power
consumption.
Assuming the leakage current is constant, the static power dissipation is the
product of total leakage current and supply voltage,

$$P_{static} = I_{static} V_{dd} \qquad (3.1)$$

Static power reduction involves minimizing $I_{static}$, which is almost equal to the
subthreshold leakage current $I_{sub}$ for $V_{gs} < V_{th}$.












Subthreshold leakage can be reduced by stacking transistors, i.e., taking advantage of
the so-called “stack effect” [22], or alternatively by applying variable body biasing
($V_{sb0}$), which we will use in Chapter 5. The stack effect occurs when two or more stacked
transistors are turned off together; the result is reduced leakage power consumption. Let
us explain an important stack effect leakage reduction model. The model we explain
here is based on the leakage models in [22] and [23]. For the turned-off single transistor
shown in Figure 2(a), the leakage current ($I_{sub0}$) can be expressed as follows:

$$I_{sub0} = A\, e^{\frac{1}{n V_\theta}\left(V_{gs0} - V_{th0} - \gamma V_{sb0} + \eta V_{ds0}\right)} \left(1 - e^{-V_{ds0}/V_\theta}\right) = A\, e^{\frac{1}{n V_\theta}\left(-V_{th0} + \eta V_{dd}\right)}$$


Figure 2 (a) A single transistor (left) and (b) stacked transistors (right)

where
• $A = \mu_0 C_{ox} \frac{W}{L_{eff}} V_\theta^2 e^{1.8}$,
• n is the subthreshold swing coefficient,
• $V_\theta$ is the thermal voltage,
• $V_{gs0}$, $V_{th0}$, $V_{sb0}$ and $V_{ds0}$ are the gate-to-source voltage, the zero-bias threshold
voltage, the source-to-base voltage and the drain-to-source voltage,
respectively,
• $\gamma$ is the body-bias effect coefficient,
• $\eta$ is the Drain Induced Barrier Lowering (DIBL) coefficient,
• $\mu_0$ is the zero-bias mobility,
• $C_{ox}$ is the gate-oxide capacitance,
• W is the width of the transistor, and
• $L_{eff}$ is the effective channel length [24].
(Note that throughout this thesis we assume $\mu_n = 2\mu_p$, i.e., NMOS carrier mobility is
twice PMOS carrier mobility. Also note that we use a W/L ratio based on the actual
transistor size, so that the W/L ratio properly characterizes the circuit models used in
this thesis.) We assume $1 \gg e^{-V_{ds0}/V_\theta}$.
Let us assume that the two stacked transistors (M1 and M2) in Figure 2(b) are turned
off. We also assume that the transistor width of each of M1 and M2 is the same as the
transistor width of M0 ($W_{M0} = W_{M1} = W_{M2}$). The leakage currents $I_{sub1}$ of
transistor M1 and $I_{sub2}$ of transistor M2 can be expressed as follows:

$$I_{sub1} = A\, e^{\frac{1}{n V_\theta}\left(V_{gs1} - V_{th1} - \gamma V_{sb1} + \eta V_{ds1}\right)} \left(1 - e^{-V_{ds1}/V_\theta}\right) = A\, e^{\frac{1}{n V_\theta}\left(-V_x - V_{th0} - \gamma V_x + \eta (V_{dd} - V_x)\right)}$$

$$I_{sub2} = A\, e^{\frac{1}{n V_\theta}\left(V_{gs2} - V_{th2} - \gamma V_{sb2} + \eta V_{ds2}\right)} \left(1 - e^{-V_{ds2}/V_\theta}\right) = A\, e^{\frac{1}{n V_\theta}\left(-V_{th0} + \eta V_x\right)} \left(1 - e^{-V_x/V_\theta}\right)$$

where $V_x$ is the voltage at the node between M1 and M2, and we assume
$1 \gg e^{-V_{ds1}/V_\theta}$.
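Now consider the leakage current reduction between $I_{sub0}$ and $I_{sub1}$ (= $I_{sub2}$). The
reduction factor X can be expressed as follows:

$$X = \frac{I_{sub0}}{I_{sub1}} = \frac{A\, e^{\frac{1}{n V_\theta}\left(-V_{th0} + \eta V_{dd}\right)}}{A\, e^{\frac{1}{n V_\theta}\left(-V_x - V_{th0} - \gamma V_x + \eta (V_{dd} - V_x)\right)}} = e^{\frac{V_x}{n V_\theta}\left(1 + \gamma + \eta\right)} \qquad (3.2)$$

$V_x$ in Equation (3.2) can be derived by letting $I_{sub1} = I_{sub2}$ and by solving the
following equation:

$$e^{\frac{1}{n V_\theta}\left(\eta V_{dd} - V_x (1 + \gamma + 2\eta)\right)} + e^{-\frac{V_x}{V_\theta}} = 1 \qquad (3.3)$$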

If all the parameters are known, we can calculate the stack effect leakage power reduction
using Equations (3.2) and (3.3). As an example, we consider the leakage model parameter
values targeting 0.5µ technology in Table 3, obtained from [22]. From Equation (3.3),
we calculate $V_x$ = 0.0443V, and from Equation (3.2), we obtain the leakage reduction
factor X = 4.188.
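The calculation just described can be reproduced numerically. The sketch below, in Python, solves Equation (3.3) for the intermediate node voltage by bisection and then evaluates Equation (3.2), using the Table 3 values for the supply voltage, the subthreshold slope coefficient, the DIBL coefficient and the body-bias effect coefficient. The thermal voltage is not listed in Table 3, so the value of 26 mV used here is an assumption; with it the result lands close to, but not exactly on, the figures quoted above.

```python
import math

# Table 3 parameters (0.5 micron technology).
V_DD = 1.0     # supply voltage (V)
N = 1.5        # subthreshold slope coefficient
ETA = 0.05     # DIBL coefficient (V/V)
GAMMA = 0.24   # body-bias effect coefficient (V/V)
V_THERMAL = 0.026  # ASSUMPTION: thermal voltage (V) near room temperature; not in Table 3

def eq_3_3_residual(v_x):
    """Left-hand side of Equation (3.3) minus 1; zero at the intermediate node voltage."""
    first = math.exp((ETA * V_DD - v_x * (1 + GAMMA + 2 * ETA)) / (N * V_THERMAL))
    second = math.exp(-v_x / V_THERMAL)
    return first + second - 1.0

def solve_v_x(lo=0.0, hi=V_DD, iterations=60):
    """Bisection: the residual falls monotonically from positive to negative on [0, V_DD]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if eq_3_3_residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reduction_factor(v_x):
    """Equation (3.2): leakage reduction of the two-transistor stack."""
    return math.exp(v_x * (1 + GAMMA + ETA) / (N * V_THERMAL))

v_x = solve_v_x()
print(f"V_x = {v_x:.4f} V, X = {reduction_factor(v_x):.3f}")
```

Note that the threshold voltage does not appear in the script: it cancels both in Equation (3.2) and in Equation (3.3), so the reduction factor depends only on the remaining parameters.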

Table 3 Leakage model parameters (0.5μ tech)

| Parameter | Value |
|---|---|
| $V_{dd}$ | 1V |
| $V_{th}$ | 0.2V |
| n (subthreshold slope coefficient) | 1.5 |
| $\eta$ (DIBL coefficient) | 0.05 V/V |
| $\gamma$ (body-bias effect coefficient) | 0.24 V/V |

Although the reduction is 4.188X at 0.5µ technology, the reduction increases at nano-scale
technology because $\eta$ increases as technology feature size shrinks.
Threshold voltage of a CMOS transistor can be controlled using body bias. In
general, we apply $V_{dd}$ to the body (e.g., an n-well or n-tub) of a PMOS transistor and
apply Gnd to the body (e.g., a p-well or p-substrate) of an NMOS transistor. This condition,
in which the source voltage and body voltage of a transistor are the same, is called
Zero-Body Bias (ZBB). Threshold voltage at ZBB is called ZBB threshold voltage. For an
NMOS transistor, when the body voltage is made lower than the source voltage by applying
a negative voltage to the body, the condition is called Reverse-Body Bias (RBB).
Alternatively, when the body voltage is made higher than the source voltage by applying a
positive voltage to the body, the condition is called Forward-Body Bias (FBB); the
polarities are reversed for a PMOS transistor. When RBB is applied to a transistor, the
threshold voltage increases, and when FBB is applied to a transistor, the threshold voltage
decreases. This phenomenon is called the body-bias effect, and it is frequently used to
control threshold voltage dynamically [25].

In this section, Section 3.1, we explained subthreshold leakage power
consumption, the stack effect, and body-bias effects which can alter subthreshold
leakage power consumption. In the next section, we explain leakage current of an
SRAM cell.

3.2 SRAM cell leakage paths











Figure 3 SRAM cell leakage paths

In this section, we explain the major subthreshold leakage components in a 6-T
SRAM cell. The subthreshold leakage current in an SRAM cell is typically categorized
into two kinds [26] as shown in Figure 3: (i) cell leakage current that flows from $V_{dd}$ to
Gnd internal to the cell and (ii) bitline leakage current that flows from bitline (or
bitline′) to Gnd.
Although an SRAM cell has two bitline leakage paths, the bitline leakage
current and the bitline′ leakage current differ according to the value stored in the SRAM
bit. If an SRAM cell holds ‘1’ as shown in Figure 3, the bitline leakage current passing
through N3 and N2 is effectively suppressed for two reasons. First, after precharging
bitline and bitline′ both to ‘1,’ the source voltage and the drain voltage of N3 are the
same and thus potentially no current flows through N3. Second, two stacked and
turned-off transistors (N2 and N3) induce the stack effect. Meanwhile, for this case where
the SRAM bit holds value ‘1,’ a large bitline′ leakage current flows through N4
and N1. If, on the other hand, the SRAM cell holds ‘0,’ a large bitline leakage current
flows while the bitline′ leakage current is suppressed.
In this section, Section 3.2, we explained the two major types of leakage paths in
an SRAM cell (cell leakage and bitline leakage). In the next section, we explain tradeoffs
between switching power and delay.

3.3 Switching power and delay tradeoffs

In this section, we explain tradeoffs between switching power and delay. In
CMOS, power consumption consists of leakage power and dynamic power; note that
dynamic power includes both switching power and short-circuit power. Switching
power is consumed when a gate charges its output load capacitance, and short-circuit
power is consumed when a pull-up network and a pull-down network are on together
for an instant while transistors are turning on and off. For 0.18μ channel lengths and
above, leakage power is very small compared to dynamic power. Furthermore,
short-circuit power is also less than 10% of the dynamic power for a typical CMOS design,
and the ratio between dynamic power and short-circuit power does not change as long
as the ratio between supply voltage and threshold voltage remains the same [27]. Since,
for 0.18μ and above, short-circuit power and leakage power are relatively small
compared to switching power, the power consumption of a particular CMOS gate
under consideration can be represented by the following switching power ($P_{switching}$)
equation for 0.18μ and above:

$$P_{switching} = p_t C_L V_{dd}^2 f \qquad (3.4)$$

where $C_L$, $V_{dd}$, and f denote the load capacitance of a CMOS gate, the supply voltage
and the clock frequency, respectively [28]. Notation $p_t$ denotes the switching ratio of a
gate output; this switching ratio represents the number of times the particular gate’s
output changes from Gnd to $V_{dd}$ per clock cycle. Note that when the output capacitance
discharges from $V_{dd}$ to Gnd, switching power is not consumed because power from $V_{dd}$
is not used (e.g., discharging to Gnd does not consume battery power). The switching
ratio varies according to the input vectors and benchmark programs, and thus an
average value of each benchmark may be used as a switching ratio.

Equation (3.4) shows that lowering $V_{dd}$ decreases CMOS switching power
consumption quadratically. However, this power reduction unfortunately entails an
increase in the gate delay in a CMOS circuit as shown in the following approximated
equation:

$$T_d \propto \frac{V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \qquad (3.5)$$

where $T_d$, $V_{th}$, and $\alpha$ denote the gate delay in a CMOS circuit, the threshold voltage
and the velocity saturation index of a transistor, respectively. It is well known that while
$\alpha$ has values close to 2 for above 2.0μ, $\alpha$ for 0.25μ is between 1.3 and 1.5, and $\alpha$ for
below 0.1μ is close to 1 [28-29]. However, instead of scaling down the $\alpha$ value along with
the technology feature size, CMOS technology may take a constant $\alpha$ value to avoid
the hot-carrier related problem [30]. A constant $\alpha$ value could be accomplished by
changing $V_{th}$ because $\alpha$ is a function of gate-source voltage [31]. If we scale down $V_{dd}$,
switching power in Equation (3.4) decreases, while the gate delay in Equation (3.5)
increases. Therefore, CMOS circuit speed can be traded with switching power
consumption as shown in Equations (3.4) and (3.5).
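To make the tradeoff of Equations (3.4) and (3.5) concrete, the following Python sketch sweeps the supply voltage and reports switching power and gate delay relative to a 1.8 V baseline. All numerical values (switching ratio, load capacitance, clock frequency, threshold voltage and velocity saturation index) are assumed purely for illustration and are not taken from this thesis; the delay is computed only up to the proportionality constant of Equation (3.5).

```python
def switching_power(p_t, c_load, v_dd, freq):
    """Equation (3.4): switching power of a gate."""
    return p_t * c_load * v_dd ** 2 * freq

def relative_delay(v_dd, v_th, alpha):
    """Equation (3.5) up to its proportionality constant."""
    return v_dd / (v_dd - v_th) ** alpha

# ASSUMED illustrative values, not taken from the thesis.
P_T, C_LOAD, FREQ = 0.1, 10e-15, 100e6   # switching ratio, 10 fF load, 100 MHz clock
V_TH, ALPHA = 0.4, 1.3                    # threshold voltage (V), velocity saturation index

base_p = switching_power(P_T, C_LOAD, 1.8, FREQ)
base_d = relative_delay(1.8, V_TH, ALPHA)
for v_dd in (1.8, 1.5, 1.2, 0.9):
    p = switching_power(P_T, C_LOAD, v_dd, FREQ) / base_p
    d = relative_delay(v_dd, V_TH, ALPHA) / base_d
    print(f"Vdd={v_dd:.1f} V  power x{p:.2f}  delay x{d:.2f}")
```

Halving the supply voltage cuts the switching power to one quarter, as the quadratic term in Equation (3.4) predicts, while the relative delay grows as the supply approaches the threshold voltage.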

When there exist tradeoffs between multiple criteria, e.g., power and delay, we
may say one design is better than another only with respect to specific criteria. A point
in the design space is called a Pareto point if no other point is at least as good in every
objective and strictly better in at least one [32]. In this thesis we estimate leakage power
consumption by measuring static power when transistors are not switching. Furthermore,
we estimate active power consumption by measuring power when transistors are
switching. This active power includes dynamic power consumption and leakage power
consumption. Up to this point in the chapter, we have explained important notation and
VLSI background used in this thesis. In the next section, we introduce a simple model for
estimating circuit delay; previous low-power research related to ours is reviewed in
Chapter 4.
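The notion of a Pareto point for power and delay can be sketched in a few lines of Python. The design points below are hypothetical (power, delay) pairs invented for illustration, with lower values being better in both objectives; the function name is also our own.

```python
def pareto_points(designs):
    """Return the designs not dominated by any other design.

    Each design is a (power, delay) pair; lower is better for both objectives.
    """
    def dominates(a, b):
        # a dominates b: no worse in every objective and strictly better in one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [d for d in designs if not any(dominates(o, d) for o in designs)]

# HYPOTHETICAL (power in W, delay in s) pairs, for illustration only.
designs = [(1.0e-6, 2.0e-10), (2.0e-6, 1.0e-10), (2.5e-6, 2.5e-10), (1.5e-6, 1.5e-10)]
print(pareto_points(designs))
```

In this example the third design is removed because the first design has both lower power and lower delay; the other three each trade power against delay and therefore remain.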

3.4 Circuit Performance Estimation
In this section we introduce a widely used technique for estimating the propagation
delay of VLSI circuits, the linear delay model. Using this model one can quickly and
crudely calculate the propagation delay in units of τ = 3RC (the parasitic delay of a unit
inverter). For a 180nm process, τ = 15ps.

In general the propagation delay of a gate can be written as

$$d = f + p \qquad (3.6)$$

where p is the parasitic delay inherent to the gate when no load is attached and f is the
effort delay, which depends on the complexity and fan-out of the gate [33]:

$$f = gh$$

The complexity is represented by the logical effort, g. An inverter is defined to have a
logical effort of 1. The logical effort of a gate is defined as the ratio of its input
capacitance to the input capacitance of an inverter that can deliver the same output
current. For a 2-input NAND gate and a 2-input NOR gate, g is 4/3 and 5/3, respectively
(Figure 4).












Figure 4 Logical efforts of basic logic gates

For the general case it can be shown that the logical effort of an n-input NAND gate is
g = (n+2)/3 and that of an n-input NOR gate is g = (2n+1)/3. A gate driving h identical
copies of itself is said to have a fan-out or electrical effort of h. If the load is not
identical, h is defined by

$$h = \frac{C_{out}}{C_{in}}$$

The parasitic delay p of a gate is the delay of the gate when it drives zero load. A crude
method of estimating it is to count the diffusion capacitance on the output node. It can be
shown that the p of an n-input NAND gate and of an n-input NOR gate is equal to n. For
calculating delay in multistage logic networks we define the following terms:

Path logical effort, $G = \prod g_i$
Path electrical effort, $H = \frac{C_{out(path)}}{C_{in(path)}}$
Branching effort, $b = \frac{C_{onpath} + C_{offpath}}{C_{onpath}}$
Path branching effort, $B = \prod b_i$
Path effort, $F = GBH$
Path effort delay, $D_F = \sum f_i$
Path parasitic delay, $P = \sum p_i$

Finally, the path delay D is the sum of the delays of each stage:

$$D = \sum d_i = D_F + P$$

In this section, we discussed a simple model to estimate the propagation delay of a
circuit by hand calculation. This model gives designers insight into the circuit,
which they can use to design faster circuits.
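As a minimal sketch of such a hand calculation, the Python snippet below sums the stage delays of a short path using the logical effort formulas for n-input NAND and NOR gates given in this section. The three-stage path and its fan-out of 4 per stage are hypothetical, and the helper names are our own; the conversion to picoseconds uses the 15 ps value quoted above for a 180nm process.

```python
from fractions import Fraction

def nand_effort(n):
    """Logical effort of an n-input NAND gate: (n + 2) / 3."""
    return Fraction(n + 2, 3)

def nor_effort(n):
    """Logical effort of an n-input NOR gate: (2n + 1) / 3."""
    return Fraction(2 * n + 1, 3)

def path_delay(stages):
    """Sum d_i = g_i * h_i + p_i over the stages; the result is in units of tau.

    Each stage is a (g, h, p) tuple: logical effort, electrical effort, parasitic delay.
    """
    return sum(g * h + p for g, h, p in stages)

# HYPOTHETICAL path: inverter -> 2-input NAND -> 2-input NOR, each stage with fan-out h = 4.
path = [(Fraction(1), 4, 1), (nand_effort(2), 4, 2), (nor_effort(2), 4, 2)]
delay_tau = path_delay(path)
print(f"path delay = {float(delay_tau):.2f} tau")
print(f"           = {float(delay_tau) * 15:.0f} ps at tau = 15 ps")
```

For this example the stage delays are 5, 22/3 and 26/3, so the path delay is exactly 21 units of the unit delay, or 315 ps at 15 ps per unit. Exact fractions are used so that the thirds in the logical efforts do not accumulate rounding error.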


CHAPTER 4

PREVIOUS WORKS

In this chapter, we review important prior work that is closely related to our research
and compare it with our research. We mainly explore prior work targeting leakage power
reduction, but we also consider other performance criteria such as dynamic power,
propagation delay, power-delay product and area.
4.1 Static Power Reduction VLSI research
In this section, we discuss previous low-power techniques that primarily target
reducing leakage power consumption of CMOS circuits. Techniques for leakage power
reduction can be grouped into two categories: (I) state-saving techniques where circuit
state (present value) is retained and (II) state-destructive techniques where the current
Boolean output value of the circuit might be lost [13]. A state-saving technique has an
advantage over a state-destructive technique in that with a state-saving technique the
circuitry can immediately resume operation at a point much later in time without
having to somehow regenerate state. We characterize each low-leakage technique
according to this criterion. We study low-leakage techniques for generic logic circuits
followed by low-leakage SRAM designs separately.
4.1.1 Static power reduction research for generic logic circuits
This section explains previously proposed low-leakage techniques for generic logic
circuits. As introduced, previously proposed work can be divided into techniques that
either (i) save state or (ii) destroy state. Although our research focuses on techniques
which save state, we also review state-destructive techniques for the purposes of
comparison. The state-destructive category contains the sleep transistor technique. The
state-saving category includes the forced stack, sleepy stack, sleepy keeper, dual sleep
and dual stack methods.

4.1.1.1 Sleep transistor








State-destructive techniques cut off transistor (pull-up or pull-down or both) networks
from supply voltage or ground using sleep transistors [34]. These types of techniques
are also called gated-$V_{dd}$ and gated-Gnd (note that a gated clock is generally used for
dynamic power reduction). Mutoh et al. propose a technique they call Multi-Threshold
Voltage CMOS (MTCMOS) [34], which adds high-$V_{th}$ sleep transistors between pull-up
networks and $V_{dd}$ and between pull-down networks and ground as shown in Figure 5,
while logic circuits use low-$V_{th}$ transistors in order to maintain fast logic switching
speeds. The sleep transistors are turned off when the logic circuits are not in use. By
isolating the logic networks using sleep transistors, the sleep transistor technique
dramatically reduces leakage power during sleep mode. However, the additional sleep
transistors increase area and delay. Furthermore, the pull-up and pull-down networks
will have floating values and thus will lose state during sleep mode. These floating
values significantly impact the wake-up time and energy of the sleep technique due to
the requirement to recharge the nodes which lost state during sleep (this issue is
nontrivial, especially for registers and flip-flops).
Figure 5 Sleep transistor

Comparison with prior works using sleep transistors
The sleep transistor technique and the “Variable Body Biasing” technique both achieve
roughly the same static power savings over conventional CMOS. However, unlike the
sleep transistor technique, the “Variable Body Biasing” technique saves logic state
during low leakage mode (sleep mode), and this is a significant advantage over the
state-destructive sleep transistor technique. The sleep transistor technique requires
non-negligible power consumption to restore lost state. Further, the wake-up time of the
sleep transistor technique is significant, while the “Variable Body Biasing” technique
needs only a very small extra wake-up time (a few clock cycles).
4.1.1.2 Forced Stack
Another technique to reduce leakage power is transistor stacking. Transistor
stacking exploits the stack effect explained in Chapter 3; the stack effect results in
substantial subthreshold leakage current reduction when two or more stacked transistors
are turned off together.






Figure 6 “Forced Stack”

Example 1: The stack effect can be understood from the forced stack inverter
example shown in Figure 6. Unlike a generic CMOS inverter, this forced stack inverter
consists of two pull-up transistors and two pull-down transistors. All transistor gates
share the same input ‘A.’ If A = 0, then both transistors M1 and M2 are turned off. Due to
the internal resistance of M2, the intermediate node voltage $V_x$ is higher than Gnd. The
positive potential of $V_x$ results in a negative gate-source voltage ($V_{gs}$) for M1 and a
negative source-base voltage ($V_{sb}$) for M1. Furthermore, M1 has a reduced drain-source
voltage ($V_{ds}$), which weakens the Drain Induced Barrier Lowering (DIBL) effect [35].
All three effects together contribute to the leakage reduction factor X in Equation (3.2)
(see Chapter 3), reducing leakage current by an order of magnitude for today’s channel
lengths (0.18µ, 0.13µ, 0.10µ and 0.07µ) [36].
Narendra et al. study the effectiveness of the stack effect including effects from
increasing the channel length [37]. Since forced stacking of what previously was a
single transistor increases delay, Johnson et al. propose an algorithm that finds circuit
input vectors that maximize stacked transistors of existing complex logic [38].
Comparison with prior work using the forced stack approach
Compared to the forced stack technique, the “Variable Body Biasing” technique
potentially achieves more power savings because the “Variable Body Biasing” can
control the change in body bias during circuit operation. The forced stack technique
cannot use high-$V_{th}$ transistors without a dramatic delay increase (larger than 5X delay
increase compared to conventional CMOS).
4.1.1.3 Sleepy Stack
The sleepy stack approach has a structure combining the stack and sleep approaches by
dividing every transistor into two transistors of half width and placing a sleep transistor
in parallel with one of the divided transistors [14, 15]. As shown in Figure 7, sleep
transistors are placed in parallel to the divided transistor closest to $V_{dd}$ for pull-up and
in parallel to the divided transistor closest to Gnd for pull-down. The sleepy stack
approach can have the advantages of both the stack approach and the sleep approach.
During active mode, the sleepy stack approach results in lower delay than the stack
approach because sleep transistors placed in parallel (i) reduce resistance and (ii) are
already on. When sleep transistors are turned off, the existence of a path from either
$V_{dd}$ or Gnd prevents a floating output. Also, leakage current can be further reduced by
applying high-$V_{th}$ to the sleep transistors and the transistors in parallel to the sleep
transistors. However, the area penalty is significant, since every transistor is replaced
by three transistors and since additional wires are added for S and S′, which are the sleep
signals.






Figure 7 SLEEPY STACK

Comparison with prior work using the sleepy stack approach
Compared to the sleepy stack technique, the “Variable Body Biasing” technique
achieves 86% more power savings because the “Variable Body Biasing” can control the
change in body bias during circuit operation. The sleepy stack requires 32.3% more
area than the “Variable Body Biasing” technique. This area saving is a major
improvement over the sleepy stack technique.
4.1.1.4 Sleepy keeper
Another approach, called sleepy keeper, utilizes the leakage feedback technique [16] and is
shown in Figure 8. In this approach, a PMOS transistor is placed in parallel to the sleep
transistor (S) and an NMOS transistor is placed in parallel to the sleep transistor (S′).
The two transistors are driven by the output of the inverter. During sleep mode, the sleep
transistors are turned off and one of the transistors in parallel to the sleep transistors
keeps the connection with the appropriate power rail.









Comparison with prior work using the sleepy keeper approach
Compared to the sleepy keeper technique, the “Variable Body Biasing” technique
achieves 50% more power savings because the “Variable Body Biasing” can control the
change in body bias during circuit operation. The sleepy keeper requires less area than
the “Variable Body Biasing” technique, but we consider the reduction in leakage power
to be the more useful benefit.
Figure 8 Sleepy keeper

4.1.1.5 Dual Sleep
The dual sleep approach uses two extra pull-up and two extra pull-down transistors in
sleep mode, either in the OFF state or in the ON state. Since the dual sleep portion can be
made common to all logic circuitry, fewer transistors are needed to implement a given
logic circuit. In the OFF state each of the pull-up and pull-down networks consists of both
PMOS and NMOS transistors in order to reduce the leakage power. This approach has
clear advantages. First, it maintains state in sleep mode. Second, like the sleep, sleepy
stack and sleepy keeper approaches, dual-$V_{th}$ technology can be applied in the dual sleep
approach to obtain greater leakage power reduction [17].










Comparison with prior work using the dual sleep approach
The dual sleep method consumes 94.7%, 95.3% and 80.49% more leakage power than the
“Variable Body Biasing” technique for a chain of 4 inverters, a 1-bit full adder and an
SRAM cell, respectively. The “Variable Body Biasing” technique also improves the
propagation delay of logic circuits by around 7% compared to the dual sleep method.




Figure 9 Dual Sleep

4.1.1.6 Dual Stack
The dual stack approach (Figure 10) uses two extra PMOS transistors in the pull-down
network and two extra NMOS transistors in the pull-up network. As a result, the NMOS
transistors degrade the high logic level and the PMOS transistors degrade the low logic
level. Due to the body effect they further decrease the voltage level. Thus the pass
transistors decrease the voltage applied across the main circuit. The stacked transistors
are held in reverse body bias, so their threshold voltage is high. A high threshold voltage
causes low leakage current and hence low leakage power. In addition, minimum-size
transistors with an aspect ratio of 1 are used to reduce the static power further [18].










Comparison with prior work using the dual stack approach
The dual stack method consumes 92.93%, 94.58% and 77.14% more leakage power
than the “Variable Body Biasing” technique for a chain of 4 inverters, a 1-bit full adder
and an SRAM cell, respectively. The “Variable Body Biasing” technique also improves
the propagation delay of both logic circuits and memory circuits (i.e., SRAM)
compared to the dual stack method.



Figure 10 Dual Stack

4.1.2 Static power reduction research for SRAM
In this section, we discuss state-of-the-art low-power memory techniques, especially for
SRAM, on which our research focuses, and compare them with our work.
4.1.2.1 Sleep transistor
This is the same as applying sleep transistors in generic logic circuits, e.g., a chain of four
inverters. The $V_{dd}$ and Gnd rails are separated from the circuit through a PMOS and an
NMOS transistor, respectively.












Comparison with our work
Sleep transistor method requires 35.4% more static power and 35.7% more
dynamic power than our approach. It has 0.89% less delay and 31.9% area compared to
our approach. Our proposed approach has overall power delay product 35.05% more
than sleep transistor approach.



Figure 11 SLEEP TRANSISTOR IN SRAM

4.1.2.2 Dual Sleep












Sleep transistors are a crucial part of any low-leakage power design. Generally, the sleep
transistor is used to reduce leakage power in off mode and other techniques are adopted
to save the state. In this method, each of the rails is separated by a header and a footer
sleep transistor. It is similar to the case of logic circuits. We apply S=1 when the circuit
is in active mode and S=0 when it is in sleep mode.
Comparison with our work
Dual sleep method requires 84.12% more static power and 35.49% more
dynamic power than our approach. It has 0.10% more delay compared to our approach.
Our proposed approach has overall power delay product 35.35% more than dual sleep
approach.




Figure 12 “DUAL SLEEP” IN SRAM

4.1.2.3 Dual Stack














Figure 13 shows the configuration of the dual stack method in the case of an SRAM. In this
method, there are two extra MOSFETs in parallel to the sleep transistors. These extra
MOSFETs help to retain state, which is crucial for the operation of the SRAM. As the
retention transistors are stacked, they help to reduce leakage power. The operation is
similar to the case of logic circuits. We apply S=1 when the circuit is in active mode
and S=0 when it is in sleep mode.

Comparison with our work
Dual stack method requires 80.73% more static power and 0.48% less dynamic
power than our approach. It has 0.05% less delay compared to our approach. Our
proposed approach has overall power delay product 0.46% less than dual stack
approach.

Figure 13 “DUAL STACK” IN SRAM

4.1.2.4 Sleepy Keeper in SRAM










Figure 14 shows the configuration of the sleepy keeper method in the case of an SRAM. In
this method, there is an extra MOSFET in parallel to each sleep transistor. These extra
MOSFETs help to retain state, which is required for the operation of the SRAM. In this
case the retention transistors are not stacked, so they offer only a small reduction in
leakage power. The operation is similar to the case of logic circuits. We apply S=1 when
the circuit is in active mode and S=0 when it is in sleep mode.
Comparison with our work
The sleepy keeper method requires 37.1795% more static power and 39.5954%
more dynamic power than our approach. It has 0.43471% more delay compared to our
approach. Our proposed approach has overall power delay product 39.86% more than
the sleepy keeper approach.

Figure 14 “SLEEPY KEEPER” IN SRAM

CHAPTER 5
VARIABLE BODY BIASING TECHNIQUE

In this chapter, we introduce our new leakage power reduction technique, which we name
“Variable Body Biasing.” We derive this technique by controlling the $V_{th}$ of the sleep
transistors of the sleepy keeper technique according to the operating mode, so that the
subthreshold leakage can be minimized in sleep mode. However, unlike the sleep
transistor technique, the variable body biasing technique retains the exact logic state;
and, unlike the sleepy keeper technique, our technique can utilize a variable $V_{th}$ using the
body effect without suffering delay penalties. Therefore, far better than any prior
approach known to the authors of this thesis, the variable body biasing technique can
achieve ultra-low leakage power consumption while saving state.

We first explain the structure of the variable body biasing technique using an
inverter. Then we describe the details of variable body biasing operation in active mode
and sleep mode. The advantages of the variable body biasing technique over the sleep
transistor technique and the sleepy keeper technique are explored. Finally, we apply
linear delay model to our variable body biasing technique to estimate the propagation
delay.

5.1 Variable body biasing approach
In this section, we explain our variable body biasing structure comparing to the sleepy
keeper technique for an inverter. The details of the variable body biasing inverter are
described as an example. Two operation modes, active mode and sleep mode, of the
variable body biasing technique are explored.























5.2 Variable body biasing structure
We have already described the sleepy keeper structure in Section 4.1.1.4. In the sleepy
keeper structure the sleep transistors still have some subthreshold leakage in sleep
mode, which can be reduced by increasing their $V_{th}$ using the body effect. To implement
this we use a PMOS (M2) and an NMOS (M5) (Figure 15(b)). The drain of the PMOS
(M2) is connected to the body of the sleep PMOS (M1) and the source of the PMOS
(M2) is connected to $V_{dd}$. Similarly, the drain of the NMOS (M5) is connected to
the body of the sleep NMOS (M4) and the source of the NMOS (M5) is connected to
Gnd. The other PMOS (M3) and NMOS (M6) are used as keepers to retain the state of
the output in sleep mode. The W/L ratio of the inverter PMOS is 6 and that of the NMOS
is 3. All other transistors have a W/L ratio of unity.

Figure 15 An Inverter with (a) Sleepy Keeper (left) (b) Variable body biasing structure
(right)

5.3 Variable body biasing operation
During active mode (S=1, S′=0), transistors M1, M2, M4 and M5 act as short circuits, so
the inverter works normally. The $V_{sb}$ of the PMOS (M1) and NMOS (M4) is almost
zero. As a result, the $V_{th}$ of the sleep transistors decreases by the body effect
(Equation (5.1)):

\[ V_{th} = V_{th0} + \gamma \left( \sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s} \right) \tag{5.1} \]

where $V_{th0}$ is the threshold voltage when the source is at the body potential,
$\phi_s = 2 V_\theta \ln(N_A / n_i)$ is the surface potential at threshold, $\gamma$ is the
body-bias effect coefficient, typically in the range 0.4 to 1 $V^{1/2}$, $N_A$ is the
doping level, $n_i$ is the intrinsic carrier concentration and $V_\theta$ is the thermal
voltage.
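As a rough numerical illustration of Equation (5.1), the following sketch evaluates the threshold voltage for zero and nonzero source-to-body voltage. All device values (doping level, zero-bias threshold voltage, body-effect coefficient, the 0.3 V bias) are assumed for illustration only and are not taken from the simulations in this thesis.

```python
import math


def surface_potential(n_a, n_i, v_theta):
    """Surface potential at threshold: phi_s = 2 * V_theta * ln(N_A / n_i)."""
    return 2.0 * v_theta * math.log(n_a / n_i)


def threshold_voltage(v_th0, gamma, phi_s, v_sb):
    """Equation (5.1): V_th = V_th0 + gamma * (sqrt(phi_s + V_sb) - sqrt(phi_s))."""
    return v_th0 + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))


# Assumed, illustrative values (hypothetical, not from the thesis).
V_THETA = 0.0259   # thermal voltage near room temperature, volts
N_A = 1e17         # doping level, cm^-3
N_I = 1.45e10      # intrinsic carrier concentration of silicon, cm^-3
V_TH0 = 0.40       # zero-bias threshold voltage, volts
GAMMA = 0.5        # body-bias effect coefficient, V^(1/2)

phi_s = surface_potential(N_A, N_I, V_THETA)
vth_active = threshold_voltage(V_TH0, GAMMA, phi_s, v_sb=0.0)  # V_sb close to zero
vth_sleep = threshold_voltage(V_TH0, GAMMA, phi_s, v_sb=0.3)   # nonzero V_sb

print(f"phi_s               = {phi_s:.3f} V")
print(f"V_th at V_sb = 0.0 V = {vth_active:.3f} V")
print(f"V_th at V_sb = 0.3 V = {vth_sleep:.3f} V")
```

With these assumed numbers the threshold voltage rises by several tens of millivolts when the source-to-body voltage becomes nonzero, which is the effect the structure relies on in sleep mode.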

On the other hand, during sleep mode (S=0, S′=1), transistors M1, M2, M4 and M5 are
cut off, which makes the $V_{sb}$ of the sleep transistors nonzero. As a result, the
$V_{th}$ of the sleep transistors increases, and hence the subthreshold leakage is reduced
according to the following equation:

\[ I_{sub} = A \, e^{\frac{1}{n V_\theta} \left( V_{gs} - V_{th0} - \gamma V_{sb} + \eta V_{ds} \right)} \left( 1 - e^{-V_{ds}/V_\theta} \right) \tag{5.2} \]

where $A = \mu_0 C_{ox} \frac{W}{L_{eff}} V_\theta^2 e^{1.8}$, $n$ is the subthreshold swing
coefficient and $V_\theta$ is the thermal voltage. $V_{gs}$, $V_{th0}$, $V_{sb}$ and
$V_{ds}$ are the gate-to-source voltage, the zero-bias threshold voltage, the
source-to-body voltage and the drain-to-source voltage, respectively. $\gamma$ is the
body-bias effect coefficient, and $\eta$ is the Drain Induced Barrier Lowering (DIBL)
coefficient. $\mu_0$ is the zero-bias mobility, $C_{ox}$ is the gate-oxide capacitance,
$W$ is the width of the transistor, and $L_{eff}$ is the effective channel length.
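The dependence of Equation (5.2) on the source-to-body voltage can be sketched numerically as follows. The parameter values are assumptions chosen only to show the trend (the prefactor A is normalized to 1, so only the ratio of the two currents is meaningful).

```python
import math


def subthreshold_current(a, n, v_theta, v_gs, v_th0, gamma, v_sb, eta, v_ds):
    """Equation (5.2): subthreshold leakage current of a single transistor."""
    exponent = (v_gs - v_th0 - gamma * v_sb + eta * v_ds) / (n * v_theta)
    return a * math.exp(exponent) * (1.0 - math.exp(-v_ds / v_theta))


# Assumed, illustrative parameters (hypothetical, not from the thesis).
PARAMS = dict(
    a=1.0,           # prefactor A, normalized
    n=1.5,           # subthreshold swing coefficient
    v_theta=0.0259,  # thermal voltage, volts
    v_gs=0.0,        # transistor is off
    v_th0=0.4,       # zero-bias threshold voltage, volts
    gamma=0.2,       # body-bias effect coefficient as used in (5.2)
    eta=0.05,        # DIBL coefficient
    v_ds=1.0,        # drain-to-source voltage, volts
)

i_no_bias = subthreshold_current(v_sb=0.0, **PARAMS)
i_body_biased = subthreshold_current(v_sb=0.2, **PARAMS)
leakage_ratio = i_body_biased / i_no_bias

print(f"I_sub(V_sb = 0.2 V) / I_sub(V_sb = 0 V) = {leakage_ratio:.3f}")
```

Because the source-to-body voltage enters the exponent with a negative sign, even a small positive value reduces the leakage exponentially.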

Reducing the subthreshold leakage lowers the static dissipation. Thus the $V_{th}$ of
the sleep transistors (M1, M4) is varied according to the operating mode. In active mode
it is necessary to decrease $V_{th}$ in order to reduce the propagation delay:

\[ T_d \propto \frac{V_{dd}}{\left( V_{dd} - V_{th} \right)^{\alpha}} \tag{5.3} \]

where $T_d$, $V_{th}$ and $\alpha$ denote the gate delay in a CMOS circuit, the threshold
voltage and the velocity saturation index of a transistor, respectively, and $V_{dd}$ is
the supply voltage.
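The sensitivity of the gate delay in Equation (5.3) to the threshold voltage can be illustrated with a short sketch. The supply voltage, the two threshold voltages and the velocity saturation index below are assumed values, and the function returns the delay only up to a constant factor.

```python
def relative_delay(v_dd, v_th, alpha):
    """Right-hand side of Equation (5.3), up to a constant of proportionality."""
    return v_dd / (v_dd - v_th) ** alpha


# Assumed, illustrative values (hypothetical, not from the thesis).
V_DD = 1.2    # supply voltage, volts
ALPHA = 1.3   # velocity saturation index

delay_low_vth = relative_delay(V_DD, 0.40, ALPHA)   # lower V_th (active mode)
delay_high_vth = relative_delay(V_DD, 0.48, ALPHA)  # raised V_th
delay_increase = delay_high_vth / delay_low_vth

print(f"Delay ratio (V_th = 0.48 V vs 0.40 V): {delay_increase:.3f}")
```

Under these assumptions a higher threshold voltage gives a longer delay, which is why the threshold voltage is kept low in active mode.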


State retention is accomplished by the keeper pair, NMOS (M3) and PMOS (M6).
During sleep mode, if the output is high, the NMOS (M3) holds the state high, and if the
output is low, the PMOS (M6) holds the state low, independent of the noise margin.
5.4 Analysis of Subthreshold leakage reduction
Using an analytical approach, we can show that the subthreshold current is reduced by the
variable body biasing technique. Figure 16(a) shows a sleep transistor without a
body biasing transistor, and Figure 16(b) shows a sleep transistor (M1) with a body
biasing transistor (M2).











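For the sleep transistor without body biasing, the subthreshold leakage is given by

\[ I_{sub} = A \, e^{\frac{1}{n V_\theta} \left( V_{gs} - V_{th0} - \gamma V_{sb} + \eta V_{ds} \right)} \left( 1 - e^{-V_{ds}/V_\theta} \right) \tag{5.4} \]

During sleep mode S′ is high, so $V_{gs}$ is zero. As the source is at the body potential,
$V_{sb}$ is also zero and the subthreshold leakage becomes

\[ I_{sub} = A \, e^{\frac{1}{n V_\theta} \left( -V_{th0} + \eta V_{ds} \right)} \left( 1 - e^{-V_{ds}/V_\theta} \right) \tag{5.5} \]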

Figure 16 (a) Sleep transistor without body biasing transistor
(b) With body biasing transistor

Now, for the sleep transistor with a body biasing transistor, the subthreshold leakage
through M2 is

\[ I_{sub2} = A \, e^{\frac{1}{n V_\theta} \left( V_{gs2} - V_{th0} - \gamma V_{sb2} + \eta V_{ds2} \right)} \left( 1 - e^{-V_{ds2}/V_\theta} \right) \tag{5.6} \]

Here, during sleep mode, $V_{gs2}$ is zero. Unlike the sleep transistor (M1), the source of
the body biasing transistor (M2) is connected to its body, so $V_{sb2}$ is also zero. Hence

\[ I_{sub2} = A \, e^{\frac{1}{n V_\theta} \left( -V_{th0} + \eta V_{ds2} \right)} \left( 1 - e^{-V_{ds2}/V_\theta} \right) \tag{5.7} \]

\[ I_{sub2} = A \, e^{\frac{1}{n V_\theta} \left( -V_{th0} + \eta V_{ds2} \right)} - A \, e^{\frac{1}{n V_\theta} \left( -V_{th0} + (\eta - n) V_{ds2} \right)} \tag{5.8} \]
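Now the subthreshold leakage through M1 is given by

\[ I_{sub1} = A \, e^{\frac{1}{n V_\theta} \left( V_{gs1} - V_{th0} - \gamma V_{sb1} + \eta V_{ds1} \right)} \left( 1 - e^{-V_{ds1}/V_\theta} \right) \tag{5.9} \]

Here $V_{gs1}$ is zero and $V_{sb1}$ is equal to $V_{ds2}$. Hence $I_{sub1}$ becomes

\[ I_{sub1} = A \, e^{\frac{1}{n V_\theta} \left( -V_{th0} - \gamma V_{ds2} + \eta V_{ds1} \right)} \left( 1 - e^{-V_{ds1}/V_\theta} \right) \tag{5.10} \]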

From Figure 16(b) we see that only $I_{sub1}$ contributes to the static current. Comparing
the subthreshold leakage of the sleep transistor without body biasing, Equation (5.5), with
that of the sleep transistor with body biasing, Equation (5.10), the exponent in
Equation (5.10) contains the additional negative term $-\gamma V_{ds2}$, so we find that

\[ I_{sub1} < I_{sub} \]

So by body biasing, the static current during sleep mode is reduced, resulting in a
reduced static power (according to Equation 3.1).
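The inequality can be made explicit by dividing the leakage of the body-biased sleep transistor, Equation (5.10), by that of the unbiased one, Equation (5.5). The following is a sketch under the added assumption that the drain-to-source voltage of the sleep transistor is approximately the same in both circuits ($V_{ds1} \approx V_{ds}$); here $\gamma$ is the body-bias effect coefficient, $\eta$ the DIBL coefficient, $n$ the subthreshold swing coefficient and $V_\theta$ the thermal voltage.

```latex
\[
\frac{I_{sub1}}{I_{sub}}
  = e^{-\frac{\gamma V_{ds2}}{n V_\theta}}
    \cdot e^{\frac{\eta \left( V_{ds1} - V_{ds} \right)}{n V_\theta}}
    \cdot \frac{1 - e^{-V_{ds1}/V_\theta}}{1 - e^{-V_{ds}/V_\theta}}
\]
\[
V_{ds1} \approx V_{ds}
  \;\Longrightarrow\;
\frac{I_{sub1}}{I_{sub}} \approx e^{-\frac{\gamma V_{ds2}}{n V_\theta}} < 1
  \quad \text{for } V_{ds2} > 0 .
\]
```

Under this assumption the leakage reduction grows exponentially with the voltage drop across the body biasing transistor.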
5.5 Estimation of delay for variable body biasing technique
In this section we estimate the propagation delay of an inverter by applying the linear
delay model discussed in Section 3.4.














Figure 17 (a) Inverter with VBB Technique (left) (b) Inverter of equal strength (right)

To find the logical effort of the inverter with the variable body biasing technique
(Figure 17(a)), we have to determine its input capacitance and also the input capacitance
of a simple inverter of equal strength (Figure 17(b)).
Input capacitance of the inverter with the variable body biasing technique is
$C_{in} = 6 + 3 = 9$,
Input capacitance of a simple inverter of equal strength is
$C_{in}' = 3/2 + 3/4 = 9/4$.
Logical effort, $g = \frac{9}{9/4} = 4$
Transistors M3 and M6 contribute to the parasitic capacitance, so $p = 6 + 3 + 1 + 1 = 11$.
Assuming a fan-out of one identical gate, the electrical effort is $h = 1$.
Now, according to the linear delay model, the propagation delay is
$d = gh + p = 4 \times 1 + 11 = 15$ in units of $\tau = 3RC$, the parasitic delay of a unit
inverter. For a 180 nm process, $\tau$ = 15 ps.
So, propagation delay of the inverter with variable body biasing technique is estimated
to be 15 × 11 ps = 165ps for 180 nm process.[33]
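The logical effort calculation above can be reproduced with a few lines of code. This is a minimal sketch that uses only the capacitance values quoted in the text and keeps the result in units of the process time constant; the function names are illustrative.

```python
from fractions import Fraction


def logical_effort(c_in_gate, c_in_reference):
    """Ratio of the gate's input capacitance to that of the reference inverter."""
    return Fraction(c_in_gate) / Fraction(c_in_reference)


def normalized_delay(g, h, p):
    """Linear delay model: d = g * h + p, in units of the process time constant."""
    return g * h + p


c_in_vbb = 6 + 3                                  # VBB inverter: PMOS 6, NMOS 3
c_in_reference = Fraction(3, 2) + Fraction(3, 4)  # simple inverter of equal strength
g = logical_effort(c_in_vbb, c_in_reference)
p = 6 + 3 + 1 + 1                                 # parasitic term, including M3 and M6
h = 1                                             # fan-out of one identical gate
d = normalized_delay(g, h, p)

print(f"g = {g}, p = {p}, d = {d} (in units of tau)")
```

Exact fractions are used so that the ratio 9 / (9/4) evaluates to exactly 4 rather than a rounded floating point value.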

In this chapter, we introduced the variable body biasing technique for leakage power
reduction. In this technique, the $V_{th}$ of the sleep transistor of the sleepy keeper
technique is controlled using the body effect so that the subthreshold leakage can be
minimized in sleep mode. In sleep mode, the $V_{th}$ of the sleep transistor is increased;
as a result, the subthreshold leakage is reduced, and with it the static power dissipation.
During active mode, the body biasing transistor decreases the $V_{th}$ of the sleep
transistor, so the propagation delay can be kept within the limit.

In the next chapter we apply the variable body biasing structure to generic logic circuits
and to SRAM, explaining in detail our methodology.




CHAPTER 6
EXPERIMENTAL RESULTS

We compare the “VARIABLE BODY BIASING” technique to a number of key, well-known
low-leakage techniques. First, we present the experimental results for general logic
circuits. Then we present the experimental results for the SRAM cell design.
6.1 Experimental results for general logic circuits
In this section, we explain the experimental results for generic logic circuits. We utilize two
logic designs, namely (1) a chain of four inverters (CO4) and (2) a 1 bit full adder (FA). The
chosen technology models are the BSIM4 PTM models [36], and their supply voltages are given
in Table 4.
Table 4 Chosen technology and $V_{dd}$ value
Technology    130 nm    90 nm    65 nm    45 nm    32 nm
V_dd          1.3 V     1.2 V    1.1 V    1.0 V    0.9 V

6.1.1 Experimental results for CO4
We have considered the following three techniques for comparison with our proposed
technique:
• SLEEPY STACK
• DUAL SLEEP
• DUAL STACK
The static power consumption of these methods for different technologies is shown
in Table 5. The table shows that the variable body biasing technique has the lowest static
power consumption of all the techniques at every technology node.




Table 5 Static power data for chain of 4 inverters (nW)
Technology    SLEEPY STACK    DUAL SLEEP    DUAL STACK    VBB
130nm         10.3            26            21.6          1.44
90nm          9.16            21.3          16.9          1.13
65nm          8.82            15.5          11.6          0.82
45nm          4.41            9.23          6.22          0.494
32nm          2.44            6.59          3.79          0.364



Figure 18 Static Power Consumption (CO4)

Figure 18 shows the static power consumption of a CO4 for the different technologies and
methods. The graph shows that the static power consumption of the variable body biasing
technique is lower than that of the other methods in all of the technologies.
The dynamic power consumption of these methods for different technologies is
shown in Table 6. The data show that the dynamic power of the proposed technique is almost
equal to that of the dual stack method and less than that of the sleepy stack and dual
sleep methods.



Table 6 Dynamic power data for chain of 4 inverters (µW)
Technology    SLEEPY STACK    DUAL SLEEP    DUAL STACK    VBB
130nm         21.1            26.5          12            12.2
90nm          12.6            16.2          7.31          7.21
65nm          6.98            9.47          4.31          4.22
45nm          3.26            4.67          2.06          2.05
32nm          1.61            2.33          1.03          1.03



Figure 19 Dynamic power consumption (CO4)

Figure 19 shows the dynamic power consumption of a CO4 for the different technologies and
methods. The graph shows that the dynamic power is highest for the dual sleep method, and
that the dual stack and variable body biasing methods have almost equal power consumption.
The propagation delay of the aforementioned methods for different technologies is
shown in Table 7.



Table 7 Propagation delay data for chain of 4 inverters (ps)
Technology    SLEEPY STACK    DUAL SLEEP    DUAL STACK    VBB
130nm         85.9            53.5          84.3          57.5
90nm          68.6            39.1          38.2          34.9
65nm          57              35.8          61.5          32.8
45nm          50.3            30.9          38.9          44.9
32nm          46.8            35.9          74.2          78.6




Figure 20 Propagation delay (CO4)

Figure 20 shows the graphical representation of the propagation delay in a CO4 for different
technologies in different methods.
The data of Power delay Product of aforementioned methods for different technologies are
shown in Table 8.




Table 8 Power delay product data for chain of 4 inverters (fJ)
Technology    SLEEPY STACK    DUAL SLEEP    DUAL STACK    VBB
130nm         1.8133748       1.419141      1.013421      0.701583
90nm          0.8649884       0.634253      0.279888      0.251668
65nm          0.3983627       0.339581      0.265778      0.138443
45nm          0.1641998       0.144588      0.080376      0.092067
32nm          0.0754622       0.083884      0.076707      0.080987
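The power delay product values are consistent with multiplying the dynamic power (Table 6) by the propagation delay (Table 7). The sketch below shows the unit conversion for the 130 nm node; small differences from the tabulated values are expected because the tables list rounded power and delay figures. The function name is illustrative.

```python
def power_delay_product_fj(dynamic_power_uw, delay_ps):
    """Power delay product in femtojoules.

    1 uW * 1 ps = 1e-6 W * 1e-12 s = 1e-18 J = 1e-3 fJ.
    """
    return dynamic_power_uw * delay_ps * 1e-3


# 130 nm entries from Table 6 (uW) and Table 7 (ps).
pdp_sleepy_stack = power_delay_product_fj(21.1, 85.9)
pdp_vbb = power_delay_product_fj(12.2, 57.5)

print(f"Sleepy stack, 130 nm: {pdp_sleepy_stack:.4f} fJ")
print(f"VBB, 130 nm:          {pdp_vbb:.4f} fJ")
```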




Figure 21 power delay product

Figure 21 shows the power delay product of a CO4 for the different technologies and
methods.
The area of the aforementioned methods for different technologies is shown in Table 9.
From this we can see that the area of the variable body biasing method is smaller than that
of the dual stack and sleepy stack methods but greater than that of the dual sleep method.





Table 9 Area data for chain of 4 inverters (µm²)
Technology    SLEEPY STACK    DUAL SLEEP    DUAL STACK     VBB
130nm         57.87067        34.0692       40.851525      39.17843
90nm          27.73683        16.3053       19.579725      18.77783
65nm          14.4676675      8.504925      10.21288125    9.794606
45nm          6.9342075       4.076325      4.89493125     4.694456
32nm          3.5064832       2.061312      2.475264       2.373888



Figure 22 Area comparison (CO4)

Figure 22 shows the graphical representation of area comparison in a CO4 for different
technologies in different methods.
From the above data we can see that our proposed method reduces the static power
consumption to roughly one seventh to one nineteenth of that of the previous methods
(Table 5). In doing so, our method did not suffer any significant area or delay penalty, as
its delay and area are comparable to those of the previous methods.
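The size of the static power reduction can be read directly from Table 5 by dividing each method's static power by that of the variable body biasing (VBB) technique. The following sketch does this for every node; the data are copied from Table 5, while the dictionary layout and function name are illustrative.

```python
# Static power in nW, copied from Table 5 (chain of four inverters).
STATIC_POWER_NW = {
    "130nm": {"SLEEPY STACK": 10.3, "DUAL SLEEP": 26.0, "DUAL STACK": 21.6, "VBB": 1.44},
    "90nm": {"SLEEPY STACK": 9.16, "DUAL SLEEP": 21.3, "DUAL STACK": 16.9, "VBB": 1.13},
    "65nm": {"SLEEPY STACK": 8.82, "DUAL SLEEP": 15.5, "DUAL STACK": 11.6, "VBB": 0.82},
    "45nm": {"SLEEPY STACK": 4.41, "DUAL SLEEP": 9.23, "DUAL STACK": 6.22, "VBB": 0.494},
    "32nm": {"SLEEPY STACK": 2.44, "DUAL SLEEP": 6.59, "DUAL STACK": 3.79, "VBB": 0.364},
}


def reduction_factors(row):
    """How many times larger each method's static power is than that of VBB."""
    return {name: value / row["VBB"] for name, value in row.items() if name != "VBB"}


all_factors = []
for node, row in STATIC_POWER_NW.items():
    factors = reduction_factors(row)
    all_factors.extend(factors.values())
    summary = ", ".join(f"{name}: {factor:.1f}x" for name, factor in factors.items())
    print(f"{node}: {summary}")

print(f"Range: {min(all_factors):.1f}x to {max(all_factors):.1f}x")
```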





6.1.2 Experimental results for FA
We have considered the following four techniques for comparison with our proposed
technique:
• SLEEP TRANSISTOR
• SLEEPY KEEPER
• DUAL SLEEP
• DUAL STACK
The static power consumption of these methods for different technologies is shown
in Table 10.
Table 10 Static power data for 1 bit full adder (nW)
Technology    SLEEP TRANSISTOR    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         0.327               0.365            4.11          3.6544        0.18814
90nm          0.278               0.305            3.3417        2.8934        0.15712
65nm          0.251               0.273            2.7807        2.3359        0.13521
45nm          0.176               0.189            1.6361        1.2675        0.089568
32nm          0.164               0.173            1.2712        0.86129       0.0759



Figure 23 Static power consumption for FA



Figure 23 shows the graphical representation of static power consumption in an FA for
different technologies in different methods. The data of dynamic power consumption of these
methods for different technologies are shown in Table 11.
Table 11 Dynamic power data for 1 bit full adder (µW)
Technology    SLEEP TRANSISTOR    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         26.2                26.4             26.2          17.3          16
90nm          15                  15               15            9.44          8.54
65nm          8.28                8.35             8.28          5.14          4.63
45nm          3.57                3.59             3.57          2.34          2.13
32nm          1.6                 1.61             1.6           1.1           1.01



Figure 24 Dynamic power consumption for FA

Figure 24 shows the graphical representation of the dynamic power consumption in an FA for
different technologies in different methods.
The data of propagation delay of aforementioned methods for different technologies are
shown in Table 12.



Table 12 Propagation delay data for 1 bit full adder (ns)
Technology    SLEEP TRANSISTOR    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         20                  20               20            19.9          19.9
90nm          20                  20               20            20            18.7
65nm          20                  20               20            18.8          18.8
45nm          20                  20               20            18.8          18.8
32nm          20                  20               20            18.8          18.8



Figure 25 Propagation delay comparison in FA

Figure 25 shows the graphical representation of the propagation delay in an FA for different
technologies in different methods.
The data of power delay product of aforementioned methods for different technologies are
shown in Table 13.





Table 13 Power delay product data for 1 bit full adder (fJ)
Technology    SLEEP TRANSISTOR    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         524                 528              524           344           317
90nm          299                 301              300           188           160
65nm          166                 167              166           96.7          87
45nm          71.4                71.8             71.4          44            39.9
32nm          32                  32.2             32            20.6          19



Figure 26 Power Delay Product for FA

Figure 26 shows the graphical representation of the power delay product in an FA for different
technologies in different methods.
The data of area comparison of aforementioned methods for different technologies are
shown in Table 14.




Table 14 Area data for 1 bit full adder (µm²)
Technology    SLEEP TRANSISTOR    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         317.1               312.36           317.98        359.15        330.28
90nm          151.98              149.71           152.4         172.14        163.9
65nm          79.28               78.09            79.5          89.79         85.07
45nm          37.996              37.43            38.1          43.03         40.98
32nm          19.21               18.93            19.3          21.76         20.19



Figure 27 Area Comparison for FA

Figure 27 shows the graphical representation of the area comparison in an FA for different
technologies in different methods.
From the data presented above, we can see that the 1 bit full adder shows the same trends
as the chain of four inverters for all the methods under consideration.





6.2 Experimental results for SRAM
In this section, we explore the experimental results for the different SRAM cell variations. As
in the generic circuit comparisons in Section 6.1, we have considered the
following four techniques for comparison with our proposed technique:
• SLEEP TRANSISTOR
• SLEEPY KEEPER
• DUAL SLEEP
• DUAL STACK
The static power of the aforementioned methods for different technologies is shown in
Table 15.
Table 15 Static power data for SRAM (nW)
Technology    SLEEP    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         5.31     5.46             21.6          17.8          3.43
90nm          4.25     4.39             16.5          14.5          2.88
65nm          3.27     3.41             12.3          10.5          2.4
45nm          1.87     1.95             7.31          5.9           1.36
32nm          1.37     1.41             5.35          3.97          0.948


Figure 28 Static power consumption for SRAM cell


Figure 28 shows the static power consumption of the SRAM cell for the different
technologies and methods.
The dynamic power of the aforementioned methods for different technologies is shown in
Table 16.
Table 16 Dynamic power data for SRAM (µW)
Technology    SLEEP    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         32.5     34.6             32.4          20.8          20.9
90nm          22       21.2             21.7          13.2          13.2
65nm          14.6     14.2             14.7          8.35          7.66
45nm          7.62     7.85             7.62          4.2           4.19
32nm          4.27     4.05             4.27          2.23          2.27



Figure 29 Dynamic power consumption for SRAM

Figure 29 shows the graphical representation of the dynamic power consumption in SRAM for
different technologies in different methods.
The data of propagation delay of aforementioned methods for different technologies are
shown in Table 17.



Table 17 Propagation delay data for SRAM (ns)
Technology    SLEEP    SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         5.97     6.05             6.03          6.0207        6.0237
90nm          6.03     5.97             6             6.0194        6.0221
65nm          5.99     5.97             5.95          6.0208        6.0158
45nm          6        6.01             6             6.0228        6.0232
32nm          6.08     6.08             6.08          6.04          6.06



Figure 30 Propagation delay comparison for SRAM

Figure 30 shows the graphical representation of the propagation delay comparison in SRAM
for different technologies in different methods.
The data of power delay product of aforementioned methods for different technologies are
shown in Table 18.






Table 18 Power delay product data for SRAM (pJ)
Technology    SLEEP       SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         0.194057    0.209363         0.195502      0.125338      0.125916
90nm          0.132686    0.12659          0.130299      0.079543      0.079509
65nm          0.087474    0.084794         0.087538      0.050337      0.046095
45nm          0.045731    0.04719          0.045764      0.025331      0.025245
32nm          0.02597     0.024633         0.025994      0.013493      0.013762



Figure 31 Power Delay Product of SRAM

Figure 31 shows the graphical representation of the power delay product in SRAM for
different technologies in different methods.
The data of area of aforementioned methods for different technologies are shown in Table 19.
Table 19 Area data for SRAM (µm²)
Technology    SLEEP      SLEEPY KEEPER    DUAL SLEEP    DUAL STACK    VBB
130nm         51.1284    56.3022          17            33            34.85625
90nm          42.56      49               9             16            16.70625
65nm          32.968     36               4             10            8.714063
45nm          25         30               3             2.7           4.176563
32nm          14         20               2             2.2           2.112




Figure 32 Area comparison for SRAM

Figure 32 shows the graphical representation of the area in SRAM for different technologies in
different methods.
6.3 Comparison with previous methods
When we compare our result to another result, we often say one is “less than” the other. In
particular, “X is n% less than Y” means what Eq. 6.1 shows [39]:

\[ n = \frac{\text{previous method data} - \text{new method data}}{\text{previous method data}} \times 100\% \tag{6.1} \]

For example, when two propagation delay measurements result in X = 8.18E-10 s and
Y = 1.23E-09 s, n is approximately 33.5 from calculation using Eq. 6.1. In this case, we say
X has 33.5% less delay than Y. This equation is used for all other comparisons, such as area
and power consumption.
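Eq. 6.1 is simple enough to express as a one-line function. The sketch below applies it to the 90 nm chain-of-four-inverters data for the dual sleep method and the VBB approach (static power from Table 5, area from Table 9); the function name is illustrative. A positive result means the new method is lower (improved), and a negative result means it is higher (degraded).

```python
def percent_less(previous, new):
    """Eq. 6.1: by what percentage the new method's value is below the previous one."""
    return (previous - new) / previous * 100.0


# 90 nm chain of four inverters, dual sleep (previous) versus VBB (new).
static_power_gain = percent_less(previous=21.3, new=1.13)     # static power in nW
area_gain = percent_less(previous=16.3053, new=18.77783)      # area in um^2

print(f"Static power: {static_power_gain:+.2f}%")
print(f"Area:         {area_gain:+.2f}%")
```

The sign convention of the function matches the “+” and “-” notation used in the comparison tables.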
The comparisons of the VBB approach using 90 nm technology with the existing methods for a
chain of four inverters, a 1 bit full adder and an SRAM cell are summarized in Table 20,
Table 21 and Table 22, respectively. Here “+” denotes improved and “-” denotes degraded
performance.




Table 20 Comparison of VBB Approach for a Chain of Four Inverters (for 90 nm process)
Methods       Delay      Static Power    Dynamic Power    Area
Dual sleep    +8.37%     +94.7%          +55.4%           -15.16%
Dual stack    +46.67%    +92.93%         +2.09%           +4.09%

Here VBB approach exhibits 8.37%, 94.7% and 55.4% improved performance with respect to
dual sleep technique in delay, Static Power, Dynamic Power respectively while giving 15.16%
penalty in area. With respect to Dual Stack technique it shows 46.67%, 92.93%, 2.09% and
4.09% improved performance in delay, static power, dynamic power and area respectively.
Table 21 Comparison of VBB Approach for a 1 bit full adder (for 90 nm process)
Methods       Delay     Static Power    Dynamic Power    Area
Dual sleep    +6.5%     +95.3%          +40.97%          -7.55%
Dual stack    +6.5%     +94.58%         +9.53%           +4.7887%

Here VBB approach exhibits 6.5%, 95.3% and 40.97% improved performance with respect to
dual sleep technique in delay, Static Power, Dynamic Power respectively while giving 7.55%
penalty in area. With respect to Dual Stack technique it shows 6.5%, 94.58%, 9.53% and
4.7887% improved performance in delay, static power, dynamic power and area respectively.
Table 22 Comparison of VBB Approach for an SRAM (for 90 nm process)
Methods       Delay     Static Power    Dynamic Power    Area
Dual sleep    -1.1%     +80.49%         +47.89%          -74.28%
Dual stack    +0.08%    +77.14%         +8.26%           +12.86%

Here VBB approach exhibits 80.49% and 47.89% improved performance with respect to dual
sleep technique in static Power, dynamic Power respectively while giving 1.1% and 74.28%
penalty in delay and area respectively. With respect to Dual Stack technique it shows 0.08%,
77.14%, 8.26% and 12.86% improved performance in delay, static power, dynamic power and
area respectively.

CHAPTER 7
CONCLUSION
This section provides the summary of our contribution, the ratiocination
of this work and some suggestions for future work.

7.1 CONCLUSION
With a rigid energy budget in energy-constrained systems, subthreshold circuit design
has become a predominant technique in recent years. The battery life of remote or
portable devices may not be sufficient to meet the system demands. In an extreme case,
micro-sensor networks may require energy consumption low enough to be supplied by
electrical energy converted from ambient energy, as in energy harvesting or
energy scavenging. These challenges are addressed by designing the systems for
a very low supply voltage below $V_{th}$, but a performance penalty still remains for
subthreshold circuits. Without the performance requirement, we can focus on minimum
energy operation as a primary goal. On the other hand, some energy-efficient systems
have a wide range of speed requirements; therefore the operation of such systems may occur
at a non-minimum energy point. We utilize the body biasing effect to further lower the
energy budget for energy-constrained systems, whether or not they have a speed requirement.
Using the variable body biasing design, the static power in all of our experiments is
lower than that of the prior works while system speed requirements are maintained.
In this dissertation we proposed a new static power reduction technique named
“Variable Body Biasing”. With the help of this technique we were able to reduce the
static power consumption in low power CMOS circuits without penalizing delay or
area. This design technique offers low power CMOS circuit designers a new weapon
in their arsenal.




7.2 SUGGESTIONS FOR FUTURE WORK

• We have implemented our design in a chain of four inverters, a 1 bit full adder and
an SRAM circuit. More tests could be done on ISCAS benchmark circuits for
further verification.
• In our design we tried to keep the delay and area equal to those of previous designs.
Further research could explore design techniques that reduce delay and
area as well as static power, and hence increase overall circuit performance.
• We have used technology nodes down to 32 nm in our research. Smaller processes
could be used to explore the static power consumption below the 32 nm
node.

APPENDIX

A. AREA ESTIMATION
Layouts of all the considered approaches are designed for the 130 nm process
using a standard layout design application. Areas for technologies below 130 nm
are estimated by scaling the area of each layout designed for the
130 nm process. The areas are scaled by the ratio of the squares of the feature sizes,
with the addition of a 10% overhead for nonlinearly scaling layers (i.e., metal layers).
For example, if an area of 100.00 µm² is measured for 130 nm technology, the area for
120 nm technology would be 100.00 µm² × (120² / 130²) × 1.1 = 93.73 µm².
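The scaling rule described above can be written as a short function. This is a minimal sketch that reproduces only the worked example given in this appendix; the function and parameter names are illustrative.

```python
def scale_area(area_um2, from_nm, to_nm, overhead=0.10):
    """Scale a layout area by the ratio of squared feature sizes plus an overhead.

    The overhead (10% by default) accounts for layers that do not scale
    linearly, such as metal layers.
    """
    return area_um2 * (to_nm / from_nm) ** 2 * (1.0 + overhead)


# Worked example from the text: 100.00 um^2 at 130 nm scaled to 120 nm.
scaled = scale_area(100.00, from_nm=130, to_nm=120)
print(f"{scaled:.2f} um^2")
```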
B. CIRCUIT DIAGRAMS














Figure 33 SLEEP TRANSISTOR




























Figure 34 “FORCED STACK” METHOD
Figure 35 “SLEEPY KEEPER” METHOD




























Figure 36 “DUAL SLEEP” METHOD
Figure 37 “DUAL STACK” METHOD




























Figure 38 “VARIABLE BODY BIASING TECHNIQUE”




























Figure 39 “SLEEPY KEEPER” (FA)
Figure 40 “DUAL SLEEP” (FA)
















Figure 41 “DUAL STACK” (FA)
Figure 42 “VBB” (FA)


BIBLIOGRAPHY

[1] C. H. I. Kim, H. Soeleman, and K. Roy, “Ultra-Low-Power DLMS Adaptive
Filter for Hearing Aid Applications,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol.11,no.6,pp.1058–1067,2003.

[2] M.Seok, S.Hanson, Y.S.Lin, Z.Foo, D.Kim, Y.Lee, N.Liu, D.Sylvester, and
D.Blaauw, “The Phoenix Processor: a 30pW Platform for Sensor Applications,”
in Proceedings of IEEE Symposium on VLSI Circuits, 2008, pp.188–189.

[3] R.Vaddi, S.Dasgupta, and R.P.Agarwal, “Device and Circuit Design Challenges
in the Digital Subthreshold Region for Ultra low-Power Applications,” VLSI
Design,vol.2009,pp.1–14,Jan.2009.

[4] A.Wang, B.H.Calhoun, and A.P.Chandrakasan, “Subthreshold Design for Ultra
Low-Power Systems.” Springer, 2006.

[5] A.Wang and A.Chandrakasan, “A 180mV FFT Processor Using Subthreshold
Circuit Techniques,” in IEEE International Solid-State Circuits Conference
Digest of Technical Papers, 2004, pp.292–529.

[6] B.Zhai, S.Pant, L.Nazhandali, S.Hanson, J.Olson, A.Reeves, M.Minuth,
R.Helfand,T.Austin, D.Sylvester, and D.Blaauw, “Energy-Efficient
Subthreshold Processor Design,”IEEE Transactions on Very Large Scale
Integration (VLSI) Systems,vol.17,no.8,pp.1127–1137, aug2009.

[7] M.Kulkarni, “A Reduced Constraint Set Linear Program for Low-Power Design
of Digital Circuits,” Master's thesis, Auburn University, Dept. of ECE, Auburn,
Alabama, Dec.2010.


[8] M.Kulkarni and V.D.Agrawal, “A Tutorial on Battery Simulation-Matching
Power Source to Electronic System,” in Proceedings of 14th IEEE VLSI Design
and Test Symposium, July 2010.

[9] M.Kulkarni and V.D.Agrawal, “Energy Source Lifetime Optimization for a
Digital System through Power Management,” in Proceedings of 43rd IEEE
Southeastern Symposium on System Theory, Mar.2011,pp.75–80.

[10] B.Zhai, D.Blaauw, D.Sylvester, and K.Flautner, “Theoretical and Practical
Limits of Dynamic Voltage Scaling,” in Proceedings of 41st Design
Automation Conference, 2004, pp. 868–873.

[11] B. H. Calhoun and A. P. Chandrakasan, “Ultra-Dynamic Voltage Scaling
(UDVS) Using Subthreshold Operation and Local Voltage Dithering,” IEEE
Journal of Solid-State Circuits,vol.41, no.1,pp.238–245,2006.

[12] International Technology Roadmap for Semiconductors by Semiconductor
Industry Association, 2002. [Online] Available http://public.itrs.net

[13] N. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. Hu, M. Irwin, M. Kandemir,
and V. Narayanan, “Leakage Current: Moore's Law Meets Static Power,”
IEEE Computer, vol. 36, pp. 68–75, December 2003.

[14] J.C. Park, V. J. Mooney III and P. Pfeiffenberger, “Sleepy Stack Reduction of
Leakage Power,” Proceeding of the International Workshop on Power and
Timing Modeling, Optimization and Simulation, pp. 148-158, September 2004.

[15] J. Park, “Sleepy Stack: a New Approach to Low Power VLSI and Memory,”
Ph.D. Dissertation, School of Electrical and Computer Engineering, Georgia
Institute of Technology, 2005. [Online].Available http://etd.gatech.edu/theses

[16] S. Kim and V. Mooney, “The Sleepy Keeper Approach: Methodology, Layout
and Power Results for a 4 bit Adder,” Technical Report GIT-CERCS-06-03,
Georgia Institute of Technology, March 2006,
http://www.cercs.gatech.edu/tech-reports/tr2006/git-cercs-06-03.pdf.

[17] N. Karmakar, M. Z. Sadi, M. K. Alam and M. S. Islam, “A novel dual sleep
approach to low leakage and area efficient VLSI design” Proc. 2009 IEEE
Regional Symposium on Micro and Nano Electronics(RSM2009), Kota Bharu,
Malaysia, August 10-12, 2009, pp. 409-414.

[18] M. S. Islam, M. Sultana Nasrin, Nuzhat Mansur and Naila Tasneem, “Dual
Stack Method: A Novel Approach to Low Leakage and Speed Power Product
VLSI Design” Proc, International Conference on Electrical and Computer
Engineering (ICECE) 2010, Dhaka, Bangladesh. 18-20 December 2010, pp. 89-
92.

[19] V.Kursun and E.G.Friedman, Multi-Voltage CMOS Circuit Design. Wiley,
2006.

[20] Y.Ramadass and A.Chandrakasan, “Voltage scalable switched capacitor dc-dc
converter for ultra-low-power on-chip applications,” in Proceedings of Power
Electronics Specialists Conference, 2007, pp.2353–2359.

[21] Berkeley Predictive Technology Model (BPTM). [Online]. Available
http://www.device.eecs.berkeley.edu/~ptm/.

[22] JOHNSON, M.C., SOMASEKHAR,D., and ROY,K., “Models and Algorithms
for Bounds on Leakage in CMOS Circuits,” IEEE Transactions on Computer
Aided Design of Integrated Circuits and Systems, vol.18, no.6, pp.714–725,
June 1999.

[23] NARENDRA, S., DE,V., BORKAR,S., ANTONIADIS,D.A., and
CHANDRAKASAN,A.P., “Full-Chip Subthreshold Leakage Power Prediction
and Reduction Techniques for Sub-0.18µm CMOS,” IEEE Journal of Solid-
State Circuits, vol.39, no.2, pp.501–510, February2004.


[24] SHEU, B., SCHARFETTER, D., KO,P.-K., and JENG,M.-C., “BSIM: Berkeley
short-channel IGFET model for MOS transistors,” IEEE Journal of Solid-State
Circuits, vol.22, pp.558–566, August1987.

[25] UYEMURA, J.P., CMOS Logic Circuit Design Second Edition. Norwell,
Massachusetts USA: Kluwer Academic Publishers, 1999.

[26] KIM,C. and ROY,K., “Dynamic Vt SRAM: a Leakage Tolerant Cache Memory
for Low Voltage Microprocessors,” Proceedings of the International
Symposium on Low Power Electronics and Design, pp.251–254, August2002.

[27] NOSE,K. and SAKURAI,T., “Analysis and Future Trend of Short Circuit
Power,” IEEE Transactions on Computer Aided Design of Integrated Circuits
and Systems, vol.19, no.9, pp.1023–1030, September 2000.

[28] CHANDRAKASAN, A. P., SHENG, S., and BRODERSEN, R.W., “Low-
Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, vol.27,
no.4, pp.473–484, April1992.

[29] KHELLAH,M.M. and ELMASRY,M.I., “Power Minimization of High-
Performance Submicron CMOS Circuits Using a Dual-Vdd Dual-Vth (DVDV)
Approach,” Proceedings of the International Symposium on Low Power
Electronics and Design, pp.106–108, 1999.

[30] SAKURAI, T. and NEWTON, A. R., “Alpha-Power Law MOSFET Model and
Its Application to CMOS Inverter Delay and Other Formulas,” IEEE Journal of
Solid State Circuits, vol.25, no.2, pp.584–593, April 1990.

[31] BOWMAN,K.A., AUSTIN,B.L., EBLE,J.C., TANG,X., and MEINDL,J.D., “A
Physical Alpha-Power Law MOSFET Model,” IEEE Journal of Solid-State
Circuits, vol.34, no.10, pp.1410–1414, October 1999.

[32] MICHELI, G.D., “Synthesis and Optimization of Digital Circuits.” USA:
McGraw-Hill Inc., 1994.


[33] Neil H. E. Weste., Harris, David., Banerjee, Ayan, “CMOS VLSI DESIGN: a
circuits and systems perspective”, third edition, Pearson, pp. 116-117, 2006

[34] MUTOH,S., DOUSEKI,T., MATSUYA,Y., AOKI,T., SHIGEMATSU,S., and
YAMADA,J., “1-V Power Supply High-speed Digital Circuit Technology with
Multi-threshold-Voltage CMOS,” IEEE Journal of Solid-State Circuits, vol.30,
no.8, pp.847–854, August 1995.

[35] CHEN,Z., JOHNSON,M., WEI,L., and ROY,K., “Estimation of Standby
Leakage Power in CMOS Circuits Considering Accurate Modeling of
Transistor Stacks,” Proceedings of the International Symposium on Low Power
Electronics and Design, pp.239–244, August 1998.

[36] Berkeley Predictive Technology Model (BPTM). [Online]. Available
http://www.device.eecs.berkeley.edu/~ptm/.

[37] NARENDRA,S., BORKAR,S., DE,V., ANTONIADIS,D., and
CHANDRAKASAN,A., “Scaling of Stack Effect and its Application for
Leakage Reduction,” Proceedings of the International Symposium on Low
Power Electronics and Design, pp.195–200, August 2001.

[38] JOHNSON,M., SOMASEKHAR,D., CHIOU,L.-Y., and ROY,K., “Leakage
Control with Efficient Use of Transistor Stacks in Single Threshold CMOS,”
IEEE Transactions on VLSI Systems, vol.10, no.1, pp.1–5, February 2002

[39] D. Patterson and J. Hennessy, Computer Architecture: A Quantitative
Approach. Palo Alto, California: Morgan Kaufmann Publishers, pp. 5-7, 1990.